The present invention generally relates to methods and systems for image segmentation and, more particularly, to methods for adaptive and progressive gradient-based multi-resolution color image segmentation and systems thereof.
The interest in digital media has grown to new heights with rapid technological advancements being made in the capture and sharing of images, consequently necessitating the exploration of methods to enhance, classify and/or extract information from them. Image segmentation is one approach that provides the foundation to make these functionalities ever more effective and expeditious. Segmentation is defined as the meaningful partitioning of an image into distinct clusters that exhibit homogeneous characteristics. In doing so, it generates a reduced and relevant dataset for high level semantic operations such as rendering, indexing, classification, compression, content-based retrieval, and multimedia applications, to name a few. Though segmentation comes naturally to human observers, the development of a simulated environment to perform this imaging task has proven to be extremely challenging.
Many grayscale/color domain methodologies have been adopted in the past to tackle this ill-defined problem (see Lucchese et al., “Color Image Segmentation: A State of the Art Survey,” Proc. Indian National Science Acad. 67(2):207-221 (2001); Cheng et al., “Color Image Segmentation: Advances & Prospects,” Pat. Rec. 34(12):2259-2281 (2001), which are hereby incorporated by reference in their entirety, for comprehensive surveys). Initial multiscale research aimed to overcome drawbacks faced by Bayesian approaches to segmentation/classification that use Markov Random Field (MRF) and Gibbs Random Field (GRF) estimation techniques. Derin et al., “Modeling and Segmentation of Noisy and Textured Images Using Gibbs Random Fields,” IEEE Trans. on Pat. Anal. and Mach. Int 9(1):39-55 (1987), which is hereby incorporated by reference in its entirety, proposed a method of segmenting images by comparing the Gibbs distribution results to a predefined set of textures using a maximum a posteriori (MAP) criterion. Pappas et al. “An Adaptive Clustering Method For Image Segmentation,” IEEE Trans. on Sig. Process. 40(4):901-914 (1992), which is hereby incorporated by reference in its entirety, generalized the k-means clustering method using adaptive and spatial constraints, together with the GRF model, to achieve segmentation in the grayscale domain. Chang et al. “Adaptive Bayesian Segmentation of Color Images,” Jour. of Elec. Imag. 3(4):404-414 (1994), which is hereby incorporated by reference in its entirety, extended this to color images by assuming conditional independence of each color channel. Improved segmentation and edge linking were achieved by Saber et al. “Fusion of Color and Edge Information For Improved Segmentation and Edge Linking,” Imag. and Vision Comp. 15:769-780 (1997), which is hereby incorporated by reference in its entirety, who combined spatial edge information and the regions resulting from a GRF model of the segmentation field. 
Bouman et al. “Multiple Resolution Segmentation of Textured Images,” IEEE Trans. on Pat. Anal. and Mach. Int 7(1):39-55 (1991), which is hereby incorporated by reference in its entirety, proposed a method for segmenting textured images comprising regions with varied statistical profiles using a causal Gaussian autoregressive model and an MRF representing the classification of each pixel at various scales. However, most of the aforementioned methods suffered from the fact that the obtained estimates could not be calculated exactly and were computationally prohibitive. To overcome these problems, Bouman et al. “A Multiscale Random Field Model For Bayesian Image Segmentation” IEEE Transactions on Image Processing 3(2):1454-1466 (1994), which is hereby incorporated by reference in its entirety, extended this work by incorporating a multiscale random field model (MSRF) and a sequential MAP (SMAP) estimator. The MSRF model was used to capture the characteristics of image behavior at various scales. However, the work in Bouman et al., “Multiple Resolution Segmentation of Textured Images,” IEEE Trans. on Pat. Anal. and Mach. Int 7(1):39-55 (1991); and Bouman et al., “A Multiscale Random Field Model For Bayesian Image Segmentation” IEEE Transactions on Image Processing 3(2):1454-1466 (1994), which are hereby incorporated by reference in their entirety, used either single scale versions of the input image, or multiscale versions of the image with the underlying hypothesis that the random variables at a given level of the image data pyramid were independent of the ones at other levels.
Comer et al. “Multiresolution Image Segmentation,” IEEE International Conference on Acoustics Speech and Signal Processing (1995), which is hereby incorporated by reference in its entirety, used a multiresolution Gaussian autoregressive model (MGAR) for a pyramid representation of the input image and “maximization of posterior marginals” (MPM) for pixel label estimates. Correlations between these estimates at different levels were established using the interim segmentations corresponding to each level. This work was extended in Comer et al., “Segmentation of Textured Images Using a Multiresolution Gaussian Autoregressive Model,” IEEE Transactions on Image Processing 8(3):1454-1466 (1999), which is hereby incorporated by reference in its entirety, by using a multiresolution MPM model for class estimates and a multiscale MRF to incorporate interlevel correlations into the class pyramid model. Liu et al. “Multiresolution Color Image Segmentation,” IEEE Transactions on Image Processing 16(7):1454-1466 (1994), which is hereby incorporated by reference in its entirety, proposed a relaxation process that converged to a MAP estimate of the eventual segmentation of the input image using MRFs in a quadtree structure. An MRF model in combination with the discrete wavelet transform was proposed by Tab et al. “Scalable Multiresolution Color Image Segmentation,” Signal Processing 86:1670-1687 (2006), which is hereby incorporated by reference in its entirety, for effective segmentations with spatial scalability, producing similar patterns at different resolutions. Cheng et al. in International Conference on Image Processing (1998), which is hereby incorporated by reference in its entirety, incorporated Hidden Markov Models (HMMs) for developing complex contextual structure, capturing textural information, and correlating image features at different scales, unlike the previously mentioned MRF models. 
The method's usefulness was illustrated on the problem of document segmentation where intra scale contextual dependencies can be imperative. A similar principle was applied by Won et al. (Won et al., “Hidden Markov Multiresolution Texture Segmentation Using Complex Wavelets,” in International Conference on Telecommunications, which is hereby incorporated by reference in its entirety), who combined HMM and Hidden Markov Tree (HMT) forming a hybrid HMM-HMT model to establish local and global correlations for efficient block-based segmentations.
Watershed and wavelet-driven segmentation methods have been of interest to many researchers. Vanhamel et al. “Multiscale Gradient Watersheds of Color Images,” IEEE Transactions on Image Processing 12(6):1454-1466 (2003), which is hereby incorporated by reference in its entirety, proposed a scheme combining a non-linear anisotropic scale space and vector-valued gradient watersheds in a hierarchical framework for multiresolution analysis. In a similar framework, Makrogiannis et al. “Watershed-Based Multiscale Segmentation Method For Color Images Using Automated Scale Selection,” J. Electronic Imaging 14(3) (2005), which is hereby incorporated by reference in its entirety, proposed watershed-based segmentations utilizing a fuzzy dissimilarity measure and connectivity graphs for region merging. Jung et al. “Combining Wavelets and Watersheds For Robust Multiscale Image Segmentation,” Image And Vision Computing 25:24-33 (2007), which is hereby incorporated by reference in its entirety, combined orthogonal wavelet decomposition with the watershed transform for multiscale image segmentation.
Edge, contour and region structure are other features that have been adopted in various approaches for effective segmentations. Tabb et al. “Multiscale Image Segmentation by Integrated Edge and Region Detection,” IEEE Transactions on Image Processing 6(5) (1997), which is hereby incorporated by reference in its entirety, instituted a multiscale approach where the concept of scale represented image structures at different resolutions rather than the image itself. The work involved performing a Gestalt analysis facilitating detection of edges and regions without any smoothing required at lower scales. On the other hand, Gui et al. “Multiscale Image Segmentation Using Active Contours,” which is hereby incorporated by reference in its entirety, obtained multiscale representations of the image using weighted TV flow and used active contours for segmentation. The contours at one level were given as input to the next higher level to refine the segmentation outcome at that level. Munoz et al. “Unsupervised active Regions For Multiresolution Image Segmentation,” IEEE International Conference On Pattern Recognition (2002), which is hereby incorporated by reference in its entirety, applied fusion of region and boundary information, where the latter was used for initializing a set of active regions which in turn would compete for pixels in the image in a manner that would eventually minimize a region-boundary based energy function. Sumengen et al. “Multi-Scale Edge Detection and Image Segmentation,” (2005), which is hereby incorporated by reference in its entirety, showed that multiscale approaches are very effective for edge detection and segmentation of natural images. Mean shift clustering followed by a minimum description length (MDL) criterion was used by Luo et al. 
“Unsupervised Multiscale Color Image Segmentation Based on MDL Principle,” IEEE Transactions on Image Processing 15(9):1454-1466 (2006), which is hereby incorporated by reference in its entirety, for the same purpose.
Fusion of color and texture information is a prominent methodology in multiresolution image understanding/analysis research. Deng et al. “Unsupervised Segmentation of Color-Texture Regions in Images and Video,” IEEE Transactions on Pattern Analysis and Machine Intelligence 23(8):800-810 (2001), which is hereby incorporated by reference in its entirety, proposed a method widely known as JSEG that performed color quantization and spatial segmentation in combination with a multiscale growth procedure for segmenting color-texture regions in images and video. Chen and Pappas (“Perceptually Tuned Multi-Scale Color Texture Segmentation,” in IEEE International Conference on Image Processing (2004), which is hereby incorporated by reference in its entirety) utilized spatially adaptive features pertaining to color and texture in a multiresolution structure to develop perceptually tuned segmentations, validated using photographic targets. Dominant color and homogeneous texture features (HTF) integrated with an adaptive region merging technique were employed by Wan et al. “Multi-Scale Color Texture Image Segmentation With Adaptive Region Merging,” IEEE International Conference on Acoustics Speech and Signal Processing (2007), which is hereby incorporated by reference in its entirety, to achieve multiscale color-texture segmentations.
The task of segmenting images in perceptually uniform color spaces is an ongoing area of research in image processing. Paschos et al. “Perceptually Uniform Color Spaces For Color Texture Analysis: An Empirical Evaluation,” IEEE Transactions on Image Processing 10(6):932-937 (2001), which is hereby incorporated by reference in its entirety, proposed an evaluation methodology for analyzing the performance of various color spaces for color texture analysis methods such as segmentation and classification. The work showed that uniform/approximately uniform color spaces such as L*a*b*, L*u*v* and HSV possess a performance advantage over RGB, a non-uniform color space traditionally used for color representation. These color spaces were found to be well suited to the calculation of color difference using the Euclidean distance, which is employed in many segmentation methods. Yoon et al. “Color Image Segmentation Considering the Human Sensitivity For Color Pattern Variations,” SPIE Proceedings 4572:269-278 (2001), which is hereby incorporated by reference in its entirety, utilized this principle to propose a Color Complexity Measure (CCM) for generalizing the k-means clustering method in the CIE L*a*b* space. Chen et al. “Contrast-Based Color Image Segmentation,” IEEE Signal Processing Letters, 11(7):641-644 (2004), which is hereby incorporated by reference in its entirety, employed color difference in the CIE L*a*b* space to propose directional color contrast segmentations. Contrast generation as a function of the minimum and maximum value of the Euclidean distance in the CIE L*a*b* space was seen in the work of Chang et al. “Color-Texture Segmentation of Medical Images Based on Local Contrast Information,” IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology pp. 488-493 (2007), which is hereby incorporated by reference in its entirety. 
This contrast map, subjected to noise removal and edge enhancement to generate an Improved Contrast Map (ICMap), was the proposed solution to the problem of over-segmentation in the JSEG method. More recently, Gao et al. “A Novel Multiresolution Color Image Segmentation Technique and its Application to Dermatoscopic Image Segmentation,” in IEEE International Conference on Image Processing (2000), which is hereby incorporated by reference in its entirety, introduced a ‘narrow-band’ scheme for multiresolution processing of images by utilizing the MRF expectation-maximization principle in the L*u*v* space. This technique was found to be especially effective for segmenting dermatoscopic images. Lefevre et al. “Multiresolution Color Image Segmentation Applied to Background Extraction in Outdoor Images,” IS&T European Conference on Color in Graphics, Image and Vision, Poitiers, France, pp. 363-367 (2002), which is hereby incorporated by reference in its entirety, performed multiresolution image segmentation in the HSV space, applied to the problem of background extraction in outdoor images.
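As a minimal illustration of the color-difference computation referred to above (the Euclidean distance between two colors in the CIE L*a*b* space), the following Python sketch may be helpful. The function name `delta_e_ab` is hypothetical and is not taken from any of the cited works; the sketch assumes both inputs have already been converted to L*a*b* coordinates.

```python
import math


def delta_e_ab(lab1, lab2):
    """Euclidean distance between two (L*, a*, b*) triples.

    Assumption: both triples are already expressed in CIE L*a*b*;
    no RGB-to-L*a*b* conversion is performed here.
    """
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


# Two colors with equal lightness that differ by 3 in a* and 4 in b*.
print(delta_e_ab((50.0, 0.0, 0.0), (50.0, 3.0, 4.0)))  # prints 5.0
```

Because the space is approximately perceptually uniform, equal values of this distance correspond to roughly equal perceived color differences, which is why it is commonly used as a homogeneity measure in segmentation.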
Color gradient-based segmentation is a recent methodology in the segmentation realm. Dynamic color gradient thresholding (DCGT) was first seen in the work by Balasubramanian et al. “Unsupervised Color Image Segmentation By Dynamic Color Gradient Thresholding” Proceedings of SPIE/IS&T: Electronic Imaging Symposium, San Jose, Calif. (2008), which is hereby incorporated by reference in its entirety. The DCGT technique was primarily used to guide the region growth procedure, laying emphasis on color-homogeneous and color transition regions without generating edges. However, this method suffered from over-segmentation due to the lack of a texture descriptor and proved to be computationally expensive. Garcia et al. “Automatic Color Image Segmentation By Dynamic Region Growth and Multimodal Merging of Color and Texture Information”, International Conference on Acoustics, Speech and Signal Processing, Las Vegas, Nev., (2008), which is hereby incorporated by reference in its entirety, proposed a segmentation method that was an enhanced version of the DCGT technique (referred to here as the Gradient Segmentation (GS) algorithm) by incorporating an entropic texture descriptor and a multiresolution merging procedure. The method brought significant improvement in segmentation quality and computational cost, but was not fast enough for practical real-time applications.
There remains a need for segmentation methods that efficiently facilitate: 1) selective access and manipulation of individual content in images based on desired level of detail, 2) handling of subsampled versions of the input images with reasonable robustness to scalability, and 3) a good compromise between quality and speed, laying the foundation for fast and intelligent object/region based real-world applications of color imagery.
This patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
a) is a diagram of multiresolution image representation;
a) is a gradient histogram of a parachute;
b) is a gradient histogram of a Cheetah;
c) is a gradient histogram of cars;
a) is a graph of a gradient histogram of RGB versus CIE L*a*b* of a parachute;
b) is a graph of a gradient histogram of RGB versus CIE L*a*b* of a Cheetah;
c) is a graph of a gradient histogram of RGB versus CIE L*a*b* of cars;
a) is an image of an RGB version of the image of the parachute;
b) is an image of an L*a*b* version of the image of the parachute;
c) is an image of an initial cluster generated at λ=5 of the image of the parachute;
d) is an image of an initial cluster generated at λ=10 of the image of the parachute;
a) is an L*a*b* (81×121) image of a car;
b) is an image of a corresponding color gradient of the ‘Cars’ image;
c) is an image of initial clusters generated at λ=5, 50*MSSL of the ‘Cars’ image;
d) is a logical seed map of the ‘Cars’ image;
e) is a logical seed map after dilation of the ‘Cars’ image;
f) is a gradient map of the ‘Cars’ image with padded seeds;
g) is an image of initial clusters at λ+5 (10), 25*MSSL of the ‘Cars’ image;
h) is an image of parent seeds of the ‘Cars’ image;
a) is a logical parent seeds map of ‘Cars’ image;
b) is an image of unassigned pixels of the ‘Cars’ image;
c) is an image of large unassigned regions of the ‘Cars’ image;
d) is a map of isolated and small contiguous pixel regions of the ‘Cars’ image;
e) is the map shown in
f) is an image of isolated and small seed borders of the ‘Cars’ image;
g) is an image of neighborhood labels of the ‘Cars’ image;
h) is an image of label assignment of the ‘Cars’ image;
i) is an image of seed saturation of the ‘Cars’ image;
a) is a map of parent seeds after seed saturation;
b) is an image of new seeds after threshold increment;
c) is an image of parent seed borders;
d) is a map of adjacent child seeds;
e) is a seed map after one interval of the region growth procedure;
f) is an image of seeds obtained during the first stage dynamic seed addition procedure;
g) is an image of parent seeds for the next region growth interval;
a) is an L*a*b* (161×241) image of a car;
b) is an image of a corresponding color gradient of the ‘Cars’ image;
c) is an image of interim segmentation of an (81×121) ‘Cars’ image;
d) is an image of high confidence seeds of an (161×241) ‘Cars’ image;
e) is a gradient map of the ‘Cars’ image with padded high confidence seeds;
b) is a first zoomed view of the gradient histogram shown in
c) is a second zoomed view of the gradient histogram shown in
a) is a graph classifying threshold intervals for distributed dynamic seed addition;
b) is a graph illustrating a zero crossing curve between red and green curves shown in
a) is an image of agglomeration of seeds obtained at decision boundary 1 (gradient value 22) of an image of a car;
b) is an overall seed map after an initial phase of seed addition;
c) is an image of an agglomeration of obtained seeds;
d) is an overall seed map prior to region growth;
a) is an L*a*b* (321×481) image of a car;
b) is an image of a corresponding color gradient of the ‘Cars’ image;
c) is an image of interim segmentation of an (161×241) ‘Cars’ image;
d) is an image of high confidence seeds of an (161×241) ‘Cars’ image;
e) is a gradient map of the ‘Cars’ image with padded high confidence seeds;
f) is an image of an agglomeration of seeds obtained at various thresholds lower than the decision gradient value;
a) is an MAPGSEG region growth map at Level 1;
b) is an MAPGSEG region growth map at Level 0;
c) is an image of neighborhood label assignment at Level 1;
d) is an image of neighborhood label assignment at Level 0;
e) is an image of iterative morphological label assignment at Level 1;
g) is an image of the region growth map using the GS method before residual pixel label assignment;
h) is an image of the region growth map using the GS method after residual pixel label assignment;
a) is an image of interim output at Level 2;
b) is an image of interim output at Level 1;
c) is an image of zero insertion yielding at Level 1;
d) is an image of zero insertion yielding at Level 0;
e) is an image of neighborhood label assignment at Level 1;
f) is an image of neighborhood label assignment at Level 0;
a) is a logical map of high confidence pixel locations corresponding to quantization levels 5 and 12 at a Level 1;
b) is a color map of high confidence pixel locations corresponding to quantization levels 5 and 12 at a Level 1;
c) is a zoomed in version of the circular region shown in
a) is a color map of high confidence pixel locations of an image of a car;
b) is an image of mutual seed border regions of high confidence pixel locations of the ‘Cars’ image;
c) is a color map of high confidence pixel locations of the ‘Cars’ image after mutual seed border regions removal;
d) is an image of large confident regions of the ‘Cars’ image;
e) is an image of large confident regions seed borders of the ‘Cars’ image;
f) is an image of mutual seed border region labels of the ‘Cars’ image;
g) is an image of high confidence mutual seed border regions of the ‘Cars’ image;
h) is an image of a-priori information after border refinement of the ‘Cars’ image;
a) is a multiresolution representation of a color converted ‘Star fish’ image;
b) is a multiresolution representation of a color gradient of the ‘Star fish’ image;
c) is a multiresolution representation of seeds maps at the end of progressive region growth of the ‘Star fish’ image;
d) is a multiresolution representation of entropy based texture maps of the ‘Star fish’ image;
e) is a multiresolution representation of interim and final segmentation outputs of the ‘Star fish’ image;
a) is an image of interim segmentation of the ‘Star fish’ image at Level 2;
b) is an image of interim segmentation of the ‘Star fish’ image upconverted to Level 1;
c) is an image of a-priori information of the ‘Star fish’ image at Level 1;
d) is an image of interim segmentation of the ‘Star fish’ image at Level 1;
e) is an image of interim segmentation of the ‘Star fish’ image upconverted to Level 0;
f) is an image of a-priori information of the ‘Star fish’ image at Level 0;
g) is an image of MAPGSEG final segmentation output of the ‘Star fish’ image;
a) is an original image of a ‘Church’;
b) is an image of Gibbs Random Field ‘Church’ results;
c) is an image of JSEG ‘Church’ results;
d) is an image of DCGT ‘Church’ results;
e) is an image of GS ‘Church’ results;
f) is an image of MAPGSEG ‘Church’ results;
a) is an original image of a ‘Parachute’;
b) is an image of GRF ‘Parachute’ results;
c) is an image of JSEG ‘Parachute’ results;
d) is an image of DCGT ‘Parachute’ results;
e) is an image of GS ‘Parachute’ results;
f) is an image of MAPGSEG ‘Parachute’ results;
a) is an original image of a ‘Cheetah’;
b) is an image of GRF ‘Cheetah’ results;
c) is an image of JSEG ‘Cheetah’ results;
d) is an image of DCGT ‘Cheetah’ results;
e) is an image of GS ‘Cheetah’ results;
f) is an image of MAPGSEG ‘Cheetah’ results;
a) is an original image of ‘Nature’;
b) is an image of GRF ‘Nature’ results;
c) is an image of JSEG ‘Nature’ results;
d) is an image of DCGT ‘Nature’ results;
e) is an image of GS ‘Nature’ results;
f) is an image of MAPGSEG ‘Nature’ results;
a) is an original image of ‘Cars’;
b) is an image of GRF ‘Cars’ results;
c) is an image of JSEG ‘Cars’ results;
d) is an image of DCGT ‘Cars’ results;
e) is an image of GS ‘Cars’ results;
f) is an image of MAPGSEG ‘Cars’ results;
a) is an original image of an ‘Island’;
b) is an image of DCGT ‘Island’ results;
c) is an image of GS ‘Island’ results;
d) is an image of Level 2 MAPGSEG ‘Island’ results;
e) is an image of Level 1 MAPGSEG ‘Island’ results;
f) is an image of Level 0 MAPGSEG ‘Island’ results;
a) is an original image of ‘Asians’;
b) is an image of DCGT of ‘Asians’ results;
c) is an image of GS of ‘Asians’ results;
d) is an image of Level 2 MAPGSEG ‘Asians’ results;
e) is an image of Level 1 MAPGSEG ‘Asians’ results;
f) is an image of Level 0 MAPGSEG ‘Asians’ results;
a) is an original image of a ‘Tree’;
b) is an image of GS ‘Tree’ results;
c) is an image of Level 3 ‘Tree’ results;
d) is an image of Level 2 ‘Tree’ results;
e) is an image of Level 1 ‘Tree’ results;
f) is an image of Level 0 ‘Tree’ results;
a) is an original image of a ‘Road’;
b) is an image of CSEG ‘Road’ results;
c) is an image of Level 3 ‘Road’ results;
d) is an image of Level 2 ‘Road’ results;
e) is an image of Level 1 ‘Road’ results;
f) is an image of Level 0 ‘Road’ results;
a) is a graph of a distribution of NPR scores for 300 images of the Berkeley database from the GRF method;
b) is a graph of a distribution of NPR scores for 300 images of the Berkeley database from the JSEG method;
c) is a graph of a distribution of NPR scores for 300 images of the Berkeley database from the DCGT method;
d) is a graph of a distribution of NPR scores for 300 images of the Berkeley database from the GS method;
e) is a graph of a distribution of NPR scores for 300 images of the Berkeley database from the MAPGSEG method;
f) is a graph of all method distributions superimposed;
a) is a graph of a computational time comparison utilizing the Berkeley database (321×481) from the MAPGSEG, GS and DCGT methods;
b) is a graph of a computational time comparison utilizing the Berkeley database (321×481) from various levels of the MAPGSEG method;
c) is a graph of a computational time comparison utilizing a large resolution image database (750×11200) from the MAPGSEG and GS methods; and
d) is a graph of a computational time comparison utilizing a large resolution image database (750×11200) from various levels of the MAPGSEG method.
An image processing computing device used for adaptive and progressive gradient-based multi-resolution color image segmentation includes a central processing unit (CPU) or processor, a memory, a user input device, a display, and an interface system, which are coupled together by a bus or other link, although the computing system can include other numbers and types of components, parts, devices, systems, and elements in other configurations, and other types and numbers of systems which perform other types and numbers of functions can be used.
The processor in the image processing computing device executes a program of stored instructions for one or more aspects of the present invention as described and illustrated herein, although the processor could execute other numbers and types of programmed instructions. The memory in the image processing computing device stores these programmed instructions for one or more aspects of the present invention as described and illustrated herein, although some or all of the programmed instructions could be stored and/or executed elsewhere. A variety of different types of memory storage devices, such as a random access memory (RAM) or a read only memory (ROM) in the system or a floppy disk, hard disk, CD ROM, or other computer readable medium which is read from and/or written to by a magnetic, optical, or other reading and/or writing system that is coupled to one or more processors, can be used for the memory.
The user input device in the image processing computing device is used to input information, such as image data, although the user input device could be used to input other types of data and interact with other elements. The user input device can include a computer keyboard and a computer mouse, although other types and numbers of user input devices can be used. The display in the image processing computing device is used to show images by way of example only, although the display can show other types of information. The display can include a computer display screen, such as a CRT or LCD screen, although other types and numbers of displays could be used.
The interface system in the image processing computing device is used to operatively couple and communicate between the computing system and other types of systems and devices, such as a server system over a communication network, although other types and numbers of communication networks or systems with other types and numbers of connections and configurations to other types and numbers of systems, devices, and components can be used. By way of example only, the communication network can use TCP/IP over Ethernet and industry-standard protocols, including SOAP, XML, LDAP, and SNMP, although other types and numbers of communication networks, such as a direct connection, a local area network, a wide area network, modems and phone lines, e-mail, and wireless communication technology, each having their own communications protocols, can be used.
Although an embodiment of the image processing computing device is described and illustrated herein, the image processing computing device can be implemented on any suitable computer system or computing device. It is to be understood that the devices and systems of the embodiments described herein are for exemplary purposes, as many variations of the specific hardware and software used to implement the embodiments are possible, as will be appreciated by those skilled in the relevant art(s).
Furthermore, the system of the embodiments described herein may be conveniently implemented using one or more general purpose computer systems, microprocessors, digital signal processors, and micro-controllers, programmed according to the teachings of the embodiments, as described and illustrated herein, and as will be appreciated by those of ordinary skill in the art.
In addition, two or more computing systems or devices can be substituted for the system in any of the embodiments. Accordingly, principles and advantages of distributed processing, such as redundancy and replication, also can be implemented, as desired, to increase the robustness and performance of the devices and systems of the embodiments. The embodiments may also be implemented on a computer system or systems that extend across any suitable network using any suitable interface mechanisms and communications technologies, including by way of example only telecommunications in any suitable form (e.g., voice and modem), wireless communications media, wireless communications networks, cellular communications networks, 3G communications networks, Public Switched Telephone Networks (PSTNs), Packet Data Networks (PDNs), the Internet, intranets, and combinations thereof.
The embodiments may also be embodied as a computer readable medium having instructions stored thereon for one or more aspects of the present invention as described and illustrated by way of the embodiments herein, which, when executed by a processor, cause the processor to carry out the steps necessary to implement the methods of the embodiments.
A fast unsupervised multiresolution color image segmentation algorithm which takes advantage of gradient information in an adaptive and progressive framework in accordance with disclosed embodiments is described below. This gradient-based segmentation method is initiated with a dyadic wavelet decomposition scheme of an arbitrary input image, accompanied by a vector gradient calculation of its color converted counterpart. The resultant gradient map is used to automatically and adaptively generate thresholds for segregating regions of varying gradient densities, at different resolution levels of the input image pyramid. At each level, the classification obtained by a progressively thresholded growth procedure is integrated with an entropy-based texture model utilizing a unique region merging procedure to obtain an interim segmentation. In combination with a confidence map and non-linear spatial filtering techniques, regions of high confidence are passed from one resolution level to another until the final segmentation at highest (original) resolution is achieved. Performance evaluation of results obtained in accordance with embodiments of the present invention on several hundred images utilizing the Normalized Probabilistic Rand index demonstrates that the present invention computationally outperforms published segmentation techniques while obtaining superior quality.
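By way of illustration only, the first step described above (building a dyadic image pyramid) can be sketched in Python as follows. The sketch is an assumption-laden simplification: it uses 2×2 block averaging, which equals the low-pass (approximation) band of a one-level Haar decomposition up to a constant scale factor, whereas the wavelet filter actually used is not specified in this passage. The function names `haar_lowpass` and `dyadic_pyramid` are hypothetical. Edge samples are replicated for odd dimensions so that a 321×481 input yields 161×241 and 81×121 levels, matching the image sizes listed in the figure descriptions. Level 0 is taken to be the original resolution.

```python
def haar_lowpass(channel):
    """Halve a 2-D list of numbers by averaging each 2x2 block.

    Assumption: block averaging stands in for the wavelet low-pass band.
    For odd heights or widths the last row/column is replicated.
    """
    height, width = len(channel), len(channel[0])
    reduced = []
    for r in range(0, height, 2):
        r1 = min(r + 1, height - 1)  # replicate the last row if needed
        row = []
        for c in range(0, width, 2):
            c1 = min(c + 1, width - 1)  # replicate the last column if needed
            total = (channel[r][c] + channel[r][c1]
                     + channel[r1][c] + channel[r1][c1])
            row.append(total / 4.0)
        reduced.append(row)
    return reduced


def dyadic_pyramid(channel, levels):
    """Return [Level 0 (original), Level 1, ..., Level `levels`]."""
    pyramid = [channel]
    for _ in range(levels):
        pyramid.append(haar_lowpass(pyramid[-1]))
    return pyramid


# Usage: a single-channel 321x481 image reduced over two levels.
image = [[0.0] * 481 for _ in range(321)]
for level, plane in enumerate(dyadic_pyramid(image, 2)):
    print(level, len(plane), len(plane[0]))
```

In a color setting the same reduction would be applied to each channel separately; segmentation then starts at the coarsest level and results are passed toward Level 0.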
An example of an unsupervised Multiresolution Adaptive and Progressive Gradient-based color image SEGmentation (MAPGSEG) method in accordance with the present invention is described herein, facilitating: 1) selective access and manipulation of individual content in images based on desired level of detail, 2) treatment of sub sampled versions of images with robustness to scalability, 3) a potential solution that computationally measures up to meet the demands of most practical applications involving segmentation, and 4) a practical compromise between quality and speed, laying the foundation for fast and intelligent object/region based real-world applications of color imagery.
An overview of a method for adaptive and progressive gradient-based multi-resolution color image segmentation in accordance with embodiments of the present invention is illustrated in
Image segmentation has wide spread medical, military and commercial interests. One particular embodiment of the method is designed from a commercial standpoint with an emphasis on performance. Several applications that can take advantage of the capabilities of this method are illustrated for exemplary purposes.
Rendering is often utilized in cameras and printers to acquire images with superior visual or print quality. This application is a tool that comes closest to transmuting reality to a photograph or printer output. A typical region/object oriented rendering method, designed for better print quality is shown in
The rendering procedure illustrated in
Content based image retrieval also known as Query By Image Content (QBIC) is defined as the process of sifting through large archives of digital images based on color, texture, orientation features, and other image content such as objects and shapes.
Technical concepts for the optimal implementation and understanding of embodiments of the method of the present invention are described herein. Firstly, a mathematical insight into the Wavelet Transform is provided, the foundation on which the wavelet theory has been established. Secondly, a brief discussion involving the extension of the wavelet transform for pyramidal image representations and its practical implementation using filter banks is provided, which is important from a multiresolution analysis standpoint. Thirdly, a brief description of the CIE L*a*b* color space and its characteristics that helped develop this method is given. Fourthly, a brief description of the Multivariate ANalysis Of Variance (MANOVA) data analysis statistical procedure has been provided. Finally, a segmentation evaluation metric known as the Normalized Probabilistic Rand Index, utilized to determine the performance of the present invention from a qualitative standpoint, has been discussed.
Wavelets are powerful tools capable of dividing data into various frequency bands describing, in general, the horizontal, vertical, and diagonal spatial frequency characteristics of the data. A detailed mathematical analysis of initial multiresolution image representation models and their relation to the Wavelet Transform (WT) can be seen in the work of Mallat, “A Theory for Multiresolution Signal Decomposition: The Wavelet Representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7):674-693 (1989), which is hereby incorporated by reference in its entirety. Let $L^2(\mathbb{R})$ denote the Hilbert space of square integrable 1-D functions f(x). The dilation of this function by a scaling component s can be represented as:
$f_s(x) = \sqrt{s}\,f(sx)$ (1).
The WT can be defined by decomposing a signal into a class of functions obtained by the translation and dilation of a function ψ(x). Here, ψ(x) is called a wavelet and the class of functions is defined, using (1), by $(\sqrt{s}\,\psi(s(x-u)))_{(s,u)\in\mathbb{R}^2}$. To this effect, the WT is defined as:
$Wf(s,u) = \int_{-\infty}^{+\infty} f(x)\,\sqrt{s}\,\psi(s(x-u))\,dx$ (2).
An inner product representation of Eq. (2) can be written as:
$Wf(s,u) = \langle f(x), \psi_s(x-u) \rangle$ (3).
To enable the reconstruction of f(x) from Wf(s, u), the Fourier transform $\hat{\psi}(\omega)$ of ψ(x) must comply with:
Eq. (4) signifies that $\hat{\psi}(0)=0$, and $\hat{\psi}(\omega)$ is small in the vicinity of ω=0. Therefore, ψ(x) can be construed as the impulse response of a Band Pass Filter (BPF). The WT can now be written as a convolution product given as:
$Wf(s,u) = f * \tilde{\psi}_s(u)$ (5)
where $\tilde{\psi}_s(x) = \psi_s(-x)$. Thus, a WT can be interpreted as a filtering of f(x) with a BPF whose impulse response is $\tilde{\psi}_s(x)$. Furthermore, from the aforementioned discussion it can be seen that the resolution of a WT varies with the scale parameter s.
Sampling s and u, and selecting a sequence of scales $(a^j)_{j\in\mathbb{Z}}$, can be utilized to discretize the WT. Thus Eq. (5) can be rewritten as
$Wf(a^j,u) = f * \tilde{\psi}_{a^j}(u)$ (6).
For the characterization of the decomposed signal in each channel, a uniform sampling rate proportional to $a^j$ must be used. Let the uniform sampling rate be $a^j/\beta$ at a scale $a^j$. The Discrete Wavelet Transform (DWT) can be defined as:
A signal f (x) at resolution r can be acquired by filtering f (x) with a Low Pass Filter (LPF) whose bandwidth is proportional to the desired uniform sampling rate r, of the filtered result (S. G. Mallat, “A Theory for Multiresolution Signal Decomposition The Wavelet Representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7):674-693 (1989), which is hereby incorporated by reference in its entirety). To negate the possibility of inconsistency with resolution variation these LPF's are obtained from a function θ(x) dilated by the resolution parameter r and can be represented in form identical to that of Eq. (1), given below:
$\theta_r(x) = \sqrt{r}\,\theta(rx)$ (8).
As with Eq. (7), the discrete approximation of a function f(x) on a dyadic array of resolutions $(2^j)_{j\in\mathbb{Z}}$ can be represented as:
$A_{2^j}f = ((f * \theta_{2^j})(2^{-j}n))_{n\in\mathbb{Z}}$ (9).
Eq. (9) represents an important category of the DWT known as orthogonal wavelets. Consequently, a wavelet orthonormal basis corresponds to the DWT for a=2 and β=1. Although an orthonormal basis can be constructed for scale sequences other than $(2^j)_{j\in\mathbb{Z}}$, in general dyadic scales are used because they result in simple decomposition methods. For pyramidal multiresolution image representations, θ(x) is chosen with a Fourier transform defined by:
where $U(e^{-i\omega})$ represents the transfer function of a discrete filter $U=(u_n)_{n\in\mathbb{Z}}$. Subsequently, the approximation of a function f(x) at a scale $2^j$ is obtained by filtering $A_{2^{j+1}}f$ with U and retaining every alternate sample in the resultant convolution, written as:
$\Lambda = A_{2^{j+1}}f * U = (\lambda_n)_{n\in\mathbb{Z}}$ (11)
$A_{2^j}f = (\lambda_{2n})_{n\in\mathbb{Z}}$ (12)
where A2
$D_{2^j}f = A_{2^{j+1}}f - A^{e}_{2^j}f$ (13)
where A2
Altogether, the previously mentioned discussion can be utilized to develop a multiresolution wavelet model. Earlier, Eq. (9) represented the estimate of f(x) at a scale of $2^j$; utilizing Eq. (3) and (5), this estimate can be re-written as:
$A_{2^j}f = (\langle f(x), \tilde{\theta}_{2^j}(x - 2^{-j}n) \rangle)_{n\in\mathbb{Z}}$
In addition, the best estimate of f(x) at a resolution 2j can be derived to be the orthogonal projection of the signal on the array of all possible estimates designated by a vector space V2
Here $H(e^{-i\omega})$ is the transfer function of a discrete filter. Furthermore, if:
$|H(e^{-i\omega})|^2 + |H(-e^{-i\omega})|^2 = 1$ (16)
then the discrete filters represented by $H=(h_n)_{n\in\mathbb{Z}}$ are called quadrature mirror filters. In addition, the orthogonal projection of f(x) on V2
represents the best estimate of f(x). Now A2
Utilizing Eq. (15), (18) in conjunction with Eq. (11) and (12) the discrete approximations A2
$O_{2^j} \oplus V_{2^j} = V_{2^{j+1}}$ (19).
The orthogonal projection of f(x) onto O2
$\hat{\psi}(2\omega) = G(e^{-i\omega})\,\hat{\phi}(\omega)$ and $G(e^{i\omega}) = e^{-i\omega}\,\cdots$
where $G(e^{-i\omega})$ is the transfer function of a discrete filter $G=(g_n)_{n\in\mathbb{Z}}$. From Eq. (17) and (18) we have:
Here D2
Consequently, from the aforementioned mathematical discussion of wavelet theory, it can be concluded that the notions of multiscale/multiresolution analysis and quadrature mirror filters are directly allied to a wavelet orthonormal basis. Without loss of generality, this theory can be extended to 2-D signals f(x, y). In 2-D, the orthonormal basis is acquired using three wavelets $\psi^1(x, y)$, $\psi^2(x, y)$, $\psi^3(x, y)$, where each of these can be considered to be the impulse response of a BPF with a certain orientation preference. Thus the approximation A2
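$A_{2^j}f = (f(x, y) * \tilde{\theta}_{2^j}\,\cdots)$
$D^{1}_{2^j}f = (f(x, y) * \psi^{1}_{2^j}\,\cdots)$
$D^{2}_{2^j}f = (f(x, y) * \psi^{2}_{2^j}\,\cdots)$
$D^{3}_{2^j}f = (f(x, y) * \psi^{3}_{2^j}\,\cdots)$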
Here D2
Practical implementation of multiscale image decomposition has been done effectively using filter banks. A filter bank is defined as an array of filters utilized to separate a signal into various sub bands, generally designed in a manner to facilitate reconstruction of the signal by simply combining the acquired sub bands. The decomposition and reconstruction procedures are better known as analysis and synthesis respectively.
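As an illustration of analysis and synthesis, the following minimal sketch implements a one-level, two-channel filter bank with the Haar filters. The Haar pair is chosen here only because it is the shortest quadrature mirror pair; the function names are hypothetical and this is not the filter set used by the described method.

```python
import numpy as np

def haar_analysis(signal):
    """Split an even-length 1-D signal into a low-pass and a high-pass sub band."""
    x = np.asarray(signal, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass filtering, every other sample kept
    detail = (even - odd) / np.sqrt(2.0)  # high-pass filtering, every other sample kept
    return approx, detail

def haar_synthesis(approx, detail):
    """Recombine the two sub bands into the original signal."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, detail = haar_analysis(signal)
reconstructed = haar_synthesis(approx, detail)
```

Applying `haar_analysis` again to `approx` yields the next coarser level, which is how a dyadic pyramid is built from repeated analysis steps.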
In 1976 the Commission Internationale de l'Éclairage (CIE) proposed two device independent, approximately uniform color spaces, L*a*b* and L*u*v*, for different industrial applications with the aim of modeling the human perception of color. One important objective these color spaces were able to achieve with reasonable consistency was that, given two colors, the magnitude of the difference between their numerical values was proportional to the difference perceived by the human eye (Green et al., Color Engineering, John Wiley and Sons Ltd. (2002), which is hereby incorporated by reference in its entirety). Experimental data was used to model the response of a person through tristimulus values X, Y and Z, which are linear transformations of R, G and B. Using these tristimulus values the CIE L*a*b* was defined as:
where Xn, Yn and Zn are the tristimulus values of a reference white.
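The conversion from tristimulus values to L*a*b* can be sketched as follows, using the standard CIE 1976 definition. The D65 reference white and the function name are assumptions made for illustration; the text does not state which reference white was used.

```python
import numpy as np

# Assumed reference white (D65, 2-degree observer); not specified in the text.
D65_WHITE = (95.047, 100.0, 108.883)

def xyz_to_lab(xyz, white=D65_WHITE):
    """Convert tristimulus values X, Y, Z to CIE 1976 L*, a*, b*."""
    ratios = np.asarray(xyz, dtype=float) / np.asarray(white, dtype=float)
    delta = 6.0 / 29.0
    # Cube root above the threshold, linear segment near black.
    f = np.where(ratios > delta ** 3,
                 np.cbrt(ratios),
                 ratios / (3.0 * delta ** 2) + 4.0 / 29.0)
    fx, fy, fz = f[..., 0], f[..., 1], f[..., 2]
    lightness = 116.0 * fy - 16.0
    a_star = 500.0 * (fx - fy)
    b_star = 200.0 * (fy - fz)
    return np.stack([lightness, a_star, b_star], axis=-1)

lab_white = xyz_to_lab(D65_WHITE)
lab_black = xyz_to_lab((0.0, 0.0, 0.0))
```

The reference white maps to L*=100 with a*=b*=0, and black maps to L*=0, which is a quick sanity check on any implementation.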
The Multivariate ANalysis of Variance abbreviated as MANOVA, is a popular statistical method employed in highlighting differences between groups of data (W. J. Krzanowski, Principles of Multivariate Analysis, Oxford University Press, chapter 11 (1988), which is hereby incorporated by reference in its entirety) cumulatively structured in the form of a matrix of dimensions n×p, in which n samples are divided into g groups, where each sample is associated with p variables x1, x2, . . . , xp. To this effect, the goal of the MANOVA procedure is to find the optimal single direction in the p-dimensional space, so as to conveniently view differences between various groups.
In the general case of p variables, any direction in the p-dimensional space can be designated by a set of coefficients (a1, a2, . . . , ap), which can be utilized to convert every p-variate sample to a univariate observation $y_i = a^T x_i$, where $a^T$=(a1, a2, . . . , ap). However, since the n data samples are divided into g groups, the obtained univariate observations are re-labeled as $y_{ij}$, denoting the y value for the jth sample in the ith group, where i=1 . . . g and j=1 . . . $n_i$. In order to establish whether the aforementioned univariate observations demonstrate differences between groups, the total sum-of-squares of $y_{ij}$ is partitioned into its Sum-of-Squares Between (SSB)-groups and Sum-of-Squares Within (SSW)-groups components, defined as:
Here,
and the notation (a) in Eq. (30) is utilized to underscore the fact that the SSB and SSW components vary with the choice of a. Utilizing these components a mean square ratio (F), to highlight group differences, is obtained as:
From Eq. (31) it can be observed that the larger the value of F, the more variability exists between groups than within groups. Consequently, the optimal choice of the coefficients aT=(a1, a2, . . . , ap) will be the one that yields the largest value for F.
However, to ascertain the optimal values of a, multivariate analogues of the between-groups and within-groups sum of squares components used in the univariate analysis of the variance are computed, and defined as:
where B0 known as the between-groups sum-of-squares and products matrix and W0 known as the within-groups sum-of-squares and products matrix, should be positive definite matrices. Furthermore, the notations xij,
where B=B0/(g−1) and W=W0/(n−g) are the between-groups and within-groups covariance matrices respectively.
The choice of the coefficients aT=(a1, a2, . . . , ap) which maximizes the value of F in Eq. (34) signifies the optimal single direction (or the best linear combination y=aTx) in the p-dimensional space, so as to highlight differences between various groups, and can be obtained by differentiating Eq. (34) with respect to a and assigning result to zero. To this effect, we have:
where $a^T B a / a^T W a = l$ is a constant, equal to the maximum value of the mean square ratio (F). Also, for Eq. (35) to be satisfied, it can be inferred that l must be an eigenvalue and a must be an eigenvector of $W^{-1}B$. Moreover, since l is the constant value at the maximum of F, a must be the eigenvector associated with the largest eigenvalue of $W^{-1}B$, which determines the optimal linear combination $y = a^T x$. Note that for a distinct separation of groups (greater variability between groups than within groups), l will be significantly greater than unity.
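The eigenvector computation described above can be sketched as follows. The function name and the toy data are hypothetical; the sketch forms B and W as defined in the text and returns the largest eigenvalue of $W^{-1}B$ with its eigenvector.

```python
import numpy as np

def manova_direction(samples, labels):
    """Return (l, a): the largest eigenvalue of W^-1 B and its eigenvector."""
    X = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, p = X.shape
    g = len(groups)
    grand_mean = X.mean(axis=0)
    B0 = np.zeros((p, p))  # between-groups sum-of-squares and products
    W0 = np.zeros((p, p))  # within-groups sum-of-squares and products
    for group in groups:
        Xg = X[labels == group]
        mean_g = Xg.mean(axis=0)
        diff = (mean_g - grand_mean)[:, None]
        B0 += len(Xg) * (diff @ diff.T)
        centered = Xg - mean_g
        W0 += centered.T @ centered
    B = B0 / (g - 1)
    W = W0 / (n - g)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    top = np.argmax(eigvals.real)
    return eigvals.real[top], eigvecs.real[:, top]

# Two groups that differ only along the first variable.
samples = [(0, 0), (1, 1), (0, 1), (1, 0), (5, 0), (6, 1), (5, 1), (6, 0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
l_max, a_opt = manova_direction(samples, labels)
```

For this toy data the returned direction lies along the first variable, the only one in which the two groups differ, and the eigenvalue is far greater than unity.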
When the number of groups (g) or the dimensionality of the original space (p) is large, determining a single direction in the p-dimensional space is an inefficient way to view disparities between various groups. However, since Eq. (35) often possesses more than one solution, multiple differentiating directions can be generated, whose efficiency in delineating groups of data depends on the magnitude of the eigenvalue/eigenvector pairs. To this effect, if Eq. (35) possesses s non-zero eigenvalues (l1, l2, . . . , ls) with corresponding eigenvectors (a1, a2, . . . , as), a set of new variates (y1, y2, . . . , ys) known as canonical variates can be obtained according to $y_i = a_i^T x$, and the space spanning all yi's is termed a canonical variate space. Following this, Eq. (35) can be re-written in matrix terms as BA=WAL, where A is the matrix of all ai's of dimensions (p×s), while L is the diagonal matrix of all li's of dimensions (s×s). Furthermore, in this space, the mean of an arbitrary ith group of individuals can be represented as
An appropriate measure to quantify the variability between two random groups (ith and jth) of data is the distance between the corresponding group means, given by:
$D = ((\bar{x}_i - \bar{x}_j)^T M (\bar{x}_i - \bar{x}_j))^{1/2}$ (36).
Here M is a matrix that modifies the influence of each variate in the aforementioned distance computation. Moreover, to exploit the covariances between variables as well as differential variances, M can be chosen to be the inverse of the within-groups dispersion matrix (W). The resultant distance measure for this choice of M yields the Mahalanobis distance, defined as:
$D = ((\bar{x}_i - \bar{x}_j)^T W^{-1} (\bar{x}_i - \bar{x}_j))^{1/2}$ (37).
The Euclidean distance between the ith and the jth groups in the canonical variate space after substitution for
$D = ((\bar{x}_i - \bar{x}_j)^T A A^T (\bar{x}_i - \bar{x}_j))^{1/2}$ (38).
Furthermore, it can be shown that $AA^T = W^{-1}$, resulting in Eq. (38) being equal to Eq. (37). Thus, by generating a canonical variate space in the manner described in this section, the Euclidean distance between the group means in this space is equivalent to the Mahalanobis distance in the original p-dimensional space. Moreover, since the Mahalanobis distance metric takes into consideration the covariance and differential variance between variables, this distance measure is utilized as a measure of variability between two multivariate populations.
To objectively measure the quality of our segmentation results, we have selected a recently proposed generic technique of evaluating segmentation correctness, referred to as the Normalized Probabilistic Rand (NPR) index (Unnikrishnan et al., “Toward Objective Evaluation of Image Segmentation Methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):929-944 (2007), which is hereby incorporated by reference in its entirety), designed such that: 1) it does not yield cases where the evaluation produces a high value in spite of the automatic segmentation result being nowhere closely similar to any one of its corresponding ground truths (nondegeneracy), 2) no assumptions are made about label assignments and region sizes, 3) it penalizes the evaluation score when the automatic segmentation fails to distinguish between regions that humans can distinctly identify and facilitates for lesser penalty in regions that are visually ambiguous (adaptive accommodation of label refinement), and 4) it facilitates comparison amongst multiple ground truths of the same image as well as of different images. The following subsections briefly discuss the mathematical preliminaries used for implementing the NPR evaluation methodology.
The Rand Index, first instituted by William Rand (Unnikrishnan et al., “Measures of Similarity,” IEEE Workshop on Applications of Computer Vision, pp. 394-400 (2005), which is hereby incorporated by reference in its entirety), facilitates the comparison of two arbitrary segmentations utilizing pair wise label relationships. It is defined as the ratio of number of pixel pairs that share the same label relationship in two segmentations, and is represented as:
Here S and S′ are two segmentations of an image comprising N pixels, with corresponding label assignments $\{l_i\}$ and $\{l'_i\}$, where i=1, 2, . . . , N. Furthermore, I is the identity function, ∧ represents a logical conjunction (‘AND’ operation), and the denominator represents all possible unique pixel pairs in a dataset of N points. The Rand Index varies from 0 to 1, where 0 represents complete dissimilarity and 1 symbolizes that S and S′ are identical. The Rand Index is limited in that it can handle only one ground truth segmentation for evaluation and cannot accommodate adaptive label refinement.
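The pairwise comparison underlying the Rand Index can be sketched as follows. The function name is hypothetical, and the dense N×N comparison is chosen for clarity only; it is practical for small label arrays, not for full-size images.

```python
import numpy as np

def rand_index(seg_a, seg_b):
    """Fraction of unordered pixel pairs with the same label relationship in both segmentations."""
    a = np.asarray(seg_a).ravel()
    b = np.asarray(seg_b).ravel()
    n = a.size
    same_a = a[:, None] == a[None, :]   # True where a pair shares a label in seg_a
    same_b = b[:, None] == b[None, :]   # True where a pair shares a label in seg_b
    i, j = np.triu_indices(n, k=1)      # every unordered pair (i < j) exactly once
    agree = same_a[i, j] == same_b[i, j]
    return agree.sum() / (n * (n - 1) / 2)

identical = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
different = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The first call compares two segmentations that differ only in label names, which illustrates that the index depends on pair relationships and not on the label values themselves.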
The Probabilistic Rand Index (Unnikrishnan et al., “Measures of Similarity,” IEEE Workshop on Applications of Computer Vision, pp. 394-400 (2005), which is hereby incorporated by reference in its entirety) enables evaluation of segmentation correctness, taking into consideration the statistical nature of the Rand Index. The PR index allows comparison of a test segmentation result ($S_{test}$) to a set of multiple ground truths ($S_1, S_2, \ldots, S_K$) through a soft non-uniform weighting of pixel pairs as a function of the variability in the ground-truth set.
The Probabilistic Rand (PR) Index is defined as:
where {liS
The PR Index takes the same range of values as the Rand Index, from 0 to 1 where 0 signifies complete dissimilarity and 1 represents a perfect match with ground truths. Although the PR Index helps overcome the aforementioned drawbacks of the Rand Index, it suffers from a deficiency of variation in its values over a large set of images due to its small effective range combined with the variation in its maximum value across images. Moreover the interpretation of the PR index across images is often ambiguous.
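A minimal sketch of the PR Index follows, assuming the commonly used form in which each pixel pair is weighted by the fraction of ground truths that place the two pixels in the same region. The function name is hypothetical.

```python
import numpy as np

def probabilistic_rand_index(test_seg, ground_truths):
    """PR index of a test segmentation against a list of ground-truth segmentations."""
    t = np.asarray(test_seg).ravel()
    i, j = np.triu_indices(t.size, k=1)
    c = (t[i] == t[j]).astype(float)   # 1 where the test result groups the pair together
    p = np.zeros_like(c)               # fraction of ground truths grouping the pair together
    for gt in ground_truths:
        g = np.asarray(gt).ravel()
        p += (g[i] == g[j])
    p /= len(ground_truths)
    return float(np.mean(c * p + (1.0 - c) * (1.0 - p)))

pr_single = probabilistic_rand_index([0, 0, 1, 1], [[0, 1, 0, 1]])
pr_mixed = probabilistic_rand_index([0, 0, 1, 1], [[0, 0, 1, 1], [0, 1, 0, 1]])
```

With a single ground truth the value reduces to the ordinary Rand Index, which is a convenient consistency check.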
In order to overcome the aforementioned shortcomings of the PR index, Unnikrishnan et al., “Toward Objective Evaluation of Image Segmentation Algorithms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):929-944 (2007), which is hereby incorporated by reference in its entirety, proposed the Normalized Probabilistic Rand (NPR) Index. The NPR metric is referenced to the expected value of the PR Index, and is computed utilizing the variation and randomness in the set of ground truth images, defined as:
The normalization with respect to the expected value of the PR Index results in a much wider range of values, making the NPR Index a much more robust evaluation metric. In Eq. 25 the maximum value of the PR Index is chosen to be 1 (max[PR]=1), and the expected value of the PR Index (E[PR]) is obtained as:
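The normalization step can be sketched as below, assuming the usual form NPR = (PR − E[PR]) / (max[PR] − E[PR]) with max[PR]=1 as stated above. The function name is hypothetical, and the expected value is taken as an input because its estimation over a database is a separate step.

```python
def normalized_pr(pr_value, expected_pr, max_pr=1.0):
    """Reference a PR score to its expected value; 0 means no better than expected."""
    return (pr_value - expected_pr) / (max_pr - expected_pr)

npr_perfect = normalized_pr(1.0, 0.6)
npr_baseline = normalized_pr(0.6, 0.6)
npr_below = normalized_pr(0.4, 0.6)
```

Scores below the expected value become negative, which is what widens the effective range relative to the PR Index.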
To make the computation of meaningful Unnikrishnan et al. proposed its computation from segmentations of all images from an arbitrary database, for all unordered pixel pairs (i, j). Therefore, if Φ is the number of images in the database and KΦ is the number of ground truths per image then E[I(liS
where E[I(liS
The MAPGSEG method embodied in six modules is shown in
The segmentation method developed by Garcia et al., “Automatic Color Image Segmentation By Dynamic Region Growth and Multimodal Merging of Color and Texture Information”, International Conference on Acoustics, Speech and Signal Processing, Las Vegas, Nev., (2008), which is hereby incorporated by reference in its entirety, and abbreviated as the Gradient Segmentation (GS) method, utilized fixed thresholds for segmentation in the RGB color space. Initial clustering was performed using a threshold value of 10, followed by a region growth procedure carried out at threshold intervals of 15, 20, 30, 50, 85, and 120. These fixed thresholds were utilized for any image irrespective of its content and can intuitively be deemed non-ideal, owing to the varied gradient composition present in natural images. This intuitive notion was substantiated, as the fixed threshold intervals were found to consistently pose major problems that hindered the performance of the GS method, clearly demonstrated by the images in
In
In
Moreover, in
The MAPGSEG method employs adaptive gradient thresholding in the CIE 1976 L*a*b* color space. The method begins with a conversion from RGB to CIE L*a*b* for correct color differentiation, owing to the fact that the latter is better modeled for human perception and is more uniform in comparison to the RGB space. It should be noted that although the CIE L*a*b* has been used in the present invention, other color space models such as the CIE L*u*v*, CIE U*V*W*, YIQ and the like can be used. In general, any 3-dimensional color space/model can be used for obtaining image segmentation in accordance with the present invention. The L*a*b* data is 8-bit encoded to values ranging from 0-255 for convenient color interpretation and to overcome viewing and display limitations. In addition, this encoding has also been widely used for commercial applications. The resultant color converted data is utilized for computing the vector color gradient utilizing the previously mentioned method described in Lee et al., “Detecting Boundaries in a Vector Field,” IEEE Transactions on Signal Processing 39(5):1181-1194 (1991), which is hereby incorporated by reference in its entirety, without any enhancement methodology. In general, for an image, 8-bit L*a*b* values were found to span a much smaller range than 8-bit RGB, consequently resulting in a relatively compact histogram compared to its enhanced RGB counterpart.
In
The MAPGSEG method is initiated with a color space conversion of the input image from RGB to CIE L*a*b* for reasons specified previously. Using the resultant L*a*b*data, the magnitude of the gradient of the full resolution color image field is calculated. The threshold values required for segmentation are determined utilizing the histogram of the color converted gradient map.
At first, the objective is to select a threshold for the initiation of the seed generation process. Preferably, a threshold value should be selected to expose most edges while ignoring the noise present in images. However, accomplishing this task is precluded by the unique disposition of natural scene images, where a threshold that correctly demarcates the periphery of a given region may unify other regions. Due to this factor, the thresholding method is initiated by estimating a value λ that aids in selecting the regions without any edges or with extremely weak and imperceptible edges. This threshold primarily is estimated based on the span of the histogram in combination with empirical data.
Given an image, one of two empirically determined threshold values is proposed to be chosen for initiating the seed generation process, by validating how far apart the low and high gradient content in the image are, in its corresponding histogram. The idea is that a high initial threshold be used for images in which a large percentage of gradient values spread over a narrow range and a low initial threshold value be used for images in which a large percentage of gradient values spread over a wide range, in comparison to the span of the histogram. These high and low initial threshold values were determined by obtaining the mean L*a*b* gradient histogram of 300 natural scene images present in the publicly available Berkeley database, shown in
As illustrated in
From a practical implementation standpoint, a decision of selecting the initial threshold (λ=5 or λ=10) for any arbitrary image by obtaining the percentage ratio of the gradient values corresponding to 80% and 100% area under its histogram curve was made, as shown in
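The selection between the two initial thresholds can be sketched as follows. The text gives the two candidate values (λ=5 and λ=10) and the quantities being compared, but the decision cutoff is not recoverable from it, so `cutoff` below is a purely hypothetical placeholder, as is the function name.

```python
import numpy as np

def initial_threshold(gradient_map, cutoff=0.1, low=5, high=10):
    """Pick the seed-generation threshold from the spread of the gradient histogram.

    `cutoff` is an assumed placeholder; the text does not state the value used.
    """
    g = np.asarray(gradient_map, dtype=float).ravel()
    g80 = np.percentile(g, 80)   # gradient value covering 80% of the histogram area
    g100 = g.max()               # gradient value covering 100% of the histogram area
    ratio = g80 / g100 if g100 > 0 else 0.0
    # A small ratio means most gradient values occupy a narrow part of the span,
    # in which case the higher initial threshold is used.
    return high if ratio <= cutoff else low

narrow_spread = np.concatenate([np.full(95, 1.0), np.full(5, 200.0)])
wide_spread = np.arange(1, 101, dtype=float)
lam_narrow = initial_threshold(narrow_spread)
lam_wide = initial_threshold(wide_spread)
```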
Dynamic seed generation is that portion of the growth process where additional seeds are added to the initial seeds at the lowest resolution or to existing high confidence seeds at subsequent higher resolutions. The threshold limits constituting the various intervals selected for region growth are determined utilizing the area under the gradient histogram that does not fall within the range of the initial thresholds. The first threshold ($T_i$, i=1) is determined by adding 10% of this uncovered area to the area covered by $T_{i-1}=\lambda+5$ and obtaining the corresponding gradient value. This process is continued for each new stage of the dynamic seed addition procedure, where a 10% increment of the uncovered area at each stage is added to the area already covered by the threshold value of its corresponding previous stage ($A_{i-1}$) as illustrated in
In the general sense a threshold value (Ti) that drives seed addition at the end of an ith stage of region growth is determined by:
The first summation in Eq. (45) represents the cumulative image area less than the gradient threshold $T_{i-1}$ that is processed in the (i−1)th stage of region growth, while the second summation represents the cumulative unprocessed image area greater than $T_{i-1}$. The quantity iΔg, defined as the ‘growth factor’, determines the incremental percentage of image area of higher gradient densities to be processed in the ith stage. The entire quantity beyond the ‘+’ sign is known as a Region Growth Interval (RGI), which represents the range of gradient values from $T_{i-1}$ to $T_i$ (lower and upper limits of the ith RGI). In this manner, utilizing Eq. (45) and an initialization threshold (λ), segmentation thresholds $T_1$ to $T_i$ differentiating the RGIs are computed, which are utilized for the previously mentioned functionalities.
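The threshold generation described above can be sketched with a cumulative histogram. This is one reading of the procedure, under stated assumptions: each stage adds a fixed fraction `delta_g` (10%) of the area still uncovered after the previous threshold, gradients are integers in 0 to 255, and the number of stages is a free parameter. The function and parameter names are hypothetical.

```python
import numpy as np

def region_growth_thresholds(gradient_map, lam, stages=6, delta_g=0.10, max_gradient=255):
    """Generate adaptive thresholds T_1..T_stages starting from T_0 = lam + 5."""
    g = np.clip(np.rint(np.asarray(gradient_map).ravel()), 0, max_gradient).astype(int)
    hist = np.bincount(g, minlength=max_gradient + 1)
    cumulative = np.cumsum(hist)           # cumulative[t] = number of pixels with gradient <= t
    total = cumulative[-1]
    thresholds = []
    previous = lam + 5                     # T_0
    for _ in range(stages):
        covered = cumulative[previous]     # area already processed
        uncovered = total - covered        # area with gradient above the previous threshold
        target = covered + delta_g * uncovered
        # Smallest gradient value whose cumulative area reaches the target.
        current = int(np.searchsorted(cumulative, target, side="left"))
        current = min(current, max_gradient)
        thresholds.append(current)
        previous = current
    return thresholds

uniform_gradients = np.arange(256)         # one pixel per gradient value
thresholds = region_growth_thresholds(uniform_gradients, lam=5)
```

On images with little gradient detail, consecutive thresholds produced this way can coincide, so a caller would need to skip intervals whose two limits are equal.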
The effect of utilizing the aforementioned threshold generation procedure is clearly illustrated in
The MAPGSEG method employs a dyadic wavelet decomposition scheme for multiscale image representation, as described above. In order to ensure suitable approximations of the 2-D input signal, the Daubechies (9, 7) biorthogonal analysis coefficients, similar to the ones used in the JPEG2000 compression scheme, were employed at different resolution levels. Other coefficients associated with the Haar, Daubechies 4, Daubechies 6, Daubechies 8, Coiflet wavelets and the like can be used in the present invention. These levels are designated as L=0, 1, 2 . . . k for a k-level decomposition. Since segmentation can be used in multiple applications, it is preferred to make the number of decomposition levels dynamic for an arbitrary image. To achieve this objective, a user or application defined variable called ‘Desired dimension’ is introduced. Desired dimension (D) is defined as the smallest workable dimension desired by a user or constrained by an application. Often applications are restricted by the smallest size of an image that they can handle. Since a dyadic (power of 2) wavelet decomposition scheme is employed in the present invention, preferred values of D are powers of 2 such as 64, 128, 256, 512, etc. Although any value of D can be picked by a user, it should be noted that D should be chosen to be less than the image dimension but preferably greater than about 20% of the image dimension to achieve optimal results. The MAPGSEG method is designed such that it gives the application or user the option to set the smallest workable dimension for segmentation. Once D is initialized based on the resolution of the input image, the method automatically determines the number of dyadic decomposition levels that will result in the input image resolution being in the vicinity of D×D, since D×D may or may not be a dyadic scale of the original input.
In the case of images of the form m by n where m≠n (rectangular images), the number of decomposition levels is found by working with the maximum of m and n and finding the number of levels that will take this maximum value into the vicinity of D (see
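The level selection can be sketched as follows. Interpreting "in the vicinity of D" as rounding the base-2 logarithm of the size ratio is an assumption, and the function name is hypothetical.

```python
import math

def decomposition_levels(height, width, desired_dimension):
    """Number of dyadic levels that bring max(height, width) close to the desired dimension."""
    largest = max(height, width)
    if desired_dimension >= largest:
        return 0  # the image is already at or below the desired dimension
    # Each level halves the dimension, so the level count is a rounded log2 ratio.
    return int(round(math.log2(largest / desired_dimension)))

levels_rect = decomposition_levels(481, 321, 128)
levels_square = decomposition_levels(512, 512, 128)
levels_small = decomposition_levels(300, 200, 512)
```

For a 481 by 321 image and D=128, two levels reduce the larger side to roughly 120 pixels, which is near but not exactly D, consistent with D×D not necessarily being a dyadic scale of the input.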
Having obtained the number of decomposition levels (L) based on desired dimension D the input image (L=0) is decomposed to the smallest resolution (L=k). In doing so all the channel (L*, a*, b*) information acquired from the LL sub band and corresponding size information pertaining to the intermediate levels (L=k−1, k−2, . . . , 1) are stored. To this effect the decomposition scheme is performed only once without having to be repeated for every level. In
In accordance with the present invention a multi-resolution region growing methodology does not depend exclusively on the initial assignment of clusters, to arrive at the final segmentation. In the GS algorithm the region growth and seed addition process were interlaced with each other, where every growth cycle corresponded with a seed addition stage. However in the MAPGSEG method, a progressively thresholded growth procedure was preferred where region growth cycles do not have an exclusive one-to-one relationship with the seed addition procedure. The multiresolution region growing procedure involving distributed dynamic seed addition in accordance with embodiments of the present invention and its performance advantages in a multiscale framework is discussed below. A flow chart of the entire module is shown in
The initial positioning of seeds utilizing λ (either 5 or 10 for a particular image) and λ+5 is done only at the lowest resolution of the image pyramid, as shown in
$MSS_L = 2^L\,\alpha\,\{dM_L\,dN_L\}$ (46)
where α is a small percentage (typically 0.01%, with a preferred range of 0.01% to 0.1% for optimal results) of the Lth scale image area. The parameter α is preferably chosen so as to capture all details in the image. The aforementioned ranges were determined based on this requirement. In addition, it can be observed that as the value of α is increased, the detail in the eventual segmentation decreases, and vice versa. Hence, for the initial clustering phase performed only at the lowest resolution L=kth level (previously mentioned), a Minimum Seed Size criterion proportional to $MSS_k = 2^k\,\alpha\,\{dM_k\,dN_k\}$ is employed.
For the initial clustering phase these size criteria are obtained as 50*MSSL and 25*MSSL for λ and λ+5 respectively, as illustrated in
The parent seeds map, prior to region growth, is subjected to a seed saturation process where all isolated and small unassigned pixel regions encompassed within seed boundaries, are assigned the labels of corresponding parent seeds. However, contiguous unassigned pixel locations larger than the current size criterion (25*MSSL) are left unassigned as these are potential locations for new seeds during region growth. The seed saturation procedure for the parent seeds map shown in
where (i,j) is an arbitrary pixel coordinate and $PS_s \oplus \psi$ represents the dilation of the $PS_s$ map with a 3×3 square structuring element ψ that possesses a value of ‘1’ at every location. The dilated version of
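The dilation with a 3×3 all-ones structuring element can be sketched as below. Using the dilated map to find child seeds that touch a parent seed is an assumed use made for illustration; the function names and the label-map layout (0 for unassigned pixels) are hypothetical.

```python
import numpy as np

def dilate_3x3(mask):
    """Binary dilation with a 3x3 structuring element of ones."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dr in range(3):
        for dc in range(3):
            out |= padded[dr:dr + rows, dc:dc + cols]  # OR over the 3x3 neighbourhood
    return out

def adjacent_child_labels(parent_seed_mask, child_label_map):
    """Labels of child seeds lying in the one-pixel ring around the parent seeds."""
    parent = np.asarray(parent_seed_mask, dtype=bool)
    children = np.asarray(child_label_map)
    ring = dilate_3x3(parent) & ~parent
    return [int(v) for v in np.unique(children[ring]) if v != 0]

parent = np.zeros((5, 6), dtype=bool)
parent[1:3, 1:3] = True
children = np.zeros((5, 6), dtype=int)
children[1:3, 3] = 1   # touches the parent seed
children[4, 5] = 2     # isolated from the parent seed
adjacent = adjacent_child_labels(parent, children)
```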
The adaptive gradient thresholding method discussed above generates dissimilar values of growth intervals for most natural scene images. However, in the case of images with less gradient detail or foreground content, a situation may arise where identical thresholds are generated, causing the region growth and seed addition procedure to be inefficient. To overcome this problem, at the very beginning of the region growth procedure, all the thresholds demarcating the growth intervals are checked for similarity with one another. The ‘check’ is designed such that the growth procedure is performed only if the two thresholds constituting the current interval are different from each other; otherwise it is forcibly exited and the processing of the next interval begins. This adds an additional dimension to the MAPGSEG method, as not only are the thresholds generated adaptively but their number may also vary from image to image. Such a situation can be viewed by the image in
The ‘Bird’ image shown in
Once the updated parent seeds map after seed saturation is obtained (
Having obtained the adjacent child seeds (
The combination of the CIE L*a*b* color space and the Euclidean distance metric was employed because: 1) it assures that the comparison of colors is similar to the differentiation of colors by the human eye, and 2) the increased complexity of a different distance metric, for example, the Mahalanobis distance, does not improve the results, due to the small variance of the regions being compared, owing to their spatial proximity. The GS method, on the other hand, employed the Euclidean distance metric in the RGB color space, which is non-uniform in nature. Thus the Euclidean distance measure in a non-uniform color space, employed earlier, was not a true indication of similarity between regions, resulting in the GS algorithm yielding many oversegmented results. The use of CIE L*a*b*, which is more uniform in comparison to RGB, helped reduce the oversegmentation problem to a great extent. The maximum color distance to allow the integration of a child seed into its parent was empirically chosen, preferably to be 60 in the MAPGSEG method. This value can be varied within a preferred range of 40 to 60. It should be noted, however, that lowering the value of the color distance implies a more stringent parent-child similarity criterion, and vice versa. The aforementioned value of 60 was chosen based on the performance of this embodiment of the present invention on 300 images. However, the preferred range may vary slightly when based upon a larger number of images.
The dynamic seed addition portion of the region growth procedure is responsible for the detection of new areas with higher gradient densities, where each stage corresponds to a different threshold validated by performing a similarity check for the thresholds generated at the very beginning of the growth procedure. The seeds added by the dynamic seed addition process may include adjacent and non-adjacent seeds, and are obtained at varying size criteria (10*MSSL, 5*MSSL, and a criterion equivalent to MSSL for all remaining seed addition thresholds) in a manner similar to initial clustering (shown in
In the case of natural scene images, where gradient content can vary dramatically, accomplishing region growth with the aforementioned iterative procedure over the entire gamut of gradient values present in these images can be computationally intensive. For this reason, the totality of the growth procedure in the proposed algorithm is restricted to a finite number of RGIs, which may span only a portion of the total gamut of gradient values in an image but are sufficient to segment a large portion of it. This limit on the number of RGIs was chosen based on the average percentage of total segmented image area at different resolution levels, determined utilizing 300 natural scene images provided by the University of California at Berkeley. However, the preferred range may vary slightly when based upon a larger number of images. We found that with a growth factor (iΔg, defined in Section 2.3) varying from 10% to 50%, obtained utilizing 1≤i≤5 and Δg=10%, on average more than 85% of the area of an image is segmented, and hence we constrain the growth phase to preferably a maximum of five RGIs (N=5). Furthermore, using the preferred range of values for N from 3 to 7, with Δg varying from 5% to 30%, it was found that on average more than 75% and up to 95% of the image can be segmented in the region growth procedure, with varying computational requirements. This constraint on the number of RGIs results in a small portion of the image, largely comprising regions of color transitions in the periphery of existing seeds, being left unsegmented at the conclusion of the growth procedure (as mentioned earlier). These unsegmented regions are assigned labels in a procedure known as residual region growth, which involves local neighborhood-based mode filtering and morphological dilation operations.
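The parent-child similarity test described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each seed is summarized by the mean of its L*a*b* pixel values and that a distance exactly equal to the threshold still permits merging. The function names are hypothetical.

```python
import math

# Preferred value quoted in the text (preferred range 40 to 60).
MAX_COLOR_DISTANCE = 60.0

def mean_lab(pixels):
    """Mean (L*, a*, b*) of a non-empty list of (L, a, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def lab_distance(lab1, lab2):
    """Euclidean distance between two L*a*b* triples."""
    return math.dist(lab1, lab2)

def should_merge(child_pixels, parent_pixels, max_distance=MAX_COLOR_DISTANCE):
    """Assumed rule: merge when the mean colors are within max_distance."""
    distance = lab_distance(mean_lab(child_pixels), mean_lab(parent_pixels))
    return distance <= max_distance

parent = [(50.0, 10.0, 10.0), (52.0, 12.0, 8.0)]
similar_child = [(60.0, 20.0, 20.0)]
different_child = [(50.0, -60.0, 60.0)]
print(should_merge(similar_child, parent), should_merge(different_child, parent))
```

Lowering `max_distance` toward 40 makes the test stricter, in line with the observation above that a smaller color distance implies a more stringent similarity criterion.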
Mode filtering is a technique in which un-labeled pixel locations (i,j) surrounded by existing seeds in their respective local 3×3 neighborhood (β), are assigned the most frequently occurring label from amongst the non-zero elements of that neighborhood (βnz), using a non-linear spatial mode filter (mf) defined as:
In locations where the mode of βnz is not unique (multimodal), a random label assignment φ from the acquired multiple mode values is performed, as represented in Eq. (35). At this stage, the pixels that remain unsegmented are the ones whose corresponding local neighborhoods do not contain any of the existing seeds. For these pixels, an iterative morphological label assignment is employed, wherein all existing seeds are repeatedly dilated using a 3×3 structuring element ψ (defined previously) until no unassigned pixels remain, to yield the final region growth map. The aforementioned constraint on the number of RGIs was chosen with the fundamental objective of reducing the computational expense incurred in the present invention. Thus, it should be noted that the region growth procedure can be employed to segment the entire image (100%) without the need for the residual region growth process. However, segmenting the entire image using only the region growth process was found to be computationally intensive, with small variations in the final results, and thus the residual region growth procedure was introduced.
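The two residual region growth steps can be sketched as follows. This is an illustrative reading under stated assumptions: label 0 marks unassigned pixels, the mode filter is applied once and reads only the input map, and where two dilated seeds meet the larger label wins (ordinary grey-scale dilation). The text does not specify the last point, and the function names are hypothetical.

```python
import random
from collections import Counter

import numpy as np

def mode_filter(labels, rng=None):
    """Give each unassigned pixel the most frequent non-zero label in its 3x3 window."""
    rng = rng or random.Random(0)
    padded = np.pad(labels, 1, mode="constant", constant_values=0)
    out = labels.copy()
    for i, j in zip(*np.nonzero(labels == 0)):
        window = padded[i:i + 3, j:j + 3]
        nonzero = window[window > 0].tolist()
        if not nonzero:
            continue  # no seed in the neighborhood; left for the dilation step
        counts = Counter(nonzero)
        best = max(counts.values())
        modes = sorted(label for label, c in counts.items() if c == best)
        out[i, j] = rng.choice(modes)  # random pick when the mode is not unique
    return out

def dilate_unassigned(labels):
    """One 3x3 dilation pass that only writes into unassigned pixels."""
    h, w = labels.shape
    padded = np.pad(labels, 1, mode="constant", constant_values=0)
    neighborhood_max = np.zeros_like(labels)
    for di in range(3):
        for dj in range(3):
            neighborhood_max = np.maximum(neighborhood_max, padded[di:di + h, dj:dj + w])
    return np.where(labels == 0, neighborhood_max, labels)

def residual_region_growth(labels, rng=None):
    """Mode filtering followed by repeated dilation until every pixel is labeled."""
    labels = mode_filter(np.asarray(labels), rng)
    if not labels.any():
        return labels  # no seeds at all, nothing to grow
    while (labels == 0).any():
        labels = dilate_unassigned(labels)
    return labels

seeds = np.array([[1, 1, 0, 0, 0],
                  [1, 1, 0, 0, 0],
                  [0, 0, 0, 0, 0],
                  [0, 0, 0, 2, 2],
                  [0, 0, 0, 2, 2]])
grown = residual_region_growth(seeds)
print(grown)
```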
Progressive Region Growing utilizing Distributed Dynamic Seed Addition (DDSA)
In the region growth process discussed so far, there exists an exclusive one-to-one relationship with the seed addition procedure, which is the methodology adopted by the MAPGSEG method only at the smallest resolution in the image pyramid (see red arrows in
The significance of the DDSA can be intuitively derived from the images in
It can be observed that the a-priori information at the CDS, shown in
Practically, this objective is achieved by a histogram analysis of gradient information of the CDS image (
In
On the other hand, due to the diverse nature of natural images this consideration can yield contrasting results, illustrated by
The zero crossing point between the histogram curve of the segmented (red) and unsegmented pixels (green), was chosen to be a suitable threshold to distinguish among intervals which can be used for seed addition with and without region growing. To ensure that the correct decision threshold is being used the consistency in zero-crossing was also checked, as can be seen in
The performance advantage of the DDSA can be seen in
Clear advantages of the controlled threshold section for progressive region growing can be seen by observing the images presented in
Table 2 summarizes the functionality of the adaptively generated thresholds at various scales of the ‘Cars’ image. The progressive nature of the region growth procedure can be clearly observed in Table 2, where sequential growth takes place at level 0, employing all growth intervals. At level 1 the growth procedure shifts to the higher gradient content, and finally at the highest resolution it is not employed at all, because of the absence of any significant unsegmented pixels that would take full advantage of region growing. However, for most images the MAPGSEG operates in a threshold range that covers regions of significant area in comparison to the image resolution, thus leaving all strong gradient regions unsegmented, as shown
In
The seed transfer module can be deemed as an interface for information transfer from one resolution to another in the MAPGSEG method. This module (M4 as seen in
This module is initiated by a seed map up-conversion of the kth level segmentation to the resolution of the (k−1)th level. This step is necessary to ensure that the data transferred is fully consistent with the next higher resolution. To this effect, we first upsample the interim segmentation at the kth level by a factor of two along each dimension, thus converting it to the subsequent higher dyadic resolution. The up-conversion process is a two-step method comprising zero insertion followed by neighborhood pixel assignment. Zero insertion involves inserting zeros between every pixel along both dimensions such that an M×N scale is converted to a (2*M)×(2*N) scale. In
In addition this filter is applied only to the neighborhoods whose center pixel is zero, which from the aforementioned discussion is M×N numbered in a (2*M)×(2*N) scale image. The result of the aforementioned non-linear spatial filtering operation on the images presented in
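The two-step up-conversion can be sketched as follows. Zero insertion follows the description above. The neighborhood assignment filter is not reproduced in this text, so the sketch assumes that each zero-centered 3×3 neighborhood takes its most frequent non-zero label, with ties broken by the smallest label. Both choices are assumptions, and the function names are hypothetical.

```python
from collections import Counter

import numpy as np

def zero_insert(seg):
    """Place the M x N labels on the even grid of a (2M) x (2N) map of zeros."""
    seg = np.asarray(seg)
    m, n = seg.shape
    up = np.zeros((2 * m, 2 * n), dtype=seg.dtype)
    up[::2, ::2] = seg
    return up

def fill_zero_centers(up):
    """Assumed filter: a zero pixel takes the most frequent non-zero label nearby."""
    padded = np.pad(up, 1, mode="constant", constant_values=0)
    out = up.copy()
    for i, j in zip(*np.nonzero(up == 0)):  # only zero-centered neighborhoods
        window = padded[i:i + 3, j:j + 3]
        nonzero = window[window > 0].tolist()
        if nonzero:
            counts = Counter(nonzero)
            best = max(counts.values())
            out[i, j] = min(label for label, c in counts.items() if c == best)
    return out

def up_convert(seg):
    """Zero insertion followed by neighborhood pixel assignment."""
    return fill_zero_centers(zero_insert(seg))

coarse = np.array([[1, 2],
                   [3, 3]])
fine = up_convert(coarse)
print(fine)
```

The original labels survive unchanged on the even grid (`fine[::2, ::2]`), so the filter only ever writes into the inserted zeros.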
Gradient quantization is required to determine the pixels in the estimated seed map that are acceptable with high confidence and can be passed on as a-priori information for an arbitrary decomposition level. In general, when an image is decomposed to a certain number of levels, flat regions can be segmented with relative ease even at the lowest scale, in comparison to strong gradient regions. This is because flat regions have not undergone much change in gradient content; only their size has decreased. In the case of strong gradient regions, however, decomposition results in a loss of information content, so these regions cannot be segmented with the same ease as on the full resolution image. The MAPGSEG method is designed to exploit this gradient characteristic to facilitate seed transfer. Thus, the gradient map is quantized at every dyadic scale to differentiate between high and low confidence pixels at that scale. The gradient quantization levels are chosen to be the adaptively generated threshold intervals obtained at the commencement of the MAPGSEG method. The quantized gradient map, combined with varying size criteria (discussed later in the seed map cleaning procedure), is utilized to derive a-priori information at a certain decomposition level.
A quantized gradient map utilizing the initial threshold (λ=5), growth intervals at (12, 15, 21, 34, 56), as well as the maximum gradient value in the histogram (111), is shown in
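Quantizing a gradient map against the threshold intervals can be sketched as follows. The thresholds are the example values quoted in this section for the quantized map (initial threshold λ=5 and growth thresholds 12, 15, 21, 34, 56). Treating each threshold as the inclusive upper edge of its bin, and the helper `high_confidence_mask`, are assumptions made for illustration.

```python
from bisect import bisect_left

# Example values quoted in the text: initial threshold followed by growth thresholds.
THRESHOLDS = [5, 12, 15, 21, 34, 56]

def quantize_gradient(gradient, thresholds=THRESHOLDS):
    """Map each gradient value to the index of the threshold interval containing it.

    Level 0 holds values up to the first threshold; the last level holds
    everything above the final threshold (up to the histogram maximum).
    """
    return [[bisect_left(thresholds, g) for g in row] for row in gradient]

def high_confidence_mask(levels, max_level):
    """Hypothetical helper: pixels at or below max_level count as high confidence."""
    return [[level <= max_level for level in row] for row in levels]

gradient = [[3, 5, 6],
            [12, 13, 111]]
levels = quantize_gradient(gradient)
print(levels)
print(high_confidence_mask(levels, max_level=1))
```

Lower levels correspond to flatter regions, which the discussion above identifies as the ones that survive decomposition and can be transferred with confidence.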
The removal of minute seeds cannot be done by connected component analysis as it would only result in partial elimination of these seeds and simultaneously merge mutually adjacent ones, giving an undesired result. Therefore in order to be able to efficiently clean up all isolated as well as mutually adjacent seeds we proceed to determine the Mutual adjacent Seed Border Regions (MSBR). MSBR is defined as all those pixels that are common to two regions that are labeled differently. These regions are obtained through non-linear spatial filtering in the MAPGSEG method. The advantage of using nonlinear spatial filters is that it gives information in the image without actually manipulating individual pixel values.
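The MSBR definition can be illustrated with a direct neighborhood comparison. This sketch does not use the validation matrix of the method. It assumes instead that a labeled pixel belongs to the MSBR when its 3×3 neighborhood contains a different non-zero label. Label 0 marks unassigned pixels, and the function names are hypothetical.

```python
import numpy as np

def msbr_mask(labels):
    """True where a labeled pixel touches a differently labeled (non-zero) pixel."""
    labels = np.asarray(labels)
    h, w = labels.shape
    padded = np.pad(labels, 1, mode="constant", constant_values=0)
    mask = np.zeros((h, w), dtype=bool)
    for di in range(3):
        for dj in range(3):
            neighbor = padded[di:di + h, dj:dj + w]
            mask |= (labels > 0) & (neighbor > 0) & (neighbor != labels)
    return mask

def remove_msbr(labels):
    """Zero out the border pixels so that no two seeds remain mutually adjacent."""
    labels = np.asarray(labels)
    return np.where(msbr_mask(labels), 0, labels)

seed_map = np.array([[1, 1, 2, 2],
                     [1, 1, 2, 2],
                     [0, 0, 0, 0],
                     [3, 3, 3, 3]])
print(msbr_mask(seed_map).astype(int))
print(remove_msbr(seed_map))
```

Seed 3 is separated from the others by unassigned pixels, so it is not part of any mutual border and is left untouched.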
Given a labeled seed map, to facilitate the calculation of the MSBR, all pixel neighborhoods containing multiple labels (including 0) are first identified. This is done by differencing each pixel in a neighborhood from its adjacent value and finding the total difference. If this value is 0, then all pixels in the neighborhood have the same value; otherwise their labels differ. Having obtained all such neighborhoods, a validation matrix (V) is generated, given by
where β3 is the 3×3 neighborhood being operated on and h is the map comprising high confidence pixel locations. This validation matrix is required to separate neighborhoods that comprise multiple labels but have a unique nonzero label from those that have multiple nonzero labels. Assume that V is being computed in a unique nonzero neighborhood β. In such a scenario the mean of β will be equivalent to the nonzero label itself, resulting in V for β being 0. Similarly, for multiple nonzero labels, V>0 is obtained. Thus, in this particular example MSBR is defined as
The MSBR for the high confidence pixels map at level 1 (
The MSBR computation is followed by its elimination from high confidence pixels map, resulting in all seeds being independent, sharing no common border, as shown in
The border refinement procedure is responsible for finding all MSBR that have labels present in the map including large seeds, after subjecting it through the seed map cleaning protocol. These borders in turn are added back to large seeds map (
The border refinement procedure is the culmination point of the seed transfer module (see
In case of natural scene imagery, the segmentation task is often hampered by the presence of regions/patterns composed of multiple shades of colors or intensity variations due to surface/material properties like density, gradient, coarseness, directionality and the like. Such regions/patterns are referred to as ‘textures’ and are broadly classified into structured and stochastic types, in the image understanding domain. Structured textures are often man-made and have regularity in their appearance, such as a brick wall, interwoven fiber, etc., while stochastic textures are natural and are completely random patterns, such as leopard skin, tree bark, grass, etc. Due to the extensive presence of such patterns in natural scene images the MAPGSEG algorithm has been equipped with a texture characterization module (M5 in
A fundamental principle in information theory is based on the hypothesis that the presence of information can be modeled as a probabilistic process, and that the amount of information contained in a random event is inversely proportional to the probability of the occurrence of that event. Thus, if {x1, x2, . . . , xj} are a set of random gray levels present in an image, and {P(x1), P(x2), . . . , P(xj)} are the corresponding probabilities of occurrence of each of these gray levels, an arbitrary gray level {xi} from the set is said to contain:
I(x_i) = −log P(x_i)
binary units or bits of information when the base of the logarithm is 2. Furthermore, for an image comprising k pixels, the law of large numbers states that a gray level {xi} occurs on average kP(xi) times. Consequently, the total information content (I) in these k pixels, whose intensity values are modeled as a discrete random variable X, is given by:
I = −k Σ_{i=1}^{j} P(x_i) log P(x_i)
Therefore, the average information content per pixel is given by:
H(X) = I/k = −Σ_{i=1}^{j} P(x_i) log P(x_i)   (55)
Apart from information content, the quantity H(X) also symbolizes the degree of randomness present in the image, and is popularly known as entropy. The entropy calculation in Eq. (55), defined for a single random variable (single channel gray image), can be extended to multiple random variables X, Y, Z (three channel color image) by computing the joint entropy, defined as:
H(X, Y, Z) = −Σ_x Σ_y Σ_z P(x, y, z) log P(x, y, z)
However, in order to achieve computational efficiency by avoiding the joint entropy calculation between channels, quantization is done by uniformly dividing the 8-bit encoded L*a*b* cube into small boxes, and mapping all information that falls within each box to the color and luminance value at the center of that box (see
The progressive region growth procedure involving distributed dynamic seed addition, described previously, was performed primarily based on the similarity of L*a*b* data between image regions. Consequently, the region growth map obtained at the end of this procedure at an arbitrary scale generally comprises over-segmented image regions due to illumination variations, occlusions, texture disparities, etc. Thus, we employ a region merging module (M6 in
As mentioned previously, the region merging module is integrated with the MANOVA procedure to analyze the data associated with each group in the region growth map (generated previously), and to produce output segmentations that are spatially and spectrally coherent with the content of the image being segmented. To facilitate this MANOVA-based region merging methodology, at the commencement of processing in this module, the L*, a*, b* and texture data associated with each group in the region growth map are vectorized and concatenated into a matrix whose dimensions are the total number of pixels in the image by the number of variables (L*, a*, b*, texture) per pixel. The resulting matrix is employed in the MANOVA procedure, which involves the calculation of the Mahalanobis distance (or similarity value) between all possible group pairs, to identify and merge groups with similar characteristics.
The merging process commences by identifying the pair of groups with the minimum Mahalanobis distance, signifying the maximum similarity. However, merging only a single group pair per iteration would require many iterations of the merging protocol. For computational efficiency, the distance value obtained between the two most similar groups is therefore gradually increased until a larger set of similar group pairs (empirically set at five) is obtained. Subsequently, the acquired group pairs are merged with each other, from the most similar group pair of the set to the least similar one, concluding a single iteration of the merging process. Following this, the Mahalanobis distances are recomputed for all possible group pairs in the new segmentation map, and the process is repeated until either a desired number of groups is achieved or the smallest distance between any two groups is larger than a certain threshold. These termination criteria ensure that all images display a similar level of segmentation, and were empirically chosen to be 40 and 4, respectively. However, they can be varied depending on the application for which this algorithm is being used, as well as on image complexity, with preferred ranges being 30-60 and 2-4, respectively, for natural scene images.
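The quantization and entropy-based texture computation can be sketched as follows. The 216-color quantization and the 9×9 window are the values quoted for the worked examples in this document. Using six uniform bins per channel (6×6×6 = 216) and truncating the window at image borders are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

def quantize_lab(l, a, b, bins=6):
    """Map 8-bit encoded L*, a*, b* values to one of bins**3 box indices."""
    step = 256 / bins

    def box(value):
        return min(int(value / step), bins - 1)

    return (box(l) * bins + box(a)) * bins + box(b)

def entropy(values):
    """Shannon entropy in bits of a non-empty sequence of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def local_entropy(index_map, window=9):
    """Entropy of the quantized indices in a window centered on every pixel."""
    r = window // 2
    h, w = len(index_map), len(index_map[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            patch = [index_map[y][x]
                     for y in range(max(0, i - r), min(h, i + r + 1))
                     for x in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = entropy(patch)
    return out

flat = [[7] * 12 for _ in range(12)]
checker = [[(x + y) % 2 for x in range(12)] for y in range(12)]
print(local_entropy(flat)[6][6], local_entropy(checker)[6][6])
```

A flat region yields zero entropy, while a patterned region yields a positive value, which is what lets the texture channel separate textured from smooth areas.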
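The merging loop and its two termination criteria can be sketched as follows. This is a simplified illustration, not the patented procedure: it merges one group pair per iteration rather than a batch of up to five, and it assumes the Mahalanobis distance between group means is computed with the pooled within-group covariance. The function names are hypothetical.

```python
import numpy as np

def pairwise_mahalanobis(features, labels):
    """Mahalanobis distances between group means (pooled within-group covariance)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    means = np.array([features[labels == g].mean(axis=0) for g in groups])
    centered = features - means[np.searchsorted(groups, labels)]
    dof = max(len(features) - len(groups), 1)
    inv_cov = np.linalg.pinv(centered.T @ centered / dof)
    diff = means[:, None, :] - means[None, :, :]
    squared = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)
    return groups, np.sqrt(np.maximum(squared, 0.0))

def merge_regions(features, labels, max_groups=40, max_distance=4.0):
    """Merge the closest pair until either termination criterion is met."""
    labels = np.array(labels).copy()
    while True:
        groups, dist = pairwise_mahalanobis(features, labels)
        if len(groups) <= max_groups:
            break  # desired number of groups reached
        np.fill_diagonal(dist, np.inf)
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        if dist[i, j] > max_distance:
            break  # even the most similar pair is too dissimilar
        labels[labels == groups[j]] = groups[i]
    return labels

# Synthetic (L*, a*, b*, texture) rows: groups 1 and 2 are similar, group 3 is not.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 1.0, (50, 4)),
                      rng.normal(0.1, 1.0, (50, 4)),
                      rng.normal(10.0, 1.0, (50, 4))])
labels = np.repeat([1, 2, 3], 50)
merged = merge_regions(features, labels, max_groups=2)
print(np.unique(merged))
```

The defaults of 40 groups and a distance of 4 are the empirically chosen values quoted above; the usage example lowers `max_groups` only so that a three-group toy input triggers a merge.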
The MAPGSEG results were benchmarked qualitatively and quantitatively, using the Normalized Probabilistic Rand index (NPR) (Unnikrishnan et al., “Toward Objective Evaluation of Image Segmentation Methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):929-944 (2007), which is hereby incorporated by reference in its entirety), against several popular methods on the same test bed of manually segmented images (ground truths). The results were compared against those from a spectrum of published segmentation methods such as GRF (Saber et al., “Fusion of Color and Edge Information for Improved Segmentation and Edge Linking,” Image and Vision Computing 15(10):769-780 (1997), which is hereby incorporated by reference in its entirety), JSEG (Deng et al., “Unsupervised Segmentation of Color-Texture Regions in Images and Video,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(8):800-810 (2001), which is hereby incorporated by reference in its entirety), and DCGT (Balasubramanian et al., “Unsupervised Color Image Segmentation by Dynamic Color Gradient Thresholding,” Proceedings of SPIE/IS&T Electronic Imaging Symposium (2008); Garcia et al., “Automatic Color Image Segmentation by Dynamic Region Growth and Multimodal Merging of Color and Texture Information,” International Conference on Acoustics, Speech and Signal Processing, Las Vegas, Nev. (2008), which are hereby incorporated by reference in their entirety). A computational time analysis was also performed to furnish a fair indication of the overall performance of the MAPGSEG method. The NPR index requires a set of images, each having multiple manual segmentations, for evaluation.
Such a set, comprising 1633 manual segmentations for 300 images of dimension ~321×481, created by 30 human subjects, has been made publicly available by the University of California at Berkeley (Martin et al., “A Database of Human Segmented Natural Images and Its Application to Evaluating Segmentation Methods and Measuring Ecological Statistics,” in IEEE International Conference on Computer Vision Vol. 2:416-423, Vancouver, BC, Canada (2001), which is hereby incorporated by reference in its entirety). An additional 445 images with dimension 750×1200 were also utilized for assessing the performance of the MAPGSEG method against its single scale version. The entire testing database (745 images) was segmented on the same machine, having a Pentium® 4 CPU at 3.20 GHz and 3.00 GB of RAM. The GRF, DCGT and GS methods were run from the executable files and MATLAB code provided by the Rochester Institute of Technology, while the JSEG method was run from a different executable file provided by the University of California at Santa Barbara. The proposed method was implemented using MATLAB version R2007a.
The results of the MAPGSEG method for the ‘starfish’ image at different stages are presented in
The input RGB image, namely ‘starfish’, is first subjected to processing in module 1. The processing in this module commences with a color space conversion from RGB to CIE L*a*b* in the manner described in Green et al., Color Engineering, John Wiley and Sons Ltd. (2002), which is hereby incorporated by reference in its entirety, and using Eqs. (27) to (29), as described in [0224] and [0225].
Using the resultant L*a*b* data, the magnitude of the gradient of the full resolution color image field is calculated using the procedure described in Lee et al., “Detecting Boundaries in a Vector Field,” IEEE Transactions on Signal Processing 39(5):1181-1194 (1991), which is hereby incorporated by reference in its entirety.
Following this, using initialization thresholds of λ=5 and λ+5=10, as well as a number of RGIs N=5 with Δg=10%, the threshold values required for segmentation are determined adaptively utilizing Eq. 45, as described in [0226] to [0234]. These values were found to be 12, 15, 21, 30, and 45 for the aforementioned starfish image. The same thresholds were utilized for segmenting the input image at all resolutions.
Having obtained the segmentation thresholds in step 3, the full resolution RGB image is subjected to a dyadic wavelet decomposition (module 2) using the Daubechies (9, 7) biorthogonal wavelet analysis coefficients, summarized in Table 1. The number of decomposition levels was determined using the desired dimension D=128 as well as the image dimensions of 321×481, in accordance with the procedure described in [0235]. For this particular example, a 2-level decomposition was performed.
Starting at the smallest resolution (L=2, of dimensions 81×121), the initial clustering phase is performed using the procedure described in [0237] to [0238]. Initialization thresholds of λ=5 and λ+5=10, as well as a Minimum Seed Size (MSSL=2) criterion of 3 pixels, are utilized for this purpose to eventually generate a Parent Seeds (PSs) map. As mentioned in [0238], the sizes of the parent seeds were restricted to 50*MSS and 25*MSS for λ=5 and λ+5=10, respectively. Consequently, these criteria were found to be 150 and 75 pixels, respectively.
The resultant initial seeds map is subjected to a region growth process (module 3) described in [0239] to [0257]. This procedure facilitates the growth of the existing parent seeds as well as the addition of new seeds in unsegmented regions at distinct stages of region processing. The growth of existing parent seeds was accomplished by merging child seeds into them using a color distance threshold of 60. In addition, the region size criteria for the five RGIs, set at 10*MSS, 5*MSS, MSS, MSS and MSS, were found to be 30, 15, 3, 3 and 3 pixels, respectively.
Having completed the region growth process, a texture information channel (module 5) was computed using the procedure described in [0269] to [0271]. More specifically, the color-converted image in the L*a*b* color space was first quantized to 216 different colors, and the resulting indexed/quantized map was employed in a local neighborhood-based entropy calculation using Eq. (55) in a 9×9 window around every pixel in the image.
The acquired texture information, along with the L*a*b* information and the region growth map acquired in module 3, is used in the region merging procedure (module 6) described in [0272] to [0274]. This region merging process is based on a statistical procedure known as the Multivariate ANalysis Of Variance (MANOVA), described in [0217]. In performing the aforementioned process, the maximum similarity value for region merging was set at 4, while the maximum number of groups in the output segmentation (MaxNg) was set at 40.
The region merging module yields an interim segmentation map at the smallest resolution (L=2 for this example). This interim segmentation is subjected to a multiresolution seed transfer process (module 4), described in [0258] to [0268], to identify seeds/regions transferable from the current resolution (L=2) to the subsequent higher resolution (L=1).
Having identified seeds transferable from the current resolution (L=2) to the subsequent higher resolution (L=1), all steps other than steps 3, 4 and 5 (that is, steps 1, 2, 6, 7, 8, 9, and 10) are repeated for the image at resolution level L=1, of dimensions 161×241, to arrive at an interim segmentation at resolution level L=1. The only parametric change for this resolution level is that a Minimum Seed Size (MSSL=1) criterion of 7 pixels is utilized in the various steps.
Having obtained an interim segmentation at resolution level L=1 and identified seeds transferable from the current resolution (L=1) to the subsequent higher resolution (L=0), all steps other than steps 3, 4 and 5 (that is, steps 1, 2, 6, 7, 8, 9, and 10) are repeated for the image at resolution level L=0, of dimensions 321×481, to arrive at a final segmentation result at resolution level L=0. The only parametric change for this resolution level is that a Minimum Seed Size (MSSL=0) criterion of 15 pixels is utilized in the various steps.
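The seed size values quoted in the steps above can be reproduced with a short calculation. The sketch assumes that the Minimum Seed Size at level k is the integer part of 2^k·α·M_k·N_k, with α = 0.01% and M_k×N_k the image dimensions at that level. This reading of the criterion and the rounding down are assumptions, chosen because they yield the quoted values of 3, 7 and 15 pixels. The function names are hypothetical.

```python
import math

ALPHA = 0.0001  # 0.01% of the image area, the typical value quoted in the text

def minimum_seed_size(level, rows, cols, alpha=ALPHA):
    """Assumed reading: MSS_k = floor(2**k * alpha * M_k * N_k)."""
    return math.floor((2 ** level) * alpha * rows * cols)

def size_criteria(mss):
    """Size criteria derived from MSS, using the multipliers quoted in the text."""
    return {
        "parent_seeds": (50 * mss, 25 * mss),  # for thresholds λ and λ+5
        "growth_intervals": (10 * mss, 5 * mss, mss, mss, mss),  # five RGIs
    }

# Image pyramid of the worked example: level -> (rows, cols).
pyramid = {2: (81, 121), 1: (161, 241), 0: (321, 481)}
for level, (rows, cols) in pyramid.items():
    print(level, minimum_seed_size(level, rows, cols))
print(size_criteria(minimum_seed_size(2, 81, 121)))
```

At the smallest resolution this gives parent seed criteria of 150 and 75 pixels and growth interval criteria of 30, 15, 3, 3 and 3 pixels, matching the values stated in the steps.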
The input image pyramid (see
In addition,
Clear performance advantages of the MAPGSEG method can be viewed in
Results obtained from the MAPGSEG method in comparison to the previously mentioned segmentation methods, are shown in
The input RGB image, namely ‘Church’, is first subjected to processing in module 1. The processing in this module commences with a color space conversion from RGB to CIE L*a*b* in the manner described in Green et al., Color Engineering, John Wiley and Sons Ltd. (2002), which is hereby incorporated by reference in its entirety, and using Eqs. (27) to (29), as described in [0224] and [0225].
Using the resultant L*a*b* data, the magnitude of the gradient of the full resolution color image field is calculated using the procedure described in Lee et al., “Detecting Boundaries in a Vector Field,” IEEE Transactions on Signal Processing 39(5):1181-1194 (1991), which is hereby incorporated by reference in its entirety.
Following this, using initialization thresholds of λ=10 and λ+5=15, as well as a number of RGIs N=5 with Δg=10%, the threshold values required for segmentation are determined adaptively utilizing Eq. 45, as described in [0226] to [0234]. These values were found to be 17, 19, 25, 35, and 56 for the aforementioned Church image. The same thresholds were utilized for segmenting the input image at all resolutions.
Having obtained the segmentation thresholds in step 3, the full resolution RGB image is subjected to a dyadic wavelet decomposition (module 2) using the Daubechies (9, 7) biorthogonal wavelet analysis coefficients, summarized in Table 1. The number of decomposition levels was determined using the desired dimension D=128 as well as the image dimensions of 321×481, in accordance with the procedure described in [0235]. For this particular example, a 2-level decomposition was performed.
Starting at the smallest resolution (L=2, of dimensions 81×121), the initial clustering phase is performed using the procedure described in [0237] to [0238]. Initialization thresholds of λ=10 and λ+5=15, as well as a Minimum Seed Size (MSSL=2) criterion of 3 pixels, are utilized for this purpose to eventually generate a Parent Seeds (PSs) map. As mentioned in [0238], the sizes of the parent seeds were restricted to 50*MSS and 25*MSS for λ=10 and λ+5=15, respectively. Consequently, these criteria were found to be 150 and 75 pixels, respectively.
The resultant initial seeds map is subjected to a region growth process (module 3) described in [0239] to [0257]. This procedure facilitates the growth of the existing parent seeds as well as the addition of new seeds in unsegmented regions at distinct stages of region processing. The growth of existing parent seeds was accomplished by merging child seeds into them using a color distance threshold of 60. In addition, the region size criteria for the five RGIs, set at 10*MSS, 5*MSS, MSS, MSS and MSS, were found to be 30, 15, 3, 3 and 3 pixels, respectively.
Having completed the region growth process, a texture information channel (module 5) was computed using the procedure described in [0269] to [0271]. More specifically, the color-converted image in the L*a*b* color space was first quantized to 216 different colors, and the resulting indexed/quantized map was employed in a local neighborhood-based entropy calculation using Eq. (55) in a 9×9 window around every pixel in the image.
The acquired texture information, along with the L*a*b* information and the region growth map acquired in module 3, is used in the region merging procedure (module 6) described in [0272] to [0274]. This region merging process is based on a statistical procedure known as the Multivariate ANalysis Of Variance (MANOVA), described in [0217]. In performing the aforementioned process, the maximum similarity value for region merging was set at 4, while the maximum number of groups in the output segmentation (MaxNg) was set at 40.
The region merging module yields an interim segmentation map at the smallest resolution (L=2 for this example). This interim segmentation is subjected to a multiresolution seed transfer process (module 4), described in [0258] to [0268], to identify seeds/regions transferable from the current resolution (L=2) to the subsequent higher resolution (L=1).
Having identified seeds transferable from the current resolution (L=2) to the subsequent higher resolution (L=1), all steps other than steps 3, 4 and 5 (that is, steps 1, 2, 6, 7, 8, 9, and 10) are repeated for the image at resolution level L=1, of dimensions 161×241, to arrive at an interim segmentation at resolution level L=1. The only parametric change for this resolution level is that a Minimum Seed Size (MSSL=1) criterion of 7 pixels is utilized in the various steps.
Having obtained an interim segmentation at resolution level L=1 and identified seeds transferable from the current resolution (L=1) to the subsequent higher resolution (L=0), all steps other than steps 3, 4 and 5 (that is, steps 1, 2, 6, 7, 8, 9, and 10) are repeated for the image at resolution level L=0, of dimensions 321×481, to arrive at a final segmentation result at resolution level L=0. The only parametric change for this resolution level is that a Minimum Seed Size (MSSL=0) criterion of 15 pixels is utilized in the various steps.
Similar results can be seen in the ‘Parachute’ image. All methods apart from the MAPGSEG algorithm oversegment the sky and mountain regions, as seen in
The input RGB image namely ‘Parachute’ is first subjected to processing in module 1. The processing in this module commences in a color space conversion from RGB to CIEL*a*b* in a manner described in Green et al., Color Engineering, John Wiley and Sons Ltd. (2002), which is hereby incorporated by reference in its entirety, and using Eqs. (27) to (29), as described in [0224] and [0225].
Using the resultant L*a*b*data, the magnitude of the gradient of the full resolution color image field is calculated using the procedure described in Lee et al., “Detecting Boundaries in a Vector Field,” IEEE Transactions on Signal Processing 39(5):1181-1194 (1991), which is hereby incorporated by reference in its entirety.
Following this, using initialization thresholds of λ=10 and λ+5=15, as well as number of RGIs N=5, with Δg=10% the threshold values required for segmentation are determined adaptively utilizing Eq. 45, as described in [0226] to [0234]. These values were obtained to be 18, 24, 45, 64, and 87 for the aforementioned Parachute image. The same thresholds were utilized for segmenting the input image at all resolutions.
Having obtained the segmentation thresholds in step 3, the full resolution RGB image is subjected to a dyadic wavelet decomposition (module 2) using the Daubechies (9, 7) biorthogonal wavelet analysis coefficients, summarized in Table 1. The number of decomposition levels was determined using the desired dimension D=128 as well as the image dimensions of 321×481, in accordance with the procedure described in [0235]. For this particular example, a 2-level decomposition was performed.
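The level count quoted above can be reproduced with a short sketch. The rule used here (keep halving, rounding up, while the larger image dimension exceeds D) is an assumption chosen because it yields the 2 levels and the 161×241 and 81×121 dimensions reported in this example. The exact rule of [0235] is not restated here, and `decomposition_plan` is a hypothetical name.

```python
import math


def decomposition_plan(height, width, target_dim=128):
    """Return the (height, width) at each resolution level, level 0 first.

    Assumed stopping rule: halve both dimensions (rounding up) while the
    larger dimension still exceeds target_dim.
    """
    dims = [(height, width)]
    while max(dims[-1]) > target_dim:
        h, w = dims[-1]
        dims.append((math.ceil(h / 2), math.ceil(w / 2)))
    return dims


plan = decomposition_plan(321, 481)
print(len(plan) - 1, plan)
# 2 [(321, 481), (161, 241), (81, 121)]
```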
Starting at the smallest resolution (L=2, of dimensions 81×121), the initial clustering phase is performed using the procedure described in [0237] to [0238]. Initialization thresholds of λ=10 and λ+5=15, as well as a Minimum Seed Size (MSS) criterion of 3 pixels at L=2, are utilized for this purpose to eventually generate a Parent Seeds (PSs) map. As mentioned in [0238], the size of the parent seeds was restricted to 50*MSS and 25*MSS for the thresholds λ and λ+5, respectively. With MSS=3, these criteria are 150 and 75 pixels, respectively.
The resultant initial seeds map is subjected to a region growth process (module 3) described in [0239] to [0257]. This procedure facilitates the growth of the existing parent seeds as well as the addition of new seeds in unsegmented regions at distinct stages of region processing. The growth of existing parent seeds was done by merging child seeds into them using a color distance threshold of 60. In addition, the region size criteria for the five RGIs, set at 10*MSS, 5*MSS, MSS, MSS and MSS, were 30, 15, 3, 3 and 3 pixels, respectively.
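The size criteria above follow directly from the Minimum Seed Size, and a brief sketch makes the arithmetic explicit. The multipliers and the MSS values (3, 7 and 15 pixels for L=2, 1 and 0) are those quoted in this example. Applying the same multipliers at L=1 and L=0 is an assumption based on the statement that MSS is the only parametric change between levels. The names `MSS_BY_LEVEL` and `size_criteria` are hypothetical.

```python
# Minimum Seed Size per resolution level, as quoted in this example.
MSS_BY_LEVEL = {2: 3, 1: 7, 0: 15}

PARENT_FACTORS = (50, 25)      # parent-seed size limits for lambda and lambda+5
RGI_FACTORS = (10, 5, 1, 1, 1)  # region size criteria for the five RGIs


def size_criteria(mss):
    """Return (parent seed limits, RGI region size criteria) in pixels."""
    parent = tuple(f * mss for f in PARENT_FACTORS)
    rgi = tuple(f * mss for f in RGI_FACTORS)
    return parent, rgi


for level in sorted(MSS_BY_LEVEL, reverse=True):
    print(level, size_criteria(MSS_BY_LEVEL[level]))
# First line printed: 2 ((150, 75), (30, 15, 3, 3, 3))
```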
Having completed the region growth process, a texture information channel (module 5) was computed using the procedure described in [0269] to [0271]. More specifically, the color converted image in the L*a*b* color space was first quantized to 216 different colors, and the resulting indexed/quantized map was employed in a local neighborhood based entropy calculation using Eq. (55) in a 9×9 window around every pixel in the image.
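A local entropy channel of this kind can be sketched as below. The sketch takes an already quantized index map as input, since the quantization scheme is not restated in this example. It assumes that Eq. (55) is a Shannon entropy of the index histogram within the window (base 2 here) and that windows are clipped at the image border. Both are assumptions, and `local_entropy` is a hypothetical name.

```python
import numpy as np


def local_entropy(index_map, window=9):
    """Shannon entropy (bits) of quantized color indices in a sliding window.

    index_map: 2-D integer array of color indices (e.g. 0..215).
    Windows are clipped at the image border (an assumption).
    """
    idx = np.asarray(index_map)
    r = window // 2
    rows, cols = idx.shape
    out = np.zeros((rows, cols))
    for y in range(rows):
        for x in range(cols):
            patch = idx[max(0, y - r): y + r + 1, max(0, x - r): x + r + 1]
            # Relative frequency of each index present in the window.
            _, counts = np.unique(patch, return_counts=True)
            p = counts / counts.sum()
            out[y, x] = -np.sum(p * np.log2(p))
    return out
```

With 216 possible indices and at most 81 pixels per 9×9 window, the value at any pixel is bounded by log2(81), so smooth regions score near zero and highly textured regions score high. The explicit double loop favors clarity over speed.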
The acquired texture information, along with the L*a*b* information and the region growth map acquired in module 3, is used in the region merging procedure (module 6) described in [0272] to [0274]. Furthermore, this region merging process is based on a statistical procedure known as the Multivariate ANalysis Of Variance (MANOVA), described in [0217]. In performing the aforementioned process, the maximum similarity value for region merging was set at 4, while the maximum number of groups in the output segmentation (MaxNg) was set at 40.
The output of the region merging module yields an interim segmentation map at the smallest resolution (L=2 for this example). This interim segmentation is subjected to a multiresolution seed transfer process (module 4), described in [0258] to [0268], to identify seeds/regions transferable from the current resolution (L=2) to the next higher resolution (L=1).
Having identified the seeds transferable from the current resolution (L=2) to the next higher resolution (L=1), all steps except steps 3, 4 and 5 (that is, steps 1, 2, 6, 7, 8, 9, and 10) are repeated for the image at resolution level L=1, of dimensions 161×241, to arrive at an interim segmentation at resolution level L=1. The only parametric change for this resolution level is that a Minimum Seed Size (MSS) criterion of 7 pixels is used at L=1 for the various processing steps.
Having obtained an interim segmentation at resolution level L=1 and identified the seeds transferable from the current resolution (L=1) to the next higher resolution (L=0), all steps except steps 3, 4 and 5 (that is, steps 1, 2, 6, 7, 8, 9, and 10) are repeated for the image at resolution level L=0, of dimensions 321×481, to arrive at a final segmentation result at resolution level L=0. The only parametric change for this resolution level is that a Minimum Seed Size (MSS) criterion of 15 pixels is used at L=0 for the various processing steps.
Segmenting textured regions becomes a challenge when regions with diverse textures are extremely similar in color. Here a good texture descriptor is indispensable.
The following figures show the interim and final segmentation outputs of this method in comparison to the DCGT and GS results and to the human segmentations provided by the University of California at Berkeley. In
In addition the human segmentations for the island image are shown in
In
In the NPR evaluation, the normalization factor was computed by evaluating the Probabilistic Rand (PR) index for all available manual segmentations, and the expected index (E[PR]) obtained was 0.6064. A distributional comparison of this evaluation for the segmentation results of 300 images (of size 321×481) in the Berkeley database, obtained from the GRF, JSEG, DCGT, GS and MAPGSEG methods, is displayed in
The actual improvement can be seen by superimposing all these distributions (as seen in
A comparison of the evaluation of the segmentation results obtained from the five methods is displayed in Table 3. This table shows that the method in accordance with embodiments of the present application has the highest average NPR score and the lowest average run time per image, indicating that this method achieves quality segmentations with the least computational complexity, considering the different environments in which the methods were developed. Table 4 presents a qualitative and quantitative comparison of the various levels of the MAPGSEG, after all interim outputs were upscaled to the size of the original input. Comparing the average level 2 NPR score to that of level 1, it is seen that even at level 2 the outputs obtained reach more than 98% of the segmentation quality at the highest level (level 0), and are acquired in as little as 2.3 seconds per image. Furthermore, from Tables 3 and 4 it can be seen that the MAPGSEG is three times faster than the GS, with a marginal improvement in segmentation quality. Table 5 shows the computational time comparison of the various levels of the MAPGSEG to the GS for 445 large resolution images (approximately 750×1200). Here it is seen that the GS has an average runtime on the order of minutes (177.2 sec ≈ 2.95 minutes), in comparison to this method with an overall runtime of 35.7 seconds, almost 5 times faster than its single scale version. In
The present invention provides a computationally efficient method designed for fast unsupervised segmentation of color images with varied complexities in a multiresolution framework. This Multiresolution Adaptive and Progressive Gradient-based color image SEGmentation (MAPGSEG) method is primarily based on adaptive gradient thresholding, progressive region growth involving distributed dynamic seed addition, and multiresolution seed transfer, and culminates in a unique region merging procedure. The method has been tested on a large database of images including the publicly available Berkeley database, and the quality of the results shows that this method is robust to various image scenarios at different scales and is superior to the results obtained on the same image when segmented by other methods, as can be seen in the results displayed.
Having thus described the basic concept of the invention, it will be rather apparent to those skilled in the art that the foregoing detailed disclosure is intended to be presented by way of example only, and is not limiting. Various alterations, improvements, and modifications will occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested hereby, and are within the spirit and scope of the invention. Additionally, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations therefor, is not intended to limit the claimed processes to any order except as may be specified in the claims. Accordingly, the invention is limited only by the following claims and equivalents thereto.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/204,787, filed Jan. 9, 2009, which is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
61204787 | Jan 2009 | US