The present invention generally relates to 3D (three dimensional) compression technology. In particular, the present invention relates to a method and apparatus for consistent segmentation (co-segmentation) of 3D models.
In the processing of 3D models in computer graphics, the segmentation of a set of 3D models is a primary step and an important pre-processing stage for understanding the shapes of the 3D models. Through segmentation, each 3D model in the set is partitioned into multiple segments, which simplifies and/or changes the representation of the 3D models into something that is more meaningful and easier to analyze. With the increasing number of 3D models, the consistent segmentation of a dataset of 3D models, in which corresponding segments are associated across models, has become an intensive research topic.
Some methods have been proposed for a consistent segmentation of a set of 3D models, which can be categorized into supervised, unsupervised and semi-supervised methods. It is known to a person skilled in the art that the above mentioned categorization depends on whether the input consists entirely of manual segmentations, contains no manual segmentations, or contains manual segmentations for only part of the data.
In a paper of E. Kalogerakis, A. Hertzmann, K. Singh, entitled “Learning 3D Mesh Segmentation and Labeling”, ACM Trans. on Graphics, vol. 29, no. 4, pp. 102:1-102:12, 2010 (hereinafter referred to as reference 1), a supervised method was provided. In the reference 1, features are selected by JointBoost, which is a machine learning method employed for selecting appropriate features. The JointBoost requires a training dataset.
In a paper of R. Hu, L. Fan, L. Liu, entitled “Co-Segmentation of 3D Shapes via Subspace Clustering”, Computer Graphics Forum (SGP 2012), vol. 31, no. 5, pp. 1703-1713, 2012 (hereinafter referred to as reference 2), an unsupervised method was discussed. The reference 2 proposes to extend the multi-task learning used in image processing to fuse multiple features in shape segmentation. However, an additional parameter is introduced, which increases the complexity of the optimization. In addition, a sparse subspace clustering method is presented, which exploits the sparsity of the representation of each point as a linear combination of points belonging to the same subspace. This method only captures the local linear relationship among data points and is therefore sensitive to noise and outliers.
In a paper of Y. Wang, S. Asafi, O. Kaick, H. Zhang, D. Cohen-Or, B. Chen, entitled “Active Co-Analysis of a Set of Shapes”, ACM Trans. on Graphics, vol. 31, no. 6, pp. 165:1-165:10, 2012 (hereinafter referred to as reference 3), a semi-supervised method was proposed. In the solution of the reference 3, an active learning method is employed, which requires the input of a user.
In a dataset of 3D models from one category, although the semantic parts which are inherent in multiple shapes are consistent, there exist large variations among these shapes in geometry and topology. Therefore, a single shape descriptor is not enough to achieve satisfactory results. Using more shape descriptors improves the quality of the consistent segmentation, but inevitably increases the computing complexity. However, since multiple shape descriptors improve the quality considerably more than a single one, conventional segmentation methods for a set of 3D models usually take multiple shape descriptors into account.
In view of the above problem in the conventional technologies, the invention proposes an unsupervised method and apparatus for consistent segmentation of 3D models, wherein the consistent segmentation is formulated as a multi-view spectral clustering task by co-training a set of affinity matrices for different shape descriptors. This method does not require training data, user input, or additional parameters for multiple features.
According to one aspect of the invention, a method for consistent segmentation of a set of 3D models is provided. The method comprises: over-segmenting each 3D model in the set of 3D models into patches, each of which comprises at least one primitive of the 3D model; computing at least one feature descriptor on each 3D model which is used for the segmentation of the 3D model; defining a feature vector for each patch over the at least one feature descriptor computed on each 3D model; calculating a low-rank and sparse representation for each feature descriptor by using the feature vectors; and clustering the patches with a fused sparse and low-rank representation.
According to one aspect of the invention, an apparatus for consistent segmentation of a set of 3D models is provided. The apparatus comprises: means for over-segmenting each 3D model in the set of 3D models into patches, each of which comprises at least one primitive of the 3D model; means for computing at least one feature descriptor on each 3D model which is used for the segmentation of the 3D model; means for defining a feature vector for each patch over the at least one feature descriptor computed on each 3D model; means for calculating a low-rank and sparse representation by using the feature vectors; and means for clustering the patches with a fused sparse and low-rank representation.
It is to be understood that more aspects and advantages of the invention will be found in the following detailed description of the present invention.
The accompanying drawings are included to provide further understanding of the embodiments of the invention together with the description which serves to explain the principle of the embodiments. The invention is not limited to the embodiments.
In the drawings:
An embodiment of the present invention will now be described in detail in conjunction with the drawings. In the following description, some detailed descriptions of known functions and configurations may be omitted for conciseness.
In a segmentation of a set of 3D models, since there are variations between different 3D models in the same category, it is hard to segment these 3D models individually and build the correspondence between the resulting components. Moreover, different feature descriptors capture different characteristics of the shapes, and it is therefore almost impossible to find a single kind of feature which is suitable for the segmentation of all shapes. In view of the foregoing problems, an embodiment of the invention proposes to employ a multi-view spectral clustering method to fuse multiple features in the segmentation. Furthermore, during the construction of the affinity matrix for each feature, low-rankness is imposed to capture the global structures inherent in the shapes. The embodiment of the invention can segment a dataset of 3D models into meaningful parts in a consistent way and create the correspondence simultaneously.
In the embodiment shown in
As shown in
A normalized cut method can be employed for the over-segmentation of each 3D model into patches in the step S201. In the normalized cut method, the dihedral angle of each pair of neighboring faces is computed first (a face indicates a model primitive, e.g. a triangle). Then Gaussian weights of the dihedral angles are calculated as the similarity metric between neighboring faces. Finally, normalized cuts are performed on the resulting similarity matrix to cluster the faces into several patches.
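As a non-limiting illustration of the step S201, the following Python sketch builds Gaussian weights between neighboring faces and performs a single two-way normalized cut. It rests on several assumptions that are not stated in the description: NumPy is used, the angle between unit face normals stands in for the dihedral angle, `sigma` is a hypothetical bandwidth, every face has at least one neighbor, and only one bipartition is shown, whereas a practical over-segmentation would produce many patches.

```python
import numpy as np

def dihedral_affinity(normals, adjacent_pairs, sigma=0.5):
    """Gaussian similarity between neighboring faces.

    Assumption: the angle between unit normals is used as a stand-in
    for the dihedral angle (0 for coplanar neighbors).
    """
    n = len(normals)
    W = np.zeros((n, n))
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    for i, j in adjacent_pairs:
        cos_angle = np.clip(unit[i] @ unit[j], -1.0, 1.0)
        angle = np.arccos(cos_angle)
        W[i, j] = W[j, i] = np.exp(-(angle ** 2) / (2.0 * sigma ** 2))
    return W

def normalized_cut_bipartition(W):
    """Two-way normalized cut from the second-smallest eigenvector
    of the symmetric normalized Laplacian I - D^-1/2 W D^-1/2."""
    d = W.sum(axis=1)  # assumes every face has a neighbor, so d > 0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)
```

Applying the bipartition recursively, or clustering several eigenvectors at once, would yield the larger number of patches needed in practice.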
At step S203, at least one feature descriptor is computed on each 3D model. The feature descriptor could be, for example, Gaussian Curvature (GC), average geodesic distance (AGD), shape diameter function (SDF), etc. Each feature descriptor can be used in the segmentation of a single 3D model.
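The description does not specify how each descriptor is computed. Purely as an example for one of them, the sketch below estimates a discrete Gaussian curvature at each vertex by the angle deficit (2π minus the sum of the triangle angles incident to the vertex). The function name, the use of NumPy, and the choice of the angle-deficit formula are assumptions; the value is not normalized by vertex area and is meaningful only for interior vertices of a triangle mesh.

```python
import numpy as np

def angle_deficit_curvature(vertices, faces):
    """Per-vertex angle deficit: 2*pi minus the sum of incident angles.

    vertices: (V, 3) float array; faces: iterable of vertex-index triples.
    """
    curvature = np.full(len(vertices), 2.0 * np.pi)
    for tri in faces:
        for corner in range(3):
            a = vertices[tri[corner]]
            b = vertices[tri[(corner + 1) % 3]]
            c = vertices[tri[(corner + 2) % 3]]
            u, v = b - a, c - a
            cos_angle = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
            # Subtract the interior angle of this triangle at vertex a.
            curvature[tri[corner]] -= np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return curvature
```

On a closed mesh with the topology of a sphere, the deficits sum to 4π, which gives a simple sanity check for an implementation of this kind.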
At step S205, a feature vector is defined for each patch obtained from the over-segmentation in the step S201, over each feature descriptor computed on each 3D model in the step S203. The feature vector summarizes the feature values (scalars or vectors) computed over the patch. As one example, for each feature descriptor, a feature vector could be defined for each patch by computing a histogram which captures the distribution of this feature descriptor on the triangles of this patch. For each patch obtained in the step S201, the feature values have been computed on its vertices in the step S203.
In this embodiment, the feature histogram is generated with 100 bins, the bins being the disjoint intervals in which the feature values are counted; the number of bins is thus the dimension of a feature vector. A 3D model can therefore be represented by an n×m matrix Pi, where n denotes the number of bins, m denotes the number of patches, and each column denotes the feature vector of one patch.
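The construction of the matrix Pi can be sketched as follows. This is an illustration under assumptions: the function and argument names are hypothetical, one scalar feature value per face is assumed, all patches share the same bin edges taken from the global value range, and each histogram is normalized to sum to one, none of which is prescribed by the description.

```python
import numpy as np

def patch_feature_matrix(face_values, face_patch, num_patches, num_bins=100):
    """Return a (num_bins x num_patches) matrix whose column j is the
    normalized histogram of the feature values on patch j.

    face_values: (F,) scalar feature value per face (assumption).
    face_patch:  (F,) patch index of each face.
    """
    lo, hi = face_values.min(), face_values.max()
    P = np.zeros((num_bins, num_patches))
    for j in range(num_patches):
        values = face_values[face_patch == j]
        # Shared bin edges make the columns comparable across patches.
        hist, _ = np.histogram(values, bins=num_bins, range=(lo, hi))
        if hist.sum() > 0:
            P[:, j] = hist / hist.sum()
    return P
```

With the default of 100 bins, the result has n = 100 rows and one column per patch, matching the n×m layout described above.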
At step S207, a low-rank and sparse representation is calculated for each feature descriptor by using the feature vectors. Let the feature vectors on the patches be the input samples, denoted by Pi, each column of which represents the feature vector on one patch. Based on the theory of sparse representation, each sample of the input data can be represented as a linear combination of the other samples in the same cluster, which exploits the local linear relationship among the samples.
Furthermore, the low-rank representation is also based on the hypothesis of a linear relationship among samples; it finds the representation with the lowest rank and captures the global structure. In a paper of L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, N. Yu, entitled “Non-negative Low Rank and Sparse Graph for Semi-Supervised Learning”, CVPR, 2012, a method for low-rank and sparse representation is described (hereinafter referred to as reference 4). According to the reference 4, the affinity matrix Zi for measurement of the similarity between a pair of patches can be derived from the following optimization problem.
Here, for the ith kind of feature, ∥Zi∥* denotes the nuclear norm of Zi, which drives the solution toward the lowest rank, and ∥Zi∥1 denotes the l1 norm of Zi, which makes it sparse. Ei denotes the noise term. The parameter α is used to trade off low-rankness against sparsity, and λ controls the size of the noise. ρ is selected as the l2,1 norm in this embodiment.
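The optimization problem itself is not reproduced in this text. Based only on the terms defined above, one plausible form is given below; the self-representation constraint P_i = P_i Z_i + E_i is an assumption taken from the usual low-rank and sparse formulation and is not confirmed by the present description.

```latex
\min_{Z_i,\,E_i}\; \lVert Z_i \rVert_{*} + \alpha \lVert Z_i \rVert_{1} + \lambda\,\rho(E_i)
\quad \text{s.t.} \quad P_i = P_i Z_i + E_i ,
\qquad
\rho(E_i) = \lVert E_i \rVert_{2,1} = \sum_{j} \sqrt{\sum_{k} \bigl(E_i\bigr)_{kj}^{2}}
```

The title of the reference 4 suggests that a non-negativity constraint on Z_i may also apply; whether this embodiment uses it is not stated.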
The above problem can be solved by the popular Alternating Direction Method (ADM), for which reference is made to S. P. Boyd and L. Vandenberghe, “Convex Optimization”, Cambridge Univ. Press, 2004 (hereinafter referred to as reference 5). In this method, two auxiliary variables are introduced to separate the problem. The objective function can be rewritten using the augmented Lagrangian method, and the minimization problem can be solved by alternately updating one variable while fixing the others. Thus, for each type of feature, the affinity matrix Zi can be obtained.
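The individual updates of such an alternating scheme typically reduce to closed-form shrinkage operators, one per norm. The sketch below shows only these three building blocks, not a complete solver; the function names are hypothetical, NumPy is assumed, and the step sizes and the order in which a full ADM iteration would call them are not taken from the description.

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of tau * l1 norm: shrink each entry toward zero."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Proximal operator of tau * nuclear norm: shrink the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def column_shrink(X, tau):
    """Proximal operator of tau * l2,1 norm: shrink each column's length."""
    norms = np.linalg.norm(X, axis=0)
    # Columns with norm <= tau are set to zero; the others are scaled down.
    scale = np.maximum(norms - tau, 0.0) / np.where(norms > 0, norms, 1.0)
    return X * scale
```

In a solver of this kind, the nuclear-norm operator would act on the low-rank variable, the entrywise operator on the sparse auxiliary variable, and the column operator on the noise term Ei.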
It should be noted that, in this embodiment, the module of augmented representations is introduced as an example of low-rank and sparse representation. The augmented representation can integrate more knowledge into the affinity matrix, such as the spatial proximity between a pair of patches. For example, if a pair of patches is derived from the same 3D model, their spatial proximity is based on whether there is a common boundary between them. The concavity along the boundary and the length of the boundary are usually used to define the similarity. For a pair of patches from different 3D models, the two models should be aligned first, for example by using principal component analysis (PCA). Then, for the faces on the first patch, if closest faces exist on the second patch, the similarity between these two patches can be defined using the properties of the pairs of closest faces, such as areas, distances, etc. Thus, an extra matrix can be generated to describe the spatial proximity between any pair of patches, which can be integrated into the sparse and low-rank representation to form an augmented representation. However, it could be appreciated that other types of representation are also applicable.
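The PCA-based alignment mentioned above can be illustrated by the following sketch, which centers the vertices of a model and expresses them in the frame of their principal axes. The function name and the use of NumPy are assumptions. The sketch does not resolve the sign (reflection) ambiguity of the principal axes, which a practical alignment of two models would have to handle separately.

```python
import numpy as np

def pca_align(vertices):
    """Center a (V, 3) vertex array and rotate it onto its principal axes,
    ordered by decreasing variance. Axis signs are left unresolved."""
    centered = vertices - vertices.mean(axis=0)
    cov = centered.T @ centered / len(vertices)
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    axes = eigvecs[:, ::-1]            # reorder to descending variance
    return centered @ axes
```

After both models have been brought into such a common frame, closest-face pairs between patches can be searched for the proximity measure described above.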
At step S209, the patches are clustered with the fused sparse and low-rank representation. After the affinity matrix for each type of feature has been computed in the previous steps, a co-training method could be employed to update the affinity matrices in order to make the clusters from different views consistent. A paper of A. Kumar, H. Daumé III, entitled “A Co-Training Approach for Multi-View Spectral Clustering”, ICML, 2011 (hereinafter referred to as reference 6) proposed a multi-view spectral clustering method, which is utilized here to obtain the consistent segmentation by fusing multiple features.
In spectral clustering, the first Ki eigenvectors of the Laplacian matrix are the indicator vectors for the ith feature, which contain the discriminative information between clusters. The number Ki for different features can be the same or different. In this embodiment, the number Ki for each kind of feature is set equal to the number of parts K to be segmented. The indicator vectors for one feature can be used to improve the clusters from another feature. The process of multi-feature fusion is iterative. For each feature, a discriminative subspace can be spanned by the K eigenvectors. Then, for the other features, their affinity matrices can be projected onto the subspace, which discards the intra-cluster details that confuse the clustering while preserving the discriminative inter-cluster information. In each iteration, the subspaces derived from all the features are traversed. Finally, the K eigenvectors for each feature are concatenated column-wise to form a matrix UA, on which k-means clustering is performed to obtain the final clusters of patches.
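The iterative fusion can be sketched as below. This is a simplified illustration and not the exact procedure of the reference 6 or of the embodiment: the function names are hypothetical, NumPy is assumed, the input affinities are assumed symmetric and non-negative, at least two views are required, the subspaces of the other views are combined by summing their projectors, negative entries produced by the projection are clipped to zero, and a small deterministic k-means stands in for any standard implementation.

```python
import numpy as np

def top_eigenvectors(S, k):
    """Top-k eigenvectors of the normalized affinity D^-1/2 S D^-1/2."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(S.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh((M + M.T) / 2.0)  # eigenvalues ascending
    return vecs[:, -k:]

def co_train_embeddings(affinities, k, iterations=3):
    """Return UA: the k indicator vectors of every view, side by side.
    Requires at least two views."""
    U = [top_eigenvectors(S, k) for S in affinities]
    for _ in range(iterations):
        updated = []
        for v, S in enumerate(affinities):
            # Project view v's affinity onto the subspaces of the other views
            # (summing the projectors is an assumption of this sketch).
            proj = sum(U[w] @ U[w].T for w in range(len(U)) if w != v)
            S_new = proj @ S
            # Symmetrize and clip negatives so that degrees stay non-negative.
            updated.append(np.maximum((S_new + S_new.T) / 2.0, 0.0))
        U = [top_eigenvectors(S, k) for S in updated]
    return np.hstack(U)

def kmeans(X, k, iterations=20):
    """Tiny Lloyd's k-means with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iterations):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

A typical use would compute `UA = co_train_embeddings([Z_gc, Z_agd, Z_sdf], K)`, where the three symmetric affinity matrices are placeholders for the per-feature results of the step S207, and then call `kmeans` on the row-normalized `UA` to obtain one label per patch.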
Post-processing can be applied to the result of the step S209 to refine the segment boundaries. It could be appreciated that the post-processing is an optional step, for which conventional methods can be applied. No further details will be provided in this respect.
As described above, with the method for consistent segmentation of a set of 3D models according to an embodiment of the present invention, the consistent segmentation task is generally formulated as a multi-view spectral clustering task. First, each 3D model in the dataset of 3D models is over-segmented into a plurality of patches, which are used in the clustering algorithm to reduce the computational cost. Then, features on each 3D model are detected. For each feature, a low-rank and sparse graph representation is employed to obtain the affinity matrix that measures the similarity between patches. The affinity matrix can optionally be augmented with more knowledge, such as the spatial proximity among the patches of the 3D models. Each feature representation can be regarded as one view of the data. Finally, all the views are co-trained with each other, and the consistent segmentation result is obtained by the multi-view spectral clustering method. For each feature, the number of indicator eigenvectors can be determined adaptively during the co-training process.
The result of the embodiment of the invention was compared with the unsupervised method in the reference 2 and the supervised method in the reference 1 on five categories (Human, Airplane, Bird, Armadillo, Fourleg) from Princeton Segmentation Benchmark.
Another embodiment of the present invention provides a corresponding apparatus for consistent segmentation of a set of 3D models.
As shown in
The apparatus 1100 further comprises a feature detection unit 1103 for receiving the set of 3D models and computing at least one feature descriptor on each 3D model of the set of 3D models. Each computed feature descriptor should be able to be used in the segmentation of a single 3D model. Examples of the feature descriptor could be Gaussian Curvature (GC), average geodesic distance (AGD) and shape diameter function (SDF), etc.
The apparatus 1100 further comprises a feature analysis unit 1105 for receiving the results from the over-segmentation unit 1101 and the feature detection unit 1103 and defining a feature vector for each patch obtained from the over-segmentation unit 1101 over the feature descriptors computed on each 3D model by the feature detection unit 1103.
The apparatus 1100 further comprises a low rank and sparse representation unit 1107 for receiving the result from the feature analysis unit 1105 and calculating a low-rank and sparse representation by using each feature vector obtained by the feature analysis unit 1105. The low rank and sparse representation can be in the form of an affinity matrix of the similarity between a pair of patches for each feature descriptor. In addition, the affinity matrix can be augmented to integrate more knowledge into it.
The apparatus 1100 further comprises a clustering unit 1109 for receiving the result from the low rank and sparse representation unit 1107 and clustering the patches with fused sparse and low-rank representation obtained by the low rank and sparse representation unit 1107.
The apparatus 1100 can further comprise a post-processing unit (not shown) for receiving the result from the clustering unit 1109 and refining the segment boundary.
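The data flow between the units of the apparatus 1100 can be summarized by the following sketch, in which each unit is modeled as a callable. The class name, the field names and the callable signatures are hypothetical and are chosen only to mirror the units 1101 to 1109 and the optional post-processing; the description does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class CoSegmentationApparatus:
    over_segmentation_unit: Callable[[Any], Any]       # unit 1101
    feature_detection_unit: Callable[[Any], Any]       # unit 1103
    feature_analysis_unit: Callable[[Any, Any], Any]   # unit 1105
    representation_unit: Callable[[Any], Any]          # unit 1107
    clustering_unit: Callable[[Any], Any]              # unit 1109
    post_processing_unit: Optional[Callable[[Any], Any]] = None

    def run(self, models):
        # Units 1101 and 1103 both receive the set of 3D models.
        patches = self.over_segmentation_unit(models)
        descriptors = self.feature_detection_unit(models)
        # Unit 1105 combines their results into per-patch feature vectors.
        feature_vectors = self.feature_analysis_unit(patches, descriptors)
        affinities = self.representation_unit(feature_vectors)
        labels = self.clustering_unit(affinities)
        if self.post_processing_unit is not None:
            labels = self.post_processing_unit(labels)
        return labels
```

Modeling the units as interchangeable callables reflects the statement that the actual connections between components may differ depending on how the invention is programmed.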
It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof, for example, within any one or more of the plurality of 3D display devices or their respective driving devices in the system and/or with a separate server or workstation. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may either be part of the microinstruction code or part of the application program (or a combination thereof), which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.
It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures are preferably implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2013/077843 | 6/25/2013 | WO | 00 |