This application claims the benefit of Korean Patent Application No. 10-2004-0098147, filed on Nov. 26, 2004, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
1. Field of the Invention
The present invention relates to an apparatus and method for processing an image for facial recognition, an essential technology in biometrics, video surveillance, and multimedia retrieval systems, and more particularly, to an apparatus and method for processing an image based on layers.
2. Description of Related Art
Recently, a variety of methods for improving the performance of facial recognition have been suggested. One of the conventional methods, local feature analysis (LFA), has been introduced by P. S. Penev and J. J. Atick [“Local Feature Analysis: A General Statistical Theory for Object Representation,” Network: Computation in Neural Systems, Vol. 7, No. 3, pp. 477-500, 1996]. In LFA, sparsification, which reduces the dimension of an image and the correlation of the values obtained by LFA, is performed to reduce a reconstruction error rather than to improve the discrimination of a facial model, and thus the method is limited. Another one of the conventional methods, linear discriminant analysis (LDA), has been introduced by P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman [“Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection,” IEEE Trans. PAMI, Vol. 19, No. 7, pp. 711-720, July 1997].
In order to solve the small sample size (SSS) problem arising when LDA is applied to features obtained by LFA, and to improve the discrimination of the feature vector using LDA, another conventional method combining LFA and LDA has been introduced by Q. Yang, X. Ding, and Z. Chen [“Discriminant Local Feature Analysis of Facial Images,” IEEE Proc. ICIP, Spain, September 2003]. In this method, however, since the selection of features is designed not to improve the discrimination of a facial model but to minimize a reconstruction error, a structural problem still remains.
As another conventional method, local analysis, for example, component analysis, expresses only local characteristics, and thus a local minimum problem may occur. This component analysis has been introduced by T. Kim, H. Kim, W. Hwang, S. Kee, and J. Kittler [“Independent Component Analysis in a Facial Local Residue Space,” IEEE Proc. CVPR, Madison, USA, July 2003].
An aspect of the present invention provides an apparatus for processing an image based on layers in which an image is divided into a plurality of layers and basis matrices of the image are generated and used.
An aspect of the present invention also provides a method of processing an image based on layers by which an image is divided into a plurality of layers and basis matrices of the image are generated and used.
According to an aspect of the present invention, there is provided an apparatus for processing an image based on layers, the apparatus including: an image divider dividing an image into E (where E is a positive integer equal to or greater than 2) layers, each layer having at least one block; and first through E-th layer basis matrix generators respectively generating first through E-th layer basis matrices using the divided image and outputting a set of the first through E-th layer basis matrices as a final basis matrix, wherein the e-th (1≦e≦E) layer basis matrix generator, with respect to each block included in the e-th layer, generates a block model using a kernel matrix obtained by local feature analysis, multiplies a zero mean matrix generated from the divided image by the result of transposing the block model, calculates a between-class scatter matrix and a within-class scatter matrix by linear discriminant analysis using the multiplied result, calculates a discriminant transformation matrix using the calculated between-class scatter matrix and the calculated within-class scatter matrix, multiplies the discriminant transformation matrix by the block model, outputs the multiplied result as a subbasis matrix, and outputs a set of the subbasis matrices generated in all of the blocks included in the e-th layer as the e-th layer basis matrix, and wherein the numbers of blocks of the layers are different from each other.
According to another aspect of the present invention, there is provided a method of processing an image based on layers, the method including: dividing an image into E (where E is a positive integer equal to or greater than 2) layers, each layer having at least one block; and generating first through E-th layer basis matrices using the divided image and determining a set of the first through E-th layer basis matrices as a final basis matrix, wherein the generating of the e-th layer basis matrix comprises, with respect to each block included in the e-th layer, generating a block model using a kernel matrix obtained by local feature analysis, multiplying a zero mean matrix generated from the divided image by the result of transposing the block model, calculating a between-class scatter matrix and a within-class scatter matrix by linear discriminant analysis using the multiplied result, calculating a discriminant transformation matrix using the calculated between-class scatter matrix and the calculated within-class scatter matrix, multiplying the discriminant transformation matrix by the block model, outputting the multiplied result as a subbasis matrix, and outputting a set of the subbasis matrices generated in all of the blocks included in the e-th layer as an e-th layer basis matrix, and wherein the numbers of blocks of the layers are different from each other.
According to another aspect of the present invention, there is provided an image processing apparatus, including: an image divider dividing an image into E layers each having at least one block, E being a positive integer at least equal to 2; and first through E-th layer basis matrix generators respectively generating first through E-th layer basis matrices based on the divided image and outputting a set of the first through E-th layer basis matrices as a final basis matrix. An e-th layer basis matrix generator, for each block of an e-th layer, generates a block model using a kernel matrix obtained by local feature analysis, multiplies a zero mean matrix based on the divided image by a result of transposing the block model, calculates a between-class scatter matrix and a within-class scatter matrix by linear discriminant analysis based on the multiplied result, calculates a discriminant transformation matrix based on the between-class scatter matrix and the within-class scatter matrix, multiplies the discriminant transformation matrix by the block model, outputs the multiplied result as a subbasis matrix, and outputs a set of subbasis matrices generated in all of the blocks included in the e-th layer as the e-th layer basis matrix. e is a positive integer between 1 and E. The number of blocks differs for each layer.
Additional and/or other aspects and advantages of the present invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
These and/or other aspects and advantages of the present invention will become apparent and more readily appreciated from the following detailed description, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
In operation 50, the image divider 10 inputs an image through an input terminal IN1, divides the inputted image into E layers, and outputs the image divided into the E layers to the first through E-th layer basis matrix generators 12 to 16. In this case, each of the divided layers is composed of at least one block, and the layers have different numbers of blocks.
For example, the image divider 10 can divide the image inputted through the input terminal IN1 into a plurality of layers 70, 72, and 74, for example, as shown in
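Purely for illustration (this sketch is not the claimed apparatus), such an image divider can be modeled in a few lines of Python. The function name divide_into_layers, the use of NumPy, and the grid sizes (a 2×2 grid and a 4×4 grid of blocks, matching the example layers described below) are assumptions, as is the 32×32 image size used in the experiments described later.

    import numpy as np

    def divide_into_layers(image, grids=(2, 4)):
        # Divide one image into E = len(grids) layers; in the layer built from
        # grid size g, the image is covered by a g x g arrangement of blocks.
        h, w = image.shape
        layers = []
        for g in grids:
            bh, bw = h // g, w // g  # block height and width in this layer
            blocks = [image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
                      for r in range(g) for c in range(g)]
            layers.append(blocks)
        return layers

    image = np.random.rand(32, 32)               # stand-in for a normalized face image
    layer1, layer2 = divide_into_layers(image)   # 4 blocks of 16x16, 16 blocks of 8x8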
After operation 50, in operation 52, the first through E-th layer basis matrix generators 12 to 16 generate the first through E-th layer basis matrices using the divided image and output a set of the first through E-th layer basis matrices as a final basis matrix.
That is, the e-th layer basis matrix generator 14 generates a block model using a kernel matrix obtained by local feature analysis (LFA), multiplies a zero mean matrix (ZMM) generated from the divided image inputted from the image divider 10 by the result of transposing the block model, calculates a between-class scatter matrix and a within-class scatter matrix by linear discriminant analysis (LDA) using the multiplied result, calculates a discriminant transformation matrix using the calculated between-class scatter matrix and the calculated within-class scatter matrix, multiplies the discriminant transformation matrix by the block model, and outputs the multiplied result as a subbasis matrix. The e-th layer basis matrix generator 14 generates a subbasis matrix in each block included in the e-th layer and outputs a set of the subbasis matrices generated in all of the blocks included in the e-th layer as an e-th layer basis matrix.
Assuming that M learning images exist, that Ψi is an N-dimensional vector obtained by raster-scanning the i-th learning image, and that 1≦i≦M, general LFA will be described below.
First, a mean vector m of the M learning images is obtained using equation 1.
m=(Ψ1+Ψ2+ . . . +ΨM)/M (1)
As shown in equation 2, a zero mean vector xi with respect to the i-th learning image is obtained by subtracting the mean vector m from the i-th learning vector Ψi.
xi=Ψi−m (2)
The apparatus for processing an image based on layers arranges the zero mean vectors x1 through xM into a zero mean matrix X, as shown in equation 3.
X=[x1, . . . ,xM] (3)
In this case, a covariance matrix S is obtained using equation 4.
S=X·XT (4)
, where T is a transpose.
A series of kernels K may be defined using equation 5 with the use of eigen analysis, and the covariance matrix S expressed in equation 4 may be obtained using equation 6.
K=P·V·PT (5)
S=P·D·PT (6)
, where P is an eigenvector matrix, D is an eigenvalue matrix, and V is obtained using equation 7.
, where diag( ) denotes a diagonal matrix, λi is an i-th eigenvalue of the covariance matrix S, Fi is obtained using equation 8, and low-pass filtering is performed using Fi.
, where n is a specified number and may be 0.25, for example. As a result, an output kernel matrix K is obtained using equation 9.
K=[k1, . . . ,kN] (9)
Columns of the output kernel matrix K shown in equation 9 have spatially local features.
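The kernel construction of equations 1 through 9 can be sketched as follows. Because equations 7 and 8 (the matrix V and the low-pass factor Fi) are not reproduced above, this sketch substitutes a common LFA whitening choice, V=diag(1/√(λi+n)), using the constant n mentioned in the text as a regularizer; that substitution, the random stand-in data, and the NumPy implementation are assumptions for illustration only.

    import numpy as np

    N, M = 1024, 100                       # e.g., 32x32 images and M learning images
    Psi = np.random.rand(N, M)             # columns: raster-scanned learning images
    m = Psi.mean(axis=1, keepdims=True)    # mean vector (equation 1)
    X = Psi - m                            # zero mean matrix (equations 2 and 3)

    S = X @ X.T                            # covariance matrix (equation 4)
    lam, P = np.linalg.eigh(S)             # eigen analysis S = P D P^T (equation 6)
    lam, P = lam[::-1], P[:, ::-1]         # reorder to decreasing eigenvalues
    n = 0.25                               # the specified number mentioned above
    V = np.diag(1.0 / np.sqrt(np.maximum(lam, 0.0) + n))  # stand-in for equation 7
    K = P @ V @ P.T                        # kernel matrix (equation 5); columns k1..kN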
General LDA will now be schematically described below.
Traditional LDA is performed using a between-class scatter matrix SB and a within-class scatter matrix SW obtained using equations 10 and 11.
SB=Σi=1c Mi·(mi−m)·(mi−m)T (10)
SW=Σi=1c Σx∈ci (x−mi)·(x−mi)T (11)
, where Mi is the number of image samples with respect to an i-th class, c is the total number of classes, mi is a mean image of the i-th class having the Mi samples, m is the total mean image, and a projection vector W satisfying the basic concept of LDA is obtained using equation 12.
W=argmaxW(|WT·SB·W|/|WT·SW·W|) (12)
, where arg max (or argmax) stands for the argument of the maximum, that is, the value of the given argument for which the given expression attains its maximum value; see http://en.wikipedia.org/wiki/Arg_max.
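A minimal sketch of equations 10 through 12 follows. SciPy's generalized symmetric eigensolver is used because the columns of W that maximize equation 12 are the leading eigenvectors of SB·w=λ·SW·w; the small ridge added to SW for numerical stability is an implementation assumption, not part of the described method.

    import numpy as np
    from scipy.linalg import eigh

    def lda_projection(X, labels, num_axes):
        # X: columns are samples; labels[j]: class of column j.
        dim = X.shape[0]
        m = X.mean(axis=1, keepdims=True)                # total mean
        Sb = np.zeros((dim, dim))
        Sw = np.zeros((dim, dim))
        for c in np.unique(labels):
            Xc = X[:, labels == c]
            mc = Xc.mean(axis=1, keepdims=True)          # class mean m_i
            Sb += Xc.shape[1] * (mc - m) @ (mc - m).T    # equation 10
            Sw += (Xc - mc) @ (Xc - mc).T                # equation 11
        evals, evecs = eigh(Sb, Sw + 1e-6 * np.eye(dim)) # equation 12
        order = np.argsort(evals)[::-1]                  # largest eigenvalues first
        return evecs[:, order[:num_axes]]                # projection matrix W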
The first through Q-th subbasis matrix generators 100, 102, . . . , 104, . . . , and 106 respectively generate first through Q-th subbasis matrices with respect to the blocks included in a layer, as described below.
In operation 138, the block model generator 118 generates a block model using the kernel matrix obtained by LFA and outputs the generated block model to the model transposing unit 120 and the second multiplier 128.
After operation 138, in operation 140, the model transposing unit 120 transposes the block model generated by the block model generator 118 and outputs the transposed block model to the first multiplier 122.
After operation 140, in operation 142, the first multiplier 122 multiplies the zero mean matrix X inputted from the subtracter 22 through an input terminal IN4 by the transposed block model LgrT inputted from the model transposing unit 120, as shown in equation 13, and outputs the multiplied result Ygr to the scatter matrix calculator 124.
Ygr=LgrTX (13)
After operation 142, in operation 144, the scatter matrix calculator 124 calculates a between-class scatter matrix SgrB and a within-class scatter matrix SgrW using the result Ygr multiplied by the first multiplier 122 and outputs the calculated between-class scatter matrix SgrB and within-class scatter matrix SgrW to the transformation matrix calculator 126. For example, the scatter matrix calculator 124 calculates the between-class scatter matrix SgrB and the within-class scatter matrix SgrW using the above-described equations 10 and 11, as shown in equations 14 and 15.
SgrB=Σi=1c Mi·(mgri−mgr)·(mgri−mgr)T (14)
SgrW=Σi=1c Σy∈ci (y−mgri)·(y−mgri)T (15)
, where Ygri is the result multiplied by the first multiplier 122 with respect to the i-th class, mgri is a mean vector of Ygri in the i-th class, mgr is a total mean vector of results multiplied by the first multiplier 122, and ci is the i-th class.
After operation 144, in operation 146, the transformation matrix calculator 126 calculates a discriminant transformation matrix Wgr using the between-class scatter matrix SgrB and the within-class scatter matrix SgrW inputted from the scatter matrix calculator 124 and outputs the calculated discriminant transformation matrix Wgr to the second multiplier 128. For example, the transformation matrix calculator 126 calculates the discriminant transformation matrix Wgr using the above-described equation 12, as shown in equation 16.
Wgr=argmaxW(|WT·SgrB·W|/|WT·SgrW·W|) (16)
After operation 146, in operation 148, the second multiplier 128 multiplies the discriminant transformation matrix Wgr generated by the transformation matrix calculator 126 by the block model Lgr generated by the block model generator 118 and outputs the multiplied result as a q-th subbasis matrix through an output terminal OUT3.
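Assuming the block model Lgr is available as a matrix whose columns are the local kernels of the block, and reusing the hypothetical lda_projection() sketched earlier, operations 138 through 148 for one block reduce to a few lines:

    def subbasis_for_block(L_gr, X, labels, num_axes):
        Y_gr = L_gr.T @ X                              # operation 142 (equation 13)
        W_gr = lda_projection(Y_gr, labels, num_axes)  # operations 144-146 (equations 14-16)
        return L_gr @ W_gr                             # operation 148: subbasis V_gr = L_gr * W_gr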
For example, assume that the image divider 10 divides the image into the two layers 70 and 72, that is, E=2. The block model generator 118 generates a block model Lgr of each block included in the first layer 70 using equation 17, and generates a block model lgr of each block included in the second layer 72 using equation 18.
, where (u,v) is a spatial position in each layer 70 or 72, w and h are the width and height of each layer 70 or 72, respectively, and K(u,v) is ku+v×w, a column of the above-described kernel matrix K.
Each block model expressed in equation 17 has N/4 local kernels, and each block model expressed in equation 18 has N/16 local kernels.
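Equations 17 and 18 are not reproduced above, so the following sketch only assumes, from the surrounding description, that the block model of the block at grid position (g,r) collects the kernel columns K(u,v)=k(u+v×w) whose pixel positions (u,v) fall inside that block; for a 32×32 image this yields N/4 columns per block on a 2×2 grid and N/16 on a 4×4 grid, consistent with the counts stated above.

    def block_model(K, img_w, img_h, grid, g, r):
        # K: N x N kernel matrix; (g, r): horizontal and vertical block indices.
        bw, bh = img_w // grid, img_h // grid
        cols = [u + v * img_w                  # raster-scan index of pixel (u, v)
                for v in range(r * bh, (r + 1) * bh)
                for u in range(g * bw, (g + 1) * bw)]
        return K[:, cols]                      # columns: local kernels of the block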
In this case, the first layer basis matrix generator 12 outputs a first layer basis matrix V expressed in equation 19.
V=[V11,V12,V21,V22] (19)
, where the first, second, third, and fourth subbasis matrices V11, V12, V21, and V22 with respect to the first layer 70 are obtained using equation 20.
V11=L11W11
V12=L12W12
V21=L21W21
V22=L22W22 (20)
Similarly, the second layer basis matrix generator 16 outputs a second layer basis matrix v expressed in equation 21.
v=[v11,v12, . . . ,v44] (21)
, where the first through 16-th subbasis matrices v11 to v44 with respect to the second layer 72 are obtained using equation 22.
v11=l11w11
v12=l12w12
. . .
v44=l44w44 (22)
As a result, a final basis matrix W, which is a set of the first and second layer basis matrices V and v outputted from the first and second layer basis matrix generators 12 and 16, is expressed in equation 23.
W=[V,v] (23)
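Continuing the hypothetical sketches above (with X and K as defined there), the final basis matrix of equation 23 is the column-wise concatenation of the subbasis matrices of every block of every layer. The class labels below are random stand-ins, and 33 discriminant axes per block are assumed only to match the feature counts reported in the experiments described later.

    labels = np.random.randint(0, 34, size=X.shape[1])  # hypothetical class labels
    V_blocks = [subbasis_for_block(block_model(K, 32, 32, 2, g, r), X, labels, 33)
                for r in range(2) for g in range(2)]    # first layer 70 (equations 19-20)
    v_blocks = [subbasis_for_block(block_model(K, 32, 32, 4, g, r), X, labels, 33)
                for r in range(4) for g in range(4)]    # second layer 72 (equations 21-22)
    W_final = np.hstack(V_blocks + v_blocks)            # equation 23: W = [V, v], here 1024 x 660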
According to an embodiment of the present invention, the apparatus for processing an image is implemented by only the image divider 10 and the first through E-th layer basis matrix generators 12 to 16, which generate a final basis matrix from a divided image.
According to another embodiment of the present invention, the apparatus for processing an image based on layers may further include the mean vector calculator 20 and the subtracter 22 and may further generate a zero mean matrix from an inputted image.
According to another embodiment of the present invention, the apparatus for processing an image based on layers may further include the matrix transposing unit 18 and the feature matrix calculator 24 and may further generate a feature matrix from a final basis matrix, as will be described below.
After operation 52, in operation 54, the matrix transposing unit 18 transposes the final basis matrix generated by the first through E-th layer basis matrix generators 12, . . . , 14, . . . , and 16 and outputs the transposed final basis matrix to the feature matrix calculator 24. After operation 54, in operation 56, the feature matrix calculator 24 multiplies a zero mean matrix X inputted from the subtracter 22 by the result transposed by the matrix transposing unit 18, as shown in equation 24, and outputs the multiplied result as a feature matrix.
fi=WfTX (24)
, where fi is a feature matrix with respect to an i-th class, and Wf is a final basis matrix.
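Under the same hypothetical sketch, equation 24 (and its per-layer forms, equations 26 and 27 below) is a single matrix product:

    F = W_final.T @ X                        # equation 24: one feature column per learning image
    psi_new = np.random.rand(1024)           # a new raster-scanned probe image (stand-in)
    f_new = W_final.T @ (psi_new - m[:, 0])  # its 660-dimensional feature vector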
If the image is divided into the two layers 70 and 72, the total number of extracted feature vectors is expressed in equation 25.
(2×2)·k1+(4×4)·k2 (25)
, where k1 is the number of feature vectors extracted in each block of the first layer 70, and k2 is the number of feature vectors extracted in each block of the second layer 72. A feature vector of each block included in the first layer 70 is obtained using equation 26.
fgr1=WgrT(LgrT(Ψ−m))=(LgrWgr)T(Ψ−m)=VgrT(Ψ−m) (26)
, where fgr1 is a feature vector with respect to the first layer 70, and Ψ is an image inputted through the input terminal IN1. Similarly, a feature vector of each block included in the second layer 72 is obtained using equation 27.
fgr2=wgrT(lgrT(Ψ−m))=(lgrwgr)T(Ψ−m)=vgrT(Ψ−m) (27)
, where fgr2 is a feature vector with respect to the second layer 72.
As described above, a procedure for creating a final basis matrix or obtaining a feature matrix using the generated final basis matrix is referred to as a learning procedure.
According to another embodiment of the present invention, the apparatus for processing an image based on layers according to the present invention may further include the storage unit 26, the correlation calculator 28, the comparator 30, and the correlation determining unit 32 and may further recognize a correlation between two images.
In operation 160, the feature matrix calculator 24 calculates feature matrices with respect to previous images as described previously, and the storage unit 26 stores the feature matrices calculated by the feature matrix calculator 24 with respect to the previous images.
After operation 160, in operation 162, the feature matrix calculator 24 calculates feature matrices with respect to current images as described previously, and outputs feature matrices with respect to the calculated current images to the correlation calculator 28.
After operation 162, in operation 164, the correlation calculator 28 calculates a final correlation between the feature matrices outputted from the feature matrix calculator 24 with respect to the current images and the feature matrices read out from the storage unit 26 with respect to the previous images and outputs the calculated final correlation to the comparator 30.
The first through E-th correlation calculators 180 to 186 respectively calculate first through E-th correlations between the feature matrices of the previous images and the feature matrices of the current images with respect to the first through E-th layers, using equation 28.
, where ∥ ∥ denotes a norm, Se(a,b) is an e-th correlation between a previous image a and a current image b with respect to an e-th layer, and Wgr is a discriminant transformation matrix obtained using equation 29.
In equation 28, (fgre)a is a feature vector of a block placed at a g-th position in a horizontal direction and at an r-th position in a vertical direction on an e-th layer of an image a, and is the result of multiplying VgrT by a zero mean vector. Here, VgrT is the result of transposing the product of the block model of the block placed at the position (g,r) on the e-th layer and the discriminant transformation matrix. Similarly, (fgre)b is a feature vector of the block placed at the g-th position in the horizontal direction and at the r-th position in the vertical direction on the e-th layer of an image b. When E=2, (fgr1)a [or (fgr1)b] with respect to a first layer of each of the images a and b is obtained using equation 26, and (fgr2)a [or (fgr2)b] with respect to a second layer of each of the images a and b is obtained using equation 27. In this case, the feature matrix calculated by the feature matrix calculator 24 is composed of G×R feature vectors.
In addition, Z of equation 28 is a normalized correlation, that is, a value ranging from ‘+1’ to ‘−1’ produced from the cosine of the angle between the two vectors (fgre)a and (fgre)b. As Z becomes closer to ‘+1’ [cos(0°)=1], the two images a and b become more similar to each other with respect to the e-th layer, and as Z becomes closer to ‘−1’ [cos(180°)=−1], the two images a and b become less similar to each other with respect to the e-th layer.
The synthesizing unit 188 synthesizes first through E-th correlations [S1(a,b), S2(a,b), . . . , Se(a,b), . . . and SE(a,b)] respectively calculated by the first through E-th correlation calculators 180 to 186, and outputs the synthesized result as a final correlation [S(a,b)] to the comparator 30 through an output terminal OUT4.
After operation 164, in operation 166, the comparator 30 compares the final correlation calculated by the correlation calculator 28 with a specified value and outputs the compared result to the correlation determining unit 32. That is, the comparator 30 determines whether the final correlation calculated by the correlation calculator 28 is equal to or greater than the specified value.
If it is recognized through the compared result that the final correlation between the two images is equal to or greater than the specified value, in operation 168, the correlation determining unit 32 determines that there is a correlation between the previous image and the current image. That is, the correlation determining unit 32 recognizes that the previous image and the current image are similar to each other.
However, if it is recognized through the compared result that the final correlation between the two images is smaller than the specified value, in operation 170, the correlation determining unit 32 determines that there is no correlation between the previous image and the current image. That is, the correlation determining unit 32 recognizes that the previous image and the current image are not similar to each other.
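Because equations 28 and 29 are not reproduced above, the following sketch of operations 164 through 170 assumes the simplest reading consistent with the description: the per-block normalized correlation Z is the cosine of the angle between two feature vectors, the e-th correlation Se(a,b) averages Z over the blocks of the e-th layer, the synthesizing unit averages the layer correlations, and the specified value is an arbitrary threshold. All of these choices are illustrative assumptions.

    import numpy as np

    def cosine(a, b):                          # normalized correlation Z in [-1, +1]
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def final_correlation(feats_a, feats_b):
        # feats_x[e]: list of per-block feature vectors of image x on layer e.
        per_layer = [np.mean([cosine(fa, fb) for fa, fb in zip(la, lb)])
                     for la, lb in zip(feats_a, feats_b)]   # S1(a,b), ..., SE(a,b)
        return float(np.mean(per_layer))       # synthesized final correlation S(a,b)

    feats_a = [[np.random.rand(33) for _ in range(4)],
               [np.random.rand(33) for _ in range(16)]]     # stand-in previous image
    feats_b = [[np.random.rand(33) for _ in range(4)],
               [np.random.rand(33) for _ in range(16)]]     # stand-in current image
    threshold = 0.5                            # hypothetical specified value
    correlated = final_correlation(feats_a, feats_b) >= threshold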
As described above, a procedure for recognizing a correlation between two images using a feature matrix is referred to as a recognition procedure.
When the apparatus and method for processing an image based on layers according to the above-described embodiments of the present invention are used for facial recognition, a facial image may be detected from the entire input image including the entire face, the detected facial image may be normalized, the normalized facial image may be pre-processed, and the pre-processed facial image may be inputted through the input terminal IN1 of the apparatus for processing an image based on layers.
The performance of the apparatus and method for processing an image based on layers according to the above-described embodiments of the present invention that can be used for facial recognition will now be described below with reference to the attached drawings.
The performance of the apparatus and method for processing an image based on layers according to the above-described embodiments of the present invention can be evaluated with respect to three different test sets, that is, a “light subset”, a “pose subset”, and the “XM2VTS database”. Here, the “light subset” and the “pose subset” are databases generated from the pose, illumination, and expression (PIE) database developed at Carnegie Mellon University and introduced by T. Sim, S. Baker, and M. Bsat [“The CMU Pose, Illumination, and Expression (PIE) Database,” International Conference on Automatic Face and Gesture Recognition, May 2002, pp. 53-58]. In addition, the “XM2VTS database” is introduced by K. Messer, J. Matas, J. Kittler, and K. Jonsson [“XM2VTSDB: The Extended M2VTS Database,” Audio and Video-based Biometric Person Authentication, March 1999, pp. 72-77].
Specifically, the “light subset” has 1,496 images of the overall face with a neutral expression under varying illumination. The “pose subset” has 1,020 images having a neutral expression under neutral illumination, with a pose change limited to ±22.5°. The “XM2VTS database” has 2,360 frontal facial images and includes diverse changes in illumination, expression, elapsed time, and so on.
All of the images included in the databases are normalized using manually located eye positions and adjusted to have a size of 32×32 pixels, and the backgrounds of the images are masked.
In this case, in order to obtain proper subspaces, 34 individuals are randomly selected from each of the “light subset” and the “pose subset” as a learning set. The other 34 subjects from each of the “light subset” and the “pose subset” are used as a test set, and the “XM2VTS database” is used only as a test set. Rank order statistics, indicated by a graph such as a cumulative match characteristic (CMC) curve, are used as the criterion for evaluating the performance of facial recognition.
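For reference, rank order statistics of this kind can be computed as in the short sketch below; the convention that a higher final correlation is a better match, and the NumPy implementation, are assumptions.

    import numpy as np

    def cmc_curve(scores, true_gallery):
        # scores[i, j]: final correlation of probe i with gallery subject j;
        # true_gallery[i]: index of the correct gallery subject for probe i.
        ranks = np.array([int(np.where(np.argsort(-row) == t)[0][0])
                          for row, t in zip(scores, true_gallery)])
        n_gallery = scores.shape[1]
        # CMC value at rank k: fraction of probes whose correct subject
        # appears among the top k matches.
        return np.array([np.mean(ranks < k) for k in range(1, n_gallery + 1)])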
Table 1 shows the overall recognition rates of two PCLDA variants, PCLDA-1 and PCLDA-2, and of the apparatus and method for processing an image based on layers according to the present invention.
Each of PCLDA-1 and PCLDA-2 has 33 features, whereas the present invention has 660 (33×4+33×16) features. PCLDA-1 is excessively adjusted to the learned changes in the PIE database, and there is a large difference in performance between PCLDA-1 and PCLDA-2 on the “light subset”; this difference does not appear on the “XM2VTS database”. That is, while traditional PCLDA is easily overfitted by a learned change and shows bad performance with respect to an unlearned change, the present invention consistently shows good results on all of the test sets, and in particular, the increase in performance on the “XM2VTS database” is worthy of close attention.
As described previously, in the apparatus and method for processing an image based on layers according to the above-described embodiments of the present invention, an image is divided into a plurality of layers, and linear discriminant analysis (LDA) is used in each block so as to determine which block among the blocks included in each of the divided layers is important for facial recognition, instead of sparsification. That is, in the above-described embodiments of the present invention, local feature analysis (LFA) is adopted so as to express a facial image in each of a plurality of (local) blocks using block models, and LDA is adopted so as to improve the discrimination of each block model. A block of each divided layer, that is, a block of local features, can express its own local feature and holistic facial information simultaneously. Thus, in the above-described embodiments of the present invention, since blocks of local features are used, the small sample size (SSS) problem can be easily solved, and since a basis matrix is generated using LDA, information important for recognition (not for expression) can be extracted. Further, many feature vectors can be extracted from different layers, at separate viewpoints, with respect to one facial image. In addition, two different feature spaces extracted from different ranges with respect to the same character can be made; for example, a first layer into which an image is divided can be used for low-frequency analysis, and a second layer can be used for high-frequency analysis.
In the apparatus and method for processing an image based on layers according to the above-described embodiments of the present invention, without the use of a special sparsification scheme as in LFA, an image is divided into a plurality of layers and basis matrices are generated, so that the correlation of LFA can be reduced and several feature vectors can be obtained for each layer and block without causing an SSS problem. Since a final basis matrix is generated using LDA, feature matrices having high discrimination can be generated, and an image, in particular a facial image, can be better recognized using those feature matrices. A stable recognition performance can be provided even with respect to characteristics that were not present in the learning procedure for generating the basis matrices. In particular, compared with conventional PCLDA, a facial model having a sufficient dimension can be expressed even when the number of feature vectors increases in a limited learning database, overfitting with respect to a change not present in the learning procedure can be coped with, and a more improved facial recognition performance can be provided. In other words, performance degradation caused by an unlearned change can be prevented. In addition, unlike conventional PCLDA, which adopts holistic analysis and thus may be affected by a spatially local change when the overall face is recognized, an image is here divided into layers and processed, so that local information as well as holistic information can be analyzed in a facial model having a remarkable local block feature. That is, the effect of the holistic facial image can be considered simultaneously with the emphasis of a local block, so that the probability of falling into a local minimum can be reduced and a robust facial recognition performance can be provided.
Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Number | Date | Country | Kind
---|---|---|---
10-2004-0098147 | Nov. 26, 2004 | KR | national