This application claims priority to Korean Patent Application No. 10-2012-0106685 filed on Sep. 25, 2012 in the Korean Intellectual Property Office (KIPO), the entire contents of which are hereby incorporated by reference.
1. Technical Field
Example embodiments of the present invention relate in general to a learning method and a learning apparatus and more specifically to a learning method using extracted data features that may provide high recognition performance and a learning apparatus thereof.
2. Related Art
From the point of view of pattern recognition or machine learning, sex recognition may be seen as a problem of binary classification of distinguishing between men and women.
On the other hand, age recognition may be seen as a problem of multi-classification of distinguishing among pre-teens, teens, and those in their twenties, thirties, forties, fifties, sixties, and seventies or older, or as a problem of regression of estimating the age in detail in units of one year, such as 11 years old or 23 years old. In addition, pose recognition, which recognizes the vertical and horizontal directions of a user's face based on face image data of the user, may also be seen as a problem of multi-classification or regression.
Pose classification, in which the angle of the user's face is approximated to −80 degrees, −60 degrees, −40 degrees, −20 degrees, 0 degrees, +20 degrees, +40 degrees, +60 degrees, or +80 degrees depending on the vertical and horizontal directions of the user's face and then estimated, may also be seen as a problem of multi-classification. On the other hand, estimating the angle of the user's face as a continuous value, such as +11 degrees or −23 degrees, may be seen as a problem of regression.
A regression analyzer or a classifier is configured in the form of a function in which an input value and an output value are connected. A process of determining this connection between the input value and the output value of the function using data prepared in advance may be referred to as learning (or training), and the data used for the learning may be referred to as learning (training) data.
The learning data consists of input values and target values (or desired outputs) with respect to the input values. For example, in the case of age recognition or pose recognition using face image information, the face image information corresponds to the input values, and the ages or poses (face orientation angles) of the corresponding face images correspond to the target values.
The learning process is performed by adjusting the parameters of the function constituting the regression analyzer or classifier, that is, by adjusting or optimizing the parameter values so that the output values of the function with respect to the input values coincide with the target values as much as possible.
Meanwhile, in order to simplify the learning process or improve accuracy of estimation, a feature extraction process has been introduced, and studies on the feature extraction process have been continuously made.
Accordingly, example embodiments of the present invention are provided to substantially obviate one or more problems due to limitations and disadvantages of the related art.
Example embodiments of the present invention provide a learning method using extracted data features in order to simplify a learning process or improve accuracy of estimation.
Example embodiments of the present invention also provide a learning apparatus using extracted data features in order to simplify a learning process or improve accuracy of estimation.
In some example embodiments, a learning method using extracted data features, which is performed in a learning device, includes: dividing input learning data into two groups based on a predetermined reference; extracting data features for distinguishing the two divided groups; and performing learning using the extracted data features.
Here, after the extracting, when there is a group required to be divided into sub-groups among the two groups, the learning method may further include dividing the group required to be divided into the sub-groups; and extracting data features for distinguishing the divided sub-groups.
Here, the extracting of the data features for distinguishing the two divided groups may include setting one group of the two divided groups as a class 1 and setting the other group thereof as a class 2, acquiring a variance between the class 1 and the class 2 and a projection vector for enabling a ratio of the variance between the class 1 and the class 2 to be a maximum value, and extracting the data features by projecting the input learning data to the acquired projection vector.
Here, the extracting of the data features for distinguishing the two divided groups may include extracting candidate features for the input learning data, assigning a weight to individual data included in the input learning data, selecting a part of the individual data in accordance with the weight assigned to the individual data, learning classifiers for classifying the two groups using the part of the individual data with respect to each of the candidate features, calculating accuracy of the classifiers based on the input learning data and the weight assigned to the individual data, selecting the classifier having the highest accuracy as the classifier having the highest classification performance, and extracting the candidate features used in learning the classifier having the highest classification performance as the data features for distinguishing the two groups.
Here, the extracting of the data features for distinguishing the two divided groups may further include reducing the weight of the individual data classified by the classifier having the highest classification performance, and increasing the weight of the individual data excluding the classified individual data, determining whether the data features for distinguishing the two groups are output by the number of the data features set in advance, and repeatedly performing the process from the selecting of the part of the individual data to the determining until the data features for distinguishing the two groups are extracted by the number of the data features set in advance when the data features are determined not to be extracted by the number of the data features set in advance.
Here, in the selecting of the part of the individual data, individual data assigned a higher weight may have a higher probability of being selected.
Here, the extracting of the data features for distinguishing the two divided groups may include extracting the data features for distinguishing the two divided groups through at least one of an image filter, a texture expression method, wavelet analysis, a Fourier transform, a dimension reduction method, and a feature extraction means.
Here, after the performing of the learning, the learning method may further include inputting face image data to a result of the performing of the learning to thereby extract an age or a pose corresponding to the face image data.
In other example embodiments, a learning apparatus using extracted data features, includes: a learning data providing unit that provides input learning data; a feature extraction unit that divides the learning data into two groups based on a predetermined reference, and extracts data features for distinguishing the two divided groups to thereby provide the extracted data features; and a processing unit that performs learning using the extracted data features.
Here, when there is a group required to be divided into sub-groups among the two groups, the feature extraction unit may divide the group required to be divided into the sub-groups, and extract data features for distinguishing the divided sub-groups to thereby provide the extracted data features to the processing unit.
Here, the feature extraction unit may set one group of the two divided groups as a class 1 and sets the other group thereof as a class 2, acquire a variance between the class 1 and the class 2 and a projection vector for enabling a ratio of the variance between the class 1 and the class 2 to be a maximum value, and then extract the data features by projecting the input learning data to the acquired projection vector.
Here, the feature extraction unit may extract the data features for distinguishing the two divided groups through at least one of an image filter, a texture expression method, wavelet analysis, a Fourier transform, a dimension reduction method, and a feature extraction means.
Here, when face image data is provided from the learning data providing unit, the processing unit may input the face image data to a result obtained by performing the learning to thereby extract an age or a pose corresponding to the face image data.
Example embodiments of the present invention will become more apparent by describing in detail example embodiments of the present invention with reference to the accompanying drawings, in which:
Example embodiments of the present invention are disclosed herein. However, specific structural and functional details disclosed herein are merely representative for purposes of describing example embodiments of the present invention. Example embodiments of the present invention may be embodied in many alternate forms and should not be construed as limited to the example embodiments set forth herein.
Accordingly, while the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the invention to the particular forms disclosed, but on the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like numbers refer to like elements throughout the description of the figures.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present invention. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
It will be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected” or “directly coupled” to another element, there are no intervening elements present. Other words used to describe the relationship between elements should be interpreted in a like fashion (i.e., “between” versus “directly between,” “adjacent” versus “directly adjacent,” etc.).
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
It should also be noted that in some alternative implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
Hereinafter, it is assumed that learning data is composed of a plurality of unit data, and individual data is composed of a pair of input data and a target value. For example, face image data in age recognition and pose (face orientation angle) recognition using face image information corresponds to the input data, and the age or the pose corresponds to the target value.
Referring to the flowchart of the learning method, in step S110, the learning apparatus divides input learning data into two groups based on a predetermined reference.
In step S120, the learning apparatus selects or extracts data features for readily distinguishing the two groups divided in step S110.
Next, in step S130, the learning apparatus determines whether the two divided groups are required to be divided into sub-groups.
In step S140, the learning apparatus divides the two groups into the sub-groups when it is determined through step S130 that the two divided groups are required to be divided into the sub-groups, and repeatedly performs step S120.
Alternatively, in step S150, the learning apparatus performs learning using the data features extracted in step S120 when it is determined in step S130 that the two divided groups are not required to be divided into the sub-groups.
Here, the learning apparatus may not be required to use all of the extracted data features, and may use data features which are selectively extracted in accordance with a configuration of the learning apparatus.
A case in which the learning apparatus using the extracted data features according to an embodiment of the present invention divides the two divided groups in half based on a target value has been described, but according to another embodiment of the present invention, the divided groups need not have the same number of sub-groups.
Referring to
Specifically, the learning apparatus divides (1-1) the entire learning data group [1, 2, 3, 4, 5, 6, 7, 8] into a group [1, 2, 3, 4] and a group [5, 6, 7, 8] through first division, and selects or extracts features for readily distinguishing input data included in the group [1, 2, 3, 4] and input data included in the group [5, 6, 7, 8].
In addition, the learning apparatus respectively divides the group [1, 2, 3, 4] and the group [5, 6, 7, 8] into two groups through second division. That is, the learning apparatus divides (2-1) the group [1, 2, 3, 4] into a group [1, 2] and a group [3, 4], and divides (2-2) the group [5, 6, 7, 8] into a group [5, 6] and a group [7, 8].
Next, the learning apparatus selects or extracts features for readily distinguishing input data included in the group [1, 2] and input data included in the group [3, 4]. In addition, the learning apparatus selects or extracts features for readily distinguishing input data included in the group [5, 6] and input data included in the group [7, 8].
In addition, the learning apparatus divides (3-1) the group [1, 2] into a group [1] and a group [2] through third division, and selects or extracts features for readily distinguishing input data included in the group [1] and input data included in the group [2].
By repeatedly performing the above-described process, the learning apparatus respectively divides (3-2) the group [3, 4] into a group [3] and a group [4], divides (3-3) the group [5, 6] into a group [5] and a group [6], and divides (3-4) the group [7, 8] into a group [7] and a group [8], and extracts or selects features for readily distinguishing the divided groups.
The above-described first to third divisions are performed by dividing in half with respect to the target values for the convenience of description, and the divided groups need not have the same number of sub-groups.
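As a rough illustration of this stepwise division, a minimal sketch is given below; the recursive halving, the list-of-groups representation, and the extract_features hook are illustrative assumptions rather than the claimed implementation, and, as noted above, the split need not be in half.

```python
# Illustrative sketch: stepwise binary division of target-value groups with
# feature extraction at every division. Placeholder logic only.

def divide_and_extract(groups, extract_features, results=None):
    """Recursively split an ordered list of target-value groups into two halves
    and extract features that distinguish the two halves at every level."""
    if results is None:
        results = []
    if len(groups) < 2:
        return results                          # a single group needs no further division
    mid = len(groups) // 2                      # halved here only for convenience of description
    left, right = groups[:mid], groups[mid:]
    results.append(extract_features(left, right))   # features distinguishing the two halves
    divide_and_extract(left, extract_features, results)
    divide_and_extract(right, extract_features, results)
    return results

# Example: the eight target-value groups [1] to [8] used in the description above.
features = divide_and_extract(
    [[1], [2], [3], [4], [5], [6], [7], [8]],
    extract_features=lambda a, b: ("features distinguishing", a, "and", b),
)
```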
Referring to
In
First, the learning apparatus divides (1-1) the entire learning data group [0, 10, 20, 30, 40, 50, 60, 70] into a group [0, 10, 20, 30] and a group [40, 50, 60, 70] through the first division, and selects or extracts features for readily distinguishing face image data included in the group [0, 10, 20, 30] and face image data included in the group [40, 50, 60, 70].
In addition, the learning apparatus respectively divides the group [0, 10, 20, 30] and the group [40, 50, 60, 70] into two groups through second division. That is, the learning apparatus divides (2-1) the group [0, 10, 20, 30] into a group [0, 10] and a group [20, 30], and divides (2-2) the group [40, 50, 60, 70] into a group [40, 50] and a group [60, 70].
Next, the learning apparatus selects or extracts features for readily distinguishing face image data included in the group [0, 10] and face image data included in the group [20, 30]. In addition, the learning apparatus selects or extracts features for readily distinguishing the group [40, 50] and the group [60, 70].
In addition, the learning apparatus divides (3-1) the group [0, 10] into a group [0] and a group [10], and selects or extracts features for readily distinguishing face image data included in the group [0] and face image data included in the group [10].
By repeatedly performing the same process, the learning apparatus divides (3-2) the group [20, 30] into a group [20] and a group [30], divides (3-3) the group [40, 50] into a group [40] and a group [50], divides (3-4) the group [60, 70] into a group [60] and a group [70], and extracts or selects features for readily distinguishing the divided groups.
In addition, the learning apparatus may repeatedly perform the above-described process while dividing a corresponding group into sub-groups, as necessary.
Referring to
As described through
The extracting or selecting of the data features which has been described through
Alternatively, the extracting or selecting of the data features may be performed through optimized setting values or through a combination of applications of the image processing, the feature extraction means, methods, or algorithms, and the like.
Hereinafter, a method of extracting features according to an embodiment of the present invention using Fisher's linear discriminant (FLD), which is a dimension reduction method, will be described.
The FLD obtains a projection vector w for enabling a ratio (Equation 1) of between-class covariance and within-class covariance to be a maximum.
Next, data features are extracted by projecting data to the obtained projection vector w.
Here, a between-class covariance SB and a within-class covariance SW are respectively denoted as Equation 2 and Equation 3.
In Equations 2 and 3, m1 and m2 respectively denote the averages of the input data included in class 1 and class 2, C1 denotes the index set of data included in class 1, and C2 denotes the index set of data included in class 2. In addition, xn denotes input data of the learning data.
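Equations 1 to 3 themselves are not reproduced in this text; under the definitions above, they presumably take the standard Fisher linear discriminant form:

J(w) = (wᵀSBw) / (wᵀSWw)  [presumed form of Equation 1]

SB = (m2 − m1)(m2 − m1)ᵀ  [presumed form of Equation 2]

SW = Σn∈C1 (xn − m1)(xn − m1)ᵀ + Σn∈C2 (xn − m2)(xn − m2)ᵀ  [presumed form of Equation 3]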
In
The calculated projection vector w is used to perform a regression or multi-classification method.
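For illustration only, a minimal numerical sketch of computing and applying such a projection vector is given below. It uses the closed-form solution w ∝ SW⁻¹(m2 − m1), which is one standard way of maximizing the Fisher ratio; the synthetic data and variable names are assumptions, not the described embodiment.

```python
import numpy as np

def fld_projection(X1, X2):
    """Fisher-style projection vector separating two classes of row-vector data."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter summed over both classes (cf. Equation 3).
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # Closed-form maximizer of the between/within variance ratio (cf. Equation 1).
    w = np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]), m2 - m1)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
class1 = rng.normal(0.0, 1.0, size=(100, 5))   # e.g. features of one divided group
class2 = rng.normal(1.5, 1.0, size=(100, 5))   # e.g. features of the other divided group
w = fld_projection(class1, class2)
data_features = class1 @ w                     # one-dimensional features after projection
```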
For example, in the case of the age recognition described above, the learning apparatus sets the group [0, 10, 20, 30] of the first division as a class 1 and the group [40, 50, 60, 70] of the first division as a class 2, and calculates a projection vector w which is effective for distinguishing the group [0, 10, 20, 30] and the group [40, 50, 60, 70].
In addition, the learning apparatus sets the group [0, 10] of the second division as a class 1 and the group [20, 30] of the second division as a class 2, and calculates a projection vector w which is effective for distinguishing the group [0, 10] and the group [20, 30]. In addition, the learning apparatus sets another group [40, 50] of the second division as a class 1 and another group [60,70] of the second division as a class 2 in the same manner, and calculates a projection vector w which is effective for distinguishing the group [40,50] and the group [60,70].
In addition, the learning apparatus sets the group [0] of third division as a class 1 and the group [10] of third division as a class 2, and calculates a projection vector w which is effective for distinguishing the group [0] and the group [10]. In addition, the learning apparatus repeatedly performs the above-described process with respect to the remaining groups of the third division.
When the settings are performed as shown in
Referring to
Referring to
Hereinafter, a method of extracting features for effectively distinguishing a group X and a group Z using Adaboost will be described.
Referring to
For example, when applying an image processing filter set shown in
Here, the image processing filter set may be composed of 24 primary Gaussian differential filters, 24 secondary Gaussian differential filters, 8 Laplacian filters, and 4 Gaussian filters.
In addition, each of filtered images of
In addition, as shown in
In the learning method using features of the extracted image according to an embodiment of the present invention, the features of the extracted image may be defined or specified by a type of filter and a position or shape of a region. Hereinafter, for convenience of description, the number of entire candidate features is assumed to be D, and a process of selecting the features will be described.
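As an illustration only, candidate features defined by a filter type and a region could be computed roughly as follows; the specific filter parameters, counts, and regions of the described filter set are not reproduced here, and SciPy's Gaussian, Gaussian-derivative, and Laplacian-of-Gaussian filters are used as stand-ins.

```python
import numpy as np
from scipy import ndimage

def candidate_features(image, sigmas=(1.0, 2.0),
                       regions=((0, 0, 16, 16), (16, 16, 16, 16))):
    """Filter an image with a small Gaussian/derivative/Laplacian bank and average
    each filtered image over rectangular regions; each (filter, region) pair
    yields one candidate feature value."""
    filtered = []
    for s in sigmas:
        filtered.append(ndimage.gaussian_filter(image, s))                 # Gaussian
        filtered.append(ndimage.gaussian_filter(image, s, order=(0, 1)))   # 1st derivative along x
        filtered.append(ndimage.gaussian_filter(image, s, order=(1, 0)))   # 1st derivative along y
        filtered.append(ndimage.gaussian_filter(image, s, order=(0, 2)))   # 2nd derivative along x
        filtered.append(ndimage.gaussian_laplace(image, s))                # Laplacian of Gaussian
    feats = []
    for f in filtered:
        for (r, c, h, w) in regions:
            feats.append(f[r:r + h, c:c + w].mean())   # region average as one feature value
    return np.array(feats)

face = np.random.default_rng(0).random((32, 32))       # stand-in for a face image
print(candidate_features(face).shape)                  # (number of filters) x (number of regions)
```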
Referring again to
In step S123, the learning apparatus selects a part of the learning data in accordance with the weight assigned to individual learning data.
Here, learning data assigned a higher weight may have a higher probability of being selected.
In step S124, the learning apparatus learns classifiers for classifying the group Y and the group Z using learning data selected through step S123, with respect to each of the candidate features extracted through step S121.
Here, D classifiers may be generated through step S124.
In step S125, the learning apparatus calculates accuracy of the classifiers based on the learning data included in the group X and the weight assigned to each data.
In step S126, the learning apparatus selects a classifier having the highest accuracy through step S125 as a classifier having the highest classification performance.
In step S127, the learning apparatus extracts the candidate features used in learning the classifier selected through step S126, as features for distinguishing the group Y and the group Z.
Next, in step S128, the learning apparatus reduces a weight of data accurately classified by the selected classifier, and increases a weight of data erroneously classified.
In step S129, the learning apparatus determines whether the features for distinguishing the group Y and the group Z are extracted by a predetermined number of features.
When it is determined that the features are not extracted by the predetermined number of features, the learning apparatus returns to step S123, and repeatedly performs a procedure until the features for distinguishing the group Y and the group Z are extracted by the predetermined number of features.
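A compact sketch of this selection loop (steps S123 through S129, together with an initial weight assignment) is given below. The decision stumps standing in for the classifiers and the halving/doubling weight update are assumptions for illustration; the description does not mandate a specific classifier or update rule.

```python
import numpy as np

def select_features(F, labels, num_features, rng=None):
    """F: (N, D) matrix of candidate-feature values for the data of group X.
    labels: +1 for group Y, -1 for group Z.
    Returns the indices of the candidate features chosen, one per round."""
    rng = np.random.default_rng(0) if rng is None else rng
    N, D = F.shape
    w = np.full(N, 1.0 / N)                    # weights assigned to individual learning data
    chosen = []
    for _ in range(num_features):
        sample = rng.choice(N, size=N, p=w)    # S123: higher weight, higher chance of selection
        best_d, best_acc, best_pred = 0, -1.0, None
        for d in range(D):                     # S124: learn one simple classifier per candidate feature
            thr = F[sample, d].mean()
            for sign in (+1, -1):
                pred = np.where(F[:, d] > thr, sign, -sign)
                acc = np.sum(w * (pred == labels))   # S125: weighted accuracy on all data of group X
                if acc > best_acc:
                    best_d, best_acc, best_pred = d, acc, pred
        chosen.append(best_d)                  # S126/S127: keep the feature of the best classifier
        correct = best_pred == labels
        w[correct] *= 0.5                      # S128: reduce weights of correctly classified data
        w[~correct] *= 2.0                     #        and increase weights of misclassified data
        w /= w.sum()
    return chosen                              # S129: stop once num_features features are extracted
```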
Referring to
First, the learning data providing unit 110 receives an image, and provides the input image to the feature extraction unit 120.
The feature extraction unit 120 divides the image input from the learning data providing unit 110 into two groups.
In addition, the feature extraction unit 120 selects or extracts data features for readily distinguishing the two divided groups.
In addition, the feature extraction unit 120 determines whether the two divided groups are required to be divided into sub-groups, divides each of the two groups into sub-groups when it is determined that the two divided groups are required to be divided into sub-groups, and selects or extracts data features for distinguishing the group divided into the sub-groups.
Alternatively, when it is determined that the two divided groups are not required to be divided into sub-groups, the feature extraction unit 120 provides the data features selected or extracted so far to the processing unit 130.
The processing unit 130 performs learning using the data features provided from the feature extraction unit 120.
Here, the processing unit 130 may perform regression for estimating detailed ages using the provided data features, or a classification method for classifying ages.
In addition, the processing unit 130 does not need to use all of the data features provided from the feature extraction unit 120, and may use data features selectively provided in accordance with a configuration of the learning apparatus 100.
According to the learning apparatus using the extracted data features according to an embodiment of the present invention, a learning process may be simplified, and accuracy of estimation may be improved.
Using the data features which have been extracted or selected through the above-described data feature extraction or selection, a learning and configuration method of a multi-classifier will be described. Here, the classifier is not limited to a specific classifier, and the case of using a binary classifier such as a support vector machine (SVM) will be described.
Referring to
Referring to
In addition, using extracted (or selected) features so as to readily distinguish the group [1, 2] and the group [3, 4] of the second division and learning data included in the group [1, 2, 3, 4], a classifier for readily classifying into the group [1, 2] and the group [3, 4] is learned (2-1).
In addition, using extracted (or selected) features so as to readily distinguish the group [5,6] and the group [7,8] of the second division and learning data included in the group [5, 6, 7, 8], learning of a classifier for classifying into the group [5,6] and the group [7,8] is performed (2-2).
In addition, using extracted (or selected) features so as to readily distinguish the group [1] and the group [2] of the third division and learning data included in the group [1, 2], learning of a classifier for classifying into the group [1] and the group [2] is performed (3-1).
In addition, using extracted (or selected) features so as to readily distinguish the group [3] and the group [4] of the third division and learning data included in the group [3, 4], learning of a classifier for classifying into the group [3] and the group [4] is performed (3-2).
The above-described process may be repeatedly performed with respect to data of the remaining groups.
The multi-classifier configured through the above-described process generates features (for example, the features for distinguishing the group [1, 2, 3, 4] and the group [5, 6, 7, 8]) used in the classifier learning (1-1) from test data when the test data is input, and inputs the generated features into the classifier (1-1).
The classifier (1-1) determines in which group the test data is included among the group [1, 2, 3, 4] and the group [5, 6, 7, 8].
When it is determined that the test data is included in the group [1, 2, 3, 4], the classifier (1-1) extracts features (for example, the features for distinguishing the group [1, 2] and the group [3, 4]) used in the classifier learning (2-1). In addition, whether the test data is included in the group [1, 2] or the group [3, 4] is determined by inputting the features to the classifier (2-1).
Alternatively, when it is determined that the test data is included in the group [5, 6, 7, 8], the classifier (1-1) extracts features (for example, the features for distinguishing the group [5, 6] and the group [7, 8]) used in the classifier learning (2-2) from the test data. In addition, whether the test data is included in the group [5, 6] or the group [7, 8] is determined by inputting the features to the classifier (2-2).
By applying the above process to a classifier (3-1), a classifier (3-2), a classifier (3-3), and a classifier (3-4) based on the determination results of the classifier (1-1), the classifier (2-1), and the classifier (2-2), finally, a group (for example, ages or pose interval) in which the test data is included may be determined.
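The tree-structured decision procedure just described might be sketched as follows; the Node structure and the make_features/classify callables are illustrative interfaces (for example, an SVM could serve as each node's classifier), not the claimed apparatus.

```python
# Illustrative tree of binary classifiers mirroring divisions (1-1), (2-1)/(2-2), (3-1)...(3-4).
class Node:
    def __init__(self, left_groups, right_groups, make_features, classify,
                 left=None, right=None):
        self.left_groups, self.right_groups = left_groups, right_groups
        self.make_features = make_features   # generates the features selected for this division
        self.classify = classify             # trained binary classifier for this division
        self.left, self.right = left, right  # child nodes handling the two sub-groups

def predict_group(node, test_data):
    """Descend from the root division to a leaf, regenerating features at each level."""
    while node is not None:
        feats = node.make_features(test_data)
        if node.classify(feats):             # True: e.g. the group [1, 2, 3, 4] side
            groups, node = node.left_groups, node.left
        else:                                # False: e.g. the group [5, 6, 7, 8] side
            groups, node = node.right_groups, node.right
    return groups                            # finally, a single group (e.g. an age or pose interval)
```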
The multi-classifier set according to another embodiment of the present invention will be configured through the following process.
Groups are configured one-to-one with each other in pairs, and learning of a classifier for distinguishing the groups constituting the pair is performed.
For example, in
In addition, learning of a classifier for readily distinguishing the group [2] and the group [3], the group [2] and the group [4], the group [2] and the group [5], the group [2] and the group [6], the group [2] and the group [7], and the group [2] and the group [8] is performed.
When the learning is performed in the above-described method, a total of 28(=8×7/2) classifiers may be generated.
When the test data is input to the multi-classifier configured through the above-described process, the group in which the input test data is included is determined using the 28 classifiers.
That is, the multi-classifier configured through the above-described process generates 28 determination results with respect to the input test data, and determines a group having the largest number of votes as the group in which the test data is included, by the majority rule.
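The majority-vote decision over the 28 pairwise classifiers could be sketched as follows; the pairwise classifier callables are assumed interfaces.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(test_data, groups, pairwise):
    """pairwise[(a, b)] is a trained binary classifier returning the winning group
    (a or b) for test_data; with 8 groups, the 28 classifiers cast 28 votes."""
    votes = Counter()
    for a, b in combinations(groups, 2):
        votes[pairwise[(a, b)](test_data)] += 1   # one determination result per group pair
    return votes.most_common(1)[0][0]             # group with the largest number of votes
```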
A multi-classifier set according to still another embodiment of the present invention will be configured through the following process.
For each group, a pair is formed between that group and all of the remaining groups, and learning of a classifier for distinguishing the two sides of the pair is performed.
In
In addition, learning of a classifier for distinguishing the group [2] from the remaining groups (the group [1], the group [3], the group [4], the group [5], the group [6], the group [7], and the group [8]) is performed using the learning data included in each group pair.
When the learning is performed as described above, a total of 8 classifiers, each representing one of the group [1], the group [2], the group [3], the group [4], the group [5], the group [6], the group [7], and the group [8], are generated.
When the test data is input, the multi-classifier configured through the above-described process generates 8 determination results with respect to the input test data using the generated 8 classifiers.
In addition, the multi-classifier selects the classifier outputting the highest determination value (or the lowest determination value), and determines that the test data belongs to the group (for example, the age range) represented by the selected classifier.
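Similarly, the one-versus-rest decision could be sketched as follows; the per-group scoring callables are assumed interfaces.

```python
def one_vs_rest_predict(test_data, scorers):
    """scorers maps each group to a trained one-versus-rest classifier returning a
    decision value; the group whose classifier outputs the highest value is selected."""
    return max(scorers, key=lambda group: scorers[group](test_data))
```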
A learning and configuration method of a regression analyzer which is used in detailed age or detailed pose estimation using data features extracted or selected through the above-described extraction or selection of the data features will be described.
In Equation 4, N denotes the number of pieces of the entire learning data, a denotes a parameter of a regression function, xn denotes nth learning data as an input value of the regression analyzer, and tn denotes a target value with respect to nth data.
In the detailed age estimation (or the detailed pose estimation), xn corresponds to a face image feature value that is extracted or selected through the above-described method, and tn corresponds to a detailed age of the face image data (or detailed pose).
The learning with respect to the regression analyzer is performed by adjusting or calculating a parameter vector a so that a value of Equation 4 is a minimum.
A function of Equation 4 may be denoted as the following Equation 5.
Here, M denotes a dimension of a parameter vector a, aj denotes a jth element value of the vector a, and xn,j denotes a jth element value of xn.
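Equations 4 and 5 themselves are not reproduced in this text; given the symbol definitions above, they presumably take the usual sum-of-squared-errors and linear-in-parameters forms:

E(a) = Σn=1..N {y(xn, a) − tn}²  [presumed form of Equation 4]

y(xn, a) = Σj=1..M aj xn,j  [presumed form of Equation 5]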
Data features may be extracted from the face image data input for the age estimation (or pose estimation) using the above-described method, and the age (or pose) with respect to the test data may be calculated when inputting the extracted data features to the regression function.
When features with respect to the test data are x, this process may be represented as the following Equation 6.
The detailed age estimation and the detailed pose estimation using support vector regression (SVR) as the regression method may be represented as the following Equation 7.
As in the following Equation 7, learning for estimating the detailed age or the detailed pose is performed by calculating a parameter vector a so that the sum of ξn and ξ̂n is minimized while the given constraints are satisfied.
subject to
tn ≤ y(xn, a) + ε + ξn, for n = 1, . . . , N
tn ≥ y(xn, a) − ε − ξ̂n, for n = 1, . . . , N
Here, C denotes a coefficient for reflecting a relative consideration degree of first and second sections of a target function, and ε denotes a coefficient indicating an acceptable error range.
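Equation 7 itself is not reproduced in this text; with the slack variables, the coefficient C, and the acceptable error range ε described above, it presumably follows the standard ε-insensitive support vector regression objective:

minimize (1/2)∥a∥² + C Σn=1..N (ξn + ξ̂n)  [presumed form of Equation 7]

subject to the constraints listed above.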
Other than the above-described learning method using the regression analyzer, a learning method using a regression analyzer using a variety of methods such as polynomial Curve Fitting, an artificial neural network and the like may be used.
Hereinafter, a regression analysis method for the detailed age estimation or detailed pose estimation using face information will be described in detail.
In Equation 5 or 7, as described in the learning using the regression analyzer, it is preferable that a difference between an output value of the regression analyzer with respect to an input value and a target value be reduced so that the output value and the target value coincide as much as possible.
However, in a case in which the learning data is insufficient, or noise or outliers are present in the learning data, forcing the output values to coincide too closely with the target values may rather reduce overall recognition performance due to over-fitting.
To solve this problem, a similar output value may be obtained with respect to a similar input value while enabling the output value and the target value to coincide with each other.
In particular, in the case of age recognition, accurately estimating the actual age from face image data is important, but in an actual application, it is preferable to perform estimation using the age that the appearance of the corresponding face image can represent (the apparent age).
Accordingly, when two faces are similar to each other even though the ages of two face images are actually different, it is preferable that learning of the regression analyzer be performed so that similar ages are output.
By reflecting the above, Equation 4 is corrected using Equation 8 so that similar output values are obtained with respect to similar input values while enabling the output value and the target value to coincide.
Here, C denotes a coefficient for reflecting a relative consideration degree of first and second sections of a target function. The second section is added to Equation 4 so that similar output values are obtained with respect to similar input values.
Wm,n of the second section indicates similarity between mth face image data and nth face image data, and is denoted by Equation 9.
wm,n = exp(−∥xm − xn∥²/σ²)  [Equation 9]
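Equation 8 itself is not reproduced in this text; one plausible form consistent with this description, namely the sum-of-squared-errors first section of Equation 4 plus a similarity-weighted second section penalizing differing outputs for similar inputs, would be:

E(a) = Σn=1..N {y(xn, a) − tn}² + C Σm,n wm,n {y(xm, a) − y(xn, a)}²  [presumed form of Equation 8]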
In addition, by reflecting the above, Equation 7 is corrected using Equation 10 so that similar output values are obtained with respect to similar input values while enabling the output value and the target value to coincide.
subject to
tn ≤ y(xn, a) + ε + ξn, for n = 1, . . . , N
tn ≥ y(xn, a) − ε − ξ̂n, for n = 1, . . . , N
Here, C1 and C2 denote coefficients for reflecting the relative consideration degrees of the first, second, and third sections of a target function. The third section is added to Equation 7 so that similar output values are obtained with respect to similar input values.
Another configuration example of the learning using the regression analyzer for the detailed age estimation and detailed pose estimation using face information will be described in detail.
In the case of the detailed age estimation or detailed pose estimation using the face information, face image data corresponding to input values of learning data are relatively easily collected, but it is significantly difficult to collect values with respect to detailed ages or detailed poses corresponding to the target value.
When the learning data is insufficient, over-fitting may occur, and therefore it is difficult to expect high recognition performance.
As described for the detailed age or pose estimation, when there is a large amount of learning data without target values but the learning data having target values is insufficient, it is preferable that learning of the regression analyzer be performed so that the learning data without target values yields similar output values for similar input values, in order to reduce performance deterioration due to over-fitting and to improve recognition accuracy.
By reflecting this, Equation 4 is corrected and represented as Equation 11.
Here, among the N pieces of entire learning data, the indexes of the learning data having target values are represented as 1 to T, and the indexes of the learning data without target values are represented as T+1 to N. In addition, C is a coefficient for reflecting the relative consideration degree of the first and second sections of a target function.
The first section of Equation 11 is corrected so that learning in accordance with the target values is performed only for the learning data having target values. In addition, the second section is added so that similar output values are obtained with respect to similar input values for the data without target values as well as the data having target values.
As described above, in the case of the learning data without the target value, Equation 7 is corrected as in Equation 12 so that similar output values are obtained with respect to similar input values.
subject to
tn ≤ y(xn, a) + ε + ξn, for n = 1, . . . , N
tn ≥ y(xn, a) − ε − ξ̂n, for n = 1, . . . , N
Here, C1 and C2 are coefficients for reflecting a relative consideration degree of the first, the second, and the third sections of a target function.
The first section of Equation 12 is corrected so that learning in accordance with the target value is performed only for the learning data having target values. In addition, the third section is added so that similar output values are obtained with respect to similar input values for the data without target values as well as the data having target values.
Still another configuration of the learning using the regression analyzer for the detailed age estimation or the detailed pose estimation using the face information will be described in detail.
The age of a person may be estimated from the face of the person, but it is not easy to accurately determine the age. In particular, from the point of view of pattern recognition, this may mean that the feature regions of face image data included in two mutually different groups overlap considerably in a feature space.
Referring to
For example, it is technically not easy to infer, using only face image information (or features), that a person whose actual age is in the 40s is actually that old when the person has a baby face that appears to be in the mid 30s.
When such isolated data corresponds to noise or outliers, and learning of a regression analyzer or a classifier is performed even with respect to the isolated data so that the detailed age is accurately estimated or the ages are divided, overall recognition performance is reduced due to over-fitting.
Accordingly, preferably, face image data whose age division is ambiguous may be separately gathered, and used in the learning so as to be induced from a similar relationship of neighboring data (or similar data), rather than performing learning of the regression analyzer or the classifier with respect to the face image data in accordance with the actual age.
When the face image data whose age division is ambiguous is separately gathered and used in the learning so as to be induced from the similar relation of the neighboring data, deterioration of recognition performance due to over-fitting may be prevented, and an age recognizer for outputting a natural recognition result similar to recognition of a human being may be configured.
A method of selecting learning data whose division is ambiguous may be applied to all or some of the steps of data division which have been described in
As described in
Referring to
Next, data (or data included in a rejection region) whose a posteriori probability is lower than a threshold value (A) is selected as the learning data whose division is ambiguous.
Alternatively, using data features selected so as to readily distinguish two groups and learning data included in the corresponding two groups, learning of a regression analyzer for detailed age estimation is performed. Thereafter, data for which the difference between the actual age and the age estimated by the regression analyzer is the largest is selected as the learning data whose division is ambiguous.
Alternatively, using data features selected so as to readily distinguish two groups and learning data included in the corresponding two groups, learning of a classifier for the two groups is performed. Thereafter, data whose group (output value) estimated by the classifier differs from its actual group (target value) is selected as the learning data whose division is ambiguous.
Alternatively, as shown in (a) of
Thereafter, data (or data included in a rejection region) whose a posteriori probability is lower than a threshold value (θ) is selected as the learning data whose division is ambiguous.
Here, Equation 13 is obtained by correcting Equation 4 so that the data whose division is ambiguous is induced from the similar relationship of neighboring data (or similar data) rather than training the regression analyzer in accordance with the actual age.
Here, the learning data whose division is ambiguous is represented by the indexes T+1 to N. That is, the data whose division is ambiguous, selected from the N pieces of entire learning data, is represented as xn, T+1 ≤ n ≤ N. In addition, C denotes a coefficient for reflecting the relative consideration degree of the first and second sections of a target function.
The first section is corrected so as to be learned in accordance with the actual age with respect to only data whose division is clear, and in data whose division is ambiguous, the second section is added to Equation 4 so that the age is induced from the similar relationship of neighboring data (or similar data).
Wm,n of the second section denotes similarity between mth face image data and nth face image data.
In addition, Equation 14 is obtained by correcting Equation 7 so that the data whose division is ambiguous is induced from a similar relationship of neighboring data (or similar data) rather than training the regression analyzer in accordance with the actual age.
subject to.
t
n
≦y(xn,a)+ε+ξn, for n=1, . . . , N
t
n
≦y(xn,a)−ε−ξn, for n=1, . . . , N
Here, C1 and C2 denote coefficients for reflecting a relative consideration degree of the first, the second, and the third sections of the target function.
The first section is corrected so as to be learned in accordance with the actual age with respect to only data whose division is clear, and the third section is added to Equation 7 so that data whose division is ambiguous is induced from the similar relationship of neighboring data (or similar data).
As described above, in the learning method and learning apparatus using the extracted data features according to embodiments of the present invention, input learning data is divided into two groups in a stepwise manner, data features for distinguishing the divided groups are extracted, and learning is performed using the extracted data features.
Accordingly, features for readily distinguishing each group by dividing learning data in a stepwise manner are extracted, and therefore the regression analyzer or the multi-classifier may be effectively configured. In addition, when the present invention is utilized in the age recognition or the pose estimation based on the face image data, an analyzer having high recognition performance may be configured.
While the example embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations may be made herein without departing from the scope of the invention.