1. Field of the Invention
The present invention relates to an image processing apparatus, and more particularly to an image processing apparatus and an image processing method which identify the face of a person or the like contained in an image.
2. Description of the Related Art
Image capturing apparatuses such as a digital still camera and a digital video camera which capture a subject such as a person and record the captured image are in widespread use. In addition, there have been proposed a large number of image processing apparatuses including a face detection function for detecting the face of a person from a captured image.
Further, in recent years, there have been proposed face identification techniques for identifying the face of a specific person from among faces detected by using such a face detection function. For example, there has been proposed an image processing apparatus that locates the position of a corresponding region on an input image, which is a region corresponding to a target region on a registered image, and identifies a face in the input image on the basis of the position of this corresponding region (see, for example, Japanese Unexamined Patent Application Publication No. 2007-115109).
For example, in fields such as security and entertainment, face identification techniques that identify individuals by using face images containing people's faces are being put into practical use as a part of biometrics authentication.
According to the above-mentioned techniques of the related art, accuracy enhancement can be achieved for faces in frontal orientation. However, for example, when shooting a person's face, although a frontal face is shot in many cases, situations are also conceivable in which a non-frontal face is shot. Therefore, cases may arise in which a captured image containing a frontal face and a captured image containing a non-frontal face are recorded. As described above, when faces contained in captured images differ in orientation, face images contained in the captured images can differ greatly even though these images belong to the same person, and there is a risk of unstable face identification performance. Accordingly, when identifying a face contained in a captured image, for example, it is conceivable to perform identification by making the orientation of a face contained in a target captured image the same as the orientation of a registered face, by using a complex three-dimensional face model corresponding to a person's face.
However, in cases where, for example, face identification is performed by an image capturing apparatus such as a compact digital still camera, the image capturing apparatus is often capable of a relatively limited amount of computation related to face identification. Thus, it may be difficult to perform face identification by using the above-mentioned complex three-dimensional face model. Accordingly, it is important to enhance the accuracy of face identification irrespective of the face orientation, while reducing the load of a face identification process.
It is desirable to enhance the accuracy of face identification while reducing the load of a face identification process.
According to an embodiment of the present invention, there is provided an image processing apparatus, an image processing method for the image processing apparatus, and a program for causing a computer to execute the image processing method, the image processing apparatus including: a projecting unit that projects a registered face image containing at least a part of a face onto a surface of a three-dimensional model, which has a shape in which at least a part of the three-dimensional model in one direction on the surface onto which an image is to be projected is bent to a front side, so that a horizontal direction of the face contained in the registered face image substantially coincides with the one direction; a transforming unit that transforms the three-dimensional model on the basis of an orientation of a face contained in a target image; a generating unit that generates a two-dimensional image by projecting the registered face image projected on the surface of the three-dimensional model transformed by the transforming unit, onto a plane; and an identifying unit that identifies the face contained in the target image, by comparing the two-dimensional image generated by the generating unit against the target image. Therefore, the registered face image is projected onto the surface of the three-dimensional model, the three-dimensional model is transformed on the basis of the orientation of the face contained in the target image, the registered face image projected on the surface of the transformed three-dimensional model is projected onto a plane to generate the two-dimensional image, and this two-dimensional image and the target image are compared against each other to identify the face contained in the target image.
In the above-mentioned embodiment, the three-dimensional model may have a shape in which both ends in the one direction are bent to a back side, with a part of the three-dimensional model in the one direction on the surface taken as a bend line, and the projecting unit may project the registered face image onto the surface of the three-dimensional model so that a centerline with respect to the horizontal direction of the face contained in the registered face image substantially coincides with the bend line. Therefore, the registered face image is projected onto the surface of the three-dimensional model so that the centerline with respect to the horizontal direction of the face contained in the registered face image, and the bend line of the three-dimensional model substantially coincide with each other.
In the above-mentioned embodiment, the registered face image may be a normalized image normalized on the basis of eyes of the face contained in the registered face image, the image processing apparatus may further include an eye detection unit that detects eyes of the face contained in the target image, and a normalizing unit that normalizes the target image to generate a normalized target image, on the basis of the eyes detected by the eye detection unit, the transforming unit may rotate and translate the three-dimensional model with reference to a midpoint of a line segment connecting the eyes of the face contained in the registered face image projected on the surface of the three-dimensional model, so that positions of eyes of a face contained in the two-dimensional image generated by the generating unit and positions of eyes of a face contained in the normalized target image become the same, and the identifying unit may identify the face contained in the target image, by comparing the two-dimensional image generated by the generating unit against the normalized target image. Therefore, the three-dimensional model is rotated and translated with reference to the midpoint of a line segment connecting the eyes of the face contained in the registered face image, so that the positions of the eyes of the face contained in the two-dimensional image and the positions of the eyes of the face contained in the normalized target image become the same, and the two-dimensional image and the normalized target image are compared against each other to thereby identify the face contained in the target image.
In the above-mentioned embodiment, the image processing apparatus may further include a transformation parameter storing unit that stores transformation parameters in association with a face orientation, the transformation parameters being used for projecting the registered face image onto the surface of the three-dimensional model so that the centerline with respect to the horizontal direction of the face contained in the registered face image substantially coincides with the bend line, rotating and translating the three-dimensional model with reference to the midpoint of the line segment connecting the eyes of the face contained in the registered face image projected on the surface of the three-dimensional model, so that the positions of the eyes of the face contained in the two-dimensional image generated by the generating unit and the positions of the eyes of the face contained in the normalized target image become the same, and projecting the registered face image projected on the surface of the three-dimensional model that has been rotated and translated, onto a plane to generate a two-dimensional image for each face orientation, and the generating unit may generate the two-dimensional image from the registered face image by using the transformation parameters stored in association with the orientation of the face contained in the target image. Therefore, the two-dimensional image is generated from the registered face image by using the transformation parameters stored in association with the orientation of the face contained in the target image.
In the above-mentioned embodiment, the image processing apparatus may further include an organ detection unit that detects two organs of the face contained in the target image, and a normalizing unit that normalizes the target image to generate a normalized target image, on the basis of the two organs detected by the organ detection unit, the transforming unit may rotate and translate the three-dimensional model so that positions of two organs of a face contained in the two-dimensional image generated by the generating unit and positions of two organs of a face contained in the normalized target image become the same, and the identifying unit may identify the face contained in the target image, by comparing the two-dimensional image generated by the generating unit against the normalized target image. Therefore, the three-dimensional model is rotated and translated so that the positions of the two organs of the face contained in the two-dimensional image and the positions of the two organs of the face contained in the normalized target image become the same, and the two-dimensional image and the normalized target image are compared against each other to identify the face contained in the target image.
In the above-mentioned embodiment, the image processing apparatus may further include a transformation data storing unit that stores values of a rotation angle and a translation distance of the three-dimensional model in association with a face orientation, and the transforming unit may rotate and translate the three-dimensional model by using the values of the rotation angle and the translation distance which are stored in association with the orientation of the face contained in the target image. Therefore, the three-dimensional model is rotated and translated by using the values of the rotation angle and the translation distance stored in association with the orientation of the face contained in the target image.
In the above-mentioned embodiment, the identifying unit may identify the face contained in the target image by comparing, as an object of comparison with the target image, one of the registered face image determined on the basis of the orientation of the face contained in the target image, and the two-dimensional image generated by the generating unit, against the target image. Therefore, as an object of comparison with the target image, one of the registered face image determined on the basis of the orientation of the face contained in the target image, and the two-dimensional image generated by the generating unit is compared against the target image to identify the face contained in the target image.
In the above-mentioned embodiment, the image processing apparatus may further include a registered face image storing unit that stores an image containing at least a part of a frontal face, as the registered face image, and a determining unit that determines the orientation of the face contained in the target image, and if it is determined by the determining unit that the orientation of the face contained in the target image is frontal, the identifying unit may identify the face contained in the target image by comparing the registered face image against the target image. Therefore, if the orientation of the face contained in the target image is determined to be frontal, the registered face image and the target image are compared against each other to thereby identify the face contained in the target image.
In the above-mentioned embodiment, the image processing apparatus may further include an image capturing unit that captures a subject to generate a captured image, a face detection unit that detects a face contained in the captured image, and a determining unit that determines an orientation of the face detected by the face detection unit, and the identifying unit may identify the face contained in the captured image by comparing the two-dimensional image generated by the generating unit against a face image containing the face detected by the face detection unit. Therefore, the subject is captured to generate the captured image, the face contained in this captured image is detected, the orientation of this detected face is determined, and the two-dimensional image and the face image are compared against each other to thereby identify the face contained in the captured image.
According to an embodiment of the present invention, there is provided an image processing apparatus, an image processing method for the image processing apparatus, and a program for causing a computer to execute the image processing method, the image processing apparatus including: a registered face image storing unit that stores a registered face image, which is a normalized face image containing at least a part of a face and normalized on the basis of eyes of the face; a determining unit that makes a determination of an orientation of a face contained in a target image; an eye detection unit that detects eyes of the face contained in the target image; a normalizing unit that normalizes the target image to generate a normalized target image, on the basis of the eyes detected by the eye detection unit; a transformation parameter storing unit that stores transformation parameters in association with a face orientation that is subject to the determination, the transformation parameters being used for projecting the registered face image onto a surface of a three-dimensional model, which has a shape in which both ends in one direction on the surface onto which an image is to be projected are bent to a back side, with a part of the three-dimensional model in the one direction taken as a bend line, so that a centerline with respect to a horizontal direction of the face contained in the registered face image substantially coincides with the bend line, rotating and translating the three-dimensional model so that positions of the eyes of the face contained in the registered face image projected on the surface of the three-dimensional model become specific positions, with reference to a midpoint of a line segment connecting the eyes, and projecting the registered face image projected on the surface of the three-dimensional model that has been rotated and translated, onto a plane to generate a two-dimensional image for each face orientation that is subject to the determination; an image transformation unit that generates the two-dimensional image from the registered face image by using the transformation parameters stored in association with the orientation of the face determined by the determining unit; and an identifying unit that identifies the face contained in the target image, by comparing the two-dimensional image generated by the image transformation unit against the normalized target image. Therefore, the orientation of the face contained in the target image is determined, the eyes of this face are detected, the target image is normalized on the basis of the eyes, the two-dimensional image is generated from the registered face image by using the transformation parameters stored in association with the determined face orientation, and this two-dimensional image and the target image are compared against each other to thereby identify the face contained in the target image.
Next, an embodiment of the present invention will be described in detail with reference to the drawings.
The optical system 111 is configured by a plurality of lenses (such as a zoom lens and a focus lens) that collect light from a subject. Incident light from the subject is supplied to the image capturing unit 112 via these lenses and an iris (not shown).
The image capturing unit 112 transforms the incident light from a subject to generate a captured image in accordance with predetermined image capturing parameters, and outputs the generated captured image to the face detection unit 120, the eye detection unit 130, and the normalizing unit 140. That is, in the image capturing unit 112, an optical signal from a subject made incident via the optical system 111 undergoes photoelectric conversion by an image capturing device (not shown) into an analog image signal. Then, a signal processing unit (not shown) applies camera signal processing such as noise removal, A/D (Analog/Digital) conversion, and the like to this analog image signal obtained by the photoelectric conversion, thereby generating a captured image.
The face detection unit 120 detects the face of a person contained in the captured image outputted from the image capturing unit 112, and outputs face detection information related to the detected face to the eye detection unit 130 and the normalizing unit 140. As the face detection method, it is possible to employ, for example, a face detection method based on a matching between the actual image and a template in which luminance distribution information of a face is recorded (see, for example, Japanese Unexamined Patent Application Publication No. 2004-133637), or a face detection method based on human face feature quantities, skin color portions, or the like contained in a captured image. The face detection information includes the position and size of a detected face on a captured image. The position of a detected face on a captured image can be set as, for example, the center position of a face image on the captured image, and the size of a detected face on a captured image can be set as, for example, the horizontal and vertical lengths of a face image on the captured image. Through this face detection information, it is possible to locate a face image that is a rectangular captured image containing at least a part of a face on a captured image.
The eye detection unit 130 detects the eyes of a person's face contained in the captured image outputted from the image capturing unit 112, and outputs eye information related to the detected eyes to the normalizing unit 140. That is, the eye detection unit 130 extracts a face image corresponding to the face detected by the face detection unit 120, from the captured image outputted from the image capturing unit 112, by using the face detection information (position and size) outputted from the face detection unit 120. Then, the eye detection unit 130 detects the eyes in this extracted face image. As this eye detection method, like the face detection method, it is possible to employ, for example, an eye detection method based on a matching between the actual image and a template in which luminance distribution information of eyes is recorded. The eye detection information includes the positions of the detected eyes in a face image. The positions in a face image can be set as, for example, the center positions of the eyes in the face image. The face image can be normalized by using this eye detection information. The eye detection unit 130 represents an example of an eye detection unit and an organ detection unit described in the claims.
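By way of illustration, the template matching described above can be sketched as follows. This is a minimal sketch, not the detector actually used in the embodiment: it assumes grayscale images held as numpy arrays, uses a brute-force sum-of-squared-differences search, and the function name is hypothetical.

```python
import numpy as np

def match_template(face_img: np.ndarray, template: np.ndarray):
    """Locate the best match of a luminance template in a face image.

    A minimal brute-force sum-of-squared-differences search over every
    window position; real detectors also scan multiple scales and use
    trained classifiers. Returns the top-left (row, col) of the best
    matching window.
    """
    th, tw = template.shape
    fh, fw = face_img.shape
    best_ssd, best_pos = np.inf, (0, 0)
    tpl = template.astype(np.float64)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            window = face_img[r:r + th, c:c + tw].astype(np.float64)
            ssd = np.sum((window - tpl) ** 2)
            if ssd < best_ssd:
                best_ssd, best_pos = ssd, (r, c)
    return best_pos
```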
The normalizing unit 140 performs normalization on a face image corresponding to the face detected by the face detection unit 120, on the basis of the eye detection information outputted from the eye detection unit 130, and outputs the face image that has been normalized (normalized face image) to the face orientation determining unit 150 and the face identification unit 190. In addition, the normalizing unit 140 holds a normalization template for performing normalization, and normalizes a face image on the basis of this normalization template. As this normalization template, it is possible to use, for example, a normalization template 141 that takes the positions of the eyes as a reference, as shown in
The face orientation determining unit 150 determines the orientation of a face detected by the face detection unit 120, by using face orientation determination reference data stored in the face-orientation-determination-reference-data holding unit 151, and outputs this determination result to the image transformation unit 180. That is, the face orientation determining unit 150 calculates to what extent determination conditions according to face orientation determination reference data stored in the face-orientation-determination-reference-data holding unit 151 are satisfied, and obtains a cumulative result value related to face orientation. Then, on the basis of this cumulative result value, the face orientation determining unit 150 determines the orientation of a face detected by the face detection unit 120. Face orientations to be determined by the face orientation determining unit 150 are, for example, “frontal”, “right-facing”, and “left-facing”. As a method of determining a face orientation, it is possible to employ, for example, a determination method that determines the face orientation by using the geometrical features of face organs such as eyes, nose, and mouth, and a determination method that performs a determination process based on a discriminator using a difference in luminance value between two points on a face image to be determined. The face orientation determination will be described later in detail with reference to
The face-orientation-determination-reference-data holding unit 151 holds face orientation determination reference data used for determination by the face orientation determining unit 150. This face orientation determination reference data is reference data that has been sufficiently trained for face orientations to be determined by the face orientation determining unit 150. The face orientation determination reference data will be described later in detail with reference to
The registered face image storing unit 160 stores, as registered face images, face images used for face identification by the face identification unit 190, and supplies the stored registered face images to the image transformation unit 180. The registered face images stored in the registered face image storing unit 160 will be described later in detail with reference to
The three-dimensional model storing unit 170 stores a three-dimensional model for transforming a registered face image stored in the registered face image storing unit 160, and supplies the stored three-dimensional model to the image transformation unit 180. The three-dimensional model stored in the three-dimensional model storing unit 170 will be described later in detail with reference to
The image transformation unit 180 transforms a registered face image stored in the registered face image storing unit 160 to generate a check face image, and outputs the generated check face image to the face identification unit 190. That is, the image transformation unit 180 projects a registered face image onto the surface of a three-dimensional model stored in the three-dimensional model storing unit 170 so that the horizontal direction of the three-dimensional model substantially coincides with the horizontal direction of a face contained in the registered face image. Then, the image transformation unit 180 transforms the three-dimensional model with the registered face image pasted, on the basis of the face orientation determination result outputted from the face orientation determining unit 150. Then, the image transformation unit 180 projects the registered face image pasted on the transformed three-dimensional model, onto a plane to generate a check face image (two-dimensional image). When a determination result indicating “frontal” is outputted from the face orientation determining unit 150, the image transformation unit 180 outputs a registered face image stored in the registered face image storing unit 160 to the face identification unit 190 as a check face image, without performing transformation on the registered face image. This transformation of a registered face image will be described later in detail with reference to
The transformation data storing unit 181 stores transformation data used by the image transformation unit 180 to perform transformation, in association with the orientation of a face to be determined by the face orientation determining unit 150, and supplies the stored transformation data to the image transformation unit 180. The transformation data storing unit 181 will be described later in detail with reference to
The face identification unit 190 identifies whether or not a face detected by the face detection unit 120 is a face (registered face) contained in a registered face image stored in the registered face image storing unit 160, and outputs the identification result to the face identification result outputting unit 195. That is, the face identification unit 190 identifies whether or not a face detected by the face detection unit 120 is a registered face, by comparing a normalized face image outputted from the normalizing unit 140 against a check face image outputted from the image transformation unit 180. As this face identification method, for example, it is possible to employ a face identification method that extracts feature quantities respectively from a registered face image and a normalized face image that are to be compared against each other, and performs face identification on the basis of the extracted feature quantities. That is, the feature quantities extracted from the registered face image, and the feature quantities extracted from the normalized face image are compared against each other to calculate a similarity between these feature quantities. Then, if the calculated similarity exceeds a threshold, the face contained in the normalized face image is determined to be a registered face. Also, as the face identification method, for example, it is also possible to employ an identification method that performs an identification process based on a weak discriminator using a difference in luminance value between two points on each of a registered face image and a normalized face image that are to be compared against each other, or the like. The face identification unit 190 represents an example of an identifying unit described in the claims.
The face identification result outputting unit 195 outputs a face identification result outputted from the face identification unit 190. For example, when an identification result indicating that a face detected by the face detection unit 120 is a registered face is outputted from the face identification unit 190, the face identification result outputting unit 195 makes a display to that effect. For example, the face identification result outputting unit 195 can attach a specific marker to a face that has been determined as a registered face, on a captured image displayed on a display unit (not shown). Also, the face identification result outputting unit 195 can attach the name of the corresponding person in the vicinity of a face that has been determined as a registered face. Also, the face identification result outputting unit 195 can update the image capturing parameters of the image capturing unit 112 on the basis of a face that has been determined as a registered face.
As shown in
The normalizing unit 140 performs a scaling process, a rotating process, and the like on the face image 203 so that the detected eyes' positions 204 and 205 coincide with the reference positions 142 and 143 in the normalization template 141 shown in
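A minimal sketch of such eye-based normalization is given below, assuming the detected eye centers and the template's reference positions are supplied as (x, y) pairs; it computes the 2x3 similarity transform (scale, rotation, translation), which can then be fed to any affine warping routine.

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye, ref_left, ref_right):
    """2x3 similarity transform that maps the detected eye centers onto
    the normalization template's reference positions (positions 142 and
    143). The left eye maps exactly to ref_left; scale and rotation are
    chosen so the right eye lands on ref_right."""
    src = np.subtract(right_eye, left_eye).astype(np.float64)
    dst = np.subtract(ref_right, ref_left).astype(np.float64)
    scale = np.hypot(*dst) / np.hypot(*src)
    angle = np.arctan2(dst[1], dst[0]) - np.arctan2(src[1], src[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    m = np.array([[c, -s], [s, c]])
    t = np.asarray(ref_left, dtype=np.float64) - m @ np.asarray(left_eye)
    return np.hstack([m, t.reshape(2, 1)])
```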
In the case shown in
As shown in
Next, a detailed description will be given of a face orientation determination with reference to the drawings.
The face orientation determination reference data held in the face-orientation-determination-reference-data holding unit 151 includes a Coordinate 0(x, y) 152 of a normalized face image and a Coordinate 1(x, y) 153 of the normalized face image, a threshold (m) 154, and a weight (r) 155 of reference data. The Coordinate 0(x, y) 152 and the Coordinate 1(x, y) 153 are coordinates indicating the positions of two points in the normalized face image. The threshold (m) 154 is a threshold with respect to the level difference (luminance difference) between Coordinate 0 and Coordinate 1. Further, the weight (r) 155 of reference data is a weight coefficient that is added on the basis of the result of a comparison between the level difference (luminance difference) between Coordinate 0 and Coordinate 1, and the threshold (m). The face-orientation-determination-reference-data holding unit 151 stores n pieces of reference data, each made up of a combination of these values. The reference data illustrated in the embodiment of the present invention is referred to as a weak discriminator (weak hypothesis).
Values constituting the face orientation determination reference data are set by using, for example, the 300 to 1000 most effective combinations among those learned by a machine learning algorithm such as AdaBoost.
Next, with reference to the drawings, a detailed description will be given of an example in which a face orientation determination is made with respect to a normalized face image by using face orientation determination reference data. In this example, with the upper left corner of the normalized face image 216 shown in
For example, a position in the normalized face image 216 corresponding to the value of the Coordinate 0(x, y) 152 stored on the first row (Reference Data 0) of face orientation determination reference data is defined as a position 221, and a position in the normalized face image 216 corresponding to the value of the Coordinate 1(x, y) 153 is defined as a position 222. Also, a position in the normalized face image 216 corresponding to the value of the Coordinate 0(x, y) 152 stored on the second row (Reference Data 1) of face orientation determination reference data is defined as a position 223, and a position in the normalized face image 216 corresponding to the value of the Coordinate 1(x, y) 153 is defined as a position 224. Further, a position in the normalized face image 216 corresponding to the value of the Coordinate 0(x, y) 152 stored on the third row (Reference Data 2) of face orientation determination reference data is defined as a position 225, and a position in the normalized face image 216 corresponding to the value of the Coordinate 1(x, y) 153 is defined as a position 226.
First, the value of a score S used for performing the determination is set to 0, and computations using the values contained in Reference Data 0 of the face orientation determination reference data are performed. Specifically, a luminance value A(0) at the position 221 corresponding to the value of the Coordinate 0(x, y) 152 contained in Reference Data 0 of the face orientation determination reference data, and a luminance value B(0) at the position 222 corresponding to the value of the Coordinate 1(x, y) 153 are extracted. Then, the difference C(0) between the respective extracted luminance values is calculated by using the equation below.
C(0)=A(0)−B(0)
Subsequently, by comparing the calculated value C(0) of the difference between the respective luminances, against the value of the threshold (m) 154 contained in Reference Data 0 of the face orientation determination reference data, it is determined whether or not the calculated value C(0) is larger than the threshold (m) 154. If the calculated value C(0) is equal to or smaller than the value of the threshold (m) 154, the value of the weight (r) 155 contained in Reference Data 0 of the face orientation determination reference data is added to the score S. On the other hand, if the calculated value C(0) is larger than the value of the threshold (m) 154, the value of the weight (r) 155 contained in Reference Data 0 of the face orientation determination reference data is not added to the score S.
Subsequently, the above-described computations are repeated by using the values contained in Reference Data 1 of the face orientation determination reference data. Specifically, a luminance value A(1) at the position 223 corresponding to the value of the Coordinate 0(x, y) 152 contained in Reference Data 1 of the face orientation determination reference data, and a luminance value B(1) at the position 224 corresponding to the value of the Coordinate 1(x, y) 153 are extracted. Then, the difference C(1) between the respective extracted luminance values is calculated by using the equation below.
C(1)=A(1)−B(1)
Subsequently, by comparing the calculated value C(1) of the difference between the respective luminances, against the value of the threshold (m) 154 contained in Reference Data 1 of the face orientation determination reference data, it is determined whether or not the calculated value C(1) is larger than the threshold (m) 154. If the calculated value C(1) is equal to or smaller than the value of the threshold (m) 154, the value of the weight (r) 155 contained in Reference Data 1 of the face orientation determination reference data is added to the score S. On the other hand, if the calculated value C(1) is larger than the value of the threshold (m) 154, the value of the weight (r) 155 contained in Reference Data 1 of the face orientation determination reference data is not added to the score S.
Subsequently, from Reference Data 2 of the face orientation determination reference data onwards, the above-described computations are repeated by sequentially using values up to those of Reference Data n−1.
That is, when performing a determination process using face orientation determination reference data with respect to the normalized face image 216, C(i) is calculated by using Equation (A), by sequentially using values contained in Reference Data 0 to n−1 of the face orientation determination reference data. Then, it is determined whether or not the calculated value of C(i) satisfies Equation (B). Here, a variable i is an integer, and is a value from 0 to n−1.
C(i)=A(i)−B(i) . . . (A)
C(i)>m(i) . . . (B)
If the calculated value of C(i) satisfies Equation (B), the value of r(i) is not added to the score S, and if the calculated value of C(i) does not satisfy Equation (B), the value of r(i) is added to the score S. Here, the value of luminance corresponding to the Coordinate 0(x, y) 152 contained in Reference Data i is indicated by A(i), and the value of luminance corresponding to the Coordinate 1(x, y) 153 contained in Reference Data i is indicated by B(i). Also, the value of the threshold (m) 154 contained in Reference Data i is indicated by m(i), and the value of the weight (r) 155 contained in Reference Data i is indicated by r(i).
Then, after computations using values contained in Reference Data 0 to n−1 of the face orientation determination reference data are finished, respective attributes are determined on the basis of the value of the score S as a cumulative result value.
Here, a score Sn(P) obtained after finishing computations using values contained in Reference Data 0 to n−1 of the face orientation determination reference data can be represented by Equation (C) below.
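From the symbol definitions given in the next paragraph, the omitted Equation (C) can be reconstructed as follows.
Sn(P)=Σ(i=0 to n−1)ri·h(P(xi0, yi0)−P(xi1, yi1)−mi) . . . (C)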
Here, Sn(P) indicates a cumulative result value of Reference Data 0 to n−1, ri indicates the value of the weight (r) 155 contained in Reference Data i, and P(xi0, yi0) indicates the value of luminance corresponding to the Coordinate 0(x, y) 152 contained in Reference Data i. Also, P(xi1, yi1) indicates the value of luminance corresponding to the Coordinate 1(x, y) 153 contained in Reference Data i, mi indicates the value of the threshold (m) 154 contained in Reference Data i, and n indicates the number of pieces of reference data. Also, h(z) indicates a function that becomes “0” when z>0, and becomes “1” when z≦0.
Next, with reference to
For example, it is assumed that learning is performed on the basis of the above-described machine learning algorithm, with learning samples for right-facing orientation at the time of learning taken on the positive side, and learning samples for left-facing orientation taken on the negative side. In the case where learning has been performed in this way, when determining a face orientation, “right-facing”, “frontal”, or “left-facing” is determined by using thresholds 156 and 157 shown in
For example, in the range of cumulative result values related to face orientation shown in
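In code form, the cumulative scoring and the three-way determination described above can be sketched as follows. This is a minimal sketch assuming a grayscale normalized face image held as a numpy array; the function name and the tuple layout of the reference data are hypothetical.

```python
def determine_face_orientation(face_img, reference_data,
                               threshold1, threshold2):
    """Cumulative weak-discriminator scoring of a normalized face image.

    reference_data is a sequence of ((x0, y0), (x1, y1), m, r) tuples,
    as held in the face-orientation-determination-reference-data
    holding unit 151. threshold1 and threshold2 play the roles of
    thresholds 156 and 157: scores between them mean "frontal", above
    the upper one "right-facing", below the lower one "left-facing".
    """
    s = 0.0
    for (x0, y0), (x1, y1), m, r in reference_data:
        # Luminance difference between the two stored coordinates.
        c = float(face_img[y0, x0]) - float(face_img[y1, x1])
        if c <= m:  # the weight is added only when C(i) <= m(i)
            s += r
    if threshold1 <= s <= threshold2:
        return "frontal"
    return "right-facing" if s > threshold2 else "left-facing"
```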
As shown in
The three-dimensional model 300 is a three-dimensional CG (Computer Graphics) model having a shape obtained by bending a rectangle in half to the back side along a bend line, which is the line connecting the respective midpoints of the top and bottom sides of the rectangle. Also, as shown in
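One plausible parameterization of such a bent model can be sketched as follows. The exact angle and sign conventions are not fixed by this description (the figures are omitted), so the bend angle alpha and the choice of negative z for the back side are assumptions.

```python
import numpy as np

def bend_rectangle(points_xy: np.ndarray, alpha: float) -> np.ndarray:
    """Fold a flat rectangle to the back side along the vertical
    centerline (x = 0). Each half is tilted back by the bend angle
    alpha, so depth grows linearly with distance from the bend line.
    points_xy is an (N, 2) array of coordinates on the flat rectangle.
    """
    x, y = points_xy[:, 0], points_xy[:, 1]
    z = -np.abs(x) * np.tan(alpha)  # back side taken as negative z
    return np.column_stack([x, y, z])
```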
The yaw rotation angle (θ) 184 is a value indicating an angle for performing yaw rotation of the three-dimensional model 300 having a registered face image projected on its surface. The roll rotation angle (θ) 185 is a value indicating an angle for performing roll rotation of the three-dimensional model 300 having a registered face image projected on its surface. While this example is directed to a case where θ=−30 or 30, it is possible to set, for example, θ=−30 to −20 or 20 to 30.
The translation distance along x-axis (Tx) 186 is a value indicating a distance for performing parallel translation along the x-axis direction of the three-dimensional model 300 having a registered face image projected on its surface. The translation distance along y-axis (Ty) 187 is a value indicating a distance for performing parallel translation along the y-axis direction of the three-dimensional model 300 having a registered face image projected on its surface. The translation distance along z-axis (Tz) 188 is a value indicating a distance for performing parallel translation along the z-axis direction of the three-dimensional model 300 having a registered face image projected on its surface.
Here, when a face orientation is determined to be “left-facing” or “right-facing” by the face orientation determining unit 150, the image transformation unit 180 performs rotation and parallel translation with respect to the three-dimensional model 300 by using values stored in association with the determined face orientation (Left-facing 182 or Right-facing 183). Such rotation and parallel translation will be described later in detail with reference to
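The rotation and parallel translation applied to the model can be sketched as follows: a yaw rotation about the y-axis followed by a translation vector, mirroring the stored transformation data (items 184 and 186 to 188). Roll rotation is omitted for brevity, and the function name is hypothetical.

```python
import numpy as np

def transform_model(vertices: np.ndarray, theta_deg: float, t) -> np.ndarray:
    """Yaw-rotate the model's (N, 3) vertex array about the y-axis by
    theta degrees, then translate it by t = (Tx, Ty, Tz)."""
    th = np.radians(theta_deg)
    r_yaw = np.array([[np.cos(th), 0.0, np.sin(th)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(th), 0.0, np.cos(th)]])
    return vertices @ r_yaw.T + np.asarray(t, dtype=np.float64)
```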
Equation (1) represents a determinant for transforming Coordinate g1(x, y) on the registered face image 161 shown in
Here, the matrix (x y 1 1) on the left hand side of Equation (1) is a matrix corresponding to Coordinate g1(x, y) of the registered face image 161, and the matrix (u v 1 1) on the right hand side is a matrix corresponding to Coordinate g4(u, v) of the check face image 340. The matrix F on the right hand side is a matrix for projecting the image pasted on the three-dimensional model 300 onto a plane to generate a two-dimensional image, and changing the origin of coordinates in this two-dimensional image. That is, by using the inverse matrix F−1 of the matrix F, as shown in
Here, let a distance Zc be the distance from the reference point K1 to the origin O3 of the registered face image three-dimensional model 330, and a distance f be the distance from the reference point K1 to the image projection plane 350. The distance f can be set as, for example, the same value as the length of one side at the left and right ends of the registered face image three-dimensional model 330. The matrix J using the distances Zc and f is a matrix for projecting an image onto the surface of the three-dimensional model 300 by using triangle similitude. Also, as shown in
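The projection by triangle similitude can be sketched roughly as follows. The depth convention (Zc minus the model-local z) is an assumption, since the figures and the explicit matrices are omitted here.

```python
import numpy as np

def project_to_plane(points_xyz: np.ndarray, f: float, zc: float) -> np.ndarray:
    """Project (N, 3) model points onto the image projection plane by
    triangle similitude: a point at depth d from the reference point K1
    maps to (f * x / d, f * y / d) on a plane at distance f."""
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    depth = zc - z
    return np.column_stack([f * x / depth, f * y / depth])
```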
As shown in
Next, operation of the image processing apparatus 100 according to an embodiment of the present invention will be described with reference to the drawings.
First, the image capturing unit 112 generates a captured image (step S901). Subsequently, the face detection unit 120 performs a face detection process with respect to the generated captured image (step S902). Through this face detection process, it is determined whether or not a face has been detected (step S903), and if no face has been detected, operation of the face identification process is ended. On the other hand, if a face has been detected (step S903), the eye detection unit 130 performs an eye detection process with respect to the face detected by the face detection unit 120 (step S904).
Subsequently, the normalizing unit 140 performs normalization by adjusting a face image containing the face detected by the face detection unit 120 so that the positions of the eyes coincide with the reference positions 142 and 143 of the normalization template 141 (step S905). Subsequently, the face orientation determining unit 150 executes a face orientation determining process with respect to a normalized face image, which is the face image that has been normalized (step S920). This face orientation determining process will be described later in detail with reference to
Subsequently, the image transformation unit 180 determines whether or not a face orientation determined by the face orientation determining unit 150 is frontal (step S906). If the face orientation determined by the face orientation determining unit 150 is not frontal (step S906), the image transformation unit 180 projects a registered face image stored in the registered face image storing unit 160 onto the three-dimensional model (step S907). Subsequently, the image transformation unit 180 rotates and translates the three-dimensional model 300 with the registered face image pasted thereon, by using transformation data stored in the transformation data storing unit 181 in association with the face orientation determined by the face orientation determining unit 150 (step S908). Subsequently, the registered face image pasted on the three-dimensional model 300 that has been rotated and translated is projected onto a plane to generate a two-dimensional image (check face image) (step S909).
Subsequently, the face identification unit 190 compares the two-dimensional image generated by the image transformation unit 180, against the normalized face image generated by the normalizing unit 140 to determine whether or not the face in this normalized face image is the face of the same person as the face in the registered face image (step S910). That is, a face identification process is performed with respect to the face detected by the face detection unit 120. In a case where a plurality of registered face images are stored in the registered face image storing unit 160, two-dimensional images are generated with respect to the individual registered face images, and the face identification process is performed with respect to each of the two-dimensional images.
If the face orientation determined by the face orientation determining unit 150 is frontal (step S906), the face identification unit 190 compares the registered face image stored in the registered face image storing unit 160, against the normalized face image generated by the normalizing unit 140. Then, the face identification unit 190 determines whether or not the face in this normalized face image is the face of the same person as the face in the registered face image (step S910). In a case where a plurality of registered face images are stored in the registered face image storing unit 160, the face identification process is performed with respect to each of the registered face images.
First, the score S is initialized to “0” (step S921), and the variable i is initialized to “0” (step S922). Subsequently, from among luminance values extracted from a normalized face image, the luminance value A(i) corresponding to the Coordinate 0(x, y) 152 of Reference Data i in the face-orientation-determination-reference-data holding unit 151, and the luminance value B(i) corresponding to the Coordinate 1(x, y) 153 are acquired (step S923). Subsequently, the difference C(i) between the respective acquired luminance values is calculated by using the following equation (step S924).
C(i)=A(i)−B(i)
Subsequently, the calculated value C(i) of the difference between the respective luminances, and the value of the threshold (m) 154 contained in Reference Data i in the face-orientation-determination-reference-data holding unit 151 are compared against each other to determine whether or not the calculated value C(i) is larger than the value of the threshold (m) 154 (step S925). If the calculated value C(i) is equal to or smaller than the value of the threshold (m) 154 (step S925), the value of the weight (r) 155 contained in Reference Data i in the face-orientation-determination-reference-data holding unit 151 is added to the score S (step S926). On the other hand, if the calculated value C(i) is larger than the value of the threshold (m) 154 (step S925), the value of the weight (r) 155 contained in Reference Data i in the face-orientation-determination-reference-data holding unit 151 is not added to the score S, and the process proceeds to step S927.
Subsequently, “1” is added to the variable i (step S927), and it is determined whether or not the variable i is larger than n−1 (step S928). If the variable i is not larger than n−1 (step S928), the determination process has not been finished with respect to each reference data in the face-orientation-determination-reference-data holding unit 151, so the process returns to step S923, and the determination process is repeated (steps S923 to S927). On the other hand, if the variable i is larger than n−1 (step S928), it is determined whether or not the value of the score S falls between Threshold 1 and Threshold 2 (step S929). Threshold 1 corresponds to the threshold 156 shown in
If the value of the score S falls between Threshold 1 and Threshold 2 (step S929), it is determined that the orientation of a face contained in a normalized face image is “frontal” (step S930).
If the value of the score S does not fall between Threshold 1 and Threshold 2 (step S929), it is determined whether or not the value of the score S is larger than Threshold 2 (step S931). If the value of the score S is larger than Threshold 2 (step S931), it is determined that the face contained in the normalized face image is “right-facing” (step S932). On the other hand, if the value of the score S is smaller than Threshold 2 (step S931), it is determined that the face contained in the normalized face image is “left-facing” (step S933). Step S907 represents an example of a projecting step described in the claims. Step S908 represents an example of a transforming step described in the claims. Step S909 represents an example of a generating step described in the claims. Step S910 represents an example of an identifying step described in the claims.
The foregoing description is directed to the example in which a registered face image is transformed into a check face image by performing computations using the matrices F to K in Equation (1). That is, in the computations using the matrices F to K in Equation (1), the three-dimensional coordinate origin in the three-dimensional model 300 with a registered face image pasted on its surface is set as the midpoint of the bend line of the three-dimensional model 300, and rotation and translation are performed with this three-dimensional coordinate origin as a reference. In a case where the midpoint of the bend line is set as the three-dimensional coordinate origin in this way, after performing yaw rotation of the three-dimensional model 300, it is necessary to perform roll rotation and parallel translation along the y-axis for effecting normalization with reference to the positions of the eyes.
Here, a registered face image is an image that has been normalized with reference to the positions of the eyes. Thus, for example, when the midpoint of the line segment connecting the eyes of a face contained in a registered face image pasted on the surface of the three-dimensional model 300 is set as the three-dimensional coordinate origin, roll rotation and parallel translation along the y-axis can be omitted. This allows for a reduction in the amount of computation in comparison to the case of performing computations using the matrices F to K in Equation (1). In the following, with reference to the drawings, a detailed description will be given of an example in which image transformation is performed by setting, as the three-dimensional coordinate origin, the midpoint of the line segment connecting the eyes in the three-dimensional model 300, thereby omitting roll rotation and parallel translation along the y-axis.
The image transformation unit 510 generates a two-dimensional image from a registered face image stored in the registered face image storing unit 160, by using transformation parameters that are stored in the transformation data storing unit 520 in association with a face orientation determined by the face orientation determining unit 150.
The transformation data storing unit 520 stores transformation parameters used by the image transformation unit 510 to perform transformation, in association with a face orientation to be determined by the face orientation determining unit 150, and supplies the stored transformation parameters to the image transformation unit 510. The transformation data storing unit 520 will be described later in detail with reference to
The transformation parameters 523 are transformation parameters used by the image transformation unit 510 to transform a registered face image stored in the registered face image storing unit 160 to generate a check face image. Specifically, the transformation parameters 523 are transformation parameters used for projecting a registered face image onto the surface of a three-dimensional model, transforming the three-dimensional model on the basis of the face orientation determination result, and projecting the registered face image on the transformed three-dimensional model onto a plane to generate a check face image. When projecting the registered face image onto the surface of the three-dimensional model, the registered face image is projected in such a way that the bend line of the three-dimensional model stored in the three-dimensional model storing unit 170 substantially coincides with the centerline with respect to the horizontal direction of a face contained in the registered face image. When transforming the three-dimensional model, the three-dimensional model is rotated and translated with reference to the midpoint of the line segment connecting the eyes of a face contained in the registered face image pasted on the surface of the three-dimensional model, in such a way that the positions of the eyes become specific positions.
Here, when a face orientation is determined to be either “left-facing” or “right-facing” by the face orientation determining unit 150, the image transformation unit 510 transforms the registered face image into a two-dimensional image by using transformation parameters stored in association with the determined face orientation (Left-orientation 521 or Right-orientation 522). This transformation uses transformation parameters according to “x<0” and “x≧0”, with the center position of the registered face image taken as the coordinate origin. The registered face image is transformed into the two-dimensional image with the center position of the two-dimensional image taken as the coordinate origin. A method of calculating transformation parameters A to E will be described later in detail with reference to
As shown in
As shown in
Here, in a case where the middle position of the line segment connecting the eyes of a face contained in each of the registered face image 161 and the registered face image three-dimensional model 330 is taken as the origin, q=0. Accordingly, let Ty=0. By substituting Equation (2) and Equation (3) into Equation (4) and Equation (5), parallel translation components Tx, Ty, and Tz can be obtained as follows.
Tx=p sin θ(tan α−(p/f)) (6)
Ty=0 (7)
Tz=(f+p tan α)cos θ−Zc (8)
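In code form, Equations (6) to (8) translate directly; alpha and theta are in radians, and p, f, and Zc are as defined above (their figure references are omitted here).

```python
import numpy as np

def translation_components(p, f, alpha, theta, zc):
    """Direct transcription of Equations (6) to (8)."""
    tx = p * np.sin(theta) * (np.tan(alpha) - p / f)   # Equation (6)
    ty = 0.0                                           # Equation (7)
    tz = (f + p * np.tan(alpha)) * np.cos(theta) - zc  # Equation (8)
    return tx, ty, tz
```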
Here, in a case where the middle position between the eyes of the face in the registered face image three-dimensional model 330 is taken as the coordinate origin, as described above, parallel translation of Ty and roll rotation can be omitted. Here, as for the three-dimensional coordinate prior to yaw rotation, which is obtained after the coordinate origin on the registered face image 161 is changed to the center point and the registered face image 161 is projected onto the surface of the three-dimensional model 300, if xc≧0, then, referring to
xc=x+Cx
yc=y+Cy
Cx and Cy are the same as the values shown in
Here, the matrix on the left hand side of the right term of Equation (9) is a matrix for performing yaw rotation of the three-dimensional model in three-dimensional space. Like the matrix G, this matrix is a matrix for performing yaw rotation by the rotation angle θ. The matrix on the right hand side of the right term of Equation (9) is a matrix indicating a three-dimensional coordinate obtained after performing yaw rotation of the three-dimensional model in three-dimensional space. It should be noted, however, that uc and vc are defined as follows. Cu and Cv are the same as the values shown in
uc=u−Cu
vc=v−Cv
Subsequently, expanding the matrices in Equation (9) yields Equations (10) to (12) below.
xc=(uc/f)Z cos θ+(uc/f)Zc cos θ−Tx cos θ−Z sin θ+Tz sin θ (10)
yc=(vc/f)Z+(vc/f)Zc (11)
−xc tan α=(uc/f)Z sin θ+(uc/f)Zc sin θ−Tx sin θ+Z cos θ−Tz cos θ (12)
In a case where the middle position between the eyes in the registered face image three-dimensional model 330 is taken as an origin O7, the value of Z can be obtained by Equation (13) below.
Z=−xc(cos θ−tan α sin θ)tan(α+θ)+Tz (13)
Here, substituting Tz in Equation (8) described above into Equation (13) yields Equation (14) below.
Z=−xc(cos θ−tan α sin θ)tan(α+θ)+(f+p tan α)cos θ−Zc (14)
Subsequently, by substituting Tx in Equation (6), Tz in Equation (8), and Z in Equation (14) into Equation (10) and Equation (11), xc and yc when xc≧0 can be obtained as Equation (15) and Equation (16). Here, xc, yc, uc and vc shown in Equations (15) to (33) are simply represented as x, y, u, and v for the purpose of generalization.
Likewise, xc and yc when xc<0 can be obtained as Equation (17) and Equation (18).
Subsequently, xc and yc shown in Equations (15) to (18) can be modified by using Equations (19) to (21) below. The modified equations are represented as Equations (22) to (25).
Here, Equations (22) and (23) indicate xc and yc when xc≧0, and Equations (24) and (25) indicate xc and yc when xc<0. Here, f, p, α, and θ other than the variables u and v can be set as fixed values. For example, α and θ can be set such that α=10 to 20 and θ=−30 to −20 or 20 to 30. In the embodiment of the present invention, a registered face image is transformed in accordance with “left-facing” or “right-facing” determined by the face orientation determining unit 150. Accordingly, assuming the case of “left-facing” or “right-facing” (θ≧0 or θ<0), Equations (26) to (33) below can be calculated.
Here, A0, B0, B1, C0, C1, D0, D1 and E are values stored in the transformation data storing unit 520 in
In this way, the transformation parameters 523 stored in the transformation data storing unit 520 are calculated in advance, and the image transformation unit 510 can generate a two-dimensional image from a registered face image by using the values of the transformation parameters 523. That is, with the center position of the registered face image taken as the origin, and the center position of the two-dimensional image taken as the origin, a coordinate (x, y) in the registered face image is transformed into a coordinate (u, v) in the two-dimensional image. This makes it possible to reduce the amount of computation in comparison to the case in which the three-dimensional coordinate origin in the three-dimensional model 300 having a registered face image pasted on its surface is set as the midpoint of the bend line. This enables a significant reduction in the processing load at the time of the face identification process. Thus, the face identification function can be easily incorporated into compact digital still cameras, mobile telephones, and the like.
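The parameter-driven generation of the check face image amounts to an inverse-mapping warp, which can be sketched as follows. Since the closed forms of Equations (22) to (25) are omitted here, the mapping is passed in as a callable; nearest-neighbor sampling is used for brevity, and the function name is hypothetical.

```python
import numpy as np

def generate_check_face_image(registered: np.ndarray, mapping,
                              out_h: int, out_w: int) -> np.ndarray:
    """Inverse-mapping warp: for every pixel (u, v) of the check face
    image (origin at the image center), `mapping` returns the source
    coordinate (x, y) in the registered face image, evaluated from the
    precomputed transformation parameters."""
    h, w = registered.shape[:2]
    out = np.zeros((out_h, out_w) + registered.shape[2:], registered.dtype)
    for row in range(out_h):
        for col in range(out_w):
            u, v = col - out_w / 2.0, row - out_h / 2.0
            x, y = mapping(u, v)
            sc, sr = int(round(x + w / 2.0)), int(round(y + h / 2.0))
            if 0 <= sr < h and 0 <= sc < w:
                out[row, col] = registered[sr, sc]
    return out
```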
Here, in the face identification result graph 600, the horizontal axis indicates the value of the actual orientation angle of a face contained in a captured image, and the vertical axis indicates a value obtained by summing up and then averaging the scores of face identification results outputted by the face identification unit 190. It is assumed that a face identification threshold 601 for determining whether or not a face is a registered face on the basis of the calculated score of face identification result is set within the range of 0 to 10000. Further, in this example, the face orientation angle when a face contained in a captured image is frontal is set as 90 degrees. The angle at or below which a face is determined to be left-facing by the face orientation determining unit 150 is set to 77 degrees. The angle at or above which a face is determined to be right-facing by the face orientation determining unit 150 is set to 113 degrees.
A line 611 indicates values obtained with respect to individual captured images by summing up and then averaging the scores of face identification results calculated when face identification was performed after performing image transformation by the image transformation unit 180 on the basis of the results of determination by the face orientation determining unit 150. On the other hand, a line 612 indicates values obtained with respect to individual captured images by summing up and then averaging the scores of face identification results calculated when face identification was performed without performing image transformation by the image transformation unit 180.
As indicated by the face identification result graph 600, there is hardly any difference between the lines 611 and 612 when the orientation of a face is close to frontal (between vertical lines 603 and 604). However, upon exceeding an angle beyond which a face is determined to be left-facing or right-facing by the face orientation determining unit 150 (on the left side with respect to the vertical line 603 or on the right side with respect to the vertical line 604), as indicated by, for example, difference values 621 to 624, the difference value between the lines 611 and 612 becomes large. That is, by applying the embodiment of the present invention, the accuracy of face identification can be enhanced also with respect to a captured image containing a face that is facing in an oblique direction. While this example is directed to the case in which face identification is performed by using a pre-recorded captured image, accuracy can be similarly enhanced with respect to face identification for a captured image performed at the time of image capture by an image capturing apparatus such as a digital still camera.
Next, operation of the image processing apparatus 500 according to an embodiment of the present invention will be described with reference to the drawings.
If a face orientation determined by the face orientation determining unit 150 is not frontal (step S906), the image transformation unit 510 generates a two-dimensional image (check face image) on the basis of the face orientation determined by the face orientation determining unit 150 (step S950). That is, the image transformation unit 510 transforms a registered face image stored in the registered face image storing unit 160 to generate a two-dimensional image (check face image), by using transformation parameters stored in the transformation data storing unit 520 in association with the face orientation determined by the face orientation determining unit 150.
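As a concrete illustration of step S950, the sketch below looks up a parameter set keyed by the determined orientation and warps the registered face image pixel by pixel. The table contents and numeric values are placeholders (assumptions), not the transformation parameters 523 actually stored in the transformation data storing unit 520; the geometry is the same simplified bent-plane model as in the earlier sketch.

import numpy as np

# Placeholder parameter sets per non-frontal orientation class.
TRANSFORM_PARAMS = {
    "left-facing":  dict(f=500.0, d=520.0, alpha=np.radians(15.0), theta=np.radians(-25.0)),
    "right-facing": dict(f=500.0, d=520.0, alpha=np.radians(15.0), theta=np.radians(25.0)),
}

def generate_check_face_image(reg_img, orientation):
    """Sketch of step S950: warp the frontal registered face image so
    that its orientation matches the determined face orientation."""
    prm = TRANSFORM_PARAMS[orientation]
    h, w = reg_img.shape[:2]
    out = np.zeros_like(reg_img)
    cx, cy = w / 2.0, h / 2.0
    for yi in range(h):
        for xi in range(w):
            x, y = xi - cx, yi - cy
            z = -abs(x) * np.tan(prm["alpha"])           # bent-plane lift
            x_r = x * np.cos(prm["theta"]) + z * np.sin(prm["theta"])
            z_r = -x * np.sin(prm["theta"]) + z * np.cos(prm["theta"])
            u = int(round(prm["f"] * x_r / (prm["d"] + z_r) + cx))
            v = int(round(prm["f"] * y / (prm["d"] + z_r) + cy))
            if 0 <= u < w and 0 <= v < h:
                out[v, u] = reg_img[yi, xi]              # forward splat
    return out

A production implementation would instead evaluate the precomputed rational expressions (Equations (26) to (33)) and fill the output image by inverse mapping, which avoids the holes that a forward splat can leave.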
A three-dimensional model 700 shown in
Image transformation may also be performed by using the three-dimensional model 300 in which α=0, for example. That is, image transformation may be performed by using a three-dimensional model having a substantially planar shape; in this case, the transformation reduces to a yaw rotation followed by a perspective projection. Alternatively, image transformation may be performed by using a three-dimensional model that has a shape in which at least a part of the three-dimensional model in the horizontal direction on the surface onto which an image is to be projected is bent to the back side. In this way, according to the embodiment of the present invention, image transformation of a registered face image can be performed by using a so-called three-dimensional simplified model (simplified polygon model).
As described above, according to the embodiment of the present invention, by using a three-dimensional model, a two-dimensional image (check face image) of the same orientation as the face image contained in a captured image can be generated from a registered face image, and this generated two-dimensional image and a normalized face image can be compared and checked against each other. Thus, at the time of the identification process by the face identification unit 190, the orientations of faces contained in two images to be compared against each other become the same, thereby making it possible to enhance the accuracy of face identification. In addition, at the time of the identification process, a two-dimensional image of the same orientation as the face image contained in a captured image can be generated from a registered face image. Therefore, it suffices to register only one frontal registered face image with respect to each person. This allows for a reduction in the storage size of registered face images.
When performing image transformation on a registered face image, a simplified geometric model is used, thereby making it possible to achieve a significant reduction in the amount of necessary computation in comparison to a case in which image transformation is performed by using a standard three-dimensional face model according to the related art. This enables implementation also on mobile devices and the like capable of a relatively limited amount of computation, such as mobile telephones and digital still cameras.
Here, a case is considered in which, instead of transforming a registered face image on the basis of a determined face orientation, a face image contained in a captured image is transformed into a frontal image to perform face identification. For example, if the face contained in the face image is right-facing, the right-side portion of the face is not contained in the face image. Hence, if the face image is transformed into a frontal face, there is a great possibility that accurate image transformation may not be performed on the right-side portion of the transformed frontal face. In particular, facial organs such as the eyes and mouth are important for face identification. If these organs are not contained in the face image, there is a great possibility that accurate image transformation may not be performed on those organs in the transformed frontal face, resulting in a decrease in the accuracy of face identification. In contrast, according to the embodiment of the present invention, a check face image is generated from a registered face image containing a frontal face, allowing accurate image transformation to be performed on organs such as the eyes and mouth of the face. This allows for enhanced accuracy of face identification.
While the embodiment of the present invention is directed to the example in which the face orientation determining unit 150 determines a face orientation by classifying the face orientation into “frontal”, “right-facing”, and “left-facing”, the embodiment of the present invention is also applicable to a case in which the face orientation determining unit 150 determines a face orientation by classifying the face orientation into four or more orientations. Also, while the embodiment of the present invention is directed to the example in which the face orientation determining unit 150 determines the orientation of a face with respect to the lateral direction, the embodiment of the present invention is also applicable to a case in which the face orientation determining unit 150 determines the orientation of a face with respect to the vertical direction. In this case, image transformation can be performed by using, for example, a three-dimensional model that has a shape in which at least a part of the three-dimensional model in the vertical direction on the surface onto which an image is to be projected is bent to the front side.
The embodiment of the present invention can also be applied to an image processing apparatus such as a camcorder (camera and recorder), a device with a camera function such as a mobile telephone including an image capturing unit, or a PC (Personal Computer).
While the face of a person is exemplified as the face to be subjected to face identification in the embodiment of the present invention, the embodiment of the present invention is also applicable to the case of identifying the face of an animal other than a person, such as another mammal.
While a still captured image is exemplified as the target image to be subjected to face identification in the embodiment of the present invention, the embodiment of the present invention is also applicable to a moving image. In the case of a moving image, for example, a face is detected for each frame, and face identification can be performed with respect to a face image containing this face. Also, a face may be detected for each GOP (Group of Pictures), or a face may be detected at fixed intervals within a stream.
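For instance, fixed-interval detection could be organized as in the following sketch, where detect_face and identify_face are hypothetical callables standing in for the detection and identification stages (they are not components named in this document):

def identify_faces_in_stream(frames, detect_face, identify_face, interval=15):
    """Detect a face every `interval` frames (e.g., roughly once per
    GOP) and run face identification only on frames where a face is
    found, returning (frame_index, identification_result) pairs."""
    results = []
    for idx, frame in enumerate(frames):
        if idx % interval != 0:
            continue
        face_image = detect_face(frame)   # a face region, or None if no face
        if face_image is not None:
            results.append((idx, identify_face(face_image)))
    return results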
It should be noted that the embodiment of the present invention is merely illustrative of an example of implementation of the present invention, and has correspondence to each of the invention-specifying matters in the claims as described above. It should be noted, however, that the present invention is not limited to the embodiment, and various modifications can be made without departing from the scope of the present invention.
The processing steps described with reference to the embodiment of the present invention may be grasped as a method having a series of these steps, or may be grasped as a program for causing a computer to execute the series of these steps and a recording medium that stores the program. As such a recording medium, for example, a CD (Compact Disc), an MD (MiniDisc), a DVD (Digital Versatile Disc), a memory card, a Blu-ray Disc (registered trademark), or the like may be used.
The present application contains subject matter related to that disclosed in Japanese Priority Patent Application JP 2008-152643 filed in the Japan Patent Office on Jun. 11, 2008, the entire content of which is hereby incorporated by reference.
It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.