The present disclosure relates to an information processing apparatus for estimating age from a captured face image, and a method thereof.
There has recently been a growing trend to estimate age and sex from face images obtained from a camera and to use the estimations for marketing or security purposes. Under these circumstances, there are techniques for inputting a face image into generation-specific classifiers and estimating age based on the output results thereof. An example of such a technique is discussed in Japanese Patent Application Laid-Open No. 2014-153815.
In image recognition, it is extremely difficult to estimate an exact age from only the appearance of a face image, just as humans have difficulty estimating other people's ages.
According to an aspect of the present disclosure, an information processing apparatus includes an extraction unit configured to extract a feature from an image including a face, a first estimation unit configured to estimate a likelihood of the face with respect to each generation based on the feature, a storage unit storing a plurality of samples, the plurality of samples each including a generation-specific combination of likelihoods and a correct age as a pair, a selection unit configured to select a sample from the storage unit based on a combination of likelihoods estimated by the first estimation unit, and a second estimation unit configured to estimate an estimated age of the face and an error range thereof based on the sample selected by the selection unit.
Further features will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Exemplary embodiments will be described below with reference to the drawings.
A first exemplary embodiment will be described.
In step S1000, the image acquisition unit 100 obtains a digital image that is obtained through a light collecting element such as a lens, an image sensor for converting light into an electrical signal, and an analog-to-digital (AD) converter for converting an analog signal into a digital signal. Examples of the image sensor include a complementary metal-oxide-semiconductor (CMOS) sensor and a charge-coupled device (CCD) sensor. Through thinning processing, the image acquisition unit 100 can obtain, for example, an image converted into full high-definition (Full HD) resolution (1920×1080 [pixels]) or HD resolution (1280×720 [pixels]).
In step S1100, the face detection unit 110 detects a face included in the image obtained in step S1000. As a technique for detecting a face (hereinafter, referred to as “face detection”), the face detection unit 110 uses a technique discussed in P. Viola, M. Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features”, in Proc. of CVPR, vol. 1, pp. 511-518, December, 2001.
In step S1110, the face detection unit 110 determines whether a face is detected in step S1100. If a face is detected (YES in step S1110), the processing proceeds to step S1200. If a face is not detected (NO in step S1110), the processing returns to step S1000.
In step S1200, the face organ detection unit 120 detects face organs, such as eye corners and a mouth, in the face detected in step S1100. As a technique for detecting face organs, the face organ detection unit 120 uses a technique discussed in Xudong Cao, Yichen Wei, Fang Wen, Jian Sun, “Face Alignment by Explicit Shape Regression”, CVPR, pp. 2887-2894, 2012.
In step S1300, the feature extraction unit 130 sets feature extraction areas, like the rectangles in
In step S1400, the generation estimation unit 140 connects the plurality of luminance gradient histograms extracted in step S1300 as illustrated in
In step S1500, the age and error estimation unit 150 determines an estimated age and errors by using the outputs of the respective generation estimators in step S1400.
In step S1501, the age and error estimation unit 150 generates an N-dimensional feature vector V=[V1, V2, . . . , Vn] from N likelihoods output from the respective generation estimators as illustrated in
In step S1502, the age and error estimation unit 150 selects a dictionary sample Si=[Si1, Si2, . . . , Sin] stored in the database. In step S1503, the age and error estimation unit 150 determines a distance Li between the N-dimensional feature vector of the dictionary sample Si selected in step S1502 and the N-dimensional feature vector V generated from the input image in step S1501 by Eq. (1):
$L_i = \sum_{j=1}^{n} \lvert V_j - S_{ij} \rvert$. Eq. (1)
Instead of the Manhattan distance of Eq. (1), the distance Li may be a squared Euclidean distance given by Eq. (2):
$L_i = \sum_{j=1}^{n} (V_j - S_{ij})^2$. Eq. (2)
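As an illustration only, the two distances of Eq. (1) and Eq. (2) can be sketched in Python as follows. The function names and the example likelihood values are hypothetical and do not appear in the disclosure; the vectors are assumed to hold one likelihood per generation estimator.

```python
from typing import Sequence


def manhattan_distance(v: Sequence[float], s: Sequence[float]) -> float:
    """Eq. (1): sum of absolute differences between two likelihood vectors."""
    if len(v) != len(s):
        raise ValueError("both vectors need one likelihood per generation")
    return sum(abs(vj - sj) for vj, sj in zip(v, s))


def squared_euclidean_distance(v: Sequence[float], s: Sequence[float]) -> float:
    """Eq. (2): sum of squared differences (no square root is taken)."""
    if len(v) != len(s):
        raise ValueError("both vectors need one likelihood per generation")
    return sum((vj - sj) ** 2 for vj, sj in zip(v, s))


# Hypothetical likelihoods for three generation estimators.
v = [0.5, 0.25, 0.0]      # vector V from the input image
s_i = [0.25, 0.25, 0.5]   # vector Si of one dictionary sample

print(manhattan_distance(v, s_i))          # 0.75
print(squared_euclidean_distance(v, s_i))  # 0.3125
```

Because the square root is monotonic, omitting it as in Eq. (2) does not change which dictionary samples rank as most similar.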
In step S1504, the age and error estimation unit 150 determines whether there is an unselected dictionary sample from among the dictionary samples S stored in the database. If there is an unselected dictionary sample (YES in step S1504), the processing returns to step S1502. In step S1502, the age and error estimation unit 150 selects the unselected dictionary sample. If all the dictionary samples S have been selected (NO in step S1504), the processing proceeds to step S1505.
In step S1505, the age and error estimation unit 150 selects M similar dictionary samples by using the distances calculated in step S1503.
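The loop of steps S1502 to S1505 amounts to a nearest-neighbor search over the dictionary. The following Python sketch shows one possible form under stated assumptions: each dictionary sample is represented as a (likelihood vector, correct age) tuple, the Manhattan distance of Eq. (1) is used, and the name `select_similar_samples` and all example values are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

# Assumed representation: (generation-specific likelihoods, correct age).
Sample = Tuple[Sequence[float], int]


def select_similar_samples(
    v: Sequence[float], dictionary: List[Sample], m: int
) -> List[Sample]:
    """Return the m dictionary samples closest to v, nearest first."""

    def distance(sample: Sample) -> float:
        likelihoods, _age = sample
        # Manhattan distance, as in Eq. (1).
        return sum(abs(vj - sj) for vj, sj in zip(v, likelihoods))

    # Visits every dictionary sample once (steps S1502 to S1504) and
    # keeps the m smallest distances (step S1505).
    return heapq.nsmallest(m, dictionary, key=distance)


# Hypothetical dictionary with three generation estimators.
dictionary = [
    ([1.0, 0.0, 0.0], 8),
    ([0.25, 0.75, 0.0], 34),
    ([0.125, 0.5, 0.375], 41),
    ([0.0, 0.0, 1.0], 71),
]
v = [0.25, 0.5, 0.25]

selected = select_similar_samples(v, dictionary, m=2)
print([age for _likelihoods, age in selected])  # [41, 34]
```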
In step S1506, the age and error estimation unit 150 generates a histogram from the correct ages Agegt of the M dictionary samples selected in step S1505.
In step S1507, the age and error estimation unit 150 determines an average μ1 and standard deviations σ1 and σ2 of the histogram generated in step S1506 as illustrated in
In step S1600 of
According to the present exemplary embodiment, an error range is determined and displayed along with age. The user can thus find out the possible range of ages of the target and also whether the target tends to be determined to be younger or older.
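Steps S1506 and S1507 can be sketched as follows. The text does not define σ1 and σ2 precisely, so this sketch assumes that σ1 is the root-mean-square deviation of the selected ages below the average and σ2 is the same quantity for ages above the average, which would make the two values differ when the histogram is skewed toward younger or older ages. That interpretation, the function name, and the example ages are assumptions, not part of the disclosure.

```python
import math
from collections import Counter
from typing import Callable, Sequence, Tuple


def estimate_age_and_errors(ages: Sequence[int]) -> Tuple[float, float, float]:
    """Return (mu, sigma1, sigma2) for the correct ages of the selected samples."""
    if not ages:
        raise ValueError("at least one selected sample is required")

    histogram = Counter(ages)  # age -> number of selected samples (step S1506)
    total = sum(histogram.values())
    mu = sum(age * count for age, count in histogram.items()) / total

    def one_sided_sigma(keep: Callable[[int], bool]) -> float:
        # ASSUMPTION: RMS deviation from mu over one side of the histogram.
        side = [(age, count) for age, count in histogram.items() if keep(age)]
        weight = sum(count for _age, count in side)
        if weight == 0:
            return 0.0
        squared = sum(count * (age - mu) ** 2 for age, count in side)
        return math.sqrt(squared / weight)

    sigma1 = one_sided_sigma(lambda age: age < mu)  # younger side
    sigma2 = one_sided_sigma(lambda age: age > mu)  # older side
    return mu, sigma1, sigma2


# Hypothetical correct ages of M = 5 selected dictionary samples.
mu, sigma1, sigma2 = estimate_age_and_errors([28, 30, 30, 32, 40])
print(f"{mu:.1f} -{sigma1:.2f} +{sigma2:.2f}")  # 32.0 -2.83 +8.00
```

In this example the older-side error is larger than the younger-side error, which corresponds to a target whose age may be underestimated by the average alone.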
In the first exemplary embodiment, M dictionary samples are selected based on the distances from the dictionary samples prepared in advance, and an estimated age and errors are determined from a histogram of the selected M dictionary samples. However, a feature amount extracted from an image can be affected by a change in an illumination environment and a change in facial expression. The age and error estimation unit 150 can therefore determine an estimated age and errors, for example, by analyzing a histogram of the selected M dictionary samples and excluding outliers, if any, as illustrated in
In step S1508, the age and error estimation unit 150 analyzes the histogram of the selected M dictionary samples, and excludes outliers if any. As a technique for determining an outlier, the age and error estimation unit 150 can use the SVM discussed in the foregoing paper by Bertozzi et al.
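The disclosure names an SVM for the outlier decision in step S1508. Purely to illustrate where outlier exclusion sits in the flow, the sketch below substitutes a much simpler rule that is not the technique of the disclosure: an age is dropped when it lies more than k median absolute deviations from the median of the selected ages. The function name, the parameter `k`, and the example ages are hypothetical.

```python
import statistics
from typing import List, Sequence


def exclude_outliers(ages: Sequence[int], k: float = 5.0) -> List[int]:
    """Drop ages farther than k median absolute deviations from the median.

    NOTE: a stand-in rule for illustration, not the SVM-based technique
    referred to in the text.
    """
    if not ages:
        return []
    median = statistics.median(ages)
    mad = statistics.median([abs(age - median) for age in ages])
    if mad == 0:
        # More than half of the ages coincide; keep everything rather than guess.
        return list(ages)
    return [age for age in ages if abs(age - median) <= k * mad]


# Hypothetical correct ages of the selected samples; 85 is far from the rest.
print(exclude_outliers([28, 30, 30, 32, 40, 85]))  # [28, 30, 30, 32, 40]
```

The average and errors would then be determined from the remaining ages in the same manner as in step S1507.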
According to the present exemplary embodiment, the effects of a change in the illumination environment and a change in facial expression can be reduced, and an error range can be determined and displayed along with age.
In the first exemplary embodiment, both an estimated age and errors are always determined. However, such a display with a wide error range is equivalent to displaying “unknown age”, as in
In step S1601, the display processing unit 160 determines a difference between the errors σ1 and σ2, and determines whether the absolute value of the difference is greater than or equal to a predetermined threshold. If the absolute value of the difference is greater than or equal to the predetermined threshold (YES in step S1601), the processing proceeds to step S1602. If the absolute value of the difference is less than the predetermined threshold (NO in step S1601), the processing proceeds to step S1603.
In step S1602, the display processing unit 160 displays “unknown age”. Aside from the display “unknown age”, any character string equivalent to “not known” can be used.
In step S1603, the display processing unit 160 displays the estimated age and the errors as in the first and second exemplary embodiments. Alternatively, a range of ages can be displayed based on the estimated age and the errors.
According to the present exemplary embodiment, an age and errors are displayed only if reliability is high. If reliability is low, a message “unknown age” is displayed. This enables the user to immediately determine the reliability of the displayed data.
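The branch of steps S1601 to S1603 can be sketched as below. The comparison follows the text: the absolute difference between σ1 and σ2 is tested against a predetermined threshold. The function name, the threshold value, and the exact display string for the estimated age and errors are assumptions made for illustration.

```python
def format_age_display(
    mu: float, sigma1: float, sigma2: float, threshold: float
) -> str:
    """Return the character string to be displayed for one detected face."""
    # Step S1601: compare |sigma1 - sigma2| with the predetermined threshold.
    if abs(sigma1 - sigma2) >= threshold:
        # Step S1602: reliability is treated as low.
        return "unknown age"
    # Step S1603: display the estimated age with both errors.
    # ASSUMPTION: this output layout is illustrative only.
    return f"{round(mu)} (-{sigma1:.1f} / +{sigma2:.1f})"


# Hypothetical values with a threshold of 5 years.
print(format_age_display(32.0, 3.0, 4.0, threshold=5.0))  # 32 (-3.0 / +4.0)
print(format_age_display(32.0, 2.8, 8.0, threshold=5.0))  # unknown age
```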
According to the foregoing exemplary embodiments, age and errors can be more accurately estimated from an image including a face. The estimations can be output as well.
Embodiment(s) can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While exemplary embodiments have been described, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2018-012406, filed Jan. 29, 2018, which is hereby incorporated by reference herein in its entirety.
Number | Date | Country | Kind |
---|---|---|---|
JP2018-012406 | Jan 2018 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
8498491 | Steffens | Jul 2013 | B1 |
20120308087 | Chao et al. | Dec 2012 | A1 |
20120314957 | Narikawa | Dec 2012 | A1 |
20140219518 | Yamazaki et al. | Aug 2014 | A1 |
20160275337 | Shibutani | Sep 2016 | A1 |
Number | Date | Country |
---|---|---|
2014-153815 | Aug 2014 | JP |
Entry |
---|
Paul Viola, Michael Jones, Rapid Object Detection using a Boosted Cascade of Simple Features, Accepted Conference on Computer Vision and Pattern Recognition 2001,(1-9); Proc. of CVPR, vol. 1, pp. 511-518. |
Xudong Cao, et. al., Face Alignment by Explicit Shape Regression, Int J. Comput Vis DOI 10.1007/s11263-013-0667-3, 2012, CVPR, pp. 2887-2894. |
M. Bertozzi, et. al., A Pedestrian Detector Using Histograms of Oriented Gradients and a Support Vector Machine Classifier, Proceedings of the 2007 IEEE, Intelligent Transportation Systems Conference, Seattle, WA, USA, Sep. 30-Oct. 3, 2007, MoD2.2, 1-4244-1396-6/07/$25.00 2007 IEEE 143-148. |
Number | Date | Country |
---|---|---|
20190236337 A1 | Aug 2019 | US |