This disclosure relates generally to image capture, and in particular but not exclusively, relates to detecting features in images.
Image sensors have become ubiquitous. They are widely used in digital still cameras, cellular phones, and security cameras, as well as in medical, automotive, and other applications. The technology used to manufacture image sensors has continued to advance at a rapid pace.
One capability that is useful in connection with image sensors is feature detection. For example, some devices that include image sensors can capture an image in response to detecting a given feature in an image, such as detecting that a person in the image frame is smiling. Conventionally, classifiers of features built from training images, which require significant storage space, have been loaded into memory and compared with the image that the image sensor is currently capturing. To accommodate the variety in size, shape, and shade of features (e.g., mouths and teeth), a large number of classifiers from training images may be required to reliably identify a smile, for example. Furthermore, additional training images are necessary to identify additional features (e.g., eyes for blink detection). Feature detection therefore consumes significant memory resources.
In addition to memory resources, conventional feature detection also requires significant processing resources to compare the current image to the variety of classifiers of features from training images. This may cause time delays in capturing the desired images and may drain battery resources. Hence, a feature detection device and/or method that reduces memory, processing, and/or power consumption would be desirable.
Non-limiting and non-exhaustive embodiments of the invention are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified.
Embodiments of a system and method for detecting states of features in images for facial recognition are described herein. In the following description, numerous specific details are set forth to provide a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the techniques described herein can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring certain aspects.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Throughout this specification, several terms of art are used. These terms are to take on their ordinary meaning in the art from which they come, unless specifically defined herein or the context of their use would clearly suggest otherwise.
Process 100 may be executed by processing circuitry within a camera with a digital image sensor or by processing circuitry integrated into an image sensor. The processing circuitry may include a processor, a Field Programmable Gate Array (“FPGA”), a Digital Signal Processor (“DSP”), or otherwise. The processing circuitry may include memory to store settings, images, and image data received from the image sensor. In the context of utilizing process 100 with an image sensor, the image sensor may continuously capture preliminary digital images for evaluation prior to capturing a permanent digital image to be saved indefinitely. In one embodiment, while the image sensor is capturing preliminary digital images, those preliminary digital images are evaluated for certain features (e.g., eyes, nose, mouth, wrinkles).
In process block 105, an approximate location of a feature in an image is identified.
As an example, distance ratios between an upper eyelid and a lower eyelid, or distances between an upper lip and a lower lip, can be utilized to determine an approximate location of an eye or a mouth. Shape algorithms that match shapes, such as the shape of a mouth or an eye, can also be utilized to determine an approximate location of an eye or a mouth. The distance ratios may also be combined with shape algorithms to identify the approximate location of the feature.
In one embodiment, identifying the approximate location of a feature includes applying an initial entropy analysis to pixel intensities of the image. In one embodiment, identifying the approximate location of a feature includes applying a projection function to pixel intensities of the image.
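One common formulation of an integral projection function (“IPF”) and a variance projection function (“VPF”) takes, for each row (or column) of pixel intensities, the mean and the variance, respectively. The following Python/NumPy sketch is illustrative only; the helper `candidate_rows` and the choice of the k darkest rows as candidate feature rows are assumptions made for illustration, not part of the disclosure.

```python
import numpy as np

def horizontal_ipf(gray):
    """Integral projection function over rows: mean intensity of each row."""
    return gray.mean(axis=1)

def horizontal_vpf(gray):
    """Variance projection function over rows: intensity variance of each row."""
    return gray.var(axis=1)

def candidate_rows(gray, k=3):
    """Illustrative heuristic: dark horizontal bands (e.g. the gap between
    eyelids or between lips) show up as dips in the IPF, hinting at the rows
    where a feature may be located."""
    ipf = horizontal_ipf(gray)
    return np.argsort(ipf)[:k]  # indices of the k darkest rows
```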
Learning algorithms use learning (training) images to determine an approximate location of a feature. Learning images contain mouths, eyes, and other features that may be leveraged to better capture a permanent image. In a learning algorithm, the learning images may be compared with regions of the preliminary image to identify similar features. For example, image processing that compares a learning image of an eye to preliminary image 210 may identify an eye in preliminary image 210. After the eye is identified, the approximate location of the eye can be determined and preliminary image 210 may be cropped to yield feature image 225A. Similarly, image processing that compares a learning image of a mouth to preliminary image 210 may identify a mouth in preliminary image 210. After the mouth is identified, the approximate location of the mouth can be determined and preliminary image 210 may be cropped to yield feature image 225B.
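As a non-limiting sketch of such a comparison, template matching against a stored learning image is one way to locate and crop a feature region. The example below uses OpenCV's `matchTemplate` for the comparison step, which is an assumption of this sketch rather than the specific comparison the embodiment employs; `preliminary` and `template` are single-channel grayscale arrays.

```python
import cv2

def locate_and_crop(preliminary, template):
    """Find the region of `preliminary` most similar to `template` (a learning
    image of an eye or a mouth) and crop it, analogous to feature images
    225A/225B being cropped from preliminary image 210."""
    result = cv2.matchTemplate(preliminary, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(result)   # best match location and score
    x, y = top_left
    h, w = template.shape
    feature_image = preliminary[y:y + h, x:x + w]
    return feature_image, score
```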
Returning to FIG. 1, in process block 115, a gradient phase map of at least a portion of the image that includes the approximate location of the feature is generated.
Once the gradient phase map is generated in process block 115 in FIG. 1, it can be analyzed to determine the state of the feature.
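By way of a non-limiting illustration, a gradient phase map assigns to each pixel the angle of its local intensity gradient. The sketch below uses simple finite-difference gradients; the embodiment may compute gradients differently (e.g., with Sobel kernels), so that choice is an assumption of the sketch.

```python
import numpy as np

def gradient_phase_map(gray):
    """Compute a gradient phase map: the angle of the intensity gradient at
    each pixel, in radians in (-pi, pi]."""
    gy, gx = np.gradient(gray.astype(np.float64))  # vertical, horizontal gradients
    return np.arctan2(gy, gx)
```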
In process block 120 of FIG. 1, the state of the feature (e.g., open or closed) is determined based at least in part on the gradient phase map. In one embodiment, determining the state includes applying a projection function to the gradient phase map.
Applying a projection function (e.g., an integral projection function (“IPF”) or a variance projection function (“VPF”)) to gradient phase map 551A will yield a different projection result than applying the same projection function to gradient phase map 551B. Gradient phase maps 551A, 551B, 552A, and 552B include gradient arrows that indicate the angle assigned to a given pixel. The convention is to designate gradients from white to black, although the example gradient arrows in gradient phase maps 551A, 551B, 552A, and 552B are provided for illustration purposes.
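Continuing the earlier sketch, applying a row-wise variance projection to the gradient phase map is one simple way to separate the two states. The threshold value and the use of plain (unwrapped) angular variance are illustrative placeholders, not parameters taken from the disclosure, and would need tuning on real data.

```python
import numpy as np

def phase_vpf(phase_map):
    """Variance projection of a gradient phase map along its rows."""
    return phase_map.var(axis=1)

def feature_is_open(phase_map, threshold=1.0):
    """Crude state test: an open eye or mouth introduces extra edges (iris,
    teeth), so the peak row-wise phase variance tends to be higher than in
    the closed state.  `threshold` is an illustrative placeholder."""
    return phase_vpf(phase_map).max() > threshold
```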
As gradient phase maps 551A and 551B illustrate, the gradient phase map of a feature in one state differs from the gradient phase map of the same feature in another state, and that difference is reflected in the projection results.
As mentioned previously, process 100 may be executed by processing circuitry within a camera or by processing circuitry integrated into an image sensor.
Once the state of the feature is determined in process block 120, an action may be initiated in response to the determination. For example, if the feature is a mouth and the state of the mouth is open (smiling), the initiated action may be an image capture by an image sensor. The processing circuitry that determines the state of the feature may send instructions to the image sensor to capture the image. Similarly, if the feature is an eye and the state of the eye is open, the image capture may be initiated in response to determining that the eye is open (rather than closed).
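A minimal sketch of such a trigger loop is shown below; the `sensor` object and its `read_preliminary`/`capture` methods are hypothetical names standing in for whatever interface the processing circuitry uses to control the image sensor.

```python
def evaluate_and_capture(sensor, locate, is_desired_state):
    """Evaluate preliminary images and instruct the sensor to capture a
    permanent image once the feature reaches the desired state (e.g. mouth
    open/smiling, or eyes open)."""
    while True:
        preliminary = sensor.read_preliminary()   # hypothetical sensor API
        feature = locate(preliminary)             # e.g. locate_and_crop(...)
        if feature is not None and is_desired_state(feature):
            sensor.capture()                      # save the permanent image
            return
```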
One potential advantage of the disclosed method is a reduction in the computational cost and processing resources used to detect a facial expression. In addition, detection of the facial expression may be quicker than with conventional methods, which reduces lag in capturing a desired image. Furthermore, the need to store arrays of training images in memory is reduced.
The disclosed method may also increase the reliability of facial recognition compared to conventional methods. Because conventional methods rely on training images, features that differ slightly from the training images in shape, size, or shade may introduce error into the analysis. In comparison, slight differences in the shape, size, or shade of a feature may not significantly affect the gradient phase map of an image, and the disclosed method is therefore less prone to such error.
The processes explained above are described in terms of computer software and hardware. The techniques described may constitute machine-executable instructions embodied within a tangible or non-transitory machine (e.g., computer) readable storage medium that, when executed by a machine, will cause the machine to perform the operations described. Additionally, the processes may be embodied within hardware, such as an application specific integrated circuit (“ASIC”) or otherwise.
A tangible non-transitory machine-readable storage medium includes any mechanism that provides (i.e., stores) information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant, manufacturing tool, any device with a set of one or more processors, etc.). For example, a machine-readable storage medium includes recordable/non-recordable media (e.g., read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.).
The above description of illustrated embodiments of the invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize.
These modifications can be made to the invention in light of the above detailed description. The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification. Rather, the scope of the invention is to be determined entirely by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.