The present invention relates to face detection in digital images. In particular, although not exclusively, the invention relates to a system for identifying a face to enable a camera to focus on that face and/or control the exposure of a photograph to optimise the exposure of the face.
A large proportion of photographs, especially those taken by recreational photographers, include people. In such situations, it is important that the camera focuses on the faces in the composition rather than elsewhere in the picture. For some portrait photography a wide aperture is deliberately used so that the face is in focus while the surroundings are blurred.
Ensuring that the face is in focus can be problematic, especially in cases where the face is not in the centre of the picture. Many cameras automatically focus on a point in the centre of the field of view. If this point is located at a different distance from the camera than the face, then the face may end up out of focus. Some cameras overcome this problem by enabling the user to select a point of the composition as a focus point. The user starts by lining up his chosen focus point in the centre of the field of view. He then half depresses the camera shutter release button to ensure that the camera focuses at that distance. The camera can then be moved laterally if it is desired that the focus point is off-centre. However, this makes the process of preparing to take a picture slow and unwieldy. In many cases the user would prefer to be able simply to point the camera and take the picture, without having to worry about selecting the point of focus.
Other cameras select one point from a plurality of points on which to focus. This will work in some situations, but the user has very little control over which point is selected by the camera, and in some situations the wrong point is selected, resulting in the face being out of focus in the final image.
It would therefore be desirable to provide a system by which a camera can identify that a face is present, and focus automatically on the part of the field of view containing that face.
A further problem often encountered during portrait photography involves the exposure of the face. Modern cameras detect how much light enters the lens and adjust the aperture and shutter speed to optimise the exposure. However, the light level across the entire field of view may not be an appropriate basis for selecting the exposure when it is the face itself which is the subject. For example, if a photograph is taken outdoors, with a large expanse of bright blue sky in the background, a typical camera will recognise that the ambient light is very bright, and ensure that the light entering the lens is reduced. This may result in the face being under-exposed, especially if the face is shaded. Conversely, if the background is generally dark, the camera will increase the amount of light entering the lens, and the face may be over-exposed.
It would thus be desirable to enable the camera to control the exposure of the photograph on the basis of the part of the field of view containing a face only. The exposure of the whole image can then be selected to optimise the exposure of the face itself.
Most modern digital cameras are provided with an array of photodetectors behind the lens which receive light even when a picture is not actually being taken. The array of photodetectors records a series of images, which are transferred to a buffer memory and displayed successively on a liquid crystal display (LCD), normally located on the back of the camera. When the user wishes to “take a picture”—i.e. record the image currently visible on the LCD—he presses an actuating button which causes the camera to focus, the aperture to reduce, and the operating system to record the output from the array of photodetectors onto a memory device such as a memory card.
Since an image of the field of view is constantly determined by the camera for transferral to the LCD, even when this image is not being recorded, it can still be used to provide information to the control system of the camera. If the presence of a face in this image can be detected, the location and size of the face can be passed to the control system so that the camera can focus on the part of the field of view containing the face and/or control the exposure of the picture on the basis of the brightness of the face. It will be appreciated that such face detection must operate extremely rapidly so that the camera can begin focussing and/or selecting the correct exposure as soon as the decision to take a picture has been made.
It is therefore an object of the present invention to provide a system for detecting faces in a digital image. It is a further object to provide a system capable of detecting a face sufficiently rapidly to be usable by the control system of a camera to focus on and/or select exposure on the basis of the brightness of that face.
In accordance with one aspect of the present invention there is provided a method of detecting a face in a digital image, the method comprising:
A considerable time saving can be obtained by reducing the number of pixels which need to be searched. The image size is reduced by ignoring most of the pixels in the image, and performing all subsequent operations on just a selection of these pixels. This is distinct from the process of compressing an image to reduce its size, which is a much more time consuming operation. Since a face will always occupy a significant number of pixels, sufficient information is contained in just a small selection of these pixels.
A suitable method for forming a sampled image includes selecting pixels from one out of every m rows and one out of every n columns of the digital image, where m and n are integers greater than one. m and n may be the same. For example, an image having 1000 rows and 1200 columns may be reduced in size by selecting one pixel out of five in each direction to form a sampled image having 200 rows and 240 columns.
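By way of illustration only, the sampling step described above might be sketched as follows. The function and variable names are illustrative and form no part of the invention; the image is assumed to be held as a list of rows of pixels.

```python
def subsample(image, m, n):
    """Form a sampled image by selecting pixels from one out of every
    m rows and one out of every n columns of the digital image,
    where m and n are integers greater than one."""
    return [row[::n] for row in image[::m]]

# An image of 1000 rows and 1200 columns, sampled with m = n = 5,
# yields a sampled image of 200 rows and 240 columns.
image = [[(r, c) for c in range(1200)] for r in range(1000)]
sampled = subsample(image, 5, 5)
```

All subsequent operations are then performed on the sampled image, which here contains one twenty-fifth of the original pixels.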
Pixels in a digital image are usually described in terms of data in a colour space. Typical systems describe pixels in terms of the levels of red, blue and green of each pixel (typically known as RGB), or hue, luminance and saturation (HLS), although other systems are also known. A common system used in digital cameras is known as YCC. Most systems provide for 256 possibilities for each attribute (e.g. R, G or B), so an individual pixel can have up to 16.7 million (256×256×256) different RGB values.
In accordance with another aspect of the invention there is provided a method of detecting a face in a digital image, comprising generating a map from the image, the map having a plurality of elements each corresponding to a pixel of the image, and searching the map for regions of elements corresponding to regions of pixels in the image exhibiting characteristics of a face.
The use of a map enables the colour space data of pixels in the image to be replaced by map elements containing more focussed information, speeding up subsequent operations in detecting regions corresponding to a face. In preferred embodiments the map is produced from the sampled image described above. In order to keep the total data held by the map as small as possible, the map preferably contains 256×256 elements or fewer, each element being represented by one byte of information (i.e. a value between 0 and 255).
The map is preferably populated from a look-up table. This enables the value of each map element to be determined quickly. As an example, the look-up table may be a matrix of 64×64×64 bytes, with the inputs being the R, G and B values of each pixel scaled down by a factor of 4.
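The population of the map from such a look-up table might be sketched as follows. This is an illustrative example only: the table here is a nested list with a single hypothetical "skin tone" entry set to category 1, and all names are illustrative.

```python
def build_map(pixels, lut):
    """Populate a map of one-byte category values from a 64x64x64
    look-up table, indexing it with the R, G and B values of each
    pixel scaled down by a factor of 4 (i.e. shifted right by 2)."""
    return [[lut[r >> 2][g >> 2][b >> 2] for (r, g, b) in row]
            for row in pixels]

# Illustrative look-up table: all entries zero ("not skin tone")
# except one hypothetical skin tone colour mapped to category 1.
lut = [[[0] * 64 for _ in range(64)] for _ in range(64)]
lut[200 >> 2][150 >> 2][120 >> 2] = 1

category_map = build_map([[(200, 150, 120), (10, 10, 10)]], lut)
```

Because the table is indexed directly, each map element is obtained with a single lookup rather than any per-pixel computation.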
The map element values are preferably subdivided into categories, at least one of the categories corresponding to pixels exhibiting skin tone, so that map elements in the category or categories corresponding to skin tone pixels may be classed as “skin tone elements”. More than one skin tone category may be used, to cover a range of values, both dark and light, to account for factors such as skin colouration and lighting conditions. Further categories may be provided for pixels corresponding to eyes, lips, hair etc.
Preferably, if the number of skin tone elements in the map is smaller than a predetermined value, it is determined that no face is present in the digital image.
In a preferred embodiment, the step of searching for regions of elements which could correspond to a face begins by searching for “skin tone regions” of skin tone elements. These are generally contiguous regions of skin tone elements, although it will be appreciated that such regions need not necessarily contain only skin tone elements. For example, the elements corresponding to the pixels representing the eyes and mouth, and in some cases the nose, will usually not be skin tone elements, but will still fall within the overall skin tone region.
Preferably, if a skin tone region is below a predetermined size (e.g. one or two elements), it is determined that this region does not correspond to a face in the digital image. Especially in the case of portrait photography, faces will occupy a significant proportion of the image, and will therefore always be above a certain minimum size. This step enables artefacts which happen to have skin tones to be ignored.
Preferably the categories of some or all of the elements in a skin tone region are merged on the basis of the categories of all the elements in that skin tone region. This step reduces the number of different categories of elements, decreasing complexity in subsequent steps.
Preferably, if the shape of a skin tone region is far from elliptical (e.g. the region is generally linear), it is determined that this region does not correspond to a face in the digital image. Thus regions whose shape is unreasonable for the purposes of face detection can be rejected early in the procedure.
A validation may now be performed on each skin tone region to determine whether or not it corresponds to a face in the digital image. The same validation may be repeated up to four times, following rotation of the map each time, to account for different orientations of faces (i.e. extending vertically or horizontally) in the image.
The validation preferably includes identifying the height:width ratio of each skin tone region and determining that a skin tone region whose height:width ratio falls outside predetermined parameters does not correspond to a face in the image.
The validation preferably includes determining a bounding box for a skin tone region by identifying the smallest rectangular shape encompassing the skin tone region, calculating what proportion of elements within the bounding box are skin tone elements and, if the proportion falls outside predetermined parameters, determining that the skin tone region does not correspond to a face in the image. A “face” shape will typically occupy approximately 60-80% of its bounding box.
It will be appreciated that a skin tone region may be larger than the face of a subject. For example, the subject may be wearing a “V” neck which leaves the neck exposed, or even have one or both shoulders exposed. It is useful to be able to identify that faces are present in such situations, even though the skin tone region will not initially appear to be “face”-shaped. It is also useful to be able to determine the location and size of the face itself within the skin tone region.
The validation step preferably includes dividing the bounding box vertically into a plurality of segments (e.g. three segments) extending across the width of the box; calculating, for each segment, a weighted horizontal centre of the skin tone elements contained in that segment; calculating, for each segment, a maximum width of the portion of the skin tone region contained in that segment; and using predetermined rules to examine properties of the segments relative to each other and/or the bounding box. Suitable rules include testing for one or more of the relative horizontal position of the weighted horizontal centres, the relative size of the maximum widths, and the height of the bounding box, although it will be appreciated that other factors may also be considered. This enables the detection of faces present by themselves, or faces present when the subject's neck is exposed, or the neck and one shoulder, or other possibilities that will be apparent.
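The segment analysis described above might be sketched as follows. This is a minimal illustration under the assumption that a skin tone region is held as a set of (row, column) element positions and its bounding box as inclusive (top, left, bottom, right) co-ordinates; the names are illustrative.

```python
def segment_stats(region, box, n_segments=3):
    """Divide the bounding box vertically into n_segments bands and,
    for each band, compute the weighted horizontal centre of the skin
    tone elements and the maximum width of the region in that band."""
    top, left, bottom, right = box
    height = bottom - top + 1
    stats = []
    for s in range(n_segments):
        lo = top + s * height // n_segments
        hi = top + (s + 1) * height // n_segments
        cols = [c for (r, c) in region if lo <= r < hi]
        row_widths = []
        for r in range(lo, hi):
            rc = [c for (rr, c) in region if rr == r]
            if rc:
                row_widths.append(max(rc) - min(rc) + 1)
        centre = sum(cols) / len(cols) if cols else None
        stats.append((centre, max(row_widths) if row_widths else 0))
    return stats

# A 6-row by 4-column rectangular region: every segment has its
# weighted horizontal centre at column 1.5 and maximum width 4.
region = {(r, c) for r in range(6) for c in range(4)}
stats = segment_stats(region, (0, 0, 5, 3))
```

The predetermined rules may then be expressed as comparisons between the per-segment centres and widths, for example testing whether the bottom width exceeds the top two widths when a shoulder is exposed.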
If the skin tone region satisfies none of the predetermined rules, it is preferably determined that the associated skin tone region does not correspond to a face in the image. If the skin tone region does satisfy one of the predetermined rules, that rule may be used to determine the location and size of a “face region” within the associated skin tone region which could correspond to a face in the image. If only the face itself is uncovered, the face region and skin tone region will be substantially the same. If the neck or shoulder(s) are exposed, the face region may well be considerably smaller than the skin tone region.
Once the face region has been identified, it is preferably matched against a plurality of face-shaped templates to determine if it does correspond to a face in the image. Each template may be divided into regions within the bounding box where a high proportion of skin tone elements is expected (i.e. corresponding to skin regions of the face), and other regions where a low proportion of skin tone elements is expected, for example corresponding to eyes, lips, hair etc. Further regions may have a “neutral” expectation of skin tone elements: for example, ears may be visible or may be covered by hair. In the region of the nose, it would be expected that some elements are skin tone elements but the majority are not, because the nose often appears a different colour due to its angle.
Some regions of each template (e.g. eyes, lips, hair) corresponding to a low expectation of skin tone elements may have high expectations of other categories of element.
The step of matching the face region to each template may preferably include applying a score to each element on the basis of how well the category of that element matches the expected category in the region template in which it is located.
Further selection criteria may be applied to each face region to determine whether or not it corresponds to a face in the image. For example, the relative sizes of eye regions and mouth regions within each face region may be compared. The symmetry of the eye regions may be checked.
If a face region meets all of the above criteria, it may be considered to correspond to a face in the original image. The location and size of the face may then be passed to a camera for use in focussing on that face and/or other purposes such as selecting exposure on the basis of the brightness of the face.
The invention thus also provides a camera which is arranged to carry out face detection as described above. There is also provided a method of focussing a camera onto a face, comprising detecting a face using a method as described above, and focussing onto that face. There is also provided a method of selecting the exposure of a photograph, comprising detecting a face using a method as described above, and selecting the photograph exposure to optimise the exposure of that face.
It will be appreciated that the above described steps need not all be used in the detection of a face. Many of the operations may be used independently, although the overall efficiency is increased if all are used. For example, skin tone regions may be detected directly in the original image, or in a sampled image, without the creation of a map, although the use of a map is preferred because it decreases computational complexity. Skin tone regions may be compared to templates whether or not the previous validation steps have taken place first, although template matching will take longer if no previous filtering has been performed. Similarly, validation may be performed without rejecting “abnormal” (e.g. linear) shapes first, but will be less efficient.
Thus according to another aspect of the invention there is provided a method of detecting a face in a digital image, the method comprising:
According to a further aspect of the invention there is provided a method of detecting a face in a digital image, the method comprising:
Some preferred embodiments of the invention will now be described by way of example only and with reference to the accompanying drawings, in which:
As can be seen from
Contained within the skin tone region 7 are also found regions corresponding to the eyes 9,10, nose 11 and mouth 12. The pixels found in such regions will generally not be skin tone pixels: the pixels 13 in the eyes 9, 10 may, for example, be generally blue, the pixels 14 in the nose 11 a colour defined by a shadow caused by light falling on the nose, and the pixels 15 in the mouth 12 red. It will also be noted that, even outside the area of the eyes, nose and mouth, not all pixels in the skin tone region will necessarily be skin tone pixels: pixels 16 with other colours may be caused by a birthmark or other similar mark such as a blemish or a wound on the face, or may be the result of colour distortion caused by the camera and/or lighting conditions.
In order to process the image 1 shown in
The values entered into the map elements 23 may be seen as providing “categories” of pixels 3. This is a finer distinction than simply “skin tone” or “not skin tone”. The categories distinguish between colours which are skin tone, but too dissimilar to be considered the same skin tone. A suitable number of categories for the different types of skin tone is of the order of 128, providing information about how bright, red, saturated etc. the corresponding pixel 3 is. Thus there may be approximately 128 categories of “skin tone elements”. The remaining categories may be used for elements corresponding to pixels not having skin tone, and some may be used for colours characteristic of eyes, hair, mouth etc.
The 14×15 element map 21 shown in
It can be seen in even the very sparsely populated exemplary map 21 that there are seventy-one elements within the skin tone region 27. Sixty-one of these elements 25 (“skin tone elements”) correspond to skin tone pixels 5 in the original image 1. There are also two “eye” elements 33 in each eye 9,10, one “nose” element 34 in the nose 11, three “mouth” elements 35 in the mouth 12, and one “blemish” element 36 corresponding to the birthmark 16. Thus, even with the very low number of map elements shown in
Once the map has been populated, a check is made of the total number of skin tone elements. If too few are present, a report is made that no face is present in the image and the process stops. A typical threshold value for a 200×240 element map is approximately 200 elements.
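This initial check might be sketched as follows; the names are illustrative, and the toy map shown is far smaller than the 200×240 map discussed above.

```python
def enough_skin(category_map, skin_categories, threshold=200):
    """Report whether the map contains at least `threshold` skin tone
    elements; if not, no face is present and the process stops.
    A threshold of ~200 suits a 200x240 element map."""
    count = sum(1 for row in category_map for element in row
                if element in skin_categories)
    return count >= threshold

# Toy 4x4 map in which category 1 is the only skin tone category.
toy_map = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
```

With this toy map, `enough_skin(toy_map, {1}, threshold=4)` succeeds while `threshold=5` fails, since only four skin tone elements are present.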
If there is a significant number of skin tone elements, the detection stage begins. The first stage involves merging the categories of those elements whose categories are similar, according to predefined metrics. For example, two categories that differ only in that one is lighter than the other may be the same skin tone but in sunlight and shade, respectively. Merging is context-dependent. For example, a brownish element could correspond to the shaded part of a Caucasian face, or to the brightly lit part of a face with darker skin colour. It will generally be possible to deduce from the other skin tone elements in the skin tone region whether the categories of the brownish elements should be merged with those of elements corresponding to slightly lighter pixels, or with elements corresponding to slightly darker pixels. They should not be merged with both. Similarly, the merging of categories should not go so far as to blend a light face with a white wall, or a dark face with a red/brown wall. It is also desirable to avoid blending faces with the hair surrounding them. Categories that are very different are not merged together.
The map is then searched for contiguous skin tone regions of skin tone elements having similar categories. In the exemplary map 21, the skin tone region 27 containing seventy-one elements would be identified, along with an isolated element 28 corresponding to the small skin tone patch 8 in the image 1.
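The search for contiguous regions might be sketched as a flood fill over the map, as follows. This is an illustrative example: eight-way connectivity is assumed (the text does not fix the connectivity), and the names are illustrative.

```python
from collections import deque

def skin_regions(category_map, skin_categories):
    """Find contiguous regions of skin tone elements by breadth-first
    flood fill, assuming 8-connectivity between neighbouring elements."""
    rows, cols = len(category_map), len(category_map[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or category_map[r0][c0] not in skin_categories:
                continue
            queue, region = deque([(r0, c0)]), []
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and not seen[rr][cc]
                                and category_map[rr][cc] in skin_categories):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            regions.append(region)
    return regions

# Toy map containing one three-element region and one isolated element.
toy = [[1, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1]]
regs = skin_regions(toy, {1})
```

On the exemplary map described above, such a search would return the large skin tone region together with any isolated elements, which are dealt with in the following step.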
The process has now identified skin tone regions in the map 21 (corresponding to skin tone regions of the image). In order to ensure that elements such as the background element 28 are not erroneously classed as potential faces, the map is cleaned up by removal of isolated elements that do not look like their neighbourhood. For every skin tone element, a check is made of its eight immediate neighbours. If too few of them (e.g. one or two) are also skin tone elements, it is assumed that this is an isolated element and can be ignored when searching for skin. As mentioned above, an off-white wall could have occasional pixels that are sufficiently pink to look like skin. The removal of isolated elements ensures that such anomalies are removed from the process near the beginning.
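The neighbour check might be sketched as follows; the minimum-neighbour threshold of three is illustrative, as are the names.

```python
def remove_isolated(category_map, skin_categories, min_neighbours=3):
    """Return the positions of skin tone elements that have at least
    `min_neighbours` skin tone elements among their eight immediate
    neighbours; isolated elements are discarded."""
    rows, cols = len(category_map), len(category_map[0])
    keep = set()
    for r in range(rows):
        for c in range(cols):
            if category_map[r][c] not in skin_categories:
                continue
            n = sum(1 for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr or dc)
                    and 0 <= r + dr < rows and 0 <= c + dc < cols
                    and category_map[r + dr][c + dc] in skin_categories)
            if n >= min_neighbours:
                keep.add((r, c))
    return keep

# A 3x3 block of skin tone elements survives; a lone element at the
# opposite corner is discarded as isolated.
toy = [[0] * 5 for _ in range(5)]
for r in range(3):
    for c in range(3):
        toy[r][c] = 1
toy[4][4] = 1
kept = remove_isolated(toy, {1})
```

Stray pixels of an off-white wall that happen to look like skin are thereby removed before validation begins.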
The overall shapes of the identified skin tone regions are now checked, and those whose shape is unreasonable for the purposes of face detection are rejected. For example, a very linear region will never correspond to a face. For those shapes which are not rejected, the process executes a validation procedure.
Validation consists of deciding whether a map region is a face or not. There are four face orientations possible—two in portrait mode and two in landscape mode. So the validation has to be repeated up to four times, and it is enough for the region to be accepted in one of these orientations to be considered a face. In practice, the upside-down portrait orientation is unlikely, so it may be skipped. The description below refers to normal portrait orientation. It will be appreciated that the procedure will be identical for other orientations (with suitable changes in direction). Alternatively, the map may be rotated and then the same validation performed on the rotated map.
In some images a face may appear by itself, with the rest of the body covered by non-skin tone clothing and against a non-skin tone background. In such a case the skin tone region would be expected to be generally elliptical, with a height:width ratio approximately equal to 1.4. This may be considered to be the “ideal” case. However, not all images will be like this. If the neck, and possibly a V of the chest, is exposed, the height will be greater. If the shoulders, or even more of the body, are exposed, the shape will be different and both the height and width will be greater. If one or both ears are exposed the width will be greater. If hair covers the forehead the height will be lower. It will be appreciated that other variants are also possible. The most effective way to proceed with validation is to consider the most common cases.
The validation process begins by calculating the weighted centre (horizontal and vertical) of the skin tone elements in each skin tone region of the map. The weighted centre has co-ordinates determined by the algebraic average of the co-ordinates of all the skin tone elements in the skin tone region. In addition, a bounding box of each skin tone region is determined. The bounding box is the smallest possible rectangular shape that completely encompasses the skin tone region. A check is made as to the proportion of the bounding box filled by the skin tone region, and those regions where the proportion is below a predetermined value are rejected. A typical threshold value is 60%. This check helps to eliminate shapes unlikely to have been generated by a face.
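These first validation steps might be sketched as follows, again assuming a region held as a set of (row, column) elements; the names and the 60% threshold shown are as described above, but the helper itself is illustrative.

```python
def validate_fill(region, min_fill=0.6):
    """Compute the weighted centre (algebraic average of element
    co-ordinates) and the bounding box of a skin tone region, and
    check that the region fills at least `min_fill` of the box."""
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    centre = (sum(rows) / len(rows), sum(cols) / len(cols))
    box = (min(rows), min(cols), max(rows), max(cols))
    area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    return centre, box, len(region) / area >= min_fill

# A solid 2x3 rectangle fills its bounding box entirely and passes;
# a thin diagonal fills only 20% of its box and is rejected.
rect = {(r, c) for r in range(2) for c in range(3)}
diag = {(i, i) for i in range(5)}
```

A diagonal streak of skin-coloured elements is thus rejected at this stage without any template matching being attempted.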
The relationships between the horizontal position of the horizontal centres 49,50,51,69,70,71, maximum widths 52,53,54,72,73,74 of the skin tone region in the segments, and overall height of the bounding box 45,65 are used to determine what type of face the skin tone region could correspond to. In the example shown in
In the example shown in
It will be appreciated that similar rules may be devised for other types of skin tone region. If one shoulder is exposed, the bottom width will be larger than the top two widths, and the bottom horizontal centre will be laterally offset from the top two. If both shoulders are exposed the bottom width will be significantly larger than the top two, and the horizontal centres will be substantially aligned. If the “ideal” face of
Once the “type” of skin tone region has been established, a “face region” is isolated from the overall skin tone region.
The face region 62 shown in
The next step is to determine whether the face region has eyes and a mouth in the places which would be expected in a face, and also whether it generically “looks like a face”. This is achieved by matching the face region with a number of templates of typical faces. Before matching, the template is scaled to the size of the face region 42,62 previously determined.
One such template is shown in
The elements in each portion of the template are scored depending on how well they fit the set of rules for that template. A positive score is allocated to every element that meets expectation (e.g. a skin tone element in the skin portion 82, or an “eye” category element in the eye portion 83) and a negative score to each element that does not meet expectation (e.g. a non-skin tone element in the skin portion 82, or a skin tone element in the background portion 87). The scores for all of the elements are added up. If the total score is above a certain threshold then the face region is accepted to proceed to the final round of validation. If it is not, then it is decided that this face region does not match the template, and attempts are made to match further templates. If no templates are found to be a match, the face region is rejected. A range of templates should be used, including templates for faces with and without ears, beards, and hair, and facing directly towards the camera or not. Templates may also be provided for faces in profile: it will be appreciated that there would only be one eye portion in such templates, and the location and size of the eye and mouth portions would be very different.
The scoring takes account of the strength of expectation. In the skin portion 82, for example in the centre of a cheek, there is a high expectation of skin tone elements. Thus elements in this region that are not skin tone elements have a large negative score. At the sides of the nose region, where it might be expected that some elements are skin tone elements and some not, the positive or negative scores assigned to elements are much smaller. Thus it only takes a few wrong elements in “important” places to reject the template match, but many wrong elements will be required in “less important” places.
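The weighted scoring might be sketched as follows. In this illustrative representation a template maps each element position to an expected category and a weight encoding the strength of expectation (large in the centre of a cheek, small at the sides of the nose); all names and the miniature template shown are hypothetical.

```python
def template_score(face_region, template):
    """Score a face region against one template: an element matching
    its expected category scores +weight, a mismatch scores -weight."""
    score = 0
    for pos, (expected, weight) in template.items():
        score += weight if face_region.get(pos) == expected else -weight
    return score

def matches(face_region, templates, threshold):
    """Accept the face region if its score against any template
    reaches the threshold."""
    return any(template_score(face_region, t) >= threshold
               for t in templates)

# Miniature two-element template: a strongly expected skin element
# (weight 5) and a weakly expected eye element (weight 3).
template = {(0, 0): ('skin', 5), (0, 1): ('eye', 3)}
good = {(0, 0): 'skin', (0, 1): 'eye'}
bad = {(0, 0): 'skin', (0, 1): 'skin'}
```

Here the fully matching region scores 8, while the region lacking an eye element scores only 2, so a threshold of 8 accepts the former and rejects the latter.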
Further testing is applied using additional criteria such as symmetry of the eyes (whether the number of eye elements is roughly the same in each eye) and ratio of eye elements to mouth elements.
A final check is now made of a sample of pixels in the original image in each accepted face region, in the areas indicated by the template as eye and mouth portions. These pixels do not necessarily correspond to the originally selected pixels from which the map elements were determined. If the colours and placing within the face region of these meet further selection criteria such as symmetry of the eyes and ratio of eye pixels to mouth pixels then it is determined that the face region does correspond to a face in the image, and its location and size is output, for example to a camera operating system to enable the camera to focus on that face in the field of view.
It will be appreciated that variations from the above described embodiments may still fall within the scope of the invention. In particular, the process has been described as a series of steps, each following on from the previous step. However, many of the steps may be carried out independently. It is not necessary to sample the original image before forming a map: it would be possible to produce a map having the same number of elements as the original image. Where the image is only small to start with, this may well be a realistic option. It could also be envisaged, for example, that skin tone regions could be matched to templates without the previous validation steps having taken place, although this will require the use of many more templates. It will be appreciated that each of the above described steps improves the efficiency of the process, but they need not all be used together.
Number | Date | Country | Kind |
---|---|---|---|
0524137.7 | Nov 2005 | GB | national |