The present invention relates to a method for skin tone detection, and especially a method providing for real time colour transformation for effective skin tone detection.
Detecting human skin tone is used in numerous applications such as video surveillance, face and gesture recognition, human computer interaction, image and video indexing and retrieval, image editing, vehicle drivers' drowsiness detection, controlling users' browsing behaviour (e.g., surfing pornographic sites) etc.
Skin tone detection involves choosing a colour space, providing a skin model for the colour space and processing regions obtained from an image using the skin model to fit any specific application.
There exist several colour spaces including, for example, RGB, CMY, XYZ, xyY, UVW, LSLM, L*a*b*, L*u*v*, LHC, LHS, HSV, HSI, YUV, YIQ, YCbCr.
The native representation of colour images is typically the RGB colour space which describes the world view in three colour matrices: Red (R), Green (G) and Blue (B).
Some skin detection algorithms operate in this colour space. For example, Kovač, J., Peer, P., and Solina, F., (2003), “Human Skin Colour Clustering for Face Detection”, EUROCON 2003 International Conference on Computer as a Tool, Ljubljana, Slovenia, September 2003, eliminate luminance by basing their approach on the RGB components not being close together, using the following rules:
An RGB pixel is classified as skin if and only if:
R>95&G>40&B>20
&max(R,G,B)−min(R,G,B)>15
&|R−G|>15&R>G&R>B
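The rule quoted above can be written directly as a predicate. The following is a minimal sketch, assuming 8-bit channel values in the range 0 to 255; the function name `is_skin_rgb` is chosen here for illustration and does not come from the cited paper.

```python
def is_skin_rgb(r: int, g: int, b: int) -> bool:
    """Rule-based RGB skin test as quoted in the text (8-bit channels assumed)."""
    return (
        r > 95 and g > 40 and b > 20               # minimum channel levels
        and max(r, g, b) - min(r, g, b) > 15       # components not close together
        and abs(r - g) > 15 and r > g and r > b    # red must dominate
    )
```

A grey pixel such as (128, 128, 128) fails the second condition because its channels are identical, whereas a reddish pixel such as (200, 120, 90) satisfies all three.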
However, many colour spaces used for skin detection are based on linear transforms from RGB and many of these transformations are directed towards extracting luminance information from colour information to decorrelate luminance from the colour channels.
It is appreciated that the terms illumination and luminance are slightly different and indeed depend on each other. However, for simplicity, in the present specification, they are used interchangeably as each is a function of response to incident light flux or the brightness.
Some literature such as Albiol, A., Torres, L., and Delp, E. J. (2001), “Optimum color spaces for skin detection”, Proceedings of the IEEE International Conference on Image Processing, vol. 1, 122-124 argues that the choice of colour space has no effect on detection provided an optimum skin detector is used; in other words, all colour spaces perform the same.
By contrast, others discuss in depth the different colour spaces and their performance including Martinkauppi J. B., Soriano M. N., and Laaksonen M. H. (2001), “Behavior of skin color under varying illumination seen by different cameras at different color spaces”, In Proc. of SPIE vol. 4301, Machine Vision Applications in Industrial Inspection IX, pages 102-113, 2001; and Son Lam Phung, Bouzerdoum A., and Chai D., (2005), “Skin Segmentation Using Color Pixel Classification: Analysis and Comparison”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, pp. 148-154, January, 2005.
Furthermore, Abadpour A., and Kasaei S., (2005), “Pixel-Based Skin Detection for Pornography Filtering”, Iranian Journal of Electrical & Electronic Engineering, IJEEE, 1(3): 21-41, July 2005 concluded that “in the YUV, YIQ, and YCbCr colour spaces, removing the illumination related component (Y) increases the performance of skin detection process”.
Again however, by contrast Jayaram, S., Schmugge, S., Shin, M. C. and Tsap, L. V. (2004), “Effect of Colorspace Transformation, the Illuminance Component, and Color Modeling on Skin Detection”, Proc of the 2004 IEEE Computer Vision and Pattern Recognition (CVPR'04) IEEE Computer Society conclude that the illumination component provides different levels of information for the separation of skin and non-skin color, thus absence of illumination does not help boost performance.
Hsu R.-L., Abdel-Mottaleb M. and Jain A. K. (2002), “Face detection in color images”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24(5), 696-702, 2002; and Vezhnevets V., Sazonov V., and Andreeva A., (2003), “A Survey on Pixel-Based Skin Color Detection Techniques”, Proc. Graphicon-2003, pp. 85-92, Moscow, Russia, September 2003 disclose dropping luminance prior to any processing, as they indicate that the mixing of chrominance and luminance data mars RGB based analysis and makes RGB an unfavourable choice for colour analysis and colour based recognition.
The approach of Hsu et al. is shown in more detail in
Yun Jae-Ung., Lee Hyung-Jin., Paul A. K., and Baek Joong-Hwan., (2007) “Robust Face Detection for Video Summary Using Illumination-Compensation and Morphological Processing”, Third International Conference on Natural Computation, 710-714, 24-27 Aug. 2007, added an extra morphological step to the approach of Hsu et al.
Shin, M. C., Chang, K. I., and Tsap, L. V. (2002), “Does colorspace transformation make any difference on skin detection?” IEEE Workshop on Applications of Computer Vision argue and question the benefit of colour transformation for skin tone detection, e.g., RGB and non-RGB colour spaces; and also argue that the use of Orthogonal Colour Space (YCbCr) gives better skin detection results compared to seven other colour transformations.
Also, US 2005/0207643A1, Lee, H. J. and Lee, C. C., discloses clustering human skin tone in the YCbCr space.
Another space, the Log-Opponent (LO) space uses a base 10 logarithm to convert RGB matrices into I, Rg, By. The concept behind such hybrid colour spaces is to combine different colour components from different colour spaces to increase the efficiency of colour components to discriminate colour data.
In Forsyth, D. and Fleck, M. (1999), “Automatic Detection of Human Nudes”, International Journal of Computer Vision 32(1): 63-77. Springer Netherlands, two spaces are used, namely IRgBy and HS from the HSV (Hue, Saturation and Value) colour space. A texture amplitude map is used to find regions of low texture information. The algorithm first locates images containing large areas whose colour and texture is appropriate for skin, and then segregates those regions with little texture. The texture amplitude map is generated from the matrix I by applying 2D median filters.
Nonetheless, there remains a need to provide an improved method of skin tone detection.
According to the present invention there is provided a method of skin tone detection comprising the steps of:
The present invention provides a rapid skin tone detection classifier particularly useful for real time applications.
Preferably, said method comprises deriving said gray scale representation I by transforming RGB values normalised to the interval [0,1] for said pixel as follows:
I=(R*a)+(G*b)+(B*c),
wherein 0.25<a<0.35, 0.5<b<0.7, and 0.05<c<0.16.
Example values which may be used could be:
I=R*0.2989360212937750+G*0.587043074451121+B*0.114020904255103
Preferably, said method comprises deriving said red chrominance independent representation from the maximum of the G and B values for said pixel.
Preferably, said determining comprises determining a skin tone value fskin(x,y) for a pixel as:
where e(x,y)=I(x,y)−Î(x,y), and TL and TH are lower and upper threshold values, respectively.
It will be understood that Î=max(G,B)
Preferably, 0.02<TL<0.04 and 0.10<TH≤0.14.
Alternatively, TL and TH are calculated such that:
μ−(Δleft*σ)=TL
μ+(Δright*σ)=TH
wherein μ is the mean of the frequency distribution of a series of pixels to be analysed, σ is the standard deviation of said frequency distribution, and Δleft and Δright are chosen to be those values 1 and 3 σ away from μ respectively.
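The threshold calculation above can be sketched as follows. This is an illustration only: it assumes the samples are per-pixel values drawn from the frequency distribution being analysed, and it uses the population standard deviation, a choice the text does not specify. The name `thresholds` and its parameters are hypothetical.

```python
from statistics import mean, pstdev


def thresholds(samples, delta_left=1.0, delta_right=3.0):
    """Return (TL, TH) = (mu - delta_left*sigma, mu + delta_right*sigma)."""
    mu = mean(samples)        # mean of the sampled distribution
    sigma = pstdev(samples)   # population standard deviation (an assumption)
    return mu - delta_left * sigma, mu + delta_right * sigma
```

With the defaults, the lower bound sits one standard deviation below the mean and the upper bound three standard deviations above it, matching the asymmetric choice of Δleft and Δright described in the text.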
Further preferably, TL=0.02511 and TH=0.1177.
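A minimal sketch of the thresholding step, using the example values of TL and TH just given, might look as follows. Treating both bounds as inclusive is an assumption made here, since the text only identifies TL and TH as lower and upper thresholds; the names `f_skin` and `skin_map` are illustrative.

```python
T_LOW = 0.02511   # example lower threshold TL from the text
T_HIGH = 0.1177   # example upper threshold TH from the text


def f_skin(e, t_low=T_LOW, t_high=T_HIGH):
    """Return 1 if the error value e lies between the thresholds, else 0."""
    # Inclusive comparison is an assumption; the text does not state it.
    return 1 if t_low <= e <= t_high else 0


def skin_map(errors, t_low=T_LOW, t_high=T_HIGH):
    """Apply f_skin to a 2D list of error values e(x, y)."""
    return [[f_skin(e, t_low, t_high) for e in row] for row in errors]
```

For instance, an error value of 0.05 falls inside the band and is marked as skin, while 0.3 falls outside it.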
Preferably, the method further comprises the step of determining one or more regions of skin tone in said image, each region comprising a plurality of contiguous pixels, each determined to have a value indicating the pixel has a skin tone.
In a further aspect there is provided a method of embedding data in an acquired image, the method comprising the steps of:
Preferably, said embedding comprises embedding said data into a red chrominance channel of said image.
Further preferably, said method comprises performing a DWT transform of said image data prior to said embedding. DWT stands for a Discrete Wavelet Transform.
Preferably, said data is encrypted prior to said embedding to provide a substantially chaotic data set.
Further preferably, the method comprises the step of identifying within one of said skin tone regions a facial feature. Preferably, said facial feature comprises a pair of eyes. Further preferably, the method comprises the step of determining an orientation of said skin tone region in accordance with the relative rotation of said eyes within said acquired image.
Preferably, said data comprises adding an indication of said orientation to said image.
In a still further aspect, there is provided a method of extracting data embedded in an image according to the present invention, the method comprising:
Preferably, said method comprises:
An embodiment of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
a) shows a frequency distribution for a set of sample pixel data, with
An embodiment of the invention will now be described with reference to
I=(R*a)+(G*b)+(B*c),
wherein 0.25<a<0.35, 0.5<b<0.7, and 0.05<c<0.16.
In particular, the following transformation is used:
I=R*0.2989360212937750+G*0.587043074451121+B*0.114020904255103
This is similar to the RGB to Y transform into YCbCr colour space. In the embodiment, the RGB values are stored in double precision and linearly scaled in the interval [0,1]. The vector I eliminates the hue and saturation information while retaining the luminance.
Next, another version of luminance Î is obtained, but this time without taking the R vector into account (most of skin colour tends to cluster in the red channel), step 12:
Î=max(G,B)
The discarding of red colour is deliberate, as in the next stage this will help us calculate an error signal. This step is actually a modification of the way HSV (Hue, Saturation and Value) computes the V values, except that we do not include the red component in the calculation.
Then, step 14, for any value of x and y, an error signal e(x,y) is derived from the element-wise subtraction of the Î and I matrices:
e(x,y)=I(x,y)−Î(x,y)
In the embodiment, e(x,y) does not employ either truncation or rounding.
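Steps 10 to 14 can be sketched per pixel as below. This is an illustrative sketch under the assumption that the image is supplied as rows of (r, g, b) tuples already scaled to [0, 1]; the function names are hypothetical, and the weights are the example values quoted above.

```python
# Example luminance weights quoted in the text for I = R*a + G*b + B*c.
A = 0.2989360212937750
B = 0.587043074451121
C = 0.114020904255103


def error_signal(r: float, g: float, b: float) -> float:
    """Return e = I - I_hat for one pixel with RGB values scaled to [0, 1]."""
    i = r * A + g * B + b * C   # grayscale representation I
    i_hat = max(g, b)           # red-independent representation (max of G and B)
    return i - i_hat            # kept as a float: no truncation or rounding


def error_map(image):
    """Apply error_signal to every pixel of an image given as rows of (r, g, b)."""
    return [[error_signal(r, g, b) for (r, g, b) in row] for row in image]
```

For a neutral grey pixel the two representations coincide and e is approximately zero, whereas a pixel with a dominant red component, such as (0.8, 0.5, 0.4), gives a positive e of roughly 0.078.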
Then a skin probability map (SPM) is determined from lower and upper boundaries, step 16. In an embodiment an empirical rule can be chosen as follows:
Most preferably, a skin probability map (SPM) is created that uses an explicit threshold based skin cluster classifier, which defines the lower and upper boundaries of the skin cluster. With reference to
A statistical analysis is performed to provide the detailed boundaries. Let μ and σ denote the mean and standard deviation of the above distribution, and let Δleft and Δright denote the distances from μ, on the left and right hand side, respectively. The boundaries are determined based on:
μ−(Δleft*σ)≈0.02511
μ+(Δright*σ)≈0.1177
where Δleft and Δright are chosen to be 1 and 3 σ away from μ, respectively, to cover the majority of the area under the curve. Hence, a more precise empirical rule set is given in
It is proposed that the above rule provides a balanced threshold for further processing. While the inclusion of luminance is adopted, the 3D projection of the three matrices I(x),Î(x),e(x) in
a) shows an original image,
Skin tone detection according to the above embodiment finds particular application in steganography—the science of concealing the data in another transmission medium. Steganography has various applications for example as a replacement for encryption where it is prohibited or not appropriate, smart identifiers where individuals' details are embedded in their photographs (content authentication), data integrity by embedding checksum information into an image, medical imaging and secure transmission of medical data and bank transactions.
Conventional approaches to steganography can be categorized into three major areas:
Most existing steganographic methods rely on two factors: the secrecy of the key used to encode the hidden data prior to embedding and the robustness of the steganographic algorithm. Nonetheless, all of the above tools along with the majority of other introduced techniques suffer from intolerance to any kind of geometric distortion applied to the stego-image—the carrier image including the hidden steganographic information. For instance, if rotation or translation occurs all of the hidden information can be lost.
An implementation of the present invention remedies this problem by finding clusters of skin areas in a carrier image, step 18. This can be based on conventional region growing algorithms starting from seed skin pixels determined in the previous step 16. It has been found that embedding data into these regions produces less distortion to the carrier image compared to embedding in a sequential order or in any other areas. This is because when information is embedded in such regions, it is psycho-visually redundant, i.e. the eye does not respond with as much sensitivity to information in these regions as in others, and so the carrier image can be altered in these regions without significantly impairing the quality of image perception.
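One conventional way to group seed skin pixels into contiguous regions is a breadth-first flood fill over the binary skin map. The sketch below is one such approach, not necessarily the region growing algorithm used in the embodiment; 4-connectivity and the name `skin_regions` are assumptions made for illustration.

```python
from collections import deque


def skin_regions(mask):
    """Group truthy cells of a 2D list into 4-connected regions of (row, col)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x] or seen[y][x]:
                continue
            # Grow a new region outwards from this unvisited seed pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            region = []
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions
```

Each returned region is a list of pixel coordinates, so candidate embedding areas can be ranked, for example, by size.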
In a decoder (not shown) arranged to extract hidden data from a carrier image produced according to the method of
In one implementation, to cope with rotation, at encoding,
So as shown in
Turning now to
However, if the original orientation angle of the face is included with the image, step 24, then even if an image has been subjected to a rotation attack, as a pre-processing step prior to decoding, the attacked image can be rotated in the opposite direction by the required angle to re-orient the face region to an angle θ and so restore the relative coordinates of skin regions within the image.
In a further refinement of this approach, the angle θ can be modified with a secret key α ∈ {1, 2, ..., 359}, wherein the secret key α is an agreed-upon angle for embedding that is shared between the sender and the recipient (i.e. between the encoder and the decoder), step 28. The secret key α can be determined in any conventional manner, step 26, by the parties transmitting and decoding the hidden data, so that on decoding the image, the angle θ can be determined and used to re-orient the image if required.
For example, the original image containing a face region is initially inspected, and is found to form an angle of, say, 1.5° to the base. Having knowledge of the agreed angle key α, say 90°, the original image is rotated by 88.5° (i.e. 90°−1.5°). The bit stream is then embedded in the rotated original image, step 28. The resultant image is then re-oriented to the initial angle of the face region in the original image, i.e. 1.5° (1.5°−90°, a rotation of −88.5°). (This would be in the form of an additional step after step 22, not shown in
It is appreciated that embedding the calculated angle θ with the payload is very fragile to any image processing attack, and in alternative implementations the angle θ can be transmitted by alternative means or channels. For example, in JPEG images, the angle θ could be included in the EXIF image header and so could be unaffected by a rotation attack.
In any case, knowledge of the orientation of reference points within an image when data is embedded aids recovery from rotation distortion.
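The angle arithmetic of the secret-key example above can be checked with a short sketch. The function names are hypothetical, angles are in degrees, and positive values are assumed to denote the same rotation direction throughout.

```python
def embedding_rotation(theta: float, alpha: float) -> float:
    """Rotation applied before embedding so the face sits at the key angle alpha."""
    return alpha - theta


def restoring_rotation(theta: float, alpha: float) -> float:
    """Rotation applied after embedding to return the face to its angle theta."""
    return theta - alpha
```

With θ = 1.5° and α = 90°, the first function gives 88.5° and the second gives −88.5°, so the two rotations cancel and the stego-image keeps the original orientation of the face.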
In a preferred embodiment, embedding of the stego-image takes place in the 1st-level 2D Haar DWT (Discrete Wavelet Transform) with the Symmetric-padding mode to resist noise impulse and compression. Although algorithms based on DWT experience some losses of data since the reverse transform truncates the values if they go beyond the lower and upper boundaries (i.e., 0-255), knowing that human skin tone resides along the middle range in the chromatic red of YCbCr colour space allows us to embed in the DWT of the Cr channel, leaving the perceptibility of the carrier image virtually unchanged.
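For illustration, a one-level 2D Haar transform and its inverse can be sketched as below. This is a simplified sketch, not the embodiment's implementation: it assumes an input with even dimensions (so no padding mode is needed), uses the orthonormal scaling, and the sub-band names and sign conventions are labels chosen here that may differ from those of a particular wavelet library. The embedding rule itself is not shown, as the text does not specify it.

```python
def haar_dwt2(block):
    """One-level 2D Haar DWT of an even-sized 2D list; returns four sub-bands."""
    approx, d_rows, d_cols, d_diag = [], [], [], []
    for y in range(0, len(block), 2):
        ra, rr, rc, rd = [], [], [], []
        for x in range(0, len(block[0]), 2):
            a, b = block[y][x], block[y][x + 1]
            c, d = block[y + 1][x], block[y + 1][x + 1]
            ra.append((a + b + c + d) / 2)   # local average (approximation)
            rr.append((a + b - c - d) / 2)   # difference between the two rows
            rc.append((a - b + c - d) / 2)   # difference between the two columns
            rd.append((a - b - c + d) / 2)   # diagonal difference
        approx.append(ra); d_rows.append(rr); d_cols.append(rc); d_diag.append(rd)
    return approx, d_rows, d_cols, d_diag


def haar_idwt2(approx, d_rows, d_cols, d_diag):
    """Invert haar_dwt2, returning the reconstructed 2D list of floats."""
    out = []
    for ra, rr, rc, rd in zip(approx, d_rows, d_cols, d_diag):
        top, bottom = [], []
        for p, q, r, s in zip(ra, rr, rc, rd):
            top += [(p + q + r + s) / 2, (p + q - r - s) / 2]
            bottom += [(p - q + r - s) / 2, (p - q - r + s) / 2]
        out += [top, bottom]
    return out


def clip8(value):
    """Truncate a reconstructed sample to the 8-bit range 0-255."""
    return max(0, min(255, round(value)))
```

Because the inverse transform can produce values outside 0-255 once coefficients have been modified, a step such as `clip8` is where data loss can occur; mid-range Cr values leave headroom on both sides, which is the property the paragraph above relies on.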
The invention is not limited to the embodiments described herein but can be amended or modified without departing from the scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
0819407.8 | Oct 2008 | GB | national |
0819982.0 | Oct 2008 | GB | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP09/07504 | 10/20/2009 | WO | 00 | 8/5/2011 |