This application claims the benefit, under 35 U.S.C. § 365, of International Application PCT/EP15/054035, filed Feb. 26, 2015, which was published in accordance with PCT Article 21(2) on Sep. 11, 2015, in English, and which claims the benefit of European patent application No. 14305324.7, filed Mar. 6, 2014.
In the following, a method for processing a picture is disclosed. Specifically, a method for processing a picture comprising at least one face is disclosed, wherein the processing comprises cropping said picture. A corresponding device is also disclosed.
For navigation applications in a set of fixed pictures or videos, it is useful to be able to display all the pictures or videos at a single glance. For this purpose, it is useful to generate a reduced size version of each fixed picture or each picture of the videos so that said pictures or videos are displayed simultaneously on the same screen and can be compared easily. Likewise, for a broadcast application of a video content on a mobile device having a small sized screen, e.g. on a mobile phone or on a PDA, it is necessary to generate a reduced size version of each picture of the video in order to display them on the small sized screen.
A method known to a person skilled in the art to generate these reduced pictures or videos from a source picture or video consists in subsampling said source picture or video. In the case of a significant reduction in size, some picture parts are unusable by the user as they are too small.
Cropping a part of the picture containing the most salient or visually attractive areas of the picture is another method. However, in the presence of multiple faces in the picture, such method often fails to define an appropriate cropping window in the picture.
A method for processing a picture comprising at least one face is disclosed. The method comprises:
Advantageously, the processed picture is of better quality because the presence of faces is taken into account.
According to a specific embodiment, modifying the position of the cropping window in the picture based on the weight comprises:
In a variant, the picture comprises a plurality of faces. In this case, the detection and the determination of a weight are performed for each face of the plurality of faces.
In this variant, modifying the position of the cropping window in the picture based on the weights comprises:
Advantageously, the step a) is followed by a step a′) comprising calculating the differences between the weights of two consecutive faces in the ordered list and removing from the ordered list of faces, the faces following a difference above a threshold value.
Exemplarily, determining a weight for each face of the plurality of faces comprises for one face:
A device for processing a picture comprising at least one face is disclosed. The device comprises at least one processor configured to:
In a specific embodiment, modifying the position of the cropping window in the picture based on the weight comprises:
In a variant, the picture comprises a plurality of faces and the detection and the determination of a weight are performed for each face of the plurality of faces.
In this case, modifying the position of the cropping window in the picture based on the weights comprises:
Advantageously, step a) is followed by a step a′) comprising calculating the differences between the weights of two consecutive faces in the ordered list and removing from the ordered list of faces, the faces following a difference above a threshold value.
Exemplarily, determining a weight for each face of the plurality of faces comprises for one face:
A device for processing a picture comprising at least one face is disclosed. The device comprises:
According to a specific embodiment of the invention, the means for modifying the position of the cropping window in the picture based on the weight comprises:
In a variant, in which the picture comprises a plurality of faces, the means for detecting and the means for determining a weight are configured to perform detecting and determining a weight for each face of the plurality of faces.
A computer program product is also disclosed, comprising program code instructions to execute the steps of the processing method according to any of the embodiments and variants disclosed when this program is executed on a computer.
A processor readable medium having stored therein instructions for causing a processor to perform at least the steps of the processing method according to any of the embodiments and variants disclosed.
In the drawings, an embodiment of the present invention is illustrated. It shows:
According to an exemplary and non-limitative embodiment of the invention, the processing device 1 further comprises a computer program stored in the memory 120. The computer program comprises instructions which, when executed by the processing device 1, in particular by the processor 110, make the processing device 1 carry out the processing method described with reference to
According to exemplary and non-limitative embodiments, the processing device 1 is a device, which belongs to a set comprising:
In a step S10, a cropping window is obtained, e.g. by the module 12. Exemplarily, the cropping window is obtained from a memory. In this case, the cropping window was determined beforehand and stored in the memory. In a variant, the module 12 obtains the cropping window by applying the method disclosed in the European patent application EP2005291938. In this patent application, the cropping window is named extraction window. The method comprises first determining a saliency map from the picture Y. The saliency map is a two dimensional topographic representation of conspicuity of the picture. This map is normalized for example between 0 and 255. The saliency map is thus providing a saliency value per pixel that characterizes its perceptual relevancy. The higher the saliency value for a pixel, the more visually relevant the pixel. Exemplarily, the cropping window is obtained as follows:
In a step S12, face(s) are detected in the picture Y, e.g. by the face detection module 14. The output of the method is a list of n detected faces Fi with their respective size Sz and position within the picture Y, where i and n are integers, n≥1 and i ∈ [1; n]. The detected face may cover more or less the true face. Indeed, the detected face is often a rectangle which may cover only partially the true face. In the following, the word “face” is used to mean a detected face. The face(s) is(are) for example detected by applying the method disclosed in the document from Viola et al. entitled “Rapid object detection using a boosted cascade of simple features” published in IEEE Conference on Computer Vision and Pattern Recognition in 2001. It will be appreciated, however, that the invention is not restricted to this specific method of face detection. Any method adapted to detect faces in a picture is appropriate. The method disclosed by Zhu et al. in the paper entitled “Fast Face Detection Using Subspace Discriminant Wavelet Features” published in Proceedings of the International Conference on Computer Vision and Pattern Recognition, 2000 is another example of such a method. This method approximates the multi-template T by a low-dimensional linear subspace F, usually called the face space. Images are initially classified as potential members of T if their distance from F is smaller than a certain threshold. The images which pass this test are projected on F and these projections are compared to those in the training set. In the paper entitled “Convolutional Face Finder: A Neural Architecture for Fast and Robust Face Detection” published in IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11), November 2004, Garcia et al. disclose another approach based on a convolutional neural architecture designed to robustly detect highly variable face patterns.
In a step S14, a weight is determined for each of the detected face(s) responsive at least to the size Sz of these detected faces. The weight is determined for example by the module 16. Exemplarily, the weight for a face is equal to the size Sz of the face. The size Sz is for example the product of the height and the width in pixels of the detected face.
A variant of step S14 is depicted on
The method disclosed in the paper from Baveye et al. entitled “Image and video saliency models improvement by blur identification” published in ICCVG in 2012 can be used to determine such a sharpness level. This method provides a sharpness map that associates with each pixel in a picture a level of sharpness. In a paper entitled “Blur detection for digital images using wavelet transform” published in Proceedings of IEEE ICME, 2004, Tong et al. suggest using wavelet transforms. Indeed, such transforms are able both to discriminate different types of edges and to distinguish sharpness from blur. It will be appreciated, however, that the invention is not restricted to these specific methods of sharpness level determination.
In a step S142, a depth level LDi is determined for the face Fi. The level of depth for the face Fi can be obtained by averaging the depth values associated with the pixels located in the face Fi. When a face is close to the foreground, its depth level is high. On the contrary, when a face is close to the background, its depth level is low. The method disclosed in the paper from Han et al. entitled “Geometric and Texture Cue Based Depth-map Estimation for 2D to 3D Image Conversion” published in IEEE ICCE in 2011 can be used to estimate such a depth level. This method provides a depth map that associates with each pixel in a picture a level of depth. In a paper entitled “Learning Depth from Single Monocular Images” published in ICCV workshop on 3D Representation for Recognition, 2007, Saxena et al. disclose a model using a hierarchical, multi-scale Markov Random Field (MRF) that incorporates multiscale local-image and global-image features, and models the depths and the relation between depths at different points in the image. It will be appreciated, however, that the invention is not restricted to these specific methods of depth level determination.
In a step S144, a saliency level LSi is determined for the face Fi. The method disclosed in the European patent application EP2004804828 can be used to determine such a saliency level. This method provides a saliency map that associates with each pixel in a picture a level of saliency. The saliency level of a pixel characterizes its perceptual relevancy. This method comprises:
It will be appreciated, however, that the invention is not restricted to this specific method of saliency level determination. Any method enabling the perceptual interest data to be calculated (e.g. saliency maps) in a picture is suitable. For example, the method described in the document by Itti et al entitled “A model of saliency-based visual attention for rapid scene analysis” and published in 1998 in IEEE trans. on PAMI can be used. The level of saliency for the face Fi can be obtained by averaging the saliency values associated with the pixels located in the face Fi. When a face is salient, its saliency level is high. On the contrary when a face is not salient, its saliency level is low.
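The averaging described for the depth level LDi and the saliency level LSi can be sketched as follows. This is only an illustrative sketch, not the method of the cited applications: the map is assumed to be a plain list of rows, the face is assumed to be an axis-aligned rectangle (x, y, width, height) lying entirely inside the map, and the names `Face` and `mean_level` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical face representation: (x, y, width, height) in pixels,
# with (x, y) the top-left corner of the detected rectangle.
Face = Tuple[int, int, int, int]


def mean_level(level_map: List[List[float]], face: Face) -> float:
    """Average the per-pixel values of level_map over the face rectangle.

    Assumes the rectangle lies entirely inside level_map.
    """
    x, y, w, h = face
    if w <= 0 or h <= 0:
        raise ValueError("face rectangle must have positive width and height")
    total = 0.0
    for row in level_map[y:y + h]:
        total += sum(row[x:x + w])
    return total / (w * h)


# Toy 4x4 saliency map with values in [0; 255].
saliency_map = [
    [0, 0, 0, 0],
    [0, 200, 100, 0],
    [0, 100, 200, 0],
    [0, 0, 0, 0],
]
level = mean_level(saliency_map, (1, 1, 2, 2))
```

The same function can be applied to a depth map or a sharpness map, since each of them associates one value with each pixel.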
In a step S146, the sharpness level LBi, the depth level LDi, the saliency level LSi and the size Sz of the face Fi are linearly combined into a weight Wi. Exemplarily, Wi=KB*LBi/K+KD*LDi/K+KS*LSi/K+KSz*Sz/Sim, where Sim is the size of the picture and K is a constant used for normalizing the values between 0 and 1. If the values of LBi, LDi and LSi are in the range [0; 255], then K=255. The parameters KB, KD, KS and KSz are defined such that KB+KD+KS+KSz=1. Exemplarily, KB=KD=KS=KSz=0.25. In a variant, KB=KD=KS=0 and KSz=1. In the latter case, the weight is only responsive to the size of the face. The values of the parameters can be set up via the Input/Output interface(s) 130 of the processing device 1. Using different values for the parameters makes it possible to weight differently the various levels and size.
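The linear combination of step S146 can be illustrated with a short sketch. The function name `face_weight` and its argument names are hypothetical; the formula, the default parameter values of 0.25 and K=255 are the exemplary ones given above.

```python
def face_weight(lb: float, ld: float, ls: float, sz: float, sim: float,
                kb: float = 0.25, kd: float = 0.25,
                ks: float = 0.25, ksz: float = 0.25,
                k: float = 255.0) -> float:
    """Compute Wi = KB*LBi/K + KD*LDi/K + KS*LSi/K + KSz*Sz/Sim.

    lb, ld, ls: sharpness, depth and saliency levels of the face, in [0; K].
    sz: size of the face in pixels; sim: size of the picture in pixels.
    """
    if abs(kb + kd + ks + ksz - 1.0) > 1e-9:
        raise ValueError("KB + KD + KS + KSz must equal 1")
    if sim <= 0:
        raise ValueError("picture size must be positive")
    return kb * lb / k + kd * ld / k + ks * ls / k + ksz * sz / sim


# Face of 40x50 pixels in a 200x100 picture, with mid-range levels.
w_default = face_weight(lb=128, ld=64, ls=255, sz=40 * 50, sim=200 * 100)

# Variant KB=KD=KS=0 and KSz=1: the weight depends only on the face size.
w_size_only = face_weight(lb=128, ld=64, ls=255, sz=40 * 50, sim=200 * 100,
                          kb=0.0, kd=0.0, ks=0.0, ksz=1.0)
```

Because the four parameters sum to 1 and each term is normalized, the resulting weight stays between 0 and 1.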
The steps S140 to S144 are iterated until a weight is determined for each face Fi detected in step S12.
In a step S16, the position of the cropping window CW in the picture Y is modified based on the weights determined in step S14. The position of the cropping window in the picture Y is modified so that it is centered on a bounding box, wherein the bounding box comprises at least the detected face with the highest weight. In the case where there is a single face in the picture, the cropping window is centered on the bounding box enclosing the single face. The bounding box is to be understood as the minimum bounding box, also called enclosing box. The minimum bounding box refers to the box with the smallest area within which all the pixels of the detected face lie. The cropping window centered on the bounding box is represented on
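Centering a cropping window on a minimum bounding box can be sketched as follows. This is an illustrative sketch under stated assumptions: boxes are axis-aligned rectangles (x, y, width, height) with integer pixel coordinates, the window keeps its size and only its position changes, and no clamping to the picture borders is performed. The names `bounding_box` and `center_window` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), top-left origin


def bounding_box(faces: List[Box]) -> Box:
    """Smallest axis-aligned box enclosing all the given face rectangles."""
    if not faces:
        raise ValueError("at least one face is required")
    x0 = min(x for x, _, _, _ in faces)
    y0 = min(y for _, y, _, _ in faces)
    x1 = max(x + w for x, _, w, _ in faces)
    y1 = max(y + h for _, y, _, h in faces)
    return (x0, y0, x1 - x0, y1 - y0)


def center_window(window_size: Tuple[int, int], box: Box) -> Box:
    """Return the window of the given size centered on box.

    Integer division is used, so the centers may differ by one pixel
    when the size difference is odd.
    """
    ww, wh = window_size
    bx, by, bw, bh = box
    return (bx + (bw - ww) // 2, by + (bh - wh) // 2, ww, wh)


# Single face: the bounding box is the face rectangle itself.
face = (100, 50, 40, 60)
cw = center_window((200, 100), bounding_box([face]))
```

In this example the window and the face share the same center, at pixel (120, 80).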
According to a specific embodiment depicted on
According to another embodiment, the method comprises a preliminary step S162. In the step S162, the differences Dw between the weights of two consecutive faces in the ordered list are calculated. When a difference Dw is above a threshold value, only the faces preceding that difference in the list are kept, i.e. the faces whose weights are W1, W2 and W3, as depicted on
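The pruning of step S162 can be sketched as follows. The sketch assumes that the ordered list is sorted by decreasing weight, which is an assumption made here for illustration, and it works on the weights alone; the function name `keep_dominant_faces`, the example weights and the threshold are hypothetical.

```python
from typing import List


def keep_dominant_faces(weights: List[float], threshold: float) -> List[float]:
    """Keep the weights that precede the first large gap in the ordered list.

    The weights are sorted in decreasing order (assumption), then the
    difference Dw between consecutive weights is computed. As soon as one
    difference exceeds the threshold, the faces after it are removed.
    """
    ordered = sorted(weights, reverse=True)
    for i in range(1, len(ordered)):
        dw = ordered[i - 1] - ordered[i]
        if dw > threshold:
            return ordered[:i]
    return ordered


# Five faces: the gap between the third and fourth weights is 0.5.
kept = keep_dominant_faces([0.85, 0.9, 0.3, 0.8, 0.25], threshold=0.2)
```

With these values the three heaviest faces are kept, which mirrors the case described above where only the faces with weights W1, W2 and W3 remain.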
In a step S18, the picture Y is processed by cropping the picture part delimited by the modified cropping window CW. The cropped picture can be stored in a memory or sent to a destination.
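The cropping of step S18 can be sketched on a picture stored as a list of pixel rows. This is a minimal sketch: the window is assumed to be (x, y, width, height), and clipping the window to the picture borders is an assumption added here, since the text does not specify how a window that extends beyond the picture is handled. The name `crop` is hypothetical.

```python
from typing import List, Tuple


def crop(picture: List[List[int]],
         window: Tuple[int, int, int, int]) -> List[List[int]]:
    """Return the part of picture delimited by window (x, y, width, height).

    The window is clipped to the picture borders (assumption).
    """
    x, y, w, h = window
    height = len(picture)
    width = len(picture[0]) if height else 0
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, width), min(y + h, height)
    return [row[x0:x1] for row in picture[y0:y1]]


# Toy 4x4 picture whose pixel value encodes its position (row * 4 + column).
picture = [[r * 4 + c for c in range(4)] for r in range(4)]
cropped = crop(picture, (1, 1, 2, 2))
```

The result is a new list of rows, so the source picture is left unchanged and the cropped picture can be stored or sent separately.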
The present principles may be applied to objects of interest other than faces, e.g. animals in a field.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD”), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.
Number | Date | Country | Kind |
---|---|---|---|
14305324 | Mar 2014 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2015/054035 | 2/26/2015 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2015/132125 | 9/11/2015 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
8406515 | Cheatle | Mar 2013 | B2 |
8938100 | Ptucha | Jan 2015 | B2 |
8957865 | Cieplinski | Feb 2015 | B2 |
9070182 | Chua | Jun 2015 | B1 |
9292756 | Welinder | Mar 2016 | B2 |
20020089516 | Sobol | Jul 2002 | A1 |
20080181512 | Gavin et al. | Jul 2008 | A1 |
20080304745 | Honma | Dec 2008 | A1 |
20090208118 | Csurka | Aug 2009 | A1 |
20120093418 | Kim et al. | Apr 2012 | A1 |
Number | Date | Country |
---|---|---|
1695288 | Jun 2005 | EP |
1764736 | Mar 2007 | EP |
1814082 | Aug 2007 | EP |
2249307 | Nov 2010 | EP |
WO2005059832 | Jun 2005 | WO |
Entry |
---|
Delakis et al., “Convolutional Face Finder: A Neural Architecture for Fast and Robust Face Detection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, No. 11, Nov. 2004, pp. 1408-1423. |
Han et al., “Geometric and Texture Cue Based Depth-map Estimation for 2D to 3D Image Conversion”, IEEE International Conference on Consumer Electronics (ICCE), Chiang Mai, Thailand, Nov. 28, 2011, pp. 651-652. |
Itti et al., “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, No. 11, Nov. 1998, pp. 1254-1259. |
Tong et al., “Blur Detection for Digital Images Using Wavelet Transform”, IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, Jun. 27, 2004, pp. 17-20. |
Viola et al., “Rapid Object Detection using a Boosted Cascade of Simple Features”, IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA, Dec. 8, 2001, pp. 511-518. |
Zhu et al., “Fast Face Detection Using Subspace Discriminant Wavelet Features”, IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head Island, South Carolina, USA, Jun. 13, 2000, pp. 636-642. |
Suh et al., “Automatic thumbnail cropping and its effectiveness”, ACM Symposium on User Interface Software and Technology, 2003, pp. 95-104. |
Kiess et al., “Improved Image Retargeting by Distinguishing between Faces in Focus and out of Focus”, Multimedia and Expo Workshops (ICMEW), 2012 IEEE, Jul. 9-13, 2012. |
Byoung Chul Ko et al., “Object-of-interest image segmentation based on human attention and semantic region clustering”, Abstract, Journal of the Optical Society of America, Oct. 1, 2006. |
Kiess et al., “SeamCrop for image retargeting”, Proc. SPIE 8304, Multimedia on Mobile Devices 2012; and Multimedia Content Access: Algorithms and Systems VI, Feb. 1, 2012. |
Saxena et al., “Learning Depth from Single Monocular Images”, ICCV workshop on 3D Representation for Recognition, 2007. |
Vaquero et al., “A survey of image retargeting techniques”, SPIE, 2010. |
Baveye et al., “Image and video saliency models improvement by blur identification”, ICCVG, 2012. |
Number | Date | Country | |
---|---|---|---|
20170018106 A1 | Jan 2017 | US |