1. Field of the Invention
The present invention relates to an object estimation device, an object estimation method and an object estimation program for estimating position and size of an object in an image.
2. Description of the Related Art
As a method for detecting a specific object, such as a face or a person, in an image, detection methods using an image pyramid have been proposed (see, for example, U.S. Patent Application Publication Nos. 2009/0324087 and 2008/0247651, which are hereinafter referred to as Patent Documents 1 and 2). In the techniques disclosed in Patent Documents 1 and 2, a plurality of smoothed images of different scales are generated by repeatedly convolving an image with a Gaussian filter. Then, a differential image (also referred to as a DOG (difference of Gaussian) image) is generated between each pair of smoothed images of adjacent scales. The DOG images are generated for each of multi-resolution images of different resolutions. Then, the maximum and minimum values in each DOG image are detected to detect the position and the size of an object in the multi-resolution images.
Further, a method for estimating an object area using a Gaussian filter tailored to characteristics of the object has been proposed (see, for example, Japanese Unexamined Patent Publication No. 2003-248824, which is hereinafter referred to as Patent Document 3). Patent Document 3 discloses filtering an image with a Gaussian filter whose filtering characteristics are tailored to the contour of the object to be detected, thereby highlighting areas having the contour of interest and diminishing the other areas, and so detecting the object.
However, the techniques disclosed in Patent Documents 1 and 2 require a differential image to be generated between each pair of smoothed images of adjacent scales, and a large number of differential images must be generated for a large number of resolution images. This makes it difficult to speed up the operation. Further, in the case where only the contour-based Gaussian filter is applied, as disclosed in Patent Document 3, accurate object detection cannot be achieved.
In view of the above-described circumstances, the present invention is directed to providing an object estimation device, an object estimation method and an object estimation program which provide high-speed and accurate object estimation.
An aspect of the object estimation device of the invention is an object estimation device including: smoothing means for generating a plurality of smoothed images of different scales by repeating convolution of an image with a smoothing filter having filtering characteristics corresponding to a contour of an object; differential image generating means for generating a plurality of differential images by calculating a difference between each pair of the smoothed images apart from each other by a predetermined scale interval in the smoothed images generated by the smoothing means; combining means for generating a combined image by combining the differential images generated by the differential image generating means; and position estimating means for estimating a position of the object from a position where a maximum or minimum signal value is found in the combined image generated by the combining means.
An aspect of the object estimation method of the invention is an object estimation method including the steps of: generating a plurality of smoothed images of different scales by repeating convolution of an image with a smoothing filter having filtering characteristics corresponding to a contour of an object; generating a plurality of differential images by calculating a difference between each pair of the smoothed images apart from each other by a predetermined scale interval in the generated smoothed images; generating a combined image by combining the generated differential images; and estimating a position of the object from a position where a maximum or minimum signal value is found in the generated combined image.
An aspect of the object estimation program of the invention is a program for causing a computer to execute a procedure comprising: generating a plurality of smoothed images of different scales by repeating convolution of an image with a smoothing filter having filtering characteristics corresponding to a contour of an object; generating a plurality of differential images by calculating a difference between each pair of the smoothed images apart from each other by a predetermined scale interval in the generated smoothed images; generating a combined image by combining the generated differential images; and estimating a position of the object from a position where a maximum or minimum signal value is found in the generated combined image.
Repeating convolution of the image with the smoothing filter herein refers to repeating further convolution of the smoothed image with the smoothing filter.
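As a minimal sketch of this repeated convolution, the Python below builds the stack L(x, y, σ1), ..., L(x, y, σn) from a grayscale image stored as a list of rows, where σn corresponds to n smoothing passes. The 3×3 binomial kernel is an assumption standing in for the contour-tailored operator, whose coefficients are not given in this text, and replicating edge pixels at the border is likewise an assumption.

```python
# Hypothetical 3x3 binomial kernel (a discrete Gaussian approximation). It is a
# stand-in for the contour-tailored operator, whose coefficients are not given.
KERNEL = [[1 / 16, 2 / 16, 1 / 16],
          [2 / 16, 4 / 16, 2 / 16],
          [1 / 16, 2 / 16, 1 / 16]]


def convolve3x3(img, kernel):
    """Filter a 2-D list `img` with a 3x3 kernel, replicating edge pixels.
    For a symmetric kernel such as KERNEL this equals convolution."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def smoothed_stack(image, n, kernel=KERNEL):
    """Return [L(sigma_1), ..., L(sigma_n)], where each entry is the previous
    smoothed image convolved once more with the kernel."""
    stack, current = [], image
    for _ in range(n):
        current = convolve3x3(current, kernel)
        stack.append(current)
    return stack
```

Feeding a single bright pixel through two passes lowers the peak from 0.25 to 36/256 while the blur widens from 3×3 to 5×5 pixels; this progressive widening is what the later differencing steps rely on.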
It should be noted that the smoothing filter may be a Gaussian filter, for example, and, in particular, a smoothing filter having filtering characteristics tailored to the contour of the object.
The smoothing means may generate a×k pieces of smoothed images L(x, y, σi), wherein a is an integer of 2 or more, σi is a scale of the smoothing filter and i=1 to a×k. In this case, the differential image generating means generates the plurality of differential images by calculating a difference between each pair of the smoothed images apart from each other by a predetermined scale interval, and may generate, for example, k pieces of differential images G(x, y, σi), wherein k is an integer of 2 or more, using the a×k pieces of smoothed images L(x, y, σi) according to equation (1) below:
$$G(x, y, \sigma_i) = L(x, y, \sigma_i) - L(x, y, \sigma_{i \times a}) \tag{1}$$
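A minimal Python sketch of equation (1) follows, assuming the a×k smoothed images are held in a list in scale order (index 0 holds L(x, y, σ1)) and each image is a list of rows. The helper `eq1_pairs` is a hypothetical name that only lists which 1-based scale indices are paired.

```python
def eq1_pairs(a, k):
    """1-based scale indices (i, i*a) paired by equation (1), for i = 1..k."""
    return [(i, i * a) for i in range(1, k + 1)]


def differential_images_eq1(smoothed, a):
    """Equation (1): G(sigma_i) = L(sigma_i) - L(sigma_{i*a}), i = 1..k,
    where `smoothed` holds L(sigma_1) .. L(sigma_{a*k}) in scale order."""
    if a < 2 or not smoothed or len(smoothed) % a:
        raise ValueError("expected a >= 2 and exactly a*k smoothed images")
    k = len(smoothed) // a
    diffs = []
    for i, j in eq1_pairs(a, k):
        near, far = smoothed[i - 1], smoothed[j - 1]
        # Pixel-wise difference between the finer and the coarser scale.
        diffs.append([[p - q for p, q in zip(row_n, row_f)]
                      for row_n, row_f in zip(near, far)])
    return diffs
```

With a=2 and a×k=60, `eq1_pairs(2, 30)` yields 30 pairs, from (1, 2) and (2, 4) up to (30, 60), so only 30 differential images are needed for 60 smoothed images.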
Alternatively, the smoothing means may generate k pieces of smoothed images L(x, y, σi), wherein k is an integer of 2 or more, σi is a scale of the smoothing filter and i=1 to k, and the differential image generating means may generate k−p pieces of differential images G(x, y, σi), wherein p is an integer of 1 or more, using the k pieces of smoothed images L(x, y, σi) according to equation (2) below:
$$G(x, y, \sigma_i) = L(x, y, \sigma_i) - L(x, y, \sigma_{i+p}) \tag{2}$$
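Equation (2) can be sketched the same way, again assuming a list of smoothed images in scale order with index 0 holding L(x, y, σ1). Unlike equation (1), it needs only k smoothed images to produce k−p differential images.

```python
def differential_images_eq2(smoothed, p):
    """Equation (2): G(sigma_i) = L(sigma_i) - L(sigma_{i+p}), i = 1..k-p,
    where `smoothed` holds L(sigma_1) .. L(sigma_k) in scale order."""
    if p < 1 or p >= len(smoothed):
        raise ValueError("expected 1 <= p < k")
    # List index i corresponds to scale sigma_{i+1}; pair it with sigma_{i+1+p}.
    return [[[x - y for x, y in zip(row_a, row_b)]
             for row_a, row_b in zip(smoothed[i], smoothed[i + p])]
            for i in range(len(smoothed) - p)]
```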
In this case, the size estimating means may estimate the size of the object based on the extent to which blur has widened in the smoothed images L(x, y, σi) used to generate the differential image G(x, y, σi) having the largest or smallest differential value.
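The selection step can be sketched as below under stated assumptions: the differential images come from equation (2) as a list (index 0 is G(x, y, σ1)), scale σn corresponds to n smoothing passes, and blur widening is measured as the support width of a single pixel after n passes of a 3×3 kernel, which is 2n+1 pixels. Treating that width as a size proxy is a hypothetical choice; mapping it to an actual object size would need calibration for the real filter.

```python
def extreme_scale_index(diffs, use_max=True):
    """1-based index i of the differential image G(sigma_i) holding the largest
    (use_max=True) or smallest (use_max=False) value over all pixels."""
    best_i, best_v = None, None
    for i, g in enumerate(diffs, start=1):
        v = max(map(max, g)) if use_max else min(map(min, g))
        if best_v is None or (v > best_v if use_max else v < best_v):
            best_i, best_v = i, v
    return best_i


def blur_support(n_passes):
    """Width in pixels reached by one pixel after n passes of a 3x3 kernel."""
    return 2 * n_passes + 1


def blur_extents_eq2(diffs, p, use_max=True):
    """Return i and the blur widths of L(sigma_i) and L(sigma_{i+p}), the pair
    of smoothed images that produced the extreme differential image."""
    i = extreme_scale_index(diffs, use_max)
    return i, blur_support(i), blur_support(i + p)
```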
The object estimation device may further include preprocessing means for generating a resolution-reduced image from an inputted original image. Further, the image may be a moving image, and the object estimation device may further include preprocessing means for extracting a motion area containing motion from a frame image of the moving image, carrying out background differencing, and applying contrast reduction.
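One simple way to realize motion-area extraction followed by binarization is background differencing with a threshold, sketched below. The single static background frame and the `threshold` parameter are assumptions; the text leaves the exact technique open.

```python
def motion_mask(frame, background, threshold):
    """Binarize |frame - background|: 1.0 (white) for motion pixels whose
    difference exceeds `threshold`, 0.0 (black) for background pixels."""
    return [[1.0 if abs(f - b) > threshold else 0.0
             for f, b in zip(row_f, row_b)]
            for row_f, row_b in zip(frame, background)]
```

Because the moving object becomes white on a black background, the object has the larger signal value, so the maximum rather than the minimum of the later combined image marks it.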
The combining means generates the combined image by combining the differential images. The combining means may use all the differential images, or only some of them, selected depending on the range over which the size of the object varies.
The position estimating means and the size estimating means may determine, depending on the object to be detected, whether the maximum (largest) value or the minimum (smallest) value should be used to estimate the position and size. In the case where the object has a larger signal value than the background (for example, where the object is white and the background area is black), the maximum (largest) value may be used. In contrast, in the case where the object has a smaller signal value than the background (for example, where the object is black and the background area is white), the minimum (smallest) value may be used.
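The polarity rule can be sketched as a single search over the combined image AP(x, y), stored here as a list of rows (an assumption about the data layout). The flag name `object_brighter` is hypothetical.

```python
def estimate_position(ap, object_brighter=True):
    """(x, y) of the maximum of AP(x, y) when the object is brighter than the
    background, or of the minimum when it is darker. Ties keep the first hit
    in row-major order."""
    best = None
    for y, row in enumerate(ap):
        for x, v in enumerate(row):
            if best is None or (v > best[0] if object_brighter else v < best[0]):
                best = (v, x, y)
    return best[1], best[2]
```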
The object estimation device may further include object detecting means for detecting the object in the image based on the position of the object estimated by the position estimating means or the size of the object estimated by the size estimating means.
According to the object estimation device, the object estimation method and the object estimation program of the invention, a plurality of smoothed images of different scales are generated by repeating convolution of an image with a smoothing filter having filtering characteristics corresponding to a contour of an object, a plurality of differential images are generated by calculating a difference between each pair of the smoothed images apart from each other by a predetermined scale interval in the generated smoothed images, a combined image is generated by combining the generated differential images, and a position of the object is estimated from a position where a maximum or minimum signal value is found in the generated combined image. In this manner, high-speed and accurate estimation of the position of an object can be achieved.
In the case where the size estimating means for estimating the size of the object from the differential image with the largest differential value or the smallest differential value among the differential images generated by the differential image generating means is provided, estimation of the size of the object can be achieved without generating multi-resolution images as in the conventional techniques, thereby allowing efficient estimation of the size of the object.
In the case where the smoothing means generates the a×k pieces of smoothed images L(x, y, σi) (where σi is a scale of the smoothing filter and i=1 to a×k), and the differential image generating means generates the k pieces of differential images G(x, y, σi) (k is an integer of 2 or more) according to equation (1) above using the a×k pieces of smoothed images L(x, y, σi), accurate estimation of the position and size of the object can be achieved.
In the case where the preprocessing means for generating a resolution-reduced image from the inputted original image is provided, faster detection can be achieved by reducing the resolution of the image to some extent without impairing the accuracy of detection.
In the case where the object detecting means for detecting the object in the image based on the position of the object estimated by the position estimating means is provided, more accurate and efficient detection of the object can be achieved by focusing the detection on an area around the position estimated by the position estimating means.
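Focusing detection around the estimated position can be as simple as cropping a window around it before running a detector. The sketch below assumes a square window with a hypothetical `half_size` parameter and returns the window's top-left offset so that detections can be mapped back to full-image coordinates; the detector itself is outside its scope.

```python
def roi_around(img, center, half_size):
    """Crop the window centred on `center` = (x, y), clipped to the image.
    Returns the sub-image and its top-left (x0, y0) offset."""
    x, y = center
    h, w = len(img), len(img[0])
    x0, x1 = max(0, x - half_size), min(w, x + half_size + 1)
    y0, y1 = max(0, y - half_size), min(h, y + half_size + 1)
    return [row[x0:x1] for row in img[y0:y1]], (x0, y0)
```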
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
The object estimation device 1 shown in
The preprocessing means 10 applies preprocessing to an inputted moving image. The preprocessing means 10 includes a resolution converting means 11 and a motion area extracting means 12. The resolution converting means 11 reduces the resolution of the moving image to provide a reduced image of, for example, ⅛×⅛ of the number of pixels of the inputted image. Reducing the original image to a resolution that is necessary and sufficient for object estimation speeds up the operation.
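A minimal sketch of the ⅛×⅛ reduction follows, assuming it means 1/8 of the pixels in each direction and that the resampling is done by block averaging; the text does not specify the resampling method.

```python
def reduce_resolution(img, factor=8):
    """Average each factor x factor block of `img` into one pixel, so the
    result has 1/factor of the rows and columns. Pixels that do not fill a
    whole block at the right or bottom edge are dropped."""
    rows, cols = len(img) // factor, len(img[0]) // factor
    out = []
    for by in range(rows):
        row = []
        for bx in range(cols):
            block_sum = sum(img[by * factor + dy][bx * factor + dx]
                            for dy in range(factor) for dx in range(factor))
            row.append(block_sum / (factor * factor))
        out.append(row)
    return out
```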
The motion area extracting means 12 extracts a motion area from a predetermined frame image forming the moving image. A known technique, such as calculating differences from a background image or differences between frames, may be used to extract the motion area. Further, the motion area extracting means 12 performs gray-scale transformation or binarization, for example to convert the color of the motion area into white and the color of the background area into black, as shown in
The smoothing means 20 shown in
The smoothing filter may, for example, be a Gaussian filter. In particular, the smoothing filter is formed by a 3×3 operator having filtering characteristics tailored to the shape of the object to be detected (a person's head in this embodiment), as shown in
The differential image generating means 30 shown in
$$G(x, y, \sigma_i) = L(x, y, \sigma_i) - L(x, y, \sigma_{i \times a}) \tag{1}$$
It should be noted that the differential image G(x, y, σi) may represent absolute values of the differential values resulting from equation (1).
As can be seen from equation (1), each differential image G(x, y, σi) is formed by the difference between the smoothed images L of predetermined scales σi and σi×a. For example, in the case where a=2 and a×k=60, 30 differential images G(x, y, σi) are generated from the pairs of smoothed images L of scales σ1 and σ2, σ2 and σ4, σ3 and σ6, . . . , and σ30 and σ60. Then, the combining means 40 combines the differential images G(x, y, σi) generated by the differential image generating means 30 to generate a combined image AP(x, y), as shown in
It should be noted that, although each differential image G(x, y, σi) is generated from a pair of the smoothed images L(x, y, σi) of scales σi and σi×a in this example, each differential image G(x, y, σi) may be generated from a pair of the smoothed images L(x, y, σi) of scales σi and σi+p (where p is an integer of 1 or more), as represented by equation (2) below:
$$G(x, y, \sigma_i) = L(x, y, \sigma_i) - L(x, y, \sigma_{i+p}) \tag{2}$$
The position estimating means 50 shown in
First, since the above-described smoothing is performed using the smoothing filter having filtering characteristics tailored to the shape of the object, an area having the specified shape is highlighted and the other areas are diminished in the resulting smoothed images L(x, y, σi). Although contour components of the object remain in the smoothed images even after several dozen smoothing passes, the area of the object is blurred and widened as the scale i increases, as shown in
Here, it is assumed that the shape and size of the object in the image are the shape and size of the object in a given smoothed image L(x, y, σi). Further, in order to calculate the saliency of the shape and size of the object in this smoothed image L(x, y, σi), another smoothed image L(x, y, σi×a) of a scale that is apart from the scale of the smoothed image L(x, y, σi) by a predetermined scale interval is set as a background (see
In this operation, if the differential image G(x, y, σi) contains the object having an ideal shape (a shape that best matches the filtering characteristics) and no noise in the background, it has the largest signal value among all the differential images G(x, y, σi). That is, when the pixel components forming the object in the preprocessed image P(x, y) have been widened to fill an area almost equal to the area of the object, the differential value of the differential image G(x, y, σi) is maximized. For example, if the object in the image P(x, y) has a circular shape with a 10-pixel diameter, as shown in
On the other hand, appearance of the object actually captured in the image P varies depending on the positional relationship between the camera and the object, individual variability, etc., and the contour and size of the object are not always those of the ideal shape described above, as shown in
It should be noted that, although the combining means 40 adds up all the differential images G(x, y, σi) in this example, the differential images G(x, y, σi) used in the addition may be chosen depending on the range of size variation to be absorbed. For example, the size variation range to be absorbed may be set depending on the type of the object, and the combined image AP(x, y) for detecting a certain object may be generated using only the differential images G(x, y, σi) where i=3 to k, omitting those where i=1 and i=2. Similarly, the differential images G(x, y, σi) where i=1 to k−q−1 (q is an integer of 1 or more) may be used in the addition, omitting those where i=k−q to k.
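Combining by addition over all or a chosen subset of scales can be sketched as follows, assuming the differential images are a list in scale order (index 0 is G(x, y, σ1)) and indices are given 1-based as in the text. The usage lines in the test show both subsets described above, for k=5 and q=1.

```python
def combine(diffs, indices=None):
    """AP(x, y): pixel-wise sum of G(x, y, sigma_i) over the given 1-based
    indices i, or over all differential images when `indices` is None."""
    if indices is None:
        indices = range(1, len(diffs) + 1)
    h, w = len(diffs[0]), len(diffs[0][0])
    ap = [[0.0] * w for _ in range(h)]
    for i in indices:
        g = diffs[i - 1]
        for y in range(h):
            for x in range(w):
                ap[y][x] += g[y][x]
    return ap
```

In Python terms, "i=3 to k" is `range(3, k + 1)` and "i=1 to k−q−1" is `range(1, k - q)`, since `range` excludes its upper bound.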
Further, the scale i in the above-described equations (1) and (2) is a parameter corresponding to the size of the object to be detected in the image P. If the object is small, the largest value is detected in a differential image G(x, y, σi) with a small scale i; if the object is large, the largest value is detected in a differential image G(x, y, σi) with a large scale i. Using this property, the size estimating means 60 detects the differential image G(x, y, σi) with the largest differential value among the differential images G(x, y, σi), and estimates the size of the object from the scale of the detected differential image G(x, y, σi), i.e., the number of smoothing repetitions.
It should be noted that the size estimating means 60 may detect the differential image G(x, y, σi) with the largest differential value among all the differential images G(x, y, σi), or may detect the largest differential value at the position of the object estimated by the position estimating means 50. Further, since the extent to which the object image in the image P widens (blurs) through the smoothing is known in advance, the size of the object can be estimated once the scale i is found. Still further, since the appearance of the object to be detected varies, as described above, the estimated size (estimated from the differential image G(x, y, σi) with the largest differential value) ±α (α is a preset value) may be output as the estimated size of the object.
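The position-based variant can be sketched as below: evaluate every differential image at the estimated position (xmax, ymax), take the scale with the largest value, and output that size ±α. The calibration table `size_for_scale`, which stands for the blur extent known in advance, is hypothetical, as are the function and parameter names.

```python
def estimate_size_range(diffs, position, size_for_scale, alpha):
    """Pick the 1-based scale i whose G(x, y, sigma_i) is largest at
    `position` = (x, y), convert i to a size with the calibration table
    `size_for_scale`, and return (i, (size - alpha, size + alpha))."""
    x, y = position
    # max() with a key returns the first index on ties.
    best_i = max(range(1, len(diffs) + 1), key=lambda i: diffs[i - 1][y][x])
    size = size_for_scale[best_i]
    return best_i, (size - alpha, size + alpha)
```

In the test data, the largest value anywhere in the images lies in G(x, y, σ3), but at the estimated position (1, 0) the largest value lies in G(x, y, σ2), so the two variants described above can pick different scales.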
Further, since the motion area extracting means 12 carries out the gray-scale transformation or binarization to convert the color of the motion area (object) into white and the color of the background area into black (see
The object estimation device 1 shown in
As described above, the position and size of the object can be estimated merely by calculating a differential image for each pair of images of scales σi and σi×2, which are apart from each other by a predetermined scale interval (or, more generally, for each pair of scales σi and σi×a, or σi and σi+p). Therefore, efficient and accurate object estimation can be achieved with fewer differential images than are required by conventional object detection techniques using DOG images.
Then, the smoothing means 20 generates the smoothed images (step ST2, see
According to the above-described embodiment, the smoothed images L(x, y, σi) are generated by repeating convolution of the image P(x, y) with the smoothing filter having the filtering characteristics corresponding to the contour of the object, the differential images are generated by calculating a difference between each pair of the smoothed images L(x, y, σi) of scales apart from each other by a predetermined scale interval, the combined image AP(x, y) is generated by combining the differential images G(x, y, σi), and the position of the object is estimated from the position (xmax, ymax) where the maximum or minimum signal value is found in the combined image AP(x, y). In this manner, high-speed and accurate estimation of the position of the object can be achieved.
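The steps summarized above can be strung together in one short sketch: repeated convolution, equation (1), addition into AP(x, y), and the position of the maximum. It assumes a bright object on a dark background and uses a hypothetical 3×3 binomial kernel in place of the contour-tailored operator; it is an illustration under those assumptions, not the embodiment's exact implementation.

```python
# Hypothetical 3x3 binomial kernel standing in for the contour-tailored filter.
KERNEL = [[1 / 16, 2 / 16, 1 / 16],
          [2 / 16, 4 / 16, 2 / 16],
          [1 / 16, 2 / 16, 1 / 16]]


def convolve3x3(img, kernel):
    """3x3 filtering with edge replication (equals convolution here because
    the kernel is symmetric)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def estimate_object(image, a, k, kernel=KERNEL):
    """Smooth a*k times, form k differential images by equation (1), add them
    into AP(x, y), and return the (x, y) of the maximum of AP."""
    # Repeated convolution: L(sigma_1) .. L(sigma_{a*k}).
    smoothed, current = [], image
    for _ in range(a * k):
        current = convolve3x3(current, kernel)
        smoothed.append(current)
    h, w = len(image), len(image[0])
    ap = [[0.0] * w for _ in range(h)]
    for i in range(1, k + 1):
        near, far = smoothed[i - 1], smoothed[i * a - 1]
        for y in range(h):
            for x in range(w):
                ap[y][x] += near[y][x] - far[y][x]  # G(x, y, sigma_i)
    # Bright object on a dark background: take the maximum of AP.
    _, bx, by = max((ap[y][x], x, y) for y in range(h) for x in range(w))
    return bx, by
```

For a 15×15 dark image containing a 3×3 bright square centred at (7, 7), `estimate_object(img, a=2, k=3)` locates the maximum of AP(x, y) at the square's centre.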
Further, in the case where the size estimating means 60, which estimates the size of the object from the differential image G(x, y, σi) with the largest differential value among the differential images G(x, y, σi) generated by the differential image generating means 30, is provided, estimation of the size of the object can be achieved without generating the multi-resolution images as in the conventional techniques, thereby allowing efficient estimation of the size of the object.
In the case where the smoothing means 20 generates the a×k pieces of smoothed images L(x, y, σi) (where σi is a scale of the smoothing filter and i=1 to a×k), and the differential image generating means 30 generates the k pieces of differential images G(x, y, σi) (k is an integer of 2 or more) according to equation (1) described above using the a×k pieces of smoothed images L(x, y, σi), as shown in
In the case where the preprocessing means 10, which generates the resolution-reduced image from the inputted original image, is provided, faster detection can be achieved by reducing the resolution of the image to some extent without impairing the accuracy of detection.
In the case where the object detecting means 70, which detects the object in the image based on the position of the object estimated by the position estimating means 50, is provided, more accurate and efficient object detection can be achieved by focusing the detection on an area around the position estimated by the position estimating means 50.
Embodiments of the present invention are not limited to the above-described embodiments. For example, although the specified object is detected from a moving image in the above-described embodiment, the object may be detected from a still image. Further, although the object is a person's head in the above-described embodiments, this is not intended to limit the invention, and the object may be, for example, a face or a car.
Number | Date | Country | Kind |
---|---|---|---|
151865/2010 | Jul 2010 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
20030118246 | August | Jun 2003 | A1 |
20080247651 | Takaki et al. | Oct 2008 | A1 |
20090324087 | Kletter | Dec 2009 | A1 |
Number | Date | Country |
---|---|---|
2003248824 | Sep 2003 | JP |
2009130451 | Oct 2009 | WO |
Entry |
---|
Fujiyoshi, “Gradient-Based Feature Extraction -SIFT and HOG-”, Study Report of the Information Processing Society of Japan, the Information Processing Society of Japan, vol. 2007, No. 87, pp. 211-224 (2007). |
Fukushima et al., “A Study of Multi-scaled Edge Integration”, Technology Report of the Institute of Television Engineers of Japan, the Institute of Television Engineers of Japan, vol. 17, No. 38, pp. 31-36 (1993). |
Office Action of JP 2010-151865 dated Nov. 19, 2013. |
Number | Date | Country | |
---|---|---|---|
20120057759 A1 | Mar 2012 | US |