Field of the Invention
The present invention relates, in particular, to an object detection apparatus for detecting a main object, an object detection method thereof, and a storage medium.
Description of the Related Art
As a conventional method for detecting a main object in an input image, there is a method discussed in Japanese Patent Application Laid-Open No. 2012-243313, for example. According to the method discussed in Japanese Patent Application Laid-Open No. 2012-243313, an input image is first divided into a plurality of partial regions by using an automatic partition algorithm. In addition, based on a weighted sum of differences in feature amount between a given partial region and the other partial regions among the obtained plurality of partial regions, a salience degree of the given partial region is calculated. Then, a main object in the image is detected based on the obtained salience degree.
In addition, A. Dominik et al., Center-surround Divergence of Feature Statistics for Salient Object Detection, ICCV 2011, for example, discusses another method for detecting a main object in an input image. According to the method discussed in A. Dominik et al., Center-surround Divergence of Feature Statistics for Salient Object Detection, ICCV 2011, a plurality of types of feature amounts is first extracted from an input image, and multiple-resolution images are generated with respect to the feature amounts. In addition, two partial regions of different sizes are set for each of the obtained multiple-resolution images, and a salience degree is calculated based on a difference in statistical distribution (Kullback-Leibler divergence) of extracted feature amount between the aforementioned two partial regions. Furthermore, a salience degree image is generated by integrating the salience degrees obtained for the respective multiple-resolution images, and lastly a main object in the image is detected based on the obtained salience degree image.
T. Kadir et al., An affine invariant salient region detector, ECCV 2004, for example, discusses yet another method for detecting a main object (or a partial region thereof) in an input image. According to the method discussed in T. Kadir et al., An affine invariant salient region detector, ECCV 2004, a plurality of types of feature amounts is first extracted from an input image, and multiple-resolution images are generated with respect to the feature amounts. Then, two partial regions of different sizes are set for each of the generated multiple-resolution images. Thereafter, a salience degree is calculated based on a product of a difference in statistical distribution of extracted feature amount between the aforementioned two partial regions (a distance between scaled probability distributions) and an information amount of the feature amount extracted from one of the aforementioned two partial regions (information entropy). Furthermore, a salience degree image is generated by integrating the salience degrees obtained for the respective multiple-resolution images, and lastly a main object (or a partial region thereof) in the image is detected based on the obtained salience degree image.
As described above, according to the methods discussed in Japanese Patent Application Laid-Open No. 2012-243313 and in A. Dominik et al., Center-surround Divergence of Feature Statistics for Salient Object Detection, ICCV 2011, the salience degree is calculated based on the difference in statistical feature amount distribution in the input image, and the main object in the image is detected based on the obtained salience degree. However, there arises a problem in that the accuracy in detecting the main object degrades if the main object in the image is not visually prominent.
In addition, according to the method discussed in T. Kadir et al., An affine invariant salient region detector, ECCV 2004, the size of the information amount (information entropy) contained in the main object in the input image is calculated, and the main object in the image is detected based on the obtained size of the information amount (information entropy). However, there arises a problem in that this method is susceptible to noise caused by an environmental or observational factor, and the accuracy in detecting the main object degrades in turn.
The present invention is directed to enabling a main object in an image to be detected more robustly.
According to an aspect of the present invention, an object detection apparatus includes: a first setting unit configured to set a first partial region in an input image; a second setting unit configured to set a second partial region, which is different from the first partial region set by the first setting unit, in the input image; a third setting unit configured to set a third partial region, which belongs to the second partial region, based on an information amount in the second partial region set by the second setting unit; a salience degree deriving unit configured to derive a salience degree based on a feature amount in the first partial region and a feature amount in the second partial region; and a detection unit configured to detect a main object in the input image based on the salience degree derived by the salience degree deriving unit and an information amount in the third partial region set by the third setting unit.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Hereinafter, a first exemplary embodiment of the present invention will be described with reference to
As illustrated in
As illustrated in
Upon an image being input to the main object detection apparatus 100, the processing starts. In step S201, it is determined whether processes in steps S202 through S208 have been completed on all points in the input image. If the determination result indicates that the processes have been completed on all points in the input image (YES in step S201), the processing proceeds to step S209. If the determination result indicates that the processes have not been completed on all points in the input image (NO in step S201), the processing proceeds to step S202.
In step S202, the first partial region setting unit 101 sets a first partial region 301 in an image space of the input image, which has been input from the outside of the main object detection apparatus 100, as illustrated in
In step S203, the second partial region setting unit 102 sets a second partial region 302, which is different from the first partial region 301, in the image space of the input image, as illustrated in
In step S204, the second partial region setting unit 102 calculates an information amount (hereinafter referred to as first information amount) of the feature amount (e.g., luminance value, color component, edge intensity) in the second partial region 302. The size of the first information amount is calculated as entropy H through the following expression (1), for example.
In the expression (1), P_i represents an occurrence probability of the i-th gradation in the feature amount in the second partial region 302. When the number of gradations in a given feature amount is 256 (= 2^8), the maximum value of the entropy H is 8, and the minimum value thereof is 0.
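The entropy calculation in step S204 can be illustrated with a minimal sketch. It assumes that the expression (1) is the usual Shannon entropy with a base-2 logarithm, H = -sum_i P_i log2(P_i), which is consistent with the stated range of 0 to 8 for 256 gradations. The function name entropy_bits and the sample inputs are hypothetical and are not taken from the embodiment.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy (base 2) of a sequence of quantized feature values.

    Assumption: the occurrence probability of each gradation is estimated
    as its relative frequency within the region.
    """
    total = len(values)
    if total == 0:
        return 0.0
    h = 0.0
    for count in Counter(values).values():
        p = count / total
        h -= p * math.log2(p)
    return h


# Hypothetical regions: one in which all 256 gradations occur equally often,
# and one in which every pixel has the same gradation.
uniform_region = list(range(256))
constant_region = [128] * 256
print(entropy_bits(uniform_region), entropy_bits(constant_region))
```

The uniform region reaches the maximum entropy and the constant region the minimum, which matches the bounds described above.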
In step S205, the third partial region setting unit 103 sets a third partial region based on the size of the first information amount in the second partial region 302 of the input image. More specifically, when the first information amount is large, for example, the third partial region setting unit 103 sets, as a third partial region 403, a relatively small square region that contains the first partial region 301 and that is contained by the second partial region 302, as illustrated in
When the first partial region 301 and the second partial region 302 are each set to a square region, a length L3 of a side of the third partial region 403 is calculated through the following expression (2) by using a length L1 of a side of the first partial region 301 and a length L2 of a side of the second partial region 302, for example.
According to the expression (2), the length L3 of the side of the third partial region 403 illustrated in
When the first partial region 301 and the second partial region 302 are set as illustrated in
When the first partial region 301 and the second partial region 302 are set as illustrated in
In addition, when the first partial region 301 and the second partial region 302 are set as illustrated in
In step S206, the salience degree calculation unit 104, serving as a salience degree deriving unit, calculates a salience degree based on the first partial region 301 set by the first partial region setting unit 101 and the second partial region 302 set by the second partial region setting unit 102. More specifically, as illustrated in
Alternatively, the salience degree may be calculated through the following expression (4) by using the Pearson divergence D_PR.
As another alternative, the salience degree may be calculated through the following expression (5) by using the relative Pearson divergence D_RP. Here, β is an arbitrary real number between 0 and 1 inclusive.
As yet another alternative, the salience degree may be calculated through the following expression (6) by using the Kullback-Leibler divergence D_KL.
As yet another alternative, the salience degree may be calculated through the following expression (7) by using the Bhattacharyya distance D_BT.
As yet another alternative, the salience degree may be calculated through the following expression (8) by using the distance scale D.
As yet another alternative, the salience degree may be calculated through the following expression (9) by using D_abs.
Here, in the expressions (3) through (9), P(i) represents a probability of the i-th gradation in probability density P of the feature amount extracted from the first partial region 301, and Q(i) represents a probability of the i-th gradation in probability density Q of the feature amount extracted from the second partial region 302.
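Two of the measures named above can be sketched as follows. The sketch uses the textbook definitions of the Kullback-Leibler divergence, sum_i P(i) ln(P(i)/Q(i)), and the Bhattacharyya distance, -ln(sum_i sqrt(P(i) Q(i))); these may differ in detail (for example, in the logarithm base or in scaling) from the expressions (6) and (7) of the embodiment. The helper normalize and its floor value eps are assumptions introduced only to keep the logarithms finite for empty histogram bins.

```python
import math


def normalize(hist, eps=1e-12):
    """Convert raw histogram counts to probabilities.

    Assumption: empty bins are floored at eps so that log and division
    stay finite; the result is renormalized to sum to 1.
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram is empty")
    floored = [max(h / total, eps) for h in hist]
    s = sum(floored)
    return [p / s for p in floored]


def kl_divergence(p, q):
    """Textbook Kullback-Leibler divergence of P from Q (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def bhattacharyya_distance(p, q):
    """Textbook Bhattacharyya distance between P and Q."""
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(coefficient)


# Hypothetical 4-gradation histograms for the first and second partial regions.
p = normalize([8, 2, 0, 0])
q = normalize([0, 0, 2, 8])
print(kl_divergence(p, q), bhattacharyya_distance(p, q))
```

Both values are zero (up to rounding) when the two regions have the same distribution and grow as the distributions separate, which is the behavior a salience degree needs.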
In step S207, the information amount calculation unit 105 calculates an information amount (hereinafter referred to as second information amount) of the feature amount (e.g., luminance value, color component, edge intensity) in the third partial region 403 set by the third partial region setting unit 103. More specifically, the second information amount may be given, for example, by a total of gradient intensities of the feature amount in the third partial region 403 that are calculated at the respective points in the third partial region 403. Here, the gradient intensity may be calculated by using a known image processing filter (e.g., Sobel filter, Canny filter, Laplacian filter, Gabor filter).
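As a minimal sketch of this step, the total of gradient intensities can be computed with a 3x3 Sobel filter. The sketch assumes that the third partial region is given as a rectangular 2-D list of feature values and that the outermost pixels, where the 3x3 window does not fit, are skipped; neither assumption comes from the embodiment, and the function name sobel_gradient_sum is hypothetical.

```python
import math


def sobel_gradient_sum(region):
    """Sum of Sobel gradient magnitudes over the interior pixels of a region.

    region: rectangular 2-D list (rows of equal length) of feature values.
    Assumption: border pixels are skipped rather than padded.
    """
    rows, cols = len(region), len(region[0])
    total = 0.0
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Horizontal derivative: right column minus left column.
            gx = (region[y - 1][x + 1] + 2 * region[y][x + 1] + region[y + 1][x + 1]
                  - region[y - 1][x - 1] - 2 * region[y][x - 1] - region[y + 1][x - 1])
            # Vertical derivative: bottom row minus top row.
            gy = (region[y + 1][x - 1] + 2 * region[y + 1][x] + region[y + 1][x + 1]
                  - region[y - 1][x - 1] - 2 * region[y - 1][x] - region[y - 1][x + 1])
            total += math.hypot(gx, gy)
    return total


# Hypothetical 4x4 regions: a flat patch and a patch with a vertical edge.
flat = [[7] * 4 for _ in range(4)]
edge = [[0, 0, 1, 1] for _ in range(4)]
print(sobel_gradient_sum(flat), sobel_gradient_sum(edge))
```

A flat region yields a total of zero, while a region that contains an edge yields a positive total, so the value grows with the amount of structure in the region.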
In step S208, the score calculation unit 106 calculates a score (scale indicating whether the main object is present) at the point to be processed in the input image, based on the salience degree obtained by the salience degree calculation unit 104 and the second information amount obtained by the information amount calculation unit 105. Here, the score at each point in the input image may be given, for example, by a product of the salience degree obtained by the salience degree calculation unit 104 and the second information amount obtained by the information amount calculation unit 105. Alternatively, the score may be given by a sum of the salience degree obtained by the salience degree calculation unit 104 and the second information amount obtained by the information amount calculation unit 105. As another alternative, the score may be given by a combination of a product and a sum of the salience degree obtained by the salience degree calculation unit 104 and the second information amount obtained by the information amount calculation unit 105.
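The three ways of combining the two quantities can be sketched as below. The product and the sum follow the description directly. The "mixed" mode and its weight alpha are only one hypothetical reading of "a combination of a product and a sum"; the embodiment does not specify the form of that combination.

```python
def combine_score(salience, information, mode="product", alpha=0.5):
    """Combine a salience degree and a second information amount into a score.

    mode "product" and "sum" follow the description; mode "mixed" is an
    assumed weighted blend of the two, with alpha as a hypothetical weight.
    """
    if mode == "product":
        return salience * information
    if mode == "sum":
        return salience + information
    if mode == "mixed":
        return alpha * (salience * information) + (1.0 - alpha) * (salience + information)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical values for one point in the input image.
print(combine_score(2.0, 3.0, "product"))
print(combine_score(2.0, 3.0, "sum"))
print(combine_score(2.0, 3.0, "mixed"))
```

With the product, a point scores highly only when both the salience degree and the information amount are large; with the sum, a large value of either one can raise the score.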
In step S209, the identification unit 107 detects the main object in the input image based on the scores calculated by the score calculation unit 106. More specifically, for example, the identification unit 107 first generates a score map (see
The result of detecting the main object obtained as described above is used by an apparatus that utilizes the main object detection apparatus 100. For example, in a case where the region detected as the main object is brought into focus and a high-quality image of the region is to be captured by a digital still camera, the aforementioned result is transmitted to a CPU, a program, and so on within the digital still camera that controls the main object detection apparatus 100.
According to the present exemplary embodiment described above, the main object is detected by calculating the scores according to the visual salience degree and the information amount of the feature amount (e.g., luminance value, color component, edge intensity) in the third partial region. As a result, the main object can be detected robustly even when the main object in the image is not visually prominent or when noise caused by an environmental or observational factor is present.
Hereinafter, a second exemplary embodiment of the present invention will be described with reference to
In step S204 of
The position and the size of the third partial region to be set in step S205 of
Alternatively, the position and the size of the third partial region may be calculated by using the Laplacian of Gaussian (LoG), which is a type of bandpass filter, described in T. Lindeberg (1994), Scale-Space Theory in Computer Vision, Springer, ISBN 0-7923-9418-6. In this case, a circular region having a diameter represented by the Gaussian parameter σ (or a value obtained by multiplying the Gaussian parameter σ by a predetermined coefficient) is obtained based on the extrema (maximum value and minimum value) that are obtained by comparing a pixel of interest and neighborhood pixels in an LoG image or based on the coordinates (x, y) where a pixel value is equal to or greater than a threshold value in the LoG image.
As another alternative, the position and the size of the third partial region may be calculated by using a Gabor filter, which is a type of bandpass filter and is a known image processing filter. In this case, a circular region having a diameter represented by the Gaussian parameter σ (or a value obtained by multiplying the Gaussian parameter σ by a predetermined coefficient) is obtained based on the coordinates (x, y) of the extrema (maximum value and minimum value) that are obtained by comparing a pixel of interest and neighborhood pixels in terms of their filter output values.
After the third partial region is set as described above, the main object can be detected through procedures similar to those in the first exemplary embodiment.
Hereinafter, a third exemplary embodiment of the present invention will be described with reference to
In step S202 of
Alternatively, as illustrated in
As yet another alternative, as illustrated in
In step S203 of
In step S204 of
In step S205 of
Here, the sizes of the information amounts in the first partial region and the second partial region may be given, for example, by the entropy H indicated in the expression (1) above. Alternatively, the size of the information amount in each of the first partial region and the second partial region may be given by a total of gradient intensities of the feature amount in the first or second partial region calculated at each point in the first or second partial region. Here, the gradient intensity may be calculated, for example, by using a known image processing filter (e.g., Sobel filter, Canny filter, Laplacian filter, Gabor filter).
In step S209, the identification unit 107 detects the main object in the input image based on the scores calculated by the score calculation unit 106. More specifically, the score calculation unit 106 first calculates scores on all the combinations of the first partial region set by the first partial region setting unit 101 and the second partial region set by the second partial region setting unit 102 in the input image.
While the processes in steps S201 through S208 are repeated on all the points in the input image in the first and second exemplary embodiments, the processes in steps S201 through S208 are repeated on all the combinations of the first partial region and the second partial region in the present exemplary embodiment. The scores are calculated by using the salience degree and the second information amount, as in the first exemplary embodiment.
Then, the identification unit 107 generates a score map in which the scores on all the combinations are arranged in an image space. The obtained score map is subjected to the aforementioned binarization processing described in N. Otsu, An Automatic Threshold Selection Method Based on Discriminant and Least Squares Criteria (Japanese), Transactions of the Institute of Electronics and Communication Engineers of Japan, vol. J63-D, No. 4, (1980), pp. 349-356, and thus a candidate region for the main object is set. In addition, by setting a rectangular region that circumscribes the obtained candidate region for the main object, the main object in the input image is detected.
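The binarization and the circumscribing rectangle can be sketched as follows. The sketch uses the commonly known form of Otsu's method, which selects the threshold that maximizes the between-class variance. It assumes that the scores have already been quantized to integers from 0 to 255 and that cells strictly above the threshold form the candidate region; both are assumptions made for illustration, and the function names are hypothetical.

```python
def otsu_threshold(values, bins=256):
    """Threshold maximizing between-class variance (Otsu's method).

    values: iterable of integer levels in [0, bins).
    """
    hist = [0] * bins
    for v in values:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(bins):
        w0 += hist[t]          # number of samples at or below t
        sum0 += t * hist[t]    # sum of levels at or below t
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def circumscribing_rectangle(score_map, threshold):
    """Return (top, left, bottom, right) enclosing all cells above threshold."""
    cells = [(y, x) for y, row in enumerate(score_map)
             for x, s in enumerate(row) if s > threshold]
    if not cells:
        return None
    ys = [y for y, _ in cells]
    xs = [x for _, x in cells]
    return (min(ys), min(xs), max(ys), max(xs))


# Hypothetical 4x5 score map with a small high-score cluster.
score_map = [
    [10, 10, 10, 10, 10],
    [10, 10, 200, 210, 10],
    [10, 10, 220, 10, 10],
    [10, 10, 10, 10, 10],
]
t = otsu_threshold([s for row in score_map for s in row])
print(t, circumscribing_rectangle(score_map, t))
```

The rectangle is reported with inclusive row and column indices; an implementation that needs a width and height can derive them from the returned corners.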
According to the present exemplary embodiment described above, even when a main object in an image is not visually prominent or noise caused by an environmental or observational factor is present, the main object can be detected robustly.
An exemplary embodiment of the present invention can also be implemented by executing the following process. More specifically, software (program) for implementing the functions of the above-described exemplary embodiments is supplied to a system or to an apparatus through a network or various types of storage media, and a computer (or a CPU, a microprocessor unit (MPU), or the like) in the system or in the apparatus then loads and executes the program.
According to the exemplary embodiments of the present invention, even when a main object in an image is not visually prominent or noise caused by an environmental or observational factor is present, the main object can be detected more robustly.
Embodiments of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions recorded on a storage medium (e.g., non-transitory computer-readable storage medium) to perform the functions of one or more of the above-described embodiment(s) of the present invention, and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more of a central processing unit (CPU), micro processing unit (MPU), or other circuitry, and may include a network of separate computers or separate computer processors. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2014-083703 filed Apr. 15, 2014, which is hereby incorporated by reference herein in its entirety.
Number | Date | Country | Kind |
---|---|---|---|
2014-083703 | Apr 2014 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
8363984 | Goldman | Jan 2013 | B1 |
8478055 | Hosoi | Jul 2013 | B2 |
9275305 | Yokono | Mar 2016 | B2 |
9330334 | Lin | May 2016 | B2 |
20050163344 | Kayahara | Jul 2005 | A1 |
20060193536 | Pilu | Aug 2006 | A1 |
20090220155 | Yamamoto | Sep 2009 | A1 |
20100142821 | Hosoi | Jun 2010 | A1 |
20100226564 | Marchesotti | Sep 2010 | A1 |
20110019921 | Hamada | Jan 2011 | A1 |
20120063639 | Yano | Mar 2012 | A1 |
20120092495 | Yano | Apr 2012 | A1 |
20120275701 | Park | Nov 2012 | A1 |
20120288189 | Hu | Nov 2012 | A1 |
20130064455 | Yamanaka | Mar 2013 | A1 |
20130121590 | Yamanaka | May 2013 | A1 |
20130223740 | Wang | Aug 2013 | A1 |
20140056518 | Yano | Feb 2014 | A1 |
20140334682 | Lee | Nov 2014 | A1 |
20140347513 | Kobayashi | Nov 2014 | A1 |
20150055824 | Hong | Feb 2015 | A1 |
20150071532 | Ruan | Mar 2015 | A1 |
20150117784 | Lin | Apr 2015 | A1 |
20150227817 | Lin | Aug 2015 | A1 |
20150294181 | Yamanaka | Oct 2015 | A1 |
20150324995 | Yamamoto | Nov 2015 | A1 |
Number | Date | Country |
---|---|---|
2012-243313 | Dec 2012 | JP |
Entry |
---|
Klein et al., “Center-surround Divergence of Feature Statistics for Salient Object Detection”, IEEE International Conference on Computer Vision (ICCV), Nov. 6-13, 2011, pp. 2214-2219, Barcelona. |
Kadir et al., “An affine invariant salient region detector”, Computer Vision—ECCV 2004, 8th European Conference on Computer Vision, Prague, Czech Republic, May 11-14, 2004, Proceedings, Part I, pp. 228-241. |
Lowe, David G., “Object Recognition from Local Scale-Invariant Features”, Proc. of the International Conference on Computer Vision, Corfu, Sep. 1999, pp. 1-8. |
Lindeberg, Tony, “Scale-Space Theory in Computer Vision”, The Springer International Series in Engineering and Computer Science, vol. 256, (1994), pp. 1-69. |
Sharon, et al., “Fast Multiscale Image Segmentation”, IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head Island, SC, Jun. 13, 2000-Jun. 15, 2000, vol. 1, pp. 1-8. |
Otsu, Nobuyuki, “An Automatic Threshold Selection Method Based on Discriminant and Least Squares Criteria”, The Institute of Electronics and Communication Engineers, Apr. 1980, vol. J63-D No. 4, pp. 349-356. |
Number | Date | Country | |
---|---|---|---|
20150294181 A1 | Oct 2015 | US |