This patent application is based on and claims priority pursuant to 35 U.S.C. §119(a) to Japanese Patent Application No. 2013-245401, filed on Nov. 27, 2013, in the Japan Patent Office, the entire disclosure of which is hereby incorporated by reference herein.
Technical Field
Example embodiments generally relate to an image analyzing device, an image analyzing method, and a recording medium storing an image analyzing program.
Background Art
As image recognition technology, an image analyzing technique is used by which input images are analyzed and an object or the like included in these images is identified. In such image recognition technology, reference data that indicates characteristics of the image of an object to be identified is compared with the amount of characteristic of each portion of an input image that may include the object to be identified, to specify the position at which the object to be identified is displayed on the input image.
Embodiments of the present invention described herein provide an image analyzing device, an image analyzing method, and a recording medium storing an image analyzing program. Each of the image analyzing device, the image analyzing method, and the recording medium storing the image analyzing program recognizes an area where a target is displayed based on a feature value of an input image to generate a recognition result, generates space recognition information to recognize spatial properties of each portion of the input image, divides the image into a plurality of similar areas according to similarity in feature value of the input image, each similar area having a similar feature value, obtains specified attribute data of the spatial properties to be referred to, from image areas around the recognized area where the target is displayed, recognizes the spatial properties according to the space recognition information, and determines whether the recognition result is appropriate at the portion where the target is displayed, based on the distribution of the similar areas of the specified spatial properties in the image areas around the recognized area where the target is displayed.
A more complete appreciation of exemplary embodiments and the many attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings.
The accompanying drawings are intended to depict exemplary embodiments of the present disclosure and should not be interpreted to limit the scope thereof. The accompanying drawings are not to be considered as drawn to scale unless explicitly noted.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the present invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “includes” and/or “including”, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In describing example embodiments shown in the drawings, specific terminology is employed for the sake of clarity. However, the present disclosure is not intended to be limited to the specific terminology so selected and it is to be understood that each specific element includes all technical equivalents that have the same structure, operate in a similar manner, and achieve a similar result.
In the following description, illustrative embodiments will be described with reference to acts and symbolic representations of operations (e.g., in the form of flowcharts) that may be implemented as program modules or functional processes including routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types and may be implemented using existing hardware at existing network elements or control nodes. Such existing hardware may include one or more Central Processing Units (CPUs), digital signal processors (DSPs), application-specific-integrated-circuits (ASICs), field programmable gate arrays (FPGAs), computers, or the like. These terms in general may be collectively referred to as processors.
Unless specifically stated otherwise, or as is apparent from the discussion, terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical, electronic quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Example embodiments of the present invention will be described below in detail with reference to the drawings. In the present example embodiment, an image analyzing device, which analyzes an input image and recognizes an object or the like included in the image, determines whether the results of recognition are reliable based on the characteristics of images around the recognized object in the input image. In the above processes, types of space in areas of an image such as “ground”, “upright object”, and “sky” are recognized, and the recognition results of such types of space are considered to determine the reliability. Accordingly, the reliability can be determined based on the amount of characteristic of the image around a target regardless of the angle of the input image.
Firstly, the hardware configuration of an image analyzing device 1 according to an example embodiment of the present invention is described with reference to
The CPU 10 serves as a computation unit, and controls the entire operation of the image analyzing device 1. The RAM 20 is a volatile memory capable of reading and writing data at high speed, and is used as a working area when the CPU 10 processes data. The ROM 30 is a read-only non-volatile memory in which firmware programs or the like are stored. The HDD 40 is a readable/writable non-volatile memory in which an operating system (OS), various kinds of control programs, application programs, or the like are stored.
The I/F 50 connects various kinds of hardware, networks, or the like to the bus 90, and controls these elements. The LCD 60 is a user interface that allows a user to visually monitor the state of the image analyzing device 1. The operation panel 70 is a user interface such as a keyboard or a mouse used by a user to input data to the image analyzing device 1.
The dedicated device 80 is a hardware device used to realize functions dedicated to the image analyzing device 1, and is, for example, a camera that captures an image to generate image data, or an application-specific-integrated-circuit that executes processes required to analyze an image at high speed.
In such a hardware configuration, the CPU 10 performs computation according to programs stored on the ROM 30 or programs read on the RAM 20 from the HDD 40 or another recording medium such as an optical disk, to configure a software controller. The software controller as configured above cooperates with hardware to configure a functional block (
Next, the functional configuration of the image analyzing device 1 according to an example embodiment of the present invention is described. As illustrated in
The data obtaining unit 101 obtains the data of an image to be analyzed and the designation of learning information to be used for analyzing the image, through the operation made on the operation panel 70 and through the network. The DB selector 102 selects the designated learning information from the learning information stored in the learning information DB 104, based on the designated learning information obtained by the data obtaining unit 101. The image obtaining unit 103 obtains the image data of the object obtained by the data obtaining unit 101.
The “learning ID” is an example of an identifier that identifies the learning information for each situation. The “learnt image” is the data indicating each of the situations and an object to be identified, as described above. The “feature value” indicates the photographic characteristics of an object to be identified, and includes, for example, vector data. A target object on an image to be analyzed is identified by comparing the “feature value” included in the learning information with the feature value extracted from the image to be analyzed.
The “type of specified space” indicates space in which an object to be identified exists, on an image of each situation. For example, image areas around the image area where an object to be identified is displayed are classified into types of space such as “ground”, “upright object”, and “sky”. In a situation of “car in urban area”, it is considered that the vehicle, which is the object to be identified, is normally displayed on ground. Thus, “ground” is selected.
The “distribution of features around target” indicates the states of images in image areas around an object to be identified. In the “distribution of features around target” according to the example embodiment of the present invention, the pixel values of the pixels of an image are classified into classes 1 to 6, and the ratio of each of the classes 1 to 6 in image areas around a target object is stored. Moreover, the mean value of the photographic feature values of each class is stored. The “distribution of features around target” is used as the “distribution of learnt image”, which is the class distribution in the learning information.
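The following Python fragment is a minimal sketch of how such a class distribution could be computed; it is not taken from the embodiment. It assumes that the class labels are held in a two-dimensional list, that the target is given as a rectangle, that the "image areas around a target object" are a band of fixed width around that rectangle, and that the raw pixel value stands in for the photographic feature value. The names `periphery_distribution`, `box`, and `margin` are hypothetical.

```python
from collections import Counter


def periphery_distribution(class_map, pixel_values, box, margin):
    """Return (ratios, means) for the classes found in a band around `box`.

    class_map and pixel_values are equally sized 2D lists.
    box is (top, left, bottom, right) with exclusive bottom and right.
    margin is the assumed width, in pixels, of the band around the box.
    """
    top, left, bottom, right = box
    height, width = len(class_map), len(class_map[0])
    counts = Counter()
    sums = Counter()
    for y in range(max(0, top - margin), min(height, bottom + margin)):
        for x in range(max(0, left - margin), min(width, right + margin)):
            if top <= y < bottom and left <= x < right:
                continue  # skip the target area itself
            label = class_map[y][x]
            counts[label] += 1
            sums[label] += pixel_values[y][x]
    total = sum(counts.values())
    ratios = {c: n / total for c, n in counts.items()} if total else {}
    means = {c: sums[c] / counts[c] for c in counts}
    return ratios, means


# Toy 4x4 image: class 1 along the top row, class 2 elsewhere around the
# target, class 3 inside the target rectangle (which is ignored).
example_classes = [
    [1, 1, 1, 1],
    [2, 3, 3, 2],
    [2, 3, 3, 2],
    [2, 2, 2, 2],
]
example_values = [
    [10, 10, 10, 10],
    [40, 99, 99, 40],
    [40, 99, 99, 40],
    [40, 40, 40, 40],
]
example_ratios, example_means = periphery_distribution(
    example_classes, example_values, box=(1, 1, 3, 3), margin=1
)
```

In this toy case four of the twelve peripheral pixels belong to class 1 and eight belong to class 2, so the stored ratios would be one third and two thirds.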
Conventionally, the reliability of the results of the identification of a target object to be identified is checked based on images around a target to be identified. For example, when vehicles are to be identified in the satellite images captured by artificial satellites, the images positioned outside the road are excluded from the image areas identified as vehicles. However, such processing is only applicable to limited circumstances such as the case in which satellite images are analyzed as described above.
By contrast, in the image analysis according to the present example embodiment, the “distribution of features around target” is calculated for a certain type of space on an image. The type of space is, for example, “ground”, as depicted in
In the example of
The target recognition unit 105 analyzes the image obtained by the image obtaining unit 103 based on the learning information sent from the learning information DB 104, and specifies an area in which an object to be identified is located in an image to be analyzed.
The target recognition unit 105 refers to the “feature values” described above with reference to
Further, for the area of “1” where the object to be identified is supposed to be displayed as a result of recognition performed by the target recognition unit 105 as illustrated at the bottom-right portion of
The space recognition unit 106 analyzes an image obtained by the image obtaining unit 103, and estimates the spatial structure of the image to be analyzed.
As illustrated in
The space recognition unit 106 may use several methods for recognizing space. For example, a method disclosed in “Hoiem, D., Efros, A. A., and Hebert, M. (2005). Geometric Context from a Single Image. Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on (Volume 1), pp. 654-661, Vol. 1” may be used by the space recognition unit 106.
The feature dividing unit 107 analyzes an image obtained by the image obtaining unit 103, and divides the pixels of an image to be analyzed into classes 1 to 6 according to the degree of similarity in pixel value. In other words, the feature dividing unit 107 divides an image into several areas according to the similarity in feature value of an input image. Such a divided area has a similar feature value, and is referred to as a similar area.
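As a rough illustration of dividing pixels into classes 1 to 6 by similarity in pixel value, the sketch below uses equal-width intensity bins on a grayscale image. This is a deliberately simple stand-in chosen for brevity, not the segmentation method of the embodiment; the function name `divide_into_classes` and the 0 to 255 value range are assumptions.

```python
def divide_into_classes(gray, num_classes=6, max_value=255):
    """Label each pixel 1..num_classes using equal-width intensity bins.

    gray is a 2D list of integers in the range 0..max_value.
    Integer arithmetic is used so that bin boundaries are exact.
    """
    labels = []
    for row in gray:
        labels.append(
            [min(v * num_classes // (max_value + 1), num_classes - 1) + 1 for v in row]
        )
    return labels


example_gray = [
    [0, 42, 43],
    [128, 200, 255],
]
example_labels = divide_into_classes(example_gray)
```

Pixels that fall into the same bin receive the same label, so connected runs of one label play the role of a similar area in this simplified setting.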
The feature dividing unit 107 may perform feature dividing processes by using, for example, superpixel segmentation. In other words, the feature dividing unit 107 analyzes features such as color, texture, and shape of an image to be analyzed, and labels an area having a similar feature as a group of pixels. As described above with reference to
The space selector 108 obtains the types of space to be referred to in the image recognition processes performed by the recognition finalizing unit 109, based on the learning information output from the learning information DB 104. The space selector 108 refers to the “type of specified space” described above with reference to
The recognition finalizing unit 109 checks the reliability of the results of the recognition performed by the target recognition unit 105, based on the data generated by the space recognition unit 106 and the feature dividing unit 107, and then outputs the final identification result of the object. In other words, the recognition finalizing unit 109 determines whether the results of the recognition performed by the target recognition unit 105 are appropriate.
Next, the recognition finalizing unit 109 obtains a classification result, as described above with reference to
The mean values Sclass1-001 to Sclass6-001 associated with the classes 1 to 6 of the “distribution of features around target” in the learning information described above with reference to
Next, the recognition finalizing unit 109 obtains the results of space type recognition, as described above with reference to
In
Among the areas of class 1, class 2, and class 4, only class 2 is recognized as “ground” in the result of space type recognition illustrated in
As described above, after the class distribution is calculated for each specified type of space, the recognition finalizing unit 109 calculates a probability “P(periphery)k” of the class distribution of image areas around the object specified by the target recognition unit 105, based on the class distribution as depicted in
As described above, after “P(specified)i”, “P(feature)k”, and “P(periphery)k” are calculated, the recognition finalizing unit 109 calculates an overall precision Pi based on these three probabilities as in formula (1) below (S907). Note that, while formula (1) may be expressed in various other ways, the value of Pi always becomes greater in a calculation result as each of the values of “P(specified)i”, “P(feature)k”, and “P(periphery)k” increases.
Pi = f(P(specified)i, P(feature)k, P(periphery)k) (1)
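Formula (1) leaves the function f open. The sketch below picks a plain product as one hypothetical choice of f, which grows with each argument as long as the other two are positive, and compares the result with a prescribed threshold. The product form, the function names, and the threshold value of 0.5 are assumptions made only for illustration.

```python
def overall_precision(p_specified, p_feature, p_periphery):
    """One possible f for formula (1): the product of the three probabilities."""
    for p in (p_specified, p_feature, p_periphery):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return p_specified * p_feature * p_periphery


def is_appropriate(p_i, threshold=0.5):
    """Accept the identification result only when Pi exceeds the threshold."""
    return p_i > threshold


example_p = overall_precision(0.9, 0.8, 0.5)
example_ok = is_appropriate(example_p, threshold=0.3)
```

Other monotonic combinations, such as a weighted sum, would satisfy the same stated property; the product is used here only because it is the shortest to write down.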
After the overall precision Pi is calculated, the recognition finalizing unit 109 compares the calculated value with a prescribed threshold to determine whether or not the identification result is appropriate (S908). In other words, when the calculated value of Pi is greater than a prescribed threshold, the recognition finalizing unit 109 determines that the identification result is appropriate, and outputs the data of an identification result. The output data indicates, for example, the data related to the image at top left of
In the “distribution of features around target” in an example of the learning information of
By contrast, the recognition processes of the target recognition unit 105 may specify an area where no vehicle is displayed, as indicated by a rectangular area surrounded by broken lines in
Referring to
Here, cases in which types of space are not considered for the identification result as illustrated in
When the value of “P(periphery)k” is calculated based on the above value, the value of class 1 in the “distribution of features around target” becomes 15% as depicted in
When the types of space are not considered, the “distribution of features around target” as depicted in
In other words, by the technique where a specified object is identified by analyzing an input image, it is possible to determine the precision of identification result by analyzing the images around an image to be identified. However, even if an object to be identified is properly captured in the image, images around the object vary depending on the conditions at the time the image is captured.
If images around an object to be identified are adopted with no discrimination when learning information is generated, the “distribution of features around target” depicted in
By contrast, if the type of space where little change is expected when an image is captured is specified, for example, as “ground” depending on the type of situation such as “car in urban area”, as in the example embodiment described above, only the class of image corresponding to the type of space that can be a feature of an object to be identified is referred to when the value of probability of the identification result is calculated. Accordingly, it becomes possible to calculate the value of “P(periphery)k” more precisely, and thus the precision of the processes of identifying an object or the like on an image can be improved.
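The restriction to a specified type of space can be sketched as a filter applied before the class ratios are counted. The fragment below is an illustrative assumption rather than the embodiment's implementation: it takes a per-pixel class map, a per-pixel space-type map, and a list of peripheral pixel coordinates, all of which are hypothetical data layouts, and counts only the pixels whose space type matches.

```python
from collections import Counter


def class_ratios_in_space(class_map, space_map, specified_space, region):
    """Ratios of classes among the pixels of `region` labelled `specified_space`.

    class_map holds class labels and space_map holds space-type strings.
    region is an iterable of (y, x) coordinates around the recognized area.
    """
    counts = Counter(
        class_map[y][x] for (y, x) in region if space_map[y][x] == specified_space
    )
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


example_class_map = [
    [1, 1, 4],
    [2, 2, 4],
]
example_space_map = [
    ["sky", "sky", "upright object"],
    ["ground", "ground", "upright object"],
]
example_region = [(y, x) for y in range(2) for x in range(3)]
ground_ratios = class_ratios_in_space(
    example_class_map, example_space_map, "ground", example_region
)
```

With “ground” specified, only class 2 contributes in this toy layout, so classes that appear in the sky or on upright objects no longer dilute the distribution.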
In the example embodiment described above, an example case of “car in urban area” has been described. In this case, the universally characteristic part of a captured image would be the ground area, where an asphalt-paved road is usually displayed, and thus “ground” is selected as the “type of specified space” in the example embodiment described above. However, such a configuration is merely an example, and the “type of specified space” varies depending on the “learnt image”. For example, in the case of “flying airplane”, “sky” is selected as the “type of specified space”. Moreover, in the case of “animal in forests”, “upright object” is selected as the “type of specified space” because it is likely that forests are captured as background.
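The correspondence between the “learnt image” and the “type of specified space” can be pictured as a simple lookup. The table below only restates the three examples given in this description; the dictionary and function names are hypothetical, and an actual learning information database may hold this attribute in a different form.

```python
# Situation ("learnt image") mapped to its "type of specified space".
SPECIFIED_SPACE_BY_SITUATION = {
    "car in urban area": "ground",
    "flying airplane": "sky",
    "animal in forests": "upright object",
}


def select_specified_space(learnt_image):
    """Return the type of space to refer to for the given situation."""
    try:
        return SPECIFIED_SPACE_BY_SITUATION[learnt_image]
    except KeyError:
        raise KeyError(f"no specified space registered for {learnt_image!r}") from None
```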
In the example embodiment described above, cases have been described in which “ground”, “upright object”, and “sky” are recognized as spatial properties. These properties are applicable in outdoor situations, and “floor”, “upright object”, and “ceiling” would be spatial properties in indoor situations. In other words, “under side” corresponding to under sides of the space such as “ground” and “floor” and “upper side” corresponding to upper sides of the space such as “sky” and “ceiling” are recognized as spatial properties by the space recognition unit 106.
In the example embodiment described above, cases were described in which class distribution is referred to for the image area of the spatial properties specified in the learning information among the image areas around the portion specified by the target recognition unit 105. Alternatively, the class distribution of the entire image may be referred to.
For example, a case in which the image illustrated in
For this reason, “P(periphery)k” is calculated such that a class that has a low ratio in the entire image has greater influence on the value of “P(periphery)k”. Accordingly, in relation to a target to be specified, calculation can be performed in view of characteristic peripheral images.
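One way to give rare classes greater influence is to weight each class by the inverse of its ratio in the entire image. The sketch below does this with a weighted overlap score between the learnt and observed class ratios. Both the inverse-ratio weighting and the overlap score are assumptions chosen for illustration, since the description does not fix how “P(periphery)k” is computed, and all names are hypothetical.

```python
def weighted_periphery_score(learnt, observed, image_ratios, eps=1e-6):
    """Score in [0, 1] comparing two class distributions.

    learnt and observed map class labels to ratios around the target.
    image_ratios maps class labels to their ratio in the entire image;
    classes with a low ratio there receive a larger weight.
    """
    numerator = 0.0
    denominator = 0.0
    for label in set(learnt) | set(observed):
        weight = 1.0 / (image_ratios.get(label, 0.0) + eps)
        l_ratio = learnt.get(label, 0.0)
        o_ratio = observed.get(label, 0.0)
        numerator += weight * min(l_ratio, o_ratio)
        denominator += weight * max(l_ratio, o_ratio)
    return numerator / denominator if denominator else 0.0


# Class 2 is rare in the whole image, class 1 is common.
# Only two classes are listed; the remaining classes are omitted for brevity.
example_learnt = {1: 0.5, 2: 0.5}
example_image = {1: 0.9, 2: 0.1}
score_rare_mismatch = weighted_periphery_score(
    example_learnt, {1: 0.5, 2: 0.3}, example_image
)
score_common_mismatch = weighted_periphery_score(
    example_learnt, {1: 0.3, 2: 0.5}, example_image
)
```

Under this weighting, an equal-sized deviation lowers the score more when it occurs in the rare class than when it occurs in the common class, which is the behavior the paragraph above asks for.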
The image analyzing device 1 according to the example embodiment described above can be applied to, for example, a surveillance camera. A surveillance camera identifies a person as a target in the images captured by the camera, and needs to perform learning according to the installed environment because installed environments vary. If the image analyzing device 1 is used for a surveillance camera, a wide range of installed environments can be covered if learnt information of people with the “type of specified space” of “ground” as depicted in
Moreover, the image analyzing device 1 may be applied to a vehicle-installed camera. A vehicle-installed camera captures images ahead of the vehicle, and a system that detects a vehicle, person, and obstacle ahead of the vehicle is provided for the vehicle-installed camera. However, such a system may detect, for example, a part of a building or a street lamp in error. By contrast, the image analyzing device 1 according to the example embodiment described above can avoid erroneous detection because, for example, “ground” is selected as the “type of specified space” and thus an object in midair is excluded from the detection result.
Numerous additional modifications and variations are possible in light of the above teachings. It is therefore to be understood that within the scope of the appended claims, the disclosure of the present invention may be practiced otherwise than as specifically described herein. For example, elements and/or features of different illustrative embodiments may be combined with each other and/or substituted for each other within the scope of this disclosure and appended claims.
Further, as described above, any one of the above-described and other methods of the present invention may be embodied in the form of a computer program stored in any kind of storage medium. Examples of storage media include, but are not limited to, flexible disks, hard disks, optical discs, magneto-optical discs, magnetic tapes, nonvolatile memory cards, ROM, etc. Alternatively, any one of the above-described and other methods of the present invention may be implemented by ASICs, prepared by interconnecting an appropriate network of conventional component circuits, or by a combination thereof with one or more conventional general-purpose microprocessors and/or signal processors programmed accordingly.
Number | Date | Country | Kind |
---|---|---|---|
2013-245401 | Nov 2013 | JP | national |
Number | Date | Country | |
---|---|---|---|
20150146924 A1 | May 2015 | US |