This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2018-118089, filed on Jun. 21, 2018; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to an image analysis device, an image analysis method, and a computer program product.
Techniques for recognizing a person or an object from an image are conventionally known. For example, the use of a convolutional neural network (CNN) for image recognition is known.
It is, however, difficult to recognize unknown objects, that is, objects other than the known objects registered in a learning dataset.
According to one embodiment, generally, an image analysis device includes one or more processors configured to receive input of an image; calculate feature amount information indicating a feature of a region of the image; recognize a known object from the image on the basis of the feature amount information, the known object being registered in learning data of image recognition; recognize a generalization object from the image on the basis of the feature amount information, the generalization object being generalizable from the known object; and output output information on an object identified from the image as the known object or the generalization object.
Hereinafter, an embodiment of an image analysis device, an image analysis method, and a computer program product will be described in detail with reference to the accompanying drawings.
First, an exemplary functional configuration of an image analysis device 100 of the embodiment will be described.
Exemplary Functional Configuration
The receiver 10 receives input of an image.
The calculator 11 calculates feature amount information indicating the features of a region of the image.
The known-object recognizer 12 recognizes a known object, registered in the learning data of image recognition, from the image on the basis of the feature amount information. The feature amount information refers to, for example, a feature amount map representing the feature amount of each pixel. In the present embodiment, the feature amount information is exemplified by the feature amount map.
The generalization-object recognizer 13 recognizes a generalization object (unknown object) generalizable from the known object, from the image on the basis of the feature amount information.
The output controller 14 outputs output information on an object identified from the image as the known object or the generalization object.
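The data flow among the functional blocks above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class `ImageAnalysisDevice`, the `Detection` record, the box format, and the function `make_demo_device` are hypothetical names introduced here. The sketch is only meant to show that the feature amount information is calculated once and then used by both recognizers.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[float]]   # assumed: a 2-D grid of pixel values
FeatureMap = List[List[float]]      # assumed: one feature amount per pixel


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one recognized object."""
    label: str                       # known category, or "object" for a generalization object
    box: Tuple[int, int, int, int]   # assumed position format: (x1, y1, x2, y2)


@dataclass
class ImageAnalysisDevice:
    """Hypothetical wiring of the calculator 11, recognizers 12 and 13, and output controller 14."""
    calculate: Callable[[Image], FeatureMap]
    recognize_known: Callable[[FeatureMap], List[Detection]]
    recognize_general: Callable[[FeatureMap], List[Detection]]
    output: Callable[[List[Detection], List[Detection]], List[Detection]]

    def analyze(self, image: Image) -> List[Detection]:
        # Receiver 10: the image arrives as the argument of this method.
        # Calculator 11: the feature amount information is calculated once.
        feature_map = self.calculate(image)
        # Recognizers 12 and 13 both work from the same feature amount information.
        known = self.recognize_known(feature_map)
        general = self.recognize_general(feature_map)
        # Output controller 14 produces the output information.
        return self.output(known, general)


def make_demo_device() -> ImageAnalysisDevice:
    """Build a device from stub functions, purely for illustration."""
    return ImageAnalysisDevice(
        calculate=lambda image: [[float(p) for p in row] for row in image],
        recognize_known=lambda fmap: [Detection("person", (0, 0, 1, 1))],
        recognize_general=lambda fmap: [Detection("object", (2, 2, 3, 3))],
        output=lambda known, general: known + general,
    )
```

Calling `make_demo_device().analyze([[0, 1], [1, 0]])` returns the stubbed known object followed by the stubbed generalization object; real recognizers would replace the lambdas.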
Next, the calculator 11 calculates a feature amount map as the feature amount information (step S101). The feature amount map may be of any type.
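Since the feature amount map may be of any type, one very simple example is a per-pixel gradient magnitude. The sketch below is a stand-in chosen only for illustration; it is not the feature amount map of the embodiment, which may, for example, be produced by a CNN. The function name `gradient_feature_map` and the forward-difference scheme are assumptions.

```python
from typing import List, Sequence


def gradient_feature_map(image: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return one feature amount per pixel: the forward-difference gradient magnitude."""
    height = len(image)
    width = len(image[0]) if height else 0
    feature_map = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # At the right and bottom borders the pixel is compared with itself,
            # so the corresponding difference is zero.
            right = image[y][x + 1] if x + 1 < width else image[y][x]
            below = image[y + 1][x] if y + 1 < height else image[y][x]
            dx = right - image[y][x]
            dy = below - image[y][x]
            feature_map[y][x] = (dx * dx + dy * dy) ** 0.5
    return feature_map
```

The returned map has the same height and width as the input image, which matches the description of a feature amount map representing the feature amount of each pixel.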
Returning to
Exemplary shapes of recognized known objects will now be described.
The method by which the generalization-object recognizer 13 recognizes generalization objects is similar to the methods in
Returning to
The present embodiment describes shape as the features of known objects, by way of example. However, the features of known objects may be color or texture.
The generalization-object recognizer 13 can learn image recognition from the same learning dataset from which the known-object recognizer 12 learns image recognition. One such learning method is as follows. First, the generalization-object recognizer 13 does not categorize the objects in the learning images of the learning dataset into known-object categories; instead, it integrates them into a single generalization category, “object”. In other words, the generalization-object recognizer 13 learns image recognition using a learning model that has only one object category to learn. The generalization-object recognizer 13 thus learns image recognition by the same learning method as that by which the known-object recognizer 12 learns image recognition.
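The integration of the known-object categories into a single generalization category can be sketched as a relabeling of the learning dataset. This is a minimal sketch under assumptions: the annotation layout (a dictionary with `image`, `box`, and `category` keys) and the function name `to_generalization_dataset` are hypothetical and are not taken from the embodiment.

```python
from typing import Dict, List

GENERALIZATION_CATEGORY = "object"  # the single generalization category


def to_generalization_dataset(annotations: List[Dict]) -> List[Dict]:
    """Copy the annotations, replacing every known-object category with "object".

    The images and object positions are left unchanged, so the same learning
    dataset serves both recognizers; only the category labels differ.
    """
    return [{**ann, "category": GENERALIZATION_CATEGORY} for ann in annotations]


# Hypothetical annotations used for training the known-object recognizer 12.
known_annotations = [
    {"image": "img_0.png", "box": (10, 20, 50, 80), "category": "person"},
    {"image": "img_0.png", "box": (60, 30, 90, 70), "category": "car"},
]

# The same data, relabeled for training the generalization-object recognizer 13.
generalization_annotations = to_generalization_dataset(known_annotations)
```

Because the relabeled copy keeps the same images and positions, the same training procedure can be run on it with a one-category model, while the original annotations remain available for the known-object recognizer 12.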
The known-object recognizer 12 can learn the image recognition by the learning methods as described in Ren et al., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, Neural Information Processing Systems (NIPS), 2015; and Li et al., “Fully Convolutional Instance-aware Semantic Segmentation”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, for example. An external device may learn the image recognition. That is, the known-object recognizer 12 (generalization-object recognizer 13) may execute the image recognition learned by the external device.
Next, returning to
Next, an exemplary functional configuration of the output controller 14 of the embodiment will be described.
The integrator 20 receives known-object data including the known object recognized by the known-object recognizer 12, from the known-object recognizer 12. The integrator 20 also receives generalization-object data including the generalization object recognized by the generalization-object recognizer 13, from the generalization-object recognizer 13. The integrator 20 then integrates the known-object data and the generalization-object data into integrated data.
When the position of the known object and the position of the generalization object in the integrated data match each other, the determiner 21 determines the object as a known object. When the position of the known object and the position of the generalization object in the integrated data do not match, the determiner 21 determines the object as a generalization object.
The integrator 20 integrates, into integrated data, the known-object data including the known object recognized by the known-object recognizer 12 and the generalization-object data including the generalization object recognized by the generalization-object recognizer 13.
When the position of the known object and the position of the generalization object in the integrated data match each other, the determiner 21 determines the object as a known object. When the position of the known object and the position of the generalization object in the integrated data do not match, the determiner 21 determines the object as a generalization object. In the example of
The determination criterion of the determiner 21 is exemplary. The determiner 21 may determine whether the object is a known object or a generalization object by another criterion.
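One possible realization of the integrator 20 and the determiner 21 is sketched below. The embodiment does not specify how positions are judged to match; the sketch assumes that positions are axis-aligned boxes and that two positions match when their intersection over union (IoU) is at least a threshold of 0.5. The function names, the box format, and the threshold are all assumptions made for illustration. Known objects that have no matching generalization object are kept as known objects, which is likewise an assumption.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed position format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def determine(
    known: List[Tuple[str, Box]],
    general: List[Box],
    threshold: float = 0.5,
) -> List[Tuple[str, Box]]:
    """Integrate both recognition results and decide a label for each object."""
    # Known objects keep their category (assumption: also when nothing matches them).
    results: List[Tuple[str, Box]] = list(known)
    for gbox in general:
        matches_known = any(iou(gbox, kbox) >= threshold for _, kbox in known)
        # A generalization object whose position matches a known object is already
        # covered by that known object; otherwise it is output as "object".
        if not matches_known:
            results.append(("object", gbox))
    return results
```

For example, with one known object `("person", (0, 0, 10, 10))` and generalization objects at `(0, 0, 10, 10)` and `(20, 20, 30, 30)`, the first generalization object matches the known object and the second is output as a generalization object.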
As described above, according to the image analysis device 100 of the embodiment, the receiver 10 receives an image input. The calculator 11 calculates the feature amount information indicating the features of the region of the image. The known-object recognizer 12 recognizes from the image the known object registered in the learning data of the image recognition, on the basis of the feature amount information. The generalization-object recognizer 13 recognizes from the image the generalization object generalizable from the known object, on the basis of the feature amount information. The output controller 14 outputs the output information on an object identified from the image as the known object or the generalization object. Thereby, the image analysis device 100 of the embodiment can recognize unknown objects other than the known objects registered in the learning dataset as generalization objects. Moreover, the image analysis device 100 of the embodiment can recognize an unknown object as a generalization object without change in the learning data or the size of the network that calculates the feature amount information (for example,
Lastly, an exemplary hardware configuration of the image analysis device of the embodiment will be described.
Exemplary Hardware Configuration
The control device 301 executes a computer program read from the auxiliary storage device 303 to the main storage device 302. The main storage device 302 represents memory such as a read only memory (ROM) and a random access memory (RAM). The auxiliary storage device 303 represents a hard disk drive (HDD) or a memory card, for example.
The display device 304 displays display information. The display device 304 is a liquid crystal display, for example. The input device 305 is an interface for operating the image analysis device 100. The input device 305 is exemplified by a keyboard or a mouse. When the image analysis device 100 is a smart device such as a smartphone or a tablet terminal, the display device 304 and the input device 305 are a touch panel, for example. The communication device 306 is an interface for communicating with other devices.
Programs to be executed by the image analysis device 100 of the embodiment are recorded in an installable or executable file format on a computer-readable storage medium such as a compact disc-read only memory (CD-ROM), a memory card, a compact disc-recordable (CD-R), or a digital versatile disc (DVD), and are provided as a computer program product.
Programs to be executed by the image analysis device 100 of the embodiment may be stored in a computer connected to a network such as the Internet, and downloaded and provided via the network. Programs to be executed by the image analysis device 100 of the embodiment may be provided via a network such as the Internet without being downloaded.
The programs of the image analysis device 100 of the embodiment may be incorporated in advance in the ROM.
Programs to be executed by the image analysis device 100 of the embodiment have a modular configuration including functional blocks that can be implemented by a program, among the above functional blocks. As actual hardware, the control device 301 reads and executes the program from the storage medium to load the functional blocks on the main storage device 302. In other words, the functional blocks are generated on the main storage device 302.
Part or all of the functional blocks may be implemented not by software but by hardware such as an integrated circuit (IC).
In the case of using a plurality of processors to implement the functions, each of the processors may implement one or two or more of the functions.
The image analysis device 100 of the embodiment may operate in any form. For example, the image analysis device 100 of the embodiment may operate as a cloud system on the network.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions.
Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
JP2018-118089 | Jun 2018 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
20100027892 | Guan | Feb 2010 | A1 |
20110052063 | McAuley | Mar 2011 | A1 |
20120063674 | Yano | Mar 2012 | A1 |
20130036438 | Kutaragi | Feb 2013 | A1 |
20140149376 | Kutaragi | May 2014 | A1 |
20140289323 | Kutaragi | Sep 2014 | A1 |
20150248586 | Gaidon | Sep 2015 | A1 |
20170091671 | Mitarai | Mar 2017 | A1 |
20170270389 | Skaff | Sep 2017 | A1 |
20170330059 | Novotny | Nov 2017 | A1 |
20180349748 | Pham | Mar 2018 | A1 |
20190095716 | Shrestha | Mar 2019 | A1 |
Number | Date | Country |
---|---|---|
2010-262601 | Nov 2010 | JP |
2018-205800 | Dec 2018 | JP |
Entry |
---|
Everingham, M., Gool, L. Van, Williams, C.K., Winn, J., and Zisserman, A. “The Pascal Visual Object Classes (VOC) Challenge.” IJCV, 2010. |
LeCun, et al. “Backpropagation Applied to Handwritten Zip Code Recognition,” Neural computation, 1989. |
Li, et al. “Fully Convolutional Instance-aware Semantic Segmentation,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. |
Ren, et al. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” Neural Information Processing Systems (NIPS) 2015. |
Akiyama et al., “Generic Object Recognition by a Specific Object Recognition Method Using a Large Number of Images on the Web”, IPSJ SIG Technical Report, Information Processing Society of Japan (2010), Jun. 15, 2020, vol. 2010-CVIM-172, No. 11, pp. 1-8. |
Number | Date | Country | |
---|---|---|---|
20190392270 A1 | Dec 2019 | US |