This application is based on Japanese patent application No. 2021-019543, the content of which is incorporated herein by reference.
The present invention relates to a data generation apparatus, a data generation method, and a program.
In recent years, machine learning has been used in various fields. In order to generate a model by machine learning, learning data is required to be prepared. Japanese Patent Application Publication No. 2019-29021 describes, when applying machine learning to control of a robot used in a factory, preparing learning data using the following method. Specifically, object information of an object is first associated with a position and pose detection marker. Also, a learning data set generation jig is prepared. The learning data set generation jig includes a base part that serves as a guide for a placement position of an object and a marker fixed above the base part. Then, in a state where the object is arranged using the base part as a guide, a group of multi-viewpoint images of the whole object including the marker is acquired. Then, a bounding box for the object is set for the acquired group of images, and a captured image is associated with pose information and gravity-center position information of the object, which are estimated from the captured image, the object information, and information related to the bounding box. Thereby, a learning data set for performing object recognition of the object and estimation of a position and a pose thereof is generated.
In order to improve the accuracy of a model generated by machine learning, it is necessary to prepare a large amount of learning data. One example of an object of the present invention is to facilitate the preparation of learning data that is used when generating a model for image recognition and that has a high learning effect.
In one embodiment, there is provided a data generation apparatus that generates learning data for generating an object inference model that infers a type of an object included in an image, the data generation apparatus including:
In another embodiment, there is provided a data generation method that is executed by a computer and generates learning data for generating an object inference model that infers a type of an object included in an image, the data generation method including:
In still another embodiment, there is provided a program that causes a computer to function as a data generation apparatus that generates learning data for generating an object inference model that infers a type of an object included in an image, the program causing the computer to have:
According to the present invention, preparation of learning data that is used when generating an image recognition model and that has a high learning effect is facilitated.
The above and other objects, advantages and features of the present invention will be more apparent from the following description of certain preferred example embodiments taken in conjunction with the accompanying drawings, in which:
The invention will now be described herein with reference to illustrative embodiments. Those skilled in the art will recognize that many alternative embodiments can be accomplished using the teachings of the present invention and that the invention is not limited to the example embodiments illustrated for explanatory purposes.
The following will describe example embodiments of the present invention with reference to the drawings. Note that, in all the drawings, like components are designated by like reference numerals, and description thereof will be omitted as appropriate.
The object recognition apparatus 20 is, for example, a product registration apparatus used at the time of settlement at a store. In this case, the object to be inferred is a product. The object recognition apparatus 20 generates an image to be processed by capturing an image of the product to be registered. It then identifies the product by processing this image and registers the product as a product to be settled. Note that the object recognition apparatus 20 may be operated by a clerk or by a customer. Further, the object recognition apparatus 20 may or may not have a settlement function. In the latter case, the object recognition apparatus 20 is used together with a checkout apparatus.
However, the object recognition apparatus 20 is not limited to the product registration apparatus.
The image acquisition unit 110 acquires a plurality of images. The plurality of images include an object to be inferred. In the example illustrated in
Note that, when generating a moving image, the moving image may be shot while the object is held in a human hand. In this case, the orientation of the object is changed manually while the moving image is shot. On the other hand, the object may be stationary while the moving image is generated. In this case, the relative orientation of the object to the imaging apparatus is changed by moving the imaging apparatus while the moving image is shot.
The image cut-out unit 120 cuts out an object region including the object from each of the plurality of images acquired by the image acquisition unit 110. An example of the shape of the object region is a rectangle. In the present example embodiment, the type of the object to be recognized is known in advance. Thus, the image cut-out unit 120 may determine and cut out the object region using the feature value of the object. Other examples of the processing performed by the image cut-out unit 120 will be described later using other drawings.
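As an illustration of this cut-out step, the following is a minimal sketch, assuming OpenCV and a reference image of the known object; the function name and the matching approach (ORB feature matching) are choices made here for illustration, not details fixed by the text.

```python
import cv2
import numpy as np

def cut_out_object_region(frame, reference):
    """Cut a rectangular object region out of a frame by matching
    feature values of the known object (here, ORB features of a
    reference image of the product)."""
    orb = cv2.ORB_create()
    kp_ref, des_ref = orb.detectAndCompute(reference, None)
    kp_frame, des_frame = orb.detectAndCompute(frame, None)
    if des_ref is None or des_frame is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_ref, des_frame)
    if len(matches) < 10:
        return None  # not enough evidence that the object is present
    # Bounding rectangle of the matched keypoints in the frame.
    points = np.array([kp_frame[m.trainIdx].pt for m in matches], dtype=np.int32)
    x, y, w, h = cv2.boundingRect(points)
    return frame[y:y + h, x:x + w]
```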
The importance generation unit 130 generates importance information by processing the object region cut out by the image cut-out unit 120. The importance information indicates importance of the object region when an object inference model is generated, and is generated for each object region, that is, for each image acquired by the image acquisition unit 110. The importance information includes, for example, a score indicating the importance.
As an example, the importance generation unit 130 generates importance information using the sharpness of the object in the object region. The sharpness decreases as the blur and focus deviation (the lack of focus) at the time of capturing an image increase. Specifically, the importance generation unit 130 reduces the importance of the object region when the sharpness of the object is low. The sharpness can be calculated using, for example, an auto-focusing technology used in the imaging apparatus. Specifically, the sharpness becomes high when the object is in focus. Further, the image cut-out unit 120 may use the gradient of the pixel color information (for example, numerical values indicating the respective strengths of RGB) at the edge of the object as a value indicating the sharpness. In this case, the importance generation unit 130 determines that the sharpness is lower as this gradient becomes lower.
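A minimal sketch of such a sharpness value, assuming OpenCV: the variance of the Laplacian is one common stand-in for the gradient strength at edges (the specific operator is an assumption here, not prescribed by the text).

```python
import cv2

def sharpness_score(object_region):
    """Variance of the Laplacian over the object region: steep color
    gradients at edges (in-focus image) give a high value, while blur
    and focus deviation flatten the gradients and lower it."""
    gray = cv2.cvtColor(object_region, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()
```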
As another example, the importance generation unit 130 generates importance information using information related to a size of an object in the object region. For example, the importance generation unit 130 reduces the importance of the object region as the object in the object region becomes smaller. The importance generation unit 130 also reduces the importance of the object region when part of the object is hidden.
As another example, when an object is held in a human hand, the importance generation unit 130 generates importance information using information related to a size of the hand in the object region. This is because, when the ratio of the object region occupied by the hand is large, the ratio occupied by the object is often small. Specifically, the importance generation unit 130 reduces the importance of the object region as the size of the hand in the object region increases.
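The text fixes only the monotonic relationships (a smaller object, a larger hand, or a partially hidden object lowers importance), not a formula, so the following scoring function is purely a hypothetical illustration.

```python
def importance_from_sizes(region_area, object_area, hand_area):
    """Hypothetical score: grows with the fraction of the object region
    the object occupies and shrinks with the fraction the hand occupies."""
    object_ratio = object_area / region_area   # small object -> low score
    hand_ratio = hand_area / region_area       # large hand  -> low score
    return max(0.0, object_ratio * (1.0 - hand_ratio))
```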
The learning data generation unit 140 stores a plurality of object regions cut out by the image cut-out unit 120 and a plurality of pieces of importance information generated by the importance generation unit 130 in the learning data storage unit 150 as at least a part of the learning data. At this time, each of the plurality of object regions is linked to the importance information of the object region. In this learning data, the object region is used as at least a part of explanatory variables. Then, when learning an object region with a low importance, the learning data generation unit 140 reduces the influence (for example, contribution to loss) of the object region on an object recognition model. Note that an objective variable of this learning data is the type of the object (for example, the product name).
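As one way to realize this reduced contribution to loss, the training loss can be weighted per sample by the importance score; the sketch below assumes PyTorch, and the weighting scheme itself is an illustrative assumption rather than a detail fixed by the text.

```python
import torch
import torch.nn.functional as F

def weighted_loss(logits, labels, importance):
    """Cross-entropy weighted per sample by its importance score, so an
    object region with low importance influences the model less."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    return (importance * per_sample).mean()
```

With the importance fixed at 1.0 for every sample, this reduces to ordinary cross-entropy, so the weighting only attenuates the effect of low-importance object regions.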
Note that the moving image storage unit 112 stores a moving image for each of a plurality of objects. Then, the image cut-out unit 120, the importance generation unit 130, and the learning data generation unit 140 perform the above-described processing for each object.
In the example illustrated in
Note that, in the example illustrated in
Further, the model generation unit 160, the model storage unit 170, and the transmitting unit 180 may be provided in an apparatus separate from the data generation apparatus 10.
The bus 1010 is a data transmission path that allows the processor 1020, the memory 1030, the storage device 1040, the input/output interface 1050, and the network interface 1060 to transmit and receive data to and from one another. However, the method of connecting the processor 1020 and the like to one another is not limited to the bus connection.
The processor 1020 is a processor achieved by a central processing unit (CPU), a graphics processing unit (GPU), or the like.
The memory 1030 is a main storage achieved by random access memory (RAM) or the like.
The storage device 1040 is an auxiliary storage achieved by a hard disk drive (HDD), a solid state drive (SSD), a memory card, a read only memory (ROM), or the like. The storage device 1040 stores a program module that enables each function of the data generation apparatus 10 (for example, the image acquisition unit 110, the image cut-out unit 120, the importance generation unit 130, the learning data generation unit 140, the model generation unit 160, and the transmitting unit 180). The processor 1020 reads and executes each program module on the memory 1030 to enable each function corresponding to the program module. The storage device 1040 also functions as the moving image storage unit 112, the learning data storage unit 150, and the model storage unit 170.
The input/output interface 1050 is an interface for connecting the data generation apparatus 10 to various input/output equipment.
The network interface 1060 is an interface for connecting the data generation apparatus 10 to the network. This network is, for example, a local area network (LAN) or a wide area network (WAN). The method by which the network interface 1060 connects to the network may be a wireless connection or a wired connection. The data generation apparatus 10 may communicate with the object recognition apparatus 20 via the network interface 1060.
First, the image acquisition unit 110 of the data generation apparatus 10 acquires a moving image obtained by capturing an object to be processed (step S110). Then, the data generation apparatus 10 performs the processing illustrated in steps S120 to S150 for all of the plurality of frame images constituting the acquired moving image (step S160).
The image cut-out unit 120 first selects a frame image to be processed (step S120). Then, the image cut-out unit 120 cuts out an object region from the selected frame image (step S130). The details of this processing are as described with reference to
Next, the importance generation unit 130 generates importance information by processing the object region (step S140). The details of this processing are as described with reference to
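Putting steps S110 to S160 together, the loop below is a minimal sketch of this flow, reusing the `cut_out_object_region` and `sharpness_score` helpers sketched above (assumes OpenCV; how step S150 persists each pair is left abstract here).

```python
import cv2

def generate_learning_data(video_path, reference):
    """Step S110: acquire the moving image; steps S120/S160: visit every
    frame image; S130: cut out the object region; S140: generate
    importance information; S150: store the pair as learning data."""
    learning_data = []
    capture = cv2.VideoCapture(video_path)                # step S110
    while True:
        ok, frame = capture.read()                        # steps S120/S160
        if not ok:
            break
        region = cut_out_object_region(frame, reference)  # step S130
        if region is None:
            continue
        importance = sharpness_score(region)              # step S140
        learning_data.append((region, importance))        # step S150
    capture.release()
    return learning_data
```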
According to the present example embodiment, the learning data generated by the data generation apparatus 10 includes an image of an object (an object region) as well as importance information indicating the importance of the object region. In this way, even when the learning data generated by the data generation apparatus 10 includes an image that is not appropriate as learning data (for example, a blurry image), the effect of that image on the model is reduced. As such, when an object recognition model is generated using the learning data generated by the data generation apparatus 10, the accuracy of the object recognition model is high. Therefore, learning data can be easily generated by using the data generation apparatus 10.
After the position of an object is determined in a certain frame image (hereinafter referred to as the first frame image), the tracking unit 122 uses the position of the object in the first frame image when determining the position of the object in a following frame image (hereinafter referred to as the second frame image). That is, the tracking unit 122 has a function of tracking an object in a moving image. As an example, the tracking unit 122 searches for the object in an area near the position of the object in the first frame image. For example, although not limited thereto, the second frame image is the frame image immediately following the first frame image.
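A minimal sketch of such tracking, assuming OpenCV: template matching restricted to a window around the object's position in the first frame image (the template, the search margin, and the acceptance threshold are illustrative assumptions).

```python
import cv2

def track_object(second_frame, template, prev_box, margin=40):
    """Search for the object only near its position in the first frame
    image, using the object region cut out there as the template."""
    x, y, w, h = prev_box
    x0, y0 = max(0, x - margin), max(0, y - margin)
    window = second_frame[y0:y0 + h + 2 * margin, x0:x0 + w + 2 * margin]
    scores = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, best, _, (dx, dy) = cv2.minMaxLoc(scores)
    if best < 0.5:
        return None   # weak match: fall back to searching the whole frame
    return (x0 + dx, y0 + dy, w, h)
```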
According to the present example embodiment, as in the first example embodiment, the learning data can be easily generated by using the data generation apparatus 10. Further, since the image cut-out unit 120 has the tracking unit 122, the object region can be easily cut out from the image.
The importance generation unit 130 generates importance information using a model for calculating importance (hereinafter, referred to as the importance inference model). The importance inference model outputs importance information when an object region is input. The importance inference model is stored, for example, in the inference model storage unit 132. The inference model storage unit 132 may be part of the data generation apparatus 10 or may be located outside of the data generation apparatus 10.
The importance inference model is generated, for example, as follows. First, learning data for the importance inference model is prepared. This learning data has an image corresponding to an object region as an explanatory variable and importance information as an objective variable. To generate this learning data, the importance information is input, for example, manually. Next, machine learning is performed using this learning data. In this way, the importance inference model is generated.
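A minimal sketch of this training, assuming PyTorch; the network architecture is hypothetical, since the text only requires a model mapping an object region (explanatory variable) to importance information (objective variable) learned from manually input scores.

```python
import torch
import torch.nn as nn

# Hypothetical importance inference model: a small CNN that regresses an
# importance score in [0, 1] from an object region image.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()

def train_step(regions, scores):
    """One update on a batch of object regions (explanatory variables)
    and manually input importance scores (objective variables)."""
    optimizer.zero_grad()
    loss = loss_fn(model(regions).squeeze(1), scores)
    loss.backward()
    optimizer.step()
    return loss.item()
```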
Note that, in the present example embodiment, the image cut-out unit 120 may also have the tracking unit 122 illustrated in the second example embodiment.
According to the present example embodiment, as in the first example embodiment, the learning data can be easily generated by using the data generation apparatus 10.
Although example embodiments of the present invention have been described with reference to the drawings, these are examples of the present invention, and various configurations other than the above can be adopted.
Further, although steps (processes) are described in a particular order in the flowcharts used in the above description, the execution order of the steps performed in each example embodiment is not limited to the described order. In each example embodiment, the order of the illustrated steps can be changed as long as the change does not interfere with the content. In addition, the above-described example embodiments can be combined as long as their contents do not conflict with each other.
Some or all of the above example embodiments may also be described as in the following supplementary notes, but are not limited to the following.
1. A data generation apparatus that generates learning data for generating an object inference model that infers a type of an object included in an image, the data generation apparatus including:
2. The data generation apparatus according to the above 1, in which
3. The data generation apparatus according to the above 2, in which
4. The data generation apparatus according to any one of the above 1 to 3, further including
5. The data generation apparatus according to any one of the above 1 to 4, in which
6. The data generation apparatus according to any one of the above 1 to 5, in which
7. The data generation apparatus according to any one of the above 1 to 6, in which
8. The data generation apparatus according to any one of the above 1 to 7, in which
9. The data generation apparatus according to any one of the above 1 to 8, in which
10. The data generation apparatus according to any one of the above 1 to 9, in which
11. The data generation apparatus according to any one of the above 1 to 10, in which
12. A data generation method that is executed by a computer and generates learning data for generating an object inference model that infers a type of an object included in an image, the data generation method comprising:
13. The data generation method according to the above 12, in which,
14. The data generation method according to the above 13, in which,
15. The data generation method according to any one of the above 12 to 14, further including,
16. The data generation method according to any one of the above 12 to 15, in which,
17. The data generation method according to any one of the above 12 to 16, in which,
18. The data generation method according to any one of the above 12 to 17, in which
19. The data generation method according to any one of the above 12 to 18, in which,
20. The data generation method according to any one of the above 12 to 19, in which
21. The data generation method according to any one of the above 12 to 20, in which
22. The data generation method according to any one of the above 12 to 21, in which
23. A program that causes a computer to function as a data generation apparatus that generates learning data for generating an object inference model that infers a type of an object included in an image, the program causing the computer to have:
24. The program according to the above 23, in which
25. The program according to the above 24, in which
26. The program according to any one of the above 23 to 25, causing the computer to have
27. The program according to any one of the above 23 to 26, in which
28. The program according to any one of the above 23 to 27, in which
29. The program according to any one of the above 23 to 28, in which
30. The program according to any one of the above 23 to 29, in which
31. The program according to any one of the above 23 to 30, in which
32. The program according to any one of the above 23 to 31, in which
33. The program according to any one of the above 23 to 32, in which
It is apparent that the present invention is not limited to the above example embodiments, and may be modified and changed without departing from the scope and spirit of the invention.
Number | Date | Country | Kind
---|---|---|---
2021-019543 | Feb 2021 | JP | national
Number | Name | Date | Kind |
---|---|---|---
9665802 | Wang | May 2017 | B2 |
20160140424 | Wang | May 2016 | A1 |
20180039911 | Bezzubtseva | Feb 2018 | A1 |
20200012893 | Shiraishi | Jan 2020 | A1 |
20200137300 | Ogawa | Apr 2020 | A1 |
20200151511 | Tsutsumi | May 2020 | A1 |
20200242345 | Huang | Jul 2020 | A1 |
20200380289 | Jagadeesh | Dec 2020 | A1 |
20200394434 | Rao | Dec 2020 | A1 |
20210097354 | Amato | Apr 2021 | A1 |
20210256307 | Papli | Aug 2021 | A1 |
20220101047 | Puri | Mar 2022 | A1 |
20230004811 | Ishikawa | Jan 2023 | A1 |
20230401813 | Cao | Dec 2023 | A1 |
20240062048 | Sakai | Feb 2024 | A1 |
20240062545 | Nabeto | Feb 2024 | A1 |
Number | Date | Country
---|---|---
110598609 | Dec 2019 | CN
2013-164834 | Aug 2013 | JP
2019-029021 | Feb 2019 | JP
WO-2018083910 | May 2018 | WO
WO-2018168515 | Sep 2018 | WO
WO-2021157067 | Aug 2021 | WO
WO-2022127814 | Jun 2022 | WO
Entry
---
Gupta AK, Seal A, Prasad M, Khanna P. Salient Object Detection Techniques in Computer Vision—A Survey. Entropy (Basel). Oct. 19, 2020;22(10):1174. doi: 10.3390/e22101174. PMID: 33286942; PMCID: PMC7597345. (Year: 2020). |
Borji, A., Cheng, M. M., Hou, Q., Jiang, H., & Li, J. (2019). Salient object detection: A survey. Computational Visual Media, 5, 117-150. (Year: 2019). |
Wang, Wenguan, Jianbing Shen, and Ling Shao. “Video salient object detection via fully convolutional networks.” IEEE Transactions on Image Processing 27.1 (2017): 38-49. (Year: 2017). |
M.-M. Cheng, N. J. Mitra, X. Huang, P. H. S. Torr and S.-M. Hu, “Global Contrast Based Salient Region Detection,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, No. 3, pp. 569-582, Mar. 1, 2015, doi: 10.1109/TPAMI.2014.2345401. (Year: 2015). |
X. Ding and Z. Chen, “Improving Saliency Detection Based on Modeling Photographer's Intention,” in IEEE Transactions on Multimedia, vol. 21, No. 1, pp. 124-134, Jan. 2019, doi: 10.1109/TMM.2018.2851389. (Year: 2019). |
JP Office Action for JP Application No. 2021-019543, mailed on Oct. 22, 2024 with English Translation. |
Katsumi Kikuchi et al., “Heterogeneous Object Recognition to Identify Retail Products”, NEC Technical Journal, vol. 72, No. 1, Japan, Oct. 31, 2019, pp. 86-90. |
Number | Date | Country
---|---|---
20220254136 A1 | Aug 2022 | US