This application is a National Stage of International Application No. PCT/JP2020/033921 filed on Sep. 8, 2020. The contents of the above document are incorporated herein by reference in their entirety.
The present invention relates to an image judgement apparatus, an image judgement method, and a program, and more particularly to a technique for determining substantial identity of a plurality of images.
In the field of computer technology, a strong need has developed for determining whether a plurality of images, each containing objects, are substantially identical. For example, screen images of computer applications are manually designed on the precondition that the images are displayed in a selected single execution environment, such as a particular smartphone. These screen images contain objects such as buttons, images, and input forms. When the design of a screen image for one execution environment is completed, screen images of substantially the same design are also ported to other execution environments. This allows substantially the same screen image to be provided in various execution environments. For example, a screen image designed for a particular smartphone of a particular manufacturer is ported to other smartphones of the same manufacturer or to smartphones of different manufacturers, either manually or by automated conversion using computer software. The screen images thus obtained have often been visually checked for their substantial identity. However, as the number of execution environments for computer applications increases, there is a growing need to automate such verification tasks.
The screen size, aspect ratio, and resolution vary depending on the execution environment of a computer application. Further, objects such as buttons that are provided by the execution environment, such as an operating system, and included in each screen image differ in appearance to a considerable degree. As such, it is difficult to confirm the substantial identity of a plurality of screen images even if the screen images are compared pixel by pixel. Although it is conceivable to input the screen images to a machine learning model to determine their substantial identity, there is a concern that the volume of training required becomes enormous.
In addition to screen images of computer applications, there also exists a strong need to determine the substantial identity of a plurality of images in which objects are arranged, such as page images of electronic books viewed in various environments and web content images viewed in various environments. One or more embodiments of the present invention have been conceived in view of the above, and an object thereof is to provide an image judgement apparatus, an image judgement method, and a program capable of easily and correctly determining substantial identity of a plurality of images in which objects are respectively arranged.
In order to solve the above described problems, an image judgement apparatus according to one aspect of the present invention includes object data obtaining means for obtaining first object data from a first image and second object data from a second image with the use of a first machine learning model in which an image is entered and which outputs object data indicating an attribute and a layout of an object in the image, the first object data indicating an attribute and a layout of an object in the first image, the second object data indicating an attribute and a layout of an object in the second image, and determining means for determining substantial identity of the first image and the second image with the use of a second machine learning model in which the first object data and the second object data are entered and which outputs the substantial identity of the first image and the second image.
Here, the first machine learning model may include an R-CNN.
The first machine learning model may be trained by a training image that is generated by overlaying one or more objects on a predetermined base image.
Further, the second machine learning model may include fully connected layers.
The second machine learning model may include a convolutional layer and a pooling layer, which reduce dimensionality of input data based on the first object data and the second object data, on an upstream side of the fully connected layers.
The second machine learning model may be trained by first learning object data indicating an attribute and a layout of an object in the first training image and second learning object data indicating an attribute and a layout of an object in the second training image, the first learning object data and the second learning object data being respectively obtained from the first training image and the second training image that are generated by overlaying a predetermined object on each of identical or similar first and second base images according to a predetermined layout rule.
An image judgement method according to one aspect of the present invention includes obtaining first object data from a first image with the use of a first machine learning model, the first object data indicating an attribute and a layout of an object in the first image, obtaining second object data from a second image with the use of the first machine learning model, the second object data indicating an attribute and a layout of an object in the second image, and determining substantial identity of the first image and the second image based on the first object data and the second object data with the use of a second machine learning model.
Here, the first machine learning model may include an R-CNN.
The image judgement method may further include training the first machine learning model by a training image that is generated by overlaying one or more objects on a predetermined base image.
Further, the second machine learning model may include fully connected layers.
The second machine learning model may include a convolutional layer and a pooling layer, which reduce dimensionality of input data based on the first object data and the second object data, on an upstream side of the fully connected layers.
The method may further include overlaying a predetermined object on each of identical or similar first and second base images according to a predetermined layout rule so as to generate a first training image and a second training image, inputting the first training image and the second training image to the first machine learning model so as to obtain first learning object data and second learning object data, the first learning object data indicating an attribute and a layout of an object in the first training image, the second learning object data indicating an attribute and a layout of an object in the second training image, and training the second machine learning model by the first learning object data and the second learning object data.
A program according to still another aspect of the present invention causes a computer to obtain first object data from a first image with the use of a first machine learning model, the first object data indicating an attribute and a layout of an object in the first image, obtain second object data from a second image with the use of the first machine learning model, the second object data indicating an attribute and a layout of an object in the second image, and determine substantial identity of the first image and the second image based on the first object data and the second object data with the use of a second machine learning model. The program may be stored in a computer-readable information storage medium, such as a magneto-optical disk or a semiconductor memory.
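The two-stage flow recited above can be sketched in miniature as follows. This is an illustrative stand-in only: `detect_objects` plays the role of the first machine learning model and `judge_identity` the role of the second, and the simple attribute-and-position matching is a hypothetical placeholder for the trained models of the embodiment.

```python
# Illustrative stand-in for the first machine learning model: maps an
# image to object data (an attribute and a layout for each object).
# Detection is stubbed out; a real detector would be a trained R-CNN.
def detect_objects(image):
    """Return object data as a sorted list of (attribute, x, y, w, h)."""
    return sorted(image["objects"])

# Illustrative stand-in for the second machine learning model: takes the
# two pieces of object data and outputs an identity judgement. This toy
# compares attribute and position only, so a mere size change (which the
# embodiment treats as substantially identical) does not break a match.
def judge_identity(object_data_a, object_data_b):
    keys_a = [(attr, x, y) for attr, x, y, w, h in object_data_a]
    keys_b = [(attr, x, y) for attr, x, y, w, h in object_data_b]
    return keys_a == keys_b

screen_a = {"objects": [("button", 10, 200, 50, 20), ("logo", 10, 10, 80, 40)]}
screen_b = {"objects": [("logo", 10, 10, 80, 40), ("button", 10, 200, 60, 24)]}
print(judge_identity(detect_objects(screen_a), detect_objects(screen_b)))  # True
```

A shifted or omitted object changes the position keys and yields a negative judgement, mirroring the negative example rules described later in the embodiment.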
An embodiment of the present invention will be described below with reference to the accompanying drawings. In the following, identical components are labeled with the same numerals in the drawings, and description thereof is omitted as appropriate.
The objects, such as buttons, are arranged on the screen images as described above. The image judgement apparatus 10 determines the substantial identity of the two screen images based on the number of objects included in the two screen images, the attribute of each object, and the layout of the objects in the screen images. Here, the attribute of an object is, for example, a type and color information of the object. Types of objects include a button, a logo image, a trademark image, and an input form, for example. The color information of the object may be, for example, information of one or more representative colors or information of an average color of the object.
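As one possible representation of such an attribute (the record type and field names below are illustrative assumptions, not mandated by the embodiment), the type and color information of an object might be held together as:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectAttribute:
    """Attribute of one object detected in a screen image (illustrative)."""
    object_type: str               # e.g. "button", "logo", "trademark", "input_form"
    representative_colors: tuple   # one or more representative colors (RGB tuples)
    average_color: tuple           # average color of the object (RGB)

attr = ObjectAttribute(
    object_type="button",
    representative_colors=((0, 122, 255),),
    average_color=(40, 120, 230),
)
print(attr.object_type)  # button
```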
Here, two images are “substantially identical” when the number of object images included in each image and the respective attributes and layouts of the objects satisfy a predetermined positive example rule. In contrast, two images are “not substantially identical” when the number of object images included in each image and the respective attributes and layouts of the objects satisfy a predetermined negative example rule.
According to the image judgement apparatus 10 shown in
As shown in
On the other hand, a screen image having a different size of the object B as shown in
In order to determine such substantial identity, as shown in
Here, the screen image A and the screen image B are sequentially entered into the R-CNN 12 to sequentially obtain the object data A and the object data B. Alternatively, as shown in
In
As shown in
The input data generated by the data integrating unit 15 is dimensionally reduced by a plurality of stages of the dimension reduction units 16, and two-dimensional intermediate data is output from the last stage of the dimension reduction units 16. The one-dimensionalization unit 17 one-dimensionalizes the intermediate data and inputs the one-dimensionalized intermediate data to the first stage of the fully connected layers 18. The last stage of the fully connected layers 18 outputs a one-dimensional identity determination result (which may include two pieces of data) from the one-dimensionalized intermediate data. The identity determination result includes data indicating the degree of identity of the screen image A and the screen image B and data indicating the degree of non-identity.
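The dimensionality reduction performed ahead of the fully connected layers can be made concrete by tracking shapes through each stage. The sketch below assumes, purely for illustration, a 64x64 two-channel input (object data A and object data B stacked by the data integrating unit), three dimension reduction units, and channel counts that double per stage; none of these numbers is mandated by the embodiment.

```python
def conv2d_shape(h, w, k=3, pad=1, stride=1):
    """Output height/width of a convolutional layer."""
    return (h + 2 * pad - k) // stride + 1, (w + 2 * pad - k) // stride + 1

def pool2d_shape(h, w, k=2):
    """Output height/width of a (k x k) pooling layer."""
    return h // k, w // k

# Input: object data A and object data B integrated into one 2-channel map.
h, w, channels = 64, 64, 2
for stage in range(3):            # three dimension reduction units 16
    h, w = conv2d_shape(h, w)     # convolutional layer (shape preserved here)
    h, w = pool2d_shape(h, w)     # pooling layer (halves each spatial dimension)
    channels *= 2                 # typical channel growth per stage
flat = h * w * channels           # one-dimensionalization unit 17 (flatten)
print(h, w, channels, flat)       # the fully connected layers 18 then map
                                  # `flat` inputs down to a 2-element result
```

With these assumed sizes, the 64x64x2 input is reduced to 8x8x16 intermediate data and flattened to 1024 values before entering the fully connected layers.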
According to the CNN 14 shown in
In
With the use of the CNN 14 shown in
Here, the learning of the R-CNN 12 and the CNN 14 will be described.
The object image storage unit 26 stores an object attribute table shown in
The training data generating unit 22 generates a large number of pieces of training data based on the data stored in the base image storage unit 24 and the object image storage unit 26. Each piece of training data includes a training image and correct answer data.
A base image used for generating a training image is randomly selected by the training data generating unit 22 from the large number of base images stored in the base image storage unit 24. An object image to be overlaid on the base image is likewise randomly selected from the large number of object images stored in the object image storage unit 26, and the layout (position and size) of each object image is also randomly determined by the training data generating unit 22. The training data generating unit 22 then reads the attribute of the selected object image from the object attribute table shown in
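The random generation of one piece of training data might look like the following sketch, in which the storage units are modeled as plain Python lists and all names, value ranges, and the canvas size are illustrative assumptions:

```python
import random

def generate_training_datum(base_images, object_images, attribute_table,
                            max_objects=5, canvas=(360, 640)):
    """Generate one (training_image, correct_answer_data) pair.

    A training image is modeled as a base image plus a list of placed
    objects; the correct answer data records, for each object, its
    attribute and its layout (position and size) on the canvas.
    """
    base = random.choice(base_images)
    placed, answers = [], []
    for _ in range(random.randint(1, max_objects)):
        obj = random.choice(object_images)
        w = random.randint(20, canvas[0] // 2)   # random size...
        h = random.randint(20, canvas[1] // 2)
        x = random.randint(0, canvas[0] - w)     # ...and random position,
        y = random.randint(0, canvas[1] - h)     # kept inside the canvas
        placed.append((obj, x, y, w, h))
        answers.append({"attribute": attribute_table[obj],
                        "layout": (x, y, w, h)})
    return {"base": base, "objects": placed}, answers
```

Repeating this routine yields the large number of training-image/correct-answer pairs used to train the R-CNN 12.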
The training unit 21 executes a training process of the R-CNN 12 by using the training data generated by the training data generating unit 22. Specifically, the training unit 21 sequentially inputs the training images included in the training data to the R-CNN 12 so as to obtain the output object data. The difference between the output and the correct answer data included in the training data is calculated, and the internal parameters of the R-CNN 12 are updated so as to reduce the difference.
The training data generating unit 22 disposes the selected object images on the selected base image, thereby generating a training image (S103). At this time, the training data generating unit 22 randomly determines a position and a size of each object image.
The training data generating unit 22 further generates correct answer data illustrated in
After repeating the processing in S101 to S104 until a predetermined number of pieces of training data are generated (S105), the training unit 21 executes the training process of the R-CNN 12 using the generated training data (S106).
Next,
The training unit 31 executes a training process of the CNN 14 by using the training data generated as described above. Specifically, the two training images included in the training data are sequentially entered into the R-CNN 12 to obtain two pieces of object data. The obtained object data is entered into the CNN 14. The training unit 31 obtains the identity determination result output from the CNN 14 and updates the internal parameters of the CNN 14 so that the identity determination result becomes correct. That is, when training images relating to a positive example are entered into the R-CNN 12, the internal parameters are updated so that the identity determination result from the CNN 14 indicates that the images are substantially identical. In contrast, when training images relating to a negative example are entered into the R-CNN 12, the internal parameters are updated so that the identity determination result indicates that the images are not substantially identical.
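The idea of updating internal parameters toward a correct identity determination can be illustrated with a deliberately tiny stand-in for the CNN 14: a single-feature logistic model fitted to positive and negative pairs by gradient descent. The feature, the model, and all names below are illustrative only and are not the embodiment's network.

```python
import math

def feature(data_a, data_b):
    """Toy scalar feature: count of attributes present in only one image."""
    return len(set(data_a) ^ set(data_b))

def train(pairs, epochs=500, lr=0.5):
    """Fit w, b so that sigmoid(w*f + b) predicts 'substantially identical'.

    `pairs` is a list of (object_data_a, object_data_b, label), with
    label 1 for positive examples and 0 for negative examples.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for data_a, data_b, y in pairs:
            f = feature(data_a, data_b)
            p = 1.0 / (1.0 + math.exp(-(w * f + b)))
            g = p - y                  # gradient of the log-loss
            w -= lr * g * f            # update the "internal parameters"
            b -= lr * g                # toward the correct determination
    return w, b

def predict(w, b, data_a, data_b):
    f = feature(data_a, data_b)
    return 1.0 / (1.0 + math.exp(-(w * f + b))) > 0.5
```

The CNN 14 plays the same role at far larger scale, with the two pieces of object data as input instead of a single hand-crafted feature.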
Next, the training data generating unit 32 applies the positive example rule or a negative example rule and overlays all or some of the selected object images on the selected base image, thereby generating a training image B (S204). For example, if the first negative example rule is used, the training image B is generated without overlaying some of the selected object images on the selected base image. If the second negative example rule is used, some of the selected object images are moved rightward or leftward and then overlaid on the selected base image so as to generate the training image B. If the positive example rule is used, some of the selected object images are enlarged or reduced and then overlaid on the selected base image so as to generate the training image B.
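The three rules in this step can be sketched as a single function. Only the kind of transformation (omission, horizontal movement, resizing) comes from the text; the concrete shift offset, scale factor, and rule names below are illustrative choices.

```python
import random

def apply_rule(layouts, rule, rng=random):
    """Derive training image B's object layouts from training image A's.

    `layouts` is a list of (attribute, x, y, w, h) tuples describing
    the objects overlaid on training image A.
    """
    out = list(layouts)
    i = rng.randrange(len(out))               # the object the rule acts on
    attr, x, y, w, h = out[i]
    if rule == "negative_1":                  # omit some of the objects
        del out[i]
    elif rule == "negative_2":                # move an object sideways
        out[i] = (attr, x + 40, y, w, h)
    elif rule == "positive":                  # enlarge or reduce an object
        out[i] = (attr, x, y, int(w * 1.5), int(h * 1.5))
    return out
```

A pair built with the `positive` rule keeps every object at its position with only its size changed, so it is labeled substantially identical; the two negative rules produce pairs labeled not substantially identical.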
Subsequently, the training data generating unit 32 generates training data including the generated pair of the training image A and the training image B together with a label indicating whether the pair is a positive example or a negative example, and stores the generated training data (S205).
After repeating the processing in S201 to S205 until a predetermined number of pieces of training data are generated (S206), the training unit 31 executes the training process of the CNN 14 using the generated training data (S207).
According to the image judgement apparatus 10 described above, it is possible to obtain, from each of two screen images to be compared, object data indicating the attributes and positions of the object images included in the two screen images. Based on the two pieces of object data, the substantial identity of the two screen images is determined. For determining the substantial identity, the CNN 14 is trained in advance using a large number of training image pairs generated according to the positive and negative example rules. According to the present embodiment, the substantial identity of the two screen images can be suitably determined.
Specifically, the inventors of the present invention generated 5000 pieces of training data for the R-CNN 12 and 8000 pieces of training data for the CNN 14 by using 500 base images and 33 types of object images, and trained the R-CNN 12 and the CNN 14. As a result, it was found that the accuracy of determination of the substantial identity of the screen images was about 86%, which is sufficiently practical.
The scope of the present invention is not limited to the above embodiment, and includes various modifications. For example, the present invention may be applied not only to a screen image but also to various images, such as a page image of an electronic book and a web content image.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/033921 | 9/8/2020 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2022/054124 | 3/17/2022 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
8315423 | Jing | Nov 2012 | B1 |
9767381 | Rodríguez-Serrano | Sep 2017 | B2 |
10210627 | Vitsnudel | Feb 2019 | B1 |
10607331 | Tandia | Mar 2020 | B1 |
10748650 | Ricci | Aug 2020 | B1 |
10794710 | Liu | Oct 2020 | B1 |
10803602 | Kim | Oct 2020 | B2 |
10838601 | Chen | Nov 2020 | B2 |
10885336 | Davis | Jan 2021 | B1 |
10902615 | Tao | Jan 2021 | B2 |
11100400 | Fang | Aug 2021 | B2 |
11113586 | Wang | Sep 2021 | B2 |
11288544 | Leung | Mar 2022 | B2 |
11308350 | Habibian | Apr 2022 | B2 |
11354791 | Tsymbalenko | Jun 2022 | B2 |
11513670 | Singh | Nov 2022 | B2 |
11551348 | Zhang | Jan 2023 | B2 |
20080187170 | Matsubayashi | Aug 2008 | A1 |
20090154765 | Watanabe | Jun 2009 | A1 |
20110273578 | Okamoto | Nov 2011 | A1 |
20140099026 | Krishnaswamy | Apr 2014 | A1 |
20170083792 | Rodríguez-Serrano | Mar 2017 | A1 |
20180129906 | Habibian | May 2018 | A1 |
20180268307 | Kobayashi | Sep 2018 | A1 |
20190066313 | Kim | Feb 2019 | A1 |
20190147602 | Tao | May 2019 | A1 |
20190212903 | Chen | Jul 2019 | A1 |
20200202502 | Tsymbalenko | Jun 2020 | A1 |
20200202505 | Tsai | Jun 2020 | A1 |
20200242422 | Wang | Jul 2020 | A1 |
20200257940 | Leung | Aug 2020 | A1 |
20200327654 | Zhang | Oct 2020 | A1 |
20210150243 | Wang | May 2021 | A1 |
20210166063 | Nakamura | Jun 2021 | A1 |
20210333983 | Singh | Oct 2021 | A1 |
20210374947 | Shin | Dec 2021 | A1 |
20210398407 | Adato | Dec 2021 | A1 |
20210400195 | Adato | Dec 2021 | A1 |
20220032457 | Anand | Feb 2022 | A1 |
20220067812 | Song | Mar 2022 | A1 |
20220180485 | Chen | Jun 2022 | A1 |
Number | Date | Country |
---|---|---|
2018169690 | Nov 2018 | JP |
2019219766 | Dec 2019 | JP |
2020107185 | Jul 2020 | JP |
Entry |
---|
Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation.” In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580-587. 2014. (Year: 2014). |
Bell et al., “Learning visual similarity for product design with convolutional neural networks.” ACM transactions on graphics (TOG) 34, No. 4 (2015): 1-10. (Year: 2016). |
Oksuz et al., “Imbalance problems in object detection: A review.” IEEE transactions on pattern analysis and machine intelligence 43, No. 10 (2020): 3388-3415. (Year: 2020). |
Wu et al., “Spot the difference by object detection.” arXiv preprint arXiv:1801.01051 (2018). (Year: 2018). |
Zagoruyko et al., “Learning to compare image patches via convolutional neural networks,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, pp. 4353-4361. (Year: 2016). |
International Search Report for PCT/JP2020/033921 (See the transmittal letter). |
Number | Date | Country | |
---|---|---|---|
20220309648 A1 | Sep 2022 | US |