The present disclosure relates to an information processing technique for completing a missing portion of an input image.
In recent years, services that support production of a poster or a flyer have been provided. Such a service supports the production such that anyone can easily obtain a product with a certain level of quality by editing a layout based on multiple templates prepared in advance.
In the case where an image owned by the user is to be used in the selected template, there is a case where a portion of an object in the image is missing. In the case where the missing portion is an essential portion, the user needs to prepare another image suiting the usage purpose again.
In recent years, generation of necessary content is becoming possible with generative AI technology. In the generative AI technology, the user inputs an image or a text as an input prompt into a generative model. The generative AI technology can then generate, with high probability, a text, an image, a moving image, or the like that matches the “context” expressed by the inputted prompt. Using this technique allows the user to easily obtain an image (completed image) in which the missing portion is completed. However, the user needs to specify which portion of the image is to be completed.
Japanese Patent Laid-Open No. 2008-250046 discloses a technique of automatically correcting image data used to perform image generation of an original, based on a read image of the original and ground truth data. In Japanese Patent Laid-Open No. 2008-250046, the read image of the original is compared with ground truth data in which multiple lines varying in thickness and orientation are drawn, a missing portion such as a broken line or a change in line width is detected in the read image, and the image data for performing the image generation of the original is automatically corrected.
However, the technique described in Japanese Patent Laid-Open No. 2008-250046 has the following problems. It is necessary to prepare the ground truth data in advance and read the ground truth data every time the correction is performed. Moreover, the correction target is limited to a line image.
The present disclosure provides an information processing apparatus including: an obtaining unit configured to obtain an input image; an extraction unit configured to extract an object from the input image; an identification unit configured to identify a missing portion of the object extracted by the extraction unit, based on a feature of an outline of the object; and a generation unit configured to generate a completed image in which the missing portion identified in the input image is completed.
Further features of the present disclosure will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Preferred embodiments of the present disclosure are explained below in detail with reference to the attached drawings. Note that the following embodiments do not limit the present disclosure according to the scope of claims, and not all of the combinations of features explained in the embodiments are necessarily essential to the solution provided by the present disclosure.
First, an information processing system according to the present embodiment is explained. The information processing system according to the present embodiment is a print system involving layout data editing for an image output apparatus. In the print system, editing of layout data and print job transmission to the image output apparatus are performed in an externally-connected PC. In generating a print job, print settings are edited on a screen of the PC as necessary.
In the present document, a system in which the print job is transmitted from a printing application installed in the PC to the image output apparatus 100 via a printer driver is explained as an example of execution of printing. For example, the printing application and the printer driver are installed in the client PC 102. The printing application can obtain device information of the associated image output apparatus 100 and print parameters such as a sheet type, a sheet size, and a print quality from the printer driver, and can edit the print settings of any of the obtained parameters. The print job is formed based on the above-mentioned print settings and a layout data image for which rendering processing is completed in the server 104, and is transmitted to the image output apparatus via a spool of the printer driver to execute print processing. In the image output apparatus, the printing is executed based on the print settings in the received print job. Moreover, the image output apparatus holds, as the device information, configuration information relating to the handled inks and sheets and status information such as an idle state and a print error. Furthermore, in the case where the printing cannot be executed normally due to an error in the print settings or a problem in the image output apparatus such as running out of sheets or ink, a warning message is displayed on a main body panel to present the reason to the user.
A layout data editing unit 401 adds and deletes contents such as a text and an image to be put on a poster or a flyer, and adjusts layout of the contents. In the case where processing such as cut-out and filling is performed on the contents, the layout data editing unit 401 requests a data content editing unit 409 of the server 104 to perform the processing. The layout data is saved in a layout data DB 400 of the client PC 102 as a cache, or is saved in a layout data DB 410 of the server 104 for each client PC 102 (or for each account in the case where there is a user account).
A print job transmission unit 405 generates the print job, and transmits the generated print job to the image output apparatus 100. In the case where the print job transmission unit 405 generates the print job, the print job transmission unit 405 requests a preview image generation unit 413 and a print image generation unit 414 of the server 104 to perform processing of generating a preview or a print image of the layout data.
An image completion processing unit 402 requests an image generation unit 411 of the server 104 to perform image generation based on image information set in an image input unit 404. The image generation unit 411 of the server 104 requested to perform the image generation generates a completed image by using a generative model 412, and transmits the generated completed image to the client PC 102. The client PC 102 receiving the completed image provides the completed image to the user from the image completion processing unit 402. Moreover, the image completion processing unit 402 can request the image generation unit 411 of the server 104 to perform the image generation based on prompt information set in a prompt setting unit 403 together with the image information set in the image input unit 404.
The generative model 412 used by the image generation unit 411 for the generation of the completed image is a machine-learned generative model that performs the image generation by using, for example, a GAN (generative adversarial network) or a diffusion model such as the ones used in Stable Diffusion or DALL·E.
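As a purely illustrative sketch, and not the disclosed implementation of the generative model 412, a completion of this kind could be driven by an off-the-shelf diffusion inpainting pipeline. The model identifier, prompt, and file names below are assumptions made only for the example.

```python
# Illustrative sketch only: one way a diffusion model could fill a
# completion region, using the Hugging Face "diffusers" library.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumed model, not model 412
    torch_dtype=torch.float16,
).to("cuda")

# Input padded with the completion region, and a mask marking that region.
image = Image.open("input_with_margin.png").convert("RGB")
mask = Image.open("completion_mask.png").convert("L")  # white = generate here

completed = pipe(
    prompt="a person, natural continuation of the photo",
    image=image,
    mask_image=mask,
).images[0]
completed.save("completed.png")
```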
Next, the functional blocks in the image output apparatus 100 are explained. The ROM 201 includes a device information holding unit 406, a print job reception unit 407, and a print execution unit 408. The device information holding unit 406 holds information such as types and remaining amounts of inks installed in the image output apparatus 100, information such as types and sizes of registered sheets and fed sheets, main body status information of the image output apparatus 100, and status information of the print job. The print job reception unit 407 receives the print job transmitted from the client PC 102. The print execution unit 408 executes the print processing for the print job.
Note that the information held by the device information holding unit 406 may be held on the client PC 102 side or the server 104 side while being associated with the layout data DB 400 or 410. This allows the client PC 102 or the server 104 to generate the layout data suiting the image output apparatus 100 to be used in the case where the image output apparatus 100 to be used is determined in advance.
In the layout editing area 704, the layout data editing unit 401 or the data content editing unit 409 can perform editing such as position adjustment, cut-out, and filling on each displayed content. Addition of a content such as an image or a text to the layout editing area 704 is performed by, for example, pressing an image addition button 702 or a text addition button 703 and specifying a path of a content file as an import source. Note that a button corresponding to each type of content may be added in addition to the image addition button 702 and the text addition button 703. Moreover, a storage of an SNS or another external cloud service may be specifiable as the import source of the content, in addition to a local storage of the client PC 102. Furthermore, addition of a content to the layout editing area 704 through drag and drop may be receivable.
In the case where the layout data editing unit 401 detects the pressing of a printing execution button 705, the layout data editing unit 401 requests the print job transmission unit 405 to generate and transmit the print job to execute printing. The print job transmission unit 405 requested to generate and transmit the print job generates the print job of the layout data displayed in the layout editing area 704, and transmits the generated print job to the image output apparatus A 100 or the image output apparatus B 101.
In the present embodiment, in the case where a missing portion is present in an object in an input image that the user has instructed to be added to the layout data editing screen 700, the completed image in which this missing portion is completed can be generated and displayed on the layout data editing screen. The image completion processing unit 402 sets, in the image input unit 404, the input image added by pressing of the image addition button 702, transmits the set input image to the server 104, and receives the completed image generated by the image generation unit 411 from the server 104. The image completion processing unit 402 provides the completed image to the user by displaying the completed image on the layout data editing screen 700. Moreover, the image completion processing may be executable from a context menu.
In S801, the image completion processing unit 402 sets the ID 600 indicating the content set in the image input unit 404.
In S802, the image input unit 404 determines whether the content type 602 corresponding to the specified ID 600 is “image” or not. In the case where the content type 602 is not “image” (No in S802), the image completion processing unit 402 directly terminates the processing. In the case where the content type 602 is “image” (Yes in S802), the processing proceeds to S803.
In S803, the image completion processing unit 402 determines whether a missing portion is present in the input image or not, and identifies the missing portion. The missing portion being present means that part of an object of a predetermined type such as a person or an animal is unnaturally missing as in the input image illustrated in
In S804, the image completion processing unit 402 sets a completion region in a region that is adjacent to the missing portion of the input image and that is outside the input image. The size of the completion region is, for example, as follows. In the case where the right side of the input image is the missing portion, a width that is ¼ of the width of the input image is set as the width of the completion region, and the same height as the input image is set as the height of the completion region. In the case where the upper side of the input image is the missing portion, the same width as the input image is set as the width of the completion region, and a height that is ¼ of the height of the input image is set as the height of the completion region. Note that the completion region is not limited to a region set as described above, and may be a region with a predetermined fixed size irrespective of the size of the input image. Moreover, the shape of the completion region may be changed depending on the shape of the object determined to have the missing portion. For example, in the case where a head portion of a person is missing as in the example of
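The sizing rule described above can be sketched as follows. This is a minimal illustration, assuming the missing side has already been identified; the function and parameter names are hypothetical.

```python
def completion_region(img_w, img_h, missing_side, ratio=0.25):
    """Return (x, y, w, h) of a completion region adjacent to the missing
    side and outside the input image; a sketch of the rule in S804."""
    if missing_side == "right":
        return (img_w, 0, int(img_w * ratio), img_h)
    if missing_side == "left":
        return (-int(img_w * ratio), 0, int(img_w * ratio), img_h)
    if missing_side == "top":
        return (0, -int(img_h * ratio), img_w, int(img_h * ratio))
    if missing_side == "bottom":
        return (0, img_h, img_w, int(img_h * ratio))
    raise ValueError(f"unknown side: {missing_side}")
```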
In S805, the image completion processing unit 402 requests the image generation unit 411 of the server 104 to generate the completed image based on the information set in the image input unit 404 and completion region information.
In S806, the image generation unit 411 generates the completed image with a generative AI technology using the generative model 412, transmits the generated completed image to the image completion processing unit 402 of the client PC 102, and terminates the processing.
Note that the configuration may be such that a flag indicating whether the image completion processing is to be performed or not is used in addition to the content type for the determination of whether to perform the image completion processing or not in S802. For example, the configuration may be such that the flag is set for each content, and in the case where the flag is “TRUE”, the image missing portion determination and the generation of the completed image are performed. Meanwhile, in the case where the flag is “FALSE”, the processing is directly terminated.
In the present embodiment, as described above, the image completion is performed on the image with the missing portion. Accordingly, it is possible to omit the work of the user preparing another image without the missing portion or preparing ground truth data for specifying how the completion is to be performed. Moreover, since the image completion function is incorporated in the processing of importing the input image, the user does not have to interrupt the layout editing to separately perform the image completion processing, and can seamlessly perform the layout editing.
As described above, the editing of content itself such as position adjustment, cut-out, and filling can be performed on each of the contents displayed in the layout editing area 704. Accordingly, in the present embodiment, the generation of the completed image is performed along with the cut-out processing of each content performed in the content editing.
S1001 to S1002 are the same as S801 to S802.
In S1003, the image completion processing unit 402 performs the cut-out processing on each content, that is, performs object extraction. The object is an object to be a foreground, such as a person or an animal in the image, and the object extraction is processing of cutting out the foreground such as the person or the animal from the input image, that is, processing of removing the background.
In S1004, the image completion processing unit 402 determines whether or not the missing portion is present in the extracted object such as the person or the animal. Examples of a method of determining whether or not the missing portion is present in the extracted object include, as in Embodiment 1, a method of determining whether or not at least part of the outline of the extracted object matches at least one of the upper, lower, left, and right end portions of the input image. In the case where the missing portion is absent in the object (No in S1004), the image completion processing unit 402 directly terminates the processing. In the case where the missing portion is present in the object (Yes in S1004), the processing proceeds to S1005.
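One plausible realization of this outline test is sketched below, assuming a binary foreground mask has already been obtained by some segmentation method; the source of the mask is an assumption of the sketch.

```python
import numpy as np

def missing_sides(mask: np.ndarray, min_pixels: int = 1):
    """Return the image edges that the extracted object touches.

    mask: HxW boolean array, True for foreground pixels.
    An object whose outline coincides with an image edge is taken to be
    cut off (missing) on that side, as in the determination of S1004.
    """
    sides = []
    if mask[0, :].sum() >= min_pixels:
        sides.append("top")
    if mask[-1, :].sum() >= min_pixels:
        sides.append("bottom")
    if mask[:, 0].sum() >= min_pixels:
        sides.append("left")
    if mask[:, -1].sum() >= min_pixels:
        sides.append("right")
    return sides
```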
S1005 to S1007 are the same as S804 to S806.
Note that the image used in the case where the image completion processing unit 402 requests the image generation unit 411 of the server 104 to perform the image generation in S1006 may be either the image after the object extraction, that is, the image after removal of the background from the input image, or the input image before the object extraction. As a matter of course, in the case where the input image before the object extraction is used, the object extraction is performed again before the completed image is provided to the user. Moreover, in the case where the image after the object extraction is used, a white or black image may be used as the background. As a matter of course, the background image is removed again before the completed image is provided to the user.
In the present embodiment, as described above, in the case where the missing portion is present in the object cut out from each content, the image completion is performed. Accordingly, as in Embodiment 1, the work of the user preparing another input image without the missing portion can be omitted. Moreover, since the completion function is provided in the cut-out processing, the user does not have to interrupt the layout editing to separately perform the image completion processing, and can seamlessly perform the layout editing.
In the present embodiment, whether the generation of the completed image is performed or not is determined based on an arrangement position of the content.
In S1204, the image completion processing unit 402 obtains the image arrangement position based on the layout coordinates 603 indicating the position of the content set in the image input unit 404 and the content size 604 indicating the size of the content.
In S1205, the image completion processing unit 402 determines whether or not the arrangement position of the missing portion of the input image is within the layout region. Specifically, in the case where the content 1101 whose right side is missing is arranged at the center of the layout region 1100 as in
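Reading this test as “completion is worthwhile only in the case where the missing edge of the content lies inside the layout region, where the gap would be visible,” a minimal sketch is as follows; the rectangle representation and all names are assumptions of the illustration.

```python
def needs_completion(content_rect, missing_side, layout_rect):
    """Sketch of the S1205 test: True if the missing edge of the content
    lies strictly inside the layout region, so the gap would be visible.
    Rectangles are (x, y, w, h) in layout coordinates (an assumption)."""
    cx, cy, cw, ch = content_rect
    lx, ly, lw, lh = layout_rect
    edge_inside = {
        "left": cx > lx,
        "right": cx + cw < lx + lw,
        "top": cy > ly,
        "bottom": cy + ch < ly + lh,
    }
    return edge_inside[missing_side]
```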
In S1206, the image completion processing unit 402 sets the completion region as in S804. Note that the determination of the arrangement positions of the input image and of the missing portion of the input image may be performed based on the layout region 1100. Moreover, the configuration may be such that the information in the device information holding unit 406 is obtained from the image output apparatus 100 in addition to the layout region 1100, and the determination is performed based on a region in which an image can be drawn.
S1207 to S1208 are the same as S805 to S806.
In the present embodiment, as described above, processing with high calculation cost can be reduced by not performing unnecessary image generation processing even in the case where the missing portion is present in the input image. Accordingly, it is possible to reduce processing load and reduce work time of the user.
The completed image generated by the generative model is not always an image expected by the user. Meanwhile, as explained by using
S1301 to S1306 are the same as S801 to S806.
In S1307, the image generation unit 411 changes the seed value of the random number generator that generates the initial value used by the generative model in the generation processing of the completed image.
After the execution of S1307, the processing returns to S1305, and S1305 to S1307 are repeated multiple times. The number of times of repeating may be a fixed predetermined number, or may be a number set by the user every time the processing is performed.
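A minimal sketch of this seed-varying loop, reusing the illustrative diffusion pipeline shown earlier, is as follows; the fixed seed list is an assumption.

```python
# Sketch of S1305-S1307: produce several candidate completions by
# re-running the same generation with different random seeds.
# "pipe", "image", and "mask" are as in the earlier illustrative sketch.
import torch

candidates = []
for seed in [0, 1, 2, 3]:  # fixed count here; could instead be user-set
    generator = torch.Generator("cuda").manual_seed(seed)
    out = pipe(
        prompt="natural continuation of the photo",
        image=image,
        mask_image=mask,
        generator=generator,
    ).images[0]
    candidates.append(out)
# The candidates are then shown to the user for selection (S1308-S1309).
```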
In S1308, the image completion processing unit 402 displays the generated multiple completed images on the display 309.
In S1309, the image completion processing unit 402 receives input of selection of one of the displayed multiple completed images by the user, finalizes the completed image to be used, and terminates the processing.
As explained above, in the present embodiment, an image closer to the intention of the user can be provided by providing multiple completed images for the input image with the missing portion.
As illustrated in
S1401 to S1404 are the same as S801 to S804.
In S1405, the image completion processing unit 402 sets the prompt, which is supplementary information of the image to be generated, in the prompt setting unit 403. For example, in the case where a head portion of a person is the completion region as illustrated in
In S1406, the image completion processing unit 402 requests the image generation unit 411 of the server 104 to generate the completed image based on the information set in the image input unit 404, the completion region information, and the information set in the prompt setting unit 403.
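The wire format of this request is not specified by the disclosure; purely as a hypothetical illustration, the request could carry the image, the completion region, and the prompt as follows. The endpoint URL and all field names are invented for the sketch.

```python
import base64
import requests  # hypothetical HTTP transport; the disclosure fixes none

with open("input.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "image": image_b64,
    "completion_region": {"x": 512, "y": 0, "w": 128, "h": 512},
    "prompt": "head of a person, photorealistic",  # from S1405
}
# "http://server104/generate" is an invented placeholder endpoint.
resp = requests.post("http://server104/generate", json=payload, timeout=300)
resp.raise_for_status()
completed_png = base64.b64decode(resp.json()["completed_image"])
```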
S1407 is the same as S806.
As explained above, in the present embodiment, generation of the completed image using the prompt is provided for the image with the missing portion, and an image closer to the intention of the user can be thereby provided.
Although the completed image is generated in the image generation unit 411 such that the missing portion of the input image is completed, there is a case where a missing portion is present also in the generated completed image. Accordingly, it is desirable to determine whether the missing portion is present in the generated completed image and, in the case where the missing portion is present, to generate a re-completed image in which the missing portion of the completed image is completed.
S1501 to S1506 are the same as S801 to S806.
In S1507, the image completion processing unit 402 determines whether or not the missing portion is present in the generated completed image. In the case where the missing portion is absent (No in S1507), the image completion processing unit 402 directly terminates the processing. In the case where the missing portion is present (Yes in S1507), the processing returns to S1504.
In S1504, the image completion processing unit 402 sets the completion region to a region adjacent to the missing portion of the completed image that is set in the image input unit 404 and that was generated in S1506. Then, S1505 to S1507 are executed again to generate the re-completed image in which the missing portion of the completed image is completed.
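This loop can be sketched compactly as follows; the iteration cap is an added assumption to guarantee termination, and the callable names are hypothetical.

```python
def complete_until_whole(image, detect_missing, set_region, generate,
                         max_rounds: int = 3):
    """Repeat completion while a missing portion remains (S1504-S1507).

    detect_missing(image) -> missing side or None
    set_region(image, side) -> completion region adjacent to that side
    generate(image, region) -> completed image
    The max_rounds cap is an added assumption, not stated in the text.
    """
    for _ in range(max_rounds):
        side = detect_missing(image)
        if side is None:
            return image                 # no missing portion: done
        region = set_region(image, side)
        image = generate(image, region)  # re-completed image
    return image
```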
As explained above, in the present embodiment, the completed image without the missing portion can be provided to the user by determining whether or not the missing portion is present in the generated completed image.
Whether to perform the generation of the completed image or not may be received from the user.
S1601 to S1603 are the same as S801 to S803.
In S1604, the image completion processing unit 402 displays a dialog in which the user selects whether to generate the completed image or not, and receives a user input indicating execution or non-execution of the completion processing from the user.
In S1605, in the case where the user selects non-execution of the completion processing (No in S1605), the image completion processing unit 402 directly terminates the processing. In the case where the user selects execution of the completion processing (Yes in S1605), the processing proceeds to S1606.
In S1606, the image completion processing unit 402 sets the completion region as in S804.
S1607 to S1608 are the same as S805 to S806.
As explained above, in the present embodiment, the user himself/herself can select whether to generate the completed image or not.
In the case where cut-out of each content is performed, the extracted object is sometimes fragmented depending on the content because part of the extracted object is located behind another object.
S1801 to S1804 are the same as S901 to S904.
In S1805, the image completion processing unit 402 determines whether there are multiple extracted objects. In the case where multiple objects are not extracted (No in S1805), the processing proceeds to S1808. In the case where there are multiple extracted objects (Yes in S1805), the processing proceeds to S1806.
In S1806, the image completion processing unit 402 determines whether the extracted multiple objects are the same object or not. An example of a method of determining whether the multiple objects are the same object is a method of determining the extracted multiple objects as the same object in the case where the outlines of the respective extracted objects each partially have a shape close to a straight line and these nearly-straight portions face each other while being spaced away from each other. Determination criteria of whether an outline has a shape close to a straight line may be, for example, as follows. A Hough transform for detecting a straight line is performed on the pixels forming the outline, and whether or not a parameter of the straight line has a peak within a predetermined width is used as the determination criterion. Alternatively, a regression line is obtained, and whether or not the coefficient of determination of this regression line is a predetermined value or more is used as the determination criterion, as sketched below. Furthermore, the configuration may be such that setting of objects to be extracted is received from the user in the object extraction of S1803, and the set extracted objects are determined as the same object. In the case where the multiple objects are not the same object (No in S1806), the processing proceeds to S1808. In the case where the multiple objects are the same object (Yes in S1806), the processing proceeds to S1807.
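The regression-line criterion can be sketched as follows; the threshold value is an illustrative assumption.

```python
import numpy as np

def is_nearly_straight(outline_pts: np.ndarray, r2_threshold: float = 0.95):
    """True if an outline segment is close to a straight line (S1806).

    outline_pts: Nx2 array of (x, y) pixels on one outline segment.
    Fits y = a*x + b and tests the coefficient of determination R^2;
    the 0.95 threshold is an illustrative assumption.  A Hough-transform
    peak test, as also mentioned above, would be an alternative.
    """
    x = outline_pts[:, 0].astype(float)
    y = outline_pts[:, 1].astype(float)
    if np.ptp(x) < np.ptp(y):            # near-vertical: swap axes first
        x, y = y, x
    a, b = np.polyfit(x, y, 1)
    residual = y - (a * x + b)
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return r2 >= r2_threshold
```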
In S1807, the image completion processing unit 402 sets the completion region in a region between the extracted objects forming the same object.
S1808 to S1810 are the same as S1005 to S1007.
As described above, in the present embodiment, the image is completed also in the case where, in execution of cut-out editing of each content, multiple cut-out objects are present and form the same object. Accordingly, the work of the user preparing another image again is eliminated as in Embodiment 1. Moreover, since the completion function is provided in the cut-out function, the user does not have to separately perform the completion processing and does not have to stop the layout editing work.
The completion region where the completed image is to be generated may be received from the user.
S1901 to S1903 are the same as S801 to S803.
In S1904, the image completion processing unit 402 displays a dialog in which the user sets the completion region where the completed image is to be generated, and receives a user input specifying the completion region.
Moreover, in the activation of the completion region setting dialog 2000 in
S1905 to S1906 are the same as S805 to S806.
As explained above, in the present embodiment, the user himself/herself can set the completion region in which the completed image is to be generated. The user setting the completion region enables the size and location of the completion region to be set as intended by the user.
Whether the completed image is to be generated or not may be determined based on an overlapping state with another content.
S2201 to S2204 are the same as S1201 to S1204.
In S2205, the image completion processing unit 402 determines whether another content is arranged in front of the input image with the missing portion to overlap the input image. In the case where another content is arranged in front of the input image to overlap the input image (Yes in S2205), the processing proceeds to S2206. In the case where another content does not overlap the input image (No in S2205), the processing proceeds to S2207.
In S2206, the image completion processing unit 402 determines whether the missing portion of the input image matches the overlapping position with the other content. In the case where the missing portion of the input image matches the position of the other content (Yes in S2206), the image completion processing unit 402 directly terminates the processing. In the case where the missing portion of the input image does not match the position of the other content (No in S2206), the processing proceeds to S2208.
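The overlap and coverage tests of S2205 and S2206 can be sketched with axis-aligned rectangles as follows; representing the missing portion as a rectangle is an assumption of the illustration.

```python
def rects_overlap(a, b):
    """S2205 sketch: True if rectangles a and b, given as (x, y, w, h),
    intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def missing_portion_hidden(missing_rect, front_content_rect):
    """S2206 sketch: True if the other content placed in front fully
    covers the missing portion, so no completion is needed."""
    mx, my, mw, mh = missing_rect
    fx, fy, fw, fh = front_content_rect
    return (fx <= mx and fy <= my and
            mx + mw <= fx + fw and my + mh <= fy + fh)
```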
In S2207, the image completion processing unit 402 performs determination of the arrangement position as in S1205.
In S2208, the image completion processing unit 402 sets the completion region as in S1206.
S2209 to S2210 are the same as S1207 to S1208.
As explained above, in the present embodiment, since whether the completion is to be performed or not is determined depending on the overlapping state of the input image and another content even in the case where the missing portion is present in the input image, unnecessary image generation processing can be eliminated. Moreover, the image generation processing has a very high calculation cost and naturally requires image generation processing time. Eliminating the unnecessary image generation processing leads to reduction of the calculation cost and also to reduction of the work time of the user.
It is possible not only to generate multiple completed images as explained in Embodiment 4 but also to specify multiple completion regions and generate multiple completed images.
S2301 to S2303 are the same as S1301 to S1303.
In S2304, the image completion processing unit 402 sets the completion region as in S1304, and generates the completed image (S2305 to S2307). In this case, the generation processing of the completed image is repeated multiple times. The number of times of repeating may be set to a fixed value, be changeable by the user, or be set by the user every time the completed image generation is performed. Moreover, in the case where the completion region is repeatedly set, the setting may be performed with the size, shape, and position of the completion region changed, as sketched below. As a matter of course, the completion region setting may be fixed or changeable by the user.
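A minimal sketch of varying the completion region across repetitions, reusing the hypothetical helpers from the earlier sketches, is as follows; the ratio list is an illustrative assumption.

```python
# Sketch of S2304-S2307: repeat generation while varying the size of the
# completion region.  completion_region() and generate() are the
# hypothetical helpers sketched earlier; the width ratios are assumptions.
candidates = []
for ratio in (0.25, 0.5, 1.0):                 # could be user-adjustable
    region = completion_region(img_w=512, img_h=512,
                               missing_side="right", ratio=ratio)
    candidates.append(generate(image, region))
# The candidates are then displayed for user selection (S2308-S2309).
```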
S2308 to S2309 are the same as S1308 to S1309.
As explained above, in the present embodiment, completed images are generated for multiple patterns of completion regions for the image with the missing portion, and the resulting multiple completed images are provided. An image closer to the intention of the user can thereby be provided.
Embodiment(s) of the present disclosure can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the disclosure is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
According to the present disclosure, it is possible to automatically complete the missing portion of the image without ground truth data.
This application claims the benefit of Japanese Patent Application No. 2023-214528 filed Dec. 20, 2023, which is hereby incorporated by reference herein in its entirety.