The present invention relates to an image processing apparatus, an image processing method, and a storage medium.
It is known that shooting action includes the aspect of “recording” a shooting target as image content, and the aspect of “expressing” what a photographer wishes to convey through image content. In a case where shooting action emphasizes “expression” through image content, it is particularly important that the intent of the photographer (referred to as the “content acquisition intent” hereinafter) is reflected in the content. Meanwhile, in an actual shooting scene, the facial expressions and motion of subjects, the positional relationships between subjects, and the like often do not match the intent of the photographer, and thus the photographer needs to wait until the subjects' conditions match the content acquisition intent, concentrating at all times so as not to miss the shot.
On the other hand, in a case where emphasis is placed on “expression” through image content, the necessity for the obtained image content to be an “image content obtained by a photographer through shooting action” is diminished. PTL 1 proposes a technique for generating aggregated content for providing a rich retrospective experience including atmosphere using shot images or video content. Also, a technique using a deep neural network model using a generative adversarial network (GAN) has been proposed as a technique for generating non-existent image content. PTL 2 proposes a technique for generating an image in which the direction of the line of sight or the orientation of a face is changed using a trained GAN model.
In the technique proposed in PTL 1, in a case where the content acquisition intent is not reflected in the original image or video content, the content acquisition intent cannot be reflected in aggregated content generated using the image or the like. Also, the technique proposed in PTL 2 is a technique for generating an image in which the direction of the line of sight or the orientation of a face is changed, and generation of content that reflects the content acquisition intent of the image content is not considered.
The present invention has been made in view of the above-mentioned issues, and aims to realize a technique by which it is possible to obtain an image content that more appropriately reflects a content acquisition intent.
In order to resolve these issues, for example, an image processing apparatus according to the present invention includes a configuration below. Specifically, the image processing apparatus is characterized by including: content acquisition means for acquiring first image content; degree acquisition means for acquiring a degree of fluctuation of a fluctuation element of the first image content, the fluctuation element being an element having fluctuation as a state variation, out of elements constituting an image; intent acquisition means for acquiring information indicating a shooting intent of a user; and generation means for generating second image content having a different degree of fluctuation of a fluctuation element of image content from the first image content, using a trained learning model, in which the learning model generates the second image content in which the degree of fluctuation acquired from the first image content corresponds to the information indicating the shooting intent.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain principles of the invention.
Hereinafter, embodiments will be described in detail with reference to the attached drawings. Note, the following embodiments are not intended to limit the scope of the claimed invention. Multiple features are described in the embodiments, but limitation is not made to an invention that requires all such features, and multiple such features may be combined as appropriate. Furthermore, in the attached drawings, the same reference numerals are given to the same or similar configurations, and redundant description thereof is omitted.
An example in which a digital camera capable of generating image content is used as an example of an image processing apparatus will be described below. However, this embodiment is applicable not only to digital cameras but also to other devices capable of generating image content. Examples of these devices include mobile phones including smartphones, game consoles, personal computers, tablet terminals, wearable information terminals, and server devices.
The digital camera 100 includes, for example, an image content acquisition unit 101, a fluctuation element extraction unit 102, a fluctuation model generator 103, a fluctuation model database 104, and a content intent acquisition unit 105. Also, the digital camera 100 further includes a fluctuation rule determination unit 106, an image content reconstruction unit 107, a display unit 108, and a user instruction acquisition unit 109.
First, the image content acquisition unit 101 performs image content acquisition processing. In this embodiment, the image content acquisition unit 101 may not only acquire image content but also acquire meta information regarding the image content. Meta information regarding the image content includes, for example, information regarding the date and time when the image content was acquired, and acquisition position information.
The image content acquisition unit 101 controls acquisition of image content by an image capture device 129, which will be described later, and outputs the acquired image content to the fluctuation element extraction unit 102 and the image content reconstruction unit 107, which will be described later. The image content acquisition unit 101 may perform normalization by performing image processing such as trimming and resizing on the image content in accordance with the output destination, and then output the resulting image content.
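As an illustrative, non-limiting sketch of this normalization, the following Python code center-crops image content to a square and resizes it to a fixed input size; the use of Pillow and the 256-pixel size are assumptions for illustration only.

```python
# Hypothetical sketch of normalization in the image content acquisition
# unit: center-crop to a square, then resize to the input resolution
# assumed by the output destination (the sizes here are assumptions).
from PIL import Image

def normalize_content(path: str, size: int = 256) -> Image.Image:
    img = Image.open(path).convert("RGB")
    # Center-crop to a square so that resizing does not distort the subject.
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    # Resize to the resolution expected by downstream processing.
    return img.resize((size, size), Image.BILINEAR)
```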
Here, “fluctuation” and “fluctuation elements” according to this embodiment will be described with reference to the drawings.
In the example shown in the drawings, the “degree of the smile” of a person, the “composition position”, and the “amount of clouds” are fluctuation elements, and the degree of each of these elements fluctuates over time.
The timings at which the fluctuation of each fluctuation element is highest are indicated by reference numeral 204 for the “degree of the smile”, reference numeral 205 for the “composition position”, and reference numeral 206 for the “amount of clouds”. The image contents acquired at the timings 204, 205, and 206 are the contents 207, 208, and 209, respectively.
The fluctuation element extraction unit 102 extracts fluctuation elements included in the image content. For example, in a case where the facial expression of a person is used as a fluctuation element, the fluctuation element extraction unit 102 extracts the fluctuation element by detecting the face of a person in the image content. In a case where the face of a person is detected, the fluctuation element extraction unit 102 further performs fluctuation degree acquisition processing on the facial expression of the person. For example, through this degree acquisition, the fluctuation element extraction unit 102 quantifies the degree of a smile, the degree of joy, anger, grief, or happiness, the degree of eye opening, the degree of mouth opening, and the like. Note that, when acquiring the degree of fluctuation, the degree of fluctuation may be calculated from the image content, or the degree of fluctuation corresponding to the image content may be acquired via a network.
Note that other fluctuation elements may include, for example, the posture of a person in image content, the composition of the image content, the lighting in the image content, the weather in the image content, the clothing of the subject in the image content, and the like. As for the posture of a person, the degree of fluctuation may be determined, for example, from at least any of the orientation of the face, the orientation of the body, the motion blur amount of the person, and the like. Also, as for the composition of the image content, the degree of fluctuation may be determined, for example, from at least any of the positional relationship between subjects, the distance between subjects, and the like. As for the lighting, the degree of fluctuation may be determined, for example, from the position of a light source or the like. As for the weather, the degree of fluctuation may be determined, for example, from at least any of the weather conditions, the amount of clouds, and the like. As for the clothing, the degree of fluctuation may be determined, for example, from at least any of the type, the color, and the like of the clothing. The fluctuation element extraction unit 102 outputs, to the fluctuation rule determination unit 106, the calculated degree of each fluctuation element together with the image content. Also, the fluctuation element extraction unit 102 outputs, to the fluctuation model generator 103, the image content and the degrees of fluctuation of the fluctuation elements as training data for a fluctuation model, which will be described later.
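The degree acquisition described above can be sketched, in a non-limiting way, as follows; the detector components and score attribute names are hypothetical stand-ins for any module that returns comparable normalized scores.

```python
from dataclasses import dataclass

@dataclass
class FluctuationDegree:
    element: str   # e.g. "smile", "composition", "clouds"
    degree: float  # degree of fluctuation normalized to [0, 1]

def extract_fluctuation_degrees(image, face_detector, cloud_estimator):
    # `face_detector` and `cloud_estimator` are hypothetical components;
    # any detector returning normalized scores could be substituted.
    degrees = []
    for face in face_detector.detect(image):
        degrees.append(FluctuationDegree("smile", face.smile_score))
        degrees.append(FluctuationDegree("eyes_open", face.eye_open_score))
    # Weather-related element: fraction of the sky covered by clouds.
    degrees.append(
        FluctuationDegree("clouds", cloud_estimator.cloud_fraction(image)))
    return degrees
```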
The fluctuation model generator 103 performs processing for training a learning model for each fluctuation element (referred to as a “fluctuation model” hereinafter) using the image content and the extracted degrees of fluctuation of the fluctuation elements that are obtained from the fluctuation element extraction unit 102. The fluctuation model is generated for each fluctuation element, and is trained to generate image content corresponding to a designated degree of fluctuation. For example, the fluctuation model in which the facial expression of a person is a fluctuation element is trained to generate image content having a designated facial expression. Note that, even for the same fluctuation element, a plurality of fluctuation models may be generated for each period such as every one month, for each region where a user has stayed, or in response to an instruction from the user.
The fluctuation models may be constructed using a known machine learning algorithm capable of generating an image, such as a GAN (Generative Adversarial Network). The GAN is constituted by two neural networks: a generator that generates image content, and a discriminator that discriminates whether or not the image content generated by the generator is a real image.
In the training stage processing of a fluctuation model, the above-described generator and discriminator share a loss function with each other, and repeatedly update their respective neural networks such that the generator minimizes the loss function and the discriminator maximizes it. As a result, the image content generated by the generator becomes a natural image. Note that well-known techniques are applied to the configurations of the neural networks and the learning algorithm of the GAN, and thus they will not be described in this embodiment. The data used in training is stored in the fluctuation model database 104 in association with the trained fluctuation model. In other words, the image content included in the training data and the degree of the fluctuation element of that image content are stored in the fluctuation model database 104 in association with information indicating the fluctuation element (corresponding to the model).
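As a rough, non-limiting illustration of this training stage, the following PyTorch sketch shows one step of a conditional GAN in which both the generator and the discriminator are conditioned on the degree of fluctuation; the network definitions are omitted, and the interfaces, shapes, and names are assumptions rather than a prescribed implementation.

```python
import torch
import torch.nn as nn

# Assumes gen(z, degrees) -> images and disc(images, degrees) -> logits,
# so that generated images can be steered by a designated degree.
bce = nn.BCEWithLogitsLoss()

def train_step(gen, disc, opt_g, opt_d, real_imgs, degrees, z_dim=100):
    b = real_imgs.size(0)
    z = torch.randn(b, z_dim)

    # Discriminator step: real images labeled 1, generated images labeled 0,
    # i.e. the discriminator pushes the shared loss in one direction.
    opt_d.zero_grad()
    fake_imgs = gen(z, degrees).detach()
    loss_d = bce(disc(real_imgs, degrees), torch.ones(b, 1)) \
           + bce(disc(fake_imgs, degrees), torch.zeros(b, 1))
    loss_d.backward()
    opt_d.step()

    # Generator step: update so that generated images are judged real,
    # pushing the shared loss in the opposite direction.
    opt_g.zero_grad()
    loss_g = bce(disc(gen(z, degrees), degrees), torch.ones(b, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```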
The fluctuation model database 104 is stored in a later-described HDD 125, and stores a fluctuation model for each fluctuation element generated by the fluctuation model generator 103, and data used in training.
Note that, in this embodiment, a case where the fluctuation model generator 103 and the fluctuation model database 104 are included in the digital camera 100 will be described as an example. However, a configuration may be adopted in which a communication unit is provided in the digital camera 100, and the fluctuation model generator 103 and the fluctuation model database 104 are stored on an external server or cloud. Alternatively, the fluctuation model generator 103 and the fluctuation model database 104 may be stored in both the digital camera 100 and an external server, and may be used depending on the use or purpose.
For example, a database and a generator that generates fluctuation models associated with fluctuation elements expected to be used frequently, such as the facial expression of the main subject, are provided on the digital camera 100 side. On the other hand, a generator for fluctuation models that are used infrequently or are still in the process of training, together with their training data, may be stored on the external server side. Also, the update history of the fluctuation models may be managed on the external server or the cloud service side.
The content intent acquisition unit 105 acquires a content acquisition intent that the photographer wishes to express in input image content, and outputs, to the fluctuation rule determination unit 106, the identifier of the content acquisition intent indicating the content acquisition intent.
In this embodiment, for example, a relationship between the fluctuation elements included in image content and content acquisition intent identifiers is determined in advance, and a fluctuation element included in the acquired image content is converted into the identifier of the content acquisition intent. That is, the content intent acquisition unit 105 can acquire the identifier of the content acquisition intent based on image information of the image content. The identifier of the content acquisition intent includes, for example, keywords used for tagging ordinary image content, such as “fun” and “commemorative picture”. Furthermore, the content intent acquisition unit 105 may receive an instruction or selection regarding the content acquisition intent identifier from the user. Also, the content intent acquisition unit 105 may estimate the content acquisition intent identifier from the user's action history, such as the operation history and the number of shooting attempts performed to acquire image content.
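A minimal sketch of such a predetermined relationship between fluctuation elements and intent identifiers might look as follows; the table contents and element names are illustrative assumptions, not values prescribed by the embodiment.

```python
# Hypothetical conversion from detected fluctuation elements to content
# acquisition intent identifiers; the table contents are illustrative.
ELEMENT_TO_INTENT = {
    "smile": "fun",
    "group_composition": "commemorative picture",
}

def intent_identifiers(extracted_elements):
    # Elements without a predetermined relationship are simply skipped.
    return {ELEMENT_TO_INTENT[e] for e in extracted_elements
            if e in ELEMENT_TO_INTENT}
```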
The content intent acquisition unit 105 may further output the content acquisition intent identifier using sound information. For example, by using sound information regarding a surrounding region at the time of content acquisition, the content intent acquisition unit 105 can also convert sound information regarding the shooting space including the voice of the photographer into a content acquisition intent identifier.
The fluctuation rule determination unit 106 uses the above-described content acquisition intent identifier to calculate, for each fluctuation element of the image content to be reconstructed, a change amount of the degree of fluctuation relative to the current degree of that fluctuation element (the result is referred to as a “fluctuation rule” hereinafter). Also, the fluctuation rule determination unit 106 designates a fluctuation model to be used by the image content reconstruction unit 107, which will be described later. Details of the processing performed by the fluctuation rule determination unit 106 will be described later.
The image content reconstruction unit 107 reads the fluctuation model from the fluctuation model database 104 in accordance with the rule (the fluctuation degree change amount of the fluctuation element) determined by the fluctuation rule determination unit 106. Also, the image content reconstruction unit 107 reconstructs image content by inputting image content to be reconstructed and a parameter for reconstruction into the fluctuation model. Details of image content reconstruction will be described later. The image content reconstruction unit 107 outputs the reconstructed image content to the display unit 108.
The display unit 108 causes a display device 128 to display various image content. In this embodiment, the display unit 108 causes the display device 128 to display at least the image content acquired by the image content acquisition unit 101 or the image content reconstructed by the image content reconstruction unit 107.
The user instruction acquisition unit 109 receives various instructions regarding reconstruction of an image content from the user via an input device 127, and prompts each processing unit of the digital camera 100 to perform predetermined processing. For example, the user instruction acquisition unit 109 receives an image content acquisition instruction and a reconstruction instruction from the user. In addition, designation of parameters required for image content reconstruction, such as an identifier of a content acquisition intent and a fluctuation model, may also be received.
Next, an example of the hardware configuration of the digital camera 100 will be described with reference to the drawings.
The CPU 122 is an arithmetic circuit such as a CPU (central processing unit), and realizes each function of the digital camera 100 by loading a computer program stored in the ROM 123 or the HDD 125 onto the RAM 124 and executing the computer program. The ROM 123 includes, for example, a nonvolatile storage medium such as a semiconductor memory, and stores, for example, programs executed by the CPU 122 and necessary data. The RAM 124 includes, for example, a volatile storage medium such as a semiconductor memory, and temporarily stores, for example, the calculation results of the CPU 122. The HDD 125 includes a hard disk drive, and stores, for example, computer programs executed by the CPU 122, the results of processing performed by the CPU 122, and the like. Although a case where the digital camera 100 has a hard disk is described as an example, the digital camera 100 may have a storage medium such as an SSD instead of the hard disk. The GPU (graphics processing unit) 126 includes an arithmetic circuit, and can execute, for example, part or the entirety of the training stage processing and the inference stage processing for the learning model. The GPU can process more data in parallel than the CPU, and is thus effective for deep learning processing in which calculation using the above-described neural networks is performed repeatedly.
The input device 127 includes an operation member, such as a button or a touch panel, that receives operation input made on the digital camera 100. The display device 128 includes, for example, a display panel such as an OLED. The image capture device 129 includes, for example, an optical system unit including a lens, an aperture, and a shutter, and an image sensor such as a CMOS sensor. The optical system unit may be configured to include a compound eye lens or a multi-eye lens. Also, optical properties of the optical system unit, such as the zoom and the aperture, may be changeable depending on the image content to be acquired.
Fluctuation model training processing performed by the fluctuation model generator 103 or the like will be described with reference to the drawings.
In step S301, the image content acquisition unit 101 acquires image content for training via the image capture device 129. For example, the acquired image content for training is still image data. Also, the image content acquisition unit 101 may cut out still image data from moving image content. The image content acquisition unit 101 outputs the acquired still image data to the fluctuation element extraction unit 102. Note that the image content to be acquired is not limited to image content output from the image capture device 129, and image content that has been acquired and stored in the HDD 125 in advance may be used. The image content for training may be limited to image content acquired in a specific period or at a specific position. For example, the image content for training may be image content acquired between a start instruction and an end instruction given by the user to define a shooting period or a training data collection period. Alternatively, the image content for training may be acquired in accordance with the image content to be reconstructed. The image content for training may be image content acquired in a predetermined period before and after the date and time when the image content to be processed for reconstruction was acquired. Alternatively, the image content for training may be image content acquired in a predetermined range around the position at which the image content to be processed for reconstruction was acquired.
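The selection of training content by period and position described above can be sketched as follows; the one-week window, the planar distance approximation, and the attribute names of the candidates are assumptions for illustration.

```python
from datetime import timedelta
from math import hypot

# Sketch of limiting training content to a period and an area around the
# content to be reconstructed; the thresholds are illustrative assumptions.
def select_training_content(candidates, target_time, target_pos,
                            window=timedelta(days=7), radius=0.05):
    selected = []
    for c in candidates:  # each candidate carries meta information
        # Planar approximation over latitude/longitude; adequate for a
        # rough "nearby" test in this sketch.
        near = hypot(c.lat - target_pos[0], c.lon - target_pos[1]) <= radius
        recent = abs(c.timestamp - target_time) <= window
        if near and recent:
            selected.append(c)
    return selected
```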
In step S302, the fluctuation element extraction unit 102 extracts a predetermined fluctuation element from the input still image data, and calculates (acquires) the degree of fluctuation (score) for the extracted fluctuation element. The image content acquisition unit 101 normalizes the still image data in a region including the extracted fluctuation element, and outputs the normalized data, together with information regarding the degree of fluctuation, to the fluctuation model generator 103 as training data for a fluctuation model.
Note that it is presumed that this processing is executed for each fluctuation element for one piece of still image data in this description. However, the frequency of extraction of fluctuation elements may be determined for each fluctuation element. For example, the extraction frequency may be set high for elements having a large fluctuation change, and the extraction frequency may be set low for elements having a small fluctuation change.
In step S303, the fluctuation model generator 103 reads information regarding a fluctuation model to be trained, from the fluctuation model database 104, and performs machine learning processing on the fluctuation model using the input training data. The machine learning processing for the fluctuation model is, for example, processing at the training stage of the GAN described above. Then, the fluctuation model generator 103 updates fluctuation model information in the fluctuation model database 104 together with data used in training. Note that, in a case where a fluctuation model to be trained is not present in the fluctuation model database 104, a new fluctuation model is added.
Through the above processing, the fluctuation of each fluctuation element in image content acquired by the user, that is, fluctuation obtained through the user's experience, is used as training data for the model of that fluctuation element. As a result, it is possible to construct a GAN generator neural network in which the fluctuation of the fluctuation element can be tuned (that is, an image corresponding to a designated degree of fluctuation can be generated).
Next, image content reconstruction processing using the fluctuation element model will be described with reference to the drawings.
In step S401, the image content acquisition unit 101 acquires image content to be reconstructed. Here, for example, a case where the image content 208 is image content to be reconstructed will be described as an example.
In step S402, the fluctuation element extraction unit 102 receives, from the image content acquisition unit 101, the image content to be reconstructed, extracts a fluctuation element included in the image content, and calculates (acquires) the degree of the fluctuation element. The operation of the fluctuation element extraction unit 102 is similar to that in the learning processing.
In step S403, the content intent acquisition unit 105 acquires the identifier of the content acquisition intent from the information accompanying the image content. For example, an identifier of the content acquisition intent such as “travel”, “commemorative picture”, or “fun” is acquired from a person present in the image content 208, the facial expression of the person, and an object in the background, and is associated with the image content.
Note that the content intent acquisition unit 105 may acquire the identifier of the content acquisition intent based on information other than the image content. For example, in a case where the digital camera 100 is provided with voice recognition technology, the content intent acquisition unit 105 uses the results of voice recognition to acquire the content acquisition intent identifier. For example, the content intent acquisition unit 105 may acquire the identifier of the content acquisition intent based on user utterance information recorded in a predetermined period before and after the image content is shot, or user utterance information input in a predetermined period after the image content is reproduced. Specifically, in a case where a user's voice such as “it is cloudy”, “I cannot see because of the clouds”, or “I wish it was sunny” is recognized when the image content 208 is acquired or a reconstruction instruction is given, the keyword may be “weather”, or “sunny”, which is considered to be the ideal condition. In this case, this keyword is associated with the image content as a content acquisition intent identifier.
In addition to the above-described example, the identifier of the content acquisition intent may be predicted and calculated from user operation history information and action history information before and after the image content 208 selected in step S401 is shot, text information input by the user, and the like.
Then, the content intent acquisition unit 105 associates the content acquisition intent identifier with the image content 208 and outputs the content acquisition intent identifier to the fluctuation rule determination unit 106.
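A simple keyword-spotting sketch over recognized utterances, mirroring the “I wish it was sunny” example above, is shown below; the phrase table and the returned identifier values are assumptions for illustration.

```python
# Illustrative mapping from recognized utterance phrases to a fluctuation
# element and an ideal-condition keyword; the entries are assumptions.
UTTERANCE_HINTS = {
    "it is cloudy": ("weather", "sunny"),
    "cannot see because of the clouds": ("weather", "sunny"),
    "wish it was sunny": ("weather", "sunny"),
}

def intent_from_utterances(utterances):
    for text in utterances:
        for phrase, (element, ideal) in UTTERANCE_HINTS.items():
            if phrase in text.lower():
                # e.g. "I wish it was sunny" -> keyword "sunny" for the
                # "weather" fluctuation element.
                return element, ideal
    return None  # no content acquisition intent recognized from speech
```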
In step S404, the fluctuation rule determination unit 106 determines a fluctuation rule that serves as control information for the image content reconstruction unit 107, using the image content to be reconstructed, the fluctuation element information associated with the image content, and the content acquisition intent identifier.
A method for creating a fluctuation rule according to this embodiment will be described with reference to the drawings.
The fluctuation rule determination unit 106 selects and reads, from the fluctuation model database 104, fluctuation model information associated with the fluctuation element of the image content 208 to be reconstructed. Note that fluctuation model information to be read is information regarding a fluctuation model trained using training data, and the training data includes at least the image content including the fluctuation element to be reconstructed.
The fluctuation rule determination unit 106 calculates information regarding the fluctuation range in which reconstruction is possible with the fluctuation model, using the read fluctuation model information and the associated training data group. For example, in a case where the training data covers degrees of fluctuation of 1 to 6, the fluctuation range in which reconstruction is possible is the range of degrees of 1 to 6.
Next, the fluctuation rule determination unit 106 calculates a recommended value of the degree of fluctuation of the fluctuation element after reconstruction, from the content acquisition intent identifier. In this embodiment, for example, the digital camera 100 stores, in advance, information in which the above-described content acquisition intent identifier is associated with the ideal degree of fluctuation of the fluctuation element, as conversion table information regarding the intent and the ideal degree of fluctuation. The fluctuation rule determination unit 106 calculates the degree of fluctuation of the fluctuation element after reconstruction with reference to the conversion table information.
An example of this conversion table is shown in the drawings.
The fluctuation rule determination unit 106 determines a fluctuation model to be used, and calculates a parameter to be set in the determined fluctuation model. The parameter is calculated so as to be within the above-described fluctuation range in which reconstruction is possible and to approach the ideal degree of fluctuation of the fluctuation element according to the content acquisition intent. For example, first, the fluctuation rule determination unit 106 determines whether the ideal degree of fluctuation corresponding to the shooting intent is a degree that can be set for reconstruction (that is, whether the ideal degree is a degree of 1 to 6 in the above example). In a case where the ideal degree of fluctuation is a degree that can be set for reconstruction, the fluctuation rule determination unit 106 sets the ideal degree of fluctuation as the degree to be set for reconstruction. In a case where the ideal degree of fluctuation is not a degree that can be set for reconstruction, the fluctuation rule determination unit 106 sets the degree that is the closest to the ideal degree, out of the degrees that can be set for reconstruction, as the degree to be set for reconstruction. That is, a degree adjusted in accordance with the ideal degree of fluctuation is set for reconstruction.
Further, the fluctuation rule determination unit 106 determines the order of the reconstruction processes in which a plurality of fluctuation models are used. The fluctuation models may be processed in any order, and the order of the processes may be determined depending on various factors. In this embodiment, for example, the fluctuation models are processed in descending order of the difference between the above-described recommended value of the degree of fluctuation and the degree of fluctuation in the image content to be reconstructed; that is, a fluctuation model having a larger difference is processed before a fluctuation model having a smaller difference.
In this manner, the fluctuation rule determination unit 106 outputs, as a fluctuation rule, the fluctuation model information, the parameter information to be passed to the fluctuation models, and information regarding the reconstruction processing order of the fluctuation models to the image content reconstruction unit 107.
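Putting the above steps together, a sketch of the fluctuation rule determination (clamping each ideal degree to the reconstructable range, then ordering the models by the magnitude of the required change) might look as follows; the data structures and attribute names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class FluctuationRule:
    model_id: str
    target_degree: int  # parameter passed to the fluctuation model

def determine_fluctuation_rule(elements):
    # Each entry in `elements` is assumed to carry the current degree,
    # the ideal degree derived from the intent identifier, and the
    # feasible range covered by the model's training data (e.g. 1 to 6).
    scored = []
    for e in elements:
        lo, hi = e.feasible_range
        # Clamp the ideal degree to the range the model can reconstruct.
        target = min(max(e.ideal_degree, lo), hi)
        scored.append((abs(target - e.current_degree),
                       FluctuationRule(e.model_id, target)))
    # Process the models with the largest degree change first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [rule for _, rule in scored]
```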
In step S405, the image content reconstruction unit 107 executes reconstruction processing using the image content to be reconstructed and the fluctuation rule determined by the fluctuation rule determination unit 106. An example of the generated image is shown in the drawings.
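The reconstruction itself can then be sketched as a chain in which each fluctuation model's generator is applied in the determined order, the output of one model becoming the input of the next; the database interface and the generator signature are assumptions of this sketch.

```python
def reconstruct(image, rules, model_db):
    # Apply each fluctuation model in the order fixed by the fluctuation
    # rule; `model_db.load` and the generator call signature are
    # hypothetical interfaces standing in for the fluctuation model
    # database 104 and its trained generators.
    for rule in rules:
        generator = model_db.load(rule.model_id)
        image = generator(image, rule.target_degree)
    return image
```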
Note that a configuration may be adopted in which the generated image is displayed on the display unit 108 to prompt the user to check the image, and feedback regarding the reconstruction processing is received. For example, in a case where the user issues an instruction to record the reconstructed image content, recording processing may be performed and positive feedback may be given to the fluctuation model; otherwise, negative feedback may be given and new reconstruction processing may be performed.
As described above, in this embodiment, the degree of fluctuation of a fluctuation element of the acquired image content and information indicating the shooting intent of the user are acquired, and image content having a different degree of fluctuation is generated from the acquired image content using the trained learning model. At this time, the learning model generates image content in which the degree of fluctuation acquired from the acquired image content corresponds to the information indicating the shooting intent. Doing this makes it possible to obtain image content that more appropriately reflects a content acquisition intent.
According to the present invention, it is possible to obtain image content that more appropriately reflects a content acquisition intent.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
Number | Date | Country | Kind
---|---|---|---
2022-015820 | Feb 2022 | JP | national
This application is a Continuation of International Patent Application No. PCT/JP2022/047854, filed Dec. 26, 2022, which claims the benefit of Japanese Patent Application No. 2022-015820, filed Feb. 3, 2022, both of which are hereby incorporated by reference herein in their entirety.
Relationship | Number | Date | Country
---|---|---|---
Parent | PCT/JP2022/047854 | Dec 2022 | WO
Child | 18763606 | | US