This application claims the benefit of Japanese Patent Application No. 2021-206266, filed Dec. 20, 2021, which is hereby incorporated by reference herein in its entirety.
The present invention relates to an image processing apparatus, an image capturing apparatus, an image processing method, and a storage medium.
In recent years, artificial intelligence (AI) techniques, such as deep learning, have been utilized in a variety of technical fields. For example, conventionally, digital still cameras and the like are known to have a function to detect a human face from a shot image. Also, Japanese Patent Laid-Open No. 2015-099559 discloses a technique to accurately detect and recognize animals, such as dogs and cats, without limiting a detection target to humans.
Furthermore, there is a known technique whereby a composite image is generated by compositing a plurality of material images, such as multiple composition and trajectory composition. In connection with this technique, Japanese Patent Laid-Open No. 2019-009577 discloses that only shooting information of an image including a main subject (a material image) is added to a post-composition image and recorded.
Assume a case where subjects are detected, recognized, and so forth using, for example, AI techniques from a composite image that has been generated by compositing a plurality of material images (multiple composition, trajectory composition, or the like). There is a possibility that, in the composite image, subjects in respective material images overlap at the same position. This case has a problem in that it is difficult to correctly perform detection, recognition, and so forth of all subjects included in the composite image. However, the techniques of Japanese Patent Laid-Open No. 2015-099559 and Japanese Patent Laid-Open No. 2019-009577 cannot address such a problem.
The present invention has been made in view of the aforementioned situation. The present invention provides a technique whereby, even in a case where a subject detected from a material image cannot be detected from a composite image generated from a plurality of material images, subject information indicating this subject can be obtained together with the composite image.
According to a first aspect of the present invention, there is provided an image processing apparatus, comprising: an obtainment unit configured to obtain a first image, first subject information indicating a first subject detected from the first image, a second image, and second subject information indicating a second subject detected from the second image; a composition unit configured to generate a composite image by compositing the first image and the second image; and a recording unit configured to record the first subject information and the second subject information in association with the composite image.
According to a second aspect of the present invention, there is provided the image processing apparatus according to the first aspect, further comprising: a detection unit configured to detect a third subject from the composite image; and a generation unit configured to generate third subject information indicating the third subject that has been detected from the composite image, wherein the recording unit records the third subject information in association with the composite image.
According to a third aspect of the present invention, there is provided an image capturing apparatus, comprising: the image processing apparatus according to the first aspect; an image capturing unit configured to generate the first image and the second image; a detection unit configured to detect the first subject from the first image, and detect the second subject from the second image; and a generation unit configured to generate the first subject information indicating the first subject that has been detected from the first image, and the second subject information indicating the second subject that has been detected from the second image, wherein the obtainment unit obtains the first image and the second image generated by the image capturing unit, as well as the first subject information and the second subject information generated by the generation unit.
According to a fourth aspect of the present invention, there is provided an image capturing apparatus, comprising: the image processing apparatus according to the second aspect; and an image capturing unit configured to generate the first image and the second image, wherein the detection unit detects the first subject from the first image, and detects the second subject from the second image, the generation unit generates the first subject information indicating the first subject that has been detected from the first image, and the second subject information indicating the second subject that has been detected from the second image, and the obtainment unit obtains the first image and the second image generated by the image capturing unit, as well as the first subject information and the second subject information generated by the generation unit.
According to a fifth aspect of the present invention, there is provided an image processing method executed by an image processing apparatus, comprising: obtaining a first image, first subject information indicating a first subject detected from the first image, a second image, and second subject information indicating a second subject detected from the second image; generating a composite image by compositing the first image and the second image; and recording the first subject information and the second subject information in association with the composite image.
According to a sixth aspect of the present invention, there is provided a non-transitory computer-readable storage medium which stores a program for causing a computer to execute an image processing method comprising: obtaining a first image, first subject information indicating a first subject detected from the first image, a second image, and second subject information indicating a second subject detected from the second image; generating a composite image by compositing the first image and the second image; and recording the first subject information and the second subject information in association with the composite image.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Hereinafter, embodiments will be described in detail with reference to the attached drawings. Note, the following embodiments are not intended to limit the scope of the claimed invention. Multiple features are described in the embodiments, but limitation is not made to an invention that requires all such features, and multiple such features may be combined as appropriate. Furthermore, in the attached drawings, the same reference numerals are given to the same or similar configurations, and redundant description thereof is omitted.
Furthermore, the following description exemplarily presents a digital camera (an image capturing apparatus) as an image processing apparatus that performs subject classification with use of an inference model. However, in the following embodiments, the image processing apparatus is not limited to a digital camera. The image processing apparatus according to the following embodiments may be any apparatus as long as it is an apparatus that has digital camera functions to be described below, and may be, for example, a smartphone, a tablet PC, or the like.
An A/D converter 15 converts analog image signals output from the image sensor 13 into digital image signals. The digital image signals converted by the A/D converter 15 are written to a memory 25 as so-called RAW image data pieces. In addition to this, development parameters corresponding to respective RAW image data pieces are generated based on information at the time of shooting, and written to the memory 25. Development parameters are composed of various types of parameters that are used in image processing for recording images using a JPEG method or the like, such as an exposure setting, white balance, color space, and contrast.
A timing generation unit 14 is controlled by a memory control unit 22 and a system control unit 50, and supplies clock signals and control signals to the image sensor 13, the A/D converter 15, and a D/A converter 21.
An image processing unit 20 executes various types of image processing, such as predetermined pixel interpolation processing, color conversion processing, correction processing, resize processing, and image composition processing, with respect to data from the A/D converter 15 or data from the memory control unit 22. Also, the image processing unit 20 executes predetermined image processing and computation processing with use of image data obtained through image capture, and provides the obtained computation result to the system control unit 50. The system control unit 50 realizes AF (autofocus) processing, AE (automatic exposure) processing, and EF (preliminary flash emission) processing by controlling an exposure control unit 40 and a focus control unit 41 based on the provided computation result.
Furthermore, the image processing unit 20 executes predetermined computation processing with use of image data obtained through image capture, and also executes AWB (auto white balance) processing based on the obtained computation result. In addition, the image processing unit 20 reads in image data stored in the memory 25, and executes compression processing or decompression processing with use of such methods as a JPEG method, an MPEG-4 AVC method, an HEVC (High Efficiency Video Coding) method, and a lossless compression method for uncompressed RAW data. Then, the image processing unit 20 writes the image data for which processing has been completed to the memory 25.
Also, the image processing unit 20 executes predetermined computation processing with use of image data obtained through image capture, and executes editing processing with respect to various types of image data. For example, the image processing unit 20 can execute trimming processing in which the display range and size of an image are adjusted by causing unnecessary portions around image data not to be displayed, and resize processing in which the size is changed by enlarging or reducing image data, display elements of a screen, and the like. Furthermore, the image processing unit 20 can execute RAW development whereby image data is generated by applying image processing, such as color conversion, to data that has undergone compression processing or decompression processing with use of a lossless compression method for uncompressed RAW data, and converting the resultant data into a JPEG format. Moreover, the image processing unit 20 can execute moving image cutout processing in which a designated frame of a moving image format, such as MPEG-4, is cut out, converted into a JPEG format, and stored.
Also, the image processing unit 20 includes a composition processing circuit that composites a plurality of image data pieces. In the present embodiment, the image processing unit 20 can execute addition composition processing, weighted addition composition processing, lighten composition processing, and darken composition processing. The lighten composition processing is processing for generating one composite image from a plurality of material images by selecting the brightest pixel values of the plurality of material images as the pixel values of respective pixels of the composite image. The darken composition processing is processing for generating one composite image from a plurality of material images by selecting the darkest pixel values of the plurality of material images as the pixel values of respective pixels of the composite image.
Furthermore, the image processing unit 20 also executes, for example, processing for superimposing an OSD (On-Screen Display), such as a menu or arbitrary characters to be displayed on a display unit 23, on image data to be displayed.
In addition, the image processing unit 20 executes subject detection processing for detecting a subject that exists within image data and detecting a subject region thereof with use of, for example, input image data and information of a distance to the subject at the time of shooting, which is obtained from, for example, the image sensor 13. Examples of detectable information (subject detection information) include information of the position, size, inclination, and the like of a subject region within an image, and information indicating certainty.
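As a rough illustration of the subject detection information described above, the following Python sketch models one detected subject region. The class name `SubjectDetection`, its field names, and the 0.0 to 1.0 range for certainty are assumptions made for illustration and are not taken from the embodiment.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SubjectDetection:
    """One detected subject region (hypothetical field names)."""
    x: int                  # left edge of the subject region, in pixels
    y: int                  # top edge of the subject region, in pixels
    width: int              # region width, in pixels
    height: int             # region height, in pixels
    inclination_deg: float  # inclination of the region, in degrees
    certainty: float        # detection certainty, assumed to lie in [0.0, 1.0]

    def __post_init__(self):
        # Reject values that cannot describe a valid region.
        if not 0.0 <= self.certainty <= 1.0:
            raise ValueError("certainty must lie in [0.0, 1.0]")
        if self.width <= 0 or self.height <= 0:
            raise ValueError("region size must be positive")


detection = SubjectDetection(x=120, y=80, width=64, height=48,
                             inclination_deg=0.0, certainty=0.92)
print(asdict(detection))
```

A frozen dataclass is used here so that a detection result cannot be modified by accident after it has been generated.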
The memory control unit 22 controls the A/D converter 15, the timing generation unit 14, the image processing unit 20, an image display memory 24, the D/A converter 21, and the memory 25. RAW image data generated by the A/D converter 15 is written to the image display memory 24 or the memory 25 via the image processing unit 20 and the memory control unit 22, or directly via the memory control unit 22.
Image data for display that has been written to the image display memory 24 is displayed on the display unit 23, which is composed of a TFT LCD or the like, via the D/A converter 21. An electronic viewfinder function for displaying live images can be realized by sequentially displaying image data pieces obtained through image capture with use of the display unit 23.
The memory 25 has a storage capacity that is sufficient to store a predetermined number of still images and moving images of a predetermined length of time, and stores still images and moving images that have been shot. Furthermore, the memory 25 can also be used as a working area for the system control unit 50.
The exposure control unit 40 controls the shutter 12, which has a diaphragm function. Furthermore, the exposure control unit 40 also exerts a flash light adjustment function by operating in coordination with a flash 44. The focus control unit 41 performs focus adjustment by driving a non-illustrated focus lens included in the photographing lens 11 based on an instruction from the system control unit 50. A zoom control unit 42 controls zooming by driving a non-illustrated zoom lens included in the photographing lens 11. The flash 44 has a function of emitting AF auxiliary light, and a flash light adjustment function.
The system control unit 50 controls the entirety of the digital camera 100. A nonvolatile memory 51 is an electrically erasable and recordable nonvolatile memory; for example, an EEPROM or the like is used as the nonvolatile memory 51. Note that not only programs, but also map information and the like are recorded in the nonvolatile memory 51.
A shutter switch 61 (SW1) is turned ON and issues an instruction for starting operations of AF processing, AE processing, AWB processing, EF processing, and the like in the midst of an operation on a shutter button 60. A shutter switch 62 (SW2) is turned ON and issues an instruction for starting a series of shooting operations, including exposure processing, development processing, and recording processing, upon completion of the operation on the shutter button 60. In the exposure processing, signals that have been read out from the image sensor 13 are written to the memory 25 as RAW image data via the A/D converter 15 and the memory control unit 22. In the development processing, the image processing unit 20 and the memory control unit 22 perform computation to develop RAW image data that has been written to the memory 25 and write the same to the memory 25 as image data. In the recording processing, image data is read out from the memory 25, the image data is compressed by the image processing unit 20, the compressed image data is stored to the memory 25, and then the stored image data is written to an external recording medium 91 via a card controller 90.
An operation unit 63 includes such operation members as various types of buttons and a touchscreen. For example, the operation unit 63 includes a power button, a menu button, a mode changing switch for switching among a shooting mode, a reproduction mode, and other special shooting modes, directional keys, a set button, a macro button, and a multi-screen reproduction page break button. Also, for example, the operation unit 63 includes a flash setting button, a button for switching among single shooting, continuous shooting, and self-timer, a menu change + (plus) button, a menu change − (minus) button, a shooting image quality selection button, an exposure correction button, a date/time setting button, and so forth.
When image data is to be recorded in the external recording medium 91, a metadata generation and analysis unit 70 generates various types of metadata, such as information of the Exif (Exchangeable image file format) standard to be attached to the image data, based on information at the time of shooting. Also, when image data recorded in the external recording medium 91 has been read in, the metadata generation and analysis unit 70 analyzes metadata added to the image data. Examples of metadata include shooting setting information at the time of shooting, image data information related to image data, feature information of a subject included in image data, and so forth. Furthermore, when moving image data is to be recorded, the metadata generation and analysis unit 70 can also generate and add metadata with respect to each frame.
A power source 80 includes, for example, a primary battery such as an alkaline battery or a lithium battery, a secondary battery such as a NiCd battery, a NiMH battery, or a Li battery, or an AC adapter. A power control unit 81 supplies power supplied from the power source 80 to each component of the digital camera 100.
The card controller 90 transmits/receives data to/from the external recording medium 91, such as a memory card. The external recording medium 91 is composed of, for example, a memory card, and images (still images and moving images) shot by the digital camera 100 are recorded therein.
Using an inference model recorded in an inference model recording unit 72, an inference engine 73 performs inference with respect to image data that has been input via the system control unit 50. The system control unit 50 can record an inference model that has been input from an external apparatus (not shown) via a communication unit 71 in the inference model recording unit 72. Also, the system control unit 50 can record, in the inference model recording unit 72, an inference model that has been obtained by re-training the inference model with use of a training unit 74. Note, there is a possibility that an inference model recorded in the inference model recording unit 72 is updated due to inputting of an inference model from an external apparatus, or re-training of an inference model with use of the training unit 74. For this reason, the inference model recording unit 72 holds version information so that the version of an inference model can be identified.
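The version handling described above might be sketched as follows. This is only an illustration: the class name `InferenceModelStore` and the use of an integer version counter are assumptions, and the embodiment does not specify how version information is represented.

```python
class InferenceModelStore:
    """Holds the current inference model together with version information.

    Hypothetical stand-in for the inference model recording unit; the
    integer version counter is an assumption made for illustration.
    """

    def __init__(self, model, version=1):
        self._model = model
        self._version = version

    @property
    def model(self):
        return self._model

    @property
    def version(self):
        return self._version

    def update(self, model):
        """Record a new model (external input or re-training) and bump the version."""
        self._model = model
        self._version += 1
        return self._version


store = InferenceModelStore(model={"weights": [0.1, 0.2]})
new_version = store.update({"weights": [0.3, 0.4]})
```

Bumping the version on every update lets any result recorded later be traced back to the model that produced it.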
Also, the inference engine 73 includes a neural network design 73a. The neural network design 73a is configured in such a manner that intermediate layers (neurons) are arranged between an input layer and an output layer. The system control unit 50 inputs image data to the input layer. Neurons in several layers are arranged as the intermediate layers. The number of layers of neurons is determined as appropriate in terms of design. Furthermore, the number of neurons in each layer is also determined as appropriate in terms of design. In the intermediate layers, weighting is performed based on an inference model recorded in the inference model recording unit 72. An inference result corresponding to the image data input to the input layer is output to the output layer.
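The layered structure described above can be illustrated with a minimal forward pass. This is a sketch under stated assumptions: the fully connected layout, the ReLU activation, the softmax at the output, and all weight values are illustrative choices that the embodiment does not specify.

```python
import math


def forward(inputs, layers):
    """Propagate `inputs` through fully connected layers.

    `layers` is a list of (weights, biases) pairs, where weights[j][i]
    connects input i to neuron j. All values are illustrative assumptions.
    """
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        # ReLU in intermediate layers; raw scores at the output layer.
        activations = sums if is_output else [max(0.0, s) for s in sums]
    return activations


def softmax(scores):
    """Convert output-layer scores into class probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights standing in for a recorded inference model.
model = [
    ([[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]),   # intermediate layer
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # output layer
]
scores = forward([1.0, 2.0], model)
probabilities = softmax(scores)
```

Here the weights play the role of the recorded inference model: replacing `model` changes the inference result without changing the network structure.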
It is assumed that, in the present embodiment, an inference model recorded in the inference model recording unit 72 is an inference model that infers classification, that is to say, what kind of subject is included in an image. An inference model is used that has been generated through deep learning while using image data pieces of various subjects, as well as the result of classification thereof (e.g., classification of animals such as dogs and cats, classification of subject types such as humans, animals, plants, and buildings, and so forth), as supervisory data. Therefore, when an image has been input, together with information indicating a region of a subject that has been detected in this image, to the inference engine 73 that uses the inference model, an inference result indicating classification of this subject is output.
Upon receiving a request from the system control unit 50 or the like, the training unit 74 re-trains an inference model. The training unit 74 includes a supervisory data recording unit 74a. Information related to supervisory data for the inference engine 73 is recorded in the supervisory data recording unit 74a. The training unit 74 can cause the inference engine 73 to be re-trained with use of the supervisory data recorded in the supervisory data recording unit 74a, and update the inference engine 73 with use of the inference model recording unit 72.
The communication unit 71 includes a communication circuit for performing transmission and reception. Specifically, communication performed by the communication circuit may be wireless communication via Wi-Fi, Bluetooth®, or the like, or may be wired communication via Ethernet, USB, or the like.
A description is now given of composition processing in which a plurality of image data pieces (a plurality of material images) are composited by the image processing unit 20. As the composition processing, the image processing unit 20 can execute four types of processing: addition composition processing, weighted addition composition processing, lighten composition processing, and darken composition processing. It is assumed that the pixel value of a pre-composition image i (i=1 to N) is I_i (x, y) (where x, y denotes coordinates in the image), and the pixel value of a composite image is I (x, y). As a pixel value, values of respective signals of R, G1, G2, and B based on the Bayer array may be used, or a value of a luminance signal obtained from a group of signals of R, G1, G2, and B (a luminance value) may be used. At this time, a luminance value may be calculated on a per-pixel basis after executing interpolation processing with respect to signals based on the Bayer array in such a manner that signals of R, G, and B exist on a per-pixel basis. For example, provided that a luminance value is Y, a computation formula for performing calculation by way of weighted addition of signals of R, G, and B, such as Y=0.3×R+0.59×G+0.11×B, is used as a computation formula for the luminance value. The composition processing is executed based on each pixel value for which the positions have been aligned by executing such processing as positioning among a plurality of images as necessary.
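The luminance calculation mentioned above can be written as a short Python sketch. The coefficients are those given in the text; the function names are hypothetical, and averaging G1 and G2 to obtain a single G value for a Bayer quad is an assumption used in place of the per-pixel interpolation described above.

```python
def luminance(r, g, b):
    """Weighted addition of R, G, and B using the coefficients in the text."""
    return 0.3 * r + 0.59 * g + 0.11 * b


def bayer_luminance(r, g1, g2, b):
    """Luminance of one Bayer quad; averaging G1 and G2 is an assumption."""
    return luminance(r, (g1 + g2) / 2.0, b)


gray = luminance(100, 100, 100)
red_only = luminance(255, 0, 0)
quad = bayer_luminance(100, 90, 110, 100)
```

Because the three coefficients sum to 1, a neutral gray input keeps its value as its luminance.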
The addition composition processing is executed in accordance with the following formula. That is to say, the image processing unit 20 generates a composite image by executing addition processing with respect to pixel values of N images, pixel by pixel.
I(x,y)=I_1(x,y)+I_2(x,y)+ . . . +I_N(x,y)
The weighted addition composition processing is executed in accordance with the following formula, where ai (i=1 to N) is a weighting coefficient. That is to say, the image processing unit 20 generates a composite image by executing weighted addition processing with respect to pixel values of N images, pixel by pixel. In a case where a1+a2+ . . . +aN=1, the following formula is equivalent to weighted average processing.
I(x,y)=a1×I_1(x,y)+a2×I_2(x,y)+ . . . +aN×I_N(x,y)
The lighten composition processing is executed in accordance with the following formula. That is to say, the image processing unit 20 generates a composite image by selecting the maximum value of pixel values of N images, pixel by pixel.
I(x,y)=max(I_1(x,y),I_2(x,y), . . . ,I_N(x,y))
The darken composition processing is executed in accordance with the following formula. That is to say, the image processing unit 20 generates a composite image by selecting the minimum value of pixel values of N images, pixel by pixel.
I(x,y)=min(I_1(x,y),I_2(x,y), . . . ,I_N(x,y))
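The four composition formulas above can be sketched in Python as follows. This is a minimal illustration that assumes the material images are already aligned and are given as equally sized lists of rows; the function name `composite` and the mode strings are hypothetical, and clipping of the result to a valid pixel range is omitted.

```python
def composite(images, mode, weights=None):
    """Composite equally sized 2-D images (lists of rows) pixel by pixel."""
    if not images:
        raise ValueError("at least one material image is required")
    if mode == "weighted":
        if weights is None or len(weights) != len(images):
            raise ValueError("one weight per image is required")

        def combine(pixels):
            # Weighted addition: a1*I_1 + a2*I_2 + ... + aN*I_N
            return sum(a * p for a, p in zip(weights, pixels))
    elif mode == "add":
        combine = sum   # addition composition
    elif mode == "lighten":
        combine = max   # brightest pixel value is selected
    elif mode == "darken":
        combine = min   # darkest pixel value is selected
    else:
        raise ValueError(f"unknown mode: {mode}")
    height, width = len(images[0]), len(images[0][0])
    return [[combine([img[y][x] for img in images]) for x in range(width)]
            for y in range(height)]


image_1 = [[10, 200], [30, 40]]
image_2 = [[50, 100], [30, 90]]
added = composite([image_1, image_2], "add")
averaged = composite([image_1, image_2], "weighted", weights=[0.5, 0.5])
lightened = composite([image_1, image_2], "lighten")
darkened = composite([image_1, image_2], "darken")
```

With weights of 0.5 each, the weighted mode reduces to a per-pixel average, which matches the remark that weights summing to 1 give weighted average processing.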
Next, multiple composition shooting processing executed by the digital camera 100 will be described with reference to
In step S202, the system control unit 50 determines whether the user has issued a shooting instruction. The user can issue the shooting instruction by depressing the shutter button 60, thereby turning ON the shutter switches 61 (SW1) and 62 (SW2). The system control unit 50 repeats determination processing in step S202 until the user issues the shooting instruction. Once the user has issued the shooting instruction, processing steps proceed to step S203.
Processing of steps S203 to S208 is repeatedly executed until it is determined that the shooting instruction has not continued in step S209, which will be described later. In the following description, it is assumed that processing of steps S203 to S208 has been executed 11 times (therefore, 11 material images have been generated).
In step S203, the system control unit 50 executes shooting processing. In the shooting processing, the system control unit 50 executes AF (autofocus) processing and AE (automatic exposure) processing with use of the focus control unit 41 and the exposure control unit 40, and then stores image signals that are output from the image sensor 13 via the A/D converter 15 into the memory 25. Also, the image processing unit 20 generates image data of a format conforming to a user setting (e.g., a JPEG format) by executing compression processing conforming to the user setting with respect to the image signals stored in the memory 25.
In step S204, the image processing unit 20 executes subject detection processing with respect to the image signals stored in the memory 25, and obtains information of subjects included in the image (subject detection information).
In step S205, with use of the inference engine 73, the system control unit 50 executes inference processing with respect to the subjects that were detected from the image signals (material image) stored in the memory 25. The system control unit 50 specifies subject regions within the image based on the image signals stored in the memory 25 and on the subject detection information obtained in step S204. The system control unit 50 inputs the image signals (material image), as well as information indicating the subject regions in the material image, to the inference engine 73. An inference result indicating classification of the subjects included in the subject regions is output as the result of execution of the inference processing by the inference engine 73 for each subject region. Note that the inference engine 73 may output information related to the inference processing, such as debug information and logs associated with the operations of the inference processing, in addition to the inference result.
In step S206, the system control unit 50 records a file including the image data generated in step S203, the subject detection information obtained in step S204, and the inference result obtained in step S205 as a material image file for multiple composition into the external recording medium 91.
The metadata generation and analysis unit 70 records the subject detection information obtained in step S204 into a subject detection information tag 306 within a MakerNote 305 (a region in which metadata unique to a maker can be described in a basically-undisclosed form) included in the Exif region 301. Also, in a case where version information of the current inference model recorded in the inference model recording unit 72, debug information output from the inference engine 73 in step S205, and so forth exist, these pieces of information are recorded inside the MakerNote 305 as inference model management information 307.
The inference result obtained in step S205 is recorded in the annotation information region 310 as annotation information. The location of the annotation information region 310 is indicated by an annotation information link 303 included in an annotation link information storage tag 302. In the present embodiment, it is assumed that annotation information is described in a text format, such as XML and JSON.
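As an illustration of annotation information described in a text format, the following Python sketch serializes inference results as JSON. The function name `build_annotation` and the keys `subjects`, `region`, `label`, and `score` form a hypothetical schema; the embodiment does not define the layout of the annotation information.

```python
import json


def build_annotation(inferences):
    """Serialize inference results as JSON text (hypothetical schema).

    Each entry of `inferences` is (region, label, score), where region is
    an (x, y, width, height) tuple for one subject region.
    """
    return json.dumps({
        "subjects": [
            {"region": list(region), "label": label, "score": score}
            for region, label, score in inferences
        ],
    }, indent=2)


annotation_text = build_annotation(
    [((120, 80, 64, 48), "dog", 0.92), ((10, 20, 30, 40), "cat", 0.81)]
)
parsed = json.loads(annotation_text)
```

A text format of this kind can be read back with a standard parser, as the final line shows.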
Returning to
In step S208, the system control unit 50 executes processing for generating sub-annotation information for the composite image based on the inference result obtained in step S205 (i.e., the inference result for the material image). Specifically, in processing of the first step S208 (i.e., at the time of processing related to the material image 401), the system control unit 50 generates sub-annotation information including the inference result obtained in step S205 within the memory 25. In processing of the second or subsequent step S208 (i.e., at the time of processing related to any of the material images 402 to 411), the system control unit 50 adds information related to the inference result obtained in step S205 to the sub-annotation information stored in the memory 25. In this way, the inference result for the material image can be carried over into the composite image.
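The accumulation of sub-annotation information across material images might look like the following Python sketch. The function name `append_sub_annotation`, the dictionary layout, and the material names are assumptions made for illustration.

```python
def append_sub_annotation(sub_annotation, material_name, inference_result):
    """Add one material image's inference result to the sub-annotation.

    Passing None for `sub_annotation` corresponds to the first material
    image, for which the sub-annotation information is newly generated.
    """
    if sub_annotation is None:
        sub_annotation = {"materials": []}
    sub_annotation["materials"].append(
        {"source": material_name, "subjects": list(inference_result)}
    )
    return sub_annotation


sub_annotation = None
for name, result in [("material_401", ["dog"]), ("material_402", ["cat"])]:
    sub_annotation = append_sub_annotation(sub_annotation, name, result)
```

Keeping one entry per material image preserves which material image each inference result came from, even after the images themselves have been composited.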
In step S209, the system control unit 50 determines whether the shooting instruction by the user has continued. The user can continue the shooting instruction by continuously placing the shutter switches 61 (SW1) and 62 (SW2) in the ON state while continuously depressing the shutter button 60. Processing steps return to step S203 in a case where the shooting instruction has continued, and processing steps proceed to step S210 in a case where the shooting instruction has not continued.
In step S210, the image processing unit 20 executes subject detection processing with respect to the composite image generated through processing of step S207, and obtains information of subjects included in the composite image (subject detection information). Processing of step S210 is similar to processing of step S204, except that the target of processing is the composite image rather than the material image.
In step S211, using the inference engine 73, the system control unit 50 executes inference processing with respect to the composite image. Processing of step S211 is similar to processing of step S205, except that the target of processing is the composite image rather than the material image.
In step S212, the system control unit 50 records a file including the composite image generated in step S207, the sub-annotation information generated in step S208, the subject detection information obtained in step S210, and the inference result obtained in step S211, in the external recording medium 91 as a composite image file.
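The overall flow of steps S203 to S212 can be summarized in a Python sketch. It is an illustration only: `detect` and `infer` are caller-supplied stand-ins for the subject detection processing and the inference engine, files are modeled as dictionaries, and lighten composition is assumed for step S207 purely as an example.

```python
def multiple_composition(shots, detect, infer):
    """Run the per-shot loop, then process the composite image.

    `shots` is an iterable of 2-D images (lists of rows); `detect` and
    `infer` are hypothetical stand-ins supplied by the caller.
    """
    composite, sub_annotation, material_files = None, [], []
    for image in shots:                                # S203: shooting
        detections = detect(image)                     # S204: subject detection
        result = infer(image, detections)              # S205: inference
        material_files.append(                         # S206: material image file
            {"image": image, "detections": detections, "annotation": result})
        if composite is None:                          # S207: composition
            composite = [row[:] for row in image]
        else:                                          # lighten is an assumption
            composite = [[max(a, b) for a, b in zip(r1, r2)]
                         for r1, r2 in zip(composite, image)]
        sub_annotation.append(result)                  # S208: sub-annotation
    if composite is None:
        raise ValueError("no material image was shot")
    detections = detect(composite)                     # S210
    result = infer(composite, detections)              # S211
    composite_file = {"image": composite, "detections": detections,  # S212
                      "annotation": result, "sub_annotation": sub_annotation}
    return material_files, composite_file


def toy_detect(image):
    """Toy detector: every pixel of value 200 or more counts as a subject."""
    return [(x, y) for y, row in enumerate(image)
            for x, value in enumerate(row) if value >= 200]


def toy_infer(image, detections):
    """Toy classifier: labels each detection by its position."""
    return [f"bright@{x},{y}" for x, y in detections]


shots = [[[250, 0], [0, 0]], [[0, 0], [0, 220]]]
materials, composite_file = multiple_composition(shots, toy_detect, toy_infer)
```

The composite image file carries both the inference result for the composite image and the per-material results, so a subject that is no longer detectable after composition remains recorded in `sub_annotation`.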
In the case of the composite image file 320 shown in
In the case of the composite image file 330 shown in
As described above, according to the first embodiment, the digital camera 100 obtains a plurality of material images (e.g., the material image 401 and the material image 402), and subject information pieces indicating subjects that have been detected from respective material images (e.g., information including the inference results from the inference engine 73). Also, the digital camera 100 generates a composite image by compositing the plurality of material images. Furthermore, the digital camera 100 records the subject information pieces of the respective material images in association with the composite image by, for example, generating and recording a composite image file including the subject information pieces of the respective material images and the composite image.
In this way, according to the first embodiment, subject information pieces of respective material images are recorded in association with the composite image. Therefore, even in a case where a subject detected from a material image cannot be detected from a composite image generated from a plurality of material images, subject information indicating this subject can be obtained together with the composite image.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
Number | Date | Country | Kind |
---|---|---|---|
2021-206266 | Dec 2021 | JP | national |