The present disclosure relates to an imaging device, an image processing device, and an image processing method.
In recent years, a method has been proposed in which an image sensor is shifted to acquire a plurality of images, and the acquired images are combined to generate a high-resolution image as an output image, by utilizing a camera shake prevention mechanism provided in an imaging device. An example of such a method is the technique disclosed in Patent Literature 1 below.
Patent Literature 1: WO 2019/008693 A
In the above method, in a case where a moving subject is photographed, a plurality of continuously acquired images is combined, and thus subject blurring occurs. Therefore, in a case where a moving subject is photographed, it is conceivable to switch the output mode of the output image, for example, by outputting a single image as the output image instead of combining a plurality of images, in order to avoid subject blurring. In order to perform such switching, it is required to more accurately determine whether or not a moving subject is included in the acquired image.
Therefore, the present disclosure proposes an imaging device, an image processing device, and an image processing method capable of more accurately determining whether or not a moving subject is included.
According to the present disclosure, provided is an imaging device including: an imaging module including an image sensor in which a plurality of pixels for converting light into an electric signal is arranged; a drive unit that moves a part of the imaging module in a manner that the image sensor can sequentially acquire a reference image under a predetermined pixel phase, a plurality of generation images, and a detection image under the predetermined pixel phase in this order; and a detection unit that detects a moving subject based on a difference between the reference image and the detection image.
Furthermore, according to the present disclosure, provided is an image processing device including: an acquisition unit that sequentially acquires a reference image under a predetermined pixel phase, a plurality of generation images, and a detection image under the predetermined pixel phase obtained by an image sensor in which a plurality of pixels for converting light into an electric signal is arranged, in this order; and a detection unit that detects a moving subject based on a difference between the reference image and the detection image.
Moreover, according to the present disclosure, provided is an image processing method including: sequentially acquiring a reference image under a predetermined pixel phase, a plurality of generation images, and a detection image under the predetermined pixel phase obtained by an image sensor in which a plurality of pixels for converting light into an electric signal is arranged, in this order; and detecting a moving subject based on a difference between the reference image and the detection image.
Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. Note that, in the present specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted. Furthermore, in the present specification and the drawings, similar components of different embodiments may be distinguished by adding different alphabets after the same reference numerals. However, in a case where it is not necessary to particularly distinguish each of similar components, only the same reference numeral is assigned.
Note that the description will be given in the following order.
1. History until creation of embodiments according to present disclosure
1.1. History until creation of embodiments according to present disclosure
1.2. Concept of embodiments of present disclosure
2. First Embodiment
2.1. Outline of imaging device
2.2. Details of processing unit
2.3. Details of generation unit
2.4. Image processing method
2.5. Modifications
3. Second Embodiment
4. Third Embodiment
5. Fourth Embodiment
6. Fifth Embodiment
7. Summary
8. Hardware configuration
9. Supplement
<1.1. History Until Creation of Embodiments According to Present Disclosure>
First, before describing the details of the embodiments according to the present disclosure, the history until creation of the embodiments according to the present disclosure by the present inventors will be described with reference to
In a charge coupled device (CCD) image sensor or a complementary metal-oxide-semiconductor (CMOS) image sensor, a configuration in which primary color filters are used and a plurality of pixels for detecting red, green, and blue light is arranged on a plane is widely used. For example, as illustrated in
That is, in the image sensor unit 130, a plurality of pixels 132 corresponding to each color is arranged in a manner that a predetermined pattern repeats. In the following description, the term “pixel phase” means the relative position of the pixel arrangement pattern with respect to a subject, expressed as an angle indicating a position within one cycle, where the above repeating pattern is taken as one cycle. Hereinafter, the definition of the “pixel phase” will be specifically described using the example illustrated in
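The notion of pixel phase defined above can be sketched in code. The following is an illustrative sketch only: it assumes, hypothetically, a color filter pattern that repeats with a period of two pixels in each direction, and the function name and period value are assumptions for illustration, not part of the disclosure.

```python
# Sketch of the "pixel phase" notion for a repeating color filter pattern,
# assuming (hypothetically) a pattern with a period of 2 pixels. A shift of
# the sensor by a whole number of pixels corresponds to a position within
# one cycle of the pattern, expressed here as an angle in degrees.

PATTERN_PERIOD = 2  # pixels per cycle of the repeating color pattern (assumed)

def pixel_phase(shift_px: int, period: int = PATTERN_PERIOD) -> float:
    """Return the phase (in degrees) of a sensor shift within one pattern cycle."""
    return (shift_px % period) / period * 360.0
```

Note that a shift equal to a whole number of periods returns the sensor to the same phase, which is the property the reference image and the detection image rely on.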
By the way, in recent years, a method has been proposed in which the image sensor unit 130 is shifted along a predetermined direction by one pixel to acquire a plurality of images, and the acquired images are combined to generate a high-resolution image, by utilizing a camera shake prevention mechanism provided in an imaging device. In detail, as illustrated in
In the image obtained by the above method, as is clear from the above description, improvement in resolution can be expected in the region of the subject 400 (stationary subject) that is stationary. On the other hand, in the region of the moving subject in the image obtained by the above method, since a plurality of images obtained by continuous photographing at different timings is combined, subject blurring occurs because of the movement of the subject 400 during continuous photographing. Therefore, in a case where a plurality of images photographed at different timings is combined as in the above method, it is conceivable to prevent subject blurring by the following method. For example, there is a method of determining whether or not a moving subject is included in an image by detecting a difference between a plurality of images acquired by the above method, and selecting not to combine the plurality of images in the region of the moving subject in a case where the moving subject is included.
However, as a result of intensive studies on the above method, the present inventors have found that a stationary subject may be misidentified as a moving subject in a method of simply detecting a difference between a plurality of images and determining whether or not a moving subject is included in an image as in the above method. Hereinafter, it will be described with reference to
As illustrated in
Then, as illustrated in
<1.2. Concept of Embodiments of Present Disclosure>
Therefore, the present inventors have created the embodiments of the present disclosure in which it is possible to prevent a stationary subject from being misidentified as a moving subject, that is, it is possible to more accurately determine whether or not a moving subject is included, by focusing on the above knowledge. Hereinafter, a concept common to the embodiments of the present disclosure will be described with reference to
As described above, in a method of simply detecting a difference between a plurality of images and determining whether or not a moving subject is included in an image, a stationary subject may be misidentified as a moving subject. The reason for this is considered to be that, even in the case of an image of a stationary subject, a difference occurs between a plurality of images because the form of mixing of the return signal (aliasing component) differs due to a difference in the pixel phases between the plurality of images. Therefore, in view of the finding that such a difference arises from the different mixing forms of the return signal, the present inventors conceived of determining whether or not a moving subject is included in an image by detecting a difference between images of the same pixel phase.
In detail, as illustrated in
Note that, in
By the way, in a case where the imaging device is not fixed (for example, because of vibration of the ground on which the imaging device is placed, vibration of the imaging device due to user operation, vibration of a tripod to which the imaging device is fixed, and the like), if the above method for generating a high-resolution image is used, an image having subject blurring as a whole is generated. That is, in a case where the imaging device is not fixed, it may be preferable not to use the method for generating a high-resolution image (in the following description, referred to as a fitting combination mode), in a manner that breakage (for example, subject blurring) does not occur in the generated image. Therefore, in the embodiment of the present disclosure created by the present inventors, in a case where it is detected that the imaging device is not fixed, the mode is switched to generate the output image in the motion compensation mode (see
<2.1. Outline of Imaging Device>
First, a configuration of an imaging device 10 according to an embodiment of the present disclosure will be described with reference to
(Imaging Module 100)
The imaging module 100 forms an image of incident light from the subject 400 on the image sensor unit 130, and supplies the electric charge generated in the image sensor unit 130 to the processing unit 200 as an imaging signal. In detail, as illustrated in
The optical lens 110 can collect light from the subject 400 and form an optical image on the plurality of pixels 132 (see
The image sensor unit 130 can acquire an optical image formed by the above optical lens 110 as an imaging signal. Furthermore, in the image sensor unit 130, for example, acquisition of an imaging signal is controlled by the control unit 300. In detail, the image sensor unit 130 includes the plurality of pixels 132, which convert light into an electric signal, arranged on the light receiving surface (see
More specifically, as illustrated in
For example, in the present embodiment, as illustrated in
The drive unit 140 can shift the image sensor unit 130 along the arrangement direction of the pixels, in other words, can shift the image sensor unit 130 in units of pixels in the horizontal direction and the vertical direction. In addition, the drive unit 140 includes an actuator, and the shift operation (the shift direction and the shift amount) is controlled by the control unit 300 to be described later. Specifically, the drive unit 140 can move the image sensor unit 130 at least in the light receiving surface (predetermined surface) in the horizontal direction and the vertical direction by a predetermined unit (for example, by one pixel) in a manner that the reference image, the plurality of generation images, and the detection image can be sequentially acquired in this order by the image sensor unit 130 described above (see
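The acquisition order controlled by the drive unit 140 (reference image, a plurality of generation images, detection image, with the first and last at the same position) can be sketched as a shift schedule. The specific shifts and the `capture_schedule` helper below are illustrative assumptions; the actual shift pattern depends on the pixel arrangement and is not specified here.

```python
# Illustrative shift schedule for one capture cycle: a reference image at the
# starting position (phase A), three generation images at one-pixel shifts in
# the horizontal/vertical directions, and a detection image back at the
# starting position. The particular shifts are assumptions for illustration.

def capture_schedule():
    shifts = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]  # (x, y) in pixels
    roles = ["reference", "generation", "generation", "generation", "detection"]
    return list(zip(roles, shifts))
```

The essential property is that the first and last entries share the same position, so the reference image and the detection image are acquired under the same pixel phase.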
(Processing Unit 200)
The processing unit 200 can generate a high-resolution output image based on the imaging signal from the imaging module 100 described above. The processing unit 200 is realized by, for example, hardware such as a central processing unit (CPU), a read only memory (ROM), and a random access memory (RAM). In addition, for example, in the processing unit 200, generation of an output image may be controlled by the control unit 300 to be described later. A detailed configuration of the processing unit 200 will be described later.
(Control Unit 300)
The control unit 300 can control the imaging module 100 and the processing unit 200. The control unit 300 is realized by, for example, hardware such as a CPU, a ROM, and a RAM.
Note that, in the following description, the imaging module 100, the processing unit 200, and the control unit 300 will be described as being configured as the integrated imaging device 10 (standalone). However, the present embodiment is not limited to such a standalone configuration. That is, in the present embodiment, for example, the imaging module 100, the control unit 300, and the processing unit 200 may be configured as separate units. In addition, in the present embodiment, for example, the processing unit 200 may be configured as a system including a plurality of devices on the premise of connection to a network (or communication between devices), such as cloud computing.
<2.2. Details of Processing Unit>
As described above, the processing unit 200 is a device capable of generating a high-resolution output image based on the imaging signal from the imaging module 100 described above. As illustrated in
(Acquisition Unit 210)
By acquiring the imaging signal from the imaging module 100, the acquisition unit 210 can acquire the reference image, the generation image, and the detection image sequentially obtained by the image sensor unit 130 in association with the shift direction and the shift amount (pixel phase) of the image sensor unit 130. The shift direction and the shift amount can be used for alignment and the like at the time of generating a composite image. Then, the acquisition unit 210 outputs the acquired images to the detection unit 220 and the generation unit 240 to be described later.
(Detection Unit 220)
The detection unit 220 can detect a moving subject based on a difference between the reference image and one or more detection images, or based on a difference between detection images acquired consecutively. For example, the detection unit 220 extracts a region that differs (a difference) between the reference image and the detection image, and performs binarization processing on the extracted difference image. Thus, a difference value map (see
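A minimal sketch of this difference detection and binarization, using NumPy, might look as follows. The threshold value and the function name are assumptions for illustration, not values specified by the disclosure.

```python
import numpy as np

# Minimal sketch of the detection unit's processing: take the absolute
# difference between the reference image and the detection image (acquired
# under the same pixel phase) and binarize it. Pixels whose difference
# exceeds an assumed threshold are marked 1, yielding a difference value map
# of the moving-subject region.

def difference_value_map(reference, detection, threshold=16):
    diff = np.abs(reference.astype(np.int32) - detection.astype(np.int32))
    return (diff > threshold).astype(np.uint8)
```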
(Comparison Unit 230)
The comparison unit 230 calculates the area of the image region of the moving subject based on the difference between the reference image and the detection image, and compares the area of the moving subject region with a predetermined threshold value. For example, the comparison unit 230 calculates the area of the image region of the moving subject in the difference value map output from the detection unit 220. Then, in a case where the calculated area is equal to the area of the entire image (predetermined threshold value), or is larger than an area corresponding to, for example, 80% of the entire image area (predetermined threshold value), the comparison unit 230 determines that the imaging device 10 is not fixed. The comparison unit 230 outputs the result of the comparison (determination) to the generation unit 240 to be described later, and the generation unit 240 switches (changes) the generation mode of the output image according to the result. Note that, in the present embodiment, the predetermined threshold value can be appropriately changed by the user.
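The decision made by the comparison unit 230 can be sketched as follows. The 80% default follows the example in the text; the function and mode names are illustrative assumptions.

```python
import numpy as np

# Sketch of the comparison unit's decision: compute the fraction of the image
# occupied by the moving-subject region in the binary difference value map and
# compare it with a threshold (80% of the image area in the example above;
# the threshold is user-adjustable). A large fraction suggests the imaging
# device itself is not fixed, so the motion compensation mode is selected.

def select_generation_mode(diff_map, area_ratio_threshold=0.8):
    moving_ratio = diff_map.mean()  # fraction of pixels flagged as moving
    if moving_ratio > area_ratio_threshold:
        return "motion_compensation"
    return "fitting_combination"
```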
(Generation Unit 240)
The generation unit 240 generates an output image using the plurality of generation images based on the result of detection of a moving subject by the detection unit 220 (in detail, the comparison result of the comparison unit 230). Note that a detailed configuration of the generation unit 240 will be described later.
<2.3. Details of Generation Unit>
As described above, the generation unit 240 changes the generation mode of the output image based on the comparison result of the comparison unit 230. Therefore, in the following description, details of each functional unit of the generation unit 240 will be described for each generation mode with reference to
—Fitting Combination Mode—
In a case where the area of the moving subject region is smaller than the predetermined threshold value, the generation unit 240 generates an output image in the fitting combination mode. In the fitting combination mode, the generation unit 240 can generate a composite image by combining a plurality of stationary subject images obtained by excluding a moving subject from each of the plurality of generation images, and generate an output image by fitting the reference image into the composite image. In detail, as illustrated in
(Difference Detection Unit 242)
The difference detection unit 242 detects a difference between the reference image and the detection image output from the acquisition unit 210 described above. Similarly to the detection unit 220 described above, the difference detection unit 242 extracts a region that differs (a difference) between the reference image and the detection image, and performs binarization processing on the extracted difference image. Thus, a difference value map (see
(Motion Vector Detection Unit 244)
For example, the motion vector detection unit 244 divides the reference image and the detection image output from the acquisition unit 210 described above into blocks in units of pixels, performs image matching for each of the divided blocks (block matching), and detects the motion vector (see
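A rough sketch of block matching for a single block is shown below. The sum-of-absolute-differences (SAD) criterion, block size, and search range are illustrative assumptions; the disclosure does not fix a particular matching criterion.

```python
import numpy as np

# Rough sketch of block matching: for one block of the reference image,
# search a small window in the detection image for the offset that minimizes
# the sum of absolute differences (SAD). The returned (dy, dx) offset is the
# motion vector for that block, i.e., the direction and distance it moved.

def match_block(reference, detection, top, left, block=4, search=2):
    ref_blk = reference[top:top + block, left:left + block].astype(np.int32)
    best, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > detection.shape[0] or x + block > detection.shape[1]:
                continue  # candidate block falls outside the image
            cand = detection[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(ref_blk - cand).sum())
            if best is None or sad < best:
                best, best_vec = sad, (dy, dx)
    return best_vec
```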
(Extraction Map Generation Unit 246)
The extraction map generation unit 246 refers to the difference value map (see
(Stationary Subject Image Generation Unit 248)
The stationary subject image generation unit 248 refers to the above extraction maps #11 to #13 (see
(Composite Image Generation Unit 250)
The composite image generation unit 250 combines the plurality of stationary subject images #21 to #23 (see
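A simplified sketch of this combining step is given below. It deliberately omits the sub-pixel placement of each generation image according to its shift, and simply averages the valid (stationary) contributions per pixel; the function name and mask convention are assumptions for illustration.

```python
import numpy as np

# Simplified sketch of combining stationary subject images: each generation
# image contributes only its stationary region (moving-subject pixels are
# masked out), and the composite keeps, per pixel, the mean of the valid
# contributions. Pixels excluded from every image remain missing and must be
# filled later from the reference image.

def combine_stationary(images, masks):
    """images: list of 2-D arrays; masks: 1 where the pixel is stationary."""
    acc = np.zeros(images[0].shape, dtype=np.float64)
    cnt = np.zeros(images[0].shape, dtype=np.float64)
    for img, msk in zip(images, masks):
        acc += img * msk
        cnt += msk
    composite = np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
    return composite, cnt == 0  # composite and map of still-missing pixels
```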
(Output Image Generation Unit 252)
The output image generation unit 252 generates an output image by fitting the reference image #0 into the composite image obtained by the composite image generation unit 250. At this time, regarding the reference image #0 to be combined, it is preferable to perform interpolation processing (for example, a process of interpolating missing color information using the color information of blocks located around the block on the image) and fill in the images of all the blocks beforehand. In the present embodiment, by doing so, even in a case where there is a missing region in all the stationary subject images #21 to #23 (see
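The fitting step itself reduces to filling the still-missing composite pixels from the reference image, which is assumed here to have been interpolated to full resolution beforehand. The helper name is an assumption for illustration.

```python
import numpy as np

# Sketch of the fitting step: pixels that remained missing in the composite
# (because they were moving-subject regions in every generation image) are
# filled from the (pre-interpolated) reference image, so the output image has
# no holes even where the moving subject passed.

def fit_reference(composite, missing, reference):
    out = composite.copy()
    out[missing] = reference[missing]
    return out
```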
As described above, in the present embodiment, the output image is obtained by combining the plurality of stationary subject images #21 to #23 (see
—Motion Compensation Mode—
In a case where the area of the moving subject region is larger than the predetermined threshold value, the generation unit 240 generates an output image in the motion compensation mode. In the motion compensation mode, the generation unit 240 predicts motion of the moving subject based on the plurality of generation images sequentially acquired by the image sensor unit 130, and can generate a high-resolution output image to which motion compensation processing based on the result of the prediction has been applied. In detail, as illustrated in
(Upsampling Unit 260)
The upsampling unit 260 acquires a low-resolution image (in detail, the low-resolution image in the current frame) from the acquisition unit 210 described above, and upsamples the acquired low-resolution image to the same resolution as that of the high-resolution image. Then, the upsampling unit 260 outputs the upsampled high-resolution image to the motion vector detection unit 264, the mask generation unit 268, and the mixing unit 270.
(Buffer Unit 262)
The buffer unit 262 holds the high-resolution image of the immediately preceding frame obtained by the processing immediately before the current frame, and outputs the held image to the motion vector detection unit 264 and the motion compensation unit 266.
(Motion Vector Detection Unit 264)
The motion vector detection unit 264 detects a motion vector from the upsampled high-resolution image from the upsampling unit 260 and the high-resolution image from the buffer unit 262 described above. Note that a method similar to that of the motion vector detection unit 244 described above can be used for the detection of the motion vector by the motion vector detection unit 264. Then, the motion vector detection unit 264 outputs the detected motion vector to the motion compensation unit 266 to be described later.
(Motion Compensation Unit 266)
The motion compensation unit 266 refers to the motion vector from the motion vector detection unit 264 and the high-resolution image of the immediately preceding frame from the buffer unit 262, predicts the high-resolution image of the current frame, and generates a predicted image. Then, the motion compensation unit 266 outputs the predicted image to the mask generation unit 268 and the mixing unit 270.
(Mask Generation Unit 268)
The mask generation unit 268 detects a difference between the upsampled high-resolution image from the upsampling unit 260 and the predicted image from the motion compensation unit 266, and generates a mask that is an image region of the moving subject. A method similar to that of the detection unit 220 described above can be used for the detection of the difference in the mask generation unit 268. Then, the mask generation unit 268 outputs the generated mask to the mixing unit 270.
(Mixing Unit 270)
The mixing unit 270 refers to the mask from the mask generation unit 268, weights the predicted image and the upsampled high-resolution image, and mixes them according to the weighting to generate a mixed image. Then, the mixing unit 270 outputs the generated mixed image to the downsampling unit 272 and the addition unit 278. In the present embodiment, in the generation of the mixed image, the upsampled high-resolution image is preferably weighted so as to be largely reflected in the moving subject image region (mask) with motion; this avoids failure in the final image caused by an error in the prediction by the motion compensation unit 266.
(Downsampling Unit 272)
The downsampling unit 272 downsamples the mixed image from the mixing unit 270 to the same resolution as that of the low-resolution image, and outputs the downsampled low-resolution image to the subtraction unit 274.
(Subtraction Unit 274)
The subtraction unit 274 generates a difference image between the low-resolution image of the current frame from the acquisition unit 210 described above and the low-resolution image from the downsampling unit 272, and outputs the difference image to the upsampling unit 276. The difference image indicates a difference in the predicted image with respect to the low-resolution image of the current frame, that is, an error due to prediction.
(Upsampling Unit 276)
The upsampling unit 276 upsamples the difference image from the subtraction unit 274 to the same resolution as that of the high-resolution image, and outputs the upsampled difference image to the addition unit 278 to be described later.
(Addition Unit 278)
The addition unit 278 adds the mixed image from the mixing unit 270 and the upsampled difference image from the upsampling unit 276, and generates a final high-resolution image of the current frame. The generated high-resolution image is output to the buffer unit 262 described above as an image of the immediately preceding frame in the processing of the next frame, and is also output to another device.
As described above, according to the present embodiment, by adding the error of the low-resolution image based on the prediction with respect to the low-resolution image of the current frame obtained by the imaging module 100 to the mixed image from the mixing unit 270, it is possible to obtain a high-resolution image closer to the high-resolution image of the current frame to be originally obtained.
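The motion compensation mode described above can be sketched end to end for one frame. The sketch below makes deliberate simplifications: upsampling is nearest-neighbor by an assumed factor of 2, the prediction by the motion compensation unit is approximated by reusing the previous high-resolution frame (zero-motion prediction), and the mask weighting of the mixing unit is reduced to a hard switch. All names, factors, and thresholds are illustrative assumptions.

```python
import numpy as np

# End-to-end sketch of the motion compensation mode for one frame.

SCALE = 2  # assumed resolution ratio between high- and low-resolution images

def upsample(img):
    return np.repeat(np.repeat(img, SCALE, axis=0), SCALE, axis=1)

def downsample(img):
    return img[::SCALE, ::SCALE]

def motion_compensation_frame(low_res, prev_high, mask_threshold=16):
    up = upsample(low_res.astype(np.float64))        # upsampling unit 260
    predicted = prev_high.astype(np.float64)         # motion compensation unit 266 (zero-motion sketch)
    mask = np.abs(up - predicted) > mask_threshold   # mask generation unit 268
    mixed = np.where(mask, up, predicted)            # mixing unit 270: favor 'up' on moving regions
    residual = low_res - downsample(mixed)           # subtraction unit 274: prediction error
    return mixed + upsample(residual)                # addition unit 278: corrected high-res frame
```

By construction, downsampling the output reproduces the low-resolution input of the current frame, which reflects the role of the error-feedback path through the subtraction and addition units.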
<2.4. Image Processing Method>
The imaging device 10 according to the present embodiment and the configuration of each unit included in the imaging device 10 have been described in detail above. Next, the image processing method according to the present embodiment will be described. Hereinafter, the image processing method in the present embodiment will be described with reference to
Note that, in the following description, a case where the present embodiment is applied to the pixels 132r that detect red light in the image sensor unit 130 will be described. That is, in the following, a case where a moving subject is detected based on an image obtained by the plurality of pixels 132r that detect red light will be described as an example. In the present embodiment, for example, by detecting a moving subject based on an image obtained by one type of pixel 132 among the three types of pixels 132b, 132g, and 132r that detect blue, green, and red light, an increase in the processing amount for detection can be suppressed. Note that, in the present embodiment, detection of a moving subject may instead be performed based on an image obtained by the pixels 132b, which detect blue light and have an arrangement pattern similar to that of the pixels 132r. Even in this case, the detection can be performed similarly to the case of detection based on the image obtained by the pixels 132r described below.
(Step S101)
First, the imaging device 10 acquires the reference image #0, for example, in phase A (predetermined pixel phase) (see
(Step S103)
As illustrated in
(Step S105)
As illustrated in
In this way, for example, in the example illustrated in
(Step S107)
The imaging device 10 detects a difference between the reference image #0 acquired in Step S101 and the detection image #4 acquired in Step S105. In detail, as illustrated in the lower right part of
In the present embodiment, since the reference image #0 and the detection image #4 are acquired in the same phase (phase A), the form of mixing of the return signal is the same, and thus a difference due to a difference in the form of mixing of the return signal does not occur. Therefore, according to the present embodiment, since it is possible to prevent a stationary subject from being misidentified as a moving subject because of the different mixing forms of the return signal, it is possible to accurately detect the moving subject.
(Step S109)
The imaging device 10 detects a moving subject based on the difference value map generated in Step S107 described above. In detail, the imaging device 10 calculates the area of the imaging region of the moving subject, and compares the area of the moving subject region corresponding to the moving subject with, for example, the area corresponding to 80% of the area of the entire image (predetermined threshold value). In the present embodiment, in a case where the area of the moving subject region is larger than the predetermined threshold value, it is assumed that the imaging device 10 is not fixed. Therefore, the generation mode of the output image is switched from the fitting combination mode to the motion compensation mode. In detail, in a case where the area of the moving subject region is smaller than the predetermined threshold value, the process proceeds to Step S111 of performing the fitting combination mode, and in a case where the area of the moving subject region is larger than the predetermined threshold value, the process proceeds to Step S121 of performing the motion compensation mode.
(Step S111)
Next, the imaging device 10 divides (partitions) the reference image #0 acquired in Step S101 and the detection image #4 acquired in Step S105 into blocks in units of pixels, performs image matching for each divided block (block matching), and detects a motion vector indicating the direction and distance in which a moving subject moves. Then, the imaging device 10 generates a motion vector map as illustrated in the lower left part of
Then, as illustrated in the third row from the top in
(Step S113)
As illustrated in the fourth row from the top in
(Step S115)
As illustrated in the lower part of
(Step S117)
The imaging device 10 determines whether or not the stationary subject images #21 to #23 corresponding to all the generation images #1 to #3 are combined in the output image generated in Step S115 described above. In a case where it is determined that the images related to all the generation images #1 to #3 are combined, the process proceeds to Step S119, and in a case where it is determined that the images related to all the generation images #1 to #3 are not combined, the process returns to Step S113.
(Step S119)
The imaging device 10 outputs the generated output image to, for example, another device and the like, and ends the processing.
(Step S121)
As described above, in the present embodiment, in a case where the area of the moving subject region is larger than the predetermined threshold value, it is assumed that the imaging device 10 is not fixed. Therefore, the generation mode of the output image is switched from the fitting combination mode to the motion compensation mode. In the motion compensation mode, as described above, the motion of the moving subject is predicted based on the plurality of generation images sequentially acquired, and a high-resolution output image to which motion compensation processing based on the result of the prediction has been applied can be generated.
To briefly describe the processing in the motion compensation mode, first, the imaging device 10 upsamples the low-resolution image in the current frame to the same resolution as that of the high-resolution image, and detects the motion vector from the upsampled high-resolution image and the held high-resolution image of the immediately preceding frame. Next, the imaging device 10 refers to the motion vector and the high-resolution image of the immediately preceding frame, predicts the high-resolution image of the current frame, and generates a predicted image. Then, the imaging device 10 detects a difference between the upsampled high-resolution image and the predicted image, and generates a mask that is a region of the moving subject. Further, the imaging device 10 refers to the generated mask, performs weighting on the predicted image and the upsampled high-resolution image, and mixes the predicted image and the upsampled high-resolution image according to the weighting to generate a mixed image. Next, the imaging device 10 downsamples the mixed image to the same resolution as that of the low-resolution image, and generates a difference image between the downsampled mixed image and the low-resolution image of the current frame. Then, the imaging device 10 upsamples the difference image to the same resolution as that of the high-resolution image and adds the upsampled difference image to the above mixed image to generate a final high-resolution image of the current frame. In the motion compensation mode of the present embodiment, by adding the error of the low-resolution image based on the prediction with respect to the low-resolution image of the current frame to the mixed image, it is possible to obtain a high-resolution image closer to the high-resolution image of the current frame to be originally obtained.
Furthermore, the imaging device 10 proceeds to Step S119 described above. According to the present embodiment, by switching the generation mode of the output image, even in a case where it is assumed that the imaging device 10 is not fixed, it is possible to provide a robust image without breakage in the generated image.
As described above, according to the present embodiment, since the reference image #0 and the detection image #4 are acquired in the same phase (phase A), the form of mixing of the return signal is the same, and thus a difference due to a difference in the form of mixing of the return signal does not occur. Therefore, according to the present embodiment, since it is possible to prevent a stationary subject from being misidentified as a moving subject because of the different mixing forms of the return signal, it is possible to accurately detect the moving subject. As a result, according to the present embodiment, it is possible to generate a high-resolution image without breakage in the generated image.
Furthermore, in the present embodiment, by detecting a moving subject using an image by only one type of pixel, namely the pixels 132r (or the pixels 132b or the pixels 132g), among the three types of pixels 132b, 132g, and 132r that detect blue, green, and red light, an increase in the processing amount for detection can be suppressed.
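The single-channel difference detection described above might be sketched as follows (hypothetical function names; the threshold is an assumed noise margin, not a value taken from the disclosure):

```python
import numpy as np

def detect_moving_subject(reference, detection, thresh=0.05):
    """Difference-based detection between two same-phase images (sketch).

    Because the reference image and the detection image are captured
    under the same pixel phase, a per-pixel difference above the noise
    threshold can be attributed to a moving subject rather than to a
    difference in the mixing form of the return signal.
    """
    diff = np.abs(reference.astype(float) - detection.astype(float))
    return diff > thresh  # boolean moving-subject mask

def moving_area_ratio(mask):
    # Fraction of the frame occupied by the moving-subject region, which
    # a comparison unit could test against a predetermined threshold.
    return mask.mean()
```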
<2.5. Modifications>
The details of the first embodiment have been described above. Next, various modifications according to the first embodiment will be described. Note that the following modifications are merely examples of the first embodiment, and the first embodiment is not limited to the following examples.
(Modification 1)
In the present embodiment, in a case where it is desired to more accurately detect a moving subject moving at high speed or moving at changing speed, it is possible to add acquisition of the detection image while acquiring the plurality of generation images. Hereinafter, modification 1 in which the acquisition of the detection image is added will be described with reference to
In the present modification, as illustrated in
Furthermore, in the present modification, in order to detect a moving subject, a difference between the reference image #0 and the detection image #2 is taken, a difference between the reference image #0 and the detection image #4 is taken, and a difference between the reference image #0 and the detection image #6 is taken. Then, in the present modification, by detecting the moving subject using the plurality of differences, the moving subject can be detected without fail even if it moves at high speed or at changing speed.
Furthermore, in the present modification, it is possible to detect a motion vector at the timing of acquiring each of the detection images #2 and #4 with respect to the reference image #0. Therefore, according to the present modification, by using the plurality of motion vectors, it is possible to estimate the position of the moving subject on the image at the timing when each of the generation images #1, #3, and #5 is acquired (Step S111). For example, even in a case where the moving speed of the moving subject changes during the period from the acquisition of the reference image #0 to the acquisition of the last detection image #6, according to the present modification, by using the plurality of motion vectors in each stage, the accuracy of the estimation of the position of the moving subject on the image at the timing when each of the generation images #1, #3, and #5 is acquired can be improved. As a result, according to the present modification, since the estimation accuracy is improved, the extraction map corresponding to each of the generation images #1, #3, and #5 can be generated accurately, and furthermore, the stationary subject image can be generated accurately.
That is, according to this modification, it is possible to more accurately detect a moving subject and accurately generate a stationary subject image from each of the generation images #1, #3, and #5. As a result, according to the present modification, a stationary subject is not misidentified as a moving subject and it is possible to generate a high-resolution image without breakage in the generated image.
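The position estimation from a plurality of motion vectors described above might be sketched, under the assumption of piecewise-linear motion between detection timings, as follows (hypothetical helper; times are expressed as frame indices):

```python
import numpy as np

def estimate_positions(base_pos, mv_times, mvs, gen_times):
    """Estimate the moving subject's position at each generation-image
    timing by piecewise-linear interpolation of the measured motion
    vectors (illustrative sketch only).

    base_pos : (y, x) position of the subject in the reference image #0
    mv_times : times at which the detection images were acquired, e.g. [2, 4, 6]
    mvs      : displacement (dy, dx) from the reference image at each such time
    gen_times: times of the generation images, e.g. [1, 3, 5]
    """
    mvs = np.asarray(mvs, dtype=float)
    times = np.concatenate(([0.0], np.asarray(mv_times, dtype=float)))
    disps = np.vstack(([0.0, 0.0], mvs))  # displacement is zero at the reference
    dy = np.interp(gen_times, times, disps[:, 0])
    dx = np.interp(gen_times, times, disps[:, 1])
    return [(base_pos[0] + y, base_pos[1] + x) for y, x in zip(dy, dx)]
```

Using the motion vector measured at each stage (rather than a single overall vector) is what lets the estimate track a subject whose speed changes between the reference image and the last detection image.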
(Modification 2)
In addition, in the first embodiment described above, the detection image #4 is acquired after the reference image #0 and the generation images #1 to #3 are acquired. However, the present embodiment is not limited to acquiring the detection image #4 at the end. For example, in the present embodiment, by combining motion prediction, the detection image #4 may be acquired while the generation images #1 to #3 are being acquired. In this case, the motion vector of the moving subject is detected using the reference image #0 and the detection image #4, the position of the moving subject in the generation images acquired after the detection image #4 is predicted with reference to the detected motion vector, and the extraction map is generated.
(Modification 3)
Furthermore, in the first embodiment described above, in Step S109, in a case where the area of the moving subject region is larger than the predetermined threshold value, it is assumed that the imaging device 10 is not fixed, and the processing is therefore switched from the fitting combination mode to the motion compensation mode. However, in the present modification, the mode is not switched automatically; instead, the user may set in advance, for each region of the image, in which mode the processing is performed. In this way, according to the present modification, the freedom of expression of the user who is the photographer can be further expanded.
(Modification 4)
Furthermore, in the present embodiment, the moving subject may be detected by an image by the pixels 132g that detect green light instead of the pixels 132r that detect red light. Therefore, a modification of the present embodiment in which a moving subject is detected in an image by the pixels 132g that detect green light will be described below with reference to
For example, in the present embodiment, in a case of the image sensor unit 130 having a Bayer array as illustrated in
Therefore, in the present modification, as illustrated in
Furthermore, in the present modification, as illustrated in
Furthermore, in the present modification, as illustrated in
In detail, as illustrated in
In the first embodiment described above, a moving subject is detected by an image by the pixels 132r that detect red light (alternatively, the pixels 132b or the pixels 132g). By doing so, in the first embodiment, an increase in the processing amount for detection is suppressed. However, the present disclosure is not limited to detection of a moving subject by an image by one type of pixel 132, and detection of a moving subject may be performed by images by the three types of pixels 132b, 132g, and 132r that detect blue, green, and red light. By doing so, the accuracy of the detection of the moving subject can be further improved. Hereinafter, details of such a second embodiment of the present disclosure will be described.
First, details of a processing unit 200a according to the second embodiment of the present disclosure will be described with reference to
In the present embodiment, as described above, a moving subject is detected by each image of the three pixels 132b, 132g, and 132r that detect blue, green, and red light. Therefore, the processing unit 200a of an imaging device 10a according to the present embodiment includes three detection units 220b, 220g, and 220r in a detection unit 220a. In detail, the B detection unit 220b detects a moving subject by an image by the pixels 132b that detect blue light, the G detection unit 220g detects a moving subject by an image by the pixels 132g that detect green light, and the R detection unit 220r detects a moving subject by an image by the pixels 132r that detect red light. Note that, since the method for detecting a moving subject in an image of each color has been described in the first embodiment, a detailed description will be omitted here.
In the present embodiment, since a moving subject is detected by each image by the three types of pixels 132b, 132g, and 132r that detect blue, green, and red light, even a moving subject that is difficult to detect in a particular color can be detected without fail, because detection is performed using images corresponding to a plurality of colors. That is, according to the present embodiment, the accuracy of detection of a moving subject can be further improved.
Note that, in the present embodiment, detection of a moving subject is not limited to being performed by each image by the three pixels 132b, 132g, and 132r that detect blue, green, and red light. For example, in the present embodiment, a moving subject may be detected by images by two types of pixels 132 among the three pixels 132b, 132g, and 132r. In this case, it is possible to suppress an increase in the processing amount for detection while preventing the moving subject from being missed.
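The combination of the per-color detection results could be sketched as follows (a hypothetical OR-combination; the disclosure does not specify how the results of the three detection units are merged):

```python
import numpy as np

def detect_channel(ref, det, thresh=0.05):
    # Per-channel difference detection, as in the first embodiment.
    return np.abs(ref.astype(float) - det.astype(float)) > thresh

def detect_rgb(ref_rgb, det_rgb, thresh=0.05):
    """Combine the B, G and R detection results (sketch): a pixel is
    treated as moving if any channel detects it, so a subject that is
    hard to see in one color is still caught in another."""
    masks = [detect_channel(r, d, thresh) for r, d in zip(ref_rgb, det_rgb)]
    return np.logical_or.reduce(masks)
```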
In the first embodiment described above, the image sensor unit 130 is shifted along the arrangement direction of the pixels 132 by one pixel, but the present disclosure is not limited to shifting by one pixel, and for example, the image sensor unit 130 may be shifted by 0.5 pixels. Note that, in the following description, shifting the image sensor unit 130 by 0.5 pixels means shifting the image sensor unit 130 along the arrangement direction of the pixels by a distance of half of one side of one pixel. Hereinafter, an image processing method in such a third embodiment will be described with reference to
In addition, in the following description, a case where the present embodiment is applied to the pixels 132r that detect red light in the image sensor unit 130 will be described. That is, in the following, a case where a moving subject is detected by an image by the pixels 132r that detect red light will be described as an example. Note that, in the present embodiment, detection of a moving subject may be performed by an image by the pixels 132b that detect blue light or may be performed by an image by the pixels 132g that detect green light, instead of the pixels 132r that detect red light.
In detail, in the present embodiment, as illustrated in
As described above, according to the present embodiment, by finely shifting the image sensor unit 130 by 0.5 pixels, it is possible to acquire more generation images, and thus, it is possible to generate a high-resolution image with higher definition. Note that the present embodiment is not limited to shifting the image sensor unit 130 by 0.5 pixels, and for example, the image sensor unit 130 may be shifted by another shift amount such as by 0.2 pixels (in this case, the image sensor unit 130 is shifted by a distance of ⅕ of one side of one pixel).
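Assuming a Bayer-type array whose pixel phase repeats every 2 pixels, the number of distinct sensor offsets obtainable along one axis for a given shift step might be enumerated as follows (illustrative only; the actual drive sequence of the disclosure is defined by its figures):

```python
def shift_offsets(step, period=2.0):
    """Enumerate the distinct sensor offsets (in pixel units) along one
    axis for a given shift step. With step=1.0 a 2-pixel phase period
    yields 2 offsets; with step=0.5 the same period yields 4, i.e. more
    generation images can be acquired before the phase repeats.
    """
    n = int(round(period / step))
    return [i * step for i in range(n)]
```

For a 0.2-pixel step the same period yields 10 distinct offsets per axis, which is why a finer shift amount allows an even higher-definition output image.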
By the way, in each of the above embodiments, in a case where the time between the timing of acquiring the reference image and the timing of acquiring the last detection image becomes long, there is a case where it is difficult to detect a moving subject because the moving subject does not move at constant speed. For example, a case where it is difficult to detect a moving subject will be described with reference to
In detail, as illustrated in
Therefore, a fourth embodiment of the present disclosure capable of detecting a moving subject even in such a case will be described with reference to
In the present embodiment, as illustrated in
Furthermore, in the present embodiment, in order to detect a moving subject having changing motion, not only the difference between the reference image #0 and the detection image #6 but also the difference between the detection image #4 and the detection image #6 is taken. Specifically, when applied to the example of
In the present embodiment, not only the difference between the reference image #0 and the detection image #6 and the difference between the detection image #4 and the detection image #6 but also the difference between the reference image #0 and the detection image #2 and the difference between the detection image #2 and the detection image #4 may be used. In this case, the moving subject is also detected by the difference between the reference image #0 and the detection image #2 and the difference between the detection image #2 and the detection image #4. As described above, in the present embodiment, the moving subject can be detected without fail by using the plurality of differences.
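The use of a plurality of differences, including differences between detection images acquired in mutually adjacent order, might be sketched as follows (hypothetical; the OR-combination and the threshold are assumptions, not features recited by the disclosure):

```python
import numpy as np

def detect_with_multiple_differences(ref, detections, thresh=0.05):
    """Fourth-embodiment-style detection (sketch): difference the
    reference image against each detection image AND each pair of
    images acquired in mutually adjacent order, then combine, so that
    a subject whose motion changes (e.g. one that returns to its
    initial position by the last detection image) is still caught by
    at least one pair.
    """
    imgs = [ref] + list(detections)
    masks = []
    # Reference image vs. each detection image.
    for det in detections:
        masks.append(np.abs(ref.astype(float) - det.astype(float)) > thresh)
    # Images acquired in mutually adjacent order vs. each other.
    for a, b in zip(imgs, imgs[1:]):
        masks.append(np.abs(a.astype(float) - b.astype(float)) > thresh)
    return np.logical_or.reduce(masks)
```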
In the embodiment described so far, the image sensor unit 130 is shifted along the arrangement direction of the pixels by the drive unit 140. However, in the embodiment of the present disclosure, the optical lens 110 may be shifted instead of the image sensor unit 130. Therefore, as a fifth embodiment of the present disclosure, an embodiment in which an optical lens 110a is shifted will be described.
A configuration of an imaging device 10b according to the present embodiment will be described with reference to
Similarly to the embodiments described above, the imaging module 100a forms an image of incident light from the subject 400 on an image sensor unit 130a and supplies the electric charge generated in the image sensor unit 130a to the processing unit 200 as an imaging signal. In detail, as illustrated in FIG. 21, the imaging module 100a includes the optical lens 110a, the shutter mechanism 120, the image sensor unit 130a, and a drive unit 140a. Hereinafter, details of each functional unit included in the imaging module 100a will be described.
Similarly to the embodiments described above, the optical lens 110a can collect light from the subject 400 and form an optical image on the plurality of pixels 132 (see
Furthermore, the embodiment of the present disclosure is not limited to shifting the image sensor unit 130 or shifting the optical lens 110a, and other blocks (the shutter mechanism 120, the imaging module 100, and the like) may be shifted as long as the image sensor unit 130 can sequentially acquire the reference image, the plurality of generation images, and the detection image.
As described above, according to each embodiment of the present disclosure, it is possible to more accurately determine whether or not a moving subject is included in an image. In detail, according to each embodiment, since the reference image #0 and the detection image #4 are acquired in the same phase (phase A), the form of mixing of the return signal is the same, and a difference does not occur even for an image of a stationary subject. Therefore, according to each embodiment, a stationary subject is not misidentified as a moving subject because of differing mixing forms of the return signal, and it is possible to accurately detect the moving subject. As a result, according to each embodiment, it is possible to generate a high-resolution image without breakage in the generated image.
The information processing device such as the processing device according to each embodiment described above is realized by a computer 1000 having a configuration as illustrated in
The CPU 1100 operates based on the program stored in the ROM 1300 or the HDD 1400, and controls each unit. For example, the CPU 1100 develops a program stored in the ROM 1300 or the HDD 1400 in the RAM 1200, and executes processing corresponding to various programs.
The ROM 1300 stores a boot program such as a basic input output system (BIOS) executed by the CPU 1100 when the computer 1000 is activated, a program depending on hardware of the computer 1000, and the like.
The HDD 1400 is a computer-readable recording medium that non-transiently records a program executed by the CPU 1100, data used by such a program, and the like. Specifically, the HDD 1400 is a recording medium that records an image processing program according to the present disclosure as an example of the program data 1450.
The communication interface 1500 is an interface for the computer 1000 to connect to an external network 1550 (for example, the Internet). For example, the CPU 1100 receives data from another device or transmits data generated by the CPU 1100 to another device via the communication interface 1500.
The input/output interface 1600 is an interface for connecting an input/output device 1650 and the computer 1000. For example, the CPU 1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600. In addition, the CPU 1100 transmits data to an output device such as a display, a speaker, or a printer via the input/output interface 1600. In addition, the input/output interface 1600 may function as a media interface that reads a program and the like recorded in a predetermined recording medium (medium). The medium is, for example, an optical recording medium such as a digital versatile disc (DVD) or a phase change rewritable disk (PD), a magneto-optical recording medium such as a magneto-optical disk (MO), a tape medium, a magnetic recording medium, a semiconductor memory, or the like.
For example, in a case where the computer 1000 functions as the processing unit 200 according to the embodiment of the present disclosure, the CPU 1100 of the computer 1000 executes the image processing program loaded on the RAM 1200 to implement the functions of the detection unit 220, the comparison unit 230, the generation unit 240, and the like. In addition, the HDD 1400 stores the image processing program and the like according to the present disclosure. Note that the CPU 1100 reads the program data 1450 from the HDD 1400 and executes it, but as another example, these programs may be acquired from another device via the external network 1550.
In addition, the information processing device according to the present embodiment may be applied to a system including a plurality of devices on the premise of connection to a network (or communication between devices), such as cloud computing. That is, the information processing device according to the present embodiment described above can also be realized as an information processing system that performs processing related to the image processing method according to the present embodiment by a plurality of devices, for example.
Note that the embodiment of the present disclosure described above can include, for example, a program for causing a computer to function as the information processing device according to the present embodiment, and a non-transitory tangible medium on which the program is recorded. In addition, the program may be distributed via a communication line (including wireless communication) such as the Internet.
In addition, each step in the image processing of each embodiment described above may not necessarily be processed in the described order. For example, each step may be processed in an appropriately changed order. In addition, each step may be partially processed in parallel or individually instead of being processed in time series. Furthermore, the processing method of each step may not necessarily be processed according to the described method, and may be processed by another method by another functional unit, for example.
Although the preferred embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, the technical scope of the present disclosure is not limited to such examples. It is obvious that a person having ordinary knowledge in the technical field of the present disclosure can conceive various changes or modifications within the scope of the technical idea described in the claims, and it is naturally understood that these also belong to the technical scope of the present disclosure.
In addition, the effects described in the present specification are merely illustrative or exemplary, and are not restrictive. That is, the technology according to the present disclosure can exhibit other effects obvious to those skilled in the art from the description of the present specification together with or instead of the above effects.
Note that the present technology can also have the configuration below.
(1) An imaging device comprising:
an imaging module including an image sensor in which a plurality of pixels for converting light into an electric signal is arranged;
a drive unit that moves a part of the imaging module in a manner that the image sensor can sequentially acquire a reference image under a predetermined pixel phase, a plurality of generation images, and a detection image under the predetermined pixel phase in this order; and
a detection unit that detects a moving subject based on a difference between the reference image and the detection image.
(2) The imaging device according to (1), wherein
the drive unit moves the image sensor.
(3) The imaging device according to (1), wherein
the drive unit moves an optical lens included in the imaging module.
(4) The imaging device according to any one of (1) to (3) further comprising:
a generation unit that generates an output image using the plurality of generation images based on a result of detection of the moving subject.
(5) The imaging device according to (4) further comprising:
a comparison unit that compares an area of a moving subject region corresponding to the moving subject with a predetermined threshold value, wherein
the generation unit changes a generation mode of the output image based on a result of the comparison.
(6) The imaging device according to (5), wherein
in a case where the area of the moving subject region is smaller than the predetermined threshold value,
the generation unit
combines a plurality of stationary subject images obtained by excluding the moving subject from each of the plurality of generation images to generate a composite image, and
generates the output image by fitting the reference image into the composite image.
(7) The imaging device according to (6), wherein
the generation unit includes
a difference detection unit that detects the difference between the reference image and the detection image,
a motion vector detection unit that detects a motion vector of the moving subject based on the reference image and the detection image,
an extraction map generation unit that estimates a position of the moving subject on an image at a timing when each of the generation images is acquired based on the difference and the motion vector, and generates a plurality of extraction maps including the moving subject disposed at the estimated position,
a stationary subject image generation unit that generates the plurality of stationary subject images by subtracting the corresponding extraction map from the plurality of generation images other than the reference image,
a composite image generation unit that combines the plurality of stationary subject images to generate the composite image, and
an output image generation unit that generates the output image by fitting the reference image into the composite image.
(8) The imaging device according to (5), wherein
in a case where the area of the moving subject region is larger than the predetermined threshold value,
the generation unit
predicts a motion of the moving subject based on the plurality of generation images sequentially acquired by the image sensor, and
generates the output image subjected to motion compensation processing based on a result of prediction.
(9) The imaging device according to any one of (1) to (8), wherein
the drive unit moves a part of the imaging module in a manner that the image sensor can sequentially acquire the plurality of generation images under a pixel phase other than the predetermined pixel phase.
(10) The imaging device according to any one of (1) to (8), wherein
the drive unit moves a part of the imaging module in a manner that the image sensor can repeatedly sequentially acquire the generation image and the detection image in this order.
(11) The imaging device according to (10), wherein
the detection unit detects the moving subject based on a difference between the reference image and each of the plurality of detection images.
(12) The imaging device according to (10), wherein
the detection unit detects the moving subject based on a difference between the plurality of the detection images acquired in a mutually adjacent order.
(13) The imaging device according to any one of (1) to (12), wherein
the plurality of pixels includes at least a plurality of first pixels, a plurality of second pixels, and a plurality of third pixels having different arrangements in the image sensor, and
the detection unit detects the moving subject based on a difference between the reference image and the detection image by the plurality of first pixels.
(14) The imaging device according to (13), wherein
a number of the plurality of first pixels in the image sensor is smaller than a number of the plurality of second pixels in the image sensor.
(15) The imaging device according to (13), wherein
a number of the plurality of first pixels in the image sensor is larger than a number of the plurality of second pixels in the image sensor, and is larger than a number of the plurality of third pixels in the image sensor.
(16) The imaging device according to (15), wherein
the detection image is included in the plurality of generation images.
(17) The imaging device according to any one of (1) to (8), wherein
the plurality of pixels includes at least a plurality of first pixels, a plurality of second pixels, and a plurality of third pixels having different arrangements in the image sensor, and
the detection unit includes
a first detection unit that detects the moving subject based on a difference between the reference image and the detection image by the plurality of first pixels, and
a second detection unit that detects the moving subject based on a difference between the reference image and the detection image by the plurality of second pixels.
(18) The imaging device according to (17), wherein
the detection unit further includes a third detection unit that detects the moving subject based on a difference between the reference image and the detection image by the plurality of third pixels.
(19) The imaging device according to any one of (1) to (8), wherein
the drive unit moves a part of the imaging module along an arrangement direction of the plurality of pixels by one pixel in a predetermined plane.
(20) The imaging device according to any one of (1) to (8), wherein
the drive unit moves a part of the imaging module along an arrangement direction of the plurality of pixels by 0.5 pixels in a predetermined plane.
(21) An image processing device comprising:
an acquisition unit that sequentially acquires a reference image under a predetermined pixel phase, a plurality of generation images, and a detection image under the predetermined pixel phase obtained by an image sensor in which a plurality of pixels for converting light into an electric signal is arranged, in this order; and
a detection unit that detects a moving subject based on a difference between the reference image and the detection image.
(22) An image processing method comprising:
sequentially acquiring a reference image under a predetermined pixel phase, a plurality of generation images, and a detection image under the predetermined pixel phase obtained by an image sensor in which a plurality of pixels for converting light into an electric signal is arranged, in this order; and
detecting a moving subject based on a difference between the reference image and the detection image.
(23) An imaging device comprising:
an image sensor in which a plurality of pixels for converting light into an electric signal is arranged;
a drive unit that moves the image sensor in a manner that the image sensor can sequentially acquire a reference image, a plurality of generation images, and a detection image in this order; and
a detection unit that detects a moving subject based on a difference between the reference image and the detection image, wherein
in the image sensor,
a position of at least a part of the plurality of pixels of a predetermined type at a time of acquiring the reference image overlaps a position of at least a part of the plurality of pixels of the predetermined type at a time of acquiring the detection image.
Number | Date | Country | Kind
---|---|---|---
2019-159717 | Sep 2019 | JP | national
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2020/028133 | 7/20/2020 | WO |