The present application claims priority from Japanese application JP2023-083130 filed on May 19, 2023, the content of which is hereby incorporated by reference into this application.
The present invention relates to a depth map generation device, a depth map generation method, and a non-transitory information storage medium storing a program for generating a depth map.
A paper described below proposes a technique for generating a depth map from an image captured through a coded aperture. In the paper, two coded apertures having different aperture patterns (shapes of a light transmitting region and a light blocking region) are used. The two coded apertures are used in combination to prevent a frequency band in which a power spectrum is zero from occurring in a filter (a filter for generating a restored image) used in a process for generating a depth map. “C. Zhou, S. Lin, and S. Nayar: Coded Aperture Pairs for Depth from Defocus, IEEE international conference on computer vision, 2009”
In a process for generating a depth map, noise determination is necessary for determining whether a dot (a pixel) in the depth map is noise or a dot indicating the distance to a subject. A dot determined as noise needs to be removed from the depth map. However, when the same noise removal processing is applied to the entire region of an image in which a plurality of regions having different characteristics, such as a region with low contrast and a region with high contrast, are mixed, the noise removal is sometimes not correctly executed.
A depth map generation device proposed in the present disclosure is a depth map generation device for generating a depth map from a captured image captured through a coded aperture. The generation device includes: a depth calculating unit configured to calculate a depth for each of a plurality of unit regions forming the depth map; a dividing unit configured to divide an image region corresponding to the captured image into a plurality of partial regions including at least a first partial region and a second partial region; and a noise removing unit configured to remove, using a first parameter, noise from depths calculated for a plurality of unit regions included in the first partial region, and to remove, using a second parameter, noise from depths calculated for a plurality of unit regions included in the second partial region.
A depth map generation method proposed in the present disclosure is a method of generating a depth map from a captured image captured through a coded aperture. The generation method includes: a depth calculating step of calculating a depth for each of a plurality of unit regions forming the depth map; a dividing step of dividing an image region corresponding to the captured image into a plurality of partial regions including at least a first partial region and a second partial region; and a noise removing step of removing, using a first parameter, noise from depths calculated for a plurality of unit regions included in the first partial region, and removing, using a second parameter, noise from depths calculated for a plurality of unit regions included in the second partial region.
A program proposed in the present disclosure is a program for causing a computer to function as a device for generating a depth map from a captured image captured through a coded aperture. The program causes the computer to function as: depth calculating means for calculating a depth for each of a plurality of unit regions forming the depth map; dividing means for dividing an image region corresponding to the captured image into a plurality of partial regions including at least a first partial region and a second partial region; and noise removing means for removing, using a first parameter, noise from depths calculated for a plurality of unit regions included in the first partial region, and for removing, using a second parameter, noise from depths calculated for a plurality of unit regions included in the second partial region.
With the depth map generation device, the depth map generation method, and the program proposed in the present disclosure, it is possible to prevent a problem in which an estimated depth that should be displayed is erroneously removed from a depth map.
A depth map generation device, a depth map generation method, and a program proposed in the present disclosure are explained below.
The imaging element 13 is an image sensor such as a CMOS (Complementary Metal Oxide Semiconductor) sensor or a CCD (Charge Coupled Device).
The liquid crystal panel 14 includes a plurality of pixels. The liquid crystal panel 14 includes a coded aperture 14a in a part thereof. A control unit 11 explained below drives liquid crystal of the coded aperture 14a to form an aperture pattern specified in advance. The aperture pattern is explained in detail below.
As illustrated in
The control unit 11 includes at least one processor such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). Image data acquired by the imaging element 13 is provided to the control unit 11. The control unit 11 generates, using the image data, a depth map indicating the distance to a subject.
The storage unit 12 includes a main storage unit and an auxiliary storage unit. For example, the main storage unit is a volatile memory such as a RAM (Random Access Memory). The auxiliary storage unit is a nonvolatile memory such as a ROM (Read Only Memory), an EEPROM (Electrically Erasable and Programmable Read Only Memory), a flash memory, or a hard disk. The control unit 11 executes a program stored in the storage unit 12 to control the liquid crystal panel 14 and calculate a depth (the distance to the subject). Processing executed by the control unit 11 is explained below. The generation device 10 may be a portable device such as a smartphone or a tablet PC (Personal Computer) or may be a personal computer connected to a camera.
The input unit 16 may be a touch sensor attached to an image display device that displays a captured image. The input unit 16 may be a keyboard or a pointing device such as a mouse. The input unit 16 inputs a signal corresponding to operation of a user to the control unit 11.
The aperture control unit 11b controls liquid crystal of the coded aperture 14a of the liquid crystal panel 14 to form an aperture pattern specified in advance.
In aperture patterns B1 and B2 illustrated in
The image acquiring unit 11a controls the imaging element 13 and the coded aperture 14a and continuously captures two images f1 and f2 using the two aperture patterns B1 and B2 (in the following explanation, the images f1 and f2 are referred to as captured images). At this time, the interval between the imaging by the first aperture pattern B1 and the imaging by the second aperture pattern B2 may be several hundred milliseconds or several tens of milliseconds.
Note that the number of aperture patterns formed by the aperture control unit 11b may be larger than two. In this case, the image acquiring unit 11a may continuously capture as many images as the number of aperture patterns. The aperture control unit 11b switches among the plurality of aperture patterns in order in synchronization with light reception by the imaging element 13.
The depth calculating unit 11c generates a depth map from the captured images f1 and f2 acquired by the image acquiring unit 11a. The depth calculating unit 11c calculates, for each of a plurality of unit regions forming the depth map, depths (the distances from the imaging system N to the subject) of the captured images f1 and f2. The unit region of the depth map may be formed of, for example, one pixel in the captured images f1 and f2. Alternatively, the unit region of the depth map may be larger than one pixel in the captured images; for example, a plurality of adjacent pixels (for example, 2×2 pixels) may form one unit region.
First, the depth calculating unit 11c performs two-dimensional Fourier transform of the captured images f1 and f2 (S101). In the following explanation, frequency characteristics of the captured images f1 and f2 obtained by the two-dimensional Fourier transform are represented as F1 and F2. That is, F1 and F2 are the results of the two-dimensional Fourier transform executed for the captured images f1 and f2, respectively. Note that, if a high frequency component is included in an image, the influence of noise is sometimes excessively large when a “reference depth restored image” is calculated. Therefore, high frequency components may be removed from the frequency characteristics F1 and F2. For example, a low pass filter that passes only frequencies equal to or lower than half of the sampling frequency (the Nyquist frequency) may be used. The frequency characteristics F1 and F2 from which the high frequency components have been removed may be used in the processing explained below.
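As a concrete illustration of S101, the following minimal sketch in Python/NumPy computes the frequency characteristics F1 and F2 and optionally suppresses high frequency components; the cutoff value is an illustrative assumption, not part of the disclosure.

```python
import numpy as np

def to_frequency(image, cutoff=0.25):
    """Two-dimensional Fourier transform of a captured image (S101).

    When cutoff is given (in cycles per pixel), frequency components above
    the cutoff are removed to reduce the influence of noise; the value 0.25
    is an illustrative choice, not part of the disclosure.
    """
    F = np.fft.fft2(image)
    if cutoff is not None:
        fy = np.fft.fftfreq(image.shape[0])[:, None]  # vertical frequencies
        fx = np.fft.fftfreq(image.shape[1])[None, :]  # horizontal frequencies
        F = F * ((np.abs(fy) <= cutoff) & (np.abs(fx) <= cutoff))
    return F

# Frequency characteristics of the two captured images:
# F1 = to_frequency(f1); F2 = to_frequency(f2)
```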
In the generation device 10, a plurality of point spread functions (PSFs) are prepared, which respectively correspond to a plurality of discretely defined reference depths. The reference depths are candidate values of the distance to the subject, such as 100 mm, 300 mm, and 700 mm. Each PSF has a shape corresponding to the aperture patterns B1 and B2 of the coded aperture 14a (that is, expressing the aperture patterns B1 and B2). Each PSF has a size corresponding to the reference depth corresponding thereto. Specifically, the size of the PSF decreases as the reference depth increases. The storage unit 12 stores frequency characteristics obtained by performing two-dimensional Fourier transform of the plurality of PSFs. The depth calculating unit 11c acquires, as the PSFs, these frequency characteristics from the storage unit 12. Such a frequency characteristic is also referred to as an optical transfer function (OTF).
The depth calculating unit 11c generates a restored image corresponding to the reference depth for each of the plurality of reference depths, by using the captured images f1 and f2 and the PSFs (S102). In the following explanation, the restored image is referred to as a “reference depth restored image”. Specifically, the depth calculating unit 11c calculates, by using the following Math. 1, a frequency characteristic of the reference depth restored image corresponding to the frequency characteristics F1 and F2 of the captured images f1 and f2 and the frequency characteristics of the PSFs.
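One plausible form of such a generalized Wiener filter, following the coded aperture pair formulation in the paper by Zhou et al. cited above, is sketched below; the regularization term |C|^2 is an assumption, and the exact expression of Math. 1 may differ:

\[ F_{0\_d} = \frac{F_1\, \overline{K_{1\_d}} + F_2\, \overline{K_{2\_d}}}{\left|K_{1\_d}\right|^2 + \left|K_{2\_d}\right|^2 + \left|C\right|^2} \]

where the overline denotes the complex conjugate and all operations are performed element-wise in the frequency domain.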
Math. 1 is a Wiener filter generalized to be applicable to the two aperture patterns B1 and B2. In Math. 1, the characters mean the following elements.
The sizes of the PSFs decrease as the distance (the depth) from the imaging system N to the subject increases. For that reason, the frequency characteristics K1_d and K2_d of the PSFs are also defined according to the distance from the imaging system N to the subject. In Math. 1, the subscript “d” added to K1 and K2 corresponds to a reference depth such as 100 mm or 300 mm. For example, the functions K1_100 and K2_100 are the frequency characteristics (optical transfer functions) of the point spread functions for a subject located 100 mm away from the imaging system N. The number of reference depths may be larger than two and may be, for example, ten, twenty, or thirty.
In S102, the depth calculating unit 11c calculates, using Math. 1, a frequency characteristic F0_d of the reference depth restored image corresponding to the two frequency characteristics F1 and F2 of the captured images f1 and f2 and the frequency characteristics K1_d and K2_d of the PSFs. The depth calculating unit 11c calculates, for each of the plurality of reference depths (d), the frequency characteristic F0_d of the reference depth restored image. For example, the depth calculating unit 11c calculates a frequency characteristic F0_100 of the reference depth restored image based on the frequency characteristics K1_100 and K2_100 and the frequency characteristics F1 and F2 of the captured images f1 and f2. Further, the depth calculating unit 11c calculates a frequency characteristic F0_300 of the reference depth restored image based on the frequency characteristics K1_300 and K2_300 and the frequency characteristics F1 and F2 of the captured images f1 and f2. The depth calculating unit 11c executes the same calculation for the other frequency characteristics K1_700 and K2_700, K1_1000 and K2_1000, and the like.
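A minimal Python/NumPy sketch of S102 follows, assuming the Wiener-type formula sketched above; the regularization constant c2 and the dictionary-based storage of the OTFs are illustrative assumptions.

```python
import numpy as np

def restore_for_reference_depths(F1, F2, otfs, c2=1e-3):
    """Compute the frequency characteristic F0_d of the reference depth
    restored image for every reference depth d (S102).

    otfs: dict mapping a reference depth d (e.g. 100, 300, 700 in mm)
          to a pair (K1_d, K2_d) of OTF arrays with the same shape as F1.
    """
    F0 = {}
    for d, (K1, K2) in otfs.items():
        numerator = F1 * np.conj(K1) + F2 * np.conj(K2)
        denominator = np.abs(K1) ** 2 + np.abs(K2) ** 2 + c2  # c2: noise term (assumption)
        F0[d] = numerator / denominator
    return F0
```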
According to Math. 1, a subject whose distance from the imaging system N is equal to the reference depth appears without blur in the reference depth restored image. For example, a subject placed 100 mm away from the imaging system N appears without blur in the reference depth restored image obtained with the frequency characteristics K1_100 and K2_100. On the other hand, in a reference depth restored image obtained with the frequency characteristics K1_d and K2_d of PSFs corresponding to other reference depths such as 300 mm or 700 mm, blur (deviation of pixel values) appears in the same subject. The blur appears more strongly as the difference between the actual depth of the subject and the reference depth increases. Therefore, in the following processing, the depth calculating unit 11c calculates the distance (the depth) to the subject by using the degree of the blur.
The depth calculating unit 11c calculates a deviation value map Md based on the frequency characteristic F0_d of the reference depth restored image and the frequency characteristics F1 and F2 of the captured images f1 and f2 (S103). The deviation value map Md is calculated for each of the plurality of reference depths (d). In the deviation value map Md, a deviation degree (a deviation value) between the reference depth (d) and the actual depth is indicated at each pixel (each unit region of the depth map).
The depth calculating unit 11c calculates the deviation value map Md, for example, referring to the following Math. 2.
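One plausible form of the deviation value computation, modeled on the reprojection error used for coded aperture pairs in the cited paper (a sketch under that assumption; the exact expression of Math. 2 may differ), compares the captured images with the reference depth restored image re-blurred by the PSFs:

\[ M_d(x, y) = \sum_{i=1}^{2} \left( f_i(x, y) - \left( f_{0\_d} * k_{i\_d} \right)(x, y) \right)^2 \]

where f_{0_d} is the reference depth restored image obtained from F0_d by inverse Fourier transform, k_{i_d} are the PSFs for the reference depth (d), and * denotes convolution.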
In Math. 2, the characters mean the following elements.
F1: A frequency characteristic of the captured image f1 obtained by the first aperture pattern B1
F2: A frequency characteristic of the captured image f2 obtained by the second aperture pattern B2
As explained above, in the processing in S102, the depth calculating unit 11c generates a plurality of reference depth restored images (more specifically, the frequency characteristics F0_d thereof) from the captured images f1 and f2 (more specifically, the frequency characteristics F1 and F2 thereof), respectively using the plurality of frequency characteristics K1_d and K2_d expressing the PSFs defined for the reference depths (d). In S103, the depth calculating unit 11c calculates a deviation value wd between the reference depth (d) and the actual depth, by using the plurality of reference depth restored images represented by the frequency characteristics F0_d and the frequency characteristics F1 and F2 expressing the captured images f1 and f2.
The depth calculating unit 11c calculates a depth (the distance from the imaging system N to the subject) for each pixel (each unit region of the depth map), by referring to the plurality of deviation value maps Md (S104).
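As an illustration of S104, the following minimal Python/NumPy sketch selects, for every pixel, the reference depth whose deviation value map Md computed in S103 gives the smallest deviation value; the array shapes and names are assumptions.

```python
import numpy as np

def estimate_depth_map(deviation_maps):
    """deviation_maps: dict mapping a reference depth d to a 2-D array Md
    holding the deviation value wd for every pixel (unit region)."""
    depths = sorted(deviation_maps.keys())
    stack = np.stack([deviation_maps[d] for d in depths], axis=0)  # (num_depths, H, W)
    best = np.argmin(stack, axis=0)                                # index of the minimum wd
    return np.asarray(depths)[best]                                # estimated depth d0 per pixel
```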
The deviation value wd is the smallest at the reference depth (d) that is the same as the actual depth or close to the actual depth. In the example illustrated in
Note that the depth map calculation processing is not limited to the processing explained above. In the processing explained above, the depth calculating unit 11c sets, as the estimated depth d0 of the pixel, the reference depth (d) of the deviation value map Md that minimizes the deviation value wd. In contrast to this, the depth calculating unit 11c may calculate a function indicating a relation between the depth and the deviation value wd, and then calculate a minimum value by using the function. The depth calculating unit 11c may calculate, as the estimated depth d0 of the pixel, the depth at which the minimum value is obtained.
This processing by the depth calculating unit 11c can be performed, for example, as explained below. The depth calculating unit 11c fits, for example, a cubic function to the relation between the reference depth (d) and the deviation value wd illustrated in
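Written out explicitly with arbitrary coefficient names introduced here for illustration, such a fitted cubic function (Math. 3) can be expressed as:

\[ W(d) = a d^3 + b d^2 + c d + e \]

where the coefficients a, b, c, and e are determined by fitting to the pairs of reference depths (d) and deviation values wd.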
The depth calculating unit 11c calculates a minimum value of Math. 3. That is, the depth calculating unit 11c solves ∂W/∂d = 0 to calculate the depth that minimizes the function W(d). The depth calculating unit 11c sets, as the estimated depth d0, the depth calculated in this way. With the processing explained above, it is possible to increase the depth resolution without increasing the number of reference depths (the number of PSFs).
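A minimal Python/NumPy sketch of this refinement for a single pixel follows; the fitting and root-selection details are illustrative assumptions.

```python
import numpy as np

def refine_depth(ref_depths, deviations):
    """Fit a cubic W(d) to (reference depth, deviation value) pairs for one
    pixel and return the depth minimizing W(d)."""
    coeffs = np.polyfit(ref_depths, deviations, 3)     # [a, b, c, e] for W(d) = a d^3 + b d^2 + c d + e
    roots = np.roots(np.polyder(coeffs))               # stationary points of W(d), i.e. dW/dd = 0
    roots = roots[np.isreal(roots)].real
    # keep stationary points inside the range of reference depths
    roots = roots[(roots >= min(ref_depths)) & (roots <= max(ref_depths))]
    if roots.size == 0:
        return ref_depths[int(np.argmin(deviations))]  # fall back to the best reference depth
    return float(roots[np.argmin(np.polyval(coeffs, roots))])  # root giving the smallest W(d)
```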
With Math. 2 explained above, the estimated depth d0 is calculated at positions where blur caused by the convolution integral based on the PSFs appears. For example, the estimated depth d0 is calculated at the boundary between the outer edge of the subject and the background (that is, the estimated depth d0 is calculated at positions where a change in pixel values appears). On the other hand, it is difficult to estimate an accurate depth for a region where blur caused by the convolution integral based on the PSFs does not appear. In such a region, the change in the deviation value wd depending on the reference depth is small because the change in pixel values is small (that is, the contrast in the pixels is low). In other words, the estimated depth d0 obtained for a pixel having a small change in the deviation value wd is likely to be noise. Therefore, the depth calculating unit 11c determines such an estimated depth d0 as noise and removes the estimated depth d0 from the depth map obtained in S104. This processing of the depth calculating unit 11c is performed, for example, as explained below.
The depth calculating unit 11c calculates, based on the deviation value wd calculated for each pixel (the unit region of the depth map), an accuracy evaluation value indicating accuracy of the estimated depth d0 obtained for the pixel. For example, the depth calculating unit 11c calculates the accuracy evaluation value based on the widths of changes in a plurality of deviation values wd calculated for each pixel. The depth calculating unit 11c calculates accuracy evaluation values for all the pixels. The accuracy evaluation value is represented by, for example, Math. 4 explained below.
The change amount Δw is the width of the change in the deviation value wd. The change amount Δw is, for example, as illustrated in
In general, a wrong estimation result is often obtained for a subject located at a short distance due to noise or the like. Therefore, as shown in Math. 4, the accuracy evaluation value β may be a result of weighting the change amount Δw by the estimated depth d0 so that the change amount Δw is evaluated as a small amount for a subject at a short distance (so that the corresponding estimated depth is likely to be removed as noise).
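One plausible weighting consistent with this description (a sketch only; the exact expression of Math. 4 may differ) is:

\[ \beta = \Delta w \times d_0 \]

in which the estimated depth d0 acts as a weight, so that β becomes small for a subject at a short distance and the corresponding estimated depth is more likely to be determined as noise.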
The depth calculating unit 11c determines, as noise, the estimated depth d0 obtained for a pixel having the accuracy evaluation value β lower than a noise determination threshold and removes the estimated depth d0 from the depth map. In other words, the depth calculating unit 11c displays, on the depth map, only the estimated depth d0 obtained for a pixel having the accuracy evaluation value β higher than the noise determination threshold.
When the processing illustrated in
Therefore, the depth calculating unit 11c divides an image region corresponding to the captured image into a plurality of partial regions. The depth calculating unit 11c executes noise removal processing for the plurality of partial regions respectively using a plurality of parameters different from one another. In the following explanation, the noise removal processing is specifically explained.
The dividing unit 11d divides an image region corresponding to the captured images f1 and f2 into a plurality of partial regions. For example, the dividing unit 11d divides the depth map calculated in S104 (before noise removal) into a plurality of partial regions (S105).
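As a concrete illustration of S105, the following sketch divides the image region into a regular grid of partial regions; the 3×5 grid yielding fifteen regions A1 to A15 is an assumption inferred from the reference signs, and the actual division may differ.

```python
def divide_into_partial_regions(height, width, rows=3, cols=5):
    """Return a list of partial regions as (top, bottom, left, right) bounds,
    covering the image region corresponding to the captured images."""
    regions = []
    for r in range(rows):
        for c in range(cols):
            top, bottom = r * height // rows, (r + 1) * height // rows
            left, right = c * width // cols, (c + 1) * width // cols
            regions.append((top, bottom, left, right))
    return regions  # e.g. A1 ... A15 for a 3x5 grid
```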
The noise removing unit 11e removes, using a first parameter set for a first partial region A1 (see
The processing of the noise removing unit 11e is performed, for example, as explained below. After the dividing unit 11d has executed the division processing, the noise removing unit 11e first sets a parameter for noise determination based on a plurality of accuracy evaluation values β calculated for the pixels in each partial region (S106). For example, the noise removing unit 11e calculates a maximum value of the accuracy evaluation values β calculated for the plurality of pixels in each partial region. The noise removing unit 11e multiplies the maximum value of the accuracy evaluation values β by a predetermined percentage (A %) and sets the result of the multiplication (β×A %) as the parameter (the noise determination threshold) used in noise determination for the partial region. The noise removing unit 11e calculates a noise determination threshold for each of the plurality of partial regions A1 to A15 (see
Subsequently, the noise removing unit 11e compares the parameter (the noise determination threshold) set for each partial region with the accuracy evaluation value β of each pixel included in the partial region and removes, from the depth map, the estimated depth d0 determined as noise based on a result of the comparison (S107). For example, the noise removing unit 11e determines, as noise, the estimated depth d0 calculated for a pixel having an accuracy evaluation value β lower than the noise determination threshold and removes the estimated depth d0 from the depth map. The noise removing unit 11e executes the noise removal processing in S107 for each of the plurality of partial regions A1 to A15 (see
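Steps S106 and S107 can be sketched as follows in Python/NumPy; marking removed depths with np.nan and the 50% percentage are illustrative assumptions.

```python
import numpy as np

def remove_noise_per_region(depth_map, beta_map, regions, percentage=0.5):
    """For each partial region, set the noise determination threshold to
    (maximum accuracy evaluation value in the region) x percentage (S106),
    then remove estimated depths whose beta is below the threshold (S107)."""
    result = depth_map.astype(float).copy()
    for top, bottom, left, right in regions:
        beta = beta_map[top:bottom, left:right]
        threshold = beta.max() * percentage               # per-region noise determination threshold
        noise = beta < threshold
        result[top:bottom, left:right][noise] = np.nan    # removed from the depth map
    return result
```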
As explained above, the noise removing unit 11e determines whether the estimated depths d0 of the pixels in each partial region are noise, based on the accuracy evaluation value β calculated for each pixel within the plurality of partial regions and the parameter (the noise determination threshold) set for each of the partial regions A1 to A15.
Note that the processing of the noise removing unit 11e is not limited to the example explained here. For example, the parameter set for each of the partial regions A1 to A15 may not be the noise determination threshold.
The noise removing unit 11e may multiply the accuracy evaluation value β by the parameter set for each partial region and compare the weighted accuracy evaluation value with a noise determination threshold Nth common to the plurality of partial regions. When the weighted accuracy evaluation value is lower than the noise determination threshold Nth, the noise removing unit 11e may remove the estimated depth d0 of the pixel from the depth map. The noise removing unit 11e may execute the processing explained above for each of the plurality of partial regions A1 to A15.
In this case, the parameters α1 to α15 applied to the partial regions A1 to A15 may be set based on the contrasts of the partial regions. For example, for a partial region having small contrast (for example, a region where the entire partial region is dark), a relatively large parameter may be used. More specifically, the parameter for the partial region having small contrast may be set so that the maximum of the change amounts Δw of the deviation values wd of the plurality of pixels in the partial region having relatively small contrast coincides with the maximum of the change amounts Δw of the deviation values wd of the pixels in a partial region having relatively large contrast. With this processing as well, it is possible to prevent the problem in which the estimated depth d0 that should be displayed is erroneously excluded from the depth map.
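A Python/NumPy sketch of this alternative, in which per-region parameters α1 to α15 normalize the change amounts Δw before comparison with the common threshold Nth, might look as follows; the normalization rule is an assumption based on the description above.

```python
import numpy as np

def alphas_from_contrast(delta_w_map, regions):
    """Set a parameter alpha for each partial region so that the maximum
    change amount in every region is scaled to the same global maximum."""
    global_max = max(delta_w_map[t:b, l:r].max() for t, b, l, r in regions)
    return [global_max / max(delta_w_map[t:b, l:r].max(), 1e-12)  # larger alpha for low-contrast regions
            for t, b, l, r in regions]

# The comparison with the common threshold Nth then uses alpha * beta
# (the weighted accuracy evaluation value) instead of a per-region threshold.
```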
Note that the depth map generation device, the depth map generation method, and the program proposed in the present disclosure are not limited to the examples explained above.
In the examples explained above, as illustrated in
First, after the captured images f1 and f2 are acquired, the dividing unit 11d divides each of the captured images f1 and f2 into the plurality of partial regions A1 to A15 (S201). For example, as explained with reference to
The noise removing unit 11e sets, based on the accuracy evaluation values β calculated for the pixels in each of the partial regions A1 to A15, a parameter (a noise determination threshold) for noise determination (S206). For example, the noise removing unit 11e multiplies the maximum value of the accuracy evaluation values β calculated for the pixels in each of the partial regions A1 to A15 by a predetermined percentage (for example, 50%) and sets the result of the multiplication as the parameter (the noise determination threshold) for noise determination. The noise removing unit 11e compares the parameter (the noise determination threshold) set for each of the partial regions A1 to A15 with the accuracy evaluation values β of the pixels included in each of the partial regions A1 to A15. The noise removing unit 11e removes, from the depth map, the estimated depth d0 determined as noise based on a result of the comparison (S207). For example, the noise removing unit 11e determines, as noise, the estimated depth d0 calculated for a pixel having an accuracy evaluation value β lower than the noise determination threshold and removes the estimated depth d0 from the depth map.
In the depth map generation device 10, the control unit 11 acquires the two captured images f1 and f2 with the two coded apertures different from each other. In contrast to this, the control unit 11 may acquire three captured images with three coded apertures different from one another. Further, as another example, the control unit 11 may acquire a captured image with only one kind of coded aperture.
The control unit 11 calculates the accuracy evaluation value β for the estimated depth d0 based on the difference between the maximum and the minimum of the deviation values wd calculated for each pixel. The calculation of the accuracy evaluation value β is not limited to this. For example, the control unit 11 may calculate the accuracy evaluation value β for the estimated depth d0 based on the minimum of the deviation values wd calculated for each pixel, although the accuracy of the estimation may deteriorate.
The depth calculating unit 11c may perform various kinds of image processing on the captured images f1 and f2. For example, the depth calculating unit 11c may calculate the contrast of each partial region after the division processing by the dividing unit 11d has been performed. Then, when there is a partial region where the contrast is lower than a predetermined value and the maximum luminance is lower than a predetermined value, the depth calculating unit 11c may execute, for the partial region or for the entire captured images f1 and f2, image processing for expanding a histogram. This makes it possible to calculate the estimated depth d0 even for a dark partial region.
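A minimal Python/NumPy sketch of such histogram expansion (contrast stretching) for a dark partial region follows; the percentile-based stretching is an illustrative assumption.

```python
import numpy as np

def expand_histogram(region, low_pct=1, high_pct=99):
    """Linearly stretch pixel values of a dark, low-contrast partial region
    so that blur-based depth estimation becomes possible there."""
    lo, hi = np.percentile(region, [low_pct, high_pct])
    if hi <= lo:                          # completely flat region: nothing to stretch
        return region
    stretched = (region - lo) / (hi - lo)
    return np.clip(stretched, 0.0, 1.0)   # normalized to [0, 1]
```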
Although the present invention has been illustrated and described herein with reference to embodiments and specific examples thereof, it will be readily apparent to those of ordinary skill in the art that other embodiments and examples may perform similar functions and/or achieve like results. All such equivalent embodiments and examples are within the spirit and scope of the present invention, are contemplated thereby, and are intended to be covered by the following claims.