The present disclosure relates generally to the field of Light Field (LF) content (e.g. LF image or video).
More specifically, the disclosure relates to a method for estimating the depth of pixels belonging to images in a LF content.
The disclosure can be of interest in any field where LF capture is used, for both professionals and consumers.
LF contents consist either in:
Estimation of depth for pixels in LF contents most of the time reduces to applying, to each view belonging to the LF content, known technics (e.g. matching technics) classically used for determining such depth, which rely on the availability of at least two views capturing the same scene from two different points of view.
However, approaches based on the use of two different views fail to give reliable results in particular cases, e.g. where an occlusion occurs. Indeed, in that case a part of the scene captured by a given view may not have been captured by the other view. No depth can then be determined, or an aberrant value is returned by such known algorithms.
Some proposals have been made for using the largest number of points of view available in a LF content in order to get more reliable depth estimation results, e.g. when occlusion occurs, as in the paper by H. Zhu, Q. Wang and J. Yu, “Occlusion-Model Guided Anti-Occlusion Depth Estimation in Light Field,” in IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 7, pp. 965-978, October 2017. However, such an approach can still be improved.
Consequently, there is a need for a method that takes advantage of the information related to the different views in a LF content for estimating the depth of pixels with an enforced consistency, in particular in the presence of occlusions.
There is also a need for such a method to allow a high quality in the estimated depth.
The present disclosure relates to a method for estimating a depth for pixels in a matrix of M images of a light field content, with M>2. Such method comprises, at least for one set of N images taken among the M images, 2<N≤M, a process comprising:
The process is enforced iteratively, each new iteration of the process being carried out with a new N value which is lower than the previous N value used in the previous iteration of the process.
Another aspect of the disclosure pertains to a device for estimating a depth for pixels in a matrix of M images of a light field content, with M>2. Such device comprises a processor or a dedicated computing machine configured for, at least for one set of N images taken among the M images, 2<N≤M, enforcing a process comprising:
The process is enforced iteratively, each new iteration of the process being carried out with a new N value which is lower than the previous N value used in the previous iteration of the process.
In addition, the present disclosure concerns a non-transitory computer readable medium comprising a computer program product recorded thereon and capable of being run by a processor, including program code instructions for implementing the method for estimating a depth for pixels in a matrix of M images of a light field content as previously described.
Other features and advantages of embodiments shall appear from the following description, given by way of indicative and non-exhaustive examples and from the appended drawings, of which:
In all of the figures of the present document, the same numerical reference signs designate similar elements and steps.
We now describe in relationship with
The matrix of images 100mi comprises four images captured simultaneously, i.e. at a given instant, by a LF capturing system. In other embodiments, the matrix of images comprises any number of images greater than two that have been captured simultaneously by a LF capturing system.
Back to
Among the images belonging to the matrix of images 100mi, there is a current image 100ci and a current pixel 100cp belonging to the current image 100ci. In the sequel, we assume that a depth has not yet been estimated for the current pixel 100cp so that the current pixel 100cp is taken as an example for detailing the steps of the method for estimating a depth for pixels according to the disclosure discussed below in relation with
We now describe in relationship with
Each set of images 200N (also named “angular patch”) of the plurality 200pN comprises three different images (depicted in dark grey) taken among the four images of the matrix of images 100mi.
Such set of images 200N and such plurality 200pN of sets of images 200N are used in the method for estimating a depth for pixels according to the disclosure as discussed below in relation with
In the present embodiment, the plurality 200pN comprises four sets of images 200N. In other embodiments, the method for estimating a depth for pixels according to the disclosure relies on sets of images comprising a different number of images than three and the plurality of sets of images comprises a different number of sets of images than four. For instance, in the embodiment illustrated in
We now describe in relationship with
Each spatial patch of pixels 300P of the plurality 300pP comprises eight different pixels taken in a neighborhood of the current pixel 100cp in the current image 100ci. Spatial neighborhoods usually consist of a square spatial patch with an odd side length, centered on the considered current pixel 100cp.
Such set of spatial patches of pixels 300P and such plurality 300pP of spatial patches of pixels 300P are used in some embodiments of the method for estimating a depth for pixels according to the disclosure as discussed below in relation with
In the present embodiment, the plurality 300pP comprises four spatial patches of pixels 300P. In other embodiments, the method for estimating a depth for pixels according to the disclosure uses spatial patches of pixels comprising a different number of pixels than eight and the plurality of spatial patches of pixels comprises a different number of spatial patches of pixels than four. For instance, in the embodiment illustrated in
We now describe in relationship with
The method for estimating a depth for pixels in the matrix of images 100mi comprises, at least for a set of N images taken among the M images of the matrix of images 100mi (in the embodiment of
Thus, the estimation of the depth of the current pixel 100cp is based on an iterative process involving a different set of images for each iteration. Consequently, all the information available in the matrix of images 100mi is used so that a reliable depth estimation is achieved even in the presence of occlusions. Furthermore, sets of images comprising a decreasing number of images are successively considered for the iterations of the process. Thus, the first time a depth is decided as consistent, it corresponds to a depth estimated based on the greatest number of images in the LF content. This allows a higher quality in the result.
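The decreasing-N control flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the names `estimate_depths` and `consistent_candidate` are hypothetical, and the callback stands in for the depth-map determination and the consistency decision carried out during one iteration.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional


def estimate_depths(
    m: int,
    pixels: Iterable[Hashable],
    consistent_candidate: Callable[[int, Hashable], Optional[float]],
) -> Dict[Hashable, float]:
    """Run the process for N = M, M-1, ..., 3 and keep, for each pixel,
    the first candidate depth that is decided as consistent.

    `consistent_candidate(n, pixel)` is a hypothetical callback: it returns
    the candidate depth obtained with sets of n images when that depth is
    consistent, and None otherwise.
    """
    pixels = list(pixels)
    estimated: Dict[Hashable, float] = {}
    for n in range(m, 2, -1):  # decreasing N, with 2 < N <= M
        for pixel in pixels:
            if pixel in estimated:
                continue  # a depth was already selected with a larger N
            depth = consistent_candidate(n, pixel)
            if depth is not None:
                estimated[pixel] = depth
    return estimated


# Toy data (assumed): pixel "a" is consistent with 4 images,
# pixel "b" (e.g. partly occluded) only with 3 images.
toy = {(4, "a"): 1.0, (3, "a"): 9.0, (3, "b"): 2.0}
result = estimate_depths(4, ["a", "b"], lambda n, p: toy.get((n, p)))
```

In this toy run, pixel "a" keeps the depth found with four images and is never revisited, whereas pixel "b" only receives a depth at the iteration using three images.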
In other embodiments, the step S410 and the step S420 are enforced not only for the current pixel 100cp in the current image 100ci, but for each pixel of each image of the set of N images for which a depth has not yet been estimated.
In other embodiments wherein N<M, and wherein the set of N images belongs to a plurality of sets of N images taken among the M images of the matrix of images 100mi, each iteration of the process is carried out for each set of N images of the plurality of sets of N images. Thus, all the available information in the different views is used, leading to improved depth estimations. In some embodiments, the plurality of sets of N images comprises all the sets of N images taken among the M images of the matrix of images 100mi.
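As an illustration of the variant in which the plurality comprises all the sets of N images, the sets can be enumerated as combinations of view indices. The function name `angular_patches` and the use of integer indices to identify the views are assumptions made for this sketch.

```python
from itertools import combinations
from typing import List, Tuple


def angular_patches(m: int, n: int) -> List[Tuple[int, ...]]:
    """Return every set of n view indices taken among m views (2 < n <= m)."""
    if not 2 < n <= m:
        raise ValueError("expected 2 < n <= m")
    return list(combinations(range(m), n))


# Four views, sets of three views each.
patches = angular_patches(4, 3)
```

For M = 4 and N = 3 this yields four sets of three views, which is the number of sets of images 200N in the embodiment described above.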
Back to
The derivation of such geometrical consistency is illustrated through an example depicted in
Only two cameras 501, 502 among the four of the LF capturing system capturing the matrix of images 100mi are illustrated in
Let
be the coordinates of the current pixel 100cp in view #c. Let
be the coordinates of the corresponding 3D point 510 in the reference coordinate system (CS), according to the candidate depth associated to the current pixel 100cp. Its projection in the other image corresponding to view number #c′ gives the theoretical location 502tl of coordinates
Indeed, such projection, based e.g. on a pinhole model of the cameras 501, 502, leads to a location that may not coincide exactly with a pixel of the other image considered. Interpolating the depth map of the other image corresponding to view number #c′, one can compute the corresponding 3D point 520 of coordinates
and project it back into view number #c, ending at another theoretical location 501atl of coordinates
in the current image 100ci.
The candidate depth of the current pixel 100cp is decided as consistent when a norm of the drift vector
is lower than a predetermined threshold, e.g.:
In other words, in the embodiment of
The candidate depth of the current pixel 100cp is decided as consistent when a distance in the current image between the current pixel 100cp and the other theoretical location 501atl is below a predetermined threshold (e.g. the predetermined threshold is half a size of a pixel in the current image).
In other embodiments wherein N>3, the step S410a1 is enforced for all the images of the set of N images other than the current image 100ci, delivering a corresponding set of theoretical locations. The step S410a2 is enforced for all the theoretical locations in the set of theoretical locations, delivering a corresponding set of depth values. The step S410a3 is enforced for all the theoretical locations in the set of theoretical locations and for all the associated depth values in the set of depth values, delivering a set of other theoretical locations in the current image 100ci. The candidate depth of the current pixel 100cp is decided as consistent when a distance between the current pixel 100cp and the other theoretical locations is below the predetermined threshold. Thus, the consistency is based on the information present in all the images of the matrix of images.
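The forward projection, depth interpolation and back-projection can be sketched as follows under strong simplifying assumptions that are not part of the disclosure: the views are rectified and shifted along a horizontal baseline, so that a pixel at column u and depth z appears at column u - f*b/z in a view at signed baseline b (f being the focal length in pixels), and each depth map is reduced to a single row. All function and parameter names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def interpolate(row: Sequence[float], x: float) -> float:
    """Linearly interpolate a row of depths at fractional column x (clamped)."""
    x = min(max(x, 0.0), len(row) - 1.0)
    i = int(math.floor(x))
    j = min(i + 1, len(row) - 1)
    t = x - i
    return (1.0 - t) * row[i] + t * row[j]


def drift(u: float, z: float, other_row: Sequence[float],
          focal: float, baseline: float) -> float:
    """Distance between the current pixel and its round-trip location."""
    u_other = u - focal * baseline / z            # theoretical location in the other view
    z_other = interpolate(other_row, u_other)     # depth read in the other depth map
    u_back = u_other + focal * baseline / z_other  # other theoretical location, current view
    return abs(u_back - u)


def is_geometrically_consistent(
    u: float, z: float,
    others: Sequence[Tuple[Sequence[float], float]],
    focal: float, threshold: float = 0.5,
) -> bool:
    """True when the drift stays below the threshold for every other view.

    `others` holds (depth_row, signed_baseline) pairs, one per other view.
    """
    return all(drift(u, z, row, focal, b) < threshold for row, b in others)


agreeing_view = ([4.0] * 32, 0.25)  # other depth map agrees with z = 4
occluded_view = ([2.0] * 32, 0.25)  # other view sees a nearer surface
ok = is_geometrically_consistent(10.0, 4.0, [agreeing_view], focal=80.0)
ko = is_geometrically_consistent(10.0, 4.0, [agreeing_view, occluded_view], focal=80.0)
```

The default threshold of half a pixel follows the example value given above. Requiring every other view to pass is one possible reading of the N>3 case.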
Back to
For instance, the photometric consistency can be measured by the standard deviation of the color distribution within the other image(s) at the theoretical location(s) 502tl:
with zc the candidate depth associated to the current pixel 100cp of coordinates
in the current image 100ci corresponding to view number #c.
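As an illustration only, a photo-consistency measure based on a standard deviation could be computed as below. The exact expression of the disclosure is not reproduced here: averaging the per-channel population standard deviation over the R, G and B channels is an assumption, as are the function name `photo_consistency` and the representation of colors as tuples.

```python
from statistics import pstdev
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def photo_consistency(samples: Sequence[Color]) -> float:
    """Mean over R, G, B of the standard deviation of the sampled colors.

    `samples` holds one color per view, read at the location where the
    candidate 3D point projects. A low value means the views agree.
    """
    channels = list(zip(*samples))  # regroup the samples per channel
    return sum(pstdev(channel) for channel in channels) / len(channels)


same_color = photo_consistency([(10.0, 20.0, 30.0), (10.0, 20.0, 30.0)])
different = photo_consistency([(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)])
```

A value close to zero indicates that the views observe the same color, which supports the candidate depth. A large value suggests an occlusion or a wrong depth.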
Geometrical consistency determined in step S410a and photo-consistency determined in step S410b are combined for deciding if the candidate depth associated with the current pixel 100cp is consistent or not with the other depth map(s) of the set of N depth maps. For instance, such combination relies on the method proposed in K. Wolff et al., “Point Cloud Noise and Outlier Removal for Image-Based 3D Reconstruction,” 2016 Fourth International Conference on 3D Vision (3DV), Stanford, Calif., 2016, pp. 118-127.
In other embodiments, only a geometrical consistency criterion is used for deciding if the candidate depth of the current pixel 100cp is consistent or not with the other depth map(s) in step S410.
In other embodiments, only a photo-consistency criterion is used for deciding if the candidate depth of the current pixel 100cp is consistent or not with the other depth map(s).
In still other embodiments, any other consistency criterion between the images of the matrix of images 100mi is used for deciding if a candidate depth of the current pixel 100cp is consistent or not with the other depth map(s).
Back to
Thus, the determination of depth maps is based on the use of all the information available in the images of the LF content. Furthermore, spatial patches of pixels of decreasing number of pixels are successively considered for the successive iterations. Thus, the first time a depth value in a depth map is determined, it corresponds to a depth value determined based on the largest number of pixels in the images of the LF content. This allows a higher quality in the result.
In one embodiment, the projection of the spatial patch of P pixels in at least one image other than the current image 100ci is obtained, for instance, using the same projection mechanism, based e.g. on a pinhole model of the cameras of the LF capturing system, as disclosed above in relation with
In some embodiments wherein P<Pmax, the spatial patch of P pixels belongs to a plurality of spatial patches of P pixels. For a given iteration of the step S400, the matching technic is successively enforced for each spatial patch of P pixels belonging to the plurality of spatial patches of P pixels, delivering a set of intermediate depth values and a corresponding set of confidence values associated to the current pixel 100cp. The candidate depth associated to the current pixel 100cp in the depth map associated to the current image 100ci is the intermediate depth value with the highest confidence value in the set of intermediate depth values. In variants, the plurality of spatial patches of P pixels comprises all the spatial patches of P pixels.
Thus, for a given size of patches of pixels considered for determining the depth maps, the patch of pixels that provides the best confidence level is kept for determining the depth maps in question.
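The selection among spatial patches of the same size can be sketched as a maximum over confidence values. The function name `select_candidate` and the representation of each result as a (depth, confidence) pair are assumptions made for this sketch.

```python
from typing import Iterable, Tuple


def select_candidate(results: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the (intermediate_depth, confidence) pair of highest confidence.

    `results` holds one pair per spatial patch of P pixels.
    """
    return max(results, key=lambda pair: pair[1])


# Three patches of the same size (assumed values).
candidate = select_candidate([(1.2, 0.3), (1.5, 0.9), (1.1, 0.6)])
```

Here the depth 1.5 is kept as the candidate depth because its patch delivers the highest confidence value.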
In some embodiments, the matching technic enforces a minimization of a cost function Ec according to:
with Dc the depth map for the current image 100ci corresponding to view #c and (u,v) the coordinates of the current pixel 100cp in the current image 100ci.
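The minimization can be illustrated by an exhaustive search over sampled depth hypotheses. The disclosure does not state how the minimization is carried out, so the search strategy, the function name `minimize_cost` and the `cost` callback are assumptions made for this sketch.

```python
from math import inf
from typing import Callable, Iterable, Optional, Tuple


def minimize_cost(
    depths: Iterable[float],
    cost: Callable[[float], float],
) -> Tuple[float, float]:
    """Return (best_depth, minimum_cost) over the sampled depth hypotheses.

    `cost(z)` is a hypothetical callback evaluating the cost function for
    the current pixel at depth hypothesis z.
    """
    best_depth: Optional[float] = None
    best_cost = inf
    for z in depths:
        c = cost(z)
        if c < best_cost:
            best_depth, best_cost = z, c
    if best_depth is None:
        raise ValueError("no finite-cost depth hypothesis was provided")
    return best_depth, best_cost


# Toy cost with a minimum near z = 2.2 (assumed).
depth, min_cost = minimize_cost([1.0, 2.0, 3.0, 4.0], lambda z: (z - 2.2) ** 2)
```

The minimum cost is returned together with the depth so that it can feed a confidence value.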
In that case, the successive enforcement of the matching technic further delivers a set of minimum values of the cost function Ec. In some embodiments, the confidence value is a function at least of:
In some embodiments, the cost function Ec is a function of an MSE (for “Mean Squared Error”) norm, e.g.:
with:
In some embodiments, the cost function Ec is a function of an MAD (for “Mean Absolute Difference”) norm, e.g.:
with ∥.∥1 the L1 norm (sum of absolute values).
In some embodiments, the cost function Ec is a function of a ZNCC (for “Zero-mean Normalized Cross-correlation”) norm, e.g.:
where the summation is performed on the three components R, G and B, and with:
where:
is the mean of image I in the spatial patch Ωs(u,v) taken in a neighborhood of the current pixel 100cp of coordinate (u,v); and
is the standard deviation of image I in the spatial patch Ωs(u,v) taken in a neighborhood of the current pixel 100cp of coordinate (u,v).
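For reference, the three measures can be sketched with their usual textbook definitions on a single color channel. These definitions may differ from the exact expressions of the disclosure (for instance in normalization, or in the summation over the R, G and B components), and the representation of a patch as a flat sequence of intensities is an assumption.

```python
from math import sqrt
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two patches of equal size."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mad(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two patches of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def zncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross-correlation, in [-1, 1].

    Returns 0.0 for a constant patch, whose standard deviation is zero.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    std_a = sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    std_b = sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    if std_a == 0.0 or std_b == 0.0:
        return 0.0
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    return covariance / (std_a * std_b)


gain_changed = zncc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
inverted = zncc([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Unlike MSE and MAD, ZNCC is a similarity that is insensitive to gain and offset changes between views, so a cost built on it would typically decrease as the correlation increases, e.g. 1 - ZNCC.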
In this embodiment, the device 600 for implementing the disclosed method comprises a non-volatile memory 603 (e.g. a read-only memory (ROM) or a hard disk), a volatile memory 601 (e.g. a random-access memory or RAM) and a processor 602. The non-volatile memory 603 is a non-transitory computer-readable carrier medium. It stores executable program code instructions, which are executed by the processor 602 in order to enable implementation of the method described above (method for estimating a depth for pixels in a matrix of images of a light field content) in its various embodiments disclosed above in relationship with
Upon initialization, the aforementioned program code instructions are transferred from the non-volatile memory 603 to the volatile memory 601 so as to be executed by the processor 602. The volatile memory 601 likewise includes registers for storing the variables and parameters required for this execution.
All the steps of the above method for estimating a depth for pixels in a matrix of images of a light field content according to the disclosure may be implemented equally well:
In other words, the disclosure is not limited to a purely software-based implementation, in the form of computer program instructions, but may also be implemented in hardware form or in any form combining a hardware portion and a software portion.
According to one embodiment, a method is proposed for estimating a depth for pixels in a matrix of M images of a light field content, with M>2. Such method comprises, at least for one set of N images taken among the M images, 2<N≤M, a process comprising:
The process is enforced iteratively, each new iteration of the process being carried out with a new N value which is lower than the previous N value used in the previous iteration of the process.
Thus, the present disclosure proposes a new and inventive solution for estimating the depth of pixels in a light field (LF) content with an enforced consistency.
More particularly, the estimation of the depth of the pixels is based on an iterative process involving a different set of images (also named angular patch) for each iteration. Thus, all the information available in the matrix of images of the LF content is used so that a reliable depth estimation is achieved even in the presence of occlusions.
Furthermore, sets of images comprising a decreasing number of images are successively considered for the iterations of the process. Thus, the first time a depth is decided as consistent, it corresponds to a depth estimated based on the greatest number of images in the LF content. This allows a higher quality in the result.
According to one embodiment, a device is proposed for estimating a depth for pixels in a matrix of M images of a light field content, with M>2. Such device comprises a processor or a dedicated computing machine configured for, at least for one set of N images taken among the M images, 2<N≤M, enforcing a process comprising:
The process is enforced iteratively, each new iteration of the process being carried out with a new N value which is lower than the previous N value used in the previous iteration of the process.
According to one embodiment, the deciding if a candidate depth of the current pixel is consistent or not, and the selecting the depth of the current pixel are enforced for each pixel of each image of the set of N images for which a depth has not yet been estimated.
According to one embodiment, N<M and the set of N images belongs to a plurality of sets of N images taken among the M images. Each iteration of the process is carried out for each set of N images.
According to one embodiment, the plurality of sets of N images comprises all the sets of N images taken among the M images.
According to one embodiment, the deciding if a candidate depth of the current pixel is consistent or not comprises determining a geometrical consistency between the candidate depth of the current pixel and the other depth map(s) of the set of N depth maps.
Thus, a geometrical consistency criterion between the views allows deciding on a depth value that corresponds to a consistent reconstructed 3D scene for all the considered views.
According to one embodiment, the determining a geometrical consistency comprises:
According to one embodiment, N>3. The determining a theoretical location in another image is enforced for all the images of the set of N images other than the current image, delivering a corresponding set of theoretical locations. The obtaining a depth value associated to the theoretical location is enforced for all the theoretical locations in the set of theoretical locations, delivering a corresponding set of depth values. The determining, in the current image, another theoretical location is enforced for all the theoretical locations in the set of theoretical locations and for all the associated depth values in the set of depth values, delivering a set of other theoretical locations in the current image, the candidate depth of the current pixel being decided as consistent when a distance between the current pixel and the other theoretical locations is below the predetermined threshold (e.g. the predetermined threshold is half a size of a pixel in the current image).
According to one embodiment, the deciding if a candidate depth of the current pixel is consistent or not further comprises determining a photo-consistency between the current pixel and the theoretical location(s).
According to one embodiment, the determining depth maps for the images in the set of N images enforces for at least the current pixel, a matching technic between:
Thus, the determination of depth maps is based on an iterative process involving a different spatial patch of pixels for each iteration, which allows the use of all the information available in the images of the LF content.
Furthermore, spatial patches of pixels of decreasing number of pixels are successively considered for the successive iterations. Thus, the first time a depth value in a depth map is determined, it corresponds to a depth value determined based on the greatest number of pixels in the images of the LF content. This allows a higher quality in the result.
According to one embodiment, P<Pmax and the spatial patch of P pixels belongs to a plurality of spatial patches of P pixels. For a given iteration of the determining depth maps, the matching technic is successively enforced for each spatial patch of P pixels belonging to the plurality of spatial patches of P pixels, delivering a set of intermediate depth values and a corresponding set of confidence values associated to the current pixel, the candidate depth associated to the current pixel in the depth map associated to the current image being the intermediate depth value with the highest confidence value in the set of intermediate depth values.
Thus, for a given size of patches of pixels considered for determining the depth maps, the patch of pixels that provides the best confidence level is kept for determining the depth maps in question.
According to one embodiment, the plurality of spatial patches of P pixels comprises all the spatial patches of P pixels.
According to one embodiment, the matching technic enforces a minimization of a cost function. The successive enforcement of the matching technic further delivers a set of minimum values of the cost function, the confidence value being a function at least of:
According to one embodiment, the cost function is a function of a norm belonging to the group comprising:
According to one embodiment, a non-transitory computer readable medium is proposed, comprising a computer program product recorded thereon and capable of being run by a processor, including program code instructions for implementing the method for estimating a depth for pixels in a matrix of M images of a light field content as previously described.
| Number | Date | Country | Kind |
|---|---|---|---|
| 18305989.8 | Jul 2019 | EP | regional |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2019/069246 | 7/17/2019 | WO | 00 |