The present disclosure generally pertains to the technical field of time-of-flight imaging, in particular to a configuration control circuitry for a time-of-flight system and a corresponding configuration control method for a time-of-flight system.
Time-of-flight (ToF) cameras are typically used for determining a depth map of objects in a scene that is illuminated with modulated light. Time-of-flight systems typically include an illumination unit (e.g., including an array of light emitting diodes (“LED”)) and an imaging unit including an image sensor (e.g., an array of current-assisted photonic demodulator (“CAPD”) pixels or an array of single-photon avalanche diode (“SPAD”) pixels) with read-out circuitry and optical parts (e.g., lenses). Still further, time-of-flight systems typically include a processing unit (e.g., a processor) for processing the depth data generated in the ToF device.
For capturing a depth image in an iToF system, the iToF system typically illuminates the scene, for instance, with modulated light and images the backscattered/reflected light with an optical lens portion on the image sensor, as generally known.
According to the time-of-flight principle the time that a light wave needs to travel a distance in a medium is measured. ToF systems obtain depth information of objects in a scene for every pixel of the depth image. Known are, for example, direct ToF (“dToF”) systems and indirect ToF (“iToF”) systems. ToF systems may further be configured to use either flood illumination with a rather homogeneous beam profile (full-field ToF) or an illumination with a certain beam profile (spot ToF, line-scan ToF, structured light, etc.).
The generated image data is output to a processing unit for image processing and depth information generation.
Typically, ToF systems operate with a predetermined configuration including different configuration parameters of the ToF system setup, including settings for the illumination unit and the imaging unit such as output power, modulation frequency, and sensor integration time.
Although there exist techniques for setting the configuration of a ToF system, it is generally desirable to improve these existing techniques.
According to a first aspect the disclosure provides an electronic device comprising circuitry configured to update a camera configuration based on adaptive information obtained by relating depth information obtained from ToF measurements with a reconstructed model of a scene.
According to a further aspect the disclosure provides a method comprising updating a camera configuration based on adaptive information obtained by relating depth information obtained from ToF measurements with a reconstructed model of a scene.
According to a further aspect the disclosure provides a computer program comprising instructions which when executed by a processor cause the processor to update a camera configuration based on adaptive information obtained by relating depth information obtained from ToF measurements with a reconstructed model of a scene.
Further aspects are set forth in the dependent claims, the following description and the drawings.
Embodiments are explained by way of example with respect to the accompanying drawings, in which:
The embodiments described below in more detail provide an electronic device comprising circuitry configured to update a camera configuration based on adaptive information obtained by relating depth information obtained from ToF measurements with a reconstructed model of a scene.
The electronic device may for example be an imaging camera, in particular an iToF imaging camera, a control device for a camera, or a LiDAR or the like.
Circuitry may for example comprise a ToF imaging sensor configured to capture frames of the scene and an illumination unit configured to illuminate the scene with modulated light. Still further, the circuitry may include a processor, a memory (RAM, ROM or the like), a data storage, input means (control buttons, keys), etc. as it is generally known for electronic devices (computers, smartphones, etc.). Moreover, circuitry may include sensors for sensing light, or other environmental parameters, etc.
The model may for example be a 3D model. The model may for example be implemented as a triangle mesh grid (e.g., a local or global three-dimensional triangle mesh), a local or global voxel representation of a point cloud (uniform or octree), a local or global occupancy grid, a mathematical description of the scene in terms of planes, statistical distributions (e.g., Gaussian mixture models), or similar attributes extracted from the measured point cloud. The model is typically constructed progressively by fusing measurements from available data sources, e.g., including but not limited to depth information, color information, inertial measurement unit information, event-based camera information.
The camera configuration may be described by any configuration settings of an iToF camera's functional units such as the imaging sensor, the illumination unit, or the like.
A camera configuration may for example be defined as a camera mode comprising one or more configuration parameters.
Relating depth information obtained from ToF measurements with a reconstructed model (i.e., a running 3D reconstruction) of a scene may comprise any processing performed on raw ToF measurements, such as processing raw measurements obtained from the sensor in a ToF datapath. Relating depth information obtained from ToF measurements with a reconstructed model may also comprise transforming ToF measurements into a point cloud, registering the point cloud to the reconstructed model, and the like.
The circuitry may be configured to reconstruct and/or update the model of the scene based on the depth information obtained from ToF measurements.
The model of the scene may for example be updated based on point cloud information, and/or registered point cloud information.
The circuitry may be configured to determine an overlap between the depth information and the model of the scene, and to update the camera configuration based for example on the overlap.
Such determining an overlap between the depth information and the model of the scene relates the depth information to the reconstructed model of the scene.
Overlap may for example be any quantity that describes the overlap between the depth information and the model of the scene, e.g., a residual between the point cloud information and the model, a residual between the depth information and a projected depth view of the model, a residual between the color information and a projected color view of the model, or the like.
The circuitry may be configured to decide, based on the overlap, whether or not the camera configuration is to be updated.
The circuitry is configured to improve, for example, the signal-to-noise ratio by updating the camera configuration. The SNR may be defined as the phasor amplitude divided by the phasor standard deviation.
The camera configuration comprises one or more of a modulation frequency of an illumination unit of a ToF camera, an integration time, a duty cycle, a number of samples per correlation waveform period, a number of sub-frames per measurement, a frame rate, a length of a read-out period (which may also be fixed by the sensor), a number of sub-integration cycles and a time span of the sub-integration cycles.
The camera mode feedback information controlling the camera configuration comprises an effective range of the scene.
The camera mode feedback information controlling the camera configuration comprises a saturation value of a ToF signal amplitude.
The circuitry may be configured to determine unwrapping feedback based on the model (Sk-1) of the scene.
The circuitry may be configured to determine unwrapping feedback for a pixel based on the model of the scene, and an estimated camera pose.
The circuitry may be configured to determine a wrapping index for a pixel based on the unwrapping feedback for the pixel.
The circuitry may be configured to determine model feedback based on an overlap between the depth information from ToF measurements and the model of the scene.
The circuitry may be configured to update parts of the model of the scene.
The circuitry may be configured to estimate a camera pose and to determine an overlap between the model of the scene and a current frame viewed from the estimated pose of the camera corresponding to the current frame.
The embodiments also describe a method comprising updating a camera configuration based on camera mode feedback information obtained by relating depth information obtained from ToF measurements with a reconstructed model of a scene.
The embodiments also describe a computer program comprising instructions which when executed by a processor cause the processor to update a camera configuration based on camera mode feedback information obtained by relating depth information obtained from ToF measurements with a reconstructed model of a scene.
The computer program may be implemented on a computer-readable medium storing the instructions.
Operational Principle and Datapath of an Indirect Time-of-Flight Imaging System (iToF)
Consider an iToF camera pixel imaging an object at a distance D. A (differential) iToF pixel measurement ν(τE, τD) as obtained in the iToF pixel is a variable whose expected value μ(τE, τD) is given by
where t is the time variable, TI is the exposure time (integration time), m(t) is the in-pixel reference signal which corresponds to the modulation signal (i.e. the emitted light signal) or a phase-shifted version of the modulation signal. ΦR(t, τE, τD) is the pixel irradiance signal which represents the reflected light (RL in
where D is the distance between the ToF camera and the object, and c is the speed of light.
The reflected light signal ΦR(t, τE, τD) is a scaled and delayed version of the emitted light ΦE(t−τE). The pixel irradiance signal ΦR(t, τE, τD) is given by:
where Φ(τD) is a real-valued scaling factor that depends on the distance D between the ToF camera and the object, and ΦE(t−τE−τD) is the emitted light ΦE(t−τE) (16 in
In the context of iToF, both m(t) and ΦE(t) are typically periodical signals with period TM = 1/fmod (fmod being the fundamental frequency or modulation frequency generated by the timing generator (106 in
As TI>>TM, the expected differential signal μ(τE, τD) is also a periodical function with respect to the electronic delay τE between in-pixel reference signal m(t) and optical emission ΦE(t−τE) with the same fundamental frequency fmod.
Writing μ(τE, τD) in terms of its Fourier Coefficients Mk yields
Note that due to the distance-dependent scaling of the light (factor Φ(τD)), the expected differential signal μ(τE, τD) is not periodical with respect to the time-of-flight τD.
From the above it is clear that the time-of-flight, and hence the depth, can be estimated from the first harmonic H1,μ(τD) of μ(τE, τD):
From the first harmonic H1,μ(τD) the phase angle θ1,μ(τD) is obtained as
Here, ∠ denotes the phase of a complex number z = re^(iϕ).
In practice, it is not feasible to evaluate H1,μ(τD) due to the presence of noise and due to the number of transmit delays.
Concerning the presence of noise, H1,μ(τD) is formulated in terms of the expected value μ(τE, τD) of differential mode measurements ν(τE, τD). Estimating this expected value from measurements may be performed by multiple repeated acquisitions (of static scene) to average out noise.
Concerning the number of transmit delays, H1,μ(τD) is given as an integral over all possible transmit delays τE. Approximating this integral may require a high number of transmit delays.
Due to these reasons iToF systems measure an approximation of this first harmonic H1,μ(τD). This approximation uses N differential mode measurements (i.e. N different measurements collected at the N taps) ν(τE,n, τD) (n=0, . . . , N−1) corresponding to N electronic transmit delays τE,n. A vectorized representation of this set of transmit delays is:
The approximation of the first harmonic H1,μ(τD) is obtained by an N-point EDFT (Extended Discrete Fourier Transform), according to
with n being the N-point EDFT bin considered. In standard iToF, n=1. However, depending on the transmit delays selected, different values of n could be more appropriate. For simplicity and without loss of generality, we will assume n=1 in the remainder of this disclosure:
This first harmonic estimate H1,ν(τD; tE) is also referred to as IQ measurement (with I and Q the real resp. imaginary part of the first harmonic estimate). In order to stay close to iToF nomenclature, in the following H1,ν(τD; tE) is denoted as “IQ measurement”. However, it is important to remember that an IQ measurement is an estimate of the first harmonic H1,μ(τD) of the expected differential measurement (as function of transmit delay).
From the first harmonic estimate H1,ν(τD; tE) of equation Eq. 11, the phase value ϕ between the emitted and the received light is obtained as
with Im( ) and Re( ) being respectively the imaginary part and the real part operators, and arctan2 being the 4-quadrant inverse tangent function.
Due to the statistical nature of the differential mode measurements ν(τE,n, τD), the IQ measurement H1,ν(τD; tE) is a random variable with the following expected value
This expected value is here referred to as expected IQ measurement.
With
denoting the IQ measurement, and
denoting the N measurements (samples) obtained by the pixel at respective phases, this gives:
Specifically, once the Fourier transform is computed on the samples of the correlation waveform, the first harmonic will contain I and Q information as its real and imaginary part, respectively.
Based on equation Eq. 12 the phase value ϕ between the emitted and the received light is obtained as:
In another embodiment the above-described technique may also be applied to an N-tap pixel (N being a natural number greater than 2), or to continuous-wave time-of-flight imaging.
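For illustration, the computation of the IQ measurement and of the phase from N samples of the correlation waveform may be sketched as follows (a minimal Python sketch; the sample values are hypothetical and the equal spacing of the transmit delays is an assumption):

```python
import numpy as np

def iq_and_phase(samples):
    """Estimate the first harmonic (IQ measurement) and the phase from N
    differential-mode samples q_0 ... q_{N-1} taken at equally spaced
    transmit delays: H1 is the first DFT bin of the sampled waveform."""
    q = np.asarray(samples, dtype=float)
    n = len(q)
    h1 = np.sum(q * np.exp(-2j * np.pi * np.arange(n) / n))  # first DFT bin
    i_val, q_val = h1.real, h1.imag
    phase = np.arctan2(q_val, i_val) % (2.0 * np.pi)  # 4-quadrant inverse tangent
    return i_val, q_val, phase

# Example with four hypothetical samples (4-component measurement).
I, Q, phi = iq_and_phase([820.0, 1040.0, 1180.0, 960.0])
print(round(I, 1), round(Q, 1), round(phi, 3))
```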
When determining the distance D corresponding to a phase delay value ϕ of a pixel, a so-called “wrapping problem” may occur. As explained above, the distance is a function of the phase difference between the emitted and received modulated signal. This is a periodical function with period 2π. Different distances will produce the same phase measurement. This is called the wrapping problem. A phase measurement produced by the iToF camera is “wrapped” into a fixed interval, i.e., [0,2π], such that all phase values corresponding to a set {Φ | Φ = 2iπ + φ, i ∈ ℤ} become φ, where i is called “wrapping index”. In terms of depth measurement, all depths are wrapped into an interval that is defined by the modulation frequency. In other words, the modulation frequency sets the unambiguous operating range Zunambiguous as described by:
with c being the speed of light, and fmod the modulation frequency. For example, for an iToF camera having a modulation frequency of 20 MHz, the unambiguous range is 7.5 m.
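As a numerical check of this relation (assuming the standard form Zunambiguous = c/(2·fmod) referenced above), a short Python snippet reproduces the 20 MHz example:

```python
C = 299_792_458.0  # speed of light in m/s

def unambiguous_range(f_mod_hz):
    """Unambiguous range Z_unambiguous = c / (2 * f_mod)."""
    return C / (2.0 * f_mod_hz)

print(round(unambiguous_range(20e6), 2))  # ~7.49 m, i.e. the 7.5 m quoted above
```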
The wrapping problem may be solved for example based on single-, dual-, or multi-frequency phase measurements. Additionally, or instead, the wrapping problem may be solved based on the smoothness of prior probabilities for neighboring pixels (i.e. close pixels will likely have the same wrapping index). Additionally, or instead the wrapping problem may be solved based on an unwrapping feedback, in the form of prior probabilities (i.e. a priori information on which are the most likely wrapping indexes) from a reconstructed model, for example based on a model overlap decision as described below with regard to
Multi-frequency iToF uses multiple frequencies to solve the wrapping problem and improve the quality and range of the depth information. The iToF camera repeats the depth measurements at more than one frequency and thereby extends the unambiguous range based on a multi-frequency phase unwrapping algorithm, which may be based on the Chinese Remainder Theorem. For example, for a fixed integration time and to achieve an effective unambiguous range of 15 m for a mobile rear-facing use-case, a pair of frequencies of 40 MHz and 50 MHz may be used to resolve the effective frequency of 10 MHz = GreatestCommonDivisor(40 MHz, 50 MHz). In this case, phase unwrapping will be needed to fuse the dual- (or multi-) frequency measurements, wherein in case the camera moves during the acquisition motion artefacts will appear in the form of inconsistent depth values. For multi-frequency measurements the unwrapping algorithm is inherently part of the ToF datapath (the datapath may or may not include additional pipeline blocks to track illumination patterns, to fuse different exposures, or to fuse different modalities at low level to obtain a depth estimate). Therefore, the effective modulation frequency is lowered by phase unwrapping and, if a minimum SNR requirement is met, better precision performance per frequency is achieved.
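The range extension by the pair of frequencies mentioned above can be sketched as follows (a simplified illustration; a full unwrapping algorithm, e.g. one based on the Chinese Remainder Theorem, is not reproduced here):

```python
from functools import reduce
from math import gcd

C = 299_792_458.0  # speed of light in m/s

def effective_unambiguous_range(freqs_hz):
    """Effective unambiguous range of a multi-frequency measurement: the
    effective frequency is the greatest common divisor of the modulation
    frequencies, and the range follows from Z = c / (2 * f_eff)."""
    f_eff = reduce(gcd, (int(f) for f in freqs_hz))
    return f_eff, C / (2.0 * f_eff)

f_eff, z_eff = effective_unambiguous_range([40e6, 50e6])
print(f_eff / 1e6, "MHz ->", round(z_eff, 1), "m")  # 10.0 MHz -> ~15 m, as above
```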
In iToF cameras the signal-to-noise ratio SNRSignal affects the measurement precision, that is, the noise of the depth measurement Ndepth,RMS, and for high SNRSignal (for example, greater than approximately 5) they follow an approximately linear relation:
Further, any ground truth depth DGT in the observed scene satisfies, for a given modulation frequency and corresponding unambiguous range ZUnambiguous, the equation
wherein bias refers to a systematic error that does not vanish if infinitely many frames of the same scene are averaged, and noise refers to a part of the signal that vanishes if infinitely many frames of the same scene are averaged.
Several methods exist to increase the SNRSignal and therefore reduce the noise of the depth measurement Ndepth,RMS, like focusing light at specific locations rather than using full-field, uniform illumination (also referred to as spot ToF, using vertical-cavity surface-emitting lasers with diffractive optical elements) or using multi-frequency methods.
The SNR in the field of iToF refers to the ratio between the mean signal amplitude and phasor noise standard deviation. Further, the signal-to-noise ratio SNRSignal can be improved by a higher modulation frequency or by a shorter integration time.
In another embodiment the precision P, which is the relative standard deviation in percent (i.e., [P] = %), may be improved (P relates only to the final depth value statistics).
In the following, an approach is described to minimize the noise of the depth measurement Ndepth,RMS by adaptively changing and optimizing an iToF camera configuration mode in accordance with dynamic feedback information acquired based on a reconstructed 3D model of the imaged scene. The camera configuration mode may be stored as preset profiles (also referred to as presets) which are set off-line. The presets may contain sensor calibration data and may define specific values for integration times and modulation frequencies according to a specific use-case requirement such as maximum unambiguous range or typical object reflectivity (e.g., for front-facing or rear-facing mobile devices).
Adaptive iToF Camera System Configuration
A scene 101 is illuminated by an iToF camera 102 (see also
3D reconstruction 104 creates and maintains a three-dimensional (3D) model of the scene 101 based on technologies known to the skilled person, for example based on the “KinectFusion” pipeline described in more detail with regard to
The registered point cloud obtained by the pose estimation 104-1 is forwarded to a 3D model reconstruction 104-2. The 3D model reconstruction 104-2 updates a 3D model of the scene based on the registered point cloud obtained from the pose estimation 104-1 and based on auxiliary input obtained from the auxiliary sensors 103. This process of updating the 3D model is described in more detail with regard to
The updated 3D model of the scene 101 is stored in a 3D model memory and forwarded to a model overlap decision 105-1 of a camera mode sequencer 105 (in another embodiment there may be no 3D model memory and the updated 3D model is forwarded directly). The model overlap decision 105-1 decides if there is overlap between the registered point cloud and the updated 3D model, produces camera mode feedback information based on this decision (and optionally other information obtained from the ToF measurements), and forwards this camera mode feedback information to the adaptive mode generator 105-2. For example, the model overlap decision 105-1 may decide if the model overlap between the current registered point cloud and the updated 3D model exceeds a predetermined overlap threshold as described in more detail with regard to
Further, based on the registered point cloud and the updated 3D model, the model overlap decision 105-1 yields unwrapping feedback, for example in the form of an unwrapping index probability map that is delivered to the ToF datapath 102-2 to improve the disambiguation of the ToF measurements (see
Still further, based on the registered point cloud and the 3D model, the model overlap decision 105-1 determines model feedback that is delivered to the 3D model reconstruction 104-2. The model feedback may for example be in the form of an overlap information between registered point cloud and 3D model, or an error probability map that can be used to invalidate or keep in a separate buffer the unreliable registered point cloud information for further processing.
Based on the camera mode feedback information determined by the model overlap decision 105-1, the adaptive mode generation 105-2 determines a camera mode update. The determined camera mode update is delivered to the ToF camera control 102-1 where the camera configurations of the imaging sensor and the illuminator are updated accordingly. For example, the model overlap decision 105-1 adapts the camera configuration mode for each frame.
As described above, the pose estimation 104-1 and the 3D model reconstruction 104-2 obtain auxiliary input from auxiliary sensors 103. The auxiliary sensors 103 comprise a colour camera 103-1 which provides e.g. an RGB/LAB/YUV image of the scene 101, from which sparse or dense visual features can be extracted to perform conventional visual odometry, that is, determining the position and orientation of the current camera pose. The auxiliary sensors 103 further comprise an event-based camera 103-2 providing e.g. high frame rate cues for visual odometry from events. The auxiliary sensors 103 further comprise an inertial measurement unit (IMU) 103-3 which provides e.g. acceleration and orientation information that can be suitably integrated to provide pose estimates. These auxiliary sensors 103 gather information about the scene 101 in order to aid the 3D reconstruction 104 in producing and updating a 3D model of the scene.
Camera modes comprise configuration settings of an iToF camera's functional units such as the imaging sensor, and the illumination unit. The camera configuration modes as described here may for example be stored as preset profiles (also referred to as presets) in the camera controller and/or in the adaptive mode generator. A camera mode may define specific configuration parameters of e.g. the imaging sensor, and the illumination unit.
In the following, three exemplary camera modes are described for a multi-frequency camera that allows for three different modulation frequencies, namely 20 MHz, 50 MHz, and 60 MHz.
Accordingly, a camera mode update from the default camera mode A to camera mode B will change the modulation frequency of the imaging sensor from 20 to 50 MHz.
Accordingly, a camera mode update from the default camera mode A to camera mode C will change the modulation frequency of the imaging sensor from 20 to 60 MHz.
In addition or alternatively (not shown in
Still further, a camera configuration mode may define illumination spatial attributes such as field of illumination and illumination pattern (for example spot illumination patterns, which allow a maximization of the signal-to-noise ratio at specific coordinates).
The ToF camera configuration may for example include four-component single-frequency measurements, eight-component single-frequency measurements, eight-component dual-frequency measurements (two sub-frames) or the like (see below).
Any of these parameters may be changed to increase the precision of the 3D model of the scene.
The configuration parameters defined in the camera modes may for example be chosen according to specific use-case requirements such as maximum unambiguous range or typical object reflectivity (e.g., for front-facing or rear-facing mobile devices).
A ToF datapath (102-2 in
The ToF datapath may also perform processing such as transforming a depth frame into a vertex map and normal vectors (see 701 in
The ToF datapath may also comprise a sensor calibration block, which, by calibration, removes from the phases sources of systematic error such as temperature drift, cyclic error due to spectral aliasing on the return signal, and any error due to electrical non-uniformity of the pixel array. Based on the phase value ϕ obtained from the measurements q0, . . . , qN at a pixel according to equation Eq. 17 the corresponding depth value d for the pixel is determined as follows:
with fmod being the modulation frequency of the emitted signal and c being the speed of light.
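A minimal sketch of this phase-to-depth conversion (the relation d = c·ϕ/(4π·fmod) is the standard form implied by the description above):

```python
from math import pi

C = 299_792_458.0  # speed of light in m/s

def phase_to_depth(phi, f_mod_hz):
    """Convert a wrapped phase value (radians) to depth in metres:
    d = c * phi / (4 * pi * f_mod); the factor 4*pi reflects the round trip."""
    return C * phi / (4.0 * pi * f_mod_hz)

# A phase of pi at 20 MHz corresponds to half the 7.5 m unambiguous range.
print(round(phase_to_depth(pi, 20e6), 2))  # ~3.75
```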
For each frame k, from the depth measurement Dk for each pixel a three-dimensional coordinate within the camera coordinate system is determined, which yields a ToF point cloud for the current frame k. Further, the ToF datapath 102-2 may comprise filters that improve the signal quality and mitigate errors on the point cloud, such as ToF data denoising, removal of pixels incompatible with the viewpoint (e.g., “flying” pixels between foreground and background), removal of multipath effects such as scene, lens, or sensor scattering.
3D reconstruction 104 of
Auxiliary sensor data (e.g. from the auxiliary sensors 103 of
A surface measurement 701 of the ToF data path receives a depth map Dk(u) of the scene 101 from the ToF camera for each pixel for the current frame k to obtain a point cloud represented as a vertex map Vk,c and normal map Nk,c. The subscript “c” stands for camera coordinates.
A pose estimation 702 of the 3D reconstruction estimates a pose Tg,k of the sensor based on the point cloud Vk,c, Nk,c and model feedback V̂k-1,c(u), N̂k-1,c(u), Tg,k-1. The subscript “g” stands for global coordinates.
A model reconstruction 703 of the 3D reconstruction performs a surface reconstruction update based on the estimated pose Tk and the depth measurement Dk(u) and provides an updated 3D model Sk of the scene 101.
A surface prediction 704 receives the updated model Sk and determines a dense 3D model surface prediction of the scene 101 viewed from the currently estimated pose Tg,k, which yields a model estimated vertex map V̂k,c(u) and a model estimated normal vector N̂k,c(u) stated in the ToF camera coordinate system of the current frame k.
Surface Measurement
The surface measurement 701 of the ToF datapath receives a depth map Dk(u) of the scene 101 from the ToF camera for each pixel for the current frame k to obtain a point cloud represented as a vertex map Vk,c and normal map Nk,c. Each pixel is characterized by its corresponding (2D) image domain coordinates u = (u1, u2), wherein the depth measurements Dk(u) for all pixels u of the current frame k combined yield the depth map Dk for the current frame k. To the depth measurement Dk(u) a bilateral filter, or any other noise reduction filter known in the state of the art (anisotropic diffusion, non-local means, or the like), may be applied before transformation.
Using a camera calibration matrix K—which comprises intrinsic camera configuration parameters—each pixel u in the image domain coordinates with its according depth measurement Dk(u) is transformed into a three-dimensional vertex point Vk,c(u) = (xk,c, yk,c, zk,c)^T ∈ ℝ3 within the ToF camera coordinate system corresponding to the current frame k:
This transformation is applied to each pixel u with its corresponding depth measurement Dk(u) for the current frame k, which yields a vertex map Vk,c(u) for each pixel u (i.e., a metric point measurement in the ToF sensor coordinate system of the current frame k), which is also referred to as the point cloud Vk,c. Further, the surface measurement 701 determines a normal vector Nk,c(u) for each pixel u in the ToF camera coordinate system.
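The back-projection and a simple normal estimation may be sketched as follows (the intrinsic matrix values are placeholders, and the finite-difference normal estimation is one possible choice; border pixels would need special handling):

```python
import numpy as np

def depth_to_vertex_map(depth, K):
    """Back-project a depth map D_k(u) into a vertex map V_{k,c}(u) in the ToF
    camera coordinate system: V_{k,c}(u) = D_k(u) * K^{-1} [u1, u2, 1]^T."""
    h, w = depth.shape
    u1, u2 = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u1, u2, np.ones_like(u1)], axis=-1).astype(float)  # (h, w, 3)
    rays = pix @ np.linalg.inv(K).T        # K^{-1} [u, 1]^T for every pixel
    return rays * depth[..., None]         # scale each ray by its measured depth

def normals_from_vertices(V):
    """Per-pixel normals N_{k,c}(u) from finite differences of the vertex map
    (wrap-around at the image border is ignored in this sketch)."""
    dx = np.roll(V, -1, axis=1) - V
    dy = np.roll(V, -1, axis=0) - V
    n = np.cross(dx, dy)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12)

# Hypothetical intrinsics and a flat 2 m depth map as an example.
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
D_k = np.full((480, 640), 2.0)
V_kc = depth_to_vertex_map(D_k, K)
N_kc = normals_from_vertices(V_kc)
```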
The pose estimation 702 of the 3D reconstruction receives the vertex map Vk,c(u) and the normal vector Nk,c(u) for each pixel u in the camera coordinate system corresponding to the current frame k, and a model estimation for the vertex map V̂k-1,c(u) and a model estimation for the normal vector N̂k-1,c(u) for each pixel u from the surface prediction 704 (see below) based on the latest available model update of the previous frame k−1. In another embodiment the pose estimation may be based directly on the model Sk, from which all points and all normals may be obtained by resampling. Further, the pose estimation 702 obtains an estimated pose Tg,k-1 for the last frame k−1 from a storage. In another embodiment more than one past pose may be used. For example, in a SLAM pipeline a separate (or “backend”) thread is available that does online bundle adjustment and/or pose graph optimization in order to leverage all past poses. Then the pose estimation estimates a pose Tg,k for the current frame k. The pose of the ToF camera describes the position and the orientation of the ToF system, which is described by 6 degrees of freedom (6DOF), that is, three DOF for the position and three DOF for the orientation. The three positional DOF are forward/back, up/down, left/right and the three orientational DOF are yaw, pitch, and roll. The current pose of the ToF camera at frame k can be represented by a rigid body transformation, which is defined by a pose matrix Tg,k:
wherein Rk ∈ ℝ3×3 is the matrix representing the rotation of the ToF camera and tk ∈ ℝ3×1 is the vector representing the translation of the ToF camera from the origin, both denoted in a global coordinate system. SE(3) denotes the so-called special Euclidean group of dimension three. The pose estimation is performed based on the vertex map Vk,c(u) and the normal vector Nk,c(u) for each pixel u of the current frame k and a model estimation for the vertex map V̂k-1,c(u) and a model estimation for the normal vector N̂k-1,c(u) for each pixel u based on the latest available model updated to the previous frame k−1. In another embodiment the model Sk is used directly, especially if it is a mesh model, for example by resampling the mesh. Still further, it is based on the estimated pose Tg,k-1 for the last frame k−1. The pose estimation estimates the pose Tg,k for the current frame k based on an iterative closest point (ICP) algorithm as explained in the above-cited “KinectFusion” paper. With the estimated pose Tg,k for the current frame k, a vertex map Vk,c(u) of the current frame k can be transformed into the global coordinate system, which yields the global vertex map Vk,g(u):
When this is performed for all pixels u it yields a registered point cloud Vk,g. Accordingly the normal vector Nk,c(u) for each pixel u of the current frame k can be transformed into the global coordinate system:
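Once the pose Tg,k has been estimated, the registration into the global coordinate system referenced above amounts to a rigid transform; a minimal sketch, assuming a 4×4 homogeneous pose matrix as defined above:

```python
import numpy as np

def to_global(V_c, N_c, T_gk):
    """Register a vertex map and a normal map into global coordinates using the
    estimated pose T_{g,k} = [R_k | t_k]: points are rotated and translated,
    normals are rotated only."""
    R, t = T_gk[:3, :3], T_gk[:3, 3]
    V_g = V_c @ R.T + t
    N_g = N_c @ R.T
    return V_g, N_g

# Example with an identity pose and a dummy vertex/normal map.
V_c = np.zeros((480, 640, 3)); V_c[..., 2] = 2.0
N_c = np.zeros_like(V_c); N_c[..., 2] = -1.0
V_g, N_g = to_global(V_c, N_c, np.eye(4))
```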
The 3D model of the scene 101 can be reconstructed for example based on volumetric truncated signed distance functions (TSDFs) or other models as described below. The TSDF-based volumetric surface representation represents the 3D scene 101 within a volume Vol as a voxel grid in which the TSDF model stores for each voxel p the signed distance to the nearest surface. The volume Vol is represented by a grid of equally sized voxels which are characterized by their centers p ∈ ℝ3. The voxel p (i.e. its center) is given in global coordinates. The value of the TSDF at a voxel p corresponds to the signed distance to the closest zero crossing (which is the surface interface of the scene 101), taking on positive and increasing values moving from the visible surface of the scene 101 into free space, and negative and decreasing values on the non-visible side of the scene 101, wherein the function is truncated when the distance from the surface surpasses a certain distance. The result of iteratively fusing (averaging) TSDFs of multiple 3D registered point clouds (of multiple frames) of the same scene 101 into a global 3D model yields a global TSDF model Sk which contains a fusion of the frames 1, . . . , k for the scene 101. The global TSDF model Sk is described by two values for each voxel p within the volume Vol, i.e. the actual TSDF function Fk(p) which describes the distance to the nearest surface and an uncertainty weight Wk(p) which assesses the uncertainty of Fk(p), that is Sk := [Fk(p), Wk(p)]. The global TSDF model Sk for the scene 101 is built iteratively: the depth map Dk of the scene 101 of a current frame k, with the corresponding pose estimation Tg,k, is integrated and fused into the previous global TSDF model Sk-1 of the scene 101, such that the global TSDF model Sk-1 := [Fk-1(p), Wk-1(p)] is updated—and thereby improved—by the registered point cloud Vk,g of the current frame k. Therefore, the model reconstruction receives the depth map Dk of the current frame k and the current estimated pose Tk (which yields the registered point cloud Vk,g of the current frame k) and outputs an updated global TSDF model Sk = [Fk(p), Wk(p)]. That means the updated global TSDF model Sk = [Fk(p), Wk(p)] is based on the previous global TSDF model Sk-1 = [Fk-1(p), Wk-1(p)] and on the current registered point cloud Vk,g. According to the above-cited “KinectFusion” paper this is determined as:
wherein the function y = π(z) performs a perspective projection of z ∈ ℝ3 including de-homogenization to obtain y ∈ ℝ2, and where θ is the angle between the associated pixel ray direction and the surface normal measurement Nk,c. TSDFs are for example also described in more detail in the KinectFusion paper cited above. Still further, the model reconstruction 703 may receive a model feedback (for example a model feedback matrix Ak, see below) which indicates for each pixel if it is reliable (overlap pixel, regardless of whether the overall overlap is sufficient or not), unreliable (non-overlap pixel in case that the overlap is sufficient) or new (non-overlap pixel in case that the overlap is not sufficient). The depth data of a reliable or new pixel may be used to improve the 3D model as described above (that means the model is created or updated with the corresponding depth measurement), while the depth data of an unreliable pixel may be discarded or stored to a dedicated buffer that may or may not be used.
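The per-voxel fusion described above can be sketched as a weighted running average (following the KinectFusion-style update; obtaining the per-frame TSDF and weight by projecting each voxel into the current depth map is omitted here):

```python
import numpy as np

def fuse_tsdf(F_prev, W_prev, F_frame, W_frame, w_max=100.0):
    """Fuse the TSDF of the current frame into the global model S_k = [F_k, W_k]
    as a per-voxel weighted running average."""
    W_new = W_prev + W_frame
    valid = W_new > 0
    F_new = F_prev.copy()
    F_new[valid] = (W_prev[valid] * F_prev[valid] +
                    W_frame[valid] * F_frame[valid]) / W_new[valid]
    # Capping the weight keeps the model able to adapt to slow scene changes.
    return F_new, np.minimum(W_new, w_max)

# Toy example on an 8x8x8 voxel grid.
F_prev = np.zeros((8, 8, 8)); W_prev = np.zeros((8, 8, 8))
F_frame = np.full((8, 8, 8), 0.1); W_frame = np.ones((8, 8, 8))
F_k, W_k = fuse_tsdf(F_prev, W_prev, F_frame, W_frame)
```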
The surface prediction 704 receives the updated TSDF model Sk = [Fk(p), Wk(p)] and determines a dense 3D model surface prediction of the scene 101 viewed from the currently estimated pose Tg,k. That is, a dense 3D model surface prediction of the scene 101 viewed from the currently estimated pose Tg,k can be determined by evaluating the surface encoded in the zero-level-set, that is Fk ≡ 0. That means a model estimated vertex map V̂k,c(u) and a model estimated normal vector N̂k,c(u) stated in the ToF camera coordinate system of the current frame k are determined.
This evaluation is based on ray casting the TSDF function Fk(p). That means the ray corresponding to each pixel u within the global coordinate system, which is given by Tg,k K⁻¹[u, 1]^T, is “marched” within the volume Vol and stopped when a zero crossing is found indicating the surface interface. That means each pixel's ray Tg,k K⁻¹[u, 1]^T (or a value rounded to the nearest voxel p) is inserted into the TSDF value Fk(p), and if a zero level Fk(p) = 0 is determined, the march is stopped and the voxel p is determined as part of the model surface (i.e. of the zero-level-set Fk ≡ 0); thereby the estimated model vertex map V̂k,c(u) is determined. If the ray of the ray casting of a certain pixel u “marches” in a region outside the volume Vol, which means that the model Sk is not defined in this region, the estimated model vertex map V̂k,c(u) at this pixel u is defined for example as V̂k,c(u) = NaN (not a number).
Still further, after a pose estimation in the pose estimation 702 and before the model reconstruction in the model reconstruction 703, ray casting of the previously updated model Sk-1 = [Fk-1(p), Wk-1(p)], viewed from the currently estimated pose Tg,k, may be performed as described above, which may yield an estimated model vertex map V̂k,k-1,c(u) for each pixel u viewed from the currently estimated pose Tg,k. In this case, for V̂k,k-1,c(u) the first subscript “k” refers to the currently estimated pose Tg,k with regard to the frame k, and the second subscript “k−1” refers to the previously updated model Sk-1 with regard to the frame k−1. This may be used in the model overlap decision 105-1 as described below.
In another embodiment a model may be characterized as a mathematical object that fulfills one or more of the following aspects: it is projectable to any arbitrary view, it can be queried for nearest neighbors (closest model points) with respect to any input 3D point, it computes distances with respect to any 3D point cloud, it estimates normals and/or it can be resampled at arbitrary 3D coordinates.
As shown in the exemplary embodiment of
In another embodiment the 3D model as obtained from 3D reconstruction (104 in
It should be noted that the model feedback may be determined based on the registered point cloud (which is based on the received point cloud from the device and the current scene 3D model from the past (obtained from the memory)). The model overlap decision determined at 901 defines a model overlap between the previous (i.e. k−1) reconstructed 3D model and the current frame k (the FoV of the current frame) based on the registered point cloud and decides on a camera mode update based on the model overlap. By means of the estimated camera pose, the 3D model can be projected to the desired view, and it can be assessed what fraction of the ToF data of the current frame is overlapping (and therefore improving) the 3D model, and what fraction is new and may be annotated as such in a model feedback. Based on a predetermined criterion (for example, but not limited to, a minimum overlapping region) a decision is made whether or not to modify the camera configuration mode.
In another embodiment the 3D model may be projected to the view of the point cloud and the overlap may be computed (photometric error, point-to-mesh distance, depth map distances between depth information from ToF sensor and 3D model projected to depth map (using camera intrinsics)). At this point, it may be decided whether the overlap is sufficient (see
wherein npixel is the total number of pixels of the imaging sensor of the iToF camera. At 1015, it is asked if the model overlap value poverlap between the reconstructed 3D model and the current frame k is greater than a predetermined minimum overlap value minoverlap, that is poverlap > minoverlap. If the answer at 1015 is yes, it is proceeded further with 1016. At 1016, a Boolean variable booladapt that represents the model overlap decision is set to booladapt = 1 to indicate an adaptation of the current camera configuration mode. This may lead to an increased signal-to-noise ratio SNRSignal of the next frame and therefore improve the 3D model, or it may improve the depth precision (for example by increasing the modulation frequency). If the answer at 1015 is no, it is proceeded further with 1017. At 1017, the Boolean variable booladapt that represents the model overlap decision is set to booladapt = 0 to indicate the use of a default camera configuration mode.
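A sketch of this overlap computation and decision (defining an overlapping pixel as one whose measured vertex lies within a distance threshold of the model-predicted vertex, as well as the concrete threshold values, are illustrative assumptions):

```python
import numpy as np

def overlap_decision(V_meas, V_model, dist_thresh=0.05, min_overlap=0.6):
    """Compute the model overlap fraction p_overlap between the measured vertex
    map and the model-predicted vertex map and decide whether to adapt the
    camera mode (bool_adapt = True) or keep the default mode (False).
    Pixels where the model prediction is NaN count as non-overlapping."""
    n_pixel = V_meas.shape[0] * V_meas.shape[1]
    dist = np.linalg.norm(V_meas - V_model, axis=-1)
    finite = np.isfinite(dist)
    overlapping = np.zeros_like(finite)
    overlapping[finite] = dist[finite] < dist_thresh
    p_overlap = overlapping.sum() / n_pixel
    return p_overlap, bool(p_overlap > min_overlap)
```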
The model overlap decision 105-1 passes this decision booladapt on to the adaptive mode generator (105-2 in
It should be noted that it may be sufficient to only count the number of NaN entries nNaN in the estimated model vertex map V̂k,k-1,c(u) and determine the model overlap value
This allows determining if the camera pose has changed significantly (but does not allow determining per pixel if the elements within the scene have moved).
It should further be noted that the model overlap decision described above determines a model overlap between the previous (i.e. k−1) reconstructed 3D model and the current frame k (the FoV of the current frame) based on the registered point cloud. It should however be noted that in alternative embodiments, the model overlap decision may alternatively determine a model overlap based on the depth map of the scene.
As shown in the exemplary embodiment of
In another embodiment the probability density function f(Dk) may follow a density function other than a Gaussian distribution, and this density function is examined to decide where to acquire the bulk of the depth map information. Further, the amplitude histogram may be examined, so that the exposure is set such that there is no saturation. For example, if 5% of the current depth map is saturated, the modulation frequency is changed (which does not affect the integration time, it can be changed independently) to adapt to, e.g., a reduced unambiguous range (to improve the SNR), where the integration time may also have to be reduced to remove that saturation.
The effective range reff is defined by the mean value deff and the standard deviation σ, that is reff=[deff,σ].
In the diagram of
This effective range reff which characterizes the effective range of the scene may for example be determined by the model overlap decision (105-1 in
In the example above, the effective range reff comprises the mean depth and the standard deviation of the depth distribution of the current depth map Dk. Alternatively, the effective range reff may also be defined by the mean depth alone, or by the mean depth weighted by the standard deviation or the like.
Still further, the effective range reff may comprise the minimum and maximum depth of the current depth map Dk, or the 5th and 95th depth percentiles of the current depth map Dk, or a full depth histogram.
It should also be noted that in the example given above, the effective range reff of the scene is defined by the mean depth deff of the depth map Dk. Alternatively, the median depth of the depth map Dk may be used instead of the mean depth.
In another embodiment the effective range reff may be the interval [0, Dupper] where Dupper may be the 90th or 95th percentile of the current depth histogram.
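The variants of the effective range listed above may be computed, for example, as follows (which statistic is used is a design choice; the percentile values are taken from the examples above):

```python
import numpy as np

def effective_range(depth_map, mode="mean_std"):
    """Summarize the depth distribution of the current depth map D_k into an
    effective range r_eff, in one of the variants discussed above."""
    d = depth_map[np.isfinite(depth_map) & (depth_map > 0)]
    if mode == "mean_std":      # r_eff = [d_eff, sigma]
        return float(d.mean()), float(d.std())
    if mode == "percentiles":   # 5th and 95th depth percentiles
        p5, p95 = np.percentile(d, [5, 95])
        return float(p5), float(p95)
    if mode == "upper":         # [0, D_upper], with D_upper the 95th percentile
        return 0.0, float(np.percentile(d, 95))
    raise ValueError(f"unknown mode: {mode}")
```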
As shown in
As described above, in order to decide whether the overlap is sufficient or not, those points that are overlapping are used to improve the current 3D model into an updated 3D model, where the new points that come in from the measurements (registered point cloud) refine it. Still further, the new, non-overlapping parts may be used to complete the 3D model with new information (which is also equipped with uncertainty weights), which yields the model feedback.
In another embodiment the model feedback may comprise a model feedback matrix Ak (which may also be implemented as a vector or any other data structure) which has the same size as the depth map Dk and where an entry is set to 0 if the pixel is not known so far in the 3D model (i.e. if the pixel has NaN or ∥V̂k,k-1,c(u) − Vk(u)∥ > C or ∥V̂k-1,c(u) − Vk(u)∥ > C), that is aij = Ak(u) = 0, or set to 1 if the pixel is known in the 3D model, aij = Ak(u) = 1. The model feedback matrix Ak may be provided together with the Boolean variable booladapt as model feedback to the 3D model reconstruction (104-2 in
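A sketch of such a model feedback matrix (the residual threshold C and the handling of NaN entries follow the description above; the concrete value of C is a placeholder):

```python
import numpy as np

def model_feedback_matrix(V_meas, V_model, C=0.05):
    """Model feedback matrix A_k: 1 where the pixel is already known in the
    3D model (residual between measured and model-predicted vertex <= C),
    0 where it is unknown or unreliable (NaN prediction or residual > C)."""
    dist = np.linalg.norm(V_meas - V_model, axis=-1)
    A_k = np.zeros(dist.shape, dtype=np.uint8)
    finite = np.isfinite(dist)
    A_k[finite] = (dist[finite] <= C).astype(np.uint8)
    return A_k
```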
As shown in
When the overlap decision (105-1 in
The unwrapping feedback may comprise information for each pixel about the probability for the pixel of being inside a certain wrapping index (or “bin”) i, based on the reconstructed 3D model. As described with respect to
The parts of the model 1302 indicated by brackets 1303 and 1304 are determined to have a high prior probability, e.g. P(i)=1, for a wrapping index i=1 but a low prior probability, e.g. P(i)=0, for wrapping indexes i≠1. The part of the model 1302 indicated by bracket 1305 is determined to have a high prior probability, e.g. P(i)=1 for a wrapping index i=2 but a low prior probability, e.g. P(i)=0, for wrapping indices i≠2.
Alternatively, the prior probability P may be chosen as a soft distribution. That is, for each pixel the wrapping index obtained by the 3D model reconstruction may be promoted with a high prior probability, but the neighboring wrapping indices may also be weighted with a slightly higher prior probability than the rest of the available wrapping indices.
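Such a per-pixel prior over wrapping indices might be built as in the following sketch (the floor value and the weight given to neighbouring indices are illustrative assumptions):

```python
import numpy as np

def wrapping_priors(model_index, n_bins, soft=0.15):
    """Prior probability vector over wrapping indices for one pixel: the index
    predicted by the 3D model receives most of the mass; with soft > 0 its
    neighbouring indices are weighted slightly higher than the remaining ones."""
    p = np.full(n_bins, 1e-3)          # small floor for all indices
    p[model_index] = 1.0               # index suggested by the reconstructed model
    if soft > 0:
        for j in (model_index - 1, model_index + 1):
            if 0 <= j < n_bins:
                p[j] = soft
    return p / p.sum()

print(wrapping_priors(model_index=2, n_bins=4))
```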
In another embodiment an estimated model vertex map V̂k,k-1,c(u) (viewed from the currently estimated pose Tg,k based on the previously updated model Sk-1 = [Fk-1(p), Wk-1(p)]) is used to determine estimated depth data or a phase.
From the received estimated model vertex map V̂k-1,c(u), estimated depth data D̂k-1(u) is determined for each pixel u by using a back-transformation following from Eq. 22:
Based on Eq. 32, a model-estimated phase
is determined, and for a pixel u a maximum likelihood estimator is determined (relatively smooth motion may be assumed). This corresponds to finding the wrapping index hypothesis that maximizes the likelihood of observing a certain phase in the model and in the measurements:
That is, a wrapping index i deduced from the modeled phase from the reconstructed 3D model is weighted higher. Further, this approach can be extended to a maximum a posteriori criterion if prior information is available, for example leveraging spatial priors on the neighboring measurements.
Still further, this approach can be applied also to a coarser scene discretization, for example by looking at an occupancy grid of the 3D model rather than the wrapping indexes of the projected depth map.
When the overlap is insufficient to deduce a wrapping index from the model, it may for example be determined that all unwrapping coefficients have equal probability.
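A simplified stand-in for this maximum likelihood selection (assuming a Gaussian likelihood of the unwrapped depth around the model-estimated depth; the noise level sigma and the number of candidate indices are placeholders):

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def ml_wrapping_index(phi_meas, d_model, f_mod_hz, n_bins=4, sigma=0.05):
    """Pick the wrapping index i maximizing a Gaussian likelihood of the
    unwrapped depth d_i = c*(phi + 2*pi*i)/(4*pi*f_mod) around the
    model-estimated depth d_model."""
    i = np.arange(n_bins)
    d_i = C * (phi_meas + 2.0 * np.pi * i) / (4.0 * np.pi * f_mod_hz)
    log_lik = -0.5 * ((d_i - d_model) / sigma) ** 2
    best = int(np.argmax(log_lik))
    return best, float(d_i[best])

# Example: measured phase 1.0 rad at 50 MHz, model-estimated depth ~6.4 m.
print(ml_wrapping_index(1.0, 6.4, 50e6))  # wrapping index 2, depth ~6.47 m
```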
As shown in
Based on the camera mode feedback information determined by the model overlap decision, the adaptive mode generation may for example determine a camera mode update in such a way as to increase the signal-to-noise ratio SNRSignal of the next frame and therefore improve the reconstruction of the 3D model of the scene.
The adaptive mode generator may for example manage a number of camera modes which each defines a set of configuration parameters for the iToF camera (see
Based on the camera mode feedback information the adaptive mode generator selects a camera configuration mode from the available camera configuration modes. The adaptive mode generator may for example select the camera mode based on the camera mode feedback information in such a way that the signal-to-noise ratio SNRSignal of the next frame is increased in order to improve the 3D reconstruction model.
In the example given in
At 1402, the adaptive mode generator determines, based on the overlap decision parameter booladapt obtained by the model overlap decision, if there is enough overlap for a camera mode update (901 in
In a case where the overlap decision detects that there is sufficient overlap between the currently measured point cloud and the previous 3D model (i.e. booladapt=1), the adaptive mode generator decides that it is possible to switch the camera mode to an optimized mode and continues at 1404.
In the example provided here, the adaptive mode generator selects the alternative camera mode on the basis of the effective range reff=[deff, σ] as obtained from the model overlap decision and described with regard to
At 1404 the modulation frequency fmod is determined based on Eq. 19 above in such a way that the unambiguous range ZUnambiguous exceeds the first standard deviation above the mean depth:
With exemplifying parameters of μ = deff = 3 m and σ = 0.61 m, this yields a modulation frequency fmod ≤ 41.5 MHz for the mode update (see also
Based on this envisaged modulation frequency fmod=41.5 MHz, the adaptive mode generator, at 1405, selects a camera mode which fits best to the frequency requirement. At 1406 the adaptive mode generator controls the ToF camera to switch from the default camera mode A to this selected camera mode (mode B).
This increases the signal-to-noise ratio SNRSignal of the next frame, as it increases the resolution within the decreased unambiguous range, which fits the current scene better than the configuration settings applied in the previous frame, in which the unambiguous range was longer than needed.
In the example provided above, the adaptive mode generator sets the modulation frequency so that the unambiguous range corresponds to one standard deviation above the mean depth of the scene. In alternative embodiments, this may be chosen differently. For example, the modulation frequency fmod may be set to correspond to two or three standard deviations above the mean, or to correspond to an unambiguous range defined as a certain percentile of the probability density function of depth of a depth map Dk or the like.
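The frequency selection of this example can be reproduced with the following sketch (the candidate mode table only mirrors the 20/50/60 MHz modes mentioned earlier, and interpreting "fits best" as the nearest available frequency is an assumption consistent with the selection of mode B above):

```python
C = 299_792_458.0  # speed of light in m/s

def target_modulation_frequency(d_eff, sigma, n_sigma=1.0):
    """Modulation frequency whose unambiguous range equals d_eff + n_sigma*sigma:
    f_mod = c / (2 * (d_eff + n_sigma * sigma))."""
    return C / (2.0 * (d_eff + n_sigma * sigma))

def select_mode(d_eff, sigma, modes_hz=(20e6, 50e6, 60e6)):
    """Pick the predefined camera mode whose modulation frequency is closest
    to the target frequency."""
    f_target = target_modulation_frequency(d_eff, sigma)
    return min(modes_hz, key=lambda f: abs(f - f_target))

print(round(target_modulation_frequency(3.0, 0.61) / 1e6, 1))  # ~41.5 MHz
print(select_mode(3.0, 0.61) / 1e6)                            # 50.0 -> mode B
```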
In the example above, the adaptive mode generator responds to camera mode feedback information comprising the effective range of the scene and adapts the camera mode by changing the modulation frequency of a multi-frequency ToF camera. Focusing here on the effective range and modulation frequency, however, serves only as an example. Any single configuration parameter or group of configuration parameters may be adapted by the adaptive mode generator in a similar way. And the adaptive mode generator might base its decision on any other camera mode feedback information that is suitable to control the camera mode.
It should also be noted that in another embodiment it may not be required that a predefined camera mode comprises multiple configuration parameters as shown in
It should be noted that the description above is only an example configuration. Alternative configurations may be implemented with additional or other sensors, storage devices, interfaces, or the like.
It should be recognized that the embodiments describe methods with an exemplary ordering of method steps. The specific ordering of method steps is, however, given for illustrative purposes only and should not be construed as binding.
It should also be noted that the division of the electronic device of
All units and entities described in this specification and claimed in the appended claims can, if not stated otherwise, be implemented as integrated circuit logic, for example, on a chip, and the functionality provided by such units and entities can, if not stated otherwise, be implemented by software.
In so far as the embodiments of the disclosure described above are implemented, at least in part, using software-controlled data processing apparatus, it will be appreciated that a computer program providing such software control and a transmission, storage or other medium by which such a computer program is provided are envisaged as aspects of the present disclosure.
Note that the present technology can also be configured as described below:
(1) An electronic device comprising circuitry configured to update a camera configuration (fmod, mode A, B, C) based on camera mode feedback information (booladapt, reff) obtained by relating depth information (Vk,c(u), Nk,c(u), Dk) obtained from ToF measurements with a reconstructed model (Sk) of a scene (101).
(2) The electronic device of (1), wherein the camera configuration is described by configuration settings of an imaging sensor and/or an illumination unit of an iToF camera.
(3) The electronic device of (1) or (2), wherein the circuitry is configured to reconstruct and/or update the model (Sk) of the scene (101) based on the depth information (Vk,c(u), Nk,c(u), Dk) obtained from ToF measurements.
(4) The electronic device of any one of (1) to (3), wherein the circuitry is configured to determine an overlap (poverlap) between the depth information (Vk,c(u), Nk,c(u), Dk) and the model (Sk) of the scene, and to update the camera configuration based on the overlap (poverlap).
(5) The electronic device of (4), wherein the circuitry is configured to decide, based on the overlap (poverlap), whether or not the camera configuration is to be updated.
(6) The electronic device of any one of (1) to (5), wherein the circuitry is configured to improve a signal-to-noise ratio (SNRSignal) by updating the camera configuration.
(7) The electronic device of any one of (1) to (6), wherein the camera configuration comprises one or more of a modulation frequency (fmod) of an illumination unit (210) of a ToF camera, an integration time, a duty cycle, a number of samples per correlation waveform period, a number of sub-frames per measurement, a frame rate, a length of a read-out period, a number of sub-integration cycles and a time span of the sub-integration cycles.
(8) The electronic device of any one of (1) to (7), wherein the camera mode feedback information controlling the camera configuration comprises an effective range (reff) of the scene.
(9) The electronic device of any one of (1) to (8), wherein the camera mode feedback information controlling the camera configuration comprises a saturation value of the ToF signal amplitude.
(10) The electronic device of any one of (1) to (9), wherein the circuitry is configured to determine unwrapping feedback based on the model (Sk-1) of the scene.
(11) The electronic device of (10), wherein the circuitry is configured to determine unwrapping feedback for a pixel (u) based on the model (Sk-1) of the scene (101), and an estimated camera pose (Tk-1).
(12) The electronic device of (11), wherein the circuitry is configured to determine a wrapping index (i) for a pixel (u) based on the unwrapping feedback for the pixel (u).
(13) The electronic device of any one of (1) to (12), wherein the circuitry is configured to determine model feedback (Ak) based on an overlap (poverlap) between the depth information from ToF measurements and the model (Sk) of the scene.
(14) The electronic device of any one of (1) to (13), wherein the circuitry is configured to update parts of the model (Sk-1) of the scene (101).
(15) The electronic device of any one of (1) to (14), wherein the circuitry is configured to estimate a camera pose (Tk) and to determine an overlap (poverlap) between the model (Sk) of the scene (101) and a current frame (k) viewed from the estimated pose (Tk) of the camera corresponding to the current frame (k).
(16) A method comprising updating a camera configuration (fmod, mode A, B, C) based on camera mode feedback information (booladapt, reff) obtained by relating depth information (Vk,c(u), Nk,c(u), Dk) obtained from ToF measurements with a reconstructed model (Sk) of a scene (101).
(17) A computer program comprising instructions which when executed by a processor cause the processor to update a camera configuration (fmod, mode A, B, C) based on camera mode feedback information (booladapt, reff) obtained by relating depth information (Vk,c(u), Nk,c(u), Dk) obtained from ToF measurements with a reconstructed model (Sk) of a scene (101).
Foreign application priority data: 21205237.7, Oct 2021, EP (regional).
International filing data: PCT/EP2022/079129, filed 10/19/2022, WO.