Time-of-flight (ToF) depth sensors have become the technology of choice in diverse applications, from automotive and aviation to robotics, gaming and consumer electronics. These sensors come in two general flavors: LIDAR-based systems that rely on extremely brief pulses of light to sense depth, and continuous-wave (CW) systems that emit a modulated light signal over a much longer duration. LIDAR-based systems can acquire centimeter-accurate depth maps up to a kilometer away in broad daylight, but have low measurement rates, and their cost per pixel is orders of magnitude higher than that of CW systems, whose range, outdoor operation and robustness are, in turn, extremely limited. Because low cost, large-scale production and high measurement rate often trump other considerations, continuous-wave time-of-flight (CW-ToF) sensors continue to dominate the consumer electronics and low-end robotics space despite their shortcomings. Indeed, consumer-grade time-of-flight depth cameras like Kinect and PMD are cheap, compact and produce video-rate depth maps in short-range applications.
The present invention significantly reduces the shortcomings of CW-ToF through the use of energy-efficient epipolar imaging. In certain embodiments, a continuously-modulated sheet of laser light is projected along a sequence of carefully chosen epipolar planes that collectively span the field of view. For each projected sheet, only the strip of CW-ToF pixels corresponding to that epipolar plane is exposed. As shown in
Epipolar imaging was first proposed for acquiring live direct-only or global-only video with a conventional (non-ToF) video sensor. The approach has been extended to the ToF domain, but its energy efficiency is very low and it involves capturing more than 500 images to calculate a single “direct-only” ToF image. In the context of triangulation-based 3D imaging, significant improvements in energy efficiency and robustness can be achieved with a 2D scanning-laser projector and a rolling shutter camera. The present invention extends this idea to the ToF domain. As such, it inherits all the advantages of non-ToF energy-efficient epipolar imaging while also addressing challenges that are specific to CW-ToF.
The primary difficulty is that the range of CW-ToF sensors is severely limited by power consumption and eye safety considerations. Although most CW-ToF sensors electronically subtract the DC component of incident light, photon noise from strong ambient sources such as sunlight can easily overwhelm the CW-ToF signal at distances of more than a few meters outdoors at typical frame rates. By concentrating the energy of the light source into a single sheet, epipolar ToF boosts this range to 10 m and acquires a useful, albeit noisier, depth signal at over 15 m outdoors.
A secondary difficulty is that the depth accuracy of CW-ToF sensors is strongly affected by global illumination effects such as inter-reflections. These effects produce longer light paths and show up as a source of structured additive noise. They cannot be cancelled a posteriori without imposing strong assumptions on the scene's geometry and reflectance properties, yet they are extremely common indoors (e.g., corners between walls, shiny surfaces of tables and floors, mirrors, etc.). The present invention demonstrates significant robustness to all forms of global transport, and to specular inter-reflections in particular, a form of global light transport that has not previously been possible to handle in live CW-ToF.
As devices equipped with CW-ToF depth sensors become increasingly common indoors and outdoors, they must be able to operate without interfering with each other. While non-interference between devices of a given make and model can be achieved by varying modulation frequency across them, robustness against the broader ecosystem of CW-ToF sensors is desirable. The present invention demonstrates that epipolar ToF enables interference-free live 3D imaging, even for devices that have the exact same modulation frequency and light source wavelength.
Lastly, CW-ToF sensors must acquire two or more frames with a different phase of emitted light to compute a single depth map. This makes them highly sensitive to camera shake. Unlike conventional cameras, where shaking merely blurs the image, camera shake in CW-ToF causes the static-scene assumption to be violated, leading to depth maps that are both blurry and corrupted by motion artifacts. Epipolar ToF makes it possible to address both problems: motion blur is minimized by relying on very short exposures for each epipolar plane; motion artifacts and depth errors are minimized by acquiring multiple phase measurements per epipolar plane, rather than per frame; and rolling-shutter-like distortions due to the sequential nature of epipolar-plane ToF are reduced by scheduling the sequence of epipolar planes so that post-acquisition distortion correction becomes easier.
The term microcontroller, as used herein, may mean a dedicated hardware device, circuitry, an ASIC, an FPGA, a microprocessor running software, or any other means known in the art. It is further understood that the microcontroller will include connections to both the sensor and the laser light projector for sending control signals, and for receiving data. The invention is not intended to be limited to one method of implementing the functions of the controller.
As used herein, the terms camera and sensor are used interchangeably.
Continuous Wave Time of Flight
CW-ToF cameras use a temporally modulated light source and a sensor whose exposure is also modulated during integration. If the illumination modulation function is ƒω(t)=cos(ωt) and the sensor modulation function is gω,ϕ(t)=cos(ωt+ϕ), where ω is the modulation frequency in rad/s and ϕ is the phase offset between the source and sensor modulation functions, then the measurement at a pixel x is:
Iω,ϕ(x) = ∫ gω,ϕ(t) [Ax + ∫ ƒω(t−s) hx(s) ds] dt  (integrated over the exposure time)
where hx(t) represents a pixel's transient response to the active light source and Ax is the light received due to ambient light and the DC component of the active light source. Although Ax drops out of the integral, in practice Iω,ϕ(x) is measured by integrating the incoming light into two different storage sites (called taps), depending on whether gω,ϕ(t) is positive or negative, and then taking the difference between the stored values, so the ambient light still contributes to the measurement shot noise.
If there are no indirect light paths between the light source and sensor pixel x, then hx(t) ∝ δ(t−l(x)/c), where c is the speed of light and l(x) is the length of the path from the light source to the scene point corresponding to x and back to the sensor.
Assuming the scene is static, the path length l(x) can be recovered by capturing a pair of images at the same frequency but two different modulation phases, ϕ=0 and ϕ=π/2:
l(x) = (c/ω)·atan2(−Iω,π/2(x), Iω,0(x)), up to the 2πc/ω phase-wrapping ambiguity.
The pixel depth z(x) can be computed from l(x) using the geometric calibration parameters of the light source and sensor.
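By way of a non-limiting illustration, the following sketch shows this two-measurement recovery numerically (Python). It follows the cosine modulation model given above; the amplitude, example path length and modulation frequency are hypothetical values chosen only for the example.

```python
import numpy as np

C = 3e8  # speed of light, m/s

def simulate_measurements(path_length, omega, amplitude=1.0):
    """Ideal CW-ToF measurements at phase offsets 0 and pi/2 for a single
    direct light path of the given length (cosine modulation model)."""
    phase = omega * path_length / C
    i_0  = amplitude * np.cos(phase)              # phi = 0
    i_90 = amplitude * np.cos(phase + np.pi / 2)  # phi = pi/2  (= -sin(phase))
    return i_0, i_90

def path_length_from_measurements(i_0, i_90, omega):
    """Recover the (wrapped) round-trip path length from the two measurements."""
    phase = np.arctan2(-i_90, i_0) % (2 * np.pi)  # wrapped to [0, 2*pi)
    return C * phase / omega

omega = 2 * np.pi * 24e6   # 24 MHz modulation, in rad/s (illustrative)
l_true = 4.2               # example round-trip path length in metres
i_0, i_90 = simulate_measurements(l_true, omega)
l_est = path_length_from_measurements(i_0, i_90, omega)
print(f"true {l_true:.3f} m, recovered {l_est:.3f} m "
      f"(unambiguous range {C * 2 * np.pi / omega:.2f} m)")
```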
Epipolar Time of Flight
To realize the geometry of
In a preferred embodiment, the third option is chosen because it is more light-efficient than using a DMD mask, and it leads to a simpler design. The ROI is set to one row tall to match the requirements of epipolar ToF.
Epipolar Plane Sampling
CW-ToF requires at least two images to recover depth. To cover an entire scene using epipolar ToF, the active epipolar plane must be swept across the field-of-view. This offers flexibility in choosing the order in which epipolar planes are sampled.
Another embodiment in
Using this strategy, each row is captured at a slightly different time. Although this induces a rolling shutter-like effect in the acquired depth map, the individual depth values will be blur- and artifact-free and can be combined into a consistent model by post-processing.
To make such post-processing even easier while obeying the kinematic constraints of the mirror's actuator, epipolar planes are ordered in a sawtooth pattern, as shown in
More generally,
In operation, the projector generates a sheet of modulated laser light and sequentially illuminates epipolar planes defined between the laser projector and the sensor. The planes may be illuminated in any order but, in a preferred embodiment, are illuminated from top-to-bottom and then bottom-to-top. The actual order in which the planes are illuminated may depend upon the particular environment in which the platform is being used or the application for which the depth map is being created. Also, any number of planes may be defined within the field-of-view, limited only by the capabilities of the laser and the sensor, and the desired frame rate. In a preferred embodiment, 240 planes are defined in the field-of-view, one for each row of the 320×240-pixel sensor.
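A minimal sketch of one possible plane-ordering schedule is shown below (Python). The top-to-bottom-then-bottom-to-top sweep and the 240-row count follow the description above; the function name and the notion of a fixed number of sweeps are assumptions made only for illustration.

```python
def sweep_schedule(num_rows=240, sweeps=2):
    """Order of epipolar planes (sensor rows): sweep top-to-bottom, then
    bottom-to-top, repeating. Each entry is the row illuminated/exposed next."""
    order = []
    for s in range(sweeps):
        rows = range(num_rows) if s % 2 == 0 else range(num_rows - 1, -1, -1)
        order.extend(rows)
    return order

schedule = sweep_schedule()
print(schedule[:5], "...", schedule[238:242], "...")  # 0..239 then back 239..0
```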
The region of interest of the sensor can be set to any portion of the field-of-view and, in operation, a microcontroller synchronizes the laser projector and the sensor such that the ROI of the sensor is set to sense the row of pixels within the currently illuminated epipolar plane. Phase is estimated using two images. In general, the sensor takes four measurements, correlating the incoming signal with shifted copies of the input signal (phase offsets of 0, 90, 180 and 270 degrees). Either 2 or 4 of these images can be used for phase estimation; using 4 images gives more accuracy but takes longer to capture and reduces the frame rate. If phase unwrapping is necessary, the phase estimation process must be performed at two different modulation frequencies, so 4 images instead of 2 are required. In certain embodiments of the invention, an inertial measurement unit (IMU) may be attached to the sensor and is used to compensate for motion of the platform.
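The following sketch (Python) illustrates how the four phase measurements and two modulation frequencies might be combined. The differencing of opposite phase angles and the brute-force search over wrap counts are standard CW-ToF practice rather than the exact procedure used by the described sensor, and the example frequencies and distance are hypothetical.

```python
import numpy as np

C = 3e8  # m/s

def phase_from_four_taps(i0, i90, i180, i270):
    """Phase estimate from correlations at 0, 90, 180, 270 degrees.
    Differencing opposite angles cancels offsets common to both."""
    return np.arctan2(i270 - i90, i0 - i180) % (2 * np.pi)

def unwrap_two_frequencies(phi1, f1, phi2, f2, max_range=30.0):
    """Pick the path length consistent with both wrapped phases by searching
    over integer wrap counts of the first frequency (brute force)."""
    best, best_err = None, np.inf
    for k1 in range(int(max_range * f1 / C) + 1):
        d1 = (phi1 / (2 * np.pi) + k1) * C / f1          # candidate from f1
        k2 = round(d1 * f2 / C - phi2 / (2 * np.pi))     # nearest wrap count for f2
        d2 = (phi2 / (2 * np.pi) + k2) * C / f2
        err = abs(d1 - d2)
        if err < best_err:
            best, best_err = 0.5 * (d1 + d2), err
    return best

# Hypothetical example: 24 MHz and 16 MHz modulation, 9.7 m round-trip path.
f1, f2, d_true = 24e6, 16e6, 9.7
phi1 = (2 * np.pi * d_true * f1 / C) % (2 * np.pi)
phi2 = (2 * np.pi * d_true * f2 / C) % (2 * np.pi)
print(f"unwrapped path length ~ {unwrap_two_frequencies(phi1, f1, phi2, f2):.2f} m")
```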
Epipolar ToF Prototype
A prototype device for epipolar ToF imaging, shown in
The ToF sensor used is the EPC660 (from Espros Photonics), which has a resolution of 320×240 and pixels that implement ambient saturation prevention. The sensor is fitted with an 8 mm F1.6 low-distortion lens and an optical bandpass filter (650 nm center wavelength, 20 nm bandwidth). The sensor allows the ROI to be changed with every sensor readout, and this feature is used to select different rows to image. To read data out of the sensor, the sensor development kit (DME660) from the manufacturer is utilized. It should be realized that the invention is not limited to the use of the described ToF sensor; any ToF sensor might be used.
The line projector utilized for the prototype uses a 638 nm laser diode with a peak power of 700 mW as its light source. Light from the diode is collimated and passed through a Powell lens that stretches the beam cross-section into a diverging, almost uniformly illuminated straight line with a 45 degree fanout angle. The laser light is directed at a 1D scanning galvomirror that can be rotated to deflect the sheet. The rotational range of the mirror gives the projector a 40 degree vertical field of view. The projector's effective center of projection moves as the mirror rotates, but because the distance between the fanout point and the galvomirror is very small compared to depths in the scene, this effect can be ignored.
A block diagram of the system components is shown in
The projector and camera are aligned side-by-side in a rectified stereo configuration, as required for epipolar imaging. When correctly aligned, the projected light sheet illuminates a single row of pixels in the camera, and this row is independent of depth. A mirror calibration is performed to determine the mapping between the galvomirror angle and the illuminated camera row.
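One plausible way to represent this mirror calibration is a smooth fit between commanded galvomirror angle and illuminated camera row, sketched below (Python). The polynomial model and the example correspondences are assumptions made for illustration, not the actual calibration procedure.

```python
import numpy as np

# (galvomirror angle in degrees, camera row where the light sheet was observed)
# Example correspondences collected by sweeping the mirror; values are illustrative.
angles = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])
rows   = np.array([  5.0,  65.0, 120.0, 178.0, 235.0])

# Fit a low-order polynomial in both directions so the controller can look up
# the mirror angle needed to illuminate any requested sensor row, and vice versa.
row_to_angle = np.polynomial.Polynomial.fit(rows, angles, deg=3)
angle_to_row = np.polynomial.Polynomial.fit(angles, rows, deg=3)

target_row = 100
print(f"row {target_row} -> mirror angle {row_to_angle(target_row):.2f} deg")
```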
Sensor Calibration
In practice, the measurements read out from the sensor do not match their expected values. There are a number of reasons for this discrepancy, including fixed-pattern noise, unequal sensitivity and crosstalk between the taps, and variations in the phase of the actual exposure modulation function at each pixel. The relation between the expected sensor measurements Iω(x) and the observed measurements Îω(x) is modelled using a projective correction Hω(x) at each pixel.
To find Hω(x), the sensor is placed fronto-parallel to a planar surface at a set of known distances zk, k=1, . . . , K. For each position of the plane, sensor measurements are collected at different aperture settings (s=1, . . . , S) to simulate the effect of varying scene albedos. For each plane position k, the path length lk(x) at each pixel can be computed, and from it the expected phase ωlk(x)/c (modulo 2π).
The correction Hω(x) that best explains the sensor measurements Iω,k,s(x) is computed by minimizing the least-squares error between the corrected measurements and the expected phase.
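A rough sketch of this per-pixel fit is given below (Python with SciPy). It assumes that Hω(x) is a 3×3 projective transform acting on the homogeneous vector of the two phase measurements and that the residual is the wrapped difference between corrected and expected phase; this is a plausible reading of the description above, not a verbatim implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def corrected_phase(H, i0, i90):
    """Apply a 3x3 projective correction to the homogeneous measurement
    vector [I_0, I_90, 1] and return the corrected phase estimate."""
    v = H @ np.array([i0, i90, 1.0])
    return np.arctan2(-v[1] / v[2], v[0] / v[2])

def fit_correction(meas_0, meas_90, expected_phase):
    """Fit H for one pixel from measurements taken at K plane positions and
    S aperture settings (all arrays flattened to length K*S)."""
    def residuals(h):
        H = np.append(h, 1.0).reshape(3, 3)   # fix H[2,2] = 1 to remove scale
        est = np.array([corrected_phase(H, a, b)
                        for a, b in zip(meas_0, meas_90)])
        # wrap phase differences into (-pi, pi]
        return np.angle(np.exp(1j * (est - expected_phase)))
    h0 = np.eye(3).ravel()[:8]                # start from the identity correction
    return np.append(least_squares(residuals, h0).x, 1.0).reshape(3, 3)
```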
These calibration parameters are dependent on both modulation frequency and exposure time so the process is repeated for all the frequencies and exposure times. Although the modulation signals passed to the sensor and light source driver are square waves, at modulation frequencies of 20 MHz and above, the harmonics were largely suppressed and so the modulation functions were well approximated by sinusoids.
Timing
The time needed to image a row (and by extension the frame rate) with the prototype is a function of n, the number of readouts per row; texp, the exposure time; tread, the readout time for a row; and tmirror, the time taken by the galvomirror to move to the next row position in the sampling sequence:
trow = n·texp + (n−1)·tread + max(tread, tmirror)    (5)
With a two-tap sensor like the one used in our prototype, at least n=2 readouts are needed to measure depth using a single modulation frequency.
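To make Equation (5) concrete, a small calculation sketch follows (Python). The exposure, readout and mirror-settling times are placeholder values chosen only for illustration, not measured figures for the prototype.

```python
def row_time(n, t_exp, t_read, t_mirror):
    """Equation (5): time to image one row with n readouts per row."""
    return n * t_exp + (n - 1) * t_read + max(t_read, t_mirror)

# Placeholder timings (seconds), purely for illustration.
t_exp, t_read, t_mirror = 50e-6, 100e-6, 150e-6
n = 2        # two readouts per row for a two-tap sensor at one frequency
rows = 240   # epipolar planes per depth map

t_row = row_time(n, t_exp, t_read, t_mirror)
fps = 1.0 / (rows * t_row)
print(f"t_row = {t_row * 1e6:.0f} us, depth-map rate ~ {fps:.1f} Hz")
```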
Limitations
Currently, the main bottleneck for the frame rate is the readout time. Embodiments of the present invention need data from only one row of the sensor per readout, but the smallest region of interest the EPC660 sensor supports is 4 rows tall, so reading 4 rows is forced even though only one row is actually used. In addition, the development kit limits the sensor data bus to 20 MHz, although the sensor itself supports bus rates of up to 80 MHz. The minimum value of texp depends on the peak power of the light source and the desired range. The described prototype of the present invention has a source with a peak power of 700 mW, while most other experimental time-of-flight systems have a peak light source power in the 3 W to 10 W range. With a brighter light source, a shorter exposure time could be used without loss of range. Lastly, the low-cost galvomirror could be replaced with a faster 1D MEMS mirror. With these improvements, a system based on the described prototype would operate at video frame rates.
The sensor used in the described prototype supports a maximum modulation frequency of only 24 MHz, whereas most other time-of-flight sensors can run in the 50 MHz to 100 MHz range. This limits the ability of the prototype to accurately scan smaller objects or to be used for transient imaging. The EPC660 datasheet specifies that the sensor ADC returns 12-bit values, but the version of the sensor that was used returns only 10 bits, which affects range and makes the output depth maps noisier.
Results
To run the sensor in regular imaging mode for comparing performance under ambient light and global illumination, the entire sensor is exposed at once instead of using a small ROI and the sensor is left exposed until the sheet projector has finished a sweep across the field of view. For regular ToF imaging in the multi-device interference and camera motion experiments, the sheet projector can be replaced with a diffused source.
Ambient Light
The benefits of applying epipolar imaging to time-of-flight in brightly lit environments were simulated, and the results shown in
Global Illumination
The conference table in the second row of
With epipolar imaging, the walls appear straight and meet at a sharp right angle. The example scenes include diffuse inter-reflections at the corner, glossy inter-reflections from the projection screen onto a shiny conference table, reflections from the mirrors in the restroom, and reflections between the wall and the shiny water fountain. Epipolar ToF eliminates most of this global light transport, resulting in depth maps that are significantly more accurate than those from regular ToF.
Multi-Camera Interference
With epipolar CW-ToF imaging, two cameras running at the same modulation frequency can usually only interfere with each other at a sparse set of pixels in each image. Each camera illuminates and images a single line in the scene at a time, so at any point of time the second camera can only interfere with the first camera at the points where its illuminated line intersects with the first camera's exposed row of pixels. A degenerate case occurs when the light source of one camera forms a rectified stereo pair with the sensor of the second camera and both cameras happen to be synchronized, but this can be considered a rare occurrence.
If more than two cameras are present, each pair of cameras has a sparse set of points where they interfere with each other. When a set of epipolar ToF cameras are running at different modulation frequencies, the contribution of each camera to shot noise in the other cameras is greatly reduced.
Camera Motion
Consider a rotating camera whose rotational trajectory is known (obtained from a MEMS gyroscope). With regular imaging, each captured ToF measurement has motion blur and strong artifacts at depth discontinuities because the measurements are not aligned with each other. In theory, these could be corrected using a spatially varying deconvolution, but this is computationally expensive and does a poor job of recovering high-frequency components. With epipolar ToF imaging, motion blur has essentially no effect, and a depth map with a rolling-shutter-like effect is acquired. This can be corrected with a simple image warp computed from the rotation.
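A minimal sketch of such a warp is shown below (Python). It assumes a pure camera rotation, known per-row rotation matrices from the gyroscope, and a pinhole intrinsic matrix K; all of these, and the data layout, are illustrative assumptions rather than the prototype's actual correction code.

```python
import numpy as np

def row_homography(K, R_row, R_ref):
    """Homography mapping pixels captured under orientation R_row into the
    reference orientation R_ref (pure rotation, so the map is depth-independent)."""
    return K @ R_ref @ R_row.T @ np.linalg.inv(K)

def warp_rows(depth_rows, K, rotations, R_ref):
    """Warp each row's pixel coordinates into the reference frame.
    depth_rows: list of (row_index, [(col, depth), ...]) measurements,
    rotations: per-row rotation matrices from the gyroscope."""
    corrected = []
    for (r, samples), R_row in zip(depth_rows, rotations):
        H = row_homography(K, R_row, R_ref)
        for c, z in samples:
            p = H @ np.array([c, r, 1.0])
            corrected.append((p[0] / p[2], p[1] / p[2], z))
    return corrected
```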
Epipolar imaging for time-of-flight depth cameras mitigates many of the problems commonly encountered with depth cameras, such as poor performance in brightly lit conditions, systematic errors due to global illumination, inter-device interference and errors due to camera motion. Compared to full-field depth cameras, systems like scanning LIDAR that illuminate and image a single point at a time are very robust to all these effects but have a low measurement rate. Epipolar imaging can be thought of as a compromise between these two extremes of full-field capture and point-by-point capture. Because epipolar imaging illuminates and captures a single line at a time, it allows a depth camera to have most of the robustness of point scanning while still having a high measurement rate.
Cycling through patterns row-by-row, as is done here for ToF, is directly applicable to structured light as well. It would make it possible to apply multi-image structured light methods that generate high quality depth maps to dynamic scenes where currently only single-shot methods can be used.
In the described prototype, the scanning mirror follows a sawtooth pattern and captures rows in an orderly sequence. However, with a faster scanning mirror, pseudo-random row sampling strategies could be implemented that might allow epipolar imaging to be used in conjunction with compressed sensing or similar techniques to recover temporally super-resolved depth maps of fast-moving scenes. Embodiments of the invention have been described herein using specific identified components; however, the invention is not meant to be limited thereby. The scope of the claimed invention is defined by the claim set presented below.
This application is a national phase filing under 35 U.S.C. § 371 claiming the benefit of and priority to International Patent Application No. PCT/US2018/014369, filed on Jan. 19, 2018, which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/499,193, filed Jan. 20, 2017. This application is a Continuation-In-Part of U.S. patent application Ser. No. 15/545,391, which is a national phase filing under 35 U.S.C. § 371 claiming the benefit of and priority to International Patent Application No. PCT/US2016/017942, filed on Feb. 15, 2016, which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/176,352, filed Feb. 13, 2015. The entire contents of these applications are incorporated herein by reference.
This invention was made with government support under N000141512358 awarded by the ONR, IIS1317749 awarded by the NSF, HR00111620021 awarded by DARPA, and grants NNX16AD98G and NNX14AM53H awarded by NASA. The government has certain rights in the invention.