Embodiments of the present disclosure relate generally to imaging systems for depth estimation and more particularly to imaging systems for hybrid depth estimation.
Imaging systems in the field of the invention generally rely on the principle of triangulation. One implementation of this principle involves acquiring images from two locations where an effective aperture for pixels in the two images is small relative to the separation between the two locations. (The effective aperture refers to the portion of a physical aperture that contains all of the rays that reach the active part of a pixel.) This implementation is called stereo vision and is often implemented with two separate cameras and lenses. To perform triangulation, a correspondence is made between the two images to determine a position of features within both images. When the positions are offset from one another, the amount of offset may be used to determine the 3-dimensional location of the feature, including the depth of the feature.
Depth estimates obtained using such techniques are useful for a variety of applications. For example, depth estimates may be used to obtain a three dimensional map of a site or area of interest, such as a construction site, a room, an anatomical region, and/or the like. Depth estimates may also be used to form three dimensional models of objects for applications such as three-dimensional printing or for archival purposes. Depth estimates may also be used by cinematographers, photographers, or other artists to form three-dimensional images or video.
Accordingly, it would be desirable to develop improved imaging systems and methods for estimating the depth of an object.
According to some embodiments, an image acquisition device may include a first image sensor configured to acquire one or more first images, a second image sensor configured to acquire a set of polarized images, and an image processor. The image processor may be configured to perform operations including receiving the one or more first images from the first image sensor, determining, based on the one or more first images, a first depth estimate of a feature appearing in the one or more first images, determining a reliability of the first depth estimate, receiving the set of polarized images from the second image sensor, determining, based on the set of polarized images, a second depth estimate of the feature, and determining a hybrid depth estimate corresponding to the first depth estimate or the second depth estimate. The hybrid depth estimate is selected based on the reliability of the first depth estimate.
According to some embodiments, a system may include a non-transitory memory and one or more hardware processors configured to read instructions from the non-transitory memory and perform operations. The operations may include receiving one or more first images from a first image sensor, determining, based on the one or more first images, a first depth estimate of a feature appearing in the one or more first images, identifying the first depth estimate as a hybrid depth estimate, and determining a reliability of the first depth estimate. When the reliability of the first depth estimate is below a predetermined threshold, the operations may further include receiving a set of polarized images from a second image sensor, determining, based on the set of polarized images, a second depth estimate of the feature, and replacing the hybrid depth estimate with the second depth estimate.
According to some embodiments, a method may include receiving a first phase image and a second phase image from a phase detection sensor, determining, based on the first phase image and the second phase image, a phase-based depth estimate of a feature appearing in the first and second phase images, determining a reliability of the phase-based depth estimate, receiving a set of polarized images from a polarized image sensor, determining, based on the set of polarized images, a polarization-based depth estimate of the feature, and determining a hybrid depth estimate corresponding to the phase-based depth estimate or the polarization-based depth estimate. The hybrid depth estimate is selected based on the reliability of the phase-based depth estimate.
These and other aspects and features of the present disclosure will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures, wherein:
Embodiments of the present disclosure will now be described in detail with reference to the drawings, which are provided as illustrative examples of the disclosure so as to enable those skilled in the art to practice the disclosure. The drawings provided herein include representations of devices and device process flows which are not drawn to scale. Notably, the figures and examples below are not meant to limit the scope of the present disclosure to a single embodiment, but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Moreover, where certain elements of the present disclosure can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present disclosure will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the disclosure. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the disclosure is intended to encompass other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, inventors do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the present disclosure encompasses present and future known equivalents to the known components referred to herein by way of illustration.
There are a variety of ways to acquire depth images and/or depth maps of a scene. Active methods send light from imaging equipment into the scene and measure the response. One active technique is time of flight imaging, which measures the amount of time required for light to travel into the scene and return to the imaging system. Another technique is structured light, in which a projector illuminates the scene with light patterns such as sinusoidal or square waves, random dots, or various other patterns. The depth is then estimated through triangulation between the projector and an image captured by an imaging system. Both time of flight and structured light require lighting systems with complex components. These components are typically expensive, prone to breaking or misalignment, and demand significant space and additional equipment for mechanical and electrical support.
Some imaging systems can measure depth maps of a scene through multiple exposures, including video recording. Such techniques include moving the camera through different positions or acquiring multiple images, each with a different focal setting. These systems are typically limited to static scenes, since movement within the scene may interfere with depth estimation.
Other depth estimation techniques include shape from shading and photometric stereo, which use light coming from known direction(s) and estimate depth by analyzing the intensity of light captured by an imaging system to determine the relative shape of objects in the scene. Shape from shading generally uses a single image, whereas photometric stereo uses multiple images, each captured under illumination from a different direction. These techniques assume the light is approximately collimated, without any significant falloff as it passes through the scene. This assumption often requires the use of large light sources placed relatively far from the scene. The assumption also means that only the relative shape of the surface can be estimated; the absolute distance of points or objects in the scene cannot be determined. Additionally, shape from shading generally requires a constant or known albedo (overall object brightness), which is rarely the case for natural objects. Shape from shading and photometric stereo also generally assume objects are Lambertian, meaning they reflect light equally in all directions; again, this does not hold for many natural objects.
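For context, the classical Lambertian photometric-stereo computation alluded to above can be written in a few lines. The sketch below is a minimal illustration under the stated assumptions (distant, collimated lights with known directions and a Lambertian surface); the function and variable names are hypothetical and not taken from this disclosure. Note that it recovers only surface normals and albedo, i.e., relative shape, consistent with the limitation described above.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Recover per-pixel unit normals and albedo from K images lit from known
    directions, assuming a Lambertian surface: i_k = albedo * dot(l_k, n).

    images:     array of shape (K, H, W) with measured intensities.
    light_dirs: array of shape (K, 3) with unit light-direction vectors.
    """
    K, H, W = images.shape
    I = images.reshape(K, -1)                           # (K, H*W)
    g, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)  # g = albedo * n, per pixel
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-9)              # normalize to unit normals
    return normals.T.reshape(H, W, 3), albedo.reshape(H, W)

# Hypothetical usage with a synthetic, flat surface facing the camera:
L = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 0.87], [0.0, 0.5, 0.87]])
imgs = (L @ np.array([0.0, 0.0, 1.0])).reshape(3, 1, 1) * np.ones((3, 4, 4))
normals, albedo = photometric_stereo(imgs, L)
print(normals[0, 0])   # approximately [0, 0, 1]
```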
Another depth estimation technique involves capturing two images where the image sensing unit remains stationary and the scene is illuminated by an illumination unit or units placed at different distances (“near” and “far”) from the scene. The distance is estimated as

$$z = \frac{\Delta}{\sqrt{m_1 / m_2} - 1}$$

where z represents the estimated depth of a point of interest from the first position of the illumination unit, Δ represents the distance between the near and far positions of the illumination unit or units, and m1 and m2 represent the measured intensities of the point of interest in the first and second images corresponding to the first and second positions, respectively. This technique generally is able to estimate depth using a compact system that includes a single image sensing unit and illumination unit, and it can operate reliably on regions of the scene with little or no contrast. However, this technique provides an accurate depth estimate for only a single point of the scene that lies on the line connecting the positions of the illumination units. Significant errors are introduced for points away from this line. The systematic depth error results in estimates being noticeably distorted, except when the observed scene is contained within a small cone emanating from the position of the illumination unit and centered about the line connecting the positions of the illumination unit. Therefore, either the region of the scene with accurate depth estimates is limited in size by such a cone, or the illumination units must be placed at a significant distance from the scene in order to increase the size of the cone.
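The depth equation above follows from an inverse-square intensity falloff; under that assumption, the estimate reduces to a one-line computation. The sketch below uses hypothetical names and example values and is only an illustration of the relation, not an implementation from this disclosure.

```python
import numpy as np

def depth_from_two_illumination_positions(m1, m2, delta):
    """Estimate the depth z of a point from the near illumination position.

    m1, m2: intensities measured with the illumination unit at the near and
            far positions, respectively.
    delta:  separation between the near and far positions.
    Assumes inverse-square falloff, so m1 / m2 = ((z + delta) / z) ** 2.
    """
    ratio = np.sqrt(np.asarray(m1, dtype=float) / np.asarray(m2, dtype=float))
    return delta / (ratio - 1.0)

# Hypothetical example: intensities 1.00 and 0.64 with a 0.25 m separation
# give z = 0.25 / (1.25 - 1) = approximately 1.0 m.
print(depth_from_two_illumination_positions(1.00, 0.64, 0.25))
```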
For illustrative purposes, an object 110 is depicted at a depth Z from depth estimation system 100. Object 110 emits, reflects, and/or otherwise produces illumination 112 that is captured by depth estimation system 100. In some embodiments, illumination 112 may include various types of electromagnetic radiation, such as visible light, ultraviolet radiation, infrared radiation, and/or any combination thereof. In some embodiments, object 110 may be passively illuminated by ambient light and/or actively illuminated by an illumination source, such as a camera flash (not shown). In some examples, object 110 may itself be a source of illumination 112. Depth estimation system 100 includes an imager 120 for capturing illumination 112, which is described in further detail below.
Imager 120 acquires images by converting incident electromagnetic radiation, such as illumination 112, into electronic signals. For example, imager 120 may include one or more image sensors that convert electromagnetic radiation into electronic signals by photoelectric conversion.
In some embodiments, imager 120 may include two or more image sensors that generate images from which depth information may be extracted. In general, the two or more image sensors each provide the depth information using different mechanisms or modalities, each of which may have different strengths and/or limitations. For example, imager 120 may include a phase detection sensor 130 and a polarized image sensor 140, which are described in further detail below.
In some embodiments, imager 120 may include various additional image sensors for purposes other than depth estimation. For example, imager 120 may include a third image sensor for acquiring images in a different imaging modality than phase detection sensor 130 and/or polarized image sensor 140 (e.g., a non-phase and/or non-polarization imaging modality). In some examples, the third image sensor may acquire conventional black-and-white and/or color images, thermal images, infrared images, and/or the like. In some embodiments, imager 120 may additionally include electronic components for processing the electronic signals generated by the image sensors, such as amplifiers, analog to digital (A/D) converters, image encoders, control logic, memory buffers, and/or the like. Relative to phase detection sensor 130 and/or polarized image sensor 140, the additional electronic components may be placed in a separate module, in the same package, on the same chip, on the backside of a chip, and/or the like.
In some examples, imager 120 may include imaging optics 125 for forming images on phase detection sensor 130 and/or polarized image sensor 140. For example, imaging optics 125 may include one or more lenses, mirrors, beam splitters, prisms, apertures, color filters, polarizers, and/or the like. In some examples, imaging optics 125 may define a focal length f of imager 120, which corresponds to the depth at which a particular feature is in focus. The focal length f of imager 120 may be fixed and/or adjustable, e.g., by moving one or more lenses of imaging optics 125. In some examples, imaging optics 125 may define various other optical parameters associated with imager 120, such as an aperture diameter.
In some embodiments, phase detection sensor 130 and/or polarized image sensor 140 may include respective sensor layers 132 and/or 142. In some examples, sensor layers 132 and/or 142 may include a plurality of pixels, such as an array of pixels. For example, sensor layers 132 and/or 142 may include a charge coupled device (CCD) sensor, active pixel sensor, complementary metal oxide semiconductor (CMOS) sensor, N-type metal oxide semiconductor (NMOS) sensor and/or the like. According to some examples, sensor layers 132 and/or 142 may be implemented as a monolithic integrated sensor, and/or may be implemented using a plurality of discrete components.
Phase detection sensor 130 captures two or more phase images in which features are offset in different directions based on the depth of a feature relative to the focal length f of imager 120. For example, the phase images may include a “left” image, in which a feature is offset to the left when the depth is less than the focal length f and to the right when the depth is greater than f. The phase images may further include a “right” image in which the feature is offset in the opposite direction relative to the “left” image. That is, the feature is offset to the right when the depth is less than f and to the left when the depth is greater than f.
In some embodiments, phase detection sensor 130 may include phase detection optics 134 for forming the two or more phase images captured by sensor layer 132. For example, phase detection optics 134 may include one or more lenses (e.g., microlenses over each pixel of sensor layer 132), apertures (e.g., apertures forming left and/or right windows over each pixel of sensor layer 132), and/or the like. Illustrative embodiments of phase detection sensor 130 are described in further detail in U.S. Pat. No. 8,605,179, entitled “Image Pickup Apparatus,” and U.S. Pat. No. 8,902,349, entitled “Image Pickup Apparatus,” which are hereby incorporated by reference in their entirety.
Polarized image sensor 140 captures a plurality of polarized images, each image corresponding to a particular polarization component of illumination 112. Differences among the polarized images may be used to determine the polarization angle and degree of polarization of illumination 112, which in turn may be used to determine the angle at which illumination 112 is reflected from a feature. Consequently, depth information may be extracted from the polarized images by analyzing spatial variations in the reflection angle, as will be discussed in greater detail below.
In some embodiments, polarized image sensor 140 may include polarization optics 144 for forming the three or more polarized images captured by sensor layer 142. For example, polarization optics 144 may include a layer of polarizers (e.g., a set of polarizers arranged over each pixel of sensor layer 142). Illustrative embodiments of polarized image sensor 140 are described in further detail in U.S. Pat. Publ. No. 2016/0163752, entitled “Image-Acquisition Device,” which is hereby incorporated by reference in its entirety.
Depth estimation system 100 includes an image processor 150 that is communicatively coupled to imager 120. According to some embodiments, image processor 150 may be coupled to imager 120 and/or various other components of depth estimation system 100 using a local bus and/or remotely coupled through one or more networking components. Accordingly, image processor 150 may be implemented using local, distributed, and/or cloud-based systems and/or the like. In some examples, image processor 150 may include a processor 160 that controls operation and/or execution of hardware and/or software. Although only one processor 160 is shown, image processor 150 may include multiple processors, CPUs, multi-core processors, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and/or the like. Processor 160 is coupled to a memory 170, which may include one or more types of machine readable media. Some common forms of machine readable media may include floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read. Processor 160 and/or memory 170 may include multiple chips in multiple packages, multiple chips in a single package (e.g., system-in-package (SIP)), and/or a single chip (e.g., system-on-chip (SOC)). In some examples, one or more components of image processor 150, such as processor 160 and/or memory 170, may be embedded within imager 120. For example, various components of depth estimation system 100, such as sensor layers 132 and/or 142, processor 160, and/or memory 170, may be integrated in a single package and/or on a single chip.
As will be discussed, image processor 150 may be configured to perform depth estimation based on image data received from imager 120. In some examples, the image data received from imager 120 may be received in an analog format and/or in a digital format, such as a RAW image file. Similarly, the depth estimate data output by image processor 150 may be formatted using a suitable output file format including various uncompressed, compressed, raster, and/or vector file formats and/or the like. In some examples, the depth estimate data generated by image processor 150 may be stored locally and/or remotely, sent to another processor for further processing, transmitted via a network, displayed to a user of depth estimation system 100 via a display interface (not shown), and/or the like.
A subset of the pixels of phase detection sensor 200 is configured to generate phase images 204. That is, in certain pixels, the right portions of the pixels are blocked to generate a first or “left” phase image 204a, whereas in other pixels, the left portions of the pixels are blocked to generate a second or “right” phase image 204b.
At a process 510, first and second phase images are received. In some examples, the first and second phase images may correspond to the “left” and “right” phase images captured by the phase detection sensor, as previously discussed. Consistent with such examples, a feature appearing in both images may be offset in different directions in each of the phase images. The direction and magnitude of the offset depends on the depth of the feature relative to a focal length associated with the phase detection sensor.
One of skill in the art would appreciate that certain geometric relationships between the first and second phase images may be described using an epipolar geometry. While a detailed discussion of the epipolar geometry is beyond the scope of this disclosure, the epipolar geometry defines several geometric parameters that have standard meanings in the context of stereo matching and are referred to below, including a baseline and an epipolar line.
At a process 520, a cost function is evaluated based on the first and second phase images. In general, the cost function is a mathematical function of the first and second phase images that reaches a minimum value when the first and second phase images are aligned. In some examples, the cost function may be evaluated as a function of displacement of the second phase image relative to the first phase image along the epipolar line. For example, the cost function may be evaluated using the sum of absolute differences (SAD) technique. In some examples, the cost function may be expressed as

$$C_d(x, y) = \sum_{(i, j) \in w} \left| I_l(i, j) - I_r(i + d, j) \right|$$

where Cd is the cost function at a given point (x,y) in the phase images, x is a coordinate along the epipolar line, Il is the left phase image, Ir is the right phase image, d is the displacement of the right phase image along the epipolar line, and w is a window of points around (x,y) for which the cost function is evaluated. For example, the window may correspond to an n × n box of pixels around (x,y).
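As a concrete illustration, a cost of this form can be evaluated directly from the two phase images. The sketch below assumes a horizontal epipolar line and uses hypothetical function and parameter names; bounds checking is omitted for brevity.

```python
import numpy as np

def sad_cost(left, right, x, y, d, half_w):
    """Sum-of-absolute-differences cost C_d(x, y): compare a (2*half_w + 1)^2
    window of the left phase image around (x, y) with the corresponding window
    of the right phase image displaced by d pixels along the epipolar line."""
    win_l = left[y - half_w:y + half_w + 1, x - half_w:x + half_w + 1]
    win_r = right[y - half_w:y + half_w + 1, x + d - half_w:x + d + half_w + 1]
    return np.abs(win_l.astype(float) - win_r.astype(float)).sum()
```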
At a process 530, a disparity d′(x,y) between the first and second phase images is determined based on the cost function. In some embodiments, the disparity d′(x,y) may correspond to the value of d at which the cost function is minimized for a given point (x,y). Consequently, the disparity d′ may represent the degree of misalignment between the first and second phase images (i.e., the amount of displacement d that is needed to align the first and second phase images).
At a process 540, a depth is estimated based on the disparity. In some embodiments, the depth is calculated based on the following equation:

$$Z_1(x, y) = \frac{B \cdot f}{d'(x, y)}$$
where Z1 is the estimated depth of a feature at point (x,y), B is the length of the baseline, f is the focal length of the imager, and d′(x,y) is the disparity determined at process 530. In some examples, B may correspond to half of the aperture diameter of the imager.
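Putting processes 520 through 540 together, a deliberately simplified sketch of the phase-based estimate is shown below. The displacement range, window size, and treatment of zero disparity are illustrative assumptions rather than details taken from this disclosure.

```python
import numpy as np

def phase_depth(left, right, x, y, baseline, focal_len,
                d_min=-8, d_max=8, half_w=2):
    """Scan candidate displacements d along a horizontal epipolar line, pick
    the SAD-minimizing disparity d', and convert it to depth Z1 = B * f / d'."""
    def sad(d):
        win_l = left[y - half_w:y + half_w + 1, x - half_w:x + half_w + 1]
        win_r = right[y - half_w:y + half_w + 1,
                      x + d - half_w:x + d + half_w + 1]
        return np.abs(win_l.astype(float) - win_r.astype(float)).sum()

    costs = {d: sad(d) for d in range(d_min, d_max + 1)}
    d_prime = min(costs, key=costs.get)     # disparity = argmin_d C_d(x, y)
    if d_prime == 0:
        return float('inf')                 # d' == 0 gives no finite depth from Z1 = B * f / d'
    return baseline * focal_len / d_prime
```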
At a process 610, a set of polarized images is received. In some examples, the set of polarized images may include a plurality of polarized images captured by the polarized image sensor. Each of the polarized images may correspond to a different polarization direction.
At a process 620, polarization parameters are determined. In some examples, the polarization parameters may include the polarization angle and the degree of polarization. For example, the polarization angle and the degree of polarization may be determined by solving the following system of equations:

$$I_n = A \cos\!\big(2 (p_n - \varphi)\big) + C, \qquad n = 1, 2, 3, \ldots$$

where In is the measured intensity associated with a polarizer set at an angle pn, A is the amplitude, φ is the polarization angle, and C is the bias.
In some embodiments, the above system of equations may be solved analytically and/or numerically, yielding the unknown values of A, φ, and C. Accordingly, the polarization angle φ may be directly determined by solving the above equations. Similarly, the degree of polarization ρ may be determined based on the values of A and C using the following equation:

$$\rho = \frac{A}{C}$$
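One common way to carry out this fit, shown below as a sketch with hypothetical names, is to rewrite I_n = A·cos(2(p_n − φ)) + C as a linear system in (C, A·cos 2φ, A·sin 2φ) and solve it by least squares; the polarizer angles in the usage example (0°, 45°, 90°, 135°) are illustrative and not taken from this disclosure.

```python
import numpy as np

def fit_polarization(intensities, polarizer_angles):
    """Fit I_n = A * cos(2 * (p_n - phi)) + C from three or more polarized
    measurements, then return A, phi, C and the degree of polarization
    rho = A / C."""
    p = np.asarray(polarizer_angles, dtype=float)
    I = np.asarray(intensities, dtype=float)
    # I_n = C + (A cos 2phi) cos 2p_n + (A sin 2phi) sin 2p_n -> linear in 3 unknowns
    M = np.column_stack([np.ones_like(p), np.cos(2 * p), np.sin(2 * p)])
    C, a, b = np.linalg.lstsq(M, I, rcond=None)[0]
    A = np.hypot(a, b)
    phi = 0.5 * np.arctan2(b, a)
    return A, phi, C, A / C

# Hypothetical check with polarizers at 0, 45, 90, and 135 degrees:
p_n = np.deg2rad([0, 45, 90, 135])
I_n = 0.6 * np.cos(2 * (p_n - 0.3)) + 1.0
print(fit_polarization(I_n, p_n))   # A ~ 0.6, phi ~ 0.3 rad, C ~ 1.0, rho ~ 0.6
```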
At a process 630, an angle of reflection is determined. In some examples, the angle of reflection is determined based on the value of the degree of polarization determined at process 620. For example, the relationship between the degree of polarization and the angle of reflection may be expressed using the following equation:

$$\rho = \frac{2 \sin^2\theta \, \cos\theta \, \sqrt{n^2 - \sin^2\theta}}{n^2 - \sin^2\theta - n^2 \sin^2\theta + 2 \sin^4\theta}$$
where θ is the angle of reflection and n is the refractive index. Notably, solving the above equation to determine the unknown value of θ may yield two possible values of θ for a given value of ρ. Accordingly, the angle of reflection may not be uniquely determined at process 630, but rather two candidate values of the angle of reflection may be determined.
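Assuming the specular-reflection relation written above (the form consistent with the two-valued behavior described in this step), the candidate reflection angles can be found numerically, for example by locating where ρ(θ) crosses the measured degree of polarization. The sketch below uses hypothetical function names and an illustrative refractive index.

```python
import numpy as np

def specular_dop(theta, n):
    """Degree of polarization of specularly reflected light at reflection
    angle theta for refractive index n (the relation assumed above)."""
    s2 = np.sin(theta) ** 2
    num = 2 * s2 * np.cos(theta) * np.sqrt(n ** 2 - s2)
    den = n ** 2 - s2 - n ** 2 * s2 + 2 * s2 ** 2
    return num / den

def reflection_angle_candidates(rho, n=1.5, samples=20000):
    """Return the (typically two) angles theta in (0, pi/2) whose degree of
    polarization matches rho, found from sign changes of dop(theta) - rho."""
    theta = np.linspace(1e-4, np.pi / 2 - 1e-4, samples)
    diff = specular_dop(theta, n) - rho
    crossings = np.where(np.diff(np.sign(diff)) != 0)[0]
    # refine each bracketed root by linear interpolation
    return [theta[i] - diff[i] * (theta[i + 1] - theta[i]) / (diff[i + 1] - diff[i])
            for i in crossings]

# Hypothetical example: rho = 0.5 with n = 1.5 yields two candidate angles,
# one below and one above the Brewster angle.
print(np.rad2deg(reflection_angle_candidates(0.5)))
```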
At a process 640, a depth is estimated based on the polarization angle and the angle of reflection. In some embodiments, the depth may be estimated by iteratively solving the following equation:
where Z2,n is the depth estimate after the nth iteration of the calculation, H is a smoothing filter, S is the sum of the smoothing filter coefficients, ϵ is the step size between pixels, p is given by tan θ cos φ, q is given by tan θ sin φ, and a is given by the following equation:
As discussed previously, process 630 yields two candidate values of the angle of reflection θ. Moreover, the symmetry of the polarization angle φ is such that the same results are obtained for φ±π. Consequently, a plurality of candidate depths may be estimated at process 640, corresponding to each possible value of θ and/or φ. Accordingly, reducing the depth estimate from method 600 to a single value may involve supplying additional information (e.g., contextual information) to select a value from among the plurality of candidate values that is most likely to correspond to the true depth of the feature.
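The sketch below does not implement the specific iterative update of process 640 (with its smoothing filter H, coefficient sum S, step size ϵ, and term a). Instead, as a rough illustration of the underlying idea only, it integrates the gradient fields p = tan θ cos φ and q = tan θ sin φ into a relative depth map using a generic Jacobi-style relaxation of the corresponding Poisson equation; it is one possible realization under stated assumptions, not the update defined in this disclosure, and the candidate (θ, φ) ambiguity discussed above must still be resolved separately.

```python
import numpy as np

def integrate_gradients(p, q, iterations=2000, eps=1.0):
    """Recover a relative depth map Z whose x/y gradients best match the
    fields p = tan(theta) * cos(phi) and q = tan(theta) * sin(phi), by Jacobi
    relaxation of the Poisson equation laplacian(Z) = dp/dx + dq/dy.
    eps is the pixel pitch; borders are handled by edge replication."""
    h, w = p.shape
    # divergence of the gradient field (central differences)
    div = np.gradient(p, eps, axis=1) + np.gradient(q, eps, axis=0)
    Z = np.zeros((h, w))
    for _ in range(iterations):
        Zp = np.pad(Z, 1, mode='edge')
        neighbors = (Zp[:-2, 1:-1] + Zp[2:, 1:-1] +
                     Zp[1:-1, :-2] + Zp[1:-1, 2:]) / 4.0
        Z = neighbors - (eps ** 2 / 4.0) * div
    return Z - Z.mean()   # depth is recovered only up to an additive constant
```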
Comparing the values of Z1 and Z2 estimated using methods 500 and 600, respectively, Z1 is relatively simple to calculate by method 500 and generally provides a single, absolute depth estimate at a given point. However, because stereo matching involves aligning features appearing in images, Z1 may be inaccurate and/or noisy when estimating the depth of an object or surface with little texture (e.g., few spatial variations and/or landmark features that may be used to align the phase images).
By contrast, Z2 is generally independent of the surface texture because the depth is estimated based on the polarization of the reflected illumination rather than image alignment. Consequently, Z2 may provide a higher accuracy and/or less noisy estimate of depth than Z1 on a low texture surface. However, method 600 generally does not yield a single value of Z2, but rather a plurality of candidate values, as discussed previously with respect to process 640.
At a process 710, one or more first images are received from a first image sensor. In some embodiments, the one or more first images may correspond to first and second phase images received from a phase detection sensor. Consistent with such embodiments, process 710 may correspond to process 510 of method 500. However, it is to be understood that the one or more first images may correspond to various other types of images from which depth information may be extracted, such as stereo images, images corresponding to an active depth estimation method (e.g., time of flight and/or structured lighting), images captured using a moving camera and/or light source, and/or the like.
At a process 720, a first depth estimate is determined based on the first images. In some embodiments, process 720 may correspond to processes 520-540 of method 500. Consistent with such embodiments, the first depth estimate may be determined using the stereo matching technique described above. In particular, the first depth estimate may be determined using a cost function to determine a disparity between first and second phase images. However, it is to be understood that a variety of techniques may be used to determine the first depth estimate, consistent with the variety of image types that may be received from the first image sensor at process 710.
At a process 730, one or more second images are received from a second image sensor. In some embodiments, the one or more second images may correspond to a set of polarized images received from a polarized image sensor. Consistent with such embodiments, process 730 may correspond to process 610 of method 600.
At a process 740, a second depth estimate is determined based on the second images. In some embodiments, process 740 may correspond to processes 620-640 of method 600. Consistent with such embodiments, the second depth estimate may include a plurality of candidate depth estimates, such that the second depth estimate is not uniquely defined.
At a process 750, a reliability of the first depth estimate is determined. In some examples, the reliability may be determined based on the cost function used to determine the first depth estimate during process 720. For example, the shape of the cost function near its minimum may indicate how reliably the disparity, and therefore the first depth estimate, has been determined.
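One common way to derive such a reliability measure from the cost function, sketched below as an assumption rather than a detail of this disclosure, is to score how distinct the cost minimum is: a flat cost curve, as typically produced by a low-texture region, yields a low reliability, whereas a single sharp minimum yields a high reliability.

```python
import numpy as np

def disparity_reliability(costs):
    """Confidence proxy for a matching result: how far the smallest cost is
    below the second-smallest cost. Returns a value in [0, 1], where values
    near 0 indicate a flat or ambiguous cost curve and values near 1 indicate
    a single sharp minimum.

    costs: 1-D array of C_d values over the scanned displacements d."""
    c = np.sort(np.asarray(costs, dtype=float))
    best, second = c[0], c[1]
    if second <= 0:
        return 0.0
    return 1.0 - best / second

# Hypothetical cost curves:
print(disparity_reliability([40, 12, 3, 11, 39]))   # sharp minimum -> ~0.73
print(disparity_reliability([21, 20, 19, 20, 21]))  # flat curve    -> ~0.05
```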
At a process 760, a hybrid depth estimate is determined. In some examples, the hybrid depth estimate may default to the first depth estimate determined at process 720. However, in some examples, the first depth estimate may not be sufficiently reliable. For example, the first depth estimate may be determined to be unreliable when the reliability determined at process 750 is below a predetermined threshold. Accordingly, when the reliability is below the predetermined threshold, the second depth estimate determined at process 740 may be used as the hybrid depth estimate. For example, when the second depth estimate includes a plurality of candidate depth estimates, contextual information may be used to determine which of the plurality of candidate depth estimates is likely to represent the true depth value. In some examples, the contextual information may include information associated with the first depth estimate (e.g., selecting the candidate depth estimate that is closest to the first depth estimate), the hybrid depth estimate assigned to nearby points (e.g., selecting the candidate depth estimate that is closest to the depth assigned to a neighboring pixel), and/or the like. Upon determining the hybrid depth estimate, method 700 may terminate and/or may proceed to processes 720 and/or 740 to perform hybrid depth estimation at other points in the received images.
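A minimal sketch of this selection logic is given below; the threshold value is hypothetical, and the nearest-candidate rule simply formalizes the contextual selection described in this paragraph.

```python
def hybrid_depth(z1, reliability, z2_candidates, threshold=0.5, context_depth=None):
    """Default to the phase-based estimate z1; when its reliability falls
    below the threshold, fall back to the polarization-based estimate,
    resolving its ambiguity by choosing the candidate closest to the
    available context (a neighboring hybrid depth if given, otherwise z1)."""
    if reliability >= threshold:
        return z1
    reference = context_depth if context_depth is not None else z1
    return min(z2_candidates, key=lambda z: abs(z - reference))

# Hypothetical usage: an unreliable phase estimate over a low-texture surface.
print(hybrid_depth(z1=2.7, reliability=0.1, z2_candidates=[1.4, 2.1, 3.6],
                   context_depth=2.0))   # -> 2.1
```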
In plot 910, the depth estimate Z1 corresponds to a phase-based depth estimate, such as the first depth estimate determined at process 720 of method 700.
In plot 920, the depth estimate Z2 corresponds to a polarization-based depth estimate, such as the second depth estimate determined at process 740 of method 700.
In plot 930, the depth estimate Z corresponds to a hybrid depth estimate, such as the hybrid depth estimate determined at process 760 of method 700.
Some examples of controllers, such as image processor 150 may include non-transient, tangible, machine readable media that include executable code that when run by one or more processors may cause the one or more processors to perform the processes of methods 500, 600, and/or 700. Some common forms of machine readable media that may include the processes of methods 500, 600 and/or 700 are, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.
Although illustrative embodiments have been shown and described, a wide range of modifications, changes and substitutions are contemplated in the foregoing disclosure and in some instances, some features of the embodiments may be employed without a corresponding use of other features. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Thus, the scope of the invention should be limited only by the following claims, and it is appropriate that the claims be construed broadly and in a manner consistent with the scope of the embodiments disclosed herein.