This invention is in the field of 3D sensing, in particular image sensing of the third dimension, depth, on top of 2D image sensing.
Three-dimensional (3D) computer vision has vast applications, especially in the AI era. VR, AR, digital photography, robotics, Simultaneous Localization and Mapping (SLAM), 3D reconstruction, face recognition, payment based on 3D face recognition, and gesture recognition all require 3D vision.
The current image sensors are the CCD (charge-coupled device) and the CMOS (complementary metal-oxide-semiconductor) sensor. Both are two-dimensional image sensor arrays placed at the image plane of an imaging system (camera) to capture the 2D projection of the 3D world, and both use photosensitive materials to sense the number of photons collected in the pixel area. A CCD image sensor converts photons into charge, then moves the charge out of the array for measurement, while a CMOS image sensor converts photons into a voltage and measures that voltage at the pixel location. The CMOS image sensor has become the mainstream image sensor because it can be manufactured with the standard CMOS process that makes most semiconductor chips (CPUs, GPUs, DRAM, and FLASH memories).
Currently, there are three major approaches to obtaining the third dimension (depth) from 2D image sensors: stereo, structured light, and ToF (time of flight). A stereo camera has a pair of standard 2D image sensors and obtains depth much like our human eyes: it identifies and matches the location shift, called disparity, between the same object in the images captured by the left and right cameras at the same time; the depth is then inversely proportional to the disparity. A structured-light camera projects specific patterns onto the object and calculates the depth from the deformation of those patterns observed by a standard 2D image sensor placed at a different location from the pattern projector. A ToF camera emits laser light and receives the bounced signal, then directly or indirectly computes the depth from the time of flight of the laser light.
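The following minimal sketch illustrates the standard stereo triangulation relation described above; the focal length, baseline, and disparity values are example assumptions, not measurements from any particular camera.

```python
# Minimal sketch (assumed example values): standard stereo triangulation,
# depth Z = f * B / d, with f the focal length in pixels, B the baseline in
# meters, and d the disparity in pixels.

def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a matched point from its left/right pixel shift (disparity)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_px * baseline_m / disparity_px

# Example: 1000-pixel focal length, 10 cm baseline, 20-pixel disparity
# gives 1000 * 0.10 / 20 = 5.0 meters.
print(stereo_depth(1000.0, 0.10, 20.0))  # 5.0
```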
Both stereo and structured light are based on triangulation, and both require complicated matching algorithms to match the pattern and recover the depth. Stereo cameras are categorized as passive or active. In a passive stereo camera, the color, grayscale, or texture from the image is used for matching. It works both indoors and outdoors; however, it does not work well in places that lack texture, such as walls painted a single color. The active stereo camera was invented to overcome this drawback. It uses a projector (usually infrared) to project patterns, textures, or speckles onto the scene, so that an artificial texture is created everywhere on the objects for dense depth matching and calculation. Both stereo and structured light require a baseline: the second camera in stereo, or the projector in structured light, must be placed some distance away from the camera. The depth range and accuracy are related to the baseline length: the larger the baseline, the larger the depth range and the more accurate the calculated depth. This baseline requirement limits stereo and structured light in applications that prefer a very compact size or no baseline.
ToF, on the other hand, has no baseline. It also requires a light projector, but the projector can be placed right next to the ToF image sensor; therefore, it fits applications that require a compact size or no baseline. ToF is not based on triangulation; it is based on the time it takes light from the projector to fly to the object, reflect, and fly back to the ToF image sensor.
ToF comes in two forms, direct ToF (dToF) and indirect ToF (iToF). In dToF, the light projector sends out pulses at a very high frequency, and a special sensor can distinguish each pulse and count the accumulated photon detections. By looking at the histogram of these detections, it can interpolate the distance.
In iToF, the light projector sends out modulated continuous light, such as a sine wave or a step function. For objects at different depths, the modulated light received back at the sensor is different: the distance is converted into a phase or frequency change that a special receiver can measure and distinguish, and that phase or frequency is then mapped back to distance. Since the modulation is periodic, the sensor receives the same signal for objects separated by an integer multiple of the distance light travels in one modulation period. This causes phase wrapping and limits the depth range (to a couple of meters in most cases). ToF does not use triangulation; therefore, its accuracy is roughly constant within the measurable depth range, about a few millimeters to centimeters. The depth accuracy of ToF cannot match that of stereo and structured light at near range, where they can achieve sub-millimeter accuracy, but it can be better than stereo and structured light at longer range.
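For reference, the sketch below shows the standard iToF phase-to-distance relation and the phase-wrap (unambiguous range) limit; the 100 MHz modulation frequency is only an example assumption.

```python
# Sketch of the standard iToF relations (example values, not a product spec):
# distance d = c * phase / (4 * pi * f_mod); the phase wraps every
# c / (2 * f_mod) meters, which limits the unambiguous depth range.
import math

C = 299_792_458.0  # speed of light in m/s

def itof_distance(phase_rad: float, f_mod_hz: float) -> float:
    """Distance corresponding to a measured phase shift of the modulated light."""
    return C * phase_rad / (4 * math.pi * f_mod_hz)

def unambiguous_range(f_mod_hz: float) -> float:
    """Depth beyond which the phase wraps and measurements repeat."""
    return C / (2 * f_mod_hz)

print(unambiguous_range(100e6))       # ~1.5 m for 100 MHz modulation
print(itof_distance(math.pi, 100e6))  # ~0.75 m, i.e., half the wrap range
```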
Stereo, structured light, and ToF all try to convert or encode depth information into something a sensor can measure. Stereo and structured light convert the depth into a pixel shift, while ToF converts the depth into a super-fast accumulation of light pulses in dToF and a phase/frequency change in iToF.
Stereo and structured light leverage the massively produced CMOS image sensor. ToF, in general, requires a special semiconductor sensor, although there are ways to use the CMOS process to create its receiver. Stereo, structured light, and ToF all benefit from a light projector, but a projector is strictly required only in structured light and ToF. In general, the light projector in structured light needs to project certain pre-designed patterns and is therefore more complicated than the light projector in ToF, which projects no pattern and is only modulated at a very high frequency.
In summary, the three current types of depth-capable image sensors each have their own advantages and disadvantages. They all cost more than the massively manufactured CMOS image sensor; some have lower resolution, some require more computing power and complicated algorithms, most require active light projectors, and some have large form factors and require calibration. The challenges in 3D image sensing remain.
In our invention, a new CMOS image sensor is created. It still uses the same CMOS manufacturing process, but it can output depth and, in some embodiments, can also produce a color (RGB) image or even an infrared image simultaneously. This new 3D CMOS image sensor and imaging device can be operated in passive mode, meaning with only the light from the scene, but it can also be operated with the help of active light from a light projector. This new 3D CMOS image sensor has the lowest cost (the same cost as a single CMOS image sensor), the same frame rate as a CMOS image sensor, the same or almost the same resolution as a CMOS image sensor, and the same form factor; the depth and infrared images are aligned with the color image; and no active light (i.e., an infrared dot projector) is required.
In our invention, we convert or encode the depth of a scene into something that a normal CMOS image sensor can measure. A CMOS sensor can measure intensity (the number of photons) with high accuracy; the standard CMOS sensor can sense 256 levels of intensity (stored as an 8-bit unsigned integer). The CMOS sensor array can also be used to match the relocation (shift) of the same or a similar intensity or intensity distribution (pattern matching).
One way to understand imaging is through the light cone: imagine a ball; from any observation point, one can draw a cone from that point to the edge of the ball. This is the “light cone.” The light from any object forms an infinite number of such light cones toward an infinite number of observation points, whether or not there is an observer.
When the observation point is fixed, one can move the ball: when the ball moves closer to the observation point, the light cone becomes wider; when the ball moves farther away, the light cone becomes narrower. However, since every object forms some light cone to the same observation point, all the light cones are mixed together.
The way to separate the light cones is through an imaging device. The simplest model of an imaging device is a pin-hole camera: a small pin-hole is placed at the observation point, a piece of paper (the image plane) is placed behind the pin-hole, and the setup is enclosed so that light reaches the image plane only through the pin-hole. The extension of each light cone then forms behind the pin-hole onto the image plane. Therefore, the angle of the light cone at the image plane is different for objects at different depths.
Modern imaging devices use a lens instead of a pin-hole. The lens bends the incident light according to the angle between its two surfaces, because the refractive index of the lens material (usually glass, which is silicon dioxide) is different from that of air. At the center of the lens, the two surfaces are parallel and the light passing through bends less; farther from the center, the two surfaces form a bigger angle and the light passing through bends more. The lens is designed so that all light rays from a single point on an object, passing through different locations on the lens, reach a single point on the image plane.
Since the “bending power” of a lens is fixed, the farther the object, the smaller the incident angle from a point on the object to the edge of the lens and the narrower the “light cone”; after the lens bends it, the light cone at the image plane for that point is wider. The nearer the object, the bigger the incident angle from a point on the object to the edge of the lens and the wider the “light cone”; after the lens bends it, the light cone at the image plane for that point is narrower.
Therefore, at each point of the image plane, the depth is encoded in, or reflected by, the narrowness or wideness of the light cone.
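As a numerical illustration of this relationship, the sketch below uses the ideal thin-lens equation; the focal length, aperture, and object distances are arbitrary example values, not a design of this invention.

```python
# Sketch with an ideal thin lens (assumed f = 50 mm, aperture D = 20 mm):
# the object-side cone half-angle shrinks as the object moves away, while the
# converging cone behind the lens widens, because the image distance v from
# 1/v = 1/f - 1/z approaches f.
import math

def cone_angles(z_m: float, f_m: float = 0.05, aperture_m: float = 0.02):
    """Return (object-side, image-side) light-cone half-angles in degrees."""
    object_side = math.degrees(math.atan(aperture_m / 2 / z_m))
    v = 1.0 / (1.0 / f_m - 1.0 / z_m)   # thin-lens image distance
    image_side = math.degrees(math.atan(aperture_m / 2 / v))
    return object_side, image_side

for z in (0.5, 1.0, 2.0, 10.0):         # object depths in meters
    print(z, cone_angles(z))            # image-side angle grows with depth
```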
The normal CMOS image sensor can only sense the intensity of the light, i.e., the number of photons that reach the pixel area, not the light direction.
This invention leverages a property of diffraction gratings: varying the incident angle of the same light ray shifts the diffraction pattern behind the grating. In addition, the diffracted light forms different patterns multiple times from the grating near field to the far field. For a finite grating, the number of diffraction peaks starts at the number of grating periods, reduces as the distance from the grating increases, and eventually becomes 1 toward the far field.
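As a simple illustration of the angular shift only (not of the full near-field pattern of a finite grating, which in practice would come from a full-wave simulation such as FDTD), the sketch below applies the standard grating equation; the wavelength, period, and observation distance are example assumptions.

```python
# Sketch of the angular shift using the standard grating equation
# sin(theta_m) = sin(theta_in) + m * wavelength / period (assumed wavelength,
# period, and observation distance).
import math

def peak_positions(theta_in_deg: float, wavelength_m: float = 550e-9,
                   period_m: float = 1.2e-6, distance_m: float = 3e-6,
                   orders=(-1, 0, 1)):
    """Lateral positions of the diffraction orders at a plane behind the grating."""
    positions = []
    for m in orders:
        s = math.sin(math.radians(theta_in_deg)) + m * wavelength_m / period_m
        if abs(s) >= 1.0:               # evanescent order: no propagating peak
            positions.append(None)
            continue
        positions.append(distance_m * math.tan(math.asin(s)))
    return positions

print(peak_positions(0.0))    # symmetric -1, 0, +1 peaks at normal incidence
print(peak_positions(10.0))   # the whole pattern shifts for oblique incidence
```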
In this invention, a finite grating is placed before the light rays reach the photo-receiving section of a CMOS image pixel. For different incident angles of light on the grating, the shift of the diffraction peaks is different. The final intensity distribution is the integral, or superposition, of all light rays in the light cone; therefore, different light cones produce different intensity distributions of the diffraction pattern. Multiple pixels are placed at a certain distance from the finite grating, where the interference pattern forms, to collect the intensity distribution of the diffracted light cone. Once this diffraction-pattern intensity distribution is collected, the depth can be calculated by solving the inverse problem: from intensity distribution back to light cone angle, and from light cone angle to the distance of the object. Such a group of pixels used to sense the diffraction-pattern intensity distribution is called one unit of “depth pixel.”
In some implementations, the finite grating for one unit of depth sensor is placed before the pixel's light-sensitive material at a distance such that three diffraction peaks are formed for an incident angle of 0 (light entering the grating perpendicularly). Four CMOS sensor pixels are placed there to collect the diffraction pattern distribution of the 0th-order and the two 1st-order diffraction peaks.
In some implementations, two CMOS pixels are used at these three diffraction peak locations to collect the diffraction distribution of the light cone.
In some other implementations, the multiple CMOS pixels can be placed at any of the locations where diffraction patterns form, from the plane with the same number of peaks as the finite grating has periods, all the way down to 3, 2, and 1 peaks.
In some implementations, the finite grating can be placed after the CMOS image sensor color filter. This makes the light reaching the grating have a single wavelength or a narrow wavelength distribution.
In some implementations, the normal color channel intensity is calculated as the integral over the multiple pixels of one depth sensor; therefore, such depth pixels can also output the normal color or grayscale value. An approximation of this integral is the sum (or average) of the intensities of the pixels in one depth pixel.
In some implementations, a finite grating can be placed behind each of the different CMOS image sensor color filters. The grating properties, such as, but not limited to, period, duty cycle, and height, can be designed so that they all form the same diffraction patterns at the CMOS image pixels. Each color depth sensor can calculate the depth, and multiple color channels can improve depth resolution and robustness.
In some implementations, multiple depth pixels with the same color can be combined to improve depth resolution and robustness. In such cases, either the grating properties (such as period, duty cycle, and height) are designed to form the same or a different number of diffraction peaks at the same distance, or phase shift gratings are used and each depth pixel has a phase offset compared to the others.
In some implementations, the finite grating can be placed behind a color filter that only allows infrared light to pass. The color filters currently used in CMOS image sensors to filter red, green, and blue can be modified to filter infrared light.
In some implementations, 4 pixels in a row can form one unit depth pixel for each of infrared, red, green, and blue, for a total of 4 rows. These 4×4 pixels form one unit of a super pixel that can output the normal red, green, and blue colors, plus infrared, plus depth. This not only adds an infrared and a depth channel to each pixel, but also makes all color and depth outputs automatically aligned and of the same resolution, eliminating the image alignment step that is required when combining a color image sensor with a depth sensor, as in some active stereo and all structured-light systems, and also in ToF when a color image is needed (ToF can only output a grayscale image in addition to the depth map).
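A rough readout sketch of such a 4×4 super pixel follows; the row order (IR, R, G, B), the per-row depth decoder, and the way the four row estimates are combined are illustrative assumptions, not a fixed design.

```python
# Rough readout sketch of one 4x4 super pixel, assuming one 4-pixel depth
# unit per row in the order IR, R, G, B. decode_depth stands in for any
# per-row depth decoder (e.g., a LUT lookup); names are illustrative.
import numpy as np

def read_super_pixel(block_4x4: np.ndarray, decode_depth):
    """block_4x4: 4x4 raw intensities; returns aligned IR/R/G/B and depth."""
    channels = {}
    depths = []
    for name, row in zip(("ir", "red", "green", "blue"), block_4x4):
        channels[name] = float(row.mean())        # normal channel intensity
        depths.append(decode_depth(row))          # per-row depth estimate
    channels["depth"] = float(np.median(depths))  # combine the four estimates
    return channels

block = np.random.randint(0, 256, size=(4, 4))
print(read_super_pixel(block, decode_depth=lambda row: 1.0))  # dummy decoder
```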
In some implementations, rows or columns of depth pixels are combined with normal CMOS image sensor color pixels. For example, in a 3×3 super pixel, two rows (or columns) are made of two depth pixels with 3 pixels each, and the other row (or column) is made of three normal CMOS image sensor color pixels: one red, one green, and one blue.
In some implementations, the 4 sub-pixels of a CMOS RGB color pixel (one red, two green in diagonal locations, and one blue) can be rearranged: the 2 green pixels are placed in a row or a column, and a corresponding finite grating is placed on top of the two adjacent green pixels. These two green pixels can then output the normal green intensity together with depth information. This implementation adds depth without reducing the resolution of the normal color CMOS image sensor.
In some implementations, the gratings can be implemented using the CMOS metal layer process, where the metal (copper) is used to block the light. This is called a binary grating.
In some implementations, the gratings can be implemented using the CMOS process with materials that are transparent to light. The grating hole regions are etched so that they have a different depth from the rest of the layer. Light going through such a grating does not lose energy, but the light from the two regions of different depth still interferes because of the phase difference between them. This is called a phase shift grating.
In some implementations, the Front Side Illumination (FSI) CMOS image sensor process is used. In FSI, the light goes through the microlens and the color filter (both optional), then through the space between the metal layers used for semiconductor wiring, and then hits the light receiving section, which is made of the photosensitive material. The finite gratings can be implemented on top of the metal (wiring) layers of the transistors.
In some implementations, the Back Side Illumination (BSI) CMOS image sensor process is used. In BSI, the substrate is flipped and made very thin. This opens up the area for the light receiving section (the pixel's photosensitive material), and the transistors are placed on the back so that the light is not blocked or affected by the metal wiring layers. BSI gives more degrees of freedom, and the gratings can be implemented on top of the substrate. Materials that are transparent to light, such as silicon dioxide, can be used in between to create the needed distance.
In some implementations, the grating layer can be combined with a light shield layer in the current CMOS image sensor.
In some implementations, multiple units of depth pixels are put together to form one super depth pixel, for example 1×2, 2×1, or 2×2 depth pixels. Within each super depth pixel, the finite gratings of the individual depth pixels can be oriented orthogonally to each other. Since a one-dimensional grating samples the light cone angle in one direction, adding a grating in the orthogonal direction improves the robustness of the depth and color values.
In some implementations, finite gratings can be placed orthogonally within one unit of depth pixel. For example, in a depth sensor composed of 4×4 CMOS image sensor pixels, the grating over the top (or bottom) row of sensors is oriented in one direction (vertical holes), while the remaining three rows form 4 columns of depth sensors whose gratings are oriented in the orthogonal direction (horizontal holes).
In some implementations, a mask is placed next to the lens (either in front of or behind it) to filter the light passing through the lens. For example, an annular mask is placed so that only the largest incident angles of the light cone are collected, to improve depth sensitivity.
In the following detailed description of this invention, we use the same or similar number in different figures to refer to the same element or the same thing.
An image sensor that can sense depth, the third dimension on top of the normal two-dimensional projection of the 3D scene, is described. This image sensor can be manufactured with the massively produced, industry-standard CMOS process. The normal CMOS image sensor converts photons to a voltage and measures the voltage digitally at each pixel location; therefore, it measures the intensity of the 3D scene projected onto a 2D image plane. In this 3D CMOS image sensor and imaging device, we still use the normal CMOS pixel and circuit to measure the photons collected in each pixel area (the pixel intensity), but we modulate the light so that the depth information is encoded into an intensity distribution over a small number of pixels. Once such a distribution is collected, we can solve the inverse problem to calculate the depth. The encoding happens in two steps. The first step encodes the depth into the light cone angle; the light cone angle is transferred, or enhanced, by an imaging setup (a pin-hole camera, or a camera with one or more lenses) to the image plane. The second step uses gratings to bend the light. We leverage two facts: 1) light passing through a finite grating interferes and produces diffraction patterns at multiple locations, and the number of diffraction peaks gradually reduces from the number of periods of the finite grating down to 3, 2, and eventually 1; and 2) an oblique incident angle produces a shift of the diffraction pattern from the near field to the far field. Therefore, the light cone angle can be encoded into an intensity distribution, and that intensity distribution can be collected by multiple normal CMOS sensor pixels.
The benefits of such a 3D CMOS image sensor and imaging device are many. The first and most significant is the economic benefit: this sensor can be made with the industry-standard, mass-produced CMOS process; therefore, its cost is the same as or similar to normal 2D CMOS image sensors.
The second benefit is its size: in some implementations it can be made the same size as the current color CMOS image sensor, and is therefore among the smallest of all image sensors, not just 3D image sensors but 2D image sensors as well. This makes it perfect for applications that require a compact size, like cell phones.
The third benefit is that it works in passive mode. Since it uses the light cone angle to encode the depth, it can measure and calculate depth as long as the image sensor “sees” the scene. This means it does not need active light and therefore consumes minimal energy, which is particularly useful for wearable devices, VR and AR glasses, and even cell phones. In a dark environment, this 3D CMOS image sensor and imaging device needs light to illuminate the scene; this can be done visibly or non-visibly to human eyes, for example with infrared light. Even in that case, compared to the active light used in active stereo, structured light, and ToF, it does not require light with certain patterns as in active stereo and structured light, nor does it require the light to be pulsed and synchronized with the camera sensor as in ToF. A normal infrared light will work, which is much simpler and cheaper than other active depth sensors.
The fourth benefit is that it can output both depth and a normal intensity or color image at the same time. Since we only use diffraction to redistribute the energy passing through the finite grating, the integral (or summation) of the distribution is the normal intensity, so the depth pixel arrangement can produce the normal intensity along with the depth.
The fifth benefit is related to the fourth: the depth map and the intensity (or color) map are automatically aligned by construction. This is a great benefit for the many applications that require a depth map aligned with a color map, such as 3D reconstruction, VR, AR, and simultaneous localization and mapping (SLAM).
The sixth benefit is related to the second: it is a single sensor. In stereo and structured light, the two cameras, or the camera and the projector, are physically separated; they therefore require calibration before usage and periodic self-calibration during usage to maintain accuracy. A single sensor needs none of this.
The seventh benefit is that the depth can be output directly from the sensor with a built-in Image Signal Processor (ISP); no extra computing chip is needed. Unlike stereo and structured light, which require a separate computing unit to do pattern matching, the depth calculation in this 3D CMOS image sensor and imaging device is much simpler. Although it requires solving an inverse problem, it is similar to iToF, which encodes the depth into a phase change. Here the depth is encoded in a few 8-bit intensities, which have a limited number of combinations: for example, 4 pixels of 8-bit values, after normalization by their intensity, have a total of 256^4/256 = 16,777,216 entries, which can easily be implemented as a look-up table in the integrated ISP. In addition, the change in intensity distribution is proportional to the light cone angle, and the light cone angle is proportional to depth; therefore, for a higher number of pixels, we can always interpolate the depth from a smaller look-up table.
The eighth benefit is related to the seventh: high depth resolution. The depth resolution of this 3D CMOS image sensor and imaging device is much higher than that of other depth sensors: with a minimum of 2 pixels, there are 256×256 = 65,536 levels of depth. For a typical 10-meter depth range, this is equivalent to roughly 0.0002 meters (0.2 mm) of resolution, much higher than stereo, structured light, and ToF. Since the normal intensity has 256 levels for an 8-bit sensor, we have to normalize by the normal intensity, which reduces the depth resolution; however, we can increase the number of pixels and also use different color channels, such as red, green, and blue. These channels have different wavelengths and therefore different grating designs, so their depth results are independent of each other; by combining the three color channels, the depth robustness and resolution can increase by 2^3 = 8 times.
The ninth benefit is that this can potentially reduce the reflectivity of CMOS image sensors, enabling smaller pixel sizes and better color sensitivity. When the grating period is smaller than the wavelength, the grating can let nearly all the light pass through, with almost no reflected light.
Before we start describing this 3D CMOS image sensor and imaging device in detail, we want to clarify the terminology. “Light” refers to photons with wavelengths that are visible to human eyes or sensible by CMOS image sensors. For example, a CMOS image sensor can sense infrared light (wavelengths from ~700 nm to ~1500 nm), although we humans cannot see it. Many other technical terms will be defined and explained in context.
Leonardo da Vinci first described imaging theory in the following words: “Every body in the light and shade fills the surrounding air with infinite images of itself; and these, by infinite pyramids diffused in the air, represent this body throughout space and on every side.” He uses these words to describe the relationship between an object's light and image formation. In fact, the pyramids he refers to are not pyramids but cones, so they are typically referred to as “light cones.” As shown in
We need an “imaging setup” to project the 3D scene into a 2D image. The “imaging setup” is usually called a camera, which includes a box that blocks the light from the environment and opens only a small area, which could be a pin-hole, a lens, or multiple lenses, to allow the light to go through. There are also “lensless” imaging setups; in such a setup, the object, usually emitting light by itself, is placed in a black box and emits light directly onto the image sensor, which senses the light rays from the object and forms the image.
For pin-hole imaging setup 20, as shown in
For lens or multiple lenses imaging setups 30, as shown in
Now we know we can encode the depth information into the angle of the light cone received at the image plane of the imaging setup. The next question is how to measure the wideness or narrowness of the light cone.
We use gratings and their near-field and far-field behavior to measure the angle of the light cone. A finite grating 88 refers to multiple periodic structures with dimensions on the scale of the light wavelength, as shown in
In this 3D CMOS image sensor and imaging device, we use a finite grating and its diffraction patterns at a certain distance, especially at the places where a small number of diffraction peaks are formed, such as 3, 2, or 1. At those places, we only need a small number of sensing units to sense and measure the diffraction pattern changes.
In this 3D CMOS image sensor and imaging device, we leverage another property of gratings and diffraction: their sensitivity to the incident angle. When the incidence becomes oblique, the diffraction patterns change; they still form at the same distance from the grating, but they are shifted, as shown in
This 3D CMOS image sensor is built with a standard CMOS image sensor process. There are two types of CMOS image sensor structure and process, Front Side Illumination (FSI) and Back Side Illumination (BSI), as shown in
In FSI (50 in
Since in FSI the light has to go through the circuit, and these metal interconnect layers 57 and circuits take up some space in each pixel area, only a portion of the pixel area is used to receive light. The ratio between the light receiving area and the entire pixel area is called the filling ratio. Since the circuit does take some area, the filling ratio in FSI is limited and not high. This limits the number of photons that can reach the light receiving section, which is a drawback of FSI.
BSI was invented to fix such an issue. In BSI, shown as 60
In this 3D CMOS image sensor 100, a finite grating 88 is on top of the light receiving section 61, as shown in
One can also make phase shift gratings with only transparent materials, like silicon dioxide. This can be done with the standard CMOS deposition, lithography, and etching processes. First, a uniform film of silicon dioxide is created; then the areas that require a different phase are defined by exposing the resist on this silicon dioxide layer, followed by etching those areas down. By controlling the etching time and other parameters, the exposed areas can be made to a specified thickness. Usually, the two areas in each grating period are made with opposite phase (a 180-degree phase difference), so the light passing through interferes strongly. The benefit of using phase shift gratings instead of binary gratings is that there is no energy loss: all the light passes through the layer, which improves sensitivity and also performance in low-light environments.
In this 3D CMOS image sensor, multiple CMOS pixels are placed at the selected near-field distance from the grating, where the diffraction peaks are formed, as shown in
Once pixel intensity values are collected, in this example in
The light energy passing through the finite grating is unchanged; diffraction only changes the energy distribution. Some photons are reflected by the grating, but this is usually less than 1% of the total energy and can be neglected. Since the energy does not change, the normal intensity is the integral over all pixels. Since each pixel value is already the integration of the photons on that pixel, the normal intensity equals the sum of all pixel intensities divided by the number of pixels:
Intensity = (P1 + P2 + P3 + P4) / 4
The depth can be calculated from the distribution of the light intensities P1, P2, P3, and P4. This is an inverse problem to solve; the depth is its solution:
Depth = inverse(P1, P2, P3, P4)
There is a limited number of values of P1, P2, P3, and P4; for example, for 8-bit intensity there are 256 levels. We can therefore build a look-up table (LUT) to map from the intensity distribution to depth, and we can reduce the number of entries of this LUT even further. For example, from diffraction theory and our design, the intensity distribution over P1 to P4 is almost symmetric (depending on the location of the pixel in the image plane, the light cone is slightly tilted, but that is a secondary effect and can be ignored), and we also have to normalize by the color intensity, so we can reduce the lookup table to
Depth = LUT(UINT8(P2 + P3, P1 + P4) * 256 / 2 / Intensity)
UINT8 means an unsigned 8-bit integer, i.e., values from 0 to 255.
For example, a lookup table after calibration may look like this:
LUT(255, 0) = 100 m, LUT(254, 1) = 94 m, LUT(253, 2) = 92 m, …, LUT(130, 124) = 0.24 m, LUT(129, 125) = 0.236 m, LUT(128, 126) = 0.233 m, LUT(127, 127) = 0.231 m
Please note that the depth resolution is controlled by the size of the look-up table, which in turn is controlled by the number of pixels used to measure the diffraction distribution in the depth pixel and by the intensity resolution of each normal CMOS image sensor pixel.
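The sketch below illustrates this look-up-table decoding for one 4-pixel depth unit; the normalization to 8-bit keys is one reasonable reading of the formula above, and the LUT entries are placeholders standing in for a calibrated table.

```python
# Sketch of the LUT decoding for one 4-pixel depth unit. The 8-bit key
# normalization and the LUT entries are assumptions; a real table would be
# filled by calibration or simulation of the actual grating design.
import numpy as np

depth_lut = np.full((256, 256), np.nan, dtype=np.float32)
depth_lut[128, 127] = 0.233   # placeholder entries, in the spirit of the
depth_lut[127, 128] = 0.231   # example table above

def decode_depth_pixel(p1: int, p2: int, p3: int, p4: int):
    """Return (normal intensity, depth in meters) for one 4-pixel depth unit."""
    total = p1 + p2 + p3 + p4
    intensity = total / 4.0                        # normal (color/gray) value
    if total == 0:
        return 0.0, float("nan")                   # no light: depth undefined
    a = min(round((p2 + p3) * 255 / total), 255)   # normalized inner-peak share
    b = min(round((p1 + p4) * 255 / total), 255)   # normalized outer-peak share
    return intensity, float(depth_lut[a, b])

print(decode_depth_pixel(63, 64, 64, 64))          # -> (63.75, ~0.233)
```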
Another way to improve the depth resolution is to add additional rows or columns of such multiple pixels with a different wavelength and a correspondingly designed finite grating. For example, if one row of 4 pixels is underneath a green color filter, the next row can be 4 pixels underneath a blue color filter. The grating period and duty cycle can be calculated by FDTD simulation so that both rows form the same number of diffraction peaks at the light receiving section. Since each row can calculate the depth independently, the combined depth resolution is the square of the original resolution.
Another way to improve the depth resolution is to add additional rows or columns of such multiple pixels with the same wavelength but a different phase. For example, if one row of 4 pixels is underneath a green color filter, the 3 rows next to it can each be 4 pixels underneath a green color filter. The finite gratings in all 4 rows are phase shift gratings with the same period and duty cycle, and are aligned. The only difference between the rows is the phase: each row is etched to a different depth, so that the phase from one row to the next has an offset, for example 0 degrees, 90 degrees, 180 degrees, and 270 degrees for the 4 rows.
In the following, we show some examples of depth pixel designs. They illustrate only some instances based on the principle, not the complete set of variations of this 3D CMOS image sensor and imaging device.
In some implementations, 4×4 pixels are used to form one super pixel that measures depth, a normal color image, and an infrared image. As shown in
What is described here is just one example: one can arrange these pixels in columns instead of rows, and one can rearrange the order of the infrared, red, green, and blue rows.
In some implementations, the depth pixels are combined with normal color pixels.
The minimum number of pixels that can collect the diffraction pattern intensity distribution, or light cone angle, is 2, in which case the finite grating has to be shifted by an offset so that the center of the diffraction pattern is not exactly at the center of the two pixels, as shown in
In some implementations, adding depth pixels does not require increasing the normal CMOS color image sensor pixel size, while both the depth map and the normal color image can be obtained. As shown in
Again, what is described here is just one example: one can arrange these pixels in columns instead of rows, and one can rearrange the order of the red and blue color pixels. One can even choose either red or blue to be the depth pixels, although that may not be optimal.
This implementation may not be the best for depth measurement since it has only two pixels to sense the diffraction distribution. However, it may be perfect for applications that do not require an accurate or absolute depth value. For example, it can be used in cell phone cameras to mimic the effect of an expensive Single Lens Reflex (SLR) camera, where the background is blurred by a large lens and a large aperture: the center of the image is the foreground, so areas whose normalized two-depth-pixel difference is similar to the center are kept, and the rest of the image is blurred using digital low-pass filters.
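A rough sketch of this background-blur use case follows; it is not the invention's ISP pipeline, and the array names, threshold, and blur strength are illustrative assumptions.

```python
# Rough sketch of the background-blur use case (not the invention's ISP):
# use the normalized difference of the two depth sub-pixels as a relative
# depth proxy, keep pixels whose proxy matches the image center (assumed
# foreground), and low-pass the rest. Names and thresholds are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def fake_bokeh(gray: np.ndarray, p_left: np.ndarray, p_right: np.ndarray,
               threshold: float = 0.05, sigma: float = 5.0) -> np.ndarray:
    """gray: HxW image; p_left/p_right: the two depth sub-pixel intensity maps,
    already upsampled to the same HxW resolution."""
    proxy = (p_left - p_right) / (p_left + p_right + 1e-6)    # relative depth proxy
    h, w = proxy.shape
    keep = np.abs(proxy - proxy[h // 2, w // 2]) < threshold  # near-center depth
    blurred = gaussian_filter(gray.astype(float), sigma=sigma)
    return np.where(keep, gray, blurred)

out = fake_bokeh(np.random.rand(64, 64), np.random.rand(64, 64), np.random.rand(64, 64))
print(out.shape)  # (64, 64)
```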
In some implementations, one can not only keep the same resolution as the normal CMOS color image sensor, but also keep the color filter mosaic untouched. As shown in
In some implementations, a mask that blocks light but has certain open areas to allow light to pass can be placed in front of or behind the lens to change the weight of the light distribution in the light cone reaching the image plane. As shown in
The methods of such 3D sensing can be applied to the implementations described above, but can also apply to other variations.
In some implementations of 310, a mask can be placed near the lens. It can have open areas, such as annular openings, that only allow light close to the edge of the light cone to pass. This enhances the weight of oblique incident angles in the light cone and therefore the depth sensing sensitivity.
Then, in step 320, a color filter is optionally placed to allow only light of a certain wavelength to pass. In step 330, at or near the image plane, diffraction elements, such as the finite grating designed above, convert the light cone angle into a shift of the diffraction pattern distribution. In step 340, multiple CMOS image sensor pixels, such as 2, 3, or 4, are placed at a location where the finite grating forms a relatively small number of diffraction peaks, such as 1, 2, or 3. In step 350, the intensity from each CMOS image sensor pixel is collected, and the depth is calculated by solving the inverse problem; the solution of this inverse problem can usually be pre-calculated, pre-calibrated, and stored in a look-up table. The normal pixel intensity can also be calculated: in general, it is the integral of the whole diffraction pattern intensity distribution, or, as a good approximation, the sum of the intensities of all pixels in a depth unit.
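The following frame-level sketch ties steps 330 through 350 together, assuming the sensor is tiled with 1×4 depth units along each row and reusing a per-unit decoder such as the decode_depth_pixel function sketched earlier; the tiling and names are illustrative assumptions, not a fixed design.

```python
# Frame-level sketch of steps 330-350, assuming the sensor is tiled with 1x4
# depth units along each row and a per-unit decoder (such as the
# decode_depth_pixel sketch above) that returns (intensity, depth).
import numpy as np

def process_frame(raw: np.ndarray, decode_unit):
    """raw: HxW raw intensities with W divisible by 4.
    Returns (intensity_map, depth_map), each of shape H x (W // 4)."""
    h, w = raw.shape
    units = raw.reshape(h, w // 4, 4)       # group each row into 1x4 depth units
    intensity = np.empty((h, w // 4))
    depth = np.empty((h, w // 4))
    for i in range(h):
        for j in range(w // 4):
            intensity[i, j], depth[i, j] = decode_unit(*units[i, j])
    return intensity, depth

frame = np.random.randint(0, 256, size=(8, 16))
inten, dep = process_frame(frame, lambda a, b, c, d: ((a + b + c + d) / 4.0, 1.0))
print(inten.shape, dep.shape)               # (8, 4) (8, 4)
```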
In some implementations of 400, instead of using depth sensor units with different colors, multiple depth units with the same color but different phases can be used. In such an implementation, phase gratings are usually used: the finite grating in each unit is etched to a different depth so that the light passing through has a phase offset (phase shift). By combining the data from multiple channels with phase offsets, the calculated depth can have better resolution and robustness.
These implementations are described only to illustrate some uses of this 3D CMOS image sensor and imaging device; numerous modifications can be made to the above implementation examples while remaining within the principle described in this section.