The invention relates generally to time-of-flight (TOF) systems in which the system emits optical energy toward a target object, and examines the phase shift in the fraction of the emitted optical energy that reflects back from the target object to determine depth distance Z to the target object. More specifically, the invention relates to maximizing space-time resolution for phase-based TOF systems.
In
In system 50′, under control of microprocessor 62, optical energy source 70 is periodically energized by an exciter 76, and emits modulated optical energy toward a target object 52. Emitter 70 preferably is at least one LED or laser diode emitting a low power (e.g., perhaps 1 W) periodic waveform, producing optical energy emissions of known frequency (perhaps a few dozen MHz) for a time period known as the shutter time (perhaps 10 ms). Emitter 70 typically operates in the near IR range, with a wavelength of perhaps 800 nm. A lens 72 may be used to focus the emitted optical energy.
Some of the emitted optical energy (denoted Sout) will be reflected (denoted Sin) off the surface of target object 20. This reflected optical energy Sin will pass through an aperture field stop and lens, collectively 74, and will fall upon two-dimensional array 56 of pixel photodetectors 58, often referred to herein as pixels, arranged typically in rows and columns. When reflected optical energy Sin impinges upon the photodetectors, photons are absorbed and release charge within the photodetectors, which is converted into tiny amounts of detection current. For ease of explanation, incoming optical energy may be modeled as Sin=A·cos(ω·t+Θ), where A is a brightness or intensity coefficient, ω=2·π·f is the angular modulation frequency, and Θ is the phase shift. As distance Z changes, phase shift Θ changes.
Within array 56, pixel detection current can be integrated to accumulate a meaningful detection signal, used to form a depth image. In this fashion, TOF system 50′ can capture and provide Z depth information at each pixel detector 58 in sensor array 56 for each frame of acquired data.
As described in the above-cited phase-shift type TOF system patents, pixel detection information is captured at at least two discrete phases, preferably 0° and 90°, and is processed to yield Z data.
System 50′ yields a phase shift Θ at distance Z due to time-of-flight given by:
Θ=2·ω·Z/C=2·(2·π·f)·Z/C (2)
where C is the speed of light, 300,000 km/sec. From equation (2) above it follows that distance Z is given by:
Z=Θ·C/(2·ω)=Θ·C/(2·2·π·f) (3)
And when Θ=2·π, the aliasing interval range associated with modulation frequency f is given as:
Z_AIR=C/(2·f) (4)
In practice, changes in Z produce changes in phase shift Θ, but eventually the phase shift begins to repeat, e.g., Θ and Θ+2·π become indistinguishable. Thus, distance Z is known only modulo 2·π·C/(2·ω)=C/(2·f), where f is the modulation frequency. Thus there can be inherent ambiguity between detected values of phase shift Θ and distance Z. In practice, multi-frequency methods are used to disambiguate or dealias the phase shift data.
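As a brief illustrative sketch of equations (3) and (4), the following Python fragment computes depth from a measured phase shift and the aliasing interval range; the 50 MHz modulation frequency is merely an assumed example value, not a system requirement.

```python
import math

C = 3.0e8            # speed of light (m/s)
f = 50e6             # assumed modulation frequency of 50 MHz (example value only)
omega = 2 * math.pi * f

def depth_from_phase(theta):
    """Depth Z from phase shift theta, per equation (3): Z = theta*C/(2*omega)."""
    return theta * C / (2 * omega)

z_air = C / (2 * f)                 # aliasing interval range, equation (4): 3 m at 50 MHz
z = depth_from_phase(math.pi / 2)   # a 90 degree phase shift corresponds to 0.75 m at 50 MHz
print(z_air, z)
```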
Typical time of flight (TOF) sensors require multiple image captures of different configurations to measure depth or Z-distance to a target object. Multiple TOF system acquired images from discrete points in time are then combined to yield a single depth frame. A primary source of so-called bias error results from motion of the target object over time during image acquisition. Another source of bias error is due to depth edges in space. In either case, a pixel (or an array of pixels) images more than one object and returns a single incorrect depth value. It is advantageous to maximize resolution in both space and time to minimize bias from such effects.
What is needed is a method and system whereby resolution in time and space can be maximized in a phase-based TOF system. Preferably a decision as to which parameters shall be maximized should be determinable on-the-fly.
The present invention provides such a method and system.
Phase-type time-of-flight (TOF) systems often require combining multiple pixel values to provide a depth (Z) value. In a so-called temporal mode, pixel values for a same (x,y) pixel location in the pixel array (x rows, y columns) but representing different captures in time may be combined to produce a depth reading. However in a so-called spatial mode, pixel values from neighboring (often adjacent) pixels within the pixel array in a same temporal capture may be combined to yield a depth value.
When target objects move, values for a same pixel in consecutive frames may not correspond to the same object. Similarly, when a spatial edge occurs in the image, pixel values for neighboring pixels in a same capture may not correspond to the same object. In either case incorrect values may be combined, thus producing an erroneous depth value. The undesired result from such temporal or spatial errors is that the pixel array can report an incorrect depth Z value that is not indicative of depth distance to either target object point. For example, if the sensor array images target object points at say 1 m and at 2 m, and those values are erroneously combined together, then the erroneously reported depth measurement might be Z=0.5 m. Thus, in a first aspect, the present invention addresses the problem of combining multiple pieces of consistent pixel data to yield accurate Z depth information, despite changes in time or space that, unless corrected, can cause the TOF system to yield erroneous depth information. These errors, requiring correction, are sometimes referred to as bias error.
In a second aspect, the present invention recognizes that each pixel in the pixel array has an independent offset, which generally is different from the offset of other pixels in the array. When pixel values are combined either in time (from consecutive captures) or space (same capture but data taken from neighboring pixels), this offset must be cancelled or the TOF system will yield erroneous depth values. Thus, in time or in space, it is always desirable to cancel pixel offsets. In time or temporal mode, using temporally, preferably consecutive, captures that are 180° apart, e.g., (0°-180°), advantageously cancels the offsets, as values are taken from the same pixel. However if the target object is moving, motion blur may occur while sequential captures are obtained. On the other hand, spatial mode (and other modes) does not require consecutive captures, but pixel offset is not necessarily automatically canceled, even when using 0°, 180° captures. This is because values are being obtained from different pixels that typically each have different offsets. However spatial mode is less susceptible to motion blur because less time is required to image the target object, relative to using temporal mode.
The present invention advantageously can remove and model over time each pixel dependent offset, often without need to calibrate (i.e., explicitly measure, store and subtract) each pixel dependent offset. Discrete measurements may be intelligently combined in space and in time to compute a more accurate single depth or Z measurement, which can be computed using at least two discrete measurements in space and time. Embodiments of the present invention can configure a single image capture to facilitate high space-time resolution depth calculation. A sequence of capture configurations can be defined to enable removal of the pixel dependent offset as well as to maximize a desired combination of space-time resolution. Depth values may be computed by combining pixel values either in space or time. In a preferred embodiment, temporal pixel value combinations, spatial pixel value combinations, or both simultaneously (e.g., combining multiple pixel values in a capture with data from other temporal captures) may be selected on a per-pixel basis depending on the scene. This per pixel location selection may be used to minimize a function of motion blur and spatial edge artifacts as well as other considerations.
Preferably a unique decoding algorithm is defined for each capture sequence, and the combination of capture sequence and decoding algorithm may be used to weight space-time resolution according to desired specifications. A corrective offset vector can be determined, preferably using run-time cancellation or calibration techniques. If desired, the decision to maximize either space or time resolution may be computed per pixel, per frame, on-the-fly.
To recapitulate, embodiments of the present invention enable a TOF system to operate with reduced depth error due to motion blur, and/or spatial blur, and/or pixel offset, by intelligently determining how best to combine pixel values, and how best to compensate for individual pixel offsets. Either or both of these determinations can be carried out on a per pixel basis, dynamically, in real-time during TOF operation, or on archived TOF data. Embodiments of the present invention can dynamically calculate offsets for individual pixels and subtract such offsets from the values acquired by those pixels, including calculating individual pixel offsets by combining data acquired by the same pixel at two acquisitions, 180° out of phase with respect to each other. Offsets may be calculated on a per pixel basis and averaged over multiple captures. To reduce motion blur, if target object motion is detected, one or more offset calculations can be discarded rather than averaged. In some embodiments, offsets acquired a priori during a TOF system calibration procedure may be used.
Other features and advantages of the invention will appear from the following description in which the preferred embodiments have been set forth in detail, in conjunction with their accompanying drawings.
It can be challenging in a phase-type time-of-flight (TOF) system to intelligently combine multiple pixel values to provide accurate depth (Z) values so as to reduce adverse effects due to motion of the target object (motion blur), and due to spatial edges of the target object. Under some circumstances, spatially acquired pixel values should be combined, whereas under other circumstances, temporally acquired pixel values should be combined, where such circumstances may change dynamically. Complicating the problem of intelligently combining pixel values is the fact that different pixels in the TOF system pixel array have different offsets. Thus, embodiments of the present invention preferably intelligently determine not only how best to combine pixel values, but also how best to compensate for individual pixel offsets.
As noted, phase-type time of flight (TOF) systems can exhibit error due to the system integrating acquired pixel detection signals (values) to form a single depth value or array of values. Error may result from the target object moving during the time that multiple images are acquired to form the single depth frame. Error can also result from target object depth edges in space, because neighboring pixels imaging different objects may be erroneously combined as though they relate to the same object. Suffice it to say, it is not desired that pixels in the pixel detector sensor array that image different target objects be combined, as erroneous Z depth data can result. As will now be described, embodiments of the present invention can optimize, preferably dynamically, operation in both space and time to minimize error from motion blur and object edge effects. As used herein, the term Zfast refers to the decision to maximize resolution in time, with phase computed spatially. The term Zfine refers to the decision to maximize resolution in space, with phase computed temporally. The term Zsmart refers to an on-the-fly decision to optimize operation in space and/or in time.
Embodiments of the present invention advantageously can remove each pixel dependent offset, and can model each pixel dependent offset over time, without requiring calibration for each pixel dependent offset. Further, discrete measurements may be combined in space and time to compute a more accurate single depth measurement. The depth distance Z may be computed using at least two discrete measurements in space and time. Preferably the decision to maximize either space or time resolution may be computed per pixel per frame according to desired specifications. Frame rate typically may be on the order of say 30 frames/second.
Principles of phase-type time of flight (TOF) systems have been described herein with reference to
V_Offset=[D_Offset, D_Offset]
V_A=[D0_a, D90_a]
V_M=V_A+V_Offset=[D0_m, D90_m]
Due to pixel offset, measured differentials (D0_m and D90_m) are the sum of the active differentials (D0_a and D90_a) and a pixel dependent offset (D_Offset). The offset is theoretically identical for all values of PHI, as follows:
DPHI_m=DPHI_a+D_Offset
D0_m=D0_a+D_Offset
D90_m=D90_a+D_Offset
D180_m=D180_a+D_Offset
D270_m=D270_a+D_Offset
Note that when there is no active (emitted) light, DPHI_a reduces to 0 and what is left is:
DPHI_m=D_Offset
Consequently, if phase is measured when there is no active light, then Theta_m is 45° or 225°, depending on the sign of the offset:
D0_m=D_Offset
D90_m=D_Offset
Theta_m=atan2(D_Offset,D_Offset)
Theta_m=45 degrees (D_Offset>0), 225 degrees (D_Offset<0)
Therefore, one can conclude that the offset vector lies along the line defined by y=x, which is true for all values of PHI.
Upon integrating active light within the pixel detector array, the active vector grows at an angle that depends on the distance of the imaged target object, unlike the offset vector, whose angle is fixed. The measured phase vector, consequently, is the vector sum of the offset vector (distance independent) and the active vector (distance dependent). Thus, measuring depth distance correctly requires removal of the offset vector from the measured vector. That is, in order to remove bias, D_Offset must be removed from both D0_m and D90_m.
The phase from the active vector (needed to compute depth):
Theta_a=atan2(D90_a,D0_a)
The offset vector (biases depth):
Theta_offset=atan2(D_Offset,D_Offset)=45 degrees
The measured vector:
D0_m=D_Offset+D0_a
D90_m=D_Offset+D90_a
Theta_m=atan2(D90_m,D0_m)=atan2(D90_a+D_Offset,D0_a+D_Offset)
Theta_bias=Theta_m−Theta_a
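As a minimal numerical sketch of this bias (the differential values below are arbitrary illustrative numbers, not measured data):

```python
import math

d0_a, d90_a = 100.0, 50.0   # assumed active differentials (arbitrary units)
d_offset = 30.0             # assumed pixel dependent offset

theta_a = math.atan2(d90_a, d0_a)                        # true active phase
theta_m = math.atan2(d90_a + d_offset, d0_a + d_offset)  # measured phase, offset included
theta_bias = theta_m - theta_a                           # phase bias that corrupts depth

print(math.degrees(theta_a), math.degrees(theta_m), math.degrees(theta_bias))
```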
The offset vector can be removed using various techniques, most commonly cancellation and calibration, each technique having advantages and disadvantages. Consider first an offset vector cancellation approach.
Differential sensing yields the following useful property for static objects:
DPHI_a=−D(PHI+180)_a
Inverting the timing clock signals and taking the differential A−B, where A and B denote differential pixel input signals, simply inverts the sign of the active differential, and yields:
D0_a=−D180_a
D90_a=−D270_a
This property is extremely useful as it enables canceling out D_Offset using only measured differential values, without having to know D_Offset.
D0_m=D0_a+D_Offset
D180_m=D180_a+D_Offset=D_Offset−D0_a (using the substitution D0_a=−D180_a)
The difference of the measured values can now be used to extract the active signal:
D0_m−D180_m=D0_a+D_Offset−(D_Offset−D0_a)=2*D0_a
D90_m−D270_m=D90_a+D_Offset−(D_Offset−D90_a)=2*D90_a
After this transformation, one can compute Theta_a without bias, using only known measured values:
Theta_a=atan2(D90_m−D270_m,D0_m−D180_m)
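A minimal sketch of this cancellation arithmetic, assuming four measured differential images held as numpy arrays (the array contents below are illustrative toy values):

```python
import numpy as np

def phase_by_cancellation(d0_m, d90_m, d180_m, d270_m):
    """Per-pixel active phase from measured differentials only.
    Differencing captures taken 180 degrees apart cancels the unknown
    per-pixel offset: D0_m - D180_m = 2*D0_a and D90_m - D270_m = 2*D90_a."""
    return np.arctan2(d90_m - d270_m, d0_m - d180_m)

# Toy 2x2 example: active signal plus a per-pixel offset that differs pixel to pixel
d0_a   = np.array([[100.0,  80.0], [60.0, 40.0]])
d90_a  = np.array([[ 50.0,  40.0], [30.0, 20.0]])
offset = np.array([[ 30.0, -10.0], [ 5.0, 25.0]])

theta_a = phase_by_cancellation(d0_a + offset, d90_a + offset,
                                -d0_a + offset, -d90_a + offset)
print(theta_a)   # matches np.arctan2(d90_a, d0_a); the offset has cancelled
```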
The method of offset cancellation depicted in
One can model offset vector calibration as a function of known system parameters, including, for example and without limitation, ambient light, temperature, shutter time, and conversion gain. If the offset vector is known, then it can be subtracted before computing phase.
With reference to
Consider, now, temporal modeling of the offset vector. As stated earlier, differential sensing yields the following useful property for a static target object:
DPHI_a=−D(PHI+180)_a
However it is also true that:
DPHI_m=DPHI_a+D_Offset
By combining the two equations, one can solve for the offset:
D_Offset=0.5*(DPHI_m+D(PHI+180)_m)
In a fashion similar to the described D_Offset cancellation method, this enables determination of the offset value at run time from measured values alone, without the a priori characterization required by an offset calibration method.
Preferably the pixel dependent offset at each frame is computed by taking the average of two images in which each pixel is clocked 180° out of phase. This is suitable for systems that use an integrated shutter (pixels are clocked directly) or an external shutter, such as a Gallium Arsenide shutter.
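A minimal sketch of this per-pixel, per-frame offset estimate, assuming the two 180°-apart captures are available as numpy arrays (toy values shown):

```python
import numpy as np

def estimate_offset(dphi_m, dphi_plus180_m):
    """Per-pixel offset estimate: D_Offset = 0.5*(DPHI_m + D(PHI+180)_m).
    For a static scene the active components cancel, leaving only the offset."""
    return 0.5 * (dphi_m + dphi_plus180_m)

# Toy data: the active signal inverts between the two captures, the offset does not
d0_a   = np.array([[100.0, 60.0], [40.0, 20.0]])
offset = np.array([[ 30.0, -5.0], [10.0, 25.0]])

d0_m   =  d0_a + offset
d180_m = -d0_a + offset
print(estimate_offset(d0_m, d180_m))   # recovers the per-pixel offset array
```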
Similar to the offset vector cancellation method, however, the temporal model method is also subject to motion blur. Motion blur results because motion breaks the property DPHI_a=−D(PHI+180)_a, and consequently adds a portion of the active signal to the modeled offset. However, an attractive property of the offset is that for a given set of controlled system parameters, the offset reduces to a function only of temperature (and sometimes ambient light as well).
With respect to offset vector calibration modeling, one can state:
D_Offset=K0+K1*TS*(exp(beta*(T−T0))−1)+K2*TS/CS
where TS is the shutter time, CS is the CMR (common mode reset) time unit, T is temperature, T0 is a reference temperature, and K0, K1, K2, and beta are calibration constants.
For a given shutter and CMR setting, the offset equation reduces to:
D_Offset=K0+K1*exp(beta*(T−T0))
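A sketch of how this parametric model might be evaluated follows; K0, K1, K2, beta, and T0 are placeholders standing in for constants that would be obtained from a calibration procedure, not calibrated values:

```python
import math

# Placeholder calibration constants; real values would come from calibration
K0, K1, K2, BETA, T0 = 5.0, 0.02, 1.5, 0.04, 25.0

def modeled_offset(temp, ts, cs):
    """Full model: D_Offset = K0 + K1*TS*(exp(beta*(T - T0)) - 1) + K2*TS/CS."""
    return K0 + K1 * ts * (math.exp(BETA * (temp - T0)) - 1.0) + K2 * ts / cs

def modeled_offset_fixed_shutter(temp, k0, k1):
    """Reduced form for a fixed shutter and CMR setting:
    D_Offset = K0' + K1'*exp(beta*(T - T0)), with constants folded accordingly."""
    return k0 + k1 * math.exp(BETA * (temp - T0))

print(modeled_offset(30.0, ts=10.0, cs=2.0))
```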
Although temperature cannot be controlled directly, it changes relatively slowly. That is, for a reasonable frame rate, the temperature is roughly constant for a large number of frames, and is slowly changing over time. Consequently, for a constant temperature, shutter, and common mode reset (CMR) setting (for example, as in the case above), the offset remains approximately constant for a large number of frames, and changes slowly over time.
More precisely, the derivative of the temperature with respect to time is small, and the offset reduces to a function of temperature. Therefore, the derivative of the offset is small with respect to time:
Observation: dT/dt is small
d(D_Offset)/dt=d(D_Offset)/dT*dT/dt
d(D_Offset)/dT=K1*beta*exp(beta*(T−T0))
d(D_Offset)/dt=K1*beta*exp(beta*(T−T0))*dT/dt
Conclusion: d(D_Offset)/dt is small
Given that the time derivative d(D_Offset)/dt is small, it is possible to model the offset over time by low pass filtering the output of the offset calculation (which may be the average of the two most recent frames, assuming that every other image is clocked 180° out of phase).
Motion can be filtered because it adds a high frequency component to the offset calculation.
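One possible realization of such filtering is a simple exponential (IIR) low-pass filter over the per-frame offset estimates, with a crude motion test that skips pixels whose new estimate deviates sharply from the running model; the filter constant and threshold below are assumptions, not system parameters:

```python
import numpy as np

ALPHA = 0.05            # assumed filter constant: slow tracking of temperature drift
MOTION_THRESHOLD = 50.0 # assumed deviation beyond which motion is suspected

def update_offset_model(offset_model, dphi_m, dphi_plus180_m):
    """Low-pass filter the per-frame offset estimate 0.5*(DPHI_m + D(PHI+180)_m).
    Pixels whose estimate deviates strongly from the running model are assumed
    to be corrupted by motion and are left unchanged for this frame."""
    estimate = 0.5 * (dphi_m + dphi_plus180_m)
    static = np.abs(estimate - offset_model) < MOTION_THRESHOLD
    updated = offset_model.copy()
    updated[static] += ALPHA * (estimate[static] - offset_model[static])
    return updated
```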
Table 2 below summarizes the advantages and disadvantages of the above-described three methods of pixel offset removal.
From the standpoint of maximizing space-time resolution, a substantial advantage is single frame offset removal, which can be done using the calibration method or the temporal model method. Note that the temporal model method advantageously requires no prior calibration or storage, and uses relatively less computationally intensive filtering than a per-pixel exponential model of the offset.
An important aspect in maximizing temporal resolution in a TOF system is to compute depth from a single frame, in which case the offset needs to be removed before phase computation. This can be done using either of the two above-described methods, namely calibration or temporal modeling. Another important aspect to computing depth from a single frame is to obtain orthogonal differentials in a single frame, which can be done by clocking pixels at different phases. High temporal resolution may be desirable for applications such as object tracking.
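By way of a simplified sketch (assuming, purely for illustration, that within an offset-removed capture even columns are clocked at 0° and odd columns at 90°; the actual clocking pattern is a design choice), phase may then be computed spatially from that single frame:

```python
import numpy as np

def phase_spatial(frame_active):
    """Zfast-style decode: combine each 0-degree pixel with its 90-degree neighbor
    from the same capture. frame_active is an offset-removed differential image in
    which even columns are clocked at 0 degrees and odd columns at 90 degrees
    (an assumed, illustrative clocking pattern)."""
    d0  = frame_active[:, 0::2]
    d90 = frame_active[:, 1::2]
    return np.arctan2(d90, d0)      # one phase value per 0/90 pixel pair

frame = np.random.randn(4, 8)       # stand-in for an offset-removed capture
print(phase_spatial(frame).shape)   # (4, 4): horizontal resolution is halved
```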
Referring to
Turning now to
Maximizing spatial resolution according to the present invention will now be described. Maximizing spatial resolution in a TOF system calls for computation of depth Z from a single pixel, preferably by clocking images at different phases. It is of course desired to maintain maximum temporal resolution within this restriction. Accordingly, in embodiments of the present invention, offset preferably is still removed from each individual frame before phase computation. High spatial resolution may be desirable for many applications, e.g., digitization of a static object.
Turning now to
With respect to exemplary decoding sequence shown in
It is seen from the above descriptions involving maximizing space (Zfine) or time (Zfast) resolution in a TOF system that one first removes the offset from a single frame. Having removed this offset from a single frame, either temporal resolution is maximized by computing phase along 0°-90° edges in space, i.e., Zfast, or spatial resolution is maximized by computing phase along 0°-90° edges in time, i.e., Zfine.
However the decision to compute phase in space or time need not be globally limited. Indeed one can make on-the-fly decisions to maximize time/space resolution. Such on-the-fly decision making may be denoted as Zsmart. For example, on-the-fly determination is permissible if at each point in time, each pixel has at least one available 0°-90° edge in space, and has at least one available 0°-90° edge in time. For some applications, it may be desirable for certain pixels to maximize temporal resolution while other pixels maximize spatial resolution. An exemplary such application would be tracking movement of a human player in an intricate background scene.
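A conceptual sketch of such per-pixel, per-frame selection, assuming a spatially computed phase map, a temporally computed phase map, and simple per-pixel motion and edge indicators are all available for the current frame (every name and threshold here is illustrative, not part of any particular implementation):

```python
import numpy as np

def zsmart_select(theta_spatial, theta_temporal, motion_metric, edge_metric,
                  motion_thresh=0.1, edge_thresh=0.1):
    """Per-pixel choice between spatially computed phase (less susceptible to
    motion blur) and temporally computed phase (less susceptible to spatial
    edge artifacts). Thresholds are assumed tuning parameters."""
    prefer_spatial = (motion_metric > motion_thresh) & (edge_metric <= edge_thresh)
    return np.where(prefer_spatial, theta_spatial, theta_temporal)
```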
Consider now the exemplary sequence to facilitate Zsmart or on-the-fly decision making to maximize space resolution or time resolution depicted in
In
In
Modifications and variations may be made to the disclosed embodiments without departing from the subject and spirit of the invention as defined by the following claims.
Priority is claimed to co-pending U.S. provisional patent application Ser. No. 61/217,352 filed 29 May 2009 entitled Method and System to Maximize Space-Time Resolution in a Time-of-Flight (TOF) System.