In general, the present invention relates to computer-implemented triangulation-based three-dimensional (“3-D” or “3D”) digital image reconstruction techniques—such as structured light illumination (SLI) systems, stereo vision, time-of-flight depth sensing, and so on—used in performing 3D image acquisition to digitize an artifact feature or contoured surface/subject of interest.
More-particularly, the invention is directed to a unique computer-implemented process, system, and computer-readable storage medium having stored thereon, executable program code and instructions for 3-D triangulation-based image acquisition of a contoured surface-of-interest (or simply, “contour” or “contour-of-interest”) under observation by at least one camera, by projecting onto the surface-of-interest a multi-frequency pattern comprising a plurality of pixels representing at least a first and second superimposed sinusoid projected simultaneously, each of the sinusoids represented by the pixels having a unique temporal frequency and each of the pixels projected to satisfy
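Reconstructed from the variable definitions in the next paragraph—and offered as a plausible reading rather than a verbatim reproduction—Eq. 1.1 takes the standard phase-shifted, multi-frequency form:

Inp = Ap + Σk=1…K Bkp cos(2πfkyp − 2πkn/N) (Eq. 1.1)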
where Inp is the intensity of a pixel in the projector for the nth projected image at a particular instant/moment in time (p representing the projector); K is an integer representing the number of component sinusoids (e.g., K=2 for a dual-frequency sinusoid pattern, K=3 for a triple-frequency sinusoid, and so on), each component sinusoid having a distinct temporal frequency, where K is less than or equal to (N+1)/2. The parameters Bkp are constants that determine the amplitude, or signal strength, of the component sinusoids; Ap is a scalar constant used to ensure that all values of Inp are greater than zero (that is to say, that the projector unit will not project less than 0 magnitude of light); fk is the spatial frequency of the kth sinusoid, corresponding to temporal frequency k; and yp represents a spatial coordinate in the projected image. For example, yp may represent a vertical row coordinate or a horizontal column coordinate of the projected image. n represents the phase-shift index, or sequence order (e.g., the n=0 pattern is projected first, and then the n=1 pattern, and so on, effectively representing a specific moment in discrete time). N is the total number of phase shifts—i.e., the total number of patterns—that are projected; for each pattern projected, a corresponding image will be captured by the camera (or rather, the camera's image sensor). When used throughout, the superscript “c” references parameters relating to an image or series of images (video) as captured by the camera, whereas superscript “p” references the projector.
Where pixels are projected to satisfy Eq. 1.1, the pixels of the images then captured by the camera are defined according to the unique technique governed by the expression:
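Consistent with the reconstructed Eq. 1.1 and the noise term defined next, Eq. 1.2 plausibly reads:

Inc = Ac + Σk=1…K Bkc cos(2πfkyp − 2πkn/N) + ηc (Eq. 1.2)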
The term η (“eta”) represents noise due to error introduced into the image by the light sensor of the camera. Recall, a camera image is made up of a multitude of pixels, each pixel defined by Eq. 1.2, with values for Ac, Bkc, and ηc different for each pixel; the “c” superscript indicates a value dependent on the position of the pixel as referenced in the camera sensor (‘camera space’). To obtain phase terms from the pixels captured in accordance with Eq. 1.2, the unique expression, below, is carried out for each k:
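A plausible reconstruction of Eq. 1.3, following standard phase-shifting demodulation of the kth temporal frequency, is:

2πfkyp = arctan[(Σn=0…N−1 Inc sin(2πkn/N))/(Σn=0…N−1 Inc cos(2πkn/N))] (Eq. 1.3)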
where, as before, yp represents a spatial coordinate in the projected image. In EXAMPLE 01, herein below, where K is set equal to 2, the phase terms for the cases where k=1 and k=2 (i.e., for the two superimposed sinusoids) must be determined.
When applying temporal unwrapping techniques, for the case where k=2 using Eq. 1.1, one can determine that the projected pixels will satisfy
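Setting k=2 in the reconstructed Eq. 1.1, the projected pixels plausibly satisfy:

Inp = Ap + B2p cos(2πf2yp − 4πn/N),

with f2 = 20 in the example shown.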
Where this leads to 20 stripes (as shown, for example, in
Rather, according to the instant invention, a second set of patterns (k=1), all unit-frequency sinusoids (i.e., f=1), is superimposed with a high-frequency sinusoid, such as the 20-stripe, k=2 pattern. The unit-frequency signal is defined by an adaptation of Eq. 1.1.
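A plausible rendering of this adaptation, with f1=1 (so that the spatial-frequency term reduces to 2πyp), is:

Inp = Ap + B1p cos(2πyp − 2πn/N)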
Therefore, rather than projecting a total of N patterns onto the contoured surface-of-interest, there are now 2*N patterns projected (such that K=2 and each pixel projected from the projector is comprised of a dual-frequency pattern: one component a unit-frequency sinusoid and the second a high-frequency sinusoid). However—and unique to the applicants' technique according to the invention—the plurality of pixels projected using Eq. 1.1 are ‘instantly decodable’, such that the computerized processing unit (CPU) of the computerized device in communication with the projector and camera units already has, at this point, the data and the means to determine (closely enough) which stripe each projected pixel Inp is in, while determining 2πf2yp (i.e., phase) of the camera image (of pixel intensity, Inc) according to Eq. 1.3—reproduced again, below, for handy reference:
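(Eq. 1.3, as reconstructed above: 2πfkyp = arctan[(Σn Inc sin(2πkn/N))/(Σn Inc cos(2πkn/N))]; for the high-frequency, k=2 component, the temporal factor becomes 4πn/N.)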
To carry-out phase unwrapping of the high-frequency sinusoid, the following steps can be taken, whereby the following four variables—unitPhase, highPhase, tempPhase, and finalPhase, as labeled and defined below—are determined:
Or, summarized in pseudocode short-hand notation as done in
unitPhase = arctan(cosSumK1/sinSumK1);
highPhase = arctan(cosSumK2/sinSumK2)/F2;
tempPhase = round((unitPhase − highPhase)/(2*PI)*F2);
finalPhase = tempPhase + highPhase*2*PI/F2;
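By way of illustration only, below is a minimal Python/NumPy transliteration of the four steps above; it assumes the per-pixel sine/cosine sums of Eq. 1.3 have already been accumulated, uses the quadrant-aware atan2 (in place of the plain arctangent ratio in the pseudocode) with results wrapped to [0, 2π), and expresses the unwrapped result on the unit-frequency scale—one consistent reading of the finalPhase step, not the only one. All names are illustrative.

import numpy as np

def unwrap_dual_frequency(sin_sum_k1, cos_sum_k1, sin_sum_k2, cos_sum_k2, f2):
    # sin/cos sums: per-pixel arrays accumulated per Eq. 1.3;
    # f2: stripe count (spatial frequency) of the high-frequency pattern.
    two_pi = 2.0 * np.pi
    unit_phase = np.mod(np.arctan2(sin_sum_k1, cos_sum_k1), two_pi)
    high_phase = np.mod(np.arctan2(sin_sum_k2, cos_sum_k2), two_pi) / f2
    # Integer stripe index: which of the f2 periods the pixel falls in.
    temp_phase = np.round((unit_phase - high_phase) * f2 / two_pi)
    # Unwrapped phase, normalized to the unit-frequency scale [0, 2*pi).
    final_phase = temp_phase * two_pi / f2 + high_phase
    return final_phase

Using atan2 rather than the arctangent of a ratio avoids the quadrant ambiguity that arises when the cosine sum goes negative.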
The first and second superimposed sinusoids may comprise, for example as noted in EXAMPLE 01, below, a unit-frequency sinusoid (in this context, ‘unit’ refers to a frequency value of 1) superimposed on a high-frequency sinusoid, the unit-frequency sinusoid and high-frequency sinusoid being projected simultaneously (i.e., effectively ‘on top of one another’ over a selected epoch/duration of frames, n) from a projection unit, or projector, as a plurality of pixels such that each of the pixels projected satisfies the expression
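Reconstructed consistently with the definitions immediately below (and with the demodulation of EXAMPLE 01, where B1 rides on the first temporal frequency and B2 on the second), the expression plausibly reads:

Inp = Ap + B1p cos(2πfhyp − 2πn/N) + B2p cos(2πfuyp − 4πn/N)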
where Inp is the intensity of a pixel in the projector; Ap, B1p, and B2p are constants set such that the value of Inp falls within a target intensity range (e.g., between 0 and 255 for an 8-bit color depth projector); fh is the high frequency of the sine wave; and fu is the ‘unit’ frequency of the sine wave. The unit-frequency signal/sinusoid is used during a demodulation step to temporally produce a decodable, unwrapped-phase term.
Additionally, the process includes a decoding of the projected patterns by carrying-out lookup table (LUT)-based processing of video image data acquired by at least one image-capture device. The decoding step is performed to extract, in real-time, coordinate information about the surface shape-of-interest. The LUT-based processing includes the steps of implementing (or, querying) a pre-computed modulation lookup table (MLUT) to obtain a texture map for the contoured surface-of-interest and implementing (or, querying) a pre-computed phase lookup table (PLUT) to obtain corresponding phase for the video image data acquired of the contoured surface-of-interest. Furthermore, conventional digital image point clouds can be used to display, in real-time, the data acquired.
In one aspect, the unique computer-implemented process, system, and computer-readable storage medium with executable program code and instructions can be characterized as having two stages: the first is a dual-frequency pattern generation and projection stage, the dual-frequency pattern characterized by the expression reconstructed above,
where Inp is the intensity of a pixel in the projector; Ap, B1p, and B2p are constants that are preferably set, by way of example, to make the value of Inp fall between 0 and 255 for an 8-bit color depth projector; fh is the high frequency of the sine wave; fu is the unit frequency of the sine wave and equals 1; n represents the phase-shift index; and N is the total number of phase shifts and is preferably greater than or equal to 5. The second stage comprises a de-codification stage employing a lookup table (LUT) method for phase, intensity/texture, and depth data.
SLI works by measuring the deformation of a light pattern that has been projected on the surface contours of an object; see,
General Background Discussion of Technology: SLI
SLI measurement is based on the mathematical operation known as triangulation, see
Background: Computerized Devices, Memory & Storage Devices/Media.
I. Digital Computers.
A processor is the set of logic devices/circuitry that responds to and processes instructions to drive a computerized device. The central processing unit (CPU) is considered the computing part of a digital or other type of computerized system. Often referred to simply as a processor, a CPU is made up of the control unit, program sequencer, and an arithmetic logic unit (ALU)—a high-speed circuit that does calculating and comparing. Numbers are transferred from memory into the ALU for calculation, and the results are sent back into memory. Alphanumeric data is sent from memory into the ALU for comparing. The CPU of a computer may be contained on a single ‘chip’, often referred to as a microprocessor because of its tiny physical size. As is well known, the basic elements of a simple computer include a CPU, clock, and main memory; whereas a complete computer system requires the addition of control units, input, output, and storage devices, as well as an operating system. The tiny devices referred to as ‘microprocessors’ typically contain the processing components of a CPU as integrated circuitry, along with an associated bus interface. A microcontroller typically incorporates one or more microprocessors, memory, and I/O circuits as an integrated circuit (IC). Computer instruction(s) are used to trigger computations carried out by the CPU.
II. Computer Memory and Computer Readable Storage.
While the word ‘memory’ has historically referred to that which is stored temporarily, with storage traditionally used to refer to a semi-permanent or permanent holding place for digital data—such as that entered by a user for holding long term—more recently, the definitions of these terms have blurred. A non-exhaustive listing of well known computer readable storage device technologies compatible with a variety of computer processing structures are categorized here for reference: (1) magnetic tape technologies; (2) magnetic disk technologies, including floppy disks/diskettes and fixed hard disks (often in desktops, laptops, workstations, etc.); (3) solid-state disk (SSD) technology, including DRAM and ‘flash memory’; and (4) optical disk technology, including magneto-optical disks, PD, CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-R, DVD-RAM, WORM, OROM, holographic, solid state optical disk technology, and so on.
Briefly described, once again, the invention includes a unique computer-implemented process, system, and computer-readable storage medium having stored thereon executable program code and instructions for 3-D triangulation-based image acquisition of a contoured surface/object-of-interest under observation by at least one camera, by projecting onto the surface-of-interest a multi-frequency pattern comprising a plurality of pixels representing at least a first and second superimposed sinusoid projected simultaneously, each of the sinusoids represented by the pixels having a unique temporal frequency and each of the pixels projected to satisfy Eq. 1.1, above.
where Inp is the intensity of a pixel in the projector for the nth projected image at a particular instant/moment in time (p representing the projector); K is an integer representing the number of component sinusoids (e.g., K=2 for a dual-frequency sinusoid pattern, K=3 for a triple-frequency sinusoid, and so on), each component sinusoid having a distinct temporal frequency, where K is less than or equal to (N+1)/2. The parameters Bkp are constants that determine the amplitude or signal strength of the component sinusoids; Ap is a scalar constant used to ensure that all values of Inp are greater than zero; fk is the spatial frequency of the kth sinusoid corresponding to temporal frequency k; and yp represents a spatial coordinate in the projected image. N is the total number of phase shifts—i.e., the total number of patterns—that are projected, and for each pattern projected, a corresponding image will be captured by the camera.
Images captured by the camera are defined according to the unique technique governed by the expression of Eq. 1.2, above.
The term η represents noise due to error introduced into the image by the light sensor of the camera. To obtain (extract) phase terms from the pixels captured in accordance with Eq. 1.2, the unique expression of Eq. 1.3, above, is carried out for each k.
By way of using lookup tables (LUT) to obtain modulation (M) and phase (P) according to
To carry-out phase unwrapping of, for example, a high-frequency sinusoid, the following steps are taken to combine phase terms to obtain a single phase image (namely, the unitPhase, highPhase, tempPhase, and finalPhase steps set forth above):
Next, a conversion of phase to X, Y, Z point clouds is implemented using the following:
Zw=Mz(xc,yc)+Nz(xc,yc)T,
Xw=Ex(xc,yc)Zw+Fx(xc,yc)
Yw=Ey(xc,yc)Zw+Fy(xc,yc)
where T=(C(xc,yc)yp+1)−1, and where the parameters Mz, Nz, C, Ex, Ey, Fx, and Fy are pre-computed per camera pixel (as detailed in EXAMPLE 01, below).
Implementing the 7 parameters Mz, Nz, C, Ex, Ey, Fx, and Fy by means of table look-up for indices (xc, yc) (camera column and row indices), reduces the total computational complexity associated with deriving the 3-D point cloud from the phase term.
For purposes of illustrating the innovative nature plus the flexibility of design and versatility of the new system and associated technique, as customary, figures are included. One can readily appreciate the advantages as well as novel features that distinguish the instant invention from conventional computer-implemented 3D imaging techniques. The figures as well as any incorporated technical materials have been included to communicate the features of applicants' innovation by way of example, only, and are in no way intended to limit the disclosure hereof. Each item labeled an ATTACHMENT is hereby incorporated herein by reference for purposes of providing background technical information.
ATTACHMENT A1 is composed of sections from a doctoral dissertation manuscript entitled “Real-time 3-D Reconstruction by Means of Structured Light Illumination,” authored by one of the applicants hereof, highlighting applicants' rigorous analysis and scientific effort behind the technological advancements herein, especially the materials labeled EXAMPLE 01. These sections are hereby annexed and incorporated herein by reference to serve as an integral part of the instant patent application, further evidencing the complex, multifaceted nature of problems encountered by those attempting to create solutions in the arena of 3-D image acquisition.
ATTACHMENT A2 is an in-process technical manuscript entitled “Real-time Three-Dimensional Shape Measurement of Moving Objects without Edge Errors by Time Synchronized Structured Illumination” authored by three of the applicants hereof, highlighting applicants' rigorous analysis and scientific effort behind the technological advancements outlined herein, especially the materials labeled EXAMPLE 02.
ATTACHMENT B, an article co-authored by one applicant hereof, is provided for its technical background and incorporated by reference, herein; having been published as Li, Jielin, Hassebrook, Laurence G., and Guan, Chun, “Optimized two-frequency phase-measuring profilometry light-sensor temporal-noise sensitivity,” J. Opt. Soc. Am. A, Vol. 20, No. 1, pp. 106-115 (January 2003).
ATTACHMENT C, provided for its technical background and incorporated by reference, herein, was published as S. F. Frisken, R. N. Perry, A. P. Rockwood, and T. R. Jones, “Adaptively sampled distance fields: A general representation of shape for computer graphics,” in Proceedings of the 27th annual conference on Computer graphics and interactive techniques, 249-254 (2000).
By viewing the figures incorporated below, and associated representative embodiments, along with technical materials outlined and labeled an ATTACHMENT, one can further appreciate the unique nature of core as well as additional and alternative features of the new computer-implemented process, system, and computer-readable storage medium having stored thereon, executable program code and instructions for 3-D triangulation-based image acquisition of a contoured surface-of-interest under observation by at least one camera, by projecting onto the surface-of-interest a multi-frequency pattern comprising a plurality of pixels representing at least a first and second superimposed sinusoid projected simultaneously, disclosed herein. Back-and-forth reference and association will be made to various features identified in the figures.
The overall process of 3-D triangulation-based image acquisition and reconstruction of a contour includes the following stages: First, a multi-frequency pattern is projected at the contour-of-interest, comprised of pixels characterized by a first sinusoid (e.g., a unit-frequency sinusoid) embedded with a second sinusoid (e.g., a high-frequency sinusoid) producing a high-quality, wrapped phase signal. The first (e.g., unit-frequency) sinusoid signal has a temporal frequency selected such that, during a demodulation temporal unwrapping step, an instantly de-codable, unwrapped-phase term is produced, such that the process of spatially unwrapping the phase of the second (e.g., higher-frequency) sinusoid is avoided.
Using an image sensor, data is acquired of the contour on which the multi-frequency pattern is being projected. Second, the phase is generated—that is, the correspondence between camera and projector is found—preferably in real-time; this is very difficult for conventional pattern schemes such as frequency-multiplexing and neighborhood-coding one-shot SLI pattern schemes. Third, 3-D point clouds are reconstructed using either phase-to-height conversion or absolute 3-D coordinates computation. Finally, the reconstructed 3-D point clouds are preferably displayed in real time.
Turning to
When employing conventional 3-D triangulation-based image acquisition techniques, bottlenecks include: (1) decoding the image/pixel data captured by the camera's light sensor of the pixel patterns projected onto the contoured surface-of-interest—this bottleneck involves traditional computation of the arctangent function; and (2) deriving absolute (‘world’) 3-D coordinates for each pixel of the captured images/pixel data, as a result of using conventional, computationally-bulky matrix inversion techniques to map phase to 3-D world coordinates. A wide variety of solutions have been attempted by others in their efforts to address these bottlenecks, such as new pattern decoding algorithms and GPU processing, but any incremental improvement gained by employing known, conventional approaches is limited due to the overall complexity associated with traditional demodulation.
When higher frequency patterns are employed to improve measurement accuracy in SLI, this is done at the extra computational cost of complicated, time-intensive phase unwrapping. Phase unwrapping strategies in image processing generally fall into two categories: spatial (space) and temporal (time). Spatial phase-unwrapping approaches assume a continuity of scanned surfaces and are likely to fail in regions with depth discontinuities (step edges). Temporal methods, alternatively, use intermediate phase values typically generated by projecting several additional reference patterns and, therefore, are not efficient for real-time operation.
A novel multi-frequency pattern is disclosed herein: contained in each multi-frequency pattern pixel projected are at least two sinusoidal signals, each sinusoid having a distinct frequency. For example, the pixels may each contain information for a high-frequency sinusoid superimposed with a unit-frequency sinusoid, where the high-frequency sinusoid component is used to generate robust phase information, and the unit-frequency component is used to reduce phase unwrapping ambiguities. By way of example, the image in
In 3-D contour/surface acquisition, it is useful to identify each pixel explicitly; this has led to employment of several different temporal and spatial techniques for pattern creation for point discrimination, such as Binary or Gray Scale encoding or PMP. Binary/Gray Scale encoding and PMP employ multiple pattern projections that are time multiplexed, meaning that the patterns occur in succession, one after another. Projecting the patterns in succession increases the scan time with each new pattern that must be included. According to the invention, to allow real-time processing, a time-sequenced dual-frequency pattern is projected on a contoured surface-of-interest, the dual-frequency pattern comprised of a plurality of pixels.
Phase Measuring Profilometry (PMP): Background Discussion.
As highlighted in ATTACHMENT B hereof—J. Li, L. G. Hassebrook, and C. Guan, “Optimized two-frequency phase-measuring profilometry light-sensor temporal-noise sensitivity,” J. Opt. Soc. Am. A 20, 106-115 (2003)—Phase Measuring Profilometry (PMP) is a known SLI technique that measures depth information from a surface using a sequence of phase-shifted, sinusoidally varying patterns. This known PMP technique is outlined, next, and depicted graphically in
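The pattern expression—likely Eq. (2.1) in the numbering used below—plausibly reads:

In(xp, yp) = Ap + Bp cos(2πfyp − 2πn/N)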
where (xp, yp) are the coordinates of a pixel in the projector, the intensity value of the given pixel is In(xp, yp), and Ap and Bp are constants. The p superscripts denote that the expression is in projector coordinates. Here, f is the frequency of the sine pattern measured in cycles per image-frame, N is the total number of phase shifts for the whole sequence, and n is the current phase-shift index, or current frame, in the time sequence. Since the equation depends only on yp, the intensity value of the given pixel, In(xp, yp), varies only in the yp direction. This direction is called the Phase Direction of the PMP pattern because it is the direction of the phase shift. The term Orthogonal Direction is named for its relationship with the Phase Direction—it lies 90 degrees from the Phase Direction, along the constant xp values of the pattern.
PMP is depicted by
For reconstruction, a camera captures each image where the sine wave pattern is distorted by the scanned surface topology, resulting in the patterned images expressed as:
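Per the later reference to it as Eq. (2.2), the captured-image expression plausibly reads:

Inc(xc, yc) = Ac(xc, yc) + Bc(xc, yc) cos(2πfyp − 2πn/N) (Eq. 2.2)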
where (xc, yc) are the coordinates of a pixel in the camera while Inc(xc, yc) is the intensity of that pixel. To simplify the notation, the coordinates index both in the camera and projector will be removed from our equation henceforth. The term Ac is the averaged pixel intensity across the pattern set, which can be derived according to:
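A plausible reconstruction—the usual average over the N captured images—is:

Ac = (1/N) Σn=0…N−1 Inc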
such that the image Ac is equal to an intensity or texture photograph of the scene. Correspondingly, the term Bc is the intensity modulation of a given pixel and is derived from Inc as:
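Plausibly, per standard PMP demodulation:

Bc = (2/N) √{[Σn=0…N−1 Inc sin(2πn/N)]² + [Σn=0…N−1 Inc cos(2πn/N)]²}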
such that Bc can be thought of as the amplitude of the sinusoid reflecting off of a point on the target surface. If Inc is constant or less affected by the projected sinusoid patterns, Bc will be close to zero. Thus Bc is employed as a shadow noise detector/filter such that the shadow-noised regions, with small Bc values, are discarded from further processing.
In order to triangulate a specific point, the point's position in both camera and projector space must be known. While the camera position is known explicitly, due to the captured image being in camera space, the projector points are determined by matching a phase value measured at a point to that same phase value in the sinusoidally varying pattern. The phase value at each pixel captured in the camera space, φ(xc, yc), is determined by projecting N phase-shifted patterns at the target and processing a sequence of N images by carrying out the following expression for the reliable pixels with sufficiently large Bc:
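This is the arctangent computation referenced below as Eq. (2.5); it plausibly reads:

φ(xc, yc) = arctan[(Σn=0…N−1 Inc sin(2πn/N))/(Σn=0…N−1 Inc cos(2πn/N))] (Eq. 2.5)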
where n denotes an index into the image sequence and In(xc, yc) is the pixel intensity value at the position (xc, yc) in the nth image in the sequence.
For a pattern frequency, f=1, often termed a base frequency since it is a pattern of a single period of sinusoidal variation, the phase, φ, is readily mapped to a projector frame percentage along the Phase Direction according to the expression
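Consistent with the remark at f>1 below, the mapping plausibly reads:

yp = φ(xc, yc)/(2πf),

which for f=1 reduces to φ(xc, yc)/(2π).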
Notice that yp is not actually a coordinate in the projector space, but it is a value ranging from 0 to 1 that denotes a percentage of the distance between the bottom of the projector frame and the top.
An ambiguous phase problem occurs when increasing the frequency beyond 1, even though better depth resolution can be achieved through the use of higher frequency patterns. The problem is due to the periodic nature of sinusoids, and can be explained by examining the phase variation in φ(xc, yc) and comparing it to the expression immediately above. The variation in φ(xc, yc) is always from 0 to 2π, but at f>1 the above expression will only vary between 0 and 1/f. So at higher frequencies, this creates what is called a repeated—or wrapped—phase image that requires ‘unwrapping’ to fully acquire the unambiguous phase value for the correct yp coordinate. In digital image processing it is known that, while accuracy of phase-shifting patterns tends to increase with the number of periods, this increase brings with it the cost of depth ambiguity.
In a general sense, phase unwrapping ensures that all appropriate multiples of 2π have been included in the phase response (i.e., the complex angle of the frequency response). When phase is constrained to an interval such as [−π, π) or [0, 2π), it is deemed a wrapped phase; when not so constrained, it is an unwrapped phase. A mathematical description of phase unwrapping is φ=ψ+2kπ, where φ is the unwrapped phase, ψ is the wrapped phase, and k is an integer counting the number of 2π multiples.
Triangulation is based upon known geometric relationships to determine the points in a spatial coordinate system.
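The projection relation referenced here plausibly takes the usual homogeneous pinhole form—offered as an assumption, since only its ‘where’ clause survives in this text:

λ[xc, yc, 1]T = Mc[Xw, Yw, Zw, 1]T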
where the center of projection is located on the origin of the world coordinate system for the ideal model and λ is a non-zero scalar. The camera coordinate system and world coordinate system are both represented in
pc=Rpw+T Eq. (2.8)
Applying the assumption that the camera and projector are adequately modeled using the pinhole lens model, their perspective matrices, Mc (camera) and Mp (projector) are given by:
A benefit of performing PMP in only the yp direction is that the equations become simpler, and only three parameters (xc, yc, yp) are necessary for triangulating the corresponding world coordinate for each pixel. Of the required parameters, the camera coordinates (xc, yc) are known explicitly, while the yp value is determined by the specific SLI method. The 3-D world coordinates Xw, Yw, and Zw are defined conventionally according to the matrix relation referenced herein as Eq. (7) (and expanded in EXAMPLE 01, below).
It is this matrix inversion, as well as the arctangent computation in Eq. (2.5), that have proven to be the bottlenecks preventing real-time surface reconstruction using conventional techniques.
In their U.S. Provisional Patent Application No. 61/371,626, two examples of applicants' new multi-frequency pattern were introduced in terms of analogies to traditional electrical circuitry signal/current propagation types: an AC (alternating current) flavor and a DC (direct current) flavor. Applicants coined the multi-frequency pattern fashioned after principles governing AC “Dual-frequency Phase Multiplexing” (DFPM), and coined the multi-frequency pattern fashioned after principles governing DC “Period Coded Phase Measuring” (PCPM). The example called DFPM patterns comprises two superimposed sinusoids, one a unit-frequency phase sine wave and the other a high-frequency phase sine wave, whereby after receiving/acquiring the pattern data by an image sensor, the phase of the two patterns is separated. The unit-frequency phase is used to unwrap the high-frequency phase. The unwrapped high-frequency phase is then employed for 3-D reconstruction. PCPM patterns are an example of considering DC current propagation, whereby the period information is embedded directly into high-frequency base patterns such that the high-frequency phase can be unwrapped temporally.
A unit-frequency sinusoid (in this context, ‘unit’ refers to having a magnitude value of 1) is superimposed on a high-frequency sinusoid, the unit-frequency sinusoid and high-frequency sinusoid being projected simultaneously (i.e., effectively ‘on top of one another’ over a selected epoch/duration of frames, n) from a projection unit, or projector, as a plurality of pixels such that each of the pixels projected satisfy the expression
where Inp is the intensity of a pixel in the projector; Ap, B1p, and B2p are constants set such that the value of Inp falls within a target intensity range (e.g., between 0 and 255 for an 8-bit color depth projector); fh is the high frequency of the sine wave; and fu is the ‘unit’ frequency of the sine wave. The unit-frequency signal/sinusoid is used during a demodulation step to temporally produce a decodable, unwrapped-phase term. As stated above, for reconstruction, a camera captures each image where the sine wave pattern is distorted by the scanned surface topology, resulting in the patterned images expressed as:
where (xc, yc) are the coordinates of a pixel in the camera while Inc(xc, yc) is the intensity of that pixel. For this case, Eq. (2.2) reduces to a form of Eq. (1.2) and becomes:
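A plausible rendering of this reduced form, mirroring the reconstructed projector pattern with per-pixel camera-side terms, is:

Inc = Ac + B1c cos(2πfhyp − 2πn/N) + B2c cos(2πfuyp − 4πn/N) + ηc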
where Inc is the intensity of a pixel in the camera. The term Ac is still the averaged pixel intensity across the pattern set, which can be derived by Eq. (3) such that the image Ac is equal to an intensity or texture photograph of the scene. Correspondingly, the term B1c is the intensity modulation of a given pixel corresponding to φh and is derived from Inc as:
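Per the reference below to “Eq. (10) with m=2,” the modulation terms plausibly read:

Bmc = (2/N) ‖Σn=0…N−1 Inc exp(−j2πmn/N)‖, for m = 1, 2 (Eq. 10)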
with m=1 such that B1c can be thought of as the amplitude of the sinusoid reflecting off of a point on the target surface.
Similar to traditional PMP, if Inc is constant or less affected by the projected sinusoid patterns, then B1c will be close to zero, indicating that B1c is very sensitive to noise in the camera, the projector, and/or ambient light. Thus B1c is employed as an indicator of the signal-to-noise ratio, such that excessively noisy regions, with small B1c values, are discarded from further processing [25]. B2c is derived from Inc by Eq. (10) with m=2, and it is the intensity modulation corresponding to φu. Of the reliable pixels with sufficiently large B1c, the phase-pair, (φh, φu), is derived as:
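Referenced above as the pair in Eq. (8), the phase terms plausibly read:

φh = arctan[(Σn Inc sin(2πn/N))/(Σn Inc cos(2πn/N))] and
φu = arctan[(Σn Inc sin(4πn/N))/(Σn Inc cos(4πn/N))] (Eq. 8)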
where φh represents the wrapped phase value of the captured pattern and φu represents the base phase used to unwrap φh.
LUT-Based Processing
LUT-based programming is a well known and widely used technique for minimizing processing latency. Implementing LUT-based processing, modulation, Bc, and phase, φ, terms are uniquely derived for traditional PMP—suitable for all triangulation-based 3-D measurement, including the dual-frequency pattern set detailed in this EXAMPLE 01.
Modulation
Uniquely, to reduce computational complexity associated with conventional techniques, modulation Bc is reduced to simplified closed forms for N=3, N=4, and N=6.
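Reconstructed by substituting each N into Eq. (4)—offered as a plausible reading, not a verbatim reproduction—the reduced forms are:

Bc = (1/3)√{3(I1c−I2c)² + (2I0c−I1c−I2c)²} for N=3;
Bc = (1/2)√{(I1c−I3c)² + (I0c−I2c)²} for N=4; and
Bc = (1/6)√{3(I1c+I2c−I4c−I5c)² + (2I0c−2I3c+I1c−I2c−I4c+I5c)²} for N=6.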
Noting that we need only solve these equations for 8-bit-per-pixel Inc images, we can implement a modulation look-up table, MLUT, that, for N=3, is defined according to Eq. (15)—the MLUT forms of Eqs. (15), (17), and (19) are reconstructed together following Eq. (20), below—
where integer indices V and U are derived from
V=I1c−I2c and U=2I0c−I1c−I2c; (16)
for N=4, is defined according to
where integer indices V and U are derived from
V=I1c−I3c and U=I0c−I2c; (18)
or for N=6, is defined according to
where integer indices V and U are derived from
V=I1c+I2c−I4c−I5c and U=2I0c−2I3c+I1c−I2c−I4c+I5c. (20)
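Rewritten over the integer indices (U, V) just defined, the MLUT forms of Eqs. (15), (17), and (19) plausibly read:

MLUT(U,V) = (1/3)√(3V² + U²) for N=3 (Eq. 15);
MLUT(U,V) = (1/2)√(U² + V²) for N=4 (Eq. 17); and
MLUT(U,V) = (1/6)√(3V² + U²) for N=6 (Eq. 19).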
The double-precision results of Eq. (15), (17) or (19) are stored in the MLUT. In contrast to Eq. (4), the MLUT reduces the run-time computation cost of modulation without losing accuracy, where the size of the LUT is determined by the number of bits per pixel of the camera sensor and the number of patterns being projected.
For our proposed dual-frequency pattern scheme, the minimum number of patterns is 5, but to take advantage of our LUT-based processing, we set the number of patterns to 6, where the modulation terms B1c and B2c in Eq. (9) are described as:
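Presumably Eqs. (21) and (22), these terms plausibly read—consistent with Eq. (19) for B1c, as noted next, and with the indices of Eq. (31), below, for B2c:

B1c = (1/6)√(3V² + U²), with V and U per Eq. (20); and
B2c = (1/6)√(3V′² + U′²), with V′ = I1c−I2c+I4c−I5c and U′ = 2I0c+2I3c−I1c−I2c−I4c−I5c.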
In practice, we need only calculate B1c as a shadow filter and for representing object texture. The MLUT for B1c is the same as Eq. (19).
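For illustration, a minimal Python/NumPy sketch of an N=6 MLUT built from the reconstructed Eq. (19); all names are illustrative, and the (U, V) index ranges follow from 8-bit intensities:

import numpy as np

IMAX = 255                              # 8-bit camera intensities
U_OFF, V_OFF = 4 * IMAX, 2 * IMAX       # shift signed indices to array offsets

u = np.arange(-4 * IMAX, 4 * IMAX + 1, dtype=np.float64)
v = np.arange(-2 * IMAX, 2 * IMAX + 1, dtype=np.float64)
U, V = np.meshgrid(u, v, indexing="ij")
MLUT = np.sqrt(3.0 * V**2 + U**2) / 6.0  # per the reconstructed Eq. (19)

def modulation_b1(I):
    # I: (6, H, W) stack of the six captured images.
    I = I.astype(np.int64)
    Vq = I[1] + I[2] - I[4] - I[5]                      # V per Eq. (20)
    Uq = 2*I[0] - 2*I[3] + I[1] - I[2] - I[4] + I[5]    # U per Eq. (20)
    return MLUT[Uq + U_OFF, Vq + V_OFF]  # one table look-up per pixel

The table occupies roughly 17 MB at double precision (2041×1021 entries), a one-time start-up cost traded for a single look-up per pixel at run-time.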
4.2. Phase
As our second step to producing real-time 3-D video with texture, we intend to minimize the computational complexity associated with generating phase video where the arctangent function has long been considered a significant obstacle for fringe pattern analysis [26]. Previous approaches to this problem include Huang et al.'s cross ratio algorithm [22] and Guo et al.'s approximation algorithm [26]. However, these methods reduce computational cost at the expense of accuracy and are not faster than our LUT-based algorithms.
As we did with Eq. (4) for Bc, we simplify Eq. (5) according to the number of patterns N (N=3, 4, or 6). Again based on the fact that the intensity values of grabbed images are range-limited integers, we can implement these calculations through a phase LUT (PLUT)—the three PLUT forms are reconstructed together below—defined, for N=3, according to
where U and V are defined in Eq. (16); for N=4, defined according to
where U and V are defined in Eq. (18); or for N=6, defined according to
where U and V are defined in Eq. (20). The double-precision results are stored in the PLUT. Thus, the time-consuming arctangent operation is pre-performed, and phase values are obtained by accessing the pre-computed PLUT whose size is, again, determined by the number of bits per pixel of the sensor as well as the number of patterns projected, with no loss in accuracy. Compared to Eq. (5), the PLUT avoids computing the arctangent function at run-time such that the computational cost of phase is greatly reduced without introducing distortion.
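Reconstructed from Eq. (5) with the index definitions of Eqs. (16), (18), and (20), the PLUT forms plausibly read:

PLUT(U,V) = arctan(√3·V/U) for N=3;
PLUT(U,V) = arctan(V/U) for N=4; and
PLUT(U,V) = arctan(√3·V/U) for N=6 (the form referenced below as Eq. (28)),

each evaluated with quadrant correction (i.e., atan2) over every integer pair (U, V) reachable at the camera's bit depth.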
For our proposed dual-frequency pattern scheme, the more accurate wrapped phase φh and the coarse unit-frequency phase φu pair in Eq. (8) is rewritten in LUT form. The PLUT for φh is the same as Eq. (28), and the PLUT for φu is defined as
where V and U are derived from
V=I1c−I2c+I4c−I5c and U=2I0c+2I3c−I1c−I2c−I4c−I5c. (31)
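The φu PLUT referenced above—presumably Eq. (30)—plausibly reads:

PLUTu(U,V) = arctan(√3·V/U), with V and U per Eq. (31).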
Once φh and φu are obtained, phase unwrapping can also be achieved in real-time.
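A companion Python/NumPy sketch to the MLUT example above—again illustrative only, with atan2 standing in for the quadrant-corrected arctangent—pre-computes the N=6 PLUT and demodulates a six-image sequence into the (φh, φu) pair:

import numpy as np

IMAX = 255
U_OFF, V_OFF = 4 * IMAX, 2 * IMAX
u = np.arange(-4 * IMAX, 4 * IMAX + 1, dtype=np.float64)
v = np.arange(-2 * IMAX, 2 * IMAX + 1, dtype=np.float64)
U, V = np.meshgrid(u, v, indexing="ij")
# The same arctangent form serves both phases at N = 6; only indices differ.
PLUT = np.mod(np.arctan2(np.sqrt(3.0) * V, U), 2.0 * np.pi)

def phase_pair(I):
    # I: (6, H, W) stack of captured images; returns (phi_h, phi_u).
    I = I.astype(np.int64)
    Vh = I[1] + I[2] - I[4] - I[5]                      # Eq. (20)
    Uh = 2*I[0] - 2*I[3] + I[1] - I[2] - I[4] + I[5]    # Eq. (20)
    Vu = I[1] - I[2] + I[4] - I[5]                      # Eq. (31)
    Uu = 2*I[0] + 2*I[3] - I[1] - I[2] - I[4] - I[5]    # Eq. (31)
    return PLUT[Uh + U_OFF, Vh + V_OFF], PLUT[Uu + U_OFF, Vu + V_OFF]

The returned pair can then be combined per the unwrapping steps set forth earlier (unitPhase/highPhase/tempPhase/finalPhase).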
4.3. 3-D Point Cloud
Having obtained both the phase and modulation images by means of LUTs, we begin our derivation of a LUT for the purpose of implementing Eq. (7). After matrix inverse and matrix multiplication operations, Eq. (7) can be expanded into direct algebraic forms as
Xw=Mx(xc,yc)+Nx(xc,yc)T
Yw=My(xc,yc)+Ny(xc,yc)T
Zw=Mz(xc,yc)+Nz(xc,yc)T.
where T=(C(xc,yc)yp+1)−1 and Mx, My, Mz, Nx, Ny, and Nz are defined in the Appendix.
We then note that, based on the mapping of world coordinates to the camera plane [7] by Mc, if Zw is calculated according to
Zw=Mz(xc,yc)+Nz(xc,yc)T, (32)
then Xw and Yw can be computed as
Xw=Ex(xc,yc)Zw+Fx(xc,yc) and Yw=Ey(xc,yc)Zw+Fy(xc,yc), (33)
where
Implementing the 7 parameters Mz, Nz, C, Ex, Ey, Fx, and Fy by means of table look-up for indices (xc,yc) (camera column and row indices) reduces the total computational complexity associated with deriving the 3-D point cloud from the phase term to 7 look-ups, 4 additions, 3 multiplications, and 1 division, which is significantly less than what is involved in performing matrix inversion and matrix multiplication, as required by Eq. (7). It should be noted that the method presented in Eqs. (32) and (33) can be applied to all triangulation-based 3-D coordinate reconstruction techniques including stereo vision, 3-D from video, time-of-flight, and so on.
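By way of illustration, a minimal Python/NumPy sketch of Eqs. (32) and (33); the seven per-pixel parameter tables are assumed pre-computed from the calibrated Mc and Mp, and all names are illustrative:

import numpy as np

def point_cloud(yp, Mz, Nz, C, Ex, Ey, Fx, Fy):
    # yp: per-pixel unwrapped projector coordinate (H, W), from phase;
    # the seven (H, W) tables are the per-pixel look-ups of Eqs. (32)-(33).
    T = 1.0 / (C * yp + 1.0)   # T = (C*yp + 1)^-1
    Zw = Mz + Nz * T           # Eq. (32)
    Xw = Ex * Zw + Fx          # Eq. (33)
    Yw = Ey * Zw + Fy          # Eq. (33)
    return Xw, Yw, Zw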
FIGS. 21(A)-(F) are reproductions of the MLUT and PLUT results for the k=1, k=2, and k=3 terms, utilizing the dual-frequency sinusoid patterns set out in EXAMPLE 01. The MLUT values for k=1 and k=2 are an indicator of how reliable the PLUT values are; larger values of MLUT for k=1 and k=2 are preferable. Large values for MLUT on k=3 are undesirable because they are an indication of error or movement during scanning; it is preferable to obtain small values for MLUT on k=3. Where MLUT values for k=3 are relatively large, the pixels are discarded and no XYZ points are generated for them.
Structured light illumination is a process of 3-D imaging where a series of time-multiplexed, striped patterns are projected onto a target scene, with the corresponding captured images used to determine surface shape according to the warping of the projected patterns around the target. In a real-time system, a high-speed projector/camera pair is used such that any surface motion is small over the projected pattern sequence; but regardless of acquisition speed, there are always those pixels near the edge of a moving surface that capture the projected patterns on both foreground and background surfaces. These edge pixels then create unpredictable results that typically require expensive processing steps to remove; but in this paper, we introduce a novel filtering process that identifies motion artifacts based upon the discrete Fourier transform applied to the time axis of the captured pattern sequence.
Reference is made to
Herein, to carry out the unique features of the technique described in this EXAMPLE 02, the commonly used SLI technique of phase measuring profilometry (PMP) was employed. Like stereo-vision, PMP attempts to reconstruct a 3-D object surface by triangulating between the pixels of a digital camera, defined by coordinates (xc, yc), and those of a projector, defined by (xp, yp). And, assuming that the camera is epipolar-rectified to the projector along the x axis, one need only match (xc, yc) to a yp coordinate. As such, SLI encodes/modulates the yp coordinates of the projector over a sequence of N projected patterns such that a unique yp value can be obtained by demodulating the captured camera pixels.
In PMP, the component patterns are defined by the set, {Inp: n=0, 1, . . . , N−1}, according to:
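Given the stated 0-to-1 dynamic range, the pattern set plausibly reads:

Inp(xp, yp) = 1/2 + (1/2)cos(2πfyp − 2πn/N)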
where (xp, yp) is the column and row coordinate of a pixel in the projector, Inp is the intensity of that pixel in a projector with dynamic range from 0 to 1, and n represents the phase-shift index over the N total patterns.
For reconstruction, a camera captures each image where the sine wave pattern is distorted by the scanned surface topology, resulting in the patterned images expressed as:
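As in EXAMPLE 01, the captured images plausibly read:

Inc(xc, yc) = Ac(xc, yc) + Bc(xc, yc) cos(2πfyp − 2πn/N)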
where (xc, yc) are the coordinates of a pixel in the camera while Inc(xc, yc) is the intensity of that pixel. The term Ac is the averaged pixel intensity across the pattern set, which includes the ambient light component, and which can be derived according to:
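Plausibly, the usual average over the N captured images:

Ac = (1/N) Σn=0…N−1 Inc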
Correspondingly, the term Bc is the intensity modulation of a given pixel and is derived from Inc(xc, yc) in terms of real and imaginary components where:
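In complex form, consistent with the DFT relations given below, Bc plausibly reads:

Bc = (2/N) ‖Σn=0…N−1 Inc exp(−j2πn/N)‖ = (2/N) √{[Σn Inc sin(2πn/N)]² + [Σn Inc cos(2πn/N)]²}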
which is the amplitude of the observed sinusoid.
If Inc(xc, yc) is constant or less affected by the projected sinusoid patterns, Bc will be close to zero. Thus Bc is employed as a shadow noise detector/filter [6] such that the shadow-noised regions, with small Bc values, are discarded from further processing. Of the reliable pixels with sufficiently large Bc, θ represents the phase value of the captured sinusoid pattern, derived as:
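Consistent with the relation θ=∠X[1] given below, θ plausibly reads:

θ = arctan[(Σn Inc sin(2πn/N))/(Σn Inc cos(2πn/N))]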
which is used to derive the projector row according to θ=2πyp.
If we treat the samples of a subject camera pixel as a single period of a discrete-time signal, x[n] for n=0, 1, . . . , N−1, then we can define the Fourier terms, X[k] for k=0, 1, . . . , N−1, using the discrete Fourier transform. From X[k], we then note that Ac is related to the DC component according to:
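Since X[0] is the plain sum of the N samples, plausibly:

Ac = X[0]/N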
while Bc and θ are related to X[1]=X*[N−1] according to:
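Presumably numbered Eq. (9), to judge from Eq. (10) below, the amplitude relation plausibly reads:

Bc = (2/N)∥X[1]∥ (9)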
and
θ=∠X[1]=−∠X[N−1]. (10)
Because of these relationships, we refer to the frequency terms X[0], X[1], and X[N−1] as the principal frequency components, while the remaining terms are referred to as the non-principal terms and are the harmonics of X[1].
Under ideal conditions, the non-principal frequency terms, X[k] for k=2, 3, . . . , N−2, are always equal to zero, but in the presence of sensor noise, the terms are, themselves, drawn from an additive, white-noise process. At the same time, if a target surface is moving toward or away from the camera sensor at a slow rate, then the frequency of the sinusoid represented in the time sequence, x[n], is modulated to have a slightly higher or lower frequency and, as such, we expect the spectral components X[2] and X[N−2] to increase as the energy in X[1] and X[0] becomes less concentrated. Combining slow motion with additive noise, there exists some threshold, T, below which the magnitude of these non-principal frequency terms can be assumed to reside.
Given the lack of an information-bearing signal on the harmonic frequencies, one might be tempted to use these non-principal frequency terms to carry additional phase information for the purpose, for example, of phase unwrapping. But doing so involves the reduction in amplitude of the component sinusoids in order to fit the assorted signals within the dynamic range of both the camera and projector, thereby reducing the signal-to-noise ratio on these channels. Absent this use, if a pixel falls off the edge of a foreground object or if there is a dramatic change in surface texture, we expect to see a large magnitude term in the harmonics as the foreground phase term is essentially modulated by a step-edge, while the background phase term is modulated by the inverse of this same step.
Step edges and impulses, having large high-frequency spectral components, introduce high-frequency energy into the captured PMP patterns. As such, if a particular non-principal frequency term were to have a magnitude larger than T, then we can assume that this pixel contains a discontinuity in phase. Hence, we can delete this pixel from further processing. Incorporating the above ideas into the real-time SLI system of Liu et al., we employ the first harmonic Dc=(2/N)∥X[2]∥ such that only those pixels with sufficiently large Bc>TB and sufficiently small Dc<TD are further processed to obtain phase. While there may be a small performance hit from adding a second LUT operation, a portion of that loss is offset by a reduction in the number of pixels that are further processed to obtain θ.
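For illustration, a minimal Python/NumPy sketch of this DFT-based filter; the threshold names (t_b, t_d) and the conjugation—which accounts for NumPy's e^(−j2πkn/N) forward-FFT convention when recovering θ=∠X[1] as defined above—are illustrative assumptions:

import numpy as np

def demodulate_with_motion_filter(I, t_b, t_d):
    # I: (N, H, W) stack of captured PMP images, N >= 4.
    N = I.shape[0]
    X = np.fft.fft(I.astype(np.float64), axis=0)   # X[k] along the time axis
    A = np.real(X[0]) / N                # DC term: average intensity (texture)
    B = (2.0 / N) * np.abs(X[1])         # principal sinusoid amplitude, Bc
    D = (2.0 / N) * np.abs(X[2])         # first non-principal harmonic, Dc
    theta = np.angle(np.conj(X[1]))      # phase of the principal term
    valid = (B > t_b) & (D < t_d)        # shadow filter and motion/edge filter
    return np.where(valid, theta, np.nan), A, B

Pixels failing either threshold are masked out rather than propagated into triangulation, mirroring the deletion step described above.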
To test performance of the LUT-based processing, the Liu et al. experimental set-up (i.e., that used in connection with EXAMPLE 01, above) was modified using Microsoft® Visual Studio 2005 with managed C++. The imaging sensor used was an 8-bits-per-pixel, monochrome, Prosilica GC640M, gigabit Ethernet camera with a frame rate of 120 fps and 640×480 pixel resolution. The projector used was composed of a Texas Instruments Discovery 1100 board with ALP-1 controller and LED-OM with 225 ANSI lumens. The projector has a resolution of 1024×768 and 8-bits-per-pixel grayscale. The camera and projector were synchronized by an external triggering circuit. As for the processing unit, a Dell Optiplex 960 with an Intel Core 2 Quad Q9650 processor running at 3.0 GHz was employed.
Using a calibration target of a white box with black circles and the SLI system described above, an image (i.e., the snapshot shown in
While certain representative embodiments and details have been shown for the purpose of illustrating features of the invention, those skilled in the art will readily appreciate that various modifications, whether specifically or expressly identified herein, may be made to these representative embodiments without departing from the novel core teachings or scope of this technical disclosure. Accordingly, all such modifications are intended to be included within the scope of the claims. Although the commonly employed preamble phrase “comprising the steps of” may be used herein, or hereafter, in a method claim, the applicants do not intend to invoke 35 U.S.C. §112 ¶6 in a manner that unduly limits rights to its claimed invention. Furthermore, in any claim that is filed herewith or hereafter, any means-plus-function clauses used, or later found to be present, are intended to cover at least all structure(s) described herein as performing the recited function and not only structural equivalents but also equivalent structures.
This application claims benefit of U.S. Provisional Patent Application No. 61/371,626 filed 6 Aug. 2010 describing developments of the applicants hereof, on behalf of the assignee. The specification, drawings, and technical materials of U.S. Prov. Pat. App. No. 61/371,626 are hereby incorporated herein by reference, in their entirety, to the extent it is consistent with, and provides further edification of the advancements set forth in, the instant application.
The invention disclosed in this U.S. patent application was made, in part, with United States government support awarded by the following agencies: Department of Justice/National Institute of Justice, Grant No. 304688600. Accordingly, the U.S. Government has rights in this invention.