In general, the present invention relates to computer-implemented digital structured light illumination (SLI) systems and techniques for performing three-dimensional (“3-D”) image acquisition to digitize an artifact feature or contoured surface. More particularly, the invention is directed to a unique computer-implemented process, system, and computer-readable storage medium having stored thereon executable program code and instructions for 3-dimensional (3-D) image acquisition of a contoured surface-of-interest under observation by at least one camera and employing a preselected SLI pattern. The system includes a 3-D video sequence capture unit having (a) one or more image-capture devices for acquiring video image data as well as color texture data of a 3-D surface-of-interest, and (b) a projector device for illuminating the surface-of-interest with a preselected SLI pattern that begins in an initial Epipolar Alignment and ends in alignment with an Orthogonal (i.e., phase) direction, and for then shifting the preselected SLI pattern (‘translation’ step).
A follow-up ‘post-processing’ stage of the technique includes analysis and processing of the 3-D video sequence captured, including the steps of: identifying the Epipolar Alignment from the 3-D video sequence; tracking ‘snakes’/stripes from the initial Epipolar Alignment through alignment with the Orthogonal direction; tracking ‘snakes’/stripes through pattern shifting/translation; correcting for relative motion (object motion); determining phase and interpolating adjacent frames to achieve uniform phase shift; employing conventional PMP phase processing to obtain wrapped phase; unwrapping phase at each pixel using snake identity; and using conventional techniques to map phase to world coordinates. See the flow diagram in
Rotate and Hold and Scan (RAHAS) is a phrase coined by applicants for their unique SLI technique and associated system. In a preferred embodiment the technique employs mathematical ‘snake tracking’—likewise coined by applicants and further detailed in U.S. Pat. App. 12/284,253 entitled “Lock and Hold Structured Light Illumination” filed by two named applicants hereof on 18 Sep. 2008, and commonly owned by the assignee of the instant application. As explained in further detail elsewhere, ‘snake tracking’ is used in connection with the instant invention to identify the projected pattern stripes to remove ambiguities from a traditional Phase Measuring Profilometry (PMP) scan.
The acronym RAHAS, as coined by applicants and used herethroughout, recognizes three types of movement of a preselected SLI pattern across the object: Rotate, Hold, and Scan (e.g., see graphical representations labeled
The unique computer-implemented process, system, and computer-readable storage medium having stored thereon executable program code and instructions can be characterized as having two stages: an on-site 3-D video sequence capture—or data collection—stage (
SLI works by measuring the deformation of a light pattern that has been projected onto the surface contours of an object. The pattern is used to identify points in camera coordinates; a mathematical operation is then performed to find the position of each point in 3-D space. A variety of SLI pattern types are in use: one-dimensional lines, two-dimensional stripes, grids, and dot matrices. For example, the unique composite SLI patterns may be employed as disclosed in U.S. Pat. No. 7,844,079 B2 granted 30 Nov. 2010 to Hassebrook et al. and U.S. Pat. No. 7,440,590 B1 granted 21 Oct. 2008 to Hassebrook et al., both patents having at least one common inventor with the instant application and commonly-owned with the instant application upon filing, and both patents entitled “System and Technique for Retrieving Depth Information about a Surface by Projecting a Composite Image of Modulated Light Patterns.”
SLI measurement is based on the mathematical operation known as triangulation. Results are useful when there is a well defined relationship between a single point on the projection plane of interest and a corresponding point on a captured image. It is to establish this relationship that SLI projection patterns are utilized. An SLI projection pattern is preferably designed such that each pixel (or row or column, depending on the specific implementation) of the projection image is uniquely characterized, either by some characteristic intensity value sequence or by some other identifiable property such as a local pattern of shapes or colors. When projected onto a subject of interest, an image of the pattern illuminating a projection plane of interest captured by a camera is analyzed to locate these identifiable projection pattern points. Given a fixed location for both camera and projector, the location of any given pattern point on the subject creates a unique triangle with dimensions defined by the depth of the subject surface.
SLI systems are further classified into single-frame techniques, which require only one image capture to calculate surface characteristics, and multi-frame (or “time multiplexed”) techniques, which require multiple projection/capture instances in order to acquire enough information for surface reconstruction.
Hassebrook, et al. U.S. patent application 12/284,253 “Lock and Hold Structured Light Illumination” explains the various aspects of their technique, system and program code for 3-dimensional image acquisition of a surface-of-interest under observation by at least one camera using structured light illumination, as follows:
The in-process manuscript labeled herein as “ATTACHMENT A” titled Methodology and Technology for Rapid Three-Dimensional Scanning of In Situ Archaeological Materials in Remote Areas was authored by the applicants hereof and labeled “EXAMPLE A.” as an integral part of applicants' pending U.S. provisional Pat. App. No. 61/358,397—to which the instant application claims priority. Not only does their provisional app. EXAMPLE A. manuscript highlight implementation of RAHAS in the practice of archaeology, but it also highlights the rigorous analysis done by applicants in developing their innovative RAHAS approach; further evidencing the complex, multifaceted nature of problems encountered by those attempting to create solutions in the arena of 3-D image acquisition employing SLI.
Archaeology faces unique challenges among the historical sciences in that many of the accepted methodologies used are destructive to the archaeological resources. From excavation to surface collections, nearly all archaeological fieldwork impacts the resources. This also applies to the deterioration of artifacts during their study through handling by various researchers and shipment between research groups. The technique and system of the invention are adapted for use in the area of archaeology to produce 3-D digitized images of artifacts and other shapes and surfaces located outdoors, inside building structures, and immersed underwater, whether retained untouched, in their native state, or removed and stored or displayed elsewhere. Since the unique 3-D video sequence unit of the system has a small processing footprint, it is useful for 3-D image acquisition where system weight/portability is a concern, conditions are harsh, and/or electrical power resources are scarce or nonexistent. While the 3-D surface acquisition technology disclosed herein will help archaeologists—especially those working in remote locations—face these challenges, the technique and system of the invention have a multitude of uses and applications beyond archeology, such as to catalog and digitize 3-D surfaces or contours of all sorts whether above-ground, underground, underwater, in outer space, and whether composed of living matter, manmade structures, artwork, mammal anatomy (e.g., faces), and such.
I. Digital computers. A processor is the set of logic devices/circuitry that responds to and processes instructions to drive a computerized device. The central processing unit (CPU) is considered the computing part of a digital or other type of computerized system. Often referred to simply as a processor, a CPU is made up of the control unit, program sequencer, and an arithmetic logic unit (ALU)—a high-speed circuit that does calculating and comparing. Numbers are transferred from memory into the ALU for calculation, and the results are sent back into memory. Alphanumeric data is sent from memory into the ALU for comparing. The CPUs of a computer may be contained on a single ‘chip’, often referred to as a microprocessor because of its tiny physical size. As is well known, the basic elements of a simple computer include a CPU, clock, and main memory; whereas a complete computer system requires the addition of control units, input, output and storage devices, as well as an operating system. The tiny devices referred to as ‘microprocessors’ typically contain the processing components of a CPU as integrated circuitry, along with the associated bus interface. A microcontroller typically incorporates one or more microprocessors, memory, and I/O circuits as an integrated circuit (IC). Computer instruction(s) are used to trigger computations carried out by the CPU.
II. Computer Memory and Computer Readable Storage. While the word ‘memory’ has historically referred to that which is stored temporarily, with ‘storage’ traditionally used to refer to a semi-permanent or permanent holding place for digital data—such as that entered by a user for holding long term—more recently, the definitions of these terms have blurred. A non-exhaustive listing of well known computer readable storage device technologies compatible with a variety of computer processing structures is categorized here for reference: (1) magnetic tape technologies; (2) magnetic disk technologies, including floppy disks/diskettes and fixed hard disks (often in desktops, laptops, workstations, etc.); (3) solid-state disk (SSD) technology, including DRAM and ‘flash memory’; and (4) optical disk technology, including magneto-optical disks, PD, CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-R, DVD-RAM, WORM, OROM, holographic, solid state optical disk technology, and so on.
Briefly described, once again, the invention includes a unique computer-implemented process, system, and computer-readable storage medium having stored thereon executable program code and instructions for 3-dimensional (3-D) image acquisition of a contoured surface-of-interest under observation by at least one camera and employing a preselected SLI pattern. The system includes a 3-D video sequence capture unit having (a) one or more image-capture devices for acquiring video image data as well as color texture data of a 3-D surface-of-interest, and (b) a projector device for illuminating the surface-of-interest with a preselected SLI pattern that begins in an initial Epipolar Alignment and ends in alignment with an Orthogonal (i.e., phase) direction, and for then shifting the preselected SLI pattern (‘translation’ step). A follow-up ‘post-processing’ stage of the technique includes analysis and processing of the 3-D video sequence captured.
For purposes of illustrating the innovative nature plus the flexibility of design and versatility of the new system and associated technique, as customary, figures are included. One can readily appreciate the advantages as well as novel features that distinguish the instant invention from conventional computer-implemented 3D imaging techniques. The figures as well as any incorporated technical materials have been included to communicate the features of applicants' innovation by way of example, only, and are in no way intended to limit the disclosure hereof. Each item labeled an ATTACHMENT is hereby incorporated herein by reference for purposes of providing background technical information.
The flow diagram
An overview of the overall snake tracking process is presented in
A visual representation of the phase shift estimating process is shown in
To visualize triangulation, the simplistic pinhole model represented in
ATTACHMENT A, authored by applicants hereof, is an in-process manuscript titled Methodology and Technology for Rapid Three-Dimensional Scanning of In Situ Archaeological Materials in Remote Areas, which was labeled as “EXAMPLE A.” as an integral part of applicants' pending U.S. provisional Pat. App. No. 61/358,397. Just as it served as an integral part of applicants' pending provisional Pat. App. No. 61/358,397, ATTACHMENT A/EXAMPLE A. is incorporated by reference in its entirety, herein.
ATTACHMENT B is an article Kumar, B. V. K. Vijaya and Hassebrook, L., Performance measures for correlation filters, APPLIED OPTICS, Vol. 29, No. 20, pp. 2997-3006 (10 Jul. 1990); provided for its technical background and incorporated by reference, herein.
ATTACHMENT C is an article Li, Jielin, Hassebrook, Laurence G., and Guan, Chun, “Optimized two-frequency phase-measuring profilometry light-sensor temporal-noise sensitivity,” J. Opt. Soc. Am. A, Vol. 20, No. 1, pp. 106-115 (January 2003); provided for its technical background and incorporated by reference, herein.
By viewing the figures incorporated below, and associated representative embodiments, along with technical materials outlined and labeled ATTACHMENT A, ATTACHMENT B, and ATTACHMENT C, one can further appreciate the unique nature of core as well as additional and alternative features of the new system and associated technique for 3-D image acquisition disclosed herein. Back-and-forth reference and association will be made to various features identified in the figures.
Referring to
The units 100, 200 depicted in
As further highlighted graphically in sequence (follow arrows) in
In 3-D contour acquisition, it is important to be able to identify each pixel explicitly; this has led to employment of several different temporal and spatial techniques for pattern creation for point discrimination, such as Binary or Gray Scale encoding or PMP. Binary/Gray Scale encoding and PMP employ multiple pattern projections that are time multiplexed, meaning the patterns occur in succession, one after another. Projecting the patterns in succession increases the scan time for each new pattern that must be included. According to the invention, to save on scanning time and allow real-time processing, the time-sequenced pattern can be projected as a single pattern through a technique called composite pattern.
Referring, again, to
The video camera captures the sequence of images during the pattern motion outlined in
For example, the computer-controlled PMP capture process can be thought of as a discrete-time system of the projected patterns. The pattern that represents a specific phase shift is projected onto the object, the camera captures that image, and then the new pattern with incremented phase is projected. This is repeated until the pattern has been shifted a full period, and only those signals at distinct phase shifts are ever projected upon the object. In contrast, the RAHAS technique can be thought of as an analog-to-digital conversion. The process uses a slide projector, controlled with a hand-turned lever and crank, that continuously varies the phase of the pattern across the surface of the object. The camera captures the images of the pattern on the surface at discrete-time intervals related to the video rate of the camera.
As opposed to the computer-generated PMP's succession of precisely shifted images, the instant technique is adapted for manual control of the phase shift of the pattern. This manual control causes the translational motion of the projected pattern to be non-constant, which distorts the sinusoidally varying temporal response of a single pixel. The temporal response of each pixel is vital to the success of PMP; because this distortion is frequency- or time-dependent, it can obscure the phase information needed for accurate identification of pixels in triangulation. The distortion can be characterized by a temporal compression or rarefaction of the expected sinusoidal shape, which is a change in the frequency spectra of the signal.
The flow diagram
The computer-implemented process begins by loading and preparing a sequence of images that have captured the SLI pattern motion, described previously. The sequence of images is normalized, to balance the slight variations that arise when using an automatic camera, and filtered, in order to remove noise and smooth the images for ease of snake tracking. Two filters are used, a median filter and a moving average filter, both with rectangular kernels.
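By way of non-limiting illustration, a minimal sketch of this normalize-and-filter step follows, assuming grayscale frames stacked in a NumPy array; the kernel sizes and the function name are illustrative assumptions rather than values taken from this disclosure:

```python
# A minimal sketch of the normalize-and-filter preparation step.
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

def prepare_sequence(frames, median_size=(3, 3), mean_size=(3, 3)):
    """Normalize each frame to [0, 1] and smooth with rectangular kernels."""
    prepared = []
    for frame in frames.astype(np.float64):
        # Normalize to balance exposure variations between frames.
        lo, hi = frame.min(), frame.max()
        frame = (frame - lo) / (hi - lo + 1e-12)
        # Median filter removes impulsive noise; moving average smooths.
        frame = median_filter(frame, size=median_size)
        frame = uniform_filter(frame, size=mean_size)
        prepared.append(frame)
    return np.stack(prepared)
```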
After the images are filtered, a rotation is performed on each image in the sequence to orient their stripes to the phase direction, shown in
The angle of rotation for each image in the sequence is determined by the equation
where N denotes the number of frames in the rotation sequence, and n is the frame index. For relatively flat target surfaces, the angles could alternatively be determined using a 2-D FFT and measuring the angular displacement of the peaks corresponding to the sinusoidal pattern. This method works well for image frames in which the stripes are straight; but when surface changes distort the stripes to oblique angles, as occurs when projecting onto the calibration grids, this approach does not work reliably.
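A brief sketch of the alternative 2-D FFT angle estimate follows; it assumes a single dominant sinusoidal stripe pattern, and the function name is an illustrative assumption:

```python
# Estimate stripe orientation from the dominant 2-D FFT peak (a sketch).
import numpy as np

def estimate_stripe_angle(frame):
    """Return the angle of the spectral peak, i.e., the stripe normal."""
    F = np.fft.fftshift(np.fft.fft2(frame - frame.mean()))
    mag = np.abs(F)
    cy, cx = np.array(mag.shape) // 2
    mag[cy, cx] = 0.0                      # suppress any residual dc term
    py, px = np.unravel_index(np.argmax(mag), mag.shape)
    # The peak offset from center points along the phase direction.
    return np.arctan2(py - cy, px - cx)
```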
The definition of a snake specifies that snakes be placed along the stripes of the sinusoidal pattern that is used for the PMP scan. In order to “snake” an image, the peak positions of the stripes must be determined. The peak locations are determined by a peak-to-sidelobe-ratio (PSR) operation performed on the image frame Ai(x,y), implemented in the function anglePSR. This function generates a separate image Apsr(x,y) through the equation
where r is the expected sidelobe offset.
Next a function called snakeMaskPeakPSR generates the Snake Mask image, D, that has the position of the snakes in each image highlighted. The first step in the function snakeMaskPeakPSR is a thresholding operation on the PSR image, Apsr(x,y), with a user-defined minimum pixel value of A, minA, and a minimum PSR value, minpsr, such that
Next, a vertical search is performed on the resulting thresholded image, A′psr, to find regions of non-zero values, so that the maximum value in each region can be determined; the region S is defined by
Sε{s0, s0+1, . . . , s1−1, s1} where s1>s0
and
A′psr(x,s0−1)=0 and A′psr(x,s1+1)=0 and A′psr(x,S)>0 (3.4)
with s0 as the lower bound and s1 as the upper bound of the region S.
The binary encoding is defined by
D(x,y)=255
where
A′psr(x,y)=Max{v|v=A′psr(x,y+S) and Sε{s0, s0+1, . . . , s1−1, s1}} (3.5)
so that the resultant image is the binary Snake Mask image, D, that has zeros everywhere except where the PSR of the image frame A has local maxima. These local maxima correspond to the centers of the stripes in the sinusoidal pattern, and mark the locations of the snake pixels that will be named and tracked.
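A minimal sketch of the thresholding and vertical-region search of Equations 3.4 and 3.5 follows, assuming A and Apsr are 2-D arrays; for simplicity the sketch marks only the 255-valued peaks, omitting the 128-valued transition band of the trinary Snake Mask described in the glossary below:

```python
# Sketch of snakeMaskPeakPSR: threshold the PSR image, then mark the
# maximum of each vertical non-zero region (Equations 3.4-3.5).
import numpy as np

def snake_mask_peak_psr(A, Apsr, minA, minpsr):
    """Return Snake Mask D: 255 at vertical local maxima of the PSR image."""
    My, Nx = Apsr.shape
    # Thresholding: keep PSR values only where both minima are satisfied.
    Athr = np.where((A >= minA) & (Apsr >= minpsr), Apsr, 0.0)
    D = np.zeros((My, Nx), dtype=np.uint8)
    for x in range(Nx):
        col = Athr[:, x]
        y = 0
        while y < My:
            if col[y] > 0:                 # start of a non-zero region s0..s1
                s0 = y
                while y + 1 < My and col[y + 1] > 0:
                    y += 1
                s1 = y
                region = col[s0:s1 + 1]
                D[s0 + np.argmax(region), x] = 255   # peak of the region
            y += 1
    return D
```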
Snakes are initially identified in the first frame of the sequence with the function SnakeFind_init2Snake, which operates on the image D after the number of snakes and their spacing are determined. The findNumSnakes function generates the number and the spacing of the snakes by using a 2-D FFT on the first frame, the frame with undistorted stripes, to determine the frequency of the sine pattern.
The Snake Mask D is processed using SnakeFind_init2Snake to label the snakes in a narrow vertical band in the middle of the image. The function collapseSlice is used to average the location of the snakes in the D matrix to give a position for the start of the search algorithm.
With the width of the image, Nx, the midpoint is
The collapse equation can be described by the equation
where dm is the radius, in the x-direction, of the vertical band about the midpoint. The output rs is a 1-D vector that is searched by the function findCollapsePeaks to find where the summations from the collapse were the largest. The search process can be described by
where rpeaks is a 1-D vector of length My, the height of the image, with a value of 1 where the snakes are located. Since the indexes into rpeaks(y) correspond to y-coordinates in the center column of the image, the final step is to store all the indexes (or y-coordinates of the snakes) that have been flagged as a snake into the vector slocs. This allows the recovery of the snake y-coordinate values by iterating through the elements of the vector slocs, and serves as a starting point for the region searched in the next part of this algorithm. This region is defined by a fixed maximum distance in the vertical direction, deltaY=10 pixels, and the horizontal direction, deltaX=5 pixels, although the pixels are searched starting with the central point's column and only searching adjacent columns if no snake pixel is found. An overview of the process can be seen in
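A compact sketch of collapseSlice and findCollapsePeaks follows, assuming a Snake Mask D valued 255 at snake peaks; the parameter dm and the local-maximum peak criterion are illustrative assumptions:

```python
# Sketch of the collapse over a central vertical band and the peak search.
import numpy as np

def collapse_slice(D, dm):
    """Sum the mask over a band of radius dm about the middle column."""
    Nx = D.shape[1]
    mid = Nx // 2
    return D[:, mid - dm:mid + dm + 1].sum(axis=1).astype(np.float64)

def find_collapse_peaks(rs, min_sum):
    """Flag rows where the collapsed sums are locally largest."""
    rpeaks = np.zeros_like(rs)
    for y in range(1, len(rs) - 1):
        if rs[y] >= min_sum and rs[y] >= rs[y - 1] and rs[y] > rs[y + 1]:
            rpeaks[y] = 1
    slocs = np.flatnonzero(rpeaks)         # y-coordinates of snake seeds
    return rpeaks, slocs

# Usage: rs = collapse_slice(D, dm=5); rpeaks, slocs = find_collapse_peaks(rs, 1)
```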
The result of the SnakeFind_init2Snake function is the labeled identity of each snake, although only for a single pixel. The next step is to “grow” the snakes from these single pixels.
Now that there is a single pixel identified for each snake, these lone pixels can be “grown” horizontally to fill out the rest of the snake in Si. The snakes will be expanded by a single pixel at a time, until the end of the image is reached, an anomaly prevents snake detection, or a large shadow on the object prevents growth. This expansion process is performed by the functions SnakeFind_growLeft and SnakeFind_growRight, which will also be used to fill in holes in the snakes during the snake tracking. While we will discuss the algorithm specifically for growRight, the only difference between the two (Right or Left) is a change of orientation.
The equations describing the growRight process are as follows
where xmgrow and ymgrow are values that define the maximum distances to search in their respective directions.
The grow functions begin by searching through a snake row in the Si matrix looking for the “end” of a snake, which corresponds to the current pixel being “blank” (Si(x,msnake)=0) and the previous pixel being “active” (Si(x,msnake)>0). When the “end” of the snake is found, a search is performed for the “valid” (D(x,y)=255) snake pixel in the Snake Mask D with minimum distance to the location of the active pixel at the end of the labeled snake. This search of D starts at the current pixel's column, x, and the previous “active” pixel's y coordinate, determined from the Sy matrix y=Sy(x±1, msnake). The search progresses in single pixel increments, both up and down, until a valid pixel is found, or a y-search limit has been reached. If the y-search limit has been reached, then the search continues in the next column (in the direction of the grow) starting at the same y coordinate and repeating the vertical search. This is repeated until a valid pixel is found in D and stored into {Sp, Sy, Si}, or the x-search limit is reached, in which case the current “end” of the snake is left alone and a new “end” is sought to repeat the same process. Using the SnakeFind_grow[Left|Right] functions, whole snakes can be formed in the initial image, and the tracking of the snakes can begin.
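A minimal sketch of the growRight search loop just described follows; the (snake, x) indexing of the Snake Matrices follows the glossary below, and the handling of gaps between columns is simplified relative to the full description:

```python
# Sketch of SnakeFind_growRight: extend a snake rightward from each "end"
# by searching the Snake Mask D within the x- and y-search limits.
import numpy as np

def grow_right(D, Si, Sy, Sp, A, msnake, xmgrow, ymgrow):
    """Extend snake msnake rightward using valid (255) pixels in D."""
    My, Nx = D.shape
    for x in range(1, Nx):
        # "End" of a snake: current pixel blank, previous pixel active.
        if Si[msnake, x] == 0 and Si[msnake, x - 1] > 0:
            y0 = Sy[msnake, x - 1]
            found = False
            for dx in range(xmgrow):              # x-search limit
                xs = x + dx
                if xs >= Nx:
                    break
                for dy in range(ymgrow + 1):      # y-search limit, up and down
                    for ys in (y0 - dy, y0 + dy):
                        if 0 <= ys < My and D[ys, xs] == 255:
                            Si[msnake, xs] = Si[msnake, x - 1]
                            Sy[msnake, xs] = ys
                            Sp[msnake, xs] = A[ys, xs]
                            y0 = ys
                            found = True
                            break
                    if found:
                        break
                if found:
                    break
    return Si, Sy, Sp
```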
Due to the orientation of the snakes in the horizontal direction (from the “un-rotation” of the images) and the slow movement of the pattern relative to the image capture rate, a snake pixel is assumed to only shift a small amount in its column between image frames. These small variations allow a search in the snake pixel's column to find the location that the snake pixel has shifted, as long as the shift was within a maximum allowable distance. By finding the position of each pixel in the snake after it has shifted, the snakes can be tracked between two frames. If this is continued frame after frame, snake tracking in a whole sequence of images can be accomplished. An overview of the overall snake tracking process is presented in
The SnakeFind_multiPassTrackDeltaY function performs snake tracking after the first frame's snakes have been identified, and continues through the whole image sequence. The notation used for the sequence is as follows: n denotes the current frame, and N denotes the total number of frames. The snakes are tracked by comparing the marked pixels in the Snake Mask of the current frame (Dn) to the snake pixels in the Snake Matrices of the previous frame {SpB4, SyB4, SiB4}, and determining whether the snake pixel identified in the previous frame has moved within a maximum acceptable vertical distance (ymtrack) in its respective column. The equations describing this function are as follows
where ms is the snake that is currently being operated on, which also corresponds to the row of the Snake Matrices, and ymtrack is the maximum distance to search in the vertical direction.
The multiPass designation means that the process is performed twice, once starting with the first snake and proceeding top-down, and a second time starting with the last snake and proceeding bottom-up. Both of these snake matrices are generated and compared, and only the pixels that tracked the same in both passes are kept; that is, the output is the intersection of the “top-down” and “bottom-up” sets. The multiPass process can be characterized by
Sin=SinDown∩SinUp (3.11)
where SinDown and SinUp are both determined from Equation 3.10, but using opposite directions for processing the snakes. SinDown begins at the top and works down (from snake 1 to N), while SinUp begins at the bottom and works up (from snake N to 1).
The process begins at either the first snake or the last snake in the previous frame's snake matrices {SiB4, SyB4, SpB4}, depending on the pass direction that is being processed, with the other direction processed after, and the intersection of both passes as the final result. For each snake (ms) in the snake matrix SiB4, the snake pixels are looped through and processed if the pixel is active (SiB4(x, ms)>0). When an active snake pixel is found, the snake pixel's image y-coordinate, y=SyB4(x,ms), is used as a starting point for a search in the current Snake Mask Dn(x,y). The search progresses in single pixel increments, both up and down, until a valid pixel is found or the y-search limit (ymtrack) has been reached. If a valid pixel is found then it is stored in the current Snake Matrix set {Sin, Syn, Spn}, and the process proceeds to the next snake pixel. The process also proceeds to the next snake pixel if the search reaches the limit without finding a valid pixel in the Snake Mask. This continues through all the pixels in the current snake, then checks the subsequent snakes (either above or below depending on the pass direction). After all the snakes in a frame are processed, and the intersection of the multiPass has been determined as in Equation 3.11, the function SnakeFind_multiPassTrackDeltaY generates a set of Snake Matrices that correspond to the current frame {Sin, Syn, Spn}. This set of Snake Matrices is processed with SnakeFind_multiPassGrow[Left|Right], as discussed above, but with a multiPass addition, to attempt to fill in any holes that could not be tracked. Now the Snake Matrix set {Sin, Syn, Spn} is stored as the previous set {SiB4, SyB4, SpB4}, and the current frame is incremented—the process repeats to the end of the image sequence.
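A compact sketch of one tracking pass and the multiPass intersection of Equation 3.11 follows; the consumption of claimed mask pixels, which is what makes the pass direction matter, is an illustrative assumption:

```python
# Sketch of SnakeFind_multiPassTrackDeltaY: one pass per direction, keep
# only pixels that tracked identically in both (Equation 3.11).
import numpy as np

def track_pass(Dn, SiB4, SyB4, ymtrack, order):
    """Track each previous-frame snake into the current Snake Mask Dn."""
    Dn = Dn.copy()                               # claimed pixels are consumed
    Si = np.zeros_like(SiB4)
    Sy = np.zeros_like(SyB4)
    My, Nx = Dn.shape
    for ms in order:                             # top-down or bottom-up
        for x in range(Nx):
            if SiB4[ms, x] > 0:                  # active snake pixel
                y0 = SyB4[ms, x]
                for dy in range(ymtrack + 1):    # search up and down
                    hit = next((y for y in (y0 - dy, y0 + dy)
                                if 0 <= y < My and Dn[y, x] == 255), None)
                    if hit is not None:
                        Si[ms, x], Sy[ms, x] = SiB4[ms, x], hit
                        Dn[hit, x] = 0
                        break
    return Si, Sy

def multi_pass_track(Dn, SiB4, SyB4, ymtrack):
    """Keep only pixels tracked the same in both pass directions."""
    Ms = SiB4.shape[0]
    Sid, Syd = track_pass(Dn, SiB4, SyB4, ymtrack, range(Ms))
    Siu, Syu = track_pass(Dn, SiB4, SyB4, ymtrack, reversed(range(Ms)))
    keep = (Sid == Siu) & (Syd == Syu) & (Sid > 0)
    return np.where(keep, Sid, 0), np.where(keep, Syd, 0)
```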
The phase processing begins after the snakes have been tracked through both the rotation and translation of the sinusoidal pattern, and the snake positions in the image plane are known explicitly. The snake positions will be used in the unwrapping stage to remove the ambiguities of the high frequency PMP scan, and in the estimation of pattern motion used in the correction process. The motion of the sinusoidal pattern (translation in the phase direction) defines the PMP scan stage, but a problem arises due to the motion generated by a hand turned crank. The velocity of the pattern is not constant, so the frame captures of the video camera are not uniformly spaced in the phase of the sinusoidal pattern.
The phase processing begins with determination of the frames in the PMP sequence. The first frame of the PMP sequence is chosen manually by visual inspection of the video sequence for the frame in which the translation is first noticeable, but the actual number of frames must be determined using the snake positions of this first frame. Using the definition of the pattern as a sinusoid and the location of the snakes at the peaks of the sinusoidal pattern, the vertical distance between two adjacent snakes should be the phase equivalent of 2π, or a single period of a sine wave. Taking advantage of this relationship allows the snake locations to determine when the pattern has moved a distance that corresponds to a full period, which gives the final frame of the sequence.
In the first frame, the y-coordinate of a snake is defined by y0=Sy0(x,msnake), and the y-value that would correspond to a translation of 2π is the next snake's y-coordinate yN=Sy0(x,msnake+1). The x denotes the column, and must be the same. After the initial and final y-values are determined from the first frame (denoted by the 0 subscript), the y-value of the same snake in subsequent frames will be compared to yN, the y-value corresponding to a phase of 2π. The current frame's y-value is described by
yn=Syn(x,msnake) (3.12)
where n corresponds to the current frame index. The frame index starts at 1, the first frame after 0 phase, and continues until the boundary of 2π is reached as explained by
nε{1,2, . . . ,k} where Syk(x,ms)<yN<Syk+1(x,ms) (3.13)
where k is the last frame of the PMP sequence just before the y-value passes yN. Equations 3.12 and 3.13 define the search criteria for finding the end of the PMP sequence, while simultaneously constructing the per-frame phase shift estimate, φn.
The phase-shift estimate between frames is based upon the snake distance moved per frame as a percentage of the total distance between snake msnake and the adjacent snake msnake+1, that percentage corresponding to the percentage of total phase. φn is the actual phase shift per frame and will be used to interpolate the images In(x,y). A visual representation of this phase shift estimating process is shown in
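A minimal sketch of this per-frame phase-shift estimate follows; Sy_seq is assumed to be the list of tracked Sy matrices for the PMP frames, indexed (snake, x) per the glossary below:

```python
# Sketch of the phase-shift estimate: fraction of the snake-to-snake
# spacing traversed by frame n, scaled to 2*pi.
import numpy as np

def phase_shift_estimates(Sy_seq, msnake, x):
    """Return phi_n for each frame from one snake's tracked y-positions."""
    y0 = Sy_seq[0][msnake, x]          # position at zero phase (frame 0)
    yN = Sy_seq[0][msnake + 1, x]      # adjacent snake: one period, 2*pi
    phis = []
    for Syn in Sy_seq[1:]:
        yn = Syn[msnake, x]
        phis.append(2.0 * np.pi * (yn - y0) / (yN - y0))
    return np.asarray(phis)
```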
In order to use the well-known PMP equations discussed in Chapter 2, the image values, In(x,y), need to be corrected and interpolated to the values at the uniformly distributed phase values described by
where θn is the desired or corrected phase values per frame, n is the frame index and N is the total number of frames in the sequence. The interpolation process alters the pixel values in each image of the sequence using the following equations:
where m indexes the actual phase values bracketing, above and below, the desired phase index n. With the corrected images that are uniformly spaced in phase, the traditional PMP equations can be used.
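A sketch of this correction/interpolation step follows, assuming the actual per-frame phases (from the snake-based estimates above) increase monotonically and that the first frame sits at zero phase; the per-pixel linear weighting between the two bracketing frames is an illustrative reading of the interpolation described:

```python
# Sketch: resample the image sequence from actual phases phis to N
# uniformly distributed phases theta_n = 2*pi*n/N.
import numpy as np

def interpolate_uniform_phase(images, phis, N):
    """images: (frames, My, Nx); phis: actual phase of each frame."""
    thetas = 2.0 * np.pi * np.arange(N) / N
    stack = np.asarray(images, dtype=np.float64)
    corrected = np.empty((N,) + stack.shape[1:])
    for n, theta in enumerate(thetas):
        # m indexes the actual phases bracketing the desired phase theta.
        m = np.searchsorted(phis, theta)
        m0, m1 = max(m - 1, 0), min(m, len(phis) - 1)
        if m0 == m1:
            corrected[n] = stack[m0]
        else:
            w = (theta - phis[m0]) / (phis[m1] - phis[m0])
            corrected[n] = (1.0 - w) * stack[m0] + w * stack[m1]
    return corrected
```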
As presented in Chapter 2, the equation for the wrapped phase is
and the equation used for the quality image can be described as
The Q image acts as a measure of how good the scan is, based upon the peak-to-peak temporal variation of the patterns at each pixel. The Q image should be a gray-level image without the appearance of stripes or bands; for traditional PMP this is the case, but for RAHAS bands are present and have not yet been removed completely. This banding phenomenon will be discussed later.
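Because the Chapter 2 equations themselves appear only in the drawings, the following sketch assumes the standard PMP relations: the wrapped phase is the arctangent of the quadrature sums, and the quality Q is taken proportional to the recovered modulation amplitude at each pixel:

```python
# Sketch of the wrapped-phase and quality-image computations, assuming
# the standard PMP quadrature formulation.
import numpy as np

def pmp_wrapped_phase(corrected):
    """corrected: (N, My, Nx) images at uniform phase shifts 2*pi*n/N."""
    N = corrected.shape[0]
    n = np.arange(N).reshape(-1, 1, 1)
    s = np.sum(corrected * np.sin(2.0 * np.pi * n / N), axis=0)
    c = np.sum(corrected * np.cos(2.0 * np.pi * n / N), axis=0)
    phi_w = np.arctan2(s, c) % (2.0 * np.pi)   # wrapped phase in [0, 2*pi)
    Q = (2.0 / N) * np.sqrt(s**2 + c**2)       # modulation-strength quality
    return phi_w, Q
```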
Now that the wrapped phase has been generated it must be unwrapped in order to combine the repeated phase bands into a whole phase image. As discussed earlier, the snakes are used to determine the boundaries of the wrapped phase image, but the first step is to linearly generate the correct phase at each snake boundary by the equation
Equation 3.20 assumes a phase of zero at snake k=1, and a phase of 2π at the last snake Ms, which is the total number of snakes that were in the slide pattern. The phase will be used to map positions in the camera space to the projector space for triangulation, and we assume 0 phase at the top of the slide pattern and 2π at the bottom of the slide pattern.
With the phase of each snake known in the projector space, the snakes are used as boundaries to unwrap the bands of the wrapped phase image. The unwrapping of each band is performed pixel-by-pixel and band-by-band by the equation
where k represents the current boundary, θk is the phase at the boundary, as determined from Equation 3.20, and the pixels within 4 pixels of the boundary are zeroed out to prevent discontinuous phase at the boundaries. For each band, the wrapped phase, φW, is simply added to the constant phase.
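A sketch of the band-by-band unwrap just described follows, assuming integer snake positions Sy indexed (snake, x) and per-snake boundary phases θk generated linearly per Equation 3.20; the handling of the 4-pixel guard band is an illustrative reading of the zeroing described above:

```python
# Sketch of Equation 3.21: within each band between snake boundaries,
# add the boundary's constant phase theta_k to the wrapped phase.
import numpy as np

def unwrap_with_snakes(phi_w, Sy, thetas, guard=4):
    """phi_w: wrapped phase; Sy: (Ms, Nx) snake y-positions; thetas: per-snake phase."""
    My, Nx = phi_w.shape
    phi_uw = np.zeros_like(phi_w)
    Ms = Sy.shape[0]
    for x in range(Nx):
        for k in range(Ms - 1):
            y0, y1 = Sy[k, x], Sy[k + 1, x]
            if y1 <= y0:
                continue                 # snake missing in this column
            # Pixels within `guard` of either boundary stay zeroed.
            band = slice(y0 + guard, max(y1 - guard, y0 + guard))
            phi_uw[band, x] = thetas[k] + phi_w[band, x]
    return phi_uw

# Per Equation 3.20: thetas = 2*np.pi*np.arange(Ms)/(Ms - 1), i.e., zero at
# the first snake and 2*pi at the last snake Ms.
```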
Since the unwrapped phase φuw is linearly distributed in the phase direction of the projector coordinates, Equation 2.4 shows the relationship between the φuw(xc,yc) phase image and a corresponding column on the projector image plane, yp. The x-coordinate of a specific pixel (xp,yp) along the column of the projector image plane equals the xc coordinate due to the symmetry of the epipolar alignment of the projector and camera. Using Equation 2.17, the vector (xc,yc,yp) generates a corresponding 3-D world coordinate point (Xw,Yw,Zw).
Lock—Aligning an SLI pattern projected on a contoured surface-of-interest to set an Epipolar Alignment, so as to initially assign an epipolar identity to each snake.
Hold—The unique process of capturing 3-D motion using the continuously projected SLI pattern set during ‘Lock’, usually consisting of bands of light (with a sinusoidal cross section), to track the whereabouts of illuminated stripes referred to as ‘snakes’.
A film frame, or just frame, is one of the many single consecutive images in a motion picture or video. Individual frames may be separated by frame lines. Typically, 24 frames are needed for one second of a motion picture (“movie”) film.
Frame rate, or frame frequency, is the measurement of the frequency (rate) at which an imaging device produces unique consecutive images (“frames”). The term is used when referring to computer graphics, digital or analog video cameras, film cameras, motion capture systems, and so on. Frame rate is often expressed in frames per second (fps).
Active—A pixel is called an “active” snake pixel when it gets included into the Snake Matrix Set and has a non-zero value in the Si matrix. “Active” implies that the pixel is also “valid”.
Blank—Calling a pixel “blank” refers to its zero valued entry into the Si matrix.
Snake—A snake is a stripe of single pixel width that is located along the stripes of the sinusoidally varying, or other SLI, pattern. For RAHAS processing, the snake identities are determined by SnakeFind functions and are stored in a Snake Matrix Set, which consists of three matrices {Si,Sy,Sp}.
Snake Matrix Set {Si, Sy, Sp}—A collection of three separate 2-D matrices that are used to identify and label snakes in a given image frame. The matrices are all the same size: the width is the width of the image, and the number of rows in the matrix corresponds to the number of snakes. Each entry in these matrices corresponds to a pixel on a snake, while a given row is a single snake. To index into a matrix, the x coordinate in the image is used along with the snake identity. For example, Sy(3,512) would return the y coordinate of snake number 3 at the 512th column in the image.
Snake Identity Matrix, Si—The Si matrix is part of the Snake Matrix Set and is the indicator of whether a snake is valid or not. If a value of 0 is stored in the Si matrix, then the snake is not valid at that pixel. A value greater than 0 represents the identity given to the specific snake, and signifies an identified snake at the corresponding pixel. It is common to have snakes that do not span the whole width of the matrix, leaving gaps in the Si matrix to signify shadows on the object, undetected snake pixels, or that the stripe did not span the width of the image.
Snake Y Matrix, Sy—The Sy matrix is part of the Snake Matrix Set and holds the y-coordinate for the position of the corresponding snake pixel in the image. Since the snakes are oriented in the horizontal direction, the x values are used for indexing into the matrices.
Snake Pixel Matrix, Sp—The Sp matrix is part of the Snake Matrix Set and holds the value of the pixel in the image that corresponds to that snake pixel.
Snake Mask, D—A trinary-valued matrix of the same size as an image in the capture sequence. D has the value 255 where the peaks of the snakes are located, a value of 128 in the transition between the peak locations and the rest of the image, which is set to zero. The D image represents the pixels in the target image frame, A, that were identified as having the peak characteristics of a snake.
Target Image, A—The target image A is the frame in the image sequence that is being processed. It is occasionally denoted in the drawings as A.
Valid—A pixel is called a “valid” snake pixel when it has a value of 255 in the Snake Mask D matrix. While a valid pixel will usually be active in the Si matrix, it could have been missed in the identification process and still be un-labeled.
The capture unit 100, 200 of the system need not rest on a tripod; it may instead be floating or moving, as is the case when operating at a steady motion to scan long objects-of-interest. An operator of the system may use a mechanical switch to initiate the slide rotation process, which could be driven by a motor, while the unit is relatively stationary. Once the SLI pattern has been rotated a preselected amount, from its initial epipolar 90-degrees into the orthogonal direction, the SLI pattern is held stationary while the scanner is moved linearly across the target surface-of-interest. Preferably, in this case, minimal movement is experienced by the unit during the SLI pattern rotation step, followed by approximately linear motion during the “PMP” scan step.
There are three new functions that are performed, in the case of moving the scanner linearly along the target surface-of-interest, to process the data:
1. Detect and Correct any object motion.
2. Detect Pattern Incoherent Shifting after the object motion correction.
3. Correct Pattern Incoherence by interpolating pattern shifts so that they are uniformly distributed and a conventional processing step can be used to evaluate the phase. Referring to
cobject(x,y)=F−1{exp(j arg(Gn(fx,fy)))exp(−j arg(Gn+1(fx,fy)))}
Once the spatial shift between image frames is detected via cobject(x,y), it is corrected by translating one of the image frames. After all the correction translations are implemented, the images are then correlated as FP filters set to “matched filter” mode with “dc” suppressed such that
cpattern(x,y)=F−1{Gn(fx,fy)G*n+1(fx,fy)HHP(fx,fy)} (3)
where HHP is a high pass filter used to suppress the “dc” and very low frequencies. The peak locations of the pattern correlation are used to determine the interpolation weights that generate the uniformly shifted (i.e., coherent) patterns needed for the PMP calculations. The flowchart in
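A sketch of the two correlations follows: a phase-only correlation implementing the cobject(x,y) expression above, and a dc-suppressed matched-filter correlation implementing Equation (3); the Gaussian form of the high-pass filter HHP is an illustrative assumption:

```python
# Sketch of the object-shift and pattern correlations of Equations above.
import numpy as np

def object_shift(In0, In1):
    """Phase-only correlation between consecutive frames; returns (dy, dx)."""
    G0 = np.fft.fft2(In0)
    G1 = np.fft.fft2(In1)
    c = np.fft.ifft2(np.exp(1j * np.angle(G0)) * np.exp(-1j * np.angle(G1)))
    peak = np.unravel_index(np.argmax(np.abs(c)), c.shape)
    shift = np.array(peak, dtype=float)
    # Wrap shifts past the midpoint to negative displacements.
    shift -= np.array(c.shape) * (shift > np.array(c.shape) / 2)
    return shift

def pattern_correlation(In0, In1, sigma=2.0):
    """dc-suppressed matched-filter correlation, per Equation (3)."""
    My, Nx = In0.shape
    fy = np.fft.fftfreq(My).reshape(-1, 1)
    fx = np.fft.fftfreq(Nx).reshape(1, -1)
    # Illustrative Gaussian high-pass HHP suppressing dc and low frequencies.
    HHP = 1.0 - np.exp(-(fx**2 + fy**2) / (2.0 * (sigma / max(My, Nx))**2))
    c = np.fft.ifft2(np.fft.fft2(In0) * np.conj(np.fft.fft2(In1)) * HHP)
    return np.real(c)
```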
The technique uses the rotation of the pattern to “hold” onto the stripes in order to remove ambiguities of a high frequency PMP scan. The geometry of the novel technique disclosed herein opens up the possibility of rotation based encoding methods for projector-camera space mapping.
The image of a rotating sinusoidal pattern in projector space can be described by the equation
where θp is the angle of clockwise rotation, kc is the frequency of the sinusoidal pattern, (x0p, y0p) is the center of rotation, and (Nx, My) are the dimensions of the image.
In PMP, the linear translation of a sinusoidal pattern generates a sinusoidal signal at each pixel. Similarly, the rotation of a pattern will generate chirp shaped signals at each pixel whose characteristics are determined by the angle θp and radius rp of the pixel from the center of pattern rotation.
To further illustrate the characteristic patterns generated at each pixel, Equation 2.2 for a PMP image in Cartesian coordinates (xp, yp) can be converted to polar coordinates (rp, θp) as in the equation
where rp denotes the radius in the projector space, and θp is the rotation angle of the sinusoidal pattern. Using Equation B.2, an image In(rp, θp) can be generated to display the relationship between pixel radius and angle, shown graphically in
Phase Measuring Profilometry (PMP) is a known SLI technique that measures depth information from a surface using a sequence of phase-shifted, sinusoidally varying patterns. Much as Binary Encoding uses a code sequence to identify pixels, a PMP pattern sequence can be thought of as encoding rows in the camera image with values that correspond to the phase shift of a sinusoid. The sequence of projected patterns generates a temporal signal at each pixel, such that the signal is a sinusoid, and the phase of the sinusoid is directly related to the position of the pixel along the Phase Direction. The pattern from the perspective of the projector can be described by the following equation
where Ap and Bp are constants. The p superscripts denote projector coordinates. Here, f is the frequency of the sine pattern measured in cycles per image-frame, N is the total number of phase shifts for the whole sequence, and n is the current phase shift index, or current frame, in the time sequence. Since the equation depends only on yp, the intensity value of a given pixel, In(xp, yp), varies only in the yp direction. This direction is called the Phase Direction of the PMP pattern because it is the direction of the phase shift. The term Orthogonal Direction is appropriately named for its relationship with the Phase Direction—it lies 90 degrees from the Phase Direction, along the constant xp values of the pattern.
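Because the pattern equation itself appears only in the drawings, the following sketch assumes the standard PMP form In(xp,yp)=Ap+Bp cos(2πf yp−2πn/N), consistent with the description that intensity varies only along the phase direction:

```python
# Sketch: generate N phase-shifted sinusoidal PMP patterns of size (My, Nx),
# varying only along the phase direction y.
import numpy as np

def pmp_patterns(Nx, My, f, N, Ap=127.5, Bp=127.5):
    """Return the projected pattern sequence, shape (N, My, Nx)."""
    y = np.arange(My).reshape(-1, 1) / My      # 0..1 along the phase direction
    patterns = []
    for n in range(N):
        row = Ap + Bp * np.cos(2.0 * np.pi * f * y - 2.0 * np.pi * n / N)
        patterns.append(np.repeat(row, Nx, axis=1))
    return np.stack(patterns).astype(np.uint8)
```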
In order to triangulate to a specific point, the point's position in both camera and projector space must be known. While the camera position is known explicitly, due to the captured image being in camera space, the projector points are determined by matching a phase value measured at a point to that same phase value in the sinusoidally varying pattern as described in Equation 2.2. The phase value at each pixel in the camera space, φ(xc, yc), is determined by projecting N phase-shifted patterns at the target and processing a sequence of N images by the equation
where n denotes an index into the image sequence and In (xc, yc) is the pixel intensity value at the position (xc, yc) in the nth image in the sequence.
For a pattern frequency of 1, which we will call the base frequency or a pattern of a single period of sinusoidal variation, the phase, φ, is easily mapped to a projector frame percentage along the Phase Direction by the equation
Notice that yp is not actually a coordinate in the projector space, but it is a value ranging from 0 to 1 that denotes a percentage of the distance between the bottom of the projector frame and the top.
An ambiguous phase problem occurs when increasing the frequency beyond 1, even though better depth resolution can be achieved through the use of higher frequency patterns. The problem is due to the periodic nature of sinusoids, and can be explained by examining the phase variation in φ(xc, yc) and comparing it to Equation 2.4. The variation in φ(xc, yc) is always from 0 to 2π, but at f>1 Equation 2.4 will only vary between 0 and 1/f. So at higher frequencies, this creates what is called a repeated or Wrapped Phase image that requires unwrapping to fully acquire the unambiguous phase value for the correct yp coordinate.
To take advantage of the benefits of higher frequency patterns on depth measurement, a technique called multi-frequency PMP was developed that uses lower frequency PMP scans to “unwrap” the phase for the higher frequencies, allowing better accuracy with fewer frames.
Triangulation is based upon known geometric relationships to determine the points in a spatial coordinate system; therefore, it is important to the performance of any of these systems that the required geometric parameters be computed in a calibration process. Below, both the calibration procedure and the reconstruction method used are presented. To visualize triangulation, the simplistic pinhole model represented in
where the center of projection is located on the origin of the world coordinate system for the ideal model and λ is a non-zero scalar.
The camera coordinate system and world coordinate system are related by a rotation matrix Rεℝ3×3 and a translation vector Tεℝ3, such that
pc=Rpw+T (2.6)
Equation 2.6 can be rewritten and combined with Equation 2.5 as
Together (R,T) make up the extrinsic parameters of the camera by describing its orientation and location with respect to the world coordinate system. While the extrinsic parameters map the point pw to pc, the intrinsic parameters map the point pc to the 2-D pixel image plane. To compensate for intrinsic parameters such as scaling differences between the image plane and the pixel coordinates, variations in focal length, or a skewed image plane from its actual orientation, a matrix Kεℝ3×3 is introduced to the transformation such that
which can be further simplified to
It can be shown that the world-to-pixel coordinate transforms for a pinhole model follow directly from Equation 2.9; these are
where n=1, 2, . . . , N indexes points on a calibration target. These equations assume that the m12 term is 1, so that the transformation is linear at the world origin. The calibration procedure generates an image of a calibration target with N points on the surface that have known values for (Xnw, Ynw, Znw, xnc, ync) for each point on the target. Using this calibration procedure all the terms of the matrix M can be determined. It follows that both the matrix Mc for the camera coordinate system and the matrix Mp that represents the projector coordinate transformations are determined using the same method and can be found simultaneously.
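A sketch of the calibration solve just described follows: with N known world points and their pixel coordinates, the entries of M follow from a linear least-squares system (a standard direct-linear-transformation formulation; the text fixes one term of M to 1 to remove the scale ambiguity, and this sketch uses the common choice of fixing the last entry, m34):

```python
# Sketch: solve the 3x4 projection matrix M by linear least squares from
# N known (world point, pixel) correspondences on a calibration target.
import numpy as np

def calibrate_dlt(world_pts, pixel_pts):
    """world_pts: (N, 3) of (Xw, Yw, Zw); pixel_pts: (N, 2) of (xc, yc)."""
    rows, rhs = [], []
    for (X, Y, Z), (x, y) in zip(world_pts, pixel_pts):
        # With m34 fixed to 1, each point yields two linear equations.
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z])
        rhs.append(x)
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z])
        rhs.append(y)
    m, *_ = np.linalg.lstsq(np.asarray(rows, float),
                            np.asarray(rhs, float), rcond=None)
    return np.append(m, 1.0).reshape(3, 4)
```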
A benefit of performing PMP in only the yp direction is that the equations become simpler, and only three parameters (xc,yc,yp) are necessary for triangulating the corresponding world coordinate for each pixel. Of the required parameters, the camera coordinates (xc,yc) are known explicitly, while the yp value is determined by the specific method for Structured Light Illumination. Herein, the yp values are obtained from the unwrapped phase and Equation 2.3. To obtain the world coordinate (Xw,Yw,Zw), it has been shown [36] that manipulation of Equations 2.11 and 2.12 leads to
and let
Now back substituting C and D into Equation 2.13, the equation becomes
and the world coordinate 3-D point can be obtained with
While certain representative embodiments and details have been shown for the purpose of illustrating features of the invention, those skilled in the art will readily appreciate that various modifications, whether specifically or expressly identified herein, may be made to these representative embodiments without departing from the novel core teachings or scope of this technical disclosure. Accordingly, all such modifications are intended to be included within the scope of the claims. Although the commonly employed preamble phrase “comprising the steps of” may be used herein, or hereafter, in a method claim, the applicants do not intend to invoke 35 U.S.C. §112 ¶6 in a manner that unduly limits rights to its claimed invention. Furthermore, in any claim that is filed herewith or hereafter, any means-plus-function clauses used, or later found to be present, are intended to cover at least all structure(s) described herein as performing the recited function and not only structural equivalents but also equivalent structures.
This application claims benefit of pending U.S. Provisional Patent Application No. 61/358,397 filed 24 Jun. 2010 describing developments of the applicants hereof, on behalf of the assignee. The specification, drawings, and technical EXAMPLE materials of U.S. Prov. Pat. App. No. 61/358,397 are hereby incorporated herein by reference, in their entirety, to the extent each provides further edification of the advancements set forth herein.