This patent document relates generally to the field of three-dimensional shape capture of the surface geometry of an object, and more particularly to structured lighting three-dimensional shape capture.
Three-dimensional scanning and digitization of the surface geometry of objects is commonly used in many industries and services, and their applications are numerous. A few examples of such applications are inspection and measurement of shape conformity in industrial production systems, digitization of clay models for industrial design and styling applications, reverse engineering of existing parts with complex geometry for three-dimensional printing, interactive visualization of objects in multimedia applications, three-dimensional documentation of artwork, historic and archaeological artifacts, human body scanning for better orthotics adaptation, biometry or custom-fit clothing, and three-dimensional forensic reconstruction of crime scenes.
One technology for three-dimensional shape capture is based on structured lighting. Three-dimensional shape capture systems based on structured lighting are generally more accurate than those based on time-of-flight (TOF) image sensors. In a standard structured lighting 3D shape capture system, a pattern projector is used to illuminate the scene of interest with a sequence of known two-dimensional patterns, and a camera is used to capture a sequence of images, synchronized with the projected patterns. The camera captures one image for each projected pattern. Each sequence of images captured by the camera is decoded by a computer processor into a dense set of projector-camera pixel correspondences, and subsequently into a three-dimensional range image, using the principles of optical triangulation.
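As an illustration of the triangulation step only (not the claimed method), the following Python sketch recovers a single 3D point from one decoded projector-camera correspondence, assuming both devices are modeled as calibrated ray sources; the function name and interface are assumptions.

```python
import numpy as np

def triangulate_midpoint(cam_center, cam_dir, proj_center, proj_dir):
    """Closest-point triangulation of one camera/projector ray pair.

    Returns the midpoint of the shortest segment between the two rays,
    a common approximation of the observed 3D surface point.
    """
    # Solve for scalars s, t minimizing
    # |(cam_center + s*cam_dir) - (proj_center + t*proj_dir)|^2
    # via the 2x2 normal equations.
    d1, d2 = cam_dir, proj_dir
    r = proj_center - cam_center
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([d1 @ r, d2 @ r])
    s, t = np.linalg.solve(A, b)
    p1 = cam_center + s * d1
    p2 = proj_center + t * d2
    return 0.5 * (p1 + p2)
```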
The main limitation of three-dimensional shape capture systems is the required synchronization between projector and camera. To capture a three-dimensional snapshot of a moving scene, the sequence of patterns must be projected at a fast rate, the camera must capture image frames exactly at the same frame rate, and the camera has to start capturing the first frame of the sequence exactly when the projector starts to project the first pattern.
Therefore, there is a need for three-dimensional shape measurement methods and systems based on structured lighting where the camera and the pattern projector are not synchronized.
Further complicating matters, image sensors generally use one of two different technologies to capture an image, referred to as “rolling shutter” and “global shutter”. “Rolling shutter” is a method of image capture in which a still picture or each frame of a video is captured not by taking a snapshot of the entire scene at single instant in time but rather by scanning across the scene rapidly, either vertically or horizontally. In other words, not all parts of the image of the scene are recorded at exactly the same instant. This is in contrast with “global shutter” in which the entire frame is captured at the same instant. Even though most image sensors in consumer devices are rolling shutter sensors, many image sensors used in industrial applications are global shutter sensors.
Therefore, there is a need for three-dimensional shape measurement methods and systems based on structured lighting where the camera and the pattern projector are not synchronized, supporting both global shutter and rolling shutter image sensors.
A system and method to capture the surface geometry of a three-dimensional object in a scene using unsynchronized structured lighting solves the problems of the prior art. The method and system include a pattern projector configured and arranged to project a sequence of image patterns onto the scene at a pattern frame rate, a camera configured and arranged to capture a sequence of unsynchronized images of the scene at an image capture rate, and a processor configured and arranged to synthesize a sequence of synchronized image frames from the unsynchronized images of the scene, each of the synchronized image frames corresponding to one image pattern of the sequence of image patterns. Because the method enables use of an unsynchronized pattern projector and camera, significant cost savings can be achieved. The method enables use of inexpensive cameras, such as smartphone cameras, webcams, point-and-shoot digital cameras, and camcorders, as well as industrial cameras. Furthermore, the method and system enable processing the images with a variety of computing hardware, such as computers, digital signal processors, smartphone processors, and the like. Consequently, three-dimensional image capture using structured lighting may be used with relatively little capital investment.
These and other features, aspects, and advantages of the method and system will become better understood with reference to the following description, appended claims, and accompanying drawings where:
A system and method to capture the surface geometry of a three-dimensional object in a scene using unsynchronized structured lighting is shown generally in the accompanying figures.
One object of the present invention is a system to synthesize a synchronized sequence of image frames from an unsynchronized sequence of image frames, illustrated in the accompanying figures.
Another object of the invention is an unsynchronized three-dimensional shape capture system, comprising the system to synthesize a synchronized sequence of image frames from an unsynchronized sequence of image frames described above, and further comprising prior art methods for decoding, three-dimensional triangulation, and optionally geometric processing, executed by the computer processor.
Another object of the invention is a three-dimensional snapshot camera comprising the unsynchronized three-dimensional shape capture system, where the projector has the means to select the pattern rate from a plurality of supported pattern rates, the camera has the means to select the frame rate from a plurality of supported frame rates, and the camera is capable of capturing the unsynchronized image frames in burst mode at a fast frame rate. In a preferred embodiment the projector has a knob to select the pattern rate. In another preferred embodiment the pattern rate is set by a pattern rate code sent to the projector through a communications link. Furthermore, the system has means to set the pattern rate and the frame rate so that the frame rate is not slower than the pattern rate. In a more preferred embodiment the user sets the pattern rate and the frame rate.
In a more preferred embodiment of the snapshot camera, the camera has the means to receive a camera trigger signal, and the means to set the number of burst mode frames. In an even more preferred embodiment, the camera trigger signal is generated by a camera trigger push-button. When the camera receives the trigger signal it starts capturing the unsynchronized image frames at the set frame rate, and it stops capturing unsynchronized image frames after capturing the set number of burst mode frames.
In a first preferred embodiment of the snapshot camera with camera trigger signal, the projector continuously projects the sequence of patterns in a cyclic fashion. In a more preferred embodiment the system has the means of detecting when the first pattern is about to be projected, and the camera trigger signal is delayed until that moment.
In a second preferred embodiment of the snapshot camera with camera trigger signal, the projector has the means to receive a projector trigger signal. In a more preferred embodiment the camera generates the projector trigger signal after receiving the camera trigger signal, and the camera has the means to send the projector trigger signal to the projector. In an even more preferred embodiment the camera has a flash trigger output, and it sends the projector trigger signal to the projector through the flash trigger output. When the projector receives the trigger signal it starts projecting the sequence of patterns at the set pattern rate, and it stops projecting patterns after it projects the last pattern.
Another object of this invention is a method to synthesize a synchronized sequence of image frames from an unsynchronized sequence of image frames, generating a number of frames in the synchronized sequence of image frames equal to the number of projected patterns, and representing estimates of what the camera would have captured if it were synchronized with the projector.
As will be described in greater detail below in the associated proofs, the method to synthesize the synchronized sequence of image frames from the unsynchronized sequence of image frames is shown generally in the accompanying figures.
In a preferred embodiment, the method to synthesize the synchronized sequence of image frames from an unsynchronized sequence of image frames applies to a global shutter image sensor where the image frame rate is identical to the pattern frame rate. In this case, image pixel values are given by
$$I_n(x,y) = (1 - t_0)\,P_n(x,y) + t_0\,P_{n+1}(x,y)$$
where $P_n(x, y)$ and $P_{n+1}(x, y)$ represent the pattern values to be estimated that contribute to the image pixel $(x, y)$, and $P_{N+1} = P_1$. The projected patterns are known in advance, but since it is not known which projector pixel illuminates each image pixel, they have to be treated as unknowns. To estimate the value of $t_0$, the following expression is minimized

$$E(t_0) = \tfrac{1}{2} \sum_{(x,y)} \sum_{n=1}^{N} \big( (1 - t_0)\,P_n(x,y) + t_0\,P_{n+1}(x,y) - I_n(x,y) \big)^2$$
with respect to $t_0$, where the sum is over a subset of pixels $(x, y)$ for which the corresponding pattern pixel values $P_n(x, y)$ and $P_{n+1}(x, y)$ are known. Differentiating $E(t_0)$ with respect to $t_0$ and equating the result to zero, an expression to estimate $t_0$ is obtained:

$$t_0 = \frac{\sum \big(P_{n+1}(x,y) - P_n(x,y)\big)\,\big(I_n(x,y) - P_n(x,y)\big)}{\sum \big(P_{n+1}(x,y) - P_n(x,y)\big)^2}$$
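For illustration, this closed-form estimator can be written as a short NumPy sketch. It assumes the pixels with known pattern values have been gathered into flat arrays; the function name and array interface are assumptions, not part of the disclosure.

```python
import numpy as np

def estimate_t0(P, P_next, I):
    """Closed-form least-squares estimate of the time offset t0.

    P, P_next, I are flat arrays over the samples (pixel, frame) where
    the pattern values P_n and P_{n+1} are known; the model is
    I = (1 - t0) * P + t0 * P_next.
    """
    d = np.asarray(P_next, float) - np.asarray(P, float)
    num = d @ (np.asarray(I, float) - np.asarray(P, float))
    den = d @ d          # requires at least one sample with P_next != P
    return num / den
```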
Once the value of $t_0$ has been estimated, the $N$ pattern pixel values $P_1(x, y), \ldots, P_N(x, y)$ can be estimated for each pixel $(x, y)$ by minimizing the following expression
$$E\big(P_1(x,y), \ldots, P_N(x,y)\big) = \tfrac{1}{2} \sum_{n=1}^{N} \big( (1 - t_0)\,P_n(x,y) + t_0\,P_{n+1}(x,y) - I_n(x,y) \big)^2$$
which reduces to solving the following system of N linear equations
$$\beta\,P_{n-1}(x,y) + \alpha\,P_n(x,y) + \beta\,P_{n+1}(x,y) = t_0\,I_{n-1}(x,y) + (1 - t_0)\,I_n(x,y)$$
for $n = 1, \ldots, N$, where $\alpha = t_0^2 + (1 - t_0)^2$ and $\beta = t_0(1 - t_0)$.
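A minimal sketch of this per-pixel linear solve, treating the $N$ equations as a cyclic tridiagonal system; the dense-matrix construction is illustrative only (a production implementation would exploit the banded structure), and the function name is assumed.

```python
import numpy as np

def solve_patterns(I, t0):
    """Recover the N pattern values at one pixel (global shutter, equal rates).

    I: array of the N unsynchronized intensities I_1..I_N at this pixel.
    Builds the cyclic system beta*P_{n-1} + alpha*P_n + beta*P_{n+1}
    = t0*I_{n-1} + (1 - t0)*I_n and solves it directly.
    """
    I = np.asarray(I, float)
    N = len(I)
    alpha = t0**2 + (1 - t0)**2
    beta = t0 * (1 - t0)
    A = alpha * np.eye(N)
    idx = np.arange(N)
    A[idx, (idx - 1) % N] += beta      # cyclic sub-diagonal
    A[idx, (idx + 1) % N] += beta      # cyclic super-diagonal
    # right-hand side: I_{n-1} obtained by a cyclic shift
    b = t0 * np.roll(I, 1) + (1 - t0) * I
    # note: A can become singular for special cases (e.g. t0 = 0.5, even N)
    return np.linalg.solve(A, b)
```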
In another preferred embodiment, the method to synthesize the synchronized sequence of image frames from an unsynchronized sequence of image frames applies to a rolling shutter image sensor where the image frame rate is identical to the pattern frame rate.
Camera row $y$ in image $n$ begins being exposed at time $t_{n,y}$, given by

$$t_{n,y} = t_0 + (n - 1)\,t_f + y\,t_r, \qquad y = 0, \ldots, Y - 1,$$

where $t_f$ is the frame period, $t_r$ the row delay, and $t_e$ the exposure time, and the exposure ends at time $t_{n,y} + t_e$.
In this model, image $n$ is exposed while patterns $P_n$ and $P_{n+1}$ are being projected. The intensity level measured at a pixel in row $y$ is given by
$$I_{n,y} = (t_n - t_{n,y})\,k_{n,y}\,P_n + (t_{n,y} + t_e - t_n)\,k_{n,y}\,P_{n+1} + C_{n,y},$$

where $t_n$ is the time at which the projector switches from pattern $P_n$ to pattern $P_{n+1}$. The constants $k_{n,y}$ and $C_{n,y}$ are scene dependent.
Let $\min\{I_{n,y}\}$ be the value of a pixel exposed while $P(t) = 0$, so that $\min\{I_{n,y}\} = C_{n,y}$, and $\max\{I_{n,y}\}$ the value of a pixel exposed while $P(t) = 1$, so that $\max\{I_{n,y}\} = t_e\,k_{n,y} + C_{n,y}$. We now define a normalized image $\bar{I}_{n,y}$ as
A normalized image is completely defined by the time variables and pattern values. In this section we want to estimate the time variables. Let us rewrite Equation 58 as
where $t_0$ and $d$ are unknown. Image pixel values are given by
$$I_n(x,y) = (1 - t_0 - y\,d)\,P_n(x,y) + (t_0 + y\,d)\,P_{n+1}(x,y),$$
As before, $P_n(x, y)$ and $P_{n+1}(x, y)$ represent the pattern values contributing to camera pixel $(x, y)$; we define $P_{N+1} = P_1$, $P_0 = P_N$, $I_{N+1} = I_1$, and $I_0 = I_N$, and we omit the pixel $(x, y)$ to simplify the notation. We now minimize the following energy to find the time variables $t_0$ and $d$
The partial derivatives are given by
We set the gradient equal to the null vector and rearrange to obtain
We use Equation 29 to compute t0 and d when we have some known (or estimated) pattern values.
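Because the residual is linear in $(t_0, d)$, this reduces to a two-parameter linear least-squares problem. A minimal sketch, assuming the known-pattern pixels and their row indices have been gathered into flat arrays (names and interface are illustrative):

```python
import numpy as np

def estimate_t0_d(P, P_next, I, y):
    """Least-squares estimate of t0 and the per-row delay d (rolling shutter).

    Model: I = (1 - t0 - y*d) * P + (t0 + y*d) * P_next, so the residual
    I - P = (t0 + y*d) * (P_next - P) is linear in (t0, d).
    P, P_next, I, y are flat arrays over pixels with known pattern values;
    y holds each pixel's row index.
    """
    P, P_next = np.asarray(P, float), np.asarray(P_next, float)
    I, y = np.asarray(I, float), np.asarray(y, float)
    u = P_next - P
    J = np.column_stack([u, y * u])    # Jacobian columns for t0 and d
    r = I - P
    t0, d = np.linalg.lstsq(J, r, rcond=None)[0]
    return t0, d
```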
With $t_0$ and $d$ known, we estimate the pattern values by minimizing
Analogously to Case 1, we obtain $Ap = b$ with $A$ as in Equation 12 and $\alpha$, $\beta$, and $b$ defined as
$$\alpha = (1 - t_0 - y\,d)^2 + (t_0 + y\,d)^2, \qquad \beta = (1 - t_0 - y\,d)(t_0 + y\,d)$$

$$b = (1 - t_0 - y\,d)\,(I_1, I_2, \ldots, I_N)^T + (t_0 + y\,d)\,(I_N, I_1, \ldots, I_{N-1})^T$$
The pattern values for each pixel are given by $p = A^{-1} b$.
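For illustration, this per-pixel solve can be sketched as below; it mirrors the global-shutter sketch with the mixing weight $t_0$ replaced by the row-dependent weight $t_0 + y\,d$ (the function name and interface are assumptions):

```python
import numpy as np

def solve_patterns_rolling(I, t0, d, y):
    """Recover pattern values at one pixel of a rolling-shutter sensor.

    I: the N intensities observed at this pixel, which lies on row y.
    Same cyclic tridiagonal structure as the global-shutter case, with
    weight w = t0 + y*d in place of t0.
    """
    I = np.asarray(I, float)
    N = len(I)
    w = t0 + y * d
    alpha = (1 - w)**2 + w**2
    beta = (1 - w) * w
    A = alpha * np.eye(N)
    idx = np.arange(N)
    A[idx, (idx - 1) % N] += beta      # cyclic sub-diagonal
    A[idx, (idx + 1) % N] += beta      # cyclic super-diagonal
    b = (1 - w) * I + w * np.roll(I, 1)   # b_n = (1-w)*I_n + w*I_{n-1}
    return np.linalg.solve(A, b)
```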
In another preferred embodiment, the method to synthesize the synchronized sequence of image frames from an unsynchronized sequence of image frames applies to a global shutter image sensor where the image frame rate is higher than or equal to the pattern frame rate.
Let $\Delta t = t_n - t_{n-1}$ be the time between image frames, let $p = (P_1, \ldots, P_M)^T$ and $\Phi_n(t_0, \Delta t) = \big(\Phi(n, 1, t_0, \Delta t), \ldots, \Phi(n, M, t_0, \Delta t)\big)^T$, and rewrite Equation 33 as
Each function $\Phi(n, m, t_0, \Delta t) = \int_{t_{n-1}}^{t_n} f_m(t)\,dt$, where $f_m(t)$ indicates whether pattern $m$ is being projected at time $t$, can be written as

$$\Phi(n, m, t_0, \Delta t) = \max\big(0, \min(m, t_n) - \max(m - 1, t_{n-1})\big)$$

with time measured in pattern periods, so that pattern $m$ is displayed during $[m - 1, m]$ and image $n$ is exposed over $[t_{n-1}, t_n]$ with $t_n = t_0 + n\,\Delta t$.
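Under that reading, the overlap function can be written directly; this short sketch is illustrative only, and the name `phi` is an assumption:

```python
def phi(n, m, t0, dt):
    """Overlap between the exposure of image n and the display of pattern m.

    Time is measured in pattern periods: pattern m is shown during
    [m - 1, m], and image n integrates over [t_{n-1}, t_n] with
    t_n = t0 + n * dt.
    """
    t_lo = t0 + (n - 1) * dt
    t_hi = t0 + n * dt
    return max(0.0, min(m, t_hi) - max(m - 1, t_lo))
```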
As before, $P_n(x, y)$ represents a pattern value contributing to camera pixel $(x, y)$; we define $P_{N+1} = P_1$, $P_0 = P_N$, $I_{N+1} = I_1$, and $I_0 = I_N$, and we omit the pixel $(x, y)$ to simplify the notation.
We now minimize the following energy to find the time variables t0 and Δt
We solve for $t_0$ and $\Delta t$ by setting the gradient to zero:
Because the Jacobian $J\Phi_n(t_0, \Delta t)$ depends on the unknown vector $t = (t_0, \Delta t)^T$, we solve for it iteratively.
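The specific update matrices $V_A$ and $V_b$ are defined next; as a generic illustration of the iterative solve, the following sketch applies a numerical Gauss-Newton iteration to the same energy, reusing the `phi` overlap function from the earlier sketch. The interface is an assumption, and the closed-form Jacobian of the original is replaced by finite differences for brevity.

```python
import numpy as np

def estimate_t(I, p, t_init, iters=20):
    """Gauss-Newton sketch for t = (t0, dt), minimizing
    sum_n (Phi_n(t) . p - I_n)^2 with known pattern vector p."""
    t = np.asarray(t_init, dtype=float)
    M, N = len(p), len(I)
    eps = 1e-6

    def resid(tv):
        # residual r_n = sum_m phi(n, m) * p_m - I_n
        return np.array([
            sum(phi(n, m, tv[0], tv[1]) * p[m - 1] for m in range(1, M + 1))
            - I[n - 1]
            for n in range(1, N + 1)])

    for _ in range(iters):
        r = resid(t)
        # finite-difference Jacobian, one column each for t0 and dt
        J = np.column_stack([(resid(t + eps * e) - r) / eps
                             for e in np.eye(2)])
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        t = t - step
    return t
```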
The matrix $V_A(n, t)$ and the vector $V_b(n, t)$ are defined as
For completeness we include the following definitions:
With $t_0$ and $\Delta t$ known, we estimate the pattern values by minimizing
Analogously to Case 1, we obtain $Ap = b$ with
The pattern values for each pixel are given by $p = A^{-1} b$.
In another preferred embodiment, the method to synthesize the synchronized sequence of image frames from an unsynchronized sequence of image frames applies to a rolling shutter image sensor where the image frame rate is higher than or equal to the pattern frame rate.
Camera row $y$ in image $n$ begins being exposed at time $t_{n,y}$, given by

$$t_{n,y} = t_0 + (n - 1)\,t_f + y\,t_r, \qquad y = 0, \ldots, Y - 1,$$

and the exposure ends at time $t_{n,y} + t_e$.
In this model a pixel intensity in image n at row y is given by
The constants $k_{n,y}$ and $C_{n,y}$ are scene dependent, and $P_m$ is either 0 or 1.
Let $\min\{I_{n,y}\}$ be the value of a pixel exposed while $P(t) = 0$, and $\max\{I_{n,y}\}$ the value of a pixel exposed while $P(t) = 1$:

$$\min\{I_{n,y}\} = C_{n,y}, \qquad \max\{I_{n,y}\} = t_e\,k_{n,y} + C_{n,y}$$
We now define a normalized image $\bar{I}_{n,y}$ as
A normalized image is completely defined by the time variables and pattern values. In this section we want to estimate the time variables. Let us rewrite the previous equation as
Let
We now minimize the following energy to find the unknown h
with the following constraints
or equivalently
The energy $E(h)$ cannot be minimized in closed form because the matrix $V_{n,y}$ depends on the unknown values. Using an iterative approach, the current value $h^{(i)}$ is used to compute $V_{n,y}^{(i)}$, and then the next value $h^{(i+1)}$ is obtained.
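A minimal sketch of this fixed-point scheme, assuming a caller-supplied routine that assembles the stacked matrix $V(h)$ from the current iterate; the callable interface and names are assumptions for illustration only.

```python
import numpy as np

def estimate_h(build_V, b, h0, iters=10):
    """Fixed-point iteration for the time-variable vector h.

    build_V(h) assembles the stacked matrix whose rows are the V_{n,y}
    blocks evaluated at the current iterate; each step then solves the
    resulting linear least-squares problem for the next iterate.
    """
    h = np.asarray(h0, dtype=float)
    for _ in range(iters):
        V = build_V(h)            # V depends on the current estimate of h
        h, *_ = np.linalg.lstsq(V, b, rcond=None)
    return h
```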
Up to this point we have assumed that the only unknown is $h$, meaning that the pattern values are known for all image pixels. The difficulty lies in knowing which pattern pixel is being observed by each camera pixel. We sidestep this issue by making the calibration patterns all 'black' or all 'white', as shown in the accompanying figures.
Decoding is done in two steps: 1) the time offset $t_0$ is estimated for this particular sequence; 2) the pattern values are estimated for each camera pixel, as shown in the accompanying figures.
As with the time variables, the pattern values are estimated by minimizing the following energy
The matrix $h^T V_{n,y}^T$ is bi-diagonal for $N = M$, and it is fixed if $h$ is known.
Therefore, it can be seen that the exemplary embodiments of the method and system provide a unique solution to the problem of using structured lighting for three-dimensional image capture where the camera and projector are unsynchronized.
It would be appreciated by those skilled in the art that various changes and modifications can be made to the illustrated embodiments without departing from the spirit of the present invention. All such modifications and changes are intended to be within the scope of the present invention except as limited by the scope of the appended claims.
This application is a divisional application of U.S. patent application Ser. No. 15/124,176, which is a national phase filing under 35 U.S.C. § 371 of International Application No. PCT/US2015/019357 filed Mar. 9, 2015, which claims priority to earlier filed U.S. Provisional Application Ser. No. 61/949,529, filed Mar. 7, 2014, the contents of which are incorporated herein by reference.
This Invention was made with government support under grant number DE-FG02-08ER15937 awarded by the Department of Energy. The government has certain rights in the invention.
Number | Date | Country
---|---|---
61949529 | Mar 2014 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 15124176 | Sep 2016 | US
Child | 16434846 | | US
Parent | 16434846 | Jun 2019 | US
Child | 17226920 | | US