The present invention relates to 3D shape modeling.
An open problem in computer vision since early works on optical flow has been to determine the shape of an object with unknown reflectance undergoing differential motion, when observed by a static camera under unknown illumination.
Shape from differential motion has traditionally been addressed with optical flow methods. These rely on brightness constancy assumptions, such as the assumption that the local brightness of an image point does not change with variations in lighting and viewing configuration, which is physically incorrect. Shape reconstruction methods that do account for this variation in brightness model the image formation as diffuse reflection, which is inaccurate for most real-world objects.
Systems and methods are disclosed for determining three dimensional (3D) shape by capturing with a camera a plurality of images of an object in differential motion; deriving a general relation that relates spatial and temporal image derivatives to BRDF derivatives; exploiting rank deficiency to eliminate BRDF terms and recover depth or normal for directional lighting; and using a depth-normal-BRDF relation to recover depth or normal for unknown arbitrary lighting.
The above system solves the fundamental computer vision problem of determining shape from small (differential) motion of an object with an unknown surface reflectance. In the general case, reflectance is an arbitrary function of surface orientation, camera and lighting (henceforth called the bidirectional reflectance distribution function, or the BRDF). The system can handle several camera and illumination conditions:
(i) Orthographic projection
(ii) Perspective projection
Advantages of the preferred embodiment may include one or more of the following. The system can recover shape from motion under conditions of general, unknown BRDF and illumination; the methods are the first of their kind that can handle shape reconstruction under such challenging imaging conditions. Prior methods simplify the problem with physically incorrect assumptions like brightness constancy or diffuse reflectance. In contrast to conventional methods, we correctly account for reflectance behavior as an unknown BRDF, relate it to image intensities and demonstrate that it is still possible to recover the shape. By correctly accounting for the BRDF, we improve the accuracy of shape reconstruction. The system can handle both orthographic and perspective camera projections, with arbitrary unknown distant lighting (directional or area).
The present system solves the fundamental computer vision problem of determining shape from the (small or differential) motion of an object with unknown isotropic reflectance, under unknown distant illumination. The system works with a fixed camera, without restrictive assumptions like brightness constancy, Lambertian BRDF or a known directional light source. Under orthographic projection, three differential motions suffice to yield an invariant that relates shape to image derivatives, regardless of BRDF and illumination. Further, we delineate the topological classes up to which reconstruction may be achieved using the invariant. Under perspective projection, four differential motions suffice to yield depth and a linear constraint on the surface gradient, with unknown BRDF and lighting. The invariants are homogeneous partial differential equations for simple lighting, and inhomogeneous for more complex lighting. The system uses a stratification of shape recovery, related to the number of differential motions required, generalizing earlier work with Lambertian BRDFs. The reconstruction methods are validated on synthetic and real data.
Turning now to
Module (300) applies the rank deficiency in module (101) to estimate depth for an unknown directional point light source. It is assumed that the object is moving under a fixed camera and light source. An isotropic BRDF in this case depends on the two angles between the (surface normal, light) and (surface normal, camera). Module (301) handles the case where the camera model is orthographic. Then, we show that using 3 or more differential pairs of images, we may eliminate BRDF terms to derive an inhomogeneous quasilinear PDE in surface depth. This PDE can be solved to recover level curves of the surface using a method of characteristics. The level curves are interpolated to recover dense depth. In module (302), the camera model is perspective. Then, using 4 or more differential pairs of images, we may eliminate BRDF terms to extract two equations. The first directly yields the surface depth, while the second is an inhomogeneous quasilinear PDE in surface depth. Since depth is known from the first equation, the second equation may now be treated as a constraint on the surface normal.
Module (400) handles an area light source, where a diffuse BRDF is a quadratic function of the surface normal. The differential stereo relation now becomes a nonlinear PDE in surface depth, which may be solved using nonlinear optimization methods.
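As a rough sketch of the nonlinear-optimization step suggested for the area-source case, a per-pixel nonlinear relation F(z)=0 can be solved in the least-squares sense. The quadratic residual below is a hypothetical stand-in for the actual nonlinear differential stereo PDE, chosen only to illustrate the solver pattern:

```python
import numpy as np
from scipy.optimize import least_squares

# Generic sketch: solve a per-pixel nonlinear relation F(z) = 0 by nonlinear
# least squares. The quadratic residual is a hypothetical stand-in for the
# actual nonlinear differential stereo PDE of the area-source case.
def residuals(z, a, b, c):
    return a * z**2 + b * z + c            # toy per-pixel residual

n_pix = 16
a = np.full(n_pix, 1.0)
b = np.full(n_pix, -3.0)
c = np.full(n_pix, 2.0)                    # each residual has roots z = 1, z = 2
sol = least_squares(residuals, x0=np.full(n_pix, 2.5), args=(a, b, c))
z_est = sol.x                              # converges to the nearby root z = 2
```

In practice the residual couples neighboring pixels through the surface gradient, so a good initialization (e.g., from feature correspondences) matters.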
In module (500), with a depth sensor and RGB camera input, the differential stereo relation is a decoupled expression in the depth and surface normal. This decoupling can be exploited to design more efficient optimization algorithms (such as alternating minimization).
In module (601), the camera model is orthographic. Then, using 3 or more differential pairs of images, we may use module (102) to eliminate BRDF terms to derive an inhomogeneous quasilinear PDE in surface depth. This PDE can be solved to recover level curves of the surface using a method of characteristics. The level curves are interpolated to recover dense depth. Module (602) handles the situation where the camera model is perspective. Then, using 4 or more differential pairs of images, we may use module (102) to eliminate BRDF terms to extract two equations. The first directly yields the surface depth, while the second is a linear constraint on the surface gradient. The two constraints may be combined to yield a highly sparse linear system, which can be solved efficiently to recover the surface depth.
The system of
(i) For orthographic projections, we derive a first-order quasilinear partial differential equation (PDE) which can be solved for surface depth using a method of characteristics.
(ii) For perspective projections, we show that it is possible to directly estimate depth from image derivatives in four or more images.
(iii) For perspective images, we derive an additional constraint on the surface gradient.
(iv) We demonstrate that the depth and gradient constraints may be combined to yield an efficient solution for surface depth as a sparse linear system.
We exploit the rank deficiency in (a) to derive solutions for several camera and lighting conditions:
(i) For orthographic projections, we derive a first-order quasilinear partial differential equation (PDE) which can be solved for surface depth using a method of characteristics.
(ii) For perspective projections, we show that depth may be directly recovered by exploiting the rank deficiency, along with an additional PDE that constrains the surface normal.
(iii) For colocated lighting, we show that two differential pairs suffice for recovering shape.
(iv) For general directional lighting, we show that three differential pairs suffice for recovering shape, without requiring knowledge of lighting.
(v) When we have additional depth sensor input, our method can combine the depth input with surface normal information, thereby improving accuracy.
Next, one of our exemplary setups is discussed. The camera and lighting in our setup are fixed, while the object moves. The object BRDF is assumed isotropic and homogeneous (or having slow spatial variation), with an unknown functional form. The distant illumination may be directional or environment. Interreflections and shadows are assumed negligible. Let the focal length of the camera be f. The principal point on the image plane is defined as the origin of the 3D coordinate system, with the camera center at (0,0,−f)T. Denoting β=1/f, a 3D point x=(x, y, z)T is imaged at u=(u, v)T, where
u=x/(1+βz),v=y/(1+βz). (1)
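The projection model (1) can be sketched as follows; the focal length value below is hypothetical and chosen only for illustration:

```python
# A minimal sketch of the projection model in Eq. (1). The focal length
# f = 10 is a hypothetical value for illustration.
def project(x, y, z, f=10.0):
    beta = 1.0 / f
    return x / (1.0 + beta * z), y / (1.0 + beta * z)

# As f grows (beta -> 0), the model reduces to orthographic projection.
u, v = project(1.0, 2.0, 5.0)            # perspective
u0, v0 = project(1.0, 2.0, 5.0, f=1e9)   # approximately orthographic
```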
Differential motion is detailed next. Using the projection equations in (1), the motion field is given by
Consider a small rotation R≈I+[ω]× and translation τ=(τ1, τ2, τ3)T, where [ω]× is the skew-symmetric matrix of ω=(ω1, ω2, ω3)T. Then, ẋ=ω×x+τ for a point x on the object. In the perspective case, the motion field is
μ=(α1+(α2+ω2z)/(1+βz), α3+(α4−ω1z)/(1+βz))T, (3)
where α1=ω2βu2−ω1βuv−ω3v, α2=τ1−βuτ3, α3=−ω1βv2+ω2βuv+ω3u and α4=τ2−βvτ3. Under orthography, β→0, thus the motion field is
μ=(α5+ω2z,α6−ω1z)T, (4)
where α5=τ1−ω3v and α6=τ2+ω3u.
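The two motion-field formulas can be sketched together; the perspective form μ = (α1 + (α2+ω2z)/(1+βz), α3 + (α4−ω1z)/(1+βz))T is the one implied by the α definitions above (an assumption of this sketch), and all motion parameters below are hypothetical:

```python
import numpy as np

# Sketch of the motion-field formulas: Eq. (4) for orthography, and the
# perspective form implied by the alpha definitions in the text. As
# beta -> 0 the perspective field reduces to the orthographic one.
def motion_field(u, v, z, omega, tau, beta):
    w1, w2, w3 = omega
    t1, t2, t3 = tau
    if beta == 0.0:                        # orthographic case, Eq. (4)
        a5 = t1 - w3 * v
        a6 = t2 + w3 * u
        return np.array([a5 + w2 * z, a6 - w1 * z])
    a1 = w2 * beta * u**2 - w1 * beta * u * v - w3 * v
    a2 = t1 - beta * u * t3
    a3 = -w1 * beta * v**2 + w2 * beta * u * v + w3 * u
    a4 = t2 - beta * v * t3
    return np.array([a1 + (a2 + w2 * z) / (1 + beta * z),
                     a3 + (a4 - w1 * z) / (1 + beta * z)])
```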
Differential flow relation is now discussed. Assuming isotropic BRDF ρ, the image intensity of a 3D point x, imaged at pixel u, is
I(u,t)=σ(x)ρ(n,x), (5)
where σ is the albedo and n is the surface normal at the point. The cosine fall-off is absorbed within ρ. The BRDF ρ is usually written as a function of incident and outgoing directions, but for fixed lighting and view, can be seen as a function of surface position and orientation. This is a reasonable image formation model that subsumes traditional ones like Lambertian and allows general isotropic BRDFs modulated by spatially varying albedo. Note that we do not make any assumptions on the functional form of ρ, in fact, our theory will derive invariants that eliminate it.
Considering the total derivative on both sides of (5), using the chain rule, we have
(∇uI)Tμ+It=σ̇ρ+σ[(∇nρ)Tṅ+(∇xρ)Tẋ]. (6)
Since σ is intrinsically defined on the surface coordinates, its total derivative vanishes, σ̇=0 (for a rigorous derivation, please refer to Appendix 9). Noting that μ=(u̇, v̇)T is the motion field, the above can be rewritten as
(∇uI)Tμ+It=σ[(∇nρ)T(ω×n)+(∇xρ)Tv], (7)
where v is the linear velocity and we use ṅ=ω×n. Since lighting is distant and the BRDF is homogeneous (or with slow spatial variation), ∇xρ is negligible. Moreover, using standard vector identities, (∇nρ)T(ω×n)=(n×∇nρ)Tω. Denoting E=log I, the albedo is easily eliminated by dividing (7) through by I(u,t), to yield the differential flow relation:
(∇uE)Tμ+Et=(n×∇n log ρ)Tω. (8)
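The vector identity used in the step above is the scalar triple product; a numeric spot-check on arbitrary vectors:

```python
import numpy as np

# Numeric spot-check of the identity used above,
# (grad_n rho)^T (omega x n) = (n x grad_n rho)^T omega,
# an instance of the triple-product identity a.(b x c) = b.(c x a).
rng = np.random.default_rng(0)
g, omega, n = rng.standard_normal((3, 3))
lhs = g @ np.cross(omega, n)
rhs = np.cross(n, g) @ omega
assert np.isclose(lhs, rhs)
```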
The differential flow relation in (7) and (8) is a strict generalization of the brightness constancy relation used by the vast majority of prior works on optical flow. Indeed, with a constant BRDF ρ=1, the RHS in (7) or (8) vanishes, which is precisely the brightness constancy assumption. However, note that ρ=1 is physically unrealistic—even the most basic Lambertian assumption is ρ(n)=nTs, in which case (8) reduces to a well-known relation:
(∇uE)Tμ+Et=(n×s)Tω/(nTs). (9)
In the following, we explore the extent to which the motion field μ and object shape may be recovered using (8), under both orthographic and perspective image formation. Precisely, we show that it is possible to eliminate all BRDF and lighting effects in an image sequence, leaving a simple relationship between image derivatives, surface depths and normals.
Orthographic Projection is now discussed. We consider recovery of the shape of an object with unknown BRDF, using a sequence of differential motions. Under orthography, the motion field μ is given by (4). Denoting π=n×∇n log ρ, one may rewrite (8) as
pz+q=ωTπ, (10)
where, using (47), p and q are known entities given by
p=Euω2−Evω1 (11)
q=α5Eu+α6Ev+Et. (12)
Rank-Deficiency in an Image Sequence is discussed next. For m≧3, consider a sequence of m+1 images, E0, . . . , Em, where Ei is related to E0 by a known differential motion {ωi, τi}. We assume that the object undergoes general motion, that is, the set of vectors ωi, i=1, . . . , m, span R3. Then, from (10), we have a set of relations
piz+qi=πTωi, i=1, . . . ,m. (13)
Note that pi, qi and ωi are known from the images and calibration, while the surface depth z and the entity π related to normals and BRDF are unknown. It might appear at a glance that, using the above m relations in (13), one may set up a linear system B(z, πT)T=−q, where each row of the m×4 matrix B is [pi, −ω1i, −ω2i, −ω3i] and q=(q1, . . . , qm)T, to solve for both z and π at every pixel. However, note the form of pi=Euω2i−Evω1i, which means that the first column of B is a linear combination of the other three columns. Thus, the linear system is rank deficient (rank 3 in the general case when the set of vectors {ωi}, i=1, . . . , m, spans R3), whereby we have:
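The rank deficiency can be verified numerically; the image derivatives Eu, Ev and the motions below are hypothetical values for illustration:

```python
import numpy as np

# Sketch of the rank deficiency in the m x 4 system of Eq. (13): each row is
# [p_i, -w1_i, -w2_i, -w3_i] with p_i = Eu*w2_i - Ev*w1_i, so the first
# column is a linear combination of the next two.
rng = np.random.default_rng(1)
Eu, Ev = 0.7, -0.3
omegas = rng.standard_normal((5, 3))       # m = 5 motions spanning R^3
p = Eu * omegas[:, 1] - Ev * omegas[:, 0]
B = np.column_stack([p, -omegas])
assert np.linalg.matrix_rank(B) == 3       # rank 3, not full rank 4
```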
BRDF-Invariant Constraints on Surface
While one may not use (10) directly to obtain depth, we may still exploit the rank deficiency to infer information about the surface depth. For an object with unknown BRDF, observed under unknown lighting and orthographic camera, three differential motions suffice to yield a BRDF and lighting invariant relation between image derivatives and surface geometry. We have the parameterized solution
(z,πT)T=−B+q+k(1,−Ev,Eu,0)T, (14)
where B+ is the Moore-Penrose pseudoinverse of B and k an arbitrary scalar. Define γ=−B+q and γ′=(γ2, γ3, γ4)T. Then, we have the following two relations
z=γ1+k (15)
π=γ′+k(−Ev,Eu,0)T. (16)
From the definition of π, we have nTπ=0. Substituting from the above two relations (with k=z−γ1), we get
(λ1+λ2z)n1+(λ3+λ4z)n2−γ4n3=0, (17)
where λ1=−(γ2+γ1Ev), λ2=Ev, λ3=−γ3+γ1Eu and λ4=−Eu. Noting that n1/n3=−zx and n2/n3=−zy, we may rewrite (17) as
(λ1+λ2z)zx+(λ3+λ4z)zy+γ4=0, (18)
which is independent of BRDF and lighting.
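The parameterized solution (14) can be sketched numerically; the image derivatives, motions and vector q below are hypothetical, and the check confirms that (1, −Ev, Eu, 0)T spans the null space of the system, which is the source of the single free scalar k:

```python
import numpy as np

# Sketch of the parameterized solution (14): gamma = -B^+ q plus an
# arbitrary multiple k of the null direction (1, -Ev, Eu, 0)^T.
rng = np.random.default_rng(2)
Eu, Ev = 0.7, -0.3
omegas = rng.standard_normal((4, 3))
p = Eu * omegas[:, 1] - Ev * omegas[:, 0]
B = np.column_stack([p, -omegas])
q = rng.standard_normal(4)
gamma = -np.linalg.pinv(B) @ q
null_dir = np.array([1.0, -Ev, Eu, 0.0])
assert np.allclose(B @ null_dir, 0.0)       # the one-parameter ambiguity
for k in (0.0, 1.0, -2.5):                  # every k fits the data equally well
    assert np.allclose(B @ (gamma + k * null_dir), B @ gamma)
```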
Thus, we may directly relate surface depth and gradient to image intensity, even for unknown BRDF and illumination. This is a fundamental constraint that relates object shape to motion, regardless of choice of reconstruction method.
Surface Depth Estimation is discussed next. We consider the precise extent to which surface depth may be recovered using the invariant (18). We first consider the simpler case of a colocated source and sensor, where an isotropic BRDF is given by ρ(nTs), for an unknown function ρ. For our choice of coordinate system, s=(0,0,−1)T. Recall that π=n×∇n log ρ. It is easily verified that π3=0, thus, γ4=0 using (14). The relation in (18) now becomes
zx/zy=−(λ3+λ4z)/(λ1+λ2z) (19)
where the λi, i=1, . . . , 4, are defined as before. Now, we are in a position to state the following result:
Two or more differential motions of a surface with unknown BRDF, with a colocated source and sensor, yield level curves of surface depth, corresponding to known depths of some (possibly isolated) points on the surface. Define a=(λ1+λ2z, λ3+λ4z)T. Then, from (19),
aT∇z=0. (20)
Since ∇z is orthogonal to the level curves of z, the tangent space to the level curves of z is defined by a. Consider a rectifiable curve C(x(s), y(s)) parameterized by the arc length parameter s. The derivative of z along C is given by
dz/ds=zx(dx/ds)+zy(dy/ds). (21)
If C is a level curve of z(x, y), then dz/ds=0 on C. Define t=(dx/ds, dy/ds). Then, we also have
tT∇z=0. (22)
From (20) and (22), it follows that a and t are parallel. Thus, t2/t1=a2/a1, whereby we get
dy/dx=(λ3+λ4z)/(λ1+λ2z). (23)
Along a level curve z(x, y)=c, the solution is given by the ODE
dy/dx=(λ3+λ4c)/(λ1+λ2c). (24)
Given the value of z at any point, the ODE (24) determines all other points on the surface with the same value of z.
Thus, (19) allows reconstruction of level curves of the surface, with unknown BRDF, under colocated illumination. Note that (19) is a first-order, homogeneous, quasilinear partial differential equation (PDE). Similarly, we may interpret (18) as a PDE in z(x, y), in particular, it is an inhomogeneous, first-order, quasilinear PDE. This immediately suggests the following surface reconstructibility result in the general case:
Three or more differential motions of a surface with unknown BRDF, under unknown illumination, yield characteristic surface curves C(x(s), y(s), z(s)), defined by
dx/ds=λ1+λ2z, dy/ds=λ3+λ4z, dz/ds=−γ4, (25)
corresponding to depths at some (possibly isolated) points.
Surface Reconstruction
Given depth z0 at point (x0, y0)T, for a small step size ds, the relations (24) or (25) yield (dx,dy,dz)T, such that (x0+dx, y0+dy)T lies on the characteristic curves of (18) through (x0, y0)T, with depth z0+dz. The process is repeated until the entire characteristic curve is estimated.
Note that dz is zero for the colocated case since characteristic curves correspond to level curves of depth, while it is in general non-zero for the non-colocated case. In practice, initial depths z0 may be obtained from feature correspondences, or the occluding contour in the non-colocated case.
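The stepping scheme above can be sketched as a simple Euler integration, assuming the characteristic directions of the quasilinear PDE are dx=(λ1+λ2z)ds, dy=(λ3+λ4z)ds, dz=−γ4 ds; the constant coefficients below are hypothetical (in practice they vary per pixel, computed from image derivatives):

```python
import numpy as np

# Sketch of characteristic-curve tracing by forward Euler steps. The
# lambda coefficients and gamma4 are hypothetical constants here; in
# practice they are per-pixel quantities from the images.
def trace_characteristic(x0, y0, z0, coeffs, gamma4, ds=1e-3, n_steps=1000):
    l1, l2, l3, l4 = coeffs
    x, y, z = x0, y0, z0
    pts = [(x, y, z)]
    for _ in range(n_steps):
        x += (l1 + l2 * z) * ds
        y += (l3 + l4 * z) * ds
        z -= gamma4 * ds
        pts.append((x, y, z))
    return np.array(pts)

# Colocated case (gamma4 = 0): depth stays constant, giving a level curve.
curve = trace_characteristic(0.0, 0.0, 1.0, (0.5, 0.1, -0.2, 0.3), 0.0)
```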
Perspective Projection
In this section, we relax the assumption of orthography. Surprisingly, we obtain even stronger results in the perspective case, showing that with four or more differential motions with unknown BRDF, we can directly recover surface depth, as well as a linear constraint on the derivatives of the depth. Strictly speaking, our theory is an approximation in the perspective case, since viewing direction may vary over object dimensions, thus, ∇xρ may be non-zero in (7). However, we illustrate in this section that accounting for finite focal length has benefits, as long as the basic assumption is satisfied that object dimensions are small compared to camera and source distance (which ensures that ∇xρ is negligibly small).
Differential Flow Relation
In the perspective case, one may rewrite (8) as (compare to the linear relation in (10) for the orthographic case)
q′+(p′z+r′)/(1+βz)=πTω, (26)
where p′=Euω2−Evω1, q′=α1Eu+α3Ev+Et and r′=α2Eu+α4Ev are known entities, using (3).
Now, one may derive a theory similar to the orthographic case by treating z/(1+βz), 1/(1+βz) and π as independent variables and using the rank deficiency (note the form of p′) arising from a sequence of m≧4 differential motions. We omit those derivations, but note that most of the observations in the preceding section for the orthographic case hold true in the perspective case too, albeit with the requirement of one additional image.
Instead, in the following, we take a closer look at the perspective equations for differential flow, to show that they yield a more comprehensive solution for surface geometry.
BRDF-Invariant Depth Estimation
We demonstrate that under perspective projection, object motion can completely specify the surface depth, without any initial information:
Four or more differential motions of a surface with unknown BRDF, under unknown illumination, suffice to yield, under perspective projection, the surface depth and a linear constraint on the surface gradient:
For m≧4, let images E1, . . . , Em be related to E0 by known differential motions {ωi, τi}, where ωi span R3. From (26), we have a sequence of differential flow relations
(p′i+βq′i)z−((1+βz)π)Tωi+(q′i+r′i)=0, (27)
for i=1, . . . , m. Let ci=[p′i+βq′i, −ω1i, −ω2i, −ω3i]T be the rows of the m×4 matrix C=[c1, . . . , cm]T. Let q′=[q′1, . . . , q′m]T and r′=[r′1, . . . , r′m]T. Then, we may rewrite the system (27) as
C(z, ((1+βz)π)T)T=−(q′+r′), (28)
which yields the solution
(z, ((1+βz)π)T)T=−C+(q′+r′), (29)
where C+ is the Moore-Penrose pseudoinverse of C. Define ε=−C+(q′+r′) and ε′=(ε2, ε3, ε4)T. Then, we have
z=ε1,(1+βz)π=ε′. (30)
By definition, π=n×∇n log ρ, thus, nTπ=0. We now have two separate relations for depths and normals:
z=ε1 (31)
nTε′=0. (32)
Thus, in the perspective case, one may directly use (31) to recover the surface depth. Further, noting that n1/n3=−zx and n2/n3=−zy, we may rewrite (32) as
ε2zx+ε3zy−ε4=0, (33)
which is a linear constraint on surface depth derivatives.
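The direct depth recovery of (27)-(31) can be sketched on synthetic data; the coefficients standing in for p′i+βq′i, the motions and π are all hypothetical, and depth is read off the first component of the pseudoinverse solution:

```python
import numpy as np

# Sketch of perspective depth recovery: stack the m relations of Eq. (27)
# into C w = -(q' + r') with w = (z, ((1+beta*z) pi)^T)^T, and solve by
# pseudoinverse as in Eqs. (28)-(31).
rng = np.random.default_rng(3)
m, beta, z_true = 5, 0.1, 2.0
pi_true = rng.standard_normal(3)
omegas = rng.standard_normal((m, 3))
coeff = rng.standard_normal(m)             # stands in for p'_i + beta*q'_i
C = np.column_stack([coeff, -omegas])
w_true = np.concatenate([[z_true], (1 + beta * z_true) * pi_true])
rhs = C @ w_true                           # plays the role of -(q' + r')
eps = np.linalg.pinv(C) @ rhs
z_est = eps[0]                             # depth recovered directly
```

Unlike the orthographic case, C is full rank here (the β term breaks the column dependency), so no free scalar remains and depth is determined without any seed points.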
Again, in the simpler case of colocated illumination, we observe that ε4=0, thus, the minimal imaging requirement is three motions. Further, from (32), the ratio −ε2/ε3 yields the slope of the gradient, leading to:
Three or more differential motions of a surface with unknown BRDF, under unknown illumination, suffice to yield under perspective projection the surface depth and the slope of the gradient. Even when BRDF and illumination are unknown, one may derive an invariant that relates shape to object motion, through a linear relation and a linear PDE on the surface depth. Again, we note that this is a fundamental constraint, independent of any particular reconstruction approach.
Surface Reconstruction is detailed next. Under perspective projection, one may directly recover the surface depth using (31), for an object with unknown BRDF imaged under unknown illumination after four arbitrary differential motions. No prior knowledge of the surface is required in the perspective case, even at isolated points. Depth and normal information may be combined by solving a linear system that stacks, at every pixel, the two constraints (31) and (33) on depths and gradients:
z=ε1, λ(ε2zx+ε3zy−ε4)=0, (34)
where λ is a relative weighting term. Standard discretization schemes may be used to represent zx and zy. Then, the above is a highly sparse linear system in the depths z, which may be solved using a linear least squares solver. Incorporating gradient constraints has the effect of regularizing the depth estimation by introducing neighborhood information, which may be advantageous in noisy scenarios.
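A minimal sketch of this sparse system, using forward differences for zx, zy on a small grid; the ε coefficients below are synthetic and chosen to be consistent with a known planar depth, so the solver should recover it:

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

# Sketch: stack per-pixel depth equations z = eps1 (Eq. 31) with weighted
# gradient equations lam*(eps2*zx + eps3*zy) = lam*eps4 (Eq. 33), using
# forward differences on an N x N grid, and solve by sparse least squares.
N, lam = 8, 0.5
z_true = np.fromfunction(lambda i, j: 0.1 * i + 0.05 * j, (N, N))
idx = lambda i, j: i * N + j

A = lil_matrix((2 * N * N, N * N))
b = np.zeros(2 * N * N)
row = 0
for i in range(N):
    for j in range(N):
        A[row, idx(i, j)] = 1.0            # depth constraint
        b[row] = z_true[i, j]
        row += 1
e2, e3 = 1.0, -2.0                         # hypothetical gradient coefficients
for i in range(N - 1):
    for j in range(N - 1):
        zx = z_true[i + 1, j] - z_true[i, j]
        zy = z_true[i, j + 1] - z_true[i, j]
        A[row, idx(i + 1, j)] = lam * e2   # gradient constraint
        A[row, idx(i, j + 1)] = lam * e3
        A[row, idx(i, j)] = -lam * (e2 + e3)
        b[row] = lam * (e2 * zx + e3 * zy)
        row += 1
z_est = lsqr(A.tocsr(), b)[0].reshape(N, N)
```

Each row touches at most three unknowns, so the system stays highly sparse even for large images.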
Stratification of Shape from Motion is discussed next. For the Lambertian BRDF, under known directional lighting, prior work relates shape and image derivatives by a quasilinear PDE, using special considerations of the two-view setup to arrive at the result. In the context of our theory, under a directional light source s=(s1, s2, 1)T/√(s12+s22+1), we have ρ(n)=nTs. Then, we may rewrite the basic relation in (8) as (9). For the orthographic case, using (11) and (12), we may again rewrite (9) as:
with known scalars λ1′=ω2−ω3s2, λ2′=ω1+ω3s1 and λ3′=−ω1s2+ω2s1. Note that (35) is a quasilinear PDE. It may be verified that the perspective case can also be written as a quasilinear PDE:
In particular, the present framework can also handle general BRDFs, unknown directional or area lighting and various camera projections.
The system analyzes motion which reveals the shape, with unknown isotropic BRDF and arbitrary, unknown distant illumination, for orthographic and perspective projections. We derive differential flow invariants that relate image derivatives to shape and exactly characterize the object geometry that can be recovered. This work generalizes traditional notions of brightness constancy or Lambertian BRDFs in the optical flow and multiview stereo literatures. Our results are not just valid for a particular approach to reconstruction, rather they impose fundamental limits on the hardness of surface reconstruction. In the process, we also present a stratification of shape from motion that relates hardness of reconstruction to scene complexity—qualitatively in terms of the nature of the involved PDE and quantitatively in terms of the minimum number of required motions.
Many of the relations, such as (19), (35) and (36), may be expressed in the form f(z)=g(n). With the availability of depth sensors, it becomes possible to measure f(z), which reduces the optimization to solving for n alone. Alternating minimization approaches with good accuracy and convergence behavior can then be used to simultaneously estimate depth and normals.
The invention may be implemented in hardware, firmware or software, or a combination of the three. Preferably the invention is implemented in a computer program executed on a programmable computer having a processor, a data storage system, volatile and non-volatile memory and/or storage elements, at least one input device and at least one output device.
By way of example, a block diagram of a computer to support the system is discussed next. The computer preferably includes a processor, random access memory (RAM), a program memory (preferably a writable read-only memory (ROM) such as a flash ROM) and an input/output (I/O) controller coupled by a CPU bus. The computer may optionally include a hard drive controller which is coupled to a hard disk and CPU bus. Hard disk may be used for storing application programs, such as the present invention, and data. Alternatively, application programs may be stored in RAM or ROM. I/O controller is coupled by means of an I/O bus to an I/O interface. I/O interface receives and transmits data in analog or digital form over communication links such as a serial link, local area network, wireless link, and parallel link. Optionally, a display, a keyboard and a pointing device (mouse) may also be connected to I/O bus. Alternatively, separate connections (separate buses) may be used for I/O interface, display, keyboard and pointing device. Programmable processing system may be preprogrammed or it may be programmed (and reprogrammed) by downloading a program from another source (e.g., a floppy disk, CD-ROM, or another computer).
Each computer program is tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
The invention has been described herein in considerable detail in order to comply with the patent Statutes and to provide those skilled in the art with the information needed to apply the novel principles and to construct and use such specialized components as are required. However, it is to be understood that the invention can be carried out by specifically different equipment and devices, and that various modifications, both as to the equipment details and operating procedures, can be accomplished without departing from the scope of the invention itself.
This application is a utility conversion and claims priority to Provisional Application Ser. 61/725,728 filed Nov. 13, 2012, the content of which is incorporated by reference.
U.S. Patent Documents Cited

Number | Name | Date | Kind
5500904 | Markandey | Mar 1996 | A
20070206008 | Kaufman | Sep 2007 | A1
20070211069 | Baker | Sep 2007 | A1

Publication

Number | Date | Country
20140132727 A1 | May 2014 | US

Priority Application

Number | Date | Country
61725728 | Nov 2012 | US