The present invention relates to 3D shape modeling.
An open problem in computer vision since early works on optical flow has been to determine the shape of an object with unknown reflectance undergoing differential motion, when observed by a static camera under unknown illumination.
Shape from differential motion has traditionally been addressed with optical flow methods. These rely on brightness constancy assumptions, such as the assumption that the local brightness of an image point does not change with variations in lighting and viewing configuration, which is physically incorrect. Shape reconstruction methods that do account for this variation in brightness model the image formation as diffuse reflection, which is inaccurate for most real-world objects.
Systems and methods are disclosed for determining three dimensional (3D) shape by capturing with a camera a plurality of images of an object in differential motion; deriving a general relation that relates spatial and temporal image derivatives to BRDF derivatives; exploiting rank deficiency to eliminate BRDF terms and recover depth or normal for directional lighting; and using a depth-normal-BRDF relation to recover depth or normal for unknown arbitrary lighting.
The above system solves the fundamental computer vision problem of determining shape from small (differential) motion of an object with an unknown surface reflectance. In the general case, reflectance is an arbitrary function of surface orientation, camera and lighting (henceforth called the bidirectional reflectance distribution function, or the BRDF). The system can handle several camera and illumination conditions:
(i) Orthographic projection
(ii) Perspective projection
Advantages of the preferred embodiment may include one or more of the following. The system can recover shape from motion under conditions of general, unknown BRDF and illumination; the methods are the first of their kind that can handle shape reconstruction under such challenging imaging conditions. Prior methods simplify the problem with physically incorrect assumptions like brightness constancy or diffuse reflectance. In contrast to conventional methods, we correctly account for reflectance behavior as an unknown BRDF, relate it to image intensities and demonstrate that it is still possible to recover the shape. By correctly accounting for the BRDF, we improve the accuracy of shape reconstruction. The system can handle both orthographic and perspective camera projections, with arbitrary unknown distant lighting (directional or area).
The present system solves the fundamental computer vision problem of determining shape from the (small or differential) motion of an object with unknown isotropic reflectance, under unknown distant illumination. The system works with a fixed camera, without restrictive assumptions like brightness constancy, Lambertian BRDF or a known directional light source. Under orthographic projection, three differential motions suffice to yield an invariant that relates shape to image derivatives, regardless of BRDF and illumination. Further, we delineate the topological classes up to which reconstruction may be achieved using the invariant. Under perspective projection, four differential motions suffice to yield depth and a linear constraint on the surface gradient, with unknown BRDF and lighting. The invariants are homogeneous partial differential equations for simple lighting, and inhomogeneous for more complex lighting. The system uses a stratification of shape recovery, related to the number of differential motions required, generalizing earlier work with Lambertian BRDFs. The reconstruction methods are validated on synthetic and real data.
Turning now to
Module (300) applies the rank deficiency in module (101) to estimate depth for an unknown directional point light source. It is assumed that the object is moving under a fixed camera and light source. An isotropic BRDF in this case depends on the two angles between the (surface normal, light) and (surface normal, camera). Module (301) handles the case where the camera model is orthographic. Then, we show that using 3 or more differential pairs of images, we may eliminate BRDF terms to derive an inhomogeneous quasilinear PDE in surface depth. This PDE can be solved to recover level curves of the surface using a method of characteristics. The level curves are interpolated to recover dense depth. In module (302), the camera model is perspective. Then, using 4 or more differential pairs of images, we may eliminate BRDF terms to extract two equations. The first directly yields the surface depth, while the second is an inhomogeneous quasilinear PDE in surface depth. Since depth is known from the first equation, the second equation may now be treated as a constraint on the surface normal.
Module (400) handles an area light source, where a diffuse BRDF is a quadratic function of the surface normal. The differential stereo relation now becomes a nonlinear PDE in surface depth, which may be solved using nonlinear optimization methods.
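As a rough sketch of the nonlinear-optimization step suggested for the area-source case, a per-pixel nonlinear relation F(z)=0 can be solved in the least-squares sense. The quadratic residual below is a hypothetical stand-in for the actual nonlinear differential stereo PDE, chosen only to illustrate the solver pattern:

```python
import numpy as np
from scipy.optimize import least_squares

# Generic sketch: solve a per-pixel nonlinear relation F(z) = 0 by nonlinear
# least squares. The quadratic residual is a hypothetical stand-in for the
# actual nonlinear differential stereo PDE of the area-source case.
def residuals(z, a, b, c):
    return a * z**2 + b * z + c            # toy per-pixel residual

n_pix = 16
a = np.full(n_pix, 1.0)
b = np.full(n_pix, -3.0)
c = np.full(n_pix, 2.0)                    # each residual has roots z = 1, z = 2
sol = least_squares(residuals, x0=np.full(n_pix, 2.5), args=(a, b, c))
z_est = sol.x                              # converges to the nearby root z = 2
```

In practice the residual couples neighboring pixels through the surface gradient, so a good initialization (e.g., from feature correspondences) matters.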
In module (500), with a depth sensor and RGB camera input, the differential stereo relation is a decoupled expression in the depth and surface normal. This decoupling can be exploited to design more efficient optimization algorithms (such as alternating minimization).
In module (601), the camera model is orthographic. Then, using 3 or more differential pairs of images, we may use module (102) to eliminate BRDF terms to derive an inhomogeneous quasilinear PDE in surface depth. This PDE can be solved to recover level curves of the surface using a method of characteristics. The level curves are interpolated to recover dense depth. Module (602) handles the situation where the camera model is perspective. Then, using 4 or more differential pairs of images, we may use module (102) to eliminate BRDF terms to extract two equations. The first directly yields the surface depth, while the second is a linear constraint on the surface gradient. The two constraints may be combined to yield a highly sparse linear system, which can be solved efficiently to recover the surface depth.
The system of
(i) For orthographic projections, we derive a first-order quasilinear partial differential equation (PDE) which can be solved for surface depth using a method of characteristics.
(ii) For perspective projections, we show that it is possible to directly estimate depth from image derivatives in four or more images.
(iii) For perspective images, we derive an additional constraint on the surface gradient.
(iv) We demonstrate that the depth and gradient constraints may be combined to yield an efficient solution for surface depth as a sparse linear system.
We exploit the rank deficiency in (a) to derive solutions for several camera and lighting conditions:
(i) For orthographic projections, we derive a first-order quasilinear partial differential equation (PDE) which can be solved for surface depth using a method of characteristics.
(ii) For perspective projections, we show that depth may be directly recovered by exploiting the rank deficiency, along with an additional PDE that constrains the surface normal.
(iii) For colocated lighting, we show that two differential pairs suffice for recovering shape.
(iv) For general directional lighting, we show that three differential pairs suffice for recovering shape, without requiring knowledge of lighting.
(v) When we have additional depth sensor input, our method can combine the depth input with surface normal information, thereby improving accuracy.
Next, one of our exemplary setups is discussed. The camera and lighting in our setup are fixed, while the object moves. The object BRDF is assumed isotropic and homogeneous (or having slow spatial variation), with an unknown functional form. The distant illumination may be directional or environment. Interreflections and shadows are assumed negligible. Let the focal length of the camera be f. The principal point on the image plane is defined as the origin of the 3D coordinate system, with the camera center at (0,0,−f)T. Denoting β=1/f, a 3D point x=(x, y, z)T is imaged at u=(u, v)T, where
u=x/(1+βz),v=y/(1+βz). (1)
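The projection model (1) can be sketched as follows; the focal length value below is hypothetical and chosen only for illustration:

```python
# A minimal sketch of the projection model in Eq. (1). The focal length
# f = 10 is a hypothetical value for illustration.
def project(x, y, z, f=10.0):
    beta = 1.0 / f
    return x / (1.0 + beta * z), y / (1.0 + beta * z)

# As f grows (beta -> 0), the model reduces to orthographic projection.
u, v = project(1.0, 2.0, 5.0)            # perspective
u0, v0 = project(1.0, 2.0, 5.0, f=1e9)   # approximately orthographic
```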
Differential motion is detailed next. Using the projection equations in (1), the motion field is given by
Consider a small rotation R≈I+[ω]× and translation τ=(τ1, τ2, τ3)T, where [ω]× is the skew-symmetric matrix of ω=(ω1, ω2, ω3)T. Then, ẋ=ω×x+τ for a point x on the object. In the perspective case, the motion field is
μ=(α1+(α2+ω2z)/(1+βz), α3+(α4−ω1z)/(1+βz))T, (3)
where α1=ω2βu2−ω1βuv−ω3v, α2=τ1−βuτ3, α3=−ω1βv2+ω2βuv+ω3u and α4=τ2−βvτ3. Under orthography, β→0, thus the motion field is
μ=(α5+ω2z,α6−ω1z)T, (4)
where α5=τ1−ω3v and α6=τ2+ω3u.
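The two motion-field formulas can be sketched together; the perspective form μ = (α1 + (α2+ω2z)/(1+βz), α3 + (α4−ω1z)/(1+βz))T is the one implied by the α definitions above (an assumption of this sketch), and all motion parameters below are hypothetical:

```python
import numpy as np

# Sketch of the motion-field formulas: Eq. (4) for orthography, and the
# perspective form implied by the alpha definitions in the text. As
# beta -> 0 the perspective field reduces to the orthographic one.
def motion_field(u, v, z, omega, tau, beta):
    w1, w2, w3 = omega
    t1, t2, t3 = tau
    if beta == 0.0:                        # orthographic case, Eq. (4)
        a5 = t1 - w3 * v
        a6 = t2 + w3 * u
        return np.array([a5 + w2 * z, a6 - w1 * z])
    a1 = w2 * beta * u**2 - w1 * beta * u * v - w3 * v
    a2 = t1 - beta * u * t3
    a3 = -w1 * beta * v**2 + w2 * beta * u * v + w3 * u
    a4 = t2 - beta * v * t3
    return np.array([a1 + (a2 + w2 * z) / (1 + beta * z),
                     a3 + (a4 - w1 * z) / (1 + beta * z)])
```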
Differential flow relation is now discussed. Assuming isotropic BRDF ρ, the image intensity of a 3D point x, imaged at pixel u, is
I(u,t)=σ(x)ρ(n,x), (5)
where σ is the albedo and n is the surface normal at the point. The cosine fall-off is absorbed within ρ. The BRDF ρ is usually written as a function of incident and outgoing directions, but for fixed lighting and view, can be seen as a function of surface position and orientation. This is a reasonable image formation model that subsumes traditional ones like Lambertian and allows general isotropic BRDFs modulated by spatially varying albedo. Note that we do not make any assumptions on the functional form of ρ, in fact, our theory will derive invariants that eliminate it.
Considering the total derivative on both sides of (5), using the chain rule, we have
(∇uI)Tμ+It=σ̇ρ+σ[(∇nρ)Tṅ+(∇xρ)Tẋ]. (6)
Since σ is intrinsically defined on the surface coordinates, its total derivative vanishes, σ̇=0 (for a rigorous derivation, please refer to Appendix 9). Noting that μ=(u̇, v̇)T is the motion field, the above can be rewritten as
(∇uI)Tμ+It=σ[(∇nρ)T(ω×n)+(∇xρ)Tv], (7)
where v is the linear velocity and we use ṅ=ω×n. Since lighting is distant and the BRDF is homogeneous (or with slow spatial variation), ∇xρ is negligible. Moreover, using standard vector identities, (∇nρ)T(ω×n)=(n×∇nρ)Tω. Denoting E=log I, the albedo is easily eliminated by dividing (7) through by I(u,t), to yield the differential flow relation:
(∇uE)Tμ+Et=(n×∇n log ρ)Tω. (8)
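The vector identity used in the step above is the scalar triple product; a numeric spot-check on arbitrary vectors:

```python
import numpy as np

# Numeric spot-check of the identity used above,
# (grad_n rho)^T (omega x n) = (n x grad_n rho)^T omega,
# an instance of the triple-product identity a.(b x c) = b.(c x a).
rng = np.random.default_rng(0)
g, omega, n = rng.standard_normal((3, 3))
lhs = g @ np.cross(omega, n)
rhs = np.cross(n, g) @ omega
assert np.isclose(lhs, rhs)
```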
The differential flow relation in (7) and (8) is a strict generalization of the brightness constancy relation used by the vast majority of prior works on optical flow. Indeed, with a constant BRDF ρ=1, the RHS in (7) or (8) vanishes, which is precisely the brightness constancy assumption. However, note that ρ=1 is physically unrealistic—even the most basic Lambertian assumption is ρ(n)=nTs, in which case (8) reduces to a well-known relation:
(∇uE)Tμ+Et=(n×s)Tω/(nTs). (9)
In the following, we explore the extent to which the motion field μ and object shape may be recovered using (8), under both orthographic and perspective image formation. Precisely, we show that it is possible to eliminate all BRDF and lighting effects in an image sequence, leaving a simple relationship between image derivatives, surface depths and normals.
Orthographic Projection is now discussed. We consider recovery of the shape of an object with unknown BRDF, using a sequence of differential motions. Under orthography, the motion field μ is given by (4). Denoting π=n×∇n log ρ, one may rewrite (8) as
pz+q=ωTπ, (10)
where, using (47), p and q are known entities given by
p=Euω2−Evω1 (11)
q=α5Eu+α6Ev+Et. (12)
Rank-Deficiency in an Image Sequence is discussed next. For m≧3, consider a sequence of m+1 images, E0, . . . , Em, where Ei is related to E0 by a known differential motion {ωi, τi}. We assume that the object undergoes general motion, that is, the set of vectors ωi, i=1, . . . , m, span R3. Then, from (10), we have a set of relations
piz+qi=πTωi, i=1, . . . ,m. (13)
Note that pi, qi and ωi are known from the images and calibration, while the surface depth z and the entity π related to normals and BRDF are unknown. It might appear at a glance that, using the above m relations in (13), one may set up a linear system B(z, πT)T=−q, where each row of the m×4 matrix B is [pi, −ω1i, −ω2i, −ω3i] and q=(q1, . . . , qm)T, to solve for both z and π at every pixel. However, note the form of pi=Euω2i−Evω1i, which means that the first column of B is a linear combination of the other three columns. Thus, the linear system is rank deficient (rank 3 in the general case when the set of vectors {ωi}, i=1, . . . , m, spans R3), whereby we have:
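The rank deficiency can be verified numerically; the image derivatives Eu, Ev and the motions below are hypothetical values for illustration:

```python
import numpy as np

# Sketch of the rank deficiency in the m x 4 system of Eq. (13): each row is
# [p_i, -w1_i, -w2_i, -w3_i] with p_i = Eu*w2_i - Ev*w1_i, so the first
# column is a linear combination of the next two.
rng = np.random.default_rng(1)
Eu, Ev = 0.7, -0.3
omegas = rng.standard_normal((5, 3))       # m = 5 motions spanning R^3
p = Eu * omegas[:, 1] - Ev * omegas[:, 0]
B = np.column_stack([p, -omegas])
assert np.linalg.matrix_rank(B) == 3       # rank 3, not full rank 4
```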
BRDF-Invariant Constraints on Surface
While one may not use (10) directly to obtain depth, we may still exploit the rank deficiency to infer information about the surface depth. For an object with unknown BRDF, observed under unknown lighting and orthographic camera, three differential motions suffice to yield a BRDF and lighting invariant relation between image derivatives and surface geometry. We have the parameterized solution
(z,πT)T=−B+q+k(1,−Ev,Eu,0)T, (14)
where B+ is the Moore-Penrose pseudoinverse of B and k an arbitrary scalar. Define γ=−B+q and γ′=(γ2, γ3, γ4)T. Then, we have the following two relations
z=γ1+k (15)
π=γ′+k(−Ev,Eu,0)T. (16)
From the definition of π, we have nTπ=0. Substituting from the above two relations (with k=z−γ1), we get
(λ1+λ2z)n1+(λ3+λ4z)n2−γ4n3=0, (17)
where λ1=−(γ2+γ1Ev), λ2=Ev, λ3=−γ3+γ1Eu and λ4=−Eu. Noting that n1/n3=−zx and n2/n3=−zy, we may rewrite (17) as
(λ1+λ2z)zx+(λ3+λ4z)zy+γ4=0, (18)
which is independent of BRDF and lighting.
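The parameterized solution (14) can be sketched numerically; the image derivatives, motions and vector q below are hypothetical, and the check confirms that (1, −Ev, Eu, 0)T spans the null space of the system, which is the source of the single free scalar k:

```python
import numpy as np

# Sketch of the parameterized solution (14): gamma = -B^+ q plus an
# arbitrary multiple k of the null direction (1, -Ev, Eu, 0)^T.
rng = np.random.default_rng(2)
Eu, Ev = 0.7, -0.3
omegas = rng.standard_normal((4, 3))
p = Eu * omegas[:, 1] - Ev * omegas[:, 0]
B = np.column_stack([p, -omegas])
q = rng.standard_normal(4)
gamma = -np.linalg.pinv(B) @ q
null_dir = np.array([1.0, -Ev, Eu, 0.0])
assert np.allclose(B @ null_dir, 0.0)       # the one-parameter ambiguity
for k in (0.0, 1.0, -2.5):                  # every k fits the data equally well
    assert np.allclose(B @ (gamma + k * null_dir), B @ gamma)
```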
Thus, we may directly relate surface depth and gradient to image intensity, even for unknown BRDF and illumination. This is a fundamental constraint that relates object shape to motion, regardless of choice of reconstruction method.
Surface Depth Estimation is discussed next. We consider the precise extent to which surface depth may be recovered using the invariant (18). We first consider the simpler case of a colocated source and sensor, where an isotropic BRDF is given by ρ(nTs), for an unknown function ρ. For our choice of coordinate system, s=(0,0,−1)T. Recall that π=n×∇n log ρ. It is easily verified that π3=0, thus, γ4=0 using (14). The relation in (18) now becomes
zx/zy=−(λ3+λ4z)/(λ1+λ2z) (19)
where the λi, i=1, . . . , 4, are defined as before. Now, we are in a position to state the following result:
Two or more differential motions of a surface with unknown BRDF, with a colocated source and sensor, yield level curves of surface depth, corresponding to known depths of some (possibly isolated) points on the surface. Define a=(λ1+λ2z, λ3+λ4z)T. Then, from (19),
aT∇z=0. (20)
Since ∇z is orthogonal to the level curves of z, the tangent space to the level curves of z is defined by a. Consider a rectifiable curve C(x(s), y(s)) parameterized by the arc length parameter s. The derivative of z along C is given by
dz/ds=zx(dx/ds)+zy(dy/ds). (21)
If C is a level curve of z(x, y), then dz/ds=0 on C. Define t=(dx/ds, dy/ds). Then, we also have
tT∇z=0. (22)
From (20) and (22), it follows that a and t are parallel. Thus, t2/t1=a2/a1, whereby we get
dy/dx=(λ3+λ4z)/(λ1+λ2z). (23)
Along a level curve z(x, y)=c, the solution is given by the ODE
dy/dx=(λ3+λ4c)/(λ1+λ2c). (24)
Given the value of z at any point, the ODE (24) determines all other points on the surface with the same value of z.
Thus, (19) allows reconstruction of level curves of the surface, with unknown BRDF, under colocated illumination. Note that (19) is a first-order, homogeneous, quasilinear partial differential equation (PDE). Similarly, we may interpret (18) as a PDE in z(x, y), in particular, it is an inhomogeneous, first-order, quasilinear PDE. This immediately suggests the following surface reconstructibility result in the general case:
Three or more differential motions of a surface with unknown BRDF, under unknown illumination, yield characteristic surface curves C(x(s), y(s), z(s)), defined by
dx/ds=λ1+λ2z, dy/ds=λ3+λ4z, dz/ds=−γ4, (25)
corresponding to depths at some (possibly isolated) points.
Surface Reconstruction
Given depth z0 at point (x0, y0)T, for a small step size ds, the relations (24) or (25) yield (dx,dy,dz)T, such that (x0+dx, y0+dy)T lies on the characteristic curves of (18) through (x0, y0)T, with depth z0+dz. The process is repeated until the entire characteristic curve is estimated.
Note that dz is zero for the colocated case since characteristic curves correspond to level curves of depth, while it is in general non-zero for the non-colocated case. In practice, initial depths z0 may be obtained from feature correspondences, or the occluding contour in the non-colocated case.
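The stepping scheme above can be sketched as a simple Euler integration, assuming the characteristic directions of the quasilinear PDE are dx=(λ1+λ2z)ds, dy=(λ3+λ4z)ds, dz=−γ4 ds; the constant coefficients below are hypothetical (in practice they vary per pixel, computed from image derivatives):

```python
import numpy as np

# Sketch of characteristic-curve tracing by forward Euler steps. The
# lambda coefficients and gamma4 are hypothetical constants here; in
# practice they are per-pixel quantities from the images.
def trace_characteristic(x0, y0, z0, coeffs, gamma4, ds=1e-3, n_steps=1000):
    l1, l2, l3, l4 = coeffs
    x, y, z = x0, y0, z0
    pts = [(x, y, z)]
    for _ in range(n_steps):
        x += (l1 + l2 * z) * ds
        y += (l3 + l4 * z) * ds
        z -= gamma4 * ds
        pts.append((x, y, z))
    return np.array(pts)

# Colocated case (gamma4 = 0): depth stays constant, giving a level curve.
curve = trace_characteristic(0.0, 0.0, 1.0, (0.5, 0.1, -0.2, 0.3), 0.0)
```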
Perspective Projection
In this section, we relax the assumption of orthography. Surprisingly, we obtain even stronger results in the perspective case, showing that with four or more differential motions with unknown BRDF, we can directly recover surface depth, as well as a linear constraint on the derivatives of the depth. Strictly speaking, our theory is an approximation in the perspective case, since viewing direction may vary over object dimensions, thus, ∇xρ may be non-zero in (7). However, we illustrate in this section that accounting for finite focal length has benefits, as long as the basic assumption is satisfied that object dimensions are small compared to camera and source distance (which ensures that ∇xρ is negligibly small).
Differential Flow Relation
In the perspective case, one may rewrite (8) as (compare to the linear relation in (10) for the orthographic case)
q′+(p′z+r′)/(1+βz)=πTω, (26)
where p′=Euω2−Evω1, q′=α1Eu+α3Ev+Et and r′=α2Eu+α4Ev are known entities, using (3).
Now, one may derive a theory similar to the orthographic case by treating z/(1+βz), 1/(1+βz) and π as independent variables and using the rank deficiency (note the form of p′) arising from a sequence of m≧4 differential motions. We omit those derivations, but note that most of the observations in the preceding section for the orthographic case hold true in the perspective case too, albeit with the requirement of one additional image.
Instead, in the following, we take a closer look at the perspective equations for differential flow, to show that they yield a more comprehensive solution for surface geometry.
BRDF-Invariant Depth Estimation
We demonstrate that under perspective projection, object motion can completely specify the surface depth, without any initial information:
Four or more differential motions of a surface with unknown BRDF, under unknown illumination, suffice to yield, under perspective projection, the surface depth and a linear constraint on the surface gradient:
For m≧4, let images E1, . . . , Em be related to E0 by known differential motions {ωi, τi}, where ωi span R3. From (26), we have a sequence of differential flow relations
(p′i+βq′i)z−((1+βz)π)Tωi+(q′i+r′i)=0, (27)
for i=1, . . . , m. Let ci=[p′i+βq′i, −ω1i, −ω2i, −ω3i]T be the rows of the m×4 matrix C=[c1, . . . , cm]T. Let q′=[q′1, . . . , q′m]T and r′=[r′1, . . . , r′m]T. Then, we may rewrite the system (27) as
C(z, ((1+βz)π)T)T=−(q′+r′), (28)
which yields the solution
(z, ((1+βz)π)T)T=−C+(q′+r′), (29)
where C+ is the Moore-Penrose pseudoinverse of C. Define ε=−C+(q′+r′) and ε′=(ε2, ε3, ε4)T. Then, we have
z=ε1,(1+βz)π=ε′. (30)
By definition, π=n×∇n log ρ, thus, nTπ=0. We now have two separate relations for depths and normals:
z=ε1 (31)
nTε′=0. (32)
Thus, in the perspective case, one may directly use (31) to recover the surface depth. Further, noting that n1/n3=−zx and n2/n3=−zy, we may rewrite (32) as
ε2zx+ε3zy−ε4=0, (33)
which is a linear constraint on surface depth derivatives.
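The direct depth recovery of (27)-(31) can be sketched on synthetic data; the coefficients standing in for p′i+βq′i, the motions and π are all hypothetical, and depth is read off the first component of the pseudoinverse solution:

```python
import numpy as np

# Sketch of perspective depth recovery: stack the m relations of Eq. (27)
# into C w = -(q' + r') with w = (z, ((1+beta*z) pi)^T)^T, and solve by
# pseudoinverse as in Eqs. (28)-(31).
rng = np.random.default_rng(3)
m, beta, z_true = 5, 0.1, 2.0
pi_true = rng.standard_normal(3)
omegas = rng.standard_normal((m, 3))
coeff = rng.standard_normal(m)             # stands in for p'_i + beta*q'_i
C = np.column_stack([coeff, -omegas])
w_true = np.concatenate([[z_true], (1 + beta * z_true) * pi_true])
rhs = C @ w_true                           # plays the role of -(q' + r')
eps = np.linalg.pinv(C) @ rhs
z_est = eps[0]                             # depth recovered directly
```

Unlike the orthographic case, C is full rank here (the β term breaks the column dependency), so no free scalar remains and depth is determined without any seed points.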
Again, in the simpler case of colocated illumination, we observe that ε4=0, thus, the minimal imaging requirement is three motions. Further, from (32), the ratio −ε2/ε3 yields the slope of the gradient, leading to:
Three or more differential motions of a surface with unknown BRDF, under unknown illumination, suffice to yield under perspective projection the surface depth and the slope of the gradient. Even when BRDF and illumination are unknown, one may derive an invariant that relates shape to object motion, through a linear relation and a linear PDE on the surface depth. Again, we note that this is a fundamental constraint, independent of any particular reconstruction approach.
Surface Reconstruction is detailed next. Under perspective projection, one may directly recover the surface depth using (31), for an object with unknown BRDF imaged under unknown illumination after four arbitrary differential motions. No prior knowledge of the surface is required in the perspective case, even at isolated points. Depth and normal information may be combined by solving a linear system that stacks, at every pixel, the two constraints (31) and (33) on depths and gradients:
z=ε1, λ(ε2zx+ε3zy−ε4)=0, (34)
where λ is a relative weighting term. Standard discretization schemes may be used to represent zx and zy. Then, the above is a highly sparse linear system in the depths z, which may be solved using a linear least squares solver. Incorporating gradient constraints has the effect of regularizing the depth estimation by introducing neighborhood information, which may be advantageous in noisy scenarios.
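A minimal sketch of this sparse system, using forward differences for zx, zy on a small grid; the ε coefficients below are synthetic and chosen to be consistent with a known planar depth, so the solver should recover it:

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

# Sketch: stack per-pixel depth equations z = eps1 (Eq. 31) with weighted
# gradient equations lam*(eps2*zx + eps3*zy) = lam*eps4 (Eq. 33), using
# forward differences on an N x N grid, and solve by sparse least squares.
N, lam = 8, 0.5
z_true = np.fromfunction(lambda i, j: 0.1 * i + 0.05 * j, (N, N))
idx = lambda i, j: i * N + j

A = lil_matrix((2 * N * N, N * N))
b = np.zeros(2 * N * N)
row = 0
for i in range(N):
    for j in range(N):
        A[row, idx(i, j)] = 1.0            # depth constraint
        b[row] = z_true[i, j]
        row += 1
e2, e3 = 1.0, -2.0                         # hypothetical gradient coefficients
for i in range(N - 1):
    for j in range(N - 1):
        zx = z_true[i + 1, j] - z_true[i, j]
        zy = z_true[i, j + 1] - z_true[i, j]
        A[row, idx(i + 1, j)] = lam * e2   # gradient constraint
        A[row, idx(i, j + 1)] = lam * e3
        A[row, idx(i, j)] = -lam * (e2 + e3)
        b[row] = lam * (e2 * zx + e3 * zy)
        row += 1
z_est = lsqr(A.tocsr(), b)[0].reshape(N, N)
```

Each row touches at most three unknowns, so the system stays highly sparse even for large images.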
Stratification of Shape from Motion is discussed next. For the Lambertian BRDF, under known directional lighting, prior work relates shape and image derivatives by a quasilinear PDE, using special considerations of the two-view setup to arrive at the result. In the context of our theory, under a directional light source s=(s1, s2, 1)T/√(s12+s22+1), we have ρ(n)=nTs. Then, we may rewrite the basic relation in (8) as (9). For the orthographic case, using (11) and (12), we may again rewrite (9) as:
with known scalars λ1′=ω2−ω3s2, λ2′=ω1+ω3s1 and λ3′=−ω1s2+ω2s1. Note that (35) is a quasilinear PDE. It may be verified that the perspective case can also be written as a quasilinear PDE:
In particular, the present framework can also handle general BRDFs, unknown directional or area lighting and various camera projections.
The system analyzes motion which reveals the shape, with unknown isotropic BRDF and arbitrary, unknown distant illumination, for orthographic and perspective projections. We derive differential flow invariants that relate image derivatives to shape and exactly characterize the object geometry that can be recovered. This work generalizes traditional notions of brightness constancy or Lambertian BRDFs in the optical flow and multiview stereo literatures. Our results are not just valid for a particular approach to reconstruction, rather they impose fundamental limits on the hardness of surface reconstruction. In the process, we also present a stratification of shape from motion that relates hardness of reconstruction to scene complexity—qualitatively in terms of the nature of the involved PDE and quantitatively in terms of the minimum number of required motions.
Many of the relations, such as (19), (35) and (36), may be expressed in the form f(z)=g(n). With the availability of depth sensors, it becomes possible to measure f(z), which reduces the optimization to solving for n alone. Alternating minimization approaches with good accuracy and convergence behavior can then be used to simultaneously estimate depth and normals.
The invention may be implemented in hardware, firmware or software, or a combination of the three. Preferably the invention is implemented in a computer program executed on a programmable computer having a processor, a data storage system, volatile and non-volatile memory and/or storage elements, at least one input device and at least one output device.
By way of example, a block diagram of a computer to support the system is discussed next. The computer preferably includes a processor, random access memory (RAM), a program memory (preferably a writable read-only memory (ROM) such as a flash ROM) and an input/output (I/O) controller coupled by a CPU bus. The computer may optionally include a hard drive controller which is coupled to a hard disk and CPU bus. Hard disk may be used for storing application programs, such as the present invention, and data. Alternatively, application programs may be stored in RAM or ROM. I/O controller is coupled by means of an I/O bus to an I/O interface. I/O interface receives and transmits data in analog or digital form over communication links such as a serial link, local area network, wireless link, and parallel link. Optionally, a display, a keyboard and a pointing device (mouse) may also be connected to I/O bus. Alternatively, separate connections (separate buses) may be used for I/O interface, display, keyboard and pointing device. Programmable processing system may be preprogrammed or it may be programmed (and reprogrammed) by downloading a program from another source (e.g., a floppy disk, CD-ROM, or another computer).
Each computer program is tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
The invention has been described herein in considerable detail in order to comply with the patent Statutes and to provide those skilled in the art with the information needed to apply the novel principles and to construct and use such specialized components as are required. However, it is to be understood that the invention can be carried out by specifically different equipment and devices, and that various modifications, both as to the equipment details and operating procedures, can be accomplished without departing from the scope of the invention itself.
This application is a utility conversion and claims priority to Provisional Application Ser. 61/725,728 filed Nov. 13, 2012, the content of which is incorporated by reference.
U.S. Patent Documents Cited

Number | Name | Date | Kind
5500904 | Markandey | Mar 1996 | A
20070206008 | Kaufman | Sep 2007 | A1
20070211069 | Baker | Sep 2007 | A1

Publication

Number | Date | Country
20140132727 A1 | May 2014 | US

Priority Application

Number | Date | Country
61725728 | Nov 2012 | US