This invention relates generally to computer vision, and more particularly to factorizing images of a scene that is subject to lighting variations into basis images.
Edge detection is a fundamental problem in computer vision. Edge detection provides important low level features for many applications. Edges in images of a scene can result from different causes, including depth discontinuities, differences in surface orientation, surface texture, changes in material properties, and varying lighting.
Many methods model edges as changes in low-level image properties, such as brightness, color, and texture, within an individual image. Yet, the issue of identifying image pixels that correspond to 3D geometric boundaries, which are discrete changes in surface depth or orientation, has received less attention.
Raskar, in U.S. Pat. No. 7,295,720 B2, detects depth edges by applying a shadow-based technique using a multi-flash camera. That method applies only to depth discontinuities, not changes in surface normal, and requires a controlled set of lights that encircles the lens of a camera.
3D geometric boundaries accurately represent characteristics of scenes that can provide useful cues for a variety of tasks including segmentation, scene categorization, 3D reconstruction, and scene layout recovery.
In “Deriving intrinsic images from image sequences,” ICCV 2001, Volume: 2, Page(s): 68-75 vol.2, Weiss et al. describe a sequence of images of a scene that undergoes illumination changes. Each image in the sequence is factorized into a product of a single, constant reflectance image and an image-specific illumination image.
U.S. Pat. No. 7,756,356 describes factoring a time-lapse photographic sequence of an outdoor scene into shadow, illumination, and reflectance components, which can facilitate scene modeling and editing applications. That method assumes a single point light source at infinity (the sun), which is moving smoothly over time, and an ambient lighting component.
In “Appearance derivatives for isonormal clustering of scenes,” IEEE TPAMI, 31(8):1375-1385, 2009, Koppal et al. describe image sequences that are acquired by waving a distant light source around a scene. The images are then clustered into regions with similar surface normals. That work also assumes a single distant light source whose position varies smoothly over time and an orthographic camera model.
Our invention considers a set of images of a static scene (which can be in an indoor or outdoor environment), acquired by a stationary camera under varying illumination conditions. One objective of the invention is to detect 3D geometric boundaries from the set of images.
Another objective of the invention is to factorize these images into a set of basis images. In these applications, the positions of the light sources are unknown, the lights are not necessarily point sources, and distances from the lights (and camera) to the scene cannot be assumed to be infinite, because they are not necessarily much larger (i.e., larger by one or more orders of magnitude) than a size of the scene. This breaks the assumptions of existing methods for recovering 3D structure from 2D images under varying lighting such as photometric stereo, structured light, and isonormal clustering, and of existing methods for factorizing the effects of illumination from a set of images such as factored time-lapse video and intrinsic images.
The embodiments of the invention provide a method for recovering lighting basis images from a set of images of a static scene, captured with a fixed camera viewpoint, under unknown and varying illumination conditions. In some embodiments, 3D geometric boundaries are detected.
One objective of some embodiments is to identify 3D geometric boundaries in a set of 2D images of a static scene (which can be in an indoor environment) that is subject to unknown and changing illumination. As strictly defined herein, a 3D geometric boundary as observed in images of a scene is a contour that separates two surfaces in the scene where there is a 3D depth discontinuity, or a significant change in surface orientation. These boundaries can be used effectively to understand the 3D layout of the scene. The 3D geometric boundary is different from a 2D edge, such as a texture edge or shadow edge.
To distinguish 3D geometric boundaries from 2D texture edges, some embodiments of the invention analyze the illumination subspace of local appearance at each image location. This is based on the realization that for non-specular, e.g., Lambertian, surfaces, neighboring pixels on the same smooth 3D surface tend to have the same relative response to lighting even though the pixels may have different colors, i.e., albedos or reflection coefficients. The reason is that in a small neighborhood the 3D surface is locally planar, so two points on the surface that correspond to adjacent pixels have approximately the same normal. The distance between these two points is typically much smaller than the distance to any of the light sources and the camera.
Based on this realization, the 3D geometric boundary detection method can distinguish pixels associated with 3D geometric boundaries, i.e., pixels whose immediate neighborhoods in the image include a discontinuity in surface normal or in depth, from pixels whose neighborhoods may contain sharp texture or intensity boundaries but correspond to a single surface.
The method formulates 3D geometric boundary detection as a per-pixel classification problem by analyzing the illumination subspace of local appearance at each pixel location. Specifically, the method uses the dimension of the illumination subspace to indicate the presence of a 3D geometric boundary.
One objective of the invention is to determine a set of lighting basis images from the set of images of a static scene subject to unknown and changing illumination due to combinations of a set of stationary light sources. A lighting basis image is the image that would be formed when the scene is illuminated by one of the individual light sources. The light sources do not need to be point light sources. The basis images provide a natural, succinct representation of the scene, with qualitative and quantitative improvement when compared with the prior art, to enable scene editing (such as relighting) and identification and removal of shadow edges.
In some embodiments of the invention, the method for recovery of lighting basis images uses semi-binary nonnegative matrix factorization (SBNMF). SBNMF is related to nonnegative matrix factorization (NMF), which factors a nonnegative data matrix into a product of two nonnegative matrices, and for which many techniques are known.
Unlike NMF, SBNMF factors a nonnegative data matrix into a product of a nonnegative matrix and a binary matrix, where a binary matrix is a matrix in which each element is either 0 or 1. That is, the method factors a nonnegative matrix containing the images into a nonnegative matrix of lighting basis images and a binary weight matrix that indicates which light sources are ON or OFF for each image. The recovered set of lighting basis images provides a compact representation of the scene under varying lighting.
In some embodiments, the basis images can be used in conjunction with the method for 3D geometric boundary detection to distinguish shadow edges from true 3D geometry edges.
Generative Image Model
There are l light sources illuminating a scene, with each light source controlled by an independent ON/OFF switch. If there is a group of two or more lights that are either all ON or all OFF together (such that there is no input image in which one of the lights in the group is ON while another light in the group is OFF), then the group of lights is considered a single light source. For example, two lights that are controlled by the same switch are considered a single light source.
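We assign a binary variable $w_i$ to indicate the status of each light source i. Then, we define a basis image $v_i \in \mathbb{R}_+^n$, where n is the number of pixels, as the image formed when only the ith light source is ON. An image y that is lit by any combination of the l light sources is then a superposition of the individual basis images:
$$y = \sum_{i=1}^{l} w_i v_i, \quad w_i \in \{0, 1\}. \tag{1}$$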
Herein, we express every image as a column vector formed by stacking the values of all pixels in the image into a single column.
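We acquire the set Y of m images lit by various binary combinations of the l light sources, and arrange the image data into a matrix with one n-pixel image per column:
$$Y = [y_1, y_2, \ldots, y_m] \in \mathbb{R}_+^{n \times m}.$$
Following equation (1), this data matrix can be factorized as:
$$Y = VW, \tag{2}$$
where the columns of $V \in \mathbb{R}_+^{n \times l}$ are the l lighting basis images $v_i$, and $W \in \{0, 1\}^{l \times m}$ is an indicator matrix whose entry $W_{ij}$ indicates whether light source i is ON in image j.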
Note that if there is ambient lighting in the scene (light that is present in every image), in our model this can be modeled by an additional basis image (an additional column in the matrix V) and a corresponding additional row of the indicator matrix W whose elements are all 1.
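The generative model can be illustrated with a small numerical sketch. The following Python/NumPy snippet is only an illustration under assumed toy sizes (6 pixels, 2 switchable lights, 4 images); the random basis images and the particular ON/OFF pattern are hypothetical and are not taken from any experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pixels, n_lights = 6, 2            # toy sizes (assumed for illustration)

# Columns of V: one nonnegative basis image per switchable light,
# plus one extra column that models ambient lighting.
V = rng.uniform(0.1, 1.0, size=(n_pixels, n_lights + 1))

# Rows of W: ON/OFF state of each light in each of the 4 images.
# The last row belongs to the ambient component and is all 1.
W = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 1, 1]])

Y = V @ W                            # each column of Y is one observed image
```

In this sketch the third image (both lights ON) equals the sum of all three columns of V, and the fourth image (both lights OFF) equals the ambient column alone.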
In some embodiments, if a single light source is moved to a different position between the times at which two input images are acquired, we consider that light source to be two separate light sources. For example, if two input images acquired several hours apart are both illuminated by the sun, we consider the sun in the first image to be a first light source and the sun in the second image to be a second light source.
Recovering Basis Images via SBNMF
In some embodiments, we recover the lighting basis images and indicator matrix from the set of input images using SBNMF. If the true lighting basis images are linearly independent, and we observe sufficient illumination variability, i.e., the rank of the true indicator matrix W is not less than the number of lights, then the number of lights l in the scene is given by the rank of the data matrix Y.
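The rank test can be sketched numerically. The following Python/NumPy snippet is a minimal sketch, assuming image noise is small enough that a relative threshold on the singular values separates signal from noise; the function name `estimate_num_lights` and the tolerance `rel_tol` are illustrative choices and not part of the method as described.

```python
import numpy as np

def estimate_num_lights(Y, rel_tol=1e-6):
    """Numerical rank of Y: count singular values above rel_tol * largest.

    rel_tol is an assumed threshold; noisy images would need a larger value.
    """
    s = np.linalg.svd(Y, compute_uv=False)
    if s[0] == 0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(1)
V = rng.uniform(0.1, 1.0, size=(50, 3))        # 3 hypothetical basis images
W = np.array([[1, 0, 0, 1, 1],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1]], dtype=float)   # rank 3: sufficient variability
Y = V @ W
num_lights = estimate_num_lights(Y)
```

If two lights always switched together, the corresponding rows of W would coincide, the rank would drop by one, and the pair would be counted as a single light source, which matches the grouping convention stated earlier.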
We formulate recovery of the basis images and indicator matrix as a constrained optimization problem:
$$\min_{V, W} \|Y - VW\|_F^2 \quad \text{s.t.} \quad V_{ij} \ge 0, \; W_{jk} \in \{0, 1\}, \; \forall i, j, k, \tag{3}$$
which we call the SBNMF. This is a challenging problem due to the non-convex objective function and the binary constraints on W. Therefore, we initially solve a continuous relaxation:
$$\min_{V, W} \|Y - VW\|_F^2 \quad \text{s.t.} \quad V_{ij} \ge 0, \; 0 \le W_{jk} \le 1, \; \forall i, j, k, \tag{4}$$
where the binary constraints on $W_{jk}$ are replaced by simple box constraints of lower and upper bounds. This is a bi-convex problem, which we solve using the alternating direction method of multipliers (ADMM). ADMM is a variant of the conventional augmented Lagrangian method, as described below.
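We rewrite equation (4) using an auxiliary variable X, and replacing positivity and box constraints by indicator functions:
$$\min_{X, V, W} \|Y - X\|_F^2 + I_{[0,\infty)}(V) + I_{[0,1]}(W) \quad \text{s.t.} \quad X - VW = 0, \tag{5}$$
where an indicator function $I_S(x)$ takes value 0 if $x \in S$ and equals ∞ everywhere else.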
Next, we form the augmented Lagrangian:
$$L(X, V, W, U) = \|Y - X\|_F^2 + I_{[0,\infty)}(V) + I_{[0,1]}(W) + \frac{\mu}{2}\|X - VW + U\|_F^2 - \frac{\mu}{2}\|U\|_F^2, \tag{6}$$
where U is the scaled dual variable and μ is the augmented Lagrangian parameter. We use the scaled form of the augmented Lagrangian function in which the scaled Lagrangian multiplier is redefined as U=Z/μ, where Z is the original Lagrange multiplier.
ADMM solves the augmented Lagrangian dual function by a set of convex subproblems where the biconvex function is decoupled:
These subproblems are iteratively solved until convergence of primal and dual residuals.
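The individual subproblems are not reproduced in this text. One standard way to decouple the scaled augmented Lagrangian (6), offered here as an assumption rather than as the exact updates of the method, alternates over the blocks (X, V), W, and U, with k denoting the iteration index:

```latex
\begin{aligned}
(X^{k+1}, V^{k+1}) &= \arg\min_{X,\; V \ge 0} \; \|Y - X\|_F^2
    + \frac{\mu}{2}\,\|X - V W^{k} + U^{k}\|_F^2, \\
W^{k+1} &= \arg\min_{0 \le W \le 1} \; \|X^{k+1} - V^{k+1} W + U^{k}\|_F^2, \\
U^{k+1} &= U^{k} + X^{k+1} - V^{k+1} W^{k+1}.
\end{aligned}
```

Under this splitting, the first two problems are convex least-squares problems with simple bound constraints, because the product VW is linear in either factor when the other is held fixed, and the last line is the usual scaled dual update.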
Following that, we round each entry of the matrix W to {0, 1}, and determine the basis images V based on the binary indicator matrix using nonnegative least squares:
$$\min_{V} \|Y - VW\|_F^2 \quad \text{s.t.} \quad V_{ij} \ge 0, \; \forall i, j. \tag{10}$$
Note that because W is constant in the optimization (10), the problem is convex.
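The rounding and refitting step can be sketched as follows. This Python/NumPy snippet is a minimal sketch, not the implementation of the method: the rounding threshold of 0.5 and the use of projected gradient descent as the nonnegative least squares solver are assumptions made for illustration, and any other NNLS solver could be substituted. The function name `refit_basis_images` is hypothetical.

```python
import numpy as np

def refit_basis_images(Y, W_relaxed, n_iter=500):
    """Round W to {0, 1}, then solve min_V ||Y - V W||_F^2 subject to V >= 0.

    Assumptions: threshold 0.5 for rounding; projected gradient descent as
    the NNLS solver; the rounded W has at least one nonzero entry.
    """
    W = (W_relaxed >= 0.5).astype(float)
    G = W @ W.T                                # l x l Gram matrix of W
    step = 1.0 / np.linalg.eigvalsh(G)[-1]     # 1 / largest eigenvalue of G
    V = np.zeros((Y.shape[0], W.shape[0]))
    for _ in range(n_iter):
        grad = (V @ W - Y) @ W.T               # gradient of 0.5 * ||Y - V W||_F^2
        V = np.maximum(V - step * grad, 0.0)   # gradient step, then project onto V >= 0
    return V, W

rng = np.random.default_rng(2)
V_true = rng.uniform(0.1, 1.0, size=(20, 2))
W_true = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])
Y = V_true @ W_true
W_relaxed = np.array([[0.9, 0.1, 0.8],         # hypothetical relaxed solution
                      [0.2, 0.7, 1.0]])
V_hat, W_hat = refit_basis_images(Y, W_relaxed)
```

Because W is fixed after rounding, the objective is convex in V, so the projected gradient iteration converges to a global minimizer of this subproblem.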
Note that in other embodiments, the individual light sources are not restricted to being only ON or OFF; instead, the light sources' intensities can change continuously (e.g., when a dimmer switch is used). In this case, the indicator coefficients in the indicator matrix W are not restricted to binary values {0, 1} but can be any nonnegative real numbers, and every input image is a nonnegative linear combination of the l lighting basis images. In such cases, the factorization can be performed using conventional nonnegative matrix factorization.
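For the continuous-intensity case, a conventional NMF can be sketched with the well-known multiplicative update rules for the Frobenius objective. The snippet below is a hedged illustration only: the text does not say which NMF algorithm is used, so the choice of multiplicative updates, the function name `nmf_multiplicative`, the iteration count, and the small constant `eps` are all assumptions.

```python
import numpy as np

def nmf_multiplicative(Y, rank, n_iter=200, eps=1e-12, seed=0):
    """Factor Y ~ V W with V, W >= 0 using multiplicative updates.

    Assumption: Frobenius-norm objective; eps only guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    V = rng.uniform(0.1, 1.0, size=(n, rank))
    W = rng.uniform(0.1, 1.0, size=(rank, m))
    for _ in range(n_iter):
        W *= (V.T @ Y) / (V.T @ V @ W + eps)
        V *= (Y @ W.T) / (V @ W @ W.T + eps)
    return V, W

rng = np.random.default_rng(3)
V_true = rng.uniform(0.1, 1.0, size=(30, 2))
W_dimmer = np.array([[1.0, 0.3, 0.0, 0.6],     # hypothetical dimmer levels
                     [0.0, 0.8, 1.0, 0.4]])
Y = V_true @ W_dimmer
V_hat, W_hat = nmf_multiplicative(Y, rank=2)
```

NMF solutions are in general not unique, so the recovered factors may differ from the true basis images by scaling and mixing; additional constraints would be needed to pin them down.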
Detecting 3D Geometric Boundaries
In some embodiments, we detect 3D geometric boundaries in the set of images acquired of the scene. As observed in images of the scene, 3D geometric boundaries are contours that separate two surfaces in the scene where there is a 3D depth discontinuity, or where there is significant change in surface normals. For typical indoor scenes, a distant lighting assumption is not valid. To allow for nearby lighting, we consider one small image patch at a time, and analyze how the local appearance of that patch varies with multiple lighting conditions. The method can use patches with a variety of shapes and sizes. For example, we can consider a square or circular patch with a fixed diameter (e.g., 7 pixels) centered at each image pixel.
If all pixels in a patch come from a single smooth surface in the scene, then the patch appearance across varying lighting forms a one-dimensional subspace. If the patch contains a 3D geometric boundary, then the appearance subspace of the patch generally has dimension greater than one.
Illumination Subspace of Local Appearance
For simplicity, we justify our method for Lambertian surfaces with only a direct lighting component, but an analogous argument applies to a broader class of reflectance functions and indirect lighting, e.g., multiple reflections. To simplify the explanation, we describe only point light sources, because an extended isotropic light source can be arbitrarily well approximated as a superposition of multiple point light sources.
We describe our notation for light source A, and that for source B is analogous. The surface normal at point i is $\hat{n}_i$, and the vector from point i to light A is $r_i^a$ (the corresponding unit vector is $\hat{r}_i^a$). The intensity of the point on the image plane that corresponds to surface point i is $I_i^a$ (for light source A) or $I_i^b$ (for light B):
$$I_i^a = \gamma_i^a \frac{\hat{n}_i^T \hat{r}_i^a}{\|r_i^a\|^2} E^a \rho_i, \qquad I_i^b = \gamma_i^b \frac{\hat{n}_i^T \hat{r}_i^b}{\|r_i^b\|^2} E^b \rho_i. \tag{11}$$
Here $\hat{n}_i^T \hat{r}_i^a$ is the cosine of the angle between $\hat{n}_i$ and $r_i^a$, $E^a$ is the radiance intensity of light source A, and $\rho_i$ is the surface albedo at point i. The binary value $\gamma_i^a = 1$ if point i is illuminated by source A, whereas $\gamma_i^a = 0$ if point i is not illuminated by source A due to an attached or cast shadow.
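For each of the three surfaces, points 1 and 2 are near each other from the perspective of the camera, so the points will both be included in the same small image patch. When both points lie on the same smooth surface, they share approximately the same normal and the same vector to each light source:
$$\hat{n}_1 \approx \hat{n}_2, \quad r_1^a \approx r_2^a, \quad r_1^b \approx r_2^b. \tag{12}$$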
Because all points in the patch share approximately the same normal and the same vector to each light source, we can eliminate the subscripts i in equation (11) and use $\hat{n}$, $r^a$, and $r^b$ for all points in the patch. For now, we assume that every point i in the patch shares a single value for $\gamma_i^a$ (which we call $\gamma^a$) and shares a single value $\gamma^b$ of $\gamma_i^b$, which means that for each light source, the entire patch is either illuminated by or shadowed from that light, i.e., the patch contains no shadow edges. We consider shadow edges below.
Let $P^a$ and $P^b$ represent the vector of pixel intensities of the patch imaged under light A alone and light B alone, respectively. For the case in which all points in the patch lie on a single smooth surface, we have
$$P^a = k^a \rho, \tag{13}$$
where the scalar $k^a$ is constant for all pixels in the patch, and ρ is the vector of surface albedos for all of the pixels in the patch. For the same patch under light source B, we have the analogous equation: $P^b = k^b \rho$.
Thus, if a patch contains no sudden changes in normal nor in depth (and no shadow edges), then the pixel intensities under any light source are equal to a scalar multiple of ρ. In other words, the subspace spanned by the appearance of that local patch under all light sources (which we call the illumination subspace of local appearance) is one-dimensional (1D). Note that this is true regardless of the surface texture (albedo). Even if the surface albedo of the patch contains high-contrast texture edges, its illumination subspace of local appearance is still 1D.
This realization is at the heart of our method for finding geometric edges, because the same is not generally true if a patch contains a 3D geometric edge.
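The contrast between the two cases can be checked numerically. The sketch below renders a tiny patch under three nearby point lights using a Lambertian model with inverse-square falloff (the falloff, the light positions, the albedo values, and all function names are assumptions chosen for illustration), and compares the ratio of the second to the first singular value for a flat patch and for a patch with a crease.

```python
import numpy as np

def render_patch(points, normals, albedo, light_pos, radiance=1.0):
    """Lambertian intensities of patch pixels under one nearby point light.

    Assumption: inverse-square falloff; negative cosines are clipped to 0
    to model attached shadows.
    """
    r = light_pos - points                           # vectors from points to the light
    dist = np.linalg.norm(r, axis=1)
    cos = np.einsum("ij,ij->i", normals, r / dist[:, None])
    return albedo * radiance * np.clip(cos, 0.0, None) / dist**2

def subspace_ratio(points, normals, albedo, lights):
    """sigma_2 / sigma_1 of the patch-by-light intensity matrix."""
    Z = np.column_stack([render_patch(points, normals, albedo, p) for p in lights])
    s = np.linalg.svd(Z, compute_uv=False)
    return s[1] / s[0]

# Eight neighbouring surface points with strongly textured albedo.
points = np.zeros((8, 3))
points[:, 0] = np.linspace(-0.0035, 0.0035, 8)
albedo = np.array([0.9, 0.2, 0.7, 0.3, 0.8, 0.25, 0.6, 0.95])
lights = np.array([[1.0, 0.0, 2.0], [-1.0, 1.0, 2.0], [0.0, -1.0, 3.0]])

flat = np.tile([0.0, 0.0, 1.0], (8, 1))              # one smooth surface
crease = flat.copy()
crease[4:] = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)  # abrupt change in normal

ratio_flat = subspace_ratio(points, flat, albedo, lights)
ratio_crease = subspace_ratio(points, crease, albedo, lights)
```

In this toy setup the flat patch gives a ratio close to zero despite its high-contrast texture, while the creased patch gives a clearly larger ratio, because the two sides of the crease respond to each light with different scale factors.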
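For example, if a patch contains an abrupt change in normal, then the pixels on the two sides of the boundary are scaled by different factors under each light source, so the patch intensities under different lights are generally not scalar multiples of a single vector, and the illumination subspace of local appearance generally has dimension greater than one.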
Confidence Map of 3D Geometric Boundaries
In some embodiments, we detect geometric boundaries by identifying patches whose illumination subspaces of local appearance have dimension greater than one. For each pixel location, we extract a τ-pixel patch centered at that location from all m input images (m light combinations), and arrange the patches as column vectors in a τ×m matrix, Z:
$$Z = [P^{(1)}, P^{(2)}, \ldots, P^{(m)}], \tag{14}$$
where vector $P^{(i)}$ contains all τ pixel (color or intensity) values of the patch extracted from image i at that pixel location. To determine the rank of the illumination subspace of local appearance for that patch location, we apply singular value decomposition (SVD) to Z and obtain the singular values $\{\sigma_i^P\}$ (ordered largest to smallest). In the absence of noise, a one-dimensional illumination subspace yields just one nonzero singular value $\sigma_1^P$, with $\sigma_2^P = 0$. Due to noise in the images, $\sigma_2^P$ is not exactly 0, but approximately 0. To determine whether the illumination subspace of local appearance has rank 1, we use a confidence value that is accurate in the presence of noise.
In some embodiments, for each pixel location, we determine a confidence value that the corresponding patch contains a 3D geometric boundary as a ratio of the second to the first singular value for the patch centered at that location:
$$c(P) = \sigma_2^P / \sigma_1^P. \tag{15}$$
Using equation (15), we obtain a confidence map, an image in which the intensity of each pixel is the confidence value that was determined for that pixel location.
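The per-pixel computation of equation (15) can be sketched as follows. This Python/NumPy snippet is a minimal sketch under stated assumptions: grayscale images stacked in an array of shape (m, H, W) with m of at least 2, square patches, and edge replication at the image border. The border handling and the function name `boundary_confidence_map` are not specified in the text and are chosen here for illustration.

```python
import numpy as np

def boundary_confidence_map(images, patch_size=7):
    """Return an (H, W) map of sigma_2 / sigma_1 for the patch at each pixel.

    images: array of shape (m, H, W), one grayscale image per lighting
    condition, with m >= 2. Border pixels use edge replication (assumed).
    """
    m, height, width = images.shape
    half = patch_size // 2
    padded = np.pad(images, ((0, 0), (half, half), (half, half)), mode="edge")
    conf = np.zeros((height, width))
    for r in range(height):
        for c in range(width):
            patch = padded[:, r:r + patch_size, c:c + patch_size]
            Z = patch.reshape(m, -1).T                # tau x m, one column per image
            s = np.linalg.svd(Z, compute_uv=False)
            conf[r, c] = s[1] / s[0] if s[0] > 0 else 0.0
    return conf

# Toy scene: textured albedo, two surfaces meeting between columns 5 and 6.
rng = np.random.default_rng(0)
albedo = rng.uniform(0.2, 1.0, size=(12, 12))
right = np.arange(12) >= 6
k_left, k_right = [1.0, 0.5, 0.2], [0.3, 0.9, 0.6]   # hypothetical per-light scale factors
images = np.stack([albedo * np.where(right, kr, kl)
                   for kl, kr in zip(k_left, k_right)])
conf = boundary_confidence_map(images)
```

In this toy scene, patches that lie entirely on one surface yield a confidence near zero even though the albedo is random texture, while patches that straddle the boundary yield a clearly positive confidence. The double loop is written for clarity rather than speed.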
In other embodiments, the confidence value that the illumination subspace of local appearance has rank greater than 1 can be computed in other ways than equation (15). For example, we could define c(P) as some other function of the singular values, such as $c(P) = \sigma_2^P / k$, where k is a normalization factor determined from the singular values of the illumination subspaces of all of the patches. The pseudocode for our 3D geometric boundary detection procedure is shown in
In one embodiment, rather than extracting patches from the m original input images, we extract patches from the l nonnegative lighting basis images described above. This can be a more stable approach if the set of input images is unbalanced, for example, if a large number of input images come from a single lighting condition, and only a few input images come from the other lighting conditions.
Removing Shadow Edges
Our method successfully detects both types of 3D geometric boundaries: discontinuity in the normal, and discontinuity in depth. Herein, both types are characterized as “boundaries.” In addition, our method is not confused by texture edges. However, shadow edges can be detected by the method outlined in
In most cases, each shadow edge is caused by only a single light source. Based on this observation, we can use our ability to factorize a set of images of a scene into single-light-source lighting basis images to eliminate most of the false positives caused by shadow edges.
We can eliminate the shadows produced by light source i by subtracting basis image vi from the set of images Y:
$$Y^{(i)} = Y - v_i w_i, \tag{16}$$
where $w_i$ is the ith row of lighting indicator matrix W, and $Y^{(i)}$ denotes the scene images re-rendered with light i turned off.
Applying our boundary detection technique on $Y^{(i)}$ results in a boundary confidence map $C^{(i)}$ in which the shadow edges resulting from the ith light source are eliminated. The final confidence map is aggregated by taking the minimum at each pixel location among all confidence maps of $\{C^{(i)}\}_{i=1}^{l}$, so that if a shadow edge disappears when any one of the light sources is removed, that edge will not be present in the final confidence map.
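The re-rendering and minimum aggregation can be sketched as follows. This Python/NumPy snippet is an illustrative sketch, assuming grayscale images of shape (m, H, W), basis images of shape (l, H, W), and a binary indicator matrix W of shape (l, m); the helper `confidence_map`, its edge-replicating border handling, and the toy scene are assumptions made for the example.

```python
import numpy as np

def confidence_map(images, patch_size=3):
    """sigma_2 / sigma_1 of the patch matrix at every pixel; images is (m, H, W)."""
    m, height, width = images.shape
    half = patch_size // 2
    padded = np.pad(images, ((0, 0), (half, half), (half, half)), mode="edge")
    conf = np.zeros((height, width))
    for r in range(height):
        for c in range(width):
            Z = padded[:, r:r + patch_size, c:c + patch_size].reshape(m, -1).T
            s = np.linalg.svd(Z, compute_uv=False)
            conf[r, c] = s[1] / s[0] if s[0] > 0 else 0.0
    return conf

def shadow_free_confidence(images, basis, W, patch_size=3):
    """Pixelwise minimum of the confidence maps with each light removed in turn."""
    maps = []
    for i in range(basis.shape[0]):
        # Equation (16): re-render every input image with light i turned off.
        relit = images - W[i][:, None, None] * basis[i][None, :, :]
        maps.append(confidence_map(relit, patch_size))
    return np.minimum.reduce(maps)

# Toy scene (assumed): two surfaces meeting between columns 4 and 5,
# and light 0 casting a shadow over rows 7..9.
rng = np.random.default_rng(0)
albedo = rng.uniform(0.5, 1.0, size=(10, 10))
right = np.arange(10) >= 5
k_left, k_right = [1.0, 0.6, 0.2], [0.2, 0.6, 1.0]
basis = np.stack([albedo * np.where(right, kr, kl)
                  for kl, kr in zip(k_left, k_right)])
basis[0, 7:, :] = 0.0
W = np.array([[1, 0, 0, 1],
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)
images = np.einsum("lm,lhw->mhw", W, basis)

plain = confidence_map(images)                       # responds to the shadow edge
clean = shadow_free_confidence(images, basis, W)     # shadow edge suppressed
```

One caveat of this sketch: with only two light sources, removing one leaves a single light, so every patch becomes rank 1 and the aggregated map vanishes everywhere. The toy scene therefore uses three lights.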
The pseudocode for our boundary detection procedure with shadow edges removed is shown in
In one embodiment, rather than setting Y(i) equal to a re-rendered version of the original images using equation (16), instead for each i we set Y(i) equal to the set of all of the lighting basis images other than lighting basis image i. In this embodiment, for each i, the reduced set of lighting basis images Y(i) contains l−1 lighting basis images. This can be a more stable approach if the set of input images is unbalanced, for example, if a large number of input images come from a single lighting condition, and only a few input images come from the other lighting conditions.
Scene Editing
The individual lighting basis images can be edited to produce edited basis images 712 by applying an editing function 710 which can be a linear function, such as scaling, or a nonlinear function such as histogram equalization, gamma correction, tone mapping, or brightness and contrast adjustment. In addition it is possible to edit a region of the lighting basis images, such as inserting an object or modifying the texture. An output image 720 is constructed by applying a merging function 715, such as a linear combination, to the edited basis images. The editing function can be applied to all or a part of the lighting basis images.
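As a hedged illustration of this editing pipeline, the sketch below applies a per-light editing function (a scale factor followed by gamma correction) to each lighting basis image and then merges the results with a plain sum. The function name `edit_and_merge`, the parameter names, and the clipping to the range [0, 1] are assumptions for this example, not details given in the text.

```python
import numpy as np

def edit_and_merge(basis, gains, gammas):
    """Edit each lighting basis image, then merge by summation.

    basis:  (l, H, W) basis images with values in [0, 1] (assumed range).
    gains:  l nonnegative scale factors (linear edit; 0 switches a light off).
    gammas: l gamma values (nonlinear edit; 1.0 leaves the image unchanged).
    """
    edited = [np.clip(g * b, 0.0, 1.0) ** (1.0 / gam)
              for b, g, gam in zip(basis, gains, gammas)]
    merged = np.sum(edited, axis=0)            # merging function: linear combination
    return np.clip(merged, 0.0, 1.0)

basis = np.stack([np.full((2, 2), 0.25), np.full((2, 2), 0.5)])

only_first = edit_and_merge(basis, gains=[1.0, 0.0], gammas=[1.0, 1.0])
brightened = edit_and_merge(basis, gains=[1.0, 0.0], gammas=[2.0, 1.0])
```

Setting a gain to zero re-renders the scene with that light switched off, and raising a gamma value brightens the contribution of a single light without touching the others.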
In another application, the shadow and highlight regions in a lighting basis image can be detected by finding almost black regions and saturated regions, respectively, in the lighting basis image. Intensity (color) values of such regions can be copied from identical locations in a different lighting basis image in which the values are not black or saturated. The values are then modified so that the image values are continuous across the original shadow and highlight boundaries. This approach can eliminate dark shadow and highlight regions in the scene without saturating or darkening the entire image.
Detected shadow regions can be used to replace the texture of a surface with a different texture while conforming to the illumination information. The brightness of the part of the new texture that is under shadow is darkened to match the shadow information.
The steps in the methods described and shown herein can be performed in a processor connected to a memory and input/output interfaces as known in the art. It is understood that typical digital images include millions and millions of pixels, and that it is impossible to process intensities or color values of this enormous magnitude mentally.
Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.
This U.S. Patent Application is related to U.S. Patent Application MERL-2630 cofiled herewith and incorporated by reference.