1. Field of the Invention
The present invention relates generally to imaging methods and systems, and more particularly to methods and systems for reducing distortion of single-viewpoint projections derived from images captured by non-single viewpoint imaging systems.
2. Description of the Related Art
A typical imaging system receives one or more rays of light from each point in a scene being imaged. In a classic “pinhole” camera, a single ray of light is received from each scene point and is projected upon one point of a detector (e.g., a piece of film or CCD image detector array). In an imager which uses one or more lenses to collect more light than would otherwise be collected using a simple pinhole camera, a bundle of light rays is received from each scene point and is focused onto a single point of a focal plane within the imager. Each bundle of light emanating from a scene point is considered to have a chief ray which can be used to define the direction in which the scene point is located with respect to the field of view of the imager. Many conventional imaging systems are designed to have a single “viewpoint”—a point of intersection of all of the chief rays of the bundles of light received from the various scene points. The viewpoint can also be referred to as a “virtual pinhole”.
The concept of a perspective projection can be further understood with reference to
However, many imaging systems do not have a single viewpoint; in other words, not all of the chief rays of the bundles of light rays received by the imager intersect at a single point. Non-single viewpoint imagers can provide advantages such as wider field of view. However, unlike an image captured from a single viewpoint imager, an image captured by a non-single viewpoint imager typically cannot be used to generate an accurate, undistorted perspective view—or, in fact, any other single-viewpoint image—unless additional information regarding the scene geometry is available.
An example of a typical, non-single viewpoint imaging system is a fish-eye lens based system. Such a system is illustrated in
Some imaging systems utilize reflective elements, rather than lenses, to capture images. Such systems can be referred to as “catoptric” systems. Examples of catoptric imaging systems are illustrated in
In addition, although the above discussion refers to a camera 704 having an actual pinhole 708, most conventional cameras include lenses. Such a lens-based camera, if used as the camera 704 in one of the systems illustrated in
Catadioptric systems such as those illustrated in
It is therefore an object of the present invention to provide an imaging system which reduces the distortion of images captured by non-single viewpoint imagers.
This and other objects are accomplished by the following aspects of the present invention.
In accordance with one aspect of the present invention, the following method for imaging is provided. An image generated by an image-sensing system is received, the image-sensing system having a plurality of viewpoints. The image is generated based upon radiation rays received by the image-sensing system, the radiation rays coming from a scene. First information regarding a statistical distribution associated with at least one depth value of the scene is used for selecting a virtual viewpoint for a projection representing the scene. The virtual viewpoint is selected for reducing distortion of the projection. The image, the virtual viewpoint, and second information regarding at least one geometrical characteristic of the image-sensing system are used to generate the projection.
In accordance with an additional aspect of the present invention, another method for imaging is provided. In this method, an image generated by an image-sensing system having a plurality of viewpoints is received. The image is generated based upon radiation rays received by the image-sensing system, the radiation rays coming from a scene. First information regarding at least one geometrical characteristic of the image-sensing system is used to determine a portion of a caustic of the image-sensing system. An average location of the portion of the caustic is determined, and the average location is selected as a first virtual viewpoint for a projection representing the scene. The image, the first virtual viewpoint, and the first information are used for generating the projection.
Further objects, features, and advantages of the present invention will become apparent from the following detailed description taken in conjunction with the accompanying figures showing illustrative embodiments of the invention, in which:
Throughout the figures, unless otherwise stated, the same reference numerals and characters are used to denote like features, elements, components, or portions of the illustrated embodiments. Moreover, while the subject invention will now be described in detail with reference to the figures, and in connection with the illustrated embodiments, changes and modifications can be made to the described embodiments without departing from the true scope and spirit of the subject invention as defined by the appended claims.
In a typical single viewpoint or non-single viewpoint imaging system, a light ray bundle having a particular chief ray is received from each point in the scene being imaged. The imager measures at least one property of the light coming from the scene point and generates a pixel value representing the value of the light property being measured. The imager generates an entire image by creating numerous pixels, each representing at least one property of the light emanating from a particular scene point. The position of each pixel within the image is determined by the chief ray of the bundle of light rays received from the scene point. An example of such an imaging system is illustrated in
In accordance with the present invention, the geometrical properties of an imaging system—such as, for example, the system 822 illustrated in
In addition, the reflecting surface 816 can have its own ray surface 810, and this ray surface 810 can also serve as a ray surface for the entire imaging system 822, including the camera 820 and the reflecting surface 816. As is discussed in further detail below, it is advantageous to select, as a ray surface, the surface to which every incident ray is tangent. Such a ray surface can be referred to as the “caustic” of the system. A caustic can usually be viewed as the most compact possible ray surface for a particular imager. Therefore, if the caustic consists of a single point (which can be considered a degenerate case), the imager is a single viewpoint system. Furthermore, even for a system having a caustic which includes more than one point, if the system has a relatively compact caustic, the system will generally produce images having less distortion than a system having a less compact caustic.
A technique for computing caustics in accordance with the invention is described as follows. Consider the exemplary system 822 illustrated in
vR(θ, φ)=vI(θ, φ)−2(nR(θ, φ)·vI(θ, φ))nR(θ, φ) (1)
where nR(θ, φ) denotes the normal vector at the point of reflection sR(θ, φ). The ray surface of incoming rays of light is then given by lI(θ, φ)=(sR(θ, φ), vI(θ, φ)).
As discussed above, the caustic is the surface which is tangential to all the incoming chief rays. In other words, the caustic can be considered the “envelope” of all of the incident chief rays. Points along an incident ray can be parameterized as a function of the distance r from the point of reflection sR(θ, φ), as illustrated in
det(J(L′)(θ, φ, r))=0 (2)
Although Eq. (2) applies to the above-described, general, three-dimensional case, computing the caustic in two dimensions is sufficient for a radially symmetric optical system. For the two-dimensional calculation, let the reflecting surface be SR, the reflecting ray be vR(φ), and the point of reflection be sR(φ). The incident ray of light is then described by the reflection equation (Eq. (1)).
Points along the incident ray are, as above, parameterized as a function of the distance r from the point of reflection. The vector valued function L is now defined as:
L(φ, r)=(sR(φ)+rvI(φ), vI(φ)) (3)
As before, only the position information is required:
L′(φ, r)=((sI)x+r(vI)x, (sI)y+r(vI)y) where sI=sR (4)
The x and y components of sI are denoted as (sI)x and (sI)y, respectively. The same convention is also used for vI. To compute the caustic curve, the determinant of the Jacobian det(J(L′)(φ, r)) must vanish. The determinant is given by:
det(J(L′)(φ, r))=((sI)x′+r(vI)x′)(vI)y−((sI)y′+r(vI)y′)(vI)x (5)
where the primes denote differentiation with respect to φ. The distance r(φ) along each incident ray at which the determinant vanishes can be solved for explicitly by enforcing the singularity constraint:
r(φ)=((sI)y′(vI)x−(sI)x′(vI)y)/((vI)x′(vI)y−(vI)y′(vI)x) (6)
Finally, substituting r(φ) back into Eq. (3) gives the parameterization of the bundle of rays on the caustic.
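Purely as an illustrative sketch (not part of the described embodiments), the following Python code numerically traces the two-dimensional caustic of an assumed parabolic reflector viewed by a pinhole camera placed at the origin, using the reflection equation (Eq. (1)) and the singularity constraint of Eq. (6); the mirror profile, focal parameter, and sampling range are assumptions made only for the example.

```python
import numpy as np

# Minimal numerical sketch of the 2D caustic computation (Eqs. (1), (3)-(6)).
# Assumed geometry (illustrative only): a parabolic mirror y = x^2 / (4f)
# imaged by a pinhole camera located at the origin.  Because the pinhole is
# not placed so as to yield a single viewpoint, the caustic is an extended
# curve rather than a point.

f = 0.25                                # assumed focal parameter of the mirror
x = np.linspace(0.05, 1.0, 2000)        # parameter along the mirror profile

# Point of reflection s_R and unit surface normal n_R.
s = np.stack([x, x**2 / (4 * f)], axis=1)
n = np.stack([-x / (2 * f), np.ones_like(x)], axis=1)
n /= np.linalg.norm(n, axis=1, keepdims=True)

# Chief ray leaving the pinhole toward the mirror (v_R), and the incident
# scene ray v_I obtained from the reflection equation, Eq. (1).
v_r = s / np.linalg.norm(s, axis=1, keepdims=True)
v_i = v_r - 2 * np.sum(n * v_r, axis=1, keepdims=True) * n

# Derivatives with respect to the mirror parameter (finite differences).
ds = np.gradient(s, x, axis=0)
dv = np.gradient(v_i, x, axis=0)

# Distance r along each incident ray to the caustic, from the singularity
# constraint det(J(L')) = 0 (Eq. (6)).
num = ds[:, 1] * v_i[:, 0] - ds[:, 0] * v_i[:, 1]
den = dv[:, 0] * v_i[:, 1] - dv[:, 1] * v_i[:, 0]
r = num / den

# Parameterization of the caustic: substitute r back into Eq. (3).
caustic = s + r[:, None] * v_i
print(caustic[::400])                   # a few sample points on the caustic
```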
As discussed above, it is beneficial to generate a perspective projection based upon an image captured by an imaging system, because when viewed by the human eye, a perspective projection of a scene accurately represents what the eye would see if looking at the scene from the virtual viewpoint of the perspective projection.
If the distance between the scene point P and the reflector 816 or ray surface 904 is known exactly, the exact location of the scene point P can be determined simply by back-tracing the incoming ray 812 for the required distance. An image point 1114 within a perspective view 1112 having any virtual viewpoint O′ can be generated in its proper location by determining where a ray 1108—which extends between the virtual viewpoint O′ and the scene point P—intersects the perspective view 1112. The entire perspective view 1112 can be generated by: (1) tracing the rays 1116 between various scene points 1120 and the virtual viewpoint O′, and (2) determining the locations of the points 1118 at which the respective rays 1116 intersect the plane of the perspective view 1112.
An algorithm in accordance with the present invention can be used to estimate a ray-image map for an imager. For rotationally symmetric optical systems, the procedure can be treated as a problem in one dimension—i.e., the displacement from the center of the field of view. Let the image points be parameterized as functions of an angle φ representing the viewing direction relative to the center of the field of view. The vector along the corresponding viewing direction is then given by vI(φ). The forward map V is computed for a densely sampled set of image points Φ={φ1, φ2, . . . , φj, . . . , φN}. The computed viewing directions are given by V={v1, v2, . . . , vj, . . . , vN}. As illustrated in
In the above example, the value of φj obtained by the above process is the closest neighbor to the true inverse of wk. However, φj can also be calculated by interpolating its closest neighbors. In addition, the closest neighbors can be used as bounds for a localized search of the true inverse of wk. Such a localized search is more efficient than a “brute force” search over the entire space of possible solutions.
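The following Python sketch illustrates, under assumed values, the dense forward-map sampling and closest-neighbor inversion with interpolation described above; the forward map used is an arbitrary smooth stand-in rather than the ray-image map of any particular imaging system.

```python
import numpy as np

# Minimal sketch of the forward-map sampling and nearest-neighbor inversion
# described above.  The forward map phi -> viewing direction is assumed here
# to be an arbitrary smooth monotone function (illustrative stand-in only).

def forward_map(phi):
    """Viewing angle v_I(phi) for an image displacement phi (assumed model)."""
    return phi + 0.15 * np.sin(3.0 * phi)

# Densely sample the forward map: Phi = {phi_1, ..., phi_N}, V = {v_1, ..., v_N}.
phis = np.linspace(0.0, 1.0, 10001)
vs = forward_map(phis)

def invert(w):
    """Approximate inverse of the forward map at the desired direction w."""
    j = np.argmin(np.abs(vs - w))       # closest sampled neighbor
    # Refine by linear interpolation between the bracketing samples.
    lo, hi = max(j - 1, 0), min(j + 1, len(phis) - 1)
    return np.interp(w, vs[lo:hi + 1], phis[lo:hi + 1])

w_k = 0.42
phi_k = invert(w_k)
print(phi_k, forward_map(phi_k) - w_k)  # residual error of the inversion
```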
In order to compute the inverse map I, it is preferable to create a densely sampled forward map V=vI(θ, φ). Furthermore, for any desired viewing direction wk, there should exist a computed viewing direction vj such that |wk−vj| ≦ ε, where ε is a maximum amount of error, and the forward function should be sampled densely enough to meet this criterion. Under-sampling is likely to result in errors, while over-sampling is likely to result in redundant data. An algorithm in accordance with the present invention can estimate the preferred resolution at which to sample the forward map, assuming that the forward map is to be uniformly sampled.
The following analysis applies to the sampling rate for a one-dimensional forward map. Let y=ƒ(x) be a forward map in one dimension, and assume that ƒ(x) is known and is twice differentiable. The maximum permissible error in the inverse map is ε. If Δy denotes the difference between any two successive y values, then Δy ≦ ε. The problem then is to estimate the appropriate resolution Δx at which to sample the forward map. In the limiting case, as Δx approaches zero, Δy approaches ƒ′(x)·Δx. Therefore:
Δx ≦ ε/ƒ′(x)
Determining the smallest Δx for which the sampling rate is adequate implies that ƒ′ should be maximized. The maximum value ƒ′(x) can have in the interval xL ≦ x ≦ xU is either a maximum of the function ƒ′(x) within the bounds xL and xU, or the higher of the two function values at the aforementioned bounds. This technique can easily be extended to multiple dimensions, by simply assigning to each dimension its own critical sampling rate. This critical sampling rate can also be considered as the Nyquist rate—a type of minimum sampling rate which is well-known in the art.
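The following sketch illustrates the resulting critical sampling-rate estimate, Δx ≦ ε/max ƒ′, using the same assumed one-dimensional forward map; ε and the interval bounds are illustrative values.

```python
import numpy as np

# Sketch of the critical sampling-rate estimate described above:
# Delta_x = epsilon / max|f'(x)| over [x_L, x_U].  The forward map f is an
# illustrative assumed function; epsilon is the maximum permissible error
# in the inverse map.

def f(x):
    return x + 0.15 * np.sin(3.0 * x)

def f_prime(x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)   # numerical derivative

epsilon = 1e-3
x_lo, x_up = 0.0, 1.0

# Maximize |f'| over the interval: evaluate a dense grid of interior points,
# which includes the two bounds (the maximum is attained at one or the other).
grid = np.linspace(x_lo, x_up, 10001)
max_slope = np.max(np.abs(f_prime(grid)))

delta_x = epsilon / max_slope
n_samples = int(np.ceil((x_up - x_lo) / delta_x)) + 1
print(delta_x, n_samples)
```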
It is thus demonstrated that if the depths of the various points P and 1120 in a scene 1106 are known, an exact perspective view can be calculated. However, in practice, the exact depth of the respective scene points is rarely known. Rather, in many cases, only a statistical distribution of the various depths is available.
However, often information is available regarding a range of possible depths of a scene point P. For example, it may be known that the scene point P is no closer than a minimum distance P1, and no farther away than a maximum distance P2. The geometry of the system and scene can be used to determine the point p1′ on the perspective projection 1112 which would represent the scene point P if the scene point P were located at the minimum depth P1. In addition, the location p2′ which would represent the scene point P if it were located at the maximum depth P2 can also be determined. It is therefore known that the true pixel location p′ lies within a region defined by the outer limits p1′ and p2′. In particular, the true location p′ typically lies on a line segment defined by the outer limits p1′ and p2′.
If no information is available regarding the statistics of the depth of the scene point P, other than that the scene point P lies somewhere between P1 and P2, then the midpoint of the line segment between p1′ and p2′ can be used as a rough approximation of the true location p′ of the pixel representing the scene point P.
However, in many cases, further statistical information is available regarding the depth distribution of points in the scene 1106. For example, the distribution of the depths of scene points may be in the form of a Gaussian distribution 1308 having a central value 1312. Each depth value of the Gaussian distribution 1308 corresponds to a location on the perspective projection 1112, and accordingly, the probability associated with each depth value in the statistical distribution 1308 equals the probability associated with the corresponding point on the perspective plane. For example, the probability 1318 associated with the peak value 1312 of the Gaussian distribution 1308 illustrated in
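The following Python sketch illustrates this weighting under assumed geometry: a known incoming ray is back-traced over a range of candidate depths, each candidate scene point is projected through an assumed virtual viewpoint onto a perspective plane, and the resulting image locations are averaged using a truncated Gaussian depth distribution. All coordinates and distribution parameters are illustrative only.

```python
import numpy as np

# Sketch of the depth-distribution-weighted pixel placement described above.
# Assumed, illustrative geometry: the incoming chief ray is known by a point
# 's' on the ray surface and a unit direction 'v'; the virtual viewpoint O'
# is at the origin; and the perspective projection plane is z = 1.

s = np.array([0.3, 0.1, 0.5])           # point on the caustic / ray surface
v = np.array([0.2, 0.1, 1.0])
v = v / np.linalg.norm(v)               # direction of the incoming ray

d_min, d_max = 5.0, 15.0                # known depth bounds P1, P2
mu, sigma = 10.0, 2.0                   # assumed Gaussian depth distribution

def project(P, viewpoint=np.zeros(3)):
    """Perspective projection of scene point P through O' onto plane z = 1."""
    ray = P - viewpoint
    return ray[:2] / ray[2]

# Candidate scene points along the back-traced ray and their image locations.
depths = np.linspace(d_min, d_max, 1001)
points = s[None, :] + depths[:, None] * v[None, :]
pixels = np.array([project(P) for P in points])

# Truncated-Gaussian weights: the probability of each depth becomes the
# probability of the corresponding location on the perspective plane.
w = np.exp(-0.5 * ((depths - mu) / sigma) ** 2)
w /= w.sum()

expected_pixel = (w[:, None] * pixels).sum(axis=0)   # probability-weighted estimate
midpoint_pixel = 0.5 * (pixels[0] + pixels[-1])      # crude estimate when only bounds are known
print(expected_pixel, midpoint_pixel)
```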
A general example of a distortion reducing algorithm as discussed above is illustrated in
In order to perform the above-described procedure for estimating the location of a pixel within a perspective projection, it is advantageous to model the depth distributions of typical scenes in order to provide an appropriate depth distribution function for use in the estimation procedure. Several methods for generating depth distributions are described as follows.
For example, consider a scene in two dimensions, in which it is known that all objects reside within a circular room centered on a viewpoint O and having a radius R. An example of such a scene is illustrated in
An occlusion occurs when some part of an object 1410 (circles, in this case) falls within the field of view. As illustrated in
Aperim=(A1+A2+A3+A4+A5+A6+A7+A8). (8)
where:
The area of the bounded region 1408 (e.g., the room) is given by Atotal=π·R2. If X denotes the situation in which s is the un-occluded distance in direction θ with a field of view having an angle of dθ, then:
where P(X) denotes the probability of situation X occurring. Note that this computation holds true for all dθ<π and for only one circle. For N circles uniformly distributed within the bounded region, the probability is given by:
In addition, as illustrated in
The probability distribution of occlusion at distance s is then given by:
This probability distribution equals the probability distribution of scene depth, because scene depth in a particular direction is defined as the distance that a ray in that direction can be extended from the viewpoint without intersecting an object. For N objects, the probability is:
To perform the calculation for a three-dimensional, spherical chamber, the system illustrated in
In addition, the depth distribution can be estimated numerically, e.g., by using a computer to simulate a scene. Such a procedure is performed by using a computer algorithm to mathematically generate objects of randomized sizes, shapes, and orientations within some finite scene space. Given a particular viewing direction and field of view, the algorithm computes the maximum distance s up to which the view is un-occluded. Multiple samples of s are collected with each simulation, in order to generate the simulated distribution of s.
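A minimal Python sketch of such a simulation is given below; the room radius, object count, object size, and number of trials are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the numerical depth-distribution estimate described above: random
# circular objects are placed in a circular room of radius R around the
# viewpoint, and the un-occluded distance s along a random viewing direction
# is recorded for each trial.

R = 10.0          # room radius
N_OBJ = 20        # objects per simulated scene
R_OBJ = 0.5       # object radius
N_TRIALS = 20000

def unoccluded_distance(direction, centers, radius):
    """Distance from the origin along 'direction' to the first object hit."""
    s = R                                   # no hit: the room wall bounds the depth
    for c in centers:
        b = np.dot(c, direction)            # ray/circle intersection (quadratic)
        disc = b * b - (np.dot(c, c) - radius * radius)
        if disc >= 0.0:
            t = b - np.sqrt(disc)
            if t > 0.0:
                s = min(s, t)
    return s

samples = []
for _ in range(N_TRIALS):
    # Object centers distributed uniformly over the room (kept off the viewpoint).
    r = R * np.sqrt(rng.uniform(0.05, 1.0, N_OBJ))
    a = rng.uniform(0.0, 2.0 * np.pi, N_OBJ)
    centers = np.stack([r * np.cos(a), r * np.sin(a)], axis=1)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    d = np.array([np.cos(theta), np.sin(theta)])
    samples.append(unoccluded_distance(d, centers, R_OBJ))

hist, edges = np.histogram(samples, bins=30, range=(0.0, R), density=True)
print(hist)                                  # simulated distribution of scene depth s
```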
An additional approach for estimating the depth distribution is to utilize actual, real-scene range data acquired using, for example, a range scanner or similar instrument. It is assumed that all data collected is statistically independent, and thus represents a good statistical measure of the distances of scene objects from a viewpoint. In addition, range data can be segregated into categories such as indoor, outdoor, urban, rural, or even microscopic scenes. Such segregation can help to compute more accurate distributions for each case.
In most cases, the above-described methods for generating an estimated perspective projection 1112 will not yield a perfectly undistorted image of the scene, even if scene depth distribution information is available. In fact, the amount of distortion remaining in the perspective projection 1112 tends to depend upon the choice of virtual viewpoint O′. Therefore, an algorithm in accordance with the present invention is used to select a virtual viewpoint O′ which results in a perspective projection 1112 having the lowest possible amount of distortion. In order to enable a suitable virtual viewpoint O′ to be selected, the algorithm uses one or more techniques for measuring perspective distortion. Examples of such distortion measuring techniques are described below.
Under true perspective projection, a straight line in the scene maps to a straight line in the image. However, as illustrated in
The notion of distortion can be formalized as follows. Let C be a sequence of k points in the scene. Such a sequence can be referred to as a “configuration” of points. Let γ denote the set of such configurations—i.e., Cεγ. ƒ is defined as a function on γ which receives a configuration C and maps it to a real number. One example of ƒ is the cross ratio between four collinear points, which is “projectively invariant”—i.e., does not vary under projective transformation. Another example of ƒ is a function which measures the deviation of the points in a configuration from collinearity; when the points are collinear, ƒ vanishes. In both these cases, ƒ is not projectively invariant as applied to all configurations of points, but only as applied to a certain class of configurations. Suppose, for example, that ƒ is a function applied to configurations for which it is invariant under projective transformations. If T is any projective transformation acting on Cεγ, and γ is closed—i.e., T(C)εγ—then:
ƒ(T(C))=ƒ(C) (20)
The above equation formalizes the constraint that ƒ is projectively invariant. Thus, the value of the function ƒ computed at C is equal to that computed at the projective transform of C—i.e., T(C).
An exemplary procedure for measuring image distortion can be understood by considering an imaging and distortion reducing system in which, for a set of scene points in a configuration Ci, τ is the map from the scene points to the final, reduced-distortion image. The distortion remaining in the final image is measured by determining how close τ is to being a true projective map T. Eq. (20) holds only when τ is, in fact, projective. Therefore, a simple measure of the distortion in the final image is:
mi=ƒ(τ(Ci))−ƒ(T(Ci)) (21)
Note that mi vanishes when τ=T.
Objective functions are constructed by taking many measurements mi and averaging them with one of a family of norms. For a large number N of configurations, let {(ƒi, Ci)}1≦i≦N be a set of pairs of functions and configurations such that each ƒi is invariant under perspective projection when applied to Ci. The N measurements are mi=ƒi(τ(Ci))−ƒi(Ci). If M=(m1, . . . , mN) is a vector of the N measurements, then an appropriate objective function is given by the p-norm ∥M∥p. To select the best virtual viewpoint—i.e., the virtual viewpoint giving rise to the least amount of distortion—an algorithm in accordance with the present invention uses a numerical method such as a minimum-searching procedure to find the viewpoint for which the objective function has a minimum. In addition, each measurement can be weighted by its probability of accuracy. The resulting objective function ξp,m, which is a variant of the p-norm, is given by:
where wi is the weight associated with the corresponding measure mi and Σ wi=1. The infinity norm ξ∞ is given by:
ξ∞=∥M∥∞=max(wi·mi)1≦i≦N (23)
In order to make the objective functions resilient to noise, robust statistical techniques can be used. For instance, the well-known RANSAC technique increases robustness by discarding outliers. Additional techniques include using the median, rather than the mean, of the N measures:
ξmedian=median(wi·mi)1≦i≦N (24)
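The following Python sketch illustrates these alternatives: a weighted p-norm, the infinity norm of Eq. (23), and the median of Eq. (24), applied to a placeholder vector of measures mi and uniform weights wi.

```python
import numpy as np

# Sketch of the objective functions discussed above, built from a vector of
# invariance-violation measures m_i and weights w_i (sum of w_i = 1).  The
# measures themselves would come from Eq. (21); random values are used here
# purely as placeholders.

rng = np.random.default_rng(1)
m = np.abs(rng.normal(size=100))        # placeholder distortion measures m_i
w = np.full(100, 1.0 / 100)             # uniform weights, summing to 1

def xi_p(m, w, p=2):
    """Weighted p-norm objective (the weighted variant described above)."""
    return np.sum(w * np.abs(m) ** p) ** (1.0 / p)

def xi_inf(m, w):
    """Infinity-norm objective, Eq. (23)."""
    return np.max(w * np.abs(m))

def xi_median(m, w):
    """Median objective, Eq. (24): robust to outlying measures."""
    return np.median(w * np.abs(m))

print(xi_p(m, w), xi_inf(m, w), xi_median(m, w))
```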
An additional distortion measuring method uses scene points to compute the error measure. In general, any invariant under perspective projection—such as the cross ratio—can be used.
For example, consider the configuration formed by triplets of collinear scene points Ci. Such a configuration is illustrated in
If the geometry of the scene is known, then the three-dimensional locations of scene points can be used to define a measure of distortion. The scene points need not be collinear. Consider, for example, an arbitrary set of known scene points, and define a configuration Ci to consist of a single scene point. Let τ represent the entire imaging system's mapping from scene points onto some ultimate projection surface. The mapping τ generates an image point τ(Ci) on the projection surface. From a selected virtual viewpoint, if scene points with known depths are accurately projected onto the same projection surface, this true perspective projection can be denoted as T. The error metric can now be defined as a measure of the distance between the image point τ(Ci) and the correct projection T(Ci) of the scene point Ci. Such an error metric can be expressed as follows:
mi=dist(τ(Ci)−T(Ci))1≦i≦N (26)
where N represents the number of points in the scene and dist(Pa−Pb) measures the distance between two points Pa and Pb on the projection surface. For N measurements, a vector of measures M={m1, m2, . . . , mN} is generated. The objective function which is to be minimized is defined as a p-norm of this vector:
ξprojective=∥M∥p (27)
A virtual viewpoint selection algorithm in accordance with the invention can employ distortion measures and objective functions based upon distortion of straight lines in the scene. Under true perspective projection, such straight scene lines should map to straight lines in the image. Let Ci be a set of collinear points—i.e., points on the same line—within the scene. The image τ(Ci) of Ci should, in a true perspective projection, be a configuration of collinear points. The function ƒ(Ci) (discussed above) can be constrained to vanish for all Ci. Such a constraint can be expressed as ƒ(τ(Ci))=0.
The value of the function ƒ is defined in terms of the distances of the points in Ci from the best-fit line Λ estimated from Ci. Thus ƒ vanishes when all the points are collinear. Let λ1(Ci) and λ2(Ci) be estimates of the two parameters of the best-fit line Λ. Thus, λ1, λ2: C→ℝ. Let the jth point in τ(Ci) be qj=(xj, yj). The distance dj of the jth point from the line Λ is:
dj=xj·sin(λ1(Ci))−yj·cos(λ1(Ci))+λ2(Ci) (29)
Let the distances of all points in Ci to the best fit line Λ be represented by the vector D={d1(Ci), d2(Ci), . . . , dj(Ci), . . . dk(Ci)}. The function ƒ can be defined as a p-norm of the above vector:
mi=ƒ(Ci)=∥D∥p (30)
Similarly to the objective functions discussed above, an objective function based upon Eq. (30) can be made more statistically robust by employing techniques such as RANSAC, or by using the median, rather than the mean, of the errors associated with the various points.
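A Python sketch of the straightness measure of Eqs. (29) and (30) is given below; the best-fit line is estimated here by a total-least-squares fit, and the sample image points are illustrative.

```python
import numpy as np

# Sketch of the collinearity-based distortion measure of Eqs. (29)-(30): fit a
# line to the image points of a configuration, collect the point-to-line
# distances d_j, and take a p-norm of the distance vector.

def straightness_measure(points, p=2):
    """p-norm of the distances of 'points' (k x 2 array) to their best-fit line."""
    x, y = points[:, 0], points[:, 1]
    # Best-fit line via total least squares: direction = principal axis of the
    # centered points; (lambda1, lambda2) correspond to its angle and offset.
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    tx, ty = vt[0]                              # unit direction of the fitted line
    lam1 = np.arctan2(ty, tx)                   # line angle
    lam2 = -(x.mean() * np.sin(lam1) - y.mean() * np.cos(lam1))  # offset term
    d = x * np.sin(lam1) - y * np.cos(lam1) + lam2               # Eq. (29)
    return np.linalg.norm(d, ord=p)             # Eq. (30)

# Example: image points of a nominally straight scene line, slightly distorted.
pts = np.stack([np.linspace(0, 1, 10), 0.5 * np.linspace(0, 1, 10)], axis=1)
pts[:, 1] += 0.01 * np.sin(6 * pts[:, 0])       # residual curvature in the image
print(straightness_measure(pts))
```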
An additional distortion-measuring method sums the absolute values of the distances (the dj in D) of the respective points from Λ. This sum is the L1-norm of the error vector D. The L1-norm is defined as:
where ξl,1 represents the objective function to be minimized.
The above metric is useful when all configurations have the same number of points. However, in some cases, the various configurations have different numbers of points, and as a result, some lines contribute more than other lines to the total error. In order to compensate for this effect, the algorithm preferably uses, as a normalization factor, the number of points in each configuration Ci, denoted |Ci|. The distortion is then defined as:
The formulations discussed above use the absolute value operator, which is not linear. In contrast, the L2-norm lends itself to linear solutions. The error measure mi can be formulated using the L2-norm as follows:
The above formulation treats all lines equally and does not weight them by any measure of quality. However, in some cases, it is preferable to weight the error contribution of each line differently. For example, if the imaging system has a spatially varying resolution, the resolution of a line can be used to weight the contributions of that line. Thus, lines imaged at higher resolutions contribute more to the error metric than do lower resolution lines. Let the weights be denoted by wi:iε[1, N]. The objective function is then defined as:
One potential drawback associated with using an average as a metric is that averages tend to be susceptible to noise. In contrast, medians tend to be comparatively less susceptible to noise. An exemplary objective function ξ using the median of the squares of the spatial errors dj is defined as follows:
The objective function ξ can be modified by weighting the error contributions of each line differently. Let the set of weights associated with each line be denoted as wi:iε[1, N]. The objective function ξ is then defined as follows:
It is to be noted that a distortion-reducing algorithm in accordance with the present invention is not limited to the distortion measures discussed above. In general, any function which measures an invariant under perspective projection can be used.
If an explicit mathematical expression—such as, for example, a Gaussian equation—is available to describe the probability distribution of scene depths, the explicit scene depth distribution expression (illustrated as item 1308 in
In addition, if an explicit equation for the scene depth distribution 1308 is unavailable, but a set of simulated or measured depths is available, then the set of depths can be used to perform a numerical optimization of virtual viewpoint location, by simulating objective function computations for a number of virtual viewpoint locations. Numerical methods for finding the minimum of a function are well known.
An objective function minimization procedure can be used, not only to select a virtual viewpoint, but to optimize any model or mapping G which is used to reduce distortion of images. Such a procedure can, for example, employ an explicit mathematical expression describing scene depth distribution. The distortion-reducing mapping function G—which has one or more model parameters which can be optimized—is plugged into the scene depth distribution equation in order to generate an explicit mathematical expression for the probability distribution associated with each location on an image processed using the distortion reduction mapping function. This mathematical expression is then plugged into any one of the objective function equations discussed above, thereby generating a single expression representing approximate image distortion as a function of the model parameters. The optimal values of the model parameters are the values which minimize the objective function. Several exemplary models are discussed below in the context of a radially symmetric imaging system, in which distortions only exist in the radial dimension. In such a system, morphing the raw image only in the radial direction is sufficient to correct for the aforementioned radial distortions.
Finding a morph G that removes the distortion in an image amounts to finding the right parameter values for the morph G. If there is very little information available regarding the possible depths at which scene points may be located, it is preferable to assess how well G performs for a wide variety of images. For example, it may be important for G to perform well, on average, for all images, or it may be more important for the worst case performance to be as good as possible.
If G is performing well for removing distortions, then for a particular configuration of points in three-dimensional space, such as points on a line, points on a circle, or points forming a square, the perspective images of these points should retain certain properties. For example, scene points on a circle in three-dimensional space should be represented by points along an ellipse or circle in a true perspective image of the scene. A set of scene points on a line in three-dimensional space should be represented by points along a line in a perspective image. An algorithm in accordance with the present invention generates a number which quantifies how badly a set of image points fails to satisfy the above criteria. For many types of geometric configurations of scene points, such a function effectively measures the degree to which an image of those scene points is perspective, thereby indicating how well G performs for removing distortions.
A configuration type, such as straight lines, and a measure of straightness, such as Eq. (30), can be used to measure the performance of the morphing function G. For example, suppose that it is known that all objects are between 5 and 15 meters away from the imaging device, and the depth distribution associated with the objects is a spherical Gaussian function having an average distance value of 10 meters. The Gaussian function can be truncated at the 5 and 15 meter limits. The performance of the morph G can, for example, be measured using the following procedure/algorithm. A line segment having a particular length (e.g., 1 meter) is provided. A number (e.g., 10) of equally spaced points are located on the line segment. A center location of the line segment is selected randomly according to the depth distribution. An orientation (i.e., a direction) of the line segment is selected according to a uniform distribution. The algorithm determines the position within an initial image corresponding to each scene point. The location of each initial image point is adjusted using the morph G, which has the current morph parameters. An error measure, such as Eq. (30), is used to generate a number mi representing the straightness of the line in the morphed image. This number is stored. The process of choosing a line is iterated repeatedly, each iteration including the steps of determining the morphed line and measuring its straightness to obtain a straightness value. After many (e.g., 1,000,000) iterations, the stored straightness values are added and/or averaged to generate the value ξ of an objective function. This value ξ represents a measure of the degree to which straight scene lines remain straight in the image using the morph G.
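A compact Python sketch of this Monte Carlo evaluation is given below. The imaging model, the form of the radial morph G, and the number of iterations are illustrative assumptions, and the straightness measure is a root-mean-square variant of Eq. (30).

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch of the Monte Carlo evaluation of a morph G described above.  The
# imaging map used here is an assumed, purely illustrative radial model
# (image radius proportional to the angle from the optical axis, which is not
# a perspective projection), and G is a radial polynomial morph.

def image_points(scene_pts):
    """Assumed non-perspective imaging model: r_image = angle from axis."""
    x, y, z = scene_pts.T
    theta = np.arctan2(np.hypot(x, y), z)          # angle from the optical axis
    phi = np.arctan2(y, x)
    return np.stack([theta * np.cos(phi), theta * np.sin(phi)], axis=1)

def morph(img_pts, coeffs):
    """Radial polynomial morph G(r) = c0*r + c1*r^2 + ... applied to image points."""
    r = np.linalg.norm(img_pts, axis=1)
    r_new = sum(c * r ** (i + 1) for i, c in enumerate(coeffs))
    scale = np.where(r > 0, r_new / r, 1.0)
    return img_pts * scale[:, None]

def straightness(pts):
    """RMS perpendicular distance of points to their best-fit line (cf. Eq. (30))."""
    c = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(c, full_matrices=False)
    n = np.array([-vt[0, 1], vt[0, 0]])            # unit normal of the fitted line
    return np.sqrt(np.mean((c @ n) ** 2))

def objective(coeffs, n_iter=2000):
    """Average straightness error of morphed images of random scene line segments."""
    total = 0.0
    for _ in range(n_iter):
        # Segment center drawn from a Gaussian depth distribution truncated to [5, 15] m.
        depth = np.clip(rng.normal(10.0, 2.0), 5.0, 15.0)
        direction = rng.normal(size=3)
        center = depth * direction / np.linalg.norm(direction)
        center[2] = abs(center[2]) + 5.0           # keep the segment in front of the imager
        # 10 equally spaced points on a 1 m segment with random orientation.
        axis = rng.normal(size=3)
        axis /= np.linalg.norm(axis)
        t = np.linspace(-0.5, 0.5, 10)
        scene_pts = center[None, :] + t[:, None] * axis[None, :]
        total += straightness(morph(image_points(scene_pts), coeffs))
    return total / n_iter

print(objective([1.0, 0.0, 0.0]))                  # identity-like morph
print(objective([1.0, 0.1, 0.05]))                 # alternative morph parameters
```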
The parameters of G are optimized so that ξ is minimized. This optimization may be done by a “brute force” numerical search, or any of numerous well-known optimization techniques.
An example of a morph function which can be used is a polynomial warp function G. The function G can be applied either to the acquired image or to any representation thereof—for example, a spherical panorama. If r measures the distance of points along the dimensions of distortion, the model is given by:
G(r)=ξ0+ξ1r+ξ2r2+ . . . +ξNrN
where N is the degree of the polynomial. The parameters of this model G are the coefficients ξi of the polynomial. The set of coefficients which produce the least perspectively-distorted view are used to model the transformation.
The polynomial model is typically effective for systems having smoothly varying ray-image maps. For systems having discontinuous maps, piece-wise models can be used for the transformations. Each locally smooth piece can be modeled, for example, by a polynomial, a spline, a B-spline, or a cubic spline.
Asymmetric imaging systems can be viewed as a more general case. Asymmetry of a system can be defined in terms of the radial asymmetry of the ray-image map. Due to asymmetry, distortions are present in two dimensions. Thus, the transformation G should be defined as a vector valued function which maps image points to their new locations.
Let the image points be parameterized by (r, θ), where r is real and θε[0, 2π]. The transformation G maps (r, θ) to (r′, θ′), where r′ is real and θ′ε[0, 2π]. G can be defined in terms of two scalar valued functions Gr and Gθ, such that Gr: (r, θ)→r′ and Gθ: (r, θ)→θ′. Each of these scalar valued functions can be modeled as either smooth continuous functions or as piecewise functions.
If the transformation cannot be assumed to be locally smooth, discrete maps are preferably used. Every pixel in the acquired image is thus mapped to another location using a lookup table. However, the use of discrete lookup tables is not restricted to non-smooth transformations; a smooth transformation can be discretized and represented as a lookup table in order to enhance the performance of image rendering. Since the transformations are invariant to motions such as rotation and/or translation of the imaging system as a whole, a map can be computed once and stored as a lookup table for future use in view creation.
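The following Python sketch illustrates discretizing an assumed smooth radial transformation into a lookup table that is computed once and then interpolated per pixel.

```python
import numpy as np

# Sketch of discretizing a smooth radial transformation into a lookup table,
# as described above.  The transformation used here is an assumed polynomial
# morph; in practice the table would be computed once per imaging system and
# reused for every view rendered.

def smooth_transform(r):
    """Assumed smooth radial morph G(r)."""
    return 0.9 * r + 0.08 * r ** 2 + 0.02 * r ** 3

# Precompute the map once over the range of image radii.
r_samples = np.linspace(0.0, 1.0, 4096)
lookup = smooth_transform(r_samples)

def apply_lookup(r):
    """Per-pixel application: interpolate the stored table instead of re-evaluating G."""
    return np.interp(r, r_samples, lookup)

pixel_radii = np.array([0.1, 0.35, 0.72])
print(apply_lookup(pixel_radii), smooth_transform(pixel_radii))
```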
Once the form of a transformation (i.e., morphing function) is determined, there are several ways to optimize the parameters of the transformation function. The goal is to estimate the set of parameters for which the transformation function produces the least distorted view. The amount of distortion in a computed view can be quantified using the objective measures discussed above. The transformation map G is chosen depending on the imaging system and its ray-image map.
The optimization of the transformation parameters can be posed as a minimization problem. Depending on the transformation function and the distortion measure being used, this minimization is solved either linearly or by nonlinear search mechanisms. An exemplary non-linear search technique can be understood with reference to
Such a search can be carried out using various well-known methods contained in numerical libraries such as IMSL. Typical search methods include the simplex search or direct search methods. When the measure function is analytically differentiable, gradient-based methods such as the Levenberg-Marquardt (LM) algorithm can be used. In cases when an analytical gradient is not available, numerically estimated finite-difference gradients can be used.
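The following Python sketch illustrates such a derivative-free simplex (Nelder-Mead) search over the parameters of an assumed polynomial morph; the "ideal" radial mapping used as the target of the distortion measure is an illustrative stand-in.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of a non-linear (simplex / direct) search over morph parameters, as
# described above.  The "distortion measure" minimized here is an assumed
# stand-in: the mismatch between a polynomial morph G and a known ideal
# radial mapping, evaluated at sampled radii.

def ideal_map(r):
    """Assumed ideal (distortion-free) radial mapping the morph should reproduce."""
    return np.tan(0.8 * r) / np.tan(0.8)

r = np.linspace(0.0, 1.0, 200)
target = ideal_map(r)

def distortion_measure(coeffs):
    """Objective: sum of squared residuals of the morphed radii."""
    morphed = sum(c * r ** (i + 1) for i, c in enumerate(coeffs))
    return np.sum((morphed - target) ** 2)

# Nelder-Mead is a derivative-free simplex search; gradient-based methods
# could be used instead when derivatives are available.
result = minimize(distortion_measure, x0=[1.0, 0.0, 0.0], method="Nelder-Mead")
print(result.x, result.fun)
```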
Transformation map optimization using linear parameter estimation can be understood with reference to the following example, illustrated in
The transformation parameters are estimated using information regarding the scene structure. Either a known scene or a scene depth distribution can be used. In addition, synthetic scenes can be artificially generated in order to statistically represent a real scene. Such synthetic scenes can, for example, be generated off-line and used to compute the transformation parameters. Synthetic scene points are randomly generated around the virtual viewpoint O′ using an appropriate scene depth distribution function. The resulting simulated scene points now form the desired configuration of scene points Cj, ∀1≦j≦N, where N denotes the number of scene points.
The geometry of the imaging system is also assumed to be known. Let τ represent the imaging system map—including the distortion reduction transformation G—which projects scene points onto a projection surface (e.g., a spherical panorama 1902) centered on the virtual viewpoint O′. The spherical panorama 1902 is represented by a discrete lookup table parameterized according to the pan and azimuthal angles θ, φε[0, 2π], as illustrated in
where ξi denotes the parameters of the transformation map G. Let T denote a true perspective map of scene points onto the spherical panorama 1902, based on the virtual viewpoint O′. The azimuthal distance for this ideal projection is given by Tr(·).
The measure of true perspective errors described in Eq. (26) is used as the error metric. The error to be minimized is thus given by:
ξperspective=Σ1≦j≦N (G(τr(Cj))−Tr(Cj))2 (39)
In order to minimize ξ, the algorithm takes partial derivatives with respect to all the coefficients {ξ0, . . . ξK} to arrive at the following constraint equation:
ξ0+ξ1τr(Cj)+ . . . +ξKτr(Cj)K=Tr(Cj) ∀1≦j≦N (40)
The only unknowns in the above linear equation are the coefficients of the transformation map. Therefore, the equation can be solved linearly, without any need for a search routine. For a polynomial model of degree K there are K+1 unknowns. Each synthetically generated point in space provides one constraint. Therefore, the model uses at least N=K+1 scene points to solve for a unique solution. For additional robustness, an overdetermined system of equations can be used to estimate the morph parameters of G.
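The following Python sketch illustrates the linear estimation of Eq. (40) as an overdetermined least-squares problem; the values of τr(Cj) and Tr(Cj) are synthetic placeholders rather than outputs of any particular imaging system.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of the linear estimation of the polynomial morph coefficients from
# Eq. (40).  tau_r(C_j) and T_r(C_j) are assumed here to be synthetic: the
# azimuthal positions of randomly generated scene points under an illustrative
# non-perspective map and under the ideal perspective map, respectively.

K = 3                                           # assumed degree of the polynomial morph
N = 200                                         # number of synthetic scene points (N >= K + 1)

# Synthetic "measured" and "ideal" azimuthal distances on the panorama.
tau_r = rng.uniform(0.0, np.pi, N)              # tau_r(C_j): imaging-system projection
T_r = np.tan(0.45 * tau_r)                      # T_r(C_j): assumed true perspective values

# Each scene point contributes one row of the linear system
#   xi_0 + xi_1*tau_r + ... + xi_K*tau_r^K = T_r.
A = np.vander(tau_r, K + 1, increasing=True)
coeffs, residuals, *_ = np.linalg.lstsq(A, T_r, rcond=None)

print(coeffs)                                   # estimated morph parameters xi_0 ... xi_K
print(np.max(np.abs(A @ coeffs - T_r)))         # worst-case fit error over the points
```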
It will be appreciated by those skilled in the art that the methods of
The memory unit 2330 can include different types of memory, such as volatile and non-volatile memory and read-only and programmable memory. For example, as shown in
Although the present invention has been described in connection with specific exemplary embodiments, it should be understood that various changes, substitutions, and alterations can be made to the disclosed embodiments without departing from the spirit and scope of the invention as set forth in the appended claims.
This application claims priority to U.S. Provisional Patent Application entitled “Method to Minimize Perspective Distortions in Non-Single Viewpoint Imaging Systems,” Ser. No. 60/220,024, filed on Jul. 21, 2000, which is incorporated herein by reference in its entirety.
This invention was partially made with U.S. Government support from the National Science Foundation, Information Technology Research Award No. IIS-00-85864; DARPA/ONR MURI Contract No. N00014-95-1-0601; and DARPA Human Identification Program Contract No. N00014-00-1-0929. Accordingly, the U.S. Government may have certain rights in this invention.