This invention relates to the field of video image processing and more particularly to the processing of a non-linear 2D spatial transformation.
Geometric or spatial transformations are operations that redefine spatial relationships between points in an image. Generally, a spatial transformation is defined as a mapping function that establishes a spatial correspondence between all points in the input and output images. Given a spatial transformation, each point in the output image assumes the value of its corresponding point in the input image. Common applications of geometric transformations include image distortion correction, compensation for images projected onto non-flat or non-perpendicular surfaces, and compensation of optical distortions.
Geometric transformation algorithms use simplifications in the processing or modeling of the problem in order to ease implementation. These compromises limit the complexity of the transformation and/or introduce results that do not match the intended transformation exactly. One such simplification is the two-pass method, which decomposes a 2D map into a series of 1D maps. Processing geometric transforms in 1D is much simpler than in 2D and allows heavy optimization of the hardware. A general discussion of image warping, including some prior art techniques mentioned below, can be found in the IEEE Computer Society Press monograph Digital Image Warping, George Wolberg, 1998, which is hereby incorporated by reference.
Much of the current state of the art in image warping is based on techniques used in texture mapping for 3D computer graphics, as well as on other specialized methods developed specifically for video special effects. Texture mapping as used in 3D graphics is typically implemented as a non-separable 2D geometric operation. Non-separable methods typically approximate the geometric distortion as a collection of piecewise linear distortions, or focus on perspective mappings. A perspective mapping is useful in 3D graphics because it defines how to render a 3D object to a 2D viewing area; however, distortions such as pincushion/barrel effects and lens aberrations present in optical systems cannot be modeled as a perspective mapping. Using piecewise linear representations for the geometric distortions can significantly increase memory requirements.
Due to the complexity of implementing non-separable methods, most prior art designs simplify the filtering/interpolation methods used; for example, MIP (“multum in parvo”) mapping, which uses a hierarchy of filtered images, is often substituted for more complex filtering. The image artifacts created by this simplified filtering are unacceptable for video processing. Other texture mapping methods exploit the fact that 3D graphics often requires warping a single static texture (image) under many different distortions, whereas in optical processing it is important to be able to warp dynamic images using a single geometric distortion.
Methods specifically developed for video warping are discussed in U.S. Pat. No. 5,175,808 to Sayre and U.S. Pat. No. 5,204,944 to Wolberg et al. These methods are based on a pixel-by-pixel (i.e. look-up table) representation of the geometric transformation. However, a pixel-by-pixel representation requires a large amount of data storage and does not allow for a simple method of performing manipulations of the geometric transformation (for example, rotation, scaling, or translation of an image in addition to the geometric transformation). Other separable methods for geometric transformations are limited to affine or perspective transformations and cannot perform “non-linear” transformations; for example, see U.S. Pat. No. 4,835,532 to Fant, U.S. Pat. No. 4,975,976 to Kimata et al., U.S. Pat. No. 5,808,623 to Hamburg, and U.S. Pat. No. 6,097,855 to Levien. Methods that use mathematical models of the spatial transformation and allow the user to adjust the parameters of the model are often limited to affine and perspective warps. As has already been discussed, affine or perspective warps cannot adequately model the distortions required for optical systems.
The invention provides, in one aspect, an image transformation method for translating a non-linear 2D geometrical transformation into two separable 1D geometrical transformations, said method comprising the steps of:
Further aspects and advantages of the invention will appear from the following description taken together with the accompanying drawings.
In the accompanying drawings:
The image transformation method 10 of the present invention provides a separable two-pass transform for processing an image warp in separate horizontal and vertical passes, allowing the 2D spatial transform to be defined as two 1D processes. Accordingly, any system in which a 2D spatial transformation needs to be applied can make use of image transformation method 10. Image transformation method 10 can be used in applications that range from correcting small distortions (in projectors, cameras, and display devices), to correcting for perspective (keystone or special wide-angle lens corrections), to a complete change in image geometry (such as forming rectangular panoramas from circular 360° images, or other rectangular-to-polar type mappings).
While the use of a separable algorithm does not improve the representation of 2D spatial transformations, it makes the transformations more amenable to hardware implementation. Separating the scaling data into vertical and horizontal passes (or datasets) allows the use of 1D filtering to process the image. Without separation, a 2D filter is needed, which is more complicated and less efficient. Although a representation of a 2D transformation in terms of separated datasets is more complicated, 2D filtering gives better image quality than 1D filtering, and not all transformations are separable, the hardware benefits associated with the two-pass approach outweigh these disadvantages. The main benefit of using a one-dimensional spatial transform is a substantial reduction in calculations and, accordingly, a substantially simplified hardware implementation. Further benefits include efficient memory access and the ability to process data in a real-time environment. As will be described, the compact representation of the spatial transforms also allows for scalability to the required level of precision.
The grid data description of a geometric (i.e. spatial) transformation defines the mapping of a finite set of points relative to either the input or the output image. Image transformation method 10 allows users to map given pixels to certain “landmark” positions (i.e. point mapping) that define the desired geometric transformation, such as those available from a ray trace of an optical system. While it is possible to represent a geometric transformation in an analytical form, the required number of user parameters grows quickly when non-linear warps are required. Since most hardware implementations of warping restrict the model or “form” of the warping possible (for example, representing the warp with polynomial functions), only a finite number of point mappings are required to uniquely specify the geometric transformation. As an example, if the hardware defines a warp as a bi-variate polynomial of order four, then a grid of 5×5 point mappings is sufficient to uniquely specify the geometric transformation. Although the inverse map is a useful concept in hardware implementation, allowing the user to define the geometric transformation as a forward map is important, since the user naturally considers the geometric transformation as one in which the input image is mapped to an output image. In some situations, obtaining the inverse data may not be possible, as not all destination points have a known mapping to source locations. In such cases, it is useful to model a forward or inverse 2D map as two surface maps.
During the inverse transformation stage 20, at step (12), the transformation input grid is provided by the user. The input grid is a list of input-to-output (forward map) or output-to-input (inverse map) point mappings. For example:
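As a hypothetical illustration (the specific values below are invented for exposition and are not taken from the original example), a forward-map grid of point mappings might be expressed as a list of (x_in, y_in) → (x_out, y_out) pairs:

```python
# Hypothetical forward-map grid: each entry maps an input "landmark" pixel to
# its desired output position.  Values are illustrative only.
forward_grid = [
    # (x_in, y_in)   ->  (x_out, y_out)
    ((0,   0),   (  12.3,   8.7)),
    ((511, 0),   ( 498.1,  10.2)),
    ((0,   383), (  10.9, 371.5)),
    ((511, 383), ( 500.4, 369.8)),
    ((255, 191), ( 257.0, 190.1)),
]
```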
At step (14), it is determined whether the geometric transformation is a forward transformation, and if so then the transformation is converted into an inverse transformation at step (16).
The non-regular inverse map is approximated within image transformation method 10 using a “thin plate spline” radial basis function of the form:

s(x) = p(x) + Σ λ_i φ(x − x_i)   (1)

where φ(x − x_i) = r² log(r), x is a two-dimensional point, r is the radial distance between the two-dimensional points x and x_i, and p(x) is an additional linear transformation of the image map.
The non-regular inverse map is decomposed into its two surface maps u=U(x,y) and v=V(x,y). However, because the original inverse map was defined on a non-regular grid, the two surface maps will also be defined on a non-regular grid as shown in
The map U(x,y) will be fit using a function s_u(x) of the form (1), and the map V(x,y) will be fit using a corresponding function s_v(x).
Details on the use of radial basis functions for two-dimensional scattered data interpolation are provided in the reference Radial Basis Function Methods for Interpolation to Functions of Many Variables, by M. J. D. Powell (Fifth Hellenic-European Conference on Computer Mathematics and its Applications, September 2001), hereby incorporated by reference.
The function s_u(x) is sampled on a uniform grid of points to obtain an inverse surface map u=U(x,y) that is defined on a uniform grid. Likewise, the function s_v(x) is sampled on a uniform grid of points to obtain an inverse surface map v=V(x,y) that is defined on a uniform grid. The functions u=U(x,y) and v=V(x,y) are sampled on a uniform grid in order to simplify the processing required in the analytical stage 30. Radial basis functions are preferred because they tend to be more stable at points of extrapolation that are far from the points of interpolation. Since many points on the inverse map will be defined by extrapolation, this is a useful property of the radial basis functions. The preferred embodiment of the image transformation method 10 uses a forward map defined on a uniform grid to simplify user input; however, a forward map defined on a non-uniform grid is handled in exactly the same way as the uniform-grid case, so image transformation method 10 already supports forward maps defined on a non-uniform grid.
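The following is a minimal numpy sketch of this stage, fitting a thin plate spline of the form (1) to scattered inverse-map samples and then sampling it on a uniform grid. The function names (fit_tps, eval_tps) and the sample data are illustrative assumptions, not part of the original disclosure.

```python
import numpy as np

def tps_phi(r):
    # Thin plate spline kernel phi(r) = r^2 * log(r), with the limit phi(0) = 0.
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out

def fit_tps(points, values):
    """Fit s(x) = p(x) + sum_i lambda_i * phi(|x - x_i|) to scattered 2D data,
    where p(x) = c0 + c1*x + c2*y is the affine (linear) part."""
    n = len(points)
    r = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    K = tps_phi(r)                                 # radial part, (n, n)
    P = np.hstack([np.ones((n, 1)), points])       # affine part, (n, 3)
    A = np.block([[K, P], [P.T, np.zeros((3, 3))]])
    b = np.concatenate([values, np.zeros(3)])
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]                        # (lambda_i, affine coefficients)

def eval_tps(lam, c, points, query):
    r = np.linalg.norm(query[:, None, :] - points[None, :, :], axis=-1)
    return tps_phi(r) @ lam + c[0] + query @ c[1:]

# Scattered inverse-map samples: output positions (x, y) -> input coordinate u.
xy = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 80.0], [100.0, 80.0], [50.0, 40.0]])
u  = np.array([3.0, 96.0, 2.0, 97.5, 49.0])
lam, c = fit_tps(xy, u)

# Sample u = U(x, y) on a uniform grid for the analytical stage.
gx, gy = np.meshgrid(np.linspace(0, 100, 11), np.linspace(0, 80, 9))
grid = np.column_stack([gx.ravel(), gy.ravel()])
U_grid = eval_tps(lam, c, xy, grid).reshape(gx.shape)
```

The same fit-and-sample procedure would be applied a second time to the v samples to obtain V(x,y) on the uniform grid.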
During the analytical stage 30 at step (18), image transformation method 10 converts the inverse warp map generated by inverse transformation stage 20 into an analytical form suitable for further processing. Specifically, the inverse warp map with (x,y) defined on a regular grid (e.g. the map shown in
The 2D warp map is represented by two surface functions u=U(x,y) and v=V(x,y) as discussed above. Using the well known techniques of spline interpolation/approximation on regular grids, we can represent the inverse transformation (u,v)=(U(x,y), V(x,y)) in the form
u = U(x,y) = ΣΣ a_ij N_i(x) N_j(y)
v = V(x,y) = ΣΣ b_ij N_i(x) N_j(y)   (2)
where N_i(x) and N_j(y) are conventionally known spline basis functions. Additional details on these functions are provided in the reference Geometric Modeling (Wiley, 1997), herein incorporated by reference. There are several benefits of using spline basis functions. First, the number of spline pieces can be adjusted according to the complexity of the warp that has been defined. Second, the spline pieces can be joined in a smooth or un-smooth manner as described by the warp; for example, it is possible to represent tears in the warp by not requiring continuity at the patch boundaries. Third, the spline functions are essentially polynomial functions which can be easily evaluated in hardware.
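As a sketch of this step, the uniformly sampled surface maps can be fit with tensor-product splines using, for example, scipy's RectBivariateSpline; the resulting spline coefficients play the role of a_ij and b_ij in equation (2). The grid sizes and the toy distortion below are assumptions for illustration only.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Uniform grid of output coordinates (hypothetical 1024x768 output image).
xs = np.linspace(0, 1023, 33)
ys = np.linspace(0, 767, 25)
X, Y = np.meshgrid(xs, ys, indexing="ij")

# Toy uniformly sampled inverse surface maps u = U(x, y) and v = V(x, y).
U_grid = X + 20.0 * np.sin(2 * np.pi * Y / 768.0)
V_grid = Y + 10.0 * np.sin(2 * np.pi * X / 1024.0)

# Cubic tensor-product spline fits; their coefficients correspond to a_ij and b_ij in (2).
U_spline = RectBivariateSpline(xs, ys, U_grid, kx=3, ky=3)
V_spline = RectBivariateSpline(xs, ys, V_grid, kx=3, ky=3)

u0 = U_spline(512.0, 384.0)[0, 0]   # evaluate u = U(512, 384)
v0 = V_spline(512.0, 384.0)[0, 0]   # evaluate v = V(512, 384)
```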
During the geometric transformation stage 40 at step (21), image transformation method 10 optionally performs a geometric adjustment of the analytic form of the inverse image map. By using the form (u,v)=(U(x,y), V(x,y)), where:
u = U(x,y) = ΣΣ a_ij N_i(x) N_j(y)
v = V(x,y) = ΣΣ b_ij N_i(x) N_j(y)   (2)
it is simple to perform scaling of both the input and output image sizes. To scale the size of the input space for either axis we use the relationship,
s*u = s*U(x,y) = s*ΣΣ a_ij N_i(x) N_j(y) = ΣΣ (s*a_ij) N_i(x) N_j(y)
t*v = t*V(x,y) = t*ΣΣ b_ij N_i(x) N_j(y) = ΣΣ (t*b_ij) N_i(x) N_j(y)   (3)
where s and t are scale factors of the u and v axes respectively, and where N_i(x) and N_j(y) are conventionally known spline basis functions. To scale the size of the output space for either axis we use the relationship:
u = U(a*x, b*y) = ΣΣ a_ij N_i(a*x) N_j(b*y)
v = V(a*x, b*y) = ΣΣ b_ij N_i(a*x) N_j(b*y)   (4)
where a and b are scale factors of the x and y axes respectively. Further details on applying linear transforms to spline functions are available in Geometric Modeling (Wiley, 1997). Scaling of the input and output image sizes allows an existing forward/inverse map defined for one combination of input/output image size to be used with another input/output image size without having to generate a new forward/inverse map. Other linear operations, such as translation and rotation of the analytical form, can be used to allow a user to adjust a given forward/inverse map without having to generate a new one; this situation might occur in a calibration environment where a user starts off with a reference map which requires slight adjustment for that particular environment.
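A brief sketch of these scaling relationships, using generic basis-function callables (the helper names are assumptions for illustration): because the surface is linear in its coefficients, scaling the input (u, v) range amounts to scaling the coefficients as in (3), while scaling the output (x, y) domain amounts to evaluating the basis functions at scaled arguments as in (4).

```python
import numpy as np

def eval_surface(coeffs, Nx, Ny, x, y):
    """Evaluate sum_ij c_ij * N_i(x) * N_j(y) for lists of basis functions Nx, Ny."""
    bx = np.array([N(x) for N in Nx])
    by = np.array([N(y) for N in Ny])
    return bx @ coeffs @ by

def scale_range(coeffs, s):
    # Equation (3): s * U(x, y) is obtained by scaling every coefficient by s.
    return s * np.asarray(coeffs)

def eval_scaled_domain(coeffs, Nx, Ny, x, y, a, b):
    # Equation (4): U(a*x, b*y) is obtained by evaluating the basis at (a*x, b*y).
    return eval_surface(coeffs, Nx, Ny, a * x, b * y)
```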
During the separation stage 50 at step (22), the inverse map is separated into two 1D image maps, namely the PassOne and PassTwo maps. In order to represent the original warp mapping defined by (u,v)=(U(x,y), V(x,y)) as a separable two-pass transform, two warp maps need to be defined, namely (x,v)=(x, V(x,y)) and (u,v)=(U′(x,v), v), with u=U′(x,v) defined so that the composition of these two maps is equivalent to the original map (u,v)=(U(x,y), V(x,y)). The relationship between the maps (u,v)=(U(x,y), V(x,y)), (x,v)=(x, V(x,y)), and (u,v)=(U′(x,v), v) is shown in
From
We define V′(x,v) as the inverse of V(x,y) in y for each fixed x, and we have the following:
(x,v) → (x, V′(x,v)) → (x, V(x, V′(x,v))) = (x,v)   (5)
from the definition of V′(x,v) and V(x,y), V(x, V′(x,v)) = v. Then, using the above identity, (U′(x,v), v) = (U(x, V′(x,v)), v):
(U′(x,v), v) = (U(x,y), V(x,y)) ∘ (x, V′(x,v)) = (U(x, V′(x,v)), V(x, V′(x,v)))   (6)
The function (x,v)=(x, V(x,y)) is given directly by equation (2); the function (u,v)=(U′(x,v), v) can be approximated as follows.
We now have the two functions u=U′(x,v) and v=V(x,y), that is, PassOne and PassTwo respectively, represented in analytical form:
u = U′(x,v) = ΣΣ a_ij N_i(x) N_j(v)
v = V(x,y) = ΣΣ b_ij N_i(x) N_j(y)   (7)
where N_i and N_j are spline basis functions. The relationship between the functions in (7) and the separable two-pass transform is shown in
The second pass PassTwo creates the output image from the intermediate image by using the map (x,v)=(x, V(x,y)). By fixing the value of x and allowing y to vary, the map (x,v)=(x, V(x,y)) becomes a mapping of a vertical line in the output image to a vertical line in the intermediate image, and so the interpolation becomes a one-dimensional problem. By repeating this process for every value of x in the output image, the entire output image can be created.
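A minimal sketch of the two passes follows, assuming the separated maps are available as Python callables and using simple linear 1D interpolation in place of the higher-quality 1D filtering described here; the function and argument names are illustrative.

```python
import numpy as np

def warp_two_pass(src, U_prime, V):
    """Sketch of a separable two-pass inverse warp with linear 1D resampling.

    src:      input image, shape (H, W), indexed as src[v, u]
    U_prime:  callable (x, v) -> u, the PassOne map u = U'(x, v)
    V:        callable (x, y) -> v, the PassTwo map v = V(x, y)
    """
    H, W = src.shape
    inter = np.zeros_like(src, dtype=float)
    # Pass one: resample each row v of the input along u = U'(x, v).
    for v in range(H):
        u = np.array([U_prime(x, v) for x in range(W)])
        inter[v] = np.interp(u, np.arange(W), src[v].astype(float))
    out = np.zeros_like(inter)
    # Pass two: resample each column x of the intermediate image along v = V(x, y).
    for x in range(W):
        vv = np.array([V(x, y) for y in range(H)])
        out[:, x] = np.interp(vv, np.arange(H), inter[:, x])
    return out
```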
The PassOne and PassTwo maps are converted to conventionally known Target Increment (TarInc) and Offset form. The analytic form for a spline surface is given as:
ΣΣ c_ij N_i(x) N_j(y)   (8)
which can be grouped as:

Σ_i ( Σ_j c_ij N_j(y) ) N_i(x)   (9)

with the inner sums Σ_j c_ij N_j(y) serving as the scan-line coefficients, or alternatively:

Σ_j ( Σ_i c_ij N_i(x) ) N_j(y)   (10)

with the inner sums Σ_i c_ij N_i(x) serving as the scan-line coefficients,
and where N_i(x) and N_j(y) are conventionally known spline basis functions. These formulas provide the equations of curves that are embedded in the surface. The grouping implies that the curves are orthogonal to one of the domain parameters x or y. The form in (9) gives scan-lines, or curves in the x parameter, on the surface, and the form in (10) gives scan-lines, or curves in the y parameter, on the surface.
Since the first pass only performs processing in the horizontal, or x, direction, we represent the function u=U′(x,v)=ΣΣ a_ij N_i(x) N_j(v) in the form Σ_i ( Σ_j a_ij N_j(v) ) N_i(x), which represents the surface as horizontal scan-lines. Since the second pass only performs processing in the vertical direction, we represent the function v=V(x,y)=ΣΣ b_ij N_i(x) N_j(y) in the form Σ_j ( Σ_i b_ij N_i(x) ) N_j(y), which represents the surface as vertical scan-lines.
The number of scan-line equations required for each pass can be reduced by approximating the surface by a “ruled surface”, which uses interpolation between scan-lines. Linear interpolation can be used if the differences between anchor scan-lines are small and precision is maintained. An alternative to using linear interpolation is to evaluate the surfaces in the conventionally known tensor form, in which case no approximation is performed.
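A short sketch of these two ideas, assuming the tensor coefficient matrix a_ij is available as a numpy array and using illustrative helper names: collapsing the double sum to a single scan-line's 1D coefficients, and linearly interpolating between two anchor scan-lines as in the ruled-surface approximation.

```python
import numpy as np

def scanline_coeffs(a_ij, Nv_basis, v0):
    """Collapse u = sum_ij a_ij N_i(x) N_j(v) to the row v = v0:
    d_i = sum_j a_ij N_j(v0), so that along that row u(x) = sum_i d_i N_i(x)."""
    nv = np.array([N(v0) for N in Nv_basis])
    return np.asarray(a_ij) @ nv          # one 1D coefficient per N_i(x)

def interpolated_scanline(d_prev, d_next, frac):
    """Ruled-surface approximation: linearly interpolate the 1D coefficient
    vectors of two anchor scan-lines (0 <= frac <= 1)."""
    return (1.0 - frac) * d_prev + frac * d_next
```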
Image warping requires a positional mapping as well as a “local scale factor” in order to determine a local warp at a given output pixel. The positional mapping and “local scale factor” can be combined by using the concept of an Offset and TarInc value that together give the positional mapping as well as the “local scale factor”. The scan-line functions for each pass, in the form
must be represented in a way that allows for the generation of TarInc values.
The Offset value for each scan-line will be determined by evaluating a polynomial O(a), while each scan-line's TarInc values will be determined by a polynomial T(b).
Given a scan-line mapping function x(b) (i.e. where the scan-line variable is fixed), the difference operator Δ is defined by:
Δx(b)=x(b+1)−x(b)
x(b+1)=x(b)+Δx(b)
The last equation indicates that the next function value is the current value plus some delta. The difference operator function Δx(b) can be defined as an nth-order polynomial of the form:
Δx(b) = p_1 b^n + p_2 b^(n−1) + … + p_n b + p_(n+1)
in which p_1, …, p_(n+1) are the polynomial coefficients. This function definition can be used to model the spacing between domain target pixels in terms of the range source spacing. This implies that Δx(b)=T(b) and defines all target increments along the domain. An example is given in
the second graph shows the function after application of the difference operator. An advantage of using this form is that the hardware required to evaluate the TarIncs is simpler than the hardware required to evaluate the position function directly. Because the difference operator reduces the order of a polynomial by one, a TarInc scan-line requires less computation to evaluate than the original scan-line. All that remains is showing the relationship between the functions U′(x,v)=ΣΣ a_ij N_i(x) N_j(v) and V(x,y)=ΣΣ b_ij N_i(x) N_j(y), and the Offset and TarInc piecewise polynomials.
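The following numpy sketch illustrates the Offset/TarInc idea on a single hypothetical cubic scan-line polynomial (the coefficient values are invented): the difference operator yields a polynomial of one lower order, and positions are recovered incrementally from the offset plus accumulated target increments.

```python
import numpy as np

# Positional scan-line polynomial x(b): hypothetical cubic coefficients, highest order first.
pos = np.poly1d([1.5e-7, -2.0e-4, 1.05, 3.0])

b = np.arange(0, 100)
positions = pos(b)

# Difference operator: TarInc(b) = x(b+1) - x(b), itself a polynomial of one lower order.
tarinc = np.poly1d(np.polyfit(b, pos(b + 1) - pos(b), deg=2))

# Incremental (hardware-style) evaluation: offset plus accumulated target increments.
offset = pos(0)
recon = offset + np.concatenate(([0.0], np.cumsum(tarinc(b[:-1]))))
assert np.allclose(recon, positions, atol=1e-6)   # matches direct evaluation
```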
Given the first pass mapping, setting x=0 defines a boundary curve of the surface. As the surface has been represented using spline patches, the boundary curve is already in a piecewise polynomial format and no additional work needs to be done. So we have O(v)=U′(0,v)=Σ a_0j N_j(v), where O(v) is the offset polynomial for the first pass of the separable two-pass transform. The mapping U′(x,v) in scan-line form as defined by (9) is already in piecewise polynomial form; by applying the difference operator to each scan-line function we have a TarInc piecewise polynomial for that scan-line.
Given the second pass scan-line position mapping V(x,y), setting y=0 gives V(x,0)=Σ b_i0 N_i(x), which defines a boundary curve of the surface V(x,y). As the surface has been represented using spline patches, the boundary curve is already in a piecewise polynomial format and no additional work needs to be done. So we have O(x)=V(x,0)=Σ b_i0 N_i(x), where O(x) is the offset polynomial for the second pass of the separable two-pass transform. The mapping V(x,y) in scan-line form as defined by (10) is already in piecewise polynomial form; by applying the difference operator to each scan-line function we have a TarInc piecewise polynomial for that vertical scan-line.
The scan-line position mapping can also be used directly with an alternative image filtering apparatus; however, in this case we also need to supply the “local scale factor” to the apparatus. One way to provide the local scale factor is to use an additional scan-line polynomial in the form
that represents the derivative of the positional scan-line. Spline functions allow a simple manipulation of the positional spline coefficients to generate the derivative spline coefficients. Another issue with using the scan-line position mapping directly is that the initial starting point for interpolated scan-lines will not be explicitly given, as it is in the Offset/TarInc case, but will instead depend on the method used to determine the interpolated scan-lines.
An example of the interpolated scan-line case without using Offset or TarInc functions is shown in
During the error check stage 70, the error difference between the analytic non-separable form of the warp given by (2) and the output of the convert-to-scan-line-form stage 60 is determined. Specifically, given the scan-line based description of the separated warp, it is possible to evaluate the warp at each pixel for the two passes and compare the result to the original forward or inverse non-separable warp as defined by (2). The error could be due to the fact that the analytic form of the non-separable warp given by (2) was not sampled on a sufficiently dense uniform grid as an input to the separation stage 50; in this case the sampling of the analytic form (2) is done on an increasingly dense uniform grid until an accepted tolerance is reached. The error could also be caused by approximation error when fitting the sampled separable form in (7); in this case the spline basis functions can be subdivided to increase the number of polynomial pieces used and/or the order of the polynomial pieces can be increased. Another source of error is interpolation error between scan-lines; in this case the solution is to increase the number of scan-lines used until an acceptable error is reached. A simple method is to add scan-lines half way between existing scan-lines. This may result in not using any interpolated scan-lines at all and, in effect, evaluating (7) directly.
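A compact sketch of this check, assuming the original map U, the separated maps U′ and V, and the image size are available as vectorized callables (all names are illustrative): it evaluates the composition U′(x, V(x, y)) of equation (6) on the full pixel grid and reports the worst-case deviation from U(x, y).

```python
import numpy as np

def max_separation_error(U, V, U_prime, width, height):
    """Worst-case error (in pixels) between the original non-separable map
    u = U(x, y) and its two-pass composition u = U'(x, V(x, y))."""
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    err = np.abs(U(xs, ys) - U_prime(xs, V(xs, ys)))
    return float(err.max())
```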
As will be apparent to those skilled in the art, various modifications and adaptations of the structure described above are possible without departing from the present invention, the scope of which is defined in the appended claims.
This regular application claims priority from provisional U.S. Application No. 60/297,240 filed Jun. 12, 2001.
Number | Name | Date | Kind |
---|---|---|---|
4472732 | Bennett et al. | Sep 1984 | A |
4835532 | Fant | May 1989 | A |
4908874 | Gabriel | Mar 1990 | A |
4922544 | Stansfield et al. | May 1990 | A |
4975976 | Kimata et al. | Dec 1990 | A |
5175808 | Sayre | Dec 1992 | A |
5204944 | Wolberg et al. | Apr 1993 | A |
5475803 | Stearns et al. | Dec 1995 | A |
5568600 | Kaba | Oct 1996 | A |
5594676 | Greggain et al. | Jan 1997 | A |
5715385 | Stearns et al. | Feb 1998 | A |
5808623 | Hamburg | Sep 1998 | A |
5848199 | Naqvi | Dec 1998 | A |
6097855 | Levien | Aug 2000 | A |
20020102031 | Lafage et al. | Aug 2002 | A1 |
20040130669 | Shin et al. | Jul 2004 | A1 |
20040156558 | Kim | Aug 2004 | A1 |
Number | Date | Country | |
---|---|---|---|
20030020732 A1 | Jan 2003 | US |
Number | Date | Country | |
---|---|---|---|
60297240 | Jun 2001 | US |