The present invention relates to digital images and more specifically to altering the format of digital image data from a first format to a second format.
In the video industry, there are a variety of digital video formats. One common digital video format is referred to in the art as “525”. Such digital video has an active picture area of 720 columns and 486 rows, wherein each point on the active picture area is defined by a pixel. There are 486 lines, each consisting of 720 intensities, typically in three colors each having a separate intensity value. This active area represents a 4:3 image in that the width of the image is 1.33 times its height. In particular, the pixels in “525” are non-square rectangles with a width-to-height aspect ratio of (4/3)×(486/720)=0.9.
In the video industry, it is often desirable to upconvert such images for High Definition (“HD”) broadcasts and other HD applications. One HD format, known as “1080,” has 1920 columns and 1080 rows, and represents a scene with aspect ratio 16/9. Since 1920/1080=16/9, the pixels are square.
In order to convert from the “525” format to the “1080” format, it is necessary to “stretch” 720 columns into 1920 columns. Since the pixels are non-square in “525”, an effective 16:9 scene is obtained by using only 364 of the 486 rows; in other words, in each of the 720 columns, 364 pixels are stretched into 1080 pixels. This approach fills up the HD image, but sacrifices some of the original image content, namely a horizontal strip of the original scene consisting of 486−364=122 lines. Another approach stretches the 486 rows of “525” into the 1080 rows of the “1080” format and stretches the 720 original columns into 1440 (=0.9*720*1080/486) columns. Black mattes are placed on the left and right of the destination image in order to expand the image to 1920 columns; thus the entire screen of the display device is not fully utilized.
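The arithmetic behind the two upconversion options can be checked with a short sketch. The variable names are illustrative only, and the 0.9 pixel aspect ratio is derived here on the assumption that the 720×486 active area represents exactly a 4:3 picture.

```python
from fractions import Fraction

# Figures quoted in the text for the "525" and "1080" formats.
SRC_COLS, SRC_ROWS = 720, 486
DST_COLS, DST_ROWS = 1920, 1080

# Pixel width-to-height ratio of "525": a 4:3 picture sampled as 720 x 486.
pixel_aspect = Fraction(4, 3) / Fraction(SRC_COLS, SRC_ROWS)

# Option 1: fill the HD frame. Source rows spanned by a 16:9 region that is
# 720 columns wide, truncated to a whole number of rows.
rows_for_16x9 = SRC_COLS * pixel_aspect * Fraction(9, 16)
rows_used = int(rows_for_16x9)
rows_discarded = SRC_ROWS - rows_used

# Option 2: keep every row. Destination columns covered by the picture; the
# remaining columns are black mattes on the left and right.
cols_covered = pixel_aspect * SRC_COLS * DST_ROWS / SRC_ROWS
matte_cols = DST_COLS - cols_covered

print(float(pixel_aspect), rows_used, rows_discarded)
print(cols_covered, matte_cols)
```

Exact rational arithmetic is used so that no rounding enters the comparison with the figures in the text.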
Other video applications require not only upconversion, as with the “525” to “1080” conversion, but also downconversion. For example, in the case of computer animation, one might wish to convert a file of dimensions 1848×1101 to a universal video master of dimensions 1920×1080. The 1848 columns are stretched to 1920 and the 1101 rows are squeezed to 1080. Whereas the pixels in the computer file are square, the resulting “1080” HD image is “anamorphic” (non-square pixels).
In order to determine what color (intensity) values a pixel should have once the aspect ratio has been normalized as described above, prior art systems have used a technique which is known in the art as “Nearest Neighbor.” For each pixel (location) in the destination image, the nearest neighbor method finds the nearest pixel in the source image and copies the source intensity value (for each color separately) at that location to the destination pixel. This is an extremely fast approximation, but provides poor quality. For example, in the case of upconversion, an intensity value may be repeated many times, creating disturbing, uniform blocks.
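The block artifacts of the nearest neighbor method are easy to reproduce. The following is a minimal sketch for one color component, not a description of any particular prior art system; the function name is hypothetical, and nearness is measured between pixel centers, which is an assumption of the sketch.

```python
def nearest_neighbor_resize(src, dst_cols, dst_rows):
    """Resample a row-major image (a list of rows) by copying, for every
    destination pixel, the source pixel whose center is nearest.

    Both grids are assumed to cover the same bounding rectangle.
    """
    src_rows, src_cols = len(src), len(src[0])
    dst = []
    for j in range(dst_rows):
        # Center of destination row j expressed in source row units.
        sj = min(src_rows - 1, int((j + 0.5) * src_rows / dst_rows))
        row = []
        for i in range(dst_cols):
            si = min(src_cols - 1, int((i + 0.5) * src_cols / dst_cols))
            row.append(src[sj][si])
        dst.append(row)
    return dst


small = [[10, 20],
         [30, 40]]
big = nearest_neighbor_resize(small, 4, 4)
for row in big:
    print(row)
```

Each source value is repeated in a 2×2 block of the enlarged image, which is the uniform-block effect described above.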
Another prior art technique is known as bilinear interpolation. In bilinear interpolation, for each pixel z in the destination image, the nearest four pixels, say z1, z2, z3, z4, are first found in the same field from the source image corresponding to the same instant in time. If the destination array has square pixels, then these four points form a square with z inside. The source intensity data is modeled for each color component with a bilinear function, P(x, y)=A+Bx+Cy+Dxy, of the two variables. The four coefficients A, B, C, D are determined by the four intensity values Is(z1), . . . , Is(z4) of the source image at the four nearest pixels. The intensity value selected for the destination image at z=(u, v) is P(u, v), where u and v are the coordinates of z in the common coordinate system of the source image in a first format and the destination image in a second format. Other prior art techniques, such as bi-quadratic and bi-cubic interpolation, improve upon the bilinear technique. These techniques also rely on polynomials; however, the polynomials have degrees two and three respectively. Additionally, the subset of pixels in the source image used to determine the coefficients of the polynomial is larger. For example, in the case of bi-quadratic interpolation, there are nine coefficients to estimate and hence, in the case of exact interpolation, nine pixel values in the source image are identified with each destination location. Other variations are possible in which the polynomials are not fitted exactly to the source data but rather provide only a smooth approximation. The quality is superior to bilinear approximation, but the amount of computation is far greater.
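As an illustration of the bilinear model P(x, y)=A+Bx+Cy+Dxy, the sketch below solves for the four coefficients from the intensities at the corners of an axis-aligned rectangle of source pixels and evaluates P at an interior point. The function names, corner coordinates and intensity values are hypothetical.

```python
def bilinear_coefficients(x0, y0, x1, y1, i00, i10, i01, i11):
    """Coefficients (A, B, C, D) of P(x, y) = A + B*x + C*y + D*x*y with
    P(x0, y0) = i00, P(x1, y0) = i10, P(x0, y1) = i01, P(x1, y1) = i11."""
    dx, dy = x1 - x0, y1 - y0
    # The mixed second difference isolates the x*y coefficient.
    D = (i00 - i10 - i01 + i11) / (dx * dy)
    B = (i10 - i00) / dx - D * y0
    C = (i01 - i00) / dy - D * x0
    A = i00 - B * x0 - C * y0 - D * x0 * y0
    return A, B, C, D


def evaluate(coeffs, x, y):
    A, B, C, D = coeffs
    return A + B * x + C * y + D * x * y


# Four source pixels at the corners of a 2-wide, 3-high rectangle.
coeffs = bilinear_coefficients(0, 0, 2, 3, 10, 20, 40, 62)
value = evaluate(coeffs, 1, 1)   # intensity assigned to destination point (1, 1)
print(coeffs, value)
```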
Although the prior art polynomial techniques allow for the creation of previously non-existent pixel values, these techniques suffer a prominent loss of sharpness because image transitions and other high frequency structures are compromised by filter over-smoothing. This can be readily visualized by considering a scan line of the original image which crosses a sharp boundary, such as would occur when one subject occludes another; the values along the scan line will typically have smooth sections joined by rather sharp transitions in which the image intensities change abruptly from one characteristic level to another. If the image is approximated by a low-order polynomial, these transitions will be overly smoothed. Visually, the new image appears to have fewer details and to be blurred as compared with the original. It is well known that image transitions (edges and boundaries due to occlusion, shadowing, motion, etc.) carry a large portion of the information content; therefore this type of degradation poses a serious problem. Thus, a need exists for adaptive filtering in which the filtering is dependent upon the location within the source image, so as to reduce the amount of blurring caused by traditional interpolation or smoothing.
In a first embodiment of the invention there is provided a method and computer program product for format conversion of a digital image. In one embodiment of the invention, a digital image in an old format is converted to a converted image having a new format, wherein the digital image and the converted image have the same reference system for describing the location of pixels. For example, a reference system may be the standard reference system used in the video industry, in which the origin, designated pixel point (0,0), is at the upper left hand corner of the image. The digital image has a plurality of digital data, wherein there is digital data at each pixel location. An example of digital data is the RGB (red, green, blue) values of each pixel. The digital image being converted has dimensions of length and width and a resulting aspect ratio.
The method as embodied normalizes the aspect ratio of the source image and the new format. As used in the following description and in the appended claims, the term “window” shall mean a collection of intensity values associated with pixels in the source image. A window is defined which contains a set of intensity values in the old format. A gradient is estimated for the window and a polynomial is selected based on the gradient in order to represent the intensity variation in the window. The order of the polynomial is larger in the direction of the gradient and smaller in the direction orthogonal to the gradient to avoid blurring at transitions in the destination image. Coefficients are calculated by solving a group of simultaneous equations using the known digital data. After the coefficients for the polynomial are calculated, the intensity values for the destination image pixels within the window can be determined from the polynomial values at those locations. This process is continued by selecting different windows until all of the pixels within the converted image have associated digital data; thus, adaptive filtering is accomplished.
In the preferred embodiment, it should be understood that the size and direction of the gradient of the digital data determine the polynomial type that is selected. As a result, each window will have its own polynomial type and coefficients.
In one embodiment after the pixels in the destination image have been assigned values, the values are recalculated based on polynomials from other proximate windows. By having multiple calculations for each point, a straight or weighted average of the points can be performed which then determines the final value of the digital data for the converted image.
The foregoing features of the invention will be more readily understood by reference to the following detailed description, taken with reference to the accompanying drawings, in which:
Definitions. As used in this description and the accompanying claims, the following terms shall have the meanings indicated, unless the context otherwise requires:
The term “filtering” as used in the following claims and the disclosure shall refer to both interpolation and smoothing as is understood by those of ordinary skill in the art. The term “adaptive” as used in conjunction with the word “filtering” means that the filter changes dependent upon a characteristic of the source image. For example, the filter may vary depending upon the location within the source image, or the filter may vary in direction, such that the filter is different in the x and y directions of the image as defined by a Cartesian coordinate system, or the filtering may vary in direction dependent upon some combination of x and y based upon the source image. “Pixel” refers to both a small, rectangular element of a picture and a grid point. Further, the term pixel may refer to the intensity values associated with a pixel grid point where the context requires. The location of a pixel is assumed to be its upper left hand corner as depicted in
The environment defining the digital video data of the source and destination images is defined in mathematical terms as follows. Let F1, F2, . . . represent the source sequence of digital video fields. Since the separation in time between fields is one-sixtieth of a second, the total number of fields is sixty times the run length in seconds. In standard color video, each field Fn consists of three images, say (In(1), In(2), In(3)), whose physical meaning depends on the color coordinate system as shown in
Suppose that the desired destination format has C′ columns and R′ rows, and an aspect ratio W′/H′. Exactly the same algorithm is used to reformat each image In(k), k=1, 2, 3, of each field Fn. The following description therefore describes one embodiment of the method for a single C×R image I, and all indices referring to field numbers and color components are henceforth dropped.
The source image is an array of intensities defined over a regular, rectangular grid of points in the plane, denoted by G={(xi, yj), 1≦i≦C, 1≦j≦R}. Thus there are CR points. Standard image coordinates are presumed, so that the point (x1, y1) is at the upper left hand corner of the grid, the point (x1, y2) is immediately below (x1, y1), and the point (x1, yR) is at the lower left hand corner. Point (xi, yj) is the upper left hand corner of a pixel, the other three corners being (xi+1, yj), (xi, yj+1) and (xi+1, yj+1). The grid is assumed to be regular in the sense that the spacing is the same between any two adjacent grid points in any row; similarly, the spacing is the same between any two adjacent points in any column. However, the row and column spacing need not be the same, i.e., the pixels need not be square. The relative spacing between any two adjacent column points and any two adjacent row points conveys the shape of the pixels.
The new grid defining the new image has C′ columns and R′ rows, with an aspect ratio of W′/H′. The set of intensity values assigned to this new array is the reformatted, destination image; it represents one color component of one field in the reformatted digital video sequence. In general, the new grid G′ has a different number of points (R′C′ vs RC) and different spacings.
The destination image is in the same coordinate system as the source image wherein the two pixel arrays determine the same bounding rectangle. In fact, all methods for reformatting, in particular those discussed in the background make this assumption in order to perform smoothing or interpolation.
Once the aspect ratios are standardized, the destination image is represented by grid G′={(x′i, y′j), 1≦i≦C′, 1≦j≦R′}, which can be assumed to occupy the same region of the plane as G. For example, in
Let Is(i, j) denote the intensity data of the source image at pixel (xi, yj) ∈ G. The source intensity data are then Is={Is(i, j), 1≦i≦C, 1≦j≦R}. Based on this, intensity values are assigned to each pixel (x′i, y′j) ∈ G′. Let Id={Id(i, j), 1≦i≦C′, 1≦j≦R′} denote the destination image.
A window Bgr ⊂ G is defined for each pixel z ∈ G′, roughly centered at z (Step 602). The source values {Is(m, n), (xm, yn) ∈ Bgr} are used in order to estimate the gradient of Is at the location z (Step 603). The estimate of the gradient of Is at z, denoted ∇Is(z), may be based on as few as three pixels in G. The three pixels form a triangle whose orientation depends on where z falls relative to the pixels in G. Let (xk, yl) ∈ G be the pixel location closest to z. There are four possible choices for the other two locations, depending on which of the four quadrants z lies in relative to (xk, yl). For example, if (xk, yl) is toward the upper left of z, then Bgr={(xk, yl), (xk+1, yl), (xk, yl+1)} and the estimated gradient is
∇Is(z)=(Is(k+1, l)−Is(k, l), Is(k, l+1)−Is(k, l)).
The formula defining the estimated gradient for the other three choices should be obvious to those of ordinary skill in the art.
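One way to read the three-pixel estimate for all four quadrants is sketched below. The function name and the sign arguments sx and sy (+1 when z lies to the right of, or below, its nearest source pixel, and −1 otherwise) are hypothetical, and the differences are not divided by the grid spacing, which is an assumption of the sketch.

```python
def estimate_gradient(src, k, l, sx=1, sy=1):
    """Three-pixel gradient estimate around source pixel (k, l).

    src is indexed as src[row][column], so Is(k, l) is src[l][k] (0-based).
    sx, sy select which horizontal and vertical neighbor completes the
    triangle, according to the quadrant in which z lies.
    """
    center = src[l][k]
    gx = sx * (src[l][k + sx] - center)   # difference along the row
    gy = sy * (src[l + sy][k] - center)   # difference along the column
    return gx, gy


image = [[1, 2, 4],
         [3, 5, 8],
         [6, 9, 13]]
print(estimate_gradient(image, 1, 1))            # z to the lower right of (1, 1)
print(estimate_gradient(image, 1, 1, -1, -1))    # z to the upper left of (1, 1)
```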
In general z does not belong to G, the grid over which Is is defined. The estimated gradient determines the structure of the polynomial that will approximate Is in the vicinity of z. The choice of this polynomial is based on a pre-computed table which identifies one polynomial with each vector in a set of quantized (representative) gradients (Step 604). The structure of the polynomial is the set of non-zero coefficients. The representative polynomials are not necessarily symmetric in x and y, as are P(x, y)=A+Bx+Cy or P(x, y)=A+Bx+Cy+Dxy+Ex²+Fy². Instead, the amount of smoothing is smaller (i.e., the degree is larger) in the general direction of the gradient than in the direction orthogonal to the gradient. Moreover, the amount of smoothing in both directions depends on the magnitude of the gradient. The larger the magnitude, the less the amount of smoothing, thereby providing a relatively higher accuracy of approximation along directions of significant transition than elsewhere. For each z ∈ G′, the coefficients of the chosen polynomial are estimated based on the source intensity values in another window Bls ⊂ G, also roughly centered at z, and in general larger than Bgr (Step 605). The size N(Bls) of Bls depends on the number N(P) of (non-zero) coefficients in the polynomial assigned to pixel z. Estimation of the coefficients is based on least-squares minimization (Step 606). In least-squares minimization, the disparity between the polynomial and the source data is the sum of the squared intensity differences at the points in Bls. In the case of “smoothing”, N(P)<N(Bls) and in the case of “interpolation”, N(P)=N(Bls). For the least-squares minimization there is a closed-form solution which is linear in the data. Once the polynomial is fully determined, the intensity values of the destination image for pixel point z are determined (Step 607). The process then continues until intensity values are determined for all pixel points in the destination image (Step 608).
The selected window Bgr(z), which is a set of source pixels from G, the lattice of the source image, is limited only by the requirements that it contain the point z from the destination image and have enough points proximate to z to estimate the gradient of Is at z. In the preferred embodiment, the choice would be space-variant, depending on the nature of the video material. One might desire a finer estimate (larger window) at “important” locations or when the data is noisy. A small window might suffice for relatively flat regions, such as a section of sky or wall in which the frequency of change of the intensity values is low, whereas a larger window might provide a better estimate near high-frequency structures. The examples provided below presume a non-space-varying window Bgr, but it should be understood by one of ordinary skill in the art that in the alternative a space-varying window could similarly be implemented.
The possible gradient vectors are quantized into a discrete set of magnitudes and directions, wherein the quantization level is a parameter of the system. In one implementation, both the magnitude and the direction of the gradient are quantized into four possible values, yielding sixteen possible “classes” for the gradient vector ∇Is(z) (Step 801). Let these classes be denoted by c=1, . . . 16. Each class c is assigned both a polynomial Pc and a window Bls(c). It should be understood that through the quantization of the direction of the gradient the selected polynomial has an order which is larger in the direction which is substantially equivalent to the gradient and smaller in the direction which is substantially orthogonal to the gradient. See
In order to illustrate these assignments, we consider two simple examples. For simplicity, suppose G is the integer lattice, i.e., (xi, yj)=(i, j), that the closest point to z in G is (0,0), and that z is to the lower right of (0,0).
Assume the gradient is the vector (w,0) and is assigned to a class which represents an intensity surface which is somewhat flat near z but with a mild inclination in the horizontal direction. The designated polynomial might then be Pc(x, y)=A+Bx, which is constant in the vertical direction (as smooth as possible) and linear in the horizontal direction. There are then two parameters to estimate, namely A and B. In this case, the window Bls(c) determined by ∇Is(z) might simply be the two points {(0,0), (1,0)}. For larger values of w, the polynomial chosen might be P(x, y)=A+Bx+Cx², or P(x, y)=A+Bx+Cy+Dx² if the y-component of the gradient were non-zero but small compared with w. In these cases, the window Bls(c) would contain three or more points.
In another example, if instead the gradient vector is (w, w), then the polynomial might be first degree in both x and y, for instance Pc(x, y)=A+Bx+Cy, and the window might be taken as Bls(c)={(0,0),(1,0),(0,1)}. A larger magnitude might be assigned the polynomial Pc(x, y)=A+Bx+Cy+Dxy+Ex²+Fy², where again the amount of smoothing in the x and y directions is the same.
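A gradient quantizer of the kind described (four magnitude levels by four directions) might look like the sketch below. The magnitude thresholds, the treatment of direction modulo 180 degrees and the numbering of the classes are all assumptions made for illustration; the text fixes only the number of levels.

```python
import math

# Hypothetical thresholds separating the four magnitude levels.
MAG_THRESHOLDS = (4.0, 16.0, 64.0)


def gradient_class(gx, gy):
    """Map a gradient vector to a class index c in 1..16."""
    magnitude = math.hypot(gx, gy)
    mag_bin = sum(magnitude >= t for t in MAG_THRESHOLDS)   # 0..3
    # Orientation modulo 180 degrees, snapped to 0, 45, 90 or 135 degrees.
    angle = math.degrees(math.atan2(gy, gx)) % 180.0
    dir_bin = int(angle / 45.0 + 0.5) % 4                    # 0..3
    return 4 * mag_bin + dir_bin + 1


print(gradient_class(1, 0), gradient_class(10, 10), gradient_class(0, 100))
```

Each class index would then be used to look up a polynomial structure and a window in a pre-computed table, as in the two examples above.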
As defined in Step 803, a least-squares approximation is used to determine the coefficients. For a general polynomial P(x, y) in two variables, let αnm denote the coefficient of the term x^n y^m, n, m=0, 1, . . . , and let A(P) denote the set of indices (n, m) for which αnm≠0. The pixel defined by z is assumed to be fixed so that c=c(z), the class assigned to the gradient ∇Is(z), as explained above. The polynomial used to approximate the source data in the vicinity of z is then
$$P_c(x, y; \vec{\alpha}) = \sum_{(n,m) \in A(P_c)} \alpha_{nm}\, x^n y^m,$$
where $\vec{\alpha}$ denotes any ordering of the set {αnm, (n, m) ∈ A(P)}. The vector $\vec{\alpha}$ has N(Pc) components, where N(Pc)=|A(P)|. The set of locations that will be utilized for estimating $\vec{\alpha}$ is Bls(z) ⊂ G, where Bls depends only on c. The estimate of $\vec{\alpha}$ will be a function of the image values {Is(i, j), (xi, yj) ∈ Bls(z)}. In the remainder of the specification, the dependence on c will be dropped from the notation.
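A quadratic cost is provided, from which the simultaneous equations of the least-squares approximation used to determine the coefficients are derived, as follows:
$$F(\vec{\alpha}) = \sum_{(x_i, y_j) \in B_{ls}} \bigl(P(x_i, y_j; \vec{\alpha}) - I_s(i, j)\bigr)^2.$$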
The coefficients are chosen by minimizing F, which is a quadratic function of $\vec{\alpha}$. The partial derivatives of F are taken with respect to the components of $\vec{\alpha}$ and set to zero, leading to N(P) linear equations in the N(P) unknowns {αnm}:
$$\sum_{(n',m') \in A(P)} \alpha_{n'm'} \sum_{i,j} x_i^{\,n+n'} y_j^{\,m+m'} = \sum_{i,j} I_s(i, j)\, x_i^{\,n} y_j^{\,m}, \qquad (n, m) \in A(P),$$
where the sums extend over all (i, j) for which (xi, yj) ∈ Bls. This can be rewritten in matrix form as
$$\Gamma \vec{\alpha} = \vec{\beta},$$
where Γ is an N(P)×N(P) matrix constructed from the powers {xi^n yj^m}, but is independent of the source data, and $\vec{\beta}$ is an N(P)-dimensional vector which is constructed from inner products between the source intensity values {Is(i, j), (xi, yj) ∈ Bls} and the components of P.
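The construction of Γ and β from a window and the solution of the resulting linear system can be sketched as follows. The helper names are hypothetical, the solver is a plain Gauss-Jordan elimination chosen only to keep the example self-contained, and the window points and intensity values are made up.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, matrix[r])) + [float(rhs[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: window cannot determine the polynomial")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def fit_polynomial(terms, points, values):
    """Least-squares coefficients of P(x, y) = sum of a[n, m] * x**n * y**m.

    terms  -- exponent pairs (n, m), i.e. the index set A(P)
    points -- (x, y) locations of the window B_ls
    values -- source intensities at those locations
    """
    basis = [[x ** n * y ** m for (n, m) in terms] for (x, y) in points]
    size = len(terms)
    # Gamma depends only on the coordinates; beta also depends on the data.
    gamma = [[sum(row[p] * row[q] for row in basis) for q in range(size)]
             for p in range(size)]
    beta = [sum(row[p] * v for row, v in zip(basis, values)) for p in range(size)]
    return solve_linear(gamma, beta)


def evaluate(terms, coeffs, x, y):
    return sum(a * x ** n * y ** m for a, (n, m) in zip(coeffs, terms))


# Interpolation: three coefficients, three window points (N(P) = N(B_ls)).
plane = [(0, 0), (1, 0), (0, 1)]
exact = fit_polynomial(plane, [(0, 0), (1, 0), (0, 1)], [10, 14, 13])
# Smoothing: two coefficients, three window points (N(P) < N(B_ls)).
line = [(0, 0), (1, 0)]
smooth = fit_polynomial(line, [(0, 0), (1, 0), (2, 0)], [1, 2, 4])
print(exact, smooth, evaluate(plane, exact, 0.5, 0.5))
```

In the first call the fitted polynomial reproduces the three source values exactly; in the second it is the least-squares line through three points that are not collinear.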
Depending on the relative sizes of N(P), the number of non-zero coefficients of P, and N(Bls), the number of points in Bls, there are two generic cases for determining a solution to the simultaneous equations. The first is N(P)=N(Bls). In this case there is a unique interpolating polynomial, meaning that the coefficients of P are uniquely determined by the source data and this polynomial coincides with Is on Bls, i.e. P(xi, yj)=Is(i, j) for each (xi, yj) ∈ Bls. In the alternative case, N(P)<N(Bls), there is (in general) no interpolating polynomial since the number of constraints exceeds the number of degrees of freedom. In this case, the image data is “smoothed” in the sense that the approximating polynomial is smoother than the intensity surface in the vicinity of z.
In one implementation, interpolating polynomials are used; therefore N(P)=N(Bls). In the alternate, but more complex, implementation, the choice of the type of approximation can depend on the behavior of the image in the vicinity of z, allowing either smoothing or interpolating polynomials depending on the nature of the data and the time constraints. In cases in which very fast reformatting is required, smoothing might be preferred. Finally, the new image value at location z=(x′i, y′j) ∈ G′ is then Id(i, j)=P(x′i, y′j; $\vec{\alpha}$), where $\vec{\alpha}=\Gamma^{-1}\vec{\beta}$. This process is carried out for every element z ∈ G′, thus resulting in the construction of the destination image Id={Id(i, j), (x′i, y′j) ∈ G′}.
At each location z there is a system of N(Pc) linear equations to solve in N(Pc) unknowns, where the subscript c depends on the location of z through the gradient there. Consider the matrix equation from above, $\Gamma\vec{\alpha}=\vec{\beta}$. With the dependence on z explicitly incorporated, the equation may be rewritten so as to solve for the coefficients: $\vec{\alpha}_z=\Gamma_z^{-1}\vec{\beta}_z$. The matrix $\Gamma_z^{-1}$ depends on the set of powers (xi^n yj^m) for (xi, yj) ∈ Bls(z), while the vector $\vec{\beta}_z$ depends on both the powers and the source intensity values in the vicinity of z. Thus, apart from the source intensity values, the vector $\vec{\alpha}_z$ depends only on the class c(z) of the polynomial assigned to the estimated gradient ∇Is(z), and as a result, it may be assumed that z is at some fixed reference location, say z0 ∈ G′. The polynomial which approximates the source intensity data in the vicinity of a point z ∈ G′ will be a simple translation of the polynomial which approximates the same intensity surface in the vicinity of any other point in G′. Since the matrix Γ is independent of the intensity data and also independent of the coordinate system assigned, and is dependent only upon the relative spacing, the inverse can be computed off-line and stored, thus providing an improvement in efficiency to the method. There is then one matrix Γ for each type of window Bls, independent of its actual location. The size and shape of the window depend on the category of the gradient as well as the position of z relative to its nearest neighbor in G. It should be understood by one of ordinary skill in the art that other improvements in computational efficiency may be achieved using the foregoing method. For example, additional points in the new format defining the destination image may be calculated based upon the determined polynomial for a given window without recalculating the gradient, determining the polynomial, and calculating the coefficients.
For instance, if two points in the new format reside within the window which is used for defining the gradient, and the points within the window which are selected for calculating the gradient are the same including the nearest neighbor in G, then the same polynomial and coefficients may be used to calculate the intensity value(s) associated with both points. It should be clear that this computational gain in efficiency is dependent on the size of the selected windows. Further, the efficiency is proportional to the number of source image intensity values which are within the selected window.
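The off-line computation of the inverse can be sketched with a small cache keyed by the polynomial structure and by the window offsets measured from a fixed reference location. The function names, the use of functools.lru_cache as the store, and the sample window are assumptions of this sketch.

```python
from functools import lru_cache


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(r == c) for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [a / scale for a in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def basis_rows(terms, offsets):
    return [[dx ** n * dy ** m for (n, m) in terms] for (dx, dy) in offsets]


@lru_cache(maxsize=None)
def gamma_inverse(terms, offsets):
    """Inverse of Gamma for one window type; computed once, then reused."""
    rows = basis_rows(terms, offsets)
    size = len(terms)
    gamma = [[sum(r[p] * r[q] for r in rows) for q in range(size)]
             for p in range(size)]
    return invert(gamma)


def coefficients(terms, offsets, values):
    """alpha = Gamma^-1 * beta; only beta depends on the intensity values."""
    rows = basis_rows(terms, offsets)
    size = len(terms)
    beta = [sum(r[p] * v for r, v in zip(rows, values)) for p in range(size)]
    inv = gamma_inverse(terms, offsets)
    return [sum(inv[p][q] * beta[q] for q in range(size)) for p in range(size)]


# Same window shape (column spacing 2, row spacing 3) at two image locations.
TERMS = ((0, 0), (1, 0), (0, 1))
OFFSETS = ((0, 0), (2, 0), (0, 3))
first = coefficients(TERMS, OFFSETS, (10, 14, 19))
second = coefficients(TERMS, OFFSETS, (7, 7, 7))
print(first, second, gamma_inverse.cache_info())
```

Because the coordinates are expressed as offsets from the reference location, both calls share one stored inverse and differ only in the vector β.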
The value of the destination image at a point z is then computed as shown in
What follows is a numerical illustration of the process of adaptive interpolation. Suppose the source grid is G={(2i, 3j):1≦i≦C, 1≦j≦R}. The corresponding pixels are not square; their width-to-height ratio is 2/3.
The aspect ratio W/H of the corresponding scene is 2C/3R. The destination image is to have square pixels and dimension 2C×3R; it has the same aspect ratio as the source, but with (nominally) twice the resolution in the horizontal direction and (nominally) three times the resolution in the vertical direction. The appropriate destination grid is then G′={(i, j):1≦i≦C′, 1≦j≦R′} with C′=2C, R′=3R. These two grids are depicted in
Consider defining the destination image at the location z=(x′i, y′j)=(4,7)ε G′; see
Minimizing F with respect to α is equivalent to solving the linear system
$$\Gamma \vec{\alpha} = (\beta_0, \beta_1, \beta_2),$$
where Γ is a 3×3 matrix constructed from the coordinates {(xi, yj) ∈ Bls}; in this particular case this only involves the three x-coordinates, namely (x1, x2, x3)=(4, 6, 4). The components of $\vec{\beta}$ are inner products involving the source intensity values:
$$\beta_k = \langle (x_1^k, x_2^k, x_3^k), \vec{I}_s \rangle, \quad k = 0, 1, 2,$$
where $\vec{I}_s = (I_s(4, 6), I_s(6, 6), I_s(4, 9))$.
The destination image at z is defined as Id(4, 7)=P(4, 7)=α00+4α10+16α20, where $\vec{\alpha}=\Gamma^{-1}\vec{\beta}$. The computation would be exactly the same for any other location z which has the same relative position in the grid G, and hence lies directly below a point in G, i.e., for all pixels in G′ with coordinates of the form (2i, 3j+1). For example, suppose z=(8, 10). The relevant source image values are then Is(8, 9), Is(10, 9), Is(8, 12). If the gradient of Is at (8,10) belongs to the same class as the gradient at (4,7), and if we replace the source intensity values at (4,6), (6,6), (4,9) by those at (8,9), (10,9), (8,12), we can use the same matrix Γ as above. Only $\vec{\beta}$ changes. The destination image at z is then defined to be Id(8, 10)=P(4, 7), where P is now the polynomial determined by the substituted intensity values.
Finally, the computation would be analogous for any other location z ∈ G′, depending on where z falls in the grid G. There are in fact six generic cases to consider, leading to six different matrices Γ for each class c. (In general, in pure upconversion, the number of cases is the number of grid points of G′ covered by each pixel in G.) All of these inverses can be computed and stored once the grids G and G′ are fixed.
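The count of six generic cases can be confirmed by enumerating where destination grid points fall inside a source pixel. The sketch below uses the grids of the illustration; measuring the position from the source pixel corner at or above-left of the point, rather than from the nearest source pixel, is a simplifying assumption.

```python
# Source grid spacing from the illustration, G = {(2i, 3j)}; the destination
# grid G' = {(i, j)} has unit spacing.
SRC_DX, SRC_DY = 2, 3


def relative_position(x, y):
    """Offset of destination point (x, y) from the source pixel corner at or
    above-left of it."""
    return (x % SRC_DX, y % SRC_DY)


cases = sorted({relative_position(x, y)
                for x in range(1, 13) for y in range(1, 19)})
print(len(cases), cases)
print(relative_position(4, 7), relative_position(8, 10))
```

The points (4, 7) and (8, 10) fall in the same case, which is why they can share a matrix Γ.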
The polynomial class c(z) may change rather abruptly from location to location. The result is that the destination image has visible seams and a generally “blotchy” appearance. In another embodiment, the final intensity value at point z is first determined as described above (Step 1100). At least one new value for the point z is calculated based upon coefficients and polynomials of windows neighboring the point z (Step 1101). The final value at z is then determined through a weighted average of the value Pc(z) determined above and the new values calculated from the polynomials identified with locations in the vicinity of z (Step 1102). Let P*z denote this weighted average for the location z. The smoothed destination image at z is then P*z(x′i, y′j). Since averaging is linear, the coefficients of P*z remain linear in the data.
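The averaging step can be sketched as below. The function name and the weighting scheme (a fixed weight for the value from the window of z itself, with the remainder shared equally among the neighboring predictions) are hypothetical; the text leaves the weights open.

```python
def blended_value(center_value, neighbor_values, center_weight=0.5):
    """Weighted average of P_c(z) with the values predicted at z by the
    polynomials fitted for nearby locations."""
    if not neighbor_values:
        return center_value
    share = (1.0 - center_weight) / len(neighbor_values)
    return center_weight * center_value + share * sum(neighbor_values)


print(blended_value(100, [80, 90]))
print(blended_value(100, [90, 110, 80, 120]))
```

Setting center_weight to 1/(k+1) for k neighbors gives the straight average.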
In an alternative embodiment, the disclosed method for format conversion may be implemented as a computer program product for use with a computer system as described above. Such implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk) or transmittable to a computer system, via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or analog communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). The series of computer instructions embodies all or part of the functionality previously described herein with respect to the system. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the network (e.g., the Internet or World Wide Web). Of course, some embodiments of the invention may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments of the invention are implemented as entirely hardware, or entirely software (e.g., a computer program product).
Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the true scope of the invention. These and other obvious modifications are intended to be covered by the appended claims.
This application is a continuation application and claims priority from U.S. patent application Ser. No. 09/821,778 filed on: Mar. 29, 2001 which itself claims priority from U.S. Provisional Application Ser. No. 60/192,926 filed Mar. 29, 2000 both of which are incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
60192926 | Mar 2000 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 09821778 | Mar 2001 | US
Child | 11343045 | Jan 2006 | US