This application is based upon and claims the benefit of priority from prior Japanese Patent Application No. 2007-082613, filed on Mar. 27, 2007; the entire contents of which are incorporated herein by reference.
The present invention relates to a frame interpolation apparatus and a method for interpolating a new frame between frames of an input motion picture.
As a general method for generating an interpolation frame, a motion vector (flow) between frames is calculated by estimating a motion between the frames. The interpolation frame is generated by motion compensation using the motion vector. However, in this method, if the flow is erroneously estimated, artifacts occur.
A technique to overcome the above defect is disclosed in JP-A No. 2005-6275 (Patent reference 1). In this technique, when decoding an encoded motion picture, a distribution, a direction, and a DCT coefficient of flows (motion vectors) in the entire frame are calculated. A reliance of the flows in the entire frame is estimated from the distribution, the direction, and the DCT coefficient. Based on the reliance, interpolation/non-interpolation of a new frame is controlled: in case of low reliance, the new frame is not interpolated, and artifacts can be suppressed.
However, in Patent reference 1, interpolation/non-interpolation of the new frame is controlled based on a statistical quantity of the entire frame. Accordingly, it is difficult to cope with local artifacts in the frame. Furthermore, switching between interpolation and non-interpolation is frequently executed for the entire frame. This frequent switching is visually recognized as a flicker, so that new artifacts occur.
The present invention is directed to a frame interpolation apparatus and a method for smoothing motion pictures by interpolating a new frame between frames of the motion pictures.
According to an aspect of the present invention, there is provided an apparatus for generating an interpolation picture between a source picture and a destination picture, comprising: a motion estimation unit configured to calculate a first motion vector from a source region of the source picture to a destination region of the destination picture, a second motion vector scaled from the first motion vector based on a first temporal distance between the source picture and the interpolation picture, and a third motion vector scaled from the first motion vector based on a second temporal distance between the destination picture and the interpolation picture; a distortion energy calculation unit configured to calculate a distortion energy of the source region, the distortion energy being smaller when a difference between a pixel value of a pixel of the source region and a pixel value of a corresponding pixel of the destination region is smaller; a weight calculation unit configured to calculate a first weight of a first interpolation region of the interpolation picture using the distortion energy of the source region, the first interpolation region being pointed from the source region by the second motion vector; a motion compensation picture generation unit configured to generate a first motion compensation picture by compensating the source region to a temporal position of the interpolation picture using the second motion vector, a second motion compensation picture by compensating the destination region to the temporal position using the third motion vector, and a third motion compensation picture by averaging the first motion compensation picture and the second motion compensation picture with the first temporal distance and the second temporal distance; an artifact prevention picture generation unit configured to generate an artifact prevention picture selected from the source picture, the destination picture, and a temporal weighted average picture, the temporal 
weighted average picture being generated by averaging the source picture and the destination picture with the first temporal distance and the second temporal distance; and a picture interpolation unit configured to generate a second interpolation region of the interpolation picture by averaging a region of the third motion compensation picture and a region of the artifact prevention picture, each corresponding to the second interpolation region with the first weight.
According to another aspect of the present invention, there is also provided an apparatus for generating an interpolation picture between a source picture and a destination picture, comprising: a motion estimation unit configured to calculate a first motion vector from a source region of the source picture to a destination region of the destination picture, a second motion vector scaled from the first motion vector based on a first temporal distance between the source picture and the interpolation picture, and a third motion vector scaled from the first motion vector based on a second temporal distance between the destination picture and the interpolation picture; a distortion energy calculation unit configured to calculate a distortion energy of the source region, the distortion energy being smaller when a difference between a pixel value of a pixel of the source region and a pixel value of a corresponding pixel of the destination region is smaller; a weight calculation unit configured to calculate a first weight of a first interpolation region of the interpolation picture using the distortion energy of the source region, the first interpolation region being pointed from the source region by the second motion vector; a motion compensation picture generation unit configured to generate a first motion compensation picture by compensating the source region to a temporal position of the interpolation picture using the second motion vector, a second motion compensation picture by compensating the destination region to the temporal position using the third motion vector, and a third motion compensation picture by averaging the first motion compensation picture and the second motion compensation picture with the first temporal distance and the second temporal distance; an artifact prevention picture generation unit configured to generate an artifact prevention picture using a global motion vector of a vector extraction region of the source picture, the global motion 
vector representing geometrical transformation from the source picture to the destination picture; and a picture interpolation unit configured to generate a second interpolation region of the interpolation picture by averaging a region of the third motion compensation picture and a region of the artifact prevention picture, each corresponding to the second interpolation region with the first weight.
According to still another aspect of the present invention, there is also provided a method for generating an interpolation picture between a source picture and a destination picture, comprising: calculating a first motion vector from a source region of the source picture to a destination region of the destination picture, a second motion vector scaled from the first motion vector based on a first temporal distance between the source picture and the interpolation picture, and a third motion vector scaled from the first motion vector based on a second temporal distance between the destination picture and the interpolation picture; calculating a distortion energy of the source region, the distortion energy being smaller when a difference between a pixel value of a pixel of the source region and a pixel value of a corresponding pixel of the destination region is smaller; calculating a first weight of a first interpolation region of the interpolation picture using the distortion energy of the source region, the first interpolation region being pointed from the source region by the second motion vector; generating a first motion compensation picture by compensating the source region to a temporal position of the interpolation picture using the second motion vector, a second motion compensation picture by compensating the destination region to the temporal position using the third motion vector, and a third motion compensation picture by averaging the first motion compensation picture and the second motion compensation picture with the first temporal distance and the second temporal distance; generating an artifact prevention picture selected from the source picture, the destination picture, and a temporal weighted average picture, the temporal weighted average picture being generated by averaging the source picture and the destination picture with the first temporal distance and the second temporal distance; and generating a second interpolation region of the 
interpolation picture by averaging a region of the third motion compensation picture and a region of the artifact prevention picture, each corresponding to the second interpolation region with the first weight.
According to still another aspect of the present invention, there is also provided a method for generating an interpolation picture between a source picture and a destination picture, comprising: calculating a first motion vector from a source region of the source picture to a destination region of the destination picture, a second motion vector scaled from the first motion vector based on a first temporal distance between the source picture and the interpolation picture, and a third motion vector scaled from the first motion vector based on a second temporal distance between the destination picture and the interpolation picture; calculating a distortion energy of the source region, the distortion energy being smaller when a difference between a pixel value of a pixel of the source region and a pixel value of a corresponding pixel of the destination region is smaller; calculating a first weight of a first interpolation region of the interpolation picture using the distortion energy of the source region, the first interpolation region being pointed from the source region by the second motion vector; generating a first motion compensation picture by compensating the source region to a temporal position of the interpolation picture using the second motion vector, a second motion compensation picture by compensating the destination region to the temporal position using the third motion vector, and a third motion compensation picture by averaging the first motion compensation picture and the second motion compensation picture with the first temporal distance and the second temporal distance; generating an artifact prevention picture using a global motion vector of a vector extraction region of the source picture, the global motion vector representing geometrical transformation from the source picture to the destination picture; and generating a second interpolation region of the interpolation picture by averaging a region of the third motion compensation picture and a 
region of the artifact prevention picture, each corresponding to the second interpolation region with the first weight.
Hereinafter, various embodiments of the present invention will be explained by referring to the drawings. The present invention is not limited to the following embodiments.
A frame interpolation apparatus 10 of the first embodiment is explained by referring to
As shown in
A flow estimation method of the motion estimation unit 12 is, for example, a block matching method, an optical flow method, a Pel-recursive method, and a Bayesian method. In the first embodiment, the block matching method is explained. However, the flow estimation method is not limited to the block matching method. The optical flow method, the Pel-recursive method, or the Bayesian method may be used.
First, a source picture is divided into blocks each having a rectangular region as follows.
B(i)={i+(x,y)T|0≤x<M1, 0≤y<M2}
A block matching algorithm based on SSD (Sum of Squared Difference) is represented as follows.
As to the block matching, assume that each pixel in the same block has the same flow. The flow of each pixel is represented as follows.
u(i+x)≡u(i), ∀x∈B(i) (2)
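The block matching procedure above can be sketched in Python as follows. This is a minimal illustration only: the function name, block size, and search range are not from the patent, and an exhaustive integer-pixel SSD search is only one possible realization.

```python
import numpy as np

def block_matching_ssd(src, dst, block=4, search=2):
    """Estimate one flow vector (dx, dy) per block by minimizing the SSD
    (Sum of Squared Differences) over a small search window.  Every pixel
    in a block shares the block's flow, as in equation (2)."""
    h, w = src.shape
    flows = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            ref = src[by:by + block, bx:bx + block].astype(np.float64)
            best, best_uv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    # Skip candidate blocks that fall outside the picture.
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    cand = dst[y:y + block, x:x + block].astype(np.float64)
                    ssd = float(np.sum((ref - cand) ** 2))
                    if best is None or ssd < best:
                        best, best_uv = ssd, (dx, dy)
            flows[(bx, by)] = best_uv
    return flows
```

For a destination picture that is the source shifted one pixel to the right, blocks away from the wrap-around border recover the flow (1, 0).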
(3) A Distortion Energy Calculation Unit 14:
Next, the distortion energy calculation unit 14 is explained by referring to a flow chart of
Assume that the brightness of each pixel is constant over frames along the motion. Then the following equation holds.
Idst(x+u(x))=Isrc(x)
A displaced pixel difference (a difference between a brightness of a pixel of a source picture and a brightness of a flowed pixel of a destination picture) is represented as follows.
dpd(x)=Idst(x+u(x))−Isrc(x)
The smaller the displaced pixel difference is, the higher a reliance degree of a flow is. Accordingly, the reliance degree is defined by distortion energy of the displaced pixel difference as follows.
D(x)=dpd(x)²
However, if the reliance degree is calculated from only one pixel, it is affected by noise, and its reliability is not high. Accordingly, a first distortion energy is calculated by convoluting the displaced pixel differences of adjacent pixels as follows.
The smaller the first distortion energy is, the higher the reliance degree is. In this equation, "N(x)⊂X" is the set of pixels adjacent to the pixel x. For example, the nine pixels or twenty-five pixels including the pixel x form the set N(x).
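A minimal sketch of the displaced pixel difference and the first distortion energy, assuming integer flows stored as (dy, dx) per pixel and a square neighborhood N(x); the function names and the neighborhood radius are illustrative, not from the patent:

```python
import numpy as np

def displaced_pixel_difference(src, dst, flow):
    """dpd(x) = Idst(x + u(x)) - Isrc(x); flow holds an integer (dy, dx)
    per pixel, and flowed positions are clamped to the picture border."""
    h, w = src.shape
    dpd = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y, x]
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            dpd[y, x] = float(dst[yy, xx]) - float(src[y, x])
    return dpd

def first_distortion_energy(dpd, radius=1):
    """Sum the squared dpd over a (2r+1) x (2r+1) neighborhood N(x),
    i.e. a box convolution of dpd(x)^2 (cropped at the border)."""
    h, w = dpd.shape
    sq = dpd ** 2
    energy = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            y0, y1 = max(y - radius, 0), min(y + radius + 1, h)
            x0, x1 = max(x - radius, 0), min(x + radius + 1, w)
            energy[y, x] = sq[y0:y1, x0:x1].sum()
    return energy
```

When the source and destination pictures coincide and the flow is zero, the dpd and the energy are zero everywhere, matching the brightness-constancy assumption.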
Even if the first distortion energy (displaced pixel difference energy) is small, if flows of other pixels adjacent to the pixel x disperse in space, an interpolation frame may be locally poor. Accordingly, a smoothing energy of the flows of the other pixels is calculated as follows.
V(x,s)=∥u(x+s)−u(x)∥²
The nearer the flows of the adjacent pixels are to the flow of the pixel x, the lower the smoothing energy is. Briefly, the lower the smoothing energy is, the smoother the flow field is in space. In other words, the flow is more reliable.
The first distortion energy (displaced pixel difference energy) is extended by considering the second distortion energy (smoothing energy) as follows.
In equation (4), "λ≥0" is a hyper parameter that controls the relative importance of the displaced pixel difference and the smoothness. By the extended distortion energy, a flow having a small displaced pixel difference and a smooth spatial distribution is decided to have a high reliance degree.
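The extension of equation (4) can be sketched as the first distortion energy plus λ times the smoothing energy summed over a 4-neighborhood; the choice of neighborhood and the names are assumptions for illustration:

```python
import numpy as np

def extended_distortion_energy(dpd_sq_sum, flow, lam=1.0):
    """U(x) = first distortion energy + lam * sum over the 4-neighborhood
    of ||u(x+s) - u(x)||^2, following the form of equation (4).
    flow has shape (h, w, 2)."""
    h, w = flow.shape[:2]
    U = np.array(dpd_sq_sum, dtype=float)
    for y in range(h):
        for x in range(w):
            v = 0.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    diff = flow[yy, xx] - flow[y, x]
                    v += float(diff[0] ** 2 + diff[1] ** 2)
            U[y, x] += lam * v
    return U
```

For a spatially uniform flow field the smoothing term vanishes, so U reduces to the displaced pixel difference energy alone.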
The above equations are defined with the L2 norm. However, they may be defined with the L1 norm, i.e., an absolute value, as follows.
D(x)=|dpd(x)|
V(x,s)=|u(x+s)−u(x)|+|v(x+s)−v(x)|
u(x)=(u(x), v(x))T
The above convolution causes no problem when the other pixels adjacent to the pixel x have almost the same motion. However, when the adjacent pixels include a boundary between a first motion and a second motion, a distortion occurs in the interpolated frame. Accordingly, weighting may be applied based on the similarity between pixels as follows.
w(x,s)=kσ1(s)·kσ2(Isrc(x+s)−Isrc(x)) (5)
In the second equation of equations (5), the first term of the right side is a Gaussian weight over the spatial kernel; briefly, the center pixel of the kernel has the largest weight. The second term of the right side is a similarity kernel: the nearer the pixel values of two pixels are, the larger the weight of the two pixels is. Accordingly, only the region of pixels having the same motion is convoluted. As a result, even if a boundary of the region of pixels having the same motion overlaps an edge in the picture, convolution over another region of pixels having a different motion can be avoided.
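The bilateral-style weight described above (a spatial Gaussian kernel multiplied by a pixel-similarity kernel) can be sketched as follows. The exact kernel forms and σ values are assumptions, since equation (5) is not fully reproduced in this text:

```python
import numpy as np

def bilateral_weight(src, x, y, s_x, s_y, sigma_s=1.0, sigma_r=10.0):
    """Weight for offset s = (s_x, s_y) around pixel (x, y): a spatial
    Gaussian kernel times a pixel-similarity kernel, so that only pixels
    likely to share the same motion contribute to the convolution."""
    spatial = np.exp(-(s_x ** 2 + s_y ** 2) / (2.0 * sigma_s ** 2))
    diff = float(src[y + s_y, x + s_x]) - float(src[y, x])
    similar = np.exp(-diff ** 2 / (2.0 * sigma_r ** 2))
    return spatial * similar
```

The weight is 1 at the kernel center and decays both with spatial distance and with pixel-value difference.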
Next, the artifact prevention picture generation unit 16 is explained by referring to a flow chart of
As the interpolation frame, a temporal weighted average picture of adjacent frames (a source frame and a destination frame) is used. If the interpolation temporal position of the interpolation frame is "0≤t≤1", the temporal weighted average picture is calculated as follows.
It(x)=(1−t)Isrc(x)+tIdst(x)
The temporal weighted average picture has visually little entropy and excellent quality. However, the source picture Isrc or the destination picture Idst may be used as the interpolation frame.
Furthermore, a normal average (not the weighted average) may be used as follows.
It(x)=0.5Isrc(x)+0.5Idst(x)
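The temporal weighted average picture is straightforward to compute; a minimal sketch (names illustrative):

```python
import numpy as np

def temporal_weighted_average(src, dst, t):
    """I_t(x) = (1 - t) * Isrc(x) + t * Idst(x), with 0 <= t <= 1.
    t = 0.5 gives the normal (unweighted) average of the two frames."""
    return (1.0 - t) * src.astype(np.float64) + t * dst.astype(np.float64)
```

At t = 0 the result equals the source picture; as t grows, the destination picture contributes more.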
Next, the alpha-map generation unit 18 is explained by referring to a flow chart of
The alpha-map generation unit 18 generates an alpha-map, i.e., a reliance map. The alpha-map has a coefficient of each pixel used for alpha-blending. Each alpha-value (a reliance degree) as the coefficient has the range "0˜1", and the alpha-map has the same size as the source picture.
The distortion energy U(x) is defined as a real number not smaller than "0". The smaller the distortion energy is, the higher the reliance degree is. As to a region having a high reliance degree, the motion compensation picture (based on flow) should be preferentially used for alpha-blending. Briefly, as to a region having small distortion energy, the alpha-value is set nearly to "1". As to a region having large distortion energy, the alpha-value is set nearly to "0".
Accordingly, by the following normal distribution function, the distortion energy U(x) is converted to an alpha-value having the range "0˜1".
In equation (6), if σ is smaller, the reliance degree is more easily decided to be low. Furthermore, a logistic mapping may be used as follows.
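Assuming equation (6) has the common Gaussian form exp(−U/(2σ²)) — an assumption, since the equation itself is not reproduced here — the conversion from distortion energy to alpha-value can be sketched as:

```python
import numpy as np

def alpha_from_energy(U, sigma=10.0):
    """Map a non-negative distortion energy to an alpha-value in (0, 1]:
    zero energy gives alpha = 1, large energy gives alpha near 0.
    The exact form of equation (6) is assumed to be exp(-U / (2*sigma^2))."""
    return np.exp(-np.asarray(U, dtype=float) / (2.0 * sigma ** 2))
```

A smaller σ makes the mapping fall off faster, so the reliance degree is more easily judged low, as stated above.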
The distortion energy is calculated at the same temporal position as the source picture. Accordingly, the temporal position of the alpha-map needs to be shifted. The temporal position is shifted by the flow. If the temporal position t of the interpolation frame is "0≤t≤1", the shifted position is represented as follows.
z0=⌊x+tu(x)⌋ (8)
Plural alpha-values often overlap on the same region of the alpha-map by shifting. In this case, the higher alpha-value is preferentially used, because its reliance degree is higher than that of a lower alpha-value. Furthermore, the fractional parts of the x-element and the y-element are discarded by the floor operator. In order to compensate for this omission, the shifted alpha-value may be locally repeated at adjacent positions in the x-direction and the y-direction.
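The shift of the alpha-map to the interpolation time, with the larger alpha-value winning on overlap, can be sketched as follows (integer flows assumed, names illustrative; positions with no arriving value simply keep alpha 0):

```python
import numpy as np

def shift_alpha_map(alpha, flow, t):
    """Move each alpha-value to z0 = floor(x + t*u(x)) at interpolation
    time t; where several values land on the same pixel, keep the largest
    (most reliable) one, as described for equation (8)."""
    h, w = alpha.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y, x]
            zy = int(np.floor(y + t * dy))
            zx = int(np.floor(x + t * dx))
            if 0 <= zy < h and 0 <= zx < w:
                out[zy, zx] = max(out[zy, zx], alpha[y, x])
    return out
```

With a zero flow field the map is unchanged, since every value lands on its own pixel.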
Next, the motion compensation warping unit 20 is explained by referring to a flow chart of
The shift method by flow is the same as in the alpha-map generation unit 18. The pixel value of a pixel x on the source picture and the pixel value of its flowed pixel on the destination picture are represented as follows.
I1=Isrc(x)
I2=Idst(x+u(x))
If the temporal position of the interpolation frame is "0≤t≤1", a pixel value of the interpolation frame is calculated by the temporal weighted average as follows.
Imc=(1−t)I1+tI2
In the above equation, "Imc" represents the motion compensation picture.
In the same way as the alpha-map generation unit 18, plural pixel values often overlap on the same region of the motion compensation picture by shifting. In this case, one pixel value is preferentially used by comparing the alpha-values.
Furthermore, a region having no pixel value often exists on the motion compensation picture. For such a region, if it is small, a pixel value of adjacent regions is assigned by filtering; if it is large, a pixel value of the corresponding region on the artifact prevention picture is assigned.
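The motion compensation warping, including overlap resolution by alpha-value, can be sketched as below. This is a simplified illustration: hole filling by filtering or by the artifact prevention picture is omitted, and holes are merely marked with −1.

```python
import numpy as np

def motion_compensation_warp(src, dst, flow, alpha, t):
    """Warp pixels to the interpolation time t.  Each source pixel x moves
    to floor(x + t*u(x)) and receives (1-t)*Isrc(x) + t*Idst(x+u(x));
    overlaps are resolved in favor of the larger alpha-value, and holes
    (regions no pixel reaches) stay marked as -1."""
    h, w = src.shape
    mc = np.full((h, w), -1.0)
    best_a = np.full((h, w), -1.0)
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y, x]
            zy = int(np.floor(y + t * dy))
            zx = int(np.floor(x + t * dx))
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            if 0 <= zy < h and 0 <= zx < w and alpha[y, x] > best_a[zy, zx]:
                best_a[zy, zx] = alpha[y, x]
                mc[zy, zx] = (1 - t) * float(src[y, x]) + t * float(dst[yy, xx])
    return mc
```

With zero flow and identical source and destination pictures, the warped result reproduces the input, so there are no holes.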
Next, the alpha-blending unit 22 is explained by referring to a flow chart of
I(x)=α(x)Imc(x)+(1−α(x))It(x)
By determining an interpolated pixel value using the above equation, a new (interpolation) frame to be interpolated between the source picture and the destination picture is generated.
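The final alpha-blending of the motion compensation picture and the artifact prevention picture is a per-pixel weighted average; a minimal sketch (names illustrative):

```python
import numpy as np

def alpha_blend(mc, prevention, alpha):
    """I(x) = alpha(x)*Imc(x) + (1 - alpha(x))*It(x): reliable regions
    take the motion compensation picture, unreliable regions fall back
    to the artifact prevention picture."""
    return alpha * mc + (1.0 - alpha) * prevention
```

At alpha = 1 the output is purely motion-compensated; at alpha = 0 it is purely the artifact prevention picture.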
As to the frame interpolation apparatus 10 of the first embodiment, in case of inputting a motion picture, a new frame is interpolated between a source frame and a destination frame. By increasing the number of frames per unit time, contents of the motion picture are visually smoothed. Especially, local defects in the frame are removed.
Next, the frame interpolation apparatus 10 of the second embodiment is explained by referring to
The distortion energy calculation unit 14 calculates distortion energy by unit of block. A source picture is divided into plural blocks each having a rectangular region as follows.
B(i)={i+(x,y)T|0≤x<M1, 0≤y<M2}
Difference energy is calculated by SAD (Sum of Absolute Difference) of each block as follows.
In equation (9), the sum of absolute differences is used. However, the sum of squared differences may be used.
Smoothing energy is calculated as a difference between the flow of a block of interest and the flows of adjacent blocks as follows.
V(i,j)=∥u(j)−u(i)∥² (10)
In equation (10), the sum of absolute differences may also be used. The smoothing energy is calculated as a sum of the differences over all the adjacent blocks. By linearly combining equations (9) and (10), the distortion energy is defined as follows.
The distortion energy of equation (11) is calculated by unit of block. In equation (11), “N” represents the number of adjacent blocks (for example, adjacent four blocks).
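The block-wise distortion energy of equations (9) to (11) can be sketched as follows, assuming one integer flow per block and the four adjacent blocks as the neighborhood; the names and the λ normalization by the neighbor count N are illustrative:

```python
import numpy as np

def block_distortion_energy(src, dst, flows, block=4, lam=1.0):
    """Per-block energy: SAD between each block and its flow-displaced
    block in the destination picture (eq. (9)), plus lam times the mean
    squared flow difference to the adjacent blocks (eq. (10)), linearly
    combined as in eq. (11).  flows has shape (h//block, w//block, 2)."""
    h, w = src.shape
    by_n, bx_n = h // block, w // block
    energy = np.zeros((by_n, bx_n))
    for bi in range(by_n):
        for bj in range(bx_n):
            y, x = bi * block, bj * block
            dy, dx = flows[bi, bj]
            yy = min(max(y + dy, 0), h - block)
            xx = min(max(x + dx, 0), w - block)
            sad = np.abs(src[y:y + block, x:x + block].astype(float)
                         - dst[yy:yy + block, xx:xx + block].astype(float)).sum()
            smooth, n = 0.0, 0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = bi + di, bj + dj
                if 0 <= ni < by_n and 0 <= nj < bx_n:
                    d = flows[ni, nj] - flows[bi, bj]
                    smooth += float(d[0] ** 2 + d[1] ** 2)
                    n += 1
            energy[bi, bj] = sad + lam * smooth / max(n, 1)
    return energy
```

Identical pictures with a uniform zero flow field yield zero energy for every block.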
Next, the frame interpolation apparatus 10 of the third embodiment is explained by referring to
Next, the frame interpolation apparatus 10 of the fourth embodiment is explained by referring to
In the first embodiment, a temporal weighted average picture is used as the artifact prevention picture. As shown in
However, a flow of an area having a low alpha-value has low reliability, and the motion compensation picture cannot be used as it is. On the other hand, if the temporal weighted average picture is mixed with the motion compensation picture, a flicker is perceived in the interpolation picture. This problem is caused by the difference in motion between the motion compensation picture and the temporal weighted average picture. Accordingly, a picture more similar to the motion compensation picture (rather than the temporal weighted average picture) should be used as the artifact prevention picture for alpha-blending.
In this case, the temporal weighted average picture may look suddenly inserted onto an interpolation picture in which almost all regions have the same motion. The same motion is extracted as a global motion from the source picture and the destination picture; the global motion is the dominant motion of almost all regions in the picture. Accordingly, a picture to compensate low reliance regions is generated using the global motion and used for alpha-blending. The temporal weighted average picture is also regarded as the case of a static global motion. As a result, the global motion is considered as an extension of the alpha-blending.
The frame interpolation apparatus 10 of the fourth embodiment is explained by referring to
The fourth embodiment is basically an extension of alpha-blending and the same as the first embodiment. Accordingly, units different from the first embodiment in the frame interpolation apparatus of the fourth embodiment are explained. In the fourth embodiment, the following descriptions are used.
First, the global motion extraction unit 26 is explained by referring to
ū=(u,v)T (13)
This geometrical transformation is determined by calculating a typical flow from some region. The typical flow is calculated as an average value or a median value. The problem of extracting global motions "ūk (k=1, . . . , K)" from a flow field is simply solved by the k-means method, an algorithm that clusters the flow field into K clusters.
z(i)∈Z: label of the cluster to which each flow u(i) belongs (14)
In this case, an average value of each cluster is calculated as follows.
ūk(t)=Σu∈Uk(t) u/Num(Uk(t)), Uk(t)={u(i)|z(t)(i)=k, ∀i∈Λ2} (15)
In equation (15), "Num( )" is an operator that counts the number of elements. The superscript "(t)" denotes the iteration number.
Next, a cluster having minimum distance from the average is determined and the label is updated as follows.
As to the k-means method, the average update and the label update are repeated a predetermined number of times. The output value converges after several iterations, and sometimes after only one.
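The k-means clustering of the flow field can be sketched as below; initialization by random labels is only one of the options mentioned in the text, and the function names are illustrative:

```python
import numpy as np

def kmeans_flows(flows, K, iters=5, seed=0):
    """Cluster a flow field into K global-motion candidates with plain
    k-means: alternate the cluster-mean update (15) and the nearest-mean
    label update.  flows is an (n, 2) array of (u, v) vectors."""
    flows = np.asarray(flows, dtype=float).reshape(-1, 2)
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=len(flows))
    for _ in range(iters):
        # Mean of each cluster; empty clusters fall back to the zero flow.
        means = np.array([flows[labels == k].mean(axis=0)
                          if np.any(labels == k) else np.zeros(2)
                          for k in range(K)])
        # Reassign each flow to its nearest cluster mean.
        dists = ((flows[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
    means = np.array([flows[labels == k].mean(axis=0)
                      if np.any(labels == k) else np.zeros(2)
                      for k in range(K)])
    return labels, means
```

For two well-separated groups of flows the clusters become pure after a few iterations, and the cluster means are the group motions.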
Many methods to set the initial label can be considered. For example, the label k (k=1, . . . , K) may be randomly assigned. In the fourth embodiment, as shown in
In equation (17), an initial value "Īk(0) (k=1, . . . , K)" of the average of each cluster is suitably set (for example, values obtained by dividing the range "0˜255" into K parts).
The label is updated by the following labeling.
Next, an average value of each cluster is calculated as follows.
Īk(t)=Σi∈Ik(t) Isrc(i)/Num(Ik(t)), Ik(t)={Isrc(i)|zsrc(t)(i)=k, ∀i∈Λ2} (19)
The equations (17) and (18) are repeated several times. The clustering result of the entire region of the source picture is set as the initial value of the flow clustering as follows.
z(0)(i)=zsrc(T)(i), ∀i∈Λ2 (20)
In equation (20), "T" is the number of iterations of the k-means method. After calculating the label z(i), a global motion is determined by an average value or a vector median value as follows.
The second alpha-map generation unit 30 generates the alpha-map using the global motions ūk (k=1, . . . , K). Assuming all pixels in the frame have the global motion ūk (i.e., the entire frame has a uniform flow), the alpha-map is generated in the same way as in the first embodiment. The alpha-map is represented as αk(x) (k=1, . . . , K).
Furthermore, another alpha-map for the temporal weighted average picture is also generated. The global motion is represented as ūk=(0,0)T (i.e., static motion) and the alpha-map α1(x) is generated.
The second motion compensation warping unit 32 generates a motion compensation warping picture using the global motions ūk (k=1, . . . , K). Assuming all pixels in the frame have the global motion ūk (i.e., the entire frame has a uniform flow), motion compensation warping is executed in the same way as in the first embodiment. The warping result is represented as Ik(x) (k=1, . . . , K).
The alpha-blending unit 22 blends (compounds) the motion compensation warping picture Imc(x), the global motion warping pictures Ik(x) (k=1, . . . , K), and the temporal weighted average picture It(x). In order to correctly operate the alpha-blending, the sum of the weight coefficients of Imc(x), Ik(x), and It(x) is designed to be "1". This model is represented as follows.
In equation (23), in the case of "K=0", this model is equivalent to the alpha-blending of the first embodiment. Accordingly, the fourth embodiment is regarded as an extension of the alpha-blending. The second term of the right side is a weighted average by the alpha-values of the global motion warping pictures; a region having a high alpha-value in a global motion warping picture is preferentially used.
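Assuming equation (23) normalizes the global-motion term by the sum of its alpha-values so that the weight coefficients sum to one — an assumed form, since the equation is not reproduced here — the extended alpha-blending can be sketched as:

```python
import numpy as np

def extended_alpha_blend(I_mc, alpha, global_pics, global_alphas):
    """Blend the motion compensation picture with K global-motion warping
    pictures (the temporal weighted average picture can be included as the
    static global motion).  Assumed form:
    I = alpha*Imc + (1 - alpha) * sum_k(alpha_k * Ik) / sum_k(alpha_k)."""
    num = np.zeros_like(I_mc, dtype=float)
    den = np.zeros_like(I_mc, dtype=float)
    for Ik, ak in zip(global_pics, global_alphas):
        num += ak * Ik
        den += ak
    # Where no global-motion picture has any weight, keep Imc itself.
    fallback = np.where(den > 0, num / np.where(den > 0, den, 1.0), I_mc)
    return alpha * I_mc + (1.0 - alpha) * fallback
```

With K = 0 (no global-motion pictures) the fallback degenerates and the model reduces to the simple alpha-blending of the first embodiment.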
In the above embodiments, the motion compensation picture and the temporal weighted average picture are respectively weighted-averaged at the temporal position of the interpolation picture. However, when the frame interpolation apparatus is implemented, if the temporal direction between frames is quantized, the temporal position of the interpolation picture may not fall on a quantized position. In this case, the motion compensation picture and the temporal weighted average picture are weighted-averaged at a position near the temporal position.
In the above embodiments, the alpha-map generation unit 18 shifts the alpha-map to the temporal position of the interpolation frame after generating the alpha-map. However, the alpha-map may instead be generated after shifting the distortion energy of each region to the temporal position of the interpolation frame.
In the disclosed embodiments, the processing can be accomplished by a computer-executable program, and this program can be stored in a computer-readable memory device.
In the embodiments, the memory device, such as a magnetic disk, a flexible disk, a hard disk, an optical disk (CD-ROM, CD-R, DVD, and so on), or an optical magnetic disk (MD and so on) can be used to store instructions for causing a processor or a computer to perform the processes described above.
Furthermore, based on an indication of the program installed from the memory device into the computer, an OS (operating system) operating on the computer, or MW (middleware) such as database management software or network software, may execute a part of each processing to realize the embodiments.
Furthermore, the memory device is not limited to a device independent from the computer; a memory device storing a program downloaded through a LAN or the Internet is also included. Furthermore, the memory device is not limited to one device. In the case that the processing of the embodiments is executed using a plurality of memory devices, they are collectively regarded as the memory device, and the components may be arbitrarily composed.
A computer may execute each processing stage of the embodiments according to the program stored in the memory device. The computer may be one apparatus such as a personal computer or a system in which a plurality of processing apparatuses are connected through a network. Furthermore, the computer is not limited to a personal computer. Those skilled in the art will appreciate that a computer includes a processing unit in an information processor, a microcomputer, and so on. In short, the equipment and the apparatus that can execute the functions in embodiments using the program are generally called the computer.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
P2007-082613 | Mar 2007 | JP | national |