This application claims the benefits of Korean Patent Application No. 10-2005-0076456, filed on Aug. 19, 2005, and Korean Patent Application No. 10-2005-0124050, filed on Dec. 15, 2005, in the Korean Intellectual Property Office, the disclosures of which are incorporated herein in their entireties by reference.
1. Field of the Invention
The present invention relates to a method of minimizing the prediction calculation of a sub-pixel motion vector and an integer motion vector while avoiding degradation of picture quality when moving pictures are encoded, and more particularly, to a method of estimating a sub-pixel and an integer pixel through a conjugate quadratic function by using the convexity property of a sub-pixel in a moving picture encoder, and a method of estimating a discrete image as a continuous image.
2. Description of the Related Art
The amount of digital image data used in video conferencing, high-definition televisions, video on demand (VOD) receivers, personal computers (PCs) supporting Moving Picture Experts Group (MPEG) images, game consoles, terrestrial digital broadcasting receivers, digital satellite broadcasting receivers, and cable TVs increases greatly in the process of digitizing an analog signal and because of the characteristics of images themselves. Accordingly, digital image data is not used as it is, but is compressed through an effective compression method before use.
The compression of digital image data can be broken down into three types: methods that reduce temporal redundancy, methods that reduce spatial redundancy, and methods that reduce the amount of data by using the statistical characteristics of generated codes. Among these, a representative method of reducing temporal redundancy is motion estimation and compensation, which is used by most moving picture compression standards, such as MPEG and H.263.
In the motion estimation and compensation method, the part most similar to a predetermined part of a current image is found in the previous or next reference image, and only the difference component of the two parts is transmitted. If a motion vector is found as precisely as possible, the difference component to be transmitted is reduced and the amount of data can be reduced more effectively. However, finding the most similar part in the previous or next image requires considerable estimation time and computation. Accordingly, continuous efforts have been made to reduce the motion estimation time, which accounts for the largest part of the time for encoding moving pictures.
Meanwhile, motion estimation methods include pixel-by-pixel methods and block-by-block methods. Of the two, the block-based method is the more widely used.
In the block-based estimation method, an image is divided into blocks of a predetermined size, and the block that best matches a block of the current image is found in a search area of the previous image. The displacement between the found block and the current image block is referred to as a motion vector, and this motion vector is encoded and processed. A variety of matching functions can be used for the matching calculation between two blocks. The most widely used one is the sum of absolute differences (SAD), which is obtained by adding the absolute values of the differences between corresponding pixels of the two blocks.
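The block matching described above can be sketched as follows. This is a minimal illustration, assuming frames stored as lists of rows and an exhaustive search window; the function names (`sad`, `get_block`, `full_search`) and the toy frames are hypothetical and are not taken from any codec.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(current, reference, top, left, size, search_range):
    """Exhaustively search the reference frame for the best-matching block.

    Returns ((dy, dx), best_sad): the displacement (motion vector) and its SAD.
    """
    height, width = len(reference), len(reference[0])
    target = get_block(current, top, left, size)
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(reference, y, x, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


# Toy example: the current frame is the reference shifted right by one pixel,
# so the best match lies one pixel to the left in the reference frame.
reference = [[(3 * r + 7 * c) % 19 for c in range(8)] for r in range(8)]
current = [[row[0]] + row[:-1] for row in reference]
mv, cost = full_search(current, reference, top=2, left=2, size=4, search_range=1)
```

The exhaustive loop makes the cost of the search visible: every candidate displacement requires a full block comparison, which is the computation that faster estimation methods try to avoid.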
In the case of the H.264 codec, the search is performed by using a cost function based on rate-distortion optimization (RDO) instead of the conventional SAD-focused search. The cost function used in H.264 is a rate-distortion cost obtained by adding the conventional SAD value to the product of the number of encoded coefficients and a Lagrangian multiplier. Here, the number of encoded coefficients is replaced with a value proportional to the quantization coefficient value; this value is multiplied by a fixed Lagrangian multiplier to determine the cost value, and the search is performed with that cost.
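The rate-distortion cost described above has the general form J = D + λ·R. The following is a minimal sketch, assuming the distortion D is a SAD value and the rate term R is a precomputed estimate; the candidate values and the multiplier are hypothetical and do not come from the H.264 reference software.

```python
def rd_cost(sad_value, rate_estimate, lagrange_multiplier):
    """Rate-distortion cost J = D + lambda * R, with D measured as SAD."""
    return sad_value + lagrange_multiplier * rate_estimate


def pick_best(candidates, lagrange_multiplier):
    """Return the candidate (mv, sad, rate_estimate) with the lowest RD cost."""
    return min(candidates,
               key=lambda c: rd_cost(c[1], c[2], lagrange_multiplier))


# Hypothetical candidates: (motion vector, SAD, estimated rate).
candidates = [((0, 0), 120, 2), ((1, 0), 100, 6), ((3, -2), 90, 14)]
best = pick_best(candidates, lagrange_multiplier=4.0)
```

In this toy data the candidate with the lowest SAD is not selected, because its larger rate term outweighs the distortion saving.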
Also, in H.264 moving picture encoding, in order to achieve both high compression efficiency and high picture quality, a total of 8 different block modes are evaluated and the mode having the minimum value is selected, unlike conventional moving picture encoding in which encoding is performed in units of 16×16 or 8×8 blocks.
Also, in the H.264 moving picture encoding standard, sub-pixels are generated down to ¼-pixel precision as well as ½-pixel precision, and moving picture encoding is performed with these sub-pixels. Accordingly, an encoding object image that has a pixel area four times bigger than that of the integer-pixel image must be generated, and at the same time sub-pixel estimation must also be performed. As a result, the amount of computation is much greater than that of previous methods, and the amount of image estimation and motion estimation computation needs to be reduced.
The present invention provides a motion estimation method that greatly reduces the operations for estimating a sub-pixel by using the characteristics of the sub-pixel, and a method of generating a discrete image as a continuous image.
According to an aspect of the present invention, there is provided a motion estimation method for encoding moving pictures, the method including: generating an objective function for a differential definition of a sub-pixel system composed of a chain of arbitrary n-tap filters with respect to an arbitrary k-th order sub-pixel, based on the relation between an integer pixel and a sub-pixel generated through linear FIR filtering; and estimating a motion vector of a sub-pixel by applying a conjugated quadratic algorithm to the objective function.
According to another aspect of the present invention, there is provided a method of generating a discrete image as a continuous image, the method including: generating an objective function for a differential definition of a sub-pixel system composed of a chain of arbitrary n-tap filters with respect to an arbitrary k-th order sub-pixel, based on the relation between an integer pixel and a sub-pixel generated through linear FIR filtering; and generating a discrete image of the n-tap FIR filter sub-pixel system as a continuous image, by expanding a Taylor series in relation to the value of the objective function with respect to the arbitrary k-th order sub-pixel.
In the conjugated polynomial search method using the convexity property of a sub-pixel and the method of generating a discrete image as a continuous image according to the present invention, the characteristics of a sub-pixel image are analyzed, and through three characteristics of sub-pixel images derived from the analysis and a differential definition, an integer pixel, a sub-pixel, and a continuous image of a discrete image are efficiently estimated.
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings in which:
The present invention will now be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown.
Referring to
The present invention is not limited to the H.264 application but can also be applied to all video images. The H.264 is only an example of the applications. The integer pixel estimator estimates a motion vector by performing integer pixel prediction and sub-pixel prediction on a block-by-block basis.
A method of conjugated polynomial search using the convexity of a sub-pixel and an algorithm for generating a discrete image as a continuous image will now be explained. In order to help understanding of the present invention, the difference between the conventional technology and the present invention will be explained first.
In the estimation of an integer pixel and a sub-pixel according to the conventional technology, first, a sub-pixel image that is the object of estimation is generated in relation to an entire image unit at maximum, or in relation to each estimation object window region for estimating a motion. Then, through an estimation pattern based on block estimation, an integer pixel and a sub-pixel are estimated.
Meanwhile, in the present invention, through an RD-cost value in relation to neighboring integer pixels generated after first integer pixel estimation is finished, estimation of a sub-pixel is performed. If it is determined that estimation of a next integer pixel is needed in the process of estimating the first integer pixel, an estimation position of the next integer pixel is predicted through an RD-cost value generated by the estimation of the first integer pixel. After the integer pixel is estimated through the RD-cost value, a sub-pixel is estimated by using only the integer pixel RD-cost value without separately estimating a sub-pixel.
The principle of the present invention will now be explained with reference to mathematical expressions. Generally, estimation of a sub-pixel is performed through a finite impulse response (FIR) filtering of integer pixels. This will be explained briefly now.
First, it is assumed that P(x,y,t) indicates an integer pixel value of an image at coordinates (x,y) at time t. Here, for convenience of explanation, assuming that time t is fixed at the current time and that one axis of the coordinates (x, y) is fixed, an integer pixel at an arbitrary position can be expressed as P(x0+s0h). Here, x0 indicates the initial coordinate; h indicates the unit variation of an integer pixel and is usually 1; and s0 is a scale parameter, proportional to the unit variation, that indicates an arbitrary position with respect to x0.
In this case, the generation of a half pixel as used in most moving picture codecs is expressed as the following equation 1:
In equation 1, δ(x) denotes a discrete delta function that is 1 if x is 0 and 0 otherwise. F0(s-k) is a 0-th FIR filter coefficient and has a value only when (s-k) is an integer within a predetermined range.
Generally, in the case of MPEG-2 or H.261, F0(s-k) has a value as the following equation 2:
Meanwhile, in the case of H.264, F0(s-k) has a value as the following equation 3:
By generalizing this to an arbitrary k-th order sub-pixel, it can be expressed as the following equation 4:
If a ¼ pixel, that is, a second-order sub-pixel is expressed as an integer pixel as used in H.264, it can be expressed as the following equation 5:
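The filter equations referenced here are not reproduced in this text. As a rough illustration only, the sketch below generates half and quarter pixels along one row, assuming a 2-tap averaging filter and the 6-tap filter (1, −5, 20, 20, −5, 1)/32 used for H.264 luma half-sample interpolation. The function names are hypothetical, and the integer rounding and clipping of a real codec are deliberately omitted; border pixels are clamped as a simplifying assumption.

```python
BILINEAR_TAPS = (1, 1)                 # divisor 2 (simple averaging)
SIX_TAPS = (1, -5, 20, 20, -5, 1)      # divisor 32 (H.264 luma half-sample filter)


def half_pixel(row, i, taps):
    """Half-pixel value between row[i] and row[i + 1] from an even-length FIR filter."""
    n = len(taps)
    start = i - (n // 2 - 1)           # leftmost integer pixel under the filter
    acc = 0
    for k, tap in enumerate(taps):
        j = min(max(start + k, 0), len(row) - 1)   # clamp at the row borders
        acc += tap * row[j]
    return acc / sum(taps)


def quarter_pixel(row, i, taps):
    """Quarter-pixel value at i + 1/4: average of row[i] and the half pixel at i + 1/2."""
    return (row[i] + half_pixel(row, i, taps)) / 2


row = [10, 10, 20, 40, 40, 40]
b2 = half_pixel(row, 2, BILINEAR_TAPS)   # (20 + 40) / 2
b6 = half_pixel(row, 2, SIX_TAPS)        # weighted sum of six neighbours / 32
q6 = quarter_pixel(row, 2, SIX_TAPS)     # second-order sub-pixel from the chain
```

The quarter pixel is built from the half pixel, which is the "chain of filters" structure that the k-th order generalization in the text relies on.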
If this is expanded to a general k-th sub-pixel, it can be expressed as the following equation 6:
In order to express the relation between the arbitrary k-th order sub-pixel and an integer pixel shown as equation 6 in the form of a matrix, if the filter coefficient part of the right side is unified, it can be expressed as the following equation 7:
Here,
of equation 7 is defined as the following equation 8:
In equation 8, subscript f(x) is defined as the following equation 9:
The form of equation 7 defined in relation to a first-order sub-pixel is transformed into a matrix form as the following equation 10:
In equation 10, vector P(x0,h) is defined as the following equation 11:
$$P(x_0, h) = (\ldots, P(x_0 - h), P(x_0), P(x_0 + h), \ldots)^T \in \mathbb{R}^{\infty} \qquad (11)$$
The notation used in equations 10 and 11 is expanded to its k-th order sub-pixel as the following equation 12:
In equation 12, it is defined that
For convenience, equation 12 expressed in the form of a matrix can be transformed to a vector form. Assuming that vector fs
Here, the integer pixel and vector in relation to an arbitrary k-th order sub-pixel can be expressed as the following equation 14:
In order to define a differential at an arbitrary sub-pixel by using equation 12, the subscript at the k-th order sub-pixel is expressed as the following equation 15:
If the k-th order sub-pixel is expressed as equation 15, an arbitrary k-th order sub-pixel and a reference sub-pixel (this may be an integer pixel) P(x0) can be expressed in the form of a k-th order sub-pixel as the following equation 16:
P(x0+hk)=FkP(x0, h)|s
P(x0)=FkP(x0, h)|s
A differential of an arbitrary k-th order sub-pixel P(x0) is defined by using equation 16, as the following equation 17:
If this is applied to an arbitrary k-th order sub-pixel system composed of a series of 2-tap FIR filters as shown in equation 2, the following equation 18 can be obtained:
If equation 18 is substituted in equation 17, a differential is derived as the following equation 19 and the concept of the differential can be understood referring to
In order to apply an example of a 2-tap filter to an encoding system composed of a series of arbitrary n-tap FIR filters, if fs
$$\forall k \in \mathbb{Z}^{+} \text{ and } s \in \mathbb{Z},\ \exists \kappa > 0 \text{ such that } |P(x_0 + s h_k) - P(x_0)| < \kappa \qquad (20)$$
If a differential is derived from sub-pixel system composed of a series of arbitrary n-tap FIR filters by using equation 20, the following equation 21 is obtained:
The second term of the right side of equation 21 can be expressed as the following equation 22:
In equation 22, because of the assumption of equation 20 and that
Accordingly, as k ↑ ∞, there exists an o(hε) such that o(hε) ↓ 0, and equation 21 can be expressed as the following equation 23:
According to equation 23, the differential in the sub-pixel system composed of a series of arbitrary n-tap FIR filters is as the following equation 24 and the differential concept can be shown referring to
As the differential is derived from the sub-pixel system composed of a series of arbitrary n-tap FIR filters, if an objective function for estimating a sub-pixel is convex on a sub-pixel system and continuously second-order differentiable, and at the same time, the Hessian of the objective function is formed as a first order combination of a sub-pixel and is Lipschitz continuous, estimation can be performed by a conjugated quadratic algorithm.
First, it is assumed that objective function ƒ(x) ∈ C2 for motion estimation is composed of a square of pixel differences between q blocks as the following equation 25:
Here, objective function ƒ(x) ∈ C2 is a convex function satisfying the following equation 26:
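$$\forall y = x_0 + h_\varepsilon \in \mathbb{R} \text{ and } s \in [0,1],\quad f(x_0 + s(y - x_0)) = f(x_0 + s h_\varepsilon) \le f(x_0) + s\,(f(x_0 + h_\varepsilon) - f(x_0)) \qquad (26)$$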
In order to prove equation 26, an objective function value at an arbitrary sub-pixel position x0+shε, is derived as the following equation 27:
Here, since ƒ(x0+hε) is ƒ(x0+shε) evaluated at s=1, ƒ(x0+hε)−ƒ(x0) is given by the following equation 28:
Through the results of equations 27 and 28, the following equation 29 is derived from the relation of equation 26:
Since the second term of the right side of equation 28 is a positive number according to the assumption of equation 26 and by raising the term to a higher power, and the first term is as equation 27, a conclusion of the following equation 30 is derived and the objective function is convex:
$$f(x_0 + s h_\varepsilon) \le f(x_0) + s \cdot (f(x_0 + h_\varepsilon) - f(x_0)) \qquad (30)$$
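The inequality of equation 30 can be checked numerically on a small example. The following is a minimal sketch, assuming a 2-tap (linear) sub-pixel filter, a one-dimensional block, h = 1, and a sum-of-squared-differences objective in the spirit of equation 25; all names and pixel values are hypothetical.

```python
def ssd(block_a, block_b):
    """Sum of squared differences between two equally long pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(block_a, block_b))


def interpolated_block(ref_row, x0, size, s):
    """Block at fractional position x0 + s (0 <= s <= 1) using linear interpolation."""
    return [(1 - s) * ref_row[x0 + i] + s * ref_row[x0 + i + 1]
            for i in range(size)]


def objective(cur_block, ref_row, x0, s):
    """Squared-difference objective f(x0 + s*h) with h = 1."""
    return ssd(cur_block, interpolated_block(ref_row, x0, len(cur_block), s))


ref_row = [12, 15, 30, 42, 37, 20, 18, 25]
cur_block = [18, 33, 41, 34]      # hypothetical current block
x0 = 1

f0 = objective(cur_block, ref_row, x0, 0.0)
f1 = objective(cur_block, ref_row, x0, 1.0)
# Convexity inequality: f(x0 + s*h) <= f(x0) + s * (f(x0 + h) - f(x0)).
holds = all(
    objective(cur_block, ref_row, x0, k / 8) <= f0 + (k / 8) * (f1 - f0) + 1e-9
    for k in range(9)
)
```

Under linear interpolation each pixel difference is linear in s, so the objective is a quadratic in s with a non-negative leading coefficient, which is why the chord lies above the curve in this example.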
Also, if the objective function is continuously second-order differentiable, the Hessian of the objective function is positive definite and is expressed as a square of a first order combination of an integer pixel as the following equation 31:
To prove equation 31, first, since the objective function is continuously second-order differentiable, the objective function satisfies the following equation 32 with respect to x ∈[x0,x0+hε]:
Objective function ƒ(x) ∈ C2 also satisfies the following equation 33 according to equation 28 derived from the convex condition and the definition of a continuous differential:
Accordingly, from equation 32, the following equation 34 can be derived:
In the right side of equation 34, partial differential
is expressed as the following equation 35:
In equation 35, partial differential
is a first order combination of g(x), and also, g(x) is a first order combination of an integer pixel by definition. Accordingly,
with respect to n ∈[1, N], and a fixed value by a first order combination of an integer pixel. Accordingly, the conclusion of the following equation 36 is obtained and equation 31 is proved:
Lipschitz continuity can be proved simply by using equation 30, as the following equation 37:
According to equations 30, 31, and 37, the value of the objective function with respect to an arbitrary k-th order sub-pixel can be obtained simply by expansion of a Taylor series. Accordingly, from equation 28, the following equation 38 can be obtained such that a discrete image in an n-tap FIR sub-pixel system can be generated as a continuous image:
For estimating a sub-pixel, if the sub-pixel objective function is set as the following equation 39 for all x ∈ ℝ²[x−h, x+h] under the conditions of equations 30, 31, and 37, the sub-pixel can be estimated with only integer-pixel objective function values:
Here,
The objective function of equation 39 is an objective function satisfying all equations 30, 31, and 37.
Accordingly, since according to the convex condition, objective function ƒ(x) ∈ C2 satisfies a first-order necessary condition with respect to a minimum value, the following equation 40 is satisfied:
$$\nabla f(x) = Qx^* + b = 0 \;\Rightarrow\; x^* = -Q^{-1} b \qquad (40)$$
According to equation 40 and the definition of matrix Q, the position of a sub-pixel minimizing an objective function is obtained as the following equation 41:
If the off-diagonal elements of matrix Q are set to q10=q01=0, only the diagonal components of the objective function remain. The eigenvectors of the resulting diagonal matrix are (0,1) and (1,0), so the estimation can be regarded as being performed by a conjugated quadratic function having the x and y axes as principal components.
Here, in matrix Q, only estimation of the diagonal component is needed, and each component is expressed as a first-order combination of an objective function with respect to an integer pixel as the following equation 42:
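The equations for the minimizer and for the elements of Q are not reproduced in this text. The sketch below shows one way such an estimate could be computed, assuming the diagonal elements of Q are taken as central second differences of the integer-pixel objective values and the gradient b as central first differences. These finite-difference forms, the function names, and the sample cost function are assumptions made for illustration.

```python
def axis_minimum(f_minus, f_center, f_plus, h=1.0):
    """Offset from the centre that minimises the parabola through three costs.

    f_minus, f_center, f_plus are objective values at -h, 0, +h on one axis.
    Returns None when the fitted curvature is not positive (no minimum).
    """
    q = (f_plus - 2.0 * f_center + f_minus) / (h * h)   # diagonal element of Q
    b = (f_plus - f_minus) / (2.0 * h)                  # gradient at the centre
    if q <= 0.0:
        return None
    return -b / q                                       # x* = -b / q on this axis


def subpixel_offset(costs, h=1.0):
    """Estimate the (dx, dy) sub-pixel offset from a cross of integer-pixel costs.

    costs maps the offsets (0,0), (-1,0), (1,0), (0,-1), (0,1) to cost values.
    """
    dx = axis_minimum(costs[(-1, 0)], costs[(0, 0)], costs[(1, 0)], h)
    dy = axis_minimum(costs[(0, -1)], costs[(0, 0)], costs[(0, 1)], h)
    return dx, dy


def quantize(offset, step=0.25):
    """Round an offset to the nearest multiple of step (quarter pixel by default)."""
    return round(offset / step) * step


# Hypothetical costs sampled from f(x, y) = 3(x - 0.3)^2 + 2(y + 0.2)^2 + 5.
def f(x, y):
    return 3 * (x - 0.3) ** 2 + 2 * (y + 0.2) ** 2 + 5


costs = {p: f(*p) for p in [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]}
dx, dy = subpixel_offset(costs)
```

Only five integer-position cost values are used, and no interpolated sub-pixel image is generated, which matches the saving the text attributes to the method.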
Once equations 39, 40, 41, and 43 are determined, if the center integer pixel is not the minimum after the first integer pixel search is finished, the objective function is approximated by equation 42, and the integer pixel minimizing the next objective function is predicted according to equations 40 and 41. Then, the integer pixel search can be performed more quickly as shown in
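The prediction of the next integer search position can be sketched along one axis as follows. This is an illustrative sketch only, assuming the same three-point quadratic fit applied to integer-pixel costs; the function name, the `max_step` limit, and the fallback rule for a non-convex fit are hypothetical choices, not details taken from the text.

```python
def predict_next_center(center, f_minus, f_center, f_plus, max_step=8):
    """Predict the next integer search position along one axis.

    center is the current integer position; f_minus, f_center, f_plus are the
    costs at center - 1, center, center + 1. max_step (a hypothetical safety
    limit) bounds how far the prediction may jump.
    """
    curvature = f_plus - 2 * f_center + f_minus
    slope = (f_plus - f_minus) / 2
    if curvature <= 0:
        # No convex fit: fall back to one step toward the cheaper neighbour.
        step = 1 if f_plus < f_minus else -1
    else:
        step = round(-slope / curvature)
        step = max(-max_step, min(max_step, step))
    return center + step


# Hypothetical costs from f(x) = (x - 13)^2 sampled at x = 9, 10, 11.
nxt = predict_next_center(10, 16, 9, 4)
```

Because the fitted minimizer can lie several pixels away, a single prediction may replace several steps of a pattern-based integer search.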
The present invention can also be embodied as computer readable codes on a computer readable recording medium. The computer readable recording medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet). The computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims. The preferred embodiments should be considered in descriptive sense only and not for purposes of limitation. Therefore, the scope of the invention is defined not by the detailed description of the invention but by the appended claims, and all differences within the scope will be construed as being included in the present invention.
According to the present invention, since an additional sub-pixel image for estimation of a sub-pixel is not needed in encoding of moving pictures, the amount of computation is greatly reduced and as a result, the computation time can also be reduced. Furthermore, a discrete image can be generated as a continuous image and at the same time, search of an integer pixel can also be performed more quickly.
Number | Date | Country | Kind |
---|---|---|---|
10-2005-0076456 | Aug 2005 | KR | national |
10-2005-0124050 | Dec 2005 | KR | national |