1. Field of the Invention
The present invention relates to a moving image decoder which decodes a moving image signal encoded for respective frames, a moving image decoding method, and a computer-readable medium storing a moving image decoding program and, more particularly, to decoding processing executed when an error region or lost region is generated in a decoded image.
2. Description of the Related Art
A moving image encoded for respective frames is normally decoded using motion vectors and motion-compensated prediction errors. However, with this method, when a received signal fails to be correctly decoded, the data of the motion vectors and motion-compensated prediction errors are lost, consequently generating distortions and lost regions caused by errors (to be collectively referred to as a lost region hereinafter) in the decoded image.
As a means of solving this problem, a method has been proposed which estimates the motion vectors of a lost region from those of its neighboring region and interpolates pixel values from the previous frame (for example, see reference 1 [M. Ghanbari and V. Seferidis, “Cell loss concealment in ATM video codecs,” IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 238-247, June 1993]). However, with this method, it is difficult to restore the data when the neighboring region of the lost region cannot be used. Also, in the case of a frame including a moving object, the re-estimated motion vectors have low precision, so high-precision restoration cannot be attained.
On the other hand, a method of interpolating pixel values of a lost region in the spatial domain using only information in the same frame as that including the lost region has been proposed. As examples of such a method, a method of interpolating so as to minimize boundary errors using surrounding pixels (for example, see reference 2 [S. S. Hemami and T. H. Y. Meng, “Transform coded image reconstruction exploiting interblock correlations,” IEEE Trans. Image Processing, vol. 4, pp. 1023-1027, July 1995]) and a method using edge information of surrounding pixels (for example, see reference 3 [H. Sun and W. Kwok, “Concealment of damaged block transform coded images using projection onto convex sets,” IEEE Trans. Image Processing, vol. 4, pp. 470-477, April 1995]) have been proposed. However, these methods cannot attain high-precision restoration since they do not use any inter-frame correlations to estimate the pixel values of a lost region.
Hence, as a method using inter-frame correlations, a boundary matching algorithm has been proposed. With this method, the motion vectors of a lost region are estimated from a neighboring region of the lost region in which pixel values are given, and the lost region is interpolated by the pixel values of the region of the previous frame associated by the estimated motion vectors (for example, see reference 4 [W. M. Lam, A. R. Reibman and B. Liu, “Recovery of lost or erroneously received motion vectors,” Proc. ICASSP 1993, vol. 5, pp. 417-420]). However, this boundary matching algorithm poses another problem in that errors propagate to subsequent frames to be restored, since it does not consider the errors generated upon interpolating the lost region by the pixel values of the corresponding region of the previous frame.
As described above, in the conventional moving image decoder, when data of motion vectors and motion-compensated prediction errors are lost, the method of interpolating pixel values from the previous frame by estimating motion vectors of a lost region, the method of interpolating pixel values of a lost region on a spatial domain, and the method of interpolating pixel values of a lost region from the previous frame using the boundary matching algorithm are carried out, but it is difficult for these methods to maintain high-precision restoration.
It is an object of the present invention to provide a moving image decoder which can restore a lost region with high precision even when data of motion vectors and motion-compensated prediction errors are lost, a moving image decoding method, and a computer-readable medium storing a moving image decoding program.
According to a first embodiment of the invention, there is provided a moving image decoder, which receives a moving image signal which is compressed and encoded by prediction between frames which are motion-compensated for respective blocks each including M×M (M is a natural number greater than or equal to 2) pixels, and decodes an original moving image signal by sequentially repeating processing for detecting motion vectors for respective blocks from an image of a first frame in the moving image signal, calculating motion-compensated prediction values corresponding to the motion vectors detected from the image of the first frame, and generating an image of a second frame which follows the first frame from the motion vectors and the motion-compensated prediction values, the decoder comprising: a matching processing unit configured to detect a defective region Ψ which suffers a loss or an error from the image of the second frame, to divide the defective region Ψ into a plurality of regions ωt each including N×N (N≦M) pixels as a unit, to estimate first motion vectors (d=(dx, dy)) of the plurality of obtained divided defective regions ωt, to estimate a plurality of regions ωt−1 in the image of the first frame, which correspond to the plurality of divided defective regions ωt in the image of the second frame, based on the first motion vectors, and to interpolate the plurality of divided defective regions ωt in the image of the second frame by pixel values of the plurality of estimated regions ωt−1 in the image of the first frame; a pre-processing unit configured to calculate second motion vectors (v=(vx, vy)) of small defective regions γt each of which has each of the N×N pixels (x, y) as the center and includes L×L (L≦N) pixels in each divided defective region ωt of N×N pixels in the image of the second frame, to estimate a plurality of small regions γt−1 each including L×L pixels in the image of the first frame, which respectively correspond to the plurality of small defective regions γt in the image of the second frame, based on the second motion vectors, and to calculate a matrix Ax,y(t) used to estimate pixel values Xx,y(t) of original images of the plurality of small defective regions γt in the image of the second frame from pixel values Xx+vx,y+vy(t−1) of the plurality of small estimated regions γt−1 in the image of the first frame; and an estimation unit configured to estimate the pixel values Xx,y(t) of the original image of each small defective region γt by estimating a covariance matrix QV(t) of an error vector, which is expressed by Zx,y(t)−Hx,y(t)Xx,y(t), using a matrix Hx,y(t) which gives pixel values Zx,y(t) of an observation image from the pixel values Xx,y(t) of each small defective region γt including the L×L pixels.
According to a second embodiment of the invention, there is provided a moving image decoding method, which receives a moving image signal which is compressed and encoded by prediction between frames which are motion-compensated for respective blocks each including M×M (M is a natural number greater than or equal to 2) pixels, and decodes an original moving image signal by sequentially repeating processing for detecting motion vectors for respective blocks from an image of a first frame in the moving image signal, and generating an image of a second frame which follows the first frame from motion-compensated prediction values corresponding to the motion vectors detected from the image of the first frame, the method comprising: executing matching processing for detecting a defective region Ψ which suffers a loss or an error from the image of the second frame, dividing the defective region Ψ into a plurality of regions ωt each including N×N (N≦M) pixels as a unit, estimating first motion vectors (d=(dx, dy)) of the plurality of obtained divided defective regions ωt, estimating a plurality of regions ωt−1 in the image of the first frame, which correspond to the plurality of divided defective regions ωt in the image of the second frame, based on the first motion vectors, and interpolating the plurality of divided defective regions ωt in the image of the second frame by pixel values of the plurality of estimated regions ωt−1 in the image of the first frame; executing pre-processing for calculating second motion vectors (v=(vx, vy)) of small defective regions γt each of which has each of the N×N pixels (x, y) as the center and includes L×L (L≦N) pixels in each divided defective region ωt of N×N pixels in the image of the second frame, estimating a plurality of small regions γt−1 each including L×L pixels in the image of the first frame, which respectively correspond to the plurality of small defective regions γt in the image of the second frame, based on the second motion vectors, and calculating a matrix Ax,y(t) used to estimate pixel values Xx,y(t) of original images of the plurality of small defective regions γt in the image of the second frame from pixel values Xx+vx,y+vy(t−1) of the plurality of small estimated regions γt−1 in the image of the first frame; and executing estimation processing for estimating the pixel values Xx,y(t) of the original image of each small defective region γt by estimating a covariance matrix QV(t) of an error vector, which is expressed by Zx,y(t)−Hx,y(t)Xx,y(t), using a matrix Hx,y(t) which gives pixel values Zx,y(t) of an observation image from the pixel values Xx,y(t) of each small defective region γt including the L×L pixels.
According to a third embodiment of the invention, there is provided a computer-readable medium storing a moving image decoding program that makes a computer execute moving image decoding processing, which receives a moving image signal which is compressed and encoded by prediction between frames which are motion-compensated for respective blocks each including M×M (M is a natural number greater than or equal to 2) pixels, and decodes an original moving image signal by sequentially repeating processing for detecting motion vectors for respective blocks from an image of a first frame in the moving image signal, and generating an image of a second frame which follows the first frame from motion-compensated prediction values corresponding to the motion vectors detected from the image of the first frame, the program making the computer execute: matching processing for detecting a defective region Ψ which suffers a loss or an error from the image of the second frame, dividing the defective region Ψ into a plurality of regions ωt each including N×N (N≦M) pixels as a unit, estimating first motion vectors (d=(dx, dy)) of the plurality of obtained divided defective regions ωt, estimating a plurality of regions ωt−1 in the image of the first frame, which correspond to the plurality of divided defective regions ωt in the image of the second frame, based on the first motion vectors, and interpolating the plurality of divided defective regions ωt in the image of the second frame by pixel values of the plurality of estimated regions ωt−1 in the image of the first frame; pre-processing for calculating second motion vectors (v=(vx, vy)) of small defective regions γt each of which has each of the N×N pixels (x, y) as the center and includes L×L (L≦N) pixels in each divided defective region ωt of N×N pixels in the image of the second frame, estimating a plurality of small regions γt−1 each including L×L pixels in the image of the first frame, which respectively correspond to the plurality of small defective regions γt in the image of the second frame, based on the second motion vectors, and calculating a matrix Ax,y(t) used to estimate pixel values Xx,y(t) of original images of the plurality of small defective regions γt in the image of the second frame from pixel values Xx+vx,y+vy(t−1) of the plurality of small estimated regions γt−1 in the image of the first frame; and estimation processing for estimating the pixel values Xx,y(t) of the original image of each small defective region γt by estimating a covariance matrix QV(t) of an error vector, which is expressed by Zx,y(t)−Hx,y(t)Xx,y(t), using a matrix Hx,y(t) which gives pixel values Zx,y(t) of an observation image from the pixel values Xx,y(t) of each small defective region γt including the L×L pixels.
Embodiments of the present invention will be described hereinafter with reference to the drawings. In particular, the sequence for interpolating a lost region according to the present invention, which is executed when the encoded sequence of a received moving image includes errors and a lost portion is generated in the decoded image, will be described in detail below.
This inter-frame prediction decoding unit 13 fetches a previous frame image stored in a frame memory 14, and decodes a next frame image using the previous frame image, and the newly input motion vectors and prediction errors. Then, the inter-frame prediction decoding unit 13 executes matching process S1, estimation pre-process S2, and original image estimation process S3 shown in
The sequence of the error concealment processing of the inter-frame prediction decoding unit 13 will be described below with reference to
In the matching process S1, as shown in
Next, in the estimation pre-process S2, as shown in
Finally, in the original image estimation process S3, a transition model and observation model are defined based on the result obtained by the estimation pre-process S2, and an original image is estimated using a Kalman filter algorithm shown in
The contents of the processing executed in the aforementioned processes S1 to S3 will be described in more detail below.
[Matching Process S1]
In the matching process S1, letting ft(x, y) be a pixel value of a pixel (x, y) in each divided lost region ωt, first motion vectors (d=(dx, dy)) of that divided lost region ωt are estimated as shown below. Note that the boundary matching algorithm is used in this case. However, other methods that estimate pixel values by estimating motion vectors from the previous frame may be used.
Let ωt−1 be a divided estimated region of N×N pixels in the image of Frame t−1, which corresponds to the same position as that of each divided lost region ωt of N×N pixels obtained by dividing lost region Ψ in the image of Frame t, and let Ω be a neighboring region including divided estimated region ωt−1. The region of Frame t−1 that corresponds to divided lost region ωt of Frame t is then assumed to be included in this region Ω.
Note that (x0, y0) denotes the position of the pixel at the upper left end of each of regions ωt and ωt−1, (x0+N−1, y0) that at the upper right end, and (x0, y0+N−1) that at the lower left end. Then, let CA be a variance value between the pixel values ft−1(x, y0) (x0≦x≦x0+N−1) of the pixels on the top side of divided estimated region ωt−1 and the pixel values ft(x, y0−1) (x0≦x≦x0+N−1) of the pixels one pixel above the top side of divided lost region ωt, CL be a variance value between the pixel values ft−1(x0, y) (y0≦y≦y0+N−1) of the pixels on the left side of divided estimated region ωt−1 and the pixel values ft(x0−1, y) (y0≦y≦y0+N−1) of the pixels one pixel to the left of the left side of divided lost region ωt, and CB be a variance value between the pixel values ft−1(x, y0+N−1) (x0≦x≦x0+N−1) of the pixels on the bottom side of divided estimated region ωt−1 and the pixel values ft(x, y0+N) (x0≦x≦x0+N−1) of the pixels one pixel below the bottom side of divided lost region ωt. The variance values CA, CL, and CB can then be calculated as shown below.
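Assuming that each "variance value" denotes the sum of squared differences along the corresponding shared boundary, as in the boundary matching algorithm of reference 4, the three criteria can be written as:

```latex
% Hypothetical reconstruction: summed squared differences along each shared boundary.
C_A = \sum_{x=x_0}^{x_0+N-1} \bigl( f_{t-1}(x, y_0) - f_t(x, y_0-1) \bigr)^2

C_L = \sum_{y=y_0}^{y_0+N-1} \bigl( f_{t-1}(x_0, y) - f_t(x_0-1, y) \bigr)^2

C_B = \sum_{x=x_0}^{x_0+N-1} \bigl( f_{t-1}(x, y_0+N-1) - f_t(x, y_0+N) \bigr)^2
```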
The position of the pixel (x, y) is sequentially moved in the neighboring region Ω, the variance values CA, CL, and CB are calculated for respective pixels (x, y), and the first motion vector d=(dx, dy) is estimated from a position (x+dx, y+dy) where a total variance value C=CA+CL+CB becomes smallest. Then, divided lost region ωt is interpolated by pixel values of a region of N×N pixels having the position (x+dx, y+dy) as the center.
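For illustration only, a minimal Python/NumPy sketch of this search is given below. The function name is hypothetical, the candidate block is referenced by its top-left corner rather than its center, and the mismatch C is taken as a plain sum of squared boundary differences; none of this is asserted to be the patented implementation.

```python
import numpy as np

def boundary_matching(prev, cur, x0, y0, N, search):
    """Estimate the first motion vector d=(dx, dy) of the N x N lost block whose
    top-left corner is (x0, y0) in the current frame `cur`, by testing candidate
    blocks of the previous frame `prev` within a +/- `search` window and
    minimizing C = CA + CL + CB (summed squared boundary differences).
    Both frames are float ndarrays and the lost block is assumed not to touch
    the frame border."""
    best_d, best_c = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = x0 + dx, y0 + dy        # candidate block position in prev
            if px < 0 or py < 0 or px + N > prev.shape[1] or py + N > prev.shape[0]:
                continue
            # CA: candidate's top row vs. the row just above the lost block
            ca = np.sum((prev[py, px:px + N] - cur[y0 - 1, x0:x0 + N]) ** 2)
            # CL: candidate's left column vs. the column just left of the lost block
            cl = np.sum((prev[py:py + N, px] - cur[y0:y0 + N, x0 - 1]) ** 2)
            # CB: candidate's bottom row vs. the row just below the lost block
            cb = np.sum((prev[py + N - 1, px:px + N] - cur[y0 + N, x0:x0 + N]) ** 2)
            c = ca + cl + cb
            if c < best_c:
                best_c, best_d = c, (dx, dy)
    dx, dy = best_d
    # interpolate the lost block with the best-matching block of the previous frame
    cur[y0:y0 + N, x0:x0 + N] = prev[y0 + dy:y0 + dy + N, x0 + dx:x0 + dx + N]
    return best_d
```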
[Estimation Pre-Process S2]
In the estimation pre-process S2, as shown in
Next, correspondence between pixels of these two local regions γt and γt−1 is calculated to estimate pixel values Xx,y(t) in the image of target Frame t from pixel values Xx+vx,y+vy(t−1) in the image of Frame t−1. In this process, an element in the k-th row and l-th column in a matrix Ax,y(t) used to estimate pixel values Xx,y(t) of the original image of each local region γt in the image of Frame t assumes “1” when the k-th element of Xx,y(t) corresponds to the l-th element of Xx+vx,y+vy(t−1); otherwise, it assumes “0”.
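For illustration only, a minimal sketch of how such a 0/1 matrix could be assembled once the pixel correspondence is known follows. The helper name and the partner index array are hypothetical, and the derivation of the correspondence itself from the second motion vectors is not shown.

```python
import numpy as np

def build_transition_matrix(partner, L):
    """Assemble A_{x,y}(t) from a given pixel correspondence.

    partner[k] is the raster-scan index l of the pixel of gamma_{t-1} (the
    L x L region at (x+vx, y+vy) in Frame t-1) that corresponds to the k-th
    raster-scanned pixel of gamma_t, or -1 when no partner exists.  The
    entry A[k, l] is 1 for each corresponding pair and 0 everywhere else."""
    A = np.zeros((L * L, L * L))
    for k, l in enumerate(partner):
        if 0 <= l < L * L:
            A[k, l] = 1.0
    return A

# If motion compensation aligns the two regions pixel for pixel, the correspondence
# is the identity permutation and A reduces to the identity matrix.
A_identity = build_transition_matrix(list(range(5 * 5)), L=5)
```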
[Original Image Estimation Process S3]
Letting Xx,y(t) and Zx,y(t) be pixel values in local regions γt extracted from the original image and observation image, the state transition model and observation model are respectively expressed by:
[State Transition Model]
X_{x,y}(t) = A_{x,y}(t) X_{x+vx,y+vy}(t−1) + U(t)   (4)
[Observation Model]
Z_{x,y}(t) = H_{x,y}(t) X_{x,y}(t) + V(t)   (5)
where Xx,y(t) and Zx,y(t) are vectors obtained by raster-scanning pixel values in local regions γt (L×L pixels) having pixels (x, y) in the original image and observation image as the centers.
Furthermore, the other matrices and vectors are defined as follows: U(t) and V(t) are the system noise and observation noise vectors added in Equations (4) and (5), respectively, Ax,y(t) is the correspondence matrix calculated in the estimation pre-process S2, and Hx,y(t) is the observation matrix described in the original image estimation process S3 below.
When the state transition model and observation model are defined as described above, the Kalman filter algorithm is expressed by:
P^b_{x,y}(t) = A_{x,y}(t) P^a_{x,y}(t−1) A_{x,y}(t)^T + Q_U(t)   (6)
K_{x,y}(t) = P^b_{x,y}(t) H_{x,y}(t)^T [H_{x,y}(t) P^b_{x,y}(t) H_{x,y}(t)^T + Q_V(t)]^{−1}   (7)
X̄_{x,y}(t) = A_{x,y}(t) X̂_{x,y}(t−1)   (8)
X̂_{x,y}(t) = X̄_{x,y}(t) + K_{x,y}(t) [Z_{x,y}(t) − H_{x,y}(t) X̄_{x,y}(t)]   (9)
P^a_{x,y}(t) = P^b_{x,y}(t) − K_{x,y}(t) H_{x,y}(t) P^b_{x,y}(t)   (10)
where P^b_{x,y}(t) and P^a_{x,y}(t) are the covariance matrices of the estimation errors before and after the measurement update, respectively, defined by:
P^b_{x,y}(t) = E[(X_{x,y}(t) − X̄_{x,y}(t))(X_{x,y}(t) − X̄_{x,y}(t))^T]   (11)
P^a_{x,y}(t) = E[(X_{x,y}(t) − X̂_{x,y}(t))(X_{x,y}(t) − X̂_{x,y}(t))^T]   (12)
where QU(t) and QV(t) are the covariance matrices of the zero-mean noise vectors U(t) and V(t), i.e., diagonal matrices whose diagonal elements are the variances σu2 and σv2, respectively.
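For illustration only, a minimal NumPy sketch of one pass of Equations (6) to (10) for a single raster-scanned L×L region follows. The function and variable names are assumptions, and the direct matrix inverse is used purely for brevity.

```python
import numpy as np

def kalman_update(x_prev_hat, P_prev, z, A, H, sigma_u2, sigma_v2):
    """One pass of the recursion (6)-(10) for a single raster-scanned L x L region.

    x_prev_hat : estimated pixel vector of gamma_{t-1}, shape (L*L,)
    P_prev     : a-posteriori error covariance P^a_{x,y}(t-1), shape (L*L, L*L)
    z          : observed pixel vector Z_{x,y}(t), e.g. the POCS reconstruction
    A, H       : state transition and observation matrices, shape (L*L, L*L)
    sigma_u2, sigma_v2 : variances of the system and observation noise
    """
    n = x_prev_hat.size
    Q_u = sigma_u2 * np.eye(n)          # covariance Q_U(t) of the system noise U(t)
    Q_v = sigma_v2 * np.eye(n)          # covariance Q_V(t) of the observation noise V(t)

    # prediction step, Eqs. (6) and (8)
    P_b = A @ P_prev @ A.T + Q_u
    x_bar = A @ x_prev_hat

    # Kalman gain, Eq. (7)
    K = P_b @ H.T @ np.linalg.inv(H @ P_b @ H.T + Q_v)

    # measurement update, Eqs. (9) and (10)
    x_hat = x_bar + K @ (z - H @ x_bar)
    P_a = P_b - K @ H @ P_b
    return x_hat, P_a
```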
When an observation image Zx,y(t) includes a lost region, all elements in the rows of the observation matrix Hx,y(t) corresponding to the lost region become zero. As a result, the elements of the corresponding rows in Kx,y(t) also become zero and cannot be corrected. As one method to solve this problem, a result obtained by reconstructing the lost region by a convex projection method is used as the observation image Zx,y(t). That is, an image obtained by processing the entire frame with a low-pass filter may be used as it is, but when an image further reconstructed by the convex projection method is used, estimation with higher precision can be attained. However, the present invention is not limited to such a specific method. As other methods, for example, intra-frame interpolation may be used.
More specifically, the result obtained by restoring the lost region using the boundary matching algorithm for motion vector compensation in the matching process S1 is used as an initial value, and the pixel values Xx,y(t) of the original image in small defective region γt are estimated, using the convex projection method, from an image to which the low-pass filter characteristics are given by the observation matrix. The result to which the convex projection method converges under the following two constraint conditions is used as the reconstruction result (a sketch of this iteration is given after the two conditions):
(1) Given pixel values in an image to be reconstructed are values of the original image, and remain unchanged.
(2) Low-frequency components in a frequency domain remain unchanged, and high-frequency components become zero.
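For illustration only, a minimal sketch of the alternating projections implied by these two conditions follows. The ideal low-pass mask in the 2-D DFT domain, its cutoff, and the iteration count are assumptions rather than the patent's specific filter.

```python
import numpy as np

def convex_projection(init, known_mask, known_values, cutoff=0.125, iters=50):
    """Alternately project the frame onto the two constraint sets above:
    (1) pixels whose original values are given are reset to those values;
    (2) the 2-D spectrum is forced to zero outside a low-frequency band."""
    img = init.astype(np.float64).copy()
    h, w = img.shape
    fy = np.abs(np.fft.fftfreq(h))[:, None]
    fx = np.abs(np.fft.fftfreq(w))[None, :]
    keep = (fy <= cutoff) & (fx <= cutoff)          # ideal separable low-pass mask
    for _ in range(iters):
        spec = np.fft.fft2(img)
        img = np.real(np.fft.ifft2(spec * keep))    # projection (2): kill high frequencies
        img[known_mask] = known_values[known_mask]  # projection (1): restore known pixels
    return img
```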
In this manner, pixel values of a region where the lost region exists in the observation image reconstructed by the convex projection method are pixel values of the original image which are degraded by the low-pass filter processing and on which noise components are superposed. Hence, the observation matrix Hx,y(t) can be defined by approximating a low-pass filter using a matrix, and includes coefficients of the low-pass filter in respective rows.
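For illustration only, one way such a matrix could be assembled for a raster-scanned L×L region is sketched below; the 3×3 averaging kernel and the border handling are assumptions, not the patent's actual low-pass filter.

```python
import numpy as np

def lowpass_observation_matrix(L, kernel=None):
    """Approximate a small low-pass filter as an (L*L) x (L*L) matrix H acting on
    a raster-scanned L x L region: each row holds the filter coefficients that
    produce one observed (blurred) pixel."""
    if kernel is None:
        kernel = np.full((3, 3), 1.0 / 9.0)   # assumed 3 x 3 averaging kernel
    kh, kw = kernel.shape
    oy, ox = kh // 2, kw // 2
    H = np.zeros((L * L, L * L))
    for y in range(L):
        for x in range(L):
            row = y * L + x
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - oy, x + dx - ox
                    if 0 <= yy < L and 0 <= xx < L:   # taps falling outside the region are dropped
                        H[row, yy * L + xx] += kernel[dy, dx]
    return H
```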
Assuming that each element of the observation noise vector V(t) corresponds to white noise according to N(0, σv2), the noise can be calculated as the difference between the pixel values Zx,y(t) of the reconstruction result obtained by the convex projection method and the product of the pixel values Xx,y(t) of the original image and Hx,y(t), i.e., as Vx,y(t)=Zx,y(t)−Hx,y(t)Xx,y(t) (cf. Equation (5)), and σv2 is estimated as the variance of this difference. Thus, an estimated current image X̂x,y(t) in a small region of L×L pixels can be derived from Equations (6) to (12).
The Kalman filter algorithm applies this processing to all N×N pixels while shifting the center pixel (x, y) one by one, and further applies similar processing to all regions of N×N pixels in an error region and lost region in the image of Frame t. More specifically, as shown in
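For illustration only, a rough sketch of this outer loop, reusing the hypothetical kalman_update() and lowpass_observation_matrix() helpers sketched above, follows. Keeping only each window's center estimate, the identity correspondence matrix, and the assumption that every window stays inside the frame are illustrative simplifications rather than the patented procedure.

```python
import numpy as np

def conceal_divided_region(prev_hat, z_frame, x0, y0, N, L,
                           motion_vectors, P_store, sigma_u2, sigma_v2):
    """Sweep the center pixel (x, y) over one N x N divided lost region whose
    top-left corner is (x0, y0), estimating an L x L window around every pixel
    with kalman_update() and keeping only the window's center estimate.
    L is assumed odd, every window is assumed to stay inside the frame, and
    motion_vectors[(x, y)] holds the second motion vector v=(vx, vy)."""
    out = z_frame.astype(np.float64).copy()
    r = L // 2
    for y in range(y0, y0 + N):
        for x in range(x0, x0 + N):
            vx, vy = motion_vectors[(x, y)]
            # raster-scanned L x L windows of the previous estimate and the observation
            x_prev = prev_hat[y + vy - r:y + vy + r + 1,
                              x + vx - r:x + vx + r + 1].astype(np.float64).ravel()
            z = z_frame[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64).ravel()
            A = np.eye(L * L)                       # placeholder pixel correspondence
            H = lowpass_observation_matrix(L)
            P_prev = P_store.get((x, y), np.eye(L * L))
            x_hat, P_store[(x, y)] = kalman_update(x_prev, P_prev, z, A, H,
                                                   sigma_u2, sigma_v2)
            out[y, x] = x_hat[r * L + r]            # keep only the center pixel
    return out
```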
By executing the aforementioned processes S1 to S3, errors and losses can be restored with high precision.
In order to present the effects of the present invention,
Furthermore, the effects on an actual decoded image will be explained by comparing the decoding algorithm of the embodiment with the conventional BMA (boundary matching algorithm) method.
Assume that as a result of transmission of an original image (free from any error) shown in
Therefore, according to the moving image decoder with the above arrangement, even when data of motion vectors and motion-compensated prediction errors are lost, the lost region can be restored from an image of the previous frame with very high precision. In addition, since errors generated upon interpolating the lost region by pixel values of the corresponding region in the previous frame are taken into consideration, errors can be prevented from propagating to subsequent frames to be restored. Hence, high-precision restoration can be continuously executed.
Note that the present invention is not limited to the above embodiment as it stands, and can be embodied by modifying the constituent elements without departing from the scope of the invention when it is practiced. For example, the case has been explained wherein the original image estimation process S3 of the embodiment uses the Kalman filter algorithm. However, the present invention is not limited to such a specific algorithm. As other methods, for example, a recursive least squares (RLS) algorithm or an extended Kalman filter algorithm may be used.
By appropriately combining a plurality of constituent elements disclosed in the embodiment, various inventions can be formed. For example, some of the constituent elements disclosed in the embodiment may be omitted. Furthermore, constituent elements of different embodiments may be appropriately combined.
The present invention is especially suitable for use in a moving image decoder included in a mobile phone, an image processing terminal, and the like, each of which receives and decodes a compression-encoded moving image transmitted wirelessly.
This is a Continuation Application of PCT Application No. PCT/JP2008/068393, filed Oct. 9, 2008, which was published under PCT Article 21 (2) in Japanese. This application is based upon and claims the benefit of priority from prior Japanese Patent Application No. 2007-263721, filed Oct. 9, 2007, the entire contents of which are incorporated herein by reference.