The present invention relates generally to systems and methods for visual signal extrapolation or interpolation. More specifically, the present invention employs statistic similarity estimation for improved visual signal extrapolation or interpolation.
Extrapolation and interpolation of a visual signal, such as image, video, and graphics, have been widely used in various contexts, including, but not limited to: video-coding, transcoding, error concealment, pre-processing, and interactive rendering.
For instance, techniques for extrapolating and interpolating in video-coding applications have been described by Aaron et al., Toward Practical Wyner-Ziv Coding of Video, P
Non-motion-based extrapolation/interpolation methods, which are typically used in other applications, include the model-based view extrapolation method used for virtual reality rendering, the feature extrapolation method used for pre-compression, and the video fading scene prediction method. For example, the model-based view extrapolation method is described by U.S. Pat. No. 6,375,567 issued on Apr. 23, 2002 to Acres for “Model-Based View Extrapolation for Interactive Virtual Reality Systems.” The feature extrapolation method is described by U.S. Pat. No. 5,949,919 issued on Sep. 7, 1999 to Chen for “Precompression Extrapolation Method.” The video fading scene prediction is described by Koto et al., Adaptive Bi-Predictive Video Coding Temporal Extrapolation, ICIP (2003).
One example of the motion-based extrapolation/interpolation methods is the Wyner-Ziv video coding technique. A typical Wyner-Ziv video coding system includes a video encoder and a video decoder. The video encoder is a low-complexity, low-power encoder, so the computation-heavy signal processing tasks, such as the motion estimation, are carried out by the decoder instead. To achieve high efficiency, the Wyner-Ziv decoder needs to exploit the correlation between the source information and the side information, the latter being known only to the decoder, in order to decode the received video signals and reconstruct the video. The source information is the video signal (e.g., a picture) to be encoded at the encoder and transmitted to the decoder for decoding, and the side information is essentially an estimate of the picture to be decoded. Since the performance of the Wyner-Ziv system depends heavily on the reliability of the side information, the mechanism used by the decoder for generating the side information plays a crucial role in the Wyner-Ziv video coding system. Typically, the decoder first performs motion estimation on previously reconstructed pictures to generate a set of motion vectors and then uses such motion vectors to generate an estimate of the picture currently being decoded by extrapolation or interpolation. This estimate is used as the side information by the decoder for decoding and reconstructing the current picture.
The above-described conventional motion-based extrapolation and interpolation methods have several serious drawbacks, including:
It is therefore desirable to provide an improved system and method for visual signal extrapolation and interpolation, without the drawbacks of the conventional motion-based extrapolation and interpolation methods.
The present invention is directed to a computer-based method for visual signal extrapolation or interpolation, comprising:
providing at least a first and a second reference pictures;
conducting motion estimation on the first and second reference pictures to generate motion vectors indicative of movement of at least one of the first and second reference pictures in relation to the other;
generating an estimate picture by extrapolation or interpolation from the first and/or the second reference picture using the motion vectors; and
refining the estimate picture,
wherein statistic similarity estimation is used either in motion estimation or in refining the estimate picture, or a combination of both.
The reference pictures as used in the present invention are previously reconstructed pictures that can be used for constructing the estimate picture via extrapolation or interpolation.
When the statistic similarity estimation is used for motion estimation, statistic features of a block of pixels on the first reference picture are calculated and compared with statistic features of one or more blocks of pixels on the second reference picture. The best matching block of pixels on the second reference picture is then determined, at least partially based on its statistic similarity to the block of pixels on the first reference picture, and motion vectors are generated for the block of pixels on the first reference picture indicative of its movement in relation to the best matching block of pixels on the second reference picture.
When the statistic similarity estimation is used for refining the estimate picture, it can be used for filling empty pixel positions on the estimate picture or for resolving multiple mappings to the same pixel position on the estimate picture.
Preferably, but not necessarily, the following steps are taken to fill an empty pixel position on the estimate picture:
calculating statistic features for a neighboring block that surrounds the empty pixel position on the estimate picture;
identifying a search area on the reference picture from which the estimate picture is generated;
searching for the best matching block within the search area, wherein the best matching block surrounds a specific pixel on the reference picture and has the highest statistic similarity to the neighboring block that surrounds the empty pixel position on the estimate picture; and
filling the empty pixel position with the specific pixel surrounded by the best matching block on the reference picture.
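The hole-filling steps above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function names `ssi` and `fill_hole`, the use of `None` to mark an empty position, the window sizes, centring the search area on the co-located position, and skipping empty neighbours when gathering statistics are all assumptions made for the example. The similarity score follows the statistic similarity index formula given in this description.

```python
from math import sqrt


def ssi(p, q, alpha=1.0, beta=1.0):
    """Statistic similarity index of two equal-length pixel lists (lower = more similar)."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    var_p = sum((a - mean_p) ** 2 for a in p) / n
    var_q = sum((b - mean_q) ** 2 for b in q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q)) / n
    return alpha * sqrt((cov ** 2 - var_p * var_q) ** 2) + beta * (mean_p - mean_q) ** 2


def fill_hole(estimate, reference, y, x, radius=1, search=2):
    """Fill the empty position estimate[y][x] from the best matching block on reference."""
    h, w = len(reference), len(reference[0])
    # Neighbouring block around the hole; empty or out-of-picture positions are skipped (assumption).
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if 0 <= y + dy < h and 0 <= x + dx < w
               and estimate[y + dy][x + dx] is not None]
    if not offsets:
        return None  # nothing to compare against
    target = [estimate[y + dy][x + dx] for dy, dx in offsets]
    best_score, best_pixel = None, None
    # Search area: a window centred on the co-located position (assumption).
    for cy in range(max(0, y - search), min(h, y + search + 1)):
        for cx in range(max(0, x - search), min(w, x + search + 1)):
            if any(not (0 <= cy + dy < h and 0 <= cx + dx < w) for dy, dx in offsets):
                continue  # candidate block would leave the picture
            candidate = [reference[cy + dy][cx + dx] for dy, dx in offsets]
            score = ssi(target, candidate)
            if best_score is None or score < best_score:
                best_score, best_pixel = score, reference[cy][cx]
    estimate[y][x] = best_pixel
    return best_pixel
```

In practice the centre of the search area would more plausibly be derived from the motion vectors of the neighbouring pixels, as the detailed description suggests; the co-located centre is used here only to keep the sketch short.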
In the event of multiple mappings, i.e., there exist multiple pixels on the reference picture from which the estimate picture is generated, all of which extrapolate or interpolate to a specific pixel position on the estimate picture, the following steps are preferably, but not necessarily, taken to select the best matching pixel for the specific pixel position on the estimate picture:
calculating statistic features for a neighboring block that surrounds the specific pixel position on the estimate picture;
calculating statistic features for multiple blocks of pixels that each surrounds one of the multiple pixels on the reference picture;
identifying the best matching block among the multiple blocks surrounding the multiple pixels on the reference picture, wherein the best matching block has the highest statistic similarity to the neighboring block that surrounds the specific pixel position on the estimate picture; and
selecting the pixel that is surrounded by the best matching block on the reference picture as the best matching pixel for the specific pixel position on the estimate picture.
The statistic features that can be used in the present invention include, but are not limited to: block sample mean, block sample variance, neighboring parameters, etc.
In a preferred but not necessary embodiment of the present invention, the statistic similarity between two blocks of pixels is determined by calculating a statistic similarity index according to the following formula:
$$\mathrm{SSI}(P,Q)=\alpha\sqrt{\left[\mathrm{Cov}(P,Q)^{2}-\mathrm{Var}(P)\times\mathrm{Var}(Q)\right]^{2}}+\beta\left[\mu(P)-\mu(Q)\right]^{2},$$
wherein P is one block of pixels, Q is another block of pixels, SSI(P, Q) is the statistic similarity index indicative of the statistic similarity between blocks P and Q, μ(P) is the block sample mean of the block P, μ(Q) is the block sample mean of the block Q, Var(P) is the block sample variance of the block P, Var(Q) is the block sample variance of the block Q, Cov(P, Q) is the covariance between blocks P and Q, and α and β are weighting factors.
The above-described statistic similarity index can be used for motion estimation, for filling an empty pixel on the estimate picture, and/or for resolving the multiple mapping problem.
Another aspect of the present invention relates to a computer-based system for visual signal extrapolation or interpolation, comprising:
means for obtaining and storing at least a first and a second reference pictures;
means for conducting motion estimation on the first and second reference pictures to generate motion vectors indicative of movement of at least one of the first and second reference pictures in relation to the other;
means for generating an estimate picture by extrapolation or interpolation from the first or the second reference picture using the motion vectors; and
means for refining the estimate picture,
wherein statistic similarity estimation is used in either motion estimation or in refining the estimate picture, or a combination of both.
Other aspects, features and advantages of the invention will be more fully apparent from the ensuing disclosure and appended claims.
The present invention provides improved methods and systems for extrapolation and interpolation by using statistic similarity estimation.
Specifically, motion estimation is first performed on picture signals obtained from previously reconstructed pictures, i.e., reference pictures, to generate a set of motion vectors, which are then used to generate an estimate picture by either extrapolation or interpolation from one of the reference pictures, while statistic similarity estimation is used either for conducting the motion estimation or for refining the estimate picture, or a combination of both, as illustrated by
First, at least two previously decoded and reconstructed pictures, which are referred to hereinafter as the reference pictures, are obtained and stored by the decoder. These two reference pictures are referred to as N−1 and N−2 for extrapolation-based estimation (or as N−1 and N+1 for interpolation-based estimation).
For each block of pixels in the reference picture N−1, a search process is performed to find its best match in the other reference picture N−2 (or N+1). In order to find the best matching block B* in the reference picture N−2 (or N+1) for a specific block Bi in the reference picture N−1, the search process picks a same-size block of pixels, Bp, from the reference picture N−2 (or N+1) and computes a statistic similarity index SSI, which is indicative of the statistic similarity between Bi and Bp, and optionally a prediction error E, which is the difference in pixel values between Bi and Bp. The statistic similarity index SSI and the prediction error E can be combined to determine the best matching block B* in the reference picture N−2 (or N+1), as shown in
Once the best matching block B* in the reference picture N−2 (or N+1) is determined, a set of motion vectors can be generated for the block Bi in the reference picture N−1, which are indicative of the movement of block Bi in relation to B*. The motion vectors can be generated from various parameters associated with blocks Bi and B*. Preferably, but not necessarily, they are generated by taking the spatial differences (i.e., the horizontal and vertical coordinates) of blocks Bi and B*. The motion vectors are then manipulated (e.g., reversed, scaled, shifted, or otherwise altered) for extrapolating or interpolating a location in the picture to be decoded and reconstructed, which is referred to hereinafter as the estimate picture N, where the estimate of the block Bi resides. The pixel values of the estimate block are derived from the pixel values of blocks Bi and B*, for example, by averaging the pixel values of these blocks or by otherwise manipulating such pixel values.
The above-described processing steps are repeated for each block of pixels in the reference picture N−1, so that the estimate of each block of pixels in the reference picture N−1 is mapped, thereby forming a complete estimate picture N.
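The search for B* and the derivation of a motion vector for a single block Bi can be sketched as follows. The sketch rests on several assumptions that the text does not fix: the prediction error E is taken to be a sum of absolute differences, SSI and E are combined as `SSI + lam * E`, the motion vector is the position of Bi minus the position of B*, and extrapolation assumes constant motion between pictures. The names `estimate_motion`, `extrapolate_position`, `lam`, `size` and `search` are hypothetical.

```python
from math import sqrt


def ssi(p, q, alpha=1.0, beta=1.0):
    """Statistic similarity index of two equal-length pixel lists (lower = more similar)."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    var_p = sum((a - mean_p) ** 2 for a in p) / n
    var_q = sum((b - mean_q) ** 2 for b in q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q)) / n
    return alpha * sqrt((cov ** 2 - var_p * var_q) ** 2) + beta * (mean_p - mean_q) ** 2


def block(img, top, left, size):
    """Flatten the size x size block whose top-left corner is (top, left)."""
    return [img[top + r][left + c] for r in range(size) for c in range(size)]


def estimate_motion(ref_n1, ref_n2, top, left, size=2, search=2, lam=1.0):
    """Find B* on ref_n2 for the block Bi of ref_n1 at (top, left); return (dy, dx)."""
    h, w = len(ref_n2), len(ref_n2[0])
    bi = block(ref_n1, top, left, size)
    best_cost, best_pos = None, None
    for y in range(max(0, top - search), min(h - size, top + search) + 1):
        for x in range(max(0, left - search), min(w - size, left + search) + 1):
            bp = block(ref_n2, y, x, size)
            error = sum(abs(a - b) for a, b in zip(bi, bp))  # prediction error E (assumed SAD)
            cost = ssi(bi, bp) + lam * error                 # assumed way of combining SSI and E
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (y, x)
    # Motion vector: spatial difference between Bi and B*.
    return (top - best_pos[0], left - best_pos[1])


def extrapolate_position(top, left, mv):
    """Project the block one picture forward, assuming constant motion."""
    return (top + mv[0], left + mv[1])
```

For interpolation between N−1 and N+1, the vector would instead be scaled (for example halved) and applied toward the other reference picture; that variant is omitted here.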
Various statistic features for blocks P and Q are then calculated based on the information directly relating to blocks P and Q and optionally the neighboring information, which are then compared to determine the statistic similarity between blocks P and Q. For example, statistic features such as block sample mean, block sample variance, neighboring parameters, as well as the covariance between blocks P and Q can be used for determining the statistic similarity. Other well-known statistic features can also be used.
More specifically, assuming that the blocks P and Q are both characterized by a block size n×m, the pixel values in block P can be referred to as Pij, and the pixel values in block Q can be referred to as Qij, wherein i=1, 2, . . . , n, and j=1, 2, . . . , m. The block sample mean for P is defined as
and the block sample mean for Q is defined as
The block sample variance for P is defined as
and the block sample variance for Q is defined as
The covariance of blocks P and Q is estimated as
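The equations for these five quantities are not reproduced in this text. For orientation, the conventional definitions are written out here in the notation used above. The 1/(nm) normalizing constant is an assumption; the original equations may use a different constant (for example 1/(nm−1)).

```latex
\mu(P) = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} P_{ij}, \qquad
\mu(Q) = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} Q_{ij},

\mathrm{Var}(P) = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\left[P_{ij}-\mu(P)\right]^{2}, \qquad
\mathrm{Var}(Q) = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\left[Q_{ij}-\mu(Q)\right]^{2},

\mathrm{Cov}(P,Q) = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\left[P_{ij}-\mu(P)\right]\left[Q_{ij}-\mu(Q)\right].
```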
Moreover, neighboring parameters of blocks P and Q, such as the homogeneity of the neighborhoods surrounding blocks P and Q, can also be used for determining the statistic similarity between blocks P and Q. The neighborhood homogeneity can be determined based on, for example, the differences between the motion vectors of the block P or Q and the motion vectors of one or more existing neighboring blocks surrounding the block P or Q.
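One possible way to turn the motion-vector differences into a single neighborhood measure is sketched below. The text does not prescribe a formula, so the mean absolute difference used here, and the name `mv_deviation`, are assumptions; a smaller value indicates a more homogeneous neighborhood.

```python
def mv_deviation(mv, neighbor_mvs):
    """Mean absolute difference between a block's motion vector and its neighbors' vectors."""
    if not neighbor_mvs:
        return 0.0  # no existing neighbors: treated as homogeneous (assumption)
    total = sum(abs(mv[0] - ny) + abs(mv[1] - nx) for ny, nx in neighbor_mvs)
    return total / len(neighbor_mvs)
```

Such a value could be added, with its own weighting factor, to the other statistic features when two blocks are compared.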
The statistic features of blocks P and Q provide a good indication of how similar these two blocks are. Preferably, a statistic similarity index is computed based on the statistic features of blocks P and Q to provide a quantitative measurement of the statistic similarity between blocks P and Q, as shown in
In a preferred but not necessary embodiment of the present invention, the statistic similarity index SSI can be computed for blocks P and Q by using the following formula:
$$\mathrm{SSI}(P,Q)=\alpha\sqrt{\left[\mathrm{Cov}(P,Q)^{2}-\mathrm{Var}(P)\times\mathrm{Var}(Q)\right]^{2}}+\beta\left[\mu(P)-\mu(Q)\right]^{2},$$
wherein μ(P) is the block sample mean of the block P, μ(Q) is the block sample mean of the block Q, Var(P) is the block sample variance of the block P, Var(Q) is the block sample variance of the block Q, Cov(P, Q) is the covariance between blocks P and Q, and α and β are weighting factors, as mentioned hereinabove. The smaller the value of the statistic similarity index, the more similar the two blocks.
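The index can be computed directly from the formula, as in the following sketch. The function names, the representation of a block as a list of rows, the default weights of 1.0, and the 1/(nm) normalization of the variance and covariance are assumptions made for illustration.

```python
from math import sqrt


def flatten(block):
    """Turn a block given as a list of rows into a flat list of pixel values."""
    return [v for row in block for v in row]


def mean(values):
    return sum(values) / len(values)


def covariance(p, q):
    """Covariance of two equal-length pixel lists with 1/(nm) normalization (assumption)."""
    mean_p, mean_q = mean(p), mean(q)
    return sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q)) / len(p)


def ssi(block_p, block_q, alpha=1.0, beta=1.0):
    """Statistic similarity index of two same-size blocks (lower = more similar)."""
    p, q = flatten(block_p), flatten(block_q)
    var_p = covariance(p, p)  # the variance is the covariance of a block with itself
    var_q = covariance(q, q)
    cov = covariance(p, q)
    structure = sqrt((cov ** 2 - var_p * var_q) ** 2)
    brightness = (mean(p) - mean(q)) ** 2
    return alpha * structure + beta * brightness


p = [[1, 3], [5, 7]]
print(ssi(p, p))                 # identical blocks
print(ssi(p, [[2, 4], [6, 8]]))  # same pattern, every pixel brighter by 1
print(ssi(p, [[1, 7], [5, 3]]))  # same mean and variance, different arrangement
```

Because the square root of a square is an absolute value, and Cov(P,Q)² never exceeds Var(P)×Var(Q) under a consistent normalization, the first term equals α[Var(P)×Var(Q)−Cov(P,Q)²]. It vanishes whenever the pixel values of one block are a linear function of the other's, in which case only the mean term separates the blocks.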
More preferably, when statistic similarities of multiple pixel blocks are determined to generate multiple statistic similarity indexes, these indexes are normalized, so that each index value falls between 0 and 1.
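The text does not say how the indexes are normalized. A simple min-max rescaling, shown here purely as an assumed example under the hypothetical name `normalize_indexes`, maps the smallest index to 0 and the largest to 1.

```python
def normalize_indexes(indexes):
    """Rescale a list of similarity indexes to the range [0, 1] (min-max, an assumption)."""
    lowest, highest = min(indexes), max(indexes)
    if highest == lowest:
        return [0.0 for _ in indexes]  # all candidates are equally similar
    return [(value - lowest) / (highest - lowest) for value in indexes]
```

Normalizing in this way keeps the ordering of the candidates, so the best matching block is unchanged, while making the index easier to combine with other quantities such as the prediction error.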
As mentioned hereinabove, since the extrapolation and interpolation do not generate a one-to-one mapping to the estimate picture, there may be pixel positions in the estimate picture that do not get any mapping, i.e., leaving empty holes. On the other hand, there may also be pixel positions in the estimate picture that get multiple mappings, i.e., leaving superimposed spots. The quality of the estimate picture is adversely affected by the existence of the empty holes or superimposed spots.
This invention therefore provides solutions to these problems, by using statistical similarity estimation to refine the estimate picture, i.e., filling in the empty pixel positions and/or resolving the multiple mappings.
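Before refinement, the empty holes and superimposed spots have to be located. One straightforward way, sketched here under the assumption that each reference pixel carries a single (dy, dx) vector pointing to its position on the estimate picture, is to count how many reference pixels land on each position. The name `find_holes_and_overlaps` and the dictionary layout are hypothetical.

```python
def find_holes_and_overlaps(height, width, motion):
    """Locate unmapped and multiply mapped positions on the estimate picture.

    motion maps a reference pixel position (y, x) to its (dy, dx) vector.
    """
    counts = [[0] * width for _ in range(height)]
    for (y, x), (dy, dx) in motion.items():
        ty, tx = y + dy, x + dx
        if 0 <= ty < height and 0 <= tx < width:
            counts[ty][tx] += 1  # one more reference pixel maps here
    positions = [(y, x) for y in range(height) for x in range(width)]
    holes = [(y, x) for y, x in positions if counts[y][x] == 0]
    overlaps = [(y, x) for y, x in positions if counts[y][x] > 1]
    return holes, overlaps
```

The two returned lists correspond to the positions that need filling and the positions where one of several candidate pixels has to be selected.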
First, the statistical features of a neighboring block of pixels that surround the empty pixel position on the estimate picture N are calculated. The motion vectors of the pixels in the neighboring block can be used to determine an initial point on the reference picture N−1, from which the estimate picture is generated by extrapolation or interpolation. An appropriate search window surrounding the initial point is then identified. Within this search window, a searching process is performed to find the block that best matches the neighboring block on the estimate picture N. The best matching block is characterized by the highest statistical similarity, and optionally the lowest pixel value difference, with respect to the neighboring block on the estimate picture N. The specific pixel surrounded by this best matching block on the reference picture N−1, as shown in
Further,
First, the statistical features of a neighboring block of pixels that surround the specific pixel position on the estimate picture N are calculated. Next, the statistic features for multiple blocks of pixels that each surrounds one of the multiple pixels on the reference picture N−1 are calculated. Among these multiple blocks on the reference picture N−1, the one that best matches the neighboring block on the estimate picture N is identified. The best matching block, as mentioned hereinabove, is characterized by the highest statistical similarity, and optionally the lowest pixel value difference, with respect to the neighboring block on the estimate picture N. The specific pixel surrounded by this best matching block on the reference picture N−1 is then selected as the best matching pixel for the specific pixel position in the estimate picture N.
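The selection among multiple mapped pixels can be sketched as follows. The function names, the 3×3 neighborhood, and the decision to leave the contested centre pixel out of both blocks are assumptions; the sketch also assumes that the position and every candidate lie at least `radius` pixels inside the picture and that the neighbors on the estimate picture are already filled.

```python
from math import sqrt


def ssi(p, q, alpha=1.0, beta=1.0):
    """Statistic similarity index of two equal-length pixel lists (lower = more similar)."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    var_p = sum((a - mean_p) ** 2 for a in p) / n
    var_q = sum((b - mean_q) ** 2 for b in q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q)) / n
    return alpha * sqrt((cov ** 2 - var_p * var_q) ** 2) + beta * (mean_p - mean_q) ** 2


def resolve_multiple_mapping(estimate, reference, y, x, candidates, radius=1):
    """Pick, among reference positions mapping to (y, x), the one with the best matching block."""
    # Offsets of the neighboring block, excluding the contested centre (assumption).
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if (dy, dx) != (0, 0)]
    target = [estimate[y + dy][x + dx] for dy, dx in offsets]

    def score(position):
        cy, cx = position
        surrounding = [reference[cy + dy][cx + dx] for dy, dx in offsets]
        return ssi(target, surrounding)

    best = min(candidates, key=score)  # lowest index = highest statistic similarity
    return best, reference[best[0]][best[1]]
```

An optional pixel-value difference term could be added to `score` in the same way as for motion estimation.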
Various computational steps as described hereinabove can be readily carried out by a computer-based visual signal analyzer, which may comprise a general-purpose computer, a specific-purpose computer, a central processor unit (CPU), a microprocessor, or integrated circuitry that is arranged and constructed to collect and process visual signal data. Such a visual signal analyzer preferably comprises a visual signal extrapolation or interpolation protocol for computationally carrying out the above-described visual signal extrapolation or interpolation methods to generate and refine estimate pictures, according to the present invention. The visual signal extrapolation or interpolation protocol can be embodied in any suitable form, such as software operable in a general-purpose computer, a specific-purpose computer, or a central processor unit (CPU). Alternatively, the protocol may be hard-wired in circuitry of a microelectronic computational module, embodied as firmware, or available on-line as an operational applet at an Internet site for phase analysis.
Although the invention has been variously disclosed herein with reference to illustrative embodiments and features, it will be appreciated that the embodiments and features described hereinabove are not intended to limit the invention, and that other variations, modifications and alternative embodiments will readily suggest themselves to those of ordinary skill in the art. The invention therefore is to be broadly construed, as including such variations, modifications and alternative embodiments, within the spirit and scope of the ensuing claims.
This application is a continuation application of U.S. application Ser. No. 11/327,072, filed Jan. 6, 2006.
Number | Name | Date | Kind |
---|---|---|---|
5089887 | Robert et al. | Feb 1992 | A |
5535288 | Chen et al. | Jul 1996 | A |
5694487 | Lee | Dec 1997 | A |
5815602 | Ueda et al. | Sep 1998 | A |
5949919 | Chen | Sep 1999 | A |
6058143 | Golin | May 2000 | A |
6285715 | Ozcelik | Sep 2001 | B1 |
6375567 | Acres | Apr 2002 | B1 |
6449312 | Zhang et al. | Sep 2002 | B1 |
6760478 | Adiletta et al. | Jul 2004 | B1 |
20070268964 | Zhao | Nov 2007 | A1 |
20080075171 | Suzuki | Mar 2008 | A1 |
20090125912 | Haghighi | May 2009 | A1 |
20090168884 | Lu et al. | Jul 2009 | A1 |
20100020886 | Raveendran et al. | Jan 2010 | A1 |
Number | Date | Country |
---|---|---|
10134193 | May 1998 | JP |
2003309822 | Aug 2012 | JP |
Entry |
---|
Aaron et al., Toward Practical Wyner-Ziv Coding of Video, Proc. IEEE Int. Conf on Image Processing, pp. 869-872, Barcelona, Spain, Sept. (2003). |
Puri et al., PRISM: A New Robust Video Coding Architecture based on Distributed Compression Principles, Allerton Conference on Communication, Control and Computing, (2002). |
Yaman et al., A Low-Complexity Video Encoder with Decoder Motion Estimator, Proc. ICASSP, Montreal, Canada, (2004). |
Peng et al., Block-Based Temporal Error Concealment for Video Packet Using Motion Vector Extrapolation, International Conf on Communications, Circuits, Systems and West Sino Expo, pp. 10-14, Jun. 29-Jul. 1, 2002. |
Koto et al., Adaptive Bi-Predictive Video Coding Temporal Extrapolation, ICIP (2003). |
English-language abstract of Japanese Patent Application No. JP 10134193. |
English-language abstract of Japanese Patent Application No. JP 2003309822. |
English-language abstract of Japanese Patent Application No. JP11086003. |
Chao et al., “Motion-Compensation Spatio-Temporal Interpolation for Frame Rate Up-Conversion of Interlaced or Progressive Image Sequence”, Proceedings of the SPIE, Visual Communications and Image Processing, Sep. 28, 1994, pp. 682-693. |
Yaman et al., A Low-Complexity Video Encoder with Decoder Motion Estimator, Proc. ICASSP'04, US, IEEE, May 17, 2004, V3, pp. III157-III160. |
Number | Date | Country | |
---|---|---|---|
20110164682 A1 | Jul 2011 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11327072 | Jan 2006 | US |
Child | 13046264 | | US |