The present invention relates generally to processing and coding of image sequences, and more particularly to encoding and decoding images using motion compensation.
Imaging display technology has been subject to huge growth and technological advances. The ability of displays to provide higher and higher resolution images has resulted in a related increase in the size of the image data necessary to represent the displayed images. Moreover, electronic video displays are being implemented in increasingly smaller sizes. Personal phones and other devices provide users with high-quality view screens. Many of such devices provide access to various networks, such as the Internet, which allow for downloading of video content. Examples of important factors in such applications include processing power (in terms of larger processor size, increased power consumption and/or longer processing times) and bandwidth for video downloads. To compensate for bandwidth limitations, many applications related to the transmission of video content implement relatively complex video compression/coding techniques. Unfortunately, increasing the compression/coding complexity can lead to increases in the processing power necessary to code (i.e., encode or decode).
Many coding techniques use spatial and/or temporal compression techniques (downsampling) involving a transform that helps to decrease the amount of data used to represent the video image. One such transform is the 8×8 discrete cosine transform (DCT). Another type of transform is a wavelet transform. The output of the transform can be quantized to facilitate transmission and further encoding of the data. For example, entropy encoding can be used to further reduce the data size.
Certain types of video coding techniques use temporal redundancies in the video images to reduce the size of the encoded video. For example, various MPEG (and related) standards use predicted frames (P-frames), also called inter-frames, to exploit similarities between images. For many applications much (or even all) of the image may remain the same for successive images. Some standards use previously transmitted image data to reproduce other images, thereby allowing a particular frame to be coded with only the differences between the current frame and a previous frame. More complex algorithms allow for compensation for motion of objects within successive frames. In particular, the difference between temporal frames can be determined using motion vectors to track similarities between frames where the similarities may have shifted within the video image. Such motion vectors indicate a possible correlation between pixels or portions of two different images. Generally, the motion vectors are the result of movement of objects within successive images; however, motion vectors can represent similarities between different images other than those resulting from movement of objects. The motion vector represents the difference, if any, in the positions of the pixels/portions of the different images. Such motion vector data will be embedded in the P-frame for use by the decoder. A specific type of motion compensation uses bidirectional frames (B-frames). Such frames allow motion vectors to reference both previous and future frames.
Hybrid video coding techniques as well as motion-compensated subband coding schemes can be used to generate data representing image sequences and used for coding and communication applications. To achieve high compression efficiency, some hybrid video encoders operate in a closed-loop fashion such that the total distortion across the reconstructed pictures equals the total distortion in the corresponding intra picture and encoded displaced frame differences. In case of transmission errors, decoded reference frames differ from the optimized reference frames at the encoder and error propagation is observed. On the other hand, transform coding schemes operate in an open-loop fashion. Such open-loop schemes include high-rate transform coding schemes in which the analysis transform produces independent transform coefficients. With uniform quantization, these schemes are optimal when utilizing an orthogonal transform. Further, energy conservation holds for orthogonal transforms such that the total quantization distortion in the coefficient domain equals that in the image domain. In case of transmission errors, the error energy in the image domain equals that in the coefficient domain. Hence, the error energy is preserved in the image domain and is not amplified by the decoder, as is the case, e.g., for predictive decoders.
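The energy-conservation property of orthogonal transforms can be illustrated with a minimal sketch. It assumes a 2-point orthonormal (Haar-type) transform and a uniform quantizer; the function names and the step size are illustrative and are not taken from this description.

```python
import math

def haar_pair(a, b):
    """Orthonormal 2-point transform: returns (low, high)."""
    s = math.sqrt(2.0)
    return (a + b) / s, (a - b) / s

def inverse_haar_pair(low, high):
    """Inverse transform; the matrix is symmetric and orthogonal, so it is its own inverse."""
    s = math.sqrt(2.0)
    return (low + high) / s, (low - high) / s

def quantize(c, step):
    """Uniform quantizer reconstructing at multiples of `step` (illustrative)."""
    return step * round(c / step)

def distortions(a, b, step):
    """Squared error measured in the coefficient domain and in the image domain."""
    low, high = haar_pair(a, b)
    ql, qh = quantize(low, step), quantize(high, step)
    ra, rb = inverse_haar_pair(ql, qh)
    d_coeff = (low - ql) ** 2 + (high - qh) ** 2
    d_image = (a - ra) ** 2 + (b - rb) ** 2
    return d_coeff, d_image
```

For any pair of inputs the two returned distortions agree up to floating-point rounding, which is the sense in which the error energy is preserved rather than amplified.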
During the last decade, there have been attempts to incorporate motion compensation into temporal subband coding schemes by approaching problems arising from multi-connected pixels. For example, some methods choose a reversible lifting implementation for the temporal filter and incorporate motion compensation into the lifting steps. In particular, the motion-compensated lifted Haar wavelet maintains orthogonality only for single-connecting motion fields; however, for complex motion fields with many multi-connected and unconnected pixels, the reversible motion-compensated lifted Haar wavelet loses the property of orthogonality. A motion-compensated orthogonal transform that strictly maintains orthogonality for any motion field would be advantageous.
Aspects of the present invention are directed to image processing and coding applications that address challenges including those discussed above, and that are applicable to a variety of video processing and coding applications, devices, systems and methods. These and other aspects of the present invention are exemplified in a number of implementations and applications, some of which are shown in the figures and characterized in the claims section that follows.
According to one embodiment of the present invention, a method is implemented for representing a sequence of images with the help of a motion-compensated orthogonal basis. A plurality of orthogonal transforms are implemented on a set of N images, where N is greater than one. The images are linked by motion fields that include sets of respective portions of the images. Orthogonality is maintained also for the important case where at least one portion of any of the N images—or any part of this portion—is used more than once to motion-compensate other portions of the N images—or parts thereof.
According to one embodiment of the present invention, a method is implemented for coding a sequence of images. A method includes the step of implementing a plurality of orthogonal transforms on a set of N images, where N is greater than one, the images linked by a motion field that includes sets of respective portions of the images, the motion field defining a first pixel from a portion of the set of N images that is not used to motion-compensate any other portions of the N images and using a second pixel from the set of N images to motion-compensate other portions of the N images at least once.
One embodiment of the present invention relates to a device for coding a sequence of images. The device includes a processing arrangement for implementing a plurality of orthogonal transforms on a set of N images, where N is greater than one. The images are linked by one or more motion fields that include sets of respective portions of the images. The motion field has a first portion of the set of N images that is unconnected by a motion vector to any other portions of the N images and a second portion of the set of N images that is connected by a motion vector to other portions of the N images at least once.
An embodiment of the present invention relates to a method for coding a sequence of images. The method includes transforming at least two images of the sequence of images using a plurality of sequential transforms. The sequential transforms modify respective portions of each of the at least two images. Each transform is orthogonal and also corresponds to a respective motion vector linking the portions.
Another embodiment of the present invention is directed to a method for coding a temporal sequence of images with multiple fields of view for respective temporal timings. For each field of view of a particular temporal timing, two or more images are transformed using a sequence of transforms that modify respective portions of each of the two or more images. Each transform of the sequence of transforms is orthogonal and also corresponds to a respective motion vector linking the portions.
The above summary is not intended to describe each illustrated embodiment or every implementation of the present invention.
The invention may be more completely understood in consideration of the following detailed description of various embodiments of the invention in connection with the accompanying drawings, in which:
While the invention is amenable to various modifications and alternative forms, specifics thereof have been shown by way of example in the drawings and will be described in detail. It should be understood, however, that the intention is not to limit the invention to the particular embodiments described. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.
The present invention is directed to image coding and related approaches, their uses and systems for the same. These and other aspects of the present invention are exemplified in a number of illustrated implementations and applications, some of which are shown and characterized in the following description and related figures, and further in the claims section that follows.
Aspects of the present invention relate to a coding technique that is particularly suited for use with a variety of (complex) motion fields. The transform of an image is implemented as a series of transforms such that the overall transform can be orthogonal for any of a variety of different motion fields. In a specific instance, the overall transform is implemented by factoring into a sequence of incremental transforms. The incremental transforms can be applied to N images, where N is greater than one, to reduce the energy in any of the N images by relying upon correspondence to one or more other images as defined by a motion field. The transform is applied to each portion of the image. By factoring the transform into a series of sequential transforms, the sequential transforms can be selected so that the overall transform is orthogonal. In a particular implementation, each sequential transform can be selected so as to be orthogonal, thereby ensuring that the overall transform is also orthogonal. As the sequential transforms are relatively simple, the process of ensuring the orthogonality of these transforms is practical.
The orthogonal nature of the transforms can be particularly useful for a number of reasons, some of which are described herein. Notwithstanding, other aspects of the invention have utility. For example, factoring the transform into a series of sequential transforms can be used in combination with other aspects, such as multiple motion hypotheses, optimal type selection for each incremental transform, energy conservation, and energy concentration constraints. As another example (not necessarily preferred), certain ones of the incremental transforms need not be strictly orthogonal. Accordingly, the invention is not limited to only those aspects directly resulting from orthogonal transforms.
According to an example embodiment of the present invention, a video coding scheme is implemented that allows for the use of incremental orthogonal transforms that each maintain their orthogonality for motion fields with zero, one or multiple motion compensations/correspondences. Particular implementations are especially useful for adaptation of the incremental transforms for use with a variety of different possible motion models for a particular pixel or portion of an image.
According to a specific implementation, the incremental transforms can be selected to concentrate energy into the temporal low-band while removing energy from the temporal high-band. In one instance, scale factors are selected to help accomplish such energy concentration. The transforms decompose the image into a low-band image and one or more high-band images. The low-band image contains coarse portions of the signal. The high-band image contains finer details of the signal.
In one implementation, each incremental transform can be evaluated for two or more (hypothetical) motion models. An appropriate motion model can then be selected for each incremental transform.
Various implementations allow for implementation of a decoder for generating image data using incremental (inverse) transforms. In a specific instance, the decoder is able to determine the inverse-motion compensated transform without requiring additional data to indicate the motion-compensation transform selected during an encoding process (e.g., data in addition to the motion information used at the encoder). This can be particularly useful to allow flexibility for the encoder/decoder pair regarding the selection of optimal motion-compensation transforms without requiring transmission of additional data.
A variety of different algorithms, devices, methods and systems can be implemented in accordance with the present invention. The various methods and algorithms can be implemented using a variety of different processing arrangements including, but not limited to, one or more general purpose processors configured with specialized software, special purpose processors, hardware designed to implement one or more functions, programmable logic arrays, digital signal processors and combinations thereof.
The specific algorithms and/or methods implemented include a number of variations from the specific embodiments disclosed herein. For example, variations upon the implementation specifically described herein can be implemented so as to provide a sequence of orthogonal transforms that can be used in connection with complex motion fields. To the extent that specific transforms, motion-compensation models, and/or scale factors are disclosed herein, such disclosures do not preclude variations thereof. For instance, specific examples are discussed in connection with implementations relating to concentration of energy in a low-band image, with respect to one or more high-band images. It should be apparent that there are a multitude of techniques that will improve energy concentration within the framework of the orthogonal transform methodology discussed herein.
Consistent with one implementation of the present invention, the sequential orthogonal transforms can be formed from a number of transforms that are themselves not orthogonal. For example, one implementation involves concatenation of complementary incremental transforms, each potentially non-orthogonal, to form transforms that are orthogonal. This concatenation of two or more transforms to form orthogonal transforms still results in a set of orthogonal transforms.
Decoder 130 receives coded image data. Processing arrangement 132 is used to implement decoding operations on the received coded image data. In particular, the decoding operations reverse the coding operations performed by encoder 120. For example, a sequence of inverse incremental transforms $T_k^T, \ldots, T_1^T$ (the transposes of the incremental transforms used for encoding) is performed on coded image data Y1-YN to produce image data X1-XN.
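The reversal of a sequence of orthogonal incremental transforms can be sketched as follows. The sketch assumes that each incremental transform is a plane rotation acting on two entries of a stacked pixel vector; the step list, the angles and the function names are hypothetical and serve only to show that applying the transposes in reverse order recovers the input.

```python
import math

def rotate_pair(vec, i, j, angle):
    """Apply a plane rotation to entries i and j of vec (in place)."""
    c, s = math.cos(angle), math.sin(angle)
    vi, vj = vec[i], vec[j]
    vec[i] = c * vi + s * vj
    vec[j] = -s * vi + c * vj

def forward(x, steps):
    """Apply T_1, then T_2, ..., then T_k; each step is a tuple (i, j, angle)."""
    y = list(x)
    for i, j, angle in steps:
        rotate_pair(y, i, j, angle)
    return y

def inverse(y, steps):
    """Undo the steps last-to-first, using the transpose of each rotation."""
    x = list(y)
    for i, j, angle in reversed(steps):
        rotate_pair(x, i, j, -angle)  # transpose of a rotation = rotation by -angle
    return x
```

Because every step is orthogonal, no matrix inversion is needed at the decoder; the same list of steps drives both directions.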
The processing arrangement can be implemented using a number of different processing circuits including, but not limited to, a general purpose processor configured with specialized software, a special purpose processor, a digital signal processor, a programmable logic device, discrete hardware/logic components and combinations thereof.
Encoder 120 and decoder 130 can be used in a wide range of image applications. One area of use involves video transmissions (e.g., over a network) between two devices. The transmitting device can use an encoded version of the image data to reduce the bandwidth necessary for transmitting an image of a given quality. The receiving device can decode the image data for use (e.g., display) of the image data. A few non-limiting examples of video transmission include Internet broadcasts, downloadable videos, mobile television, satellite television, cable television, streaming video, podcasts, digital video recorders (DVRs) and similar applications. Another area of use involves video capture and subsequent storage or transmission of the captured video. A few non-limiting examples include video cameras encoding images for storage, live web casts, (live or recorded) television events or security applications. Various other implementations are possible, and the above list is merely representative of a few such implementations.
At step 103 an appropriate incremental transform is selected. The transform can be selected as a function of the motion of the current portion being operated upon.
A decorrelation factor is determined at step 104. This factor can be determined such that the combination of portions by the incremental transform serves to remove energy from the respective portion. In one instance, the transform can be set so as to remove all energy from the respective portion; however, the invention need not be so limited. The transform is implemented at step 106.
At step 108 a scale factor is determined and/or recorded for one or more of the portions of the images. As discussed herein, the scale factor is particularly useful when the portion is potentially modified by more than one incremental transform. Thus, the particular type of coding and linking between portions can result in the use of scale factors for more, less or even none of the portions. In a specific implementation, the scale factors are implemented for low-band portions that the energy is concentrated into as these portions may receive energy from other high-band portions in subsequent transforms.
At step 110 if additional transforms are left, the process proceeds to step 112 to select the next transform, otherwise the process can exit at step 114. The selection of the next transform 112 can be a simple increment to the next portion/pixel of the image or can be implemented using a more complex algorithm, after which the process returns to step 103. Such an algorithm can take into consideration that the energy concentration can be different depending upon the order that the transforms are implemented. Thus, various factors can be taken into account, including but not limited to, the number of motion vectors linked to a particular portion or the energy in a particular portion.
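The loop of steps 103 through 114 can be sketched for the simplest case of one reference pixel per processed pixel. Several things in this sketch are assumptions rather than statements of the described method: the 2×2 rotation form, the decorrelation factor a = v2/v1 (which follows from requiring a zero high-band for identical inputs under that form), the use of a counter per low-band pixel to track the scale factor, the simple left-to-right ordering, and the names `mc_orthogonal_1hyp` and `motion`.

```python
import math

def mc_orthogonal_1hyp(x1, x2, motion):
    """Apply one 2x2 rotation per pixel of x2.

    motion[j] is the index i of the x1 pixel linked to x2[j], or None when
    no motion compensation is used for that pixel (identity transform).
    Returns (low_band, high_band, counters).
    """
    low = [float(v) for v in x1]
    high = [float(v) for v in x2]
    n = [0] * len(low)                 # how often each x1 pixel served as reference
    for j, i in enumerate(motion):     # next transform: simple increment
        if i is None:
            continue                   # pixel left untouched
        v1 = math.sqrt(n[i] + 1)       # scale factor of the reference pixel
        v2 = 1.0                       # x2 pixel is used once (first level)
        a = v2 / v1                    # decorrelation factor (assumed form)
        s = math.sqrt(1.0 + a * a)
        p, q = low[i], high[j]
        low[i] = (p + a * q) / s       # energy moves into the low band
        high[j] = (-a * p + q) / s     # zero when both pixels share one intensity
        n[i] += 1                      # record the updated scale
    return low, high, n
```

With two constant images of equal intensity, every motion-compensated high-band pixel comes out as zero even when one reference pixel is used several times, and the total energy of the output equals that of the input.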
The following discussions provide a number of different specific example implementations. From these examples, it should be apparent that variations of the invention can be made including, but not limited to, a wide variety of image processing and coding applications other than those specifically mentioned.
According to a specific implementation of the present invention, a coding scheme is implemented for a sequence of video images. Within the coding scheme, x1 and x2 are two vectors representing consecutive pictures of an image sequence. The transform T maps these vectors according to
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \qquad (1)$$
into two vectors y1 and y2 which represent the temporal low- and high-band, respectively. The transform T is factored into a sequence of k incremental transforms Tκ such that
$$T = T_k T_{k-1} \cdots T_\kappa \cdots T_2 T_1, \qquad (2)$$
where each incremental transform Tκ is orthogonal by itself, i.e., $T_\kappa T_\kappa^T = I$ holds for all κ = 1, 2, ..., k, where I denotes the identity matrix. This guarantees that the transform T is also orthogonal. Let $x_1^{(\kappa)}$ and $x_2^{(\kappa)}$ be two vectors representing consecutive pictures of an image sequence if κ=1, or two output vectors of the incremental transform Tκ−1 if κ>1. The incremental transform Tκ maps these vectors according to
$$\begin{pmatrix} x_1^{(\kappa+1)} \\ x_2^{(\kappa+1)} \end{pmatrix} = T_\kappa \begin{pmatrix} x_1^{(\kappa)} \\ x_2^{(\kappa)} \end{pmatrix}$$
into two vectors $x_1^{(\kappa+1)}$ and $x_2^{(\kappa+1)}$ which will be further transformed into the temporal low- and high-band, respectively.
To picture the sequence of transformed image pairs $(x_1^{(\kappa)}, x_2^{(\kappa)})$, it can be imagined that the pixels of the image x2 are processed from top-left to bottom-right in k steps where each step κ is represented by the incremental transform Tκ.
A specific implementation can be used for 1-hypothesis motion compensation. In a 1-hypothesis motion compensation implementation each pixel in the image x2 is linked to only one pixel in the image x1.
to provide orthogonality. For a 2×2 matrix, one scalar decorrelation factor ‘a’ is sufficient to capture all possible orthogonal transforms. As shown by the form
where a is a positive real value to remove the energy in the image x2 and to concentrate the energy in the image x1. Tκ performs only a linear combination with pixel pairs that are connected by the associated motion vector. All other pixels are left untouched. This is reflected with the following matrix notation
where the diagonal elements equal to 1 represent the untouched pixels and where the elements hμv represent the pixels subject to linear operations.
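The structure of such an incremental transform can be sketched by embedding a 2×2 orthogonal block into an identity matrix. The specific form of the block, H = (1/√(1+a²))·[[1, a], [−a, 1]], is an assumption chosen because it is orthogonal for every real a; the function names and pixel indices are illustrative.

```python
import math

def incremental_transform(size, i, j, a):
    """Return a size x size matrix as nested lists.

    Rows and columns i and j carry the 2x2 block H; every other pixel keeps
    a 1 on the diagonal. The form of H used here is an assumption.
    """
    s = math.sqrt(1.0 + a * a)
    t = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    t[i][i], t[i][j] = 1.0 / s, a / s
    t[j][i], t[j][j] = -a / s, 1.0 / s
    return t

def times_transpose(t):
    """Compute T T^T."""
    n = len(t)
    return [[sum(t[r][k] * t[c][k] for k in range(n)) for c in range(n)]
            for r in range(n)]

def is_identity(m, tol=1e-12):
    """True when m equals the identity matrix within tol."""
    return all(abs(m[r][c] - (1.0 if r == c else 0.0)) <= tol
               for r in range(len(m)) for c in range(len(m)))
```

Checking `is_identity(times_transpose(t))` for a matrix built this way confirms that leaving all other pixels untouched does not disturb orthogonality.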
For 2-hypothesis motion compensation, each pixel in the image x2 is linked to two pixels in the image x1.
For 4-hypothesis motion compensation, each pixel in the image x2 is linked to four pixels in the image x1. Here, 25 scalar weights are arranged into the 5×5 orthogonal matrix H. H is constructed by a composition of rotations about 7 axes. In one implementation the following composition was used:
$$H = H_a(\phi_7)\, H_b(\phi_6)\, H_c(\phi_5)\, H_d(\phi_4)\, H_e(\phi_3)\, H_b(\phi_2)\, H_a(\phi_1) \qquad (8)$$
with the following individual rotations:
Further multi-hypothesis motion compensations can be constructed in a similar fashion. In a preferred embodiment, the orthogonal matrix H is constructed by a composition of appropriate Euler rotations.
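That a composition of elementary rotations yields an orthogonal matrix of the required size can be checked with a short sketch. The coordinate planes and angles used in the example are hypothetical placeholders and are not the individual rotations of the composition referred to above.

```python
import math

def plane_rotation(n, p, q, angle):
    """n x n identity with a rotation by `angle` in the (p, q) coordinate plane."""
    m = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    c, s = math.cos(angle), math.sin(angle)
    m[p][p], m[p][q] = c, s
    m[q][p], m[q][q] = -s, c
    return m

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def compose(n, rotations):
    """Multiply plane rotations so that the first listed one is applied first."""
    h = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for p, q, angle in rotations:
        h = matmul(plane_rotation(n, p, q, angle), h)
    return h

def orthogonality_error(h):
    """Largest absolute deviation of H H^T from the identity."""
    n = len(h)
    return max(abs(sum(h[r][k] * h[c][k] for k in range(n)) - (1.0 if r == c else 0.0))
               for r in range(n) for c in range(n))
```

For example, seven rotations composed with `compose(5, ...)` give a 5×5 matrix whose orthogonality error is at the level of floating-point rounding, whatever angles are chosen.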
Another aspect of the present invention is directed to implementation of an energy concentration constraint. The decorrelation factors and angles of the incremental transforms are chosen such that the energy in the temporal low band is concentrated.
Consider the pixel pair x1,i and x2,j to be processed by the incremental transform Tκ in
The conditions of energy conservation and energy concentration are satisfied if
Aspects of the present invention are also useful for implementation in connection with multi-hypothesis motion compensation involving two or more potential motion compensation models. In
Energy conservation requires that $u_1^2 + u_2^2 = v_1^2 + v_2^2 + v_3^2$. The Euler angle ø in H1 is chosen such that the two hypotheses x′1,i and x′1,j are weighted equally after being attenuated by their scale factors v1 and v2.
The Euler angle θ in H2 is chosen such that it meets the zero-energy constraint for the high-band in (13).
Finally, the Euler angle ψ in H3 is chosen such that the pixels x1,i and x1,j, after the incremental transform Tκ, have scalar weights u1 and u2, respectively.
This ratio is open for selection. The angle ø was selected such that the i-th pixel x1,i and the j-th pixel x1,j have equal contribution after rescaling with v1 and v2. Consequently, the scale factors u1 and u2 were selected such that their energy increases equally.
For 4-hypothesis motion compensation, 7 angles were chosen to minimize the energy of pixels in the image x2. To determine the angles for the pixel x2,l, it is assumed that the pixel is connected to the pixels x1,i, x1,j, x1,μ, and x1,v such that x2,l=x1,i=x1,j=x1,μ=x1,v. Let v1, v2, v3, and v4 be the scale factors for the four pixels in x1 and let v5 be that of the pixel x2,l. Let u1, u2, u3, and u4 be the scale factors for the four pixels in x1 after they have been processed by Tκ. Now, the four pixels in x1 as well as the pixel x′2,l are processed by Tκ as follows:
Energy conservation requires that $u_1^2 + u_2^2 + u_3^2 + u_4^2 = v_1^2 + v_2^2 + v_3^2 + v_4^2 + v_5^2$. The angle ø1 is chosen such that the two hypotheses x′1,i and x′1,j are weighted equally after being attenuated by their scale factors v1 and v2. The same argument holds for angle ø2. The angle ø3 is chosen such that the combination of the two previous hypothesis pairs is also weighted equally after being attenuated by their combined scale factors. With this, weights are generally achieved as powers of two, and in this particular case, a weight of ¼ is achieved for each hypothesis. The angle ø4 is chosen such that it meets the zero-energy constraint for the high-band in (18). Finally, the angles ø5, ø6, and ø7 are chosen such that the pixels in x1, after the incremental transform Tκ, have scalar weights uρ, ρ = 1, 2, 3, 4.
The ratios among uρ are free to be selected. The angles were selected such that each hypothesis has equal contribution. Consequently, the scale factors uρ were selected such that their energy increases equally.
Aspects of the present invention are directed to half-pel accurate motion compensation. Using three types of incremental transforms, half-pel accurate motion compensation can be achieved where half-pel intensity values are obtained by averaging neighboring integer-pel positions.
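The averaging of neighboring integer-pel positions can be sketched as follows. The sketch assumes that positions are given in half-pel units and that an integer position, a position halfway along one axis, and a position halfway along both axes use one, two and four integer-pel neighbors, respectively; that this count corresponds to the three types of incremental transforms is a reading of the text, not a statement from it. The function names are hypothetical and no boundary handling is included.

```python
def half_pel_hypotheses(row2, col2):
    """Integer-pel positions that are averaged for a half-pel position.

    (row2, col2) are coordinates in half-pel units: even values address an
    integer row or column, odd values fall halfway between two of them.
    """
    rows = [row2 // 2] if row2 % 2 == 0 else [row2 // 2, row2 // 2 + 1]
    cols = [col2 // 2] if col2 % 2 == 0 else [col2 // 2, col2 // 2 + 1]
    return [(r, c) for r in rows for c in cols]

def half_pel_value(image, row2, col2):
    """Average of the neighboring integer-pel intensities (positions must be in bounds)."""
    positions = half_pel_hypotheses(row2, col2)
    return sum(image[r][c] for r, c in positions) / len(positions)
```

Each returned position would then act as one motion hypothesis with equal weight, so the number of hypotheses is 1, 2 or 4 depending on the sub-pel position.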
The following example implementation describes an approach that can be useful for bidirectional motion compensation. The presented bi-directionally motion-compensated orthogonal transform is able to consider up to two motion fields per frame, although more motion fields per frame are possible. The transform is factored into a sequence of incremental transforms which are strictly orthogonal. The incremental transforms maintain scale counters. The decorrelation factors of each incremental transform are determined such that an energy-concentration constraint is met for bidirectional motion compensation.
To factor the transform into incremental transforms, the construction of the incremental transform and the incorporation of the energy-concentration constraint are outlined hereafter. Let x1, x2, and x3 be three vectors representing consecutive pictures of an image sequence. The transform T maps these vectors according to
$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = T \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \qquad (23)$$
into three vectors y1, y2, and y3, which represent the first temporal low-band, the high-band, and the second temporal low-band, respectively. The transform T can be factored into a sequence of k incremental transforms Tκ such that
$$T = T_k T_{k-1} \cdots T_\kappa \cdots T_2 T_1, \qquad (24)$$
where each incremental transform Tκ is orthogonal by itself, i.e., $T_\kappa T_\kappa^T = I$ holds for all κ = 1, 2, ..., k. This guarantees that the transform T is also orthogonal.
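The step from the orthogonality of each factor to the orthogonality of the product can be written out explicitly. The derivation below uses only the factorization of T and the property $T_\kappa T_\kappa^T = I$ of each factor; it collapses the product from the inside out.

```latex
\begin{aligned}
T\,T^{T} &= (T_k \cdots T_2 T_1)\,(T_k \cdots T_2 T_1)^{T} \\
         &= T_k \cdots T_2 \,\bigl(T_1 T_1^{T}\bigr)\, T_2^{T} \cdots T_k^{T} \\
         &= T_k \cdots T_3 \,\bigl(T_2 T_2^{T}\bigr)\, T_3^{T} \cdots T_k^{T} \\
         &= \cdots = T_k T_k^{T} = I .
\end{aligned}
```

The argument does not depend on the order of the factors or on which pixels each factor touches, which is why orthogonality holds for any motion field.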
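Let $x_1^{(\kappa)}$, $x_2^{(\kappa)}$ and $x_3^{(\kappa)}$ be three vectors representing consecutive pictures of an image sequence if κ=1, or three output vectors of the incremental transform Tκ−1 if κ>1. The incremental transform Tκ maps these vectors according to
$$\begin{pmatrix} x_1^{(\kappa+1)} \\ x_2^{(\kappa+1)} \\ x_3^{(\kappa+1)} \end{pmatrix} = T_\kappa \begin{pmatrix} x_1^{(\kappa)} \\ x_2^{(\kappa)} \\ x_3^{(\kappa)} \end{pmatrix}$$
into three vectors $x_1^{(\kappa+1)}$, $x_2^{(\kappa+1)}$ and $x_3^{(\kappa+1)}$ which will be further transformed into the first temporal low-band, high-band, and second temporal low-band, respectively.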
Thus, the incremental transform Tκ touches only pixels that are linked by the same motion vector pair $(\vec{d}_k, \vec{d}_k^{\,*})$. Of these, Tκ performs only a linear combination with three pixels that are connected by this motion vector pair. All other pixels are left untouched. This is reflected in the following matrix notation:
The diagonal elements that equal 1 represent the untouched pixels and the elements hμv represent the pixels subject to linear operations. All other entries are zero. The scalar weights hμv are then arranged into the 3×3 matrix H. The incremental transform Tκ is orthogonal if H is also orthogonal. An orthogonal H is constructed with the help of Euler's rotation theorem, which states that any 3-D rotation can be given as a composition of rotations about three axes, i.e., H=H3H2H1, where Hr denotes a rotation about one axis. The following composition was chosen
with the Euler angles ψ, θ, and ø. The Euler angles will be determined in relation to the energy concentration constraint as discussed hereafter. Note that, to carry out the full transform T, each pixel in x2 is touched only once whereas the pixels in x1 and x3 may be touched multiple times or never. Further, the order in which the incremental transforms Tκ are applied does not affect the orthogonality of T, but it may affect the energy concentration of the transform T.
The three Euler angles for each pixel touched by the incremental transform have to be chosen such that the energy in image x2 is minimized. In an example, the pixel triplet x1,i, x2,j, and x3,l is to be processed by the incremental transform Tκ. To determine the Euler angles for the pixel x2,j, it is assumed that the pixel x2,j is connected to the pixels x1,i and x3,l such that x2,j=x1,i=x3,l. Consequently, the resulting high-band pixel x″2,j shall be zero. Note that the pixels x1,i and x3,l may have been processed previously by Tτ, where τ<κ. Therefore, v1 and v3 are implemented as the scale factors for the pixels x1,i and x3,l, respectively, such that x′1,i=v1x1,i and x′3,l=v3x3,l. The pixel x2,j is used only once during the transform process T and no scale factor needs to be considered; however, in general, when considering subsequent dyadic decompositions with T, scale factors are passed on to higher decomposition levels and, consequently, they should be considered, i.e., x′2,j=v2x2,j. For the first decomposition level, v2=1. Let u1 and u3 be the scale factors for the pixels x1,i and x3,l, respectively, after they have been processed by Tκ. Now, the pixels x′1,i, x′2,j, and x′3,l are processed by Tκ as follows:
Energy conservation requires that
$$u_1^2 + u_3^2 = v_1^2 + v_2^2 + v_3^2. \qquad (29)$$
The Euler angle ø in H1 is chosen such that the two hypotheses x′1,i and x′3,l are weighted equally after being attenuated by their scale factors v1 and v3.
The Euler angle θ in H2 is chosen such that it meets the zero-energy constraint for the high-band in (28).
Finally, the Euler angle ψ in H3 is chosen such that the pixels x′1,i and x′3,l, after the incremental transform Tκ, have scalar weights u1 and u3, respectively.
tan(ψ)=u1/u3 (32)
Note that this ratio can be chosen freely. The Euler angle ø was chosen such that the previous frame and the future frame have equal contribution after rescaling with v1 and v3. Consequently, the scale factors u1 and u3 were chosen such that they increase equally.
Scale counters can be utilized to keep track of the scale factors. Scale counters count how often a pixel is used as reference for motion compensation. Before any transform is applied, the scale counter for each pixel is n=0 and the scale factor is v=1. For arbitrary scale counters n and m, the scale factors are
$$v = \sqrt{n+1} \quad \text{and} \quad u = \sqrt{m+1} \qquad (34)$$
After applying the incremental transform, the scale counters are updated for the modified pixels. For the aforementioned 1-hypothesis uni-directionally motion compensated orthogonal transform, the updated scale counter for low-band pixels is given by m=n1+n2+1, where n1 and n2 are the scale counters of the utilized input pixel pairs. For the bi-directionally motion-compensated orthogonal transform, the updated scale counters for low-band pixels result from (33) as follows:
As an example, consider the transform in the first decomposition level where n2=0. The 1-hypothesis uni-directionally motion-compensated transform increases the scale counter by 1 for each used reference pixel, whereas the bi-directionally motion-compensated transform increases the counter by 0.5 for each of the two used reference pixels.
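The bookkeeping with scale counters can be sketched as follows. The unidirectional update m = n1 + n2 + 1 and the relation between counters and scale factors in (34) are taken from the text; the general bidirectional update, in which each reference pixel gains half of v2² = n2 + 1, is an assumption derived here from energy conservation (29) together with the equal increase of u1 and u3, and it reproduces the increase of 0.5 stated for n2 = 0. The function names are illustrative.

```python
import math

def scale_factor(counter):
    """Scale factor for a given scale counter, per (34)."""
    return math.sqrt(counter + 1)

def update_unidirectional(n1, n2=0):
    """Updated low-band counter for the 1-hypothesis case: m = n1 + n2 + 1."""
    return n1 + n2 + 1

def update_bidirectional(n1, n3, n2=0):
    """Updated low-band counters when the energy of the x2 pixel is split evenly.

    Assumed form: each of the two reference pixels gains (n2 + 1) / 2.
    """
    gain = (n2 + 1) / 2
    return n1 + gain, n3 + gain
```

In both cases the squared scale factors after the update sum to the squared scale factors before it, so the counters carry exactly the information needed to satisfy energy conservation.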
Further embodiments of the present invention combine multi-hypothesis motion as explained above with bi-directionally motion-compensated orthogonal transforms.
An embodiment of the present invention relates to use of a dyadic transform for groups of pictures. One of the aforementioned orthogonal transforms is defined for three input pictures but generates two temporal low-bands. In combination with other orthogonal transforms discussed herein, an orthogonal transform can be defined with only one temporal low-band for groups of pictures whose number of pictures is larger than two and a power of two.
Further embodiments of the present invention relate to motion-compensated orthogonal transforms for N images, where N is greater than one and not limited to a power of two. The incremental transforms which process all N images at a time generate N-1 high-bands while concentrating the energy of N images into one low-band.
An embodiment of the present invention relates to motion compensated orthogonal transforms for mono-view and multi-view video coding. A bi-directionally motion-compensated orthogonal transform is used as a starting point as described above. Referring back to
Further, if any type of motion compensation is not suitable for a pixel or block in x2, the corresponding incremental transform in step κ is set to
$$T_\kappa^{(0)} = I, \tag{37}$$
where I denotes the identity matrix. This is called the intra mode for a pixel, block, or portion in the picture x2.
Thus, the type of incremental transform can be chosen freely in each step κ to match the motion of the affected pixels in x2 without destroying the property of orthonormality. In each step κ, the scalar weights hμν are arranged into the matrix Hκ. The incremental transform Tκ is orthogonal if Hκ is orthogonal. Unidirectional motion compensation is accomplished with a 2×2 matrix Hκ(1), and bidirectional motion compensation with a 3×3 matrix Hκ(2). In general, p-hypothesis motion requires a (p+1)×(p+1) matrix Hκ(p). The coefficients of Hκ in each step κ are determined as taught hereafter in connection with the energy concentration constraint. Note that, to carry out the full transform T, each pixel in x2 is touched only once, whereas the pixels in x1 and x3 may be touched multiple times or never. Further, the order in which the incremental transforms Tκ are applied does not affect the orthogonality of T, but it may affect the energy concentration of the transform T.
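The claim that Tκ is orthogonal whenever Hκ is orthogonal can be checked numerically with a small sketch. Everything here is illustrative: the helper names, the four-pixel signal size, and the rotation angles are assumptions, and each incremental transform is modeled as an identity matrix with a small orthogonal block written into the rows and columns of the pixels it touches.

```python
import math


def identity(size):
    """Identity matrix; also the intra-mode incremental transform."""
    return [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]


def embed(h, indices, size):
    """Place the small matrix h on the given pixel indices of an identity matrix."""
    t = identity(size)
    for a, row in enumerate(indices):
        for b, col in enumerate(indices):
            t[row][col] = h[a][b]
    return t


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_orthogonal(m, tol=1e-12):
    """True if transpose(m) * m equals the identity within tol."""
    product = matmul([list(col) for col in zip(*m)], m)
    return all(abs(product[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(len(m)) for j in range(len(m)))


def rotation(theta):
    """A 2x2 orthogonal matrix, standing in for a unidirectional block."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


if __name__ == "__main__":
    t1 = embed(rotation(0.3), (0, 2), 4)   # step touching pixels 0 and 2
    t2 = embed(rotation(1.1), (1, 3), 4)   # step touching pixels 1 and 3
    print(is_orthogonal(matmul(t2, t1)))   # product of the steps
    print(is_orthogonal(matmul(t1, t2)))   # same steps, other order
```

Swapping the order of the two steps keeps the product orthogonal, which matches the remark that ordering affects only energy concentration.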
An aspect of the present invention relates to the energy concentration constraint. The coefficients hμν of Hκ have to be chosen such that the energy in image x2 is minimized. The aforementioned method can be used to reduce the energy in the high-band to zero for any motion vector field, if the input pictures are identical and of constant intensity. Scale factors u and v are introduced to capture the effect of previous incremental transforms on the intensity of each pixel. For example, consider unidirectional motion compensation. With the notation in
is satisfied, i.e., the high-band coefficient is zero. Thus, energy conservation requires that the scale factors satisfy
$$u_1^2 = v_1^2 + v_2^2. \tag{39}$$
Further, energy concentration also determines the decorrelation factor, which is the sole degree of freedom for the 2×2 matrix Hκ(1). Interestingly, this decorrelation factor is determined only by the scale factors v1 and v2. Moreover, the scale factors are linked to so-called scale counters m and n such that
$$u = \sqrt{m+1} \quad\text{and}\quad v = \sqrt{n+1}. \tag{40}$$
The scale counters simply count for each pixel how often it is used as reference for motion compensation. Processing starts with scale counters set to zero for all pixels. For unidirectional motion compensation, the scale counter update rule is simply
$$m_1 = n_1 + n_2 + 1. \tag{41}$$
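Substituting (40) into (39) gives (m1+1) = (n1+1) + (n2+1), which is the update rule (41). The sketch below illustrates one unidirectional step under an assumption: the 2×2 matrix is taken to be the plane rotation that maps a constant-intensity pair (v1·g, v2·g) to (u1·g, 0). The text does not spell out this matrix, so the function name and the rotation form are illustrative only.

```python
import math


def scale_factor(counter):
    """Scale factor sqrt(counter + 1), as in (40)."""
    return math.sqrt(counter + 1)


def unidirectional_step(x1, x2, n1, n2):
    """Apply one assumed 2x2 rotation to a scaled pixel pair.

    x1, x2: pixel values already scaled by v1 and v2.
    n1, n2: scale counters of the two pixels.
    Returns (low-band value, high-band value, updated low-band counter).
    """
    v1, v2 = scale_factor(n1), scale_factor(n2)
    u1 = math.hypot(v1, v2)             # energy conservation, (39)
    cos_t, sin_t = v1 / u1, v2 / u1     # rotation fixed by v1 and v2 only
    low = cos_t * x1 + sin_t * x2
    high = -sin_t * x1 + cos_t * x2
    return low, high, n1 + n2 + 1       # counter update, (41)


if __name__ == "__main__":
    g = 10.0                             # constant intensity in both pictures
    n1, n2 = 2, 0
    low, high, m1 = unidirectional_step(scale_factor(n1) * g,
                                        scale_factor(n2) * g, n1, n2)
    print(low, high, m1)
```

For identical constant-intensity inputs the high-band value vanishes up to rounding, and the low-band value equals the intensity times the updated scale factor.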
Bidirectional motion compensation uses two reference pixels at the same time. Hence, two scale counters have to be updated. For example, the following scale counter update rule has been used
Note that for bidirectional motion compensation, the 3×3 matrix Hκ(2) can be factored into rotations about three axes with the help of Euler's rotation theorem. This implies that the bi-directionally motion-compensated incremental transform has only three degrees of freedom. For general p-hypothesis motion, the extension of Euler's theorem to the (p+1)-dimensional space can be utilized.
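The three degrees of freedom can be illustrated by composing three elementary rotations into a 3×3 orthogonal matrix. The z-x-z axis convention, the angle values, and the function names below are assumptions chosen for illustration; the text does not state which axes or angles are used.

```python
import math


def rot_z(angle):
    """Rotation about the third axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(angle):
    """Rotation about the first axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(a, b):
    """Product of two 3x3 nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zxz(alpha, beta, gamma):
    """3x3 orthogonal matrix built from three angles (assumed z-x-z order)."""
    return matmul(matmul(rot_z(alpha), rot_x(beta)), rot_z(gamma))


def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(row[k] * vector[k] for k in range(3)) for row in matrix]


if __name__ == "__main__":
    h = euler_zxz(0.2, 0.7, -1.3)
    pixels = [3.0, -1.0, 2.0]            # one pixel from each of x1, x2, x3
    out = apply(h, pixels)
    # Energy before and after the transform agrees up to rounding.
    print(sum(p * p for p in pixels), sum(o * o for o in out))
```

Any choice of the three angles preserves the energy of the pixel triple, so the energy concentration constraint is what selects the angles.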
Aspects of the present invention relate to dyadic transforms for groups of pictures. The bidirectional transform described above is defined for three input pictures and generates two temporal low-bands. In combination with the unidirectional transform, an orthogonal transform was defined with only one temporal low-band for each group of pictures whose number of pictures is larger than two and a power of two. There is freedom to choose the type of motion compensation and, if necessary, the intra mode for each incremental transform individually. Hence, the dyadic structure for groups of pictures permits an intra block mode as well as block-wise decisions between unidirectional and bidirectional motion compensation. This ability to adapt can be used for mono-view as well as for multi-view video coding schemes.
An example coding scheme cascades the decompositions in time and view direction. First, each view is independently decomposed with motion-compensated orthogonal transforms. Second, the resulting temporal low-bands are further decomposed in view direction with disparity-compensated orthogonal transforms. The multi-view video data is arranged into a Matrix of Pictures (MOP). Each MOP consists of N image sequences, each with K temporally successive pictures. With that, the correlation is considered among all the pictures within a MOP.
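The cascade in time and view direction can be sketched with a stand-in transform. As a loud assumption, the code below replaces the motion- and disparity-compensated orthogonal transforms with a plain orthonormal Haar step (zero motion, zero disparity) and represents each picture by a single pixel value, so it illustrates only the MOP structure and the energy conservation, not the compensation itself. All names are hypothetical.

```python
import math


def haar_pair(first, second):
    """Orthonormal two-point step: returns (low-band, high-band)."""
    s = math.sqrt(0.5)
    return s * (first + second), s * (second - first)


def dyadic_decompose(samples):
    """Dyadic decomposition of a power-of-two-length list.

    Returns one low-band value and len(samples) - 1 high-band values.
    """
    count = len(samples)
    if count < 1 or count & (count - 1):
        raise ValueError("length must be a power of two")
    lows, highs = list(samples), []
    while len(lows) > 1:
        merged = []
        for i in range(0, len(lows), 2):
            low, high = haar_pair(lows[i], lows[i + 1])
            merged.append(low)
            highs.append(high)
        lows = merged
    return lows[0], highs


def decompose_mop(mop):
    """Decompose a MOP given as N views, each a list of K pictures.

    First each view is decomposed in time, then the N temporal low-bands
    are decomposed in view direction.
    """
    temporal = [dyadic_decompose(view) for view in mop]
    temporal_highs = [highs for _, highs in temporal]
    view_low, view_highs = dyadic_decompose([low for low, _ in temporal])
    return view_low, view_highs, temporal_highs


if __name__ == "__main__":
    mop = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]   # N = 2, K = 4
    low, view_highs, temporal_highs = decompose_mop(mop)
    print(low, view_highs, temporal_highs)
```

With N views and K pictures per view, the sketch yields one low-band, N-1 view high-bands, and N(K-1) temporal high-bands, so the number of coefficients equals the number of input pictures.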
An implementation of the decomposition of the multi-view video signal is discussed in connection with the example in
The various embodiments described above are provided by way of illustration only and should not be construed to limit the invention. Based upon the above discussion and illustrations, those skilled in the art will readily recognize that various modifications and changes may be made to the present invention without strictly following the exemplary embodiments and applications illustrated and described herein. For example, the methods, devices and systems discussed herein may be implemented in connection with a variety of technologies such as those involving one or more of portable video displays, downloadable content, video phones and other communication devices, personal computers, DVD players, next generation video players and the like. The invention may also be implemented using a variety of approaches such as those involving images captured from multiple points of view. Such modifications and changes do not depart from the true spirit and scope of the present invention, including that set forth in the following claims.
This application claims priority, under 35 U.S.C. §119(e), of U.S. Patent Application Ser. No. 60/963,006, entitled “Motion-Compensated Orthogonal Video Transform,” and filed on Aug. 1, 2007, which is fully incorporated herein by reference.
Number | Date | Country
---|---|---
60963006 | Aug 2007 | US