The present invention relates to digital video processing, and in particular to frame rate conversion.
In a number of video applications, it is necessary to change the frame rate of a digital video sequence. This requires some form of interpolation in time between successive frames of the sequence. A standard way to perform frame rate conversion (FRC) includes detecting a structure of the video in the form of local motion vectors or sets of local directions of regularity in image contexts. Depending on the local structure that has been detected, the frame rate converter computes interpolated pixels.
A multiscale hierarchical motion estimation method is disclosed in “Hierarchical Model-Based Motion Estimation”, J. R. Bergen, et al., Proceedings of the 2nd European Conference on Computer Vision, May 1992, pages 237-252. Multiscale differential motion estimation methods are disclosed in “Bayesian Multi-Scale Differential Optical Flow”, E. P. Simoncelli, Handbook of Computer Vision and Applications, Vol. 2, chapter 14, Academic Press, San Diego, April 1999, pages 397-422, and in “Robust computation of optical flow in a multi-scale differential framework”, J. Weber and J. Malik, International Journal of Computer Vision, Vol. 2, 1994, pages 5-19.
All these methods make it possible to perform frame rate conversion based on motion compensation using a multiscale estimation method, and provide a dense motion map at the final pixel or subpixel resolution. However, the accuracy of the motion estimation is not related to the needs of the interpolation process applied to perform frame rate conversion.
A frame rate converter is commonly implemented in an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). In such components, the internal memory is normally not large enough to store two full-resolution images. The processing is done in an order prescribed by the input and output interfaces of the FRC circuit, usually raster, striped or tiled. At any given time, the chip holds in memory a context of lines for doing the structure detection, and for computing interpolated pixel values.
A hard limitation affects most prior art FRC systems: supporting a range of vertical speeds or displacements [−Vy, Vy] between consecutive frames requires buffers covering more than 2×Vy+1 lines for each input frame. In addition, the size of the logic required to handle a large range of speeds with good visual quality increases sharply with the range.
There is a need for an implementation of FRC processing with a good trade-off between quality of the converted video sequence and (i) cost of a hardware implementation in terms of internal memory and logic size of the component or (ii) complexity of a software implementation. Such need is particularly acute in the case of real-time applications of FRC.
The invention thus proposes a method of converting the frame rate of a video sequence, comprising:
Advantageously, when considering a direction of regularity in at least one of the steps of determining directions of regularity and of determining interpolated pixel values, representations of the successive frames of the input video sequence are used at at least one resolution level depending on said direction of regularity.
Typically, the raster order is used to generate the frame rate-converted output pixels. The interpolated pixel values are then determined in succession along lines of the frame of the output video sequence, and the aforesaid representations of the successive frames of the input video sequence are used at at least one resolution level depending on a temporal slope of said direction of regularity along a spatial dimension transverse to the lines.
This provides a trade-off between the amount of vertical speeds of objects that can be taken into account by the processing, and the cost of implementation. In certain embodiments, different ranges of horizontal speed can also be accounted for.
In an embodiment, determining an interpolated pixel value for a pixel of the output video sequence includes, for each direction of regularity associated with said pixel, interpolating between pixel values from the representations of successive frames of the input video sequence at a resolution level depending on said direction of regularity.
In another embodiment, which may or may not be combined with the previous one, determining directions of regularity associated with a pixel of a frame of the output video sequence comprises:
The aforesaid plurality of candidate directions can be taken from sets of candidate directions respectively associated with different resolution levels and covering separate, or split, ranges of directions of regularity. The spacing between the candidate directions of one of the sets is preferably a decreasing function of the resolution level associated with this set. The loss value for a candidate direction of one of the sets is then estimated using local representations of the frames at the resolution level associated with this set.
Alternatively, in a “nested” embodiment, sets of candidate directions respectively associated with different resolution levels are defined to cover different ranges of directions of regularity such that the range covered by each set associated with a first resolution level is embedded in the range covered by any set associated with a second resolution level lower than said first resolution level. The spacing between the candidate directions of one of the sets can also be a decreasing function of the resolution level associated with said set. Determining directions of regularity associated with a pixel of a frame of the output video sequence then includes:
In the “nested” embodiment, selecting directions of regularity among the candidate directions may comprise eliminating candidate directions having an estimated loss value above a threshold, and keeping each remaining candidate direction of the set associated with a first resolution level higher than a second resolution level if a candidate direction of the set associated with said second resolution level equal or closest to said candidate direction of the set associated with the first resolution level remains.
The invention provides flexibility in the adaptation of the resolution levels in one or more sections of a frame rate converter. In the above-disclosed embodiments, the adaptation is typically made depending on the temporal slope of the directions of regularity along one or more dimensions, in particular the vertical speeds of moving objects. Another possibility, which may be provided for in such embodiments or separately, is to make use of different resolution levels in different parts of the frame rate converter.
In particular, it is possible, for at least some directions of regularity, that the step of determining directions of regularity involves representations of successive frames of the input video sequence at at least one resolution level different from the resolution level of the representations of the successive frames of the input video sequence used for interpolating between pixel values for said directions of regularity.
Another aspect of the invention relates to a computer program product, comprising instructions to carry out a frame rate conversion method as outlined above when said program product is run in a computer processing unit.
Still another aspect of the invention relates to an FRC device comprising:
At least one of the geometry detection and interpolation sections is arranged to take into account a direction of regularity using representations of frame regions in the successive frames of the input video sequence at at least one resolution level depending on said direction of regularity.
In an embodiment, the resolution level depends on a temporal slope of said direction of regularity along a spatial dimension transverse to the lines of the output video sequence. Preferably, for each integer k such that 1<k≦K, the kth frame region is smaller than the (k−1)th frame region along said spatial dimension.
For the operation of the interpolation section, a plurality of interpolation intervals can be defined for the temporal slopes of the directions of regularity along said spatial dimension. The interpolation section then has a plurality of interpolators each associated with a respective resolution level and with a respective interpolation interval for interpolating between pixel values from the representations of frame regions at this resolution level using directions of regularity having temporal slopes within this interpolation interval along said spatial dimension.
In such an embodiment of the FRC device, the geometry detection section may have a single detector for estimating respective loss values for pixels of the current line and for candidate directions based on said candidate directions and on the representations of frame regions at one resolution level, and a selector for selecting directions of regularity for the pixels of the current line among the candidate directions based on the estimated loss values.
Alternatively, a plurality of detection intervals are defined for the temporal slopes of the directions of regularity along said spatial dimension, and the geometry detection section has:
Each of the detectors associated with a resolution level and with a detection interval may be arranged for estimating loss values for candidate directions belonging to a set of candidate directions associated with said resolution level. The loss value for a pixel of the current line and a candidate direction of one of the sets is then estimated using local representations of the frames at the resolution level associated with said set.
In a “split” embodiment, the sets of candidate directions associated with different resolution levels cover separate temporal slope ranges of the candidate directions along said spatial dimension.
Alternatively, in a “nested” embodiment, the sets of candidate directions associated with different resolution levels cover different temporal slope ranges of the candidate directions along said spatial dimension such that the temporal slope range covered by each set associated with a first resolution level is embedded in the temporal slope range covered by any set associated with a second resolution level lower than said first resolution level. The selector is then arranged to eliminate from the directions of regularity candidate directions having an estimated loss value above a threshold, and to select as a direction of regularity each remaining candidate direction of the set associated with a first resolution level higher than a second resolution level if a candidate direction of the set associated with said second resolution level equal or closest to said candidate direction of the set associated with the first resolution level was not eliminated.
For at least some directions of regularity, the geometry detection section may use representations of frame regions of the input video sequence at at least one resolution level different from the resolution level of the representations of the frame regions used in the interpolation section for interpolating between pixel values for said directions of regularity.
The foregoing and other objects of this invention, the various features thereof, as well as the invention itself, may be more fully understood from the following description, when read together with the accompanying drawings in which:
A video signal is denoted It(x), where t is a scalar temporal index and x is a 2-dimensional spatial index: x=(x1, x2). The video signal is made of pixels taking values in scalar or vector form. For color video, the pixels take 3-dimensional values. Common bases are RGB (red, green, blue) and YUV. In the exemplary embodiments below, we will consider components YUV in a non-limiting manner. The video It(x) in a YUV basis then has a luma component Yt(x) and two chroma components Ut(x) and Vt(x). To designate indistinctively one of the three channels Y, U and V, the notation Ct(x) is also used.
A frame channel, namely an image with one-dimensional pixel values, can be decomposed using a discrete wavelet transform. Using common conventions:
The notation Cj,t(x) is used for the scaling coefficients corresponding to the signal component Ct(x) with a resolution index j.
A direction of regularity of a video block at a pixel location (x, t) denotes a spatio-temporal direction (dx, dt) along which the video has small variations. Such a direction of regularity is for example detected by evaluating a directional cost of the form:
for different candidate directions (dx, dt), where w is a window centered on the pixel (x, t), and retaining the directions for which the cost is minimum. A window w centered on pixel (x, t) spans for example the indexes x+(u, v) such that −w1<u<w1 and −w2<v<w2 for some integers w1 and w2 and for frame t and optionally one or more preceding and/or following frames.
In addition, a direction of regularity can be evaluated based on a low-resolution approximation Ij,t(x) of the video signal, i.e. with a resolution index j<0, in which case formula (1) is scaled accordingly:
where w is a window centered around (2^j·x, t), and of size depending on j, and Nj is a weighting factor. When j=0, (2) reduces to (1). It is possible to consider all integer values of a vector dx, or only those that are a multiple of the integer 2^(−j), in which case the candidate directions are more widely spaced apart.
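By way of illustration only, a directional cost of this kind can be sketched as follows. The sketch assumes a mean of squared differences between frame t and frame t+1 along the candidate direction; this is an assumed form, not a transcription of formulas (1) and (2). The function name `directional_cost`, the `half_window` parameter and the normalisation by the number of samples are hypothetical, and x and dx are assumed to be multiples of 2^(−j).

```python
def directional_cost(frame_a, frame_b, x, dx, j=0, half_window=(1, 1)):
    """Assumed squared-difference cost along direction (dx, dt=1) at index j <= 0.

    frame_a, frame_b: 2-D lists (rows of samples) holding the scaling
    coefficients of frames t and t+1 at resolution index j.
    x, dx: (x1, x2) position and displacement in full-resolution pixels,
    both assumed to be multiples of 2**(-j).
    """
    scale = 2 ** (-j)                        # full-resolution pixels per sample
    cx1, cx2 = x[0] // scale, x[1] // scale  # window centre at resolution j
    d1, d2 = dx[0] // scale, dx[1] // scale  # displacement at resolution j
    w1, w2 = half_window
    height, width = len(frame_a), len(frame_a[0])
    total, count = 0.0, 0
    for v in range(-w2, w2 + 1):
        for u in range(-w1, w1 + 1):
            a1, a2 = cx1 + u, cx2 + v        # sample in frame t
            b1, b2 = a1 + d1, a2 + d2        # displaced sample in frame t+1
            inside_a = 0 <= a1 < width and 0 <= a2 < height
            inside_b = 0 <= b1 < width and 0 <= b2 < height
            if inside_a and inside_b:
                diff = frame_b[b2][b1] - frame_a[a2][a1]
                total += diff * diff
                count += 1
    # Normalise by the number of samples actually compared (assumed weighting).
    return total / count if count else float("inf")
```

With this sketch, a pattern translated by one pixel per frame gives a zero cost for the direction matching that translation and a larger cost for the other candidates, so retaining the minimum selects the matching direction.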
An exemplary embodiment of an FRC device is depicted in
A layered structure of the multiresolution coefficients is used. The number of layers used in the device is noted K (K>1). Accordingly, K groups of coefficients are transferred to the line buffers:
The base coefficients represent a frame at a first resolution level k=1. This can be a low resolution image representation, corresponding to a low-pass approximation of the frame obtained in a wavelet transform. In the example depicted in
If the image is multi-channel (e.g. a color image with Y, U and V components for each pixel), a base coefficient representation can be a low-resolution representation of each channel, with possibly different resolution indexes for each channel. In a limit case, a channel can be completely absent, which is referred to as a j=−∞ resolution index. Each layer or resolution level then corresponds to a choice of a resolution index j for each channel.
In an exemplary embodiment, K=3 layers are defined, using layers of coefficients r=1, 2 and 3. The respective resolutions for Y, U and V may be in accordance with Table I.
In this example, the representation at level k=1 provided by the base coefficients in layer r=1 only contains a coarse representation (scaling coefficients) of the Y channel with the resolution index j=−2, while layer r=2 introduces color information with the resolution index j=−2 into the representation at level k=2. The refinement coefficients in layer r=2, stored in line buffer 103, are thus wavelet coefficients with the resolution index j=−2 to increase the resolution of the Y channel from j=−2 to j=−1, and low-pass scaling coefficients providing a full representation of the U and V channels with the resolution index j=−2. The refinement coefficients in layer r=3, stored in line buffer 104, are wavelet coefficients for the Y channel with the resolution index j=−1 to increase the resolution of the Y channel from j=−1 to j=0, and wavelet coefficients for the U and V channels with the resolution index j=−2 to increase the resolution of the U and V channels from j=−2 to j=−1.
Such a choice for the layer structure can be suitable, for example, when processing video frames that are already downsampled in the chroma channels, such as all MPEG-encoded video represented in the so-called YUV 4:2:0 format.
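The layer organization described above can be summarized, as a non-limiting sketch, by a small lookup structure giving the resolution index j of each channel at each resolution level k. The names `LEVELS` and `samples_per_line` and the use of `None` for an absent channel (j=−∞) are hypothetical conventions introduced for this illustration.

```python
import math

# Resolution index j per channel at each resolution level k, following the
# description in the text. None stands for an absent channel (j = -infinity).
LEVELS = {
    1: {"Y": -2, "U": None, "V": None},
    2: {"Y": -1, "U": -2, "V": -2},
    3: {"Y": 0, "U": -1, "V": -1},
}

def samples_per_line(level, full_width):
    """Number of samples per line, for each channel, at a given level."""
    return {
        channel: 0 if j is None else math.ceil(full_width * 2 ** j)
        for channel, j in LEVELS[level].items()
    }
```

For a hypothetical line width of 1920 pixels, level k=1 then holds 480 luma samples per line and no chroma, whereas level k=3 holds 1920 luma samples and 960 samples for each chroma channel.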
The coefficients indicated in Table I can also be generated from the Y, U and V channels of the video signal by means of a wavelet transform unit 200 as illustrated in
The FRC circuit 101 of
Each geometry detector 107-109 receives a representation of a frame generated at a respective resolution level k, i.e. using coefficient layers r=1 to k. The resolution level 1 detector 107 reads in line buffer 102 the base coefficients forming the representation REP.(1) of the frame at the first resolution level, and computes a geometry for resolution level k=1.
The resolution level 2 geometry detector 108 receives the representation REP.(2) of the frame computed using coefficient layers r=1 and 2 and derives a geometry for resolution level k=2. The representation REP.(2) received by detector 108 is not read from one of the line buffers, but recombined by an inverse wavelet transform unit 105 from (i) base coefficients of layer r=1 read from line buffer 102 and (ii) the refinement coefficients of layer r=2 read from line buffer 103. It corresponds to the scaling coefficients output by the WT blocks 201Y, 202U and 202V of
Likewise, the resolution level 3 geometry detector 109 receives the representation REP.(3) of the frame computed using coefficient layers r=1, 2 and 3, and derives a geometry for resolution level k=3. The representation REP.(3) received by detector 109 is not read from one of the line buffers, but recombined by another inverse wavelet transform unit 106 from (i) the representation REP.(2) of the frame for resolution level k=2 received from the transform unit 105 and (ii) the refinement coefficients of layer r=3 read from line buffer 104. It corresponds to the pixel representation of the Y channel and to the scaling coefficients output by the WT blocks 201U and 201V of
For each target pixel location, each geometry detector 107-109 computes loss values L for a number of directions v belonging to a set of candidate directions of regularity in the video. The loss value is for example a directional energy computed according to (2) for each channel.
Then, the selector 110 receives the directions v and associated loss values L from the geometry detectors 107-109, selects among the directions a subset of directions v and associated loss values L and outputs the selected direction and loss values. The selector 110 typically applies a threshold to the loss values L in order to eliminate the non-relevant candidate directions v. The threshold can be fixed or chosen dynamically depending on the image contents. It will be appreciated that the selection could also be performed, in part or completely, in the detector modules 107-109.
The interpolators 111, 112 use the directions provided by the selector 110, together with a representation of the frames at a resolution level which may be different from the one that was used to detect some of the directions v received by these interpolators 111, 112. The resolution level 2 interpolator 111 uses directions provided by the selector 110 and computes interpolated pixels using the representation REP.(2) of the frame at resolution level k=2. The resolution level 3 interpolator 112 uses directions provided by the selector 110 and computes interpolated pixels using the representation REP.(3) of the frame at the maximum resolution level k=3.
In an exemplary embodiment, for each direction of regularity v=(dx, dt) provided by selector 110, the resolution level 2 interpolator 111 computes a pixel value p by interpolating between the pixel values of the frame representation at resolution level k=2 using the direction v. In the above example of layers in Table I, the interpolator 111 interpolates channel values for Yt+dt/2(x), Ut+dt/2(x) and Vt+dt/2(x):
$$C_{j,t+dt/2}(x) = \frac{1}{2}\left[C_{j,t}\left(2^{j}x - 2^{j}\,dx/2\right) + C_{j,t+dt}\left(2^{j}x + 2^{j}\,dx/2\right)\right] \qquad (3)$$
where the values of the image arrays at non-integer pixel positions are estimated using spatial interpolation. The formula (3) is used with j=−1 for Y, and with j=−2 for U and V.
The resolution level 3 interpolator 112 performs the same kind of interpolation as the interpolator 111, but using the frame representation at resolution level k=3. According to the above example of layer organization, the interpolator 112 uses (3) with j=0 for Y coefficients and j=−1 for U and V coefficients.
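A minimal sketch of the interpolation of formula (3) is given below for one channel. It assumes bilinear (2-tap per axis) spatial interpolation with clamping at the frame borders, which is only one possible choice of spatial interpolation; the function names `sample` and `interpolate_midframe` are hypothetical.

```python
import math

def sample(img, p1, p2):
    """Bilinear sample of img[row][col] at column p1 and row p2 (clamped)."""
    height, width = len(img), len(img[0])
    p1 = min(max(p1, 0.0), width - 1.0)
    p2 = min(max(p2, 0.0), height - 1.0)
    c0, r0 = int(math.floor(p1)), int(math.floor(p2))
    c1, r1 = min(c0 + 1, width - 1), min(r0 + 1, height - 1)
    a, b = p1 - c0, p2 - r0                  # fractional offsets
    top = (1 - a) * img[r0][c0] + a * img[r0][c1]
    bottom = (1 - a) * img[r1][c0] + a * img[r1][c1]
    return (1 - b) * top + b * bottom

def interpolate_midframe(c_t, c_t_dt, x, dx, j=0):
    """Formula (3): mean of the samples taken at 2^j*x -/+ 2^j*dx/2.

    c_t, c_t_dt: coefficient arrays of frames t and t+dt at resolution index j.
    x, dx: (x1, x2) position and displacement in full-resolution pixels.
    """
    s = 2.0 ** j
    before = sample(c_t, s * x[0] - s * dx[0] / 2, s * x[1] - s * dx[1] / 2)
    after = sample(c_t_dt, s * x[0] + s * dx[0] / 2, s * x[1] + s * dx[1] / 2)
    return (before + after) / 2
```

Under these assumptions the same function serves both interpolators: interpolator 111 would call it with j=−1 for Y and j=−2 for U and V, and interpolator 112 with j=0 for Y and j=−1 for U and V.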
Objects having a high vertical speed in the video are advantageously analyzed with a lower resolution than objects having a low vertical speed. Also, the interpolation of the corresponding area of the video is preferably of high resolution for objects having low vertical speeds, and of low resolution for objects having high vertical speeds. Since high-speed objects are affected by motion blur, it is reasonable to perform low resolution interpolation on the area of the image that they cover. In addition, the human eye is less accurate in reading textures if these textures are moving with a high speed. This makes it less critical to perform accurate interpolations on high-speed contents. These considerations make it possible to consider tradeoffs as discussed further below.
The direction that most closely corresponds to the motion of the pattern in time is (0,1,1). If a 2-tap linear interpolator is used for spatial interpolation, each of the three directions yields the same interpolated pixel value (C+D)/2. This illustrates that using a direction of (0,0,1) or (0,2,1) in this case instead of (0,1,1) does not change the result of the interpolation stage. In practice, spatial interpolation is usually performed with higher order filters, but this gain in precision is only useful if the direction is estimated with a high accuracy and if the gain in precision is visible. The eye sensitivity and the screen response (e.g. motion blur of liquid crystal displays) vary strongly on static and moving patterns.
A number of parameters can be provided to adjust the accuracy of the FRC processing depending on the spatio-temporal geometry of the video, including the resolution level of the reference frame representations used for the detection and/or the interpolation, and the difference between the resolution level of frames used for detecting directions and that used for interpolating. Various grades of frame-rate conversion can then be offered by the device:
The organization of the coefficients in different layers (base and refinement) can be done based on resolution only (j=r−K for each channel), or in a mixed manner as exemplified in Table I and
In an embodiment, referred to as “split” embodiment, the candidate directions evaluated by detectors 107-109 are provided as separate sets of directions covering non-overlapping ranges for the vertical component of the directions (transverse to the lines along which the target pixels are scanned). The detector 107 computes the loss values for directions corresponding to objects having a high vertical speed, while the detector 108 handles objects having a lower vertical speed, and the detector 109 handles objects having an even lower vertical speed. Alternatively, in a “nested” embodiment, the detectors 107-109 use embedded ranges of directions.
These two embodiments are explained in more detail below, with reference to
In the “split” embodiment, each of the geometry detectors 107-109 considers a separate set of candidate directions, and these sets are not overlapping. For example, the set of directions used by detector 107 is the set of directions (dx1, dx2, dt) such that α2·|dt|<|dx2|≦α3·|dt|, dt=1 and dx1 and dx2 are both multiples of 4. The set of directions used by detector 108 is the set of directions (dx1, dx2, dt) such that α1·|dt|<|dx2|≦α2·|dt|, dt=1 and dx1 and dx2 are both multiples of 2. The set of directions used by detector 109 is the set of directions (dx1, dx2, dt) such that |dx2|≦α1·|dt|, dt=1 and dx1 and dx2 are both multiples of 1. Each detector computes a directional cost of the video signal for each direction, and may perform a first selection of directions that minimize the directional energy, or for which the directional energy is below a fixed threshold. For each candidate direction, only one directional energy or loss value is computed by one of the detectors 107-109 depending on the vertical-temporal slope of the candidate direction.
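The non-overlapping candidate sets of the "split" embodiment can be enumerated as in the following sketch. It assumes integer values for α1, α2 and α3 and a bounded horizontal range [−max_dx1, max_dx1]; the function name `split_candidate_sets` and the parameter `max_dx1` are hypothetical and are not taken from the description.

```python
def split_candidate_sets(alpha1, alpha2, alpha3, max_dx1):
    """Candidate (dx1, dx2, dt) sets for detectors 107, 108 and 109 (dt = 1)."""
    # Each band is (exclusive lower bound, inclusive upper bound, spacing)
    # on the vertical component |dx2|.
    bands = {
        107: (alpha2, alpha3, 4),
        108: (alpha1, alpha2, 2),
        109: (-1, alpha1, 1),    # lower bound -1 so that dx2 = 0 is included
    }
    sets = {}
    for detector, (low, high, step) in bands.items():
        sets[detector] = [
            (dx1, dx2, 1)
            for dx2 in range(-high, high + 1)
            if dx2 % step == 0 and low < abs(dx2) <= high
            for dx1 in range(-max_dx1, max_dx1 + 1)
            if dx1 % step == 0
        ]
    return sets
```

Because the three bands partition the vertical component, every candidate direction belongs to exactly one set, so a single loss value is computed per candidate, and the coarser spacing at lower resolution levels keeps the sets for fast vertical motion small.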
In the “nested” embodiment, the detectors 107-109 consider embedded ranges of candidate directions. For example, the directions (dx1, dx2, dt) of the set used by detector 107 have vertical-temporal slopes
such that |dx2|≦α3·|dt|, dt=1, while dx1 and dx2 are both multiples of 4.
The set of directions used by detector 108 is the set of directions (dx1, dx2, dt) such that |dx2|≦α2·|dt|, dt=1, and dx1 and dx2 are both multiples of 2. The set of directions used by detector 109 is the set of directions (dx1, dx2, dt) such that |dx2|≦α1·|dt|, dt=1, and dx1 and dx2 are both multiples of 1. In this case, up to three loss values are computed for directions in the small range 604, up to two loss values are computed for directions in the intermediate range 605 and one loss value is computed for directions in the large range 606.
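The embedding of the ranges in the "nested" embodiment can be illustrated by the following sketch, which lists the detectors whose range contains a given vertical displacement for dt=1. The function name `nested_detectors_for_slope` is hypothetical, and the sketch ignores the spacing constraint on the candidate directions, which is why the counts stated above are upper bounds.

```python
def nested_detectors_for_slope(dx2, alpha1, alpha2, alpha3):
    """Detectors whose nested range contains vertical displacement dx2 (dt = 1)."""
    # Upper bound on |dx2| for each detector, from the largest to the smallest range.
    bounds = {107: alpha3, 108: alpha2, 109: alpha1}
    return [detector for detector, bound in bounds.items() if abs(dx2) <= bound]
```

With hypothetical values α1=2, α2=4 and α3=8, a displacement of 1 line falls in the ranges of all three detectors, a displacement of 3 lines in those of detectors 107 and 108 only, and a displacement of 7 lines in that of detector 107 only.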
In both the “split” and “nested” embodiments, each of the detectors 107-109 may output a fixed or variable number of directions v=(dx, dt) each associated with a respective loss value L. For example, each detector can select the subset of directions corresponding to a directional energy lower than a fixed threshold, and output the directions of this subset with respective loss values equal to the corresponding directional energies.
The selector 110 receives sets {(v, L)} of directions/loss value pairs from the geometry detectors 107-109 and outputs a reduced set of such pairs for each pixel to be interpolated.
In the “split” embodiment, the selector 110 can simply select the directions that have a loss value less than a threshold. This threshold can be fixed or preferably computed from the candidate directions as follows. First the “best” direction v0 is determined as the direction for which the loss value L0 is minimal. Then the threshold T is calculated as a function of L0, for example T=2·L0. Finally, all directions v received by the selector 110 with a loss value L larger than T are discarded.
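The dynamic thresholding described above can be sketched as follows, with the example rule T=2·L0 exposed as a parameter. The function name `select_directions` and the representation of the candidates as (direction, loss) pairs are hypothetical choices made for this illustration.

```python
def select_directions(candidates, factor=2.0):
    """Keep the (direction, loss) pairs whose loss does not exceed T = factor * L0."""
    if not candidates:
        return []
    best_loss = min(loss for _, loss in candidates)   # L0, loss of the best direction
    threshold = factor * best_loss                    # T = 2 * L0 by default
    # Directions with a loss value larger than T are discarded.
    return [(v, loss) for v, loss in candidates if loss <= threshold]
```

The best direction is always retained by construction, and the number of directions passed on to the interpolators adapts to how well the best direction fits the local image content.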
In the “nested” embodiment, the selector 110 can apply a similar selection procedure. It may further be adapted to eliminate a direction v received from the geometry detector operating at a given resolution level k>1 if no direction close enough to v (e.g. the closest to v, or equal to v) was selected as received from the geometry detector operating at the lower resolution level k−1. This is a way to validate a choice of a direction detected from high resolution coefficients of the input frames with a corresponding direction detected from lower resolution coefficients.
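The cross-level validation of the "nested" embodiment can be sketched as below. Two points are assumptions of this sketch rather than statements of the description: "close enough" is taken to mean within half the candidate spacing of the coarser level on each spatial component, and the validation is cascaded, i.e. level 3 is checked against the directions of level 2 that themselves survived validation. The names `STEP` and `validate_nested` are hypothetical.

```python
# Candidate spacing per resolution level, taken from the example sets
# (multiples of 4, 2 and 1 for detectors 107, 108 and 109).
STEP = {1: 4, 2: 2, 3: 1}

def validate_nested(kept):
    """kept: {level k: set of (dx1, dx2) that passed the loss threshold}.

    A direction kept at level k > 1 survives only if a surviving direction
    of level k-1 lies within half a coarse step of it on both components.
    """
    result = {1: set(kept.get(1, ()))}
    for k in (2, 3):
        tolerance = STEP[k - 1] / 2
        coarse = result[k - 1]               # cascaded validation (assumption)
        result[k] = {
            v for v in kept.get(k, ())
            if any(abs(v[0] - c[0]) <= tolerance and abs(v[1] - c[1]) <= tolerance
                   for c in coarse)
        }
    return result
```

In this form, a fine direction that has no coarse counterpart is treated as unreliable and dropped, which reflects the validation principle described above.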
Each of the interpolators 111, 112 can be arranged to compute interpolated pixel values for directions in a predefined range of directions only. In an exemplary embodiment, interpolator 111 is capable of interpolating along directions v=(dx1, dx2, dt) having vertical-temporal slopes
such that β1·|dt|<|dx2|≦β2·|dt|, with 0<β1<β2, and interpolator 112 is capable of interpolating along directions v=(dx1, dx2, dt) such that |dx2|≦β1·|dt|. When an interpolator receives as input a direction that is not within its range of allowed directions, the direction is dropped, and no corresponding output is provided by the interpolator.
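The assignment of a direction to an interpolator according to its vertical-temporal slope can be sketched as follows. The function name `route_direction` is hypothetical; the sketch returns the reference numeral of the interpolator handling the direction, or `None` when the direction falls outside both ranges and is therefore dropped.

```python
def route_direction(direction, beta1, beta2):
    """Return the interpolator (111 or 112) handling a direction, or None if dropped."""
    dx1, dx2, dt = direction
    slope = abs(dx2)
    if slope <= beta1 * abs(dt):
        return 112      # low vertical speed: maximum resolution level k=3
    if slope <= beta2 * abs(dt):
        return 111      # higher vertical speed: resolution level k=2
    return None         # outside every allowed range: the direction is dropped
```

For example, with hypothetical values β1=3 and β2=8, the direction (0, 2, 1) is handled by interpolator 112, the direction (0, 5, 1) by interpolator 111, and the direction (0, 9, 1) produces no output.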
Each of the interpolators 111, 112 outputs, for each pixel to be interpolated and each relevant direction v received for that pixel, an interpolated pixel value p, for example computed according to (3), associated with a loss value L that can be identical to the loss value received by the interpolator for that direction.
The combiner 113 receives the interpolated pixel values p and associated loss values L from the interpolators 111, 112. It combines the interpolated pixel values p to output a single pixel value, advantageously with weights derived from the loss values. Many different kinds of combination can be used in combiner 113. In an exemplary embodiment, the output pixel value p′ is:
The above embodiments enable a substantial reduction of logic size and memory in the hardware architecture of the FRC device.
An organization of the memory is illustrated in
A first trace 704 (“r=3”) corresponds to a detection interval made of directions (dx1, dx2, dt) having vertical-temporal slopes
lower than α1, i.e. such that |dx2|≦α1·|dt|. Another trace 714 (“r=2”) corresponds to a detection interval made of directions (dx1, dx2, dt) having vertical-temporal slopes
between α1 and α2 (α1·|dt|<|dx2|≦α2·|dt|). A last trace 724 (“r=1”) corresponds to a detection interval made of directions having vertical-temporal slopes
between α2 and α3 (α2·|dt|<|dx2|≦α3·|dt|).
The traces 704, 714, 724 in frame t+1 and similar traces in frame t indicate pixels needed at both ends of the candidate direction vectors relating to the target pixel (x1, x2, t+½). When detecting directions, an additional context or window of pixels around each pixel in these traces 704, 714, 724 is needed to compute the directional energy. The overall context of pixels is displayed as the non-hatched portion of map 701. Three regions 705, 715, 725 are distinguished in the frame (or frame tile), with respective notations “r=3”, “r=2” and “r=1”. Each of regions 715 and 725 is made of two non-connected parts symmetrically placed about region 705.
The geometry detector 107 operating according to the “split” embodiment, with frame coefficients of layer r=1, needs coefficients corresponding to pixels in trace 724, with some additional coefficient lines above and below each line of trace 724 for estimating the directional energies. It thus needs coefficients corresponding to vertical-temporal slopes
in an expanded range ]α2−2w2, α3+2w2]. Such coefficients relate to pixels located mainly in region 725 and in part in region 715. Likewise, the geometry detector 108 (r=2) needs coefficients corresponding to pixels in trace 714 with some more lines (vertical-temporal slopes
in the expanded range ]α1−2w2, α2+2w2]), such pixels being located mainly in region 715 and in part in region 705. Finally, the geometry detector 109 (r=3) operating according to the “split” embodiment needs coefficients corresponding to pixels in trace 704 with some more lines (vertical-temporal slopes
lower than α1+2w2), such pixels being located only in region 705.
When the detectors 107-109 operate according to the “nested” embodiment, they need different sets of coefficients, but these are also available in the pixel context illustrated in map 701.
The map 703 in
The traces 706, 716 in frame t+1 and similar traces in frame t indicate the input pixels whose values may be used to generate the interpolated value of the target pixel (x1, x2, t+½). The first trace 706 (“r=3”) corresponds to an interpolation interval made of directions (dx1, dx2, dt) having vertical-temporal slopes
lower than β1, i.e. such that |dx2|≦β1·|dt|. Another trace 716 (“r=2”) corresponds to an interpolation interval made of directions (dx1, dx2, dt) having vertical-temporal slopes
between β1 and β2 (β1·|dt|<|dx2|≦β2·|dt|). In this example, no interpolation is performed at the lowest resolution layer (r=1).
In this example, we have α1≦β1≦α2<α3=β2. Therefore, for the directions of regularity (dx1, dx2, dt) such that α2·|dt|<|dx2|≦β2·|dt| or such that α1·|dt|<|dx2|≦β1·|dt|, the input frame representations used for determining such directions of regularity in detector 107 or 108 are at a resolution level k=1 or 2, lower than the resolution level k=2 or 3 of the input frame representations used for interpolating in interpolator 111 or 112 between pixel values for such directions of regularity.
The maps 701 and 703 illustrate the layout of the coefficients that are required for each part of the processing (geometry detection and interpolation) at a given location of a reference frame t+1. Similar maps can be drawn for reference frame t. These maps can be coined “layer” maps because they indicate at each location of a reference frame which coefficients are required for the processing in each layer. The lower layers (e.g., r=1) are far less expensive to store in internal memory than higher layers (e.g., r=3). In addition, in the processing of a current line of the frame, no coefficient information is needed out of the regions labeled with “r=3”, “r=2” or “r=1”, i.e. in the hatched portions of maps 701 and 703.
If there is no significant delay between the geometry detection in modules 107-109 and the resulting interpolation in modules 111-112, the line buffers 102-104 must be dimensioned to contain the information to be made available to those modules 107-109, 111-112 for the processing of one line. The non-hatched portion of map 801 in
The first frame region 802 includes the second frame region 803, which in turn includes the third frame region 804. Therefore, the contents of the line buffer section 120 when processing a current line of an output frame make it possible to retrieve a representation at the kth resolution level of the kth frame region for each k=1, 2, . . . , K.
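The role of the nesting can be illustrated with a small check. The sketch assumes that the representation at level k is rebuilt from the coefficient layers 1 to k, and that each frame region is described by a range of lines. Both points, as well as the function names and line ranges, are assumptions made for illustration.

```python
def covers(outer, inner):
    """True if the line range 'outer' contains the line range 'inner'."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def can_retrieve(k, layer_regions):
    """Check that level k can be rebuilt over the kth frame region.

    layer_regions[j - 1] is the (first_line, last_line) range buffered for
    layer j. Level k needs layers 1..k over the whole kth region (assumed).
    """
    target = layer_regions[k - 1]
    return all(covers(layer_regions[j], target) for j in range(k))


# Hypothetical nested regions: region 1 contains region 2 contains region 3.
nested = [(0, 31), (8, 23), (12, 19)]
print([can_retrieve(k, nested) for k in (1, 2, 3)])  # [True, True, True]

# If the third region sticks out of the second one, level 3 is incomplete.
not_nested = [(0, 31), (8, 23), (4, 19)]
print(can_retrieve(3, not_nested))  # False
```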
In an alternative embodiment, the selector 110 combines detections provided for different target pixel locations in a rectangular window [x1−D1, x1+D1]×[x2−D2, x2+D2] to output a number of directions for pixel (x1, x2) at time t+½. The selector 110 then includes a line buffer of input data, and it introduces some line delay in the processing between the geometry detection in modules 107-109 and the resulting interpolation in modules 111-112.
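A possible form of such a window-based selection is sketched below. The text does not state how the detections are combined, so the majority vote used here is an assumption, as are the function name, the dictionary layout and the sample data.

```python
from collections import Counter


def select_directions(detections, x1, x2, d1, d2, count):
    """Pick the most frequent directions in a window centred on (x1, x2).

    detections maps a pixel (x1, x2) to the list of directions (dx1, dx2, dt)
    detected there. The window is [x1-d1, x1+d1] x [x2-d2, x2+d2].
    Combining by majority vote is an assumption, not the described method.
    """
    votes = Counter()
    for u2 in range(x2 - d2, x2 + d2 + 1):      # lines of the window
        for u1 in range(x1 - d1, x1 + d1 + 1):  # columns of the window
            votes.update(detections.get((u1, u2), []))
    return [direction for direction, _ in votes.most_common(count)]


# Hypothetical detections around pixel (5, 5).
detections = {
    (5, 5): [(1, 0, 1)],
    (4, 5): [(1, 0, 1), (0, 1, 1)],
    (6, 5): [(1, 0, 1)],
    (5, 4): [(0, 1, 1)],
    (5, 6): [(2, 0, 1)],
    (9, 9): [(3, 3, 1)],  # outside the window, ignored
}
print(select_directions(detections, 5, 5, 1, 1, 2))
# [(1, 0, 1), (0, 1, 1)]
```

Because the window extends to line x2+D2, the detections for that line must be available before pixel (x1, x2) can be handled, which is consistent with the line delay mentioned above.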
In such an embodiment, the overall context of coefficients that is required by the FRC processing is the union of the contexts shown in map 703 in
The internal memory size of the FRC circuit 101 is reduced because line buffers of various resolutions are used instead of a full resolution line buffer. The different resolution levels for which the geometry is examined by the detectors 107-109 also imply a substantial reduction in logic size, especially in the “split” embodiment of the detectors.
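The order of magnitude of the memory saving can be estimated with a short calculation. The sketch below assumes a dyadic scheme in which level k out of K is subsampled by 2^(K−k) in both dimensions, and it uses a hypothetical frame width and hypothetical region heights. None of these figures comes from the described circuit.

```python
def multires_buffer_samples(width, region_lines, num_levels):
    """Count samples held when region k is stored at resolution level k.

    region_lines[k - 1] is the height of the kth region in full-resolution
    lines. Level k is assumed subsampled by 2**(num_levels - k) both ways.
    """
    total = 0
    for k, lines in enumerate(region_lines, start=1):
        scale = 2 ** (num_levels - k)
        rows = -(-lines // scale)  # ceiling division
        total += (width // scale) * rows
    return total


# Hypothetical figures: 1920-pixel lines, regions of 32, 16 and 8 lines.
WIDTH = 1920
REGION_LINES = [32, 16, 8]

multi = multires_buffer_samples(WIDTH, REGION_LINES, 3)
full = WIDTH * max(REGION_LINES)  # one full-resolution buffer for region 1
print(multi, full, multi / full)  # 26880 61440 0.4375
```

Under these assumptions the multiresolution buffers hold less than half of the samples that a single full-resolution buffer covering the largest region would hold.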
The method can also be implemented in software. The benefit is then a reduced computation time rather than a reduction of logic size and of internal memory size: the reduction in logic size translates into a reduced number of operations, and the reduction of the size of the line buffers 102-104 translates into fewer cache misses, which further reduces the computation time.
In the alternative embodiment of
It is also possible to use more resolution levels in the interpolation section than in the geometry detection section. An example of this is depicted in
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2008/052204 | 3/18/2008 | WO | 00 | 9/8/2010 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2009/115867 | 9/24/2009 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
6317136 | Choi | Nov 2001 | B1 |
7116372 | Kondo et al. | Oct 2006 | B2 |
7586540 | Ogino et al. | Sep 2009 | B2 |
8098327 | Yamauchi | Jan 2012 | B2 |
8130837 | Heyward | Mar 2012 | B2 |
8134640 | Doswald | Mar 2012 | B2 |
8194184 | Turetken et al. | Jun 2012 | B2 |
20030123550 | Wang et al. | Jul 2003 | A1 |
20100220783 | Mallat et al. | Sep 2010 | A1 |
Number | Date | Country |
---|---|---|
2435360 | Aug 2007 | GB |
2006030400 | Mar 2006 | WO |
Entry |
---|
Zhenyu L. et al. “32-Parallel SAD Tree Hardwired Engine for Variable Block Size Motion Estimation in HDTV1080P Real-Time Encoding Application”, Proceedings of the 2007 IEEE Workshop on Signal Processing Systems, Oct. 17, 2007, pp. 675-680. |
Joseph Weber et al., “Robust Computation of Optical Flow in a Multi-Scale Differential Framework”, International Journal of Computer Vision, 1994, 2, pp. 5-19. |
Yue W. et al, “A Novel Parallel Fast Motion Estimation Algorithm”, Proceedings of the 2005 International Conference on Intelligent Sensing and Information Processing, Jan. 4, 2005, pp. 378-381. |
International Search Report and Written Opinion in corresponding International Application No. PCT/IB2008/052204 dated Jul. 16, 2009. |
Number | Date | Country |
---|---|---|
20110096227 A1 | Apr 2011 | US |