The invention relates to a method and to an apparatus for controlling the insertion of additional fields or frames into a first format picture sequence having e.g. 24 progressive frames per second in order to construct therefrom a second format picture sequence having e.g. 25 frames per second.
The major TV systems in the world use interlaced scanning and either 50 Hz field frequency (e.g. in Europe and China for PAL and SECAM) or 60 Hz or nearly 60 Hz field frequency (e.g. in USA and Japan for NTSC), denoted 50 i and 60 i, respectively. However, movies are produced with 24 Hz frame frequency and progressive scanning, denoted 24 p, a value that, expressed in interlaced format, would correspond to 48 i.
At present, conversion of 24 p movie to 60 Hz interlaced display is handled by ‘3:2 pull-down’ as shown in
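By way of illustration, the 3:2 pull-down cadence can be sketched as follows. This is a simplified sketch (the function name and field labels are assumptions, and field parity alternation is ignored for brevity): every other 24 p frame contributes a repeated first field, so four film frames yield ten 60 i fields.

```python
def pulldown_3_2(frames):
    """Map 24p frames to 60i fields using the repeating 3:2 cadence.

    Each frame contributes its top ('t') and bottom ('b') fields;
    every other frame additionally repeats its first field, so four
    frames yield ten fields (24 frames/s -> 60 fields/s)."""
    fields = []
    for i, frame in enumerate(frames):
        fields += [(frame, 't'), (frame, 'b')]
        if i % 2 == 0:  # every other frame gets a third (repeated) field
            fields.append((frame, 't'))
    return fields

# Four film frames A..D become ten fields: A-A-A-B-B-C-C-C-D-D
print(pulldown_3_2(['A', 'B', 'C', 'D']))
```

Note that the repeated fields occur at fixed, regular positions, which is acceptable at 60 Hz but, as discussed below, produces visible judder when a comparable regular scheme is applied for 25 fps conversion.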
It is desirable that distribution media have a single-format video and audio track that is playable worldwide, rather than the current situation where at least a 50 Hz and a 60 Hz version exist of each packaged media title, e.g. DVD. Because many sources consist of 24 fps (frames per second) film, this 24 p format is the preferred format for such single-format video tracks, which therefore needs to be adapted at play-back time for correct display on display devices both in the 50 Hz and in the 60 Hz countries.
The following solutions are known for 24 p to 25 p or 50 i conversion or, more generally, for 25 fps conversion:
At present, conversion of original 24 p format movie video and audio data streams to 50 Hz interlaced display is carried out by replaying the movie about 4% faster (a speed-up factor of 25/24 ≈ 1.042). This means, however, that in 50 Hz countries the artistic content of the movie (its duration, the pitch of voices) is modified. Field/frame repetition schemes similar to 3:2 pull-down are not used since they show unacceptable motion judder artefacts when applied in a regular manner, such as inserting one extra field every 12 frames.
A problem to be solved by the invention is to provide a field or frame insertion scheme for conversion from 24 p format to 25 fps format in an improved manner thereby minimising motion judder artefacts. This problem is solved by the method disclosed in claim 1. An apparatus that utilises this method is disclosed in claim 2.
The characteristics of a current movie scene such as global motion, brightness/intensity level and scene change locations are evaluated in order to apply duplicated or repeated frames/fields at subjectively non-annoying locations. In other words, the invention uses relatively easily available information about the source material to be converted from 24 p to 25 fps for adaptively inserting repeated fields/frames at non-equidistant locations where the resulting insertion artefacts are minimal.
Advantageously, the invention can be used for all frame rate conversion problems where there is a small difference between source frame rate and destination frame rate. If these frame rates differ a lot, such as in 24 fps to 30 fps conversion, there is hardly any freedom left for shifting the fields or frames to be repeated in time.
The invention facilitates computationally inexpensive conversion from 24 fps to 25 fps format picture sequences (example values) with minimised motion judder.
In principle, the inventive method is suited for controlling the insertion of additional fields or frames into a first format picture sequence in order to construct therefrom a second format picture sequence the frame frequency of which is constant and is greater than that of the first format picture sequence, the method including the steps:
In principle the inventive apparatus is suited for controlling the insertion of additional fields or frames into a first format picture sequence in order to construct therefrom a second format picture sequence the frame frequency of which is constant and is greater than that of the first format picture sequence, said apparatus including means that are adapted
Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
In
Instead of a disc player, the invention can also be used in other types of devices, e.g. a digital settop box or a digital TV receiver, in which case the front-end including the disk drive and the track buffer is replaced by a tuner for digital signals.
For carrying out the inventive adaptive insertion of repeated fields or frames at non-equidistant (or irregular) locations, corresponding control information is required. Content information and picture signal characteristics about the source material become available as soon as the picture sequence is compressed by a scheme such as MPEG-2 Video, MPEG-4 Video or MPEG-4 Video part 10, which will presumably be used not only for current-generation broadcast and packaged media such as DVD but also for future media such as disks based on blue laser technology.
Picture signal characteristics or information that is useful in the context of this invention are:
Such picture signal characteristics can be transferred from the encoder via a disk or via broadcast to the decoder as MPEG user data or private data. Alternatively, the video decoder can collect or calculate and provide such information.
In order to exploit motion vector information, the set of motion vectors MV for each frame is collected and processed such that it can be determined whether a current frame has large visibly moving areas, since such areas suffer most from motion judder when duplicating frames or fields. To determine the presence of such areas the average absolute vector length AvgMVi can be calculated for a frame as an indication for a panning motion:
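The averaging expression itself is not reproduced in this text; under the assumption that it is a plain mean of the absolute vector lengths over the whole vector field (an assumption consistent with the definitions of VX and VY given below), it would read:

```latex
\mathrm{AvgMV}_i \;=\; \frac{1}{VX \cdot VY}
  \sum_{x=1}^{VX}\sum_{y=1}^{VY} \bigl|\, \overrightarrow{MV}_{x,y} \bigr|
```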
with ‘i’ denoting the frame number and ‘VX’ and ‘VY’ being the numbers of motion vectors in the x (horizontal) and y (vertical) directions of the image. VX and VY are typically obtained by dividing the image size in the respective direction by the block size used for motion estimation.
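The averaging described above can be sketched as follows. This is a hedged illustration, not the original disclosure: the function name is an assumption, and a plain mean of absolute vector magnitudes over the VX x VY vector grid is assumed.

```python
import math

def avg_mv(mv_field):
    """Average absolute motion vector length over one frame.

    mv_field: VY x VX grid (list of rows) of (vx, vy) block motion
    vectors. A large value indicates global motion such as panning,
    which suffers most from judder when frames/fields are repeated."""
    total, count = 0.0, 0
    for row in mv_field:
        for vx, vy in row:
            total += math.hypot(vx, vy)  # absolute vector length
            count += 1
    return total / count if count else 0.0

# Uniform pan of (4, 0) samples per frame over a 2x3 vector grid
pan = [[(4, 0)] * 3 for _ in range(2)]
print(avg_mv(pan))  # 4.0
```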
If motion vectors within one frame point to different reference frames at different temporal distance to the current frame, a normalising factor RDistx,y for this distance is required in addition:
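This normalisation can be sketched as follows (again a hedged illustration with assumed names): each vector length is divided by the temporal distance to its reference frame before averaging, so that vectors pointing to farther references are scaled down to per-frame-interval motion.

```python
import math

def avg_mv_normalised(mv_field, ref_dist):
    """Average motion vector length with per-vector normalisation by
    the temporal distance (in frame intervals) to that vector's
    reference frame. ref_dist has the same grid layout as mv_field."""
    total, count = 0.0, 0
    for row_v, row_d in zip(mv_field, ref_dist):
        for (vx, vy), dist in zip(row_v, row_d):
            total += math.hypot(vx, vy) / dist
            count += 1
    return total / count if count else 0.0

# Two vectors of the same per-frame motion, one predicted from a
# reference two frame intervals away: both normalise to length 4.
mvs  = [[(4, 0), (8, 0)]]
dist = [[1, 2]]
print(avg_mv_normalised(mvs, dist))  # 4.0
```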
In another embodiment using more complex processing, a motion segmentation of each image is calculated, i.e. one or more clusters of adjacent blocks having motion vectors with similar length and direction are determined, in order to detect multiple large-enough moving areas with different motion directions. In such case the average motion vector can be calculated for example by:
wherein AvgMVc is the average motion vector length for the identified cluster ‘c’.
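A crude sketch of such a motion segmentation is given below. The clustering criterion and thresholds are assumptions for illustration only: blocks whose vectors fall into the same quantisation bin are grouped, clusters below a minimum size are discarded, and the average vector length is reported per surviving cluster.

```python
from collections import defaultdict
import math

def cluster_avg_mv(mv_field, min_blocks=4, quant=2.0):
    """Crude motion segmentation: group blocks whose motion vectors
    fall into the same quantisation bin (similar length and
    direction), discard clusters smaller than min_blocks, and return
    the average vector length AvgMVc per remaining cluster."""
    clusters = defaultdict(list)
    for row in mv_field:
        for vx, vy in row:
            key = (round(vx / quant), round(vy / quant))
            clusters[key].append(math.hypot(vx, vy))
    return {k: sum(v) / len(v) for k, v in clusters.items()
            if len(v) >= min_blocks}

# A 3x4 pan of (6, 0) plus one small randomly moving object (1, 1):
# the object does not form a large-enough cluster and is ignored.
field = [[(6, 0)] * 4 for _ in range(3)]
field[1][2] = (1, 1)
print(cluster_avg_mv(field))  # {(3, 0): 6.0}
```

This reflects the advantage stated above: isolated small-object vectors drop out of the per-cluster averages.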
Advantageously this approach eliminates the effect of motion vectors of randomly moving small objects within an image that are not members of any identified block cluster and that do not contribute significantly to motion judder visibility.
The processing may take into account as weighting factors for AvgMVi whether the moving areas are strongly textured or have sharp edges, as this also increases visibility of motion judder. Information about texture strength can be derived most conveniently from a statistical analysis of transmitted or received or replayed AC transform coefficients for the prediction error. In principle, texture strength should be determined from analysing an original image block, however, in many cases such strongly textured blocks after encoding using motion compensated prediction will also have more prediction error energy in their AC coefficients than less textured blocks. The motion judder tolerance MJT at a specific temporal location of the video sequence can, hence, be expressed as:
MJT=f(AvgMV, texture strength, edge strength) (3)
with the following general characteristics:
Preferably the current size of the motion judder tolerance value influences the distribution, as depicted in
One possible solution for this control problem is depicted in
FRD=f(VD, MJT), (4)
with the following general characteristics:
This relation can be expressed in a characteristic of FRD=f(VD) that changes depending on the motion judder tolerance value, as is the case in
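One possible shape of such a control function is sketched below. All names, constants and the exact characteristic are assumptions for illustration, not the original disclosure: the frame repetition distance FRD shrinks as the accumulated audio/video delay VD grows, and shrinks further at locations with high motion judder tolerance MJT, so that repetitions gather where they are least visible.

```python
def frame_repetition_distance(vd, mjt, base=24, vd_max=80.0):
    """Sketch of FRD = f(VD, MJT).

    vd:  accumulated audio/video delay in ms (0 = in sync).
    mjt: motion judder tolerance at the current location, 0..1.
    Returns the distance, in frames, until the next repetition."""
    urgency = min(vd / vd_max, 1.0)          # 0 = in sync, 1 = at bound
    distance = base * (1.0 - urgency) * (2.0 - mjt)
    return max(1, round(distance))           # repeat at least next frame

# In sync within a judder-tolerant scene: one repeat per nominal cadence
print(frame_repetition_distance(vd=0, mjt=1.0))   # 24
# Near the delay bound: repeat almost immediately
print(frame_repetition_distance(vd=78, mjt=0.5))  # 1
```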
Since a short freeze-frame effect at scene change locations is not considered as being annoying, scene change information generated by a video encoder (or by a video decoder) can be used to insert one or more repeated fields or frames at such locations, the number of repetitions depending on the current degree of video delay. For the same reason, repeated fields or frames can be inserted after a fade-to-black sequence, a fade-to-white sequence or a fade to any colour. All such singular locations have a very high MJT value.
Notably, repeated frames could be used at such locations even if, for other picture content, only fields would be repeated in order to reduce motion judder intensity at individual locations. Generally, repeated frames and repeated fields may co-exist in a converted picture sequence.
Typically accepted delay bounds for perceived lip-sync need only be observed if at least one speaker is actually visible within the scene. Hence, the delay between audio and video presentation can become larger than the above-mentioned bounds while no speaker is visible. This is typically the case during fast motion scenes. Hence an additional control can be carried out as shown in
A detection of speech can be derived for example in case of the mostly-used multi-channel audio by evaluating the centre channel relative to left and right channels, as speech in movies is mostly coded into the centre channel. If the centre channel shows a bursty energy distribution over time that is significantly different from the energy distribution in the left and right channels, then the likelihood of speech being present is high.
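A minimal sketch of such a detector is given below, with an assumed burstiness measure (variance of the energy envelope relative to its squared mean) and an assumed threshold; neither is specified in the original text.

```python
def speech_likely(centre, left, right, burst_ratio=2.0):
    """Heuristic dialogue detector for multi-channel audio.

    centre/left/right: short-term energy envelopes of the channels.
    Speech in movies is mostly mixed into the centre channel, so a
    bursty centre envelope that stands out against the left/right
    envelopes suggests that a speaker is audible."""
    def burstiness(env):
        mean = sum(env) / len(env)
        var = sum((e - mean) ** 2 for e in env) / len(env)
        return var / (mean * mean) if mean else 0.0
    lr = (burstiness(left) + burstiness(right)) / 2.0
    return burstiness(centre) > burst_ratio * max(lr, 1e-9)

# Bursty centre (dialogue pauses) vs. steady ambience left/right
centre = [0.1, 0.9, 0.1, 0.8, 0.1, 0.9]
ambience = [0.5] * 6
print(speech_likely(centre, ambience, ambience))  # True
```

When this detector reports no likely speech, the delay bound used by the repetition control can be relaxed as described above.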
All the above controls for adaptively determining the local frame repetition distance work in a single pass through the video sequence. However, the inventive control benefits from two-pass encoding as carried out in many professional MPEG-2 encoders. In that case the first pass is used to collect the motion intensity curve, the scene cut locations and their count, and the number, location and length of scenes which require tight lip-sync, black frames, etc. Then a modified control scheme can be applied that takes into account available information not only for the currently processed frame and its past, but also for a neighbourhood of past and future frames:
FRD(i)=f(VD, MJT(i−k), . . . , MJT(i+k)), (5)
wherein ‘i’ denotes the current frame number and ‘k’ denotes a running number referencing the adjacent frames. A general characteristic of each such function is that FRD increases if MJT(i) is smaller than the surrounding MJT values and decreases if MJT(i) is larger than the surrounding MJT values. Related picture signal characteristics can be transferred as MPEG user data or private data from the encoder via a disk or via broadcast signal to the decoder.
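A sketch of this two-pass variant follows. The specific weighting is an assumption for illustration: MJT(i) is compared against the mean of its 2k neighbours, so that the repetition distance grows where the current frame is locally judder-sensitive and shrinks where it is locally tolerant, exactly the general characteristic stated above.

```python
def frd_two_pass(i, vd, mjt, k=3, base=24, vd_max=80.0):
    """Two-pass repetition-distance control (sketch).

    i:   current frame number.
    vd:  accumulated audio/video delay in ms.
    mjt: per-frame motion judder tolerance values (full sequence,
         available after the first encoding pass)."""
    lo, hi = max(0, i - k), min(len(mjt), i + k + 1)
    local_mean = sum(mjt[lo:hi]) / (hi - lo)
    urgency = min(vd / vd_max, 1.0)
    distance = base * (1.0 - urgency)
    # FRD grows where MJT(i) is below its surroundings, shrinks above
    distance *= 1.0 + (local_mean - mjt[i])
    return max(1, round(distance))

mjt = [0.5, 0.5, 0.2, 0.5, 0.5, 0.9, 0.5, 0.5]
# A locally judder-sensitive frame (index 2) defers repetition ...
print(frd_two_pass(2, vd=0, mjt=mjt))
# ... while a locally tolerant frame (index 5) attracts it.
print(frd_two_pass(5, vd=0, mjt=mjt))
```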
In another embodiment of the invention, under specific circumstances motion compensated interpolation of frames rather than repetition of frames can be applied without computational expense. Such motion compensated interpolation can make use of the transmitted motion vectors for the current frame. In general, these motion vectors are not suitable for motion compensated frame interpolation since they are optimised for optimum prediction gain rather than indicating the true motion of a scene. However, if a decoder analysis of received motion vectors shows that a homogeneous panning of the scene occurs, a highly accurate frame can be interpolated between the current and the previous frame. Panning means that all motion vectors within a frame are identical or nearly identical in length and orientation. Hence an interpolated frame can be generated by translating the previous frame by half the distance indicated by the average motion vector for the current frame. It is assumed that the previous frame is the reference frame for the motion compensated prediction of the current frame and that the interpolated frame is equidistantly positioned between the previous and current frame. If the prediction frame is not the previous frame, adequate scaling of the average motion vector is to be applied.
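The translation step for a detected homogeneous pan can be sketched as follows. The padding policy at the uncovered edge is an assumption (edge-sample repetition); the original text does not specify it.

```python
def interpolate_pan(prev_frame, avg_mv):
    """Motion compensated interpolation for a homogeneous pan:
    translate the previous frame by half the average motion vector
    to synthesise the temporal midpoint between previous and current
    frame. Frames are row-major 2-D lists of samples; uncovered
    samples are padded by repeating the nearest edge sample."""
    dx = round(avg_mv[0] / 2)
    dy = round(avg_mv[1] / 2)
    h, w = len(prev_frame), len(prev_frame[0])
    out = []
    for y in range(h):
        src_y = min(max(y - dy, 0), h - 1)
        row = []
        for x in range(w):
            src_x = min(max(x - dx, 0), w - 1)
            row.append(prev_frame[src_y][src_x])
        out.append(row)
    return out

# A pan of (2, 0) samples/frame: the midpoint frame is shifted by one
frame = [[0, 1, 2, 3]]
print(interpolate_pan(frame, (2, 0)))  # [[0, 0, 1, 2]]
```

If the reference frame is not the immediately previous frame, the average motion vector would be scaled accordingly before the translation, as stated above.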
The corresponding considerations are true for the case where a zoom can be determined from the received motion vectors. A zoom is characterised by zero motion vectors in the zoom centre and increasing length of centre-(opposite)-directed motion vectors around this zoom centre, the motion vector length increasing in relation to the distance from the zoom centre.
Advantageously this kind of motion compensated interpolation yields an improved motion judder behaviour compared to repeating a frame, as is illustrated in
The above-disclosed controls for frame and/or field repetition and interpolation for frame rate conversion can be applied both at the encoder and at the decoder side of an MPEG-2 (or similar) compression system, since most side information is available at both sides, possibly except for reliable scene change indication.
However, in order to exploit the encoder's superior knowledge of the picture sequence characteristics, the locations of fields or frames to be repeated or interpolated can be conveyed in the (MPEG-2 or otherwise) compressed 24 fps video signal. Flags indicating the temporal order of fields (top_field_first) and the repetition of the first field for display (repeat_first_field) already exist in the MPEG-2 syntax. If the conversion pattern must be signalled both for 24 fps to 30 fps and for 24 fps to 25 fps conversion of the same video signal, one of the two series of flags may be conveyed in a suitable user data field for each picture.
The values 24 fps and 25 fps and the other numbers mentioned above are example values which can be adapted correspondingly to other applications of the invention.
The invention can be applied for:
The invention can be applied in an optical disc player or in an optical disc recorder, or in a harddisk recorder, e.g. an HDD recorder or a PC, or in a settop box, or in a TV receiver.
| Number | Date | Country | Kind |
|---|---|---|---|
| 04090021.9 | Jan 2004 | EP | regional |
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/EP2004/012483 | 11/4/2004 | WO | 00 | 7/14/2006 |