This invention generally relates to video processing, and, in particular, to motion detection in video fields for deinterlacing of the video fields for display.
Video frames are typically encoded in an interlaced format comprising a first video field (e.g., a top video field) and a second video field (e.g., a bottom video field), where each video field contains alternating lines of the video frame and the two fields are temporally separated. Video images are typically encoded and transmitted to a receiver in such an interlaced format as a compromise between bandwidth and video image resolution. Since interlaced video frames are displayed using only half the lines of a full video frame, less system bandwidth is required to process and display them. However, since the human eye typically cannot resolve a single video field, but rather blends the first field and the second field, the perceived image has the vertical resolution of both fields combined.
Some types of receivers, including computers, televisions, mobile phones, computing tablets, etc., may require the use of de-interlaced video frames instead of interlaced video frames. For such receivers, the video frames encoded in an interlaced format must be de-interlaced prior to display. Typically, any missing pixels from the video frame are interpolated using the pixels of the first video field and the second video field.
There are several well-known methods of constructing de-interlaced video frames. One such method, commonly referred to as the “bob” method, constructs a de-interlaced video frame from a single video field that is vertically interpolated. In motion-adaptive de-interlacing, whether to rely on spatial or temporal interpolation for a given pixel is decided by detecting the motion of the subject in the picture. Specifically, spatial interpolation is used to interpolate image data for pixels that are sensing a subject in motion, and temporal interpolation is used to interpolate image data for pixels that are sensing a motionless subject. By switching interpolation methods according to the state of motion of the subject being sensed by individual pixels, it is possible to faithfully reproduce the sensed subject in each field of the picture being played back.
Conventionally, such motion detection is achieved by calculating differences in the image data of identical pixel positions among even-numbered and odd-numbered fields, and then comparing those differences with a predetermined threshold value. If a difference is greater than the threshold value, the subject being sensed by the pixel in question is recognized to be in motion.
In this way, whether the subject being sensed by the pixels for which image data is to be interpolated is in motion is judged by comparing the field-to-field differences of the image data of identical pixels with a constant, predefined threshold value. However, as long as the threshold level is kept constant, erroneous judgments can occur. For example, where motion was present up to the field immediately preceding the one currently being reproduced but is no longer present in the current field, the motion recognized up to the previous field leads to the erroneous judgment that motion is still present in the current field. In addition, the predefined threshold value bears no relationship to any of the other pixels in the current video field or frame, which can lead to inaccurate results during interpolation.
This makes faithful reproduction of the real image impossible and sometimes causes flickering or similar artifacts while a motion picture is being played back. Therefore, it would be desirable to provide new methods and systems for motion detection in video fields that can use an adaptive threshold value, conserve processing power, and increase available system bandwidth.
An object of this invention is to provide methods and systems for motion detection in video fields that are adaptive to texture information and combing information of the video fields.
Another object of this invention is to provide methods and systems for motion detection in video fields that minimize the use of memory.
Yet another object of this invention is to provide methods and systems for motion detection in video fields that can be used for deinterlacing.
Briefly, the present invention discloses methods and systems for detecting motion in video fields of video data, comprising the steps of: calculating texture information for a pixel in the video fields; determining a threshold value as a function of the calculated texture information; calculating a differential value for the pixel; and detecting motion in the video fields as a function of the determined threshold value and the calculated differential value.
An advantage of this invention is that methods and systems for motion detection in video fields are provided that are adaptive to texture information and combing information of the video fields.
Another advantage of this invention is that methods and systems for motion detection in video fields are provided that minimize the use of memory.
Yet another advantage of this invention is that methods and systems for motion detection in video fields are provided that can be used for deinterlacing.
The foregoing and other objects, aspects, and advantages of the invention can be better understood from the following detailed description of the preferred embodiment of the invention when taken in conjunction with the accompanying drawings.
In the following detailed description of the embodiments, reference is made to the accompanying drawings, which form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the present invention may be practiced.
Texture information can be calculated 10 for a pixel in the video fields to detect motion for the pixel. For instance, texture information can include a vertical in-field texture value txt1 and a horizontal intra-field texture value txt3. The vertical in-field texture value txt1 can be calculated for a pixel having a position (x,y) in a current video field at time t by applying a low pass filter (“lpf”) to the absolute difference in luma values f(a,b,c) of neighboring vertical pixels (i.e., pixels at (x, y−1) and (x, y+1)) in a previous field (i.e., time=t−1). The following equation can be used to calculate txt1:
txt1(y,x,t−1)=lpf*|f(y−1,x,t−1)−f(y+1,x,t−1)| Equation [1]
The horizontal intra-field texture value txt3 can be calculated for the pixel having position (x,y) in the current video field at the time t by applying a min, max, or average function G to textures from different video fields. Each texture txt is based on a single video field, where the video fields can be of the same polarity. Same polarity can mean that the two video fields are both top fields or both bottom fields. The texture txt3 can be calculated using the following equation:
txt3(y,x,t−1)=G(txt(y,x,t), txt(y,x,t−2)) Equation [2]
where txt(a,b,c)=lpf*G(|f(a−2,b,c)−f(a,b,c)|, |f(a+2,b,c)−f(a,b,c)|).
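As a concrete illustration, the following Python sketch computes txt1 and txt3 for one pixel from Equations [1] and [2]. It is a minimal sketch under stated assumptions: the exact low pass filter is not specified above, so a 3-tap horizontal average is assumed, G is taken to be max, and the field arrays and coordinates are illustrative.

```python
import numpy as np

def lpf3(values):
    # Assumed low pass filter: a simple 3-tap average; the text leaves
    # the exact filter ("lpf") unspecified.
    return float(np.mean(values))

def txt1(f_prev, y, x):
    # Equation [1]: low-pass-filtered absolute difference of the luma
    # values one line above and one line below (y, x) in the previous
    # field (time t-1).
    diffs = [abs(f_prev[y - 1, x + k] - f_prev[y + 1, x + k]) for k in (-1, 0, 1)]
    return lpf3(diffs)

def txt(field, y, x):
    # Single-field texture from the definition under Equation [2]:
    # G over the absolute differences of lines two apart (i.e., adjacent
    # lines of the same field), followed by the low pass filter.
    d_up = [abs(field[y - 2, x + k] - field[y, x + k]) for k in (-1, 0, 1)]
    d_dn = [abs(field[y + 2, x + k] - field[y, x + k]) for k in (-1, 0, 1)]
    return lpf3([max(u, d) for u, d in zip(d_up, d_dn)])  # G = max (assumed)

def txt3(f_cur, f_prev2, y, x):
    # Equation [2]: G over the textures of two same-polarity fields
    # (times t and t-2), again with G = max assumed.
    return max(txt(f_cur, y, x), txt(f_prev2, y, x))

# Illustrative usage with random luma fields.
rng = np.random.default_rng(0)
f_t, f_tm1, f_tm2 = (rng.random((16, 16)) for _ in range(3))
print(txt1(f_tm1, 8, 8), txt3(f_t, f_tm2, 8, 8))
```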
An adaptive threshold thd can be calculated 12 for motion detection as a function of the calculated texture information. For instance, the adaptive threshold can equal:
thd=α*txt1+β*txt3 Equation [3]
where α and β are predefined constants. The predefined constants α and β can be determined based upon empirical and/or statistical analysis. For instance, α can be set to 0.3 and β can be set to 0.5 for luma components.
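In code, Equation [3] is a one-line combination; the defaults below are the example luma constants given above:

```python
def adaptive_threshold(txt1_val, txt3_val, alpha=0.3, beta=0.5):
    # Equation [3]: thd = alpha*txt1 + beta*txt3, with the example luma
    # constants from the text as defaults.
    return alpha * txt1_val + beta * txt3_val
```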
In addition, differential values can be calculated 14 for the pixel in the various video fields. The differential value for the pixel is calculated based on the difference of the luma values f(a,b,c) for that pixel position in two or more video fields. Alternatively, chroma values for the pixel position can be used, as well as other video characteristics for the pixel position. Typically, the difference in luma values can be based on the luma value in the current field (e.g., time=t) and the luma value in the previous field that has the same polarity as the current field (e.g., two fields away, time=t−2). This differential value can be referred to as Dif02 for the pixel, which can equal:
Dif02(y,x,t,t−2)=f(y,x,t)−f(y,x,t−2) Equation [4]
Additionally, other differential values can be calculated depending on the number of video fields used to detect motion for a particular pixel of the current video field. In the current example, there are a total of three video fields, including the current video field, the previous video field, and the previous, previous video field. Thus, the other differentials can equal:
Dif01(y,x,t,t−1)=f(y,x,t)−f(y,x,t−1) Equation [5]
Dif21(y,x,t−2,t−1)=f(y,x,t−2)−fi(y,x,t−1) Equation [6]
where fi(a,b,c) is an interpolated value for the pixel at the given time. The other differentials Dif01 and Dif21 can also be used for determining an adaptive combing value (not shown in the flow chart, but illustrated in the block diagram described below).
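The following sketch implements Equations [4] through [6]. The interpolator fi is not defined above, so a simple vertical line average is assumed for illustration; the field arrays follow the same (y, x) layout as the texture sketch.

```python
def dif02(f_t, f_tm2, y, x):
    # Equation [4]: luma difference between the current field (time t)
    # and the previous same-polarity field (time t-2).
    return f_t[y, x] - f_tm2[y, x]

def dif01(f_t, f_tm1, y, x):
    # Equation [5]: luma difference between fields at times t and t-1.
    return f_t[y, x] - f_tm1[y, x]

def fi(field, y, x):
    # Assumed interpolator for fi(a,b,c): the text does not define it,
    # so a vertical average of the neighboring lines is used here.
    return 0.5 * (field[y - 1, x] + field[y + 1, x])

def dif21(f_tm2, f_tm1, y, x):
    # Equation [6]: luma difference between field t-2 and an interpolated
    # value from field t-1.
    return f_tm2[y, x] - fi(f_tm1, y, x)
```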
Motion detection can be determined for the pixel in the video fields 16 as a function of the determined threshold 12 and the calculated differential values 14. For instance, the differential value Dif02 and the adaptive threshold are compared to determine whether motion is detected for the respective pixel. If the differential value Dif02 is greater than the adaptive threshold, then motion is detected for the pixel; otherwise, motion is not detected for the pixel.
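Combining the sketches above gives a minimal per-pixel detector. It assumes the helpers from the earlier sketches are in scope, and compares the magnitude of Dif02 against the threshold, since the luma difference is signed:

```python
def detect_motion(f_t, f_tm1, f_tm2, y, x):
    # Per-pixel motion decision 16: 1 if |Dif02| exceeds the adaptive
    # threshold of Equation [3], else 0. Reuses txt1, txt3,
    # adaptive_threshold, and dif02 from the sketches above.
    thd = adaptive_threshold(txt1(f_tm1, y, x), txt3(f_t, f_tm2, y, x))
    return 1 if abs(dif02(f_t, f_tm2, y, x)) > thd else 0
```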
The luma functions for a current field f(t), a first previous field f(t-1), and a second previous field f(t-2) are inputted to the differential calculator 40. The differential calculator calculates various differential values for a pixel position in the various fields, including the differential values Dif02, Dif01, and Dif21. The differential values Dif01 and Dif21 are inputted to the adaptive combing block 44 for determining an adaptive combing value cmb. The adaptive combing value can be equal to the following:
cmb=G(Dif01(y,x,t,t−1), Dif21(y,x,t−2,t−1)) Equation [7]
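A sketch of Equation [7], assuming G = max over the magnitudes of the two differentials (the text leaves both choices open) and reusing dif01 and dif21 from the differential sketch:

```python
def combing(f_t, f_tm1, f_tm2, y, x):
    # Equation [7]: adaptive combing value from the Dif01 and Dif21
    # differentials; G = max over magnitudes is an assumed choice.
    return max(abs(dif01(f_t, f_tm1, y, x)),
               abs(dif21(f_tm2, f_tm1, y, x)))
```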
The differential value Dif02 is inputted to the motion comparison block 48 and the field motion calculator 50.
The luma functions for the current field f(t), the first previous field f(t−1), and the second previous field f(t−2) are also inputted to the texture information calculator 42. The texture information calculator 42 determines texture information for the video fields, including txt1 and txt3. The determined texture information is inputted to the adaptive threshold calculator 46 and the field motion calculator 50.
The adaptive threshold calculator 46 receives the texture information and the combing value cmb, and calculates an adaptive threshold. The adaptive threshold can be updated as needed or required. For instance, the threshold result of Equation [3] can be compared with a scaled combing value, and the larger of the two values can be selected as the new adaptive threshold value.
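In code, that update is a max of the Equation [3] result and a scaled combing value; the scale factor here is a placeholder for a register-programmed value:

```python
def updated_threshold(thd, cmb, scale=0.5):
    # Keep the larger of the Equation [3] threshold and the scaled
    # combing value; scale=0.5 is an assumed placeholder.
    return max(thd, scale * cmb)
```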
The adaptive threshold is outputted to the motion comparison block 48. The motion comparison block 48 compares the differential value Dif02 and the adaptive threshold to detect motion for the respective pixel. The motion comparison block 48 can output a one-bit motion value (e.g., denoted mot(y,x,t)) to indicate whether motion has been detected for the respective pixel. For instance, a motion value of 0 can indicate that no motion is detected for the pixel, and a motion value of 1 can indicate that motion is detected for the pixel. Thus, the motion value is equal to one when Dif02 is greater than the adaptive threshold, and equal to zero in all other cases. Next, the motion value can be stored by the motion field delay 54 and inputted to the sure motion calculator 52.
The motion field delay 54 can store a motion value for each of the pixels of the current field and other fields. A motion history (“hist”) can be maintained according to the following equation:
hist(y,x,t)=L(hist, mot), Equation [8]
where hist is an iterative function indicating the motion/static history of a video field. For instance, assume hist(y,x,t−2)=[mot(y,x,t−8), mot(y,x,t−6), mot(y,x,t−4), mot(y,x,t−2)]; then hist(y,x,t) can comprise 4 bits for each pixel, since hist(y,x,t)=L(hist(y,x,t−2), mot(y,x,t)), where L is a function operator for the various motion values.
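One natural reading of Equation [8] is a shift-register update, with L shifting the newest one-bit motion value into a 4-bit per-pixel history; the width and shift direction below are assumptions consistent with the 4-bit example above:

```python
def update_hist(hist, mot, bits=4):
    # Equation [8] with L read as a shift register: drop the oldest
    # motion bit and shift in the newest. hist holds the motion values
    # of the last `bits` same-polarity fields; mot is the new one-bit
    # motion value.
    mask = (1 << bits) - 1
    return ((hist << 1) | (mot & 1)) & mask

# Example: a pixel static for two fields, then moving for two.
h = 0b0000
for mot in (0, 0, 1, 1):
    h = update_hist(h, mot)
print(bin(h))  # 0b11: motion in the two most recent same-polarity fields
```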
The motion values and hist can be read, and then used by the sure motion calculator 52 to confirm that motion is being detected for the current field (i.e., mot(y,x,t)) and/or other fields (e.g., from the motion field delay 54, including hist(y,x,t−2), hist(y−1,x,t−1), and hist(y+1,x,t−1)). If motion is confirmed, then the sure motion calculator 52 can send a sure motion flag to the alpha calculator 56. If stillness is confirmed instead, the sure motion calculator 52 can send a sure still flag to the alpha calculator 56.
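The confirmation rule itself is not spelled out above, so the sketch below assumes a deliberately strict one: motion is “sure” only when the current motion bit and every history bit agree on motion, and “sure still” only when they all agree on stillness; anything in between raises neither flag. The function name and rule are hypothetical.

```python
def sure_flags(mot, hists, bits=4):
    # Hypothetical confirmation rule for the sure motion calculator 52.
    # hists is an iterable of history words, e.g. hist(y,x,t-2),
    # hist(y-1,x,t-1), and hist(y+1,x,t-1).
    full = (1 << bits) - 1
    p_sm = mot == 1 and all(h == full for h in hists)  # sure motion (pSM)
    p_ss = mot == 0 and all(h == 0 for h in hists)     # sure still (pSS)
    return p_sm, p_ss
```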
The field motion calculator 50 can receive the differential value Dif02, the adaptive combing value cmb, and the texture information for generating a field motion value for use by the alpha calculator 56. The field motion calculator 50 calculates a quantified field motion (“fMt0”) between the current video field (time=t) and the previous, previous video field (time=t−2), which can equal:
fMt0=|Dif(y,x,t,t−2)|−H(I(txt1, txt3), cmb), Equation [9]
where H(k,l)=a1*k+a2*l+a3 and I(m,n)=b1*m+b2*n+b3. The values a1, a2, a3, b1, b2, and b3 can be predefined and can be implemented by programmable registers.
The field motion calculator 50 also calculates a quantified field motion (“fMt1”) between the previous, previous video field (time=t-2) and the previous video field (time=t-1), which can equal:
fMt1=|Dif(y,x,t−2,t−1)|−K(I(txt1, txt3), cmb), Equation [10]
where K(k,l)=a1*k+a2*l+a3.
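Equations [9] and [10] share the same shape: a differential magnitude minus an affine function of the texture and combing terms. The coefficient values below are placeholders for the register-programmed a1..a3 and b1..b3:

```python
def affine2(k, l, c1, c2, c3):
    # The H, I, and K functions are all of the form c1*k + c2*l + c3.
    return c1 * k + c2 * l + c3

def field_motion(dif_abs, txt1_val, txt3_val, cmb,
                 a=(1.0, 1.0, 0.0), b=(1.0, 1.0, 0.0)):
    # Equations [9]/[10]: |Dif| - H(I(txt1, txt3), cmb), where fMt0 uses
    # |Dif(y,x,t,t-2)| and fMt1 uses |Dif(y,x,t-2,t-1)|, each with its own
    # register-programmed coefficients (placeholder values here).
    i_val = affine2(txt1_val, txt3_val, *b)   # I(txt1, txt3)
    return dif_abs - affine2(i_val, cmb, *a)  # H or K applied to (I, cmb)
```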
The alpha calculator 56 uses the field motion values and the motion flags to determine an alpha for use by a blender during the de-interlacing of the respective fields. Alpha can equal the following:
alpha=M(fMt0, mot, pSM, pSS), Equation [11]
where pSM is the sure motion flag, pSS is the sure still flag, and the function M(a,b,c,d) is as follows:
if pSM(y,x,t−1) is true, then there is motion and alpha=a maximum value (e.g., 15 for a 4-bit value);
if pSS(y,x,t−1) is true, then there is no motion and the pixel is static, thus alpha=a minimum value (e.g., 0); and
otherwise, alpha is scaled as a function of fMt0 and mot.
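A sketch of the M function of Equation [11] for a 4-bit alpha. The two flag cases follow the list above; the fallback scaling is an assumption, since the text only states that alpha is scaled as a function of fMt0 and mot:

```python
def alpha_value(f_mt0, mot, p_sm, p_ss, alpha_max=15):
    # Equation [11]: alpha = M(fMt0, mot, pSM, pSS), 4-bit alpha assumed.
    if p_sm:          # sure motion: weight fully toward spatial interpolation
        return alpha_max
    if p_ss:          # sure still: weight fully toward temporal interpolation
        return 0
    # Assumed fallback: clamp fMt0 to [0, 1] and scale linearly, damping
    # the result when the per-pixel motion bit is clear.
    scaled = int(max(0.0, min(1.0, f_mt0)) * alpha_max)
    return scaled if mot else scaled // 2
```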
While the present invention has been described with reference to certain preferred embodiments or methods, it is to be understood that the present invention is not limited to such specific embodiments or methods. Rather, it is the inventor's contention that the invention be understood and construed in its broadest meaning as reflected by the following claims. Thus, these claims are to be understood as incorporating not only the preferred apparatuses, methods, and systems described herein, but all those other and further alterations and modifications as would be apparent to those of ordinary skill in the art.
Related U.S. Application Data: provisional application 61451840, filed Mar 2011 (US); parent application PCT/US2012/028158, filed Mar 2012 (US); child application 14042601 (US).