This disclosure relates to digital video encoding and decoding and, more particularly, to telecine and inverse telecine techniques, in which the frame rate of a video sequence is changed.
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, and the like. Digital video devices implement video compression techniques, such as those described in standards defined by MPEG-2, MPEG-4, or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), to transmit and receive digital video information more efficiently. Video compression techniques may perform block-based spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences.
Telecine techniques may be used to change the frame rate of a video sequence. Telecine techniques are desirable, for example, to enable a motion picture that was originally captured on film media to be viewed with standard video equipment, such as televisions, video media players or computers. In particular, telecine techniques may be used to change a conventional video sequence from 24 frames per second (which is common with motion picture films recorded on film media) to 30 frames per second (which is common for digital video played by digital equipment).
Inverse telecine techniques perform the inverse operations of telecine techniques. Thus, if telecine techniques convert a video sequence from 24 frames per second to 30 frames per second, the inverse telecine techniques may convert the video sequence from 30 frames per second back to 24 frames per second. In some cases, telecine techniques may be performed as part of a video encoding process, while inverse telecine techniques may be performed as part of a video decoding process.
In some cases, inverse telecine can be part of a transcoding process. In this case, inverse telecine may be implemented as part of a transcoder, or as part of an encoder or a decoder. In the case of transcoding, the telecined content may be converted back to an original frame rate, such as 24 frames per second, and re-encoded according to a different encoding format. Inverse telecine, in this case, may occur prior to the transcoding process, and may be implemented in a transmitting device that sends data to the transcoder, or a receiving device that performs the transcoding.
Telecine and inverse telecine, however, are not limited to video encoding or decoding scenarios. Telecine and inverse telecine techniques may be used for many reasons independent of any spatial- or temporal-based video encoding or decoding. Basically, any time it is desirable to change the frame rate of a video sequence, telecine may provide a useful way to achieve this goal.
In general, this disclosure describes inverse telecine techniques that are performed to adjust or convert the frame rate of a video sequence. The described techniques provide a useful way to identify a telecine technique that was used to increase the frame rate of a video sequence. Upon identifying the telecine technique that was used, the corresponding inverse telecine technique can be performed with respect to the sequence of video frames in order to reduce the frame rate back to its original form (prior to telecine). This disclosure also provides many useful details of inverse telecine techniques that can improve the inverse telecine process, e.g., by simplifying the inverse telecine process and by reducing memory accesses during the process.
In one example, this disclosure provides a method comprising determining whether individual video frames in a sequence of video frames are progressive frames or interlaced frames, identifying a pattern of the progressive frames and the interlaced frames in the sequence of video frames, identifying a telecine technique based on the pattern, and performing an inverse telecine technique with respect to the sequence of video frames based on the identified telecine technique, wherein the inverse telecine technique converts N video frames per second to M video frames per second, wherein M and N are positive integers and M is less than N.
In another example, this disclosure provides an apparatus comprising an inverse telecine unit that determines whether individual video frames in a sequence of video frames are progressive frames or interlaced frames, identifies a pattern of the progressive frames and the interlaced frames in the sequence of video frames, identifies a telecine technique based on the pattern, and performs an inverse telecine technique with respect to the sequence of video frames based on the identified telecine technique, wherein the inverse telecine technique converts N video frames per second to M video frames per second, wherein M and N are positive integers and M is less than N.
In another example, this disclosure provides a device comprising means for determining whether individual video frames in a sequence of video frames are progressive frames or interlaced frames, means for identifying a pattern of the progressive frames and the interlaced frames in the sequence of video frames, means for identifying a telecine technique based on the pattern, and means for performing an inverse telecine technique with respect to the sequence of video frames based on the identified telecine technique, wherein the inverse telecine technique converts N video frames per second to M video frames per second, wherein M and N are positive integers and M is less than N.
The techniques described in this disclosure may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the software may be executed in a processor, such as a microprocessor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), or digital signal processor (DSP). The software that executes the techniques may be initially stored in a computer-readable medium and loaded and executed in the processor.
Accordingly, this disclosure also contemplates a computer-readable medium comprising instructions that when executed by a processor cause the processor to determine whether individual video frames in a sequence of video frames are progressive frames or interlaced frames, identify a pattern of the progressive frames and the interlaced frames in the sequence of video frames, identify a telecine technique based on the pattern, and perform an inverse telecine technique with respect to the sequence of video frames based on the identified telecine technique, wherein the inverse telecine technique converts N video frames per second to M video frames per second, wherein M and N are positive integers and M is less than N.
The details of one or more aspects of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the techniques described in this disclosure will be apparent from the description and drawings, and from the claims.
This disclosure describes techniques for detecting telecine and performing inverse telecine. Telecine is the process of converting the frame rate of a video sequence, and inverse telecine is the process of converting the frame rate back to the original rate. Telecine is commonly used to convert film which was shot at 24 frames per second to video at 30 frames per second (or 60 fields per second). Telecine is often performed by a procedure called 3:2 pull down, although other types of conversions could be used.
Inverse telecine is the process of reversing the telecine process, and is conceptually illustrated in
Inverse telecine algorithms, consistent with this disclosure, may analyze the frames and fields of a video sequence to determine the repeating fields, and thereby identify a particular pull down pattern. The inverse telecine techniques may use four fields in order to detect a pull down pattern and perform pull down correction. Similar techniques may use even more fields (e.g., ten fields) for telecine detection. However, the need to process such large amounts of data (e.g., ten fields or five frames) may result in high power consumption and create challenges for video decoding.
This disclosure also provides methods that may reduce the pixel area that needs to be processed during inverse telecine by selecting only the necessary portions of a frame or field. The described techniques may be independent of the actual inverse telecine algorithm and can be used with any type of inverse telecine algorithm, including 3:2 pull down, as well as numerous other types of telecine. The described techniques may involve fetching only a subset of the pixel data that might otherwise be needed from external memory, thereby reducing the number of memory accesses without degrading the performance of the inverse telecine algorithm.
Again, telecine often refers to the process of converting film to video. Film refers to photographic material typically produced for the cinema, and is commonly recorded at 24 frames per second. However, television defined by the National Television System Committee (NTSC), as well as other digital video broadcasts, may define 30 frames per second for video. Therefore, in order to display film content on NTSC compliant televisions, the film is converted to video; this conversion process is referred to as telecine. In some cases, NTSC standard conventional television systems may operate at 60 interlaced fields per second (actually 59.94 fields per second), and for the film's motion to be accurately rendered on the NTSC video signal, telecine may be needed to convert the film frame rate from 24 fps to 30 fps (i.e., approximately 60 fields per second).
Simply transferring each film frame onto each video frame would result in the film running approximately 24.9 percent faster than intended. A better solution for telecine is to repeat some film frames periodically, such as in the case of so-called “3:2 pull down,” to prevent apparent speedup of the film when the film is shown at the 30 frame per second video frame rate.
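The speed-up figure above can be checked with a short calculation (a minimal sketch; the 59.94 fields per second NTSC rate corresponds to about 29.97 frames per second):

```python
# Why direct 24-to-30 transfer runs fast: NTSC video is actually
# 59.94 fields per second, i.e., about 29.97 frames per second.
film_fps = 24.0
ntsc_fps = 60.0 / 1.001 / 2.0        # ~29.97 frames per second

speedup_percent = (ntsc_fps / film_fps - 1.0) * 100.0
print(round(speedup_percent, 1))     # ~24.9 percent faster
```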
3:2 pull down is one specific type of process of converting 24 fps film rate to 30 fps video rate. To convert the movie rate to TV rate, the 3:2 pull down repeats the film frames in a recurring 3:2 pattern, which can be seen in
The first film frame A may be separated into a top field (A1) and a bottom field (A2). Top field A1 comprises odd numbered lines, and bottom field A2 comprises even numbered lines. The top field A1 and the bottom field A2 define the first video frame as shown in
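To make the field repetition concrete, the following minimal Python sketch (with hypothetical frame labels, not taken from this disclosure) expands film frames into fields using the recurring 3:2 cadence:

```python
def pulldown_3_2(film_frames):
    """Expand film frames into fields using the recurring 3:2
    cadence: frames alternately contribute 3 and 2 fields,
    with top/bottom field parity alternating throughout."""
    cadence = [3, 2]
    fields = []
    parity = 0  # 0 = top field, 1 = bottom field
    for i, frame in enumerate(film_frames):
        for _ in range(cadence[i % 2]):
            half = "top" if parity == 0 else "bottom"
            fields.append((frame, half))
            parity ^= 1
    return fields

# Four film frames (A, B, C, D) become 10 fields, i.e., 5 video frames.
fields = pulldown_3_2(["A", "B", "C", "D"])
print(len(fields))  # 10
```

Pairing consecutive fields into video frames then yields three frames whose two fields come from the same film frame (progressive-looking) and two frames that mix fields from adjacent film frames (interlaced-looking), which is the basis for the pattern detection described in this disclosure.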
Other pull down patterns also exist and are consistent with the teaching of this disclosure. A 2:3 pull down, for example, repeats the first film frame two times and the second film frame three times. Therefore, 2:3 pull down is very similar to 3:2 pull down except it is shifted by one frame.
2:2 pull down is another common pull down pattern. It may be used, for example, when converting 24 frames per second film into video that defines 50 fields per second. In 2:2 pull down, each film frame is repeated twice, so 24 film frames become 48 fields; playing these fields at the 50 fields per second video rate speeds up the film and causes the film to run in slightly less time. A less common version of 2:2 pull down is called “2:2:2:2:2:2:2:2:2:2:2:3” pull down. This method inserts a repeated field every 12 frames, spreading 12 film frames over 25 fields of video and therefore converting 24 frames of film into 50 fields of video. Some motion pictures are telecined in this “2:2:2:2:2:2:2:2:2:2:2:3” manner. In addition to 3:2 and 2:2 pull down, less common cadences such as 5:5, 6:4 and 8:7 exist as well, and are sometimes used in Japanese animation. Other types of pull downs are also consistent with this disclosure.
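The field counts for the “2:2:2:2:2:2:2:2:2:2:2:3” cadence described above can be verified with a short calculation (a sketch, not part of the disclosure itself):

```python
# Eleven film frames contribute 2 fields each and the twelfth
# contributes 3, spreading 12 film frames over 25 video fields.
cadence = [2] * 11 + [3]
print(sum(cadence))       # 25 fields per 12 film frames
print(sum(cadence) * 2)   # 24 film frames -> 50 video fields
```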
Inverse telecine is used to reverse or “undo” the telecine process to regain the original content, e.g., at 24 frames per second. The inverse telecine technique of detecting and removing a 3:2 pull down pattern from interlaced video sources to reconstruct 24 frames per second content is referred to as either “inverse telecine” or “reverse telecine.” An illustration of the inverse telecine following telecine is shown in
Inverse telecine can be done in different ways. In some cases, the input telecined video is ingested with telecine information that shows the correspondence between the video frames and the original film frames. In these cases, the decoder (or player) device does not need to detect the pull down pattern, but can play the video based on this information (which usually exists in the form of a telecine trace text file).
Another way of performing inverse telecine is to detect the pull down pattern and reverse it without prior knowledge of the pattern, which is the basis of the techniques described herein. Sometimes, once the 3:2 pull down pattern is detected, it can be locked for the remainder of the video, and the correction of the pattern can be done based on the initially detected pattern. However, the 3:2 pull down pattern does not necessarily remain consistent throughout the entire video, and edits can be performed on film material. So-called “bad edits” can happen when the editing process eliminates film frames or, more likely, inserts video material, such as commercials or new clips, between them. A good inverse telecine algorithm should be able to identify when the 3:2 pull down pattern changes in the source and adaptively correct it. This is sometimes called “bad edit detection.”
The benefits of inverse telecine according to this disclosure may include visual quality improvements, and/or bandwidth and power savings, which will become more apparent from the description below. Specifically, inverse telecine may help to eliminate both spatial and temporal artifacts in telecined content. If the telecined content is displayed on progressive displays without de-interlacing, combing artifacts may appear, particularly at the boundaries of moving objects in a video sequence. However, if the telecined content is de-interlaced, blurring may occur. Furthermore, in addition to spatial artifacts, temporal artifacts such as motion judder may occur due to telecine. The motion judder is sometimes referred to as telecine judder, and may be particularly apparent during slow and steady camera movements. The motion judder is due to the fact that 2 fields out of every 10 fields are repeated during the 3:2 pull down process.
Furthermore, some de-interlacing algorithms, such as those that use temporal information, bias the de-interlacing filtering towards the reference (or previous) field; to the extent the reference field is repeated, this causes jerkiness as well. On the other hand, “hiccup” like artifacts may occur in material to which 2:2:2:2:2:2:2:2:2:2:2:3 pull down has been applied. Hiccup is slightly different from motion judder and occurs about twice a second in the video.
“Hard telecine” means that pull down is applied before encoding. As opposed to hard telecine, “soft telecine” does not apply pull down before encoding, but rather treats the video as 24 P (wherein P stands for progressive). Soft telecine embeds the bitstream with proper pull down flags, and pull down can be executed when displaying the content on an interlaced display. It is also important to note that most SD-DVDs are in “hard telecine” mode, and therefore inverse telecine may be needed for both progressive and interlaced displays. In hard telecine, the video becomes 60/50 I (wherein I stands for interlaced) after pull down and is stored as 60/50 I content in the video buffer in the same manner as normal interlaced content. The resulting video frames after pull down are used as reference frames for motion estimation and compensation.
In many video sequences, a 3:2 pull down process is applied to the 24 frames per second film source. The resulting 60 fields per second video can be encoded directly, or alternatively, commercials can be added to the video source and the resulting 60 fields per second video content can be encoded after editing. In this case, after the video player decodes the 60 fields per second of video content, the inverse telecine and bad edit detection techniques of this disclosure may be applied. Accordingly, if inverse telecine is detected and corrected, the true progressive 24 frames per second film is displayed. However, if telecine is not detected or does not exist (for example in the case where the input is purely interlaced content with no telecine applied to it), de-interlacing can be applied via a filter and the output device can display 30 frames per second of progressive video.
Inverse telecine is a fundamental post-processing feature. Inverse telecine may also be referred to as “film mode detection technology,” “film cadence and bad edit recovery,” “film mode detection,” and “reverse 3:2 pull down.” 3:2 pull down is widely accepted in the industry.
As shown in
In the example of
Again, the illustrated system 10 of
Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, or a video feed from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 16 may form so-called camera phones or video phones. In each case, the captured, pre-captured or computer-generated video may be telecined by telecine unit 20, and encoded by video encoder 22. The encoded video information may then be modulated by modem 23 according to a communication standard, such as code division multiple access (CDMA) or another communication standard, and transmitted to destination device 16 via transmitter 24 and communication channel 15. Modem 23 may include various mixers, filters, amplifiers or other components designed for signal modulation. Transmitter 24 may include circuits designed for transmitting data, including amplifiers, filters, and one or more antennas.
Receiver 25 of destination device 16 receives information over communication channel 15, and modem 26 demodulates the information. Like transmitter 24, receiver 25 may include circuits designed for receiving data, including amplifiers, filters, and one or more antennas. In some instances, transmitter 24 and/or receiver 25 may be incorporated within a single transceiver component that includes both receive and transmit circuitry. Modem 26 may include various mixers, filters, amplifiers or other components designed for signal demodulation. In some instances, modems 23 and 26 may include components for performing both modulation and demodulation. Video decoder 28 performs block based video decoding, e.g., to reconstruct the encoded video blocks that were encoded by video encoder 22. Inverse telecine unit 29 then performs inverse telecine with respect to the decoded video.
The inverse telecine process performed by destination device 16 may be performed during video decoding, although aspects of this disclosure might also be performed without block-based video decoding. In particular, inverse telecine unit 29 may perform the inverse telecine techniques, as described herein, to convert the frame rate of a video sequence back to the original film rate (e.g., to “undo” the telecine performed by telecine unit 20 of source device 12).
More specifically, inverse telecine unit 29 may determine whether individual video frames in a sequence of video frames are progressive frames or interlaced frames, identify a pattern of the progressive frames and the interlaced frames in the sequence of video frames, identify a telecine technique based on the pattern, and perform an inverse telecine technique with respect to the sequence of video frames based on the identified telecine technique. In this case, the inverse telecine technique converts N video frames per second to M video frames per second, wherein M and N are positive integers and M is less than N. Accordingly, inverse telecine reduces the frame rate back to the original film rate associated with the video sequence as it was originally recorded onto film media.
Video decoder 28 may include motion estimation and motion compensation components for temporal-based decoding. In addition, video decoder 28 may include spatial estimation and intra coding units for spatial-based decoding. Display device 30 displays the decoded video data to a user following the inverse telecine process, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
In the example of
Video encoder 22 and video decoder 28 may operate according to a video compression standard, such as the ITU-T H.264 standard, alternatively described as MPEG-4, Part 10, Advanced Video Coding (AVC). The techniques of this disclosure, however, are not limited to any particular video coding standard. Although not shown in
The various components of source device 12, and destination device 16, including inverse telecine unit 29 of destination device 16 may be implemented as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. Telecine unit 20 and inverse telecine unit 29 may be incorporated within video encoder 22 and video decoder 28, respectively. Again, the inverse telecine techniques of this disclosure may be implemented as part of a video decoding process, but may also be used in other settings and scenarios. Furthermore, after inverse telecine operations, the video data does not necessarily need to be displayed. In other examples, following inverse telecine, the video data may be re-encoded (for example in a transcoding scenario), and the new encoded video data can either be stored for future playback or can be transmitted for broadcasting applications.
A video sequence typically includes a series of video frames. Video encoder 22 operates on video blocks within individual video frames in order to encode the video data. The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard. Each video frame includes a series of slices. Each slice may include a series of macroblocks, which may be arranged into sub-blocks. As an example, the ITU-T H.264 standard supports intra prediction in various block sizes, such as 16 by 16, 8 by 8, or 4 by 4 for luma components, and 8 by 8 for chroma components, as well as inter prediction in various block sizes, such as 16 by 16, 16 by 8, 8 by 16, 8 by 8, 8 by 4, 4 by 8 and 4 by 4 for luma components and corresponding scaled sizes for chroma components. Video blocks may comprise blocks of pixel data, or blocks of transformation coefficients, e.g., following a transformation process such as discrete cosine transform (DCT) or a conceptually similar transformation process. According to the techniques of this disclosure, video encoder 22 and video decoder 28 operate in the telecined domain, e.g., following telecine performed by telecine unit 20. In another scenario, an encoder could be applied after inverse telecine unit 29, and in this case, the encoder may operate in the non-telecined domain.
Smaller video blocks can provide better resolution, and may be used for locations of a video frame that include high levels of detail. In general, macroblocks and the various sub-blocks may be considered to be video blocks. In addition, a slice may be considered to be a series of video blocks, such as macroblocks and/or sub-blocks. Each slice may be an independently decodable unit of a video frame. Alternatively, frames themselves may be decodable units, or other portions of a frame may be defined as decodable units. The term “coded unit” refers to any independently decodable unit of a video frame such as an entire frame, a slice of a frame, or another independently decodable unit defined according to the coding techniques used.
To encode the video blocks, video encoder 22 performs intra- or inter-prediction to generate a prediction block. Video encoder 22 subtracts the prediction blocks from the original video blocks to be encoded to generate residual blocks. Thus, the residual blocks are indicative of differences between the blocks being coded and the prediction blocks. Video encoder 22 may perform a transform on the residual blocks to generate blocks of transform coefficients. Following intra- or inter-based predictive coding and transformation techniques, video encoder 22 performs quantization. Quantization generally refers to a process in which coefficients are quantized to possibly reduce the amount of data used to represent the coefficients. Following quantization, entropy coding may be performed according to an entropy coding methodology, such as context adaptive variable length coding (CAVLC) or context adaptive binary arithmetic coding (CABAC).
In destination device 16, video decoder 28 receives the encoded video data, and entropy decodes the received video data according to an entropy coding methodology, such as CAVLC or CABAC, to obtain the quantized coefficients. Video decoder 28 applies inverse quantization (de-quantization) and inverse transform functions to reconstruct the residual block in the pixel domain. Video decoder 28 also generates a prediction block based on control information or syntax information (e.g., coding mode, motion vectors, syntax that defines filter coefficients and the like) included in the encoded video data. Video decoder 28 sums the prediction block with the reconstructed residual block to produce a reconstructed video block for display.
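The residual relationship described above (the encoder subtracts the prediction, and the decoder adds it back) can be illustrated with a minimal sketch that omits the transform, quantization, and entropy coding stages:

```python
# Hypothetical 1-D "blocks" of pixel values; real blocks are 2-D.
original = [10, 12, 14, 16]
prediction = [9, 12, 15, 15]

# Encoder side: residual = original - prediction.
residual = [o - p for o, p in zip(original, prediction)]

# Decoder side: reconstructed = prediction + residual.
reconstructed = [p + r for p, r in zip(prediction, residual)]
assert reconstructed == original  # lossless without quantization
```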
According to the techniques of this disclosure, inverse telecine unit 29 may determine whether individual video frames in a sequence of video frames are progressive frames or interlaced frames, identify a pattern of the progressive frames and the interlaced frames in the sequence of video frames, identify a telecine technique based on the pattern, and perform an inverse telecine technique with respect to the sequence of video frames based on the identified telecine technique. In this case, the inverse telecine technique converts N video frames per second to M video frames per second, wherein M and N are positive integers and M is less than N. Accordingly, inverse telecine reduces the frame rate back to the original film rate associated with the video sequence as it was originally recorded onto film media.
Furthermore, inverse telecine unit 29 may leverage the fact that video decoder 28 has already loaded certain video data as part of the decoding process. That is, memory loads of data for purposes of video decoding by video decoder 28 may be used to reduce unnecessary duplicative memory loads of the same data, if such data is also needed for the inverse telecine process performed by inverse telecine unit 29. In this way, memory loads associated with inverse telecine unit 29 may be reduced, conserving power and memory bandwidth.
For 3:2 pull down, for example, the inverse telecine technique converts 30 video frames per second to 24 video frames per second by converting each pattern of five frames (P, P, I, I, P) into a pattern of four progressive frames (P, P, P, P), or each pattern of five frames (P, I, I, P, P) into a pattern of four progressive frames (P, P, P, P). In either case, when a pattern is associated with a 3:2 pull down telecine technique, identifying the pattern comprises identifying five frame sequences that consist of three progressive frames and two interlaced frames. For PPIIP, there would be two progressive frames followed by two interlaced frames followed by one progressive frame, whereas for PIIPP, there would be one progressive frame followed by two interlaced frames followed by two progressive frames. In either case, performing the inverse telecine technique may comprise converting the five frame sequences to four frame sequences, wherein the inverse telecine technique converts 30 video frames per second to 24 video frames per second.
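A minimal sketch of this pattern-matching step might look as follows (the per-frame 'P'/'I' flags are assumed to come from the progressive/interlaced determination described above; the function name is hypothetical):

```python
def detect_3_2_phase(flags):
    """Search a run of per-frame flags ('P' = progressive,
    'I' = interlaced) for a five-frame 3:2 pull down group and
    return its phase offset and shape, or None if no group fits."""
    groups = ("PPIIP", "PIIPP")
    for phase in range(len(flags) - 4):
        window = "".join(flags[phase:phase + 5])
        if window in groups:
            return phase, window
    return None

# A telecined stream repeats five-frame groups containing three
# progressive and two interlaced frames:
flags = list("PPIIP" * 3)
print(detect_3_2_phase(flags))  # (0, 'PPIIP')
```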
In identifying whether individual video frames in the sequence of video frames are progressive frames or interlaced frames, inverse telecine unit 29 may process only a subset of data associated with the individual video frames. Additional details of how this subset can be defined are provided below. Generally, the subset may comprise a block of pixel data within the individual frames, wherein the block is pre-defined for inverse telecine detection, and wherein the block of pixel data is fetched from memory for each of the individual frames. The subset may comprise vertical columns of pixel data within the individual frames, wherein the vertical columns of pixel data within the individual frames are pre-defined for inverse telecine detection, and wherein the vertical columns of pixel data within the individual frames are fetched from memory for each of the individual frames.
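A deterministic vertical-sampling subset of the kind described above might be sketched as follows (the stride and column width are illustrative assumptions, not values from this disclosure):

```python
def column_subset(frame, stride=16, width=4):
    """Fetch only narrow vertical columns of pixels (width columns
    at every stride offset) rather than the full frame. 'frame' is
    a list of rows, each a list of pixel values."""
    subset = []
    for row in frame:
        sampled = []
        for x in range(0, len(row), stride):
            sampled.extend(row[x:x + width])
        subset.append(sampled)
    return subset

frame = [list(range(64)) for _ in range(4)]  # 4 rows of 64 pixels
sub = column_subset(frame)
print(len(sub[0]))  # 16 pixels per row instead of 64
```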
In some cases, the subset of data processed for purposes of inverse telecine may comprise vertical columns of pixel data within the individual frames, wherein the vertical columns of pixel data within the individual frames are adaptively defined based on whether data has already been fetched from memory for use in predictive video coding. In other cases, the subset associated with any given frame may be adaptively defined based on whether data has already been fetched from memory for use in predictive video coding. As outlined in greater detail below, for example, inverse telecine unit 29 may generate a map of pixels associated with a respective frame to define whether data has already been fetched from memory for use in predictive video coding, and define the subset for the respective frame based on the map. To further simplify processing, inverse telecine unit 29 may generate a partial map of pixels associated with a respective frame to define whether data has already been fetched from memory for use in predictive video coding, and define the subset for the respective frame based on the partial map, wherein the partial map is defined during video coding of the respective frame as statistics become available, wherein the statistics define whether individual pixels have already been fetched for the video coding. In either case, the map may pinpoint useful data that is already stored for purposes of video decoding by video decoder 28, thus eliminating the need for inverse telecine unit 29 to fetch that same data again.
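One way to sketch the pixel map described above (the function names and rectangle format are hypothetical) is to mark the regions already fetched for motion compensation, so that inverse telecine fetches only what is still missing:

```python
def build_fetch_map(height, width, fetched_blocks):
    """Mark which pixels were already loaded during decoding.
    fetched_blocks lists (y, x, h, w) rectangles touched by
    motion compensation for the frame."""
    fetched = [[False] * width for _ in range(height)]
    for y, x, h, w in fetched_blocks:
        for r in range(y, min(y + h, height)):
            for c in range(x, min(x + w, width)):
                fetched[r][c] = True
    return fetched

def remaining_subset(fetch_map):
    """Pixel coordinates the inverse telecine unit still needs."""
    return [(r, c)
            for r, row in enumerate(fetch_map)
            for c, hit in enumerate(row) if not hit]

fmap = build_fetch_map(4, 4, [(0, 0, 2, 4)])  # top half already loaded
print(len(remaining_subset(fmap)))  # 8 pixels left to fetch
```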
Many algorithms have been proposed for the inverse telecine process. The focus of this disclosure is an inverse telecine process that does not require information in the bitstream to identify the telecine technique that was used. In addition, another focus of this disclosure is memory bandwidth reduction during the inverse telecine process.
Inverse telecine module 51 may analyze the input frames, perform telecine detection, and perform a correction based on the pattern identified during the detection stage. Telecine detection algorithms may be classified based on the number of input fields or frames used for identifying the pull down pattern. The number of fields used in telecine detection algorithms is usually 2, i.e., the top and bottom fields of a video frame. However, algorithms may use 4 fields (i.e., the top and bottom fields of two different frames) in telecine detection. Other numbers of fields, e.g., 5 or more input fields, could also be defined.
The processing of such large amounts of data, however, can require high power and resources. A telecine algorithm may conduct a zig-zag scan of a frame to reduce the number of pixels to be processed. Furthermore, in order to reduce the number of operations performed by inverse telecine module 51, techniques that “disable inverse telecine once the telecine pattern is locked” could be executed by inverse telecine module 51. In this case, once the telecine pattern is found, the pattern is locked, and therefore, inverse telecine module 51 does not need to continue accessing new input frames, which may reduce processing power and bandwidth. However, this type of approach does not reduce the input pixel data that is used by inverse telecine module 51, but rather, it reduces the number of times that inverse telecine module 51 operates. Accordingly, this type of technique may miss telecine pattern changes that can happen during bad editing.
The techniques of this disclosure propose an effective algorithm to identify the pixel data to fetch for telecine detection. The advantages of the techniques of this disclosure may include a reduction in the number of pixels used in the inverse telecine process, which may reduce memory bandwidth without degrading inverse telecine performance. In addition, by reducing the amount of data traffic from memory and the number of processing cycles, the described techniques may help to support application of inverse telecine to higher resolutions of video, such as high-definition applications. The described techniques do not require any information to be conveyed in the bitstream to identify telecine; rather, telecine is detected purely based on the content of the video.
For devices wherein power consumption is a concern (such as wireless devices), the described inverse telecine techniques may help to process more frames for telecine detection relative to other techniques that use similar amounts of power, which in turn helps to catch bad editing that happens during insertion of commercials and scene cuts. The memory bandwidth and power conservation aspects of this disclosure may be independent of the telecine detection algorithm and may be used with other telecine detection algorithms that require access to at least two fields (e.g., even and odd fields) of a frame. In this case, advantages may be achieved by fetching only portions of pixel data, where the portions of pixel data are determined adaptively by compressed domain statistics, or deterministically by the vertical sampling approaches described in greater detail below. The moving parts of a picture are usually better indicators for telecine detection. Therefore, performing inverse telecine with respect to regions of interest that have high levels of motion may provide good telecine detection performance while decreasing memory bandwidth. Furthermore, the techniques of this disclosure may utilize pixel data already fetched to an internal memory during video decoding by tracking motion vectors and the reference pictures identified by the motion vectors.
The two major aspects of inverse telecine techniques are “telecine detection” (i.e., pull down detection) and “telecine correction.” In addition to these, “bad edit detection” may also be part of an inverse telecine technique.
The basic goal of telecine detection 61 is to find out whether the interlaced video has gone through a 3:2 pull down, a 2:2 pull down, or another pull down process. The “states” of frames refer to the order of video frames as shown in
The goal of bad edit detection 62 may be to determine whether the initially identified pull down pattern is broken in time or not. A broken pull down pattern is illustrated in
The goal of telecine correction 63 is to convert video frames into film frames by using the state information provided by the telecine detection as shown in
Telecine detection algorithms may be classified based on the number of fields they use for identifying the pull down pattern. The minimum number of fields used in telecine detection algorithms is 2, e.g., top and bottom fields of a video frame, although more fields may be used. Telecine detection algorithms can also be classified based on the metric that is used in detection process. The following metrics listed below, for example, could be used for telecine detection:
The basis of some telecine algorithms is pixel differencing, e.g., using the SAD metric. SAD may be calculated between corresponding fields of two frames to identify whether a particular field is repeated or not. For example, referring to
Pixel block parameters may also be used for telecine algorithms. The parameters may include content information such as the edges in a particular block of pixels. This metric is different from SAD in the sense that it measures content change instead of pixel value change. Using pixel statistics is similar to the block parameter approach, where a comparison is made between two fields by using the mean and variance of a set of pixels.
Bad edit detection is not usually emphasized in telecine detection. Some algorithms may assume different pull down patterns, but this is usually not preferred. Different telecine detection algorithms may differ in terms of the number and choice of reference fields that they use in detection and the metric they use. Various aspects of this disclosure, particularly the memory bandwidth reduction aspects, may be used with a variety of inverse telecine algorithms.
In one type of inverse telecine algorithm, the SAD metric may be used to perform telecine detection. In this case, SAD is calculated between the same-parity fields of two consecutive frames. If the SAD value of one field is greater than a preset threshold, the SAD value of the opposite field is also calculated. If the SAD value is comparable to the SAD value of the opposite field, no telecine is detected. On the other hand, if the SAD value of the opposite field is smaller, “Out_of_phase” is identified. If out_of_phase is detected consecutively during State_2 and State_4, the telecine pattern may be locked. Note that in the context of this algorithm, out_of_phase refers to an interlaced video frame in which either the top or bottom field of the video frame comes from the previous video frame. In a group of five video frames that has gone through 3:2 pull down, out_of_phase should be detected twice: (i) between State_2 and State_1, and (ii) between State_4 and State_3.
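The SAD-based out-of-phase test described above can be sketched as follows. This is a minimal illustration rather than the reference implementation: frames are modeled as 2-D lists of luma samples, and the threshold value and the “comparable” ratio are assumptions made for the example.

```python
# Sketch of the SAD-based out-of-phase test. Parity 0 selects the even
# (top) field rows; parity 1 selects the odd (bottom) field rows.

def field(frame, parity):
    """Extract the even (parity=0) or odd (parity=1) field of a frame."""
    return frame[parity::2]

def sad(field_a, field_b):
    """Sum of absolute differences between two same-sized fields."""
    return sum(abs(a - b)
               for row_a, row_b in zip(field_a, field_b)
               for a, b in zip(row_a, row_b))

def is_out_of_phase(prev_frame, cur_frame, sad_threshold=100, ratio=0.5):
    """Return True if one field of cur_frame repeats the same-parity field
    of prev_frame while the opposite field changes, i.e., the frame looks
    "out of phase" in the sense described above."""
    sad_top = sad(field(prev_frame, 0), field(cur_frame, 0))
    sad_bot = sad(field(prev_frame, 1), field(cur_frame, 1))
    hi, lo = max(sad_top, sad_bot), min(sad_top, sad_bot)
    if hi <= sad_threshold:
        return False      # both fields essentially repeat: no evidence
    # one field changed; out of phase only if the opposite field's SAD is
    # much smaller (i.e., not "comparable")
    return lo < hi * ratio
```

Note that a repeated field (SAD near zero) paired with a changed opposite field is exactly the signature of the duplicated fields inserted by 3:2 pull down.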
In all, 2 frames (i.e., 4 fields) may be used in this type of inverse telecine algorithm. However, SAD may be calculated by using only a part of the pixels in a frame, as outlined in greater detail herein. The image may be scanned in a zigzag fashion and only a small part of the image may be used. The SAD implementation may be done in an 8-bit architecture. After locking the telecine pattern and detecting State_2 followed by State_4 and then State_2, the algorithm may perform telecine correction and output the reverse telecine content. The output may be interrupted any time the telecine pattern fails at State_2 and State_4. The video frames are output as they are (i.e., no correction or change) for the following cases:
Various memory bandwidth reduction aspects of this disclosure (which are addressed in greater detail below) may be applicable to any of these exemplary inverse telecine approaches. At this point, however, this disclosure will focus on a proposed inverse telecine technique that implements “telecine detection” and “telecine correction” modules or units.
In this case, telecine detection may be carried out by two major stages: telecine cost calculation and telecine pattern analysis. A third stage (telecine correction), may also form part of the inverse telecine algorithm.
Telecine cost calculation unit 111 may use 2 fields (i.e., the even and odd fields) of a picture. When compared to other algorithms that use more than 2 fields, this type of telecine cost calculation has advantages in terms of fulfilling low memory bandwidth requirements when implemented in resource-constrained environments.
Even though the proposed algorithm is designed to detect 3:2 and 2:2 pull down patterns, it could easily be adjusted and used to detect other pull down patterns. The pattern analysis stage of unit 112 can be easily modified to detect other pull down patterns if necessary.
The “cost” in telecine cost calculation unit 111 may indicate the “number of columns that are detected as out-of-phase,” where “out-of-phase” means that the even and odd fields in a picture come from different time instants. Out-of-phase data indicates interlacing. The goal of the cost calculation algorithm is basically to identify whether a picture is interlaced or progressive.
For each length calculated (135), telecine cost calculation unit 111 determines whether the length is greater than a length threshold Len_TH (136). If so (“yes” 136), telecine cost calculation unit 111 increments an out_of_phase_counter (137), and then determines whether the line is finished (138). Telecine cost calculation unit 111 may repeat this process for every pixel in the line, incrementing the out_of_phase_counter each time a given length is greater than the length threshold. Once the line is finished, telecine cost calculation unit 111 determines whether the out_of_phase_counter is greater than a count threshold count_TH (139). If so (“yes” 139), telecine cost calculation unit 111 sets an Out_of_Phase flag to 1 (140). If not (“no” 139), telecine cost calculation unit 111 determines whether all vertical lines are finished (141).
If more vertical lines need to be considered (“no” 141), telecine cost calculation unit 111 repeats the process for those lines. However, if telecine cost calculation unit 111 determines that the out_of_phase_counter is not greater than the count threshold count_TH (“no” 139) and that all vertical lines are finished (“yes” 141), telecine cost calculation unit 111 sets the Out_of_Phase flag to 0. In this example, the Out_of_Phase flag being 0 means that the frame is progressive, while the Out_of_Phase flag being 1 means that the frame is interlaced.
The algorithm shown in the flow diagram of
d(x, y)=p(x, y)−p(x, y+1) (equation 1)
d(x, y+1)=p(x, y+1)−p(x, y+2) (equation 2)
Next, the pixel difference is thresholded with the following equation:

t(x, y) = 1 if d(x, y) > th_p; t(x, y) = −1 if d(x, y) < −th_p; t(x, y) = 0 otherwise (equation 3)
where t(x,y) in Equation (3) represents a peak if it is equal to 1 and valley if it is −1.
In order to avoid the effect of noise in peak-valley determination, telecine cost calculation unit 111 may use a pixel threshold th_p. The intuition behind the algorithm can be explained as follows. If a picture is interlaced, the odd and even fields will have high correlation with each other and similar pixel values. When they are interleaved, as shown in
After determining peaks and valleys, the length of consecutive peak and valleys can be calculated as follows:
for (y = 1 : number of rows) {
    if (|t(x, y) − t(x, y+1)| == 2)
        length(y)++;
    else
        length(y) = 0;
} (equation 4)
If the length of consecutive peaks and valleys is above a threshold (len_th), the column is identified as out_of_phase and an out_of_phase counter is increased. The value of len_th may be adjusted based on the resolution of the image.
if (length(y) > len_th) the out_of_phase_counter(t) is incremented. (equation 5)
Then, as a final step, the number of columns detected as out_of_phase may be compared against a threshold. If the number of columns detected as out_of_phase is larger than count_th, the whole picture may be identified as out_of_phase and represented with the binary label “1”. If the number of out_of_phase columns is less than a threshold, the picture is identified as in_phase and represented by the binary label “0.” In other words:
if (out_of_phase_counter(t) > count_th) picture_label(t) = 1; else picture_label(t) = 0; (equation 6)
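Equations 1 through 6 can be collected into one compact sketch of the per-picture cost calculation. The threshold values th_p, len_th, and count_th are placeholders (in practice they may be adjusted based on the resolution of the video), and both the column-level and picture-level early terminations are folded in.

```python
# Hedged sketch of the column-wise cost calculation (equations 1-6).
# frame is a 2-D list of luma samples; thresholds are illustrative.

def picture_label(frame, th_p=10, len_th=4, count_th=2):
    """Return 1 (out_of_phase/interlaced) or 0 (in_phase/progressive)."""
    rows, cols = len(frame), len(frame[0])
    out_of_phase_counter = 0
    for x in range(cols):                       # one vertical line at a time
        # Equations 1-3: threshold the vertical pixel differences into
        # peaks (+1), valleys (-1) and flat samples (0).
        t = []
        for y in range(rows - 1):
            d = frame[y][x] - frame[y + 1][x]
            t.append(1 if d > th_p else (-1 if d < -th_p else 0))
        # Equation 4: length of consecutive alternating peaks/valleys.
        length = 0
        column_out_of_phase = False
        for y in range(len(t) - 1):
            length = length + 1 if abs(t[y] - t[y + 1]) == 2 else 0
            if length > len_th:                 # equation 5, with
                column_out_of_phase = True      # column-level early exit
                break
        if column_out_of_phase:
            out_of_phase_counter += 1
        if out_of_phase_counter > count_th:     # equation 6, with
            return 1                            # picture-level early exit
    return 0
```

An interlaced picture, whose interleaved fields produce long peak/valley alternations down each column, yields label 1; a progressive picture yields 0.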
In some implementations, early termination of the process may be possible both at the column level and at the picture level. In column-level early termination, once the length of consecutive peaks and valleys exceeds the threshold len_th, the algorithm may stop processing the current column and move to the next column. In picture-level early termination, once the count threshold (e.g., count_th) is reached, it may be unnecessary to check the subsequent columns.
Telecine pattern analysis unit 112 may analyze the picture_label information of consecutive pictures and identify whether the input video has 3:2 or 2:2 pull down pattern or not. Furthermore, telecine pattern analysis unit 112 may determine the state information of each frame based on the starting state of the pull down pattern. A correct 3:2 pull down pattern and the picture labels are shown in
CPD_32 = [1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 . . . ] (equation 7)
It can be seen from equation 7 above that [0 1 1 0 0] is the basic bit pattern that repeats itself in CPD_32. Note that the pattern can be shifted and can start from the 2nd or 3rd column of CPD_32. Although equation 7 may represent the most common pattern, there is no standard specifying the offset value of the pull down pattern. Therefore, it may be necessary to consider all possible offsets to correctly detect the pull down pattern. An example of the same 3:2 pull down pattern with an offset of 2 is presented below.
CPD_32 = [1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 . . . ] (equation 8)
Mathematically, one may find a correct pattern if the following equation is satisfied:
If ([picture_label(t−4) picture_label(t−3) picture_label(t−2) picture_label(t−1) picture_label(t)] = Pattern_ID(1) ∥ Pattern_ID(2) ∥ Pattern_ID(3) ∥ Pattern_ID(4) ∥ Pattern_ID(5)) set Picture_ID = get_ID(Pattern_ID); (equation 9)
where t represents time, ∥ is the OR operation, and the Pattern_IDs with different offsets are given below.
1=Pattern_ID(1)=[0 1 1 0 0]
2=Pattern_ID(2)=[1 1 0 0 0]
3=Pattern_ID(3)=[1 0 0 0 1]
4=Pattern_ID(4)=[0 0 0 1 1]
5=Pattern_ID(5)=[0 0 1 1 0] (equation 10)
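The template match of equations 9 and 10 might be sketched as follows. The tuple encoding of the templates and the use of None as the “dummy” ID for non-matching input are illustrative assumptions.

```python
# Sketch of equations 9-10: the last five picture labels are compared
# against the five cyclic shifts of the basic 3:2 pull down pattern.

PATTERN_IDS = {
    1: (0, 1, 1, 0, 0),
    2: (1, 1, 0, 0, 0),
    3: (1, 0, 0, 0, 1),
    4: (0, 0, 0, 1, 1),
    5: (0, 0, 1, 1, 0),
}

def get_pattern_id(labels):
    """labels: picture_label(t-4) .. picture_label(t). Returns the matching
    Pattern_ID (1-5), or None when no template matches (the 'dummy' case:
    the input may not be telecined, or the pattern cannot be identified)."""
    window = tuple(labels[-5:])
    for pattern_id, template in PATTERN_IDS.items():
        if window == template:
            return pattern_id
    return None
```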
Typically, the algorithm can find the first 3:2 pull down pattern as early as the fifth frame. However, it may be desirable to lock the 3:2 pull down pattern only if four basic patterns are found out of 6 patterns (i.e., after the 30th frame), as shown in each of the three examples of
Once the pull down pattern is locked, the state of each picture may be identified. The state of each picture can be found easily by a table lookup method, as shown in Table 1, below.
A 2:2 (i.e., 2:2:2:2:2:2:2:2:2:2:2:3) pull down pattern detection procedure may be similar to the 3:2 pull down case. The difference is that 2:2 pull down has a particular correct pull down pattern (shown in equation 11), and the lock time is longer since the basic 2:2 pattern is larger in length than the basic 3:2 pull down pattern.
CPD_22 = [0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 . . . ] (equation 11)
Parameters such as “the number of patterns checked” and “the correct pull down pattern” can be easily modified in different implementations.
Telecine correction unit 113 converts video frames into film frames by using the state information provided by the telecine detection, which is performed by telecine cost calculation unit 111 and telecine pattern analysis unit 112. Telecine correction is a relatively straightforward process once the video frame states are correctly identified by the telecine detection process. Telecine correction is done at the time frames are fetched for display. Simply, one frame may be discarded out of every five frames during telecine correction, and in this way 24 frames per second may be obtained from 30 frames per second of video.
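The frame-rate arithmetic described above can be sketched as a simple drop of one frame in every group of five, recovering 24 frames per second from 30. Which frame of the group is dropped in a real system depends on the detected state; the drop index used here is purely illustrative.

```python
# Sketch of the 30 fps -> 24 fps correction step: drop one frame out of
# every five. The drop_index of 1 is an illustrative assumption; in
# practice the state information identifies the frame to discard.

def telecine_correct(frames, drop_index=1):
    """Remove one frame out of every five consecutive frames."""
    return [f for i, f in enumerate(frames) if i % 5 != drop_index]
```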
Telecine detection may involve storing a telecine pattern while maintaining a picture state machine. A telecine detection module or unit may inform a telecine correction module or unit of the picture state information. The state information indicates the type of fetching action to be performed for telecine correction. Different telecine correction actions may be performed for each state, as shown in Table 2.
Telecine detection may inform the display (e.g., display device 30 of
A telecine detection module could be implemented within a video decoder. This is a convenient location since more than half of the pixels in a frame that are used by a telecine detection unit may already be in an internal memory and, in this case, do not need to be fetched from external memory. This implementation provides an advantage in terms of reducing data traffic associated with memory fetches, i.e., reducing the use of memory bandwidth. Once telecine is detected, information such as a “Film Mode Flag” and a “Picture State” could be sent to a telecine correction module. After telecine correction, the corrected frame may be processed by a pixel processing pipeline, which may include algorithms for image scaling, sharpening and enhancement, and possibly other image processing.
One implementation of the techniques of this disclosure is shown in
At the beginning of decoding, a telecine detection flag may be automatically ON. However, once the pull down pattern is found and locked, the flag can be turned OFF. The telecine detection flag may be controlled by a “telecine update” module labeled as update telecine detection unit 207. This update telecine detection unit 207 enables telecine detection in regular intervals even though a pull down pattern might be locked, and may help the algorithm to identify potential “bad edits.”
When the telecine detection flag is ON (“yes” 202), the first step of the algorithm may be to perform “Cost Calculation.” The output of telecine cost calculation unit 204 is passed to frame level telecine label calculation unit 205, in which the state of each picture is identified. The state information of each picture is used by telecine pattern detection unit 206 (as described herein) to determine whether the video is telecined or not. If a pull down pattern is found, telecine is locked and a “film mode flag” is turned ON. When the Film Mode Flag is ON (“yes” 208), device 200 can calculate the states of each picture. The state information dictates to telecine correction unit 209 how to perform the correction, since there is a different method of correction for each state.
Frame_State Calculation unit 210 may calculate the state of each picture and output the Frame_State. If the Frame_State is F3, telecine correction unit 209 performs State_F3 telecine correction 212 as described above for State 3. If the Frame_State is state 1, 4, or 5 (“yes” 213), then those frames are output as progressive frames. If the Frame_State is state 2 (“yes” 214), the process ends and nothing is output for that frame, i.e., frames in state 2 are dropped in the inverse telecine correction process.
If the Film Mode Flag is OFF, then de-interlacing is applied on the frame by de-interlacing unit 215. Different portions of the algorithm could be partitioned into hardware or software depending on the implementation platform.
Telecine cost calculation may be performed on a per pixel basis as shown in
Referring now to
An overview of exemplary algorithms in telecine pattern analysis and detection is presented in
If the telecine pattern is found (“yes” 235), the algorithm sets the FilmMode Flag to 1 (236), sets the TelecineDetection Flag to 0 (237), and sets a current frame_state (238). If the telecine pattern is not found (“no” 235), the algorithm sets the FilmMode Flag to 0 (239), sets the TelecineDetection Flag to 1 (240), and sets the current frame_state to F0 (241).
The input to the algorithm shown in
The process of finding pattern IDs for frames may simply involve putting the picture labels of 5 frames in an array, performing template matching over five pre-determined templates, and finding the Pattern ID of the current picture. In 3:2 pull down, there are five possible pattern options, which are given in Table 3 below with the corresponding states. If the pattern obtained from the input video does not match any one of the five possible pattern options (which is possible if the input is not telecined or the algorithm cannot identify the pattern), then a dummy pattern ID may be assigned to the picture (see
As shown in
A telecine checking stage may also be executed. Telecine pattern checking is another simple step that determines whether a telecine pattern exists or not. The input to this stage may be the current pattern ID obtained in the manner outlined above. The telecine pattern is detected by using the current Pattern ID as well as the stored pattern IDs from previous frames. The correct 3:2 pull down pattern and the corresponding pattern IDs are given in Table 4, below. A 3:2 pull down pattern may be found and TC_Pattern_Flag can be set to 1 if the consecutive pattern IDs have a difference of 1, as shown in
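The pattern-check step above might be sketched as follows. Since the five templates cycle, consecutive Pattern IDs that differ by 1 are assumed here to wrap from 5 back to 1; that wrap-around handling, and the use of None for the dummy ID, are assumptions of this sketch.

```python
# Sketch of the telecine pattern check: TC_Pattern_Flag is set to 1 when
# every stored consecutive pair of Pattern IDs advances by exactly one
# step in the 1..5 cycle (5 wraps to 1), and 0 otherwise.

def tc_pattern_flag(pattern_ids):
    """pattern_ids: Pattern IDs of consecutive frames, oldest first.
    Returns 1 when a consistent 3:2 pull down progression is present."""
    for prev, cur in zip(pattern_ids, pattern_ids[1:]):
        if prev is None or cur is None:
            return 0            # a dummy ID breaks the pattern
        if (prev % 5) + 1 != cur:
            return 0            # IDs must advance by exactly one step
    return 1
```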
Once the Pattern ID is found, determining the picture state is a simple table look up procedure as shown in
After the telecine detection algorithm identifies a pull down pattern and locks a state, a state machine can maintain the state information of consecutive pictures. For example, if the pattern is locked during State_2, the next picture's state becomes State_3, then State_4, then State_5, then State_F1, and back to State_2.
A telecine flag update process is shown in
At the beginning of decoding, a telecine detection flag will be automatically ON. Once the pull down pattern is found and locked, the flag can be turned OFF. The telecine detection flag may be controlled by a “telecine update” module. Such a “telecine update” module enables telecine detection at regular intervals even though a pull down pattern might be already locked. The update “interval” may be set to 1 second, e.g., 30 frames. Once the pattern is locked, the process may wait for a second (controlled by TC Update Count in
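The flag-update logic above can be sketched as a small counter-driven state holder: the detection flag starts ON, is turned OFF when the pattern locks, and is re-enabled after roughly one second (e.g., 30 frames) so that bad edits can still be caught. The class and method names here are illustrative, not taken from the disclosure.

```python
# Sketch of the telecine update module: periodically re-enables telecine
# detection after a pattern lock. The interval of 30 frames corresponds
# to roughly one second of 30 fps video.

class TelecineUpdate:
    def __init__(self, interval=30):
        self.interval = interval
        self.detection_flag = True   # ON at the beginning of decoding
        self.update_count = 0

    def lock_pattern(self):
        """Called when the pull down pattern is found and locked."""
        self.detection_flag = False
        self.update_count = 0

    def tick(self):
        """Called once per decoded frame. Re-enables detection after the
        update interval elapses; returns the current flag value."""
        if not self.detection_flag:
            self.update_count += 1
            if self.update_count >= self.interval:
                self.detection_flag = True   # re-check for bad edits
                self.update_count = 0
        return self.detection_flag
```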
Telecine correction may be performed when frames are fetched for display, in the manner illustrated in
Line_OOPhase stores the phase information of each column. This information may be passed to identify the phase information of the whole frame. TH1 and TH2 are thresholds used by the cost calculation algorithm and they may be controlled (i.e., adjusted based on the resolution of the video). Frame_Level_Telecine_Detection_Flag controls whether cost calculation is performed or not.
In accordance with another aspect of this disclosure, it may be very desirable to evaluate only a portion of a frame when performing telecine detection. By reducing the number of pixels fetched, reductions in memory bandwidth and memory usage may be achieved. There are several options for partial fetches of frames for purposes of telecine detection, some of which are illustrated in
The different options for partial fetches of data for purposes of telecine detection may be referred to herein as “deterministic” fetches insofar as the type of data fetch is pre-determined prior to execution of the inverse telecine algorithm. In other words, the data to be fetched is decided in a deterministic manner without considering any bitstream statistics. In another mode, however, the data to be fetched may be determined adaptively by the bitstream information.
In a deterministic method, specific portions of frames to be used for telecine detection are fetched from the external memory. Again,
Horizontal sampling is not preferred because almost all telecine detection makes use of vertical correlation, and horizontal sampling will lose important information that is necessary for telecine detection. However, horizontal sampling might have use with some formats of video, and this disclosure generally contemplates horizontal sampling notwithstanding the fact that vertical sampling seems to be more suitable for telecine detection. Some cases, including case 7 of
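A deterministic vertical-sampling fetch of the kind discussed above might be sketched as keeping only every k-th column of each frame, so that vertical correlation within each kept column is preserved while the fetched pixel count drops by roughly a factor of k. The stride and offset values here are illustrative; actual fetch patterns would follow the options shown in the figures.

```python
# Sketch of deterministic vertical sampling: keep columns
# offset, offset+stride, offset+2*stride, ... of every row, preserving
# full vertical resolution in the kept columns.

def sample_columns(frame, stride=8, offset=0):
    """Return a reduced frame containing only the sampled columns."""
    return [row[offset::stride] for row in frame]
```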
As noted, adaptive fetching may also be desirable, and may leverage memory loads of similar video data used in video decoding in order to facilitate telecine detection based on such data that is already available. In this case, the amount of data fetched for the inverse telecine algorithm may depend on the motion vector and macroblock mode statistics as well as the GOP (Group of Picture) structure of the video.
Accordingly, it may require a more intricate process to synchronize data fetches associated with decoding and inverse telecine when an IBBP GOP structure is used. One example of such synchronization is demonstrated in
This disclosure proposes adaptive fetching techniques in order to leverage data fetches for predictive coding and thereby avoid duplicative data fetches for purposes of inverse telecine. The proposed adaptive fetch algorithm may analyze the bit-stream information to reduce the bandwidth used for pixel fetches. At least two different methods for adaptive fetching are discussed. In the first method, access to bitstream statistics for the whole frame may be presumed. In this case, decisions can be made to identify which pixels to fetch based on global statistics. In the second method, access to partial statistics (not the whole frame) may be assumed, and in this case, decisions can be made regarding the pixels to fetch based on such available information.
In some cases, there may be complete access to whole frame statistics. In this case, the inverse telecine unit may check whether the macroblocks are encoded in MBAFF format (wherein MBAFF stands for macroblock adaptive frame/field). If the macroblocks are encoded in MBAFF format, then both the current and previous field (i.e., even and odd field of a frame) may already be stored in memory for purposes of predictive video decoding. In this case, the inverse telecine unit does not need to fetch the pixel data associated with a previous field. However, if the macroblocks are not encoded in MBAFF format, then the inverse telecine unit may need to fetch such data, e.g., as illustrated in
As shown in
As shown in
If the reference picture is the immediate prior field (“yes” 375), inverse telecine unit 29 may determine whether the motion vector is zero (376). If so (“yes” 376), inverse telecine unit 29 may set the block_is_valid bit to 2. If the reference picture is the immediate prior field (“yes” 375), the motion vector is not zero (“no” 376), and the motion vector is less than block_size multiplied by a threshold (TH1), then inverse telecine unit 29 may set the block_is_valid bit to 1. This process may be repeated for every block of a frame (or every block of a subset of a frame) until the last block is reached (380). After reaching the last block (“yes” 380), inverse telecine unit 29 may form a block_validity_map (381) and calculate column-wise block statistics (382) based on the block_validity_map. The block_validity_map may basically identify blocks as having bits 0, 1 or 2. Bit 2 means that the data for that macroblock is already stored in memory, bit 1 means that some of the data for that macroblock may be stored in memory, and bit 0 means that none of the data for that macroblock is stored in memory. Thus, by forming the block_validity_map, useful columns of data (e.g., columns with predominantly block_is_valid bits equal to 2) may be used for purposes of inverse telecine. Such columns may correspond to data that is already stored in memory, and therefore, memory fetches of such data can be avoided.
Put another way, inverse telecine unit 29 may process all the blocks and analyze block statistics to form a “block_validity” map. For each block, a value between 0 and 2 is assigned. A larger value implies a better block, which helps to reduce bandwidth, i.e., the whole block or large portions of the block from the previous field can be found in the internal memory. For each block, the block mode is checked first. If the block is in inter mode, the motion references the immediately previous field, and the motion vector is zero, inverse telecine unit 29 may set the block label to 2.
The reason that inverse telecine unit 29 may look for a zero motion vector is that, for telecine detection, the collocated block from the previous field is needed. If the motion vector is not zero, but less than some threshold value, inverse telecine unit 29 may set the block label to 1. A block value of 1 means that portions of the collocated block that will be used for telecine detection are in the internal memory and only part of it has to be fetched from outside. A block value of 0 means that the collocated block in the previous field is not available and has to be fetched in its entirety. After processing all the blocks, inverse telecine unit 29 may form the block_validity_map. An example of the map is shown in
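The block labeling described above can be sketched as follows. The block records use hypothetical dictionary keys (inter, refs_prev_field, mv, size); in a real implementation these values would come from decoded bitstream statistics, and the threshold plays the role of TH1.

```python
# Hedged sketch of block_validity_map formation. Label 2: collocated
# data from the previous field is already in internal memory (inter
# block, previous-field reference, zero motion vector). Label 1: mostly
# in memory (small non-zero motion vector). Label 0: must be fetched.

def block_validity(block, threshold=4):
    """Assign a validity label (0, 1, or 2) to one block record."""
    if not block["inter"] or not block["refs_prev_field"]:
        return 0                 # collocated data must be fetched fully
    mvx, mvy = block["mv"]
    if mvx == 0 and mvy == 0:
        return 2                 # collocated block already in memory
    limit = block["size"] * threshold
    if abs(mvx) < limit and abs(mvy) < limit:
        return 1                 # collocated block partially available
    return 0

def build_validity_map(blocks_grid):
    """Apply block_validity to every block in a 2-D grid of records."""
    return [[block_validity(b) for b in row] for row in blocks_grid]
```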
In particular,
Thus, according to the techniques of
The columns can be ranked based on the labels, and N of the columns can be selected to be fetched from the external memory. The number N can either be a predetermined value or can be adjustable. When a given block is in MBAFF format, both fields can be found in the internal memory after decoding. However, in this case, a decision still needs to be made based on motion statistics in order to reduce the amount of processing that is performed for telecine detection. This case may not necessarily reduce the bandwidth but may still reduce the amount of memory used by hardware for analyzing a frame. The memory reduction may also be achieved by reducing the portions of a frame to be analyzed.
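The ranking step above might be sketched as follows: columns of the validity map are scored by summing their block labels, and the N best-scoring columns are chosen for telecine detection, so that as little data as possible must come from external memory. Summing the labels as the score is an assumption made for this sketch; other ranking metrics could equally be used.

```python
# Sketch of column ranking over a block_validity_map: higher summed
# labels mean more of the column's previous-field data is already in
# internal memory. Ties are broken by column index (stable sort).

def select_columns(validity_map, n):
    """Return indices of the n columns with the highest summed labels."""
    cols = range(len(validity_map[0]))
    scores = {c: sum(row[c] for row in validity_map) for c in cols}
    return sorted(cols, key=lambda c: -scores[c])[:n]
```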
In order to decide which portions of a frame to use in telecine detection, the inverse telecine unit may apply a simple algorithm which uses motion statistics and prediction error. A similar block_validity motion map can be formed, in which a label of 2 is assigned to blocks with high motion and prediction error, a label of 1 is assigned to smaller motion blocks and 0 label is assigned to intra blocks. A similar ranking-based method can then be applied to select the appropriate block of pixels to fetch from the external memory.
The processing technique conceptually illustrated in
The example of
In summary, the proposed techniques may be beneficial in facilitating telecine detection, and in reducing the bandwidth and memory requirements of video decoders/processors for the telecine detection process. The bandwidth reduction is basically achieved by identifying the pixel areas of the previous field that are already in the memory, and selecting those columns of pixels to perform telecine detection either deterministically or adaptively by using bitstream characteristics.
The techniques of this disclosure may be embodied in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (i.e., a chip set). Any components, modules or units described herein are provided to emphasize functional aspects and do not necessarily require realization by different hardware units.
Accordingly, the techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable medium comprising instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a computer-readable communication medium that carries or communicates code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer.
The code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated in a combined video codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
Various aspects of the disclosure have been described. These and other aspects are within the scope of the following claims.