This system incorporates procedures for distinguishing between telecine originated video and conventionally generated broadcast video. Following that decision, data derived from the decision process facilitates the reconstruction of the film images that were telecined.
In the 1990s, television technology switched from analog methods for representing and transmitting video to digital methods. Once it was accepted that existing solid state technologies would support new methods for processing video, the benefits of digital video were quickly recognized. Digital video could be processed to match various types of receivers having different numbers of lines, and line patterns that were either interlaced or progressive. The cable industry welcomed the opportunity to change the bandwidth-resolution tradeoff virtually on the fly, allowing up to twelve channels of digital video, or 7-8 channels with superior picture quality, to be transmitted in a bandwidth that formerly carried one analog video channel. Digital pictures would no longer be affected by ghosts caused by multipath in transmission.
The new technology offered the possibility of high definition television (HDTV), having a cinema-like image and a wide screen format. Unlike the current 4:3 aspect ratio, the aspect ratio of HDTV is 16:9, similar to a movie screen. HDTV can include Dolby Digital surround sound, the same digital sound system used in DVDs and many movie theaters. Broadcasters could choose either to transmit a single high resolution HDTV program or to send a number of lower resolution programs in the same bandwidth. Digital television could also offer interactive video and data services.
There are two underlying technologies that drive digital television. The first technology uses transmission formats that take advantage of the higher signal to noise ratios typically available in channels that support video. The second is the use of signal processing to remove unneeded spatial and temporal redundancy present in a single picture or in a sequence of pictures. Spatial redundancy appears in pictures as relatively large areas of the picture that have little variation in them. Temporal redundancy refers to structures in a picture that reappear in later or earlier pictures. The signal processing operations are best performed on frames or fields that are all formed at the same time, and are not composites of picture elements that are scanned at different times. The NTSC compatible fields formed from cinema images by a telecine have an irregular time base that must be corrected for ideal compression to be achieved. However, video formed in telecine may be intermixed with true NTSC video that has a different underlying time base. Effective video compression results from using the properties of the video to eliminate redundancy. Therefore there is a need for a technique that would automatically distinguish telecined video from true interlaced NTSC video and, if telecined video is detected, invert the telecining process, recovering the cinematic images that were the source of the telecined video.
One aspect of the invention comprises a method for processing video frames that comprises determining a plurality of metrics from said video frames, and inverse telecining said video frames using the determined metrics.
Another aspect comprises an apparatus for processing video frames comprising a computational module configured to determine a plurality of metrics from said video frames, and a phase detector configured to provide inverse telecine of said video frames using the determined metrics.
Yet another aspect comprises an apparatus for processing video frames that comprises a means for determining a plurality of metrics from said video frames, and a means for inverse telecining said video frames using the determined metrics.
Yet another aspect comprises a machine readable medium for processing digitized video frames, comprising instructions that upon execution cause a machine to determine a plurality of metrics from said video frames, and inverse telecine the video frames using the determined metrics.
Yet another aspect comprises a video compression processor configured to determine a plurality of metrics from a plurality of video frames, and inverse telecine the video frames using the determined metrics.
The following detailed description is directed to certain specific aspects of the invention. However, the invention can be embodied in a multitude of different ways as defined and covered by the claims. In this description, reference is made to the drawings wherein like parts are designated with like numerals throughout.
Video compression gives best results when the properties of the source are known and used to select the ideally matching form of processing. Off-the-air video, for example, can originate in several ways. Broadcast video that is conventionally generated—in video cameras, broadcast studios, etc.—conforms in the United States to the NTSC standard. According to the standard, each frame is made up of two fields: one field consists of the odd lines, the other the even lines. This may be referred to as an “interlaced” format. While the frames are generated at approximately 30 frames/sec, the fields are records of the television camera's image taken 1/60 sec apart. Film, on the other hand, is shot at 24 frames/sec, each frame consisting of a complete image. This may be referred to as a “progressive” format. For transmission in NTSC equipment, “progressive” video is converted into the “interlaced” video format via a telecine process. In one aspect, further discussed below, the system advantageously determines when video has been telecined and performs an appropriate transform to regenerate the original progressive frames.
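The 3:2 pull down cadence performed by a telecine can be sketched as follows. This is an illustrative model only; the frame labels, field naming, and the particular 2:3 grouping shown are common descriptions of pull down, not details drawn from the disclosure: four progressive film frames become five interlaced NTSC frames, with two of the ten fields repeated.

```python
def telecine_32(film_frames):
    """Map 4 progressive film frames to 5 interlaced NTSC frames
    via a (purely illustrative) 2:3 pull down cadence."""
    a, b, c, d = film_frames
    top = lambda f: f + "t"   # top (odd-line) field of a film frame
    bot = lambda f: f + "b"   # bottom (even-line) field of a film frame
    # Fields are repeated in a 2:3:2:3 pattern across the five frames.
    return [
        (top(a), bot(a)),
        (top(b), bot(b)),
        (top(b), bot(c)),   # repeated top field of B paired with C
        (top(c), bot(d)),   # fields of two different film frames mixed
        (top(d), bot(d)),
    ]

print(telecine_32(["A", "B", "C", "D"]))
# → [('At', 'Ab'), ('Bt', 'Bb'), ('Bt', 'Cb'), ('Ct', 'Db'), ('Dt', 'Db')]
```

Note that the second and third output frames share a top field, and the third and fourth mix fields taken from different film frames; these composite frames are what inverse telecine must detect and undo.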
When conventional NTSC video is recognized (the NO path from phase detector 21), it is transmitted to deinterlacer 17 in preparation for compression, since it consists of video fields that were recorded at intervals of 1/60 of a second. The phase detector 21 continuously analyzes video frames that stream from source 19 because different types of video may be received at any time. For example, video conforming to the NTSC standard may be inserted into the telecine's video output as a commercial. The decision made in phase detector 21 should be accurate: processing conventionally originated NTSC video as if it were telecined may cause a serious loss of the information in the video signal.
The signal preparation unit 15 also incorporates a group of pictures (GOP) partitioner 26, to adaptively change the composition of the group of pictures coded together. It is designed to assign one of four types of encoding frames (I, P, B or “Skip Frame”) to a plurality of video frames at its input, thereby removing much of the temporal redundancy while maintaining picture quality at the receiving terminal 3. The processing by the GOP partitioner 26 and the compression module 27 is aided by preprocessor 25, which provides two dimensional filtering for noise removal.
In one aspect, the phase detector 21 makes certain decisions after receipt of each video frame. These decisions include: (i) whether the present video is from a telecine output and, if so, whether the 3:2 pull down phase is one of the five phases P0, P1, P2, P3, and P4 shown in definition 12 of
These decisions appear as outputs of phase detector 21 shown in
In summary, the applicable phase is either utilized as the current pull down phase, or used as an indicator to command the deinterlacing of a frame that has been estimated to have a valid NTSC format.
For every frame received from video input 19, four metrics are computed:
SADFS=Σ|Current Field One Value(i,j)−Previous Field One Value(i,j)| (1)
SADSS=Σ|Current Field Two Value(i,j)−Previous Field Two Value(i,j)| (2)
SADPO=Σ|Current Field One Value(i,j)−Previous Field Two Value(i,j)| (3)
SADCO=Σ|Current Field One Value(i,j)−Current Field Two Value(i,j)| (4)
The term SAD is an abbreviation of “summed absolute differences.” The fields which are differenced to form the metrics are graphically shown in
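Eqs. 1-4 can be illustrated directly in code. The sketch below is illustrative only, not the disclosed implementation; it assumes each field is supplied as a flat sequence of pixel values, and the function and key names are invented for the example:

```python
def sad(x, y):
    """Summed absolute differences over co-located pixel values."""
    return sum(abs(a - b) for a, b in zip(x, y))

def frame_metrics(cur_f1, cur_f2, prev_f1, prev_f2):
    """The four SAD metrics of Eqs. 1-4 for one received frame."""
    return {
        "SADFS": sad(cur_f1, prev_f1),  # Eq. 1: first fields of current and previous frames
        "SADSS": sad(cur_f2, prev_f2),  # Eq. 2: second fields of current and previous frames
        "SADPO": sad(cur_f1, prev_f2),  # Eq. 3: current first field vs. previous second field
        "SADCO": sad(cur_f1, cur_f2),   # Eq. 4: the two fields of the current frame
    }
```

A small field pair suffices to exercise the four differences; in practice each field would hold roughly 640×240 pixels.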
The computational load to evaluate each SAD is described below. There are approximately 480 active horizontal lines in conventional NTSC. For the resolution to be the same in the horizontal direction, with a 4:3 aspect ratio, there should be 480×4/3=640 equivalent vertical lines, or degrees of freedom. The video format of 640×480 pixels is one of the formats accepted by the Advanced Television Systems Committee. Thus, every 1/30 of a second, the duration of a frame, 640×480=307,200 new pixels are generated. New data is generated at a rate of 9.2×10^6 pixels/sec, implying that the hardware or software running this system processes data at approximately a 10 MByte/sec rate or more. This is one of the high speed portions of the system. It can be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. The SAD calculator could be a standalone component, incorporated as hardware, firmware, middleware in a component of another device, or be implemented in microcode or software that is executed on the processor, or a combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments that perform the calculation may be stored in a machine readable medium such as a storage medium. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents.
Flowchart 30 in
Flowchart 80 in
Flowchart 80 illustrates a process for estimating the current phase. The flowchart at a step 83 describes the use of the determined metrics and lower envelope values to compute branch information. The branch information may be recognized as the Euclidean distances discussed earlier. Exemplary equations that may be used to generate the branch information are Eqs. 5-10 below. The Branch Info quantities are computed in block 109 of
The processed video data can be stored in a storage medium which can include, for example, a chip configured storage medium (e.g., ROM, RAM) or a disc-type storage medium (e.g., magnetic or optical) connected to the processor 25. In some aspects, the inverse telecine 23 and the deinterlacer 17 can each contain part or all of the storage medium. The branch information quantities are defined by the following equations.
Branch Info(0)=(SADFS−HS)2+(SADSS−HS)2+(SADPO−HP)2+(SADCO−LC)2 (5)
Branch Info(1)=(SADFS−LS)2+(SADSS−HS)2+(SADPO−LP)2+(SADCO−HC)2 (6)
Branch Info(2)=(SADFS−HS)2+(SADSS−HS)2+(SADPO−LP)2+(SADCO−HC)2 (7)
Branch Info(3)=(SADFS−HS)2+(SADSS−LS)2+(SADPO−LP)2+(SADCO−LC)2 (8)
Branch Info(4)=(SADFS−HS)2+(SADSS−HS)2+(SADPO−HP)2+(SADCO−LC)2 (9)
Branch Info(5)=(SADFS−LS)2+(SADSS−LS)2+(SADPO−LP)2+(SADCO−LC)2 (10)
The fine detail of the branch computation is shown in branch information calculator 109 in
HS=LS+A (11)
HP=LP+A (12)
HC=LC+A (13)
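Eqs. 5-13 can be combined into a single sketch. The code below is illustrative only; the dictionary layout for the metrics and the argument names are assumptions made for the example:

```python
def branch_info(m, LS, LP, LC, A):
    """Euclidean-distance branch information of Eqs. 5-10.
    m maps metric names to SAD values; the high values HS, HP, HC
    are derived from the lower envelopes per Eqs. 11-13."""
    HS, HP, HC = LS + A, LP + A, LC + A
    sq = lambda x: x * x
    return [
        sq(m["SADFS"]-HS) + sq(m["SADSS"]-HS) + sq(m["SADPO"]-HP) + sq(m["SADCO"]-LC),  # Eq. 5
        sq(m["SADFS"]-LS) + sq(m["SADSS"]-HS) + sq(m["SADPO"]-LP) + sq(m["SADCO"]-HC),  # Eq. 6
        sq(m["SADFS"]-HS) + sq(m["SADSS"]-HS) + sq(m["SADPO"]-LP) + sq(m["SADCO"]-HC),  # Eq. 7
        sq(m["SADFS"]-HS) + sq(m["SADSS"]-LS) + sq(m["SADPO"]-LP) + sq(m["SADCO"]-LC),  # Eq. 8
        sq(m["SADFS"]-HS) + sq(m["SADSS"]-HS) + sq(m["SADPO"]-HP) + sq(m["SADCO"]-LC),  # Eq. 9
        sq(m["SADFS"]-LS) + sq(m["SADSS"]-LS) + sq(m["SADPO"]-LP) + sq(m["SADCO"]-LC),  # Eq. 10
    ]
```

Each of the six quantities measures how far the observed metrics lie from the pattern of high and low SAD values expected under one of the six phase hypotheses.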
A process of tracking the values of LS, LP, and LC is presented in
The quantities LS and LC in
In the case of LS, however, the algorithm in
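The exact envelope-tracking procedure is given in the figures and is not reproduced in the text. Purely as an assumed sketch of the behavior a lower-envelope tracker typically has (dropping quickly toward new low samples, drifting up slowly otherwise), an update might look like the following; the rates `down` and `up` are invented parameters, not values from the disclosure:

```python
def update_envelope(L, sample, down=0.05, up=1.0):
    """Hypothetical lower-envelope update for LS, LP, or LC.
    A sample below the envelope pulls it down sharply; a sample
    above it lets the envelope leak upward slowly."""
    if sample < L:
        return L - up * (L - sample)    # snap down toward a new low
    return L + down * (sample - L)      # drift up slowly toward the data
```

With `up=1.0` the envelope latches onto each new minimum immediately, while `down=0.05` keeps it from chasing the large SAD values that occur on repeated-field mismatches.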
D0=αD4+Branch Info(0) (14)
D1=αD0+Branch Info(1) (15)
D2=αD1+Branch Info(2) (16)
D3=αD2+Branch Info(3) (17)
D4=αD3+Branch Info(4) (18)
D5=αD5+Branch Info(5) (19)
The quantity α is less than unity and limits the dependence of the decision variables on their past values; use of α is equivalent to diminishing the effect of each Euclidean distance as its data ages. In flowchart 62 the decision variables to be updated are listed on the left as available on lines 101, 102, 103, 104, 105, and 106. Each of the decision variables on one of the phase transition paths is then multiplied by α, a number less than one in one of the blocks 100; then the attenuated value of the old decision variable is added to the current value of the branch info variable indexed by the next phase on the phase transition path that the attenuated decision variable was on. This takes place in block 110. Variable D5 is offset by a quantity Δ in block 193; Δ is computed in block 112. As described below, the quantity is chosen to reduce an inconsistency in the sequence of phases determined by this system. The smallest decision variable is found in block 20.
In summary, new information specific to each decision is added to the appropriate decision variable's previous value that has been multiplied by α, to get the current decision variable's value. A new decision can be made when new metrics are in hand; therefore this technique is capable of making a new decision upon receipt of fields 1 and 2 of every frame. These decision variables are the sums of Euclidean distances referred to earlier.
The applicable phase is selected to be the one having the subscript of the smallest decision variable. A decision based on the decision variables is made explicitly in block 90 of
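The leaky-sum updates of Eqs. 14-19 and the smallest-variable selection rule can be sketched together; this is an illustration only, and holding the variables in a list indexed D0..D5 is an assumption of the example:

```python
def update_decisions(D, B, alpha):
    """Leaky-sum update of Eqs. 14-19.
    D: previous decision variables D0..D5; B: branch information
    values; alpha: the forgetting factor (less than unity)."""
    return [
        alpha * D[4] + B[0],  # Eq. 14: transition P4 -> P0
        alpha * D[0] + B[1],  # Eq. 15: transition P0 -> P1
        alpha * D[1] + B[2],  # Eq. 16: transition P1 -> P2
        alpha * D[2] + B[3],  # Eq. 17: transition P2 -> P3
        alpha * D[3] + B[4],  # Eq. 18: transition P3 -> P4
        alpha * D[5] + B[5],  # Eq. 19: P5 (true NTSC) stays P5
    ]

def select_phase(D):
    """The applicable phase has the subscript of the smallest variable."""
    return min(range(6), key=lambda i: D[i])
```

Because each update consumes only the current branch information and the previous variables, a fresh decision is available as soon as the two fields of each new frame arrive.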
Each phase can be regarded as a possible state of a finite state machine, with transitions between the states dependent on the current values of the decision variables and the six branch information quantities. When the transitions follow the pattern
P0→P1→P2→P3→P4→P0→P1→… or P5→P5→P5→…
the machine is operating properly. There may be occasional errors in a coherent string of decisions, because the metrics are drawn from video, which is inherently variable. This technique detects phase sequences that are inconsistent. Let x be the subscript of the current phase decision and y the subscript of the previous one. The condition x=5, y=5, and the conditions
x=1,y=0 or
x=2,y=1 or
x=3,y=2 or
x=4,y=3 or
x=0,y=4
are tested. If either test is in the affirmative, the decisions are declared to be consistent in block 420. If neither test is affirmative, an offset, shown in block 193 of
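The consistency tests amount to checking that successive phase decisions either both equal P5 or advance one step through the 3:2 cycle. A minimal sketch, with x the current and y the previous decision subscript (the function name is an invention of the example):

```python
def decisions_consistent(x, y):
    """True when phase decision x consistently follows decision y:
    either both are the NTSC phase P5, or (x, y) is one of the
    valid 3:2 pull down phase advances listed in the text."""
    telecine_steps = {(1, 0), (2, 1), (3, 2), (4, 3), (0, 4)}
    return (x == 5 and y == 5) or (x, y) in telecine_steps
```

Any other pair, such as a phase repeating or skipping, is flagged as inconsistent and triggers the offset mechanism described next.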
When the string of decisions is judged to be inconsistent, the offset Δ applied to D5 is updated to ΔB, defined by
ΔB=max(Δ−δ, −40δ0) (20)
Returning again to block 210, assume that the string of decisions is judged to be consistent. The parameter δ is changed to δ+ in block 215, defined by
δ+=max(2δ, 16δ0) (21)
The new value of δ is inserted into ΔA, the updating relationship for Δ in block 112A. This is
ΔA=max(Δ+δ, 40δ0) (22)
Then the updated value of Δ is added to decision variable D5 in block 193.
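Eqs. 20-22 can be collected into one sketch. Here Δ is `delta_big`, δ is `delta`, and δ0 is `delta0`; the text does not state how δ changes on the inconsistent branch, so it is left unchanged there as an explicit assumption of this illustration:

```python
def update_offset(delta_big, delta, delta0, consistent):
    """Update of the D5 offset per Eqs. 20-22 as written in the text."""
    if consistent:
        delta = max(2 * delta, 16 * delta0)              # Eq. 21
        delta_big = max(delta_big + delta, 40 * delta0)  # Eq. 22
    else:
        # Eq. 20; the inconsistent-branch update of delta itself is
        # described in the figures and is assumed unchanged here.
        delta_big = max(delta_big - delta, -40 * delta0)
    return delta_big, delta
```

The returned Δ is then added to D5, biasing the machine toward or away from the NTSC hypothesis according to how consistent the recent decisions have been.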
In the aspect described above, every time a new frame is received four new values of metrics are found and a six fold set of hypotheses is tested using newly computed decision variables. Other processing structures could be adapted to compute the decision variables. A Viterbi decoder adds the metrics of the branches that make up the paths together to form the path metric. The decision variables defined here are formed by a similar rule: each is the “leaky” sum of new information variables. (In a leaky summation the previous value of a decision variable is multiplied by a number less than unity before new information data is added to it.) A Viterbi decoder structure could be modified to support the operation of this procedure.
While the present aspect is described in terms of processing conventional video in which a new frame appears every 1/30 second, it is noted that this process may be applied to frames which are recorded and processed backwards in time. The decision space remains the same, but there are minor changes that reflect the time reversal of the sequence of input frames. For example, a string of coherent telecine decisions from the time-reversed mode (shown here)
P4 P3 P2 P1 P0
would also be reversed in time.
Using this variation on the first aspect would allow the decision process two tries—one going forward in time, the other backward—at making a successful decision. While the two tries are not independent, they are different in that each try would process the metrics in a different order.
This idea could be applied in conjunction with a buffer maintained to store future video frames for processing. If a video segment is found to give unacceptably inconsistent results in the forward direction of processing, the procedure would draw future frames from the buffer and attempt to get over the difficult stretch of video by processing frames in the reverse direction.
The processing of video described in this patent can also be applied to video in the PAL format.
It is noted that the aspects may be described as a process which is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.
It should also be apparent to those skilled in the art that one or more elements of a device disclosed herein may be rearranged without affecting the operation of the device. Similarly, one or more elements of a device disclosed herein may be combined without affecting the operation of the device. Those of ordinary skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. Those of ordinary skill would further appreciate that the various illustrative logical blocks, modules, and algorithm steps described in connection with the examples disclosed herein may be implemented as electronic hardware, firmware, computer software, middleware, microcode, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed methods.
The steps of a method or algorithm described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). The ASIC may reside in a wireless modem. In the alternative, the processor and the storage medium may reside as discrete components in the wireless modem.
In addition, the various illustrative logical blocks, components, modules, and circuits described in connection with the examples disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The previous description of the disclosed examples is provided to enable any person of ordinary skill in the art to make or use the disclosed methods and apparatus. Various modifications to these examples will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other examples and additional elements may be added without departing from the spirit or scope of the disclosed method and apparatus. The description of the aspects is intended to be illustrative, and not to limit the scope of the claims.
The Application for Patent claims priority to Provisional Application No. 60/730,145 entitled “Inverse Telecine Algorithm Based on State Machine” filed Oct. 24, 2005, and assigned to the assignee hereof and hereby expressly incorporated by reference herein.