The present invention relates generally to video encoding, and more particularly to a method and apparatus for encoding video in which a telecine pattern is detected.
Motion picture photography or film has a rate of 24 frames per second. Every frame itself is a complete picture, also known as a “progressive frame.” This means that all fields, top and bottom, correspond to the same instant of time.
Video signals, on the other hand, may have a progressive frame structure or an interlaced structure. An interlaced frame is divided into top and bottom fields, and scanning of one field does not start until the other one is finished. Moreover, video has a different frame rate than film. The NTSC standard (used primarily in North America) uses a frame rate of approximately thirty frames per second for interlaced video. The PAL standard (used in most of the rest of the world) uses a frame rate of twenty-five frames per second. Progressive video uses a frame rate of 60 frames per second.
The different frame rates used by film and video complicate the conversion between the two formats. In order to solve the problem of having extra video frames when converting film to be shown on television, a telecine process converts multiple frames of film into five frames of video. For progressive video at a frame rate of 60 frames per second, two frames of film are converted into five frames of video. One method of performing this process involves converting a first frame of film into three frames of video and a second frame of film into two frames of video. That is, the first frame of film is repeated twice and the second frame of film is repeated once. Because of the 3-2 pattern, the process is often called 3-2 pulldown. This pattern is illustrated generally in
The repeated or duplicate frames in the telecine process enable the viewing of film materials in the video format. However, in some applications, it is desirable to remove the duplicate frames. For example, the repeated frames do not contain new information and should be removed before encoding (compression). An inverse telecine process, also referred to as a detelecine process, converts a video signal back to a film format. This process takes incoming video, which is presumed to have been generated from film source material, and outputs the original frame images so that they can be encoded. By removing repeated frames from the video material, the encoding process can be made more efficient, and ultimately the amount of the resulting data can be greatly reduced.
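The 3-2 pulldown and its inverse can be illustrated with a minimal sketch for the progressive case. The function names `pulldown_3_2` and `inverse_telecine` are hypothetical, and the sketch assumes that repeated frames are bit-identical so that a simple equality test suffices; a practical detector would more likely compare frame or field differences against a threshold.

```python
def pulldown_3_2(film_frames):
    """Expand film frames into video frames using an alternating 3, 2 repeat pattern."""
    video = []
    for index, frame in enumerate(film_frames):
        copies = 3 if index % 2 == 0 else 2  # first frame shown 3 times, second 2 times
        video.extend([frame] * copies)
    return video


def inverse_telecine(video_frames):
    """Drop every frame that is identical to the frame immediately before it.

    Assumption: repeats are exact copies, so equality is enough to find them.
    """
    film = []
    previous = None
    for frame in video_frames:
        if previous is None or frame != previous:
            film.append(frame)
        previous = frame
    return film


video = pulldown_3_2(["A", "B", "C", "D"])
restored = inverse_telecine(video)
print(video)     # ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D', 'D']
print(restored)  # ['A', 'B', 'C', 'D']
```

In this sketch every two film frames yield five video frames, and removing the repeats recovers the original film sequence.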
Video encoders typically compress the video because video can require an enormous amount of digital storage if left uncompressed. One method for compressing digital video involves using the standards of the Moving Picture Experts Group (MPEG). The MPEG-2 standard calls for three types of frames to be encoded. Intra-frames, or I-frames, are encoded in the same manner as still images; an I-frame contains information sufficient to display an entire image. Predictive frames, or P-frames, use previous reference frames to determine what the current frame will be by recording changes between a previous frame and the current frame. Bi-directional frames, or B-frames, use previous and subsequent reference frames to determine what the current frame will be. P-frames and B-frames use motion vectors to encode frames. A motion vector describes the movement of a specific area from one frame to another frame. For example, a P-frame may be encoded by referencing an I-frame immediately preceding it. Motion vectors between the P-frame and the I-frame instruct a decoder to display the P-frame by moving certain areas within the I-frame, which results in the proper display of the P-frame.
One method of encoding digital video calls for grouping frames together into what are known as Groups of Pictures (GOPs). A GOP may begin with an I-frame, and have P-frames and B-frames which refer to the I-frame. A P-frame or a B-frame can refer to either an I-frame or a P-frame, but not to a B-frame. The length and order of GOPs can be determined before encoding or dynamically, while the encoder is encoding. An example of a sequence of a GOP may be IBBPBBPBB in display order, meaning an I-frame, followed by two B-frames, a P-frame, two more B-frames, another P-frame, and two more B-frames. In an encoder which determines the order of a GOP prior to encoding, this sequence would repeat itself. In the above sequence, the first P-frame will refer back to the first I-frame, since it cannot refer to a B-frame, and must refer to a frame that occurs before it. Each B-frame may refer to the I- or P-frame just preceding it in display order and/or the I- or P-frame just following it in display order.
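The reference rules just described can be sketched as a small helper that lists, for each frame of a GOP in display order, the frames it may refer to. The name `reference_frames` is hypothetical, and the sketch assumes a single GOP that begins with an I-frame; references that would cross into a following GOP are simply left out.

```python
def reference_frames(gop):
    """Map each frame index (display order) to the indices of frames it may reference.

    Assumptions: `gop` is a string of 'I', 'P' and 'B' that starts with 'I',
    and only frames inside this one GOP are considered.
    """
    anchors = [i for i, kind in enumerate(gop) if kind in "IP"]
    refs = {}
    for i, kind in enumerate(gop):
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if kind == "I":
            refs[i] = []                        # coded like a still image
        elif kind == "P":
            refs[i] = earlier[-1:]              # nearest preceding I- or P-frame
        else:
            refs[i] = earlier[-1:] + later[:1]  # nearest anchor on each side
    return refs


refs = reference_frames("IBBPBBPBB")
for index, targets in refs.items():
    print(index, "IBBPBBPBB"[index], targets)
```

For the sequence IBBPBBPBB, the first P-frame (index 3) refers to the I-frame at index 0, and the B-frames at indices 1 and 2 may refer to indices 0 and 3. The last two B-frames list only the preceding P-frame because the following anchor would belong to the next GOP.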
A block diagram of one example of a digital video encoder that may be used to encode video that has been converted from film source material is shown in
Video encoders such as shown in
The third case shown in
In the case of 3-2 pulldown, for every five frames of video that are received, three are dropped (i.e., only two frames are encoded). If the pipeline has a duration of one second, by the time the last frame in the pipeline begins to be encoded the pipeline will have been delayed by ⅗ of a second, or 600 ms. This delay significantly reduces the time that is available between the time a frame is encoded and the time it is to be decoded by the decoder, which is specified by the frame's decode time stamp. As a result of this delay, the frame must be compressed more aggressively to prevent underflow in the decoder buffer. Consequently, video quality will be degraded due to the aggressive compression.
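The delay figure above follows from multiplying the fraction of frames that are dropped by the duration of the pipeline. A minimal sketch of that arithmetic is given here; the function name `pipeline_delay_ms` and its parameters are hypothetical, and the model assumes that each dropped frame stalls the pipeline for exactly one frame period.

```python
from fractions import Fraction


def pipeline_delay_ms(dropped, out_of, pipeline_ms=1000):
    """Delay accumulated over one pipeline length when `dropped` of every `out_of` frames are dropped."""
    return Fraction(dropped, out_of) * pipeline_ms


print(pipeline_delay_ms(3, 5))   # 3-2 pulldown: 600
print(pipeline_delay_ms(1, 10))  # one frame in ten dropped: 100
```

Under the same model, dropping only one frame in ten in a one-second pipeline gives a delay of 100 ms.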
In accordance with one aspect of the invention, a video encoder is provided which includes an inverse telecine detector for receiving video frames and generating a telecine detection signal identifying repeated frames and an encoder pipeline buffer for storing unrepeated video frames received from the inverse telecine detector. The video encoder also includes an encoder engine for encoding the unrepeated video frames received from the encoder pipeline buffer, a pre-encoded frame storage medium for storing pre-encoded frames, and a processor. The processor is configured to cause the encoder engine to replace selected frames that have been identified as repeating frames by the inverse telecine detector with a pre-encoded frame accessed from the pre-encoded frame storage medium.
In accordance with another aspect of the invention, a method is provided for encoding a series of video frames. In accordance with the method, a series of video frames is received and a pattern indicative of a telecine process is detected in the series. Repeated video frames are removed from the series of video frames and replaced with pre-encoded video frames selected from a plurality of pre-encoded video frames. Each of the video frames in the series that has not been pre-encoded is sequentially encoded.
The present invention relates to devices and methods for efficiently encoding digital video. This invention may be used to increase efficiency when encoding video that has been processed using a 3:2 pulldown process. Although the embodiments described below relate to encoding video that has been processed using a 3:2 pulldown process, it is understood that the present invention may be used for any type of video that has been converted from film source material, including both interlaced and progressive video formats.
The methods and techniques discussed herein are applicable to digital video encoders that may be employed in a wide variety of different environments and thus are not limited to an encoder that is configured for any one particular application. In some cases, however, these methods and techniques may be particularly suitable for use in one or more of the various encoders that are employed in a content delivery system that is used to deliver programming content to subscribers, an example of which will be presented in connection with
The headend 10 delivers the programming content received from the content provider to subscriber terminals 40 over a content delivery system 25. Illustrative examples of the content delivery system 25 include, but are not limited to, broadcast television networks, cable data networks, xDSL (e.g., ADSL, ADSL2, ADSL2+, VDSL, and VDSL2) systems, satellite television networks, and packet-switched networks such as Ethernet networks and Internet networks. In the case of a cable data network, an all-coaxial or a hybrid fiber-coax (HFC) network may be employed. Such a cable data network generally includes, for example, an edge QAM modulator and the all-coaxial or HFC network itself. The edge modulator receives Ethernet frames that encapsulate transport packets, de-capsulates these frames, removes network jitter, implements modulation, performs frequency up-conversion, and transmits radio frequency signals representative of the transport stream packets to end users over the HFC network. In the HFC network, the transport stream is distributed from the headend 10 (e.g., a central office) to a number of second level facilities (distribution hubs). Each hub in turn distributes carriers to a number of fiber nodes. In a typical arrangement, the distribution medium from the headend down to the fiber node level is optical fiber. Subscriber homes are connected to fiber hubs via coaxial cables. In the case of a packet-switched network, any suitable network-level protocol may be employed. While the IP protocol suite is often used, other standard and/or proprietary communication protocols are suitable substitutes.
Subscriber terminals 40 may be any device that can receive, decode and, if necessary, decrypt the content received over the content delivery system 25. Illustrative examples of subscriber terminals include set top boxes, personal computers, media centers, and the like.
A simplified block diagram of one example of a digital video encoder such as encoder 20 or encoder 17 which operates in accordance with the methods and techniques described herein is shown in
It will be understood that the function of the various components of the video encoder shown in
One example of an encoder engine is shown in
The DCT module 104 transforms the difference signal from the pixel domain to the frequency domain using a DCT algorithm to produce a set of coefficients. The quantizer 106 quantizes the DCT coefficients. The entropy coder 108 codes the quantized DCT coefficients to produce a coded frame. The inverse quantizer 110 performs the inverse operation of the quantizer 106 to recover the DCT coefficients. The inverse DCT module 112 performs the inverse operation of the DCT module 104 to produce an estimated signal. If no prediction was used, the output of the inverse DCT is the estimated frame. If prediction was used, the output of the inverse DCT is the estimated difference signal. The estimated difference signal is added to the predicted frame by the summer 114 to produce an estimated frame, which is coupled to the deblocking filter 116. The deblocking filter deblocks the estimated frame and stores the estimated frame or reference frame in the frame memory 118. The motion compensated predictor 120 and the motion estimator 124 are coupled to the frame memory 118 and are configured to obtain one or more previously estimated frames (previously coded frames).
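The transform, quantization and reconstruction path described above can be sketched in simplified form. The sketch below is an illustration under stated assumptions and is not the encoder engine itself: it uses a one-dimensional orthonormal DCT on a row of samples rather than a two-dimensional block transform, a single uniform quantizer step, and it omits the entropy coder and the deblocking filter. All function names are hypothetical.

```python
import math


def _scale(k, n):
    """Orthonormal scaling factor for DCT coefficient k of an n-point transform."""
    return math.sqrt((1 if k == 0 else 2) / n)


def dct(samples):
    """One-dimensional orthonormal DCT-II (pixel domain to frequency domain)."""
    n = len(samples)
    return [
        _scale(k, n)
        * sum(samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct(coefficients):
    """Inverse of `dct` (frequency domain back to pixel domain)."""
    n = len(coefficients)
    return [
        sum(
            _scale(k, n) * coefficients[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def encode_and_reconstruct(source, predicted, step):
    """Return the quantized levels and the estimated (reconstructed) samples."""
    difference = [s - p for s, p in zip(source, predicted)]   # difference signal
    levels = [round(c / step) for c in dct(difference)]       # DCT, then quantizer
    recovered = [level * step for level in levels]            # inverse quantizer
    estimated_difference = idct(recovered)                     # inverse DCT
    estimated = [p + d for p, d in zip(predicted, estimated_difference)]  # summer
    return levels, estimated


source = [52, 55, 61, 66, 70, 61, 64, 73]
predicted = [50] * 8
levels, estimated = encode_and_reconstruct(source, predicted, step=4)
print(levels)
print([round(value, 2) for value in estimated])
```

Because the quantizer discards precision, the estimated samples only approximate the source. Using the estimated frame rather than the source frame as the reference keeps the encoder's prediction consistent with what a decoder can reconstruct.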
The motion estimator 124 also receives the source frame. The motion estimator 124 performs a motion estimation algorithm using the source frame and a previous estimated frame (i.e., reference frame) to produce motion estimation data. The motion estimation data is provided to the entropy coder 108 and the motion compensated predictor 120. The entropy coder 108 codes the motion estimation data to produce coded motion data. The motion compensated predictor 120 performs a motion compensation algorithm using a previous estimated frame and the motion estimation data to produce the predicted frame, which is coupled to the intra/inter switch 122. Motion estimation and motion compensation algorithms are well known in the art.
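As one illustration of motion estimation and motion compensation, the following sketch performs an exhaustive block-matching search using the sum of absolute differences (SAD). It is a generic textbook technique shown on a one-dimensional row of samples for brevity, and it is an assumption that it resembles what the motion estimator 124 does; the function names and parameters are hypothetical.

```python
def estimate_motion(block, reference, position, search_range):
    """Find the offset within +/- search_range whose reference samples best match `block`.

    Returns (offset, sad), where sad is the sum of absolute differences of the best match.
    """
    best_offset, best_sad = 0, None
    for offset in range(-search_range, search_range + 1):
        start = position + offset
        if start < 0 or start + len(block) > len(reference):
            continue  # candidate would fall outside the reference row
        candidate = reference[start:start + len(block)]
        sad = sum(abs(a - b) for a, b in zip(block, candidate))
        if best_sad is None or sad < best_sad:
            best_offset, best_sad = offset, sad
    return best_offset, best_sad


def compensate(reference, position, offset, size):
    """Build the predicted block by copying the displaced area of the reference."""
    start = position + offset
    return reference[start:start + size]


reference = [10, 10, 10, 80, 90, 80, 10, 10, 10, 10]
source = [10, 10, 10, 10, 10, 80, 90, 80, 10, 10]  # same content shifted right by two samples
block = source[5:8]

offset, sad = estimate_motion(block, reference, position=5, search_range=3)
predicted = compensate(reference, position=5, offset=offset, size=len(block))
print(offset, sad, predicted)
```

Here the best match lies two samples to the left in the reference, so the predicted block equals the source block and the difference signal passed to the transform is zero.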
As previously mentioned, the pipeline to the video encoder may undergo significant delays if it is stopped each time a repeated frame is dropped when encoding video that has been converted from film. To overcome this problem, instead of dropping the repeated frame, the frame is tagged by the inverse telecine detector to identify it as a repeated frame. Then, instead of using the encoding engine to encode the repeated frame, the repeated frame is replaced with a frame that has been pre-encoded and stored in a memory such as the pre-encoded frame storage 180. In the example shown in
Since in the example of
Among other advantages of this technique, since the motion vectors and residual coefficients of the pre-encoded B frame are zero, the pre-encoded B frame will be very small in size. In addition, because it has been pre-encoded, the resources of the encoder will be conserved.
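The substitution of a stored pre-encoded frame for a tagged repeated frame can be sketched as follows. This is a minimal illustration under simplifying assumptions: every tagged frame is replaced, the pre-encoded frame is represented by a placeholder byte string, and the encoder engine is an arbitrary callable. The names `PRE_ENCODED_B`, `encode_sequence` and `fake_engine` are hypothetical.

```python
# Hypothetical placeholder for a stored pre-encoded B frame whose motion
# vectors and residual coefficients are all zero.
PRE_ENCODED_B = b"\x00"


def encode_sequence(frames, repeated_flags, encode):
    """Encode untagged frames with `encode`; substitute the stored frame for tagged ones.

    One output entry is produced per input frame, so no frame is dropped
    and the pipeline is not stalled.
    """
    output = []
    for frame, repeated in zip(frames, repeated_flags):
        if repeated:
            output.append(PRE_ENCODED_B)  # fetched from storage, engine not used
        else:
            output.append(encode(frame))
    return output


engine_calls = []


def fake_engine(frame):
    """Stand-in for the encoder engine; records which frames it was asked to encode."""
    engine_calls.append(frame)
    return ("coded:" + frame).encode()


frames = ["A", "A", "A", "B", "B"]
flags = [False, True, True, False, True]  # tags supplied by the inverse telecine detector
stream = encode_sequence(frames, flags, fake_engine)
print(len(stream), engine_calls)  # 5 ['A', 'B']
```

In this sketch the encoder engine runs only twice for five input frames, while the output still contains five entries.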
As explained above, when the set of three identical frames in the frame sequence shown in
In particular, when the set of two identical frames are B frames, both B frames may be encoded. On the other hand, one of the B frames may be encoded and the other B frame may be dropped in the conventional manner. While this latter option will cause a pipeline delay, the delay will be much less than the delay that occurs when all repeated frames are dropped. For instance, if one in ten frames is dropped, a 100 ms delay will occur in a pipeline that has a duration of one second. In contrast, as previously mentioned, a delay of 600 ms will arise in the case of 3:2 pulldown when three of every five frames are dropped. Of course, as previously mentioned, the methods and techniques described above are not limited to video that has been processed using a 3:2 pulldown process, but are equally applicable to video that has been processed using other telecine processes as well. In another method, one of the B frames may be replaced by a P frame, thus dynamically changing “m”. The remaining B frame of the pair will then be coded with a pre-encoded frame.
Although not illustrated in
The processes described above may be implemented in a video encoder such as the illustrative video encoder shown in
The processes described above, including but not limited to those performed by the video encoder shown in
Although various embodiments are specifically illustrated and described herein, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and are within the purview of the appended claims without departing from the spirit and intended scope of the invention.