The present invention relates to video processing and, more particularly, to a method and apparatus for detecting scene changes.
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present invention which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Motion picture video content data is generally captured, stored, transmitted, processed, and output as a series of still images. Small frame-by-frame data content changes are perceived as motion when the output is directed to a viewer at sufficiently close time intervals. A large data content change between two adjacent frames is perceived as a scene change (e.g., a change from an indoor to an outdoor scene, a change in camera angle, an abrupt change in illumination within an image, and the like).
Encoding and compression processes take advantage of small frame-by-frame video content data changes to reduce the amount of data needed to store, transmit, and process video data content. The amount of data required to describe the changes is less than the amount of data required to describe the original still image. Under standards developed by the Moving Pictures Experts Group (MPEG), for example, a group of frames begins with an intra-coded frame (I-frame) in which encoded video content data corresponds to visual attributes (e.g., luminance, chrominance) of the original still image. Subsequent frames in the group of frames, such as predictive coded frames (P-frames) and bi-directional coded frames (B-frames), are encoded based on changes from earlier frames in the group. New groups of frames, and thus new I-frames, are begun at regular time intervals to prevent, for instance, noise from inducing false video content data changes. New groups of frames, and thus new I-frames, are also begun at scene changes when the video content data changes are large because less data is required to describe a new still image than to describe the large changes between the adjacent still images. In other words, two pictures from different scenes have little correlation between them. Compression of the new picture into an I-frame is more efficient than using one picture to predict the other picture. Therefore, during content data encoding, it is important to identify scene changes between adjacent video content data frames.
The identification of scene changes is also relevant in film post-production processing. For example, color correction processing, one type of post-production processing, is typically applied to motion picture video content data on a scene-by-scene basis. As a result, quick and accurate detection of scene boundaries is critical.
Several processes exist to identify scene changes between two video content frames. Motion-based processes compare vector motion for blocks of picture elements (pixels) between two frames to identify scene changes. Histogram-based processes map, for example, the distribution of pixel color data for the two frames and compare the distributions to identify scene changes. Picture feature-based processes identify a given object (e.g., an actor, a piece of scenery, or the like) in a video content data frame to determine if the defined attributes of the object are associated with a predetermined scene classification. However, each process has drawbacks. For example, motion-based processes are often very time-consuming, requiring multiple clock cycles and dedicated processor bandwidth. Histogram-based processes, when used exclusively, are often inaccurate and incorrectly detect scene changes. Finally, picture feature-based processes are often even more difficult and time-consuming than motion-based processes.
The present invention is directed towards overcoming these drawbacks.
The present invention is directed towards an apparatus and method for detecting scene change by using a Sum of Absolute Histogram Difference (SAHD) and a Sum of Absolute Display Frame Difference (SADFD). The present invention uses the temporal information in the same scene to smooth out variations and accurately detect scene changes. The present invention can be used for both real-time (e.g., real-time video compression) and non-real-time (e.g., film post-production) applications.
These and other advantages and features of the invention will become readily apparent to those skilled in the art after reading the following detailed description of the invention and studying the accompanying drawings.
The following is a detailed description of the presently preferred embodiments of the present invention. However, the present invention is in no way intended to be limited to the embodiments discussed below or shown in the drawings. Rather, the description and the drawings are merely illustrative of the presently preferred embodiments of the invention.
One or more specific embodiments of the present invention will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
Referring now to
Referring now to
Referring now to
For example, the SADFD for the first two pictures would be calculated using the following formula:
SADFD = Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} |p1(i,j) − p2(i,j)|
where M is the width of the picture and N is the height of the picture, p1(i,j) is the single-channel value at pixel (i,j) of the first picture, and p2(i,j) is the corresponding value of the second picture.
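As a sketch, the SADFD formula above can be computed over two equally sized single-channel 8-bit pictures. The use of NumPy arrays is an assumption for illustration; the patent does not prescribe an implementation:

```python
import numpy as np

def sadfd(p1: np.ndarray, p2: np.ndarray) -> int:
    """Sum of Absolute Display Frame Difference between two equally
    sized single-channel pictures (e.g., luminance planes)."""
    # Promote to a signed type so the 8-bit subtraction cannot wrap.
    diff = p1.astype(np.int32) - p2.astype(np.int32)
    return int(np.abs(diff).sum())
```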
The SAHD for the first two pictures would be calculated using the following formula:
SAHD = Σ_{i=0}^{255} |H1(i) − H2(i)|
where H1(i) is the number of pixels in the single channel of the first picture that have the value i, and H2(i) is the corresponding count for the second picture.
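The SAHD formula above can likewise be sketched over the 256 histogram bins of an 8-bit single-channel picture (again assuming NumPy arrays for illustration):

```python
import numpy as np

def sahd(p1: np.ndarray, p2: np.ndarray) -> int:
    """Sum of Absolute Histogram Difference over the 256 bins of an
    8-bit single-channel picture."""
    # Histogram of pixel values 0..255 for each picture.
    h1 = np.bincount(p1.ravel(), minlength=256)
    h2 = np.bincount(p2.ravel(), minlength=256)
    return int(np.abs(h1 - h2).sum())
```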
It should be noted that when the SADFD is less than four, a false scene change may be detected. To avoid such false detections, the SADFD is set equal to four whenever the calculated SADFD is less than four.
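The guard described above amounts to a simple clamp; a minimal sketch, where the floor value of four comes from the text:

```python
def clamp_sadfd(sadfd: int, floor: int = 4) -> int:
    """Never let the SADFD fall below the floor value, to avoid
    false scene change detections on near-identical pictures."""
    return max(sadfd, floor)
```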
At step 50, the scene detection module determines if the picture being processed is a first picture in a new scene. If so, at step 70, the accumulated total values for the SADFD and SAHD are set to zero and the scene detection module returns to step 40 to receive the next picture of the uncompressed motion picture video content datastream. If not, the scene detection module accumulates a total SADFD and total SAHD using a weighted formula. Exemplary weighted formulas that have been found to yield accurate scene detection results are:
TotalSADFD=TotalSADFD*0.4+0.6*SADFD
TotalSAHD=TotalSAHD*0.4+0.6*SAHD
Weight values other than 0.4 and 0.6 may be used; however, these weight values have been found to generate accurate scene detection results.
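The weighted accumulation above is an exponentially weighted update; it can be sketched as follows (the function name and parameter names are illustrative, not from the patent):

```python
def update_totals(total_sadfd: float, total_sahd: float,
                  sadfd: float, sahd: float,
                  keep: float = 0.4, gain: float = 0.6):
    """Weighted accumulation of per-picture SADFD/SAHD values,
    smoothing out variations within a scene."""
    total_sadfd = total_sadfd * keep + gain * sadfd
    total_sahd = total_sahd * keep + gain * sahd
    return total_sadfd, total_sahd
```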
Next, to detect the presence of a scene change, the scene detection module, at steps 52-68, executes a series of selected tests. More specifically, each test utilizes a ratio of a currently read picture's SADFD to an accumulated TotalSADFD and a ratio of the currently read picture's SAHD to an accumulated TotalSAHD.
A first scene detection test starts at step 52, wherein the scene detection module determines if a currently read picture's SADFD is greater than the accumulated TotalSADFD and if the currently read picture's SAHD is greater than the accumulated TotalSAHD. If not, the scene detection module initiates a second scene detection test at step 54, described in further detail below. If so, the scene detection module, at step 58, generates a SADFD-based ratio and a SAHD-based ratio. More specifically, the generated ratios are as follows:
ratioSADFD=SADFD/TotalSADFD
ratioSAHD=SAHD/TotalSAHD
Next, at step 66, the scene detection module calculates a new scene value as follows:
newscene=(int)(ratioSADFD*4+ratioSAHD)/8
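The new scene value formula above can be sketched as follows; the integer truncation mirrors the (int) cast, under the assumption that the cast applies to the full quotient:

```python
def new_scene_value(ratio_sadfd: float, ratio_sahd: float) -> int:
    """New scene value of step 66: a weighted blend of the two
    ratios, truncated to an integer.  A value >= 1 flags a scene
    change at step 68."""
    return int((ratio_sadfd * 4 + ratio_sahd) / 8)
```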
Then, at step 68, the scene detection module determines if the calculated new scene value is greater than or equal to one. If the new scene value is greater than or equal to one, the scene detection module generates a control signal, as discussed in
TotalSADFD=TotalSADFD*0.4+0.6*SADFD
TotalSAHD=TotalSAHD*0.4+0.6*SAHD
Weight values other than 0.4 and 0.6 may be used; however, these weight values have been found to generate accurate scene detection results. Afterwards, the scene detection module returns to step 40 to receive the next picture of the uncompressed motion picture video content datastream.
If, at step 52, the scene detection module determines that either the currently read picture's SADFD is not greater than the accumulated TotalSADFD or the currently read picture's SAHD is not greater than the accumulated TotalSAHD, the scene detection module, at step 54, initiates a second scene detection test. At step 54, the scene detection module determines if a currently read picture's SADFD is less than the accumulated TotalSADFD and if the currently read picture's SAHD is less than the accumulated TotalSAHD. If not, the scene detection module initiates a third scene detection test at step 56, described in further detail below. If so, the scene detection module, at step 60, generates a SADFD-based ratio and a SAHD-based ratio. More specifically, the generated ratios are as follows:
ratioSADFD=TotalSADFD/SADFD
ratioSAHD=TotalSAHD/SAHD
Next, at step 66, the scene detection module calculates a new scene value as follows:
newscene=(int)(ratioSADFD*4+ratioSAHD)/8
Then, at step 68, the scene detection module determines if the calculated new scene value is greater than or equal to one. If the new scene value is greater than or equal to one, the scene detection module generates a control signal, as discussed in
TotalSADFD=TotalSADFD*0.4+0.6*SADFD
TotalSAHD=TotalSAHD*0.4+0.6*SAHD
Weight values other than 0.4 and 0.6 may be used; however, these weight values have been found to generate accurate scene detection results. Afterwards, the scene detection module returns to step 40 to receive the next picture of the uncompressed motion picture video content datastream.
If, at step 54, the scene detection module determines that either the currently read picture's SADFD is not less than the accumulated TotalSADFD or the currently read picture's SAHD is not less than the accumulated TotalSAHD, the scene detection module, at step 56, initiates a third scene detection test. At step 56, the scene detection module determines if a currently read picture's SADFD is greater than the accumulated TotalSADFD and if the currently read picture's SAHD is less than the accumulated TotalSAHD. If not, the scene detection module determines that the currently read picture's SADFD is less than the accumulated TotalSADFD and the currently read picture's SAHD is greater than the accumulated TotalSAHD, and initiates a fourth scene detection test at step 64, described in further detail below. If so, the scene detection module, at step 62, generates a SADFD-based ratio and a SAHD-based ratio. More specifically, the generated ratios are as follows:
ratioSADFD=SADFD/TotalSADFD
ratioSAHD=TotalSAHD/SAHD
Next, at step 66, the scene detection module calculates a new scene value as follows:
newscene=(int)(ratioSADFD*4+ratioSAHD)/8
Then, at step 68, the scene detection module determines if the calculated new scene value is greater than or equal to one. If the new scene value is greater than or equal to one, the scene detection module generates a control signal, as discussed in
TotalSADFD=TotalSADFD*0.4+0.6*SADFD
TotalSAHD=TotalSAHD*0.4+0.6*SAHD
Weight values other than 0.4 and 0.6 may be used; however, these weight values have been found to generate accurate scene detection results. Afterwards, the scene detection module returns to step 40 to receive the next picture of the uncompressed motion picture video content datastream.
As discussed above, if the scene detection module determines that the currently read picture's SADFD is less than the accumulated TotalSADFD and the currently read picture's SAHD is greater than the accumulated TotalSAHD, the scene detection module, at step 64, generates a SADFD-based ratio and a SAHD-based ratio. More specifically, the generated ratios are as follows:
ratioSADFD=TotalSADFD/SADFD
ratioSAHD=SAHD/TotalSAHD
Next, at step 66, the scene detection module calculates a new scene value as follows:
newscene=(int)(ratioSADFD*4+ratioSAHD)/8
Then, at step 68, the scene detection module determines if the calculated new scene value is greater than or equal to one. If the new scene value is greater than or equal to one, the scene detection module generates a control signal, as discussed in
TotalSADFD=TotalSADFD*0.4+0.6*SADFD
TotalSAHD=TotalSAHD*0.4+0.6*SAHD
Weight values other than 0.4 and 0.6 may be used; however, these weight values have been found to generate accurate scene detection results. Afterwards, the scene detection module returns to step 40 to receive the next picture of the uncompressed motion picture video content datastream.
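The four tests of steps 52-64 always produce ratios greater than or equal to one, so the decision step can be sketched as a single function. This is an illustrative rendering, not the patent's implementation; it assumes a clamped SADFD and non-zero values, so no ratio divides by zero:

```python
def detect_scene_change(sadfd: float, sahd: float,
                        total_sadfd: float, total_sahd: float) -> bool:
    """One decision step over a picture's SADFD/SAHD and the
    accumulated totals (steps 52-68).  Returns True when a new
    scene is flagged."""
    if sadfd > total_sadfd and sahd > total_sahd:      # first test, step 52
        ratio_sadfd = sadfd / total_sadfd
        ratio_sahd = sahd / total_sahd
    elif sadfd < total_sadfd and sahd < total_sahd:    # second test, step 54
        ratio_sadfd = total_sadfd / sadfd
        ratio_sahd = total_sahd / sahd
    elif sadfd > total_sadfd and sahd < total_sahd:    # third test, step 56
        ratio_sadfd = sadfd / total_sadfd
        ratio_sahd = total_sahd / sahd
    else:                                              # fourth test, step 64
        ratio_sadfd = total_sadfd / sadfd
        ratio_sahd = sahd / total_sahd
    # Steps 66-68: blend the ratios and compare against one.
    new_scene = int((ratio_sadfd * 4 + ratio_sahd) / 8)
    return new_scene >= 1
```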
As described above, the present invention is described as using a combination of Sum of Absolute Histogram Difference (SAHD) and Sum of Absolute Display Frame Difference (SADFD). Components used to generate these differences can include, but are not limited to, luminance, chrominance, R, G, B, or any other video component.
While the present invention has been described in terms of a preferred embodiment above, those skilled in the art will readily appreciate that numerous modifications, substitutions and additions may be made to the disclosed embodiment without departing from the spirit and scope of the present invention. For example, the apparatus and method described herein may be implemented in hardware, software or a combination of hardware and software. It is intended that all such modifications, substitutions and additions fall within the scope of the present invention which is best defined by the claims below.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US06/22341 | 6/8/2006 | WO | 00 | 12/4/2008 |