The present invention relates generally to digital video processing and analysis and, more particularly, to a system and method for scene change detection employing a randomly sub-sampled partition voting algorithm.
Digital video codec technology, which enables video compression and decompression, is an integral part of the telecommunication, entertainment, and broadcasting industries. Many advanced video compression standards, such as ISO/IEC MPEG-1, MPEG-2, MPEG-4, CCITT H.261, ITU-T H.263, ITU-T H.264, and Microsoft WMV9/VC-1, have been developed to deliver a high-quality video stream at a low bit rate.
In video compression, a video sequence is encoded using two types of frames: intra frames and predicted frames. Intra frames use only their internal information, while predicted frames exploit the temporal redundancy of a video sequence. Thus, a frame is selected as a reference, and subsequent frames are predicted from the reference. When neighboring frames have high correlation, the compression ratio of the predicted frame is much higher than that of the intra frame. In order to achieve a high compression ratio, the percentage of predicted frames within a video sequence is typically 95% or higher. However, intra frames encode a frame more efficiently than predicted frames when the frame has little correlation to the previous frame. Furthermore, intra frames are inserted in a sequence of predicted frames to avoid propagation of errors which accumulate while encoding predicted frames based on previous predicted frames.
A video sequence can be divided into different shots. A transition between two shots is a scene change. The first frame after a scene change should be encoded as an intra frame, because its correlation to the previous frame, if any, is very low. A scene change detection algorithm is required to identify changes in the scene content of the video sequence and to decide when to insert an intra frame into a succession of frames, thus segmenting the video into shots.
Existing low cost scene change detection algorithms can be divided into spatial correlation-based and histogram-based. Spatial correlation-based algorithms are very sensitive to motion, while histogram-based algorithms lose most of the spatial information during their decision making process. In addition to these shortcomings, the computational complexity of these two types of algorithms is usually quite high. Therefore, they are not entirely suitable to meet the requirements of a real-time embedded video encoder, i.e., low memory access bandwidth, low computational complexity, and low latency.
In view of the foregoing, embodiments of the invention provide a method for a reliable low cost scene change detection, utilizing a randomly sub-sampled partition voting (RSPV) algorithm. The RSPV algorithm exploits advantages of both spatial correlation-based and histogram-based algorithms.
According to embodiments of the invention, a current frame is divided into a number of partitions. Each partition is then randomly sub-sampled, and a histogram of the pixel intensity values is built to determine whether the current partition differs from the corresponding partition in a reference frame. A bin-by-bin absolute histogram difference between a partition in the current frame and the co-located partition in the reference frame is calculated. The histogram difference is then compared to an adaptive threshold. If the majority of the examined partitions have changed significantly, a scene change is assumed to be detected. In addition, various other thresholds can be used to determine whether a partition can be reported as significantly changed.
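The sub-sampling and histogram-building step described above can be sketched as follows. This is a minimal illustration, not the inventors' implementation: the function name `subsampled_histogram`, its parameters, the 8-bit intensity range, and the choice to draw positions uniformly with replacement are all assumptions made for the example.

```python
import random


def subsampled_histogram(partition, num_samples, num_bins=16,
                         max_value=256, rng=None):
    """Build a histogram from randomly chosen pixels of one partition.

    partition: list of rows, each a list of integer intensities
               in the range [0, max_value).
    num_samples: how many pixel positions to draw (assumed: uniformly,
                 with replacement).
    """
    rng = rng or random.Random()
    height, width = len(partition), len(partition[0])
    hist = [0] * num_bins
    for _ in range(num_samples):
        y = rng.randrange(height)
        x = rng.randrange(width)
        # Map the intensity to a bin index in [0, num_bins).
        hist[partition[y][x] * num_bins // max_value] += 1
    return hist


# A flat partition of intensity 200 puts every sample into bin 12.
flat = [[200] * 8 for _ in range(8)]
hist = subsampled_histogram(flat, num_samples=32, rng=random.Random(0))
```

Only `num_samples` pixels are read from memory, regardless of the partition size, which is where the reduced memory access cost comes from.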
Employing the histogram calculation makes the RSPV algorithm motion-independent, while partitioning utilizes sufficient spatial information. Because the histogram is calculated on a sub-sampled frame, the algorithm is characterized by a significantly reduced cost of memory access and computations.
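The motion independence of a histogram can be illustrated with a toy example. The sketch below is an assumption-laden illustration (a 4x4 partition, a cyclic one-pixel shift standing in for motion, and full rather than sub-sampled histograms); the helper names `histogram` and `flatten` are hypothetical.

```python
def histogram(pixels, num_bins=16, max_value=256):
    """Count pixels per intensity bin."""
    hist = [0] * num_bins
    for value in pixels:
        hist[value * num_bins // max_value] += 1
    return hist


def flatten(block):
    """Turn a list of rows into a flat list of pixel values."""
    return [value for row in block for value in row]


# A partition with a vertical edge, and the same content shifted by one pixel.
partition = [[0, 0, 255, 255] for _ in range(4)]
shifted = [row[1:] + row[:1] for row in partition]

# Pixel-wise (spatial) difference reacts strongly to the shift ...
pixel_diff = sum(abs(a - b)
                 for a, b in zip(flatten(partition), flatten(shifted)))
# ... while the histogram difference does not react at all.
hist_diff = sum(abs(a - b)
                for a, b in zip(histogram(flatten(partition)),
                                histogram(flatten(shifted))))

print(pixel_diff, hist_diff)  # 2040 0
```

With random sub-sampling the histograms of the two blocks would agree only approximately rather than exactly, but the same insensitivity to displacement within a partition applies.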
Accordingly, a number of aspects of the invention are presented, along with a number of exemplary embodiments, which are not intended as limiting.
One such aspect is a method for scene change detection in a video sequence, the method comprising: (a) partitioning a current frame into a plurality of partitions each containing a plurality of pixels; (b) sub-sampling randomly said plurality of pixels within each of the plurality of partitions; (c) for each current partition from the plurality of partitions, generating a histogram of the number of pixels in each pixel value range of a plurality of pixel value ranges, the histogram comprising a plurality of bins; (d) determining a bin-by-bin absolute histogram difference between the current partition and a corresponding partition in a reference frame; (e) if the bin-by-bin absolute histogram difference is greater than a first predetermined threshold, labeling the current partition as changed; (f) repeating steps (b) through (e) for each of the plurality of partitions in the current frame; and (g) if a number of the partitions in the current frame labeled as changed is greater than a second predetermined threshold, reporting a scene change in the current frame.
According to another aspect, a computer-readable storage medium is provided, encoded with computer instructions for execution on a computer system, the instructions, when executed, performing a method for scene change detection in a video sequence, the method comprising: (a) partitioning a current frame into a plurality of partitions each containing a plurality of pixels; (b) sub-sampling randomly said plurality of pixels within each of the plurality of partitions; (c) for each current partition from the plurality of partitions, generating a histogram of the number of pixels in each pixel value range of a plurality of pixel value ranges, the histogram comprising a plurality of bins; (d) determining a bin-by-bin absolute histogram difference between the current partition and a corresponding partition in a reference frame; (e) if the bin-by-bin absolute histogram difference is greater than a first predetermined threshold, labeling the current partition as changed; (f) repeating steps (b) through (e) for each of the plurality of partitions in the current frame; and (g) if a number of the partitions in the current frame labeled as changed is greater than a second predetermined threshold, reporting a scene change in the current frame.
According to another aspect, an apparatus is provided, comprising a processor and a computer-readable storage medium containing computer instructions for execution on the processor to perform a method for scene change detection in a video sequence, the method comprising: (a) partitioning a current frame into a plurality of partitions each containing a plurality of pixels; (b) sub-sampling randomly said plurality of pixels within each of the plurality of partitions; (c) for each current partition from the plurality of partitions, generating a histogram of the number of pixels in each pixel value range of a plurality of pixel value ranges, the histogram comprising a plurality of bins; (d) determining a bin-by-bin absolute histogram difference between the current partition and a corresponding partition in a reference frame; (e) if the bin-by-bin absolute histogram difference is greater than a first predetermined threshold, labeling the current partition as changed; (f) repeating steps (b) through (e) for each of the plurality of partitions in the current frame; and (g) if a number of the partitions in the current frame labeled as changed is greater than a second predetermined threshold, reporting a scene change in the current frame.
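Steps (a) through (g) recited above can be put together in a short end-to-end sketch. It is offered only as one possible reading under stated assumptions: frames are lists of rows of 8-bit luminance values, partitions form a regular `rows` x `cols` grid, the same random positions are used in the current and reference frames, and the names `detect_scene_change`, `samples_per_partition`, and the default value of `threshold1` are hypothetical.

```python
import random


def histogram(samples, num_bins=16, max_value=256):
    """Count samples per intensity bin."""
    hist = [0] * num_bins
    for value in samples:
        hist[value * num_bins // max_value] += 1
    return hist


def detect_scene_change(current, reference, rows=4, cols=4,
                        samples_per_partition=64, threshold1=32, rng=None):
    """Return True if a majority of partitions are labeled as changed."""
    rng = rng or random.Random()
    height, width = len(current), len(current[0])
    changed = 0
    for r in range(rows):                      # (a) partition the frame
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            # (b) random sub-sampling; positions shared by both frames
            # (an assumption, the text does not specify this).
            points = [(rng.randrange(top, bottom), rng.randrange(left, right))
                      for _ in range(samples_per_partition)]
            # (c) histograms of the sampled pixels
            cur_hist = histogram(current[y][x] for y, x in points)
            ref_hist = histogram(reference[y][x] for y, x in points)
            # (d) bin-by-bin absolute histogram difference
            diff = sum(abs(a - b) for a, b in zip(cur_hist, ref_hist))
            # (e) label the partition
            if diff > threshold1:
                changed += 1
    # (g) vote: threshold2 is half the number of partitions
    threshold2 = (rows * cols) / 2
    return changed > threshold2


dark = [[20] * 64 for _ in range(64)]
bright = [[220] * 64 for _ in range(64)]
cut = detect_scene_change(bright, dark, rng=random.Random(0))
same = detect_scene_change(dark, dark, rng=random.Random(0))
```

In this toy input every partition of the bright frame differs from the dark reference, so `cut` is true, while comparing a frame with itself yields no changed partitions.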
In some embodiments, the pixel values represent a luminance component of a corresponding pixel color. The number of partitions in the current frame may be in a range from 16 to 128.
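As an illustration of how a frame might be split into a number of partitions in this range, the sketch below computes the pixel bounds of a regular grid. The regular grid layout, the function name `partition_bounds`, and the 352x288 frame size are assumptions for the example only.

```python
def partition_bounds(height, width, rows, cols):
    """Return (top, bottom, left, right) bounds for a rows x cols grid.

    Bounds are half-open: rows top..bottom-1 and columns left..right-1
    belong to the partition.
    """
    bounds = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            bounds.append((top, bottom, left, right))
    return bounds


# A 352x288 luminance plane divided into 4 x 4 = 16 partitions.
grid = partition_bounds(288, 352, 4, 4)
```

Because the bounds depend only on the frame dimensions, the same list identifies the co-located partitions in both the current and the reference frame.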
In some embodiments, the histogram may be a 16-bin histogram. The second predetermined threshold may be defined as a majority of the partitions in the current frame.
It should be understood that the embodiments mentioned above and discussed below are not, unless context indicates otherwise, intended to be mutually exclusive.
In embodiments of the present invention, frames in a video sequence are divided into partitions. Accordingly,
A randomly sub-sampled partition voting (RSPV) algorithm utilized in embodiments of the present invention is applied to each of the partitioned frames.
The bin-by-bin absolute histogram difference is calculated as shown in steps 310 and 312 of
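HistoDiff(k) = Σ_j abs(C(k,j) − R(k,j)),
where C is the current frame, R is the reference frame, k is the partition number, and j is the bin number of the histogram calculated for the kth partition. Thus C(k,j) and R(k,j) are the pixel counts in bin j for the kth partition of the current and reference frames, respectively, and the sum runs over all bins of the histogram.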
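The formula above translates directly into code. In this minimal sketch the function name `histo_diff` and the example bin counts are illustrative assumptions.

```python
def histo_diff(current_hist, reference_hist):
    """Sum of absolute bin-by-bin differences between two histograms."""
    if len(current_hist) != len(reference_hist):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(c - r) for c, r in zip(current_hist, reference_hist))


# Four-bin example: |4-1| + |0-3| + |2-2| + |2-2| = 6
diff = histo_diff([4, 0, 2, 2], [1, 3, 2, 2])
```

When both histograms are built from the same number of samples N, the result lies between 0 (identical distributions) and 2N (no overlapping bins).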
After the absolute differences for all M bins of the respective histograms built for the kth partitions of the current and reference frames have been accumulated, as determined in step 312, the resulting bin-by-bin absolute histogram difference for the kth partition, HistoDiff(k), is compared to a configurable threshold, referred to as threshold1, in step 314. If the calculated bin-by-bin absolute histogram difference exceeds threshold1, the kth partition is labeled as changed in step 316. Otherwise, the kth partition is labeled as unchanged, or is not labeled as changed, in step 318.
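The per-partition labeling decision can be sketched as below. The function name `label_changed` and the example difference values and threshold are hypothetical; the text does not give a concrete value for threshold1.

```python
def label_changed(histo_diffs, threshold1):
    """Label each partition True (changed) if its difference exceeds threshold1."""
    return [diff > threshold1 for diff in histo_diffs]


# Four partitions with example HistoDiff values and an example threshold of 32.
labels = label_changed([0, 32, 33, 120], threshold1=32)
# labels == [False, False, True, True]
```

Note that the comparison is strict: a difference exactly equal to the threshold leaves the partition unchanged, matching the word "exceeds" in the description.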
Step 320 of
In embodiments of the invention, threshold2 is defined as 50% of the number of partitions in the frame. Thus, if the majority of the frame partitions (i.e., more than 8, in embodiments where the number of partitions is 16) are reported as changed, the frame is considered to contain a scene change. When a scene change occurs, the histogram distribution for the current frame partition is notably shifted from that of the respective reference frame partition. The magnitude of the bin-by-bin absolute histogram difference indicates the size of the distribution shift.
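The voting step can be sketched as follows, taking threshold2 as half the number of partitions in the frame. The function name `is_scene_change` and the `fraction` parameter are assumptions introduced for illustration.

```python
def is_scene_change(changed_labels, fraction=0.5):
    """Report a scene change if more than `fraction` of partitions changed."""
    threshold2 = fraction * len(changed_labels)
    return sum(changed_labels) > threshold2


# With 16 partitions, 9 changed partitions form a majority; 8 do not.
nine_of_sixteen = is_scene_change([True] * 9 + [False] * 7)
eight_of_sixteen = is_scene_change([True] * 8 + [False] * 8)
```

Here `nine_of_sixteen` is true and `eight_of_sixteen` is false, matching the "more than 8" condition for a 16-partition frame.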
The computational cost of the RSPV algorithm is low. If the sub-sampling ratio is, for example, 8:1 both horizontally and vertically, the pixels processed constitute only 1/64, or about 1.6%, of all pixels in the frame. Given that histogram calculation and memory access lend themselves to parallel processing, the RSPV algorithm requires less time for scene change detection than algorithms that calculate histograms over all pixels in a partition. Moreover, despite the sub-sampling and the resulting reduction in the number of pixels examined, the detection result is sufficiently reliable, as was demonstrated in experiments performed by the inventors. For ten well-known video sequences, each having a thousand frames, the scene change miss rate was less than 3%, and the false alarm rate was less than 2%.
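The fraction of pixels touched at a given sub-sampling ratio is simple arithmetic, worked through below. The 352x288 frame size is an assumption chosen for illustration and is not taken from the experiments mentioned above; `sampled_fraction` is a hypothetical helper name.

```python
def sampled_fraction(horizontal_ratio, vertical_ratio):
    """Fraction of pixels examined for the given sub-sampling ratios."""
    return 1.0 / (horizontal_ratio * vertical_ratio)


fraction = sampled_fraction(8, 8)          # 1/64
width, height = 352, 288                   # assumed example frame size
samples = (width // 8) * (height // 8)     # pixels actually read per frame

print(f"{fraction:.4%}")   # 1.5625%
print(samples)             # 1584
```

For this example frame, 1584 of the 101376 luminance pixels are read, which is the roughly 2% figure quoted in the text.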
It should be appreciated that the RSPV algorithm can be scaled, by varying the number of partitions and the sub-sampling ratio, to fit frames of different sizes. The bin-by-bin absolute histogram difference threshold is adaptive, and can be adjusted for various video contents, including adjusting in real-time.
In summary, embodiments of the present invention provide a reliable, low-cost, and motion-insensitive method for scene change detection. The RSPV algorithm is scalable and can employ various adaptive thresholds.
Embodiments of the present invention can be implemented in software, hardware, firmware, various types of processors, or a combination thereof. Thus, some embodiments may be implemented as computer-readable instructions embodied on one or more computer-readable media, including but not limited to storage media such as ROMs, RAMs, floppy disks, CD-ROMs, DVDs, etc. Some embodiments of the present invention can be implemented either as a computer-readable medium having stored thereon computer-readable instructions or as hardware components of video encoders within high-performance members of the Blackfin family of embedded digital signal processors available from Analog Devices, Inc., Norwood, Mass. For example, an ADSP-BF561 digital signal processor, which includes two independent cores each capable of 600 MHz performance, and a single-core ADSP-BF533 digital signal processor, which achieves up to 756 MHz performance, may be utilized. Various other suitable digital signal processors can implement embodiments of the invention as well.
As should be appreciated from the foregoing, there are numerous aspects of the present invention described herein that can be used independently of one another or in any combination. In particular, various aspects of the present invention may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing, and the aspects of the present invention described herein are not limited in their application to the details and arrangements of components set forth in the foregoing description or illustrated in the drawings. The aspects of the invention are capable of other embodiments and of being practiced or of being carried out in various ways. Various aspects of the present invention may be implemented using any type of circuit and no limitations are placed on the circuit implementation. Accordingly, the foregoing description and drawings are by way of example only.
It should also be appreciated that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Application Ser. No. 60/750,658, entitled, “RANDOMLY SUB-SAMPLED PARTITION VOTING (RSPV) ULTRA LOW COST SCENE CHANGE DETECTION ALGORITHM,” filed on Dec. 15, 2005, which is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
60750658 | Dec 2005 | US