The present invention relates to frame sequence comparison in multimedia streams. Specifically, the present invention relates to a system for comparing video content.
The availability of broadband communication channels to end-user devices has enabled ubiquitous media coverage with image, audio, and video content. The increasing amount of multimedia content that is transmitted globally has boosted the need for intelligent content management. Providers must organize their content and be able to analyze it. Similarly, broadcasters and market researchers want to know when and where specific footage has been broadcast. Content monitoring, market trend analysis, and copyright protection are challenging, if not impossible, due to the increasing amount of multimedia content. Accordingly, a need exists to improve the analysis of video content in this technology field.
One approach to comparing video sequences is a process for comparing multimedia segments, such as segments of video. In one embodiment, the video comparison process includes receiving a first list of descriptors pertaining to a sequence of first video frames. Each of the descriptors represents visual information of a corresponding video frame of the sequence of first video frames. The method further includes receiving a second list of descriptors pertaining to a sequence of second video frames. Each of the descriptors relates to visual information of a corresponding video frame of the sequence of second video frames. The method further includes designating first segments of the sequence of first video frames that are similar. Each first segment includes neighboring first video frames. The method further includes designating second segments of the sequence of second video frames that are similar. Each second segment includes neighboring second video frames. The method further includes comparing the first segments and the second segments and analyzing the pairs of first and second segments based on the comparison of the first segments and the second segments to compare the first and second segments to a threshold value.
Another approach to comparing video sequences is a computer program product. In one embodiment, the computer program product is tangibly embodied in an information carrier. The computer program product includes instructions being operable to cause a data processing apparatus to receive a first list of descriptors relating to a sequence of first video frames whereby each of the descriptors represents visual information of a corresponding video frame of the sequence of first video frames. The computer program product further includes instructions being operable to cause a data processing apparatus to receive a second list of descriptors relating to a sequence of second video frames whereby each of the descriptors represents visual information of a corresponding video frame of the sequence of second video frames. The computer program product further includes instructions being operable to cause a data processing apparatus to designate one or more first segments of the sequence of first video frames that are similar whereby each first segment includes neighboring first video frames. The computer program product further includes instructions being operable to cause a data processing apparatus to designate one or more second segments of the sequence of second video frames that are similar whereby each second segment includes neighboring second video frames. The computer program product further includes instructions being operable to cause a data processing apparatus to compare at least one of the one or more first segments and at least one of the one or more second segments; and analyze the pairs of first and second segments based on the comparison of the first segments and the second segments to compare the first and second segments to a threshold value.
Another approach to comparing video sequences is a system. In one embodiment, the system includes a communication module, a video segmentation module, and a video segment comparison module. The communication module receives a first list of descriptors pertaining to a sequence of first video frames, each of the descriptors relating to visual information of a corresponding video frame of the sequence of first video frames; and receives a second list of descriptors pertaining to a sequence of second video frames, each of the descriptors relating to visual information of a corresponding video frame of the sequence of second video frames. The video segmentation module designates one or more first segments of the sequence of first video frames that are similar, each of the one or more first segments including neighboring first video frames; and designates one or more second segments of the sequence of second video frames that are similar, each of the one or more second segments including neighboring second video frames. The video segment comparison module compares at least one of the one or more first segments and at least one of the one or more second segments; and analyzes pairs of the at least one first and the at least one second segments based on the comparison of the at least one first segments and the at least one second segments to compare the first and second segments to a threshold value.
Another approach to comparing video sequences is a video comparison system. The system includes means for receiving a first list of descriptors pertaining to a sequence of first video frames, each of the descriptors relating to visual information of a corresponding video frame of the sequence of first video frames. The system further includes means for receiving a second list of descriptors pertaining to a sequence of second video frames, each of the descriptors relating to visual information of a corresponding video frame of the sequence of second video frames. The system further includes means for designating one or more first segments of the sequence of first video frames that are similar, each of the one or more first segments including neighboring first video frames. The system further includes means for designating one or more second segments of the sequence of second video frames that are similar, each of the one or more second segments including neighboring second video frames. The system further includes means for comparing at least one of the first segments and at least one of the one or more second segments. The system further includes means for analyzing the pairs of first and second segments based on the comparison of the first segments and the second segments to compare the first and second segments to a threshold value.
In other examples, any of the approaches above can include one or more of the following features. In some examples, the analyzing includes determining similar first and second segments.
In other examples, the analyzing includes determining dissimilar first and second segments.
In some examples, the comparing includes comparing each of the one or more first segments to each of the one or more second segments.
In other examples, the comparing includes comparing each of the one or more first segments to each of the one or more second segments that is located within an adaptive window.
In some examples, the method further includes varying a size of the adaptive window during the comparing.
In other examples, the comparing includes designating first clusters of the one or more first segments formed of a sequence of first segments. The comparing can further include for each first cluster, selecting a first segment of the sequence of first segments of that cluster to be a first cluster centroid. The comparing can further include comparing each of the first cluster centroids to each of the second segments. The comparing can further include for each of the second segments within a threshold value of each of the first cluster centroids, comparing the second segments and the first segments of the first cluster.
In some examples, the comparing includes designating first clusters of first segments formed of a sequence of first segments. The comparing can further include for each first cluster, selecting a first segment of the sequence of first segments of that cluster to be a first cluster centroid. The comparing can further include designating second clusters of second segments formed of a sequence of second segments. The comparing can further include for each second cluster, selecting a second segment of the sequence of second segments of that cluster to be a second cluster centroid. The comparing can further include comparing each of the first cluster centroids to each of the second cluster centroids. The comparing can further include for each of the first cluster centroids within a threshold value of each of the second cluster centroids, comparing the first segments of the first cluster and the second segments of the second cluster to each other.
In other examples, the method further includes generating the threshold value based on the descriptors relating to visual information of a first video frame of the sequence of first video frames, and/or the descriptors relating to visual information of a second video frame of the sequence of second video frames.
In some examples, the analyzing is performed using at least one matrix and searching for diagonals of entries in the at least one matrix representing levels of differences between segments of similar video frames.
In other examples, the method further includes finding similar frame sequences for previously unmatched frame sequences.
The frame sequence comparison in video streams described herein can provide one or more of the following advantages. An advantage of the frame sequence comparison is that the comparison of multimedia streams is more efficient since a user does not have to view the multimedia streams in parallel, but can instead review the report of an automated comparison to determine the differences and/or similarities between the multimedia streams. Another advantage is that the identification of similar frame sequences provides a more accurate comparison of multimedia streams, since an exact bit-by-bit comparison of the multimedia streams is challenging and inefficient.
Other aspects and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating the principles of the invention by way of example only.
The foregoing and other objects, features, and advantages of the present invention, as well as the invention itself, will be more fully understood from the following description of various embodiments, when read together with the accompanying drawings.
By way of general overview, the technology compares multimedia content (e.g., digital footage such as films, clips, and advertisements, digital media broadcasts, etc.) to other multimedia content via a content analyzer. The multimedia content can be obtained from virtually any source able to store, record, or play multimedia (e.g., a live television source, a network server source, a digital video disc source, etc.). The content analyzer enables automatic and efficient comparison of digital content. The content analyzer can be a content analysis processor or server; it is highly scalable and can use computer vision and signal processing technology to analyze footage in both the video and audio domains in real time.
Moreover, the content analysis server's automatic content comparison technology is highly accurate. While human observers may err due to fatigue, or miss small details in the footage that are difficult to identify, the content analysis server is routinely capable of comparing content with an accuracy of over 99%. The comparison does not require prior inspection or manipulation of the footage to be monitored. The content analysis server extracts the relevant information from the multimedia stream data itself and can therefore efficiently compare a nearly unlimited amount of multimedia content without manual interaction.
The content analysis server generates descriptors, such as digital signatures (also referred to herein as fingerprints), from each sample of multimedia content. The digital signatures describe specific video, audio and/or audiovisual aspects of the content, such as color distribution, shapes, and patterns in the video parts and the frequency spectrum in the audio stream. Each sample of multimedia has a unique fingerprint that is basically a compact digital representation of its unique video, audio, and/or audiovisual characteristics.
The content analysis server utilizes such fingerprints to find similar and/or different frame sequences or clips in multimedia samples. The system and process of finding similar and different frame sequences in multimedia samples can also be referred to as the motion picture copy comparison system (MoPiCCS).
The content analysis server 110 requests and/or receives multimedia streams from one or more of the content devices 105 (e.g., digital video disc device, signal acquisition device, satellite reception device, cable reception box, etc.), the storage server 140 (e.g., storage area network server, network attached storage server, etc.), the content server 150 (e.g., internet based multimedia server, streaming multimedia server, etc.), and/or any other server or device that can store a multimedia stream (e.g., cell phone, camera, etc.). The content analysis server 110 identifies one or more frame sequences for each multimedia stream. The content analysis server 110 generates a respective fingerprint for each of the one or more frame sequences for each multimedia stream. The content analysis server 110 compares the fingerprints of one or more frame sequences between each multimedia stream. The content analysis server 110 generates a report (e.g., written report, graphical report, text message report, alarm, graphical message, etc.) of the similar and/or different frame sequences between the multimedia streams.
In other examples, the content analysis server 110 generates a fingerprint for each frame in each multimedia stream. The content analysis server 110 can generate the fingerprint for each frame sequence (e.g., group of frames, direct sequence of frames, indirect sequence of frames, etc.) for each multimedia stream based on the fingerprint from each frame in the frame sequence and/or any other information associated with the frame sequence (e.g., video content, audio content, metadata, etc.).
In some examples, the content analysis server 110 generates the frame sequences for each multimedia stream based on information about each frame (e.g., video content, audio content, metadata, fingerprint, etc.).
The communication module 211 receives information for and/or transmits information from the content analysis server 210. The processor 212 processes requests for comparison of multimedia streams (e.g., request from a user, automated request from a schedule server, etc.) and instructs the communication module 211 to request and/or receive multimedia streams. The video frame preprocessor module 213 preprocesses multimedia streams (e.g., remove black border, insert stable borders, resize, reduce, selects key frame, groups frames together, etc.). The video frame conversion module 214 converts the multimedia streams (e.g., luminance normalization, RGB to Color9, etc.). The video fingerprint module 215 generates a fingerprint for each key frame selection (e.g., each frame is its own key frame selection, a group of frames have a key frame selection, etc.) in a multimedia stream. The video segmentation module 216 segments frame sequences for each multimedia stream together based on the fingerprints for each key frame selection. The video segment comparison module 217 compares the frame sequences for multimedia streams to identify similar frame sequences between the multimedia streams (e.g., by comparing the fingerprints of each key frame selection of the frame sequences, by comparing the fingerprints of each frame in the frame sequences, etc.). The storage device 218 stores a request, a multimedia stream, a fingerprint, a frame selection, a frame sequence, a comparison of the frame sequences, and/or any other information associated with the comparison of frame sequences.
In some examples, a fingerprint is also referred to as a descriptor. Each fingerprint can be a representation of a frame and/or a group of frames. The fingerprint can be derived from the content of the frame (e.g., a function of the colors and/or intensity of an image, a derivative of parts of an image, a sum of all intensity values, an average of color values, a mode of luminance values, a spatial frequency value). The fingerprint can be an integer (e.g., 345, 523) and/or a combination of numbers, such as a matrix or vector (e.g., [a, b], [x, y, z]). For example, the fingerprint is a vector defined by [x, y, z], where x is luminance, y is chrominance, and z is spatial frequency for the frame.
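Purely as an illustration of such a descriptor, the following sketch computes a three-element [luminance, chrominance, spatial frequency] vector for a single frame; the specific feature formulas, the use of numpy, and the frame_fingerprint name are assumptions for the example rather than the fingerprint actually generated by the content analysis server 210.

```python
import numpy as np

def frame_fingerprint(frame_rgb: np.ndarray) -> np.ndarray:
    """Return an illustrative [luminance, chrominance, spatial frequency] vector."""
    r, g, b = frame_rgb[..., 0], frame_rgb[..., 1], frame_rgb[..., 2]
    luminance = 0.299 * r + 0.587 * g + 0.114 * b            # ITU-R BT.601 luma
    chrominance = np.abs(r - luminance) + np.abs(b - luminance)
    spectrum = np.abs(np.fft.fft2(luminance))
    spatial_freq = (spectrum.sum() - spectrum[0, 0]) / spectrum.size  # energy outside the DC term
    return np.array([luminance.mean(), chrominance.mean(), spatial_freq])
```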
In some embodiments, shots are differentiated according to fingerprint values. For example, in a vector space, fingerprints determined from frames of the same shot will differ from fingerprints of neighboring frames of that shot by a relatively small distance. In a transition to a different shot, the fingerprints of the next group of frames differ by a greater distance. Thus, shots can be distinguished according to their fingerprints differing by more than some threshold value.
Thus, fingerprints determined from frames of a first shot 472′ can be used to group or otherwise identify those frames as being related to the first shot. Similarly, fingerprints of subsequent shots can be used to group or otherwise identify subsequent shots 472″, 472′″. A representative frame, or key frame 474′, 474″, 474′″, can be selected for each shot 472. In some embodiments, the key frame is statistically selected from the fingerprints of the group of frames in the same shot (e.g., an average or centroid).
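As a rough sketch of this idea, and assuming per-frame fingerprint vectors such as those produced by the hypothetical frame_fingerprint() above, consecutive frames can be grouped into shots wherever the fingerprint distance jumps above a threshold, and the frame closest to the shot's centroid can be taken as the key frame; the threshold value and the Euclidean distance are placeholder choices.

```python
import numpy as np

def segment_shots(fingerprints, threshold=50.0):
    """Group frame indices into shots at large fingerprint distance jumps."""
    shots, current = [], [0]
    for i in range(1, len(fingerprints)):
        if np.linalg.norm(fingerprints[i] - fingerprints[i - 1]) > threshold:
            shots.append(current)          # distance jump -> shot boundary
            current = []
        current.append(i)
    shots.append(current)
    return shots

def key_frame(shot, fingerprints):
    """Pick the frame whose fingerprint is closest to the shot centroid."""
    centroid = np.mean([fingerprints[i] for i in shot], axis=0)
    return min(shot, key=lambda i: np.linalg.norm(fingerprints[i] - centroid))
```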
For example, the communication module 211 of
In some examples, the image is a single video frame. The content analysis server 210 can generate the fingerprint 618 for every frame in a multimedia stream and/or every key frame in a group of frames. In other words, the image 612 can be a key frame for a group of frames. In some embodiments, the content analysis server 210 takes advantage of a high level of redundancy and generates fingerprints for every nth frame (e.g., n=2).
In other examples, the fingerprint 618 is also referred to as a descriptor. Each multimedia stream has an associated list of descriptors that are compared by the content analysis server 210. Each descriptor can include a multi-level visual fingerprint that represents the visual information of a video frame and/or a group of video frames.
In the example, the video segmentation module 216 compares the fingerprints for segments 1 711 and 2 712 and merges the two segments into segment 1-2 721 based on the difference between the fingerprints of the two segments being less than a threshold value. The video segmentation module 216 compares the fingerprints for segments 2 712 and 3 713 and does not merge the segments because the difference between the two fingerprints is greater than the threshold value. The video segmentation module 216 compares the fingerprints for segments 3 713 and 4 714 and merges the two segments into segment 3-4 722 based on the difference between the fingerprints of the two segments. The video segmentation module 216 compares the fingerprints for segments 3-4 722 and 5 715 and merges the two segments into segment 3-5 731 based on the difference between the fingerprints of the two segments. The video segmentation module 216 can further compare the fingerprints for the other adjacent segments (e.g., segment 2 712 to segment 3 713, segment 1-2 721 to segment 3 713, etc.). The video segmentation module 216 completes the merging process when no further fingerprint comparisons are below the segmentation threshold. Thus, selection of a comparison or difference threshold for the comparisons can be used to control the storage and/or processing requirements.
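A minimal sketch of this iterative merging is shown below, assuming each segment is represented as a dictionary with a 'frames' list and a key-frame 'fingerprint' vector; the segmentation threshold, the Euclidean distance, and the choice to average the fingerprints of merged segments are assumptions for illustration.

```python
import numpy as np

def merge_adjacent_segments(segments, seg_threshold=30.0):
    """Repeatedly merge adjacent segments whose fingerprints differ by less than the threshold."""
    merged = True
    while merged:
        merged = False
        for i in range(len(segments) - 1):
            a, b = segments[i], segments[i + 1]
            if np.linalg.norm(a['fingerprint'] - b['fingerprint']) < seg_threshold:
                segments[i] = {
                    'frames': a['frames'] + b['frames'],
                    # Fingerprint of the merged segment: illustrative mean of the two.
                    'fingerprint': (a['fingerprint'] + b['fingerprint']) / 2,
                }
                del segments[i + 1]
                merged = True
                break                       # rescan after each merge
    return segments
```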
In other examples, each segment 1 711, 2 712, 3 713, 4 714, and 5 715 includes a fingerprint for a key frame in a group of frames and/or a link to the group of frames. In some examples, each segment 1 711, 2 712, 3 713, 4 714, and 5 715 includes a fingerprint for a key frame in a group of frames and/or the group of frames.
In some examples, the video segment comparison module 217 identifies similar segments (e.g., merged segments, individual segments, segments grouped by time, etc.). The identification of the similar segments can include one or more of the following identification processes: (i) brute-force process (i.e., compare every segment with every other segment); (ii) adaptive windowing process; and (iii) clustering process.
The video segment comparison module 217 adds the pair of similar segments and the difference between the signatures to a similar_segment_list as illustrated in Table 3.
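A minimal sketch of the brute-force process (i) is shown below, assuming segments carry a key-frame 'fingerprint' vector; every segment of the first stream is compared with every segment of the second stream, and pairs whose signature difference falls below a similarity threshold are appended to similar_segment_list, in the spirit of Table 3. The threshold and distance measure are assumptions.

```python
import numpy as np

def brute_force_compare(segments_1, segments_2, sim_threshold=25.0):
    """Compare every segment of stream 1 with every segment of stream 2."""
    similar_segment_list = []
    for i, s1 in enumerate(segments_1):
        for j, s2 in enumerate(segments_2):
            diff = np.linalg.norm(s1['fingerprint'] - s2['fingerprint'])
            if diff < sim_threshold:
                # (stream-1 index, stream-2 index, signature difference)
                similar_segment_list.append((i, j, diff))
    return similar_segment_list
```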
In other embodiments, the adaptive window 930 can grow and/or shrink based on the matches and/or other information associated with the multimedia streams (e.g., size, content type, etc.). For example, if the video segment comparison module 217 does not identify any matches, or identifies fewer matches than a match threshold number, for a segment within the adaptive window 930, the size of the adaptive window 930 can be increased by a predetermined size (e.g., from the size of three to the size of five, from the size of ten to the size of twenty, etc.) and/or a dynamically generated size (e.g., percentage of total number of segments, ratio of the number of segments in each stream, etc.). After the video segment comparison module 217 identifies the match threshold number and/or exceeds a maximum size for the adaptive window 930, the size of the adaptive window 930 can be reset to the initial size and/or increased based on the size of the adaptive window at the time of the match.
In some embodiments, the initial size of the adaptive window is predetermined (e.g., five hundred segments, three segments on either side of the corresponding time in the multimedia streams, five segments on either side of the respective location with respect to the last match in the multimedia streams, etc.) and/or dynamically generated (e.g., ⅓ length of multimedia content, ratio based on the number of segments in each multimedia stream, percentage of segments in the first multimedia stream, etc.). The initial start location for the adaptive window can be predetermined (e.g., same time in both multimedia streams, same frame number for the key frame, etc.) and/or dynamically generated (e.g., percentage size match of the respective segments, respective frame locations from the last match, etc.).
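The following sketch illustrates one way an adaptive-window comparison could behave, assuming time-ordered segments with key-frame 'fingerprint' vectors: each stream-1 segment is compared only against stream-2 segments inside a window centred on the expected position, and the window grows when too few matches are found and resets after a match. The window sizes, growth step, and centring rule are assumed values, not the module's actual parameters.

```python
import numpy as np

def adaptive_window_compare(segments_1, segments_2, sim_threshold=25.0,
                            init_half_width=3, growth=2, match_goal=1):
    similar_segment_list = []
    half_width = init_half_width
    for i, s1 in enumerate(segments_1):
        # Expected stream-2 position, scaled by the ratio of segment counts.
        centre = int(i * len(segments_2) / max(len(segments_1), 1))
        lo = max(0, centre - half_width)
        hi = min(len(segments_2), centre + half_width + 1)
        matches = 0
        for j in range(lo, hi):
            diff = np.linalg.norm(s1['fingerprint'] - segments_2[j]['fingerprint'])
            if diff < sim_threshold:
                similar_segment_list.append((i, j, diff))
                matches += 1
        if matches < match_goal:
            half_width += growth            # grow the window when matches are scarce
        else:
            half_width = init_half_width    # reset after a successful match
    return similar_segment_list
```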
The video segment comparison module 217 compares the segment 1.1 1011 with the centroid segments 2.1 1021 and 2.2 1022 for each cluster 1 1031 and 2 1041, respectively. If a centroid segment 2.1 1021 or 2.2 1022 is similar to the segment 1.1 1011, the video segment comparison module 217 compares every segment in the cluster of the similar centroid segment with the segment 1.1 1011. The video segment comparison module 217 adds any pairs of similar segments and the difference between the signatures to the similar_segment_list.
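A sketch of this clustering comparison is given below, assuming the stream-2 segments have already been grouped into clusters, each represented as a dictionary with a 'centroid' segment and a 'members' list; a stream-1 segment is compared against a cluster's members only when it is similar to that cluster's centroid. The data layout and threshold are assumptions.

```python
import numpy as np

def cluster_compare(segment_1, clusters_2, sim_threshold=25.0):
    """clusters_2: list of dicts {'centroid': segment, 'members': [segments]}."""
    similar_segment_list = []
    for cluster in clusters_2:
        centroid_diff = np.linalg.norm(
            segment_1['fingerprint'] - cluster['centroid']['fingerprint'])
        if centroid_diff < sim_threshold:            # cheap test against the centroid first
            for member in cluster['members']:        # full comparison within the cluster
                diff = np.linalg.norm(segment_1['fingerprint'] - member['fingerprint'])
                if diff < sim_threshold:
                    similar_segment_list.append((segment_1, member, diff))
    return similar_segment_list
```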
In some embodiments, one or more of the different comparison processes can be used depending on the multimedia streams. For example, the brute-force comparison process 800 is utilized for multimedia streams under thirty minutes in length, the adaptive window comparison process 900 is utilized for multimedia streams between thirty and sixty minutes in length, and the clustering comparison process 1000 is used for multimedia streams over sixty minutes in length.
Although the clustering comparison process 1000 as described in
The video segment comparison module 217 can generate the difference matrix based on the similar_segment_list. As illustrated in
The video segment comparison module 217 can analyze the diagonals of the difference matrix to detect a sequence of similar frames. The video segment comparison module 217 can find the longest diagonal of adjacent similar frames (in this example, the diagonal (1,2)-(4,5) is the longest) and/or find the diagonal of adjacent similar frames with the smallest average difference (in this example, the diagonal (1,5)-(2,6) has the smallest average difference) to identify a set of similar frame sequences. This comparison process can utilize one or both of these calculations to detect the best sequence of similar frames (e.g., use both by weighting the length of each diagonal by its average difference and taking the highest result to identify the best sequence of similar frames). This comparison process can be repeated by the video segment comparison module 217 until each segment of stream 1 is compared to its similar segments of stream 2.
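As an illustration of this diagonal analysis, the sketch below walks every diagonal of a difference matrix, collects runs of adjacent similar entries, and reports both the longest run and the run with the smallest average difference; it assumes the matrix is a numpy array of signature differences and that a single similarity cut-off marks entries as similar.

```python
import numpy as np

def best_diagonal_runs(diff_matrix, sim_threshold=25.0):
    """Return (longest run, run with smallest average difference) along the diagonals."""
    rows, cols = diff_matrix.shape
    runs = []  # each run: (start_row, start_col, length, average_difference)
    for offset in range(-(rows - 1), cols):
        i, j = max(0, -offset), max(0, offset)
        while i < rows and j < cols:
            if diff_matrix[i, j] < sim_threshold:
                start_i, start_j, values = i, j, []
                while i < rows and j < cols and diff_matrix[i, j] < sim_threshold:
                    values.append(diff_matrix[i, j])
                    i, j = i + 1, j + 1
                runs.append((start_i, start_j, len(values), float(np.mean(values))))
            else:
                i, j = i + 1, j + 1
    longest = max(runs, key=lambda r: r[2], default=None)
    smallest_avg = min(runs, key=lambda r: r[3], default=None)
    return longest, smallest_avg
```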
In some embodiments, the video segment comparison module 217 identifies similar frame sequences for unmatched frame sequences, if any. The unmatched frame sequences can also be referred to as holes. The identification of similar frame sequences for unmatched frame sequences can be based on a hole comparison threshold that is predetermined and/or dynamically generated. The video segment comparison module 217 can repeat the identification of similar frame sequences for unmatched frame sequences until all unmatched frame sequences are matched and/or can identify the unmatched frame sequences as unmatched (i.e., no match is found). The identification of the similar segments can include one or more of the following identification processes: (i) brute-force process; (ii) adaptive windowing process; (iii) extension process; and (iv) hole matching process.
The extension of the similar frame sequences can be based on the difference of the signatures for the extended frames and the hole comparison threshold (e.g., the difference of the signatures for each extended frame is less than the hole comparison threshold). As illustrated, the similar frame sequences 1 1514 and 1 1524 are extended to the left 1512 and 1522 and to the right 1516 and 1526, respectively. In other words, the video segment comparison module 217 can determine the difference in the signatures for each frame to the right and/or to the left of the respective similar frame sequences. If the difference is less than the hole comparison threshold, the video segment comparison module 217 extends the similar frame sequences in the appropriate direction (i.e., left or right).
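A small sketch of this extension step is shown below, assuming per-frame signature vectors for both streams and a matched pair of inclusive frame ranges; the match grows one frame at a time to the left and right while the per-frame signature difference stays below the hole comparison threshold. The range representation and threshold value are assumptions.

```python
import numpy as np

def extend_match(sigs_1, sigs_2, range_1, range_2, hole_threshold=25.0):
    """Grow a matched pair of frame ranges while neighboring frames remain similar."""
    (a1, b1), (a2, b2) = range_1, range_2
    while a1 > 0 and a2 > 0 and \
            np.linalg.norm(sigs_1[a1 - 1] - sigs_2[a2 - 1]) < hole_threshold:
        a1, a2 = a1 - 1, a2 - 1                 # extend to the left
    while b1 < len(sigs_1) - 1 and b2 < len(sigs_2) - 1 and \
            np.linalg.norm(sigs_1[b1 + 1] - sigs_2[b2 + 1]) < hole_threshold:
        b1, b2 = b1 + 1, b2 + 1                 # extend to the right
    return (a1, b1), (a2, b2)
```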
The video segmentation module 216 designates (2020a) first segments of the plurality of first video frames that are similar. Each segment of the first segments includes neighboring first video frames. The video segmentation module 216 designates (2020b) second segments of the plurality of second video frames that are similar. Each segment of the second segments includes neighboring second video frames.
The video segment comparison module 217 compares (2030) the first segments and the second segments. The video segment comparison module 217 analyzes (2040) the pairs of first and second segments based on the comparison of the first segments and the second segments to compare the first and second segments to a threshold value.
The media acquisition subsystem 442 acquires one or more video signals 450. For each signal, the media acquisition subsystem 442 records it as data chunks on a number of signal buffer units 452. Depending on the use case, the buffer units 452 may perform fingerprint extraction as well, as described in more detail herein. Fingerprint extraction is described in more detail in International Patent Application Serial No. PCT/US2008/060164, entitled “Video Detection System And Methods,” incorporated herein by reference in its entirety. This can be useful in a remote capturing scenario in which the very compact fingerprints are transmitted over a communications medium, such as the Internet, from a distant capturing site to a centralized content analysis site. The video detection system and processes may also be integrated with existing signal acquisition solutions, as long as the recorded data is accessible through a network connection.
The fingerprint for each data chunk can be stored in a media repository 458 portion of the data storage subsystem 446. In some embodiments, the data storage subsystem 446 includes one or more of a system repository 456 and a reference repository 460. One or more of the repositories 456, 458, 460 of the data storage subsystem 446 can include one or more local hard-disk drives, network accessed hard-disk drives, optical storage units, random access memory (RAM) storage drives, and/or any combination thereof. One or more of the repositories 456, 458, 460 can include a database management system to facilitate storage and access of stored content. In some embodiments, the system 440 supports different SQL-based relational database systems through its database access layer, such as Oracle and Microsoft-SQL Server. Such a system database acts as a central repository for all metadata generated during operation, including processing, configuration, and status information.
In some embodiments, the media repository 458 serves as the main payload data storage of the system 440, storing the fingerprints along with their corresponding key frames. A low quality version of the processed footage associated with the stored fingerprints is also stored in the media repository 458. The media repository 458 can be implemented using one or more RAID systems that can be accessed as a networked file system.
Each of the data chunks can become an analysis task that is scheduled for processing by a controller 462 of the management subsystem 448. The controller 462 is primarily responsible for load balancing and distribution of jobs to the individual nodes in a content analysis cluster 454 of the content analysis subsystem 444. In at least some embodiments, the management subsystem 448 also includes an operator/administrator terminal, referred to generally as a front-end 464. The operator/administrator terminal 464 can be used to configure one or more elements of the video detection system 440. The operator/administrator terminal 464 can also be used to upload reference video content for comparison and to view and analyze results of the comparison.
The signal buffer units 452 can be implemented to operate around-the-clock without any user interaction necessary. In such embodiments, the continuous video data stream is captured, divided into manageable segments, or chunks, and stored on internal hard disks. The hard disk space can be implemented to function as a circular buffer. In this configuration, older stored data chunks can be moved to a separate long term storage unit for archival, freeing up space on the internal hard disk drives for storing new, incoming data chunks. Such storage management provides reliable, uninterrupted signal availability over very long periods of time (e.g., hours, days, weeks, etc.). The controller 462 is configured to ensure timely processing of all data chunks so that no data is lost. The signal acquisition units 452 are designed to operate without any network connection, if required, (e.g., during periods of network interruption) to increase the system's fault tolerance.
In some embodiments, the signal buffer units 452 perform fingerprint extraction and transcoding on the recorded chunks locally. Storage requirements of the resulting fingerprints are trivial compared to the underlying data chunks and can be stored locally along with the data chunks. This enables transmission of the very compact fingerprints including a storyboard over limited-bandwidth networks, to avoid transmitting the full video content.
In some embodiments, the controller 462 manages processing of the data chunks recorded by the signal buffer units 452. The controller 462 constantly monitors the signal buffer units 452 and content analysis nodes 454, performing load balancing as required to maintain efficient usage of system resources. For example, the controller 462 initiates processing of new data chunks by assigning analysis jobs to selected ones of the analysis nodes 454. In some instances, the controller 462 automatically restarts individual analysis processes on the analysis nodes 454, or one or more entire analysis nodes 454, enabling error recovery without user interaction. A graphical user interface can be provided at the front end 464 for monitoring and control of one or more subsystems 442, 444, 446 of the system 400. For example, the graphical user interface allows a user to configure, reconfigure, and obtain the status of the content analysis subsystem 444.
In some embodiments, the analysis cluster 444 includes one or more analysis nodes 454 as workhorses of the video detection and monitoring system. Each analysis node 454 independently processes the analysis tasks that are assigned to it by the controller 462. This primarily includes fetching the recorded data chunks, generating the video fingerprints, and matching the fingerprints against the reference content. The resulting data is stored in the media repository 458 and in the data storage subsystem 446. The analysis nodes 454 can also operate as one or more of reference clip ingestion nodes, backup nodes, or RetroMatch nodes, in case the system performs retrospective matching. Generally, all activity of the analysis cluster is controlled and monitored by the controller.
After processing several such data chunks 470, the detection results for these chunks are stored in the system database 456. Beneficially, the numbers and capacities of signal buffer units 452 and content analysis nodes 454 may flexibly be scaled to customize the system's capacity to specific use cases of any kind. Realizations of the system 400 can include multiple software components that can be combined and configured to suit individual needs. Depending on the specific use case, several components can be run on the same hardware. Alternatively or in addition, components can be run on individual hardware for better performance and improved fault tolerance. Such a modular system architecture allows customization to suit virtually every possible use case, from a local, single-PC solution to nationwide monitoring systems with fault tolerance, recording redundancy, and combinations thereof.
The GUI 2300 includes one or more user-selectable controls 2382, such as standard window control features. The GUI 2300 also includes a detection results table 2384. In the exemplary embodiment, the detection results table 2384 includes multiple rows 2386, one row for each detection. The row 2386 includes a low-resolution version of the stored image together with other information related to the detection itself. Generally, a name or other textual indication of the stored image can be provided next to the image. The detection information can include one or more of: date and time of detection; indicia of the channel or other video source; indication as to the quality of a match; indication as to the quality of an audio match; date of inspection; a detection identification value; and indication as to detection source. In some embodiments, the GUI 2300 also includes a video viewing window 2388 for viewing one or more frames of the detected and matching video. The GUI 2300 can include an audio viewing window 2389 for comparing indicia of an audio comparison.
Configuring the digital video image detection system 126 further includes generating a timing control sequence 127, wherein a set of signals generated by the timing control sequence 127 provide for an interface to an MPEG video receiver.
In some embodiments, the method flow chart 2500 for the digital video image detection system 100 provides a step to optionally query the web for a file image 131 for the digital video image detection system 100 to match. In some embodiments, the method flow chart 2500 provides a step to optionally upload from the user interface 100 a file image for the digital video image detection system 100 to match. In some embodiments, querying and queuing a file database 133b provides for at least one file image for the digital video image detection system 100 to match.
The method flow chart 2500 further provides steps for capturing and buffering an MPEG video input at the MPEG video receiver and for storing the MPEG video input 171 as a digital image representation in an MPEG video archive.
The method flow chart 2500 further provides for steps of: converting the MPEG video image to a plurality of query digital image representations, converting the file image to a plurality of file digital image representations, wherein the converting the MPEG video image and the converting the file image are comparable methods, and comparing and matching the queried and file digital image representations. Converting the file image to a plurality of file digital image representations is provided by one of: converting the file image at the time the file image is uploaded, converting the file image at the time the file image is queued, and converting the file image in parallel with converting the MPEG video image.
The method flow chart 2500 provides for a method 142 for converting the MPEG video image and the file image to a queried RGB digital image representation and a file RGB digital image representation, respectively. In some embodiments, converting method 142 further comprises removing an image border 143 from the queried and file RGB digital image representations. In some embodiments, the converting method 142 further comprises removing a split screen 143 from the queried and file RGB digital image representations. In some embodiments, one or more of removing an image border and removing a split screen 143 includes detecting edges. In some embodiments, converting method 142 further comprises resizing the queried and file RGB digital image representations to a size of 128×128 pixels.
The method flow chart 2500 further provides for a method 144 for converting the MPEG video image and the file image to a queried COLOR9 digital image representation and a file COLOR9 digital image representation, respectively. Converting method 144 provides for converting directly from the queried and file RGB digital image representations.
Converting method 144 includes steps of: projecting the queried and file RGB digital image representations onto an intermediate luminance axis, normalizing the queried and file RGB digital image representations with the intermediate luminance, and converting the normalized queried and file RGB digital image representations to a queried and file COLOR9 digital image representation, respectively.
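The first two steps (luminance projection and normalization) might look like the following sketch, which assumes an RGB image stored as a numpy array; the final mapping into the COLOR9 representation is not specified here, so the function stops at the normalized image, and the luma weights are a standard assumption.

```python
import numpy as np

def normalize_by_luminance(rgb: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Project onto a luminance axis and normalize each channel by it."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    luminance = 0.299 * r + 0.587 * g + 0.114 * b        # projection onto a luminance axis
    normalized = rgb / (luminance[..., None] + eps)       # per-pixel luminance normalization
    return normalized                                     # COLOR9 mapping would follow here
```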
The method flow chart 2500 further provides for a method 151 for converting the MPEG video image and the file image to a queried 5-section, low resolution temporal moment digital image representation and a file 5-section, low resolution temporal moment digital image representation, respectively. Converting method 151 provides for converting directly from the queried and file COLOR9 digital image representations.
Converting method 151 includes steps of: sectioning the queried and file COLOR9 digital image representations into five spatial sections, which may be overlapping or non-overlapping, generating a set of statistical moments for each of the five sections, weighting the set of statistical moments, and correlating the set of statistical moments temporally, thereby generating a set of key frames or shot frames representative of temporal segments of one or more sequences of COLOR9 digital image representations.
Generating the set of statistical moments for converting method 151 includes generating one or more of: a mean, a variance, and a skew for each of the five sections. In some embodiments, correlating a set of statistical moments temporally for converting method 151 includes correlating one or more of a mean, a variance, and a skew of a set of sequentially buffered RGB digital image representations.
Correlating a set of statistical moments temporally for a set of sequentially buffered MPEG video image COLOR9 digital image representations allows for a determination of a set of median statistical moments for one or more segments of consecutive COLOR9 digital image representations. The image frame in the set of temporal segments whose set of statistical moments most closely matches the set of median statistical moments is identified as the shot frame, or key frame. The key frame is reserved for further refined methods that yield higher resolution matches.
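As a sketch of this key-frame selection, assume each frame of a temporal segment has been summarized by a single vector of statistical moments (for example, per-section mean, variance, and skew concatenated together); the frame whose moment vector lies closest to the segment's median moments is chosen as the key frame. The vector layout is an assumption for illustration.

```python
import numpy as np

def select_key_frame(moment_vectors):
    """moment_vectors: one 1-D numpy array of statistical moments per frame of the segment."""
    stacked = np.stack(moment_vectors)
    median = np.median(stacked, axis=0)              # median statistical moments of the segment
    distances = np.linalg.norm(stacked - median, axis=1)
    return int(np.argmin(distances))                 # index of the key (shot) frame
```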
The method flow chart 2500 further provides for a comparing method 152 for matching the queried and file 5-section, low resolution temporal moment digital image representations. In some embodiments, the comparing method 152 includes finding one or more errors between one or more of a mean, variance, and skew of each of the five sections for the queried and file 5-section, low resolution temporal moment digital image representations. In some embodiments, the one or more errors are generated by one or more queried key frames and one or more file key frames, corresponding to one or more temporal segments of one or more sequences of COLOR9 queried and file digital image representations. In some embodiments, the one or more errors are weighted, wherein the weighting is stronger temporally in a center segment and stronger spatially in a center section than in a set of outer segments and sections.
Comparing method 152 includes a branching element ending the method flow chart 2500 at ‘E’ if the first comparing results in no match. Comparing method 152 includes a branching element directing the method flow chart 2500 to a converting method 153 if the comparing method 152 results in a match.
In some embodiments, a match in the comparing method 152 includes one or more of a distance between queried and file means, a distance between queried and file variances, and a distance between queried and file skews registering a smaller metric than a mean threshold, a variance threshold, and a skew threshold, respectively. The metric for the first comparing method 152 can be any of a set of well known distance generating metrics.
A converting method 153a includes a method of extracting a set of high resolution temporal moments from the queried and file COLOR9 digital image representations, wherein the set of high resolution temporal moments include one or more of: a mean, a variance, and a skew for each of a set of images in an image segment representative of temporal segments of one or more sequences of COLOR9 digital image representations.
The temporal moments for converting method 153a are provided by converting method 151. Converting method 153a indexes the set of images and corresponding set of statistical moments to a time sequence. Comparing method 154a compares the statistical moments for the queried and the file image sets for each temporal segment by convolution.
The convolution in comparing method 154a convolves the queried and file one or more of: the first feature mean, the first feature variance, and the first feature skew. In some embodiments, the convolution is weighted, wherein the weighting is a function of chrominance. In some embodiments, the convolution is weighted, wherein the weighting is a function of hue.
The comparing method 154a includes a branching element ending the method flow chart 2500 if the first feature comparing results in no match. Comparing method 154a includes a branching element directing the method flow chart 2500 to a converting method 153b if the first feature comparing method 153a results in a match.
In some embodiments, a match in the first feature comparing method 153a includes one or more of: a distance between queried and file first feature means, a distance between queried and file first feature variances, and a distance between queried and file first feature skews registering a smaller metric than a first feature mean threshold, a first feature variance threshold, and a first feature skew threshold, respectively. The metric for the first feature comparing method 153a can be any of a set of well known distance generating metrics.
The converting method 153b includes extracting a set of nine queried and file wavelet transform coefficients from the queried and file COLOR9 digital image representations. Specifically, the set of nine queried and file wavelet transform coefficients are generated from a grey scale representation of each of the nine color representations comprising the COLOR9 digital image representation. In some embodiments, the grey scale representation is approximately equivalent to a corresponding luminance representation of each of the nine color representations comprising the COLOR9 digital image representation. In some embodiments, the grey scale representation is generated by a process commonly referred to as color gamut sphering, wherein color gamut sphering approximately eliminates or normalizes brightness and saturation across the nine color representations comprising the COLOR9 digital image representation.
In some embodiments, the set of nine wavelet transform coefficients are one of: a set of nine one-dimensional wavelet transform coefficients, a set of one or more non-collinear sets of nine one-dimensional wavelet transform coefficients, and a set of nine two-dimensional wavelet transform coefficients. In some embodiments, the set of nine wavelet transform coefficients are one of: a set of Haar wavelet transform coefficients and a two-dimensional set of Haar wavelet transform coefficients.
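Purely as an illustration of deriving nine Haar-based values, the sketch below assumes a COLOR9 representation stored as an (H, W, 9) numpy array and that the PyWavelets package is available; it applies a one-level 2-D Haar transform to each colour plane and reduces it to a single detail-energy number per plane. The per-plane summary is an assumption, not the coefficient set defined by converting method 153b.

```python
import numpy as np
import pywt

def nine_haar_values(color9: np.ndarray) -> np.ndarray:
    """One Haar detail-energy value for each of the nine COLOR9 planes."""
    values = []
    for k in range(9):
        plane = color9[..., k].astype(float)
        _, (ch, cv, cd) = pywt.dwt2(plane, 'haar')   # one-level 2-D Haar transform
        values.append(np.mean(np.abs(ch)) + np.mean(np.abs(cv)) + np.mean(np.abs(cd)))
    return np.array(values)
```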
The method flow chart 2500 further provides for a comparing method 154b for matching the set of nine queried and file wavelet transform coefficients. In some embodiments, the comparing method 154b includes a correlation function for the set of nine queried and file wavelet transform coefficients. In some embodiments, the correlation function is weighted, wherein the weighting is a function of hue; that is, the weighting is a function of each of the nine color representations comprising the COLOR9 digital image representation.
The comparing method 154b includes a branching element ending the method flow chart 2500 if the comparing method 154b results in no match. The comparing method 154b includes a branching element directing the method flow chart 2500 to an analysis method 155a-156b if the comparing method 154b results in a match.
In some embodiments, the comparing in comparing method 154b includes one or more of: a distance between the set of nine queried and file wavelet coefficients, a distance between a selected set of nine queried and file wavelet coefficients, and a distance between a weighted set of nine queried and file wavelet coefficients.
The analysis method 155a-156b provides for converting the MPEG video image and the file image to one or more queried RGB digital image representation subframes and file RGB digital image representation subframes, respectively, one or more grey scale digital image representation subframes and file grey scale digital image representation subframes, respectively, and one or more RGB digital image representation difference subframes. The analysis method 155a-156b provides for converting directly from the queried and file RGB digital image representations to the associated subframes.
The analysis method 155a-156b provides for the one or more queried and file grey scale digital image representation subframes 155a, including: defining one or more portions of the queried and file RGB digital image representations as one or more queried and file RGB digital image representation subframes, converting the one or more queried and file RGB digital image representation subframes to one or more queried and file grey scale digital image representation subframes, and normalizing the one or more queried and file grey scale digital image representation subframes.
The method for defining includes initially defining identical pixels for each pair of the one or more queried and file RGB digital image representations. The method for converting includes extracting a luminance measure from each pair of the queried and file RGB digital image representation subframes to facilitate the converting. The method of normalizing includes subtracting a mean from each pair of the one or more queried and file grey scale digital image representation subframes.
The analysis method 155a-156b further provides for a comparing method 155b-156b. The comparing method 155b-156b includes a branching element ending the method flow chart 2500 if the second comparing results in no match. The comparing method 155b-156b includes a branching element directing the method flow chart 2500 to a detection analysis method 325 if the second comparing method 155b-156b results in a match.
The comparing method 155b-156b includes: providing a registration between each pair of the one or more queried and file grey scale digital image representation subframes 155b and rendering one or more RGB digital image representation difference subframes and a connected queried RGB digital image representation dilated change subframe 156a-b.
The method for providing a registration between each pair of the one or more queried and file grey scale digital image representation subframes 155b includes: providing a sum of absolute differences (SAD) metric by summing the absolute value of a grey scale pixel difference between each pair of the one or more queried and file grey scale digital image representation subframes, translating and scaling the one or more queried grey scale digital image representation subframes, and repeating to find a minimum SAD for each pair of the one or more queried and file grey scale digital image representation subframes. The scaling for method 155b includes independently scaling the one or more queried grey scale digital image representation subframes to one of: a 128×128 pixel subframe, a 64×64 pixel subframe, and a 32×32 pixel subframe.
The scaling for method 155b includes independently scaling the one or more queried grey scale digital image representation subframes to one of: a 720×480 pixel (480i/p) subframe, a 720×576 pixel (576 i/p) subframe, a 1280×720 pixel (720p) subframe, a 1280×1080 pixel (1080i) subframe, and a 1920×1080 pixel (1080p) subframe, wherein scaling can be made from the RGB representation image or directly from the MPEG image.
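A simplified sketch of the SAD registration described above for method 155b is shown below, assuming two equally sized grey scale subframes as numpy arrays; the queried subframe is shifted over a small range of integer translations (np.roll wraps at the borders, which is a simplification) and the shift giving the minimum sum of absolute differences is kept. Scaling and the search range are reduced to assumptions.

```python
import numpy as np

def register_by_sad(query: np.ndarray, reference: np.ndarray, max_shift: int = 4):
    """Return ((dy, dx), minimum SAD) over a small grid of integer translations."""
    best_shift, best_sad = None, np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(query, dy, axis=0), dx, axis=1)  # wrap-around shift
            sad = np.abs(shifted - reference).sum()                    # sum of absolute differences
            if sad < best_sad:
                best_shift, best_sad = (dy, dx), sad
    return best_shift, best_sad
```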
The method for rendering one or more RGB digital image representation difference subframes and a connected queried RGB digital image representation dilated change subframe 156a-b includes: aligning the one or more queried and file grey scale digital image representation subframes in accordance with the method for providing a registration 155b, providing one or more RGB digital image representation difference subframes, and providing a connected queried RGB digital image representation dilated change subframe.
The providing the one or more RGB digital image representation difference subframes in method 156a includes: suppressing the edges in the one or more queried and file RGB digital image representation subframes, providing a SAD metric by summing the absolute value of the RGB pixel difference between each pair of the one or more queried and file RGB digital image representation subframes, and defining the one or more RGB digital image representation difference subframes as a set wherein the corresponding SAD is below a threshold.
The suppressing includes: providing an edge map for the one or more queried and file RGB digital image representation subframes and subtracting the edge map for the one or more queried and file RGB digital image representation subframes from the one or more queried and file RGB digital image representation subframes, wherein providing an edge map includes providing a Sobel filter.
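One way to realize such an edge-suppression step is sketched below, assuming a single-channel subframe as a numpy array and using SciPy's Sobel operator to form the edge map that is then subtracted from the subframe; the gradient-magnitude edge map and the absence of any scaling are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

def suppress_edges(subframe: np.ndarray) -> np.ndarray:
    """Subtract a Sobel-based edge map from the subframe."""
    gx = ndimage.sobel(subframe.astype(float), axis=0)
    gy = ndimage.sobel(subframe.astype(float), axis=1)
    edge_map = np.hypot(gx, gy)          # gradient magnitude as the edge map
    return subframe.astype(float) - edge_map
```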
The providing the connected queried RGB digital image representation dilated change subframe in method 156a includes: connecting and dilating a set of one or more queried RGB digital image representation subframes that correspond to the set of one or more RGB digital image representation difference subframes.
The method for rendering one or more RGB digital image representation difference subframes and a connected queried RGB digital image representation dilated change subframe 156a-b includes a scaling for method 156a-b independently scaling the one or more queried RGB digital image representation subframes to one of: a 128×128 pixel subframe, a 64×64 pixel subframe, and a 32×32 pixel subframe.
The scaling for method 156a-b includes independently scaling the one or more queried RGB digital image representation subframes to one of: a 720×480 pixel (480i/p) subframe, a 720×576 pixel (576 i/p) subframe, a 1280×720 pixel (720p) subframe, a 1280×1080 pixel (1080i) subframe, and a 1920×1080 pixel (1080p) subframe, wherein scaling can be made from the RGB representation image or directly from the MPEG image.
The method flow chart 2500 further provides for a detection analysis method 325. The detection analysis method 325 and the associated classify detection method 124 provide video detection match and classification data and images for the display match and video driver 125, as controlled by the user interface 110. The detection analysis method 325 and the classify detection method 124 further provide detection data to a dynamic thresholds method 335, wherein the dynamic thresholds method 335 provides for one of: automatic reset of dynamic thresholds, manual reset of dynamic thresholds, and combinations thereof.
The method flow chart 2500 further provides a third comparing method 340, providing a branching element ending the method flow chart 2500 if the file database queue is not empty.
In some examples, the content analysis server 110 of
The above-described systems and methods can be implemented in digital electronic circuitry, in computer hardware, firmware, and/or software. The implementation can be as a computer program product (i.e., a computer program tangibly embodied in an information carrier). The implementation can, for example, be in a machine-readable storage device, for execution by, or to control the operation of, data processing apparatus. The implementation can, for example, be a programmable processor, a computer, and/or multiple computers.
A computer program can be written in any form of programming language, including compiled and/or interpreted languages, and the computer program can be deployed in any form, including as a stand-alone program or as a subroutine, element, and/or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site.
Method steps can be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Method steps can also be performed by, and an apparatus can be implemented as, special purpose logic circuitry. The circuitry can, for example, be a FPGA (field programmable gate array) and/or an ASIC (application-specific integrated circuit). Modules, subroutines, and software agents can refer to portions of the computer program, the processor, the special circuitry, software, and/or hardware that implements that functionality.
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor receives instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer can include, or be operatively coupled to receive data from and/or transfer data to, one or more mass storage devices for storing data (e.g., magnetic, magneto-optical disks, or optical disks).
Data transmission and instructions can also occur over a communications network. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices. The information carriers can, for example, be EPROM, EEPROM, flash memory devices, magnetic disks, internal hard disks, removable disks, magneto-optical disks, CD-ROM, and/or DVD-ROM disks. The processor and the memory can be supplemented by, and/or incorporated in special purpose logic circuitry.
To provide for interaction with a user, the above described techniques can be implemented on a computer having a display device. The display device can, for example, be a cathode ray tube (CRT) and/or a liquid crystal display (LCD) monitor. The interaction with a user can, for example, be a display of information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer (e.g., interact with a user interface element). Other kinds of devices can be used to provide for interaction with a user; for example, feedback provided to the user can be in any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback). Input from the user can, for example, be received in any form, including acoustic, speech, and/or tactile input.
The above described techniques can be implemented in a distributed computing system that includes a back-end component. The back-end component can, for example, be a data server, a middleware component, and/or an application server. The above described techniques can be implemented in a distributed computing system that includes a front-end component. The front-end component can, for example, be a client computer having a graphical user interface, a Web browser through which a user can interact with an example implementation, and/or other graphical user interfaces for a transmitting device. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet, wired networks, and/or wireless networks.
The system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The communication network can include, for example, a packet-based network and/or a circuit-based network. Packet-based networks can include, for example, the Internet, a carrier internet protocol (IP) network (e.g., local area network (LAN), wide area network (WAN), campus area network (CAN), metropolitan area network (MAN), home area network (HAN)), a private IP network, an IP private branch exchange (IPBX), a wireless network (e.g., radio access network (RAN), 802.11 network, 802.16 network, general packet radio service (GPRS) network, HiperLAN), and/or other packet-based networks. Circuit-based networks can include, for example, the public switched telephone network (PSTN), a private branch exchange (PBX), a wireless network (e.g., RAN, bluetooth, code-division multiple access (CDMA) network, time division multiple access (TDMA) network, global system for mobile communications (GSM) network), and/or other circuit-based networks.
The communication device can include, for example, a computer, a computer with a browser device, a telephone, an IP phone, a mobile device (e.g., cellular phone, personal digital assistant (PDA) device, laptop computer, electronic mail device), and/or other type of communication device. The browser device includes, for example, a computer (e.g., desktop computer, laptop computer) with a world wide web browser (e.g., Microsoft® Internet Explorer® available from Microsoft Corporation, Mozilla® Firefox available from Mozilla Corporation). The mobile computing device includes, for example, a personal digital assistant (PDA).
Comprise, include, and/or plural forms of each are open ended and include the listed parts and can include additional parts that are not listed. And/or is open ended and includes one or more of the listed parts and combinations of the listed parts.
In general, the term video refers to a sequence of still images, or frames, representing scenes in motion; the video frame itself is thus a still picture. The terms video and multimedia as used herein include television and film-style video clips and streaming media. Video and multimedia include analog formats, such as standard television broadcasting and recording, and digital formats, also including standard television broadcasting and recording (e.g., DTV). Video can be interlaced or progressive. The video and multimedia content described herein may be processed according to various storage formats, including digital video formats (e.g., DVD), QuickTime®, and MPEG 4, and analog videotapes, including VHS® and Betamax®. Formats for digital television broadcasts may use the MPEG-2 video codec and include ATSC (USA, Canada), DVB (Europe), ISDB (Japan, Brazil), and DMB (Korea). Analog television broadcast standards include FCS (USA, Russia; obsolete), MAC (Europe; obsolete), MUSE (Japan), NTSC (USA, Canada, Japan), PAL (Europe, Asia, Oceania), PAL-M (a PAL variation used in Brazil), PALplus (a PAL extension used in Europe), RS-343 (military), and SECAM (France, former Soviet Union, Central Africa). Video and multimedia as used herein also include video on demand, referring to videos that start at a moment of the user's choice, as opposed to streaming or multicast.
One skilled in the art will realize the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting of the invention described herein. Scope of the invention is thus indicated by the appended claims, rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
This application claims the benefit of U.S. Provisional Application No. 61/032,306 filed Feb. 28, 2008. The entire teachings of the above application are incorporated herein by reference.
| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/IB09/05407 | 2/28/2009 | WO | 00 | 5/26/2011 |
| Number | Date | Country |
|---|---|---|
| 61032306 | Feb 2008 | US |