The present invention relates to a method for identifying motion video content, and more particularly, to a method for identifying motion video content by means of fingerprints.
The term “fingerprint” appearing in this specification means a series of dot information, in which each item of dot information is selected from a frame of a pattern of television signals; a plurality of frames can be selected from the television signals, and one or more dot data can be selected from each frame, so that the “fingerprint” can be used to uniquely identify the said television signals.
This document describes systems and methods for identifying video content. Video is one of the most effective ways to distribute information to the masses. Today, almost all video content is created in digital form, from the moment of capture through production, editing, special effects, compression, and distribution. In addition, an increasing amount of video content is stored on DVDs, tapes, computer servers, and mass storage arrays.
Organizing digital video content is becoming a major challenge for all content owners, video and broadband internet service providers, and even home users. This is because, unlike text, video content cannot be searched and identified easily by computers, and, unlike audio, video content has a far larger data size. In addition, it is very difficult and inefficient to identify video content by human interaction, since the process is time-consuming and cannot be scaled. These factors make it difficult to effectively organize, archive, and search video content. However, the need to search and identify video content grows increasingly important with the increasing bandwidth available on networks and the falling cost of digital storage devices.
Therefore, there is a need to identify video content efficiently, with minimal or no human interaction.
It is an object of the present invention to provide a method for identifying motion video content, which can effectively organize, archive, and search video content.
It is another object of the present invention to provide a method for identifying motion video content, which can reduce the cost of storing digital video content.
It is another object of the present invention to provide a method for identifying motion video content, which can identify video content efficiently and with minimal or no human interaction.
It is another object of the present invention to provide a method for identifying motion video content, which can be used to extract information from a given video content segment and to use the extracted information to automatically identify the same video content if it ever appears again in a different video data stream.
Therefore, in the present invention, there is provided a method for identifying motion video content in which a registered fingerprint database is formed in advance for the video content of broadcast video signals, wherein said method at least comprises the steps of: storing a consecutive series of video frame images of a motion video content to be identified into a frame buffer; obtaining sample values from the video frame images by a frame sampler; holding the sample values in a fingerprint store as a fingerprint A to be searched for in the fingerprint database; and performing a fingerprint pattern matching algorithm between the fingerprint A and the fingerprints B contained in the fingerprint database, so as to determine whether the motion video content has ever been broadcast before.
The method according to the present invention can be used to effectively organize, archive, and search video content; to reduce the cost of storing digital video content; and to identify video content efficiently, with minimal or no human interaction.
In what follows, descriptions are provided for a method that can be used to extract information from a given video content segment and to use the extracted information to automatically identify the same video content if it ever appears again in a different video data stream.
The capability to correctly identify video content has many important applications.
In what follows, methods are first described for extracting information from given video content data; this extraction is called the fingerprinting process. It is then described how the fingerprint data can be used to seek a match within a different video content.
All of the following discussions focus on the handling of video signals, although in most cases a video signal comes together with an audio signal as an integral part of the audio/video program. The audio signal is considered to be in synchronization with the video signal, so fingerprint operations on the video signal identify the video content as well as the associated audio content. Therefore, the remainder of this document limits the discussion to fingerprint operations on the video signal only.
It is also assumed that the video data has been digitized. It is possible to extend the idea to analog video content by first digitizing the analog video signal into a digital data stream before applying the methods described herein; therefore, the handling of analog video content is not discussed further in this document.
In addition, it is assumed that the digital video content is in an uncompressed format. For compressed video content, decompression (decoding) of the video data stream is required before the methods described herein can be applied.
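By way of illustration, the following is a minimal sketch of reading pre-decoded, uncompressed frames for the processing described below. The raw 8-bit grayscale file layout and the function name are assumptions for illustration only; any decoder that outputs uncompressed frames serves the same purpose.

```python
import numpy as np

def read_raw_luma_frames(path: str, width: int, height: int):
    """Yield height x width luminance planes from a raw 8-bit
    grayscale file (one frame stored after another).

    The file format here is an illustrative assumption; any decoder
    producing uncompressed video frames may be used instead.
    """
    frame_size = width * height
    with open(path, "rb") as f:
        while True:
            buf = f.read(frame_size)
            if len(buf) < frame_size:
                break  # end of stream or truncated final frame
            yield np.frombuffer(buf, dtype=np.uint8).reshape(height, width)
```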
Lastly, it is assumed that all video frames are in progressive format, meaning that each video frame is displayed in its entirety at the decoder. Interlaced video frames are displayed at two separate time instants as two (top and bottom) fields; in that case, it is assumed that all of the processing described below applies to one of the fields.
Digital video data in uncompressed format can be represented as a time sequence of video frames. Each frame can be described as a two-dimensional array of pixel values, and each pixel value can be further decomposed into brightness (luminance) and color (chrominance) components. For the purpose of fingerprinting and searching through video content, only the luminance pixel values of the video frames are used.
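By way of illustration, a luminance plane can be obtained from an RGB frame as sketched below. The BT.601 luma weights are one common convention and are an assumption here; the method itself only requires that luminance values be available.

```python
import numpy as np

def luminance_plane(rgb_frame: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB frame into an H x W luminance plane.

    The BT.601 weights below are one common convention, assumed for
    illustration only.
    """
    r = rgb_frame[..., 0].astype(float)
    g = rgb_frame[..., 1].astype(float)
    b = rgb_frame[..., 2].astype(float)
    return 0.299 * r + 0.587 * g + 0.114 * b
```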
Digital video content consists of time-consecutive frames that, when presented to the human visual system, create the illusion of continuous motion. The methods for extracting information from these video frames, so that the extracted information can be used to identify the frames, are described first.
The steps required to perform the fingerprint matching are described in some detail in the subsections that follow.
2.1 Fingerprint Extraction
The easiest way to identify video content would be to record all of the video frames and save them in disk storage. The drawback of this approach, of course, is the tremendous amount of data storage capacity required. In addition, storage bandwidth limitations make it difficult to rapidly retrieve the stored video frames.
The method described in this document starts with the first step of sub-sampling the video frames. Specifically, for each video frame, a spatial sub-sampling is performed, in which a fixed number of samples are taken from the video frame and stored as sample values. The key steps are illustrated in the accompanying figure.
The video frames 100 consist of time-continuous video images. Each video frame is first held in the frame store 101, and a frame sampler 102 then obtains the sampled values from the frame store 101. The results are saved in the fingerprint store 103. Each of these steps is described in greater detail below.
2.1.1 Video Frame Sub-Sampling
One preferable sub-sampling scheme is to take 5 samples at different locations of the video frame. These samples should be distributed as evenly as possible within the frame, with the center of the frame as the center of the sub-sampling pattern. One preferable sub-sampling of the frame is shown in the accompanying figure.
Of course, there can be other methods of sub-sampling, but the above sub-sampling scheme is used to describe the rest of the methods. Those skilled in the art will be able to extend the method to other sub-sampling schemes, with more or fewer than 5 samples per video frame, or with a varying number of samples per video frame.
This sampling scheme is independent of the frame resolution and aspect ratio, making it robust when dealing with video content of different resolutions and aspect ratios.
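A minimal sketch of such a 5-sample scheme is given below. The exact layout (the frame center plus the centers of the four quadrants) and the fractional coordinates are illustrative assumptions; expressing the positions as fractions of the frame dimensions is what makes the scheme independent of resolution and aspect ratio.

```python
import numpy as np

# Sample positions as (row, column) fractions of the frame size:
# the frame center plus the centers of the four quadrants. This
# particular layout is an assumption for illustration.
SAMPLE_POSITIONS = [(0.50, 0.50), (0.25, 0.25), (0.25, 0.75),
                    (0.75, 0.25), (0.75, 0.75)]

def sample_frame(luma: np.ndarray) -> np.ndarray:
    """Take 5 luminance samples at resolution-independent positions."""
    h, w = luma.shape
    return np.array([luma[int(y * h), int(x * w)]
                     for y, x in SAMPLE_POSITIONS])
```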
2.1.2 Sub-Sampling of Multiple Video Frames
The sub-sampled values are saved for each of the frames. From the above description, it is noted that 5 samples are obtained for each video frame. This process is repeated for N consecutive video frames. For example, N = 50 consecutive video frames can be sub-sampled, and the sub-sampled values organized into a 5×50 array. This sub-sampling process is shown in the accompanying figure.
This array is what we call the fingerprint of the video content. From the above description, it is noted that the fingerprint covers only 50 video frames: for the PAL video format this is 2 seconds' worth of video, and for NTSC it is slightly less than 2 seconds. If these N video frames can be uniquely identified through the sub-sampled values, then the computation and storage required for identification can be reduced significantly.
The fingerprint identifies only the 50 sampled video frames within the video content, not the remainder of the video content. For most video content, where the content titles are static, uniquely identifying a segment of the content is sufficient to uniquely identify the entire video content title.
For content in which segments may be re-arranged, more frames may need to be sub-sampled. Several preferable ways are therefore provided to determine the number of video frames to sub-sample: sub-sampling only a fixed number of frames at the start of the content; dividing the content into segments and sub-sampling each segment separately; or sub-sampling the entire content as a single fingerprint.
These options are illustrated in the accompanying figure.
Each set of consecutively sampled video frames results in a two-dimensional array of sampled values. This sampled array is the so-called fingerprint of the sampled video content.
From the above, it is noted that, depending on the sampling method used, there may be more than one fingerprint array for a given video content. For the first and third sampling methods there is only one fingerprint; for the second sampling method there can be multiple fingerprint arrays, each identifying a corresponding segment of the video content. Of course, multiple consecutive fingerprint arrays can be organized into more complex fingerprint structures, which are not discussed in this document.
In what follows, the discussion focuses on the handling of a single fingerprint array.
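Combining the per-frame sub-sampling with the multi-frame collection described above, a fingerprint array can be built as sketched below, reusing the hypothetical sample_frame helper from the earlier sketch, with N = 50 as in the example:

```python
import numpy as np

def extract_fingerprint(frames, n_frames: int = 50) -> np.ndarray:
    """Build a 5 x n_frames fingerprint array from consecutive frames.

    `frames` is an iterable of luminance planes, e.g. frames held in
    the frame store; each frame contributes one column of 5 samples.
    """
    columns = []
    for i, luma in enumerate(frames):
        if i >= n_frames:
            break
        columns.append(sample_frame(luma))  # 5 samples per frame
    return np.stack(columns, axis=1)        # shape (5, n_frames)
```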
2.2 Fingerprint Matching
In this section, methods are described for the inverse of the fingerprinting process, i.e., using a given fingerprint array to seek a match within a different video content stream that may match, partially or entirely, the video content represented by the fingerprint.
There are several different possible scenarios between the two video contents. Call the video content from which the fingerprint is extracted video A, and call the video content to be searched for a match with the video A fingerprint video B. If such a match is determined to be true, then it can be concluded that the original video contents A and B are identical, at least for the sections associated with the matching fingerprint. This process is illustrated in the accompanying figure.
Video A and video B may contain identical video content, albeit at different resolutions and aspect ratios and possibly with different levels of quality degradation. For the purpose of this discussion, these different scenarios are not addressed individually; instead, the focus is on how to seek a match between the fingerprints from the two video sequences.
The specific steps are illustrated in the accompanying figure.
2.2.1 The Sum of Absolute Difference Operations
The key processing required by the fingerprint matching algorithm is the Sum of Absolute Difference (SAD) operation between two fingerprints. The operation is performed between the samples obtained from two video frames. Specifically, with samples A1 through A5 taken from video frame A and samples B1 through B5 taken from video frame B, the SAD operation is defined as
SAD(A,B) = |A1 − B1| + |A2 − B2| + |A3 − B3| + |A4 − B4| + |A5 − B5| (EQ 1)
where |A − B| denotes the absolute value of the difference between A and B.
The SAD operation evaluates the differences between the sample sets of the two video frames A and B. A larger value of SAD(A,B) implies a bigger image content difference between the two video frames. This process is illustrated in the accompanying figure.
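A direct transcription of EQ 1, generalized to the 5-sample columns of the fingerprint arrays, might look as follows (a sketch; the function name is illustrative):

```python
import numpy as np

def sad(samples_a: np.ndarray, samples_b: np.ndarray) -> float:
    """Sum of Absolute Differences between two frames' sample sets (EQ 1)."""
    a = np.asarray(samples_a, dtype=float)
    b = np.asarray(samples_b, dtype=float)
    return float(np.abs(a - b).sum())
```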
2.2.2 The Moving SAD Window and Sum of SAD (SSAD) Array
The SAD operation described above is repeated for two fingerprint arrays: one obtained from fingerprint A and the other from fingerprint B. The goal is to search through fingerprint B to determine whether there is a subsection of it that matches fingerprint A. Fingerprint A is assumed to have fewer samples than fingerprint B. The moving window of the SAD operation is defined as follows:
First, fingerprints A and B are associated with each other item-wise. Because fingerprint A is smaller than fingerprint B in number of samples, only some of the samples from fingerprint B are associated with those within fingerprint A.
Next, all of the fingerprint B samples within this window are included in SAD operations with the corresponding fingerprint A samples, and the results are added together to form a single Sum of SAD (SSAD) number.
The same process is then repeated by shifting the position of fingerprint B relative to fingerprint A by one frame. Each such shift generates a new SSAD value, so a series of SSAD values is produced and saved in an SSAD array.
This process is illustrated in the accompanying figure.
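A sketch of this moving-window computation is given below; it slides fingerprint A one frame at a time across fingerprint B and records one SSAD value per shift. Summing all absolute differences in the window at once is mathematically identical to summing the per-frame SAD values.

```python
import numpy as np

def ssad_series(fp_a: np.ndarray, fp_b: np.ndarray) -> np.ndarray:
    """Return the SSAD array for fingerprint A slid across fingerprint B.

    fp_a is 5 x N, fp_b is 5 x M with M >= N; each of the M - N + 1
    window positions yields one SSAD value.
    """
    n, m = fp_a.shape[1], fp_b.shape[1]
    a = fp_a.astype(float)
    ssad = np.empty(m - n + 1)
    for shift in range(m - n + 1):
        window = fp_b[:, shift:shift + n].astype(float)
        ssad[shift] = np.abs(a - window).sum()  # sum of per-frame SADs
    return ssad
```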
2.2.3 The Fingerprint Match Detection
The fingerprint match detection is a process applied to the SSAD time series. From the previous descriptions, it is noted that the SSAD time series represents a time-shifted measure of the difference between the two video frame sequences under comparison. When the SSAD value is low, the two sections of fingerprinted video frames are similar; otherwise, they are not. However, because differences in resolution, in video quality degradation (due to compression, for example), and in noise level all contribute to increased SSAD values, the absolute values of the SSAD series are not by themselves sufficient to determine the location of a possible fingerprint match.
Instead, a fingerprint match is identified by a very sharp drop in the SSAD values just before the match and a very sharp increase just after it. This behavior can be seen in the actually measured SSAD values shown in the accompanying figure.
The key element for detecting this sharp-drop pattern within the SSAD values, denoted S(n) in what follows, is illustrated in the accompanying figure.
Clearly, S(n) represents the difference between video A and video B on their respective n-th frame within the fingerprint window. Note that for video fingerprint B, the index n refers to a different video frame each time the fingerprint array B is shifted by one frame relative to fingerprint array A.
The pattern values can be obtained by the pattern extractor 300, which is described as follows:
P(n) = (S(n) − S(n−1)) / S(n) (EQ 2)
Note that P(1) is not defined and is not used. In addition, the above calculation is performed only when S(n) is non-zero and above a certain fixed threshold; otherwise, P(n) is set to zero.
From this it can be seen that P(n) is a positive number when S(n) > S(n−1), i.e., when the SSAD value is increasing; this means that the two represented video frames are diverging from each other, indicating a lower probability that a match will be found. On the other hand, when P(n) is a negative number, the two video frames are becoming increasingly similar to each other. The value of P(n) represents the relative change of S(n), and a larger magnitude of P(n) means a more rapid change between S(n−1) and S(n).
The extracted pattern values form another series of numbers which are then stored in pattern store 301.
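A sketch of the pattern extractor 300 (EQ 2) follows. The arrays are 0-indexed here, so the undefined P(1) of the text corresponds to index 0, and the fixed threshold guarding against small S(n) is an illustrative assumption.

```python
import numpy as np

def extract_pattern(s: np.ndarray, s_threshold: float = 1e-6) -> np.ndarray:
    """Compute P(n) = (S(n) - S(n-1)) / S(n)  (EQ 2).

    The first value is undefined and left as zero; P(n) is also set
    to zero whenever S(n) does not exceed the fixed threshold (the
    threshold value here is an assumption).
    """
    p = np.zeros(len(s), dtype=float)
    for n in range(1, len(s)):
        if s[n] > s_threshold:
            p[n] = (s[n] - s[n - 1]) / s[n]
    return p
```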
The pattern inspector 302 inspects the values contained in pattern store 301 by the following steps:
Select a specific position, say m, within the pattern store 301 and identify all of the values within a window of size 2M−1 centered at position m:
P(m−M+1), P(m−M+2), ..., P(m−1), P(m), P(m+1), ..., P(m+M−2), P(m+M−1) (EQ 3)
These values are then combined by the pattern value collector 303, yielding a result C(m), in the following way:
C(m) = −P(m−M+1) − ... − P(m−1) − P(m) + P(m+1) + ... + P(m+M−1) (EQ 4)
From the above, it is noted that C(m) will be a large number when there is a sharp dip in the pattern values P(...) at position m; otherwise, C(m) tends to be small.
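A sketch of the pattern value collector 303 (EQ 3 and EQ 4) follows; values at and before position m enter with a negative sign and values after m with a positive sign, so a sharp dip at m produces a large positive C(m).

```python
import numpy as np

def collect_pattern_value(p: np.ndarray, m: int, M: int) -> float:
    """Compute C(m) over the window of size 2M - 1 centered at m (EQ 4).

    The caller must ensure the window fits, i.e. M - 1 <= m and
    m + M - 1 < len(p).
    """
    left = p[m - M + 1 : m + 1]   # P(m-M+1) ... P(m), negated
    right = p[m + 1 : m + M]      # P(m+1) ... P(m+M-1)
    return float(right.sum() - left.sum())
```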
Finally, the value C(m) is compared with a user-given threshold 304 to determine whether a match has been found between the two fingerprints. The frame number determined through the above process is signaled as output to the histogram collector, as shown in the accompanying figure.
The histogram collector 305 gathers all of the pattern values C(m) that have exceeded the given threshold, counts the number of times each value has exceeded the threshold, and stores them into an array. Each item in the array holds the value m, C(m), and the number of times that C(m) has crossed the threshold. Finally, the maximum value selector 306 inspects all such values within the histogram and selects the value that has appeared the greatest number of times. This value identifies the frame that is declared the fingerprint-matched frame.
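One plausible reading of the threshold test 304, histogram collector 305, and maximum value selector 306 is sketched below. Within a single scan each position can cross the threshold at most once; the counts become meaningful when the scan is repeated as new video data keeps arriving, so the histogram is passed in and persists across calls.

```python
from collections import Counter

import numpy as np

def detect_match(p: np.ndarray, M: int, threshold: float,
                 histogram: Counter):
    """Accumulate threshold crossings of C(m) into `histogram` and
    return the position that has crossed the threshold the most
    times so far, or None if no crossing has occurred."""
    for m in range(M - 1, len(p) - M + 1):
        if collect_pattern_value(p, m, M) > threshold:
            histogram[m] += 1
    if not histogram:
        return None
    return histogram.most_common(1)[0][0]  # fingerprint-matched position
```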
This application is the U.S. National Phase under 35 U.S.C. §371 of International Application No. PCT/CN2008/071046, filed on May 22, 2008, which in turn claims the benefit of U.S. Provisional Application No. 60/966,201, filed on Aug. 22, 2007, the disclosures of which Applications are incorporated by reference herein.