Video rights holders face a variety of challenges. To grow their audience, they often allow their content to reach viewers through channels that were not anticipated in the past, and each of those channels brings challenges of its own. For example, YouTube audiences are critical to comedy shows, which may want clips of their content to circulate, and to news programs, which may want their news videos to have wide exposure.
One current technique that video rights holders use to protect their intellectual property (IP) is fingerprinting. With fingerprinting, a video is analyzed after production, and a mathematical description of the video or of scenes in the video is created.
Another method for video rights holders to protect their IP is watermarking, in which a digital watermark is introduced into the video during production. Both fingerprinting and watermarking require that rights holders actively participate in the protection of their content.
Claiming ownership of content that has “escaped into the wild” can be difficult if an owner must watermark each clip prior to release or fingerprint each clip after release. In the case of watermarking, production workflow or the cost of watermarking technology may preclude broad application; in the case of fingerprinting, one is essentially acting after it is too late.
Humans, by contrast, can easily recognize an actor, a set, or a logo in the background and understand that the content was produced by a particular rights holder, but having people review content can be expensive and time-consuming.
The systems and methods described here allow a video rights holder to recognize their videos through pattern recognition techniques. Thus, video rights holders can claim content as theirs after the fact and without advance steps such as fingerprinting or watermarking. For example, by recognizing a set used in a show's production, a logo on a screen, specific actors, or other recognizable features, video that has never been fingerprinted or watermarked can be detected and identified as belonging to the rights holder. This recognition approach is especially helpful for those who currently have to use technology from a digital watermarking alliance and are forced to pay for that technology even though the value of the content may be uncertain.
The systems and methods described herein enable rights holders to claim ownership of their property without ever having submitted the clip for analysis or modification in advance. By scanning videos with object detectors and creating a list of objects, and then creating a mapping of objects, logos, and people to rights holders, a system can automatically establish that a video belongs to a certain rights holder. Other features and advantages will be apparent from the following description, drawings, and claims.
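By way of non-limiting illustration, the mapping of detected objects, logos, and people to rights holders could be sketched as follows. The label strings, the rights holder names, and the simple majority-vote rule are hypothetical and are assumed only for this example; the detector that produces the labels is outside the sketch.

```python
from collections import Counter

# Hypothetical mapping from detector labels to rights holders.
# All names below are illustrative placeholders.
FEATURE_OWNERS = {
    "logo:network_a": "Network A",
    "set:network_a_studio": "Network A",
    "logo:network_b": "Network B",
}


def attribute_owner(detected_labels, feature_owners=FEATURE_OWNERS):
    """Return (owner, votes) for the rights holder matched by the most
    detected labels, or None when no label maps to a known rights holder."""
    votes = Counter(
        feature_owners[label]
        for label in detected_labels
        if label in feature_owners
    )
    if not votes:
        return None
    owner, count = votes.most_common(1)[0]
    return owner, count


# Labels as an object detector might report them for one video (assumed).
labels = ["logo:network_a", "set:network_a_studio", "logo:network_b", "person:unknown"]
result = attribute_owner(labels)
```

In this sketch, labels that do not appear in the mapping are ignored, so an unrecognized person or object neither supports nor contradicts an attribution.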
Referring to
Server 10 can include a web crawling system, e.g., in a module, that can operate through an interface to access websites to obtain video content. The audio component of the video can be discarded, or it could be retained and stored if desired. While video content could be played back in real time, taking as long as the content itself runs, in other embodiments multiple screen shots from the video are captured and stored. The screen shots are compared to stored information representing frames or portions of frames, such as a library of images or mathematical representations of images. The processor uses pattern recognition techniques to compare the known information and new frames to identify key features. In the case of television shows, for example, the features could include logos that appear frequently on the television screen, such as on the lower left or lower portion of a screen, or could include other features such as the format of boxes of scrolling content at the bottom of the screen, or could even include pattern recognition that detects individuals or common scenes, such as the scene of a news broadcast or situation comedy. The information derived from comparing the pattern recognition library to the video images is then used to indicate that desired video content has been identified. This information can thus identify the video, such as which TV show it is, and can in turn be used to identify ownership.
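By way of non-limiting illustration, the capture of screen shots at intervals and their comparison to a stored library image could be sketched as follows. The function names, the fixed capture interval, the representation of frames as two-dimensional lists of grayscale values, and the brute-force mean-absolute-difference comparison are all assumptions made for this example; the described system may use any pattern recognition technique.

```python
import math


def sample_times(duration_s, interval_s):
    """Timestamps (in seconds) at which screen shots would be captured."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    count = math.ceil(duration_s / interval_s)
    return [i * interval_s for i in range(count)]


def best_match(frame, template):
    """Slide `template` over `frame` (2D lists of grayscale values) and
    return (score, row, col) for the position with the lowest mean
    absolute difference. A score of 0.0 is an exact match."""
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            diff = sum(
                abs(frame[r + i][c + j] - template[i][j])
                for i in range(th)
                for j in range(tw)
            )
            score = diff / (th * tw)
            if best is None or score < best[0]:
                best = (score, r, c)
    return best


# Toy 4x4 frame with a 2x2 "logo" in the lower-left corner (assumed data).
frame = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [9, 1, 0, 0],
    [1, 9, 0, 0],
]
logo = [[9, 1], [1, 9]]
match = best_match(frame, logo)
times = sample_times(10, 4)
```

A practical system would likely restrict the search to regions where a logo is expected, such as the lower portion of the screen, rather than scanning every position.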
A variety of pattern recognition processes can be used to perform the matching. One example is described in Viola et al., “Rapid Object Detection Using a Boosted Cascade of Simple Features,” 2001.
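For illustration, the following sketch shows the integral image and a two-rectangle feature of the general kind used in the Viola et al. approach. It covers only the feature computation, not the boosted cascade or its training, and the function names and the sign convention (right half minus left half) are assumptions made for this example.

```python
def integral_image(img):
    """Build a padded integral image: ii[r][c] holds the sum of all pixels
    in img above row r and to the left of column c."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four integral-image lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle feature: sum of the right half minus the
    sum of the left half of the given window (width assumed even)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return right_sum - left_sum


# Toy 2x4 grayscale image (assumed data).
img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
ii = integral_image(img)
feature = two_rect_feature(ii, 0, 0, 2, 4)
```

Because each rectangle sum needs only four lookups regardless of rectangle size, many such features can be evaluated per frame at low cost.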
When matches are found between known video content 18 and downloaded content 12, the specific video and/or its ownership can be determined. Once ownership is established, audience size and other metrics relating to the content's consumption can be determined. For instance, by scanning videos on the Internet (such as on a website like “YouTube” that focuses on videos) for the presence of a known set of objects and matching the objects to broadcasters, and then retrieving audience data on the videos detected via this process, one can essentially create an Internet version of a ratings system (like the Nielsen TV ratings) that is potentially more objective and accurate than current methods on the Internet. One could also determine ownership without identifying specific videos, e.g., by counting how often a CNN logo appears versus a Fox News logo without distinguishing specific videos.
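As a non-limiting sketch of the ratings idea, audience data for matched videos could be totaled per rights holder as follows. The record fields "owner" and "views", the sample figures, and the use of view counts as the audience metric are assumptions made for this example.

```python
from collections import defaultdict


def audience_by_owner(matched_videos):
    """Total the views per rights holder and compute each holder's share
    of all views across the matched videos."""
    totals = defaultdict(int)
    for video in matched_videos:
        totals[video["owner"]] += video["views"]
    grand_total = sum(totals.values())
    return {
        owner: (views, views / grand_total if grand_total else 0.0)
        for owner, views in totals.items()
    }


# Hypothetical matched videos with retrieved audience data.
videos = [
    {"owner": "Network A", "views": 300},
    {"owner": "Network B", "views": 100},
    {"owner": "Network A", "views": 100},
]
ratings = audience_by_owner(videos)
```

Here each rights holder maps to a pair of total views and share of all views, which is one simple way such a ratings report could be organized.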
Because the matching is not tied to a particular frame or set of frames, this process is not fingerprinting, and it therefore avoids the process and scalability issues associated with fingerprinting. Because the process does not look for digital watermarks, it does not require the additional processing step of adding digital watermarking information. Instead, this technique can rely simply on a known person, stage set, or other object or logo that the rights holder typically might include in a video.
The systems described above can be implemented on an appropriately programmed processor with suitable storage for programs and data, and interfaces to other components. The processor can include controllers, microprocessors, gate logic, or any other form of data processing. For example, while the system is described as using a “server,” this element and the functions implemented by it could be conducted by one or more different forms of processing, and in the same or multiple physical units. Furthermore, while the storage in
To the extent software is used to implement the systems and methods described here, such software can be maintained as separate modules. Such software can include instructions that are executed by processing systems. The software instructions can be provided in a memory device, such as a magnetic disk, optical disk, semiconductor memory, or some other type of memory.
While human interaction can be included at various portions of this system, in some embodiments the methods can be implemented in an automated manner without human interaction. The system can thus download videos, capture images from the videos, compare them to a library of known images, create a report of the results, and send the report, all without human input in that process.
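By way of non-limiting illustration, the automated sequence of steps could be organized as in the following sketch. The function name run_pipeline and the four callables passed to it are hypothetical; each stands in for a component (downloader, frame capture, comparison to the library, report delivery) whose implementation is not specified here.

```python
def run_pipeline(urls, download, capture_frames, match_frame, send_report):
    """Download each video, capture frames, match each frame against the
    library, then build and send a report, with no human input."""
    report = []
    for url in urls:
        video = download(url)
        matches = []
        for frame in capture_frames(video):
            matches.extend(match_frame(frame))
        # Record each distinct matched feature once per video.
        report.append({"url": url, "matches": sorted(set(matches))})
    send_report(report)
    return report


# Stub components for demonstration only (assumed behavior).
sent = []
report = run_pipeline(
    ["video-1"],
    download=lambda url: ["frame-1", "frame-2"],
    capture_frames=lambda video: video,
    match_frame=lambda frame: ["logo"] if frame == "frame-1" else [],
    send_report=sent.append,
)
```

Passing the steps in as callables keeps the sequence itself independent of how any single step is implemented, so a human review step could be inserted or omitted without changing the loop.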
The results can be used in a number of different ways, such as monitoring unauthorized use, tracking frequency of use for rating purposes, tracking use for royalty calculation, or otherwise for monitoring the dissemination of videos.
Other features can also be included. For example, the system can create, store, and report a probability or a confidence level for each of the matches. This level can be numerical (e.g., 90% chance) or qualitative (e.g., highly likely, somewhat likely, etc.). The probability/level can be increased by searching for multiple matches within a video. For example, with respect to a cable news program, the system could look for both the logo and a format of information boxes and crawling information at the bottom of the video. Finding multiple features can significantly increase the confidence/probability level. The system can also capture metadata that is associated with videos and use that metadata to extract information and/or to adjust the confidence that a match has been made. The confidence/probability can further be increased by providing human review and also by counting the number of times that the image appears. For example, in the case where a logo is to be detected, the logo might be more difficult to observe because of background images in some frames, but more detectable in other frames. The frequency with which the logo is found affects the confidence that the feature has been identified.
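The description above does not prescribe a formula for combining matches. As one possible sketch, per-feature probabilities could be combined with a noisy-OR rule, which assumes the features are independent, and then mapped to a qualitative level. The function names, the independence assumption, and the 0.9 and 0.6 thresholds are all assumptions made for this example.

```python
def combine_confidence(feature_probs):
    """Noisy-OR combination: probability that at least one feature match
    is genuine, assuming the individual matches are independent."""
    miss = 1.0
    for p in feature_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        miss *= 1.0 - p
    return 1.0 - miss


def qualitative(p):
    """Map a numerical confidence to a qualitative level (thresholds assumed)."""
    if p >= 0.9:
        return "highly likely"
    if p >= 0.6:
        return "somewhat likely"
    return "unlikely"


# Hypothetical per-feature probabilities for a cable news program.
logo_only = combine_confidence([0.7])
logo_and_boxes = combine_confidence([0.7, 0.6])
all_three = combine_confidence([0.7, 0.6, 0.5])
```

Under this rule each additional matched feature can only raise the combined value, which is consistent with the statement that finding multiple features increases confidence.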
This application claims priority to provisional application Ser. No. 61/146,919, filed Jan. 23, 2009, which is incorporated herein by reference.
Number | Date | Country |
---|---|---|
61146919 | Jan 2009 | US |