The present invention relates to a method of characterising and identifying a raw or encoded video stream, or subsets of a video stream. In particular, but not exclusively, the invention relates to a method for characterising a video stream and using the characterisation of the video stream to identify identical streams in video repositories.
It is known to encode videos or video streams for storage or streaming, in order to reduce the amount of data required to store them or the bandwidth required for their transmission. A video stream comprises several pictures that are shown sequentially and a corresponding audio file. Each picture may be an entire frame, or it may be only a single field which may be combined with another field to form a frame representing an image at some instant in time, as is the case for interlaced video streams. In this specification the terms picture and frame will both be used and are often interchangeable. Techniques to encode a video are well known and this invention is applicable to many of these techniques, in particular the H.264/AVC standard, which uses a combination of image compression and motion-based estimation techniques to encode a video.
Each individual picture in an encoded video stream is divided into typically equal-sized macroblocks. A macroblock is a group of neighbouring pixels, typically a sixteen by sixteen square, though other sizes of macroblock are used. The macroblocks are the standard blocks of data which are encoded and which make up the picture. A macroblock generally contains Y, Cb and Cr components, which are the luma (brightness) and chroma (blue and red) components respectively. Macroblocks may be grouped into slices, which are numbered sequences of macroblocks to be processed in sequential order during a raster scan when rendering a picture onto a display. In the known video compression standards, the luma and chroma components may be encoded either spatially or temporally.
Intra-frame encoding in the known H.264/AVC standard is a form of spatial compression, but in other standards, such as MPEG-4, intra-frame encoding is conducted in a transform domain. In intra-frame encoding the data in an H.264/AVC standard macroblock is compressed by referring to the information contained in the previously coded macroblocks to the left and/or above said macroblock in the same frame. The information in the encoded macroblock is derived from spatially neighbouring data points, and the technique works especially well for pictures which contain smooth surfaces. Slices or macroblocks which are encoded using intra-frame encoding are known as “I” slices or I macroblocks. The intra-frame encoding technique relies only on data contained in that particular frame, and known encoders will often encode entire frames using intra-frame encoding. Such frames can be used as reference frames.
Inter-frame encoding in the H.264/AVC standard is a temporal, motion-based form of compression in which a macroblock is encoded with reference to a reference frame. Slices or macroblocks that contain inter-frame prediction are known as “P” slices or P macroblocks. Inter-frame encoding is a form of motion-compensated prediction, in which a macroblock is predicted by displacing a block of the reference frame/picture, using a translational motion vector to describe the motion of the block together with a picture reference index. Inter-frame encoding typically requires fewer bits per macroblock than intra-frame encoding.
If a macroblock is identical to the corresponding macroblock in a reference frame, the encoder will refer to the reference frame and will “skip” the encoding of that particular macroblock. Such macroblocks are known as S or skipped macroblocks.
Video compression techniques involve a combination of these, and sometimes other, techniques to optimally compress the data with the loss of as little information as possible.
It is known to attempt to characterise media by assigning a “fingerprint” to describe the data. This fingerprint can then be compared to a list of previously characterised sets of data for a match to be found. Such systems are particularly well developed for audio media: in the case of an audio track library such as the iTunes® library, an album is characterised by a fingerprint based on the number of files, the length of the recording and the silence between songs, which is then compared to a known library to identify the album. Other known means of identifying video content such as DVDs involve the use of metadata, which stores the details of the media and is read when a DVD is accessed. Both systems, however, are only able to identify the contents of an entire disc or album.
With the increase in digital piracy and unauthorised copies, it is desirable to be able to identify content that may be protected by Digital Rights Management (DRM). It is desirable for the owners of the material to be able to locate any material protected by DRM in such large repositories as YouTube®. With multiple copies of a media file being made, altered and renamed, it is also possible to have unnecessary duplication of content without knowing that the content is the same. This wastes storage space on hard disks and causes multiple nearly identical videos to be presented to a user searching through a video library. Advertisers may also want to check that advertisements which have been paid for were actually transmitted as part of a video stream, without assigning to persons the task of watching these streams.
To identify content it is known to determine a “fingerprint” for a video stream. For example, Thomson Licensing WO/2007/080133 discloses the use of a visual hash function to determine a fingerprint for key frames of the video to characterise the content, which works on the raw, un-encoded video. St Andrews WO/2006/059053 discloses the use of motion-based fingerprinting by comparing the luminance of pixels between frames as an estimate of the amount of motion per frame. This technique involves converting each frame to grey-scale and calculating the luminance of each macroblock. Both techniques produce different results when the source image has been altered during replication, e.g. by a change in brightness, resolution or macroblock size, and both are computationally expensive to implement and therefore unsuitable for use on a large scale.
There is currently no satisfactory way for quickly and accurately characterising video streams in either raw or encoded formats that remains robust when the parameters of the stream have been altered.
To address at least some of these and other related problems in the prior art, the present invention provides a method of characterising and identifying raw or encoded video streams quickly and accurately, as set out in claim 1. The fingerprint returned by the method is also less susceptible to changes in the parameters of the video stream, such as resolution, quality and brightness, than those of previously disclosed inventions.
The invention is preferably able to identify video content quickly and accurately from large video repositories by comparing the fingerprint produced for the input stream to the fingerprints of previously characterised content. For instance, the invention may be used as a method for identifying copyrighted material that has been posted on a video sharing website such as YouTube®. In other embodiments, the invention may be used to identify duplicate files on such a site, where identification of material is often done nowadays by metadata or user-inputted tags, which are expensive to produce and may not accurately describe the content. Embodiments of the invention can identify adverts in a video stream: by inputting and characterising known adverts into the database, these can then be identified within a stream. It will be immediately apparent to the person skilled in the art that the invention is not limited to these embodiments, which are given only by way of example.
According to an aspect of the invention there is provided a method of characterising a video stream comprising one or more pictures, the method comprising the steps of: partitioning a picture of the video stream to be characterised into a plurality of blocks of data; measuring, for one or more blocks of data, which of a plurality of distinct encoding techniques has been used to encode the block of data, or calculating which of a plurality of distinct encoding techniques is preferred to encode the block of data, and storing data dependent on the measurement or calculation in a memory; determining a value for the picture based on the number of blocks of data that have been encoded, or have been calculated to be preferred to be encoded, using a particular encoding technique; and determining a characterising fingerprint of the video stream based on the one or more values assigned to those pictures of the video stream for which a value has been determined.
A further aspect of the invention provides a method of characterising a video stream as described above, in which the characterising value of a picture or frame in the stream is determined by the ratio of the number of macroblocks encoded, or calculated to be preferred to be encoded, by a particular technique (preferably a combination of techniques) to the total number of macroblocks, or to the number of macroblocks encoded, or calculated to be preferred to be encoded, by a different technique (preferably a different combination of techniques). Preferably the characterising value represents the ratio of the number of intra-encoded macroblocks to the total number of macroblocks, and the said ratio may be expressed in integer percentage points.
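By way of illustration only, the following Python sketch shows one way in which the preferred per-picture value might be computed; the function name and the representation of a picture as a list of per-macroblock encoding labels are assumptions made for the purposes of the example and do not form part of the claimed method.

```python
def characterising_value(macroblock_types):
    """Return a characterising value for one picture.

    `macroblock_types` is assumed to hold one label per macroblock,
    e.g. 'I' for intra-encoded, 'P' for inter-encoded and 'S' for
    skipped macroblocks. The value is the ratio of intra-encoded
    macroblocks to the total number of macroblocks, expressed in
    integer percentage points.
    """
    total = len(macroblock_types)
    if total == 0:
        return 0
    intra = sum(1 for t in macroblock_types if t == 'I')
    return round(100 * intra / total)
```

For example, a picture of eighty macroblocks of which eight are intra encoded would be assigned the value 10.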
Preferably the value for a single picture is expressed as one of an alphanumeric, numerical, hexadecimal or binary value.
Preferably the pictures each comprise a frame of a video.
Preferably the video stream is encoded using the H.264/AVC video coding standard.
Preferably the fingerprint to characterise the video stream is written to some form of writeable memory.
Preferably the characterising value/fingerprint of a video stream is compared to other values by a difference-of-squares method, and preferably a measure of fit is assigned.
Further aspects, features and advantages of the present invention will be apparent from the following description and appended claims.
As is shown, the frame 10 is divided into eighty macroblocks 12. There are three different types of macroblock: P macroblocks 14, which are inter-frame prediction encoded macroblocks; I macroblocks 16, which are spatially encoded intra-frame macroblocks; and S blocks 18, which are skipped macroblocks that are identical to the macroblocks in a reference frame. Also shown are the characterising value 24 and the count value 26.
The macroblocks 12 are arranged in rows 20 and columns 22. An estimate of the amount of motion in a single frame 10 can be determined from the number of macroblocks 12 of a specific type. The estimate of the motion for the frame is expressed as a characterising value 24. In a preferred embodiment the estimate of motion is based on the number of I macroblocks 16 in the single frame 10. The measure of the number of macroblocks 12 of a specific type is the count value 26.
The characterising value 24 for the frame 10 need not be an integer value; for example a decimal, fraction, binary, hexadecimal or alphanumeric value may be used. In all of these embodiments the resulting characterising value 24 for each frame 10 need not be unique and may be shared by other frames. However, by combining the characterising values 24 for a number of (preferably consecutive) frames, the resulting sequence of characterising values 24 becomes more distinctive as the number of frames is increased, so that the sequence for a common video stream containing many frames is very unlikely to be shared by any other unrelated video stream. This sequence is the fingerprint 34 of the stream.
A video stream 30 comprising one or more frames 10 is read into a computer to be characterised, the computer running a program in accordance with the invention. The program causes the computer processor to read an individual frame 10 at step S100 and in a preferred embodiment sets the count value 26 to zero. The count value 26 is the value that is used to calculate the characterising value 24 for a single frame 10 as described with reference to
Each frame is partitioned into one or more macroblocks 12 at step S102. In a preferred embodiment the macroblock 12 is of a fixed size of 16×16 pixels across the frame 10, though other sizes of macroblock 12, particularly those supported by the H.264/AVC standard, may be used. The cost of encoding each macroblock 12 either temporally, using known inter-frame encoding methods, or spatially, using known intra-frame encoding techniques, is calculated at step S104. In a preferred embodiment the calculation of the cost is based on the amount of compression achieved by a certain technique, though other measures, such as the amount of CPU time required to encode a macroblock 12, may be used. In a preferred embodiment the macroblocks 12 are then encoded, or transcoded, with the technique that the calculation determines to provide the best compression. A comparison of the costs for each technique is made at step S106. In the preferred embodiment, if the cost of encoding a macroblock 12 by intra-frame encoding is less than that for inter-frame encoding, the count value 26 is increased by one at step S108 and step S110 follows. If the intra-frame encoding is more expensive than the inter-frame encoding, the count value 26 remains the same and step S110 follows. This process is repeated for all macroblocks 12, thereby counting all the macroblocks 12 in a frame that are encoded using the intra-frame technique. Alternatively, instead of using the cost calculation directly to alter the count value 26, the program may simply count the number of I macroblocks 16 after encoding or transcoding of the frame 10.
When there are macroblocks left at step S110 the process 100 returns to step S104, but once step S106 has been performed for all macroblocks in the frame, step S110 is followed by step S112. At step S112 the characterising value 24 for the frame 10 is determined using the count value 26 determined from steps S104 to S110. The characterising value 24 is preferably determined by the methods described with reference to
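Purely as an illustrative sketch of process 100, and not as a definitive implementation, the following Python fragment mirrors steps S100 to S112 for a single raw frame; the cost functions are placeholder callables, since the actual cost calculation depends on the encoder used.

```python
def characterise_raw_frame(macroblocks, intra_cost, inter_cost):
    """Sketch of process 100 for one raw (or differently encoded) frame.

    `macroblocks` is a list of macroblock data (e.g. 16x16 pixel blocks)
    produced at step S102; `intra_cost` and `inter_cost` are placeholder
    callables returning the cost (for instance, the compressed size) of
    encoding a macroblock spatially or temporally respectively.
    """
    count = 0                                    # count value 26, reset at S100
    for mb in macroblocks:
        cost_intra = intra_cost(mb)              # S104: cost of intra-frame encoding
        cost_inter = inter_cost(mb)              # S104: cost of inter-frame encoding
        if cost_intra < cost_inter:              # S106: compare the two costs
            count += 1                           # S108: intra preferred, increment count
        # S110: loop until no macroblocks remain
    # S112: characterising value 24, here the integer percentage of intra blocks
    return round(100 * count / len(macroblocks)) if macroblocks else 0
```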
Process 200 is used when the video stream 30 has already been encoded in the desired format. At step S200 the encoding attributes of a macroblock 12 are read to determine which encoding technique was used; if the macroblock 12 is an I macroblock 16, the count value 26 is increased by one.
Once the encoding technique for a macroblock 12 has been determined, the program checks for further macroblocks at step S204 and repeats step S200 until all macroblocks 12 have had their encoding attributes checked, whereupon the process 200 progresses to step S206. The characterising value 24 for each frame 10 is determined at step S206 based on the count value 26 for the frame, preferably by the methods described above.
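Again purely as an illustrative sketch, and assuming a hypothetical frame object whose macroblocks expose a type attribute (a real H.264/AVC stream would be inspected with an existing decoder or parser), process 200 might be expressed as follows.

```python
def characterise_encoded_frame(frame):
    """Sketch of process 200 for a frame that is already suitably encoded.

    `frame.macroblocks` is assumed to be an iterable of macroblocks whose
    `mb_type` attribute records the encoding technique actually used
    ('I', 'P' or 'S'); both names are hypothetical.
    """
    count = 0                              # count value 26
    total = 0
    for mb in frame.macroblocks:           # S200/S204: check each macroblock in turn
        total += 1
        if mb.mb_type == 'I':              # intra-encoded macroblock found
            count += 1
    # S206: derive the characterising value 24 from the count value 26
    return round(100 * count / total) if total else 0
```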
The video stream 30 to be characterised is read into the program at step S302 and the individual frames that comprise the video stream 30 are extracted at step S304. In the preferred embodiment every frame that forms the video stream 30 is used to characterise the video though other embodiments may use selected pictures or slices of frames.
The encoding technique for a first frame is checked at step S306, using the encoding attributes of the data. If the inputted image is in a raw format, or encoded using a different technique to the desired one, the characterising value 24 for that frame 10 is determined at step S308, which incorporates the steps S104 to S112 of process 100 as described above. If the frame is encoded using the desired encoder, in a preferred embodiment one which uses the H.264/AVC standard, the characterising value 24 of the single frame 10 is determined at step S310, which incorporates steps S200 to S206 of process 200. The characterising value for the single frame 10 is returned at step S312, and the process takes the next picture at step S315 and returns to step S306 to perform the same steps on that picture.
Once step S314 determines that all frames 10 that are used to characterise the video stream 30 have been characterised, the fingerprint 34 of the stream is determined at step S316. In a preferred embodiment this fingerprint is a sequence of the characterising values 24 for each frame 10; the length of the fingerprint 34 is therefore proportional to the length of the stream characterised. In other embodiments, other combinations of the individual characterising values 24 for the frames 10 in the video stream 30 may be used to form the fingerprint 34.
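The overall flow of process 300 might be sketched as below; `is_h264_encoded` stands in for the check made at step S306, and the two per-frame helpers correspond to the sketches given earlier (process 100 bound to suitable cost functions, and process 200). All names are illustrative assumptions rather than part of the invention.

```python
def fingerprint_stream(frames, is_h264_encoded, characterise_raw, characterise_encoded):
    """Sketch of process 300: characterise an entire video stream.

    `frames` is the sequence of frames extracted at step S304;
    `characterise_raw` and `characterise_encoded` are per-frame
    callables corresponding to processes 100 and 200 respectively.
    """
    fingerprint = []                                  # fingerprint 34
    for frame in frames:                              # loop until S314 finds no more frames
        if is_h264_encoded(frame):                    # S306: check the encoding attributes
            value = characterise_encoded(frame)       # S310: process 200
        else:
            value = characterise_raw(frame)           # S308: process 100
        fingerprint.append(value)                     # S312: value for this frame
    return fingerprint                                # S316: the sequence of values
```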
Each frame 10, 36, 38, 40, 42 has already been characterised using one of the processes described above. The frames are consecutive frames in the video stream 30, which consists of N frames. The first frame 10 has a characterising value 24 of 10, the second frame 36 has a characterising value 24 of 5, the third frame 38 has a characterising value 24 of 0, the fourth frame 40 has a characterising value 24 of 62 and the Nth frame 42 has a characterising value 24 of 7. The fingerprint 34 for the video stream 30 is a combination of all the characterising values 24 for each frame 10, 36, 38, 40, 42.
Once a fingerprint 34 has been determined for a video stream 30 it is desirable to store the fingerprint 34 so that it may be compared to a database of previously determined fingerprints and matches found. The fingerprint 34 in the preferred embodiment is a sequence of numbers, the length of which is proportional to the length of the video. Each value in the sequence is a measure of the motion in a particular frame. Known matching algorithms are applied to the fingerprint 34 in order to find a match between the newly characterised content and previously characterised content. In a preferred embodiment a square of the difference technique is used as shown in
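A minimal Python sketch of such a square-of-the-difference comparison between two fingerprints of equal length is given below; the function name and the choice of raising an error for unequal lengths are assumptions made for the example.

```python
def fingerprint_distance(a, b):
    """Sum of squared differences between two equal-length fingerprints.

    A distance of zero indicates identical fingerprints; smaller
    distances indicate a better fit. The threshold below which a fit
    is treated as a match is left to the application.
    """
    if len(a) != len(b):
        raise ValueError("fingerprints must be of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b))
```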
Because the fingerprint is a sequence of numbers whose order corresponds to the sequential order of the frames, it is easy to search either for a previously characterised stream matching a characterised input stream of equal length, or for a subset, within a previously characterised stream, of length equal to that of an inputted video stream and matching the said inputted stream, as depicted in
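The subset search described above could, for instance, be sketched as a sliding-window comparison in which the shorter fingerprint is compared against every equal-length window of the longer one; this is an illustrative sketch only, and the assumed matching measure is the sum of squared differences used above.

```python
def find_best_match(short_fp, long_fp):
    """Locate the best match for a short fingerprint within a longer one.

    The short fingerprint is slid along the longer fingerprint one frame
    at a time, the sum of squared differences is computed at each offset,
    and the offset with the smallest distance is returned together with
    that distance. (None, None) is returned if no window fits.
    """
    n = len(short_fp)
    best_offset, best_distance = None, None
    for offset in range(len(long_fp) - n + 1):
        window = long_fp[offset:offset + n]
        distance = sum((x - y) ** 2 for x, y in zip(short_fp, window))
        if best_distance is None or distance < best_distance:
            best_offset, best_distance = offset, distance
    return best_offset, best_distance
```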
There is shown a user personal computer 80, including a computer hard drive 82 hosting a program, a form of writeable memory 92, various processors 94, a display device 84, a connection to the internet 86 and an external database 88. In other embodiments, the personal computer 80 may be another form of computer e.g. portable computer, a network of computers etc. The program may also be stored at a location other than the computer 80, for example on a server, on an external computer, the internet etc. The external database 88 contains the fingerprints of the adverts, which have been previously characterised by the method of process 300.
The user may download or stream the video stream 30 from the internet 86, via known means. The video stream 30 in a preferred embodiment is analysed by the processor 94 running a program which is stored on the user's personal computer 80. The video stream 30 is analysed using process 300. The fingerprint 34 of the stream 30 is then preferably stored on the writeable memory 92 of the computer 80, or on an external database 88 which is accessible to multiple users so that the fingerprints 34 of characterised streams may be stored on the database. Such an external database 88 may be accessible in a manner analogous to the well known music databases which identify music CDs. Once the fingerprint 34 of the video stream 30 has been determined, it is then matched against the fingerprints of previously characterised adverts stored on the external database 88. In this example the characterised stream 30 is a television programme which is longer than the adverts; consequently the fingerprint 34 for the characterised stream is longer than those for the adverts. In such a scenario it is preferable to search for the fingerprint of each advert within the longer television programme fingerprint 34, the adverts having been characterised as known content, such as by process 300. In a preferred embodiment information regarding the matches, such as position in the stream and length of the match, can be used by a known video player to skip identified adverts. Alternatively, such information may be used to disable the fast forward mechanism of a media player at particular segments of a stream so as not to allow adverts to be skipped.
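By way of a hypothetical usage example, and using the sliding-window sketch given earlier, the positions and lengths of matching adverts in the programme fingerprint might be collected as follows; the mapping of advert fingerprints, the function names and the zero distance threshold are all assumptions made for illustration.

```python
def locate_adverts(programme_fp, advert_fingerprints, threshold=0):
    """Find previously characterised adverts within a programme fingerprint.

    `advert_fingerprints` is assumed to be a mapping of advert names to
    fingerprints read from the external database 88; `threshold` is an
    assumed application-specific tolerance on the squared-difference
    distance, with 0 demanding an exact match.
    """
    matches = []
    for name, advert_fp in advert_fingerprints.items():
        offset, distance = find_best_match(advert_fp, programme_fp)
        if offset is not None and distance <= threshold:
            # the position in the stream and the length of the match could
            # be passed to a media player to skip (or protect) the segment
            matches.append((name, offset, len(advert_fp)))
    return matches
```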
A further application of the invention is the use of the program with large video repositories on the internet 86 such as YouTube® or Dailymotion®. Such repositories allow users to upload content, and the content is often described by the users with tags or other metadata. With popular content, several different users may upload the same video, meaning that identical content may appear multiple times on the same repository under different but similar names. When a user searches for a video, the search is performed on the user-inputted tags and may return many identical videos in the set of results. Consequently it may be difficult to get past a large amount of duplicated content to find other content relating to the search request, especially if it is necessary to play each video in a media player before knowing whether it is the same as a previously played video.
The invention is able to identify identical content, either by comparing the fingerprints 34 of the content, if they have already been characterised, or by determining the fingerprints 34 of the content returned by the search, such as by process 300, and comparing them as described above. When matching content is found the search may group the matching videos together, in a way analogous to the known URL grouping methods found on internet search engines, for example by grouping all identical content and only giving a hyperlink to the first example in each group, while giving the user the option to view all videos in a group if desired.
Furthermore, matches of content which are not identical but which contain segments or clips of other results in the results set may be identified and grouped. This can occur even if the clips have been edited, for the reasons stated below.
Another embodiment of the invention concerns its use in large video repositories on the internet 86, again such as YouTube® or Dailymotion®. Some users upload copyrighted material, or make videos that contain segments of copyrighted material, such as compilations of sporting clips. The invention is able to quickly search these large repositories for copyrighted material in a way analogous to that of identifying adverts in a video stream 30 as described with reference to
A further benefit of the invention is that it returns a fingerprint 34 that is robust to changes in the parameters of the stream such as resolution, colour, size of macroblock 12 etc. Therefore, even if the content has been altered or downgraded in quality, a match may still be found. Additionally, a match would still be found if a logo, digital watermark etc. has been added to the content. Furthermore, as the invention does not rely on the audio content of a video stream 30, a match may still be found for content with altered, or even entirely different, audio. The methods of fingerprinting a video stream in the prior art do not return match results when a stream has been altered, either by changes of parameters of the stream such as resolution, colour or encoding attributes, or by the inclusion of digital watermarks or logos. The fingerprint returned by the invention is robust to these changes, allowing for the identification of altered content. It can also be used in combination with known audio matching techniques.
Whilst the above embodiments have been described in the context of their application to a single video stream, it will be appreciated that the present invention may be used in a variety of different applications. Such a system may be implemented on a single desktop or portable computer to characterise video clips already stored thereon, or to characterise video streams downloaded or streamed from the internet. Furthermore, the invention may be implemented on a content server which contains video clips that may be accessed via, for example, the internet, a network of computers, etc.