The present invention generally relates to the field of fingerprinting of digital media signals, such as audio, and more particularly to the generation of fingerprints when a part of the digital media signal includes digital silence.
It is known to provide fingerprints for media signals such as audio signals in order to identify a certain piece of music. A local computer then generates a fingerprint for an audio signal and sends this fingerprint as a query to a database. In the database the fingerprint is compared with other fingerprints and if a match is found, it is returned to the local computer, which then has received an identification of the audio signal.
Such fingerprinting is useful in many applications, for instance in radio stations for identifying play lists, but there is also a growing market for private persons wanting to buy music after having identified it, for instance on the radio.
One such fingerprinting scheme is described in “A Highly Robust Audio Fingerprinting System”, by Jaap Haitsma and Ton Kalker, ISMIR, October 2002, where fingerprints are made up of a number of sub-fingerprints. A sub-fingerprint is based on a part of the media signal. 256 consecutive sub-fingerprints, which we will refer to as the fingerprint or fingerprint block, are computed during a short time interval in order to provide a fast and safe identification of the media signal. A fingerprint can therefore be taken on, for example, the first three seconds of a media signal. A positive identification is made in a fingerprint database if the Hamming distance between the derived fingerprint and a fingerprint in the database is below a certain threshold.
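The matching rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the 32-bit sub-fingerprint width is taken from later in this description, and the threshold value is a placeholder, since the text only speaks of "a certain threshold".

```python
from typing import Sequence

SUB_FINGERPRINT_BITS = 32  # width stated later in this description


def hamming_distance(block_a: Sequence[int], block_b: Sequence[int]) -> int:
    """Count the differing bits between two equally long fingerprint blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("fingerprint blocks must have the same length")
    return sum(
        bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(block_a, block_b)
    )


def is_match(
    query: Sequence[int],
    candidate: Sequence[int],
    max_bit_error_rate: float = 0.35,  # assumed placeholder threshold
) -> bool:
    """Report a match when the Hamming distance is below the threshold."""
    total_bits = len(query) * SUB_FINGERPRINT_BITS
    return hamming_distance(query, candidate) < max_bit_error_rate * total_bits
```

Note that two blocks consisting only of identical sub-fingerprints have a Hamming distance of zero and therefore always match under such a rule, whatever threshold is chosen.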
A problem of the known fingerprinting schemes is that the media signal can often have parts that are made up of digital silence. An audio clip might for instance start with silence, where the PCM samples have a value of zero, and a video clip can start with a number of black frames. This means that sub-fingerprints made at the beginning, during this digital silence, will be identical and reflect that no information is present. Since many different media signals or files can have this digital silence at the beginning, it is possible that a query with a fingerprint made on the beginning would be found to wrongly correspond to several different stored media signals in the database.
It is thus an object of the present invention to provide fingerprinting where the effects of digital silence in a media signal are removed such that fingerprinting can be used with a diminished risk of identifying the wrong media signal.
According to a first aspect of the present invention, this object is achieved by a method of handling digital silence when fingerprinting a digital media signal comprising the steps of:
generating a fingerprint comprising a number of sub-fingerprints for at least a part of the digital media signal, and
removing or changing the influence of at least one piece of the media signal on the fingerprint, which piece corresponds to digital silence.
According to a second aspect of the present invention, this object is also achieved by a device for handling digital silence when fingerprinting digital media signals and comprising:
a fingerprint generating unit arranged to generate a fingerprint comprising a number of sub-fingerprints for at least parts of a digital media signal, and
a digital silence removal unit arranged to remove or change the influence of at least one piece of the media signal on the fingerprint, which piece corresponds to digital silence.
According to a third aspect of the present invention, this object is furthermore achieved by a system of devices for handling digital silence when fingerprinting digital media signals and comprising:
a server device having a database of fingerprints related to media signals stored as media files, and
a client device for generating fingerprint queries to the server device, wherein at least one of client and server device comprises:
a fingerprint generating unit arranged to generate a number of sub-fingerprints for at least parts of a digital media signal, and
a silence removal unit arranged to remove or change the influence of at least one piece of the media signal on the fingerprinting, which piece corresponds to digital silence.
According to a fourth aspect of the present invention, this object is also achieved by a computer program product for handling digital silence when fingerprinting digital media signals, to be used on a computer, comprising a computer readable medium having thereon:
computer program code means, to make the computer execute, when said program is loaded in the computer:
generate a number of sub-fingerprints for at least parts of a digital media signal, and
remove or change the influence of at least one piece of the media signal on the fingerprint, which piece corresponds to digital silence.
According to a fifth aspect of the present invention, this object is also achieved by a computer program element for handling digital silence when fingerprinting digital media signals, to be used on a computer, said computer program element comprising: computer program code means, to make the computer execute, when said program is loaded in the computer:
generate a number of sub-fingerprints for at least parts of a digital media signal, and
remove or change the influence of at least one piece of the media signal on the fingerprint, which piece corresponds to digital silence.
Claims 2 and 3 are directed towards removing the cause for digital silence.
Claim 4 is directed towards adding random values to the whole media signal.
Claims 5 and 16 are directed towards providing random values for changing the influence of digital silence.
Claims 6 and 17 are directed towards replacing sub-fingerprints representing digital silence with random values.
Claims 7 and 18 are directed towards replacing samples of the media signal representing digital silence with random values.
Claim 8 is directed towards providing different types of random number generations in a client and a server device.
Claims 10 and 19 are directed towards processing the random number with time and date information related to the generation of a fingerprint for lowering the probability of false identifications of media signals.
The present invention has the advantage of reliably avoiding a wrong identification of media signals in which digital silence is included. It is also easy to implement, since it only requires some of the functionalities already provided in a computer. In a variation of the invention it also makes it highly unlikely that the generated random numbers give rise to false identifications.
The general idea behind the invention is thus to remove digital silence related to media signals or to replace it with random values when generating fingerprints for the media signal.
The expression digital silence is intended to comprise digital audio signals where the information in the signal represents no sound or sound below a certain low threshold where different valued sub-fingerprints are not possible to generate as well as digital video information where the information in the frames represents black or is below a certain threshold in which no images are discernible.
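For the audio case, the definition above can be illustrated with a minimal check on a block of PCM samples. The function name and the default threshold of zero are assumptions made for this sketch; the text does not fix a threshold value.

```python
from typing import Sequence


def is_digital_silence(samples: Sequence[int], threshold: int = 0) -> bool:
    """Return True if the block is non-empty and no sample magnitude
    exceeds the threshold (hypothetical helper, threshold is assumed)."""
    return len(samples) > 0 and all(abs(s) <= threshold for s in samples)
```

With the default threshold only samples that are exactly zero count as silence. A small positive threshold would also cover sound that is too weak for different valued sub-fingerprints to be generated.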
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.
The present invention will now be explained in more detail in relation to the enclosed drawings, where
The present invention relates to the field of providing fingerprints for digital media signals and will in the following be described in relation to fingerprinting of audio signals. It is however not limited to audio but can be applied for other media signals like for instance video.
One difference between the fingerprints in client and server is that the database includes fingerprints for whole audio signals, whereas a client normally only generates one or a few fingerprints for an audio signal. The functioning of the device shown in
In the device described above, a piece of audio can be identified quickly based on a fingerprint corresponding to approximately 3 seconds and containing 256 sub-fingerprints. This can, however, lead to some problems, which this invention solves. Many audio signals or clips start with silence, which can be a few seconds long, so that a fingerprint taken at the beginning actually represents silence. Several different audio signals that all start with silence can therefore be found to correspond to the audio file for which a fingerprint is taken. There is thus a need for taking care of this silence. In the case of video, this would correspond to a number of black frames at the beginning.
A device for handling digital silence 30 according to the invention is shown in a block schematic in
The functioning of the units in
The device 30 can as an alternative be provided on the input side of the client device, i.e. before sub-fingerprints are generated. In this case the control unit 32 will be connected to a register where the actual audio signal is temporarily stored before being subject to fingerprinting. A method according to an alternative embodiment of the invention will now be described with reference being made to
There are some other possible variations to the above-described scheme. One variation of the alternative embodiment of the invention is to add a small amount of random noise to all samples of the audio signal before a fingerprint is generated, i.e. also to the samples not corresponding to silence. It is furthermore possible to remove the digital silence from the digital samples before fingerprinting is performed, or to remove the sub-fingerprints which correspond to digital silence instead of replacing them with random numbers. When this is done it is however not guaranteed that subsequent sub-fingerprints are spaced 11.8 ms apart. There is then also a risk that low-amplitude noise, which can be added to a radio broadcast audio signal instead of silence, will be a part of the fingerprint sent to a database. If the database has the corresponding silence removed, this will lead to a less than optimal match.
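The first variation, adding a small amount of random noise to every sample, could look roughly like the sketch below. The function name, the noise amplitude and the assumption of 16-bit signed PCM samples are not taken from the text and serve only as an illustration.

```python
import random
from typing import List, Optional, Sequence

PCM16_MIN, PCM16_MAX = -32768, 32767  # assumed 16-bit signed PCM range


def add_low_amplitude_noise(
    samples: Sequence[int], amplitude: int = 2, seed: Optional[int] = None
) -> List[int]:
    """Add uniform noise in [-amplitude, amplitude] to every sample,
    clamping the result to the assumed 16-bit PCM range."""
    rng = random.Random(seed)
    noisy = []
    for sample in samples:
        value = sample + rng.randint(-amplitude, amplitude)
        noisy.append(max(PCM16_MIN, min(PCM16_MAX, value)))
    return noisy
```

Because the noise is applied to all samples, no separate silence detection step is needed in this variation.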
The unit in
The sub-fingerprints generated are 32 bits long, and a sub-fingerprint corresponding to silence then has the hexadecimal value 0x00000000. It is convenient to use a standard linear congruential random number generator for generating 32-bit random words to use for replacing the zero sub-fingerprints. The random number generator is initialised with a random number X0. Subsequent random numbers are obtained according to equation (1) below.
$$X_{N+1} = (1664525 \cdot X_N + 1013904223) \bmod 2^{32} \qquad (1)$$
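A minimal sketch of equation (1) and of the replacement of zero sub-fingerprints is given below. The function names are hypothetical, and the seed argument stands for the random initial value X0, which the caller has to supply.

```python
from typing import Iterator, List, Sequence

MASK_32 = 0xFFFFFFFF  # masking with 2**32 - 1 is the same as taking mod 2**32


def lcg_words(x0: int) -> Iterator[int]:
    """Yield X1, X2, ... according to equation (1), starting from X0."""
    x = x0 & MASK_32
    while True:
        x = (1664525 * x + 1013904223) & MASK_32
        yield x


def replace_silent_sub_fingerprints(
    sub_fingerprints: Sequence[int], x0: int
) -> List[int]:
    """Replace every 0x00000000 sub-fingerprint with the next random word."""
    words = lcg_words(x0)
    return [next(words) if fp == 0 else fp for fp in sub_fingerprints]
```

Only sub-fingerprints equal to 0x00000000 are touched, so the parts of the fingerprint that carry real signal information are left unchanged.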
There is however a problem with the use of this method in case both the client and the server have fingerprints where this same type of random number generator has been used. Since the only real random number is the first number and all subsequent random numbers are computed in a known way from this first random number, there is a risk that both devices will end up with the same random numbers for digital silence. This could lead to a matching of the fingerprint in the database based on the sequence of “random” sub-fingerprints for silence. If the database has about 1 million songs this risk is at least 1/4000 or 0.025%. In fact the risk is even higher than this because of the risk of matching between sub-fingerprints in a query and in the database that are provided in different positions in the fingerprint.
One way to solve this problem is to have different random number generating schemes for client and server. This would lead to different implementations of database and fingerprint query generation in server and client. Another solution to this problem will be described in relation to
The probability for these values to correspond to digital silence in both the client and the server is therefore reduced significantly.
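As a rough illustration of processing a random word with time and date information related to the generation of a fingerprint, the sketch below combines the two with an exclusive OR. The XOR combination, the use of whole seconds and the function name are assumptions made for this example only and are not taken from the text.

```python
def mix_with_timestamp(random_word: int, timestamp_seconds: int) -> int:
    """Combine a 32-bit random word with a timestamp (assumed XOR scheme)."""
    stamp = timestamp_seconds & 0xFFFFFFFF  # keep the low 32 bits
    return (random_word ^ stamp) & 0xFFFFFFFF
```

Under this assumed scheme, two devices that happen to produce the same random word at different times still end up with different replacement values.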
One variation of this latter unit is shown in
The present invention is preferably provided with one or more processors with associated program memory in which the program code for performing the method according to the invention is stored. The program code can also be provided in the form of a data carrier, like a CD Rom disk 96 as is shown in
The present invention has several advantages. It reliably avoids the wrong identification of media signals in which digital silence is included. It is also easy to implement since it uses some of the functionality already provided in a computer. In a variation of the invention it also makes it highly unlikely that the generated random numbers give rise to false identifications.
The present invention has been described in relation to computers in a computer system. However, it is not limited to this, but can be implemented in other types of environments, for instance in a mobile phone communicating with a server via a cellular network. A mobile phone can also be made to communicate with a computer that is a client device connecting to a server including the above-mentioned database. The invention is furthermore not limited to the described fingerprinting scheme, but can be implemented in any fingerprinting scheme that has to be capable of handling digital silence. The invention was described in relation to PCM samples. It should be realised that it is also applicable when different types of compression and coding are used, like MP3 coding, as well as for other types of media signals like video. Therefore the present invention is only to be limited by the following claims.
In summary, the invention relates to a method, a device, a client-server system as well as a computer program product and computer program element for handling digital silence when fingerprinting digital media signals. A fingerprint comprising a number of sub-fingerprints for at least a part of the digital media signal is generated (step 42), and the influence of at least one piece of the media signal on the fingerprint is removed or changed (step 48), which piece corresponds to digital silence. The invention reliably avoids a wrong identification of media signals, such as audio signals, where digital silence is included. The invention is also easy to implement, since it only requires some of the functionalities already provided in a computer.
Number | Date | Country | Kind |
---|---|---|---|
03100461.7- | Feb 2003 | EP | regional |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB04/50120 | 2/18/2004 | WO | | 8/18/2005 |