Claims
- 1. A method for identifying a fingerprint for a data file, comprising:
receiving the fingerprint having at least one feature vector developed from the data file; determining a subset of reference fingerprints from a database of reference fingerprints having at least one feature vector developed from corresponding data files, the subset being a set of the reference fingerprints of which the fingerprint is likely to be a member and being based on the at least one feature vector of the fingerprint and the reference fingerprints; and determining if the fingerprint matches one of the reference fingerprints in the subset based on a comparison of the reference fingerprint feature vectors in the subset and the at least one feature vector of the fingerprint.
- 2. A method as recited in claim 1, wherein determining the subset of the reference fingerprints is an iterative process.
- 3. A method as recited in claim 1, wherein the iterative process of finding a subset includes determining a set of reference fingerprints of the plurality of fingerprints that are nearest neighbors of the fingerprint.
- 4. A method as recited in claim 3, wherein the nearest neighbors are determined using a hash index on the reference fingerprints.
- 5. A method as recited in claim 1, wherein the determining if there is a match includes determining whether the distance between any of the feature vectors of the reference fingerprints in the subset and the at least one feature vector of the fingerprint is within a predetermined match distance threshold.
- 6. A method as recited in claim 1, further comprising selecting a feature weight bank based on the similarity of the fingerprint and reference feature class vectors and wherein the selected feature weight bank is used in determining the subset of reference fingerprints.
- 7. A method as recited in claim 1, wherein the feature vectors of the fingerprint are based on a non-overlapping time frame sampling of the data file.
- 8. A method as recited in claim 1, further comprising storing the fingerprint for the data file upon determining that there is no match between the fingerprint and the reference fingerprints.
- 9. A method as recited in claim 1, further comprising, upon determining that the fingerprint matches one of the reference fingerprints, outputting a file identification for the corresponding file of the matched reference fingerprint.
- 10. A method as recited in claim 9, wherein the file identification for the corresponding file of the matched reference fingerprint is modified if a different confirmed identification exists for the corresponding file of the matched reference fingerprint.
- 11. A method as recited in claim 1, wherein the fingerprint is a concatenation type fingerprint.
- 12. A method as recited in claim 1, wherein the data file is an audio file.
- 13. A method of identifying a fingerprint for a data file, comprising:
receiving the fingerprint having a plurality of feature vectors sampled from a data file over a series of time; determining a subset of reference fingerprints from a database of reference fingerprints having a plurality of feature vectors sampled from their respective data files over a series of time, the subset being a set of reference fingerprints of which the fingerprint is likely to be a member and being based on the rarity of the feature vectors of the reference fingerprints; and determining if the fingerprint matches one of the reference fingerprints in the subset.
- 14. A method as recited in claim 13, wherein finding a subset of file fingerprints includes determining the rarest of the feature vectors of the file fingerprints.
- 15. A method as recited in claim 14, wherein the fingerprint is an aggregation type fingerprint.
- 16. A method as recited in claim 13, wherein determining the subset of the reference fingerprints is an iterative process.
- 17. A method as recited in claim 13, wherein the iterative process of finding a subset includes determining a set of reference fingerprints of the plurality of fingerprints that are nearest neighbors of the fingerprint.
- 18. A method as recited in claim 17, wherein the nearest neighbors are determined using a hash index on the reference fingerprints.
- 19. A method as recited in claim 13, wherein the determining if there is a match includes determining whether the distance between any of the feature vectors of the reference fingerprints in the subset and the at least one feature vector of the fingerprint is within a predetermined match distance threshold.
- 20. A method as recited in claim 13, further comprising selecting a feature weight bank based on the similarity of the fingerprint and reference feature class vectors and wherein the feature weight bank is used in determining the subset of reference fingerprints.
- 21. A method as recited in claim 13, wherein the feature vectors of the fingerprint are based on a non-overlapping time frame sampling of the data file.
- 22. A method as recited in claim 13, further comprising storing the fingerprint for the data file upon determining that there is no match between the fingerprint and the reference fingerprints.
- 23. A method as recited in claim 13, further comprising, upon determining that the fingerprint matches one of the reference fingerprints, outputting a file identification for the corresponding file of the matched reference fingerprint.
- 24. A method as recited in claim 23, wherein the file identification for the corresponding file of the matched reference fingerprint is modified if a different confirmed identification exists for the corresponding file of the matched reference fingerprint.
- 25. A method as recited in claim 13, wherein the data file is an audio file.
- 26. A method for updating a reference fingerprint database, comprising:
receiving a fingerprint for a data file; determining if the fingerprint matches one of a plurality of reference fingerprints; and upon the determining step revealing no match, updating the reference fingerprint database to include the fingerprint.
- 27. A method as recited in claim 26, wherein the data file is an audio file.
- 28. A method as recited in claim 26, wherein the fingerprint is generated from an audio portion of the data file.
- 29. A method of determining a fingerprint for a digital file, comprising:
receiving the digital file; accessing the digital file over time to generate a sampling; and determining at least one feature of the digital file based on the sampling, wherein the at least one feature includes at least one of:
a ratio of a mean of the absolute value of the sampling to a root-mean-square average of the sampling; spectral domain features of the sampling; a statistical summary of the normalized spectral domain features; Haar wavelets of the sampling; a zero crossing mean of the sampling; a beat tracking of the sampling; and a mean energy delta of the sampling.
- 30. A method as recited in claim 29, wherein the at least one feature includes a ratio of a mean of the absolute value of the sampling to a root-mean-square average of the sampling, spectral domain features of the sampling, a statistical summary of the normalized spectral domain features, and Haar wavelets of the sampling.
- 31. A method as recited in claim 29, wherein sampling includes generating time slices and determining the at least one feature includes determining at least one feature for each of the time slices.
- 32. A method as recited in claim 30, wherein sampling includes generating time slices and determining the at least one feature includes determining at least one feature for each of the time slices.
- 33. A method as recited in claim 29, wherein the data file is an audio file.
- 34. A method of identifying digital files, comprising:
accessing a digital file; determining a fingerprint for the digital file, the fingerprint representing at least one feature of the digital file; comparing the fingerprint to reference fingerprints, the reference fingerprints uniquely identifying a corresponding digital file having a corresponding unique identifier; and upon the comparing revealing a match between the fingerprint and one of the reference fingerprints, outputting the corresponding unique identifier for the corresponding digital file of the one of the reference fingerprints that matches the fingerprint.
- 35. A method as recited in claim 34, further comprising generating a unique identifier for the digital file upon the comparing revealing no match.
- 36. A method as recited in claim 35, wherein the digital file is an audio file.
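
For illustration only, the flow recited across claims 1, 5, 8, 29, 31 and 34-35 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the implementation described in the application: the names `slice_features`, `make_fingerprint`, `distance` and `ReferenceDatabase`, the Euclidean distance, the prefix-based coarse filter used to pick the candidate subset, the `file-N` identifier scheme, and all threshold values are hypothetical choices made for the example. Only two of the claimed features are computed (the mean-absolute-value to root-mean-square ratio and a zero crossing rate), each over non-overlapping time slices.

```python
import math


def slice_features(samples, slice_len):
    """Return one (mean|x|/RMS, zero-crossing rate) vector per non-overlapping slice."""
    vectors = []
    for start in range(0, len(samples) - slice_len + 1, slice_len):
        frame = samples[start:start + slice_len]
        mean_abs = sum(abs(x) for x in frame) / slice_len
        rms = math.sqrt(sum(x * x for x in frame) / slice_len)
        ratio = mean_abs / rms if rms > 0 else 0.0
        # Count sign changes between consecutive samples (zero counts as non-negative).
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        vectors.append((ratio, crossings / (slice_len - 1)))
    return vectors


def make_fingerprint(samples, slice_len):
    """Assumed layout: the per-slice feature vectors laid end to end in one tuple."""
    return tuple(v for vec in slice_features(samples, slice_len) for v in vec)


def distance(a, b):
    """Euclidean distance; assumes both sequences have the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ReferenceDatabase:
    """Hypothetical in-memory store mapping identifiers to reference fingerprints."""

    def __init__(self, match_threshold, candidate_count=5, prefix_len=2):
        self.match_threshold = match_threshold
        self.candidate_count = candidate_count
        self.prefix_len = prefix_len
        self.entries = {}
        self._next_id = 1

    def candidates(self, fingerprint):
        """Subset step: keep the references closest on the leading components only.

        This coarse filter is an assumption of the sketch; it is cheap but may
        exclude the true nearest reference.
        """
        key = fingerprint[:self.prefix_len]
        ranked = sorted(
            self.entries.items(),
            key=lambda item: distance(key, item[1][:self.prefix_len]),
        )
        return ranked[:self.candidate_count]

    def identify(self, fingerprint):
        """Return (identifier, matched); store the fingerprint when nothing matches."""
        best_id, best_dist = None, float("inf")
        for ident, reference in self.candidates(fingerprint):
            d = distance(fingerprint, reference)
            if d < best_dist:
                best_id, best_dist = ident, d
        if best_id is not None and best_dist <= self.match_threshold:
            return best_id, True
        new_id = f"file-{self._next_id}"
        self._next_id += 1
        self.entries[new_id] = fingerprint
        return new_id, False
```

In this sketch, calling `identify` with a fingerprint that lies within `match_threshold` of a stored reference returns that reference's identifier, while an unmatched fingerprint is added to the store under a newly issued identifier. A production system would replace the prefix filter with an indexed nearest-neighbor search, such as the hash index mentioned in claims 4 and 18.
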
CROSS-REFERENCE TO RELATED APPLICATION
[0001] The present application claims the benefit of U.S. provisional application 60/275,029 filed Mar. 13, 2001 and U.S. application Ser. No. 09/931,859 filed Aug. 20, 2001, both of which are hereby incorporated by reference.
PCT Information
| Filing Document | Filing Date | Country | Kind |
| --- | --- | --- | --- |
| PCT/US02/07528 | 3/13/2002 | WO | |