Method, medium, and system for music retrieval using modulation spectrum

Abstract
An audio information retrieval method, medium, and system that can rapidly retrieve audio information, even in noisy environments, by extracting a modulation spectrum that is robust against noise, converting features of the extracted modulation spectrum into hash bits, and using a hash table. The audio information retrieval method may include extracting a modulation spectrum from audio data of a compressed domain, converting the extracted modulation spectrum into fingerprint bits, arranging the fingerprint bits in a form of a hash table, converting a received query into an address by a hash function corresponding to the query, and retrieving the audio information by referring to the hash table.
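The retrieval flow summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: fingerprints are assumed to be plain bit strings, the 8-bit segment length is arbitrary, and the names `segments`, `build_hash_table`, `bit_error_ratio`, and `retrieve` are hypothetical. Keying the table by segment position assumes the query is already aligned with the stored fingerprints.

```python
from collections import defaultdict

SEGMENT_BITS = 8  # assumed segment length; the source does not fix a value


def segments(fingerprint, size=SEGMENT_BITS):
    """Split a bit string into equal-length segments (a trailing remainder is dropped)."""
    return [fingerprint[i:i + size]
            for i in range(0, len(fingerprint) - size + 1, size)]


def build_hash_table(database):
    """Map (segment position, segment bits) to the ids of clips containing that segment."""
    table = defaultdict(set)
    for clip_id, fingerprint in database.items():
        for pos, seg in enumerate(segments(fingerprint)):
            table[(pos, seg)].add(clip_id)
    return table


def bit_error_ratio(a, b):
    """Fraction of differing bits over the common length of two bit strings."""
    n = min(len(a), len(b))
    if n == 0:
        return 1.0
    return sum(x != y for x, y in zip(a, b)) / n


def retrieve(query, database, table):
    """Collect candidates sharing at least one segment, then keep the lowest-BER clip."""
    candidates = set()
    for pos, seg in enumerate(segments(query)):
        candidates |= table.get((pos, seg), set())
    if not candidates:
        return None
    return min(candidates, key=lambda cid: bit_error_ratio(query, database[cid]))


# Toy database: two 16-bit fingerprints (made-up values for illustration only).
database = {"clip_a": "1010101011110000", "clip_b": "0000111100001111"}
table = build_hash_table(database)
noisy_query = "1010101011110001"  # clip_a with its last bit flipped
best = retrieve(noisy_query, database, table)
```

Only clips that share at least one exact segment with the query reach the bit-error comparison, which is what keeps the lookup fast; a query whose every segment is corrupted would return no candidate in this simplified form.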
Description

BRIEF DESCRIPTION OF THE DRAWINGS

These and/or other aspects and advantages of the invention will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:



FIG. 1 illustrates a music information retrieval system, according to an embodiment of the present invention;



FIG. 2 illustrates a music information retrieval system, according to another embodiment of the present invention;



FIG. 3 illustrates an example of MDCT-MS features in a music information retrieval system, according to an embodiment of the present invention;



FIG. 4 illustrates a music information retrieval method, according to an embodiment of the present invention;



FIG. 5 illustrates a music information retrieval method, according to another embodiment of the present invention; and



FIG. 6 illustrates a music information retrieval method, according to still another embodiment of the present invention.


Claims
  • 1. An audio information storage method, comprising: using a database and hash table generated by extracting a modulation spectrum from audio data, in a compressed domain of the audio data; converting the extracted modulation spectrum into fingerprint bits for each of the audio data, and arranging the fingerprint bits in a form of the hash table; and converting a received query into an address, using a hash function, corresponding to the query, and retrieving audio information from the database by using the address to refer to the hash table.
  • 2. An audio information storage method, comprising: generating a Modified Discrete Cosine Transformation-Modulation Spectrum (MDCT-MS) fingerprint database from audio data in corresponding compressed domains; generating a hash table by dividing each MDCT-MS fingerprint in the MDCT-MS fingerprint database into segments; extracting an MDCT-MS fingerprint from an audio clip; and dividing the extracted MDCT-MS fingerprint from the audio clip into segments and utilizing the audio clip segments as a hash value for referring to the MDCT-MS fingerprint database to retrieve a stored clip that matches the audio clip.
  • 3. The method of claim 2, further comprising calculating Bit Error Ratio (BER) values between the audio clip and indexed clips of the database, and comparing the calculated BER values to determine one of the indexed clips having a lowest BER value as a final result of the retrieving of the stored clip identical to the audio clip.
  • 4. The method of claim 2, wherein the generating of the hash table comprises: dividing each MDCT-MS fingerprint into a plurality of segments, each segment having an identical length; and generating the hash table by using the divided segments as the hash value.
  • 5. The method of claim 2, wherein the hash table corresponds to each segment of the MDCT-MS fingerprints.
  • 6. The method of claim 2, further comprising: acquiring unreliable bits with respect to the MDCT-MS fingerprints by ranking deviation values of neighboring frames of a corresponding MDCT-MS.
  • 7. The method of claim 6, wherein the acquiring of the unreliable bits comprises acquiring the unreliable bits with respect to a corresponding MDCT-MS fingerprint by setting a predetermined threshold with respect to the deviation values of the neighboring frames of the corresponding MDCT-MS.
  • 8. An audio information storage method, for retrieving audio information from a database by referring to a hash table, based upon a received query converted into an address by a hash function, the method comprising: extracting a corresponding modulation spectrum from audio data in corresponding compressed domains; converting the corresponding extracted modulation spectrum into fingerprint bits; and arranging the fingerprint bits in a form of the hash table for the retrieval of the audio data from the database based upon the address generated by the hash function.
  • 9. An audio information storage method, comprising: generating an MDCT-MS fingerprint database from audio data in corresponding compressed domains; generating a hash table for the generated MDCT-MS fingerprint database based on corresponding unreliable-bits-toggled MDCT-MS fingerprints; extracting an MDCT-MS fingerprint from an audio clip while calculating a hash value of the audio clip based on the unreliable-bits-toggled MDCT-MS fingerprints; and referring to the MDCT-MS database to retrieve a clip that matches the audio clip based on the hash value of the audio clip.
  • 10. The method of claim 9, further comprising calculating BER values between the audio clip and indexed clips and comparing the calculated BER values to determine one of the indexed clips having a lowest BER value as a final result of the retrieving of the clip matching the audio clip.
  • 11. An audio information storage method, comprising: generating an MDCT-MS fingerprint database from audio data in corresponding compressed domains; generating a hash table for the generated MDCT-MS fingerprint database by using corresponding peak points as a corresponding hash value; calculating a hash value, based on peak points, of an audio clip and extracting an MDCT-MS fingerprint of the audio clip; and referring to the MDCT-MS database to retrieve a clip that matches the audio clip, from clips that are maintained in the MDCT-MS fingerprint database, based on the calculated hash value of the audio clip.
  • 12. The method of claim 11, further comprising calculating BER values between the audio clip and indexed clips and comparing the calculated BER values to determine at least one of the indexed clips having a lowest BER value as a final result of the retrieving of the clip matching the audio clip.
  • 13. The method of claim 11, wherein the corresponding hash value utilizes a corresponding first peak point and second peak point of the corresponding MDCT-MS.
  • 14. The method of claim 13, wherein the corresponding hash value utilizes a distance between the corresponding first peak point and second peak point of the corresponding MDCT-MS.
  • 15. The method of claim 11, wherein the generating of the hash table further comprises generating the hash table by simultaneously utilizing information on a corresponding first peak point and second peak point of the corresponding MDCT-MS.
  • 16. The method of claim 11, wherein the retrieving of the audio clip further comprises retrieving the matching clip from the MDCT-MS fingerprint database based on peak point information of the audio clip.
  • 17. The method of claim 11, further comprising: generating bits bias tolerance with respect to a corresponding first peak point and second peak point of the corresponding MDCT-MS.
  • 18. At least one medium comprising computer readable code to implement the audio information storage method of claim 1.
  • 19. At least one medium comprising computer readable code to implement the audio information storage method of claim 2.
  • 20. At least one medium comprising computer readable code to implement the audio information storage method of claim 8.
  • 21. At least one medium comprising computer readable code to implement the audio information storage method of claim 9.
  • 22. At least one medium comprising computer readable code to implement the audio information storage method of claim 11.
  • 23. An audio information storage system, comprising: an audio fingerprint generation unit to extract an MDCT-MS from audio data in a compressed domain and to generate an audio fingerprint of the audio data; and an audio data retrieval unit to refer to a database to retrieve retrieval audio data corresponding to the generated audio fingerprint.
  • 24. The system of claim 23, wherein the audio fingerprint generation unit comprises: an MDCT coefficient extraction unit to extract MDCT coefficients from the audio data in the compressed domain by partially decoding the audio data; an MDCT coefficient selection unit to select an MDCT coefficient, existing in a frequency domain not affected by noise, from the extracted MDCT coefficients; a modulation spectrum generation unit to perform a Discrete Fourier Transform (DFT) with respect to the selected MDCT coefficient and to generate an MDCT modulation spectrum (MDCT-MS) of the audio data; and a bit unit to quantize features of the generated MDCT-MS according to a bit derivation method.
  • 25. The system of claim 23, wherein the bit unit ranks absolute values according to the bit derivation method, selects unreliable bits from quantized bits, and quantizes the selected unreliable bits to ‘0’ and ‘1’ from ‘1’ and ‘0’, respectively.
  • 26. The system of claim 24, further comprising: a peak point extraction unit to extract peak points from the MDCT-MS features.
  • 27. The system of claim 23, wherein the audio data retrieval unit comprises: a hash retrieval unit to generate a hash value from the generated audio fingerprint and to retrieve at least one candidate audio fingerprint from the database which matches the generated hash value by referring to a hash table; a fingerprint retrieval unit to compare the at least one retrieved candidate audio fingerprint and the generated audio fingerprint and to retrieve one of the at least one candidate audio fingerprint that has a bit error rate smaller than a predetermined reference value; an information storage unit to store audio data information, each comprising corresponding candidate audio fingerprints; and an information providing unit to provide a user with audio data information corresponding to the one of the at least one candidate audio fingerprint.
  • 28. The system of claim 27, wherein the hash retrieval unit comprises: a hash value generation unit to extract an indexing bit from the generated audio fingerprint and to generate a hash value by a hash function; a hash table storing hash values corresponding to addresses referring to each candidate audio fingerprint in the database and an address referring to each corresponding audio data information; and a table retrieval unit to retrieve the one of the at least one candidate audio fingerprint which matches the generated hash value from the hash table.
  • 29. The system of claim 27, wherein the fingerprint retrieval unit comprises: an audio fingerprint storage unit to convert the audio data into the generated audio fingerprint and to store the generated audio fingerprint; a BER calculation unit to calculate a BER value of the at least one candidate audio fingerprint and the generated audio fingerprint; a comparison unit to compare a predetermined threshold and the calculated BER value; an audio fingerprint detection unit to detect the one of the at least one candidate audio fingerprint as having a BER value smaller than the threshold; and a threshold adjustment unit to adjust the threshold according to a result of the detection of the one of the at least one candidate audio fingerprint.
  • 30. The system of claim 29, wherein the threshold adjustment unit adjusts the threshold until only a single candidate audio fingerprint, of the at least one candidate audio fingerprints, is detected from the audio fingerprint detection unit.
  • 31. An audio information storage system, to be referred to for retrieval of a stored audio data, corresponding to a query audio data input, using a hash function, comprising: an MDCT coefficient extraction unit to extract corresponding MDCT coefficients from audio data in corresponding compressed domains by partially decoding the audio data; an MDCT coefficient selection unit to select a corresponding MDCT coefficient, existing in a frequency domain not affected by noise, from the extracted corresponding MDCT coefficients; a modulation spectrum generation unit to perform a Discrete Fourier Transform (DFT) with respect to the selected corresponding MDCT coefficient and to generate a corresponding MDCT modulation spectrum (MDCT-MS) of the audio data; a bit unit to quantize features of the generated corresponding MDCT-MS according to a bit derivation method; and a storage to store a plurality of generated audio fingerprints in a database and/or to store a hash table corresponding to the plurality of generated audio fingerprints, based on results of the MDCT coefficient extraction unit, MDCT coefficient selection unit, modulation spectrum generation unit, and bit unit.
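
The fingerprint generation recited in claims 24 and 25, together with the unreliable-bit handling of claims 6 and 9, can be loosely sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the MDCT coefficients of one selected sub-band are assumed to be already available as a list of numbers over time, the bit derivation rule (a bit is 1 where a feature grew relative to the neighboring frame) is an assumption because the claims do not spell it out, and all function names are hypothetical.

```python
import cmath


def modulation_spectrum(series):
    """Magnitudes of a naive DFT taken over time on one sub-band's |MDCT| values."""
    n = len(series)
    mags = [abs(x) for x in series]
    return [
        abs(sum(mags[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def derive_bits(prev_features, curr_features):
    """Assumed rule: bit is 1 where the feature grew versus the neighboring frame."""
    diffs = [c - p for p, c in zip(prev_features, curr_features)]
    bits = [1 if d > 0 else 0 for d in diffs]
    return bits, diffs


def unreliable_positions(diffs, count):
    """Rank deviations by absolute value; the smallest are most likely to flip under noise."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return sorted(order[:count])


def toggle(bits, positions):
    """Return a copy of the bits with the given positions flipped (0 to 1, 1 to 0)."""
    flipped = list(bits)
    for i in positions:
        flipped[i] ^= 1
    return flipped


# Toy input: made-up coefficients and feature values for illustration only.
ms = modulation_spectrum([1.0, -1.0, 1.0, -1.0])
bits, diffs = derive_bits([1.0, 2.0, 3.0], [1.5, 1.9, 5.0])
weak = unreliable_positions(diffs, 1)
variant = toggle(bits, weak)
```

Generating toggled variants for the weakest bits lets a lookup probe several nearby hash values, which is one plausible reading of the "unreliable-bits-toggled" fingerprints in claim 9.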
Priority Claims (1)
Number           Date      Country  Kind
10-2006-0013125  Feb 2006  KR       national