Claims
- 1. A method for automatically classifying consonance of audio data, comprising:
applying audio data to a peak detection process; detecting the location of at least one prominent peak represented by the audio data in the frequency spectrum and determining the energy of the at least one prominent peak; storing the location of the at least one prominent peak and the energy of the at least one prominent peak into at least one output matrix; applying the data stored in said at least one output matrix to critical band masking filtering; applying the data stored in said at least one output matrix to a peak continuation process; and applying the data stored in said at least one output matrix to an intervals calculation process wherein the frequencies of the ratios between peaks are stored in an output vector for the audio data being classified.
- 2. A method according to claim 1, wherein the audio data is divided into frames, and the method is performed frame by frame.
- 3. A method according to claim 2, wherein the frame by frame approach includes bin differencing to calculate frame derivatives to facilitate the detection of peaks.
- 4. A method according to claim 2, wherein the number of peaks detected in said application of the peak detection process is limited by a pre-defined parameter.
- 5. A method according to claim 1, further comprising performing Nth order interpolation on the location of the at least one prominent peak and the energy of the at least one prominent peak to increase precision of the location and energy values for the peak.
- 6. A method according to claim 1, further comprising applying the output vector to a classification stage which determines at least one of (1) at least one consonance value and (2) at least one consonance class that describes the audio data.
- 7. A method according to claim 1, wherein the frequencies of the ratios between peaks are stored in an output vector that is 1×24.
- 8. A method according to claim 2, wherein the peak continuation process keeps track of peaks that last more than a predetermined number of frames.
- 9. A method according to claim 8, wherein the peak continuation process fills in a peak when the peak is missed in a previous frame.
- 10. A method according to claim 1, wherein said critical band masking filtering removes a peak that is masked by surrounding peaks with more energy.
- 11. A method according to claim 10, wherein said critical band masking filtering removes a peak when at least one of a lower frequency peak and a higher frequency peak has greater energy.
- 12. A method according to claim 10, wherein said critical band masking filters are scalable so that the amount of masking is scalable.
- 13. A method according to claim 1, wherein said storing includes providing an output of the peak detection and interpolation stage in two matrices, one holding the location of the at least one prominent peak, and the second holding the respective energy of the at least one prominent peak.
- 14. A method according to claim 1, wherein the audio data is formatted according to pulse code modulated format.
- 15. A method according to claim 14, wherein the audio data is initially in a format other than pulse code modulated format, and the method further comprises converting the audio data from the other format to pulse code modulated format.
- 16. A method according to claim 1, further comprising converting the input audio data from the time domain to the frequency domain.
- 17. A method according to claim 16, wherein said converting of the input audio data from the time domain to the frequency domain includes performing a fast Fourier transform on the audio data.
- 18. A computer readable medium bearing computer executable instructions for carrying out the method of claim 1.
- 19. A modulated data signal carrying computer executable instructions for performing the method of claim 1.
- 20. At least one computing device comprising means for performing the method of claim 1.
- 21. A method of classifying data according to consonance properties of the data, comprising:
assigning each media entity of a plurality of media entities in a data set to at least one consonance class; processing each media entity of said data set to extract at least one consonance characteristic based on digital signal processing of each media entity; generating a plurality of consonance vectors for said plurality of media entities, wherein each consonance vector includes said at least one consonance class and at least one consonance characteristic based on digital signal processing; and forming a classification chain based upon said plurality of consonance vectors.
- 22. A method according to claim 21, further comprising:
processing an unclassified media entity to extract at least one consonance characteristic based on digital signal processing of the unclassified media entity; generating a vector for the unclassified media entity including said at least one digital signal processing consonance characteristic; presenting the vector for the unclassified media entity to the classification chain; and classifying the unclassified entry with an estimate of the consonance class by calculating the representative consonance class of the subset of the plurality of vectors of the classification chain located in the neighborhood of the vector for the unclassified entity.
- 23. A method according to claim 22, further including calculating a neighborhood distance that defines a distance within which two vectors in the classification chain space are in the same neighborhood for purposes of being in the same consonance class.
- 24. A method according to claim 22, wherein said classifying of the unclassified entry includes classifying the unclassified entry with a median consonance class represented by the neighborhood.
- 25. A method according to claim 22, wherein said consonance class is described by a numerical value and said classifying of the unclassified entry includes classifying the unclassified entry with a mean of numerical consonance values found in the neighborhood.
- 26. A method according to claim 22, wherein said classifying includes returning at least one number indicating the level of confidence of the consonance class estimate.
- 27. A computer readable medium bearing computer executable instructions for carrying out the method of claim 21.
- 28. A modulated data signal carrying computer executable instructions for performing the method of claim 21.
- 29. At least one computing device comprising means for performing the method of claim 21.
- 30. A computing system, comprising:
a computing device including:
a classification chain data structure stored thereon having a plurality of classification vectors, wherein each vector includes data representative of a consonance class as classified by humans and consonance characteristics as determined by digital signal processing; and processing means for comparing an unclassified media entity to the classification chain data structure to determine an estimate of the consonance class of the unclassified media entity.
- 31. A computing system according to claim 30, wherein said determining of an estimate of the consonance class includes returning at least one number indicating the level of confidence of the consonance class assignment.
- 32. A computing system according to claim 31, wherein the performance level of the classification chain improves over time due to the examination of unclassified media entities that have a low confidence level associated with the consonance class assignment.
- 33. A classification chain data structure utilized in connection with the classification of consonance of new unclassified media entities, comprising:
a plurality of classification vectors, wherein each vector includes:
consonance data as classified by humans; and consonance data determined by digital signal processing techniques.
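Claims 2, 14, 16 and 17 describe dividing PCM audio into frames and moving each frame into the frequency domain with a fast Fourier transform. The sketch below illustrates that step under stated assumptions: the frame size, hop size and Hann window are illustrative choices that the claims do not specify, and `fft` and `frame_spectra` are hypothetical helper names. A textbook recursive radix-2 Cooley-Tukey FFT is used so the example needs only the Python standard library.

```python
import cmath
import math
from typing import List


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def frame_spectra(pcm: List[float], frame_size: int = 1024, hop: int = 512) -> List[List[float]]:
    """Split PCM samples into frames and return each frame's magnitude spectrum.

    Assumption: frame size, hop and the Hann window are illustrative choices.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size) for i in range(frame_size)]
    spectra = []
    for start in range(0, len(pcm) - frame_size + 1, hop):
        frame = [pcm[start + i] * window[i] for i in range(frame_size)]
        bins = fft([complex(s) for s in frame])
        # Keep only the non-negative frequency bins (0 .. Nyquist).
        spectra.append([abs(b) for b in bins[: frame_size // 2 + 1]])
    return spectra
```

For example, a 1000 Hz sine sampled at 8000 Hz and cut into 256-sample frames peaks at bin 32 (1000 × 256 / 8000) in every frame.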
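Claims 1, 3, 4, 5 and 13 describe detecting prominent peaks in each frame, limiting how many are kept, refining their location and energy by interpolation, and returning locations and energies as two separate outputs. A minimal sketch, assuming that "bin differencing" means first differences between adjacent frequency bins, that second-order (parabolic) interpolation stands in for the general Nth-order case, and that a peak's "energy" is its interpolated magnitude; `detect_peaks` and `max_peaks` are hypothetical names.

```python
from typing import List, Tuple


def detect_peaks(mag: List[float], max_peaks: int = 10) -> Tuple[List[float], List[float]]:
    """Return (locations, energies) of the strongest spectral peaks in one frame.

    Assumption: a peak is a bin where the first difference between adjacent
    bins changes sign from positive to non-positive; each peak is refined by
    fitting a parabola through the peak bin and its two neighbours.
    """
    diff = [mag[i + 1] - mag[i] for i in range(len(mag) - 1)]
    candidates = []
    for i in range(1, len(diff)):
        if diff[i - 1] > 0 and diff[i] <= 0:  # bin i is a local maximum
            a, b, c = mag[i - 1], mag[i], mag[i + 1]
            denom = a - 2 * b + c
            offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
            height = b - 0.25 * (a - c) * offset  # interpolated peak magnitude
            candidates.append((height, i + offset))
    # Keep only the max_peaks most energetic peaks, then order them by frequency.
    strongest = sorted(candidates, reverse=True)[:max_peaks]
    strongest.sort(key=lambda p: p[1])
    locations = [loc for _, loc in strongest]  # fractional bin positions
    energies = [h for h, _ in strongest]
    return locations, energies
```

Returning two parallel lists mirrors the two-matrix output of claim 13, one holding locations and the other holding energies.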
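Claims 10 through 12 describe removing a peak that is masked by a stronger lower- or higher-frequency neighbour, with a scalable amount of masking. A minimal sketch, assuming the critical band is measured on the Bark scale using the Zwicker and Terhardt approximation and that the scalable amount of masking is a band width in Bark; `hz_to_bark`, `critical_band_mask` and `width` are hypothetical names, and the claims do not say which critical-band model is used.

```python
import math
from typing import List, Tuple


def hz_to_bark(f: float) -> float:
    """Zwicker & Terhardt approximation of the Bark critical-band scale."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def critical_band_mask(
    freqs: List[float], energies: List[float], width: float = 1.0
) -> Tuple[List[float], List[float]]:
    """Drop every peak that has a stronger neighbour within `width` Bark.

    Assumption: `width` is the scalable masking parameter; a larger width
    removes more peaks.
    """
    barks = [hz_to_bark(f) for f in freqs]
    kept_f, kept_e = [], []
    for i, (z, e) in enumerate(zip(barks, energies)):
        # A peak is masked if any lower or higher peak in its band is stronger.
        masked = any(
            j != i and abs(barks[j] - z) <= width and energies[j] > e
            for j in range(len(freqs))
        )
        if not masked:
            kept_f.append(freqs[i])
            kept_e.append(e)
    return kept_f, kept_e
```

With the default width of 1 Bark a weak 460 Hz peak next to a strong 440 Hz peak is removed, while a width of 0.1 Bark keeps both.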
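Claims 8 and 9 describe a peak continuation process that keeps peaks lasting more than a predetermined number of frames and fills in a peak that was missed in a frame. A minimal sketch, assuming a peak continues a track when its frequency lies within a fixed tolerance in Hz of the track's last frequency, that at most one missed frame is bridged, and that the missing value is filled with the midpoint of its neighbours; `continue_peaks`, `tol` and `min_frames` are hypothetical names.

```python
from typing import List, Optional, Tuple

Track = List[Tuple[int, float]]  # (frame index, peak frequency in Hz)


def continue_peaks(frames: List[List[float]], tol: float = 20.0, min_frames: int = 3) -> List[Track]:
    """Link per-frame peak frequencies into tracks across frames.

    Assumption: a peak joins the closest track within `tol` Hz that was seen
    in the previous frame or, after one missed frame, the frame before that.
    Only tracks lasting more than `min_frames` frames are returned.
    """
    tracks: List[Track] = []
    for t, peaks in enumerate(frames):
        extended = set()  # tracks already continued in this frame
        for f in sorted(peaks):
            best: Optional[int] = None
            best_dist = 0.0
            for k, track in enumerate(tracks):
                last_t, last_f = track[-1]
                if k in extended or (t - last_t) not in (1, 2):
                    continue  # already used this frame, or track has gone stale
                dist = abs(last_f - f)
                if dist <= tol and (best is None or dist < best_dist):
                    best, best_dist = k, dist
            if best is None:
                tracks.append([(t, f)])  # start a new track
                continue
            last_t, last_f = tracks[best][-1]
            if t - last_t == 2:  # peak was missed in frame t-1: fill it in
                tracks[best].append((t - 1, (last_f + f) / 2.0))
            tracks[best].append((t, f))
            extended.add(best)
    return [tr for tr in tracks if len(tr) > min_frames]
```

Short-lived peaks, such as a component present in only one frame, form tracks that are discarded by the final length filter.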
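Claims 1 and 7 describe an intervals calculation that counts how often ratios between peaks occur and stores the result in a 1×24 vector. The claims do not say how the 24 entries are defined; the sketch below assumes, purely for illustration, that each pairwise frequency ratio is folded into a single octave and quantized to 24 equal quarter-tone steps, with counts normalized to sum to 1. `interval_histogram` and `n_bins` are hypothetical names.

```python
import math
from typing import List


def interval_histogram(freqs: List[float], n_bins: int = 24) -> List[float]:
    """Normalized histogram of pairwise peak-frequency ratios.

    Assumption: each ratio hi/lo is reduced modulo the octave and quantized
    to n_bins equal steps of log2(ratio) (24 bins = quarter tones).
    """
    hist = [0.0] * n_bins
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            lo, hi = sorted((freqs[i], freqs[j]))
            octave_pos = math.log2(hi / lo) % 1.0  # position within the octave, in [0, 1)
            hist[int(round(octave_pos * n_bins)) % n_bins] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Under this assumption a perfect fifth (3:2) lands in bin 14, a perfect fourth (4:3) in bin 10, and an octave (2:1) folds to bin 0.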
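Claims 22 through 26, 30 and 31 describe estimating the consonance class of an unclassified media entity from the classification-chain vectors that lie within a neighborhood distance, using either the median class or the mean consonance value, and returning a confidence number. A minimal sketch, assuming Euclidean distance, integer class labels, and confidence defined as the fraction of neighbours whose label equals the rounded estimate; `classify`, `radius` and this confidence definition are hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def classify(
    chain: Sequence[Tuple[Sequence[float], int]],
    query: Sequence[float],
    radius: float,
    use_mean: bool = False,
) -> Tuple[Optional[float], float]:
    """Estimate (class, confidence) for `query` from labelled vectors within `radius`.

    Assumption: confidence is the fraction of neighbours whose label equals
    the rounded estimate; (None, 0.0) means no vector lies in the neighborhood.
    """
    neighbours = [label for vec, label in chain if math.dist(vec, query) <= radius]
    if not neighbours:
        return None, 0.0
    if use_mean:
        estimate = statistics.mean(neighbours)  # mean consonance value (claim 25)
    else:
        estimate = statistics.median_low(neighbours)  # median class (claim 24)
    target = round(estimate)
    agreement = sum(1 for n in neighbours if n == target) / len(neighbours)
    return estimate, agreement
```

A low confidence value flags entities whose neighborhood is mixed; such entities are candidates for the human review that claim 32 associates with improving the chain over time.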
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application Ser. No. 60/216,103, filed Jul. 6, 2000. This application relates to U.S. patent application Ser. Nos. (Attorney Docket Nos. MSFT-577 through MSFT-579, and MSFT-581 through MSFT-587).
Provisional Applications (1)

| Number | Date | Country |
|---|---|---|
| 60216103 | Jul 2000 | US |