Claims
- 1. A method for automatically classifying melodic movement properties of audio data, comprising:
applying audio data to a peak detection process;
detecting the location of at least one prominent peak represented by the audio data in the frequency spectrum and determining the energy of the at least one prominent peak;
storing the location of the at least one prominent peak and the energy of the at least one prominent peak into at least one output matrix;
applying the data stored in said at least one output matrix to critical band masking filtering;
applying the data stored in said at least one output matrix to a peak continuation process; and
applying the data stored in said at least one output matrix to a melodic movement vector calculation process that determines pitch class movement data corresponding to the audio data for the melodic movement vector.
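Claim 1 recites a per-frame peak detection stage whose results are stored in at least one output matrix. Below is a minimal sketch of one way such a stage could look; the frame handling, windowing, and the cap on the number of peaks per frame are illustrative assumptions, not details taken from the claims.

```python
import numpy as np

def detect_prominent_peaks(frames, sample_rate, max_peaks=5):
    """Locate the most energetic spectral peaks in each frame.

    frames      : 2-D array, one time-domain frame per row
    sample_rate : sampling rate in Hz
    max_peaks   : assumed pre-defined limit on peaks kept per frame

    Returns two matrices of shape (n_frames, max_peaks): peak locations
    (in Hz) and peak energies, mirroring the two output matrices of claim 14.
    """
    n_frames, frame_len = frames.shape
    locations = np.zeros((n_frames, max_peaks))
    energies = np.zeros((n_frames, max_peaks))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)

    for i, frame in enumerate(frames):
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len))) ** 2
        # A bin is a local maximum if it exceeds both of its neighbours.
        is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])
        peak_bins = np.where(is_peak)[0] + 1
        # Keep only the most energetic peaks, up to the pre-defined limit.
        strongest = peak_bins[np.argsort(spectrum[peak_bins])[::-1][:max_peaks]]
        locations[i, :len(strongest)] = freqs[strongest]
        energies[i, :len(strongest)] = spectrum[strongest]
    return locations, energies
```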
- 2. A method according to claim 1, wherein the audio data is divided into frames, and the method is performed frame by frame.
- 3. A method according to claim 2, wherein the frame by frame approach includes frame differencing.
- 4. A method according to claim 2, wherein the number of peaks detected in said application of the peak detection process is limited by a pre-defined parameter.
- 5. A method according to claim 1, further comprising performing Nth order interpolation on at least one of the location of the at least one prominent peak and the energy of the at least one prominent peak to increase precision.
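Claim 5 calls for Nth order interpolation of the peak location and energy. A common choice in sinusoidal peak picking is second-order (parabolic) interpolation over the log-magnitudes of the three bins around a local maximum; the sketch below assumes that choice, which the claim itself does not mandate.

```python
import numpy as np

def parabolic_peak(spectrum, k):
    """Refine the location and energy of the peak at interior bin k.

    Fits a parabola through the log-magnitudes at bins k-1, k, k+1 and
    returns (fractional_bin, interpolated_magnitude).
    """
    a, b, c = np.log(spectrum[k - 1:k + 2] + 1e-12)
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)     # lies in (-0.5, 0.5)
    peak_log_mag = b - 0.25 * (a - c) * offset
    return k + offset, np.exp(peak_log_mag)
```

The fractional bin index can be converted to Hz by multiplying by sample_rate / frame_len before it is stored in the location matrix.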
- 6. A method according to claim 1, further comprising applying the melodic movement vector to a classification stage which determines at least one of (1) at least one melodic movement value and (2) at least one melodic movement class that describes the audio data.
- 7. A method according to claim 1, wherein the pitch class movement data is stored into a melodic movement vector that is 1×24.
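Claim 7 stores the pitch class movement data in a 1×24 vector. The claims do not spell out what the 24 elements represent; the sketch below assumes, purely for illustration, twelve pitch classes with separate counts for upward and downward movement between consecutive frames.

```python
import numpy as np

def melodic_movement_vector(peak_freqs):
    """Accumulate pitch class movement into a 1x24 vector (assumed layout:
    elements 0-11 count upward moves into each pitch class, 12-23 downward).

    peak_freqs : 1-D array of the dominant peak frequency (Hz) per frame.
    """
    vector = np.zeros(24)
    # MIDI-style pitch numbers; the pitch class is the number modulo 12.
    midi = 69.0 + 12.0 * np.log2(np.asarray(peak_freqs, dtype=float) / 440.0)
    for prev, curr in zip(midi[:-1], midi[1:]):
        pitch_class = int(round(curr)) % 12
        if curr > prev:
            vector[pitch_class] += 1.0          # upward movement
        elif curr < prev:
            vector[12 + pitch_class] += 1.0     # downward movement
    return vector.reshape(1, 24)                # 1x24, as in claim 7
```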
- 8. A method according to claim 2, wherein the peak continuation process keeps track of peaks that last more than a predetermined number of frames.
- 9. A method according to claim 8, wherein the peak continuation process fills in peaks where a peak has been missed in a predetermined number of frames.
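Claims 8 and 9 describe a peak continuation process that tracks peaks lasting more than a predetermined number of frames and fills in a peak that was missed for a small number of frames. A simplified tracker along those lines, filling single-frame gaps, is sketched below; the frequency tolerance is an illustrative parameter, and counting how long a track persists (claim 8) would use the same matching rule.

```python
import numpy as np

def continue_peaks(locations, energies, tol_hz=20.0):
    """Fill in single-frame gaps in per-frame peak tracks.

    locations, energies : (n_frames, max_peaks) matrices from peak detection,
                          with 0 marking "no peak"
    tol_hz              : tolerance for deciding two peaks belong to one track
    """
    locations = locations.copy()
    energies = energies.copy()
    n_frames, max_peaks = locations.shape
    for i in range(1, n_frames - 1):
        for j in range(max_peaks):
            freq = locations[i - 1, j]
            if freq == 0.0 or np.any(np.abs(locations[i] - freq) < tol_hz):
                continue  # no track here, or the track continues in frame i
            # Peak missing in frame i but present again in frame i+1:
            # fill the gap with the previous frame's values.
            if np.any(np.abs(locations[i + 1] - freq) < tol_hz):
                empty = np.where(locations[i] == 0.0)[0]
                if empty.size:
                    locations[i, empty[0]] = freq
                    energies[i, empty[0]] = energies[i - 1, j]
    return locations, energies
```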
- 10. A method according to claim 1, further comprising transforming the melodic vector to extract the salient features of the data via principal component analysis.
- 11. A method according to claim 1, wherein said critical band masking filtering removes a peak that is masked by surrounding peaks with more energy.
- 12. A method according to claim 11, wherein said critical band masking filtering removes a peak when a lower frequency peak and a higher frequency peak have greater energy.
- 13. A method according to claim 11, wherein said critical band masking filters are scalable so that the amount of masking is scalable.
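Claims 11-13 describe critical band masking that removes a peak masked by nearby stronger peaks, with the amount of masking being scalable. The sketch below uses a simple Bark-scale neighbourhood and a linear masking factor; both are assumptions rather than details taken from the claims.

```python
import numpy as np

def hz_to_bark(f):
    """Approximate Bark (critical band) scale, Zwicker & Terhardt style."""
    f = np.asarray(f, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def mask_peaks(locations, energies, masking=0.5):
    """Zero out peaks masked by stronger peaks within one critical band.

    masking : scale factor in [0, 1]; larger values mask more aggressively.
    """
    locations = locations.copy()
    energies = energies.copy()
    for i in range(locations.shape[0]):
        bark = hz_to_bark(locations[i])
        for j in range(locations.shape[1]):
            if locations[i, j] == 0.0:
                continue
            near = np.abs(bark - bark[j]) < 1.0   # within one critical band
            near[j] = False
            if np.any(energies[i, near] * masking > energies[i, j]):
                locations[i, j] = 0.0
                energies[i, j] = 0.0
    return locations, energies
```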
- 14. A method according to claim 1, wherein said storing includes providing an output of the peak detection and interpolation stage in two matrices, one holding the location of the at least one prominent peak, and the second holding the respective energy of the at least one prominent peak.
- 15. A method according to claim 1, wherein the audio data is formatted according to pulse code modulated format.
- 16. A method according to claim 15, wherein the audio data is previously in a format other than pulse code modulated format, and the method further comprises converting the audio data to pulse code modulated format from the other format.
- 17. The method of claim 1, further comprising converting the input audio data from the time domain to the frequency domain.
- 18. A method according to claim 17, wherein said converting of the input audio data from the time domain to the frequency domain includes performing a fast Fourier transform on the audio data.
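Claims 15-18 cover decoding the audio to pulse code modulated form, framing it, and converting each frame from the time domain to the frequency domain with a fast Fourier transform. A minimal framing-plus-FFT sketch follows; the frame length and hop size are illustrative choices.

```python
import numpy as np

def frames_to_spectra(pcm, frame_len=2048, hop=1024):
    """Split PCM samples into overlapping frames and FFT each one.

    pcm : 1-D array of pulse code modulated samples
    Returns a complex matrix with one half-spectrum per row.
    """
    n_frames = max(0, 1 + (len(pcm) - frame_len) // hop)
    window = np.hanning(frame_len)
    spectra = np.empty((n_frames, frame_len // 2 + 1), dtype=complex)
    for i in range(n_frames):
        frame = pcm[i * hop:i * hop + frame_len]
        spectra[i] = np.fft.rfft(window * frame)
    return spectra
```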
- 19. A computer readable medium bearing computer executable instructions for carrying out the method of claim 1.
- 20. A modulated data signal carrying computer executable instructions for performing the method of claim 1.
- 21. At least one computing device comprising means for performing the method of claim 1.
- 22. A method to quantify and classify the melodic movement in a digital audio file, comprising:
detecting and interpolating the maximum peak locations and energies in the spectrum for each frame of a digital audio file;
calculating the melodic vector of the digital audio file;
transforming the melodic vector into the principal component coordinate system, thereby generating the melodic movement principal components; and
classifying the principal components using a classification chain formed from melodic movement classification data classified by humans and melodic movement classification data classified by digital signal processing techniques.
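Claim 22 transforms the melodic vector into the principal component coordinate system. One conventional way to do this is to derive the principal components from the covariance of a set of training vectors and project each melodic vector onto them, as sketched below; the number of retained components is an assumption.

```python
import numpy as np

def fit_pca(training_vectors, n_components=8):
    """Derive a principal component basis from training melodic vectors."""
    X = np.asarray(training_vectors, dtype=float)
    mean = X.mean(axis=0)
    # Eigenvectors of the covariance matrix, strongest components first.
    eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]

def to_principal_components(melodic_vector, mean, basis):
    """Project a melodic vector into the principal component coordinate system."""
    return (np.asarray(melodic_vector, dtype=float).ravel() - mean) @ basis
```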
- 23. The method of claim 22, further including masking critical bands by a scalable amount.
- 24. The method of claim 22, further including the step of continuing peaks which last for more than a pre-specified number of frames.
- 25. A method of classifying data according to melodic movement properties of the data, comprising:
assigning each media entity of a plurality of media entities in a data set to at least one melodic movement class;
processing each media entity of said data set to extract at least one melodic movement class based on digital signal processing of each media entity;
generating a plurality of melodic movement properties vectors for said plurality of media entities, wherein each melodic movement properties vector includes said at least one melodic movement class and at least one melodic movement class based on digital signal processing; and
forming a classification chain based upon said plurality of melodic movement properties vectors.
- 26. A method according to claim 25, further comprising:
processing an unclassified media entity to extract at least one melodic movement class based on digital signal processing of the unclassified media entity;
generating a vector for the unclassified media entity including said at least one digital signal processing melodic movement class;
presenting the vector for the unclassified media entity to the classification chain; and
classifying the unclassified media entity with an estimate of the melodic movement class by calculating the representative melodic movement class of the subset of the plurality of vectors of the classification chain located in the neighborhood of the vector for the unclassified media entity.
- 27. A method according to claim 26, further including calculating a neighborhood distance that defines a distance within which two vectors in the classification chain space are in the same neighborhood for purposes of being in the same melodic movement class.
- 28. A method according to claim 26, wherein said classifying of the unclassified media entity includes classifying the unclassified media entity with a median melodic movement class represented by the neighborhood.
- 29. A method according to claim 26, wherein said melodic movement class is described by a numerical value and said classifying of the unclassified media entity includes classifying the unclassified media entity with a mean of numerical melodic movement properties values found in the neighborhood.
- 30. A method according to claim 26, wherein said classifying includes returning at least one number indicating the level of confidence of the melodic movement class estimate.
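Claims 26-30 classify an unclassified media entity by locating the classification chain vectors that fall within a neighborhood distance of its vector and returning a representative class (median or mean) together with a confidence figure. A minimal sketch, assuming Euclidean distance and using the fraction of agreeing neighbors as the confidence number, neither of which is specified by the claims:

```python
import numpy as np

def classify_in_neighborhood(query, chain_vectors, chain_classes, radius):
    """Estimate a melodic movement class from a classification chain.

    query         : feature vector of the unclassified media entity
    chain_vectors : (n, d) matrix of previously classified vectors
    chain_classes : length-n array of numerical melodic movement classes
    radius        : neighborhood distance (claim 27)

    Returns (estimated_class, confidence); confidence is the fraction of
    neighbors whose class matches the rounded estimate.
    """
    chain_vectors = np.asarray(chain_vectors, dtype=float)
    chain_classes = np.asarray(chain_classes, dtype=float)
    dists = np.linalg.norm(chain_vectors - np.asarray(query, dtype=float), axis=1)
    neighbors = chain_classes[dists <= radius]
    if neighbors.size == 0:
        return None, 0.0
    estimate = float(np.median(neighbors))   # claim 28; use np.mean for claim 29
    confidence = float(np.mean(np.round(neighbors) == round(estimate)))
    return estimate, confidence
```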
- 31. A computer readable medium bearing computer executable instructions for carrying out the method of claim 25.
- 32. A modulated data signal carrying computer executable instructions for performing the method of claim 25.
- 33. At least one computing device comprising means for performing the method of claim 25.
- 34. A computing system, comprising:
a computing device including:
a classification chain data structure stored thereon having a plurality of classification vectors, wherein each vector includes data representative of a melodic movement class as classified by humans and melodic movement characteristics as determined by digital signal processing; and
processing means for comparing an unclassified media entity to the classification chain data structure to determine an estimate of the melodic movement class of the unclassified media entity.
- 35. A computing system according to claim 34, wherein said determining of an estimate of the melodic movement class includes returning at least one number indicating the level of confidence of the melodic movement class assignment.
- 36. A method according to claim 35, wherein the performance level of the classification chain improves over time due to the examination of unclassified media entities that have a low confidence level associated with the melodic movement class assignment.
- 37. A classification chain data structure utilized in connection with the classification of melodic movement properties of new unclassified media entities, comprising:
a plurality of classification vectors, wherein each vector includes:
melodic movement properties data as classified by humans; and melodic movement properties data determined by digital signal processing techniques.
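Claims 34 and 37 describe the classification chain as a collection of vectors, each pairing a human-assigned melodic movement class with characteristics derived by digital signal processing. A minimal data structure along those lines; the field names are illustrative, not drawn from the claims.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClassificationVector:
    """One entry in the classification chain (claim 37)."""
    human_class: float            # melodic movement class assigned by a person
    dsp_features: List[float]     # characteristics from digital signal processing

@dataclass
class ClassificationChain:
    """Classification chain data structure stored on the computing device (claim 34)."""
    vectors: List[ClassificationVector] = field(default_factory=list)

    def add(self, human_class: float, dsp_features: List[float]) -> None:
        self.vectors.append(ClassificationVector(human_class, list(dsp_features)))
```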
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application relates to U.S. patent application No. ______ (Attorney Docket Nos. MSFT-577 through MSFT-585 and MSFT-587).