The foregoing aspects and many of the accompanying advantages of this invention will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
Furthermore, the features from the spectral domain are spectral features, the features from the temporal domain are temporal features, and the features from the statistical value are statistical features. Spectral features are descriptors computed from Short Time Fourier Transform of the signal, such as Linear Predictive Coding, Mel-scale Frequency Cepstral Coefficients, and so forth. Temporal features are descriptors computed from the waveform of the signal, such as Zero-crossing Rate, Temporal Centroid and Log Attack Time. Statistical features are descriptors computed according to the statistical method, such as Skewness and Kurtosis.
A data preprocessing unit 12 couples to the feature extraction unit 11 and normalizes the features, then generating a plurality of classification information for the intelligent signal processing system 10.
A classification unit 13 couples to the feature data preprocessing unit 12 and group the audio signals to various kind of music according the classification information by using nearest neighbor rule (NNR), artificial neural network (ANN), fuzzy neural network (FNN) or hidden Markov model (HMM).
Accordingly, the intelligent signal processing system 10 may automatically classify the received mixed signals into many groups, and store them in the memory 14. For example, the system 10 would classify the music downloaded from the Internet according to singers or instruments, wherein the music may be the mixed signal of creatures' sound signal and instruments' sound signal, the mixed signal of human's sound signal and instruments' sound signal, or the mixed signal of human's sound signal and the instrument's sound signal.
In addition, before the intelligent signal processing system 10 an independent component analysis (ICA) unit (not shown) receives an audio signal and separates it to a plurality of sound components. In the field of audio preprocessing, we may remove the voice from the songs by using independent component analysis. Besides, independent component analysis can help the system lower the noise while we record sound in a nosy environment.
The normalization process comes after feature extraction. It eliminates redundancy, organizes data efficiently, reduces the potential for anomalies during the data operations and improves the data consistency. The steps of normalization include: dividing the features into several parts according to the extraction method; finding the minimum and maximum in each data set; and rescaling each data set so that the maximum of each data is 1 and the minimum of each data is −1.
Table 1 shows the experimental results of the singer identification in accordance with the present invention. The three categories are three singers (Taiwanese): Wu, Du, and Lin. Four classification techniques include NNR, ANN, FNN, and HMM. For each singer, training signals use seven songs and testing signal uses the other one that is different from those used for training (external test). The dimension of the feature space is 75. The number of the training data is 3500 and the number of testing data is 100.
Table 2 shows the experimental results of instrument identification in accordance with present invention. It reveals that the four classification techniques are all effective.
Overall, the performance of the FNN is the best, while the performance of the ANN and the HMM are satisfactory.
While several sources are mixed artificially in a PC, ICA may separate perfectly without knowing anything about the different sound sources. For example, two instruments (piano and violin) are chosen to perform the same music or different music, and then mix them in a PC. We found the ICA could successfully separate these blindly mixed signals. In another condition, several microphones record sounds in a noisy environment. With the help of ICA, the unwanted noise could be lowered but could not be lowered.
In the invention, ICA is used to separate the blind sources, to remove the voice, and to reduce the noise. We could remove the voice from songs, and reduce the noise while recording in a noisy environment by using ICA, which could be applied to a karaoke machine, a recorder, and etc.
Accordingly, the present invention receives a training audio signal, extracts a group of feature variables, normalizes feature variables and generates a plurality of classification items for training the system; next, the system receives a test audio signal, extracts feature variables, normalizes feature variables and generates a plurality of classification information; lastly, the system uses artificial intelligent calculation to classify a test audio signal into classification items, and stores the test audio signal into the memory.
While the invention is susceptible to various modifications and alternative forms, a specific example thereof has been shown in the drawings and is herein described in detail. It should be understood, however, that the invention is not to be limited to the particular form disclosed, but to the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
95136283 | Sep 2006 | TW | national |