Claims
- 1. An apparatus for transforming a voice signal of a talker into a voice signal having characteristics of a different person, comprising:
- means for separating a voice signal of someone such as a talker into a plurality of voice parameters including frequency components;
- neural network means for transforming at least some of the separated frequency components from having characteristics of the talker into having characteristics of the different person; and
- means for combining the voice parameters after transformation of some of the frequency components by the neural network means, for reconstituting the talker's voice signal having characteristics of the different person.
- 2. The apparatus of claim 1, wherein the means for separating includes means for extracting pitch data and the frequency components of harmonic band energy data related to the pitch, and further wherein the neural network means transforms the extracted harmonic band energy data into harmonic band energy data having characteristics of the different person.
- 3. The apparatus of claim 2, further comprising means for mapping the pitch of the talker to the pitch of the different person.
- 4. The apparatus of claim 3, further comprising means for interpolating the extracted harmonic band energy data into a predetermined number of frequency bands prior to transformation by the neural network means.
- 5. The apparatus of claim 4, further comprising:
- first means for determining a voiced/unvoiced identifier for each extracted harmonic energy band;
- second means for determining a voiced/unvoiced identifier for each interpolated frequency band in response to the identifiers determined for the extracted harmonic energy bands; and
- means for modifying the voiced/unvoiced identifiers of the interpolated frequency bands in response to the transformed energy data and the interpolated energy data for use with the transformed energy data in the means for combining.
- 6. The apparatus of claim 5, wherein the means for modifying includes:
- third means for determining the difference between the interpolated energy data and the transformed energy data for each of the predetermined number of frequency bands; and
- means for changing the voiced/unvoiced identifier for each interpolated frequency band in response to the difference between the interpolated energy data and the transformed energy data for the respective band and including means for assuring a voiced identifier for each frequency band for which the transformed energy data is substantially higher than the interpolated energy data, means for assuring an unvoiced identifier for each frequency band for which the transformed energy data is substantially lower than the interpolated energy data, and means for leaving the voiced/unvoiced identifier unchanged for each frequency band for which the transformed energy data is not substantially different from the interpolated energy data.
- 7. The apparatus of claim 6, wherein the means for combining includes means for modulating the transformed frequency band energy data with the mapped pitch for reconstructing the voice signal of the talker having characteristics of the voice of the different person.
- 8. The apparatus of claim 7, wherein the means for extracting includes means for determining the frequency distribution of the pitch of voice signals coupled thereto, and further comprising:
- first means for separately storing the pitch data extracted from each voice signal coupled through the means for extracting;
- second means for separately storing the interpolated frequency band energy data extracted for each voice signal coupled through the means for extracting;
- means for separately coupling comparable voice signals of the talker and the different person pronouncing the same words through the means for extracting and the means for interpolating; and
- means for training the neural network means to transform the interpolated frequency band energy data of the talker into the interpolated frequency band energy data of the different person using the stored interpolated frequency band energy data of the comparable voice signals.
- 9. The apparatus of claim 7, wherein the means for combining further includes means for decimating the modulated energy data of the predetermined number of frequency bands into harmonic band energy data corresponding to the mapped pitch of the different person.
- 10. The apparatus of claim 9, wherein the means for extracting includes:
- means for digitally sampling a voice signal;
- means for converting the digitized voice signal into the frequency domain for discrete intervals thereof; and
- means for determining from the frequency domain signal of each interval a pitch frequency and frequency components.
- 11. The apparatus of claim 10, wherein the means for converting includes means for performing a process of either Fourier transform or linear predictive coding.
- 12. The apparatus of claim 11, wherein the means for modulating further includes second means for converting frequency domain signal data into the time domain.
- 13. The apparatus of claim 12, wherein the second means for converting includes means for performing the inverse process of either Fourier transform or linear predictive coding.
- 14. The apparatus of claim 3, wherein the means for extracting determines the mean frequency of the pitch frequency distribution and a standard deviation thereof.
- 15. The apparatus of claim 6, wherein the neural network means is a back propagation trained, recurrent neural network.
- 16. The apparatus of claim 14, wherein the means for mapping determines, for specific voice samples, the number of standard deviations by which the talker's pitch differs from the talker's mean pitch and maps each specific sample to the same number of standard deviations from the different person's mean pitch.
- 17. An apparatus for transforming a voice signal of a talker into a voice signal having characteristics of a different person, comprising:
- means for extracting pitch and frequency components from a voice signal of a talker;
- means for mapping the pitch of the talker to a known pitch of the different person;
- means for interpolating the extracted frequency components into a predetermined number of frequency bands;
- neural network means for transforming interpolated frequency components from those of the talker to those having characteristics of the different person;
- means for determining voiced and unvoiced identifiers for each of the predetermined number of interpolated frequency bands;
- means for modifying the identifiers in response to the interpolated and transformed frequency components; and
- means responsive to the modified identifiers for modulating the transformed frequency components with the mapped pitch for reconstructing the voice signal of the talker having characteristics of the voice of the different person.
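Claims 10 and 11 recite digitally sampling the voice signal, converting each discrete interval into the frequency domain, and determining a pitch frequency and frequency components from each interval's spectrum. The sketch below shows one way to do this. It assumes non-overlapping rectangular frames and a direct DFT for clarity (a practical system would use an FFT and a window). It also uses a normalized harmonic-sum pitch estimator, which picks the DFT bin whose multiples carry the largest mean magnitude. The claims do not specify a pitch estimator; this one, and the names `dft_magnitudes`, `estimate_pitch_bin`, and `analyze_frames`, are hypothetical illustrations.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0 .. N/2 - 1 of a real-valued frame (direct O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def estimate_pitch_bin(mags, min_bin, max_bin):
    """Return the candidate fundamental bin whose harmonics have the largest mean magnitude."""
    best_bin, best_score = min_bin, -1.0
    for b in range(min_bin, max_bin + 1):
        harmonics = mags[b::b]  # bins b, 2b, 3b, ... below N/2
        score = sum(harmonics) / len(harmonics)
        if score > best_score:
            best_bin, best_score = b, score
    return best_bin


def analyze_frames(samples, sample_rate, frame_len, min_hz=60.0, max_hz=400.0):
    """Split samples into non-overlapping frames; return (pitch_hz, harmonic_magnitudes) per frame."""
    bin_hz = sample_rate / frame_len
    min_bin = max(1, math.ceil(min_hz / bin_hz))
    # Keep every candidate below N/2 so its harmonic slice is never empty.
    max_bin = min(int(max_hz / bin_hz), frame_len // 2 - 1)
    results = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        mags = dft_magnitudes(samples[start:start + frame_len])
        b = estimate_pitch_bin(mags, min_bin, max_bin)
        # Pitch in Hz, plus the spectral magnitude at each harmonic of that pitch.
        results.append((b * bin_hz, mags[b::b]))
    return results
```

Because the pitch is restricted to whole DFT bins, its resolution is `sample_rate / frame_len`, which is 20 Hz at 8 kHz sampling with 400-sample frames. Finer resolution would need longer frames or interpolation between bins.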
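Claims 4 and 9 recite two steps. The extracted harmonic band energies are interpolated into a predetermined number of frequency bands before the neural network. The transformed bands are then decimated back into harmonic band energies at the mapped pitch. A minimal sketch follows. It assumes the k-th harmonic band is centered at k·f0, the predetermined bands are equal-width bands spanning 0 to `max_hz`, and interpolation is piecewise-linear with the end values held constant. The helper names and the defaults of 32 bands and 4000 Hz are hypothetical.

```python
import bisect


def _lerp(xs, ys, x):
    """Piecewise-linear interpolation of ys at x (xs ascending); clamps outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x) - 1  # xs[i] <= x < xs[i + 1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def band_centers(num_bands, max_hz):
    """Center frequencies of num_bands equal-width bands spanning 0 .. max_hz."""
    width = max_hz / num_bands
    return [(b + 0.5) * width for b in range(num_bands)]


def interpolate_to_bands(harmonic_energies, f0, num_bands=32, max_hz=4000.0):
    """Resample energies measured at harmonics k * f0 (k = 1..K) onto fixed band centers."""
    harmonic_freqs = [(k + 1) * f0 for k in range(len(harmonic_energies))]
    return [_lerp(harmonic_freqs, harmonic_energies, fc) for fc in band_centers(num_bands, max_hz)]


def decimate_to_harmonics(band_energies, f0, max_hz=4000.0):
    """Sample fixed-band energies at each harmonic of the (mapped) pitch f0 up to max_hz."""
    centers = band_centers(len(band_energies), max_hz)
    num_harmonics = int(max_hz // f0)
    return [_lerp(centers, band_energies, (k + 1) * f0) for k in range(num_harmonics)]
```

The fixed band count is what lets one network accept frames of any pitch. A 100 Hz talker has twice as many harmonics below 4 kHz as a 200 Hz talker, but both map to the same 32 inputs.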
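Claims 5 and 6 first derive a voiced/unvoiced identifier for each interpolated band from the identifiers of the extracted harmonic bands. They then adjust each identifier from the energy change. A band is assured voiced when its transformed energy is substantially higher than its interpolated energy. It is assured unvoiced when substantially lower, and is otherwise left unchanged. The sketch makes three assumptions: each interpolated band inherits the flag of its nearest harmonic, energies are expressed in dB, and a 6 dB threshold stands in for "substantially". None of these choices, nor the function names, come from the claims.

```python
def band_voicing_from_harmonics(harmonic_voiced, f0, num_bands, max_hz):
    """Give each interpolated band the voiced (True) / unvoiced (False) flag of its nearest harmonic.

    harmonic_voiced[k - 1] is the flag of the k-th harmonic, centered at k * f0.
    """
    width = max_hz / num_bands
    flags = []
    for b in range(num_bands):
        center = (b + 0.5) * width
        # Nearest harmonic number (halves round up), clamped to 1..K.
        k = int(center / f0 + 0.5)
        k = min(max(k, 1), len(harmonic_voiced))
        flags.append(harmonic_voiced[k - 1])
    return flags


def modify_voicing(flags, interpolated_db, transformed_db, threshold_db=6.0):
    """Adjust band flags according to the energy change the network applied to each band."""
    modified = []
    for flag, before, after in zip(flags, interpolated_db, transformed_db):
        diff = after - before
        if diff > threshold_db:        # substantially higher: assure voiced
            modified.append(True)
        elif diff < -threshold_db:     # substantially lower: assure unvoiced
            modified.append(False)
        else:                          # not substantially different: keep the flag
            modified.append(flag)
    return modified
```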
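Claims 14 and 16 describe a statistical pitch mapping. The talker's pitch for a sample is expressed as a number of standard deviations from the talker's mean pitch. The mapped pitch lies the same number of standard deviations from the different person's mean pitch, that is, `f_mapped = mean_B + std_B * (f - mean_A) / std_A`. A minimal sketch follows. It assumes a pitch value of 0 marks an unvoiced frame and uses the population standard deviation. The function names are hypothetical.

```python
import statistics


def pitch_stats(pitch_track_hz):
    """Mean and population standard deviation of the voiced pitch values (Hz)."""
    voiced = [p for p in pitch_track_hz if p > 0]  # assumption: 0 marks an unvoiced frame
    return statistics.mean(voiced), statistics.pstdev(voiced)


def map_pitch(f0_hz, talker_mean, talker_std, target_mean, target_std):
    """Map f0 to the same number of standard deviations from the target's mean pitch."""
    if f0_hz <= 0:
        return 0.0  # leave unvoiced frames unvoiced
    z = (f0_hz - talker_mean) / talker_std  # standard deviations from the talker's mean
    return target_mean + z * target_std
```

For example, with a talker mean of 100 Hz and standard deviation of 10 Hz, and a target mean of 200 Hz and standard deviation of 20 Hz, a 115 Hz sample (1.5 standard deviations above the talker's mean) maps to 230 Hz.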
Parent Case Info
This is a continuation-in-part of application Ser. No. 07/908,585, filed Jun. 29, 1992, and now abandoned, which application was a continuation of application Ser. No. 07/552,679, filed Jul. 11, 1990, and now abandoned.
US Referenced Citations (3)

| Number | Name | Date |
|---|---|---|
| 3982070 | Flanagan | Sep 1976 |
| 4788649 | Shea et al. | Nov 1988 |
| 5278943 | Gasper et al. | Jan 1994 |

Continuations (1)

| | Number | Date |
|---|---|---|
| Parent | 552679 | Jul 1990 |

Continuation in Parts (1)

| | Number | Date |
|---|---|---|
| Parent | 908585 | Jun 1992 |