Claims
- 1. For use with a costume depicting a character having a defined voice with a pre-established voice characteristic, a voice transformation system comprising:
- a microphone that is positionable to receive and transduce speech that is spoken by a person wearing the costume into a source speech signal;
- a mask that is positionable to cover the mouth of the person wearing the costume to muffle the speech of the person wearing the costume to tend to prevent communication of the speech beyond the costume, the mask enabling placement of the microphone between the mouth and the mask;
- a speaker disposed on or within the costume to broadcast acoustic waves carrying speech in the defined voice of the character depicted by the costume; and
- a voice transformation device coupled to receive the signal from the microphone representing source speech spoken by a person wearing the costume, the voice transformation device transforming the received source speech signal to a target speech signal representing the utterances of the source speech signals in the defined voice of the character depicted by the costume;
- wherein the voice transformation device stores a plurality of representations of the defined voice and transforms the voice of the person wearing the costume into the same defined voice of the character depicted by the costume, based upon association of the voice of the particular person with particular ones of the stored representations.
- 2. A voice transformation system according to claim 1, wherein the voice transformation device includes:
- a processing subsystem segmenting and windowing the received source speech signal to generate a sequence of preprocessed speech signal segments;
- an analysis subsystem processing the received preprocessed speech signal segments to generate for each segment a pitch signal indicating a dominant pitch of the segment, a frequency domain vector representing a smoothed frequency characteristic of the segment and an excitation signal representing excitation characteristics of the segment;
- a transformation subsystem storing target frequency domain vectors that are representative of the target speech, substituting a corresponding target frequency domain vector for the frequency domain vector derived by the analysis subsystem, adjusting the pitch of the target excitation spectrum in response to the pitch signal derived by the analysis subsystem, and convolving the substituted target frequency domain vector with the adjusted excitation spectrum to produce a segmented frequency domain representation of the target voice; and
- a post processing subsystem performing an inverse Fourier transform and an inverse segmenting and windowing operation on each segmented frequency domain representation of the target voice to generate a time domain signal representing the source speech in the voice of the character depicted by the costume.
- 3. A voice transformation system comprising:
- a preprocessing subsystem receiving a source voice signal and digitizing and segmenting the source voice signal to generate a segmented time domain signal;
- an analysis subsystem responding to each segment of the segmented time domain signal by generating a source speech pitch signal representative of a pitch thereof, an excitation signal representative of the excitation thereof and a source vector that is representative of a smoothed spectrum of the segment;
- a transformation subsystem storing a plurality of source and target vectors and voice pitch indications for the source voice and a target voice different from the source voice, a correspondence between the source and target vectors and the source and target voice pitch indications, the transformation subsystem using the stored information to substitute a target vector for each received source vector, adjusting the pitch of the frequency domain excitation spectrum in response to the source and target pitch indications to generate a pitch adjusted excitation spectrum, and convolving the pitch adjusted excitation spectrum with a signal represented by the substituted target vector to generate a sequence of segmented target voice segments defining a segmented target voice signal; and
- a post processing subsystem converting the segmented target voice signal into a segmented time domain target voice signal that represents the words of the source signal with vocal characteristics of the different target voice.
- 4. A voice transformation system according to claim 3, wherein the preprocessing subsystem includes a digitizing sampling circuit that samples the source voice signal to produce digital samples that are representative thereof and a segmenting and windowing circuit that divides the digital samples into overlapping segments having a shift distance of at most 1/4 of a segment and applies a windowing function to each segment that reduces aliasing during a subsequent transformation to the frequency domain to produce a sequence of windowed source segments.
- 5. A voice transformation system according to claim 4, wherein each of the segments represents 256 voice samples.
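The segmenting and windowing recited in claims 4 and 5 can be illustrated with a minimal sketch. It assumes a periodic Hann window and a shift of 64 samples (exactly 1/4 of a 256-sample segment); the claims do not name a particular window or shift value, and the function names `hann_window` and `segment_and_window` are hypothetical.

```python
import math


def hann_window(n):
    """Periodic Hann window of length n (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def segment_and_window(samples, seg_len=256, shift=64):
    """Split samples into overlapping segments and window each one.

    The shift distance is checked against the 1/4-segment limit
    stated in claim 4.
    """
    if shift <= 0 or shift * 4 > seg_len:
        raise ValueError("shift must be positive and at most 1/4 of a segment")
    window = hann_window(seg_len)
    segments = []
    # Only full segments are produced; trailing samples are dropped.
    for start in range(0, len(samples) - seg_len + 1, shift):
        chunk = samples[start:start + seg_len]
        segments.append([s * w for s, w in zip(chunk, window)])
    return segments


if __name__ == "__main__":
    digitized = [1.0] * 1024  # stand-in for digitized voice samples
    windowed = segment_and_window(digitized)
    print(len(windowed), len(windowed[0]))
```

With a shift of 1/4 of a segment, each interior sample falls inside four consecutive segments, which is what allows a later inverse segmenting step to rebuild a continuous signal.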
- 6. A voice transformation system according to claim 3, wherein the analysis subsystem includes:
- a discrete Fourier transform unit generating a frequency domain representation of each segment;
- an LPC cepstrum parametrization unit generating source cepstrum coefficient voice vectors representing a smoothed spectrum of each frequency domain segment;
- an inverse convolution unit deconvolving each frequency domain segment with the smoothed cepstrum coefficient representation thereof to produce the excitation signal in the form of a frequency domain excitation spectrum;
- a pitch adjustment unit responding to the source speech pitch signal and adjusting the pitch of the excitation spectrum to generate a pitch adjusted excitation spectrum;
- a substitution unit substituting target cepstrum coefficient voice vectors for the source cepstrum coefficient voice vectors for each corresponding segment; and
- a convolver convolving the pitch adjusted excitation spectrum with the substituted target cepstrum coefficient voice vectors.
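The analysis path of claim 6 (transform, smoothed spectrum, excitation) can be sketched roughly as follows. This is not the claimed LPC cepstrum parametrization: as a simpler stand-in it uses a real cepstrum truncated to its low-order coefficients, and it assumes the deconvolution is realized as element-wise division of the spectrum by the smoothed envelope. The names `dft`, `idft`, `smoothed_envelope`, `excitation_spectrum` and the parameter `n_coeffs` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (kept simple for clarity)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse discrete Fourier transform."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def smoothed_envelope(spectrum, n_coeffs=8, floor=1e-12):
    """Return (cepstrum vector, smoothed magnitude envelope) for one segment."""
    n = len(spectrum)
    log_mag = [math.log(max(abs(c), floor)) for c in spectrum]
    cepstrum = [c.real for c in idft(log_mag)]
    # Keep only low-order coefficients (and their mirror images) to smooth.
    liftered = [cepstrum[i] if (i < n_coeffs or i > n - n_coeffs) else 0.0
                for i in range(n)]
    smooth_log = [c.real for c in dft(liftered)]
    return liftered[:n_coeffs], [math.exp(v) for v in smooth_log]


def excitation_spectrum(spectrum, envelope):
    """Remove the smoothed envelope from the spectrum by division (assumed)."""
    return [s / e for s, e in zip(spectrum, envelope)]


if __name__ == "__main__":
    segment = [math.sin(0.3 * i) + 0.5 * math.sin(1.1 * i) for i in range(64)]
    spectrum = dft(segment)
    vector, envelope = smoothed_envelope(spectrum)
    excitation = excitation_spectrum(spectrum, envelope)
    print(len(vector), len(excitation))
```

Under these assumptions, multiplying the excitation spectrum by an envelope derived from a substituted target vector would play the role of the convolver recited at the end of the claim.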
- 7. A voice transformation system according to claim 3, wherein the transformation subsystem includes:
- a store storing the target voice pitch information, a plurality of the target vectors, a plurality of the source vectors and the correspondence between the source and target vectors;
- a pitch adjustment unit adjusting the pitch of the frequency domain excitation spectrum to generate a pitch adjusted excitation spectrum;
- a substitution unit receiving source vectors and responsive to the stored voice and target vectors and substituting one of the stored target vectors for each received source vector; and
- a convolver convolving each substituted target vector with the corresponding pitch adjusted excitation spectrum to generate a segmented frequency domain target voice signal.
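The substitution unit of claim 7 can be pictured as a codebook lookup. The sketch below assumes that the stored correspondence is a simple index pairing between two equally long lists and that the closest stored source vector is chosen by squared Euclidean distance; neither detail is stated in the claim, and `substitute_target_vector` is a hypothetical name.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def substitute_target_vector(source_vector, stored_source, stored_target):
    """Return the stored target vector paired with the nearest stored source vector.

    stored_source[i] is assumed to correspond to stored_target[i].
    """
    if not stored_source or len(stored_source) != len(stored_target):
        raise ValueError("codebooks must be non-empty and the same length")
    nearest = min(range(len(stored_source)),
                  key=lambda i: squared_distance(source_vector, stored_source[i]))
    return stored_target[nearest]


if __name__ == "__main__":
    # Toy two-entry codebooks with made-up values, for illustration only.
    stored_source = [[0.0, 0.0], [1.0, 1.0]]
    stored_target = [[5.0, 5.0], [9.0, 9.0]]
    print(substitute_target_vector([0.9, 1.2], stored_source, stored_target))
```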
- 8. A voice transformation system according to claim 3, wherein the post processing subsystem includes:
- an inverse Fourier transform unit transforming the segmented target voice signal to the segmented time domain target voice signal;
- an inverse segmenting and windowing unit converting the segmented time domain target voice signal to a sampled nonsegmented target voice signal; and
- a time duration adjustment unit adjusting the time duration of representations of the sampled nonsegmented target voice signal.
- 9. A voice transformation system according to claim 8, further comprising a digital-to-analog converter converting the time duration adjusted sampled nonsegmented target voice signal to a continuous time varying signal representing spoken utterances of the source voice with acoustical characteristics of the target voice.
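The inverse segmenting and windowing unit of claim 8 can be sketched as an overlap-add step. The sketch assumes a periodic Hann analysis window, a shift of 1/4 segment, and normalization by the summed window weights; it omits the inverse Fourier transform, the time duration adjustment, and the digital-to-analog conversion of claim 9. All function names are hypothetical.

```python
import math


def hann_window(n):
    """Periodic Hann window of length n (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def split_segments(samples, window, shift):
    """Forward step: overlapping, windowed segments (full segments only)."""
    seg_len = len(window)
    return [[samples[start + i] * window[i] for i in range(seg_len)]
            for start in range(0, len(samples) - seg_len + 1, shift)]


def overlap_add(segments, window, shift):
    """Inverse step: sum the overlapping segments and undo the window weights."""
    seg_len = len(window)
    total = shift * (len(segments) - 1) + seg_len
    out = [0.0] * total
    weight = [0.0] * total
    for k, seg in enumerate(segments):
        start = k * shift
        for i in range(seg_len):
            out[start + i] += seg[i]
            weight[start + i] += window[i]
    # Where the accumulated weight is zero (the very first sample), emit 0.0.
    return [o / w if w > 1e-12 else 0.0 for o, w in zip(out, weight)]


if __name__ == "__main__":
    window = hann_window(256)
    signal = [math.sin(0.05 * i) for i in range(1024)]
    rebuilt = overlap_add(split_segments(signal, window, 64), window, 64)
    print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

In a full system the segments passed to `overlap_add` would be the time domain target voice segments rather than the unmodified source segments used in this round trip.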
- 10. A method of transforming a source signal representing a source voice to a target signal representing a target voice comprising the steps of:
- preprocessing the source signal to produce a time domain sampled and segmented source signal in response thereto;
- analyzing the sampled and segmented source signal, the analysis including executing a transformation of the source signal to the frequency domain, generating a cepstrum vector representation of a smoothed spectrum of each segment of the source signal, generating an excitation signal representing the excitation of each segment of the source signal, determining a pitch for each segment of the source signal, and adjusting the excitation signal for each segment of the source signal in response to the pitch for each segment of the source signal;
- transforming each segment by storing cepstrum vectors representing target speech and corresponding cepstrum vectors representing source speech, substituting a stored target speech cepstrum vector for an analyzed source cepstrum vector and convolving the substituted target cepstrum vector with the excitation signal to generate a target segmented frequency domain signal; and
- post processing the target segmented frequency domain signal to provide transformation to the time domain and inverse segmentation to generate the target voice signal.
- 11. For use with a costume depicting a predefined character having a voice with a pre-established voice characteristic, a voice transformation system comprising:
- a microphone that is positionable to receive and transduce speech that is spoken by a person wearing the costume into a source speech signal;
- a mask that is positionable to cover the mouth of the person wearing the costume to muffle the speech of the person wearing the costume to tend to prevent communication of the speech beyond the costume, the mask enabling placement of the microphone between the mouth and the mask;
- a speaker disposed on or within the costume to broadcast acoustic waves carrying speech in the voice of the character depicted by the costume; and
- a voice transformation device coupled to receive the signal from the microphone representing source speech spoken by a person wearing the costume, the voice transformation device transforming the received source speech signal to a target speech signal by replacing vocal characteristics of the speaker, represented by the signal, with predefined and stored substitute vocal characteristics of the voice of the character depicted by the costume, the target speech signal being communicated to the speaker to be transduced and acoustically broadcast by the speaker.
Parent Case Info
This application is a continuation of a prior pending application, application Ser. No. 07/845,375, filed on Mar. 2, 1992, now abandoned.
US Referenced Citations (12)
Foreign Referenced Citations (1)
Number | Date | Country
0285276 | May 1988 | EPX
Continuations (1)
| Number | Date | Country
Parent | 845375 | Mar 1992 |