Claims
- 1. A method for determining the mouth features for a speaking character, comprising the steps of:
- sampling a time-domain audio signal;
- separating the time-domain audio signal into a plurality of frames;
- applying a window to each of the plurality of frames; and
- applying a linear predictive coding (LPC) technique to each of the plurality of frames to achieve a plurality of LPC coefficients and a gain for each of the plurality of frames, whereby the LPC coefficients and gain for each frame are used to determine the mouth features for the character on a frame-by-frame basis.
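The analysis chain in claim 1 (sample, frame, window, LPC per frame) can be sketched briefly. The frame length, LPC order, and Hamming window below are illustrative choices rather than ones the claim fixes, and the Levinson-Durbin recursion stands in for the otherwise unspecified LPC technique:

```python
import math

def hamming(n):
    # Hamming window; tapers frame edges to avoid discontinuities
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def lpc(frame, order):
    """Levinson-Durbin recursion over the frame's autocorrelation.
    Returns (predictor coefficients a_1..a_p, gain)."""
    r = [sum(frame[i] * frame[i - k] for i in range(k, len(frame)))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)
    e = max(r[0], 1e-12)                    # guard against silent frames
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e = max(e * (1.0 - k * k), 1e-12)   # floor guards numerical underflow
    return a[1:], math.sqrt(e)              # gain ~ prediction-error amplitude

def analyze(signal, frame_len=160, order=10):
    """Frame the sampled signal, window each frame, and run LPC per frame."""
    w = hamming(frame_len)
    return [lpc([s * wi for s, wi in zip(signal[i:i + frame_len], w)], order)
            for i in range(0, len(signal) - frame_len + 1, frame_len)]
```

Each element of the result pairs the LPC coefficients with a gain, which downstream steps use on a frame-by-frame basis.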
- 2. The method recited in claim 1, further comprising the step of:
- transmitting the LPC coefficients and the gain for each of the frames to the character.
- 3. The method recited in claim 1, further comprising the steps of:
- mapping the plurality of LPC coefficients to the Cepstral domain for each frame to obtain a plurality of Cepstral coefficients for each frame;
- vector quantizing the Cepstral coefficients to obtain a vector quantization result corresponding to a lip position of the character; and
- applying the vector quantization result and the gain for each frame to a mapping function to obtain the mouth features of the character for each frame.
- 4. The method recited in claim 3 wherein the mapping function is defined by a lookup table.
- 5. The method recited in claim 3 further comprising the steps of:
- before applying the vector quantization result and the gain for each frame to the mapping function, determining a plurality of local maxima for gain and a plurality of local minima for gain within a predetermined number of frames;
- discarding local maxima which occur too close to the last local minimum;
- discarding local minima which occur too close to the last local maximum;
- adjusting the gain for a frame containing one of the local minima to equal a minimum gain level;
- adjusting the gain for a frame containing one of the local maxima to equal a maximum gain level;
- averaging the distance between the local minima and local maxima; and
- scaling the gain of all of the frames between the range of minimum gain level to maximum gain level.
- 6. The method recited in claim 5 wherein the minimum gain level corresponds to a minimum mouth opening for the character and the maximum gain level corresponds to a maximum mouth opening for the character.
- 7. The method recited in claim 5 further comprising the step of determining a minimum distance between local minima.
- 8. The method recited in claim 5 further comprising the step of causing the distance between local maxima to be averaged between the enclosing local minima.
- 9. The method recited in claim 5 further comprising the step of scaling the gain between the range of fully closed to fully open.
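The gain-shaping steps of claims 5 through 9 amount to finding local extrema of the per-frame gain, discarding extrema that crowd the last opposite extremum, pinning minima and maxima to the fully-closed and fully-open levels, and rescaling the remaining frames. A minimal sketch, with an assumed minimum spacing of three frames between opposite extrema:

```python
def shape_gains(gains, min_gap=3, g_min=0.0, g_max=1.0):
    """Pin local gain minima/maxima to the minimum/maximum gain levels
    and rescale all frames into [g_min, g_max]."""
    n = len(gains)
    extrema = []  # (frame index, 'min' or 'max')
    for i in range(1, n - 1):
        if gains[i] > gains[i - 1] and gains[i] > gains[i + 1]:
            kind = 'max'
        elif gains[i] < gains[i - 1] and gains[i] < gains[i + 1]:
            kind = 'min'
        else:
            continue
        # discard extrema that follow the opposite kind too closely
        if extrema and extrema[-1][1] != kind and i - extrema[-1][0] < min_gap:
            continue
        extrema.append((i, kind))
    out = gains[:]
    for i, kind in extrema:
        out[i] = g_min if kind == 'min' else g_max
    lo, hi = min(out), max(out)
    if hi > lo:  # scale everything between the min and max gain levels
        out = [g_min + (g - lo) * (g_max - g_min) / (hi - lo) for g in out]
    return out
```

With the minimum gain level mapped to a fully closed mouth and the maximum to a fully open one (claim 6), this forces the mouth to visibly close and open even when the raw gain never reaches its nominal extremes.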
- 10. A computer-readable medium having computer-readable instructions for performing the steps recited in claim 5.
- 11. A computer-implemented method for generating mouth features of a character, comprising the steps of:
- sampling a time-domain voice signal;
- separating the time-domain voice signal into a plurality of frames;
- applying a windowing technique to each frame;
- applying a linear predictive coding (LPC) technique to each of the plurality of frames to generate a plurality of LPC coefficients and a gain for each frame;
- mapping the plurality of LPC coefficients to the Cepstral domain to obtain a plurality of Cepstral coefficients for each frame;
- vector quantizing the Cepstral coefficients to obtain a lip position for each frame;
- determining a local maximum of the gain and a local minimum of the gain within a predetermined number of frames;
- adjusting the gain for the frame containing the local minimum to equal a minimum gain level;
- adjusting the gain for the frame containing the local maximum to equal a maximum gain level; and
- applying the lip position and the gain for each frame to an empirically derived mapping function to obtain the mouth features of the character for each frame.
- 12. The computer-implemented method recited in claim 11 wherein the step of sampling the time-domain voice signal comprises digitally sampling the time-domain voice signal.
- 13. The computer-implemented method recited in claim 11 wherein the step of applying a windowing technique to each of the plurality of frames comprises the step of applying a Hamming window to each frame.
- 14. The computer-implemented method recited in claim 11 wherein the character is a computer-animated character, further comprising the steps of:
- reproducing the time-domain voice signal through a speaker; and
- displaying on a display device the mouth features of the computer-animated character in unison with reproduction of the time-domain voice signal via the speaker.
- 15. The computer-implemented method recited in claim 11 wherein the character is a mechanical character having a speaker, a pair of lips, and at least one motor for controlling the position of the lips, further comprising the steps of:
- audibly broadcasting the time-domain voice signal through the speaker; and
- activating each motor to move the pair of lips in unison with the time-domain voice signal such that, for each frame of the time-domain voice signal, the pair of lips corresponds to the mouth features obtained through the empirically derived mapping function for the frame of the time-domain voice signal that is being audibly broadcast.
- 16. A computer system for synchronizing the mouth features of a speaking performer to a voice signal transmitted by the performer, comprising:
- a processor; and
- a memory storage device for storing a program module;
- the processor, responsive to instructions from the program module, being operative to:
- sample the voice signal;
- break the voice signal into a number of frames;
- apply a windowing technique to each of the frames;
- apply a linear predictive coding technique to each frame to obtain a number of reflection coefficients and a gain coefficient for each frame;
- transform the reflection coefficients into Cepstral coefficients;
- determine a lip position for each frame that corresponds to the Cepstral coefficients for each frame;
- adjust the gain of certain frames of the voice signal so that a mouth of the performer fully opens and fully closes within a predetermined number of frames; and
- determine the mouth features corresponding to each frame using the gain and lip position for each frame.
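Claim 16 speaks of reflection coefficients rather than predictor coefficients; the two are interchangeable, since the step-up recursion converts one to the other. A sketch of that conversion (the reflection coefficients are the values the Levinson-Durbin recursion produces internally):

```python
def reflection_to_lpc(refl):
    """Step-up recursion: reflection coefficients k_1..k_p to LPC
    predictor coefficients a_1..a_p."""
    a = []
    for i, k in enumerate(refl, start=1):
        # a_j <- a_j - k_i * a_{i-j} for j = 1..i-1, then append a_i = k_i
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
    return a
```

The resulting predictor coefficients can then feed the same cepstral transform used in the method claims.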
- 17. The computer system of claim 16 wherein the windowing technique applies a window to each frame to avoid discontinuities of each frame.
- 18. The computer system of claim 16 wherein the processor is further operative to adjust the gain of certain frames by:
- determining a local maximum for gain and a local minimum for gain for a predetermined number of frames of the voice signal;
- adjusting the gain for the frames containing a local minimum for gain to equal a minimum gain; and
- adjusting the gain for the frames containing a local maximum for gain to equal a maximum gain.
- 19. The computer system of claim 18 wherein the minimum gain corresponds to the mouth of the performer being fully closed and the maximum gain corresponds to the mouth of the performer being fully open.
- 20. The computer system of claim 16 wherein the processor is further operative to determine the mouth features corresponding to each frame by:
- applying the gain and lip position for each frame to a mapping function to obtain data commands corresponding to the mouth features of the performer for each frame;
- receiving data commands based upon the mapping function; and
- transmitting the data commands to the performer.
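The mapping function of claim 20 can be as simple as a lookup table keyed on the quantized lip position and gain, with the table's entries being the data commands transmitted to the performer. The table contents, labels, and command names below are purely illustrative:

```python
# Hypothetical lookup table: (lip-position label, gain bucket) -> data command
MOUTH_TABLE = {
    ("open_wide", 0): "CMD_CLOSED", ("open_wide", 1): "CMD_HALF_OPEN",
    ("open_wide", 2): "CMD_FULL_OPEN",
    ("rounded", 0): "CMD_CLOSED", ("rounded", 1): "CMD_HALF_ROUND",
    ("rounded", 2): "CMD_FULL_ROUND",
}

def mouth_command(lip_position, gain, n_buckets=3):
    """Map a frame's (lip position, scaled gain in [0, 1]) to the data
    command that drives the performer's mouth."""
    bucket = min(int(gain * n_buckets), n_buckets - 1)  # quantize the gain
    return MOUTH_TABLE[(lip_position, bucket)]
```

One command is emitted per frame, so transmitting the commands in frame order keeps the mouth in step with the broadcast audio.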
REFERENCE TO RELATED APPLICATIONS
This application is related to the subject matter disclosed in U.S. application Ser. Nos. 08/794,921 entitled "A SYSTEM AND METHOD FOR CONTROLLING A REMOTE DEVICE" filed Feb. 4, 1997, 08/795,698 entitled "SYSTEM AND METHOD FOR SUBSTITUTING AN ANIMATED CHARACTER WHEN A REMOTE CONTROL PHYSICAL CHARACTER IS UNAVAILABLE" filed Feb. 4, 1997, and 08/795,710 entitled "PROTOCOL FOR A WIRELESS CONTROL SYSTEM" filed Feb. 4, 1997, which are assigned to a common assignee and which are incorporated herein by reference.
US Referenced Citations (26)
Foreign Referenced Citations (1)
Number | Date | Country
WO9110490 | Jul 1991 | WO
Non-Patent Literature Citations (1)
Rabiner et al., "Linear Predictive Coding of Speech," Chap. 8, Digital Processing of Speech Signals, pp. 396-461, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1978.