Claims
- 1. A speech coder apparatus for encoding input speech signals for transmission over a communication channel at bit rates of 100 bits per second or less, comprising:
- transmitting means responsive to an input speech signal for providing a first and a second output signal for transmission, said transmitting means including:
- continuous speech recognition means having a first output and a second output, said continuous speech recognition means having a memory for storing templates and means responsive to said stored templates to provide at an output, digital signals indicative of recognized words in said input speech signal as those matching said stored templates with said digital signals providing said first output signal and providing at a second output a word end point signal wherein each of said recognized words in said input speech signal has a value of pitch, duration and amplitude; and
- front end processing means having an input and an output, said front end processing means responsive to said input speech signal for providing at said output of said front end processing means, digitized speech samples during a given frame interval including side information encoding means responsive to said digitized speech samples and capable of determining values of pitch, duration and amplitude, said side information encoding means having an input coupled to said second output of said continuous speech recognition means and operably responsive thereto, to provide at an output of said side information encoding means a signal indicative of at least the value of the pitch and duration for each word recognized by said continuous speech recognition means with said output of said side information encoding means providing said second output signal for transmission and wherein said side information encoding means includes means for comparing and determining differences of values of said pitch and duration of each recognized word with values of pitch and duration as stored in a memory associated therewith to provide an output parameter signal indicative of said differences.
- 2. The speech coder apparatus according to claim 1, wherein said continuous speech recognition means employs a dynamic time warping (DTW) algorithm to determine the best match between a word contained in said input speech signal and at least one of said stored templates.
- 3. The speech coder apparatus according to claim 1, wherein said stored templates include word, filler and silence templates.
- 4. The apparatus according to claim 1, wherein said pre-recorded word memory stores values of amplitude for words stored therein, said apparatus including means for determining and means for comparing the amplitude of each word.
- 5. The speech coder apparatus according to claim 1, further including quantizing means responsive to said output parameter signal to provide a quantized output signal and for coding said quantized output signal into one out of Y digital signals, where Y is the number of possible digital signals, whereby each word with a difference in parameter is coded into at least one out of Y digital signals for transmission over said channel.
- 6. The speech coder apparatus according to claim 1, wherein said low bit rate is about 50 bits per second.
- 7. The speech coder apparatus according to claim 1, wherein said first output signal has a maximum rate of about 27 bits per second with said second output signal having a rate of 21 bits per second.
- 8. A speech coder apparatus according to claim 1, including receiving means responsive to said first and second output signals as transmitted to provide at an output a synthesized speech signal, said receiving means including:
- a synthesizer means responsive to said first and second output signals and having a pre-recorded word memory coupled to said synthesizer and having stored therein values of the pitch, duration and amplitude of a library of words as those words that can be recognized by said continuous speech recognition means, said synthesizer having means for processing said first and second output signals in conjunction with said values from said pre-recorded word memory to change the pitch, duration and amplitude of received words in said first output signal according to said second output signal.
- 9. The speech coder apparatus according to claim 1, including a synthesizer means, wherein said synthesizer means includes means for converting received speech signals via said first output signal into N sets of M signals with each signal including said output parameter signal, wherein N and M are positive integers greater than one.
- 10. The speech coder apparatus according to claim 9, wherein there are 240 (M) samples for each set of four sets (N) of coded excitation constituting one frame.
- 11. The speech coder apparatus according to claim 10, including pitch changing means for interpolating said N sets of M signals in said frame into a lesser number of samples in a first mode or a greater number of samples in a second mode.
- 12. The speech coder apparatus according to claim 10, wherein said set of samples includes 60 samples in a 7.5 millisecond interval, with four sets forming a 30 millisecond frame containing said 240 samples for each set.
- 13. The speech coder apparatus according to claim 12, wherein said values of pitch have a pitch frequency, and said pitch frequency is decreased by interpolating said 240 samples into 192 samples and wherein said pitch frequency is increased by interpolating said 240 samples into 288 samples.
- 14. The speech coder apparatus according to claim 9, including means for determining a long term delay for a frame, and duration changing means, said duration changing means responsive to said second output signal and responsive to at least one set of said N sets of M signals to add or delete to said M signals, multiple sets of samples, each set of samples containing a number of samples which is the same as the number of the long term delay, for the frame to increase or decrease the duration of a word.
- 15. The speech coder apparatus according to claim 14, further including means for changing the value of the amplitude of said samples by applying to said samples a synthesized gain factor.
- 16. The speech coder apparatus according to claim 14, including means for interpolating which includes a Lagrange interpolator operative to interpolate a frame of data into a different number of samples.
- 17. The speech coder apparatus according to claim 1, further including pitch slope changing means responsive to said pitch value to change said pitch value by a variable percentage from frame to frame.
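As a rough illustration of the side-information path in claims 1, 5 and 7, the sketch below subtracts the stored pitch and duration of a recognized word from the measured values and codes each difference into one out of Y digital signals. The function names, the uniform quantizer, the step sizes and the choice Y = 8 are assumptions made for illustration; the claims do not specify them. The last lines only add the two rates given in claim 7.

```python
def quantize_difference(measured, stored, step, num_levels):
    """Code (measured - stored) as an integer in [0, num_levels - 1].

    Assumption: a uniform quantizer centred on zero difference. The claims
    only require coding into one out of Y digital signals.
    """
    diff = measured - stored
    centre = num_levels // 2
    code = centre + round(diff / step)
    return max(0, min(num_levels - 1, code))  # clamp to the Y available codes


def encode_side_info(measured, stored, num_levels=8):
    """Return quantized pitch and duration difference codes for one word.

    The dictionary keys and step sizes (5 Hz, 20 ms) are hypothetical.
    """
    return {
        "pitch": quantize_difference(
            measured["pitch_hz"], stored["pitch_hz"], 5.0, num_levels),
        "duration": quantize_difference(
            measured["duration_ms"], stored["duration_ms"], 20.0, num_levels),
    }


# Hypothetical measured values for one recognized word and its stored entry.
measured = {"pitch_hz": 112.0, "duration_ms": 380.0}
stored = {"pitch_hz": 100.0, "duration_ms": 400.0}
codes = encode_side_info(measured, stored)

# Rates stated in claim 7: word codes plus side information.
WORD_RATE_BPS = 27
SIDE_INFO_RATE_BPS = 21
total_rate_bps = WORD_RATE_BPS + SIDE_INFO_RATE_BPS  # 48, i.e. about 50
```

The sum of the two claimed rates, 48 bits per second, is consistent with the "about 50 bits per second" figure of claim 6.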
- 18. A method for coding speech signals for providing compression of such speech signals to permit transmission of speech over a communication channel at bit rates of 100 bits per second or less, comprising the steps of:
- comparing input speech with word templates stored in a memory to provide a coding indicative of recognized word data samples upon a favorable comparison;
- transmitting said coding indicative of recognized word data samples over a first path;
- simultaneously processing said input speech in a processor for each recognized word to provide an output parameter indicative of differences of values of pitch and duration data for each transmitted word with values of pitch and duration as stored in a memory associated therewith;
- transmitting said output parameters indicative of said differences of values of pitch and duration data over a second path;
- receiving said transmitted coding indicative of said recognized word data samples and said output parameters indicative of said differences of values of pitch and duration data;
- synthesizing said received coding indicative of said recognized word data according to words stored in a library memory to provide a replication of said recognized word data; and
- using said transmitted output parameters indicative of said differences of values of pitch and duration data to change the pitch and duration data of said words as stored in said library memory to provide a synthesized pitch and duration for each word.
- 19. The method according to claim 18, wherein said step of comparing includes applying said input speech to a continuous speech recognition unit to match patterns in said input speech with templates stored in a memory using a dynamic time warping (DTW) algorithm.
- 20. The method according to claim 19, wherein said templates stored are speaker dependent and include words, filler and silence templates.
- 21. The method according to claim 20, further including the steps of:
- analyzing said input speech to find word end points; and
- applying said word end points to said processor.
- 22. The method according to claim 21, further including the step of:
- determining a parameter of amplitude for each word and transmitting said parameter prior to the step of synthesizing.
- 23. The method according to claim 18, wherein the step of changing pitch includes interpolating said recognized word data samples into a different number of data samples.
- 24. The method according to claim 23, wherein the step of changing duration includes inserting or deleting groups of samples into the recognized word data samples having a length equal to a given delay.
- 25. The method according to claim 23, wherein the step of interpolating employs the Lagrange interpolation form.
- 26. The method according to claim 25, wherein said step of synthesizing said received data includes:
- converting said recognized word data samples into a linear predictive code for each word; and
- operating on said linear predictive code for each word to change the pitch and duration according to said transmitted median value of pitch and duration data.
- 27. The method according to claim 26, wherein the pitch of recognized data words has a slope, including the step of:
- changing the slope of the pitch of recognized data words by varying the pitch by a variable percentage.
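The template comparison recited in claims 18 and 19 can be pictured with a textbook dynamic time warping routine. This is a minimal sketch, assuming scalar features per frame and isolated-word matching; the claimed continuous speech recognition unit would also have to locate word end points, which is not shown. The template names and values are hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Classic DTW alignment cost between two feature sequences.

    Assumption: each frame is a single number and the local cost is the
    absolute difference; a real recognizer would use feature vectors.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            # Allow a step from the left, from below, or along the diagonal.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_template(features, templates):
    """Return the name of the stored template with the lowest DTW cost."""
    return min(templates, key=lambda name: dtw_distance(features, templates[name]))


# Hypothetical word and silence templates.
templates = {
    "yes": [1, 3, 4, 3, 1],
    "no": [2, 2, 5, 5, 2],
    "silence": [0, 0, 0],
}
# A slower utterance of "yes": same shape, stretched in time.
utterance = [1, 1, 3, 4, 4, 3, 1]
match = best_template(utterance, templates)
```

Because DTW lets one template frame align with several input frames, the stretched utterance still matches the "yes" template at zero cost, which is why time warping suits words spoken at varying speeds.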
- 28. A speech coder apparatus for encoding input speech signals for transmission over a communication channel at low bit rates, comprising:
- transmitting means responsive to an input speech signal for providing a first and a second output signal for transmission, said transmitting means including:
- continuous speech recognition means having a first output and a second output, said continuous speech recognition means having a memory for storing templates and means responsive to said stored templates to provide, at said first output, digital signals indicative of recognized words in said input speech signal as those matching said stored templates, with said digital signals providing said first output signal, and providing, at said second output, a word end point signal wherein each of said recognized words in said input speech signal has a value of pitch, duration and amplitude;
- front end processing means having an input and an output, said front end processing means responsive to said input speech signal for providing at said output of said front end processing means, digitized speech samples during a given frame interval including side information encoding means responsive to said digitized speech samples and capable of determining values of pitch, duration and amplitude, said side information encoding means having an input coupled to said second output of said continuous speech recognition means and operably responsive thereto, to provide at an output of said side information encoding means a signal indicative of at least the value of the pitch and duration for each word recognized by said continuous speech recognition means with said output of said side information encoding means providing said second output signal for transmission and wherein said side information encoding means includes means for comparing and determining differences of values of said pitch and duration of each recognized word with values of pitch and duration as stored in a memory associated therewith to provide an output parameter signal indicative of said differences; and
- receiving means responsive to said first and second output signals as transmitted to provide at an output a synthesized speech signal, said receiving means including:
- a synthesizer means responsive to said first and second output signals and having a pre-recorded word memory coupled to said synthesizer and having stored therein values of the pitch, duration and amplitude of a library of words as those words that can be recognized by said continuous speech recognition means, said synthesizer having means for processing said first and second output signals in conjunction with said values from said pre-recorded word memory to change the pitch, duration and amplitude of received words in said first output signal according to said second output signal, wherein said synthesizer means includes means for converting received speech signals via said first output signal into N sets of M signals with each signal including said parameter signal, wherein N and M are positive integers greater than one, and wherein there are 240 (M) samples for each set of four sets (N) of coded excitation constituting one frame.
- 29. The speech coder apparatus according to claim 28, including pitch changing means for interpolating said N sets of M signals in said frame into a lesser number of samples in a first mode or a greater number of samples in a second mode.
- 30. The speech coder apparatus according to claim 28, wherein said set of samples includes 60 samples in a 7.5 millisecond interval, with four sets forming a 30 millisecond frame containing said 240 samples for each set.
- 31. The speech coder apparatus according to claim 30, wherein said values of pitch have a pitch frequency, and said pitch frequency is decreased by interpolating said 240 samples into 192 samples and wherein said pitch frequency is increased by interpolating said 240 samples into 288 samples.
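The frame figures in claims 30 and 31 imply a sampling rate of 8000 samples per second (60 samples in 7.5 milliseconds) and resampling ratios of 0.8 and 1.2 (192/240 and 288/240). The sketch below resamples a 240-sample frame to either length with a local Lagrange interpolator. It is an illustration under stated assumptions: the cubic order, the end-point alignment and the function name are not taken from the claims.

```python
def lagrange_resample(samples, new_length, order=3):
    """Resample `samples` to `new_length` points by local Lagrange interpolation.

    Assumptions: first and last samples of input and output coincide, and
    each output point uses the order + 1 nearest input samples.
    """
    n = len(samples)
    if n <= order or new_length < 2:
        raise ValueError("need more than `order` samples and new_length >= 2")
    out = []
    for k in range(new_length):
        # Position of output sample k on the input index axis.
        x = k * (n - 1) / (new_length - 1)
        # Window of order + 1 input indices around x, clamped to the frame.
        start = max(0, min(n - order - 1, int(x) - order // 2))
        nodes = range(start, start + order + 1)
        value = 0.0
        for i in nodes:
            weight = 1.0  # Lagrange basis polynomial for node i at x
            for j in nodes:
                if j != i:
                    weight *= (x - j) / (i - j)
            value += samples[i] * weight
        out.append(value)
    return out


SAMPLES_PER_SET = 60
SET_DURATION_MS = 7.5
SETS_PER_FRAME = 4
sample_rate_hz = SAMPLES_PER_SET / (SET_DURATION_MS / 1000.0)  # 8000.0
frame_length = SAMPLES_PER_SET * SETS_PER_FRAME                 # 240

frame = [float(i) for i in range(frame_length)]  # a ramp as stand-in data
shorter = lagrange_resample(frame, 192)
longer = lagrange_resample(frame, 288)
```

A ramp is used as stand-in data because a cubic Lagrange interpolator reproduces a straight line exactly, which makes the result easy to check by hand.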
- 32. A speech coder apparatus for encoding input speech signals for transmission over a communication channel at low bit rates, comprising:
- transmitting means responsive to an input speech signal for providing a first and a second output signal for transmission, said transmitting means including:
- continuous speech recognition means having a first output and a second output, said continuous speech recognition means having a memory for storing templates and means responsive to said stored templates to provide, at said first output, digital signals indicative of recognized words in said input speech signal as those matching said stored templates, with said digital signals providing said first output signal, and providing, at said second output, a word end point signal wherein each of said recognized words in said input speech signal has a value of pitch, duration and amplitude;
- front end processing means having an input and an output, said front end processing means responsive to said input speech signal for providing at said output of said front end processing means, digitized speech samples during a given frame interval including side information encoding means responsive to said digitized speech samples and capable of determining values of pitch, duration and amplitude, said side information encoding means having an input coupled to said second output of said continuous speech recognition means and operably responsive thereto, to provide at an output of said side information encoding means a signal indicative of at least the value of the pitch and duration for each word recognized by said continuous speech recognition means with said output of said side information encoding means providing said second output signal for transmission and wherein said side information encoding means includes means for comparing and determining differences of values of said pitch and duration of each recognized word with values of pitch and duration as stored in a memory associated therewith to provide an output parameter signal indicative of said differences;
- receiving means responsive to said first and second output signals as transmitted to provide at an output a synthesized speech signal, said receiving means including:
- a synthesizer means responsive to said first and second output signals and having a pre-recorded word memory coupled to said synthesizer and having stored therein values of the pitch, duration and amplitude of a library of words as those words that can be recognized by said continuous speech recognition means, said synthesizer having means for processing said first and second output signals in conjunction with said values from said pre-recorded word memory to change the pitch, duration and amplitude of received words in said first output signal according to said second output signal, wherein said synthesizer means includes means for converting received speech signals via said first output signal into N sets of M signals with each signal including said parameter signal, wherein N and M are positive integers greater than one; and
- means for determining a long-term delay for a frame, and duration changing means, said duration changing means responsive to said second output signal and responsive to at least one set of N sets of M signals to add or delete to said M signals, multiple sets of samples, each set of samples containing a number of samples which is the same as the number of the long term delay, for the frame to increase or decrease the duration of a word.
- 33. The speech coder apparatus according to claim 32, further including means for changing the value of the amplitude of said samples by applying to said samples a synthesized gain factor.
- 34. The speech coder apparatus according to claim 32, including means for interpolating which includes a Lagrange interpolator operative to interpolate a frame of data into a different number of samples.
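The duration and amplitude changes of claims 32 and 33 can be sketched as adding or deleting whole groups of samples, each group as long as the long term delay, and then scaling by a gain factor. This is a minimal sketch under assumptions: the claims do not say where in the frame the groups are added or removed, so the end of the frame is used here, and the function names are hypothetical.

```python
def change_duration(samples, long_term_delay, num_groups):
    """Lengthen (num_groups > 0) or shorten (num_groups < 0) a frame.

    Each added or deleted group holds `long_term_delay` samples.
    Assumption: groups are repeated from, or removed at, the frame end.
    """
    if not 0 < long_term_delay <= len(samples):
        raise ValueError("long_term_delay must be in 1..len(samples)")
    if num_groups >= 0:
        last_group = samples[-long_term_delay:]
        return samples + last_group * num_groups
    to_remove = -num_groups * long_term_delay
    if to_remove >= len(samples):
        raise ValueError("cannot delete the whole frame")
    return samples[:-to_remove]


def apply_gain(samples, gain):
    """Scale every sample by a synthesized gain factor."""
    return [s * gain for s in samples]


frame = list(range(240))                      # stand-in for one 240-sample frame
stretched = change_duration(frame, 40, 2)     # two extra groups of 40 samples
shortened = change_duration(frame, 40, -1)    # one group of 40 samples removed
quieter = apply_gain([1.0, -2.0], 0.5)
```

Working in units of the long term delay keeps each inserted or deleted group aligned with one pitch period of the excitation, which is presumably why the claims tie the group length to that delay.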
Government Interests
The United States Government has rights in this invention pursuant to RADC Contract F30602-89-C-0118 awarded by the Department of the Air Force.
US Referenced Citations (3)
Number | Name | Date | Kind
4720863 | Li et al. | Jan 1988 |
4975956 | Liu et al. | Dec 1990 |
4975957 | Ichikawa et al. | Dec 1990 |