Claims
- 1. A speech signal processing system comprising:
- an input terminal for receiving successive sample values of a speech waveform S(n) at successive time points n, where n=0, 1, 2, . . . ;
- inverse-filter means connected to said input terminal for obtaining successive sample values of a prediction residual waveform e(n) by removing a short-time correlation from the speech waveform S(n);
- phase-equalizing filter means connected to said input terminal for receiving the speech waveform S(n) therefrom and producing successive samples of a phase-equalized speech waveform Sp(n) in the time domain by zero-phasing a prediction residual waveform component in the speech waveform in accordance with successive sets of M+1 phase-equalizing filter coefficients h(m,n) supplied thereto as filter coefficients thereof, where m=0, 1, 2, . . . , M, and M is a positive integer; and
- filter coefficient determining means connected to the output of said inverse-filter means for determining said phase-equalizing filter coefficients h(m,n) on the basis of said prediction residual waveform e(n), said filter coefficient determining means including voiced/unvoiced sound discriminator means connected to the output of said inverse-filter means for discriminating whether said speech waveform is a voiced sound or an unvoiced sound based on whether a computed value of an auto-correlation function on said prediction residual waveform during an analysis window of a length N at said filter coefficient determining means is above or below a threshold value, pitch position detecting means connected to the outputs of said inverse-filter means and said voiced/unvoiced sound discriminator means for detecting, when said speech waveform is discriminated as a voiced sound, pitch positions n.sub.l from said prediction residual waveform e(n), and filter coefficient computing means connected to the outputs of said inverse-filter means, said voiced/unvoiced sound discriminator means and said pitch position detecting means, respectively, for computing, when said speech waveform is discriminated as a voiced sound, a set of the M+1 phase-equalizing filter coefficients h(m,n) for a time point n of each pitch position n=n.sub.l by solving the following simultaneous equations given for k=0, 1, . . . M, ##EQU26## where L is the number of the pitch positions n.sub.l in the analysis window and V(m) is an auto-correlation function of said prediction residual waveform e(n) given by: ##EQU27## and for setting, when said speech waveform is discriminated as an unvoiced sound, a particular one order of coefficient of said phase-equalizing filter coefficients to a certain value and the other orders thereof to zero;
- the output of said filter coefficient determining means being connected to said phase-equalizing filter means so that successive sets of said phase-equalizing filter coefficients h(m,n.sub.l) determined by said filter coefficient determining means are supplied to said phase-equalizing filter means as the filter coefficients thereof, whereby said phase-equalizing filter means outputs the phase-equalized speech waveform Sp(n) as the output of said system representing the input speech waveform.
- 2. The speech signal processing system according to claim 1 wherein the analysis window length N is selected comparable to a pitch period so that the number L of said pitch positions n.sub.l is one, and said filter coefficient computing means computes filter coefficients h*(m,n.sub.l) instead of the coefficients h(m,n.sub.l) when the speech waveform is discriminated as a voiced sound by said voiced/unvoiced sound discriminating means, where ##EQU28## and e(n.sub.l +M/2-m) denotes a sample value of said prediction residual waveform at the pitch position n.sub.l.
- 3. The speech signal processing system according to claim 1 or 2 wherein said pitch position detecting means comprises a second phase equalizing filter means connected to the output of said inverse-filter means for phase-equalizing said prediction residual waveform e(n) from said inverse-filter means to produce a phase-equalized prediction residual waveform ep(n), filter coefficients of said second phase-equalizing filter means being controlled by the phase-equalizing filter coefficients determined by said filter coefficient determining means, and amplitude comparing means connected to the output of said second phase-equalizing means for detecting, as the pitch positions, time points at which relative amplitude values of the phase-equalized prediction residual waveform ep(n) within the analysis window are over a predetermined value.
- 4. The speech signal processing system according to claim 3 wherein said system further comprises:
- pulse-processing means for detecting an amplitude m.sub.l of said phase-equalized prediction residual waveform ep(n) at the pitch position n.sub.l obtained by said pitch position detecting means; and
- quantizing means connected to the output of said pulse-processing means for quantizing said detected pulse amplitude and producing quantized pulse amplitude c(n);
- the quantized pulse amplitude c(n), the pitch position n.sub.l, a voiced or unvoiced sound discriminating value from said discriminator means and filter coefficients a(k) of said inverse-filter means being output as the output of the system representing the input speech signal.
- 5. The speech signal processing system according to claim 4 wherein said quantizing means comprises quantization step computing means connected to the output of said phase-equalizing filter means for computing the electric power v of said phase-equalized prediction residual waveform ep(n) supplied from said phase-equalizing filter means and a quantization step size from the computed electric power v, and adaptively varying a quantization step size of said quantizing means in accordance with the computed step size, the electric power of said phase-equalized prediction residual waveform being output as part of the output of said system representing the input speech waveform.
- 6. The speech signal processing system according to claim 1 or 2 wherein said filter coefficient determining means comprises filter coefficient interpolating means connected to the output of said filter coefficient computing means for interpolating the phase-equalizing filter coefficients for a time point between the computations of two successive sets of the phase-equalizing filter coefficients by said filter coefficient computing means so that the output of said filter coefficient determining means includes the interpolated phase-equalizing filter coefficients.
- 7. The speech signal processing system according to claim 1 or 2 wherein said system includes coding-processing means connected to the output of said phase-equalizing filter means for coding said phase-equalized speech waveform and outputting the coded phase-equalized speech waveform as the output of said system representing the input speech waveform.
- 8. The speech signal processing system according to claim 7 wherein said coding-processing means comprises:
- a second phase-equalizing filter means connected to the output of said inverse-filter means for receiving therefrom the prediction residual waveform e(n) and producing a phase-equalized prediction residual waveform ep(n) in accordance with the phase-equalizing filter coefficients h(m,n.sub.l) supplied from said filter coefficient determining means as filter coefficients of said second phase-equalizing filter means;
- tree code generating means connected to the output of said second phase-equalizing filter means for producing a series of sample values q(n) along a path of successive branches in a tree of codes defined in accordance with quantizing bit numbers R(n) for quantization of the phase-equalized prediction residual waveform ep(n), said path of successive branches being selected in accordance with a sequence of tree codes c(n);
- prediction filter means connected to the output of said tree code generating means for receiving therefrom the sample values q(n) and producing a local decoded speech waveform Sp(n), said prediction filter means being controlled by the same filter coefficients as those of said inverse-filter means;
- difference detecting means connected to the outputs of said first mentioned phase-equalizing filter means and said second phase-equalizing filter means for detecting the difference between said phase-equalized speech waveform Sp(n) and the local decoded speech waveform Sp(n); and
- code sequence optimizing means connected to said tree code generating means for generating and supplying thereto sequences of tree codes, said code sequence optimizing means being connected to the output of said difference detecting means for receiving therefrom the detected difference and searching an optimum sequence of the tree codes which minimizes the detected difference produced by said difference detecting means;
- the optimum code sequence c(n) obtained by said code sequence optimizing means and the filter coefficients for said inverse-filter means being outputted as the coded phase-equalized speech waveform.
- 9. The speech signal processing system according to claim 8 wherein said tree code generating means comprises:
- subinterval setting means connected to the output of said second phase-equalizing filter means for receiving therefrom the phase-equalized prediction residual waveform ep(n) and determining an energy-concentrated position Td and a pitch period Tp of the phase-equalized prediction residual waveform and corresponding residual power u.sub.i of each subinterval within the pitch period from the phase-equalized prediction residual waveform;
- bit allocating means connected to the output of said subinterval setting means for receiving therefrom the residual power u.sub.i and computing the quantizing bit number R(n) as the number of branches at each node in said tree code based on the residual power u.sub.i, said number of branches representing the number of bits to be allocated to encode samples of the phase-equalized prediction residual waveform in the corresponding subinterval; and
- step size computing means connected to the output of said subinterval setting means for receiving therefrom the residual power u.sub.i and computing, based on the residual power, a quantization step size Δ(n) for quantizing the phase-equalized prediction residual waveform;
- said tree of codes being defined by the computed number of branches R(n) at each node of the tree and said tree code generating means being operative to produce the sample value q(n) as a decoded value from the computed step size Δ(n) and the tree code c(n) on each selected branch, and the pitch period Tp, the pitch position Td and the residual power u.sub.i being outputted in codes from said coding-processing means as the output of said system representing the input speech waveform.
- 10. The speech signal processing system according to claim 7 wherein said coding-processing means comprises:
- multi-pulse coding means connected to said filter coefficient determining means for determining pulse positions t.sub.i and pulse amplitudes m.sub.i with respect to the pitch position n.sub.l received from said filter coefficient determining means;
- multi-pulse generating means connected to the output of said multi-pulse coding means for receiving therefrom the pulse positions t.sub.i and the pulse amplitudes m.sub.i and generating a multi-pulse signal e(n) composed of a train of pulses having the amplitudes m.sub.i at the respective pulse positions t.sub.i ;
- prediction filter means connected to the output of said multi-pulse coding means for producing a local decoded waveform Sp(n) by passing said multi-pulse signal through said prediction filter means while said prediction filter means is controlled by the same filter coefficients as those for said inverse-filter means; and
- difference detecting means connected to the outputs of said first mentioned phase-equalizing filter means and said second phase-equalizing filter means for receiving therefrom said phase-equalized speech waveform Sp(n) and said local decoded waveform Sp(n) and detecting the difference therebetween;
- the output of said difference detecting means being connected to said multi-pulse coding means to supply thereto the detected difference, and said multi-pulse coding means determining the pulse positions t.sub.i and the pulse amplitudes m.sub.i so as to minimize the detected difference and being operative to output, as part of the coded speech waveform, the determined pulse positions t.sub.i and pulse amplitudes m.sub.i along with the filter coefficients a(k).
- 11. A speech signal processing system comprising:
- an input terminal for receiving successive sample values of a speech waveform S(n) at successive time points n, where n=0, 1, 2, . . . ;
- inverse-filter means connected to said input terminal for obtaining successive sample values of a prediction residual waveform e(n) by removing a short-time correlation from the speech waveform S(n);
- phase-equalizing filter means connected to the output of said inverse-filter means for obtaining a phase-equalized residual waveform ep(n) in the time domain by zero-phasing the prediction residual waveform e(n) from said inverse-filter means in accordance with successive sets of M+1 phase-equalizing filter coefficients h(m,n) supplied thereto as filter coefficients thereof, where m=0, 1, 2, . . . , M and M is a positive integer; and
- filter coefficient determining means connected to the output of said inverse-filter means for determining said phase-equalizing filter coefficients h(m,n) on the basis of said prediction residual waveform e(n), said filter coefficient determining means including voiced/unvoiced sound discriminator means connected to the output of said inverse-filter means for discriminating whether said speech waveform is a voiced sound or unvoiced sound based on whether a computed value of an auto-correlation function on said prediction residual waveform during an analysis window of a length N at said filter coefficient determining means is above or below a threshold value, pitch position detecting means connected to the outputs of said inverse-filter means and said voiced/unvoiced sound discriminator means for detecting, when said speech waveform is discriminated as a voiced sound, pitch positions n.sub.l from said prediction residual waveform e(n), and filter coefficient computing means connected to the outputs of said inverse-filter means, said voiced/unvoiced sound discriminator means and said pitch position detecting means, respectively, for computing, when said speech waveform is discriminated as a voiced sound, a set of the M+1 phase-equalizing filter coefficients h(m,n) for a time point n of each pitch position n=n.sub.l by solving the following simultaneous equations given for k=0, 1, . . . M, ##EQU29## where L is the number of the pitch positions n.sub.l in the analysis window and V(m) is an auto-correlation function of said prediction residual waveform e(n) given by: ##EQU30## and for setting, when said speech waveform is discriminated as an unvoiced sound, a particular one order of coefficient of said phase-equalizing filter coefficients to a certain value and the other orders thereof to zero;
- the output of said filter coefficient determining means being connected to said phase-equalizing means so that successive sets of said phase-equalizing filter coefficients h(m,n.sub.l) determined by said filter coefficient determining means are supplied to said phase-equalizing filter means as filter coefficients thereof, whereby said phase-equalizing filter means outputs the phase-equalized prediction residual waveform ep(n) as the output of said system representing the input speech waveform.
- 12. The speech signal processing system according to claim 11 wherein the analysis window length N is selected comparable to a pitch period so that the number L of said pitch positions n.sub.l is one, and said filter coefficient computing means computes filter coefficients h*(m,n.sub.l) instead of the coefficients h(m,n.sub.l) when the speech waveform is discriminated as a voiced sound by said voiced/unvoiced sound discriminating means, where ##EQU31## and e(n.sub.l +M/2-m) denotes a sample value of said prediction residual waveform at the pitch position n.sub.l.
- 13. The speech signal processing system according to claim 11 or 12 wherein said pitch position detecting means comprises a second phase equalizing filter means connected to the output of said inverse-filter means for phase-equalizing the prediction residual waveform e(n) from said inverse filter means to produce a phase-equalized prediction residual waveform ep(n), filter coefficients of said second phase-equalizing filter means being controlled by the phase-equalizing filter coefficients determined by said filter coefficient determining means, and amplitude comparing means connected to the output of said second phase equalizing filter means for detecting, as the pitch positions, time points at which relative amplitude values of the phase-equalized prediction residual waveform ep(n) within the analysis window are over a predetermined value.
- 14. The speech signal processing system according to claim 11 or 12 wherein said filter coefficient determining means comprises filter coefficient interpolating means connected to the output of said filter coefficient computing means for interpolating the phase-equalizing filter coefficients for a time point between the computations of two successive sets of the phase-equalizing filter coefficients by said filter coefficient computing means so that the output of said filter coefficient determining means includes the interpolated phase-equalizing filter coefficients.
- 15. The speech signal processing system according to claim 11 wherein said system further comprises coding-processing means connected to the output of said phase-equalizing filter means for coding the phase-equalized prediction residual waveform and outputting the coded phase-equalized prediction residual waveform as the output of said system representing the input speech waveform.
- 16. The speech signal processing system according to claim 15 wherein said coding processing means includes energy-concentrated portion coding means connected to the output of said phase-equalizing means for detecting a position t.sub.i of each energy-concentrated portion in said phase-equalized residual waveform and coding the energy-concentrated portion to produce a code Pc representing the energy concentrated portion, the code of the energy-concentrated portion Pc and a code showing the energy-concentrated position t.sub.i being outputted along with codes of said filter coefficients a(k) of said inverse-filter means as the output of said system representing the input speech waveform.
- 17. The speech signal processing system according to claim 16 wherein said energy-concentrated portion coding means comprises pulse pattern generating means for reproducing a pulse pattern signal P(n) composed of a train of the energy-concentrated portions each centered at the respective energy-concentrated positions t.sub.i of said phase-equalized prediction residual waveform, and said coding processing means further comprises difference signal coding means connected to the output of said energy-concentrated portion coding means for generating a difference code c(n) representing a difference between said pulse pattern signal P(n) and said phase-equalized prediction residual waveform, said difference code c(n) being outputted as part of the output of said system representing the input speech waveform.
- 18. The speech signal processing system according to claim 17 wherein said pulse pattern generating means produces the pulse pattern signal P(n) by vector-quantizing a waveform of plural samples of each said energy-concentrated portion.
- 19. The speech signal processing system according to claim 17 wherein said difference signal coding means comprises subtraction means connected to the outputs of said phase-equalized filter means and said pulse pattern generating means for receiving the phase-equalized prediction residual waveform ep(n) and the pulse pattern signal P(n) and producing a difference therebetween as a difference signal V(n), and spectrum quantizing means connected to the output of said subtraction means for quantizing frequency components of said difference signal V(n) to produce a spectrum envelope code as the difference code c(n) representing said difference signal.
- 20. The speech signal processing system according to claim 17 wherein said difference signal coding means comprises vector code generating means for producing said difference code c(n) and a decoded vector value Vc(n) based on said difference code c(n), adder means connected to the outputs of said pulse pattern generating means and said vector code generating means for adding said pulse pattern signal P(n) and said decoded vector value Vc(n) received therefrom to produce a local decoded residual waveform ep(n), first prediction filter means connected to the output of said adder means for receiving therefrom the local decoded residual waveform ep(n) and producing a local decoded speech waveform Sp(n) by controlling filter coefficients of said prediction filter means with the same filter coefficients as those for said inverse-filter means, second prediction filter means connected to the output of said phase-equalizing filter means for regenerating a phase-equalized speech waveform Sp(n) from said phase-equalized prediction residual waveform ep(n), subtraction means connected to the outputs of said first and second prediction filter means for producing a difference between said regenerated phase-equalized speech waveform Sp(n) and said local decoded speech waveform Sp(n), and path search means connected to receive the difference and to control successive selections of said difference codes in said vector code generating means so that said difference becomes minimum.
- 21. The speech signal processing system according to claim 17 wherein said difference signal coding means comprises means for determining as the difference code c(n) a code of an optimum vector-tree value Vc(n) representing the difference between said phase-equalized residual waveform and said pulse pattern signal P(n).
- 22. The speech signal processing system according to claim 17 wherein said difference signal coding means comprises means for quantizing frequency components of the difference between said phase-equalized residual waveform and said pulse pattern signal and outputting the quantized results as the difference code c(n).
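The signal path recited in claims 1, 2 and 11 can be illustrated by a minimal Python sketch. It is not the claimed implementation. The function names, the sign convention of the inverse filter, the normalized-peak voicing test, the choice of the centre tap M/2 for the unvoiced case, and the energy normalization of the single-pitch coefficients are all assumptions made for illustration, because equations EQU26 to EQU31 are not reproduced in this text.

```python
import math

def inverse_filter(s, a):
    """Residual e(n) = s(n) + sum_k a[k-1] * s(n-k); the sign convention is assumed."""
    e = []
    for n in range(len(s)):
        acc = s[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc += a[k - 1] * s[n - k]
        e.append(acc)
    return e

def autocorr(e, lag):
    """Auto-correlation of the residual over one analysis window."""
    return sum(e[n] * e[n + lag] for n in range(len(e) - lag))

def is_voiced(e, min_lag, max_lag, threshold):
    """Assumed test: peak normalized auto-correlation over candidate pitch lags."""
    v0 = autocorr(e, 0)
    if v0 == 0.0:
        return False
    peak = max(autocorr(e, lag) for lag in range(min_lag, max_lag + 1))
    return peak / v0 > threshold

def unvoiced_coeffs(M):
    """One coefficient set to a fixed value, the rest zero (here a pure delay of M/2)."""
    h = [0.0] * (M + 1)
    h[M // 2] = 1.0
    return h

def single_pitch_coeffs(e, n_l, M):
    """Time-reversed residual around pitch position n_l, energy-normalized (assumed form)."""
    seg = []
    for m in range(M + 1):
        i = n_l + M // 2 - m
        seg.append(e[i] if 0 <= i < len(e) else 0.0)
    norm = math.sqrt(sum(x * x for x in seg))
    return [x / norm for x in seg] if norm > 0.0 else unvoiced_coeffs(M)

def fir(x, h):
    """Apply the M+1 tap filter h to the waveform x."""
    return [sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(x))]
```

Under these assumptions, filtering a residual that holds one dispersed pulse with `single_pitch_coeffs` concentrates its energy into a peak at n_l + M/2, while `unvoiced_coeffs` only delays the waveform by M/2 samples.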
Priority Claims (2)
Number | Date | Country | Kind |
59-53757 | Mar 1984 | JPX | |
59-173903 | Aug 1984 | JPX | |
Parent Case Info
This application is a continuation of Ser. No. 712,811, filed on Mar. 18, 1985, now abandoned.
US Referenced Citations (6)
Non-Patent Literature Citations (2)
Entry |
"On Synthesizing Natural Sounding Speech by Linear Prediction", by B. S. Atal, et al., ICASSP 79, Apr. '79, pp. 44-47. |
"A Harmonic Deviations Linear Prediction Vocoder for Improved Narrowband Speech Transmission," by V. R. Vishwanathan, ICASSP 82, pp. 610-613, May '82. |
Continuations (1)
| Number | Date | Country |
Parent | 712811 | Mar 1985 | |