Claims
- 1. A method of compressing speech data, comprising:
parsing an input waveform into pitch segments; determining principal components of at least one pitch segment; sending a subset of the determined principal components during an initial transmission period; and sending coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period.
- 2. The method of claim 1 wherein sending a subset of the principal components comprises sending six principal components.
- 3. The method of claim 1 wherein determining comprises:
determining the number of pitch periods; and generating a correlation matrix.
- 4. The method of claim 1 wherein determining comprises:
ordering the principal components.
- 5. The method of claim 1, further comprising:
determining coefficients for each pitch period.
- 6. The method of claim 1, further comprising:
determining if the principal components are still valid.
- 7. The method of claim 6 wherein determining if the principal components are still valid comprises:
determining if a pitch segment exceeds a predetermined threshold.
- 8. The method of claim 7 wherein the predetermined threshold is a measure of a distance from a pitch segment to a centroid determined by the principal components.
- 9. The method of claim 7, further comprising:
selecting a new set of principal components when the predetermined threshold is exceeded.
- 10. The method of claim 1, further comprising:
reconstructing the input waveform.
- 11. The method of claim 10 wherein reconstructing comprises:
scaling the principal components by the coefficients for each pitch segment to form scaled components; and summing the scaled components.
- 12. The method of claim 10, wherein reconstructing further comprises:
concatenating reconstructed components of the input waveform; and using a smoothing filter while concatenating the reconstructed components.
- 13. The method of claim 12 wherein the smoothing filter is an alpha blend filter.
- 14. The method of claim 1, further comprising:
reducing the principal components to reduce the number of bits transmitted.
- 15. The method of claim 1, further comprising:
improving the accuracy of reconstructing the input waveform by increasing the number of principal components.
- 16. A method of receiving an input waveform, comprising:
receiving a subset of determined principal components of at least one pitch segment during an initial transmission period; and receiving coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period.
- 17. The method of claim 16 wherein reconstructing comprises:
scaling the principal components by the coefficients for each pitch segment to form scaled components; and summing the scaled components.
- 18. The method of claim 16, wherein reconstructing further comprises:
concatenating reconstructed components of the input waveform; and using a smoothing filter while concatenating the reconstructed components.
- 19. The method of claim 18 wherein the smoothing filter is an alpha blend filter.
- 20. A method of compressing speech data, comprising:
parsing an input waveform into pitch segments; determining principal components of at least one pitch segment; sending a subset of the determined principal components during an initial transmission period; sending coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period; receiving a subset of determined principal components of at least one pitch segment during an initial transmission period; and receiving coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period.
- 21. An apparatus comprising:
a memory that stores executable instructions for compressing speech data; and a processor that executes the instructions to:
parse an input waveform into pitch segments; determine principal components of at least one pitch segment; send a subset of the determined principal components during an initial transmission period; and send coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period.
- 22. The apparatus of claim 21 wherein to send a subset of the principal components comprises sending six principal components.
- 23. The apparatus of claim 21 wherein to determine comprises:
determining the number of pitch periods; and generating a correlation matrix.
- 24. The apparatus of claim 21 wherein to determine comprises:
ordering the principal components.
- 25. The apparatus of claim 21, further comprising instructions to:
determine coefficients for each pitch period.
- 26. The apparatus of claim 21, further comprising instructions to:
determine if the principal components are still valid.
- 27. The apparatus of claim 26 wherein the instructions to determine if the principal components are still valid comprise:
determining if a pitch segment exceeds a predetermined threshold.
- 28. The apparatus of claim 27 wherein the predetermined threshold is a measure of a distance from a pitch segment to a centroid determined by the principal components.
- 29. The apparatus of claim 27, further comprising instructions to:
select a new set of principal components when the predetermined threshold is exceeded.
- 30. The apparatus of claim 21, further comprising instructions to:
reconstruct the input waveform.
- 31. The apparatus of claim 30 wherein instructions to reconstruct comprise:
scaling the principal components by the coefficients for each pitch segment to form scaled components; and summing the scaled components.
- 32. The apparatus of claim 30, wherein instructions to reconstruct comprise:
concatenating reconstructed components of the input waveform; and using a smoothing filter while concatenating the reconstructed components.
- 33. An apparatus comprising:
a memory that stores executable instructions for receiving an input waveform; and a processor that executes the instructions to:
receive a subset of determined principal components of at least one pitch segment during an initial transmission period; and receive coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period.
- 34. The apparatus of claim 33, wherein instructions to reconstruct comprise:
scaling the principal components by the coefficients for each pitch segment to form scaled components; and summing the scaled components.
- 35. The apparatus of claim 33, wherein instructions to reconstruct comprise:
concatenating reconstructed components of the input waveform; and using a smoothing filter while concatenating the reconstructed components.
- 36. An apparatus comprising:
a memory that stores executable instructions for compressing speech data; and a processor that executes the instructions to:
parse an input waveform into pitch segments; determine principal components of at least one pitch segment; send a subset of the determined principal components during an initial transmission period; send coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period; receive a subset of determined principal components of at least one pitch segment during an initial transmission period; and receive coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period.
- 37. An article comprising a machine-readable medium that stores executable instructions for compressing speech data, the instructions causing a machine to:
parse an input waveform into pitch segments; determine principal components of at least one pitch segment; send a subset of the determined principal components during an initial transmission period; and send coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period.
- 38. The article of claim 37 wherein instructions causing a machine to send a subset of the principal components comprise instructions causing a machine to send six principal components.
- 39. The article of claim 37 wherein instructions causing a machine to determine comprise instructions causing a machine to:
determine the number of pitch periods; and generate a correlation matrix.
- 40. The article of claim 37 wherein instructions causing a machine to determine comprise instructions causing a machine to:
order the principal components.
- 41. The article of claim 37, further comprising instructions causing a machine to:
determine coefficients for each pitch period.
- 42. The article of claim 37, further comprising instructions causing a machine to:
determine if the principal components are still valid.
- 43. The article of claim 42 wherein instructions causing a machine to determine if the principal components are still valid comprise instructions causing a machine to:
determine if a pitch segment exceeds a predetermined threshold.
- 44. The article of claim 43 wherein the predetermined threshold is a measure of a distance from a pitch segment to a centroid determined by the principal components.
- 45. The article of claim 43, further comprising instructions causing a machine to:
select a new set of principal components when the predetermined threshold is exceeded.
- 46. The article of claim 37, further comprising instructions causing a machine to:
reconstruct the input waveform.
- 47. The article of claim 46 wherein instructions causing a machine to reconstruct comprise instructions causing a machine to:
scale the principal components by the coefficients for each pitch segment to form scaled components; and sum the scaled components.
- 48. The article of claim 46, wherein instructions causing a machine to reconstruct further comprise instructions causing a machine to:
concatenate reconstructed components of the input waveform; and use a smoothing filter while concatenating the reconstructed components.
- 49. An article comprising a machine-readable medium that stores executable instructions for receiving an input waveform, the instructions causing a machine to:
receive a subset of determined principal components of at least one pitch segment during an initial transmission period; and receive coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period.
- 50. The article of claim 49, wherein instructions causing a machine to reconstruct comprise instructions causing a machine to:
scale the principal components by the coefficients for each pitch segment to form scaled components; and sum the scaled components.
- 51. The article of claim 49, wherein instructions causing a machine to reconstruct comprise instructions causing a machine to:
concatenate reconstructed components of the input waveform; and use a smoothing filter while concatenating the reconstructed components.
- 52. An article comprising a machine-readable medium that stores executable instructions for compressing speech data, the instructions causing a machine to:
parse an input waveform into pitch segments; determine principal components of at least one pitch segment; send a subset of the determined principal components during an initial transmission period; send coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period; receive a subset of determined principal components of at least one pitch segment during an initial transmission period; and receive coefficients of the input waveform for each pitch segment during a period subsequent to the initial transmission period.
- 53. The method of claim 1, further comprising:
comparing principal components to a library of principal components previously spoken by a speaker.
- 54. The method of claim 53, further comprising:
generating phonemes; and converting the phonemes to text.
- 55. The method of claim 1, further comprising:
receiving a phoneme; and combining the coefficients and the principal components with the phoneme to produce natural speech.
- 56. The method of claim 55, further comprising:
altering the coefficients to reflect user selectable intonations.
- 57. The method of claim 16, further comprising:
comparing principal components to a library of principal components previously spoken by a speaker.
- 58. The method of claim 57, further comprising:
generating phonemes; and converting the phonemes to text.
- 59. The method of claim 16, further comprising:
receiving a phoneme; and combining the coefficients and the principal components with the phoneme to produce natural speech.
- 60. The method of claim 59, further comprising:
altering the coefficients to reflect user selectable intonations.
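The per-segment steps recited in the claims above (determining coefficients, reconstructing by scaling and summing, checking validity against a threshold, and smoothing while concatenating) might be sketched as follows. This is an illustrative sketch only, not the claimed implementation. It assumes the principal components are already available as orthonormal vectors, that the centroid is supplied by the caller, and that the alpha blend is a linear cross-fade over a few overlapping samples. All function names, the `centroid` argument, and the `overlap` parameter are hypothetical and do not appear in the claims.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sample vectors."""
    return sum(x * y for x, y in zip(a, b))


def coefficients(segment, components):
    """One coefficient per component for a pitch segment.

    Assumption: the components are orthonormal, so each coefficient
    is simply the projection of the segment onto that component.
    """
    return [dot(segment, pc) for pc in components]


def reconstruct(coeffs, components):
    """Scale each component by its coefficient and sum the scaled components."""
    out = [0.0] * len(components[0])
    for c, pc in zip(coeffs, components):
        for i, sample in enumerate(pc):
            out[i] += c * sample
    return out


def distance_to_centroid(segment, centroid):
    """Euclidean distance from a pitch segment to a caller-supplied centroid."""
    return math.sqrt(sum((s - c) ** 2 for s, c in zip(segment, centroid)))


def exceeds_threshold(segment, centroid, threshold):
    """True when the segment lies farther from the centroid than the threshold,
    which in this sketch signals that new components should be selected."""
    return distance_to_centroid(segment, centroid) > threshold


def alpha_blend_concat(segments, overlap):
    """Concatenate reconstructed segments with a linear cross-fade.

    Assumption: the last `overlap` samples of the running output are blended
    with the first `overlap` samples of the next segment.
    """
    out = list(segments[0])
    for seg in segments[1:]:
        for k in range(overlap):
            alpha = (k + 1) / (overlap + 1)
            idx = len(out) - overlap + k
            out[idx] = (1 - alpha) * out[idx] + alpha * seg[k]
        out.extend(seg[overlap:])
    return out


# Toy usage with two hypothetical orthonormal components in four dimensions.
components = [[1, 0, 0, 0], [0, 1, 0, 0]]
segment = [3, 4, 0, 0]
coeffs = coefficients(segment, components)
rebuilt = reconstruct(coeffs, components)
```

In this sketch only the short coefficient list would need to be sent for each pitch segment once the components have been transmitted, which is the source of the compression the claims describe.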
PRIORITY TO OTHER APPLICATIONS
[0001] This application claims priority from and incorporates herein U.S. Provisional Application No. 60/428,551, filed Nov. 21, 2002, and titled “Speech Compression Using Principal Component Analysis.”
Provisional Applications (1)
| Number | Date | Country |
| 60428551 | Nov 2002 | US |