Claims
- 1. A system for performing a speech recognition procedure, comprising:
a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones; and a processor configured to control said recognizer to thereby perform said speech recognition procedure.
- 2. The system of claim 1 wherein said input speech data includes Mandarin Chinese language data, said optimized phone set being compactly configured to accurately represent said Mandarin Chinese language data.
- 3. The system of claim 1 wherein said recognizer and said processor are implemented as part of a consumer electronics device.
- 4. The system of claim 1 wherein said optimized phone set conserves processing resources and memory resources while performing said speech recognition procedure.
- 5. The system of claim 1 wherein said optimized phone set reduces training requirements for performing a recognizer training procedure to initially implement said recognizer.
- 6. The system of claim 1 wherein said phone strings each include a different series of phones from said optimized phone set, each of said phone strings corresponding to a different word from said vocabulary dictionary.
- 7. The system of claim 6 wherein said recognizer compares said input speech data to Hidden Markov Models for said phone strings from said vocabulary dictionary to thereby select a recognized word during said speech recognition procedure.
- 8. The system of claim 1 wherein said optimized phone set includes phones b, p, d, t, g, k, z, c, zh, ch, j, q, f, s, sh, x, h, m, n, ng, l, r, y, w, a, e, o, i, u, yu, ai, ei, ao, and ou.
- 9. The system of claim 1 wherein said optimized phone set includes consonantal phones b, p, d, t, g, k, z, c, zh, ch, j, q, f, s, sh, x, h, m, n, ng, l, r, y, and w.
- 10. The system of claim 1 wherein said optimized phone set includes a closure phone “cl”.
- 11. The system of claim 1 wherein said optimized phone set includes vocalic phones a, e, o, i, u, yu, ai, ei, ao, and ou.
- 12. The system of claim 1 wherein said optimized phone set represents certain diphthongs by utilizing unified diphthong phones to thereby conserve processing resources and memory resources while providing greater accuracy characteristics for said speech recognition procedure.
- 13. The system of claim 12 wherein said optimized phone set includes unified diphthong phones ai, ei, ao, and ou.
- 14. The system of claim 1 wherein said optimized phone set includes a stops category that includes separate phones for b, p, d, t, g, and k.
- 15. The system of claim 1 wherein said optimized phone set includes an affricates category that includes separate phones for z, c, zh, ch, j, and q.
- 16. The system of claim 1 wherein said optimized phone set includes a fricatives category that includes separate phones for f, s, sh, x, and h.
- 17. The system of claim 1 wherein said optimized phone set includes an approximants category that includes separate phones for l, r, y, w, and yu.
- 18. The system of claim 1 wherein said optimized phone set includes a nasals category that includes separate phones for m, n, and ng.
- 19. The system of claim 1 wherein said optimized phone set represents various sounds of a Mandarin Chinese language without utilizing corresponding tonal information as part of different phones in said optimized phone set.
- 20. The system of claim 1 wherein said consonantal phones and said vocalic phones from said optimized phone set are combined to represent syllables from a Mandarin Chinese language system.
- 21. A method for performing a speech recognition procedure, comprising the steps of:
configuring a recognizer to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones; and controlling said recognizer with a processor to thereby perform said speech recognition procedure.
- 22. The method of claim 21 wherein said input speech data includes Mandarin Chinese language data, said optimized phone set being compactly configured to accurately represent said Mandarin Chinese language data.
- 23. The method of claim 21 wherein said recognizer and said processor are implemented as part of a consumer electronics device.
- 24. The method of claim 21 wherein said optimized phone set conserves processing resources and memory resources while performing said speech recognition procedure.
- 25. The method of claim 21 wherein said optimized phone set reduces training requirements for performing a recognizer training procedure to initially implement said recognizer.
- 26. The method of claim 21 wherein said phone strings each include a different series of phones from said optimized phone set, each of said phone strings corresponding to a different word from said vocabulary dictionary.
- 27. The method of claim 26 wherein said recognizer compares said input speech data to Hidden Markov Models for said phone strings from said vocabulary dictionary to thereby select a recognized word during said speech recognition procedure.
- 28. The method of claim 21 wherein said optimized phone set includes phones b, p, d, t, g, k, z, c, zh, ch, j, q, f, s, sh, x, h, m, n, ng, l, r, y, w, a, e, o, i, u, yu, ai, ei, ao, and ou.
- 29. The method of claim 21 wherein said optimized phone set includes consonantal phones b, p, d, t, g, k, z, c, zh, ch, j, q, f, s, sh, x, h, m, n, ng, l, r, y, and w.
- 30. The method of claim 21 wherein said optimized phone set includes a closure phone “cl”.
- 31. The method of claim 21 wherein said optimized phone set includes vocalic phones a, e, o, i, u, yu, ai, ei, ao, and ou.
- 32. The method of claim 21 wherein said optimized phone set represents certain diphthongs by utilizing unified diphthong phones to thereby conserve processing resources and memory resources while providing greater accuracy characteristics for said speech recognition procedure.
- 33. The method of claim 32 wherein said optimized phone set includes unified diphthong phones ai, ei, ao, and ou.
- 34. The method of claim 21 wherein said optimized phone set includes a stops category that includes separate phones for b, p, d, t, g, and k.
- 35. The method of claim 21 wherein said optimized phone set includes an affricates category that includes separate phones for z, c, zh, ch, j, and q.
- 36. The method of claim 21 wherein said optimized phone set includes a fricatives category that includes separate phones for f, s, sh, x, and h.
- 37. The method of claim 21 wherein said optimized phone set includes an approximants category that includes separate phones for l, r, y, w, and yu.
- 38. The method of claim 21 wherein said optimized phone set includes a nasals category that includes separate phones for m, n, and ng.
- 39. The method of claim 21 wherein said optimized phone set represents various sounds of a Mandarin Chinese language without utilizing corresponding tonal information as part of different phones in said optimized phone set.
- 40. The method of claim 21 wherein said consonantal phones and said vocalic phones from said optimized phone set are combined to represent syllables from a Mandarin Chinese language system.
- 41. A computer-readable medium comprising program instructions for performing a speech recognition procedure, by performing the steps of:
configuring a recognizer to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones; and controlling said recognizer with a processor to thereby perform said speech recognition procedure.
- 42. A system for performing a speech recognition procedure, comprising:
means for comparing input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized phone set, said optimized phone set being implemented with a phonetic technique to separately provide consonantal phones and vocalic phones, said optimized phone set being implemented in a compact manner to include only a minimum required number of said consonantal phones and said vocalic phones; and means for controlling said means for comparing to thereby perform said speech recognition procedure.
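For illustration only, the phone inventory recited in claims 8 through 18 and the dictionary constraint of claim 6 can be sketched as plain data plus a consistency check. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `PHONE_SET`, `HYPOTHETICAL_DICTIONARY`, `unknown_phones`, and `check_dictionary` are hypothetical, and the three dictionary entries are invented examples rather than transcriptions taken from this application.

```python
# Phone categories as recited in claims 14-18.
STOPS = ("b", "p", "d", "t", "g", "k")
AFFRICATES = ("z", "c", "zh", "ch", "j", "q")
FRICATIVES = ("f", "s", "sh", "x", "h")
NASALS = ("m", "n", "ng")
APPROXIMANTS = ("l", "r", "y", "w", "yu")

# Vocalic phones as recited in claim 11; claim 13 names the last four
# as unified diphthong phones.
VOCALIC = ("a", "e", "o", "i", "u", "yu", "ai", "ei", "ao", "ou")
DIPHTHONGS = ("ai", "ei", "ao", "ou")

# Consonantal phones as recited in claim 9. Claim 9 does not list "yu",
# which claim 11 lists among the vocalic phones.
CONSONANTAL = STOPS + AFFRICATES + FRICATIVES + NASALS + ("l", "r", "y", "w")

# Full inventory as recited in claim 8. The closure phone of claim 10 is
# kept separate here because claim 8 does not list it.
PHONE_SET = frozenset(CONSONANTAL + VOCALIC)
CLOSURE = "cl"

# Hypothetical vocabulary entries (assumption): word -> phone string.
HYPOTHETICAL_DICTIONARY = {
    "hao": ("h", "ao"),
    "lai": ("l", "ai"),
    "mei": ("m", "ei"),
}


def unknown_phones(phone_string, allowed=PHONE_SET):
    """Return the phones in phone_string that are not in the allowed set."""
    return [phone for phone in phone_string if phone not in allowed]


def check_dictionary(dictionary, allowed=PHONE_SET):
    """Check that every entry uses only allowed phones and that no two
    words share the same phone string (compare claim 6).

    Returns the number of entries checked; raises ValueError otherwise.
    """
    seen = {}
    for word, phones in dictionary.items():
        phones = tuple(phones)
        bad = unknown_phones(phones, allowed)
        if bad:
            raise ValueError(f"{word!r} uses phones outside the set: {bad}")
        if phones in seen:
            raise ValueError(f"{word!r} and {seen[phones]!r} share a phone string")
        seen[phones] = word
    return len(seen)


if __name__ == "__main__":
    print(len(CONSONANTAL), len(VOCALIC), len(PHONE_SET))
    print(check_dictionary(HYPOTHETICAL_DICTIONARY))
```

Keeping the inventory as a flat set with no tone markers mirrors claims 19 and 39, which recite phones that carry no tonal information. A recognizer would attach acoustic models such as the Hidden Markov Models of claim 7 to these phone strings, which this sketch does not attempt.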
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application relates to, and claims priority in, U.S. Provisional Patent Application Serial No. 60/395,113, entitled “Efficient Phone-Based Recognition Engines For Chinese And English Isolated Command Applications,” filed on Jul. 11, 2002. The foregoing related application is commonly assigned, and is hereby incorporated by reference.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60395113 | Jul 2002 | US |