Large-vocabulary speech recognition method, apparatus, and medium based on multilayer central lexicons

Information

  • Patent Application
  • 20070185714
  • Publication Number
    20070185714
  • Date Filed
    August 28, 2006
  • Date Published
    August 09, 2007
Abstract
A speech recognition method including: layering a central lexicon in a tree structure with respect to recognition-subject vocabularies; performing multi-pass symbol matching between a recognized phoneme sequence and a phonetic sequence of the central lexicon layered in the tree structure; and selecting a final speech recognition result via a Viterbi search process using a detailed acoustic model with respect to candidate vocabularies selected by the multi-pass symbol matching.
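The candidate-selection step in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Node` class, `symbol_score`, `select_candidates`, and the `beam` parameter are hypothetical names, and `difflib.SequenceMatcher` is a stand-in similarity measure swapped in for the symbol matching score. The final Viterbi search with a detailed acoustic model is outside the sketch; it would run only on the returned candidates.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Tuple

Phonemes = Tuple[str, ...]


@dataclass
class Node:
    """One cluster of the layered tree, represented by its central lexicon."""
    central: Phonemes                                    # phonetic sequence of the central lexicon
    children: List["Node"] = field(default_factory=list)
    words: List[Phonemes] = field(default_factory=list)  # filled only at terminal nodes


def symbol_score(recognized: Phonemes, reference: Phonemes) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not the patent's score."""
    return SequenceMatcher(None, recognized, reference).ratio()


def select_candidates(root: Node, recognized: Phonemes, beam: int = 1) -> List[Phonemes]:
    """Descend layer by layer, keeping the `beam` best-scoring nodes per layer."""
    frontier = [root]
    candidates: List[Phonemes] = []
    while frontier:
        next_layer: List[Node] = []
        for node in frontier:
            if node.children:
                next_layer.extend(node.children)
            else:
                candidates.extend(node.words)  # terminal node reached
        # Rank the next layer by how well each central lexicon matches the input.
        next_layer.sort(key=lambda n: symbol_score(recognized, n.central), reverse=True)
        frontier = next_layer[:beam]
    return candidates
```

With `beam=1` the descent follows only the best-scoring node in each layer; a larger `beam` keeps several candidate nodes per layer, which trades extra matching work for a lower risk of pruning the correct vocabulary early.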
Description

BRIEF DESCRIPTION OF THE DRAWINGS

These and/or other aspects, features, and advantages of the present invention will become apparent and more readily appreciated from the following description of exemplary embodiments, taken in conjunction with the accompanying drawings of which:



FIG. 1 is a diagram illustrating a configuration of a speech recognition apparatus according to an exemplary embodiment of the present invention;



FIG. 2 is a diagram illustrating an example of a layered vocabulary group tree in the speech recognition apparatus according to an exemplary embodiment of the present invention;



FIG. 3 is a diagram illustrating an example of a phoneme confusion matrix in the speech recognition apparatus according to an exemplary embodiment of the present invention;



FIG. 4 is a diagram illustrating an example of an output of a phoneme decoder and a standard pattern of a lexicon in the speech recognition apparatus according to an exemplary embodiment of the present invention; and



FIG. 5 is a flowchart illustrating a speech recognition method according to an exemplary embodiment of the present invention.


Claims
  • 1. A speech recognition method comprising: layering a central lexicon in a tree structure with respect to recognition-subject vocabularies; performing multi-pass symbol matching between a recognized phoneme sequence and a phonetic sequence of the central lexicon layered in the tree structure; and selecting a final speech recognition result via a Viterbi search process using a detailed acoustic model with respect to candidate vocabularies selected by the multi-pass symbol matching.
  • 2. The method of claim 1, wherein the performing multi-pass symbol matching between a recognized phoneme sequence and a phonetic sequence of the central lexicon layered in the tree structure comprises traversing a node recording a maximum matching score for each layer while symbol matching the central lexicon layered in the tree structure, wherein the traversing of a node is repeated until reaching a terminal node.
  • 3. The method of claim 1, wherein the performing multi-pass symbol matching between a recognized phoneme sequence and a phonetic sequence of the central lexicon layered in the tree structure comprises selecting a plurality of candidate nodes for each of the layers.
  • 4. The method of claim 3, wherein the candidate nodes are nodes whose matching score exceeds a predetermined standard value for each of said layers.
  • 5. The method of claim 1, wherein the matching score is calculated using a probability value of a phoneme confusion matrix.
  • 6. The method of claim 1, wherein the central lexicon represents a certain node and is determined to be a lexicon in a central position among all lexicons included in the node.
  • 7. The method of claim 6, further comprising determining lexicons which are separated from the central lexicon of each of the terminal nodes at a distance less than a predetermined standard value to be neighborhood lexicons.
  • 8. The method of claim 1, wherein the tree structure has a number of nodes, the number being determined according to a standard threshold used in clustering for each of the layers.
  • 9. The method of claim 8, wherein the clustering is performed using a modified K-means clustering method.
  • 10. A computer readable recording medium in which a program for executing a speech recognition method is recorded, the method comprising: layering a central lexicon in a tree structure with respect to recognition-subject vocabularies; performing multi-pass symbol matching between a recognized phoneme sequence and a phonetic sequence of the central lexicon layered in the tree structure; and selecting a final speech recognition result via a Viterbi search process using a detailed acoustic model with respect to candidate vocabularies selected by the multi-pass symbol matching.
  • 11. A speech recognition apparatus comprising: a multi-pass symbol matching unit performing multi-pass symbol matching between a recognized phoneme sequence and a phonetic sequence of a central lexicon layered in a tree structure; and a detailed matching unit performing detailed matching to select a speech recognition result using a detailed acoustic model with respect to candidate vocabulary sets selected by the multi-pass symbol matching.
  • 12. The apparatus of claim 11, further comprising a lexicon classification unit classifying all lexicons, with respect to recognition subject vocabularies, into the tree structure.
  • 13. The apparatus of claim 11, wherein the multi-pass symbol matching unit calculates a matching score using a probability value of a phoneme confusion matrix between the recognized phoneme sequence and the phonetic sequence of the central lexicon layered in the tree structure.
  • 14. The apparatus of claim 11, wherein the multi-pass symbol matching unit traverses a node recording a maximum matching score for each layer and repeats traversing the node until reaching a terminal node.
  • 15. The apparatus of claim 14, wherein the candidate nodes are nodes whose matching score exceeds a predetermined standard value for each of said layers.
  • 16. A speech recognition method comprising: performing multi-pass symbol matching between a recognized phoneme sequence and a phonetic sequence of a central lexicon layered in a tree structure with respect to recognition-subject vocabularies; and selecting a final speech recognition result via a Viterbi search process using a detailed acoustic model with respect to candidate vocabularies selected by the multi-pass symbol matching.
  • 17. At least one computer readable medium storing instructions implementing the method of claim 16.
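
The claims refer to a matching score based on a phoneme confusion matrix, a central lexicon per node, and neighborhood lexicons within a threshold distance, but this listing does not give the formulas. The following is a minimal sketch under stated assumptions: the confusion matrix is a dictionary of probabilities P(recognized phoneme | reference phoneme), sequences are aligned by a standard edit-distance style dynamic program, and the central lexicon is taken to be the medoid of its cluster. The names `alignment_cost`, `central_lexicon`, and `neighborhood`, and the `floor` and `gap` back-off values, are hypothetical. The sketch works with a cost (lower is better); a matching score in the sense of the claims could be its negation.

```python
import math


def alignment_cost(recognized, reference, confusion, floor=1e-3, gap=1e-2):
    """Minimum total -log probability of aligning two phoneme sequences.

    confusion[(ref, rec)] is an assumed P(recognized phoneme | reference phoneme).
    `floor` (unseen substitutions) and `gap` (insertions/deletions) are
    hypothetical back-off probabilities chosen for illustration only.
    """
    gap_cost = -math.log(gap)
    rows, cols = len(reference) + 1, len(recognized) + 1
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * gap_cost
    for j in range(1, cols):
        cost[0][j] = j * gap_cost
    for i in range(1, rows):
        for j in range(1, cols):
            p = confusion.get((reference[i - 1], recognized[j - 1]), floor)
            cost[i][j] = min(
                cost[i - 1][j - 1] - math.log(p),  # match or substitution
                cost[i - 1][j] + gap_cost,         # reference phoneme missing from input
                cost[i][j - 1] + gap_cost,         # extra phoneme in the input
            )
    return cost[-1][-1]


def central_lexicon(members, confusion):
    """Pick the member with the smallest summed cost to all members (a medoid)."""
    return min(
        members,
        key=lambda c: sum(alignment_cost(m, c, confusion) for m in members),
    )


def neighborhood(central, lexicons, confusion, threshold):
    """Lexicons whose cost to the central lexicon is below `threshold`."""
    return [w for w in lexicons if alignment_cost(w, central, confusion) < threshold]
```

For example, with a toy matrix in which each phoneme is recognized correctly with probability 0.9, `central_lexicon` picks the cluster member that is, in total, cheapest to confuse with the others, and `neighborhood` then collects the lexicons close enough to it to be passed on to detailed matching.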
Priority Claims (1)
Number Date Country Kind
10-2006-0012529 Feb 2006 KR national