Recognition confidence measuring by lexical distance between candidates

Information

  • Patent Application
  • 20070185713
  • Publication Number
    20070185713
  • Date Filed
    July 31, 2006
  • Date Published
    August 09, 2007
Abstract
A recognition confidence measurement method, medium, and system are provided that can more accurately determine whether an input speech signal is in-vocabulary, by extracting an optimum number of candidates that match a phoneme string extracted from the input speech signal and estimating a lexical distance between the extracted candidates. The recognition confidence measurement method includes: extracting a phoneme string from a feature vector of an input speech signal; extracting candidates by matching the extracted phoneme string against the phoneme strings of vocabularies registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the input speech signal is in-vocabulary, based on the lexical distance.
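The steps named in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function names, the unit gap cost, the length normalization, the `n_best` and `threshold` parameters, and the toy dictionary are all hypothetical. In particular, the text above does not state in which direction the lexical distance is compared with the threshold; the sketch assumes that closely related candidates indicate an in-vocabulary input. Phoneme string extraction (the HMM decoding step) is outside the sketch, so the recognized phoneme string is taken as given.

```python
from itertools import combinations


def phoneme_cost(a, b, confusion=None):
    """Substitution cost: 0 for identical phonemes, otherwise a lookup in an
    optional confusion-distance dict keyed by phoneme pairs (default cost 1)."""
    if a == b:
        return 0.0
    if confusion is not None:
        return confusion.get((a, b), confusion.get((b, a), 1.0))
    return 1.0


def dynamic_match(x, y, confusion=None, gap=1.0):
    """Dynamic matching of two phoneme sequences (weighted edit distance)."""
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap]
        for j in range(1, len(y) + 1):
            cur.append(min(
                prev[j] + gap,                                        # deletion
                cur[j - 1] + gap,                                     # insertion
                prev[j - 1] + phoneme_cost(x[i - 1], y[j - 1], confusion),
            ))
        prev = cur
    return prev[-1]


def extract_candidates(recognized, dictionary, n_best=3, confusion=None):
    """Return the n_best dictionary words whose phoneme strings are closest
    to the recognized phoneme string."""
    ranked = sorted(
        dictionary.items(),
        key=lambda item: dynamic_match(recognized, item[1], confusion),
    )
    return [word for word, _ in ranked[:n_best]]


def lexical_distance(candidates, dictionary, confusion=None):
    """Mean length-normalized matching score over all candidate pairs.
    ASSUMPTION: averaging and normalization are illustrative choices."""
    pairs = list(combinations(candidates, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        pa, pb = dictionary[a], dictionary[b]
        total += dynamic_match(pa, pb, confusion) / max(len(pa), len(pb))
    return total / len(pairs)


def is_in_vocabulary(distance, threshold=0.5):
    """ASSUMPTION: a small distance between candidates means in-vocabulary.
    The comparison direction and the threshold value are not from the source."""
    return distance <= threshold


if __name__ == "__main__":
    # Hypothetical toy dictionary: word -> phoneme string.
    toy_dictionary = {
        "seoul": ["s", "eo", "u", "l"],
        "soul": ["s", "o", "u", "l"],
        "busan": ["b", "u", "s", "a", "n"],
    }
    recognized = ["s", "o", "u", "l"]
    candidates = extract_candidates(recognized, toy_dictionary, n_best=2)
    distance = lexical_distance(candidates, toy_dictionary)
    print(candidates, distance, is_in_vocabulary(distance))
```

Passing a phoneme confusion matrix as the `confusion` dictionary replaces the unit substitution cost with phoneme-specific distances, so that acoustically similar phonemes are penalized less during matching.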
Description

BRIEF DESCRIPTION OF THE DRAWINGS

These and/or other aspects, features, and advantages of the present invention will become apparent and more readily appreciated from the following detailed description, taken in conjunction with the accompanying drawings of which:

FIG. 1 illustrates an example of extracting a candidate in a confidence measurement system according to a conventional art;

FIG. 2 is a configuration diagram illustrating a recognition confidence measurement system according to an exemplary embodiment of the present invention;

FIG. 3 is a flowchart illustrating a method of detecting a feature vector from an input speech signal by a phoneme string extraction unit according to an exemplary embodiment of the present invention;

FIGS. 4 and 5 are flowcharts illustrating an example of estimating a phoneme confusion matrix according to an exemplary embodiment of the present invention;

FIG. 6 is a flowchart illustrating a recognition confidence measurement method according to another exemplary embodiment of the present invention; and

FIG. 7 is a schematic diagram illustrating an example of a recognition confidence measurement method according to still another exemplary embodiment of the present invention.


Claims
  • 1. A recognition confidence measurement method comprising: extracting a phoneme string from a feature vector of an input speech signal; extracting candidates by matching the extracted phoneme string and phoneme strings of vocabularies registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the input speech signal is in-vocabulary, based on the lexical distance.
  • 2. The method of claim 1, wherein the extracting of the phoneme string extracts an optimum phoneme string according to each language by using a Hidden Markov Model (HMM) and a predetermined phoneme grammar for each language.
  • 3. The method of claim 1, wherein the extracting of the candidates comprises: calculating a similarity between the extracted phoneme string and the phoneme strings of the vocabularies; and extracting the candidates based on the calculated similarity.
  • 4. The method of claim 3, wherein the calculating of the similarity comprises estimating a phoneme confusion matrix.
  • 5. The method of claim 4, wherein estimating the phoneme confusion matrix comprises: allocating an initial value to a phoneme-by-phoneme distance; and performing a phoneme recognition using a training database.
  • 6. The method of claim 5, wherein the performing of the phoneme recognition comprises: performing a dynamic matching with respect to a result of the phoneme recognition and a phoneme string corresponding to vocabularies of the training database; estimating an optimum matching pair by back tracking; and estimating a number of phoneme-by-phoneme matchings and updating the distance.
  • 7. The method of claim 4, wherein estimating the phoneme confusion matrix comprises: estimating a continuous HMM or a semi-continuous HMM for each phoneme by using a training database; and estimating a phoneme-by-phoneme distance.
  • 8. The method of claim 7, wherein the estimating of the distance comprises: utilizing a Bhattacharyya distance, in the case of the continuous HMM; and estimating an amount of information loss, in the case of the semi-continuous HMM.
  • 9. The method of claim 1, wherein the estimating of the lexical distance comprises: selecting a pair of candidates from the extracted candidates; performing a dynamic matching of the selected pair of candidates; calculating a score for the pair of candidates; and estimating the lexical distance using the calculated score.
  • 10. The method of claim 9, wherein the calculating of the score calculates the score using a predetermined phoneme confusion matrix.
  • 11. The method of claim 9, wherein the determining whether the input speech signal is in-vocabulary comprises: determining the input speech signal as in-vocabulary, when the calculated score satisfies a set numerical value; and determining the input speech signal as out-of-vocabulary, when the calculated score does not satisfy the set numerical value.
  • 12. The method of claim 9, wherein the determining whether the input speech signal is in-vocabulary comprises: utilizing a predetermined weight for the calculated score to correct the calculated score.
  • 13. The method of claim 11, wherein the determining the input speech signal as out-of-vocabulary comprises: performing a rejection due to a recognition error with respect to the input speech signal.
  • 14. A computer readable storage medium storing a program for implementing a recognition confidence measurement method comprising: extracting a phoneme string from a feature vector of an input speech signal; extracting candidates by matching the extracted phoneme string and phoneme strings of vocabularies registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the input speech signal is in-vocabulary, based on the lexical distance.
  • 15. A recognition confidence measurement system comprising: a phoneme string extraction unit extracting a phoneme string from a feature vector of an input speech signal; a candidate extraction unit extracting candidates by matching the extracted phoneme string and phoneme strings of vocabularies registered in a predetermined dictionary; a distance estimation unit estimating a lexical distance between the extracted candidates; and a registration determination unit determining whether the input speech signal is in-vocabulary, based on the lexical distance.
  • 16. The system of claim 15, wherein the phoneme string extraction unit extracts an optimum phoneme string according to each language by using a Hidden Markov Model (HMM) and a predetermined phoneme grammar for each language.
  • 17. The system of claim 15, wherein the candidate extraction unit calculates a similarity between the extracted phoneme string and the phoneme strings of the vocabularies, and extracts the candidates based on the calculated similarity.
  • 18. The system of claim 15, wherein the distance estimation unit performs a dynamic matching of a pair of candidates selected from the extracted candidates, calculates a score for the pair of candidates, and estimates the lexical distance using the calculated score.
  • 19. The system of claim 18, wherein the registration determination unit determines the input speech signal as in-vocabulary, when the calculated score satisfies a set numerical value, and determines the input speech signal as out-of-vocabulary, when the calculated score does not satisfy the set numerical value.
  • 20. The system of claim 18, wherein the registration determination unit utilizes a predetermined weight for the calculated score to correct the calculated score.
  • 21. The system of claim 18, wherein the registration determination unit performs a rejection due to a recognition error with respect to the input speech signal.
  • 22. A recognition confidence measurement method comprising: extracting candidates by matching a phoneme string of a speech signal and phoneme strings of vocabularies registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the speech signal is in-vocabulary, based on the lexical distance.
  • 23. A medium comprising computer readable instructions implementing the method of claim 22.
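The two routes to a phoneme-by-phoneme distance named in claims 5 to 8 can be sketched in Python as follows. This is a hedged illustration, not the claimed implementation. For the first route, the claims specify an initial value, dynamic matching, back tracking, and counting of matchings, but not the update rule, so the rule used here (initial value scaled by one minus the relative confusion frequency) is an assumption, as are all function and parameter names. For the second route, the Bhattacharyya distance formula is standard, but representing each phoneme model by a single diagonal-covariance Gaussian is a simplifying assumption; the information-loss measure for semi-continuous HMMs is not sketched.

```python
import math
from collections import Counter


def align(ref, hyp):
    """Dynamic matching with unit costs, followed by back tracking to
    recover the matched (reference, recognized) phoneme pairs."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1                                  # reference phoneme dropped
        else:
            j -= 1                                  # extra recognized phoneme
    pairs.reverse()
    return pairs


def estimate_confusion_distance(training_pairs, phonemes, initial=1.0):
    """Count aligned phoneme pairs over (reference, recognized) training
    strings and turn the counts into distances.
    ASSUMPTION: distance = initial * (1 - relative confusion frequency);
    pairs never observed together keep the initial value."""
    counts, totals = Counter(), Counter()
    for ref, hyp in training_pairs:
        for r, h in align(ref, hyp):
            counts[(r, h)] += 1
            totals[r] += 1
    distance = {}
    for a in phonemes:
        for b in phonemes:
            if a == b:
                distance[(a, b)] = 0.0
            elif totals[a]:
                distance[(a, b)] = initial * (1.0 - counts[(a, b)] / totals[a])
            else:
                distance[(a, b)] = initial
    return distance


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariance.
    ASSUMPTION: one Gaussian per phoneme stands in for the full HMM."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)
        dist += 0.125 * (m1 - m2) ** 2 / v
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))
    return dist
```

The count-based distance is asymmetric (reference phoneme to recognized phoneme), whereas the Bhattacharyya distance is symmetric; which property is preferable depends on how the matrix is used during matching.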
Priority Claims (1)
Number Date Country Kind
10-2006-0012528 Feb 2006 KR national