Identification-function calculator, identification-function calculating method, identification unit, identification method, and speech recognition system

Information

  • Patent Grant
  • 6134525
  • Patent Number
    6,134,525
  • Date Filed
    Wednesday, October 21, 1998
  • Date Issued
    Tuesday, October 17, 2000
Abstract
A discriminant or identification function is used for pattern recognition in which the highest performance can be offered when adaptation is made. Learning is carried out while a discriminant or identification function is adapted to a learning sample. For example, a standard pattern of the character "A" used as an identification function is learned such that when the character "A" slanting in the right or left direction is input, the standard pattern of the character "A" is rotated (adapted) in accordance with the slanting of the input learning sample.
Description

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to a discriminant or identification-function calculator, a discriminant- or identification-function calculating method, a classification or identification unit, a classification or identification method, and a speech recognition system. More particularly, the invention relates to an identification-function calculator, an identification-function calculating method, an identification unit, an identification method, and a speech recognition system, all of which are suitable for performing pattern recognition, for example, speech recognition and image recognition.
2. Description of the Related Art
Pattern recognition, such as speech recognition and image recognition, is performed in such a manner that feature (characteristic) vectors are extracted from input patterns, and the values of discriminant (identification) functions are calculated using the characteristic vectors as input values. The identification functions are used for classifying the input characteristic vectors under a predetermined number of classes. The number of functions is equal to or greater than the number of classes. The class corresponding to the greatest value of the identification functions with respect to the input characteristic vectors is output as the recognition result (classification result).
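By way of illustration only, the classification rule just described can be sketched as follows. The linear form of the discriminant functions, the function name `classify`, and the numerical values are assumptions chosen for the sketch and are not taken from this specification.

```python
import numpy as np

def classify(feature, weights, biases):
    """Evaluate one discriminant function per class and pick the largest.

    Assumption: each discriminant is linear, g_k(x) = w_k . x + b_k.
    """
    scores = weights @ feature + biases  # one discriminant value per class
    return int(np.argmax(scores)), scores

# Hypothetical example: K = 3 classes, 2-dimensional characteristic vectors.
weights = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
biases = np.array([0.0, 0.0, 0.5])

label, scores = classify(np.array([0.2, 0.9]), weights, biases)
print(label, scores)
```

The class index returned is the one whose discriminant value is greatest, which is the recognition result in the sense used above.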
For performing pattern recognition, high recognition performance is desirably obtained regardless of a change in the state of variation factors. Hence, hitherto, the learning of the identification functions for performing pattern recognition has been carried out by use of learning samples including a large number of variations. More specifically, for performing, for example, speech recognition, learning is conducted by use of speech data including a large number of variations as learning samples so as to obtain identification functions (for example, phoneme-discriminant functions when phoneme recognition is carried out) that are sufficiently robust to a change in the state of speech-variation factors, such as the sound-making environments, the speakers, and the characteristics of the input apparatus systems (for example, the characteristics of the microphones and of the analog-to-digital converters for converting the outputs from the microphones). The aforementioned learning method is described in, for example, "Speech Recognition with Probabilistic Models" by Seiichi Nakagawa, the Institute of Electronics, Information and Communication Engineers, and "Context-Dependent Phonetic Hidden Markov Models for Speaker-Independent Continuous Speech Recognition" by Kai-Fu Lee, IEEE Transactions on ASSP, Vol. 38, No. 4, April 1990.
Referring to FIG. 7 illustrating the configuration of a typical conventional identification-function calculator, a great number of learning samples are input into an identification-function calculator 51 in which an identification function, i.e., a parameter representing (forming) the identification function, is determined based on the learning samples.
However, satisfactory recognition performance cannot always be ensured by use of the identification function obtained through the above-described learning method. For better recognition performance, a method is available, for example, for adapting the identification functions to the states of speech variation factors during recognition. A method for adapting, for example, the phoneme-discriminant functions to the speaker is disclosed in various technical literature, such as "A Study on Speaker Adaptation of the Parameters of Continuous Density Hidden Markov Models" by Chin-Hui Lee, et al., IEEE Transactions on Signal Processing, Vol. 39, No. 4, 1991, and "Fast Speaker Adaptation for Speech Recognition Systems" by F. Class, et al., Proceedings of IEEE ICASSP, pp. 133-136, 1990.
Hitherto, learning has been carried out in such a manner that the identification functions are determined regardless of the aforementioned adaptation method employed during recognition. Namely, the identification functions are determined based on the standards in which the highest performance can be offered when adaptation is not made. Consequently, optimum adaptation cannot always be made during recognition even by use of the identification functions determined by the above-described technique. This makes it difficult to significantly improve the identification accuracy (recognition accuracy) even though adaptations are made.
SUMMARY OF THE INVENTION
Accordingly, in view of this background, it is an object of the present invention to provide a significant improvement in identification accuracy (recognition accuracy) achieved by performing optimum adaptation for conducting pattern recognition.
In order to achieve the above object, according to an aspect of the present invention, there is provided a discriminant- or identification-function calculating method for calculating a classification or identification parameter that forms a discriminant or identification function used for classifying an input characteristic or feature vector under one of a predetermined number of classes, the identification parameter being obtained by applying a fixed identification parameter independent of a predetermined variation state and a maximum-likelihood adaptive discriminant function or adaptation parameter to an adaptation function used for transforming the identification function, the maximum-likelihood adaptation parameter being an adaptation parameter that maximizes the likelihood with respect to a signal sample in the predetermined variation state on the condition that the adaptation function is used, the fixed identification parameter being obtained by a learning method which comprises: a first step of determining learning maximum-likelihood adaptation parameters that respectively maximize the probabilities of generating learning samples in a plurality of variation states of the input characteristic vector on the condition that the adaptation function is used for the variation states; a second step of calculating the likelihood of the fixed identification parameter with respect to all of the variation states according to the learning maximum-likelihood adaptation parameters and the fixed identification parameter obtained through learning which has been previously performed; a third step of determining as to whether the calculated likelihood is equal to or greater than a predetermined threshold; and a fourth step of using as a learning value the fixed identification parameter whose likelihood is equal to or greater than the predetermined threshold.
According to another aspect of the present invention, there is provided an identification unit comprising: an input section for inputting a signal in a predetermined variation state; an analysis section for extracting a characteristic vector from the input signal; a calculator for calculating the value of an adaptive identification function with respect to the characteristic vector with use of an identification function for identifying the characteristic vector, the identification function having an identification parameter transformed to be adaptable to the predetermined variation state; and a determination section for determining an identification result based on the calculation result of the calculator, the identification parameter being obtained by applying a fixed identification parameter independent of the predetermined variation state and a maximum-likelihood adaptation parameter to an adaptation function used for transforming the identification function, the maximum-likelihood adaptation parameter being an adaptation parameter that maximizes the probability of generating a signal sample in the predetermined variation state on the condition that the adaptation function is used, the fixed identification parameter being obtained by a learning method which comprises a first step of determining learning maximum-likelihood adaptation parameters that respectively maximize the probabilities of generating learning samples in a plurality of variation states of the input characteristic vector on the condition that the adaptation function is used, a second step of calculating the likelihood of the fixed identification parameter with respect to all of the variation states according to the learning maximum-likelihood adaptation parameters and the fixed identification parameter obtained through learning which has been previously performed, a third step of determining as to whether the calculated likelihood is equal to or greater than a predetermined threshold, and a fourth step of using as a learning value the fixed identification parameter whose likelihood is equal to or greater than the predetermined threshold.
According to still another aspect of the present invention, there is provided an identification unit comprising: an input section for inputting a signal in a predetermined variation state; an analysis section for extracting a characteristic vector from the input signal; a characteristic-vector adaptation section for applying the characteristic vector to an adaptation function formed of a maximum-likelihood adaptation parameter so as to adapt the characteristic vector to an identification parameter, which is a standard pattern for identification, thereby generating an adaptive characteristic vector; a calculator for calculating the value of an adaptive identification function with respect to the characteristic vector with the application of the adaptive characteristic vector to the identification function; and a determination section for determining an identification result based on the calculation result of the calculator, the identification parameter being a fixed parameter independent of the predetermined variation state, the maximum-likelihood adaptation parameter being an adaptation parameter that maximizes the probability of generating a signal sample in the predetermined variation state on the condition that the adaptation function is used, the identification parameter being obtained by a learning method which comprises a first step of determining learning maximum-likelihood adaptation parameters that respectively maximize the probabilities of generating learning samples in a plurality of variation states of the input characteristic vector on the condition that the adaptation function is used, a second step of calculating the likelihood of the fixed identification parameter with respect to all of the variation states according to the learning maximum-likelihood adaptation parameters and the identification parameter obtained through learning which has been previously performed, a third step of determining as to whether the calculated likelihood is equal to or greater than a predetermined threshold, and a fourth step of using as a learning value the identification parameter whose likelihood is equal to or greater than the predetermined threshold.
According to a further aspect of the present invention, there is provided a speech recognition system comprising: acoustic analysis means for acoustically analyzing input speech and calculating a characteristic vector of the speech; recognition means for recognizing the speech based on the characteristic vector; a calculator for calculating the value of an adaptive identification function with respect to the characteristic vector with use of an identification function for identifying the characteristic vector, the identification function having an identification parameter transformed to be adaptable to a variation state of the input speech; and a determination section for generating a speech recognition result based on the calculation result of the calculator, the identification parameter being obtained by applying a fixed identification parameter independent of the predetermined variation state and a maximum-likelihood adaptation parameter to an adaptation function used for transforming the identification function, the maximum-likelihood adaptation parameter being an adaptation parameter that maximizes the probability of generating a signal sample in the predetermined variation state on the condition that the adaptation function is used, the fixed identification parameter being obtained by a learning method which comprises a first step of determining learning maximum-likelihood adaptation parameters that respectively maximize the probabilities of generating learning samples in a plurality of variation states of the input characteristic vector on the condition that the adaptation function is used, a second step of calculating the likelihood of the fixed identification parameter with respect to all of the variation states according to the learning maximum-likelihood adaptation parameters and the fixed identification parameter obtained through learning which has been previously performed, a third step of determining as to whether the calculated likelihood is equal to or greater than a predetermined threshold, and a fourth step of using as a learning value the fixed identification parameter whose likelihood is equal to or greater than the predetermined threshold.
According to a further aspect of the present invention, there is provided a speech recognition system comprising: acoustic analysis means for acoustically analyzing input speech and calculating a characteristic vector of the speech; a characteristic-vector adaptation section for applying the characteristic vector to an adaptation function formed of a maximum-likelihood adaptation parameter so as to adapt the characteristic vector to an identification parameter, which is a standard pattern for identification, thereby generating an adaptive characteristic vector; a calculator for calculating the value of an adaptive identification function with respect to the characteristic vector with the application of the characteristic vector to an identification function formed of the identification parameter; and a determination section for generating a speech recognition result based on the calculation result of the calculator, the identification parameter being a fixed parameter independent of the predetermined variation state, the maximum-likelihood adaptation parameter being an adaptation parameter that maximizes the probability of generating a signal sample in the predetermined variation state on the condition that the adaptation function is used, the identification parameter being obtained by a learning method which comprises a first step of determining learning maximum-likelihood adaptation parameters that respectively maximize the probabilities of generating learning samples in a plurality of variation states on the condition that the adaptation function is used, a second step of calculating the likelihood of the fixed identification parameter with respect to all of the variation states according to the learning maximum-likelihood adaptation parameters and the identification parameter obtained through learning which has been previously performed, a third step of determining as to whether the calculated likelihood is equal to or greater than a predetermined threshold, and a fourth step of using as a learning value the identification parameter whose likelihood is equal to or greater than the predetermined threshold.





BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of the configuration of an embodiment of a speech recognition system formed by the application of the present invention;
FIG. 2 is a block diagram of the configuration of an embodiment of an identification-function calculator formed by the present invention;
FIGS. 3A to 3D illustrate the principle of the present invention;
FIG. 4 illustrates the principle of the present invention;
FIG. 5 is a block diagram of the configuration of another embodiment of a speech recognition system formed by the application of the present invention;
FIG. 6 is a block diagram of the configuration of another embodiment of an identification-function calculator formed by the present invention; and
FIG. 7 is a block diagram of the configuration of an example of conventional identification-function calculators.





DESCRIPTION OF THE PREFERRED EMBODIMENTS
FIG. 1 is a block diagram of the configuration of an embodiment of a speech recognition system formed by the application of the present invention. In this system, speech adapted for the speaker is recognized.
More specifically, speech input into a microphone 1 is converted to a sound signal as an electric signal and is further output to an acoustic analysis section 2. In this section 2, the sound signal output from the microphone 1 is converted into a digital sound signal by analog-to-digital conversion and further undergoes predetermined acoustic analyses, so that speech characteristic vectors (for example, the linear predictive coefficient (LPC), the LPC spectrum, the power of every predetermined bandwidth, and so on) can be extracted. The characteristic vectors $O$ are then supplied to a switch 3. The switch 3 selects a terminal a when speech adaptation to a speaker $V_t$ is required, for example, when a speaker starts to make a sound. Accordingly, the characteristic vectors $O$ output from the acoustic analysis section 2 are supplied to an adaptation parameter calculator 4 via the switch 3 and the terminal a as a sample $A_t$ used for performing adaptation to the speaker $V_t$ (hereinafter referred to as "the adaptation sample").
The adaptation parameter calculator 4 calculates, based on the adaptation sample $A_t$, an adaptation parameter $\Gamma^{(ML)}_t$ used for adapting the universal identification function stored in a below-described identification unit 5 to the speaker $V_t$, and the resulting adaptation parameter $\Gamma^{(ML)}_t$ is supplied to a speaker adaptor 12 forming the identification unit 5. Upon outputting of the adaptation parameter $\Gamma^{(ML)}_t$ to the identification unit 5 from the adaptation parameter calculator 4, the switch 3 selects a terminal b. This causes the characteristic vectors $O$ output from the acoustic analysis section 2 to be supplied to identification-function calculators 13 (identification-function calculators 13-1 through 13-K) of the identification unit 5 via the switch 3 and the terminal b.
In the identification unit 5, the characteristic vectors $O$ are classified under a predetermined number $K$ of classes $C_k$ with use of an identification function (different from the universal discriminant or identification function only in the parameter, i.e., $\phi_{t,k} \neq \lambda_k$), so that the speech input into the microphone 1 can be identified (recognized). More specifically, the parameters $\lambda_k$ (vectors) of the universal identification function corresponding to the classes $C_k$ (for example, the parameters $a$, $b$ and $c$ of the function $f(x) = ax^2 + bx + c$) (in this embodiment, referred to as "the universal discriminant function parameters or universal parameters" as required) are stored in a universal parameter storage device 11. Upon receiving the adaptation parameter $\Gamma^{(ML)}_t$ from the adaptation parameter calculator 4, the speaker adaptor 12 reads the universal identification function (universal parameter) from the universal parameter storage device 11 and adapts (converts) the function to the speaker $V_t$ based on the adaptation parameter $\Gamma^{(ML)}_t$. This makes it possible to calculate the parameter $\phi_{t,k}$ $(= F(\lambda_k, \Gamma^{(ML)}_t))$ that forms the identification function adapted to the adaptation sample $A_t$, i.e., in this case, to the speaker $V_t$ (in this embodiment, referred to as "the adaptive identification function" as required).
The parameter $\phi_{t,k}$ is fed to the identification-function calculator 13-k for calculating the value of the adaptive discriminant or identification function corresponding to the class $C_k$. In the calculator 13-k, the value of the adaptive identification function expressed by the parameter $\phi_{t,k}$ is calculated with the use of the characteristic vectors $O$ as an input value and is output to a class determination circuit 14. The determination circuit 14 detects, for example, the greatest value of the functions output from the identification-function calculators 13-1 through 13-K, and the suffix $k$ of the class $C_k$ corresponding to the greatest value of the adaptive identification function is determined and output as a speech recognition result.
The recognition (identification) principle of the speech recognition system shown in FIG. 1 will now be explained. The adaptive identification function associated with a certain class $C_k$ adapted to the state of a variation factor (variation state) $V_t$ is represented by $g_{t,k}(\cdot)$. For example, in speech recognition in which a phoneme is output as a speech recognition result, $V_t$ represents the speaker and the class $C_k$ indicates a phoneme as a recognition result.
If the variation state $V_t$ is known and the sample (characteristic vectors) $O$ is observed in the variation state $V_t$, the class determination rule for determining under which class the sample $O$ is classified is defined by the following equation:
$$\hat{k} = \arg\max_{k}\, g_{t,k}(O) \qquad (1)$$
Equation (1) determines the $k$ that maximizes the function $g_{t,k}(O)$. In this case, the sample $O$ is identified as the class $C_k$ with the suffix $k$ determined by equation (1).
It will now be assumed that there are $K$ classes ranging from class $C_1$ to class $C_K$ as the classes $C_k$, and that the a priori probabilities $p(C_1, V_t)$ through $p(C_K, V_t)$ of observing the samples $O$ belonging to the respective classes $C_1$ through $C_K$ in the variation state $V_t$ are equal. More specifically, concerning, for example, speech recognition, if the a priori probabilities $p(C_1, V_t)$ through $p(C_K, V_t)$ of a speaker $V_t$ making a sound of the respective phonemes $C_1$ through $C_K$ to be recognized are equal, the adaptive identification function $g_{t,k}(O)$ that can minimize the erroneous-recognition rate when the sample $O$ is observed in the variation state $V_t$ can be expressed by, for example, the following equation:
$$g_{t,k}(O) = p(O \mid C_k, V_t) \qquad (2)$$
Namely, in this case, the adaptive identification function $g_{t,k}(O)$ can be represented by the probability density function $p(O \mid C_k, V_t)$ conditioned by the variation state $V_t$ and the classes $C_k$ (hereinafter referred to as "the conditional probability density function" as required). It will now be assumed that the conditional probability density function $p(O \mid C_k, V_t)$ is uniquely determined by the parameter (vectors) $\phi_{t,k}$, in other words, that the function can be represented in a parametric form. To indicate that the conditional probability density function $p(O \mid C_k, V_t)$ depends on the parameter $\phi_{t,k}$, the function $p(O \mid C_k, V_t)$ will hereinafter be represented by $p(O \mid \phi_{t,k})$.
If the probability density function $p(O \mid \lambda_k)$ conditioned by the classes $C_k$ and expressed in a parametric form (hereinafter also referred to as "the conditional probability density function" as required) has been determined, the conditional probability density function $p(O \mid \phi_{t,k})$ can be obtained by adapting the function $p(O \mid \lambda_k)$ to the variation state $V_t$. Since the conditional probability density function $p(O \mid \lambda_k)$ is applicable to all the variation states $V_t$, i.e., the function is independent of the variation states $V_t$, it can be referred to as the "universal" conditional probability density function. It will now be assumed that $G_k(O)$ is equal to $p(O \mid \lambda_k)$, and thus, $G_k(O)$ will hereinafter be referred to as "the universal discriminant or identification function" as required. Further, since $\lambda_k$ indicates a parameter forming (representing) the universal identification function $G_k(\cdot)$, it will be referred to as "the universal discriminant function parameter or universal parameter" in this embodiment as required.
As is seen from the foregoing description, the adaptive discriminant or identification function $g_{t,k}(O)$ can be obtained by adapting the universal identification function $G_k(O)$ to the variation state $V_t$. If the function of performing transformation corresponding to the aforementioned adaptation (hereinafter referred to as "the adaptive function" as required) is expressed by $F(\cdot)$ and controlled by the adaptation parameter (vectors) $\gamma_{t,k}$, the parameter indicating the adaptive identification function $g_{t,k}(O)$ (in this embodiment, referred to as "the identification parameter" as required) can be expressed by the following equation:
$$\phi_{t,k} = F(\lambda_k, \gamma_{t,k}) \qquad (3)$$
As required, the vectors $[\phi_{t,1}, \phi_{t,2}, \ldots, \phi_{t,K}]$ formed of a group consisting of the identification parameters $\phi_{t,1}, \phi_{t,2}, \ldots, \phi_{t,K}$ will be designated by $\Psi_t$; the vectors $[\lambda_1, \lambda_2, \ldots, \lambda_K]$ constituted of a group consisting of the universal parameters $\lambda_1, \lambda_2, \ldots, \lambda_K$ will be represented by $\Lambda$; and the vectors $[\gamma_{t,1}, \gamma_{t,2}, \ldots, \gamma_{t,K}]$ formed of a group consisting of the adaptation parameters $\gamma_{t,1}, \gamma_{t,2}, \ldots, \gamma_{t,K}$ will be indicated by $\Gamma_t$. Under these conditions, equation (3) can be expressed by the following equation:
$$\Psi_t = F(\Lambda, \Gamma_t) \qquad (4)$$
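As a purely illustrative instance of equations (3) and (4), and not one taken from this specification, suppose that each universal parameter consists of a Gaussian mean and covariance and that the adaptation parameter is a single offset vector shared by all classes in a given variation state. The symbols $\mu_k$, $\Sigma_k$, and $b_t$ are assumptions introduced here. The adaptation function then shifts every class mean by the same offset and leaves the covariance unchanged:

```latex
\lambda_k = (\mu_k, \Sigma_k), \qquad \gamma_{t,k} = b_t, \qquad
\phi_{t,k} = F(\lambda_k, \gamma_{t,k}) = (\mu_k + b_t,\; \Sigma_k),
\qquad
g_{t,k}(O) = p(O \mid \phi_{t,k}) = \mathcal{N}(O;\, \mu_k + b_t,\, \Sigma_k).
```

Under this assumed form, only the offset $b_t$ has to be estimated for a new variation state, while the class-specific quantities remain fixed.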
For performing speech recognition (identification), the adaptive identification function $g_{t,k}(O)$ is obtained by adapting the universal identification function $G_k(O)$ to the variation state $V_t$. This requires the determination of the adaptation parameter $\Gamma_t$ that controls (forms) the adaptive function $F(\Lambda, \Gamma_t)$ and corresponds to the variation state $V_t$. The adaptation parameter $\Gamma_t$ can be determined based on the sample $A_t$ observed in the variation state $V_t$ (this sample is used for adapting the universal identification function $G_k(O)$ to the variation state $V_t$, and is thus referred to as "the adaptation sample" in this embodiment as required). The adaptation sample $A_t$ can be expressed by the following equation:
$$A_t = \{\, O^{(A)}_{t,k,i} \mid k = 1, \ldots, K;\ i = 1, \ldots, m_{t,k} \,\} \qquad (5)$$
wherein $O^{(A)}_{t,k,i}$ represents the $i$-th characteristic vector $O$ observed in the variation state $V_t$ and belonging to the class $C_k$, and $m_{t,k}$ indicates the total number of the characteristic vectors $O$ observed in the variation state $V_t$ and belonging to the class $C_k$. Moreover, the superscript $(A)$ of the characteristic vectors $O^{(A)}_{t,k,i}$ indicates that these vectors are used for adapting the universal identification function $G_k(O)$ to the variation state $V_t$. It should be noted that $m_{t,k}$ may be only a small value, for example, one.
In the above case, the maximum-likelihood estimate $\Gamma^{(ML)}_t$ of the adaptation parameter $\Gamma_t$ can be found by the following equation:
$$\Gamma^{(ML)}_t = \arg\max_{\Gamma_t}\, p(A_t \mid F(\Lambda, \Gamma_t)) \qquad (6)$$
Namely, equation (6) determines the parameter $\Gamma_t$ that maximizes the likelihood $p(A_t \mid F(\Lambda, \Gamma_t))$. ##EQU4## It should be noted that the universal parameter $\Lambda$ is fixed.
Subsequent to the determination of the adaptation parameter $\Gamma_t$ (the maximum-likelihood estimate $\Gamma^{(ML)}_t$ of the adaptation parameter) from equation (6), the identification parameter $\Psi_t$ is found according to equation (4), and the conditional probability density function $p(O \mid \phi_{t,k})$ with respect to the input characteristic vector $O$, i.e., the adaptive identification function $g_{t,k}(O)$, is calculated so as to determine the $k$ that satisfies the condition expressed by equation (1). In this manner, the identification result of the input characteristic vector $O$, i.e., the class $C_k$, can be obtained.
In the speech recognition system shown in FIG. 1, the suffix $k$ of the class $C_k$ is output as a speech recognition result, based on the above-described principle. Namely, the adaptation parameter calculator 4 determines the adaptation parameter $\Gamma_t$ (the maximum-likelihood estimate $\Gamma^{(ML)}_t$) with the use of the adaptation sample $A_t$ according to equation (6). The universal parameter $\Lambda$ $(\lambda_1, \lambda_2, \ldots, \lambda_K)$ stored in the universal parameter storage device 11 is fed to the speaker adaptor 12, in which the universal parameter $\Lambda$ is transformed according to equation (3) or (4) with the use of the adaptation parameter $\Gamma_t$ found in the adaptation parameter calculator 4. This makes it possible to calculate the identification parameter $\Psi_t$ $(\phi_{t,1}, \phi_{t,2}, \ldots, \phi_{t,K})$ forming the adaptive identification function $g_{t,k}(\cdot)$ obtained by adapting the universal identification function $G_k(\cdot)$ to the adaptation sample $A_t$. In the respective identification-function calculators 13-1 through 13-K, the values $g_{t,1}(O), g_{t,2}(O), \ldots, g_{t,K}(O)$ of the adaptive identification function $g_{t,k}(\cdot)$ represented by the identification parameters $\phi_{t,1}, \phi_{t,2}, \ldots, \phi_{t,K}$ with respect to the characteristic vector $O$ are calculated. Thereafter, in the class determination circuit 14, the $k$ satisfying equation (1) is determined with the use of the values $g_{t,1}(O), g_{t,2}(O), \ldots, g_{t,K}(O)$ of the function $g_{t,k}(\cdot)$.
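The recognition flow described above (estimate the adaptation parameter from the adaptation sample, transform the universal parameters, then pick the class with the greatest adaptive function value) can be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the embodiment: the class densities are taken to be unit-covariance Gaussians whose means play the role of the universal parameters, the adaptation function is taken to be a single mean offset shared by all classes, and the names `adapt`, `ml_shift`, and `classify` are hypothetical. For this assumed model, the maximum-likelihood offset is simply the mean residual of the labelled adaptation samples.

```python
import numpy as np

def adapt(universal_means, shift):
    """Assumed adaptation function F: add one shared offset to every class mean."""
    return universal_means + shift

def ml_shift(samples, labels, universal_means):
    """ML offset for unit-covariance Gaussians: mean of (sample - class mean)."""
    return np.mean(samples - universal_means[labels], axis=0)

def log_density(x, means):
    """Unit-covariance Gaussian log-density per class, up to an additive constant."""
    return -0.5 * np.sum((x - means) ** 2, axis=1)

def classify(x, means):
    """Class determination: index of the largest identification-function value."""
    return int(np.argmax(log_density(x, means)))

# Hypothetical universal parameters for K = 2 classes.
universal_means = np.array([[0.0, 0.0], [2.0, 0.0]])

# Labelled adaptation samples from one speaker whose speech is offset by about +1.5.
adapt_samples = np.array([[1.4, 0.1], [3.6, -0.1]])
adapt_labels = np.array([0, 1])

shift = ml_shift(adapt_samples, adapt_labels, universal_means)
adapted_means = adapt(universal_means, shift)

x = np.array([1.6, 0.0])  # a class-0 utterance from the same speaker
print(classify(x, universal_means), classify(x, adapted_means))
```

In this toy case the unadapted parameters assign the input to class 1, whereas the adapted parameters assign it to class 0, which illustrates why the adaptation step precedes the class determination.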
An explanation will now be given of a calculating method (learning method) for the universal identification function $G_k(\cdot)$, i.e., for the universal parameter $\Lambda$, stored in the universal parameter storage device 11 shown in FIG. 1. It will now be assumed that there are learning samples $X$ belonging to the $K$ classes $C_1, C_2, \ldots, C_K$, which are each formed of a group consisting of the characteristic vectors and are observed (extracted) in the $T$ variation states $V_1, V_2, \ldots, V_T$. These samples $X$ can be expressed by the following equations:
$$X = \{X_1, X_2, \ldots, X_T\} \qquad (8)$$
$$X_t = \{O_{t,1}, O_{t,2}, \ldots, O_{t,K}\} \qquad (9)$$
$$O_{t,k} = \{O_{t,k,1}, O_{t,k,2}, \ldots, O_{t,k,m_{t,k}}\} \qquad (10)$$
In equation (8), $X_t$ depicts a group of learning samples observed in the variation state $V_t$; in equation (9), $O_{t,k}$ indicates the group of learning samples belonging to the class $C_k$ among the learning samples observed in the variation state $V_t$; and in equation (10), $m_{t,k}$ designates the number of characteristic vectors forming the learning samples (a group of learning samples) $O_{t,k}$, i.e., the total number of the learning samples belonging to the class $C_k$ observed in the variation state $V_t$, and $O_{t,k,i}$ represents the $i$-th characteristic vector of the learning sample $O_{t,k}$.
The vectors formed of a group consisting of the adaptation parameters $\Gamma_1, \Gamma_2, \ldots, \Gamma_T$ and the vectors constituted of a group consisting of the identification parameters $\Psi_1, \Psi_2, \ldots, \Psi_T$ are represented by $\Gamma$ and $\Psi$, as expressed by the following equations (11) and (12), respectively.
$$\Gamma = [\Gamma_1, \Gamma_2, \ldots, \Gamma_T] \qquad (11)$$
$$\Psi = [\Psi_1, \Psi_2, \ldots, \Psi_T] \qquad (12)$$
In the above case, equation (3) or (4) can be expressed by the following equation:
$$\Psi = F(\Lambda, \Gamma) \qquad (13)$$
In this embodiment, learning is conducted on the universal parameter $\Lambda$ based on the likelihood of the adaptive identification function $g_{t,k}(\cdot)$ with respect to the learning samples $X$, the function $g_{t,k}(\cdot)$ being obtained by performing transformation $F(\cdot)$ (transformation with the adaptation function $F$) to adapt the universal identification function $G_k(\cdot)$ to the learning samples $X$ (the function $g_{t,k}(\cdot)$ is formed of a parameter $\Psi_t$, i.e., $F(\Lambda, \Gamma_t)$, as discussed above). This likelihood will hereinafter be expressed as $p(X \mid F(\Lambda, \Gamma_t))$.
The adaptation parameter .GAMMA.t, which is used for adapting the universal parameter .LAMBDA. to the variation state Vt, i.e., to the adaptation sample At, can be obtained based on the relationship between the universal parameter .LAMBDA. and the learning sample Xt. Further, since the adaptation sample At is observed arbitrarily in the variation state Vt, the adaptation parameter .GAMMA.t can be considered as a random variable for the variation state Vt. In this case, the logarithmic likelihood U1(X,.LAMBDA.) of the universal parameter .LAMBDA. with respect to the learning samples X can be defined by the following equation: ##EQU6## Equation (14) indicates the logarithmic likelihood of the universal parameter .LAMBDA. for the T variation states Vt.
The maximum likelihood estimate .LAMBDA.(ML) of the universal parameter .LAMBDA. can be determined by maximizing the logarithmic likelihood U1(X,.LAMBDA.), as expressed by the following equation: ##EQU7## It should be noted that the determination of the universal parameter .LAMBDA. that maximizes the logarithmic likelihood U1(X,.LAMBDA.) is the same as the determination of the universal parameter .LAMBDA. that maximizes the aforementioned likelihood p(X.vertline.F(.LAMBDA.,.GAMMA.t)). However, an infinite number of samples is, in general, required for obtaining the likelihood p(.GAMMA.t.vertline..LAMBDA.) in equation (15). Thus, the likelihood p(.GAMMA.t.vertline..LAMBDA.) is learned inductively so as to be maximized.
More specifically, a sample B t,h (which will hereinafter be referred to as "the adaptation sample" as required) used for determining the adaptation parameter .GAMMA. is selected from the learning samples Xt (selection from Xt is not essential; any samples observed in the variation state Vt may be used). Here, h=1, 2, . . . , Ht, where Ht indicates the total number of selected samples and may be equal to or greater than one. In this case, the logarithmic likelihood U2(X,.LAMBDA.) of the universal parameter .LAMBDA. with respect to the learning samples X can be expressed by the following equation: ##EQU8## wherein .GAMMA.(ML)t,h denotes the maximum likelihood estimate of the adaptation parameter obtained based on the sample B t,h and can be found by the following equation: ##EQU9## The maximum likelihood estimate .LAMBDA.(ML) of the universal parameter .LAMBDA. can be obtained by maximizing equation (16) according to the following equation: ##EQU10##
FIG. 2 is a schematic diagram of the configuration of an embodiment of an identification-function calculator for calculating the universal identification function Gk(.cndot.), i.e., the universal parameter .LAMBDA., based on the aforedescribed principle. Learning samples X are supplied to a learning-sample extractor 21, which extracts the learning samples used for calculating the adaptation parameter, and also to a universal parameter calculator 23. For calculating the universal parameter .LAMBDA. applied to the identification unit 5 shown in FIG. 1 (the parameter .LAMBDA. stored in the universal parameter storage device 11), characteristic vectors output from the acoustic analysis section 2 are used as the learning samples X.
The above-described sample B t,h is extracted by the learning-sample extractor 21, which selects the sample B t,h from the learning samples X based on, for example, random numbers generated in the extractor 21. The sample B t,h extracted in the extractor 21 is fed to an adaptation parameter calculator 22. A predetermined universal parameter (the initial parameter or the previous parameter) is also supplied to the adaptation parameter calculator 22. In the adaptation parameter calculator 22, the adaptation parameter .GAMMA. for adapting the universal parameter .LAMBDA. fed from the universal parameter calculator 23 to the sample B t,h is calculated. That is, the adaptation parameter calculator 22 calculates the maximum likelihood estimate .GAMMA.(ML)t,h of the adaptation parameter .GAMMA. according to equation (17) and then outputs it to the universal parameter calculator 23.
In the universal parameter calculator 23, the maximum likelihood estimate .LAMBDA.(ML) of the universal parameter .LAMBDA. is determined according to equation (18). More specifically, in the universal parameter calculator 23, the universal parameter .LAMBDA. is transformed, based on the adaptation parameter .GAMMA.(ML)t,h supplied from the adaptation parameter calculator 22, so as to obtain F(.LAMBDA.,.GAMMA.(ML)t,h) in equation (16) (which represents the parameter .PSI.t of the adaptive identification function g t,k(.cndot.) obtained by transforming the universal identification function Gk(.cndot.) so as to be adapted to the sample B t,h). The universal parameter calculator 23 then calculates the universal parameter .LAMBDA.(ML) that maximizes the likelihood of F(.LAMBDA.,.GAMMA.(ML)t,h) with respect to the samples X (in this embodiment, the logarithmic likelihood U2 expressed by equation (16)).
Further, the universal parameter calculator 23 determines whether the logarithmic likelihood U2 for providing the universal parameter .LAMBDA.(ML) is equal to or greater than, for example, a predetermined threshold. If the logarithmic likelihood U2 is found to be smaller than the predetermined threshold, the universal parameter calculator 23 supplies to the adaptation parameter calculator 22 the universal parameter .LAMBDA. which is likely to improve the logarithmic likelihood U2. Upon receiving the updated universal parameter .LAMBDA., the adaptation parameter calculator 22 re-calculates the maximum likelihood .GAMMA.(ML)t,h of the adaptation parameter .GAMMA. according to equation (17) and outputs the maximum likelihood .GAMMA.(ML)t,h to the universal parameter calculator 23.
The above-described processing is repeated in the adaptation parameter calculator 22 and the universal parameter calculator 23 until the logarithmic likelihood U2 for providing the universal parameter .LAMBDA.(ML) reaches the predetermined threshold. When the predetermined threshold is reached, the universal parameter calculator 23 outputs the universal parameter .LAMBDA.(ML) to a memory 24 and stores it therein. The universal parameter .LAMBDA.(ML) is then transferred to, for example, the universal parameter storage device 11 shown in FIG. 1 as required.
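The alternation between the adaptation parameter calculator 22 and the universal parameter calculator 23 can be sketched as a generic loop. The sketch below is an illustration under stated assumptions, not the patented implementation: the function name `alternate_until_threshold`, the `max_iter` guard, and the callables `fit_gamma`, `fit_lambda`, and `log_likelihood` are hypothetical stand-ins for equations (17), (18), and (16), which are not reproduced in this text. The toy model at the bottom (a scalar mean with unit variance, an additive offset per variation state, and each Xt used as its own adaptation sample) is likewise invented for demonstration, and its score omits constant terms.

```python
def alternate_until_threshold(samples, lam, fit_gamma, fit_lambda, log_likelihood,
                              threshold, max_iter=100):
    """Alternate the two estimation steps until the score reaches the threshold."""
    gammas, score, iterations = None, float("-inf"), 0
    for iterations in range(1, max_iter + 1):
        # Role of adaptation parameter calculator 22: one Gamma per variation state,
        # estimated with the current universal parameter held fixed.
        gammas = [fit_gamma(lam, x_t) for x_t in samples]
        # Role of universal parameter calculator 23: re-estimate Lambda with the
        # adaptation parameters held fixed, then evaluate the stopping criterion.
        lam = fit_lambda(samples, gammas)
        score = log_likelihood(samples, lam, gammas)
        if score >= threshold:
            break
    return lam, gammas, score, iterations


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Toy model (assumption): F(lam, gamma) = lam + gamma, unit-variance Gaussian.
def fit_gamma(lam, x_t):
    return mean(x_t) - lam


def fit_lambda(samples, gammas):
    return mean(x - g for x_t, g in zip(samples, gammas) for x in x_t)


def log_likelihood(samples, lam, gammas):
    # Log-likelihood up to an additive constant.
    return -0.5 * sum((x - lam - g) ** 2
                      for x_t, g in zip(samples, gammas) for x in x_t)


samples = [[1.0, 1.2, 0.8], [3.0, 3.1, 2.9]]  # two hypothetical variation states
lam, gammas, score, iterations = alternate_until_threshold(
    samples, 0.0, fit_gamma, fit_lambda, log_likelihood, threshold=-1.0)
```

Passing the estimation steps in as callables keeps the loop independent of the particular identification function and adaptation function, which matches the way the description treats F(.cndot.) abstractly at this point.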
In this manner, learning is carried out while the universal identification function Gk(.cndot.) is adapted to the learning sample Xt in accordance with the relationship between the variation state Vt and the universal identification function Gk(.cndot.). It is thus possible to gain the universal identification function Gk(.cndot.) that can offer the highest performance when adaptation is made. By use of this universal identification function Gk(.cndot.) during recognition, the recognition accuracy can be improved.
If p(.GAMMA.t.vertline..LAMBDA.) in equation (14) is described by a comparatively small number of parameters, and if a comparatively large number of samples are provided as the samples B t,h, the variation in the mean values of the samples decreases, which in turn decreases the variance of the adaptation parameters .GAMMA.t. It will now be assumed that the variance of the adaptation parameters .GAMMA.t is zero. The logarithmic likelihood U3(X,.LAMBDA.) of the universal parameter .LAMBDA. with respect to the learning samples X can thus be expressed by the following equation, since .intg.p(.GAMMA.t.vertline..LAMBDA.)d.GAMMA.t is equal to one: ##EQU11## wherein .GAMMA.(ML)t designates the maximum likelihood estimate of the adaptation parameter .GAMMA.t obtained by the following equation: ##EQU12## In the above case, the objective function U4(X,.LAMBDA.,.GAMMA.) to be maximized can be expressed by the following equation according to equations (19) and (20). ##EQU13## From equation (21), the maximum likelihood estimate .LAMBDA.(ML) of the universal parameter .LAMBDA. can be attained by maximizing the objective function U4(X,.LAMBDA.,.GAMMA.) with respect to both .LAMBDA. and .GAMMA.. The values of .LAMBDA. and .GAMMA. that maximize U4(X,.LAMBDA.,.GAMMA.) can be obtained by, for example, the simulated annealing method, the steepest descent method, or the expectation maximization (EM) algorithm. The simulated annealing method achieves relatively good solutions, though it requires more computation. The steepest descent method and the EM algorithm, on the other hand, require only a small amount of computation but frequently converge to local maxima.
An explanation will now be given of the aforedescribed learning method for the case in which the universal identification function Gk(.cndot.) uses, for example, a normal distribution (mixed normal distribution) function whose covariance components other than the diagonal components are set to zero so that it can be expressed in a parametric form, and in which the adaptation parameters are, for example, add vectors. In this case, the universal identification function Gk(O) representing a J-dimensional Gaussian distribution can be expressed by the following equation: ##EQU14## wherein J represents the number of components (the number of dimensions) of the characteristic vector O; oj indicates the j-th component of the characteristic vector O; and .mu.k,j and .sigma.k,j^2 respectively designate the mean and the variance of the j-th component oj. In this case, the universal parameter .lambda.k can be represented by [.mu.k,1, .mu.k,2, . . . , .mu.k,J, .sigma.k,1^2, .sigma.k,2^2, . . . , .sigma.k,J^2].
If the adaptation parameter of the adaptation function F(.cndot.), i.e., the add vector .GAMMA.t in this case, is expressed by [.gamma.t,k,1, .gamma.t,k,2, . . . , .gamma.t,k,J], and if .mu.k,j and .sigma.k,j^2 adapted to the learning samples Xt are indicated by .mu.t,k,j and .sigma.t,k,j^2, respectively, the parameter .phi.t,k of the adaptive identification function can be expressed by the following equation:
$$\phi_{t,k} = F(\lambda_k, \Gamma_t) = [\mu_{t,k,1}, \mu_{t,k,2}, \ldots, \mu_{t,k,J}, \sigma_{t,k,1}^2, \sigma_{t,k,2}^2, \ldots, \sigma_{t,k,J}^2] = [\mu_{k,1}+\gamma_{t,k,1}, \mu_{k,2}+\gamma_{t,k,2}, \ldots, \mu_{k,J}+\gamma_{t,k,J}, \sigma_{k,1}^2, \sigma_{k,2}^2, \ldots, \sigma_{k,J}^2] \tag{23}$$
In this case, therefore, the adaptive identification function g t,k(O) can be expressed by the following equation: ##EQU15## The universal identification function G k(.cndot.), i.e., the universal parameter .LAMBDA., can be attained by maximizing the logarithmic likelihood L(.LAMBDA.,.GAMMA.) expressed by the following equation with respect to both of .LAMBDA. and .GAMMA.(ML)t,h according to equations (16) through (18). ##EQU16## wherein .GAMMA.(ML)t,h designates the maximum likelihood of the add vector .GAMMA.t obtained with use of learning sample Bt,h.epsilon.Xt, which can be expressed by the following equation: ##EQU17## wherein F(.LAMBDA.,.GAMMA.t,h) can be found by the following equation:
$$F(\Lambda, \Gamma_{t,h}) = [\mu_{k,1}+\gamma_{t,k,1,h}, \mu_{k,2}+\gamma_{t,k,2,h}, \ldots, \mu_{k,J}+\gamma_{t,k,J,h}, \sigma_{k,1}^2, \sigma_{k,2}^2, \ldots, \sigma_{k,J}^2] \tag{27}$$
wherein .gamma. t,k,j,h represents .gamma. t,k,j with respect to the sample B t,h.
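For the diagonal Gaussian with add vectors described above, the adaptation step can be illustrated concretely. The following is a minimal sketch under assumptions: equations (22) and (24) through (26) are not reproduced in this text, so the closed-form estimate used in `ml_add_vector` (sample mean minus universal mean) is the standard maximum likelihood solution for a Gaussian with fixed variances rather than a transcription of the patent's formula. The function names and the numeric values are hypothetical.

```python
import math


def log_gauss(o, mu, var):
    """Log density of a diagonal-covariance Gaussian at characteristic vector o."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(o, mu, var))


def adapt(mu, var, gamma):
    """Adaptation of equation (23): shift each mean by the add vector, keep variances."""
    return [m + g for m, g in zip(mu, gamma)], list(var)


def ml_add_vector(vectors, mu):
    """Add vector maximizing the Gaussian likelihood of `vectors` for fixed variances.

    Assumption: for fixed variances the maximizer is the sample mean minus mu.
    """
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n - mu[j] for j in range(len(mu))]


mu_k, var_k = [0.0, 1.0], [1.0, 4.0]        # hypothetical universal parameter of one class
B = [[2.0, 0.5], [3.0, 1.5], [2.5, 1.0]]    # hypothetical adaptation samples B t,h of that class
gamma = ml_add_vector(B, mu_k)
mu_tk, var_tk = adapt(mu_k, var_k, gamma)

# Log-likelihood of the adaptation samples before and after adapting the means.
before = sum(log_gauss(o, mu_k, var_k) for o in B)
after = sum(log_gauss(o, mu_tk, var_tk) for o in B)
```

In this toy case the adapted model scores the adaptation samples higher than the unadapted universal model, which is the effect the add vector is meant to achieve.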
The principle of the learning method discussed above can be understood intuitively as follows. Consider, for example, image recognition (character recognition) in which the standard pattern of the character "A", corresponding to the universal identification function Gk(.cndot.), is learned. If only patterns of the character "A" of substantially the same shape, free from slanting, are input during recognition, it suffices to learn such patterns in a manner similar to the known technique. The standard pattern reflecting the characteristics of the input character "A", as shown in FIG. 3A, can then be obtained.
In practice, however, it is very unlikely that only the above-described patterns of the character "A" of substantially the same shape free from slanting are input during recognition. Instead, in general, characters "A" slanting in the right or left direction, as illustrated in FIG. 3B, are input in addition to the character "A" free from slanting. Conventionally, learning has been conducted in such a manner that both the character "A" free from slanting and the character "A" slanting in the right or left direction are included, as they are, in the standard pattern. Hence, the resulting standard pattern of the character "A" takes, in the extreme case, the form shown in FIG. 3D, with significant impairment of the characteristics of the character "A".
In order to overcome the above drawback, in the present invention, the standard pattern of the character "A" is rotated, as illustrated in FIG. 3C, in accordance with the slanting of the character "A" as learning samples. In other words, learning is conducted in such a manner that the standard pattern corresponding to the universal parameter .LAMBDA. is transformed with the adaptation function F(.cndot.) to be adaptable to the learning samples, so that the standard pattern sufficiently reflecting the characteristics of the character "A" can be achieved. During recognition, the resulting standard pattern can be adapted to the input characters to be recognized so as to enhance the recognition performance.
The adaptation between the standard pattern and the learning samples performed during learning may be done in the following manner. As described above, the standard pattern of the character "A" may be rotated to match the slanting of the learning samples, as shown in FIG. 3C. Alternatively, the learning samples may be adapted to the standard pattern by rotating the character "A" as the learning samples according to the slanting of the character. The same applies to the adaptation performed during recognition.
FIG. 5 is a schematic diagram of the configuration of another embodiment of a speech recognition system for performing speech recognition based on the adaptation method in which the learning samples are adapted to the standard pattern. The elements corresponding to those shown in FIG. 1 are designated by like reference numerals, and an explanation thereof will thus be omitted as required. In this speech recognition system, as in the system illustrated in FIG. 1, the recognition of speech adapted to the speaker is conducted.
As has been previously discussed, the switch 3 selects the terminal a when the adaptation to the speaker Vt is required. This causes the characteristic vectors O to be output from the acoustic analysis section 2 to an adaptation parameter calculator 34 via the switch 3 and the terminal a as the adaptation sample At used for the adaptation to the speaker Vt. The adaptation parameter calculator 34 calculates, based on the adaptation sample At, the adaptation parameter .GAMMA.(ML)t used for adapting the characteristic vectors O to the universal identification function stored in the below-described identification unit 35. The resulting adaptation parameter .GAMMA.(ML)t is output to an adaptive feature vector calculator (adaptive vector calculator) 36.
At this time, the switch 3 selects the terminal b so that the characteristic vectors O output from the acoustic analysis section 2 are supplied to the adaptive vector calculator 36 via the switch 3 and the terminal b.
Upon receiving the adaptation parameter .GAMMA.(ML)t from the adaptation parameter calculator 34, the adaptive vector calculator 36 adapts (transforms) the characteristic vectors O based on the adaptation parameter .GAMMA.(ML)t. This makes it possible to calculate the characteristic parameter F(O.vertline..gamma. t,k) adapted to the universal identification function stored in the universal parameter storage device 11 (referred to as "adaptive characteristic parameter" in this embodiment as required). The adaptive characteristic parameters F(O.vertline..gamma. t,1) through F(O.vertline..gamma. t,K) are supplied to the identification-function calculators 13-1 through 13-K, respectively.
In the identification unit 35, the input adaptive characteristic vectors are classified under a predetermined number K of classes Ck with use of the universal identification function, so that the identification (recognition) of speech input into the microphone 1 can be performed. More specifically, the universal parameter .lambda.k is read from the universal parameter storage device 11 and is fed to the identification-function calculator 13-k. In the calculator 13-k, the value of the universal identification function expressed by the universal parameter .lambda.k is calculated using the adaptive characteristic vector F(O.vertline..gamma. t,k) as the input value, and is then output to the class determination circuit 14. In the circuit 14, the speech recognition result is determined and output in a manner similar to the procedure described above while referring to FIG. 1.
The recognition (identification) principle of the speech recognition system shown in FIG. 5 will now be explained. In this system, fundamentally, the input characteristic vectors O are adapted to the universal identification function Gk(.cndot.), as described above. Thus, this system shown in FIG. 5 merely differs from the system illustrated in FIG. 1 in that the transformation with the adaptation function F(.cndot.) is performed on the characteristic vectors O instead of the universal identification function Gk(.cndot.) (universal parameter .LAMBDA.), and in that the identification function used for identification uses the universal identification function G k(.cndot.) instead of the adaptive identification function g t,k (.cndot.).
Accordingly, the class-determination rule can be obtained simply by replacing the adaptive identification function g t,k(.cndot.) shown in equation (1) with the universal identification function Gk(.cndot.) and by replacing the characteristic vector O in equation (1) with the adaptive characteristic vector F(O.vertline..gamma. t,k) adapted to the universal identification function Gk(.cndot.). Namely, the class-determination rule can be specified by the following equation: ##EQU18## Further, the maximum likelihood estimate .GAMMA.(ML)t of the adaptation parameter .GAMMA.t can be expressed by the following equation (29), which is obtained by replacing the adaptation sample At shown in equation (6) with F(At,.GAMMA.t). The likelihood p(F(At,.GAMMA.t).vertline..LAMBDA.) can be expressed by the following equation (30), which is obtained by replacing At, F(.LAMBDA.,.GAMMA.t), O t,k,i(A), and F(.lambda.k,.gamma. t,k) shown in equation (7) with F(At,.GAMMA.t), .LAMBDA., F(O t,k,i(A),.gamma. t,k), and .lambda.k, respectively. ##EQU19## After the adaptation parameter .GAMMA.t (the maximum likelihood estimate .GAMMA.(ML)t of the adaptation parameter .GAMMA.t) has been obtained according to equation (29), the adaptive characteristic vectors F(O.vertline..gamma. t,k) on the right side of equation (28) are calculated so as to determine the suffix k satisfying the condition shown in equation (28). As a consequence, the class Ck indicating the identification result of the input characteristic vectors O can be determined.
In this manner, in the speech recognition system illustrated in FIG. 5, the suffix k of the class Ck is output, based on the foregoing principle, as a speech recognition result. More specifically, the adaptation parameter calculator 34 calculates the adaptation parameter .GAMMA.t (the maximum likelihood estimate .GAMMA.(ML)t) with use of the adaptation samples At according to equation (29). In the adaptive vector calculator 36, the characteristic vectors O are transformed with use of the adaptation parameters .GAMMA.t determined in the adaptation parameter calculator 34 according to the function F(O,.GAMMA.t), so that the adaptive characteristic vectors F(O, .gamma. t,1), F(O, .gamma. t,2), . . . , F(O, .gamma. t,K) obtained by adapting the characteristic vectors O to the universal identification function Gk(.cndot.) can be calculated. Then, the identification-function calculators 13-1 through 13-K respectively calculate the function values G1(F(O, .gamma. t,1)), G2(F(O, .gamma. t,2)), . . . , GK(F(O, .gamma. t,K)) of the universal identification function Gk(.cndot.) expressed by the universal parameters .lambda.1, .lambda.2, . . . , .lambda.K with respect to the adaptive characteristic vectors F(O, .gamma. t,1), F(O, .gamma. t,2), . . . , F(O, .gamma. t,K). Thereafter, in the class determination circuit 14, the suffix k satisfying the condition of equation (28) is determined with use of the function values G1(F(O, .gamma. t,1)), G2(F(O, .gamma. t,2)), . . . , GK(F(O, .gamma. t,K)).
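The recognition flow of FIG. 5 (adapt the characteristic vector, evaluate each universal identification function, pick the class with the largest value) can be sketched as follows. Several points are assumptions made for illustration: the text does not give the concrete form of F(O, .gamma. t,k), so the sketch takes it to be subtraction of an add vector; the identification function is taken to be the logarithm of a diagonal Gaussian density, which selects the same class as the density itself; and the names `classify`, `lambdas`, and `gammas`, as well as the numeric values, are hypothetical. Class indices are 0-based in the code.

```python
import math


def log_gauss(o, mu, var):
    """Log density of a diagonal-covariance Gaussian at characteristic vector o."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(o, mu, var))


def classify(o, lambdas, gammas):
    """Return (k, scores), where k maximizes G_k(F(O, gamma_k)).

    Assumption: F(O, gamma) = O - gamma, i.e. the add vector is removed from the input.
    """
    scores = []
    for (mu, var), gamma in zip(lambdas, gammas):
        adapted = [x - g for x, g in zip(o, gamma)]   # role of adaptive vector calculator 36
        scores.append(log_gauss(adapted, mu, var))    # role of calculators 13-1 .. 13-K
    best = max(range(len(scores)), key=scores.__getitem__)  # role of class determination circuit 14
    return best, scores


# Hypothetical universal parameters (mean, variance) for K = 2 classes.
lambdas = [([0.0, 0.0], [1.0, 1.0]), ([4.0, 4.0], [1.0, 1.0])]
gammas = [[2.0, 2.0], [2.0, 2.0]]   # hypothetical add vectors for the current speaker
o = [3.0, 2.5]                      # hypothetical input characteristic vector

k_adapted, _ = classify(o, lambdas, gammas)
k_plain, _ = classify(o, lambdas, [[0.0, 0.0], [0.0, 0.0]])  # no adaptation, for comparison
```

In this toy setting the decision changes once the speaker offset is removed: the adapted vector is assigned to the first class, whereas the unadapted vector falls closer to the second.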
A description will now be given of a calculating method (learning method) for the universal identification function Gk(.cndot.), i.e., the universal parameter .LAMBDA., used in the speech recognition system shown in FIG. 5. Fundamentally, in this system, learning is carried out in such a manner that the learning samples X are adapted to the universal identification function Gk(.cndot.), as has been discussed above. Thus, the calculating method for this system merely differs from the method employed for the system illustrated in FIG. 1 in that the transformation with use of the adaptation function F(.cndot.) is performed on the learning samples X instead of the universal identification function Gk(.cndot.) (universal parameter .LAMBDA.). In this embodiment, the samples obtained by adapting the learning samples X to the universal identification function Gk(.cndot.), i.e., the samples obtained by transforming the learning samples X according to the adaptation function F(.cndot.), will be referred to as "the adaptive learning samples" as required.
Accordingly, the learning samples X (or adaptation samples B t,h) are substituted for the universal parameter .LAMBDA. as the object to be transformed with the adaptation function F(.cndot.) in equations (14) through (18), thus resulting in equations (31) through (35). The maximum likelihood estimate .LAMBDA.(ML) of the universal parameter .LAMBDA. can be determined by maximizing equation (33) according to equation (35). ##EQU20##
FIG. 6 is a schematic diagram of the configuration of another embodiment of the identification-function calculator for calculating the universal identification function Gk(.cndot.), i.e., the universal parameter .LAMBDA., based on the foregoing principle. The elements corresponding to those shown in FIG. 2 are designated by like reference numerals, and an explanation thereof will thus be omitted as required.
The sample B t,h extracted in the learning-sample extractor 21 and the universal parameter .LAMBDA. output from a universal parameter calculator 43 are supplied to an adaptation parameter calculator 42. The adaptation parameter calculator 42 calculates the adaptation parameter .GAMMA. used for adapting the sample B t,h supplied from the extractor 21 to the universal parameter .LAMBDA. fed from the universal parameter calculator 43. That is, the maximum likelihood estimate .GAMMA.(ML)t,h of the adaptation parameter .GAMMA.t is calculated by the adaptation parameter calculator 42 according to equation (34) and is then output to the universal parameter calculator 43.
In the universal parameter calculator 43, the maximum likelihood .LAMBDA.(ML) of the universal parameter .LAMBDA. is determined according to equation (35). More specifically, in the universal parameter calculator 43, the learning samples X are transformed based on the adaptation parameter .GAMMA.(ML)t,h fed from the adaptation parameter calculator 42. This makes it possible to determine the adaptive learning sample F(Xt,.GAMMA.(ML)t,h) in equation (33) and also to calculate the universal parameter .LAMBDA.(ML) that maximizes the likelihood (logarithmic likelihood) U2 with respect to the adaptive learning sample F(Xt,.GAMMA.(ML)t,h).
Further, the universal parameter calculator 43 determines whether the logarithmic likelihood U2 for providing the universal parameter .LAMBDA.(ML) is equal to or greater than, for example, a predetermined threshold. If the logarithmic likelihood U2 is smaller than the predetermined threshold, the universal parameter calculator 43 supplies to the adaptation parameter calculator 42 the universal parameter .LAMBDA. that is likely to improve the logarithmic likelihood U2. Upon receiving the updated universal parameter .LAMBDA., the adaptation parameter calculator 42 re-calculates the maximum likelihood estimate .GAMMA.(ML)t,h of the adaptation parameter .GAMMA. according to equation (34) and outputs it to the universal parameter calculator 43.
Processing similar to the above-described procedure is repeated in the adaptation parameter calculator 42 and the universal parameter calculator 43 until the logarithmic likelihood U2 for providing the universal parameter .LAMBDA.(ML) reaches the predetermined threshold. When the predetermined threshold is reached, the universal parameter calculator 43 outputs the universal parameter .LAMBDA.(ML) to the memory 24 and stores it therein.
In this fashion, learning is carried out while the learning samples Xt are adapted to the universal identification function Gk(.cndot.) in accordance with the relationship between the variation state Vt and the universal identification function Gk(.cndot.). It is thus possible to gain the universal identification function Gk(.cndot.) that can offer the highest performance when adaptation is made. By use of this universal identification function Gk(.cndot.) during recognition, the recognition accuracy can be improved.
If p(.GAMMA.t.vertline..LAMBDA.) in equation (31) is described by a comparatively small number of parameters, and if a comparatively large number of samples are provided as the samples B t,h, the variation in the adaptation parameters .GAMMA.t becomes very small. Assuming that the variance of the adaptation parameters .GAMMA.t is zero, the following equations (36) through (38) corresponding to equations (19) through (21) are obtained. ##EQU21## In the above case, the maximum likelihood estimate .LAMBDA.(ML) of the universal parameter .LAMBDA. can be attained by maximizing U4(X,.LAMBDA.,.GAMMA.) in equation (38) with respect to both .LAMBDA. and .GAMMA..
Although an explanation has been given of the present invention applied to speech recognition, this is not exclusive. The invention is applicable to any type of pattern recognition, such as image recognition.
Also, although in this embodiment the normal-distribution identification function is specifically used as the universal identification function, other types of identification functions, such as the Mahalanobis identification functions, the identification functions by Hidden Markov Models (HMM), the polynomial discriminant or identification functions, and the identification functions expressed by neural networks, may be employed.
Moreover, in this embodiment, the add vector is calculated as the adaptation parameter and is used to perform transformation to adapt the universal identification function. However, other types of adaptation parameters, for example, parameters for linear transformation, may be calculated and used to linearly transform the universal identification function. Further, the universal identification function may be adapted by, for example, linear transformation, polynomial transformation, and transformation by neural networks. The same methods are applicable to the transformation of the learning samples to the universal identification function.
Further, in the embodiment in which the universal identification function is adapted to the learning samples, the maximum likelihood estimate of the universal parameter is determined. In other words, based on the likelihood (logarithmic likelihood) of the adaptive identification function with respect to the learning samples, the universal parameter that maximizes the likelihood is determined. However, other types of learning criteria, such as the mutual information and the classification error rate, may be used, and based on these criteria, the universal parameter that maximizes the mutual information or the universal parameter that minimizes the classification error rate may be determined. The same is applicable to the adaptation of the learning samples to the universal identification function; learning criteria other than the likelihood, such as the mutual information and the classification error rate, may be employed.
In this embodiment, for adapting the universal identification function to the learning samples, the maximum likelihood of the adaptation parameter is determined. In other words, based on the likelihood (logarithmic likelihood) of the adaptive identification function with respect to the learning samples, the adaptation parameter that maximizes the likelihood is determined. Other types of learning criteria, such as the mutual information and the classification error rate, may be used for the adaptation parameter, as well as the universal parameter. The same is applicable to the adaptation of the learning samples to the universal identification function.
Additionally, although in this embodiment a single universal identification function is allocated to each class, a plurality of universal identification functions may be assigned to each class.
As will be clearly understood from the foregoing description, the present invention offers the following advantages.
According to one form of the identification-function calculator and calculating method of the present invention, the identification function is learned while it is being adapted to learning samples. It is thus possible to obtain the identification function that can offer the highest performance when adaptation is carried out at the recognition (classification) stage.
According to one form of the identification unit and method of the present invention, a characteristic vector is identified with use of an adaptive identification function obtained by adapting an identification function to a characteristic vector. In this case, the identification function has been learned while being adapted to learning samples, and accordingly, it can offer the highest performance when it is adapted, thereby improving the identification (recognition) accuracy of the characteristic vectors, in other words, reducing the classification error rate.
According to one form of the speech recognition system of the present invention, an identification function is adapted to a characteristic vector so as to obtain an adaptive identification function. By use of this function, the characteristic vector is classified under one of a predetermined number of classes. In this case, the identification function was acquired while it was being adapted to learning samples, and accordingly, it can offer the highest performance when it is adapted, thereby enhancing speech recognition accuracy.
According to another form of the identification-function calculator and calculating method, an identification function is learned while learning samples are being adapted to the function. This makes it possible to attain the identification function that can offer the highest identification performance in the case that adaptation is carried out at the recognition stage.
According to another form of the identification unit and method, an adaptive characteristic vector obtained by adapting a characteristic vector to an identification function is identified with use of the identification function. In this case, the identification function was learned while learning samples were being adapted to the function, and accordingly, it can offer the highest performance when the characteristic vector is adapted, thereby improving the identification (recognition) accuracy of the characteristic vectors.
According to another form of the speech recognition system, an adaptive characteristic vector obtained by adapting a characteristic vector to an identification function is classified under one of a predetermined number of classes. In this case, the identification function was learned while learning samples were being adapted to the function, and accordingly, it can offer the highest performance when the characteristic vector is adapted, thereby enhancing speech recognition accuracy.
Claims
  • 1. A speech recognition system comprising:
  • acoustic analysis means for acoustically analyzing input speech and calculating feature vectors;
  • a calculating section for calculating, based on the feature vectors, output values of adaptive discriminant functions in relation to the feature vectors by using the adaptive discriminant functions having adaptive discriminant function parameters that are transformed to be adapted to the state of the variation of the factors that affect the input speech;
  • a determining section for determining a speech recognition result based on a calculation result of said calculating section, wherein
  • the adaptive discriminant function parameters are obtained by applying to adaptation functions, universal discriminant function parameters, which are independent of a state of the variation of the factors, and adaptation parameters for the recognition, and wherein
  • the adaptation parameters for the recognition maximize a probabilistic likelihood of the adaptive discriminant function parameters in relation to adaptation learning speech signal samples for the recognition under the state of the variation of the factors,
  • the universal discriminant function parameters being obtained by a learning method which comprises:
  • determining, under the condition of using the adaptation functions, adaptation parameters for learning the probabilistic likelihood in relation to learning samples under the state of the variation of a plurality of factors without changing determined universal discriminant function parameters;
  • determining, based on the probabilistic likelihood, universal discriminant function parameters in relation to the learning without changing the determined adaptation parameters for learning;
  • judging, in a process of alternately performing said determining operations repeatedly, whether said learning method is to be discontinued; and
  • using the obtained universal discriminant function parameters as trained values in a case where said learning method is discontinued after performing said judging operations.
  • 2. The speech recognition system according to claim 1, wherein the adaptation parameters for recognition, the adaptation parameters for learning, and the universal discriminant function parameters are determined based on mutual information.
  • 3. The speech recognition system according to claim 1, wherein the adaptation parameters for recognition, the adaptation parameters for learning, and the universal discriminant function parameters are determined based on an error rate among the learning samples.
  • 4. The speech recognition system according to claim 1, wherein the universal discriminant function parameters are obtained by a learning method that comprises:
  • sequentially updating, based on the probabilistic likelihood, each of the universal discriminant function parameters and the adaptation parameters for maximum learning under the condition of using the adaptation functions, both the universal discriminant function parameters and the adaptation parameters being updated concurrently,
  • determining whether the sequential updating operation is to be discontinued, and
  • using the obtained universal discriminant function parameters as trained values in a case where the updating operation is discontinued.
Priority Claims (1)
Number Date Country Kind
7-247890 Sep 1995 JPX
Parent Case Info

This is a division of prior application Ser. No. 08/710,941, filed Sep. 24, 1996, now U.S. Pat. No. 5,828,998.

US Referenced Citations (4)
Number Name Date Kind
5638486 Wang et al. Jun 1997
5737489 Chou et al. Apr 1998
5749072 Mazurkiewicz et al. May 1998
5754681 Watanabe et al. May 1998
Divisions (1)
Number Date Country
Parent 710941 Sep 1996