This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2012-195643, filed on Sep. 6, 2012, the entire contents of which are incorporated herein by reference.
An embodiment described herein relates generally to a model learning device, a model generation method, and a computer program product.
A Gaussian distribution used for an acoustic model for speech recognition or the like has a mean vector and a covariance matrix. In general, using a full covariance matrix, which takes correlation between variables into account, as the covariance matrix results in higher recognition performance than using a diagonal covariance matrix, which does not. When the amount of training data for each Gaussian distribution is insufficient, however, a full covariance matrix often cannot be used, because it either cannot be obtained at all or can only be obtained with low reliability.
A technique of sharing one full covariance matrix among a plurality of Gaussian distributions and learning the shared full covariance matrix by using the training data of those Gaussian distributions can be considered as a technique for obtaining a full covariance matrix that is reliable even when the amount of training data for each Gaussian distribution is small. This technique allows the amount of training data per full covariance matrix to be increased as compared to the amount of training data per Gaussian distribution. In this manner, a reliable full covariance matrix can be obtained, and recognition performance improved, by adjusting the number of full covariance matrices to the amount of training data.
With the related-art technique of learning a shared full covariance matrix, however, the full covariance matrices of all the Gaussian distributions need to be obtained in advance. Furthermore, there is a problem that the resulting sharing structure is not necessarily optimum in terms of the maximum likelihood criterion.
According to an embodiment, a model learning device learns a model having a full covariance matrix shared among a plurality of Gaussian distributions. The device includes a first calculator to calculate, from training data, frequencies of occurrence and sufficient statistics of the Gaussian distributions contained in the model; and a second calculator to select, on the basis of the frequencies of occurrence and the sufficient statistics, a sharing structure in which a covariance matrix is shared among Gaussian distributions, and calculate the full covariance matrix shared in the selected sharing structure.
An embodiment of a model learning device will be described below in detail with reference to the accompanying drawings. Here, an example will be described in which the device learns a model having a shared full covariance matrix, that is, a full covariance matrix shared among a plurality of Gaussian distributions contained in a hidden Markov model used for speech recognition.
The first calculating unit 10 receives as input a hidden Markov model (statistical model) having a mixture of Gaussian distributions as its output distribution, and calculates, from training data, the frequency of occurrence Nm and the sufficient statistic Tm=(T1m, T2m) of each Gaussian distribution m (1≦m≦M) contained in the hidden Markov model. When the training data from time 1 to time U is represented by X=(x(1), . . . , x(U)) and the occupation probability of a Gaussian distribution m at time u is represented by γm(u), the frequency of occurrence and the sufficient statistic can be calculated by using the following equations (1) to (3):

N_m = \sum_{u=1}^{U} \gamma_m(u)   (1)

T_{1m} = \sum_{u=1}^{U} \gamma_m(u) x(u)   (2)

T_{2m} = \sum_{u=1}^{U} \gamma_m(u) x(u) x(u)^t   (3)

In the equations, the superscript t represents the transposition of a matrix (or vector).
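For concreteness, the following is a minimal Python/NumPy sketch of this accumulation, assuming the occupation probabilities γm(u) have already been obtained (for example, by a forward-backward pass over the hidden Markov model); the function name and array layout are illustrative and not part of the embodiment:

```python
import numpy as np

def accumulate_statistics(X, gamma):
    """Equations (1)-(3): frequency of occurrence and sufficient statistics.

    X     : (U, d) array of training vectors x(1), ..., x(U)
    gamma : (U, M) array of occupation probabilities gamma_m(u)
    Returns N of shape (M,), T1 of shape (M, d), T2 of shape (M, d, d).
    """
    N = gamma.sum(axis=0)              # (1): N_m = sum_u gamma_m(u)
    T1 = gamma.T @ X                   # (2): T1_m = sum_u gamma_m(u) x(u)
    # (3): T2_m = sum_u gamma_m(u) x(u) x(u)^t
    T2 = np.einsum('um,ui,uj->mij', gamma, X, X)
    return N, T1, T2
```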
The second calculating unit 12 clusters the Gaussian distributions by using the frequencies of occurrence and the sufficient statistics calculated by the first calculating unit 10, calculates a full covariance matrix shared among the Gaussian distributions belonging to the same cluster, and outputs a shared covariance statistical model. Note that the second calculating unit 12 clusters the Gaussian distributions by using the K-means algorithm, the LBG algorithm, a binary tree clustering algorithm, or the like, for example. In the clustering, the second calculating unit 12 treats the centroid of a cluster as the shared full covariance matrix, the samples as the Gaussian distributions, and a measure of closeness (distance or similarity) between the centroid and a sample as an expected value of log likelihood (see equation (4) below). The larger the expected value of log likelihood, the closer the centroid and the sample are to each other.
A cross mark (x) represents the sufficient statistic of each Gaussian distribution, and a black dot (●) represents the shared full covariance matrix that is the centroid of a cluster. In addition, a two-headed solid arrow connecting a cross mark (x) and a black dot (●) represents the relationship between the sufficient statistic of each Gaussian distribution and the shared full covariance matrix having the maximum expected value of log likelihood calculated by using that sufficient statistic. Furthermore, a broken line represents a boundary between clusters formed in the space a of the sufficient statistics.
As illustrated in
First Exemplary Process by Second Calculating Unit 12
As a first exemplary process, the second calculating unit 12 clusters the M Gaussian distributions into K (K≦M) clusters by using the K-means algorithm, and calculates the shared full covariance matrices.
In a cluster selection step (Step S102), the second calculating unit 12 selects an optimum cluster for each Gaussian distribution according to the maximum likelihood criterion. In other words, the second calculating unit 12 determines an optimum sharing structure. Specifically, when the shared full covariance matrix of a cluster k is represented by Σk and the mean vector of a Gaussian distribution m is represented by μm, the expected value Lm(k) of log likelihood for the training data using the shared full covariance matrix Σk is calculated by the following equation (4), in which the superscripts i and i−1 represent the number of repetitions of calculation:

L_m(k) = -\frac{1}{2}\left\{ d \log 2\pi + \log\left|\Sigma_k^{(i-1)}\right| + \mathrm{Tr}\!\left(\left(\Sigma_k^{(i-1)}\right)^{-1} \frac{T_{2m} - T_{1m}\mu_m^t - \mu_m T_{1m}^t + N_m \mu_m \mu_m^t}{N_m}\right) \right\}   (4)

In the equation, d represents the number of dimensions of the training data x(u) and Tr( ) represents the trace of a matrix. The expected value Lm(k) of log likelihood is calculated for all the clusters, and the cluster for which Lm(k) is maximum is selected as the cluster of the Gaussian distribution m.
In a shared full covariance matrix update step (Step S104), the second calculating unit 12 calculates and updates each shared full covariance matrix by the following equation (5), using the mean vectors, frequencies of occurrence, and sufficient statistics of the Gaussian distributions belonging to the cluster. In other words, the second calculating unit 12 updates the centroid:

\Sigma_k^{(i)} = \frac{\sum_{m \in C_k}\left( T_{2m} - T_{1m}\mu_m^t - \mu_m T_{1m}^t + N_m \mu_m \mu_m^t \right)}{\sum_{m \in C_k} N_m}   (5)

In the equation, Ck represents the set of indices of the Gaussian distributions belonging to a cluster k.
In a termination determination step (Step S106), the second calculating unit 12 determines whether or not a termination condition for the calculation of the shared full covariance matrices is satisfied. If the termination condition is not satisfied (Step S106: No), the second calculating unit 12 returns to the processing in Step S102. If, on the other hand, the termination condition is satisfied (Step S106: Yes), the second calculating unit 12 terminates the process. The termination condition may be "the result of the processing in the cluster selection step (Step S102) being the same as the previous result", "the number of repetitions of calculation having reached a predetermined number", or the like.
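To make the procedure concrete, the following is a minimal Python/NumPy sketch of Steps S102 to S106, assuming the second-order statistic of each Gaussian has been centered on its mean vector in advance (the quantity T2m − T1mμm^t − μmT1m^t + Nmμmμm^t appearing in equations (4) and (5)) and that initial centroids are supplied by the caller; the function names are illustrative and not part of the embodiment:

```python
import numpy as np

def expected_log_likelihood(N_m, Tc_m, Sigma_k):
    """Equation (4): expected log likelihood of Gaussian m under the
    shared full covariance Sigma_k.  Tc_m is the second-order statistic
    of Gaussian m centered on its mean vector mu_m."""
    d = Sigma_k.shape[0]
    _, logdet = np.linalg.slogdet(Sigma_k)
    trace_term = np.trace(np.linalg.solve(Sigma_k, Tc_m / N_m))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + trace_term)

def kmeans_shared_covariances(N, Tc, Sigmas, max_iter=20):
    """Steps S102-S106: alternate maximum-likelihood cluster selection
    and centroid update (equation (5)) until the assignment no longer
    changes or a predetermined number of repetitions is reached."""
    M, K = len(N), len(Sigmas)
    assign = np.full(M, -1)
    for _ in range(max_iter):
        # Step S102: select, for each Gaussian m, the cluster k maximizing L_m(k)
        new_assign = np.array([
            max(range(K),
                key=lambda k: expected_log_likelihood(N[m], Tc[m], Sigmas[k]))
            for m in range(M)])
        # Step S106: terminate when the selection matches the previous result
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
        # Step S104: update each centroid by equation (5)
        for k in range(K):
            members = np.where(assign == k)[0]
            if members.size > 0:
                Sigmas[k] = Tc[members].sum(axis=0) / N[members].sum()
    return Sigmas, assign
```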
Note that the second calculating unit 12 may have software or hardware to execute the processing in each of the steps illustrated in
Second Exemplary Process by Second Calculating Unit 12
As a second exemplary process, the second calculating unit 12 clusters the M Gaussian distributions into K (K≦M) clusters by using the LBG (Linde-Buzo-Gray) algorithm, and calculates the shared full covariance matrices.
In a cluster segmentation step (Step S202), the second calculating unit 12 increases the number of clusters from K′ to min(K, nK′) (cluster segmentation). Note that n is within a range of 1<n≦2, and n=2 is typically used. In addition, min(a, b) is a function that outputs the smaller of a and b.
More specifically, the second calculating unit 12 selects min(K, nK′)−K′ of the K′ shared full covariance matrices and segments each selected shared full covariance matrix into two. Subsequently, the second calculating unit 12 combines the 2(min(K, nK′)−K′) shared full covariance matrices obtained by the segmentation with the K′−(min(K, nK′)−K′) shared full covariance matrices that are not segmented to obtain min(K, nK′) shared full covariance matrices. The second calculating unit 12 then updates the number K′ of clusters to min(K, nK′). For example, when K=10, K′=4 and n=2, min(10, 8)−4=4 matrices are segmented into 8, no matrices are left unsegmented, and min(10, 8)=8 shared full covariance matrices are obtained in total.
In a K-means algorithm step (Step S204), the second calculating unit 12 executes the K-means algorithm using K′ shared full covariance matrices obtained in the cluster segmentation step (Step S202) as the initial shared full covariance matrices to calculate K′ shared full covariance matrices.
In a termination determination step (Step S206), the second calculating unit 12 determines whether or not the number of clusters is a desired number K. If K′<K (Step S206: No), the second calculating unit 12 proceeds to the processing in Step S202. If K′=K (Step S206: Yes), on the other hand, the second calculating unit 12 terminates the process.
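The following is a minimal Python sketch of Steps S202 to S206, building on the kmeans_shared_covariances sketch above. The embodiment does not fix how a shared full covariance matrix is segmented into two; perturbing it by a small factor, as done here, is an assumption chosen for illustration:

```python
import math
import numpy as np

def lbg_shared_covariances(N, Tc, K, n=2.0, eps=1e-3, max_iter=20):
    """Steps S202-S206: grow the number of clusters from 1 toward K,
    refining the centroids with the K-means procedure after each
    segmentation."""
    Sigmas = [Tc.sum(axis=0) / N.sum()]        # start from one pooled cluster
    assign = np.zeros(len(N), dtype=int)
    while len(Sigmas) < K:                     # Step S206: stop when K' = K
        K_prime = len(Sigmas)
        # Step S202: segment min(K, nK') - K' centroids into two each
        n_split = min(K, math.ceil(n * K_prime)) - K_prime
        split, kept = Sigmas[:n_split], Sigmas[n_split:]
        Sigmas = ([S * (1.0 + eps) for S in split]
                  + [S * (1.0 - eps) for S in split] + kept)
        # Step S204: refine the K' centroids with the K-means procedure
        Sigmas, assign = kmeans_shared_covariances(N, Tc, Sigmas, max_iter)
    return Sigmas, assign
```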
Note that the second calculating unit 12 may have software or hardware to execute the processing in each of the steps illustrated in
Furthermore, the first calculating unit 10 may be configured to obtain, instead of the sufficient statistic Tm=(T1m, T2m), the amount expressed by the following equation (6), namely the second-order statistic centered on the mean vector μm:

\tilde{T}_m = \sum_{u=1}^{U} \gamma_m(u)\left(x(u) - \mu_m\right)\left(x(u) - \mu_m\right)^t   (6)

In this case, the aforementioned equations (4) and (5) are expressed by the following equations (7) and (8), respectively:

L_m(k) = -\frac{1}{2}\left\{ d \log 2\pi + \log\left|\Sigma_k^{(i-1)}\right| + \mathrm{Tr}\!\left(\left(\Sigma_k^{(i-1)}\right)^{-1} \frac{\tilde{T}_m}{N_m}\right) \right\}   (7)

\Sigma_k^{(i)} = \frac{\sum_{m \in C_k} \tilde{T}_m}{\sum_{m \in C_k} N_m}   (8)
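This centered statistic is exactly the quantity Tc consumed by the K-means and LBG sketches above. It could be accumulated directly as in the following minimal sketch, again assuming precomputed occupation probabilities; the function name is illustrative:

```python
import numpy as np

def accumulate_centered_statistics(X, gamma, mu):
    """Equation (6): second-order statistic of each Gaussian centered on
    its own mean vector, making T1_m and T2_m individually unnecessary.

    X     : (U, d) training vectors
    gamma : (U, M) occupation probabilities
    mu    : (M, d) mean vectors mu_m of the input model
    """
    # residuals r_m(u) = x(u) - mu_m for every (u, m) pair: shape (U, M, d)
    R = X[:, None, :] - mu[None, :, :]
    # T~_m = sum_u gamma_m(u) r_m(u) r_m(u)^t : shape (M, d, d)
    return np.einsum('um,umi,umj->mij', gamma, R, R)
```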
The model learning device 1 according to the embodiment has the hardware configuration of a common computer system, including a control device such as a CPU, storage devices such as a ROM and a RAM, an external storage device such as an HDD or a CD drive, a display device such as a display, and input devices such as a keyboard and a mouse.
Model generation programs to be executed by the model learning device 1 according to the embodiment are recorded on a computer readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, or a DVD (digital versatile disk) in the form of an installable or executable file, and provided therefrom.
Alternatively, the model generation programs to be executed by the model learning device 1 according to the embodiment may be stored on a computer system connected to a network such as the Internet, and provided by being downloaded via the network. Still alternatively, the model generation programs to be executed by the model learning device 1 according to the embodiment may be provided or distributed through a network such as the Internet.
Still alternatively, the model generation programs according to the embodiment may be embedded in a ROM or the like in advance and provided therefrom.
The model generation programs to be executed by the model learning device 1 according to the embodiment have a modular structure including the respective units (the first calculating unit 10 and the second calculating unit 12) described above, for example. In an actual hardware configuration, a CPU (processor) reads the model generation programs from the storage medium mentioned above and executes them, whereby the first calculating unit 10 and the second calculating unit 12 are loaded onto a main storage device and generated thereon.
As described above, with the model learning device 1 according to the embodiment, pattern recognition performance can be improved even when the full covariance matrices of all the Gaussian distributions cannot be obtained in advance. Specifically, the model learning device 1 can determine an optimum sharing structure, in which a full covariance matrix is shared, on the basis of the maximum likelihood criterion even when a full covariance matrix cannot be obtained for each Gaussian distribution owing to a lack of training data.
Comparative Example of Clustering Full Covariance Matrices
A cross mark (x) represents the sufficient statistic of each Gaussian distribution, a white dot (o) represents the full covariance matrix of each Gaussian distribution, and a black dot (●) represents the shared full covariance matrix that is the centroid of a cluster. A single-headed dashed arrow from a cross mark (x) toward a white dot (o) means that a full covariance matrix is obtained from the sufficient statistic of each Gaussian distribution. In addition, a two-headed solid arrow connecting a white dot (o) and a black dot (●) represents the relationship between the full covariance matrix of each Gaussian distribution and the shared full covariance matrix at the shortest distance therefrom.
Furthermore, a broken line represents a boundary between clusters formed in the space b of the full covariance matrices. In the clustering of the full covariance matrices according to the comparative example, clustering is performed so that the sum of the distances between the full covariance matrices of the respective Gaussian distributions and the shared full covariance matrices associated therewith becomes smallest. Accordingly, the full covariance matrix of each Gaussian distribution needs to be obtained from its sufficient statistic in advance. Furthermore, since the clustering of the full covariance matrices according to the comparative example is based on the distances between full covariance matrices, it is not necessarily optimum in terms of the maximum likelihood criterion.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind
---|---|---|---
2012-195643 | Sep 6, 2012 | JP | national