For a more complete understanding of the invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
Those skilled in the pertinent art should understand that the principles of the present invention may be used to reduce the storage requirements of any model in which distributions (sometimes called “elementary distributions”) are weighted and mixed to form the model. Such models may be used as acoustic models and often employ mixtures of Gaussian distributions when used for that purpose. Though the present invention has broad applicability, the embodiments set forth in this Detailed Description will be directed specifically to GMMs in the context of ASR.
Before describing certain embodiments of the system and the method of the invention, a wireless communication infrastructure in which the novel automatic acoustic model training system and method and the underlying novel state-tying technique of the present invention may be applied will be described. Accordingly,
One advantageous application for the system or method of the invention is in conjunction with the mobile communication devices 110a, 110b. Although not shown in
Having described an exemplary environment within which the system or the method of the present invention may be employed, some remarks underlying the present invention will now be set forth. The system and method can substantially compress the storage requirements for mixture weights without degrading ASR performance. The system and method are founded on three observations regarding the properties of Gaussian mixture weights:
1. Gaussian mixture weights are not independent; they sum to one.
2. The distribution of each Gaussian mixture weight is homogeneous along each dimension.
3. Mixture weight order can be changed in the likelihood computation using an appropriate tying scheme.
The system and method first reorder the mixture weights within each mixture weight vector by sorting. A corresponding change to the order of the Gaussian distributions should also be made in the HMM-GMM to ensure that the mixture weights correspond to the correct Gaussians. Unless the mixture weights happen by chance to be in the desired order already, the sorting reduces, or compresses, the overall vector space of the mixture weights. The sorting also changes the homogeneous distribution along each dimension to a distribution that differs in each dimension, so vector quantization can be used to code the vector space efficiently. As those skilled in the pertinent art understand, vector quantization is based on Euclidean distance. After vector (or subvector) quantization of the mixture weight vectors, post-processing can be performed to ensure that the sum of the vector elements equals one.
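The effect of the sorting on the vector space can be illustrated with a minimal pure-Python sketch (the function and variable names are hypothetical and not part of the invention as claimed): two mixture weight vectors that are permutations of each other collapse to a single sorted vector, provided the same permutation is applied to their distributions.

```python
def sort_mixture(weights, gaussians):
    """Sort mixture weights ascending and permute the Gaussians identically,
    so each weight still multiplies its own distribution."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    return [weights[i] for i in order], [gaussians[i] for i in order]

# Two states whose mixture weights are permutations of each other...
w_a = [0.5, 0.2, 0.3]
w_b = [0.2, 0.3, 0.5]
g_a = ["g0", "g1", "g2"]   # stand-ins for Gaussian parameter sets
g_b = ["h0", "h1", "h2"]

sw_a, sg_a = sort_mixture(w_a, g_a)
sw_b, sg_b = sort_mixture(w_b, g_b)

# ...collapse to the same sorted weight vector, so one codeword can serve both.
print(sw_a)  # [0.2, 0.3, 0.5]
print(sw_b)  # [0.2, 0.3, 0.5]
print(sg_a)  # ['g1', 'g2', 'g0']
```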
In one embodiment of the present invention, 95,000 Gaussian mixture weights, representing 9500 tied states with 10 mixtures per state, can be stored in only 13 Kbytes of memory. This includes the codebook and indices that vector quantization requires. The result is an extremely efficient compression to only 1.09 bits per mixture weight. Without benefit of the present invention, scalar quantization of that many mixture weights typically requires as few as eight or as many as 16 bits per mixture weight, resulting in a total of at least 95 Kbytes of memory. The proposed method clearly has a significant advantage over scalar quantization and, as will be shown, unsorted vector quantization. This reduction in storage requirement is important for mobile communication devices, where storage is a major concern.
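The figures above can be checked with simple arithmetic (taking 1 Kbyte as 1,000 bytes, an assumption made only for this illustration):

```python
n_weights = 9500 * 10          # 9500 tied states, 10 mixtures per state
vq_bytes = 13_000              # codebook plus indices: 13 Kbytes
scalar_bytes = n_weights * 1   # scalar quantization at 8 bits per weight

bits_per_weight = vq_bytes * 8 / n_weights
print(round(bits_per_weight, 2))   # 1.09
print(scalar_bytes)                # 95000, i.e., 95 Kbytes
```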
Certain embodiments of the system and method will now be described in greater detail.
Turning now to
The at least one Gaussian mixture weight vector 420 is provided to a vector and distribution sorter 430. The vector and distribution sorter 430 is configured to re-order elements of the at least one Gaussian mixture weight vector and corresponding distributions to yield at least one re-ordered Gaussian mixture weight vector. The distributions, e.g., Gaussian distributions, in the acoustic model are likewise re-ordered so that the correct mixture weight continues to be applied to its corresponding distribution.
In one embodiment, the vector and distribution sorter 430 is configured to sort the elements of the at least one Gaussian mixture weight vector to minimize Euclidean distances among elements of the at least one quantized re-ordered Gaussian mixture weight vector. By way of example, the vector and distribution sorter may be configured to sort the elements in ascending order. Alternatively, the vector and distribution sorter may be configured to sort the elements in descending order. Those skilled in the pertinent art will understand, however, that any conventional or later-developed sorting criterion or algorithm may be appropriate for a given application and that all such criteria or algorithms fall within the broad scope of the present invention.
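One way the vector and distribution sorter 430 might operate can be sketched as follows, under the assumption that each Gaussian is represented by its mean and variance vectors (the names and the one-dimensional parameters are hypothetical):

```python
def sort_state(weights, means, variances, descending=False):
    """Re-order a state's mixture weights (ascending by default) and apply
    the same permutation to the per-mixture Gaussian parameters."""
    order = sorted(range(len(weights)), key=lambda i: weights[i],
                   reverse=descending)

    def permute(seq):
        return [seq[i] for i in order]

    return permute(weights), permute(means), permute(variances)

weights = [0.40, 0.05, 0.25, 0.30]
means = [[1.0], [2.0], [3.0], [4.0]]       # hypothetical 1-D means
variances = [[0.1], [0.2], [0.3], [0.4]]

w, m, v = sort_state(weights, means, variances)
print(w)  # [0.05, 0.25, 0.3, 0.4]
print(m)  # [[2.0], [3.0], [4.0], [1.0]]
```

Passing `descending=True` yields the alternative descending order mentioned above; either choice is consistent as long as it is applied uniformly.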
The re-ordered Gaussian mixture weight vector 420 is next provided to a vector quantizer 440 that is associated with the vector and distribution sorter 430. The vector quantizer 440 is configured to vector quantize the at least one re-ordered Gaussian mixture weight vector to yield at least one quantized re-ordered Gaussian mixture weight vector. In a more specific embodiment, the vector quantizer 440 is configured to subvector quantize the at least one re-ordered Gaussian mixture weight vector to yield the at least one quantized re-ordered Gaussian mixture weight vector.
The vector quantizer 440 may use any conventional or later-developed vector- (or subvector-) quantization algorithm. The vector quantizer 440 may use, for example, the subvector quantization technique of Digalakis, et al., supra, incorporated herein by reference.
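Subvector quantization itself can be sketched in miniature as follows. The codebooks below are hypothetical and hand-picked purely for illustration; in practice they would be trained, e.g., with a Lloyd/k-means procedure such as that of Digalakis et al.:

```python
def nearest(codebook, sub):
    """Index of the codeword closest to `sub` in Euclidean distance."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: d2(codebook[k], sub))

def encode(vector, split, codebooks):
    """Split `vector` at `split` and quantize each part against its codebook."""
    subs = [vector[:split], vector[split:]]
    return [nearest(cb, s) for cb, s in zip(codebooks, subs)]

def decode(indices, codebooks):
    """Concatenate the selected codewords back into one vector."""
    out = []
    for cb, k in zip(codebooks, indices):
        out.extend(cb[k])
    return out

# Hypothetical 2-codeword codebooks for a 4-dim vector split into two halves.
codebooks = [
    [[0.05, 0.10], [0.10, 0.15]],   # codebook for dimensions 1-2
    [[0.20, 0.55], [0.30, 0.45]],   # codebook for dimensions 3-4
]
sorted_weights = [0.06, 0.11, 0.28, 0.47]
idx = encode(sorted_weights, 2, codebooks)
print(idx)                      # [0, 1]
print(decode(idx, codebooks))   # [0.05, 0.1, 0.3, 0.45]
```

Only the per-subvector indices (and the shared codebooks) need be stored, which is the source of the compression.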
An optional post-processor 450 may be employed to ensure that a sum of the elements of a mixture weight vector equals one. The at least one quantized re-ordered Gaussian mixture weight vector may then be provided to a mobile communication device 410, in which it is stored in a memory 460 thereof as part of an acoustic model. The acoustic model is thereby configured for subsequent use for ASR.
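The optional post-processing amounts to a simple renormalization of each quantized vector; a sketch, assuming a nonzero sum:

```python
def renormalize(weights):
    """Rescale quantized mixture weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

q = [0.05, 0.1, 0.3, 0.45]   # decoded weights summing to 0.90 after quantization
r = renormalize(q)
print(sum(r))                # 1.0 (up to floating-point rounding)
```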
Turning now to
In a step 510, at least one mel-frequency cepstral coefficient (MFCC) vector or any other feature vector is generated by, e.g., a conventional technique. In a step 520, at least one Gaussian mixture weight vector is generated by, e.g., a conventional technique in HMM-GMM training.
In a step 530, elements of the at least one Gaussian mixture weight vector and corresponding (e.g., Gaussian) distributions are re-ordered to yield at least one re-ordered Gaussian mixture weight vector. The re-ordering may involve sorting the elements of the at least one Gaussian mixture weight vector to minimize Euclidean distances among elements of the at least one quantized re-ordered Gaussian mixture weight vector. The re-ordering may involve sorting the elements in ascending order, descending order or in any conventional or later-developed manner as may be advantageous to a particular application.
In a step 540, the at least one re-ordered Gaussian mixture weight vector is vector quantized to yield at least one quantized re-ordered Gaussian mixture weight vector. The vector quantizing may involve subvector quantizing the at least one re-ordered Gaussian mixture weight vector. In a step 550, the at least one quantized re-ordered Gaussian mixture weight vector may be post-processed to ensure that a sum of the elements equals one.
In a step 560, the at least one quantized re-ordered Gaussian mixture weight vector is stored in a memory. The memory may be associated with a mobile communication device, for example. The quantized Gaussian mixture weights form part of the acoustic model with which ASR may be performed. The method ends in an end step (not referenced).
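Steps 530 through 550 can be chained end to end in a miniature sketch; for brevity, this sketch quantizes the whole sorted vector against a single hypothetical whole-vector codebook rather than subvectors, and all names are illustrative:

```python
def compress_state(weights, codebook):
    """Steps 530-550 in miniature: sort, vector quantize, renormalize.
    Returns the codebook index to store, the permutation needed to re-order
    the state's Gaussians to match, and the renormalized decoded weights."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])   # step 530
    sw = [weights[i] for i in order]

    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    idx = min(range(len(codebook)), key=lambda k: d2(codebook[k], sw))  # 540
    total = sum(codebook[idx])                                          # 550
    restored = [w / total for w in codebook[idx]]
    return idx, order, restored

codebook = [[0.1, 0.3, 0.6], [0.2, 0.3, 0.5]]   # hypothetical trained codebook
idx, order, restored = compress_state([0.55, 0.12, 0.33], codebook)
print(idx)      # 0
print(order)    # [1, 2, 0]
```

Only `idx` (step 560) need be stored per state; `order` records how the corresponding Gaussians must be permuted in the acoustic model.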
Having described embodiments of systems and methods that fall within the scope of the present invention, graphical data will now be set forth that illustrates application of embodiments of the present invention to actual Gaussian mixture weight vectors. More specifically,
It will be observed that the dynamic range of each dimension is substantially reduced after re-ordering. To cover 99% of the cases, the dynamic range need only extend from 0 to 0.07, 0.09, 0.11, 0.16 and 0.29 for the 1st, 3rd, 5th, 7th and 9th dimensions, respectively, and from 0 to 0.52 for the 10th dimension. The greatly reduced dynamic range illustrates the ability to compress the vector space.
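The kind of per-dimension dynamic-range measurement reported above can be reproduced on synthetic data (random weight vectors standing in for trained mixture weights; the resulting numbers will not match the actual figures):

```python
import random

random.seed(0)

def random_weights(n):
    """A random weight vector on the simplex, a synthetic stand-in for
    trained Gaussian mixture weights."""
    xs = [random.random() for _ in range(n)]
    total = sum(xs)
    return [x / total for x in xs]

# 1000 states of 10 mixture weights each, sorted ascending within each state.
states = [sorted(random_weights(10)) for _ in range(1000)]

def p99(values):
    """99th-percentile upper edge, covering 99% of the cases."""
    vs = sorted(values)
    return vs[int(0.99 * (len(vs) - 1))]

ranges = [round(p99([s[d] for s in states]), 2) for d in range(10)]
print(ranges)   # the lowest dimensions span far less than the highest
```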
Turning now to
Although the present invention has been described in detail, those skilled in the art should understand that they can make various changes, substitutions and alterations herein without departing from the spirit and scope of the invention in its broadest form.