The present invention is directed, in general, to automatic speech recognition (ASR) and, more particularly, to a system and method for combined state- and phone-level or multi-stage phone-level pronunciation adaptation for speaker-independent name dialing.
Speaker-independent name dialing (SIND) is an important application of ASR to mobile telecommunication devices. SIND enables a user to contact a person by simply saying that person's name; no previous enrollment or pre-training of the person's name is required.
Several challenges, such as robustness to environmental distortions and pronunciation variations, stand in the way of extending SIND to a variety of applications. Providing SIND in mobile telecommunication devices is particularly difficult, because such devices have quite limited computing resources. Since SIND aims at recognizing a list of names, which may amount to thousands, methods that generate the phoneme sequences of names are necessary. However, because of the above-mentioned limited computing resources in mobile communication devices, a large dictionary with many entries cannot be used for SIND. Instead, other methods must be used, such as a decision-tree-based pronunciation model (DTPM) (see, e.g., Suontausta, et al., “Low Memory Decision Tree Method for Text-To-Phoneme Mapping,” in ASRU, 2003) that generates a single pronunciation for each name online.
It is generally known that ASR can still benefit from improvements at all processing levels. Most of the benefits so far came from the acoustic level, e.g., by introducing dynamic features (see, e.g., Furui, et al., “Speaker-Independent Isolated Word Recognition Using Dynamic Features of Speech Spectrum,” IEEE Trans. Acoust. Speech Signal Process, pp. 52-59, 1986) and adaptation of acoustic models (see, e.g., Gales, et al., “Robust Speech Recognition in Additive and Convolutional Noise Using Parallel Model Combination,” Computer Speech and Language, vol. 9, pp. 289-307, 1995, Woodland, et al., “Improving Environmental Robustness in Large Vocabulary Speech Recognition,” in ICASSP, 1996, pp. 65-68, and Gauvain, et al., “Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains,” IEEE Trans. on Speech and Audio Processing, vol. 2, no. 2, pp. 291-298, 1994). As the focus of ASR has gradually shifted from carefully read speech in quiet environments to real applications for normal speech in noisy environments, new challenges have arisen that require much effort at other levels of ASR. One challenge is pronunciation variation caused by many factors (see, e.g., Strik, “Pronunciation Adaptation at the Lexical Level,” in ITRW on Adaptation Methods for Speech Recognition, 2001, pp. 123-130), such as different speaking styles, degree of formality, environment, accent or dialect and emotional state. In addition to these factors, in mobile applications of SIND, such variation may also be due to mismatches between a data-driven pronunciation model, e.g., a decision-tree-based pronunciation model (see, e.g., Suontausta, et al., supra), trained from transcriptions of read speech and the actual pronunciation by human users. It is critical to have methods that can compensate for the effects of pronunciation variation on ASR.
Methods have been proposed to deal with pronunciation variation. These include lexicon modeling at the phone level using re-write rules (see, e.g., Yang, et al., “Data-Driven Lexical Modeling of Pronunciation Variations for ASR,” in ICSLP, 2000), decision trees (see, e.g., Riley, et al., “Stochastic Pronunciation Modeling from Hand-Labelled Phonetic Corpora,” Speech Communication, vol. 29, pp. 209-224, 1999), neural networks (see, e.g., Fukada, et al., “Automatic Generation of Multiple Pronunciations Based on Neural Networks,” Speech Communication, vol. 27, pp. 63-73, 1999), and confusion matrices (see, e.g., Torre, et al., “Automatic Alternative Transcription Generation and Vocabulary Selection for Flexible Word Recognizers,” in ICASSP, 1997, vol. 2, pp. 1463-1466).
Other methods deal with pronunciation variation at the state level. These include sharing mixture components at the state level (see, e.g., Liu, et al., “State-Dependent Phonetic Tied Mixtures with Pronunciation Modeling for Spontaneous Speech Recognition,” IEEE Trans on Speech and Audio Processing, vol. 12, no. 4, pp. 351-364, 2004, Saraclar, et al., “Pronunciation Modeling by Sharing Gaussian Densities Across Phonetic Models,” Computer Speech and Language, vol. 14, pp. 137-160, 2004, Yun, et al., “Stochastic Lexicon Modeling for Speech Recognition,” IEEE signal processing letters, vol. 6, no. 2, pp. 28-30, 1999, and Luo, Balancing Model Resolution and Generalizability in Large Vocabulary Continuous Speech Recognition, Ph.D. thesis, The Johns Hopkins University, 1999). In state-level methods, the HMM states of a phoneme's model are allowed to share Gaussian mixture components with the HMM states of the models of its alternate pronunciation realizations. However, some significant disadvantages render the existing methods inappropriate for use in SIND in mobile communication devices. First, some state-level methods (e.g., Liu, et al., supra, and Saraclar, et al., supra) involve complex state-level operations such as splitting and merging. These operations are impractical in mobile communication devices due to their limited computing resources for SIND. Second, it is known that pronunciation variation is context-dependent. Some phone-level methods (see, e.g., Torre, et al., supra) do not account for that fact. Third, phone-level methods have not been applied to SIND, since SIND has a unique pronunciation variation caused by differences between pronunciations from data-driven pronunciation models and human speakers.
Accordingly, what is needed in the art is a new technique for dealing with pronunciation variation for SIND that is not only relatively fast and accurate, but also more suitable for use in mobile telecommunication devices than are the above-described techniques.
To address the above-discussed deficiencies of the prior art, the present invention introduces methods and systems for combined state- and phone-level pronunciation adaptation.
The foregoing has outlined preferred and alternative features of the present invention so that those skilled in the pertinent art may better understand the detailed description of the invention that follows. Additional features of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the pertinent art should appreciate that they can readily use the disclosed conception and specific embodiment as a basis for designing or modifying other structures for carrying out the same purposes of the present invention. Those skilled in the pertinent art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention.
For a more complete understanding of the invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawing, in which:
Certain embodiments of a combined state- and phone-level pronunciation adaptation technique carried out in accordance with the principles of the present invention (hereinafter “combined technique”) will now be described. The combined technique compensates for pronunciation variation at two levels. At the state level, pronunciation variation is carried out by mixture-sharing. At the phone level, probabilistic re-write rules are applied to generate multiple pronunciations per word. The re-write rules are context-dependent and therefore enable the combined technique to deal more effectively with pronunciation variation. As will be seen, certain embodiments of the combined technique introduce novel construction of rule sets, rule pruning and generation of multiple pronunciations. The efficacy of the phone-level re-write rules for SIND in mobile communication devices will be demonstrated through experiments set forth below. In addition, phone-level adaptation may be advantageously carried out in a multi-stage architecture to be described. A memory- and computation-efficient mixture-sharing technique will also be introduced that is particularly advantageous in extending SIND in mobile communication devices. Experiments demonstrating the efficacy of both the combined technique and the multi-stage phone-level technique will also be shown below. They will show that, compared to a baseline SIND system with a well-trained decision-tree-based pronunciation model, one embodiment of the combined technique decreases word error rate (WER) by 45%.
Referring initially to
One advantageous application for the system or method of the present invention is in conjunction with the mobile telecommunication devices 110a, 110b. Although not shown in
Having described an exemplary environment within which the system or the method of the present invention may be employed, various specific embodiments of the system and method will now be set forth. Accordingly, turning now to
Although the present invention encompasses performing state-level and phone-level pronunciation adaptation independently or in any order, it has proven particularly advantageous to perform adaptation at the state level before adaptation at the phone level for the following reasons. First, the combined technique performs state-level pronunciation variation by mixture-sharing. Due to the first-order Markovian property of HMMs, mixture-sharing in an HMM may not be able to use long-term context dependency. Therefore, mixture-sharing should occur before phone-level pronunciation adaptation, since the phone-level pronunciation adaptation introduced herein is context-dependent. Second, state-level pronunciation adaptation may be viewed as an integral part of acoustic model training. In addition to dealing with pronunciation variation, the combined technique increases the number of mixture components per state, but does not increase the total number of mixture components.
As stated above, pronunciation adaptation at the state level is carried out through mixture-sharing. The mixture-sharing is developed in consideration of the following. First, for SIND, each state may have a very limited number of Gaussian components. Further performance improvement may be achieved by increasing the number of mixture components of each state. However, this may drastically increase the size of the resulting acoustic model, rendering it unsuitable for mobile communication devices. Second, pronunciation variation may be performed at the state level (see, e.g., Liu, et al., supra, Saraclar, et al., supra, Yun, et al., supra, and Luo, supra). However, as described above, direct use of these techniques often is prohibitive for mobile communication devices.
The combined technique is developed to incorporate pronunciation variation at the state level without adversely affecting acoustic model size. Generally speaking, the combined technique involves tying mixtures with alternate pronunciations and thereafter re-training the acoustic models.
Turning now to
One embodiment of state-level pronunciation variation is carried out as follows:
Having described one embodiment of state-level pronunciation adaptation, one embodiment of phone-level pronunciation adaptation will now be described, again with reference to
where X is an observed acoustic feature sequence and W is a word sequence. For SIND, the word is composed of a sequence of sub-word phonemes, which is called the “lexicon.” When multiple pronunciations of the word are considered, the above Equation (3) extends to:
where P is a phoneme sequence of word sequence W. The pronunciation model p(P|W) should cover possible variants of P given W. Performance of the pronunciation model is important to the successful operation of a SIND system.
As described above, phone-level pronunciation adaptation may be performed using probabilistic re-write rules. The phone-level pronunciation adaptation technique includes four steps. First, patterns of phone-level variations are extracted, together with their phone contexts and occurrence counts (in a step 440). Second, a set of phone-level re-write rules is derived (in a step 445). Third, an entropy-based technique is used to prune the rule set (in a step 450). Fourth, these rules are applied to base forms to generate multiple pronunciation entries (in a step 455).
One embodiment of phone-level pronunciation adaptation will now be described. Two dictionaries are used to extract phone-level pronunciation variations (the step 440 of
The first step is to align the base forms and the surface forms. Turning now to
Next, a tree-structured probabilistic rewrite rule set is generated for each variation pattern (the step 445 of
In this embodiment, at most the two preceding and the two succeeding phones are used as the context of the current phone. Let i and j be the lengths of the preceding and succeeding contexts, respectively. Let Rij denote a set of rules having context lengths of i and j. Rules are defined in descending order, from the longest context set R22 to a context-independent rule R00.
For each pattern q→q′, the rule set is organized in a tree structure. Due to the tree-structured representation of context-dependent rewrite rules, some contexts are not allowed. More formally, given any context c∈Rij, other contexts in Rij do not overlap c. The rule sets described herein are therefore {R22, R21, R11, R10, R00}.
The rule set is then pruned (the step 450 of
Let a node n be denoted as a child of a node m if the context in node n is a subset of the context in node m and the difference of the lengths of their contexts is one. Let Um denote the set containing the children of node m. Let the phone transition probability p(q→q′|c) for context c at node m be denoted as pm. Given the probability, the entropy at node m is defined as:
Hm=−pm log2 pm−(1−pm)log2(1−pm). (6)
By further refining context of m to its children in Um, the entropy of Um is:
where p(n|m) is the probability of occurrence of a subset context represented at node n given its parent node m, i.e.:
Ĥm is then compared with Hm. Starting from the deepest context R22, the pruning process is stopped when Ĥm>Hm. By the above process, the tree-structured rule set with all those nodes that have undergone the above process is pruned. After pruning, the context selected to transit phone q to q′ may not be as detailed as the rule set R22 nor as general as the rule set R00. For example, the context selected for the transition “ah” to “ax” is in rule set R10. The above pruning process is then used for other nodes.
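The entropy comparison used for pruning may be illustrated with a short sketch. Equation (6) is implemented as written. The form of the children's entropy is an assumption made here for illustration only: it is taken to be the sum over the children n of p(n|m) multiplied by the entropy at n. The function names and the example probabilities are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a phone transition with probability p (Equation (6))."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def children_entropy(children):
    """ASSUMED form of the entropy of Um: sum of p(n|m) * H_n.

    `children` is a list of (p_n_given_m, p_n) pairs, where p_n is the
    phone transition probability at child node n.
    """
    return sum(weight * binary_entropy(p_n) for weight, p_n in children)


def compare_entropies(p_m, children):
    """Return (H_m, H_hat_m) so that the two values can be compared."""
    return binary_entropy(p_m), children_entropy(children)


# Hypothetical node: the transition is a coin flip at node m, but its two
# equally likely child contexts are each far more predictable.
h_m, h_hat_m = compare_entropies(0.5, [(0.5, 0.9), (0.5, 0.1)])
```

In this hypothetical case the refined contexts have the lower entropy, i.e., they describe the variation more reliably than the parent context does.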
New surface forms are then generated by applying the pruned rule set (the step 455 of
p(q′|W)←p(q|W)p(q→q′|c). (9)
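A minimal sketch of how a single context-dependent rule might be applied under Equation (9) follows. The rule representation (a tuple of phone, replacement, left context, right context and probability), the function name, the phone sequence used for “Adam” and the rule probability of 0.4 are all assumptions for illustration and are not taken from the tables herein.

```python
def apply_rule(base, p_base, rule, theta_p=0.05):
    """Return (pronunciation, probability) variants produced by one rule.

    base    : list of phones (a base form)
    p_base  : probability p(q|W) of the base form
    rule    : (q, q_new, left_context, right_context, p_rule), hypothetical layout
    theta_p : variants whose probability does not exceed this are dropped
    """
    q, q_new, left, right, p_rule = rule
    variants = []
    for i, phone in enumerate(base):
        if phone != q:
            continue
        # The context must match the phones immediately around position i.
        left_ok = tuple(base[max(0, i - len(left)):i]) == tuple(left)
        right_ok = tuple(base[i + 1:i + 1 + len(right)]) == tuple(right)
        if left_ok and right_ok:
            p_new = p_base * p_rule  # Equation (9)
            if p_new > theta_p:
                variants.append((base[:i] + [q_new] + base[i + 1:], p_new))
    return variants


# Illustrative base form and an R10-style rule (one preceding phone, no
# succeeding phone) for the transition "ah" -> "ax".
base_form = ["ae", "d", "ah", "m"]
rule_ah_ax = ("ah", "ax", ("d",), (), 0.4)
variants = apply_rule(base_form, 1.0, rule_ah_ax)
```

Whether the base form itself is retained alongside the generated variants is a separate design decision and is not made by this sketch.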
Three alternative techniques of generating multiple pronunciations will now be described. A threshold of probability θp is assigned to prune those variations without sufficient probabilities.
Note that A3 differs from A1 and A2. Both A1 and A2 retain all base forms; A3 may discard base forms.
The pronunciations generated by these three alternatives are usually different. For example, Table 1, below, shows pronunciations generated for the name “Adam” by alternatives A1, A2 and A3.
From Table 1, it may be observed that:
The speech-recognition performance of these three alternatives will be set forth below.
Having described certain embodiments of the combined technique, certain embodiments of a multi-stage phone-level pronunciation adaptation technique carried out in accordance with the principles of the present invention (hereinafter “multi-stage technique”) will now be described. As previously described, the multi-stage technique may be used for phone-level pronunciation adaptation in the combined technique. Recall that a word sequence is decoded via the MAP principle set forth in Equation (4) above. The objective therefore is to generate multiple pronunciations P that may improve recognition performance.
The multi-stage technique achieves this objective by minimizing a distance of multiple pronunciations to reference pronunciations. The similarity between two pronunciations, one being a reference pronunciation r and the other being a surface pronunciation s that is a variant of the reference pronunciation, is measured in terms of the edit, or Levenshtein, distance between the pronunciations (see, e.g., Levenshtein, “Binary Codes Capable of Correcting Deletions, Insertions, and Reversals,” Doklady Akademii Nauk SSSR, vol. 163, no. 4, pp. 845-848, 1965). The Levenshtein distance, denoted as D(s,r), is the minimum number of deletions, insertions or substitutions required to transform r into s. Here, the Levenshtein distance is extended to measure the distance of multiple pronunciations S with K entries {si, i∈{1, . . . ,K}} to the reference pronunciation r as:
In other words, the shortest distance of these surface forms or the surface pronunciations {si} to the reference pronunciation r is selected as the distance of S to r. The problem may be defined thus:
The general idea of the multi-stage technique is to generate multiple pronunciations through a sequence of transformations f(•), where each of the transformations f(•) may include several steps. As stated in the objective above, each operation decreases the distance of the transformed pronunciations f(S) to the reference pronunciation r relative to that of the original pronunciations S.
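The distance described above, i.e., the smallest Levenshtein distance from any surface pronunciation in S to the reference pronunciation r, may be sketched as follows. The function names and the example phone sequences are hypothetical and serve only to illustrate the computation.

```python
def levenshtein(s, r):
    """Minimum number of deletions, insertions or substitutions between
    two phone sequences, computed by dynamic programming."""
    prev = list(range(len(r) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(r, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def set_distance(surface_forms, r):
    """Distance of a set of pronunciations to r: the smallest D(s, r)."""
    return min(levenshtein(s, r) for s in surface_forms)


# Illustrative sequences only.
reference = ["ae", "d", "ax", "m"]
original = [["ae", "d", "ah", "m"]]
expanded = original + [["ae", "d", "ax", "m"]]

d_before = set_distance(original, reference)
d_after = set_distance(expanded, reference)
```

Because the distance of the set is a minimum over its entries, adding a pronunciation to the set while retaining the existing entries can only keep the distance the same or reduce it.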
It is therefore important to design f(•) to meet the goal. This may be achieved by the following probabilistic re-write rule technique for the operation f(•) (see, e.g., Akita, et al., supra, and Yang, et al., supra, for a general discussion of probabilistic re-write rule techniques).
At each stage, patterns of phone-level variations of an input pronunciation and a reference pronunciation are extracted. Based on the extracted patterns, a set of phone-level re-write rules is derived and pruned. Then, the rules are applied to the input pronunciations of the current stage. The output is used as input for the next stage, and the process repeats.
A plurality of stages cooperate to perform pronunciation adaptation. These stages, denoted stg1, stg2 . . . stgN, include Δ logic blocks 730a, 730b, 730n and ⊗ logic blocks 740a, 740b, 740n.
The Δ logic blocks 730a, 730b, 730n are employed to perform a delta analysis of the input pronunciation and the pronunciation from the reference pronunciation dictionary 810. The delta analysis includes extracting patterns of pronunciation variation, deriving phone-level re-write rules and pruning the re-write rules as described above.
The ⊗ logic blocks 740a, 740b, 740n are employed to generate multiple pronunciations with the extracted rule set of this stage as described above. The output of each stage, e.g., stg1, stg2, is used as the input for the succeeding stage, e.g., stg2 . . . stgN.
As with the combined technique, two sets of pronunciations are used to extract phone-level pronunciation variation. The first set is taken from a reference dictionary containing true pronunciations. The second set is surface forms generated from the previous stage. A Viterbi alignment process then locates mismatched pairs of reference pronunciations and surface forms.
According to Equation (11), the surface pronunciation with the smallest Levenshtein distance to the reference pronunciation is selected. With the selected surface pronunciation, a pattern of pronunciation variation is extracted from the reference pronunciation as described above for the combined technique.
Next, as with the combined technique, a tree-structured probabilistic rewrite rule set is generated for each variation pattern. Let s denote a certain phone sequence with context c, and s′ be a variant of s. Let C(s|c) and C(s→s′|c) denote occurrence counts of base form s and surface form s′ with context c, respectively. A threshold θc is introduced for C(s|c) to select those contexts c and phones s with reliable statistics. That is, patterns that occur more frequently than θc are adopted as rule candidates. The context-dependent phone transition probability is calculated as:
Equation (14) is analogous to Equation (5) for base and surface forms. Again, at most the two preceding phones and the two succeeding phones are used as the context of the current phone. Let i and j be the length of the preceding and succeeding contexts, respectively. Let Rij denote a set of rules whose context length is i and j. Rules are defined in descending order, from the longest context set R22 to a context-independent rule R00.
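The counting procedure described above may be sketched as follows. The sketch assumes that the context-dependent phone transition probability is the ratio of the two counts, C(s→s′|c)/C(s|c); that ratio, the function name, the context representation and the example observations are assumptions for illustration.

```python
from collections import Counter


def rule_probabilities(observations, theta_c=1):
    """Estimate p(s -> s'|c) from aligned phone observations.

    observations : iterable of (context, base_phone, surface_phone) tuples,
                   where the context layout is hypothetical
    theta_c      : only (context, phone) pairs observed more than theta_c
                   times are adopted as rule candidates
    """
    base_counts = Counter()
    pair_counts = Counter()
    for context, s, s_new in observations:
        base_counts[(context, s)] += 1              # C(s|c)
        if s_new != s:
            pair_counts[(context, s, s_new)] += 1   # C(s -> s'|c)
    rules = {}
    for (context, s, s_new), c_pair in pair_counts.items():
        c_base = base_counts[(context, s)]
        if c_base > theta_c:
            rules[(context, s, s_new)] = c_pair / c_base  # ASSUMED ratio
    return rules


# Illustrative observations: "ah" after "d" is realized as "ax" three times
# out of four; a second pattern is seen only once and is therefore dropped.
ctx = (("d",), ())
observations = [(ctx, "ah", "ax")] * 3 + [(ctx, "ah", "ah")]
observations.append(((("m",), ()), "iy", "ih"))
rules = rule_probabilities(observations, theta_c=1)
```

Raising θc trades coverage for reliability: fewer contexts qualify, but each surviving probability is estimated from more observations.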
For each pattern s→s′, the rule set is organized in a tree structure. Due to the tree-structured representation of context-dependent rewrite rules, some contexts are not allowed. More formally, given any context c∈Rij, other contexts in Rij do not overlap c. Referring back to
The rule sets are pruned as described in conjunction with the combined technique. Again, the objective is to have a reliable representation of context-dependent phone variation. Equations (6) and (7) and their accompanying definitions and descriptions, above, describe an exemplary pruning process. In the present discussion, p(n|m) is the probability of occurrence of a subset context represented at node n given its parent node m, i.e.:
New surface forms are generated by applying the pruned rule set as described above. When a context is located in a lexicon s, a new pronunciation s′ is generated with probability:
p(s′|W)←p(s|W)p(s→s′|c). (16)
Note that Equation (16) is analogous to Equation (9), above.
A threshold of probability θp is assigned to prune those variations without sufficient probabilities. The process keeps all generated pronunciation variations having probabilities larger than θp.
Notice that the original pronunciations S are retained. Adding new surface forms through Equation (16) does not increase the distance defined in Equation (11) of the transformed pronunciations relative to the reference pronunciation r, and therefore satisfies Equation (12).
Having described exemplary embodiments of the combined and multi-stage techniques, experimental results pertaining to one embodiment of the combined technique will now be described.
A name database, called WAVES, was used to provide the names for SIND. The WAVES database was collected in a vehicle using an AKG M2 hands-free distant talking microphone in three recording conditions: parked (car parked, engine off), city driving (car driven on a stop-and-go basis) and highway driving (car driven at a relatively constant speed on a highway). In each condition, 20 speakers (ten male, ten female) uttered English names. The WAVES database contained 1325 English name utterances. Because they were collected in cars, the utterances in the database were noisy. Multiple pronunciations of names also existed.
The WAVES database was sampled at 8 kHz, with a frame rate of 20 ms. From the speech, 10-dimensional MFCC features and their delta coefficients were extracted. Baseline acoustic models were intra-word, context-dependent, triphone models. The acoustic models were trained from the well-known Wall Street Journal (WSJ) database with a manual dictionary. The models were gender-dependent and had 9573 mean vectors. To improve performance, these mean vectors were tied by a generalized tied-mixture (GTM) process (see, e.g., U.S. patent application Ser. No. 11/196,601), in which, in addition to the usual decision-tree-based state tying, a second stage of mixture-tying mechanism was applied to tie mixture components with these mean vectors. The baseline also used a pronunciation model trained from the well-known Carnegie Mellon University (CMU) dictionary (see, CMU, “The CMU pronunciation dictionary,” http://www.speech.cs.cmu.edu/cgi-bin/cmudict), which has 126,996 entries. Since the CMU dictionary has more proper names than the WSJ dictionary, pronunciation models trained from the CMU dictionary usually outperform pronunciation models trained from the WSJ dictionary for SIND.
Because it was recorded using a hands-free microphone, the WAVES database presented several severe mismatches.
Although not necessary to an understanding of the performance of the combined technique, the experiment also involved a novel technique (introduced in application Ser. No. [Attorney Docket No. TI-39862AA], supra, and called “IJAC”) to compensate for environmental effects on acoustic models.
Phone-level pronunciation adaptation required two dictionaries. A dictionary with base forms was generated from the decision-tree-based pronunciation model. Surface forms were from a manual dictionary containing names for recognition. θc was set to 1 for all following experiments.
First, the three alternative techniques of generating multiple pronunciations described above (A1, A2 and A3) were analyzed. The probability threshold θp was set to 0.05. Results of these alternatives are shown in Table 2, below.
From Table 2, it may be observed that:
The results show that lexicon modeling at the phone level using re-write rules (see, e.g., Yang, et al., supra) may not be desirable for SIND with data-driven pronunciation models. Based on the above observations, alternative A2 was selected for further experiments.
A probability threshold θp is used for pruning rules with low probabilities. The larger the threshold, the fewer pronunciation variations are explored. Experimental results with a set of θp are shown in Table 3, below, together with a plot of the results of phone-level-only pronunciation adaptation and the baseline performance in
From Table 3, it may be observed that:
Recognition results for the combined technique are shown in Table 4, below.
From Table 4, it may be observed that:
Since the HMMs used for phone-level-only pronunciation adaptation also employed a data-driven mixture-tying technique (found in U.S. patent application Ser. No. [Attorney Docket No. TI-39685], supra), pronunciation variation was implicitly used when the states to be tied happened to be located in the set of pronunciation variants. This may explain some of the performance results. However, the combined technique consistently and significantly outperformed phone-level-only pronunciation adaptation in the city driving condition.
Table 5 summarizes the performance of the combined technique compared to other techniques in dealing with pronunciation variations. The probability threshold θp for the combined technique was set to 0.05.
From Table 5, it may be observed that:
Having set forth experimental results pertaining to one embodiment of the combined technique, experimental results pertaining to one embodiment of the multi-stage technique will now be set forth.
Experiments were conducted to verify the efficacy of the multi-stage technique in adapting a baseline pronunciation to multiple pronunciations that may also improve recognition performance. A small dictionary of 665 entries of name pronunciations was used in the experiments. The pruning threshold θp was empirically set to 0.05, and θc was set to 1, based on recognition performance.
The baseline pronunciation models were trained from the CALLHOME American English Lexicon (PRONLEX) (see, e.g., LDC, “CALLHOME American English Lexicon,” http://www.ldc.upenn.edu/). Since the task at hand is SIND, entries for letters such as “.” and “'” were removed from the dictionary. Pronunciations of some English names were added to the dictionary. The final dictionary had 96,500 entries with multiple pronunciations. A decision tree for each letter was trained after a text-to-phoneme alignment (see, e.g., U.S. patent application Ser. No. [Attorney Docket No. TI-60422], supra). Because of the decision-tree-based approach, the baseline pronunciation models generated a single pronunciation for each word.
The WAVES database described above, this time containing 1325 English name utterances, was used. Baseline acoustic models were intra-word, context-dependent, triphone models. The acoustic models were trained from the well-known Wall Street Journal (WSJ) database with a manual dictionary. The models were gender-dependent and had 9573 mean vectors. Although not necessary to the present invention, to improve performance these mean vectors were tied by a generalized tied-mixture (GTM) process (see, e.g., U.S. patent application Ser. No. 11/196,601, supra), in which, in addition to the usual decision-tree-based state tying, a second stage of mixture-tying mechanism was applied to tie mixture components with these mean vectors. As in the experiments above, IJAC was used to compensate for environmental effects on acoustic models. However, the pronunciation model was not trained using the CMU dictionary.
The Levenshtein distance is related to the phoneme accuracy. The phoneme accuracy is defined as:
where N is the total number of phonemes in the reference pronunciations. D, S and I respectively denote the number of deletion errors, substitution errors and insertion errors, which are obtained by alignment of the surface pronunciations with the reference pronunciations. The higher the accuracy, the smaller the number of errors and therefore the smaller the Levenshtein distances from the surface pronunciations to the reference pronunciations.
Table 6, below, shows the number of data-driven probabilistic re-write rules at each stage.
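For illustration, the relationship between the error counts and the phoneme accuracy may be sketched as follows. The sketch assumes the conventional definition of accuracy, (N − D − S − I)/N expressed as a percentage; that formula, the function name and the example counts are assumptions and are not experimental results.

```python
def phoneme_accuracy(n_ref, deletions, substitutions, insertions):
    """Phoneme accuracy in percent under the ASSUMED conventional definition
    100 * (N - D - S - I) / N, with N the number of reference phonemes."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    errors = deletions + substitutions + insertions
    return 100.0 * (n_ref - errors) / n_ref


# Hypothetical counts: 10 errors against 100 reference phonemes.
accuracy = phoneme_accuracy(100, deletions=2, substitutions=5, insertions=3)
```

Under this definition the total error count D + S + I is the sum of the Levenshtein distances over the aligned pronunciation pairs, which is why a higher accuracy corresponds to smaller distances.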
From Table 6, it may be observed that the number of rules decreased from 183 at the 1st stage to 83 at the 4th stage. The experiments, taken together, confirm that the multi-stage technique is both effective and efficient.
Name recognition experiments were then conducted to verify if the multi-stage technique can improve recognition performance. Results are shown in Table 7, below.
From Table 7, it may be observed that:
To achieve a good compromise between performance and complexity, it may be desirable to use a look-up table containing phonetic transcriptions of those names that the multi-stage technique does not correctly generate. While the look-up table may require a modest amount of additional storage space, performance may be significantly increased as a result.
Although the present invention has been described in detail, those skilled in the pertinent art should understand that they can make various changes, substitutions and alterations herein without departing from the spirit and scope of the invention in its broadest form.
The present invention is related to U.S. patent application Ser. No. 11/195,895 by Yao, entitled “System and Method for Noisy Automatic Speech Recognition Employing Joint Compensation of Additive and Convolutive Distortions,” filed Aug. 3, 2005, U.S. patent application Ser. No. 11/196,601 by Yao, entitled “System and Method for Creating Generalized Tied-Mixture Hidden Markov Models for Automatic Speech Recognition,” filed Aug. 3, 2005, and U.S. patent application Ser. No. [Attorney Docket No. TI-60422] by Yao, entitled “System and Method for Text-To-Phoneme Mapping with Prior Knowledge,” all commonly assigned with the present invention and incorporated herein by reference.