Fast, language-independent method for user authentication by voice

Description

FIELD OF THE INVENTION

The present invention relates to speech or voice recognition systems and more particularly to user authentication by speech or voice recognition.

BACKGROUND OF THE INVENTION

The field of user authentication has received increasing attention over the past decade. To enable around-the-clock availability of more and more personal services, many sophisticated transactions have been automated, and remote database access has become pervasive. This, in turn, heightened the need to automatically and reliably establish a user's identity. In addition to standard password-type information, it is now possible to include, in some advanced authentication systems, a variety of biometric data, such as voice characteristics, retina patterns, and fingerprints.

In the context of voice processing, two areas of focus can be distinguished. Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity of a speaker based upon an utterance. Collectively, they refer to the automatic recognition of a speaker (i.e., speaker authentication) on the basis of individual information present in the speech wave form. Most applications in which a voice sample is used as a key to confirm the identity of a speaker are classified as speaker verification. Many of the underlying algorithms, however, can be applied to both speaker identification and verification.

Speaker authentication methods may be divided into text-dependent and text-independent methods. Text-dependent methods require the speaker to say key phrases having the same text for both training and recognition trials, whereas text-independent methods do not rely on a specific text to be spoken. Text-dependent systems offer the possibility of verifying the spoken key phrase (assuming it is kept secret) in addition to the speaker identity, thus resulting in an additional layer of security. This is referred to as the dual verification of speaker and verbal content, which is predicated on the user maintaining the confidentiality of his or her pass-phrase.

On the other hand, text-independent systems offer the possibility of prompting each speaker with a new key phrase every time the system is used. This provides essentially the same level of security as a secret pass-phrase without burdening the user with the responsibility of safeguarding and remembering the pass-phrase. This is because prospective impostors cannot know in advance what random sentence will be requested and therefore cannot (easily) play back some illegally pre-recorded voice samples from a legitimate user. However, implicit verbal content verification must still be performed to be able to reject such potential impostors. Thus, in both cases, the additional layer of security may be traced to the use of dual verification.

In all of the above, the technology of choice to exploit the acoustic information is hidden Markov modeling (HMM) using phonemes as the basic acoustic units. Speaker verification relies on speaker-specific phoneme models while verbal content verification normally employs speaker-independent phoneme models. These models are represented by Gaussian mixture continuous HMMs, or tied-mixture HMMs, depending on the training data. Speaker-specific models are typically constructed by adapting speaker-independent phoneme models to each speaker's voice. During the verification stage, the system concatenates the phoneme models appropriately, according to the expected sentence (or broad phonetic categories, in the non-prompted text-independent case). The likelihood of the input speech matching the reference model is then calculated and used for the authentication decision. If the likelihood is high enough, the speaker/verbal content is accepted as claimed.

The crux of speaker authentication is the comparison between features of the input utterance and some stored templates, so it is important to select appropriate features for the authentication. Speaker identity is correlated with the physiological and behavioral characteristics of the speaker. These characteristics exist both in the spectral envelope (vocal tract characteristics) and in the supra-segmental features (voice source characteristics and dynamic features spanning several segments). As a result, the input utterance is typically represented by a sequence of short-term spectral measurements and their regression coefficients (i.e., the derivatives of the time function of these spectral measurements).

Since HMMs can efficiently model statistical variation in such spectral features, they have achieved significantly better performance than less sophisticated template-matching techniques, such as dynamic time-warping. However, HMMs require the a priori selection of a suitable acoustic unit, such as the phoneme. This selection entails the need to adjust the authentication implementation from one language to another, just as speech recognition systems must be re-implemented when moving from one language to another. In addition, depending on the number of context-dependent phonemes and other modeling parameters, the HMM framework can become computationally intensive.

SUMMARY OF THE INVENTION

A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors is decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.

BRIEF DESCRIPTION OF THE DRAWINGS

Features and advantages of the present invention will be apparent to one skilled in the art in light of the following detailed description in which:

FIG. 1 is a block diagram of one embodiment of a user authentication system;

FIG. 2 is a block diagram of one embodiment for a computer system architecture of a user authentication system;

FIG. 3 is a block diagram of one embodiment for a computer system memory of FIG. 2;

FIG. 4 is a block diagram of one embodiment for an input feature vector matrix of FIG. 3;

FIG. 5 is a block diagram of one embodiment for speaker-specific decomposition vectors of FIG. 3;

FIG. 6 is a flow diagram of one embodiment for user authentication by voice training; and

FIG. 7 is a flow diagram of one embodiment for user authentication by voice.

DETAILED DESCRIPTION

In one embodiment, an entire utterance is mapped into a single point in some low-dimensional space. The speaker identification/verification problem then becomes a matter of computing distances in that space. As time warping is no longer required, there is no longer a need for the HMM framework for the alignment of two sequences of feature vectors, nor any dependence on a particular phoneme set. As a result, the method is both fast and language-independent.

In one embodiment, verbal content verification may also be handled, although here time warping is unavoidable. Because of the lower dimensionality of the space, however, standard template-matching techniques yield sufficiently good results. Again, this obviates the need for a phoneme set, which means verbal content verification may also be done on a language-independent basis.

Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.

Some portions of the detailed description that follows are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory in the form of a computer program. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.

FIG. 1 is a block diagram of one embodiment of a user authentication system 100. Referring to FIG. 1, input device 102 receives a voice input 101 and converts voice input 101 into an electrical signal representative of the voice input 101. Feature extractor 104 receives the electrical signal and samples the signal at a particular frequency, the sampling frequency determined using techniques known in the art. In one embodiment, feature extractor 104 extracts the signal every 10 milliseconds. In addition, feature extractor 104 may use a Fast Fourier Transform (FFT) followed by Filter Bank Analysis on the input signal in order to provide a smooth spectral envelope of the input 101. This provides a stable representation from one repetition to another of a particular speaker's utterances. Feature extractor 104 passes the transformed signal to dynamic feature extractor 108. Dynamic feature extractor 108 extracts the first and second order regression coefficients for every frame of data. The first and second order regression coefficients are concatenated and passed from dynamic feature extractor 108 as feature extraction representation 114. In one embodiment, the feature extraction representation 114 is an M×N matrix which is a sequence of M feature vectors or frames of dimension N. In one embodiment, M is on the order of a few hundred and N is typically less than 100 for a typical utterance of a few seconds in length. After feature extraction representation 114 is created, the feature representation is decomposed into speaker-specific recognition units by processor 115 and speaker-specific recognition distribution values are computed from the recognition units.
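As a rough illustration of the dynamic feature extraction step, the following sketch computes first and second order regression coefficients over a sequence of frames and concatenates them. The function names, the window half-width of two frames, and the edge-padding strategy are assumptions made for illustration only; the description above does not specify them.

```python
import numpy as np

def regression_coefficients(frames: np.ndarray, half_width: int = 2) -> np.ndarray:
    """Least-squares slope of each spectral band over a sliding window of frames.

    frames: (M, B) array holding M frames of B spectral bands.
    half_width: hypothetical window half-width (not given in the description).
    """
    num_frames = frames.shape[0]
    # Repeat the first and last frames so that every window is complete.
    padded = np.pad(frames, ((half_width, half_width), (0, 0)), mode="edge")
    offsets = np.arange(1, half_width + 1)
    denom = 2.0 * np.sum(offsets ** 2)
    slopes = np.zeros_like(frames, dtype=float)
    for t in offsets:
        ahead = padded[half_width + t : half_width + t + num_frames]
        behind = padded[half_width - t : half_width - t + num_frames]
        slopes += t * (ahead - behind)
    return slopes / denom

def dynamic_features(spectral: np.ndarray) -> np.ndarray:
    """Concatenate first and second order regression coefficients per frame."""
    delta = regression_coefficients(spectral)
    delta_delta = regression_coefficients(delta)
    return np.hstack([delta, delta_delta])
```

Each row of the returned array corresponds to one frame, so a few seconds of speech sampled every 10 milliseconds yields a matrix with a few hundred rows.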

User authentication system 100 may be hosted on a processor but is not so limited. In alternate embodiments, dynamic feature extractor 108 may comprise a combination of hardware and software that is hosted on a processor different from authentication feature extractor 104 and processor 115.

FIG. 2 is a block diagram of one embodiment for a computer system architecture 200 that may be used for user authentication system 100. Referring to FIG. 2, computer system 200 includes system bus 201 used for communication among the various components of computer system 200. Computer system 200 also includes processor 202, digital signal processor 208, memory 204, and mass storage device 207. System bus 201 is also coupled to receive inputs from keyboard 222, pointing device 223, and speech signal input device 225. In addition, system bus 201 provides outputs to display device 221 and hard copy device 224.

FIG. 3 is a block diagram of one embodiment for a computer system memory 310 of a user authentication system 100. Referring to FIG. 3, input device 302 provides speech signals to a digitizer 304. Digitizer 304, or feature extractor, samples and digitizes the speech signals for further processing. Digitizer 304 may include storage of the digitized speech signals in the speech input data memory component of memory 310 via system bus 308. Digitized speech signals are processed by digital processor 306 using authentication and content verification application 320.

In one embodiment, digitizer 304 extracts spectral feature vectors every 10 milliseconds. In addition, a short term Fast Fourier Transform followed by a Filter Bank Analysis is used to ensure a smooth spectral envelope of the input spectral features. The first and second order regression coefficients of the spectral features are extracted. The first and second order regression coefficients, typically referred to as delta and delta-delta parameters, are concatenated to create input feature vector matrix 312. Input feature vector matrix 312 is an M×N matrix of frames (F). Within matrix 312, each row represents the spectral information for a frame and each column represents a particular spectral band over time. In one embodiment, the spectral information for all frames and all bands may include approximately 20,000 parameters. In one embodiment, a singular value decomposition (SVD) of the matrix F is computed. The computation is as follows:

F=F′=USV^T

where U is the M×R matrix of left singular vectors u_m (1≦m≦M), S is the (R×R) diagonal matrix of singular values s_r (1≦r≦R), V is the (N×R) matrix of right singular vectors v_n (1≦n≦N), R<<min(M, N) is the order of the decomposition, and ^T denotes matrix transposition. A portion of the SVD of the matrix F (in one embodiment, the S or V portion) is stored in speaker-specific decomposition units 322.
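The decomposition described above can be sketched with a general-purpose linear algebra library. The example below is a minimal illustration, not the implementation of the described system: the function name `truncated_svd` is hypothetical, and the default order R=5 is borrowed from the five dimensional space mentioned elsewhere in this description.

```python
import numpy as np

def truncated_svd(F: np.ndarray, R: int = 5):
    """Return U (M x R), the vector s of R singular values, and V (N x R).

    F is the M x N input feature vector matrix. Keeping only the R largest
    singular values gives the order-R decomposition of F.
    """
    # numpy returns the singular values in descending order and V already transposed.
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    return U[:, :R], s[:R], Vt[:R, :].T

# Usage with stand-in data: 200 frames of dimension 40.
rng = np.random.default_rng(0)
F = rng.standard_normal((200, 40))
U, s, V = truncated_svd(F)
```

The vector `s` holds the diagonal entries of S. The rows of `U` and `V` are the R-dimensional vectors associated with individual frames and spectral bands, respectively.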

During training sessions, multiple speaker-specific decomposition units 322 are created and speaker-specific recognition units 314 are generated from the decomposition units 322. Each speaker to be registered (1≦j≦J) provides a small number, K, of training sentences. In one embodiment, K=4 and J=40. For each speaker, each sentence or utterance is then mapped into the SVD matrices and the R×R matrix S is converted into a vector s for each input sentence k. This results in a set of vectors s_j,k (1≦j≦J, 1≦k≦K), one for each training sentence of each speaker. In one embodiment, speaker-specific recognition distribution values 316 are computed for each speaker.

Memory 310 also includes authentication and content verification application 320, which compares speaker-specific recognition units 314 with the speaker-specific recognition distribution values 316. The difference between them may be computed using any distance measure, such as Euclidean, Gaussian, or any other appropriate method. If the difference between the speaker-specific recognition units 314 and the distribution values 316 is within an acceptable threshold or range, the authentication is accepted. Otherwise, the authentication is rejected and the user may be requested to re-input the authentication sentence.

FIG. 4 is a block diagram of one embodiment for an input feature vector matrix 312. Input feature vector matrix 312 is a matrix of M feature vectors 420 of dimension N 404. In one embodiment, M is on the order of a few hundred and N is typically less than 100 for an utterance of a few seconds in length. Each utterance is represented by an individual M×N matrix 312 of frames F. Row 408 represents the spectral information for a frame and column 406 represents a particular spectral band over time. In one embodiment, the utterance may be extracted to produce approximately 20,000 parameters (M×N).

FIG. 5 is a block diagram of one embodiment for speaker-specific decomposition units 322. In one embodiment, singular value decomposition (SVD) of the matrix F is performed. The decomposition is as follows:

F=F′=USV^T

where U 505 is the M×R matrix of left singular vectors u_m (1≦m≦M), S 515 is the (R×R) diagonal matrix of singular values s_r (1≦r≦R), and V 525 is the (N×R) matrix of right singular vectors v_n (1≦n≦N), in which R<<min(M, N) is the order of the decomposition, and ^T denotes matrix transposition. The singular value decomposition (SVD) of the matrix F is stored in speaker-specific decomposition units 322.

The mth left singular vector u_m 408 may be viewed as an alternative representation of the mth frame (that is, the mth eigenvector of the M×M matrix FF^T). The nth right singular vector v_n 406 is an alternate representation of the nth spectral band (that is, the nth eigenvector of the N×N matrix 525 F^TF). The U matrix 505 comprises eigen-information related to the frame sequence across spectral bands, while the V matrix 525 comprises eigen-information related to the spectral band sequence across time. The S matrix 515 embodies the correlation between the given frame sequence and the given spectral band sequence, which includes factors not directly related to the way frames are sequentially generated or spectral bands are sequentially derived. That is, the singular values s_r should contain information that does not depend on the particular utterance text or spectral processing considered such as, for example, speaker-specific characteristics. The S matrix 515 is a diagonal matrix in which each entry in the diagonal of the matrix may be represented by s_r. The S matrix 515 may be represented by a vector s containing the R values s_r. With this notation, s encapsulates information related to the speaker characteristics.
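The eigenvector statements above follow from the decomposition itself. As a brief sketch, assume that the factorization F=USV^T holds exactly and that the columns of U and of V are orthonormal, so that U^TU and V^TV both equal the R×R identity matrix. Because S is diagonal, S^T=S, and therefore:

$F F^{T} = U S V^{T} V S U^{T} = U S^{2} U^{T}$

$F^{T} F = V S U^{T} U S V^{T} = V S^{2} V^{T}$

Under these assumptions, the columns of U are eigenvectors of FF^T, the columns of V are eigenvectors of F^TF, and both matrices share the eigenvalues s_r^2. The singular values are thus common to the frame view and the spectral band view of the utterance.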

The SVD defines the mapping between the original utterance and a single vector of dimension R containing speaker-specific information. Thus, s may be defined as the speaker-specific representation of the utterance in a low dimensional space. Comparing two utterances may be used to establish the speaker's identity by computing a suitable distance between two points in the space. In one embodiment, the Gaussian distance is used to account for the different scalings along different coordinates of the decomposition. In one embodiment, a five dimensional space is utilized to compute the distance.

FIG. 6 is a flow diagram of one embodiment for user authentication by voice training. Initially at processing block 605, the spectral feature vectors for a user are extracted. During training, each speaker to be registered provides a small number of training sentences. In one embodiment, the user provides K=4 sentences. Each sentence is digitized into an individual input feature vector matrix 312.

At processing block 610, each input feature vector matrix 312 is decomposed into speaker-specific recognition units 322. The decomposition is as described in reference to FIG. 5. The decomposition results in a set of vectors s_j,k (1≦j≦J, 1≦k≦K), one set of vectors for each training sentence of each speaker.

At processing block 620, speaker-specific recognition distribution values 316 are computed for each speaker. In one embodiment, a centroid for each speaker is determined using the following formula:

$\bar{\mu}_{j} = \frac{1}{K} \sum_{k = 1}^{K} s_{j, k}$

In addition, the global covariance matrix is computed by the following formula:

$G = \frac{1}{J} \frac{1}{K} \sum_{j = 1}^{J} \sum_{k = 1}^{K} (s_{j, k} - \bar{\mu}_{j}) {(s_{j, k} - \bar{\mu}_{j})}^{T}$

In one embodiment, the global covariance matrix is used, as compared to speaker-specific covariances, as the estimation of the matrix becomes a problem in small sampling where K<R. (In general, in situations where the number of speakers and/or sentences is small, a pre-computed speaker-independent covariance matrix is used to increase reliability.)
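The centroid and global covariance computations above may be illustrated as follows. This is a sketch under stated assumptions: the function name `train_distribution` and the (J, K, R) array layout for the training vectors are hypothetical choices made for the example.

```python
import numpy as np

def train_distribution(s: np.ndarray):
    """Compute per-speaker centroids and the global covariance matrix.

    s: array of shape (J, K, R) holding, for each of J speakers, the K
       singular-value vectors of dimension R obtained from training sentences.
    Returns (centroids, G) with shapes (J, R) and (R, R).
    """
    J, K, R = s.shape
    centroids = s.mean(axis=1)                 # average over the K sentences
    centered = s - centroids[:, None, :]       # deviation from own speaker's centroid
    flat = centered.reshape(J * K, R)
    G = flat.T @ flat / (J * K)                # pooled over all speakers and sentences
    return centroids, G
```

Pooling the deviations of all speakers gives J×K samples for estimating an R×R matrix, which is why the global estimate remains usable when K<R.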

FIG. 7 is a flow diagram of one embodiment for user authentication by voice. Initially at processing block 705, a spectral feature vector is extracted for an input access sentence. The extraction process is similar to the extraction process of processing block 605 above.

At processing block 710, the input feature vector is decomposed into a speaker-specific characteristic unit 322. The SVD is applied to the input feature vector as described in reference to FIG. 5. The decomposition is as described above.

At processing block 720, the speaker-specific characteristic unit 322 is compared to the speaker-specific recognition distribution values 316 previously trained by the user. The speaker-specific characteristic unit 322 may be represented by s₀, which may be compared to the centroid associated with the speaker whose identity is being claimed, μ_j. For example, the distance between s₀ and μ_j may be computed as follows:

$d(s_{0}, \bar{\mu}_{j}) = {(s_{0} - \bar{\mu}_{j})}^{T} G^{-1} (s_{0} - \bar{\mu}_{j})$
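A minimal sketch of this distance computation is shown below. The function name `gaussian_distance` is an assumption, and solving a linear system in place of explicitly inverting G is a numerical choice made for the example rather than something stated in the description.

```python
import numpy as np

def gaussian_distance(s0, centroid, G) -> float:
    """Distance between access vector s0 and a speaker centroid, weighted by G^-1."""
    diff = np.asarray(s0, dtype=float) - np.asarray(centroid, dtype=float)
    # Solve G x = diff instead of forming the inverse of G explicitly.
    return float(diff @ np.linalg.solve(np.asarray(G, dtype=float), diff))

# Usage with illustrative numbers only.
d = gaussian_distance([3.0, 4.0], [2.0, 0.0], [[0.5, 0.0], [0.0, 2.0]])
```

Weighting by the inverse covariance accounts for the different scalings along the coordinates of the decomposition.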

At processing block 725, the distance d(s₀, μ_j) is compared to a set threshold limit. If the distance d(s₀, μ_j) falls within the threshold, then at processing block 735, the user authentication is accepted and the user is allowed to proceed. For example, if the user authentication is utilized to gain access to a personal computer, the user will be allowed to access the personal computer.

If, at processing block 725, the distance d(s₀, μ_j) is not within the threshold limit, then at processing block 730, the user authentication is rejected and, in one embodiment, the user is returned to the beginning, at processing block 705, for input and retry of the input sentence. In one embodiment, the user may be allowed to attempt to enter the user authentication by voice a given number of times before the process terminates.
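The accept, reject, and retry flow of processing blocks 725 through 735 can be sketched as a small loop. The function name, the default of three attempts, and the convention that "within the threshold" means less than or equal to it are assumptions for illustration; the description leaves the number of attempts open.

```python
from itertools import islice
from typing import Iterable

def authenticate_with_retries(distances: Iterable[float],
                              threshold: float,
                              max_attempts: int = 3) -> bool:
    """Accept as soon as one attempt's distance falls within the threshold.

    distances: one distance per access attempt, in the order attempted.
    max_attempts: hypothetical limit on retries before the process terminates.
    """
    for distance in islice(distances, max_attempts):
        if distance <= threshold:
            return True   # block 735: authentication accepted
    return False          # block 730 reached on every permitted attempt
```

Passing a generator for `distances` lets each new input sentence be requested only after the previous attempt has been rejected.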

In an alternate embodiment, the threshold limit is not used and the following method is used for the authentication. The distance, d(s₀, μ_j), is computed for all registered speakers within the system. If the distance for the speaker as claimed is the smallest distance computed, and there is no other distance within an appropriate ratio (for example, 15%) of the minimum distance, the speaker is accepted. The speaker is rejected if either of the above conditions is not true.
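One possible reading of this ranking rule is sketched below. The interpretation that "within 15% of the minimum distance" means no other distance is at most 1.15 times the claimed speaker's distance is an assumption, as are the function name and the dictionary layout mapping speaker names to distances.

```python
from typing import Dict

def accept_by_ranking(distances: Dict[str, float],
                      claimed: str,
                      margin: float = 0.15) -> bool:
    """Accept only if the claimed speaker is the clear nearest match.

    distances: non-negative distance from the access vector to each
               registered speaker's centroid, keyed by speaker name.
    margin: ratio by which every other distance must exceed the claimed one.
    """
    claimed_distance = distances[claimed]
    others = [d for name, d in distances.items() if name != claimed]
    # If any other speaker is closer, or nearly as close, the claim is rejected.
    return all(d > claimed_distance * (1.0 + margin) for d in others)
```

Because every other distance must exceed the claimed one by the margin, the check also guarantees that the claimed speaker's distance is the smallest.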

In one embodiment, for verbal content verification, the singular values are not used, since they do not contain information about the utterance text itself. However, this information is present in the sequence of left singular vectors u_m (1≦m≦M). So, comparing two utterances for verbal content can be done by comparing two sequences of left singular vectors, each of which is a trajectory in a space of dimension R. It is well-known that dynamic time-warping is more robust in a low-dimensional space than in a high-dimensional space. As a result, dynamic time-warping can be used within the SVD approach to perform verbal content verification.

Using dynamic time-warping, the time axes of the input u_m sequence and the reference u_m sequence are aligned, and the degree of similarity between them, accumulated from the beginning to the end of the utterance, is calculated. The degree of similarity is best determined using Gaussian distances, in a manner analogous to that previously described. Two issues are worth pointing out, however. First, the u_m sequences tend to be fairly “jittery”, which requires some smoothing before computing meaningful distances. A good choice is to use robust locally weighted linear regression to smooth out the sequence. Second, the computational load, compared to speaker verification, is greater by a factor equal to the average number of frames in each utterance. After smoothing, however, some downsampling may be done to speed up the process.
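The alignment step can be illustrated with a textbook dynamic time-warping recursion. This sketch makes two simplifying assumptions that depart from the description: it uses a squared Euclidean local cost in place of the Gaussian distance, and it omits the smoothing and downsampling steps. The function name `dtw_distance` is hypothetical.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Accumulated cost of the best alignment between two vector sequences.

    a: (Ma, R) input sequence of left singular vectors.
    b: (Mb, R) reference sequence of left singular vectors.
    """
    Ma, Mb = len(a), len(b)
    cost = np.full((Ma + 1, Mb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, Ma + 1):
        for j in range(1, Mb + 1):
            # Squared Euclidean local cost (stand-in for the Gaussian distance).
            local = float(np.sum((a[i - 1] - b[j - 1]) ** 2))
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = local + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return float(cost[Ma, Mb])
```

The nested loops visit every pair of frames, which reflects the higher computational load noted above relative to comparing two single vectors.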

The above system was implemented and released as one component of the voice login feature of MacOS 9. When tuned to obtain an equal number of false acceptances and false rejections, it operates at an error rate of approximately 4%. This figure is comparable to what is reported in the literature for HMM-based systems, albeit obtained at a lower computational cost and without any language restrictions.

The specific arrangements and methods herein are merely illustrative of the principles of this invention. Numerous modifications in form and detail may be made by those skilled in the art without departing from the true spirit and scope of the invention.

Claims

1. A method of speech-based user authentication, comprising: at a device comprising one or more processors and memory: receiving a spoken utterance of a speaker; generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a plurality of phoneme-independent feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency; decomposing the phoneme-independent matrix into multiple sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence; computing at least one speaker-specific distribution value based on at least the speaker-specific recognition unit; and authenticating an input speech signal based on the at least one speaker-specific distribution value.
2. The method of claim 1 wherein decomposing the phoneme-independent matrix further comprises: applying a singular value decomposition to the phoneme-independent matrix.
3. The method of claim 1 further comprising: generating the speaker-specific recognition unit from a singular value matrix of a singular value decomposition of the phoneme-independent matrix.
4. The method of claim 1, wherein authenticating the input speech signal based on the at least one speaker-specific distribution value further comprises: decomposing at least one phoneme-independent spectral feature vector of the input speech signal into at least one speaker-specific characteristic unit; comparing the at least one speaker-specific characteristic unit to the at least one speaker-specific distribution value; and authenticating the input speech signal if the at least one speaker-specific characteristic unit is within a threshold limit of the at least one speaker-specific distribution value.
5. The method of claim 4 wherein decomposing the at least one phoneme-independent spectral feature vector of an input speech signal into at least one speaker-specific characteristic unit further comprises: applying a singular value decomposition to the at least one phoneme-independent spectral feature vector of the input speech signal.
6. The method of claim 4, wherein the at least one phoneme-independent spectral feature vector is further decomposed into at least one content input sequence and authenticating the speech signal further comprises: authenticating the input speech signal if the at least one content input sequence is similar to the at least one content reference sequence.
7. The method of claim 6 further comprising: determining similarity based on a distance calculated between the at least one content input sequence and the at least one content reference sequence.
8. A method of authenticating a speech signal comprising: at a device comprising one or more processors and memory: receiving a spoken utterance of an unauthenticated speaker; generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a first plurality of phoneme-independent spectral feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency; decomposing the phoneme-independent matrix into a speaker-specific characteristic unit; comparing the at least one speaker-specific characteristic unit to at least one speaker-specific distribution value, the at least one speaker-specific distribution value previously trained by a registered speaker and generated by decomposing a second plurality of phoneme-independent feature vectors into sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence; and authenticating the spoken utterance if the at least one speaker-specific characteristic unit is within a threshold limit of the at least one speaker-specific distribution value.
9. The method of claim 8 further comprising: generating the speaker-specific characteristic unit from a singular value matrix of a singular value decomposition of the phoneme-independent matrix.
10. The method of claim 8 further comprising: decomposing the second plurality of phoneme-independent feature vectors into the at least one speaker-specific recognition unit; and computing the at least one speaker-specific distribution value from the at least one speaker-specific recognition unit.
11. The method of claim 10 further comprising: generating the at least one speaker-specific recognition unit from a singular value matrix of a singular value decomposition of the second plurality of phoneme-independent feature vectors.
12. The method of claim 10 wherein decomposing the phoneme-independent matrix further comprises: applying a singular value decomposition to the phoneme-independent matrix.
13. The method of claim 8 wherein decomposing the phoneme-independent matrix further comprises: applying a singular value decomposition to the phoneme-independent matrix.
14. The method of claim 8, wherein the phoneme-independent matrix is further decomposed into at least one content input sequence and wherein authenticating the spoken utterance further comprises: authenticating the spoken utterance if the at least one content input sequence is similar to at least one content reference sequence previously trained by the registered speaker.
15. The method of claim 14 further comprising: determining similarity based on a distance calculated between the at least one content input sequence and the at least one content reference sequence.
16. A system for speech-based user authentication, comprising: means for receiving a spoken utterance of a speaker; means for generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a plurality of phoneme-independent feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sample frequency; means for decomposing the phoneme-independent matrix into multiple sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence; means for computing at least one speaker-specific distribution value based on at least the speaker-specific recognition unit; and means for authenticating an input speech signal based on the at least one speaker-specific distribution value.
17. A system for authenticating a speech signal comprising: means for receiving a spoken utterance of a speaker; means for generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a first plurality of phoneme-independent spectral feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency; means for decomposing the phoneme-independent matrix into a speaker-specific characteristic unit; means for comparing the at least one speaker-specific characteristic unit to at least one speaker-specific distribution value, the at least one speaker-specific distribution value previously trained by a registered speaker and generated by decomposing a second plurality of phoneme-independent feature vectors into sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence; and means for authenticating the spoken utterance if the at least one speaker-specific characteristic unit is within a threshold limit of the at least one speaker-specific distribution value.
18. The system of claim 17, wherein the phoneme-independent matrix is further decomposed into at least one content input sequence and wherein the means for authenticating the spoken utterance further authenticates the spoken utterance if the at least one content input sequence is similar to at least one content reference sequence previously trained by the registered speaker.
19. The system of claim 18, wherein the means for comparing further determines similarity based on a distance calculated between the at least one content input sequence and the at least one content reference sequence.
20. A non-transitory computer readable medium comprising instructions, which when executed on a processor, perform a method of speech-based user authentication, comprising: receiving a spoken utterance of a speaker; generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a plurality of phoneme-independent feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency; decomposing the phoneme-independent matrix into multiple sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence; computing at least one speaker-specific distribution value based on at least the speaker-specific recognition unit; and authenticating an input speech signal based on the at least one speaker-specific distribution value.
21. A non-transitory computer readable medium comprising instructions, which when executed on a processor, perform a method for authenticating a speech signal, comprising: receiving a spoken utterance of a speaker; generating a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a first plurality of phoneme-independent spectral feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency; decomposing the phoneme-independent matrix into a speaker-specific characteristic unit; comparing the at least one speaker-specific characteristic unit to at least one speaker-specific distribution value, the at least one speaker-specific distribution value previously trained by a registered speaker and generated by decomposing a second plurality of phoneme-independent feature vectors into sets of vectors, including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence; and authenticating the spoken utterance if the at least one speaker-specific characteristic unit is within a threshold limit of the at least one speaker-specific distribution value.
22. The computer readable medium of claim 21, wherein the phoneme-independent matrix is further decomposed into at least one content input sequence and wherein authenticating the speech signal further comprises: authenticating the speech signal if the at least one content input sequence is similar to at least one content reference sequence previously trained by the registered speaker.
23. The computer-readable medium of claim 22, wherein the method further comprises: determining similarity based on a distance calculated between the at least one content input sequence and the at least one content reference sequence.
24. A system for speech-based user authentication, comprising: a processor configured to receive a spoken utterance of a speaker, generate a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a plurality of phoneme-independent feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency, decompose the phoneme-independent matrix into multiple sets of vectors including at least a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence, compute at least one speaker-specific distribution value based on at least the speaker-specific recognition unit; and authenticate an input speech signal based on the at least one speaker-specific distribution value.
25. The system of claim 24 wherein the processor is further configured to decompose the phoneme-independent matrix into the at least one speaker-specific recognition unit.
26. The system of claim 25 wherein the processor is further configured to apply a singular value decomposition to the phoneme-independent matrix to generate the at least one speaker-specific recognition unit.
27. The system of claim 24 wherein the processor is further configured to generate the at least one speaker-specific recognition unit from a singular value matrix of a singular value decomposition of the phoneme-independent matrix.
28. The system of claim 24 wherein the processor is further configured to decompose at least one phoneme-independent spectral feature vector of an input speech signal into at least one speaker-specific characteristic unit, and authenticate the speech signal if the at least one speaker-specific characteristic unit is within a threshold limit of the at least one speaker-specific distribution value.
29. The system of claim 28 wherein the processor is further configured to apply a singular value decomposition to the at least one phoneme-independent spectral feature vector of the input speech signal.
30. The system of claim 28, wherein the processor is further configured to decompose the at least one phoneme-independent spectral feature vector of the input speech signal into at least one content input sequence, and to authenticate the input speech signal if the at least one content input sequence is similar to the at least one content reference sequence.
31. The system of claim 30, wherein the processor is further configured to determine similarity based on a distance calculated between the at least one content input sequence and the at least one content reference sequence.
32. A system for authenticating a speech signal comprising: a processor to receive a spoken utterance of an unauthenticated speaker, generate a phoneme-independent matrix based on the spoken utterance, wherein the phoneme-independent matrix comprises a first plurality of phoneme-independent spectral feature vectors each having been extracted from a respective frame sampled from the spoken utterance at a sampling frequency, decompose the phoneme-independent matrix into a speaker-specific characteristic unit, compare the at least one speaker-specific characteristic unit to at least one speaker-specific distribution value, the at least one speaker-specific distribution value previously trained by a registered speaker, and authenticate the spoken utterance if the at least one speaker-specific characteristic unit is within a threshold limit of the at least one speaker-specific distribution value, wherein the at least one speaker-specific distribution value is generated by decomposing a second plurality of phoneme-independent feature vectors into sets of vectors including a first set of vectors defining at least one speaker-specific recognition unit and a second set of vectors defining at least one content reference sequence.
33. The system of claim 32 wherein the processor is further configured to apply a singular value decomposition to the phoneme-independent matrix.
34. The system of claim 32 wherein the processor is further configured to generate the at least one speaker-specific characteristic unit from a singular value matrix of a singular value decomposition of the phoneme-independent matrix.
35. The system of claim 32 wherein the processor is further configured to decompose the second plurality of phoneme-independent feature vectors into the at least one speaker-specific recognition unit, and compute the at least one speaker-specific distribution value from the at least one speaker-specific recognition unit.
36. The system of claim 35 further comprising: a feature extractor to extract the second plurality of phoneme-independent feature vectors into a speaker-specific feature extraction representation.
37. The system of claim 36 wherein the processor is further configured to decompose the speaker-specific feature extraction representation into the at least one speaker-specific recognition unit.
38. The system of claim 37 wherein the processor is further configured to apply a singular value decomposition to the speaker-specific feature extraction representation to generate the at least one speaker-specific recognition unit.
39. The system of claim 35 wherein the processor is further configured to generate the at least one speaker-specific recognition unit from a singular value matrix of a singular value decomposition of the second plurality of phoneme-independent feature vectors.
40. The system of claim 32, wherein the phoneme-independent matrix is further decomposed into at least one content input sequence and wherein the processor is further configured to authenticate the spoken utterance if the at least one content input sequence is similar to at least one content reference sequence previously trained by the registered speaker.
41. The system of claim 40, wherein the processor is further configured to determine similarity based on a distance calculated between the at least one content input sequence and the at least one content reference sequence.
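
The two-part decision recited in claims 17 to 19 (and mirrored in claims 21 to 23, 28 to 31, and 40 to 41) can be illustrated with a minimal sketch. It assumes the decomposition has already produced a speaker-specific characteristic unit and a content input sequence, for example from a singular value decomposition as in claims 33 and 34. The function names, the use of Euclidean distance for the speaker comparison, the use of dynamic time warping for the content comparison, and all numeric values are assumptions made for illustration; the claims only require "a threshold limit" and "a distance".

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of vectors.

    DTW is an assumption of this sketch; it tolerates differences in
    speaking rate between the input and reference sequences.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i and first j vectors
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # input vector repeated
                cost[i][j - 1],      # reference vector repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def authenticate(speaker_unit, speaker_reference,
                 content_input, content_reference,
                 speaker_threshold, content_threshold):
    """Accept only if both the speaker check and the content check pass."""
    speaker_ok = euclidean(speaker_unit, speaker_reference) <= speaker_threshold
    content_ok = dtw_distance(content_input, content_reference) <= content_threshold
    return speaker_ok and content_ok


# Hypothetical enrolled values for a registered speaker.
reference_unit = [3.0, 1.0]
reference_content = [[0.0], [1.0], [1.0], [2.0]]

# Same speaker, same phrase spoken slightly faster: accepted.
accepted = authenticate([3.1, 0.9], reference_unit,
                        [[0.0], [1.0], [2.0]], reference_content,
                        speaker_threshold=0.5, content_threshold=0.1)

# Different speaker, same phrase: rejected by the speaker check.
impostor = authenticate([5.0, 0.2], reference_unit,
                        [[0.0], [1.0], [2.0]], reference_content,
                        speaker_threshold=0.5, content_threshold=0.1)

# Same speaker, different phrase: rejected by the content check.
wrong_phrase = authenticate([3.1, 0.9], reference_unit,
                            [[2.0], [1.0], [0.0]], reference_content,
                            speaker_threshold=0.5, content_threshold=0.1)
```

Requiring both checks to pass corresponds to the dual verification of speaker and verbal content; dropping the content check gives the speaker-only decision of claims 17, 21, and 32.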

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 09/527,498, filed Mar. 16, 2000, now abandoned.



Non-Patent Literature Citations (237)

Entry
Tomoko Matsui, Sadaoki Furui, “Speaker Adaptation of Tied-Mixture-Based Phoneme Models for Text-Prompted Speaker Recognitions”, 1994 IEEE International Conference on Acoustics, Speech and Signal Processing, Apr. 19-22, 1994, p. I-125-I-128.
Martin, D., et al., “The Open Agent Architecture: a Framework for building distributed software systems,” Jan.-Mar. 1999, Applied Artificial Intelligence: an International Journal, vol. 13, No. 1-2, http://adam.cheyer.com/papers/oaa.pdf, 38 pages.
Alfred App, 2011, http://www.alfredapp.com/, 5 pages.
Ambite, JL., et al., “Design and Implementation of the CALO Query Manager,” Copyright © 2006, American Association for Artificial Intelligence, (www.aaai.org), 8 pages.
Ambite, JL., et al., “Integration of Heterogeneous Knowledge Sources in the CALO Query Manager,” 2005, The 4th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE), Agia Napa, Cyprus, ttp://www.isi.edu/people/ambite/publications/integration—heterogeneous—knowledge—sources—calo—query—manager, 18 pages.
Belvin, R. et al., “Development of the HRL Route Navigation Dialogue System,” 2001, in Proceedings of the First International Conference on Human Language Technology Research, Paper, Copyright © 2001 HRL Laboratories, LLC, http://citeseerx.ist.psu.edu/viewdoc/sunnmary?doi=10.1.1.10.6538, 5 pages.
Berry, P. M., et al. “PTIME: Personalized Assistance for Calendaring,” ACM Transactions on Intelligent Systems and Technology, vol. 2, No. 4, Article 40, Publication date: Jul. 2011, 40:1-22, 22 pages.
Butcher, M., “EVI arrives in town to go toe-to-toe with Siri,” Jan. 23, 2012, http://techcrunch.com/2012/01/23/evi-arrives-in-town-to-go-toe-to-toe-with-siri/, 2 pages.
Chen, Y., “Multimedia Siri Finds and Plays Whatever You Ask for,” Feb. 9, 2012, http://www.psfk.com/2012/02/multimedia-siri.html, 9 pages.
Cheyer, A. et al., “Spoken Language and Multimodal Applications for Electronic Realties,” © Springer-Verlag London Ltd, Virtual Reality 1999, 3:1-15, 15 pages.
Cutkosky, M. R. et al., “PACT: An Experiment in Integrating Concurrent Engineering Systems,” Journal, Computer, vol. 26 Issue 1, Jan. 1993, IEEE Computer Society Press Los Alamitos, CA, USA, http://dl.acm.org/citation.cfm?id=165320, 14 pages.
Elio, R. et al., “On Abstract Task Models and Conversation Policies,” http://webdocs.cs.ualberta.ca/˜ree/publications/papers2/Ats.AA99.pdf, 10 pages.
Ericsson, S. et al., “Software illustrating a unified approach to multimodality and multilinguality in the in-home domain,” Dec. 22, 2006, Talk and Look: Tools for Ambient Linguistic Knowledge, http://www.talk-project.eurice.eu/fileadmin/talk/publications—public/deliverables—public/D1—6.pdf, 127 pages.
Evi, “Meet Evi: the one mobile app that provides solutions for your everyday problems,” Feb. 8, 2012, http://www.evi.com/, 3 pages.
Feigenbaum, E., et al., “Computer-assisted Semantic Annotation of Scientific Life Works,” 2007, http://tomgruber.org/writing/stanford-cs300.pdf, 22 pages.
Gannes, L., “Alfred App Gives Personalized Restaurant Recommendations,” allthingsd.com, Jul. 18, 2011, http://allthingsd.com/20110718/alfred-app-gives-personalized-restaurant-recommendations/, 3 pages.
Gautier, P. O., et al. “Generating Explanations of Device Behavior Using Compositional Modeling and Causal Ordering,” 1993, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.8394, 9 pages.
Gervasio, M. T., et al., Active Preference Learning for Personalized Calendar Scheduling Assistancae, Copyright © 2005, http://www.ai.sri.com/˜gervasio/pubs/gervasio-iui05.pdf, 8 pages.
Glass, A., “Explaining Preference Learning,” 2006, http://cs229.stanford.edu/proj2006/Glass-ExplainingPreferenceLearning.pdf, 5 pages.
Gruber, T. R., et al., “An Ontology for Engineering Mathematics,” in Jon Doyle, Piero Torasso, & Erik Sandewall, Eds., Fourth International Conference on Principles of Knowledge Representation and Reasoning, Gustav Stresemann Institut, Bonn, Germany, Morgan Kaufmann, 1994, http://www-ksl.stanford.edu/knowledge-sharing/papers/engmath.html, 22 pages.
Gruber, T. R., “A Translation Approach to Portable Ontology Specifications,” Knowledge Systems Laboratory, Stanford University, Sep. 1992, Technical Report KSL 92-71, Revised Apr. 1993, 27 pages.
Gruber, T. R., “Automated Knowledge Acquisition for Strategic Knowledge,” Knowledge Systems Laboratory, Machine Learning, 4, 293-336 (1989), 44 pages.
Gruber, T. R., “(Avoiding) the Travesty of the Commons,” Presentation at NPUC 2006, New Paradigms for User Computing, IBM Almaden Research Center, Jul. 24, 2006. http://tomgruber.org/writing/avoiding-travestry.htm, 52 pages.
Gruber, T. R., “Big Think Small Screen: How semantic computing in the cloud will revolutionize the consumer experience on the phone,” Keynote presentation at Web 3.0 conference, Jan. 27, 2010, http://tomgruber.org/writing/web30jan2010.htm, 41 pages.
Gruber, T. R., “Collaborating around Shared Content on the WWW,” W3C Workshop on WWW and Collaboration, Cambridge, MA, Sep. 11, 1995, http://www.w3.org/Collaboration/Workshop/Proceedings/P9.html, 1 page.
Gruber, T. R., “Collective Knowledge Systems: Where the Social Web meets the Semantic Web,” Web Semantics: Science, Services and Agents on the World Wide Web (2007), doi:10.1016/j.websem.2007.11.011, keynote presentation given at the 5th International Semantic Web Conference, Nov. 7, 2006, 19 pages.
Gruber, T. R., “Where the Social Web meets the Semantic Web,” Presentation at the 5th International Semantic Web Conference, Nov. 7, 2006, 38 pages.
Gruber, T. R., “Despite our Best Efforts, Ontologies are not the Problem,” AAAI Spring Symposium, Mar. 2008, http://tomgruber.org/writing/aaai-ss08.htm, 40 pages.
Gruber, T. R., “Enterprise Collaboration Management with Intraspect,” Intraspect Software, Inc., Instraspect Technical White Paper Jul. 2001, 24 pages.
Gruber, T. R., “Every ontology is a treaty—a social agreement—among people with some common motive in sharing,” Interview by Dr. Miltiadis D. Lytras, Official Quarterly Bulletin of AIS Special Interest Group on Semantic Web and Information Systems, vol. 1, Issue 3, 2004, http://www.sigsemis.org 1, 5 pages.
Gruber, T. R., et al., “Generative Design Rationale: Beyond the Record and Replay Paradigm,” Knowledge Systems Laboratory, Stanford University, Dec. 1991, Technical Report KSL 92-59, Updated Feb. 1993, 24 pages.
Gruber, T. R., “Helping Organizations Collaborate, Communicate, and Learn,” Presentation to NASA Ames Research, Mountain View, CA, Mar. 2003, http://tomgruber.org/writing/organizational-intelligence-talk.htm, 30 pages.
Gruber, T. R., “Intelligence at the Interface: Semantic Technology and the Consumer Internet Experience,” Presentation at Semantic Technologies conference (SemTech08), May 20, 2008, http://tomgruber.org/writing.htm, 40 pages.
Gruber, T. R., “Interactive Acquisition of Justifications: Learning ‘Why’ by Being Told ‘What’,” Knowledge Systems Laboratory, Stanford University, Oct. 1990, Technical Report KSL 91-17, Revised Feb. 1991, 24 pages.
Gruber, T. R., “It Is What It Does: the Pragmatics of Ontology for Knowledge Sharing,” (c) 2000, 2003, http://www.cidoc-crm.org/docs/symposium_presentations/gruber_cidoc-ontology-2003.pdf, 21 pages.
Gruber, T. R., et al., “Machine-generated Explanations of Engineering Models: A Compositional Modeling Approach,” (1993) in Proc. International Joint Conference on Artificial Intelligence, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.930, 7 pages.
Gruber, T. R., “2021: Mass Collaboration and the Really New Economy,” TNTY Futures, the newsletter of the Next Twenty Years series, vol. 1, Issue 6, Aug. 2001, http://www.tnty.com/newsletter/futures/archive/v01-05business.html, 5 pages.
Gruber, T. R., et al., “NIKE: A National Infrastructure for Knowledge Exchange,” Oct. 1994, http://www.eit.com/papers/nike/nike.html and nike.ps, 10 pages.
Gruber, T. R., “Ontologies, Web 2.0 and Beyond,” Apr. 24, 2007, Ontology Summit 2007, http://tomgruber.org/writing/ontolog-social-web-keynote.pdf, 17 pages.
Gruber, T. R., “Ontology of Folksonomy: A Mash-up of Apples and Oranges,” Originally published to the web in 2005, Int'l Journal on Semantic Web & Information Systems, 3(2), 2007, 7 pages.
Gruber, T. R., “Siri, a Virtual Personal Assistant—Bringing Intelligence to the Interface,” Jun. 16, 2009, Keynote presentation at Semantic Technologies conference, Jun. 2009. http://tomgruber.org/writing/semtech09.htm, 22 pages.
Gruber, T. R., “TagOntology,” Presentation to Tag Camp, www.tagcamp.org, Oct. 29, 2005, 20 pages.
Gruber, T. R., et al., “Toward a Knowledge Medium for Collaborative Product Development,” in Artificial Intelligence in Design 1992, from Proceedings of the Second International Conference on Artificial Intelligence in Design, Pittsburgh, USA, Jun. 22-25, 1992, 19 pages.
Gruber, T. R., “Toward Principles for the Design of Ontologies Used for Knowledge Sharing,” in International Journal Human-Computer Studies 43, p. 907-928, substantial revision of paper presented at the International Workshop on Formal Ontology, Mar. 1993, Padova, Italy, available as Technical Report KSL 93-04, Knowledge Systems Laboratory, Stanford University, further revised Aug. 23, 1993, 23 pages.
Guzzoni, D., et al., “Active, A Platform for Building Intelligent Operating Rooms,” Surgetica 2007 Computer-Aided Medical Interventions: tools and applications, pp. 191-198, Paris, 2007, Sauramps Médical, http://lsro.epfl.ch/page-68384-en.html, 8 pages.
Guzzoni, D., et al., “Active, A Tool for Building Intelligent User Interfaces,” ASC 2007, Palma de Mallorca, http://lsro.epfl.ch/page-34241.html, 6 pages.
Guzzoni, D., et al., “Modeling Human-Agent Interaction with Active Ontologies,” 2007, AAAI Spring Symposium, Interaction Challenges for Intelligent Assistants, Stanford University, Palo Alto, California, 8 pages.
Hardawar, D., “Driving app Waze builds its own Siri for hands-free voice control,” Feb. 9, 2012, http://venturebeat.com/2012/02/09/driving-app-waze-builds-its-own-siri-for-hands-free-voice-control/, 4 pages.
Intraspect Software, “The Intraspect Knowledge Management Solution: Technical Overview,” http://tomgruber.org/writing/intraspect-whitepaper-1998.pdf, 18 pages.
Julia, L., et al., “Un éditeur interactif de tableaux dessinés à main levée (An Interactive Editor for Hand-Sketched Tables),” Traitement du Signal 1995, vol. 12, No. 6, 8 pages. No English Translation Available.
Karp, P. D., “A Generic Knowledge-Base Access Protocol,” May 12, 1994, http://lecture.cs.buu.ac.th/~f50353/Document/gfp.pdf, 66 pages.
Lemon, O., et al., “Multithreaded Context for Robust Conversational Interfaces: Context-Sensitive Speech Recognition and Interpretation of Corrective Fragments,” Sep. 2004, ACM Transactions on Computer-Human Interaction, vol. 11, No. 3, 27 pages.
Leong, L., et al., “CASIS: A Context-Aware Speech Interface System,” IUI'05, Jan. 9-12, 2005, Proceedings of the 10th international conference on Intelligent user interfaces, San Diego, California, USA, 8 pages.
Lieberman, H., et al., “Out of context: Computer systems that adapt to, and learn from, context,” 2000, IBM Systems Journal, vol. 39, Nos. 3/4, 2000, 16 pages.
Lin, B., et al., “A Distributed Architecture for Cooperative Spoken Dialogue Agents with Coherent Dialogue State and History,” 1999, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.272, 4 pages.
McGuire, J., et al., “SHADE: Technology for Knowledge-Based Collaborative Engineering,” 1993, Journal of Concurrent Engineering: Applications and Research (CERA), 18 pages.
Milward, D., et al., “D2.2: Dynamic Multimodal Interface Reconfiguration,” Talk and Look: Tools for Ambient Linguistic Knowledge, Aug. 8, 2006, http://www.ihmc.us/users/nblaylock/Pubs/Files/talk_d2.2.pdf, 69 pages.
Mitra, P., et al., “A Graph-Oriented Model for Articulation of Ontology Interdependencies,” 2000, http://ilpubs.stanford.edu:8090/442/1/2000-20.pdf, 15 pages.
Moran, D. B., et al., “Multimodal User Interfaces in the Open Agent Architecture,” Proc. of the 1997 International Conference on Intelligent User Interfaces (IUI97), 8 pages.
Mozer, M., “An Intelligent Environment Must be Adaptive,” Mar./Apr. 1999, IEEE Intelligent Systems, 3 pages.
Mühlhäuser, M., “Context Aware Voice User Interfaces for Workflow Support,” Darmstadt 2007, http://tuprints.ulb.tu-darmstadt.de/876/1/PhD.pdf, 254 pages.
Naone, E., “TR10: Intelligent Software Assistant,” Mar.-Apr. 2009, Technology Review, http://www.technologyreview.com/printer_friendly_article.aspx?id=22117, 2 pages.
Neches, R., “Enabling Technology for Knowledge Sharing,” Fall 1991, AI Magazine, pp. 37-56, (21 pages).
Nöth, E., et al., “Verbmobil: The Use of Prosody in the Linguistic Components of a Speech Understanding System,” IEEE Transactions on Speech and Audio Processing, vol. 8, No. 5, Sep. 2000, 14 pages.
Rice, J., et al., “Monthly Program: Nov. 14, 1995,” The San Francisco Bay Area Chapter of ACM SIGCHI, http://www.baychi.org/calendar/19951114/, 2 pages.
Rice, J., et al., “Using the Web Instead of a Window System,” Knowledge Systems Laboratory, Stanford University, http://tomgruber.org/writing/ksl-95-69.pdf, 14 pages.
Rivlin, Z., et al., “Maestro: Conductor of Multimedia Analysis Technologies,” 1999 SRI International, Communications of the Association for Computing Machinery (CACM), 7 pages.
Sheth, A., et al., “Relationships at the Heart of Semantic Web: Modeling, Discovering, and Exploiting Complex Semantic Relationships,” Oct. 13, 2002, Enhancing the Power of the Internet: Studies in Fuzziness and Soft Computing, Springer-Verlag, 38 pages.
Simonite, T., “One Easy Way to Make Siri Smarter,” Oct. 18, 2011, Technology Review, http://www.technologyreview.com/printer_friendly_article.aspx?id=38915, 2 pages.
Stent, A., et al., “The CommandTalk Spoken Dialogue System,” 1999, http://acl.ldc.upenn.edu/P/P99/P99-1024.pdf, 8 pages.
Tofel, K., et al., “SpeakToIt: A personal assistant for older iPhones, iPads,” Feb. 9, 2012, http://gigaom.com/apple/speaktoit-siri-for-older-iphones-ipads/, 7 pages.
Tucker, J., “Too lazy to grab your TV remote? Use Siri instead,” Nov. 30, 2011, http://www.engadget.com/2011/11/30/too-lazy-to-grab-your-tv-remote-use-siri-instead/, 8 pages.
Tur, G., et al., “The CALO Meeting Speech Recognition and Understanding System,” 2008, Proc. IEEE Spoken Language Technology Workshop, 4 pages.
Tur, G., et al., “The CALO Meeting Assistant System,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, No. 6, Aug. 2010, 11 pages.
Vlingo, “Vlingo Launches Voice Enablement Application on Apple App Store,” Vlingo press release dated Dec. 3, 2008, 2 pages.
YouTube, “Knowledge Navigator,” 5:34 minute video uploaded to YouTube by Knownav on Apr. 29, 2008, http://www.youtube.com/watch?v=QRH8eimU_20 on Aug. 3, 2006, 1 page.
YouTube, “Send Text, Listen to and Send E-Mail ‘By Voice’ www.voiceassist.com,” 2:11 minute video uploaded to YouTube by VoiceAssist on Jul. 30, 2009, http://www.youtube.com/watch?v=0tEU61nHHA4, 1 page.
YouTube, “Text'nDrive App Demo—Listen and Reply to your Messages by Voice while Driving!,” 1:57 minute video uploaded to YouTube by TextnDrive on Apr. 27, 2010, http://www.youtube.com/watch?v=WaGfzoHsAMw, 1 page.
YouTube, “Voice on the Go (BlackBerry),” 2:51 minute video uploaded to YouTube by VoiceOnTheGo on Jul. 27, 2009, http://www.youtube.com/watch?v=pJqpWgQS98w, 1 page.
International Search Report and Written Opinion dated Nov. 29, 2011, received in International Application No. PCT/US2011/20861, which corresponds to U.S. Appl. No. 12/987,982, 15 pages. (Thomas Robert Gruber).
Glass, J., et al., “Multiple Spoken-Language Understanding in the MIT Voyager System,” Aug. 1995, http://groups.csail.mit.edu/sls/publications/1995/speechcomm95-voyager.pdf, 29 pages.
Goddeau, D., et al., “A Form-Based Dialogue Manager for Spoken Language Applications,” Oct. 1996, http://phasedance.com/pdf/icslp96.pdf, 4 pages.
Goddeau, D., et al., “Galaxy: A Human-Language Interface to On-Line Travel Information,” 1994 International Conference on Spoken Language Processing, Sep. 18-22, 1994, Pacific Convention Plaza Yokohama, Japan, 6 pages.
Meng, H., et al., “Wheels: A Conversational System in the Automobile Classified Domain,” Oct. 1996, http://citeseerx.ist.psu.edu/viewdocs/summary?doi=10.1.1.16.3022, 4 pages.
Phoenix Solutions, Inc. v. West Interactive Corp., Document 40, Declaration of Christopher Schmandt Regarding the MIT Galaxy System dated Jul. 2, 2010, 162 pages.
Seneff, S., et al., “A New Restaurant Guide Conversational System: Issues in Rapid Prototyping for Specialized Domains,” Oct. 1996, citeseerx.ist.psu.edu.viewdoc.download?doi=10.1.1.16...rep..., 4 pages.
Vlingo InCar, “Distracted Driving Solution with Vlingo InCar,” 2:38 minute video uploaded to YouTube by Vlingo Voice on Oct. 6, 2010, http://www.youtube.com/watch?v=Vqs8XfXxgz4, 2 pages.
Zue, V., “Conversational Interfaces: Advances and Challenges,” Sep. 1997, http://www.cs.cmu.edu/~dod/papers.zue97.pdf, 10 pages.
Zue, V. W., “Toward Systems that Understand Spoken Language,” Feb. 1994, ARPA Strategic Computing Institute, © 1994 IEEE, 9 pages.
Bussler, C., et al., “Web Service Execution Environment (WSMX),” Jun. 3, 2005, W3C Member Submission, http://www.w3.org/Submission/WSMX, 29 pages.
Cheyer, A., “About Adam Cheyer,” Sep. 17, 2012, http://www.adam.cheyer.com/about.html, 2 pages.
Cheyer, A., “A Perspective on AI & Agent Technologies for SCM,” VerticalNet, 2001 presentation, 22 pages.
Domingue, J., et al., “Web Service Modeling Ontology (WSMO)—An Ontology for Semantic Web Services,” Jun. 9-10, 2005, position paper at the W3C Workshop on Frameworks for Semantics in Web Services, Innsbruck, Austria, 6 pages.
Guzzoni, D., et al., “A Unified Platform for Building Intelligent Web Interaction Assistants,” Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Computer Society, 4 pages.
Roddy, D., et al., “Communication and Collaboration in a Landscape of B2B eMarketplaces,” VerticalNet Solutions, white paper, Jun. 15, 2000, 23 pages.
Acero, A., et al., “Environmental Robustness in Automatic Speech Recognition,” International Conference on Acoustics, Speech, and Signal Processing (ICASSP'90), Apr. 3-6, 1990, 4 pages.
Acero, A., et al., “Robust Speech Recognition by Normalization of the Acoustic Space,” International Conference on Acoustics, Speech, and Signal Processing, 1991, 4 pages.
Ahlbom, G., et al., “Modeling Spectral Speech Transitions Using Temporal Decomposition Techniques,” IEEE International Conference of Acoustics, Speech, and Signal Processing (ICASSP'87), Apr. 1987, vol. 12, 4 pages.
Aikawa, K., “Speech Recognition Using Time-Warping Neural Networks,” Proceedings of the 1991 IEEE Workshop on Neural Networks for Signal Processing, Sep. 30 to Oct. 1, 1991, 10 pages.
Anastasakos, A., et al., “Duration Modeling in Large Vocabulary Speech Recognition,” International Conference on Acoustics, Speech, and Signal Processing (ICASSP'95), May 9-12, 1995, 4 pages.
Anderson, R. H., “Syntax-Directed Recognition of Hand-Printed Two-Dimensional Mathematics,” in Proceedings of Symposium on Interactive Systems for Experimental Applied Mathematics: Proceedings of the Association for Computing Machinery Inc. Symposium, © 1967, 12 pages.
Ansari, R., et al., “Pitch Modification of Speech using a Low-Sensitivity Inverse Filter Approach,” IEEE Signal Processing Letters, vol. 5, No. 3, Mar. 1998, 3 pages.
Anthony, N. J., et al., “Supervised Adaption for Signature Verification System,” Jun. 1, 1978, IBM Technical Disclosure, 3 pages.
Apple Computer, “Guide Maker User's Guide,” © Apple Computer, Inc., Apr. 27, 1994, 8 pages.
Apple Computer, “Introduction to Apple Guide,” © Apple Computer, Inc., Apr. 28, 1994, 20 pages.
Asanović, K., et al., “Experimental Determination of Precision Requirements for Back-Propagation Training of Artificial Neural Networks,” in Proceedings of the 2nd International Conference of Microelectronics for Neural Networks, 1991, www.ICSI.Berkeley.EDU, 7 pages.
Atal, B. S., “Efficient Coding of LPC Parameters by Temporal Decomposition,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'83), Apr. 1983, 4 pages.
Bahl, L. R., et al., “Acoustic Markov Models Used in the Tangora Speech Recognition System,” in Proceeding of International Conference on Acoustics, Speech, and Signal Processing (ICASSP'88), Apr. 11-14, 1988, vol. 1, 4 pages.
Bahl, L. R., et al., “A Maximum Likelihood Approach to Continuous Speech Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-5, No. 2, Mar. 1983, 13 pages.
Bahl, L. R., et al., “A Tree-Based Statistical Language Model for Natural Language Speech Recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, Issue 7, Jul. 1989, 8 pages.
Bahl, L. R., et al., “Large Vocabulary Natural Language Continuous Speech Recognition,” in Proceedings of 1989 International Conference on Acoustics, Speech, and Signal Processing, May 23-26, 1989, vol. 1, 6 pages.
Bahl, L. R., et al., “Multonic Markov Word Models for Large Vocabulary Continuous Speech Recognition,” IEEE Transactions on Speech and Audio Processing, vol. 1, No. 3, Jul. 1993, 11 pages.
Bahl, L. R., et al., “Speech Recognition with Continuous-Parameter Hidden Markov Models,” in Proceeding of International Conference on Acoustics, Speech, and Signal Processing (ICASSP'88), Apr. 11-14, 1988, vol. 1, 8 pages.
Banbrook, M., “Nonlinear Analysis of Speech from a Synthesis Perspective,” A thesis submitted for the degree of Doctor of Philosophy, The University of Edinburgh, Oct. 15, 1996, 35 pages.
Belaid, A., et al., “A Syntactic Approach for Handwritten Mathematical Formula Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, No. 1, Jan. 1984, 7 pages.
Bellegarda, E. J., et al., “On-Line Handwriting Recognition Using Statistical Mixtures,” Advances in Handwriting and Drawings: A Multidisciplinary Approach, Europia, 6th International IGS Conference on Handwriting and Drawing, Paris—France, Jul. 1993, 11 pages.
Bellegarda, J. R., “A Latent Semantic Analysis Framework for Large-Span Language Modeling,” 5th European Conference on Speech, Communication and Technology, (EUROSPEECH'97), Sep. 22-25, 1997, 4 pages.
Bellegarda, J. R., “A Multispan Language Modeling Framework for Large Vocabulary Speech Recognition,” IEEE Transactions on Speech and Audio Processing, vol. 6, No. 5, Sep. 1998, 12 pages.
Bellegarda, J. R., et al., “A Novel Word Clustering Algorithm Based on Latent Semantic Analysis,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'96), vol. 1, 4 pages.
Bellegarda, J. R., et al., “Experiments Using Data Augmentation for Speaker Adaptation,” International Conference on Acoustics, Speech, and Signal Processing (ICASSP'95), May 9-12, 1995, 4 pages.
Bellegarda, J. R., “Exploiting Both Local and Global Constraints for Multi-Span Statistical Language Modeling,” Proceeding of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'98), vol. 2, May 12-15, 1998, 5 pages.
Bellegarda, J. R., “Exploiting Latent Semantic Information in Statistical Language Modeling,” in Proceedings of the IEEE, Aug. 2000, vol. 88, No. 8, 18 pages.
Bellegarda, J. R., “Interaction-Driven Speech Input—A Data-Driven Approach to the Capture of Both Local and Global Language Constraints,” 1992, 7 pages, available at http://old.sigchi.org/bulletin/1998.2/bellegarda.html.
Bellegarda, J. R., “Large Vocabulary Speech Recognition with Multispan Statistical Language Models,” IEEE Transactions on Speech and Audio Processing, vol. 8, No. 1, Jan. 2000, 9 pages.
Bellegarda, J. R., et al., “Performance of the IBM Large Vocabulary Continuous Speech Recognition System on the ARPA Wall Street Journal Task,” Signal Processing VII: Theories and Applications, © 1994 European Association for Signal Processing, 4 pages.
Bellegarda, J. R., et al., “The Metamorphic Algorithm: A Speaker Mapping Approach to Data Augmentation,” IEEE Transactions on Speech and Audio Processing, vol. 2, No. 3, Jul. 1994, 8 pages.
Black, A. W., et al., “Automatically Clustering Similar Units for Unit Selection in Speech Synthesis,” in Proceedings of Eurospeech 1997, vol. 2, 4 pages.
Blair, D. C., et al., “An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System,” Communications of the ACM, vol. 28, No. 3, Mar. 1985, 11 pages.
Briner, L. L., “Identifying Keywords in Text Data Processing,” in Zelkowitz, Marvin V., ED, Directions and Challenges, 15th Annual Technical Symposium, Jun. 17, 1976, Gaithersburg, Maryland, 7 pages.
Bulyko, I., et al., “Joint Prosody Prediction and Unit Selection for Concatenative Speech Synthesis,” Electrical Engineering Department, University of Washington, Seattle, 2001, 4 pages.
Bussey, H. E., et al., “Service Architecture, Prototype Description, and Network Implications of a Personalized Information Grazing Service,” INFOCOM'90, Ninth Annual Joint Conference of the IEEE Computer and Communication Societies, Jun. 3-7, 1990, http://slrohall.com/publications/, 8 pages.
Buzo, A., et al., “Speech Coding Based Upon Vector Quantization,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-28, No. 5, Oct. 1980, 13 pages.
Caminero-Gil, J., et al., “Data-Driven Discourse Modeling for Semantic Interpretation,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, May 7-10, 1996, 6 pages.
Cawley, G. C., “The Application of Neural Networks to Phonetic Modelling,” PhD Thesis, University of Essex, Mar. 1996, 13 pages.
Chang, S., et al., “A Segment-based Speech Recognition System for Isolated Mandarin Syllables,” Proceedings TENCON '93, IEEE Region 10 conference on Computer, Communication, Control and Power Engineering, Oct. 19-21, 1993, vol. 3, 6 pages.
Conklin, J., “Hypertext: An Introduction and Survey,” Computer Magazine, Sep. 1987, 25 pages.
Connolly, F. T., et al., “Fast Algorithms for Complex Matrix Multiplication Using Surrogates,” IEEE Transactions on Acoustics, Speech, and Signal Processing, Jun. 1989, vol. 37, No. 6, 13 pages.
Deerwester, S., et al., “Indexing by Latent Semantic Analysis,” Journal of the American Society for Information Science, vol. 41, No. 6, Sep. 1990, 19 pages.
Deller, Jr., J. R., et al., “Discrete-Time Processing of Speech Signals,” © 1987 Prentice Hall, ISBN: 0-02-328301-7, 14 pages.
Digital Equipment Corporation, “Open VMS Software Overview,” Dec. 1995, software manual, 159 pages.
Donovan, R. E., “A New Distance Measure for Costing Spectral Discontinuities in Concatenative Speech Synthesisers,” 2001, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.6398, 4 pages.
Frisse, M. E., “Searching for Information in a Hypertext Medical Handbook,” Communications of the ACM, vol. 31, No. 7, Jul. 1988, 8 pages.
Goldberg, D., et al., “Using Collaborative Filtering to Weave an Information Tapestry,” Communications of the ACM, vol. 35, No. 12, Dec. 1992, 10 pages.
Gorin, A. L., et al., “On Adaptive Acquisition of Language,” International Conference on Acoustics, Speech, and Signal Processing (ICASSP'90), vol. 1, Apr. 3-6, 1990, 5 pages.
Gotoh, Y., et al., “Document Space Models Using Latent Semantic Analysis,” in Proceedings of Eurospeech, 1997, 4 pages.
Gray, R. M., “Vector Quantization,” IEEE ASSP Magazine, Apr. 1984, 26 pages.
Harris, F. J., “On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform,” in Proceedings of the IEEE, vol. 66, No. 1, Jan. 1978, 34 pages.
Helm, R., et al., “Building Visual Language Parsers,” in Proceedings of CHI'91 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 8 pages.
Hermansky, H., “Perceptual Linear Predictive (PLP) Analysis of Speech,” Journal of the Acoustical Society of America, vol. 87, No. 4, Apr. 1990, 15 pages.
Hermansky, H., “Recognition of Speech in Additive and Convolutional Noise Based on RASTA Spectral Processing,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'93), Apr. 27-30, 1993, 4 pages.
Hoehfeld, M., et al., “Learning with Limited Numerical Precision Using the Cascade-Correlation Algorithm,” IEEE Transactions on Neural Networks, vol. 3, No. 4, Jul. 1992, 18 pages.
Holmes, J. N., “Speech Synthesis and Recognition—Stochastic Models for Word Recognition,” Speech Synthesis and Recognition, Published by Chapman & Hall, London, ISBN 0 412 53430 4, © 1998 J. N. Holmes, 7 pages.
Hon, H.W., et al., “CMU Robust Vocabulary-Independent Speech Recognition System,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-91), Apr. 14-17, 1991, 4 pages.
IBM Technical Disclosure Bulletin, “Speech Editor,” vol. 29, No. 10, Mar. 10, 1987, 3 pages.
IBM Technical Disclosure Bulletin, “Integrated Audio-Graphics User Interface,” vol. 33, No. 11, Apr. 1991, 4 pages.
IBM Technical Disclosure Bulletin, “Speech Recognition with Hidden Markov Models of Speech Waveforms,” vol. 34, No. 1, Jun. 1991, 10 pages.
Iowegian International, “FIR Filter Properties,” dspGuru, Digital Signal Processing Central, http://www.dspguru.com/dsp/tags/fir/properties, downloaded on Jul. 28, 2010, 6 pages.
Jacobs, P. S., et al., “Scisor: Extracting Information from On-Line News,” Communications of the ACM, vol. 33, No. 11, Nov. 1990, 10 pages.
Jelinek, F., “Self-Organized Language Modeling for Speech Recognition,” Readings in Speech Recognition, edited by Alex Waibel and Kai-Fu Lee, May 15, 1990, © 1990 Morgan Kaufmann Publishers, Inc., ISBN: 1-55860-124-4, 63 pages.
Jennings, A., et al., “A Personal News Service Based on a User Model Neural Network,” IEICE Transactions on Information and Systems, vol. E75-D, No. 2, Mar. 1992, Tokyo, JP, 12 pages.
Ji, T., et al., “A Method for Chinese Syllables Recognition based upon Sub-syllable Hidden Markov Model,” 1994 International Symposium on Speech, Image Processing and Neural Networks, Apr. 13-16, 1994, Hong Kong, 4 pages.
Jones, J., “Speech Recognition for Cyclone,” Apple Computer, Inc., E.R.S., Revision 2.9, Sep. 10, 1992, 93 pages.
Katz, S. M., “Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, No. 3, Mar. 1987, 3 pages.
Kitano, H., “PhiDM-Dialog, an Experimental Speech-to-Speech Dialog Translation System,” Jun. 1991 Computer, vol. 24, No. 6, 13 pages.
Klabbers, E., et al., “Reducing Audible Spectral Discontinuities,” IEEE Transactions on Speech and Audio Processing, vol. 9, No. 1, Jan. 2001, 13 pages.
Klatt, D. H., “Linguistic Uses of Segmental Duration in English: Acoustic and Perceptual Evidence,” Journal of the Acoustical Society of America, vol. 59, No. 5, May 1976, 16 pages.
Kominek, J., et al., “Impact of Durational Outlier Removal from Unit Selection Catalogs,” 5th ISCA Speech Synthesis Workshop, Jun. 14-16, 2004, 6 pages.
Kubala, F., et al., “Speaker Adaptation from a Speaker-Independent Training Corpus,” International Conference on Acoustics, Speech, and Signal Processing (ICASSP'90), Apr. 3-6, 1990, 4 pages.
Kubala, F., et al., “The Hub and Spoke Paradigm for CSR Evaluation,” Proceedings of the Spoken Language Technology Workshop, Mar. 6-8, 1994, 9 pages.
Lee, K.F., “Large-Vocabulary Speaker-Independent Continuous Speech Recognition: The SPHINX System,” Apr. 18, 1988, Partial fulfillment of the requirements for the degree of Doctor of Philosophy, Computer Science Department, Carnegie Mellon University, 195 pages.
Lee, L., et al., “A Real-Time Mandarin Dictation Machine for Chinese Language with Unlimited Texts and Very Large Vocabulary,” International Conference on Acoustics, Speech and Signal Processing, vol. 1, Apr. 3-6, 1990, 5 pages.
Lee, L., et al., “Golden Mandarin(II)—An Improved Single-Chip Real-Time Mandarin Dictation Machine for Chinese Language with Very Large Vocabulary,” 0-7803-0946-4/93 © 1993 IEEE, 4 pages.
Lee, L., et al., “Golden Mandarin(II)—An Intelligent Mandarin Dictation Machine for Chinese Character Input with Adaptation/Learning Functions,” International Symposium on Speech, Image Processing and Neural Networks, Apr. 13-16, 1994, Hong Kong, 5 pages.
Lee, L., et al., “System Description of Golden Mandarin (I) Voice Input for Unlimited Chinese Characters,” International Conference on Computer Processing of Chinese & Oriental Languages, vol. 5, Nos. 3 & 4, Nov. 1991, 16 pages.
Lin, C.H., et al., “A New Framework for Recognition of Mandarin Syllables With Tones Using Sub-syllabic Units,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-93), Apr. 27-30, 1993, 4 pages.
Linde, Y., et al., “An Algorithm for Vector Quantizer Design,” IEEE Transactions on Communications, vol. 28, No. 1, Jan. 1980, 12 pages.
Liu, F.H., et al., “Efficient Joint Compensation of Speech for the Effects of Additive Noise and Linear Filtering,” IEEE International Conference of Acoustics, Speech, and Signal Processing, ICASSP-92, Mar. 23-26, 1992, 4 pages.
Logan, B., “Mel Frequency Cepstral Coefficients for Music Modeling,” in International Symposium on Music Information Retrieval, 2000, 2 pages.
Lowerre, B. T., “The HARPY Speech Recognition System,” Doctoral Dissertation, Department of Computer Science, Carnegie Mellon University, Apr. 1976, 20 pages.
Maghbouleh, A., “An Empirical Comparison of Automatic Decision Tree and Linear Regression Models for Vowel Durations,” Revised version of a paper presented at the Computational Phonology in Speech Technology workshop, 1996 annual meeting of the Association for Computational Linguistics in Santa Cruz, California, 7 pages.
Markel, J. D., et al., “Linear Prediction of Speech,” Springer-Verlag, Berlin Heidelberg New York 1976, 12 pages.
Morgan, B., “Business Objects,” (Business Objects for Windows) Business Objects Inc., DBMS Sep. 1992, vol. 5, No. 10, 3 pages.
Mountford, S. J., et al., “Talking and Listening to Computers,” The Art of Human-Computer Interface Design, Copyright © 1990 Apple Computer, Inc. Addison-Wesley Publishing Company, Inc., 17 pages.
Murty, K. S. R., et al., “Combining Evidence from Residual Phase and MFCC Features for Speaker Recognition,” IEEE Signal Processing Letters, vol. 13, No. 1, Jan. 2006, 4 pages.
Murveit, H., et al., “Integrating Natural Language Constraints into HMM-based Speech Recognition,” 1990 International Conference on Acoustics, Speech, and Signal Processing, Apr. 3-6, 1990, 5 pages.
Nakagawa, S., et al., “Speaker Recognition by Combining MFCC and Phase Information,” IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Mar. 14-19, 2010, 4 pages.
Niesler, T. R., et al., “A Variable-Length Category-Based N-Gram Language Model,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'96), vol. 1, May 7-10, 1996, 6 pages.
Papadimitriou, C. H., et al., “Latent Semantic Indexing: A Probabilistic Analysis,” Nov. 14, 1997, http://citeseerx.ist.psu.edu/messages/downloadsexceeded.html, 21 pages.
Parsons, T. W., “Voice and Speech Processing,” Linguistics and Technical Fundamentals, Articulatory Phonetics and Phonemics, © 1987 McGraw-Hill, Inc., ISBN: 0-07-0485541-0, 5 pages.
Parsons, T. W., “Voice and Speech Processing,” Pitch and Formant Estimation, © 1987 McGraw-Hill, Inc., ISBN: 0-07-0485541-0, 15 pages.
Picone, J., “Continuous Speech Recognition Using Hidden Markov Models,” IEEE ASSP Magazine, vol. 7, No. 3, Jul. 1990, 16 pages.
Rabiner, L. R., et al., “Fundamentals of Speech Recognition,” © 1993 AT&T, Published by Prentice-Hall, Inc., ISBN: 0-13-285826-6, 17 pages.
Rabiner, L. R., et al., “Note on the Properties of a Vector Quantizer for LPC Coefficients,” The Bell System Technical Journal, vol. 62, No. 8, Oct. 1983, 9 pages.
Ratcliffe, M., “ClearAccess 2.0 allows SQL searches off-line,” (Structured Query Language), ClearAccess Corp., MacWeek Nov. 16, 1992, vol. 6, No. 41, 2 pages.
Remde, J. R., et al., “SuperBook: An Automatic Tool for Information Exploration-Hypertext?,” in Proceedings of Hypertext'87 papers, Nov. 13-15, 1987, 14 pages.
Reynolds, C. F., “On-Line Reviews: A New Application of the HICOM Conferencing System,” IEE Colloquium on Human Factors in Electronic Mail and Conferencing Systems, Feb. 3, 1989, 4 pages.
Rigoll, G., “Speaker Adaptation for Large Vocabulary Speech Recognition Systems Using Speaker Markov Models,” International Conference on Acoustics, Speech, and Signal Processing (ICASSP'89), May 23-26, 1989, 4 pages.
Riley, M. D., “Tree-Based Modelling of Segmental Durations,” Talking Machines Theories, Models, and Designs, 1992 © Elsevier Science Publishers B.V., North-Holland, ISBN: 08-444-89115.3, 15 pages.
Rivoira, S., et al., “Syntax and Semantics in a Word-Sequence Recognition System,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'79), Apr. 1979, 5 pages.
Rosenfeld, R., “A Maximum Entropy Approach to Adaptive Statistical Language Modelling,” Computer Speech and Language, vol. 10, No. 3, Jul. 1996, 25 pages.
Roszkiewicz, A., “Extending your Apple,” Back Talk—Lip Service, A+ Magazine, The Independent Guide for Apple Computing, vol. 2, No. 2, Feb. 1984, 5 pages.
Sakoe, H., et al., “Dynamic Programming Algorithm Optimization for Spoken Word Recognition,” IEEE Transactions on Acoustics, Speech, and Signal Processing, Feb. 1978, vol. ASSP-26, No. 1, 8 pages.
Salton, G., et al., “On the Application of Syntactic Methodologies in Automatic Text Analysis,” Information Processing and Management, vol. 26, No. 1, Great Britain 1990, 22 pages.
Savoy, J., “Searching Information in Hypertext Systems Using Multiple Sources of Evidence,” International Journal of Man-Machine Studies, vol. 38, No. 6, Jun. 1993, 15 pages.
Scagliola, C., “Language Models and Search Algorithms for Real-Time Speech Recognition,” International Journal of Man-Machine Studies, vol. 22, No. 5, 1985, 25 pages.
Schmandt, C., et al., “Augmenting a Window System with Speech Input,” IEEE Computer Society, Computer Aug. 1990, vol. 23, No. 8, 8 pages.
Schütze, H., “Dimensions of Meaning,” Proceedings of Supercomputing'92 Conference, Nov. 16-20, 1992, 10 pages.
Sheth, B., et al., “Evolving Agents for Personalized Information Filtering,” in Proceedings of the Ninth Conference on Artificial Intelligence for Applications, Mar. 1-5, 1993, 9 pages.
Shikano, K., et al., “Speaker Adaptation Through Vector Quantization,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'86), vol. 11, Apr. 1986, 4 pages.
Sigurdsson, S., et al., “Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music,” in Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR), 2006, 4 pages.
Silverman, K. E. A., et al., “Using a Sigmoid Transformation for Improved Modeling of Phoneme Duration,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Mar. 15-19, 1999, 5 pages.
Tenenbaum, A.M., et al., “Data Structure Using Pascal,” 1981 Prentice-Hall, Inc., 34 pages.
Tsai, W.H., et al., “Attributed Grammar—A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-10, No. 12, Dec. 1980, 13 pages.
Udell, J., “Computer Telephony,” BYTE, vol. 19, No. 7, Jul. 1, 1994, 9 pages.
van Santen, J. P. H., “Contextual Effects on Vowel Duration,” Journal Speech Communication, vol. 11, No. 6, Dec. 1992, 34 pages.
Vepa, J., et al., “New Objective Distance Measures for Spectral Discontinuities in Concatenative Speech Synthesis,” in Proceedings of the IEEE 2002 Workshop on Speech Synthesis, 4 pages.
Verschelde, J., “MATLAB Lecture 8. Special Matrices in MATLAB,” Nov. 23, 2005, UIC Dept. of Math., Stat.. & C.S., MCS 320, Introduction to Symbolic Computation, 4 pages.
Vingron, M. “Near-Optimal Sequence Alignment,” Deutsches Krebsforschungszentrum (DKFZ), Abteilung Theoretische Bioinformatik, Heidelberg, Germany, Jun. 1996, 20 pages.
Werner, S., et al., “Prosodic Aspects of Speech,” Université de Lausanne, Switzerland, 1994, Fundamentals of Speech Synthesis and Speech Recognition: Basic Concepts, State of the Art, and Future Challenges, 18 pages.
Wikipedia, “Mel Scale,” Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Mel—scale, 2 pages.
Wikipedia, “Minimum Phase,” Wikipedia, the free encyclopedia, http://en.wikipedia.org./wiki/Minimum—phase, 8 pages.
Wolff, M., Poststructuralism and the ARTFUL Database: Some Theoretical Considerations, Information Technology and Libraries, vol. 13, No. 1, Mar. 1994, 10 pages.
Wu, M., “Digital Speech Processing and Coding,” ENEE408G Capstone-Multimedia Signal Processing, Spring 2003, Lecture-2 course presentation, University of Maryland, College Park, 8 pages.
Wu, M., “Speech Recognition, Synthesis, and H.C.I.,” ENEE408G Capstone-Multimedia Signal Processing, Spring 2003, Lecture-3 course presentation, University of Maryland, College Park, 11 pages.
Wyle, M. F., “A Wide Area Network Information Filter,” in Proceedings of First International Conference on Artificial Intelligence on Wall Street, Oct. 9-11, 1991, 6 pages.
Yankelovich, N., et al., “Intermedia: The Concept and the Construction of a Seamless Information Environment,” Computer Magazine, Jan. 1988, © 1988 IEEE, 16 pages.
Yoon, K., et al., “Letter-to-Sound Rules for Korean,” Department of Linguistics, The Ohio State University, 2002, 4 pages.
Zhao, Y., “An Acoustic-Phonetic-Based Speaker Adaptation Technique for Improving Speaker-Independent Continuous Speech Recognition,” IEEE Transactions on Speech and Audio Processing, vol. 2, No. 3, Jul. 1994, 15 pages.
Zovato, E., et al., “Towards Emotional Speech Synthesis: A Rule Based Approach,” 2 pages.
International Search Report dated Nov. 9, 1994, received in International Application No. PCT/US1993/12666, which corresponds to U.S. Appl. No. 07/999,302, 8 pages (Robert Don Strong).
International Preliminary Examination Report dated Mar. 1, 1995, received in International Application No. PCT/US1993/12666, which corresponds to U.S. Appl. No. 07/999,302, 5 pages (Robert Don Strong).
International Preliminary Examination Report dated Apr. 10, 1995, received in International Application No. PCT/US1993/12637, which corresponds to U.S. Appl. No. 07/999,354, 7 pages (Alejandro Acero).
International Search Report dated Feb. 8, 1995, received in International Application No. PCT/US1994/11011, which corresponds to U.S. Appl. No. 08/129,679, 7 pages (Yen-Lu Chow).
International Preliminary Examination Report dated Feb. 28, 1996, received in International Application No. PCT/US1994/11011, which corresponds to U.S. Appl. No. 08/129,679, 4 pages (Yen-Lu Chow).
Written Opinion dated Aug. 21, 1995, received in International Application No. PCT/US1994/11011, which corresponds to U.S. Appl. No. 08/129,679, 4 pages (Yen-Lu Chow).
International Search Report dated Nov. 8, 1995, received in International Application No. PCT/US1995/08369, which corresponds to U.S. Appl. No. 08/271,639, 6 pages (Peter V. De Souza).
International Preliminary Examination Report dated Oct. 9, 1996, received in International Application No. PCT/US1995/08369, which corresponds to U.S. Appl. No. 08/271,639, 4 pages (Peter V. De Souza).

Related Publications (1)

	Number	Date	Country
	20070294083 A1	Dec 2007	US

Continuations (1)

	Number	Date	Country
Parent	09527498	Mar 2000	US
Child	11811955		US

Abstract