Embodiments generally relate to speech recognition. More particularly, embodiments relate to a speech decoder and language interpreter with asynchronous pre-processing.
A speech recognition system may include various modules, including a decoder module, an end of speech detection module, and/or a natural language understanding (NLU) module. In some spoken dialog systems, an electronic speech signal is decoded until the end of speech is detected. The speech recognition result is then processed by the NLU. End of speech detection may be accomplished by checking whether there was a fixed amount of silence after a word or phrase in the electronic speech signal.
The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
Turning now to
In some embodiments, the speech decoder 14 may include a speech detector which may be part of a WFST decoder that bases speech/non-speech classification on the WFST state that the best active token is currently in. In other embodiments, the speech detector may be an individual classifier, for example, operating on the acoustic signal or on the features from the feature extractor 12. Other features may also be used, for example, synchronous video information that captures mouth movement to detect speech/non-speech sections, or similar information from a noise cancelation algorithm.
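By way of illustration only, the following Python sketch shows one way such a decoder-internal speech detector might be expressed; the state identifiers, the Token structure, and the classify_frame function are illustrative assumptions rather than features of any particular WFST decoder.

    from dataclasses import dataclass

    # Illustrative only: WFST states assumed to model silence/non-speech.
    SILENCE_STATES = {0, 7, 42}          # hypothetical state ids

    @dataclass
    class Token:
        state: int        # WFST state the token currently occupies
        score: float      # accumulated (log) score

    def classify_frame(active_tokens):
        """Return 'speech' or 'non-speech' for the current frame based on
        the WFST state of the best (highest-scoring) active token."""
        best = max(active_tokens, key=lambda t: t.score)
        return "non-speech" if best.state in SILENCE_STATES else "speech"

    # Example: the best active token sits in a hypothetical silence state.
    print(classify_frame([Token(state=7, score=-3.2), Token(state=11, score=-5.9)]))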
In some embodiments of the system 10, the language interpreter 16 may be configured to store an interpretation result based on the intermediate recognition result, receive an indication from the speech decoder that the request is complete, compare the complete request to the intermediate recognition result, and retrieve the stored interpretation result if the complete request matches the intermediate recognition result. Advantageously, because the language interpreter 16 pre-processed the intermediate recognition result, the interpretation result has already been prepared and may be provided from the language interpreter with little or no additional latency. The language interpreter 16 may also be configured to determine decode information based on the interpretation of the intermediate recognition result, and the speech decoder 14 may be further configured to decode the electronic speech signal based on the decode information from the language interpreter 16. For example, the language interpreter 16 may determine that the intermediate recognition result corresponds to a complete request and provide that determination to the endpoint detector 15. The endpoint detector 15 may then stop processing the phrase and indicate to the speech decoder 14 that the request is complete. The language interpreter 16 may also suggest a new hypothesis or recognition result to the decoder 14 and/or endpoint detector.
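As an illustrative, non-limiting sketch of the store/compare/retrieve behavior described above, the following Python class pre-processes intermediate results and reuses the cached interpretation when the final request matches; the SpeculativeInterpreter name, its methods, and the placeholder interpret function are assumptions made only for this example.

    class SpeculativeInterpreter:
        """Illustrative pre-processing interpreter (e.g. language interpreter 16)."""

        def __init__(self):
            self._cached_text = None      # last intermediate recognition result
            self._cached_intent = None    # interpretation computed for it

        def interpret(self, text):
            # Placeholder NLU: in practice this is the expensive interpretation step.
            return {"intent": "turn_on_lights" if "light" in text else "unknown",
                    "text": text}

        def on_intermediate(self, text):
            """Pre-process an intermediate recognition result and cache the outcome."""
            self._cached_text = text
            self._cached_intent = self.interpret(text)

        def on_final(self, text):
            """Return the cached interpretation if the complete request matches the
            pre-processed intermediate result; otherwise interpret from scratch."""
            if text == self._cached_text:
                return self._cached_intent          # little or no added latency
            return self.interpret(text)

    interp = SpeculativeInterpreter()
    interp.on_intermediate("turn on the lights")
    print(interp.on_final("turn on the lights"))    # served from the cached result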
Non-limiting examples of devices which may utilize the speech recognition system 10 include a server, a computer, a smart device, a gaming console, a wearable device, an internet-of-things (IoT) device, a kiosk, a robot, an automated voice response system, and any human machine interface device which includes voice input as part of its user interaction experience. Embodiments of each of the above speech converter 11, feature extractor 12, score converter 13, speech decoder 14, endpoint detector 15, language interpreter 16, and other system components may be implemented in hardware, software, or any suitable combination thereof. For example, hardware implementations may include configurable logic such as, for example, programmable logic arrays (PLAs), field programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), or in fixed-functionality logic hardware using circuit technology such as, for example, application specific integrated circuit (ASIC), complementary metal oxide semiconductor (CMOS) or transistor-transistor logic (TTL) technology, or any combination thereof. Alternatively, or additionally, these components may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as random access memory (RAM), read only memory (ROM), programmable ROM (PROM), firmware, flash memory, etc., to be executed by a processor or computing device. For example, computer program code to carry out the operations of the components may be written in any combination of one or more operating system applicable/appropriate programming languages, including an object oriented programming language such as JAVA, Python, SMALLTALK, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
Turning now to
In some embodiments, the language analyzer 22 may be configured to work with multiple intermediate results and multiple final results (e.g. from an n-best hypothesis decoder). For example, the language analyzer 22 may be further configured to store one or more language interpretation results of analysis corresponding to one or more intermediate recognition results of the electronic speech signal, receive two or more final recognition results of the electronic speech signal, compare each of the final recognition results to the intermediate recognition results, and retrieve each language interpretation result of the analysis which corresponds to one of the intermediate recognition results matching one of the final recognition results.
Embodiments of each of the above language analyzer 22, memory 24, and other components of the language interpreter apparatus 20 may be implemented in hardware, software, or any combination thereof. For example, hardware implementations may include configurable logic such as, for example, PLAs, FPGAs, CPLDs, or in fixed-functionality logic hardware using circuit technology such as, for example, ASIC, CMOS, or TTL technology, or any combination thereof. Alternatively, or additionally, these components may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as RAM, ROM, PROM, firmware, flash memory, etc., to be executed by a processor or computing device. For example, computer program code to carry out the operations of the components may be written in any combination of one or more operating system applicable/appropriate programming languages, including an object oriented programming language such as JAVA, Python, SMALLTALK, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
Turning now to
In some embodiments, the method may further include storing one or more language interpretation results of analysis corresponding to one or more intermediate recognition results of the electronic speech signal at block 38, receiving two or more final recognition results of the electronic speech signal at block 39, comparing each of the final recognition results to the intermediate recognition results at block 40, and retrieving each language interpretation result of the analysis which corresponds to one of the intermediate recognition results matching one of the final recognition results at block 41.
Embodiments of the method 30 may be implemented in a speech recognition system or language interpreter apparatus such as, for example, those described herein. More particularly, hardware implementations of the method 30 may include configurable logic such as, for example, PLAs, FPGAs, CPLDs, or in fixed-functionality logic hardware using circuit technology such as, for example, ASIC, CMOS, or TTL technology, or any combination thereof. Alternatively, or additionally, the method 30 may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as RAM, ROM, PROM, firmware, flash memory, etc., to be executed by a processor or computing device. For example, computer program code to carry out the operations of the components may be written in any combination of one or more operating system applicable/appropriate programming languages, including an object oriented programming language such as JAVA, Python, SMALLTALK, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. For example, the method 30 may be implemented on a computer readable medium as described in connection with Examples 22 to 25 below.
Turning now to
In some embodiments of the apparatus 44, the language interpreter interface 48 may be further configured to receive information related to language interpretation of the intermediate result, and the speech analyzer 46 may analyze the electronic speech signal based on the received information. For example, the speech analyzer 46 may determine an endpoint of the electronic speech signal based on the received information. For example, the speech analyzer 46 may determine that the intermediate recognition result is the final recognition result based on the received information.
Embodiments of each of the above speech analyzer 46, language interpreter interface 48, and other components of the speech decoder apparatus 44 may be implemented in hardware, software, or any combination thereof. For example, hardware implementations may include configurable logic such as, for example, PLAs, FPGAs, CPLDs, or in fixed-functionality logic hardware using circuit technology such as, for example, ASIC, CMOS, or TTL technology, or any combination thereof. Alternatively, or additionally, these components may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as RAM, ROM, PROM, firmware, flash memory, etc., to be executed by a processor or computing device. For example, computer program code to carry out the operations of the components may be written in any combination of one or more operating system applicable/appropriate programming languages, including an object oriented programming language such as JAVA, Python, SMALLTALK, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
Turning now to
Some embodiments of the method 50 may also include receiving information related to language interpretation of the intermediate result at block 58, and analyzing the electronic speech signal based on the received information at block 59. For example, the method 50 may include determining an endpoint of the electronic speech signal based on the received information at block 60 and/or determining that the intermediate recognition result is the final recognition result based on the received information at block 61.
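For illustration only, the following sketch suggests how blocks 58 through 61 might combine interpreter feedback with a conventional silence threshold to finalize an intermediate result; the feedback dictionary, thresholds, and function name are assumptions made for this example, not features of any particular embodiment.

    def analyze_with_feedback(intermediate_result, nlu_feedback,
                              trailing_silence_ms, silence_threshold_ms=800):
        """Decide whether the intermediate result should be treated as final.

        nlu_feedback is assumed to be a dict such as {"request_complete": True}
        returned by the language interpreter for the intermediate result.
        """
        if nlu_feedback.get("request_complete") and trailing_silence_ms >= 200:
            # Semantic evidence plus a short pause: declare the endpoint early.
            return True, intermediate_result
        if trailing_silence_ms >= silence_threshold_ms:
            # Fall back to the conventional fixed-silence endpoint.
            return True, intermediate_result
        return False, None

    print(analyze_with_feedback("set a timer for five minutes",
                                {"request_complete": True}, trailing_silence_ms=250))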
Embodiments of the method 50 may be implemented in a speech recognition system or speech decoder apparatus such as, for example, those described herein. More particularly, hardware implementations of the method 50 may include configurable logic such as, for example, PLAs, FPGAs, CPLDs, or in fixed-functionality logic hardware using circuit technology such as, for example, ASIC, CMOS, or TTL technology, or any combination thereof. Alternatively, or additionally, the method 50 may be implemented in one or more modules as a set of logic instructions stored in a machine- or computer-readable storage medium such as RAM, ROM, PROM, firmware, flash memory, etc., to be executed by a processor or computing device. For example, computer program code to carry out the operations of the components may be written in any combination of one or more operating system applicable/appropriate programming languages, including an object oriented programming language such as JAVA, Python, SMALLTALK, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. For example, the method 50 may be implemented on a computer readable medium as described in connection with Examples 30 to 34 below.
Machines with a spoken human machine interface (e.g. wearable devices, home automation, personal assistants) may have to determine whether a user completed his/her request or whether the user is still speaking. If the machine waits too long after the user input, the latency has a negative impact on the user experience. If the machine reacts too fast, it may interrupt a user or may misunderstand a user by evaluating incomplete requests. Both cases may result in a bad user experience.
A spoken dialog system may comprise multiple modules including, for example, speech decoding, end of speech detection, and NLU. Each module may introduce a latency that may adversely affect the user experience. Advantageously, some embodiments may provide latency reduction for natural language understanding of speech. For example, some embodiments may utilize speculative execution in an NLU to reduce latency without compromising accuracy.
For example, some embodiments may reduce or minimize the overall latency by interleaving automatic speech recognition (ASR) and NLU processing using streamlined and speculative computation. The ASR may include speech decoding and endpoint detection. An asynchronous or parallel processing schedule and a semantic aware end of speech detection may be utilized so that the NLU continuously or periodically evaluates each speech recognition hypothesis. The result of the evaluation may be held by the NLU or discarded depending on the result of the end of speech detection for the current hypothesis. Another advantage of interleaving ASR and NLU may include having the end of speech detection make use of intermediate NLU results, which may further increase the recognition and/or endpoint accuracy.
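One possible asynchronous schedule is sketched below using Python's asyncio, with the ASR producing intermediate hypotheses on a queue while the NLU speculatively interprets them; the specific hypotheses, timing, and intent values are illustrative assumptions only.

    import asyncio

    async def asr(queue):
        # Illustrative stream of intermediate hypotheses ending with a final result.
        hypotheses = ["turn", "turn on", "turn on the lights"]
        for hyp in hypotheses:
            await queue.put({"text": hyp, "final": False})
            await asyncio.sleep(0.5)            # decoding continues meanwhile
        await queue.put({"text": "turn on the lights", "final": True})

    async def nlu(queue):
        cache = {}
        while True:
            msg = await queue.get()
            text = msg["text"]
            if text not in cache:
                cache[text] = {"intent": "lights_on"}   # speculative interpretation
            if msg["final"]:
                return cache[text]                      # already computed: no extra lag

    async def main():
        queue = asyncio.Queue()
        asr_task = asyncio.create_task(asr(queue))
        result = await nlu(queue)                       # runs concurrently with the ASR
        await asr_task
        print(result)

    asyncio.run(main())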
In some spoken dialog systems, the electronic speech signal is decoded until the end of speech is detected. The speech recognition result is then processed by the NLU. The end of speech detection is non-semantic-aware. End of speech detection is accomplished by checking whether there was a fixed amount of silence or whether the best recognition hypothesis has not changed for a fixed amount of time. This processing schedule may increase latency and cause compute spikes. Attempts to reduce latency in the individual modules may also reduce accuracy.
Advantageously, some embodiments may provide a processing schedule where the time until end of speech is detected may be used to speculatively compute the NLU. This may allow accuracy-improved or optimized speech decoding (e.g. use of less aggressive pruning of the lattice re-scoring or use of a bigger search space). The NLU computation may be streamlined and speculative (e.g. the computed NLU may or may not correspond to the final recognition result). This approach may reduce compute spikes and the NLU results may be available without lag (e.g. when the speculatively computed NLU matches the final recognition result). Moreover, the end of speech detection may be enhanced by using semantic information from intermediate NLU results (e.g. reduced or minimal risk of processing a truncated electronic speech signal). Some embodiments may simplify the calibration/optimization process for different hardware and speech use-cases. Advantageously, the reduced latency without compromising recognition accuracy may significantly improve the user experience.
Turning now to
An endpoint detector 75 may be coupled to the decoder 74 to determine whether the user has finished their request. The recognition result from the decoder 74 may be provided to a language interpreter/execution unit 76 to process the user request and make an appropriate response (e.g. via the loudspeaker 68 and/or the display 69). Advantageously, the endpoint detector 75 may be configured to improve the response time of the system 63 and also to reduce the number of interruptions by utilizing an adaptive and/or context aware time threshold for endpoint detection. In a conventional system, the recognition result is handed to the NLU only after an endpoint is detected. Advantageously, some embodiments of the decoder 74 may pass the current best recognition hypothesis to the language interpreter 76 before the endpoint is detected. For example, an intermediate hypothesis may be transferred continuously, regularly, or periodically from the decoder 74 to the language interpreter 76, either at regular intervals (e.g. every 500 ms) or whenever the best recognition result changes.
For example, the language interpreter/execution unit 76 may provide an ASR module for the HMI to turn the speech into text form. The semantic content of the text may then be extracted to execute a command or to form an appropriate response. For example, the language interpreter may extract the user intent from the recognition result. In some embodiments, the language interpreter 76 may interpret an intermediate result and store an associated result (e.g. in some form of cache, memory, or register) without processing it further or providing a response. For example, the memory/cache may only contain one entry which stores the latest computation. Alternatively, the memory/cache may contain multiple entries to store multiple computations for multiple intermediate results.
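For illustration, a small bounded cache such as the following might hold either a single latest computation or several entries; the IntentCache name and its eviction policy are assumptions made only for this sketch.

    from collections import OrderedDict

    class IntentCache:
        """Illustrative bounded cache; max_entries=1 keeps only the latest
        speculative computation, larger values keep several intermediate results."""

        def __init__(self, max_entries=1):
            self.max_entries = max_entries
            self._entries = OrderedDict()

        def put(self, text, intent):
            self._entries[text] = intent
            self._entries.move_to_end(text)
            while len(self._entries) > self.max_entries:
                self._entries.popitem(last=False)    # evict the oldest entry

        def get(self, text):
            return self._entries.get(text)

    cache = IntentCache(max_entries=1)
    cache.put("turn on", {"intent": "lights_on"})
    cache.put("turn on the lights", {"intent": "lights_on"})
    print(cache.get("turn on"))                      # None: overwritten by the newer entry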
If the decoder 74 sends a new recognition hypothesis, the language interpreter 76 may check whether it has the intent for that result cached, or extract the user intent as needed. If a new intent is extracted, the extracted intent may then be cached (e.g. potentially overwriting a previously cached intent depending on how many results the system can store). When the endpoint detector 75 detects an end of speech, the decoder 74 sends its final result to the language interpreter 76 and signals that an endpoint was detected. The language interpreter 76 may then check whether it has the intent of that result stored based on a comparison of the final result to any stored results. If so, the language interpreter 76 may execute the action corresponding to the stored intent. If the recognition result is not stored, the language interpreter 76 may first extract the intent of the final result and then execute it.
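The overall interaction just described might be expressed, purely for illustration, as in the following sketch in which the decoder calls a single entry point for both intermediate and final results; the function names, the placeholder intent extraction, and the example utterances are assumptions made for this example.

    # Illustrative end-to-end flow; extract_intent, execute and the message
    # sequence are assumptions made for the sake of the example.

    def extract_intent(text):
        return {"action": "lights_on"} if "light" in text else {"action": "unknown"}

    def execute(intent):
        print("executing", intent)

    intent_cache = {}

    def on_recognition_result(text, is_final):
        """Called by the decoder for every new hypothesis and for the final result."""
        if not is_final:
            if text not in intent_cache:
                intent_cache[text] = extract_intent(text)   # speculative pre-processing
            return None
        # Endpoint detected: reuse the cached intent if the final result was seen before.
        intent = intent_cache.get(text) or extract_intent(text)
        execute(intent)
        return intent

    on_recognition_result("turn on", is_final=False)
    on_recognition_result("turn on the light", is_final=False)
    on_recognition_result("turn on the light", is_final=True)   # served from the cache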
In some embodiments, the ASR module may include a WFST decoder and endpoint detector that provide semantic aware endpoint detection in which intermediate results are developed. The ASR module may pass the intermediate results to the NLU module, which can begin processing them while ASR continues. If the intermediate result is the same as the final result from the ASR, the NLU saves time by using the result of the intermediate result processing along a parallel or asynchronous execution path. For example, the system may have multiple processors that support an asynchronous or parallel execution path. In some embodiments, the modules may run on separate machines or servers. For example, the ASR may run on a server (e.g. the cloud) while the NLU runs on a client. An additional advantage may be provided with bi-directional communication between the ASR and the NLU modules in that the NLU can give information to the ASR (e.g. the NLU determines that the sentence is complete after examining the intermediate results). The NLU may get the intermediate results and give the ASR information back which the ASR can use to improve or optimize the endpoint detection and/or decoding.
In some embodiments, the ASR may transmit a new intermediate result whenever the current hypothesis changes. Some embodiments may alternatively or additionally utilize a timer to control when an intermediate result is sent from the ASR to the NLU. For example, the ASR may transmit an intermediate result if both the hypothesis changed and at least 500 ms have passed. Using the timer may avoid sending too many intermediate results to the NLU. In some embodiments, the time interval may correspond to an amount of time needed by the NLU to perform its interpretation.
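A minimal sketch of such gating, assuming a hypothetical IntermediateResultGate class, is shown below; an intermediate result is forwarded only if the hypothesis changed and at least the minimum interval (e.g. 500 ms) has elapsed since the last transmission.

    import time

    class IntermediateResultGate:
        """Illustrative gate: forward an intermediate result only if the hypothesis
        changed and at least min_interval_s elapsed since the last transmission."""

        def __init__(self, min_interval_s=0.5):
            self.min_interval_s = min_interval_s
            self._last_text = None
            self._last_sent = float("-inf")

        def should_send(self, text, now=None):
            now = time.monotonic() if now is None else now
            if text != self._last_text and now - self._last_sent >= self.min_interval_s:
                self._last_text = text
                self._last_sent = now
                return True
            return False

    gate = IntermediateResultGate()
    print(gate.should_send("turn", now=0.0))       # True: first hypothesis
    print(gate.should_send("turn on", now=0.3))    # False: only 300 ms elapsed
    print(gate.should_send("turn on", now=0.6))    # True: changed and 600 ms elapsed
    print(gate.should_send("turn on", now=1.2))    # False: hypothesis unchanged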
In some embodiments, the ASR may produce a single best hypothesis or n-best hypotheses. For example, the ASR can produce more than one hypothesis. If the user doesn't speak clearly, for example, the ASR may have trouble distinguishing “I can” from “I can't”. The ASR may deliver both results to the NLU and the NLU can process both to make a further determination. The ASR may return N possible answers, where N is greater than or equal to one. Intermediate results may generally be provided to the NLU one at a time, but the final result may include multiple possibilities. If any of the n-best results correspond to the cached result(s), the NLU can advantageously skip the work for those results.
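For illustration only, the following sketch shows an NLU reusing a cached interpretation for any of the n-best final results that match a previously processed intermediate result; the placeholder interpret function and the example utterances are assumptions made for this sketch.

    def interpret(text):
        # Placeholder for the expensive NLU step.
        return {"intent": "affirmative" if "can't" not in text else "negative"}

    def process_n_best(final_results, cache):
        """Interpret each of the n-best final results, skipping any result whose
        interpretation was already computed for a matching intermediate result."""
        interpretations = []
        for text in final_results:
            if text in cache:
                interpretations.append(cache[text])      # reuse speculative work
            else:
                cache[text] = interpret(text)
                interpretations.append(cache[text])
        return interpretations

    cache = {"i can come tomorrow": {"intent": "affirmative"}}   # pre-processed earlier
    print(process_n_best(["i can come tomorrow", "i can't come tomorrow"], cache))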
Turning now to
Turning now to
Turning now to
Turning now to
If the result is marked as final at block 133, the NLU may determine if an intent for the result is cached at block 139. If the intent is cached at block 139, then the NLU may load the cached intent at block 140, execute a command based on the user intent at block 141, and the NLU processing may end at block 142. If the intent for the recognition result is not already cached at block 139, the NLU may compute/extract the user intent at block 143, execute a command based on the user intent at block 141, and the NLU processing may end at block 142.
Example 1 may include a speech recognition system, comprising a speech converter to convert speech from a user into an electronic signal, a feature extractor communicatively coupled to the speech converter to extract speech features from the electronic signal, a score converter communicatively coupled to the feature extractor to convert the speech features into scores of phonetic units, a speech decoder communicatively coupled to the score converter to decode a phrase spoken by the user based on the scores, an endpoint detector communicatively coupled to the speech decoder to determine if the decoded phrase corresponds to a complete request, and a language interpreter communicatively coupled to the speech decoder to interpret the complete request from the user, wherein the speech decoder is further to determine an intermediate recognition result for the decoded phrase and provide the intermediate recognition result to the language interpreter, and the language interpreter is further to asynchronously interpret the intermediate recognition result from the speech decoder.
Example 2 may include the system of Example 1, wherein the language interpreter is further to determine decode information based on the interpretation of the intermediate recognition result, and wherein the speech decoder is further to decode the electronic speech signal based on the decode information from the language interpreter.
Example 3 may include the system of any of Examples 1 to 2, wherein the language interpreter is further to store an interpretation result based on the intermediate recognition result, receive an indication from the speech decoder that the request is complete, compare the complete request to the intermediate recognition result, and retrieve the stored interpretation result if the complete request matches the intermediate recognition result.
Example 4 may include a language interpreter apparatus, comprising a language analyzer to analyze an intermediate recognition result of an electronic speech signal, and a memory to store a language interpretation result of the analysis of the intermediate recognition result, wherein the language analyzer is further to receive a final recognition result of the electronic speech signal, compare the final recognition result to the intermediate recognition result, and retrieve the language interpretation result of the analysis corresponding to the intermediate recognition result if the final recognition result matches the intermediate recognition result.
Example 5 may include the apparatus of Example 4, wherein the language analyzer is further to provide decode information based on the results of the analysis of the intermediate recognition result.
Example 6 may include the apparatus of Example 4, wherein the language analyzer is further to provide speech endpoint information based on the results of the analysis of the intermediate recognition result.
Example 7 may include the apparatus of any of Examples 4 to 6, wherein the language analyzer is further to store one or more language interpretation results of analysis corresponding to one or more intermediate recognition results of the electronic speech signal, receive two or more final recognition results of the electronic speech signal, compare each of the final recognition results to the intermediate recognition results, and retrieve each language interpretation result of the analysis which corresponds to one of the intermediate recognition results matching one of the final recognition results.
Example 8 may include a method of interpreting language, comprising analyzing an intermediate recognition result of an electronic speech signal, storing a language interpretation result of the analysis of the intermediate recognition result, receiving a final recognition result of the electronic speech signal, comparing the final recognition result to the intermediate recognition result, and retrieving the language interpretation result of the analysis corresponding to the intermediate recognition result if the final recognition result matches the intermediate recognition result.
Example 9 may include the method of Example 8, further comprising providing decode information based on the results of the analysis of the intermediate recognition result.
Example 10 may include the method of Example 8, further comprising providing speech endpoint information based on the results of the analysis of the intermediate recognition result.
Example 11 may include the method of any of Examples 8 to 10, further comprising storing one or more language interpretation results of analysis corresponding to one or more intermediate recognition results of the electronic speech signal, receiving two or more final recognition results of the electronic speech signal, comparing each of the final recognition results to the intermediate recognition results, and retrieving each language interpretation result of the analysis which corresponds to one of the intermediate recognition results matching one of the final recognition results.
Example 12 may include a speech decoder apparatus, comprising a speech analyzer to analyze an electronic speech signal to determine an intermediate recognition result of the electronic speech signal, and a language interpreter interface communicatively coupled to the speech analyzer to provide the intermediate recognition result to a language interpreter for language interpretation, wherein the speech analyzer is further to determine if the intermediate recognition result is a final recognition result of the electronic speech signal, and continue analysis of the electronic speech signal until the intermediate recognition result is determined to be the final recognition result.
Example 13 may include the apparatus of Example 12, wherein the speech analyzer is further to determine a new intermediate recognition result, and wherein the language interpreter interface is further to provide the new intermediate recognition result to the language interpreter for language interpretation.
Example 14 may include the apparatus of any of Examples 12 to 13, wherein the language interpreter interface is further to receive information related to language interpretation of the intermediate result, and wherein the speech analyzer is further to analyze the electronic speech signal based on the received information.
Example 15 may include the apparatus of Example 14, wherein the speech analyzer is further to determine an endpoint of the electronic speech signal based on the received information.
Example 16 may include the apparatus of Example 14, wherein the speech analyzer is further to determine that the intermediate recognition result is the final recognition result based on the received information.
Example 17 may include a method of decoding speech, comprising analyzing an electronic speech signal, determining an intermediate recognition result of the electronic speech signal, providing the intermediate recognition result for language interpretation, determining if the intermediate recognition result is a final recognition result of the electronic speech signal, and continuing analysis of the electronic speech signal until the intermediate recognition result is determined to be the final recognition result.
Example 18 may include the method of Example 17, further comprising determining a new intermediate recognition result, and providing the new intermediate recognition result for language interpretation.
Example 19 may include the method of any of Examples 17 to 18, further comprising receiving information related to language interpretation of the intermediate result, and analyzing the electronic speech signal based on the received information.
Example 20 may include the method of Example 19, further comprising determining an endpoint of the electronic speech signal based on the received information.
Example 21 may include the method of Example 19, further comprising determining that the intermediate recognition result is the final recognition result based on the received information.
Example 22 may include at least one computer readable medium, comprising a set of instructions, which when executed by a computing device, cause the computing device to analyze an intermediate recognition result of an electronic speech signal, store a language interpretation result of the analysis of the intermediate recognition result, receive a final recognition result of the electronic speech signal, compare the final recognition result to the intermediate recognition result, and retrieve the language interpretation result of the analysis corresponding to the intermediate recognition result if the final recognition result matches the intermediate recognition result.
Example 23 may include the at least one computer readable medium of Example 22, comprising a further set of instructions, which when executed by a computing device, cause the computing device to provide decode information based on the results of the analysis of the intermediate recognition result.
Example 24 may include the at least one computer readable medium of Example 22, comprising a further set of instructions, which when executed by a computing device, cause the computing device to provide speech endpoint information based on the results of the analysis of the intermediate recognition result.
Example 25 may include the at least one computer readable medium of any of Examples 22 to 24, comprising a further set of instructions, which when executed by a computing device, cause the computing device to store one or more language interpretation results of analysis corresponding to one or more intermediate recognition results of the electronic speech signal, receive two or more final recognition results of the electronic speech signal, compare each of the final recognition results to the intermediate recognition results, and retrieve each language interpretation result of the analysis which corresponds to one of the intermediate recognition results matching one of the final recognition results.
Example 26 may include a language interpreter apparatus, comprising means for analyzing an intermediate recognition result of an electronic speech signal, means for storing a language interpretation result of the analysis of the intermediate recognition result, means for receiving a final recognition result of the electronic speech signal, means for comparing the final recognition result to the intermediate recognition result, and means for retrieving the language interpretation result of the analysis corresponding to the intermediate recognition result if the final recognition result matches the intermediate recognition result.
Example 27 may include the apparatus of Example 26, further comprising means for providing decode information based on the results of the analysis of the intermediate recognition result.
Example 28 may include the apparatus of Example 26, further comprising means for providing speech endpoint information based on the results of the analysis of the intermediate recognition result.
Example 29 may include the apparatus of any of Examples 26 to 28, further comprising means for storing one or more language interpretation results of analysis corresponding to one or more intermediate recognition results of the electronic speech signal, means for receiving two or more final recognition results of the electronic speech signal, means for comparing each of the final recognition results to the intermediate recognition results, and means for retrieving each language interpretation result of the analysis which corresponds to one of the intermediate recognition results matching one of the final recognition results.
Example 30 may include at least one computer readable medium, comprising a set of instructions, which when executed by a computing device, cause the computing device to analyze an electronic speech signal, determine an intermediate recognition result of the electronic speech signal, provide the intermediate recognition result for language interpretation, determine if the intermediate recognition result is a final recognition result of the electronic speech signal, and continue analysis of the electronic speech signal until the intermediate recognition result is determined to be the final recognition result.
Example 31 may include the at least one computer readable medium of Example 30, comprising a further set of instructions, which when executed by a computing device, cause the computing device to determine a new intermediate recognition result, and provide the new intermediate recognition result for language interpretation.
Example 32 may include the at least one computer readable medium of any of Examples 30 to 31, comprising a further set of instructions, which when executed by a computing device, cause the computing device to receive information related to language interpretation of the intermediate result, and analyze the electronic speech signal based on the received information.
Example 33 may include the at least one computer readable medium of Example 32, comprising a further set of instructions, which when executed by a computing device, cause the computing device to determine an endpoint of the electronic speech signal based on the received information.
Example 34 may include the at least one computer readable medium of Example 32, comprising a further set of instructions, which when executed by a computing device, cause the computing device to determine that the intermediate recognition result is the final recognition result based on the received information.
Example 35 may include a speech decoder apparatus, comprising means for analyzing an electronic speech signal, means for determining an intermediate recognition result of the electronic speech signal, means for providing the intermediate recognition result for language interpretation, means for determining if the intermediate recognition result is a final recognition result of the electronic speech signal, and means for continuing analysis of the electronic speech signal until the intermediate recognition result is determined to be the final recognition result.
Example 36 may include the apparatus of Example 35, further comprising means for determining a new intermediate recognition result, and means for providing the new intermediate recognition result for language interpretation.
Example 37 may include the apparatus of any of Examples 35 to 36, further comprising means for receiving information related to language interpretation of the intermediate result, and means for analyzing the electronic speech signal based on the received information.
Example 38 may include the apparatus of Example 37, further comprising means for determining an endpoint of the electronic speech signal based on the received information.
Example 39 may include the apparatus of Example 37, further comprising means for determining that the intermediate recognition result is the final recognition result based on the received information.
Embodiments are applicable for use with all types of semiconductor integrated circuit (“IC”) chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, systems on chip (SoCs), SSD/NAND controller ASICs, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some may be different, to indicate more constituent signal paths, have a number label, to indicate a number of constituent signal paths, and/or have arrows at one or more ends, to indicate primary information flow direction. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit. Any represented signal lines, whether or not having additional information, may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
Example sizes/models/values/ranges may have been given, although embodiments are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the platform within which the embodiment is to be implemented, i.e., such specifics should be well within purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments, it should be apparent to one skilled in the art that embodiments can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.
The term “coupled” may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections. In addition, the terms “first”, “second”, etc. may be used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
As used in this application and in the claims, a list of items joined by the term “one or more of” may mean any combination of the listed terms. For example, the phrases “one or more of A, B or C” may mean A; B; C; A and B; A and C; B and C; or A, B and C.
Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments can be implemented in a variety of forms. Therefore, while the embodiments have been described in connection with particular examples thereof, the true scope of the embodiments should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.