The disclosure relates to the field of automotive speech recognition systems, and, more particularly, to the optimization of automotive speech recognition systems utilizing multiple speech agents.
It has become common for complex software-based platforms (e.g., cell phones, in-vehicle infotainment systems, cloud agents, etc.) to aggregate information sources from multiple agents, such as navigation agents, search agents (local and cloud-based), OS-specific applications, and Bluetooth profiles for hands-free telephone operation. Often, interaction with these multiple agents is via speech input. Each of these speech agents may be trained to optimally recognize speech input based on a clean input signal optimized for signal-to-noise performance, freedom from echoes, discrimination between the intended speaker and other background speech, etc. Additionally, each speech recognition engine expects the input to match the spectral characteristics of the speech training base used to create that particular speech agent. Improper alignment of any of these parameters results in a reduction of recognition accuracy. Because an application and/or system is traditionally built around a single speech agent and acoustic system, an application environment involving multiple speech agents will, at best, have parametric mismatches resulting in less than optimal performance.
The present invention may provide a spectral matching function that is specific to each speech agent and that is invoked by the system application as each speech engine or agent is called upon for interaction. Optimizing the spectral content for the invoked speech agent may improve the recognition rate for that agent.
In one embodiment, the invention comprises an automotive speech input optimization method, including using a microphone to convert audible speech into an audio signal. A selection of a speech agent is received. Spectral matching is performed on the audio signal to produce a conditioned audio signal. The spectral matching is dependent upon the selection of the speech agent. The conditioned audio signal is input to the selected speech agent.
In another embodiment, the invention comprises an automotive speech input optimization arrangement including a microphone converting audible speech into an audio signal. A processing device is communicatively coupled to the microphone and receives a selection of a speech agent. The processing device performs spectral matching on the audio signal to produce a conditioned audio signal. The spectral matching is dependent upon the selection of the speech agent. The processing device transmits the conditioned audio signal to the selected speech agent.
In yet another embodiment, the invention comprises an automotive speech input optimization method, including using a microphone to convert audible speech into an audio signal. Signal conditioning, spatial filtering, echo cancellation and noise reduction are performed on the audio signal. A selection of a speech agent is received. Spectral matching is performed on the audio signal to produce a conditioned audio signal. The spectral matching is based on the selection of the speech agent. The conditioned audio signal is input to the selected speech agent.
A better understanding of the present invention will be had upon reference to the following description in conjunction with the accompanying drawings.
In block 114, spectral matching is performed on the audio signals, wherein the spectral matching is tailored for the particular speech agent that is to receive and operate on the audio signals. In the example embodiment shown, block 114 is capable of performing different spectral matching for each of five corresponding speech agents, including Siri, Google, Nuance, Scan Speak and Watson. As indicated at 116, the speech agent is selected by an application, and the selection is received by block 114. As indicated at 118, after the speech agent-specific spectral matching has been performed in block 114, the conditioned audio signals are input to the selected speech agent.
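By way of illustration only, the agent-specific spectral matching of block 114 might be sketched as a lookup of a per-agent gain profile followed by frequency-domain shaping of one audio frame. The disclosure does not specify the matching algorithm, so everything in this sketch is an assumption: the agent names `agent_a` and `agent_b`, the band edges and gains in `AGENT_PROFILES`, and the functions `band_gain` and `spectral_match` are hypothetical, and a direct DFT is used only to keep the example self-contained.

```python
import cmath
import math

# Hypothetical per-agent profiles: (upper band edge in Hz, linear gain).
# The numbers are illustrative placeholders, not measured characteristics
# of any real speech agent.
AGENT_PROFILES = {
    "agent_a": [(300.0, 0.5), (3400.0, 1.0), (8000.0, 0.8)],
    "agent_b": [(300.0, 1.0), (3400.0, 1.2), (8000.0, 1.0)],
}


def band_gain(profile, freq_hz):
    """Return the gain of the first band whose upper edge covers freq_hz."""
    for upper_edge, gain in profile:
        if freq_hz <= upper_edge:
            return gain
    # Frequencies above the last edge reuse the last band's gain.
    return profile[-1][1]


def spectral_match(frame, sample_rate, agent):
    """Shape the spectrum of one audio frame for the selected agent."""
    profile = AGENT_PROFILES[agent]
    n = len(frame)

    # Forward DFT (direct O(n^2) form; adequate for short frames).
    spectrum = [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

    # Bins k and n - k represent the same physical frequency, so both get
    # the same real gain; this keeps the reconstructed signal real-valued.
    shaped = [
        spectrum[k] * band_gain(profile, min(k, n - k) * sample_rate / n)
        for k in range(n)
    ]

    # Inverse DFT back to the time domain.
    return [
        (sum(shaped[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

In such a sketch, the selection indicated at 116 would simply be the `agent` argument, and the returned frame would correspond to the conditioned audio signal indicated at 118. A production system would more likely use an FFT or a bank of time-domain filters, and its profiles would be derived from the training data of each agent.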
In a first step 202, a microphone is used to convert audible speech into an audio signal. For example, microphones 102a-b may pick up audible speech within a vehicle passenger compartment and convert the speech into audio signals 104a-b, respectively.
In a next step 204, a selection of a speech agent is received. For example, as indicated at 116, a speech agent, such as Siri, Google, Nuance, Scan Speak or Watson, may be selected by a computer application, and the selection may be received by block 114.
Next, in step 206, spectral matching is performed on the audio signal to produce a conditioned audio signal. The spectral matching is dependent upon the selection of the speech agent. For example, speech agent-specific spectral matching may be performed in block 114.
In a final step 208, the conditioned audio signal is input to the selected speech agent. For example, as indicated at 118, after the speech agent-specific spectral matching has been performed in block 114, the conditioned audio signals are input to the selected speech agent.
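Steps 204 through 208 can be sketched in code as a small dispatch routine, assuming the audio signal of step 202 has already been captured as a list of samples. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names `MATCHERS`, `pre_emphasis`, `RecordingAgent` and `process_utterance` are hypothetical, and a first-order pre-emphasis filter stands in for whatever spectral matching a given agent would actually require.

```python
from typing import Callable, Dict, List

Frame = List[float]


def pre_emphasis(coeff: float) -> Callable[[Frame], Frame]:
    """Build a first-order filter y[t] = x[t] - coeff * x[t-1].

    Used here only as a stand-in for agent-specific spectral matching.
    """
    def apply(frame: Frame) -> Frame:
        out = []
        prev = 0.0
        for x in frame:
            out.append(x - coeff * prev)
            prev = x
        return out
    return apply


# Hypothetical registry mapping each agent to its matching function.
MATCHERS: Dict[str, Callable[[Frame], Frame]] = {
    "agent_a": pre_emphasis(0.97),  # boosts high frequencies
    "agent_b": pre_emphasis(0.0),   # passes the signal unchanged
}


class RecordingAgent:
    """Placeholder speech agent that records the audio it is given."""

    def __init__(self) -> None:
        self.received: List[Frame] = []

    def recognize(self, frame: Frame) -> None:
        self.received.append(frame)


def process_utterance(audio_signal: Frame, selected_agent: str,
                      agents: Dict[str, RecordingAgent]) -> Frame:
    # Step 204: the selection arrives as `selected_agent`.
    matcher = MATCHERS[selected_agent]
    # Step 206: spectral matching dependent on the selection.
    conditioned = matcher(audio_signal)
    # Step 208: the conditioned signal is input to the selected agent only.
    agents[selected_agent].recognize(conditioned)
    return conditioned
```

Keeping the matching functions in a registry keyed by agent means that supporting an additional speech agent would require only a new registry entry, leaving the capture and dispatch logic unchanged.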
The foregoing description may refer to “motor vehicle”, “automobile”, “automotive”, or similar expressions. It is to be understood that these terms are not intended to limit the invention to any particular type of transportation vehicle. Rather, the invention may be applied to any type of transportation vehicle whether traveling by air, water, or ground, such as airplanes, boats, etc.
The foregoing detailed description is given primarily for clearness of understanding, and no unnecessary limitations are to be understood therefrom, for modifications can be made by those skilled in the art upon reading this disclosure without departing from the spirit of the invention.
This application claims benefit of U.S. Provisional Application No. 62/365,025, filed on Jul. 21, 2016, the disclosure of which is hereby incorporated by reference in its entirety for all purposes.
Number | Date | Country
---|---|---
62365025 | Jul 2016 | US