1. Field of the Invention
The invention relates generally to a system and method for dynamically optimizing parameters of a text-to-speech (“TTS”) system in response to automotive vehicle environmental conditions in order to maximize the intelligibility of a synthesized TTS voice.
2. Description of Related Art
Systems incorporating text-to-speech (“TTS”) engines or synthesizers coupled to a database of textual data are well known and continue to find an ever-increasing number of applications. For example, automobiles equipped with TTS and speech-recognition capabilities simplify tasks that would otherwise distract a driver from driving. The uses of TTS systems in vehicles include controlling electronic systems aboard the vehicle, such as navigation systems or audio systems, receiving critical emergency broadcasts, and placing telephone calls, among others.
Certain environmental conditions, such as vehicle speed, interior noise, lighting conditions, and weather conditions, among others, can affect a driver's concentration level and may affect the driver's ability to pay attention to and comprehend TTS voice prompts. Some existing systems attempt to compensate for certain environmental conditions while generating TTS voice prompts for automotive vehicles. For example, some systems monitor vehicle speed or other vehicle operating parameters and attempt to schedule messages for time periods when a driver will be better equipped to listen to them, such as when a driver is stopped or moving slowly. Other systems relating to vehicle navigation may use vehicle speed and estimated driver reaction times to give verbal instructions, such as “turn right,” at appropriate times.
However, none of these existing systems addresses the fact that in responding to changing environmental conditions, a driver may be required to put more concentration into driving and controlling a vehicle, leaving less mental power available for other activities, including listening to and concentrating on a TTS system. In fact, under stressful driving conditions, a driver may perceive a slower TTS voice as being normal, and a normal-speed voice as being too fast. On the other hand, under non-stressful conditions, a driver requires less mental effort, and a slower TTS voice may irritate or bore the driver. Likewise, other parameters of a TTS voice, such as pitch or volume, may also require tuning to optimize intelligibility to a driver under different stress conditions.
Moreover, none of the existing automotive TTS systems analyzes vehicle sensor data in order to apply corrections to various parameters of the TTS synthesized voice, such as voice speed, pitch, and volume, among others, to compensate for environmental conditions. In fact, most TTS systems are tuned at the factory for a single representative vehicle operating condition and cannot be dynamically optimized to compensate for changing environmental conditions. Thus, it would be useful to provide a system and method for optimizing the output voice parameters of an automotive TTS system to achieve maximum intelligibility across a wide variety of vehicle operating conditions.
An embodiment of an automotive TTS control system in accordance with the present invention comprises one or more vehicle sensors adapted to measure one or more operating states of an automotive vehicle. The vehicle sensors are coupled to a TTS engine that includes a storage module for storing a number of TTS audio parameters that affect the output voice quality of a TTS speech synthesizer. The TTS engine also includes one or more TTS tuning modules that analyze the operating states of the vehicle based on the measurements acquired from the vehicle sensors and calculate changes to the TTS audio parameters to alter the quality of the TTS synthesized voice in order to increase its intelligibility, given the measured states of the vehicle.
The measured states of the vehicle may include the vehicle speed, an interior noise level of the vehicle, a number of occupants of a vehicle and their locations within the vehicle, the lighting conditions of the environment, weather conditions that may affect driving or visibility, road roughness conditions, or any other environmental conditions that may have an effect on the concentration level of the vehicle operator. A wide variety of sensors may be employed to measure these vehicle operating states. For example, vehicle speed may be obtained from a speedometer, an odometer, an anti-lock brake sensor measuring deceleration, a global positioning sensor, or any other sensor that provides a direct or indirect indication of vehicle speed. Similarly, interior noise levels may be measured directly using a microphone, or may be inferred from sensors such as a window up/down sensor that would imply increased wind noise when a window is opened, a vehicle speed sensor that would imply increased noise at higher speeds, a vehicle suspension sensor that would indicate increased noise on a rough road, or any other type of sensor that might indicate the noise level inside a vehicle. The number of occupants of a vehicle and their locations may be measured using pressure sensors, weight sensors, or any other sensors indicating the presence of an occupant. Lighting conditions of the environment in which a vehicle is operating may be measured directly using an ambient light sensor, or could be inferred from a headlight on/off sensor, or any other sensor indicative of lighting conditions. Similarly, weather conditions may be inferred from measurements taken by a temperature sensor or from a windshield wiper on/off sensor, or any other sensor providing an indication of weather conditions. 
Road conditions may be inferred from measurements from a vehicle suspension sensor, from accelerometers indicating up/down motion of a vehicle, or from any other sensor giving an indication of road roughness. The aforementioned list of sensors is representative of those that might be employed in an embodiment of the present invention and is not intended to limit the scope of the invention. Any other type of sensor that would provide information about the state of the vehicle or its environment would also fall within the scope and spirit of the present invention.
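By way of non-limiting illustration, the direct and indirect measurement of interior noise described above might be sketched as follows. The structure SensorSnapshot, the function estimate_interior_noise_db, and every numeric coefficient are hypothetical placeholders introduced for this sketch only; they are not measured values and are not taken from any particular embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SensorSnapshot:
    """One reading of the (hypothetical) vehicle sensors used in this sketch."""
    speed_mph: float
    window_open: bool
    suspension_roughness: float          # assumed scale: 0.0 smooth .. 1.0 very rough
    microphone_db: Optional[float] = None  # direct measurement, if a microphone exists


def estimate_interior_noise_db(s: SensorSnapshot) -> float:
    """Return an interior noise estimate in dB.

    A direct microphone reading is preferred. Otherwise the level is
    inferred from indirect sensors using placeholder coefficients.
    """
    if s.microphone_db is not None:
        return s.microphone_db

    noise = 40.0                                # assumed baseline at standstill
    noise += 0.3 * max(s.speed_mph, 0.0)        # higher speed implies more noise
    if s.window_open:
        noise += 8.0                            # open window implies wind noise
    roughness = min(max(s.suspension_roughness, 0.0), 1.0)
    noise += 6.0 * roughness                    # rough road implies more noise
    return noise
```

In such a sketch, a microphone reading, when present, simply overrides the inferred value, while the indirect sensors contribute only monotonic adjustments in the directions described above.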
In an embodiment of a TTS control system in accordance with the present invention, the control system may adjust TTS synthesized voice parameters based on measurements of the state of the vehicle to increase the level of intelligibility. TTS audio parameters that may be adjusted include the TTS voice volume, the TTS voice speed, the TTS voice pitch, and the speakers to which the TTS voice is directed, among others. Other characteristics of a TTS voice, such as the gender of the voice, the language, or a particular regional accent may also be adjusted according to sensor inputs, such as a microphone that samples the driver's voice, or an input sensor with which the driver makes a preference selection.
An embodiment of a TTS control system also includes one or more TTS tuning modules responsible for relating measurements of the vehicle state to audio parameters of the TTS voice synthesizer in a way that compensates for the vehicle state to increase intelligibility. In one embodiment of a TTS tuning module in accordance with the present invention, values of measured vehicle states are divided into bins. For example, the state of the vehicle speed may be divided into bins corresponding to zero-to-thirty miles per hour, thirty-to-sixty miles per hour, and greater-than-sixty miles per hour, or any other division corresponding to a desired level of granularity. For each of the vehicle state bins, a value for each of the TTS audio parameters is provided. Based on the bin in which a measured vehicle state lies, appropriate TTS audio parameters are assigned.
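By way of non-limiting illustration, the bin-based tuning module described above might be sketched as follows, using the three exemplary speed bins. The names TtsParams, SPEED_BIN_EDGES_MPH, BIN_PARAMS, and params_for_speed, the parameter values assigned to each bin, and the treatment of the exact boundary speeds are all assumptions made for this sketch.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsParams:
    """Hypothetical set of TTS audio parameters for one vehicle state bin."""
    rate: float       # relative voice speed, 1.0 = nominal
    volume_db: float  # gain offset in dB relative to nominal
    pitch: float      # relative pitch, 1.0 = nominal


# Bin boundaries in miles per hour: 0-30, 30-60, and greater than 60.
# Assumption: a speed exactly on a boundary falls in the lower bin.
SPEED_BIN_EDGES_MPH = [30.0, 60.0]

# One parameter set per bin. The values are illustrative placeholders only.
BIN_PARAMS = [
    TtsParams(rate=1.00, volume_db=0.0, pitch=1.00),  # 0 to 30 mph
    TtsParams(rate=0.95, volume_db=3.0, pitch=1.00),  # 30 to 60 mph
    TtsParams(rate=0.90, volume_db=6.0, pitch=1.05),  # greater than 60 mph
]


def params_for_speed(speed_mph: float) -> TtsParams:
    """Return the parameter set of the bin in which the measured speed lies."""
    if speed_mph < 0:
        raise ValueError("speed_mph must be non-negative")
    return BIN_PARAMS[bisect_left(SPEED_BIN_EDGES_MPH, speed_mph)]
```

Under these assumed values, a vehicle traveling at thirty-five miles per hour would receive the parameters of the middle bin, and a vehicle traveling at seventy miles per hour would receive a slower and louder voice from the highest bin.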
In another embodiment of a TTS tuning module, values of TTS audio parameters are assigned based on a piecewise-linear function relating audio parameters to values of measured vehicle states. In still another embodiment of a TTS tuning module, values of TTS audio parameters are related to vehicle state parameters by a continuous function. Other methods of associating audio parameters with measured vehicle states are also possible and would lie within the scope and spirit of the present invention. Additional embodiments and advantages of the present invention will be clear to those skilled in the art by examination of the following detailed description and attached sheets of drawings that will first be described briefly.
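By way of non-limiting illustration, the piecewise-linear embodiment mentioned above might be sketched as follows for a single audio parameter, the relative voice speed, as a function of vehicle speed. The breakpoint table RATE_BREAKPOINTS and the function piecewise_linear are hypothetical, and the breakpoint values are placeholders rather than tuned data.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical breakpoints: (vehicle speed in mph, relative TTS voice speed).
# The x values must be strictly increasing.
RATE_BREAKPOINTS: List[Tuple[float, float]] = [
    (0.0, 1.00),
    (30.0, 1.00),
    (60.0, 0.90),
    (90.0, 0.80),
]


def piecewise_linear(x: float, points: List[Tuple[float, float]]) -> float:
    """Linearly interpolate between breakpoints, clamping outside their range."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)          # now xs[i - 1] <= x < xs[i]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

Compared with the bin-based approach, such a function avoids abrupt changes in the synthesized voice as a measured vehicle state crosses a bin boundary, at the cost of requiring breakpoints to be chosen for each audio parameter.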
A text-to-speech (TTS) system for automotive vehicles is presented that allows for optimal tuning of TTS speech parameters to maximize intelligibility across a wide variety of driving conditions. A typical driving scenario is depicted in
The vehicle 104 then accelerates 112 to seventy miles per hour, and the driver again uses the TTS system during interval 108. The increased speed of the vehicle may increase the driver's stress and concentration levels and also increase engine, road, and wind noise. Accordingly, the speed of the synthesized TTS voice may be reduced to take into account the increased stress level of the driver, and the volume may be increased to compensate for the increased interior noise level due to the increased speed.
Finally, the vehicle decelerates 114 back down to thirty-five miles per hour, and during time interval 110, the driver again uses the TTS system. Because the driver's stress level may be lower due to the decreased speed, the speed of the synthesized voice may be increased, and because of the reduced noise levels, the volume may be decreased. Thus, the synthesized TTS voice is played with optimized parameters for each set of environmental conditions to maximize intelligibility while minimizing driver frustration.
The example presented in
The above-described sensors are depicted as exemplary vehicle sensors only and are not intended to limit the scope of the invention. Many other vehicle operating parameter sensors are possible, including electrical system sensors, fuel sensors, and weight sensors, among others, and all could be used as inputs to a TTS control engine 202 in accordance with the present invention.
The TTS control engine 202 operates on the received sensor data to determine the current state of the vehicle. It then compares this state to a number of models stored in the control unit to estimate a human reaction to the perceived vehicle state, and applies countermeasures to the TTS speech synthesis operation to compensate. For example, as a driver speeds up, the amount of concentration required to control the vehicle increases. When the driver then listens to an audio stream, the brain must concentrate on two tasks at once, and this becomes more difficult as more concentration is required to control the vehicle. When the audio stream is slowed down, the driver is better able to comprehend it while maintaining a consistent concentration level on the driving task, and in most cases, the driver will not even notice that the audio stream has been slowed down. Similarly, a driver's receptiveness to volume and pitch levels and other characteristics of synthesized TTS voices may also be affected by the driver's concentration level or stress level under different vehicle operating conditions. Directing a TTS voice preferentially to certain speakers within an automotive vehicle may also improve intelligibility in certain situations. For example, driving directions indicating a right turn that are played from the right speakers and directions indicating a left turn that are played from the left speakers may be more intelligible to the driver. Adjustments of any characteristics of a TTS voice to improve intelligibility given the operating environment of the vehicle would fall within the scope of the present invention.
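By way of non-limiting illustration, the preferential direction of a TTS voice to certain speakers might be sketched as follows for a simple left/right speaker arrangement. The enumeration Turn, the function speaker_gains, and the parameter off_side_gain are hypothetical names introduced for this sketch, and a two-channel layout is assumed for simplicity.

```python
from enum import Enum
from typing import Dict


class Turn(Enum):
    """Direction indicated by a navigation prompt (hypothetical type)."""
    LEFT = "left"
    RIGHT = "right"
    NONE = "none"


def speaker_gains(turn: Turn, off_side_gain: float = 0.0) -> Dict[str, float]:
    """Return per-side gains (0.0 to 1.0) for a TTS prompt.

    The side matching the turn direction plays at full gain. The opposite
    side plays at off_side_gain, an assumed tuning value. Prompts with no
    direction are played equally from both sides.
    """
    if not 0.0 <= off_side_gain <= 1.0:
        raise ValueError("off_side_gain must be between 0.0 and 1.0")
    if turn is Turn.LEFT:
        return {"left": 1.0, "right": off_side_gain}
    if turn is Turn.RIGHT:
        return {"left": off_side_gain, "right": 1.0}
    return {"left": 1.0, "right": 1.0}
```

In such a sketch, a prompt such as “turn right” would be routed with speaker_gains(Turn.RIGHT), while a non-directional prompt would use Turn.NONE and play from all speakers.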
The TTS tuning modules may take a variety of forms.
For simplicity,
The foregoing description has discussed several embodiments of a system for actively optimizing TTS speech synthesis to maximize intelligibility of voice prompts in an automotive vehicle application. Other embodiments and advantages of the invention may be apparent to those skilled in the art, and such would lie within the scope and spirit of the present invention. The invention is further defined by the following claims.