The present application relates generally to speech recognition systems in vehicles, such as automotive vehicles. One such system is a hands-free telephone system having a microphone and speakers mounted in the interior of a vehicle and a processing circuit which processes spoken commands from a vehicle occupant and performs telephone operations, such as making a telephone call. Speech recognition is used in this system to recognize a spoken command from a vehicle occupant to make a telephone call and to receive a telephone number via spoken words from the vehicle occupant. The processing circuit places the telephone call and provides an audio communication link between the vehicle occupant and the telephone system.
One drawback of prior hands-free telephone systems in vehicles was that the telephone system was not easily upgradeable because it was mounted integrally with the vehicle and was not made compatible with wireless telephones. Therefore, an improved hands-free telephone system has been developed which is configured to provide telephone services between a vehicle occupant and the occupant's own mobile telephone which is located in the vicinity of the vehicle (e.g., in a cradle, in the occupant's pocket or briefcase, etc.). In such a system, a telephone call is placed by the vehicle occupant through the hands-free telephone system mounted integral to the vehicle which creates a wireless communication link with the occupant's mobile phone. The mobile phone becomes a conduit between the hands-free telephone system and the public telephone network.
In such a hands-free telephone system, the speech recognition algorithms require a large amount of processing power and memory, and must be programmed to look for key words in the spoken command and carry out functions by invoking software applications. Because of physical size and cost constraints, processing power and memory are limited in such a vehicle-mounted module. Furthermore, if additional functions are to be added to the hands-free system, new applications to run the functions must be developed and implemented on the hands-free system. This requires additional processing power and memory and, in the automotive application, requires that the vehicle owner return to the service dealer to receive upgrades to the software operating on the hands-free system.
Accordingly, there is a need for a system and method of operating a speech recognition system in a vehicle which can be configured with additional applications without having to develop and distribute the additional applications onto the hands-free module. Further, there is a need for a system and method of operating a speech recognition system in a vehicle that uses context processing in a more efficient manner to assist the speech recognition engine in determining how to execute a spoken command. Further, there is a need for a system and method of operating a speech recognition system in a vehicle that enables applications to be added without reprogramming the embedded hands-free module or greatly increasing its need for memory.
The teachings hereinbelow extend to those embodiments which fall within the scope of the appended claims, regardless of whether they accomplish one or more of the above-mentioned needs.
According to one exemplary embodiment, a method of operating a speech recognition system in a vehicle comprises receiving a spoken command from a vehicle occupant and determining if the speech recognition system has an application configured to execute the spoken command. The method further comprises, based on the determining step, sending a wireless message comprising spoken command data to a remote system, receiving response data from the remote system, and performing a function based on the response data.
According to another exemplary embodiment, a method of operating a remote speech recognition server which services a vehicle-based speech recognition system comprises, at the remote speech recognition server, receiving a wireless message comprising spoken command data from the vehicle-based speech recognition system. The method further comprises applying a speech recognition function to the spoken command data, executing the spoken command with an application, and sending a wireless response message based on the executing step to the vehicle.
According to yet another exemplary embodiment, a speech recognition system in a vehicle comprises a microphone, a processing circuit, and a wireless transceiver circuit. The microphone is configured to receive a spoken command from a vehicle occupant. The processing circuit is configured to determine if the speech recognition system has an application configured to execute the spoken command. The processing circuit is further configured to generate spoken command data based on the spoken command. The wireless transceiver circuit is configured to transmit the spoken command data to a remote system and to receive response data from the remote system. The processing circuit is configured to perform a function based on the response data.
The invention will become more fully understood from the following detailed description, taken in conjunction with the accompanying drawings, wherein like reference numerals refer to like parts, and in which:
Referring to
Processing circuit 12 can include one or more analog or digital components, such as microprocessors, microcontrollers, application specific integrated circuits (ASICs), or other processing circuitry. Processing circuit 12 can include memory, including volatile and non-volatile memory for storing a computer program or other software to perform the functions described herein. Microphone 16 can include one or more microphones configured to receive a spoken command from a vehicle occupant. The spoken command can be any word that the occupant utters or provides to system 10 to cause system 10 or another system to perform a function. Speaker 18 is configured to receive audio output data from processing circuit 12 (e.g., an audible communication from another party to a telephone call, information prompts or other messages generated by processing circuit 12, etc.). Speaker 18 can be part of the vehicle radio/tape/CD/MP3 Player, or can be a dedicated speaker serving only system 10.
Wireless transceiver 14 can be a communication circuit including analog and/or digital components configured to transmit and receive wireless data in any of a variety of data transmission formats, such as a Bluetooth communications protocol, an IEEE 802.11 communications protocol, or other personal area network protocols or other wireless communications protocols or data formats.
Referring now to
In another example, system 10 will determine that it does not have an application configured to execute the spoken command. For example, the speech recognition function recognizes the word “weather” (or does not recognize any words in the spoken command). In this case, as shown at step 36, a wireless message comprising spoken command data is transmitted to a remote system, specifically remote speech recognition server 28. The spoken command data can take a variety of forms, and in one exemplary embodiment is at least a portion or all of a phoneme-based representation of the spoken command. Phonemes are phonetic units of the spoken command which can be detected by a speech recognition function. By transmitting a phoneme-based representation, which can include one or more phonemes of the spoken command, the transmission time of the spoken command data to remote server 28 can be greatly reduced as compared to transmitting a complete digitization of the spoken command. Alternatively, spoken command data can comprise the complete digitization of the spoken command, a text representation of one or more recognized words as recognized by the speech recognition function, a plurality of possible recognized words, or other data based on the spoken command.
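The saving from a phoneme-based representation can be illustrated with a rough size comparison. The following is a minimal sketch only: the sampling rate, sample width, one-byte-per-phoneme encoding, utterance duration, and phoneme symbols are all assumptions chosen for illustration and are not specified in this description.

```python
from typing import Sequence

# Illustrative assumptions (not taken from the description):
# telephone-quality mono audio and a one-byte index per phoneme.
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 2
BYTES_PER_PHONEME = 1


def pcm_payload_bytes(duration_s: float) -> int:
    """Size of a complete, uncompressed digitization of the utterance."""
    return int(duration_s * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def phoneme_payload_bytes(phonemes: Sequence[str]) -> int:
    """Size of a phoneme-based representation, one table index per phoneme."""
    return len(phonemes) * BYTES_PER_PHONEME


# Hypothetical phoneme sequence for "weather in Detroit".
phonemes = ["W", "EH", "DH", "ER", "IH", "N", "D", "IH", "T", "R", "OY", "T"]

audio_bytes = pcm_payload_bytes(2.0)               # assumed 2-second utterance
phoneme_bytes = phoneme_payload_bytes(phonemes)
print(audio_bytes, phoneme_bytes)
```

Under these assumptions the phoneme payload is several orders of magnitude smaller than the raw audio, which is the effect the paragraph above relies on.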
As shown in
Returning to
Advantageously, remote server 28 can include much greater processing and memory capabilities to run a more rigorous speech recognition algorithm on the spoken command and can further request desired information from other network-based resources, via the Internet or via other networks. Furthermore, new functions can be accessible to system 10 by storing the new applications (e.g., containing vocabulary, operator prompts, and decision logic) on one or more remote servers 28 to be accessed by system 10. Software on processing circuit 12 does not need to be substantially redesigned, if at all, or even updated.
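One way to picture how new functions become available without touching the vehicle module is a server-side registry of agents. The sketch below is an assumption-laden illustration: the names `REMOTE_AGENTS`, `register_agent`, and `execute_on_server`, and the fallback message, are hypothetical and do not come from this description.

```python
from typing import Callable, Dict

# Hypothetical server-side registry mapping a key word to a remote agent.
REMOTE_AGENTS: Dict[str, Callable[[str], str]] = {}


def register_agent(keyword: str, handler: Callable[[str], str]) -> None:
    """Make a new function available; the vehicle software is unchanged."""
    REMOTE_AGENTS[keyword.lower()] = handler


def execute_on_server(recognized_text: str) -> str:
    """Run the first remote agent whose key word appears in the command."""
    for word in recognized_text.lower().split():
        handler = REMOTE_AGENTS.get(word)
        if handler is not None:
            return handler(recognized_text)
    return "Sorry, that request is not supported."


# Adding a weather agent later only changes the server.
register_agent("weather", lambda text: "Weather agent handling: " + text)
print(execute_on_server("weather in Detroit"))
```

The vehicle-side software only forwards spoken command data and acts on the response, so registering another agent on the server requires no change in the vehicle.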
Referring now to
In this exemplary embodiment, speech recognition function 54 is configured to compare a recognized word to a plurality of predetermined key words to determine if system 10 has an application configured to execute the spoken command. These applications can be called local agents and are identified as local agents 60 in
Mobile phone 22 can relay or forward the wireless message via network 24 to remote server 28. According to one example, a Dial-Up Networking (DUN) connection can be used, which makes the transmission of the wireless message through the phone transparent. Other protocols, such as Short Message Service (SMS) could be used. Remote server 28 can operate speech recognition and/or context processing software; for example, remote server 28 can operate the same speech recognition and context processing software as system 10, or can operate a more robust version of the software, since server 28 need not have the processing power and memory limitations of an embedded system. Thus, remote server 28 comprises speech recognition function 62 and context processing function 64. Further, various remote information agents or applications 66 can be accessed by context processing function 64 in order to execute the spoken command.
According to one exemplary embodiment, some spoken commands require off-board resources (i.e., resources not available within system 10, such as stock prices from an Internet-based server) while other spoken commands only require resources contained on-board (e.g., hands-free dialing resources, such as a hands-free dialing application, a phone book, etc.). The former are remote or distributed or off-board resources, and the latter are local or on-board resources.
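The split between on-board and off-board resources can be sketched as a simple dispatch step. This is a minimal sketch, assuming a key-word lookup: the agent functions, the key words, and the `send_to_remote` callback (standing in for the wireless message relayed through the mobile phone) are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Optional, Sequence


# Hypothetical local agents (on-board resources).
def dial_agent(command: str) -> str:
    return "dialing: " + command


def phone_book_agent(command: str) -> str:
    return "phone book: " + command


# Assumed key words that map to local agents.
LOCAL_AGENTS: Dict[str, Callable[[str], str]] = {
    "call": dial_agent,
    "dial": dial_agent,
    "phonebook": phone_book_agent,
}


def find_local_agent(words: Sequence[str]) -> Optional[Callable[[str], str]]:
    """Return the local agent for the first matching key word, if any."""
    for word in words:
        agent = LOCAL_AGENTS.get(word.lower())
        if agent is not None:
            return agent
    return None


def handle_command(words: Sequence[str],
                   send_to_remote: Callable[[str], str]) -> str:
    """Execute locally when an agent exists; otherwise defer to the server."""
    command = " ".join(words)
    agent = find_local_agent(words)
    if agent is not None:
        return agent(command)
    # No local application (or no recognized words): use off-board resources.
    return send_to_remote(command)
```

For example, `handle_command(["call", "home"], send)` would run the local dialing agent, while `handle_command(["weather"], send)` would invoke `send` with the command text.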
Context processing functions 58 and 64 are optional. Conventional speech recognition engines are typically based on a predetermined vocabulary of words, which requires the vehicle occupant 50 to know a predetermined command structure. This may be acceptable for simpler applications, but for more complicated applications, the “natural language” understanding provided by context processing is advantageous. For example, a single natural language phrase “What is the weather in Detroit?” can replace a command structure such as: user: “weather,” hands-free system: “What city, please?” user: “Detroit.” Furthermore, natural language allows the request to be made in different forms such as “Get me the forecast for Detroit” or “Detroit weather forecast.”
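The benefit of accepting several phrasings can be illustrated with a toy parser. This is only a sketch of the idea using key-word spotting, which is an assumption made for brevity and not the context processing method of this description; the word lists and the `parse_request` function are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed vocabulary: several words map to the same request type.
INTENT_KEYWORDS = {"weather": "weather", "forecast": "weather"}
KNOWN_CITIES = {"detroit", "chicago"}


def parse_request(utterance: str) -> Dict[str, Optional[str]]:
    """Extract a request type and a city from a free-form phrase."""
    words = re.findall(r"[a-z]+", utterance.lower())
    intent = next((INTENT_KEYWORDS[w] for w in words if w in INTENT_KEYWORDS), None)
    city = next((w.capitalize() for w in words if w in KNOWN_CITIES), None)
    return {"intent": intent, "city": city}


for phrase in ("What is the weather in Detroit?",
               "Get me the forecast for Detroit",
               "Detroit weather forecast"):
    print(parse_request(phrase))
```

All three phrasings yield the same request. If the city is missing, as in the single word "weather", the `city` field is `None` and the system could fall back to a prompt such as "What city, please?".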
Local agents or applications 60 can include a telephone dialing application, a set-up application, a configure application, a phone book application, and/or other applications. For example, the set-up application can be configured to provide a Bluetooth pairing function with a Bluetooth-enabled device 68, such as a personal digital system, mobile phone, etc. The configure application can be configured to allow the occupant to establish preferences for the behavior (user profile) of system 10 or other modules/functions in the vehicle. The phone book application can be configured to create, edit, and delete phone book entries in response to spoken commands and to provide operator prompts via voice responses 78 through speaker 18 to guide occupant 50 through phone book functions.

Remote server 28 is configured to determine if the spoken command request data is available from a website 70 stored on a server accessible via the Internet. Remote server 28 is configured to receive data from website 70 and provide the data in a wireless response message 72. Wireless response message 72 can include data, text, and/or other information provided via network 24 and phone 22 to system 10. Optionally, a hypertext transfer protocol (HTTP) manager 74 operable on system 10 (and/or on remote server 28) can be provided to facilitate transmission and receipt of messages in a hypertext or other markup language. Alternatively, other data formats can be used. System 10 is then configured to perform a function based on the wireless response message. In one example, a text-to-speech converter 76 converts response data to speech and provides a voice response 78 to vehicle occupant 50, for example, “JCI stock price is 100”. As shown at element 80 and described hereinabove, spoken command data 80 provided to remote server 28 can take any of a variety of forms, such as phoneme-based data or other data.
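The round trip from response message to voice response can be sketched as follows. The JSON message layout, the `type` and `text` fields, and the function names are assumptions made for illustration; the description leaves the message format open, and the `speak` callback merely stands in for text-to-speech converter 76.

```python
import json
from typing import Callable


def build_response_message(symbol: str, price: float) -> str:
    """Server side: wrap retrieved data in a (hypothetical) response message."""
    text = f"{symbol} stock price is {price:g}"
    return json.dumps({"type": "speak", "text": text})


def perform_function(message: str, speak: Callable[[str], None]) -> None:
    """Vehicle side: act on the response data, here by voicing the text."""
    data = json.loads(message)
    if data.get("type") == "speak":
        speak(data["text"])


# Usage: collect what would be handed to the text-to-speech converter.
spoken = []
perform_function(build_response_message("JCI", 100), spoken.append)
print(spoken)
```

Keeping the action type in the message lets the vehicle side stay generic: other response types could be added on the server without changing how the message is transported.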
The functions that can be performed by system 10 are not limited to telephone dialing and acquiring data from Internet web pages. According to another example, a location determining system 82 (e.g., a global positioning system, dead reckoning system, or other such system) is configured to provide vehicle location information to system 10. System 10 can be configured to retrieve navigation information from remote server 28 and use information from location determining system 82 to provide navigation data to vehicle occupant 50. According to another exemplary embodiment, the vehicle occupant 50 can provide vehicle command and control functions via a vehicle bus 84 which is coupled to system 10. For example, system 10 can be configured to receive a spoken command to control HVAC options, radio selections, vehicle seat position, interior lighting, etc. According to another example, a music management function can be provided by coupling a hand-held Bluetooth-enabled music source 68 (e.g., an MP3 player, laptop personal computer with a built-in or add-on Bluetooth transceiver, or a headset) controlled by spoken commands via system 10. According to another example, system 10 can provide vehicle location and heading and/or traffic information. According to another example, communication functions can be provided by system 10, such as hands-free telephone calling, voice memo, e-mail sending and receiving, and e-mail notification, wherein the e-mails can be converted text-to-speech and provided via voice responses 78. According to another example, calendar/to-do list functions can be provided; for example, a to-do list can be converted text-to-speech from a hand-held Bluetooth device 68, such as a personal digital assistant, laptop computer, etc. According to another example, personalized news functions can be provided in response to a spoken command request, either from a predetermined Internet service provider source, such as www.yahoo.com, or from user-selectable sources via spoken commands.
Other functions are contemplated.
As illustrated at voice call connection 86, mobile phone 22 and network 24 are configured to provide hands-free phone operation with system 10 for a voice phone call between a third party and vehicle occupant 50.
While the exemplary embodiments illustrated in the FIGS. and described above are presently preferred, it should be understood that these embodiments are offered by way of example only. For example, the teachings herein can be applied to any speech recognition system in a vehicle and are not limited to hands-free telephone applications. Accordingly, the present invention is not limited to a particular embodiment, but extends to various modifications that nevertheless fall within the scope of the appended claims.
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/US04/24286 | 7/28/2004 | WO | | 2/24/2006

Number | Date | Country
---|---|---
60498830 | Aug 2003 | US