1. Field of the Invention
The present invention generally relates to a system and method for dynamically updating and using text-to-speech data. More specifically, the present invention relates to dynamically updating the grammar rules used to pre-process text information database entries to achieve improved output text-to-speech phonetics.
2. Description of Related Art
Systems incorporating text-to-speech engines or synthesizers coupled to a database of textual data are well known and continue to find an ever-increasing number of applications. For example, automobiles equipped with text-to-speech and speech-recognition capabilities simplify tasks that would otherwise require a driver to take away his/her attention from driving. The uses of text-to-speech output in a vehicle include, but are not limited to, controlling electronic systems aboard the vehicle, such as navigation systems, audio systems, etc.
With the increasing applicability of text-to-speech (TTS) systems to electronic systems and devices, others have attempted to improve the output of text-to-speech phonetics, i.e., to make the synthesized speech more natural or understandable for users. Toward this end, others have implemented a variety of fixed dictionaries. However, fixed dictionaries are necessarily large in order to handle a sufficiently large vocabulary. Moreover, a relatively high speed processor is needed to locate and retrieve entries from such large dictionaries with sufficient speed.
Others have attempted to implement non-fixed dictionaries where certain textual data are pre-processed to achieve improved TTS output. Others have attempted to pre-process the textual data according to defined rules or via manual editing of textual database entries. Such approaches to pre-processing can be time-consuming and inefficient. Moreover, a given set of pre-processing or grammar rules for a particular application may be outdated or inappropriate for another application or scenario.
Accordingly, it would be desirable to provide a system that can pre-process textual data with grammar rules that can be updated or adjusted for particular applications, user preferences, etc. Such a system would have the benefit of non-fixed dictionaries and updateable grammar rules with which to pre-process entries in the non-fixed dictionaries.
The present invention provides a system and method for improving the performance of text-to-speech (TTS) systems by dynamically updating the grammar rules used to pre-process textual entries in a text information database.
In accordance with one aspect of the embodiments described herein, there is provided a system for pre-processing text for TTS generation, comprising a first memory adapted to store a text information database, a second memory adapted to store grammar rules, a receiver adapted to receive update data regarding the grammar rules and relay the received update data to the second memory, and an audio output device. The system further comprises a TTS engine operatively coupled to the first and second memories, the receiver, and the audio output device, wherein the TTS engine is adapted to: (a) retrieve at least one text entry from the text information database; (b) apply the updated grammar rules to the at least one text entry, and thereby pre-process the at least one text entry; (c) generate speech based at least in part on the at least one pre-processed text entry; and (d) send the generated speech to the audio output device.
In accordance with another aspect of the embodiments described herein, there is provided a system for pre-processing text for TTS generation, comprising a memory adapted to store a text information database and grammar rules, a receiver to receive a request for the TTS generation, and an audio output device. The system further comprises a TTS engine operatively coupled to the memory, the receiver, and the audio output device, wherein the TTS engine is adapted to: (a) retrieve at least one text entry from the text information database according to the received request for the TTS generation; (b) retrieve a subset of rules from the grammar rules according to the received request; (c) apply the retrieved rules to the at least one text entry, and thereby pre-process the at least one text entry; (d) generate speech based at least in part on the at least one pre-processed text entry; and (e) send the generated speech to the audio output device.
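By way of illustration only, steps (b) and (c) of the foregoing aspect, in which a subset of the grammar rules is retrieved according to the request and then applied, might be sketched as follows. The `GrammarRule` type, the `application` tag, and the "St." to "Street" rule are hypothetical and are introduced solely for this sketch; the disclosure does not prescribe how rules are tagged or selected.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class GrammarRule:
    name: str
    application: str               # hypothetical tag, e.g. "audio" or "navigation"
    apply: Callable[[str], str]    # text in, pre-processed text out


# Hypothetical rule store; only the "&" -> "and" rule comes from the description.
ALL_RULES: List[GrammarRule] = [
    GrammarRule("ampersand", "audio", lambda t: t.replace("&", "and")),
    GrammarRule("street_abbrev", "navigation", lambda t: t.replace("St.", "Street")),
]


def rules_for_request(application: str) -> List[GrammarRule]:
    """Retrieve the subset of rules matching the requesting application."""
    return [rule for rule in ALL_RULES if rule.application == application]


def preprocess(text: str, application: str) -> str:
    """Apply only the retrieved subset of rules to one text entry."""
    for rule in rules_for_request(application):
        text = rule.apply(text)
    return text


print(preprocess("Rock & Roll", "audio"))       # audio rules applied
print(preprocess("Rock & Roll", "navigation"))  # audio rule not selected
```

Under these assumptions, a request from an audio application rewrites the ampersand, while the same entry requested by a navigation application is left unchanged because the audio rule is not in the retrieved subset.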
In accordance with another aspect of the embodiments described herein, there is provided a method for pre-processing text for a TTS engine according to grammar rules, comprising: (a) receiving update data regarding the grammar rules; (b) updating the grammar rules according to the received update data; (c) receiving a request for TTS generation; (d) retrieving at least one text entry from a text information database; and (e) applying the updated grammar rules to the at least one text entry to pre-process the at least one text entry. The method can further comprise providing an audio output with TTS phonetics based at least in part on the at least one pre-processed text entry.
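The method steps above can be sketched, purely as a non-limiting example, with a rule set held as a dictionary of regular-expression substitutions. The update format used here (an "upsert" mapping and a "remove" list) is an assumption made for this sketch; the disclosure does not define the structure of the update data.

```python
import re
from typing import Dict, Tuple

# Hypothetical representation: rule id -> (regex pattern, replacement text).
RuleSet = Dict[str, Tuple[str, str]]


def update_rules(rules: RuleSet, update: dict) -> RuleSet:
    """Steps (a)-(b): return a new rule set with the received update applied."""
    updated = dict(rules)
    for rule_id in update.get("remove", []):
        updated.pop(rule_id, None)          # drop obsolete rules, if present
    updated.update(update.get("upsert", {}))  # add new rules or overwrite old ones
    return updated


def preprocess(entry: str, rules: RuleSet) -> str:
    """Step (e): apply every rule, in insertion order, to one text entry."""
    for pattern, replacement in rules.values():
        entry = re.sub(pattern, replacement, entry)
    return entry


# Rules on board before the update.
rules = {"amp": (r"\s*&\s*", " and ")}

# Hypothetical update data received by the receiver.
update = {"upsert": {"feat": (r"\bfeat\.", "featuring")}}
updated_rules = update_rules(rules, update)

entry = "Union feat. Sting & Friends"   # step (d): entry retrieved from the database
print(preprocess(entry, rules))          # before the update
print(preprocess(entry, updated_rules))  # after the update
```

Returning a new rule set rather than mutating the stored one is a design choice of the sketch: it lets the previous rules remain in use until the update has been fully received.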
a is a schematic diagram of an embodiment of a communication system pursuant to aspects of the invention;
b is a schematic diagram of a navigation device in communication with a mobile unit according to an embodiment of the invention;
The receiver 110 is adapted to receive, among other things, requests for TTS generation. The receiver 110 relays the request to the TTS engine 130, which in turn accesses and uses the grammar rules 120 to pre-process entries in the text information database 104 to generate a phonetic database 106. The TTS engine 130 processes or converts the entries in the text information database 104 and then reads selected entries from the generated phonetic database 106 for the user. In the embodiment of
The grammar rules 120 are used for automatically producing phonetics that can be saved for later use or used immediately for both TTS and voice recognition purposes. The grammar rules 120 can be stored in any suitable memory that is part of or operatively coupled to the TTS system 100. The grammar rules 120 can be stored with or apart from the text information database 104 and/or the phonetic database 106. The grammar rules 120, regardless of where they are stored, make it possible for the TTS engine 130 or equivalent thereof to pre-process text to achieve better prosody of voice and comprehensibility by the user. The TTS engine 130 or separate processor 112 can be used to go through the text data 104 and generate the raw phonetics 106, thereby allowing automated text manipulation for embedded or mobile TTS engines.
In one embodiment, the grammar rules 120 comprise rules for removal, reformatting, and/or replacement of text based on word spelling (including abbreviations), word and sentence structure, or other formatting structures. The TTS engine 130 or processor 112 uses search algorithms and preprocesses (i.e., removes, reformats, or replaces) entries in the text database 104 to produce a partial or complete phonetic database 106. The phonetic database 106 can be used by TTS and/or voice recognition engines.
The removing technique involves searching for particular items and removing the identified items from the database entries. The removing technique can be applied to specific words or phrases, as well as to punctuation items, such as parentheses. The purpose of removing words, phrases, or punctuation is to eliminate portions of text database entries that are inappropriate for the TTS engine or will likely cause confusion for the user. Examples of grammar rules 120 for removing symbols include:
—
The reformatting technique involves searching for particular items and changing all or part of the makeup of identified text database entries, such as providing alternative spellings for a mispronounced word or providing letter/word markups for optimum TTS generation. Depending on the particular application of the TTS system, grammar rules 120 appropriate for a given application, such as vehicle audio or music systems, are utilized. For example, in the context of audio systems, the grammar rules 120 can comprise an algorithm for reformatting “Live”, such that “Greatest Hits (Live)” becomes “Greatest Hits Live” (hard wound Lyve). In another example, the grammar rules 120 comprise a zero-to-O algorithm, such that “808 State” becomes “Eight Oh Eight State”. Examples of grammar rules 120 for reformatting classical music composer names can include:
The replace technique involves searching for particular items and replacing them with appropriate substitute items. This can involve replacing an abbreviation with its full word, or substituting letters or characters with appropriate substitutions. For example, the grammar rules 120 can comprise an algorithm for replacing “&” with “and”, such that “Rock & Roll” becomes “Rock and Roll”. In another example, the grammar rules 120 comprise an algorithm for replacing “feat.” with “featuring”, such that “Union (feat. Sting)” becomes “Union featuring Sting”. Examples of grammar rules 120 for replacing words and symbols include:
Other examples of grammar rules 120 for audio or music systems can include:
As explained above, particular grammar rules 120 can be selected and used for particular applications. While many of the examples of grammar rules 120 described herein are for audio or music systems, it will be understood that the grammar rules 120 generally can comprise rules for automatically producing phonetics that can be saved for later use or used immediately for both TTS and voice recognition purposes, and are not limited to any particular type of electronic system, such as embedded music, audio, or navigation systems.
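As a minimal sketch only, the removing, reformatting, and replacing techniques described above might be combined as shown below, using the four example entries given in the description. The function names, the rule ordering, and the use of regular expressions are assumptions of the sketch rather than features of the disclosed grammar rules 120, and the pronunciation markup for “Live” is not modeled.

```python
import re

# Digit-by-digit spelling with the zero-to-O convention from the description.
DIGIT_WORDS = {
    "0": "Oh", "1": "One", "2": "Two", "3": "Three", "4": "Four",
    "5": "Five", "6": "Six", "7": "Seven", "8": "Eight", "9": "Nine",
}


def replace_feat(text: str) -> str:
    """Replace: the abbreviation 'feat.' becomes 'featuring'."""
    return re.sub(r"\bfeat\.", "featuring", text)


def replace_ampersand(text: str) -> str:
    """Replace: '&' becomes 'and', with single spaces around it."""
    return re.sub(r"\s*&\s*", " and ", text)


def remove_parentheses(text: str) -> str:
    """Remove: drop parenthesis characters but keep the enclosed words."""
    return re.sub(r"[()]", "", text)


def spell_digits(text: str) -> str:
    """Reformat: spell each digit of a number as a separate word."""
    def spell(match: re.Match) -> str:
        return " ".join(DIGIT_WORDS[d] for d in match.group())
    return re.sub(r"[0-9]+", spell, text)


RULES = [replace_feat, replace_ampersand, remove_parentheses, spell_digits]


def preprocess(text: str) -> str:
    """Apply each rule in order, then collapse any repeated whitespace."""
    for rule in RULES:
        text = rule(text)
    return " ".join(text.split())


for entry in ["Greatest Hits (Live)", "808 State", "Rock & Roll", "Union (feat. Sting)"]:
    print(entry, "->", preprocess(entry))
```

With these assumed rules, the four entries become “Greatest Hits Live”, “Eight Oh Eight State”, “Rock and Roll”, and “Union featuring Sting”, matching the examples in the description.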
TTS data, including but not limited to grammar rules 120, text information 104, and generated text phonetics 106, can be updated via any known approach. For example, in the embodiment of
The TTS system 100 typically comprises a receiver or is in communication with a receiver located on the vehicle that allows the TTS data (e.g., grammar rules 120) to be updated remotely. In one embodiment, the receiver supports the receipt of content from a remote location that is broadcast over a one-to-many communication network. One-to-many communication systems include systems that can send information from one source to a plurality of receivers, such as a broadcast network. Broadcast networks include television, radio, and satellite networks. For example, the grammar rules for TTS pre-processing can be updated by a remote broadcast signal such as via satellite radio broadcast service, as illustrated in
With reference to
In one embodiment, the TTS data (e.g., grammar rules 120) is generated at the remote location 216 or an alternate location that is not within or near the vehicle 201. The TTS data is broadcast from the remote location 216 over the one-to-many communication network 200 to the vehicle 201. The mobile unit 202 receives the broadcast message and can transmit the TTS data to the navigation device 208 for updating of the database of available grammar rules 120 and/or databases 104, 106. With respect to the present illustrative embodiment, the grammar rules 120, text information data 104, and text phonetic data 106 are stored in memory 209 (see
The remote location 216 can include a remote server 218, a remote transmitter 222, and a remote memory 224, that are each in communication with one another. The remote transmitter 222 communicates with the navigation device 208 and mobile unit 202 by way of the broadcast communication network 200. The remote server 218 supports the routing of message content over the broadcast network 200. The remote server 218 comprises an input unit, such as a keyboard, that allows the entry of updated grammar rules 120 or the like into memory 224, and a processor unit that controls the communication over the one-to-many communication network 200.
The server 218 is in communication with the vehicle 201 over a one-to-many communication network 200. In the present embodiment, the one-to-many communication network 200 comprises a broadcast center that is further in communication with one or more communication satellites 122 that relay the TTS data to a mobile unit 202 in the owner's vehicle 201. In the present embodiment, the broadcast center and the satellites 122 are part of a satellite radio broadcasting system, such as XM Satellite Radio or the like. It will be understood that the TTS data can be broadcast via any suitable information broadcast system (e.g., FM radio, AM radio, or the like), and is not limited to the satellite radio broadcast system. In one embodiment, the mobile unit 202 relays the received TTS data to an onboard computer system, such as the vehicle's navigation system 208, which in turn updates the database of TTS data, such as grammar rules 120, text information data 104, text phonetic data 106, etc.
b shows an expanded view of both the navigation device 208 and the mobile unit 202 contained on the vehicle 201. The navigation device 208 may include an output unit 214, a receiver unit 215, an input unit 212, a TTS engine 210, a navigation memory unit 209, a navigation processor unit 213, and an RF transceiver unit 211 that are all in electrical communication with one another. The navigation memory unit 209 can store TTS data, such as grammar rules 120 and/or text information 104 and/or text phonetics 106. Alternately, the TTS data or components thereof can be stored in memory that is not part of the navigation device 208. The database(s) with TTS grammar rules 120 and/or text information 104 and/or text phonetics 106 can be updated in the vehicle by way of the input unit 212, which can include a keyboard, a touch sensitive display, jog-dial control, etc. The TTS data can also be updated by way of information received through the receiver unit 215 and/or the RF transceiver unit 211.
The receiver unit 215 receives information from the remote location 216 and, in one embodiment, is in communication with the remote location by way of a one-to-many communication network 200 (see
In the embodiment shown in
In embodiments that involve broadcasting the TTS data to affected vehicle owners, one or a few messages may be transmitted over a one-to-many communication network 200 that each comprise a plurality of one-to-one portions (shown in
TTS updates can be received via a dedicated broadcast data stream. The dedicated data stream utilizes a specialized channel connection, such as the connection for transmitting traffic data described in U.S. patent application Ser. No. 11/266,879, filed Nov. 4, 2005, titled “Data Broadcast Method for Traffic Information,” the disclosure of which is incorporated in its entirety herein by reference. For example, the XM Satellite Radio signal uses 12.5 MHz of the S band: 2332.5 to 2345.0 MHz. XM provides portions of the available radio bandwidth to certain companies to utilize for specific applications. The transmission of messages over the negotiated bandwidth would be considered to be a dedicated data stream. In a preferred embodiment, only certain vehicles would be equipped to receive the dedicated broadcast signal or data set. The broadcast signal may comprise, by way of example only, a digital signal, FM signal, WiFi, cell, a satellite signal, a peer-to-peer network and the like. The TTS data can be embedded into the dedicated broadcast message received at the vehicle.
To install new TTS data in the vehicle, the dedicated radio signal, containing one or a plurality of new or updated TTS phonetics and/or grammar rules, is transmitted to each on-board vehicle receiver unit 204. With a dedicated signal, the in-vehicle hardware/software architecture would be able to accept this signal. In an exemplary embodiment, after the mobile unit receiver 204 receives a broadcast signal, the receiver 204 transmits the dedicated broadcast signal to the on-board vehicle processor 206. The broadcast signal is then deciphered or filtered by the processor 206. For example, the processor 206 filters out the TTS phonetics and/or grammar rules from the other portions of the dedicated broadcast signal (e.g., traffic information, the radio broadcast itself, etc.). The other portions of the broadcast signal are sent to the appropriate in-vehicle equipment (e.g., satellite radio receiver, navigation unit, etc.).
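The filtering performed by the processor 206 can be illustrated, under stated assumptions, by the following sketch. The `Portion` type and its `kind` tag are hypothetical; the disclosure does not specify how portions of the dedicated broadcast signal are labeled or framed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Portion:
    kind: str       # hypothetical type tag, e.g. "tts", "traffic", "audio"
    payload: bytes  # raw content of this portion of the broadcast signal


# Hypothetical set of tags that mark TTS phonetics and/or grammar rules.
TTS_KINDS = {"tts"}


def route(portions: Iterable[Portion]) -> Dict[str, List[Portion]]:
    """Separate TTS data from the other portions of a received signal."""
    routed: Dict[str, List[Portion]] = {"tts": [], "other": []}
    for portion in portions:
        key = "tts" if portion.kind in TTS_KINDS else "other"
        routed[key].append(portion)
    return routed


signal = [
    Portion("audio", b"radio frame"),
    Portion("tts", b"grammar rule update"),
    Portion("traffic", b"traffic report"),
]
routed = route(signal)
print(len(routed["tts"]), "TTS portion(s) for the navigation device")
print(len(routed["other"]), "portion(s) for other in-vehicle equipment")
```

In such a sketch, the "tts" portions would be forwarded to the navigation device 208, while the remaining portions would be passed to the appropriate in-vehicle equipment.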
In the present embodiment, the TTS data is sent by the processor 206 to the navigation device 208, and is stored in the on-board memory 209 of the device. This updated TTS data, once stored in the on-board memory 209, is then available to the TTS engine 210. The on-board memory 209 may comprise any type of electronic storage device such as, but not limited to, a hard disk, flash memory, or the like. The on-board memory 209 may be separate from the navigation device 208 or integrated into it. The on-board memory 209 can be dedicated to storing only TTS data or can serve as multi-function storage that also holds other content, such as digital music and navigation-related information.
The navigation device 208 preferably includes an electronic control unit (ECU) (not shown). The ECU processes the TTS data received by the receiver 204 so that the TTS data is stored in the appropriate memory, such as on-board memory 209, memory 102, etc., and can be used by the system. In the present embodiment, TTS data is transmitted to the vehicle and is stored in the on-board memory 209. The ECU organizes and formats the data stored in the memory 209 into a format that is readable by the system, and in particular, so that the TTS engine 210 can read the data.
In another embodiment, shown in
An exemplary modified broadcast signal may be a standard radio audio signal 322 such that the radio signal is modified or combined 323 to also include TTS data 320, as shown in
It should be appreciated that the above-described methods for dynamically updating and utilizing in-vehicle TTS data are for explanatory purposes only and that the invention is not limited thereby. Having thus described a preferred embodiment of a method and system for dynamically updating TTS data, it should be apparent to those skilled in the art that certain advantages of the described method and system have been achieved. It should also be appreciated that various modifications, adaptations, and alternative embodiments thereof may be made within the scope and spirit of the present invention. It should also be apparent that many of the inventive concepts described above would be equally applicable to the use of other electronic systems, and are not limited to vehicle navigation systems.