A position-determining device may enable a user to determine the user's geographic position via one or more location-determining methods. Suitable location-determining methods include utilization of a satellite-based navigation system, utilization of data from cellular phone systems, and so on. A position-determining device may also communicate position-related data to a user, such as the user's current location or directions from the user's current location to another location. For example, if a user wishes to drive from the user's workplace to a particular restaurant, the user can request driving directions from the workplace to the restaurant via the position-determining device. The device can then provide the directions in a variety of formats, such as visually displaying the directions on a graphical display. A position-determining device can also provide the directions via audible turn-by-turn instructions to a user. Audible driving instructions are helpful in that the user does not need to shift focus from the road to a graphical display in order to receive driving directions.
Current position-determining devices often use pre-recorded voices (PRVs) in providing audible driving instructions. However, current PRV implementations suffer from a number of drawbacks. First, syntax and vocabulary knowledge in many current PRV implementations is defined by operating software of the position-determining device, which inhibits the modification of existing PRVs and the creation of new PRVs. Second, the rigid syntax and vocabulary defined within typical operating software inhibits the random selection of audio clips for a particular event for output by the position-determining device. Third, the rigid syntax and vocabulary defined within typical operating software inhibits the playback of audio clips in PRVs and other audio data at random times or intervals. Finally, current PRV implementations are difficult for third-party developers to work with, since direction-related phrases are reused and there are few, if any, options for customization of audio output.
Techniques are described for enabling flexible and dynamic creation and/or modification of voice data for a position-determining device. In some embodiments, a voice package is provided that includes a language database and a plurality of audio files. The language database specifies appropriate syntax and vocabulary for information that is intended for audio output by a position-determining device. The audio files include words and/or phrases that may be accessed by the position-determining device to communicate the information via audible output.
This Summary is provided solely to introduce subject matter that is fully described in the Detailed Description and Drawings. Accordingly, the Summary should not be considered to describe essential features nor be used to determine scope of the claims.
The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different instances in the description and the figures may indicate similar or identical items.
Techniques and processes for creation and modification of navigation voice data are described. In some embodiments, a voice package is provided that includes a language database and a plurality of audio files. The language database specifies appropriate syntax and vocabulary for information that is intended for audio output by a position-determining device. The audio files include words and/or phrases that may be accessed by the position-determining device to communicate the information via audible output. The audio files may be in any suitable format, such as .wav, .wma, .mp3, .ogg, and so on.
Some embodiments also utilize a voice package toolkit to construct and/or customize one or more parts of a voice package. The toolkit may include one or more software modules and/or applications that reside on a position-determining device or other computing device. The toolkit may also include a test module that can be used by developers and/or end users to listen to various combinations of audio files that are generated from the syntax and/or vocabulary information in the voice package. The test module enables developers and/or end users to test various navigation scenarios in a controlled environment and, in some embodiments, without an actual position-determining device (e.g., the test module may reside on a computing device separate from a position-determining device).
In the following discussion, an example environment is first described that is operable to employ techniques and processes for creation and modification of navigation voice vocabulary and syntax discussed herein. Example processes are then described which may be employed in the exemplary environment, as well as in other environments without departing from the spirit and scope thereof. A discussion of the voice package toolkit is then presented, which is followed by an example of a script that may be utilized to implement various techniques and processes discussed herein. Finally, an example process is described for specifying criteria for selecting one or more phrases from a plurality of available phrases, the one or more phrases to be used to output information. Although the techniques and processes for creation and modification of navigation voice data are described in relation to a position-determining environment, it should be readily apparent that these techniques may be employed in a variety of different environments.
The environment 100 also includes a cellular provider 104 and an internet provider 106. The cellular provider 104 may provide cellular phone and/or data retrieval functionality to various aspects of the environment 100, and the internet provider 106 may provide network connectivity and/or data retrieval functionality to various aspects of the environment 100.
The environment 100 also includes a position-determining device 108, such as any type of mobile ground-based, marine-based and/or airborne-based device. In some embodiments, position-determining device 108 comprises a personal navigation device. The position-determining device 108 may implement various types of position-determining functionality which, for purposes of the following discussion, may relate to a variety of different navigation techniques and other techniques that may be supported by “knowing” one or more positions. For instance, position-determining functionality may be employed to provide location information, timing information, speed information, turn-by-turn driving instructions, and a variety of other navigation-related data. Accordingly, the position-determining device 108 may be configured in a variety of ways to perform a wide variety of functions. For example, the position-determining device 108 may be configured for vehicle navigation as illustrated, aerial navigation (e.g., for airplanes, helicopters), marine navigation, personal use (e.g., as a part of fitness-related equipment), and so forth. The position-determining device 108 may include a variety of devices to determine position using one or more of the techniques previously described.
The position-determining device 108 of
The position-determining device 108 also includes a network interface 112 that may enable the device to communicate with one or more networks, such as a network 114. The network 114 may include any suitable network, such as a local area network, a wide area network, the Internet, a satellite network, a cellular phone network, and so on. In one or more embodiments, the navigation signal receiver 110 may receive data and/or signals from the network 114 to determine a location (e.g., Assisted GPS, or “AGPS”). Thus, in one or more embodiments, the receiver 110 may be configured to include one or more network interface capabilities.
The position-determining device 108 also includes one or more input/output (I/O) device(s) 116 (e.g., a touch screen, buttons, wireless input device, data input, a screen, and so on). The input/output devices 116 include one or more audio I/O devices 118, such as a microphone, speakers, and so on. The various devices and modules of the position-determining device 108 are communicatively coupled to a processor 120 and a memory 122.
The processor 120 is not limited by the materials from which it is formed or the processing mechanisms employed therein, and as such, may be implemented via semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs), programmable logic devices), and so forth. Additionally, although a single memory 122 is shown, a wide variety of types and combinations of computer-readable storage memory may be employed, such as random access memory (RAM), hard disk memory, removable medium memory (e.g., the memory 122 may be implemented via a slot that accepts a removable memory cartridge), and other types of computer-readable media. Although the components of the position-determining device 108 are illustrated separately, it should be apparent that these components may also be further divided and/or combined without departing from the spirit and scope thereof.
The position-determining device 108 is configured to receive signals and/or data transmitted by one or more position data platforms and/or position data transmitters, such as the navigation satellites 102. These signals are provided to the processor 120 for processing by a positioning module 124, which is storable in the memory 122 and is executable on the processor 120. The positioning module 124 is representative of functionality that determines a geographic location, such as by processing signals and/or data obtained from various platforms/transmitters to provide position-determining functionality, such as to determine location, speed, time, and so forth. The signals and/or data may include position-related data such as ranging signals, ephemerides, almanacs, and so on.
The positioning module 124 may be executed to use map data 126 stored in the memory 122 to generate navigation instructions (e.g., turn-by-turn instructions to a destination), show a current position on a map, and so on. The positioning module 124 may also be executed to provide other position-determining functionality, such as to determine a current speed, calculate an arrival time, and so on. A wide variety of other examples are also contemplated.
Also stored on memory 122 is an input mode manager 128 that may enable the position-determining device 108 to operate in a variety of input modes (e.g., a touch input mode, an automated speech recognition mode, and so on).
Memory 122 also stores a voice module 130 that is configured to perform a variety of speech and/or voice-related functions for the position-determining device 108. A device voice package 132 is stored within memory 122 and includes a language database 134 and audio data 136. In various embodiments, the voice package 132 is separate from the operating software that is utilized by the position-determining device 108. The language database 134 includes syntax data and vocabulary data accessible to the position-determining device 108 for communicating audible information. The audio data 136 is a repository of audio files that can be accessed by various components of the position-determining device 108 to provide audio output functionality.
The memory 122 may optionally store a voice package toolkit 138 that provides functionality for the creation and/or customization of various aspects of the device voice package 132. A developer, end user, or any other entity may utilize the voice package toolkit 138 to add, delete, and/or change the data and/or configuration of the voice package. For example, a user may add audio files to the audio data 136 to be used in outputting navigation information via audio output from the position-determining device 108. A user may add audio files in a certain language or dialect that is not represented in the current assortment of audio files available from audio data 136. A user may also customize the particular syntax and/or vocabulary that the language database 134 currently provides. The voice package toolkit 138 provides an interface for the device voice package 132 contents and enables a variety of different users to modify the device voice package 132 contents without modifying the operating software of the position-determining device 108.
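As a rough illustration, a voice package of this kind can be modeled as plain data that a toolkit edits directly, leaving the operating software of the device untouched. The names below (VoicePackage, add_utterance, the "board_ferry" expression) are hypothetical and chosen for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class VoicePackage:
    # Language database: expression tag -> list of utterances, each utterance
    # being an ordered list of audio-file names (syntax and vocabulary).
    language_db: dict = field(default_factory=dict)
    # Audio data: audio-file name -> raw audio bytes (.wav, .mp3, etc.).
    audio_data: dict = field(default_factory=dict)

def add_utterance(pkg, expression, audio_file_names):
    """Toolkit-style edit: register an additional utterance for an expression
    without modifying the device's operating software."""
    pkg.language_db.setdefault(expression, []).append(list(audio_file_names))

pkg = VoicePackage()
add_utterance(pkg, "board_ferry", ["board", "ferry"])
add_utterance(pkg, "board_ferry", ["in", "board", "ferry"])
print(pkg.language_db["board_ferry"])
# → [['board', 'ferry'], ['in', 'board', 'ferry']]
```

Because the package is data rather than compiled operating software, a developer or end user can add a language or dialect simply by supplying new audio files and new utterance entries.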
A user interface module 140 is stored on memory 122 and is configured to generate a variety of different graphical user interfaces (GUIs), such as GUIs designed for accepting physical interaction by a user with the position-determining device 108, GUIs designed to accept speech input from a user of the device, and so on. GUIs of the position-determining device 108 may also be configured to accept any combination of user input modes via a single GUI, such as a combination of tactile interaction with the device and audio input to the device.
The position-determining device 108 may also implement cellular phone functionality, such as by connecting to a cellular network provided by the cellular provider 104. Network connectivity (e.g., Internet access) may also be provided to the position-determining device 108 via the Internet provider 106. Using the Internet provider 106 and/or the cellular provider 104, the position-determining device 108 can retrieve maps, driving directions, system updates, the voice package 132, the voice package toolkit 138, and so on.
The positioning system environment 100 also includes a computing device 142. Although computing device 142 is illustrated here as a desktop computer, this is not intended to be limiting, and any suitable computing device may be utilized, such as a laptop computer, a digital media player, a PDA, and so on. The computing device 142 includes one or more processors 144 and computer-readable media 146. As with memory 122 of the position-determining device 108, the computer-readable media 146 can include a wide variety of types and combinations of computer-readable storage memory. Stored on the computer-readable media 146 are a variety of modules, including a remote voice package 148 and a voice package toolkit 150. Included in the remote voice package 148 are a language database 152 and audio data 154. The remote voice package 148 and the voice package toolkit 150 may include similar or the same data and functionality as described for device voice package 132 and voice package toolkit 138. Using the remote voice package 148 and the voice package toolkit 150 enables a voice package to be constructed and/or customized on a device remote from a position-determining device, and then loaded onto the position-determining device. As illustrated, the computing device 142 may communicate with the position-determining device 108 either directly or via the network(s) 114. Although not expressly illustrated here, a voice package toolkit may be implemented as a Web application that may be utilized to create and/or configure a voice package and download the voice package to the position-determining device.
Generally, any of the functions described herein may be implemented using software, firmware, hardware (e.g., fixed logic circuitry), manual processing, or a combination of these implementations. The terms “module” and “functionality” as used herein generally represent software, firmware, hardware or a combination thereof. In the case of a software implementation, for instance, the module represents executable instructions that perform specified tasks when executed on a processor, such as the processor 120 of the position-determining device 108 of
The following discussion describes techniques and processes for creation and modification of navigation voice data that may be implemented utilizing the previously described systems and devices. Aspects of each of the procedures may be implemented in hardware, firmware, software or a combination thereof. The procedures are shown as a set of blocks that specify operations performed by one or more devices and are not necessarily limited to the orders shown for performing the operations by the respective blocks. In portions of the following discussion, reference will be made to the environment 100 of
The language and syntax provided by the language database may correspond to a plurality of entire utterances that may be audibly output by the position-determining device 108. “Utterance,” as used herein, refers to any phrase or other combination of words and/or numbers. In some embodiments, the language database may represent a plurality of expressions and one or more utterances corresponding to each expression. “Expression,” as used herein, refers to a concept that is desired to be communicated to the user. The expressions may correspond to a plurality of navigation-related expressions that may be communicated to the user based on the user's current position, a route traveled or initiated by the user or generated by the navigation device based on the current position, other navigation information, combinations thereof, and the like. However, the expressions may correspond to any information that may be audibly communicated to the user.
For example, one navigation-related expression is that the user should turn right in <distance>. The language database may specify syntax and vocabulary for a plurality of utterances corresponding to this single expression. For example:
Thus, by accessing the language database, the position-determining device 108 may identify the syntax and vocabulary for utterances and/or corresponding expressions. As is discussed in more detail herein, the language database, along with the syntax and/or vocabulary it provides, may be easily modified to provide any desired utterance having any syntax and vocabulary without impacting the operating system or other system instructions resident on the position-determining device 108.
An audio data store is constructed that includes a variety of audio data files (block 206). As mentioned above, the audio data files may be stored in any suitable format and may include words and/or phrases in a variety of different languages and dialects. In some embodiments, the language database and the audio data store are assembled into a voice package that may be downloaded or otherwise exported to one or more devices. The language database and the audio data store are loaded onto the device (block 208). Additionally or alternatively, the language database and/or the audio data store may be loaded or otherwise stored on a remote resource that is accessible to the device, such as computing device 142. Process 200 is typically implemented in whole or in part on a device (e.g., computing device 142) remote from the position-determining device. Alternatively and/or additionally, a voice package toolkit may reside on the position-determining device for configuring one or more aspects of a voice package.
One or more audio files are retrieved that correspond to the identified vocabulary for the information (block 306). In the current example, the vocabulary may include words and/or phrases such as “travel”, “drive for”, “two”, “Main”, “street”, and so on. Thus, audio files that correspond to these words and/or phrases are retrieved (e.g., from an audio data store, such as audio data 136). In some embodiments, a plurality of different audio files may be available that each correspond to a single word in the vocabulary. For example, the information “travel” may be associated with several different audio files, such as “drive”, “walk”, “ride”, and may also have a variety of different accents and/or voice inflections available for each word. Thus, when an audio file is requested for a single word in the vocabulary, a variety of different audio files may be available to fulfill the request. The audio data files are arranged according to the identified appropriate syntax (block 308). In the current example, the audio files are arranged to form a phrase such as “drive for two miles west on Main Street”, or “travel west on Main Street for two miles”, and so on. The arranged audio files are made available for output by the position-determining device (block 310). For example, one or more sentences and/or phrases that each correspond to a discrete travel instruction in a series of travel instructions may be stored in a buffer and provided (individually or as a group) to an audio output device when a travel instruction that corresponds to the sentences and/or phrases is relevant to a user's current position. In the current example, when the user is approaching a street where the user should make a right turn, an instruction such as “turn right in 100 meters” may be stored in a buffer and provided to an audio output device. The arranged audio files are output by the position-determining device (block 312).
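The retrieve-and-arrange steps (blocks 306-308) might be sketched as follows; the word-to-file mapping and the file names are illustrative assumptions, not part of any actual implementation:

```python
# Hypothetical audio data store: vocabulary word/phrase -> audio-file name.
AUDIO_STORE = {
    "drive for": "drive_for.wav",
    "two": "two.wav",
    "miles": "miles.wav",
    "west": "west.wav",
    "on": "on.wav",
    "main street": "main_street.wav",
}

def arrange_audio(syntax_words):
    """Retrieve one audio file per vocabulary word/phrase (block 306) and
    keep the order dictated by the identified syntax (block 308)."""
    files = []
    for word in syntax_words:
        if word not in AUDIO_STORE:
            raise LookupError(f"no audio clip available for {word!r}")
        files.append(AUDIO_STORE[word])
    return files

# The ordered list can then be buffered for output (block 310).
playlist = arrange_audio(["drive for", "two", "miles", "west", "on", "main street"])
print(playlist)
# → ['drive_for.wav', 'two.wav', 'miles.wav', 'west.wav', 'on.wav', 'main_street.wav']
```

In practice the same vocabulary word could map to several candidate files (different voices, accents, inflections), with one chosen per request as described above.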
As discussed above, some embodiments may utilize a voice package toolkit to construct and/or customize one or more parts of a voice package. In some example implementations, a voice package toolkit may include and/or utilize a scripting language to create and/or customize one or more portions of the voice package without changing the operating software used by a position-determining device. For example, the toolkit may process a script written in the scripting language to form at least a portion of the voice package (e.g., the language database). The scripting language and associated scripts may be separate from the voice package and/or comprise a portion of the voice package. The voice package, database, and/or associated audio data may be dynamically updated at any time by utilizing the toolkit, other software, or manual methods.
The voice package toolkit may also include a command line utility to process the script and build the voice package, including the database and associated audio data. A test suite may also be included for testing the phrases represented by the audio data without requiring a position-determining device. This may allow a developer or other user to hear the various combinations of audio files that they have used. In at least one embodiment, a command line utility may concatenate the audio files for each phrase into one audio file. Additionally or alternatively, a GUI application may assemble the audio files and play them for one or more phrases.
The following is an example of a script that may be used in one or more embodiments to define syntax and vocabulary for various utterances:
The individual words listed in the section above (such as ‘in’, ‘board’, and ‘ferry’ in the first entry) are the filenames for audio files (in any suitable file format), <expression> is a tag for an expression identified by the position-determining device 108, and <utterance entry> is a tag for an utterance. The above script is provided as an example only, and embodiments of the present invention may employ alternative scripts and databases, such as non-hierarchical scripts and databases that do not associate utterances to expressions.
In some embodiments, the voice package toolkit may read the contents of the script and create a language database (such as a table, listing, .vpm file, and so on) which specifies which audio files should be played for any given event and the order in which the audio files should be played. The language database and associated audio data (such as the audio files) may be transferred to a position-determining device for use using wired or wireless connections, including connections through a network such as the Internet. However, in some embodiments, the voice package toolkit and voice package may be resident on the position-determining device such that a user may change the voice syntax and other voice package data without accessing an external or separate computing device. When the operating software executed by the position-determining device needs to play an audible instruction or other utterance (e.g., phrase), it accesses the voice package to identify which audio files should be used and the order in which the audio files should be played. The identified audio files may then be played back in the specified order to the user.
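A minimal sketch of such a compilation step follows, assuming a simple line-oriented format modeled loosely on the <expression> and <utterance entry> tags described above; the exact script format of any real toolkit may differ:

```python
# Hypothetical script text; each utterance entry lists audio-file names
# in the order they should be played.
SCRIPT = """
<expression board_ferry>
  <utterance entry> in board ferry
  <utterance entry> board ferry
</expression>
"""

def compile_script(text):
    """Build a language database: expression name -> list of utterances,
    each utterance being an ordered list of audio-file names."""
    db, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("<expression"):
            # Opening tag carries the expression name.
            current = line[len("<expression"):].strip(" >")
            db[current] = []
        elif line.startswith("<utterance entry>"):
            # Remaining tokens are the audio files, in playback order.
            db[current].append(line[len("<utterance entry>"):].split())
    return db

db = compile_script(SCRIPT)
print(db["board_ferry"])
# → [['in', 'board', 'ferry'], ['board', 'ferry']]
```

At playback time the operating software would look up the event's expression in this table to learn which files to play and in which order, as the paragraph above describes.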
For each phrase, different individual sets of audio files can be specified, each given a use percentage indicating how often it should be played relative to the others. For example, for the Board Ferry instruction above, 90% of the time the first set will be played, but 10% of the time the second set will be played. For custom voices, this allows the voice to vary what is said. This keeps phrases such as a famous actor saying “I pity the fool who doesn't board the ferry” from getting old by allowing the user to only hear it 10% of the time. In some embodiments, the position-determining device can generate a random or pseudo-random number to select a particular audio file for playback instead of, or in addition to, the percentage-based functionality discussed above.
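The use-percentage selection described above can be sketched as a weighted random choice; the data layout and file names here are assumptions for illustration:

```python
import random

# Hypothetical alternatives for the Board Ferry instruction: each entry is
# (use percentage, ordered list of audio files).
BOARD_FERRY_SETS = [
    (90, ["board.wav", "the_ferry.wav"]),
    (10, ["i_pity_the_fool.wav", "board_the_ferry.wav"]),
]

def pick_audio_set(weighted_sets, rng=random):
    """Choose one set of audio files, honoring the use percentages."""
    weights = [w for w, _ in weighted_sets]
    sets = [s for _, s in weighted_sets]
    return rng.choices(sets, weights=weights, k=1)[0]

# Over many draws the first set comes up roughly 90% of the time.
picks = [tuple(pick_audio_set(BOARD_FERRY_SETS)) for _ in range(10_000)]
share = picks.count(("board.wav", "the_ferry.wav")) / len(picks)
print(round(share, 1))
# → 0.9
```

Passing a seeded `random.Random` instance as `rng` would make the selection pseudo-random and reproducible, matching the random/pseudo-random variant mentioned above.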
For each phrase, a placeholder for variable content such as a distance or an ordinal can be used ({dist1}, {dist2}, {ord1}). This allows the database to specify the correct words to use for that variable content in each phrase, since the words used can depend on the other words in the phrase, or on where the placeholder appears (e.g., changes in inflection).
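A sketch of such placeholder substitution follows; the phrase template and the per-slot word tables are hypothetical, and the point is only that the same value can map to different audio words depending on which slot it fills:

```python
# Hypothetical phrase template with a distance placeholder.
PHRASE = ["turn", "right", "in", "{dist1}"]

# Per-placeholder vocabulary: the audio words chosen for a value may differ
# by slot, e.g. a phrase-final vs. mid-phrase inflection of "meters".
SLOT_WORDS = {
    "{dist1}": {800: ["eight", "hundred", "meters_final"]},
    "{dist2}": {800: ["eight", "hundred", "meters_medial"]},
}

def fill_placeholders(phrase, values):
    """Expand each placeholder token into the slot-specific word sequence."""
    out = []
    for token in phrase:
        if token in SLOT_WORDS:
            out.extend(SLOT_WORDS[token][values[token]])
        else:
            out.append(token)
    return out

print(fill_placeholders(PHRASE, {"{dist1}": 800}))
# → ['turn', 'right', 'in', 'eight', 'hundred', 'meters_final']
```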
To provide more creativity in the use of audio files, the voice package and corresponding audio data may include random phrases and non-navigation phrases. The use of these phrases may vary based on the particular implementation or configuration of the position-determining device. For example, on long route legs, a random phrase could be spoken. These might be jokes, quips, etc. “You're doing great!” or “{snoring} Huh? What? Sorry, must have dozed off, hopefully I didn't miss our turn.”
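One possible sketch of triggering such a random phrase on a long route leg follows; the trigger condition, probability, and phrase files are all assumptions made for illustration:

```python
import random

# Hypothetical non-navigation phrases, each an ordered list of audio files.
RANDOM_PHRASES = [
    ["youre_doing_great.wav"],
    ["snoring.wav", "huh_what.wav", "sorry_dozed_off.wav"],
]

def maybe_random_phrase(leg_length_km, rng=random, min_leg_km=50, chance=0.25):
    """On a sufficiently long route leg, occasionally return a random
    phrase's audio files; otherwise return None."""
    if leg_length_km >= min_leg_km and rng.random() < chance:
        return rng.choice(RANDOM_PHRASES)
    return None

# Short legs never trigger a random phrase.
print(maybe_random_phrase(10))
# → None
```

The same hook could draw on the weighted-selection and placeholder mechanisms described earlier, so that even the interjections vary from trip to trip.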
Criteria are then specified for selecting one or more of the plurality of different phrases (e.g., utterances) to convey the information (block 506). For example, and as mentioned above, each of the phrases may be assigned a percentage value or a phrase may be selected based on a randomly or pseudo-randomly generated number. In the current example, the first phrase may be provided 25% of the time, the second phrase 25% of the time, and the third phrase 50% of the time. One or more of the plurality of phrases is selected based at least in part on the specified criteria (block 508). The selected phrase(s) is/are audibly output (e.g., by the position-determining device) (block 510).
Although the invention has been described in language specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claimed invention.
This Application claims the benefit of and priority to U.S. Provisional Application Ser. No. 61/017,218, filed Dec. 28, 2007, entitled “Method and Apparatus for Creating and Modifying Navigation Voice Syntax”, the disclosure of which is incorporated herein by reference in its entirety.