The disclosure relates to speech recognition systems, and more particularly to speech recognition systems having diverse language support.
Speech recognition systems may be used to receive and process speech input and perform a number of actions based on the speech input. For example, speech recognition systems are commonly used to provide search results based on a spoken search command. In the past, monolingual systems have been provided that recognize a single language (e.g., English or Spanish). More recently, speech recognition systems have been provided in which a user can select a single preferred language from among multiple available languages.
In one embodiment, a method for providing cross-language automatic speech recognition is provided. The method includes choosing a preferred first language for a speech recognition system. The speech recognition system supports multiple languages. A search operation is initiated using the speech recognition system. A user is prompted to continue the search operation in the first language or a second language. In response to the user selecting to continue in the second language, searching is provided in the second language and interaction with the user is provided in the first language during the search operation.
In another embodiment, an automatic speech recognition system provides cross-language automatic speech recognition and includes a computing device including one or more processors and one or more memory components. The computing device includes speech and language logic that, in response to a user initiating a search operation, prompts the user to continue the search operation in a first language or a second language and, in response to the user selecting to continue in the second language, provides searching in the second language and provides interaction with the user in the first language during the search operation.
In another embodiment, a method for providing cross-language automatic speech recognition is provided. The method includes initiating an address search operation using a speech recognition system. The speech recognition system has a preferred first language and supports at least one other language. A user is prompted to continue the address search operation in the first language or the at least one other language after the address search is initiated. In response to the user selecting to continue in the at least one other language, searching is provided in the at least one other language and interaction with the user is provided in the first language.
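By way of non-limiting illustration, the following Python sketch outlines one possible realization of the flow recited above; the class and method names (e.g., SpeechRecognitionSystem, run_search) are hypothetical assumptions for illustration and are not part of the disclosure.

    # Hypothetical sketch of the cross-language search flow described above.
    # All names and the prompt/search placeholders are illustrative assumptions.
    class SpeechRecognitionSystem:
        def __init__(self, preferred_language, supported_languages):
            self.preferred_language = preferred_language    # e.g., "fr"
            self.supported_languages = supported_languages  # e.g., ["fr", "en"]

        def prompt(self, text):
            # Interaction with the user remains in the preferred first language.
            print(f"[{self.preferred_language}] {text}")

        def search(self, query, language):
            # Placeholder for recognizing and searching the query in `language`.
            return f"results for '{query}' using the {language} inventory"

        def run_search(self, query, chosen_language):
            # Prompt the user to continue in the first or a second language.
            self.prompt("Continue in the first language or a second language?")
            results = self.search(query, chosen_language)
            # Results come from the chosen language, but the interaction
            # (prompts, confirmation) stays in the first language.
            self.prompt(f"Search complete: {results}")
            return results

    srs = SpeechRecognitionSystem("fr", ["fr", "en", "es"])
    srs.run_search("main street", chosen_language="en")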
These and additional features provided by the embodiments described herein will be more fully understood in view of the following detailed description, in conjunction with the drawings.
The embodiments set forth in the drawings are illustrative and exemplary in nature and not intended to limit the subject matter defined by the claims. The following detailed description of the illustrative embodiments can be understood when read in conjunction with the drawings, where like structure is indicated with like reference numerals.
Embodiments described herein are generally directed to speech recognition systems having diverse language support. Such speech recognition systems are configured to handle a variety of inputs, such as multiple languages and formats, and provide desired outputs based on the variety of inputs. As one example, the speech recognition systems may include logic that facilitates searching and other functions in multiple languages without changing language preferences. As another example, the speech recognition systems may include logic that facilitates searching of addresses in non-traditional formats, such as irregular house addresses with dashes or other characters.
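By way of non-limiting illustration, and assuming a simple rule-based normalization (the regular expression below is an assumption, not the disclosed logic), irregular hyphenated house numbers might be canonicalized before matching:

    import re

    # Illustrative only: reduce hyphenated ("112-10") and spoken/spaced
    # ("112 10") house-number variants to one canonical dashed form so that
    # non-traditional addresses match regardless of how they are uttered.
    def canonical_house_number(address: str) -> str:
        return re.sub(r"^(\d+)[\s-](\d+)\b", r"\1-\2", address.strip())

    assert canonical_house_number("112 10 44th Avenue") == "112-10 44th Avenue"
    assert canonical_house_number("112-10 44th Avenue") == "112-10 44th Avenue"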
Referring now to the drawings, the speech recognition system 100 may be incorporated in a vehicle 102.
The vehicle 102 also includes a vehicle computing device 114 that can provide computing functions for the speech recognition system 100. The vehicle computing device 114 may include a processor 132 and a memory component 134, which may store speech and language logic 144. The speech and language logic 144 may include a plurality of different pieces of logic, each of which may be embodied as a computer program, firmware and/or hardware, as examples. For example, the speech and language logic 144 may have access to phonetic data saved in the memory component 134 for supporting a variety of languages, such as English, French and Spanish. The speech and language logic 144 may also have access to non-traditional addresses and address formats.
The speech recognition system 100 includes one or more processors 132, a communication path 204, one or more memory components 134, the display 124, the speaker 122, tactile input hardware 126a, the peripheral tactile input 126b, the microphone 120, the activation switch 128, network interface hardware 218, and a satellite antenna 230. The various components of the speech recognition system 100 and the interaction thereof will be described in detail below.
As noted above, the speech recognition system 100 includes the communication path 204. The communication path 204 may be formed from any medium that is capable of transmitting a signal such as, for example, conductive wires, conductive traces, optical waveguides, or the like. Moreover, the communication path 204 may be formed from a combination of mediums capable of transmitting signals. In one embodiment, the communication path 204 comprises a combination of conductive traces, conductive wires, connectors, and buses that cooperate to permit the transmission of electrical data signals to components such as processors, memories, sensors, input devices, output devices, and communication devices. Accordingly, the communication path 204 may comprise a vehicle bus, such as for example a LIN bus, a CAN bus, a VAN bus, and the like. Additionally, it is noted that the term “signal” means a waveform (e.g., electrical, optical, magnetic, mechanical or electromagnetic), such as DC, AC, sinusoidal-wave, triangular-wave, square-wave, vibration, and the like, capable of traveling through a medium. The communication path 204 communicatively couples the various components of the speech recognition system 100. As used herein, the term “communicatively coupled” means that coupled components are capable of exchanging data signals with one another such as, for example, electrical signals via conductive medium, electromagnetic signals via air, optical signals via optical waveguides, and the like.
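By way of non-limiting illustration, a data signal might be placed on such a vehicle bus using, for example, the third-party python-can package; the channel name and arbitration identifier below are assumptions for illustration only, and the virtual interface is used so the sketch runs without vehicle hardware.

    import can  # third-party python-can package, shown for illustration

    # Illustrative only: transmit one frame on a CAN-style communication path.
    # Swap interface="virtual" for interface="socketcan" on a real vehicle bus.
    bus = can.interface.Bus(interface="virtual", channel="vbus0")
    message = can.Message(arbitration_id=0x123,
                          data=[0x01, 0x02, 0x03],
                          is_extended_id=False)
    bus.send(message)  # electrical data signal transmitted to other modules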
As noted above, the speech recognition system 100 includes the one or more processors 132. Each of the one or more processors 132 may be any device capable of executing machine readable instructions (e.g., including the speech and language logic). Accordingly, each of the one or more processors 132 may be a controller, an integrated circuit, a microchip, a computer, or any other computing device. The one or more processors 132 are communicatively coupled to the other components of the speech recognition system 100 by the communication path 204. Accordingly, the communication path 204 may communicatively couple any number of processors with one another, and allow the modules coupled to the communication path 204 to operate in a distributed computing environment. Specifically, each of the modules may operate as a node that may send and/or receive data.
As noted above, the speech recognition system 100 includes the one or more memory components 134. Each of the one or more memory components 134 of the speech recognition system 100 is coupled to the communication path 204 and communicatively coupled to the one or more processors 132. The one or more memory components 134 may include RAM, ROM, flash memories, hard drives, or any device capable of storing machine readable instructions such that the machine readable instructions can be accessed and executed by the one or more processors 132. The machine readable instructions may comprise logic or algorithm(s) written in any programming language of any generation (e.g., 1GL, 2GL, 3GL, 4GL, or 5GL) such as, for example, machine language that may be directly executed by the processor, or assembly language, object-oriented programming (OOP), scripting languages, microcode, etc., that may be compiled or assembled into machine readable instructions and stored on the one or more memory components 134. Alternatively, the machine readable instructions may be written in a hardware description language (HDL), such as logic implemented via either a field-programmable gate array (FPGA) configuration or an application-specific integrated circuit (ASIC), or their equivalents. Accordingly, the methods described herein may be implemented in any conventional computer programming language, as pre-programmed hardware elements, or as a combination of hardware and software components.
In some embodiments, the one or more memory components 134 may include one or more speech recognition algorithms, such as an automatic speech recognition engine that processes speech input signals received from the microphone 120 and/or extracts speech information from such signals, as will be described in further detail below. Furthermore, the one or more memory components 134 may include machine readable instructions that, when executed by the one or more processors 132, cause the speech recognition system 100 to perform the actions described below.
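By way of non-limiting illustration, such a stored speech recognition algorithm might expose an interface along the following lines; the AsrEngine class and its lookup-based behavior are assumptions for illustration, not the disclosed engine.

    # Hypothetical interface sketch: microphone signal frames in, transcript
    # out. A real engine would run acoustic and language models; a lookup
    # table stands in here so the sketch remains self-contained and runnable.
    class AsrEngine:
        def __init__(self, phonetic_inventory):
            self.inventory = phonetic_inventory  # language-specific data

        def extract_speech(self, signal_frames):
            words = [self.inventory.get(f, "") for f in signal_frames]
            return " ".join(w for w in words if w)

    engine = AsrEngine({"frame-a": "hello", "frame-b": "world"})
    print(engine.extract_speech(["frame-a", "frame-b"]))  # -> "hello world"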
The speech recognition system 100 includes the speaker 122 for transforming data signals from the speech recognition system 100 into mechanical vibrations, such as in order to output audible prompts or audible information from the speech recognition system 100. The speaker 122 is coupled to the communication path 204 and communicatively coupled to the one or more processors 132. However, it should be understood that in other embodiments the speech recognition system 100 may not include the speaker 122, such as in embodiments in which the speech recognition system 100 does not output audible prompts or audible information, but instead visually provides output via the display 124.
The speech recognition system 100 may include the peripheral tactile input 126b coupled to the communication path 204 such that the communication path 204 communicatively couples the peripheral tactile input 126b to other modules of the speech recognition system 100. For example, in one embodiment, the peripheral tactile input 126b is located in a vehicle console to provide an additional location for receiving input. The peripheral tactile input 126b operates in a manner substantially similar to the tactile input hardware 126a, i.e., the peripheral tactile input 126b includes movable objects and transforms motion of the movable objects into a data signal that may be transmitted over the communication path 204.
As noted above, the speech recognition system 100 includes the microphone 120 for transforming acoustic vibrations received by the microphone into a speech input signal. The microphone 120 is coupled to the communication path 204 and communicatively coupled to the one or more processors 132. As will be described in further detail below, the one or more processors 132 may process the speech input signals received from the microphone 120 and/or extract speech information from such signals.
As noted above, the speech recognition system 100 includes the network interface hardware 218 for communicatively coupling the speech recognition system 100 with a mobile device 220 or a computer network. The network interface hardware 218 is coupled to the communication path 204 such that the communication path 204 communicatively couples the network interface hardware 218 to other modules of the speech recognition system 100. The network interface hardware 218 can be any device capable of transmitting and/or receiving data via a wireless network. Accordingly, the network interface hardware 218 can include a communication transceiver for sending and/or receiving data according to any wireless communication standard. For example, the network interface hardware 218 may include a chipset (e.g., antenna, processors, machine readable instructions, etc.) to communicate over wireless computer networks such as, for example, wireless fidelity (Wi-Fi), WiMax, Bluetooth, IrDA, Wireless USB, Z-Wave, ZigBee, or the like. In some embodiments, the network interface hardware 218 includes a Bluetooth transceiver that enables the speech recognition system 100 to exchange information with the mobile device 220 (e.g., a smartphone) via Bluetooth communication.
The mobile device 220 may be communicatively coupled to a cellular network 222.
The cellular network 222 generally includes a plurality of base stations that are configured to receive and transmit data according to mobile telecommunication standards. The base stations are further configured to receive and transmit data over wired systems such as the public switched telephone network (PSTN) and backhaul networks. The cellular network 222 can further include any network accessible via the backhaul networks such as, for example, wide area networks, metropolitan area networks, the Internet, satellite networks, or the like. Thus, the base stations generally include one or more antennas, transceivers, and processors that execute machine readable instructions to exchange data over various wired and/or wireless networks.
Accordingly, the cellular network 222 can be utilized as a wireless access point by the mobile device 220 to access one or more servers (e.g., a first server 224 and/or a second server 226). The first server 224 and second server 226 generally include processors, memory, and a chipset for delivering resources via the cellular network 222. Resources can include, for example, processing, storage, software, and information provided from the first server 224 and/or the second server 226 to the speech recognition system 100 via the cellular network 222. Additionally, it is noted that the first server 224 and the second server 226 can share resources with one another over the cellular network 222 such as, for example, via the wired portion of the network, the wireless portion of the network, or combinations thereof.
The speech recognition system 100 may include a satellite antenna 230 coupled to the communication path 204 such that the communication path 204 communicatively couples the satellite antenna 230 to other modules of the speech recognition system 100. The satellite antenna 230 is configured to receive signals from global positioning system satellites. Specifically, in one embodiment, the satellite antenna 230 includes one or more conductive elements that interact with electromagnetic signals transmitted by global positioning system satellites. The received signal is transformed by the one or more processors 132 into a data signal indicative of the location (e.g., latitude and longitude) of the satellite antenna 230 or an object positioned near the satellite antenna 230. Additionally, it is noted that the satellite antenna 230 may include at least one of the one or more processors 132 and the one or more memory components 134. In embodiments where the speech recognition system 100 is coupled to a vehicle, the one or more processors 132 execute machine readable instructions to transform the global positioning satellite signals received by the satellite antenna 230 into data indicative of the current location of the vehicle. While the speech recognition system 100 includes the satellite antenna 230 in the depicted embodiment, in other embodiments the speech recognition system 100 may not include the satellite antenna 230.
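By way of non-limiting illustration, the transformation from a received satellite signal to latitude and longitude might resemble the following parse of a standard NMEA GGA sentence; the GGA format is standard, but the parser itself is an assumption rather than the disclosed implementation.

    # Illustrative only: convert the ddmm.mmmm / dddmm.mmmm fields of a
    # standard NMEA GGA sentence into signed decimal degrees.
    def nmea_to_decimal(value: str, hemisphere: str) -> float:
        dot = value.index(".")
        degrees = float(value[:dot - 2])   # digits before the minutes
        minutes = float(value[dot - 2:])   # mm.mmmm
        decimal = degrees + minutes / 60.0
        return -decimal if hemisphere in ("S", "W") else decimal

    fields = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M".split(",")
    latitude = nmea_to_decimal(fields[2], fields[3])    # ~48.1173
    longitude = nmea_to_decimal(fields[4], fields[5])   # ~11.5167
    print(latitude, longitude)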
The speech and language logic 144 may have access to language inventories 240, 242 and 244 stored in the one or more memory components 134.
The language inventories 240, 242 and 244 may be formed of one or more component inventories, and may generally include vocabulary data and phonetic data. Phonetic data links words to their pronunciations and is used by the speech and language logic 144 to identify words based on the spoken commands of the user. Each language inventory 240, 242 and 244 may be associated with a different language. For example, language inventory 240 may be associated with English, language inventory 242 may be associated with French and language inventory 244 may be associated with Spanish. While only three language inventories are shown, more or fewer than three language inventories may be used and associated with any of the languages spoken around the world. Further, while the inventories are shown separately for illustration, they may be combined. Customized language inventories may also be created and used.
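By way of non-limiting illustration, a language inventory might be organized as follows; the structure and the phoneme strings are assumptions for illustration rather than the disclosed format.

    # Hypothetical shape of the language inventories 240, 242 and 244:
    # vocabulary entries mapped to candidate pronunciations per language.
    LANGUAGE_INVENTORIES = {
        "en": {"street": ["S T R IY T"], "avenue": ["AE V AH N UW"]},
        "fr": {"rue": ["R Y"], "avenue": ["A V @ N Y"]},
        "es": {"calle": ["K A J E"]},
    }

    def pronunciations(language: str, word: str) -> list:
        # Phonetic data links a word to its pronunciations for recognition.
        return LANGUAGE_INVENTORIES.get(language, {}).get(word, [])

    print(pronunciations("en", "street"))  # -> ['S T R IY T']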
The speech recognition system 100 may provide cross-language automatic speech recognition (ASR) capabilities via user-driven commands that cause the speech and language logic 144 to switch between the language inventories 240, 242 and 244 (e.g., from a preferred language inventory to a new language inventory) for recognizing the voice input. For example, a French-speaking user having French as the preferred language for the speech recognition system 100 may have an opportunity to speak English commands upon prompting by the speech recognition system 100 and acknowledgement by the user. Such an arrangement can facilitate various input-driven features, such as searching for terms or addresses in a different language using map data 246, despite having another language as the preferred language. In some embodiments, although a different language inventory 240, 242, 244 may be used for ASR, the preferred language may continue to be used for output to the user, such as for display or sound output.
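By way of non-limiting illustration, the user-driven switch between inventories might be sketched as follows, with recognition performed against the chosen inventory while output remains in the preferred language; all names below are illustrative assumptions.

    # Illustrative only: recognize with the chosen (second) language
    # inventory, respond in the preferred first language.
    def cross_language_search(audio, preferred, chosen, recognize, respond):
        text = recognize(audio, chosen)            # ASR in the chosen language
        respond(f"[{preferred}] result: {text}")   # output stays in preferred
        return text

    # Tiny stand-ins so the sketch runs end to end.
    recognize = lambda audio, lang: f"'{audio}' decoded with the {lang} inventory"
    cross_language_search("main street", "fr", "en", recognize, print)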
The above-described speech recognition systems can handle a variety of inputs, such as multiple languages and formats, and provide desired outputs based on the variety of inputs. The speech recognition systems may include logic that facilitates searching and other functions in multiple languages without changing language preferences. In some embodiments, the speech recognition systems may include logic that facilitates searching of addresses in non-traditional formats, such as irregular house addresses with dashes or other characters.
While particular embodiments have been illustrated and described herein, it should be understood that various other changes and modifications may be made without departing from the spirit and scope of the claimed subject matter. Moreover, although various aspects of the claimed subject matter have been described herein, such aspects need not be utilized in combination. It is therefore intended that the appended claims cover all such changes and modifications that are within the scope of the claimed subject matter.