The invention relates to a method of speech-based navigation and to a method of implementing a speech input possibility in private information units for speech-based navigation in a communications network.
The distribution of information via networks becomes ever more complex. As a result, the Internet obtains growing importance as a communications network. To access information from the Internet it is important to utilize respective aiding means which simplify the finding of information.
Man's most common means of communication is speech. Utilizing speech as an input medium for communication with a computer does present some difficulties, however. A program executing speech recognition, denoted a speech recognizer in the following, has to be adapted, on the one hand, to the vocabulary which it is to understand and, on the other hand, to the speaker's pronunciation. Achieving satisfactory recognition results requires costly training. Speech recognition further requires a powerful computer, a prerequisite that most computers by which users invoke information units do not satisfy. Local speech recognition systems are mostly arranged for only one user, who must carry out costly training of the vocabulary he uses, as described above.
DE 44 40 598 C1 describes a hypertext navigation system controlled by spoken words. A local speech recognizer, to which lexicons and probability models are assigned for supporting acoustic speech recognition of the hyperlinks of the hypertext documents, makes it possible to control a browser or viewer. The system permits a pronunciation of links during which the speech recognition is adapted to the links to be recognized, without these links having to be known beforehand. For this purpose, the hypertext documents contain additional data which are necessary for adapting the speech recognizer. These additional data are either generated in the calling user system, or assigned to the hypertext documents by the provider and co-transmitted when retrieved by the user system.
DE 197 07 973 A1 discloses a method of executing actions by means of speech input on a computer in a network system, more particularly, the Internet. For this purpose, the user's computer includes a local speech recognizer whose parameters are defined by the respective service provider for executing the speech recognition process and are transmitted from the service provider to the user when the user so requests.
Such local speech recognition systems require a powerful computer, and their flexibility with respect to the vocabulary is limited. Increasing the flexibility increases the amount of data to be transmitted, because the parameters necessary for tuning the local speech recognizer on the local computer must be transmitted. Transmitting a large amount of data over a connection with limited transmission capacity, however, takes considerably more time.
Therefore, it is an object of the invention to enable a speech-based navigation for information units to predefined web sites. According to the invention this object is achieved in that a client downloads from a server a private information unit that enables a speech input, and a speech recognizer generates a recognition result from an uttered speech input, and with the recognition result a link in a data file is determined, which link is assigned to a word that correlates with the recognition result.
A user program, mostly denoted a browser or viewer, is executed on a client to indicate and display the information units. The calling client is connected via a respective connection in a communications network to a server of a service provider, which server enables access to, for example, the Internet. An information unit is invoked by keying in an IP address or a URL (Uniform Resource Locator). A further possibility of invoking information is provided by links or hyperlinks. These links have a different color from the rest of the text or are underlined. By clicking such a link with the mouse, the information unit that goes with the link is invoked. Indicating information units and invoking further information units based on the information unit then indicated is called navigating. The information in the form of information units is offered and made accessible on the Internet by service providers and firms. Private information units, specifically called home pages, are also increasingly offered on the Internet. The respective owner or maker of the home page puts interesting information on this home page. Such home pages mostly contain details about the person and contributions on hobbies with, for example, photos. Furthermore, the owners of the home pages often indicate important links which a visitor to the home page should also have a look at. Firms can also create home pages and make them accessible on the Internet; mostly the first web page of a web site is called the home page, from which a user can navigate to other company-specific web pages.
A client downloads a private information unit from a server which is connected to the client through the communications network. This information unit is indicated to a user by means of a browser. The user is requested, for example, by information shown, to give a speech input. This speech input is transferred to a speech recognition server and fed there to a speech recognizer which carries out a speech recognition process. The recognition result produced by the speech recognizer is sent back to the client. The client transmits the recognition result to a data file. This data file is situated on a data file server on which a link correlating with the speech utterance is determined. The speech utterance then corresponds to a word to which a link is assigned.
In a further embodiment of the invention there is provided that the private information unit contains a user identifier. A recognition result produced by the speech recognizer from a speech input uttered by a user is transmitted with the user identifier to the data file. In the data file a link is determined with the aid of the recognition result and the user identifier. The data file contains assignments of links to words or user identifiers. In the case where there is correlation between a word from the assignment to the respective user identifier and the recognition result, the assigned link is returned to the client.
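The lookup in the data file described above can be illustrated with a minimal sketch. It assumes an in-memory dictionary in place of the data file, the hypothetical user identifiers "ID1" and "ID2", and a simplified notion of correlation (a case-insensitive exact match); the words and links are the examples used elsewhere in this description.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-in for the data file:
# user identifier -> {predefined word: assigned link}.
ASSIGNMENTS: Dict[str, Dict[str, str]] = {
    "ID1": {"hobby": "www.sport.de", "books": "www.books.de"},
    "ID2": {"books": "www.bookworm.de"},
}


def determine_link(user_id: str, recognition_result: str) -> Optional[str]:
    """Return the link assigned to the recognized word for this user, if any."""
    words = ASSIGNMENTS.get(user_id, {})
    # "Correlation" is simplified here to a case-insensitive exact match.
    return words.get(recognition_result.strip().lower())
```

With these assumed entries, the same recognition result "books" yields a different link depending on which user identifier accompanies it.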
The determined link can be directly returned to the client, so that the user is to invoke the respective link himself. It proves to be highly advantageous, however, for the data file server to activate the determined link and for the connected information unit to be delivered and indicated to the client.
In a further embodiment of the invention it proves to be advantageous to give the private information unit an address of a speech recognition server on the Internet. This address is transmitted to the client when the private information unit is invoked. Speech inputs uttered by the user are then transmitted through the communications network to a speech recognizer on the speech recognition server, which speech recognizer then carries out the speech recognition. The recognition result produced by the speech recognizer is transmitted to the client. The higher calculation power of such a speech recognizer is advantageous when the recognition result is produced on a speech recognition server. These speech recognizers are specialized and have a specially tailored vocabulary, so that a speaker-independent speech recognition is possible. This achieves that there is a higher recognition rate and that the recognition result is available more rapidly.
In a further embodiment there is provided to locally carry out the speech recognition on the computer. For simple applications with a limited vocabulary and a sufficiently powerful computer, the speech recognition is executed locally on the client. As a result, there is no need to transmit to a remote speech recognizer, so that transmission errors are reduced. Furthermore, it is an object of the invention to implement a speech input possibility for home pages without the use of a local speech recognizer.
The object of implementing a possibility of speech input in home pages without the use of a local speech recognizer is achieved in that a registration information unit is downloaded from a server by means of a client, by means of which registration information unit user-specific links are assigned to predefined words, and the assignment with a user identifier is transmitted to a data file and in which the user identifier and an address of a speech recognizer, which can each be combined with a private information unit, are transmitted to the client.
A user who would like to implement a speech input possibility in his private information unit downloads a registration information unit from a server. On this registration information unit respective links are assigned to words predefined by the user. The assignment takes place by means of the keyboard and/or the mouse. When doing so the user assigns these links, which are connected to respective information units on the Internet, according to his own ideas. This user-specific assignment of words to personal links is transmitted to a data file. The data file stores this assignment linked with a user identifier. The user identifier and an address of a speech recognition server on which the speech recognizer is provided, are then transmitted to the client. This user identifier and the address of the speech recognizer are combined with this private information unit by the user of the client who is also denoted the owner/maker of the private information unit. By storing the assignment on the data file server with the individual user identifier and combining the user identifier with the private information unit, a speech input possibility in private information units is implemented. The maker of the home page enables the visitors to his home page to speak the respective predefined words and thus arrive by speech input at the information unit assigned by him per link, without the visitors executing a local speech recognition program on the invoking client.
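The registration step can be sketched as follows. This is only an illustration under stated assumptions: the class name DataFile, the identifier scheme based on uuid, and the placeholder address "recognizer.example" are hypothetical and are not specified in this description.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumed placeholder; the description does not name a concrete recognizer address.
RECOGNIZER_ADDRESS = "recognizer.example"


@dataclass
class DataFile:
    """Stores word-to-link assignments under a user identifier."""
    assignments: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def register(self, word_links: Dict[str, str]) -> Tuple[str, str]:
        """Store an assignment and return (user identifier, recognizer address)."""
        if not word_links:
            raise ValueError("at least one word must be assigned to a link")
        user_id = uuid.uuid4().hex  # hypothetical identifier scheme
        self.assignments[user_id] = {
            word.lower(): link for word, link in word_links.items()
        }
        return user_id, RECOGNIZER_ADDRESS
```

The returned pair corresponds to the two items the owner of the private information unit combines with his home page.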
In a further embodiment of the invention there is provided that the speech recognizer recognizes not only the predefined words. The speech recognizer also recognizes user-independent words. A service provider assigns a respective user-independent link to these user-independent words. Always when the speech recognizer produces a recognition result from a speech utterance, which result correlates with a user-independent word, a user-independent link is returned to the client to which the service provider assigned the respective user-independent word. It is also possible not to return the user-independent link to the client, but to send to the client directly the information unit connected with the user-independent link.
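A minimal sketch of resolving a recognition result against both kinds of words follows. The order of precedence (user-specific words first, then user-independent words) is an assumption made for the sketch; this description does not state which assignment wins if a word appears in both.

```python
from typing import Dict, Optional

# Links assigned by the service provider; they apply regardless of the home page.
USER_INDEPENDENT: Dict[str, str] = {
    "politics": "www.politics.de",
    "telephone directory": "www.number.de",
}


def resolve(word: str, user_words: Dict[str, str]) -> Optional[str]:
    """Look up a recognized word, first per user, then user-independently."""
    key = word.strip().lower()
    if key in user_words:  # assumed precedence, not stated in the description
        return user_words[key]
    return USER_INDEPENDENT.get(key)
```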
In a preferred embodiment of the invention there is provided to check, on the one hand, when the registration information unit is invoked and, on the other hand, when the private information unit in the speech input possibility is invoked, whether a software module is executed on the respectively invoking client. This software module executes a feature extraction. The speech input data which are led to this software module by means of an input medium, for example, a microphone and are available as an electric signal, are quantized by this software module and subjected to respective analyses which produce components which are assigned to feature vectors. These feature vectors are thereafter transmitted to the coupled speech recognizers. The software module furthermore takes over the handling of the transmission of the feature vectors and the reception of the recognition result and the transmission of the user identifier and recognition result to the data file server and the reception of the link. When the software module is not available, it is also downloaded from the server on which the information units to be invoked are stored.
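The quantization and feature extraction performed by the software module might look roughly like the sketch below. The concrete analyses are not specified in this description; log energy and zero-crossing rate per frame are used here purely as stand-ins, and the frame length, hop size and 16-bit depth are assumed values.

```python
import math
from typing import List, Sequence


def quantize(samples: Sequence[float], bits: int = 16) -> List[int]:
    """Map samples in [-1.0, 1.0] to signed integers of the given bit depth."""
    top = 2 ** (bits - 1) - 1
    return [int(round(max(-1.0, min(1.0, s)) * top)) for s in samples]


def feature_vectors(pcm: Sequence[int], frame_len: int = 160,
                    hop: int = 80) -> List[List[float]]:
    """Split the signal into overlapping frames; emit [log energy, zero-crossing rate]."""
    vectors = []
    for start in range(0, len(pcm) - frame_len + 1, hop):
        frame = pcm[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        vectors.append([math.log(energy + 1.0), crossings / (frame_len - 1)])
    return vectors
```

Only the comparatively small feature vectors, not the raw signal, would then be sent to the coupled speech recognizer.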
For users of a client who do not have their own home page and, in consequence, cannot combine the user identifier and the address of a speech recognizer with this home page, there is provided to transmit to these users an information unit containing both the individual user identifier and an address of a speech recognizer. This information unit is indicated by the browser executed on the client and enables the user to invoke the information units per speech input via the links to which he has assigned respective predefined words and which were assigned to user-independent words by the service provider.
It proves to be advantageous when the data file, in which the assignment is stored with the user identifiers, and the speech recognizer are located on one server. This is advantageous in that the recognition result need not first be transmitted to the client again and from there to the data file server, but the recognition result is directly transmitted to the common server of the data file. The respective user identifier is then transmitted to the common server together with the feature vectors. This saves on delay and at the same time minimizes the error probability as a result of transmission errors that occur.
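The saving in transmissions can be made concrete with a small sketch that counts network messages for both arrangements. The recognizer is an injected stub and the data file is a dictionary keyed by (user identifier, word); both are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional, Tuple

FeatureVectors = List[List[float]]
Recognizer = Callable[[FeatureVectors], str]
DataFile = Dict[Tuple[str, str], str]


def separate_servers(features: FeatureVectors, user_id: str, recognize: Recognizer,
                     data_file: DataFile) -> Tuple[Optional[str], int]:
    """Recognizer and data file on different servers: four network messages."""
    messages = 1                              # client -> recognizer: feature vectors
    result = recognize(features)
    messages += 1                             # recognizer -> client: recognition result
    messages += 1                             # client -> data file: result + user identifier
    link = data_file.get((user_id, result))
    messages += 1                             # data file -> client: link
    return link, messages


def common_server(features: FeatureVectors, user_id: str, recognize: Recognizer,
                  data_file: DataFile) -> Tuple[Optional[str], int]:
    """Recognizer and data file on one server: two network messages."""
    messages = 1                              # client -> server: features + user identifier
    result = recognize(features)              # the result never leaves the server
    link = data_file.get((user_id, result))
    messages += 1                             # server -> client: link
    return link, messages
```

Under these assumptions the common server halves the number of transmissions, which is where the reduced delay and the lower exposure to transmission errors come from.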
Furthermore, the object of the invention is achieved by means of a software module which assigns the speech input data to feature vectors. This software module transmits the feature vectors to the speech recognizers laid down in the address. The recognition result produced by the speech recognizer is received from this software module and transmitted to a data file together with the user identifier. A determined link is received from the software module and invoked, so that the information unit connected with the link is offered to the user of the invoking client.
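The sequence of steps handled by the software module can be sketched as a small class. All four callables are hypothetical stand-ins that would wrap the feature extraction, the network calls to the speech recognizer and the data file, and the browser; none of these names comes from this description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

FeatureVectors = List[List[float]]


@dataclass
class SoftwareModule:
    """Client-side glue; every callable is an injected stand-in."""
    user_id: str
    extract: Callable[[List[int]], FeatureVectors]   # speech input data -> feature vectors
    recognize: Callable[[FeatureVectors], str]       # remote speech recognizer
    lookup: Callable[[str, str], Optional[str]]      # data file: (user id, word) -> link
    invoke: Callable[[str], None]                    # e.g. ask the browser to load the link

    def handle_speech_input(self, pcm: List[int]) -> Optional[str]:
        """Run one speech input through recognition and link determination."""
        features = self.extract(pcm)
        result = self.recognize(features)
        link = self.lookup(self.user_id, result)
        if link is not None:
            self.invoke(link)
        return link
```

Injecting the callables keeps the sketch independent of any particular transport, so the same flow applies whether the recognizer and the data file sit on separate servers or on one.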
In a preferred embodiment of the invention the software module is activated by means of an operating element. Activating this operating element represented, for example, as a button will start the recording of speech input data.
The object of the invention is also achieved by a computer on which a software module described above is executed.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
The clients 1 and 2 are computers from where users invoke information units in the following to be referenced as home pages and/or web pages by means of a browser executed there. The information units which are put on the Internet by companies are denoted web sites. The input information unit of such a web site and information units of private persons are denoted home pages. A web site is understood to mean a collection of web pages which belong together. These home pages and web sites are stored, for example, on a server 6.
The speech recognition server 3 is a powerful computer on which a speech recognition program is executed. This speech recognition server 3 has an application-specific vocabulary, and its architecture is optimized for speech recognition. The data file server 5 is also a computer which is connected to the Internet 4. Assignments are stored on this data file server 5.
Each assignment 25-26 contains at least one word that is assigned to a respective link. The client 2 is furthermore connected to a speech recognition server 3 through the Internet 4. The connections 28 and 29 each represent a possible direct connection from the server 6 to the data file server 5 and from the speech recognition server 3 to the data file server 5. A determined link is directly transmitted from the data file server 5 to the server 6 via such a connection 28.
It is also possible to transmit the recognition result directly from a speech recognizer 8 to the data file server 5 via the connection 29. Then the client 2 also transmits user identifier IDn in addition to the feature vectors to the speech recognizer 8.
For starting a speech recording, the user activates with his mouse or keyboard a button 24 and utters a speech input. This speech input is subdivided into feature vectors as described earlier. The feature vectors are sent from the software module 21 to a defined speech recognizer 8 on the Internet 4. The speech recognizer 8 receives the feature vectors and produces a recognition result by means of a speech recognition program.
Alternatively, it is possible to select from a large number of predefined words, by marking check boxes with the mouse, a certain subset of words to which respective links are assigned. For verifying the predefined words it is possible that the creator enters the assigned words via speech input. These words are then transmitted to the speech recognizer 8 and recognized. The recognition result is returned to the client 1.
The speech recognizer recognizes not only the predefined words 41-43, but also user-independent words 47. The creator of the home page 27 assigns a link 44-46 to the predefined words 41-43. On the other hand, the service provider, for example the provider of the speech recognizer 8 or of the server 6, assigns links 48 to the user-independent words 47.
For this user-independent assignment it is necessary for the speech recognizer 8 also to recognize these user-independent words 47. The words 41-43, 47 that are recognized by the speech recognizer 8 are laid down by the provider of the speech recognizer 8.
When a user of a client does not have a home page 27 and does not wish to create one either, it is nevertheless possible for him to navigate to predefined information units via a speech input. To this end, the user effects the assignment on the registration information unit 19, and the assignment is then transmitted to the data file server 5 to be stored under a user identifier IDn. The data file server 5 then transmits a data file that can be displayed by the browser 20 and that contains the user identifier IDn and the address of the speech recognizer. The user, when invoking this data file, can navigate by speech input to the web pages determined by him or by the service provider.
In the simplest case, the data file 5 with the assignments 25-26 can also be stored on the server 6 on which the home page 27 of the creator is stored, and the speech recognizer 8 can be arranged there as well. This arrangement is not shown. In such a case the feature vectors are transmitted with the user identifier IDn from the client 2 to this single server 6. The recognition result produced by the speech recognizer 8 is passed directly, together with the user identifier IDn, to the data file 5 on the server 6, in which file the link assigned to this recognition result and this user identifier IDn is determined. This link is then either returned to the client 2, or the web site combined with this link is transmitted to the client 2.
The creator of a speech-based home page 27 assigns on a registration information unit 19 the following links to predefined words: “hobby→www.sport.de”; “books→www.books.de”; “studies→www.uni.de”. This assignment is transmitted from the client 1 to the data file server 5. There the user of the client 1 is registered in that he receives an individual user identifier IDn, and his assignment 25-26 is stored on the data file server 5. The user identifier granted to him is transmitted to the client 1, for example in the form of an e-mail, together with an address of the speech recognizer. The creator of the speech-based home page 27 combines both the user identifier IDn and the address of the speech recognizer 8 with his private home page 27. This home page is then, for example, stored on the server 6. In addition to the words 41-43 assigned by the creator, the service provider combines user-independent words 47 with user-independent links 48; for example, “politics→www.politics.de” or “telephone directory→www.number.de”. The user of the client 2 accesses the creator's private home page 27, which is shown on the client 2 by the browser 20.
By means of a click of the mouse the user activates the button 24 and gives a speech input.
The word “books” spoken by the user is subdivided by the software module 21 into feature vectors which are then sent to the speech recognizer 8 known from the transmitted address.
There a recognition result is produced from the speech input “books” and sent back to the client 2. This recognition result is transmitted together with the user identifier IDn to the data file 5, in which the link www.books.de is determined from the creator's user identifier IDn and the recognition result. This link is transmitted to the client 2 and activated by the client 2. The web site connected with the link www.books.de is then displayed on the client 2. When the user of the client 2 pronounces “politics”, the web site www.politics.de will be displayed. When the user of the client 2 invokes a private home page of a second creator and this second creator has combined the word “books” with www.bookworm.de, the web site www.bookworm.de will be displayed when “books” is pronounced. With a speech input of the user-independent word “politics”, on the other hand, the same web site will be invoked as from the private home page 27 of the first creator.
When a speech input possibility is implemented in the home page of a company web site, the creator assigns links to web pages from the entire web site. As a result, it is possible to reach the web pages of the individual departments of a company by speech. The speech recognizer is matched to the vocabulary of a company via the predefined words. The specific vocabulary may contain, for example, product names, so that a visitor of such a speech-based company home page is shown the respective web pages on his client by pronouncing the product names or brand names in which he takes an interest.
The user-independent words can be assigned to interested parties by means of commercial transactions, so that when the user-independent word is pronounced, the web page of the interested party is automatically invoked or activated. This link is effected by the provider of the speech recognizer who has to take care that this user-independent word is sold or rented to only one interested party. The web page of the interested party may also be linked with a plurality of words so that, for example, with connotations belonging to a theme always the same web page is invoked. The user-independent words may be temporarily issued to interested parties. In addition, it is possible to invoke or activate such a web page via a speech utterance which is recognized in different languages.
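The exclusive and temporary issuing of user-independent words could be tracked along the lines of the sketch below. The class WordRegistry, its method names and the date-based lease model are assumptions made for illustration; this description only requires that a word is held by one interested party at a time and that it may be issued temporarily.

```python
from datetime import date
from typing import Dict, Optional, Tuple


class WordRegistry:
    """Tracks which interested party currently holds each user-independent word."""

    def __init__(self) -> None:
        # word -> (party, link, last day of the lease)
        self._leases: Dict[str, Tuple[str, str, date]] = {}

    def lease(self, word: str, party: str, link: str, until: date, today: date) -> bool:
        """Grant the word unless another party holds an unexpired lease on it."""
        key = word.lower()
        current = self._leases.get(key)
        if current and current[0] != party and current[2] >= today:
            return False
        self._leases[key] = (party, link, until)
        return True

    def link_for(self, word: str, today: date) -> Optional[str]:
        """Return the link for a word if its lease is still valid on the given day."""
        lease = self._leases.get(word.lower())
        if lease is None or lease[2] < today:
            return None
        return lease[1]
```

Because each word is keyed separately, one party can hold several words that all point to the same web page, while a second party is refused a word until the first lease has expired.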
In order to guarantee such a function, the provider of the speech recognizer makes the respective word or speech utterance, or its pronunciation in the different languages, known to the speech recognizer. A user of a speech-based web site then effects a respective speech input. This is recognized by the speech recognizer and the produced recognition result is sent back to the invoking client. The recognition result is sent with the user identifier, where appropriate, to the data file in which the assigned link is determined; either the link is sent back to the client, or the web page connected with the link is transmitted to the client.
Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the true scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
DE 19926213.6 | Jun 1999 | DE | national |
DE 19930407.6 | Jul 1999 | DE | national |
This application is a continuation of U.S. patent application Ser. No. 09/387,627, filed Aug. 31, 1999, which in turn claimed foreign priority under 35 U.S.C. §119 from German Patent Application 19926213.6, filed Jun. 9, 1999, and from German Patent Application 19930407.6, filed Jul. 2, 1999.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 09387627 | Aug 1999 | US |
Child | 10960775 | Oct 2004 | US |