The invention relates to a computer system, a method and a digital storage medium for multilingual associative searching.
Associative searching is a method which is known per se from the prior art. In contrast to conventional database queries, which use prescribed query methods, associative searching does not use a prescribed query language to formulate a search query, but rather a text passage. The user can use the text passage to describe the contents of a search query in his own words or sentences.
This text-passage-based type of search relies either on previously stipulated algorithms or on a neural network which has been trained beforehand. The neural network is trained using preclassified example documents. In this context, the text of an example document serves as an input parameter for the neural network, and the classification ascertained by the neural network is aligned with the prescribed classification in order to train the neurons.
An appropriate piece of software for associative searching is commercially available from SER Systems AG, SER brainware (www.ser.de). This program allows associative searching on the basis of example text passages. In this case, the associative search makes use of a neural network previously trained in a classification mode. The learning process used in the course of this is also referred to as “learning by example”.
A drawback of previously known associative search methods is that the search query can be formulated only in the same language as that in which the neural network has been trained.
Against this background, the invention provides an improved method for associative searching which allows a multilingual associative search. In addition, the invention provides an appropriate computer system and a digital storage medium.
This is achieved by means of the features of the independent patent claims. Preferred embodiments of the invention are specified in the dependent patent claims.
The invention provides a method for multilingual associative searching which allows the search text to be inputted in a first language which is different from a second language, in which the associative search module's neural network has been trained. To this end, the search text in the first language is translated into the second language by means of automatic translation and is then inputted into the associative search module. In this context, simple automatic translation methods based on word-for-word equivalence may be used, or else translation methods which take further-developed grammar and syntax into account may be used.
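By way of illustration only, a translation based on word-for-word equivalence could be sketched as follows. The lexicon `DE_EN` and the function name `translate_word_for_word` are hypothetical and are not taken from the description; unknown words are simply passed through unchanged, which is an assumption of this sketch.

```python
# Hypothetical toy lexicon (German -> English); a real system would use a
# full bilingual dictionary. All names here are illustrative assumptions.
DE_EN = {
    "suche": "search",
    "nach": "for",
    "dokumenten": "documents",
    "über": "about",
    "neuronale": "neural",
    "netze": "networks",
}


def translate_word_for_word(text, lexicon):
    """Replace each token by its lexicon entry; unknown tokens pass through."""
    return " ".join(lexicon.get(token, token) for token in text.lower().split())


translated = translate_word_for_word(
    "Suche nach Dokumenten über neuronale Netze", DE_EN
)
print(translated)
```

No grammar or word order is taken into account here, so the output may be ungrammatical for longer passages; the translated text would then be passed to the associative search module as the search text.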
For this, the invention makes use of the surprising effect that, although automatic translations, particularly automatic translations based on word-for-word equivalence, are relatively inaccurate and sometimes yield barely comprehensible or grammatically incorrect results, such an automatically translated search text may nevertheless be used for an associative search without significantly impairing the quality of the associative search.
In accordance with one preferred embodiment of the invention, the language of the search text is recognized automatically. Such automatic recognition methods are known per se from the prior art and are implemented, by way of example, in Microsoft Word. The user is thus able to input his search text in any language which is supported by the system. The language of the search text is then recognized automatically and the translation module required for translating from the language of the search text into the second language is called.
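The selection of a translation module on the basis of the recognized language might be sketched as follows. The stopword-counting heuristic is merely a stand-in chosen for brevity and is not the recognition method referred to above; `STOPWORDS`, `detect_language` and `pick_translation_module` are hypothetical names.

```python
# Hypothetical stopword lists; a real system would use a trained
# language identifier rather than this simple heuristic.
STOPWORDS = {
    "de": {"der", "die", "das", "und", "nach", "über", "ist"},
    "en": {"the", "and", "for", "about", "is", "of"},
    "fr": {"le", "la", "les", "et", "pour", "sur", "est"},
}


def detect_language(text):
    """Guess the language whose stopwords occur most often in the text."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    # On a tie (including no matches at all) the first language listed wins.
    return max(scores, key=scores.get)


def pick_translation_module(text, target_language):
    """Return the (source, target) key of the module to call,
    or None if the text is already in the target language."""
    source = detect_language(text)
    return None if source == target_language else (source, target_language)
```

For example, `pick_translation_module("die Suche nach dem Dokument und der Akte", "en")` selects the German-to-English module, whereas an English search text requires no translation.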
In accordance with another preferred embodiment of the invention, the associative search is made in documents in different languages. To this end, a neural network is trained for each of the languages using example documents in the respective language.
Preferably, the results of the various associative searches are output in a single sorted list. The list may be sorted using “ranking values” or “reliability values”, which indicate the degree to which the search text concurs with a hit.
In accordance with another preferred embodiment of the invention, text files are obtained from voice files through automatic voice recognition. These text files can then be searched using a method in accordance with the invention. A voice file is, by way of example, the sound file for a multimedia file stored on a DVD.
In the drawings, wherein like reference numerals delineate similar elements throughout the several views:
Generally, the translation module 106 may be any translation program. Preferably, a translation method based on word-for-word equivalence is used. Such translation methods are used in commercially available voice computers and are known per se from the prior art.
The computer system 100 also includes an associative search module 108 which comprises a neural network 110. The neural network 110 has been trained in a classification mode using documents in the target language SZ which have been categorized by a user.
When a search text in the target language SZ is inputted into the associative search module 108, the neural network 110 is used to ascertain documents in the database 102 which belong to the category matched by the search text. In addition, each of the “hits” has a “ranking value” output which indicates the degree of concurrence between the search text and the hit. The corresponding hits list is preferably sorted according to the ranking values and is output as hits list 112 via the user interface 104.
During operation of the computer system 100, a user uses the user interface 104 to input a search text in the input language SE. The search text may be a search query in which the user uses a few words, sentences or an example text passage to describe the contents of the documents which are to be sought.
Input of the search text in the language SE starts the translation module 106, which translates the search text into the target language SZ automatically. The translated search text is then input into the associative search module 108.
Using the neural network 110, documents in the database 102 which are similar to the search text are then identified in an extraction mode and each assigned a ranking value. The corresponding results are output as hits list 112; each element of the hits list may, for example, be a hyperlink to the relevant document in the database 102.
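The production of a hits list sorted by ranking value can be illustrated with the following sketch. It deliberately replaces the trained neural network with a simple bag-of-words cosine similarity, purely as a stand-in so that the example is self-contained; the document store `DOCS` and all function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lower-cased, whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def associative_search(search_text, documents):
    """Return (doc_id, ranking) pairs with ranking > 0, best match first."""
    query = bag_of_words(search_text)
    hits = [
        (doc_id, cosine(query, bag_of_words(text)))
        for doc_id, text in documents.items()
    ]
    return sorted((h for h in hits if h[1] > 0), key=lambda h: h[1], reverse=True)


# Hypothetical document store standing in for the database.
DOCS = {
    "doc1": "neural networks for document classification",
    "doc2": "cooking recipes for pasta",
    "doc3": "training neural networks",
}
hits_list = associative_search("neural networks training", DOCS)
```

Here the similarity score plays the role of the ranking value; documents sharing no words with the search text are omitted from the hits list.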
In step 204, the search text translated into the target language SZ is input into an associative search module which has a neural network trained using documents in the target language SZ. In step 206, the associative search is performed using the neural network. Besides the actual hits, the neural network also ascertains a ranking or reliability value for each of the hits (step 208). In step 210, the hits list sorted according to ranking is output.
A particular advantage when using a translation method based on word-for-word equivalence is that, firstly, the quality of the translation is sufficient for the purposes of associative searching and that, secondly, the time required for the translation is minimal. This is essential for user-friendly execution of database queries, since, particularly for reasons of software ergonomics, the latency between input of the search text and output of the hits list should be as short as possible.
Unlike in the embodiment in
The user interface 304 is linked to a voice recognition module 305. The voice recognition module 305 automatically recognizes the input language SEj in which the user has input the search text using the user interface 304. The voice recognition module 305 is linked to a translation program 307.
The translation program 307 has a corresponding translation component 314 for each of the m different input languages SEj supported by the computer system 300. Each of the translation components 314 has a number of n translation modules 306 for automatically translating the input language SEj into one of the target languages SZi supported by the computer system 300, where 0<i≦n.
In the following, without loss of generality, it is assumed that the number m of input languages supported by the computer system 300 is equal to the number n of target languages supported, and that the input languages are also identical to the target languages. In this case, each of the translation components 314 contains a number of m−1 translation modules 306 for translation from the respective input language into the other target languages.
By way of example, the translation component 314 for the input language German SE1 thus has translation modules 306 for automatic translation into the target languages English, French, Japanese and Russian. The situation is similar for the other translation components 314, which are each associated with another of the input languages.
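The arrangement of translation components and translation modules in this exemplary case may be pictured with the following sketch, in which each input language is mapped to the m−1 language pairs it must cover. The language codes and the function name `build_translation_components` are assumptions made for illustration.

```python
# The five languages of the exemplary case, written as hypothetical codes.
LANGUAGES = ["de", "en", "fr", "ja", "ru"]


def build_translation_components(languages):
    """Map each input language to the (source, target) pairs of the
    translation modules its component needs: every language but itself."""
    return {
        source: [(source, target) for target in languages if target != source]
        for source in languages
    }


components = build_translation_components(LANGUAGES)
# Each component holds m - 1 modules, so m * (m - 1) modules exist in total.
total_modules = sum(len(modules) for modules in components.values())
```

With m = 5 languages, each component holds four modules and the system as a whole holds twenty.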
The translation program 307 is linked to an associative search module 308. For each of the target languages, the associative search module 308 has a neural network 310 which has been trained using categorized documents in the respective target language. In the exemplary case under consideration, the associative search module 308 thus has a number of m different neural networks 310, with each of the neural networks 310 being associated with one of the languages supported by the computer system 300. Accordingly, the database 302 contains documents in these various languages which can be searched by means of an associative search. Alternatively, the documents may be stored distributed over a plurality of databases.
During operation of the computer system 300, the user uses the user interface 304 to input a search text in one of the input languages SEj which is supported by the computer system 300. The input language is then automatically recognized by the voice recognition module 305. Next, the translation component 314 associated with the input language is started, so that the search text is translated into the various target languages SZi which differ from the input language, where i≠j, using the translation modules 306 in the translation component 314 in question.
The various translations of the search text are then used as the basis for the corresponding associative searches by the neural networks 310. In addition, the search text in the input language is also used for the associative search using one of the neural networks 310, since in the exemplary case under consideration here the input language is simultaneously one of the target languages. The results of the individual associative searches are then output in a sorted hits list 312 via the user interface 304.
Thus, when a user inputs, by way of example, a search text in German SE1 using the user interface 304, German is automatically recognized as the input language SE1 by the voice recognition module 305. The voice recognition module 305 then starts that translation component 314 in the translation program 307 which is associated with the input language German SE1. Next, the search text is translated by the various translation modules 306 into the target languages English, French, Japanese and Russian.
In addition, the original search text is input into the neural network 310 associated with the German language for the purpose of performing an associative search. Accordingly, the search texts which have been translated into English, French, Japanese and Russian are input into those neural networks 310 in the associative search module 308 which are associated with the respective languages. The corresponding hits which are found in the respective language are preferably output in a common hits list 312 which has been sorted according to the ranking values.
The search texts translated into the various target languages and also the search text in the input language—if the input language is one of the target languages—are input into the associative search module in step 406.
Next, respective associative searches for documents in the various target languages are performed in steps 408, 410, 412, which run in parallel. By way of example, step 408 involves a search for documents in the target language SZ1 being performed using the search text which has been translated into the target language SZ1. Accordingly, step 410 involves a search for documents in the target language SZ2 being performed using the search text which has been translated into the target language SZ2, and so on.
In the corresponding steps 414, 416, 418, . . ., a respective ranking value is calculated for each of the hits ascertained. In step 420, the hits are sorted according to ranking values, and are output in a single hits list in step 422.
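The parallel searches and the subsequent sorting into a single hits list might be sketched as follows, here using a thread pool as one possible way of running the per-language searches concurrently. The function `demo_search` is a hypothetical stand-in for the per-language neural networks and returns canned results; none of these names appear in the description.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all_languages(queries, search_fn):
    """Run one search per language concurrently and merge the hits.

    queries:   {language: search text in that language}
    search_fn: callable(language, text) -> list of (doc_id, ranking)
               (hypothetical interface assumed for this sketch)
    """
    with ThreadPoolExecutor() as pool:
        futures = {
            lang: pool.submit(search_fn, lang, text)
            for lang, text in queries.items()
        }
        per_language = {lang: future.result() for lang, future in futures.items()}
    hits = [
        (lang, doc_id, ranking)
        for lang, results in per_language.items()
        for doc_id, ranking in results
    ]
    # Single hits list, highest ranking value first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def demo_search(language, text):
    """Canned results standing in for a trained per-language network."""
    canned = {"de": [("d1", 0.8)], "en": [("e1", 0.9), ("e2", 0.3)]}
    return canned.get(language, [])


merged = search_all_languages(
    {"de": "neuronale netze", "en": "neural networks"}, demo_search
)
```

Sorting the merged list in this way presupposes that the ranking values produced for the different languages are comparable with one another, which is an assumption of this sketch.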
Number | Date | Country | Kind |
---|---|---|---|
10348920.7 | Oct 2003 | DE | national |