1. Field of the Invention
The present invention relates to the field of searching a database using search term(s) entered by a user. More particularly, the present invention is a system and method for searching a database that includes material in different languages, where the search term(s) are entered in one of the languages and the database need not be translated into the different languages.
2. Background Art
Various methods have been proposed for searching a database wherein the database includes material in multiple languages. One approach is to translate the entire database into the language in which a search term is entered or the language of the user. However, this could involve a large amount of translation for a sizable database (and multiple translations if the database is used by users in different languages). Further, each process of translating a document has the potential for losing (or distorting) some of the meaning of the original text.
For these reasons, it is desirable to avoid translating the documents to allow for a search in a particular language.
Another approach is to use a synonym list and apply it to the search term(s) entered in one language. That is, the text of the documents in the database remains in the original language, and synonyms in each language for each search term are used for the search of the database. This system may work in some cases but is undesirable in others because considering all of the synonyms in the different languages could lead to incorrect results. For example, the word for “network” in Spanish is “red”, and a search on “network” which blindly translates the search term would incorrectly find English documents which include the color “red”.
Further, some of the documents include text in one language and key words presented in a different language to avoid changing the meaning. Thus, it is desirable to be able to search a database which includes these terms, but it would not be effective to search only for the translated form of the word.
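The false match described above can be illustrated with a minimal sketch. The synonym list, the document identifiers and the function name `naive_search` are hypothetical and serve only to show the problem; they are not taken from any particular prior art system.

```python
# Hypothetical synonym list: English term -> equivalents in other languages.
SYNONYMS = {"network": ["red"]}  # "red" is the Spanish word for "network"

# Hypothetical documents, each tagged with the language it is written in.
DOCUMENTS = {
    "doc_en": ("en", "the red car is parked outside"),
    "doc_es": ("es", "la red de la empresa es segura"),
}


def naive_search(term):
    """Match the term and all of its synonyms against every document,
    without regard to the language each document is written in."""
    targets = {term, *SYNONYMS.get(term, [])}
    return sorted(
        doc_id
        for doc_id, (_lang, text) in DOCUMENTS.items()
        if targets & set(text.split())
    )


# The Spanish document is a correct hit; the English document is matched
# only because it contains the color "red".
print(naive_search("network"))
```

Under these assumptions the search on “network” returns both documents, although only the Spanish document concerns networks.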
As will be apparent to one skilled in the relevant art, the process of translating and searching in multiple languages can consume substantial computing resources. Many of the multi-language database searching techniques require a powerful computer or take an inordinate amount of time to process a single search, the amount depending on the size of the database, the number of supported languages and the nature of the queries. However, the computing resources have a cost associated with them, either in requiring a larger or faster system or in terms of tying up the computer while a large task is running to the exclusion of other users. Further, a search which takes a long period of time may prevent the user from interactively modifying the search to obtain meaningful results. Accordingly, it is desirable to avoid using large computing resources.
Accordingly, existing systems and methods for searching databases have undesirable disadvantages and limitations which will be apparent to those skilled in the art in view of the following description of the present invention.
The present invention overcomes the disadvantages and limitations of the prior art systems by providing a simple, yet effective, method and system for searching a database including documents in multiple supported languages. The present invention also supports searching a database in which the text is comprised of documents written in multiple languages, including those documents which are written in one language but which include words or phrases from a second language.
The present invention has the advantage that a translation of the documents in the database into each of the supported languages is not required.
The present invention also has the advantage that the meaning of the original document is not lost or distorted through a translation process to allow searching of the document in different languages.
The present invention also allows for the searching of a database in a native or natural language while finding documents which are written in other languages.
Other objects and advantages of the system and method of the present invention will be apparent to those skilled in the relevant art, in view of the following description of the preferred embodiment, taken together with the accompanying drawings and the appended claims.
Having thus described some of the objects and advantages of the present invention, other objects and advantages will be apparent to those skilled in the art in view of the following description of the invention taken in conjunction with the accompanying drawings in which:
In the following description of the preferred embodiment, the best implementation of practicing the invention presently known to the inventor will be described with some particularity. However, this description is intended as a broad, general teaching of the concepts of the present invention by way of a specific embodiment and is not intended to limit the present invention to that shown in this embodiment, especially since those skilled in the relevant art will recognize many variations and changes to the specific structure and operation shown and described with respect to these figures.
However, some technical documents are written in a native language (such as Spanish) but use technical terms from another language (for example, from English). In such a system, searching the national language database for the national language equivalent of a search term will not find the search term if it is included in the document in another language.
Thus, the process of creating an inverted index involves the steps of creating an index in each language in block 232 and creating a merged inverted index in block 234 using the keyword dictionary 220, which includes synonyms in each supported language. While two languages are shown in the figures of the present invention, the present invention can easily be expanded to support the desired number of languages. Further, while English is described as one language for the documents and for the searches, the present invention is not limited to serving documents in English, and another language could be substituted, if desired.
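The two steps above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of blocks 232 and 234: the dictionary contents, the sample documents, the function names `build_language_indexes` and `merge_indexes`, and the `borrowed_lang` parameter are all hypothetical. In particular, the sketch assumes one possible merging policy: each synonym is looked up only in the index of its own language, and the English form is additionally looked up in the other languages' indexes, because national language documents may contain English technical terms.

```python
from collections import defaultdict

# Hypothetical keyword dictionary: each entry groups synonyms by language.
KEYWORD_DICTIONARY = [
    {"en": "network", "es": "red"},
    {"en": "server", "es": "servidor"},
]

# Hypothetical documents: id -> (language, text).
DOCUMENTS = {
    1: ("en", "the network server failed"),
    2: ("es", "la red es lenta"),
    3: ("es", "el servidor usa la network local"),  # Spanish text, English term
    4: ("en", "a red car"),
}


def build_language_indexes(documents):
    """Build one inverted index (word -> set of document ids) per language."""
    indexes = defaultdict(lambda: defaultdict(set))
    for doc_id, (lang, text) in documents.items():
        for word in text.lower().split():
            indexes[lang][word].add(doc_id)
    return indexes


def merge_indexes(indexes, dictionary, borrowed_lang="en"):
    """Merge the per-language indexes using the keyword dictionary.

    Assumed policy: a synonym is looked up only in its own language's
    index; the term in `borrowed_lang` is also looked up in every index.
    """
    merged = {}
    for entry in dictionary:
        postings = set()
        for lang, term in entry.items():
            postings |= indexes[lang].get(term, set())
        borrowed = entry.get(borrowed_lang)
        if borrowed is not None:
            for lang in list(indexes):
                postings |= indexes[lang].get(borrowed, set())
        result = tuple(sorted(postings))
        for lang, term in entry.items():
            merged[(lang, term)] = result
    return merged


merged = merge_indexes(build_language_indexes(DOCUMENTS), KEYWORD_DICTIONARY)
print(merged[("es", "red")])
```

With this policy, a search on “network” or on the Spanish “red” reaches documents 1, 2 and 3, including the Spanish document that uses the English term, while the English document about a red car is not matched because “red” is looked up only in the Spanish index.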
The present invention, it will be recognized, is especially adapted for use in a data processing system such as a general purpose computer with a stored program containing computer program means including a plurality of instructions. Those instructions will generally be written in a high level language which is readable by a human and translated into machine language, that is, simple instructions which are understood by the data processing system. In an appropriate instance such instructions could be directly written in a machine language programming language, if desired, a system which allows for efficiency of execution but which is more difficult to program. The present invention is not limited to any particular input language.
As used in the present document, the terms software, computer program and computer program means are used interchangeably. Software in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form. The Unicode system for managing different languages has been used in the description of the preferred embodiment, but other suitable methods for representing different languages could also be used to advantage in the present invention, if desired.
The term national language has been used to represent a language associated with a user of the system. This language could be any language supported by the system, and might include different languages for different users. So, “national language” might represent Spanish for a Mexican or a person from Spain and might represent French for a person from France or other French-speaking locales. Appropriate synonym tables are available for a variety of common languages as are systems for locating key words and separating common text with little uniqueness from key words which are descriptive of the document under consideration. Such key word locating systems are often technologically directed and identify words which are of interest to the technology under consideration.
Of course, many modifications of the present invention will be apparent to those skilled in the relevant art in view of the foregoing description of the preferred embodiment, taken together with the accompanying drawings and the appended claims. For example, the present invention has been described in connection with documents and searches in English and in a national language, whereas the number of supported languages need not be two and the national language need not be a single language. Further, in some circumstances, the documents could be written in a combination of supported languages. Additionally, some elements of the present invention can be used to advantage without the corresponding use of other elements. For example, the use of the synonym or keyword dictionary is not the only way to accomplish the translation of keywords into another language. Further, various other devices could be substituted to advantage depending on the environmental circumstances. Accordingly, the foregoing description of the preferred embodiment should be considered as merely illustrative of the principles of the present invention and not in limitation thereof.
This application is a continuation of U.S. patent application Ser. No. 11/151,047, filed on 13 Jun. 2005 now U.S. Pat. No. 7,433,894, which is a continuation of U.S. patent application Ser. No. 10/066,346, filed on 1 Feb. 2002 now U.S. Pat. No. 6,952,691, both of which are hereby incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
20080306923 A1 | Dec 2008 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11151047 | Jun 2005 | US |
Child | 12195862 | | US |
Parent | 10066346 | Feb 2002 | US |
Child | 11151047 | | US |