Information retrieval method with natural language interface

Description

TECHNICAL FIELD

This invention relates to information retrieval technologies, and more particularly to a method for retrieving documents by intelligently matching a query string to one or more pre-stored strings. A novel ranking method is employed for said intelligent matching.

BACKGROUND OF THE INVENTION

Frequently Asked Questions (“FAQs”) are commonly presented by customers to a company. Due to the high repetition of FAQs, standard answers are usually pre-stored in a database retrievable by a query inputted into the system. A customer may present the question by dialing into the IVR system of the company, or may input the query at the website of the company.

Natural language queries are more acceptable to common customers as no special searching rules are required to be understood. A questioner can simply input a question (a query string) in natural language into the retrieval system and receive the prestored, correct answer. This is implemented by a mapping technique used inside the retrieval system. Specifically, a group of sample questions are pre-stored in the database, each with a corresponding answer. Upon receiving a query in natural language format, the system intelligently maps, by using a relatively complex, artificial intelligence algorithm, the query question to a pre-stored sample question which is coupled to an answer.

Due to the casual use of words in a natural language query string, it is important to improve the technique for successfully mapping the query string to a sample string. At present, natural language processing techniques are able to detect equivalent strings (strings that have essentially the same meaning as the query string). They may detect equivalent strings that are worded very differently from the query string and reject strings that are worded similarly to the query string but have a different meaning. Usually more than one equivalent string is mapped to the same query string, and the equivalent strings are ranked by meaning. An answer coupled to the top ranked equivalent string (i.e., that which has a meaning closest to the input string) will be retrieved and displayed to the questioner.

However, there is no technique to further distinguish equivalent strings from each other if they have the same ranking in meaning. Furthermore, the ranking among equivalent strings relies solely on either correlation in meaning or correlation in wording pattern, neither of which may be accurate enough and both of which have their limitations.

Therefore, there exists a need for improved techniques for the retrieval system to map the query strings and the prestored strings more accurately.

SUMMARY OF THE INVENTION

In the novel method of the present invention, both meaning and wording pattern are taken into consideration in ranking equivalent strings. Separate modules are utilized, a first for matching the meaning of an input string to prestored questions, and a second and independently operating module for matching word patterns of an input string to a prestored string. When plural strings are deemed to have an equivalent meaning, the word pattern of each is examined and the word pattern closest to a prestored word pattern is utilized.

In a preferred embodiment, correlation in meaning and correlation in wording pattern are weighted with different factors to obtain a combined correlation for each equivalent string, and the ranking is implemented based on the combined correlation thus obtained.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and further features and advantages of the present invention may be appreciated from the detailed description of preferred embodiments with reference to the accompanying drawings, in which:

FIG. 1 is a schematic illustration of an FAQ retrieval system;

FIG. 2 is an embodiment of the present invention; and

FIG. 3 is another embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

A typical FAQs retrieval system is schematically shown in FIG. 1. Database 3 comprises a question source 4 and an answer source 5. Sample questions are pre-stored in the question source 4 and each of the sample questions is coupled to one of the standard answers that is prestored in the answer source 5.
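As a minimal sketch of the coupling between the question source 4 and the answer source 5, the two sources can be modeled as lookup tables. The answer IDs, the answer texts, and the function name `answer_for` are hypothetical placeholders; only the two sample questions are taken from the examples in this description.

```python
# Hypothetical contents: the description does not list actual answers.
ANSWER_SOURCE = {
    "A1": "Placeholder answer about how to receive a payment.",
    "A2": "Placeholder answer about when a payment is made.",
}

# Each pre-stored sample question is coupled to exactly one answer ID.
QUESTION_SOURCE = {
    30: ("How can I receive my money", "A1"),
    42: ("When can I receive the payment", "A2"),
}


def answer_for(question_id):
    """Follow the coupling from a sample question to its standard answer."""
    _text, answer_id = QUESTION_SOURCE[question_id]
    return ANSWER_SOURCE[answer_id]
```

Keeping the coupling as an ID rather than embedding the answer text lets several sample questions share one standard answer.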

The natural language query questions are input at a natural language interface 1 which transmits the query to a natural language processor 2. The questions may be received via text over a data network, or via an audio signal over a data network or a telephone network. If the questions are received via an audio signal, then a speech recognition algorithm, many of which are commercially available, should be employed.

The natural language processor 2 runs to detect equivalent questions from the question source 4. These equivalent questions are ranked by their correlation in meaning to the query question input at the interface 1. Usually only the answer coupled to the top ranked equivalent question is retrieved and displayed to the user by a proper displaying means such as a monitor or printer. The system may also retrieve answers to equivalent questions other than the top ranked one, if necessary.

Natural language processors and recognition programs are widely available, and the details of how such programs are implemented are not critical to the present invention. In the present invention, however, the output of such programs is not directly used, but instead is combined with a signal that measures the degree of correlation between the wording of an input string and that of potential matches.

Sometimes more than one top ranked equivalent question is detected. In such a situation, correlation in wording pattern may be taken into account to rank the equivalent questions, as shown in FIG. 2.

In FIG. 2, equivalent Questions 30 and 42 are found to have the same correlation in meaning to the query question. To further differentiate them, a step is introduced in which the wording pattern is also taken into consideration. For example, if the query question is “When can I get the payment?”, Question 30 is “How can I receive my money”, and Question 42 is “When can I receive the payment”, then Question 42 is ranked over Question 30 because of its higher correlation in wording pattern.
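The tie-breaking step of FIG. 2 can be sketched as a two-level sort: meaning score first, wording score second. This is only an illustration under stated assumptions. The meaning scores of 0.9 are invented for the example, and the wording measure used here (a count of distinct shared words) and the function names are hypothetical stand-ins for whatever wording-pattern measure an implementation adopts.

```python
def shared_words(query, candidate):
    """Count distinct lowercase words that appear in both strings."""
    def tokens(s):
        return {w.strip("?.,!") for w in s.lower().split()}
    return len(tokens(query) & tokens(candidate))


def rank_with_tie_break(query, candidates):
    """Sort (question_id, text, meaning_score) tuples.

    Meaning score decides first; the shared-word count only breaks ties.
    """
    return sorted(
        candidates,
        key=lambda c: (c[2], shared_words(query, c[1])),
        reverse=True,
    )


query = "When can I get the payment?"
candidates = [
    (30, "How can I receive my money", 0.9),      # illustrative meaning score
    (42, "When can I receive the payment", 0.9),  # same score, so a tie
]
ranked = rank_with_tie_break(query, candidates)
print([question_id for question_id, _text, _score in ranked])
```

With these inputs Question 42 shares five words with the query and Question 30 shares two, so Question 42 is placed first.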

The wording pattern may comprise many factors. For example, the system may check how many words are used in both the query question and the equivalent question. Usually the words to be considered will exclude words such as the articles “a” and “the” and connectives such as “because” and “therefore”. Moreover, the words will be considered in a stemmed form. For example, the words “paying”, “payment” and “pay” will all be considered as “pay”.
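The shared-word check described above, with articles and connectives excluded and words reduced to stems, might look like the following sketch. The stop-word list and the suffix-stripping rule are simplifying assumptions chosen to reproduce the “pay” example; a real system would more likely use an established stemming algorithm. All names are hypothetical.

```python
import re

# Assumed stop-word list: the articles and connectives named in the text.
STOP_WORDS = {"a", "an", "the", "because", "therefore"}

# Naive suffix stripper standing in for a real stemmer (assumption).
SUFFIXES = ("ment", "ing", "s")


def stem(word):
    """Strip one known suffix, keeping at least three leading letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def content_stems(text):
    """Lowercase, tokenize, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(w) for w in words if w not in STOP_WORDS}


def shared_word_count(query, candidate):
    """Number of distinct stems used in both strings."""
    return len(content_stems(query) & content_stems(candidate))
```

For the query “When can I get the payment?”, this count is 4 against “When can I receive the payment” and 2 against “How can I receive my money”.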

A more complicated embodiment is shown in FIG. 3. In this embodiment, both correlation in meaning and correlation in wording pattern are taken into account in determining ranks for all the equivalent questions.

Suppose a query question results in four equivalent questions Q30, Q42, Q48 and Q56 that are the highest ranked ones by correlation in meaning. For each equivalent question, the system generates a first correlation value or score for meaning, and a second correlation value or score for wording pattern.

Conceptually, the A scores (i.e., A1-A4) measure the correlation in meaning while the B scores (i.e., B1-B4) measure the correlation in wording pattern. These two score series, however, may not have the same weight in ranking the equivalent questions. In some situations correlation in meaning may be more important than correlation in wording pattern, while in other situations correlation in wording pattern may be more important.

With this in mind, a weighting system is introduced to reflect the relative importance of the two scores. In particular, a weight factor X is introduced for A scores and a weight factor Y is introduced for B scores. The relative importance of the relevancies in meaning and in wording pattern is quantified by weight factors X and Y. After being weighted, the two score series are combined by a correlation algorithm to get final combined scores “C1”, “C2”, “C3”, and “C4” respectively, which reflect both relevancies in meaning and wording pattern as well as their relative importance. Finally the equivalent questions are ranked in accordance with these final combined C scores. Thus, the ranking results are more accurate. Factors that may be taken into account include number of words, length of the string, etc.
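The description does not fix how the weighted scores are combined. One simple possibility, assumed here purely for illustration, is a weighted sum C = X*A + Y*B for each equivalent question. The score values, the weights, and the function names below are all hypothetical.

```python
def combined_scores(a_scores, b_scores, x, y):
    """Return C = X*A + Y*B per question ID (assumed combining rule)."""
    return {qid: x * a_scores[qid] + y * b_scores[qid] for qid in a_scores}


def rank(c_scores):
    """Question IDs ordered from highest to lowest combined score."""
    return sorted(c_scores, key=c_scores.get, reverse=True)


# Illustrative A (meaning) and B (wording pattern) scores.
a = {"Q30": 0.9, "Q42": 0.9, "Q48": 0.8, "Q56": 0.7}
b = {"Q30": 0.2, "Q42": 0.8, "Q48": 0.9, "Q56": 0.3}

c = combined_scores(a, b, x=0.6, y=0.4)
order = rank(c)
print(order)
```

With these numbers Q30 and Q42 tie on meaning, yet the combined score separates them, and Q48 overtakes Q30 because of its stronger wording match.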

Usually an answer coupled to the top ranked sample question is retrieved and displayed to the questioner.

As an alternative, the system may first display one or more highest ranked equivalent questions to the questioner, who may select one among them to retrieve the answer. This, however, may sometimes be inconvenient because it introduces an additional step, and the questioner has to read through several questions before determining which is the best. This may be impractical if the query interaction is implemented over a telephone. Nonetheless, the user can be prompted to select one of several questions as they are read.
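A sketch of this alternative flow follows, assuming the equivalent questions have already been ranked. The function names, the default of three candidates, and the 1-based choice numbering are assumptions made for the example.

```python
def top_candidates(ranked_questions, n=3):
    """Return the n highest ranked equivalent questions for display."""
    return ranked_questions[:n]


def choose(candidates, choice_number):
    """Return the question the questioner picked (1-based numbering)."""
    if not 1 <= choice_number <= len(candidates):
        raise ValueError("choice out of range")
    return candidates[choice_number - 1]


ranked_questions = [
    "When can I receive the payment",
    "How can I receive my money",
]
shown = top_candidates(ranked_questions, n=2)
selected = choose(shown, 2)
```

The selected question would then be used to look up its coupled answer, exactly as the top ranked question is in the default flow.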

In a preferred embodiment, the weight factors X and Y may be changeable by a questioner so as to fine tune the weight factors X and Y. This is advantageous as a questioner is able to interact with the system. If the questioner is not satisfied with a query result, he may change the weight factors X and/or Y to try for a better hit without changing his query question.

For example, if a query question uses more distinguishable keywords, the questioner may increase the weight factor Y so that the final ranking scores will rely more on the correlation in wording pattern than on the correlation in meaning. If the words used in a query question are less distinguishable, a higher weight factor X may produce a better result.
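The effect of adjusting the weight factors can be illustrated with a small sketch in which the scores stay fixed and only X and Y change. It assumes a weighted-sum combination, which the description does not mandate, and the score values and function name are hypothetical.

```python
def top_question(scores, x, y):
    """Pick the question ID with the highest X*A + Y*B.

    scores maps a question ID to its (A, B) pair: meaning, wording pattern.
    """
    return max(scores, key=lambda qid: x * scores[qid][0] + y * scores[qid][1])


# Same query, same scores; only the weights differ between the two calls.
scores = {"Q30": (0.9, 0.2), "Q48": (0.7, 0.9)}

meaning_heavy = top_question(scores, x=0.8, y=0.2)
wording_heavy = top_question(scores, x=0.2, y=0.8)
print(meaning_heavy, wording_heavy)
```

Here a larger X favors Q30, which matches better in meaning, while a larger Y favors Q48, which matches better in wording, without the query being retyped.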

In the past, the correlation between a query question and the equivalent questions could not be changed unless the questioner changed the query question. Therefore, in order to get a better hit, the questioner had to try many query questions until he got the right answer. With the present invention, it is much more convenient because the questioner may adjust the ranking by changing only the weight factors.

Though the above takes a FAQs answer retrieval system as an exemplary embodiment, it will be appreciated that the present invention is also applicable in other document or information retrieval systems and that modifications and variations will be possible to those with ordinary skill in the art without departing from the spirit of the invention. The scope of the invention is therefore intended to be solely defined in the claims.

Claims

1. A method of retrieving documents in a database retrieval system having a knowledge database, the method comprising the acts of:
a. receiving a query string inputted by a user into a natural language interface of said database retrieval system, said interface being coupled to a string source having a plurality of pre-stored strings, each of said pre-stored strings being coupled to one of said documents;
b. in response to said receiving act, detecting from said strings source a plurality of equivalent strings having essentially the same meaning as said query string;
c. in response to said detecting act, initially ranking said plurality of equivalent strings by a weighing correlation between said query string and each of said equivalent strings;
d. generating a first correlation value for the meaning of each of said plurality of equivalent strings;
e. in response to said act of generating a first correlation value, quantifying the correlation in meaning between said equivalent strings and said query string with a first factor;
f. generating a second correlation value for a wording pattern of each of said plurality of equivalent strings;
g. in response to said act of generating a second correlation value, quantifying the correlation in wording pattern between said equivalent strings and said query string with a second factor;
h. in response to said quantifying acts, ranking said equivalent strings by a combined correlation of meanings and wording patterns for each of said plurality of equivalent strings; and
i. in response to said act of ranking said equivalent strings, retrieving a document coupled to a selected and ranked equivalent string and displaying said document to said user.
2. The method of claim 1 wherein said first and second factors are adjustable by said user.
3. The method of claim 1 wherein said pre-stored strings are sample questions and said documents are answers to each of said sample questions.
4. The method of claim 1 wherein said first factor is larger than said second factor.
5. The method of claim 1 wherein said second factor is larger than said first factor.
6. The method of claim 1 wherein said selected equivalent string is a top ranked equivalent string.
7. The method of claim 1 further comprising a step of displaying one or more highest ranked equivalent strings, and said selected equivalent string is determined by said user by selecting among said displayed equivalent strings.
8. An information retrieval system, comprising:
a knowledge database having a document source comprising a plurality of documents and a string source comprising a plurality of pre-stored strings, each of said pre-stored strings being coupled to at least one of said documents;
a natural language interface for a user to input a query string, said interface being coupled to said string source;
a natural language processor for detecting equivalent strings having essentially the same meaning as said query string input at said natural language interface;
means for weighing a correlation in meaning between said query string and said equivalent strings by a first factor, and weighing a correlation in wording pattern between said query and said equivalent strings by a second factor, so as to obtain a combined correlation;
means for ranking said equivalent strings with said combined correlation; and
means for retrieving said documents coupled to a selected equivalent string.
9. The retrieval system of claim 8 further comprises means for said user to adjust said first and second factors.
10. The retrieval system of claim 8 wherein said selected equivalent string is a top ranked string.
11. The retrieval system of claim 8 further comprising means for displaying to said user one or more highest ranked equivalent strings, and means for said user to select one of them as said selected equivalent string so as to retrieve said document.
12. The retrieval system of claim 8 wherein said wording pattern comprises number of same words used in both said query string and said equivalent strings.

