Claims
- 1. A collection search system responsive to a user query against a collection of documents to provide a search report, said collection search system comprising:
- a) a collection index including first predetermined single word and multiple word phrases as indexed terms occurring in said collection of documents;
- b) a linguistic parser that identifies a list of search terms from a user query, said linguistic parser identifying said list from second predetermined single words and multiple word phrases; and
- c) a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify a predetermined document from said collection of documents, said search engine including an accumulator for summing for said predetermined document a relevancy score related to the intersection of said predetermined document with said list using conditional indexing of said word phrases.
- 2. The collection search system of claim 1 wherein said collection is fractionally stored in a plurality of distributed servers.
- 3. The collection search system of claim 1 wherein said linguistic parser provides for eliding predetermined punctuation marks and stop list words, wherein said collection search engine.
- 4. The search system of claim 1 wherein said phrase document terms include proximity related single word terms, and wherein phrase document terms exclude a predetermined set of words.
- 5. A collection search system responsive to a user query against a collection of documents to provide a search report, said collection search system comprising:
- a) a collection index including first predetermined single word and multiple word phrases as indexed terms occurring in said collection of documents, said collection index storing one-word and two-word indexes of the frequency of occurrence of search phrases;
- b) a linguistic parser that identifies a list of search terms from a user query, said linguistic parser identifying said list from second predetermined single words and multiple word phrases; and
- c) a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify a predetermined document from said collection of documents, said search engine including an accumulator for summing for said predetermined document a relevancy score related to the intersection of said predetermined document with said list.
- 6. The search system of claim 5 wherein three-word and more phrases are inferred from said one-word and two-word indexes of the frequency of occurrence of search phrases.
- 7. A collection search system responsive to a user query against a collection of documents to provide a search report, said collection search system comprising:
- a) a collection index including first predetermined single word and multiple word phrases as indexed terms occurring in said collection of documents, wherein said collection index is distributed in different subgroups which share common group statistics;
- b) a linguistic parser that identifies a list of search terms from a user query, said linguistic parser identifying said list from second predetermined single words and multiple word phrases; and
- c) a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify a predetermined document from said collection of documents, said search engine including an accumulator for summing for said predetermined document a relevancy score related to the intersection of said predetermined document with said list.
- 8. The search system of claim 7 wherein one or more of said subgroups is stored within high-speed memory whereby searches of said one or more of said subgroups do not require accesses to slower-speed memory.
- 9. A collection search system responsive to a user query against a collection of documents to provide a search report, said collection search system comprising:
- a) a distributed collection index including first predetermined one-word and two-word phrases as indexed terms occurring in said collection of documents;
- b) a linguistic parser that identifies a list of search terms from a user query, said linguistic parser identifying said list from second predetermined single words and multiple word phrases; and
- c) a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify a predetermined document from said collection of documents, said search engine including an accumulator for summing for said predetermined document a relevancy score related to the intersection of said predetermined document with said list.
- 10. A collection search system responsive to a user query, Q, for searching against a collection of documents,
- where each document is described as a string of words, S, where for an M-word document, S=S.sub.1, S.sub.2, S.sub.3, . . . , S.sub.m, S.sub.m+1, S.sub.m+2, S.sub.m+3, . . . , S.sub.M, where m=1,2, . . . , M,
- where for a number, N.sub.g, of documents in a group g of documents in the collection, the documents are designated S1.sub.g,S2.sub.g, . . . ,Sn.sub.g, . . . ,SN.sub.g for n.sub.g =1,2, . . . , N.sub.g, and
- where a typical predetermined document, Sn.sub.g, of said documents in the group g is given by, Sn.sub.g =Sn.sub.g,1, Sn.sub.g,2, Sn.sub.g,3, . . . , Sn.sub.g,m, Sn.sub.g,m+1, Sn.sub.g,m+2, Sn.sub.g,m+3, . . . , Sn.sub.g,Mm,
- said collection search system comprising:
- a collection index including first predetermined one-word and two-word phrases as indexed terms occurring in said collection of documents;
- a linguistic parser that identifies from the user query, Q, a list of search terms Q.sub.1, Q.sub.2, . . . , Q.sub.p, . . . , Q.sub.P each having a weighted value, W.sub.Q1, W.sub.Q2, . . . , W.sub.Qp, . . . , W.sub.QP, said linguistic parser identifying said list of search terms from second predetermined one-word and two-word phrases; and
- a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify the predetermined document Sn.sub.g from said collection of documents, said search engine including an accumulator for summing for said predetermined document a relevancy score, score(Sn.sub.g).sub.Q, for the document Sn.sub.g based on the query Q as follows: ##EQU5## where (A.sub.Ng).sub.Qp =a document value relative to the number of occurrences of the term Qp in the documents in the group g,
- where W.sub.Qp =the value of the query term relative to the number of occurrences of the term Qp in the particular document Sn.sub.g.
- 11. The search system of claim 10 wherein said documents are in different subgroups, sg.sub.j of the group, g, where g=sg.sub.1, sg.sub.2, . . . ,sg.sub.j, . . . ,sg.sub.G and the scoring is performed separately for each subgroup j using the statistics of the master subgroup, g, where score, score(Sn.sub.g).sub.Q {sg.sub.j }, for the particular document Sn.sub.g for a particular one subgroup {sg.sub.j } is determined as follows, ##EQU6## where W.sub.Qp {sg.sub.j }=the weighted value for the query collectively considering all the Nsg.sub.j documents in the subgroup sg.sub.j.
- 12. The search system of claim 11 wherein the total score SCORE(Sn.sub.g).sub.Q for the document Sn.sub.g for all the subgroups {sg.sub.j } for j=1,2, . . . , P is determined as follows: ##EQU7##
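The elements recited in the claims above (a collection index of one-word and two-word terms, a parser that turns a query into a list of search terms, and a search engine whose accumulator sums a relevancy score per document) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the equations at ##EQU5## through ##EQU7## are not reproduced in this text, so the logarithmic weight in `group_weights` is an assumed stand-in, and the stop list, the function names, and the sample documents are all hypothetical.

```python
import math
import re
from collections import defaultdict

# Hypothetical stop list; the claims refer only to "stop list words".
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def terms(text):
    """Return one-word terms plus two-word phrases from text.

    Punctuation and stop words are elided first, so a two-word phrase
    pairs words that are adjacent after elision (an assumption).
    """
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in STOP_WORDS]
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def build_index(docs):
    """Map each indexed term to {doc_id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in terms(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def group_weights(index, num_docs):
    """Assumed IDF-style weight per term, computed over the whole group."""
    return {term: math.log(1 + num_docs / len(postings))
            for term, postings in index.items()}


def search(query, index, weights):
    """Intersect the query terms with the index, summing a score per document."""
    accumulator = defaultdict(float)
    for term in terms(query):
        for doc_id, count in index.get(term, {}).items():
            accumulator[doc_id] += weights.get(term, 0.0) * count
    return sorted(accumulator.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical three-document collection.
docs = {
    "d1": "The search engine indexes phrases.",
    "d2": "A phrase index speeds up the search engine.",
    "d3": "Documents are stored on distributed servers.",
}
full_index = build_index(docs)
weights = group_weights(full_index, len(docs))

# Subgroups are indexed separately but scored with the shared group weights.
subgroups = [{"d1": docs["d1"]}, {"d2": docs["d2"], "d3": docs["d3"]}]
merged = []
for sub in subgroups:
    merged.extend(search("search engine", build_index(sub), weights))
merged.sort(key=lambda kv: (-kv[1], kv[0]))

for doc_id, score in merged:
    print(doc_id, round(score, 3))
```

Because every subgroup is scored with weights derived from the whole group, the merged subgroup results in this sketch match the ranking obtained from a single index over all documents, which is the property that shared group statistics are meant to preserve.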
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is a continuation-in-part of the following application, assigned to the same assignee as the present application:
Title: REAL-TIME DOCUMENT COLLECTION SEARCH ENGINE WITH PHRASE INDEXING
Inventors: Steven T. Kirsch, William Chang, Edward Miller
SC/Ser No.: 08/696,782
Filed Date: Aug. 14, 1996
The present application is related to the following application, assigned to the same assignee as the present application:
Title: METHOD FOR AUTOMATICALLY SELECTING COLLECTIONS TO SEARCH IN FULL TEXT SEARCHES
Inventors: Steven T. Kirsch, William Chang, Edward Miller
SC/Ser No.: 08/928,542
Filed Date: Sep. 12, 1997
Continuation in Parts (1)

| | Number | Date | Country |
|---|---|---|---|
| Parent | 696782 | Aug 1996 | |