Claims
- 1. A method for summarizing the contents of a natural language document including a plurality of sentences and provided in electronic or digital form, said method comprising:
A. extracting from said document eSAOs, including extracting subjects, objects, and actions and extracting one or more of adjectives, prepositions, indirect objects and adverbials;
B. determining a weight for each eSAO;
C. for each sentence in said document, using the weights of all eSAOs for said sentence to obtain a sentence weight; and
D. generating one or more document summaries as a function of said sentence weights.
- 2. The method of claim 1, further including in step A determining attributes for at least some of said subjects, objects, and indirect objects, wherein an attribute represents a word or phrase having a relationship to the subject, object, or indirect object for which it is an attribute.
- 3. The method of claim 2, wherein said relationship is one or more of a feature, inclusion, placement, formation, connection, separation, or transfer.
- 4. The method of claim 3, wherein said relationship is a feature of a type of parameter.
- 5. The method of claim 1, wherein step A includes determining Cause-Effect relationships between said eSAOs.
- 6. The method of claim 1, wherein step B is accomplished using statistical weighting, including determining said eSAO weight as a function of the frequency of appearance of components of said eSAOs in said document.
- 7. The method of claim 6, wherein the statistical weight of said sentence is a function of the maximum weight of each eSAO in said sentence.
- 8. The method of claim 1, further including determining a cue weight for each sentence using cue weighting, including determining said cue weight as a function of a quantitative importance assigned to words and phrases, wherein said sentence weights are further determined as a function of said cue weights.
- 9. The method of claim 8, further including determining a Cause-Effect weight for each sentence using Cause-Effect weighting, including determining said Cause-Effect weight as a function of a quantitative score assigned to words and phrases having a Cause-Effect relationship, wherein said sentence weights are further determined as a function of said Cause-Effect weights.
- 10. The method of claim 1, further including determining a Cause-Effect weight for each sentence using Cause-Effect weighting, including determining said Cause-Effect weight as a function of a quantitative score assigned to words and phrases having a Cause-Effect relationship, wherein said sentence weights are further determined as a function of said Cause-Effect weights.
- 11. The method of claim 1, wherein one or more document summaries are selectable from a set of document summary types including at least one of a key-word summary, a topic-oriented summary, an eSAO summary, a classic summary, and a field-oriented summary.
- 12. The method of claim 1, wherein step D includes contracting said document summary by deleting introductory phrases and sentences as a function of a set of document patterns, wherein said document patterns identify said introductory phrases and sentences as having low relevance.
- 13. A method for summarizing the contents of a natural language document provided in electronic or digital form, said method comprising:
A. performing linguistic analysis, including:
i) tagging substantially each word as a function of a part of speech of said word;
ii) parsing verbal sequences and noun phrases from said tagged words; and
iii) building a syntactical parsed tree from said verbal sequences and noun phrases, according to a set of rules, wherein words grouped by a rule become inaccessible to other rules;
B. weighting each sentence in the document as a function of quantitative importance and said syntactical parsed tree; and
C. generating one or more document summaries, from a plurality of selectable document summary types, as a function of the sentence weights.
- 14. The method of claim 13, further comprising, before step A:
D. preformatting the document, including:
i) removing symbols that are not part of the natural language text;
ii) correcting mismatches and misspellings;
iii) dividing the document into words and sentences; and
iv) recognizing document fields.
- 15. The method of claim 13, wherein step A includes extracting from said document eSAOs, including extracting subjects, objects, and actions and extracting one or more of adjectives, prepositions, indirect objects and adverbials.
- 16. The method of claim 14, wherein step B includes determining a weight for each eSAO.
- 17. The method of claim 14, wherein step B includes determining a cue weight for each sentence.
- 18. The method of claim 14, wherein step B includes determining a Cause-Effect weight for each sentence.
- 19. A system for summarizing the contents of a natural language document provided in electronic or digital form, said system comprising:
A. at least one memory having a set of linguistic rules stored therein;
B. a linguistic analyzer coupled to said at least one memory and configured for:
i) tagging substantially each word as a function of a part of speech of said word;
ii) parsing verbal sequences and noun phrases from said tagged words; and
iii) building a syntactical parsed tree from said verbal sequences and noun phrases, according to said set of rules, wherein words grouped by a rule become inaccessible to other rules;
C. a sentence weighting module configured to access said syntactical parsed tree and to weight each sentence in the document as a function of quantitative importance and said syntactical parsed tree; and
D. a summary generator for generating one or more document summaries, from a plurality of selectable document summary types, as a function of the sentence weights.
- 20. The system as in claim 19, further comprising:
E. a preformatter configured for:
i) removing symbols that are not part of the natural language text;
ii) correcting mismatches and misspellings;
iii) dividing the document into words and sentences; and
iv) recognizing document fields.
- 21. The system of claim 19, wherein said linguistic analyzer includes an eSAOs extractor, configured for extracting subjects, objects, and actions and further configured for extracting one or more of adjectives, prepositions, indirect objects and adverbials.
- 22. The system of claim 21, wherein said sentence weighting module is further configured for determining a weight for each eSAO.
- 23. The system of claim 21, wherein said sentence weighting module is further configured for determining a cue weight for each sentence.
- 24. The system of claim 21, wherein said sentence weighting module is further configured for determining a Cause-Effect weight for each sentence.
- 25. A system for summarizing the contents of a natural language document including a plurality of sentences and provided in electronic or digital form, said system comprising:
A. at least one memory having a set of linguistic rules stored therein;
B. a linguistic analyzer coupled to said at least one memory and configured for extracting from said document eSAOs, including extracting subjects, objects, and actions and extracting one or more of adjectives, prepositions, indirect objects and adverbials;
C. a weighting module for determining a weight for each eSAO and, for each sentence in said document, using the weights of all eSAOs for said sentence to obtain a sentence weight; and
D. a summary generator for generating one or more document summaries as a function of said sentence weights.
- 26. The system of claim 25, wherein said linguistic analyzer is further configured for determining attributes for at least some of said subjects, objects, and indirect objects, wherein an attribute represents a word or phrase having a relationship to the subject, object, or indirect object for which it is an attribute.
- 27. The system of claim 26, wherein said relationship is one or more of a feature, inclusion, placement, formation, connection, separation, or transfer.
- 28. The system of claim 27, wherein said relationship is a feature of a type of parameter.
- 29. The system of claim 25, wherein said linguistic analyzer is further configured for determining Cause-Effect relationships between said eSAOs.
- 30. The system of claim 25, wherein said weighting module is configured for performing statistical weighting, including determining said eSAO weight as a function of the frequency of appearance of components of said eSAO in said document.
- 31. The system of claim 30, wherein the weight of said sentence is a function of the maximum weight of each eSAO in said sentence.
- 32. The system of claim 25, wherein said weighting module is further configured for determining a cue weight for each sentence using cue weighting, including determining said cue weight as a function of a quantitative importance assigned to words and phrases, wherein said sentence weights are further determined as a function of said cue weights.
- 33. The system of claim 32, wherein said weighting module is further configured for determining a Cause-Effect weight for each sentence using Cause-Effect weighting, including determining said Cause-Effect weight as a function of a quantitative score assigned to words and phrases having a Cause-Effect relationship, wherein said sentence weights are further determined as a function of said Cause-Effect weights.
- 34. The system of claim 25, wherein said weighting module is further configured for determining a Cause-Effect weight for each sentence using Cause-Effect weighting, including determining said Cause-Effect weight as a function of a quantitative score assigned to words and phrases having a Cause-Effect relationship, wherein said sentence weights are further determined as a function of said Cause-Effect weights.
- 35. The system of claim 25, wherein said one or more document summaries are selectable from a set of document summary types including at least one of a key-word summary, a topic-oriented summary, an eSAO summary, a classic summary, and a field-oriented summary.
- 36. The system of claim 25, wherein said summary generator is further configured for contracting said document summary by deleting introductory phrases and sentences as a function of a set of document patterns, wherein said document patterns identify said introductory phrases and sentences as having low relevance.
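By way of a non-limiting illustration, the statistical weighting recited in claims 1, 6, and 7 might be sketched in Python as follows. The sketch assumes that eSAOs have already been extracted for each sentence. The `ESAO` container, the function names, the averaging formula used for the eSAO weight, and the top-k selection rule are all hypothetical choices made for this example; the claims require only that the eSAO weight be a function of component frequency and that the sentence weight be a function of the maximum eSAO weight.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ESAO:
    """Hypothetical container for one extracted eSAO (illustrative fields only)."""
    subject: str
    action: str
    obj: str

    def components(self):
        # Only non-empty components take part in the frequency count.
        return [c for c in (self.subject, self.action, self.obj) if c]


def esao_weights(sentences):
    """Weight each eSAO by the normalized mean frequency of its components.

    `sentences` is a list of lists of ESAO, one inner list per sentence.
    The averaging and normalization are assumptions made for this sketch.
    """
    counts = Counter(c for s in sentences for e in s for c in e.components())
    top = max(counts.values(), default=1)

    def weight(e):
        parts = e.components()
        if not parts:
            return 0.0
        return sum(counts[c] for c in parts) / (len(parts) * top)

    return [[weight(e) for e in s] for s in sentences]


def sentence_weights(sentences):
    """Sentence weight = maximum eSAO weight in the sentence (0.0 if none)."""
    return [max(ws, default=0.0) for ws in esao_weights(sentences)]


def summarize(texts, sentences, k=1):
    """Return the k highest-weighted sentences, kept in document order."""
    weights = sentence_weights(sentences)
    ranked = sorted(range(len(texts)), key=lambda i: weights[i], reverse=True)[:k]
    return [texts[i] for i in sorted(ranked)]
```

In this sketch, a sentence with no extracted eSAOs receives a weight of zero and is therefore the last candidate for inclusion. Cue weights and Cause-Effect weights, as in claims 8 through 10, could be combined with the statistical weight before ranking, but no particular combination is assumed here.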
CROSS REFERENCES TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority under 35 U.S.C. §119(e) from co-pending, commonly owned U.S. provisional patent application serial No. 60/308,886, entitled COMPUTER BASED SUMMARIZATION OF NATURAL LANGUAGE DOCUMENTS, filed Jul. 31, 2001.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60308886 | Jul 2001 | US |