Claims
- 1. A method of constructing a text summarization, comprising:
selecting at least one domain ontology comprising a set of concepts; defining a user profile indicative of the user's interests in terms of the concepts in the selected ontology; determining if a document is relevant to the user based upon the user profile; responsive to determining that the document is relevant, using at least a portion of the selected ontology to extract concepts from the document; determining the degree of match between the extracted concepts and the concepts defined in the user profile; and generating a document text summary if the degree of match exceeds a predetermined threshold.
- 2. The method of claim 1, wherein generating the document text summary comprises:
selecting sentences from the document based on the concepts in the user profile; ranking the selected sentences by relevance to the user profile; selecting sentences for inclusion in the document text summary based upon the ranking; and merging the selected sentences into the document text summary.
- 3. The method of claim 2, wherein selecting the sentences includes selecting all sentences containing the user profile concepts.
- 4. The method of claim 3, wherein selecting the sentences further comprises selecting additional sentences containing antecedents of referring terms.
- 5. The method of claim 3, wherein selecting the sentences further comprises selecting all sentences within a region of the document if the proportion of sentences containing concept terms in the region exceeds a predetermined threshold.
- 6. The method of claim 1, wherein the length of the document text summary is based on a fixed word count specified by the user.
- 7. The method of claim 1, wherein the length of the document text summary is based on a percentage of the length of the document being summarized.
- 8. The method of claim 1, further comprising refining the document text summary including pronominalization of at least a portion of the summary.
- 9. The method of claim 1, further comprising, prior to determining if a document is relevant, retrieving a document using a web crawler via the Internet.
- 10. The method of claim 9, further comprising, after retrieving a document, preprocessing the document including identifying document structure information and performing part-of-speech analysis.
- 11. A computer program product comprising a computer readable medium containing a set of computer executable instructions for constructing a text summarization, the instructions comprising:
computer code means for selecting at least one domain ontology comprising a set of concepts; computer code means for defining a user profile indicative of the user's interests in terms of the concepts in the selected ontology; computer code means for determining if a document is relevant to the user based upon the user profile; computer code means for using at least a portion of the selected ontology to extract concepts from the document responsive to determining that the document is relevant; computer code means for determining the degree of match between the extracted concepts and the concepts defined in the user profile; and computer code means for generating a document text summary if the degree of match exceeds a predetermined threshold.
- 12. The computer program product of claim 11, wherein the code means for generating the document text summary comprises:
computer code means for selecting sentences from the document based on the concepts in the user profile; computer code means for ranking the selected sentences by relevance to the user profile; computer code means for selecting sentences for inclusion in the document text summary based upon the ranking; and computer code means for merging the selected sentences into the document text summary.
- 13. The computer program product of claim 12, wherein the code means for selecting the sentences includes code means for selecting all sentences containing the user profile concept terms.
- 14. The computer program product of claim 13, wherein the code means for selecting the sentences further comprises code means for selecting additional sentences containing pronouns referring to concept terms.
- 15. The computer program product of claim 13, wherein the code means for selecting the sentences further comprises code means for selecting all sentences within a region of the document if the proportion of sentences containing concept terms in the region exceeds a predetermined threshold.
- 16. The computer program product of claim 11, wherein the length of the document text summary is based on a fixed word count specified by the user.
- 17. The computer program product of claim 11, wherein the length of the document text summary is based on a percentage of the length of the document being summarized.
- 18. The computer program product of claim 11, further comprising code means for refining the document text summary including pronominalization of at least a portion of the summary.
- 19. The computer program product of claim 11, further comprising code means for retrieving a document using a web crawler via the Internet prior to determining if a document is relevant.
- 20. The computer program product of claim 19, further comprising code means for preprocessing the document after retrieval including identifying document structure information and performing part-of-speech analysis.
- 21. A data processing system including a processor, memory, and input means, the system further including computer program product code for constructing a text summarization, the code comprising:
computer code means for selecting at least one domain ontology comprising a set of concepts; computer code means for defining a user profile indicative of the user's interests in terms of the concepts in the selected ontology; computer code means for determining if a document is relevant to the user based upon the user profile; computer code means for using at least a portion of the selected ontology to extract concepts from the document responsive to determining that the document is relevant; computer code means for determining the degree of match between the extracted concepts and the concepts defined in the user profile; and computer code means for generating a document text summary if the degree of match exceeds a predetermined threshold.
- 22. The data processing system of claim 21, wherein the code means for generating the document text summary comprises:
computer code means for selecting sentences from the document based on the concepts in the user profile; computer code means for ranking the selected sentences by relevance to the user profile; computer code means for selecting sentences for inclusion in the document text summary based upon the ranking; and computer code means for merging the selected sentences into the document text summary.
- 23. The data processing system of claim 22, wherein the code means for selecting the sentences includes code means for selecting all sentences containing the user profile concept terms.
- 24. The data processing system of claim 23, wherein the code means for selecting the sentences further comprises code means for selecting additional sentences containing pronouns referring to concept terms.
- 25. The data processing system of claim 23, wherein the code means for selecting the sentences further comprises code means for selecting all sentences within a region of the document if the proportion of sentences containing concept terms in the region exceeds a predetermined threshold.
- 26. The data processing system of claim 21, wherein the length of the document text summary is based on a fixed word count specified by the user.
- 27. The data processing system of claim 21, wherein the length of the document text summary is based on a percentage of the length of the document being summarized.
- 28. The data processing system of claim 21, further comprising code means for refining the document text summary including pronominalization of at least a portion of the summary.
- 29. The data processing system of claim 21, further comprising code means for retrieving a document using a web crawler via the Internet prior to determining if a document is relevant.
- 30. The data processing system of claim 29, further comprising code means for preprocessing the document after retrieval including identifying document structure information and performing part-of-speech analysis.
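The steps of claim 1 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an ontology is a dictionary mapping each concept to lower-case surface terms, that a user profile is a dictionary of interest weights over those concepts, that "relevant" means at least one profile term occurs in the document, and that the degree of match is the share of the profile's total weight covered by the extracted concepts. The names `ONTOLOGY`, `concepts_in`, `degree_of_match`, `decide` and the 0.5 threshold are hypothetical and do not come from the claims.

```python
import re

# Hypothetical ontology: concept name -> lower-case surface terms expressing it.
ONTOLOGY = {
    "merger": {"merger", "acquisition", "takeover"},
    "earnings": {"earnings", "quarterly profit", "revenue"},
    "litigation": {"lawsuit", "litigation"},
}

def concepts_in(text, ontology):
    """Return the concepts whose terms occur in text as whole words or phrases."""
    lowered = text.lower()
    return {
        concept
        for concept, terms in ontology.items()
        if any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms)
    }

def degree_of_match(extracted, profile):
    """Assumed measure: fraction of total profile weight covered by extracted concepts."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return sum(w for c, w in profile.items() if c in extracted) / total

def decide(text, ontology, profile, threshold=0.5):
    """Return (generate_summary, score) following the order of steps in claim 1."""
    # Relevance filter using only the profile's portion of the ontology.
    profile_part = {c: ontology[c] for c in profile if c in ontology}
    if not concepts_in(text, profile_part):
        return False, 0.0
    # Concept extraction with the selected ontology, then the thresholded match.
    extracted = concepts_in(text, ontology)
    score = degree_of_match(extracted, profile)
    return score > threshold, score
```

With a profile of `{"merger": 2.0, "earnings": 1.0}`, a text mentioning both a takeover and quarterly profit scores 1.0 and would be summarized, while a text mentioning only revenue passes the relevance filter but scores about 0.33 and would not.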
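The sentence selection, ranking, length limit, and merging of claims 2 through 7 can likewise be sketched under stated assumptions: a naive punctuation-based sentence splitter, a sentence's relevance taken as the summed weights of the profile concepts it mentions, the claim 5 "region" modeled as a fixed sliding window of sentences, greedy filling of the length budget in rank order, and merging in original document order. The budget is either a fixed word count (claim 6) or a fraction of the document's words (claim 7). Antecedent selection (claim 4) and pronominalization (claim 8) are not modeled. All function names, parameters, and default values here are hypothetical.

```python
import math
import re

def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def sentence_score(sentence, profile_terms, weights):
    """Relevance of one sentence: summed weights of the profile concepts it mentions."""
    lowered = sentence.lower()
    return sum(
        weights[c]
        for c, terms in profile_terms.items()
        if any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms)
    )

def summarize(text, profile_terms, weights, max_words=None, fraction=None,
              window=4, region_threshold=0.7):
    """Hypothetical sketch of claims 2-7; window and region_threshold are made-up defaults."""
    sentences = split_sentences(text)
    scores = [sentence_score(s, profile_terms, weights) for s in sentences]

    # Claim 3: select every sentence containing a profile concept.
    chosen = {i for i, sc in enumerate(scores) if sc > 0}

    # Claim 5: if the share of concept sentences in a sliding window exceeds
    # region_threshold, select every sentence in that window.
    for start in range(max(len(sentences) - window + 1, 0)):
        region = range(start, start + window)
        hits = sum(1 for i in region if scores[i] > 0)
        if hits / window > region_threshold:
            chosen.update(region)

    # Claims 6-7: word budget from a fixed count, else a fraction of the document.
    total_words = sum(len(s.split()) for s in sentences)
    if max_words is not None:
        budget = max_words
    else:
        budget = math.ceil((1.0 if fraction is None else fraction) * total_words)

    # Claim 2: rank by relevance (ties keep document order), fill the budget greedily.
    selected, used = [], 0
    for i in sorted(chosen, key=lambda i: (-scores[i], i)):
        n = len(sentences[i].split())
        if used + n <= budget:
            selected.append(i)
            used += n

    # Claim 2: merge the selected sentences back in their original order.
    return " ".join(sentences[i] for i in sorted(selected))
```

Restoring document order at the merge step keeps the summary readable even though selection is driven by relevance rank; a lower-ranked sentence can still be included if it fits the remaining budget after a longer, higher-ranked one is skipped.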
Parent Case Info
[0001] This application claims priority under 35 USC § 119(e)(1) from the provisional patent application entitled CONCEPT-BASED ONTOLOGY TEXT SUMMARIZATION, Serial No. 60/215,436, filed Jun. 30, 2000.
Provisional Applications (1)
| Number | Date | Country |
| 60215436 | Jun 2000 | US |