Claims
- 1. A method comprising:
identifying a sentence of an essay; determining a feature associated with said sentence; determining a probability of said sentence being a discourse element by mapping said feature to a model, said model having been generated by a machine learning application based on at least one annotated essay; and annotating said essay based on said probability.
- 2. The method according to claim 1, wherein said discourse element is at least one of: title; background; thesis statement; main points; support; and conclusion.
- 3. The method according to claim 1, further comprising:
receiving said essay.
- 4. The method according to claim 1, wherein said feature comprises at least one of: a positional feature; a lexical feature; a rhetorical feature; and a punctuation.
- 5. The method according to claim 4, further comprising:
generating a flat file for said essay, said flat file including an entry for said sentence; modifying said entry to include data associated with said positional feature; modifying said entry to include data associated with said lexical feature; modifying said entry to include data associated with said rhetorical feature; identifying said punctuation associated with said sentence; and modifying said entry to include data associated with said punctuation.
- 6. The method according to claim 5, wherein said positional feature comprises at least one of:
a sentence position, said sentence position being associated with a position of said sentence within said essay; a relative sentence position, said relative sentence position being associated with a relative position of said sentence within said essay; a paragraph position, said paragraph position being associated with a position of said sentence within a paragraph of said essay; and a relative paragraph position, said relative paragraph position being associated with a relative position of said paragraph within said essay.
- 7. The method according to claim 5, wherein said lexical feature comprises at least one of:
a category-specific cue, said category-specific cue being typically associated with a discourse element; a general vocabulary cue, said general vocabulary cue being typically associated with a discourse structure; and a key term, said key term being typically associated with a discourse relationship.
- 8. The method according to claim 5, further comprising:
generating a rhetorical structure tree based on said flat file; and identifying said rhetorical feature based on said rhetorical structure tree, wherein said rhetorical feature comprises at least one of:
a discourse structure, said discourse structure being typically associated with an elementary discourse unit; a rhetorical relation, said rhetorical relation describing a manner of association between a plurality of said discourse structures; and a status, said status comprising:
a nucleus, said nucleus being associated with a relatively more important one of said plurality of discourse structures; and a satellite, said satellite being associated with a relatively less important one of said plurality of discourse structures.
- 9. The method according to claim 8, wherein said rhetorical structure tree is mapped to a plurality of models and said probability is determined based on a voting algorithm.
- 10. A method comprising:
identifying a sentence of an essay; generating a flat file for said essay, said flat file including an entry for said sentence; determining a positional feature associated with said sentence; modifying said entry to include data associated with said positional feature; identifying a lexical feature associated with said sentence; modifying said entry to include data associated with said lexical feature; identifying a rhetorical feature associated with said sentence; modifying said entry to include data associated with said rhetorical feature; determining a probability of said sentence being a discourse element by mapping said flat file to a model, said model having been generated by a machine learning application based on at least one annotated essay; and annotating said essay based on said probability.
- 11. The method according to claim 10, wherein said discourse element is at least one of: title; background; thesis statement; main points; support; and conclusion.
- 12. The method according to claim 10, further comprising:
receiving said essay.
- 13. The method according to claim 10, wherein said positional feature comprises at least one of:
a sentence position, said sentence position being associated with a position of said sentence within said essay; a relative sentence position, said relative sentence position being associated with a relative position of said sentence within said essay; a paragraph position, said paragraph position being associated with a position of said sentence within a paragraph of said essay; and a relative paragraph position, said relative paragraph position being associated with a relative position of said paragraph within said essay.
- 14. The method according to claim 10, wherein said lexical feature comprises at least one of:
a category-specific cue, said category-specific cue being typically associated with a discourse element; a general vocabulary cue, said general vocabulary cue being typically associated with a discourse structure; and a key term, said key term being typically associated with a discourse relationship.
- 15. The method according to claim 10, further comprising:
generating a rhetorical structure tree based on said flat file; and identifying said rhetorical feature based on said rhetorical structure tree, wherein said rhetorical feature comprises at least one of:
a discourse structure, said discourse structure being typically associated with an elementary discourse unit; a rhetorical relation, said rhetorical relation describing a manner of association between a plurality of said discourse structures; and a status, said status comprising:
a nucleus, said nucleus being associated with a relatively more important one of said plurality of discourse structures; and a satellite, said satellite being associated with a relatively less important one of said plurality of discourse structures.
- 16. The method according to claim 10, further comprising:
identifying a punctuation associated with said sentence; and modifying said entry to include data associated with said punctuation.
- 17. The method according to claim 10, wherein said flat file is mapped to a plurality of models and said probability is determined based on a voting algorithm.
- 18. A method comprising:
receiving an essay; identifying a sentence of said essay; generating a flat file for said essay, said flat file including an entry for said sentence; determining a positional feature associated with said sentence, wherein said positional feature comprises at least one of:
a sentence position, said sentence position being associated with a position of said sentence within said essay; a relative sentence position, said relative sentence position being associated with a relative position of said sentence within said essay; a paragraph position, said paragraph position being associated with a position of said sentence within a paragraph of said essay; and a relative paragraph position, said relative paragraph position being associated with a relative position of said paragraph within said essay; modifying said entry to include data associated with said positional feature; identifying a lexical feature associated with said sentence, wherein said lexical feature comprises at least one of:
a category-specific cue, said category-specific cue being typically associated with a discourse element; a general vocabulary cue, said general vocabulary cue being typically associated with a discourse structure; and a key term, said key term being typically associated with a discourse relationship; modifying said entry to include data associated with said lexical feature; identifying a punctuation associated with said sentence; modifying said entry to include data associated with said punctuation; generating a rhetorical structure tree based on said flat file; identifying a rhetorical feature based on said rhetorical structure tree, wherein said rhetorical feature comprises at least one of:
a discourse structure, said discourse structure being typically associated with an elementary discourse unit; a rhetorical relation, said rhetorical relation describing a manner of association between a plurality of said discourse structures; and a status, said status comprising:
a nucleus, said nucleus being associated with a relatively more important one of said plurality of discourse structures; and a satellite, said satellite being associated with a relatively less important one of said plurality of discourse structures; modifying said entry to include data associated with said rhetorical feature; determining a probability of said sentence being a discourse element by mapping said flat file to a model, said model having been generated by a machine learning application based on at least one annotated essay; and annotating said essay based on said probability.
- 19. The method according to claim 18, wherein said discourse element is at least one of: title; background; thesis statement; main points; support; and conclusion.
- 20. A process comprising:
training a first judge to identify a sentence within an essay as being a discourse element; accepting a first annotation of said essay from said first judge; evaluating said first judge based on a comparison of said first annotation to a second annotation of a second judge; and calculating an empirical probability based on said first annotation in response to said evaluation exceeding a predetermined value, said empirical probability including at least one of:
a positional feature of said sentence within said essay; a category-specific feature of said sentence within said essay; a lexical feature of said sentence within said essay; a key term of said sentence within said essay; and a punctuation of said sentence within said essay.
- 21. A computer readable medium on which is embedded computer software, said software comprising executable code for performing a method comprising:
identifying a sentence of an essay; determining a feature associated with said sentence; determining a probability of said sentence being a discourse element by mapping said feature to a model, said model having been generated by a machine learning application based on at least one annotated essay; and annotating said essay based on said probability.
- 22. The computer readable medium according to claim 21, wherein said discourse element is at least one of: title; background; thesis statement; main points; support; and conclusion.
- 23. The computer readable medium according to claim 21, wherein said method further comprises:
receiving said essay.
- 24. The computer readable medium according to claim 21, wherein said feature comprises at least one of: a positional feature; a lexical feature; a rhetorical feature; and a punctuation.
- 25. The computer readable medium according to claim 24, wherein said method further comprises:
generating a flat file for said essay, said flat file including an entry for said sentence; modifying said entry to include data associated with said positional feature; modifying said entry to include data associated with said lexical feature; modifying said entry to include data associated with said rhetorical feature; identifying said punctuation associated with said sentence; and modifying said entry to include data associated with said punctuation.
- 26. The computer readable medium according to claim 25, wherein said positional feature comprises at least one of:
a sentence position, said sentence position being associated with a position of said sentence within said essay; a relative sentence position, said relative sentence position being associated with a relative position of said sentence within said essay; a paragraph position, said paragraph position being associated with a position of said sentence within a paragraph of said essay; and a relative paragraph position, said relative paragraph position being associated with a relative position of said paragraph within said essay.
- 27. The computer readable medium according to claim 25, wherein said lexical feature comprises at least one of:
a category-specific cue, said category-specific cue being typically associated with a discourse element; a general vocabulary cue, said general vocabulary cue being typically associated with a discourse structure; and a key term, said key term being typically associated with a discourse relationship.
- 28. The computer readable medium according to claim 25, wherein said method further comprises:
generating a rhetorical structure tree based on said flat file; and identifying said rhetorical feature based on said rhetorical structure tree, wherein said rhetorical feature comprises at least one of:
a discourse structure, said discourse structure being typically associated with an elementary discourse unit; a rhetorical relation, said rhetorical relation describing a manner of association between a plurality of said discourse structures; and a status, said status comprising:
a nucleus, said nucleus being associated with a relatively more important one of said plurality of discourse structures; and a satellite, said satellite being associated with a relatively less important one of said plurality of discourse structures.
- 29. The computer readable medium according to claim 28, wherein said rhetorical structure tree is mapped to a plurality of models and said probability is determined based on a voting algorithm.
- 30. An automatic essay annotator comprising:
means for identifying a sentence of an essay; means for determining a feature associated with said sentence; means for determining a probability of said sentence being a discourse element, said means for determining said probability being configured to map said feature to a model, said model having been generated by a machine learning application based on at least one annotated essay and said discourse element being at least one of: title; background; thesis statement; main points; support; and conclusion; and means for annotating said essay based on said probability.
- 31. The automatic essay annotator according to claim 30, further comprising:
means for receiving said essay.
- 32. The automatic essay annotator according to claim 30, wherein said means for determining said feature further comprises at least one of:
means for determining a positional feature; means for determining a lexical feature; means for determining a rhetorical feature; and means for determining a punctuation.
- 33. The automatic essay annotator according to claim 32, further comprising:
means for generating a flat file for said essay, said flat file including an entry for said sentence; means for modifying said entry to include data associated with said positional feature; means for modifying said entry to include data associated with said lexical feature; means for modifying said entry to include data associated with said rhetorical feature; means for identifying said punctuation associated with said sentence; and means for modifying said entry to include data associated with said punctuation.
- 34. The automatic essay annotator according to claim 33, wherein said means for determining a positional feature comprises at least one of:
means for determining a sentence position, said sentence position being associated with a position of said sentence within said essay; means for determining a relative sentence position, said relative sentence position being associated with a relative position of said sentence within said essay; means for determining a paragraph position, said paragraph position being associated with a position of said sentence within a paragraph of said essay; and means for determining a relative paragraph position, said relative paragraph position being associated with a relative position of said paragraph within said essay.
- 35. The automatic essay annotator according to claim 33, wherein said means for determining a lexical feature comprises at least one of:
means for identifying a category-specific cue, said category-specific cue being typically associated with a discourse element; means for identifying a general vocabulary cue, said general vocabulary cue being typically associated with a discourse structure; and means for identifying a key term, said key term being typically associated with a discourse relationship.
- 36. The automatic essay annotator according to claim 33, further comprising:
means for generating a rhetorical structure tree based on said flat file; and means for identifying said rhetorical feature based on said rhetorical structure tree, wherein said rhetorical feature comprises at least one of:
a discourse structure, said discourse structure being typically associated with an elementary discourse unit; a rhetorical relation, said rhetorical relation describing a manner of association between a plurality of said discourse structures; and a status, said status comprising:
a nucleus, said nucleus being associated with a relatively more important one of said plurality of discourse structures; and a satellite, said satellite being associated with a relatively less important one of said plurality of discourse structures.
- 37. The automatic essay annotator according to claim 36, wherein said means for determining said probability further comprises:
means for mapping said rhetorical structure tree to a plurality of models, said probability being determined based on a voting algorithm.
- 38. An automatic essay annotator comprising:
a feature extractor, said feature extractor comprising:
a position identifier configured to determine a positional feature associated with a sentence of an essay, said position identifier further configured to generate a flat file, said flat file including an entry for said sentence, said entry including data associated with said positional feature; a lexical item identifier configured to identify a lexical feature associated with said sentence, said lexical item identifier further configured to modify said entry to include data associated with said lexical feature; and a rhetorical relation identifier configured to identify a rhetorical feature, said rhetorical relation identifier being further configured to modify said entry to include data associated with said rhetorical feature; and a discourse analysis modeler configured to determine a probability of said sentence being a discourse element, said discourse analysis modeler being configured to determine said probability by mapping said flat file to a model, said model having been generated by a machine learning application based on at least one annotated essay, said discourse analysis modeler being further configured to annotate said essay based on said probability.
- 39. The automatic essay annotator according to claim 38, wherein said discourse analysis modeler is further configured to determine said probability of said sentence being at least one of a plurality of discourse elements, said plurality of discourse elements including: title; background; thesis statement; main points; support; and conclusion.
- 40. The automatic essay annotator according to claim 38, wherein said feature extractor is configured to receive said essay.
- 41. The automatic essay annotator according to claim 38, wherein said position identifier is further configured to determine at least one of:
a sentence position, said sentence position being associated with a position of said sentence within said essay; a relative sentence position, said relative sentence position being associated with a relative position of said sentence within said essay; a paragraph position, said paragraph position being associated with a position of said sentence within a paragraph of said essay; and a relative paragraph position, said relative paragraph position being associated with a relative position of said paragraph within said essay.
- 42. The automatic essay annotator according to claim 38, wherein said lexical item identifier comprises:
a category-specific cue identifier configured to identify a cue typically associated with a discourse element; a general vocabulary cue identifier configured to identify a cue typically associated with a discourse structure; and a key term identifier configured to identify a key term, said key term being typically associated with a discourse relationship.
- 43. The automatic essay annotator according to claim 38, further comprising:
a punctuation identifier configured to identify a punctuation associated with said sentence, said punctuation identifier further configured to modify said entry to include data associated with said punctuation.
- 44. The automatic essay annotator according to claim 38, wherein said rhetorical relation identifier is further configured to generate a rhetorical structure tree based on said flat file and identify said rhetorical feature based on said rhetorical structure tree, wherein said rhetorical feature comprises at least one of:
a discourse structure, said discourse structure being typically associated with an elementary discourse unit; a rhetorical relation, said rhetorical relation describing a manner of association between a plurality of said discourse structures; and a status, said status comprising:
a nucleus, said nucleus being associated with a relatively more important one of said plurality of discourse structures; and a satellite, said satellite being associated with a relatively less important one of said plurality of discourse structures.
- 45. The automatic essay annotator according to claim 44, wherein said discourse analysis modeler is further configured to map said rhetorical structure tree to a plurality of models and to determine said probability of said sentence being a discourse element based on a voting algorithm.
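The flat-file feature entry and the multi-model voting recited in the claims above can be illustrated with a minimal Python sketch. It assumes the essay has already been split into paragraphs and sentences, represents each flat-file entry as a dictionary, uses a hypothetical cue list (`CATEGORY_CUES`), and treats each "model" as a hypothetical callable that returns per-label probabilities. The vote share stands in for the probability, which is an assumption, and rhetorical-structure features are omitted. None of these names come from the claims.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Entry = Dict[str, object]
Model = Callable[[Entry], Dict[str, float]]

# Hypothetical category-specific cues; the claims do not list actual cue phrases.
CATEGORY_CUES: Dict[str, Tuple[str, ...]] = {
    "conclusion": ("in conclusion", "to sum up"),
    "thesis_statement": ("i believe", "in my opinion"),
}


def build_flat_file(paragraphs: List[List[str]]) -> List[Entry]:
    """Build one flat-file entry (a feature dictionary) per sentence."""
    total_sentences = sum(len(p) for p in paragraphs)
    total_paragraphs = len(paragraphs)
    entries: List[Entry] = []
    index = 0
    for p_idx, sentences in enumerate(paragraphs):
        for s_idx, sentence in enumerate(sentences):
            index += 1
            lowered = sentence.lower()
            entry: Entry = {
                "sentence": sentence,
                # Positional features: absolute and relative positions
                "sentence_position": index,
                "relative_sentence_position": index / total_sentences,
                "paragraph_position": s_idx + 1,
                "relative_paragraph_position": (p_idx + 1) / total_paragraphs,
                # Punctuation feature: the sentence's final character
                "final_punctuation": sentence.rstrip()[-1:],
            }
            # Lexical feature: one boolean flag per category-specific cue list
            for label, cues in CATEGORY_CUES.items():
                entry["cue_" + label] = any(cue in lowered for cue in cues)
            entries.append(entry)
    return entries


def vote(entry: Entry, models: List[Model]) -> Tuple[str, float]:
    """Majority vote: each model votes for its most probable label.

    Returns the winning label and its share of the votes, used here as
    the probability (an assumption; the claims do not fix the formula).
    """
    ballots = Counter(max(m(entry).items(), key=lambda kv: kv[1])[0] for m in models)
    label, count = ballots.most_common(1)[0]
    return label, count / len(models)


# Toy stand-ins for trained models (hypothetical).
def position_model(entry: Entry) -> Dict[str, float]:
    last = entry["relative_sentence_position"] == 1.0
    return {"conclusion": 0.8, "support": 0.2} if last else {"conclusion": 0.1, "support": 0.9}


def cue_model(entry: Entry) -> Dict[str, float]:
    hit = bool(entry["cue_conclusion"])
    return {"conclusion": 0.9, "support": 0.1} if hit else {"conclusion": 0.2, "support": 0.8}


def punctuation_model(entry: Entry) -> Dict[str, float]:
    return {"conclusion": 0.4, "support": 0.6}


MODELS: List[Model] = [position_model, cue_model, punctuation_model]

essay = [
    ["Schools should start later.", "Teens need more sleep."],
    ["In conclusion, later start times help students!"],
]
for e in build_flat_file(essay):
    print(e["sentence_position"], vote(e, MODELS))
```

Keeping every feature for a sentence in a single entry lets any number of models consume the same input; replacing the toy models with trained classifiers would not require changes to `vote`.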
Parent Case Info
[0001] This application is a continuation-in-part of application Ser. No. 10/052,380, filed on Jan. 23, 2002, which is incorporated herein.
Continuation in Parts (1)
| Relationship | Number | Date | Country |
|---|---|---|---|
| Parent | 10052380 | Jan 2002 | US |
| Child | 10176534 | Jun 2002 | US |