Claims
- 1) A method of forming a summary for hierarchically related information, where the information can be represented as a set of nodes, wherein each node is associated with a portion of the information and that portion of the information contains at least one sentence, the nodes are connected by directed edges such that each node has at most one incoming edge, a parent node is the source of an incoming edge, and a child node is the target of an outgoing edge, the method comprising:
a) determining a sentence vector for each sentence associated with each node, b) determining a centroid vector of the sentence vectors, c) determining an intrinsic score for each sentence, d) selecting the sentence with the highest intrinsic score to form a summary, e) determining an extract score for each of the remaining sentences from the intrinsic score and the summary, f) selecting the sentence with the highest extract score and adding it to the summary, and g) repeating steps e) and f) until a desired number of sentences are selected.
- 2) The method of claim 1 wherein the centroid vector of the sentence vectors comprises a vector where each position in the vector contains the average of all values of all the sentence vectors.
- 3) The method of claim 1 wherein determining an intrinsic score for each sentence comprises:
a) determining a position of the sentence in the information, a position of the node associated with the sentence in the hierarchy, and a lexical centrality of the sentence to the information, and b) responsive to the determinations in step a), determining an intrinsic score for the sentence.
- 4) The method of claim 1 wherein determining an extract score for each of the remaining sentences comprises:
a) selecting one of the remaining sentences, b) determining the number of non-quoting sentences in the summary which are adjacent to the selected sentence, the number of sentences in the summary whose associated nodes are either a parent node or a child node of the node associated with the selected sentence, the number of sentences in the summary which are quoted immediately before the selected sentence, and the number of sentences in the summary that appear immediately after a quote of the selected sentence, c) responsive to the determinations in step b) and the intrinsic score of the selected sentence, determining an extract score, and d) repeating steps a) through c) until each of the remaining sentences has an extract score.
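The steps recited in claims 1 through 4 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions made only for illustration: sentence vectors are plain term counts, lexical centrality is taken as cosine similarity to the centroid, the `weights` and `bonus` parameters are hypothetical, related sentences already in the summary are assumed to raise (rather than lower) the extract score, and the quote-related counts of claim 4 are omitted. The names `Sentence`, `intrinsic_scores`, `related`, and `summarize` are likewise hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    text: str
    node: int    # id of the node (e.g. a message) containing the sentence
    index: int   # position of the sentence within its node


def sentence_vector(text, vocab):
    # Assumption: a sentence vector is a term-count vector over the vocabulary.
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def centroid(vectors):
    # Each position holds the average of that position over all sentence vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def intrinsic_scores(sentences, parent, weights=(1.0, 0.5, 0.5)):
    # Combines lexical centrality, sentence position, and node depth.
    # The weights and the 1 / (1 + x) decay are illustrative assumptions.
    vocab = sorted({w for s in sentences for w in s.text.lower().split()})
    vectors = [sentence_vector(s.text, vocab) for s in sentences]
    c = centroid(vectors)
    w_cent, w_pos, w_depth = weights

    def depth(node):
        d = 0
        while parent.get(node) is not None:
            node = parent[node]
            d += 1
        return d

    return [
        w_cent * cosine(v, c) + w_pos / (1 + s.index) + w_depth / (1 + depth(s.node))
        for s, v in zip(sentences, vectors)
    ]


def related(a, b, parent):
    # True if the sentences are adjacent in the same node, or if their
    # nodes stand in a parent/child relationship.
    adjacent = a.node == b.node and abs(a.index - b.index) == 1
    linked = parent.get(a.node) == b.node or parent.get(b.node) == a.node
    return adjacent or linked


def summarize(sentences, parent, k, bonus=0.2):
    intrinsic = intrinsic_scores(sentences, parent)
    remaining = set(range(len(sentences)))
    summary = []  # indices of selected sentences, in selection order
    while remaining and len(summary) < k:
        def extract(i):
            # Extract score: intrinsic score adjusted by the current summary.
            n = sum(related(sentences[i], sentences[j], parent) for j in summary)
            return intrinsic[i] + bonus * n
        best = max(sorted(remaining), key=extract)
        summary.append(best)
        remaining.remove(best)
    return [sentences[i] for i in summary]


# Hypothetical two-node thread: node 1 is a reply to node 0.
parent = {0: None, 1: 0}
sentences = [
    Sentence("the budget meeting is on friday", 0, 0),
    Sentence("please bring the budget report", 0, 1),
    Sentence("friday works for me", 1, 0),
]
result = summarize(sentences, parent, k=2)
```

On the first pass the summary is empty, so the extract score equals the intrinsic score and the sentence with the highest intrinsic score is selected, matching step d) of claim 1. Each later pass recomputes the extract scores against the growing summary, as in steps e) through g).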
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This patent application is related to:
[0002] U.S. patent application Ser. No. ______, titled “A Method and Apparatus for Normalizing Quoting Styles in Electronic Mail”, by Newman, filed concurrently herewith,
[0003] U.S. patent application Ser. No. ______, titled “A Method and Apparatus for Clustering Hierarchically Related Information”, by Newman et al., filed concurrently herewith,
[0004] U.S. patent application Ser. No. ______, titled “A Method and Apparatus for Generating Overview Information for Hierarchically Related Information”, by Newman et al., filed concurrently herewith,
[0005] U.S. patent application Ser. No. ______, titled “Method and Apparatus for Displaying Hierarchical Information”, by Newman, filed concurrently herewith, and
[0006] U.S. patent application Ser. No. ______, titled “Method and Apparatus for Segmenting Hierarchical Information for Display Purposes”, by Newman, filed concurrently herewith.