Claims
- 1. A text segmentation apparatus comprising:
means for analyzing an electronic text to determine a likelihood of segmentation point for each of the sentence ends in said text based on a coherent unit; and means for segmenting said text into text segments based on said likelihood of segmentation point and a specified text segmentation size.
- 2. A text segmentation apparatus comprising:
means for analyzing an electronic text to determine a likelihood of segmentation point for each of the sentence ends in said text based on a coherent unit; and means for segmenting said text into text segments based on said likelihood of segmentation point, wherein when the size of any of said segmented text segments exceeds a threshold value to be determined based on a specified text segmentation size, said text segmentation apparatus is programmed to segment said text segment at the position having the best likelihood of segmentation point within said text segment.
- 3. The text segmentation apparatus as claimed in claim 2, further comprising means for setting up a pair of windows, each having a predetermined window size, on both left and right sides of each of said sentence ends in said text, for determining similarity of terms contained in said left and right windows, wherein said means for analyzing determines the likelihood of segmentation point based on said similarity.
- 4. The text segmentation apparatus as claimed in claim 3, wherein said means for setting up determines an overall likelihood of segmentation point F(c) based on a plurality of likelihoods of segmentation point f(c), each of which is determined respectively for each of a number (L) of different window sizes, where c represents a respective sentence end position.
- 5. The text segmentation apparatus as claimed in claim 2, wherein when the size of any of said segmented text segments is smaller than the specified text segmentation size by a predetermined degree, said apparatus is programmed to revisit the previous unsegmented text segment so as to segment said text at the position having the second best likelihood of segmentation point.
- 6. The text segmentation apparatus as claimed in claim 2, wherein said apparatus is programmed to determine the similarity between the segmented text segments and form association links on the text segments if their determined similarity exceeds a predetermined threshold value.
- 7. The text segmentation apparatus as claimed in claim 6, wherein said text segments are formatted using a markup language, and said association links are embedded in said text segments using said markup language.
- 8. A display device receiving the segmented text segments from the text segmentation apparatus as claimed in claim 2 for displaying said text segments in sequence in terms of their association.
- 9. The display device as claimed in claim 8, wherein said segmented text segments are associated with each other based on the similarity between the text segments or the global text structure, and are then displayed in the sequence of such association.
- 10. The text segmentation apparatus as claimed in claim 2, wherein said specified size is determined in accordance with the characteristics of the display device for displaying said text segments.
- 11. A text segmentation method comprising the steps of:
analyzing an electronic text to determine a likelihood of segmentation point for each of the sentence ends in said text based on a coherent unit; and segmenting said text into text segments based on said likelihood of segmentation point, wherein when the size of any of said segmented text segments exceeds a threshold value to be determined based on a specified text segmentation size, said text segment is segmented at the position having the best likelihood of segmentation point within said text segment.
- 12. The text segmentation method as claimed in claim 11, further comprising a step for setting up a pair of windows, each having a predetermined window size, on both left and right sides of each of said sentence ends in said text and for determining the similarity of terms contained in said left and right windows, wherein said step for analyzing determines the likelihood of segmentation point based on said similarity.
- 13. The text segmentation method as claimed in claim 12, wherein said step for setting up determines an overall likelihood of segmentation point F(c) based on a plurality of likelihoods of segmentation point f(c), each of which is determined respectively for each of a number (L) of different window sizes, where c represents a respective sentence end position.
- 14. The text segmentation method as claimed in claim 11, further comprising a step for revisiting the previous unsegmented text segment so as to segment said text at the position having the second best likelihood of segmentation point when the size of any of said segmented text segments is smaller than the specified text segmentation size by a predetermined degree.
- 15. The text segmentation method as claimed in claim 11, further comprising a step for determining the similarity between the segmented text segments and forming association links on the text segments if their determined similarity exceeds a predetermined threshold value.
- 16. The text segmentation method as claimed in claim 15, wherein said text segments are formatted using a markup language, and said association links are embedded in said text segments using said markup language.
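The window comparison and size-driven splitting recited in claims 2 to 4 and 11 to 13 can be illustrated with a minimal sketch. The claims do not fix any formulas, so everything concrete here is an assumption made for illustration: cosine similarity over term counts as the window similarity, f(c) taken as one minus that similarity, F(c) taken as the plain average of f(c) over the window sizes, segment size measured in characters, and the threshold set equal to the specified size. All function and parameter names are hypothetical.

```python
import math
import re
from collections import Counter


def terms(sentence):
    """Lowercased word tokens of one sentence (a simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def boundary_score(sentences, c, window):
    """f(c) for one window size: dissimilarity of the windows of `window`
    sentences on the left and right of the end of sentence c."""
    left = Counter(t for s in sentences[max(0, c - window + 1):c + 1]
                   for t in terms(s))
    right = Counter(t for s in sentences[c + 1:c + 1 + window]
                    for t in terms(s))
    return 1.0 - cosine(left, right)


def overall_scores(sentences, window_sizes=(1, 2)):
    """F(c) for every interior sentence end: here, the mean of f(c) over
    the given window sizes (averaging is an assumption)."""
    return [
        sum(boundary_score(sentences, c, w) for w in window_sizes)
        / len(window_sizes)
        for c in range(len(sentences) - 1)
    ]


def segment(sentences, max_chars, window_sizes=(1, 2)):
    """Split any segment larger than max_chars at its best-scoring
    sentence end, repeating until every segment fits or is one sentence."""
    scores = overall_scores(sentences, window_sizes)

    def split(lo, hi):  # operates on sentences[lo:hi]
        size = sum(len(s) for s in sentences[lo:hi])
        if size <= max_chars or hi - lo < 2:
            return [sentences[lo:hi]]
        best = max(range(lo, hi - 1), key=lambda c: scores[c])
        return split(lo, best + 1) + split(best + 1, hi)

    return split(0, len(sentences))
```

For example, calling `segment` on four sentences where the first two are about one topic and the last two about another, with `max_chars` smaller than the whole text but larger than either half, would be expected to split at the topic change, because the windows on either side of that sentence end share no terms. The revisiting step of claims 5 and 14 and the association links of claims 6 and 15 are not covered by this sketch.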
Priority Claims (1)
Number | Date | Country | Kind
2000-302321 | Oct 2000 | JP |
Parent Case Info
[0001] This application claims priority from PCT Patent Application No. PCT/US01/30734, filed Oct. 02, 2001 and Japanese Patent Application No. 2000-302321, filed Oct. 02, 2000.
PCT Information
Filing Document | Filing Date | Country | Kind
PCT/US01/30734 | 10/2/2001 | WO |