Claims
- 1. A method for automatically determining a semantic structure of an electronically formatted natural language based document consisting essentially of words, the method comprising the steps of:
- a) providing a numerical representation as a digital signal of the words within the document wherein said numerical representation contains some information relating the semantic content of the word to the semantic content of the document; and
- b) performing a wavelet transform on said signal, thereby determining the semantic structure.
- 2. The method of claim 1 wherein said wavelet transform is selected from the group comprising a fast wavelet transform, a redundant wavelet transform, a non-orthogonal wavelet transform, a local cosine transform, and a local sine transform.
- 3. The method of claim 1 further comprising the step of utilizing the output of the wavelet transform to generate a visual representation of the semantic structure of the document.
- 4. The method of claim 3 wherein the visual representation of the semantic structure of the document is selected from the group comprising a text based representation and a graphical representation and combinations thereof.
- 5. The method of claim 1 wherein the method of providing said numerical representation of the words within the document is selected from the group consisting of word frequency counts within the entire document, word frequency counts within subsets of the words in said document, functions of word frequency counts within the entire document, functions of word frequency counts within subsets of the words in said document, statistical correlations between words in said document, statistical correlations between groups of words contained in said document, and combinations thereof.
- 6. The method of claim 1 further comprising the step of utilizing the output of the wavelet transform to partition the document.
- 7. The method of claim 6 wherein the document is partitioned according to the semantic structure of the document at a single level.
- 8. The method of claim 6 wherein the document is partitioned according to the semantic structure of the document at multiple levels to produce an outline of the document.
- 9. The method of claim 6 wherein the document is partitioned according to the semantic structure of the document at multiple levels to produce a fuzzy outline of the document.
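As an illustration only, the steps of claims 1 and 6 can be sketched in Python as follows. The sketch assumes the simplest numerical representation listed in claim 5 (word frequency within the entire document) and uses a Haar filter bank, one simple instance of a fast wavelet transform, as a stand-in for the transforms of claim 2. The function names `word_signal`, `haar_dwt`, `haar_multilevel`, and `strongest_break` are hypothetical and do not appear in the patent, and the rule of taking the largest detail coefficient as a partition point is an assumption made for this example.

```python
import math
import re
from collections import Counter


def word_signal(text):
    """Step a): map each word to its relative frequency in the whole document."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return words, [counts[w] / total for w in words]


def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        # Pad odd-length input by repeating the last sample (builds a new list).
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_multilevel(signal, levels):
    """Step b): repeat the one-level transform on the approximation."""
    approx, details = list(signal), []
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_dwt(approx)
        details.append(detail)  # details[0] is the finest level
    return approx, details


def strongest_break(details):
    """Assumed partition rule: return (level, sample index) of the largest
    detail coefficient. A coefficient k at level j spans 2**j samples, and
    the candidate break is the midpoint of that span."""
    best = (0.0, None, None)
    for level, coeffs in enumerate(details, start=1):
        width = 2 ** level
        for k, d in enumerate(coeffs):
            if abs(d) > best[0]:
                best = (abs(d), level, k * width + width // 2)
    return best[1], best[2]
```

For the toy signal `[1, 1, 1, 1, 5, 5, 5, 5]`, `strongest_break(haar_multilevel(signal, 3)[1])` returns `(3, 4)`: the largest coefficient sits at the coarsest level, and the break falls between the fourth and fifth samples. Running the same search separately at each level would give a multi-level partition in the spirit of claim 8.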
Government Interests
This invention was made with Government support under Contract DE-AC06-76RL0 1830 awarded by the U.S. Department of Energy. The Government has certain rights in the invention.
US Referenced Citations (4)
| Number | Name | Date | Kind |
| --- | --- | --- | --- |
| 5604824 | Chui et al. | Feb 1997 | |
| 5748796 | Zandi et al. | May 1998 | |
| 5841473 | Chui et al. | Nov 1998 | |
| 5987459 | Swanson et al. | Nov 1999 | |