Claims
- 1. A registration and search method for structured documents, each structured document consisting of hierarchical elements, comprising: preparing correspondence data between a fixed-length-string and a string occurrence position within a structured document for all fixed-length-strings in the document and for each structured document, and additionally storing the correspondence data in an occurrence frequency extracting index; preparing a list of a character, all hierarchical elements containing the character and element lengths, each element length representing a text length of the hierarchical element, and additionally storing the list in an element length index; obtaining an occurrence frequency and an occurrence position of a search term by decomposing the search term into a plurality of fixed-length-substrings and by using the plurality of fixed-length-substrings and the occurrence frequency extracting index; selecting a search character from the search term, obtaining a hierarchical element containing the search character using the character from the element length index, and extracting a length of the element corresponding to a search range using the obtained occurrence position, the element length representing a text length of the hierarchical element; and calculating a matching degree for the search term from the obtained occurrence frequency of the search term and the extracted element length of the element corresponding to the search range.
- 2. A method for structured documents according to claim 1, wherein the occurrence frequency extracting index includes a tree-structured data section and an index storing section, the tree-structured data section indicating correspondence between a string and an identifier of the string, the index storing section being a list of an identifier of a document in which the string occurs, an identifier of a context in which the string occurs, and a character position for an identifier of the string.
- 3. A method for structured documents according to claim 1, wherein the element length index comprises a list of a character, an identifier of a document containing the character, an identifier of an element containing the character, and a length of the element.
- 4. A method for structured documents according to claim 1, wherein the element length index comprises a list of a group of characters, an identifier of a document containing at least one character of the group of characters, an identifier of an element containing at least one character of the group of characters, and a length of the element.
- 5. A method for structured documents according to claim 1, wherein the element length is the number of characters contained in an element.
- 6. A method for structured documents according to claim 1, wherein the element length is the number of bytes contained in an element.
- 7. A method for structured documents according to claim 1, wherein the element length index is generated from a per-element character component table and an element length list prepared for each structured document, the per-element character component table indicating a relationship between an element identifier and a character occurrence within a structured document, the element length list being a list of an element identifier and a length of the element.
- 8. A method for structured documents according to claim 1, wherein the element length index includes the lengths of elements at all levels, including lowest-level elements, highest-level elements, and intermediate-level elements.
- 9. A search method for structured documents comprising the steps of: inputting search conditions including a search term and an element for specifying a search range; obtaining an occurrence frequency and an occurrence position of the search term by decomposing the search term into a plurality of substrings and by using the plurality of substrings and the occurrence frequency extracting index; selecting a search character from the search term, obtaining an element containing the search character using the character from the element length index, and further extracting a length of the element corresponding to the search range using the obtained occurrence position; calculating a matching degree for the search term from the obtained occurrence frequency of the search term and the extracted element length of the element corresponding to the search range; and outputting the element containing the search term and the matching degree.
- 10. A method for structured documents according to claim 9, wherein a first character of the search term is selected in the character selecting step.
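
The registration and search steps recited in the claims can be illustrated with a minimal Python sketch. It is not the claimed implementation: the fixed string length `N = 2`, the representation of elements as `(start, end)` character offsets keyed by a per-document element name, and all function names are assumptions made for illustration. In particular, the claims do not state a formula for the matching degree; the sketch assumes occurrence frequency divided by element length in characters (the length measure of claim 5), and it selects the first character of the search term as in claim 10.

```python
from collections import defaultdict

N = 2  # assumed fixed string length (bigrams); the claims do not fix a value


def build_occurrence_index(docs, n=N):
    """Occurrence frequency extracting index: n-gram -> [(doc_id, position)]."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos in range(len(text) - n + 1):
            index[text[pos:pos + n]].append((doc_id, pos))
    return index


def build_element_length_index(docs, elements):
    """Element length index: character -> [(doc_id, elem_id, start, length)].

    Every element at every level that contains the character is listed,
    with its length counted in characters.
    """
    index = defaultdict(list)
    for doc_id, elems in elements.items():
        for elem_id, (start, end) in elems.items():
            for ch in set(docs[doc_id][start:end]):
                index[ch].append((doc_id, elem_id, start, end - start))
    return index


def find_occurrences(term, occ_index, n=N):
    """Decompose the term into n-grams and intersect their shifted positions."""
    if len(term) < n:
        raise ValueError("search term is shorter than the fixed string length")
    grams = [term[k:k + n] for k in range(len(term) - n + 1)]
    candidates = set(occ_index.get(grams[0], ()))
    for k, gram in enumerate(grams[1:], start=1):
        candidates &= {(d, p - k) for d, p in occ_index.get(gram, ())}
    return sorted(candidates)


def search(term, range_elem, occ_index, len_index, n=N):
    """Return {(doc_id, elem_id): matching degree} for the search-range element."""
    hits = find_occurrences(term, occ_index, n)
    scores = {}
    # Search character: first character of the term (as in claim 10).
    for doc_id, elem_id, start, length in len_index.get(term[0], ()):
        if elem_id != range_elem:
            continue
        freq = sum(
            1 for d, p in hits
            if d == doc_id and start <= p and p + len(term) <= start + length
        )
        if freq:
            scores[(doc_id, elem_id)] = freq / length  # assumed formula
    return scores


# Hypothetical sample data: flat text plus element offsets into that text.
docs = {"d1": "abcabc xyz", "d2": "abc"}
elements = {
    "d1": {"doc": (0, 10), "title": (0, 6), "body": (7, 10)},
    "d2": {"doc": (0, 3), "title": (0, 3)},
}
occ = build_occurrence_index(docs)
lens = build_element_length_index(docs, elements)
print(search("abc", "doc", occ, lens))
```

Under the assumed formula, a short element containing the term once can score higher than a long element containing it twice, which is the kind of length normalization that an element length index makes possible without rescanning the document text.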
CROSS-REFERENCE TO RELATED APPLICATION
This is a continuation of parent application Ser. No. 09/300,594, filed Apr. 28, 1999, now U.S. Pat. No. 6,496,820, allowed.
This application relates to U.S. patent application Ser. No. 09/256,178, filed on Feb. 24, 1999 (Priority: Japan Application No. 10-043187), which issued as U.S. Pat. No. 6,377,946 on Apr. 23, 2002 and is assigned to the present assignee. The content of that application is incorporated herein by reference.
Non-Patent Literature Citations (1)
“Information Retrieval”, Prentice Hall, pp. 373-374 and pp. 219-227.
Continuations (1)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 09/300594 | Apr 1999 | US |
| Child | 10/218495 | | US |