Claims
- 1. A registration method for structured documents comprising the steps of:
preparing correspondence data between a string and a string occurrence position within a structured document for each structured document, and additionally storing the correspondence data in an occurrence frequency extracting index; and
preparing a list of a character, an element containing the character and an element length thereof, and additionally storing the list in an element length index.
- 2. A registration method for structured documents according to claim 1, wherein
the occurrence frequency extracting index includes a tree-structured data section and an index storing section, the tree-structured data section indicating correspondence between a string and an identifier of the string, the index storing section being a list of an identifier of a document in which the string occurs, an identifier of a context in which the string occurs, and a character position for an identifier of the string.
- 3. A registration method for structured documents according to claim 1, wherein
the element length index comprises a list of a character, an identifier of a document containing the character, an identifier of an element containing the character, and a length of the element.
- 4. A registration method for structured documents according to claim 1, wherein
the element length index comprises a list of a group of characters, an identifier of a document containing at least one character of the group of characters, an identifier of an element containing at least one character of the group of characters, and a length of the element.
- 5. A registration method for structured documents according to claim 1, wherein
the element length is the number of characters contained in an element.
- 6. A registration method for structured documents according to claim 1, wherein
the element length is the number of bytes contained in an element.
- 7. A registration method for structured documents according to claim 1, wherein
the element length index is generated from a per-element character component table and an element length list prepared for each structured document, the per-element character component table indicating a relationship between an element identifier and a character occurrence within a structured document, the element length list being a list of an element identifier and a length of the element.
- 8. A search method for structured documents registered by the method according to claim 1, comprising the steps of:
inputting search conditions including a search term and an element for specifying a search range;
obtaining an occurrence frequency and an occurrence position of the search term by decomposing the search term into a plurality of substrings and by using the plurality of substrings and the occurrence frequency extracting index;
selecting a character from the search term, obtaining an element containing the character using the character from the element length index, and further extracting a length of the element within the search range;
calculating a matching degree for the search conditions from the occurrence frequency and the occurrence position of the search term and the element length of the element within the search range; and
outputting the element containing the search term and the matching degree.
- 9. A search method for structured documents according to claim 8, wherein
a first character of the search term is selected in the character selecting step.
- 10. A search system for structured documents comprising:
a) structured document registering means comprising:
occurrence frequency extracting index preparing means for preparing correspondence data between a string and a string occurrence position within a structured document for each structured document, and additionally storing the correspondence data in an occurrence frequency extracting index; and
element length index preparing means for preparing a list of a character, an element containing the character and a length of the element for each structured document, and additionally storing the list in an element length index;
b) structured document search means for searching a structured document comprising:
occurrence frequency extracting means for decomposing a search term into a plurality of substrings, and obtaining an occurrence frequency and an occurrence position of the search term using the plurality of substrings from the occurrence frequency extracting index;
element length extracting means for selecting a character from the search term, obtaining an element containing the character using the character from the element length index, and extracting a length of the element within a search range; and
matching degree calculating means for calculating a matching degree for the search conditions from the occurrence frequency and the occurrence position of the search term and the length of the element within the search range.
- 11. Structured-document registration/search program group comprising:
a) a structured document registering program executing the steps of:
preparing correspondence data between a string and an occurrence position of the string within a structured document for each structured document and additionally storing the correspondence data in an occurrence frequency extracting index; and
preparing a list of a character, an element containing the character and a length of the element, and additionally storing the list in an element length index; and
b) a structured document search program executing the steps of:
inputting search conditions including a search term and an element for specifying a search range, decomposing the search term into a plurality of substrings, and obtaining an occurrence frequency and an occurrence position of the search term using the plurality of substrings from the occurrence frequency extracting index;
selecting a character from the search term, obtaining an element containing the character using the character from the element length index, and further extracting a length of the element within the search range;
calculating a matching degree for the search conditions from the occurrence frequency and the occurrence position of the search term and the length of the element within the search range; and
outputting the element containing the search term and the matching degree.
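For illustration only, the registration and search steps recited in the claims above can be sketched in Python as follows. This is a minimal in-memory sketch under stated assumptions, not the claimed implementation: the class name `StructuredDocumentStore`, the use of fixed-length substrings (n = 2), positions counted within each element's text, the `element_names` helper map, and the matching degree taken as occurrence frequency divided by element length in characters are all hypothetical choices that the claims do not specify.

```python
from collections import defaultdict


class StructuredDocumentStore:
    """Toy in-memory versions of the two indexes named in the claims."""

    def __init__(self, n=2):
        self.n = n  # substring length (assumption; the claims fix no length)
        # Occurrence frequency extracting index:
        #   substring -> set of (doc_id, element_id, position in element text)
        self.occurrence_index = defaultdict(set)
        # Element length index:
        #   character -> {(doc_id, element_id): element length in characters}
        self.element_length_index = defaultdict(dict)
        # Hypothetical helper so a search range can be given by element name.
        self.element_names = {}

    def register(self, doc_id, elements):
        """Register one document given as [(element_id, element_name, text), ...]."""
        for element_id, name, text in elements:
            key = (doc_id, element_id)
            self.element_names[key] = name
            # Add every substring of length n together with its position.
            for pos in range(len(text) - self.n + 1):
                gram = text[pos:pos + self.n]
                self.occurrence_index[gram].add((doc_id, element_id, pos))
            # Add each distinct character together with the element length.
            for ch in set(text):
                self.element_length_index[ch][key] = len(text)

    def search(self, term, range_element):
        """Return [((doc_id, element_id), positions, matching_degree), ...]."""
        if len(term) < self.n:
            raise ValueError("search term is shorter than the substring length")
        # Decompose the search term into overlapping substrings.
        grams = [term[i:i + self.n] for i in range(len(term) - self.n + 1)]
        hits = defaultdict(list)
        for doc_id, element_id, pos in self.occurrence_index.get(grams[0], ()):
            # The term occurs at pos only if every substring lines up after it.
            if all((doc_id, element_id, pos + k) in self.occurrence_index.get(g, ())
                   for k, g in enumerate(grams)):
                hits[(doc_id, element_id)].append(pos)
        # Select the first character of the term (as in claim 9) and look up
        # the elements containing it, with their lengths.
        lengths = self.element_length_index.get(term[0], {})
        results = []
        for key, positions in hits.items():
            if self.element_names[key] != range_element:
                continue  # element lies outside the specified search range
            # Hypothetical matching degree: frequency normalized by length.
            degree = len(positions) / lengths[key]
            results.append((key, sorted(positions), degree))
        return sorted(results, key=lambda r: r[2], reverse=True)


store = StructuredDocumentStore()
store.register("d1", [(1, "title", "index search"),
                      (2, "body", "search search and search")])
store.register("d2", [(1, "body", "no match here")])
body_results = store.search("search", "body")
title_results = store.search("search", "title")
```

In this sketch the element length is read from the element length index rather than by re-parsing the document, which mirrors the separation between the two indexes in the claims; any other normalization could be substituted for the hypothetical matching degree.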
Priority Claims (1)
Number | Date | Country | Kind
10-136127 | Apr 1998 | JP |
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application relates to U.S. patent application Ser. No. ______ filed on Feb. 24, 1999 (Priority: Japan Application Number 10-043187, Attorney Docket No. 500.36941X00), and assigned to the present assignee. The content of that application is incorporated herein by reference.
Continuations (1)
| Number | Date | Country
Parent | 09300594 | Apr 1999 | US
Child | 10218495 | Aug 2002 | US