Claims
- 1. A computer implemented method of information retrieval, comprising the steps of:
retrieving structural information of memorized documents according to a document type declaration that corresponds to each of said documents;
retrieving elements, attributes and values of said elements and said attributes of said documents;
generating a multilevel inverted index from said structural information, said elements, said attributes and said values;
accepting a specification from a user having members that comprise at least one of said elements, said attributes and said values;
responsive to said specification, extracting data from said index that complies with at least one of said members;
displaying virtual directory paths of corresponding ones of said documents, wherein said directory paths each comprise a sequence of said members, and wherein contents of directories that are identified in said directory paths comprise selected ones of said documents possessing said specification.
- 2. The method according to claim 1, wherein said index comprises a structural section having postings of said structural information, and a words section having postings of said values, wherein said values are words.
- 3. The method according to claim 2, further comprising the step of arranging said directory paths in a hierarchy that is constructed in conformance with said specification.
- 4. The method according to claim 3, wherein said step of arranging comprises the steps of:
extracting a document identifier from one of said postings of said values; extracting an offset of a context from said one of said postings of said values; and extracting an entry length of said context from said one of said postings of said values.
- 5. The method according to claim 1, wherein said documents are XML documents.
- 6. The method according to claim 1, further comprising the steps of:
noting changes in a composition of a repository of said documents; and updating said index responsive to said changes.
- 7. The method according to claim 1, wherein said specification comprises a partial query and a complete query.
- 8. The method according to claim 1, wherein a portion of said specification is stated as a path name by the user.
- 9. A computer software product, comprising a computer-readable medium in which computer program instructions are stored, which instructions, when read by a computer, cause the computer to perform the steps of:
retrieving structural information of memorized documents according to a document type declaration that corresponds to each of said documents;
retrieving elements, attributes and values of said elements and said attributes of said documents;
generating a multilevel inverted index from said structural information, said elements, said attributes and said values;
accepting a specification from a user having members that comprise at least one of said elements, said attributes and said values;
responsive to said specification, extracting data from said index that complies with at least one of said members;
associating said data with corresponding ones of said documents;
displaying said corresponding ones of said documents as virtual directory paths, wherein said directory paths each comprise a sequence of said members, and wherein contents of directories that are identified in said directory paths comprise selected ones of said documents possessing said specification.
- 10. The computer software product according to claim 9, wherein said index comprises a structural section having postings of said structural information, and a words section having postings of said values, wherein said values are words.
- 11. The computer software product according to claim 10, further comprising the step of arranging said directory paths in a hierarchy that is constructed in conformance with said specification.
- 12. The computer software product according to claim 11, wherein said step of arranging comprises the steps of:
extracting a document identifier from one of said postings of said values; extracting an offset of a context from said one of said postings of said values; and extracting an entry length of said context from said one of said postings of said values.
- 13. The computer software product according to claim 9, wherein said documents are XML documents.
- 14. The computer software product according to claim 9, further comprising the steps of:
noting changes in a composition of a repository of said documents; and updating said index responsive to said changes.
- 15. The computer software product according to claim 9, wherein said specification comprises a partial query and a complete query.
- 16. The computer software product according to claim 9, wherein said specification is stated as a path name by the user.
- 17. The computer software product according to claim 9, wherein said specification is issued via a file system applications programming interface.
- 18. The computer software product according to claim 17, wherein said instructions define a file system engine that issues calls to an operating system.
- 19. A computer implemented information retrieval system for presenting a semantically dependent directory structure of XML files to a user, comprising:
a file system engine, that receives a file request via a file system application programming interface and issues file system calls to an operating system, wherein said file request specifies a file content of memorized files;
an XML parser, linked to said file system engine, that retrieves structural information of XML documents, said XML parser further retrieving at least one of elements, attributes and respective values thereof from said XML documents;
an indexer, linked to said XML parser, for constructing an inverted index of said elements and said attributes and said respective values thereof, wherein responsive to said file request, said file system engine retrieves postings of said inverted index that satisfy requirements of said file request, and returns directory paths to said file system application programming interface of selected ones of said XML documents corresponding to said postings.
- 20. The information retrieval system of claim 19, wherein said inverted index comprises a structural section having postings of said structural information, and a words section having postings of words of said XML documents.
- 21. The information retrieval system of claim 20, wherein said postings of said structural information and said postings of words comprise:
a document identifier of one of said XML documents; an offset of a context of said one XML document; and an entry length of said context of said one XML document.
- 22. The information retrieval system of claim 19, further comprising an XML analyzer for updating said inverted index, wherein said XML analyzer analyzes additions to said memorized files.
- 23. The information retrieval system of claim 19, wherein said XML parser retrieves said structural information from document type declarations of said XML documents.
- 24. The information retrieval system of claim 19, wherein said file request comprises a partial query and a complete query.
- 25. The information retrieval system of claim 19, wherein a portion of said file request is a path name.
- 26. The information retrieval system of claim 19, wherein a repository of said XML documents is a networked file system.
- 27. A computer implemented method of information retrieval, comprising the steps of:
retrieving structural information of memorized documents according to a document type declaration that corresponds to each of said documents, wherein said documents are written in a markup language;
retrieving elements, attributes and values of said elements and said attributes of said documents;
generating a multilevel inverted index from said structural information, said elements, said attributes and said values;
accepting a specification from a user having members that comprise at least one of said elements, said attributes and said values;
responsive to said specification, extracting data from said index that complies with at least one of said members;
displaying virtual directory paths of corresponding ones of said documents, wherein said directory paths each comprise a sequence of said members, and wherein contents of directories that are identified in said directory paths comprise selected ones of said documents possessing said specification.
- 28. The method according to claim 27, wherein said index comprises a structural section having postings of said structural information, and a words section having postings of said values, wherein said values are words.
- 29. The method according to claim 28, further comprising the step of arranging said directory paths in a hierarchy that is constructed in conformance with said specification.
- 30. The method according to claim 29, wherein said step of arranging comprises the steps of:
extracting a document identifier from one of said postings of said values; extracting an offset of a context from said one of said postings of said values; and extracting an entry length of said context from said one of said postings of said values.
- 31. The method according to claim 27, further comprising the steps of:
noting changes in a composition of a repository of said documents; and updating said index responsive to said changes.
- 32. The method according to claim 27, wherein said specification comprises a partial query and a complete query.
- 33. The method according to claim 27, wherein a portion of said specification is stated as a path name by the user.
- 34. A computer software product, comprising a computer-readable medium in which computer program instructions are stored, which instructions, when read by a computer, cause the computer to perform the steps of:
retrieving structural information of memorized documents according to a document type declaration that corresponds to each of said documents; wherein said documents are written in a markup language;
retrieving elements, attributes and values of said elements and said attributes of said documents;
generating a multilevel inverted index from said structural information, said elements, said attributes and said values;
accepting a specification from a user having members that comprise at least one of said elements, said attributes and said values;
responsive to said specification, extracting data from said index that complies with at least one of said members;
associating said data with corresponding ones of said documents;
displaying said corresponding ones of said documents as virtual directory paths, wherein said directory paths each comprise a sequence of said members, and wherein contents of directories that are identified in said directory paths comprise selected ones of said documents possessing said specification.
- 35. The computer software product according to claim 34, wherein said index comprises a structural section having postings of said structural information, and a words section having postings of said values, wherein said values are words.
- 36. The computer software product according to claim 35, further comprising the step of arranging said directory paths in a hierarchy that is constructed in conformance with said specification.
- 37. The computer software product according to claim 36, wherein said step of arranging comprises the steps of:
extracting a document identifier from one of said postings of said values; extracting an offset of a context from said one of said postings of said values; and extracting an entry length of said context from said one of said postings of said values.
- 38. The computer software product according to claim 34, wherein said documents are XML documents.
- 39. The computer software product according to claim 34, further comprising the steps of:
noting changes in a composition of a repository of said documents; and updating said index responsive to said changes.
- 40. The computer software product according to claim 34, wherein said specification comprises a partial query and a complete query.
- 41. The computer software product according to claim 34, wherein said specification is stated as a path name by the user.
- 42. The computer software product according to claim 34, wherein said specification is issued via a file system applications programming interface.
- 43. The computer software product according to claim 42, wherein said instructions define a file system engine that issues calls to an operating system.
- 44. A computer implemented information retrieval system for presenting a semantically dependent directory structure of document files to a user, wherein documents of said document files are written in a markup language, comprising:
a file system engine, that receives a file request via a file system application programming interface and issues file system calls to an operating system, wherein said file request specifies a file content of memorized files;
a parser of said markup language, linked to said file system engine, that retrieves structural information of said documents, said parser further retrieving at least one of elements, attributes and respective values thereof from said documents;
an indexer, linked to said parser, for constructing an inverted index of said elements and said attributes and said respective values thereof, wherein responsive to said file request, said file system engine retrieves postings of said inverted index that satisfy requirements of said file request, and returns directory paths to said file system application programming interface of selected ones of said documents corresponding to said postings.
- 45. The information retrieval system of claim 44, wherein said inverted index comprises a structural section having postings of said structural information, and a words section having postings of words of said documents.
- 46. The information retrieval system of claim 45, wherein said postings of said structural information and said postings of words comprise:
a document identifier of one of said documents; an offset of a context of said one document; and an entry length of said context of said one document.
- 47. The information retrieval system of claim 44, further comprising an analyzer for updating said inverted index, wherein said analyzer analyzes additions to said memorized files.
- 48. The information retrieval system of claim 44, wherein said parser retrieves said structural information from document type declarations of said documents.
- 49. The information retrieval system of claim 44, wherein said file request comprises a partial query and a complete query.
- 50. The information retrieval system of claim 44, wherein a portion of said file request is a path name.
- 51. The information retrieval system of claim 44, wherein a repository of said documents is a networked file system.
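ILLUSTRATIVE SKETCH
The indexing and virtual directory steps recited in the claims above can be pictured with a short sketch. It is not the claimed implementation; it is a minimal illustration under several assumptions. The names `Posting`, `Index`, `lookup` and `virtual_paths` are hypothetical. Structure is read from the parsed elements rather than from a document type declaration. The posting's offset and entry length are stand-ins (the ordinal of the context element in document order, and the number of words in the context) rather than true positions in the stored file. A specification is treated as a path whose members must all occur somewhere in a document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    doc_id: str  # document identifier
    offset: int  # stand-in offset: ordinal of the context element in document order
    length: int  # stand-in entry length: number of words in the context


class Index:
    """Two-section inverted index: a structural section and a words section."""

    def __init__(self):
        self.structure = defaultdict(list)  # element or attribute name -> postings
        self.words = defaultdict(list)      # lower-cased word -> postings

    def add(self, doc_id, xml_text):
        for pos, elem in enumerate(ET.fromstring(xml_text).iter()):
            # Element text and attribute values are both treated as values.
            contexts = [(elem.tag, elem.text or "")]
            contexts += [("@" + key, val) for key, val in elem.attrib.items()]
            for name, value in contexts:
                tokens = value.split()
                posting = Posting(doc_id, pos, len(tokens))
                self.structure[name].append(posting)
                for token in tokens:
                    self.words[token.lower()].append(posting)

    def lookup(self, spec):
        """Return ids of documents that contain every member of a path-like spec."""
        result = None
        for member in spec.strip("/").split("/"):
            postings = self.structure.get(member, []) + self.words.get(member.lower(), [])
            docs = {p.doc_id for p in postings}
            result = docs if result is None else result & docs
        return result or set()


def virtual_paths(index, spec):
    """List matching documents as if they were files under the directory named by spec."""
    prefix = "/" + spec.strip("/")
    return sorted(f"{prefix}/{doc_id}" for doc_id in index.lookup(spec))


index = Index()
index.add("a.xml", '<book year="2000"><author>Smith</author></book>')
index.add("b.xml", '<book year="2001"><author>Jones</author></book>')
print(virtual_paths(index, "author/smith"))  # ['/author/smith/a.xml']
print(virtual_paths(index, "book"))          # ['/book/a.xml', '/book/b.xml']
```

In this sketch the directory `/author/smith` does not exist on disk; its contents are computed from the index when the path is requested, which is the sense in which the directory paths are virtual.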
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of Provisional Application No. 60/209,475, filed Jun. 5, 2000.
Provisional Applications (1)
Number | Date | Country
60209475 | Jun 2000 | US