Claims
- 1. A semantic-based system comprising:
at least one client operable to issue a query; and a file system connected to the at least one client via a network, wherein the file system stores objects and semantic information for the objects that is searchable to execute the query.
- 2. The semantic-based system of claim 1, wherein the semantic information includes semantic vectors for the objects, each semantic vector identifying predetermined features for an associated object.
- 3. The semantic-based system of claim 2, wherein the semantic vectors are searchable to identify objects having similar predetermined features.
- 4. The semantic-based system of claim 2, wherein the file system further comprises a semantic catalogue including each semantic vector, an associated object name and a location of the associated object in the file system.
- 5. The semantic-based system of claim 1, wherein the query identifies at least one semantic for searching the file system to execute the query.
- 6. The semantic-based system of claim 1, wherein the file system comprises at least one extractor for creating a semantic vector for each of the objects of a specific file type.
- 7. The semantic-based system of claim 6, wherein the file system comprises an extractor registry identifying each extractor in the file system.
- 8. The semantic-based system of claim 7, wherein the extractor registry is operable to add or remove an extractor from the file system.
- 9. The semantic-based system of claim 1, wherein the file system is a distributed file system overlaid on a peer-to-peer network comprising a plurality of nodes.
- 10. The semantic-based system of claim 9, wherein the distributed file system is a distributed archival file system operable to store a plurality of versions of files and semantic information for each version.
- 11. A distributed file system comprising:
a plurality of nodes storing objects; at least one extractor extracting semantic information for the objects; and a semantic catalogue including the semantic information for the objects, the semantic catalogue being stored in the plurality of nodes.
- 12. The distributed file system of claim 11, wherein the distributed file system is operable to execute a semantic-based query to identify one or more objects having a semantic provided in the query.
- 13. The distributed file system of claim 11, wherein the semantic information is semantic vectors for the objects, wherein each semantic vector identifies predetermined features for an associated object.
- 14. The distributed file system of claim 11, wherein the distributed file system is overlaid on a peer-to-peer network comprising the plurality of nodes.
- 15. The distributed file system of claim 11, wherein the semantic catalogue is distributed among the nodes.
- 16. A node in a semantic-based distributed file system, the node comprising:
a processor; a storage device storing objects; a semantic catalogue containing semantic information for the objects; and an extractor, wherein the processor is operable to execute the extractor for extracting the semantic information contained in the semantic catalogue.
- 17. The node of claim 16, wherein the semantic catalogue comprises at least one entry, the at least one entry including an object name, semantic information for the object, and a location of the object.
- 18. A method for searching a semantic-based file system storing a plurality of objects, the method comprising steps of:
receiving a semantic query, the semantic query identifying at least one semantic; searching semantic vectors stored in the file system for the at least one semantic, wherein each semantic vector is associated with an object stored in the file system; and generating a result of the search.
- 19. The method of claim 18, wherein the semantic-based file system stores a semantic catalogue including an entry for each of the plurality of objects, each entry comprising an object name, a semantic vector associated with the object and the location of the object.
- 20. The method of claim 19, wherein the step of searching further comprises searching the semantic catalogue for the at least one semantic.
- 21. The method of claim 20, wherein the step of generating a result further comprises steps of:
identifying at least one object from the catalogue meeting the semantic query; identifying a location of the at least one object; and retrieving the at least one object from the location.
- 22. A semantic-based file system comprising:
means for receiving a semantic query, the query identifying at least one semantic; means for searching semantic vectors stored in the file system for the at least one semantic, wherein each semantic vector is associated with an object stored in the file system; and means for generating a result of the search.
- 23. The semantic-based file system of claim 22 storing a semantic catalogue including an entry for each of the plurality of objects, each entry comprising an object name, a semantic vector associated with the object and the location of the object.
- 24. The semantic-based file system of claim 23, wherein the means for searching is operable to search the semantic catalogue for the at least one semantic.
- 25. The semantic-based file system of claim 23, wherein the means for generating a result further comprises:
means for identifying at least one object from the catalogue meeting the query; means for identifying a location of the at least one object; and means for retrieving the at least one object from the location.
- 26. A method of performing a write operation in a semantic archival file system, the method comprising steps of:
receiving a new version of a file stored in the file system; computing a diff for the new version and the file; comparing the diff to a threshold; and storing the diff in the file system in response to the diff being less than the threshold.
- 27. The method of claim 26 further comprising steps of:
selecting a new file in response to the diff being greater than the threshold, the new file having similar semantics to the new version; computing a second diff, the second diff being for the new version and the new file; and storing the second diff in response to the second diff not being greater than the threshold.
- 28. The method of claim 27, wherein the step of selecting a new file comprises steps of:
generating a semantic vector for the new version; comparing the semantic vector to semantic vectors for other files stored in the file system; identifying one of the semantic vectors for the other files that is similar to the semantic vector for the new version; and selecting the file associated with the identified semantic vector.
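By way of illustration only, the write operation recited in claims 26 to 28 might be sketched as follows. Everything named here is an assumption made for the example and is not taken from the claims: the term-count extractor `extract_vector`, cosine similarity as the measure of "similar" semantic vectors, a line-based diff whose size is the number of changed lines, and the fallback of storing the new version in full when neither diff is within the threshold.

```python
import difflib
import math


def extract_vector(text: str, vocabulary: list[str]) -> list[float]:
    """Hypothetical extractor: counts of each vocabulary term in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def diff_size(old: str, new: str) -> int:
    """Number of lines deleted from `old` plus lines inserted from `new`."""
    a, b = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    size = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            size += (i2 - i1) + (j2 - j1)
    return size


def plan_write(files, catalogue, name, new_text, vocabulary, threshold):
    """Decide how to store a new version of files[name].

    files:     mapping of file name to current contents
    catalogue: mapping of file name to semantic vector
    Returns ("diff", base_name, size) or ("full", None, None).
    """
    # Claim 26: diff against the existing file; store it if below the threshold.
    first = diff_size(files[name], new_text)
    if first < threshold:
        return ("diff", name, first)

    # Claim 28: generate a vector for the new version and pick the other
    # file whose catalogue vector is most similar to it.
    new_vector = extract_vector(new_text, vocabulary)
    others = [n for n in catalogue if n != name]
    if others:
        base = max(others, key=lambda n: cosine(catalogue[n], new_vector))
        # Claim 27: store the second diff if it is not greater than the threshold.
        second = diff_size(files[base], new_text)
        if second <= threshold:
            return ("diff", base, second)

    # Assumption (not recited in the claims): otherwise store the full version.
    return ("full", None, None)
```

In this sketch a first diff exactly equal to the threshold is treated as not being less than it, so the second path is tried; the claims leave that boundary case open.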
CROSS-REFERENCE
[0001] The present invention is related to pending:
[0002] U.S. Application Ser. No. ______, (Attorney Docket No. 200207182-1) filed herewith, and entitled “SEMANTIC HASHING”, by Xu et al.; and
[0003] U.S. Application Ser. No. ______, (Attorney Docket No. 200207183-1) filed herewith, and entitled “SNAPSHOT OF A FILE SYSTEM”, by Mahaligam et al.; both of which are assigned to the assignee and are incorporated by reference herein in their entirety.