Claims
- 1. A method for classification of documents, comprising the steps of:
receiving a classify command from a client for initiating a classification of documents, the classify command identifying input documents to be classified, a classification profile, and anchor values; retrieving the classification profile and input documents; extracting input values from each input document based on the anchor values; structuring the input values according to a search schema identified in the classification profile; performing similarity searches for determining similarity scores between each database document and each input document; and classifying the database documents based on the classification profile and the similarity scores using classes and rules identified in the classification profile.
- 2. The method of claim 1, wherein the step of performing similarity searches comprises performing similarity searches for determining normalized similarity scores having values between 0.00 and 1.00 for each database document for indicating a degree of similarity between each database document and each input document, whereby a normalized similarity score of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and scores between 0.00 and 1.00 represent degrees of similarity matching.
- 3. The method of claim 1, wherein the step of retrieving the classification profile and input documents comprises retrieving the classification profile and input documents having repeating groups.
- 4. The method of claim 1, further comprising the steps of:
storing the classified database documents in a results database; and notifying the client of completion of the classify command.
- 5. The method of claim 4, wherein the step of storing the classified database documents comprises storing the classified database documents as a classification results file in a results database.
- 6. The method of claim 4, wherein the step of storing the classified database documents comprises storing the classified database documents in an output target database identified in the classification profile.
- 7. The method of claim 1, wherein each of the classes identified in the classification profile comprises an identification attribute, a name element, and a rank element.
- 8. The method of claim 7, further comprising a low score element and a high score element for defining lower and upper thresholds for similarity scores associated with the class.
- 9. The method of claim 1, wherein each of the rules identified in the classification profile comprises an identification attribute, a description element, and a condition element.
- 10. The method of claim 9, further comprising property elements for describing conditions for including a document in a parent class.
- 11. The method of claim 1, further comprising the step of mapping between defined classes and defined rules using class rule map files.
- 12. The method of claim 1, wherein the step of classifying the database documents is selected from the group consisting of classifying a document based on a threshold using a top score from results of more than one search schema, classifying a document based on a logical relationship and a threshold using a top score from results of more than one search schema, classifying a document based on a number of search results for a single schema that have scores greater than a threshold, and classifying a document based on a number of search results from multiple schemas having scores above a threshold.
- 13. The method of claim 1, wherein the step of classifying the database documents further comprises classifying multiple database documents based on the classification profile and the similarity scores using classes and rules identified in the classification profile using a classify utility.
- 14. A computer-readable medium containing instructions for controlling a computer system to implement the method of claim 1.
- 15. A system for classification of documents, comprising:
a classification engine for receiving a classify command from a client for initiating a classification of documents, the classify command identifying input documents to be classified, a classification profile, and anchor values; the classification engine for retrieving the classification profile and input documents from a virtual document manager; the classification engine for extracting input values from each input document based on the anchor values; an XML transformation engine for structuring the input values according to a search schema identified in the classification profile; a search manager for performing similarity searches for determining similarity scores between each database document and each input document; and the classification engine for classifying the database documents based on the classification profile and the similarity scores using classes and rules identified in the classification profile.
- 16. The system of claim 15, wherein the search manager performs similarity searches for determining normalized similarity scores having values between 0.00 and 1.00 for each database document for indicating a degree of similarity between each database document and each input document, whereby a normalized similarity score of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and scores between 0.00 and 1.00 represent degrees of similarity matching.
- 17. The system of claim 15, wherein the classification engine retrieves the classification profile and input documents having repeating groups.
- 18. The system of claim 15, further comprising the classification engine for storing the classified database documents in a results database and notifying the client of completion of the classify command.
- 19. The system of claim 18, wherein the classification engine stores the classified database documents as a classification results file in a results database.
- 20. The system of claim 18, wherein the classification engine stores the classified database documents in an output target database identified in the classification profile.
- 21. The system of claim 15, wherein each of the classes identified in the classification profile comprises an identification attribute, a name element, and a rank element.
- 22. The system of claim 21, further comprising a low score element and a high score element for defining lower and upper thresholds for similarity scores associated with the class.
- 23. The system of claim 15, wherein each of the rules identified in the classification profile comprises an identification attribute, a description element, and a condition element.
- 24. The system of claim 23, further comprising property elements for describing conditions for including a document in a parent class.
- 25. The system of claim 15, further comprising the classification engine for mapping between defined classes and defined rules using class rule map files.
- 26. The system of claim 15, wherein the classification engine for classifying the database documents is selected from the group consisting of means for classifying a document based on a threshold using a top score from results of more than one search schema, means for classifying a document based on a logical relationship and a threshold using a top score from results of more than one search schema, means for classifying a document based on a number of search results for a single schema that have scores greater than a threshold, and means for classifying a document based on a number of search results from multiple schemas having scores above a threshold.
- 27. The system of claim 15, wherein the classification engine further comprises means for classifying multiple database documents based on the classification profile and the similarity scores using classes and rules identified in the classification profile using a classify utility.
- 28. A system for classification of documents comprising:
a classification engine for accepting a classify command from a client, retrieving a classification profile, classifying documents based on similarity scores, rules and classes, storing document classification results in a database, and notifying the client of completion of the classify command; a virtual document manager for providing input documents; an XML transformation engine for structuring the input values according to a search schema identified in the classification profile; and a search manager for performing similarity searches for determining similarity scores between each database document and each input document.
- 29. The system of claim 28, further comprising an output queue for temporarily storing classified documents.
- 30. The system of claim 28, further comprising a database management system for storing classification results.
- 31. A method for classification of documents, comprising:
receiving a classify command from a client, the classify command designating input document elements for names and search schema, anchor document structure and values to be used as classification filters, and a classification profile; retrieving the designated classification profile, the classification profile designating classes files for name, rank and score thresholds, rules files for nested conditions, properties, schema mapping, score threshold ranges and number of required documents, and class rules maps for class identification, class type, rule identification, description, property, score threshold ranges and document count; retrieving the designated search documents; identifying a schema mapping file for each input document; determining a degree of similarity between each input document and anchor document; classifying the input documents according to the designated classes files and rules files; and creating and storing a classification results file in a database.
- 32. The method of claim 31, wherein the number of documents classified is designated in the rules files.
- 33. The method of claim 31, further comprising notifying the client of completion of the classify command.
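
By way of illustration only, the threshold-based classification recited in claims 2, 7, 8 and 12 might be sketched as follows. This is a minimal sketch and not the claimed implementation: the names `ClassDef`, `top_score`, `count_above` and `classify_by_top_score`, the schema names, the class names and the sample scores are all hypothetical. It also assumes that a lower rank value takes precedence when more than one class matches, which the claims do not specify.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class ClassDef:
    """Hypothetical class entry: identification, name, rank, and the
    low/high score thresholds of claims 7 and 8."""
    class_id: str
    name: str
    rank: int
    low_score: float
    high_score: float


def top_score(results: Dict[str, List[float]]) -> float:
    """Highest normalized score (0.00 to 1.00) across all search schemas."""
    return max((s for scores in results.values() for s in scores), default=0.0)


def count_above(results: Dict[str, List[float]], threshold: float,
                schemas: Optional[Iterable[str]] = None) -> int:
    """Number of search results scoring above the threshold, for the
    named schemas or, if none are named, for all schemas."""
    names = list(results) if schemas is None else list(schemas)
    return sum(1 for n in names for s in results.get(n, []) if s > threshold)


def classify_by_top_score(results: Dict[str, List[float]],
                          classes: List[ClassDef]) -> Optional[ClassDef]:
    """Pick the class whose score range contains the top score.
    Assumption: the lowest rank value wins if several classes match."""
    best = top_score(results)
    matches = [c for c in classes if c.low_score <= best <= c.high_score]
    return min(matches, key=lambda c: c.rank) if matches else None


# Hypothetical similarity scores per search schema for one input document.
results = {"name_schema": [0.92, 0.40], "address_schema": [0.75]}

# Hypothetical classes with score thresholds.
classes = [
    ClassDef("1", "RED", 1, 0.90, 1.00),
    ClassDef("2", "YELLOW", 2, 0.70, 0.89),
    ClassDef("3", "GREEN", 3, 0.00, 0.69),
]

selected = classify_by_top_score(results, classes)
print(selected.name if selected else "unclassified")
```

In this sketch a top score that falls between two class ranges (for example 0.895) matches no class and yields `None`; a profile would need contiguous thresholds to avoid that case. The count-based alternatives of claim 12 correspond to calling `count_above` with a single schema name or with no schema filter.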
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims benefit of U.S. Provisional Application No. 60/319,138, filed on Mar. 6, 2002.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60319138 | Mar 2002 | US |