Claims
- 1. A method for classification of documents, comprising the steps of:
receiving a classify instruction from a client for initiating a classification of documents, the classify instruction identifying input documents to be classified, a classification profile, and anchor values;
retrieving the classification profile and input documents;
extracting input values from each input document based on the anchor values;
structuring the input values according to a search schema identified in the classification profile;
performing similarity searches for determining similarity scores between each database document and each input document;
performing external analysis of the database documents for determining external analytic scores;
classifying the database documents based on profile, external analytic scores and the similarity scores using classes and rules identified in the classification profile; and
notifying the client of completion of the classify command.
- 2. The method of claim 1, wherein the step of performing similarity searches comprises performing similarity searches for determining normalized similarity scores having values of between 0.00 and 1.00 for each database document for indicating a degree of similarity between each database document and each input document, whereby a normalized similarity score of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and scores between 0.00 and 1.00 represent degrees of similarity matching.
- 3. The method of claim 1, wherein the step of retrieving the classification profile and input documents comprises retrieving the classification profile and input documents having repeating groups.
- 4. The method of claim 1, wherein the step of classifying further comprises scoring the database documents relative to other database documents in a same class according to predetermined scoring thresholds.
- 5. The method of claim 4, further comprising defining an upper and a lower threshold for scores associated with a class.
- 6. The method of claim 1, further comprising the step of storing the classified database documents as a classification results file in a results database.
- 7. The method of claim 6, wherein the step of storing the classified database documents comprises storing the classified database documents in an output target database identified in the classification profile.
- 8. The method of claim 1, wherein each of the classes identified in the classification profile comprises an identification attribute, a name element, and a rank element.
- 9. The method of claim 8, further comprising a low score element and a high score element for defining lower and upper thresholds for similarity scores associated with the class.
- 10. The method of claim 1, wherein each of the rules identified in the classification profile comprises an identification attribute, a description element, and a condition element.
- 11. The method of claim 10, further comprising property elements for describing conditions for including a document in a parent class.
- 12. The method of claim 1, further comprising the step of mapping between defined classes and defined rules using class rule map files.
- 13. The method of claim 1, wherein the step of classifying the database documents is selected from the group consisting of classifying a document based on a threshold using a top score from results of more than one search schema, classifying a document based on a logical relationship and a threshold using a top score from results of more than one search schema, classifying a document based on a number of search results for a single schema that have scores greater than a threshold, classifying a document based on a number of search results from multiple schemas having scores above a threshold, classifying a document based on external analytics for determining a document score, and classifying a document according to score rankings based on external analytics for determining a document score.
- 14. The method of claim 1, wherein the step of classifying the database documents further comprises classifying the multiple database documents based on profile, external analytic scores, and the similarity scores using classes and rules identified in the classification profile using a classify utility.
- 15. A computer-readable medium containing instructions for controlling a computer system to implement the method of claim 1.
- 16. A system for classification of documents, comprising:
a classification engine for receiving a classify instruction from a client for initiating a classification of documents, the classify instruction identifying input documents to be classified, a classification profile, and anchor values;
the classification engine for retrieving the classification profile and input documents from a virtual document manager;
the classification engine for extracting input values from each input document based on the anchor values;
an XML transformation engine for structuring the input values according to a search schema identified in the classification profile;
a search manager for performing similarity searches for determining similarity scores between each database document and each input document;
external analytics for performing external analysis of the database documents for determining external analytic scores;
the classification engine for classifying the database documents based on profile, external analytic scores and the similarity scores using classes and rules identified in the classification profile; and
means for notifying the client of completion of the classify command.
- 17. The system of claim 16, further comprising the search manager for performing similarity searches for determining normalized similarity scores having values of between 0.00 and 1.00 for each database document for indicating a degree of similarity between each database document and each input document, whereby a normalized similarity score of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and scores between 0.00 and 1.00 represent degrees of similarity matching.
- 18. The system of claim 16, further comprising the classification engine for retrieving the classification profile and input documents having repeating groups.
- 19. The system of claim 16, further comprising the classification engine for scoring the database documents relative to other database documents in a same class according to predetermined scoring thresholds.
- 20. The system of claim 16, further comprising the classification engine for storing the classified database documents as a classification results file in a results database.
- 21. The system of claim 20, wherein the classification engine stores the classified database documents in an output target database identified in the classification profile.
- 22. The system of claim 16, wherein each of the classes identified in the classification profile comprises an identification attribute, a name element, and a rank element.
- 23. The system of claim 22, further comprising a low score element and a high score element for defining lower and upper thresholds for similarity scores associated with the class.
- 24. The system of claim 16, wherein each of the rules identified in the classification profile comprises an identification attribute, a description element, and a condition element.
- 25. The system of claim 24, further comprising property elements for describing conditions for including a document in a parent class.
- 26. The system of claim 16, further comprising the classification engine for mapping between defined classes and defined rules using class rule map files.
- 27. The system of claim 16, wherein the classification engine for classifying the database documents is selected from the group consisting of means for classifying a document based on a threshold using a top score from results of more than one search schema, means for classifying a document based on a logical relationship and a threshold using a top score from results of more than one search schema, means for classifying a document based on a number of search results for a single schema that have scores greater than a threshold, means for classifying a document based on a number of search results from multiple schemas having scores above a threshold, classifying a document based on external analytics for determining a document score, and classifying a document according to score rankings based on external analytics for determining a document score.
- 28. The system of claim 16, wherein the classification engine further comprises means for classifying the multiple database documents based on profile, external analytics, and the similarity scores using classes and rules identified in the classification profile using a classify utility.
- 29. A system for classification of documents comprising:
a classification engine for accepting a classify command from a client, retrieving a classification profile, classifying documents based on external analytic scores, similarity scores, rules and classes, storing document classification results in a database, and notifying the client of completion of the classify command;
a virtual document manager for providing input documents;
an XML transformation engine for structuring the input values according to a search schema identified in the classification profile;
a search manager for performing similarity searches for determining similarity scores between each database document and each input document; and
external analytics for determining external analytic scores.
- 30. The system of claim 29, further comprising an output queue for temporarily storing classified documents.
- 31. The system of claim 29, further comprising a database management system for storing classification results.
- 32. A method for classification of documents, comprising:
receiving a classify command from a client, the classify command designating input document elements for names and search schema, anchor document structure, external analytics and values to be used as classification filters, and a classification profile;
retrieving the designated classification profile, the classification profile designating classes files for name, rank and score thresholds, rules files for nested conditions, properties, schema mapping, score threshold ranges and number of required documents, and class rules maps for class identification, class type, rule identification, description, property, score threshold ranges and document count;
retrieving the designated search documents;
identifying a schema mapping file for each input document;
determining a degree of similarity between each input document and anchor document;
determining analytic scores for each input document;
classifying the input documents according to the designated classes files, analytic scores and rules files;
creating and storing a classification results file in a database; and
notifying the client of completion of the classify command.
- 33. The method of claim 32, wherein the number of documents classified is designated in the rules files.
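By way of illustration only, the threshold-based classification recited in claims 2, 8, 9 and 13 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `DocClass`, `classify_by_top_score` and `CLASSES`, the example class names and threshold values, and the rule that the lowest rank number wins when ranges overlap are all hypothetical and do not appear in the claims.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DocClass:
    """Hypothetical class entry: identification, name, rank, score thresholds."""
    class_id: str
    name: str
    rank: int          # assumption: a lower rank number takes precedence
    low_score: float   # inclusive lower threshold
    high_score: float  # inclusive upper threshold


def classify_by_top_score(
    scores: Sequence[float], classes: Sequence[DocClass]
) -> Optional[DocClass]:
    """Pick a class from the top normalized similarity score.

    `scores` holds normalized similarity scores in [0.00, 1.00], possibly
    drawn from more than one search schema. Returns None when there are no
    scores or when no class range contains the top score.
    """
    if not scores:
        return None
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("similarity scores must lie between 0.00 and 1.00")
    top = max(scores)
    matches = [c for c in classes if c.low_score <= top <= c.high_score]
    # Assumption: resolve overlapping ranges in favor of the best-ranked class.
    return min(matches, key=lambda c: c.rank) if matches else None


# Hypothetical classes; the names and thresholds are illustrative only.
CLASSES = [
    DocClass("1", "HIGH", 1, 0.90, 1.00),
    DocClass("2", "MEDIUM", 2, 0.70, 0.90),
    DocClass("3", "LOW", 3, 0.00, 0.70),
]

result = classify_by_top_score([0.42, 0.93, 0.71], CLASSES)
```

In this sketch the ranges deliberately share their boundary values, so a top score of exactly 0.90 falls in two ranges and the rank element decides between them.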
Parent Case Info
[0001] This application claims benefit of U.S. provisional application 60/407,742, filed on Sep. 3, 2002, and is a continuation-in-part of U.S. application Ser. No. 10/248,962, filed on Mar. 5, 2003.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60407742 | Sep 2002 | US |
Continuation in Parts (1)
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 10248962 | Mar 2003 | US |
| Child | 10653432 | Sep 2003 | US |