Claims
- 1. A method to categorize a plurality of informational items in an information retrieval system, comprising the steps of:
identifying one or more groupings of the informational items into one or more clusters; classifying said clusters by identifying rules to assign the informational items to a specific one of said clusters; and summarizing each of said clusters by determining descriptive titles to uniquely identify each of said clusters.
- 2. The method, as recited in claim 1, wherein said clustering, classifying and summarizing are all performed on a given set of data in no particular order.
- 3. The method, as recited in claim 1, wherein said clustering includes an extension to the BIRCH clustering algorithm, said extension comprising:
incorporating into subsequent classification and summarization steps one or more key points that define a bin of information items; and augmenting said key points with ranking scores to better identify the most important items within a pool of informational items for subsequent classification and summarization.
- 4. The method, as recited in claim 1, wherein said classifying includes an adaptation of the RIPPER classification algorithm, said adaptation including analyzing a hierarchical classification scheme wherein key points generated by a BIRCH algorithm are used to generate a set of RIPPER classification rules.
- 5. The method, as recited in claim 1, wherein said summarizing includes applying a pruning algorithm to a hierarchically modified combination of RIPPER and BIRCH algorithms.
- 6. The method, as recited in claim 1, further comprising presenting search result information, said presenting comprising:
identifying and describing said clustered informational items; searching said clustered informational items, said search being based on criteria that include cluster description and specified informational items; and presenting the search result information as labeled clustered informational items.
- 7. A method for detailing one or more inter-relationships between one or more individual informational items and one or more clusters of information, said inter-relationships being depicted by links between said informational items and said clusters.
- 8. A method for reducing complexity and presenting cognitively important information to a user as a result of a search, comprising:
identifying one or more levels of hierarchy by clustering, classifying and summarizing informational items sought in the search; and applying the PATHFINDER algorithm at each of said levels of hierarchy.
- 9. A computer-readable medium having computer-executable instructions for performing the steps recited in claim 1.
- 10. A computer-readable medium having computer-executable instructions for performing the steps recited in claim 7.
- 11. A computer-readable medium having computer-executable instructions for performing the steps recited in claim 8.
- 12. A method of extracting and reinforcing linguistic or statistically relevant features on a plurality of textual informational items in an information retrieval system, comprising the steps of:
identifying and ranking terms; performing language-specific part-of-speech tagging; identifying useful linguistic and statistical features; optionally reinforcing terms and features with supplied or automatically generated secondary information; and optionally pruning terms and features to reduce the number of features.
- 13. The method, as recited in claim 1 and claim 12, wherein said feature selection, clustering, classifying and summarizing are all performed on a given set of data in no particular order.
- 14. A computer-readable medium having computer-executable instructions for performing the steps recited in claim 12.
- 15. A computer-readable medium having computer-executable instructions for performing the steps recited in claim 13.
- 16. The method, as recited in claim 2, wherein said clustering is extended by adding an adaptive method consisting of:
applying said clustering; statistically evaluating the output of said clustering for quality; using said statistics to alter the clustering parameters; re-running the clustering algorithm; and repeating this process until a stopping condition is met.
- 17. A computer-readable medium having computer-executable instructions for performing the steps recited in claim 16.
- 18. The method, as recited in claim 13 and claim 16, wherein said feature selection, adaptive clustering, classifying and summarizing are all performed on a given set of data in no particular order.
- 19. A computer-readable medium having computer-executable instructions for performing the steps recited in claim 18.
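By way of illustration only, the adaptive loop recited in claim 16 (cluster, evaluate quality statistically, alter the parameters, re-run, stop on a condition) might be sketched as follows. The sketch is not the claimed BIRCH extension: a simple threshold-based leader clustering over one-dimensional values stands in for the clustering step. The function names, the spread-based quality statistic, the halving rule for the threshold, and the round limit are all assumptions chosen for the example.

```python
from statistics import mean


def leader_cluster(points, threshold):
    """Stand-in clustering step (not BIRCH): put each 1-D point into the
    first cluster whose current centroid lies within `threshold`."""
    clusters = []
    for p in points:
        for c in clusters:
            if abs(p - mean(c)) <= threshold:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def quality(clusters):
    """Hypothetical quality statistic: average within-cluster spread
    (max minus min); a lower value means tighter clusters."""
    return mean(max(c) - min(c) for c in clusters)


def adaptive_cluster(points, threshold=10.0, max_spread=1.0,
                     shrink=0.5, max_rounds=20):
    """Re-run clustering, shrinking the threshold after each round, until
    the quality statistic is acceptable or the round limit is reached."""
    clusters = leader_cluster(points, threshold)
    rounds = 1
    while quality(clusters) > max_spread and rounds < max_rounds:
        threshold *= shrink          # alter the clustering parameter
        clusters = leader_cluster(points, threshold)  # re-run clustering
        rounds += 1
    return clusters, threshold, rounds


if __name__ == "__main__":
    data = [1.0, 1.2, 0.9, 5.0, 5.1, 9.0, 9.3]
    clusters, threshold, rounds = adaptive_cluster(data)
    print(clusters, threshold, rounds)
```

Here the stopping condition is either an acceptable quality value or exhaustion of the round limit; any other statistic or parameter update rule could be substituted without changing the shape of the loop.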
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Serial No. 60/314,796, filed Aug. 24, 2001, which is fully incorporated herein by reference.
Provisional Applications (1)
| Number | Date | Country |
| 60314796 | Aug 2001 | US |