Claims
- 1. A method for identifying user types in a collection of connected content portions comprising:
determining at least one significant user path of connected content portions;
determining a multi-modal user path user information need for each of the at least one significant user path;
for each content portion comprising each of the at least one significant user path, determining a multi-modal content portion feature information including at least two of a content feature information, connection feature information, inward connection feature information and outward connection feature information;
combining each multi-modal content portion feature information for the user path with the multi-modal user path user information need;
determining a similarity function and a measure of similarity for the multi-modal user path information;
determining a multi-modal clustering type; and
clustering the multi-modal user path information based on the multi-modal clustering type, the similarity function and the measure of similarity.
- 2. The method of claim 1, wherein the multi-modal user path user information need is a multi-modal user path information need vector and the multi-modal content portion feature information is a multi-modal content portion feature vector.
- 3. The method of claim 2, wherein determining significant user paths uses the longest repeating sub-sequences.
- 4. The method of claim 2, wherein determining content feature information is based on weighted word frequency of each content portion.
- 5. The method of claim 2, wherein determining the connection feature information comprises breaking the connection portion into constituent words using “/” and “.” as word boundaries.
- 6. The method of claim 2, wherein determining the inward connection feature information and the outward connection feature information further comprises normalizing the inward connection feature information and the outward connection feature information.
- 7. The method of claim 2, wherein the similarity function is based on determining the cosine between two multi-modal vectors.
- 8. The method of claim 2, wherein the multi-modal clustering type is at least one of K-means clustering and wavefront clustering.
- 9. The method of claim 2, wherein each content portion in the user path is weighted using at least one of a content portion access frequency weighting and a weighting of the content portion based on content portion position in the user path.
- 10. The method of claim 2, wherein each multi-modal feature vector may be independently weighted.
- 11. A system for identifying user types in a collection of connected content portions comprising:
a controller circuit;
a memory circuit;
an input/output circuit;
a multi-modal clustering type determining circuit;
a content determining circuit;
a usage determining circuit;
a topology determining circuit;
a user path determining circuit that determines at least one significant user path of connected content portions;
a multi-modal user path user information need determining circuit that determines a user information need for each user path;
multi-modal content, multi-modal connection, multi-modal inward connection and multi-modal outward connection feature information determining circuits that determine multi-modal content, multi-modal connection, multi-modal inward connection and multi-modal outward connection feature information for each content portion comprising a user path;
wherein the controller combines each content portion multi-modal content, multi-modal connection, multi-modal inward connection and multi-modal outward connection feature information for the user path with the multi-modal user path user information need into a multi-modal user type;
a similarity function determining circuit for determining similarity between two multi-modal information; and
a multi-modal clustering circuit that clusters the multi-modal user type information based on the multi-modal clustering type, the similarity function and a specified measure of similarity.
- 12. The system of claim 11, wherein the multi-modal user path user information need is a multi-modal user path information need vector and the multi-modal content portion feature information is a multi-modal content portion feature vector.
- 13. The system of claim 12, wherein the user path determining circuit determines significant user paths using the longest repeating subsequences.
- 14. The system of claim 12, wherein the multi-modal content feature information determining circuit determines words based on weighted word frequency of each content portion.
- 15. The system of claim 12, wherein the multi-modal connection feature information determining circuit determines connection features by breaking the connection portion or link into constituent words using “/” and “.” as word boundaries.
- 16. The system of claim 12, wherein the multi-modal inward connection feature determining circuit and the multi-modal outward connection feature determining circuit normalize the inward connection feature information and the outward connection feature information.
- 17. The system of claim 12, wherein the similarity function determining circuit determines similarity based on the cosine between two multi-modal vectors.
- 18. The system of claim 12, wherein the multi-modal clustering type is at least one of K-means clustering and wavefront clustering.
- 19. The system of claim 12, wherein each content portion in the user path is weighted by at least one of a content portion access frequency weighting circuit that weights the content portion based on access frequency and a path position weighting circuit that determines a weighting based on the position of the content portion within the user path.
- 20. The system of claim 12, further comprising a multi-modal feature weighting circuit that weights each multi-modal feature vector independently.
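For illustration only, the connection feature determination of claims 5 and 15, the independent weighting of claims 10 and 20, and the cosine similarity of claims 7 and 17 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the modality names ("content", "connection"), the sparse-vector representation and the example link are all hypothetical, and combining modalities by concatenating weighted per-modality vectors is one assumed reading of "multi-modal vector".

```python
import math
import re
from collections import Counter


def tokenize_connection(link: str) -> list[str]:
    """Break a link into constituent words, using "/" and "." as word boundaries."""
    return [word for word in re.split(r"[/.]", link) if word]


def combine(modalities: dict[str, Counter], weights: dict[str, float]) -> Counter:
    """Concatenate per-modality feature vectors into one sparse multi-modal vector.

    Each modality is scaled by its own weight (assumed default 1.0), and features
    are keyed by (modality, feature) so modalities never collide.
    """
    combined: Counter = Counter()
    for name, vector in modalities.items():
        weight = weights.get(name, 1.0)
        for feature, value in vector.items():
            combined[(name, feature)] = weight * value
    return combined


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical content portion: word counts plus words taken from its link.
page = {
    "content": Counter(["printer", "printer", "toner"]),
    "connection": Counter(tokenize_connection("www.example.com/products/printers.html")),
}
# Hypothetical per-modality weights; content counts twice as much as the link.
weights = {"content": 2.0, "connection": 1.0}
page_vector = combine(page, weights)
```

The resulting cosine values could then serve as the similarity function supplied to a clustering step such as K-means; the clustering itself is not shown here.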
INCORPORATION BY REFERENCE
[0001] The following co-pending applications:
[0002] “SYSTEMS AND METHODS FOR PREDICTING USAGE OF A WEB SITE USING PROXIMAL CUES”, by E. Chi et al., Attorney Docket No. DA0A29, filed March 30, as U.S. application Ser. No. ______;
[0003] “SYSTEMS AND METHOD FOR INFORMATION BROWSING USING MULTI-MODAL FEATURES”, by F. Chen et al., Attorney Docket No. D/99011, filed Oct. 19, 1999, as U.S. application Ser. No. ______;
[0004] “SYSTEM AND METHOD FOR PROVIDING RECOMMENDATIONS BASED ON MULTI-MODAL USER CLUSTERS”, by H. Schuetze et al., Attorney Docket No. D/99197, filed Oct. 19, 1999, as U.S. application Ser. No. ______;
[0005] “SYSTEM AND METHOD FOR QUANTITATIVELY REPRESENTING DATA OBJECTS IN VECTOR SPACE”, by H. Schuetze et al., Attorney Docket No. D/99198, filed Oct. 19, 1999, as U.S. application Ser. No. ______;
[0006] “SYSTEM AND METHOD FOR IDENTIFYING SIMILARITIES AMONG DOCUMENTS IN A COLLECTION”, by H. Schuetze et al., Attorney Docket No. D/99198Q1, filed Oct. 19, 1999 as U.S. application Ser. No. ______;
[0007] “SYSTEM AND METHOD FOR CLUSTERING DATA OBJECTS IN A COLLECTION”, by H. Schuetze et al., Attorney Docket No. D/991982, filed Oct. 19, 1999, as U.S. application Ser. No. ______;
[0008] “SYSTEM AND METHOD FOR VISUALLY REPRESENTING THE CONTENTS OF A MULTIPLE DATA OBJECT CLUSTER”, by H. Schuetze et al., Attorney Docket No. D/99198Q3, filed Oct. 19, 1999, as U.S. application Ser. No. ______; are each incorporated herein by reference in the entirety.
[0009] “SYSTEM AND METHOD FOR PREDICTING THE USAGE OF A WEB SITE USING PROXIMAL CUES”, by Ed Chi et al., Attorney Docket No. D/A0A29, filed Mar. 30, 2001, as U.S. application Ser. No. ______;
[0010] “SYSTEM AND METHOD FOR INFERRING USER INFORMATION NEED IN A HYPERMEDIA LINKED DOCUMENT COLLECTION”, by Ed Chi et al., Attorney Docket No. D/99794, filed Mar. 31, 2000, as U.S. application Ser. No. 09/540063; are each incorporated herein by reference in the entirety.
GOVERNMENT LICENSE PROVISION
[0011] The U.S. Government has a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Contract No. N00014-96-C-0097 awarded by the Office of Naval Research.