Claims
- 1. A method for partitioning a plurality of items into groups of similar items, said plurality of items corresponding to a selection history by at least one third party, said method comprising the steps of: partitioning said third party selection history into k clusters of said plurality of items, said plurality of items including at least one of programs, content and products; identifying at least one mean item for each of said k clusters; and assigning each of said plurality of items to one of said clusters based on a distance metric.
- 2. The method of claim 1, further comprising the step of incrementing said value of k until a further increment of k does not improve a classification accuracy.
- 3. The method of claim 1, further comprising the step of incrementing said value of k until a predefined performance threshold is reached.
- 4. The method of claim 1, further comprising the step of incrementing said value of k until an empty cluster is detected.
- 5. The method of claim 1, further comprising the step of assigning a label to each of said clusters.
- 6. The method of claim 5, further comprising the step of receiving a user selection of at least one cluster based on said assigned labels.
- 7. The method of claim 1, wherein said partitioning step further comprises the step of employing a k-means clustering routine.
- 8. The method of claim 1, wherein said distance metric is based on a distance between corresponding symbolic feature values of two items based on an overall similarity of classification of all instances for each possible value of said symbolic feature values.
- 9. The method of claim 8, wherein said distance between symbolic features is computed using a Value Difference Metric (VDM) technique.
- 10. The method of claim 1, wherein said step of identifying at least one mean item for each of said k clusters further comprises the steps of: computing a variance for each of said items; and selecting said at least one item that minimizes said variance as the mean symbolic value.
- 11. The method of claim 1, wherein said step of identifying at least one mean item for each of said k clusters, J, further comprises the steps of: computing a variance of each of said clusters, J, for each of said possible symbolic values, xμ, for each of said symbolic attributes; and selecting for each of said symbolic attributes at least one symbolic value, xμ, that minimizes said variance as the mean symbolic value.
- 12. The method of claim 1, wherein said mean is comprised of a plurality of items and wherein said distance metric for a given item in said third party selection history is based on a distance between said given item and each item comprising said mean.
- 13. A method for partitioning a plurality of items into groups of similar items, said plurality of items corresponding to a selection history by at least one third party, said method comprising the steps of: partitioning said third party selection history into k clusters of said plurality of items, said plurality of items including at least one of programs, content and products; identifying at least one mean item for each of said k clusters; assigning each of said plurality of items to one of said clusters based on a distance metric; and incrementing said value of k until a predefined condition is satisfied.
- 14. The method of claim 13, wherein said predefined condition is a further increment of k does not improve a classification accuracy.
- 15. The method of claim 13, wherein said predefined condition is a predefined performance threshold is reached.
- 16. The method of claim 13, wherein said predefined condition is detection of an empty cluster.
- 17. The method of claim 13, wherein said mean is comprised of a plurality of items and wherein said distance metric for a given item in said third party selection history is based on a distance between said given item and each item comprising said mean.
- 18. A system for partitioning a plurality of items into groups of similar items, said plurality of items corresponding to a selection history by at least one third party, said system comprising: a memory for storing computer readable code; and a processor operatively coupled to said memory, said processor configured to: partition said third party selection history into k clusters of said plurality of items, said plurality of items including at least one of programs, content and products; identify at least one mean item for each of said k clusters; and assign each of said plurality of items to one of said clusters based on a distance metric.
- 19. The system of claim 18, wherein said processor is further configured to increment said value of k until a further increment of k does not improve a classification accuracy.
- 20. The system of claim 18, wherein said processor is further configured to increment said value of k until a predefined performance threshold is reached.
- 21. The system of claim 18, wherein said processor is further configured to increment said value of k until an empty cluster is detected.
- 22. The system of claim 18, wherein said processor is further configured to assign a label to each of said clusters.
- 23. The system of claim 22, wherein said processor is further configured to receive a user selection of at least one cluster based on said assigned labels.
- 24. The system of claim 18, wherein said processor performs said partitioning using a k-means clustering routine.
- 25. The system of claim 18, wherein said distance metric is based on a distance between corresponding symbolic feature values of two items based on an overall similarity of classification of all instances for each possible value of said symbolic feature values.
- 26. The system of claim 25, wherein said distance between symbolic features is computed using a Value Difference Metric (VDM) technique.
- 27. The system of claim 18, wherein said processor identifies said at least one mean item for each of said k clusters by: computing a variance for each of said items; and selecting said at least one item that minimizes said variance as the mean symbolic value.
- 28. The system of claim 18, wherein said processor identifies said at least one mean item for each of said k clusters by: computing a variance of each of said clusters, J, for each of said possible symbolic values, xμ, for each of said symbolic attributes; and selecting for each of said symbolic attributes at least one symbolic value, xμ, that minimizes said variance as the mean symbolic value.
- 29. The system of claim 18, wherein said mean is comprised of a plurality of items and wherein said distance metric for a given item in said third party selection history is based on a distance between said given item and each item comprising said mean.
- 30. An article of manufacture for partitioning a plurality of items into groups of similar items, said plurality of items corresponding to a selection history by at least one third party, comprising: a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising: a step to partition said third party selection history into k clusters of said plurality of items, said plurality of items including at least one of programs, content and products; a step to identify at least one mean item for each of said k clusters; and a step to assign each of said plurality of items to one of said clusters based on a distance metric.
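By way of illustration only, the steps recited in claims 1, 4, 9 and 10 may be sketched as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: the function names (`vdm_tables`, `distance`, `mean_item`, `cluster_items`, `grow_k`) and the sample data are hypothetical, the classes are assumed to be "watched" and "not_watched", the Value Difference Metric is assumed to use an exponent of 1, the distance between two items is assumed to be the sum of the per-feature distances, and the initial means are assumed to be the first k distinct items.

```python
from collections import Counter, defaultdict

def vdm_tables(items, labels):
    """For each feature, map each symbolic value to its class frequencies."""
    classes = sorted(set(labels))
    tables = []
    for f in range(len(items[0])):
        counts = defaultdict(Counter)
        for item, label in zip(items, labels):
            counts[item[f]][label] += 1
        tables.append({v: [c[cls] / sum(c.values()) for cls in classes]
                       for v, c in counts.items()})
    return tables

def distance(a, b, tables):
    """Sum over features of the VDM distance between corresponding values
    (assumed exponent r = 1)."""
    return sum(sum(abs(p - q) for p, q in zip(t[x], t[y]))
               for x, y, t in zip(a, b, tables))

def mean_item(cluster, tables):
    """Item of the cluster minimizing the sum of squared distances to all
    items of that cluster (an item-based mean, in the manner of claim 10)."""
    return min(cluster,
               key=lambda c: sum(distance(c, x, tables) ** 2 for x in cluster))

def cluster_items(items, tables, k, max_iter=100):
    """Assign each item to its nearest mean, then recompute the means,
    until the means stop changing. Seeding is an assumption."""
    means = list(dict.fromkeys(items))[:k]  # first k distinct items
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for item in items:
            j = min(range(len(means)),
                    key=lambda j: distance(item, means[j], tables))
            clusters[j].append(item)
        new_means = [mean_item(c, tables) if c else m
                     for c, m in zip(clusters, means)]
        if new_means == means:
            break
        means = new_means
    return clusters, means

def grow_k(items, tables, k=2):
    """Increment k until a requested cluster is empty or cannot be seeded,
    returning the last clustering obtained before that happened."""
    best = cluster_items(items, tables, k)
    while True:
        clusters, means = cluster_items(items, tables, k + 1)
        if len(clusters) < k + 1 or any(not c for c in clusters):
            return best
        best = (clusters, means)
        k += 1

# Hypothetical selection history: (genre, channel) pairs with a class label.
items = [("drama", "A"), ("drama", "A"), ("comedy", "B"),
         ("comedy", "B"), ("news", "A"), ("news", "B")]
labels = ["watched", "watched", "not_watched",
          "not_watched", "watched", "not_watched"]

tables = vdm_tables(items, labels)
clusters, means = cluster_items(items, tables, k=2)
```

With this sample data and k = 2, the two seeds ("drama", "A") and ("comedy", "B") remain the means after the first pass, and each "news" item joins the cluster whose channel shows the same class frequencies as its own. The other stopping conditions recited in claims 2 and 3 (classification accuracy, performance threshold) would replace the test inside `grow_k` and are not sketched here.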
CROSS-REFERENCE TO RELATED APPLICATIONS
The present invention is related to U.S. patent application Ser. No. 10/014,180 entitled “Method and Apparatus for Evaluating the Closeness of Items in a Recommender of Such Items,” U.S. patent application Ser. No. 10/014,192 entitled “Method and Apparatus for Generating A Stereotypical Profile for Recommending Items of Interest Using Item-Based Clustering,” U.S. patent application Ser. No. 10/014,202 entitled “Method and Apparatus for Recommending Items of Interest Based on Preferences of a Selected Third Party,” U.S. patent application Ser. No. 10/014,195 entitled “Method and Apparatus for Recommending Items of Interest Based on Stereotype Preferences of Third Parties,” and U.S. patent application Ser. No. 10/014,189 entitled “Method and Apparatus for Generating a Stereotypical Profile for Recommending Items of Interest Using Feature-Based Clustering,” each filed contemporaneously herewith, assigned to the assignee of the present invention and incorporated by reference herein.
US Referenced Citations (5)
| Number | Name | Date | Kind |
| --- | --- | --- | --- |
| 5544256 | Brecher et al. | Aug 1996 | A |
| 5758257 | Herz et al. | May 1998 | A |
| 5940825 | Castelli et al. | Aug 1999 | A |
| 6029195 | Herz | Feb 2000 | A |
| 6088722 | Herz et al. | Jul 2000 | A |
Non-Patent Literature Citations (5)
| Entry |
| --- |
| Cost et al., “A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features.” |
| Kittler et al., “Combining Classifiers,” Inter'l Conf. on Pattern Recognition, vol. II Track B, 897-901, (1996). |
| Stanfill et al., “Toward Memory-Based Reasoning,” Communications of the ACM, vol. 29, 1213-1228, (1986). |
| Gath et al., “Unsupervised optimal fuzzy clustering,” IEEE Transactions On Pattern Analysis And Machine Intelligence, Jul. 1989, pp. 773-781. |
| Cost et al., “A Weighted Nearest Neighbor Algorithm for Learning With Symbolic Features,” Machine Learning, vol. 10, 1993, pp. 57-58. |