Claims
- 1. A method for partitioning a plurality of items into groups of similar items, said plurality of items corresponding to a selection history by at least one third party, said method comprising the steps of:
partitioning said third party selection history into k clusters; identifying at least one mean item for each of said k clusters; and assigning each of said plurality of items to one of said clusters based on a distance metric.
- 2. The method of claim 1, further comprising the step of incrementing said value of k until a further increment of k does not improve a classification accuracy.
- 3. The method of claim 1, further comprising the step of incrementing said value of k until a predefined performance threshold is reached.
- 4. The method of claim 1, further comprising the step of incrementing said value of k until an empty cluster is detected.
- 5. The method of claim 1, further comprising the step of assigning a label to each of said clusters.
- 6. The method of claim 5, further comprising the step of receiving a user selection of at least one cluster based on said assigned labels.
- 7. The method of claim 1, wherein said partitioning step further comprises the step of employing a k-means clustering routine.
- 8. The method of claim 1, wherein said items are programs.
- 9. The method of claim 1, wherein said items are content.
- 10. The method of claim 1, wherein said items are products.
- 11. The method of claim 1, wherein said distance metric is based on a distance between corresponding symbolic feature values of two items based on an overall similarity of classification of all instances for each possible value of said symbolic feature values.
- 12. The method of claim 11, wherein said distance between symbolic features is computed using a Value Difference Metric (VDM) technique.
- 13. The method of claim 1, wherein said step of identifying at least one mean item for each of said k clusters further comprises the steps of:
computing a variance for each of said items; and selecting said at least one item that minimizes said variance as the mean symbolic value.
- 14. The method of claim 1, wherein said step of identifying at least one mean item for each of said k clusters, J, further comprises the steps of:
computing a variance of each of said clusters, J, for each of said possible symbolic values, xμ, for each of said symbolic attributes; and selecting for each of said symbolic attributes at least one symbolic value, xμ, that minimizes said variance as the mean symbolic value.
- 15. The method of claim 1, wherein said mean is comprised of a plurality of items and wherein said distance metric for a given item in said third party selection history is based on a distance between said given item and each item comprising said mean.
- 16. A method for partitioning a plurality of items into groups of similar items, said plurality of items corresponding to a selection history by at least one third party, said method comprising the steps of:
partitioning said third party selection history into k clusters; identifying at least one mean item for each of said k clusters; assigning each of said plurality of items to one of said clusters based on a distance metric; and incrementing said value of k until a predefined condition is satisfied.
- 17. The method of claim 16, wherein said predefined condition is that a further increment of k does not improve a classification accuracy.
- 18. The method of claim 16, wherein said predefined condition is that a predefined performance threshold is reached.
- 19. The method of claim 16, wherein said predefined condition is detection of an empty cluster.
- 20. The method of claim 16, wherein said mean is comprised of a plurality of items and wherein said distance metric for a given item in said third party selection history is based on a distance between said given item and each item comprising said mean.
- 21. A system for partitioning a plurality of items into groups of similar items, said plurality of items corresponding to a selection history by at least one third party, said system comprising:
a memory for storing computer readable code; and a processor operatively coupled to said memory, said processor configured to: partition said third party selection history into k clusters; identify at least one mean item for each of said k clusters; and assign each of said plurality of items to one of said clusters based on a distance metric.
- 22. The system of claim 21, wherein said processor is further configured to increment said value of k until a further increment of k does not improve a classification accuracy.
- 23. The system of claim 21, wherein said processor is further configured to increment said value of k until a predefined performance threshold is reached.
- 24. The system of claim 21, wherein said processor is further configured to increment said value of k until an empty cluster is detected.
- 25. The system of claim 21, wherein said processor is further configured to assign a label to each of said clusters.
- 26. The system of claim 25, wherein said processor is further configured to receive a user selection of at least one cluster based on said assigned labels.
- 27. The system of claim 21, wherein said processor performs said partitioning using a k-means clustering routine.
- 28. The system of claim 21, wherein said distance metric is based on a distance between corresponding symbolic feature values of two items based on an overall similarity of classification of all instances for each possible value of said symbolic feature values.
- 29. The system of claim 28, wherein said distance between symbolic features is computed using a Value Difference Metric (VDM) technique.
- 30. The system of claim 21, wherein said processor identifies said at least one mean item for each of said k clusters by:
computing a variance for each of said items; and selecting said at least one item that minimizes said variance as the mean symbolic value.
- 31. The system of claim 21, wherein said processor identifies said at least one mean item for each of said k clusters by:
computing a variance of each of said clusters, J, for each of said possible symbolic values, xμ, for each of said symbolic attributes; and selecting for each of said symbolic attributes at least one symbolic value, xμ, that minimizes said variance as the mean symbolic value.
- 32. The system of claim 21, wherein said mean is comprised of a plurality of items and wherein said distance metric for a given item in said third party selection history is based on a distance between said given item and each item comprising said mean.
- 33. An article of manufacture for partitioning a plurality of items into groups of similar items, said plurality of items corresponding to a selection history by at least one third party, comprising:
a computer readable medium having computer readable program code means embodied thereon, said computer readable program code means comprising: a step to partition said third party selection history into k clusters; a step to identify at least one mean item for each of said k clusters; and a step to assign each of said plurality of items to one of said clusters based on a distance metric.
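By way of illustration only, the steps recited in claims 1, 4, 12 and 14 might be sketched as follows. This is a minimal sketch under stated assumptions and does not define or limit the claims. The function names, the "watched"/"not_watched" class labels, the seeding by first distinct items, the exponent of 1 in the value distance, and the choice to keep the last value of k that produced no empty cluster are all hypothetical and do not appear in the claims.

```python
from collections import Counter, defaultdict

def vdm_tables(items, labels):
    """For each feature, map each symbolic value to [P(class | value), ...]."""
    classes = sorted(set(labels))
    tables = []
    for f in range(len(items[0])):
        counts = defaultdict(Counter)
        for item, label in zip(items, labels):
            counts[item[f]][label] += 1
        tables.append({v: [c[cls] / sum(c.values()) for cls in classes]
                       for v, c in counts.items()})
    return tables

def value_distance(table, v1, v2):
    # VDM-style distance: sum over classes of |P(c|v1) - P(c|v2)| (exponent 1 assumed).
    return sum(abs(p - q) for p, q in zip(table[v1], table[v2]))

def item_distance(tables, a, b):
    # Distance between two items: sum of per-feature value distances.
    return sum(value_distance(t, x, y) for t, x, y in zip(tables, a, b))

def symbolic_mean(tables, cluster):
    # Per attribute, pick the value that minimizes the summed squared distance
    # to the cluster's values (the "variance" of claim 14, as assumed here).
    mean = []
    for f, table in enumerate(tables):
        candidates = sorted({item[f] for item in cluster})
        mean.append(min(candidates, key=lambda v: sum(
            value_distance(table, item[f], v) ** 2 for item in cluster)))
    return tuple(mean)

def cluster_items(items, tables, k, max_iter=20):
    # Assumed seeding: the first k distinct items serve as initial means.
    means = list(dict.fromkeys(items))[:k]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for item in items:
            nearest = min(range(len(means)),
                          key=lambda j: item_distance(tables, item, means[j]))
            clusters[nearest].append(item)
        new_means = [symbolic_mean(tables, c) if c else m
                     for c, m in zip(clusters, means)]
        if new_means == means:
            break
        means = new_means
    return means, clusters

def grow_k(items, tables, k_start=1):
    # Increment k until an empty cluster is detected; keep the last good result.
    best = cluster_items(items, tables, k_start)
    for k in range(k_start + 1, len(set(items)) + 1):
        means, clusters = cluster_items(items, tables, k)
        if any(not c for c in clusters):
            break
        best = (means, clusters)
    return best

# Hypothetical third party selection history: (genre, time slot) plus a class label.
items = [("drama", "evening"), ("comedy", "morning"), ("thriller", "evening"),
         ("drama", "evening"), ("comedy", "morning"), ("thriller", "evening")]
labels = ["watched", "not_watched", "watched",
          "watched", "not_watched", "watched"]
tables = vdm_tables(items, labels)
means, clusters = grow_k(items, tables)
```

In this hypothetical history, "drama" and "thriller" share the same class distribution, so their value distance is zero and a third cluster comes up empty; the sketch therefore settles on k = 2.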
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present invention is related to United States Patent Application entitled “Method and Apparatus for Evaluating the Closeness of Items in a Recommender of Such Items,” (Attorney Docket Number US010567), United States Patent Application entitled “Method and Apparatus for Generating A Stereotypical Profile for Recommending Items of Interest Using Item-Based Clustering,” (Attorney Docket Number US010569), United States Patent Application entitled “Method and Apparatus for Recommending Items of Interest Based on Preferences of a Selected Third Party,” (Attorney Docket Number US010572), United States Patent Application entitled “Method and Apparatus for Recommending Items of Interest Based on Stereotype Preferences of Third Parties,” (Attorney Docket Number US010575) and United States Patent Application entitled “Method and Apparatus for Generating a Stereotypical Profile for Recommending Items of Interest Using Feature-Based Clustering,” (Attorney Docket Number US010576), each filed contemporaneously herewith, assigned to the assignee of the present invention and incorporated by reference herein.