Claims
- 1. A method for partitioning a plurality of items into groups of similar items, the plurality of items corresponding to a selection history by at least one third party, the method comprising the steps of:
partitioning the selection history into k clusters, k having an initial value of at least two;
identifying at least one mean item for each of the k clusters;
assigning each of the plurality of items to one of the k clusters based on a distance metric;
determining a measure of cluster compactness for at least one of the k clusters; and
incrementing the value of k if a predetermined criterion is not met by the measure of cluster compactness and then repeating the steps.
- 2. The method of claim 1, wherein the step of incrementing the value of k may be repeated until a further increment of k does not improve a classification accuracy.
- 3. The method of claim 1, further comprising the step of accepting a particular cluster if the predetermined criterion is met by the particular cluster.
- 4. The method of claim 1, wherein the step of partitioning further comprises the step of employing a k-means clustering routine.
- 5. The method of claim 1, wherein the plurality of items are programs.
- 6. The method of claim 1, wherein the plurality of items are content.
- 7. The method of claim 1, wherein the plurality of items are products.
- 8. The method of claim 1, wherein the distance metric is based on a distance between corresponding symbolic feature values of two items based on an overall similarity of classification of all instances for each possible value of the symbolic feature values.
- 9. The method of claim 8, wherein the distance between symbolic features is computed using a Value Difference Metric (VDM) technique.
- 10. The method of claim 1, wherein the step of identifying at least one mean item for each of the k clusters further comprises the steps of:
computing a variance for each of the plurality of items; and
selecting the at least one item that minimizes the variance as the mean symbolic value.
- 11. A system for partitioning a plurality of items into groups of similar items, the plurality of items corresponding to a selection history by at least one third party, the system comprising:
a memory for storing computer readable code; and a processor operatively coupled to said memory, said processor configured to:
partition the selection history into k clusters, k having an initial value of at least two;
identify at least one mean item for each of the k clusters;
assign each of the plurality of items to one of the k clusters based on a distance metric;
determine a measure of cluster compactness for at least one of the k clusters; and
increment the value of k if a predetermined criterion is not met by the measure of cluster compactness.
- 12. The system of claim 11, wherein said processor is further configured to increment the value of k until a further increment of k does not improve a classification accuracy.
- 13. The system of claim 11, wherein said processor is further configured to accept a particular cluster if the predetermined criterion is met by the particular cluster.
- 14. The system of claim 11, wherein said processor performs said partitioning using a k-means clustering routine.
- 15. The system of claim 11, wherein the distance metric is based on a distance between corresponding symbolic feature values of two items based on an overall similarity of classification of all instances for each possible value of said symbolic feature values.
- 16. The system of claim 15, wherein the distance between symbolic features is computed using a Value Difference Metric (VDM) technique.
- 17. The system of claim 11, wherein said processor identifies the at least one mean item for each of the k clusters by:
computing a variance for each of the plurality of items; and
selecting the at least one item that minimizes the variance as the mean symbolic value.
- 18. The system of claim 11, wherein the mean is comprised of a plurality of items and wherein the distance metric for a given item in the selection history is based on a distance between the given item and each item comprising the mean.
- 19. An article of manufacture for partitioning a plurality of items into groups of similar items, said plurality of items corresponding to a selection history by at least one third party, comprising:
a computer readable medium having computer readable program code means embodied thereon, said computer readable program code means comprising:
a step to partition said third party selection history into k clusters;
a step to identify at least one mean item for each of said k clusters;
a step to assign each of said plurality of items to one of said clusters based on a distance metric;
a step to determine a measure of cluster compactness for at least one of the k clusters; and
a step to increment the value of k if a predetermined criterion is not met by the measure of cluster compactness.
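By way of illustration only, the steps recited in claims 1 and 8 to 10 might be sketched in Python as follows. The sketch is not the claimed implementation, and several details are assumptions that the claims leave open: the function names, the exponent of 1 in the Value Difference Metric, seeding the means with the first k items, measuring compactness as the average distance of a cluster's members to its mean item, the `threshold` and `max_iter` parameters, and the sample `genre` data with its `watched` and `not_watched` class labels.

```python
from collections import Counter, defaultdict

def vdm_tables(items, labels):
    """Estimate P(class | feature value) for every symbolic feature value."""
    counts = defaultdict(Counter)  # (feature, value) -> class label counts
    for item, label in zip(items, labels):
        for feature, value in item.items():
            counts[(feature, value)][label] += 1
    classes = sorted(set(labels))
    return {key: [c[cls] / sum(c.values()) for cls in classes]
            for key, c in counts.items()}

def distance(a, b, tables):
    """Sum over features of the VDM distance between corresponding values."""
    total = 0.0
    for feature in a:
        pa = tables[(feature, a[feature])]
        pb = tables[(feature, b[feature])]
        total += sum(abs(x - y) for x, y in zip(pa, pb))  # assumed exponent 1
    return total

def mean_item(cluster, items, tables):
    """Member whose summed squared distance to the cluster (its 'variance') is smallest."""
    return min(cluster, key=lambda c: sum(
        distance(items[i], items[c], tables) ** 2 for i in cluster))

def compactness(cluster, mean, items, tables):
    """Assumed measure: average distance of the members to the mean item."""
    return sum(distance(items[i], items[mean], tables) for i in cluster) / len(cluster)

def partition(items, labels, threshold, k=2, max_iter=20):
    """Cluster with k means, incrementing k until every cluster is compact enough."""
    tables = vdm_tables(items, labels)
    n = len(items)
    while True:
        means = list(range(k))  # assumption: seed with the first k items
        for _ in range(max_iter):
            clusters = [[] for _ in range(k)]
            for i in range(n):
                nearest = min(range(k), key=lambda j: distance(
                    items[i], items[means[j]], tables))
                clusters[nearest].append(i)
            # An empty cluster keeps its previous mean.
            new_means = [mean_item(c, items, tables) if c else means[j]
                         for j, c in enumerate(clusters)]
            if new_means == means:
                break
            means = new_means
        worst = max(compactness(c, m, items, tables)
                    for c, m in zip(clusters, means) if c)
        if worst <= threshold or k >= n:
            return clusters, means
        k += 1  # criterion not met: increment k and repeat the steps

# Hypothetical selection history: one symbolic feature, two class labels.
items = [{"genre": g} for g in
         ["comedy", "news", "drama", "drama", "comedy", "news"]]
labels = ["watched", "not_watched", "watched",
          "not_watched", "watched", "not_watched"]
clusters, means = partition(items, labels, threshold=0.25)
```

With this sample data, k = 2 leaves the drama items grouped with the comedy items, so the assumed compactness threshold is not met and k is incremented to 3, at which point each genre forms its own cluster. Clusters and means are reported as indices into `items`.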
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present invention is related to United States patent application entitled “Method and Apparatus for Partitioning a Plurality of Items into Groups of Similar Items in a Recommender of Such Items,” Ser. No. 10/014,216, filed on Nov. 13, 2001, assigned to the assignee of the present invention and incorporated by reference herein.