Claims
- 1. A database management system for in-database clustering, comprising:
a first data table and a second data table, each data table including a plurality of rows of data; means for building a clustering model using the first data table; and means for applying the clustering model using the second data table to generate apply output data.
- 2. The database management system of claim 1, wherein:
the first data table and the second data table are the same data table.
- 3. The database management system of claim 1, wherein:
the first data table and the second data table are different data tables.
- 4. The database management system of claim 1, wherein the means for building a clustering model comprises:
a plurality of clustering model building routines.
- 5. The database management system of claim 4, wherein the plurality of clustering model building routines comprises at least one of:
a K-means model building routine, an Orthogonal Partitioning Clustering model building routine, a mixture model building routine, a probabilistic model building routine, and a rule generation routine.
- 6. The database management system of claim 5, wherein the K-means model building routine comprises:
means for assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster; and means for updating a centroid of at least one cluster.
- 7. The database management system of claim 6, wherein the means for assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster comprises:
means for computing a distance metric to determine a distance between an input row of data in the first data table and each centroid of a cluster.
- 8. The database management system of claim 7, wherein the means for updating a centroid of at least one cluster comprises:
means for replacing a current centroid of a cluster by a mean of the rows of data in the first data table assigned to the cluster.
- 9. The database management system of claim 5, wherein the mixture model building routine comprises:
means for assigning a row of data in the first data table to a component based on a probability that the row of data belongs to a distribution of the component; and means for updating parameters of the distribution of the component using the row of data assigned to the component.
- 10. The database management system of claim 5, wherein the Orthogonal Partitioning Clustering model building routine comprises:
means for generating a hierarchical grid-based clustering model.
- 11. The database management system of claim 5, wherein the rule generation routine comprises:
means for extracting, for a group of clusters, a set of rules from information in histograms of the clusters.
- 12. The database management system of claim 11, wherein the clusters are generated by the Orthogonal Partitioning Clustering model building routine.
- 13. The database management system of claim 4, wherein the plurality of clustering model building routines comprises:
a K-means model building routine, an Orthogonal Partitioning Clustering model building routine, a mixture model building routine, a probabilistic model building routine, and a rule generation routine.
- 14. The database management system of claim 13, wherein the K-means model building routine comprises:
means for assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster; and means for updating a centroid of at least one cluster.
- 15. The database management system of claim 14, wherein the means for assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster comprises:
means for computing a distance metric to determine a distance between an input row of data in the first data table and each centroid of a cluster.
- 16. The database management system of claim 15, wherein the means for updating a centroid of at least one cluster comprises:
means for replacing a current centroid of a cluster by a mean of the rows of data in the first data table assigned to the cluster.
- 17. The database management system of claim 16, wherein the mixture model building routine comprises:
means for assigning a row of data in the first data table to a component based on a probability that the row of data belongs to a distribution of the component; and means for updating parameters of the distribution of the component using the row of data assigned to the component.
- 18. The database management system of claim 16, wherein the Orthogonal Partitioning Clustering model building routine comprises:
means for generating a hierarchical grid-based clustering model.
- 19. The database management system of claim 16, wherein the rule generation routine comprises:
means for extracting, for a group of clusters, a set of rules from information in histograms of the clusters.
- 20. The database management system of claim 19, wherein the clusters are generated by the Orthogonal Partitioning Clustering model building routine.
- 21. The database management system of claim 1, wherein the means for applying the clustering model comprises:
a Naïve Bayes apply routine.
- 22. In a database management system, a method for in-database clustering, comprising:
providing a first data table and a second data table, each data table including a plurality of rows of data; building a clustering model using the first data table; and applying the clustering model using the second data table to generate apply output data.
- 23. The method of claim 22, wherein:
the first data table and the second data table are the same data table.
- 24. The method of claim 22, wherein:
the first data table and the second data table are different data tables.
- 25. The method of claim 22, wherein the step of building a clustering model comprises:
a plurality of clustering model building routines.
- 26. The method of claim 25, wherein the plurality of clustering model building routines comprises at least one of:
a K-means model building routine, an Orthogonal Partitioning Clustering model building routine, a mixture model building routine, a probabilistic model building routine, and a rule generation routine.
- 27. The method of claim 26, wherein the K-means model building routine comprises:
assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster; and updating a centroid of at least one cluster.
- 28. The method of claim 27, wherein the step of assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster comprises:
computing a distance metric to determine a distance between an input row of data in the first data table and each centroid of a cluster.
- 29. The method of claim 28, wherein the step of updating a centroid of at least one cluster comprises:
replacing a current centroid of a cluster by a mean of the rows of data in the first data table assigned to the cluster.
- 30. The method of claim 26, wherein the mixture model building routine comprises:
assigning a row of data in the first data table to a component based on a probability that the row of data belongs to a distribution of the component; and updating parameters of the distribution of the component using the row of data assigned to the component.
- 31. The method of claim 26, wherein the Orthogonal Partitioning Clustering model building routine comprises:
generating a hierarchical grid-based clustering model.
- 32. The method of claim 26, wherein the rule generation routine comprises:
extracting, for a group of clusters, a set of rules from information in histograms of the clusters.
- 33. The method of claim 32, wherein the clusters are generated by the Orthogonal Partitioning Clustering model building routine.
- 34. The method of claim 25, wherein the plurality of clustering model building routines comprises:
a K-means model building routine, an Orthogonal Partitioning Clustering model building routine, a mixture model building routine, a probabilistic model building routine, and a rule generation routine.
- 35. The method of claim 34, wherein the K-means model building routine comprises:
assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster; and updating a centroid of at least one cluster.
- 36. The method of claim 35, wherein the step of assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster comprises:
computing a distance metric to determine a distance between an input row of data in the first data table and each centroid of a cluster.
- 37. The method of claim 36, wherein the step of updating a centroid of at least one cluster comprises:
replacing a current centroid of a cluster by a mean of the rows of data in the first data table assigned to the cluster.
- 38. The method of claim 34, wherein the mixture model building routine comprises:
assigning a row of data in the first data table to a component based on a probability that the row of data belongs to a distribution of the component; and updating parameters of the distribution of the component using the row of data assigned to the component.
- 39. The method of claim 34, wherein the Orthogonal Partitioning Clustering model building routine comprises:
generating a hierarchical grid-based clustering model.
- 40. The method of claim 34, wherein the rule generation routine comprises:
extracting, for a group of clusters, a set of rules from information in histograms of the clusters.
- 41. The method of claim 40, wherein the clusters are generated by the Orthogonal Partitioning Clustering model building routine.
- 42. The method of claim 22, wherein the step of applying the clustering model comprises:
a Naïve Bayes apply routine.
- 43. A database management system for performing in-database clustering, comprising:
a processor operable to execute computer program instructions; and a memory operable to store computer program instructions executable by the processor, for performing the steps of:
providing a first data table and a second data table, each data table including a plurality of rows of data; building a clustering model using the first data table; and applying the clustering model using the second data table to generate apply output data.
- 44. The system of claim 43, wherein:
the first data table and the second data table are the same data table.
- 45. The system of claim 43, wherein:
the first data table and the second data table are different data tables.
- 46. The system of claim 43, wherein the step of building a clustering model comprises:
a plurality of clustering model building routines.
- 47. The system of claim 46, wherein the plurality of clustering model building routines comprises at least one of:
a K-means model building routine, an Orthogonal Partitioning Clustering model building routine, a mixture model building routine, a probabilistic model building routine, and a rule generation routine.
- 48. The system of claim 47, wherein the K-means model building routine comprises:
assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster; and updating a centroid of at least one cluster.
- 49. The system of claim 48, wherein the step of assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster comprises:
computing a distance metric to determine a distance between an input row of data in the first data table and each centroid of a cluster.
- 50. The system of claim 49, wherein the step of updating a centroid of at least one cluster comprises:
replacing a current centroid of a cluster by a mean of the rows of data in the first data table assigned to the cluster.
- 51. The system of claim 47, wherein the mixture model building routine comprises:
assigning a row of data in the first data table to a component based on a probability that the row of data belongs to a distribution of the component; and updating parameters of the distribution of the component using the row of data assigned to the component.
- 52. The system of claim 47, wherein the Orthogonal Partitioning Clustering model building routine comprises:
generating a hierarchical grid-based clustering model.
- 53. The system of claim 47, wherein the rule generation routine comprises:
extracting, for a group of clusters, a set of rules from information in histograms of the clusters.
- 54. The system of claim 53, wherein the clusters are generated by the Orthogonal Partitioning Clustering model building routine.
- 55. The system of claim 46, wherein the plurality of clustering model building routines comprises:
a K-means model building routine, an Orthogonal Partitioning Clustering model building routine, a mixture model building routine, a probabilistic model building routine, and a rule generation routine.
- 56. The system of claim 55, wherein the K-means model building routine comprises:
assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster; and updating a centroid of at least one cluster.
- 57. The system of claim 56, wherein the step of assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster comprises:
computing a distance metric to determine a distance between an input row of data in the first data table and each centroid of a cluster.
- 58. The system of claim 57, wherein the step of updating a centroid of at least one cluster comprises:
replacing a current centroid of a cluster by a mean of the rows of data in the first data table assigned to the cluster.
- 59. The system of claim 55, wherein the mixture model building routine comprises:
assigning a row of data in the first data table to a component based on a probability that the row of data belongs to a distribution of the component; and updating parameters of the distribution of the component using the row of data assigned to the component.
- 60. The system of claim 55, wherein the Orthogonal Partitioning Clustering model building routine comprises:
generating a hierarchical grid-based clustering model.
- 61. The system of claim 55, wherein the rule generation routine comprises:
extracting, for a group of clusters, a set of rules from information in histograms of the clusters.
- 62. The system of claim 61, wherein the clusters are generated by the Orthogonal Partitioning Clustering model building routine.
- 63. The system of claim 43, wherein the step of applying the clustering model comprises:
a Naïve Bayes apply routine.
- 64. A computer program product for performing in-database clustering in a database management system, comprising:
a computer readable medium; computer program instructions, recorded on the computer readable medium, executable by a processor, for performing the steps of:
providing a first data table and a second data table, each data table including a plurality of rows of data; building a clustering model using the first data table; and applying the clustering model using the second data table to generate apply output data.
- 65. The computer program product of claim 64, wherein:
the first data table and the second data table are the same data table.
- 66. The computer program product of claim 64, wherein:
the first data table and the second data table are different data tables.
- 67. The computer program product of claim 64, wherein the step of building a clustering model comprises:
a plurality of clustering model building routines.
- 68. The computer program product of claim 67, wherein the plurality of clustering model building routines comprises at least one of:
a K-means model building routine, an Orthogonal Partitioning Clustering model building routine, a mixture model building routine, a probabilistic model building routine, and a rule generation routine.
- 69. The computer program product of claim 68, wherein the K-means model building routine comprises:
assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster; and updating a centroid of at least one cluster.
- 70. The computer program product of claim 69, wherein the step of assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster comprises:
computing a distance metric to determine a distance between an input row of data in the first data table and each centroid of a cluster.
- 71. The computer program product of claim 70, wherein the step of updating a centroid of at least one cluster comprises:
replacing a current centroid of a cluster by a mean of the rows of data in the first data table assigned to the cluster.
- 72. The computer program product of claim 68, wherein the mixture model building routine comprises:
assigning a row of data in the first data table to a component based on a probability that the row of data belongs to a distribution of the component; and updating parameters of the distribution of the component using the row of data assigned to the component.
- 73. The computer program product of claim 68, wherein the Orthogonal Partitioning Clustering model building routine comprises:
generating a hierarchical grid-based clustering model.
- 74. The computer program product of claim 68, wherein the rule generation routine comprises:
extracting, for a group of clusters, a set of rules from information in histograms of the clusters.
- 75. The computer program product of claim 74, wherein the clusters are generated by the Orthogonal Partitioning Clustering model building routine.
- 76. The computer program product of claim 64, wherein the plurality of clustering model building routines comprises:
a K-means model building routine, an Orthogonal Partitioning Clustering model building routine, a mixture model building routine, a probabilistic model building routine, and a rule generation routine.
- 77. The computer program product of claim 76, wherein the K-means model building routine comprises:
assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster; and updating a centroid of at least one cluster.
- 78. The computer program product of claim 77, wherein the step of assigning each of at least a portion of the plurality of rows of data in the first data table to a cluster comprises:
computing a distance metric to determine a distance between an input row of data in the first data table and each centroid of a cluster.
- 79. The computer program product of claim 78, wherein the step of updating a centroid of at least one cluster comprises:
replacing a current centroid of a cluster by a mean of the rows of data in the first data table assigned to the cluster.
- 80. The computer program product of claim 76, wherein the mixture model building routine comprises:
assigning a row of data in the first data table to a component based on a probability that the row of data belongs to a distribution of the component; and updating parameters of the distribution of the component using the row of data assigned to the component.
- 81. The computer program product of claim 76, wherein the Orthogonal Partitioning Clustering model building routine comprises:
generating a hierarchical grid-based clustering model.
- 82. The computer program product of claim 76, wherein the rule generation routine comprises:
extracting, for a group of clusters, a set of rules from information in histograms of the clusters.
- 83. The computer program product of claim 82, wherein the clusters are generated by the Orthogonal Partitioning Clustering model building routine.
- 84. The computer program product of claim 64, wherein the step of applying the clustering model comprises:
a Naïve Bayes apply routine.
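For illustration only, the build-then-apply flow recited in the claims, with the K-means steps (assigning rows to a cluster by a distance metric, then replacing each centroid with the mean of its assigned rows), might be sketched as follows. This is a minimal sketch and not the claimed implementation: it runs outside any database, and the function names (`nearest`, `build_kmeans`, `apply_model`), the Euclidean distance metric, the first-k-rows seeding, the fixed iteration count, and the sample tables are all assumptions introduced here. The apply step uses nearest-centroid assignment as a simplification and is not the Naïve Bayes apply routine named in the claims.

```python
import math

def nearest(row, centroids):
    """Index of the centroid closest to `row` (Euclidean distance, an assumption)."""
    distances = [math.dist(row, c) for c in centroids]
    return distances.index(min(distances))

def build_kmeans(table, k, iterations=10):
    """Build step: return k centroids computed from the rows of `table`."""
    # Assumption: seed the centroids with the first k rows of the table.
    centroids = [list(row) for row in table[:k]]
    for _ in range(iterations):
        # Assign each row to the cluster whose centroid is nearest.
        clusters = [[] for _ in range(k)]
        for row in table:
            clusters[nearest(row, centroids)].append(row)
        # Replace each centroid with the mean of the rows assigned to it.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if no rows were assigned
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def apply_model(centroids, table):
    """Apply step: one cluster index per row of `table` (nearest centroid)."""
    return [nearest(row, centroids) for row in table]

# Hypothetical data: the first table builds the model, the second is scored.
first_table = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0)]
second_table = [(0.0, 0.0), (10.0, 10.0)]

model = build_kmeans(first_table, k=2)
apply_output = apply_model(model, second_table)
print(model)         # [[1.0, 1.5], [8.5, 9.0]]
print(apply_output)  # [0, 1]
```

Passing the same table to both functions corresponds to the case in which the first and second data tables are the same data table.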
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The benefit under 35 U.S.C. § 119(e) of provisional application 60/379,118, filed May 10, 2002, is hereby claimed.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60379118 | May 2002 | US |