Claims
- 1. A method for performing clustering within a relational database management system to group a set of n data points into a set of k clusters, each data point having a dimensionality p, the method comprising the steps of: establishing a first table, C, having 1 column and p*k rows, for the storage of means values; establishing a second table, R, having 1 column and p rows, for the storage of covariance values; establishing a third table, W, having w columns and k rows, for the storage of w weight values; establishing a fourth table, Y, having 1 column and p*n rows, for the storage of values; and executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm to iteratively update the means values, covariance values and weight values stored in said first, second and third tables; wherein said step of executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm includes the step of calculating a Mahalanobis distance for each of said n data points by using SQL aggregate functions to join tables Y, C and R.
- 2. The method for performing clustering within a relational database management system in accordance with claim 1, wherein said step of executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm to iteratively update the means values, covariance values and weight values stored in said first, second and third tables continues until a specified number of iterations has been performed.
- 3. The method for performing clustering within a relational database management system in accordance with claim 1, wherein said first, second, third and fourth tables represent matrices.
- 4. The method for performing clustering within a relational database management system in accordance with claim 3, wherein said third table, R, represents a diagonal matrix.
- 5. The method for performing clustering within a relational database management system in accordance with claim 1, wherein: k≦p; and p<<n.
- 6. The method for performing clustering within a relational database management system in accordance with claim 5, wherein: p≦100; and k≦100.
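The distance step recited in claim 1 can be illustrated with a minimal sketch, which is not the claimed implementation. It assumes that each vertical table carries index columns alongside its single value column (hypothetical names `rid` for the data point, `v` for the dimension, `j` for the cluster), that R holds the diagonal of the covariance matrix, and that the value computed is the squared Mahalanobis distance. SQLite, driven from Python, stands in for the relational database management system.

```python
import sqlite3


def mahalanobis_distances(points, means, variances):
    """Return {(rid, j): squared Mahalanobis distance} using a vertical table layout.

    points    -- n rows of p values (loaded into table Y, p*n rows)
    means     -- k rows of p values (loaded into table C, p*k rows)
    variances -- p values, the diagonal of the covariance matrix (table R, p rows)

    The index columns rid, v and j are assumptions made for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # One value column per table, plus the assumed index columns.
    cur.execute("CREATE TABLE Y (rid INTEGER, v INTEGER, val REAL)")
    cur.execute("CREATE TABLE C (j INTEGER, v INTEGER, val REAL)")
    cur.execute("CREATE TABLE R (v INTEGER, val REAL)")

    cur.executemany(
        "INSERT INTO Y VALUES (?, ?, ?)",
        [(rid, v, x)
         for rid, row in enumerate(points, start=1)
         for v, x in enumerate(row, start=1)],
    )
    cur.executemany(
        "INSERT INTO C VALUES (?, ?, ?)",
        [(j, v, x)
         for j, row in enumerate(means, start=1)
         for v, x in enumerate(row, start=1)],
    )
    cur.executemany(
        "INSERT INTO R VALUES (?, ?)",
        [(v, x) for v, x in enumerate(variances, start=1)],
    )

    # Join Y, C and R on the dimension index, then aggregate with SUM
    # to obtain one distance per (data point, cluster) pair.
    rows = cur.execute(
        """
        SELECT Y.rid, C.j,
               SUM((Y.val - C.val) * (Y.val - C.val) / R.val) AS d
        FROM Y
        JOIN C ON C.v = Y.v
        JOIN R ON R.v = Y.v
        GROUP BY Y.rid, C.j
        ORDER BY Y.rid, C.j
        """
    ).fetchall()
    conn.close()
    return {(rid, j): d for rid, j, d in rows}
```

Calling `mahalanobis_distances([[0, 0], [2, 4]], [[0, 0], [2, 4]], [1, 4])` yields one entry per point and cluster pair, so the result has n*k rows regardless of p. In this layout the dimensionality affects only the row counts of Y, C and R, not the number of columns in the query.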
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is related to the following U.S. Patent Applications, filed on even date herewith:
U.S. patent application Ser. No. 09/747,856 by Paul Cereghini and Carlos Ordonez and entitled “METHOD FOR PERFORMING CLUSTERING IN VERY LARGE DATABASES,” the disclosure of which is incorporated by reference herein.
U.S. patent application Ser. No. 09/747,858 by Paul Cereghini and Carlos Ordonez and entitled “HORIZONTAL IMPLEMENTATION OF EXPECTATION-MAXIMIZATION ALGORITHM IN SQL FOR PERFORMING CLUSTERING IN VERY LARGE DATABASES.”
US Referenced Citations (5)
| Number | Name | Date | Kind |
| --- | --- | --- | --- |
| 6115708 | Fayyad et al. | Sep 2000 | A |
| 6226334 | Olafsson | May 2001 | B1 |
| 6345265 | Thiesson et al. | Feb 2002 | B1 |
| 6374251 | Fayyad et al. | Apr 2002 | B1 |
| 6449612 | Bradley et al. | Sep 2002 | B1 |