Claims
- 1. A method for performing clustering within a relational database management system, the method comprising the steps of: establishing at least one table for the storage of Gaussian mixture parameters; and executing a series of SQL commands to update the Gaussian mixture parameters within said at least one table.
- 2. The method for performing clustering within a relational database management system in accordance with claim 1, wherein said SQL commands implement an Expectation-Maximization clustering algorithm.
- 3. A method for performing clustering within a relational database management system to group a set of n data points into a set of k clusters, each data point having a dimensionality p, the method comprising the steps of: establishing a first table, C, having p columns and k rows, for the storage of means values; establishing a second table, R, having p columns and p rows, for the storage of covariance values; establishing a third table, W, having k columns and 1 row, for the storage of weight values; and executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm to iteratively update the means values, covariance values and weight values stored in said first, second and third tables.
- 4. The method for performing clustering within a relational database management system in accordance with claim 3, wherein said step of executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm to iteratively update the means values, covariance values and weight values stored in said first, second and third tables continues until a specified number of iterations has been performed.
- 5. The method for performing clustering within a relational database management system in accordance with claim 3, wherein said first, second and third tables represent matrices.
- 6. The method for performing clustering within a relational database management system in accordance with claim 5, wherein said second table, R, represents a diagonal matrix.
- 7. The method for performing clustering within a relational database management system in accordance with claim 3, wherein: k≦p; and p<<n.
- 8. The method for performing clustering within a relational database management system in accordance with claim 7, wherein: p≦100; and k≦100.
- 9. The method for performing clustering within a relational database management system in accordance with claim 3, further comprising the steps of: establishing a fourth table, Z, having p columns and n rows, for the storage of dimensionality values p for each data point n; and establishing a fifth table, Y, having 1 column and p*n rows, for the vertical storage of data points and dimensionality values; and wherein said step of executing a series of SQL commands implementing an Expectation-Maximization clustering algorithm to iteratively update the means values, covariance values and weight values stored in said first, second and third tables includes the steps of: for each of said n data points, calculating Mahalanobis distances using a vertical approach wherein said Mahalanobis distances are calculated by using SQL aggregate functions joining tables Y, C and R; and for each of said n data points, calculating means values, covariance values and weight values using a horizontal approach.
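The vertical distance calculation recited in claim 9 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that Y, C and R are each stored one value per row so they can be joined on the dimension index, and that R is diagonal as in claim 6. The column names (`rid`, `i`, `v`, `val`), the sample values, and the use of SQLite through Python's `sqlite3` module are all assumptions made for illustration; the claims themselves describe C and R as horizontal tables.

```python
import sqlite3

# Hypothetical vertical layouts (not taken from the claims):
#   Y(rid, v, val): value of dimension v for data point rid
#   C(i, v, val):   mean of dimension v for cluster i
#   R(v, val):      variance of dimension v (diagonal covariance)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Y (rid INTEGER, v INTEGER, val REAL);
CREATE TABLE C (i INTEGER, v INTEGER, val REAL);
CREATE TABLE R (v INTEGER, val REAL);
""")

# Illustrative data: n = 2 points, k = 2 clusters, p = 2 dimensions.
points = {1: (1.0, 2.0), 2: (5.0, 6.0)}
means = {1: (1.0, 1.0), 2: (5.0, 5.0)}
variances = (1.0, 4.0)

cur.executemany(
    "INSERT INTO Y VALUES (?, ?, ?)",
    [(rid, v, x) for rid, pt in points.items() for v, x in enumerate(pt, start=1)],
)
cur.executemany(
    "INSERT INTO C VALUES (?, ?, ?)",
    [(i, v, m) for i, mu in means.items() for v, m in enumerate(mu, start=1)],
)
cur.executemany(
    "INSERT INTO R VALUES (?, ?)",
    [(v, s) for v, s in enumerate(variances, start=1)],
)

# Squared Mahalanobis distance for every (point, cluster) pair:
# join on the dimension index v, then aggregate with SUM.
distances = cur.execute("""
SELECT Y.rid, C.i,
       SUM((Y.val - C.val) * (Y.val - C.val) / R.val) AS d2
FROM Y
JOIN C ON C.v = Y.v
JOIN R ON R.v = Y.v
GROUP BY Y.rid, C.i
ORDER BY Y.rid, C.i
""").fetchall()

for rid, i, d2 in distances:
    print(f"point {rid}, cluster {i}: squared distance {d2}")
```

Because every term of the sum sits in its own row, a single `SUM` with `GROUP BY` covers any dimensionality p without changing the query text, which is the usual motivation for a vertical layout.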
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is related to the following U.S. Patent Applications, filed on even date herewith:
U.S. patent application Ser. No. 09/747,858, now pending, by Paul Cereghini and Carlos Ordonez and entitled “HORIZONTAL IMPLEMENTATION OF EXPECTATION-MAXIMIZATION ALGORITHM IN SQL FOR PERFORMING CLUSTERING IN VERY LARGE DATABASES.”
U.S. patent application Ser. No. 09/747,857, by Paul Cereghini and Carlos Ordonez and entitled “VERTICAL IMPLEMENTATION OF EXPECTATION-MAXIMIZATION ALGORITHM IN SQL FOR PERFORMING CLUSTERING IN VERY LARGE DATABASES.”
US Referenced Citations (5)
| Number | Name | Date | Kind |
| --- | --- | --- | --- |
| 5872850 | Klein et al. | Feb 1999 | A |
| 6012058 | Fayyad et al. | Jan 2000 | A |
| 6049777 | Sheena et al. | Apr 2000 | A |
| 6421665 | Brye et al. | Jul 2002 | B1 |
| 6438552 | Tate | Aug 2002 | B1 |