Claims
- 1. A computer-implemented method for identifying patterns in data, the method comprising:
(a) inputting into a classifier a training set having known outcomes, the classifier comprising a decision function having a plurality of weights, each having a weight value, wherein the training set comprises features corresponding to the data and wherein each feature has a corresponding weight;
(b) optimizing the plurality of weights so that classifier error is minimized;
(c) computing ranking criteria using the optimized plurality of weights;
(d) eliminating at least one feature corresponding to the smallest ranking criterion;
(e) repeating steps (a) through (d) for a plurality of iterations until a subset of features of pre-determined size remains; and
(f) inputting into the classifier a live set of data wherein the features within the live set are selected according to the subset of features.
- 2. The method of claim 1, wherein the classifier is a support vector machine.
- 3. The method of claim 1, wherein the classifier is a soft margin support vector machine.
- 4. The method of claim 1, wherein the ranking criterion corresponding to a feature is calculated by squaring the optimized weight for the corresponding feature.
- 5. The method of claim 1, wherein the decision function is a quadratic function.
- 6. The method of claim 1, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria in a single iteration of steps (a) through (d).
- 7. The method of claim 1, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria in at least the first iteration of steps (a) through (d) and in later iterations, eliminating one feature for each iteration.
- 8. The method of claim 1, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria so that the number of features is reduced by a factor of two for each iteration.
- 9. The method of claim 1, wherein the training set and the live set each comprise gene expression data obtained from DNA micro-arrays.
- 10. The method of claim 1, further comprising pre-processing the training set and the live set so that the features are comparably scaled.
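The steps recited in claims 1, 4, and 8 can be illustrated with a minimal sketch. It is not the applicants' implementation: it assumes a linear soft-margin classifier trained by plain subgradient descent, and the function names (`train_linear_svm`, `rfe`, `predict`), the hyperparameters (`C`, `epochs`, `lr`), and the toy data are all hypothetical, chosen only to make the example self-contained.

```python
def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01):
    """Steps (a)-(b): fit weights w and bias b by subgradient descent on the
    soft-margin objective 0.5*||w||^2 + C * sum(max(0, 1 - y*(w.x + b)))."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5*||w||^2 term
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                # Margin violated: add the hinge-loss subgradient.
                for j, xj in enumerate(xi):
                    grad_w[j] -= C * yi * xj
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rfe(X, y, n_keep, halve=False):
    """Steps (c)-(e): retrain, rank, and eliminate until n_keep features remain."""
    surviving = list(range(len(X[0])))  # original column indices still in play
    while len(surviving) > n_keep:
        X_sub = [[row[j] for j in surviving] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        criteria = [wj * wj for wj in w]  # claim 4: squared optimized weight
        # Claim 8 variant: drop half per iteration, never going below n_keep.
        n_drop = min(len(surviving) // 2, len(surviving) - n_keep) if halve else 1
        n_drop = max(1, n_drop)
        weakest = sorted(range(len(w)), key=lambda k: criteria[k])[:n_drop]
        surviving = [f for k, f in enumerate(surviving) if k not in weakest]
    return surviving


def predict(w, b, x, features):
    """Step (f): classify a live sample using only the selected features."""
    score = sum(wj * x[j] for wj, j in zip(w, features)) + b
    return 1 if score >= 0 else -1


# Toy data (made up for illustration): column 0 carries the class signal.
X = [
    [2.0, 0.1, -0.2, 0.0],
    [1.5, -0.1, 0.1, 0.1],
    [1.8, 0.2, 0.0, -0.1],
    [-2.0, 0.1, 0.1, 0.0],
    [-1.6, -0.2, -0.1, 0.1],
    [-1.9, 0.0, 0.2, -0.1],
]
y = [1, 1, 1, -1, -1, -1]

selected = rfe(X, y, n_keep=1)
w, b = train_linear_svm([[row[j] for j in selected] for row in X], y)
label = predict(w, b, [1.7, 0.0, 0.1, -0.2], selected)
```

The classifier is retrained on the surviving features in every pass, as step (e) requires, rather than ranking all features once. The pre-processing of claim 10 is not shown; the toy columns are used as given.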
RELATED APPLICATIONS
[0001] The present application claims priority to each of U.S. Provisional Patent Application Serial No. 60/263,696, filed Jan. 24, 2001, U.S. Provisional Patent Application Serial No. 60/298,757, filed Jun. 15, 2001, and U.S. Provisional Patent Application Serial No. 60/275,760, filed Mar. 14, 2001, and is a continuation-in-part of U.S. patent application Ser. No. 09/633,410, filed Aug. 7, 2000, which is a continuation-in-part of application Ser. No. 09/578,011, filed May 24, 2000, which is a continuation-in-part of application Ser. No. 09/568,301, filed May 9, 2000, now issued as Pat. No. ______, which is a continuation of application Ser. No. 09/303,387, filed May 1, 1999, now issued as Pat. No. 6,128,608, which claims priority to U.S. provisional application Serial No. 60/083,961, filed May 1, 1998. This application is related to applications Ser. No. 09/633,615, Ser. No. 09/633,616, and Ser. No. 09/633,850, all filed Aug. 7, 2000, which are also continuations-in-part of application Ser. No. 09/578,011. This application is also related to applications Ser. No. 09/303,386 and Ser. No. 09/305,345, now issued as Pat. No. 6,157,921, both filed May 1, 1999, and to application Ser. No. 09/715,832, filed Nov. 14, 2000, all of which also claim priority to provisional application Serial No. 60/083,961.
Provisional Applications (4)
| Number | Date | Country |
| 60263696 | Jan 2001 | US |
| 60298757 | Jun 2001 | US |
| 60275760 | Mar 2001 | US |
| 60083961 | May 1998 | US |
Continuations (1)
| Relation | Number | Date | Country |
| Parent | 09303387 | May 1999 | US |
| Child | 10057849 | Jan 2002 | US |
Continuations in Part (3)
| Relation | Number | Date | Country |
| Parent | 09633410 | Aug 2000 | US |
| Child | 10057849 | Jan 2002 | US |
| Parent | 09578011 | May 2000 | US |
| Child | 10057849 | Jan 2002 | US |
| Parent | 09568301 | May 2000 | US |
| Child | 10057849 | Jan 2002 | US |