Claims
- 1. A method for generating coordinates for products in a combinatorial library based on features of corresponding building blocks, wherein distances between the coordinates represent relationships between the products, the method comprising the steps of:
(1) obtaining mapping coordinates for a subset of products in the combinatorial library; (2) obtaining building block features for the subset of products in the combinatorial library; (3) using a supervised machine learning approach to infer a mapping function ƒ that transforms the building block features for each product in the subset of products to the corresponding mapping coordinates for each product in the subset of products; and (4) encoding the mapping function ƒ in a computer readable medium, whereby the mapping function ƒ is useful for generating coordinates for additional products in the combinatorial library from building block features associated with the additional products.
- 2. The method according to claim 1, further comprising the step of:
(5) providing building block features for at least one additional product to the mapping function ƒ, wherein the mapping function ƒ outputs generated mapping coordinates for the additional product.
- 3. The method according to claim 1, wherein step (1) comprises generating the mapping coordinates for the subset of products.
- 4. The method according to claim 3, wherein step (1) further comprises the steps of:
(a) generating an initial set of mapping coordinates for the subset of products; (b) selecting two products from the subset of products; (c) refining the mapping coordinates of at least one product selected in step (1)(b) based on the coordinates of the two products and a distance between the two products so that the distance between the refined coordinates of the two products is more representative of the relationship between the products; and (d) repeating steps (1)(b) and (1)(c) for additional products until a stop criterion is obtained.
- 5. The method according to claim 1, wherein step (1) comprises calculating the mapping coordinates for the subset of products using a dimensionality reduction algorithm.
- 6. The method according to claim 1, wherein step (1) comprises retrieving the mapping coordinates for the subset of products from a computer readable medium.
- 7. The method according to claim 1, wherein step (2) comprises the step of:
using a laboratory measured value as a feature for each building block in at least one variation site in the combinatorial library.
- 8. The method according to claim 1, wherein step (2) comprises the step of:
using a computed value as a feature for each building block in at least one variation site in the combinatorial library.
- 9. The method according to claim 1, wherein at least some of the building block features represent reagents used to construct the combinatorial library.
- 10. The method according to claim 1, wherein at least some of the building block features represent fragments of reagents used to construct the combinatorial library.
- 11. The method according to claim 1, wherein at least some of the building block features represent modified fragments of reagents used to construct the combinatorial library.
- 12. The method according to claim 1, wherein the mapping function ƒ is encoded as a neural network.
- 13. The method according to claim 1, wherein the mapping function ƒ is a set of specialized mapping functions ƒ1 through ƒn, each encoded as a neural network.
- 14. A system for generating coordinates for products in a combinatorial library based on features of corresponding building blocks, wherein distances between the coordinates represent similarity/dissimilarity of the products, comprising:
means for obtaining mapping coordinates for a subset of products in the combinatorial library; means for obtaining building block features for the subset of products in the combinatorial library; means for using a supervised machine learning approach to infer a mapping function ƒ that transforms the building block features for each product in the subset of products to the corresponding mapping coordinates for each product in the subset of products; and means for encoding the mapping function ƒ in a computer readable medium, whereby the mapping function ƒ is useful for generating coordinates for additional products in the combinatorial library from building block features associated with the additional products.
- 15. The system of claim 14, further comprising:
means for providing building block features for at least one additional product to the mapping function ƒ, wherein the mapping function ƒ outputs generated mapping coordinates for the additional product.
- 16. The system of claim 14, wherein said means for obtaining mapping coordinates comprises:
means for generating an initial set of mapping coordinates for the subset of products; means for selecting two products from the subset of products; means for refining the mapping coordinates of at least one product selected based on the coordinates of the two products and a distance between the two products so that the distance between the refined coordinates of the two products is more representative of the relationship between the products; and means for continuously selecting two products at a time and refining the mapping coordinates of at least one product selected until a stop criterion is obtained.
- 17. The system of claim 14, wherein a laboratory measured value is used as a feature for each building block in at least one variation site in the combinatorial library.
- 18. The system of claim 14, wherein a computed value is used as a feature for each building block in at least one variation site in the combinatorial library.
- 19. The system of claim 14, wherein at least some of the building block features represent reagents used to construct the combinatorial library.
- 20. The system of claim 14, wherein at least some of the building block features represent fragments of reagents used to construct the combinatorial library.
- 21. The system of claim 14, wherein at least some of the building block features represent modified fragments of reagents used to construct the combinatorial library.
- 22. The system of claim 14, wherein the mapping function ƒ is encoded as a neural network.
- 23. The system of claim 14, wherein the mapping function ƒ is a set of specialized mapping functions ƒ1 through ƒn, each encoded as a neural network.
- 24. A computer program product for generating coordinates for products in a combinatorial library based on features of corresponding building blocks, wherein distances between the coordinates represent similarity/dissimilarity of the products, said computer program product comprising a computer useable medium having computer program logic recorded thereon for controlling a processor, said computer program logic comprising:
a procedure that enables said processor to obtain mapping coordinates for a subset of products in the combinatorial library; a procedure that enables said processor to obtain building block features for the subset of products in the combinatorial library; a procedure that enables said processor to use a supervised machine learning approach to infer a mapping function ƒ that transforms the building block features for each product in the subset of products to the corresponding mapping coordinates for each product in the subset of products; and a procedure that enables said processor to encode the mapping function ƒ in a computer readable medium, whereby the mapping function ƒ is useful for generating coordinates for additional products in the combinatorial library from building block features associated with the additional products.
- 25. The computer program product of claim 24, further comprising:
a procedure that enables said processor to provide building block features for at least one additional product to the mapping function ƒ, wherein the mapping function ƒ outputs generated mapping coordinates for the additional product.
- 26. The computer program product of claim 24, wherein said procedure that enables said processor to obtain mapping coordinates comprises:
a procedure that enables said processor to generate an initial set of mapping coordinates for the subset of products; a procedure that enables said processor to select two products from the subset of products; a procedure that enables said processor to refine the mapping coordinates of at least one product selected based on the coordinates of the two products and a distance between the two products so that the distance between the refined coordinates of the two products is more representative of the relationship between the products; and a procedure that enables said processor to continue selecting two products at a time and refining the mapping coordinates of at least one product selected until a stop criterion is obtained.
- 27. The computer program product of claim 24, wherein a laboratory measured value is used as a feature for each building block in at least one variation site in the combinatorial library.
- 28. The computer program product of claim 24, wherein a computed value is used as a feature for each building block in at least one variation site in the combinatorial library.
- 29. The computer program product of claim 24, wherein at least some of the building block features represent reagents used to construct the combinatorial library.
- 30. The computer program product of claim 24, wherein at least some of the building block features represent fragments of reagents used to construct the combinatorial library.
- 31. The computer program product of claim 24, wherein at least some of the building block features represent modified fragments of reagents used to construct the combinatorial library.
- 32. The computer program product of claim 24, wherein the mapping function ƒ is encoded as a neural network.
- 33. The computer program product of claim 24, wherein the mapping function ƒ is a set of specialized mapping functions ƒ1 through ƒn, each encoded as a neural network.
- 34. A method for analyzing a combinatorial library $\{f_{ijk},\ i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, r_i;\ k = 1, 2, \ldots, n\}$, wherein $r$ represents the number of variation sites in the library, $r_i$ represents the number of building blocks at the $i$-th variation site, and $n$ represents the number of descriptors used to characterize each reagent, the method comprising the steps of:
(1) computing at least one descriptor for each reagent of the combinatorial library; (2) selecting a training subset of products $\{p_i,\ i = 1, 2, \ldots, k\}$ of the combinatorial library; (3) mapping the training subset of products onto $\mathbb{R}^m$ using a nonlinear mapping algorithm ($p_i \to y_i$, $i = 1, 2, \ldots, k$, $y_i \in \mathbb{R}^m$); (4) identifying, for each product $p_i$ of the training subset of products, corresponding reagents $\{t_{ij},\ j = 1, 2, \ldots, r\}$ and concatenating their descriptors $f_{1 t_{i1}}, f_{2 t_{i2}}, \ldots, f_{r t_{ir}}$ into a single vector, $x_i$; (5) training a combinatorial network to recognize the mapping $x_i \to y_i$ using input/output pairs of training set $T = \{(x_i, y_i),\ i = 1, 2, \ldots, k\}$; (6) identifying, after the combinatorial network is trained, for each product $\{p_z,\ z = 1, 2, \ldots, w\}$ of the combinatorial library to be mapped onto $\mathbb{R}^m$, corresponding reagents $\{t_j,\ j = 1, 2, \ldots, r\}$ and concatenating their descriptors, $f_{1 t_1}, f_{2 t_2}, \ldots, f_{r t_r}$, into a single vector, $x_z$; and (7) mapping $x_z \to y_z$ using the trained combinatorial network, wherein $y_z$ represents generated coordinates for product $p_z$.
- 35. The method of claim 34, wherein step (2) comprises:
selecting the training subset of products randomly.
- 36. The method of claim 34, wherein step (3) comprises:
(a) placing the selected training subset of products on an m-dimensional nonlinear map using randomly assigned coordinates; (b) selecting a pair of the products having a similarity relationship; (c) revising the coordinates of at least one of the selected pair of products based on the similarity relationship and the corresponding distance between the products on the nonlinear map; and (d) repeating steps (b) and (c) for additional pairs of the products until the distances between the products on the m-dimensional nonlinear map are representative of the similarity relationships between the products.
- 37. The method of claim 34, further comprising the step of:
storing an output of the trained combinatorial network on a computer readable storage device.
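The steps recited in claims 34 and 36 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a hypothetical two-site toy library with random descriptors, uses the Euclidean distance between concatenated reagent descriptors as a stand-in for the similarity relationship between products, uses a fixed iteration count as the stop criterion, and substitutes a small one-hidden-layer network trained by stochastic gradient descent for the combinatorial network. All function names, parameters, and the learning-rate schedule are illustrative assumptions.

```python
import math
import random

def refine_pairwise(coords, target_dist, steps=20000, lr=0.5, seed=0):
    """Claim 36 (a)-(d) sketch: pick a pair of products, then move one of them
    so that their map distance shifts toward the target distance."""
    rng = random.Random(seed)
    for step in range(steps):                      # assumed stop criterion: fixed step count
        rate = lr * (1.0 - step / steps)           # assumed linearly decaying learning rate
        i, j = rng.sample(range(len(coords)), 2)
        d = math.dist(coords[i], coords[j]) or 1e-9
        scale = rate * (target_dist(i, j) - d) / d
        # Move product i along the line joining it to product j.
        coords[i] = [a + scale * (a - b) for a, b in zip(coords[i], coords[j])]
    return coords

def product_vector(reagent_ids, descriptors):
    """Steps (4)/(6) sketch: concatenate the descriptors of the reagent chosen
    at each variation site into one input vector."""
    x = []
    for site, reagent in enumerate(reagent_ids):
        x.extend(descriptors[site][reagent])
    return x

def train_network(pairs, hidden=8, epochs=500, lr=0.02, seed=0):
    """Step (5) sketch: fit x -> y with one tanh hidden layer and a linear
    output layer, using per-sample gradient descent on squared error."""
    rng = random.Random(seed)
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)] for _ in range(n_out)]

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
        y = [sum(w * v for w, v in zip(row, h + [1.0])) for row in w2]
        return h, y

    for _ in range(epochs):
        for x, t in pairs:
            h, y = forward(x)
            err = [yo - to for yo, to in zip(y, t)]
            # Hidden-layer gradient is computed before the output weights change.
            grad_h = [sum(err[o] * w2[o][k] for o in range(n_out)) * (1 - h[k] ** 2)
                      for k in range(hidden)]
            for o in range(n_out):
                for k, v in enumerate(h + [1.0]):
                    w2[o][k] -= lr * err[o] * v
            for k in range(hidden):
                for m, v in enumerate(x + [1.0]):
                    w1[k][m] -= lr * grad_h[k] * v
    return lambda x: forward(x)[1]

# Hypothetical toy library: descriptors[site][reagent] is a list of floats.
rng = random.Random(1)
descriptors = [
    [[rng.random(), rng.random()] for _ in range(4)],    # site 0: 4 reagents
    [[rng.random(), rng.random()] for _ in range(3)],    # site 1: 3 reagents
]
library = [(a, b) for a in range(4) for b in range(3)]   # 12 products
training = rng.sample(library, 8)                        # step (2): random training subset

xs = [product_vector(p, descriptors) for p in training]
start = [[rng.random(), rng.random()] for _ in training]  # random initial 2-D coordinates
ys = refine_pairwise(start, lambda i, j: math.dist(xs[i], xs[j]))  # step (3)
net = train_network(list(zip(xs, ys)))                             # step (5)
generated = {p: net(product_vector(p, descriptors)) for p in library}  # steps (6)-(7)
```

In this sketch only the eight training products are placed by pairwise refinement; coordinates for all twelve products are then generated from reagent descriptors alone, which is the point of learning the mapping from building block features.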
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation-in-part of U.S. application Ser. No. 09/934,084, filed Aug. 22, 2001, which is incorporated by reference herein in its entirety, and it claims the benefit of U.S. Provisional Application No. 60/264,258, filed Jan. 29, 2001, and U.S. Provisional Application No. 60/274,238, filed Mar. 9, 2001, each of which is incorporated by reference herein in its entirety.
Provisional Applications (2)
| Number | Date | Country |
| 60264258 | Jan 2001 | US |
| 60274238 | Mar 2001 | US |
Continuation in Parts (1)
| | Number | Date | Country |
| Parent | 09934084 | Aug 2001 | US |
| Child | 10058216 | Jan 2002 | US |