Claims
- 1. A method of classifying a biological state from biological data by the detection of discriminatory patterns where the discriminatory pattern describes the biological state.
- 2. A method of classifying a biological state from biological data by the steps of:
a. detecting a discriminatory pattern that is a subset of a larger set of data in a data stream, said discrimination defined by success in a learning set of data;
b. applying said discriminatory pattern to classify known or test data samples; and
c. using said discriminatory pattern to classify unknown data samples, wherein the discriminatory pattern is indicative of the biological state and is discriminatory even when individual data points are not.
- 3. A method of classifying a biological state in biological data by the detection of discriminatory patterns using a vector space having multiple predetermined diagnostic clusters defining a known biological state comprising the steps of:
a. forming a normalized data stream that describes the biological data;
b. abstracting the data stream to calculate a sample vector that characterizes the data stream;
c. identifying the diagnostic cluster, if any, within which the sample vector rests;
d. assigning to the biological data the diagnosis of the identified diagnostic cluster or, if no cluster is identified, assigning to the biological data the diagnosis of atypical sample, NOS; and
e. using said discriminatory pattern to classify unknown data samples, wherein the discriminatory pattern describes the biological state and is discriminatory even when individual data points are not.
- 4. The method of claims 1, 2, or 3, wherein the discrimination is defined by success in a learning set of data, said learning set of data formed from biological data for which the biological state is known.
- 5. The method of claims 1, 2, or 3, wherein the biological data is data describing the expression of molecules in a biological sample.
- 6. The method of claims 1, 2, or 3, wherein the biological data is derived from clinical data.
- 7. The method of claims 1, 2, or 3, wherein the biological data is any combination of clinical data and data describing the expression of molecules in a biological sample.
- 8. The method of claims 1, 2, or 3, wherein the biological data is any combination of non-biological data and data describing the expression of molecules in a biological sample.
- 9. The method of claim 5 wherein the molecules are selected from the group consisting of proteins, peptides, phospholipids, DNA, and RNA.
- 10. The method of claim 7 wherein the molecules are selected from the group consisting of proteins, peptides, phospholipids, DNA, and RNA.
- 11. The method of claim 8 wherein the molecules are selected from the group consisting of proteins, peptides, phospholipids, DNA, and RNA.
- 12. The method of claim 5, wherein the biological sample is selected from the group consisting of serum, blood, saliva, plasma, nipple aspirates, synovial fluids, cerebrospinal fluids, sweat, urine, fecal matter, tears, bronchial lavage, swabbings, needle aspirates, semen, vaginal fluids, and pre-ejaculate.
- 13. The method of claim 7, wherein the biological sample is selected from the group consisting of any bodily fluid such as serum, blood, saliva, plasma, nipple aspirates, synovial fluids, cerebrospinal fluids, sweat, urine, fecal matter, tears, bronchial lavage, swabbings, needle aspirates, semen, vaginal fluids, and pre-ejaculate.
- 14. The method of claim 8, wherein the biological sample is selected from the group consisting of any bodily fluid such as serum, blood, saliva, plasma, nipple aspirates, synovial fluids, cerebrospinal fluids, sweat, urine, fecal matter, tears, bronchial lavage, swabbings, needle aspirates, semen, vaginal fluids, and pre-ejaculate.
- 15. The method of claim 5, wherein the biological sample is selected from the group consisting of tissue culture supernatants, lyophilized tissue cultures, and viral cultures.
- 16. The method of claim 7, wherein the biological sample is selected from the group consisting of tissue culture supernatants, lyophilized tissue cultures, and viral cultures.
- 17. The method of claim 8, wherein the biological sample is selected from the group consisting of tissue culture supernatants, lyophilized tissue cultures, and viral cultures.
- 18. The method of claims 1, 2, or 3, wherein the biological state is a disease.
- 19. The method of claims 1, 2, or 3, wherein the biological state is a stage of a disease.
- 20. The method of claims 1, 2, or 3, wherein the biological state is the prognosis of a disease.
- 21. The method of claims 1, 2, or 3, wherein the biological state is the disease of an internal body organ.
- 22. The method of claims 1, 2, or 3, wherein the biological state is the stage of a disease of an internal body organ.
- 23. The method of claims 1, 2, or 3, wherein the biological state is the health of an internal body organ.
- 24. The method of claims 1, 2, or 3, wherein the biological state is the toxicity of one or more chemicals.
- 25. The method of claims 1, 2, or 3, wherein the biological state is the relative toxicity of one or more chemicals.
- 26. The method of claims 1, 2, or 3, wherein the biological state is the efficacy of a drug.
- 27. The method of claims 1, 2, or 3, wherein the biological state is the efficacy of one or more drugs.
- 28. The method of claims 1, 2, or 3, wherein the biological state is the responsiveness to a regimen of therapy.
- 29. The method of claims 1, 2, or 3, wherein the biological state is the state of perturbation of a body organ.
- 30. The method of claims 1, 2, or 3, wherein the biological state is the presence of one or more pathogens.
- 31. The method of claim 18, wherein the disease is one in which changes in the patterns of expression of inherent molecules in the diseased state are different from the non-diseased state.
- 32. The method of claim 18, wherein the disease is a cancer.
- 33. The method of claim 18, wherein the disease is selected from the group consisting of auto-immune diseases, Alzheimer's disease and arthritis.
- 34. The method of claim 18, wherein the disease is glomerulonephritis.
- 35. The method of claim 18, wherein the disease is any infectious disease.
- 36. The method of claim 32, wherein the cancer is selected from the group consisting of carcinomas, melanomas, lymphomas, sarcomas, blastomas, leukemias, myelomas, and neural tumors.
- 37. The method of claim 36, wherein the carcinoma is a prostatic carcinoma.
- 38. The method of claim 36, wherein the carcinoma is ovarian carcinoma.
- 39. The method of claims 2 or 3, wherein the data stream is formed by any high throughput data generation method.
- 40. The method of claims 2 or 3, wherein the data stream is a time of flight mass spectrum.
- 41. The method of claim 40, wherein the time of flight mass spectrum is generated by surface-enhanced laser desorption time-of-flight mass spectroscopy.
- 42. The method of claim 40, wherein the time of flight mass spectrum is generated by matrix assisted laser desorption ionization time of flight.
- 43. The method of claims 1, 2, or 3, further comprising using any pattern recognition method.
- 44. The method of claim 43, wherein the pattern recognition method further comprises a learning algorithm and a diagnostic algorithm.
- 45. The method of claims 1, 2, or 3, further comprising using a set of learning data streams to construct a diagnostic algorithm for a biological state of interest, wherein the diagnostic algorithm is characterized by having multiple diagnostic clusters of predetermined equal size in a vector space of a fixed number of dimensions, comprising the steps of:
a. providing a set of learning data streams, each data stream describing a biological sample with a known biological state;
b. selecting an initial set of random logical chromosomes that specify the location of a predetermined number of points of the data stream;
c. calculating a vector for each chromosome and for each data stream by abstracting the data stream at locations specified by the chromosome;
d. determining a fitness of each chromosome by finding the locations in the vector space of a multiplicity of non-overlapping data clusters of the predetermined, equal size that maximize the number of vectors that rest in a cluster having a uniform status, wherein the larger the number of such vectors the larger the fitness;
e. optimizing the set of logical chromosomes by an iterative process comprising reiteration of steps (c) and (d), terminating logical chromosomes with low fitness, replicating logical chromosomes of high fitness, recombination and random modification of the chromosomes;
f. terminating the iterative process and selecting a logical chromosome that allows for a preferred set of non-overlapping data clusters; and
g. constructing a diagnostic algorithm that embodies the selected logical chromosome and homogeneous non-overlapping data clusters.
- 46. The method of claim 45, further comprising the step of testing a diagnostic algorithm embodying an optimized chromosome and a fitness-maximizing set of data clusters to determine how accurately the diagnostic algorithm diagnoses a test set of data streams each having a known diagnosis that is independent of the instructional data streams.
- 47. The method of claim 45 wherein the vector space contains between 5 and 10 dimensions.
- 48. A method of diagnosing the disease of an organ of an individual which comprises:
a. analyzing a biological sample from the subject and calculating from the analysis a normalized vector, having at least 4 scalars and not more than 20 scalars, that is characteristic of the sample;
b. providing a vector space of between 4 and 20 dimensions occupied by a data cluster map comprising at least 6 equal-sized, non-overlapping data clusters, a multiplicity of which data clusters are associated with a disease diagnosis and a multiplicity of which data clusters are associated with a normal sample and no data cluster of said map is associated with more than one diagnosis;
c. calculating in which, if any, of the data clusters of the data cluster map the characteristic vector rests; and
d. assigning to the sample the disease diagnosis associated with the data cluster in which the characteristic vector rests or, if the vector rests in no cluster, assigning a classification of non-normal.
- 49. A method of diagnosing the stage of a disease of an organ of an individual which comprises:
a. analyzing a biological sample from the subject and calculating from the analysis a normalized vector, having at least 4 scalars and not more than 20 scalars, that is characteristic of the sample;
b. providing a vector space of between 4 and 20 dimensions occupied by a data cluster map comprising at least 6 equal-sized, non-overlapping data clusters, a multiplicity of which data clusters are associated with a disease diagnosis and a multiplicity of which data clusters are associated with a normal sample and no data cluster of said map is associated with more than one diagnosis;
c. calculating in which, if any, of the data clusters of the data cluster map the characteristic vector rests; and
d. assigning to the sample the disease diagnosis associated with the data cluster in which the characteristic vector rests or, if the vector rests in no cluster, assigning a classification of non-normal.
- 50. The method of claim 48, wherein the disease is a cancer.
- 51. The method of claim 49, wherein the disease is a cancer.
- 52. The method of claim 49, wherein the stage of the disease is a primary malignancy.
- 53. The method of claims 48 or 49, wherein the biological sample is selected from the group consisting of any bodily fluid such as serum, blood, saliva, plasma, nipple aspirates, synovial fluids, cerebrospinal fluids, sweat, urine, fecal matter, tears, bronchial lavage, swabbings, needle aspirates, semen, vaginal fluids, and pre-ejaculate.
- 54. The method of claims 48 or 49 wherein the data cluster map defines a pattern, wherein at least one scalar of the vector is a contextual diagnostic product.
- 55. The method of claims 48 or 49, wherein the size of the data cluster is defined by a Euclidean metric.
- 56. A method of diagnosing a primary malignancy of an organ of a subject which comprises:
a. analyzing a biological sample from the subject and calculating from the analysis a normalized vector, having at least 4 scalars, that is characteristic of the sample;
b. providing a vector space occupied by a data cluster map comprising at least 6 equal-sized, non-overlapping data clusters, a multiplicity of which data clusters are associated with a malignant diagnosis and a multiplicity of which data clusters are associated with a benign diagnosis and no data cluster of said map is associated with more than one diagnosis, wherein at least one scalar measures a product that is a contextual diagnostic product and wherein the size of the data cluster is defined by a Euclidean metric;
c. calculating in which, if any, of the data clusters of the data cluster map the characteristic vector rests; and
d. assigning to the sample the diagnosis associated with the data cluster in which the characteristic vector rests or, if the vector rests in no data cluster, assigning a diagnosis of non-normal, non-malignant.
- 57. The method of claim 56, wherein the biological sample is selected from the group consisting of any bodily fluid such as serum, blood, saliva, plasma, nipple aspirates, synovial fluids, cerebrospinal fluids, sweat, urine, fecal matter, tears, bronchial lavage, swabbings, needle aspirates, semen, vaginal fluids, and pre-ejaculate.
- 58. The method of claim 56, wherein a multiplicity of scalars measure products that are contextual diagnostic products.
- 59. A computer software product that specifies computer executable code to execute a program comprising the following steps:
a. inputting a normalized data stream that describes a biological sample with a sample identifier;
b. inputting a set of diagnostic clusters, each cluster associated with a diagnosis of a known biological state;
c. abstracting the data stream to calculate a sample vector that characterizes the data stream;
d. identifying the diagnostic cluster, if any, within which the sample vector falls;
e. assigning to the sample the diagnosis of the identified diagnostic cluster or, if no cluster is identified, assigning to the sample the diagnosis of non-normal, non-malignant; and
f. outputting the assigned diagnosis and the sample identifier.
- 60. A general purpose digital computer comprising a program to execute the executable code of claim 59.
- 61. A computer software product that specifies computer executable code to execute a program comprising the following steps:
a. inputting a set of instructional data streams, each data stream describing a biological sample with a known biological state;
b. inputting an operator specified number of points and an operator specified cluster size;
c. selecting an initial set of random logical chromosomes that specify the location of the pre-specified number of points of the data stream;
d. calculating a vector for each chromosome and for each data stream by abstracting the data stream at locations specified by the chromosome;
e. determining a fitness of each chromosome by finding the locations in the vector space of a multiplicity of non-overlapping data clusters of the pre-specified size that maximize the number of vectors that rest in clusters having a uniform status, wherein the larger the number of such vectors the higher the fitness;
f. optimizing the set of logical chromosomes by an iterative process comprising reiteration of steps (d) and (e), terminating logical chromosomes with low fitness, replicating logical chromosomes of high fitness, recombination and random modification of the chromosomes;
g. terminating the iterative process; and
h. outputting an optimized logical chromosome, and the locations of the data clusters that maximize the fitness of the optimized chromosome, so that a diagnostic algorithm that embodies the outputted logical chromosome and data clusters can be implemented.
- 62. A general purpose digital computer comprising a program to execute the executable code of claim 61.
- 63. A diagnostic model to determine a biological state of interest, wherein the diagnostic model is characterized by having multiple diagnostic clusters of predetermined equal size in a vector space of a fixed number of dimensions.
- 64. The diagnostic model of claim 63, wherein the diagnostic clusters are produced by the following steps:
a. providing a set of learning data streams, each data stream describing a biological sample with a known biological state;
b. selecting an initial set of random logical chromosomes that specify the location of a predetermined number of points of the data stream;
c. calculating a vector for each chromosome and for each data stream by abstracting the data stream at locations specified by the chromosome;
d. determining a fitness of each chromosome by finding the locations in the vector space of a multiplicity of non-overlapping data clusters of the predetermined, equal size that maximize the number of vectors that rest in a cluster having a uniform status, wherein the larger the number of such vectors the larger the fitness;
e. optimizing the set of logical chromosomes by an iterative process comprising reiteration of steps (c) and (d), terminating logical chromosomes with low fitness, replicating logical chromosomes of high fitness, recombination and random modification of the chromosomes; and
f. terminating the iterative process and selecting a logical chromosome that allows for a preferred set of non-overlapping data clusters.
- 65. The diagnostic clusters produced by the model of claim 64.
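
By way of non-limiting illustration, the following Python sketch shows one way the classification steps recited in claims 3 and 59, and the fitness determination recited in claims 45 and 61, might be carried out. Several details are assumptions that do not appear in the claims: min-max normalization, spherical clusters whose equal size is a radius under a Euclidean metric (compare claim 55), and a greedy placement of clusters used as a simple stand-in for the search that maximizes fitness. The function names `normalize`, `abstract`, `classify`, and `fitness` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Cluster = Tuple[Vector, str]  # (centroid, diagnosis); this layout is an assumption


def normalize(stream: Sequence[float]) -> List[float]:
    """Rescale a data stream to [0, 1]. Min-max scaling is an assumption."""
    lo, hi = min(stream), max(stream)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in stream]


def abstract(stream: Sequence[float], chromosome: Sequence[int]) -> Vector:
    """Build the sample vector from the stream locations named by the chromosome."""
    return tuple(stream[i] for i in chromosome)


def classify(vector: Vector, clusters: Sequence[Cluster], radius: float,
             fallback: str = "atypical sample, NOS") -> str:
    """Return the diagnosis of the cluster in which the vector rests, if any."""
    for centroid, diagnosis in clusters:
        if math.dist(vector, centroid) <= radius:
            return diagnosis
    return fallback  # the vector rests in no cluster


def fitness(vectors: Sequence[Vector], states: Sequence[str], radius: float) -> int:
    """Count learning vectors that rest in a cluster of uniform status.

    Assumption: clusters are placed greedily. A vector joins the nearest
    existing centroid within `radius`; otherwise it founds a new cluster.
    This does not search for the maximizing placement the claims describe.
    """
    centroids: List[Vector] = []
    members: List[List[str]] = []
    for vec, state in zip(vectors, states):
        near = [(math.dist(vec, c), k) for k, c in enumerate(centroids)
                if math.dist(vec, c) <= radius]
        if near:
            members[min(near)[1]].append(state)
        else:
            centroids.append(vec)
            members.append([state])
    # Only clusters whose members all share one known state contribute.
    return sum(len(m) for m in members if len(set(m)) == 1)
```

In such a sketch, an optimizing loop would call `fitness` once per logical chromosome on the vectors produced by `abstract`, retain the chromosomes with the higher counts, and pass the selected chromosome and its clusters to `classify` for unknown samples.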
Parent Case Info
[0001] This application claims benefit under 35 U.S.C. sec. 119(e)(1) of the priority of application Serial No. 60/232,909, filed Sep. 12, 2000, Serial No. 60/278,550, filed Mar. 23, 2001, Serial No. 60/219,067, filed Jul. 18, 2000, and U.S. Provisional Application titled “A Data Method Algorithm Reveals Disease with Protein Signal of Ovarian and Prostate Cancer in Serum” (Serial No. to be assigned), filed May 8, 2001, which is hereby incorporated by reference in its entirety.
Provisional Applications (3)
| Number | Date | Country |
| --- | --- | --- |
| 60232909 | Sep 2000 | US |
| 60278550 | Mar 2001 | US |
| 60219067 | Jul 2000 | US |