Claims
- 1. A method for classifying an unknown multilineage-affiliated gene, comprising:
(a) isolating a population of cells; (b) separating the population of cells into discrete cell sub-populations; (c) isolating expressed nucleic acid sequences from the discrete cell sub-populations and forming labeled nucleic acid probes from the expressed nucleic acid sequences; (d) hybridizing the labeled nucleic acid probes with a nucleic acid sequence library on an array, wherein identity and intensity of expression of the expressed nucleic acid sequences are identified to provide gene expression hybridization data; (e) converting the gene expression hybridization data into a graphical representation, whereby change in gene expression between the discrete cell sub-populations is profiled; (f) isolating a multilineage-affiliated gene of unknown identity; (g) determining the unknown gene's expression intensity in each of the discrete sub-populations; and, (h) comparing the unknown gene's expression pattern with known gene expression patterns in the graphical representation to associate the unknown gene with a group of known genes.
- 2. The method of claim 1, wherein the discrete sub-populations are selected from the group consisting of HSC, MPP, CMP, and CLP.
- 3. The method of claim 1, wherein the discrete sub-populations comprise HSC, transitional, and differentiated cells, wherein the differentiated cells are selected from the group consisting of adult, embryonic, neonatal, fetal liver, bone marrow, hematopoietic, splenic, and lymphoid stem cells.
- 4. The method of claim 1, wherein the discrete cell sub-populations are separated using cell surface markers.
- 5. The method of claim 1, wherein the expressed nucleic acid sequences comprise non-hematopoietic and hematopoietic genes.
- 6. The method of claim 1, wherein the expressed nucleic acid sequences are selected from the group consisting of RNA, DNA, and EST nucleic acid sequences.
- 7. The method of claim 2, wherein the HSC sub-population is identified by expression of genes selected from the group consisting of SEQ ID NOs. 3428-4863.
- 8. The method of claim 2, wherein the MPP sub-population is identified by expression of genes selected from the group consisting of SEQ ID NOs. 2076-3427.
- 9. The method of claim 2, wherein the CMP sub-population is identified by expression of genes selected from the group consisting of SEQ ID NOs. 822-2075.
- 10. The method of claim 2, wherein the CLP sub-population is identified by expression of genes selected from the group consisting of SEQ ID NOs. 1-821.
- 11. The method of claim 1, wherein the array comprises a substrate and nucleic acid sequences affixed to the substrate, wherein the nucleic acid sequences are selected from the group consisting of SEQ ID NOs. 1-4863.
- 12. The method of claim 1, wherein the gene expression hybridization data is converted to expression level data, whereby the expression level data are determined by comparing hybridization signals for perfect matches and mismatches of the nucleic acid sequences on the array to provide expression level data for each of the nucleic acid sequences.
- 13. The method of claim 12, wherein the expression level data is normalized to provide normalized expression data.
- 14. The method of claim 13, wherein the normalized expression data is filtered using a filter equation given by $|y_{j(m)} - y_{j(1)}| > 100$ and $y_{j(m)}/y_{j(1)} > 2$ for $j = 1, \ldots, n$, where $y_{j(1)}$ and $y_{j(m)}$ are the order statistics with $y_{j(1)} \le \ldots \le y_{j(m)}$ for the jth gene, whereby this filtering criterion considers simultaneously the absolute difference (>100) of the gene expression levels and the fold change (>2-fold) of the expression levels for each gene to produce filtered gene expression data.
- 15. The method of claim 12, wherein the expression level data for the populations is statistically treated using similarity distance measurements to determine similarity of expressed genes in each population.
- 16. The method of claim 15, wherein Pearson correlation coefficient is used to compute gene expression intensity and diversity between the discrete sub-populations to establish a statistical measure of expression between sub-populations.
- 17. The method of claim 14, wherein the filtered expression data is organized using hierarchical clustering.
- 18. The method of claim 17, wherein the hierarchical clustering is achieved using K-means clustering.
- 19. The method of claim 1, wherein the graphical representation is made using Eisen software.
- 20. The method of claim 1, wherein the multilineage-affiliated genes are selected from the group consisting of hematopoietic and non-hematopoietic nucleic acid sequences.
- 21. The method of claim 18, wherein the K-means clustering method derives representative clusters 1-8.
- 22. The method of claim 18, wherein the K-means clustering method derives cumulative clusters 1-100.
- 23. The method of claim 14, wherein the filtered data is selected from SEQ ID NOs. 1-4863.
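The filtering criterion recited in claim 14 can be illustrated with a minimal Python sketch. It assumes each gene has one normalized expression level per sub-population, takes the smallest and largest of those levels as the extreme order statistics, and keeps the gene only when both the absolute difference and the fold change exceed their thresholds. The function names, the example gene names, and the example values are hypothetical, and rejecting genes whose minimum level is not positive is an assumption made here because the ratio is undefined in that case.

```python
from typing import Dict, List


def passes_filter(levels: List[float],
                  min_abs_diff: float = 100.0,
                  min_fold: float = 2.0) -> bool:
    """Return True when max - min > min_abs_diff and max / min > min_fold."""
    lowest, highest = min(levels), max(levels)
    if highest - lowest <= min_abs_diff:
        return False
    # Assumption: a non-positive minimum makes the fold change undefined,
    # so such genes are rejected in this sketch.
    if lowest <= 0:
        return False
    return highest / lowest > min_fold


def filter_genes(expr: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only the genes whose expression levels pass both tests."""
    return {gene: levels for gene, levels in expr.items()
            if passes_filter(levels)}


# Hypothetical levels, one value per sub-population (e.g. HSC, MPP, CMP, CLP).
example = {
    "geneA": [120.0, 400.0, 150.0, 90.0],   # large difference and fold change
    "geneB": [500.0, 560.0, 540.0, 530.0],  # difference too small
    "geneC": [30.0, 110.0, 60.0, 45.0],     # fold change ok, difference too small
    "geneD": [300.0, 450.0, 320.0, 310.0],  # difference ok, fold change too small
}

print(sorted(filter_genes(example)))
```

Applying both tests together removes genes that vary by a large ratio only at very low signal, as well as genes that vary by a large absolute amount only at very high signal.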
- 24. A method for characterizing an unknown multilineage-affiliated gene comprising:
(a) profiling multilineage-affiliated gene expression in discrete cell sub-populations to provide expression data for selected genes in at least two discrete cell populations; (b) isolating an unknown multilineage-affiliated gene; (c) determining the unknown gene's expression characteristics for each of the sub-populations; and, (d) comparing the unknown gene's expression data with the expression data.
- 25. A method for classifying an unknown multilineage-affiliated gene, comprising:
(a) isolating an unknown multilineage-affiliated gene; (b) determining the unknown gene's expression characteristics in at least two discrete cell sub-populations; and, (c) comparing the unknown gene's expression characteristics with profiled gene expression data for the cell sub-populations.
- 26. A method for determining cell stage commitment by comparing nucleic acid expression patterns, comprising:
(a) isolating a population of cells; (b) separating the population of cells into discrete cell sub-populations; (c) isolating expressed nucleic acid sequences from the discrete cell sub-populations and forming labeled nucleic acid probes from the expressed nucleic acid sequences; (d) hybridizing the labeled nucleic acid probes with a nucleic acid sequence library on an array, wherein identity and intensity of expression of the expressed nucleic acid sequences are identified to provide gene expression hybridization data; (e) converting the gene expression hybridization data into a graphical representation, whereby change in gene expression between the discrete cell sub-populations is profiled; (f) isolating a cell of unknown commitment; (g) obtaining gene hybridization data for the unknown cell; (h) organizing the gene hybridization data, whereby identity of expressed genes is determined along with expression level of the identified genes; and, (i) comparing the unknown cell's hybridization data with the gene expression data to determine the cell's commitment.
- 27. The method of claim 26, wherein the graphical representation is a gene cluster expression map.
- 28. The method of claim 26, wherein the cell sub-populations are selected from the group consisting of CMP, CLP, HSC, and MPP.
- 29. The method of claim 26, wherein the gene expression hybridization data is normalized to provide expression data, whereby expression of each nucleic acid sequence is standardized relative to all the nucleic acid sequences.
- 30. The method of claim 26, wherein the normalized expression data is filtered to group genes having similar expression levels.
- 31. The method of claim 26, wherein the gene expression hybridization data is converted to an expression level which is a comparison of hybridization signals for perfect matches and mismatches to provide average expression levels for the genes.
- 32. The method of claim 26, wherein the gene expression data for the populations is statistically treated using similarity distance measurements to determine similarity of expressed genes in each population.
- 33. The method of claim 26, wherein Pearson correlation coefficient is used to plot gene expression intensity and diversity between the discrete sub-populations to establish a statistical measure of expression between sub-populations.
- 34. The method of claim 26, wherein the normalized gene expression data is organized using hierarchical clustering.
- 35. The method of claim 26, wherein the hierarchical clustering is achieved using K-means clustering.
- 36. The method of claim 26, wherein the graphical representation is made using Eisen software.
- 37. The method of claim 26, wherein the discrete sub-populations comprise HSC, transitional, and differentiated cells, wherein the differentiated cells are selected from the group consisting of adult, embryonic, neonatal, fetal liver, bone marrow, hematopoietic, splenic, and lymphoid stem cells.
- 38. A method for predicting cell stage commitment comprising:
(a) profiling multilineage-affiliated gene expression in discrete cell sub-populations to provide reference expression data for selected genes in at least two cell populations; (b) identifying a cell of unknown commitment; (c) determining gene expression level patterns for the unknown cell to provide gene identity and expression level data; and, (d) comparing the unknown cell's gene expression level data with the reference expression data.
- 39. A method for predicting potential of an unknown gene comprising:
(a) isolating an unknown cell; (b) determining the unknown cell's expression characteristics in at least two cell sub-populations; and, (c) comparing the unknown cell's expression characteristics with profiled gene expression data for the cell sub-populations.
- 40. A method for developing a gene expression map, comprising:
(a) isolating at least two sub-populations of cells; (b) obtaining gene hybridization data, including gene identity data and gene expression intensity data, wherein the genes are multilineage-affiliated genes; and, (c) converting the gene hybridization expression data to a graphical illustration.
- 41. The method of claim 40, wherein the graphical illustration is derived using hierarchical clustering.
- 42. The method of claim 40, wherein the graphical representation is made using Eisen software.
- 43. The method of claim 40, wherein the discrete sub-populations are selected from the group consisting of HSC, MPP, CMP, and CLP.
- 44. The method of claim 40, wherein the discrete sub-populations comprise HSC, transitional, and differentiated cells, wherein the differentiated cells are selected from the group consisting of adult, embryonic, neonatal, fetal liver, bone marrow, hematopoietic, splenic, and lymphoid stem cells.
- 45. The method of claim 40, wherein the discrete cell sub-populations are separated using cell surface markers.
- 46. The method of claim 40, wherein the gene expression hybridization data is converted to expression level data, whereby the expression level data are determined by comparing hybridization signals for perfect matches and mismatches of the nucleic acid sequences on the array to provide average expression levels for each of the nucleic acid sequences.
- 47. The method of claim 46, wherein the expression level data is normalized to provide expression data.
- 48. The method of claim 47, wherein the normalized expression data is filtered to group genes having similar expression levels.
- 49. The method of claim 46, wherein the normalized expression data is filtered using a filter equation given by $|y_{j(m)} - y_{j(1)}| > 100$ and $y_{j(m)}/y_{j(1)} > 2$ for $j = 1, \ldots, n$, where $y_{j(1)}$ and $y_{j(m)}$ are the order statistics with $y_{j(1)} \le \ldots \le y_{j(m)}$ for the jth gene, whereby this filtering criterion considers simultaneously the absolute difference (>100) of the gene expression levels and the fold change (>2-fold) of the expression levels for each gene to produce filtered expression data.
- 50. The method of claim 46, wherein the expression level data for the populations is statistically treated using similarity distance measurements to determine similarity of expressed genes in each population.
- 51. The method of claim 50, wherein Pearson correlation coefficient is used to compute gene expression intensity and diversity between the discrete sub-populations to establish a statistical measure of expression between sub-populations.
- 52. The method of claim 48, wherein the filtered expression data is organized using hierarchical clustering.
- 53. The method of claim 52, wherein the hierarchical clustering is achieved using K-means clustering.
- 54. The method of claim 40, wherein the multilineage-affiliated genes are selected from the group consisting of hematopoietic and non-hematopoietic nucleic acid sequences.
- 55. A method for forming a hierarchical gene clustering map comprising:
(a) isolating at least two sub-populations of cells; (b) obtaining gene hybridization data, including gene identity data and gene expression intensity data, wherein the genes are multilineage-affiliated genes; (c) normalizing the gene hybridization expression data to provide expression data; (d) filtering the normalized expression data to group genes having similar expression levels; (e) organizing the normalized expression data using hierarchical clustering; and, (f) converting the hierarchical clustering data to a graphical illustration.
- 56. The method of claim 55, wherein the map has an axis with standard deviation values derived from the normalized gene expression values and an axis with at least two cell populations.
- 57. The method of claim 55, wherein the gene expression between sub-populations is represented by a standard deviation in expression between genes in one sub-population versus another, whereby up-regulation and down-regulation are represented.
- 58. The method of claim 55, wherein the standard deviation values range between −1.5 and +1.5, whereby −1.5 represents down-regulation and +1.5 represents up-regulation.
- 59. The method of claim 55, wherein filtered expression data selects nucleic acid sequences selected from the group consisting of SEQ ID NOs. 1-4863.
- 60. The method of claim 55, wherein the clusters are selected from the group consisting of representative clusters 1-8.
- 61. The method of claim 55, wherein the clusters are selected from the group consisting of cumulative clusters 1-100.
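The K-means clustering step named in the preceding method claims can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the claims do not specify the distance measure, the initialization, or the stopping rule, so squared Euclidean distance, initialization from the first k profiles, and stopping when assignments no longer change are all choices made here. The function names and the example profiles are hypothetical.

```python
from typing import List, Tuple

Profile = List[float]


def sq_dist(a: Profile, b: Profile) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Profile], k: int,
           max_iters: int = 100) -> Tuple[List[int], List[Profile]]:
    """Assign each profile to one of k clusters (Lloyd's algorithm)."""
    # Assumption: deterministic initialization from the first k profiles.
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: nearest centroid for every profile.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Hypothetical normalized profiles, one value per sub-population.
profiles = [
    [1.5, -0.5, -0.5, -0.5],
    [-0.5, -0.5, -0.5, 1.5],
    [1.4, -0.3, -0.5, -0.6],
    [-0.6, -0.4, -0.4, 1.4],
]

labels, centroids = kmeans(profiles, k=2)
print(labels)
```

In practice the number of clusters would be chosen to match the analysis at hand, for example 8 or 100 as recited for the representative and cumulative clusters.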
- 62. An array comprising a plurality of nucleic acid sequences affixed to a substrate, wherein the nucleic acid sequences are selected from the group consisting of representative clusters 1-8 and cumulative clusters 1-100.
- 63. An array comprising a plurality of nucleic acid sequences affixed to a substrate, wherein the nucleic acid sequences are selected from the group consisting of SEQ ID NOs. 1-4863.
- 64. An array comprising a plurality of nucleic acid sequences affixed to a substrate, wherein the nucleic acid sequences are selected from the group consisting of SEQ ID NOs. 3428-4863.
- 65. An array comprising a plurality of nucleic acid sequences affixed to a substrate, wherein the nucleic acid sequences are selected from the group consisting of SEQ ID NOs. 1-821.
- 66. An array comprising a plurality of nucleic acid sequences affixed to a substrate, wherein the nucleic acid sequences are selected from the group consisting of SEQ ID NOs. 2076-3427.
- 67. An array comprising a plurality of nucleic acid sequences affixed to a substrate, wherein the nucleic acid sequences are selected from the group consisting of SEQ ID NOs. 822-2075.
- 68. A kit for characterizing a gene of unknown function by associating it with genes of HSC, MPP, CMP, and CLP reference cell sub-populations comprising:
(a) a container; (b) at least one nucleic acid sequence array selected from the arrays of claims 62-67; and, (c) an activated label.
- 69. The kit of claim 68, wherein the solid phase matrix is selected from the group consisting of glass, silicon, plastic, and semi-conductor material.
- 70. A group of nucleic acid sequences for use in determining cell commitment comprising SEQ ID NOs. 1-4863.
- 71. A group of nucleic acid sequences representative of HSC, comprising SEQ ID NOs. 3428-4863.
- 72. A group of nucleic acid sequences representative of CLP, comprising SEQ ID NOs. 1-821.
- 73. A group of nucleic acid sequences representative of MPP, comprising SEQ ID NOs. 2076-3427.
- 74. A group of nucleic acid sequences representative of CMP, comprising SEQ ID NOs. 822-2075.
- 75. A gene cluster for use in analyzing cell differentiation selected from the group consisting of cumulative gene clusters 1-100.
- 76. The gene cluster of claim 75, wherein nucleic acid sequences of SEQ ID NOs. 1-4863 form the cumulative gene clusters.
- 77. A gene cluster selected from the group consisting of representative gene clusters 1-8.
- 78. A population of non-hematopoiesis-affiliated genes, which comprise genes listed in FIG. 3.
- 79. A population of genes upregulated in CMP, comprising genes listed in FIG. 4D.
- 80. A population of genes upregulated in HSC, comprising genes listed in FIG. 4A.
- 81. A population of genes upregulated in CLP, comprising genes listed in FIG. 4C.
- 82. A population of genes upregulated in MPP, comprising genes listed in FIG. 4B.
- 83. A gene cluster map for use in analysis of multilineage-affiliated genes, comprising:
(a) an axis related to at least two cell populations; (b) an axis comprising normalized gene expression values; and, (c) a plot of genes clustered according to K-means clustering.
- 84. The gene cluster map of claim 83, wherein the normalized gene expression values are computed as standard deviations of a normalized value of gene expression, where the computation is the expression level of a gene minus the mean of the expression levels of this gene, divided by the standard deviation of the expression levels.
- 85. The gene cluster map of claim 83, wherein normalized gene expression values are determined by a comparison between hybridization signals of a perfect match (PM) and a mismatch (MM).
- 86. The gene cluster map of claim 83, wherein the standard deviation ranges between −1.5 and 1.5.
- 87. The gene cluster map of claim 83, wherein the normalized gene expression values are screened for clustering analysis by filtering.
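The normalized values plotted on the gene cluster map can be illustrated with a short Python sketch. It reads claim 84 as a per-gene standard score, that is, each expression level minus the gene's mean across sub-populations, divided by the gene's standard deviation; this reading, the use of the sample standard deviation, and the example values are assumptions.

```python
from statistics import mean, stdev
from typing import List


def z_scores(levels: List[float]) -> List[float]:
    """Standardize one gene's expression levels across sub-populations.

    Requires at least two levels, because the sample standard deviation
    is undefined for a single value.
    """
    m = mean(levels)
    s = stdev(levels)  # assumption: sample standard deviation (n - 1)
    if s == 0:
        # A gene with identical levels everywhere carries no pattern.
        return [0.0] * len(levels)
    return [(x - m) / s for x in levels]


# Hypothetical gene expressed almost only in the first sub-population.
populations = ["HSC", "MPP", "CMP", "CLP"]
levels = [1000.0, 100.0, 100.0, 100.0]

row = z_scores(levels)
for name, z in zip(populations, row):
    print(f"{name}: {z:+.2f}")
```

With four sub-populations and the sample standard deviation, a standard score cannot exceed (n − 1)/√n = 1.5 in magnitude, which is consistent with the −1.5 to 1.5 range recited in claim 86. The example above reaches that bound because the gene is elevated in exactly one sub-population.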
- 88. A method for gene expression profiling, comprising:
(a) isolating a population of cells; (b) separating the population of cells into discrete cell sub-populations; (c) isolating expressed nucleic acid sequences from the discrete cell sub-populations and forming labeled nucleic acid probes from the expressed nucleic acid sequences; (d) hybridizing the labeled nucleic acid probes with a nucleic acid sequence library on an array, wherein identity and intensity of expression of the expressed nucleic acid sequences are identified to provide gene expression hybridization data; and, (e) converting the gene expression hybridization data into a graphical representation, whereby change in gene expression between the discrete cell sub-populations is profiled.
- 89. A computer system comprising:
(a) a storage device having stored therein a gene data expression routine for compiling gene expression signal data for one or more signals associated with hybridization of labeled nucleic acid probes with a nucleic acid sequence library on an array for an available set of expressed genes; (b) a processor coupled to the storage device for executing the gene data expression routine to determine the intensity of gene expression for each of a plurality of discrete sub-populations comprising the steps of:
(i) collecting identity and expression intensity information to construct gene expression patterns; (ii) mapping gene expression patterns to form gene cluster expression maps for each of the discrete cell sub-populations; (iii) determining an unknown gene's expression intensity in each of the discrete sub-populations to determine the expression level of the gene in each of the discrete sub-populations; (iv) comparing the unknown gene's expression pattern with the gene cluster expression maps; and (v) associating the unknown gene with one of the gene clusters to classify the unknown gene.
- 90. A method for determining cell stage commitment, comprising:
(a) isolating a population of cells; (b) separating the population of cells into discrete cell sub-populations; (c) isolating expressed nucleic acid sequences from the discrete cell sub-populations and forming labeled nucleic acid probes from the expressed nucleic acid sequences; (d) hybridizing the labeled nucleic acid probes with a nucleic acid sequence library on an array, wherein identity and intensity of expression of the expressed nucleic acid sequences are identified to provide gene expression hybridization data; (e) converting the gene expression hybridization data into normalized expression data, whereby change in gene expression between the discrete cell sub-populations is profiled; (f) isolating a multilineage-affiliated gene of unknown identity; (g) determining the unknown gene's expression intensity in each of the discrete sub-populations; and, (h) comparing the unknown gene's expression pattern with known gene expression patterns in the graphical representation to associate the unknown gene with a group of known genes.
- 91. A method for classifying an unknown multilineage-affiliated gene, comprising:
(a) isolating a population of cells; (b) separating the population of cells into discrete cell sub-populations; (c) isolating expressed nucleic acid sequences from the discrete cell sub-populations and forming labeled nucleic acid probes from the expressed nucleic acid sequences; (d) hybridizing the labeled nucleic acid probes with a nucleic acid sequence library on an array, wherein identity and intensity of expression of the expressed nucleic acid sequences are identified to provide gene expression hybridization data; (e) converting the gene expression hybridization data into normalized expression data, whereby change in gene expression between the discrete cell sub-populations is profiled; (f) isolating a multilineage-affiliated gene of unknown identity; (g) determining the unknown gene's expression intensity in each of the discrete sub-populations; and, (h) comparing the unknown gene's expression pattern with known gene expression patterns in the normalized expression data to associate the unknown gene with a group of known genes.
- 92. A method for developing a gene expression map, comprising:
(a) isolating at least two sub-populations of cells; (b) obtaining gene hybridization data, including gene identity data and gene expression intensity data, wherein the genes are multilineage-affiliated genes; and, (c) converting the gene hybridization expression data to normalized gene expression data.
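The comparison step shared by the classification claims, in which an unknown gene's expression pattern is compared with known patterns to associate it with a group of known genes, can be sketched as follows. This is a minimal illustration, assuming each known group is summarized by one mean profile across the sub-populations and that the Pearson correlation coefficient is the similarity measure. The cluster names, profiles, and expression values are hypothetical and do not come from the sequence listing.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile has no defined correlation
    return cov / sqrt(vx * vy)


def classify(unknown: List[float],
             cluster_profiles: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the best-correlated cluster and its correlation."""
    scores = {name: pearson(unknown, profile)
              for name, profile in cluster_profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical cluster mean profiles, ordered HSC, MPP, CMP, CLP.
cluster_profiles = {
    "HSC-high": [1.5, -0.5, -0.5, -0.5],
    "CLP-high": [-0.5, -0.5, -0.5, 1.5],
    "progressive-down": [1.2, 0.3, -0.6, -0.9],
}

# Hypothetical expression levels measured for the unknown gene.
unknown_gene = [820.0, 140.0, 110.0, 95.0]

best, r = classify(unknown_gene, cluster_profiles)
print(best, round(r, 3))
```

Because the Pearson coefficient is unchanged by shifting or rescaling either profile, the unknown gene's raw expression levels can be compared directly against normalized cluster profiles in this sketch.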
Parent Case Info
[0001] This application is a non-provisional patent application based on U.S. Provisional Patent Application Serial No. 60/377,383, filed May 3, 2002.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60377383 | May 2002 | US |