Multiple sclerosis (MS) is a common disabling neurologic disease of young adults (1). Most patients with MS initially present with a clinically isolated syndrome (CIS) due to an inflammatory demyelinating insult in the central nervous system (CNS). Approximately one third of CIS patients will progress to clinically definite MS (CDMS) within 1 year after diagnosis and about half will do so after 2 years (2, 3). Although MRI assessment is routinely used to monitor and forecast conversion into MS, its specificity remains moderate (3). It is estimated that about 10% of CIS patients will remain free of further demyelinating attacks and neurological complications even in the presence of radiological evidence of white matter lesions (4). Although structural neuro-imaging studies are invaluable in the diagnosis and clinical surveillance of MS (3, 5), there is currently no biological marker that accurately predicts MS conversion in CIS patients. Individualized early prognosis and prediction of CDMS would be of substantial value because patients at high risk for rapid progression could be offered disease-modifying therapy, an approach shown to be beneficial in early MS (2).
The present invention meets these and other needs in the art.
The present invention provides methods and kits for identifying clinically isolated syndrome (CIS) patients at high risk of developing multiple sclerosis (MS).
In one aspect, a method is provided for identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS). The method includes detecting the level of expression of a marker gene within the patient. The marker gene is a marker gene set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or the marker gene includes a nucleic acid sequence of at least 10 nucleotides in length and at least 90% (e.g., 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021. The level of expression of the marker gene is then compared to a standard control whereby a differential expression of the marker gene relative to the standard control indicates that the patient is at high risk of developing multiple sclerosis.
In another aspect, a method is provided for identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS). The method includes detecting the level of expression of a plurality (e.g. a panel or group) of marker genes within the patient. The plurality of marker genes are all or a portion of marker genes listed in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all or a portion of marker gene sequences listed in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13. The marker gene sequences listed in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13 are referenced as SEQ ID numbers. In some embodiments, the plurality of marker genes are all or a portion of marker genes listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all or a portion of marker gene sequences listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13. In other embodiments, the plurality of marker genes are all marker genes listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all marker gene sequences listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13. The level of expression of the marker genes is then compared to a standard control, whereby a differential expression of the marker genes relative to the standard control indicates that the patient is at high risk of developing multiple sclerosis.
In another aspect, a kit is provided for use in identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS). The kit includes (i) a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 or 20 nucleotide continuous region with one or more nucleic acids within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13, (ii) a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 or 20 nucleotide continuous region with a target sequence to which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate, or (iii) a nucleic acid complementary to the nucleic acids set forth in (i) or (ii) above. In some embodiments, the kit also includes an electronic device or computer software capable of comparing a marker gene expression level from the patient to a standard control, thereby indicating whether the patient is at high risk of developing multiple sclerosis.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Generally, the nomenclature used herein and the laboratory procedures in cell culture, molecular genetics, and nucleic acid chemistry and hybridization described below are those well known and commonly employed in the art.
As used herein, “nucleic acid” means either DNA, RNA, single-stranded, double-stranded, or more highly aggregated hybridization motifs, and any chemical modifications thereof. Modifications include, but are not limited to, those which provide other chemical groups that incorporate additional charge, polarizability, hydrogen bonding, electrostatic interaction, and functionality to the nucleic acid ligand bases or to the nucleic acid ligand as a whole. Such modifications include, but are not limited to, peptide nucleic acids, phosphodiester group modifications (e.g., phosphorothioates, methylphosphonates), 2′-position sugar modifications, 5-position pyrimidine modifications, 8-position purine modifications, modifications at exocyclic amines, substitution of 4-thiouridine, substitution of 5-bromo or 5-iodo-uracil; backbone modifications, methylations, unusual base-pairing combinations such as the isobases isocytidine and isoguanidine and the like. Modifications can also include 3′ and 5′ modifications such as capping.
The term “nucleic acid” or “polynucleotide” refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, SNPs (haplotypes), and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Cassol et al. (1992); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is used interchangeably with gene, cDNA, and mRNA encoded by a gene.
The phrase “selectively (or specifically) hybridizes to” refers to the detectable binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent hybridization conditions when that sequence is present in a complex mixture (e.g., total cellular or library DNA or RNA).
The terms “identical” or percent “identity,” in the context of two or more nucleic acids, refer to two or more sequences or subsequences that are the same or have a specified percentage of nucleotides that are the same (i.e., about 60% identity, preferably 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher identity over a specified region, when compared and aligned for maximum correspondence over a comparison window or designated region) as measured using the BLAST or BLAST 2.0 sequence comparison algorithms with default parameters described below, or by manual alignment and visual inspection (see, e.g., the NCBI web site or the like). Such sequences are then said to be “substantially identical.” This definition also refers to, or may be applied to, the complement of a test sequence. The definition also includes sequences that have deletions and/or additions, as well as those that have substitutions. As described below, the preferred algorithms can account for gaps and the like. Preferably, identity exists over a region that is at least about 10, 11, 12, 13, 14, 15, 20, 25 amino acids or nucleotides in length, or over a region in the range 10-20, 10-25, 10-30, 10-40, 10-50, 10-60, 10-70, 10-80, 10-90, or even 10-100. In certain preferred embodiments, identity exists over a region that is 10-100 amino acids or nucleotides in length.
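By way of illustration only, percent identity over a contiguous region may be sketched as follows. This is a simplified, ungapped comparison and is not the BLAST or BLAST 2.0 algorithm referenced above; the function names and the example sequences are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_ungapped_identity(query: str, reference: str) -> float:
    """Slide the query along the reference (no gaps) and return the
    highest percent identity found over any contiguous window."""
    n = len(query)
    if n > len(reference):
        raise ValueError("query must not be longer than the reference")
    return max(
        percent_identity(query, reference[i:i + n])
        for i in range(len(reference) - n + 1)
    )


# Hypothetical sequences, for illustration only.
reference = "TTTACGTACGTACGGG"
exact_query = "ACGTACGTAC"      # 10 nucleotides, present in the reference
one_mismatch = "ACGTACGTAA"     # same 10-mer with the last base changed

exact_score = best_ungapped_identity(exact_query, reference)
mismatch_score = best_ungapped_identity(one_mismatch, reference)
meets_90_percent = mismatch_score >= 90.0
```

Under these assumptions, a 10-nucleotide query with a single mismatch still satisfies an "at least 90% identity over a 10 nucleotide contiguous region" criterion. A gapped alignment tool would be needed to account for deletions and additions.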
The phrase “stringent hybridization conditions” refers to conditions under which a first nucleic acid will hybridize to its target subsequence, typically in a complex mixture of nucleic acid, but not detectably to other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Probes, “Overview of principles of hybridization and the strategy of nucleic acid assays” (1993). Generally, stringent conditions are selected to be about 5-10° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal is at least two times background, optionally 10 times background hybridization. Exemplary stringent hybridization conditions can be as follows: 50% formamide, 5×SSC, and 1% SDS, incubating at 42° C., or 5×SSC, 1% SDS, incubating at 65° C., with wash in 0.2×SSC, and 0.1% SDS at 65° C. Such washes can be performed for 5, 15, 30, 60, 120, or more minutes.
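As a minimal sketch of two of the numerical criteria stated above, the following illustrates (a) a temperature window about 5-10° C. below a given Tm and (b) the test that a positive signal is at least two times (optionally 10 times) background. The function names and the numerical inputs are hypothetical; the Tm itself is assumed to have been determined separately.

```python
def stringent_temperature_range(tm_celsius: float) -> tuple:
    """Return (low, high) temperatures about 10 to 5 degrees C below Tm."""
    return (tm_celsius - 10.0, tm_celsius - 5.0)


def is_positive_signal(signal: float, background: float, fold: float = 2.0) -> bool:
    """A hybridization signal is positive when it is at least `fold`
    times the background (2x by default, optionally 10x)."""
    if background <= 0:
        raise ValueError("background must be positive")
    return signal >= fold * background


# Hypothetical readings, for illustration only.
window = stringent_temperature_range(65.0)
positive_2x = is_positive_signal(250.0, 100.0)
positive_10x = is_positive_signal(250.0, 100.0, fold=10.0)
```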
Exemplary “moderately stringent hybridization conditions” include a hybridization in a buffer of 40% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 1×SSC at 45° C. Such washes can be performed for 5, 15, 30, 60, 120, or more minutes. A positive hybridization is at least twice background. Those of ordinary skill will readily recognize that alternative hybridization and wash conditions can be utilized to provide conditions of similar stringency.
The terms “differential expression” or “differentially expressed” used in reference to the expression of a marker gene means an elevated level of expression of the marker gene or a lowered level of expression of the marker gene relative to a standard control that is indicative of a high risk CIS patient, as set forth in the methods and results disclosed herein (e.g. Tables 1, 2, 10A-12C, and 15A-17C). As is customary in the art, marker genes described herein each have an associated name (e.g., C17orf65, C4orf10, FAM98A, and the like). Accordingly, reference to a marker gene name in turn refers to the marker gene itself. “Target sequence” refers to a region within a target gene (e.g., marker gene) which a probe will identify, as known in the art. The term “probe set identifier” refers to a set of nucleic acid probes capable of identifying a particular marker gene (e.g. target sequence). Probe set identifiers may be provided by Affymetrix (Santa Clara, Calif.), for example, as known in the art and disclosed herein. It is understood that one of skill in the art can, with only routine experimentation, design and use probes to identify specific marker genes as described herein. It is further understood that more than one probe, and more than one probe set identifier, may be designed to identify a specific gene, for example a marker gene described herein.
Provided herein are methods of determining whether a patient with clinically isolated syndrome (CIS) is at high risk of developing multiple sclerosis (MS). CIS patients at high risk of developing MS are typically those patients that develop MS within two years of being initially diagnosed with CIS or within two years of the onset of CIS. In some embodiments, high risk CIS patients are those that develop MS within 18 months, 12 months, or 9 months of being initially diagnosed with CIS or the onset of CIS. Thus, the methods provided herein are useful in identifying CIS patients that are likely to develop MS quickly relative to the average CIS patient.
It has been discovered that certain genes are markers of rapid development of MS in CIS patients. These marker genes were identified as genes that are differentially expressed relative to healthy individuals and/or CIS patients that do not develop MS quickly (i.e. those that are at low risk of rapid onset of MS). Thus, by detecting the level of expression of a marker gene within a CIS patient and comparing the level of expression of the marker gene to a standard control, high risk CIS patients may be identified. In some embodiments, the levels of expression of a plurality (e.g. a panel) of marker genes are detected and compared to a standard control to identify high risk CIS patients. Specific panels or groups of marker genes are discussed below. In some embodiments, the standard control may be approximately the average amount of expression of the marker gene(s) in humans, humans without CIS, or humans with CIS that are not at high risk of developing MS. In other embodiments, the standard control is a detected level of expression of a standard control gene in the CIS patient.
In some embodiments, the marker gene is a gene set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13. In some embodiments, the marker gene is any one of the genes set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13. In some embodiments, a plurality (e.g. a panel or group) of marker genes are detected. Thus, in some embodiments all the marker genes set forth in one of the following tables are selected: Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 or Table 13. In another embodiment, at least 2-9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, 600, 700, or 800 of the marker genes set forth in one of the following tables are selected, as appropriate according to the number of genes set forth within the following tables: Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 or Table 13. Where a plurality (e.g. panel or group) of genes are detected, any combination of the marker genes disclosed in the relevant table(s) may be detected. By measuring the expression levels of one or more of these marker genes in a patient with CIS and comparing those levels with healthy individuals and/or CIS patients that do not develop MS quickly, the risk of the patient developing MS may be assessed. In some related embodiments, the marker gene is ZNF12, C17orf65, BAT1, ARHGDIA, NAPA, ATP5G2, DDX52, NDFIP1, SDAD1, USP7, MEF2A, AGER, RAB1B, GDI1 and/or BANF1. In other related embodiments, the marker gene is ZNF12, C17orf65, BAT1, ARHGDIA, NAPA, ATP5G2, DDX52, NDFIP1 and/or SDAD1. In still other related embodiments, the marker gene is USP7, MEF2A, AGER, RAB1B, GDI1 and/or BANF1. In other embodiments, the marker gene is TOB1. In other embodiments, the marker gene is not TOB1. In some embodiments, the marker gene is C17orf65, C4orf10, FAM98A, TLE1, INHBC, NAPA, TKT, TPT1, FLJ20054, KIAA0794, LOC134492, and/or MGC34648.
In some embodiments, the marker gene is any one of C17orf65, C4orf10, FAM98A, TLE1, INHBC, NAPA, TKT, TPT1, FLJ20054, KIAA0794, LOC134492 or MGC34648. In some embodiments, the marker gene is included within a plurality of genes selected from C17orf65, C4orf10, FAM98A, TLE1, INHBC, NAPA, TKT, TPT1, FLJ20054, KIAA0794, LOC134492 and MGC34648. In some embodiments, the marker gene is CD1D, CD44, CDC34, CDKN1C, CD47, GZMM, and/or PPIA. In some embodiments, the marker gene is any one of CD1D, CD44, CDC34, CDKN1C, CD47, GZMM, or PPIA. In some embodiments, the marker gene is included within a plurality of genes selected from CD1D, CD44, CDC34, CDKN1C, CD47, GZMM, and PPIA.
In certain embodiments, the method described herein for detecting the level of expression of a marker gene is an in vitro method. In some embodiments, the marker gene is a gene set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13, and detection is conducted in vitro (e.g. on a biological sample derived from a CIS patient).
The expression levels of the marker genes may be measured using any appropriate method. In some embodiments, the amount of RNA expressed by the marker gene is measured. The amount of RNA expressed may be assessed, for example, using nucleic acid probes with marker gene coding sequences or using quantitative PCR techniques. For example, a nucleic acid array forming a probe set may be used to detect RNA expressed by the marker gene. The RNA expressed by the marker gene may be transcribed to cDNA (and in some cases to cRNA) and then queried with a gene chip array using methods known in the art. Thus, in some embodiments the marker gene may also be a gene including a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 or 20 nucleotide continuous region (i.e. sequence) within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13, or with a target sequence to which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate. For example, the continuous region may be 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides in length. In related embodiments, the marker gene includes a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity with the entire length of one or more nucleic acids within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13, or with a target sequence to which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate. 
In other related embodiments, the marker gene includes a nucleic acid sequence having 100% identity with the entire length of one or more nucleic acids within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13, or with a target sequence to which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate. In other related embodiments, “one or more” nucleic acids within a probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13 referred to above is the majority or all of the nucleic acids within the probe set.
Tables 1A, 2, 4, 8, 13, 18 and 19 provide probe set identifiers using Affymetrix probe set identifier numbers, as known in the art. The nucleic acid sequences contained within each probe set identifier number and the target sequence to which the probe set is designed to interrogate are publicly available in a variety of sources, including the Affymetrix website and the National Cancer Institute website. The term “designed to interrogate” in the context of target genes, marker genes and probes refers to a probe having sufficient primary sequence complementarity to a target to detectably bind the target, as well known in the art.
In some embodiments, the marker gene includes a nucleic acid sequence within a marker gene identified in Table 1A. In other embodiments, the marker gene includes a nucleic acid sequence within a marker gene identified in Table 2. In other embodiments, the marker gene includes a nucleic acid sequence within a marker gene identified in Table 4. In other embodiments, the marker gene includes a nucleic acid sequence within a marker gene identified in Table 8. In other embodiments, the marker gene includes a nucleic acid sequence within a marker gene identified in Table 13. In some embodiments, the marker gene is a gene set forth in Table 18. In some embodiments, the marker gene is C17orf65 (SEQ ID NO:977), C4orf10 (SEQ ID NO:1005), FAM98A (SEQ ID NO:1020), TLE1 (SEQ ID NO:844), INHBC (SEQ ID NO:993), NAPA (SEQ ID NO:995), TKT (SEQ ID NO:994), TPT1 (SEQ ID NO:138), FLJ20054 (SEQ ID NO:11), KIAA0794 (SEQ ID NO:104), LOC134492 (SEQ ID NO:184), and/or MGC34648 (SEQ ID NO:348). In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and/or 13) of marker genes as disclosed in Table 18 are detected. In some embodiments, the marker gene is a gene set forth in Table 19. In some embodiments, the marker gene is CD1D (SEQ ID NO:376), CD44 (SEQ ID NO:275), CDC34 (SEQ ID NO:553), CDKN1C (SEQ ID NO:320), CD47 (SEQ ID NO:1015), GZMM (SEQ ID NO:617), and/or PPIA (SEQ ID NO:1010). In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6 or 7) of marker genes as disclosed in Table 19 are detected.
The comparison of the marker gene expression levels with a standard control may be accomplished by determining whether the marker gene is expressed in the CIS patient at an elevated level or a lowered level (i.e. detecting differential expression). The elevated or lowered levels are indicative of rapid development of multiple sclerosis (MS) (e.g. within two years of being initially diagnosed with CIS). Whether elevation or lowering of expression of a particular marker gene is indicative of rapid onset of MS in a CIS patient is set forth in Tables 1A, 2, 10A-12C, and 15A-17C. For example, where the marker gene is TOB1, Table 1A shows that lowered expression of TOB1 is indicative of rapid onset of MS in a CIS patient.
The standard control may be any appropriate standard known in the art. In some embodiments, the standard control is approximately the average amount of expression of the marker gene in humans, humans without CIS, or humans with CIS that are not at high risk of developing MS. Approximate average relative amounts of expression of marker genes are set forth in Tables 1A and 2 in a sample of humans without CIS, humans with CIS that are not at high risk of developing MS, and humans with CIS at high risk of developing MS. In addition, Table 4 provides approximate average amounts of expression of genes for humans with CIS and humans without CIS.
In other embodiments, the standard control is a detected level of expression of a standard control gene in the CIS patient. As used herein, a standard control gene is a human gene that is expressed at approximately constant levels thereby providing a baseline reading of gene expression for an individual. The standard control gene may also be referred to herein and in the art as a housekeeping gene. In some embodiments, the standard control gene is GAPDH, 18s ribosomal subunit, beta actin (ACTB), PPP1CA, beta 2 microglobulin (B2M), HPRT1, RPS13, RPL27, RPS20 or OAZ1.
The elevated level of expression of the marker gene or the lowered level of expression of the marker gene may be determined by calculating the ratio of the level of expression of the marker gene to the level of expression of a standard control gene. For example, Table 1B lists an average amount of GAPDH in the subjects studied according to the examples set forth below. The corresponding ratios of marker genes to GAPDH are set forth in Tables 1A and 2. By using the calculated ratios provided in Tables 1A and 2, the ratio of expression of a corresponding marker gene to GAPDH in a CIS patient may be calculated. Where the calculated marker to GAPDH ratio in the patient is approximately equal to the corresponding ratio provided in Tables 1A and 2, the CIS patient is at high risk of rapidly developing MS.
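The ratio comparison described above may be sketched, by way of a non-limiting example, as follows. The expression values, the reference ratio, and the 20% relative tolerance used to decide "approximately equal" are all hypothetical assumptions chosen for illustration; they are not values from Tables 1A, 1B, or 2.

```python
from math import isclose


def marker_to_control_ratio(marker_level: float, control_level: float) -> float:
    """Ratio of marker gene expression to standard control gene expression."""
    if control_level <= 0:
        raise ValueError("control gene expression must be positive")
    return marker_level / control_level


def matches_high_risk_ratio(patient_ratio: float,
                            reference_ratio: float,
                            rel_tol: float = 0.2) -> bool:
    """True when the patient ratio is approximately equal to the
    reference high-risk ratio (relative tolerance is an assumption)."""
    return isclose(patient_ratio, reference_ratio, rel_tol=rel_tol)


# Hypothetical measurements: marker gene and GAPDH signal in one patient.
patient_ratio = marker_to_control_ratio(120.0, 1000.0)

# Hypothetical reference ratios standing in for tabulated values.
near_reference = matches_high_risk_ratio(patient_ratio, 0.11)
far_reference = matches_high_risk_ratio(patient_ratio, 0.30)
```

Any other housekeeping gene could be substituted for GAPDH in this sketch, provided the reference ratio was established against the same control gene.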
In some embodiments, statistical models are established for determining whether expression of a marker gene is indicative of a CIS patient that is highly likely to develop MS, for example within 9 months of being initially diagnosed with CIS. Thus, in some related embodiments, the standard control is a threshold expression value obtained from a statistical model. Threshold expression values may be obtained optionally using a standard gene (e.g. GAPDH or ACTB) and a classifier algorithm (e.g. compound covariate predictor (CCP), diagonal linear discriminant analysis (DLDA), and/or support vector machines (SVM) classifiers) (see Example 9 and Tables 8 to 17C). In some embodiments, a composite predictor is used to establish a statistical model or threshold value wherein the composite predictor employs a CCP, DLDA and SVM. Where the expression of a marker gene in a CIS subject is above the calculated threshold expression value, a patient with CIS is at high risk for developing MS.
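A minimal sketch of one such classifier, a compound covariate predictor, is given below for illustration. It follows the general form of a CCP (per-gene two-sample t-statistics used as weights, with the threshold placed midway between the mean class scores) and is not asserted to be the specific model of Example 9. The training values are hypothetical, normalized expression levels for two unnamed genes.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t-statistic for one gene."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def score(sample, weights):
    """Compound covariate: weighted sum of the sample's expression values."""
    return sum(w * x for w, x in zip(weights, sample))


def fit_ccp(high_risk, low_risk):
    """Each argument is a list of samples; each sample lists per-gene values.
    Returns per-gene weights and the classification threshold."""
    n_genes = len(high_risk[0])
    weights = [
        t_statistic([s[g] for s in high_risk], [s[g] for s in low_risk])
        for g in range(n_genes)
    ]
    high_mean = mean([score(s, weights) for s in high_risk])
    low_mean = mean([score(s, weights) for s in low_risk])
    return weights, (high_mean + low_mean) / 2


def predict_high_risk(sample, weights, threshold):
    """A score above the threshold is classified as high risk."""
    return score(sample, weights) > threshold


# Hypothetical training data: gene 0 is lowered and gene 1 is elevated
# in the high-risk group.
high_risk = [[1.0, 5.0], [1.2, 5.5], [0.8, 4.8]]
low_risk = [[3.0, 2.0], [2.8, 2.4], [3.3, 1.9]]

weights, threshold = fit_ccp(high_risk, low_risk)
call_a = predict_high_risk([0.9, 5.1], weights, threshold)
call_b = predict_high_risk([3.1, 2.1], weights, threshold)
```

Because the weights carry the sign of the class difference, a gene whose lowered expression indicates high risk contributes a negative weight, so a single "above threshold" rule covers both elevated and lowered markers.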
Using the teachings provided herein, one skilled in the art is enabled to use any known housekeeping gene to establish similar ratios and statistical models to identify CIS patients at high risk of rapidly developing MS using the disclosed methods.
In another aspect, there is provided an in vitro method for determining whether a patient with clinically isolated syndrome (CIS) is at high risk of developing multiple sclerosis (MS). The method includes isolating mRNA from the patient, thereby providing an in vitro nucleic acid sample. Optionally, the method further includes subjecting the in vitro nucleic acid sample to polymerase chain reaction under conditions suitable to amplify nucleic acid within the in vitro nucleic acid sample. The in vitro nucleic acid sample is contacted with a microarray, the microarray having a plurality of probes designed to interrogate specific marker genes. The level of nucleic acid duplex formation is determined between the in vitro nucleic acid sample and the microarray, thereby providing the expression level of nucleic acid present in the in vitro nucleic acid sample. The expression level of nucleic acid is then compared to the expression level of a standard control. A differential expression of the marker gene relative to said standard control indicates that the patient is at high risk of developing multiple sclerosis. In some embodiments, the standard control may be approximately the average amount of expression of the marker gene in humans, humans without CIS, or humans with CIS that are not at high risk of developing MS. In other embodiments, the standard control is a detected level of expression of a standard control gene in the CIS patient. In some embodiments, the marker gene is a gene set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13. In some embodiments, the marker gene is any one of the marker genes set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13.
In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90 or 100) of marker genes as set forth in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 are determined. In some embodiments, the marker gene is ZNF12, C17orf65, BAT1, ARHGDIA, NAPA, ATP5G2, DDX52, NDFIP1, SDAD1, USP7, MEF2A, AGER, RAB1B, GDI1 and/or BANF1. In other related embodiments, the marker gene is ZNF12, C17orf65, BAT1, ARHGDIA, NAPA, ATP5G2, DDX52, NDFIP1 and/or SDAD1. In still other related embodiments, the marker gene is USP7, MEF2A, AGER, RAB1B, GDI1 and/or BANF1. In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15) of marker genes selected from ZNF12, C17orf65, BAT1, ARHGDIA, NAPA, ATP5G2, DDX52, NDFIP1, SDAD1, USP7, MEF2A, AGER, RAB1B, GDI1 and BANF1 are detected. In some embodiments, the marker gene is a gene set forth in Table 18. In some embodiments, the marker gene is C17orf65 (SEQ ID NO:977), C4orf10 (SEQ ID NO:1005), FAM98A (SEQ ID NO:1020), TLE1 (SEQ ID NO:844), INHBC (SEQ ID NO:993), NAPA (SEQ ID NO:995), TKT (SEQ ID NO:994), TPT1 (SEQ ID NO:138), FLJ20054 (SEQ ID NO:11), KIAA0794 (SEQ ID NO:104), LOC134492 (SEQ ID NO:184), or MGC34648 (SEQ ID NO:348). In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13) of marker genes as disclosed in Table 18 are detected. In some embodiments, the marker gene is a gene set forth in Table 19. In some embodiments, the marker gene is CD1D (SEQ ID NO:376), CD44 (SEQ ID NO:275), CDC34 (SEQ ID NO:553), CDKN1C (SEQ ID NO:320), CD47 (SEQ ID NO:1015), GZMM (SEQ ID NO:617), or PPIA (SEQ ID NO:1010). In some embodiments, the expression levels of a plurality (e.g., 2, 3, 4, 5, 6 or 7) of marker genes as disclosed in Table 19 are detected.
In another aspect, a kit is provided for use in identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS). The kit includes (i) a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20, or within the range 10-50, 10-40, 10-30, or 10-20) with one or more nucleic acids within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13, (ii) a nucleic acid sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20) nucleotide continuous region with a target sequence to which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate, or (iii) a nucleic acid complementary to the nucleic acids set forth in (i) or (ii) above. In some embodiments, the kit also includes an electronic device or computer software capable of comparing a marker gene expression level from the patient to a standard control, thereby indicating whether the patient is at high risk of developing multiple sclerosis.
In some embodiments, the plurality of marker genes are all or a portion of marker genes listed in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all or a portion of marker gene sequences listed in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13. In some embodiments, the plurality of marker genes are all or a portion of marker genes listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all or a portion of marker gene sequences listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13. In other embodiments, the plurality of marker genes are all marker genes listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13, or the plurality of marker genes comprise a nucleic acid of at least 10 nucleotides in length and at least 90% identity with a contiguous region of all marker gene sequences listed in one of Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13.
In some embodiments, the electronic device or computer software employs a statistical model. The electronic device or computer software may also utilize threshold expression values, optionally obtained using a standard gene (e.g., GAPDH or ACTB), and a classifier algorithm (e.g., compound covariate predictor (CCP), diagonal linear discriminant analysis (DLDA), and/or support vector machine (SVM) classifiers) such as those set forth in Example 9 and Tables 8 to 17. One skilled in the art will immediately recognize that the electronic device or computer software may be used in the methods disclosed herein.
In some embodiments, the nucleic acid provided in the kit above may be a probe nucleic acid for use in a PCR technique, such as quantitative PCR, to assess the expression of a given marker gene. In some embodiments, the nucleic acid sequence has 100% identity with a continuous nucleic acid region (i.e., sequence) within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13, or with a target sequence which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate, or is complementary thereto. In other embodiments, the nucleic acid has the same sequence as a nucleic acid contained within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, and/or Table 13 or the target sequence which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate, or is complementary thereto.
The nucleic acid provided in the kit may also hybridize under stringent conditions (or moderately stringent conditions) to a nucleic acid sequence within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 or a target sequence which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate. The nucleic acid provided in the kit may also be perfectly complementary to a nucleic acid sequence within a marker gene identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 or a target sequence which the probe set identified in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8 and/or Table 13 is designed to interrogate.
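By way of illustration only, the identity criterion recited above (at least 90% identity over a continuous region of at least 10 nucleotides) may be checked computationally. The following is a minimal sketch, assuming an ungapped, window-by-window comparison; the function names, the default window of 10 and the example sequences are hypothetical and are not taken from the Tables or the sequence listing. Alignment tools that permit gaps may give different results.

```python
def best_window_identity(probe: str, target: str, window: int = 10) -> float:
    """Highest fraction of matching bases between any `window`-long stretch
    of `probe` and any `window`-long stretch of `target` (no gaps allowed)."""
    probe, target = probe.upper(), target.upper()
    if len(probe) < window or len(target) < window:
        return 0.0
    best = 0.0
    for i in range(len(probe) - window + 1):
        p = probe[i:i + window]
        for j in range(len(target) - window + 1):
            t = target[j:j + window]
            matches = sum(a == b for a, b in zip(p, t))
            best = max(best, matches / window)
    return best


def meets_identity(probe: str, target: str,
                   window: int = 10, min_identity: float = 0.90) -> bool:
    """True if some continuous region of `window` nucleotides reaches the
    required identity (hypothetical helper, ungapped comparison only)."""
    return best_window_identity(probe, target, window) >= min_identity
```

For example, a hypothetical 10-mer with a single mismatch against its target has 9 of 10 matching positions and therefore satisfies a 90% threshold, whereas two mismatches in the same window do not.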
The present invention also contains the subject matter of the following numbered embodiments:
A method of identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), said method comprising:
A method of identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), said method comprising:
The method of Embodiment 1, wherein said marker gene comprises a nucleic acid sequence at least 10 nucleotides in length having at least 90% identity with a contiguous portion of a nucleic acid having the sequence of one of SEQ ID NO:1 to SEQ ID NO:1021.
The method of Embodiment 1 or Embodiment 2, wherein said marker gene comprises a nucleic acid sequence at least 10 nucleotides in length having at least 95% identity with a contiguous portion of a nucleic acid having the sequence of one of SEQ ID NO:1 to SEQ ID NO:1021.
The method of any preceding Embodiment, wherein the method is an in vitro method and comprises detecting the level of expression of a marker gene in a sample previously isolated from said patient.
The method of Embodiment 4, which comprises contacting the sample with at least one nucleic acid of at least 10 nucleotides in length and having at least 90% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021, and optionally comprises contacting the sample with 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleic acids of at least 10 nucleotides in length and having at least 90% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021.
The method of Embodiment 4, which comprises contacting the sample with at least one nucleic acid of at least 10 nucleotides in length and having at least 95% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021, and optionally comprises contacting the sample with 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleic acids of at least 10 nucleotides in length and having at least 95% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021.
The method of Embodiment 4, which comprises contacting the sample with at least one nucleic acid of at least 10 nucleotides in length and having at least 99% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021, and optionally comprises contacting the sample with 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleic acids of at least 10 nucleotides in length and having at least 99% identity with a contiguous portion of one of SEQ ID NO:1 to SEQ ID NO:1021.
The method of any preceding Embodiment, wherein said marker gene is a marker gene set forth in Table 18.
The method of any of Embodiments 1 to 7, wherein said marker gene is a marker gene set forth in Table 19.
The method of any of Embodiments 1 to 7, wherein said marker gene is ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2), SDAD1 (SEQ ID NO:116), USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) or BANF1 (SEQ ID NO:999).
The method of any of Embodiments 1 to 7, wherein said marker gene is ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2) or SDAD1 (SEQ ID NO:116).
The method of any of Embodiments 1 to 7, wherein said marker gene is USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) or BANF1 (SEQ ID NO:999).
The method of any of Embodiments 1 to 7, wherein said marker gene is C17orf65 (SEQ ID NO:977), C4orf10 (SEQ ID NO:1005), FAM98A (SEQ ID NO:1020), TLE1 (SEQ ID NO:844), INHBC (SEQ ID NO:993), NAPA (SEQ ID NO:995), TKT (SEQ ID NO:994), TPT1 (SEQ ID NO:138), F1120054 (SEQ ID NO:11), KIAA0794 (SEQ ID NO:104), LOC134492 (SEQ ID NO:184), or MGC34648 (SEQ ID NO:348).
The method of any of Embodiments 1 to 7, wherein said marker gene is CD1D (SEQ ID NO:376), CD44 (SEQ ID NO:275), CDC34 (SEQ ID NO:553), CDKN1C (SEQ ID NO:320), CD47 (SEQ ID NO:1015), GZMM (SEQ ID NO:617), or PPIA (SEQ ID NO:1010).
The method of any preceding Embodiment, wherein said standard control is a detected level of expression of a standard control gene in said patient.
The method of Embodiment 15, wherein said standard control gene is GAPDH, 18s ribosomal subunit, beta actin (ACTB), PPP1CA, beta 2 microglobulin (B2M), HPRT1, RPS13, RPL27, RPS20 or OAZ1.
The method of Embodiment 16, wherein said standard control gene is GAPDH.
The method of any preceding Embodiment, wherein the elevated level of expression of said marker gene or the lowered level of expression of said marker gene is determined by the ratio of the level of expression of said marker gene to the level of expression of said standard control gene, whereby said ratio being approximately equal to the corresponding ratio set forth in Table 1A or Table 2 predicts development of MS within two years of being initially diagnosed with CIS.
The method of any preceding Embodiment, wherein the elevated level of expression of said marker gene or the lowered level of expression of said marker gene is determined by a threshold expression level resulting from a statistical model.
The method of Embodiment 19, wherein said statistical model is obtained using a classifier algorithm selected from a compound covariate predictor, a diagonal linear discriminant analysis, and a support vector machine.
The method of any preceding Embodiment, wherein said patient at high risk of developing MS is a patient with CIS that will develop MS within two years of being initially diagnosed with CIS.
A kit for use in identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), said kit comprising:
A kit for use in identifying a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), said kit comprising:
The kit of Embodiment 22 or 22A, wherein the nucleic acid is at least 10 nucleotides in length.
The kit of Embodiment 22, 22A or Embodiment 23, which comprises a nucleic acid comprising a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more marker genes wherein said marker gene is selected from ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2), SDAD1 (SEQ ID NO:116), USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) and BANF1 (SEQ ID NO:999).
The kit of Embodiment 22, 22A or Embodiment 23, which comprises a nucleic acid comprising a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more marker genes wherein said marker gene is selected from ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2) and SDAD1 (SEQ ID NO:116).
The kit of Embodiment 22, 22A or Embodiment 23, which comprises a nucleic acid comprising a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more marker genes wherein said marker gene is selected from USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) and BANF1 (SEQ ID NO:999).
Use, in the identification of a patient with clinically isolated syndrome (CIS) at high risk of developing multiple sclerosis (MS), of a microarray comprising a nucleic acid immobilised on a solid substrate, said nucleic acid having a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more marker genes wherein said marker gene is selected from the group consisting of SEQ ID NO:1 to SEQ ID NO:1021.
The use of Embodiment 27, wherein said nucleic acid comprises a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with one or more marker genes wherein said marker gene is selected from ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2), SDAD1 (SEQ ID NO:116), USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) and BANF1 (SEQ ID NO:999).
The use of Embodiment 27, wherein said marker gene is ZNF12 (SEQ ID NO:83), C17orf65 (SEQ ID NO:977), BAT1 (SEQ ID NO:981), ARHGDIA (SEQ ID NO:1000), NAPA (SEQ ID NO:995), ATP5G2 (SEQ ID NO:996), DDX52 (SEQ ID NO:292), NDFIP1 (SEQ ID NO:2) or SDAD1 (SEQ ID NO:116).
The use of Embodiment 27, wherein said marker gene is USP7 (SEQ ID NO:1014), MEF2A (SEQ ID NO:1007), AGER (SEQ ID NO:998), RAB1B (SEQ ID NO:1011), GDI1 (SEQ ID NO:986) or BANF1 (SEQ ID NO:999).
The use of Embodiment 27, wherein a plurality of nucleic acids are immobilised on said solid substrate, said plurality of nucleic acids having a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity over at least a 10 nucleotide continuous region with all marker gene sequences listed in Table 18, Table 19, Table 1A, Table 2, Table 4, Table 8, or Table 13.
Gene expression microarray analysis was performed in negatively isolated naïve CD4+ T-cells obtained from 37 CIS patients after initial clinical presentation (mean 4.5+/−2.6 months) and from 29 controls matched for age and gender. Four arrays failed to pass our quality control protocol and were thus excluded from further analysis. Demographic characteristics of the remaining patients (n=34) and controls (n=28) were similar (Table 3). Analysis was focused on the 1,718 probe sets that showed at least a 2 fold-change from each gene's median value in more than 20% of the samples.
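The variability filter described above (at least a 2 fold-change from each gene's median value in more than 20% of the samples) can be sketched as follows. This is an illustrative sketch only, assuming that expression values are on a log2 scale so that a 2-fold change corresponds to a difference of 1 from the median; the function name, parameter names and data layout are hypothetical and do not reproduce the software actually used.

```python
import math
from statistics import median


def variable_probe_sets(log2_expr, min_fold=2.0, min_fraction=0.20):
    """Return the probe set identifiers whose log2 expression differs from the
    probe set's own median by at least `min_fold` in more than `min_fraction`
    of the samples.

    log2_expr: dict mapping probe set identifier -> list of log2 values,
               one value per sample (assumed layout).
    """
    cutoff = math.log2(min_fold)  # a 2-fold change equals 1 log2 unit
    kept = []
    for probe_id, values in log2_expr.items():
        med = median(values)
        n_changed = sum(abs(v - med) >= cutoff for v in values)
        if n_changed / len(values) > min_fraction:
            kept.append(probe_id)
    return kept
```

Under these assumptions, a probe set that deviates 2-fold from its median in 3 of 10 samples is retained, whereas one that deviates in exactly 2 of 10 samples is not, because the criterion requires more than 20% of the samples.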
Principal component analysis (PCA) using expression values from these 1,718 probe sets showed a clear segregation between controls and CIS samples (
When subjected to a T-test, 975 probe sets were found to be differentially expressed between CIS and controls after correction for multiple comparisons (FDR<0.1) (Table 4). Interestingly, most of the discriminating transcripts (70%) were under-expressed in CIS whereas the remaining 30% were over-expressed. This finding is in agreement with previous observations that downregulated genes greatly outnumber upregulated genes in T lymphocytes from MS patients when studied by gene expression microarrays (6, 7) or by FACS (8).
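For illustration, a false discovery rate correction of the kind referred to above can be sketched as follows. The specific FDR procedure is not named in this passage; the sketch assumes the Benjamini-Hochberg step-up procedure, which is an assumption made for the example only, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order
    (assumed procedure; shown for illustration only)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A probe set would then be retained when its adjusted value falls below the chosen FDR cutoff (0.1 in the analysis above).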
Gene ontology (GO) enrichment of these 975 differentially expressed genes revealed alteration of major molecular functions and biological processes (
On the basis of transcriptional activity, samples from CIS patients segregated into 4 groups (groups #1, 2, 3, 4) corresponding to the first 4 splits of the dendrogram (Robustness index 99.4%) (
Patients from each of the 4 CIS transcriptional groups did not differ significantly according to age, gender, ethnic background, time from initial clinical event, or HLA-DRB1*1501 status (
In order to evaluate the concordance of gene expression with neurodegeneration, it was investigated to what extent changes in brain volume differed across the four CIS groups. To avoid biases related to therapy, only the 15 patients who did not receive disease-modifying therapy during the first year after diagnosis were included in this analysis. Normalized brain parenchyma (nBPV), white matter (nWMV), grey matter (nGMV) and CSF (nCSFV) volumes were not significantly different among the 4 groups at baseline. However, hierarchical clustering of changes after 1 year in these quantitative MRI parameters segregated patients in two major groups (
To further explore what information about conversion to MS is contained in gene expression at the CIS stage, a predictive model of survival (i.e. conversion to MS) was built based on supervised principal components (10). The resulting model contained mRNA gene products hybridizing to 28 probe sets set forth in Table 2 and allowed a segregation of CIS into high and low risk groups (
Differential gene expression observed shortly after CIS diagnosis may reflect an acute and transient biological response to the disease and/or a predisposing causative signature. To investigate this further and to confirm our observations, samples from the same individuals were collected and processed one year (mean 11+/−3 months) after diagnosis of CIS. For this follow-up, 31 CIS and 9 controls were available. Hierarchical clustering of these samples performed with the same 1,718 genes identified at baseline still discriminated CIS from controls (
Because the group #1 signature persists over time, the analysis focused on this group, which also appears to constitute a relatively consistent biological entity. Among the RNA gene products hybridizing to the 975 probe sets set forth in Table 4 whose expression differentiates CIS from controls at baseline, RNA gene products hybridizing to 108 probe sets set forth in Table 1A were also differentially expressed between group #1 and the other CIS groups combined (
C57BL/6 mice injected with MOG35-55 reproducibly develop experimental autoimmune encephalomyelitis (EAE), a widely employed laboratory model for MS. In contrast to the sustained inflammation endured by animals with EAE, the response to CFA is expected to be transient, and it was hypothesized that patterns of Tob1 expression should reflect this difference. To test this hypothesis, the database from two previous experiments, in which high throughput gene expression in lymph nodes and spinal cords at the peak of EAE was measured, was searched (13, 14). As expected, Tob1 mRNA expression was decreased in both lymph nodes and spinal cords of EAE animals compared to CFA controls (
In addition to intra-molecular mechanisms, engagement of transmembrane receptors may contribute to regulating T cell homeostasis. Among the 3 differentially expressed genes coding for transmembrane receptors (SIGLEC10, EMR1 and CD44), only CD44 was over-expressed in CIS patients (Table 6). This upregulation was confirmed by qRT-PCR (
Since CIS patients classified as group #1 converted to CDMS earlier than other CIS patients, it was hypothesized that TOB1 is also implicated in the progression of disease once established. A genetic effect would then be expected in CDMS patients showing extreme phenotypes (mild or severe). This hypothesis was tested by genotyping 5 SNPs located within or near the gene (
The study cohort consisted of 37 untreated CIS patients and 29 healthy control subjects matched for age and sex, evaluated at the UCSF Multiple Sclerosis Center. CIS patients were identified as subjects presenting with a first well-defined, neurological event persisting for more than 48 hours involving the optic nerve, brain parenchyma, brainstem, cerebellum, or spinal cord. All CIS patients demonstrated at least two abnormalities on brain MRI measuring greater than 3 mm2. Patients were followed for an average of 20 (+/−8) months. Time to conversion was defined as the delay between recruitment and next clinical event or the date of identified MRI changes fulfilling the McDonald criteria (5). Written informed consent was obtained from all study participants.
MRI scans for all subjects were acquired on a 1.5 T GE (GE) MRI scanner with a standard head coil. All CIS subjects were scanned every 3 months during the first year of follow-up and then every 6 months during the second year. T2 hyperintense lesions were identified on simultaneously viewed T2 and proton density-weighted dual echo (1 mm×1 mm×3 mm pixels, interleaved slices, 20 ms and 80 ms echo times) images with regions of interest drawn based on a semi-automated threshold with manual editing as described elsewhere (26). Annual percent brain volume change (PBVC) was calculated from high resolution 3D T1-weighted spoiled gradient recalled echo volumes (pixel size of 1 mm×1 mm×1.5 mm, 124 slices, flip angle 40°) using SIENA (27).
Blood samples were collected at the time of recruitment into the study (baseline) and after 12 months. Peripheral blood mononuclear cells (PBMC) were separated on a Ficoll gradient and frozen in liquid nitrogen until needed. Naïve CD4+ T cells were isolated by negative selection using Dynabeads® (Invitrogen). CD4+ T cell purity was assessed by FACS (>95%, data not shown). RNA was then extracted using the RNeasy® Mini kit (Qiagen), amplified with the MessageAmp™ II aRNA kit (Ambion) and labeled with Bio-11-UTP for subsequent hybridization onto Affymetrix® Human Genome U133 Plus2.0 arrays (TGEN). Thus, the probe set identifier numbers set forth in the Tables below (including Tables 18, 19, 1A, 2, 4, 8 and 13) are in reference to Affymetrix® Human Genome U133 Plus2.0 arrays.
Quality control (QC) analysis of the arrays was performed using the Bioconductor package, available at the bioconductor.org website. In order to pass QC, arrays had to have at least 40% of their probe sets called present and had to have similar RNA degradation slopes, GAPDH and beta-actin ratios, scaling factors, histograms and box plots of intensities. Arrays were normalized using RMA (28). Statistical analyses were carried out using BRB-array Tools (Biometrics Research Branch, NIH). For multiple comparison correction, genes were considered differentially expressed if the univariate p-value was less than 0.001 and the false discovery rate (FDR) was less than 0.1 (29). Genes predicting MS conversion were determined using the Survival Analysis Prediction Tool of BRB-array Tools. The 2 survival risk groups were built using PCA with a p-value set at 0.001 for genes univariately correlated with survival and leave-one-out cross-validation (10). For cases with above and below average risk (50th percentile), Kaplan-Meier survival curves were used. Hierarchical clustering was performed using Genes@work® software (IBM Research). To gauge robustness in the classification, the dataset was perturbed by adding random (white) Gaussian noise using the median variance of the dataset, and the samples were re-clustered 100 times. The index of robustness is the mean percentage of times a pair of samples remained in the same cluster. To investigate the likelihood that segregation into 4 groups occurred by chance, the Integrated Bayesian Inference System (IBIS) was used, which is a supervised machine learning approach (9).
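The robustness index described above can be sketched as follows. This is one possible reading of the description, offered for illustration only: it assumes that the noise standard deviation is the square root of the median per-gene variance, and that the index is averaged over the pairs of samples that share a cluster in the unperturbed data. The clustering routine is supplied by the caller; `cluster_fn` and the other names are hypothetical and do not correspond to the Genes@work® software.

```python
import random
from itertools import combinations
from statistics import median, pvariance


def robustness_index(samples, cluster_fn, n_runs=100, seed=0):
    """Mean percentage of perturbed runs in which a pair of samples that was
    co-clustered in the original data remains in the same cluster.

    samples:    list of samples, each a list of expression values per gene.
    cluster_fn: callable taking such a list and returning one cluster label
                per sample (hypothetical stand-in for the clustering step).
    """
    rng = random.Random(seed)
    n_genes = len(samples[0])
    # Noise scale: median of the per-gene variances (assumed interpretation).
    gene_vars = [pvariance([s[g] for s in samples]) for g in range(n_genes)]
    sigma = median(gene_vars) ** 0.5

    base = cluster_fn(samples)
    pairs = [(i, j) for i, j in combinations(range(len(samples)), 2)
             if base[i] == base[j]]
    if not pairs:
        raise ValueError("no pair of samples shares a cluster")

    kept = 0
    for _ in range(n_runs):
        noisy = [[v + rng.gauss(0.0, sigma) for v in s] for s in samples]
        labels = cluster_fn(noisy)
        kept += sum(labels[i] == labels[j] for i, j in pairs)
    return 100.0 * kept / (n_runs * len(pairs))
```

Passing the clustering step as a function keeps the sketch independent of any particular hierarchical clustering implementation.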
In order to calculate the marker to GAPDH ratio in a patient, univariate and multivariate statistical models are used. In univariate statistical models, the ability of each individual gene to classify samples as high or low risk is determined. In multivariate models, the best possible combination of two or more genes that can maximize the positive predictive value (PPV) or negative predictive value (NPV) is established. The positive predictive value is defined as the number of true positives per total of true and false positives, whereas the negative predictive value is the number of true negatives per total of true negatives and false negatives. Applying this statistical model provides methods to discriminate between high risk and low risk patients.
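The two definitions above translate directly into a short calculation. The following sketch is for illustration only; the function name and the counts used in the example are hypothetical and are not taken from the Tables.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from the counts of true positives, false positives,
    true negatives and false negatives.

    PPV = TP / (TP + FP); NPV = TN / (TN + FN).
    A value of NaN is returned when the denominator is zero.
    """
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv
```

For example, with hypothetical counts of 8 true positives, 2 false positives, 9 true negatives and 1 false negative, the PPV is 8/10 and the NPV is 9/10.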
As discussed above, 108 probe sets were identified (Table 1A) by T-test analysis that hybridized to gene products that were differentially expressed between group #1 and other subjects. In addition, 28 probe sets were identified (Table 2) by principal-component-based survival analysis that hybridized to gene products that were differentially expressed by high risk CIS patients. The combined set of 136 probe sets (108+28) was used to search for classifiers that could discriminate between the two groups with a reduced number of genes. Using compound covariate predictor (CCP), diagonal linear discriminant analysis (DLDA), and support vector machine (SVM) classifiers (see below), 13 probe sets were identified (Table 8) that hybridized to gene products that were differentially expressed. The CCP, DLDA and SVM were run with default parameters within the BRB array tools application available from the National Cancer Institute. For each classifier a specific weight was assigned to each probe set as set forth in Table 9. The expression value of each probe set was normalized by that of two housekeeping (HK) genes, GAPDH and ACTB, with the results provided in Tables 10A to 12C, which detail the predictive value of the statistical model by providing the number of CIS patients that developed MS within nine months (MS) and the number that did not develop MS within nine months (No MS) and the corresponding prediction based on the threshold value.
An independent (network-based) search was conducted based on the hypothesis that groups of genes whose products interact physically are likely to define biologically functional modules, as described in Ideker, T., et al. (2002). “Discovering regulatory and signalling circuits in molecular interaction networks.” Bioinformatics 18 Suppl 1: S233-40. Unlike with classical statistical analyses, identification of these modules allows for direct biological interpretation of the results. Briefly, we implemented a sub-network identification tool based on the algorithm previously described by Ideker et al. to identify groups of functionally related genes that could classify high versus low risk CIS patients. This algorithm consists of the following steps. First, a protein interaction database was downloaded locally. Second, starting from each node in the network, a sub-network was recursively grown by the addition of one neighboring node at a time. At each step, a scoring function was computed based on the mutual information between the weighted average of the expression values of all nodes considered at this step, and the vector of phenotypes (case versus control, high vs. low risk, etc). Third, the sub-network continued to grow until addition of a new node did not increase the score significantly. Three classifiers were constructed using the CCP, DLDA, and SVM algorithms. The network based search resulted in the identification of 6 probe sets (Table 13) that hybridized to gene products that were differentially expressed. For each classifier a specific weight was assigned to each probe set as set forth in Table 14. The expression value of each probe set was normalized by that of two housekeeping (HK) genes: GAPDH and ACTB. 
The predictive value and threshold values for the 6 probe sets were calculated, with the results provided in Table 15A to 17C, which detail the predictive value of the statistical model by providing the number of CIS patients that developed MS within nine months (MS) and the number that did not develop MS within nine months (No MS) and the corresponding prediction based on the threshold value.
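The sub-network growth procedure outlined above (growing from a seed node, scoring by mutual information with the phenotype vector, and stopping when the score no longer improves) can be sketched as follows. This is a simplified illustration and not the implementation used in the study: it uses an unweighted mean of the member genes instead of a weighted average, discretizes the mean into two equal-width bins before computing mutual information, grows the sub-network iteratively, and treats a fixed minimum gain (`min_gain`) as the test for a significant increase. All names and the data layout are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def discretize(values, n_bins=2):
    """Assign each value to one of `n_bins` equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def score(nodes, expr, phenotypes):
    """Mutual information between the mean expression of `nodes` and phenotype."""
    activity = [sum(expr[g][s] for g in nodes) / len(nodes)
                for s in range(len(phenotypes))]
    return mutual_information(discretize(activity), phenotypes)


def grow_subnetwork(seed, network, expr, phenotypes, min_gain=0.01):
    """Greedily add the neighboring node that most improves the score until
    the improvement falls below `min_gain`.

    network: dict mapping gene -> iterable of interacting genes.
    expr:    dict mapping gene -> list of expression values, one per sample.
    """
    nodes = [seed]
    best = score(nodes, expr, phenotypes)
    while True:
        neighbors = {nb for g in nodes for nb in network.get(g, ())} - set(nodes)
        candidates = [(score(nodes + [nb], expr, phenotypes), nb)
                      for nb in sorted(neighbors)]
        if not candidates:
            break
        new_score, nb = max(candidates)
        if new_score - best < min_gain:
            break
        nodes.append(nb)
        best = new_score
    return nodes, best
```

In the procedure described above, this growth step is repeated starting from each node of the interaction network, and the resulting sub-networks are then used as candidate gene sets for the classifiers.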
The compound covariate predictor (CCP) used in the above studies is a weighted linear combination of log-ratios (or log intensities for single-channel experiments) for genes that are univariately significant at the specified level. By specifying a more stringent significance level, fewer genes are included in the multivariate predictor. Genes in which larger values of the log-ratio pre-dispose to class 2 rather than class 1 have weights of one sign, whereas genes in which larger values of the log-ratios pre-dispose to class 1 rather than class 2 have weights of the opposite sign. The univariate t-statistics for comparing the classes are used as the weights. The CCP is described in further detail in Radmacher M D, McShane L M, and Simon R. A paradigm for class prediction using gene expression profiles. Journal of Computational Biology 9:505-511, 2002; and I Hedenfalk, D Duggan, Y Chen, M Radmacher, M Bittner, R Simon, P Meltzer, B Gusterson, M Esteller, M Raffeld, et al. Gene expression profiles of hereditary breast cancer, New England Journal of Medicine 344:539-548, 2001.
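A compound covariate predictor of the kind described above can be sketched as follows. The sketch is illustrative only and rests on stated assumptions: the weights are pooled-variance two-sample t-statistics, every gene supplied is used (the univariate significance filter is omitted), and the classification threshold is taken as the midpoint between the mean scores of the two classes. The function names and data layout are hypothetical and do not reproduce the BRB array tools implementation.

```python
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t-statistic comparing group a with group b."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def ccp_score(sample, weights):
    """Weighted linear combination of the log expression values."""
    return sum(w * x for w, x in zip(weights, sample))


def fit_ccp(class1, class2):
    """class1, class2: lists of samples, each a list of log values per gene.
    Returns the per-gene weights and the classification threshold."""
    n_genes = len(class1[0])
    weights = [t_statistic([s[g] for s in class1], [s[g] for s in class2])
               for g in range(n_genes)]
    mean1 = mean([ccp_score(s, weights) for s in class1])
    mean2 = mean([ccp_score(s, weights) for s in class2])
    return weights, (mean1 + mean2) / 2  # midpoint threshold (assumption)


def predict_ccp(sample, weights, threshold):
    """Return 1 or 2; class 1 scores higher because weights are t(class1 - class2)."""
    return 1 if ccp_score(sample, weights) > threshold else 2
```

Because the sign of each t-statistic indicates which class has the higher mean for that gene, genes favoring one class receive weights of one sign and genes favoring the other class receive weights of the opposite sign, as described above.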
The Diagonal Linear Discriminant Analysis (DLDA) used in the above studies is similar to the Compound Covariate Predictor, but not identical. It is a version of linear discriminant analysis that ignores correlations among the genes in order to avoid over-fitting the data. Many complex methods have too many parameters for the amount of data available. Consequently, they appear to fit the training data used to estimate the parameters of the model, but they have poor prediction performance for independent data. The DLDA is described in further detail in McLachlan G J. Discriminant Analysis and Statistical Pattern Recognition, Wiley-Interscience; New Ed edition (Aug. 4, 2004); and Dudoit S, Fridlyand J, Speed T P. Comparison of discrimination methods for the classification of tumors using gene expression data, Journal of the American Statistical Association 97:77-87, 2002.
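The diagonal form of linear discriminant analysis described above can be sketched as follows. The sketch is illustrative only and assumes two classes with equal prior probabilities and a pooled per-gene variance; correlations among genes are ignored by construction. The function names and data layout are hypothetical and do not reproduce the BRB array tools implementation.

```python
from statistics import mean, variance


def fit_dlda(class1, class2):
    """class1, class2: lists of samples, each a list of log values per gene.
    Returns the per-gene class means and the pooled per-gene variances."""
    n_genes = len(class1[0])
    n1, n2 = len(class1), len(class2)
    means1, means2, pooled = [], [], []
    for g in range(n_genes):
        col1 = [s[g] for s in class1]
        col2 = [s[g] for s in class2]
        means1.append(mean(col1))
        means2.append(mean(col2))
        pooled.append(((n1 - 1) * variance(col1) + (n2 - 1) * variance(col2))
                      / (n1 + n2 - 2))
    return means1, means2, pooled


def predict_dlda(sample, means1, means2, pooled_var):
    """Assign the sample to the class whose mean is nearer, with each gene's
    squared distance scaled by its pooled variance (diagonal covariance)."""
    d1 = sum((x - m) ** 2 / v for x, m, v in zip(sample, means1, pooled_var))
    d2 = sum((x - m) ** 2 / v for x, m, v in zip(sample, means2, pooled_var))
    return 1 if d1 <= d2 else 2
```

Because only one variance per gene is estimated, the number of parameters grows linearly with the number of genes, which is the property that limits over-fitting as noted above.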
The support vector machine (SVM) used in the above studies is a class prediction algorithm that has appeared effective in other contexts and is currently of great interest to the machine learning community. The SVM predictor can employ a variety of functions, as known in the art. In some embodiments, the SVM predictor is a linear function of the log-ratios or the log-intensities that best separates the data subject to penalty costs on the number of specimens misclassified. The SVM is described in further detail in Vapnik V. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
Master mix was prepared essentially as described previously, (9) with the addition of 200 μM ROX (Sigma), and overlaid on top of each well of a freshly thawed 384-well plate containing 5 ng of RNA in each well. Reactions were performed in triplicates using an ABI 7900 Sequence Detection System (Applied Biosystems).
Draining lymph nodes from either naive or injected (MOG35-55 or CFA alone) C57BL/6 mice were removed, washed in PBS and then embedded in OCT and frozen. Sections were cut at 6 μm on a cryostat and stained for immunofluorescence examination using either a rabbit anti-TOB1 polyclonal antibody (H-70, Santa Cruz Biotechnology Inc. CA), or a purified rat anti-CD4 antibody (BD Pharmingen). Secondary antibodies were anti-rabbit Alexa 488 (Molecular Probes, Eugene Oreg.) and anti-rat Alexa 594 (Molecular Probes). ELISAs for OPN were carried out using the Quantikine kit (R&D Systems) according to manufacturer's instructions.
Five single nucleotide polymorphisms (SNPs) located within or near TOB1 were selected for genotyping in 62 mild and 74 severe MS patients. Mild disease was defined as EDSS<3 at 15 years after onset, while severe disease was defined as EDSS>6 at 10 years after onset. Genotyping assays were carried out in 384-well plates using TaqMan® Universal PCR Master Mix on an ABI GeneAmp PCR System 7900 (Applied Biosystems). Statistical tests were carried out in SAS and JMP Genomics suite (SAS). For haplotype analysis, exact p-values were calculated using the EM algorithm in a Monte Carlo approach with 10,000 permutations.
Tables 1-19 follow. Tables 1A, 1B and 2 provide differential gene expression analysis data. Table 3 provides data regarding subject characteristics at baseline. Table 4 provides a list of 975 genes differentially expressed between CIS and controls at baseline. Table 5 provides data regarding mean predictive accuracy of the top seven gene pairs. Table 6 provides data relating to the signature of group #1 patients. Table 7 provides genotyping data of 5 TOB1 SNPs in patients with mild (n=62) or severe (n=74) MS. Tables 9 to 19 present data resulting from the statistical model analysis as described herein. Terms used in the tables are as follows: The term “SD” in the context of statistical analysis refers to the standard deviation, as known in the art. The term “Ave.” refers to the statistical average, as known in the art. The term “Grp1” refers to Group #1 as described herein.
This application claims the benefit of U.S. Provisional Application No. 61/083,505, filed Jul. 24, 2008, U.S. Provisional Application No. 61/103,215, filed Oct. 6, 2008, and U.S. Provisional Application No. 61/108,469, filed Oct. 24, 2008, all of which are incorporated herein by reference in their entireties and for all purposes.
The invention was supported, in whole or in part, by a grant from the National Institutes of Health (2R01NS026799). The Government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2009/051750 | 7/24/2009 | WO | 00 | 4/14/2011 |
Number | Date | Country | |
---|---|---|---|
61083505 | Jul 2008 | US | |
61103215 | Oct 2008 | US | |
61108469 | Oct 2008 | US |