The invention relates to the field of medicine and to the field of diagnostics. The invention in particular relates to means and methods for typing cells and cell isolates from individuals suffering from, or at risk of suffering from, a psychiatric disorder, particularly a depressive disorder.
The invention is exemplified herein below predominantly for depressive disorders; however, this does not mean that the invention is limited thereto. The means and methods of the invention are suited for all psychiatric disorders, although depressive disorders are preferred. Depressive and anxiety disorders have a high lifetime prevalence (16% and 10%, respectively) and frequently run a chronic course. There is considerable co-morbidity between depressive and anxiety disorders. They have many symptoms in common, similar pathogenic mechanisms have been proposed, and both disorders can be treated successfully with similar antidepressants. The pathogenesis of depressive and anxiety disorders is largely unknown, but it is clear that genetic vulnerability and environmental risk factors are both important. At least two meta-analyses (Sullivan et al., 2000; Hettema et al., 2001) suggest that familial aggregation for depressive and anxiety disorders is due to genetic effects (heritability: depression 37%, panic disorder 43%, generalized anxiety disorder 32%), with a minimal contribution of environmental effects shared by family members, but substantial individual-specific environmental effects.
Depression has a large impact on human health (morbidity and mortality) and society in general and will be the second most important medical disorder by the year 2020, according to the World Health Organization. The three brain systems that are most strongly implicated in the pathogenesis of depression are: (i) the midbrain serotonin (5-HT) system; (ii) the medial prefrontal cortex (mPFC); and (iii) the hypothalamic-pituitary-adrenal (HPA) axis. Midbrain serotonergic neurons in the raphe nuclei (RN) participate in many physiological functions and are considered to be the cellular target of serotonin-specific reuptake inhibitors (SSRIs), the main class of drugs used to treat psychiatric illnesses. Their activity is controlled by 5-HT1A autoreceptors and by input from many brain areas, including the mPFC (see Hajós et al., 1998; Peyron et al., 1995). A disturbed 5-HT transmission of the raphe nuclei to their targets forms a plausible explanation for depression, since SSRIs successfully normalise mood in many patients, whereas depletion of the serotonin precursor tryptophan results in acute recurrence of depression (Artigas et al., 1996; Stahl, 2000; Blier and Bergeron, 1998; Haddjeri et al., 1998). In animal models, the 5-HT1A receptor was shown to act during development to establish normal anxiety-like behaviour in the adult (Gross et al., 2002). Relevance to humans is demonstrated by decreased binding of 5-HT1A tracers (Drevets et al., 2000; Sargent et al., 2000) and increased 5-HT1A autoreceptor density in the RN of depressed patients (Stockmeier et al., 1998). Polymorphisms of the serotonin transporter gene have also been associated with depression symptoms (Mann et al., 2000) and the occurrence of depression after stressful life-events (Caspi et al., 2003) or tryptophan depletion (Neumeister et al., 2002).
The prefrontal cortex regulates cognitive and associative functions and is involved in planning and execution of complex tasks (see Fuster, 1997). Depressed patients show hypoperfusion in mood related areas of the PFC (Drevets et al., 1997), which correlates with depression severity and normalises during treatment with SSRI (Soares and Mann, 1997). The mPFC is one of the few forebrain areas that projects densely to the serotonergic RN (Hajós et al., 1998; Peyron et al., 1995) and electrical stimulation of the mPFC indeed modulates the activity of 5-HT neurons (Hajós et al., 1998; Celada et al., 2001). In turn, the mPFC is densely innervated by serotonergic RN afferents (Azmitia and Segal, 1978). Especially layer V pyramidal neurons express 5-HT1A and 5-HT2A receptors (Pompeiano et al., 1994; Kia et al., 1996) and GABAergic interneurons express 5-HT2A and 5-HT3 receptors (see Jakab and Goldman-Rakic, 2000). The precise impact of 5-HT receptor activation in the mPFC microcircuits is currently unknown and may hold important clues to the understanding of depression. Hyperactivity of the HPA-axis is found in about 50% of depression patients. Successful antidepressive treatment often normalizes the HPA-axis (Inder et al., 2001) and corticotrophin releasing hormone (CRH)1-antagonists have antidepressant effects (Kunzel et al., 2003). Disturbed 5-HT transmission influences hypothalamic CRH producing neurons and the ensuing ACTH release from the pituitary and cortisol from the adrenal in animals and humans (Jorgensen et al., 2003). Furthermore, increased CRH gene expression in these hypothalamic targets of the serotonin system was observed in post mortem material of depressed patients (Raadsheer et al., 1995). Whether disturbed HPA axis activity is secondary to a disturbed 5-HT system or constitutes its own source of pathology remains an open question (see Kagamiishi et al., 2003; Montgomery et al., 2001; Oshima et al., 2003; Summers et al., 2003). 
In either case, the HPA axis provides a non-invasive read-out parameter, cortisol, which can be assessed in animals and in large scale human studies, by means of provocation tests (e.g. dexamethasone suppression test) and non-invasive salivary sampling (Kirschbaum and Hellhammer, 1989). Taken together, the evidence outlined above clearly identifies the raphe serotonin system, the mPFC and the HPA axis as central systems in the pathogenesis of the depression spectrum and emphasizes the evident interplay between the three systems.
Although much progress has been made in the field of depression, there is still much to be learned. Depression and anxiety disorders frequently run a chronic course, with a number of negative health care consequences, including increased medical consumption, disability, somatic morbidity and mortality. Given an extremely variable natural history, the most viable route for prevention appears to be to design ways to detect those at risk for an unfavourable prognosis in an early stage, tailoring interventions to the projected prognosis. A sobering finding, common to most psychiatric disorders, is that, until now, not one single determinant explains more than 15% of the aetiology of either disorder (McGuire and Troisi, 1998). Factors that have been identified are mostly very common, implying that they do not carry very high relative or absolute risks for incidence. Moreover, many risk factors (such as multiple loss) are not open to intervention. This severely limits the scope for primary and secondary prevention. Major depression is by far the most widely studied condition, concerning determinants of prognosis (for review, see Spijker et al., 2002). The predictive power of clinical factors, such as co-morbidity or duration and severity of earlier episodes, although limited, is hopeful, as these factors can be assessed in routine clinical work. However, these factors do not explain the wide variation in the prognosis. It is highly likely that underlying (molecular) biological factors determine both the clinical features of index episodes and (in interaction with environmental factors) their subsequent course. Moreover, individuals with similar depression-like symptoms can have very different clinical outcomes later in life. In fact many individuals suffer from depression-like symptoms at some point in their life but these are temporary and full recovery is possible without reverting to medication treatment. 
It is, at present, difficult to distinguish between groups that will recover without medication treatment and groups that do not recover without medication. This makes it difficult to determine the best treatment plan for individuals that present themselves with depression-like symptoms. Moreover, after a first depressive episode, about 50% of the patients will experience a recurrence of depression within a year. Therefore, physicians advise the patient, according to protocol, to stay on antidepressant medication for six months after remission. However, most patients do not comply with this advice because it is difficult to take medication at a time when there is no burden of disease, while adverse effects may be present. Furthermore, with our present knowledge we cannot predict whether a particular patient belongs to the 50% in whom depression would indeed recur when off medication. Finally, at present physicians cannot predict which type of antidepressant (e.g. noradrenergic or serotonergic) will be effective for a particular patient and have minimal adverse effects. The present invention relates to means and methods for typing and predicting: 1) which individuals with depression-like symptoms will become depressed when off medication; 2) which individuals with a depressive disorder will suffer from recurrence after premature termination of prophylactic antidepressant treatment; and 3) which patients benefit from what type of antidepressants, in terms of optimal effect and minimal adverse effect.
In particular, the invention provides a method for typing a cell isolate of an individual suffering from a psychiatric disorder, or at risk of suffering therefrom, the method comprising providing an RNA sample from a cell isolate from said individual, determining RNA levels for a set of genes in said RNA sample, wherein said set of genes comprises at least two of the genes listed in Table 9, preferably at least two of the genes having the gene number 1-106 listed in Table 9 (first column), more preferably at least two of the genes having the gene number 1-142 listed in Table 9 (first column), and typing said isolate on the basis of the levels of RNA determined for said set of genes.
Part of the biological changes described above is probably caused by genetic polymorphisms (Flint et al., 1995; Holsboer et al., 1995). It is generally expected that in the following years a large number of polymorphisms will be discovered in candidate genes that code for proteins which are already known to be involved in the pathogenesis of a psychiatric disease such as, for example, a depression (Merikangas, 2002). Twin studies have shown that depression is at least partly genetically determined, whereby genetic polymorphisms underlie biological factors which, in interaction with environmental factors, determine the course of a psychiatric disease such as a depressive disorder. A method according to the invention allows typing of a cell isolate from an individual based on RNA levels of a set of genes in said cell isolate. Without being bound by theory, it is believed that at least some of the differences in gene expression detected using a method of the invention are due to differentially expressed polymorphic genes. These genes are typically associated with their most proximal gene product (i.e. gene-expression- and protein profiles). Polymorphic differences underlying a psychiatric disease such as a depressive disorder are at least to some extent reflected in differences in gene expression patterns present in a cell isolate. Furthermore, cells of a patient have been exposed to factors related to the disease (e.g. elevated cortisol levels and probably various other as yet unknown factors), which can cause differences in gene expression status of cells between healthy controls and patients. These differences can be visualized by analyzing gene expression such as RNA levels. If required, cells can be contacted with a stimulus for enhancement of differences in gene expression.
Thus, polymorphic genes and the internal and external (with or without stimulus) cell environment provide a unique combination resulting in differences in gene expression patterns between patients and controls.
A preferred method according to the invention further comprises providing an RNA sample from said cell isolate after contacting said cell isolate with a stimulus. Said stimulus can be selected from cytokines such as stem cell factor, colony-stimulating factor, hepatocyte growth factor, interferon, leukemia inhibitory factor, transforming growth factor beta, tumor necrosis factor alpha, and interleukins; lipopolysaccharides (LPS); neurotransmitters such as acetylcholine, norepinephrine, dopamine, serotonin, and gamma aminobutyric acid; and hormones such as vasopressin, thyroid hormone, oestradiol, progesterone, testosterone, glucocorticoid and dehydroepiandrosterone. Preferred stimuli are interleukin (IL) 6 and IL 10, TNF alpha, glucocorticoid, and LPS. A particularly preferred stimulus is provided by LPS, which appeared to be a potent stimulus especially for blood cells.
Preferably, said typing in a method according to the invention is based on a comparison with a reference. In one embodiment, said reference comprises the corresponding RNA levels for said set of genes in an RNA sample from a cell isolate from an individual or a group of individuals not suffering and/or not at risk of suffering from said psychiatric disorder. Alternatively, said reference comprises the corresponding RNA levels for said set of genes in an RNA sample from a cell isolate from an individual or a group of individuals suffering from said psychiatric disorder. In a preferred embodiment, said reference sample comprises both samples from an individual or a group of individuals not suffering and/or not at risk of suffering from said psychiatric disorder and from an individual or a group of individuals suffering from said psychiatric disorder. It is preferred that said group of individuals comprises at least two individuals, more preferred at least three individuals, more preferred at least four individuals, more preferred at least five individuals, more preferred at least ten individuals. A preferred reference comprises RNA levels determined for said set of genes in a cell isolate prior to and/or after contacting said cell isolate with a stimulus.
In a preferred embodiment, said RNA levels from a reference are stored on an electronic storage device, such as, but not limited to, a computer or a server. Said reference can be addressed to compare the determined RNA levels of an individual suffering from a psychiatric disorder, or at risk of suffering therefrom, with said reference. Said comparison preferably provides a resemblance score, which is a measure for the similarity of the determined RNA levels of said individual with the corresponding levels in said reference. An arbitrarily determined threshold can be provided to classify an individual into a group with a high resemblance score as compared to said reference, or a group with a low resemblance score as compared to said reference.
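By way of illustration only, the comparison with a stored reference can be sketched as follows. The text above does not specify how the resemblance score is computed; the Pearson correlation used here, the threshold of 0.5, the function names and all numerical values are assumptions chosen for the example.

```python
from math import sqrt


def resemblance_score(sample, reference):
    """Pearson correlation between two expression profiles (an assumed metric).

    Both arguments map gene identifiers to RNA levels, e.g. log2 intensities.
    Only genes present in both profiles are compared.
    """
    genes = sorted(set(sample) & set(reference))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    xs = [sample[g] for g in genes]
    ys = [reference[g] for g in genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a profile shows no variation across the shared genes")
    return cov / sqrt(var_x * var_y)


def classify(score, threshold=0.5):
    """Assign a profile to the high or low resemblance group.

    The threshold is arbitrary, as in the description; 0.5 is a placeholder.
    """
    return "high resemblance" if score >= threshold else "low resemblance"


# Hypothetical log2 RNA levels for three genes named in the description.
reference = {"CAPRIN1": 8.1, "ZBTB16": 6.4, "MLC1": 7.2}
individual = {"CAPRIN1": 8.3, "ZBTB16": 6.1, "MLC1": 7.5}
print(classify(resemblance_score(individual, reference)))
```

Any other similarity measure, for example a distance to a class centroid, could take the place of the correlation without changing the thresholding step.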
The invention further provides a method for typing a cell isolate of an individual suffering from a psychiatric disorder, or suspected of suffering therefrom, the method comprising providing a first RNA sample from a cell isolate from said individual, providing a second RNA sample from said cell isolate after contacting said cell isolate with a stimulus, determining RNA levels for a set of genes in said first and second RNA sample, wherein said set of genes comprises at least two of the genes having the gene number 1-106 listed in Table 9 (first column), more preferably at least two of the genes having the gene number 1-142 listed in Table 9 (first column), and typing said isolate on the basis of the levels of RNA determined for said set of genes.
Preferably, said typing is based on the levels of RNA as determined in said RNA sample provided after contacting the cell isolate with a stimulus, or on the ratio of RNA levels determined in said RNA samples provided from a cell isolate prior to and after contacting with a stimulus. It was found by the inventors that typing based on levels of RNA determined for said set of genes after contacting cells with a stimulus, or based on the ratio of RNA levels before and after contacting said cell isolate with a stimulus, provides a more robust typing whereby more information about the disease state of the individual is captured.
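The ratio of RNA levels before and after stimulation can be illustrated with the following minimal sketch. Expressing the ratio as a base-2 logarithm is a common convention and an assumption here, as are the function name and the example values.

```python
from math import log2


def stimulus_log_ratios(before, after):
    """Return log2(after / before) for every gene measured in both samples.

    `before` and `after` map gene identifiers to positive, linear-scale
    RNA levels determined prior to and after contacting with a stimulus.
    """
    ratios = {}
    for gene in sorted(set(before) & set(after)):
        if before[gene] <= 0 or after[gene] <= 0:
            raise ValueError(f"non-positive RNA level for {gene}")
        ratios[gene] = log2(after[gene] / before[gene])
    return ratios


# Hypothetical intensities before and after stimulation (e.g. with LPS).
basal = {"PROK2": 150.0, "ZBTB16": 800.0}
stimulated = {"PROK2": 600.0, "ZBTB16": 400.0}
print(stimulus_log_ratios(basal, stimulated))
```

A positive value indicates induction by the stimulus and a negative value indicates repression; the resulting per-gene ratios can then be compared with a reference in the same way as unstimulated RNA levels.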
A psychiatric disorder comprises a clinical disorder, a mental disorder, a psychotic disorder such as schizophrenia, a depressive disorder, a substance-related disorder such as an alcohol-related disorder or a sedative-related disorder, a somatoform disorder, a factitious disorder, a dissociative disorder, or a personality disorder. A preferred psychiatric disorder comprises a depressive disorder.
The term cell isolate refers to a tissue sample comprising cells from an individual. Preferably, said tissue sample is a sample that can easily be isolated from said individual, including but not limited to a biopsy such as for example a skin biopsy comprising keratinocytes, a buccal swab comprising mucosa cells, and blood comprising blood cells. A preferred tissue sample for use in a method of the invention is provided by whole blood comprising blood cells such as peripheral blood mononuclear cells (PBMC). If required, PBMC can be isolated from an individual in sufficient quantities, allowing typing of said cells according to a method of the invention. The cells from said tissue sample can be dissociated, if required, to obtain a preferably single cell suspension. Methods for dissociation of cells, employing for example proteases such as collagenase and trypsin, are known by a skilled artisan. An RNA sample can be obtained from said cell isolate by known methods including, for example, Trizol (Invitrogen; Carlsbad, Calif.), RNAqueous® Technology (Qiagen; Venlo, the Netherlands), Maxwell™ 16 Total RNA Purification Kit (Promega; Madison, Wis.) and, preferably, PAXgene™ Blood RNA Kit (Qiagen; Venlo, the Netherlands).
A particularly preferred tissue sample is whole blood comprising blood cells.
If required, said cell isolate can be stored prior to providing an RNA sample from said cell isolate, under conditions that preserve the quality of the RNA. Examples of such preservative conditions are known in the art and include fixation using e.g. formalin, the use of RNase inhibitors such as RNAsin™ (Pharmingen) or RNAsecure™ (Ambion), the use of preservative solutions such as RNAlater™ (Ambion), and reagents such as guanidine thiocyanate, or comparable reagents, for example, as used in the PAXgene™ Blood RNA Kit.
A cell isolate can be contacted with a stimulus by incubating said cell isolate, preferably comprising single cells, with said stimulus. For this, said cell isolate can be incubated or cultured in a medium suited for said cell isolate under conditions that favor survival of the cells. Said contacting can be for a predetermined period of time, ranging from about 1 minute to several days. A preferred time period ranges between about 1 hour and about 24 hours, more preferred between about 2 hours and about 12 hours, and more preferred is about 5 hours. Methods and means for incubating or culturing a cell isolate are known in the art and can be obtained from, for example, Sigma-Aldrich and Invitrogen-Gibco.
Methods for determining RNA levels for a set of genes are known in the art and comprise Northern blotting, amplification methods such as based on polymerase chain reaction (PCR) and nucleic acid sequence based amplification (NASBA), and array-based methods. Preferred PCR-based methods comprise multiplex PCR and multiplex ligation-dependent probe amplification (Eldering et al., (2003) Nucleic Acids Res. 31: e153). A more preferred method is real-time quantitative PCR (Bustin (2002) J Mol Endocrin 29: 23-39).
An array format is particularly useful for this purpose. An array comprises probes specific for a gene or gene product in an arrayed format on a solid support. Said probes comprise nucleic acid molecules or mimics thereof such as peptide nucleic acid (PNA) that can hybridize to a labeled copy of RNA isolated from a cell isolate. Said probes preferably comprise a stretch of at least 20 nucleic acid residues that are identical to, or at least 95% similar to, a stretch of nucleic acid residues on an RNA molecule of which the level is to be determined, allowing base-pairing between said probe and said RNA molecule or a labeled copy thereof.
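The length and similarity criteria for a probe can be illustrated with a minimal sketch. It assumes that "similar" is read as ungapped percent identity over the full probe length and that probe and target are written in the same (sense) orientation; the function names and the example sequence are hypothetical.

```python
def best_window_identity(probe, target):
    """Highest ungapped identity of `probe` against any same-length window of `target`.

    RNA and DNA spellings are treated alike by mapping U to T.
    """
    probe = probe.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    n = len(probe)
    if n == 0 or n > len(target):
        raise ValueError("probe must be non-empty and no longer than the target")
    best = 0.0
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        matches = sum(a == b for a, b in zip(probe, window))
        best = max(best, matches / n)
    return best


def probe_qualifies(probe, target, min_length=20, min_identity=0.95):
    """Check the stated criteria: at least 20 residues and at least 95% identity."""
    if len(probe) < min_length:
        return False
    return best_window_identity(probe, target) >= min_identity


# Hypothetical 20-residue probe embedded in a longer hypothetical transcript.
probe = "ATGGCCTTAGCAGTCCATGA"
transcript = "GGC" + probe + "TTTACG"
print(probe_qualifies(probe, transcript))
```

A production probe-design pipeline would additionally screen for cross-hybridization to other transcripts, which this sketch does not attempt.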
Methods for direct or indirect labeling of RNA are known in the art and include Fluorescent Direct Label Kit (Agilent Technologies), GeneBeam™ First Strand cDNA Labeling kit (Enzo Life Sciences), and SuperScript™ Direct or Indirect cDNA Labeling Module (Invitrogen). The label preferably comprises a fluorescent label such as cyanine 3 or cyanine 5. In a preferred embodiment, said first and second RNA sample are labeled with different dyes, allowing simultaneous hybridization of the labeled samples to a single array. Suitable hybridization and washing conditions as described in, for example, protocols from Agilent and Affymetrix can be applied for hybridization and washing of the arrays. A confocal scanning device is used to determine the intensity of label that remained associated with a probe, as a measure for the level of RNA present in a cell isolate. Different bioinformatic software tools, such as, for example, Agilent Feature Extraction, Limma®, Edwards, Loess, and Aquantile, can be applied to analyze the data. Data analysis comprises normalization of the data to reduce bias within and between experiments, such as dye switch and dye swap (see, for example, Sterrenburg et al., Nucleic Acids Res. 2002 Nov. 1; 30(21):e116).
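As an illustration of how a dye-swap pair can reduce dye-related bias in a two-colour experiment, the following minimal sketch averages the log ratios of a forward and a dye-swapped hybridization. It is a generic textbook calculation and not necessarily the normalization performed by the software tools named above; the function names and intensities are hypothetical.

```python
from math import log2


def log_ratio(cy5, cy3):
    """M-value of one spot: log2 of the Cy5 intensity over the Cy3 intensity."""
    return log2(cy5 / cy3)


def dye_swap_average(forward, swapped):
    """Combine a dye-swap pair of spots into one bias-corrected log ratio.

    forward: (cy5, cy3) with the second RNA sample in Cy5, the first in Cy3.
    swapped: (cy5, cy3) with the labels exchanged between the two samples.
    A multiplicative dye bias adds the same term to both M-values, so it
    cancels when the swapped M-value is subtracted.
    """
    m_forward = log_ratio(*forward)
    m_swapped = log_ratio(*swapped)
    return (m_forward - m_swapped) / 2


# Hypothetical spot: the true ratio is 2 and Cy5 reads 1.5 times too bright.
forward = (300.0, 100.0)   # second sample (200 * 1.5) in Cy5, first sample in Cy3
swapped = (150.0, 200.0)   # first sample (100 * 1.5) in Cy5, second sample in Cy3
print(dye_swap_average(forward, swapped))
```

In this example the uncorrected forward ratio would be 3, whereas the averaged log ratio corresponds to the true two-fold difference.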
A depressive disorder refers to a disorder causing consistent loss of interest or pleasure in daily activities for at least a 2 week period, and affecting social, occupational, educational or other important functioning. Said depressive disorder can be dysthymia, a bipolar disorder or manic-depressive illness, or, preferably, major depression which is also known as clinical depression, unipolar depression, and major depressive disorder (MDD). A method of the invention can also be applied for typing a cell isolate of an individual suffering from: 1) MDD or subtypes of MDD comprising melancholic depression, atypical depression, double depression, and MDD with anger attacks; or 2) anxiety disorders (i.e. panic disorder, generalized anxiety disorder, phobia, post-traumatic stress disorder, obsessive-compulsive disorder,
The genes listed in Table 9 with gene numbers 1-106 (first column) were identified because probes specific for these genes yielded a low p-value (t-test for MDD versus Control, <0.005), a low false discovery rate (FDR) of less than 0.01%, and a large effect size (>30%, M-data; >30%, R-data). The FDR of a set of predictions is the expected percentage of false predictions in the set of predictions.
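The three selection criteria can be illustrated with a minimal sketch. The description does not state how the FDR was estimated; the Benjamini-Hochberg adjustment used here is an assumption, as are the function names, the representation of effect size as a fraction, and the example statistics.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(stats, max_p=0.005, max_fdr=0.0001, min_effect=0.30):
    """Keep genes that pass the p-value, FDR (0.01% = 0.0001) and effect-size cut-offs.

    `stats` maps a gene identifier to {"p": p-value, "effect": fractional difference}.
    """
    names = list(stats)
    fdr = benjamini_hochberg([stats[name]["p"] for name in names])
    return [
        name
        for name, q in zip(names, fdr)
        if stats[name]["p"] < max_p
        and q < max_fdr
        and abs(stats[name]["effect"]) > min_effect
    ]


# Hypothetical per-gene statistics for MDD versus control.
example = {
    "gene_a": {"p": 1e-6, "effect": 0.40},
    "gene_b": {"p": 0.2, "effect": 0.50},
    "gene_c": {"p": 1e-7, "effect": 0.10},
}
print(select_genes(example))
```

Only a gene that meets all three cut-offs is retained; in the example, gene_b fails on the p-value and gene_c on the effect size.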
The genes listed in Table 9 with gene numbers 1-142 (which have in addition the alternative numbers 1-142 as present in the second column of Table 9) were identified because probes specific for these genes yielded a good prediction for the disease, and the initial set of 270 was further selected for optimal performance: high participation (100% in PAM-score), high difference between expression values from MDD patients and control (e.g. 0.3≦|Difference (MDD vs. control)|<0.4=2 points, 0.4≦|Difference (MDD vs. control)|<0.5=3 points), low p-value in t-test for MDD vs. Control (P<0.02; e.g. 0.01<P-value≦0.02=2 points, 0.007<P-value≦0.01=3 points), low q-score for MDD vs. Control in SAM analysis (25<q-value<40=3 points, 0<q-value<25=4 points), and, when a gene was represented by multiple probes, similar values for the replicates. Genes with a score>15 were selected for the MDD-marker.
It has been found that the inclusion of at least two genes numbered 1-106 (first column) of Table 9 in a method of the invention already provides a good prediction of the sample as being derived from an individual suffering from depression or not. A more accurate prediction is possible by including more genes numbered 1-106 (first column) of Table 9 in a method of the invention.
It has been found that the inclusion of at least two genes numbered 1-142 (first column) of Table 9 in a method of the invention also provides a good prediction of the sample as being derived from an individual suffering from depression or not. A more accurate prediction is possible by including more genes numbered 1-142 (first column) of Table 9 in a method of the invention.
Preferably at least three, more preferred at least four, more preferred at least five, more preferred at least six, more preferred at least seven, more preferred at least eight, more preferred at least nine, more preferred at least ten, more preferred at least eleven, more preferred at least twelve, more preferred at least thirteen, more preferred at least fourteen, more preferred at least fifteen, more preferred at least sixteen, more preferred at least seventeen, more preferred at least eighteen, more preferred at least nineteen, more preferred at least twenty, more preferred at least twenty-one, more preferred at least twenty-two, more preferred at least thirty, more preferred at least fifty, and most preferred all of the genes numbered 1-106 (first column) in Table 9 are included in a method of the invention.
In another embodiment, a method of the invention includes at least two of the genes (numbered 1-142, first column) listed in Table 9, or analogs thereof. Analogs, which include splice variants of said genes, are listed in Table 9 as an alternative Unigene cluster or as an alternative GenBank ID. More preferred is a method including at least three, more preferred at least four, more preferred at least five, more preferred at least six, more preferred at least seven, more preferred at least eight, more preferred at least nine, more preferred at least ten, more preferred at least eleven, more preferred at least twelve, more preferred at least thirteen, more preferred at least fourteen, more preferred at least fifteen, more preferred at least sixteen, more preferred at least seventeen, more preferred at least eighteen, more preferred at least nineteen, more preferred at least twenty, more preferred at least twenty-one, more preferred at least twenty-two, more preferred at least thirty, more preferred at least fifty, and most preferred all of the genes listed in Table 9.
In a preferred embodiment, a method according to the invention comprises at least two of the genes numbered 1-106 (first column) in Table 9, whereby said genes are numbered 1 and 2. An even more preferred method according to the invention comprises the genes numbered 1-6 of the genes numbered 1-106 (first column) in Table 9, more preferred the genes numbered 1-8 of the genes numbered 1-106 (first column) in Table 9, more preferred the genes numbered 1-12 of the genes numbered 1-106 (first column) in Table 9, more preferred the genes numbered 1-13 of the genes numbered 1-106 (first column) in Table 9 (HBG1, KRT23, AL833005, Caprin1, CENTD3, PROK2, ZBTB16, F11R, FANCE, LOC150166, TMEM4, SLC7A7, and MLC1), more preferred the genes numbered 1-22 of the genes numbered 1-106 (first column) in Table 9.
In another preferred embodiment, a method according to the invention comprises at least two of the genes listed in Table 9, whereby said genes have the alternative gene numbers (second column) 1 and 2. In another preferred embodiment, a method according to the invention comprises at least two of the genes listed in Table 9, whereby said genes are CAPRIN1 and ZBTB16. An even more preferred method according to the invention comprises the genes having the alternative numbers (second column) 1-7 of the genes listed in Table 9, more preferred the genes having the alternative numbers (second column) 1-8 of the genes listed in Table 9, more preferred the genes having the alternative numbers (second column) 1-12 of the genes listed in Table 9, more preferred the genes having the alternative numbers (second column) 1-13 of the genes listed in Table 9, more preferred the genes having the alternative numbers (second column) 1-22 of the genes listed in Table 9.
In another preferred embodiment, a method according to the invention comprises at least the genes CLEC4A, F11R, TMEM4 and SLC7A7. In another preferred embodiment, a method according to the invention comprises at least the genes CLEC4A, PLSCR1, PROK2, ZBTB16 and MLC1. In another preferred embodiment, a method according to the invention comprises at least the genes CORO1A, FCN2, KRT23, MLC1, NRGN, PLXNB2, PPBP, PTGS1, RNPC1, SOX4 and VAMP8. In another preferred embodiment, a method according to the invention comprises at least the genes MLC1, RNPC1, PROK2, CLEC4A, CAPRIN1 and ZBTB16.
In another preferred embodiment, a method according to the invention further comprises the gene PPBP. In a preferred embodiment a set of genes of the present invention further comprises the gene PPBP.
Typing of a cell isolate of an individual exhibiting symptoms of depression with a method of the invention can be used to determine whether said individual is indeed suffering from a depressive disorder or is suffering from depressive symptoms that will not turn into a depressive disorder. A method of the invention can also be used to determine the biological severity of the depressive disorder. In a preferred embodiment, said typing allows prognosticating the severity of the syndrome, and/or prognosticating a response to medical treatment.
Medical treatment comprises antidepressant medication such as lithium, tricyclic antidepressants (TCAs), monoamine oxidase inhibitors (MAOIs), and selective serotonin reuptake inhibitors such as citalopram (Celexa), fluoxetine (Prozac), paroxetine (Paxil), and sertraline (Zoloft). A method of the invention is preferably used to prognosticate a response of an individual towards treatment with a selective serotonin reuptake inhibitor.
The invention further provides a set of probes for typing a cell isolate of an individual suffering from a depressive disorder or suspected of suffering therefrom, whereby said set of probes comprises nucleic acid sequences specific for at least two of the genes numbered 1-106 (first column) in Table 9. In another preferred embodiment, said set of probes comprises nucleic acid sequences specific for at least two of the genes listed in Table 9. Said set of probes preferably comprises at least two probes specific for genes numbered 1-106 (first column) in Table 9, wherein each of said at least two probes is specific for a different gene of the genes numbered 1-106 (first column) of Table 9. In another preferred embodiment, said set of probes comprises at least two probes specific for genes listed in Table 9, wherein each of said at least two probes is specific for a different gene of Table 9. Preferably said set of probes comprises at least three probes that are specific for different genes of the genes numbered 1-106 (first column) in Table 9. In another preferred embodiment, said set of probes comprises at least three probes that are specific for different genes listed in Table 9. Said set of probes preferably comprises probes that are specific for each of the genes numbered 1-106 (first column) in Table 9. In another preferred embodiment, said set of probes comprises probes that are specific for each of the genes listed in Table 9. Said probes are preferably selected to hybridize to specific exons of the genes numbered 1-106 (first column) in Table 9, or to hybridize to the 3′ ends of messenger RNA corresponding to at least two of the genes numbered 1-106 (first column) in Table 9. In another preferred embodiment, said probes are selected to hybridize to specific exons of the genes listed in Table 9, or to hybridize to the 3′ ends of messenger RNA corresponding to at least two of the genes listed in Table 9.
Methods for designing said probes specific for the genes numbered 1-106 (first column), or said probes specific for the genes listed in Table 9, are known in the art and have been discussed, for example, in Bouwman et al. (2006) J Neurochem 99: 84-96; and Stam et al. (2007) Eur. J. Neurosci. (in press).
In a preferred embodiment, said set of probes is capable of hybridizing to at least two of the genes numbered 1-106 (first column) in Table 9, and/or an RNA product thereof. In a more preferred embodiment, said set of probes is capable of hybridizing to at least two of the genes listed in Table 9, and/or an RNA product thereof. Said set of probes preferably comprises between 2 and 500 different probes, wherein at least two of said probes are specific for a different gene numbered 1-106 (first column) in Table 9, or more preferably for any different gene listed in Table 9. Preferably said set comprises between 10 and 100 different probes wherein at least two of said probes are specific for a different gene numbered 1-106 (first column) in Table 9, or more preferably for any different gene listed in Table 9. In a particularly preferred embodiment said set comprises 20 probes, wherein each of said probes is specific for a different gene numbered 1-106 (first column) in Table 9, or more preferably for any different gene listed in Table 9.
In another preferred embodiment of a method according to the invention, said set of probes comprises at least probes specific for the genes CLEC4A, F11R, TMEM4 and SLC7A7. In another preferred embodiment of a method according to the invention, said set of probes comprises at least probes specific for the genes CLEC4A, PLSCR1, PROK2, ZBTB16 and MLC1. In another preferred embodiment of a method according to the invention, said set of probes comprises at least probes specific for the genes CORO1A, FCN2, KRT23, MLC1, NRGN, PLXNB2, PPBP, PTGS1, RNPC1, SOX4 and VAMP8. In another preferred embodiment of a method according to the invention, said set of probes comprises at least probes specific for the genes MLC1, RNPC1, PROK2, CLEC4A, CAPRIN1 and ZBTB16.
In another preferred embodiment, a set of probes according to the invention further comprises a probe specific for the gene PPBP.
In a further preferred embodiment, said set of probes comprises 12 probes, wherein said set of probes is specific for the genes numbered 1-12 of the genes numbered 1-106 (first column) in Table 9. The RNA levels of the genes numbered 1-12 of the genes numbered 1-106 (first column) in Table 9, namely HBG1, KRT23, AL833005, CAPRIN1, CENTD3, PROK2, ZBTB16, F11R, FANCE, LOC150166, TMEM4 and SLC7A7, are preferably used for typing a cell isolate of an individual suffering from a depressive disorder.
In another further preferred embodiment, said set of probes comprises 7 probes, wherein said set of probes is specific for the genes having the alternative numbers (second column) 1-13 in Table 9. The RNA levels of the genes having the alternative numbers (second column) 1-13 in Table 9, KRT23, CAPRIN1, PLSCR1, PROK2, ZBTB16, TMEM4, CLEC4A, MLC1, are preferably used for typing a cell isolate of an individual suffering from a depressive disorder.
In another preferred embodiment, a method according to the invention further comprises the use of RNA levels of the gene PPBP for typing a cell isolate of an individual suffering from a depressive disorder.
In another preferred embodiment, said set of probes comprises no more than 1000 probes, preferably no more than 900, preferably no more than 800, preferably no more than 700, preferably no more than 600, preferably no more than 500, preferably no more than 400, preferably no more than 300, preferably no more than 200, preferably no more than 142, preferably no more than 125, preferably no more than 106, preferably no more than 90, preferably no more than 80, preferably no more than 70, preferably no more than 60, preferably no more than 50, preferably no more than 40, preferably no more than 30, preferably no more than 25, preferably no more than 20, preferably no more than 15, preferably no more than 10.
The invention also provides the use of the set of probes according to the invention for prognosticating syndrome severity, and/or a response to medical treatment for an individual suffering from a depressive disorder.
In another embodiment, the invention provides a set of primers for typing a cell isolate of an individual suffering from a depressive disorder or suspected of suffering there from, whereby said set of primers comprises primers specific for at least two of the genes numbered 1-106 (first column) of Table 9. In another embodiment, the invention provides a set of primers for typing a cell isolate of an individual suffering from a depressive disorder or suspected of suffering there from, whereby said set of primers comprises primers specific for at least two of any of the genes listed in Table 9. It is preferred that primers specific for a gene numbered 1-106 (first column) or any gene of Table 9 can be used in an amplification method to determine a level of expression of the said gene numbered 1-106 (first column) or any gene of Table 9. Therefore, it is preferred that these primers result in the amplification of not more than 2 kilobases of a continuous stretch of nucleic acid sequences on a mRNA product of said gene numbered 1-106 (first column) or any gene of Table 9, more preferred not more than 1 kilobase, most preferred between 50 bases and 200 bases. It is furthermore preferred that said stretch of nucleic acid sequences on a mRNA product spans an exon-intron boundary in said gene numbered 1-106 (first column) or any gene of Table 9.
In another preferred embodiment of a method according to the invention, said set of primers comprises at least primers specific for the genes CLEC4A, F11R, TMEM4 and SLC7A7. In another preferred embodiment of a method according to the invention, said set of primers comprises at least primers specific for the genes CLEC4A, PLSCR1, PROK2, ZBTB16 and MLC1. In another preferred embodiment of a method according to the invention, said set of primers comprises at least primers specific for the genes CORO1A, FCN2, KRT23, MLC1, NRGN, PLXNB2, PPBP, PTGS1, RNPC1, SOX4 and VAMP8. In another preferred embodiment of a method according to the invention, said set of primers comprises at least primers specific for the genes MLC1, RNPC1, PROK2, CLEC4A, CAPRIN1 and ZBTB16.
In another preferred embodiment, said set of primers comprises no more than 1000 primers, preferably no more than 900, preferably no more than 800, preferably no more than 700, preferably no more than 600, preferably no more than 500, preferably no more than 400, preferably no more than 300, preferably no more than 200, preferably no more than 142, preferably no more than 125, preferably no more than 100, preferably no more than 90, preferably no more than 80, preferably no more than 70, preferably no more than 60, preferably no more than 50, preferably no more than 40, preferably no more than 30, preferably no more than 25, preferably no more than 20, preferably no more than 15, preferably no more than 10.
The invention furthermore provides the use of the set of primers according to the invention for prognosticating syndrome severity, and/or a response to medical treatment for an individual suffering from a depressive disorder.
An unchallenged sample was taken within 1 h after the blood had been withdrawn (and at least 10 min after). For this, a full heparin tube was inverted 5 times prior to opening. Then, 2.5 ml was transferred to a PAXgene tube (PreAnalytiX GmbH) by decantation or pipetting (large tip opening). The PAXgene tube was inverted at least 10 times and kept at room temperature for at least 2 h. It was then stored at −20° C.
A challenged sample was made from the remaining blood in the heparin tube (allowing enough oxygen supply). To the remaining blood, 1% (V/V) of LPS solution (final concentration: 10 ng LPS/ml blood) was added, and the tube was inverted 3 times to mix well immediately after addition of the LPS solution. The tube was kept at 37° C., slowly rotating (or otherwise lying flat and slowly shaking), for 5-6 h. After this, the tube was inverted 5 times prior to opening. Then, 2.5 ml was transferred to a PAXgene tube. This PAXgene tube was inverted at least 10 times and kept at room temperature for at least 2 h. It was then stored at −20° C.
RNA isolation proceeded according to the PAXgene protocol (PreAnalytiX GmbH). After elution, the RNA concentration was checked. Samples were subsequently precipitated with 0.3 M NaAc and ethanol with the addition of 0.1 μg linear acrylamide (co-precipitant), and stored at −80° C. until use. Labeling of RNA (1 μg) for use in microarray experiments was according to the Agilent protocol. Labeled RNA (1.5 μg Cy3-labeled basal sample; 1.25 μg Cy5-labeled LPS sample) was hybridized onto the 44 k Human Agilent whole genome arrays according to the protocol. After washing (Agilent protocol), the arrays were quickly dried using acetonitrile. The array was scanned using the Agilent scanner.
Signal intensities were extracted using Agilent Feature Extraction (v8.0), and the data were analyzed using Limma (R), using median signals and median background. Non-uniform signals were flagged prior to analysis. After background subtraction (Edwards, offset 30), within-array normalization (Loess) and between-array normalization (Aquantile), data were exported to SPSS or Excel. Further data selection consisted of the following criteria (in that order), resulting in ~25,000 genes:
For any flagged red or green signal, the signal was discarded
Signal intensities for any gene should be >6.8 (log 2)
For any gene the signal should be present in >80% of the control and in >80% of the MDD samples
Subsequent analyses were done on the ratio (LPS vs basal; M), the red (LPS; R) or the green (basal; G) signal.
To identify classifier genes, the program PAM was used. Disease-state was used as identifier both for the M-data and for the separate R- and G-datasets. The prediction errors were low, i.e. 35-40%. The analysis resulted in sets of 105 probes that participated >90% in the prediction. From these genes a selection was made based on whether probes yielded a low p-value (t-test for MDD versus Control, <0.005 & FDR<0.01%) and a large effect size (>30%), and on whether multiple probe sets that represent a single gene had low p-values (<0.05) and a high effect size (>20%) as well. In addition, the selected probe should be identifiable as an entry in Entrez Gene. As such, an initial selection of 12 genes was made. Some genes were informative for both the G- and R-data. For the G- and R-data, as well as for the M-data, the Fisher z-transformed Pearson correlation of the expression of each individual to the average of the expression from Controls was subtracted from the Fisher z-transformed Pearson correlation of the expression of each individual to the average of the expression from MDD. Samples with a high score conform to the average MDD profile whereas samples with a low score conform to the average control profile. Figures representing these correlations for the separate R-data and M-data are shown (FIG. 13A,B, respectively). When the individual scores for the M-data and the G- & R-data were added, an even more robust difference could be made (
In another case, the program PAM was used to identify classifier genes. Disease-state was used as identifier for both the separate R- and G-datasets. The prediction errors were low, i.e. 35-40%. The PAM analysis resulted in sets of 160 and 110 probes (G- and R-dataset, respectively) that participated >90% in the prediction. This set was reduced further by selection based on the effect size (>30% differential expression between MDD and control), p-value (<0.05) and the robustness of the gene in the classifier (classification participation). In addition, the selected probe should be identifiable as an entry in Entrez Gene. As such, an initial selection of 12 genes was made for each data-set. Some genes were informative for both the G- and R-data. For the G- and R-data, as well as for the M-data, the Fisher z-transformed Pearson correlation of the expression of each individual to the average of the expression from Controls was subtracted from the Fisher z-transformed Pearson correlation of the expression of each individual to the average of the expression from MDD. Samples with a high score conform to the average MDD profile whereas samples with a low score conform to the average control profile. A figure representing this correlation for the R-data is shown (
Major Depressive Disorder (MDD) is a highly prevalent psychiatric disorder that accounts for major psychological, physical and social impairments. Lifetime prevalence for MDD is estimated at 15-17%, with women being affected twice as often as men1,2. Different factors have been found to play a role in the onset of MDD, including biological3, genetic1,4, and environmental factors (e.g. stress)5, but the exact pathogenesis of MDD remains largely unclear. At present, criteria for MDD diagnosis and treatment are based on various signs and symptoms that do not always fit into strict diagnostic categories such as DSM IV. Despite the fact that various risk-factors are known (e.g. genetics, gender, age of onset), biological markers that could support diagnosis, or predict the risk for the (re)occurrence of MDD or treatment outcome, are not currently available.
Recent studies have suggested gene-expression profiling in blood cells as a promising alternative for identification of disease classifiers and risk markers6-9. Blood cells could be viewed as biosensors, of which the gene expression is influenced by the surrounding body fluids and all effector molecules therein. As such, gene expression of blood cells might reveal previous or developing disease states and thus present valuable diagnostic markers of an individual, with the perspective of predicting treatment efficacy or a patient's long-term prognosis. As a first step towards this goal, we explored the possibility to segregate subjects with MDD from healthy controls based on gene expression in whole blood. To this end, we examined whole genome gene expression from venous whole blood samples of unmedicated subjects (MDD and healthy controls) from the Netherlands Study of Depression and Anxiety (NESDA) cohort10.
Generally, a limitation of genomics-based assessment of human samples is individual variability. Despite sample matching, inter-individual and temporal variation in gene expression patterns occur in both brain11 and in whole blood12. In this study, we applied a powerful gene expression stimulus ex vivo. Previously, we have shown that such a stimulus generates a higher signal to basal state ratio13, thereby uniquely revealing differences in genomic responses related to disease state and to the individual's genotype. A lipo-polysaccharide (LPS) stimulus was chosen because it is a strong inducer of gene expression in human monocytes13,14. Moreover, studies on blood lymphocytes have revealed a close association between the state of the immune system and major psychiatric disorders15-19, and LPS might induce depressive-like behavior when applied in vivo in human subjects as well as in rodents20,21. We used this approach to classify MDD patients based on whole blood gene expression profiles from unmedicated MDD patients and controls selected from the Netherlands Study of Depression and Anxiety and identified a selection of genes of which the expression is predictive for the disease status.
In this study, we enrolled a total of 35 subjects currently experiencing a single or recurrent MDD episode, and 37 healthy controls with no current or previous diagnosis of MDD (Table 1 and 2); all cases are part of the Netherlands Study of Depression and Anxiety (NESDA; www.nesda.nl) cohort10,22.
For these subjects, blood was taken at two clinical sites (Amsterdam, Leiden). The Composite International Diagnostic Interview, lifetime version 2.1 (CIDI23), was used to diagnose psychiatric disorders according to DSM-IV algorithms24. The Inventory of Depressive Symptomatology (IDS-SR30) was used to define MDD severity10. MDD cases with IDS scores lower than 21 and healthy controls with scores higher than 13 were excluded from further study. None of the cases and controls had co-morbid physical (e.g. malignancies, cardiovascular, neurological, immune or endocrine disorders) or psychiatric disorders (e.g. personality disorders, threatening compliance and safety) other than the diagnosed MDD and anxiety disorders for the patient group. Nor did any MDD patients or controls have a previous record of taking any medication, with the exception of sporadic use of paracetamol and ibuprofen (i.e. <3 times per week). For all MDD patients, healthy controls were matched for age, sex, and smoking (previous, quit or no smoking; and age of onset of smoking), in this order. For females, additional matching criteria consisted of stage of cycle, use of contraceptive drugs, and known pregnancy25. The Medical Ethics committee of the VU University Medical Center approved the study protocol. Written informed consent was obtained from all participants. For a limited set of patients and controls, the leukocyte count was analyzed. It did not differ by disease status, nor did it correlate with the MDD-marker (Table 4).
Serial venous whole blood samples were obtained between 8:00 a.m. and 10:00 a.m., after overnight fasting, in one 7 ml heparin-coated tube (Greiner). Between 10-60 minutes after blood draw, 2.5 ml of blood was transferred into a PAXgene tube (Qiagen) and used as the basal (non-LPS stimulated) sample. This tube was kept at room temperature for a minimum of 2 h, and subsequently stored at −20° C. The remaining blood (4.5 ml) was stimulated by addition of LPS (10 ng/ml blood; E. coli, Sigma). LPS-stimulated samples were laid flat and incubated at a slow rotation for 5-6 h at 37° C. before a 2.5 ml sample of this LPS-stimulated blood was transferred into a PAXgene tube (Qiagen), and treated as described for the basal sample.
RNA isolation was carried out as described previously13, including DNase treatment, after thawing of the PAXgene tubes for 2 h at room temperature. RNA quality was first determined by spectrophotometry (NanoDrop ND-1000 UV-Vis Spectrophotometer, Nanodrop Technologies). Because of contaminants due to the isolation procedure, visible in the spectrophotogram at 200-230 nm, RNA samples were precipitated using ethanol and NaAc with addition of 0.1 μg linear acrylamide. Subsequently, RNA quality was determined by spectrophotometry and by using the RNA 6000 NanoChip kit on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, Calif.). RNA samples passing quality control criteria (RIN values >8) were used for further analysis.
Labeling of RNA (1 μg) was according to the Agilent protocol (Agilent technologies, Palo Alto, USA). Using spectrophotometry, cRNA concentration and labeling efficiency were determined (Nanodrop ND-1000 UV-Vis Spectrophotometer, Nanodrop Technologies). From each patient, labeled cRNA (1.5 μg Cy3-labeled basal blood sample; 1.25 μg Cy-5 labeled LPS-stimulated blood sample) was hybridized for 15-17 h at 65° C. onto the 44K Human Agilent whole genome arrays according to the manufacturer's protocol. After washing, the arrays were quick-dried using acetonitrile. The array was scanned using the Agilent GeneArray Scanner (Agilent G2505B) with default settings for two-color hybridization.
Signal intensities were extracted using Agilent Feature Extraction (v8.0). Data were analyzed with Limma (Bioconductor) using median signals and mean background. Control spots were normalized, but were set not to influence normalization. Non-uniform signals were removed prior to further analysis. After background subtraction (Edwards, offset 30), within-array normalization (Loess) and between-array normalization (Aquantile), data were exported to SPSS. Further data selection consisted of the following criteria (in this order): 1) signal intensities for any gene should be 1.3× background (>6.8 (log 2)) and have non-saturated levels (<15.7 (log 2)); 2) for any gene the signal should be present in >80% of the control as well as in >80% of the MDD samples, to avoid type-1 errors due to a limited number of participants in each group. Genes that did not meet these criteria were given an absent-call. From the total of 41675 spots, filtering resulted in 22789 spots for the basal sample and 22935 spots in the LPS-stimulated sample, with an overlap of 21107 spots (>92%).
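By way of illustration only, the two selection criteria above may be sketched as follows. The data layout (a dictionary of log 2 signals per probe and per group, with None standing for a removed non-uniform signal) and all function names are assumptions made for this sketch; they are not part of the Limma or Feature Extraction software.

```python
# Illustrative sketch of the probe selection criteria; names and layout are hypothetical.
LOWER_LOG2 = 6.8    # about 1.3x background (log 2), as stated in the text
UPPER_LOG2 = 15.7   # saturation limit (log 2), as stated in the text
MIN_PRESENT = 0.80  # a probe must be present in more than 80% of each group


def is_present(signal):
    """A signal is present if it was not removed (None) and lies inside the window."""
    return signal is not None and LOWER_LOG2 < signal < UPPER_LOG2


def fraction_present(signals):
    """Fraction of samples in one group with a present signal."""
    return sum(is_present(s) for s in signals) / len(signals)


def filter_probes(probes):
    """Return the ids of probes that receive a present-call.

    probes maps probe id -> {"control": [...], "mdd": [...]} of log 2 signals.
    """
    kept = []
    for probe_id, groups in probes.items():
        if (fraction_present(groups["control"]) > MIN_PRESENT
                and fraction_present(groups["mdd"]) > MIN_PRESENT):
            kept.append(probe_id)
    return kept


# Made-up values, for demonstration only.
example = {
    "probe_A": {"control": [8.1, 9.0, 7.5, 8.8, 9.2], "mdd": [8.4, 8.9, 7.9, 9.1, 8.0]},
    "probe_B": {"control": [6.1, 9.0, 7.5, 8.8, 9.2], "mdd": [8.4, 8.9, 7.9, 9.1, 8.0]},
    "probe_C": {"control": [8.1, 9.0, 7.5, 8.8, 9.2], "mdd": [8.4, None, 16.0, 9.1, 8.0]},
}
print(filter_probes(example))  # ['probe_A']
```

In this made-up example, probe_B fails because only 4 of 5 control signals (exactly 80%, not more than 80%) exceed the lower limit, and probe_C fails because one MDD signal was removed and one is saturated.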
We used a Significance Analysis for Microarrays (SAM) analysis tool using R functions (Bioconductor, SAMR version 1.2) as an alternative method to identify differentially expressed genes and to estimate a False Discovery Rate (FDR). SAM was carried out on the data set using 100 permutations.
The Prediction Analysis of Microarrays (PAM)26 analysis tool (Bioconductor, pamr version 1.28) was used to generate cross-validated gene lists that distinguish MDD from healthy controls at different misclassification error rates. The randomly assigned first round of 21 MDD patients and 21 matched healthy controls was used as a training set, and the second round of 12 MDD patients and 13 matched healthy controls (of which 2 were not matched on gender) served as a validation set for the microarray. The classification accuracy was assessed by a 10 times repeated 10-fold cross-validation (leave-one-out) on the training set. Genes were further selected for optimal performance using a point score: high participation (100% in PAM-score); high difference between expression values from MDD patients and controls (e.g. 0.3≦|Difference(MDD vs. control)|<0.4 = 2 points, 0.4≦|Difference(MDD vs. control)|<0.5 = 3 points); low p-value in t-test for MDD vs. Control (P<0.02; e.g. 0.01<P-value<0.02 = 2 points, 0.007<P-value<0.01 = 3 points); low q-score for MDD vs. Control in SAM analysis (25<q-value<40 = 3 points, 0<q-value<25 = 4 points); and, when a gene was represented by multiple probes, similar values for the replicates. Genes with a score >15 were selected for the MDD-marker.
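The point score may be illustrated by the following sketch, which encodes only the bins quoted above as examples. The full scoring table (including the points for PAM participation and for replicate agreement, and any bins not quoted) is not given in this description, so values outside the quoted bins are assumed here to contribute zero points, and the function names are hypothetical. The sketch therefore yields a partial score and cannot by itself reproduce the >15 selection threshold.

```python
# Partial, illustrative scoring sketch; only the bins quoted in the text are encoded.

def effect_points(difference):
    """Points for the absolute MDD vs. control expression difference."""
    d = abs(difference)
    if 0.3 <= d < 0.4:
        return 2
    if 0.4 <= d < 0.5:
        return 3
    return 0  # assumption: bins not quoted in the text score nothing here


def p_value_points(p):
    """Points for the t-test p-value (MDD vs. Control)."""
    if 0.01 < p < 0.02:
        return 2
    if 0.007 < p < 0.01:
        return 3
    return 0  # assumption, as above


def q_value_points(q):
    """Points for the SAM q-value (MDD vs. Control)."""
    if 25 < q < 40:
        return 3
    if 0 < q < 25:
        return 4
    return 0  # assumption, as above


def partial_score(difference, p, q):
    """Sum of the three partial criteria for one gene."""
    return effect_points(difference) + p_value_points(p) + q_value_points(q)


print(partial_score(-0.45, 0.008, 12.0))  # 3 + 3 + 4 = 10
```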
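The MDD-marker score of an individual sample (i) is defined as its Fisher z-transformed Pearson correlation with the average MDD profile (MDD) minus its Fisher z-transformed Pearson correlation with the average control profile (Control) for the indicated number of genes, based on a calculation by Scherzer et al7. The average profile was determined for each set of samples (training and validation set) and for each technique used (microarray and qPCR).
$$\text{MDD-marker score}_i = z\big(r(x_i, \overline{x}_{\mathrm{MDD}})\big) - z\big(r(x_i, \overline{x}_{\mathrm{Control}})\big), \qquad z(r) = \tfrac{1}{2}\ln\frac{1+r}{1-r}$$
where $x_i = \{G_{i1}, \ldots, G_{in}\}$, $G_{in}$ is the gene expression for individual i at gene n, n is the number of genes, r is the Pearson correlation, and $\overline{x}_{\mathrm{MDD}}$ and $\overline{x}_{\mathrm{Control}}$ are the average MDD and control profiles.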
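A minimal sketch of this score, using only the definition given above, is shown below. The function names and the example expression values are hypothetical and serve only to illustrate the calculation; the sketch assumes that no correlation equals exactly +1 or −1, for which the Fisher z-transform is undefined.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher z-transform, 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def mean_profile(samples):
    """Gene-wise average over a list of samples (each a list of n gene values)."""
    return [sum(column) / len(column) for column in zip(*samples)]


def mdd_marker_score(x_i, mdd_samples, control_samples):
    """z(r(x_i, average MDD profile)) minus z(r(x_i, average control profile))."""
    z_mdd = fisher_z(pearson(x_i, mean_profile(mdd_samples)))
    z_control = fisher_z(pearson(x_i, mean_profile(control_samples)))
    return z_mdd - z_control


# Made-up expression values for four genes, for demonstration only.
mdd = [[2.0, 1.0, 3.0, 0.5], [2.2, 0.8, 3.1, 0.7]]
control = [[1.0, 2.0, 0.5, 3.0], [0.9, 2.3, 0.6, 2.8]]

print(mdd_marker_score([2.1, 1.1, 2.7, 0.9], mdd, control) > 0)  # MDD-like sample
print(mdd_marker_score([1.0, 2.0, 0.7, 2.6], mdd, control) < 0)  # control-like sample
```

A positive score indicates that the sample resembles the average MDD profile more closely than the average control profile, and a negative score indicates the opposite.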
Real-Time Quantitative PCR (qPCR)
For genes of interest, transcript-specific primers were designed based on Genbank sequence entries using Primer Express software (PE Biosystems, USA) with the manufacturer's settings. Only primers that had high amplification efficiencies, and for which the endpoint PCRs showed the amplicon and no primer-dimers as determined by generation of dissociation curves, were used.
From each sample, random primed (hexamers; Eurogentec, Belgium) cDNA (500 ng total RNA) was made with reverse transcriptase (200 U; Promega, USA) according to the manufacturer's protocol. Aliquots of cDNA were stored at −80° C., because repeated freeze-thaw cycles affect measured Ct values. For qPCR measurements (ABI PRISM 7900, Applied Biosystems), PCR conditions and SYBR green reagents (Applied Biosystems; USA) were used in a reaction volume of 10 μl using transcript-specific primers (300 nM) on cDNA (corresponding to ˜2 ng RNA).
The obtained threshold cycle ($Ct_x$) value for every gene was used to calculate the relative level of gene expression by normalization to the geometric mean of replicated reference controls ($Ct_{HK}$; ACTB, GAPDH, TLN1). For all statistical calculations loge-based values were used for the amount of normalized transcript of interest, $C - (Ct_x - Ct_{HK})$ with $C = 10$, or the LPS-induced ratio, $-\big((Ct_{x,\mathrm{LPS}} - Ct_{HK,\mathrm{LPS}}) - (Ct_{x,\mathrm{basal}} - Ct_{HK,\mathrm{basal}})\big)$.
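The two quantities defined above may be computed as in the following sketch. It is an assumption of this sketch that the reference value Ct_HK is obtained by averaging the Ct values of the reference genes, which corresponds to a geometric mean of the underlying transcript amounts because Ct is a logarithmic quantity; the function names and Ct values are hypothetical.

```python
def reference_ct(reference_cts):
    """Ct_HK for one sample.

    Assumption: the arithmetic mean of the reference-gene Ct values is used,
    which corresponds to the geometric mean of the underlying transcript amounts.
    """
    return sum(reference_cts) / len(reference_cts)


def normalized_expression(ct_x, reference_cts, c=10.0):
    """Amount of normalized transcript of interest: C - (Ct_x - Ct_HK), with C = 10."""
    return c - (ct_x - reference_ct(reference_cts))


def lps_induced_ratio(ct_x_lps, refs_lps, ct_x_basal, refs_basal):
    """LPS-induced ratio: -((Ct_x,LPS - Ct_HK,LPS) - (Ct_x,basal - Ct_HK,basal))."""
    delta_lps = ct_x_lps - reference_ct(refs_lps)
    delta_basal = ct_x_basal - reference_ct(refs_basal)
    return -(delta_lps - delta_basal)


# Made-up Ct values (reference genes in the order ACTB, GAPDH, TLN1), for demonstration only.
refs = [18.0, 19.0, 20.0]
print(normalized_expression(26.0, refs))          # 10 - (26 - 19) = 3.0
print(lps_induced_ratio(24.0, refs, 26.0, refs))  # -((24 - 19) - (26 - 19)) = 2.0
```

With these definitions, a lower Ct after LPS stimulation (more transcript) gives a positive LPS-induced ratio.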
Fisher z-transformed Pearson correlation scores were analyzed by Newman-Keuls, Mann-Whitney U, and χ2 tests. Significance of correlations between IDS and MDD-marker score (two-sided, α=0.05) was tested by Spearman rank correlation coefficient. Significance of individual genes was tested by Newman-Keuls test (two-sided, α=0.05).
In order to build a molecular marker gene set for MDD based on blood gene expression, we excluded several confounding factors. Because of the large NESDA cohort (1115 MDD patients), we were able to use stringent exclusion criteria to avoid the possibility that medication (e.g. antidepressants or benzodiazepines), or physical disorders could influence gene expression27,28.
After inclusion of 35 MDD subjects and 37 healthy controls, blood samples from 21 of the MDD patients were randomly taken in a first collection round and were matched with 21 healthy controls (Table 1, Table 5). This first round served as a training set to determine a molecular marker using a classifier approach (see below). In a second round, an additional 12 MDD patients and 13 healthy controls were randomly collected for validation of the molecular MDD-marker by microarray. Microarray analysis with genome-wide coverage was performed for these 67 subjects, and for each subject the labeled cRNA derived from a basal blood sample was co-hybridized with that of an LPS-stimulated blood sample. When the LPS sample was compared with the basal sample for all 67 arrays, about 1700 probes (7%) showed significant regulation according to stringent criteria (FDR<0.0001; [LPS vs. basal] (log 2) < −0.8 or > 0.8), with 65% of these up-regulated and 35% down-regulated (
In order to build a molecular marker set that has diagnostic value for MDD, we used the prediction analysis for microarrays (PAM) tool26 on the training set of 21 MDD patients and 21 healthy controls, to select genes that were tested on an independent validation set of 12 MDD patients and 13 controls. This analysis was performed independently for the basal and the LPS-stimulated sample. In order to use the marker as a diagnostically-applicable tool and to circumvent over-fitting due to large numbers of genes29, the number of classifier genes found using the PAM analysis (160, 110 genes, respectively) was reduced further by selection based on the effect size (differential expression between MDD and control), p-value and the robustness of the gene in the classifier (classification participation). Finally, for each sample, 12 candidate classifier genes were selected (Table 7). The set of classifier genes from the LPS-stimulated sample was slightly better in discriminating MDD patients from controls than the set of classifier genes from the basal sample, as was determined by different statistical tests (Table 2). In order to validate these results, the MDD-marker was evaluated on the independent set of 12 MDD patients and 13 controls. This showed that the classifier genes of the LPS-stimulated sample were superior (p=0.002) to those of the basal sample for discrimination of MDD patients from controls (p=0.311; Table 2).
To avoid possible type-1 errors and to reduce the number of classifier genes, we used the technically-independent real-time quantitative PCR (qPCR) technique and analyzed the difference in gene expression in the LPS-stimulated sample in patients and controls of the training set. For 8 out of 12 genes, it was possible to make specific qPCR primers. For 7 out of these 8 genes, the difference in expression level between MDD patients and controls could be corroborated (
At present, laboratory blood tests to support MDD diagnosis are not available. Linking gene expression profiling in whole blood after ex vivo LPS stimulation with clinical data rapidly identified diagnostic biomarkers of MDD. These were confirmed in an independent validation set, as well as by qPCR, the latter being an independent and convenient low-cost method.
To ensure studying disease state, rather than trait, we have included both single episode and recurrent cases for which the presence of a current major depressive episode was obligatory. Moreover, we have excluded controls with a history of MDD. In addition, we have studied the correlation between our MDD-marker and depression severity (IDS). Since co-morbid physical disorders and use of medication are common in most MDD cohorts, the MDD subjects in the present study may not entirely reflect MDD patients in day-to-day clinical practice. Therefore, the MDD-marker that we present in this study should be considered as a first step toward the use of such a biomarker in the general population.
An important feature of a biomarker is sensitivity and specificity of detection. The χ2-test is indicative for the high specificity and sensitivity of the marker (validation set qPCR p=0.012). The sensitivity (% MDD patients with positive marker outcome) of our MDD-marker is 76.9%, and the specificity (% controls with negative marker outcome) is 71.4% for the validation set using qPCR (
Apart from a possible genetic component that could explain the differences between MDD patients and healthy controls in relation to the MDD-marker presented, epigenetic differences could also play a role. In animal models30,31 and humans alike32,33, variation in environment, e.g. early maternal care or prenatal SSRI exposure, may have a serious impact on the wellbeing of the offspring34. Disrupted parenting is associated with differences in hypothalamic-pituitary-adrenal (HPA) stress response in the offspring, known to be mediated via changes in the epigenetic regulation of glucocorticoid receptor gene expression31,35. In order to exclude a possibility of differential stress-response of leukocytes to the LPS-stimulus, we have analyzed our data for typical stress-response genes. In all individuals, stimulation of blood by LPS induced massive gene expression, including several cytokines and related genes (Table 6), such as TNF, NFkappaB, IL1a, IL6 and IL10. However, none of these genes displayed a differential expression level between MDD patients and controls according to the limits set ([MDD vs. control] < −0.3 or > 0.3; P<0.02). In addition, genes related to monoamine synthesis and release, such as serotonin and dopamine receptors, and the serotonin transporter, were not differentially expressed (Table 6) in either basal blood or LPS-stimulated blood between MDD patients and controls. Genes previously reported to be differentially expressed in whole blood, like 5HTT (SLC6A4), VEGF, PDE4B, HDAC, CREB and PDLIM5 (Table 6), were not confirmed in our NESDA cohort, possibly due to genetic differences with the Asian population in that study36.
Although blood gene expression has the potential to reveal a biological pathway or mechanism that plays a role in neuropsychiatric disease8, the primary hallmark of a biomarker is the ability to classify subjects. In addition, performing gene expression analysis on LPS-stimulated blood may have further emphasized the finding of marker genes rather than biologically-relevant genes. An over-representation analysis of the marker genes compared with the genes present on the array revealed a significant result for several levels of the gene ontology class ‘biological process’ (
Here, for the first time, we build a classifier gene set based on stimulated blood gene expression. The robustness of classification is exemplified in the low p-values obtained in the validation set (microarray), as well as the corroboration of the classifier using an independent technique (real-time qPCR). In general, markers for complex diseases are not simply present or absent. Rather, they have a wide range of values that overlap in persons with or without the disease, where the value typically increases progressively with increased risk or severity levels. The gene set, as specified for MDD in this study, has diagnostic value in determining disease state, because it correlated strongly with depression severity. The invention also applies for the predictive value of our marker for treatment outcome, as well as for the (re)occurrence of MDD in subjects that presently have no or only sub-threshold depressive disorders during the future 2-, or 4-year follow-up assessment of the NESDA study. The invention is also suited to test regular clinical screening tools. Obviously, a marker with predictive value has direct impact on day-to-day clinical practice, since it opens the door to prevention.
For the total set of arrays, filtering resulted in ~24,000 probes (~60%) with a present-call. When the LPS sample was compared with the basal sample [LPS-basal] for all arrays, about 1700 probes (7%) showed significant regulation (FDR<0.0001; −0.8>[LPS-basal] (log 2)>0.8), with 65% of these up-regulated and 35% down-regulated (
As a first step to analyze whether our data had the potential to discriminate for disease state and to be able to use a classifier approach (see below), we examined the number of genes that were significantly different in expression level between MDD patients and healthy controls (43 and 44 subjects, respectively). The differentially expressed genes that are discriminative for disease state were analyzed using a false-discovery rate (FDR) analysis. Notably, a low number (3 genes) of significant genes (FDR<0.1) were detected when comparing untreated blood samples (basal blood) (
In order to find the molecular marker for MDD, we used the Prediction Analysis of Microarrays (PAM)1 analysis tool as described in the main text. Because genes with proven ability to classify samples do not necessarily need to be the most significantly differentially expressed genes, we used all three data sets (gene expression from basal blood, LPS-stimulated blood, and the [LPS-basal] expression) in the classification algorithm. At the minimum misclassification errors of 43.5%, 28.5% and 36%, the PAM analysis resulted in 160, 110 and 218 genes that classified subjects into two groups (MDD patients vs. control) for the three data sets used (basal, LPS and [LPS-basal], respectively). After reduction of this set of genes, 4 genes present in the LPS sample and one in the basal sample also scored relatively high as classifiers in the [LPS-basal] sample. However, the final set of genes was selected based on participation from independent data sets.
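PAM is based on the nearest shrunken centroid method: each class centroid is shrunk toward the overall centroid by a threshold, genes whose centroids are shrunk completely to the overall mean drop out of the classifier, and a sample is assigned to the class with the nearest shrunken centroid. The following is a simplified sketch of that idea in plain Python, not the PAM software itself. The function names, the toy expression values and the threshold value are hypothetical, and the threshold is fixed here by hand, whereas in practice it would be chosen by cross-validation to minimize the misclassification error.

```python
import math


def fit_shrunken_centroids(X, y, delta):
    """Fit a simplified nearest-shrunken-centroid model.

    X: list of samples, each a list of gene expression values; y: class labels.
    delta: shrinkage threshold (assumed fixed here); larger values remove more genes.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    means = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
             for k, rows in members.items()}
    # Pooled within-class standard deviation per gene.
    s = []
    for i in range(p):
        ss = sum((row[i] - means[lab][i]) ** 2 for row, lab in zip(X, y))
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = sorted(s)[p // 2]  # offset: (upper) median of the per-gene s values
    centroids = {}
    for k in classes:
        m_k = math.sqrt(1.0 / len(members[k]) - 1.0 / n)
        shrunk = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (means[k][i] - overall[i]) / scale
            # Soft thresholding: reduce |d| by delta, and set it to zero if it is smaller.
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            shrunk.append(overall[i] + scale * d_shrunk)
        centroids[k] = shrunk
    priors = {k: len(members[k]) / n for k in classes}
    return {"centroids": centroids, "overall": overall, "s": s, "s0": s0,
            "priors": priors}


def selected_genes(model):
    """Indices of genes whose centroid still differs from the overall mean in any class."""
    p = len(model["overall"])
    return [i for i in range(p)
            if any(c[i] != model["overall"][i] for c in model["centroids"].values())]


def predict(model, x):
    """Assign x to the class with the smallest standardized distance to its centroid."""
    def score(k):
        dist = sum(((x[i] - c) / (model["s"][i] + model["s0"])) ** 2
                   for i, c in enumerate(model["centroids"][k]))
        return dist - 2.0 * math.log(model["priors"][k])
    return min(model["centroids"], key=score)


# Hypothetical toy data: 3 genes, 4 controls and 4 MDD samples.
X = [[1.0, 2.0, 5.0], [1.2, 2.1, 5.2], [0.8, 1.9, 4.8], [1.0, 2.0, 5.0],
     [3.0, 2.2, 5.1], [3.2, 2.3, 4.9], [2.8, 2.1, 5.0], [3.0, 2.2, 5.0]]
y = ["control"] * 4 + ["MDD"] * 4

model = fit_shrunken_centroids(X, y, delta=2.0)
print(selected_genes(model))             # genes that survive the shrinkage
print(predict(model, [2.9, 2.0, 5.0]))   # class assigned to a new sample
```

In this toy example only the gene with a large between-class difference survives the shrinkage, which illustrates why a gene set selected for classification need not coincide with the list of most significantly differentially expressed genes.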
Gene | Value |
---|---|
CD69 | 1.15 |
CD83 | 3.59 |
CXCL2 | 3.91 |
DUSP2 | 2.69 |
IFITM1 | 1.14 |
IFITM3 | 0.92 |
IFNGR1 | −1.39 |
IFNGR2 | 1.23 |
IL10 | 1.52 |
IL12B | 1.35 |
IL17R | −1.14 |
IL1A | 2.08 |
IL1B | 5.77 |
IL23A | 1.72 |
IL6 | 5.66 |
IL6R | −1.09 |
IL8 | 1.48 |
IL8RA | −2.54 |
IL8RB | −2.23 |
MX2 | 2.52 |
MYD88 | 1.12 |
NFKB1 | 3.13 |
NFKB2 | 2.16 |
NFKBIA | 3.49 |
PTEN | −1.04 |
TGFBI | −2.72 |
TNF | 3.09 |
TNFAIP3 | 3.07 |
TNFAIP6 | 4.90 |
AOC3 | −0.50 |
COMT | −0.58 |
EDN1 | 2.59 |
EGR1 | 0.65 |
FCER1A | −1.53 |
FOS | −1.64 |
IL4I1 | 4.04 |
JUN | 2.29 |
MAPK1 | −0.64 |
MAPK14 | −0.57 |
SIGLEC7 | −0.51 |
SNX27 | −0.92 |
SRC | 1.08 |
SYK | −0.94 |
PDE4B | 1.21 |
HDAC5 | −0.94 |
VEGF | 0.60 |

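Gene | Marks |
---|---|
CAPRIN1 | |
CLEC4A | + + |
KRT23 | |
MLC1 | + + + |
PLSCR1 | + + |
PROK2 | + |
ZBTB16 | + + + + |

Gene | Marks |
---|---|
CAPRIN1 | |
CLEC4A | + + |
KRT23 | |
MLC1 | + + |
PLSCR1 | + + + + |
PROK2 | + + + + + + |
ZBTB16 | + + + + + |

Gene | Marks |
---|---|
CAPRIN1 | |
CLEC4A | + + + |
KRT23 | |
MLC1 | |
PLSCR1 | + + + + + |
PROK2 | + + + + + + + + + + + + |
ZBTB16 | + + + + |
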
Number | Date | Country | Kind |
---|---|---|---|
07150177.9 | Dec 2007 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/NL08/50837 | 12/19/2008 | WO | 00 | 11/15/2010 |