Patients with mental disorders may receive the same diagnosis, and yet share few symptoms in common, vary widely in severity, and respond differently to treatments. Genetic association studies of mental disorders were plagued by weak and inconsistent findings, largely as a result of the clinical and etiologic heterogeneity of the cases when people were described only as having the disorder or not (cases vs controls). Classifications based on clinical features without regard for measured genotypic differences also failed to predict response to treatment.
A disorder is “complex” when it is influenced by the combined effects of interacting genes. Individual genes do not consistently cause a mental disorder; rather, it takes many genes operating in concert, possibly interacting with specific environmental factors, in order for a person to develop mental illness. Complex diseases, such as schizophrenia, may be influenced by hundreds or thousands of genetic variants that interact with one another in complex ways, and consequently display a multifaceted genetic architecture. The genetic architecture of heritable diseases refers to the number, frequency, and effect sizes of genetic risk alleles and the way they are organized into genotypic networks. In complex disorders, the same genotypic networks may lead to different clinical outcomes (a concept known as multifinality, which is called pleiotropy in genetics), and different genotypic networks may lead to the same clinical outcome (equifinality, which is also described as heterogeneity). In general, geneticists must expect that many genes affect each trait and that each gene affects many traits. Consequently, research on complex heritable disorders like schizophrenia is likely to yield weak and inconsistent results unless the complexity of their genetic and phenotypic architecture is taken into account.
For example, twin and family studies of schizophrenia consistently indicate that the variability in risk of disease is highly heritable (81%), but only 25% of the variability has been explained by specific genetic variants identified in genome-wide association studies (GWAS). This is not surprising for complex disorders like schizophrenia because current GWAS methods have been unable to characterize the gene-gene interactions (
In past studies of schizophrenia, the missing heritability problem has been approached by analyzing the explained variance in large individual samples or by using meta-analysis to combine data sets. Efforts have also been made to consider the impact of variation related to ethnicity, sex, chromosomes, functional observations, or allele frequency. Nevertheless, most of the heritability of schizophrenia remains unexplained. What is needed are new diagnostic methods that look at both the genetic and phenotypic characteristics of schizophrenia and tools for the performance and analysis of such methods.
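The size of the missing heritability referred to above can be made concrete with the two figures quoted in this background (81% heritability and 25% of variability explained by GWAS variants). The following minimal sketch only restates that arithmetic; the variable names are illustrative and are not part of the disclosure.

```python
# Figures quoted in the background above.
heritability = 0.81        # share of risk variability that is heritable (twin and family studies)
explained_by_gwas = 0.25   # share of risk variability explained by GWAS-identified variants

# "Missing" heritability: heritable variability not yet tied to specific variants.
missing = heritability - explained_by_gwas
missing_fraction = missing / heritability

print(f"Unexplained: {missing:.2f} of total variability")
print(f"Share of the heritable component left unexplained: {missing_fraction:.0%}")
```

Under these figures, roughly two thirds of the heritable component is not accounted for by the individually identified variants.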
Disclosed are methods and compositions related to diagnosing, assessing the risk, and classifying a subject with schizophrenia.
In one aspect, disclosed herein are diagnostic systems for diagnosing schizophrenia, wherein the diagnostic system comprises one or more expression panels, wherein the one or more expression panels each comprise one or more of the single nucleotide polymorphism (SNP) sets comprising 19_2, 88_64, 81_13, 87_76, 58_29, 83_41, 9_9, 10_4, 14_6, 56_30, 42_37, 65_25, 71_55, 12_11, 90_78, 77_5, 88_8, 51_28, 59_48, 41_12, 22_11, 13_12, 31_22, 85_84, 87_84, 16_10, 56_19, 75_31, 81_73, 85_23, 21_8, 76_74, 61_39, 75_67, 76_63, 81_3, 87_26, 88_43, 25_10, 12_2, 52_42, and/or 54_51.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “severe process, with positive and negative symptom schizophrenia”, and wherein the one or more SNP sets comprise 56_30, 75_67, and/or 76_74.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “positive and negative symptom Schizophrenia”, and wherein the one or more SNP sets comprise 59_48, 71_55, 21_8, 54_51, 31_22, 65_25, and/or 87_84.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “negative Schizophrenia”, and wherein the one or more SNP sets comprise 58_29, 9_9, 22_11, 81_3, 13_12, 61_39, 10_4, 81_73, 75_31, 56_19, 88_8, and/or 12_2.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “Positive Schizophrenia”, and wherein the one or more SNP sets comprise 88_64, 85_84, and/or 41_12.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “severe process, positive schizophrenia”, and wherein the one or more SNP sets comprise 77_5, 81_13, and/or 25_10.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “moderate process, disorganized negative schizophrenia”, and wherein the one or more SNP sets comprise 19_2, 52_42, 90_78, 12_11, 87_76, and/or 14_6.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “moderate process, positive and negative schizophrenia”, and wherein the one or more SNP sets comprise 42_37, 88_43, and/or 51_28.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “moderate process, continuous positive schizophrenia”, and wherein the one or more SNP sets comprise 16_10, 83_41, and/or 87_26.
Also disclosed herein are diagnostic systems of the invention, further comprising one or more phenotype panels, wherein each phenotype panel comprises one or more phenotypic sets selected from the group comprising 15_13, 12_11, 21_1, 50_46, 9_6, 46_23, 54_11, 30_17, 18_13, 27_6, 61_18, 64_11, 65_64, 12_4, 42_9, 52_28, 7_3, 48_41, 26_8, 69_41, 10_5, 17_2, 63_24, 69_66, 22_13, 53_6, 59_41, 20_19, 55_7, 34_17, 27_7, 4_1, 66_54, 8_4, 51_38, 42_7, 18_3, 46_29, 5_2, 57_39, 11_5, 24_4, 48_7, 28_23, and/or 25_20.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “severe process, with positive and negative symptom schizophrenia”, and wherein the one or more phenotypic sets comprise 15_13, 12_11, 21_1, 50_46, 9_6, 46_23, 54_11, 30_17, 18_13, 27_6, 61_18, 64_11, and/or 65_64.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “positive and negative schizophrenia”, and wherein the one or more phenotypic sets comprise 12_4 and/or 42_9.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “negative schizophrenia”, and wherein the one or more phenotypic sets comprise 52_28, 7_3, 48_41, 26_8, 69_41, 10_5, and/or 17_2.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “positive schizophrenia”, and wherein the one or more phenotypic sets comprise 63_24 and/or 69_66.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “severe process, positive schizophrenia”, and wherein the one or more phenotypic sets comprise 22_13, 18_13, 53_6, 59_41, 20_19, 55_7, 34_17, 69_66, 27_7, 18_13, 4_1, 66_54, and/or 8_4.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “moderate process, disorganized negative schizophrenia”, and wherein the one or more phenotypic sets comprise 51_38, 42_7, 18_3, and/or 46_29.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “moderate process, positive and negative schizophrenia”, and wherein the one or more phenotypic sets comprise 5_2, 57_39, 11_5, and/or 24_4.
Also disclosed is the diagnostic system of any preceding aspect, wherein the system selects for “moderate process, continuous positive schizophrenia”, and wherein the one or more phenotypic sets comprise 48_7, 28_23, and/or 25_20.
Also disclosed is the diagnostic system of any preceding aspect, further comprising a means for reading the one or more expression panels, a computer operationally linked to the means for reading the one or more expression panels, and a display for visualizing the diagnostic risk; wherein the computer identifies the expression profile of an expression panel, compares the expression profile to a control, and catalogs that data; wherein the computer provides an input source for inputting phenotypic data into a phenomic database; wherein the computer compares the expression and phenomic data and calculates relationships between the genomic and phenotypic data; wherein the computer compares the genomic and phenotypic relationship data to a reference standard; and wherein the computer outputs the relationship data and the standard on the display.
In one aspect, disclosed herein are methods of diagnosing a subject with schizophrenia comprising obtaining a biological sample from the subject, obtaining clinical data from the subject, and applying the biological sample and clinical data to the diagnostic system of any preceding aspect.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments and together with the description illustrate the disclosed compositions and methods.
Before the present compounds, compositions, articles, devices, and/or methods are disclosed and described, it is to be understood that they are not limited to specific synthetic methods or specific recombinant biotechnology methods unless otherwise specified, or to particular reagents unless otherwise specified, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
As used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a pharmaceutical carrier” includes mixtures of two or more such carriers, and the like.
Ranges can be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint. It is also understood that there are a number of values disclosed herein, and that each value is also herein disclosed as “about” that particular value in addition to the value itself. For example, if the value “10” is disclosed, then “about 10” is also disclosed. It is also understood that when a value is disclosed, “less than or equal to” the value, “greater than or equal to” the value, and possible ranges between values are also disclosed, as appropriately understood by the skilled artisan. For example, if the value “10” is disclosed, then “less than or equal to 10” as well as “greater than or equal to 10” is also disclosed. It is also understood that throughout the application, data is provided in a number of different formats, and that this data represents endpoints and starting points, and ranges for any combination of the data points. For example, if a particular data point “10” and a particular data point “15” are disclosed, it is understood that greater than, greater than or equal to, less than, less than or equal to, and equal to 10 and 15 are considered disclosed as well as between 10 and 15. It is also understood that each unit between two particular units is also disclosed. For example, if 10 and 15 are disclosed, then 11, 12, 13, and 14 are also disclosed.
In this specification and in the claims which follow, reference will be made to a number of terms which shall be defined to have the following meanings:
“Optional” or “optionally” means that the subsequently described event or circumstance may or may not occur, and that the description includes instances where said event or circumstance occurs and instances where it does not.
Throughout this application, various publications are referenced. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art to which this pertains. The references disclosed are also individually and specifically incorporated by reference herein for the material contained in them that is discussed in the sentence in which the reference is relied upon.
We have chosen to measure and characterize the complexity of both the genotypic and the phenotypic architecture of schizophrenia (
We investigated the architecture of schizophrenia in the Molecular Genetics of Schizophrenia (MGS) study, in which all subjects had consistent and detailed genotypic and phenotypic assessments. We then replicated the results in two other independent samples in which comparable genotypic and phenotypic features were available: the Clinical Antipsychotic Trial of Intervention Effectiveness (CATIE) and the Portuguese Island studies from the Psychiatric Genomics Consortium (PGC).
The result of this work is a diagnostic system that is able to diagnose a subject as having schizophrenia and, more importantly, to classify the category of schizophrenia from which the subject is suffering. To accomplish this, the diagnostic system can comprise an expression panel that can be used to detect nucleic acid or protein expression. Thus, in one aspect, disclosed herein are diagnostic systems for diagnosing schizophrenia, wherein the diagnostic system comprises one or more expression panels, wherein the one or more expression panels can comprise one or more expression sets (such as, for example, one or more SNP sets).
The expression panels disclosed herein can be assayed by any means known in the art for measuring differential expression of a gene or protein. Specifically contemplated herein are methods of assessing the risk, diagnosing, or classifying schizophrenia comprising performing an assay that measures differential expression of a nucleic acid, gene, peptide, or protein. Specifically contemplated are methods of assessing the risk, diagnosing, or classifying schizophrenia comprising performing an assay that measures differential gene or protein expression, wherein the assay is selected from the group of assays comprising Northern analysis, RNAse protection assay, PCR, QPCR, genome microarray, DNA microarray, MMChips, low density PCR array, oligo array, protein array, peptide array, phenotype microarray, SAGE, and/or high throughput sequencing. Therefore, it is understood that the microarray panel can measure differential expression of phenotypes, proteins, peptides, RNAs, microRNAs, DNAs, Single Nucleotide Polymorphisms (SNPs), or genes, or sets of said phenotypes, proteins, peptides, RNAs, microRNAs, DNAs, SNPs, or genes. For example, in one aspect, the disclosed panel can be a microarray such as those developed and sold by Affymetrix, Agilent, Applied Microarrays, Arrayit, and Illumina.
In one aspect, the panel can comprise Single Nucleotide Polymorphism (SNP) sets. The SNP set can be any SNP set that has a greater than 70% association with risk for schizophrenia, including but not limited to 19_2, 88_64, 81_13, 87_76, 58_29, 83_41, 9_9, 10_4, 14_6, 56_30, 42_37, 65_25, 71_55, 12_11, 90_78, 77_5, 88_8, 51_28, 59_48, 41_12, 22_11, 13_12, 31_22, 85_84, 87_84, 16_10, 56_19, 75_31, 81_73, 85_23, 21_8, 76_74, 61_39, 75_67, 76_63, 81_3, 87_26, 88_43, 25_10, 12_2, 52_42, and 54_51, which are specifically listed in Table 1.
a SKAT = SNP-Set Kernel Association Test.
Accordingly, in one aspect, disclosed herein are diagnostic systems for diagnosing schizophrenia, wherein the diagnostic system comprises one or more expression panels, wherein the one or more expression panels each comprise one or more of the single nucleotide polymorphism (SNP) sets selected from the group comprising, but not limited to 19_2, 88_64, 81_13, 87_76, 58_29, 83_41, 9_9, 10_4, 14_6, 56_30, 42_37, 65_25, 71_55, 12_11, 90_78, 77_5, 88_8, 51_28, 59_48, 41_12, 22_11, 13_12, 31_22, 85_84, 87_84, 16_10, 56_19, 75_31, 81_73, 85_23, 21_8, 76_74, 61_39, 75_67, 76_63, 81_3, 87_26, 88_43, 25_10, 12_2, 52_42, and/or 54_51. It is understood and herein contemplated that each of the SNP sets disclosed herein maps to one or more nucleic acid molecules. Therefore, a single SNP set will not necessarily be comprised solely of primers or probes for detection of a single SNP, but can be comprised of multiple primers and probes for the detection of SNPs mapping to at least one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, or twenty nucleic acid locations. As disclosed in Table 2, each of the SNP sets disclosed herein maps to particular locations on a gene, including protein coding and non-coding regulatory variants.
Homo sapiens 3 BAC RP11-436A20 (Roswell Park
Homo sapiens BAC clone RP11-431J17 from 4,
Homo sapiens 3 BAC RP11-735B13 (Roswell Park
For example, as disclosed in Table 2, where one of the following SNP sets is disclosed, specifically contemplated herein is that the SNP set detects polymorphisms in the genes or regions listed for it:
SNP set 9_9: NTRK3 and SEMA3A;
SNP set 10_4: C14orf102, C14orf102(5′), PSMC1, PSMC1(3′), and PSMC1(5′);
SNP set 12_11: C14orf102, C14orf102(5′), PSMC1, PSMC1(3′), and PSMC1(5′);
SNP set 12_2: an intronic region and 3′ UTR of HPGDS, HPGDS(5′), an intronic region, missense, and 3′ UTR of SMARCAD1 and RP11-363G15.2;
SNP set 13_12: EML5, SPATA7, U4.15(3′), U4.15(5′), and ZC3H14;
SNP set 14_6: NTRK3;
SNP set 16_10: an intronic region and 3′ UTR of HPGDS, HPGDS(5′), RP11-363G15.2, and an intronic region, missense, and 3′ UTR of SMARCAD1;
SNP set 19_2: ARPC5L, an intronic region, missense, and 3′ UTR of GOLGA1, RPL35, WDR38, and SCA1;
SNP set 21_8: AC068490.2;
SNP set 22_11: AC068490.2;
SNP set 25_10: AL158819.7(3′), FOXR2, FOXR2(3′), MAGEH1(5′), PAGE3, PAGE3(3′), PAGE3(5′), RP11-382F24.2, RP11-382F24.2(3′), RP11-382F24.2(5′), RP13-188A5.1, RRAGB, RRAGB(3′), RRAGB(5′), and SNORD112.49(3′);
SNP set 31_22: an intronic region and 3′ UTR of C6orf138, C6orf138(3′), and OPN5(3′);
SNP set 41_12: GPR119(3′), SLC25A14, and SLC25A14(3′);
SNP set 42_37: NCAM1, RP11-629G13.1, RP11-629G13.1(3′), AC064837.1, and PPP1R1C;
SNP set 51_28: IGSF1;
SNP set 52_42: NCAM1, RP11-629G13.1, and RP11-629G13.1(3′);
SNP set 54_51: CSMD1;
SNP set 56_19: SNX19(5′);
SNP set 56_30: 7SK.207(3′), 7SK.207(5′), PTBP2, PTBP2(5′), RP4-726F1.1(3′), GP2, and GP2(3′);
SNP set 58_29: CTD-3025N20.2(3′) and RP11-1D12.2(5′);
SNP set 59_48: RP11-128M1.1, RP11-128M1.1(3′), and TRPS1(3′);
SNP set 61_39: IGSF1;
SNP set 65_25: C20orf78(5′);
SNP set 71_55: NTRK3(3′);
SNP set 75_31: AC093577.1(3′), AC093577.1(5′), U6.1077(5′), and SNX19(5′);
SNP set 75_67: SNORA42.4(5′), VANGL1(5′), RP11-298H24.1(3′), STYK1, AL161669.1(3′), AL161669.1(5′), AL161669.2, AL161669.2(3′), 5S_rRNA.496(3′), NTRK3(3′), 7SK.236(5′), GP2, GP2(3′), CTA-714B7.5, RP11-436A20.3, C4orf37, C4orf37(3′), RP11-431J17.1(3′), 7SK.7(3′), DKK4(5′), DUSP4(5′), GSR, RP11-401H2.1(5′), RP11-486M23.1(5′), RP11-738G5.1(3′), RP11-770E5.1, SLC20A2, SNTG1, SNTG1(3′), ST18, and VDAC3;
SNP set 76_63: IGSF1;
SNP set 76_74: AL161669.1(3′), AL161669.1(5′), AL161669.2, AL161669.2(3′), ABCC12(3′), ITFG1, NETO2, PHKB, PHKB(3′), C4orf37, C4orf37(3′), RP11-431J17.1(3′), SOD3(5′), CTD-2292M14.1(3′), RP11-1D12.2(5′), and RP11-770E5.1;
SNP set 77_5: CSMD1;
SNP set 81_13: GP2, GP2(3′), RP11-401H2.1(5′), SNTG1, and SNTG1(3′);
SNP set 81_3: AC068490.2;
SNP set 81_73: TMEM135, TMEM135(3′), RYR3, and CHST9;
SNP set 83_41: ATP8A2;
SNP set 85_84: RP11-735B13.1, RP11-735B13.1(5′), and RP11-735B13.2(3′);
SNP set 85_23: CHST9;
SNP set 87_26: NALCN and RP11-430M15.1;
SNP set 87_76: TRPS1(3′);
SNP set 87_84: AC093577.1(5′), FAM69A, FAM69A(5′), RPL5, RPL5(5′), SNORA66.1, and U6.1236(5′);
SNP set 88_43: RP11-428G2.1(5′);
SNP set 88_64: GP2 and GP2(3′);
SNP set 88_8: AC093577.1(3′), AC093577.1(5′), EVI5, U6.1077(5′), and HACE1(3′); and
SNP set 90_78: AC093577.1(3′), AC093577.1(5′), EVI5, and U6.1077(5′).
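Because several SNP sets recited above map to the same gene, it can be useful to index the assignments in both directions. The following is a minimal sketch, assuming the assignments are held in a plain dictionary; only a subset of the sets from Table 2 is included, and the names `SNP_SET_GENES`, `invert`, and `shared` are hypothetical.

```python
from collections import defaultdict

# Subset of the SNP-set-to-gene assignments recited above (Table 2).
SNP_SET_GENES = {
    "9_9": ["NTRK3", "SEMA3A"],
    "14_6": ["NTRK3"],
    "51_28": ["IGSF1"],
    "61_39": ["IGSF1"],
    "76_63": ["IGSF1"],
    "54_51": ["CSMD1"],
    "77_5": ["CSMD1"],
    "81_73": ["TMEM135", "TMEM135(3′)", "RYR3", "CHST9"],
    "85_23": ["CHST9"],
    "83_41": ["ATP8A2"],
}

def invert(set_to_genes):
    """Build a gene -> list of SNP sets index from a SNP set -> genes mapping."""
    gene_to_sets = defaultdict(list)
    for snp_set, genes in set_to_genes.items():
        for gene in genes:
            gene_to_sets[gene].append(snp_set)
    return dict(gene_to_sets)

GENE_TO_SETS = invert(SNP_SET_GENES)

# Genes reached by more than one SNP set in this subset.
shared = {gene: sets for gene, sets in GENE_TO_SETS.items() if len(sets) > 1}
for gene, sets in shared.items():
    print(gene, sets)
```

In this subset, NTRK3, IGSF1, CSMD1, and CHST9 are each reached by more than one SNP set, which illustrates that different SNP sets can map to the same gene.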
It is contemplated herein that the disclosed expression panel can comprise a single expression set (such as, for example, the SNP sets disclosed herein 19_2, 88_64, 81_13, 87_76, 58_29, 83_41, 9_9, 10_4, 14_6, 56_30, 42_37, 65_25, 71_55, 12_11, 90_78, 77_5, 88_8, 51_28, 59_48, 41_12, 22_11, 13_12, 31_22, 85_84, 87_84, 16_10, 56_19, 75_31, 81_73, 85_23, 21_8, 76_74, 61_39, 75_67, 76_63, 81_3, 87_26, 88_43, 25_10, 12_2, 52_42, or 54_51). It is further contemplated herein that the disclosed expression panels can comprise any combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, or 42 or more of the disclosed SNP sets. For example, the expression panel can comprise one or more SNP sets selected from the group comprising 88_8, 90_78, 65_25, 42_37, 71_55, 56_30, 77_5, 12_11, 51_28, 59_48, 10_4, 83_41, 58_29, 9_9, 14_6, 87_76, 88_64, or 81_13. Also, the expression panel can comprise one or more SNP sets selected from the group comprising 10_4, 83_41, 58_29, 9_9, 14_6, 87_76, 88_64, or 81_13. Also, the expression panel can comprise one or more SNP sets selected from the group comprising 87_76, 88_64, or 81_13.
As disclosed herein, through analysis of the complex genotypic and phenotypic relationships, certain groupings of SNP sets and clinical/phenotypic features were elucidated. The composition of these designated sets is presented in Table 7. These SNP sets are associated with specific subtypes of the schizophrenias, which are characterized here simultaneously by both their genetic features (SNP sets) and their clinical features (phenotypic sets) and are grouped into 8 subtypes (see Table 7).
b Symptoms were assessed with the Diagnostic Interview for Genetic Studies.
Because of these associations it is possible to create panels to assess the risk of a subject to have a particular classification of schizophrenia. These classification specific expression panels can be used individually in the diagnostic system disclosed herein or as one of several classification specific panels in a diagnostic system. For example, in one aspect, disclosed herein are diagnostic systems, wherein the system selects for severe process, with positive and negative symptom schizophrenia (I), and wherein the one or more SNP sets comprise 56_30, 75_67, or 76_74. Also disclosed are diagnostic systems, wherein the system selects for positive and negative Schizophrenia (II), and wherein the one or more SNP sets comprise 59_48, 71_55, 21_8, 54_51, 31_22, 65_25, or 87_84. Also disclosed are diagnostic systems, wherein the system selects for negative Schizophrenia (III), and wherein the one or more SNP sets comprise 58_29, 9_9, 22_11, 81_3, 13_12, 61_39, 10_4, 81_73, 75_31, 56_19, 88_8, or 12_2. Also disclosed are diagnostic systems, wherein the system selects for Positive Schizophrenia (IV), and wherein the one or more SNP sets comprise 88_64, 85_84, or 41_12. Also disclosed are diagnostic systems, wherein the system selects for severe process, positive schizophrenia (V), and wherein the one or more SNP sets comprise 77_5, 81_13, or 25_10. Also disclosed are diagnostic systems, wherein the system selects for moderate process, disorganized negative schizophrenia (VI), and wherein the one or more SNP sets comprise 19_2, 52_42, 90_78, 12_11, 87_76, and 14_6. Also disclosed are diagnostic systems, wherein the system selects for moderate process, positive and negative schizophrenia (VII), and wherein the one or more SNP sets comprise 42_37, 88_43, or 51_28. Also disclosed are diagnostic systems, wherein the system selects for moderate process, continuous positive schizophrenia (VIII), and wherein the one or more SNP sets comprise 16_10, 83_41, or 87_26.
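The class-to-SNP-set assignments recited above can be held in a simple lookup table. The following is a minimal sketch; the set memberships are taken from the paragraph above, while the function `classes_for` and the treatment of each SNP set as simply detected or not detected are assumptions made for illustration, since the text does not specify how a system "selects for" a class.

```python
# Classification-specific SNP sets (I to VIII) as recited above.
CLASS_SNP_SETS = {
    "I": {"56_30", "75_67", "76_74"},
    "II": {"59_48", "71_55", "21_8", "54_51", "31_22", "65_25", "87_84"},
    "III": {"58_29", "9_9", "22_11", "81_3", "13_12", "61_39",
            "10_4", "81_73", "75_31", "56_19", "88_8", "12_2"},
    "IV": {"88_64", "85_84", "41_12"},
    "V": {"77_5", "81_13", "25_10"},
    "VI": {"19_2", "52_42", "90_78", "12_11", "87_76", "14_6"},
    "VII": {"42_37", "88_43", "51_28"},
    "VIII": {"16_10", "83_41", "87_26"},
}

def classes_for(detected):
    """Return each class whose SNP sets overlap the detected sets (hypothetical rule),
    together with the overlapping SNP sets."""
    detected = set(detected)
    return {
        cls: sorted(sets & detected)
        for cls, sets in CLASS_SNP_SETS.items()
        if sets & detected
    }

# Example: two recognized SNP sets and one identifier that belongs to no class.
print(classes_for(["56_30", "88_64", "99_99"]))
```

In this listing each SNP set is assigned to exactly one class, so a detected SNP set points to a single class-specific panel.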
As noted above, the disclosed classification specific expression panels can be used alone or in combinations of two or more with any other classification specific expression panel. In a non-limiting example, the diagnostic system can comprise classification specific expression panels I; II; III; IV; V; VI; VII; VIII; I and II; I and III; I and IV; I and V; I and VI; I and VII; I and VIII; II and III; II and IV; II and V; II and VI; II and VII; II and VIII; III and IV; III and V; III and VI; III and VII; III and VIII; IV and V; IV and VI; IV and VII; IV and VIII; V and VI; V and VII; V and VIII; VI and VII; VI and VIII; VII and VIII; I, II, and III; I, II, and IV; I, II, and V; I, II, and VI; I, II, and VII; I, II, and VIII; I, III, and IV; I, III, and V; I, III, and VI; I, III, and VII; I, III, and VIII; I, IV, and V; I, IV, and VI; I, IV, and VII; I, IV, and VIII; I, V, and VI; I, V, and VII; I, V, and VIII; I, VI, and VII; I, VI, and VIII; I, VII, and VIII; I, II, III, and IV; I, II, III, and V; I, II, III, and VI; I, II, III, and VII; I, II, III, and VIII; I, II, IV, and V; I, II, IV, and VI; I, II, IV, and VII; I, II, IV, and VIII; I, II, V, and VI; I, II, V, and VII; I, II, V, and VIII; I, II, VI, and VII; I, II, VI, and VIII; I, II, VII, and VIII; I, III, IV, and V; I, III, IV, and VI; I, III, IV, and VII; I, III, IV, and VIII; I, III, V, and VI; I, III, V, and VII; I, III, V, and VIII; I, IV, V, and VI; I, IV, V, and VII; I, IV, V, and VIII; I, V, VI, and VII; I, V, VI, and VIII; I, VI, VII, and VIII; I, II, III, IV, and V; I, II, III, IV, and VI; I, II, III, IV, and VII; I, II, III, IV, and VIII; I, III, IV, V, and VI; I, III, IV, V, and VII; I, III, IV, V, and VIII; I, II, IV, V, and VI; I, II, IV, V, and VII; I, II, IV, V, and VIII; I, II, III, V, and VI; I, II, III, V, and VII; I, II, III, V, and VIII; I, II, III, VI, and VII; I, II, III, VI, and VIII; I, II, III, VII, and VIII; I, II, III, IV, V, and VI; I, II, III, IV, V, and VII; I, II, III, IV, V, and VIII; I, II, III, IV, VI, and VII; I, II, III, IV, VI, and VIII; I, II, III, IV, VII, and VIII; I, II, III, IV, V, VI, and VII; I, II, III, IV, V, VI, and VIII; I, II, III, IV, V, VI, VII, and VIII; II, III, and IV; II, III, and V; II, III, and VI; II, III, and VII; II, III, and VIII; II, IV, and V; II, IV, and VI; II, IV, and VII; II, IV, and VIII; II, V, and VI; II, V, and VII; II, V, and VIII; II, VI, and VII; II, VI, and VIII; II, VII, and VIII; II, III, IV, and V; II, III, IV, and VI; II, III, IV, and VII; II, III, IV, and VIII; II, IV, V, and VI; II, IV, V, and VII; II, IV, V, and VIII; II, IV, VI, and VII; II, IV, VI, and VIII; II, IV, VII, and VIII; II, III, V, and VI; II, III, V, and VII; and II, III, V, and VIII.
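The non-limiting list above enumerates some of the ways the eight classification specific panels can be combined. As a minimal sketch, every such combination can be generated programmatically with the Python standard library; the function name `panel_combinations` and its `min_size` parameter are illustrative and not part of the disclosure.

```python
from itertools import combinations

CLASSES = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]

def panel_combinations(classes, min_size=1):
    """Yield every combination of classification specific panels
    containing at least min_size panels."""
    for size in range(min_size, len(classes) + 1):
        yield from combinations(classes, size)

all_combos = list(panel_combinations(CLASSES))
multi_panel = list(panel_combinations(CLASSES, min_size=2))

print(len(all_combos))   # single panels plus all combinations
print(len(multi_panel))  # combinations of two or more panels
print(" and ".join(all_combos[8]))  # first two-panel combination
```

With eight panels there are 2^8 − 1 = 255 non-empty combinations, of which 247 contain two or more panels, so the recited list is a subset of the possible combinations.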
In one aspect, it is understood and herein contemplated that expression panels can be complemented in the claimed diagnostic system with phenotypic panels, which provide the results of clinical assessments, hereditary surveys, environmental surveys, online surveys, and interviews, thereby creating phenotypic sets. The environmental surveys look at oxidative stress during development or delivery (such as maternal pre-eclampsia or delivery with a low Apgar score), urban versus rural living conditions (urban life increases risk), use of recreational drugs such as marijuana or PCP during adolescence, social isolation, childhood abuse or neglect, and reduction in sensory input such as hearing or visual loss. Accordingly, in one aspect, disclosed herein are diagnostic systems for diagnosing schizophrenia further comprising one or more phenotype panels, wherein each phenotype panel comprises one or more phenotypic sets such as those listed in Table 8. Thus, in one aspect, disclosed herein are diagnostic systems for diagnosing schizophrenia further comprising one or more phenotype panels, wherein each phenotype panel comprises one or more phenotypic sets selected from the group comprising 15_13, 12_11, 21_1, 50_46, 9_6, 46_23, 54_11, 30_17, 18_13, 27_6, 61_18, 64_11, 65_64, 12_4, 42_9, 52_28, 7_3, 48_41, 26_8, 69_41, 10_5, 17_2, 63_24, 69_66, 22_13, 53_6, 59_41, 20_19, 55_7, 34_17, 4_1, 66_54, 8_4, 51_38, 42_7, 18_3, 46_29, 5_2, 57_39, 11_5, 24_4, 48_7, 28_23, and/or 25_20. It is understood and herein contemplated that the disclosed phenotypic panels can comprise any of the phenotypic sets individually or in any combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, or 42 or more of the disclosed phenotype sets.
As noted in Table 7, the phenotypic sets disclosed herein have been associated with one or more symptoms of one or more schizophrenia classes. Thus, contemplated herein are classification specific phenotype panels that can be used individually in the diagnostic system disclosed herein or as one of several classification specific panels in a diagnostic system. For example, in one aspect, disclosed herein are diagnostic systems, wherein the system selects for severe process, with positive and negative symptom schizophrenia (I), and wherein the one or more phenotypic sets comprise 15_13, 12_11, 21_1, 50_46, 9_6, 46_23, 54_11, 30_17, 18_13, 27_6, 61_18, 64_11, or 65_64. Also disclosed are diagnostic systems, wherein the system selects for positive and negative schizophrenia (II), and wherein the one or more phenotypic sets comprise 12_4 or 42_9. Also disclosed are diagnostic systems, wherein the system selects for negative schizophrenia (III), and wherein the one or more phenotypic sets comprise 52_28, 7_3, 48_41, 26_8, 69_41, 10_5, or 17_2. Also disclosed are diagnostic systems, wherein the system selects for positive schizophrenia (IV), and wherein the one or more phenotypic sets comprise 63_24 and 69_66. Also disclosed are diagnostic systems, wherein the system selects for severe process, positive schizophrenia (V), and wherein the one or more phenotypic sets comprise 22_13, 18_13, 53_6, 59_41, 20_19, 55_7, 34_17, 69_66, 27_7, 18_13, 4_1, 66_54, or 8_4. Also disclosed are diagnostic systems, wherein the system selects for moderate process, disorganized negative schizophrenia (VI), and wherein the one or more phenotypic sets comprise 51_38, 42_7, 18_3, or 46_29. Also disclosed are diagnostic systems, wherein the system selects for moderate process, positive and negative schizophrenia (VII), and wherein the one or more phenotypic sets comprise 5_2, 57_39, 11_5, or 24_4.
Also disclosed are diagnostic systems, wherein the system selects for moderate process, continuous positive schizophrenia (VIII), and wherein the one or more phenotypic sets comprise 48_7, 28_23, or 25_20. As noted above, the disclosed classification specific phenotype panels can be used alone or in combination of 2 or more with any other classification specific phenotype panel in the disclosed diagnostic system.
As noted above, the disclosed classification specific phenotypic panels can be used alone or in combination of 2 or more with any other classification specific phenotype panel. In a non-limiting example, the diagnostic system can comprise classification specific phenotype panels I; II; III; IV; V; VI; VII; VIII; I and II; I and III; I and IV; I and V; I and VI; I and VII; I and VIII; II and III; II and IV; II and V; II and VI; II and VII; II and VIII; III and IV; III and V; III and VI; III and VII; III and VIII; IV and V; IV and VI; IV and VII; IV and VIII; V and VI; V and VII, V and VIII; VI and VII; VI and VIII; VII and VIII; I, II, and III; III and IV; I, II, and V; I, II, and VI; I, II, and VII, I, II, and VIII; I, III, and IV; I, III, and V; I, III, and VI; I, III, and VII; I, III, and VIII; I, IV, and V; I, IV, and VI; I, IV, and VII; I, IV, and VIII; I, V, and VI; I, V, and VII, I, V, and VIII; I, VI, and VII, I, VI, and VIII; I, VII and VIII; I, II, III, and IV; I, II, III, and V; I, II, III, and VI, I, II, III, and VII; I, II, III, and VIII; I, II, IV, and V; I, II, IV, and VI; I, II, IV; and VI; I, II, IV, and VII; I, II, IV, and VIII; I, II, V, and VI; I, II, V, and VII; I, II, V, and VIII; I, II, VI, and VII; I, II, VI, and VIII; I, II, VII, and VIII; I, III, IV, and V; I, III, IV, and VI; I, III, IV, and VII; I, III, IV, and VIII; I, III, V, and VI; I, III, V, and VII; I, III, V, and VIII; I, IV, V, and VI; I, IV, V, and VII; I, IV, V, and VIII; I, V, VI, and VII; I, V, VI, and VIII; I, VI, VII, and VIII; I, II, III, IV, and V; I, II, III, IV, and VI; I, II, III, IV, and VII; I, II, III, IV, and VIII; I, III, IV, V, and VI; I, III, IV, V, and VII; I, III, IV, V, and VIII; I, II, IV, V, and VI; I, II, IV, V, and VII; I, II, IV, V, and VIII; I, II, III, V, and VI; I, II, III, V, and VII; I, II, III, V, and VIII; I, II, III, VI, and VII; I, II, III, VI, and VIII; I, II, III, VII, and VIII; I, II, III, IV, V, and VI; I, II, III, IV, V, and VII; I, II, III, 
IV, V, and VIII; I, II, III, IV, VI, and VII; I, II, III, IV, VI, and VIII; I, II, III, IV, VII, and VIII; I, II, III, IV, V, VI, and VII; I, II, III, IV, V, VI, and VIII; I, II, III, IV, V, VI, VII, and VIII; II, III, and IV; II, III, and V; II, III, and VI; II, III, and VII, II, III, and VIII; II, IV, and V; II, IV, and VI; II, IV, and VII; II, IV, and VIII; II, V, and VI; II, V, and VII; II, V, and VIII; II, VI, and VII, II, VI, and VIII; II, VII and VIII; II, III, IV, and V; II, III, IV, and VI; I II, III, IV; and VI; II, III, IV, and VII; II, III, IV, and VIII; II, IV, V, and VI; II, IV, V, and VII; II, IV, V, and VIII; II, IV, VI, and VII; II, IV, VI, and VIII; II, IV, VII, and VIII; II, III, V, and V; II, III, V, and VI; II, III, V, and VII; and II, III, V, and VIII.
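The combinations enumerated above need not be written out by hand. As a minimal sketch only (the function name `panel_combinations` is illustrative and is not part of the disclosure), every non-empty combination of the eight classification specific panels could be generated as follows:

```python
from itertools import combinations

# The eight classification labels used in the text.
CLASSES = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]

def panel_combinations(labels, min_size=1):
    """Return every combination of `labels` with at least `min_size` members."""
    result = []
    for size in range(min_size, len(labels) + 1):
        result.extend(combinations(labels, size))
    return result

all_combos = panel_combinations(CLASSES)
print(len(all_combos))   # 2**8 - 1 = 255 non-empty combinations
print(all_combos[8])     # ('I', 'II'), the first two-panel combination
```

Generating the combinations programmatically avoids the omissions and repeats that a hand-written list of this length is prone to.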
It is further understood that a diagnostic system can comprise any one phenotype panel, or a combination of two or more phenotype panels, in combination with any one expression panel or a combination of two or more expression panels.
In one aspect, it is disclosed that the diagnostic system can comprise a purpose-built analysis and diagnostic system to read the expression panel, analyze the expression panel data, input phenotypic sets, and display data and risk profiles associated with having schizophrenia or any particular class of schizophrenia disclosed herein. Thus, in one aspect, disclosed herein are diagnostic systems of any preceding aspect further comprising a means for reading the one or more expression panels, a computer operationally linked to the means for reading the one or more expression panels, and a display for visualizing the diagnostic risk; wherein the computer identifies the expression profile of an expression panel, compares the expression profile to a control, and catalogs that data; wherein the computer provides an input source for inputting phenotypic data into a phenomic database; wherein the computer compares the expression and phenomic data and calculates relationships between the genomic and phenotypic data; wherein the computer compares the genomic and phenotypic relationship data to a reference standard; and wherein the computer outputs the relationship data and the standard on the display.
As noted above, the disclosed expression panel can be analyzed or read by any means known in the art, including Northern analysis, RNAse protection assay, PCR, QPCR, genome microarray, DNA microarray, MMChips, low density PCR array, oligo array, protein array, peptide array, phenotype microarray, SAGE, and/or high throughput sequencing. The readers can comprise any of those known in the art including, but not limited to, array readers marketed by Affymetrix, Agilent, Applied Microarrays, Arrayit, and Illumina.
As disclosed herein protein arrays are solid-phase ligand binding assay systems using immobilized proteins on surfaces which include glass, membranes, microtiter wells, mass spectrometer plates, and beads or other particles. The assays are highly parallel (multiplexed) and often miniaturized (microarrays, protein chips). Their advantages include being rapid and automatable, capable of high sensitivity, economical on reagents, and giving an abundance of data for a single experiment. Bioinformatics support is important; the data handling demands sophisticated software and data comparison analysis. However, the software can be adapted from that used for DNA arrays, as can much of the hardware and detection systems.
One of the chief formats is the capture array, in which ligand-binding reagents, which are usually antibodies but can also be alternative protein scaffolds, peptides or nucleic acid aptamers, are used to detect target molecules in mixtures such as plasma or tissue extracts. In diagnostics, capture arrays can be used to carry out multiple immunoassays in parallel, both testing for several analytes in individual sera for example and testing many serum samples simultaneously. In proteomics, capture arrays are used to quantitate and compare the levels of proteins in different samples in health and disease, i.e. protein expression profiling. Proteins other than specific ligand binders are used in the array format for in vitro functional interaction screens such as protein-protein, protein-DNA, protein-drug, receptor-ligand, enzyme-substrate, etc. The capture reagents themselves are selected and screened against many proteins, which can also be done in a multiplex array format against multiple protein targets.
For construction of arrays, sources of proteins include cell-based expression systems for recombinant proteins, purification from natural sources, production in vitro by cell-free translation systems, and synthetic methods for peptides. Many of these methods can be automated for high throughput production. For capture arrays and protein function analysis, it is important that proteins should be correctly folded and functional; this is not always the case, e.g. where recombinant proteins are extracted from bacteria under denaturing conditions. Nevertheless, arrays of denatured proteins are useful in screening antibodies for cross-reactivity, identifying autoantibodies and selecting ligand binding proteins.
Protein arrays have been designed as a miniaturization of familiar immunoassay methods such as ELISA and dot blotting, often utilizing fluorescent readout, and facilitated by robotics and high throughput detection systems to enable multiple assays to be carried out in parallel. Commonly used physical supports include glass slides, silicon, microwells, nitrocellulose or PVDF membranes, and magnetic and other microbeads. While microdrops of protein delivered onto planar surfaces are the most familiar format, alternative architectures include CD centrifugation devices based on developments in microfluidics (Gyros, Monmouth Junction, N.J.) and specialised chip designs, such as engineered microchannels in a plate (e.g., The Living Chip™, Biotrove, Woburn, Mass.) and tiny 3D posts on a silicon surface (Zyomyx, Hayward Calif.). Particles in suspension can also be used as the basis of arrays, providing they are coded for identification; systems include colour coding for microbeads (Luminex, Austin, Tex.; Bio-Rad Laboratories) and semiconductor nanocrystals (e.g., QDots™, Quantum Dot, Hayward, Calif.), and barcoding for beads (UltraPlex™, SmartBead Technologies Ltd, Babraham, Cambridge, UK) and multimetal microrods (e.g., Nanobarcodes™ particles, Nanoplex Technologies, Mountain View, Calif.). Beads can also be assembled into planar arrays on semiconductor chips (LEAPS technology, BioArray Solutions, Warren, N.J.).
Immobilization of proteins involves both the coupling reagent and the nature of the surface being coupled to. A good protein array support surface is chemically stable before and after the coupling procedures, allows good spot morphology, displays minimal nonspecific binding, does not contribute a background in detection systems, and is compatible with different detection systems. The immobilization method used is reproducible, applicable to proteins of different properties (size, hydrophilic, hydrophobic), amenable to high throughput and automation, and compatible with retention of fully functional protein activity. Orientation of the surface-bound protein is recognized as an important factor in presenting it to ligand or substrate in an active state; for capture arrays the most efficient binding results are obtained with orientated capture reagents, which generally require site-specific labeling of the protein.
Both covalent and noncovalent methods of protein immobilization are used and have various pros and cons. Passive adsorption to surfaces is methodologically simple, but allows little quantitative or orientational control; it may or may not alter the functional properties of the protein, and reproducibility and efficiency are variable. Covalent coupling methods provide a stable linkage, can be applied to a range of proteins and have good reproducibility; however, orientation may be variable, chemical derivatization may alter the function of the protein and requires a stable interactive surface. Biological capture methods utilizing a tag on the protein provide a stable linkage and bind the protein specifically and in reproducible orientation, but the biological reagent must first be immobilized adequately and the array may require special handling and have variable stability.
Several immobilization chemistries and tags have been described for fabrication of protein arrays. Substrates for covalent attachment include glass slides coated with amino- or aldehyde-containing silane reagents. In the Versalinx™ system (Prolinx, Bothell, Wash.) reversible covalent coupling is achieved by interaction between the protein derivatised with phenyldiboronic acid, and salicylhydroxamic acid immobilized on the support surface. This also has low background binding and low intrinsic fluorescence and allows the immobilized proteins to retain function. Noncovalent binding of unmodified protein occurs within porous structures such as HydroGel™ (PerkinElmer, Wellesley, Mass.), based on a 3-dimensional polyacrylamide gel; this substrate is reported to give a particularly low background on glass microarrays, with a high capacity and retention of protein function. Widely used biological coupling methods are through biotin/streptavidin or hexahistidine/Ni interactions, having modified the protein appropriately. Biotin may be conjugated to a poly-lysine backbone immobilised on a surface such as titanium dioxide (Zyomyx) or tantalum pentoxide (Zeptosens, Witterswil, Switzerland).
Array fabrication methods include robotic contact printing, ink-jetting, piezoelectric spotting and photolithography. A number of commercial arrayers are available [e.g. Packard Biosciences] as well as manual equipment [V & P Scientific]. Bacterial colonies can be robotically gridded onto PVDF membranes for induction of protein expression in situ.
At the limit of spot size and density are nanoarrays, with spots on the nanometer spatial scale, enabling thousands of reactions to be performed on a single chip less than 1mm square. BioForce Laboratories have developed nanoarrays with 1521 protein spots in 85 sq microns, equivalent to 25 million spots per sq cm, at the limit for optical detection; their readout methods are fluorescence and atomic force microscopy (AFM).
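The quoted spot density can be checked with a short calculation. The sketch below rests on two assumptions that are not stated in the text: that "85 sq microns" means an 85 micron by 85 micron square, and that the 1521 spots form a 39 by 39 grid. Under those assumptions the density comes to roughly 21 million spots per sq cm, the same order of magnitude as the quoted figure.

```python
# Assumption: "85 sq microns" is read as an 85 um x 85 um square,
# and the 1521 spots are read as a 39 x 39 grid (39 * 39 = 1521).
spots = 1521
side_um = 85.0
grid_side = 39

area_um2 = side_um ** 2          # 7225 square microns
UM2_PER_CM2 = 1e4 ** 2           # 1 cm = 10,000 um, so 1 sq cm = 1e8 sq um
density_per_cm2 = spots / area_um2 * UM2_PER_CM2

# Centre-to-centre spacing if the grid fills the square edge to edge.
pitch_um = side_um / grid_side

print(round(density_per_cm2 / 1e6, 1), "million spots per sq cm")
print(round(pitch_um, 2), "um pitch")
```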
Fluorescence labeling and detection methods are widely used. The same instrumentation as used for reading DNA microarrays is applicable to protein arrays. For differential display, capture (e.g., antibody) arrays can be probed with fluorescently labeled proteins from two different cell states, in which cell lysates are directly conjugated with different fluorophores (e.g. Cy-3, Cy-5) and mixed, such that the color acts as a readout for changes in target abundance. Fluorescent readout sensitivity can be amplified 10-100 fold by tyramide signal amplification (TSA) (PerkinElmer Lifesciences). Planar waveguide technology (Zeptosens) enables ultrasensitive fluorescence detection, with the additional advantage of no intervening washing procedures. High sensitivity can also be achieved with suspension beads and particles, using phycoerythrin as label (Luminex) or the properties of semiconductor nanocrystals (Quantum Dot). A number of novel alternative readouts have been developed, especially in the commercial biotech arena. These include adaptations of surface plasmon resonance (HTS Biosystems, Intrinsic Bioprobes, Tempe, Ariz.), rolling circle DNA amplification (Molecular Staging, New Haven Conn.), mass spectrometry (Intrinsic Bioprobes; Ciphergen, Fremont, Calif.), resonance light scattering (Genicon Sciences, San Diego, Calif.) and atomic force microscopy [BioForce Laboratories].
Capture arrays form the basis of diagnostic chips and arrays for expression profiling. They employ high affinity capture reagents, such as conventional antibodies, single domains, engineered scaffolds, peptides or nucleic acid aptamers, to bind and detect specific target ligands in high throughput manner.
An alternative to an array of capture molecules is one made through ‘molecular imprinting’ technology, in which peptides (e.g., from the C-terminal regions of proteins) are used as templates to generate structurally complementary, sequence-specific cavities in a polymerizable matrix; the cavities can then specifically capture (denatured) proteins that have the appropriate primary amino acid sequence (ProteinPrint™, Aspira Biosystems, Burlingame, Calif.).
Another methodology which can be used diagnostically and in expression profiling is the ProteinChip® array (Ciphergen, Fremont, Calif.), in which solid phase chromatographic surfaces bind proteins with similar characteristics of charge or hydrophobicity from mixtures such as plasma or tumour extracts, and SELDI-TOF mass spectrometry is used to detect the retained proteins.
Large-scale functional chips have been constructed by immobilizing large numbers of purified proteins and used to assay a wide range of biochemical functions, such as protein interactions with other proteins, drug-target interactions, enzyme-substrates, etc. Generally they require an expression library, cloned into E. coli, yeast or similar from which the expressed proteins are then purified, e.g. via a His tag, and immobilized. Cell free protein transcription/translation is a viable alternative for synthesis of proteins which do not express well in bacterial or other in vivo systems.
For detecting protein-protein interactions, protein arrays can be in vitro alternatives to the cell-based yeast two-hybrid system and may be useful where the latter is deficient, such as interactions involving secreted proteins or proteins with disulphide bridges. High-throughput analysis of biochemical activities on arrays has been described for yeast protein kinases and for various functions (protein-protein and protein-lipid interactions) of the yeast proteome, where a large proportion of all yeast open-reading frames was expressed and immobilised on a microarray. Large-scale ‘proteome chips’ promise to be very useful in identification of functional interactions, drug screening, etc. (Proteometrix, Branford, Conn.).
As a two-dimensional display of individual elements, a protein array can be used to screen phage or ribosome display libraries, in order to select specific binding partners, including antibodies, synthetic scaffolds, peptides and aptamers. In this way, ‘library against library’ screening can be carried out. Screening of drug candidates in combinatorial chemical libraries against an array of protein targets identified from genome projects is another application of the approach.
A multiplexed bead assay, such as, for example, the BD™ Cytometric Bead Array, is a series of spectrally discrete particles that can be used to capture and quantitate soluble analytes. The analyte is then measured by detection of a fluorescence-based emission and flow cytometric analysis. Multiplexed bead assay generates data that is comparable to ELISA based assays, but in a “multiplexed” or simultaneous fashion. Concentration of unknowns is calculated for the cytometric bead array as with any sandwich format assay, i.e. through the use of known standards and plotting unknowns against a standard curve. Further, multiplexed bead assay allows quantification of soluble analytes in samples never previously considered due to sample volume limitations. In addition to the quantitative data, powerful visual images can be generated revealing unique profiles or signatures that provide the user with additional information at a glance.
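The standard-curve step can be illustrated with a minimal sketch. It assumes a signal that rises monotonically with concentration and uses piecewise log-log interpolation between known standards; the function name, the standards, and the units are hypothetical, and commercial software typically fits a four- or five-parameter logistic curve instead.

```python
import math

def interpolate_concentration(signal, standards):
    """Estimate concentration from a signal by log-log linear interpolation.

    `standards` is a list of (concentration, signal) pairs from known
    standards; signals are assumed to increase strictly with concentration.
    """
    pts = sorted(standards, key=lambda p: p[1])
    if not pts[0][1] <= signal <= pts[-1][1]:
        raise ValueError("signal outside the range of the standard curve")
    for (c_lo, s_lo), (c_hi, s_hi) in zip(pts, pts[1:]):
        if s_lo <= signal <= s_hi:
            frac = (math.log(signal) - math.log(s_lo)) / (math.log(s_hi) - math.log(s_lo))
            return math.exp(math.log(c_lo) + frac * (math.log(c_hi) - math.log(c_lo)))

# Hypothetical standards: (concentration in pg/mL, mean fluorescence intensity).
standards = [(10, 50), (100, 500), (1000, 5000)]
unknown = interpolate_concentration(1500, standards)
print(round(unknown, 1))
```

Signals outside the range of the standards raise an error rather than being extrapolated, which mirrors the usual practice of re-running such samples at a different dilution.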
It is understood that use of the disclosed diagnostic system and/or expression and phenotypic panels can provide the capability to diagnose a subject with schizophrenia, assess the risk of having or developing schizophrenia, classify a schizophrenia, and target a treatment of a schizophrenia. Accordingly, in one aspect, disclosed herein are methods of diagnosing a subject with schizophrenia comprising obtaining a biological sample from the subject, obtaining clinical data from the subject, and applying the biological sample and clinical data to the diagnostic system disclosed herein.
In one aspect, disclosed herein are methods of diagnosing a subject with schizophrenia and/or determining the schizophrenia class comprising: obtaining a biological sample from the subject; obtaining clinical data from the subject; applying the biological sample and clinical data to a diagnostic system for diagnosing schizophrenia, wherein the diagnostic system comprises one or more expression panels and one or more phenotypic panels; and comparing the genomic and phenotypic panel results to a reference standard; wherein the presence of one or more SNP sets and one or more phenotypic sets in the subject's sample indicates the presence of schizophrenia, and wherein the genomic and phenotypic profile of the reference standard (such as, for example, Table 7) most closely correlating with the subject's genomic and phenotypic profile indicates the schizophrenia class of the subject.
It is understood that any one or combination of the SNP sets disclosed herein can be used in the disclosed methods. Thus, disclosed herein are methods of diagnosing a subject with schizophrenia and/or determining the schizophrenia class, wherein the one or more expression panels each comprise one or more of the single nucleotide polymorphism (SNP) sets selected from the group consisting of 19_2, 88_64, 81_13, 87_76, 58_29, 83_41, 9_9, 10_4, 14_6, 56_30, 42_37, 65_25, 71_55, 12_11, 90_78, 77_5, 88_8, 51_28, 59_48, 41_12, 22_11, 13_12, 31_22, 85_84, 87_84, 16_10, 56_19, 75_31, 81_73, 85_23, 21_8, 76_74, 61_39, 75_67, 76_63, 81_3, 87_26, 88_43, 25_10, 12_2, 52_42, and 54_51.
Because of the associations noted above in Table 7, it is possible to create panels to assess the risk of a subject having a particular classification of schizophrenia. These classification specific expression panels can be used individually in the diagnostic method disclosed herein or as one of several classification specific panels in a diagnostic method. For example, in one aspect, disclosed herein are diagnostic methods, wherein the system selects for severe process, with positive and negative symptom schizophrenia (I), and wherein the one or more SNP sets comprise 56_30, 75_67, or 76_74. Also disclosed are diagnostic methods, wherein the system selects for positive and negative schizophrenia (II), and wherein the one or more SNP sets comprise 59_48, 71_55, 21_8, 54_51, 31_22, 65_25, or 87_84. Also disclosed are diagnostic methods, wherein the system selects for negative schizophrenia (III), and wherein the one or more SNP sets comprise 58_29, 9_9, 22_11, 81_3, 13_12, 61_39, 10_4, 81_73, 75_31, 56_19, 88_8, or 12_2. Also disclosed are diagnostic methods, wherein the system selects for positive schizophrenia (IV), and wherein the one or more SNP sets comprise 88_64, 85_84, or 41_12. Also disclosed are diagnostic methods, wherein the system selects for severe process, positive schizophrenia (V), and wherein the one or more SNP sets comprise 77_5, 81_13, or 25_10. Also disclosed are diagnostic methods, wherein the system selects for moderate process, disorganized negative schizophrenia (VI), and wherein the one or more SNP sets comprise 19_2, 52_42, 90_78, 12_11, 87_76, and 14_6. Also disclosed are diagnostic methods, wherein the system selects for moderate process, positive and negative schizophrenia (VII), and wherein the one or more SNP sets comprise 42_37, 88_43, or 51_28.
Also disclosed are diagnostic methods, wherein the system selects for moderate process, continuous positive schizophrenia (VIII), and wherein the one or more SNP sets comprise 16_10, 83_41, or 87_26. As with the diagnostic systems, any combination of 2, 3, 4, 5, 6, 7, 8, or more of the disclosed expression panels can be used in the diagnostic methods.
It is understood that any one or combination of the phenotype panels disclosed herein can be used in the disclosed methods. Thus, disclosed herein are methods of diagnosing a subject with schizophrenia and/or determining the schizophrenia class, wherein the one or more phenotype panels each comprise one or more phenotypic sets selected from the group consisting of 15_13, 12_11, 21_1, 50_46, 9_6, 46_23, 54_11, 30_17, 18_13, 27_6, 61_18, 64_11, 65_64, 12_4, 42_9, 52_28, 7_3, 48_41, 26_8, 69_41, 10_5, 17_2, 63_24, 69_66, 22_13, 53_6, 59_41, 20_19, 55_7, 34_17, 27_7, 4_1, 66_54, 8_4, 51_38, 42_7, 18_3, 46_29, 5_2, 57_39, 11_5, 24_4, 48_7, 28_23, and 25_20.
As noted in Table 7, the phenotypic sets disclosed herein have been associated with one or more symptoms of one or more schizophrenia classes. Thus, contemplated herein are classification specific phenotype panels that can be used individually in the diagnostic methods disclosed herein or as one of several classification specific panels in a diagnostic method. For example, in one aspect, disclosed herein are diagnostic methods, wherein the system selects for severe process, with positive and negative symptom schizophrenia (I), and wherein the one or more phenotypic sets comprise 15_13, 12_11, 21_1, 50_46, 9_6, 46_23, 54_11, 30_17, 18_13, 27_6, 61_18, 64_11, or 65_64. Also disclosed are diagnostic methods, wherein the system selects for positive and negative schizophrenia (II), and wherein the one or more phenotypic sets comprise 12_4 or 42_9. Also disclosed are diagnostic methods, wherein the system selects for negative schizophrenia (III), and wherein the one or more phenotypic sets comprise 52_28, 7_3, 48_41, 26_8, 69_41, 10_5, or 17_2. Also disclosed are diagnostic methods, wherein the system selects for positive schizophrenia (IV), and wherein the one or more phenotypic sets comprise 63_24 and 69_66. Also disclosed are diagnostic methods, wherein the system selects for severe process, positive schizophrenia (V), and wherein the one or more phenotypic sets comprise 22_13, 18_13, 53_6, 59_41, 20_19, 55_7, 34_17, 69_66, 27_7, 18_13, 4_1, 66_54, or 8_4. Also disclosed are diagnostic methods, wherein the system selects for moderate process, disorganized negative schizophrenia (VI), and wherein the one or more phenotypic sets comprise 51_38, 42_7, 18_3, or 46_29. Also disclosed are diagnostic methods, wherein the system selects for moderate process, positive and negative schizophrenia (VII), and wherein the one or more phenotypic sets comprise 5_2, 57_39, 11_5, or 24_4.
Also disclosed are diagnostic methods, wherein the system selects for moderate process, continuous positive schizophrenia (VIII), and wherein the one or more phenotypic sets comprise 48_7, 28_23, or 25_20. As noted above, the disclosed classification specific phenotype panels can be used alone or in combination of 2 or more with any other classification specific phenotype panel in the disclosed diagnostic methods.
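The class assignments recited above lend themselves to a simple lookup. The following is a sketch under stated assumptions: the set identifiers are transcribed from the preceding paragraphs, while the function `rank_classes` and its overlap-count scoring are illustrative stand-ins for the comparison to a reference standard and are not the scoring used in the disclosure.

```python
# Class-to-set assignments transcribed from the paragraphs above.
SNP_SETS = {
    "I": "56_30 75_67 76_74",
    "II": "59_48 71_55 21_8 54_51 31_22 65_25 87_84",
    "III": "58_29 9_9 22_11 81_3 13_12 61_39 10_4 81_73 75_31 56_19 88_8 12_2",
    "IV": "88_64 85_84 41_12",
    "V": "77_5 81_13 25_10",
    "VI": "19_2 52_42 90_78 12_11 87_76 14_6",
    "VII": "42_37 88_43 51_28",
    "VIII": "16_10 83_41 87_26",
}
PHENO_SETS = {
    "I": "15_13 12_11 21_1 50_46 9_6 46_23 54_11 30_17 18_13 27_6 61_18 64_11 65_64",
    "II": "12_4 42_9",
    "III": "52_28 7_3 48_41 26_8 69_41 10_5 17_2",
    "IV": "63_24 69_66",
    "V": "22_13 18_13 53_6 59_41 20_19 55_7 34_17 69_66 27_7 4_1 66_54 8_4",
    "VI": "51_38 42_7 18_3 46_29",
    "VII": "5_2 57_39 11_5 24_4",
    "VIII": "48_7 28_23 25_20",
}

def rank_classes(subject_snp_sets, subject_pheno_sets):
    """Rank classes by how many of the subject's sets each class lists.

    Overlap counting is an illustrative stand-in for comparison to a
    reference standard; it is not the scoring used in the disclosure.
    """
    scores = {}
    for cls in SNP_SETS:
        snp_hits = set(SNP_SETS[cls].split()) & set(subject_snp_sets)
        pheno_hits = set(PHENO_SETS[cls].split()) & set(subject_pheno_sets)
        scores[cls] = len(snp_hits) + len(pheno_hits)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical subject carrying two class I SNP sets and one class I phenotypic set.
ranking = rank_classes({"56_30", "75_67"}, {"15_13"})
print(ranking[0])   # ('I', 3)
```

Because some phenotypic sets (for example 18_13 and 69_66) are listed under more than one class, a subject's phenotypic data alone may tie between classes; combining SNP sets and phenotypic sets reduces such ties.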
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how the compounds, compositions, articles, devices and/or methods claimed herein are made and evaluated, and are intended to be purely exemplary and are not intended to limit the disclosure. Efforts have been made to ensure accuracy with respect to numbers (e.g., amounts, temperature, etc.), but some errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, temperature is in ° C. or is at ambient temperature, and pressure is at or near atmospheric.
a) Identifying Many SNP Sets as Candidates for Schizophrenia Risk
We first investigated the genotypic architecture of schizophrenia in the MGS study to identify SNP sets without knowledge of the subject's clinical status (i.e., case or control). Our exhaustive search uncovered 723 nonidentical and possibly overlapping SNP sets in the MGS samples. The SNP sets varied in terms of numbers of both subjects and SNPs. For example, one group contains 70 subjects and 24 SNPs, as expected because few subjects can share a large number of SNPs. Conversely, another group contains 258 subjects and three SNPs, as expected because a large number of subjects are likely to share only a few SNPs. Initially, we retained a large number of SNP sets merely to identify the genotypic clusters in all subjects whether they had schizophrenia or not.
b) SNP Sets Vary Greatly in Risk for Schizophrenia
Second, we computed the risk for schizophrenia in carriers of each SNP set (
The global variance in liability to schizophrenia explained by the average effects of all SNPs simultaneously in our sample was 24%. While individual SNPs were mostly low penetrant, many high-risk SNP sets were highly penetrant (e.g., 100% to 70%; see Table 1) and much more informative in predicting schizophrenia risk.
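As a minimal sketch of the risk computation, assume that the risk of a SNP set is the fraction of its carriers who are cases; this definition, the function name `snp_set_risk`, and the subject identifiers below are illustrative assumptions rather than the procedure of the study. The 70% cut-off is taken from the range quoted above.

```python
def snp_set_risk(carriers, cases):
    """Fraction of a SNP set's carriers who are cases (assumed definition of risk)."""
    carriers = set(carriers)
    if not carriers:
        raise ValueError("SNP set has no carriers")
    return len(carriers & set(cases)) / len(carriers)

# Hypothetical subject identifiers; not data from the study.
cases = {"s1", "s2", "s3", "s4", "s5"}
snp_sets = {
    "set_A": {"s1", "s2", "s3", "s6"},   # 3 of 4 carriers are cases
    "set_B": {"s4", "s7", "s8", "s9"},   # 1 of 4 carriers is a case
}

risks = {name: snp_set_risk(members, cases) for name, members in snp_sets.items()}
high_risk = [name for name, risk in risks.items() if risk >= 0.70]
print(risks)       # {'set_A': 0.75, 'set_B': 0.25}
print(high_risk)   # ['set_A']
```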
c) Relations Among SNP Sets to One Another and to Gene Products
We show herein that schizophrenia may be an etiologically heterogeneous group of illnesses in which some genotypic networks are disjoint, that is, share neither SNPs nor subjects. To test this, we first checked for overlap in constituent SNPs and/or subjects among all the SNP sets at high risk for schizophrenia (see
We also determined that some SNP sets share SNPs but not subjects (e.g., 59_48 and 87_76;
When evaluating whether different genotypic networks operate through distinct mechanisms, we found that high-risk SNP sets mapped to various classes of genes (e.g., protein coding, ncRNA genes, and pseudogenes) related to known functions and causing different effects on their products (
a. The 42 SNP sets at high risk for schizophrenia involved at least 96 gene loci, including 54 protein-coding loci and 42 polymorphisms at regulatory sites, as well as 112 polymorphisms in either intergenic or unannotated regions (see full Tables 2 and 6 and FIG. 7).
Genes: GSR, HACE1, NCAM1, SEMA3A, EML5, DUSP4, CSMD1, NTRK3, SNTG1, NALCN, OPN5, NETO2, PTBP2, RPL5, SNX19, SMARCAD1, FAM69A, HPGDS, GP2
d) Complex Genotypic-Phenotypic Relationships in Schizophrenia
Next we examined whether the complex genetic architecture of schizophrenia leads to phenotypic heterogeneity. Using data from the Diagnostic Interview for Genetic Studies, as well as from the Best Estimate Diagnosis Code Sheet submitted by GAIN/non-GAIN to dbGaP (see
Specifically, we identified a phenotypic set indicating a general process of severe deterioration (i.e., continuous positive symptoms with marked and progressive impairment) that was associated with many SNP sets (e.g., SNP sets 75_67 and 56_30, with p values of 2.3E-13 and 2.55E-05, respectively; Table 7,
e) Positive and Negative Symptoms Differentiate Classes of Schizophrenia
Genotypic and phenotypic relationships could be grouped into eight classes of schizophrenia, as shown in
f) Replication of Results in Two Independent Samples
We tested the replicability of our findings in the MGS study by carrying out the same analyses of the genotypic and phenotypic architecture of schizophrenia in the CATIE and Portuguese Island samples. A total of 1,303 SNPs were shared between the selected SNPs in the MGS and CATIE samples, and 1,234 SNPs between the MGS and Portuguese Island samples. Imputed variants were not considered, to avoid possible biases.
Together, both samples reproduced at least 81% of the SNP sets at risk (see Table 9). In addition, most of the SNP sets replicated in the two PGC samples achieved risk values as high as those of the MGS sample (>70%): 70% of those identified exhibit >70% risk, and 90% show >60% risk. Some SNP sets exhibited slightly higher risk values than those in the MGS sample. The genotypic-phenotypic relations in the CATIE and Portuguese Island studies closely matched those observed in the MGS study (hypergeometric statistics, p values 2E-13 to 1E-03). The eight schizophrenia classes exhibited high reproducibility. For example, except for one relation (“−” in the MGS study and “+ and −” in CATIE; see Table 9), all relations exhibited similar positive and negative symptoms in the MGS study and CATIE. Three relations showed less specific symptoms in CATIE than in the MGS study, as expected because CATIE did not use the Diagnostic Interview for Genetic Studies.
We found few differences when comparing the MGS and Portuguese Island studies (see Table 9), except differences in severity that preserved the sign of the symptoms. Three relations with negative symptoms in the MGS study exhibited negative and positive symptoms in the Portuguese Island sample (see Table 9). Only two SNP sets in the Portuguese Island sample had no significant crossmatch with the phenotypic features expected from the MGS study.
We first identified sets of interacting single-nucleotide polymorphisms (SNPs) that cluster within subgroups of individuals (SNP sets) regardless of clinical status in the MGS Consortium study, employing our generalized factorization method combined with non-negative matrix factorization to identify candidates for functional clusters (see
Second, we examined the risk of schizophrenia for each SNP set and identified those with high risk. The statistical significance of the association of SNP sets with schizophrenia was calculated using the SNP-Set Kernel Association Test (SKAT) program, which properly accounts for multiple comparisons.
Third, we checked for significant overlap among SNP sets in terms of subjects and/or SNPs using hypergeometric statistics (see
Fourth, we identified sets of distinct clinical features that cluster in particular cases with schizophrenia (i.e., phenotypic sets or clinical syndromes) without regard for their genetic background, again using non-negative matrix factorization. Ninety-three clinical features of schizophrenia from interviews based on the Diagnostic Interview for Genetic Studies, as well as the Best Estimate Diagnosis Code Sheet submitted by GAIN/non-GAIN to dbGaP, were initially considered with the MGS sample. The Diagnostic Interview for Genetic Studies was utilized for the Portuguese Island samples. Corresponding features were extracted in CATIE from the Positive and Negative Syndrome Scale, the Quality of Life Questionnaire, and the Structured Clinical Interview for DSM-IV. These phenotypic sets and their relations with one another characterize the phenotypic architecture of schizophrenia (
Fifth, we tested whether SNP sets were associated with distinct phenotypic sets in the MGS sample, and we tested the replicability of these relations in the two other independent studies. Replication was evaluated in terms of replication of the SNP sets and their corresponding risk, as well as the relationships between SNP sets and phenotypic sets. In the samples that used the Diagnostic Interview for Genetic Studies (the MGS and Portuguese Island samples), the specific phenotypic features can be compared. Since the CATIE study did not use the Diagnostic Interview for Genetic Studies, we estimated the corresponding symptoms from available phenotypic data (based on the Positive and Negative Syndrome Scale, the Quality of Life Questionnaire, and the Structured Clinical Interview for DSM-IV). Genotypic and phenotypic data were available for 738 cases in CATIE and 346 cases in the Portuguese Island study. The significance of cohesive relations among SNP sets and clinical syndromes was tested using hypergeometric statistics. The relations between the genotypic and phenotypic clusters characterize the genotypic-phenotypic architecture (
a) Genomics Dataset: GAIN and NonGAIN Studies
We first investigated the architecture of schizophrenia (SZ) using the GAIN and NonGAIN genome-wide association studies (GWAS) as our main targets; these are coherent case-control studies performed in a single lab under similar conditions. This study contains data from 8023 subjects (4196 patients and 3827 controls), combining data from subjects of Euro-American (EA) and African-American (AA) ancestry. Genotyping was carried out using the Affymetrix 6.0 array, which assays 906,600 SNPs.
This study was originally performed in part at Washington University. The study population, ascertainment, phenomic and genomic datasets, as well as other information related to this study, can be accessed in dbGaP by their identifiers: phs000021.v3.p2 and phs000167.v1.p1 for the GAIN and NonGAIN projects, respectively.
The genotype data were coded in a matrix [SNPs×subjects], where the columns and rows correspond to subjects and SNPs, respectively. In each cell of the matrix, the value for the corresponding SNP and subject is assigned as 1, 2, and 3 for the SNP allele values AA, AB, and BB, respectively. Missing values were coded as 0.
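The coding scheme above can be illustrated with a minimal sketch. The function name `encode_genotypes`, the dictionary layout of the input calls, and the SNP identifiers are hypothetical and chosen only for illustration; only the 1/2/3/0 coding and the [SNPs×subjects] orientation come from the text.

```python
# Coding taken from the text: AA=1, AB=2, BB=3, missing=0.
GENOTYPE_CODES = {"AA": 1, "AB": 2, "BB": 3}


def encode_genotypes(calls_by_subject, snp_ids):
    """Build a [SNPs x subjects] matrix as a list of rows (one row per SNP).

    calls_by_subject is a hypothetical input layout:
    {subject_id: {snp_id: "AA" | "AB" | "BB"}}; absent calls count as missing.
    """
    subjects = sorted(calls_by_subject)
    matrix = [
        [GENOTYPE_CODES.get(calls_by_subject[s].get(snp), 0) for s in subjects]
        for snp in snp_ids
    ]
    return subjects, matrix


calls = {
    "subj1": {"rs1": "AA", "rs2": "AB"},
    "subj2": {"rs1": "BB"},  # rs2 is missing for subj2 and is coded as 0
}
subjects, M = encode_genotypes(calls, ["rs1", "rs2"])
print(M)
```

Keeping 0 for missing values preserves non-negativity, which matters for the factorization step that follows.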
b) Data Cleaning
The quality control (QC) of the genotypic data was performed by sequentially removing all SNPs that satisfied the following criteria:
A total of 209,321 of the 906,109 genotyped SNPs were excluded under the criteria described above, so 696,788 SNPs passed the QC filters. Then, 2891 SNPs were pre-selected to reduce the large search space, using the logistic association function included in the PLINK software suite with sex and ancestry as covariates and a generous threshold (p-value<0.01). This threshold was set at 0.01 because it is approximately the value used in the supplementary tables reported previously for the AA, EA and AA-EA analyses.
c) Methodology: a Divide & Conquer Strategy to Dissect a GWAS into the Genotypic-Phenotypic Architecture of a Disease
To uncover the architecture of SZ we applied a “Divide & Conquer” strategy (see
The “divide” step deconstructs genotypic and phenotypic data independently, and explores multiple local patterns (i.e., SNP sets and phenotypic sets). We used non-negative matrix factorization methods that have been applied to characterize complex genomic and social profiles, and generalized them to approach GWA data in a purely data-driven and unbiased fashion.
Thus, our systematic grouping strategy is not directed by previous knowledge of polygenic involvement in SZ, does not limit subjects to only one SNP set, and does not predefine the number of SNP sets, avoiding possible biases and assumptions that relationships are linear, regular, or random. Unlike other approaches, we do not constrain SNP sets to a particular genome feature or to be in linkage disequilibrium (LD), and the phenotypic status of the subjects is not considered in SNP set formation (i.e., it is unsupervised).
After incorporating phenotypic status a posteriori within each set (e.g., cases and controls), we establish their statistical significance with powerful and well-founded test methods that perform the appropriate corrections for the use of SNP sets, as well as provide an unbiased risk surface of disease to test predictions.
The “conquer” step consists of three stages. First, assembling the uncovered local components of the genotypic architecture into genotypic networks of SNP sets, where two SNP sets are connected if they (i) comprise different sets of subjects described by similar sets of SNPs, (ii) and/or if they have similar sets of subjects but characterized by distinct sets of SNPs, (iii) and/or if one of the two SNP sets contains a subset of subjects and SNPs of the other SNP set. Second, optimally combining the local components of the phenotypic architecture (i.e., phenotypic sets) with the genotypic sets to expose the joint genotypic-phenotypic architecture of the disease. Third, evaluating complexity in the pathway from SNP sets to phenotypic sets; some connected SNP-set networks may be candidates to converge to equifinality, whereas other disjoint networks can lead to multifinality (i.e., recognizing a collection of diseases).
Finally, we carried out independent analyses to test for possible confirmations of the heterogeneous architecture of SZ. We performed bioinformatics analysis of genes related to each uncovered relationship and their molecular consequences. Then, we computationally and clinically evaluated the genotypic-phenotypic relations to determine sub-classes of the disease based on whether the groups of SZ patients varied on a range of positive and/or negative symptoms.
d) Method
Given a genotype database from a GWAS represented as a matrix [SNPs×subjects], the method for dissecting the architecture of a disease is composed of 6 steps (
(1) Identify SNP Sets
Use a Generalized Factorization Method (GFM) to dissect a GWAS into SNP sets (see below for a mathematical description of NMF). The GFM repeatedly applies a basic factorization method to generate multiple matrix partitions using various initializations with different maximum numbers of sub-matrices k (e.g., 2≦k≦√n), where n is the number of subjects, and thus avoids any prior assumption about the ideal number of sub-matrices (see below for a rationale for using an unconstrained number of sub-matrices or clusters). In particular, we developed a new version of the basic bioNMF method, termed the Fuzzy Nonnegative Matrix Factorization (FNMF) method, and used it as the default basic factorization method. FNMF allows overlap among sub-matrices and detection of outliers. For each run of the basic factorization method (2≦k≦√n), all sub-matrices are selected to compose a family of genotypic SNP sets G_k={G_k_i}, where 1≦i≦k. Each G_k family, as well as all families together G={G_k} for all k, may include overlapping, partially redundant sub-matrices of different sizes.
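The sweep over k can be sketched as follows. This is not FNMF: as a stand-in basic factorization it uses plain NMF with Lee-Seung multiplicative updates, and the rule for turning a factor into a set (loadings above a fraction of the factor's maximum) is an assumption made only for illustration. The names `nmf`, `sweep_sets`, and `threshold` are hypothetical, and rounding √n to the nearest integer is likewise assumed.

```python
import math

import numpy as np


def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Plain multiplicative-update NMF: V (m x n) ~ W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def sweep_sets(V, threshold=0.5):
    """Run the factorization for every k in 2..round(sqrt(n)).

    Factor i of the run with k factors yields one candidate set G_k_i:
    the rows (SNPs) and columns (subjects) whose loadings reach
    `threshold` times the factor's maximum. Sets may overlap.
    """
    n_subjects = V.shape[1]
    families = {}
    for k in range(2, round(math.sqrt(n_subjects)) + 1):
        W, H = nmf(V, k)
        families[k] = [
            (
                np.flatnonzero(W[:, i] >= threshold * W[:, i].max()),
                np.flatnonzero(H[i, :] >= threshold * H[i, :].max()),
            )
            for i in range(k)
        ]
    return families


# Toy genotype matrix: 30 SNPs x 16 subjects, coded 0..3.
V = np.random.default_rng(1).integers(0, 4, size=(30, 16)).astype(float)
families = sweep_sets(V)
n_sets = sum(len(sets) for sets in families.values())
```

Under this rounding assumption, a sample of 8023 subjects would give k from 2 to 90 and therefore 2+3+...+90 = 4094 candidate sets, which agrees with the number of generated SNP sets quoted later in the text.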
(2) Perform a Statistical Analysis of SNP Sets
Use the R-project package SKAT to evaluate the significance of each SNP set. We used identity-by-state (IBS) as the kernel because the analyzed variants are common rather than rare, and therefore the “weighted IBS” kernel would not be adequate. Since the SNP sets can overlap, we ran each one separately. The sex and ancestry of the subjects were used as covariates, and default values were used for the remaining parameters.
(3) Map a Disease Risk Function
3.1) Estimate the risk of a SNP set. Incorporate a posteriori the status of the subjects in a weighted average of epidemiological risks function of all subjects in a particular SNP set:
with ST being the status of the instances (i.e., cases and controls) and Q the weights given by the epidemiologic risk of SZ in each SNP set (e.g., 0 and 1 for controls and cases; 0.01, 0.1 and 1 for controls, relatives and cases, respectively).
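As an illustration of step 3.1, the sketch below computes a weighted average of per-subject risk weights within one SNP set. The exact form of eqn. 1 is not reproduced here; a simple mean of the weights Q over the subjects in the set is assumed, and the names `snp_set_risk` and `Q` are hypothetical.

```python
def snp_set_risk(statuses, weights):
    """Weighted average of epidemiological risk over the subjects of one SNP set.

    statuses: status label (ST) of each subject in the set.
    weights:  mapping from status label to its risk weight (Q).
    Assumes the risk is the plain mean of the weights.
    """
    if not statuses:
        raise ValueError("SNP set has no subjects")
    return sum(weights[s] for s in statuses) / len(statuses)


# With weights 0 for controls and 1 for cases, the risk is the fraction of cases.
Q = {"control": 0.0, "case": 1.0}
risk = snp_set_risk(["case", "case", "case", "control"], Q)
print(risk)
```

Because status enters only through the weights, the same SNP sets can be re-scored under a different weighting without re-running the factorization.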
3.2) Plot the genotype risk surface of the disease. Encode each SNP set into a 3-tuple (X, Y, Z), where SNP sets are placed along the x- and y-axis using a dendrogram based on their distances in the SNP (see step 4.1, MSNPs) and subject (see step 4.2, Msubjects) domains, respectively, and Z is the risk variable calculated in (eqn. 1). Interpolate and plot the surface by using the tgp and latticeExtra packages in R-project, respectively.
(4) Discover and Encode Relations Among SNP Sets into Topologically Organized Networks
4.1) Identify optimal and non-redundant relations between SNP sets based on their shared SNPs and, separately, based on their shared subjects. Overlap of SNP sets refers to overlap of SNP loci, which in most of our cases also leads to shared allele values. Alleles are fully shared when both loci and subjects overlap.
4.1.1) Co-cluster all G_k_i SNP sets within G by calculating the pairwise probability of intersection among them using the hypergeometric statistic (PIhyp) on intersected SNPs: PIhyp (G_e_q, G_r_w) (eqn. 2, see below), where q and w are SNP sets generated in runs with a maximum of e and r sub-matrices, respectively, and p in (eqn. 2) is the intersection of SNPs. Then, encode all PIhyp values, which capture to some extent the distance between SNP sets, in a square [SNP set×SNP set] matrix MSNPs.
4.1.2) Repeat the former procedure based on intersected subjects and determine the Msubjects matrix.
4.1.3) Eliminate highly overlapped/redundant SNP sets, which may occur due to the repetitive application of the factorization methods, by deleting all except one SNP set where Max(MSNPs[i,j], Msubjects[i, j])≦δ, for all i, j indices in the matrices. Here, we used δ=10E-15.
4.2) Organize SNP sets sharing SNPs and/or subjects into subnetworks.
4.2.1) For each row i and column j in MSNPs where MSNPs[i, j]≦φ, connect the corresponding SNP sets with a blue line, indicating that they share SNPs. In our case, we established φ≦3E−09. This value results from adjusting a typical p-value of 0.01 by the total number of pairwise comparisons between all possible generated SNP sets [4094×4094, using the hypergeometric-based test (eqn. 2)], analogous to a Bonferroni correction.
4.2.2) For each row i and column j in Msubjects where Msubjects[i, j]≦φ, connect the corresponding SNP sets with a red line, indicating that they share subjects.
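Steps 4.1.3 and 4.2 can be sketched with plain nested lists standing in for the MSNPs and Msubjects matrices. The function names, the greedy keep-first rule for redundant sets, and the toy matrices are assumptions for illustration; the thresholds δ and φ are the values quoted in the text, with δ written as the literal 10E-15.

```python
def drop_redundant(set_ids, m_snps, m_subjects, delta=10e-15):
    """Step 4.1.3: keep one representative of each group of redundant sets.

    Two sets are treated as redundant when both their SNP overlap and their
    subject overlap are extreme, i.e. max(MSNPs, Msubjects) <= delta.
    Assumption: the first set encountered is the one kept.
    """
    kept = []
    for i in range(len(set_ids)):
        if not any(max(m_snps[i][j], m_subjects[i][j]) <= delta for j in kept):
            kept.append(i)
    return [set_ids[i] for i in kept]


def build_edges(m_snps, m_subjects, phi=3e-9):
    """Step 4.2: blue edge if two sets share SNPs, red edge if they share subjects."""
    edges = []
    n = len(m_snps)
    for i in range(n):
        for j in range(i + 1, n):
            if m_snps[i][j] <= phi:
                edges.append((i, j, "blue"))
            if m_subjects[i][j] <= phi:
                edges.append((i, j, "red"))
    return edges


# Toy PIhyp matrices for three SNP sets (symmetric, diagonal unused).
m_snps = [[0, 1e-12, 0.2], [1e-12, 0, 0.5], [0.2, 0.5, 0]]
m_subj = [[0, 0.3, 1e-10], [0.3, 0, 0.4], [1e-10, 0.4, 0]]
edges = build_edges(m_snps, m_subj)

# Sets "a" and "b" overlap almost completely in both domains, so only "a" stays.
dup_snps = [[0, 1e-20, 0.3], [1e-20, 0, 0.2], [0.3, 0.2, 0]]
dup_subj = [[0, 1e-18, 0.5], [1e-18, 0, 0.1], [0.5, 0.1, 0]]
kept = drop_redundant(["a", "b", "c"], dup_snps, dup_subj)
```

A pair of sets can carry both a blue and a red edge, which corresponds to the case where one set is largely contained in the other.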
(5) Identify Genotype-Phenotype Latent Architectures
5.1) Create a phenotype database. Dissect the questionnaire based on DIGS and the Best Estimate Diagnosis into individual variables. The variables can be numerical or categorical. For efficiency, in our case, each categorical variable was re-coded into different variables with binary values. The phenotype data was codified in a [phenotype features×subjects] matrix, where the columns and rows correspond to subjects and phenotypic features, respectively. In our case, because the phenotypic features from cases are different from those from the controls, we only considered the cases.
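The re-coding of a categorical variable into binary variables can be sketched as below. The helper name `binarize`, the `name=category` labels, and the example variable are hypothetical; treating a missing answer as 0 in every derived row is an assumption.

```python
def binarize(feature_name, values):
    """Recode one categorical variable into one 0/1 row per observed category.

    values holds one entry per subject; None marks a missing answer and is
    coded as 0 in every derived row (an assumption, not stated in the text).
    """
    categories = sorted({v for v in values if v is not None})
    return {
        f"{feature_name}={c}": [1 if v == c else 0 for v in values]
        for c in categories
    }


# One row per category, one column per subject, matching the
# [phenotype features x subjects] orientation described above.
rows = binarize("course", ["episodic", "continuous", None, "episodic"])
print(rows)
```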
5.2) Identify phenotype sets (Implemented in the PGMRA web server). Use step 1) with the phenotype database from 5.1) instead of genotype database to identify phenotypic sets, where a phenotypic set is a sub-matrix harboring subjects described by a set of phenotypic features sharing similar values (i.e., P_h_j, where j is a phenotypic set generated in a run with a maximum of h number of sub-matrices).
5.3) Identify genotypic-phenotypic relations. Co-cluster SNP sets with phenotype sets into relations using the Hypergeometric statistics on intersected subjects, where Ri,j=PIhyp (G_k_i, P_h_j) (see below, eqn. 2), G_k_i and P_h_j are SNP and phenotypic sets, respectively, and p in (see below, eqn. 2) is the intersection of subjects. Relations Ri,j<T constitute the genotypic-phenotypic architecture of a disease. The significance of the relations (T) was established by the p-value (PIhyp) provided by the Hypergeometric-based test (see below, eqn. 2).
(6) Annotate Genes, and Symptoms/Classes of Disease
6.1) Map latent architectures to the genome. For each SNP set, we analyze all genes affected by each of the SNPs in the set. This analysis includes the SNP location with respect to a gene, the type and number of genes affected by one SNP (e.g., protein-coding genes, ncRNA genes, and pseudogenes), and the possible transcripts affected and the position where they are affected (e.g., coding region, distance to stop codon, splicing site, intron, UTR, etc.). Finally, if the SNP does not overlap with a gene, the features of promoter and intergenic regions are inspected for regulatory annotation. Moreover, the possible molecular consequences of each SNP on function are provided, as well as the corresponding allele values. Annotation information was obtained from the HaploReg DB and from the Ensembl and NCBI web services (see below).
Once we obtain the information described above, we generate a list of relevant genes that is used to query the NextBio website in order to find diseases related to each gene. NextBio uses proprietary algorithms to calculate and rank the diseases and drugs most significantly correlated with a queried gene, where rank values are established relative to the top-scored result (score set to 100). Therefore, although a low-scoring result might have less statistical significance than the top-ranked result, it could still have real biological relevance. In our case, out of all possible diseases, only the categories “Mental Disorders” and “Brain and Nervous System Disorders” were considered from the “Disease Atlas”.
6.2) Map latent architectures to disease symptoms or classes of disease.
6.2.1) Characterize each phenotypic feature by the type of symptoms it represents. First, explore the distribution of the phenotypic dataset by calculating the principal components (PCA, Statistics Toolbox, Matlab R2011a) of the phenotypic sample, where the columns are subjects and the rows are the phenotypic variables. Here we used as many PCs as needed to account for 75% of the variance of the sample (5 PCs). In the resulting matrix, with the phenotypic features as rows and the PCs as columns, cluster the rows using hierarchical clustering (Correlation and Maximum as inter- and intra-clustering measurements, Statistics Toolbox, Matlab R2011a). This clustering process generates natural groups of features, which constitute natural partition hypotheses about the phenotypic features. Second, evaluate each phenotypic feature included in the phenotype database using curated information from experts and the literature, and individually classify each item based on its symptoms as purely positive (1), primarily positive (2), primarily negative (3), or purely negative (4).
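The choice of the number of PCs in step 6.2.1 can be sketched with NumPy in place of the Matlab toolbox. The sketch assumes the 75% criterion refers to cumulative explained variance and that the features (rows) are the observations being scored, so that the output has features as rows and PCs as columns; the function name and the toy matrix are hypothetical. The hierarchical clustering of the rows is not shown.

```python
import numpy as np


def pcs_for_variance(X, target=0.75):
    """Return (number of PCs reaching `target` explained variance, feature scores).

    X has phenotypic features as rows and subjects as columns. Each column is
    centered, so the rows play the role of observations.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    n_pc = min(int(np.searchsorted(explained, target)) + 1, len(s))
    scores = U[:, :n_pc] * s[:n_pc]  # [features x PCs]
    return n_pc, scores


# Toy matrix of rank one: a single PC carries essentially all the variance.
X = np.outer([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, 2.0])
n_pc, scores = pcs_for_variance(X)
```

The `scores` matrix is the input on which a row-wise clustering with a correlation distance would then be run.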
6.2.2) For each phenotypic set P_h_j related to a SNP set G_k_i in Ri,j, re-code each phenotypic feature by its positive and/or negative symptoms in a [Ri,j × phenotypic feature] matrix Msymptoms.
6.2.3) Cluster the encoded features by factorizing Msymptoms into sub-matrices using a basic factorization method with a maximum number of sub-matrices defined by the Cophenetic index.
6.2.4) Label the latent classes of the diseases. (The current results provided 8 classes, see
e) Mathematical Description of NMF
We consider a GWA data set consisting of a collection of N
f) Rationale for the Use of Unconstrained Number of Clusters
Although there are many indices that estimate the appropriate number of clusters for a given partition, we previously demonstrated that they are often constrained by the type of cluster and the metrics utilized. Therefore, it is hard to obtain a consensus from all of them, and they very often provide contradictory results. Moreover, given that the target of the method is to obtain good relations among clusters from different domains of knowledge, it is not known in advance which cluster in one domain will match a cluster in a different domain; thus, the more varied the clusters, the better the chance of subsequently identifying inter-domain relations. To this end, we repeatedly applied a basic clustering method in one domain of knowledge to generate multiple clustering results using various initializations with different numbers of clusters (from 2 to √n, where n is the number of observations/subjects).
g) Coincident Test Index: Co-clustering and Establishing Relations Between Sets
The degree of overlapping between two SNP or phenotypic sets was assessed by calculating the pairwise probability of intersection among them based on the Hypergeometric distribution (PIhyp):
where p observations belong to a set of size h, and also belong to a set of size n; and g is the total number of observations. Therefore, the lower the PIhyp, the higher the overlapping. The (p-value of) hypergeometric “test” is used here as a measure of association strength. The real test (p-value) of genotypic-phenotypic relationship was provided through the permutation procedure.
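A minimal sketch of the coincidence index follows, using the symbols defined above (p, h, n, g). It assumes that PIhyp is the upper tail of the hypergeometric distribution, that is, the probability of observing at least p shared observations by chance; this form is an assumption consistent with the statement that lower values mean higher overlap, and the function name `pi_hyp` is hypothetical.

```python
from math import comb


def pi_hyp(p, h, n, g):
    """Probability of >= p shared observations between sets of size h and n.

    g is the total number of observations. The tail is summed directly with
    exact integers, which avoids the cancellation of 1 - CDF for tiny values.
    """
    upper = min(h, n)
    favourable = sum(
        comb(h, i) * comb(g - h, n - i) for i in range(max(p, 0), upper + 1)
    )
    return favourable / comb(g, n)


# Two sets of 5 subjects out of 10 that coincide completely:
# only 1 of the C(10, 5) = 252 possible draws gives this overlap.
full_overlap = pi_hyp(5, 5, 5, 10)
no_requirement = pi_hyp(0, 5, 5, 10)
```

The same function applies to SNP sets compared on SNPs, SNP sets compared on subjects, and SNP sets compared with phenotypic sets on subjects; only the universe g changes.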
h) Permutation Test for Genotypic-Phenotypic Relations
The reported statistical significance values were obtained from 4000 independent permutations, motivated by the comparisons between all possible generated SNP sets (i.e., 4094, from 2 to √n) and the possibly overlapping SNP sets identified here. Each permutation was generated as follows: a) assign random subjects to a phenotypic cluster of random size; b) assign random subjects to a genotype cluster (set) of random size; c) calculate the hypergeometric statistic (PIhyp, eqn. 2) between the two clusters and accumulate the value. These values form an empirical null distribution of PIhyp used to calculate the empirical p-value of an identified relation. All optimal relations had an empirical p-value<4.7E-03.
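The permutation procedure in steps a) to c) can be sketched as follows. The upper-tail form of PIhyp, the uniform choice of cluster sizes between 1 and g, and the add-one correction in the empirical p-value are all assumptions made for this sketch; the function names are hypothetical.

```python
import random
from math import comb


def pi_hyp(p, h, n, g):
    """Assumed upper-tail hypergeometric form of PIhyp (see lead-in)."""
    tail = sum(comb(h, i) * comb(g - h, n - i) for i in range(p, min(h, n) + 1))
    return tail / comb(g, n)


def empirical_p_value(observed, g, n_perm=4000, seed=0):
    """Share of random cluster pairs whose PIhyp is at least as extreme as `observed`."""
    rng = random.Random(seed)
    universe = range(g)
    hits = 0
    for _ in range(n_perm):
        pheno = set(rng.sample(universe, rng.randint(1, g)))  # step a)
        geno = set(rng.sample(universe, rng.randint(1, g)))   # step b)
        null_value = pi_hyp(len(pheno & geno), len(pheno), len(geno), g)  # step c)
        if null_value <= observed:
            hits += 1
    # Add-one correction (an assumption) keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


p_emp = empirical_p_value(observed=1e-12, g=40, n_perm=200)
p_trivial = empirical_p_value(observed=1.0, g=40, n_perm=200)
```

With 4000 permutations the smallest p-value this estimator can return is 1/4001, about 2.5E-04, so the resolution of the empirical p-value is set by the number of permutations.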
i) Resampling Statistics of the NMF Sets
To guarantee that the sub-matrices converge to the same solution, and given the non-deterministic nature of NMF and its dependence on the initialization of the W and H vectors, we ran it 40 times for each maximum number k of allowed sub-matrices, with different random initializations of the vectors, and selected the run that best approximates the input matrix. In addition, to estimate the precision of the sample statistics of the SNP sets (variance of the W and H vectors), we applied a leave-one-out technique (jackknifing) 1000 times on the SNP domain and obtained 94% support for all identified sets, with an average variance of ca. ±5% in their corresponding W and H vectors. Finally, we also modified this sampling technique to ensure the occurrence of the remaining sets after leaving one set out, and applied it to our current sample with >90% support.
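The restart strategy can be sketched as below. Plain multiplicative-update NMF stands in for the factorization actually used, the Frobenius norm of the residual is assumed as the measure of how well a run approximates the input matrix, and the function names are hypothetical. The jackknife support estimate is not shown.

```python
import numpy as np


def nmf(V, k, seed, n_iter=200, eps=1e-9):
    """Plain multiplicative-update NMF with a seed-dependent random start."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def best_of_restarts(V, k, n_restarts=40):
    """Keep the restart whose product W @ H is closest to V."""
    best = None
    for seed in range(n_restarts):
        W, H = nmf(V, k, seed)
        error = float(np.linalg.norm(V - W @ H))
        if best is None or error < best[0]:
            best = (error, W, H)
    return best


V = np.random.default_rng(1).integers(0, 4, size=(12, 9)).astype(float)
error, W, H = best_of_restarts(V, k=3, n_restarts=5)
```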
j) Data Reduction
Data reduction was not applied because many principal components (PCs) were required in this study, consistent with the demonstration that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality and interpretability. Moreover, as in phenomics, partially correlated variables reinforce the association and clarify the symptom identification process. Therefore, we initially used the 93 phenotypic features listed in Appendix I, catalog of phenotypic features.
Briefly, the phenotypic features used in the search process included all available data from the interviews, that is, replies to the DIGS as well as to the Best Estimate Diagnosis code sheet submitted by GAIN/NonGAIN to dbGaP. Unbiased compilation of all of the data resulted in an initial set of 93 features. To capture items specific to positive and negative schizophrenia and to avoid symptoms with affective elements, symptoms reported by acutely psychotic patients, and redundant items, the original set was pruned based on the authors' clinical experience and computational feature validation (see Method, step 6.2.1 above).
Given that genotypic SZ architecture is composed of multiple networks, we matched each SNP set composing these networks with the corresponding genomic location of their SNPs, and in turn, with the mapped genes (
The uncovered SNP sets contain SNPs that map to gene, promoter and intergenic regions (IGRs) located anywhere in the genome, without being constrained by genomic features such as a specific gene or haplotype (28). For example, SNP set 81_13 contains SNPs in chromosomes 8 and 16, whereas SNP set 42_37 has SNPs located in chromosomes 2 and 11 (
In addition to mapping genes in different locations, SNP variants within the SNP sets affect distinct classes of genes including protein-coding, non-coding (ncRNA) genes, and pseudogenes, with different molecular consequences depending on the altered region (coding, UTRs, introns, Table 4). For example, only 25% of SNPs in SNP set 75_67 affect protein-coding genes, which are the targets most often considered in genetic studies of diseases, whereas another 25% of SNPs affect ncRNAs (lincRNAs, antisense RNAs, miRNAs). One of these lincRNAs is SOX2-OT, which is associated with >15 possible transcripts (Table 4); it is contained inside the SOX2 transcription factor that is predominantly expressed in the human brain where SOX2-OT is also highly enriched.
Likewise, SNPs from SNP set 22_11 are located within a large intergenic region corresponding to two overlapping and newly characterized long ncRNAs AC068490.2 and AC096570.2 (Table 4). Moreover, two SNP variants of SNP set G19_2 affect miRNA AL354928.1 and small nuclear RNA U4, as well as protein-coding GOLGA1 gene (
A detailed analysis of SNPs and mapped genes revealed at least three complex scenarios affecting multiple genes in different fashions (activation, repression, antisense modulation) and producing different molecular consequences (Table 4). First, we determined that even a single SNP within a SNP set could produce different consequences in affected transcripts (Table 4). For example, one SNP from SNP set 81_13 was located in a protein-coding region of the SNTG1 gene, which can produce a change either in an intron or in a transcript that contains a premature stop codon and would therefore be eliminated by the nonsense-mediated decay surveillance pathway (Table 4). Second, we found that multiple SNPs within a SNP set can affect multiple genes in different ways. This heterogeneity is exemplified by SNPs from SNP set 19_2 intersecting with both ncRNAs and the GOLGA1 gene (
Most genes mapped by the SNP sets are involved in neurodevelopment (Table 3). For example, the SNP set 81_13 (
We identified distinct pathways (see Tables 2 and 6, and
Akt is a serine/threonine kinase that is activated by tyrosine kinase receptors, integrins, T and B cell receptors, cytokine receptors, G-protein-coupled receptors and other stimuli that involve the production of PIP3 (phosphatidylinositol triphosphate) by PI3K (phosphoinositide 3-kinase). PI3K can be activated in different ways:
FOXR2 (forkhead box R2) is a proto-oncogene; when mutated, it maintains cell growth and proliferation through activation of RAS (GTPase), increasing aberrant signaling through the PI3K/AKT/mTOR and RAS/MAP/ERK pathways and inhibiting apoptosis.
SOD3 (superoxide dismutase 3) increases phosphorylation of ERK/Ras and of PIP3 via PI3K; SOD3 may be phosphorylated by Erk1/2.
SEMA3A inhibits proliferation and cell growth in neurons and prevents axonal growth by inhibiting PI3K/Akt via inhibition of Ras. When bound, neuropilin and SEMA1 activate apoptosis via PI3K/Akt.
RAS (GTPase) can be activated by mutated FOXR2 and by SOD3, and inhibited by Sema3A. Ras and PI3K can activate mTORC1 via cRaf/MEK/ERK.
SNX19 inhibits Akt phosphorylation resulting in apoptosis.
STYK1 is an oncogene that binds to Akt to activate the downstream signaling cascade, leading to an increase in tumor cells and an increased risk of metastasis.
CHST9 catalyzes the transfer of sulfate to N-acetylgalactosamine residues and inhibits the Cd19/p85/PI3K-p110 complex.
RRAGB is one of the RAG proteins, which interact with mTORC1 and are required for its activation by amino acids.
p38 MAPKs (α, β, γ, and δ) are members of the MAPK family that are activated by a variety of environmental stresses and inflammatory cytokines. As with other MAPK cascades, the membrane-proximal component is a MAPKKK, typically a MEKK or a mixed lineage kinase (MLK). The MAPKKK phosphorylates and activates MKK3/6, the p38 MAPK kinases. MKK3/6 can also be activated directly by ASK1, which is stimulated by apoptotic stimuli. p38 MAPK is involved in the regulation of HSP27, MAPKAPK-2 (MK2), MAPKAPK-3 (MK3), and several transcription factors including ATF-2, Stat1, the Max/Myc complex, MEF-2, Elk-1, and indirectly CREB via activation of MSK1. This pathway may be activated through PI3K activation via Rac/MEK/ERK.
DUSP4 is an MKP capable of inhibiting p38 MAPK 12 and 14a, and is regulated by TNF-α expression. It decreases ERK1/2 and reduces cellular viability by altering the NF-κB/MAPK pathways.
MAGEH1 expression causes apoptosis of melanoma cells through interaction with the region of the p75 neurotrophin receptor (p75NTR), a TNF receptor type, that lies internal to the membrane, and possibly also through competition with TNF receptor-associated factor 6 (TRAF6) and the catalytic neurotrophin receptor (TRK) for the same site of interaction with p75.
TRPS1 encodes an atypical member of the GATA family. It can activate Snail 1, producing inhibition of cadherins inside the nucleus.
ST18 is subject to promoter hypermethylation; loss of ST18 expression in tumor cells suggests that this epigenetic mechanism is responsible for its tumor-specific down-regulation.
SPATA7 may be involved in the preparation of chromatin in early meiotic prophase in the nuclei for the initiation of meiotic recombination.
ZC3H14 is an evolutionarily conserved Cys3His zinc finger protein that specifically binds polyadenosine RNA and is therefore postulated to modulate post-transcriptional gene expression.
U4 is part of the small nuclear ribonucleoprotein particles (snRNPs; RNA-protein complexes), each of which binds specifically to individual RNAs. The function of the human U4 3′ SL micro RNA is unclear. It exists to enable the formation of nucleoplasm in Cajal bodies.
PPP1R1C (protein phosphatase 1, regulatory subunit 1C) is a protein-coding gene and an inhibitor of PP1, and is itself regulated by phosphorylation. It promotes cell growth and may protect against cell death, particularly when induced by pathological stress.
PRPF31: its main function is thought to be recruiting and tethering the U4/U6-U5 tri-snRNP.
EVI5 acts in the G1/S phases, where it prevents phosphorylation of Emi1 by Plk1, thereby keeping APC/C inactive and allowing cyclin A to accumulate. In prometaphase, Plk1 phosphorylates EVI5, producing its inactivation and the subsequent activation of APC/C and downstream signaling pathways to complete the mitotic cycle.
SNORA42: The main function of snoRNAs has long been thought to be the modification, maturation and stabilization of rRNAs. These post-transcriptional modifications are important for the production of accurate and efficient ribosomes. Moreover, some snoRNAs are processed to produce small RNAs.
SNORD112: snoRNAs act as part of small nucleolar ribonucleoproteins (snoRNPs), each of which consists of a box C/D or box H/ACA guide RNA and four associated C/D or H/ACA snoRNP proteins. In both cases, the snoRNA specifically hybridizes to the complementary sequence in the target RNA, and the associated protein complexes then perform the appropriate modification of the nucleotide identified by the snoRNA.
SMARCAD1 acts as part of a large complex with HDAC1, HDAC2, KAP1 and G9A to integrate nucleosome spacing with histone deacetylation. H3K9 methylation is required to restore heterochromatin, and SMARCAD1 apparently facilitates histone deacetylation and H3K9me3. How chromatin remodeling is achieved through deacetylation is unknown, but it seems to coordinate the spacing between nucleosomes with H3K9 acetylation and monomethylation.
SLC25A14 is an uncoupling protein that facilitates the transfer of anions from the inside of the mitochondria to the outer mitochondrial membrane and the return transfer of protons from the outside to the inner mitochondrial membrane. SLC25A14 has a functional role in cellular energy supply and in the production of superoxide when it is overexpressed in neuronal cells. In untreated culture conditions, overexpression of MMP and SLC25A14 significantly decreased the content of intracellular ATP.
TMEM135: some studies have demonstrated an association of TMEM135 with mitochondrial fat metabolism, and a possible role for TMEM135 in improving fat storage was recently identified.
VDAC3: voltage-dependent anion-selective channels (VDACs) are proteins that form pores, allowing permeability of the mitochondrial outer membrane. A growing body of evidence indicates that VDAC plays a major role in metabolite flow in and out of mitochondria, resulting in regulation of mitochondrial functions.
SLC20A2: the transport cycle of the proteins in this group comprises the initial binding of a Na+ ion, followed by a random interaction between monovalent Pi (inorganic phosphate) and a second Na+ ion. Reorientation of the loaded carrier then leads to release of the substrates into the cytosol.
NALCN encodes a voltage-independent, non-selective, non-inactivating cation channel that is permeable to sodium, potassium and calcium when expressed exogenously in HEK293 cells. Sodium is important for neuronal excitability in vivo, and the NALCN channel seems to be the main source of sodium leak in hippocampal neurons; because these two processes are strongly altered in schizophrenia, it has been hypothesized that NALCN could show a genetic association with schizophrenia.
HACE1 is a tumor suppressor that catalyzes polyubiquitylation of Rac1 at lysine 147 upon activation by HGF, resulting in its proteasomal degradation. HACE1 controls NADPH oxidase: by binding Rac1 it regulates NADPH oxidase, decreases the production of oxygen free radicals, inhibits the expression of cyclin D1 and decreases susceptibility to DNA damage. Loss of HACE1 leads to overactive NADPH oxidase, increased ROS generation, increased expression of cyclin D1 and ROS-induced DNA damage.
NCAM1 is a constitutive molecule expressed on the surface of various cells that promotes neurite outgrowth, nerve branching, fasciculation and cell migration.
OPN5: apparent GABAergic interaction in the synaptic space.
NETO2 is an auxiliary subunit that determines the functional properties of KARs (kainate receptors, a subfamily of ionotropic glutamate receptors, iGluRs), which mediate excitatory synaptic transmission, regulate the release of neurotransmitters and are selectively distributed in the brain.
VANGL1: this gene encodes a member of the tetraspanin family. Mutations in this gene are associated with neural tube defects. Alternative splicing results in multiple transcript variants.
DKK4 is a DKK protein that blocks the expression of LRP and thus its binding to the Frizzled and Wnt/SFRP/WIF complex, blocking the release of β-catenin.
NTRK3 is a member of the neurotrophin receptor family and is critical for the development of the nervous system. Published studies suggest that NTRK3 is a dependence receptor, which signals in both the ligand-bound ("on") state and the ligand-free ("off") state (see chart). When the ligand neurotrophin-3 (NT-3) is present, NTRK3 triggers signals within the cell via a tyrosine kinase domain, promoting cell proliferation and survival. In the absence of NT-3, NTRK3 signals for cell death by triggering apoptosis. Therefore, NTRK3 has the potential to act as an oncogene or as a tumor suppressor gene, depending on the presence of NT-3.
PSMC1 is involved in the bulk destruction of proteins, at a fast or slow rate, in a wide variety of biological processes such as cell cycle progression, apoptosis, regulation of metabolism, signal transduction, and antigen processing.
PTBP2: Ptbp1 and Ptbp2 regulate the alternative splicing of various sets of target RNAs, suggesting that the roles of the Ptbp1/2 proteins differ between cellular contexts. The functions of Ptbp2 in the brain are not clear.
RyR3 is a type of ion channel that, when opened, releases intracellular Ca2+ from the endoplasmic reticulum (ER). It is very similar to the inositol 1,4,5-trisphosphate receptor (IP3R). The main signal that triggers the opening of RyRs is Ca2+ that has usually entered through voltage-dependent channels in the cell membrane. RyR3 is expressed in small quantities in several cell types, including in the brain; RyR3-deficient mice have impaired hippocampal synaptic plasticity and impaired learning. ATP also stimulates the activity of RyR3 channels. Therapeutic targeting focuses on molecules that control calcium release, internalization and mobilization.
RPL35 is a protein that binds to the signal recognition particle (SRP) and its receptor (SR). They mediate the targeting of ribosome-nascent chain complexes to the endoplasmic reticulum.
RPL5 is a protein that binds MDM2 (the MDM2 oncogene, an E3 ubiquitin protein ligase) and SRSF1 (serine/arginine-rich splicing factor 1) to stabilize p53 and to induce cell senescence. RPL5 can join RPL11 and other ribosomal proteins to silence Hdm2 and p53.
FAM69A: calcium-dependent kinase, extracellular and intracellular, localized in the endoplasmic reticulum.
GOLGA1 is one of the transport proteins of the Golgi apparatus, which participates in glycosylation and in the transport of proteins and lipids in the secretory pathway.
EMLS blocks EMAP via MAP or stabilization of microtubules.
ARPC5L may function as a component of the Arp2/3 complex, which is involved in the regulation of actin polymerization and, together with an activating nucleation-promoting factor (NPF), mediates the formation of branched actin networks. It belongs to the ARPC5 family.
CSMD1: in the TGF-β pathway, CSMD1 permits binding of TGF-β receptor I, allowing it to phosphorylate Smad3 and thus allowing formation of the phosphorylated Smad3/phosphorylated Smad2/Smad4 complex; the complex is internalized into the cell nucleus and, bound to a transforming factor, leads to apoptosis. In addition, TGF-β receptor II binds the phosphorylated complex, allowing the subsequent binding of Smad1/5/8 with Smad4 and nuclear internalization, which induces apoptosis mediated by binding to a transforming factor.
This application claims the benefit of U.S. Provisional Application No. 62/043,871, filed on Aug. 29, 2014, which is incorporated herein in its entirety.
Number | Date | Country |
---|---|---|
62/043,871 | Aug 2014 | US |