This invention relates to the fields of genetics and the diagnosis and treatment of autism and autism spectrum disorders.
Several publications and patent documents are cited throughout the specification in order to describe the state of the art to which this invention pertains. Each of these citations is incorporated herein by reference as though set forth in full.
Autism (MIM [209850]) is a severe and relatively common neuropsychiatric disorder characterized by abnormalities in social behavior and communication skills, with tendencies towards patterns of abnormal repetitive movements and other behavior disturbances. Current prevalence estimates are ~0.2% of the population for autism and 0.9% of the population for ASDs (MMWR Surveill Summ. 2009). Globally, males are affected four times as often as females. As such, autism poses a major public health concern of unknown cause that extends into adulthood and places an immense economic burden on society. The most prominent features of autism are social and communication deficits. The former are manifested in reduced sociability (reduced tendency to seek or pay attention to social interactions), a lack of awareness of social rules, difficulties in social imitation and symbolic play, impairments in giving and seeking comfort and forming social relationships with other individuals, failure to use nonverbal communication such as eye contact, deficits in perception of others' mental and emotional states, lack of reciprocity, and failure to share experience with others. Communication deficits are manifested as a delay in or lack of language, impaired ability to initiate or sustain a conversation with others, and stereotyped or repetitive use of language. Autistic children have been shown to engage in free play much less frequently and at a much lower developmental level than peers of similar intellectual abilities. Markers of social deficits in affected children appear as early as 12-18 months of age, suggesting that autism is a neurodevelopmental disorder. It has been suggested that autism originates in developmental failure of neural systems governing social and emotional functioning. Although social and cognitive development are highly correlated in the general population, the degree of social impairment does not correlate well with IQ in individuals with autism.
The opposite is seen in Down's syndrome and Williams syndrome, where social development is superior to cognitive function. Both examples point to a complex source of sociability.
The etiology of the most common forms of autism is still unknown. In the first description of the disease in 1943, Kanner suggested an influence of child-rearing practices on the development of autism, after observing similar traits in parents of the affected children. While experimental data fail to support several environmental hypotheses, there has been growing evidence for a strong genetic influence on this disorder. The rate of autism in siblings of affected individuals was shown to be 8.6%, a 215 fold increase over the general population (Ritvo et al. (1989) Am J Psychiatry 146(8):1032-6.). Twin studies have demonstrated significant differences in monozygotic and dizygotic twin concordance rates, the former concordant in 60% of twin pairs, with most of the non-autistic monozygotic co-twins displaying milder related social and communicative abnormalities. Social, language and cognitive difficulties have also been found among relatives of autistic individuals in comparison to the relatives of controls. The heritability of autism has been estimated to be >90%.
The genetic basis of autism has been extensively studied in the past decade using three complementary approaches: cytogenetic studies, linkage analysis, and candidate gene analysis (for a review, see Freitag, C. M. et al., (2010) Eur Child Adolesc Psychiatry 19(3):169-78; Vorstman et al., (2006) Mol. Psychiatry 11:18-28; Veenstra-VanderWeele and Cook, (2004) Mol. Psychiatry 9: 819-32). Searches for chromosomal abnormalities in autism have revealed terminal and interstitial deletions, balanced and unbalanced translocations, and inversions on a large number of chromosomes, with abnormalities on chromosomes 15, 7, and X being most frequently reported. The importance of the regions indicated by cytogenetic studies was evaluated by several whole genome screens in multiplex autistic families (International Molecular Genetic Study of Autism Consortium, 1998). Strong and concordant evidence for the presence of an autism susceptibility locus was obtained for chromosome 7q; moderate evidence was obtained for loci on chromosomes 15q, 16p, 19p, and 2q; and the majority of the studies find no support for linkage to the X chromosome (Lamb et al, (2005) Med Genet. 42: 132-137; Lord et al, (2000) Autism Dev Disord. 30:205-223; Muhle et al., (2004) Pediatrics 113(5): e472-86). The AGRE sample provided the strongest evidence for loci on 17q and 5p (Yonan et al., (2003) Am J Hum Genet. 73:886-97). Numerous candidate gene studies in autism have focused on a few major candidates with respect to their location or function (reviewed in Veenstra-VanderWeele et al 2004, supra). Jamain et al., ((2003) Nat Genet. 34:27-9), reported rare nonsynonymous mutations in the X-linked genes encoding neuroligins, specifically NLGN3 and NLGN4, in linkage regions associated with ASD. Other evidence for a genetic basis of autistic endophenotypes comes from the study of disorders that share phenotypic features that overlap with autism such as Fragile X and Rett syndrome.
Many emerging theories of autism focus on changes in neuronal connectivity as the potential underlying cause of these disorders. Imaging studies reveal changes in local and global connectivity (Just et al., (2004) Brain 127: 1811-1821; Herbert et al., (2005) Ann Neurol 55(4): 530-40) and developmental studies of activity-dependent cortical development suggest that autism might result from an imbalance of inhibitory and excitatory synaptic connections during development (Rubenstein and Merzenich, (2004) Genes Brain Behav 2(5): 255-67). The fundamental unit of neuronal connectivity is the synapse; thus, if autism is a disorder of neuronal connectivity, then it can likely be understood in neuronal terms as a disorder of synaptic connections. Indeed, genetic studies reveal that mutations in key proteins involved in synaptic development and plasticity, such as neuroligins, FMRP and MeCP2, are found in individuals with autism and in two forms of mental retardation with autistic features, specifically Fragile-X and Rett's syndrome (Jamain et al, 2003, supra; O'Donnell and Warren, (2002) Annu Rev Neurosci 25: 315-38). Thus the pursuit of linkage between genetic anomalies and (endo)phenotypes at the neuronal level appears both warranted and fruitful. Furthermore, such neuronal connectivity anomalies, revealed, for example, by direct white matter tractography, or by observable delays in characteristic electrical activity, can be directly linked to behavioral and clinical manifestations of ASD, allowing these neuron-level phenotypes to be interpreted as neural correlates of behavior.
Overall, the linkage analysis studies conducted to date and discussed above have achieved only limited success in identifying genetic determinants of autism due to numerous reasons, among others the generic problem that the linkage analysis approach is generally poor in identifying common genetic variants that have modest effects (Hirschhorn and Daly, (2005) Nat Rev Genet 6(2): 95-108). This problem is highlighted in autism, a spectrum disorder wherein the varied phenotypes are determined by the net result of interactions between multiple genetic and environmental factors and, in which, any particular genetic variant that is identified is likely to contribute little to the overall risk for disease.
In one of the first studies to report an association of de novo copy number variations (CNVs) with autism (Science (2007) Apr. 20;316(5823):445-9), Sebat and colleagues suggest that CNVs may underlie certain cases of the disease. Indeed, the importance of their findings has been recapitulated in other work (Pinto et al, Nature. 2010 Jul. 15;466(7304):368-72; Glessner et al Nature. 2009 May 28;459(7246):569-73) suggesting that CNVs may at least account for a small percentage of the genetic variation of the ASDs. However, these genetic defects are rare and collectively only explain a small proportion of the genetic risk for autism, thus suggesting the existence of additional genetic loci but with unknown frequency and effect size.
The present inventors have performed a genome-wide association study on several large patient cohorts and have successfully identified a number of target genes harboring copy number variations associated with autism and ASD. Thus, in accordance with the present invention, kits are provided for performance of a method for detecting a propensity for developing autism or autistic spectrum disorder. An exemplary kit comprises means for obtaining a sample from a patient and testing the sample for the presence or absence of at least one deletion containing CNV in a target polynucleotide, wherein if the CNV is present, the patient has an increased risk for developing autism and/or autistic spectrum disorder. In a preferred embodiment, the deletion containing CNV is selected from the group of CNVs provided in Table II. In another embodiment, the kit includes reagents for performing the step of detecting the presence of said CNV and further comprises specific reagents for performing a process selected from the group consisting of detection of specific hybridization, measurement of allele size, restriction fragment length polymorphism analysis, allele-specific hybridization analysis, single base primer extension reaction, and sequencing of an amplified polynucleotide.
In another aspect, the present invention provides a method for identifying agents which alter neuronal signaling and/or morphology. An exemplary method entails providing cells expressing at least one CNV listed in Table 2 and cells which express the cognate wild type sequences corresponding to the CNV containing sequence, contacting both cell types with a test agent and analyzing whether the agent alters neuronal signaling and/or morphology of cells comprising the CNV relative to those which lack the genetic alteration, thereby identifying agents which alter neuronal signaling and morphology in CNV containing cells. In cases where the CNV is a deletion, vectors encoding such CNVs contain nucleic acids flanking the affected region of deletion of a suitable length, such that cloning and transformation of cells with the CNV containing nucleic acid is possible.
Also provided is a method of treating autism or ASD in a human subject determined to have at least one copy number variation (CNV) associated with an autistic or ASD phenotype, said at least one CNV being selected from the group consisting of CNVs set out in Table 2, the method comprising administering to said human subject a therapeutically effective amount of at least one agent which is known to be efficacious in the signaling pathway adversely affected by the presence of said CNV. In a preferred embodiment, patients are tested for the presence or absence of at least one CNV containing gene selected from the group consisting of ATP10A, GABRA5, GABRB3, GABRG3, GGTLC2, HBII-52-45, HBII-52-46, IPW, LOC648691, LOC96610, MAGEL2, MIR650, MKRN3, NCRNA00221, NDN, OCA2, OR4S2, PAR-SN, PAR1, PARS, POM121L1P, PRAME, SNORD107, SNORD108, SNORD109A, SNORD109B, SNORD115-11, SNORD115-29, SNORD115-36, SNORD115-43, SNORD115-44, SNORD115-48, SNORD64, SNRPN, SNURF, UBE3A, ZNF280A, ZNF280B. In yet another embodiment, the CNV is determined to reside in a gene important for GABA signaling and the agent is listed in Table 3 or Table 4. In a particularly preferred embodiment, the CNV alters GABA signaling and the agent is topiramide.
Epidemiologic studies have convincingly implicated genetic factors in the pathogenesis of autism, a common neuropsychiatric disorder in children, which presents with variable phenotype expression that extends into adulthood. Several genetic determinants have already been reported to be associated with ASD, including many rare de novo copy number variants (CNVs) that harbor small genomic deletions and insertions. These genetic alterations may account for a small subset of the phenotypic manifestation of the disease. Implicated genomic regions appear to be highly heterogeneous with variations reported in several genes, including NRXN1, NLGN3, SHANK3 and AUTS2 to date.
Predicting an individual's genomic risk for disease can facilitate the development of new interventions and streamline therapeutic approaches. To identify likely functional CNVs, we combined various large cohorts of autistic patients with a large number of neurologically normal controls to analyze over 3K affected cases and 7K controls. In a two-stage genome-wide association design, we uncovered 266 genome-wide statistically significant (combined P≤2.76×10⁻⁸) distinct CNV regions (CNVR).
The 38 genes with exons disrupted by these robust CNVRs are most enriched in gene networks impacting neurological disease, behavior and developmental disorders. GABAR-A receptor signaling was the most significant disrupted canonical pathway in ASD where case-enriched defects in GABRA5, GABRB3, and GABRG3 genes were identified. Moreover, network analysis of the first-degree gene interactome of the GABAR-A receptor family suggests that ASD cases are significantly enriched for such pathway defects (P≤2.1×10⁻²¹, OR=9.9) when compared with neurologically normal controls.
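As a minimal sketch of how a case-control enrichment figure of this kind (an odds ratio with an accompanying P value) can be computed, the Python below evaluates a 2×2 carrier table. The specification does not state here which statistical test was used, so the one-sided Fisher exact test is an assumption chosen for illustration, the function names are hypothetical, and the counts are invented and are not data from the study.

```python
from math import comb

def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds of carrying the CNV in cases divided by the odds in controls.
    Undefined (division by zero) if a denominator cell is zero."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)

def fisher_one_sided(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """One-sided Fisher exact P value: probability, under the hypergeometric
    null, of seeing at least `case_carriers` carriers among the cases."""
    cases = case_carriers + case_noncarriers
    controls = control_carriers + control_noncarriers
    carriers = case_carriers + control_carriers
    denominator = comb(cases + controls, carriers)
    numerator = sum(
        comb(cases, k) * comb(controls, carriers - k)
        for k in range(case_carriers, min(cases, carriers) + 1)
    )
    return numerator / denominator

# Hypothetical counts for illustration only (NOT from the study):
# 30 of 3,000 cases and 7 of 7,000 controls carry the CNV.
table = (30, 2970, 7, 6993)
print(f"OR = {odds_ratio(*table):.2f}")
print(f"one-sided P = {fisher_one_sided(*table):.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for very small P values; a production analysis would more likely rely on an established statistics package and would correct for multiple testing.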
Taken together, the CNVRs we have identified impact multiple novel genes and signaling pathways, including genes involved in GABAR-A signaling, that can provide important targets for therapeutic intervention.
Since drugs must compete with endogenous small molecules for protein binding, many successful drugs target large gene families with multiple drug binding sites. In Example III, we search for defective gene family interaction networks (GFINs) in 6,742 patients with the ASDs relative to 12,544 neurologically normal controls, to find additional genetic targets that may be amenable to drug therapy. We find significant enrichment of structural defects (P≤2.40×10⁻⁹, 1.8-fold enrichment) in the metabotropic glutamate receptor (GRM) GFIN, described in Example I and previously observed to impact attention deficit hyperactivity disorder (ADHD) and schizophrenia. Also, the MXD-MYC-MAX network of genes, previously implicated in cancer, is significantly enriched (P≤3.83×10⁻²³, 2.5-fold enrichment), as is the calmodulin 1 (CALM1) gene interaction network (P≤4.16×10⁻⁴, 14.4-fold enrichment) which regulates voltage-independent calcium-activated action potentials at the neuronal synapse. In conclusion, we find that multiple defective gene family interactions underlie autism, which provide many novel translational targets for therapeutic interventions.
A “copy number variation (CNV)” refers to the number of copies of a particular gene in the genotype of an individual. CNVs represent a major genetic component of human phenotypic diversity. Susceptibility to genetic disorders is known to be associated not only with single nucleotide polymorphisms (SNP), but also with structural and other genetic variations, including CNVs. A CNV represents a copy number change involving a DNA fragment that is ~1 kilobase (kb) or larger (Feuk et al. 2006a). CNVs described herein do not include those variants that arise from the insertion/deletion of transposable elements (e.g., ~6-kb KpnI repeats) to minimize the complexity of future CNV analyses. The term CNV therefore encompasses previously introduced terms such as large-scale copy number variants (LCVs; Iafrate et al. 2004), copy number polymorphisms (CNPs; Sebat et al. 2004), and intermediate-sized variants (ISVs; Tuzun et al. 2005), but not retroposon insertions.
A “single nucleotide polymorphism (SNP)” refers to a change in which a single base in the DNA differs from the usual base at that position. These single base changes are called SNPs or “snips.” Millions of SNPs have been cataloged in the human genome. Some SNPs, such as that which causes sickle cell, are responsible for disease. Other SNPs are normal variations in the genome.
The term “genetic alteration” as used herein refers to a change from the wild-type or reference sequence of one or more nucleic acid molecules. Genetic alterations include without limitation, base pair substitutions, additions and deletions of at least one nucleotide from a nucleic acid molecule of known sequence.
The term “solid matrix” as used herein refers to any format, such as beads, microparticles, a microarray, the surface of a microtitration well or a test tube, a dipstick or a filter. The material of the matrix may be polystyrene, cellulose, latex, nitrocellulose, nylon, polyacrylamide, dextran or agarose.
The phrase “consisting essentially of” when referring to a particular nucleotide or amino acid means a sequence having the properties of a given SEQ ID NO:. For example, when used in reference to an amino acid sequence, the phrase includes the sequence per se and molecular modifications that would not affect the functional and novel characteristics of the sequence.
“Target nucleic acid” as used herein refers to a previously defined region of a nucleic acid present in a complex nucleic acid mixture wherein the defined wild-type region contains at least one known nucleotide variation which may or may not be associated with autism. The nucleic acid molecule may be isolated from a natural source by cDNA cloning or subtractive hybridization or synthesized manually. The nucleic acid molecule may be synthesized manually by the triester synthetic method or by using an automated DNA synthesizer.
With regard to nucleic acids used in the invention, the term “isolated nucleic acid” is sometimes employed. This term, when applied to DNA, refers to a DNA molecule that is separated from sequences with which it is immediately contiguous (in the 5′ and 3′ directions) in the naturally occurring genome of the organism from which it was derived. For example, the “isolated nucleic acid” may comprise a DNA molecule inserted into a vector, such as a plasmid or virus vector, or integrated into the genomic DNA of a prokaryote or eukaryote. An “isolated nucleic acid molecule” may also comprise a cDNA molecule. An isolated nucleic acid molecule inserted into a vector is also sometimes referred to herein as a recombinant nucleic acid molecule.
With respect to RNA molecules, the term “isolated nucleic acid” primarily refers to an RNA molecule encoded by an isolated DNA molecule as defined above. Alternatively, the term may refer to an RNA molecule that has been sufficiently separated from RNA molecules with which it would be associated in its natural state (i.e., in cells or tissues), such that it exists in a “substantially pure” form.
By the use of the term “enriched” in reference to nucleic acid it is meant that the specific DNA or RNA sequence constitutes a significantly higher fraction (2-5 fold) of the total DNA or RNA present in the cells or solution of interest than in normal cells or in the cells from which the sequence was taken. This could be caused by a person by preferential reduction in the amount of other DNA or RNA present, or by a preferential increase in the amount of the specific DNA or RNA sequence, or by a combination of the two. However, it should be noted that “enriched” does not imply that there are no other DNA or RNA sequences present, just that the relative amount of the sequence of interest has been significantly increased.
It is also advantageous for some purposes that a nucleotide sequence be in purified form. The term “purified” in reference to nucleic acid does not require absolute purity (such as a homogeneous preparation); instead, it represents an indication that the sequence is relatively purer than in the natural environment (compared to the natural level, this level should be at least 2-5 fold greater, e.g., in terms of mg/ml). Individual clones isolated from a cDNA library may be purified to electrophoretic homogeneity. The claimed DNA molecules obtained from these clones can be obtained directly from total DNA or from total RNA. The cDNA clones are not naturally occurring, but rather are preferably obtained via manipulation of a partially purified naturally occurring substance (messenger RNA). The construction of a cDNA library from mRNA involves the creation of a synthetic substance (cDNA) and pure individual cDNA clones can be isolated from the synthetic library by clonal selection of the cells carrying the cDNA library. Thus, the process which includes the construction of a cDNA library from mRNA and isolation of distinct cDNA clones yields an approximately 10⁻⁶-fold purification of the native message. Thus, purification of at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated.
The term “substantially pure” refers to a preparation comprising at least 50-60% by weight the compound of interest (e.g., nucleic acid, oligonucleotide, etc.). More preferably, the preparation comprises at least 75% by weight, and most preferably 90-99% by weight, the compound of interest. Purity is measured by methods appropriate for the compound of interest.
The term “complementary” describes two nucleotides that can form multiple favorable interactions with one another. For example, adenine is complementary to thymine as they can form two hydrogen bonds. Similarly, guanine and cytosine are complementary since they can form three hydrogen bonds. Thus if a nucleic acid sequence contains the following sequence of bases, thymine, adenine, guanine and cytosine, a “complement” of this nucleic acid molecule would be a molecule containing adenine in the place of thymine, thymine in the place of adenine, cytosine in the place of guanine, and guanine in the place of cytosine. Because the complement can contain a nucleic acid sequence that forms optimal interactions with the parent nucleic acid molecule, such a complement can bind with high affinity to its parent molecule.
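The pairing rule described above can be sketched in a few lines of Python. This is an illustration only: the helper name `complement` and the single-letter base codes are assumptions introduced here, and the sketch returns the base-by-base complement in the same order as the input, as in the text, rather than the reverse complement used when writing an antiparallel strand 5′ to 3′.

```python
# Watson-Crick pairing for DNA: adenine pairs with thymine, guanine with cytosine.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(sequence: str) -> str:
    """Return the base-by-base complement of a DNA sequence (hypothetical helper)."""
    return "".join(PAIRS[base] for base in sequence.upper())

# The example from the text: thymine, adenine, guanine, cytosine.
print(complement("TAGC"))  # ATCG
```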
With respect to single stranded nucleic acids, particularly oligonucleotides, the term “specifically hybridizing” refers to the association between two single-stranded nucleotide molecules of sufficiently complementary sequence to permit such hybridization under pre-determined conditions generally used in the art (sometimes termed “substantially complementary”). In particular, the term refers to hybridization of an oligonucleotide with a substantially complementary sequence contained within a single-stranded DNA or RNA molecule of the invention, to the substantial exclusion of hybridization of the oligonucleotide with single-stranded nucleic acids of non-complementary sequence. For example, specific hybridization can refer to a sequence which hybridizes to any autism specific marker gene or nucleic acid, but does not hybridize to other nucleotides. Also, a polynucleotide which “specifically hybridizes” may hybridize only to a neurospecific marker, such as an autism-specific marker shown in the Table contained herein. Appropriate conditions enabling specific hybridization of single stranded nucleic acid molecules of varying complementarity are well known in the art.
For instance, one common formula for calculating the stringency conditions required to achieve hybridization between nucleic acid molecules of a specified sequence homology is set forth below (Sambrook et al., Molecular Cloning, Cold Spring Harbor Laboratory (1989)):
Tm = 81.5° C. + 16.6 log[Na+] + 0.41(% G+C) − 0.63(% formamide) − 600/(#bp in duplex)
As an illustration of the above formula, using [Na+]=[0.368] and 50% formamide, with GC content of 42% and an average probe size of 200 bases, the Tm is 57° C. The Tm of a DNA duplex decreases by 1-1.5° C. with every 1% decrease in homology. Thus, targets with greater than about 75% sequence identity would be observed using a hybridization temperature of 42° C.
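The worked illustration above can be reproduced with a short Python sketch. The function and parameter names are hypothetical, and the logarithm is taken to be base 10, which is the reading under which the stated inputs give the stated result of about 57° C.

```python
import math

def melting_temperature(na_molar, gc_percent, formamide_percent, duplex_bp):
    """Tm in degrees C from the formula quoted in the text (hypothetical helper).

    na_molar          -- sodium ion concentration in mol/L
    gc_percent        -- G+C content as a percentage (0-100)
    formamide_percent -- formamide concentration as a percentage (0-100)
    duplex_bp         -- number of base pairs in the duplex
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / duplex_bp)

# Values from the illustration: [Na+] = 0.368, 50% formamide, 42% GC, 200 bases.
tm = melting_temperature(na_molar=0.368, gc_percent=42,
                         formamide_percent=50, duplex_bp=200)
print(round(tm))  # 57
```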
The stringency of the hybridization and wash depend primarily on the salt concentration and temperature of the solutions. In general, to maximize the rate of annealing of the probe with its target, the hybridization is usually carried out at salt and temperature conditions that are 20-25° C. below the calculated Tm of the hybrid. Wash conditions should be as stringent as possible for the degree of identity of the probe for the target. In general, wash conditions are selected to be approximately 12-20° C. below the Tm of the hybrid. In regards to the nucleic acids of the current invention, a moderate stringency hybridization is defined as hybridization in 6×SSC, 5×Denhardt's solution, 0.5% SDS and 100 μg/ml denatured salmon sperm DNA at 42° C., and washed in 2×SSC and 0.5% SDS at 55° C. for 15 minutes. A high stringency hybridization is defined as hybridization in 6×SSC, 5×Denhardt's solution, 0.5% SDS and 100 μg/ml denatured salmon sperm DNA at 42° C., and washed in 1×SSC and 0.5% SDS at 65° C. for 15 minutes. A very high stringency hybridization is defined as hybridization in 6×SSC, 5×Denhardt's solution, 0.5% SDS and 100 μg/ml denatured salmon sperm DNA at 42° C., and washed in 0.1×SSC and 0.5% SDS at 65° C. for 15 minutes.
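The temperature guidelines and the three stringency definitions above can be captured in a small Python sketch. All names (`hybridization_window`, `wash_window`, `WASH_CONDITIONS`) are hypothetical and introduced only for illustration; the numerical values are taken from the paragraph above.

```python
def hybridization_window(tm_celsius):
    """Hybridization is usually carried out 20-25 degrees C below the calculated Tm."""
    return (tm_celsius - 25.0, tm_celsius - 20.0)

def wash_window(tm_celsius):
    """Wash conditions are selected approximately 12-20 degrees C below the Tm."""
    return (tm_celsius - 20.0, tm_celsius - 12.0)

# Hybridization is identical for all three levels defined in the text:
# 6x SSC, 5x Denhardt's solution, 0.5% SDS, 100 ug/ml denatured salmon
# sperm DNA, at 42 degrees C. Only the wash step differs.
WASH_CONDITIONS = {
    "moderate":  {"ssc_x": 2.0, "sds_percent": 0.5, "temp_c": 55, "minutes": 15},
    "high":      {"ssc_x": 1.0, "sds_percent": 0.5, "temp_c": 65, "minutes": 15},
    "very high": {"ssc_x": 0.1, "sds_percent": 0.5, "temp_c": 65, "minutes": 15},
}

# Example: a hybrid with a calculated Tm of 57 degrees C.
print(hybridization_window(57.0))  # (32.0, 37.0)
print(wash_window(57.0))           # (37.0, 45.0)
for level, wash in WASH_CONDITIONS.items():
    print(f"{level}: wash in {wash['ssc_x']}x SSC, {wash['sds_percent']}% SDS "
          f"at {wash['temp_c']} C for {wash['minutes']} min")
```

Keeping the wash parameters in one table makes it easy to see that stringency rises as the salt concentration falls and the wash temperature rises.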
The term “oligonucleotide,” as used herein is defined as a nucleic acid molecule comprised of two or more ribo- or deoxyribonucleotides, preferably more than three. The exact size of the oligonucleotide will depend on various factors and on the particular application and use of the oligonucleotide. Oligonucleotides, which include probes and primers, can be any length from 3 nucleotides to the full length of the nucleic acid molecule, and explicitly include every possible number of contiguous nucleic acids from 3 through the full length of the polynucleotide. Preferably, oligonucleotides are at least about 10 nucleotides in length, more preferably at least 15 nucleotides in length, more preferably at least about 20 nucleotides in length.
The term “probe” as used herein refers to an oligonucleotide, polynucleotide or nucleic acid, either RNA or DNA, whether occurring naturally as in a purified restriction enzyme digest or produced synthetically, which is capable of annealing with or specifically hybridizing to a nucleic acid with sequences complementary to the probe. A probe may be either single-stranded or double-stranded. The exact length of the probe will depend upon many factors, including temperature, source of probe and use of the method. For example, for diagnostic applications, depending on the complexity of the target sequence, the oligonucleotide probe typically contains 15-25 or more nucleotides, although it may contain fewer nucleotides. The probes herein are selected to be complementary to different strands of a particular target nucleic acid sequence and may or may not comprise a detectable label. This means that the probes must be sufficiently complementary so as to be able to “specifically hybridize” or anneal with their respective target strands under a set of pre-determined conditions. Therefore, the probe sequence need not reflect the exact complementary sequence of the target. For example, a non-complementary nucleotide fragment may be attached to the 5′ or 3′ end of the probe, with the remainder of the probe sequence being complementary to the target strand. Alternatively, non-complementary bases or longer sequences can be interspersed into the probe, provided that the probe sequence has sufficient complementarity with the sequence of the target nucleic acid to anneal therewith specifically.
The term “primer” as used herein refers to an oligonucleotide, either RNA or DNA, either single-stranded or double-stranded, either derived from a biological system, generated by restriction enzyme digestion, or produced synthetically which, when placed in the proper environment, is able to functionally act as an initiator of template-dependent nucleic acid synthesis. When presented with an appropriate nucleic acid template, suitable nucleoside triphosphate precursors of nucleic acids, a polymerase enzyme, suitable cofactors and conditions such as a suitable temperature and pH, the primer may be extended at its 3′ terminus by the addition of nucleotides by the action of a polymerase or similar activity to yield a primer extension product. The primer may vary in length depending on the particular conditions and requirement of the application, and it may or may not be detectably labeled. For example, in diagnostic applications, the oligonucleotide primer is typically 15-25 or more nucleotides in length. The primer must be of sufficient complementarity to the desired template to prime the synthesis of the desired extension product, that is, to be able to anneal with the desired template strand in a manner sufficient to provide the 3′ hydroxyl moiety of the primer in appropriate juxtaposition for use in the initiation of synthesis by a polymerase or similar enzyme. It is not required that the primer sequence represent an exact complement of the desired template. For example, a non-complementary nucleotide sequence may be attached to the 5′ end of an otherwise complementary primer. Alternatively, non-complementary bases may be interspersed within the oligonucleotide primer sequence, provided that the primer sequence has sufficient complementarity with the sequence of the desired template strand to functionally provide a template-primer complex for the synthesis of the extension product.
Probes and primers having the appropriate sequence homology which specifically hybridize to CNV containing nucleic acids are useful in detecting the presence of such nucleic acids in biological samples.
Polymerase chain reaction (PCR) has been described in U.S. Pat. Nos. 4,683,195, 4,800,195, and 4,965,188, the entire disclosures of which are incorporated by reference herein.
The term “vector” relates to a single or double stranded circular nucleic acid molecule that can be infected, transfected or transformed into cells and replicate independently or within the host cell genome. A circular double stranded nucleic acid molecule can be cut and thereby linearized upon treatment with restriction enzymes. An assortment of vectors, restriction enzymes, and the knowledge of the nucleotide sequences that are targeted by restriction enzymes are readily available to those skilled in the art, and include any replicon, such as a plasmid, cosmid, bacmid, phage or virus, to which another genetic sequence or element (either DNA or RNA) may be attached so as to bring about the replication of the attached sequence or element. A nucleic acid molecule of the invention can be inserted into a vector by cutting the vector with restriction enzymes and ligating the two pieces together.
Many techniques are available to those skilled in the art to facilitate transformation, transfection, or transduction of the expression construct into a prokaryotic or eukaryotic organism. The terms “transformation”, “transfection”, and “transduction” refer to methods of inserting a nucleic acid and/or expression construct into a cell or host organism. These methods involve a variety of techniques, such as treating the cells with high concentrations of salt, an electric field, or detergent, to render the host cell outer membrane or wall permeable to nucleic acid molecules of interest, microinjection, PEG-fusion, and the like.
The term “promoter element” describes a nucleotide sequence that is incorporated into a vector that, once inside an appropriate cell, can facilitate transcription factor and/or polymerase binding and subsequent transcription of portions of the vector DNA into mRNA. In one embodiment, the promoter element of the present invention precedes the 5′ end of the autism specific marker nucleic acid molecule such that the latter is transcribed into mRNA. Host cell machinery then translates mRNA into a polypeptide.
Those skilled in the art will recognize that a nucleic acid vector can contain nucleic acid elements other than the promoter element and the autism specific marker gene nucleic acid molecule. These other nucleic acid elements include, but are not limited to, origins of replication, ribosomal binding sites, nucleic acid sequences encoding drug resistance enzymes or amino acid metabolic enzymes, and nucleic acid sequences encoding secretion signals, localization signals, or signals useful for polypeptide purification.
A “replicon” is any genetic element, for example, a plasmid, cosmid, bacmid, plastid, phage or virus, that is capable of replication largely under its own control. A replicon may be either RNA or DNA and may be single or double stranded.
An “expression operon” refers to a nucleic acid segment that may possess transcriptional and translational control sequences, such as promoters, enhancers, translational start signals (e.g., ATG or AUG codons), polyadenylation signals, terminators, and the like, and which facilitate the expression of a polypeptide coding sequence in a host cell or organism.
As used herein, the terms “reporter,” “reporter system”, “reporter gene,” or “reporter gene product” shall mean an operative genetic system in which a nucleic acid comprises a gene that encodes a product that when expressed produces a reporter signal that is readily measurable, e.g., by biological assay, immunoassay, radioimmunoassay, or by colorimetric, fluorogenic, chemiluminescent or other methods. The nucleic acid may be either RNA or DNA, linear or circular, single or double stranded, antisense or sense polarity, and is operatively linked to the necessary control elements for the expression of the reporter gene product. The required control elements will vary according to the nature of the reporter system and whether the reporter gene is in the form of DNA or RNA, but may include, but not be limited to, such elements as promoters, enhancers, translational control sequences, poly A addition signals, transcriptional termination signals and the like.
The introduced nucleic acid may or may not be integrated (covalently linked) into nucleic acid of the recipient cell or organism. In bacterial, yeast, plant and mammalian cells, for example, the introduced nucleic acid may be maintained as an episomal element or independent replicon such as a plasmid. Alternatively, the introduced nucleic acid may become integrated into the nucleic acid of the recipient cell or organism and be stably maintained in that cell or organism and further passed on or inherited to progeny cells or organisms of the recipient cell or organism. Finally, the introduced nucleic acid may exist in the recipient cell or host organism only transiently.
The term “selectable marker gene” refers to a gene that when expressed confers a selectable phenotype, such as antibiotic resistance, on a transformed cell.
The term “operably linked” means that the regulatory sequences necessary for expression of the coding sequence are placed in the DNA molecule in the appropriate positions relative to the coding sequence so as to effect expression of the coding sequence. This same definition is sometimes applied to the arrangement of transcription units and other transcription control elements (e.g. enhancers) in an expression vector.
The terms “recombinant organism,” or “transgenic organism” refer to organisms which have a new combination of genes or nucleic acid molecules. A new combination of genes or nucleic acid molecules can be introduced into an organism using a wide array of nucleic acid manipulation techniques available to those skilled in the art. The term “organism” relates to any living being comprised of at least one cell. An organism can be as simple as one eukaryotic cell or as complex as a mammal. Therefore, the phrase “a recombinant organism” encompasses a recombinant cell, as well as eukaryotic and prokaryotic organisms.
The term “isolated protein” or “isolated and purified protein” is sometimes used herein. This term refers primarily to a protein produced by expression of an isolated nucleic acid molecule of the invention. Alternatively, this term may refer to a protein that has been sufficiently separated from other proteins with which it would naturally be associated, so as to exist in “substantially pure” form. “Isolated” is not meant to exclude artificial or synthetic mixtures with other compounds or materials, or the presence of impurities that do not interfere with the fundamental activity, and that may be present, for example, due to incomplete purification, addition of stabilizers, or compounding into, for example, immunogenic preparations or pharmaceutically acceptable preparations.
A “specific binding pair” comprises a specific binding member (sbm) and a binding partner (bp) which have a particular specificity for each other and which in normal conditions bind to each other in preference to other molecules. Examples of specific binding pairs are antigens and antibodies, ligands and receptors and complementary nucleotide sequences. The skilled person is aware of many other examples. Further, the term “specific binding pair” is also applicable where either or both of the specific binding member and the binding partner comprise a part of a large molecule. In embodiments in which the specific binding pair comprises nucleic acid sequences, they will be of a length to hybridize to each other under conditions of the assay, preferably greater than 10 nucleotides long, more preferably greater than 15 or 20 nucleotides long.
“Sample” or “patient sample” or “biological sample” generally refers to a sample which may be tested for a particular molecule, preferably an autism specific marker molecule, such as a marker shown in the tables provided below. Samples may include but are not limited to cells, body fluids, including blood, serum, plasma, urine, saliva, tears, pleural fluid and the like.
The terms “agent” and “test compound” are used interchangeably herein and denote a chemical compound, a mixture of chemical compounds, a biological macromolecule, or an extract made from biological materials such as bacteria, plants, fungi, or animal (particularly mammalian) cells or tissues. Biological macromolecules include siRNA, shRNA, antisense oligonucleotides, peptides, peptide/DNA complexes, and any nucleic acid based molecule which exhibits the capacity to modulate the activity of the SNP and/or CNV containing nucleic acids described herein or their encoded proteins. Agents are evaluated for potential biological activity by inclusion in screening assays described hereinbelow.
Autism-related CNV and/or SNP containing nucleic acids, including but not limited to those listed in the Table provided below, may be used for a variety of purposes in accordance with the present invention. Autism-associated CNV/SNP containing DNA, RNA, or fragments thereof may be used as probes to detect the presence of and/or expression of autism specific markers. Methods in which autism specific marker nucleic acids may be utilized as probes for such assays include, but are not limited to: (1) in situ hybridization; (2) Southern hybridization; (3) northern hybridization; and (4) assorted amplification reactions such as polymerase chain reactions (PCR).
Further, assays for detecting autism-associated CNVs/SNPs may be conducted on any type of biological sample, including but not limited to body fluids (including blood, urine, serum, gastric lavage), any type of cell (such as brain cells, white blood cells, mononuclear cells) or body tissue. Such detection methods can include for example, southern and northern blotting, RFLP, direct sequencing and PCR amplification followed by hybridization of amplified products to a microarray comprising reference nucleic acid sequences.
From the foregoing discussion, it can be seen that autism-associated CNV/SNP containing nucleic acids, vectors expressing the same, autism CNV/SNP containing marker proteins and anti-Autism specific marker antibodies of the invention can be used to detect autism associated CNVs/SNPs in body tissue, cells, or fluid, and alter autism SNP containing marker protein expression for purposes of assessing the genetic and protein interactions involved in the development of autism. In most embodiments for screening for autism-associated CNVs/SNPs, the autism-associated CNV/SNP containing nucleic acid in the sample will initially be amplified, e.g. using PCR, to increase the amount of the templates as compared to other sequences present in the sample. This allows the target sequences to be detected with a high degree of sensitivity if they are present in the sample. This initial step may be avoided by using highly sensitive array techniques that are becoming increasingly important in the art. Alternatively, new detection technologies can overcome this limitation and enable analysis of small samples containing as little as 1 μg of total RNA. Using Resonance Light Scattering (RLS) technology, as opposed to traditional fluorescence techniques, multiple reads can detect low quantities of mRNAs using biotin labeled hybridized targets and anti-biotin antibodies. Another alternative to PCR amplification involves planar wave guide technology (PWG) to increase signal-to-noise ratios and reduce background interference. Both techniques are commercially available from Qiagen Inc. (USA).
Thus any of the aforementioned techniques may be used to detect or quantify autism-associated CNV/SNP marker expression and accordingly, diagnose autism or an autism spectrum disorder.
Any of the aforementioned products can be incorporated into a kit which may contain an autism-associated CNV/SNP specific marker polynucleotide or one or more such markers immobilized on a Gene Chip, an oligonucleotide, a polypeptide, a peptide, an antibody, a label, said label being detectable and optionally, operably linked to an oligonucleotide, polypeptide or antibody marker, or reporter, a pharmaceutically acceptable carrier, a physiologically acceptable carrier, instructions for use, a container, a vessel for administration, an assay substrate, or any combination thereof.
In a preferred embodiment, the kit contains reagents for identifying nucleic acids present in a biological sample which harbor the genetic alterations described herein. In the case where the CNV is a deletion, probes or primers are designed to flank the affected region in order to assess whether the CNV is present or absent.
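By way of illustration only, the flanking-primer approach for a deletion CNV can be sketched as a simple length calculation: when the primers flank the deleted region, the amplified product is shorter by the length of the deletion when the CNV is present. The function name, the coordinate convention (1-based, inclusive), and the example coordinates are hypothetical and are not taken from the data described herein.

```python
def expected_amplicon_length(fwd_start, rev_end, del_start, del_end, deletion_present):
    """Expected PCR product length for primers flanking a candidate deletion.

    All coordinates are 1-based and inclusive (an assumption for this sketch).
    fwd_start: first base of the forward primer
    rev_end:   last base covered by the reverse primer
    del_start, del_end: first and last deleted base
    """
    # The primers must lie outside the deleted interval, otherwise they do not flank it.
    if not (fwd_start < del_start <= del_end < rev_end):
        raise ValueError("primers must flank the deleted region")
    wild_type_length = rev_end - fwd_start + 1
    if deletion_present:
        return wild_type_length - (del_end - del_start + 1)
    return wild_type_length


# Hypothetical example: a 3,000 bp deletion inside a 5,001 bp primer-to-primer interval.
print(expected_amplicon_length(1000, 6000, 2000, 4999, deletion_present=False))
print(expected_amplicon_length(1000, 6000, 2000, 4999, deletion_present=True))
```

A shorter-than-reference product is therefore consistent with the deletion being present, while a product of reference length is consistent with its absence on at least one allele.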
Since the CNVs and SNPs identified herein have been associated with the etiology of autism, methods for identifying agents that modulate the activity of the genes and their encoded products containing such CNVs/SNPs should result in the generation of efficacious therapeutic agents for the treatment of a variety of disorders associated with this condition.
As can be seen from the data provided in Table 1, several chromosomes contain regions which provide suitable targets for the rational design of therapeutic agents which modulate their activity. Small peptide molecules corresponding to these regions may be used to advantage in the design of therapeutic agents which effectively modulate the activity of the encoded proteins.
Molecular modeling should facilitate the identification of specific organic molecules with capacity to bind to the active site of the proteins encoded by the CNV/SNP containing nucleic acids based on conformation or key amino acid residues required for function. A combinatorial chemistry approach will be used to identify molecules with greatest activity and then iterations of these molecules will be developed for further cycles of screening.
The polypeptides or fragments employed in drug screening assays may either be free in solution, affixed to a solid support or within a cell. One method of drug screening utilizes eukaryotic or prokaryotic host cells which are stably transformed with recombinant polynucleotides expressing the polypeptide or fragment, preferably in competitive binding assays. Such cells, either in viable or fixed form, can be used for standard binding assays. One may determine, for example, formation of complexes between the polypeptide or fragment and the agent being tested, or examine the degree to which the formation of a complex between the polypeptide or fragment and a known substrate is interfered with by the agent being tested.
Another technique for drug screening provides high throughput screening for compounds having suitable binding affinity for the encoded polypeptides and is described in detail in Geysen, PCT published application WO 84/03564, published on Sep. 13, 1984. Briefly stated, large numbers of different, small peptide test compounds, such as those described above, are synthesized on a solid substrate, such as plastic pins or some other surface. The peptide test compounds are reacted with the target polypeptide and washed. Bound polypeptide is then detected by methods well known in the art.
A further technique for drug screening involves the use of host eukaryotic cell lines or cells (such as described above) which have a nonfunctional or altered autism associated gene. These host cell lines or cells are defective at the polypeptide level. The host cell lines or cells are grown in the presence of drug compound. The rate of cellular metabolism, alterations in cellular morphology and/or receptor signaling of the host cells is measured to determine if the compound is capable of altering any of these parameters in the defective cells. Host cells contemplated for use in the present invention include but are not limited to bacterial cells, fungal cells, insect cells, mammalian cells, and plant cells. The autism-associated CNV/SNP encoding DNA molecules may be introduced singly into such host cells or in combination to assess the phenotype of cells conferred by such expression. Methods for introducing DNA molecules are also well known to those of ordinary skill in the art. Such methods are set forth in Ausubel et al. eds., Current Protocols in Molecular Biology, John Wiley & Sons, NY, N.Y. 1995, the disclosure of which is incorporated by reference herein.
A wide variety of expression vectors are available that can be modified to express the novel DNA sequences of this invention. The specific vectors exemplified herein are merely illustrative, and are not intended to limit the scope of the invention. Expression methods are described by Sambrook et al. Molecular Cloning: A Laboratory Manual or Current Protocols in Molecular Biology 16.3-17.44 (1989). Expression methods in Saccharomyces are also described in Current Protocols in Molecular Biology (1989).
Suitable vectors for use in practicing the invention include prokaryotic vectors such as the pNH vectors (Stratagene Inc., 11099 N. Torrey Pines Rd., La Jolla, Calif. 92037), pET vectors (Novagen Inc., 565 Science Dr., Madison, Wis. 53711) and the pGEX vectors (Pharmacia LKB Biotechnology Inc., Piscataway, N.J. 08854). Examples of eukaryotic vectors useful in practicing the present invention include the vectors pRc/CMV, pRc/RSV, and pREP (Invitrogen, 11588 Sorrento Valley Rd., San Diego, Calif. 92121); pcDNA3.1/V5&His (Invitrogen); baculovirus vectors such as pVL1392, pVL1393, or pAC360 (Invitrogen); and yeast vectors such as YRP17, YIP5, and YEP24 (New England Biolabs, Beverly, Mass.), as well as pRS403 and pRS413 (Stratagene Inc.); Pichia vectors such as pHIL-D1 (Phillips Petroleum Co., Bartlesville, Okla. 74004); retroviral vectors such as pLNCX and pLPCX (Clontech); and adenoviral and adeno-associated viral vectors.
Promoters for use in expression vectors of this invention include promoters that are operable in prokaryotic or eukaryotic cells. Promoters that are operable in prokaryotic cells include lactose (lac) control elements, bacteriophage lambda (pL) control elements, arabinose control elements, tryptophan (trp) control elements, bacteriophage T7 control elements, and hybrids thereof. Promoters that are operable in eukaryotic cells include Epstein Barr virus promoters, adenovirus promoters, SV40 promoters, Rous Sarcoma Virus promoters, cytomegalovirus (CMV) promoters, baculovirus promoters such as AcMNPV polyhedrin promoter, Pichia promoters such as the alcohol oxidase promoter, and Saccharomyces promoters such as the gal4 inducible promoter and the PGK constitutive promoter, as well as neuronal-specific platelet-derived growth factor promoter (PDGF), the Thy-1 promoter, the hamster and mouse Prion promoter (MoPrP), and the glial fibrillary acidic protein (GFAP) promoter for the expression of transgenes in glial cells.
In addition, a vector of this invention may contain any one of a number of various markers facilitating the selection of a transformed host cell. Such markers include genes associated with temperature sensitivity, drug resistance, or enzymes associated with phenotypic characteristics of the host organisms.
Host cells expressing the autism-associated CNVs/SNPs of the present invention or functional fragments thereof provide a system in which to screen potential compounds or agents for the ability to modulate the development of autism. Thus, in one embodiment, the nucleic acid molecules of the invention may be used to create recombinant cell lines for use in assays to identify agents which modulate aspects of cellular metabolism associated with neuronal signaling and neuronal cell communication and structure. Also provided herein are methods to screen for compounds capable of modulating the function of proteins encoded by CNV/SNP containing nucleic acids.
Another approach entails the use of phage display libraries engineered to express fragments of the polypeptides encoded by the CNV/SNP containing nucleic acids on the phage surface. Such libraries are then contacted with a combinatorial chemical library under conditions wherein binding affinity between the expressed peptide and the components of the chemical library may be detected. U.S. Pat. Nos. 6,057,098 and 5,965,456 provide methods and apparatus for performing such assays.
The goal of rational drug design is to produce structural analogs of biologically active polypeptides of interest or of small molecules with which they interact (e.g., agonists, antagonists, inhibitors) in order to fashion drugs which are, for example, more active or stable forms of the polypeptide, or which, e.g., enhance or interfere with the function of a polypeptide in vivo. See, e.g., Hodgson, (1991) Bio/Technology 9:19-21. In one approach, discussed above, the three-dimensional structure of a protein of interest or, for example, of the protein-substrate complex, is solved by x-ray crystallography, by nuclear magnetic resonance, by computer modeling or most typically, by a combination of approaches. Less often, useful information regarding the structure of a polypeptide may be gained by modeling based on the structure of homologous proteins. An example of rational drug design is the development of HIV protease inhibitors (Erickson et al., (1990) Science 249:527-533). In addition, peptides may be analyzed by an alanine scan (Wells, (1991) Meth. Enzym. 202:390-411). In this technique, an amino acid residue is replaced by Ala, and its effect on the peptide's activity is determined. Each of the amino acid residues of the peptide is analyzed in this manner to determine the important regions of the peptide.
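The alanine scan described above may be illustrated, purely as a sketch, by enumerating the single-substitution variants of a peptide that would then be synthesized and assayed. The function name and the example peptide are hypothetical; positions that are already alanine are skipped here because substituting them yields the unchanged peptide.

```python
def alanine_scan(peptide):
    """Return (position, original_residue, variant_sequence) for each
    single residue replaced by alanine (one-letter code 'A').

    Positions are 1-based. Residues that are already 'A' are skipped.
    """
    variants = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue
        variant = peptide[:i] + "A" + peptide[i + 1:]
        variants.append((i + 1, residue, variant))
    return variants


# Hypothetical four-residue peptide used only to show the output shape.
for position, original, variant in alanine_scan("MKAL"):
    print(position, original, variant)
```

Comparing the measured activity of each variant against the unmodified peptide indicates which positions are important for function.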
It is also possible to isolate a target-specific antibody, selected by a functional assay, and then to solve its crystal structure. In principle, this approach yields a pharmacophore upon which subsequent drug design can be based.
One can bypass protein crystallography altogether by generating anti-idiotypic antibodies (anti-ids) to a functional, pharmacologically active antibody. As a mirror image of a mirror image, the binding site of the anti-ids would be expected to be an analog of the original molecule. The anti-id could then be used to identify and isolate peptides from banks of chemically or biologically produced peptides. Selected peptides would then act as the therapeutic.
Thus, one may design drugs which have, e.g., improved polypeptide activity or stability or which act as inhibitors, agonists, antagonists, etc. of polypeptide activity. By virtue of the availability of CNV/SNP containing nucleic acid sequences described herein, sufficient amounts of the encoded polypeptide may be made available to perform such analytical studies as x-ray crystallography. In addition, the knowledge of the protein sequence provided herein will guide those employing computer modeling techniques in place of, or in addition to x-ray crystallography.
In another embodiment, the availability of autism-associated CNV/SNP containing nucleic acids enables the production of strains of laboratory mice carrying the autism-associated CNVs/SNPs of the invention. Transgenic mice expressing the autism-associated CNV/SNP of the invention provide a model system in which to examine the role of the protein encoded by the SNP containing nucleic acid in the development and progression towards autism. Methods of introducing transgenes in laboratory mice are known to those of skill in the art. Three common methods include: 1. integration of retroviral vectors encoding the foreign gene of interest into an early embryo; 2. injection of DNA into the pronucleus of a newly fertilized egg; and 3. the incorporation of genetically manipulated embryonic stem cells into an early embryo. Production of the transgenic mice described above will facilitate the molecular elucidation of the role that a target protein plays in various cellular metabolic and neuronal processes. Such mice provide an in vivo screening tool to study putative therapeutic drugs in a whole animal model and are encompassed by the present invention.
The term “animal” is used herein to include all vertebrate animals, except humans. It also includes an individual animal in all stages of development, including embryonic and fetal stages. A “transgenic animal” is any animal containing one or more cells bearing genetic information altered or received, directly or indirectly, by deliberate genetic manipulation at the subcellular level, such as by targeted recombination or microinjection or infection with recombinant virus. The term “transgenic animal” is not meant to encompass classical cross-breeding or in vitro fertilization, but rather is meant to encompass animals in which one or more cells are altered by or receive a recombinant DNA molecule. This molecule may be specifically targeted to a defined genetic locus, be randomly integrated within a chromosome, or it may be extrachromosomally replicating DNA. The term “germ cell line transgenic animal” refers to a transgenic animal in which the genetic alteration or genetic information was introduced into a germ line cell, thereby conferring the ability to transfer the genetic information to offspring. If such offspring, in fact, possess some or all of that alteration or genetic information, then they, too, are transgenic animals.
The alteration of genetic information may be foreign to the species of animal to which the recipient belongs, or foreign only to the particular individual recipient, or may be genetic information already possessed by the recipient. In the last case, the altered or introduced gene may be expressed differently than the native gene. Such altered or foreign genetic information would encompass the introduction of autism-associated CNV/SNP containing nucleotide sequences.
The DNA used for altering a target gene may be obtained by a wide variety of techniques that include, but are not limited to, isolation from genomic sources, preparation of cDNAs from isolated mRNA templates, direct synthesis, or a combination thereof.
A preferred type of target cell for transgene introduction is the embryonal stem cell (ES). ES cells may be obtained from pre-implantation embryos cultured in vitro (Evans et al., (1981) Nature 292:154-156; Bradley et al., (1984) Nature 309:255-258; Gossler et al., (1986) Proc. Natl. Acad. Sci. 83:9065-9069). Transgenes can be efficiently introduced into the ES cells by standard techniques such as DNA transfection or by retrovirus-mediated transduction. The resultant transformed ES cells can thereafter be combined with blastocysts from a non-human animal. The introduced ES cells thereafter colonize the embryo and contribute to the germ line of the resulting chimeric animal.
One approach to the problem of determining the contributions of individual genes and their expression products is to use isolated autism-associated CNV/SNP genes as insertional cassettes to selectively inactivate a wild-type gene in totipotent ES cells (such as those described above) and then generate transgenic mice. The use of gene-targeted ES cells in the generation of gene-targeted transgenic mice was described, and is reviewed elsewhere (Frohman et al., (1989) Cell 56:145-147; Bradley et al., (1992) Bio/Technology 10:534-539).
Techniques are available to inactivate or alter any genetic region to a mutation desired by using targeted homologous recombination to insert specific changes into chromosomal alleles. However, in comparison with homologous extrachromosomal recombination, which occurs at a frequency approaching 100%, homologous plasmid-chromosome recombination was originally reported to only be detected at frequencies between 10⁻⁶ and 10⁻³. Nonhomologous plasmid-chromosome interactions are more frequent, occurring at levels 10⁵-fold to 10²-fold greater than comparable homologous insertion.
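The practical consequence of these frequencies can be illustrated with a short calculation. The sketch below assumes, for illustration only, that each integrant is independently a homologous recombinant with probability 1/(1 + fold excess of nonhomologous events), and asks how many unselected clones would need to be screened to find at least one homologous recombinant with a given confidence. The function names and the 95% confidence level are hypothetical choices, not values from the work described herein.

```python
import math


def homologous_fraction(nonhomologous_fold_excess):
    """Fraction of integrants that are homologous when nonhomologous events
    are `nonhomologous_fold_excess` times more frequent (illustrative model)."""
    return 1.0 / (1.0 + nonhomologous_fold_excess)


def clones_to_screen(fraction, confidence=0.95):
    """Smallest n such that 1 - (1 - fraction)**n >= confidence,
    assuming clones are independent."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-fraction))


# The two ends of the range stated above: 100-fold and 100,000-fold excess.
for fold in (1e2, 1e5):
    f = homologous_fraction(fold)
    print(f"fold excess {fold:g}: screen about {clones_to_screen(f)} clones")
```

Under this simplified model, the screening burden grows roughly in proportion to the fold excess of nonhomologous events.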
To overcome this low proportion of targeted recombination in murine ES cells, various strategies have been developed to detect or select rare homologous recombinants. One approach for detecting homologous alteration events uses the polymerase chain reaction (PCR) to screen pools of transformant cells for homologous insertion, followed by screening of individual clones. Alternatively, a positive genetic selection approach has been developed in which a marker gene is constructed which will only be active if homologous insertion occurs, allowing these recombinants to be selected directly. One of the most powerful approaches developed for selecting homologous recombinants is the positive-negative selection (PNS) method developed for genes for which no direct selection of the alteration exists. The PNS method is more efficient for targeting genes which are not expressed at high levels because the marker gene has its own promoter. Non-homologous recombinants are selected against by using the Herpes Simplex virus thymidine kinase (HSV-TK) gene and selecting against its nonhomologous insertion with effective herpes drugs such as ganciclovir (GANC) or 1-(2-deoxy-2-fluoro-β-D-arabinofuranosyl)-5-iodouracil (FIAU). By this counter selection, the number of homologous recombinants in the surviving transformants can be increased. Utilizing autism-associated SNP containing nucleic acid as a targeted insertional cassette provides means to detect a successful insertion as visualized, for example, by acquisition of immunoreactivity to an antibody immunologically specific for the polypeptide encoded by autism-associated SNP nucleic acid and, therefore, facilitates screening/selection of ES cells with the desired genotype.
As used herein, a knock-in animal is one in which the endogenous murine gene, for example, has been replaced with human autism-associated CNV/SNP containing gene of the invention. Such knock-in animals provide an ideal model system for studying the development of autism.
As used herein, the expression of an autism-associated CNV/SNP containing nucleic acid, fragment thereof, or an autism-associated CNV/SNP fusion protein can be targeted in a “tissue specific manner” or “cell type specific manner” using a vector in which nucleic acid sequences encoding all or a portion of autism-associated CNV/SNP are operably linked to regulatory sequences (e.g., promoters and/or enhancers) that direct expression of the encoded protein in a particular tissue or cell type. Such regulatory elements may be used to advantage for both in vitro and in vivo applications. Promoters for directing tissue specific proteins are well known in the art and described herein.
The nucleic acid sequence encoding the autism-associated CNV/SNP of the invention may be operably linked to a variety of different promoter sequences for expression in transgenic animals. Such promoters include, but are not limited to, a prion gene promoter such as hamster and mouse Prion promoter (MoPrP), described in U.S. Pat. No. 5,877,399 and in Borchelt et al., Genet. Anal. 13(6) (1996) pages 159-163; a rat neuronal specific enolase promoter, described in U.S. Pat. Nos. 5,612,486, and 5,387,742; a platelet-derived growth factor B gene promoter, described in U.S. Pat. No. 5,811,633; a brain specific dystrophin promoter, described in U.S. Pat. No. 5,849,999; a Thy-1 promoter; a PGK promoter; a CMV promoter; a neuronal-specific platelet-derived growth factor B gene promoter; and a glial fibrillary acidic protein (GFAP) promoter for the expression of transgenes in glial cells.
Methods of use for the transgenic mice of the invention are also provided herein. Transgenic mice into which a nucleic acid containing the autism-associated CNV/SNP or its encoded protein have been introduced are useful, for example, to develop screening methods to screen therapeutic agents to identify those capable of modulating the development of autism.
The elucidation of the role played by the autism associated CNVs/SNPs described herein in neuronal signaling and brain structure facilitates the development of pharmaceutical compositions useful for treatment and diagnosis of autism. These compositions may comprise, in addition to one of the above substances, a pharmaceutically acceptable excipient, carrier, buffer, stabilizer or other materials well known to those skilled in the art. Such materials should be non-toxic and should not interfere with the efficacy of the active ingredient. The precise nature of the carrier or other material may depend on the route of administration, e.g. oral, intravenous, cutaneous or subcutaneous, nasal, intramuscular, intraperitoneal routes.
Whether it is a polypeptide, antibody, peptide, nucleic acid molecule, small molecule or other pharmaceutically useful compound according to the present invention that is to be given to an individual, administration is preferably in a “prophylactically effective amount” or a “therapeutically effective amount” (as the case may be, although prophylaxis may be considered therapy), this being sufficient to show benefit to the individual.
The following materials and methods are provided to facilitate the practice of the present invention.
Study design & Quality Control
PennCNV was used to define CNVs across all genotyped samples. To control for potential chip-to-chip bias from the mixed SNP content introduced by genotyping across multiple chip types, only CNV calls from the 550K joint SNPs across the 550K, 610K, 660K, and 1M Illumina chips were considered. Low quality samples were excluded on a per sample basis if:
1. Number of CNV calls > 100
2. SD LRR (standard deviation of the log R ratio) > 0.3
3. |GCWF| (absolute value of the GC wave factor) > 0.02
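The three exclusion criteria can be expressed as a simple per-sample filter. The following is a minimal sketch only; the record layout, field names, and example values are hypothetical, and only the three thresholds are taken from the criteria listed above.

```python
from dataclasses import dataclass

# Thresholds from the per-sample exclusion criteria stated above.
MAX_CNV_CALLS = 100
MAX_SD_LRR = 0.3
MAX_ABS_GCWF = 0.02


@dataclass
class SampleQC:
    sample_id: str   # hypothetical identifier
    n_cnvs: int      # number of CNV calls for the sample
    sd_lrr: float    # standard deviation of the log R ratio
    gcwf: float      # GC wave factor (may be negative)


def passes_qc(sample):
    """A sample is excluded if ANY criterion is exceeded."""
    excluded = (
        sample.n_cnvs > MAX_CNV_CALLS
        or sample.sd_lrr > MAX_SD_LRR
        or abs(sample.gcwf) > MAX_ABS_GCWF
    )
    return not excluded


# Hypothetical samples illustrating one pass and one failure per criterion.
samples = [
    SampleQC("S1", 42, 0.18, 0.004),
    SampleQC("S2", 150, 0.18, 0.004),
    SampleQC("S3", 42, 0.35, 0.004),
    SampleQC("S4", 42, 0.18, -0.030),
]
kept = [s.sample_id for s in samples if passes_qc(s)]
print(kept)
```

Because the criteria are strict inequalities, a sample lying exactly on a threshold is retained in this sketch.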
For each stage of analysis, the genome was segmented into CNV regions (CNVRs) that define unambiguous sets of cases and controls impacted by CNVs which facilitates the immediate identification of “core” CNV genomic regions. These CNVRs were tested for association by Fisher's exact test in a two-stage design with an alpha of P<=0.01 after correcting for multiple tests.
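For illustration, the per-CNVR association test can be sketched as a two-sided Fisher's exact test on a 2×2 table of cases and controls with and without a CNV in the region. The implementation below uses only exact hypergeometric probabilities from the standard library; the function name and example counts are hypothetical, and the multiple-testing correction is omitted because its method is not specified above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table

                    CNV present   CNV absent
        cases            a             b
        controls         c             d

    Returns the sum of probabilities of all tables (with the same margins)
    that are no more likely than the observed table.
    """
    cases = a + b
    controls = c + d
    carriers = a + c
    total = cases + controls
    denom = comb(total, carriers)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(cases, x) * comb(controls, carriers - x) / denom

    lo = max(0, carriers - controls)
    hi = min(cases, carriers)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: 3 of 4 cases and 1 of 4 controls carry the CNV.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

In a two-stage design, a test of this kind would be applied to each CNVR in the discovery cohort and again in the replication cohort.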
Ingenuity pathway analysis was used to look for enrichment in networks and canonical pathways among genes with exons disrupted by replicated CNVRs. Fisher's exact test was used to gauge enrichment of the first order interactome of GABAR family of genes, as well as a test of 1000 random permutations of case/control labels.
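The label-permutation test can be sketched as follows. This is an illustrative outline only: the test statistic (the number of cases carrying a defect in the gene set), the function name, and the toy data are assumptions, and only the use of 1000 random permutations of case/control labels is taken from the description above.

```python
import random


def permutation_p_value(is_case, has_defect, n_perm=1000, seed=0):
    """Empirical one-sided p-value for enrichment of defects among cases.

    is_case:    list of booleans, True for a case, False for a control
    has_defect: list of booleans, True if the individual carries a defect
                in the gene set of interest
    The statistic is the count of cases with a defect (an assumption).
    """
    rng = random.Random(seed)
    observed = sum(1 for c, h in zip(is_case, has_defect) if c and h)
    labels = list(is_case)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute case/control labels, keep defects fixed
        stat = sum(1 for c, h in zip(labels, has_defect) if c and h)
        if stat >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Toy data: every defect carrier is a case, so enrichment is extreme.
is_case = [True] * 20 + [False] * 20
has_defect = [True] * 20 + [False] * 20
print(permutation_p_value(is_case, has_defect))
```

With 1000 permutations, the smallest p-value this estimator can report is 1/1001, so it complements rather than replaces the exact test.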
The following examples are provided to illustrate certain embodiments of the invention. They are not intended to limit the invention in any way.
The ability to quantify an individual's genomic risk for disease can facilitate the development of new interventions and improve medical practice. Many rare Copy Number Variants (CNVs) that harbor small genomic deletions and insertions have been described in the autism spectrum disorders (ASD). To identify these likely functional elements, we combined various large cohorts of autistic patients with a large number of neurologically normal controls to analyze over 3K affected cases and 7K controls. In a two-stage genome-wide association design, we uncovered 266 genome-wide statistically significant (combined P<=2.76×10−8) distinct CNV regions (CNVRs).
The 38 genes with exons disrupted by these robust CNVRs are most enriched in gene networks impacting neurological disease, behavior and developmental disorder. GABAR-A receptor signaling was found to be the most significant canonical pathway disrupted in ASD because of case-enriched defects in the GABRA5, GABRB3, and GABRG3 genes. Moreover, network analysis of the first-degree gene interactome of the GABAR-A receptor family suggests that ASD cases are significantly enriched for pathway defects (P<=2.1×10−21, OR=9.9) when compared with neurologically normal controls.
Taken together, the CNVRs we have identified impact multiple novel genes and signaling pathways, including genes involved in GABAR-A signaling, that may be important for new therapeutic development.
In all, 3871 unrelated cases were compared to 7768 controls. Samples were sourced from five independent sites, and were distributed as follows:
In all, 3225 cases and 7300 controls passed quality control and were used for CNV analysis. These individuals were segregated into a discovery (stage 1) and replication (stage 2) cohort based on the default quality calls of PennCNV. In this two-stage design, 2076 cases vs 4754 controls were used in the discovery cohort, and 1159 cases vs 2546 controls were used for a replication cohort. See
In the discovery stage, 353 significant CNVRs (nominal P<=1.8×10−8) were identified after Bonferroni correction for 550K SNPs used for analysis, and 266 significant CNVRs replicated (nominal P<=2.9×10−5) after correcting for 353 significant discovery regions tested. The most significantly associated CNVRs highlight some attractive and novel candidate genes for ASD.
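For illustration, the Bonferroni-corrected discovery threshold follows from dividing the family-wise alpha by the number of tests. The sketch below assumes the alpha of 0.01 stated in the methods and the 550K SNPs used for analysis; the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Discovery stage: alpha of 0.01 corrected for 550K SNPs, about 1.8e-8.
discovery_threshold = bonferroni_threshold(0.01, 550_000)
```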
Most interesting are the 25 duplications unique to cases in GABRB3 (GABA-A receptor, beta 3) (P<=1.42×10−13, OR=inf). GABRB3 is an attractive candidate gene because GABA is the main inhibitory neurotransmitter and because the gene lies within the Prader-Willi/Angelman syndrome critical region (15q11-13), mutations of which have been described in several individuals with autism. Moreover, this association was significant in both European and African populations (P<=6.44×10−5 and 1.82×10−5, respectively). An association between a GABRB3 polymorphism and autism has been reported previously (Buxbaum et al., 2002), as have associations with GABRA4 and GABRB1 (Collins et al., 2006). Gabrb3 gene deficient mice exhibit impaired social and exploratory behaviors, deficits in non-selective attention and hypoplasia of cerebellar vermal lobules, making them a potential model of autism spectrum disorder (DeLorey, Sahbaie, Hashemi, Homanics, & Clark, 2008).
We found 38 genes with exons disrupted by robust CNVRs: ATP10A, GABRA5, GABRB3, GABRG3, GGTLC2, HBII-52-45, HBII-52-46, IPW, LOC648691, LOC96610, MAGEL2, MIR650, MKRN3, NCRNA00221, NDN, OCA2, OR4S2, PAR-SN, PAR1, PARS, POM121L1P, PRAME, SNORD107, SNORD108, SNORD109A, SNORD109B, SNORD115-11, SNORD115-29, SNORD115-36, SNORD115-43, SNORD115-44, SNORD115-48, SNORD64, SNRPN, SNURF, UBE3A, ZNF280A, ZNF280B. These genes are most enriched in gene networks impacting neurological disease, behavior and developmental disorder, and GABAR-A receptor signaling was found to be the most significant canonical pathway disrupted in ASD, associated with case-enriched defects in the GABRA5, GABRB3, and GABRG3 genes. See
Finally, we defined the first-degree interactome of the GABAR-A family, and found that ASD cases are significantly enriched for pathway defects when compared with neurologically normal controls. About 3% of cases harbor genetic pathway defects vs 0.03% of controls (P<=2.1×10−21, OR=9.9), and 17 out of 121 genes are enriched in cases (14%) vs 9 out of 121 genes in controls (7%). The network showing genes enriched in cases (red) vs controls (blue) is as shown in
indicates data missing or illegible when filed
The information herein above can be applied clinically to patients for diagnosing an increased susceptibility for developing autism or autism spectrum disorder and therapeutic intervention. A preferred embodiment of the invention comprises clinical application of the information described herein to a patient. Diagnostic compositions, including microarrays, and methods can be designed to identify the genetic alterations described herein in nucleic acids from a patient to assess susceptibility for developing autism or ASD. This can occur after a patient arrives in the clinic; the patient has blood drawn, and using the diagnostic methods described herein, a clinician can detect a CNV as described in Example I and set forth in Table II. The information obtained from the patient sample, which can optionally be amplified prior to assessment, will be used to diagnose a patient with an increased or decreased susceptibility for developing autism or ASD. Kits for performing the diagnostic method of the invention are also provided herein. Such kits comprise a microarray comprising nucleic acids containing at least one of the CNV/SNPs provided herein and the necessary reagents for assessing the patient samples as described above.
The identity of autism/ASD involved genes and the patient results will indicate which variants are present, and will identify those that possess an altered risk for developing ASD. The information provided herein allows for therapeutic intervention at earlier times in disease progression than previously possible. Also as described herein above, the CNV containing genes described herein provide novel targets for the development of new therapeutic agents efficacious for the treatment of this neurological disease.
The information provided herein can also be employed in a test and treat approach. Although relatively common, the ASDs are still relatively underdiagnosed, especially in rural areas where community pediatricians may not be as knowledgeable about the deluge of research that continues to define these disorders and their treatment. By and large, the mainstay of treatment for the ASDs remains behavioral therapy. There are many different types of behavioral therapies that specifically cater to the diverse phenotypic manifestations of the ASDs, and in all cases earlier intervention is correlated with better results. Therefore, early diagnosis of particular ASD subtypes is crucial to early behavioral intervention (and psychopharmacological management in extreme cases) and to maximizing a child's potential and quality of life.
From a pharmacogenomics perspective using breast cancer as an analogy, trastuzumab, a monoclonal antibody that targets HER2 expressing cancer cells, has revolutionized the treatment for breast cancer, and it is perhaps the best-known example of a successful personalized therapeutic (
Recently, next generation sequencing (NGS) was employed to study the genetic etiology of the ASDs in sporadic families by analyzing the sequenced exomes of 20 parent-child trios (O'Roak et al. 2011). These studies revealed four attractive candidate genes (FOXP1, GRIN2B, SCN1A and LAMC3) involved in neurotransmission which harbored functional de novo mutations in sporadic families with ASDs. The notion that as few as 60 exomes could facilitate the identification of four rare plausible functional mutations underlying the ASDs suggests that as NGS of larger numbers of samples becomes more commonplace, the genomic landscape of rare mutations underlying ASDs will expand considerably to the benefit of clinicians and patients alike. Just as a better understanding of the molecular genetics of cancer cells has revolutionized the treatment of breast and other cancers, improved resolution to identify rare mutations underlying ASDs will facilitate the development of molecular tests with better diagnostic yields that will be able to aid clinicians in diagnosing the ASDs and their particular genetic sub-types.
With better molecular diagnostics that dissect the sporadic genetic mutations underlying the ASDs, the personalized approach to treating patients' specific molecular defects becomes a reality. This type of cutting-edge pharmacogenomics approach to the ASDs will facilitate the development of a test and treat model for drugs that target genetically defined responder populations just as trastuzumab does for HER2+ breast cancer patients (
Having already implicated the GABAR-A pathway by CNV analysis as described above in Example I, we have gone on to identify a host of drug candidates that act on this signalling pathway to potentially rescue underlying neurogenetic defects in patients with the ASDs (Tables 3 and 4). Topiramate is one such candidate that acts as an agonist at the GABAR-A pathway. Just as we did for the GABAR-A pathway itself in Example I, we defined the first-degree interactome of topiramate itself, and found that ASD cases are significantly enriched for pathway defects when compared with neurologically normal controls. About 20.7% of cases harbor genetic pathway defects vs 8.3% of controls, a statistically significant difference (P<=1.5×10−44, OR=2.9) that supports our hypothesis that topiramate itself may be effective in treating patients with ASDs that harbor genetic defects in the GABAR-A pathway (
The rational approach to personalized drug design as described herein should both restore normal neurophysiology in patients with ASDs by rescuing specific disrupted genetic pathways and avoid exposing them to drugs that will precipitate adverse side effects. Given the immense clinical and genetic heterogeneity of the ASDs, early tailored psychopharmacogenomic intervention as we have outlined here in combination with comprehensive behavioral programs should improve the prognosis and the outlook for patients that suffer from these burdensome diseases.
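For illustration, the odds ratio reported above for the topiramate interactome comparison can be recovered from the two carrier proportions alone (20.7% of cases vs 8.3% of controls). The function name below is hypothetical, and the calculation assumes the simple unadjusted odds ratio.

```python
def odds_ratio_from_proportions(p_cases: float, p_controls: float) -> float:
    """Unadjusted odds ratio from the carrier fractions in cases and controls."""
    if not (0 < p_cases < 1 and 0 < p_controls < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls


# Carrier fractions reported for the topiramate interactome.
topiramate_or = odds_ratio_from_proportions(0.207, 0.083)
```

Rounded to one decimal place, this agrees with the OR of 2.9 stated in the text.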
Despite being highly heritable, the vast majority of family studies suggest that the ASDs do not segregate as a simple Mendelian disorder, but rather display clinical and genetic heterogeneity consistent with a complex trait [13]. Indeed, recent studies estimate that the ASDs may comprise up to 400 distinct genetic and genomic disorders that phenotypically converge [14,15]. Common variants such as single-nucleotide polymorphisms seem to contribute to ASD susceptibility, but, taken individually, their effects appear to be small [16]. However, there is increasing evidence that the ASDs can arise from rare or “private” highly penetrant mutations that segregate in families but are less generalizable to the general population [17-19]. Many genes implicated thus far, which are involved in chromatin remodeling, metabolism, mRNA translation, and synaptic function, seem to converge in common pathways or genetic networks affecting neuronal and synaptic homeostasis [16].
Such remarkable phenotypic and genotypic heterogeneity when coupled with the private nature of mutations in the ASDs has hindered identification of new genetic risk factors with therapeutic potential. However, it is noteworthy that many of the rare gene defects implicated in the ASDs belong to gene families. For instance, rare defects impacting multiple members of both the post-synaptic neuroligin (NLGN) gene family [20] as well as their pre-synaptic neurexin (NRXN) molecular interacting partners [21,22] have long been reported in patients with ASDs. Additionally, a number of other defective gene families with important functional roles have subsequently been well-characterized including ubiquitin (UBEA) conjugation [23], gamma-aminobutyric acid (GABA) receptor signaling [24-27] and cadherin/protocadherin (CDH) cell junction proteins [28] in the brain. Furthermore, multiple defects in voltage gated calcium channels (CACNA) have been found in schizophrenia [29], and a defective network of metabotropic glutamate (GRM) receptor signaling was found in both ADHD [30] and schizophrenia [31-36], two neuropsychiatric disorders that are highly coincident with the ASDs. Also, the vast majority of significant defective genes identified from recent whole exome sequences belong to gene families [17-19].
Many studies have found defective genetic networks in the ASDs [21,23,37-40] (see [16] for review), and we complement these in this work by uncovering new networks and implicating specific defective gene families that may be enriched for novel potential therapeutic targets. Drug binding sites on proteins usually exist out of functional necessity [33], and gene families derive from gene duplication events that present additional binding sites for a given drug to exert its effects. Most successful drugs achieve their activity by competing for a binding site on a protein with an endogenous small molecule [41]; therefore, many successful pharmacologic gene targets are within large gene families. Indeed, nearly half of the pharmacologic gene targets fall into just six gene families: G-protein-coupled receptors (GPCRs), serine/threonine and tyrosine protein kinases, zinc metallopeptidases, serine proteases, nuclear hormone receptors and phosphodiesterases [41]. Moreover, many large gene families are localized to pre- and post-synaptic neuronal terminals to coordinate the highly complex and evolutionarily conserved process of neurotransmission [42], which is thought to be compromised to varying degrees in the autistic brain [43]. Therefore we hypothesize that we may select more efficacious drug targets for the ASDs by enriching for defective interaction networks defined by gene families.
The following materials and methods are provided to facilitate the practice of Example III.
The research presented here has been approved by the Children's Hospital of Philadelphia IRB (CHOP IRB#: IRB 06-004886). Some patients and their families were recruited through CHOP outreach clinics. Written informed consent was obtained from the participants or their parents using IRB approved consent forms prior to enrollment in the project. There was no discrimination against individuals or families who chose not to participate in the study. All data were analyzed anonymously and all clinical investigations were conducted according to the principles expressed in the Declaration of Helsinki.
Samples were selected from DNA collected as part of the Center for Applied Genomics (CAG) biorepository, from samples that originated at the Children's Hospital of Philadelphia. All children had a community diagnosis of ASD (n=539). This cohort included any children with ASD, and was unfiltered for presence of a comorbid genetic syndrome.
A physician blinded to mGluR status conducted chart review for all patients with mGluR network CNV's (n=62), 100 patients without mGluR CNV's, and all patients in the 22q sample (n=78). Patients were excluded if there was insufficient documentation of a community diagnosis of ASD. The validity of this diagnosis was not assessed as part of the present study. Patients were also excluded if they did not have at least one comprehensive history and physical documented by a physician. Comorbid medical conditions, clinical genetic testing and imaging data were reviewed. Children were categorized as having “Syndromic ASD” if they had ASD plus a structural defect or medical condition that occurs in less than 1% of the general population, and/or diagnosis of a genetic syndrome based on clinical testing. Genetic abnormalities predicted to be benign or ‘variant’ findings on clinical array were not categorized as “syndromic ASD” unless they met criteria based on the aforementioned clinical abnormalities. Two patients without documented structural abnormalities (but significant developmental delay) who had not received clinical genetic testing were categorized as “Syndromic ASD” based on the presence of large deletions on research arrays.
The majority of cases (5,049 of 6,742) and all controls (12,544) were genotyped with genome wide coverage using the Infinium II platform across various iterations of the HumanHap BeadChip with 550K, 610K, 660K, and 1M markers by the Center for Applied Genomics at The Children's Hospital of Philadelphia (CHOP). There were 1,693 cases genotyped by the AGP consortium. All cases and approximately 50% of controls were re-used from previously published large ASD studies [21,23,28,44]. All cases were diagnosed by ADI-R/ADOS and fulfilled standard criteria for autism spectrum disorders. Duplicate samples were removed by selecting unique samples with the best quality (based on genotyping statistics used to QC samples) from clusters defined by single linkage clustering of all pairs of samples with high pairwise identity by state measures (IBS>=0.9) across 140K non-correlated SNPs. Ethnicity of samples was inferred by a supervised k-means classification (k=3) of the first 10 eigenvectors estimated by principal component analysis across the same subset of 140K non-correlated SNPs. We used HapMap 3 [45] and the Human Genome Diversity Panel [46] samples with known continental ancestry to train the k-means classifier implemented by the R Language for Statistical Computing [65].
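The duplicate-removal step described above can be sketched as follows. Single linkage clustering over pairs with IBS>=0.9 is equivalent to taking connected components of the graph whose edges are those pairs, which is implemented here with a union-find structure. The function name, the input layout, and the use of a single numeric quality score per sample (higher is better) are assumptions for illustration only; the text does not specify which genotyping statistic was used to rank samples.

```python
def remove_duplicates(sample_quality, ibs_pairs, ibs_threshold=0.9):
    """Keep the best-quality sample from each single-linkage IBS cluster.

    sample_quality: dict mapping sample id -> quality score (assumed: higher is better)
    ibs_pairs: iterable of (sample_a, sample_b, ibs) tuples
    """
    parent = {s: s for s in sample_quality}

    def find(x):
        # Locate the cluster representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Any pair at or above the threshold links its two clusters together.
    for a, b, ibs in ibs_pairs:
        if ibs >= ibs_threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    # Within each cluster, retain the sample with the highest quality score.
    best = {}
    for sample, quality in sample_quality.items():
        root = find(sample)
        if root not in best or quality > sample_quality[best[root]]:
            best[root] = sample
    return set(best.values())
```

Because linkage is transitive, samples A and C end up in the same cluster when A matches B and B matches C, even if A and C were never directly compared.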
We called CNVs with the PennCNV algorithm [66], which combines multiple values, including genotyping fluorescence intensity (Log R Ratio), population frequency of SNP minor alleles (B Allele Frequency), and SNP spacing into a hidden Markov model. The term ‘CNV’ represents individual CNV calls, whereas ‘CNVR’ refers to population-level variation shared across subjects. Quality control thresholds for sample inclusion in CNV analysis included a high call rate (call rate>=95%) across SNPs, low standard deviation of normalized intensity (SD<=0.3), low absolute genomic wave artifacts (|GCWF|<=0.02), and low numbers of CNVs called (#CNVs<=100). Genome wide differences in CNV burden, defined as the average span of CNVs, between cases and controls and estimates of significance were computed using PLINK [67]. CNVRs were defined based on the genomic boundaries of individual CNVs, and the significance of the difference in CNVR frequency between cases and controls was evaluated at each CNVR using Fisher's exact test.
We extended our previous work from Example I to rank all gene family interaction networks (GFINs) by a permutation test. Specifically, we defined a GFIN as the directed second-degree gene interaction network defined by a family of genes. We found 2,611 gene families with at least two members based on official HUGO [48] gene nomenclature, and generated 1,732 GFINs using merged human interactome data from three different yeast two hybrid generated datasets [49-51] accessed through the Human Interactome Database [68]. We calculated cumulative network enrichment using a method previously described [30] for 1,557 GFINs with defined CNVs.
For each GFIN, we quantified its enrichment by a permutation test of 1000 second-degree gene interaction networks derived from a random set of N genes, where N is the number of members of a given gene family. Because the CNVs we are focused on are so rare, we are underpowered to achieve significance by permutation testing after correcting for multiple GFIN tests. However, we report all GFINs that are nominally significant.
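A minimal sketch of such a network permutation test is given below. It is not the implementation used in the study: the enrichment statistic (a ratio of case to control carrier rates with a 0.5 pseudo-count), the (k+1)/(n+1) form of the permutation P value, the input layout, and all function names are assumptions made only for illustration. The interactome is assumed to be a dict mapping each gene to the genes it interacts with, and random seed genes are drawn from the keys of that dict.

```python
import random


def second_degree_network(seeds, interactions):
    """Seed genes plus every gene reachable within two directed interaction steps."""
    first = set()
    for gene in seeds:
        first.update(interactions.get(gene, ()))
    second = set()
    for gene in first:
        second.update(interactions.get(gene, ()))
    return set(seeds) | first | second


def case_enrichment(genes, case_hits, control_hits, n_cases, n_controls):
    """Assumed statistic: case carrier rate over control carrier rate."""
    cases = sum(case_hits.get(g, 0) for g in genes)
    controls = sum(control_hits.get(g, 0) for g in genes)
    return ((cases + 0.5) / n_cases) / ((controls + 0.5) / n_controls)


def permutation_p(family, interactions, case_hits, control_hits,
                  n_cases, n_controls, n_perm=1000, seed=0):
    """Compare a GFIN's enrichment with networks grown from random gene sets
    of the same size N as the gene family."""
    rng = random.Random(seed)
    universe = sorted(interactions)
    observed = case_enrichment(second_degree_network(family, interactions),
                               case_hits, control_hits, n_cases, n_controls)
    at_least = 0
    for _ in range(n_perm):
        random_seeds = rng.sample(universe, len(family))
        score = case_enrichment(second_degree_network(random_seeds, interactions),
                                case_hits, control_hits, n_cases, n_controls)
        if score >= observed:
            at_least += 1
    # Add-one convention (an assumption) so the estimate is never exactly zero.
    return (at_least + 1) / (n_perm + 1)
```

Drawing random seed sets of the same size N as the family is what controls for network size: a large family is compared only against equally large random seed sets.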
Samples with SNP arrays of poor quality were excluded from CNV calling, since typically the proportion of false positives increases considerably for these samples. Only those samples with a genotyping call rate>98%, a standard deviation of LRR (LRR_sd)<0.35, a GC-wave factor (GCWF) between −0.2 and 0.2, and a total number of CNV calls <100 were included. CNVs were visually validated based on ParseCNV criteria (Glessner et al Plos One, 2013).
For syndromic ASD regions, genomic coordinates were those described by Betancur (2011 Brain Res). The GRM/mGluR network generated by Cytoscape from the Human Interactome database was described by Elia et al. (2012 Nat Genet) using UCSC Genome Browser definitions for gene coordinates. CNV calls were analyzed for overlap to known syndromic regions and GRM network genes. All syndromic aberrations detected by clinical cytogenetic laboratory testing were confirmed on corresponding SNP arrays.
Group comparisons were made using Fisher's Exact Test and Chi Square sample distribution as previously described.
Significant CNVRs that we identified were validated using commercially available qPCR Taqman probes run on the ABI GeneAmp 9700 system from Life Technology. Data File 1 lists 251 reactions that we tested using 121 different genomic probes across 85 different samples for which DNA was available. For deletions, our sensitivity=0.65, specificity=1.00, NPV=1.00, and PPV=0.88. For duplications, our sensitivity=0.68, specificity=0.99, NPV=0.94, and PPV=0.91.
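For reference, the four validation statistics quoted above are all derived from the counts of array CNV calls that were or were not confirmed by qPCR. The sketch below shows the standard definitions; the function name and the counts used in the usage example are hypothetical and are not taken from Data File 1.

```python
def validation_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard diagnostic accuracy measures from confusion-matrix counts.

    tp: CNV called on array and confirmed by qPCR
    fp: CNV called on array but not confirmed by qPCR
    tn: no CNV called and none found by qPCR
    fn: no CNV called but one found by qPCR
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, for illustration only.
example = validation_metrics(tp=15, fp=2, tn=80, fn=8)
```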
In the present example we describe the results from a large genome-wide association study (GWAS) of structural variants that disrupt gene family protein interaction networks in patients with autism. We find multiple defective networks in the ASDs, most notably rare copy number variants (CNVs) in the metabotropic glutamate receptor (mGluR) signaling pathway in up to 6% of patients with the ASDs (as described above in Example I). Defective mGluR signaling was found in both ADHD [30] and schizophrenia [31-36], two common neuropsychiatric disorders that are highly coincident with the ASDs. Furthermore, we find other attractive candidates such as the MAX Dimerization Protein (MXD) network that is implicated in cancer, and a Calmodulin 1 (CALM1) gene interaction network that is active in neuronal tissues. The numerous defective gene family interactions we find to underlie autism present many novel translational opportunities for the generation of more effective therapeutic interventions.
To identify and comprehensively characterize defective genetic networks underlying the ASDs, we performed a large-scale genome association study for copy number variations (CNVs) enriched in patients with autism. By combining the affected cases from previously published large ASD studies [21, 23, 28, 44] with more recently recruited cases from the Children's Hospital of Philadelphia, we executed one of the largest searches for rare pathogenic CNVs in ASDs to date. In sum, 6,742 genotyped samples from patients with the ASDs were compared to those from 12,544 neurologically normal controls recruited at The Children's Hospital of Philadelphia (CHOP). These cases were each screened by neurodevelopmental specialists to exclude patients with known syndromic causes for autism. Genotyping was performed at CHOP for the vast majority of the ASD cases as well as all the controls. After cleaning the data to remove sample duplicates and performing standard QC for CNVs, we first inferred the continental ancestry of 5,627 affected cases and 9,644 disease free controls using a training set defined by populations from HapMap 3 [45] and the Human Genome Diversity Panel [46] (Table 5). Using these QC criteria, we estimated that the sensitivity and specificity of calling CNVs are approximately 70% and 100%, respectively, across 121 different genomic regions assayed by PCR (see methods). Across all ethnicities, there was an increased burden of CNVs in cases vs controls, a statistically significant difference (P<=0.001) in the larger European (63.3 vs 54.5 Kb respectively) and African (70.4 vs 48.0 Kb respectively) derived populations.
We then searched for pan-ethnic CNV regions (CNVRs) discovered in the European-derived dataset (4,602 cases vs 4,722 controls; P<=0.0001) and replicated in an independent ASD dataset of African ancestry (312 cases vs 4,169 controls; P<=0.001) with subsequent measurement of overall significance across the entire multi-ethnic discovery cohort (5,627 cases vs 9,644 controls) for maximal power (
We examined the genetic interaction networks derived from gene families with members localized to the Prader-Willi/Angelman syndrome (15q11-13) critical region, the DiGeorge syndrome (22q11) critical region, and the novel PARP8 (5q11) region using a method previously applied to ADHD [30]; however, hardly any of the most significant genes harboring significant CNVRs clustered within gene families. Consequently, we broadened our search for gene family interaction networks (GFINs) and searched the entire genome for GFINs with CNVs enriched in autism. For every gene family, we defined a GFIN as the genetic interaction network spawned by its multiple duplicated members. We used standard HUGO [48] gene names to define 1,732 GFINs across which we searched for enrichment of network defects associated with the ASDs. However, because there is an a priori excess of CNV burden in ASD cases over disease free controls (Table 5), larger GFINs are expected to display significant enrichment of case defects by virtue solely of their increased size and complexity. Therefore, for each GFIN, we used a network permutation test of case enrichment across 1,000 random sets of networked genes to control for the GFIN size and complexity. With this approach, we robustly identified network defects associated with the ASDs by minimizing statistical artifact derived from any a priori excessive CNV burden in cases over controls, as well as other unknown biases that may be inherent in the human interactome data [49-51] that we mined.
Out of 1,732 GFINs, we used the network permutation test to rank 1,557 GFINs with defined CNVs for enrichment of genetic defects in the ASDs. Among the top GFINs (Table 7) was the metabotropic glutamate receptor (mGluR) pathway defined by the GRM family of genes that impacts glutamatergic neurotransmission. The GRM family contains eight members, all of which were defined in the human interactome to cumulatively spawn a GFIN of 279 genes (
Many large studies of CNVs implicate genes within the glutamatergic signaling pathway in the etiology of the ASDs [21,23,37-40], and SNPs [52,53] and CNV duplications [54] of GRM8 have previously been reported in association with the ASDs in humans. Moreover, a recent functional study demonstrated that in mouse models of tuberous sclerosis and fragile X, two different forms of syndromic autism, the autistic phenotype was ameliorated by modulation of GRM5 in opposite directions for each syndrome, which suggests that GRM5 functional activity is central in defining the axis of synaptopathophysiology in syndromic autism [55]. Our GRM network findings indicate that rare defects in mGluR signaling also contribute to the ASDs outside of fragile X and tuberous sclerosis, and we posit that functional mGluR synaptopathophysiology may be initiated from many dozens if not hundreds of defective genes within the mGluR pathway that may account for as much as 6% of the endophenotypes of the ASDs (Table 7). Additionally, we recently demonstrated the importance of mGluRs in ADHD [30,56], a highly coincident neuropsychiatric disorder within the autism spectrum. However, in contrast to ADHD, where defects within the mGluR receptors themselves (GRMs) were among the most significant copy number defects contributing to the overall network significance, we found that in the ASDs defects of component GRMs contributed only modestly to the overall significance of the mGluR pathway. Nonetheless, the defects within GRM1, GRM3, GRM5, GRM7, and GRM8 that we identified as unique to cases and thus enriched are the same GRMs we identified as being pathogenic in ADHD and may impact glutamatergic signaling.
Among the most highly ranked GFINs by permutation testing, the MAX dimerization protein (MXD) GFIN (PFisher<=3.83×10−23, Enrichment=2.53, Pperm<=0.042) was the most enriched. The MXD family of genes encodes proteins that interact with the MYC/MAX network of basic helix-loop-helix leucine zipper (bHLHZ) transcription factors that regulate cell proliferation, differentiation, and apoptosis [MIM 600021] [57]; MXD genes are important candidate tumor suppressor genes as the MXD-MYC-MAX network is dysregulated in various types of cancer [58]. Interestingly, an epidemiological link between autism and specific types of cancer has been reported [59], and anti-cancer therapeutics were recently shown to modulate ASD phenotypes in the mouse through regulation of synaptic NLGN protein levels [60]. Within the component genes contributing to the MXD GFIN significance, duplications in PARP10 (P<=4.06×10−11, OR=2.04) and UBE3A (P<=1.50×10−6, OR=inf) are the most significantly enriched (data not shown). It is notable that we found PARP8 to be significant across ethnicities as described earlier (Table 6), and we previously described the importance of structural defects in UBE3A in the ASDs [23].
Other notable significant GFINs uncovered were the POU class 5 homeobox (POU5F) GFIN (PFisher<=2.96×10−17, Enrichment=2.3, Pperm<=0.008), and the SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily c (SMARCC) GFIN (PFisher<=1.22×10−9, Enrichment=1.9, Pperm<=0.035). The POU5F family of genes encodes transcription factors containing a POU homeodomain; their role has been demonstrated in embryonic development, especially during early embryogenesis, and they are necessary for embryonic stem cell pluripotency. Component genes of the SMARCC gene family are members of the SWI/SNF family of proteins, whose members display helicase and ATPase activities and which are thought to regulate transcription of certain genes by altering the chromatin structure around those genes. Most interestingly, the KIAA family of genes ranked among the top GFINs (PFisher<=3.12×10−23, Enrichment=1.6, Pperm<=0.040). KIAA genes have been identified in the Kazusa cDNA sequencing project [61], and are predicted from novel large human cDNAs; however, they have no known function.
We also hypothesized that some component members of gene families may contribute disproportionately to the significance of a GFIN because they are highly connected to interacting gene partners that are enriched for CNV defects in ASD. Therefore we decomposed the 1,732 gene families into their 15,352 component duplicated genes, of which 1,218 had defined networks with data to test for significance by genome-wide network permutation. The calmodulin 1 (CALM1) gene interaction network ranked highest by network permutation testing of case enrichment for CNV defects across 1,000 random gene networks (Table 8,
Among other highly ranked first degree gene interaction networks were the nuclear receptor co-repressor 1 (NCOR1; PFisher<=1.11×10−6, Enrichment=13.37, Pperm<=0.004) and BCL2-associated athanogene 1 (BAG1; PFisher<=2.18×10−4, Enrichment=15.40, Pperm<=0.014) networks. NCOR1 is a transcriptional co-regulatory protein that appears to assist nuclear receptors in the down regulation of DNA expression through recruitment of histone deacetylases to DNA promoter regions; it is a principal regulator in neural stem cells [51]. The oncogene BCL2 is a membrane protein that blocks the apoptosis pathway, and BAG1 forms a BCL2-associated athanogene and represents a link between growth factor receptors and anti-apoptotic mechanisms. The BAG1 gene has been implicated in age related neurodegenerative diseases, including Alzheimer's disease [62,63].
In summary, the private nature of mutations in the ASDs, and the cumulative contributions of rare highly penetrant genetic defects boost our power to discover and prioritize significant pathway defects. As a result, our comprehensive, unbiased analytical approach has identified a diverse set of specific defective biological pathways that contribute to the underlying etiology of the ASDs. Among GFINs robustly enriched for structural defects, the most enriched was that of the MXD family of genes that has been implicated in cancer pathogenesis [58] thereby providing concrete genetic defects to explore the reported coincidence of specific cancers with the ASDs [59]. The most highly ranked component duplicated gene interaction network involves defects in CALM1 and its multiple interacting partners that are important in regulating voltage independent calcium-activated action potentials at the neuronal synapse. Moreover, we found significant enrichment for defects within the GFIN for GRM that defines the mGluR pathway that has previously been shown to be defective in other neuropsychiatric diseases [29,30]. While specific mGluR gene family members have been shown to underlie syndromic ASDs [55], our findings suggest that rare defects in mGluR signaling also contribute to idiopathic autism across the entire GFIN for GRM genes.
Consequently, in addition to specific neuronal pathways that are expected to be defective in the ASDs, such as those defined by the GRM and CALM duplicated genes, we implicate completely novel biological pathways such as the MXD pathway, which has been implicated in cancers, specific forms of which appear to be associated with the ASDs [59]. Given the unmet need for better treatments for neurodevelopmental diseases [64], the functionally diverse set of defective genetic interaction networks we report presents attractive genetic biomarkers for targeted therapeutic intervention in ASDs and across the neuropsychiatric disease spectrum.
Abnormal signaling mediated through mGluR5 is involved in the pathophysiology of Autism Spectrum Disorder (ASD) in Fragile X Syndrome and Tuberous Sclerosis. However, the role of other mGluR-associated network/signaling genes in syndromic ASD is unknown. To determine whether copy number variants (CNVs) in these genes are enriched in syndromic ASD, microarrays were used to identify mGluR network CNVs in children with ASD. We set out to determine 1) whether the rate of syndromic features varies between children with ASD with and without CNVs in mGluR network genes; and 2) whether “second hits” in mGluR network genes occur more often in children with ASD among children with 22q11.2 Deletion Syndrome (all of whom have haploinsufficiency of RANBP1, an mGluR network gene in the 22q11.2 region).
Individuals in our biorepository with parental report of ASD (n=6,452) were screened for parental consent to access clinical evaluations in the Electronic Health Record at the Children's Hospital of Philadelphia (n=539). Our syndromic comparison cohort included children with 22q11.2 Deletion Syndrome with full access to past medical and neuropsychological evaluations (n=75), including those with diagnosis of ASD (n=25) and those with no concern for ASD (n=50).
Patient categorization (syndromic vs nonsyndromic) was done via blinded medical chart review in all mGluR positive and 100 randomly selected mGluR negative cases.
Our results, explained further hereinbelow, show that 11.5% of children with ASD had mGluR CNVs vs. 3.2% of healthy controls (p<0.001). Syndromic ASD was more prevalent in children with mGluR CNVs (72% vs 16%, p<0.001). A comparison cohort of children with 22q11.2 Deletion Syndrome (n=25 with ASD, n=50 without ASD), all haploinsufficient for the mGluR network gene RANBP1, was evaluated to determine whether “second hits” in mGluR network genes confer additional risk for ASD. Of the children with 22q11.2DS and ASD, 20% had “second hits” in mGluR signaling genes vs. 2% of those with 22q11.2DS without ASD (p<0.014). We propose that altered RANBP1 expression may provide a mechanistic link between ASD in 22q11.2DS, Thalidomide Embryopathy and Fetal Valproate Syndrome, connecting seemingly unrelated genetic and environmental forms of ASD.
The results suggest that CNV's in mGluR network genes, previously implicated in altered neurological development in Fragile X Syndrome and Tuberous Sclerosis, may link many other genetic and environmental forms of Autism Spectrum Disorder.
As discussed in the previous examples, Autism Spectrum Disorder (ASD) occurs in approximately 1/88 individuals and is characterized by impairment in social communication and repetitive interests and activities1. Approximately 20% of cases occur in the context of an identifiable syndrome2. Genetic syndromes with ASD are heterogeneous, including cytogenetically visible chromosomal alterations (e.g. Trisomy 21), microdeletion and microduplication syndromes (e.g. 22q11.2 deletion syndrome [22q11.2DS]; 22q11.2 duplication syndrome [22q11.2DupS]), and monogenic disorders (e.g. Fragile X Syndrome [FXS], Tuberous Sclerosis [TS])3-13. In addition, prenatal exposure to thalidomide, valproic acid, misoprostol or ethanol, and maternal rubella infection have been associated with an elevated risk of ASD14-19.
The mechanism for the development of ASD in most idiopathic and syndromic forms of ASD remains elusive. Recently, signaling through metabotropic glutamate receptor 5 (mGluR5) has been implicated in the development of ASD in FXS and TS20. In FXS, abnormal production of Fragile X Mental Retardation Protein (FMRP) removes normal inhibition of signaling through the mGluR pathway. Tuberous Sclerosis leads to over-inhibition of signaling. Auerbach and colleagues (2011) demonstrated abnormal synaptic learning and atypical behavior in mouse models of FXS and TS, and reversed these effects by breeding the two strains together: mice harboring both mutations had normal mGluR signaling, and learning and behavior that were indistinguishable from those of control mice20. Other studies have demonstrated normalization of learning and behavior in Fragile X mice by administration of an mGluR5 antagonist21,22. In addition to elucidating the mechanism for cognitive and behavioral differences in FXS and TS, these studies suggest a promising avenue for pharmacological treatment. Recent studies have implicated rare CNVs in the etiology of ASD, including deletions impacting genes in the mGluR gene network23, consisting of 276 genes24. To determine whether additional forms of syndromic ASD may share a similar mechanism (through disruption of the mGluR gene network), we analyzed DNA from 539 children with ASD (not filtered for comorbid genetic syndrome) followed at the Children's Hospital of Philadelphia.
The following materials and methods are provided to facilitate the practice of Example IV.
Phenotypic data for patients with ASD as reported on parental health questionnaires from our biorepository (n=6,452) were evaluated to identify patients who received clinical assessment at the Children's Hospital of Philadelphia and agreed to Electronic Health Record chart review. DNA from these cases (n=539) was selected for further phenotypic and genotypic analysis. Children were recruited for inclusion in the general Center for Applied Genomics biorepository when they were having blood drawn for another purpose at The Children's Hospital of Philadelphia, so children with at least one medical problem are overrepresented in this patient cohort. The parents of all patients gave consent for participation in the study, which was approved by the Institutional Review Board at the Children's Hospital of Philadelphia (IRB 06-004886).
Subject selection and randomization process: All patients with an mGluR CNV (n=62) and 100 randomly selected patients without an mGluR CNV were chosen for chart review. This procedure was selected to ensure that all patients with an mGluR CNV received detailed chart review with an adequately sized comparison cohort. A three-step process was used to ensure blinded chart review. First, the selection of the 162 charts was done by a geneticist with access to CNV data but without access to the Electronic Health Record (CK). Second, another author with access to neither CNV data nor the Electronic Health Record blinded and randomized the patient IDs (RTS). Finally, a physician with access to the Electronic Health Record but blinded to mGluR status (TLW) reviewed charts for documentation of ASD diagnosis and presence of other medical comorbidities.
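The separation of roles in this three-step process can be illustrated with a short sketch. The following Python code is a hypothetical illustration only, not the procedure actually used in the study: the function names, the toy patient IDs and the random seed are all assumptions introduced here.

```python
import random

def select_charts(cnv_status, n_negative, rng):
    """Step 1 (selector with CNV data only): keep every CNV-positive
    patient and draw a simple random sample of CNV-negative patients."""
    positives = sorted(pid for pid, has_cnv in cnv_status.items() if has_cnv)
    negatives = sorted(pid for pid, has_cnv in cnv_status.items() if not has_cnv)
    return positives + rng.sample(negatives, n_negative)

def blind_and_shuffle(patient_ids, rng):
    """Step 2 (blinder with no CNV or chart access): return the IDs in
    random order, with no CNV status attached."""
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled

# Toy cohort (hypothetical IDs): 539 patients, the first 62 flagged CNV-positive.
rng = random.Random(2012)  # fixed seed so the sketch is reproducible
cnv_status = {f"P{i:04d}": i < 62 for i in range(539)}

selected = select_charts(cnv_status, n_negative=100, rng=rng)

# Step 3: the chart reviewer receives only this shuffled ID list.
review_list = blind_and_shuffle(selected, rng)
```

In this sketch the reviewer's list carries no CNV column and its order gives no hint of group membership, which is the property the blinding is meant to guarantee.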
Charts were reviewed to confirm a diagnosis of ASD and also to determine medical comorbidities for each patient. Diagnosis of ASD was confirmed in the chart, but as this was a retrospective chart review, gold-standard research instruments (e.g. Autism).
Structural birth defects, genetic testing and medical conditions were recorded for each patient. Cases were categorized as “Syndromic ASD” if they had ASD and a medical condition or structural birth defect (e.g. cleft palate) that occurs in less than 1% of the general population. This criterion was established to define a subset of patients whose ASD and other medical problems would be highly unlikely to occur coincidentally: with a baseline rate of ASD of 1/88 and a medical condition that occurs in <1% of the general population, the compound likelihood of both occurring by chance, assuming independence, would be at most approximately 0.01% (1/88 × 1/100 ≈ 1/8,800). See
DNA from each subject with ASD was genotyped on the Human610-Quad or HumanHap550 SNP arrays from Illumina. For 22q11 DS cohorts, subjects were typed either on Illumina SNP arrays (Human610-Quad v1.0 or HumanHap550) or Affymetrix 6.0 SNP arrays. Clustering and SNP calling were performed using GenomeStudio (Illumina) to generate normalized intensities (i.e. Log-R ratio, or LRR) and B-allele frequencies (BAF). CNV calling was performed using the PennCNV algorithm [PMID: 17921354] following waviness correction [PMID: 18784189]. In brief, PennCNV uses a hidden Markov model (HMM) that incorporates information from LRR and BAF, as well as features of the array (e.g. distance between neighboring SNPs), to detect CNVs.
Samples with SNP arrays of poor quality were excluded from CNV calling, since the proportion of false positives typically increases considerably for these samples. Samples were included in the analysis if the genotyping call rate was >96%, the standard deviation of LRR (LRR sd) was <0.4, the GC-wave factor (GCWF) was between −0.2 and 0.2 after waviness correction, and the total number of CNV calls for the sample was <100.
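The inclusion criteria above amount to a simple per-sample filter. The sketch below, in Python, is illustrative only: the `SampleQC` record, its field names and the example values are hypothetical, and treating the GCWF bounds as inclusive is an assumption, since the text does not specify.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SampleQC:
    sample_id: str
    call_rate: float   # genotyping call rate as a fraction (0.96 = 96%)
    lrr_sd: float      # standard deviation of the Log-R ratio
    gcwf: float        # GC-wave factor after waviness correction
    n_cnv_calls: int   # total CNV calls for the sample

def passes_qc(s: SampleQC) -> bool:
    """Apply the four thresholds stated in the text."""
    return (
        s.call_rate > 0.96
        and s.lrr_sd < 0.4
        and -0.2 <= s.gcwf <= 0.2   # inclusive bounds are an assumption
        and s.n_cnv_calls < 100
    )

# Made-up example samples, one failing each criterion.
samples = [
    SampleQC("S1", 0.995, 0.18, 0.01, 34),   # passes all thresholds
    SampleQC("S2", 0.950, 0.20, 0.00, 40),   # call rate too low
    SampleQC("S3", 0.990, 0.45, 0.02, 60),   # LRR too noisy
    SampleQC("S4", 0.990, 0.25, 0.31, 50),   # residual GC wave
    SampleQC("S5", 0.990, 0.25, 0.05, 240),  # too many CNV calls
]
kept = [s.sample_id for s in samples if passes_qc(s)]
```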
For syndromic ASD regions, genomic coordinates were those described by Betancur [PMID: 21129364]. The GRM/mGluR network generated by Cytoscape from the Human Interactome database was described by Elia et al. [PMID: 22138692] using UCSC Genome Browser definitions for gene coordinates (UCSC genes). This network from Cytoscape was used to define mGluR+ vs. mGluR− subsets. For 22q11 DS cohort analysis, additional GRM/mGluR network genes were identified based on the first-degree interaction network of the eight GRM genes using the program Ingenuity Pathway Analysis (Ingenuity Systems Inc./Qiagen; Redwood City, Calif.), as well as the genes encoding the group I mGluR signaling pathway described in Kelleher et al. [PMID: 22558107]. CNV calls were analyzed for overlap with known syndromic regions and GRM network genes. All syndromic aberrations detected by clinical cytogenetic laboratory testing were confirmed on corresponding SNP arrays.
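The overlap analysis described above reduces to an interval-intersection test between each CNV call and each network gene. The following Python sketch shows one way this could be done; it is not the pipeline used in the study. The `Region` type, the function names, the placeholder gene names and all coordinates are hypothetical, and closed (inclusive) intervals are an assumption.

```python
from typing import NamedTuple

class Region(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive

def overlaps(a: Region, b: Region) -> bool:
    """Two closed intervals on the same chromosome overlap if each
    starts no later than the other ends."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end

def network_genes_hit(cnv: Region, genes: dict) -> list:
    """Names of network genes whose coordinates overlap the CNV call."""
    return sorted(name for name, region in genes.items() if overlaps(cnv, region))

def classify(cnvs: list, genes: dict) -> str:
    """A sample is mGluR+ if any of its CNV calls touches a network gene."""
    return "mGluR+" if any(network_genes_hit(c, genes) for c in cnvs) else "mGluR-"

# Placeholder gene table; real coordinates would come from UCSC gene definitions.
genes = {
    "GENE_A": Region("chr1", 1_000, 5_000),
    "GENE_B": Region("chr1", 20_000, 30_000),
    "GENE_C": Region("chr2", 1_000, 5_000),
}

hit = network_genes_hit(Region("chr1", 4_500, 21_000), genes)
status_pos = classify([Region("chr2", 6_000, 7_000), Region("chr1", 100, 1_000)], genes)
status_neg = classify([Region("chr3", 1_000, 5_000)], genes)
```

A linear scan is adequate for a network of a few hundred genes; an interval tree would be the usual choice for genome-wide gene sets.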
CNVs in the mGluR network were found in 74% of patients with syndromic ASD compared to 16% of patients with nonsyndromic ASD (p<0.001). Most of the mGluR CNVs in patients with syndromic ASD (75%) were included in larger clinically significant CNVs. As mGluR network genes are present in the 22q11.2 region (RANBP1) and on chromosome 21 (APP, GRIK1, MX1, PCBP3, SETD4), patients with ASD in the presence of 22q11.2DS, 22q11.2DupS or Trisomy 21 accounted for 15 (33%) of the patients with Syndromic ASD+mGluR network changes. The remainder of the observed cytogenetic changes were individual non-overlapping deletions or duplications. The analysis was repeated after exclusion of children with Trisomy 21, 22q11.2DS and 22q11.2DupS (the syndromes found in children in this study that have previously been associated with ASD). After their exclusion, the effect remained significant (p<0.001).
Autism Spectrum Disorder in 22q11.2 Deletion Syndrome is Associated with “Second Hit” in mGluR Pathway
As a comparison cohort, data from children with 22q11.2 DS with ASD (n=25) and without ASD (n=50) who had completed high density microarray evaluation (Affymetrix 6.0, Illumina 500K, or Illumina 610Q) and clinical developmental assessments (as enrolled through a parallel study, approved by the Children's Hospital of Philadelphia Institutional Review Board, IRB 07-005352) were examined for the presence of a second mGluR network hit outside of the 22q11.2 region. “Second hits”, defined as deletions of an mGluR network gene outside of the 22q11.2 region, were found in 20% (5/25) of patients with ASD and only 2% (1/50) of patients without ASD (p<0.014).
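The counts reported above (5 of 25 vs. 1 of 50) form a 2×2 table that can be checked with a two-sided Fisher's exact test. The text does not name the test used for this comparison, so the choice of Fisher's exact test is an assumption; the function below is a minimal standard-library sketch written for this illustration, not code from the study.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def pmf(k: int) -> float:
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = pmf(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= p_obs * (1 + 1e-9))

# Rows: ASD / no ASD; columns: second hit / no second hit.
p_value = fisher_exact_two_sided(5, 20, 1, 49)
```

Under this assumption the test gives a p-value of approximately 0.014, in line with the reported value.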
Prior studies have demonstrated that abnormal signaling (either too much or too little) through mGluR5 could be the basis for abnormal neural development (and possibly ASD) in FXS and TS. Our data suggest that derangement of the mGluR network may be responsible for increased rates of ASD seen in cytogenetically distinct forms of syndromic ASD. mGluR network genes are found in the 22q11.2 region as well as on Chromosome 21, which may be involved in the increased prevalence of ASD in both Down Syndrome and 22q11.2 DS. However, all patients with Trisomy 21 or 22q11.2 DS harbor the change in the mGluR network, suggesting that a second hit outside of the region may be necessary for expression of the ASD phenotype.
The 22q11.2 DS is the most common microdeletion syndrome in humans, occurring in 1 in 4,000 individuals. The typical deletion spans approximately 3 Mb and includes approximately 45 genes, causing a variety of medical and behavioral disorders (Table 9)25-28. ASD occurs in approximately 20%, and psychosis in 25%5,9. The 22q11.2DupS results in the same types of birth defects and medical comorbidities seen in 22q11.2 DS, but at a lower rate (among over 60 patients in our clinical cohort). There are no cases of psychosis in 22q11DupS in the literature29 or our cohort. Among our cases with documentation of developmental evaluation after the age of 4, the prevalence of ASD is 27%, which is slightly higher than the rate in children with 22q11.2DS.
Thalidomide exposure during pregnancy causes a variety of birth defects that have all been reported in 22q11.2DS, including some that are extremely rare (e.g. phocomelia, radial ray defects) (Table 9). Miller and Stromland reported an elevated risk of ASD following exposure to thalidomide during early embryogenesis16. This study included prospective evaluation by a psychiatrist of adults who had been exposed to thalidomide in utero, and evaluation by a physician to document birth defects and associated features. All cases of ASD following thalidomide exposure had ear anomalies, suggesting exposure between days 24-28 post-fertilization. Among individuals exposed at this time, there was a 27% rate of ASD. Replication of this study in additional cohorts has not been possible because the use of thalidomide in pregnant women was widely restricted in the 1960's; therefore, additional cases are not available. Though several mechanisms for many of the birth defects in thalidomide embryopathy have been proposed, animal studies of the teratogenic effects of thalidomide have been limited by significant species differences. One of the reasons thalidomide was used widely in the 1960's was its lack of teratogenicity in animals at levels that are highly teratogenic in humans. This has significantly limited the ability of researchers to determine the teratogenic mechanism of thalidomide, as studies have taken place in animals for which thalidomide is not particularly teratogenic, or have used dosages much higher than those used in humans. Recent changes in legislation have allowed a study to be completed in human embryonic stem cells, the first of its kind to use human cells and dosages analogous to those experienced by women taking thalidomide in the 1950's and 1960's30.
This study, conducted by Meganathan and colleagues (2012), proposed that the teratogenic effects of thalidomide may be mediated through RANBP130. Valproic acid (VPA) is widely used as an anticonvulsant, as a mood stabilizer, and to prevent migraine headaches. Exposure to VPA during pregnancy causes an increased rate of several birth defects, all of which have been reported in 22q11.2DS, and most of which have been seen in Thalidomide Embryopathy (Table 9). Table 9 compares the birth defects seen in 22q11.2 DS, Thalidomide Embryopathy and Fetal Valproate Syndrome. A comparison of all birth defects seen in 22q11.2 DS with those of the exposure syndromes was not made because 22q11.2 DS includes deletion of dozens of additional genes which we do not propose to be affected in Thalidomide Embryopathy or Fetal Valproate Syndrome. In addition to structural defects, children exposed to VPA in utero have an elevated risk of developing ASD19,31,32. Rodent models of autism have used prenatal exposure to VPA to reproduce some of the neuroanatomic features of autism and abnormal behavior33-35. Due to its action as a histone deacetylase inhibitor, VPA affects expression of many genes. Based on homology, decreased expression of RanBP1 mRNA is predicted in VPA-treated rats. Moreover, a recent study showed reversal of atypical behaviors in VPA-exposed mice following treatment with an mGluR antagonist36.
Derangement of genes in the mGluR network is found at a high rate in patients with different forms of Syndromic ASD, including 22q11.2DS, 22q11.2DupS, Trisomy 21 and a large number of other seemingly unrelated chromosomal alterations. Moreover, among children with 22q11.2DS, a “second hit” in the mGluR network was identified in 20% of children with ASD, and only 2% of those without ASD (p<0.014). Significantly, four children, all with an autism phenotype, had a small deletion in the vicinity of the RANBP1 gene. While the expression level of RANBP1 was not affected in the one individual available for testing (data not shown), these atypical deletions could impact gene function with resulting dysregulation of the RANBP1 protein.
Taken together, these data implicate dysregulation of the mGluR network as a likely permissive factor that increases the propensity to develop an ASD. The striking increase in prevalence of ASD with a CNV affecting a second gene in the network suggests that perturbation of mGluR signaling at multiple points may be necessary. It is important to note that CNVs represent only a fraction of changes in mGluR network genes, as this study did not include assessment of sequence variations, and these findings may therefore represent the “tip of the iceberg”. While perturbation of the mGluR network appears to confer risk of ASD, additional genetic or environmental stressors are likely necessary for an individual child to develop ASD.
Striking similarities exist in the profiles of birth defects and elevated rates of Autism Spectrum Disorder seen in 22q11.2 Deletion Syndrome, Fetal Valproate Syndrome and Thalidomide Embryopathy. As thalidomide and VPA both cause decreased expression of RANBP1 mRNA, mimicking haploinsufficiency of the gene in 22q11.2 Deletion Syndrome, it is plausible that RANBP1 could be involved in the common teratogenic profile across syndromes. Moreover, results from a Ranbp1 knockout mouse model from Paronett et al37 also support our hypothesis of the importance of RANBP1 in the neurological consequences of 22q11.2 DS and of prenatal exposures affecting expression of RANBP1. In these studies of Ranbp1 (−/−) homozygotes, proliferation of the basal progenitor pool in the cortex is disrupted, leading to a dramatic reduction in cortical thickness and substantially fewer neurons in the perinatal cortex. The changes resulting from loss of RANBP1 function parallel those seen in mice with the larger 22q11.2 Deletion, suggesting that haploinsufficiency of Ranbp1 may contribute to the disruption of cortical circuitry in 22q11DS. Future studies addressing the neurodevelopmental phenotype of mice with haploinsufficiency of Ranbp1 are anticipated to help elucidate the mechanism by which alterations of the mGluR pathway lead to increased risk of ASD.
While certain of the preferred embodiments of the present invention have been described and specifically exemplified above, it is not intended that the invention be limited to such embodiments. Various modifications may be made thereto without departing from the scope and spirit of the present invention, as set forth in the following claims.
This application is a continuation in part application of U.S. patent application Ser. No. 14/131,359, filed Jan. 7, 2014 which is a §371 application of PCT/US12/45959 filed Jul. 9, 2012, which in turn claims priority to U.S. Provisional Application Nos. 61/505,352 and 61/646,971 filed Jul. 7, 2011 and May 15, 2012 respectively, the entire contents of each being incorporated by reference herein as though set forth in full.
Pursuant to 35 U.S.C. §202(c) it is acknowledged that the U.S. Government has certain rights in the invention described, which was made in part with funds from the National Institutes of Health, Grant Numbers NIH T32 GM008628, RC2 MH089924, NIHD070454 and NIMH87636.
Number | Date | Country
---|---|---
61646971 | May 2012 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 14131359 | Jul 2014 | US
Child | 14292480 | | US