A computer readable form of the Sequence Listing “12362-P40825US01_SequenceListing.txt” (8,192 bytes), submitted via EFS-WEB and created on Dec. 14, 2012, is herein incorporated by reference.
“Rearrangement hotspots”, genomic regions with an elevated frequency of genomic rearrangements such as deletions and duplications, are identified genome-wide. The application discloses microarray chips for detecting genomic rearrangements associated with genetic diseases and methods of manufacturing the microarray chips. The application also discloses methods of using copy number variants to diagnose Tourette Syndrome.
Segmental duplications (SDs) or low-copy repeats are blocks of DNA greater than 1 kilobase pair in size which share a high level of sequence homology (>90%) [Bailey J A, (2006); Redon R. et al. (2006); Bailey J A et al. (2002); and Alkan C. (2011)]. The catalogue of SDs comprises approximately 5% of the human genome encompassing 18% of genes [Bailey J A, (2006); Redon R. et al. (2006); Bailey J A et al. (2002); and Alkan C. (2011)]. They are considered antecedents to the formation of copy number variants (CNVs) which comprise approximately 12% of the human genome and are responsible for considerable human genetic variation [Redon R. et al. (2006); Bailey J. A. et al. (2002); Alkan C. et al. (2011); and Sharp A J et al. (2005)]. Emerging evidence suggests that SD regions are frequently associated with known genomic disorders with the vast majority representing novel sites whose genomic architecture is susceptible to disease-causing rearrangements [Sharp A J et al. (2005)]. However, the complexity of their structural architecture in the human genome and, more importantly, their role in disease pathogenesis remains largely elusive.
There is a growing body of evidence suggesting the involvement of multiple events in the origin of genomic rearrangements such as non-allelic homologous recombination (NAHR), non-homologous end joining (NHEJ), fork stalling and template switching (FoSTeS), and microhomology-mediated break-induced replication (MMBIR) [Cu W et al. (2008); Lieber M R et al. (2003); and Zhang F et al. (2009)]. Although the origins of the aforementioned mechanisms are strongly associated with highly homologous regions residing outside of common repeat elements (e.g., transposons) [Conrad F D et al. (2010)], the non-random distribution of highly homologous regions within SDs that are susceptible to such mechanisms remains to be fully elucidated. Moreover, evolutionary conservation of these mechanisms complicates the identification of SD breakpoints due to differing levels of sequence homology.
Genomic disorders arising from microdeletions/duplications can fail to be adequately explained by a single underlying event. The true contribution of NAHR, NHEJ, MMBIR and FoSTeS events to the origin of genomic rearrangement remains elusive, although large-scale studies are beginning to implicate NAHR as one of the primary events contributing to the origin of these genomic copy number changes [Conrad F D et al. (2010) and Mills R E et al. (2011)]. Genomic DNA situated between distal and proximal SDs represents a critical region often reported to be deleted/duplicated due to misalignment of the SDs between homologous chromosomes [Shaikh H T et al. (2007)].
Evidence suggests that the breakpoint architecture of SDs (i.e., distal and proximal) is associated with a higher propensity for NAHR-mediated rearrangement predisposing to an abnormal phenotype [DECIPHER Genomic Aberration Database: http://decipher.sanger.ac.uk/]. In other words, the increased frequency of pathogenic rearrangements is often directly correlated with the structural complexity of the local genomic regions involved. This is consistent with numerous reports indicating that highly homologous regions within SDs influence NAHR-mediated rearrangement events [Conrad F D et al. (2010); Mills R E et al. (2011); and Turner D J et al. (2007)]. These highly homologous regions may be referred to as ‘rearrangement hotspots’. Classic examples of NAHR-mediated genomic rearrangement include genomic disorders such as 3q29 microdeletion/duplication syndrome, globozoospermia, and Williams-Beuren syndrome [Ballif B C et al. (2008); Koscinski I et al. (2011); and Bayes M et al. (2003)].
In a recent report, the detection and validation of 8,599 CNVs using microarrays [Conrad, F D et al. (2010)] and subsequent targeted sequencing on 1067 of these CNV breakpoints [Conrad F D et al. (2010)] revealed extreme homologous regions consistent with NAHR-mediated rearrangements as the primary event in the origin of CNVs.
Array comparative genome hybridization (aCGH) technology, developed in the last decade, affords the capacity to examine the whole human genome on a single chip with a level of resolution dependent only on size and distance between the interrogating probes, a process also known as microarray-based cytogenetics [Diskin S J et al. (2009)]. The resolution afforded by microarray technology is at least 10-fold greater than the best prometaphase chromosome analysis obtained via conventional karyotyping, rendering it a sensitive whole-genome screen for genomic deletions and duplications [Brunetti-Pierri N et al. (2008)].
Genome-wide signatures of ‘rearrangement hotspots’ can facilitate the detection of genomic regions capable of mediating de novo deletions or duplications in humans. In particular, there remains a need for array comparative genome hybridization technology that targets all the vulnerable regions in the genome susceptible to disease-associated rearrangements including rearrangement hotspots, SDs, recently discovered CNVs and telomeric and centromeric chromosomal regions.
Structural variants are a risk factor for neuropsychiatric diseases. For example, Tourette syndrome (TS) is a developmental neuropsychiatric disorder characterized by the presence of both motor and verbal tics. It has a major genetic component but numerous linkage and association studies have not identified any common candidate genes. Copy number variation (CNV) analysis in TS revealed an association with genes previously implicated in autism spectrum disorder although none unique to TS. That no single CNV has been reported to segregate uniquely with TS in affected families provides an opportunity to detect novel CNVs specific to TS through the use of the microarray technology described herein.
The application relates to a microarray chip system comprising at least: 500, 750, 1000, 1500, 2000 or 4000 distinct oligonucleotide probes bound to a solid support, wherein each oligonucleotide probe comprises a nucleotide sequence complementary to a rearrangement indicator sequence region of a human genome, the rearrangement indicator sequence regions comprising a rearrangement indicator set that is indicative of risk, or occurrence, of genomic rearrangements.
In another embodiment, the application relates to a microarray chip system comprising at least: 500, 750, 1000, 1500, 2000 or 4000 distinct oligonucleotide probes bound to a solid support, wherein each oligonucleotide probe comprises a nucleotide sequence that hybridizes, optionally under medium or high stringency conditions, to a nucleotide sequence complementary to a rearrangement indicator sequence region of a human genome, the rearrangement indicator sequence regions comprising a rearrangement indicator set that is indicative of risk, or occurrence, of genomic rearrangements.
In one embodiment, the oligonucleotide probes are complementary to genomic sequences not more than 100, 150, 280, 300 or 500 base pairs apart within a single rearrangement indicator sequence region. Optionally, the oligonucleotide probes are complementary to every 100, 150, 280, 300 or 500 bp of a rearrangement indicator sequence region. In another embodiment, the oligonucleotides are 30 to 100 base pairs in length, optionally 45 to 65 base pairs in length. In yet another embodiment, the oligonucleotides are complementary to at least 5, 10, 15, 30, 40 or 60 contiguous nucleotides of the rearrangement indicator sequence regions.
In one embodiment, at least 5, 10, 15, 50, 100, 500 or 1000 oligonucleotide probes comprise a nucleotide sequence complementary to a single rearrangement indicator sequence region.
The application also discloses a system wherein the rearrangement indicator sequence regions comprise at least one rearrangement hotspot. Optionally, the rearrangement hotspot is contained within a segmental duplication. In one embodiment, the rearrangement hotspot comprises at least 10 duplicons.
In one embodiment, the rearrangement hotspots are selected from the genomic regions listed in Table 1. In another embodiment, the rearrangement indicator sequence regions are selected from the genomic regions listed in Tables 2-5.
In one embodiment, the solid support comprises at least one microarray chip and the oligonucleotide probes are arrayed on the at least one microarray chip. Optionally, the solid support comprises at least two microarray chips and the oligonucleotide probes are arrayed on the at least two microarray chips.
The application further discloses the use of a microarray chip system comprising: at least 500, 750, 1000, 1500, 2000 or 4000 distinct oligonucleotide probes bound to a solid support, wherein each oligonucleotide probe comprises a nucleotide sequence complementary to a rearrangement indicator sequence region of a human genome, the rearrangement indicator sequence regions comprising a rearrangement indicator set that is indicative of risk, or occurrence, of genomic rearrangements, wherein the use comprises detecting a genomic rearrangement or the risk of a genomic rearrangement, or identifying a novel genomic rearrangement.
In one embodiment, the genomic rearrangement indicates a genetic disease or the risk of a genetic disease.
In another embodiment, the genomic rearrangement is a copy number variation (CNV). Optionally, the CNV comprises a gene deletion or gene duplication. In further embodiments, the genomic rearrangement comprises a complex rearrangement, optionally multiple duplications and/or deletions and combinations thereof. Optionally, the genomic rearrangement comprises a duplication-deletion-duplication, a deletion-duplication-deletion, a duplication-duplication-deletion or a deletion-deletion-duplication. In other embodiments, the genomic rearrangement comprises a triplication. Optionally, the gene deletion or gene duplication comprises the deletion or duplication of a genomic sequence at least 200 bp, 500 bp, 1000 bp or 2000 bp in length.
In another embodiment the genomic rearrangement indicates a genetic disease in a subject. Optionally, the genomic rearrangement indicates the presence of, or the propensity for, a genetic disease in a subject.
Optionally, the genetic disease is Autism Spectrum Disorder, Psoriasis, Ankylosing Spondylitis or Tourette Syndrome. In a further embodiment, the genetic disease is Autism Spectrum Disorder and the genomic rearrangement is detected in the genes PGAP1 or LNX1. In another further embodiment, the genetic disease is Ankylosing Spondylitis and the genomic rearrangement is detected in the genes UGT2B17 or UGT2B15.
The application also discloses a method of detecting genomic rearrangements in a subject. In one embodiment, the method comprises:
labeling a DNA test sample from a subject with a first fluorophore;
labeling a DNA reference sample with a second fluorophore;
contacting the labeled samples with the microarray system of the present application and hybridizing the labeled samples to the oligonucleotide probes of the present application; and
identifying a putative genomic rearrangement, wherein a non-equal signal ratio between the first fluorophore and the second fluorophore identifies a putative genomic rearrangement.
In one embodiment, the DNA test sample and the DNA reference sample are genomic DNA samples. In another embodiment, the labeled samples are hybridized to the oligonucleotide probes of the disclosure under medium or high stringency hybridization conditions.
In one embodiment, a log2 ratio of >0.25 or <−0.25 identifies a putative genomic rearrangement.
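The thresholding step can be illustrated with a minimal Python sketch. It assumes that the signal ratio is computed per probe as the test-sample intensity divided by the reference-sample intensity, and that a ratio above the threshold is read as a putative gain and a ratio below the negative threshold as a putative loss. The function names and the example intensities are hypothetical and are not part of the disclosed method.

```python
import math


def log2_ratio(test_signal, reference_signal):
    """Return log2(test / reference) for one probe's two fluorophore intensities."""
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signal intensities must be positive")
    return math.log2(test_signal / reference_signal)


def classify_probe(test_signal, reference_signal, threshold=0.25):
    """Label a probe using the +/-0.25 log2 ratio cut-off (threshold is adjustable)."""
    ratio = log2_ratio(test_signal, reference_signal)
    if ratio > threshold:
        return "putative gain"   # test sample binds more strongly than reference
    if ratio < -threshold:
        return "putative loss"   # reference sample binds more strongly than test
    return "no call"             # ratio lies within the threshold band


if __name__ == "__main__":
    # Hypothetical intensities, for illustration only.
    for test, ref in [(1500.0, 1000.0), (500.0, 1000.0), (1100.0, 1000.0)]:
        print(test, ref, classify_probe(test, ref))
```

In practice a call would normally rest on several adjacent probes rather than a single probe; that aggregation is outside the scope of this sketch.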
Optionally, the method further comprises validating the putative genomic rearrangement by quantitative PCR, fluorescence in-situ hybridization (FISH) analysis or karyotyping.
In one aspect of the method, the genomic rearrangement is associated with a genetic disease. In another aspect, the genomic rearrangement is a novel genomic rearrangement.
The application also discloses a method of constructing a microarray chip for detecting copy number variations comprising:
identifying at least 500, 1000, 1500, 2000 or 4000 rearrangement indicator sequence regions;
designing oligonucleotide probes complementary to the rearrangement indicator sequence regions; and
arraying the oligonucleotide probes on at least one microarray chip.
In one embodiment, the method further comprises the use of the chip to detect a genomic rearrangement wherein differential binding of a test genomic DNA sample and a reference genomic DNA sample to at least one oligonucleotide probe indicates a genomic rearrangement. Optionally, higher binding of the test genomic DNA sample than the reference genomic DNA sample to at least one oligonucleotide probe indicates a genomic duplication and higher binding of the reference genomic DNA sample than the test genomic DNA sample to at least one oligonucleotide probe indicates a genomic deletion.
Using the genome-wide rearrangement hotspots and microarray chips described herein, the inventors discovered a novel genomic region on chromosome 2 that is associated with copy number variants that segregate with Tourette Syndrome status.
Accordingly, the application also relates to a method of screening for, diagnosing and/or detecting an increased risk of developing Tourette syndrome in a subject comprising detecting the presence of a Tourette syndrome copy number variant in a Tourette syndrome critical region within a sample of the subject, wherein the presence of a Tourette syndrome copy number variant in a Tourette Syndrome critical region is indicative that the subject has Tourette Syndrome and/or an increased risk of developing Tourette syndrome.
In one embodiment, the Tourette Syndrome copy number variant is detected by one or more of: quantitative PCR, RT-PCR, QF-PCR, fluorescent in situ hybridization (FISH), a binding agent, and/or a microarray.
The application also relates to a method of evaluating a genomic DNA from a subject suspected of having or having Tourette Syndrome comprising:
a) obtaining a genomic DNA test sample from the subject,
b) assaying the test sample to determine the presence or number of copies of a Tourette Syndrome copy number variant in the test sample,
c) assaying a reference sample to determine the presence or number of copies of the Tourette Syndrome copy number variant in the reference sample, and
d) identifying differences between the amount or number of copies of the Tourette Syndrome copy number variant in the test sample compared to the reference sample;
wherein differences between the amount or number of copies of the Tourette Syndrome copy number variant in the test sample compared to the reference sample are indicative of whether the subject has Tourette Syndrome or an increased risk of developing Tourette Syndrome.
The application also relates to a method of screening for, diagnosing and/or detecting an increased risk of developing Tourette Syndrome in a human subject comprising:
a) obtaining a sample from the subject;
b) assaying the sample for the presence of and detecting a Tourette Syndrome copy number variant in a Tourette Syndrome critical region thereby identifying the subject as having Tourette Syndrome or an increased risk of developing Tourette Syndrome, the assaying comprising hybridizing a probe and/or primer to the Tourette Syndrome copy number variant.
In one embodiment, the Tourette syndrome critical region is on chromosome 2, optionally located at 2q21.1-21.2.
In another embodiment, the Tourette Syndrome copy number variant is a duplication or a deletion. The duplication is optionally a duplication of genomic sequence corresponding to: chr2:132305299-132343808, chr2:132395155-132526804 or chr2:132305299-132343808 of human genome assembly 19.
In one embodiment, detecting a Tourette Syndrome copy number variant with an increased copy number compared to a reference sequence identifies the subject as having Tourette Syndrome or an increased risk of developing Tourette Syndrome. In one embodiment, a copy number of 4 or more compared to a reference sequence identifies the subject as having Tourette Syndrome or an increased risk of developing Tourette Syndrome. Optionally, the reference sequence is a human genome assembly sequence such as human genome assembly 19 (HG19).
In one embodiment the subject is presymptomatic, has one or more clinical symptoms or clinical features associated with Tourette Syndrome, has been diagnosed with Tourette syndrome and/or has at least one blood relation with Tourette Syndrome.
The application also provides isolated nucleic acids. In some embodiments, the isolated nucleic acids are useful as probes or primers for detecting Tourette Syndrome copy number variants.
Accordingly, in one embodiment, the application provides an isolated nucleic acid, wherein the nucleic acid hybridizes to:
a) a Tourette Syndrome copy number variant or a portion thereof;
b) a nucleic acid sequence complementary to a); and/or
c) a nucleic acid sequence corresponding to a).
Optionally, the Tourette Syndrome copy number variant comprises genomic sequence corresponding to chr2:132395155-132526804, chr2:132305299-132343808 or chr2:132480185-132510827 of human genome assembly 19.
In one embodiment, the isolated nucleic acid is a primer or a probe.
The application also provides a kit for screening for, diagnosing or detecting an increased risk of developing Tourette Syndrome comprising:
(a) a Tourette Syndrome copy number variant detection agent comprising an isolated nucleic acid as described here; and
(b) instructions for use or a container for holding the detection agent of (a).
Embodiments of the disclosure will be shown in relation to the drawings in which the following is shown:
The present disclosure relates to the genome-wide identification of “rearrangement hotspots”. These rearrangement hotspots can facilitate the detection of genomic regions capable of mediating genomic rearrangements or aberrations such as de novo deletions or duplications in humans. The disclosure further relates to microarrays that comprehensively target vulnerable or fragile regions in the genome susceptible to disease-associated rearrangements and the use of the microarrays for detecting disease-associated genomic rearrangements. The application also discloses novel copy number variants in chromosome 2 that were identified using the microarrays described herein and co-segregate with Tourette Syndrome status.
The application describes the identification of genome-wide rearrangement hotspots that often predispose to genomic disorders in humans, mediated predominately by non-allelic homologous recombination. The application discloses a hierarchical approach to detect segmental duplication units using an all-hit mapping algorithm, interrogating every 100 bp (GC-corrected read depth window with a 1 bp overlap) excluding common repeat elements. Reference-guided assembly was obtained from reads based on the NA18507 human genome and duplicated sequences were extracted from the assembly using detected breakpoints (
In the present disclosure, the terms “genomic rearrangements”, “genomic alterations” and “genomic aberrations” refer to structural modifications, changes and alterations in chromosomal DNA. Common genomic rearrangements include copy number variants (CNVs) including gene duplications and gene deletions. In the present disclosure, the term “copy number variation” is defined as the gain or loss of genomic material compared to a reference sequence.
Additional genomic rearrangements, alterations or aberrations include, but are not limited to, insertions, translocations, recombinations, rearrangements and combinations thereof. The modification or change can vary in size from only a few bases to several kilobases. In some embodiments, the genomic material gained or lost in a genomic rearrangement is greater than 250 bp, 500 bp, 1 KB or 2 KB in size. In a genomic rearrangement, one or more parts of a chromosome are optionally rearranged within a single chromosome (intra-chromosomal) or between chromosomes (inter-chromosomal).
Genomic aberrations, rearrangements and alterations may result from multiple events, including but not limited to, non-allelic homologous recombination (NAHR), non-homologous end-joining (NHEJ), fork stalling and template switching (FoSTeS) and microhomology-mediated break-induced replication (MMBIR).
Diseases associated with, or indicated by, genomic rearrangements include genetic diseases which arise, at least in part, from genomic aberrations, rearrangements and alterations. Examples of genetic diseases associated with genomic rearrangements include, but are not limited to, 3q29 microdeletion/duplication syndrome, globozoospermia and Williams-Beuren syndrome. Several developmental neurocognitive disorders are caused by recurrent and non-recurrent genomic rearrangement, and copy number variants have been identified which are responsible for mental retardation, autism spectrum disorders, developmental delays and multiple congenital anomalies. Copy number variants have also been detected in common autoimmune diseases such as ankylosing spondylitis, psoriasis, psoriatic arthritis, psoriasis vulgaris, rheumatoid arthritis, inflammatory bowel disease and systemic lupus erythematosus.
Genomic Regions Associated with Genomic Rearrangements
The term “genomic region” refers to a contiguous length of nucleotides in a genome of an organism. A genomic region may be in the range of 10 kb in length to an entire chromosome, for example 100 kb to over 1 MB in length. Genomic regions are also referred to as “breakpoints”.
In the present disclosure, “rearrangement hotspots” are genomic regions susceptible to genomic rearrangements such as gene deletions or gene duplications. Examples of rearrangement hotspots are listed in Table 1. A genomic region susceptible to a genomic rearrangement is optionally a genomic region which is at least 10%, 35%, 50% or 100% more likely to undergo a genomic rearrangement than a genomic region that is not susceptible to genomic rearrangements. Such regions may be termed vulnerable or fragile genomic regions. Rearrangement hotspots can be correlated with increased non-allelic homologous recombination event frequency. In some embodiments of the disclosure, rearrangement hotspots are found within segmental duplications. In one particular embodiment, “rearrangement hotspots” are highly homologous regions within segmental duplication units. “Segmental duplications” (also known as low-copy repeats) are regions of DNA greater than 1 kilobase in size which share a high level of sequence homology, for example, more than 75%, 85%, 90% or 95% sequence homology. Segmental duplications include both inter- and intra-chromosomal segmental duplications.
In other embodiments, rearrangement hotspots have a significantly high distribution of duplicons (p<1.0×10⁻⁶) with at least 10 duplicons per hotspot. Rearrangement hotspots optionally range in size from 100 to 15,000 base pairs, optionally 200 to 1000 base pairs.
“Duplicons” are short regions of homologous DNA, optionally greater than 100 bp. Duplicons are optionally located within segmental duplications and are highly homologous sequences located sparsely within the genome. Homologues of the duplicon can be located within the same chromosome or in other chromosomes.
A “rearrangement indicator sequence region” is a genomic region identified to have a propensity for genomic rearrangements or known to be involved in genomic rearrangements. Optionally, a “rearrangement indicator sequence region” is a genomic region susceptible to genomic rearrangements such as gene deletions or gene duplications. A “rearrangement indicator sequence region” is optionally a genomic region which is at least 10%, 35%, 50% or 100% more likely to undergo a genomic rearrangement than a genomic region that is not susceptible to genomic rearrangements.
Optionally, a “rearrangement indicator sequence region” is used to identify when a genomic rearrangement has occurred. In one embodiment, oligonucleotides complementary to a “rearrangement indicator sequence region” are used to detect whether a genomic rearrangement has occurred through competitive hybridization of a test genomic DNA sample and a reference DNA sample to the oligonucleotides.
Rearrangement indicator sequence regions include genomic regions comprising or consisting of at least one rearrangement hotspot. A rearrangement indicator sequence region optionally comprises one rearrangement hotspot or multiple rearrangement hotspots. In some embodiments, rearrangement indicator sequence regions are 100 base pairs to 5 MB in length. Rearrangement indicator sequence regions also include genomic regions containing known CNVs and centromeric and telomeric chromosomal regions.
Examples of rearrangement indicator sequence regions are listed in Tables 1-5.
A “rearrangement indicator set” is a set or group of rearrangement indicator sequence regions that together are indicative of risk, or occurrence, of genomic rearrangements. Optionally, a “rearrangement indicator set” is a set or group of rearrangement indicator sequence regions that cumulatively are indicative of risk, or occurrence, of genomic rearrangements. “Risk of genomic rearrangements” refers to the risk that a particular genomic region may undergo a genomic rearrangement. “Occurrence of genomic rearrangements” refers to the presence of a genomic rearrangement in a subject. A “rearrangement indicator set” optionally comprises at least 500, 750, 1000, 1500, 2000 or 4000 rearrangement indicator sequence regions. In one embodiment, a “rearrangement indicator set” comprises at least 500, 750, 1000, 1500, 2000 or 4000 genomic regions that are at least 10%, 35%, 50% or 100% more likely to undergo a genomic rearrangement than a genomic region that is not susceptible to genomic rearrangements.
Oligonucleotides
The present disclosure provides oligonucleotides complementary to nucleotide sequences contained within genomic regions associated with genomic aberrations or rearrangements. The term “oligonucleotide” refers to short single stranded nucleic acid polymers. Oligonucleotides may be made synthetically or enzymatically. Oligonucleotides may range in length from 2 to 200 base pairs.
In one embodiment, the oligonucleotides described herein are used as array probes. The term “oligonucleotide probe” refers to an oligonucleotide designed for use as an array probe. Optionally, the oligonucleotide probes of the present disclosure range from 25-80 base pairs in length, preferably 45-60 base pairs in length.
The terms “complementary” or “complementarity”, as used herein, refer to the natural binding of polynucleotides under permissive salt and temperature conditions by base-pairing. For example, the sequence “A-G-T” binds to the complementary sequence “T-C-A”. Complementarity between two single-stranded molecules may be “partial”, in which only some nucleotides or portions of the nucleotide sequences of the nucleic acids bind, or it may be complete when total complementarity exists between the single stranded molecules. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. In one embodiment of the present disclosure, oligonucleotide probes are complementary to contiguous sequences contained within genomic regions associated with genomic aberrations or rearrangements. In another embodiment, oligonucleotide probes hybridize to contiguous sequences contained within genomic regions associated with genomic aberrations or rearrangements. Optionally, the contiguous sequences are at least 5, 10, 20, 30, 40, 50 or 60 basepairs in length.
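The distinction between partial and complete complementarity can be illustrated with a short Python sketch. It follows the position-by-position pairing of the “A-G-T” / “T-C-A” example above and ignores strand orientation, which is a simplification. The function names are hypothetical and the scoring is only one possible way to express the degree of complementarity.

```python
# Watson-Crick pairing for DNA bases.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(sequence):
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in sequence.upper())


def fraction_complementary(first, second):
    """Fraction of aligned positions at which the two sequences base-pair.

    A value of 1.0 corresponds to complete complementarity; any lower
    value corresponds to partial complementarity.
    """
    if not first or len(first) != len(second):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        1 for a, b in zip(first.upper(), second.upper()) if PAIRS[a] == b
    )
    return paired / len(first)


if __name__ == "__main__":
    print(complement("AGT"))
    print(fraction_complementary("AGT", "TCA"))
    print(fraction_complementary("AGT", "TCG"))
```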
The term “hybridization” refers to the specific binding of a nucleic acid to a complementary nucleic acid. In one embodiment of the present disclosure, oligonucleotide probes hybridize under medium stringency hybridization conditions to genomic regions associated with genomic aberrations or rearrangements, for example rearrangement indicator sequence regions. In another embodiment, oligonucleotide probes hybridize under high stringency hybridization conditions to genomic regions associated with genomic aberrations or rearrangements, for example rearrangement indicator sequence regions. The terms “medium stringency hybridization conditions” and “high stringency hybridization conditions” are well known to a person skilled in the art. Examples of hybridization conditions may be found in molecular biology reference texts such as Molecular Cloning: A Laboratory Manual by Sambrook and Russell (3rd Edition, Cold Spring Harbor Laboratory Press, 2001).
The stringency may be selected based on the conditions used in the wash step. For example, the salt concentration in the wash step can be selected from a high stringency of about 0.2×SSC at 50° C. for 15 minutes. In addition, the temperature in the wash step can be at high stringency conditions, at about 65° C. for 15 minutes.
By “medium stringency hybridization conditions” it is meant that conditions are selected which promote selective hybridization between two complementary nucleic acid molecules in solution. Hybridization may occur to all or a portion of a nucleic acid sequence molecule. The hybridizing portion is typically at least 15 (e.g. 20, 25, 30, 40 or 50) nucleotides in length. Those skilled in the art will recognize that the stability of a nucleic acid duplex, or hybrid, is determined by the Tm, which in sodium-containing buffers is a function of the sodium ion concentration and temperature (Tm = 81.5° C. + 16.6(log10[Na+]) + 0.41(% (G+C)) − 600/l, where l is the length of the hybrid in base pairs, or similar equation). Accordingly, the parameters in the wash conditions that determine hybrid stability are sodium ion concentration and temperature. In order to identify molecules that are similar, but not identical, to a known nucleic acid molecule, a 1% mismatch may be assumed to result in about a 1° C. decrease in Tm. For example, if nucleic acid molecules are sought that have a >95% sequence identity, the final wash temperature will be reduced by about 5° C. Based on these considerations those skilled in the art will be able to readily select appropriate hybridization conditions.
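The Tm relationship and the mismatch adjustment can be worked through in a short Python sketch. It assumes the commonly cited form of the equation, Tm = 81.5 + 16.6·log10[Na+] + 0.41·(%G+C) − 600/l, with [Na+] in mol/L and l the hybrid length in base pairs, together with the rule of thumb of about 1° C. per 1% mismatch. The function names and example values are hypothetical, and other forms of the equation exist.

```python
import math


def melting_temperature(sodium_molar, gc_percent, hybrid_length):
    """Estimate Tm (deg C) of a DNA duplex in a sodium-containing buffer.

    Assumed form: 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 600/length.
    """
    if sodium_molar <= 0 or hybrid_length <= 0:
        raise ValueError("sodium concentration and length must be positive")
    return (
        81.5
        + 16.6 * math.log10(sodium_molar)
        + 0.41 * gc_percent
        - 600.0 / hybrid_length
    )


def adjusted_wash_temperature(wash_temperature, min_identity_percent):
    """Lower a wash temperature by about 1 deg C per 1% tolerated mismatch."""
    tolerated_mismatch = 100.0 - min_identity_percent
    return wash_temperature - tolerated_mismatch


if __name__ == "__main__":
    # Hypothetical 60-mer probe, 50% G+C, 1 M sodium.
    print(melting_temperature(1.0, 50.0, 60))
    # Tolerating 5% mismatch lowers a 65 deg C wash by about 5 deg C.
    print(adjusted_wash_temperature(65.0, 95.0))
```

Because log10[Na+] is negative below 1 M, lowering the salt concentration lowers the estimated Tm, which is why low-salt washes are the more stringent ones.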
In some embodiments, stringent or high stringency hybridization conditions are selected. By way of example the following conditions may be employed to achieve high stringency hybridization: hybridization at 5× sodium chloride/sodium citrate (SSC)/5×Denhardt's solution/1.0% SDS at Tm−5° C. based on the above equation, followed by a wash of 0.2×SSC/0.1% SDS at 60° C. for 15 minutes. Moderately stringent hybridization conditions include a washing step in 3×SSC at 42° C. for 15 minutes. It is understood, however, that equivalent stringencies may be achieved using alternative buffers, salts and temperatures. Additional guidance regarding hybridization conditions may be found in: Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., 1989, 6.3.1-6.3.6 and in: Sambrook et al., Molecular Cloning, a Laboratory Manual, Cold Spring Harbor Laboratory Press, 2000, Third Edition.
According to the methods of the disclosure, oligonucleotide probes may be designed which are complementary to the identified rearrangement indicator sequence regions. Optionally, the oligonucleotide probes are designed to hybridize under medium stringency or high stringency conditions to the rearrangement indicator sequence regions.
In one embodiment, the oligonucleotide probes are complementary to, or hybridize to, the genomic regions set out in Tables 1-5. Optionally, the oligonucleotides may be complementary to, or hybridize to, sequences corresponding to the genomic regions set out in Tables 1-5. It is appreciated that there is variation in human genome sequences and the regions set out in Tables 1-5 specifically reflect human genome NA18507. The present disclosure encompasses regions in other human genomes that correspond to the regions depicted in Tables 1-5.
The term “distinct oligonucleotide probes” refers to oligonucleotide probes that each have different/distinct oligonucleotide sequences. Within the context of the present disclosure, “distinct oligonucleotide probes” each bind to, or are complementary to, different/distinct rearrangement indicator sequence regions.
Within a rearrangement indicator sequence region, individual oligonucleotide probes are spaced not more than 100, 150, 280, 300, 500 or 1000 base pairs apart. The mean spacing between oligonucleotides within a single rearrangement indicator sequence region is approximately 280 base pairs, optionally 200 to 350 base pairs.
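As a non-limiting sketch, the spacing criteria above could be checked for a candidate probe set as follows. The function names, the use of probe start coordinates as the spacing reference, and the example coordinates are assumptions made for illustration only.

```python
from statistics import mean
from typing import Dict, List, Sequence


def probe_spacings(positions: Sequence[int]) -> List[int]:
    """Distances (bp) between consecutive probe start positions within one region."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def spacing_summary(positions: Sequence[int], max_gap: int = 1000) -> Dict[str, object]:
    """Summarize probe spacing and test it against a maximum allowed gap."""
    gaps = probe_spacings(positions)
    if not gaps:
        raise ValueError("need at least two probes to compute spacing")
    return {
        "mean": mean(gaps),
        "max": max(gaps),
        "within_limit": max(gaps) <= max_gap,
    }


if __name__ == "__main__":
    # Hypothetical probe start coordinates within one region.
    print(spacing_summary([1000, 1280, 1560, 1840], max_gap=300))
```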
One aspect of the application provides an array for use in the methods described herein. In one embodiment, the application provides an array comprising a solid support having a plurality of addresses, wherein each address has disposed thereon an oligonucleotide probe that can specifically bind genomic DNA. In some embodiments, an array contains at least one, ten, 100, 1000, 10,000, 100,000, or 1,000,000 features in an area that is less than 20 cm2, 10 cm2, 5 cm2, 1 cm2 or less than 1 mm2.
The application also provides a “microarray chip system” comprising at least one solid support, optionally a glass slide, upon which oligonucleotides, such as the oligonucleotide probes described herein, have been arrayed. A microarray chip system optionally includes more than one solid support, wherein a unique collection of probes is arrayed on each support.
In one embodiment of the disclosure, the microarray system comprises a plurality of oligonucleotide probes bound to at least one solid support, the oligonucleotide probes comprising nucleotide sequences complementary to at least 500, 750, 1000 or 1500 rearrangement indicator sequence regions and wherein the at least 500, 750, 1000 or 1500 rearrangement indicator sequence regions are represented by at least one oligonucleotide probe. Optionally, the oligonucleotide probes hybridize under medium stringency or high stringency hybridization conditions to at least 500, 750, 1000 or 1500 rearrangement indicator sequence regions. In a preferred embodiment, the rearrangement indicator sequence regions are selected from the genomic regions listed in Tables 1-5.
The present disclosure also relates to methods for manufacturing microarray systems. In one embodiment of the disclosure, a method of constructing a microarray chip or microarray chip system for detecting copy number variations comprises identifying at least 500, 1000 or 1500 rearrangement indicator sequence regions, designing oligonucleotide probes corresponding to the genomic regions and arraying the oligonucleotide probes on a microarray chip.
In a further embodiment, the oligonucleotides described herein are arrayed on microarray chips. Optionally, the oligonucleotide probes are arrayed on at least 1, at least 2, at least 3 or at least 4 microarray chips. In one embodiment, one million probes are arrayed on each of two microarray chips for a total of two million probes.
In a preferred embodiment, the oligonucleotide probes are arrayed on the support using inkjet technology, for example, Agilent's Sureprint system.
The disclosure provides methods for detecting a genomic rearrangement in a subject. Optionally, the disclosure provides for the use of the microarray chips described herein for detecting genomic rearrangements.
The methods of the disclosure further relate to the use of the microarray chips for diagnosing genomic disorders and for diagnosing the propensity to develop a particular disorder. The methods of the disclosure also relate to the use of the microarray chips for identifying a genetic basis for known diseases and for characterizing the specific genomic rearrangements that lead to a particular genetic disorder. In another embodiment of the disclosure, the present microarray chips are used to identify novel genomic rearrangements.
In one embodiment, the method provides competitively hybridizing test DNA samples and reference DNA samples to at least one microarray chip comprising oligonucleotides complementary to at least 500, 750, 1000 or 1500 rearrangement indicator sequence regions. In some embodiments, the methods provide labeling a test DNA sample with a first label and a reference DNA sample with a second label. Optionally, the DNA is genomic DNA.
The term “genomic DNA” refers to deoxyribonucleic acids that are obtained from an organism. The organism may be a human subject, a mouse or any other organism of interest. “Genomic DNA” may be purified, isolated, amplified or fragmented DNA. Optionally, genomic DNA is obtained from biological samples including, but not limited to, cell, tissue, organ, body fluid and excretory samples.
In some embodiments, the test DNA sample is genomic DNA to be tested for genomic rearrangements or aberrations and the reference DNA sample is a standard for detecting differences between the test DNA sample and the reference DNA sample. The test DNA sample may be a genomic DNA sample from a subject believed or suspected to have at least one genomic rearrangement, a disease associated with at least one genomic rearrangement, or a rearrangement for a disease associated with at least one genomic rearrangement. For example, the subject can be a member of a family known to be affected by at least one genomic rearrangement or by a disease associated with at least one genomic rearrangement. In another embodiment, the test DNA sample is a genomic DNA sample from a subject, wherein the subject is to be tested or screened for at least one genomic rearrangement or for a disease associated with at least one genomic rearrangement.
In some embodiments, the reference DNA sample is genomic DNA from a subject who does not have at least one genomic rearrangement or a disease associated with at least one genomic rearrangement.
In a preferred embodiment, the test DNA sample and reference DNA sample are labeled with a substance that allows the quantity of each sample to be detected. Optionally, the labels are fluorescent labels or fluorophores. In one embodiment, the labels are Cy3 and Cy5.
According to the methods of the disclosure, the labeled test DNA and the labeled reference DNA are hybridized competitively to oligonucleotide probes. Optionally, the labeled test DNA and the labeled reference DNA are mixed together prior to hybridization. In one embodiment, the labeled test DNA and the labeled reference DNA are hybridized under high stringency or medium stringency conditions to a microarray chip described herein. In a preferred embodiment, the labeled test DNA and the labeled reference DNA are hybridized to at least one microarray comprising oligonucleotide probes complementary to at least 500, 750, 1000 or 1500 rearrangement indicator sequence regions.
The hybridization may be performed under any appropriate hybridization conditions. Preferably, the hybridization is carried out under medium stringency or high stringency hybridization conditions. Optionally, the hybridization is carried out at around 37° C., 48° C. or 60° C. for at least 24, 36, 48, 80 or 86 hours.
Genetic aberrations or rearrangements are optionally detected using the resultant fluorescent intensities as an indicator.
In one embodiment of the disclosure, the fluorescent intensity on the oligonucleotide probe is measured and genomic rearrangements are detected using fluorescence intensity as an indicator. In one embodiment, in order to detect a genomic rearrangement (for example, a duplication or deletion of the test DNA), the fluorescence intensity ratio of the labeling substance derived from the reference genomic DNA to labeling substance derived from the test genomic DNA is determined from the fluorescence intensity obtained. Fluorescence intensity can be determined, for example, using an image analyzer.
When the ratio of fluorescence intensity of the labeling substance derived from the reference DNA to the labeling substance derived from the test DNA is high, more reference DNA than test DNA has hybridized to the oligonucleotide probe and a potential genomic deletion in the test DNA (a decrease in copy number) is indicated.
When the ratio of fluorescence intensity of the labeling substance derived from the reference DNA to the labeling substance derived from the test DNA is low, more test DNA than reference DNA has hybridized to the oligonucleotide probe and a potential genomic duplication in the test DNA (an increase in copy number) is indicated.
In one embodiment of the disclosure, the analysis parameters for the microarray are as follows:
GC correction is applied on every 2 kb window of the genome using an ADM-2 algorithm.
The derivative log ratio spread (DLRS) must be <0.3, optionally <0.1; <0.2; <0.3; <0.4 or <0.5 for a sample.
The difference must be seen in at least 5 probes.
The log 2 ratio between the signal corresponding to the test sample and the signal corresponding to the reference sample must be >0.25 or <−0.25, optionally >0.1 or <−0.1; >0.15 or <−0.15; >0.2 or <−0.2; >0.25 or <−0.25; >0.3 or <−0.3; >0.35 or <−0.35; or >0.5 or <−0.5.
In one embodiment of the disclosure, a log 2 ratio of >0.1 or <−0.1; >0.15 or <−0.15; >0.2 or <−0.2; >0.25 or <−0.25; >0.3 or <−0.3; >0.35 or <−0.35; or >0.5 or <−0.5 indicates a putative genomic rearrangement. In another embodiment, a log 2 ratio of >0.25 or <−0.25, indicates a putative genomic rearrangement.
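By way of a non-limiting illustration, the thresholds described above (a log 2 ratio beyond ±0.25, support from at least 5 probes, and a DLRS below 0.3) can be sketched in Python. The sketch uses a simple run of consecutive probes rather than the ADM-2 algorithm, takes the DLRS as a precomputed quality value, and uses hypothetical function names and example data.

```python
import math
from typing import List, Sequence, Tuple


def log2_ratios(test: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Per-probe log2(test signal / reference signal)."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def passes_qc(dlrs_value: float, max_dlrs: float = 0.3) -> bool:
    """Sample-level quality gate on a precomputed DLRS value."""
    return dlrs_value < max_dlrs


def call_rearrangements(
    ratios: Sequence[float], threshold: float = 0.25, min_probes: int = 5
) -> List[Tuple[int, int, str]]:
    """Return (first_probe, end_probe_exclusive, kind) for qualifying runs.

    Assumption: a call is a run of consecutive probes that all exceed the
    threshold in the same direction. 'gain' suggests a putative duplication
    in the test DNA; 'loss' suggests a putative deletion.
    """
    calls = []
    run_start, run_kind = 0, None
    # A trailing 0.0 acts as a sentinel that closes any run still open.
    for i, value in enumerate(list(ratios) + [0.0]):
        kind = "gain" if value > threshold else "loss" if value < -threshold else None
        if kind != run_kind:
            if run_kind is not None and i - run_start >= min_probes:
                calls.append((run_start, i, run_kind))
            run_start, run_kind = i, kind
    return calls


if __name__ == "__main__":
    ratios = [0.0, 0.3, 0.4, 0.35, 0.3, 0.5, 0.0, -0.3, -0.3, 0.0]
    if passes_qc(dlrs_value=0.18):
        print(call_rearrangements(ratios))
```

In this example the five-probe gain is reported, while the two-probe loss is discarded because it does not meet the minimum probe count.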
In one embodiment of the disclosure, a method is provided for detecting genomic rearrangements associated with, or predisposing subjects to, developmental neurocognitive disorders (for example, autism spectrum disorder or Tourette syndrome) and complex autoimmune disorders (for example, psoriasis and ankylosing spondylitis).
In one embodiment, the method comprises obtaining genomic DNA from a test subject and genomic DNA from a reference subject. Optionally, the reference subject does not suffer from a developmental neurocognitive disorder (for example, autism spectrum disorder or Tourette syndrome) and/or a complex autoimmune disorder (for example, psoriasis or ankylosing spondylitis).
The genomic DNA from the test subject is optionally labeled with a first fluorophore, optionally Cy3 or Cy5, and the genomic DNA from the reference subject is optionally labeled with a second fluorophore, optionally Cy3 or Cy5. The labeled DNA is competitively hybridized to a microarray chip comprising oligonucleotide probes complementary to genomic regions known to be associated with genomic rearrangements that can indicate a developmental neurocognitive disorder (for example, autism spectrum disorder or Tourette syndrome) and/or a complex autoimmune disorder (for example, psoriasis or ankylosing spondylitis).
In one embodiment, genomic rearrangements in the following genes/regions can be used to indicate developmental neurocognitive disorders (for example, autism spectrum disorder or Tourette syndrome) and/or complex autoimmune disorders (for example, psoriasis and ankylosing spondylitis) or the propensity to develop such a disorder:
Using the microarray chips described herein, the inventors discovered a genomic region on chromosome 2 that is associated with novel copy number variants that segregate with Tourette Syndrome status.
Accordingly, the application discloses methods of screening for, diagnosing and/or detecting an increased risk of developing Tourette Syndrome in a subject comprising detecting the presence of a Tourette Syndrome copy number variant in a Tourette syndrome critical region within a sample of the subject, wherein the presence of a Tourette syndrome copy number variant in a Tourette syndrome critical region is indicative that the subject has Tourette syndrome and/or an increased risk of developing Tourette syndrome.
As used herein, Tourette syndrome (TS) refers to a developmental neuropsychiatric disorder characterized by the presence of motor (simple and/or complex) and verbal tics with duration longer than one year [Pauls et al. 1991; Price et al. 1985; State 2011]. TS often manifests with features associated with obsessive compulsive disorder (OCD), attention deficit hyperactivity disorder (ADHD), poor impulse control and other behavioural abnormalities.
As used herein, the phrase “screening for, diagnosing or detecting Tourette Syndrome” refers to a method or process of determining if a subject has Tourette Syndrome. Further, the phrase “screening for, diagnosing or detecting a risk of developing Tourette Syndrome” refers to a method or process of determining if a subject has an increased risk of developing Tourette Syndrome.
As used herein, the term “Tourette Syndrome critical region” refers to a genomic region wherein at least one copy number variant within the genomic region segregates with Tourette Syndrome status. “Segregates with Tourette Syndrome status” indicates that a copy number variant is associated with Tourette Syndrome. For example, a particular copy number variant (for example, a duplication or a deletion) is present in subjects who have Tourette Syndrome but is absent in subjects who do not have Tourette Syndrome.
In one embodiment, the “Tourette Syndrome critical region” is located on chromosome 2. In another embodiment, the “Tourette syndrome critical region” is located at 2q14.3-q21.2. In another embodiment, the “Tourette syndrome critical region” is located at 2q21.1-21.2.
As used herein, a “copy number variant” or CNV is a DNA sequence of one kilobase (kb) or longer (for example, at least 2, 5, 10, 30, 50, 100, 150 or 200 kb in length) that is present at a variable copy number in comparison with a reference genome. Examples of reference genomes include the human genome assemblies such as human genome assembly 19 (HG19) and human genomes NA18507, NA10851, NA15510 and NA07048.
Examples of copy number variants include “duplications” where the copy number of the DNA sequence is higher compared to a reference genome and “deletions” where the copy number of the DNA sequence is lower compared to a reference genome.
A “Tourette Syndrome copy number variant” refers to a copy number variant, for example a duplication or a deletion, that is associated with or useful for screening, diagnosing or detecting an increased risk of developing Tourette Syndrome when compared to a reference sequence (for example, human genome assembly 19). A “Tourette Syndrome copy number variant” is present in a higher or lower copy number in a subject with Tourette Syndrome or a subject with an increased risk of Tourette Syndrome compared to a subject not affected by Tourette Syndrome. The Tourette Syndrome copy number variant is in one embodiment inherited e.g. a germline mutation. In another embodiment, the copy number variant is sporadic.
In one embodiment, a “Tourette Syndrome copy number variant” is a duplication, i.e., a stretch of genomic sequence that is present in a higher copy number compared to a reference, or wild-type genomic sequence. In another embodiment, a “Tourette Syndrome copy number variant” is a deletion, i.e., a stretch of genomic sequence that is present in a lower copy number compared to a reference, or wild-type genomic sequence.
A reference genomic sequence is genomic DNA obtained from a subject who does not have Tourette syndrome or an increased risk of Tourette syndrome (also known as an “unaffected sample”). Reference genomic sequences include, but are not limited to, human genome sequences NA18507, NA10851, NA15510 and NA07048 and human genome assemblies such as human genome assembly 19 (HG19).
In one embodiment, a “Tourette Syndrome copy number variant” is located on chromosome 2. In another embodiment, the “Tourette Syndrome copy number variant” is located at 2q14.3-q21.2. In another embodiment, the “Tourette Syndrome copy number variant” is located at 2q21.1-21.2. In another embodiment, a “Tourette Syndrome copy number variant” is found within the C2orf27A gene.
In one embodiment, a “Tourette syndrome copy number variant” is a 38 kb duplication located at chromosome 2q21.1 within the genomic region corresponding to chr2:132305299-132343808 of human genome assembly 19 (HG19).
In another embodiment, a “Tourette syndrome copy number variant” is a 131 kb duplication located at chromosome 2q21.1 within the genomic region corresponding to chr2:132395155-132526804 of human genome assembly 19 (HG19).
In another embodiment, a “Tourette syndrome copy number variant” is a partial 30 kb duplication located at chromosome 2q21.1 within the genomic region corresponding to chr2:132480185-132510827 of human genome assembly 19 (HG19).
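As a non-limiting illustration, the sizes and relative positions of the three HG19 regions recited in this disclosure (chr2:132305299-132343808, chr2:132395155-132526804 and chr2:132480185-132510827) can be checked with a short Python sketch. The sketch assumes 1-based inclusive coordinates; the `Region` type and the function names are hypothetical.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_region(text: str) -> Region:
    """Parse a string such as 'chr2:132305299-132343808'."""
    match = re.fullmatch(r"(chr\w+):\s*(\d+)-(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("region start must not exceed region end")
    return Region(chrom, start, end)


def overlap_bp(a: Region, b: Region) -> int:
    """Number of base pairs shared by two regions (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


if __name__ == "__main__":
    dup_38kb = parse_region("chr2:132305299-132343808")
    dup_131kb = parse_region("chr2:132395155-132526804")
    dup_30kb = parse_region("chr2:132480185-132510827")
    for region in (dup_38kb, dup_131kb, dup_30kb):
        print(region, f"{region.length / 1000:.1f} kb")
    print("30 kb region inside 131 kb region:",
          overlap_bp(dup_30kb, dup_131kb) == dup_30kb.length)
```

Under these assumptions, the 30 kb interval falls entirely within the 131 kb interval, and the 38 kb interval does not overlap either of the other two.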
The term “corresponding to” as used herein means situated in a different sequence position but having sequence characteristics in common, including identical, or substantially identical, nucleotide sequence flanking the mutation (e.g., substantial identity is optionally at least 75% identity over four or more contiguous nucleotides). For example, “genomic region corresponding to chr2: 132305299-132343808 of human genome assembly 19” refers to a genomic region that is equivalently situated in terms of flanking sequence and relative position to chr2: 132305299-132343808 of human genome assembly 19 but that may be identified by a different genomic position in a different genome assembly or reference sequence. Further, “corresponding to” can refer to derived from or related to; for example, a nucleic acid corresponding to a gene refers to a nucleic acid derived from the gene such as a transcript and/or an amplified or synthetic copy related to the gene.
The terms “risk” and “increased risk” as used herein refer to a subject having a predisposition to developing a disease e.g. increased risk compared to the average risk of a population. The predisposition is optionally inherited, or optionally acquired (e.g. sporadic mutation). The increased risk is relative to a subject not having a Tourette Syndrome copy number variant.
The terms “sample” and “sample of a subject” as used herein refer to any sample of a subject that comprises nucleic acids, for example genomic DNA, and/or includes sequence or sequence data corresponding to genomic sequence. In one embodiment, the sample comprises blood, whole blood or a fraction thereof. In another embodiment, the sample is selected from the group consisting of fresh tissue such as a biopsy, frozen tissue and paraffin embedded tissue.
The term “subject” as used herein includes all members of the animal kingdom including multicellular organisms, including mammals, and preferably means humans.
Tourette Syndrome can be difficult to diagnose. The inventors have determined that the methods described herein identify individuals presymptomatically. Accordingly, in one embodiment, the individual is presymptomatic.
As used herein, “a relative” or “blood relation” is a relative genetically related, or related by birth, and includes without limitation 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th and 10th degree relations, for example but not limited to parents, children, grandchildren, grandparents, cousins and/or 2nd cousins related by blood.
Tourette Syndrome copy number variants are readily detected using isolated nucleic acids and/or compositions comprising isolated nucleic acids that are specific for a Tourette Syndrome copy number variant.
Accordingly in one aspect, the application provides isolated nucleic acids useful for detecting Tourette Syndrome copy number variants and compositions and reagents comprising isolated nucleic acids useful for detecting Tourette Syndrome copy number variants. Another aspect provides an isolated nucleic acid molecule comprising a nucleic acid sequence comprising a Tourette Syndrome copy number variant or a portion thereof.
The term “isolated nucleic acid sequence” and/or “oligonucleotide” as used herein refers to a nucleic acid substantially free of cellular material or culture medium when produced by recombinant DNA techniques, or chemical precursors, or other chemicals when chemically synthesized. The term “nucleic acid” and/or “oligonucleotide” as used herein refers to a sequence of nucleotide or nucleoside monomers consisting of naturally occurring bases, sugars, and intersugar (backbone) linkages, and is intended to include DNA and RNA which can be either double stranded or single stranded and represent the sense or antisense strand.
One aspect of the application provides an isolated nucleic acid molecule, wherein the isolated nucleic acid molecule hybridizes to:
(a) a Tourette Syndrome copy number variant or a portion thereof;
(b) a nucleic acid sequence complementary to a); and/or
(c) a nucleic acid sequence corresponding to a).
Optionally, the Tourette Syndrome copy number variant comprises genomic sequence of chromosome 2, optionally within region 2q21.1-21.2. In one embodiment, the Tourette Syndrome copy number variant corresponds to chr2:132395155-132526804, chr2:132305299-132343808 or chr2:132480185-132510827 of human genome assembly 19.
In one embodiment, the isolated nucleic acid molecule is a probe or a primer used to detect a Tourette Syndrome copy number variant.
The hybridization is optionally under high or medium stringency conditions. Appropriate stringency conditions which promote hybridization are known to those skilled in the art, or can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. High and medium stringency hybridization conditions are also described herein.
In an embodiment, the isolated nucleic acid molecule is useful as a primer. The term “primer” as used herein refers to a nucleic acid sequence, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand is induced (e.g. in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon factors, including temperature, sequences of the primer and the methods used. A primer typically contains 15-25 or more nucleotides, although it can contain less, for example, up to 5, 10, 12 or 15 nucleotides. The factors involved in determining the appropriate length of primer are readily known to one of ordinary skill in the art.
In one embodiment, the primers hybridize under medium or high stringency conditions to the Tourette Syndrome copy number variants described herein and allow amplification of a Tourette Syndrome copy number variant or a portion thereof. As used in relation to a primer, “a portion thereof” of a Tourette Syndrome copy number variant refers to a portion sufficient to prime amplification of the intended template.
In another embodiment, the application describes probes that are useful for detecting a Tourette Syndrome copy number variant.
The term “probe” as used herein refers to a nucleic acid sequence that will hybridize to a nucleic acid target sequence. In one example, the probe hybridizes to a Tourette Syndrome copy number variant or a nucleic acid sequence complementary to a Tourette Syndrome copy number variant. The length of the probe depends on the hybridization conditions and the sequences of the probe and nucleic acid target sequence. In one embodiment, the probe is at least 8, 10, 15, 20, 25, 50, 75, 100, 150, 200, 250, 400, 500 or more nucleotides in length. As used in relation to a probe, “a portion thereof” of a Tourette Syndrome copy number variant refers to a portion sufficient to specifically hybridize to the intended template.
Another aspect of the application provides an isolated nucleic acid molecule which has at least 75, 80, 85, 90, 95 or 99% sequence identity to a Tourette Syndrome copy number variant or a portion thereof. In another embodiment, an isolated nucleic acid molecule is provided which has at least 75, 80, 85, 90, 95 or 99% sequence identity to the complement of a Tourette Syndrome copy number variant or a portion thereof.
The term “sequence identity” as used herein refers to the percentage of sequence identity between two nucleic acid sequences. To determine the percent identity of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a nucleic acid sequence for optimal alignment with a second nucleic acid sequence). The nucleotides at corresponding nucleotide positions are then compared. When a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = number of identical overlapping positions/total number of positions × 100%). In one embodiment, the two sequences are the same length. The determination of percent identity between two sequences can also be accomplished using a mathematical algorithm. A preferred, non-limiting example of a mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. U.S.A. 87:2264-2268, modified as in Karlin and Altschul, 1993, Proc. Natl. Acad. Sci. U.S.A. 90:5873-5877. Such an algorithm is incorporated into the NBLAST and XBLAST programs of Altschul et al., 1990, J. Mol. Biol. 215:403. BLAST nucleotide searches can be performed with the NBLAST nucleotide program parameters set, e.g., for score=100, wordlength=12 to obtain nucleotide sequences homologous to the nucleic acid molecules of the present application. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., 1997, Nucleic Acids Res. 25:3389-3402. Alternatively, PSI-BLAST can be used to perform an iterated search which detects distant relationships between molecules (Id.).
When utilizing BLAST, Gapped BLAST, and PSI-BLAST programs, the default parameters of the respective programs (e.g., of XBLAST and NBLAST) can be used (see, e.g., the NCBI website). The percent identity between two sequences can be determined using techniques similar to those described above, with or without allowing gaps. In calculating percent identity, typically only exact matches are counted.
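By way of a non-limiting illustration, the percent identity calculation described above (number of identical positions divided by the total number of positions, times 100%) can be sketched for two sequences that have already been aligned. The function name and the use of `-` as the gap symbol are assumptions; the sketch does not perform the alignment itself and is not a substitute for BLAST.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Only exact matches are counted; a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    a, b = aligned_a.upper(), aligned_b.upper()
    identical = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return 100.0 * identical / len(a)


if __name__ == "__main__":
    # One mismatch in eight aligned positions.
    print(percent_identity("ACGTACGT", "ACGTTCGT"))
    # One gap in five aligned positions.
    print(percent_identity("ACG-T", "ACGAT"))
```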
In certain embodiments the isolated nucleic acid comprises a detectable label, such as a fluorescent or radioactive label.
Another aspect of the disclosure provides a reagent for detecting and/or amplifying a Tourette Syndrome copy number variant, such as the isolated nucleic acid primers described herein.
In one embodiment, a reagent for detecting a Tourette Syndrome copy number variant comprises an isolated nucleic acid molecule comprising:
a) an isolated nucleic acid molecule, wherein the isolated nucleic acid molecule hybridizes to a Tourette Syndrome copy number variant or a portion thereof; or
b) a nucleic acid molecule with at least 80%, 90%, 95%, or 99% sequence identity to a), characterized in that the nucleic acid molecule is capable of binding a Tourette Syndrome copy number variant under stringent conditions.
A person skilled in the art will appreciate that a number of methods are useful for detecting the presence of a Tourette Syndrome copy number variant. For example, a variety of techniques are known in the art for detecting copy number variants within a sample of nucleic acid, including, but not limited to, PCR, RT-PCR, QF-PCR, fluorescence in situ hybridization (FISH) and microarray analysis.
A Tourette Syndrome copy number variant is optionally detected using the microarrays described herein which are designed to detect copy number variants.
In one embodiment, genomic DNA from a test subject who may have Tourette syndrome or be at a risk of Tourette Syndrome is optionally labeled with a first fluorophore, optionally Cy3 or Cy5, and the genomic DNA from a reference subject or a reference genome is optionally labeled with a second fluorophore, optionally Cy3 or Cy5. The labeled DNA is competitively hybridized to a microarray chip comprising oligonucleotide probes complementary to a Tourette Syndrome critical region. Differential binding of the test genomic DNA sample and a reference genomic DNA sample to at least one oligonucleotide probe complementary to a Tourette Syndrome critical region indicates a Tourette Syndrome copy number variant. For example, higher binding of the test genomic DNA sample than the reference genomic DNA sample to at least one oligonucleotide probe indicates a duplication associated with Tourette Syndrome and higher binding of the reference genomic DNA sample than the test genomic DNA sample to at least one oligonucleotide probe indicates a deletion associated with Tourette Syndrome.
In another embodiment, a Tourette Syndrome copy number variant is optionally detected using Quantitative Fluorescent PCR (QF-PCR). Using this method, primers are used to amplify genomic sequence contained within a Tourette Syndrome copy number variant such as the Tourette Syndrome copy number variants described herein. A person of skill in the art could readily design primers to amplify a copy number variant. The primers are used to amplify genomic sequence from a test subject and a reference standard (for example a reference sequence such as human genome assembly 19). QF-PCR is used to analyze the amount of nucleic acid amplified from the test subject and the reference standard. An increase in amplified sequence from the test subject compared to the reference standard indicates a duplication in the test subject. A decrease in amplified sequence from the test subject compared to the reference standard indicates a deletion in the test subject.
In one embodiment, Tourette Syndrome copy number variants are first detected using a microarray, and then the copy number variant is confirmed using a secondary method such as QF-PCR.
Another aspect of the disclosure is a kit for screening for, diagnosing the presence of, or detecting a risk of developing, Tourette Syndrome. In one embodiment, the kit comprises one or more isolated nucleic acid molecules and/or reagents described herein and instructions for use. In another embodiment, the kit comprises one or more isolated nucleic acid molecules and/or reagents described herein and a container for holding or storing the isolated nucleic acid molecules and/or reagents.
In an embodiment, the kit comprises an isolated nucleic acid molecule or composition that specifically hybridizes to a Tourette Syndrome copy number variant, e.g. a probe or a primer. In an embodiment, the nucleic acid molecule sequence is complementary to a Tourette Syndrome copy number variant or a portion thereof or the complement thereof. In another embodiment, the nucleic acid molecule comprises a detectable label such as a fluorescent molecule. In a further embodiment, the kit comprises an isolated nucleic acid molecule useful as a primer.
In certain embodiments, the kit is a diagnostic kit for medical use. In other embodiments, the kit is a diagnostic kit for laboratory use.
In another aspect the disclosure provides a commercial package comprising an isolated nucleic acid or reagent described herein and instructions for use.
The following non-limiting examples are illustrative of the present disclosure:
Embodiments of the disclosure will be illustrated in a non-limiting way by reference to the examples below.
Given that SDs intuitively consist of common repeat elements, SDs were fragmented into multiple smaller SD units which did not overlap with known repeat elements during the read depth-based analysis. In this study, 20,237 non-redundant sets of SD units with at least one inter- or intra-chromosomal rearrangement event were identified, representing 16.65 Mbp of SD units residing outside of common repeat elements in the human genome. At first glance, this total content of SDs may appear small compared with that previously reported [Bailey J A et al. (2002)] and that reported in the database of genomic variants (DGV). The difference is mainly attributed to methodological differences (i.e., exclusion of common repeats, GC-correction, shorter window length, low read depth threshold). Results from this study and from Perry et al. [Perry H G et al. (2008)] suggest that previously reported SD breakpoints are overinflated in size, further emphasizing the importance of creating a high-resolution map of ‘rearrangement hotspots’. Read depth distribution for duplicated and non-duplicated regions throughout the genome produced a distinctive distribution pattern with an approximate 7% error rate.
Considering CNVs have a tendency to overlap with nearby SD breakpoints, the results of this study were compared with a recent study which identified common CNV breakpoints in three populations (i.e., 57 Yoruba, 48 European and 54 Asian individuals) [Sudmant P H, 2010]. The detected autosomal SD units greater than 200 bp shared 82% concordance (i.e., >50% overlap) with common CNV breakpoints using low coverage short-read data. Moreover, 79% of breakpoints residing within genes with >3 copies, as previously reported [Alkan C, 2009], were located within SD breakpoints identified in this study.
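The concordance measure above can be illustrated with a short sketch. It assumes half-open coordinate intervals on a single chromosome and measures overlap as a fraction of the SD unit length; the text does not say whether the 50% criterion was one-sided or reciprocal, so that choice, like the function names, is an assumption.

```python
def overlap_fraction(unit, cnv):
    """Fraction of an SD unit (start, end) covered by a CNV (start, end)."""
    (u_start, u_end), (c_start, c_end) = unit, cnv
    shared = max(0, min(u_end, c_end) - max(u_start, c_start))
    return shared / (u_end - u_start)

def concordance(units, cnvs, min_fraction=0.5):
    """Share of SD units with >min_fraction overlap with at least one CNV."""
    hits = sum(
        any(overlap_fraction(u, c) > min_fraction for c in cnvs)
        for u in units
    )
    return hits / len(units)
```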
Comparison with previous read depth-based reports highlights the advantages of the present hierarchical strategy which include: 1) the use of a 100 bp read depth window with a 1 bp overlap, which enabled detection of SD units with higher resolution; 2) the use of a lower threshold (i.e., mean+2 standard deviations) than previously reported methods in order to detect homozygous and hemizygous duplications; 3) fragmentation of SDs into smaller SD units in order to separate duplicated regions from common repeated elements while reducing alignment bias for rearrangement analysis and computational time; and 4) integration of the end space alignment algorithm with a ‘seed and extend’ clustering technique applied to the duplicated region of the reference guided assembly sequences to perform an exhaustive search (i.e., 409 million alignments) to identify rearrangement breakpoints.
Compared with copy number gains identified using microarray analysis [Conrad, D F et al. (2010)], sequencing data used in this study revealed that autosomal SD unit breakpoints overlapped 54% with copy number gains [Conrad, D F et al. (2010)], which increased to 67% when compared with 43× coverage [Sudmant P H et al. (2010)]. Discrepancies are attributed to methodological biases, as detection of structural variants can be specific to different methodological approaches and discrepancies between methods can be as high as 80% [Alkan C et al. (2011)]. The rearrangement analysis within the novel sequence revealed multiple hits within the duplicated sequences (i.e., >90% similarity) that were previously uncharacterized.
Using 409 million pairwise alignments, 1963 complex SD units or ‘rearrangement hotspots’ within SDs in the human genome with significantly high distribution of duplicons (p<1.0×10−6) with at least 10 duplicons per SD unit were identified (
Segmental duplications (SDs) can be categorized according to the location of the rearrangement considering that recombination events can occur between homologues (i.e., inter-chromosomal) or by looping out within a single homologue (i.e., intra-chromosomal). The analysis revealed that 7% of genes (i.e., 1,626/22,159) overlapped with 5,502 non-redundant SD units which represented 73% (i.e., 41/56) of the most highly variable genes previously identified in the human genome within three populations [Sudmant P H et al. (2010)] (
Previous cytogenetic studies have demonstrated that pericentromeric and subtelomeric SD regions are strikingly polymorphic and both represent hotbeds for genomic rearrangement [Mefford H et al. (2002) and She X et al. (2004)]. Investigation of recombination within SD units revealed that pericentromeric regions of chromosomes 2, 5, 7, 10, 15, 16, 17, 22 and Y were enriched with inter-chromosomal recombination, whereas only chromosome 11 was associated with intra-chromosomal breakpoints. Subtelomeric regions of chromosomes 1, 2, 4, 7, 9, 10, 11, 16, 19, 20, 22, and X were enriched with inter-chromosomal recombination, whereas chromosomes 3, 6, 12, 13, 14 and Y were associated with extreme intra-chromosomal breakpoints. This idiosyncratic rearrangement pattern suggests that multiple translocations involving distal regions of chromosomes create complex breakpoints within SDs. This is exemplified by the pseudoautosomal region 1 (PAR1) which displayed extensive inter- and intra-chromosomal tandem duplications, consistent with sex chromosome evolution. Another complex region where extensive intra-chromosomal rearrangements were identified is the distal heterochromatic region of the Y chromosome (i.e., Yq12), housing the male specific (MSY) region. In the analysis, both homozygous and hemizygous duplications were detected using read depth information which represents an extension to previous SD analysis [Sudmant P H et al. (2010) and Alkan C et al. (2009)] by the inclusion of sex chromosomes.
Complex rearrangements were identified in multiple gene families, including rapid evolution of the NBPF, PRAME, RGPD, GAGE, LRRC, TBC1, NPIP and TRIM gene families. Without being bound by theory, this appears to be predominantly attributed to intra-chromosomal gene transfer, whereas other complex gene families (e.g., ANKRD, OR, GUSB, FAM, POTE, ZNF and GOLG) appear to be more diverse with respect to transfer of gene content, occurring both within and between chromosomes. As previously reported [Alkan C et al. (2009)], the DUX gene family was associated with the most copies within the reference genome. The rearrangement analysis of the novel sequence within the 10q26.3 region suggests that at least 10 additional copies of the DUX4 gene are specific to novel sequences within the NA18507 human genome.
Gene Ontology Analysis within ‘Rearrangement Hotspots’
To investigate the impact of genes residing within ‘rearrangement hotspot’ regions identified in this study and their relation to complex disease, genes were functionally categorized using PANTHER gene ontology analysis. Genes residing within ‘rearrangement hotspot’ regions appear to be involved in functions associated primarily with nucleic acid metabolism (22%) and cellular processes (16%), although associations also exist for developmental process (9%), cell cycle (9%), and cell communication (8%). This finding is consistent with a previous report in which copy number gains were associated with genes involved in nucleic acid metabolism and developmental processes, whereas copy number losses were enriched for genes involved in cell adhesion [Park H et al. (2010)]. That genes residing in ‘rearrangement hotspot’ regions are consistently associated with functions affecting multiple processes important in normal growth and development further underscores the critical role that rearrangement hotspots play in the genetic etiology of complex disease.
A genome-wide high resolution map of ‘rearrangement hotspots’ has been produced. Without being bound by theory, these ‘rearrangement hotspots’ likely serve as templates for NAHR and consequently may represent an underlying mechanism for development of constitutional and acquired diseases arising from de novo deletions or duplications. A collection of 24 previously identified genomic disorders predominantly mediated by de novo NAHR events is catalogued in the DECIPHER database [DECIPHER Genomic Aberration Database: http://decipher.sanger.ac.uk/]. Comparison of the hotspot regions identified in the present study with pathogenic deletion/duplication breakpoints mapped for those genomic disorders constituting only 15 common genomic loci revealed that 20% of the detected hotspots are clustered within proximal and distal SDs that are flanked by these pathogenic deletions/duplications (
The rearrangement structure of these hotspots based on the present in silico predictions (
A third complex region revealed a previously uncharacterized gene desert within 1q21, indicating a possible harvest region for the NBPF gene family. This 68 Kbp gene desert region revealed extreme intra-chromosomal rearrangement without any signature of inter-chromosomal duplication in our in silico analysis (
Data Acquisition and Processing
Short read data was obtained for the NA18507 human genome sequenced using reversible terminator chemistry on an Illumina Genome Analyzer [Bentley D R, 2008]. The original data consisted of >30× coverage of the genome. More than half of the data was obtained from the Short Read Archive Provisional FTP (NCBI) site (ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/SRA000271/) with an average read length of approximately 36 bp. The analysis accuracy of this dataset has been previously described [Bentley D R et al. (2008)]. The 4.8 Mb novel sequence detected in the NA18507 genome by a previous de novo assembly was also integrated in our rearrangement analysis. The length distribution revealed that the contigs/scaffolds are over-fragmented and >80% of the sequence length is <1 kb in length. The NA18507 human genome was selected as it is representative of the ancestral African Yoruban population which has been previously shown to contain the most diverse polymorphisms compared with other populations [Sudmant P H et al. (2010) and Alkan C et al. (2009)], rendering it an ideal sample to generate a ‘rearrangement hotspot’ map as the majority of the hotspot regions detected should exist within other populations.
mrsFAST (micro-read substitution only fast alignment search tool, version 2.3.0.2) was applied, which, unlike other short read mapping algorithms, implements an all-to-all algorithm [Hach F et al. (2010)]. Specifically, it is a fast alignment search tool which uses a cache-oblivious short read mapping algorithm to align short reads in an individual genome against a repeat masked reference human genome within a user-specified number of mismatches. The short reads were mapped using mrsFAST with a maximum of two mismatches allowed against the repeat masked (UCSC hg18) genome assembly. mrsFAST returns all possible hits in the genome for a short read, allowing the detection of differential read depth distribution within duplicated regions of the human genome. Using the NA18507 human genome (18× coverage), 1.5 billion short reads were processed with 55.78% (i.e., ~839 million short reads) mapped to the repeat masked human reference genome with the mrsFAST aligner, which returned all possible mapping locations of a read; this is a key requirement for accurately predicting the duplicated regions within the reference genome.
There exists a known bias with next generation sequencing technology towards GC-rich and GC-poor regions. Moreover, during library preparation using an Illumina Genome Analyzer, amplification artefacts are introduced in both GC-poor and GC-rich regions, producing an uneven distribution of read coverage [Alkan C et al. (2009)] which has the potential to produce false positive duplicated regions. To reduce this bias, a simple GC correction method was used. Overlapping windows (i.e., by 1 bp) with length ‘l’ were used for read depth computation. Each read was assigned only once by its starting position and read depth was computed for each chromosomal position. The original mean read depth was calculated for each block of length l (i.e., 100 bp) using equation (1). The G+C percentage for every 100 bp window from the reference human genome and the read depth were computed and subsequently interrogated for adjustment. The adjusted read depth was computed using the following equation:
where RDi, adjusted is the read depth after GC correction, RDi is the original read depth computed for the ith window, m1 is the overall median of all the windows with 100 bp length and m2 is the mean depth for all windows with the same GC percentage. All subsequent analysis was carried out on the GC-corrected read depth.
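A minimal sketch of such a GC correction follows. It assumes the adjustment takes the ratio form RDi, adjusted = RDi × m1/m2, which is consistent with the symbol definitions above but is not reproduced from the equation itself; the function name and the use of integer GC-percentage bins are likewise assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, median

def gc_correct(read_depths, gc_percents):
    """GC-correct per-window read depths.

    Assumed form: RD_adjusted = RD_i * m1 / m2, where m1 is the overall
    median depth and m2 is the mean depth of windows sharing the same
    (integer) GC percentage.
    """
    m1 = median(read_depths)
    by_gc = defaultdict(list)
    for rd, gc in zip(read_depths, gc_percents):
        by_gc[gc].append(rd)
    m2 = {gc: mean(values) for gc, values in by_gc.items()}
    return [
        rd * m1 / m2[gc] if m2[gc] > 0 else 0.0
        for rd, gc in zip(read_depths, gc_percents)
    ]
```

Under this form, windows in a GC bin with systematically high coverage are scaled down and windows in a low-coverage bin are scaled up, so every bin has a mean depth near m1 after correction.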
The first step in dissecting SD unit breakpoints using the NA18507 genome from all hit map information was to compute read depth from short read sequence mapping and detect SD intervals that do not overlap with a repeat region of the genome. Read depth was computed for each point after obtaining mapped anchoring positions of the short reads from mrsFAST. A table was built for each chromosome, each containing coordinates where the common repeats are located. The read depth mean was computed for a chromosome from the genome content excluding common repeat regions. For each window with l length (100 bp) an event was determined. Events with excessive read depth and with a deletion were detected using equation (2).
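The excess read depth test can be sketched as follows, assuming the mean+2 standard deviations threshold mentioned earlier in this example and a background computed only from windows outside common repeats. The deletion criterion is not modeled because its threshold is not stated in this text; the function and argument names are hypothetical.

```python
from statistics import mean, stdev

def flag_excess_windows(window_depths, in_repeat, n_sd=2.0):
    """Flag non-repeat windows whose depth exceeds mean + n_sd * SD.

    `in_repeat[i]` is True when window i overlaps a common repeat; such
    windows are excluded from the background and are never flagged.
    """
    background = [d for d, rep in zip(window_depths, in_repeat) if not rep]
    cutoff = mean(background) + n_sd * stdev(background)
    return [(not rep) and d > cutoff
            for d, rep in zip(window_depths, in_repeat)]
```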
To determine whether an interrogated window falls within a common repeat element, a library of the repeat masked regions (masked interspersed repeats, i.e. LINES, SINES, etc.) of the human genome was built. The mean length of the detected SD units was 822 bp. The read depth distributions of the detected duplication subunits and the non-duplicated regions of the genome show significant read depth differences with an approximately 7% error rate.
The current version of mrsFAST does not return the quality of the aligned reads within a consensus genome. Instead, MAQ version 0.7.1 (Mapping and Assembly with Quality) was used which assembles genomes with a specified quality. MAQ searches for the un-gapped match with lowest mismatch score (i.e., maximum of 2) in the first 28 bp. To confidently map alignments, MAQ assigns each alignment a Phred scaled quality score which measures the probability that the true alignment is not the alignment that is detected by MAQ. If a short read maps to multiple positions in the genome, MAQ will randomly pick one position and give the excluded position a mapping quality of zero. The NA18507 genome short reads were mapped and assembled into the reference genome using MAQ allowing at most 2 mismatches.
Using read depth as a measure to detect SD unit breakpoints may produce regions that share <90% sequence identity. To reduce false positives and computational burden after detecting SD unit breakpoints, a basic version of the end space alignment algorithm (without the seed and extend approach) was utilized and a pairwise alignment of each SD unit against the rest of the genome SD units was performed. Only those SD units that contained at least one duplicon >100 bp with >90% sequence identity were included in the rearrangement analysis described in the following section. When every 100 bp window was assessed for a possible rearrangement, 20,237 SD units were detected.
The ability to detect highly homologous regions between two sequences is essential for duplicon detection. Multiple clusters of non-adjacent duplicons with >90% sequence identity cannot be mapped using basic alignment algorithms. As previously reported, the basic pairwise global alignment algorithm will miss duplicon breakpoints that are non-adjacent within an SD with different thresholds of sequence identity [6]. Semi-global alignment has a tendency to produce pattern-like alignments (see example below), which are not informative for complex regions with multiple duplications. A modified version of the pairwise alignment algorithm was implemented where the alignments are scored ignoring end spaces of the two sequences. Adding the option of end spaces in our alignment does not produce pattern-like alignments and therefore accurately pinpoints the breakpoints of the duplicon with an allowed gap that crosses the threshold of >90% sequence identity. The neutral rate of evolutionary decay suggests that 10% sequence divergence is required to accurately detect duplications that are primate-specific [Gu W et al. (2008)].
In order to implement the algorithm, a dynamic programming technique was utilized which is a modified version of Smith-Waterman dynamic programming [Smith T F et al. (1981)]. This approach will detect the pairwise alignment relative to a penalty function corresponding to semi-global alignment. The dynamic programming (DP) algorithm was used to compute the above alignments and the backtrack pointer was used to identify the best alignment.
As a core searching algorithm, a penalty function was implemented to complete the dynamic programming matrix M. First, the first column and row were initialized with zeroes, which provided forgiving spaces at the beginning of the sequences in order to obtain the highest similarity between the interrogated sequences. The intention was to locate duplicons between a pair of sequences (i.e., s and t) with >90% identity and alignment with minimal gaps to avoid pattern-like structures. “A” was encoded with 1, “G” with 2, “C” with 3 and “T” with 4 to construct the (m+1)×(n+1) DP matrix M, where m and n are the lengths of the two given sequences s and t, respectively. The algorithm uses a dynamic programming technique to fill the matrix M by looking up a penalty function from the 5×5 matrix C. A penalty function g(i,j) was introduced with a score of 2 for a matched alignment. For mismatches between a pair of bases, a penalty of −2 was introduced, and a penalty of −3 was introduced for misaligned sequence produced by sequence assembly tools (i.e., MAQ). The −3 penalty was used to reduce the amount of misaligned sequence entering duplicon identification. To allow the algorithm to ignore the end positions of the sequences where they have low similarity, a trace back from the highest value returned by function Sim(s,t) in the matrix M was performed. For any two given sequences (i.e., s and t), a semi-global alignment is an alignment between a substring (in this case a duplicon) of s and a substring of t.
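The fill and trace back steps can be sketched as below. This is an illustration under stated assumptions rather than the implementation used in the study: bases are compared directly instead of through the encoded 5×5 matrix C, the −3 penalty is applied to gaps (the text states it for misaligned sequence), and the trace back stops when the first row or column is reached. The function name is hypothetical.

```python
def align_ignoring_end_spaces(s, t, match=2, mismatch=-2, gap=-3):
    """Best alignment of s and t when end spaces are not penalized.

    Returns (score, identity, aligned_s, aligned_t). Scores: +2 match,
    -2 mismatch, -3 gap (the gap value is an assumption).
    """
    m, n = len(s), len(t)
    # First row and column stay zero: leading end spaces are free.
    M = [[0] * (n + 1) for _ in range(m + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1] + sub,
                          M[i - 1][j] + gap,
                          M[i][j - 1] + gap)
            if M[i][j] > best:  # trailing end spaces are free as well
                best, bi, bj = M[i][j], i, j
    # Trace back from the highest-scoring cell to the first row or column.
    a, b = [], []
    i, j = bi, bj
    while i > 0 and j > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if M[i][j] == M[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif M[i][j] == M[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    a.reverse(); b.reverse()
    matches = sum(x == y for x, y in zip(a, b))
    identity = matches / len(a) if a else 0.0
    return best, identity, "".join(a), "".join(b)
```

For example, aligning "TTTTACGTACGT" against "ACGTACGTCCCC" recovers the shared 8 bp block and leaves the unrelated ends unaligned; the returned identity can then be compared against the 90% threshold.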
The memory requirement to fill out the DP matrix M is O(mn). The computational time to complete the dynamic programming matrix M and to determine the maximum value in M for a given pair of sequences s and t of nearly similar length is O(n²), and tracing back from the maximum point in the matrix to obtain the optimal alignment takes O(m+n) time.
Ignoring end spaces might fail to detect true breakpoints and, for long sequences, might produce very short alignments. Considering that the majority of the commonly used alignment search methods (i.e., BLAST, BLAT, and SHRiMP) implement a “seed and extend” method to obtain faster sequence comparison [Altschul S F, 1990; Kent 2002; Yanovsky V, 2008; Mi, 2010], this method was also applied in this study. To perform an exhaustive search within the scope of 100 bp windows for any two given segmental unit sequences obtained from the NA18507 genome, the dynamic programming algorithm was applied to each 100 bp window with 10 bp overlaps as “seeds”. The highly similar seeds (>90%) went through the “extend” step and the rest were ignored. As this approach might detect the same breakpoints multiple times if multiple seeding events are obtained from a highly duplicated region, the previously extended duplicon breakpoints from the same SD unit and the overlapping “seeds” were compared and only the maximum extended duplicon was kept. ‘Extend’ is a recursive procedure which extends bi-directionally by 10 bp, and the extend step ceases in each direction when further extension does not cross the sequence identity threshold. As a result, the procedure terminates when further extension in both directions returns <90% sequence identity.
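The ‘extend’ step can be illustrated with a simplified sketch. Several simplifications are assumed here and are not taken from the text: the extension is ungapped, identity is measured over the whole extended duplicon rather than over the newly added 10 bp, and a loop is used in place of recursion. All names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def extend_seed(s, t, s_start, t_start, seed_len=100, step=10, threshold=0.90):
    """Extend an ungapped seed in both directions by `step` bp at a time.

    Extension in a direction is accepted only while the identity of the
    whole extended region stays above `threshold`. Returns the (start, end)
    coordinates of the extended duplicon in s and in t.
    """
    left, right = 0, seed_len  # offsets relative to the seed start
    grew = True
    while grew:
        grew = False
        # Try to extend to the right.
        if s_start + right + step <= len(s) and t_start + right + step <= len(t):
            if identity(s[s_start + left:s_start + right + step],
                        t[t_start + left:t_start + right + step]) > threshold:
                right += step
                grew = True
        # Try to extend to the left.
        if s_start + left - step >= 0 and t_start + left - step >= 0:
            if identity(s[s_start + left - step:s_start + right],
                        t[t_start + left - step:t_start + right]) > threshold:
                left -= step
                grew = True
    return ((s_start + left, s_start + right),
            (t_start + left, t_start + right))
```

Because extension proceeds in 10 bp steps, the reported boundaries are only resolved to that step size.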
Cytogenetic preparations were made from lymphoblastoid culture (obtained from Coriell cell repositories) for the NA18507 sample. The cell suspension was dropped on slides using a Thermotron, aged overnight and hybridized with test (i.e., spectrum orange) and control probes. Following post-hybridization washes and 4′,6-diamidino-2-phenylindole (DAPI) counterstaining, slides were analyzed using fluorescence microscopy. Pseudocoloring and image editing were performed using Photoshop software. To validate duplicon rearrangement within SD units, three complex regions in the human genome were selected: 1q21.1, 16p12.1 and 22q11.21. In this study, fosmid genomic clones corresponding to a duplicated locus were used as probes against chromosomal metaphase spreads. The localization of FISH clones within these regions and the corresponding derivative loci validated >94% (i.e., 17/18) of the in silico co-localization predictions. The FISH technique was unable to provide a precise estimate of rearrangement at the level of 100 bp due to resolution limitations.
The basic analyses were conducted using a permutation procedure to assess statistical significance of 1-sided tests. The rearrangement for each SD unit was permuted randomly between the two groups and the test statistic was computed in each permutation. All results reported in this study used 1 million permutations to derive an empirical p-value.
Gene ontology data analysis was performed using the PANTHER (version 7.0) database [Mi H et al. (2010)]. The biological processes of the hotspot genes were analyzed.
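A one-sided permutation test of this kind can be sketched as follows. The test statistic is not named in the text; a difference in group means is assumed here for illustration, and the add-one correction on the empirical p-value is also an assumption.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided empirical p-value that mean(group_a) exceeds mean(group_b).

    Group labels are shuffled n_permutations times; the p-value is the
    share of shuffles with a difference at least as large as observed.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)
```

With 1 million permutations, as used in the study, the smallest attainable value under this correction is about 1.0×10−6.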
A custom aCGH microarray was designed based on the rearrangement hotspots identified in Example 1. In all, approximately 500 Mb of the human genomic sequence was covered within a 2×1 million probe (1 M) microarray. The Agilent custom microarray identification numbers are 035313 and 035316.
The genomic regions covered by the microarray were chosen as follows:
a) All the breakpoints (i.e., “rearrangement hotspots”) identified in Example 1 were accommodated (Table 1).
b) The location of the hotspots and their distance from each other were considered. If two hotspots were within 1 Mb of each other, the entire region between the two hotspots was included.
c) Known CNV regions previously identified in the literature were included.
d) At least 1 Mb of the telomeric and centromeric regions for all chromosomes was also included.
Probes were designed to be 45-60 base pairs in length. Probe spacing ranges between 190-500 bp with a mean spacing of 280 bp within each genomic region covered by the array.
Tables 2-5 contain the specific coordinates based on the NA18507 human genome corresponding to the 500 Mb of genomic sequence covered by the microarray. In all, approximately 10% of the probes correspond to previously detected Copy Number Variants and 90% of the probes correspond to regions susceptible to genomic alteration based on the computational analysis described above.
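Criterion b) above amounts to merging nearby intervals. A minimal sketch, assuming hotspots are given as (chromosome, start, end) tuples and that "within 1 Mb" means a gap of at most 1,000,000 bp between the end of one hotspot and the start of the next; the function name is hypothetical.

```python
def merge_hotspots(hotspots, max_gap=1_000_000):
    """Merge hotspots on the same chromosome separated by at most max_gap bp.

    The merged region spans both hotspots and everything between them.
    """
    merged = []
    for chrom, start, end in sorted(hotspots):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```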
The microarray chips of Example 2 were used to detect genomic regions that have undergone complex structural rearrangements predisposing subjects to Developmental Neurocognitive Disorders (DND) and Complex Autoimmune Disorders.
Four families afflicted with Autism Spectrum Disorder (ASD), two families afflicted with psoriasis (Ps) and one family afflicted with Ankylosing Spondylitis (AS) were studied.
Genomic DNA samples were obtained from the subjects. The samples were first cleaned using the QIAamp DNA Micro kit (Qiagen Cat#56304, lot#433156339). Each sample was eluted in a final volume of 95 μl in the Buffer AE provided with the kit. Each sample was then submitted to NanoDrop absorbance measurements for quantitation and quality analysis. A 2% agarose 48-well E-Gel® (Invitrogen #G800802) was run to check the quality of the gDNA.
Based on the NanoDrop results, 1.5 μg of each sample (in duplicate) was prepared, along with 1.5 μg of the control associated with each sample (also in duplicate). The sample labeling was done by adding Random Primer to the samples before denaturation and fragmentation in a thermal cycler (AB Applied Biosystems #GeneAmp PCR system 9700) at 95° C. for 10 minutes and 4° C. for 5 minutes, followed by incubation on ice for 5 minutes. The Labeling Master mix was added to each tube (Cy3 for sample and Cy5 for control). Samples were transferred to a thermal cycler for 2 hours at 37° C., 10 minutes at 65° C. and a 4° C. hold. The samples were then moved to ice and cleaned using an Amicon 30 kD filter unit.
The cleaning was done with 1×TE buffer from Promega (TE Buffer, 1×, Molecular Grade (pH 8.0), a buffer composed of 10 mM Tris-HCl containing 1 mM EDTA Na2, pH at 25° C.). The final volumes obtained were around 21 μl. The duplicates of each sample and of each control were combined and the volume adjusted to 161 μl. 1.5 μl of each combined sample was used to determine yield and specific activity using a NanoDrop spectrophotometer with the function MicroArray Measurement for DNA-50.
After yield determination, each sample was mixed with its corresponding control for a total volume of 319 μl. The total mixture was split into 2 tubes for hybridization. The hybridization master mix was added to each tube. Sample tubes were transferred into an incubator with a 1.5 ml tube heat block (SciGene #1057-30-0, SciGene#1057-34-0) set at 95° C. for exactly 3 minutes and immediately transferred into a second block heater set at 37° C. for 30 minutes.
Samples were removed from 37° C. two by two (duplicates); the duplicates were mixed, loaded on the corresponding array and placed into the hybridization oven over the weekend (86 hours).
After hybridization, the arrays were removed from the oven 8 by 8 and washed with wash buffer 1, wash buffer 2, acetonitrile and stabilization & drying solution. Arrays were installed into a slide holder and covered with an ozone barrier. Immediately after, the arrays were scanned with an Agilent Sure Scan C scanner at a resolution of 3 μm.
The custom microarray was able to detect previously reported pathogenic aberrations associated with Autism Spectrum Disorder (ASD).
A 700 kb deletion known to be associated with ASD was detected on chromosome 16p11.2. The detected aberration was de novo as it was only detected in the affected family member and not in unaffected parents or siblings.
DNA from 17 subjects representing four families was analyzed. Seven subjects with ASD and 10 controls were analyzed.
A. Complex Aberration within PGAP1 Gene
Both deletion and duplication events were detected within this region in three families. In each family, the complex aberration was detected in affected family members but not in unaffected family members.
The PGAP1 gene aberration is located on chromosome 2 between nucleotides 197707345-197776074 (NA18507 human genome).
Without being bound by theory, PGAP1 (post-GPI attachment to proteins 1) catalyzes glycosylphosphatidylinositol (GPI) biosynthesis and PGAP1 may function as a novel component of the Wnt pathway during forebrain development [Zoltwicz et al, 2009]. In addition, knockout of PGAP1 in mice results in complete loss of GPI synthesis and disrupts neurodevelopment [Ueda et al. 2007]. This is the first report linking aberrations within PGAP1 with ASD.
B. Complex Aberration within C7orf58
Multiple deletions were detected within the C7orf58 gene in three families. In each family, the aberration was detected in affected family members but not in unaffected family members.
Without being bound by theory, while the specific function of C7orf58 is unknown, disruption of the C7orf58 gene has been previously reported in a single patient with mental retardation, anxiety disorder and ASD (Dauwerse et al., 2009).
C. Complex Aberration within LNX1 Gene
An aberration was observed within the LNX1 gene. A complex aberration pattern was observed involving two deletions within the same gene within multiple patients with autism spectrum disorders.
The LNX1 gene aberration is located on chromosome 4 between nucleotides 54436284-5433277 (NA18507 human genome).
DNA from 10 members of a single family was analysed. The family pedigree included 5 members affected with ankylosing spondylitis (AS), 2 with systemic lupus and 1 with psoriasis.
A complex aberration (multiple duplications within and adjacent to the UGT2B17 and UGT2B15 genes) was detected in all family members affected with AS but was not detected in unaffected family members. This aberration was also detected in one family member affected with systemic lupus.
The UGT2B17 gene aberration is located on chromosome 4 between nucleotides 69399539-69430016 (NA18507 human genome) and the UGT2B15 gene aberration is located on chromosome 4 between nucleotides 69518934-69530196 (NA18507 human genome).
The UGT2B17 gene encodes a key enzyme responsible for glucuronidation of androgens and their metabolites in humans. Without being bound by theory, changes in copy number within the UGT2B17 gene have been previously reported to be involved in bone formation, a characteristic of AS (Yang et al, Giroux et al).
Tourette syndrome (TS) is a developmental neuropsychiatric disorder characterized by the presence of motor (simple and/or complex) and verbal tics with duration longer than one year [Pauls et al. 1991; Price et al. 1985; State 2011]. TS often manifests with features associated with obsessive compulsive disorder (OCD), attention deficit hyperactivity disorder (ADHD), poor impulse control and other behavioural abnormalities, the pathophysiology of which remains to be elucidated [Robertson 2012]. The prevalence of TS is between 0.3-1% in any given population [Centers for Disease Control and Prevention 2009; Robertson 2008; Robertson et al. 2009], and TS consistently affects males more than females [Robertson 2012]. Twin studies consistently show higher concordance rates in monozygotic compared with dizygotic twins [Pauls et al. 1991; Price et al. 1985; Walkup et al. 1988], suggestive of a strong genetic component underpinning disease pathogenesis. Although early segregation analyses suggested an autosomal dominant inheritance pattern [Eapen et al. 1993], recent evidence suggests a heterogeneous complex genetic architecture underpins the pathogenesis of TS [Eapen et al. 1993; Pauls and Leckman 1986; State 2010; State 2011].
Structural variations are a risk factor for neuropsychiatric diseases. Recent analyses of copy number variants (CNVs) in TS have demonstrated an association with genes previously implicated in autism spectrum disorders (ASD) and other neuropsychiatric disorders [Fernandez et al. 2012; Lawson-Yuen et al. 2008]. A rare deletion of exons located 5′ in the neurexin 1 (NRXN1) gene was identified in two unrelated TS patients [Sundaram et al. 2010]. A second deletion in the α-T catenin (CTNNA3) gene was identified in two independent TS studies [Fernandez et al. 2012; Sundaram et al. 2010]. Interestingly, deletions encompassing both the NRXN1 and CTNNA3 genes have been reported in ASD and schizophrenia [Fernandez et al. 2012; Sundaram et al. 2010]. Another rare deletion comprising the NLGN4 gene (i.e., exons 4, 5 and 6) has been previously reported in TS and ASD [Lawson-Yuen et al. 2008]. An insertion/translocation between chromosomes 2 and 7 was reported to disrupt the CNTNAP2 gene in a two generation pedigree with a father and two offspring affected with TS [Verkerk et al. 2003]. The identification of CNVs implicated in neuropsychiatric disorders complicates genotype-phenotype analysis. That no single CNV has been reported to segregate uniquely with TS in affected families provides a great opportunity to detect novel CNVs specific to TS through the study of multiplex families.
Family A.
The proband (
Proband 10003.
The proband presented at 12 years to a pediatric psychiatrist and was diagnosed with TS (Table 6). Extended family history is limited due to adoption.
Families B and C.
These families have no known extended history of TS or other co-morbidities.
TS Population.
Probands and families were ascertained through a prospective study of TS in Newfoundland and Labrador (NL), from the Department of Child and Adolescent Psychiatry and the Child Development Clinic in the Janeway Child Health Centre, the Provincial Children's Hospital. Extended family histories and in-depth clinical information were obtained. The study was approved by the Human Research Ethics Board (#07-71). To date, 28 probands have been recruited and eight multi-generational family histories completed. DNA samples were collected from all affected subjects, their parents and extended family members (in multiplex pedigrees) following consent and completion of multiple rating scales. The primary focus of this study was a single multiplex pedigree (
Control Population.
To assess the population frequency of the CNV detected using the custom aCGH microarray, 590 control samples from the NL population with no clinical report of TS were analyzed by real-time quantitative fluorescence polymerase chain reaction (QF-PCR).
Custom Microarray.
To assess the presence of CNVs on a genome-wide scale, a custom genome-wide microarray was designed based on previously identified breakpoints in regions that are susceptible to genomic rearrangements [Uddin et al. 2011]. The microarray comprised 2×1 million probes covering the genome with a mean spacing of 280 bp. DNA from the TS multiplex family was applied to the custom aCGH microarray, which was run at Genome Quebec (GQ) on an Agilent platform. Prior to CNV analysis, quality control (QC) measures were applied, with a derivative of the log ratio spread (DLRS) <0.25 taken as the threshold. CNVs were then detected using the built-in Aberration Detection Method-2 (ADM-2) algorithm in DNA Analytics v.4.0.85 (Agilent Technologies) with the following criteria: 1) at least five (5) probes for a CNV call on GC-corrected intensity; 2) nested filter set to 2; and 3) log intensity >0.24 for duplications and <−0.24 for deletions. A custom script was applied to detect gene-enriched CNVs (i.e., CNVs that overlap or consist of a gene) that segregated with affected status in the family (in at least three cases).
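The filtering criteria above can be illustrated with a minimal sketch. It does not reimplement ADM-2 or the nested filter, and it is not the custom script referred to in the text; it only applies the quoted thresholds to already-segmented calls. The `Segment` record, the function names, and the gene coordinates used with it are hypothetical.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; the constant names are illustrative.
MIN_PROBES = 5             # at least five probes per CNV call
LOG_CUTOFF = 0.24          # > 0.24 duplication, < -0.24 deletion
DLRS_MAX = 0.25            # array-level QC threshold
MIN_AFFECTED_CARRIERS = 3  # "at least three cases"


@dataclass(frozen=True)
class Segment:
    """One candidate aberration for one sample (hypothetical record layout)."""
    sample: str
    chrom: str
    start: int
    end: int
    n_probes: int
    mean_log_ratio: float


def passes_qc(dlrs: float) -> bool:
    """An array passes QC when its DLRS is below the threshold."""
    return dlrs < DLRS_MAX


def classify(seg: Segment):
    """Return 'duplication', 'deletion', or None if the call fails the criteria."""
    if seg.n_probes < MIN_PROBES:
        return None
    if seg.mean_log_ratio > LOG_CUTOFF:
        return "duplication"
    if seg.mean_log_ratio < -LOG_CUTOFF:
        return "deletion"
    return None


def overlaps(seg: Segment, locus) -> bool:
    """True when the segment shares at least one base with (chrom, start, end)."""
    chrom, start, end = locus
    return seg.chrom == chrom and seg.start <= end and start <= seg.end


def segregating_genes(segments, genes, affected):
    """Map (gene, CNV type) to the affected samples carrying it, keeping only
    entries seen in at least MIN_AFFECTED_CARRIERS affected samples."""
    carriers = {}
    for seg in segments:
        kind = classify(seg)
        if kind is None or seg.sample not in affected:
            continue
        for name, locus in genes.items():
            if overlaps(seg, locus):
                carriers.setdefault((name, kind), set()).add(seg.sample)
    return {key: samples for key, samples in carriers.items()
            if len(samples) >= MIN_AFFECTED_CARRIERS}
```

In this sketch, a gene is retained only when the same type of aberration overlaps it in three or more affected samples, which is one reading of the segregation criterion; other readings (for example, requiring absence in unaffected relatives) would need an extra check.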
QF-PCR.
To confirm the duplication detected using the custom 2M aCGH microarray, a TaqMan copy number assay (Hs03417816; Life Technologies) was performed using the manufacturer's recommended protocol. The assay was performed in quadruplicate on 10 ng of genomic DNA for each sample in a 96-well plate. The 10 μl reaction mix consisted of 2 μl 2× TaqMan Genotyping Master Mix (Life Technologies), 0.5 μl of 20× copy number assay (described above), 0.5 μl of TaqMan RNase P Copy Number Reference Assay (Life Technologies, part 4403326), 2 μl of water and 2 μl of 5 ng/μl genomic DNA. Cycling conditions for the reaction were 95° C. for 10 min, followed by 40 cycles of 95° C. for 15 sec and 60° C. for 1 min. Samples were run on the ViiA™ 7 Real-Time PCR System (Life Technologies) and analyzed using CopyCaller Software (Life Technologies, PN 4412907). Three reference (calibrator) DNA HapMap samples (NA10851, NA15510 and NA07048; Coriell Institute) plus one non-template control were included with the test samples.
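For orientation, relative copy number from an assay of this kind is commonly estimated with the comparative Ct (ΔΔCt) method. The sketch below shows that general calculation only; it is not CopyCaller's implementation, the function names are hypothetical, and the assumption that the calibrator carries two copies of the target is made here for illustration rather than taken from the text.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean (target Ct - reference Ct) over paired replicate wells.

    The reference is the RNase P assay run in the same well.
    """
    return mean(t - r for t, r in zip(target_cts, reference_cts))


def relative_copy_number(sample_dct, calibrator_dct, calibrator_copies=2):
    """Comparative Ct estimate: calibrator_copies * 2 ** -(ddCt).

    calibrator_copies=2 is an assumption made for this sketch.
    """
    ddct = sample_dct - calibrator_dct
    return calibrator_copies * 2 ** (-ddct)
```

Under these assumptions, a sample whose ΔCt is one cycle lower than the calibrator's corresponds to twice the calibrator's copy number. In practice the estimate is rounded to the nearest integer and accompanied by a confidence value for the call.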
The custom high-density aCGH microarray yielded approximately 2000 genomic aberrations. Comprehensive data analysis revealed atypical, rare micro-duplications located at chromosome 2q21.1 which segregated with affected members in family A. Large de novo variants were not detected in affected family members (data not shown). Within a 221 kb (chr2:132305299-132526804) region, two common blocks of micro-duplications were identified that segregated together in five of the six affected individuals (
The presence of block2 among five of the six affected family members was validated using a QF-PCR assay, which demonstrated a relative copy number of four in the affected siblings and mother, whereas unaffected members had a copy number of two or three (data not shown). QF-PCR analysis was performed on two additional families and 10 unrelated individuals with TS. The block2 micro-duplication with a copy number of four was detected in one additional affected individual (ID10003), but was absent in all other unrelated affected or unaffected samples tested. Of the 590 control individuals analyzed using QF-PCR, CNV prediction calls reached 95% confidence in only 443 samples. A copy number of four was observed in 4/443 (0.009) of these individuals.
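The control frequency reported above follows directly from the counts. The sketch below reproduces that arithmetic and, as an illustration that goes beyond the original analysis, attaches a Wilson score interval to show how much uncertainty a count of four carriers implies; the function name is hypothetical.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


carriers = 4             # controls with a copy number of four
callable_controls = 443  # controls with a confident CNV call
frequency = carriers / callable_controls
low, high = wilson_interval(carriers, callable_controls)
print(f"frequency = {frequency:.3f}, interval = {low:.4f} to {high:.4f}")
```

The denominator is the 443 samples with confident calls rather than all 590 controls, matching the frequency given in the text.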
The salient characteristics of TS segregate in subjects with the micro-duplications, including multiple motor and vocal tics, and common co-morbidities including ADHD, OCD, major depression, anxiety, behavioural problems, and learning disability [Termine et al. 2006]. Migraine and sleep difficulties, which have been reported in association with TS, are also present in several affected family members [Abelson et al. 2005; Freeman et al. 2000; Kwak et al. 2003; Lespérance et al. 2004; Singer 2005]. Although TS segregates with the micro-duplications described, the morbidity of disease is variable. The proband (
A genomic region containing two micro-duplication blocks, the larger of which (131 kb) segregates with TS status in a three-generation family, has been identified. This micro-duplication encompasses the C2orf27A gene, which belongs to the C2orf27 gene family and encodes an uncharacterized protein. Although the function of this gene is unknown, it was derived from a guanine nucleotide exchange factor protein [Toll-Riera et al. 2011]. Guanine nucleotide exchange factors are expressed in the basal ganglia [Kawasaki et al. 1998], which is associated with a variety of functions, including voluntary motor control, procedural learning relating to routine behaviors or “habits” such as eye movements, and cognitive functions [Albin et al. 1995]. Unlike previous reports of CNV associations with TS [Fernandez et al. 2012; Sundaram et al. 2010], these micro-duplications have not been reported with any other neuropsychiatric disorder and thus the candidate region is specific to TS. Interestingly, a larger region which encompasses the CNVs detected in this study was previously identified as a locus through linkage analysis for dystonia in a four-generation family [Norgren et al. 2011]. In that study, a critical 8.9 Mb region correlated with the highest LOD score (
Given that the micro-duplications identified in this study are atypical CNVs, the population frequency was determined. In the Newfoundland population, the larger micro-duplication was observed to be an atypical, rare genomic aberration. Previously published high-density (42 million probes) genomic tiling microarray data [Conrad et al. 2010] have revealed the presence of common micro-deletions interspersed within the micro-duplicated regions identified in this study. Typical duplications of very low frequency (0.01-0.07) have been reported within this region in the Database of Genomic Variants (DGV), and across the reported studies no single individual carries two duplication blocks within this region. However, the breakpoints reported in the DGV have not been validated. These breakpoints are also absent from the large study that investigated 15,767 children with various types of intellectual disability [Cooper et al. 2011]. Thus, this unusual segregation of the two micro-duplication blocks within the TS family is a rare event and is highly correlated with TS pathogenesis.
These findings underline the impact of CNVs with respect to human health and genomic susceptibility to TS. The larger micro-duplication that segregates with the affected individuals of Family A encompasses the C2orf27A gene. The rare frequency of this micro-duplication within the control population shows a link between the 2q21.1-21.2 locus and TS pathogenesis.
βIncludes all treatments from diagnosis to present day: not all currently prescribed.
This application claims benefit under 35 U.S.C. 119(e) to U.S. provisional application No. 61/579,214, filed Dec. 22, 2011, incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61579214 | Dec 2011 | US