METHOD AND KIT FOR DETERMINING NEUROMUSCULAR DISEASE IN SUBJECT

Information

  • Patent Application
  • Publication Number
    20220025460
  • Date Filed
    July 21, 2020
  • Date Published
    January 27, 2022
Abstract
A method for determining a neuromuscular disease accompanied by a repeat expansion of CGG in a nucleic acid in a subject comprising: obtaining a nucleic acid fragment having a repeat expansion of CGG or a complementary sequence thereof from a nucleic acid sample from the subject, circularizing the nucleic acid fragment with an origin of chromosomal replication (oriC) cassette to form a circular nucleic acid, amplifying the circular nucleic acid to produce a plurality of circular nucleic acids, and detecting the repeat expansion of CGG or the complementary sequence thereof.
Description
TECHNICAL FIELD

A method and a kit for determining a neuromuscular disease in a subject are disclosed.


BACKGROUND ART

Noncoding repeat expansions cause various neuromuscular diseases including myotonic dystrophies, fragile X tremor/ataxia syndrome (FXTAS), some spinocerebellar ataxias, amyotrophic lateral sclerosis, and benign adult familial myoclonic epilepsies (BAFME).


Solution to Problem

U.S. 62/842,110 and PCT/JP2020/018412 are incorporated herein by reference. In addition, all patent applications, patents, and printed publications cited herein are incorporated herein by reference in their entireties, except for any definitions, subject matter disclaimers, or disavowals, and except to the extent that the incorporated material is inconsistent with the express disclosure herein, in which case the language in this disclosure controls.


Inspired by the striking similarities in clinical and neuroimaging findings between neuronal intranuclear inclusion disease (NIID) and FXTAS, which is caused by noncoding CGG repeat expansions in FMR1, the present inventors directly searched for repeat expansion mutations and identified noncoding CGG repeat expansions in NBPF19 (NOTCH2NLC) as the causative mutations for NIID. Further prompted by clinical and neuroimaging similarities with NIID, the present inventors identified similar noncoding CGG repeat expansions in two other diseases: oculopharyngeal myopathy with leukoencephalopathy (OPML), in LOC642361/NUTM2B-AS1, and oculopharyngodistal myopathy (OPDM), in LRP12. These findings expand the present inventors' knowledge of the clinical spectra of diseases caused by expansions of the same repeat motif and further highlight the value of directly searching for expanded repeats to identify genes underlying disease.


An aspect of the present disclosure relates to a method for determining, diagnosing, or aiding in the diagnosis of a neuromuscular disease accompanied by a repeat expansion of CGG in a nucleic acid in a subject, comprising detecting a repeat expansion of CGG or a complementary sequence thereof in a nucleic acid sample from the subject. The neuromuscular disease may be selected from the group consisting of neuronal intranuclear inclusion disease, oculopharyngodistal myopathy, and oculopharyngeal myopathy with leukoencephalopathy.


An aspect of the present disclosure relates to a method for treating a neuromuscular disease accompanied by a repeat expansion of CGG in a nucleic acid in a subject, comprising detecting a repeat expansion of CGG or a complementary sequence thereof in a nucleic acid sample from the subject and, if the repeat expansion is detected, administering a pharmaceutical composition for treating the neuromuscular disease to the subject. The neuromuscular disease may be selected from the group consisting of neuronal intranuclear inclusion disease, oculopharyngodistal myopathy, and oculopharyngeal myopathy with leukoencephalopathy.


In the above method, the nucleic acid sample may be chromosomal DNA. In the above method, the repeat expansion of CGG may be in a gene from the subject.


In the above method, the neuromuscular disease may be neuronal intranuclear inclusion disease and the repeat expansion of CGG may be in the NBPF19 gene. The NBPF19 gene is also referred to as the NOTCH2NLC gene. In the above method, the neuromuscular disease may be neuronal intranuclear inclusion disease and the repeat expansion may be greater than 80 repeats.


In the above method, the neuromuscular disease may be oculopharyngodistal myopathy and the repeat expansion of CGG may be in the 5′ untranslated region of the LRP12 gene. In the above method, the neuromuscular disease may be oculopharyngodistal myopathy and the repeat expansion may be greater than 77 repeats.


In the above method, the neuromuscular disease may be oculopharyngeal myopathy with leukoencephalopathy and the repeat expansion of CGG may be in the LOC642361 gene and/or the NUTM2B-AS1 gene. In the above method, the neuromuscular disease may be oculopharyngeal myopathy with leukoencephalopathy and the repeat expansion may exceed the range observed in healthy individuals, which is 6 to 14 repeat units.
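The numerical criteria above can be restated as a short Python sketch. It assumes that a CGG repeat count has already been measured for the relevant gene; the names `THRESHOLDS` and `assess_repeat` are hypothetical, and the sketch only encodes the thresholds given in the text (greater than 80, greater than 77, and above the healthy range of 6 to 14 units), not a validated diagnostic procedure.

```python
# Hypothetical sketch: compare a measured CGG repeat count with the per-gene
# criteria stated in the text. Names are illustrative only.
THRESHOLDS = {
    # gene: (disease, count that must be exceeded)
    "NBPF19": ("neuronal intranuclear inclusion disease", 80),
    "NOTCH2NLC": ("neuronal intranuclear inclusion disease", 80),  # alias of NBPF19
    "LRP12": ("oculopharyngodistal myopathy", 77),
    # healthy range is 6 to 14 repeat units, so counts above 14 exceed it
    "LOC642361": ("oculopharyngeal myopathy with leukoencephalopathy", 14),
    "NUTM2B-AS1": ("oculopharyngeal myopathy with leukoencephalopathy", 14),
}


def assess_repeat(gene: str, repeat_count: int) -> tuple[str, bool]:
    """Return (disease, expanded); expanded is True if the count exceeds the criterion."""
    if gene not in THRESHOLDS:
        raise ValueError(f"no criterion defined for {gene}")
    disease, limit = THRESHOLDS[gene]
    return disease, repeat_count > limit


if __name__ == "__main__":
    for gene, count in [("NBPF19", 120), ("LRP12", 60), ("LOC642361", 15)]:
        disease, expanded = assess_repeat(gene, count)
        print(f"{gene}: {count} repeats, expanded={expanded} ({disease})")
```

Strict "greater than" comparisons are used because the text states each criterion as exceeding a value rather than reaching it.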


An aspect of the present disclosure relates to a kit for determining or diagnosing a neuromuscular disease accompanied by a repeat expansion of CGG in a nucleic acid in a subject, comprising a nucleic acid reagent configured to detect a repeat expansion of CGG or a complementary sequence thereof in a nucleic acid sample from the subject. The neuromuscular disease may be selected from the group consisting of neuronal intranuclear inclusion disease, oculopharyngodistal myopathy, and oculopharyngeal myopathy with leukoencephalopathy.


In the above kit, the nucleic acid sample may be chromosomal DNA. In the above kit, the nucleic acid reagent may comprise a PCR primer configured to detect the repeat expansion of CGG or the complementary sequence thereof. In the above kit, the PCR primer may comprise a sequence complementary to CGG or to the complementary sequence thereof. In the above kit, the nucleic acid reagent may comprise a probe configured to target a sequence flanking the repeat expansion of CGG or the complementary sequence thereof. In the above kit, the repeat expansion of CGG may be in a gene from the subject.


In the above kit, the neuromuscular disease may be neuronal intranuclear inclusion disease and the repeat expansion of CGG may be in the NBPF19 gene. The NBPF19 gene is also referred to as the NOTCH2NLC gene. In the above kit, the neuromuscular disease may be neuronal intranuclear inclusion disease and the repeat expansion may be greater than 80 repeats.


In the above kit, the neuromuscular disease may be oculopharyngodistal myopathy and the repeat expansion of CGG may be in the 5′ untranslated region of the LRP12 gene. In the above kit, the neuromuscular disease may be oculopharyngodistal myopathy and the repeat expansion may be greater than 77 repeats.


In the above kit, the neuromuscular disease may be oculopharyngeal myopathy with leukoencephalopathy and the repeat expansion of CGG may be in the LOC642361 gene. The LOC642361 gene is also referred to as the NUTM2B-AS1 gene. In the above kit, the neuromuscular disease may be oculopharyngeal myopathy with leukoencephalopathy and the repeat expansion may exceed the range observed in healthy individuals, which is 6 to 14 repeat units.


An aspect of the present disclosure relates to a method for determining a neuromuscular disease accompanied by a repeat expansion of CGG in a nucleic acid in a subject, comprising: obtaining a nucleic acid fragment having a repeat expansion of CGG or a complementary sequence thereof from a nucleic acid sample from the subject, circularizing the nucleic acid fragment with an origin of chromosomal replication (oriC) cassette to form a circular nucleic acid, amplifying the circular nucleic acid to produce a plurality of circular nucleic acids, and detecting the repeat expansion of CGG or the complementary sequence thereof.


The above method may further comprise digesting the amplified circular nucleic acids to obtain amplified nucleic acid fragments. Each of the amplified nucleic acid fragments may have the repeat expansion of CGG or the complementary sequence thereof.
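The digestion of amplified circular nucleic acids into linear fragments can be modeled in silico. The following Python sketch cuts a circular sequence at every occurrence of a recognition site; it assumes a palindromic site with a single top-strand cut offset (NheI, G^CTAGC, is used only as an example), ignores overhangs, and is not the specific digestion protocol of the present disclosure. The function name `digest_circular` is hypothetical.

```python
def digest_circular(seq: str, site: str, cut_offset: int) -> list[str]:
    """Return top-strand fragments produced by cutting a circular sequence.

    Assumes a palindromic recognition site cut `cut_offset` bases from its
    5' end on the top strand; sticky-end overhangs are not modeled.
    """
    seq, site = seq.upper(), site.upper()
    n = len(seq)
    # Append the first len(site) - 1 bases so that sites spanning the
    # junction of the circle are also found.
    wrapped = seq + seq[: len(site) - 1]
    cuts = sorted({(i + cut_offset) % n
                   for i in range(n)
                   if wrapped[i:i + len(site)] == site})
    if not cuts:
        return [seq]  # no site: the circle stays intact
    fragments = []
    for k, start in enumerate(cuts):
        end = cuts[(k + 1) % len(cuts)]
        if end > start:
            fragments.append(seq[start:end])
        else:  # the fragment runs across the junction of the circle
            fragments.append(seq[start:] + seq[:end])
    return fragments


if __name__ == "__main__":
    # Toy circle with two NheI sites flanking a short CGG tract
    circle = "GCTAGC" + "CGGCGGCGG" + "GCTAGC" + "TTTT"
    for frag in digest_circular(circle, "GCTAGC", cut_offset=1):
        print(len(frag), frag)
```

A single site linearizes the circle into one full-length fragment, and the fragment lengths always sum to the circle length.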


In the above method, the 5′ region of the oriC cassette may be complementary to the 5′ region of the nucleic acid fragment, and the 3′ region of the oriC cassette may be complementary to the 3′ region of the nucleic acid fragment.


In the above method, the 5′ region of the oriC cassette may be complementary to the 3′ region of the nucleic acid fragment, and the 3′ region of the oriC cassette may be complementary to the 5′ region of the nucleic acid fragment.


In the above method, the repeat expansion of CGG or the complementary sequence thereof may be located between the 5′ region and the 3′ region of the nucleic acid fragment.


In the above method, the 5′ region and the 3′ region of the nucleic acid fragment may be loci specific to the neuromuscular disease.
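Because the repeat lies between locus-specific 5′ and 3′ regions, a sequence read can be assigned to a locus and its CGG units counted by locating both flanking regions. In the Python sketch below, the flank sequences are hypothetical placeholders rather than sequences from the present disclosure, and `repeat_between_flanks` is a hypothetical name. The reverse-complement check reflects detection of the complementary sequence; interrupting units such as AGG are simply not counted.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def repeat_between_flanks(read: str, flank5: str, flank3: str, unit: str = "CGG"):
    """Return (interval, unit_count) for the sequence between the two flanks.

    Both strands are tried; None is returned if the flanks are not found
    in the expected order on either strand.
    """
    for strand in (read, reverse_complement(read)):
        start = strand.find(flank5)
        if start == -1:
            continue
        start += len(flank5)
        end = strand.find(flank3, start)
        if end == -1:
            continue
        interval = strand[start:end]
        # str.count counts non-overlapping occurrences of the unit
        return interval, interval.count(unit)
    return None


if __name__ == "__main__":
    flank5, flank3 = "ACGTTGCA", "TTAGCCAT"  # hypothetical locus-specific flanks
    read = "GG" + flank5 + "CGG" * 5 + "AGG" + "CGG" + flank3 + "AA"
    print(repeat_between_flanks(read, flank5, flank3))
    print(repeat_between_flanks(reverse_complement(read), flank5, flank3))
```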


In the above method, the nucleic acid fragment may be obtained by using a restriction enzyme or a gene editing protein.


In the above method, the neuromuscular disease may be selected from the group consisting of neuronal intranuclear inclusion disease, oculopharyngodistal myopathy, and oculopharyngeal myopathy with leukoencephalopathy.


In the above method, the nucleic acid sample may be chromosomal DNA.


In the above method, the repeat expansion of CGG may be in a gene from the subject.


In the above method, the neuromuscular disease may be neuronal intranuclear inclusion disease, and the repeat expansion of CGG may be in the NBPF19 gene. The NBPF19 gene is also referred to as the NOTCH2NLC gene. The repeat expansion may be greater than 80 repeats.


In the above method, the neuromuscular disease may be oculopharyngodistal myopathy, and the repeat expansion of CGG may be in the 5′ untranslated region of the LRP12 gene. The repeat expansion may be greater than 77 repeats.


In the above method, the neuromuscular disease may be oculopharyngeal myopathy with leukoencephalopathy, and the repeat expansion of CGG may be in the LOC642361 gene. The LOC642361 gene is also referred to as the NUTM2B-AS1 gene. The repeat expansion may exceed the range observed in healthy individuals, which is 6 to 14 repeat units.


An aspect of the present disclosure relates to a kit for determining a neuromuscular disease accompanied by a repeat expansion of CGG in a nucleic acid in a subject, comprising: a fragmentation reagent configured to obtain a nucleic acid fragment having a repeat expansion of CGG or a complementary sequence thereof from a nucleic acid sample from the subject, a circularizing reagent configured to circularize the nucleic acid fragment with an origin of chromosomal replication (oriC) cassette to form a circular nucleic acid, and an amplifying reagent configured to amplify the circular nucleic acid to produce a plurality of circular nucleic acids.


The above kit may further comprise a digesting reagent to digest the amplified circular nucleic acids to obtain amplified nucleic acid fragments. Each of the amplified nucleic acid fragments may have the repeat expansion of CGG or the complementary sequence thereof.


In the above kit, the 5′ region of the oriC cassette may be complementary to the 5′ region of the nucleic acid fragment, and the 3′ region of the oriC cassette may be complementary to the 3′ region of the nucleic acid fragment.


In the above kit, the 5′ region of the oriC cassette may be complementary to the 3′ region of the nucleic acid fragment, and the 3′ region of the oriC cassette may be complementary to the 5′ region of the nucleic acid fragment.


In the above kit, the repeat expansion of CGG or the complementary sequence thereof may be located between the 5′ region and the 3′ region of the nucleic acid fragment.


In the above kit, the 5′ region and the 3′ region of the nucleic acid fragment may be loci specific to the neuromuscular disease.


In the above kit, the fragmentation reagent may contain a restriction enzyme or a gene editing protein.


In the above kit, the neuromuscular disease may be selected from the group consisting of neuronal intranuclear inclusion disease, oculopharyngodistal myopathy, and oculopharyngeal myopathy with leukoencephalopathy.


In the above kit, the nucleic acid sample may be chromosomal DNA.


In the above kit, the repeat expansion of CGG may be in a gene from the subject.


In the above kit, the neuromuscular disease may be neuronal intranuclear inclusion disease, and the repeat expansion of CGG may be in the NBPF19 gene. The NBPF19 gene is also referred to as the NOTCH2NLC gene. The repeat expansion may be greater than 80 repeats.


In the above kit, the neuromuscular disease may be oculopharyngodistal myopathy, and the repeat expansion of CGG may be in the 5′ untranslated region of the LRP12 gene. The repeat expansion may be greater than 77 repeats.


In the above kit, the neuromuscular disease may be oculopharyngeal myopathy with leukoencephalopathy, and the repeat expansion of CGG may be in the LOC642361 gene. The LOC642361 gene is also referred to as the NUTM2B-AS1 gene. The repeat expansion may exceed the range observed in healthy individuals, which is 6 to 14 repeat units.


An aspect of the present disclosure relates to a method for detecting a repeat expansion of CGG in a nucleic acid, comprising: obtaining a nucleic acid fragment having a repeat expansion of CGG or a complementary sequence thereof, circularizing the nucleic acid fragment with an origin of chromosomal replication (oriC) cassette to form a circular nucleic acid, amplifying the circular nucleic acid to produce a plurality of circular nucleic acids, and detecting the repeat expansion of CGG or the complementary sequence thereof.


The above method may further comprise digesting the amplified circular nucleic acids to obtain amplified nucleic acid fragments. Each of the amplified nucleic acid fragments may have the repeat expansion of CGG or the complementary sequence thereof.


In the above method, the 5′ region of the oriC cassette may be complementary to the 5′ region of the nucleic acid fragment, and the 3′ region of the oriC cassette may be complementary to the 3′ region of the nucleic acid fragment.


In the above method, the 5′ region of the oriC cassette may be complementary to the 3′ region of the nucleic acid fragment, and the 3′ region of the oriC cassette may be complementary to the 5′ region of the nucleic acid fragment.


In the above method, the repeat expansion of CGG or the complementary sequence thereof may be located between the 5′ region and the 3′ region of the nucleic acid fragment.


In the above method, the nucleic acid fragment may be obtained by using a restriction enzyme or a gene editing protein.


In the above method, the nucleic acid fragment may be obtained from chromosomal DNA.


In the above method, the repeat expansion of CGG may be in a gene.


An aspect of the present disclosure relates to a kit for detecting a repeat expansion of CGG in a nucleic acid, comprising: a fragmentation reagent configured to obtain a nucleic acid fragment having a repeat expansion of CGG or a complementary sequence thereof from a nucleic acid sample, a circularizing reagent configured to circularize the nucleic acid fragment with an origin of chromosomal replication (oriC) cassette to form a circular nucleic acid, and an amplifying reagent configured to amplify the circular nucleic acid to produce a plurality of circular nucleic acids.


The above kit may further comprise a digesting reagent to digest the amplified circular nucleic acids to obtain amplified nucleic acid fragments. Each of the amplified nucleic acid fragments may have the repeat expansion of CGG or the complementary sequence thereof.


In the above kit, the 5′ region of the oriC cassette may be complementary to the 5′ region of the nucleic acid fragment, and the 3′ region of the oriC cassette may be complementary to the 3′ region of the nucleic acid fragment.


In the above kit, the 5′ region of the oriC cassette may be complementary to the 3′ region of the nucleic acid fragment, and the 3′ region of the oriC cassette may be complementary to the 5′ region of the nucleic acid fragment.


In the above kit, the repeat expansion of CGG or the complementary sequence thereof may be located between the 5′ region and the 3′ region of the nucleic acid fragment.


In the above kit, the fragmentation reagent may contain a restriction enzyme or a gene editing protein.


In the above kit, the nucleic acid sample may be chromosomal DNA.


In the above kit, the repeat expansion of CGG may be in a gene.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 shows a brain MRI of patients with FXTAS, NIID, OPML, and OPDM. Representative brain T2-weighted images (T2WI) and diffusion-weighted images (DWI) of patients with FXTAS [fragile X tremor/ataxia syndrome, a 64-year-old male with mild expansion (premutation) of CGG repeats in FMR1], NIID (neuronal intranuclear inclusion disease, a 72-year-old female with expanded CGG repeats in NBPF19), OPML (oculopharyngeal myopathy with leukoencephalopathy, a 60-year-old female with CGG/CCG repeat expansion in LOC642361/NUTM2B-AS1), and OPDM (oculopharyngodistal myopathy, a 57-year-old male with CGG repeat expansion in LRP12) are shown. Widespread white matter changes with high T2-weighted signals associated with high-intensity signals in the corticomedullary junctions revealed by DWI are shown in the patients with FXTAS, NIID, and OPML. In the patient with FXTAS, cerebral white matter lesions are less prominent than in those with NIID and OPML. T2-weighted high intensity lesions in the middle cerebellar peduncles (MCP sign), a characteristic finding in FXTAS, are also observed in the patient with NIID, whereas slightly high intensity lesions in T2WI are observed in the cerebellar white matter surrounding the deep cerebellar nuclei in the patient with OPML. No abnormal signal intensities or atrophic changes are observed in the patient with OPDM.



FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D show direct identification of repeat expansion mutations by analysis of short reads from whole-genome sequencing data. FIG. 2A is a flow chart showing the scheme for direct identification of repeat expansion mutations employing short reads of whole-genome sequencing data. Step 1: Using TRhist, the present inventors first extract short reads filled with tandem repeats that are overrepresented in patients. Step 2: Among the short reads overrepresented in patients, the present inventors observe paired-end reads in which both short reads are filled with tandem repeats, as indicated by two gray boxes, and those in which one of the paired short reads does not contain tandem repeats (nonrepeat reads), as indicated by black boxes. The present inventors then align the nonrepeat reads to the reference genome. As an optional step, the present inventors extract additional paired-end short reads partly filled with tandem repeats (composite gray and black boxes) and manually align these short reads and the paired nonrepeat reads (black boxes) to the reference genome (FIG. 2B). Step 3: The expanded repeats are confirmed by repeat-primed PCR analysis (FIG. 2C), Southern blot analysis (FIG. 2D), or long-read sequence analysis.
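Step 1 can be illustrated with a simplified Python sketch that flags reads filled with a tandem repeat and reports the motif in a strand- and phase-independent form, so that CGG, GGC, GCG and their complements CCG, CGC, GCC are counted together. This is not the TRhist algorithm; the periodicity test, the 0.95 fill threshold, and all function names are assumptions chosen for illustration.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif: str) -> str:
    """Lexicographically smallest rotation of the motif or of its reverse complement."""
    rc = motif.translate(_COMPLEMENT)[::-1]
    return min(s[i:] + s[:i] for s in (motif, rc) for i in range(len(s)))


def repeat_fill(read: str, period: int) -> float:
    """Fraction of positions i >= period where read[i] equals read[i - period]."""
    positions = range(period, len(read))
    if not positions:
        return 0.0
    return sum(read[i] == read[i - period] for i in positions) / len(positions)


def dominant_repeat(read: str, periods=range(1, 7), min_fill: float = 0.95):
    """Return the canonical motif if the read is filled with a tandem repeat, else None."""
    for k in periods:  # shortest period first, so CGG is not reported as CGGCGG
        if repeat_fill(read, k) >= min_fill:
            return canonical_motif(read[:k])
    return None


def count_repeat_reads(reads) -> Counter:
    """Count repeat-filled reads per canonical motif for one sample."""
    return Counter(m for m in map(dominant_repeat, reads) if m is not None)


if __name__ == "__main__":
    sample = ["CGG" * 50, "CCG" * 50, "ACGTTAGCCATGGACTTGCA"]
    print(count_repeat_reads(sample))
```

Comparing such per-motif counts between patients and controls gives a simple picture of what "overrepresented in patients" means in Step 1.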



FIG. 3 shows a summary of the study and clinical overlaps in FXTAS, NIID1, OPML1, OPDM1, and OPMD.



FIG. 4 shows a haplotype analysis of three families with oculopharyngodistal myopathy type 1. Haplotypes were reconstructed using single nucleotide variants genotyped with the Affymetrix Genome-Wide Human SNP Array 6.0 in three families (F3411, F7758, and F7967). In families F7758 and F7967, multiple affected individuals were observed, whereas in family F3411 only one affected individual (a sporadic case) was observed. In this analysis, the present inventors used hg19 as the reference sequence. First, homozygosity haplotypes were reconstructed (Miyazawa et al. Homozygosity haplotype allows a genome-wide search for the autosomal segments shared among patients. Am. J. Hum. Genet. 80, 1090-1102 (2007)), and regions shared among the three patients were visually confirmed (gray). In addition to SNP array analysis, the present inventors also utilized 10X GemCode Technology and compared each haploblock from the three families from chr8:105,384,931 to chr8:105,657,322, avoiding genotypes within 10 kb of the haploblock boundaries indicated by the longranger software. The present inventors selected single nucleotide variants with a coverage of 10 or more reads from the phased genotypes generated by 10X GemCode Technology. All the phased variants of the three families matched, as indicated in dim gray. These analyses suggested a common founder chromosome among these OPDM1 families.
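The variant filtering and cross-family comparison described for FIG. 4 can be sketched in Python as follows. The sketch assumes that, for each family, the allele on the haplotype hypothesized to carry the expansion is already known; the `PhasedVariant` record, the function names, and the toy positions are hypothetical, while the 10-read coverage minimum, the 10-kb boundary margin, and the haploblock coordinates are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhasedVariant:
    pos: int      # chromosome position (hg19 in the analysis above)
    depth: int    # read coverage at the variant
    allele: str   # allele on the haplotype assumed to carry the expansion


def usable_sites(variants, block_start: int, block_end: int,
                 min_depth: int = 10, margin: int = 10_000) -> dict[int, str]:
    """Keep variants with coverage >= min_depth lying more than `margin` bp inside the haploblock."""
    return {v.pos: v.allele for v in variants
            if v.depth >= min_depth
            and block_start + margin < v.pos < block_end - margin}


def haplotypes_match(families: list[dict[int, str]]) -> tuple[int, bool]:
    """Return (number of shared sites, whether all families carry the same allele at each)."""
    shared = set.intersection(*(set(f) for f in families))
    return len(shared), all(len({f[p] for f in families}) == 1 for p in shared)


if __name__ == "__main__":
    block = (105_384_931, 105_657_322)  # chr8 haploblock from the text
    fam_a = usable_sites([PhasedVariant(105_400_000, 15, "A"),
                          PhasedVariant(105_390_000, 30, "G"),   # too close to boundary
                          PhasedVariant(105_500_000, 8, "C")],   # coverage below 10
                         *block)
    fam_b = usable_sites([PhasedVariant(105_400_000, 22, "A")], *block)
    print(fam_a, haplotypes_match([fam_a, fam_b]))
```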



FIG. 5A and FIG. 5B show homologous regions around the CGG repeats in NBPF19. NBPF19 gene is also referred to as NOTCH2NLC gene. FIG. 5A: Schematic representation of the four highly homologous genes (AC253572.1, NOTCH2, NOTCH2NL, and NBPF14) and NBPF19. Physical positions in hg38 are indicated. The five genes are located in the pericentric region of chromosome 1, with the centromere and a long heterochromatic region (1q12) lying between them. Parts of NBPF19, NBPF14, NOTCH2NL, and AC253572.1 have also recently been annotated as NOTCH2NLC, NOTCH2NLB, NOTCH2NLA, and NOTCH2NLR, respectively [Fiddes, I.T. et al. Cell 173, 1356-1369.e22 (2018) and Suzuki, I.K. et al. Cell 173, 1370-1384 (2018)]. FIG. 5B: To identify sequences with high similarity in these regions, scores and identities are calculated using BLAT [Kent, W.J. BLAT - the BLAST-like alignment tool. Genome Res. 12, 656-664 (2002)]. A portion of the NBPF19 sequence (chr1:149,370,802-149,410,843 in hg38, corresponding to 20 kb upstream and 20 kb downstream of the CGG repeats in the 5′ UTR of NBPF19) is used as a query. Identities of 99.2%-99.5% are indicated.



FIG. 6A and FIG. 6B show Japanese families with NIID enrolled in the present inventors' study.



FIG. 7A and FIG. 7B show the identification of CGG repeat expansion mutations in NBPF19 in NIID. NBPF19 gene is also referred to as NOTCH2NLC gene. FIG. 7A: Number of short reads filled with CGG/CCG tandem repeats in patients with NIID and in controls, as revealed by TRhist using whole-genome sequencing data obtained with a HiSeq2500. Short reads filled with CGG or CCG repeats were identified in four patients with NIID, whereas no such reads were observed in seven control subjects. FIG. 7B: The CGG/CCG repeat expansions were determined to be located in the 5′ untranslated region (5′ UTR) of NBPF19, as revealed by alignment to the reference genome of the nonrepeat reads paired with short reads filled with CGG/CCG repeats. Although some of the nonrepeat reads also aligned to paralogous genes (NBPF14, NOTCH2NL, NOTCH2, and AC253572.1) that share extremely high identity with NBPF19 (left and right frames of the alignment), the present inventors identified six short reads strongly supporting alignment to NBPF19 (the alignment of one of the six reads is shown in the center frame of aligned nucleotide sequences).



FIG. 8A and FIG. 8B show results from TRhist, using data from whole-genome sequence analysis with 150-bp (FIG. 8A) and 126-bp (FIG. 8B) paired-end reads. Only repeat motifs of 3-6 bases for which more than 9 reads were observed in at least one subject are shown. Reads filled with CCG (=CGG) repeats are observed in patients with NIID1, OPML1, and OPDM1. NIID1, neuronal intranuclear inclusion disease type 1; OPML1, oculopharyngeal myopathy with leukoencephalopathy type 1; OPDM1, oculopharyngodistal myopathy type 1.
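The display criterion for FIG. 8A and FIG. 8B can be stated precisely in code. The Python sketch below assumes per-subject read counts keyed by canonical motif (so that CCG and CGG are already merged); the function name, subject labels, and counts are invented for illustration.

```python
def motifs_to_display(counts: dict[str, dict[str, int]],
                      min_len: int = 3, max_len: int = 6,
                      min_reads_exclusive: int = 9) -> list[str]:
    """Motifs of 3-6 bases for which at least one subject has more than 9 reads."""
    return sorted(
        motif for motif, per_subject in counts.items()
        if min_len <= len(motif) <= max_len
        and any(n > min_reads_exclusive for n in per_subject.values())
    )


if __name__ == "__main__":
    counts = {
        "CCG": {"patient_1": 42, "control_1": 0},
        "AAAG": {"patient_1": 3, "control_1": 9},   # no subject above 9 reads
        "AC": {"control_1": 100},                   # motif shorter than 3 bases
        "AGGG": {"control_2": 10},
    }
    print(motifs_to_display(counts))
```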



FIG. 9A and FIG. 9B show the identification of the location of CGG/CCG repeats in families with NIID. After short reads filled with CGG/CCG repeats were identified in four patients with NIID, the reads paired with them were investigated. After quality-score-based trimming with sickle (version 1.33, https://github.com/najoshi/sickle), the reads were visually inspected and mapped to hg38 using BLAT. In the patients in F9193, F5804, F9468, and F9785, 6, 7, 13, and 7 reads, respectively, were mapped to chromosome 1 (boxed with a blue line). In three patients, 3, 2, and 1 nonrepeat reads, respectively, strongly supported the location of the CGG/CCG repeats in NBPF19 (boxed with a red line). NBPF19 gene is also referred to as NOTCH2NLC gene. In patient 11-6 in F9193, another CGG/CCG repeat was suggested in AFF3 at the fragile site FRA2A, located outside the candidate region determined by linkage analysis (data not shown). STR, short tandem repeat.
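Quality trimming of the kind performed with sickle can be illustrated with a simplified sliding-window rule. The Python sketch below is not sickle's implementation; the function names, the window size, the Phred+33 encoding, and the mean-quality threshold of 20 are assumptions chosen for illustration.

```python
def phred33(quality_string: str) -> list[int]:
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(c) - 33 for c in quality_string]


def sliding_window_trim(seq: str, quals: list[int],
                        window: int = 10, min_mean_q: float = 20.0):
    """Cut the read at the start of the first window whose mean quality is below min_mean_q."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not seq:
        return seq, quals
    # Reads shorter than the window are evaluated as a single window.
    for start in range(max(len(seq) - window + 1, 1)):
        win = quals[start:start + window]
        if sum(win) / len(win) < min_mean_q:
            return seq[:start], quals[:start]
    return seq, quals


if __name__ == "__main__":
    seq = "ACGT" * 5
    quals = phred33("I" * 12 + "&" * 8)  # Q40 for 12 bases, then Q5
    print(sliding_window_trim(seq, quals, window=4))
```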



FIG. 10A, FIG. 10B, FIG. 10C, FIG. 10D, FIG. 10E, and FIG. 10F show characterization of CGG repeat expansion mutations in the 5′ UTR of NBPF19 in patients with NIID. NBPF19 gene is also referred to as NOTCH2NLC gene. FIG. 10A: Schematic representation of NBPF19 indicating the location of CGG repeat expansions. Recently, this region has also been annotated as NOTCH2NLC. The primer set used for repeat-primed PCR (RP-PCR) analysis was designed to detect the expanded CGG repeats on the basis of the unique sequences in NBPF19. FIG. 10B: Representative results of RP-PCR analysis demonstrating CGG repeat expansions in the patients in families F9193 and F6321 (upper and middle panels, respectively). In an unaffected married-in individual, no CGG repeat expansions were detected (lower panel). Experiments were conducted twice with reproducible results. FIG. 10C: CGG repeat expansions in NBPF19 were observed in 26 of the 28 Japanese index patients with NIID (12 probands of the 12 familial cases, 12 of the 14 sporadic cases, and both of the two cases with unavailable family histories). The repeat expansion mutations were also detected in two Malaysian patients. FIG. 10D: Pedigree chart of multiplex families with NIID. Squares and circles indicate males and females, respectively. A diagonal line through a symbol indicates a deceased individual. Affected individuals and those suspected of having the disease are indicated by filled and grey symbols, respectively. The pedigree charts are simplified and partly scrambled, including those shown by diamond symbols, for confidentiality reasons.
As shown in the mutation status below the symbols, 11 patients had repeat expansion mutations [exp(+)], whereas three asymptomatic individuals with normal nerve conduction study findings (F6321), three asymptomatic individuals aged >60 years with normal MRI findings (families F9193 and F11393), and two married-in healthy individuals did not [exp(−)]. FIG. 10E: Southern blot analysis revealed expanded alleles in patients with NIID. Probes 1 and 2 were used in the analysis (FIG. 15A and FIG. 15B and FIG. 16A and FIG. 16B). The lengths of the CGG repeat expansions were estimated to range from 270 to 550 bp. Note that the lower bands with intense signals represent wild-type alleles of NBPF19 together with restriction fragments of the same sizes derived from the other four paralogous genes (AC253572.1, NOTCH2, NOTCH2NL, and NBPF14). Experiments were conducted twice with reproducible results. PBL, genomic DNA extracted from peripheral blood leukocytes; LCL, genomic DNA extracted from lymphoblastoid cell line. FIG. 10F: Distribution of the number of CGG repeats in the 5′ UTR of NBPF19. The genomic DNA regions containing the CGG repeats and the flanking sequences were amplified by PCR using an NBPF19-specific primer pair (FIG. 18A and FIG. 18B). The number of CGG repeats was determined from circular consensus sequencing (CCS) reads. The number of CGG repeats ranged from 7 to 39 in 182 control subjects, with considerable variation in repeat configuration. In addition, three SNVs (rs1172135200, rs1258206224, and rs1436954367, designated as “3 SNVs”) were exclusively present in the allele with the repeat motif (AGG)(CGG)9(AGG)3 in 14 control subjects. Another allele, carrying rs1258206224 with a configuration of (AGG)(CGG)n(AGG)2(CGG), was observed in 3 control subjects. The repeat motif (AGG)(CGG)n(AGG)2(CGG) was observed in the majority of alleles, and its CGG repeat lengths tended to be larger than those of alleles with the repeat motif (AGG)(CGG)n(AGG)3.
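Assuming the 270 to 550 bp Southern blot estimates refer to the length of the repeat tract itself, they can be converted into approximate numbers of CGG units (n) by dividing the tract length L by the 3-bp unit length:

```latex
n \approx \frac{L}{3\ \text{bp}}, \qquad
\frac{270\ \text{bp}}{3\ \text{bp}} = 90, \qquad
\frac{550\ \text{bp}}{3\ \text{bp}} \approx 183
```

Under that assumption, the expanded alleles carry roughly 90 to 183 CGG units, compared with 7 to 39 repeats in the 182 control subjects.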



FIG. 11A, FIG. 11B, FIG. 11C, FIG. 11D, FIG. 11E, FIG. 11F, FIG. 11G, FIG. 11H, FIG. 11I, FIG. 11J, FIG. 11K, FIG. 11L, FIG. 11M, FIG. 11N, and FIG. 11O show a multiple sequence alignment of a long read, NBPF19, AC253572.1, NOTCH2NL, NBPF14, and NOTCH2. A long-read sequence obtained by single-molecule, real-time sequencing was aligned with the corresponding regions in NBPF19, AC253572.1, NOTCH2NL, NBPF14, and NOTCH2 using ClustalW2 [Larkin, M. A., et al. ClustalW and ClustalX version 2.0. Bioinformatics 23, 2947-2948 (2007)]. The five long reads spanning the CGG repeats in NBPF19 were subjected to error correction using Canu (version 1.7) [Koren, S., et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722-736 (2017)] and then assembled using racon (version 1.3.1) [Vaser, R., et al. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737-746 (2017)]. CGG repeat expansions are shown by boxes in FIG. 11B and FIG. 11C. An NBPF19-specific Alu insertion is shown by boxes in FIG. 11K and FIG. 11L, which confirmed that the expanded CGG repeats were located in NBPF19. One of the primer sequences (NBPF19-R, FIG. 13A and FIG. 13B) for repeat-primed PCR analysis (shown by a box in FIG. 11D) and a primer pair (pGEX3′-NBPF19-6F and NBPF19-5R2, FIG. 17A and FIG. 17B) for fragment analysis (shown by boxes in FIG. 11A and FIG. 11E) were designed to avoid nonspecific amplification.



FIG. 12 shows raw and corrected long reads. Rows with a white background show read names, read properties, and nucleotide sequences before error correction, and rows with a grey background show the same information after error correction by Canu.



FIG. 13A, FIG. 13B, and FIG. 13C show primer sequences used for repeat-primed PCR analysis.



FIG. 14 shows primer sequences used for the repeat-primed PCR analysis of FMR1. The present inventors used deaza-dGTP in place of dGTP. The PCR reaction was conducted as follows: initial denaturation at 94° C. for 1 min, followed by 30 cycles of 94° C. for 30 s, 60° C. for 30 s, and 72° C. for 80 s, or the slowdown PCR protocol described in the present disclosure. GCII buffer was obtained from TaKaRa (TaKaRa LA Taq with GC Buffer).



FIG. 15A and FIG. 15B show primer sequences used for preparation of templates and probes for Southern blot analysis. Genomic DNA segments flanking the CGG repeats were amplified using the primer pairs (NBPF19_1/NBPF19_4, NBPF19_2aF2/NBPF19_2cF3, 2107F/3052R, and 2243F/2995R) and subcloned into plasmids. Probes for Southern blot hybridization analysis were prepared by digoxigenin (DIG) labeling using the primer pairs NBPF19_1/NBPF19_1R for Probe 1, NBPF19_4F/NBPF19_4 for Probe 2, NBPF19_2aF2/NBPF19_2aR2 for Probe 3, NBPF19_2bF2/NBPF19_2bR2 for Probe 4, and NBPF19_2cF2/NBPF19_2cR2 for Probe 5 [NBPF19]; 2107F/2531R for Probe 6 [LOC642361/NUTM2B-AS1]; and 2243F/2562R for Probe 7 and 2538F/2995R for Probe 8 [LRP12].



FIG. 16A, FIG. 16B, and FIG. 16C show intergenerational instability of the CGG repeats in NBPF19. NBPF19 gene is also referred to as NOTCH2NLC gene. FIG. 16A: SacI/NheI digestion sites around the CGG repeats in the 5′ UTR of NBPF19 are shown. An Alu sequence (starred) downstream of the CGG repeats is absent from the other four highly homologous genes (AC253572.1, NOTCH2, NOTCH2NL, and NBPF14). This enabled the present inventors to distinguish the NBPF19 alleles from those of the other highly homologous genes in Southern blot analysis using NheI-digested genomic DNA (gDNA). Restriction fragments generated from NOTCH2, AC253572.1, NBPF14, and NOTCH2NL are estimated to be 2,696 bp, 2,691 bp, 2,696 bp, and 2,707 bp, respectively, whereas that from NBPF19 is estimated to be 3,009 bp based on hg38. FIG. 16B and FIG. 16C: Southern blot analysis of parent-offspring pairs in the branches of F6321 using NheI-digested gDNA, in which the present inventors used probes 1-5 to enhance the signal intensity of the target bands. White arrows indicate fragments derived from the four genes (NOTCH2, AC253572.1, NBPF14, and NOTCH2NL) that do not carry the Alu sequence designated by a star in FIG. 16A, and gray arrows indicate wild-type NBPF19 alleles that carry the Alu sequence. Black arrows indicate NBPF19 alleles with expanded CGG repeats. The results showed that the CGG repeats in NBPF19 became larger in successive generations. The parent indicated by a gray symbol in FIG. 16B showed abnormalities only in the nerve conduction study.
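The size-based reasoning used to read these blots can be summarized in a Python sketch. The expected NheI fragment sizes are those stated above; the ±100 bp matching tolerance, the conversion of extra length into CGG units at 3 bp each, and the function name `interpret_band` are simplifying assumptions, and real band sizing depends on gel resolution.

```python
PARALOG_FRAGMENTS_BP = {"NOTCH2": 2696, "AC253572.1": 2691, "NBPF14": 2696, "NOTCH2NL": 2707}
NBPF19_FRAGMENT_BP = 3009  # carries the NBPF19-specific Alu insertion (hg38 estimate)


def interpret_band(size_bp: int, tolerance: int = 100) -> tuple[str, float]:
    """Classify an NheI band by size; return (label, approximate extra CGG units vs hg38)."""
    if any(abs(size_bp - s) <= tolerance for s in PARALOG_FRAGMENTS_BP.values()):
        return "paralog without Alu", 0.0
    extra_bp = size_bp - NBPF19_FRAGMENT_BP
    if abs(extra_bp) <= tolerance:
        return "wild-type NBPF19", 0.0
    if extra_bp > tolerance:
        return "expanded NBPF19", extra_bp / 3
    return "unassigned", 0.0


if __name__ == "__main__":
    for band in (2700, 3030, 3309):
        print(band, interpret_band(band))
```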



FIG. 17A and FIG. 17B show primer sequences used for the fragment analysis in control subjects. PCR was conducted as follows: for NBPF19, initial denaturation at 98° C. for 1 min, followed by 35 cycles of 98° C. for 10 sec, 58° C. for 30 sec, and 68° C. for 30 sec; for LOC642361/NUTM2B-AS1, initial denaturation at 95° C. for 1 min, followed by 30 cycles of 94° C. for 30 sec, 50° C. for 30 sec, and 72° C. for 60 sec; and for LRP12, initial denaturation at 98° C. for 1 min, followed by 35 cycles of 98° C. for 10 sec, 60° C. for 30 sec, and 68° C. for 30 sec. GCII buffer was obtained from TaKaRa (TaKaRa LA Taq with GC Buffer).



FIG. 18A and FIG. 18B show primer sequences and barcode sequences used for the circular consensus sequencing (CCS) analysis using a SMRT sequencer. Each forward and reverse primer contained a 16-mer barcode, as listed in the figures. PCR was conducted as follows: initial denaturation at 98° C. for 1 min, followed by 35 cycles of 98° C. for 10 sec, 58° C. for 30 sec, and 68° C. for 30 sec. GCII buffer was obtained from TaKaRa (TaKaRa LA Taq with GC Buffer).
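Because the PCR products of many control subjects were pooled before sequencing, each CCS read has to be assigned back to its subject through the barcodes. The sketch below shows exact-match demultiplexing under stated assumptions: each amplicon begins with the 16-mer forward barcode and ends with the reverse complement of the 16-mer reverse barcode, reads may come from either strand, and barcodes contain no sequencing errors. The barcode sequences and sample names are hypothetical, not those of FIG. 18.

```python
# Sketch of exact-match barcode demultiplexing for pooled CCS reads.
# Assumptions: amplicon = forward barcode + insert + revcomp(reverse barcode);
# reads may be in either orientation; no errors within the barcodes.
from typing import Dict, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assign_sample(read: str, barcodes: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Return the sample whose (forward, reverse) barcode pair flanks the read."""
    for oriented in (read, revcomp(read)):
        for sample, (fwd, rev) in barcodes.items():
            if oriented.startswith(fwd) and oriented.endswith(revcomp(rev)):
                return sample
    return None  # unassigned read


# Hypothetical 16-mer barcode pairs, for illustration only.
BARCODES = {
    "control_001": ("CATCGTAGCTAGTCAG", "GATTACAGATTACAGA"),
    "control_002": ("TGACTGCAAGTCCTGA", "CTCTAGAGGTACCGTA"),
}
```

Real pipelines normally tolerate a few mismatches in the barcode; exact matching is used here only to keep the logic visible.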



FIG. 19A and FIG. 19B show repeat configurations of the CGG and flanking repeats in NBPF19 in control subjects as revealed by CCS analysis. NBPF19 gene is also referred to as NOTCH2NLC gene. FIG. 19A: The CGG and flanking repeats in the 5′ UTR of NBPF19 are (AGG)(CGG)9(AGG)2(CGG) in the reference sequence (hg38). To determine the number of repeat units, the repeat configurations, and single nucleotide variants in the flanking sequences, CCS analysis was performed on pooled barcoded PCR products from 182 control subjects. CCS reads were confirmed to contain the NBPF19-specific sequence shown by an underline. FIG. 19B: The present inventors observed 11 repeat configurations and single nucleotide variants (SNVs) in the flanking sequences in NBPF19. One allele type, carrying three SNVs (rs1172135200, rs1436954367, and rs1376391857) in the flanking sequences and always showing the configuration (AGG)(CGG)9(AGG)3, and another allele type, carrying rs1258206224 with the configuration (AGG)(CGG)n(AGG)2(CGG), were observed in 14 and 3 controls, respectively. On the basis of these observations, the distribution of the number of CGG repeat units (shown by "n") was determined (FIG. 30A, FIG. 30B, FIG. 30C, and FIG. 30D).



FIG. 20A and FIG. 20B show a frequency distribution of repeat sizes in NBPF19 in 1,000 control subjects as revealed by fragment analysis. NBPF19 gene is also referred to as NOTCH2NLC gene. FIG. 20A: The frequency distribution of repeat sizes of the CGG repeats and the flanking variable repeat sequences in NBPF19 of 1,000 control subjects was determined by fragment analysis of PCR products obtained using an NBPF19-specific primer pair (pGEX3′-NBPF19-6F and NBPF19-5R2). In the reference sequence (hg38), the repeat size is 13 repeat units, namely (AGG)(CGG)9(AGG)2(CGG). FIG. 20B: A multiple sequence alignment of the five homologous sequences (NBPF19, AC253572.1, NOTCH2NL, NBPF14, and NOTCH2) generated using ClustalW2 is shown. Variable repeat sequences including the CGG repeats are shown below a line. In the fragment analysis, repeat sizes were determined as the lengths in repeat units between the flanking non-variable sequences (shown below dotted lines). Primers used in the analysis are shown by arrows (pGEX3′-NBPF19-6F and NBPF19-5R2). Numbers shown in the figures indicate relative distances from 149,390,308 (NBPF19), 120,723,618 (AC253572.1), 146,229,332 (NOTCH2NL), 148,680,074 (NBPF14), and 120,069,958 (NOTCH2).
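The repeat size of 13 units counts every unit in the configuration, interruptions included: 1 + 9 + 2 + 1 for (AGG)(CGG)9(AGG)2(CGG). The same convention gives the 13 units defined for the LRP12 reference configuration (CGG)9(CGT)(CGG)(CGT)2 in FIG. 30E. The sketch below parses this notation; the regular expressions and function names are assumptions made for illustration, and configurations with a variable count such as "(CGG)n" are deliberately rejected.

```python
import re
from typing import List, Tuple

# One token: a repeat motif in parentheses, optionally followed by a count.
_TOKEN = re.compile(r"\(([ACGT]+)\)(\d*)")
_WHOLE = re.compile(r"(?:\([ACGT]+\)\d*)+")


def parse_configuration(config: str) -> List[Tuple[str, int]]:
    """Split a notation such as '(AGG)(CGG)9' into (motif, count) pairs."""
    if not _WHOLE.fullmatch(config):
        raise ValueError(f"unrecognized repeat configuration: {config!r}")
    return [(motif, int(count) if count else 1) for motif, count in _TOKEN.findall(config)]


def total_repeat_units(config: str) -> int:
    """Repeat size in units, counting every motif (interruptions included)."""
    return sum(count for _, count in parse_configuration(config))
```

For example, `total_repeat_units("(AGG)(CGG)9(AGG)2(CGG)")` and `total_repeat_units("(CGG)9(CGT)(CGG)(CGT)2")` both return 13.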



FIG. 21 shows inter-pulse durations (IPDs) at CGG sites examined by SMRT sequencing. The present inventors first created a reference IPD set for hypomethylated and hypermethylated CGGs using whole-genome bisulfite sequencing data and PacBio Sequel sequencing data, both obtained from the same individual. The reference benchmark set had 303 hypomethylated CGG repeat regions with 1,220 CpGs and 14 hypermethylated regions with 59 CpGs. The present inventors observed a significant difference in IPD statistics (on cytosine sites of CGG) between the methylated (n=59) and unmethylated (n=1,220) CpG sites (*p=3.3×10^−16, one-sided Mann-Whitney U test), demonstrating that IPD is informative in inferring the CpG methylation status of CGG repeats. The present inventors next examined whether the expanded CGG repeat in the 5′ UTR of NBPF19 resembled hypomethylated or hypermethylated CGG repeats in terms of the IPD statistics of CpG sites, testing the null hypothesis of independence of IPD statistics with the Mann-Whitney U test. The IPD distribution on cytosine sites of the expanded CGG repeat in the 5′ UTR of NBPF19 (n=60) was similar to that of hypermethylated CGG repeats (n=59) (***p=0.35, two-sided test) but significantly different from that of hypomethylated CGG repeats (n=1,220) (**p=1.6×10^−4, one-sided test), showing that the expanded CGG repeat in the 5′ UTR of NBPF19 was regionally hypermethylated as a whole.
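The comparisons above rest on the Mann-Whitney U test, which asks whether values in one group (for example, IPDs at methylated CpGs) tend to be larger than values in another. The sketch below is a minimal pure-Python version using the large-sample normal approximation; it omits the continuity and tie corrections that statistical packages such as `scipy.stats.mannwhitneyu` can apply, so its p-values are approximate. It is an illustration of the test, not the inventors' analysis code.

```python
import math
from typing import List, Sequence, Tuple


def _midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float],
                   alternative: str = "two-sided") -> Tuple[float, float]:
    """U statistic for x and a normal-approximation p-value.

    alternative="greater" tests whether x tends to exceed y.
    No continuity or tie correction is applied (simplification).
    """
    n1, n2 = len(x), len(y)
    ranks = _midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    if alternative == "greater":
        p = 0.5 * math.erfc(z / math.sqrt(2))
    elif alternative == "less":
        p = 0.5 * math.erfc(-z / math.sqrt(2))
    elif alternative == "two-sided":
        p = math.erfc(abs(z) / math.sqrt(2))
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    return u1, p
```

With hypothetical IPD values `x = [5, 6, 7, 8]` and `y = [1, 2, 3, 4]`, every x exceeds every y, so U equals n1 × n2 = 16 and the one-sided p-value is small.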



FIG. 22A and FIG. 22B show the expression level of NBPF19 in brains examined by RNA-seq. NBPF19 gene is also referred to as NOTCH2NLC gene. FIG. 22A: There are 4 positions in noncoding exon 1 of NBPF19 whose sequences are unique to NBPF19 among the five homologous sequences in AC253572.1, NOTCH2, NOTCH2NL, NBPF19, and NBPF14. Physical positions in hg38 are shown. From RNA-seq data of 3 patients with NIID and 8 control subjects (occipital lobe), reads per million mapped reads at these positions were calculated. Because one of the positions lies just downstream of the CGG repeats (chr1:149,390,838 in hg38), which made precise alignment difficult, the present inventors did not calculate coverage at that position. FIG. 22B: The present inventors reassessed the expression levels of NBPF19 using reads per million mapped reads at the three remaining positions described above. No statistically significant differences were observed between NIID patients (n=3) and control subjects (n=8; two-sided Wilcoxon rank-sum tests). The data are shown as means and standard errors of the mean.
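Reads per million mapped reads (RPM) normalize the coverage at a position to the sequencing depth of each library, so that samples sequenced to different depths can be compared. The sketch below computes RPM, averages it over the informative positions of one sample, and summarizes a group as mean and standard error of the mean. Averaging the three positions per sample is an assumption; the text does not state how the positions were combined. The group comparison itself would use the Wilcoxon rank-sum test, which is equivalent to the Mann-Whitney U test.

```python
import math
from typing import Sequence, Tuple


def reads_per_million(reads_at_position: int, total_mapped_reads: int) -> float:
    """Coverage at a position normalized to library size (RPM)."""
    return reads_at_position * 1_000_000 / total_mapped_reads


def sample_expression(counts: Sequence[int], total_mapped_reads: int) -> float:
    """Average RPM over the NBPF19-specific positions of one sample (assumed summary)."""
    values = [reads_per_million(c, total_mapped_reads) for c in counts]
    return sum(values) / len(values)


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Group mean and standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd / math.sqrt(n)
```

For instance, 50 reads at a position in a library of 25,000,000 mapped reads corresponds to 2.0 RPM.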



FIG. 23A, FIG. 23B, FIG. 23C, and FIG. 23D show the identification of CGG repeat expansions in LOC642361/NUTM2B-AS1 in a family with oculopharyngeal myopathy with leukoencephalopathy (OPML). FIG. 23A: Schematic representation of the exons of LOC642361 and NUTM2B-AS1, both of which encode noncoding RNAs. The directions of transcription are indicated by arrows. The primer set used for repeat-primed PCR (RP-PCR) analysis is designed to detect expanded CGG repeats (a line and arrows). FIG. 23B: Representative results of RP-PCR analysis showing CGG repeat expansions in patients in family F5305 (upper and middle panels). In an unaffected married-in individual, no CGG repeat expansions were detected (lower panel). Experiments were conducted twice with reproducible results. FIG. 23C: Pedigree chart of the family with OPML. Squares and circles indicate males and females, respectively. A diagonal line through a symbol indicates a deceased individual. Affected individuals are indicated by filled symbols. The pedigree chart is simplified for confidentiality reasons. As shown by the mutation status below the symbols, four patients had repeat expansion mutations [exp(+)], whereas seven unaffected individuals, including three married-in individuals, did not [exp(−)]. FIG. 23D: The frequency distribution of the number of CGG repeat units in LOC642361/NUTM2B-AS1 in 1,000 control subjects, as revealed by fragment analysis, is shown. LOC642361/NUTM2B-AS1-specific primers were used for amplification. In the reference sequence (hg38), (CGG)6 is registered.



FIG. 24A and FIG. 24B show short reads indicating a CGG repeat expansion in LOC642361/NUTM2B-AS1. FIG. 24A: Nine nonrepeat reads paired with reads filled with CGG/CCG repeats were identified in patient III-5 in F5305. BLAT mapped seven of the nine reads best to the LOC642361/NUTM2B-AS1 region. STR, short tandem repeat. FIG. 24B: Alignment of the nonrepeat reads paired with reads filled with CGG/CCG repeats indicates that the CGG repeat expansion is located in LOC642361/NUTM2B-AS1. Reads are shown on the same strand as the direction of transcription of LOC642361. Homologous sequences of LOC642361/NUTM2B-AS1 and mismatches among them are shown in red squares.



FIG. 25A and FIG. 25B show linkage analysis of the family (F5305) with OPML. Parametric linkage analysis results of the family with OPML (F5305; FIG. 23A, FIG. 23B, FIG. 23C, and FIG. 23D) for all chromosomes (FIG. 25A) and for candidate regions (FIG. 25B) are shown. Chromosome 10 is the only chromosome that shows a LOD score above 1. Boundary markers with physical positions in hg38 are indicated below. The locus of LOC642361/NUTM2B-AS1 is indicated by an arrow.
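To make the meaning of a LOD score concrete: in the simplest case of phase-known, fully informative meioses, LOD(θ) = log10[θ^R (1 − θ)^NR / 0.5^(R+NR)], where R and NR are the numbers of recombinant and nonrecombinant meioses and θ is the recombination fraction. A parametric analysis of a real pedigree such as F5305 evaluates a full pedigree likelihood under a specified disease model, which this count-based formula does not capture; the sketch below only illustrates what the score measures.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses at recombination fraction theta."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0:
        if recombinants > 0:
            return float("-inf")  # any recombinant excludes complete linkage
        rec_term = 0.0
    else:
        rec_term = recombinants * math.log10(theta)
    n = recombinants + nonrecombinants
    return rec_term + nonrecombinants * math.log10(1.0 - theta) - n * math.log10(0.5)


def max_lod(recombinants: int, nonrecombinants: int, steps: int = 50):
    """Grid search for (maximum LOD, theta) over theta in [0, 0.5]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max((lod_score(recombinants, nonrecombinants, t), t) for t in grid)
```

For example, four informative nonrecombinant meioses give a maximum LOD of 4 × log10 2 ≈ 1.2 at θ = 0, while at θ = 0.5 the LOD is always 0.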



FIG. 26A, FIG. 26B, and FIG. 26C show bidirectional transcription of the CGG/CCG repeats in LOC642361/NUTM2B-AS1. Stranded RNA-seq data of a control brain and two control muscles, generated with random primers in the reverse transcription reactions, are shown. Short reads are aligned to the reference sequence (hg38) using STAR [Dobin, A., et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21 (2013)]. Reads are divided into two files according to the direction of transcription. Only reads with a mapping quality of 5 or more are shown, using the Integrative Genomics Viewer [Robinson, J. T., et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24-26 (2011)]. FIG. 26A: The CGG/CCG repeats in LOC642361/NUTM2B-AS1 were bidirectionally transcribed, although coverage at the CGG/CCG repeats was underrepresented, presumably owing to their high GC content. FIG. 26B: No signals suggesting bidirectional transcription were observed at the CGG repeats in the 5′ UTR of NBPF19, although mapping at this locus remains problematic because of other highly homologous sequences. FIG. 26C: Most of the reads in exon 1 of LRP12 were sense reads; only trivial numbers of antisense reads were observed.
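Splitting stranded RNA-seq reads by direction of transcription requires knowing how the library preparation relates read orientation to the RNA template. The sketch below assumes paired-end reads from a dUTP-style library, in which read 1 is antisense to the transcript; the text does not state the library type, so this is an assumption. The `Alignment` class is a hypothetical stand-in whose fields mirror the pysam `AlignedSegment` attributes `is_read1`, `is_reverse`, and `mapping_quality`. The MAPQ threshold of 5 is the one stated above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

MIN_MAPQ = 5  # mapping-quality threshold stated for FIG. 26


@dataclass
class Alignment:
    """Minimal stand-in for an aligned read (hypothetical fields)."""
    name: str
    is_read1: bool
    is_reverse: bool  # read aligned to the minus strand of the reference
    mapq: int


def transcript_strand(aln: Alignment) -> str:
    """Strand of the RNA template, assuming a dUTP-style (read 1 antisense) library."""
    read1_on_minus = aln.is_reverse if aln.is_read1 else not aln.is_reverse
    return "+" if read1_on_minus else "-"


def split_by_strand(alignments: Iterable[Alignment]) -> Dict[str, List[Alignment]]:
    """Drop low-MAPQ reads, then separate the rest by transcript strand."""
    out: Dict[str, List[Alignment]] = {"+": [], "-": []}
    for aln in alignments:
        if aln.mapq >= MIN_MAPQ:
            out[transcript_strand(aln)].append(aln)
    return out
```

With a library in which read 1 is sense to the transcript, the two labels returned by `transcript_strand` would simply be swapped.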



FIG. 27 shows homologous regions of the CGG repeats in LOC642361/NUTM2B-AS1. The region of the CGG repeats in LOC642361/NUTM2B-AS1 has two homologous sequences with high similarity in the reference genome (hg38). Identity and score are calculated using BLAT. The sequence (chr10:79,825,306-79,827,410), which corresponds to 1 kb upstream and downstream of the CGG repeat in LOC642361/NUTM2B-AS1, is used as the query.



FIG. 28A, FIG. 28B, and FIG. 28C show multiple sequence alignments of homologous genes of LOC642361/NUTM2B-AS1. The sequence around the CGG/CCG repeats in LOC642361/NUTM2B-AS1 was aligned with the homologous sequences of LINC00863/NUTM2A-AS1 (chromosome 10) and FLJ22063/AMMECR1L (chromosome 2) using ClustalW2. Sequences are derived from hg38. The position of the CGG repeat expansion mutations is shown in a box. The primer (LOC642361-R2; FIG. 13A, FIG. 13B, and FIG. 13C) for repeat-primed PCR analysis (shown by a lower arrow in FIG. 28B) and the primer pair (LOC642361_PCR-F3 and pGEX3′-LOC642361_PCR-R; FIG. 17A and FIG. 17B) for fragment analysis (shown by an arrow in FIG. 28A and by an upper arrow in FIG. 28B) were designed to avoid nonspecific amplification.



FIG. 29A and FIG. 29B show Southern blot analysis of LOC642361/NUTM2B-AS1. FIG. 29A: Southern blot analysis was performed using probes targeting the flanking regions of the CGG repeats in LOC642361/NUTM2B-AS1 on chromosome 10. The probes were also predicted to hybridize to the other two similar sequences (LINC00863/NUTM2A-AS1 on chromosome 10 and FLJ22063/AMMECR1L on chromosome 2). Predicted fragment sizes based on hg38 are 1.4 kb (LOC642361/NUTM2B-AS1), 1.4 kb (LINC00863/NUTM2A-AS1), and 1.1 kb (FLJ22063/AMMECR1L). Strong somatic instability of the CGG repeats was observed in genomic DNA from peripheral blood leukocytes (PBL). The experiment was conducted once. FIG. 29B: An expanded allele of 2.1 kb (corresponding to 700 repeat units) was observed in genomic DNA from a lymphoblastoid cell line of patient III-3 of family F5305. NC: normal control. The experiments were conducted twice with similar results.



FIG. 30A, FIG. 30B, FIG. 30C, FIG. 30D, and FIG. 30E show the identification of CGG repeat expansions in LRP12 in families with oculopharyngodistal myopathy (OPDM). FIG. 30A: Schematic representation of the exons of LRP12. The CGG repeat expansion is located in the 5′ untranslated region (5′ UTR). The primer set used for repeat-primed PCR (RP-PCR) analysis is designed to detect expanded CGG repeats (a line and arrows). FIG. 30B: Representative results of RP-PCR analysis indicating CGG repeat expansions in patients in families F7967 and F3411 (upper and middle panels). In an unaffected control, no CGG repeat expansions were detected (lower panel). Experiments were conducted twice with reproducible results. FIG. 30C: Pedigree charts of families with OPDM. Squares and circles indicate males and females, respectively. A diagonal line through a symbol indicates a deceased individual. Affected individuals are indicated by filled symbols. The pedigree charts are simplified for confidentiality reasons. As shown by the mutation status below the symbols, three affected individuals had repeat expansion mutations [exp(+)], whereas the unaffected individual did not [exp(−)]. FIG. 30D: CGG repeat expansions in LRP12 were identified in 38.2% of patients with supporting histopathological findings of rimmed vacuoles (RVs) and in 16.7% of patients whose histopathological findings were unavailable. No CGG repeat expansions in LRP12 were found in patients with similar clinical presentations but without RVs in biopsied muscle specimens. FIG. 30E: The frequency distribution of the number of CGG repeat units in LRP12 in 1,000 control subjects, as revealed by fragment analysis, is shown. The repeat configuration in the reference sequence (hg38) is (CGG)9(CGT)(CGG)(CGT)2. The number of repeat units for this allele was defined as 13 in this analysis.



FIG. 31A and FIG. 31B show short reads indicating a CGG repeat expansion in LRP12. FIG. 31A: Three nonrepeat reads paired with reads filled with CGG/CCG repeats were identified in patient III-1 in F7967. All three reads were mapped to the LRP12 region by BLAT. STR, short tandem repeat. FIG. 31B: Alignment of the nonrepeat reads paired with reads filled with CGG/CCG repeats indicates that the CGG repeat expansion is located in the 5′ UTR of LRP12. Reads are shown on the same strand as the direction of transcription of LRP12.



FIG. 32A and FIG. 32B show Southern blot analysis of patients with oculopharyngodistal myopathy and controls. FIG. 32A: Southern blot analysis of patients with OPDM1. In genomic DNA from lymphoblastoid cell lines (LCLs), multiple bands presumably derived from somatic instability (gray arrows) were observed, whereas single expanded bands (230 and 380 bp, black arrows) were observed in genomic DNA from peripheral blood leukocytes (PBL). This experiment was conducted once. FIG. 32B: In the two controls who had the longest repeats as suggested by repeat-primed PCR analysis (ages at blood sampling, 63 years and 25 years), the expanded CGG repeat sizes exceeded 300 bp (black and gray arrows), and multiple bands were observed in genomic DNA from LCLs (gray arrows). This experiment was conducted once. Exp+, carrier of expansion; exp−, noncarrier of expansion.



FIG. 33A and FIG. 33B show clinical characteristics of the family (F5305) with oculopharyngeal myopathy with leukoencephalopathy (OPML). Abbreviations: y/o, years old; ND, not described; N/A, not applicable; MMSE, Mini-Mental State Examination; HDS-R, Revised Hasegawa Dementia Scale; WAIS-R, Wechsler Adult Intelligence Scale-Revised; PIQ, performance intelligence quotient; VIQ, verbal intelligence quotient; TIQ, total intelligence quotient.



FIG. 34 shows a model of a replication cycle of a circular nucleic acid.



FIG. 35 shows a procedure to detect a repeat expansion of CGG in a nucleic acid in a case where the repeat expansion of CGG is in NBPF19/NOTCH2NLC gene.



FIG. 36 shows a sequence of oriC cassette.



FIG. 37 shows a gel electrophoretic photograph according to example 8.



FIG. 38 shows gel electrophoretic photographs according to example 8.



FIG. 39 is a table showing the results of size analysis of amplification products derived from four samples according to example 8.



FIG. 40 is a table showing the results of size analysis of amplification products derived from 37 samples according to example 8.








DESCRIPTION OF EMBODIMENTS

Unstable tandem repeat expansions have been shown to be involved in a wide variety of neurological diseases. Given the rapidly increasing number of diseases belonging to this group, many more diseases are expected to await identification of their causative genes. The availability of massively parallel short-read sequencers has dramatically accelerated the search for causative genes, including within the de novo sequencing research paradigm. Because expanded tandem repeats remain difficult to detect with short-read sequencers, straightforward and efficient strategies for directly identifying expanded tandem repeats are expected to accelerate gene discovery substantially.


As the first candidate disease for a direct search for expanded tandem repeat mutations, the present inventors selected neuronal intranuclear inclusion disease (NIID, MIM603472, https://omim.org/). NIID is a neurodegenerative disease characterized clinically by various combinations of cognitive decline, parkinsonism, cerebellar ataxia, and peripheral neuropathy, and neuropathologically by eosinophilic hyaline intranuclear inclusions in the central and peripheral nervous systems as well as in other tissues, including cardiovascular, digestive, and urogenital organs. The age at onset ranges from infancy to late adulthood. Although an autosomal dominant mode of inheritance has been assumed, about two-thirds of cases have been reported to be sporadic. Recently, characteristic magnetic resonance imaging (MRI) findings, including high-intensity signals in diffusion-weighted imaging (DWI) at the corticomedullary junction, and eosinophilic intranuclear inclusions observed in skin biopsy have been described as useful diagnostic hallmarks for NIID. Following these reports, a rapidly increasing number of NIID cases, particularly those with late adult onset, have been reported.


Inspired by the striking similarity of MRI findings between NIID and fragile X tremor/ataxia syndrome (FXTAS, MIM300623), including T2 hyperintensity in the middle cerebellar peduncles (MCP sign) and high-intensity signals on DWI at the corticomedullary junction, which are also occasionally observed in FXTAS (FIG. 1), and by the presence of eosinophilic intranuclear inclusions in both diseases, the present inventors hypothesized that NIID shares a common molecular basis with FXTAS, a disease caused by mildly expanded CGG repeats (premutations of 55-200 repeat units) in the 5′ untranslated region (UTR) of FMR1. To explore the possibility of expanded CGG repeats in NIID, the present inventors devised a direct search strategy (FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D) to efficiently identify expanded repeats in the human genome using TRhist, which produces histograms of short reads filled with tandem repeats. Using TRhist, the present inventors indeed identified an accumulation of short reads filled with CGG repeats in the 5′ UTR of NBPF19 in NIID. NBPF19 gene is also referred to as NOTCH2NLC gene.


Prompted by the similarity in clinical and neuroimaging findings with NIID, the present inventors further identified similar noncoding CGG repeat expansions in LOC642361/NUTM2B-AS1 and LRP12 in two other diseases, oculopharyngeal myopathy with leukoencephalopathy (OPML) and oculopharyngodistal myopathy (OPDM, MIM164310), respectively. Taken together with the present inventors' previous findings, the present study further expands the concept that noncoding repeat expansions involving the same repeat motif, together with the tissues in which the genes are transcribed, lead to diseases with similar or overlapping clinical presentations, and it provides a new, straightforward approach to discovering repeat expansion mutations underlying a wide variety of diseases.


Here, the present inventors identified noncoding CGG repeat expansions in the three genes, NBPF19, LOC642361, and LRP12, as the disease-causing mutations for NIID, OPML and OPDM, respectively (FIG. 3). NBPF19 gene is also referred to as NOTCH2NLC gene. The present inventors herein designate the diseases with the repeat expansions in NBPF19, LOC642361, and LRP12 as NIID1, OPML1, and OPDM1, respectively.


Including FXTAS and OPMD, these five diseases are caused by expansions involving the same repeat motif. Although the clinical presentations of FXTAS, NIID, OPML, OPDM, and OPMD are distinct, there are considerable overlaps among these diseases (FIG. 3), suggesting that transcribed expanded CGG repeats are commonly involved in the development of these diseases, irrespective of the genes in which the expanded repeats are located. The present inventors have recently discovered that noncoding TTTCA repeat expansions in three genes cause benign adult familial myoclonic epilepsies (BAFME1 [MIM601068], BAFME6 [MIM618074], and BAFME7 [MIM618075]). Thus, the findings that the same expanded repeat motifs located in different genes lead to overlapping clinical spectra of diseases further expand knowledge of noncoding repeat expansion diseases. Although the tissue expression patterns of the causative genes may modify their clinical presentations, what determines the distinct clinical characteristics of FXTAS, NIID1, OPML1, and OPDM1 remains to be explored.


Although the frequency is very low, CGG repeat expansions in LRP12 were observed in a small number of control subjects (0.2%). Regarding CGG repeat expansions in FMR1, 0.21% of male controls in the United States had expansions (55-200 repeat units). In frontotemporal lobar degeneration/amyotrophic lateral sclerosis (FTLD/ALS) caused by GGGGCC repeat expansions in C9orf72 [MIM105550], 0.15% of controls in the United Kingdom and 0.4% of controls in Finland have repeat expansions. Thus, the rare occurrence of repeat expansions in controls appears to be a common finding in noncoding repeat expansion diseases. Detailed investigation of the structures of the expanded repeats, and of the haplotypes flanking them, in patients and controls may provide insight into the mechanisms underlying this phenomenon.


Founder haplotypes have been identified in many repeat expansion diseases. Haplotype analysis in families with OPDM revealed a shared haplotype, suggesting a founder effect (FIG. 4). Because the NBPF19 locus shares extremely high sequence identity with its paralogous genes and lies next to the long heterochromatin region (1q12) (FIG. 5A and FIG. 5B), the present inventors were unable to unambiguously determine the haplotypes of families with NIID.


Of note, both FXTAS and C9orf72-linked FTLD/ALS are well documented in sporadic cases. Family histories were documented in only 50% of Japanese families with NIID1 and 41% of patients with OPDM1 in the present case series, suggesting that attention should be paid not only to familial cases but also to sporadic cases presenting with similar clinical features. Furthermore, diversity in clinical presentations and ages at onset has also been observed in these diseases. Although the mechanisms are as yet unknown, dynamic instability of noncoding repeat expansions among tissues as well as in germlines may underlie these phenomena.


In the present inventors' case series, 7.1% of Japanese NIID patients and 61.8% of OPDM patients with supporting pathological findings in biopsied tissues did not have CGG repeat expansion mutations in NBPF19 and LRP12, respectively. Thus, these diseases may be genetically heterogeneous. A further search for CGG repeat expansions at other loci, or for repeat expansions involving similar repeat motifs, would be a feasible approach.


Analysis of the methylation status of expanded CGG repeats in a patient with NIID using SMRT sequencing reads showed a tendency toward hypermethylation of the CGG repeats. The present inventors did not, however, detect a statistically significant decrease in NBPF19 transcripts, indicating that the expanded alleles are not fully silenced. In addition, Fiddes et al. reported that NBPF19/NOTCH2NLC (which they call the NOTCH2NLC-like paratype) has variable copy numbers, with frequencies of 0, 1, and 2 copies being 0.4%, 6%, and 92%, respectively, indicating that haploinsufficiency of NBPF19 is unlikely to cause NIID.


In FXTAS, ubiquitinated inclusions have been shown in brains and non-neuronal tissues. After the discovery of repeat-associated non-ATG-initiated (RAN) translation, RAN proteins were revealed to be a component of the ubiquitinated inclusions in FXTAS. NIID and OPDM are pathologically characterized by intranuclear inclusions and tubulofilamentous inclusions, respectively. Thus, it is conceivable that the inclusions observed in NIID and OPDM contain RAN proteins, although this awaits confirmation. In contrast, routine histopathological examination of biopsied muscle from two patients (III-3 and III-5 in F5305) did not reveal inclusions in OPML1. RNA-mediated toxicity through the sequestration of RNA-binding proteins that recognize expanded CGG repeats may also be variably involved in these diseases.


Identification of disease-causing repeat expansions has usually been accomplished by laborious classical positional cloning approaches. As shown in the present disclosure, the present inventors used TRhist to directly detect repeat expansions from short-read next-generation sequencing data and discovered the causative genes by aligning the nonrepeat reads of the paired short reads to the reference genome. Among recently developed programs targeting repeat expansions in short-read data, an advantage of TRhist is its ability to detect insertions of any kind of expanded repeat, including those containing novel repeat motifs that are not present in the reference genome. Since the present inventors' strategy (FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D) does not require prior linkage analysis, it is applicable to families with variable penetrance and even to sporadic patients without family histories. The availability of single-molecule long-read sequencers should further complement the search for disease-causing repeat expansions using the currently standard short-read next-generation sequencers. Considering that there are ~80,000 microsatellites with 3- to 6-base repeat units in introns of the human genome that could potentially undergo expansion, far exceeding the 20,000 protein-coding and 22,000 noncoding genes (Ensembl, https://www.ensembl.org/), the search for noncoding repeat expansions is expected to further expand knowledge of the genetic architecture of a wide variety of diseases and traits.
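The core idea of the strategy, finding read pairs in which one read is filled with a repeat and then using the nonrepeat mate to locate the repeat in the genome, can be sketched as follows. This is a simplified illustration and not TRhist itself: it uses exact matching, whereas real reads carry sequencing errors and interruptions such as AGG units, and it only collects the mates that would subsequently be aligned (for example with BLAT). All function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_family(motif: str) -> Set[str]:
    """All rotations of a motif and of its reverse complement (CGG -> CGG, GGC, GCG, CCG, CGC, GCC)."""
    family: Set[str] = set()
    for m in (motif, revcomp(motif)):
        family.update(m[i:] + m[:i] for i in range(len(m)))
    return family


def is_filled_with(read: str, motif: str) -> bool:
    """True if the read is an uninterrupted run of the motif (exact match, simplification)."""
    if not read:
        return False
    reps = len(read) // len(motif) + 2
    return any(read in unit * reps for unit in motif_family(motif))


def anchor_mates(pairs: Iterable[Tuple[str, str]], motif: str) -> List[str]:
    """Nonrepeat mates of repeat-filled reads; aligning these locates the repeat."""
    mates = []
    for r1, r2 in pairs:
        f1, f2 = is_filled_with(r1, motif), is_filled_with(r2, motif)
        if f1 and not f2:
            mates.append(r2)
        elif f2 and not f1:
            mates.append(r1)
    return mates
```

Pairs in which both reads are repeat-filled carry no positional information and are skipped; in a real analysis, counting such reads per motif is what a histogram of repeat-filled reads would summarize.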


In conclusion, the present inventors identified noncoding CGG repeat expansions as the causes of NIID1, OPML1, and OPDM1. These findings expand the present inventors' insights into the molecular basis of these diseases and further emphasize the importance of noncoding repeat expansions in a wide variety of neurological diseases.


Based on the above findings by the present inventors, a method for determining, diagnosing, or aiding in the diagnosis of a neuromuscular disease accompanied with a repeat expansion of CGG in a nucleic acid in a subject according to the embodiment of the present invention comprises detecting a repeat expansion of CGG or a complementary sequence thereof in a nucleic acid sample from the subject. Examples of the neuromuscular disease accompanied with the repeat expansion of CGG are neuronal intranuclear inclusion disease (NIID), oculopharyngodistal myopathy (OPDM), and oculopharyngeal myopathy with leukoencephalopathy (OPML). Clinically, most cases of NIID present as a multisystem neurodegenerative process beginning in the second decade and progressing to death in 10 to 20 years. Neurological signs and symptoms vary widely but usually include ataxia; extrapyramidal signs such as tremor; lower motor neuron findings such as absent deep tendon reflexes, weakness, muscle wasting, and foot deformities; and less apparent behavioral or cognitive difficulties. Reported adult-onset cases are characterized by dementia and may represent different clinical presentations. In the present disclosure, the neuromuscular disease excludes fragile X syndrome, fragile X tremor/ataxia syndrome (FXTAS), and oculopharyngeal muscular dystrophy.


The presence of the repeat expansion in the nucleic acid sample indicates that the subject has the neuromuscular disease or is at risk of having the neuromuscular disease. The method can be used for determining whether the subject has or is at risk of having the neuromuscular disease.


The subject is a human being or a non-human animal. The subject may be a patient who may have the neuromuscular disease. The nucleic acid sample may be collected from the subject prior to the detection of the repeat expansion. The nucleic acid sample may be collected from a cell from the subject. The cell may be a leukocyte, lymphocyte, monocyte, erythroblast, hematopoietic stem cell, or hematopoietic progenitor cell. The method may be carried out in vivo. The nucleic acid sample may be DNA, such as chromosomal DNA, or alternatively RNA. The repeat expansion of CGG may be in any gene of the subject.


In the case where the neuromuscular disease is neuronal intranuclear inclusion disease, the repeat expansion of CGG may be in the NBPF19 gene. In this case, the repeat expansion may be greater than 70 repeats, greater than 75 repeats, greater than 80 repeats, greater than 85 repeats, or greater than 90 repeats, and the size of the expanded CGG repeat may be greater than 210 base pairs, greater than 225 base pairs, greater than 240 base pairs, greater than 255 base pairs, or greater than 270 base pairs.


In the case where the neuromuscular disease is oculopharyngodistal myopathy, the repeat expansion of CGG may be in the 5′ untranslated region of the LRP12 gene. In this case, the repeat expansion may be greater than 70 repeats, greater than 75 repeats, greater than 77 repeats, greater than 80 repeats, greater than 85 repeats, or greater than 90 repeats, and the size of the expanded CGG repeat may be greater than 210 base pairs, greater than 225 base pairs, greater than 231 base pairs, greater than 240 base pairs, greater than 255 base pairs, or greater than 270 base pairs.


In the case where the neuromuscular disease is oculopharyngeal myopathy with leukoencephalopathy, the repeat expansion of CGG may be in the LOC642361 gene, which is also referred to as the NUTM2B-AS1 gene. In this case, the number of repeat units may be greater than the range in healthy individuals, which is 6 to 14 repeat units, and the size of the expanded CGG repeat may be greater than the corresponding range in healthy individuals, which is 18 to 42 base pairs.
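Each repeat-unit threshold above corresponds to a base-pair threshold at 3 bp per CGG unit (70 units to 210 bp, 77 units to 231 bp, 90 units to 270 bp, 14 units to 42 bp), and the same conversion relates the 2.1-kb expansion of FIG. 29B to 700 repeat units. The sketch below encodes that conversion and an illustrative comparison against the least stringent cut-off listed for each disease. The dictionary keys and the choice of cut-off are assumptions for illustration; the sketch is not a diagnostic rule, and it refers only to the length of the repeat stretch itself, not to PCR product sizes that include flanking sequence.

```python
BP_PER_UNIT = 3  # one CGG unit spans 3 bp

# Illustrative cut-offs in repeat units, taken from the least stringent
# alternatives described above (hypothetical keys).
CUTOFF_UNITS = {
    "NIID": 70,  # NBPF19/NOTCH2NLC: "greater than 70 repeats"
    "OPDM": 70,  # LRP12 5' UTR: "greater than 70 repeats"
    "OPML": 14,  # LOC642361/NUTM2B-AS1: upper end of healthy range (6-14)
}


def units_to_bp(units: int) -> int:
    """Length in bp of a CGG stretch with the given number of units."""
    return units * BP_PER_UNIT


def bp_to_units(bp: int) -> int:
    """Whole CGG units in a repeat stretch of the given length (rounded down)."""
    return bp // BP_PER_UNIT


def exceeds_cutoff(disease: str, units: int) -> bool:
    """True if the repeat count is above the illustrative cut-off for the disease."""
    return units > CUTOFF_UNITS[disease]
```

For example, `exceeds_cutoff("OPML", 15)` is True, whereas an allele of 14 units, the upper end of the healthy range, is not flagged.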


A kit for determining or diagnosing a neuromuscular disease accompanied with a repeat expansion of CGG in a nucleic acid in a subject according to the embodiment of the present invention comprises a nucleic acid reagent configured to detect a repeat expansion of CGG or a complementary sequence thereof in a nucleic acid sample from the subject. Examples of the neuromuscular disease are neuronal intranuclear inclusion disease, oculopharyngodistal myopathy, and oculopharyngeal myopathy with leukoencephalopathy.


The kit can be used in the method for determining or diagnosing the neuromuscular disease in the subject according to the embodiment of the present invention. The kit may be used in vitro.


The nucleic acid reagent may comprise a PCR primer configured to detect the repeat expansion of CGG or the complementary sequence thereof. The PCR primer may comprise a complementary sequence of CGG or a complementary sequence thereof.


The PCR may be a repeat-primed PCR or a long-range PCR. Both the repeat-primed PCR and the long-range PCR can detect the repeat expansion. An application of the repeat-primed PCR is described in Neuron 72, 257-268, October 20, 2011. In the repeat-primed PCR, nucleic acids are first amplified between a forward primer and a reverse primer. Because the forward primer is present at a low concentration, it is depleted during this initial stage. Thereafter, the nucleic acids are amplified between an anchor primer and the reverse primer. If the anchor primer is not present, annealing within the repeat sequence occurs at random positions; in that case, only short PCR products are produced, and it is difficult to detect a repeat expansion. If the anchor primer is present, PCR products are produced between the anchor primer and the reverse primer, so that they reflect the distribution of the PCR products produced at the initial stage by annealing of the forward primer. As a result, a comb-like distribution of the PCR products is obtained. It should be noted that the anchor primer is not limited to any specific sequence.
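The comb-like distribution arises because the repeat-annealing primer can sit at every register inside the CGG tract, so the products form a ladder spaced by one repeat unit (3 bp), and the longest product reflects the length of the repeat. The following idealized Python sketch assumes, as is typical for this technique, that one primer is gene-specific and lies in the flanking sequence while the other anneals within the repeat and carries a 5′ tail later bound by the anchor primer. All parameter names are hypothetical, and the model ignores the loss of amplification efficiency for longer products.

```python
from typing import List


def rp_pcr_ladder(flank_bp: int, repeat_count: int, primer_units: int,
                  anchor_tail_bp: int, unit_bp: int = 3) -> List[int]:
    """Idealized repeat-primed PCR product sizes (hypothetical parameters).

    flank_bp: distance from the 5' end of the gene-specific primer to the
        first base of the repeat tract.
    repeat_count: number of repeat units in the allele.
    primer_units: number of repeat units in the repeat-annealing primer.
    anchor_tail_bp: length of the 5' tail recognized by the anchor primer.
    """
    if repeat_count < primer_units:
        return []  # the repeat primer cannot anneal fully inside the tract
    # One product for each repeat unit at which the primer's 5' end can lie,
    # from primer_units up to repeat_count, so neighbouring peaks differ by
    # exactly one repeat unit.
    return [flank_bp + k * unit_bp + anchor_tail_bp
            for k in range(primer_units, repeat_count + 1)]


if __name__ == "__main__":
    print(rp_pcr_ladder(flank_bp=100, repeat_count=10, primer_units=5,
                        anchor_tail_bp=20))
```

With these illustrative values the ladder runs from 135 bp to 150 bp in 3-bp steps; an expanded allele simply extends the ladder further.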


Alternatively, the nucleic acid reagent in the kit may comprise a hybridization probe configured to detect the repeat expansion of CGG or the complementary sequence thereof. The hybridization probe can be used, for example, for Southern blotting, which can detect the repeat expansion. The hybridization probe is configured to detect fragmented nucleic acids that contain the expanded repeat sequence. The fragmented nucleic acids are prepared by using a restriction enzyme, which is selected as appropriate; a restriction site neighboring the expanded repeat sequence is preferably selected. The size of the fragmented nucleic acids prepared by the restriction enzyme may be less than 20 kb, less than 10 kb, or less than 5 kb.


The hybridization probe may comprise a complementary sequence of CGG or a complementary sequence thereof. The hybridization probe may comprise a complementary sequence of a genome sequence around the expanded repeat sequence. The hybridization probe may comprise a complementary sequence of a sequence flanking the repeat expansion of CGG, or a complementary sequence thereof. The size of the sequence flanking the repeat expansion of CGG may be less than 20 kb, less than 10 kb, or less than 5 kb. The hybridization probe may comprise a sequence complementary to part of the genome sequence of the fragmented nucleic acids that contain the expanded repeat sequence.
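In Southern blotting, an expanded allele appears as a band longer than the reference-sized band by the number of extra CGG units times 3 bp. The following is a minimal Python sketch of that calculation, assuming that the restriction sites flank the repeat and that the only difference between the allele and the reference fragment is the repeat length; the site coordinates and function name are hypothetical.

```python
def southern_band_bp(left_site: int, right_site: int, ref_repeats: int,
                     allele_repeats: int, unit_bp: int = 3) -> int:
    """Expected restriction-fragment length for one allele.

    left_site / right_site: hypothetical reference coordinates of the
    restriction sites flanking the repeat. Assumes the allele differs from
    the reference fragment only in the number of CGG units.
    """
    if right_site <= left_site:
        raise ValueError("right_site must lie downstream of left_site")
    ref_len = right_site - left_site
    return ref_len + (allele_repeats - ref_repeats) * unit_bp


if __name__ == "__main__":
    normal = southern_band_bp(1000, 4000, ref_repeats=10, allele_repeats=10)
    expanded = southern_band_bp(1000, 4000, ref_repeats=10, allele_repeats=110)
    print(normal, expanded)
```

Here a 100-unit expansion shifts a 3,000-bp band to 3,300 bp. The same 300-bp shift is a larger fraction of a short fragment than of a long one, which is one way to read the preference for restriction sites close to the repeat.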


Further, a method for determining a neuromuscular disease accompanied with a repeat expansion of CGG in a nucleic acid in a subject according to the embodiment of the present invention comprises obtaining a nucleic acid fragment having a repeat expansion of CGG or a complementary sequence thereof from a nucleic acid sample from the subject, circularizing the nucleic acid fragment with an origin of chromosome (oriC) cassette to form a circular nucleic acid, amplifying the circular nucleic acid to produce a plurality of circular nucleic acids, and detecting the repeat expansion of CGG or the complementary sequence thereof.


The nucleic acid sample may be chromosomal DNA. The repeat expansion of CGG may be in a gene from the subject. The nucleic acid fragment may be obtained by using a restriction enzyme or a gene editing protein. Any restriction enzyme or gene editing protein that does not cleave the repeat expansion of CGG or the complementary sequence thereof, but can cleave a sequence outside the repeat expansion, can be used. A combination of a plurality of restriction enzymes and/or a plurality of gene editing proteins can be used. An example of the restriction enzyme is EarI. Examples of the gene editing protein are Cas family proteins used in systems such as CRISPR/Cas9, as well as ZFNs and TALENs. Any modified gene editing protein can also be used.
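The requirement that the enzyme cut outside, but not inside, the repeat can be checked on a reference sequence. The following is a minimal Python sketch for EarI, which recognizes CTCTTC (GAAGAG on the opposite strand) and cuts outside its site. For simplicity, the sketch reports recognition-site positions and ignores the exact cut offsets; the function names and the toy sequence are hypothetical, and guide design for Cas proteins is not modelled.

```python
import re
from typing import List, Optional, Tuple

SITE = "CTCTTC"     # EarI recognition sequence, top strand
SITE_RC = "GAAGAG"  # the same site read on the bottom strand


def find_sites(seq: str) -> List[int]:
    """0-based start positions of EarI recognition sequences on either strand."""
    pattern = f"(?=({SITE}|{SITE_RC}))"  # lookahead also catches overlapping hits
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def flanking_fragment(seq: str, repeat_start: int,
                      repeat_end: int) -> Optional[Tuple[int, int]]:
    """Closest recognition sites on each side of the repeat [start, end).

    Returns None if a site overlaps the repeat (the enzyme would cut it) or
    if either side lacks a site. Cut offsets are ignored in this sketch.
    """
    sites = find_sites(seq)
    if any(s < repeat_end and s + len(SITE) > repeat_start for s in sites):
        return None
    left = [s for s in sites if s + len(SITE) <= repeat_start]
    right = [s for s in sites if s >= repeat_end]
    if not left or not right:
        return None
    return max(left), min(right)


if __name__ == "__main__":
    toy = "AAAA" + SITE + "TT" + "CGG" * 20 + "TT" + SITE_RC + "AAAA"
    print(flanking_fragment(toy, 12, 72))
```

For the toy sequence, the closest sites lie at positions 4 and 74, on either side of a 60-bp CGG tract spanning positions 12 to 71.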


With regard to replication origin sequences (oriC) that can bind to an enzyme having DnaA activity, publicly known replication origin sequences existing in bacteria such as E. coli and Bacillus subtilis may be obtained from a public database such as NCBI (http://www.ncbi.nlm.nih.gov/). Alternatively, the replication origin sequence may be obtained by cloning a DNA fragment that can bind to an enzyme having DnaA activity and analyzing its base sequence.


The oriC cassette comprises the oriC and sequences configured to overlap with loci of the nucleic acid fragment. The oriC may be located between these overlapping sequences. The oriC cassette may further comprise ter sequences as described below.


The 5′ region of the oriC cassette may be complementary to the 5′ region of the nucleic acid fragment, and the 3′ region of the oriC cassette may be complementary to the 3′ region of the nucleic acid fragment. Alternatively, the 5′ region of the oriC cassette may be complementary to the 3′ region of the nucleic acid fragment, and the 3′ region of the oriC cassette may be complementary to the 5′ region of the nucleic acid fragment.
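The second arrangement above can be pictured as a cassette whose ends repeat the two ends of the fragment, so that joining both overlaps closes the fragment and the oriC into one circle. The following Python sketch models this with single-strand strings: the overlapping regions are represented as shared sequence, and RecA-mediated pairing is reduced to checking and merging the overlaps. The placeholder "oriC" string, the overlap length, and the function names are hypothetical and do not represent a real oriC sequence or the actual assembly chemistry.

```python
def design_cassette(fragment: str, ori: str, overlap: int) -> str:
    """Cassette whose 5' end repeats the fragment's 3' end and whose 3' end
    repeats the fragment's 5' end, with the origin sequence in between."""
    if overlap <= 0 or 2 * overlap > len(fragment):
        raise ValueError("overlap must be positive and fit within the fragment")
    return fragment[-overlap:] + ori + fragment[:overlap]


def assemble_circle(fragment: str, cassette: str, overlap: int) -> str:
    """Join fragment and cassette through their shared ends and return the
    circular product written linearly from the fragment's 5' end."""
    if fragment[-overlap:] != cassette[:overlap]:
        raise ValueError("fragment 3' end does not match cassette 5' end")
    if cassette[-overlap:] != fragment[:overlap]:
        raise ValueError("cassette 3' end does not match fragment 5' end")
    # Each shared end is kept once: the whole fragment, then the cassette
    # without the two overlap regions.
    return fragment + cassette[overlap:len(cassette) - overlap]


if __name__ == "__main__":
    frag = "AAAAC" + "CGG" * 5 + "GTTTT"  # toy fragment with a CGG tract
    ori = "N" * 10                        # placeholder, not a real oriC
    cassette = design_cassette(frag, ori, overlap=5)
    print(assemble_circle(frag, cassette, overlap=5))
```

The circle returned is the fragment followed by the origin. In practice, the overlaps would be chosen in the disease-specific loci flanking the repeat.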


The repeat expansion of CGG or the complementary sequence thereof may be located between the 5′ region and the 3′ region of the nucleic acid fragment. The 5′ region and the 3′ region of the nucleic acid fragment may be loci specific to the neuromuscular disease.


The nucleic acid sample and the oriC cassette may be assembled in the presence of a protein having RecA family recombinase activity to form the circular nucleic acid. The protein having RecA family recombinase activity will be referred to as RecA family recombinase protein.


The RecA family recombinase activity includes a function of polymerizing on single-stranded or double-stranded DNA to form a filament, hydrolysis activity for nucleoside triphosphates such as ATP (adenosine triphosphate), and a function of searching for a homologous region and performing homologous recombination. Examples of the RecA family recombinase proteins include prokaryotic RecA homologs, bacteriophage RecA homologs, archaeal RecA homologs, eukaryotic RecA homologs, and the like. Examples of prokaryotic RecA homologs include E. coli RecA; RecA derived from highly thermophilic bacteria such as Thermus bacteria (e.g., Thermus thermophilus and Thermus aquaticus), Thermococcus bacteria, Pyrococcus bacteria, and Thermotoga bacteria; and RecA derived from radiation-resistant bacteria such as Deinococcus radiodurans. Examples of bacteriophage RecA homologs include T4 phage UvsX. Examples of archaeal RecA homologs include RadA. Examples of eukaryotic RecA homologs include Rad51 and its paralogs, and Dmc1. The amino acid sequences of these RecA homologs can be obtained from databases such as NCBI (http://www.ncbi.nlm.nih.gov/).


The RecA family recombinase protein may be a wild-type protein or a variant thereof. The variant is a protein in which one or more mutations that delete, add, or substitute 1 to 30 amino acids are introduced into a wild-type protein and which retains the RecA family recombinase activity. Examples of the variants include variants with amino acid substitutions that enhance the function of searching for homologous regions in wild-type proteins, variants with various tags added to the N-terminus or C-terminus of wild-type proteins, and variants with improved heat resistance (WO 2016/013592). As the tag, tags widely used in the expression or purification of recombinant proteins, such as His tag, HA (hemagglutinin) tag, Myc tag, and Flag tag, can be used. The wild-type RecA family recombinase protein means a protein having the same amino acid sequence as that of the RecA family recombinase protein retained in organisms isolated from nature.


The RecA family recombinase protein is preferably a variant that retains the RecA family recombinase activity. Examples of the variants include an F203W mutant, in which the 203rd amino acid residue (phenylalanine) of E. coli RecA is substituted with tryptophan, and mutants of various RecA homologs in which the phenylalanine corresponding to the 203rd phenylalanine of E. coli RecA is substituted with tryptophan.


A first enzyme group may be used to catalyze the replication of the circular nucleic acid. An example of the first enzyme group is the enzyme group set forth in Kaguni J M & Kornberg A. Cell. 1984, 38:183-90. Specifically, examples of the first enzyme group include one or more enzymes or enzyme groups selected from the group consisting of an enzyme having DnaA activity, one or more types of nucleoid protein, an enzyme or enzyme group having DNA gyrase activity, single-strand binding protein (SSB), an enzyme having DnaB-type helicase activity, an enzyme having DNA helicase loader activity, an enzyme having DNA primase activity, an enzyme having DNA clamp activity, and an enzyme or enzyme group having DNA polymerase III* activity, or a combination of all of the aforementioned enzymes or enzyme groups.


The enzyme having DnaA activity is not particularly limited in its biological origin as long as it has an initiator activity that is similar to that of DnaA, which is an initiator protein of E. coli, and DnaA derived from E. coli may be preferably used. The Escherichia coli-derived DnaA may be contained as a monomer in the reaction solution in an amount of 1 nmol/L to 10 μmol/L, preferably in an amount of 1 nmol/L to 5 μmol/L, 1 nmol/L to 3 μmol/L, 1 nmol/L to 1.5 μmol/L, 1 nmol/L to 1.0 μmol/L, 1 nmol/L to 500 nmol/L, 50 nmol/L to 200 nmol/L, or 50 nmol/L to 150 nmol/L, but without being limited thereby.


A nucleoid protein is a protein found in the nucleoid. The one or more types of nucleoid protein are not particularly limited in their biological origin as long as they have an activity that is similar to that of the nucleoid proteins of E. coli. For example, Escherichia coli-derived IHF, namely, a complex of IhfA and/or IhfB (a heterodimer or a homodimer), or Escherichia coli-derived HU, namely, a complex of HupA and HupB, can be preferably used. The Escherichia coli-derived IHF may be contained as a hetero/homo dimer in a reaction solution in a concentration range of 5 nmol/L to 400 nmol/L. Preferably, the Escherichia coli-derived IHF may be contained in a reaction solution in a concentration range of 5 nmol/L to 200 nmol/L, 5 nmol/L to 100 nmol/L, 5 nmol/L to 50 nmol/L, 10 nmol/L to 50 nmol/L, 10 nmol/L to 40 nmol/L, or 10 nmol/L to 30 nmol/L, but the concentration range is not limited thereto. The Escherichia coli-derived HU may be contained in a reaction solution in a concentration range of 1 nmol/L to 50 nmol/L, and preferably, may be contained therein in a concentration range of 5 nmol/L to 50 nmol/L or 5 nmol/L to 25 nmol/L, but the concentration range is not limited thereto.


An enzyme or enzyme group having DNA gyrase activity is not particularly limited in its biological origin as long as it has an activity that is similar to that of the DNA gyrase of E. coli. For example, a complex of Escherichia coli-derived GyrA and GyrB can be preferably used. Such a complex of Escherichia coli-derived GyrA and GyrB may be contained as a heterotetramer in a reaction solution in a concentration range of 20 nmol/L to 500 nmol/L, and preferably, may be contained therein in a concentration range of 20 nmol/L to 400 nmol/L, 20 nmol/L to 300 nmol/L, 20 nmol/L to 200 nmol/L, 50 nmol/L to 200 nmol/L, or 100 nmol/L to 200 nmol/L, but the concentration range is not limited thereto.


A single-strand binding protein (SSB) is not particularly limited in its biological origin as long as it has an activity that is similar to that of the single-strand binding protein of E. coli. For example, Escherichia coli-derived SSB can be preferably used. Such Escherichia coli-derived SSB may be contained as a homotetramer in a reaction solution in a concentration range of 20 nmol/L to 1000 nmol/L, and preferably, may be contained therein in a concentration range of 20 nmol/L to 500 nmol/L, 20 nmol/L to 300 nmol/L, 20 nmol/L to 200 nmol/L, 50 nmol/L to 500 nmol/L, 50 nmol/L to 400 nmol/L, 50 nmol/L to 300 nmol/L, 50 nmol/L to 200 nmol/L, 50 nmol/L to 150 nmol/L, 100 nmol/L to 500 nmol/L, or 100 nmol/L to 400 nmol/L, but the concentration range is not limited thereto.


An enzyme having DnaB-type helicase activity is not particularly limited in its biological origin as long as it has an activity that is similar to that of the DnaB of E. coli. For example, Escherichia coli-derived DnaB can be preferably used. Such Escherichia coli-derived DnaB may be contained as a homohexamer in a reaction solution in a concentration range of 5 nmol/L to 200 nmol/L, and preferably, may be contained therein in a concentration range of 5 nmol/L to 100 nmol/L, 5 nmol/L to 50 nmol/L, or 5 nmol/L to 30 nmol/L, but the concentration range is not limited thereto.


An enzyme having DNA helicase loader activity is not particularly limited in its biological origin as long as it has an activity that is similar to that of the DnaC of E. coli. For example, Escherichia coli-derived DnaC can be preferably used. Such Escherichia coli-derived DnaC may be contained as a homohexamer in a reaction solution in a concentration range of 5 nmol/L to 200 nmol/L, and preferably, may be contained therein in a concentration range of 5 nmol/L to 100 nmol/L, 5 nmol/L to 50 nmol/L, or 5 nmol/L to 30 nmol/L, but the concentration range is not limited thereto.


An enzyme having DNA primase activity is not particularly limited in its biological origin as long as it has an activity that is similar to that of the DnaG of E. coli. For example, Escherichia coli-derived DnaG can be preferably used. Such Escherichia coli-derived DnaG may be contained as a monomer in a reaction solution in a concentration range of 20 nmol/L to 1000 nmol/L, and preferably, may be contained therein in a concentration range of 20 nmol/L to 800 nmol/L, 50 nmol/L to 800 nmol/L, 100 nmol/L to 800 nmol/L, 200 nmol/L to 800 nmol/L, 250 nmol/L to 800 nmol/L, 250 nmol/L to 500 nmol/L, or 300 nmol/L to 500 nmol/L, but the concentration range is not limited thereto.


An enzyme having DNA clamp activity is not particularly limited in its biological origin as long as it has an activity that is similar to that of the DnaN of E. coli. For example, Escherichia coli-derived DnaN can be preferably used. Such Escherichia coli-derived DnaN may be contained as a homodimer in a reaction solution in a concentration range of 10 nmol/L to 1000 nmol/L, and preferably, may be contained therein in a concentration range of 10 nmol/L to 800 nmol/L, 10 nmol/L to 500 nmol/L, 20 nmol/L to 500 nmol/L, 20 nmol/L to 200 nmol/L, 30 nmol/L to 200 nmol/L, or 30 nmol/L to 100 nmol/L, but the concentration range is not limited thereto.


An enzyme or enzyme group having DNA polymerase III* activity is not particularly limited in its biological origin as long as it is an enzyme or enzyme group having an activity that is similar to that of the DNA polymerase III* complex of E. coli. For example, an enzyme group comprising any of Escherichia coli-derived DnaX, HolA, HolB, HolC, HolD, DnaE, DnaQ, and HolE can be used; an enzyme group comprising a complex of Escherichia coli-derived DnaX, HolA, HolB, and DnaE is preferable, and an enzyme group comprising a complex of Escherichia coli-derived DnaX, HolA, HolB, HolC, HolD, DnaE, DnaQ, and HolE is more preferable. Such an Escherichia coli-derived DNA polymerase III* complex may be contained as a heteromultimer in a reaction solution in a concentration range of 2 nmol/L to 50 nmol/L, and preferably, may be contained therein in a concentration range of 2 nmol/L to 40 nmol/L, 2 nmol/L to 30 nmol/L, 2 nmol/L to 20 nmol/L, 5 nmol/L to 40 nmol/L, 5 nmol/L to 30 nmol/L, or 5 nmol/L to 20 nmol/L, but the concentration range is not limited thereto.
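The broadest concentration range quoted for each component of the first enzyme group can be collected into one table and used to check a planned reaction mix. The following Python sketch does this with all values converted to nmol/L (DnaA's upper limit of 10 μmol/L becomes 10,000 nmol/L). Because IHF and HU are given as alternatives, the check looks only at the components actually supplied. The table layout, function name, and example mix are illustrative assumptions, not a formulation given in the text.

```python
# Broadest ranges quoted for the first enzyme group, in nmol/L.
FIRST_GROUP_RANGES_NM = {
    "DnaA": (1, 10_000),     # monomer; 10 umol/L = 10,000 nmol/L
    "IHF": (5, 400),         # hetero/homo dimer (alternative to HU)
    "HU": (1, 50),           # alternative to IHF
    "GyrA/GyrB": (20, 500),  # heterotetramer
    "SSB": (20, 1000),       # homotetramer
    "DnaB": (5, 200),        # homohexamer
    "DnaC": (5, 200),        # homohexamer
    "DnaG": (20, 1000),      # monomer
    "DnaN": (10, 1000),      # homodimer
    "Pol III*": (2, 50),     # heteromultimer
}


def out_of_range(mix_nm: dict) -> list:
    """Names of supplied components whose concentration (nmol/L) falls outside
    the quoted range. Unknown component names raise KeyError."""
    bad = []
    for name, conc in mix_nm.items():
        low, high = FIRST_GROUP_RANGES_NM[name]
        if not low <= conc <= high:
            bad.append(name)
    return bad


if __name__ == "__main__":
    # Hypothetical mix with each value picked inside a preferred range.
    mix = {"DnaA": 100, "IHF": 20, "GyrA/GyrB": 50, "SSB": 400, "DnaB": 20,
           "DnaC": 20, "DnaG": 400, "DnaN": 80, "Pol III*": 5}
    print(out_of_range(mix))
```

For example, `out_of_range({"DnaA": 100, "SSB": 2000})` returns `["SSB"]`, because 2,000 nmol/L exceeds the quoted SSB maximum of 1,000 nmol/L.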


A second enzyme group may be used to catalyze Okazaki fragment maturation and synthesize two sister circular nucleic acids constituting a catenane. The two sister circular nucleic acids are not covalently linked to one another but nevertheless cannot be separated without breaking a covalent bond.


Examples of the second enzyme group, which catalyzes Okazaki fragment maturation and synthesizes the two sister circular DNAs constituting the catenane, include one or more enzymes selected from the group consisting of an enzyme having DNA polymerase I activity, an enzyme having DNA ligase activity, and an enzyme having RNaseH activity, or a combination of these enzymes.


An enzyme having DNA polymerase I activity is not particularly limited in its biological origin as long as it has an activity that is similar to DNA polymerase I of E. coli. For example, Escherichia coli-derived DNA polymerase I can be preferably used. Such Escherichia coli-derived DNA polymerase I may be contained as a monomer in a reaction solution in a concentration range of 10 nmol/L to 200 nmol/L, and preferably, may be contained therein in a concentration range of 20 nmol/L to 200 nmol/L, 20 nmol/L to 150 nmol/L, 20 nmol/L to 100 nmol/L, 40 nmol/L to 150 nmol/L, 40 nmol/L to 100 nmol/L, or 40 nmol/L to 80 nmol/L, but the concentration range is not limited thereto.


An enzyme having DNA ligase activity is not particularly limited in its biological origin as long as it has an activity that is similar to DNA ligase of E. coli. For example, Escherichia coli-derived DNA ligase or the DNA ligase of T4 phage can be preferably used. Such Escherichia coli-derived DNA ligase may be contained as a monomer in a reaction solution in a concentration range of 10 nmol/L to 200 nmol/L, and preferably, may be contained therein in a concentration range of 15 nmol/L to 200 nmol/L, 20 nmol/L to 200 nmol/L, 20 nmol/L to 150 nmol/L, 20 nmol/L to 100 nmol/L, or 20 nmol/L to 80 nmol/L, but the concentration range is not limited thereto.


The enzyme having RNaseH activity is not particularly limited in terms of biological origin, as long as it has the activity of decomposing the RNA chain of an RNA-DNA hybrid. For example, Escherichia coli-derived RNaseH can be preferably used. Such Escherichia coli-derived RNaseH may be contained as a monomer in a reaction solution in a concentration range of 0.2 nmol/L to 200 nmol/L, and preferably, may be contained therein in a concentration range of 0.2 nmol/L to 200 nmol/L, 0.2 nmol/L to 100 nmol/L, 0.2 nmol/L to 50 nmol/L, 1 nmol/L to 200 nmol/L, 1 nmol/L to 100 nmol/L, 1 nmol/L to 50 nmol/L, or 10 nmol/L to 50 nmol/L, but the concentration range is not limited thereto.
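Translating a chosen final concentration into a pipetting volume is the usual dilution relation C1 × V1 = C2 × V2. The following Python sketch applies it to the three second-group enzymes for a 10-μL reaction. The target concentrations are picked inside the ranges quoted above, while the 1,000 nmol/L (1 μmol/L) stock concentrations and the function name are hypothetical.

```python
def stock_volume_ul(final_nm: float, final_volume_ul: float,
                    stock_nm: float) -> float:
    """Volume (uL) of a stock at stock_nm that gives final_nm in
    final_volume_ul, from C1 * V1 = C2 * V2."""
    if stock_nm <= 0:
        raise ValueError("stock concentration must be positive")
    if final_nm > stock_nm:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_nm * final_volume_ul / stock_nm


# (target final nmol/L, hypothetical stock nmol/L) for each second-group enzyme.
SECOND_GROUP = {
    "DNA polymerase I": (50, 1000),
    "DNA ligase": (50, 1000),
    "RNaseH": (10, 1000),
}

if __name__ == "__main__":
    for name, (final_nm, stock_nm) in SECOND_GROUP.items():
        vol = stock_volume_ul(final_nm, 10.0, stock_nm)
        print(f"{name}: add {vol:.2f} uL of stock per 10 uL reaction")
```

Under these assumptions, the sketch calls for 0.50 μL each of DNA polymerase I and DNA ligase and 0.10 μL of RNaseH; such small volumes are one reason working dilutions of enzyme stocks are often prepared first.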


A third enzyme group may be used to catalyze a separation of the two sister circular nucleic acids.


An example of the third enzyme group that catalyzes the separation of the two sister circular nucleic acids is the enzyme group described in Peng H & Marians K J. PNAS. 1993, 90: 8571-8575. Specifically, examples of the third enzyme group include one or more enzymes selected from the group consisting of an enzyme having topoisomerase IV activity, an enzyme having topoisomerase III activity, and an enzyme having RecQ-type helicase activity, or a combination of the aforementioned enzymes.


The enzyme having topoisomerase III activity is not particularly limited in terms of biological origin, as long as it has the same activity as that of the topoisomerase III of Escherichia coli. For example, Escherichia coli-derived topoisomerase III can be preferably used. Such Escherichia coli-derived topoisomerase III may be contained as a monomer in a reaction solution in a concentration range of 20 nmol/L to 500 nmol/L, and preferably, may be contained therein in a concentration range of 20 nmol/L to 400 nmol/L, 20 nmol/L to 300 nmol/L, 20 nmol/L to 200 nmol/L, 20 nmol/L to 100 nmol/L, or 30 nmol/L to 80 nmol/L, but the concentration range is not limited thereto.


The enzyme having RecQ-type helicase activity is not particularly limited in terms of biological origin, as long as it has the same activity as that of the RecQ of Escherichia coli. For example, Escherichia coli-derived RecQ can be preferably used. Such Escherichia coli-derived RecQ may be contained as a monomer in a reaction solution in a concentration range of 20 nmol/L to 500 nmol/L, and preferably, may be contained therein in a concentration range of 20 nmol/L to 400 nmol/L, 20 nmol/L to 300 nmol/L, 20 nmol/L to 200 nmol/L, 20 nmol/L to 100 nmol/L, or 30 nmol/L to 80 nmol/L, but the concentration range is not limited thereto.


An enzyme having topoisomerase IV activity is not particularly limited in its biological origin as long as it has an activity that is similar to topoisomerase IV of E. coli. For example, Escherichia coli-derived topoisomerase IV that is a complex of ParC and ParE can be preferably used. Such Escherichia coli-derived topoisomerase IV may be contained as a heterotetramer in a reaction solution in a concentration range of 0.1 nmol/L to 50 nmol/L, and preferably, may be contained therein in a concentration range of 0.1 nmol/L to 40 nmol/L, 0.1 nmol/L to 30 nmol/L, 0.1 nmol/L to 20 nmol/L, 1 nmol/L to 40 nmol/L, 1 nmol/L to 30 nmol/L, 1 nmol/L to 20 nmol/L, 1 nmol/L to 10 nmol/L, or 1 nmol/L to 5 nmol/L, but the concentration range is not limited thereto.


Without being limited by theory, the circular nucleic acid is replicated or amplified through the replication cycle shown in FIG. 34 and FIG. 35 or by repeating this replication cycle. In the present description, replication of the circular nucleic acid means that the same molecule as the circular nucleic acid used as a template is generated.


Replication of the circular nucleic acid can be confirmed by the phenomenon that the amount of the circular nucleic acids in the reaction product after completion of the reaction is increased, in comparison to the amount of circular nucleic acid used as a template at initiation of the reaction. Preferably, replication of the circular nucleic acid means that the amount of the circular nucleic acids in the reaction product is increased at least 2 times, 3 times, 5 times, 7 times, or 9 times, in comparison to the amount of the circular nucleic acid at initiation of the reaction. Amplification of the circular nucleic acid means that replication of the circular nucleic acid progresses and the amount of the circular nucleic acids in the reaction product is exponentially increased with respect to the amount of the circular nucleic acid used as a template at initiation of the reaction. Accordingly, amplification of the circular nucleic acid is one embodiment of the replication of the circular nucleic acids. In the present description, the amplification of the circular nucleic acid means that the amount of the circular nucleic acids in the reaction product is increased at least 10 times, 50 times, 100 times, 200 times, 500 times, 1000 times, 2000 times, 3000 times, 4000 times, 5000 times, or 10000 times, in comparison to the amount of the circular nucleic acid used as a template at initiation of the reaction.
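The two definitions above differ only in the fold-change threshold. The following Python sketch applies the lowest thresholds in the text (at least 2-fold for replication, at least 10-fold for amplification) and, as an added idealization not stated in the text, estimates how many perfect doubling rounds a given fold change would correspond to. The function names and labels are hypothetical.

```python
import math


def classify_fold_change(template_amount: float, product_amount: float) -> str:
    """Label a reaction by fold change, using the lowest thresholds in the
    text: >= 10-fold is amplification, >= 2-fold is replication."""
    if template_amount <= 0:
        raise ValueError("template amount must be positive")
    fold = product_amount / template_amount
    if fold >= 10:
        return "amplification"  # amplification is a subset of replication
    if fold >= 2:
        return "replication"
    return "below replication threshold"


def ideal_rounds(fold: float) -> float:
    """Rounds needed to reach `fold` if every round exactly doubled the number
    of circular molecules (an idealization; real yields are lower)."""
    return math.log2(fold)


if __name__ == "__main__":
    print(classify_fold_change(1.0, 1000.0), ideal_rounds(1024))
```

Under the doubling idealization, a 1,000-fold increase corresponds to roughly ten rounds of replication.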


The circular nucleic acid is amplified in a cell-free system. The cell-free system means that the replication reaction is not performed in cells. Therefore, the method may be carried out in vitro.


The circular nucleic acid may comprise a pair of ter sequences that are each inserted outward with respect to oriC, and/or a nucleotide sequence recognized by XerCD. In a case where the circular nucleic acid has the ter sequences, a reaction solution for the amplification of the circular nucleic acid may comprise a protein having an activity of inhibiting replication by binding to the ter sequences. In a case where the circular nucleic acid has the nucleotide sequence recognized by XerCD, the reaction solution may comprise a XerCD protein.


A combination of ter sequences on the circular nucleic acid and the protein having the activity of inhibiting replication by binding to the ter sequences constitutes a mechanism for terminating replication. This mechanism has been found in several types of bacteria; for example, in Escherichia coli, it is known as the Tus-ter system (Hiasa, H., and Marians, K. J., J. Biol. Chem., 1994, 269: 26959-26968; Neylon, C., et al., Microbiol. Mol. Biol. Rev., September 2005, p. 501-526), and in Bacillus bacteria, it is known as the RTP-ter system (Vivian, et al., J. Mol. Biol., 2007, 370: 481-491). In the method, by utilizing this mechanism, generation of multimers as by-products can be suppressed. The combination of the ter sequences on the circular nucleic acid and the protein having the activity of inhibiting replication by binding to the ter sequences is not particularly limited in terms of biological origin.


A combination of a sequence recognized by XerCD on the DNA and a XerCD protein constitutes a mechanism of separating a multimer (Ip, S. C. Y., et al., EMBO J., 2003, 22: 6399-6407). The XerCD protein is a complex of XerC and XerD. As such a sequence recognized by XerCD, a dif sequence, a cer sequence, and a psi sequence have been known (Colloms, et al., EMBO J., 1996, 15(5): 1172-1181; Arciszewska, L. K., et al., J. Mol. Biol., 2000, 299: 391-403). In the method, by utilizing this mechanism, generation of a multimer as a by-product can be suppressed. The combination of the sequence recognized by XerCD on the circular nucleic acid and the XerCD protein is not particularly limited, in terms of the biological origin thereof. Moreover, the promoting factors of XerCD have been known, and for example, the function of dif is promoted by a FtsK protein (Ip, S. C. Y., et al., EMBO J., 2003, 22: 6399-6407). In one embodiment, such a FtsK protein may be comprised in the reaction solution.


The amplified circular nucleic acids are analyzed to detect the repeat expansion of CGG or the complementary sequence thereof. For example, the molecular weight of the amplified circular nucleic acids is analyzed by electrophoresis.


The method may further comprise digesting the amplified circular nucleic acids to obtain amplified nucleic acid fragments, each of which may have the repeat expansion of CGG or the complementary sequence thereof. For example, the amplified circular nucleic acids are digested by using a restriction enzyme. Any restriction enzyme that does not cleave the repeat expansion of CGG or the complementary sequence thereof, but can cleave a sequence outside it in the circular nucleic acid, can be used. A combination of a plurality of restriction enzymes can be used. An example of the restriction enzyme is SacI. The amplified nucleic acid fragments are analyzed to detect the repeat expansion of CGG or the complementary sequence thereof. For example, the molecular weight of the amplified nucleic acid fragments is analyzed by electrophoresis.
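Digesting a circular molecule differs from digesting a linear one: n cut sites yield n fragments, and a single site simply linearizes the circle at full length. The following Python sketch predicts fragment sizes for SacI, which recognizes GAGCTC and cuts between the T and the final C on the top strand (GAGCT^C). Single-strand overhangs are ignored, the function name is hypothetical, and the toy sequences are illustrative only.

```python
import re
from typing import List

SACI_SITE = "GAGCTC"  # palindromic recognition sequence
CUT_OFFSET = 5        # top-strand cut after GAGCT


def circular_digest(seq: str) -> List[int]:
    """Fragment lengths (largest first) from cutting a circular sequence at
    every SacI site; an empty list means no site, so the circle stays intact."""
    seq = seq.upper()
    n = len(seq)
    # Append the first five bases so a site spanning the origin is still found.
    wrapped = seq + seq[:len(SACI_SITE) - 1]
    cuts = sorted({(m.start() + CUT_OFFSET) % n
                   for m in re.finditer(f"(?={SACI_SITE})", wrapped)})
    if not cuts:
        return []
    sizes = [cuts[i + 1] - cuts[i] for i in range(len(cuts) - 1)]
    sizes.append(n - cuts[-1] + cuts[0])  # fragment that crosses the origin
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    two_sites = "GAGCTC" + "A" * 94 + "GAGCTC" + "T" * 94
    print(circular_digest(two_sites))
```

For the 200-bp toy circle with two sites, the sketch returns two 100-bp fragments. In an amplified circle, the fragment carrying the CGG tract would be longer for an expanded allele.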


The neuromuscular disease may be selected from the group consisting of neuronal intranuclear inclusion disease, oculopharyngodistal myopathy, and oculopharyngeal myopathy with leukoencephalopathy.


If the neuromuscular disease is neuronal intranuclear inclusion disease, the repeat expansion of CGG is in the NBPF19 gene, which is also referred to as the NOTCH2NLC gene. Therefore, the nucleic acid fragment is obtained from the NBPF19/NOTCH2NLC gene. The repeat expansion due to neuronal intranuclear inclusion disease is detected by analyzing the amplified circular nucleic acids and/or the amplified nucleic acid fragments.


If the neuromuscular disease is oculopharyngodistal myopathy, the repeat expansion of CGG is in the 5′ untranslated region of the LRP12 gene. Therefore, the nucleic acid fragment is obtained from the LRP12 gene. The repeat expansion due to oculopharyngodistal myopathy is detected by analyzing the amplified circular nucleic acids and/or the amplified nucleic acid fragments.


If the neuromuscular disease is oculopharyngeal myopathy with leukoencephalopathy, the repeat expansion of CGG is in the LOC642361 gene, which is also referred to as the NUTM2B-AS1 gene. Therefore, the nucleic acid fragment is obtained from the LOC642361/NUTM2B-AS1 gene. The repeat expansion due to oculopharyngeal myopathy with leukoencephalopathy is detected by analyzing the amplified circular nucleic acids and/or the amplified nucleic acid fragments.


Because amplifying the circular nucleic acid avoids deletion of the repeat expansion, the method can detect the repeat expansion.


A kit for determining a neuromuscular disease accompanied with a repeat expansion of CGG in a nucleic acid in a subject according to the embodiment of the present invention comprises a fragmentation reagent configured to obtain a nucleic acid fragment having a repeat expansion of CGG or a complementary sequence thereof from a nucleic acid sample from the subject, a circularizing reagent configured to circularize the nucleic acid fragment with an origin of chromosome (oriC) cassette to form a circular nucleic acid, and an amplifying reagent configured to amplify the circular nucleic acid to produce a plurality of circular nucleic acids. The kit may further comprise a digesting reagent to digest the amplified circular nucleic acids to obtain amplified nucleic acid fragments.


The fragmentation reagent may comprise the restriction enzyme or the gene editing protein as described above. An example of the restriction enzyme is EarI. An example of the gene editing protein is CRISPR/Cas9. The circularizing reagent may comprise the RecA family recombinase protein and the oriC cassette as described above. The amplifying reagent may comprise the first enzyme group, the second enzyme group, and the third enzyme group as described above. The digesting reagent may comprise the restriction enzyme as described above. An example of the restriction enzyme is SacI.


EXAMPLE 1
Identification of CGG Repeat Expansions in Patients with NIID

The present inventors first enrolled 12 families with neuronal intranuclear inclusion disease (NIID), 14 patients with sporadic NIID, and 2 patients with NIID whose family history was unavailable. The diagnosis was made on the basis of characteristic MRI findings (MCP sign and high-intensity signals on diffusion-weighted imaging (DWI) in the corticomedullary junction, FIG. 1) and/or intranuclear inclusions in skin or brain tissues (FIG. 6A and FIG. 6B).


The strategy for identifying repeat expansions in the short reads obtained by massively parallel sequencers is shown in FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D. Using TRhist, which extracts short reads filled with tandem repeats and provides histograms classified on the basis of the repeat motifs, short reads overrepresented exclusively in the patients are identified (Step 1). The location of the short reads filled with tandem repeats is determined by aligning their paired short reads that do not contain repeat motifs (nonrepeat reads) to the reference human genome sequence (Step 2). The expanded repeat sequences are confirmed by repeat-primed PCR analysis, Southern blot analysis, or long-read sequence analysis (Step 3).
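Step 1 can be illustrated with a simplified stand-in for read classification. TRhist's actual algorithm and parameters are not described here. The following Python sketch instead calls a read "filled" with a motif when at least 90% of its bases agree with a perfect tandem array of the motif in some phase, on either strand (CGG, GGC, GCG, CCG, CGC, GCC). It then flags a motif when every patient has such reads and no control does. The 0.9 cutoff and all function names are hypothetical.

```python
from typing import Iterable, List

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(COMP)[::-1]


def motif_phases(motif: str) -> List[str]:
    """All rotations of the motif and of its reverse complement."""
    forms = []
    for m in (motif, revcomp(motif)):
        forms += [m[i:] + m[:i] for i in range(len(m))]
    return forms


def repeat_fraction(read: str, motif: str) -> float:
    """Best fraction of bases agreeing with a perfect tandem array of the motif."""
    if not read:
        return 0.0
    best = 0.0
    for phase in motif_phases(motif):
        ideal = (phase * (len(read) // len(phase) + 1))[:len(read)]
        agree = sum(a == b for a, b in zip(read, ideal))
        best = max(best, agree / len(read))
    return best


def count_filled_reads(reads: Iterable[str], motif: str,
                       min_frac: float = 0.9) -> int:
    """Number of reads that are 'filled' with the motif (hypothetical cutoff)."""
    return sum(repeat_fraction(r.upper(), motif) >= min_frac for r in reads)


def exclusive_to_patients(patient_counts: List[int],
                          control_counts: List[int]) -> bool:
    """True if every patient has filled reads and no control has any."""
    return (all(c > 0 for c in patient_counts)
            and all(c == 0 for c in control_counts))
```

Counting reads on both strands matters, because a CGG expansion appears as CCG-filled reads on the opposite strand, which is why the text refers to CGG/CCG repeats.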


Initially, the present inventors directly searched for paired-end short reads in the whole-genome sequence data of four affected individuals from families F9193, F8504, F9468, and F9785 using TRhist. The present inventors detected short reads filled with CGG repeats that were observed exclusively in the four patients (FIG. 7A, FIG. 7B, FIG. 8A, and FIG. 8B). Alignment of the nonrepeat reads paired with short reads filled with CGG/CCG repeats to the reference genome (hg38) revealed that the CGG repeat expansion was located in the pericentromeric region of chromosome 1 (FIG. 7A and FIG. 7B). hg38 contains five paralogs on chromosome 1 with extremely high sequence identity (>99%), derived from the human-, Denisovan-, and Neanderthal-specific multiplication of NBPF gene families, namely, AC253572.1, NOTCH2, NOTCH2NL, NBPF14, and NBPF19 (FIG. 5A and FIG. 5B). Despite the extremely high identity among these paralogous genes, careful inspection of the reads allowed the present inventors to identify six nonrepeat reads from three patients strongly supporting the location of the CGG repeats in the 5′ UTR of NBPF19 (ENST00000621744.4 encoding neuroblastoma breakpoint family, member 19), which has also been recently annotated as NOTCH2NLC (NM_001364013.1 or NM_001364012.1 encoding notch homolog 2 N-terminal-like protein C, FIG. 7A, FIG. 7B, FIG. 9A, and FIG. 9B).


EXAMPLE 2
Long-Read Sequencing Determined the Position of CGG Repeat Expansions Located in NBPF19

To conclusively determine the position of the repeat expansions, the present inventors conducted single-molecule, real-time (SMRT) sequencing of genomic DNA of patient II-5 in family F9193 (FIG. 10A, FIG. 10B, FIG. 10C, FIG. 10D, FIG. 10E, and FIG. 10F). The present inventors obtained 2,053,214 SMRT subreads with a mean subread length of 6,842 bp. The present inventors aligned these subreads to hg38 using minimap2, and then searched for those originating from the NBPF19 region. Even in the presence of highly identical sequences, the alignment of the subreads containing expanded CGG repeats to NBPF19 (FIG. 10A, FIG. 10B, FIG. 10C, FIG. 10D, FIG. 10E, and FIG. 10F) was clearly supported by the NBPF19-specific insertion of an Alu sequence (FIG. 11A, FIG. 11B, FIG. 11C, FIG. 11D, FIG. 11E, FIG. 11F, FIG. 11G, FIG. 11H, FIG. 11I, FIG. 11J, FIG. 11K, FIG. 11L, FIG. 11M, FIG. 11N and FIG. 11O).


Error correction of the five subreads containing expanded CGG repeats was performed using Canu (version 1.7). Although error correction improved the estimation of the sizes of the expanded CGG repeats compared with the raw subreads (FIG. 12), the five expanded CGG repeats in the error-corrected subreads still differed slightly in length (430, 432, 435, 454, and 460 bp), which may reflect slight somatic divergence of the expanded CGG repeats or may have been introduced by long-read sequencing errors.


EXAMPLE 3
Repeat-Primed PCR Analysis and Southern Blot Analysis of Repeat Expansions in NBPF19

The present inventors then designed a primer set for repeat-primed PCR analysis targeting the expanded CGG repeats in the 5′ UTR of NBPF19 (FIG. 10A, FIG. 10B, FIG. 10C, FIG. 10D, FIG. 10E, and FIG. 10F) based on the NBPF19-specific sequence (FIG. 11A, FIG. 11B, FIG. 11C, FIG. 11D, FIG. 11E, FIG. 11F, FIG. 11G, FIG. 11H, FIG. 11I, FIG. 11J, FIG. 11K, FIG. 11L, FIG. 11M, FIG. 11N, FIG. 11O, FIG. 13A, FIG. 13B and FIG. 13C). The repeat-primed PCR analysis (FIG. 10A, FIG. 10B, FIG. 10C, FIG. 10D, FIG. 10E, and FIG. 10F) demonstrated repeat expansion mutations in 26 of the 28 Japanese index patients with NIID (12 probands of the 12 NIID families, 12 of the 14 patients with sporadic NIID, and both NIID patients with unavailable family histories; FIG. 10A, FIG. 10B, FIG. 10C, FIG. 10D, FIG. 10E, FIG. 10F, FIG. 6A, and FIG. 6B). None of the 1,000 Japanese controls showed repeat expansions. In the three families with multiple affected members, all 11 affected individuals had the repeat expansions, whereas three asymptomatic individuals with normal nerve conduction study findings in family F6321, three asymptomatic individuals aged >60 years with normal MRI findings in families F9193 and F11393, and two married-in healthy individuals did not (FIG. 10A, FIG. 10B, FIG. 10C, FIG. 10D, FIG. 10E, and FIG. 10F). Additionally, the repeat expansion mutations were identified in two Malaysian males of Chinese origin. Patient 1 presented with tremor, ataxia, peripheral neuropathy, urinary incontinence, and cognitive decline, with an age at onset of 53 years; patient 2 presented with unusual resting and action upper limb tremor, gait ataxia, and urinary incontinence, with onset in middle age. Their characteristic MRI findings (MCP sign and T2 hyperintensity signals in the white matter) suggested the diagnosis of FXTAS, but they did not have CGG repeat expansion mutations in FMR1 as examined by repeat-primed PCR analysis (FIG. 14).


The present inventors further confirmed the CGG repeat expansions in NIID patients by Southern blot analysis. The probes were designed to target the sequences flanking the CGG repeat in NBPF19 (FIG. 15A and FIG. 15B). Although the expanded alleles were clearly shown, strong signals reflecting the wild-type alleles of NBPF19 and fragments of the same sizes derived from the other four paralogous genes were also detected owing to the highly identical sequences (FIG. 10A, FIG. 10B, FIG. 10C, FIG. 10D, FIG. 10E, FIG. 10F, FIG. 5A and FIG. 5B). Southern blot analysis of 28 patients with NIID and seven unaffected individuals revealed that all the patients had expanded alleles, whereas the unaffected individuals did not. The lengths of the CGG repeat expansions were estimated to range from 270 to 550 bp, corresponding to approximately 90-180 repeat units. Intergenerational instability of expanded repeats was observed by Southern blot analysis of two parent-offspring pairs (FIG. 16A, FIG. 16B, and FIG. 16C). Because both offspring were presymptomatic carriers, the present inventors were unable to determine whether this intergenerational instability results in genetic anticipation.
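

The conversion from repeat-tract length to repeat units is simple division by the 3-bp length of one CGG unit. The Python sketch below assumes the measured length consists only of repeat units (no flanking sequence); the function names are illustrative and not part of the described analysis.

```python
CGG_UNIT_BP = 3  # one CGG repeat unit spans 3 bp


def bp_to_units(length_bp: float) -> float:
    """Convert a repeat-tract length in bp into a number of CGG repeat units."""
    if length_bp < 0:
        raise ValueError("length must be non-negative")
    return length_bp / CGG_UNIT_BP


def units_range(min_bp: float, max_bp: float) -> tuple[float, float]:
    """Convert a (min, max) range in bp into a (min, max) range in repeat units."""
    return bp_to_units(min_bp), bp_to_units(max_bp)


# Southern blot range reported above: 270-550 bp.
low, high = units_range(270, 550)
print(f"Southern blot: {low:.0f}-{high:.0f} repeat units")  # Southern blot: 90-183 repeat units

# Error-corrected SMRT subread lengths reported in EXAMPLE 2.
for bp in (430, 432, 435, 454, 460):
    print(bp, "bp ->", round(bp_to_units(bp), 1), "units")
```

The upper bound works out to about 183 units, which the text rounds to approximately 180.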


EXAMPLE 4
Distribution of Number of CGG Repeat Units and Repeat Configurations in Controls

Because the CGG repeats and the flanking sequences of NBPF19 show extremely high identity with the paralogous genes AC253572.1, NOTCH2, NOTCH2NL, and NBPF14 (FIG. 5A, FIG. 5B, FIG. 7A, and FIG. 7B), the present inventors devised an NBPF19-specific primer pair (FIG. 17A and FIG. 17B) to specifically amplify NBPF19 and subjected the PCR products to the circular consensus sequencing (CCS) mode of a PacBio Sequel sequencer (Pacific Biosciences) to precisely determine the configurations of CGG repeats in NBPF19 (FIG. 18A and FIG. 18B). CCS analysis of the PCR products revealed polymorphic lengths of the repeat structure as well as 11 repeat configurations (FIG. 10A, FIG. 10B, FIG. 10C, FIG. 10D, FIG. 10E, and FIG. 10F), with the number of CGG repeat units ranging from 7 to 39 in 182 control subjects. Interestingly, an allele carrying three single nucleotide variants (SNVs; rs1172135200, rs1436954367, and rs1376391857) in the flanking sequences, which always carried the configuration (AGG)(CGG)9(AGG)3, and another allele carrying rs1258206224 with the configuration (AGG)(CGG)n(AGG)2(CGG) were observed in 14 and 3 control subjects, respectively (FIG. 19A and FIG. 19B). No SNVs were observed in the other alleles. Reanalysis of long reads spanning the expanded CGG repeats in a patient with NIID revealed the configuration (AGG)(CGG)n without these SNVs (FIG. 11A, FIG. 11B, FIG. 11C, FIG. 11D, FIG. 11E, FIG. 11F, FIG. 11G, FIG. 11H, FIG. 11I, FIG. 11J, FIG. 11K, FIG. 11L, FIG. 11M, FIG. 11N, and FIG. 11O).


The present inventors furthermore conducted fragment analysis of the PCR products containing the CGG repeats in NBPF19 in 1,000 controls. Because the repeat configurations are variable, as shown in FIG. 10A, FIG. 10B, FIG. 10C, FIG. 10D, FIG. 10E, and FIG. 10F, the repeat size was defined as the size of the repeat configuration between the non-variable flanking sequences. The repeat sizes in NBPF19 ranged from 9 to 43 repeat units in the 1,000 controls (FIG. 20A and FIG. 20B).
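

The sizing rule above (repeat size = length of the variable region between the fixed flanking sequences) can be sketched as follows. The flanking length of 200 bp is a hypothetical placeholder: the real value depends on the NBPF19-specific primer pair in FIG. 17A and FIG. 17B, and in practice capillary-electrophoresis sizes also need calibration against sequenced alleles. Because every triplet between the flanks is counted, interrupting AGG units contribute to the size as well.

```python
def repeat_units_from_amplicon(amplicon_bp: float, flank_bp: float, unit_bp: int = 3) -> float:
    """Estimate repeat units from a PCR product size.

    amplicon_bp: product size from fragment analysis
    flank_bp:    combined length of the fixed primer-to-repeat flanking sequences
                 (HYPOTHETICAL value in the example below)
    unit_bp:     length of one repeat unit (3 for CGG and interrupting triplets)
    """
    variable_bp = amplicon_bp - flank_bp
    if variable_bp < 0:
        raise ValueError("amplicon is shorter than its fixed flanking sequence")
    return variable_bp / unit_bp


HYPOTHETICAL_FLANK_BP = 200  # placeholder, not the real primer geometry
for size in (227, 329):
    print(size, "bp ->", repeat_units_from_amplicon(size, HYPOTHETICAL_FLANK_BP), "units")
```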


EXAMPLE 5
Methylation Status of Expanded CGG Repeats in NBPF19 and Expression Levels of NBPF19 in Brains

To investigate the methylation status of the expanded CGG repeats located in the 5′ UTR of NBPF19, the present inventors used inter-pulse duration (IPD) analysis of the SMRT sequencing reads obtained from a patient with NIID. Because methylated CpGs slow down the sequencing process and generally result in statistically longer IPDs, the present inventors investigated the distribution of IPDs employing a method they recently devised. The IPDs of the expanded CGG repeats in the 5′ UTR of NBPF19 were similar to those of hypermethylated CGG repeats as determined by bisulfite sequencing (>70% of bisulfite calls on CpG sites supporting methylation) (p=0.35, n=59, two-sided test) but significantly different from those of hypomethylated CGG repeats (<30% of bisulfite calls on CpG sites supporting methylation) (p=1.6 × 10⁻⁴, n=1,220, one-sided test), indicating that the expanded CGG repeats in the 5′ UTR of NBPF19 tended to be hypermethylated (FIG. 21).


To examine whether the altered methylation status of NBPF19 is associated with transcriptional repression, the present inventors conducted RNA-seq analysis using RNA extracted from the brains of patients with NIID. Analysis of the expression levels of NBPF19 transcripts using NBPF19-specific sequences revealed no statistically significant difference between patients with NIID (n=3) and controls (n=8) (FIG. 22A and FIG. 22B).


EXAMPLE 6
Identification of CGG Repeat Expansions in LOC642361/NUTM2B-AS1 in OPML

The characteristic MRI findings of NIID include increased DWI signal intensity in the corticomedullary junction of the cerebral white matter. Intriguingly, in a single family (F5305, FIG. 23A, FIG. 23B, FIG. 23C, and FIG. 23D) presenting with oculopharyngeal myopathy, diffuse limb weakness, and leukoencephalopathy, strikingly similar DWI findings in the frontal corticomedullary junctions were noted in the index patient (FIG. 1). Patients in the family showed ptosis, restricted eye movements, dysphagia, dysarthria, and diffuse limb muscle weakness with nonspecific myopathic changes in muscle biopsy specimens. MRI was performed in three patients and revealed T2 hyperintensity signals in the white matter in two patients (III-5 and III-8) and brain atrophy in three patients (III-5, III-6, and III-8 in F5305). Because this disease entity has not been previously described, the present inventors designated it oculopharyngeal myopathy with leukoencephalopathy (OPML). Two patients (III-3 and III-6) had severe gastrointestinal dysmotility and respiratory failure in addition to ptosis and ocular, pharyngeal, and limb muscle weakness. Patient III-3 further showed mild ataxia, bladder disturbances, and dilated cardiomyopathy, and patient III-5 showed hand tremor suspected to be of cerebellar origin. Note that tremor and ataxia are common clinical characteristics of fragile X tremor/ataxia syndrome (FXTAS) and neuronal intranuclear inclusion disease (NIID), and gastrointestinal dysmotility is also occasionally observed in patients with NIID. After CGG repeat expansion mutations in NBPF19 were excluded by repeat-primed PCR analysis, the present inventors similarly searched directly for expanded CGG repeats in the whole-genome sequence data of patient III-5 using TRhist (FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D) and identified short reads filled with CGG repeats (FIG. 8A and FIG. 8B).
The CGG repeat expansion was located in the bidirectionally transcribed long noncoding RNAs LOC642361 (NR_029407.1, transcribed in the CGG direction) and NUTM2B-AS1 (NR_120613.1, transcribed in the CCG direction; FIG. 23A, FIG. 23B, FIG. 23C, FIG. 23D, FIG. 24A and FIG. 24B) on 10q22.3, where parametric linkage analysis showed a single peak with a maximum multipoint LOD score of 1.94 (FIG. 25A and FIG. 25B). Bidirectional transcription was confirmed by stranded RNA-seq data of a control brain and muscles (FIG. 26A, FIG. 26B, and FIG. 26C). Because the flanking sequences of the CGG repeats in LOC642361/NUTM2B-AS1 have homologous sequences in LINC00863/NUTM2A-AS1 (10q23.2) and FLJ22063/AMMECR1L (2q14.3, FIG. 27), LOC642361/NUTM2B-AS1-specific primers for repeat-primed PCR analysis were designed (FIG. 28A, FIG. 28B, FIG. 28C, FIG. 13A, FIG. 13B, and FIG. 13C). The repeat-primed PCR analysis targeting the CGG repeats confirmed that the four affected individuals in the family had the CGG repeat expansion mutations, whereas the seven unaffected individuals, including three married-in healthy individuals, did not (FIG. 23A, FIG. 23B, FIG. 23C, and FIG. 23D). None of the 1,000 controls showed the repeat expansion mutations as determined by repeat-primed PCR analysis. Fragment analysis using an LOC642361/NUTM2B-AS1-specific primer pair (FIG. 17A and FIG. 17B) revealed that the number of CGG repeat units ranged from 3 to 16 in the 1,000 controls (FIG. 23A, FIG. 23B, FIG. 23C, and FIG. 23D).


Southern blot analysis of the affected individuals (family F5305) revealed broad smearing patterns (FIG. 15A and FIG. 15B), indicating strong somatic instability of the expanded CGG repeats in LOC642361/NUTM2B-AS1 in genomic DNAs from peripheral blood leukocytes (FIG. 29A and FIG. 29B).


EXAMPLE 7
Identification of CGG Repeat Expansions in LRP12 in OPDM

Although cerebral white matter involvement and the MCP sign are not observed in oculopharyngodistal myopathy (OPDM), this disease shares a characteristic distribution of muscle involvement, including ptosis, external ophthalmoplegia, and dysphagia, with the patients in the family with OPML. The present inventors therefore explored the possibility of CGG repeat expansions in families with OPDM. OPDM is an autosomal dominant disease characterized by ptosis, external ophthalmoplegia, and weakness of the masseter, facial, pharyngeal, and distal limb muscles (MIM164310). To date, the causes of OPDM have not been elucidated.


The present inventors studied the index patients of 17 families with OPDM and 17 sporadic patients with OPDM in whom biopsied muscle specimens confirmed myopathic changes with rimmed vacuoles, consistent with the diagnosis of OPDM, and in whom GCG repeat expansions in PABPN1, the causative gene for oculopharyngeal muscular dystrophy (OPMD, MIM164300), and CGG repeat expansions in LOC642361/NUTM2B-AS1 had been excluded. Among them, the present inventors performed whole-genome sequence analysis of patient III-1 of family F7967. A direct search for CGG repeats (FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D) revealed CGG repeat expansions (FIG. 8A and FIG. 8B) located in the 5′ UTR of LRP12, which encodes low-density lipoprotein receptor-related protein 12 (NM_013437, FIG. 30A, FIG. 30B, FIG. 30C, FIG. 30D, FIG. 30E, FIG. 31A, and FIG. 31B). Repeat-primed PCR analysis targeting the CGG repeats in LRP12 confirmed the presence of the repeat expansions in patient III-1 in family F7967 as well as in 12 patients (four with familial OPDM and eight with sporadic OPDM, FIG. 30A, FIG. 30B, FIG. 30C, FIG. 30D, and FIG. 30E). The present inventors further screened for CGG repeat expansions in 54 patients exhibiting similar clinical presentations, including ptosis and extraocular and pharyngeal weakness (26 with a family history, 21 without a family history, and seven with an unknown family history), in whom muscle biopsy specimens were unavailable. Repeat-primed PCR analysis targeting CGG repeats in LRP12 revealed nine patients (four familial and five sporadic) with CGG repeat expansions (FIG. 30A, FIG. 30B, FIG. 30C, FIG. 30D, and FIG. 30E). In addition, screening of another 19 patients with similar muscle involvement but without rimmed vacuoles in biopsied muscle specimens did not reveal CGG repeat expansions in LRP12.


Southern blot analysis (FIG. 15A and FIG. 15B) of four patients with OPDM revealed discrete bands corresponding to the expanded repeats of approximately 280 or 380 bp in genomic DNAs from peripheral blood leukocytes (FIG. 32A and FIG. 32B), while multiple bands corresponding to expanded repeats were observed in genomic DNAs from lymphoblastoid cell lines, indicating somatic instability of the expanded repeats. Affected parent-offspring pairs with OPDM were unavailable.


To determine the distribution of repeat units in controls, the present inventors conducted fragment analysis of the PCR products. Because (CGG)9(CGT)(CGG)(CGT)2 is registered in hg38, the repeat size was determined as the total number of repeat units, including the repeat sequences flanking (CGG)n. Fragment analysis (FIG. 17A and FIG. 17B) revealed that the number of repeat units in LRP12 ranged from 13 to 45 in 998 controls (FIG. 30A, FIG. 30B, FIG. 30C, FIG. 30D, and FIG. 30E), whereas only two of the 1,000 control individuals (0.2%) showed repeat expansions by repeat-primed PCR analysis, which was further confirmed by Southern blot analysis (FIG. 32A and FIG. 32B).
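

The counting rule above, in which every unit of the configuration is counted including the interrupting CGT units, can be illustrated with a small parser for the configuration notation used in this document, where an omitted count means a single unit. The parser and its names are hypothetical and only serve to make the arithmetic explicit.

```python
import re

# One motif group, e.g. "(CGG)9" or "(CGT)"; an omitted count means one unit.
_GROUP = re.compile(r"\(([ACGT]+)\)(\d*)")


def parse_configuration(config: str) -> list[tuple[str, int]]:
    """Split a configuration such as "(CGG)9(CGT)(CGG)(CGT)2" into (motif, count) pairs."""
    groups = []
    pos = 0
    for match in _GROUP.finditer(config):
        if match.start() != pos:  # reject stray characters between groups
            raise ValueError(f"cannot parse {config!r} at position {pos}")
        groups.append((match.group(1), int(match.group(2) or 1)))
        pos = match.end()
    if pos != len(config):
        raise ValueError(f"cannot parse {config!r} at position {pos}")
    return groups


def total_repeat_units(config: str) -> int:
    """Count every repeat unit in the configuration, including interrupting motifs."""
    return sum(count for _, count in parse_configuration(config))


print(total_repeat_units("(CGG)9(CGT)(CGG)(CGT)2"))  # 13 (hg38 LRP12 configuration)
print(parse_configuration("(AGG)(CGG)9(AGG)3"))
```

Under this rule the hg38 reference configuration counts as 9 + 1 + 1 + 2 = 13 units, the lower bound of the control range reported above.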


OPMD, a disease with similar muscle involvement, is caused by short expansions of GCG repeats (affected individuals, 7-14 GCG repeat units; normal individuals, 6 repeat units) encoding a polyalanine stretch in polyadenylate-binding protein 2 (PABP2), which is encoded by PABPN1. It is intriguing that the same repeat motif is expanded in OPMD and OPDM, although the mutations are located in different regions: the coding region in OPMD and the 5′ UTR in OPDM.


(Methods)


(Patients and Controls)


All Japanese index patients were diagnosed as having NIID on the basis of characteristic MRI findings [T2-hyperintensity areas in the middle cerebellar peduncles (MCP sign) and high-intensity signals on DWI in the corticomedullary junction] and/or the presence of ubiquitin-positive intranuclear inclusions in the skin or brain tissues [4] (FIG. 6A and FIG. 6B). In multiplex families, in addition to the index patients with characteristic MRI and/or histopathological findings, family members aged >60 years who had cognitive decline and decreased or absent tendon reflexes were considered affected. Because neuropathy is frequently observed in NIID, family members with decreased or absent tendon reflexes and decreased motor conduction velocities in nerve conduction studies (<49 m/s in the median nerve) were also considered affected. Genomic DNAs of 36 patients with NIID and eight unaffected family members from Japan (FIG. 6A and FIG. 6B), and of two patients with NIID from Malaysia, were investigated in the study. For confidentiality reasons, parts of the pedigree charts were modified by omitting some individuals with unknown disease status and by masking the gender of individuals in the younger generation.


All patients in the Japanese family with OPML showed ptosis and ocular, pharyngeal, and limb muscle weakness (distal-predominant or diffuse weakness). Family members aged over 40 years without weakness in ocular or pharyngeal muscles were considered unaffected, because the age at onset of the disease ranges from the teens to 40 years. Genomic DNAs of four affected individuals and seven unaffected individuals in family F5305 were investigated in the study. Other family members were considered to have an unknown disease status.


OPDM was mainly diagnosed clinically. The patients showed characteristic clinical features including ptosis and ocular, pharyngeal, and distal limb muscle weakness. Patients whose muscle biopsy specimens showed myopathic changes with rimmed vacuoles (RVs) were considered to have histopathological support for the diagnosis. Genomic DNAs of patients collected in Japan, including 34 with histopathological findings of RVs, 19 without histopathological findings of RVs, and 54 with characteristic clinical features but without histopathological examination, were investigated in the present inventors' study. In families F7967 and F3411, in which the index patients showed histopathological findings of RVs, genomic DNAs of additional affected and unaffected family members were also investigated.


CGG repeat expansion mutations in the 5′ UTR of FMR1 were excluded in all the probands with NIID (FIG. 14). GCG repeat expansions encoding polyalanine stretches in PABPN1 were excluded [33] in all the probands with OPML and OPDM.


All the participants gave their informed consent. The present inventors' study was approved by the institutional review boards of the University of Tokyo, and the present inventors complied with all relevant ethical regulations. Genomic DNAs were extracted from peripheral blood leukocytes, lymphoblastoid cell lines, or brains using standard procedures. Control subjects (n=1,000) were recruited in Japan.


(SNV Genotyping)


SNV genotyping using Genome-Wide Human SNP array 6.0 (Affymetrix) was conducted in accordance with the manufacturer's instructions. SNVs were called and extracted using Genotyping Console 3.0.2 (Affymetrix). Only SNVs with p values of >0.05 in the Hardy-Weinberg test in the control samples, call rates of >0.98, and minor allele frequencies of >0.05 were used for further analysis.
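

The three SNV filters above (Hardy-Weinberg p>0.05 in controls, call rate >0.98, minor allele frequency >0.05) can be sketched as a single predicate. The text does not state which Hardy-Weinberg test Genotyping Console applies; this sketch assumes the common one-degree-of-freedom chi-square test, which can differ from an exact test when genotype counts are small. The record type is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SnvGenotypeCounts:
    """Genotype counts for one SNV across the control samples."""
    hom_ref: int  # AA
    het: int      # AB
    hom_alt: int  # BB
    no_call: int  # samples without a genotype call


def hwe_p_value(hom_ref: int, het: int, hom_alt: int) -> float:
    """Chi-square (1 df) Hardy-Weinberg test; an assumption, not necessarily the vendor's test."""
    n = hom_ref + het + hom_alt
    if n == 0:
        return 1.0
    p = (2 * hom_ref + het) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNV: test undefined, left to the MAF filter
    observed = (hom_ref, het, hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(s: SnvGenotypeCounts, hwe_min: float = 0.05,
              call_rate_min: float = 0.98, maf_min: float = 0.05) -> bool:
    """Apply the HWE, call-rate, and MAF thresholds described above."""
    called = s.hom_ref + s.het + s.hom_alt
    if called == 0:
        return False
    call_rate = called / (called + s.no_call)
    alt_freq = (2 * s.hom_alt + s.het) / (2 * called)
    maf = min(alt_freq, 1.0 - alt_freq)
    return (hwe_p_value(s.hom_ref, s.het, s.hom_alt) > hwe_min
            and call_rate > call_rate_min
            and maf > maf_min)


print(passes_qc(SnvGenotypeCounts(360, 480, 160, 0)))  # True: in HWE, MAF 0.4
print(passes_qc(SnvGenotypeCounts(500, 0, 500, 0)))    # False: no heterozygotes
```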


(Genome-Wide Linkage Study)


A genome-wide linkage study of family F5305 (FIG. 30A, FIG. 30B, FIG. 30C, FIG. 30D, and FIG. 30E) was performed using the pipeline software SNP-HiTLink and Allegro version 2 with intermarker distances of 80 kb to 120 kb, using an autosomal dominant model with complete penetrance. The disease allele frequency was set to 10⁻⁶.
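

Allegro computes multipoint LOD scores over the whole pedigree with unknown phases, which is beyond a short example. The sketch below only illustrates the underlying two-point LOD score for the simplest case of phase-known, fully informative meioses, consistent with a dominant model with complete penetrance; it does not reproduce the 1.94 multipoint score reported for F5305, and the function names are illustrative.

```python
import math


def lod_phase_known(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[theta^R * (1 - theta)^(N - R)] - log10[0.5^N]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")

    def log10_power(prob: float, count: int) -> float:
        # count * log10(prob), with the convention 0 * log10(0) = 0
        if count == 0:
            return 0.0
        return -math.inf if prob == 0.0 else count * math.log10(prob)

    log_l_linked = (log10_power(theta, recombinants)
                    + log10_power(1.0 - theta, meioses - recombinants))
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, meioses: int, step: float = 0.001) -> tuple[float, float]:
    """Grid search for the maximum LOD over theta in [0, 0.5]; returns (LOD, theta)."""
    n_steps = int(round(0.5 / step))
    return max((lod_phase_known(recombinants, meioses, i * step), i * step)
               for i in range(n_steps + 1))


# Six informative meioses without recombinants: maximum LOD = 6 * log10(2) at theta = 0.
print(max_lod(0, 6))
```

Each fully informative, non-recombinant meiosis adds log10(2) ≈ 0.30 to the maximum LOD, which is why small families rarely reach the conventional threshold of 3.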


(Whole-Genome Sequence Analysis and Search for Repeat Sequences)


Whole-genome sequence analysis of patients or controls was performed using a HiSeq 2500 [Illumina, 150 bp paired end (three patients with NIID, one patient with OPML, one patient with OPDM, and seven controls) or 126 bp paired end (three patients with NIID and a control subject)] in accordance with the manufacturer's instructions using a PCR-free library preparation protocol. Short-read sequences harboring repeat sequences were counted using the TRhist program. Only reads completely filled with repeat motifs of 3-6 bases without mismatches were counted. Repeat motifs were not included in the tables when fewer than 10 reads were observed in all of the 10 subjects (150 bp) and four subjects (126 bp).
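

The read-counting rule above (reads completely filled with a 3-6-base motif, no mismatches) can be sketched as follows. This is not TRhist's actual implementation: the period detection, the requirement that a read contain at least two full copies of the motif, and the canonical motif naming (smallest rotation on either strand, so CGG, GCG, GGC, CCG, CGC, and GCC all count as one motif) are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def smallest_period(read: str, max_period: int = 6) -> Optional[int]:
    """Smallest k <= max_period such that the read is a perfect tandem repeat of period k."""
    for k in range(1, max_period + 1):
        if len(read) >= 2 * k and all(read[i] == read[i - k] for i in range(k, len(read))):
            return k
    return None


def canonical_motif(motif: str) -> str:
    """Name a motif by its lexicographically smallest rotation on either strand."""
    candidates = []
    for seq in (motif, reverse_complement(motif)):
        candidates.extend(seq[i:] + seq[:i] for i in range(len(seq)))
    return min(candidates)


def repeat_motif(read: str, min_period: int = 3, max_period: int = 6) -> Optional[str]:
    """Canonical motif if the read is completely filled with a 3-6-base repeat, else None."""
    if not read or set(read) - set("ACGT"):
        return None  # empty reads and reads containing N are not counted
    k = smallest_period(read, max_period)
    if k is None or k < min_period:
        return None  # not a perfect repeat, or a mono-/dinucleotide repeat
    return canonical_motif(read[:k])


def motif_histogram(reads) -> Counter:
    """Count repeat-filled reads per canonical motif (one subject's histogram)."""
    return Counter(m for m in map(repeat_motif, reads) if m is not None)


reads = ["CGG" * 50, "GCC" * 50, "CGG" * 49 + "CGA", "CA" * 75]
print(motif_histogram(reads))  # Counter({'CCG': 2})
```

Comparing such per-subject histograms between patients and controls corresponds to Step 1 of the strategy in FIG. 2A to FIG. 2D; the read with a single mismatch and the dinucleotide read are not counted.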


Nonrepeat reads paired with short reads filled with CGG repeats were selected using TRhist. After quality trimming using sickle (https://github.com/najoshi/sickle), the trimmed nonrepeat reads were aligned to hg38 using BLAT. The present inventors annotated transcripts/genes using UCSC annotations of RefSeq RNAs (https://genome.ucsc.edu/) or Gencode v29 (https://www.gencodegenes.org/).


(SMRT Sequencing Analysis of a Patient with NIID)


Whole-genome sequence analysis was performed using a Pacific Biosciences Sequel sequencer. Long reads were aligned to the reference genome (hg38) using minimap2 (version 2.10). Multiple sequence alignment of the long reads at the NBPF19 locus including CGG repeat expansions and the five paralogous sequences of the NBPF19, NBPF14, NOTCH2NL, NOTCH2, and AC253572.1 regions obtained from hg38 was performed using ClustalW (version 2.1). The long reads showing CGG repeat expansions in NBPF19 were further polished using Canu (version 1.7) and assembled using racon (version 1.3.1). From the long reads, the present inventors identified CGG repeat expansions in the 5′ UTR of NBPF19 using Tandem Repeat Finder (version 4.0.9).


(Repeat-Primed PCR Analysis)


Repeat-primed PCR analysis was performed using the primers shown in FIG. 13A, FIG. 13B, and FIG. 13C and LA Taq with GC buffer (TaKaRa). The present inventors used deaza-dGTP in place of dGTP and a slow-down PCR protocol: initial denaturation at 95° C. for 5 min, followed by 50 cycles of 95° C. for 30 s, 98° C. for 10 s, 62° C. for 30 s, and 72° C. for 2 min. The ramp rate to 95° C. and 72° C. was set to 2.5° C./s and that to 62° C. was set to 1.5° C./s. Fragment analysis was performed using an ABI PRISM 3130xl or 3730 sequencer (Life Technologies), and data were analyzed using GeneMapper software (version 4.1, Life Technologies).
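

The slow-down program above can be written as a small data structure to estimate its run time from hold times and ramp rates. The ramp rate to 98° C. is not stated in the text, so 2.5° C./s is assumed here; instrument overheads are ignored, so the result is only an approximation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float        # target temperature (deg C)
    hold_s: float        # hold time at the target (s)
    ramp_c_per_s: float  # ramp rate used to reach the target (deg C/s)


# One cycle of the slow-down program described above.
CYCLE = (
    Step(95.0, 30, 2.5),
    Step(98.0, 10, 2.5),   # ASSUMPTION: ramp rate to 98 deg C not stated; 2.5 deg C/s assumed
    Step(62.0, 30, 1.5),
    Step(72.0, 120, 2.5),
)


def run_time_s(cycles: int = 50, initial_hold_s: float = 300, start_temp_c: float = 95.0) -> float:
    """Approximate program length: holds plus linear ramps, ignoring instrument overheads."""
    total = initial_hold_s
    temp = start_temp_c
    for _ in range(cycles):
        for step in CYCLE:
            total += abs(step.temp_c - temp) / step.ramp_c_per_s + step.hold_s
            temp = step.temp_c
    return total


print(f"{run_time_s() / 60:.1f} min")
```

Under these assumptions the program takes a little over three hours, with each cycle after the first lasting about 228 s.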


(Southern Blot Analysis)


Southern blot analysis was performed to detect CGG repeat expansions in NBPF19, LOC642361/NUTM2B-AS1, and LRP12. The probes were designed to target the flanking regions of the CGG repeats in the 5′ UTR of NBPF19, the noncoding exon in LOC642361/NUTM2B-AS1, and the 5′ UTR of LRP12. Genomic fragments were subcloned into plasmids (pTA2, Toyobo) using primers shown in FIG. 15A and FIG. 15B, and probes were prepared by digoxigenin (DIG) labeling PCR using DIG-dUTP and dTTP at a ratio of 0.7 to 1.3. To increase signal intensity, several probes (Probes 1-5 or Probes 7 and 8) were mixed for hybridization for NBPF19 or LRP12, respectively. The primer pairs used for DIG-labeling are shown in FIG. 15A and FIG. 15B.


Ten micrograms of genomic DNA extracted from peripheral blood leukocytes or lymphoblastoid cell lines was digested with SacI and/or NheI (NBPF19) or XspI (LOC642361/NUTM2B-AS1 and LRP12) and electrophoresed in 0.8%-1.2% agarose gels, followed by capillary blotting onto positively charged nylon membranes (Sigma-Aldrich) and cross-linking by exposure to ultraviolet light. After prehybridization, the probes were hybridized overnight at 42° C. (LOC642361/NUTM2B-AS1 and LRP12) or 48° C. (NBPF19) in DIG Easy Hyb (Sigma-Aldrich). The membrane was finally washed twice with 0.1X-0.5X saline sodium citrate (SSC) and 0.1% sodium dodecyl sulfate (SDS) at 68° C. for 15 min each. Detection was performed using Fab fragments of an anti-DIG antibody conjugated to alkaline phosphatase (Sigma-Aldrich), CDP-Star (Sigma-Aldrich), and an LAS3000 mini (Fujifilm).
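

Why the Southern blot resolves expanded alleles can be reasoned about with a small in-silico digest: the fragment spanning the repeat grows by 3 bp per added CGG unit. The sketch uses the standard SacI (GAGCT^C) and NheI (G^CTAGC) recognition sequences; XspI is not modeled. The locus is a hypothetical toy sequence, not the NBPF19 sequence, so the fragment sizes are illustrative only, and single-stranded overhangs are ignored.

```python
# Recognition site and top-strand cut offset (bases after the site start).
ENZYMES = {
    "SacI": ("GAGCTC", 5),  # GAGCT^C
    "NheI": ("GCTAGC", 1),  # G^CTAGC
}


def cut_positions(seq: str, enzyme_names) -> list[int]:
    """Top-strand cut coordinates; both sites are palindromic, so one strand suffices."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    return sorted(cuts)


def fragment_lengths(seq: str, enzyme_names) -> list[int]:
    """Lengths of the linear fragments, ignoring the short single-stranded overhangs."""
    bounds = [0] + cut_positions(seq, enzyme_names) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def toy_locus(cgg_units: int) -> str:
    """HYPOTHETICAL locus: SacI site, CGG tract, NheI site, with arbitrary flanks."""
    return "A" * 100 + "GAGCTC" + "CGG" * cgg_units + "GCTAGC" + "A" * 50


for units in (20, 150):
    print(units, "units:", fragment_lengths(toy_locus(units), ["SacI", "NheI"]))
```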


(Analysis of Repeat Sizes in Controls)


The present inventors conducted fragment analysis to determine the distribution of sizes of CGG repeats in NBPF19, LOC642361/NUTM2B-AS1, and LRP12 in 1,000 controls (FIG. 17A and FIG. 17B). In the analysis of NBPF19 and LOC642361/NUTM2B-AS1, the present inventors used NBPF19- and LOC642361/NUTM2B-AS1-specific primers to avoid nonspecific amplification of highly homologous loci (FIG. 17A and FIG. 17B).


To determine the configurations of CGG repeats in NBPF19, the present inventors conducted circular consensus sequencing (CCS) analysis using a PacBio Sequel sequencer (Pacific Biosciences) for pooled barcoded PCR products containing the CGG repeats in NBPF19 (FIG. 18A and FIG. 18B) that were prepared from 194 control subjects. “By strand” CCS reads were generated using SMRT Link (v.6.0.0.47841). The minimum number of passes was set to 20 to obtain accurate CCS reads. After discarding 12 subjects with fewer than 50 CCS reads, the present inventors were able to determine the number of CGG repeat units, repeat configurations, and flanking sequences in the 182 control subjects. Copy number variations involving this locus were not taken into consideration in this analysis.
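

The read- and subject-level thresholds above (at least 20 passes per CCS read, at least 50 CCS reads per subject) can be sketched as a simple filter. In the described workflow the pass threshold is applied in SMRT Link during CCS generation; here it is applied afterwards for illustration. The `CcsRead` record, including a per-read configuration call, is a hypothetical structure, and the toy reads are invented.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CcsRead:
    subject: str        # control subject assigned from the barcode
    passes: int         # number of full passes behind the consensus
    configuration: str  # repeat configuration called from the read (hypothetical field)


def configurations_by_subject(reads, min_passes: int = 20,
                              min_reads: int = 50) -> dict[str, Counter]:
    """Tally configurations per subject, keeping only reads and subjects that pass the thresholds."""
    per_subject = defaultdict(list)
    for read in reads:
        if read.passes >= min_passes:  # accurate CCS reads only
            per_subject[read.subject].append(read.configuration)
    return {subject: Counter(configs)
            for subject, configs in per_subject.items()
            if len(configs) >= min_reads}  # drop subjects with too few reads


toy_reads = ([CcsRead("S1", 25, "(AGG)(CGG)9(AGG)3")] * 30
             + [CcsRead("S1", 30, "(AGG)(CGG)10")] * 25
             + [CcsRead("S2", 22, "(AGG)(CGG)9(AGG)3")] * 40)
print(configurations_by_subject(toy_reads))  # S1 is kept (55 reads); S2 (40 reads) is dropped
```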


(Methylation Analysis Using SMRT Sequencing Reads)


To investigate the CpG methylation status of expanded CGG repeats in the 5′ UTR of NBPF19, the present inventors used a kinetic metric called inter-pulse duration (IPD) from SMRT sequencing reads. The present inventors first created reference IPD sets for hypomethylated and hypermethylated CGGs using whole-genome bisulfite sequencing data and SMRT sequencing data obtained from the same control individual. CGG repeats in the hg38 reference sequence were identified by aligning a synthetic (CGG)n sequence (n=7; 21 bp) to the reference with Bowtie 2 (version 2.1.0), allowing no mismatches. After removing regions without enough PacBio reads for calculating IPD statistics according to SMRT Pipe (version 0.51.0) provided by Pacific Biosciences, the present inventors obtained 401 CGG repeat sites. Each CpG site was then associated with its methylation status obtained from the whole-genome bisulfite sequencing data. However, fewer bisulfite-treated short reads were available on CGG repeats than on other unique regions, presumably because of ambiguous short-read alignment to CGG repeats or high GC content. Because the methylation statuses of neighboring CpG sites are likely to be correlated, the present inventors assumed that all CpG sites in a single CGG repeat had an identical methylation status; namely, if <30% (>70%, respectively) of bisulfite calls on CpG sites within the repeat supported methylation, the entire region was defined as hypomethylated (hypermethylated, respectively). The analysis revealed 303 hypomethylated CGG repeat regions with 1,220 CpGs and 14 hypermethylated regions with 59 CpGs. A significant difference in IPD statistics at the cytosine of CGG was observed between the hypermethylated and hypomethylated CpG sites (p=3.3 × 10⁻¹⁶, one-sided Mann-Whitney U test), demonstrating that IPD is informative for inferring the CpG methylation status of CGG repeats (FIG. 21).
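

The region-level labeling rule above (<30% methylated calls: hypomethylated; >70%: hypermethylated; every CpG in the region inherits the label) can be sketched as follows. The input layout, one tuple per CGG repeat region with its bisulfite call counts and per-CpG IPD statistics, is an assumption for illustration, and the toy values are invented.

```python
from typing import Iterable, Optional


def classify_region(methylated_calls: int, total_calls: int,
                    low: float = 0.30, high: float = 0.70) -> Optional[str]:
    """Label a whole CGG repeat region from pooled bisulfite calls on its CpGs."""
    if total_calls == 0:
        return None
    fraction = methylated_calls / total_calls
    if fraction < low:
        return "hypomethylated"
    if fraction > high:
        return "hypermethylated"
    return None  # intermediate regions are not used as references


def reference_ipd_sets(regions: Iterable[tuple[int, int, list[float]]]):
    """Split per-CpG IPD statistics into hypo- and hypermethylated reference sets.

    Each region is (methylated bisulfite calls, total bisulfite calls, IPDs of its CpGs).
    """
    hypo, hyper = [], []
    for methylated, total, ipds in regions:
        label = classify_region(methylated, total)
        if label == "hypomethylated":
            hypo.extend(ipds)
        elif label == "hypermethylated":
            hyper.extend(ipds)
    return hypo, hyper


toy_regions = [(1, 10, [1.0, 1.1]), (9, 10, [2.0, 2.2, 1.9]), (5, 10, [1.5])]
print(reference_ipd_sets(toy_regions))  # ([1.0, 1.1], [2.0, 2.2, 1.9])
```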


The present inventors next examined whether the CpG sites of the CGG repeats in the 5′ UTR of NBPF19 in a patient resembled those of hypomethylated or hypermethylated CGG repeats in terms of IPD statistics, testing with the Mann-Whitney U test the null hypothesis that the IPD statistics of the two groups being compared follow the same distribution.
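

The comparison above can be sketched with a Mann-Whitney U test written from scratch; the same statistic underlies Wilcoxon's rank sum test used for the RNA-seq comparison below. This sketch uses the large-sample normal approximation with a tie correction and no continuity correction. Statistical packages may apply a continuity correction or exact p-values for small samples, so p-values can differ slightly from those reported above. The example IPD values are invented.

```python
import math
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float],
                   alternative: str = "two-sided") -> tuple[float, float]:
    """Mann-Whitney U test, tie-corrected normal approximation (no continuity correction).

    Returns (U for sample x, p-value). alternative: "two-sided", "greater"
    (x tends to exceed y), or "less".
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block i..j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return u_x, 1.0  # all values tied: no evidence either way
    z = (u_x - mean) / math.sqrt(var)
    if alternative == "two-sided":
        p = math.erfc(abs(z) / math.sqrt(2))
    elif alternative == "greater":
        p = 0.5 * math.erfc(z / math.sqrt(2))
    elif alternative == "less":
        p = 0.5 * math.erfc(-z / math.sqrt(2))
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    return u_x, min(p, 1.0)


hyper_ref = [2.1, 1.9, 2.4, 2.2, 2.0]  # toy IPD statistics, not study data
patient = [2.0, 2.3, 1.8, 2.2]
print(mann_whitney_u(patient, hyper_ref))
```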


(RNA-Seq Analysis in Brains of Patients with NIID and Control Subjects)


To determine the expression levels of NBPF19 in patients with NIID, three autopsied brains of patients with NIID and eight control brains (occipital lobe) were subjected to unstranded RNA-seq. Short reads were aligned to hg38 using STAR (version 2.5.3a), and the numbers of reads aligned to NBPF19-specific sequences among the five homologous sequences were visually inspected. Statistical analysis was performed using Wilcoxon's rank sum test (two-sided).


To examine transcriptional directions, data on stranded RNA-seq of normal subjects (brain, n=1; muscle, n=2) were aligned to hg38 using STAR (version 2.5.3a). After reads with mapping quality of less than five were discarded using SAMtools (version 1.6), aligned reads and coverages were visualized using the Integrative Genomics Viewer (version 2.4.4).


(Haplotype Analysis)


Disease-relevant haplotypes in three families with OPDM (F3411, F7758, and F7967) were reconstructed using SNP genotypes. In addition, employing linked-read analysis (10X GemCode Technology), the haplotypes of the patient II-1 in family F3411, the index patient in family F7758, and the patient III-1 in family F7967 were determined using longranger (version 2.1.6) and loupe (version 2.1.1). The present inventors used the reference genome hg19 in this analysis.


(Summary of Clinical Presentation of the Index Patient (III-3) in Family F5305 with Oculopharyngeal Myopathy with Leukoencephalopathy (OPML))


The pedigree chart of this family (F5305) is shown in FIG. 23A, FIG. 23B, FIG. 23C, and FIG. 23D. The family includes seven affected individuals, consistent with autosomal dominant inheritance.


The index patient (III-3, FIG. 23) noticed a nasal voice at the age of 15. Her symptoms progressed as follows: at 27 years old (y/o), she began noticing easy fatigability of her extremities; at 30 y/o, ptosis; and at 32 y/o, mild dysphagia. She underwent repeated blepharoplasties at ages 34, 45, and 56. She was examined at another hospital at 35 y/o, where ptosis, dysarthria, dysphagia, and weakness of facial and neck muscles were observed; however, the limb muscles were minimally involved. Needle electromyography revealed motor units with short duration and low voltage, which were considered myogenic changes. Muscle biopsy revealed no abnormal findings. Motor nerve conduction studies were normal.


Her symptoms gradually progressed. Detailed examination at 58 y/o at the Department of Neurology, The University of Tokyo Hospital, revealed ptosis, nearly complete external ophthalmoplegia, dysarthria with nasal voice, and dysphagia. She also had facial, neck, and diffuse limb muscle weakness accompanied by diffuse muscular atrophy and generalized areflexia. She had dysuria requiring abdominal pressure to assist urination. Tube feeding was attempted because of dysphagia and repeated aspiration pneumonia, but enteral tube feeding was inadequate owing to severe gastrointestinal dysmotility. Weakness of the respiratory muscles led to hypercapnia. On laboratory examination, serum creatine kinase levels were below the lower limit (29 IU/L), while serum lactate and pyruvate levels were normal. Echocardiography revealed diffuse hypokinesis of the left ventricle (ejection fraction of 44%). Magnetic resonance imaging of the head revealed T2 hyperintensities in the white matter, accompanied by hyperintensities on diffusion-weighted images at the corticomedullary junction (FIG. 1). Clinical presentations of other family members are summarized in FIG. 33A and FIG. 33B.


Although autosomal dominant mitochondrial diseases exhibiting chronic progressive external ophthalmoplegia were initially considered on the basis of the pedigree chart (FIG. 23A, FIG. 23B, FIG. 23C, and FIG. 23D), no rearrangements or deletions of mitochondrial DNA were identified by Southern blot hybridization analysis of genomic DNA extracted from the abdominal muscle specimen. Whole genome sequence analysis identified no causative mutations in the nuclear genes responsible for autosomal dominant mitochondrial diseases (POLG, SLC25A4, C10ORF2, POLG2, RRM2B, DNA2, OPA1, and AFG3L2). Oculopharyngeal muscular dystrophy was excluded by analysis of the GCG repeat in PABPN1. Oculopharyngodistal myopathy (OPDM) was another differential diagnosis. However, patients with OPDM usually show muscular weakness predominantly in the distal limbs and rimmed vacuoles in muscle biopsy specimens (ref. 1), whereas the patients in this family did not show such findings. Moreover, involvement of the gastrointestinal tract (ref. 2) or the heart (ref. 3) is only infrequently observed in patients with OPDM. Taking together the myopathy of the oculopharyngeal type, diffuse muscular weakness, characteristic brain MRI findings (leukoencephalopathy), and gastrointestinal involvement, the present inventors considered that the clinical presentation in this family constitutes a novel clinical entity and designated the disease OPML.


EXAMPLE 8
(Identification of CGG Repeat Expansions in Patients with NIID by Circularizing DNA Samples)

A genomic fragment containing the CGG repeats of the NBPF19 gene was assembled with an oriC cassette to form a circular DNA. The circular DNA was amplified by the replication-cycle reaction (RCR) (Masayuki Su'etsugu et al., “Exponential propagation of large circular DNA by reconstitution of a chromosome-replication cycle,” Nucleic Acids Research, 2017, Vol. 45, No. 20, 11525-11534). Size differences in the repeat region of the amplified products were analyzed by agarose gel electrophoresis, either directly or after SacI digestion.


Genomic DNA (1 to 10 μg) was extracted from peripheral blood leukocytes (PB) or lymphoblastoid cell lines (LCL). It was fragmented by digestion with EarI and then purified by phenol/chloroform extraction and ethanol precipitation. The genomic fragments (100 ng) were then mixed with 1 ng of the oriC cassette (FIG. 36, SEQ ID NO: 100) in 5 μL of assembly mixture [20 mmol/L Tris-HCl (pH 8.0), 4 mmol/L Dithiothreitol, 20 mmol/L Mg(OAc)2, 50 mmol/L Potassium glutamate, 100 μmol/L ATP, 4 mmol/L Creatine phosphate, 150 mmol/L Tetramethylammonium chloride, 10% Dimethyl sulfoxide, 5% Polyethylene glycol (Mw 8,000), 20 μg/mL Creatine kinase, 1 μmol/L RecA, 80 mU/mL Exo III]. Both ends of the oriC cassette carry 60-bp sequences overlapping the NBPF19-specific locus. The assembly mixture was incubated at 42° C. for 30 min, heated at 65° C. for 2 min, and placed immediately on ice.
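Two pieces of arithmetic help in reading this step; the Python sketch below uses hypothetical function names.

* **Circle size.** The source does not say where the overlaps lie. Assume the 419-bp oriC cassette of SEQ ID NO: 100 already includes its two 60-bp overlaps. Each overlap is then shared with one end of the genomic fragment, so the assembled circle should be about (fragment + 419 - 120) bp.
* **Molecule numbers.** The input amounts can be converted to molecule counts. The sketch assumes roughly 650 g/mol per base pair of double-stranded DNA and about 3.3 pg per haploid human genome. Both are common approximations, not values from the source.

```python
AVOGADRO = 6.02214076e23      # molecules per mol
BP_MASS_G_PER_MOL = 650.0     # assumed average mass of one dsDNA base pair
HAPLOID_GENOME_G = 3.3e-12    # assumed mass of one haploid human genome (3.3 pg)


def circle_length_bp(fragment_bp, cassette_bp=419, overlap_bp=60):
    """Length of the assembled circle when each cassette end shares an
    overlap_bp homology arm with one fragment end (the arms are assumed to be
    counted inside both fragment_bp and cassette_bp)."""
    return fragment_bp + cassette_bp - 2 * overlap_bp


def dsdna_molecules(mass_g, length_bp):
    """Number of double-stranded DNA molecules of a given length in mass_g."""
    return mass_g / (length_bp * BP_MASS_G_PER_MOL) * AVOGADRO


def locus_copies(genomic_mass_g):
    """Copies of one autosomal locus in genomic DNA (one per haploid genome)."""
    return genomic_mass_g / HAPLOID_GENOME_G


cassettes = dsdna_molecules(1e-9, 419)   # 1 ng of oriC cassette
targets = locus_copies(100e-9)           # 100 ng of genomic fragments
print(f"cassette molecules: {cassettes:.3g}")
print(f"target locus copies: {targets:.3g}")
print(f"cassette excess: {cassettes / targets:.3g}-fold")
```

Under these assumptions, 1 ng of cassette is roughly 2.2 × 10^9 molecules. By contrast, 100 ng of genomic DNA carries only about 3 × 10^4 copies of the target locus, so the cassette is in a roughly 7 × 10^4-fold molar excess. A hypothetical 2,000-bp fragment would give a circle of about 2,299 bp.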


The assembly mixture (0.5 μL) was then added to an RCR amplification mixture (total 5 μL). The mixture contained RCR buffer [20 mmol/L Tris-HCl (pH 8.0), 8 mmol/L Dithiothreitol, 150 mmol/L Potassium acetate, 10 mmol/L Mg(OAc)2, 4 mmol/L Creatine phosphate, 1 mmol/L each rNTP, 0.25 mmol/L NAD, 10 mmol/L Ammonium sulfate, 50 ng/μL Yeast tRNA, 0.1 mmol/L each dNTP, 0.5 mg/mL BSA, 20 ng/μL Creatine kinase]. It also contained 400 nmol/L SSB, 40 nmol/L IHF, 40 nmol/L DnaG, 40 nmol/L DnaN, 5 nmol/L Pol III*, 20 nmol/L DnaB-DnaC complex, 100 nmol/L DnaA, 10 nmol/L RNase H, 50 nmol/L Ligase, 50 nmol/L Pol I, 50 nmol/L GyrA-GyrB complex, 5 nmol/L Topo IV, 50 nmol/L Topo III, 50 nmol/L RecQ, and 60 nmol/L Tus. RCR amplification was performed at 30° C. for 16 hr. The reaction was then diluted 5-fold with RCR buffer and incubated at 30° C. for 30 min. One microliter of the incubated sample was subjected to size analysis by 1.5% agarose gel electrophoresis followed by SYBR Green staining, either directly (FIG. 37) or after digestion with SacI (FIG. 38).


The results of size analysis of the amplification products derived from four samples (FIG. 39) are shown in FIG. 37. The DNA bands of the amplified products derived from NIID patients (lanes 3 and 4) were broad and shifted to slower-migrating positions in the gel compared with the DNA bands derived from unaffected persons (lanes 1 and 2).
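The shift of the NIID bands toward slower-migrating positions can be converted into an approximate repeat count. A common approximation is that log10(fragment size) falls roughly linearly with migration distance over a gel's resolving range. A ladder can therefore be fitted and the band size read off the line. Each additional CGG unit then adds 3 bp.

The Python sketch below assumes this semi-log model. The ladder distances, band positions, and non-repeat length of the product (`non_repeat_bp`) are hypothetical inputs, because the source does not report them. For a broad band, applying the sketch to the leading and trailing edges gives a repeat range rather than a single value.

```python
import math


def fit_semilog_ladder(distances_mm, sizes_bp):
    """Least-squares fit of log10(size_bp) = intercept + slope * distance_mm."""
    ys = [math.log10(s) for s in sizes_bp]
    n = len(distances_mm)
    mean_x = sum(distances_mm) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def band_size_bp(distance_mm, intercept, slope):
    """Interpolate a band's size from its migration distance using the fitted line."""
    return 10 ** (intercept + slope * distance_mm)


def estimate_cgg_repeats(product_bp, non_repeat_bp, unit_bp=3):
    """Number of CGG units implied by a product size, given the length of
    everything in the product that is not repeat (a hypothetical input)."""
    return (product_bp - non_repeat_bp) / unit_bp
```

For instance, take ladder bands of 10,000, 1,000, and 100 bp at 10, 20, and 30 mm. A band at 25 mm is then estimated at about 316 bp. Separately, a hypothetical 1,200-bp product containing 900 bp of non-repeat sequence would imply about 100 CGG units.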


Amplification products derived from 37 samples (FIG. 40 and FIG. 41) were digested with SacI, and the results of size analysis are shown in FIG. 38. DNA bands indicating expanded alleles were detected in the products derived from NIID patients (underlined lanes).
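SacI recognizes GAGCTC and cleaves after GAGCT on the top strand (GAGCT^C). Because GAGCTC is palindromic, scanning one strand finds every site. Because the RCR product is circular, n sites yield n linear fragments, including one that spans the arbitrary origin of the written sequence.

The Python sketch below predicts fragment lengths from a known circular sequence. It is an illustration only: the function name is hypothetical, and the source does not report where the SacI sites lie in the amplified circles.

```python
SACI_SITE = "GAGCTC"   # SacI recognition sequence (palindromic)
SACI_CUT_OFFSET = 5    # top-strand cut after GAGCT, i.e. GAGCT^C


def saci_fragments_circular(seq):
    """Predicted linear fragment lengths (bp, largest first) from complete
    SacI digestion of a circular DNA given as its top-strand sequence.

    Returns an empty list when there is no site (the circle stays intact).
    """
    seq = seq.upper()
    n = len(seq)
    site_len = len(SACI_SITE)
    # Append the first few bases so that sites spanning the origin are found.
    wrapped = seq + seq[:site_len - 1]
    cuts = sorted({(i + SACI_CUT_OFFSET) % n
                   for i in range(n)
                   if wrapped[i:i + site_len] == SACI_SITE})
    if not cuts:
        return []
    if len(cuts) == 1:
        return [n]  # a single site merely linearizes the circle
    lengths = [b - a for a, b in zip(cuts, cuts[1:])]
    lengths.append(n - cuts[-1] + cuts[0])  # fragment that spans the origin
    return sorted(lengths, reverse=True)
```

Presumably, only the fragment carrying the CGG repeat changes length between alleles. In a digest like the one in FIG. 38, the expanded allele would then separate as its own slower-migrating band while the other fragments stay constant.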










[Sequence Listing]



SEQUENCE LISTING


<110> The University of Tokyo


<120> METHOD AND KIT FOR DETERMINING NEUROPATHY IN SUBJECT


<130> T0529AMP0020-US


<160> 100


<170> PatentIn version 3.5


<210> 1


<211> 417


<212> DNA


<213> Homo sapiens


<400> 1


gcggcggcgg cggcggcctg cggcggcggc ggcggcgcgg tcggcgggcg gcgggcggcg  60





gcggcggcgt cggcggcggc ggcggctgcg ggcggcggcg gcggtgcggc gcggccggcc 120





gcggcggcgg cggcgggcgg cggcggcggc ggcggcggcc ggcgggcggc ggtcggccgc 180





ggcggcggcg cgggcggcgg cggcggcggc tggcgggcgg cggggcggcg gcggcggcgg 240





cggcggcggc ggcggcgcgc gggcgaagcg gcggcggcgg cggtcggcgg cgcggcggcg 300





gccggcggcg gcggcggcgg gcggcggcgg cggcggctgc gagcggtggc ggcgggcggc 360





ggcggcggcc ggcgcggcgg cggcggcggc tgcggcgggg cggcgggggg ggcggcg    417





<210> 2


<211> 431


<212> DNA


<213> Homo sapiens


<400> 2


ggcggcggcg gcggcggcgg ccggcggcgg cggcggcggc ggcggcggcg ggcggcgggc  60





ggcggcggcg gcggcggcgg cggcggcggc ggcgggcggc ggcggcgggc ggcggcggcg 120





cggcggcggc ggcggcggcg ggcggcggcg gcggcggcgg cggcggcggg cggcgggcgg 180





ccgcggcggc ggcgcgggcg gcggcggcgg cggcggcggg cggcggggcg gcggcggcgg 240





cggcggcggc ggcggcggcg gcggcgcggc gcggcggcgg cggcggcggc ggcgcggcgg 300





cggccggcgg cggcggcggc gggcggcggc ggcggcggct gcggcggcgg cggcgggcgg 360





cggcggcggc cggcggcggc ggcggcggcg gcggcggcgg ggcggcgggg ggggcggcgc 420





gggcggcggc g                                                      431





<210> 3


<211> 426


<212> DNA


<213> Homo sapiens


<400> 3


ggcggcgggc ggcggccggc ggcggcggcg gcggcggcgg cggcggcggc ggcggccggc  60





ggcactggcg gcggcggcgg cggcggcggc gcggcgtgcg gcgtcggcgg cggcggcggg 120





cggcggcggc ggcggcggcg gcggcggtcg gcggcggcgg cgggcggcgg cgcgcggcgg 180





cgcggcggcg gcggcggcgg cggcgggctg gcgcggcggc gcggatgcgg cggcggcggc 240





ggcggcggcg ggcggcggcg gcggcggcgg cggggcggcg gcgcggcgcg gcgggcggcg 300





gcggcggcgg cgcggctgcg gcggcggcgg ctgcggcggc tgcggcggcg gcggcggcgt 360





ctgcggcggc ggcggcggcg gcggtggcgg cggcgcggcg gctgcggcgg cgcggcggcg 420





gcggcg                                                            426





<210> 4


<211> 433


<212> DNA


<213> Homo sapiens


<400> 4


gcggcggcgg cgcggcggcc ggcggcggcg gcggcggcgg cggcggcggc ggcggcggcc  60





ggcggcggcg cggcggcggc ggcggcggcg gcggcgcggc ggcggcggcg gcggcggcgg 120





cgggcggcgg cggcggcggc ggcggcggcg gcggcggcgg cggcgggcgg cggcgcgcgg 180





cggcgcggcg gcggcggcgg cggcggcggg cggcggcggc ggcggcggcg gcggcggcgg 240





cggcggcggc ggcgggcggc ggcggcggcg gcggcggggc ggcggcggcg gcggcggcgg 300





gcggcggcgg cggcggcgcg gcggcggcgg cggcggcggc ggcggcggcg gcggcggcgg 360





cggcggcggc ggcggcggcg gcggcggcgg cggcggcggc gcggcggcgg cggcggcggc 420





ggcggcggcg gcg                                                    433





<210> 5


<211> 428


<212> DNA


<213> Homo sapiens


<400> 5


cggcggcggc ggctgtgcgg aggcggcggg cggcggcggg gcggcggcgc ggcggcggcg  60





gcggcggcgg cgtcggcggc ggcggcggcg gcgcgccggc ggcggcgcgg cggcggcggg 120





cgggcggcgg cggcggcggc ggcggcggcg gcggcggcgg cgggcggcgg cggcggcggc 180





ggcgcggcgg cggcgcggcg gcggcaggcg gcggcggcgg aggcggcttt ggcttcggcg 240





gcatggcggc ggcggcggcg gcggatggcg gcggcggcgg cggcggcggc ggcggcggcg 300





gcggcgcggc ggaggcggcg gcggggcgcg gcggcggcgg ctgccggcgg cggcgggcgg 360





cggggcggcg gcggtccggc cggcggcggc agagcggcgg caggcggcgg ccggcggcgg 420





cggcggcg                                                          428





<210> 6


<211> 436


<212> DNA


<213> Homo sapiens


<400> 6


ggcggcggcg gcggtgcggc ggcggcgggc ggcggcggcg gcggcggcgg cggcggcggc  60





ggcggcggcg gcggcggcgg cggcggcggc ggcgcgccgg cggcggcggc ggcggcggcg 120





ggcgggcggc ggcggcggcg gcggcggcgg cggcggcggc ggcgggcggc ggcggcggcg 180





gcggcgcggc ggcggcggcg gcggcggcgg cggcggcggc ggcggcggcg gcggcggcgg 240





cggcgtggcg gcggcggcgg cggcggcggg cggcggcggc ggcggcggcg gcggcggcgg 300





cggcggcggc ggcggaggcg gcggcggggc ggcggcggcg gcggctgccg gcggcggcgg 360





gcggcggcgg cggcggcggc gcggcggcgg cggcggcggc ggcggcggcg ggcggcggcc 420





ggcggcggcg gcggcg                                                 436





<210> 7


<211> 460


<212> DNA


<213> Homo sapiens


<400> 7


ggcggcggcg gcggcggcgg cggcggcgcg gcgcgcggcg gcggcggcgg cggcggcggc  60





tgcggtcgcg gcggcggcgc ggcgcggcgg cgtcggccgg cggcggcggc ggcggcgggc 120





ggcggcggcg gcgggcggcg gcggcggcgg gcggcggcgg cggctgcgcg gcggcggcgg 180





cggcggcgcg gcgcggcgcg gcggcggcgg cgggcggcgg cggcggccgg cggcggcggc 240





ggcggggcgg cggcggcgcg ggggggcggc ggcggcggcg gcggcggcgg cggcgcggcg 300





gcggcggcgg cggcggcggg cggcggcggc ggcgcggcgg cgggccggcg gcggcggcgg 360





cggcggcggc gcggcggcgg cggcggcggc ggccggcggc ggcggcggcg gcggcggcgg 420





cggcgggggg ggcgggcggg gaggcgcggg gcggcggcgg                       460





<210> 8


<211> 455


<212> DNA


<213> Homo sapiens


<400> 8


ggcggcggcg gcggcggcgg cggcggcgcg gcggcggcgg cggcggcggc ggcggcggcg  60





gcggcgcggc ggcggcgcgg cgcggcggcg gcggccggcg gcggcggcgg cggcgggcgg 120





cggcggcggc gggcggcggc ggcggcgggc ggcggcggcg gcggcgcggc ggcggcggcg 180





gcggcgcggc gcggcgcggc ggcggcggcg gcggcggcgg cggccggcgg cggcggcggc 240





ggggcggcgg cggcggcggc ggcggcggcg gcggcggcgg cggcggcggc gcggcggcgg 300





cggcggcggc ggcgggcggc ggcggcggcg cggcggcggg cggcggcggc ggcggcggcg 360





gcggcgcggc ggcggcggcg gcggcggccg gcggcggcgg cggcggcggc ggcggcggcg 420





gggggggcgg gcggcggagg cgcggggcgg cggcg                            455





<210> 9


<211> 479


<212> DNA


<213> Homo sapiens


<400> 9


cggcggcgcg gcggcggcgg cggcgcggcg gcggcggcgg cggcggcggc ggccgtgcgg  60





cggcggctgc ggcggcggcg gcggcggcgg cggggaccgg cggcggcggc gggcggcggc 120





ggcggcggcg gcggggcggc ggcggcggcg gcggcgggcg gcggcggcgg cggcggcgcg 180





gccggcggcg ggcgcggcgg cggcggcggc tttggcggcg gcggcgggga ggcggcggcg 240





gcggcggcgg cggcggcggc ggcggcggcg tgcggcgggc ggcggcgggg cggcgggcgg 300





cggctggcgg cggcggggcg gcggcggcgg ccggcggagc ggcccggcgc gcggcggcgg 360





cggcggcggc ggcggcgggc ggcggcggcg gcggcggcgg cggcaggcgg cggcggcggc 420





ggcggcggcg gccggcgggg cggcgaggcg gcggcgcggc ggcgtggcgg ccggcggcg  479





<210> 10


<211> 461


<212> DNA


<213> Homo sapiens


<400> 10


ggcggcggcg gcggcggcgg cggcggcgcg gcggcggcgg cggcggcggc ggcggccggc  60





ggcggcggcg gcggcggcgg cggcggcggc ggcggggcgg cggcggcggc gggcggcggc 120





ggcggcggcg gcgggcggcg gcggcggcgg cggcggcggc ggcggcggcg gcggcgcggc 180





ggcggcgggc gcggcggcgg cggcggcggc ggcggcggcg gcgaggcggc ggcggcggcg 240





gcggcggcgg cggcggcggc ggcggcggcg gcggcggcgg ggcggcggcg gcggcggcgg 300





cggcggcggc ggcggcggcg gcggcggagc ggcggcggcg gcggcggcgg cggcggcggc 360





ggcgggcggc ggcggcggcg gcggcggcgg cggcggcggc ggcggcggcg gcggcggcgg 420





cggggcggcg ggcggcggcg cggcggcggg cggcggcggc g                     461





<210> 11


<211> 50


<212> DNA


<213> Homo sapiens


<400> 11


gtggtgactc ctctatcggg acgcccctcc cattgtatct ggcccaggct             50





<210> 12


<211> 50


<212> DNA


<213> Homo sapiens


<400> 12


gggagagtgg ggctcctcta tcgggacccc ctccccattg gatctgccca             50





<210> 13


<211> 50


<212> DNA


<213> Homo sapiens


<400> 13


gggctcctct atcggacccc cttcgcccat tcgtggatct gcccatcgcg             50





<210> 14


<211> 50


<212> DNA


<213> Homo sapiens


<400> 14


agagtggggc tcctctatcg ggaccccctc cccatgtgga tctgcccatc             50





<210> 15


<211> 50


<212> DNA


<213> Homo sapiens


<400> 15


agagagagtg gggatactct aatcgggacc ccctccccat gggatctgcc             50





<210> 16


<211> 50


<212> DNA


<213> Homo sapiens


<400> 16


agagagtggg gctcctctat cgggaccccc tccccatgtg gatctgccca             50





<210> 17


<211> 50


<212> DNA


<213> Homo sapiens


<400> 17


gagagggggc tcctctatcg tgaccccctc cccatgttgt ctgcccccca             50





<210> 18


<211> 50


<212> DNA


<213> Homo sapiens


<400> 18


agagagtggg gctcctctat cgggaccccc tccccatgtg gatctgccca             50





<210> 19


<211> 50


<212> DNA


<213> Homo sapiens


<400> 19


agagtggggc tcctatatcg ggaccccctc cccatgtgat ctgcccaggt             50





<210> 20


<211> 50


<212> DNA


<213> Homo sapiens


<400> 20


agagagtggg gctcctctat cgggaccccc tccccatgtg gatctgccca             50





<210> 21


<211> 50


<212> DNA


<213> Homo sapiens


<400> 21


cgggcgcggc gaaccgagaa tatgcccgcc ctgcgcagct ctgactgctg             50





<210> 22


<211> 50


<212> DNA


<213> Homo sapiens


<400> 22


aaccgagaag atgcccgccc tgcgccgctc tctgctgtgg gcgctgctgg             50





<210> 23


<211> 50


<212> DNA


<213> Homo sapiens


<400> 23


accggagatg gccccgccct gcgccgtgct ctgctgtggg ggctgctggc             50





<210> 24


<211> 50


<212> DNA


<213> Homo sapiens


<400> 24


accgagaaga tgcccgccct gcgccgctct gctgtgggcg ctgctggcgc             50





<210> 25


<211> 50


<212> DNA


<213> Homo sapiens


<400> 25


accgagaaga tgccacgcca tgcgccgctc tgatgtgggc gatgctggcg             50





<210> 26


<211> 50


<212> DNA


<213> Homo sapiens


<400> 26


accgagaaga tgcccgccct gcgccgctct gctgtgggcg ctgctggcgc             50





<210> 27


<211> 50


<212> DNA


<213> Homo sapiens


<400> 27


accgagaaga atgcccgccc tggcccgctc tgctgtgggc gctgcttctg             50





<210> 28


<211> 50


<212> DNA


<213> Homo sapiens


<400> 28


accgagaaga tgcccgccct gcgccgctct gctgtgggcg ctgctggcgc             50





<210> 29


<211> 50


<212> DNA


<213> Homo sapiens


<400> 29


accgagaaga tttgcccgcc ctgcgacgct ctgatgtggg ctgggctggc             50





<210> 30


<211> 50


<212> DNA


<213> Homo sapiens


<400> 30


accgagaaga tgcccgccct gcgccgctct gctgtgggct gctgctggcg             50





<210> 31


<211> 20


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 31


agcgcccaca gcagagcggc                                              20





<210> 32


<211> 38


<212> DNA


<213> Artificial Sequence


<220>




<223> Primer


<400> 32


ccgggagctg catgtgtcag aggcggcggc ggcggcgg                          38





<210> 33


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 33


ccgggagctg catgtgtcag agg                                          23





<210> 34


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 34


cgctagaagg agtgtggtcc acc                                          23





<210> 35


<211> 38


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 35


ccgggagctg catgtgtcag aggcggcggc ggcggcgg                          38





<210> 36


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 36


ccgggagctg catgtgtcag agg                                          23





<210> 37


<211> 27


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 37


ggagggagga gaagctggag gtagacg                                      27





<210> 38


<211> 38


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 38


ccgggagctg catgtgtcag aggcggcggc ggcggcgg                          38





<210> 39


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 39


ccgggagctg catgtgtcag agg                                          23





<210> 40


<211> 28


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 40


tcaggcgctc agctccgttt cggtttca                                     28





<210> 41


<211> 38


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 41


ccgggagctg catgtgtcag aggccgccgc cgccgccg                          38





<210> 42


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 42


ccgggagctg catgtgtcag agg                                          23





<210> 43


<211> 20


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 43


gtgtgctgct cgcgtctttg                                              20





<210> 44


<211> 20


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 44


ctacaattct ctaaagcagg                                              20





<210> 45


<211> 20


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 45


gtgtgctgct cgcgtctttg                                              20





<210> 46


<211> 20


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 46


gtgtgggtgg gatggggaag                                              20





<210> 47


<211> 20


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 47


tattaaacgg atgacactcc                                              20





<210> 48


<211> 20


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 48


ctggtccact tctgaaattc                                              20





<210> 49


<211> 20


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 49


gaatttcaga agtggaccag                                              20





<210> 50


<211> 20


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 50


ctacaattct ctaaagcagg                                              20





<210> 51


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 51


ttggagtgtg cagagggata agg                                          23





<210> 52


<211> 19


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 52


cgcaggccag cttctctcg                                               19





<210> 53


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 53


ttggagtgtg cagagggata agg                                          23





<210> 54


<211> 21


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 54


tggattccac ccccgcggct c                                            21





<210> 55


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 55


ggagtcagga cagatgtgta cac                                          23





<210> 56


<211> 21


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 56


gtggttatgg cctgtcgctg g                                            21





<210> 57


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 57


ggagtcagga cagatgtgta cac                                          23





<210> 58


<211> 25


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 58


gatgcttgac tgtgagaaag cagag                                        25





<210> 59


<211> 25


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 59


ctctgctttc tcacagtcaa gcatc                                        25





<210> 60


<211> 21


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 60


gtggttatgg cctgtcgctg g                                            21





<210> 61


<211> 20


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 61


tactcaccat gcgcgggggt                                              20





<210> 62


<211> 38


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 62


ccgggagctg catgtgtcag agggcctgtg cttcggac                          38





<210> 63


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 63


ccgggagctg catgtgtcag agg                                          23





<210> 64


<211> 25


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 64


cgcagcccga gtttcccacc tttta                                        25





<210> 65


<211> 48


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 65


ccgggagctg catgtgtcag aggctcgcta gaaggagtgt ggtccacc               48





<210> 66


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 66


ccgggagctg catgtgtcag agg                                          23





<210> 67


<211> 22


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 67


gccaccctct cgtctcgcgc tg                                           22





<210> 68


<211> 43


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 68


ccgggagctg catgtgtcag aggcgaggaa aagcaagagc aac                    43





<210> 69


<211> 23


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 69


ccgggagctg catgtgtcag agg                                          23





<210> 70


<211> 19


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 70


ttgcgcctgt gcttcggac                                               19





<210> 71


<211> 20


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 71


tactcaccat gcgcgggggt                                              20





<210> 72


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 72


atgctgatga cgcgct                                                  16





<210> 73


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 73


gacagcatct gcgctc                                                  16





<210> 74


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 74


agcgtctgac gtgagt                                                  16





<210> 75


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 75


tcgatatacg acgtgc                                                  16





<210> 76


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 76


tcgtcatacg ctctag                                                  16





<210> 77


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 77


cgactacgta cagtag                                                  16





<210> 78


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 78


gcgtagacag actaca                                                  16





<210> 79


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 79


acagtatgat gtactc                                                  16





<210> 80


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 80


gtctgataga tacaga                                                  16





<210> 81


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 81


ctgcgcagta cgtgca                                                  16





<210> 82


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 82


gtacatatgc gtctgt                                                  16





<210> 83


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 83


gagactagag atagtg                                                  16





<210> 84


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 84


tacgcgtgta cgcaga                                                  16





<210> 85


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 85


tgtcactcat ctgagt                                                  16





<210> 86


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 86


gcacatacac gctcac                                                  16





<210> 87


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 87


gctcgtcgcg cgcaca                                                  16





<210> 88


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 88


acagtgcgct gtctat                                                  16





<210> 89


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 89


tcacactcta gagcga                                                  16





<210> 90


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 90


tcacatatgt atacat                                                  16





<210> 91


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 91


cgctgcgaga gacagt                                                  16





<210> 92


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 92


cgctgcgaga gacagt                                                  16





<210> 93


<211> 16


<212> DNA


<213> Artificial Sequence


<220>


<223> Barcode


<400> 93


gcagactctc acacgc                                                  16





<210> 94


<211> 22


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 94


accgagaaga tgcccgccct gc                                           22





<210> 95


<211> 22


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 95


cgcgcctcgg aaagaataac ag                                           22





<210> 96


<211> 22


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 96


accgagaaga tgcccgccct gc                                           22





<210> 97


<211> 21


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 97


aactgcccac ctccctgcac c                                            21







<210> 98


<211> 21


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 98


cggcagcaag tctcagaaac t                                            21





<210> 99


<211> 22


<212> DNA


<213> Artificial Sequence


<220>


<223> Primer


<400> 99


cgcgcctcgg aaagaataac ag                                           22





<210> 100


<211> 419


<212> DNA


<213> Artificial Sequence


<220>


<223> oriC cassette


<400> 100


tggtctatca ttaacttgtt ttggtaaata acttaatctt catgatatgt agtctcttca  60





agtatgttgt aactaaagat ctactgtgga taactctgtc aggaagcttg gatcaaccgg 120





tagttatcca aagaacaact gttgttcagt ttttgagttg tgtataaccc ctcattctga 180





tcccagctta tacggtccag gatcaccgat cattcacagt taatgatcct ttccaggttg 240





ttgatcttaa aagccggatc cttgttatcc acagggcagt gcgatcctaa taagagatca 300





caatagaaca gatctctaaa taaatagatc ttctttttaa tactttagtt acaacatact 360





caacttctcc aacacatcaa gacattcttc cagtcttgca ttgctcccca gtttatata  419






CITATION LIST
Patent Literature

[PL 1]


US 2017/321263 A1


[PL 2]


US 2019/276883 A1


[PL 3]


US 2020/0115727 A1


[PL 4]


EP 3650543 A1


Non Patent Literature

[NPL 1]


Loureiro, J. R., Oliveira, C. L. & Silveira, I., “Unstable repeat expansions in neurodegenerative diseases: nucleocytoplasmic transport emerges on the scene,” Neurobiol. Aging 39, 174-183 (2016).


[NPL 2]


Vissers, L. E., et al., “A de novo paradigm for mental retardation,” Nat. Genet. 42, 1109-1112 (2010).


[NPL 3]


Lindenberg, R., Rubinstein, L. J., Herman, M. M. & Haydon, G. B., “A light and electron microscopy study of an unusual widespread nuclear inclusion body disease. A possible residuum of an old herpesvirus infection,” Acta Neuropathol. 10, 54-73 (1968).


[NPL 4]


Haltia, M., Somer, H., Palo, J. & Johnson, W. G., “Neuronal intranuclear inclusion disease in identical twins,” Ann. Neurol. 15, 316-321 (1984).


[NPL 5]


Sone, J. et al., “Clinicopathological features of adult-onset neuronal intranuclear inclusion disease,” Brain 139, 3170-3186 (2016).


[NPL 6]


Takahashi-Fujigasaki, J., Nakano, Y., Uchino, A. & Murayama, S., “Adult-onset neuronal intranuclear hyaline inclusion disease is not rare in older adults,” Geriatr. Gerontol. Int. 16 Suppl 1, 51-56 (2016).


[NPL 7]


Kimber, T. E. et al., “Familial neuronal intranuclear inclusion disease with ubiquitin positive inclusions,” J. Neurol. Sci. 160, 33-40 (1998).


[NPL 8]


Sone, J. et al., “Neuronal intranuclear hyaline inclusion disease showing motor-sensory and autonomic neuropathy,” Neurology 65, 1538-1543 (2005).


[NPL 9]


Yamaguchi, N. et al., “An autopsy case of familial neuronal intranuclear inclusion disease with dementia and neuropathy,” Intern. Med. in press (doi: 10.2169/internalmedicine.1141-18).


[NPL 10]


Sone, J. et al., “Neuronal intranuclear inclusion disease cases with leukoencephalopathy diagnosed via skin biopsy,” J. Neurol. Neurosurg. Psychiatry 85, 354-356 (2014).


[NPL 11]


Sone, J. et al., “Skin biopsy is useful for the antemortem diagnosis of neuronal intranuclear inclusion disease,” Neurology 76, 1372-1376 (2011).


[NPL 12]


Nakano, Y. et al., “PML nuclear bodies are altered in adult-onset neuronal intranuclear hyaline inclusion disease,” J. Neuropathol. Exp. Neurol. 76, 585-594 (2017).


[NPL 13]


Takumida, H. et al., “Case of a 78-year-old woman with a neuronal intranuclear inclusion disease,” Geriatr. Gerontol. Int. 17, 2623-2625 (2017).


[NPL 14]


Sugiyama, A. et al., “MR imaging features of the cerebellum in adult-onset neuronal intranuclear inclusion disease: 8 cases,” Am. J. Neuroradiol. 38, 2100-2104 (2017).


[NPL 15]


Hunsaker, M. R. et al., “Widespread non-central nervous system organ pathology in fragile X premutation carriers with fragile X-associated tremor/ataxia syndrome and CGG knock-in mice,” Acta Neuropathol. 122, 467-479 (2011).


[NPL 16]


Hagerman, R. J. et al., “Intention tremor, parkinsonism, and generalized brain atrophy in male carriers of fragile X,” Neurology 57, 299-301 (2001).


[NPL 17]


Doi, K. et al., “Rapid detection of expanded short tandem repeats in personal genomics using hybrid sequencing,” Bioinformatics 30, 815-822 (2014).


[NPL 18]


Ishiura, H. et al., “Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy,” Nat. Genet. 50, 581-590 (2018).


[NPL 19]


Vandepoele, K., Van Roy, N., Staes, K., Speleman, F. & van Roy, F., “A novel gene family NBPF: intricate structure generated by gene duplication during primate evolution,” Mol. Biol. Evol. 22, 2265-2275 (2005).


[NPL 20]


Fiddes, I. T. et al., “Human-specific NOTCH2NL genes affect Notch signaling and cortical neurogenesis,” Cell 173, 1356-1369 (2018).


[NPL 21]


Suzuki, I. K. et al., “Human-specific NOTCH2NL genes expand cortical neurogenesis through Delta/Notch regulation,” Cell 173, 1370-1384 (2018).


[NPL 22]


Li, H., “Minimap2: pairwise alignment for nucleotide sequences,” Bioinformatics in press (doi: 10.1093/bioinformatics/bty191).


[NPL 23]


Koren, S. et al., “Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation,” Genome Res. 27, 722-736 (2017).


[NPL 24]


Flusberg, B. A., et al., “Direct detection of DNA methylation during single-molecule, real-time sequencing,” Nat. Methods 7, 461-465 (2010).


[NPL 25]


Suzuki, Y., et al., “Agin: measuring the landscape of CpG methylation of individual repetitive elements,” Bioinformatics 32, 2911-2919 (2016).


[NPL 26]


Schuffler, M. D., Bird, T. D., Sumi, S. M. & Cook, A., “A familial neuronal disease presenting as intestinal pseudoobstruction,” Gastroenterology 75, 889-898 (1978).


[NPL 27]


Satoyoshi, E. & Kinoshita, M., “Oculopharyngodistal myopathy,” Arch. Neurol. 34, 89-92 (1977).


[NPL 28]


Durmus, H. et al., “Oculopharyngodistal myopathy is a distinct entity: clinical and genetic features of 47 patients,” Neurology 76, 227-235 (2011).


[NPL 29]


Zhao, J. et al., “Clinical and muscle imaging findings in 14 mainland Chinese patients with oculopharyngodistal myopathy,” PLoS One 10, e0128629 (2015).


[NPL 30]


Satoyoshi, E., “Distal myopathy,” Tohoku J. Exp. Med. 161 Suppl, 1-19 (1990).


[NPL 31]


Brais, B. et al., “Short GCG expansions in the PABP2 gene cause oculopharyngeal muscular dystrophy,” Nat. Genet. 18, 164-167 (1998).


[NPL 32]


Seltzer, M. M., et al., “Prevalence of CGG expansions of the FMR1 gene in a US population-based sample,” Am. J. Med. Genet. B Neuropsychiatr. Genet. 159B, 589-597 (2012).


[NPL 33]


Beck, J. et al., “Large C9orf72 hexanucleotide repeat expansions are seen in multiple neurodegenerative syndromes and are more frequent than expected in the UK population,” Am. J. Hum. Genet. 92, 345-353 (2013).


[NPL 34]


Renton, A. E. et al., “A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD,” Neuron 72, 257-268 (2011).


[NPL 35]


Jacquemont, S. et al., “Penetrance of the fragile X-associated tremor/ataxia syndrome in a premutation carrier population,” JAMA 291, 460-469 (2004).


[NPL 36]


Coffey, S. M. et al., “Expanded clinical phenotype of women with the FMR1 premutation,” Am. J. Med. Genet. A 146A, 1009-1016 (2008).


[NPL 37]


DeJesus-Hernandez, M. et al., “Expanded GGGGCC hexanucleotide repeat in noncoding region of C9ORF72 causes chromosome 9p-linked FTD and ALS,” Neuron 72, 245-256 (2011).


[NPL 38]


Fratta, P. et al., “Screening a UK amyotrophic lateral sclerosis cohort provides evidence of multiple origins of the C9orf72 expansion,” Neurobiol. Aging 36, e1-7 (2015).


[NPL 39]


Buxton, J. et al., “Detection of an unstable fragment of DNA specific to individuals with myotonic dystrophy,” Nature 355, 547-548 (1992).


[NPL 40]


Zu, T. et al., “Non-ATG-initiated translation directed by microsatellite expansions,” Proc. Natl. Acad. Sci. U. S. A. 108, 260-265 (2011).


[NPL 41]


Todd, P. K. et al., “CGG repeat-associated translation mediates neurodegeneration in fragile X tremor ataxia syndrome,” Neuron 78, 440-455 (2013).


[NPL 42]


Uyama, E., Uchino, M., Chateau, D., & Tome, F. M., “Autosomal recessive oculopharyngodistal myopathy in light of distal myopathy with rimmed vacuoles and oculopharyngeal muscular dystrophy,” Neuromuscul. Disord. 8, 119-125 (1998).


[NPL 43]


Jin, P. et al., “Pur alpha binds to rCGG repeats and modulates repeat-mediated neurodegeneration in a Drosophila model of fragile X tremor/ataxia syndrome,” Neuron 55, 556-564 (2007).


[NPL 44]


Sofola, O. A. et al., “RNA-binding proteins hnRNP A2/B1 and CUGBP1 suppress fragile X CGG premutation repeat-induced neurodegeneration in a Drosophila model of FXTAS,” Neuron 55, 565-571 (2007).


[NPL 45]


Bahlo, M. et al., “Recent advances in the detection of repeat expansions with short-read next-generation sequencing,” F1000Res. 7 (F1000 Faculty Rev), 736 (2018).


[NPL 46]


Mitsuhashi, S. et al., “Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads,” Genome Biol. 20, 58 (2019).


[NPL 47]


Sznajder, L. J. et al., “Intron retention induced by microsatellite expansions as a disease biomarker,” Proc. Natl. Acad. Sci. U. S. A. 115, 4234-4239 (2018).


[NPL 48]


Fukuda, Y. et al., “SNP HiTLink: a high-throughput linkage analysis system employing dense SNP data,” BMC Bioinformatics 10, 121 (2009).


[NPL 49]


Gudbjartsson, D. F., Thorvaldsson, T., Kong, A., Gunnarsson, G. & Ingolfsdottir, A., “Allegro version 2,” Nat. Genet. 37, 1015-1016 (2005).


[NPL 50]


Kent, W. J., “BLAT - the BLAST-like alignment tool,” Genome Res. 12, 656-664 (2002).


[NPL 51]


Larkin, M. A., et al., “Clustal W and Clustal X version 2.0,” Bioinformatics 23, 2947-2948 (2007).


[NPL 52]


Vaser, R., Sovic, I., Nagarajan, N., and Sikic, M., “Fast and accurate de novo genome assembly from long uncorrected reads,” Genome Res. 27, 737-746 (2017).


[NPL 53]


Benson, G., “Tandem repeats finder: a program to analyze DNA sequences,” Nucleic Acids Res. 27, 573-580 (1999).


[NPL 54]


Frey, U. H., Bachmann, H. S., Peters, J., & Siffert, W., “PCR-amplification of GC-rich regions: ‘slowdown PCR’,” Nat. Protoc. 3, 1312-1317 (2008).


[NPL 55]


Su, J., et al., “CpG_MPs: identification of CpG methylation patterns of genomic regions from high-throughput bisulfite sequencing data,” Nucleic Acids Res. 41, e4 (2013).


[NPL 56]


Dobin, A. et al., “STAR: ultrafast universal RNA-seq aligner,” Bioinformatics 29, 15-21 (2013).


[NPL 57]


Li, H., et al., “The Sequence Alignment/Map format and SAMtools,” Bioinformatics 25, 2078-2079 (2009).


[NPL 58]


Robinson, J. T. et al., “Integrative Genomics Viewer,” Nat. Biotechnol. 29, 24-26 (2011).


[NPL 59]


Miyazawa, H., et al., “Homozygosity haplotype allows a genomewide search for the autosomal segments shared among patients,” Am. J. Hum. Genet. 80, 1090-1102 (2007).


[NPL 60]


Satoyoshi, E. & Kinoshita, M., “Oculopharyngodistal myopathy,” Arch. Neurol. 34, 89-92 (1977).


[NPL 61]


Amato, A. A., Jackson, C. E., Ridings, L. W. & Barohn, R. J., “Childhood-onset oculopharyngodistal myopathy with chronic intestinal pseudo-obstruction,” Muscle Nerve 18, 842-847 (1995).


[NPL 62]


Thevathasan, W., et al., “Oculopharyngodistal myopathy - a possible association with cardiomyopathy,” Neuromuscul. Disord. 21, 121-125 (2011).


[NPL 63]


Su'etsugu, M. et al., “Exponential propagation of large circular DNA by reconstitution of a chromosome-replication cycle,” Nucleic Acids Res. 45, 11525-11534 (2017).


[NPL 64]


Hasebe, T. et al., “Efficient Arrangement of the Replication Fork Trap for In Vitro Propagation of Monomeric Circular DNA in the Chromosome-Replication Cycle Reaction,” Life 8, 43 (2018) (doi: 10.3390/life8040043).


SUMMARY OF INVENTION
Technical Problem

The aim of the present invention is to provide a new method for determining a neuromuscular disease in a subject.

Claims
  • 1. A method for determining a neuromuscular disease accompanied by a repeat expansion of CGG in a nucleic acid in a subject, comprising: obtaining a nucleic acid fragment having a repeat expansion of CGG or a complementary sequence thereof from a nucleic acid sample from the subject, circularizing the nucleic acid fragment with an origin of chromosome (oriC) cassette to form a circular nucleic acid, amplifying the circular nucleic acid to produce a plurality of circular nucleic acids, and detecting the repeat expansion of CGG or the complementary sequence thereof.
  • 2. The method of claim 1 further comprising digesting the amplified circular nucleic acids to obtain amplified nucleic acid fragments, wherein each of the amplified nucleic acid fragments has the repeat expansion of CGG or the complementary sequence thereof.
  • 3. The method of claim 1, wherein a 5′ region of the oriC cassette is complementary to a 5′ region of the nucleic acid fragment and a 3′ region of the oriC cassette is complementary to a 3′ region of the nucleic acid fragment.
  • 4. The method of claim 1, wherein a 5′ region of the oriC cassette is complementary to a 3′ region of the nucleic acid fragment and a 3′ region of the oriC cassette is complementary to a 5′ region of the nucleic acid fragment.
  • 5. The method of claim 1, wherein the repeat expansion of CGG or the complementary sequence thereof is located between a 5′ region and a 3′ region of the nucleic acid fragment.
  • 6. The method of claim 1, wherein a 5′ region and a 3′ region of the nucleic acid fragment are loci specific to the neuromuscular disease.
  • 7. The method of claim 1, wherein the nucleic acid fragment is obtained by using a restriction enzyme or a gene editing protein.
  • 8. The method of claim 1, wherein the neuromuscular disease is selected from the group consisting of neuronal intranuclear inclusion disease, oculopharyngodistal myopathy, and oculopharyngeal myopathy with leukoencephalopathy.
  • 9. The method of claim 1, wherein the nucleic acid sample is chromosomal DNA.
  • 10. The method of claim 1, wherein the repeat expansion of CGG is in a gene from the subject.
  • 11. The method of claim 10, wherein the neuromuscular disease is neuronal intranuclear inclusion disease, and wherein the repeat expansion of CGG is in the NBPF19/NOTCH2NLC gene.
  • 12. The method of claim 11, wherein the repeat expansion is greater than 80 repeats.
  • 13. The method of claim 10, wherein the neuromuscular disease is oculopharyngodistal myopathy, and wherein the repeat expansion of CGG is in the 5′ untranslated region of the LRP12 gene.
  • 14. The method of claim 13, wherein the repeat expansion is greater than 77 repeats.
  • 15. The method of claim 10, wherein the neuromuscular disease is oculopharyngeal myopathy with leukoencephalopathy, and wherein the repeat expansion of CGG is in the LOC642361/NUTM2B-AS1 gene.
  • 16. The method of claim 15, wherein the repeat expansion is greater than the range in healthy individuals, and wherein the range in healthy individuals is 6 to 14 repeat units.
  • 17. A kit for determining a neuromuscular disease accompanied by a repeat expansion of CGG in a nucleic acid in a subject, comprising: a fragmentation reagent configured to obtain a nucleic acid fragment having a repeat expansion of CGG or a complementary sequence thereof from a nucleic acid sample from the subject, a circularizing reagent configured to circularize the nucleic acid fragment with an origin of chromosome (oriC) cassette to form a circular nucleic acid, and an amplifying reagent configured to amplify the circular nucleic acid to produce a plurality of circular nucleic acids.
  • 18. The kit of claim 17 further comprising a digesting reagent to digest the amplified circular nucleic acids to obtain amplified nucleic acid fragments, wherein each of the amplified nucleic acid fragments has the repeat expansion of CGG or the complementary sequence thereof.
  • 19. The kit of claim 17, wherein a 5′ region of the oriC cassette is complementary to a 5′ region of the nucleic acid fragment and a 3′ region of the oriC cassette is complementary to a 3′ region of the nucleic acid fragment.
  • 20. The kit of claim 17, wherein a 5′ region of the oriC cassette is complementary to a 3′ region of the nucleic acid fragment and a 3′ region of the oriC cassette is complementary to a 5′ region of the nucleic acid fragment.
  • 21. The kit of claim 17, wherein the repeat expansion of CGG or the complementary sequence thereof is located between a 5′ region and a 3′ region of the nucleic acid fragment.
  • 22. The kit of claim 17, wherein a 5′ region and a 3′ region of the nucleic acid fragment are loci specific to the neuromuscular disease.
  • 23. The kit of claim 17, wherein the fragmentation reagent contains a restriction enzyme or a gene editing protein.
  • 24. The kit of claim 17, wherein the neuromuscular disease is selected from the group consisting of neuronal intranuclear inclusion disease, oculopharyngodistal myopathy, and oculopharyngeal myopathy with leukoencephalopathy.
  • 25. The kit of claim 17, wherein the nucleic acid sample is chromosomal DNA.
  • 26. The kit of claim 17, wherein the repeat expansion of CGG is in a gene from the subject.
  • 27. The kit of claim 26, wherein the neuromuscular disease is neuronal intranuclear inclusion disease, and wherein the repeat expansion of CGG is in the NBPF19/NOTCH2NLC gene.
  • 28. The kit of claim 27, wherein the repeat expansion is greater than 80 repeats.
  • 29. The kit of claim 26, wherein the neuromuscular disease is oculopharyngodistal myopathy, and wherein the repeat expansion of CGG is in the 5′ untranslated region of the LRP12 gene.
  • 30. The kit of claim 29, wherein the repeat expansion is greater than 77 repeats.
  • 31. The kit of claim 26, wherein the neuromuscular disease is oculopharyngeal myopathy with leukoencephalopathy, and wherein the repeat expansion of CGG is in the LOC642361/NUTM2B-AS1 gene.
  • 32. The kit of claim 31, wherein the repeat expansion is greater than the range in healthy individuals, and wherein the range in healthy individuals is 6 to 14 repeat units.
  • 33. A method for detecting a repeat expansion of CGG in a nucleic acid, comprising: obtaining a nucleic acid fragment having a repeat expansion of CGG or a complementary sequence thereof, circularizing the nucleic acid fragment with an origin of chromosome (oriC) cassette to form a circular nucleic acid, amplifying the circular nucleic acid to produce a plurality of circular nucleic acids, and detecting the repeat expansion of CGG or the complementary sequence thereof.
  • 34. The method of claim 33 further comprising digesting the amplified circular nucleic acids to obtain amplified nucleic acid fragments, wherein each of the amplified nucleic acid fragments has the repeat expansion of CGG or the complementary sequence thereof.
  • 35. The method of claim 33, wherein a 5′ region of the oriC cassette is complementary to a 5′ region of the nucleic acid fragment and a 3′ region of the oriC cassette is complementary to a 3′ region of the nucleic acid fragment.
  • 36. The method of claim 33, wherein a 5′ region of the oriC cassette is complementary to a 3′ region of the nucleic acid fragment and a 3′ region of the oriC cassette is complementary to a 5′ region of the nucleic acid fragment.
  • 37. The method of claim 33, wherein the repeat expansion of CGG or the complementary sequence thereof is located between a 5′ region and a 3′ region of the nucleic acid fragment.
  • 38. The method of claim 33, wherein the nucleic acid fragment is obtained by using a restriction enzyme or a gene editing protein.
  • 39. The method of claim 33, wherein the nucleic acid fragment is obtained from chromosomal DNA.
  • 40. The method of claim 33, wherein the repeat expansion of CGG is in a gene.
  • 41. A kit for detecting a repeat expansion of CGG in a nucleic acid, comprising: a fragmentation reagent configured to obtain a nucleic acid fragment having a repeat expansion of CGG or a complementary sequence thereof from a nucleic acid sample, a circularizing reagent configured to circularize the nucleic acid fragment with an origin of chromosome (oriC) cassette to form a circular nucleic acid, and an amplifying reagent configured to amplify the circular nucleic acid to produce a plurality of circular nucleic acids.
  • 42. The kit of claim 41 further comprising a digesting reagent to digest the amplified circular nucleic acids to obtain amplified nucleic acid fragments, wherein each of the amplified nucleic acid fragments has the repeat expansion of CGG or the complementary sequence thereof.
  • 43. The kit of claim 41, wherein a 5′ region of the oriC cassette is complementary to a 5′ region of the nucleic acid fragment and a 3′ region of the oriC cassette is complementary to a 3′ region of the nucleic acid fragment.
  • 44. The kit of claim 41, wherein a 5′ region of the oriC cassette is complementary to a 3′ region of the nucleic acid fragment and a 3′ region of the oriC cassette is complementary to a 5′ region of the nucleic acid fragment.
  • 45. The kit of claim 41, wherein the repeat expansion of CGG or the complementary sequence thereof is located between a 5′ region and a 3′ region of the nucleic acid fragment.
  • 46. The kit of claim 41, wherein the fragmentation reagent contains a restriction enzyme or a gene editing protein.
  • 47. The kit of claim 41, wherein the nucleic acid sample is chromosomal DNA.
  • 48. The kit of claim 41, wherein the repeat expansion of CGG is in a gene.
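Claims 12, 14 and 16 compare the detected CGG repeat count with a locus-specific cut-off: more than 80 units for NBPF19/NOTCH2NLC, more than 77 units for LRP12, and above the healthy range of 6 to 14 units for LOC642361/NUTM2B-AS1. The following is a minimal Python sketch of only that final comparison step, assuming a sequencing read that already spans the repeat (for example, one obtained from the amplified circular DNA) is available as a plain string. The locus labels, the `RULES` table, and the functions `longest_cgg_run` and `classify` are hypothetical names introduced for illustration. The sketch counts only uninterrupted CGG units (or CCG, the reverse complement, for reads from the opposite strand); it is not the inventors' detection procedure and not a diagnostic tool.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LocusRule:
    disease: str
    min_expanded_units: int  # smallest repeat count treated as expanded


# Hypothetical lookup table; cut-offs restate claims 12, 14 and 16.
RULES = {
    "NOTCH2NLC": LocusRule("neuronal intranuclear inclusion disease", 81),  # > 80
    "LRP12": LocusRule("oculopharyngodistal myopathy", 78),  # > 77
    "NUTM2B-AS1": LocusRule(
        "oculopharyngeal myopathy with leukoencephalopathy", 15  # > 14 (healthy 6-14)
    ),
}

# (CGG)n on the read's own strand, or (CCG)n if the read comes from the
# complementary strand, since CCG is the reverse complement of CGG.
_RUN_PATTERNS = (re.compile(r"(?:CGG)+"), re.compile(r"(?:CCG)+"))


def longest_cgg_run(read: str) -> int:
    """Return the unit count of the longest uninterrupted CGG/CCG run."""
    seq = read.upper()
    best = 0
    for pattern in _RUN_PATTERNS:
        for match in pattern.finditer(seq):
            best = max(best, len(match.group()) // 3)
    return best


def classify(locus: str, read: str) -> tuple[int, bool]:
    """Return (repeat units, whether the count meets the locus cut-off)."""
    rule = RULES[locus]
    units = longest_cgg_run(read)
    return units, units >= rule.min_expanded_units


if __name__ == "__main__":
    normal = "AT" + "CGG" * 10 + "TA"
    expanded = "AT" + "CCG" * 120 + "TA"  # read from the opposite strand
    print(classify("NOTCH2NLC", normal))    # (10, False)
    print(classify("NOTCH2NLC", expanded))  # (120, True)
```

Exact pattern matching keeps the sketch short, but a single sequencing error inside the repeat would split a run in two. Tools designed for noisy long reads, such as tandem-genotypes (NPL 46), estimate repeat length from read alignments instead, which this sketch does not attempt.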