The invention is broadly concerned with the determination of genetic factors associated with psychiatric health. More particularly, the present invention is directed to a human gene which is linked to a mood disorder or related disorder in affected individuals and their families. Specifically, the present invention is directed to a gene located on chromosome 18 that is expressed in brain tissue and may be used as a diagnostic marker for bipolar disorder.
Pharmacogenetics Background:
Every individual is a product of the interaction of their genes and the environment. Pharmacogenetics is the study of how genetic differences influence the variability in patients' responses to drugs. Through the use of pharmacogenetics, we will soon be able to profile variations between individuals' DNA to predict responses to a particular medicine. Target validation that will predict a well-tolerated and effective medicine for a clinical indication in humans is a widely perceived problem, but the real challenge is target selection. A limited number of molecular target families, including receptors and enzymes, have been identified for which high-throughput screening is currently possible. A good target is one against which many compounds can be screened rapidly to identify active molecules (hits). These hits can be developed into optimized molecules (leads), which have the properties of well-tolerated and effective medicines. Selecting targets that can be validated for a disease or clinical symptom is a major problem faced by the pharmaceutical industry. The best-validated targets are those that have already produced well-tolerated and effective medicines in humans (precedent targets). Many targets are chosen on the basis of scientific hypotheses and do not lead to effective medicines because the initial hypotheses are often subsequently disproved.
Two broad strategies are being used to identify genes and express their protein products for use as high-throughput targets. These approaches, genomics and genetics, share technologies but represent distinct scientific tactics and investments. Discovery genomics uses the growing number of DNA sequence databases to identify genes and gene families that are tractable or screenable targets but are not known to be genetically related to disease.
The advantage of information on disease-susceptibility genes derived from patients is that, by definition, these genes are relevant to the patients' genetic contributions to the disease. However, most susceptibility genes will not be tractable targets or amenable to high-throughput screening methods to identify active compounds.
The differential metabolism related to the relevant gene variants can be studied using focused functional genomic and proteomic technologies to discover mechanisms of disease development or progression.
Critical enzymes or receptors associated with the altered metabolism can then be used as targets. Gene-to-function-to-target strategies, which focus on the effect of specific susceptibility gene variants on the relevant cellular metabolism, therefore become important.
Data mining of sequences from the Human Genome Project and similar programmes with powerful bioinformatic tools has made it possible to identify gene families by locating domains that possess similar sequences. Genes identified by these genomic strategies generally require some sort of functional validation or relationship to a disease process. Technologies such as differential gene expression, transgenic animal models, proteomics, in situ hybridization and immunohistochemistry are used to imply relationships between a gene and a disease.
The major distinction between the genomic and genetic approaches is target selection: the genetic approach starts from genetically defined genes and variant-specific targets already known to be involved in the disease process. The current vogue of discovery genomics for nonspecific, wholesale gene identification, with each gene in search of a relationship to a disease, creates great opportunities for the development of medicines.
It is also critical to realize that the core problem for drug development is poor target selection. The use of unproven screening technologies to imply disease-related validation, and the huge investment needed to progress each selected gene to proof of concept in humans, rest on an unproven and cavalier use of the word ‘validation’. Each failure is very expensive in lost time and money. For example, differential gene expression (DGE) and proteomics are screening technologies that are widely used for target validation. They detect different levels and/or patterns of gene and protein expression in tissues, which may be used to imply a relationship to a disease affecting that tissue.
Mood Disorder Background:
Mood disorders or related disorders include, but are not limited to, the following disorders as defined in the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) (DSM-IV codes in parentheses): mood disorders (296.XX, 300.4, 311, 301.13, 295.70), schizophrenia and related disorders (295.XX, 297.1, 298.8, 297.3, 298.9), anxiety disorders (300.XX, 309.81, 308.3), adjustment disorders (309.XX) and personality disorders (301.XX).
The present invention is particularly directed to genetic factors associated with a family of mood disorders known as bipolar (BP) spectrum disorders. Bipolar disorder (BP) is a severe psychiatric condition characterized by disturbances in mood, ranging from an extreme state of elation (mania) to a severe state of dysphoria (depression). Two types of bipolar illness have been described: type I BP illness (BPI), characterized by major depressive episodes alternating with phases of mania, and type II BP illness (BPII), characterized by major depressive episodes alternating with phases of hypomania. Relatives of BP probands have an increased risk for BP, unipolar disorder (patients experiencing only depressive episodes; UP), cyclothymia (minor depression and hypomania episodes; CY), and schizoaffective disorders of the manic (SAm) and depressive (SAd) type. Based on these observations, BP, CY, UP and SA are classified as BP spectrum disorders.
The involvement of genetic factors in the etiology of BP spectrum disorders was suggested by family, twin and adoption studies (Tsuang and Faraone (1990), The Genetics of Mood Disorders, Baltimore, The Johns Hopkins University Press). However, the exact pattern of transmission is unknown. In some studies, complex segregation analysis supports the existence of a single major locus for BP (Spence et al. (1995), Am. J. Med. Genet. (Neuropsych. Genet.) QQ pp 370-376). Other researchers propose a liability-threshold model, in which the liability to develop the disorder results from the additive combination of multiple genetic and environmental effects (McGuffin et al. (1994), Affective Disorders, in Seminars in Psychiatric Genetics, Gaskell, London, pp 110-127).
Because of the complex mode of inheritance, parametric and non-parametric linkage strategies are applied in families in which BP disorder appears to be transmitted in a Mendelian fashion. Early linkage findings on chromosomes 11p15 (Egeland et al. (1987), Nature, pp 783-787) and Xq27-q28 (Mendlewicz et al. (1987), The Lancet 1, pp 1230-1232; Baron et al. (1987), Nature 12 & pp 289-292) have been controversial and initially could not be replicated (Kelsoe et al. (1989), Nature, pp 238-243; Baron et al. (1993), Nature Genet., pp 49-55). With the development of a human genetic map saturated with highly polymorphic markers and the continuous development of data analysis techniques, numerous new linkage searches were started. In several studies, evidence or suggestive evidence for linkage to particular regions on chromosomes 4, 12, 18, 21 and X was found (Blackwood et al. (1996), Nature Genetics, pp 427-430; Craddock et al. (1994), Brit. J. Psychiatry, pp 355-358; Berrettini et al. (1994), Proc. Natl. Acad. Sci. USA, pp 5918-5921; Straub et al. (1994), Nature Genetics, pp 291-296; and Pekkarinen et al. (1995), Genome Research 2, pp 105-115). To test the validity of the reported linkage results, these findings have to be replicated in other, independent studies.
Recently, linkage of bipolar disorder to the pericentromeric region of chromosome 18 was reported (Berrettini et al. 1994). In addition, a ring chromosome 18 with breakpoints and deleted regions at 18pter-p11 and 18q23-qter was reported in three unrelated patients with BP illness or related syndromes (Craddock et al. 1994). The chromosome 18p linkage was replicated by Stine et al. (1995), Am. J. Hum. Genet. 22 pp 1384-1394, who also reported suggestive evidence for a locus on 18q21.2-q21.32 in the same study.
Interestingly, Stine et al. observed a parent-of-origin effect: the evidence for linkage was strongest in the paternal pedigrees, in which the proband's father or one of the father's siblings is affected. Several studies described anticipation in families transmitting BP disorder (McInnis et al. 1993; Nylander et al. 1994), suggesting the involvement of trinucleotide repeat expansions (TREs), since a number of diseases caused by expansion of a CAG/CTG, CCG/CGG or GAA/TTC repeat show anticipation (reviewed by Margolis et al. 1999). Previous efforts to find potentially expanded repeats have focused primarily on CAG/CTG repeats, although the search for CCG/CGG repeats is increasing (Kleiderlein et al. 1998; Mangel et al. 1998; Eichhammer et al. 1998; Kaushik et al. 2000). We previously reported a new method for the region-specific isolation of triplet repeats: triplet repeat YAC fragmentation (Del Favero et al. 1999). This proved to be a valid method for the isolation of CAG/CTG repeats, and using it we excluded the involvement of CAG/CTG repeats from within 18q21.33-q23 in bipolar disorder (Goossens et al. 2000). The present invention adapts the method for the region-specific isolation of CCG/CGG repeats and applies it to the chromosome 18q21.33-q23 BP candidate region.
The present invention is directed to a novel gene and protein encoded by that gene.
The novel gene is located in an 8.9 cM chromosomal region between D18S68 and D18S979 at 18q21.33-q23. A physical map of this region was constructed using yeast artificial chromosomes (YACs) (Verheyen et al. 1999).
The previously described method was adapted for the region-specific isolation of CCG/CGG repeats and applied to the chromosome 18q21.33-q23 BP candidate region. Three potential CpG islands were isolated, one of which is located 1.5 kb upstream of a predicted exon of 3639 bp. Further analysis showed that this exon is part of a novel CpG-associated, brain-expressed gene, herein called NCAG1 (Novel CpG Associated Gene 1). Mutation analysis of this positional and functional candidate identified two single nucleotide polymorphisms, which may be useful as diagnostic markers for the BP phenotype.
The present invention is directed to a novel gene located in the 18q candidate region of chromosome 18. More specifically, the gene is located in an 8.9 cM region between D18S68 and D18S979 at 18q21.33-q23.
The gene is located in a chromosomal region associated with mood disorders such as bipolar spectrum disorders and may therefore be useful as a diagnostic marker for bipolar spectrum disorders. The region in question, when removed from the totality of the human genome, may also be used to locate, isolate and sequence other genes that influence psychiatric health and mood.
Isolation and Identification of the Novel Gene:
Standard procedures well known to one skilled in the art were applied to the identified YAC clones and, where applicable, to DNA from an individual afflicted with a mood disorder as defined herein, in the process of identifying and characterizing the relevant gene. For example, the inventors can make use of the previously identified apparent association between trinucleotide repeat expansions (TREs) within the human genome and the phenomenon of anticipation in mood disorders (Lindblad et al. (1995), Neurobiology of Disease 2, pp 55-62; O'Donovan et al. (1995), Nature Genetics 1Q pp 380-381) to screen for TREs in the selected YAC clones in order to identify candidate genes in the region of interest on human chromosome 18. A variety of other known procedures can also be applied to these YAC clones to identify the candidate gene, as discussed below.
Accordingly, in a first aspect the present invention comprises the use of an 8.9 cM region of human chromosome 18q disposed between polymorphic markers D18S68 and D18S979, or a fragment thereof, for identifying at least one human gene, including mutated and polymorphic variants thereof, which is associated with mood disorders or related disorders as defined above. As described below, the present inventors identified this candidate region of chromosome 18q by analysing co-segregation of bipolar disease in family MAD31 with 12 STR polymorphic markers previously located between D18S51 and D18S61, followed by LOD score analysis. Particular YACs covering the candidate region which may be used in accordance with the present invention are 961h9, 942c3, 766f12, 731c7, 907e1, 752g8 and 717d3; preferred ones are 961h9, 766f12 and 907e1, since these form the minimum tiling path across the candidate region. Suitable YAC clones for use are those having an artificial chromosome spanning the refined candidate region between D18S68 and D18S979.
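The LOD score analysis mentioned above is not specified further (no penetrance model or marker data are given). As a minimal sketch of the underlying idea only, assuming phase-known, fully informative meioses, a two-point LOD score compares the likelihood of the observed recombinants at a recombination fraction theta with the likelihood under free recombination (theta = 0.5). The function names and meiosis counts below are hypothetical; real pedigree analysis must also model penetrance, phase uncertainty and missing genotypes.

```python
import math


def lod_score(recombinants: int, non_recombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[ theta^R * (1 - theta)^NR / 0.5^(R + NR) ]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        # any recombinant is impossible under complete linkage
        return float("-inf")
    log_linked = non_recombinants * math.log10(1.0 - theta)
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    log_null = (recombinants + non_recombinants) * math.log10(0.5)
    return log_linked - log_null


def max_lod(recombinants: int, non_recombinants: int, steps: int = 50):
    """Grid search over theta in [0, 0.5] for the maximum LOD score."""
    grid = [i * 0.5 / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: lod_score(recombinants, non_recombinants, t))
    return best, lod_score(recombinants, non_recombinants, best)


print(lod_score(0, 10, 0.0))  # 10 informative meioses, no recombinants
print(max_lod(1, 9))          # maximum at theta = 0.1
```

For example, ten non-recombinant meioses at theta = 0 give 10 × log10(2), about 3.01, just above the conventional threshold of 3 for significant linkage.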
There are a number of methods which can be applied to the candidate regions of chromosome 18q as defined above, whether or not present in a YAC, to identify a candidate gene or genes associated with mood disorders or related disorders. For example, as aforesaid, there is an apparent association between the extent of trinucleotide repeat expansions (TRE) in the human genome and the presence of mood disorders.
Accordingly, in a third aspect the present invention comprises a method of identifying at least one human gene, including mutated and polymorphic variants thereof, which is associated with a mood disorder or related disorder as defined herein which comprises detecting nucleotide triplet repeats in the region of human chromosome 18q disposed between polymorphic markers D18S68 and D18S979.
An alternative method of identifying said gene or genes comprises fragmenting a YAC clone comprising a portion of human chromosome 18q disposed between polymorphic markers D18S60 and D18S61, for example one or more of the seven aforementioned YAC clones, and detecting any nucleotide triplet repeats in the fragments, in particular CAG or CTG repeats. Nucleic acid probes comprising at least 5, and preferably at least 10, CTG and/or CAG triplet repeats are a suitable means of detection when appropriately labelled. Trinucleotide repeats may also be detected using the known RED (repeat expansion detection) system (Schalling et al. (1993), Nature Genetics, pp 135-139).
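Where YAC or BAC sequence is available, the probe-based screen described above can be approximated in silico. The sketch below is a hypothetical helper, not the RED system: it assumes a plain DNA string and reports uninterrupted CAG or CTG runs of at least a chosen number of units, with 5 and 10 units mirroring the probe lengths mentioned in the text. Because a CAG run on one strand reads as a CTG run on the other, scanning one strand for both motifs covers both strands.

```python
import re


def find_triplet_repeats(seq: str, motifs=("CAG", "CTG"), min_units: int = 5):
    """Return (motif, start, units) for each uninterrupted run of a motif with
    at least min_units copies. Positions are 0-based; matching is greedy."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        # e.g. "(?:CAG){5,}" matches a run of 5 or more CAG copies
        pattern = re.compile(f"(?:{motif}){{{min_units},}}")
        for match in pattern.finditer(seq):
            hits.append((motif, match.start(), len(match.group()) // len(motif)))
    return sorted(hits, key=lambda hit: hit[1])


example = "TT" + "CAG" * 12 + "GG" + "CTG" * 4 + "A"   # invented test sequence
print(find_triplet_repeats(example))                  # only the 12-unit CAG run
print(find_triplet_repeats(example, min_units=10))    # still reported (12 units)
```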
In a fourth embodiment the invention comprises a method of identifying at least one gene, including mutated and polymorphic variants thereof, which is associated with a mood disorder or related disorder and which is present in a YAC clone spanning the region of human chromosome 18q between polymorphic markers D18S60 and D18S61, the method comprising the step of detecting the expression product of a gene incorporating nucleotide triplet repeats by use of an antibody capable of recognizing a protein with an amino acid sequence comprising a string of at least 8, but preferably at least 12, contiguous glutamine residues. Such a method may be implemented by sub-cloning YAC DNA, for example from the seven aforementioned YAC clones, into a human DNA expression library. A preferred means of detecting the relevant expression product is a monoclonal antibody, in particular mAb 1C2, the preparation and properties of which are described in International Patent Application Publication No. WO 97/17445.
Further embodiments of the present invention relate to methods of identifying the relevant gene or genes which involve sub-cloning YAC DNA as defined above into vectors such as BAC (bacterial artificial chromosome), PAC (P1 or phage artificial chromosome) or cosmid vectors such as exon-trap cosmid vectors. The starting point for such methods is the construction of a contig map of the region of human chromosome 18q between polymorphic markers D18S60 and D18S61. To this end the present inventors have sequenced the end regions of the human DNA fragment in each of the seven aforementioned YAC clones, and these sequences are disclosed herein. Following sub-cloning of YAC DNA into other vectors as described above, probes comprising these end sequences or portions thereof, in particular the sequences shown in FIGS. 1 to 11 herein, together with any known sequence-tagged site (STS) in this region, as described in the YAC clone contig shown herein, can be used to detect overlaps between the sub-clones so that a contig map can be constructed. The known sequences in the current YAC contig can also be used to generate contig map sub-clones.
One route by which a gene or genes associated with a mood disorder or related disorder can be identified is the known technique of exon trapping. This is an artificial RNA splicing assay, which in current protocols most often uses a specialized exon-trap cosmid vector. The vector contains an artificial minigene consisting of a segment of the SV40 genome containing an origin of replication and a powerful promoter sequence, two splicing-competent exons separated by an intron which contains a multiple cloning site, and an SV40 polyadenylation site.
The YAC DNA is sub-cloned in the exon-trap vector and the recombinant DNA is transfected into a strain of mammalian cells. Transcription from the SV40 promoter results in an RNA transcript which normally splices to include the two exons of the minigene. If the cloned DNA itself contains a functional exon, it can be spliced to the exons present in the vector's minigene. Using reverse transcriptase a cDNA copy can be made and using specific PCR primers, splicing events involving exons of the insert DNA can be identified. Such a procedure can identify coding regions in the YAC DNA which can be compared to the equivalent regions of DNA from a person afflicted with a mood disorder or related disorder to identify the relevant gene.
Accordingly, in a fifth aspect the invention comprises a method of identifying at least one human gene, including mutated variants and polymorphisms thereof, which is associated with a mood disorder or related disorder which comprises the steps of:
As an alternative to exon trapping the YAC DNA may be sub-cloned into BAC, PAC, cosmid or other vectors and a contig map constructed as described above. There are a variety of known methods available by which the position of relevant genes on the sub-cloned DNA can be established as follows:
If the cloned YAC DNA is sequenced, computer analysis can be used to establish the presence of relevant genes. Techniques such as homology searching and exon prediction may be applied.
Once a candidate gene has been isolated in accordance with the methods of the invention, more detailed comparisons may be made between the gene from a normal individual and one afflicted with a mood disorder such as a bipolar spectrum disorder. There are two methods, described as “mutation testing”, by which a mutation or polymorphism in a DNA sequence can be identified. In the first, the DNA sample is tested for the presence or absence of one specific mutation, but this requires knowledge of what the mutation might be. In the second, a DNA sample is screened for any deviation from a standard (normal) DNA sequence. The latter method is more useful for identifying candidate genes when a mutation is not known in advance. In addition, the following techniques may be applied to a gene identified by the above-described methods to identify differences between genes from normal or healthy individuals and those afflicted with a mood disorder or related disorder:
The electrophoretic mobilities of these structures on non-denaturing polyacrylamide gels depend on their chain lengths and on their conformation;
It will be appreciated that, with respect to the methods described herein, in the step of detecting differences between coding regions from the YAC and the DNA of an individual afflicted with a mood disorder or related disorder, the said individual may be anybody with the disorder and not necessarily a member of family MAD31.
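The step of detecting differences can be illustrated with a minimal sketch, assuming the reference (YAC-derived) sequence and the patient sequence have already been aligned to the same length with no insertions or deletions. The hypothetical helper below lists every single-base difference and skips ambiguous N calls; real mutation scanning must also handle heterozygous positions and indels.

```python
def find_substitutions(reference: str, sample: str):
    """List (position, reference_base, sample_base) for every mismatch between
    two pre-aligned, equal-length sequences. Positions are 1-based."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()), start=1)
        if ref != alt and "N" not in (ref, alt)  # ignore ambiguous base calls
    ]


print(find_substitutions("ATGCCCGAA", "ATGTCCGAN"))  # one C>T difference at position 4
```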
In accordance with further aspects the present invention provides an isolated human gene and variants thereof associated with a mood disorder or related disorder and which is obtainable by any of the above described methods, an isolated human protein encoded by said gene and a cDNA encoding said protein.
Once a gene has been identified, a number of methods are available to determine the function of the encoded protein. These methods are described by Eisenberg et al. (Nature, 15 June 2000), which is herein incorporated by reference. One of them, called the gene neighbor method, is a computational method that reveals functional linkages from genome sequences: if the genes encoding two proteins are neighbors on the chromosome in several genomes, the proteins tend to be functionally linked. This method can be powerful in uncovering functional linkages in prokaryotes, where operons are common, but also shows promise for analysing interacting proteins in eukaryotes.
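The gene neighbor method can be sketched as follows. The sketch assumes each genome is given as an ordered list of gene identifiers in which orthologous genes already share the same label; that orthology assignment, and the restriction to direct adjacency, are simplifications introduced here and are not described in the text. The function names and toy genomes are hypothetical.

```python
from collections import Counter


def neighbor_pair_counts(genomes: dict) -> Counter:
    """For each unordered pair of gene families, count the genomes in which
    the two are adjacent on the chromosome."""
    counts = Counter()
    for gene_order in genomes.values():
        # a set, so a pair adjacent twice in one genome is counted once
        pairs = {frozenset(pair) for pair in zip(gene_order, gene_order[1:])
                 if pair[0] != pair[1]}
        counts.update(pairs)
    return counts


def functionally_linked(genomes: dict, min_genomes: int = 2) -> dict:
    """Pairs that are neighbours in at least min_genomes genomes."""
    return {tuple(sorted(pair)): n
            for pair, n in neighbor_pair_counts(genomes).items()
            if n >= min_genomes}


toy_genomes = {  # hypothetical gene orders; letters stand for orthologous families
    "genome1": ["A", "B", "C", "D"],
    "genome2": ["C", "A", "B", "E"],
    "genome3": ["D", "E", "A", "B"],
}
print(functionally_linked(toy_genomes))  # A and B are neighbours in all three
```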
A: Triplet Repeat Isolation
CCG/CGG YAC fragmentation vectors were constructed by cloning blunted (CCG)10/(CGG)10 adapters into the blunted SphI site of the previously described pDV1 basic vector (Del-Favero et al. 1999). Sequencing showed that fragmentation vectors pDVCCG and pDVCGG carry the adapter sequence in the 5′-(CCG)10-3′ and 5′-(CGG)10-3′ orientation, respectively.
Using these vectors, CCG/CGG repeats and flanking sequences were isolated by YAC fragmentation as described (Del-Favero et al. 1999).
B: Characterisation of the Structure of the NCAG1 Gene.
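The relationship between the two adapter orientations follows from base pairing: the complementary strand of 5′-(CCG)10-3′ reads 5′-(CGG)10-3′. The generic reverse-complement helper below (not software used by the inventors) checks this, which suggests that pDVCCG and pDVCGG can be regarded as the same duplex adapter cloned in opposite orientations, presumably so that repeats in either orientation relative to the YAC can be targeted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


ccg_strand = "CCG" * 10   # 5'-(CCG)10-3', as read in pDVCCG
cgg_strand = "CGG" * 10   # 5'-(CGG)10-3', as read in pDVCGG
print(reverse_complement(ccg_strand) == cgg_strand)  # True
```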
I.M.A.G.E. Consortium [LLNL] cDNA clones (Lennon et al. 1996) IMAGp998A136826Q2, IMAGp998A154307Q2, IMAGp998B194346Q2, IMAGp998D126826Q2, IMAGp998D193628Q2, IMAGp998F131866Q2, IMAGp998H201815Q2, IMAGp998K235214Q2, IMAGp998L153967Q2 and IMAGp998N06839Q2 were ordered from the RZPD Deutsches Ressourcenzentrum für Genomforschung GmbH (Heubnerweg 6, 14059 Berlin-Charlottenburg, Germany). Cultures started from single colonies were grown, and plasmids were prepared with the Wizard Plus SV Minipreps DNA Purification System (Promega, Madison, Wis.). DNA sequencing was performed by the dideoxynucleotide method using a DNA sequencing kit (Perkin-Elmer, Foster City, Calif.) and analysed on an ABI PRISM 377 DNA Sequencer (Perkin-Elmer, Foster City, Calif.) or an ABI PRISM 3700 DNA Analyser (Perkin-Elmer, Foster City, Calif.).
For the RT-PCR reactions, mRNA from SH-SY5Y cells was prepared using the μMACS mRNA Isolation Kit (Miltenyi Biotec, Bergisch Gladbach, Germany). After DNase I treatment (Promega, Madison, Wis.), the RT reaction was primed with oligo(dT) primers and performed with the Superscript Preamplification System for First Strand cDNA Synthesis (GibcoBRL, N.V. Life Technologies, Merelbeke, Belgium). First-strand cDNA was used in long-range PCR reactions with TaKaRa LA Taq (Takara Shuzo Co., Otsu, Shiga, Japan). PCR products were reamplified with nested primers and sequenced as described above.
C: Characterisation of the Expression Pattern of the NCAG1 Gene.
Genepool cDNA (Invitrogen, Carlsbad, Calif.) from brain, fetal brain, placenta, liver, testis and lung was used as a cDNA mapping panel. The Human Brain Multiple Tissue Northern (MTN) Blot IV (Clontech, Palo Alto, Calif.) was used for radioactive hybridisation in accompanying ExpressHyb solution according to the instructions of the manufacturer. A zooblot was prepared by digesting 10 μg genomic DNA to completion with HindIII, running it on a TAE 1% agarose gel and performing a Southern blot. A PCR product containing the ORF of the NCAG1 gene was radioactively labelled and hybridised at 65° C.
D: Mutation Analysis of the NCAG1 Gene.
Overlapping PCR products of approximately 600 bp were generated and sequenced as described above. Both identified polymorphisms were detected by digesting the PCR product with HinfI and electrophoresing the fragments on precast ExcelGel gels on a Multiphor II electrophoresis system (Amersham Pharmacia Biotech AB, Uppsala, Sweden).
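PCR-RFLP typing of this kind can be sketched as an in silico digest. The primers, amplicon sequences and which allele carries the HinfI site are not given in the text, so the amplicons below are invented purely to show how a single C to T change can create (or, read the other way, destroy) a GANTC site and so change the fragment pattern on the gel.

```python
import re

# HinfI recognises 5'-GANTC-3' and cuts after the G (G^ANTC). The site is
# palindromic, so scanning one strand finds every site.
HINFI_SITE = re.compile(r"(?=GA[ACGT]TC)")  # lookahead also catches overlapping sites


def hinfi_fragments(amplicon: str) -> list:
    """Top-strand fragment lengths after complete HinfI digestion."""
    seq = amplicon.upper()
    cuts = [m.start() + 1 for m in HINFI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# hypothetical 25 bp amplicons differing by a single C>T change
allele_c = "TTTTTTTTTTGAACCTTTTTTTTTT"
allele_t = "TTTTTTTTTTGAATCTTTTTTTTTT"
print(hinfi_fragments(allele_c))  # no site: uncut
print(hinfi_fragments(allele_t))  # the T allele creates a GAATC site
```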
E: CCG/CGG YAC Fragmentation
CCG/CGG YAC fragmentation was applied to YACs 961h9, 766f12 and 907e1 (Goossens et al. 2000). Size determination by pulsed field gel electrophoresis (PFGE) and Southern blot hybridisation resulted in 33 sets of equally sized fragmented YAC clones. Sequencing of 112 fragmented YAC ends identified seven (out of 33) sets of fragmented YACs with identical end sequences resulting from specific homologous recombination. One set (CCG7) was the result of fragmentation in the (CGG)6 repeat in the 5′ UTR of the CAP2 gene (GenBank acc. No L40377). A second set (CCG6) contained a (CCG)2 repeat and a third (CCG4) an imperfect CCCCG repeat. The triplet repeat in the 5′ UTR of the CAP2 gene had already been shown not to be associated with BP disorder (Goossens et al. 2000). The size of CCG4 was analyzed in 12 BP and 12 UP patients, but only one allele was detected. The size of CCG6 was not analyzed since it was too small to be polymorphic.
In-depth analysis showed that three of the seven sequences (CCG3, GenBank acc No . . . ; CCG4, GenBank acc No . . . ; and CCG6, GenBank acc No . . . ) had high CG content (70-80%) and high CpG content (15-20 CpGs in 200 bp), but no additional CCG/CGG repeats were found. Primer pairs for these potential CpG islands were used to determine their position on the YAC contig (FIG. 1). BLASTN analysis (Altschul et al. 1990) of both CCG4 and CCG6 gave hits with sequences of RPCI-11 BACs. CCG4 gave a hit in a 27150 bp contig of the working draft sequence of RPCI-11 BAC 29013 (GenBank acc No AC022662, GI: 7249117). CCG6 was part of the complete sequence of RPCI-11 BAC 793J2 (GenBank acc No AC009802).
F: Identification and in Silico Characterisation of NCAG1 Gene.
To find genes possibly associated with the potential CpG islands CCG4 and CCG6, their surrounding BAC sequences were analysed using bioinformatic tools. To this end, the 27150 bp contig of BAC 29013 and the complete sequence of BAC 793J2 were submitted for analysis to the Rummage High-Throughput Sequence Annotation Server (http://gen100.imb-jena.de/rummage/index.html).
First, LCP (Huang 1994) and CPG (Larsen et al. 1992) recognized CpG islands of 1.2 kb and 0.4 kb containing CCG4 and CCG6, respectively, supporting their identification as CpG islands.
Next, the exon prediction programs Grail (Uberbacher & Mural 1991) and Genscan (Burge & Karlin 1997) both predicted the presence of a 3639 bp exon 1.5 kb downstream of the 1.2 kb CpG island containing CCG4. This predicted exon contains an open reading frame (ORF) which starts at an ATG start codon with an almost perfect Kozak sequence and ends with a TAA stop codon. Other predicted features are a transcription start site (TSS) 2352 bp upstream of the ORF (score 76.6 by Proscan (Prestridge 1995)) and polyadenylation signals at 3032, 3247, 4364, 5338 and 8266 bp downstream of the ORF (respective scores of 4.79, 3.83, 4.94, 4.93 and 6.27 by PolyAH (Salamov & Solovyev 1997)).
BLASTN (Altschul et al. 1990) searches against dbEST revealed significant homology (>97%) to 21 human ESTs (Table 1).
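Grail and Genscan use far richer statistical models than can be shown here. As a much simpler sketch of two features reported for the predicted exon, the hypothetical helpers below find the longest ATG-initiated ORF on one strand and classify the Kozak context of its start codon, using the common rule of thumb that a purine at position -3 and a G at position +4 (with the A of ATG as +1) give a strong context. The toy sequences are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str):
    """(start, end) of the longest ATG-initiated ORF on the given strand,
    0-based and half-open, with end including the stop codon; None if absent."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG in this frame
    return best


def kozak_strength(seq: str, atg: int) -> str:
    """Classify the context of the ATG starting at index atg."""
    seq = seq.upper()
    minus3 = seq[atg - 3] if atg >= 3 else ""
    plus4 = seq[atg + 3] if atg + 3 < len(seq) else ""
    purine = minus3 in ("A", "G")
    if purine and plus4 == "G":
        return "strong"
    if purine or plus4 == "G":
        return "adequate"
    return "weak"


toy = "GCCACCATGGCTGCTTAAGG"
orf = longest_orf(toy)
print(orf, kozak_strength(toy, orf[0]))
```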
G: Characterisation of the Structural Organisation of the NCAG1 Gene.
Based on the BLASTN EST hits, the corresponding I.M.A.G.E. Consortium [LLNL] cDNA clones (Lennon et al. 1996) were ordered and sequenced. The sequences aligned with the genomic sequence in the presumed 5′ UTR (untranslated region), the ORF and the presumed 3′ UTR, indicating that these sequences are indeed transcribed.
Since cDNA clone sequencing did not yield a continuous sequence of the transcript, primers were designed and used for RT-PCR experiments. Sequencing of different overlapping RT-PCR products confirmed the presence of a transcript of at least 9 kb containing the ORF of the predicted exon, linked to the presumed 5′ and 3′ sequences.
H: Characterisation of the Expression Pattern of the NCAG1 Gene.
To investigate the expression profile of the NCAG1 gene, a long-range PCR spanning the ORF was optimised on genomic DNA and applied to a cDNA mapping panel. The fragment was present in cDNA from brain, fetal brain, placenta and liver but could not be detected in cDNA from testis and lung. More detailed information on expression in the brain was obtained by Northern blot hybridisation, which showed a >9.5 kb transcript in all investigated tissues (lung, placenta, small intestine, liver, kidney, skeletal muscle, heart, brain, uterus, trachea, thyroid, stomach, spinal cord, prostate, mammary gland, lymph node, brain (whole), bladder, adrenal gland, amygdala, caudate nucleus, corpus callosum, hippocampus, substantia nigra, thalamus and total brain).
Stringent zooblot hybridisation experiments showed the presence of homologous sequences in the genomic DNA of other mammals such as dog, pig, mouse, donkey, horse and sheep.
I: Mutation Analysis of the NCAG1 Gene.
Since this novel CpG-associated gene is brain-expressed and located in the chromosome 18q21.33-q23 BP candidate region, a mutation analysis of the ORF was performed on 3 patients and 1 escapee of the chromosome 18-linked family MAD31. This identified two single nucleotide polymorphisms. The first is a C to T transition at position 2017 of the ORF, changing amino acid (AA) 673 from proline to serine. This polymorphism was found only in the unaffected escapee. The second polymorphism was found in all three patients. It was also a C to T transition, located at position 2824 and changing AA 942 from proline to serine. Analysis of this polymorphism in family MAD31 showed that the T allele was present on the disease haplotype.
Both polymorphisms were analysed by PCR-RFLP in an association study of 92 BP patients and 92 age-, sex- and ethnicity-matched controls. The P673S polymorphism turned out to be frequent, with both alleles roughly equally represented. The P942S polymorphism, however, was rare, with the T allele present in only 3 BP patients and 2 controls. Statistical analysis showed that the control population was in Hardy-Weinberg equilibrium for both polymorphisms. No alleles, genotypes or haplotypes were found to be associated with BP disorder.
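The reported numbering is internally consistent, as a quick check shows: with the A of the ATG counted as ORF position 1, positions 2017 and 2824 are the first bases of codons 673 and 942, and a C to T change at the first base of any proline codon (CCN) yields a serine codon (TCN). The third base of the affected codons is not given in the text, so the sketch checks all four possibilities. The helper name codon_of is hypothetical.

```python
BASES = "TCAG"
# standard genetic code, codons enumerated in TCAG order ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_of(orf_position: int) -> tuple:
    """Map a 1-based ORF nucleotide position to (codon number, position in codon)."""
    return (orf_position - 1) // 3 + 1, (orf_position - 1) % 3 + 1


print(codon_of(2017))  # (673, 1)
print(codon_of(2824))  # (942, 1)

# A C>T change at the first base of any proline codon (CCN) gives TCN = serine
for third in BASES:
    assert CODON_TABLE["CC" + third] == "P" and CODON_TABLE["TC" + third] == "S"
```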
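The Hardy-Weinberg check on the controls can be sketched with the usual Pearson chi-square test on genotype counts. The text does not report genotype counts or say which test was used, so the counts below are hypothetical. For a rare allele such as 942T the expected number of rare homozygotes is far below 5, and an exact test would be more appropriate than this approximation.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square (1 df) for deviation from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


CRITICAL_1DF_005 = 3.841  # chi-square critical value, 1 df, alpha = 0.05

# hypothetical genotype counts for 92 controls (the real counts are not given)
print(hwe_chi_square(23, 46, 23) < CRITICAL_1DF_005)  # True: consistent with HWE
print(hwe_chi_square(40, 12, 40) < CRITICAL_1DF_005)  # False: heterozygote deficit
```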
Since triplet repeat fragmentation has been shown to be a valid method for the region-specific isolation of triplet repeats (Goossens et al. 2000), we applied it to the chromosome 18q21.33-q23 BP candidate region for the isolation of CCG/CGG repeats. To do so, we first had to construct a new set of fragmentation vectors, pDVCCG and pDVCGG. Fragmentation experiments with these vectors gave transformation and fragmentation efficiencies in the same range as those obtained with the CAG/CTG fragmentation vectors pDVCAG and pDVCTG (data not shown). Application of CCG/CGG fragmentation to YAC 961h9 resulted in the isolation of the (CGG)6 repeat in the 5′ UTR of CAP2. This repeat is adjacent to the (CAG)6 repeat reported previously (Goossens et al. 2000), where it was shown that this (CGG)6(CAG)6 repeat is polymorphic but neither expanded in BP cases nor associated with BP disorder. Taken together, the CCG/CGG YAC fragmentation data do not support CCG/CGG repeats as disease-causing agents in chromosome 18q21.33-q23-linked BP disorder. On the other hand, fragmentation experiments yielded three sequences (CCG3, CCG4 and CCG6) with high CG (70-80%) and CpG content but no CCG/CGG repeat. CpG islands are usually defined as regions of DNA of more than 200 bases that have a CG content above 50% and a ratio of observed to expected CpG dinucleotides close to the value expected from base composition, rather than the strong CpG depletion seen in bulk genomic DNA. Therefore, CCG3, CCG4 and CCG6 can be considered potential CpG islands. Analysis of the sequences surrounding CCG4 and CCG6 with LCP (Huang 1994) and CPG (Larsen et al. 1992) confirmed that in both cases the fragmentation indeed occurred in a CpG island. Since CpG islands are strongly associated with genes, more specifically housekeeping and widely expressed genes, these three sequences are likely to be located near genes of this class.
In the search for genes possibly associated with the isolated CpG islands, the exon prediction programs Grail (Uberbacher & Mural 1991) and Genscan (Burge & Karlin 1997) both predicted the presence of a 3.6 kb exon downstream of the largest CpG island isolated. Two facts argued strongly against a false positive prediction. First, these two programs, based on different models, predicted exactly the same exon. Second, genomic DNA contains this ORF, which continues for 3.6 kb and starts with an ATG in a Kozak consensus context. Additional evidence that this exon is indeed transcribed came from a series of ESTs with very high homology (97-100%) to sequences in and surrounding the ORF. This evidence was then extended by sequencing the cDNA clones from which the ESTs originated: the EST sequences were extended and corrected, and the homologies increased to 99-100%. The fact that the cDNA clones originated from different cDNA libraries (Table 1) indicated that the gene is expressed in different tissues. RT-PCR and Northern blot experiments provided final confirmation that this ORF is widely expressed, a usual characteristic of a CpG-associated gene.
cDNA clone sequencing yielded the complete sequence of seven human cDNA clones aligning with NCAG1. In two cases a piece of genomic DNA was missing from the cDNA sequence. Clone IMAGp998B194346Q2 lacked an 865 bp fragment.
EST homologies and cDNA clone sequencing showed that a series of cDNA clones terminated at a predicted polyadenylation signal 4.3 kb downstream of the ORF, or 10.3 kb downstream of the predicted TSS. If the 865 bp 5′ intron is taken into account, the size of the transcript will be 9.5 kb, which is the size of the transcript recognized in the Northern blot experiment.
At the protein level, a cleavable signal peptide and two transmembrane domains are predicted. If this prediction is correct, the N-terminal and C-terminal ends of the mature protein will lie on the same side of the membrane in which it is embedded. The strong homology with the SART-2 protein is significant, but it gives no further clues as to potential functions of the novel protein.
The 2824T allele, present on the disease haplotype in the chromosome 18-linked family MAD31, is a very rare allele with a frequency of 0.03. An association sample of this size therefore has little statistical power to detect an effect of this allele, leaving open the possibility that it confers an increased risk for BP disorder.
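The CpG island criteria quoted above translate directly into a sequence test. In this sketch the observed/expected ratio follows the usual definition (CpG count × length / (C count × G count)) and the cut-off of 0.6 is the commonly used Gardiner-Garden and Frommer value; both are assumptions, since the text only says the ratio should be close to that statistically expected. The function names are hypothetical, and LCP and CPG, the programs actually used, apply their own criteria.

```python
def cpg_stats(seq: str) -> tuple:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence.

    obs/exp CpG = (number of CG dinucleotides * length) / (count(C) * count(G))
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Length > min_len, GC fraction > min_gc and obs/exp CpG >= min_obs_exp.
    The 0.6 cut-off is an assumed, commonly used value, not one from the text."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) > min_len and gc_fraction > min_gc and obs_exp >= min_obs_exp


print(looks_like_cpg_island("CG" * 150))             # True: GC-rich and CpG-rich
print(looks_like_cpg_island("C" * 150 + "G" * 150))  # False: GC-rich but only one CpG
```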
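The second argument can be made quantitative with a rough estimate. Assuming, for illustration only, a random sequence with equal base frequencies, 3 of the 64 codons are stops, so the chance that a given reading frame runs for n codons without a stop is (61/64)^n; for a 3.6 kb ORF, n is roughly 1200. Real genomic DNA is not random, and GC-rich sequence in particular contains fewer of the AT-rich stop codons, so this figure overstates the rarity somewhat, but the order of magnitude shows why such a long ORF is unlikely to arise by chance.

```latex
P(\text{no stop in } n \text{ codons}) = \left(\frac{61}{64}\right)^{n},
\qquad n \approx \frac{3600}{3} = 1200

\left(\frac{61}{64}\right)^{1200} = e^{1200 \ln(61/64)} \approx e^{-57.6} \approx 10^{-25}
```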
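The transcript-size argument is simple arithmetic on the features predicted earlier, assuming (as an interpretation not spelled out in the text) that the 4.3 kb figure refers to the polyadenylation signal predicted 4364 bp downstream of the ORF and that the 3639 bp predicted exon spans the ORF:

```latex
\underbrace{2352}_{\text{TSS to ORF}} + \underbrace{3639}_{\text{exon with ORF}}
  + \underbrace{4364}_{\text{ORF to poly(A) signal}} = 10355\ \text{bp}

10355\ \text{bp} - 865\ \text{bp (5' intron)} = 9490\ \text{bp} \approx 9.5\ \text{kb}
```

Under these assumptions the result is in line with the roughly 10.3 kb span and the 9.5 kb transcript seen on the Northern blot.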
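The topological reasoning can be sketched as follows. The text does not say which program predicted the transmembrane domains; this sketch instead uses a simple Kyte-Doolittle hydropathy scan with the window of 19 residues and threshold of 1.6 often quoted for membrane-spanning segments. It assumes the cleavable signal peptide (itself hydrophobic) has already been removed, and the test protein is an artificial sequence. Because each membrane crossing switches sides, an even number of crossings leaves both termini on the same side.

```python
# Kyte-Doolittle hydropathy values
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydrophobic_segments(protein: str, window: int = 19, threshold: float = 1.6):
    """Merged (start, end) spans (0-based, end exclusive) of windows whose mean
    hydropathy is at least threshold; each span is a candidate TM segment."""
    scores = [KD[aa] for aa in protein.upper()]
    segments = []
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window >= threshold:
            if segments and i <= segments[-1][1]:
                segments[-1] = (segments[-1][0], i + window)  # extend current span
            else:
                segments.append((i, i + window))
    return segments


def termini_same_side(n_transmembrane: int) -> bool:
    # each crossing flips the side; an even count returns to the starting side
    return n_transmembrane % 2 == 0


toy = "K" * 20 + "L" * 20 + "K" * 20 + "L" * 20 + "K" * 20  # artificial sequence
segments = hydrophobic_segments(toy)
print(segments, termini_same_side(len(segments)))  # two segments, same side
```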
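The power problem can be illustrated with a two-sided Fisher exact test on carrier counts, treating the figures reported for the association study (T allele in 3 of 92 patients and 2 of 92 controls) as carrier versus non-carrier counts; that reading of the data, and the choice of test, are assumptions made here for illustration. With only five carriers in total, the observed split gives p = 1, and even if all five carriers had been patients the two-sided p-value would still be about 0.06.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one. Integer weights keep the
    comparison exact.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x: int) -> int:  # numerator of P(first cell == x)
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / total


# rows: BP patients, controls; columns: T-allele carriers, non-carriers
print(fisher_exact_two_sided(3, 89, 2, 90))  # observed split
print(fisher_exact_two_sided(5, 87, 0, 92))  # most extreme possible split
```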
References
The following references are herein expressly incorporated by reference:
Number | Date | Country | Kind |
---|---|---|---|
01202214.1 | Jun 2001 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP02/06316 | 6/6/2002 | WO | |