The sex-related height difference in humans is thought to be caused mainly by two components: first, a hormonal component determined by the sex dimorphism of bioactive gonadal steroids and second, a genetic component attributed to a Y-specific growth gene, termed GCY (Tanner, et al. 1966; Smith, et al. 1985; Ogata and Matsuo, 1992). Despite extensive mapping attempts for this gene on the human Y chromosome (Ogata, et al. 1995, Salo, et al. 1995, Rousseaux-Prevost, et al. 1996, De Rosa, et al. 1997), its precise position remains unknown. Recent evidence shows that inappropriate cytogenetic methodology in the characterization of Y-chromosomal terminal deletions has brought about some of the difficulties in elucidating the GCY-critical region. In order to overcome these problems, the inventors have considered only patients presenting de novo interstitial deletions for the GCY analysis on the Y chromosome (Kirsch, et al. 2000). This approach allows the assignment of GCY to a particular chromosomal interval without excluding the presence of X0-mosaicism and/or i(Yp) and idic(Yq11) chromosomes in patients with terminal deletions.
The direct comparison of overlapping interstitial deletions in seven adult males with normal height, one male with borderline height, and one patient with a large interstitial deletion and short stature resulted in the confirmation of the GCY critical interval between markers DYZ3 and DYS11. This region encompasses roughly 1.6-1.7 Mb of genomic DNA. To improve the resolution in the region of interest close to the centromere, the inventors have established additional new STS markers specific for this part of the chromosome using their bacterial artificial chromosome (BAC)/P1-derived artificial chromosome (PAC) contig. Molecular deletion analysis using these new Y-chromosomal STSs allowed the inventors to narrow down the critical interval to a genomic region of 700 kb.
Preferably the regions are to the exclusion of the regions of chromosomes on each side of the defined regions.
Preferably the region is between SKY1 and sY83. It may include one or both of the SKY1 and sY83 regions. Preferably the region is between SKY8 and sY83 (preferably including one or both of the SKY8 and sY83 regions), or between SKY1 and SKY4.
The invention provides an isolated region of the Y chromosome between DYZ3 and DYS11 which encompasses GCY. Preferably the Y chromosome is a human Y chromosome.
The preferred region is between sY79 and sY81, preferably to the exclusion of the region of the Y chromosome outside that area of the chromosome.
Primers for use in GCY studies are also provided.
The invention further provides isolated gene/pseudogene sequences which contribute to the sex-related height difference in humans. These may be one or more of the gene or pseudogene sequences identified in one or more of the figures.
The invention further encompasses proteins having the same function as GCY protein and which have greater than 65% homology, greater than 70% homology, greater than 75% homology, greater than 80% homology, greater than 85% homology, preferably greater than 90% homology, and most preferably greater than 95% homology to the GCY protein. Preferably this has GCY gene activity, for example it has an effect on the height of a male mammal when expressed in that mammal.
Primers for use in detecting or amplifying a region of GCY are also provided. They may be labelled with radioactive or non-radioactive labels known in the art and used according to well-known methods. These methods include PCR and Southern or Northern blotting.
Experimental evidence will now be described in detail with reference to the figures in which:
Table 1 is a comparison of the adult height of patients and their siblings.
Table 2 is a table of new Y-chromosomal STSs.
Table 3 is the PCR/restriction digest analysis of sequence family variants in the AZFc region.
Table 4 is a summary of BAC and PAC clones identified during physical map creation.
Table 5 is a summary of the genomic primers that will be used for microdeletion screening in adult males with idiopathic short stature.
Table 6 is a summary of the sequences of the isolated exon trap clones.
Table 7A is a summary of primer pairs for predicted genes.
Table 8 is RT-PCR primer sequences for exon trap clones.
Tables 9a & b are tables showing homology of exons between ADLX and ADLY.
Table 10 is a summary of sequence divergence of genes/pseudogenes from the GCY region and their homology.
A diagram of the human Y chromosome with Yp telomere to the left and Yq telomere to the right is presented at the top. Shown below are the results of low-resolution analysis of Y-chromosomes of adult males with normal height or short stature. Along the top border, 95 Y-chromosomal STSs are listed. Except for SKY3 and SKY8 (see Table 2 for detail), all other STSs were previously reported (Vollrath et al., 1992, Jones et al., 1994, Reijo et al., 1995). Blank spaces or grey boxes indicate inferred absence or presence of markers for which assay was not performed. Asterisks indicate markers in the respective breakpoint regions which could not be tested. In all cases where previously published data of the patients were re-investigated, the identical DNA sample used for the primary analysis was studied. (Please note that the proximal as well as the distal breakpoint of the interstitial deletion of patient #293 resides within satellite type II sequences.)
A. Overview and amplicon structure of the human Y chromosome in the vicinity of the human DAZ cluster. Each amplicon is represented by specific bands (A, B, D, E, X). Shown above are arrows indicating the orientation of each member of an amplicon family with respect to each other. The amplicon indicated by bands X arose from a portion of chromosome 1 that was transposed to the distal end of the DAZ cluster and partially duplicated.
B. Precise position of selected Y-specific STSs and the SFVs according to the physical map of the human Y chromosome. Marker sY157 is highlighted as it was suspected to be present in only one copy by multiplex PCR analysis (see text for detail).
C. Summary of STS and SFV analysis in patients with Y-chromosomal rearrangements within the human DAZ cluster region. Grey boxes indicate inferred absence or presence of markers.
D. Sequence family variant typing of SKY10 and SKY12 in genomic DNA of patient #1972. Assay is described in Table 3. Along the right are listed fragment sizes (in bp). Products are separated by electrophoresis in 3% NuSieve agarose (3:1) and visualized by ethidium bromide staining.
A. Diagram showing the distribution of major tandem repeat blocks and general organization of sequence homologies. Basically, the region can be subdivided into three distinct intervals: a proximal region characterized by 5 bp satellite sequences (G), a central region with high homology to chromosome 1 (O), and a distal region composed of X/Y-homologous sequences (B). Below, the precise positions of the newly established and previously published STS markers in this region are illustrated. At the bottom border, the PAC/BAC contig constructed with the aid of the new STS markers is shown. Prefixes RP1, 5 indicate PAC clones and RP11 BAC clones, respectively.
B. Localization of the GCY critical interval as defined by high-resolution STS mapping in patients with short stature and normal height. Black boxes indicate the presence, white boxes the absence of the respective STS. Striped boxes depict the dosage unknown regions where the breakpoint resides.
Materials and Methods
Defining the GCY Critical Region
Selection of Patients
Patients #293, JOLAR, #28, #63 and #95 have been described clinically in detail elsewhere (Skare et al. 1990; Ma et al. 1993; Foresta et al. 1998; Kleiman et al. 1999). Patient Y0308 corresponds to case 1 in the study of Pryor et al. 1997. Patients T.M., #1947 and #1972 are phenotypically normal males suffering from idiopathic infertility. Genomic DNA samples were extracted from peripheral blood leukocytes (#28, #63, #95, Y0308, T.M., #1947, #1972) or from lymphoblastoid cell lines (#293, JOLAR). DNA isolated from peripheral blood leukocytes of normal males and females served as internal controls.
Height Assessment
As all individuals are of diverse ethnic origins, height was compared to the respective national height standards (Table 1). Patients were of similar age range. When possible, special attention was given to adult height comparisons between parents and siblings. Data are summarized along with the height standard deviation score (SDS) in Table 1. To calculate the SDS, mean adult height and the standard deviation were taken from the corresponding national physical growth studies.
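The SDS calculation described above can be illustrated with a minimal sketch. It assumes the usual definition of a standard deviation score, namely the observed height minus the reference mean, divided by the reference standard deviation. The function name and the reference values below are hypothetical and are not taken from Table 1 or from any national growth study.

```python
def height_sds(height_cm: float, mean_cm: float, sd_cm: float) -> float:
    """Return the height standard deviation score (SDS).

    SDS = (observed height - reference mean) / reference standard deviation.
    """
    if sd_cm <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (height_cm - mean_cm) / sd_cm


# Hypothetical reference values for illustration only; in practice these
# would come from the national growth study matching the individual's origin.
REFERENCE_MEAN_CM = 176.0
REFERENCE_SD_CM = 6.5

sds = height_sds(170.0, REFERENCE_MEAN_CM, REFERENCE_SD_CM)
print(f"SDS = {sds:.1f}")  # prints "SDS = -0.9"
```

A negative SDS indicates a height below the reference mean; the sign and magnitude are only meaningful relative to the reference population chosen.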
PCR Analysis
Reactions were performed in a total volume of 50 μl (75 mM Tris/HCl pH9.0, 20 mM (NH4)2 SO4, 0.1% (w/v) Tween20, 1.5 mM MgCl2) containing 1.0 mM of each oligonucleotide primer, 100 ng genomic DNA as template, 5 units of Taq DNA polymerase (Eurogentec), and each dNTP at 1 mM in a thermocycler (MJ Research, Inc.) as follows: After an initial denaturation step of 95° C. for 5 min, samples were subjected to 30 cycles consisting of 30 sec at 94° C., 30 sec at 60° C. and 1 min at 72° C. followed by a final extension step of 5 min at 72° C. The Multiplex PCR was carried out as described in Henegariu et al. 1994 with minor modifications. Alu-Alu PCR reactions were essentially carried out as described in Nelson et al. 1991. Amplification products smaller than 1 kb were resolved on 3% NuSieve agarose/1% SeaKem GTG agarose (FMC) in 1×TBE (0.089 M Tris-borate/0.089 M boric acid/20 mM EDTA, pH 8.0). For amplification products larger than 1 kb as well as products from Alu-Alu-PCR, 1.5% SeaKem GTG agarose gels in 1×TBE were used for separation.
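For bookkeeping purposes, the standard cycling program given above can be written down as a small data structure. The sketch below encodes only the temperatures, hold times and cycle number stated in the text and sums the programmed hold time; the class and function names are hypothetical, and ramping time between temperatures is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    seconds: int


# Values taken from the cycling conditions stated in the text.
INITIAL = Step("initial denaturation", 95, 5 * 60)
CYCLE = (
    Step("denaturation", 94, 30),
    Step("annealing", 60, 30),
    Step("extension", 72, 60),
)
N_CYCLES = 30
FINAL = Step("final extension", 72, 5 * 60)


def total_hold_seconds(initial, cycle, n_cycles, final):
    """Sum the programmed hold times; ramping between steps is not included."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


total = total_hold_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(f"{total / 60:.0f} min of programmed hold time")  # prints "70 min of programmed hold time"
```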
PCR Primers
Y-specific STSs, loci and PCR conditions have been described previously (Vollrath et al. 1992; Jones et al. 1994; Reijo et al. 1995). Sequences of new Y-chromosomal STSs are listed in Table 2. Y-specific STSs termed SKY were either derived from YAC, BAC and PAC end sequences or from clone-internal sequences amplified by various combinations of Alu primers. Primers for the markers SKY10, 11, 12, and 13 were designed to amplify fragments spanning unique restriction sites within the genomic DAZ locus (SKY10 from RP11-487K20 (AC024067), RP11-70G12 (AC006983), RP11-141N04 (AC008272), RP11-366C06 (AC015973), RP11-560118 (AC053522), RP11-175B09 (AL359453), SKY11 and SKY12 from RP11-245K04 (AC007965), RP11-100J21 (AC017005), RP11-506M09 (AC016752), RP11-589P14 (AC025246) and SKY13 from RP11-100J21 (AC017005), RP11-589P14 (AC025246), RP11-823D08 (AC073649), RP11-251M08 (AC010682), RP11-978G18 (AC073893)) in order to detect ‘sequence family variants’ (SFVs).
Restriction Analysis of PCR Products
PCR products were resolved on agarose gels, the appropriate gel bands cut out and the DNA isolated with GFX™ PCR DNA and Gel Band Purification Kit (Amersham Pharmacia Biotech, Inc.) according to the manufacturer's protocol. Fragments amplified from SKY5 and SKY6 were digested with TaqI and BsmI respectively. To detect SFVs at SKY10, SKY11, SKY12 and SKY13, PCR products were digested with restriction enzymes as listed in Table 3.
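The principle behind such PCR/restriction digest assays can be sketched in a few lines: two amplicons of identical length are distinguished by whether they carry a restriction site, and therefore by the fragment sizes seen on the gel. The example below uses the TaqI recognition sequence (TCGA, cut after the first T). The two amplicon sequences are invented for illustration and are not the SKY5 or SKY6 products; the function name is likewise hypothetical.

```python
def digest_fragment_sizes(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the recognition site that remain
    on the left-hand fragment (top strand). Only the top strand is scanned,
    which is sufficient for a palindromic site such as TCGA.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 36 bp amplicons differing by one base inside the TaqI site.
variant_with_site = "ACGT" * 5 + "TCGA" + "GGCC" * 3
variant_without_site = "ACGT" * 5 + "TCAA" + "GGCC" * 3

print(digest_fragment_sizes(variant_with_site, "TCGA", 1))     # prints [21, 15]
print(digest_fragment_sizes(variant_without_site, "TCGA", 1))  # prints [36]
```

A sample containing both variants would be expected to show all three band sizes, whereas loss of one variant removes the corresponding band or bands.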
Sequencing of BAC/PAC/YAC end Fragments
DNA from BAC/PAC clones selected for end sequencing was purified with the Nucleobond PC100 Kit (Macherey-Nagel) according to the manufacturer's instructions. End fragments were directly sequenced using the Thermosequenase Fluorescent Labelled Primer Cycle Sequencing Kit (Pharmacia) and analyzed on a Pharmacia A.L.F. express (Amersham Pharmacia Biotech). YAC end fragments were generated with Alu/Vector-polymerase chain reaction and subcloned in pCR2.1 with the TOPO-TA cloning Kit (Invitrogen). Sequencing was performed as described.
Fluorescence In Situ Hybridization
Metaphase spreads were obtained either from primary blood samples or immortalized cell lines. Preparations were made according to standard protocols (Lichter and Cremer 1992). Cosmid and plasmid DNA was labeled by nick translation with biotin-16-dUTP (La Roche). Slides carrying metaphase spreads were kept in 70% ethanol at 4° C. for one week. 200-300 ng of labeled plasmid or cosmid DNA, 20-30 μg of human Cot-1 DNA (GIBCO BRL), and hybridization buffer (50% formamide, 10% dextran sulfate, and 2×SSC, pH 7.0) were mixed, denatured for 5 min at 75° C. and pre-annealed for 30 min at 37° C. The slides were denatured for 2 min in 70% formamide and 2×SSC, pH 7.0, at 72° C. (Ried et al. 1992). The pre-annealed probe was hybridized overnight in a humidified chamber at 37° C. Slides were washed and stained with avidin-conjugated fluorescein isothiocyanate (FITC). The signal was amplified with biotinylated anti-avidin followed by staining with avidin-FITC. For the probe ‘all human telomeres’ (Oncor), the instructions supplied by the manufacturer were followed. Chromosomes were counterstained with 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI). Images were taken separately by using a cooled charge coupled device camera system (Photometrics, Tucson Ariz., USA). A Macintosh Quadra 900 was used for camera control and digital image acquisition in the ‘TIF’ format using the software package Nu200 2.0 (Photometrics). Separate gray scale fluorescence images were recorded for each fluorochrome. Images were overlaid electronically and further processed using the Adobe Photoshop software.
Searching the Stature Gene
Microdeletion Screening
Exon Amplification
Shotgun subcloning of PAC clones into pSPL3B. Genomic DNA from chromosome Y specific PAC clones was partially digested with Sau3AI. 100 ng of isolated fragments in the range of 4-10 Kb were ligated with 100 ng of pSPL3B that had been BamHI digested and dephosphorylated. The ligation reaction was transformed into supercompetent E. coli XL1-Blue cells (Stratagene) and aliquots of each transformation were plated on selective medium (ampicillin). Resulting colonies were subsequently pooled for plasmid DNA isolation.
Cell culture and electroporation. COS7 cells were propagated in DME medium supplemented with 10% heat inactivated calf serum. For transfections, COS7 cells between the 5th and 15th passage were grown to about 75% confluence, trypsinized, collected by centrifugation and washed in ice-cold Dulbecco's PBS. 4×10^9 cells were then resuspended in cold 0.7 ml Dulbecco's PBS and combined in a precooled electroporation cuvette (0.4 cm chamber, BioRad) with 0.1 ml Dulbecco's PBS containing 15 μg DNA. After 10 min on ice, cells were gently resuspended, electroporated (1.2 kV, 25 μF) in a BioRad Gene Pulser 2 and placed on ice again. After 10 min cells were transferred to a tissue culture dish (100 mm) containing 10 ml prewarmed, CO2 preequilibrated culture medium.
RNA isolation, RT-PCR and cloning. Cytoplasmic RNA was isolated 72 hrs post transfection (QIAGEN RNeasy Kit) and first strand synthesis was performed as recommended by the manufacturer with minor modifications: 5 μg of RNA was added to a solution containing 10 mM of each dNTP and 2 μM of oligonucleotide SA2. The mixture was heated to 65° C. for 5 min and then placed on ice for at least a further minute. After adding a reaction mixture containing 10× PCR buffer (Perkin-Elmer Cetus), 25 mM MgCl2, 0.1M DTT and RNAsin (35 U/μl), the reverse transcription reaction was transferred to 42° C. for 2 min. 1 μl of SuperScript II RT (200 U/μl; Gibco BRL) was then added and the reaction incubated at 42° C. for 90 min and 50° C. for 30 min. The entire cDNA synthesis reaction was then converted to double strand DNA using a limited number of PCR amplification cycles in the following 100 μl reaction mixture: 1× PCR buffer (Perkin-Elmer Cetus), 1.5 mM MgCl2, 200 μM dNTPs, 1 μM SA2, 1 μM SD6 and 2.5 U Taq polymerase (Perkin-Elmer Cetus). 6 amplification cycles were used and consisted of 1 min at 94° C., 1 min at 60° C. and 5 min at 72° C. To eliminate vector-only and false positive products, 50 U of BstXI (New England Biolabs) was added directly to the reactions, followed by overnight incubation at 55° C.
10 μl of the digest was then used in a second PCR amplification using internal primers in the following 100 μl reaction mixture: 1×PCR buffer (Perkin-Elmer Cetus), 1.5 mM MgCl2, 200 μM dNTPs, 1 μM (CAU)4-SD2, 1 μM (CUA)4-SA4 and 2.5 U Taq polymerase (Perkin-Elmer Cetus). 25 amplification cycles were used and consisted of 1 min at 94° C., 1 min at 60° C. and 3 min at 72° C. Products were separated by electrophoresis, and fragments larger than the pure SD2/SA4 RT-PCR product were excised and subcloned (CloneAmp pAMP1 System; Gibco BRL) into pAMP1 according to the manufacturer's protocol. Ligation reactions were then transformed into ultracompetent E. coli XL2-Blue (Stratagene) and plated on selective medium containing X-Gal/IPTG.
Identification of candidate exons. All white colonies were picked and transferred to 384-well microtiter plates containing selective medium and incubated overnight at 37° C. With a 384-pin transfer device 24.5×24.5 cm culture plates with and without positively charged nylon membranes (Amersham) on top of them were inoculated and also incubated overnight at 37° C. Colonies grown on culture plates were pooled for plasmid preparation, colonies on nylon membranes were used for colony lifts. Plasmid inserts were excised, purified, and hybridized to nylon membranes containing EcoRI-digests of the PAC clones used as the original substrate. Highlighting bands were subsequently isolated and hybridized to colony lifts to identify candidate exons. Candidate exons were isolated and sequenced by Sequitherm EXCEL II DNA Sequencing Kit (Epicentre Technologies). Sequences were automatically analyzed and read on an ALFExpress DNA sequencer. Table 6 lists the sequences of the isolated exon trap clones.
Exon Trapping. DNA from chromosome Y specific PAC (P1-derived artificial chromosome) clones RP1-148J07, RP5-1160A12, RP1-301P22, RP4-532107 and RP1-114A11 was partially digested with Sau3AI and fragments in the range of 4-10 Kb were individually subcloned into pSPL3B. COS7 cells were transfected and after 72 hrs cytoplasmic RNA was harvested using the QIAGEN RNeasy Kit. cDNA synthesis was performed as recommended by the manufacturer (Gibco-BRL). Primers flanking the cloning sites were used to identify products larger than the pure SD2/SA4 RT-PCR product. These fragments were excised, subcloned (CloneAmp pAMP1 System; Gibco BRL) into pAMP1 and sequenced. Exon trap clones were labelled with 32P-dCTP by random priming and used as hybridization probes on Southern blots. Hybridization: 16 hrs at 65° C. in standard hybridization buffer (Singh and Jones 1984). Wash: three times for 20 min each at 65° C. in 0.1×SSC, 0.1% SDS.
In silico gene prediction. Completed genomic sequences from BAC clones RP11-75F05, RP11-461H06, RP11-333E09, RP11-558M10, CITB-298B15 and CITB-144J01 were analyzed for homologies to known genes and virtual gene content using the NIX (http://menu.hgmp.mrc.ac.uk) and Rummage (http://gen100.imb-jena.de) software packages. Computational identification of promoters and first exons was achieved by submitting BAC sequences to FirstEF (http://www.cshl.org/mzhanglab).
Reverse-transcribed polyA+-RNAs and cDNA libraries. Human polyA+-RNA of 16 fetal and adult tissues was purchased either from Clontech or Invitrogen. Human polyA+-RNAs from 3 osteosarcoma and 1 bone marrow fibroblast cell line were isolated by the QIAGEN Oligotex kit. First-strand cDNA synthesis was essentially carried out as described (Rao et al. 1997). Fourteen cDNA libraries were obtained either from Clontech or Stratagene. A collection of 40 cDNA libraries was also provided by the Resource Center of the German Human Genome Project (RZPD). The complete list is available on request.
Characterization of potential transcription units. After homology comparison and open reading frame (ORF) analysis of exon trap clones, primers were designed for RT-PCR amplification. Sequences are summarized in Table 8. In those cases where exon trap clones consisted of only one exon, two exon-specific primers were combined with cDNA-library specific primers in semi-nested PCR. Primers were designed from predicted gene models to amplify across exon/intron boundaries. To provide evidence of transcription, primers were used to screen a panel of cDNA libraries and polyA+-RNAs (see above). In the case of potential coamplification from homologous transcripts, primers flanked Y-specific restriction sites.
Evolutionary strata classification. Sequence divergence between genes/pseudogenes of the GCY region and their functional/non-functional progenitors was determined according to Li, 1993. Sequences for all pseudogenes were extracted from genomic sequences: KIAA1470PY from BAC clone RP11-75F05 (AC011293), KIAA1470P1 from BAC clone RP11-498M14 (AL445675), ADLY from BAC clone RP11-333E09 (AC011302), ARSFP and RPS24P1 from BAC clone CITB-144J01 (AC004772), RPS24PX from BAC clone RP11-418N20 (AC119620), ASSP6 from BAC clone RP11-461H06 (AC012502) and ASSP4 from BAC clone GS1-536K07 (AC004616). Sequences for all other genes were obtained from published cDNAs, whose GenBank accession numbers are as follows: ADLX (AF245505), ARSF (XM_035467), RPS24 (NM_033022), ASS (X01630), KIAA1470 (AB040903). THC604695PY was not analyzed as only part of its most terminal exon (consisting almost entirely of 3'UTR) was available for comparison with the X-chromosomal EST cluster (AA662182 and AA662138).
Mapping of interstitial deletions
We studied the DNA of nine adult males who originally consulted reproduction centers about idiopathic infertility, but were otherwise generally healthy. Of the 9 males, 7 were unremarkable with respect to adult height. One patient, #293, with a height of 157 cm, presented short stature (SDS −2.9) and one, Y0308, with a height of 165.5 cm showed borderline height, being at the 3rd percentile of the normal U.S. height standard (SDS −1.7). The adult heights of his parents and siblings are in the normal range (Table 1), his brother being 20.5 cm taller than the patient. Compared to his target height (178 cm) and target range (169-187 cm) he can be considered short. All men were ascertained solely on the basis of the occurrence of large de novo interstitial deletions on the Y chromosome. Only two of those patients had undergone previous chromosomal studies.
In our effort to localize the GCY locus, we focused on that part of the Y chromosome long arm, which was delimited by the boundaries of the interstitial deletions of the patients with short stature (
As the distal breakpoint of the deletion of patient #1972 does not reside within the specific part of the Y chromosome long arm, the nature of the deletion (terminal or interstitial) remained unclear. There was also no overlap of his deletion with the deletions of patients #1947 and T.M. Relying solely on the results obtained by the STS-based interstitial deletion mapping strategy, one could not formally exclude the region distal to sY158 as a potential critical region for GCY. However, multiplex PCR analysis always showed a less intense amplification product for STS sY157 (a Y-derived marker in close vicinity of sY158). To address this problem, the rearranged Y chromosome of patient #1972 was investigated in more detail.
Fluorescence In Situ Hybridization and Sequence Family Variant Typing Of Patient #1972
The overall integrity of the Y chromosome from patient #1972 was demonstrated by FISH of the cosmids LLOYNC03“M”34F05 (PAR1) and LLOYNC03“M”49B02 (PAR2) as well as the Y-centromere-specific probe Y-97 and the telomere-specific probe ‘all human telomeres’ (data not shown). Being aware of the complex structural organization of the human DAZ locus (
Typing the genomic DNA of patient #1972 for all four sequence family variants (SKY10/Tsp509I, SKY11/NlaIII, SKY12/MseI, and SKY13/Cac8I+TfiI) revealed the absence of one Y-derived non-allelic sequence variant (Table 3 and
Next, we investigated these SFVs in the two patients with the most distal breakpoints (#95 and #1947). Using genomic DNAs, we determined that both non-allelic variants of SKY11, SKY12, and SKY13 and one non-allelic variant of SKY10 were absent in patient #1947, whereas for all tested SFVs one non-allelic variant was absent in patient #95.
Taken together, these results provide evidence that the proximal breakpoint of the interstitial deletion present in the Y chromosome of patient #1972 resides within the interstitial deletion of patient #1947, thereby excluding this genomic region as a potential critical interval for GCY.
Refinement of the GCY Critical Interval
Based on the molecular analysis of the pericentric region of the long arm of the human Y chromosome (Williams and Tyler-Smith 1997), the physical extension of the GCY critical region as defined by the markers sY78 (DYZ3) and sY83 (DYS11) was estimated to constitute 1.6-1.7 Mb (
We generated 25 additional markers mainly by sequencing the end fragments of BAC, PAC, and YAC clones as well as clone-internal sequences amplified by various combinations of Alu-Alu oligonucleotide primer pairs. Of those, only 7 turned out to be Y-specific (SKY1, SKY2, and SKY4-8) (see Table 2 for detail). The BAC and PAC clones identified during the generation of the physical map are summarized in Table 4. Meanwhile, some of these clones have been completely sequenced as they form part of a tiling path for sequencing the human Y chromosome (Tilford et al. 2001). The proximal part of the cloned region between markers sY78 and SKY6 has not been sequenced to date. A selection of clones covering the entire GCY critical region is depicted in
Confirming the overlap between BAC RP11-295P22 and BAC RP11-322K23 appeared to be the most crucial step in the process of contig construction. Y-specific markers derived from the opposite end fragments of both clones were suspected to amplify identical-sized fragments from two different loci within the same 5 bp satellite region. By testing several restriction enzymes known to cut frequently within 5 bp satellites composed of the consensus sequence (TGGAA)n, we developed loci-specific PCR/restriction digestion assays. Typing all BAC clones mapping to this sequence block with the appropriate PCR/restriction digestion assay allowed us to precisely position them thereby confirming their overlaps.
In order to narrow down the critical interval for the GCY gene, we tested for the presence of the newly generated STS in patients #293, Y0308, and JOLAR. These results allowed us to define a small region for the GCY gene (
We have also established new Y-specific markers scattered uniformly across the entire 420 Kb of DNA (Table 5).
Exon Trapping in the GCY Critical Region.
The boundaries of GCY region are defined by two deletion patients, JOLAR and Y0308 (
In Silico Analysis of Annotated BAC Clones.
We analysed the genomic sequence of the complete GCY region using the gene prediction programs assembled by the NIX and Rummage software packages. Homologous sequences were also analysed in the non-redundant (nr) database of GenBank using the BLASTN or FASTA algorithm. BAC RP11-75F05, for example, includes a 1 Kb segment with a 77% homology to the transcriptional unit KIAA1470 on chromosome 1p36 (
BAC RP11-333E09 includes a deleted duplication (ADLY) of the adlican gene on chromosome Xp22 (ADLX). ADLX has been previously shown to be upregulated in osteoarthritic tissue and therefore likely plays a role in bone metabolism. The Y chromosome copy, therefore, constitutes an important candidate for a gene involved in growth. Despite the loss of exons 3 and 4 as a consequence of intrachromosomal recombination, its basic structural organization (
Using various gene-finding programs we detected 17 gene models in the GCY region (
In conclusion, there is no identity of exon trap clones and gene models/homologies or pseudogenes KIAA1470PY, ASSP6, and THC604695PY. Considering ADLY as the most attractive candidate for the GCY locus, we directly compared the exon/intron boundaries of the Y- and X-derived copy (Tab. 9b). Exons 3 and 4 of ADLX are deleted on the Y copy. The remaining 3 internal exons still possess correct 5′ and 3′ splice sites.
Searching for a Transcriptional Unit
Homology searches performed with all exon trap clones and predicted gene models against the dbEST segment of GenBank did not yield any Y-specific EST. PCR and PCR/restriction digestion assays with primers corresponding to all putative transcriptional units were carried out. Primers derived from all exons of ADLY (Tab. 7B, 7C), the most prominent GCY candidate, were used to screen reverse-transcribed polyA+-RNAs from osteosarcoma and bone marrow fibroblast cell lines. Whereas ADLX was shown to be expressed in all tested cell lines (with the exception of neuronal tissues), no ADLY-specific transcript was detectable. More extensive screening of polyA+-RNAs from various adult and fetal tissues basically led to the same result. We also tested all putative transcriptional units in the GCY region for expression in polyA+-RNAs from 21 tissues and 49 cDNA libraries. RT-PCR assays did not provide proof of a transcribed gene.
Evolutionary Features of the GCY Critical Region.
High sequence homology of the Y chromosome to other chromosomal regions is consistent with an evolutionarily recent transposition of those regions to the Y chromosome. More subtle nuances in synonymous nucleotide divergences of homologous gene pairs (Ks) allow their integration into distinct evolutionary strata, groups 1-4 (Lahn and Page 1999). The calculated Ks values for all gene pairs in the GCY region along with Ks values from reference genes of the different strata are given in table 6. We noted that the Ks values for all X-Y gene pairs can be grouped into the most recent evolutionary stratum (group 4), having embarked on X-Y differentiation 30 to 50 million years ago. This classification is independent of the actual functional state of X-chromosomal genes. Comparing Ks values between the Y-copies in the GCY region and their functional progenitors clearly demonstrates that decay of the X-chromosomal copies took place before the X-Y recombination occurred. Even more prominent is the difference between Ks values for the chromosome 1-chromosome Y gene pairs. The low Ks value for the KIAA1470P1/KIAA1470PY gene pair points towards a very recent transposition to the human Y (
As the frequency of nonsynonymous substitutions (Ka) is a function of both evolutionary time and selective constraints on the encoded proteins, the degree of constraint can be reflected in the ratio Ks/Ka (Li, 1993): values greater than one indicate the presence of constraints on both homologs, and values in the vicinity of one are consistent with a lack of constraint on at least one homolog. All determined Ks/Ka ratios suggest that natural selection on the Y copies is not ongoing, thereby underlining their pseudogene status.
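The interpretation rule for the Ks/Ka ratio can be expressed as a short sketch. The text does not state how close to one a ratio must be to count as "in the vicinity of one", so the tolerance below is an arbitrary, hypothetical choice, as are the function name and the example values; none of them are results from this study.

```python
def interpret_ks_ka(ks: float, ka: float, tolerance: float = 0.5) -> str:
    """Classify a gene pair by its Ks/Ka ratio.

    tolerance is a hypothetical cut-off defining "in the vicinity of one".
    """
    if ka <= 0:
        raise ValueError("Ka must be positive to form the ratio")
    ratio = ks / ka
    if ratio > 1 + tolerance:
        return "constraint on both homologs"
    if abs(ratio - 1) <= tolerance:
        return "lack of constraint on at least one homolog"
    return "not covered by the rule described in the text"


# Invented example values for illustration only.
print(interpret_ks_ka(ks=0.30, ka=0.05))  # ratio 6.0
print(interpret_ks_ka(ks=0.10, ka=0.09))  # ratio about 1.1
```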
We searched the nr database of GenBank with the homology transitions and the distal border of the GCY region to precisely determine the physical extent of the homologous regions on chromosomal subintervals 1q43 and Xp22. To identify highly conserved segments, we used Advanced PipMaker (Schwartz et al. 2000, http://bio.ces.psu.edu) for comparing the corresponding DNA. Inspection of the compound dot plot allows the identification of those portions of the GCY region absent in homologous sequences. As the overall homology of Y/1 and Y/X in conserved regions is already in the range of 94-97% and 96-99%, respectively, putative protein-coding exons are not expected to show average percent identities higher than the non-coding environment. Careful dot plot analysis showed that all novel sequences that have accumulated in the GCY region on the Y after the separation from its autosomal or X-chromosomal counterpart are exclusively of repetitive origin. Particularly evident is the preponderance of integrated LINE family members.
Discussion
Since the question of the existence of a Y-specific growth gene (GCY) was first raised, there have been several attempts to define its precise location. Whereas initial studies unanimously pointed towards a common region of the Y chromosome long arm (Salo et al. 1995), more recent investigations have led to the identification of two non-overlapping critical intervals (Rousseaux-Prevost et al. 1996, Ogata et al. 1995, De Rosa et al. 1997). FISH analyses resolved this apparent contradiction by presenting clear evidence that the patient materials used in these initial investigations contained 45,X0 cells and/or i(Yp) or idic(Yq11) chromosomes (Kirsch et al. 2000). Both genetic parameters influence the adult height of a given individual, thereby rendering it impossible to predict whether such patients have lost GCY or not. Studies with patients carrying de novo interstitial deletions are, therefore, much better suited to address the problem of GCY localization.
In the course of searching the literature for patients with small interstitial deletions, in particular close to the centromere, it became clear that such patients are very rare. This prompted us to extend our search to patients carrying large de novo interstitial deletions, irrespective of their actual adult height. We examined 9 adult patients, 7 of whom presented normal height. Furthermore, we could show overlapping deletions, thereby excluding the possibility that GCY resides between the Y-specific marker DYS11 and the pseudoautosomal region 2 (PAR2). Two patients, #293 and Y0308, presented interstitial deletions enabling the restriction of the GCY critical region to approximately 700 kb of DNA. This region is therefore predicted to harbour one or more genes required for normal human growth.
Exon Amplification and Gene Modeling in the GCY Region.
Although much attention has been drawn to the various azoospermia factor (AZF) critical regions in Yq11 as well as to Y-encoded testis-specific or ubiquitously expressed genes, the GCY region had not previously been searched systematically for transcription units. We have used exon amplification, homology search, and in silico gene prediction to identify putative genes within this region. This information now provides the means to test candidate genes for involvement in human linear growth regulation. To date, the major problem in defining the GCY gene has been the lack of potential transcription units assigned to this portion of the human Y chromosome. Prior to this study, only two pseudogenes, RPS24P1 and ARSFP, had been mapped to the GCY critical region (Sargent et al. 1999).
By exon amplification we isolated 9 different exon trap clones, two of which were composed of two exons. Parallel sequencing of the GCY region by the Human Genome Project allowed us to complete our catalog of potential transcription units in the GCY region. No Y-specific ESTs were assigned to the region. The Nix and Rummage software programs were used to analyze sequence data of completed BACs and to predict potential genes in the sequence. We have identified 4 new genes/pseudogenes and 17 gene models. Of the 17 gene models, only five have homologies to the identified genes/pseudogenes. A gene model homologous to ADLY (cf1) was uniformly predicted by all gene-finding programs. The probability assigned by the various gene-finding programs might, however, be overestimated for the gene model cf1. Very large exons, as present in ADLY, are less likely to be predicted correctly, but they are most unlikely to be missed completely. Consequently, the tendency to classify actual pseudogenes as functional genes increases with the presence of large exons. The failure to trap exons of the putative ADLY transcription unit, although it possesses correct splice sites, might be an intrinsic feature of Y-chromosomal sequences. Complete representation of the AZFc region in cosmid/P1 clones used for exon-trapping experiments (Reijo et al. 1995) led to the detection of DAZ as the only gene out of a possible 8 genes/gene families located in this region (Kuroda-Kawaguchi et al. 2001).
Surprisingly, we observed no concordance between the gene models and the exon trap clones. A possible explanation is that exon amplification depends on the presence of functional splice sites in the genomic sequence, whereas gene modeling is mainly based upon the in-phase hexamer measure (Rogic et al. 2001), a method that determines the incidence of oligonucleotides of length six in a specific open reading frame. In gene modeling, the prediction of correct splice sites is less important, since such signal sensors have low information content and are usually degenerate. Consequently, the exon trap clones need not necessarily be part of one of the predicted gene models, although a substantial fraction of the trapped exons (7/11) are composed of 75 to 200 nucleotides, a length range in which exons are most accurately predicted. Likewise, the putative exons assembled into a distinct gene model do not necessarily represent real exons.
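The counting step behind the in-phase hexamer measure mentioned above can be sketched as follows. This is a simplified illustration only: the function name and the example sequence are hypothetical, and gene-finding programs go further by comparing such counts against reference coding and non-coding frequencies to obtain a score, which is not shown here.

```python
from collections import Counter


def in_phase_hexamers(seq: str, frame: int = 0) -> Counter:
    """Count hexamers that start on codon boundaries of one reading frame.

    frame is the offset (0, 1, or 2) of the first codon in seq.
    Hexamers containing characters other than A, C, G, T are skipped.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    seq = seq.upper()
    counts = Counter()
    # Step by 3 so that every counted hexamer spans exactly two codons.
    for start in range(frame, len(seq) - 5, 3):
        hexamer = seq[start:start + 6]
        if set(hexamer) <= set("ACGT"):
            counts[hexamer] += 1
    return counts
```

For the hypothetical sequence `ATGGCCATGGCC` read in frame 0, the sketch counts `ATGGCC` twice and `GCCATG` once; shifting the frame by one yields a different set of hexamers, which is why the measure is sensitive to the reading frame.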
It is possible that the eventual number of genes in the GCY region is smaller, since exon trap clones and/or gene models may turn out to be part of the same transcripts or may not represent genes at all. Irrespective of the number of potential transcription units in the region, however, the search for the critical one might still be complicated by the fact that the phenotypic effect caused by mutation of the GCY locus is hard to define precisely. This makes it difficult to predict an expression profile, especially when the gene function is unknown. Since human linear growth is a multifactorial trait, growth failure is quite common. Although at least nine growth-controlling genes have been identified to date, only a few cases present disease-causing mutations within those genes. Definition of the transcription units in the region should now facilitate mutation studies, especially since full-length genes/pseudogenes have been isolated.
Although reverse-transcribed polyA+-RNAs and cDNA libraries have been extensively screened, we have not detected any transcript specific to the Y. This raises the question of whether our approach was suitable. To assess its usefulness, we verified the expression pattern of 20 genes known to be essential for bone development at GenePage (http://genome-www5.stanford.edu). Our screening efforts ensured that each selected gene was represented at least twice. This supports the existence of an unusual gene with an extremely confined spatial and/or temporal expression pattern.
Evolutionary Features as a Clue to the GCY Locus?
To gain more insight into the molecular genesis of the GCY critical region, we used two methods. First, we assessed the functional state of the genes/pseudogenes within the GCY region by comparing them with their direct and functional progenitors. All gene pairs showed Ks/Ka ratios of 1 to 2, which indicates that the Y copy is a pseudogene. This result assigns the X-Y gene pairs to evolutionary stratum 4, which fits very well, since all those gene pairs share a common evolutionary history. Only one gene pair of this class, AMELX/Y, still encodes a functional X- and Y-copy (Salido et al. 1992). The Y-copy of KIAA1470 could clearly be classified as a pseudogene by comparing it with its functional progenitor on 1p36. Second, we made use of large-scale sequence comparison in order to identify potential differences between the subintervals of the GCY region and their homologous counterparts in Xp22 and 1q43. Neither subregions with a conservation level above that of the molecular environment nor small genomic fragments newly integrated into the GCY critical region could be detected. Furthermore, promoter prediction carried out simultaneously on homologous genomic sequences revealed no differences. This clearly excludes substantial rearrangements within the GCY critical region and lends support to a gene underlying male-specific regulatory mechanisms.
The standard deviation score (SDS) was calculated based on the equation: SDS = (X − M)/SD, where X is an individual's adult height and M and SD are the mean adult height and the standard deviation of the normal population, respectively.
(M) mother,
(F) father,
(S) sister,
(B) brother,
(NA) not available.
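The SDS definition given above, SDS = (X − M)/SD, can be written as a short Python sketch. The function name and the reference values in the usage note are hypothetical and are not population data from this document.

```python
def height_sds(height_cm: float, mean_cm: float, sd_cm: float) -> float:
    """Standard deviation score: (X - M) / SD.

    height_cm is the individual's adult height (X); mean_cm (M) and
    sd_cm (SD) describe the reference population and must be supplied
    by the caller.
    """
    if sd_cm <= 0:
        raise ValueError("the standard deviation must be positive")
    return (height_cm - mean_cm) / sd_cm
```

For example, with an assumed reference mean of 176.0 cm and standard deviation of 6.0 cm, an adult height of 170.0 cm corresponds to `height_sds(170.0, 176.0, 6.0)`, that is, one standard deviation below the mean.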
Markers indicated with a * amplify DNA fragments from more than one genomic locus (see chapter "Restriction analysis of PCR products" for details).
*The submitted sequence of the chromosome 1-derived BAC clone RP11-560I18 (AC053522) does not show a Tsp509I restriction site within the genomic fragment amplified by the primer pair SKY10. Restriction analysis of fragments amplified from male and female genomic DNA, from a somatic cell hybrid line containing chromosome 1 as the only chromosome of human origin and from the BAC RP11-560I18 as well
*14A3C is a hybridization probe previously described by Tyler-Smith et al. 1993. It detects a Y-specific HindIII-fragment of 3.5 kb and an additional autosomal fragment.
AAAGAGAAGGGCCCTGTGAT
¹ Predicted product size in bp;
² Potential Y-derived transcript copies will be cut with the indicated restriction enzyme; potential X-derived transcripts remain uncut;
³ Indicates primer positions (orientation centromere to telomere) in the predicted gene-containing BAC (a, b, c or d).
¹ ADLY refers to the gene predicted according to homology comparison with functional X-adlican.
² Numbering of exons is based on the exon/intron organization of the X-copy. Please note: RT-PCR with cf1for/rev would generate different-sized products from adlican copies. cf1-4a/cf1-6453 and C21/cf1-4b amplification products encompass chromosome-specific restriction sites (cf1-4a/cf1-6453: Y-BamHI, X-PsyI; C21/cf1-4b: Y-NlaIII, X-SacI).
The product size of eta2 is 175 bp and that of etc4 is 166 bp. For single exon-trap clones, semi-nested PCR was carried out: a denotes the outer primer, b the inner one.
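The notes above distinguish X-, Y- and chromosome 1-derived amplification products by chromosome-specific restriction sites. A minimal Python sketch of such a check is given below. The recognition sequences listed are the standard ones for these enzymes; the function names and the example amplicon are hypothetical, and the sketch reports only where a recognition site starts, not the exact cleavage position or fragment sizes.

```python
# Recognition sequences (all palindromic, so one strand suffices).
RECOGNITION_SITES = {
    "Tsp509I": "AATT",
    "BamHI": "GGATCC",
    "NlaIII": "CATG",
    "SacI": "GAGCTC",
}


def site_positions(amplicon: str, enzyme: str) -> list:
    """Return 0-based start positions of the enzyme's recognition site."""
    site = RECOGNITION_SITES[enzyme]
    seq = amplicon.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping sites are also found.
        start = seq.find(site, start + 1)
    return positions


def is_cut(amplicon: str, enzyme: str) -> bool:
    """True if the amplicon contains at least one recognition site."""
    return bool(site_positions(amplicon, enzyme))
```

For the hypothetical amplicon `ACGTGGATCCTTGA`, `is_cut(..., "BamHI")` is true because the site starts at position 4, whereas `is_cut(..., "SacI")` is false; in the scheme described above, such a pattern would be read as a BamHI-cut, SacI-uncut product.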
*If chromosome X- or 1-derived copies of genes from the GCY region were not functional, Y-copies were additionally compared with their functional progenitors.
Number | Date | Country | Kind |
---|---|---|---|
0209640.2 | Apr 2002 | GB | national |
0215188.4 | Jul 2002 | GB | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP03/04546 | 4/25/2003 | WO | | 12/7/2005 |