The object of the present invention is the in vitro diagnosis of polycystic ovary syndrome (PCOS) or of the predisposition to develop this syndrome, as well as the detection of the existence of, or the predisposition to develop, certain pathologies that represent cardiovascular risk factors and that are frequently associated with polycystic ovary syndrome. The in vitro diagnosis method is based on the detection of CAPN5 (calpain-5) gene polymorphisms. The presence of or predisposition to develop PCOS itself, hypercholesterolemia (both in terms of total cholesterol and specifically of LDL-cholesterol), high blood pressure (specifically a rise in diastolic pressure), obesity, glucose intolerance, diabetes, hypertriglyceridemia, and low HDL-cholesterol levels, as well as the grouping of some of these cardiovascular risk factors, known as metabolic syndrome, can be diagnosed by means of the assay method and the assay device of the invention.
Polycystic ovary syndrome (PCOS) is the most common endocrine disorder affecting women of child-bearing age. A prospective study reported an overall prevalence of PCOS of 6.5% in Spain.
Despite this prevalence, little is known about the etiology of PCOS, but there is increasing evidence of an important genetic component (Legro et al., 1998; Govind et al., 1999). However, the mode of inheritance of PCOS is still uncertain and recent studies indicate that this disorder could be a complex trait (Crosignani and Nicolosi, 2001). This means that several genes interact with environmental factors (especially dietary factors) to determine the typically heterogeneous clinical and biochemical phenotype (Weiss and Terwilliger, 2000). Biochemical parameters including fasting insulin levels or hyperandrogenemia appear to be highly heritable, suggesting that some clinical signs, symptoms or biochemical parameters of PCOS could be transmitted as autosomal dominant or X chromosome-linked Mendelian traits (Legro and Strauss, 2002).
The current data strongly support an association between PCOS and the risk of suffering from long-lasting diseases. The pathologies that have been associated with PCOS include:
This emphasizes the need for early diagnosis of the syndrome and close follow-up of women with PCOS.
In searching for the molecular bases of PCOS, several research groups have started a systematic search for genetic risk factors involved in the predisposition to suffer from PCOS and in its prognosis (Legro and Strauss, 2002). Genes involved in reproduction, genes that affect the secretion or action of insulin, and genes involved in obesity and energy regulation have been studied as candidate genes. Attention has specifically been focused on genes encoding steroidogenic enzymes of the androgen biosynthesis pathway (Urbanek et al., 1999; Gharani et al., 1997; Carey et al., 1994) and on the genes involved in the secretion and action of insulin (Urbanek et al., 1999; McKeigue and Wild, 1997; Waterworth et al., 1997; Talbot et al., 1996; Eaves et al., 1999).
PCOS has been associated with a 2- to 7-fold increased risk of suffering from type 2 diabetes mellitus (T2DM). Previous epidemiological and genetic studies have shown that PCOS and T2DM could share genetic predisposition factors. Using this working hypothesis, several studies have suggested that genes related to T2DM could have an important role in the pathogenesis of PCOS (Ehrmann, Tang et al., 2002; Ehrmann, Schwarz et al., 2002; González et al., 2002; Escobar-Morreale et al., 2002). The first T2DM predisposition gene identified by a genome-wide scan and positional cloning was a ubiquitously expressed member of the family of calpain-like cysteine proteases, calpain-10 (CAPN10) (Horikawa et al., 2000). Association studies using intragenic markers of the CAPN10 gene have shown that different alleles can contribute to the genetic predisposition to suffer from T2DM in several populations (Horikawa et al., 2000; Evans et al., 2001; Garant et al., 2002). Thus, for example, document WO 00/23603 describes the detection of a polymorphism in the CAPN10 gene associated with the predisposition to suffer from type 2 diabetes mellitus. Although it indicates that the method could be valid for detecting polymorphisms in portions encoding other proteases, especially of the calpain family, it does not describe the search for or the detection of any polymorphism other than the one in the CAPN10 gene, nor does it examine associations with disorders other than type 2 diabetes mellitus.
New candidate risk alleles and genotypes within the CAPN10 gene have recently been identified, which could be associated with important phenotypic differences observed in patients with PCOS (Ehrmann et al., 2002; González et al., 2003), hypercholesterolemia (Daimon et al., 2002; González et al., 2003), hypertension (Hong et al., 2002), obesity (Shima et al., 2003) and hypertriglyceridemia (Carlsson et al., 2004). As in the previous case, these works do not show alterations in genes other than CAPN10 that could also be related to the predisposition to develop PCOS or the mentioned cardiovascular risk factors.
CAPN5 is a paralogue (a gene homologous to another gene within the same species which has been generated by gene duplication) of the CAPN10 gene which, like the latter, encodes a protease which has been implicated in regulating several cell functions, including intracellular signaling, proliferation, differentiation and apoptosis (Ono et al., 1998; Sorimachi et al., 1997; Suzuki et al., 1992). Although the proteases encoded by both CAPN5 and CAPN10 tend to be considered ubiquitous proteins, consulting the NCBI (National Center for Biotechnology Information) website, http://www.ncbi.nlm.nih.gov/, shows that there are tissues in which they are not expressed, and that the expression patterns of the proteins encoded by each gene also differ. The number of isoforms is also different, 8 isoforms being known for CAPN10 (ranging between 138 and 672 amino acids) and only one for CAPN5 (640 amino acids). Considering the larger CAPN10 isoforms, it is acknowledged that the organization in structural domains of the protein sequences encoded by the CAPN5 and CAPN10 genes is considerably similar, although the alignment of the protein sequences of CAPN5 (640 amino acids) and isoform a (672 amino acids) of CAPN10 shows that only 31% of the amino acids of both proteins are identical and in the same position. As is to be expected between paralogues, the gene sequences are considerably homologous, although they differ in length (49,136 bp in CAPN10, reaching 59,192 bp in CAPN5), in the number of exons and introns (15 exons in the case of CAPN10, 13 in the case of CAPN5), in the presence of an intragenic gene within intron 3 of the CAPN5 gene (OMP: olfactory marker protein) and in their location in the genome (CAPN10 is located in chromosome 2, specifically in 2q37.3, whereas CAPN5 is in chromosome 11, in 11q14).
In spite of the similarities with CAPN10, the state of the art discussed above for the latter gene had neither suggested nor analyzed a possible association between polymorphisms in the CAPN5 sequence and the predisposition to suffer from type 2 diabetes mellitus, PCOS or the disorders commonly associated with PCOS. The differences observed between CAPN10 and CAPN5 regarding their position in different chromosomes, their different isoforms, the low percentage of identity between their amino acid sequences, the different number of exons and introns and, especially, a different expression pattern suggest that CAPN10 and CAPN5 have different functions and are involved in different mechanisms or metabolic pathways, which in turn can give rise to their involvement in unrelated diseases.
For its part, document WO 02/45491 describes a method for searching for associations between certain proteases (including CAPN5) and the predisposition to suffer from certain disorders, but the method is different: generating cell lines and transgenic mice with the modified CAPN5 gene and observing the possible phenotypic changes produced in them. Although it is suggested that comparing the genomic or cDNA sequences of individuals with disorders with the sequences of healthy individuals could be useful for detecting possible mutations involved, no specific method or candidate mutated gene is detailed, and all the comments on this matter are focused on the case of cancer.
Franz et al. (2004) also deal with the study of the consequences that alteration of the calpain 5 gene can have in the organism. The method used is again different from that of the invention: the generation of transgenic mice with the modified CAPN5 gene and the observation of the tissues in which the modified protein is expressed and of the possible phenotypic changes generated in the mice. Although a reduction in the viability of the transgenic mice is mentioned, no correlations with any specific disorder are established.
By means of genotyping the Finnish population, Silander et al. (2004) identified a wide region in chromosome 11 which included genes such as CAPN5 and THRSP, with a high linkage to the predisposition to develop diabetes. Said linkage was not assigned to any specific gene, to any polymorphism within any gene nor to any haplotype related to said polymorphisms.
However, there is no disclosure in the state of the art that directly relates changes in the CAPN5 gene sequence to the development of or the predisposition to develop PCOS in humans, to any of the pathologies with which said syndrome is associated, or to pathologies involving cardiovascular risk factors, whether or not associated with PCOS. The present invention develops a methodology based on the CAPN5 gene and on the detection of the following polymorphisms in said gene: Nt g.86 A>G, Nt g.344 G>A, Nt c.1320 C>T and Nt c.1469 G>A.
The object of the present invention is to show the role of CAPN5, the gene encoding the CAPN5 protein, in the predisposition to suffer from both PCOS and any of the pathologies considered as cardiovascular risk factors (obesity, diabetes, hypertension, hypercholesterolemia and hypertriglyceridemia), which frequently occur associated to PCOS. To that end, two association studies were carried out by studying the possible relationship between the occurrence of the pathologies and the frequency of four intronic polymorphisms within the CAPN5 gene (Nt g.86 A>G, Nt g.344 G>A, Nt c.1320 C>T and Nt c.1469 G>A) in the following population groups:
1. In the case of PCOS, a series of patients (148 women affected by PCOS) was used in whom PCOS had been detected by means of ultrasound scans combined with one or more clinical features of PCOS, and the frequency of occurrence of said polymorphisms was compared with that of normal controls.
2. In the case of cardiovascular risk factors, an intra-cohort study (within the population group itself) was carried out in 606 unrelated individuals (278 men and 328 women), observing in them the presence or absence of fourteen phenotypic characteristics involving cardiovascular risk factors by themselves or as a result of the association thereof (the case of metabolic syndrome) and the possible relationship with the occurrence of the mentioned intronic polymorphisms.
In order to analyze the role of CAPN5 in the occurrence of these phenotypic features, corresponding haplotype-genotype-phenotype correlation studies of the CAPN5 gene were carried out in the populations corresponding to each of the two studies (the 148 women affected by PCOS on one hand and the 606 unrelated individuals on the other).
Attempts were also made to locate other CAPN5 gene variants that might be involved in disorders caused by alterations in the expression of said gene. For this purpose, a new analysis of the available CAPN5 sequence was carried out with different software, which located two polymorphisms that could have a deleterious effect on gene expression and contribute to the phenotypic profiles of the second study. In order to test this hypothesis, a subgroup of the population of the study regarding the relationship with cardiovascular risk factors was selected and it was determined whether these variants were present in their genome.
Patients
The study population consisted of 148 unrelated women with PCOS. All the patients and controls were Caucasian (European white).
Following Homburg (2002), PCOS was defined by the presence of bilateral polycystic ovaries detected in examinations by means of ultrasound scans: up to a total of fifteen to twenty follicles with a diameter of less than 10 mm between both ovaries, an increase in the stroma volume and an increase in ovary volume (>9 ml). In addition to said ultrasound scans, the following clinical features were determined: 1) prolonged and idiopathic amenorrhea, defined as the absence of menstruation for more than 3 consecutive months (as opposed to normal monthly periodic menstruation or eumenorrhea) and/or oligomenorrhea (menstrual cycles >35 days), 2) clinical and/or biochemical hyperandrogenism: hirsutism according to the Ferriman and Gallwey criteria, acne, alopecia, increase in testosterone (T) levels (>3 ng) or 3) anovulatory infertility. Regarding the clinical feature of hirsutism, there are women with PCOS who do not have hirsutism; therefore, both subgroups are distinguished when clinical and biochemical criteria for the assignment of patients with PCOS are used (see Table VII). Patients meeting any of the following exclusion criteria were excluded: hyperprolactinemia, thyroid disorders, and nonclassic 21-hydroxylase deficiency (Zawadzky and Dunaif, 1992). To estimate the population frequencies of the SNPs analyzed, 94 non-selected healthy controls of the same race and geographical region were genotyped anonymously.
DNA Extraction
5 ml of peripheral blood were obtained from all patients and controls to isolate germ line DNA from leukocytes. DNA extraction was performed according to standard procedures using the NucleoSpin Blood Kit (Macherey-Nagel). DNA aliquots were prepared at a concentration of 5 ng/μl to perform PCRs. The rest of the stock was cryopreserved at −20° C.
Single Nucleotide Polymorphisms (SNP)
The genomic sequence used to carry out this study corresponds to the genomic sequence of the CAPN5 gene. The genomic sequence containing the CAPN5 gene (Contiguous Genomic Segment Group NT_033927, locus ID 726, ENSG00000149260) was identified using the blat tool in the http://www.ucsc.edu webpage, belonging to UCSC (University of California Santa Cruz), with a probe for the mRNA of CAPN5 (GenBank accession number NM_004055). The gene spans 59,192 bp comprising 13 exons and 12 introns in 11q14.
To search for SNPs, germ line DNA from forty unrelated individuals was used, and the gene was screened by selecting two candidate areas, where possible defined as clusters (calpain gene regions in which the polymorphisms are close enough to one another that, in each of these regions, they are not transmitted to offspring independently, but as a single block), located in intron 1 and intron 3 (FIG. 1). Four variable DNA sites were identified (see FIG. 1) using two-way automated DNA sequencing, and these sites were organized in two clusters: 1) two polymorphisms located in the 5′ region, intron 1 of the CAPN5 gene: Nt g.86 A>G and Nt g.344 G>A (according to GenBank accession number AY547311), and 2) two polymorphisms located in the region encoding the OMP (Olfactory Marker Protein) gene, intron 3 of the CAPN5 gene: Nt c.1320 C>T and Nt c.1469 G>A (according to GenBank accession number U01212). The information regarding the detected markers was compared to the information of the UCSC Genome Bioinformatics and Genome Database (GDB) web pages.
These polymorphisms were genotyped in parallel using automated DNA sequencing methods. Amplification primers covering the entire 5′ region of the study were designed for the PCR (Table I).
Primers and internal probes used to amplify and detect the genotypes of Nt g.86 A>G, Nt g.344 G>A, Nt c.1320 C>T and Nt c.1469 G>A of the CAPN5 gene.
PCR was performed with a final volume of 10 μl using 10 ng of genomic DNA, 1 mM of each amplification primer, 4.4 mM of MgCl2 and 1 μl of the reaction mixture LC Faststart DNA Master SYBR green I (Roche Applied Science). The amplification conditions during PCR in the Lightcycler thermal cycler were an initial denaturation step at 95° C. for 7 minutes, followed by 40 cycles of 95° C. for 0 seconds, 67° C. for 10 seconds and 72° C. for 45 seconds. The PCR products were directly purified and sequenced using the primers Capn5F (SEQ ID NO: 7) and Capn5R (SEQ ID NO: 8). The sequencing reactions were carried out using a CEQ dye terminator cycle sequencing quick start kit (Beckman Coulter, Inc.), according to manufacturer's instructions, and were analyzed with the CEQ™ 8000 genetic analysis system (Beckman Coulter, Inc.).
Amplification primers and fluorescent detection probes were designed and synthesized for the PCRs of the CAPN5 gene using the Web Primer software (genome-www2.stanford.edu/cgi-bin/SGD/web-primer) following the manufacturer's instructions. The selected primer pairs and the detection probes are summarized in Table I. The technique used is called FRET (fluorescence resonance energy transfer).
PCR conditions: Real-time PCR was performed in the LightCycler system (Roche) using previously published reaction conditions (González-Gómez et al., 2003; Real et al., 2001, Buch et al., 2003). PCR was performed to amplify the segments of the CAPN5 gene flanked by the two polymorphic sites within intron 3. The PCR reactions were carried out in a final volume of 10 μl using 10 ng of genomic DNA, 0.5 mM of each amplification primer, 4.4 mM MgCl2, 0.5 mM of each detection probe and 1 μl of LC Faststart DNA Master hybridization probes (Roche). An initial denaturation step of 95° C. for 7 minutes, followed by 40 cycles of 95° C. for 0 seconds, 68° C. for 10 seconds, and 72° C. for 40 seconds was used.
Melting curves: the conditions for obtaining optimal melting curves and spectrofluorimetric genotypes were 95° C. for 0 seconds, 63° C. for 25 seconds, 45° C. for 0 seconds and 80° C. for 0 seconds (with a temperature transition rate of 20° C./s in each step, except the last step, in which the temperature transition rate was 0.1° C./s). In the last step, a continuous fluorometric register was performed (F3/F1 for Nt c.1320 C>T and F2/F1 for Nt c.1469 G>A), fixing the gains of the system at 1, 50, and 50 on channels F1, F2, and F3, respectively. The genotype results using real-time PCR are shown in FIG. 1. In order to study the specificity of these assays, selected amplicons with different melting patterns were sequenced using an automated DNA sequencer (Beckman Coulter CEQ™ 8000 genetic analysis system).
In order to compare allele and genotype frequencies between non-selected patients and controls, χ2 tests with Yates' correction were carried out using the Statcalc module of the EpiInfo 5.1 analysis software (Centers for Disease Control, Atlanta, Ga.).
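By way of illustration only, the Yates-corrected χ2 statistic for a 2×2 table of allele counts can be sketched as follows. This is not the Statcalc implementation; the function name and the example counts are hypothetical and do not come from the study.

```python
import math


def yates_chi2(a, b, c, d):
    """Yates-corrected chi-square for the 2x2 table [[a, b], [c, d]].

    Rows could be cases/controls and columns the two alleles
    (hypothetical layout). Returns (chi2, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / denominator
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Illustrative counts only, not data from the study.
chi2, p = yates_chi2(10, 20, 20, 10)
print(round(chi2, 2), round(p, 3))
```

The p-value uses the closed form for one degree of freedom, so no statistics library is required.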
Tests adapted from Sasieni (1997) were used for the statistical analysis of the genotype distribution, tests for deviation from Hardy-Weinberg equilibrium, and two-point association studies. These calculations were carried out using the on-line resources of the Institute of Human Genetics, Munich, Germany (http://ihg.gsf.de).
The linkage disequilibrium value (D′) between the studied genetic markers, the haplotype frequencies and the association analysis based on haplotypes were calculated using the Thesias software available at http://genecanvas.ecgene.net (Tregouet et al., 2002; Tregouet et al., 2003).
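As a rough sketch of what a Hardy-Weinberg check involves, the following computes a plain Pearson goodness-of-fit statistic from genotype counts. It is a simplified stand-in, not the Sasieni-adapted test of the on-line tool; the function name and the counts are hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are the observed genotype counts for a biallelic
    marker. Returns (allele_a_frequency, chi2, p_value), with the p-value
    taken from a chi-square distribution with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


# Illustrative counts only, not data from the study.
freq, chi2, p_value = hwe_chi2(49, 42, 9)
print(freq, round(chi2, 4), round(p_value, 4))
```

A large p-value, as reported for the markers of this study, means the observed genotype counts are compatible with Hardy-Weinberg proportions.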
The polymorphic allele frequencies at four loci within the CAPN5 gene (Nt g.86 A>G, Nt g.344 G>A, Nt c.1320 C>T and Nt c.1469 G>A) were analyzed in 148 PCOS cases and compared with those of normal controls. The allele frequency in the Spanish population was 0.71 for allele A in Nt g.86 A>G, 0.54 for allele G in Nt g.344 G>A, 0.93 for allele C in Nt c.1320 C>T and 0.81 for allele G in Nt c.1469 G>A. The difference in the allele frequencies of these genetic markers between PCOS cases and the controls was not statistically significant.
The genotype distribution was also determined for each polymorphism and was compared between women with PCOS and healthy women. The genotype frequencies observed during this study are consistent with Hardy-Weinberg equilibrium (p>0.38, Table II). When the genotypes observed in patients with PCOS are compared with those obtained in the controls, the distribution of the Nt g.86 A>G genotypes does not appear to differ to a considerable extent, although the difference is significant at the 0.05 level (χ2=4.08, p=0.04, Table II). The analysis of the genotypes observed in the patients with PCOS against those obtained in the controls provides experimental evidence of the involvement of CAPN5 in the predisposition to suffer from PCOS, suggesting that the Nt g.86 A>G genotype was associated with PCOS in the studied population.
There were no statistically significant differences in the genotype distribution between the patients with PCOS and the controls in the case of the remaining SNPs (Table II).
a The association test is adapted from Sasieni (1997), test for heterozygotes.
Standardized pairwise linkage disequilibrium (D′) studies were carried out using the Thesias software in order to analyze the degree to which these polymorphisms are in linkage disequilibrium (LD). The “linkage disequilibrium” concept refers to the fact that, when two polymorphisms lie close together in the same region, they are not transmitted independently to the offspring, but as a single block. The degree to which this joint transmission of two markers occurs is quantified by means of the parameter D′, the values of which range between +1 and −1; a value of D′=+1 indicates complete linkage disequilibrium, i.e., the alleles are always transmitted together. The sign is positive when the least frequent alleles of each polymorphism are transmitted together and negative when the most frequent allele of one and the least frequent allele of the other are transmitted together (Lewontin and Kojima, 1960).
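For orientation, the signed D′ described above can be computed from allele and haplotype frequencies as in the following sketch. It assumes the haplotype frequencies are already known (Thesias estimates them from unphased genotypes, which this sketch does not attempt); the function name and the example frequencies are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Signed, normalized linkage disequilibrium D' between two markers.

    p_a and p_b are the frequencies of the least frequent allele at each
    marker, and p_ab is the frequency of the haplotype carrying both of
    them. The result is positive when the two minor alleles tend to be
    transmitted together and negative otherwise.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a marker is monomorphic")
    return d / d_max


# Illustrative frequencies only, not data from the study.
print(d_prime(0.20, 0.30, 0.20))  # minor alleles always together
print(d_prime(0.00, 0.30, 0.20))  # minor alleles never together
```

Dividing by the largest value D could take given the allele frequencies is what confines D′ to the interval from −1 to +1.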
The results obtained with the study population are shown in Table III.
The linkage disequilibrium analysis shows that both polymorphisms of cluster 1 (Nt g.86 A>G and Nt g.344 G>A) are in complete LD and that both polymorphisms of cluster 2 (Nt c.1320 C>T and Nt c.1469 G>A) are also in complete LD. These two blocks of consistently high LD are separated by an interval in which LD is interrupted, even though the polymorphisms are physically close (~30 kb). The absence of linkage disequilibrium between the two clusters suggests the presence of a recombination hot spot between them.
The construction of haplotypes formed by the SNPs at the four loci within the genomic region of the CAPN5 gene and the association analysis based on haplotypes were carried out using the Thesias software. When SNPs are in high LD, only a small proportion of the possible haplotypes actually occurs. Accordingly, nine unique haplotypes, labeled A-I, were detected among the combined cases and controls in the analyzed population. Apart from the haplotype having the wild-type allele at each of the polymorphic loci (haplotype D), there were eight haplotypes formed from several combinations and permutations of the sequence variants and the wild-type sequence at each locus (Table IV). Overall, the allele distribution between patients with PCOS and the controls across these nine haplotypes was not statistically different (χ2=7.77, p=0.45).
Asymptotic tests showed that haplotype H was over-represented in patients with PCOS when compared with healthy women (χ2=4.60; OR=2.37, p=0.03, Table V). Haplotype G was under-represented in the cases when compared with the controls, although only a tendency to association was observed (χ2=3.38; OR=0.59, p=0.06, Table V).
aAsymptotic studies;
bThesias software;
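The odds ratios (OR) quoted for individual haplotypes compare the odds of carrying a haplotype in cases with the odds in controls. A minimal sketch of that calculation, with a Woolf (log-based) 95% confidence interval added as an assumption of this example, is given below; the function name and the counts are hypothetical and are not taken from Table V.

```python
import math


def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio for carrying a haplotype, with a Woolf 95% interval.

    Each argument is a count of chromosomes (or individuals) with or
    without the haplotype in cases and in controls.
    Returns (odds_ratio, lower_bound, upper_bound).
    """
    counts = (case_with, case_without, control_with, control_without)
    if any(count <= 0 for count in counts):
        raise ValueError("all four counts must be positive")
    ratio = (case_with * control_without) / (case_without * control_with)
    # Standard error of log(OR) under Woolf's approximation.
    se = math.sqrt(sum(1.0 / count for count in counts))
    lower = math.exp(math.log(ratio) - 1.96 * se)
    upper = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, lower, upper


# Illustrative counts only, not data from the study.
ratio, lower, upper = odds_ratio(20, 80, 10, 90)
print(round(ratio, 2), round(lower, 2), round(upper, 2))
```

An OR above 1 corresponds to over-representation of the haplotype in cases and an OR below 1 to under-representation.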
A series of haplotype context tests was also carried out for each of the four polymorphic loci. The effect of the alleles of each polymorphism on the phenotype was analyzed within a haplotype context that differs only at the position of the given polymorphism. In these comparisons, it was observed that the Nt g.86 A>G polymorphism was only associated with PCOS when it occurred together with the haplotype context (-GCG), haplotypes D-G (OR=0.45, p=0.02, Table VI), and a tendency to association with the haplotype context (-GCA), haplotypes E-H (OR=4.41, p=0.09, Table VI), was observed. The Nt c.1469 G>A polymorphism was only statistically significant with the haplotype context (GGC-), haplotypes G-H (OR=5.47, p=0.04, Table VI). These results showed that there are three groups of markers associated with PCOS and, interestingly, that the four haplotypes included in these groups (D, E, G and H) share a common consensus haplotype, -GC-, specific for the CAPN5 gene, which confers predisposition to suffer from PCOS.
Genotypes comprising pairs of CAPN5 haplotypes were generated for the cases and the controls, giving 30 different genotypes. Inspection of the genotypes in these two groups showed that the genotypes DE, GH and HH were over-represented among the patients with PCOS, together accounting for 10% of the PCOS genotypes, and, interestingly, these combinations of haplotypes were present in only one normal control (χ2=7.00, p=0.008, FIG. 2). The four haplotypes involved are again D, E, G and H.
Preliminary studies were finally carried out to examine the role of the alleles of the CAPN5 gene in the presence of phenotypic characteristics in patients with PCOS. It was decided to compare the distribution of genotypes and haplotypes of the CAPN5 gene between groups of patients, divided depending on the presence/absence of specific phenotypic features:
Table VII shows the results of the phenotype-genotype correlation studies between the SNPs of CAPN5 and some clinical symptoms related to PCOS. The results indicate that CAPN5 markers have predictive power to infer pre-symptomatically severe complications in PCOS women with no hirsutism. A family history (FH) of obesity, menstrual alterations and familial forms of cancer in PCOS women with hirsutism, and the presence of type 2 diabetes mellitus in PCOS women, would also be highly correlated with said CAPN5 markers.
Study Population
The study population consisted of 606 unrelated individuals (278 men, 46%; 328 women, 54%) selected at random. For each of these individuals it was determined whether or not they showed each of the following fourteen phenotypic characteristics: 1) obesity, 2) abdominal obesity, 3) systolic hypertension, 4) diastolic hypertension, 5) combined (systolic and diastolic) hypertension, 6) diabetes, 7) glucose intolerance, 8) hyperinsulinemia, 9) insulin resistance, 10) hypertriglyceridemia, 11) hypercholesterolemia, 12) low levels of cholesterol associated with high density lipoproteins (hereinafter HDL-cholesterol), 13) high levels of cholesterol associated with low density lipoproteins (hereinafter LDL-cholesterol) and 14) metabolic syndrome, constituting the grouping of some of these factors. The first thirteen phenotypic characteristics were chosen because they are normally considered to be cardiovascular risk factors, which usually occur grouped, thus constituting the so-called metabolic syndrome, phenotypic characteristic number 14 of the study. There are various practical classifications to determine which individuals have metabolic syndrome, but the most widely used is that of the Adult Treatment Panel III (ATPIII) report of the NCEP (National Cholesterol Education Program, 2001), which considers metabolic syndrome to be the presence of at least three of the following cardiovascular risk factors:
a) abdominal obesity
b) hypertriglyceridemia
c) low HDL-cholesterol levels
d) hypertension (systolic and/or diastolic)
e) high fasting glucose
Like high glucose levels in an oral glucose tolerance test, the latter characteristic is considered to be a symptom of glucose intolerance and, as such, a step prior to diabetes. Both glucose intolerance and type 2 diabetes mellitus are among the disorders that can occur as a result of insulin resistance, a pathological condition characterized by the absence of a response to insulin in peripheral tissues, which means that the tissues cannot recognize it and cannot use it even though blood sugar levels are high. As a result, a series of metabolic disorders occur, which include dyslipidemia (hypertriglyceridemia, low HDL-cholesterol levels . . . ), abdominal obesity, hyperuricemia, hyperandrogenism etc., in addition to the already mentioned glucose intolerance and type 2 diabetes mellitus. For this reason, different methods are usually used to assess insulin resistance. One of them is the homeostatic model assessment (HOMA), used in this study, which relates the insulin and glucose blood levels. The specific formula used is:
HOMA = [insulin concentration (μU/ml) × glucose concentration (mmol/l)] / 22.5
Insulin resistance was judged to be present for values equal to or greater than that determined by the 75th percentile: HOMA=3.67.
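The HOMA formula and the 75th percentile cutoff stated above can be applied as in the following sketch. The function names are hypothetical and the example insulin and glucose values are illustrative, not measurements from the study.

```python
HOMA_CUTOFF = 3.67  # 75th percentile reported for the study population


def homa(insulin_uU_per_ml, glucose_mmol_per_l):
    """HOMA index: insulin (uU/ml) x glucose (mmol/l) / 22.5."""
    return insulin_uU_per_ml * glucose_mmol_per_l / 22.5


def is_insulin_resistant(insulin_uU_per_ml, glucose_mmol_per_l):
    """True when the HOMA index is equal to or greater than the cutoff."""
    return homa(insulin_uU_per_ml, glucose_mmol_per_l) >= HOMA_CUTOFF


# Illustrative values only, not measurements from the study.
print(round(homa(10.0, 4.5), 2))        # 10 x 4.5 / 22.5
print(is_insulin_resistant(20.0, 5.0))  # 100 / 22.5 is above the cutoff
```

Note that glucose must be expressed in mmol/l; values in mg/dl would first have to be converted.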
The reference values used to judge the presence or absence of the other cardiovascular risk factors mentioned also vary between different authors.
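Whatever reference values are adopted for the individual factors, the ATPIII rule cited earlier reduces to counting how many of the five listed factors are present. A minimal sketch follows; the factor keys and the function name are hypothetical, and each factor is passed as an already-decided presence/absence flag because the thresholds themselves are not reproduced here.

```python
ATP3_FACTORS = (
    "abdominal_obesity",
    "hypertriglyceridemia",
    "low_hdl_cholesterol",
    "hypertension",
    "high_fasting_glucose",
)


def has_metabolic_syndrome(flags, minimum=3):
    """Apply the 'at least three of five factors' ATPIII rule.

    flags maps factor names from ATP3_FACTORS to True/False; factors
    that are missing from the mapping are treated as absent.
    """
    unknown = set(flags) - set(ATP3_FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    present = sum(bool(flags.get(name, False)) for name in ATP3_FACTORS)
    return present >= minimum


# Illustrative individual only, not a subject from the study.
subject = {
    "abdominal_obesity": True,
    "hypertriglyceridemia": True,
    "hypertension": True,
}
print(has_metabolic_syndrome(subject))
```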
In the present study, in an attempt to consider the features associated with diabetes, the steps prior to its development and the set of alterations constituting the metabolic syndrome, it was decided to analyze the aforementioned thirteen phenotypic characteristics, as well as the grouping of some of them constituting the metabolic syndrome as such. For each of these characteristics, each individual was assigned to the case group or to the control group in order to carry out the corresponding association study between the calpain-5 gene polymorphisms and that specific phenotypic characteristic. The assignment was carried out according to the criteria shown in Table VIII, which also shows the distribution of cases and controls obtained for each phenotypic characteristic studied and cites the references in which said criteria are mentioned.
a BMI: Body Mass Index, calculated as BMI = weight (kg)/height² (m²)
bOGTT: Oral Glucose Tolerance Test. Blood level 120 minutes after an oral overload
An association between insulin resistance and the CAPN5 gene was not detected in the subsequent study of the phenotypic characteristics shown in Table VIII. Therefore, it does not appear in the tables below.
The anthropometric and biochemical data of each individual were collected to carry out the studies. For the qualitative studies, the assignment to the case group or to the control group for each of the phenotypic characteristics studied was carried out according to said collected anthropometric and biochemical parameters, although in said studies, as shown in Table VIII, the individuals who are receiving antihypertensive medication are considered to be hypertensive.
The specific anthropometric and biochemical values corresponding to each individual were also used for the quantitative studies, although in these studies the blood pressure values of the individuals who were receiving antihypertensive medication were eliminated.
An intra-cohort study was carried out in this population to estimate, according to the analyzed phenotypic characteristic, the occurrence frequency in cases and controls of the same SNPs of the CAPN5 gene studied in the patent specification for which the present specification is an addition.
A new analysis of the available CAPN5 sequence was also carried out with software different from that used to locate the SNPs of the invention, which allowed locating two additional CAPN5 variants with the possibility of being deleterious. In order to determine if this was so, a sub-cohort of the study group of 606 individuals which has just been described was selected and it was analyzed whether these variants were present in their genomes. The performed studies are described in Example 17 and conclude with negative results. The absence of a relationship between the considered pathologies and the CAPN5 gene in the latter analyzed population subgroup is a good example of the difficulties involved in determining the gene variations that are really related to diseases, disorders and pathologies, it not being evident that a variant or SNP present in a gene will determine the occurrence of a certain pathology or the predisposition to suffer from it.
DNA was obtained by following a procedure similar to the one described in the association study between PCOS and CAPN5: 5 ml of peripheral blood were obtained from all patients and controls to isolate germ line DNA from leukocytes. DNA extraction was performed according to standard procedures using NucleoSpin Blood Kit (Macherey-Nagel). DNA aliquots were prepared at a concentration of 5 ng/μl to perform PCRs. The rest of the stock was cryopreserved at −20° C.
The same four single nucleotide polymorphisms (SNP) of the CAPN5 gene, the identification of which is described in the section “Single Nucleotide Polymorphisms” in the section of the study of the relationship between PCOS and the CAPN5 gene polymorphisms, were initially used as polymorphic markers to carry out the study.
These polymorphisms were analyzed in parallel and genotyped in the same manner as in Example 1 of the section of the study regarding PCOS, as described in Examples 8 and 9 et seq.; the statistical analysis of the results was subsequently carried out as described in Examples 10 to 15.
A haplotype re-analysis of the CAPN5 gene was subsequently carried out in the study population using a mathematical model different from the one used in Example 12 (Thesias), which reinforced the results obtained with this first mathematical model, even improving the statistical significance of the relationship of some of the haplotypes with some of the phenotypic features analyzed. This re-analysis is described in Example 16.
As in the case of Example 1, these polymorphisms were genotyped in parallel using automated DNA sequencing methods. Amplification primers covering the entire 5′ region of study were designed for the PCR. Due to the high number of individuals analyzed, new primers were designed (except Capn5F, which was also used in the study described in Example 1) to amplify the polymorphisms of cluster 1 separately, with the aim of obtaining amplicons of approximately 200 bp, the optimal size for the detection in a PSQ™ 96 (Pyrosequencing AB, Uppsala, Sweden). This instrument allows the simultaneous analysis of 96 samples by means of the pyrosequencing technique (Ronaghi M., et al., 1996).
The PCR reactions were performed with a final volume of 10 μl using 10 ng of genomic DNA.
The Capn5F primer marked with biotin (Capn5F, SEQ ID NO: 7) and an antisense primer (CAPN5-76int, SEQ ID NO: 9) were used for the amplification of the Nt g.86 A>G polymorphism. The PCR conditions were the following: an initial denaturation step of 95° C. for 5 minutes, 45 cycles of 95° C.-30 seconds, 68° C.-20 seconds, and 72° C.-30 seconds, and a final extension of 5 min at 72° C. An internal primer for the sequencing reaction was also designed (CAPN5-76R, SEQ ID NO: 10).
The primers for the amplification of the Nt g.344 G>A were CAPN5-140int, SEQ ID NO: 11 and CAPN5-R2, SEQ ID NO: 12, the sense primer (CAPN5-140int) being marked with biotin. The PCR conditions were: 95° C. for 5 minutes, 45 cycles of 95° C.-30 seconds, 62° C.-20 seconds, and 72° C.-30 seconds, and a final extension of 5 min at 72° C. A new internal anti-sense primer for the sequencing reaction was also designed (CAPN5-Rpi, SEQ ID NO: 13).
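As an illustration only, the two thermal profiles given above for cluster 1 can be written down as data and their nominal duration computed. The names used here (Step, make_program, PROGRAMS, nominal_seconds) are hypothetical and do not come from the specification; ramping times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # programmed hold time


def make_program(annealing_c: int) -> dict:
    """Profile shared by both amplifications; only the annealing temperature differs."""
    return {
        "initial_denaturation": Step(95, 5 * 60),
        "cycles": 45,
        "cycle_steps": (Step(95, 30), Step(annealing_c, 20), Step(72, 30)),
        "final_extension": Step(72, 5 * 60),
    }


# Annealing temperatures as stated in the text for each polymorphism.
PROGRAMS = {
    "Nt g.86 A>G": make_program(68),
    "Nt g.344 G>A": make_program(62),
}


def nominal_seconds(program: dict) -> int:
    """Sum of the programmed hold times (temperature ramping is not included)."""
    per_cycle = sum(step.seconds for step in program["cycle_steps"])
    return (program["initial_denaturation"].seconds
            + program["cycles"] * per_cycle
            + program["final_extension"].seconds)


for name, program in PROGRAMS.items():
    print(name, nominal_seconds(program) / 60, "minutes of programmed hold time")
```

Both programs have the same nominal hold time because they differ only in the annealing temperature.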
As in Example 2, genotyping of the region of the cluster was carried out by means of PCR amplification and genotyping by means of the so-called FRET technique, using the amplification primers and fluorescent detection probes indicated in said Example 2 (see Table 1).
The conditions for carrying out PCR and for obtaining the melting curves were identical to those described in Example 2.
Similar to the description in Example 3, the comparisons of allele frequencies and genotypes between non-selected patients and controls were carried out by means of χ2 tests with Yates' correction using Statcal software with analysis software (EpiInfo 5.1, Center for Disease Control, Atlanta Ga.).
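By way of a minimal sketch, a χ2 test with Yates' continuity correction on a 2×2 table of allele counts can be computed as follows. The function name chi2_yates and the counts in the usage lines are hypothetical; this is not the EpiInfo implementation, only the standard formula for a 2×2 table with one degree of freedom.

```python
import math


def chi2_yates(a: int, b: int, c: int, d: int):
    """Chi-square with Yates' correction for the 2x2 table [[a, b], [c, d]].

    Rows could be cases/controls and columns allele 1/allele 2.
    Returns (chi2, p) for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    # Yates' correction: reduce |ad - bc| by n/2, never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 d.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical allele counts, for illustration only.
chi2, p = chi2_yates(30, 70, 50, 50)
print(round(chi2, 4), p)
```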
Tests adapted from Sasieni (Sasieni, 1997) were used again for the statistical analysis of the genotype distribution, for the Hardy-Weinberg equilibrium deviation tests and for the two-point association studies, the on-line resources of the Human Genetic Institute, Munich, Germany (http://ihg.gsf.de) being used to carry out the calculations.
Polymorphic allele frequency was analyzed in the 606 individuals of the study at the four loci in the CAPN5 gene: Nt g.86 A>G, Nt g.344 G>A, Nt c.1320 C>T and Nt c.1469 G>A. The genotypes and allele frequencies obtained are summarized in Table IX.
The existence of an association between each of the four polymorphisms, analyzed individually, and thirteen of the phenotypic characteristics to be analyzed, namely 1) obesity, 2) abdominal obesity, 3) systolic hypertension, 4) diastolic hypertension, 5) combined (systolic and diastolic) hypertension, 6) diabetes, 7) glucose intolerance, 8) hyperinsulinemia, 9) insulin resistance, 10) hypertriglyceridemia, 11) hypercholesterolemia, 12) low HDL cholesterol levels and 13) high LDL cholesterol levels, was analyzed qualitatively using the χ2 analysis adapted from Sasieni (1997). The results are shown in Table X, which indicates, for each analyzed phenotype in which an association was found, only the most significant value, together with the corresponding values of p, the odds ratio and the confidence interval (CI).
There are only two possible alleles in the analyzed polymorphisms: 1, usually corresponding to the most frequent variant, and 2, the less frequent one. Given that there are two copies of each of the genes, if the genotype of each individual is considered, the individual can be homozygous for this polymorphism and have two equal alleles 1 (genotype 11) or two alleles 2 (genotype 22), or be heterozygous and have one allele of each type (genotype 12). The expression 11+12 vs. 22 means that the analysis has compared the individuals 22 (allele 2 homozygotes) to the group of individuals carrying allele 1 (individuals 11 and 12).
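A test for deviation from Hardy-Weinberg equilibrium can be sketched as a goodness-of-fit χ2 test on the three genotype counts. This is a generic textbook formulation under stated assumptions, not the procedure of the on-line resource mentioned above; the function name hwe_chi2 and the counts are hypothetical.

```python
import math


def hwe_chi2(n11: int, n12: int, n22: int):
    """Goodness-of-fit chi-square (1 d.f.) for Hardy-Weinberg proportions.

    n11, n12, n22 are the observed counts of genotypes 11, 12 and 22.
    Returns (chi2, p).
    """
    n = n11 + n12 + n22
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    p1 = (2 * n11 + n12) / (2 * n)   # frequency of allele 1
    p2 = 1.0 - p1                    # frequency of allele 2
    expected = (n * p1 * p1, 2 * n * p1 * p2, n * p2 * p2)
    observed = (n11, n12, n22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 d.f.
    return chi2, p


# Hypothetical genotype counts in perfect Hardy-Weinberg proportions.
print(hwe_chi2(25, 50, 25))
```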
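The 11+12 vs. 22 comparison described above can be illustrated with a short sketch that collapses genotype counts into a 2×2 table and computes the odds ratio with an approximate 95% confidence interval (log-based Woolf method). The function names and the counts are hypothetical, and the interval method is an assumption, since the text does not state which one was used.

```python
import math


def recessive_table(cases: dict, controls: dict):
    """Collapse genotype counts keyed '11', '12', '22' into 22 vs. 11+12."""
    a = cases["22"]                       # cases homozygous for allele 2
    b = cases["11"] + cases["12"]         # cases carrying allele 1
    c = controls["22"]                    # controls homozygous for allele 2
    d = controls["11"] + controls["12"]   # controls carrying allele 1
    return a, b, c, d


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate confidence interval (Woolf, log scale)."""
    if min(a, b, c, d) == 0:
        raise ValueError("all four cells must be non-zero for this method")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical genotype counts, for illustration only.
cases = {"11": 40, "12": 40, "22": 20}
controls = {"11": 50, "12": 40, "22": 10}
print(odds_ratio_ci(*recessive_table(cases, controls)))
```

An odds ratio above 1 in this layout would indicate that genotype 22 is more frequent among cases than among controls.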
To analyze the degree to which these polymorphisms are in linkage disequilibrium (LD), the standardized pairwise linkage disequilibrium (D′) was calculated using Thesias software. The results are shown in Table XI and coincide with those obtained in Example 4 when a similar analysis was carried out with the population used for the study referring to PCOS. It was observed that the polymorphisms of Nt g.86 A>G and Nt c.1320 C>T are in complete linkage disequilibrium with the polymorphisms of Nt g.344 G>A and Nt c.1469 G>A, respectively (D′=−1, p<0.0001; D′=−1, p<0.0001, respectively), and between the polymorphisms of Nt g.344 G>A and Nt c.1320 C>T there is an absence of linkage disequilibrium (D′=0.00, p<0.001).
As in the study referring to PCOS, the analysis of the linkage disequilibrium again confirmed the existence of complete linkage disequilibrium within each cluster: both polymorphisms of cluster 1 (Nt g.86 A>G and Nt g.344 G>A) are in complete LD, and both polymorphisms of cluster 2 (Nt c.1320 C>T and Nt c.1469 G>A) are also in complete LD, whereas there is an absence of linkage disequilibrium between the two clusters, again suggesting the presence of a recombination “hot spot” between both clusters.
The construction of haplotypes, which are made up of SNPs along the four loci inside the genome region of the CAPN5 gene, and the association analysis based on the haplotypes were again carried out using Thesias software.
As occurred in Example 5 regarding the relationship between PCOS and the CAPN5 gene, nine unique haplotypes, labeled A to I, were again detected in the analyzed population among the combined cases and controls. In addition to the haplotype carrying the wild-type allele at each of the polymorphic loci (haplotype D), there were eight haplotypes consisting of several combinations and permutations of variants of the sequence and the wild-type sequence at each locus. Table XII shows the identified haplotypes and their frequencies.
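For reference, the standardized coefficient D′ for a pair of biallelic loci can be computed from two-locus haplotype frequencies as sketched below. This assumes that the haplotype frequencies are already known; with unphased genotype data they must first be inferred, which is the task performed by software such as Thesias. The function name d_prime and the frequencies are hypothetical.

```python
def d_prime(f11: float, f12: float, f21: float, f22: float) -> float:
    """Signed D' from the frequencies of the four two-locus haplotypes.

    fXY is the frequency of the haplotype carrying allele X at the first
    locus and allele Y at the second locus (alleles coded 1 and 2).
    """
    total = f11 + f12 + f21 + f22
    if abs(total - 1.0) > 1e-6:
        raise ValueError("haplotype frequencies must sum to 1")
    p_a1 = f11 + f12            # allele 1 frequency at the first locus
    p_b1 = f11 + f21            # allele 1 frequency at the second locus
    d = f11 - p_a1 * p_b1       # raw disequilibrium coefficient D
    if d > 0:
        d_max = min(p_a1 * (1 - p_b1), (1 - p_a1) * p_b1)
    else:
        d_max = min(p_a1 * p_b1, (1 - p_a1) * (1 - p_b1))
    if d_max == 0:
        return 0.0              # a monomorphic locus carries no LD information
    return d / d_max


# Hypothetical frequencies: haplotype 22 is never observed (complete LD, D' = -1).
print(d_prime(0.6, 0.2, 0.2, 0.0))
# Hypothetical frequencies: the two loci are independent (D' = 0).
print(d_prime(0.25, 0.25, 0.25, 0.25))
```

A value of D′=−1 arises when one of the four possible haplotypes is absent and the rarer alleles never occur together, which is the pattern reported within each cluster.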
For the purpose of discovering not only which polymorphisms are associated to a certain phenotype but also the specific genetic context in which this association occurs, a series of haplotype context assays were carried out for each of the four polymorphic loci. The variables were analyzed qualitatively (according to the criteria mentioned in the section in which the “Study population” was described) and quantitatively, the latter case being subsequently shown in Example 14. The level of statistical significance is set at p=0.05 (Bonferroni correction is not applicable). The result of a statistical analysis is usually expressed with the value of p, which quantifies the probability that a result at least as extreme as the one observed would arise at random if there were no real association. Statistical significance is usually established when the value of p obtained is less than 0.05, i.e. when such a result would be expected to arise at random less than 5% of the time. In studies in which several polymorphisms are examined at the same time, it is necessary to make a correction (Bonferroni correction) because each individual analysis presents a risk of false positives. The Bonferroni correction consists of dividing the value of 0.05 by the number of analyzed polymorphisms. In this case, there are four analyzed polymorphisms, but the value is divided by two because they are not completely independent, but rather are in linkage disequilibrium in pairs. This correction is made only in genotype studies; it is not applied in haplotype studies because the Thesias program that has been used already establishes its own system of correcting possible errors or false positives, therefore the level of significance is set at 0.05. The lower the value of p, the stronger the association being studied, i.e. the lower the probability that what is observed is random.
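The arithmetic of the correction described for the genotype studies can be made explicit with a short sketch. The names used are hypothetical; the only inputs taken from the text are the nominal level of 0.05 and the fact that the four polymorphisms behave as two independent pairs.

```python
def bonferroni_threshold(alpha: float, n_independent_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_independent_tests < 1:
        raise ValueError("at least one test is required")
    return alpha / n_independent_tests


N_POLYMORPHISMS = 4
# The polymorphisms are in complete linkage disequilibrium in pairs,
# so they are counted as two independent tests rather than four.
N_INDEPENDENT = N_POLYMORPHISMS // 2

genotype_threshold = bonferroni_threshold(0.05, N_INDEPENDENT)
print(genotype_threshold)  # threshold applied to the genotype studies
```

Under this reading, a genotype association would be declared significant only when p is below 0.025, whereas the haplotype studies keep the level of 0.05.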
The values approaching the statistical significance limit without reaching it are said to show a trend, even more so the closer they are to said values.
Continuous variables, i.e. variables which can have any value, such as age or the blood sugar level, are analyzed in quantitative analysis. For this type of variable, the means (the average of all the values analyzed, which requires the samples to be normally distributed) or the medians (the central value of a series, used when the values are not normally distributed, as in this case) are compared across genotypes to determine whether there are significant differences between them.
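The difference between comparing means and comparing medians can be illustrated with a small sketch that summarizes a quantitative variable per genotype. The function name summarize_by_genotype and the values are hypothetical and serve only to show how a single high value shifts the mean but not the median.

```python
from statistics import mean, median


def summarize_by_genotype(records):
    """Group (genotype, value) pairs and return {genotype: (n, mean, median)}."""
    groups = {}
    for genotype, value in records:
        groups.setdefault(genotype, []).append(value)
    return {g: (len(v), mean(v), median(v)) for g, v in groups.items()}


# Hypothetical fasting glucose values in mg/dl, for illustration only.
records = [
    ("11", 90), ("11", 100), ("11", 140),
    ("12", 95), ("12", 105),
    ("22", 110),
]
for genotype, (n, avg, med) in summarize_by_genotype(records).items():
    print(genotype, n, avg, med)
```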
In qualitative analysis, however, variables which are distributed in categories, for example diabetic and non-diabetic, are compared. When the condition of healthy or afflicted depends on levels of continuous biochemical parameters such as diabetes or hypercholesterolemia, a reference value marking the limit between them is set. The same occurs with almost all the variables that have been analyzed. The reference values used in this study are hereinbefore mentioned at the beginning of the section of the Detailed Description of the Improvements of the Invention and are summarized in Table 1.
By assigning each individual under study the condition of healthy (control) or afflicted (case) in qualitative analysis, statistical significance can be lost and the results may vary according to the criterion that is used; therefore quantitative studies are preferred. However, as will be discussed below, no discrepancies between the two types of study arose; the quantitative method is simply more sensitive.
The association between calpain 5 and obesity, diabetes, glucose intolerance, high LDL cholesterol and metabolic syndrome has been established by means of qualitative analysis. Table XIII includes the statistically significant results: the values of p for each haplotype or genetic context and, in parentheses, the OR (odds ratio). The odds ratio values reflect the probability that an individual carrying a certain allele suffers the disease; an OR value <1 indicates protection against the disease (a lower probability of suffering it).
The quantitative studies have confirmed the association of CAPN5 with the increase of glucose levels in general, with LDL cholesterol and, to a certain extent, with obesity, although in this case not through the BMI but through the waist circumference measurement, a parameter associated to cardiovascular risk. Statistically significant values were also obtained for diastolic pressure, total cholesterol, low HDL cholesterol levels and high baseline insulin levels.
Table XIV shows the results with statistical significance obtained in the quantitative analysis. The arithmetic mean obtained for said parameter in the study population from the values corresponding to each individual is indicated immediately under the heading corresponding to each phenotype characteristic studied. The characteristics shown are: waist circumference (WC), diastolic pressure (DP), fasting glucose (FG), glucose levels detected in an oral glucose tolerance test (OGTT), baseline insulin (BI), total cholesterol (CHOL), HDL cholesterol (HDL) and LDL cholesterol (LDL). The value of p corresponding to a certain genetic context and, immediately under it, the effect of the haplotype in question on the levels of the phenotypic variable (the increase or decrease of the value of the parameter with respect to the mean value of said parameter previously indicated) are indicated for each of the parameters. The values in italics are those which do not reach statistical significance (p<0.05) but which show a strong trend; they have therefore been included in said Table XIV.
As previously discussed, the analysis of the results shown in Table XIV and their comparison with the results obtained in the qualitative analysis shows that there is no difference between the qualitative and quantitative analyses, simply greater sensitivity of the quantitative method. Thus, where an association between CAPN5 and obesity is obtained in the qualitative analysis, an association between CAPN5 and a larger waist circumference (which is another way to determine obesity, abdominal obesity in this case) is obtained in the quantitative analysis. In relation to glucose intolerance, a characteristic associated with CAPN5 according to the qualitative study, the condition of afflicted (the assignment to the case group) is established in those individuals with baseline blood sugar levels exceeding 100 mg/dl and/or an oral glucose tolerance test (OGTT) value exceeding 140 mg/dl; the latter parameter has obtained statistical significance in the quantitative analysis. Finally, statistical significance was obtained in both studies in the case of LDL cholesterol.
As an overall result of the haplotype study, it can be seen that most haplotypes associated to the various phenotypes analyzed share the haplotype contexts AGC_ and GGC_. The presence of the polymorphism at the last position (Nt c.1469 G>A) has important effects on the levels of the studied variable according to the haplotype context in which it is located. Table XV allows observing this fact more clearly.
The presence of polymorphism Nt c.1469 G>A in the GGC- context is related to values exceeding the mean in terms of diastolic pressure, total cholesterol and LDL cholesterol, although the latter does not reach the level of statistical significance. When it is located in the AGC- context, a decrease in abdominal obesity (related to cardiovascular risk) and in diastolic pressure occurs, together with an increase in HDL cholesterol levels. It can also be seen that its presence in AGC- (haplotype AGCA) is related to higher glucose levels in the Oral Glucose Tolerance Test (OGTT or glucose after 120 minutes).
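The case/control assignment for glucose intolerance described above can be sketched as a simple threshold rule. The function and constant names are hypothetical; the two limits (100 mg/dl for baseline glucose and 140 mg/dl for the OGTT) are the ones stated in the text, and "exceeding" is read here as strictly greater than.

```python
FASTING_LIMIT_MG_DL = 100   # baseline (fasting) blood glucose limit from the text
OGTT_LIMIT_MG_DL = 140      # glucose limit 120 minutes after the oral overload


def glucose_intolerance_group(fasting_mg_dl: float, ogtt_mg_dl: float) -> str:
    """Return 'case' if either limit is exceeded, otherwise 'control'."""
    exceeds_fasting = fasting_mg_dl > FASTING_LIMIT_MG_DL
    exceeds_ogtt = ogtt_mg_dl > OGTT_LIMIT_MG_DL
    return "case" if (exceeds_fasting or exceeds_ogtt) else "control"


# Hypothetical individuals, for illustration only.
print(glucose_intolerance_group(95, 150))   # OGTT above the limit
print(glucose_intolerance_group(95, 120))   # both values within the limits
```

This kind of dichotomization is what makes the qualitative analysis depend on the chosen reference values, whereas the quantitative analysis uses the measured values directly.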
Other haplotypes have also shown an association with some alterations. In that sense, haplotype AACA has been associated to obesity and metabolic syndrome, whereas haplotype GGTG is associated to hyperinsulinemia and high total and LDL cholesterol levels.
To confirm the previous data, a re-analysis of the haplotypes of the CAPN5 gene in the same study population was carried out using a mathematical model that was different from the one used by Thesias, implemented in Whap software (http://www.genome.wi.mit.edu/~shaun/whap/). Thesias uses the SEM algorithm for the inference of the haplotypes and bases the association analysis on differences between the phenotype means associated to each haplotype. Whap uses the EM algorithm (the SEM algorithm being a variant of the EM algorithm) for the inference of haplotypes, but it bases the association analysis on a logistic regression.
According to the software instructions, for greater precision of the method the values of each series must be standardized such that the mean is equal to 0 and the variance is equal to 1. These standardized values are obtained by subtracting the mean of the series from each value and dividing by the standard deviation thereof, i.e. (X − X̄)/SD, where X̄ is the mean of the series and SD its standard deviation.
A permutation test with 5,000 permutations has been included in the association studies to obtain the empirical values of p. Permutations are the different possible ways of arranging the variables under study. The permutation test randomly redistributes the values of the quantitative variable under study among the genotypes the specified number of times (5,000 in this case). The situation in which there are no differences is thus simulated, given that the assignment of each value to each genotype is random in each permutation, which reduces the rate of false positives. The significant results thus obtained are summarized in Table XVI. The column entitled “Haplotype (1d.f.)” refers to the value of p or statistical significance obtained by the specific haplotype specified in each case in a χ2 analysis with one degree of freedom (1d.f.). The values specified in the column “Overall (5.d.f.)” are general estimates of the extent to which the CAPN5 haplotypes affect the phenotype under study, without specifying which specific haplotype or haplotypes exert this action. Given that only the six CAPN5 haplotypes with a frequency exceeding 3% have been used in this analysis, a χ2 test with five degrees of freedom has been used. Finally, the presence of the asterisk indicates the associations that were significant according to Thesias software.
The analysis of haplotypes has also been carried out taking into consideration the effect of age and sex as covariables. Given that age and sex affect a number of biochemical and anthropometric values, it is common practice in statistics to correct association studies for the effect of these variables, because they may be confounding factors that alter the genotype-phenotype relationship. The software used for this new study, Whap, allows including in the mathematical model these or other factors which may affect the analysis of association between CAPN5 and the variable under study; these factors are referred to as covariables. The corrected values of association shown in Table XVII have been obtained in this manner. The table shows that the values of p are not substantially modified for any of the phenotypes analyzed, the associations with baseline glucose and total cholesterol being those which gain the most statistical significance with respect to the analysis without covariables.
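The standardization described above can be sketched as follows. The function name standardize and the sample values are hypothetical, and the use of the sample standard deviation (n−1 denominator) is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def standardize(values):
    """Return the values rescaled to mean 0 and (sample) variance 1."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    series_mean = mean(values)
    series_sd = stdev(values)   # sample standard deviation (assumption)
    if series_sd == 0:
        raise ValueError("the series has no variation and cannot be standardized")
    return [(x - series_mean) / series_sd for x in values]


# Hypothetical waist circumference values in cm, for illustration only.
z_scores = standardize([80, 90, 100, 110, 120])
print(z_scores)
```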
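The principle of the permutation test can be illustrated with a simplified sketch in which the statistic is the absolute difference in means between carriers and non-carriers of a haplotype. This is not the statistic used by Whap, which is based on a regression model; the function name, the seed and the data are hypothetical.

```python
import random


def permutation_p_value(values, carrier_flags, n_permutations=5000, seed=0):
    """Empirical p-value for the difference in means between two groups.

    values: quantitative trait values, one per individual.
    carrier_flags: True/False per individual (carrier of the haplotype or not).
    """
    if len(values) != len(carrier_flags):
        raise ValueError("values and carrier_flags must have the same length")
    if all(carrier_flags) or not any(carrier_flags):
        raise ValueError("both groups must contain at least one individual")

    def abs_mean_difference(flags):
        carriers = [v for v, f in zip(values, flags) if f]
        others = [v for v, f in zip(values, flags) if not f]
        return abs(sum(carriers) / len(carriers) - sum(others) / len(others))

    observed = abs_mean_difference(carrier_flags)
    rng = random.Random(seed)            # fixed seed for reproducibility
    shuffled = list(carrier_flags)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)            # random reassignment of carrier status
        if abs_mean_difference(shuffled) >= observed:
            hits += 1
    # Adding 1 to both terms avoids reporting an empirical p of exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical data: carriers have clearly lower values than non-carriers.
trait = list(range(20))
carrier = [True] * 10 + [False] * 10
print(permutation_p_value(trait, carrier))
```

The smallest empirical p that can be reported with 5,000 permutations is 1/5001, which is why the number of permutations bounds the resolution of the test.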
The results are generally consistent with those obtained by using the SEM algorithm. Furthermore, in the case of obesity in which a p with a marginal significance level was obtained in the quantitative analysis using the Thesias method (0.056), the statistical significance is improved with this other analysis (p=0.037, Table XVII), clarifying the relationship of the gene to obesity.
It can therefore be concluded that the group of symptoms with which the variants of calpain 5 are related is similar to that of the metabolic syndrome, characterized by the presence of three or more of the following factors: obesity, hypertension, glucose intolerance/insulin resistance, hypercholesterolemia and hypertriglyceridemia. Some of these haplotype variants specifically seem to be associated with the predisposition to manifesting metabolic syndrome in the analyzed population. The relationship of CAPN5 to phenotypes of the metabolic syndrome is robust to different statistical models, analysis of covariables and use of medication (antihypertensive agents, oral antidiabetics and hypocholesterolemic agents), showing the value of CAPN5 in the prediction, prognosis and assistance in the diagnosis of diseases that today are as common as hypertension, hypercholesterolemia or obesity. Each risk group may be multifactorial, such that the combination of several of the haplotype variants of the CAPN5 gene for which an association with cardiovascular risk factors has been detected in other sub-groups may be useful in detecting the existence of, or the predisposition to suffer, metabolic syndrome.
Attempts were made to locate other variants of the CAPN5 gene that may be involved in disorders caused by the altered expression of said gene. For this purpose a new analysis of the available CAPN5 sequence (ENSG00000149260) was carried out, this time using pupaSNP Finder software (Conde L. et al. 2004; http://pupasnp.bioinfo.cnio.es/), different from the one that previously allowed detecting the four previously described SNPs. This software is especially suited to detecting variations that may affect messenger RNA splicing. This process consists of a cutting and ligation reaction of the RNA molecules, which must occur before they pass to the cytoplasm to be translated to a protein. For this reason, any alteration in the signals controlling this process entails serious consequences for gene expression.
This new analysis clearly showed two candidate polymorphisms which could have an effect on the expression of CAPN5 due to a possible alteration in the messenger RNA splicing process.
Once exon 2 was located, rs11825526 was found, an A>G substitution that breaks the structure of an exonic splicing enhancer (ESE), which could entail elimination of the exon. Table XVIII shows how the T>C change alters the action targets (ESEs) of splicing factors srp40 and sf2. ESEs are regulatory regions found in many, if not all, exons, including constitutive ones (Cartegni L et al., 2002). To date, many polymorphisms located in encoding regions have been related with altered gene expression patterns. In fact, 50% of the mutations that involve an amino acid change and have been associated to said pathologies cause aberrant splicing (Cartegni L et al., 2003). The literature mentions some interesting examples of this type of mutation (Fackenthal JD et al., 2002). For this reason it was considered that rs11825526 could be associated to the H and/or W haplotypes and thus be generating an aberrant gene expression pattern and finally contribute to the development of the phenotypic profiles of the study.
A second option was variant rs11237087. This polymorphism involves a G>T substitution at the splice donor site of the fourth CAPN5 exon. As shown in FIG. 3, 14 of the 15 CAPN5 exons have AG-GT consensus sites upstream and downstream from each intron. The target sequences of the splicing are strongly conserved in CAPN5. Only intron 14 can be considered an exception to this rule. For this reason, this variant may generate an alternative splicing process in the CAPN5 mRNA, which would generate an aberrant protein.
The study described in Example 17 was carried out to confirm the possible contribution of these two variants to the development of the phenotypic profiles referring to the relationship between CAPN5 and the alterations involving cardiovascular risk factors.
A sub-cohort of 24 patients was selected from the 606 initial patients used for the study of cardiovascular risk factors, taking into consideration criteria that included either a severe manifestation of the symptoms that defined the profiles, a cluster thereof or manifest cases of family clustering of the symptomatology. Their clinical and haplotype descriptions are included in Table XIX.
Using genomic DNA as a template, the region comprising the ESE of exon 2 with the possible mutation was amplified and sequenced. None of the patients included in this study presented the described variation. In fact, a mixture of the DNA of 24 individuals belonging to the general population was prepared, said sample was amplified and, after sequencing it, no heterozygosity could be observed in the studied locus. These data suggest that the variant rs11825526 may not be present in our population. On the other hand, the fact that this polymorphism has still not been validated by other independent studies suggests that it could be a highly infrequent variant or a sequencing error.
The hypothesis of the possible involvement of the rs11237087 variant was then tested. To that end, the region corresponding to the intron-exon junction of interest was sequenced in the patients belonging to the sub-cohort described previously. A control mixture of 20 DNAs was also included in this study. The sequence analysis revealed the absence of the G>T variant in the patients and in the control mixture.
The non-detection of the latter two variants analyzed in the population in which they were searched shows the difficulty in determining the gene variants that are really involved in the development of pathologies, diseases and disorders.
FIG. 1 shows the diagram of the CAPN5 gene. The polymorphism sites and their clusters are indicated.
FIG. 2 shows the CAPN5 genotypes in the cases with PCOS against normal controls.
X axis: genotypes; Y axis: % of each genotype in the PCOS cases (darker) and the controls (lighter).
FIG. 3 shows the diagram of a primary transcript generated from the CAPN5 gene showing the situation of the exons, the cutting and splicing sites of each exon and the location of the rs11825526 and rs11237087 mutations. ATG: initiation codon; TGA: stop codon. AG/GT Groups: cutting and splicing sites of each exon.
Number | Date | Country | Kind |
---|---|---|---|
200401281 | May 2004 | ES | national |
200402737 | Nov 2004 | ES | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/ES2005/070073 | 5/27/2005 | WO | 00 | 8/8/2008 |