The figures show:
a: Intron 7 of USF1 harbors the 60-bp sequence shared by the 91 USF1-similarity genes. Parts (2-61 bp and 137-196 bp) of the AluSx repeat in intron 7 of USF1 have sequence similarities with the mouse B1 repeat. A total of 91 human genes, including USF1, have this 60-bp part of AluSx located either on the coding strand (43 genes) or on the opposite strand (48 genes). These 91 genes are listed in the Supplementary Table 4.
b: Transcription efficiency of a 268-bp region in intron 7 of USF1 containing the critical 60-bp sequence and the usf1s2 SNP (see
a: Schematic view of the 6.7 kb USF1 gene. Exons are depicted as thick boxes, UTRs as thinner boxes and introns as lines. Genotyped USF1 SNPs are marked above the gene, with associated SNPs indicated by asterisks. A segment of intron 7 is enlarged to show the location of the sequence (black bar) used to generate the 20-mer probe for the EMSA. Nearby SNPs are indicated with larger font and arrows.
b: Cross-species conservation and EMSA probes. Two probes were constructed, both capable of producing a shift in the EMSA: one 34 bp long and the other 20 bp long. The 34-mer probe contained all three SNPs from this intron 7 region, whereas the 20-mer probe contained only the critical usf1s2 SNP. The cross-species sequence conservation and the consensus sequence are shown below. Y stands for pyrimidine and R for purine. Notably, the nucleotide at usf1s2 itself is fully conserved, the risk allele representing the ancestral allele.
a: EMSA results show that both the 34 bp and the 20 bp probe around usf1s2 bind nuclear protein(s) from HeLa cell extract. The different usf1s2 allelic variants of both probe sets produce a gel-shift, marked by an arrow. Conversely, neither variant of the 20 bp probe representing the sequence around usf1s1 in the 3′UTR is capable of producing a gel-shift.
b: The specificity of the binding of nuclear protein(s). The 34 bp probe representing the sequence around usf1s2 produces a strong gel-shift which can be gradually competed with the addition of increasing molar concentrations of unlabeled probe.
The examples illustrate the invention.
All analyzed FCHL families had a proband with severe CHD and lipid phenotype, and on average 5-6 FCHL affected family members. These FCHL families exhibiting extreme and well-defined disease phenotypes were analyzed to identify the underlying gene contributing to FCHL on 1q21. We selected a regional candidate gene approach and sequenced four functionally relevant regional candidate genes on 1q21. The TXNIP, USF1, retinoid X receptor gamma (RXRG), and apolipoprotein A2 (APOA2) genes were sequenced to identify all possible variants. Of these, TXNIP initially represented the most promising positional candidate gene, because it has been shown to underlie the combined hyperlipidemia phenotype in mice17. The three additional regional genes were selected for sequencing based on their functional candidacy and close location (<2.5 Mb) to the original peak linkage markers, D1S104 and D1S1677 (
We identified a total of 23 SNPs for the 5687 bp sequence of the USF1 gene (Supplementary Table 2): three of these were silent variants in exons, and the rest were located in the non-coding regions and in the putative promoter. Eight of the 23 SNPs were novel. Initially, we genotyped three SNPs for the USF1 gene: usf1s1 (exon 11), usf1s2 (intron 7), and usf1s7 (exon 2) (the corresponding rs numbers for the genotyped SNPs are given in Tables 2-3),
The usf1s1 and usf1s2 SNPs provided evidence for linkage in the 42 FCHL families with maximum lod scores of 3.5 and 2.0 for FCHL, and 3.7 and 2.0 for TGs. Combined analysis of these SNPs also provided some evidence for association with the gamete competition test for both FCHL (p=0.005) and TGs (p=0.008) (Table 1), although the results of individual SNPs were non-significant. We also observed a difference in the allele frequencies between unaffected and affected men, especially with the TG trait. The frequency of the minor allele of usf1s1 was 22.0% in TG-affected males and 40% in the unaffected male family members. Since these affected and unaffected family members represent non-independent groups of males, we tested usf1s1 and usf1s2 in TG-affected men using the family-based association method, HHRR, and the gamete competition test: p-values of 0.01 and 0.02 were obtained in the HHRR analysis and 0.008 and 0.02 in the gamete competition test of the 42 nuclear FCHL families (Table 2). The combined analysis of these SNPs yielded a p-value of 0.003 in the HHRR test and 0.004 in the gamete competition test for TGs in men (Table 1).
−2167    New    0.02    T/C
−2022    New    0.05    A/C
−802     New    0.03    C/G
Underlined variants were genotyped in the FCHL families. For these SNPs, the numbers usf1s1-s9, used in the text and Tables 1-3, are also shown; New indicates that the SNP was not found in the SNP databases. The numbering of the new SNPs is based on the genomic sequence of USF1 at the UCSC Genome Browser, July 2003 (refGene_NM_007122).
Next, we genotyped these two associated SNPs, usf1s1 and usf1s2, in the larger study sample of 60 extended FCHL families. Furthermore, 12 additional SNPs were genotyped for the USF1 region (Table 2,
To confirm that the gamete competition results are indeed significant and not biased by such contributors as sparse data, we calculated empirical p-values for all gamete competition analyses involving multiple SNPs (Table 1) using gene dropping with at least 50,000 simulations (see Methods). The obtained empirical p-values were in very good agreement with the asymptotic p-values of the gamete competition analyses (Table 1), indicating that the observed results do not represent artifacts of asymptotic approximations with sparse data.
After genotyping a total of 15 SNPs in the USF1 region, we identified a pattern of association and LD reaching at least 46 kb in men with high TGs and extending from the centromeric junctional adhesion molecule 1 (JAM1) gene to the USF1 gene (
In all affected family members, using both FCHL and TG traits, the evidence for association was restricted to usf1s1 and usf1s2 (Table 1) within the USF1 gene. The remaining 13 SNPs genotyped for the JAM1-USF1 region did not provide significant evidence for association. However, we observed that two additional USF1 SNPs among the 23 SNPs identified by sequencing, rs2073655 in intron 3 and rs2073656 in intron 6, were also in full LD with the associated usf1s2 in 31 FCHL probands and are likely to extend the FCHL-associated region to intron 3 of USF1. No association was obtained with SNPs residing outside the JAM1-USF1 region (Supplementary Table 1). In conclusion, evidence for association and LD was restricted to a 1239 bp region within the USF1 gene in all affected individuals of FCHL families but extended at least 46 kb within the JAM1-USF1 region in men with high TGs (Tables 2-3,
The combination of the usf1s1-usf1s2 SNPs, resulting in the significant haplotypes for FCHL and TGs, was also tested with three additional qualitative lipid traits: high apolipoprotein B (apoB), high TC and small low-density lipoprotein (LDL) peak particle size. For apoB, p-values of 0.00003 and 0.0007 were obtained for all affected individuals and for affected men for the susceptibility haplotype 1-1 in the gamete competition analysis. For TC, the p-values were 0.0001 and 0.007; and for LDL peak particle size, 0.002 and 0.01, respectively. These results together with the results obtained for FCHL suggest that the underlying gene is not affecting TGs alone but also the complex FCHL phenotype.
Using the HBAT program we obtained evidence for shared haplotypes in the region of usf1s1 and usf1s2 (Table 3). This observation was supported by multipoint HHRR analyses (Table 3). For the haplotype 1-1 (1 indicating the common allele) a p-value of 0.0007 was obtained using the -o option.
This option measures not only preferential transmission of the susceptibility haplotype to affecteds but also less preferential transmission to unaffecteds, making it useful here since in these extended families the unaffecteds also contain important information. The results of the HBAT -e option, a test of association given linkage, are also shown in Table 3. Since this test statistic implicitly conditions on linkage information, it is less powerful and leads to less significant p-values. However, this test together with the results of the HHRR analyses allows us to conclude that the 1-1 haplotype is associated with the phenotype (Table 3). Furthermore, haplotype 2-2 was significantly less transmitted to the affected subjects (p=0.004), suggesting a protective role for this haplotype. These results were further supported by a genotype-based association test for general pedigrees, the genotype-PDT, which provided evidence for association (Table 3), as well as by the gamete competition analyses (Table 1), where the same haplotype 1-1 was segregating to the affected individuals with both FCHL and TG traits.
We investigated whether the gene expression profiles of fat biopsies from six affected FCHL family members carrying the susceptibility haplotype 1-1, constructed by the SNPs usf1s1 and usf1s2, revealed differences when compared to four affected FCHL family members homozygous for the putative protective haplotype, 2-2 (see above), using the Affymetrix HGU133A probe array. We also specifically investigated whether USF1 is expressed in fat tissue because it is not sufficiently represented on the Affymetrix HGU133A chip. Using RT-PCR, USF1 was found to be expressed in the fat biopsy samples (data not shown). Quantitative real-time PCR was also performed to determine the relative expression levels of USF1 in adipose tissue in the affected FCHL family members carrying the risk haplotype and affected members not carrying the risk haplotype. No detectable differences in USF1 expression levels could be observed, suggesting that the potential functional significance of the FCHL-associated allele of USF1 is not delivered via a direct effect on the steady state transcript level in adipose tissue.
Due to the limited number of samples available, statistical power to detect differences in gene expression between the haplotype groups was not considered sufficient. As an alternative, we therefore defined cut-off thresholds (see Methods) to discriminate between significant differences and differences attributable to technical or biological noise in the experimental procedures. Using these criteria, we identified 25 genes that appeared up-regulated and 73 genes down-regulated in the susceptibility haplotype carriers (the complete lists will be available at our website, while the raw data can be accessed through the Gene Expression Omnibus at NCBI using the GEO accession GSE590). To lend biological relevance to these findings, lists of differentially expressed genes were examined for over-representation of functional classes, as defined by the gene ontology (GO) consortium, using the Expression Analysis Systematic Explorer (EASE) tool. Only three classes were found to be statistically significantly over-represented among the up-regulated genes (
Next we investigated the genomic sequence flanking the haplotype 1-1, and identified a 60-bp sequence element found in 91 human genes as follows: The SNP usf1s2, forming part of the haplotype 1-1, resides adjacent (8 bp) to a 306-bp AluSx repeat. Two parts (2-61 bp and 137-196 bp) of this AluSx repeat show sequence similarity with the mouse B1 repeat (
To obtain some evidence for the functional significance of this conserved 60-bp DNA element, we produced a 268-bp long construct containing the critical 60-bp sequence as well as the usf1s2 SNP region and tested its regulatory function in vitro using the SEAP reporter system (
1According to the gene ontology (GO) classification biological process41.
The purpose of this experiment was not to resolve whether the usf1s2 SNP is directly causative of FCHL. More complex functional studies need to be performed before any conclusions about the functional significance of a single non-coding SNP can be drawn. However, these preliminary data combined with the cross-species conservation would imply that the DNA region flanking the susceptibility haplotype contains an element affecting transcriptional regulation. The data also suggest that the element is more likely to be a cis-acting type regulator rather than a direction-independent enhancer element.
The Finnish FCHL families were recruited in the Helsinki, Turku and Kuopio University Central Hospitals, as described earlier4,9. Each subject provided a written informed consent prior to participating in the study. All samples were collected in accordance with the Helsinki declaration, and the ethics committees of the participating centers approved the study design. The inclusion criteria for the FCHL probands were as follows4: 1) serum TC and/or TGs>90th age-sex specific Finnish population percentiles4, but if the proband had only one elevated lipid trait, a first-degree relative had to have the combined phenotype; 2) age>30 years and <55 for males and <65 years for females; 3) at least a 50% stenosis in one or more coronary arteries in coronary angiography. Exclusion criteria for the FCHL probands were type 1 DM, hepatic or renal disease, and hypothyroidism. Familial hypercholesterolemia was excluded from each pedigree by determining the LDL-receptor status of the proband by the lymphocyte culture method4. If the above mentioned criteria were fulfilled, families with at least two affected members were included in the study, and all the accessible family members were examined. Two traits were analyzed: FCHL and TGs. For the FCHL trait, family members were scored as affected according to the same diagnostic criteria as in our original linkage study4 using the Finnish age-sex specific 90th percentiles for high TC and high TGs, available from the web site of the National Public Health Institute, Finland. These ascertainment criteria are fully comparable with the original criteria1. For analysis of TGs, family members with TG levels≧90th Finnish age-sex specific population percentile were coded as affected. In addition to the FCHL and TG traits, the combination of the usf1s1-usf1s2 SNPs, which resulted in the significant haplotypes for the FCHL and TG traits, was also analyzed using the apolipoprotein B (apoB), LDL peak particle size and TC traits. 
For apoB and TC, the 90th age-sex specific Finnish population percentiles, publicly available from the web site of the National Public Health Institute, Finland, were used. For LDL peak particle size, the cut point of 25.5 nm was used to code individuals with small LDL particles as affected. Although LDL-C is an important component trait of FCHL, serum TC was used instead in the ascertainment of the Finnish FCHL families as well as in the statistical analyses of the SNPs forming the USF1 susceptibility haplotype. The reason for this is the significant hypertriglyceridemia associated with FCHL. The Friedewald formula is generally not recommended when TGs are over 400 mg/dl (i.e. 4.4 mmol/l), which is often the case with hypertriglyceridemic FCHL family members. In addition, the population percentile points of LDL-C could not be estimated when including this factor, as we currently do not have population percentiles for LDL-C.
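The limitation mentioned above can be illustrated with a minimal sketch of the Friedewald estimate, LDL-C ≈ TC − HDL-C − TG/5 with all values in mg/dl. The function name and the choice to return `None` above the 400 mg/dl limit are illustrative assumptions, not part of the study's procedures.

```python
def friedewald_ldl_mgdl(tc, hdl, tg, tg_limit=400.0):
    """Estimate LDL-C (mg/dl) as TC - HDL-C - TG/5.

    Returns None when TG exceeds tg_limit, because the estimate is
    generally not recommended for such samples.
    """
    if tg > tg_limit:
        return None  # hypertriglyceridemic sample: estimate not reliable
    return tc - hdl - tg / 5.0


# Hypothetical example values, for illustration only
print(friedewald_ldl_mgdl(200.0, 50.0, 150.0))
print(friedewald_ldl_mgdl(250.0, 40.0, 450.0))
```

Returning `None` instead of a number makes it explicit that such individuals cannot be scored on LDL-C, which is consistent with using serum TC for ascertainment.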
Serum lipid parameters and LDL peak particle size were measured as described earlier4,9,39. Probands or hyperlipidemic relatives who used lipid-lowering drugs were studied after their treatment was withheld for 4 weeks. In the 60 FCHL families, DNA and lipid measurements were available for 721 and 771 family members, respectively. In these 60 FCHL families, there were 226 individuals with TC>90% age-sex specific Finnish population percentile, 220 with TGs>90% age-sex specific percentile, 321 with TC and/or TGs>90% age-sex specific percentile; and 125 individuals with both TC and TGs>90% age-sex specific percentiles, respectively. A total of 96 men and 124 women exhibited high TGs (>age-sex 90th percentile).
The TXNIP gene was sequenced in the 60 FCHL probands and the APOA2, RXRG, and USF1 genes in the 31 probands of the original linkage study4. For TXNIP and USF1, 2000 bp upstream from the 5′ end of the gene were also sequenced. For USF1, the DNA binding domain was also sequenced in the remaining 29 probands. For all genes, both exons and introns were sequenced, except for the large 44,261-bp RXRG gene where only exons and 100 bp exon-intron boundaries were sequenced. Sequencing was done in both directions to identify heterozygotes reliably. Sequencing was performed according to the Big Dye Terminator Cycle Sequencing protocol (Applied Biosystems), with minor modifications, and the samples were separated with the automated DNA sequencer ABI 377XL (Applied Biosystems). Sequence contigs were assembled using Sequencher software (GeneCodes). The dbSNP and CELERA databases were used to select SNPs. Pyrosequencing and solid-phase minisequencing techniques were applied for SNP genotyping, as described earlier4,40. Pyrosequencing was performed using the PSQ96 instrument and the SNP Reagent kit (Pyrosequencing AB). Every SNP was first genotyped in a subset of 46 family members from 18 of the 60 FCHL families. If the SNP was polymorphic (minor allele frequency>10% in this subset), the SNP was genotyped in 238 family members of 42 FCHL families, including the 31 FCHL families of the original linkage study4. This strategy was not applied for the TXNIP gene, the variants of which all had a minor allele frequency<10%. The physical order of the markers and genes was determined using the UCSC Genome Browser. The novel SNPs characterized in this study will be submitted to public databases (NCBI). All SNPs were tested for possible violation of Hardy-Weinberg equilibrium (HWE) in three groups (all family members, probands, and spouses) using the HWSNP program developed by Dr. Markus Perola at the National Public Health Institute of Finland.
Annotation data of the Alu elements were downloaded from the UCSC Genome Browser, which uses RepeatMasker to screen DNA sequences for interspersed repeats. The positions of the 60-bp sequence on these Alu elements were identified using BLAST. Other annotation data were downloaded from LocusLink.
Six affected FCHL family members exhibiting the susceptibility haplotype (see Results) and four affected FCHL family members homozygous for the protective haplotype were selected for assessment of gene expression. All six susceptibility haplotype carriers were from six individual families. The four homozygous protective haplotype carriers were two sib pairs from two families. Biopsies were taken from umbilical subcutaneous adipose tissue under local anaesthesia to collect 50-2000 mg of adipose tissue. The RNA was extracted using RNA STAT-60 reagent (Tel-Test, Inc.), according to the manufacturer's instructions, followed by DNase I treatment and additional purification with RNeasy Mini Kit columns (Qiagen). The quality of the RNA was assessed using the RNA 6000 Nano assay in the Bioanalyzer (Agilent), monitoring the ribosomal 28S/18S RNA ratio and signs of degradation. The concentration and the A260/A280 ratio of the samples were measured using a spectrophotometer, the acceptable ratio being 1.8-2.2. Then 2 μg of total RNA was reverse transcribed to cDNA using the SuperScript Choice System (Invitrogen) and T7-oligo(dT)24 primer, according to instructions provided by Affymetrix, except using 60 pmols of primer and a reaction volume of 10 μl, after which biotin-labeled cRNA was created using the Enzo® BioArray™ HighYield™ RNA Transcript Labeling Kit (Affymetrix). Prior to hybridization the cRNA was fragmented to obtain a transcript size distribution of 50 to 200 bases, after which samples were hybridized to Affymetrix Human Genome U133A arrays and scanned in accordance with the manufacturers' recommendations.
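As an illustration of what an HWE check involves, the following is a minimal sketch of the standard one-degree-of-freedom chi-square goodness-of-fit test on genotype counts for a biallelic SNP. It is not the HWSNP program, whose internals are not described here; the function name is hypothetical, and the test assumes independent individuals, which holds better for probands and spouses than for whole families.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are the observed counts of the three genotypes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic marker)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# A statistic above 3.84 corresponds to p < 0.05 with one degree of freedom
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
```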
Scanned images were analyzed with Affymetrix Microarray Suite 5 (Affymetrix, Santa Clara, Calif.) software employing the Statistical Expression Algorithm. All analysis parameters were set to the default values recommended by Affymetrix. Global scaling to a target intensity of 100 was applied to all arrays but no further normalizations were performed at this point. Output files of result metrics, including the scaled signal intensity values and the corresponding detection call expressed as absent, marginal or present, were further processed using GeneSpring 5.0 data analysis software (Silicon Genetics, Redwood City, Calif.). For each probe array a per gene normalization was applied so that signal intensities were divided by the median intensity calculated using all 10 probe arrays. Cut-off values to discriminate low quality data were determined separately for each haplotype group by dividing the base value with the proportional value estimated using the Cross Gene Error Model implemented in GeneSpring. To identify differentially expressed genes between the two haplotypes, ratios of averaged normalized intensities were calculated. Differences were considered as significant if the resulting ratio fell at least three standard deviations outside the average ratio calculated from the distribution of the log10 of the ratios. To further increase result stringency only genes scored as present in all 10 samples, or as absent or marginal in all cases and present in all the controls (or vice versa), were included. Annotation information defining the biological processes that each gene could be ascribed to was retrieved from the classifications provided by the gene ontology (GO) consortium41. Statistical evaluation of enrichment of categories represented in each gene list, compared to the proportion observed in the total population of genes on the probe array, was performed using the Expression Analysis Systematic Explorer (EASE) tool41, with the threshold value set to 3. 
The test statistic was calculated using Fisher's exact test. To maximize robustness, an EASE score (p-value) was calculated where the Fisher exact probabilities were adjusted so that categories supported by few genes were strongly penalized, while categories supported by many genes were negligibly penalized. EASE scores (p-values) falling below 0.05 were considered statistically significant.
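The penalized Fisher probability described above can be sketched as follows. This is one common formulation of the EASE score, in which one gene is removed from the list hits before the one-sided exact probability is computed; it is an assumption that this matches the tool's exact implementation. The function names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N genes, K of which are in the category."""
    total = comb(N, n)
    top = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / total


def ease_score(list_hits, list_size, pop_hits, pop_size):
    """One-sided Fisher probability after removing one category gene from the list."""
    if list_hits <= 1:
        return 1.0  # a category supported by a single gene is fully penalized
    return hypergeom_upper_tail(list_hits - 1, list_size - 1, pop_hits, pop_size)


# Hypothetical counts: 3 of 5 list genes fall in a category of 3 genes among 10
fisher_p = hypergeom_upper_tail(3, 5, 3, 10)
print(fisher_p, ease_score(3, 5, 3, 10))
```

The penalty is large when a category is supported by few genes and becomes negligible as the number of supporting genes grows, which is the behavior described in the text.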
Two affected FCHL family members exhibiting the susceptibility haplotype and two affected FCHL family members without the haplotype were selected for assessment of USF1 expression in adipose tissue utilizing the SYBR-Green assay (Applied Biosystems). Two-step RT-PCR was done using the TaqMan Gold RT-PCR kit according to the manufacturer's recommendations. A total of 1 μg of RNA was converted to cDNA in a 100 μl reaction, of which 1 μl was used in the quantitative PCR reaction. The ratio of USF1 to two housekeeping genes, GAPDH and HPBGD, was used to normalize the data. The specificity of the reaction was evaluated using a dissociation curve in addition to a no-template control. The following PCR primers were used in separate 10 μl SYBR-Green reactions: For USF1; forward: 5′-ATGACGTGCTTCGACAACAG-3′, reverse: 5′-GGGCTATCTGCAGTTCTTGG-3′. For GAPDH; forward: 5′-CGGAGTCAACGGATTTGGTCGTAT-3′, reverse: 5′-AGCCTTCTCCATGGTGGTGAAGAC-3′. For HPBGD; forward: 5′-AACCCTCATGATGCTGTTGTC-3′, reverse: 5′-TAGGATGATGGCACTGAACTC-3′. The reactions were run in triplicate using the ABI Prism 7900 HT Sequence Detection System in accordance with the manufacturer's recommendations and the data were analyzed using Sequence Detector version 2.0 software.
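The text does not state how the ratio to the housekeeping genes was computed. One conventional possibility, shown here purely as an assumption, is the ΔCt approach with an assumed amplification efficiency of 100%, where averaging the reference Ct values corresponds to the geometric mean of the reference quantities. The function name and Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_references):
    """Target expression relative to reference genes, as 2 ** (-delta Ct).

    ct_target: replicate Ct values for the target gene (e.g. triplicates).
    ct_references: one list of replicate Ct values per reference gene.
    Assumes a doubling of product per cycle for every amplicon.
    """
    ref_ct = mean(mean(replicates) for replicates in ct_references)
    delta_ct = mean(ct_target) - ref_ct
    return 2.0 ** (-delta_ct)


# Hypothetical triplicate Ct values for a target and two reference genes
print(relative_expression([24.0, 24.0, 24.0], [[20.0, 20.0, 20.0], [22.0, 22.0, 22.0]]))
```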
Initial functional analyses were performed using the SEAP reporter system (Clontech Laboratories, Palo Alto, Calif.) in COS cells. This system utilizes SEAP, a secreted form of human placental alkaline phosphatase, as a reporter molecule to monitor the activity of potential promoter and enhancer sequences. The constructs were cloned into the pSEAP2-Enhancer vector which contains the SV40 enhancer. The correct allele and orientation in each construct was verified by sequencing. Cell culture media between 48 h and 72 h after transfection were taken for the SEAP reporter assay. The monitoring of the SEAP protein was performed using the fluorescent substrate 4-methylumbelliferyl phosphate (MUP) in a fluorescent assay according to the manufacturer's instructions. Data are representative of at least two independent experiments.
Parametric linkage and nonparametric affected sib-pair (ASP) analyses were carried out using the same programs and parameters as in the original linkage study4. Two traits were investigated, the FCHL and TG traits. The MLINK program of the LINKAGE package43 version FASTLINK 4.1P44-45 was used as implemented by the ANALYZE package46 to perform the parametric two-point and multipoint linkage analyses. The ASP analysis was performed using the SIBPAIR program of the ANALYZE package46. For each marker, allele frequencies were estimated from all individuals using the DOWNFREQ program47.
The SNPs were tested for association using the HHRR27 and the gamete competition test29. To minimize the number of tests performed, the SNPs residing outside the USF1-JAM1 region were tested for association only using the HHRR27 test when analyzing the TG- and FCHL-affected males. The HHRR analysis, performed by use of the HRRLAMB program48, tests the homogeneity of marker allele distributions between transmitted and non-transmitted alleles. The multi-HHRR analysis is testing the same hypothesis using several SNPs. The gamete competition test is a generalization of the TDT and views transmission of marker alleles to affected children as a contest between the alleles, making effective use of full pedigree data. The gamete competition method is not purely a test of association, because the null hypothesis is no association and no linkage, and thus linkage in itself also affects the observed p-value. Furthermore, the gamete competition test readily extends to two linked markers, enabling simultaneous analysis of multiple SNPs in a gene. P-values based on asymptotic approximations can be biased when data used to calculate them are relatively sparse. To confirm that the gamete competition results are indeed significant we also calculated empirical p-values for all analyses involving multiple SNPs (Table 1) using gene dropping. In gene dropping the founder genotypes are assigned using the estimated allele frequencies assuming HWE and linkage equilibrium (LE). The offspring genotypes are assigned assuming Mendelian segregation. Thus gene dropping is performed under the null hypothesis of LE and no linkage. To calculate an empirical p-value, gene dropping is performed multiple times. Here at least 50,000 simulations were performed for each analysis. The likelihood ratio test statistic (LRT) from each gene dropping iteration is compared to the LRT for the observed data. 
The empirical p-value is the proportion of iterations in which the gene dropping LRT equaled or exceeded the observed LRT. In general, the obtained empirical p-values of gene dropping are more conservative than asymptotic p-values for small sample sizes.
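The gene-dropping steps described above can be sketched in a few lines. This is a minimal illustration for a single biallelic marker, not the software used in the study: the function names are hypothetical, alleles are coded 1 and 2, and the likelihood ratio statistic itself is taken as given.

```python
import random


def sample_founder(freq_allele1, rng):
    """Draw a founder genotype from the allele frequency, assuming HWE."""
    return tuple(1 if rng.random() < freq_allele1 else 2 for _ in range(2))


def drop_to_child(father, mother, rng):
    """Mendelian segregation: each parent transmits one randomly chosen allele."""
    return (rng.choice(father), rng.choice(mother))


def empirical_p_value(observed_lrt, simulated_lrts):
    """Proportion of simulated LRTs that equal or exceed the observed LRT."""
    if not simulated_lrts:
        raise ValueError("at least one simulation is required")
    hits = sum(1 for lrt in simulated_lrts if lrt >= observed_lrt)
    return hits / len(simulated_lrts)


rng = random.Random(1)  # fixed seed only to make the illustration repeatable
father = sample_founder(0.7, rng)
mother = sample_founder(0.7, rng)
child = drop_to_child(father, mother, rng)
print(father, mother, child)
print(empirical_p_value(5.0, [1.0, 5.0, 6.0, 2.0]))
```

With 50,000 simulations the smallest non-zero empirical p-value that can be resolved is 1/50,000 = 0.00002.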
The HBAT program was run with the options optimize offset (-o) and empirical test (-e) to test for association between haplotypes and the trait49. The -o option measures not only preferential transmission of the susceptibility haplotype to affecteds but also less preferential transmission to unaffecteds. The -e option leads to a test of association given linkage and thus gives an empirical estimate of the variance. These haplotype analyses are affected by the fact that four of the 15 SNPs for the JAM1-USF1 region were genotyped in the 60 extended FCHL families and 11 SNPs in 42 nuclear FCHL families. The genotype Pedigree Disequilibrium Test (geno-PDT)50, which provides a genotype-based association test for general pedigrees, was also performed for a combination of genotypes from selected USF1 SNPs (Table 3). LD between the marker genotypes for SNPs in the JAM1-USF1 region was tested using the Genepop v3.1b program, option 2, at its web site. In this program, one test of association is performed for genotypic LD, and the null hypothesis is that genotypes at one locus are independent from the genotypes at the other locus. The program creates contingency tables for all pairs of loci in each population and performs a Fisher exact test for each table using a Markov chain.
Supplementary Tables 1-4 and further details on microarray data will be available at our web site (www.genetics.ucla.edu/labs/pajukanta/fchl/chr1/). The raw data for the complete set of probe arrays can be accessed through the Gene Expression Omnibus at NCBI (www.ncbi.nlm.nih.gov/geo) using the GEO accession GSE590. The Finnish 90th age-sex specific percentile values for TC and TGs are available at the web site of the National Public Health Institute of Finland (www.ktl.fi/molbio/wwwpub/fchl/genomescan). We used dbSNP (available at www.ncbi.nlm.nih.gov) and CELERA (www.celera.com) for SNP selection; the UCSC Genome Browser (genome.ucsc.edu) for physical order of the genes and for annotation of the Alu element; BLAST (www.ncbi.nlm.nih.gov/blast/) for comparing sequences against human and mouse databases; LocusLink (www.ncbi.nlm.nih.gov/LocusLink/) to download annotation data; and Genepop (wbiomed.curtin.edu.au/genepop/index.html) to calculate intermarker LD.
DNA probes representing both strands of the regions of interest were ordered from Proligo and 5′-end-labeled with [γ-32P]ATP using T4 polynucleotide kinase. Excess unincorporated label was removed using the QIAquick kit (Qiagen) according to the manufacturer's instructions. Nuclear extracts were incubated for 30 minutes at room temperature in binding buffer (50 mM Tris-HCl (pH 7.5), 5 mM MgCl2, 2.5 mM EDTA, 2.5 mM DTT, 2.5 mM NaCl, 0.25 μg/μl poly(dI-dC)·poly(dI-dC), 20% glycerol) and then electrophoresed on a 6% polyacrylamide gel containing 0.5 M TBE buffer. Gels were autoradiographed at −70° C. To test for specificity of binding, the extracts were run with an increasing concentration of unlabeled “cold” ds-probe as well as a non-specific probe representing the sequence around the 3′-UTR SNP usf1s1, which did not produce a gel shift.
We selected 19 individuals for fat biopsy from our FCHL (ref. 6A) and low-HDL-C families33A based on their USF1 haplotype. They included 12 carriers of the risk-allele of the critical SNP usf1s2 and 7 individuals homozygous for the non-risk allele. Nine of these had been included in our original report6A. The average age in both groups was 49 years and the gender distribution was close to even (7 females and 5 males in the risk group versus 4 females and 3 males in the non-risk group). Fat biopsies were collected, RNA extracted and quantified as described previously6A. RNA labeling, array processing and scanning was done according to the standard protocol by Affymetrix with minor modifications, as described previously6A.
Scanned images were analyzed with Affymetrix Microarray Suite 5 (Affymetrix, Santa Clara, Calif.) software employing the Statistical Expression Algorithm. Global scaling to a target intensity of 100 was applied to all arrays, after which further data processing was carried out using GeneSpring 6.1 data analysis software (Silicon Genetics, Redwood City, Calif.). For each probe array, we applied a per gene normalization so that signal intensities were divided by the median intensity calculated using all 19 probe arrays, effectively centering the data around unity.
To identify differentially expressed genes between the two haplotypes, we adopted a strategy consisting of two filtering steps, in combination with a statistical analysis. First, we removed unreliable or inconsistent data using the Affymetrix detection calls, requiring genes to be scored as present in more than 50% of the samples in each haplotype group. In order to avoid losing potentially interesting data pertaining to genes whose expression was “turned off” in one group but “turned on” in the other, we also included genes scoring absent calls in 100% of samples in one group and at least 50% present calls in the other. Normalized values were then averaged over samples in each haplotype group and ratios of these were calculated. The distribution of the ratios was evaluated and a cut-off limit of 1.5-fold was selected to focus attention on the most prominent and reliable expression changes. We determined significant changes by applying a two-sample t-test, allowing for unequal variances across groups, where a two-sided P-value of 0.05 or lower was considered statistically significant. For the genes represented by more than one probe set on the array, the measurements associated with the more conservative P-value were used.
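The filtering and testing steps above can be sketched as follows. This is an illustrative outline, not the GeneSpring workflow: the function names are hypothetical, detection calls are assumed to be coded "P" (present), "M" (marginal) and "A" (absent), and only the Welch t statistic and its degrees of freedom are computed, leaving the conversion to a two-sided P-value to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def passes_detection_filter(calls_a, calls_b):
    """Keep genes present in >50% of both groups, or switched on/off between groups."""
    def frac_present(calls):
        return sum(c == "P" for c in calls) / len(calls)

    def all_absent(calls):
        return all(c == "A" for c in calls)

    both_present = frac_present(calls_a) > 0.5 and frac_present(calls_b) > 0.5
    switched = (all_absent(calls_a) and frac_present(calls_b) >= 0.5) or (
        all_absent(calls_b) and frac_present(calls_a) >= 0.5
    )
    return both_present or switched


def passes_fold_cutoff(group_a, group_b, cutoff=1.5):
    """True when the ratio of group means is at least cutoff-fold in either direction."""
    ratio = mean(group_a) / mean(group_b)
    return ratio >= cutoff or ratio <= 1.0 / cutoff


def welch_t(group_a, group_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    if va + vb == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


# Hypothetical normalized intensities for one gene in the two haplotype groups
risk, non_risk = [2.0, 4.0, 6.0], [1.0, 2.0, 3.0]
print(passes_fold_cutoff(risk, non_risk), welch_t(risk, non_risk))
```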
We evaluated the effect of haplotype on gene expression for selected genes using a two-sample t-test, with no assumption of equal variances. Two-sided significance values were calculated and a type I error probability of 5% or lower was used to determine statistical significance. To control for possible confounding contribution from clinically relevant parameters on the observed differences between haplotype groups, we performed analyses of co-variance (ANCOVA). BMI, levels of insulin and triglycerides and HOMA index were included as co-variates to the factor determined by haplotype group and separate models for each co-variate were evaluated for main and interaction effects. Again, we considered type I errors at a probability of 5% or lower statistically significant. Closer scrutiny of haplotype effects on the relationship between gene expression and co-variates was done by linear regression analysis. The linear models were evaluated studying R, R2 and the F statistic.
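For the regression step, a minimal sketch of a one-predictor linear model and the three quantities named above (R, R2 and the F statistic) might look as follows. The function name and the example values are hypothetical, and the ANCOVA models themselves are not reproduced here.

```python
from math import sqrt
from statistics import mean

def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, returning the
    correlation R, the coefficient of determination R2 and the F statistic
    with 1 and n - 2 degrees of freedom."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    r2 = r * r
    # A perfect fit leaves no residual variance, so F is unbounded.
    f = float("inf") if r2 >= 1.0 else r2 / (1.0 - r2) * (n - 2)
    return {"slope": slope, "intercept": intercept, "r": r, "r2": r2, "f": f}

# Hypothetical covariate (x) and normalized expression (y) values.
fit = simple_regression([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```

Fitting the same model separately within each haplotype group gives the per-group R2 values that allow the strength of a relationship to be compared between groups.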
Unsupervised hierarchical clustering of samples with respect to patterns of gene expression for selected genes was performed employing an agglomerative algorithm using unweighted pair-group average linkage (UPGA) amalgamation rules. Cluster similarity was determined with Pearson's correlation. We analyzed possible associations between branching pattern and gender, affection status (FCHL or low-HDL) and familial relationships by overlaying status information on the dendrogram and visually assessing potential clusters.
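The clustering procedure can be illustrated with a small sketch of agglomerative clustering under average linkage, using 1 minus Pearson's correlation as the distance between samples. The choice of 1 - r as the distance, the function names and the toy profiles are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson's correlation coefficient between two expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_linkage(profiles):
    """Repeatedly merge the two clusters with the smallest average pairwise
    distance between their members. Returns the merges in order as
    (members_a, members_b, distance)."""
    def dist(i, j):
        return 1.0 - pearson(profiles[i], profiles[j])

    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Toy profiles (hypothetical): samples 0 and 1 rise, samples 2 and 3 fall.
profiles = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [3.0, 2.0, 1.0], [6.0, 4.0, 2.5]]
merges = average_linkage(profiles)
```

The order of merges defines the dendrogram; overlaying sample annotations such as gender or cohort on its leaves then shows whether any branch is enriched for one category.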
Among the nine identified intragenic USF1 SNPs, two represent synonymous variants in the coding region, while seven were located in introns (
We first determined whether the region of usf1s2 represents a binding site for DNA binding proteins. We constructed two 34-mer probes (
A qualitative or quantitative functional change of a transcription factor such as USF1 would be expected to be reflected in the expression efficiency or pattern of the genes under its control. We hypothesized that if the usf1s2 polymorphism either was itself functional or served as a marker for an unknown functional element in the vicinity, we should be able to see a difference in the transcriptional profile of USF1-regulated genes in fat biopsies of individuals carrying either the “risk” or the “non-risk” allele. This would represent an eloquent in vivo approach to address the function of the potential susceptibility polymorphism. We queried a transcription factor database (Transfac) and the published literature, identified a total of 40 USF1-controlled genes and selected them for further analysis regardless of what was known about their biological pathway or tissue specificity (Table 4).
| Gene symbol | Gene name | | |
|---|---|---|---|
| APOE | Apolipoprotein E | X | X |
| ABCA1 | ATP-binding cassette, subfamily A | X | X |
| AGT | Angiotensinogen | X | X |
To study the possible effects of allelic variants of USF1 on the transcriptional profiles, we obtained fat biopsies from 19 individuals from our cohort of dyslipidemic families (FCHL and low-HDL-C). They included 7 individuals homozygous for the rare 2-2 genotype of usf1s2 (marking the “non-risk” haplotype) and 12 individuals carrying the common 1 allele (marking the “risk” haplotype) in either heterozygous (8) or homozygous (4) form. Of the 40 listed USF1-controlled genes, 29 were represented on the Affymetrix U133A chips used in this study, some by multiple probe sets. We found that 13 genes, represented by a total of 19 probe sets, were expressed in adipose tissue at a level high enough to produce reliable signals, and these were included in the study (Table 4). The list contained several genes highly relevant to lipid and glucose metabolism as well as a few genes whose relevance is not immediately obvious. After normalization, three genes (represented by a total of 6 probe sets, all in agreement) differed significantly (P≤0.05) in their expression between the two haplotype groups of USF1, as evaluated using a two-sample t-test with no assumption of equal variance. All three genes differentially expressed between individuals carrying either the “risk” or the “non-risk” haplotype of USF1 were highly relevant to the phenotype: the ATP-binding cassette subfamily A (ABCA1) (ref. 13A), angiotensinogen (AGT) (ref. 14A) and apolipoprotein E (APOE) (ref. 15A) (
Signals such as serum insulin and glucose are critical in the regulation of various metabolic genes. Insulin is known to influence the ability of USF1 to bind the E-box sequence and thus to participate in the regulation of gene expression in response to metabolic changes (ref. 16A). To evaluate the possible contribution of these factors to the expression of the USF1-controlled genes, we fitted ANCOVA models to the data. We further extended the models to test for possible effects of body mass index (BMI), triglycerides and HOMA (homeostatic model assessment), a measure of insulin resistance based on fasting serum insulin and glucose values (ref. 17A). For all but one of the genes tested, we observed no significant contribution from the various covariates, so the test statistics were essentially the same as those of the simple two-sample t-test. However, in agreement with earlier findings (ref. 18A), we observed a detectable effect of the insulin level on the expression of acetyl-CoA carboxylase alpha (ACACA) (P=0.05). This relationship was scrutinized more closely using linear regression, which demonstrated a moderately strong negative correlation (R2=0.453) between the steady-state transcript level of ACACA and fasting insulin levels. Partial regression for the haplotype groups additionally demonstrated that this correlation was much stronger in the individuals with the 2-2 “non-risk” haplotype (R2=0.956) than in individuals carrying the “risk” haplotype (R2=0.093) of USF1.
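As an illustration of the HOMA covariate, the widely used approximation of the insulin resistance index multiplies fasting insulin by fasting glucose and divides by 22.5. The sketch below assumes that form and those units; the exact definition applied in the study is the one given in the cited reference, and the function name and input values are hypothetical.

```python
def homa_ir(fasting_insulin_uU_per_ml, fasting_glucose_mmol_per_l):
    """Commonly used HOMA insulin resistance approximation.
    Assumption: insulin in microunits per mL, glucose in mmol per L."""
    return fasting_insulin_uU_per_ml * fasting_glucose_mmol_per_l / 22.5

# Hypothetical fasting values for one individual.
index = homa_ir(10.0, 4.5)
```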
We also tested whether parameters such as sex or study cohort (FCHL or low-HDL) needed to be taken into account in our analyses by performing an unsupervised clustering of individual expression levels. We detected no effect for any of the variables examined, as evidenced by the random clustering of individuals with respect to these variables (data not shown).
In addition to the analyses of known USF1-regulated genes, we tested the whole microarray data set for altered transcript levels between carriers of the different USF1 haplotypes. Approaches of this kind have been used successfully to identify pathways and collections of co-regulated genes in different sets (ref. 19A). This has most often been done when comparing groups with a clear phenotypic difference, such as diabetic vs. non-diabetic (ref. 19A) or cancer tissue vs. non-cancerous tissue (ref. 20A). In our study, changes were defined as significant when the expression difference was ≥1.5-fold and reached our limit of statistical significance (P≤0.05) in the two-sample t-test. This approach identified fifteen genes, of which 10 were upregulated and 5 downregulated in individuals with the non-risk haplotype (Table 5).
Again, the top gene on the list of genes downregulated in the risk individuals was APOE. The expression of APOE in the adipose tissue of individuals with the risk haplotype of USF1 was half that in individuals carrying the non-risk haplotype. Other potentially interesting genes on the list included CYP4B1, involved in fatty acid metabolism, and VEGF, which is involved in angiogenesis and hypertension and is an essential mediator of angiotensin II-induced vascular inflammation (ref. 21A). Experimental data are needed to verify whether USF1 plays a role in the regulation of these genes as well.
Finally, to investigate whether the putative regulatory element in intron 7 could represent a strong cis-regulatory element and exert control over the expression of other genes in the vicinity of USF1, we studied the expression levels of 10 flanking genes, from the 5′ CD244 gene all the way to APOA2, a stretch of 392 kb. Of these 10 genes, 6 are transcribed from the same DNA strand as USF1 and 4 from the opposite strand. The only probe set whose expression level differed significantly depending on an individual's allele at usf1s2 was one for the adjacent platelet F11 receptor (F11R) gene (P=0.013). This was interesting because the critical chromosomal interval showing an association in FCHL families reached into the F11R gene in alleles of high-triglyceride men (ref. 6A). Two probe sets represent F11R on the U133A array; however, only one showed a significant difference between the two USF1 haplotype groups. Upon closer examination of the representative sequence in the genome, we noted that the probe set showing differential expression did not actually represent the F11R gene, but rather a short expressed sequence tag (EST) (AAW995043) immediately adjacent to it, 43.5 kb 3′ of the USF1 gene.
| Number | Date | Country | Kind |
|---|---|---|---|
| 04003554.5 | Feb 2004 | EP | regional |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/EP05/01624 | 2/17/2005 | WO | 00 | 2/1/2007 |