Type 2 diabetes (T2D) is characterized by dysfunction in pancreatic islet cells resulting in insulin resistance and elevated blood glucose levels. Genome-wide association studies (GWAS) have identified >150 loci associated with T2D and diabetes-related glycemic traits such as fed and fasting glucose and insulin levels1. The prevailing model that has emerged from collective analyses is that islet dysfunction plays a major role in T2D genetic risk2. Protein-coding variants have been implicated as the most likely causal variants for only a handful of loci3. Thus, the large majority of SNPs identified by T2D GWAS likely reside in non-coding, regulatory regions of the genome. Identifying the causal variant(s), target gene(s), and direction-of-effect for each of these loci are important steps to translate GWAS results into mechanistic understanding of islet gene regulation and T2D pathogenesis and to identify and develop new therapeutic targets and strategies4.
Provided herein, in some embodiments, are cells and transgenic animals (and methods for producing these cells and animals) for use, for example, as models for the assessment of diabetes and associated conditions. Also provided herein, in other embodiments, are methods for assessing risk, diagnosing, preventing, and/or treating diabetes. The technology as provided herein is based, at least in part, on genome-wide association studies (GWAS) and functional genomics approaches that have implicated enhancer disruption in islet dysfunction and type 2 diabetes (T2D) genetic risk. The present disclosure describes genetic fine-mapping and functional (epi)genomic approaches to a T2D-associated and proinsulin-associated locus on 15q22.2 to identify a causal variant, determine its direction-of-effect, and elucidate plausible target genes. Fine-mapping and conditional analyses of proinsulin levels of 8,635 non-diabetic individuals from the METSIM study support a single association signal represented by a cluster of sixteen strongly associated (p<10−17) variants in high linkage disequilibrium (r2>0.8) with the GWAS index SNP rs7172432. These variants reside in an evolutionarily and functionally conserved islet/beta cell stretch/super enhancer (SE); the most strongly associated variant (rs7163757, p=3×10−19) overlaps a conserved islet open chromatin site. DNA sequence containing the rs7163757 risk allele displayed two-fold higher enhancer activity than the non-risk allele in reporter assays (p<0.01) and was differentially bound by beta cell nuclear extract proteins. The nuclear factor of activated T cells (NFAT) transcription factor specifically potentiated risk allele enhancer activity and altered patterns of nuclear protein binding to the risk allele, suggesting it may be a factor mediating risk allele effects.
Unexpectedly, the rs7163757 proinsulin-raising and T2D risk (C) allele was associated with increased human islet C2CD4B expression and possibly C2CD4A expression, both of which were induced by diabetogenic inflammatory islet stressors. Together, these data suggest that the rs7163757 risk allele contributes to genetic risk of islet dysfunction and T2D by increasing NFAT-mediated islet enhancer activity and modulating C2CD4B, and likely C2CD4A, expression.
Thus, some aspects of the present disclosure provide cells comprising a modification in at least one allele of a C2 calcium-dependent domain containing 4A (C2cd4a) gene, in at least one allele of a C2 calcium-dependent domain containing 4B (C2cd4b) gene, or in at least one allele of a C2cd4a gene and in at least one allele of a C2cd4b gene. In some aspects, the present disclosure provides a (at least one) cell in a transgenic animal (e.g., a rodent, such as a mouse), the cell comprising a modification in at least one allele of a C2cd4a gene, in at least one allele of a C2cd4b gene, or in at least one allele of a C2cd4a gene and in at least one allele of a C2cd4b gene. In some embodiments, the modification is a deletion.
Some aspects of the present disclosure provide cells comprising an inactivated C2cd4a allele, an inactivated C2cd4b allele, or an inactivated C2cd4a allele and an inactivated C2cd4b allele. In some aspects, the present disclosure provides a (at least one) cell in a transgenic animal (e.g., a rodent, such as a mouse), the cell comprising an inactivated C2cd4a allele, an inactivated C2cd4b allele, or an inactivated C2cd4a allele and an inactivated C2cd4b allele.
In some embodiments, the genotype of the cell is C2cd4a−/−. In other embodiments, the genotype of the cell is C2cd4b−/−. In yet other embodiments, the genotype of the cell is C2cd4a−/−/C2cd4b−/−.
Other aspects of the present disclosure provide a transgenic mouse that comprises an inactivated C2cd4a allele, an inactivated C2cd4b allele, or an inactivated C2cd4a allele and an inactivated C2cd4b allele.
Also provided herein, in some aspects, are methods of producing C2cd4a−/−, C2cd4b−/−, or C2cd4a−/−/C2cd4b−/− rodent cells and animals (e.g., rodents, such as mice).
Further aspects of the present disclosure provide methods for preventing or treating diabetes. In some embodiments, the methods comprise assaying a genomic sample obtained from a subject for a single nucleotide polymorphism (SNP) selected from rs4502156 allele (T), rs1881415 allele (T), rs67818839 allele (A), rs7163757 allele (C), rs8037894 allele (G), rs8038040 allele (G), rs6494307 allele (C), rs7161785 allele (G), rs7167878 allele (C), rs7167932 allele (C), rs7172432 allele (A), rs7173964 allele (G), rs4775466 allele (C), rs4775467 allele (G), rs10083587 allele (C), and rs11856307 (A). In some embodiments, when the SNP is present in the genomic sample, the methods further comprise diagnosing the subject as having diabetes or as at risk of diabetes. In some embodiments, when the SNP is present in the genomic sample, the methods further comprise administering to the subject a therapy (e.g., for insulin resistance) to prevent or treat diabetes in the subject.
In other embodiments, the methods comprise assaying a genomic sample obtained from a subject (e.g., a human subject) for an intergenic variant located in a region between a C2CD4A gene and a C2CD4B gene, and measuring a level of proinsulin in a biological sample obtained from the subject.
In some embodiments, when the intergenic variant is present in the genomic sample and the level of proinsulin in the biological sample is greater than 20 pmol/L (e.g., 25 pmol/L, 30 pmol/L, 35 pmol/L, 40 pmol/L, 45 pmol/L, or 50 pmol/L), the method further comprises administering to the subject a therapy (e.g., for insulin resistance) to prevent or treat diabetes in the subject.
Genome-wide association studies (GWAS) in different populations have reported multiple T2D index SNPs in the C2 calcium-dependent domain-containing protein 4A/C2 calcium-dependent domain-containing protein 4B/Vacuolar protein sorting 13 homolog C locus (C2CD4A [MIM 610343], C2CD4B [MIM 610344], VPS13C [MIM 608879]) on 15q22.2, each associated with a T2D odds ratio (OR) of ˜1.1 (range 1.06-1.14)5-8. Physiologic studies in non-diabetic European individuals suggest these variants do not affect insulin sensitivity, but rather contribute to T2D genetic risk through islet dysfunction. Reported T2D index SNP risk alleles were associated with increased fasting glucose9-11 and proinsulin levels11,12 and with decreased 2-hour glucose11,13, HOMA-B9, glucose stimulated insulin secretion/release (GSIS/GSIR)10, insulinogenic index10,11, and BIGTT-based acute insulin release. Conditional analysis of this locus in a Danish cohort indicated that rs7172432, or SNPs in high linkage disequilibrium (LD), exhibited stronger effects on fasting glucose and GSIR than several other SNPs reported10. Similarly, the rs7172432 risk “A” allele was associated with increased proinsulin levels in 8,224 METSIM study participants (β=0.042±0.005, p=7.4×10−18)12.
Genome-wide functional genomic and epigenomic analyses revealed that T2D GWAS SNPs are specifically and significantly enriched in islet enhancers14-17, suggesting that these variants may perturb enhancer activity and transcriptional regulation in the islet to contribute genetic susceptibility to islet dysfunction and T2D18. In particular, associated variants are enriched in islet stretch/super enhancers (SEs), which are large (>3 kb), tissue-specific enhancer regions located in or nearby genes important for cell type-specific functions, such as genes encoding insulin (INS [MIM 176730]), Pancreas/Duodenum homeobox protein 1 (PDX1 [MIM 600733]), and the regulatory and catalytic subunits of the ATP-binding cassette, subfamily C, member 8/Potassium channel, inwardly rectifying, subfamily J, member 11 (ABCC8 [MIM 600509], KCNJ11 [MIM 600937]) in islets14. As such, SE chromatin state signatures can be used to nominate likely important regulatory regions in or nearby genes of unknown function.
In this disclosure, genetic and functional genomic approaches were combined to investigate the T2D and T2D-related metabolic trait GWAS association on chromosome 15q22.2. Together, the data identify rs7163757 as the most likely causal variant in this locus and implicate C2CD4B [MIM 610344] and C2CD4A [MIM 610343] induction in T2D genetic risk and diabetogenic islet stress responses. Genetic fine-mapping identified a single association signal consisting of sixteen highly associated variants, which reside in an islet stretch enhancer state between C2CD4A and C2CD4B. rs7163757 is the only variant overlapping an islet open chromatin site. The rs7163757 T2D risk allele (C) exhibits two-fold higher enhancer activity than the non-risk allele (T), is differentially bound by beta cell nuclear factors, and is associated with increased C2CD4B, and likely C2CD4A, expression in human pancreatic islets.
Some aspects of the present disclosure provide cells and/or animals (e.g., rodents, such as mice) comprising an inactivated C2 calcium-dependent domain containing 4A (C2cd4a) allele, an inactivated C2 calcium-dependent domain containing 4B (C2cd4b) allele, or an inactivated C2cd4a allele and an inactivated C2cd4b allele.
In some embodiments, the cells and/or animals comprise a modification in an allele of a C2cd4a gene, in an allele of a C2cd4b gene, or in an allele of a C2cd4a gene and in an allele of a C2cd4b gene. In some embodiments, the modification is in a C2cd4a gene (e.g., Mus musculus C2cd4a, Chromosome 9 NC_000075.6 (67830532..67832330, complement); or Rattus norvegicus C2cd4a, Chromosome 8 NC_005107.4 (73671516..73686344, complement)). In some embodiments, the modification is in a C2cd4b gene (e.g., Mus musculus C2cd4b, Chromosome 9 NC_000075.6 (67716225..67760933); or Rattus norvegicus C2cd4b, Chromosome 8 NC_005107.4 (73589848..73594516)). In some embodiments, the modification is in a C2cd4a gene and in a C2cd4b gene.
In some embodiments, the cells and/or animals comprise a modification in a genetic element that regulates expression of an allele of a C2cd4a gene, a modification in a genetic element that regulates expression of an allele of a C2cd4b gene, or a modification in a genetic element that regulates expression of an allele of a C2cd4a gene and a modification in a genetic element that regulates expression of an allele of a C2cd4b gene. In some embodiments, the genetic element is a promoter.
An inactivated gene is a gene that is not transcribed and/or does not encode a functional protein. A modification, with respect to a nucleic acid, is any manipulation of the nucleic acid, relative to the corresponding wild-type nucleic acid (e.g., the naturally-occurring nucleic acid). Non-limiting examples of nucleic acid modifications include deletions and/or insertions (e.g., “indels” and frameshift mutations) as well as substitutions (e.g., point mutations). Modifications also include chemical modifications, for example, chemical modifications of a nucleobase. Methods of nucleic acid modification, for example, those that result in gene inactivation, are known and include, without limitation, RNA interference, chemical modification, and gene editing (e.g., using recombinases or other programmable nuclease systems, e.g., CRISPR/Cas, TALENs, and/or ZFNs).
In some embodiments, the cells and/or animals comprise a deletion in at least one (e.g., one or more) allele of a C2cd4a gene, in at least one allele of a C2cd4b gene, or in at least one allele of a C2cd4a gene and in at least one allele of a C2cd4b gene. In some embodiments, the deletion is in a C2cd4a gene (e.g., Mus musculus C2cd4a, Chromosome 9 NC_000075.6 (67830532..67832330, complement); or Rattus norvegicus C2cd4a, Chromosome 8 NC_005107.4 (73671516..73686344, complement)). In some embodiments, the deletion is in a C2cd4b gene (e.g., Mus musculus C2cd4b, Chromosome 9 NC_000075.6 (67716225..67760933); or Rattus norvegicus C2cd4b, Chromosome 8 NC_005107.4 (73589848..73594516)). In some embodiments, the deletion is in a C2cd4a gene and in a C2cd4b gene.
A deletion refers to a mutation or loss of a genomic sequence. A gene deletion refers to a mutation or loss of a genomic sequence in or near a gene that prevents transcription of the gene and translation of a functional gene product. For example, a C2cd4a deletion may refer to a complete or partial loss of the C2cd4a gene sequence such that a functional C2cd4a gene product is not produced. Likewise, a C2cd4b deletion may refer to a complete or partial loss of the C2cd4b gene sequence such that a functional C2cd4b gene product is not produced.
An allele, as known in the art, is one of two or more alternative forms of a gene that arise by mutation and are found at the same place on a chromosome. In some embodiments, a cell comprises a modification (e.g., deletion) in one allele of a C2cd4a gene. In some embodiments, a cell comprises a modification (e.g., deletion) in both alleles of a C2cd4a gene. In some embodiments, a cell comprises a modification (e.g., deletion) in one allele of a C2cd4b gene. In some embodiments, a cell comprises a deletion in both alleles of a C2cd4b gene.
It is understood in the art that a cell comprising a deletion in one allele of a C2cd4a gene is referred to as having a C2cd4a−/+ genotype. Likewise, a cell comprising a deletion in one allele of a C2cd4b gene has a C2cd4b−/+ genotype. A cell comprising a deletion in both alleles of a C2cd4a gene has a C2cd4a−/− genotype, and a cell comprising a deletion in both alleles of a C2cd4b gene has a C2cd4b−/− genotype.
The cells provided herein are transgenic cells, meaning that the cells comprise an exogenous (foreign) nucleic acid. In some embodiments, the cells are eukaryotic cells, such as mammalian cells. Non-limiting examples of mammalian cells include human cells and rodent cells (e.g., mouse cells and/or rat cells). In some embodiments, the cells are islet beta cells, such as human islet beta cells and/or rodent islet beta cells (e.g., mouse islet beta cells or rat islet beta cells). In some embodiments, the cells are stem cells, e.g., embryonic stem cells. Other cell types (e.g., primate, equine, bovine, porcine, canine, and/or feline) are provided herein.
In some embodiments, a cell is a rodent cell. In some embodiments, a cell is a mouse cell. For example, the mouse cell may be a New Zealand Obese (NZO) mouse cell (e.g., NZO/HILtJ cells; The Jackson Laboratory, Stock No: 002105). NZO inbred mice and strains derived from them develop severe obesity, and are thus useful for studying obesity and Type 2 diabetes.
In some embodiments, the transgenic cells as provided herein are present in a transgenic animal. Thus, the present disclosure provides transgenic animals comprising a transgenic cell. In some embodiments, a transgenic animal comprises a (at least one) C2cd4a−/+ cell. In some embodiments, a transgenic animal comprises a C2cd4a−/− cell. In some embodiments, a transgenic animal comprises a C2cd4b−/+ cell. In some embodiments, a transgenic animal comprises a C2cd4b−/− cell. In some embodiments, a transgenic animal comprises a C2cd4a−/+ cell and a C2cd4b−/+ cell. In some embodiments, a transgenic animal comprises a C2cd4a−/− cell and a C2cd4b−/+ cell. In some embodiments, a transgenic animal comprises a C2cd4a−/− cell and a C2cd4b−/− cell. The transgenic animal may be a rodent, for example a mouse or a rat. Other transgenic animals (e.g., primate, equine, bovine, porcine, canine, and/or feline) are provided herein.
Some aspects of the present disclosure provide methods for producing transgenic cells and/or animals. Methods of transgenic production are known in the art, any of which may be used herein. Examples of such methods include DNA microinjection, embryonic stem cell-mediated gene transfer, and retrovirus-mediated gene transfer. See, e.g., Kumar T R, et al. Methods Mol Biol. 2009; 590: 335-362, incorporated herein by reference. In some embodiments, a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas-mediated gene transfer approach is used. As one example, a transgenic mouse model can be generated with CRISPR/Cas9 by injecting Cas9 mRNA and either one or multiple single guide RNAs (sgRNA) directly into mouse embryos to generate precise genomic edits at specific loci. Mice that develop from these embryos are genotyped or sequenced to determine if they carry the desired mutation(s), and those that do are bred to confirm germline transmission. As another example, mouse strains with Cre recombinase-dependent Cas9 expression may be used. These mouse strains allow for in vivo CRISPR gene editing wherever a viral vector co-expressing Cre and the sgRNA is injected. The virally-expressed Cre turns on Cas9 expression, which in turn edits the targeted gene or genes. As yet another example, in vivo gene editing in mice can be accomplished by local or systemic injection of separate Cas9 and sgRNA expressing lenti- or adeno-associated viruses. CRISPR-Cas9-mediated gene editing occurs in cells that express both expression vectors.
Thus, provided herein are methods for producing a C2cd4a−/− rodent cell, the methods comprising introducing into the rodent cell a nucleic acid encoding a gRNA that targets a C2cd4a gene, wherein (a) the cell expresses a Cas nuclease, (b) the method further comprises activating expression of a Cas nuclease in the cell, or (c) the method further comprises introducing into the cell a nucleic acid encoding a Cas nuclease, culturing the cell, and producing the C2cd4a−/− rodent cell. The methods may further comprise introducing into the C2cd4a−/− rodent cell a nucleic acid encoding a gRNA that targets a C2cd4b gene, wherein (a) the C2cd4a−/− rodent cell expresses a Cas nuclease, (b) the method further comprises activating expression of a Cas nuclease in the C2cd4a−/− rodent cell, or (c) the method further comprises introducing into the C2cd4a−/− rodent cell a nucleic acid encoding a Cas nuclease, culturing the C2cd4a−/− rodent cell, and producing a C2cd4a−/−/C2cd4b−/− rodent cell.
Other aspects of the present disclosure provide methods for producing a C2cd4b−/− rodent cell, the methods comprising introducing into the rodent cell a nucleic acid encoding a gRNA that targets a C2cd4b gene, wherein (a) the cell expresses a Cas nuclease, (b) the method further comprises activating expression of a Cas nuclease in the cell, or (c) the method further comprises introducing into the cell a nucleic acid encoding a Cas nuclease, culturing the cell, and producing the C2cd4b−/− rodent cell. In some embodiments, the methods further comprise introducing into the C2cd4b−/− rodent cell a nucleic acid encoding a gRNA that targets a C2cd4a gene, wherein (a) the C2cd4b−/− rodent cell comprises a Cas nuclease, (b) the method further comprises activating expression of a Cas nuclease in the C2cd4b−/− rodent cell, or (c) the method further comprises introducing into the C2cd4b−/− rodent cell a nucleic acid encoding a Cas nuclease, culturing the C2cd4b−/− rodent cell, and producing a C2cd4a−/−/C2cd4b−/− rodent cell.
Yet other aspects of the present disclosure provide methods for producing a C2cd4a−/−/C2cd4b−/− rodent cell, the methods comprising introducing into the rodent cell a nucleic acid encoding a gRNA that targets a C2cd4a gene and a nucleic acid encoding a gRNA that targets a C2cd4b gene, wherein (a) the cell expresses a Cas nuclease, (b) the method further comprises activating expression of a Cas nuclease in the cell, or (c) the method further comprises introducing into the cell a nucleic acid encoding a Cas nuclease, culturing the cell, and producing the C2cd4a−/−/C2cd4b−/− rodent cell.
As is known in the art, the CRISPR pathway includes two principal components: the Cas (e.g., Cas9) nuclease and a guide RNA (gRNA). The guide RNA is a two component system including crRNA and tracrRNA. The crRNA targets the double stranded DNA to be cut, and has a short region of homology allowing it to bind the tracrRNA. The tracrRNA provides a stem loop structure which associates with the Cas nuclease. The crRNA:tracrRNA duplex is referred to as the gRNA. The Cas9 nuclease and gRNA form a Cas9 ribonucleoprotein (RNP), which can bind and cut a specific DNA target in a whole genome context. In order to be cleaved by the RNP, a target includes two specific sequences. First, the target typically includes approximately 20 bases of homology to the gRNA, which is referred to as the protospacer. Second, the target includes a short protospacer adjacent motif (PAM), which the Cas nuclease requires in order to bind to the target DNA. If the linking tracrRNA is present, and enough homology exists between the gRNA and the genomic target, the RNP cleaves both strands of the target DNA, creating a double-stranded break (DSB) at this precise location in the genome.
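As a minimal sketch of the two target requirements described above, the following Python function scans a DNA sequence for 20-base protospacers that are immediately followed by a PAM. The NGG motif (the PAM commonly associated with Streptococcus pyogenes Cas9), the function name, and the toy sequence are assumptions made for illustration and are not taken from the present disclosure; only the forward strand is scanned, and no off-target assessment is attempted.

```python
import re


def find_protospacers(seq, protospacer_len=20, pam_regex="[ACGT]GG"):
    """Return (start, protospacer, pam) tuples found on the forward strand.

    pam_regex defaults to NGG, an assumed PAM for illustration only.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(
        "(?=([ACGT]{%d})(%s))" % (protospacer_len, pam_regex)
    )
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy sequence (hypothetical; not a C2cd4a or C2cd4b sequence).
toy = "ACGTACGTACGTACGTACGTTGGCA"
for start, protospacer, pam in find_protospacers(toy):
    print(start, protospacer, pam)
```

In practice, candidate sites on the reverse strand would also be considered, and candidates would be filtered for proximity to the intended target site and for predicted off-target activity.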
In some embodiments, a Cas nuclease is a Cas9 nuclease. It should be understood that in any of the embodiments described herein, the Cas nuclease may be substituted with a Cpf1 nuclease or other CRISPR-associated nuclease. Cas nuclease and Cpf1 nuclease variants are also encompassed herein.
Methods of identifying gRNAs for use in modifying or deleting a nucleic acid sequence (e.g., of an allele) are known. For example, there are various commercial companies that offer computational programs to guide the selection of gRNA targets. See, e.g., Addgene's Validated gRNA Sequence Datatable. The general principles guiding gRNA selection include: identifying the region of the genome for targeting (the intended target site), identifying protospacer sequences near the intended target site, and selecting protospacer sequences that minimize off-target effects. Examples of gRNA sequences used to target C2cd4a and C2cd4b are as follows:
Methods of introducing nucleic acids (e.g., DNA and/or RNA) and proteins into cells are known, any of which may be used herein. In some embodiments, the step of introducing a nucleic acid (e.g., a gRNA or a nucleic acid encoding a gRNA, or a nucleic acid encoding Cas nuclease) into a cell is performed by electroporation, viral transduction, chemical transfection, or other non-chemical transfection methods.
Transgenic cells produced herein may be used to produce transgenic animals. In some embodiments, the methods further comprise introducing a transgenic cell such as a transgenic embryonic stem cell into an inner cell mass of a blastocyst, which is then implanted in the uterus of a female animal. In some embodiments, the methods further comprise introducing a transgenic embryonic murine stem cell into an inner cell mass of a murine blastocyst, which is then implanted in the uterus of a female mouse (female mice), such as a NZO/HILtJ mouse.
Heterozygous offspring of a female mouse may be mated to produce homozygous mice having, for example, a C2cd4a−/−, C2cd4b−/−, or C2cd4a−/−/C2cd4b−/− genotype.
Prevention and/or Treatment of Type II Diabetes
Some aspects of the present disclosure provide methods for preventing and/or treating diabetes (e.g., Type II diabetes). Genetic fine-mapping of genome-wide association study (GWAS) sequences described herein identified a single association signal including sixteen highly-associated single nucleotide polymorphisms (SNPs) (T2D-associated SNPs): rs4502156 allele (T), rs1881415 allele (T), rs67818839 allele (A), rs7163757 allele (C), rs8037894 allele (G), rs8038040 allele (G), rs6494307 allele (C), rs7161785 allele (G), rs7167878 allele (C), rs7167932 allele (C), rs7172432 allele (A), rs7173964 allele (G), rs4775466 allele (C), rs4775467 allele (G), rs10083587 allele (C), and rs11856307 (A), which are associated with high levels of fasting proinsulin and with T2D. Thus, in some embodiments, the methods comprise assaying a genomic sample for at least one of these sixteen highly-associated SNPs. In some embodiments, the methods comprise assaying a genomic sample for rs4502156 allele (T). In some embodiments, the methods comprise assaying a genomic sample for rs1881415 allele (T). In some embodiments, the methods comprise assaying a genomic sample for rs67818839 allele (A). In some embodiments, the methods comprise assaying a genomic sample for rs7163757 allele (C). In some embodiments, the methods comprise assaying a genomic sample for rs8037894 allele (G). In some embodiments, the methods comprise assaying a genomic sample for rs8038040 allele (G). In some embodiments, the methods comprise assaying a genomic sample for rs6494307 allele (C). In some embodiments, the methods comprise assaying a genomic sample for rs7161785 allele (G). In some embodiments, the methods comprise assaying a genomic sample for rs7167878 allele (C). In some embodiments, the methods comprise assaying a genomic sample for rs7167932 allele (C). In some embodiments, the methods comprise assaying a genomic sample for rs7172432 allele (A).
In some embodiments, the methods comprise assaying a genomic sample for rs7173964 allele (G). In some embodiments, the methods comprise assaying a genomic sample for rs4775466 allele (C). In some embodiments, the methods comprise assaying a genomic sample for rs4775467 allele (G). In some embodiments, the methods comprise assaying a genomic sample for rs10083587 allele (C). In some embodiments, the methods comprise assaying a genomic sample for rs11856307 allele (A).
A genomic (DNA) sample may be assayed for a particular SNP by simply sequencing the genomic sample. For example, DNA isolated from a biological sample obtained from a subject may be sequenced using any one of the basic or high-throughput sequencing methods known in the art (see, e.g., Heather J M et al. Genomics, 2016; 107(1): 1-8, incorporated herein by reference). Other methods for assaying a SNP include single strand conformation polymorphism (SSCP), cleavage fragment length polymorphism analysis (CFLPA), and denaturing high performance liquid chromatography (DHPLC).
In some embodiments, a genomic sample is obtained from a biological sample from a subject. For example, a genomic sample may be obtained from blood, saliva, or urine. Other biological samples may be used.
In other embodiments, the methods comprise assaying a genomic sample obtained from a subject (e.g., a human subject) for an intergenic variant located in a region between a C2 calcium-dependent domain containing 4A gene (C2CD4A) and a C2 calcium-dependent domain containing 4B gene (C2CD4B). An intergenic variant may be any change in nucleotide sequence in a region located between two genes. In some embodiments, the intergenic variant is a SNP. In other embodiments, the intergenic variant is a deletion (of one or more nucleotides). In yet other embodiments, the intergenic variant is an insertion (of one or more nucleotides). Intergenic variants also include nucleotide modifications. In some embodiments, the intergenic variant is a SNP selected from rs4502156 allele (T), rs1881415 allele (T), rs67818839 allele (A), rs7163757 allele (C), rs8037894 allele (G), rs8038040 allele (G), rs6494307 allele (C), rs7161785 allele (G), rs7167878 allele (C), rs7167932 allele (C), rs7172432 allele (A), rs7173964 allele (G), rs4775466 allele (C), rs4775467 allele (G), rs10083587 allele (C), and rs11856307 (A). In some embodiments, the intergenic variant is SNP rs7163757 allele (C).
In some embodiments, the methods further comprise measuring a level of proinsulin in a biological sample obtained from the subject. Insulin resistance and deterioration of beta-cell secretion are main features in the development of type 2 diabetes, which is reflected in increasing serum intact proinsulin levels. Proinsulin is synthesized by the beta cell of the pancreas as a precursor molecule for insulin. Physiologically, virtually all proinsulin molecules are intracellularly cleaved by carboxypeptidases into insulin and C-peptide. Normal (non-diabetic) subjects typically have proinsulin concentrations below the upper limit of the normal fasting reference range (˜22 pmol/L) when hypoglycemic (blood glucose <60 mg/dL). Proinsulin levels may be measured, for example, using a blood test.
Thus, when the intergenic variant (e.g., SNP rs7163757 allele (C)) is present in a genomic sample and/or the level of proinsulin in the biological sample is greater than 22 pmol/L (e.g., greater than 25, 30, 35, 40, 45, or 50 pmol/L), the methods may further comprise administering to the subject a therapy to prevent or treat diabetes in the subject. In some embodiments, the therapy is for insulin resistance. Therapies for insulin resistance to prevent or treat diabetes include pharmacologic therapies that reduce insulin resistance (insulin-sensitizing and antihyperglycemic effects) such as metformin, thiazolidinediones, and concentrated insulin (U-100 insulin or U-500 insulin) and surgical treatment of underlying causes, such as bariatric surgery (e.g., gastric banding, sleeve gastrectomy, and gastric bypass) in selected morbidly obese subjects.
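The criteria described above can be sketched as a small Python function. This is an illustrative sketch only and not a clinical tool: the function name, parameter names, and the `require_both` flag (introduced to cover the "and/or" wording) are hypothetical, and the 22 pmol/L default is the reference value stated in the text.

```python
def therapy_indicated(variant_present, proinsulin_pmol_per_l,
                      threshold_pmol_per_l=22.0, require_both=False):
    """Return True when the stated criteria for administering a therapy are met.

    require_both=True models the "and" reading; False models the "or" reading.
    """
    elevated = proinsulin_pmol_per_l > threshold_pmol_per_l
    if require_both:
        return bool(variant_present and elevated)
    return bool(variant_present or elevated)


# Hypothetical subject: variant detected, proinsulin measured at 30 pmol/L.
print(therapy_indicated(True, 30.0, require_both=True))
```

Passing a different `threshold_pmol_per_l` (e.g., 25 or 30) covers the alternative cut-offs listed in the text.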
In some embodiments, the therapy for diabetes comprises administering to the subject an agent that reduces expression and/or activity of C2CD4A protein and/or C2CD4B protein. In some embodiments, the therapy for diabetes comprises administering to the subject an agent that reduces expression of C2CD4A protein. In some embodiments, the therapy for diabetes comprises administering to the subject an agent that reduces activity of C2CD4A protein. In some embodiments, the therapy for diabetes comprises administering to the subject an agent that reduces expression of C2CD4B protein. In some embodiments, the therapy for diabetes comprises administering to the subject an agent that reduces activity of C2CD4B protein. Examples of these agents include RNA interference molecules, such as siRNA and shRNA molecules, small molecule drugs, and other agents that reduce expression and/or activity of C2CD4A protein and/or C2CD4B protein.
In some embodiments, the therapy for insulin resistance comprises administering to the subject an agent that reduces expression and/or activity of C2CD4A protein and/or C2CD4B protein. In some embodiments, the therapy for insulin resistance comprises administering to the subject an agent that reduces expression of C2CD4A protein. In some embodiments, the therapy for insulin resistance comprises administering to the subject an agent that reduces activity of C2CD4A protein. In some embodiments, the therapy for insulin resistance comprises administering to the subject an agent that reduces expression of C2CD4B protein. In some embodiments, the therapy for insulin resistance comprises administering to the subject an agent that reduces activity of C2CD4B protein. Examples of these agents include RNA interference molecules, such as siRNA and shRNA molecules, small molecule drugs, and other agents that reduce expression and/or activity of C2CD4A protein and/or C2CD4B protein.
In some embodiments, the therapy for diabetes comprises administering to the subject an agent that reduces calcineurin (Cn)/NFAT pathway activity. In some embodiments, the therapy for insulin resistance comprises administering to the subject an agent that reduces calcineurin (Cn)/NFAT pathway activity. The NFAT family transcription factors are highly expressed in human and rodent islet cells and regulate the expression of several T2D-associated genes. Calcineurin mediates the localization of NFAT proteins from the cytosol to the nucleus where they can regulate gene expression. Examples of agents that reduce (inhibit) Cn/NFAT pathway activity include the microbial drugs Cyclosporin A (CsA) and FK506. Other Cn/NFAT pathway inhibitors may be used.
Some aspects of the present disclosure provide methods for diagnosing diabetes (e.g., Type II diabetes (T2D)) in a subject. In some embodiments, diagnostic methods comprise assaying a genomic sample obtained from a subject (e.g., a human subject) for at least one of the following sixteen highly-associated SNPs (T2D-associated SNPs): rs4502156 allele (T), rs1881415 allele (T), rs67818839 allele (A), rs7163757 allele (C), rs8037894 allele (G), rs8038040 allele (G), rs6494307 allele (C), rs7161785 allele (G), rs7167878 allele (C), rs7167932 allele (C), rs7172432 allele (A), rs7173964 allele (G), rs4775466 allele (C), rs4775467 allele (G), rs10083587 allele (C), and/or rs11856307 allele (A). In some embodiments, the methods comprise assaying a genomic sample for rs4502156 allele (T). In some embodiments, the methods comprise assaying a genomic sample for rs1881415 allele (T). In some embodiments, the methods comprise assaying a genomic sample for rs67818839 allele (A). In some embodiments, the methods comprise assaying a genomic sample for rs7163757 allele (C). In some embodiments, the methods comprise assaying a genomic sample for rs8037894 allele (G). In some embodiments, the methods comprise assaying a genomic sample for rs8038040 allele (G). In some embodiments, the methods comprise assaying a genomic sample for rs6494307 allele (C). In some embodiments, the methods comprise assaying a genomic sample for rs7161785 allele (G). In some embodiments, the methods comprise assaying a genomic sample for rs7167878 allele (C). In some embodiments, the methods comprise assaying a genomic sample for rs7167932 allele (C). In some embodiments, the methods comprise assaying a genomic sample for rs7172432 allele (A). In some embodiments, the methods comprise assaying a genomic sample for rs7173964 allele (G). In some embodiments, the methods comprise assaying a genomic sample for rs4775466 allele (C).
In some embodiments, the methods comprise assaying a genomic sample for rs4775467 allele (G). In some embodiments, the methods comprise assaying a genomic sample for rs10083587 allele (C). In some embodiments, the methods comprise assaying a genomic sample for rs11856307 allele (A). In some embodiments, when the SNP is present in the genomic sample, the methods further comprise diagnosing the subject as having diabetes (e.g., T2D) or as at risk of diabetes (e.g., T2D).
Methods for assaying a SNP are described elsewhere herein.
In other embodiments, diagnostic methods comprise assaying a genomic sample obtained from a subject (e.g., a human subject) for an intergenic variant located in a region between a C2 calcium-dependent domain containing 4A gene (C2CD4A) and a C2 calcium-dependent domain containing 4B gene (C2CD4B). In some embodiments, the intergenic variant is a SNP. In other embodiments, the intergenic variant is a deletion (of one or more nucleotides). In yet other embodiments, the intergenic variant is an insertion (of one or more nucleotides). Intergenic variants also include nucleotide modifications. In some embodiments, the intergenic variant is a SNP selected from rs4502156 allele (T), rs1881415 allele (T), rs67818839 allele (A), rs7163757 allele (C), rs8037894 allele (G), rs8038040 allele (G), rs6494307 allele (C), rs7161785 allele (G), rs7167878 allele (C), rs7167932 allele (C), rs7172432 allele (A), rs7173964 allele (G), rs4775466 allele (C), rs4775467 allele (G), rs10083587 allele (C), and rs11856307 allele (A). In some embodiments, the intergenic variant is SNP rs7163757 allele (C). In some embodiments, when the intergenic variant (e.g., SNP rs7163757 allele (C)) is present in a genomic sample, the methods further comprise diagnosing the subject as having diabetes (e.g., T2D) or as at risk of diabetes (e.g., T2D).
In some embodiments, the diagnostic methods further comprise measuring a level of proinsulin in a biological sample obtained from the subject. In some embodiments, when the intergenic variant (e.g., SNP rs7163757 allele (C)) is present in a genomic sample and/or the level of proinsulin in the biological sample is greater than 22 pmol/L (e.g., greater than 25, 30, 35, 40, 45, or 50 pmol/L), the methods may comprise diagnosing the subject as having diabetes (e.g., T2D) or as at risk of diabetes (e.g., T2D).
rs7172432 has been the most reported C2CD4A/B/VPS13C locus index SNP associated with T2D and quantitative traits related to islet dysfunction in multiple populations5,6,8,10,12. To identify a set of candidate causal variants for functional follow-up in this locus, a genetic fine-mapping analysis of the fasting proinsulin association signal in the locus in 8,635 non-diabetic Finnish individuals from the METSIM study was conducted11,12,19. Fine-mapping analysis identified a cluster of sixteen strongly associated variants (p<1×10−17;
All sixteen variants are located between two recombination hotspots in the intergenic region between C2CD4A and C2CD4B. Conditional analysis using rs7172432 as a covariate attenuated the strength of the proinsulin association in this locus (
Because all sixteen variants identified by fine-mapping were intergenic (
SEs can encompass multiple, discrete open chromatin sites14,15,51. Chromatin accessibility in human islets was profiled using the assay for transposase accessible chromatin (ATAC-seq)29 to determine the constituent open chromatin sites within the C2CD4A/B/VPS13C islet SE and to identify any overlaps with the putative causal/functional variant(s) among the sixteen variants in this locus. As shown in
Sequences overlapping seven of the islet SE constituent sites exhibited evidence of sequence constraint through the vertebrate lineage. To determine which of the islet SE constituent sites in this locus were functionally preserved in rodents, ATAC-seq profiling was used and identified eight open chromatin sites in mouse MIN6 cell lines (
Based on chromatin profiling and cross-species analyses, it was hypothesized that rs7163757 alleles alter HS3 regulatory element activity. To test this, the HS3 sequence (
The additional C2CD4A/B SE constituent sites (
To identify the putative target gene(s) of the fine-mapped variants (Table 1), associations between rs7163757 genotype and expression of genes with transcription start sites residing within 1 megabase (Mb) of this putative functional variant (Methods) were examined in a recent study of 112 Caucasian islet samples34 and in multiple tissues examined by the Genotype-Tissue Expression (GTEx) Consortium53-55. C2CD4A and C2CD4B were robustly expressed in islets and only selectively among other GTEx tissues, contrasting with the wide expression of VPS13C. The rs7163757 (C) risk allele was significantly associated with increased C2CD4B expression in both islet cohorts (
To assess further the potential links between rs7163757 genotype and C2CD4A and/or VPS13C expression, allele-specific eQTL (aseQTL) analyses43 of transcribed SNPs (
Both C2CD4A and C2CD4B expression are induced by inflammatory cytokines in endothelial cells56. To determine if C2CD4A, C2CD4B, and/or VPS13C expression are induced by these diabetogenic stressors57 in islets/beta cells, RNA-seq data from human46 and rat47 islets exposed to inflammatory cytokines were analyzed. As shown in
This next study sought to identify plausible trans-factors that could mediate the increased rs7163757 (C) allele enhancer activity observed in
To test this hypothesis, pharmacologic inhibition of the calcineurin/NFAT pathway by tacrolimus (FK506) was first examined to determine how it would affect HS3 risk and non-risk allele enhancer activity. Calcineurin-mediated dephosphorylation is essential for NFAT translocation from the cytosol to the nucleus67, and tacrolimus inhibits this process. The effect of tacrolimus treatment on enhancer activity in MIN6 was tested, and tacrolimus was found to reduce the increased enhancer activity of the rs7163757 risk allele (C) to the level of the rs7163757 non-risk allele (T) (
Next, studies tested how molecular manipulation of NFAT affected HS3 enhancer activity. MIN6 cells were co-transfected with the HS3 luciferase vector and plasmids expressing GFP-tagged NFATc1. Furthermore, since a peptide sequence (“VIVIT”) has been shown to selectively inhibit calcineurin-NFAT interactions and NFAT activity37,67, studies also tested HS3 activity when co-transfected with GFP-tagged VIVIT to inhibit NFAT activity in the cells. As a positive control, each expression vector was co-transfected with a luciferase reporter plasmid containing three canonical NFAT binding sites (NFAT Reporter). Expression of GFP alone (“G”), NFATc1 alone (“N”), or VIVIT alone (“V”) did not alter transcriptional activity of the empty luciferase vector (pGL4.23, not shown). As expected, NFATc1 expression in MIN6 cells enhanced luciferase activity of the NFAT Reporter five-fold (
Finally, to determine if the risk and non-risk alleles are bound by different beta cell nuclear protein complexes, DNA probes containing the rs7163757 risk allele (C) or the non-risk allele (T) were incubated with MIN6 beta cell line nuclear extracts (NE), and the resulting protein-DNA complexes were examined for electrophoretic mobility differences. As shown in
GWAS have identified over 150 loci contributing genetic susceptibility to islet dysfunction and T2D. The vast majority of GWAS SNPs associated with T2D and related molecular traits (e.g., fasting glucose, fasting insulin, 2-hour glucose, glucose-stimulated insulin secretion) reside in non-coding regions of the genome. Identifying (1) the causal variant(s), (2) their target gene(s), and (3) their direction(s) of effect for T2D risk (i.e., gain- or loss-of-function) at each associated locus is important to better understand the molecular genetic pathology underlying islet dysfunction and T2D, and to determine therapeutic suitability. In this study, genetic fine-mapping and functional genomic techniques were applied to refine the proinsulin and T2D association signal on 15q22.2 and address these three important questions. After identifying sixteen intergenic non-coding variants, functional (epi)genomics and experimental analyses were applied to implicate rs7163757 as the most likely functional variant, to identify C2CD4B, and likely C2CD4A, as putative target genes of this human pancreatic islet enhancer SNP, and to link C2CD4A and C2CD4B induction with islet inflammatory stress responses. The T2D risk and proinsulin-raising allele (C) exhibits increased enhancer activity and is differentially bound by beta cell nuclear factors in vitro. Moreover, pharmacologic and molecular manipulation of the calcineurin/NFAT pathway combine with in vitro evidence to suggest that one or more of the NFAT TFs can bind to the risk allele and potentiate its enhancer activity. Taken together, these results implicate gain-of-function effects of the rs7163757 risk allele (C) on an evolutionarily conserved islet SE and increased C2CD4B (and likely C2CD4A) expression as a molecular mechanism underlying the 15q22.2 genetic association with increased proinsulin levels and T2D.
Human pancreatic islet eQTL and aseQTL analyses link increased C2CD4B expression, and potentially C2CD4A, to the T2D risk and proinsulin-increasing rs7163757 allele (C). While rs7163757 is detected as a lung eQTL for C2CD4A by the Genotype Tissue Expression Consortium53-55, the link between rs7163757 genotype and C2CD4B expression appears to be unique to islets. C2CD4A and C2CD4B, but not VPS13C, are induced by proinflammatory cytokines in islets and beta cells. Moreover, in vitro and in vivo data strongly suggest that the risk allele increases enhancer activity and C2CD4B and C2CD4A expression. Notably, the trend of increased expression was consistent between RNA-seq profiles obtained from two independent islet cohorts34,42, supporting the robust and reproducible nature of these observations. These results contrast with a previous report linking the rs7163757 risk allele to decreased enhancer activity and female-specific decreases in VPS13C and C2CD4A expression among 40 female islet samples68. Targeted functional studies will provide important insights to resolve this apparent discrepancy; these data, however, challenge the existing model suggesting female-specific decreases in VPS13C and C2CD4A expression as the mechanism(s) underlying rs7163757 risk allele association with islet dysfunction and T2D. Together, these data clearly motivate future studies to determine the role(s) that C2CD4B and/or C2CD4A may play in pathogenic or compensatory islet stress responses and to determine how these may be exploited to prevent and/or treat T2D.
SEs14,15,51 are important transcriptional regulatory regions that govern cell type-specific functions. In human islets, genes encoding proteins involved in glucose sensing (e.g., GCK), insulin secretion (e.g., INS, ABCC8/KCNJ11), and islet cell identity (e.g., TFs PDX1, MAFA [MIM 610303], NKX6.1 [MIM 602563]) overlap or are near SE chromatin signatures. Evolutionary conservation of the C2CD4A/B/VPS13C SE signature reported here suggests that this locus is also an important region for islet function. rs7163757 resides in this conserved islet SE, and the surrounding sequence, HS3, is an open chromatin site and an active enhancer independent of the rs7163757 genotype in human islets, as indicated by in vivo islet ATAC-seq and TF ChIP-seq data and in vitro luciferase reporter activity. Consistent with this finding, the rs7163757 risk (C) and the non-risk (T) alleles are both empirically bound by multiple islet TFs according to islet ChIP-seq data16. This implies that the risk allele (C) does not create or destroy this open chromatin site, but that the observed increased risk allele (C) enhancer activity (gain-of-function effect) is facilitated by recruiting an additional TF, in this case, NFAT, to a canonical binding site created by the risk allele. Such gain-of-function effects have been identified for other T2D-associated loci69, and “enhancer hijacking” is emerging as a tumorigenic mechanism in cancer70,71.
Together, the pharmacologic, molecular, and in vitro experiments in this study strongly suggest that NFAT is a TF family that mediates these gain-of-function risk allele effects. The NFAT TF family is linked to both physiologic and pathophysiologic transcriptional responses in islets/beta cells. Physiologically, it directly regulates INS transcription in pancreatic beta cells in response to increased Ca2+ levels and calcineurin activation62, and pharmacologic calcineurin inhibition decreased human beta cell survival and murine beta cell proliferation61. Beta cell-specific deletion of the calcineurin regulatory subunit (Cnb1) in mice resulted in age-dependent diabetes characterized by decreased beta cell proliferation and mass and reduced pancreatic insulin content; conditional expression of active Nfatc1 in Cnb1−/− mice rescued these defects and prevented diabetes65. Conversely, islet expression of constitutively active calcineurin in mice resulted in glucose intolerance and loss of beta cell mass due to decreased proliferation and increased apoptosis65. Additionally, the NFAT TF family has been implicated in the pancreatic islet/beta cell inflammation response, wherein it mediates TNF alpha expression after exposure to the pro-inflammatory cytokine IL-1β63. Recent data suggest that binding partners, such as ERK and JNK, may recruit NFAT to distinct cis-regulatory elements to mediate physiologic and pathophysiologic/inflammatory gene expression responses, respectively64. Interestingly, C2CD4A and C2CD4B were both identified as inflammation-responsive genes in endothelial cells, suggesting that they may indeed be co-regulated: C2CD4A and C2CD4B expression increased 30- and 18-fold, respectively, after 2 hours of treatment with IL-1β in endothelial cells56.
C2CD4A and C2CD4B expression were induced three- to five-fold and two- to seven-fold, respectively, by the pro-inflammatory cytokines IL-1β and IFNγ in human islets46, and palmitate treatment induced C2CD4A expression three- to four-fold72. Three-fold induction of C2CD4A and C2CD4B expression was detected in human pancreatic islets after IL-1β treatment (data not shown). Similarly, two-fold and three-fold induction of C2cd4a and C2cd4b, respectively, was observed in INS-1(832/13) beta cells treated with 2 U/ml IL-1β for 2 hours compared to untreated controls. Finally, C2CD4A was three-fold induced in diabetes-sensitive New Zealand Obese (NZO) mouse islets compared to those of diabetes-resistant B6-ob/ob mice after a carbohydrate challenge7. A working model has been proposed wherein the rs7163757 T2D risk allele creates NFAT-mediated/dependent enhancer gain-of-function and inappropriate, enhanced, or extended C2CD4B (and likely C2CD4A) expression in response to islet/beta cell inflammatory stress signaling. Multiple NFAT TF paralogs (NFATC1, NFATC2, and NFATC3) are expressed in islets. Thus, it will be important in future studies to elucidate the specific NFAT family member(s) mediating the rs7163757 risk allele effects and clarify the condition(s) that elicit NFAT binding to this regulatory site in vivo. Moreover, follow-up studies to define the molecular functions of C2CD4B and/or C2CD4A in human islets, and to determine the effect of their overexpression on islet/beta cell (patho)physiology, will be critical to understand their roles in T2D pathogenesis and their utility as therapeutic targets to prevent and/or treat T2D.
Based on human islet expression and epigenomic data linking the rs7163757 proinsulin raising and type 2 diabetes risk allele to altered C2CD4B, and likely C2CD4A, expression, it was hypothesized that C2cd4a, C2cd4b, or both genes would be required for proper insulin secretion in beta cells. To test this hypothesis, CRISPR/Cas9 was used to delete C2cd4a, C2cd4b, or both genes in INS-1 (832/13) and test the effects of deleting them on basal (3 mM glucose) or glucose-stimulated (15 mM glucose) insulin secretion (GSIS). As shown in
(
After confirming that each targeted deletion resulted in a null allele with no gene expression, how C2cd4a and/or C2cd4b loss-of-function affected GSIS in the mutant beta cell lines was determined. Each single mutant, and the double mutant, did not exhibit insulin secretion differences under basal (3 mM) glucose concentrations (
Genetic association results were reported for fasting proinsulin levels from up to 8,635 non-diabetic Finnish men from the population-based Metabolic Syndrome in Men (METSIM) study19. Study participants with type 1 or type 2 diabetes (previously diagnosed, on diabetes medication, fasting glucose ≥7 mmol/l or 2-h glucose ≥11.1 mmol/l) were excluded from analysis. Mean age of analyzed participants was 57.2 years (median=57.0 years; range=45-74 years) and mean body mass index (BMI) was 26.8 kg/m2 (median=26.3 kg/m2; range=16.2 to 51.6 kg/m2). Blood samples were drawn after a 12-h overnight fast and fasting plasma-specific proinsulin (Human Proinsulin RIA Kit, Linco Research; no cross-reaction with insulin or C-peptide) and fasting insulin (ADVIA Centaur Insulin IRI, 02230141, Siemens Medical Solutions Diagnostics; minimal cross-reaction with proinsulin or C-peptide) were measured by immunoassay. The study was approved by the ethics committee of the University of Kuopio and Kuopio University Hospital, and informed consent was obtained from all study participants.
METSIM samples were genotyped with the Illumina HumanOmniExpress Beadchip (Illumina, San Diego, Calif., USA). Illumina array probe sequences were mapped to the hg19 genome assembly using BWA20. SNP quality control steps prior to imputation included removing SNPs with ambiguously mapping probe sequences and SNPs with call rate <95% or Hardy-Weinberg equilibrium test p-value <10−6. A two-step genotype imputation strategy21 was followed. First, haplotypes were statistically estimated using SHAPEIT222, and then genotypes were imputed into these estimated haplotypes using minimac223. The haplotypes from 2,737 European individuals sequenced in the Genetics of Type 2 Diabetes (GoT2D) project were used as the imputation reference panel. Participants were previously genotyped with the Illumina HumanExome BeadChip12, which focuses on protein-altering variants selected from the exome sequences of >12,000 individuals. Exome chip variants were incorporated after imputation.
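The pre-imputation SNP filter described above can be sketched as follows. This is a minimal illustration only: the disclosure does not state which Hardy-Weinberg test was used, so a one-degree-of-freedom chi-square test stands in here as an assumption, and the function names `hwe_chisq_pvalue` and `passes_qc` are hypothetical.

```python
import math


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) Hardy-Weinberg p-value from genotype counts.

    Assumption: a chi-square test is used as a stand-in; the source does
    not say which HWE test was applied.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: no departure can be tested
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(call_rate, genotype_counts, min_call_rate=0.95, min_hwe_p=1e-6):
    """Keep a SNP unless call rate < 95% or HWE p-value < 1e-6."""
    if call_rate < min_call_rate:
        return False
    return hwe_chisq_pvalue(*genotype_counts) >= min_hwe_p


if __name__ == "__main__":
    print(passes_qc(0.99, (25, 50, 25)))  # genotype counts in exact HWE
    print(passes_qc(0.99, (50, 0, 50)))   # no heterozygotes at all
```

The two thresholds are exposed as parameters so that the same function could express the stricter or looser filters used for other sample sets.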
To account for relatedness between study participants and population structure, associations were tested between the phenotype and the estimated dosages (imputed variants) or additively coded genotypes (directly genotyped variants) using a linear mixed model with an empirical kinship matrix, as implemented in EMMAX24. Log-transformed plasma proinsulin was adjusted for age, BMI, and log-transformed insulin before association testing, and rank-based inverse normal-transformed residuals were analyzed. Directly genotyped variants were analyzed with minor allele count (MAC)≥5 and HWE test p-value ≥10−6, and imputed variants with imputation quality score R2>0.3 and minor allele frequency (MAF) >0.5%.
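For illustration, the rank-based inverse normal transformation mentioned above may be sketched as below. The sketch covers only the transformation step, not the covariate adjustment that produces the residuals, and the rank offset (0.5 by default) is an assumption because the disclosure does not specify one; `rank_inverse_normal` is a hypothetical name.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to normal quantiles of their (tie-averaged) ranks.

    Assumption: quantile = (rank - offset) / (n - 2 * offset + 1); with
    offset=0.5 this is (rank - 0.5) / n. The source names no offset.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    standard_normal = NormalDist()
    return [
        standard_normal.inv_cdf((r - offset) / (n - 2 * offset + 1))
        for r in ranks
    ]


if __name__ == "__main__":
    residuals = [0.31, -1.20, 0.05, 2.40, -0.40]  # made-up residuals
    print(rank_inverse_normal(residuals))
```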
To identify additional independent signals in the region, a conditional analysis was carried out, in which the allele count of the most strongly associated SNP (rs7172432) was included as a covariate in the model. Regional association results were visualized using LocusZoom25. Linkage disequilibrium was estimated from the imputation reference panel.
For the Bayesian fine-mapping analysis, we followed Fuchsberger et al.3 In brief, we defined the candidate set of variants by identifying all analyzed variants with r2>0.1 with the most associated variant, and within a 5 Mb window centered on the most associated variant. We calculated approximate Bayes' factors (ABF) for each variant as:
$$\mathrm{ABF} = \sqrt{1 - r}\; e^{r z^{2}/2}$$
where r=0.04/((s.e.)²+0.04), z=β/s.e., and β and s.e. are the log odds ratio estimate and its associated standard error. The posterior probability of being causal for each variant was then calculated as ABF/T, where T is the sum of ABF values over all candidate variants. Next, variants were ranked in decreasing order by posterior probabilities and the 99% credible set was obtained by including variants with the highest posterior probabilities until the cumulative posterior probability was ≥99%.
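The credible-set construction just described can be illustrated with a short sketch. It assumes the approximate Bayes' factor takes the form sqrt(1−r)·exp(r·z²/2) with the r and z defined above; the function names and the example effect sizes are hypothetical and are not data from the study.

```python
import math


def approx_bayes_factor(beta, se, prior_var=0.04):
    """ABF = sqrt(1 - r) * exp(r * z**2 / 2), r = 0.04 / (se**2 + 0.04)."""
    r = prior_var / (se ** 2 + prior_var)
    z = beta / se
    return math.sqrt(1.0 - r) * math.exp(r * z * z / 2.0)


def credible_set(variants, level=0.99):
    """Return (variant ids in the credible set, posterior per variant).

    variants: dict mapping variant id -> (beta, se).
    """
    abf = {vid: approx_bayes_factor(b, s) for vid, (b, s) in variants.items()}
    total = sum(abf.values())  # T in the text
    posterior = {vid: value / total for vid, value in abf.items()}
    ranked = sorted(posterior, key=posterior.get, reverse=True)
    chosen, cumulative = [], 0.0
    for vid in ranked:
        chosen.append(vid)
        cumulative += posterior[vid]
        if cumulative >= level:
            break
    return chosen, posterior


if __name__ == "__main__":
    # Hypothetical effect estimates, for illustration only.
    example = {"varA": (0.12, 0.02), "varB": (0.11, 0.02), "varC": (0.02, 0.02)}
    members, post = credible_set(example)
    print(members)
```

Because the posterior is a simple ratio of Bayes' factors, variants in near-perfect linkage disequilibrium with similar effect estimates tend to share posterior mass, which is why a credible set can retain many variants.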
Fresh human non-diabetic pancreatic islets were purchased from ProdoLabs (UNOS#ABEI388) in accordance with regulations of Human Subjects research. Upon arrival, the cells were transferred into PIM(S) media (ProdoLabs), supplemented with PIM(ABS) (ProdoLabs) and PIM(G) (ProdoLabs) and kept in a T-150 non-tissue culture treated flask (VWR) for recovery at 37° C. and 5% CO2 overnight. ATAC-seq and RNA-seq were performed the following day as described below.
Mouse MIN6 and rat INS-1(832/13) (a kind gift from C. Newgard) beta cell lines were cultured as previously described26,27. Briefly, MIN6 were grown in DMEM (4.5 g/l Glucose) (Life Technologies) supplemented with 1 mM Sodium Pyruvate (Life Technologies), 100 μM 2-Mercaptoethanol (Sigma), and 10% Fetal Bovine Serum (Seradigm). INS-1(832/13) cells were cultured in RPMI-1640 (11.1 mM D-glucose) (Life Technologies) supplemented with 10% Fetal Bovine Serum (Seradigm), 10 mM HEPES (Life Technologies), 2 mM L-glutamine, 1 mM sodium pyruvate, and 50 μM 2-Mercaptoethanol (Sigma).
Crosslinking and ChIP-seq of human islets, MIN6, and INS-1(832/13) beta cells were carried out as described28. Human islet, MIN6, and INS-1(832/13) beta cell line ATAC-seq libraries were prepared as described29. Approximately 250 islet equivalents (250,000 islet cells), 250,000 INS-1(832/13) cells, and 50,000 MIN6 cells were transposed.
Chromatin states were determined using ChromHMM as described14. ATAC-seq reads were aligned to hg19 (human islets), mm9 (MIN6), and rn5 (INS-1(832/13)) reference genomes using BWA20 with ‘mem’ option. Only reads uniquely mapping to their respective genomes were used in subsequent analysis. Reads mapping to the mitochondrial genome (chrM) were removed, and duplicate reads mapping to the nuclear genome were eliminated to avoid potential PCR amplification artifacts in ATAC-seq. Human islet, MIN6, and INS-1(832/13) ATAC-seq library and data statistics were collected. ATAC-seq enriched regions (peaks) in each sample or merged replicates were identified using the MACS2 program30 with the following parameters: MACS/2.1.0.20151222/bin/macs2 callpeak -t <input tag file> -f BED -n <output peak file> -g ‘hs’ --nomodel --shift -100 --extsize 200 -B.
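The read-filtering steps described above (unique mapping, removal of chrM reads, removal of duplicates) can be sketched on a simplified in-memory read representation. This is an assumption-laden illustration rather than the pipeline actually used: reads are plain dictionaries, uniqueness is supplied as a precomputed boolean, and duplicates are defined here as reads sharing chromosome, start, and strand.

```python
def filter_reads(reads):
    """Keep uniquely mapped, non-mitochondrial, non-duplicate reads.

    reads: iterable of dicts with keys 'chrom', 'start', 'strand', 'unique'.
    Assumption: a duplicate is a read with the same (chrom, start, strand)
    as a read already kept; real tools may use a different definition.
    """
    seen = set()
    kept = []
    for read in reads:
        if not read["unique"]:
            continue  # drop multi-mapping reads
        if read["chrom"] == "chrM":
            continue  # drop mitochondrial reads
        key = (read["chrom"], read["start"], read["strand"])
        if key in seen:
            continue  # drop likely PCR duplicates
        seen.add(key)
        kept.append(read)
    return kept


if __name__ == "__main__":
    demo = [
        {"chrom": "chr15", "start": 100, "strand": "+", "unique": True},
        {"chrom": "chr15", "start": 100, "strand": "+", "unique": True},
        {"chrom": "chrM", "start": 5, "strand": "+", "unique": True},
        {"chrom": "chr2", "start": 40, "strand": "-", "unique": False},
    ]
    print(len(filter_reads(demo)))
```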
Human islet RNA-seq data and tracks were previously described14. Additionally, total RNA was extracted and purified from 32 Caucasian human pancreatic islet samples procured from IIDP and NDRI, and from MIN6 and INS-1(832/13) beta cell lines, using Trizol (Life Technologies) according to the manufacturer's instructions, and sequenced using the Illumina TruSeq (human islets, MIN6) or Kapa Biosystems KAPA mRNA-seq (INS-1(832/13)) kits according to the manufacturer's instructions. MIN6 RNA-seq data were aligned to mm9 using tophat (v1.3.2)31. For INS-1(832/13) RNA-seq, the gene models for C2cd4a, C2cd4b, and Vps13c were absent in the current version of the UCSC rn5 gene annotations (Illumina iGenomes). A reference guided transcriptome assembly was performed using cufflinks (v2.2.1) to construct the transcript models31. The assembled transcripts were then visualized on the UCSC genome browser and gene locations identified by homology to mouse annotations. Transcript models for each locus were collapsed to generate the corresponding gene model. These gene models were added to the existing rn5 gene annotations and a reference index created using RSEM32 (v.1.12.2). RNA-seq reads were realigned to this reference index using RSEM to generate alignments and count matrices. INS-1(832/13) UCSC genome browser tracks were created using homer (v4.6)33.
Transcription factor binding predictions in human islet ATAC-seq data were performed as described35. Briefly, matrices that represent the Tn5 integration events ±100 bp around position weight matrix (PWM) scan matches for a given motif were generated. These matrices were used as input for CENTIPEDE35 to calculate the posterior probability of each motif instance being bound. Individual motifs were considered bound if the CENTIPEDE posterior was greater than or equal to 0.99 and the motif coordinates were completely intersecting an ATAC-seq peak in the sample of origin.
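The two-part rule for calling a motif instance bound can be expressed as a small sketch. It assumes that posterior probabilities have already been computed elsewhere and that intervals are given as (chromosome, start, end) tuples in a single consistent coordinate convention; `is_bound` is a hypothetical helper, not part of CENTIPEDE.

```python
def is_bound(posterior, motif, peaks, threshold=0.99):
    """Return True if posterior >= threshold and the motif lies fully
    inside at least one peak.

    motif: (chrom, start, end); peaks: iterable of (chrom, start, end).
    Assumption: all intervals share one coordinate convention.
    """
    if posterior < threshold:
        return False
    chrom, start, end = motif
    return any(
        peak_chrom == chrom and peak_start <= start and end <= peak_end
        for peak_chrom, peak_start, peak_end in peaks
    )


if __name__ == "__main__":
    demo_peaks = [("chr15", 100, 500)]  # hypothetical peak coordinates
    print(is_bound(0.995, ("chr15", 150, 165), demo_peaks))
    print(is_bound(0.995, ("chr15", 490, 510), demo_peaks))
```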
Human SE constituent sequences from the region on chr15: 62363117-62455736 (GRCh37/hg19 coordinates), between C2CD4A and C2CD4B, were cloned. SE constituent site sequences were PCR amplified from islet genomic DNA (gDNA) of two individuals with different genotypes (Haplotype 1 and Haplotype 2) at several of the SNPs in this region using Phusion high fidelity polymerase (Thermo Scientific). Amplicons were cloned into pDONR201 and shuttled into modified pGL4.23 luciferase reporter vectors using Gateway cloning (Invitrogen) as previously described28. HS3 rs7163757 (C) and (T) alleles were interconverted using primer sequences (not shown) and the QuikChange Lightning site-directed mutagenesis kit (Agilent Technologies) according to the manufacturer's instructions.
MIN6 or INS-1(832/13) cells were seeded at a density of 60,000 cells per well in 96-well-plates 24 hours prior to transfection. Cells were co-transfected in triplicate with 200 ng of pGL4.23 containing each human C2CD4A/B SE sequence and 2 ng Renilla (pRL-TK) using Lipofectamine 2000 Transfection reagent (Life Technologies) according to the manufacturer's instructions. Between four and sixteen clones were tested per C2CD4A/B SE sequence and orientation. Each plasmid was transfected and measured in triplicate, and experiments were completed on at least three separate occasions. 38-40 hours after transfection, cells were lysed in 1× Passive Lysis Buffer (PLB) using the Dual Luciferase Reporter Assay system (Promega) according to the manufacturer's instructions. Luciferase was measured on a Synergy2 Microplate Reader (BioTek). Firefly values were normalized to Renilla to control for differences in cell number or transfection efficiency. To determine glucose-stimulated activity of the reporters, INS-1(832/13) cells were grown in INS-1(832/13) media containing reduced glucose (3-5 mM) for 8-10 hours prior to transfection with the C2CD4A/B HS3 reporter plasmids. Transfected cells were cultured in reduced glucose medium for an additional 16 hours, after which they were grown in high (15 mM) or low (3 mM) glucose-containing medium for an additional 24 hours prior to cell lysis in 1× PLB and luciferase reporter activity measurement as described above. At least three plasmid preparations were tested for each HS3 allele on three separate occasions. Pharmacologic inhibition of calcineurin/NFAT: MIN6 cells were pre-treated with 10 ng/ml tacrolimus (FK506) (Biotang) or ethanol vehicle control 30 minutes prior to transfection with 200 ng of the C2CD4A/B SE firefly luciferase vectors and 2 ng Renilla (pRL-TK). Luciferase activity was tested as above for between four and five clones on three separate occasions.
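As a hedged illustration of the normalization described above, firefly readings may be divided by the paired Renilla readings well by well and averaged across replicate wells. Expressing the result as a fold change over a control vector is an assumption added for illustration (the text states only that firefly values were normalized to Renilla), and the function names and readings are hypothetical.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Mean firefly/Renilla ratio across replicate wells."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    return mean(f / r for f, r in zip(firefly, renilla))


def fold_over_control(sample, control):
    """Normalized activity of a sample relative to a control vector.

    Assumption: reporting relative to a control (for example an empty
    vector) is a convention chosen for this sketch.
    sample, control: (firefly readings, Renilla readings) tuples.
    """
    return normalized_activity(*sample) / normalized_activity(*control)


if __name__ == "__main__":
    # Made-up triplicate readings in arbitrary luminescence units.
    enhancer = ([200.0, 400.0, 600.0], [10.0, 20.0, 30.0])
    empty_vector = ([50.0, 50.0, 50.0], [10.0, 10.0, 10.0])
    print(fold_over_control(enhancer, empty_vector))
```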
Molecular manipulation of the Cn/NFAT pathway: 200 ng of the C2CD4A/B SE luciferase vectors were co-transfected in triplicate with 100 ng of plasmids expressing wildtype NFAT (EGFPC1-huNFATc1EE-WT36; Addgene, plasmid #24219) or a mutant NFAT peptide sequence (EGFPN1-VIVIT37; Addgene, plasmid #11106). pGL3-NFAT-luciferase38 (Addgene, plasmid #17870), which contains 3 canonical NFAT binding sequences, was used as a positive control to measure NFAT activity and transcriptional responses in these experiments. GFP intensity was documented (GFP-channel=60% light intensity; Transmitted=5.3% light intensity) for all cells transfected with EGFPC1-huNFATc1EE-WT, EGFPN1-VIVIT, and pEGFP-C1 (Clontech, catalog # 6084-1) expression constructs using an EVOS FL Cell Imaging System (Life Technologies) to confirm GFP fusion protein levels for each co-transfected expression plasmid. For these experiments, four independent clones were selected and tested on four separate occasions.
Islet DNA samples were genotyped at the Genetic Resources Core Facility (GRCF) of the Johns Hopkins Institute of Genetic Medicine on the HumanOmni2.5-4v1_H BeadChip array (Illumina, San Diego, Calif., USA). The same quality control criteria were applied to SNPs for further analysis as described for the METSIM samples. In addition, we filtered out A/T or C/G SNPs with MAF>0.2, or SNPs that have absolute alternate allele frequency difference>0.2 with 1000G EUR population, yielding 2,057,703 SNPs for imputation. SNP imputation and phasing was performed using the same strategy as described for METSIM samples. Haplotypes from 1000G phase3 v539 were used as the reference panel. To improve phasing quality given the small target sample set, samples were pre-phased together with the 2,504 reference panel samples using ShapeIT222.
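The two additional islet SNP filters described above can be sketched as a simple predicate. The sketch assumes per-SNP allele labels and frequencies are already available; `keep_for_imputation` and its argument names are hypothetical.

```python
# Strand-ambiguous allele pairs: the complement of A/T is T/A and of C/G is G/C.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def keep_for_imputation(ref, alt, maf, alt_freq, panel_alt_freq,
                        max_ambiguous_maf=0.2, max_freq_diff=0.2):
    """Apply the two filters stated in the text.

    1. Drop A/T or C/G SNPs with MAF > 0.2.
    2. Drop SNPs whose alternate allele frequency differs from the
       reference population frequency by more than 0.2 in absolute value.
    """
    if frozenset((ref, alt)) in AMBIGUOUS_PAIRS and maf > max_ambiguous_maf:
        return False
    if abs(alt_freq - panel_alt_freq) > max_freq_diff:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical SNPs, for illustration only.
    print(keep_for_imputation("A", "T", 0.30, 0.30, 0.30))
    print(keep_for_imputation("A", "G", 0.40, 0.40, 0.35))
```

Strand-ambiguous SNPs with low minor allele frequency are retained because allele frequency can then be used to resolve strand; near 50% frequency that is no longer possible, which is the usual rationale for a MAF cut-off of this kind.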
Islet Expression Quantitative Trait Locus (eQTL) Lookups and Conditional Analysis
Cis-eQTL data for each gene (n=10) whose most upstream TSS was within 1 Mb of rs7163757 were obtained from a parallel study of expression in 112 human islets34. To determine whether cis-eQTL associations between rs7163757 and nearby genes could be affected by other strong eQTLs in the region, iterative conditional analysis was performed on each of these genes. The following linear regression model, fit within the two islet studies (n=31 and n=81) from Varshney et al.33, was used:
$$Y_{ij} = \alpha + \beta_{j}^{\mathrm{SNP}} G_{i}^{\mathrm{SNP}} + \beta_{j}^{s} G_{i}^{s} + \varepsilon_{ij}$$
Yij is the inverse-normalized and PEER-adjusted FPKM value for individual i and gene j, GiSNP is the imputed allele count of rs7163757, βjSNP is the regression coefficient of the imputed allele count for rs7163757, Gis is the set of all SNPs within 1 Mb of the most upstream TSS of the gene, and εij is normally distributed with mean zero and variance σ2. Only SNPs present in both studies (MAC≥1) and with MAC>10 across all 112 samples were considered. The results from the two studies were combined using sample-size weighted meta-analysis33. If at least one SNP had a meta-analysis p-value <1.2×10−4 (corresponding to the p-value threshold for cis-eQTLs with FDR<5%), the SNP with the most significant p-value was retained in the model and the procedure was repeated until no added SNP had a p-value <1.2×10−4.
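One common form of sample-size weighted meta-analysis is sketched below for illustration. The disclosure does not give the formula, so the scheme shown (signed z-scores weighted by the square root of each study's sample size) is an assumption, and `weighted_z_meta` is a hypothetical name.

```python
import math
from statistics import NormalDist


def weighted_z_meta(studies):
    """Combine per-study results into one z-score and two-sided p-value.

    studies: list of (two_sided_p, effect_sign, sample_size).
    Assumption: z_meta = sum(w_i * z_i) / sqrt(sum(w_i**2)) with
    w_i = sqrt(n_i); p-values must lie strictly between 0 and 1 here.
    """
    standard_normal = NormalDist()
    numerator = 0.0
    weight_sq_sum = 0.0
    for p_value, sign, n in studies:
        z = standard_normal.inv_cdf(1.0 - p_value / 2.0)
        if sign < 0:
            z = -z  # carry the direction of effect into the z-score
        weight = math.sqrt(n)
        numerator += weight * z
        weight_sq_sum += weight * weight
    z_meta = numerator / math.sqrt(weight_sq_sum)
    p_meta = 2.0 * (1.0 - standard_normal.cdf(abs(z_meta)))
    return z_meta, p_meta


if __name__ == "__main__":
    # Hypothetical per-study p-values with the study sizes from the text.
    print(weighted_z_meta([(0.05, +1, 31), (0.05, +1, 81)]))
```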
This procedure corresponds to performing stepwise forward selection of SNPs within 1 Mb of the most upstream TSS based on the results of the meta-analysis at each step (using a stopping threshold p-value of 1.2×10−4). The conditional p-value for rs7163757 is its p-value from the final model. The Benjamini-Hochberg method40 was used to adjust the conditional p-values for multiple testing.
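Two of the statistical steps named above can be sketched in Python. This is an illustration under assumptions, not the analysis code: it assumes the sample-size weighted meta-analysis follows the common scheme in which each study's Z-score is weighted by the square root of its sample size, and the input values are made up. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_NORMAL = NormalDist()


def weighted_z_meta(studies):
    """Combine per-study results by sample-size weighted Z-scores.

    studies: list of (beta, two_sided_p, n) tuples; only the sign of beta
    is used. Assumed scheme: Z = sum(sqrt(n_k) * z_k) / sqrt(sum(n_k)).
    """
    numerator = 0.0
    total_n = 0.0
    for beta, p, n in studies:
        z = -_NORMAL.inv_cdf(p / 2.0)  # magnitude from the two-sided p-value
        if beta < 0:
            z = -z                     # restore the direction of effect
        numerator += sqrt(n) * z
        total_n += n
    z_meta = numerator / sqrt(total_n)
    p_meta = 2.0 * _NORMAL.cdf(-abs(z_meta))
    return z_meta, p_meta


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative (made-up) per-study results for one SNP in the two islet studies.
z_meta, p_meta = weighted_z_meta([(0.5, 0.05, 31), (0.4, 0.01, 81)])

# Illustrative (made-up) conditional p-values for four genes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In a forward-selection loop, `p_meta` would be compared against the 1.2×10−4 stopping threshold at each step.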
ASE was quantified as described41 using RNA-seq data from FUSION34 and Groop42 human islet samples, except that stranded RNA-seq reads were considered together. Targeted ASE quantitative trait locus (aseQTL) analysis was performed to identify transcripts impacted by putative regulatory SNPs (rSNPs) in the C2CD4A/B/VPS13C locus. The statistical association of an rSNP genotype with the ASE of a transcribed SNP (tSNP) was tested as described, with the following modifications: (1) coordinates for all transcription start sites for a given gene were merged and extended 100 kb upstream and downstream; (2) any SNPs in this window were tested as rSNPs; (3) any SNP with quantified ASE that overlapped an exonic region of the gene with at least 30x coverage was tested as a tSNP; and (4) rSNPs and tSNPs were tested in a pairwise, gene-by-gene fashion. For a tSNP/rSNP pair to be tested, at least five samples with quantified ASE were required among rSNP heterozygotes and at least five among rSNP homozygotes.
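The windowing and sample-count requirements can be sketched as follows. This is a minimal illustration that assumes "merged" means taking the span from the most upstream to the most downstream TSS of a gene; the function names and coordinates are hypothetical.

```python
def rsnp_window(tss_positions, flank=100_000):
    """Merge all TSS coordinates of a gene into one span and extend it
    by `flank` bp on each side (assumed interpretation of the text)."""
    start = max(0, min(tss_positions) - flank)
    end = max(tss_positions) + flank
    return start, end


def pair_is_testable(rsnp_alt_counts, tsnp_has_ase, min_per_group=5):
    """Check the per-group sample requirement for one rSNP/tSNP pair.

    rsnp_alt_counts: per-sample alternate allele counts (0, 1, or 2) at the rSNP.
    tsnp_has_ase: per-sample booleans, True where ASE was quantified at the tSNP.
    """
    hets = 0
    homs = 0
    for count, has_ase in zip(rsnp_alt_counts, tsnp_has_ase):
        if not has_ase:
            continue  # samples without quantified ASE do not count
        if count == 1:
            hets += 1
        else:
            homs += 1
    return hets >= min_per_group and homs >= min_per_group


# Made-up TSS coordinates for a gene with two annotated start sites.
window = rsnp_window([62_000_000, 62_050_000])

# Six heterozygotes and five homozygotes, all with quantified ASE.
testable = pair_is_testable([1] * 6 + [0] * 3 + [2] * 2, [True] * 11)
```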
The fraction reference allele (fracRef) value measures ASE at tSNPs and is converted to the absolute allelic imbalance, a value representing the allelic skew from the expected fracRef, for aseQTL analysis. For example, a SNP with fracRef of 0.8 (over-expressed reference allele) or 0.2 (over-expressed alternate allele) and expected fracRef of 0.5 results in an absolute allelic imbalance of 0.3. Allelic imbalance is calculated by taking the absolute value of the difference between the observed fracRef and the sample-specific and allele-pair-specific expected fracRef. Absolute allelic imbalance values range from 0 to 0.5; fracRef values range from 0 to 1. The tSNP absolute allelic imbalance values are compared for homozygous versus heterozygous rSNPs using a two-sided Wilcoxon Rank Sum Test (wilcox.test in R), producing a p-value for every rSNP and tSNP pair tested. Storey's FDR procedure was applied to correct for the multiple testing burden across genes within the topologically associating domain (TAD)44 surrounding C2CD4A, C2CD4B, and VPS13C, rather than genome-wide. q-values were calculated for each rSNP/tSNP pair within the TAD coordinates chr15:61412708-62612708 (hg19), as determined from hESC and IMR90 TADs44, using Bioconductor's qvalue package.
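The conversion from fracRef to absolute allelic imbalance, and the grouping that feeds the rank sum test, can be sketched in Python. The function names and sample values are hypothetical. The sketch computes only the rank sum W statistic (the count of pairs in which the first group exceeds the second, with ties counted as one half); it does not reproduce the p-value calculation performed by wilcox.test.

```python
def absolute_allelic_imbalance(frac_ref, expected_frac_ref=0.5):
    """Absolute skew of the observed fracRef from its expected value."""
    return abs(frac_ref - expected_frac_ref)


def split_by_rsnp_genotype(samples):
    """Group tSNP imbalance values by rSNP zygosity.

    samples: iterable of (rsnp_alt_count, frac_ref, expected_frac_ref).
    Returns (heterozygote_values, homozygote_values).
    """
    het, hom = [], []
    for alt_count, frac_ref, expected in samples:
        value = absolute_allelic_imbalance(frac_ref, expected)
        (het if alt_count == 1 else hom).append(value)
    return het, hom


def rank_sum_w(x, y):
    """W statistic: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# The worked example from the text: fracRef 0.8 or 0.2 with expected 0.5.
imbalance_ref = absolute_allelic_imbalance(0.8)
imbalance_alt = absolute_allelic_imbalance(0.2)

# Made-up samples: (rSNP alt allele count, observed fracRef, expected fracRef).
het_values, hom_values = split_by_rsnp_genotype(
    [(1, 0.80, 0.5), (1, 0.25, 0.5), (0, 0.52, 0.5), (2, 0.45, 0.5)]
)
w_statistic = rank_sum_w(het_values, hom_values)
```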
EMSAs were carried out as previously described45. Briefly, 17 bp biotin end-labeled complementary oligonucleotides were designed surrounding the variant rs7163757 (5′ biotin-TGATTTTTC[C/T]ATTTTAAGC-3′, Integrated DNA Technologies) and double-stranded probes were generated for both alleles. Nuclear extract from mouse insulinoma MIN6 cells was prepared using the NE-PER Extraction Kit (Thermo Scientific). The LightShift Chemiluminescent EMSA Kit (Thermo Scientific) was used following the manufacturer's instructions. Binding reactions consisted of 1x binding buffer, 1 µg poly(dI-dC), 4 µg nuclear extract, and 200 fmol labeled probe. Reactions were incubated at room temperature for 25 minutes. For competition reactions, 64-fold excess of unlabeled probe for either allele was included and pre-incubated for 15 min. For supershift assays, 4 µg of antibody (Nkx6.1, sc-15030X; PDX1, sc-14662X; YY1, sc-1703X; p300, sc-585X; FoxA2, sc-9187X; MafB, sc-22830X; NFATc2 (4G6-G5), sc-7296X (Santa Cruz Biotechnology); NFATc1 (7A6), 556602, BD Biosciences) was added to the binding reaction and pre-incubated for 25 minutes. DNA-protein complexes were detected by chemiluminescence. EMSAs were repeated and yielded comparable results.
Processed RPKM (reads per kilobase of transcript per million mapped reads) values (Dataset S1) were obtained from published studies of 5 human islets treated with IL-1β and IFN-γ46 and of rat islets treated with 0.1 or 20 U/ml IL-1β47. INS-1(832/13) cells were incubated in reduced-serum (1% FBS) INS-1(832/13) complete medium overnight, then treated for 2 hours with 0 or 2 U/ml recombinant rat IL-1β (Biolegend) prior to RNA harvest and processing for RNA-seq as described above.
Guide RNA (gRNA) oligonucleotides (e.g., AGCCACTGGTATCGTCCCTT (6358_C2cd4a_sgRNA1), TTCCAAAGGGACGATACCAG (6359_C2cd4a_sgRNA2), CTGCTTTGACCGGCTCCCGG (6360_C2cd4a_sgRNA3), CTGCTGCTTTGACCGGCTCC (6361_C2cd4b_sgRNA4), CTGGATATGTTAAACGTAGG (6362_C2cd4b_sgRNA1), CTTGGCATGTCCGITTAGGA (6363_C2cd4b_sgRNA2), CCTGGCCGTGCGCATCAAGG (6364_C2cd4b_sgRNA3), and CCTTGATGCGCACGGCCAGG (6365_C2cd4b_sgRNA4)) were designed to target the promoter region of C2cd4a and C2cd4b in the rat genome (rn5) using crispr.mit.edu (
RNA was extracted using Trizol as previously described, and cDNA was synthesized using SuperScript IV (Invitrogen). Primer sequences can be found in the attached Supplemental Table. qPCR was performed using SYBR Green (Qiagen). C2cd4a, C2cd4b, and Vps13c transcript levels were normalized to that of the housekeeping/control gene Gapdh using the delta Ct method. C2CD4A, C2CD4B, and VPS13C expression plotted in
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. provisional application No. 62/624,640, filed Jan. 31, 2018, and U.S. provisional application No. 62/666,596, filed May 3, 2018, each of which is incorporated by reference herein in its entirety.
This invention was made with government support under R01 DK117137A awarded by the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, and under W81XWH-18-1-0401 awarded by the Department of Defense. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2019/015958 | 1/31/2019 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62666596 | May 2018 | US | |
62624640 | Jan 2018 | US |