While modern medicine has advanced treatments for many diseases, there remains an unmet need for treatment of diseases, and of conditions associated with or contributing to disease states, for which additional treatment is needed or for which no treatment currently exists. The instant disclosure addresses one or more such needs in the art.
Disclosed herein are methods of treating various disease states in which an individual in need thereof is administered one or more therapeutic agents capable of modulating one or more transcription factors. Also disclosed are methods by which an individual may be treated for one or more disease states, in which loci at which transcription factors bind are detected.
The application file contains at least one drawing executed in color.
Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The following description of certain examples of the technology should not be used to limit its scope. Other examples, features, aspects, embodiments, and advantages of the technology will become apparent to those skilled in the art from the following description, which is, by way of illustration, one of the best modes contemplated for carrying out the technology. As will be realized, the technology described herein is capable of other different and obvious aspects, all without departing from the technology. Accordingly, the drawings and descriptions should be regarded as illustrative in nature and not restrictive.
It is further understood that any one or more of the teachings, expressions, embodiments, examples, etc. described herein may be combined with any one or more of the other teachings, expressions, embodiments, examples, etc. that are described herein. The following-described teachings, expressions, embodiments, examples, etc. should therefore not be viewed in isolation relative to each other. Various suitable ways in which the teachings herein may be combined will be readily apparent to those of ordinary skill in the art in view of the teachings herein. Such modifications and variations are intended to be included within the scope of the claims.
The terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein.
As used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a method” includes a plurality of such methods and reference to “a dose” includes reference to one or more doses and equivalents thereof known to those skilled in the art, and so forth.
The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, e.g., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, or up to 10%, or up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed.
The terms “individual,” “host,” “subject,” and “patient” are used interchangeably to refer to an animal that is the object of treatment, observation and/or experiment. Generally, the term refers to a human patient, but the methods and compositions may be equally applicable to non-human subjects such as other mammals. In some embodiments, the terms refer to humans. In further embodiments, the terms may refer to children.
The term “therapeutically effective amount,” as used herein, refers to any amount of a compound which, as compared to a corresponding subject who has not received such amount, results in improved treatment, healing, prevention, or amelioration of a disease, disorder, or side effect, or a decrease in the rate of advancement of a disease or disorder. The term also includes within its scope amounts effective to enhance normal physiological function.
The terms “treat,” “treating” or “treatment,” as used herein, refer to any treatment of a disease or condition associated with a disease or physiological parameter that is dysregulated (such as blood pressure dysregulation), particularly in a human, and include a) preventing the disease and/or condition from occurring in a subject that may be predisposed to the disease and/or condition but has not yet been diagnosed as having it; b) inhibiting the disease and/or condition; and c) relieving the disease and/or condition. “Treatment” can also encompass delivery of an agent or administration of a therapy in order to provide for a pharmacological effect, even in the absence of a disease or condition. The term “treatment” is used in some aspects to refer to administration of a compound disclosed herein to mitigate a disease or disorder in a host, for example a mammal, more specifically a human. The term “treatment” can include preventing a disorder from occurring in a host, particularly when the host is predisposed to acquiring the disease but has not yet been diagnosed; inhibiting the disorder; and/or alleviating or reversing the disorder. Insofar as the methods describe “preventing” a disease or disorder, it is understood that the term “prevent” does not require that the disease state be completely thwarted. Rather, the term “preventing” refers to the ability of the skilled artisan to identify a population that is susceptible to disorders, such that administration of the compounds disclosed herein can occur prior to onset of a disease. The term does not mean that the disease state must be completely avoided.
The term “pharmaceutically acceptable,” as used herein, refers to a material, such as a carrier or diluent, which does not abrogate the biological activity or properties of the compounds described herein. Such materials are administered to an individual without causing undesirable biological effects or interacting in a deleterious manner with any of the components of the composition in which it is contained.
The term “pharmaceutically acceptable salt,” as used herein, refers to a formulation of a compound that does not cause significant irritation to an organism to which it is administered and does not abrogate the biological activity and properties of the compounds described herein.
The terms “composition” or “pharmaceutical composition,” as used herein, refer to a mixture of at least one compound, such as the compounds provided herein, with at least one, and optionally more than one, other pharmaceutically acceptable chemical component, such as carriers, stabilizers, diluents, dispersing agents, suspending agents, thickening agents, and/or excipients.
The term “carrier” applied to pharmaceutical compositions of the disclosure refers to a diluent, excipient, or vehicle with which an active compound (e.g., dextromethorphan) is administered. Such pharmaceutical carriers can be sterile liquids, such as water, saline solutions, aqueous dextrose solutions, aqueous glycerol solutions, and oils, including those of petroleum, animal, vegetable, or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like. Suitable pharmaceutical carriers are described in “Remington's Pharmaceutical Sciences” by E. W. Martin, 18th Edition.
The term “modulated” or “modulation” or “regulated” or “regulation” can refer to both up regulation, activation, or stimulation, for example, by agonizing or potentiating, and down regulation, inhibition or suppression, for example by antagonizing, decreasing or inhibiting, unless otherwise specified or clear from the context of a specific usage.
Explaining the genetics of many diseases is challenging because most associations localize to regulatory regions. Applicant has tested the hypothesis that transcription factors (TFs) are associated with multiple loci of individual complex genetic disorders with a novel computational method for discovering disease-driving mechanisms.
Dosage
As will be apparent to those skilled in the art, dosages outside of these disclosed ranges may be administered in some cases. Further, it is noted that the ordinary skilled clinician or treating physician will know how and when to interrupt, adjust, or terminate therapy in consideration of individual patient response.
In one aspect, the dosage of an agent disclosed herein, based on weight of the active compound, administered to an individual in need thereof may be about 0.25 mg/kg, 0.5 mg/kg, 0.1 mg/kg, 1 mg/kg, 2 mg/kg, 3 mg/kg, 4 mg/kg, 5 mg/kg, 6 mg/kg, or more of a subject's body weight. In another embodiment, the dosage may be a unit dose of about 0.1 mg to 200 mg, 0.1 mg to 100 mg, 0.1 mg to 50 mg, 0.1 mg to 25 mg, 0.1 mg to 20 mg, 0.1 mg to 15 mg, 0.1 mg to 10 mg, 0.1 mg to 7.5 mg, 0.1 mg to 5 mg, 0.1 to 2.5 mg, 0.25 mg to 20 mg, 0.25 to 15 mg, 0.25 to 12 mg, 0.25 to 10 mg, 0.25 mg to 7.5 mg, 0.25 mg to 5 mg, 0.5 mg to 2.5 mg, 1 mg to 20 mg, 1 mg to 15 mg, 1 mg to 12 mg, 1 mg to 10 mg, 1 mg to 7.5 mg, 1 mg to 5 mg, or 1 mg to 2.5 mg.
In one aspect, an agent disclosed herein may be present in an amount of from about 0.5% to about 95%, or from about 1% to about 90%, or from about 2% to about 85%, or from about 3% to about 80%, or from about 4% to about 75%, or from about 5% to about 70%, or from about 6% to about 65%, or from about 7% to about 60%, or from about 8% to about 55%, or from about 9% to about 50%, or from about 10% to about 40%, by weight of the composition.
The compositions may be administered in oral dosage forms such as tablets, capsules (each of which includes sustained release or timed release formulations), pills, powders, granules, elixirs, tinctures, suspensions, syrups, and emulsions. They may also be administered in intravenous (bolus or infusion), intraperitoneal, subcutaneous, or intramuscular forms all utilizing dosage forms well known to those of ordinary skill in the pharmaceutical arts. The compositions may be administered by intranasal route via topical use of suitable intranasal vehicles, or via a transdermal route, for example using conventional transdermal skin patches. A dosage protocol for administration using a transdermal delivery system may be continuous rather than intermittent throughout the dosage regimen.
A dosage regimen will vary depending upon known factors such as the pharmacodynamic characteristics of the agents and their mode and route of administration; the species, age, sex, health, medical condition, and weight of the patient; the nature and extent of the symptoms; the kind of concurrent treatment; the frequency of treatment; the renal and hepatic function of the patient; and the desired effect. The effective amount of a drug required to prevent, counter, or arrest progression of a symptom or effect of a disease can be readily determined by an ordinarily skilled physician.
Compositions may include suitable dosage forms for oral, parenteral (including subcutaneous, intramuscular, intradermal and intravenous), transdermal, sublingual, bronchial or nasal administration. Thus, if a solid carrier is used, the preparation may be tableted, placed in a hard gelatin capsule in powder or pellet form, or in the form of a troche or lozenge. The solid carrier may contain conventional excipients such as binding agents, fillers, tableting lubricants, disintegrants, wetting agents and the like. The tablet may, if desired, be film coated by conventional techniques. Oral preparations include push-fit capsules made of gelatin, as well as soft, sealed capsules made of gelatin and a coating, such as glycerol or sorbitol. Push-fit capsules can contain active ingredients mixed with a filler or binders, such as lactose or starches, lubricants, such as talc or magnesium stearate, and, optionally, stabilizers. In soft capsules, the active compounds may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycol, with or without stabilizers. If a liquid carrier is employed, the preparation may be in the form of a syrup, emulsion, soft gelatin capsule, sterile vehicle for injection, an aqueous or non-aqueous liquid suspension, or may be a dry product for reconstitution with water or other suitable vehicle before use. Liquid preparations may contain conventional additives such as suspending agents, emulsifying agents, wetting agents, non-aqueous vehicles (including edible oils), preservatives, as well as flavoring and/or coloring agents. For parenteral administration, a vehicle normally will comprise sterile water, at least in large part, although saline solutions, glucose solutions and the like may be utilized. Injectable suspensions also may be used, in which case conventional suspending agents may be employed. Conventional preservatives, buffering agents and the like also may be added to the parenteral dosage forms.
For topical or nasal administration, penetrants or permeation agents that are appropriate to the particular barrier to be permeated are used in the formulation. Such penetrants are generally known in the art. The pharmaceutical compositions are prepared by conventional techniques appropriate to the desired preparation containing appropriate amounts of the active ingredient, that is, one or more of the disclosed active agents or a pharmaceutically acceptable salt thereof according to the invention.
The dosage of an agent disclosed herein used to achieve a therapeutic effect will depend not only on such factors as the age, weight and sex of the patient and mode of administration, but also on the degree of inhibition desired and the potency of an agent disclosed herein for the particular disorder or disease concerned. It is also contemplated that the treatment and dosage of an agent disclosed herein may be administered in unit dosage form and that the unit dosage form would be adjusted accordingly by one skilled in the art to reflect the relative level of activity. The decision as to the particular dosage to be employed (and the number of times to be administered per day) is within the discretion of the physician, and may be varied by titration of the dosage to the particular circumstances of this invention to produce the desired therapeutic effect.
In one aspect, a method of treating a disease is disclosed, in which the method may comprise the step of identifying one or more, or two or more, or three or more, or four or more, or five or more, or six or more, or seven or more, or eight or more, or nine or more, or ten or more, or 11 or more, or 12 or more, or 13 or more, or 14 or more, or 15 or more, or 16 or more, or 17 or more, or 18 or more, or 19 or more, or 20 or more, or 21 or more, or 22 or more, or 23 or more, or 24 or more, or 25 or more, or 26 or more, or 27 or more, or 28 or more, or 29 or more, or 30 or more, or 31 or more, or 32 or more, or 33 or more, or 34 or more, or 35 or more, or 36 or more, or 37 or more, or 38 or more, or 39 or more, or 40 or more, or more than 40 loci associated with a disease state as listed herein. The individual may have, or be suspected of having the disease. The method may further comprise the step of treating the individual with a compound that modulates the TF associated with the one or more loci.
Application to a matrix of 213 phenotypes and 1,544 TF binding datasets identifies 2,264 significant associations for hundreds of TFs in 94 phenotypes, including prostate and breast cancers. Strikingly, nearly half of the systemic lupus erythematosus risk loci are occupied by the Epstein-Barr virus EBNA2 protein and 24 human TFs, revealing an important gene-environment interaction. Similar EBNA2-anchored associations also exist in multiple sclerosis, rheumatoid arthritis, inflammatory bowel disease, type 1 diabetes, juvenile idiopathic arthritis, and celiac disease. Instances of allele-dependent DNA binding and downstream effects on gene expression at plausibly causal autoimmune variants support a genetic mechanism of pathogenesis centered on EBNA2. Applicant's results nominate mechanisms operating across disease risk loci, suggesting new paradigms of disease origins.
The mechanisms generating genetic associations have proven difficult to elucidate for most diseases. Gene-environment interactions may explain the etiology of many autoimmune diseases1-3. In particular, Epstein-Barr virus (EBV) infection has been implicated in the autoimmune mechanisms and epidemiology of systemic lupus erythematosus (SLE)4-7, increasing SLE risk by as much as 50-fold in children4. SLE patients also have elevated EBV loads in blood and early lytic viral gene expression6. Despite connections between EBV and multiple autoimmune diseases, the underlying molecular mechanisms remain unknown8,9.
Genome wide association studies (GWASs) have identified >50 convincing European ancestry SLE loci (
Intersection of Disease Risk Loci with TF-DNA Binding Interactions
To identify TFs that bind a significant number of risk loci for a given disease, Applicant developed the RELI (Regulatory Element Locus Intersection) algorithm. RELI systematically estimates the significance of the intersection of the genomic coordinates of plausibly causal genetic variants and DNA sequences immunoprecipitated (through ChIP-seq) by a particular TF. Observed intersection counts are compared to a null distribution composed of variant sets chosen to match the disease loci in terms of allele frequency and linkage disequilibrium (LD) block structure (
Applicant first gauged the ability of RELI to capture known or suspected connections between TFs and diseases. The androgen receptor (AR) plays a well-established role in prostate cancer17, and RELI analysis revealed that AR binding sites in VCaP cells significantly intersect prostate cancer-associated loci (17 of 52 loci, Relative Risk (RR)=3.7, corrected P-value (Pc)<10−6, Table 1). Similarly, binding sites for GATA3 in MCF7 cells significantly intersect breast cancer variants18 (Pc<10−10, Table 1). Consistent with EBV contributing to multiple sclerosis (MS)19-21 and results from a recent study22, RELI reveals that the EBV-encoded EBNA2 protein occupies 44 of the 109 MS loci in Mutu B cells (Pc<10−29, Table 1). Prostate and breast cancer loci do not significantly intersect EBNA2 peaks, nor do the loci of certain inflammatory diseases such as systemic sclerosis (Table 1). Collectively, these observations illustrate that predictions made by RELI are specific and consistent with previously established disease mechanisms.
Applicant assembled 53 European ancestry SLE loci (P<5×10−8) with risk allele frequencies >1%, constituting 1,359 plausibly causal SLE variants. To explore the possible environmental contribution from EBV, Applicant evaluated the ChIP-seq data from EBV-infected B cells for the EBV gene products EBNA1, EBNA2 (three datasets), EBNA3C, EBNA-LP, and Zta (Supplementary Data 2). EBNA2 occupies loci that significantly intersect SLE risk loci in all three available ChIP-seq datasets (Table 1). For example, 26 of 53 European SLE GWAS loci contain DNA immunoprecipitated by EBNA2 in the Mutu B cell line, an almost 6-fold enrichment (Pc<10−24). No association was detected for the other EBV-encoded proteins. To examine the possibility that these results might simply be explained by enrichment of SLE loci in B cell open chromatin regions, Applicant restricted the RELI null model to variants located in DNase hypersensitive regions in EBV-infected B cells. With this higher stringency null model, all of the EBNA2 associations remained significant. Thus, the associations Applicant detects between SLE risk loci and EBNA2 cannot simply be explained by the previously established strong co-localization between SLE risk loci and B cell regulatory regions in the genome23.
Applicant next applied RELI to a large collection of human TF ChIP-seq datasets (1,544 experiments evaluating 344 TFs and 221 cell lines). In total, 132 ChIP-seq datasets involving 60 unique TFs strongly intersect SLE loci (10^−53 < Pc < 10^−6). Of these, 109 (83%) of the experiments were performed in EBV-infected B cell lines, with impressive fidelity between datasets. Nearly identical results were obtained using a null model that also takes the distance to the nearest gene transcription start site into account (
If EBV is involved in SLE pathogenesis, then the absence of EBV, and hence EBNA2, should diminish the observed associations with SLE risk loci. For eight TFs, ChIP-seq datasets are available in both EBNA2-expressing (EBV-infected) and EBV negative B cell lines.
Notably, the four TFs with the strongest RELI P-values in EBV-infected B cells (BATF, IRF4, PAX5, and SPI1) have weaker P-values in EBV negative B cells (
EBNA2-Occupied Genomic Sites Intersect Autoimmune-Associated Loci
Applicant applied RELI to 213 diseases and phenotypes obtained from the NHGRI GWAS catalog29 and other sources, revealing nine phenotypes displaying strong EBNA2 association in addition to SLE and MS: rheumatoid arthritis (RA), inflammatory bowel disease (IBD), type 1 diabetes (T1D), juvenile idiopathic arthritis (JIA), celiac disease (CelD), chronic lymphocytic leukemia (CLL), Kawasaki disease (KD), ulcerative colitis (UC), and immunoglobulin glycosylation (IgG) (
Consistent with the SLE results (
In order to identify additional EBNA2 co-factor candidates, Applicant isolated EBNA2 disorder-associated variants located within EBNA2 ChIP-seq peaks and evaluated them using RELI. This analysis confirms the importance of RBPJ, followed by members of the basal transcriptional machinery (TBP and p300), and NFκB subunits (which are involved in EBNA2-mediated gene activation′) (
The particular TFs tend to be shared across the EBNA2 disorders, but the loci they occupy are less frequently shared. No EBNA2-bound locus is associated with all seven EBNA2 disorders; most loci are unique to only one disorder (
If changes in gene regulation explain these results, then expression quantitative trait loci (eQTLs), ChIP-seq peaks for Pol-II, and histone marks associated with active gene regulatory regions should be relatively concentrated at the risk loci occupied by EBNA2. These predictions are indeed true for each of the seven EBNA2 disorders (
EBNA2 Participates in Allele-Dependent Formation of Transcription Complexes at Disease Risk Loci
The observed associations (
Applicant applied MARIO to 271 ChIP-seq datasets performed in the five genotyped cell lines, altogether assessing 98 different molecules. Since EBNA2 binds DNA through co-factors, Applicant first asked if the variants displaying EBNA2 allele-dependent binding might also coincide with similarly altered binding of other TFs. This analysis revealed strong concordance of allele-dependent binding events both within and across cell types. For example, Applicant identified 68 heterozygous common variants located within allele-dependent EBNA2 GM12878 ChIP-seq peaks. EBF1, whose binding is globally influenced by EBNA236, has a coincident ChIP-seq peak favoring the same allele at 39 (57%) of these loci, as opposed to only 8 (11%) on the opposite allele (P<10′, binomial test,
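The concordance comparison described above can be illustrated with an exact binomial calculation. The following is a minimal sketch only, assuming that the test is applied to the 47 loci with a coincident allele-dependent EBF1 peak (39 favoring the same allele and 8 the opposite allele) and that the null hypothesis is an equal chance of either direction. The function name and the one-sided formulation are illustrative assumptions, not a statement of the exact test Applicant used.

```python
from math import comb

def binomial_upper_tail(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact probability of observing at least `successes` out of `trials`
    under a binomial null with per-trial probability `p_null`."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Counts taken from the text: 39 loci favor the same allele, 8 the opposite.
same_allele, opposite_allele = 39, 8
p_one_sided = binomial_upper_tail(same_allele, same_allele + opposite_allele)
print(f"one-sided exact binomial P = {p_one_sided:.3g}")
```

A two-sided value under the symmetric null would simply be twice the one-sided value (capped at 1).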
To detect potential downstream effects of allelic EBNA2 binding, Applicant measured genome-wide gene expression levels by RNA-seq in Ramos, an EBV negative B cell line that can support an EBV infection. Applicant confirmed the expected presence or absence of EBNA2 by sequencing and western blot (
Applicant next searched for autoimmune-associated variants that might impact EBNA2 binding, resulting in allelic expression of a nearby gene. This analysis was dependent on the small subset of genetic variants satisfying four necessary criteria: the variant must be (1) plausibly causal for an autoimmune disorder; (2) immunoprecipitated by EBNA2; (3) heterozygous in the cell line assayed; and (4) proximal to a plausible target mRNA that contains a heterozygous variant in Ramos cells (to detect allelic expression). For example, the 23 EBNA2 variants listed satisfy the first three criteria, but only five satisfy the fourth criterion of being within 50 kb of a potential target gene containing a heterozygous variant in the Ramos cell line.
Despite these limitations, Applicant's approach identified autoimmune-associated variants displaying allelic EBNA2 binding and allelic expression of a nearby gene. For example, rs3794102, a variant strongly associated with vitiligo (P<10−9), has significantly skewed allelic binding of eight proteins—EBNA2, its suspected co-factor EBF136, and chromatin accessibility all favor the non-reference ‘G’ vitiligo risk allele (
Autoimmune-Associated Genetic Mechanisms in EBV-Infected B Cells
Applicant next used RELI to rank cell types by their relative importance to each of the EBNA2 disorders, based on the intersection between disease-associated variants and likely regulatory regions in that cell type. This procedure revealed a clear enrichment for EBV-infected B cells in SLE. For example, of the 175 H3K27ac ChIP-seq datasets available, the highest ranked 30 datasets are all from EBV-infected B-cell lines (
RELI Identifies Relationships Between Particular TFs and Many Diseases
Extension of RELI analysis to GWAS data for 213 phenotypes produced 2,264 significant (Pc<10−6) TF-disease connections. In addition to the EBNA2-related associations, clustering of these results reveals a large grouping of hematopoietic phenotypes and well-established blood cell regulators such as GATA1 and TAL1 (
Efforts to understand the gene-environment interaction of SLE loci with EBV have revealed that EBNA2 and its associated human TFs occupy a significant fraction of autoimmune risk loci. Further analyses suggest that multiple causal autoimmune variants may act through allele-dependent binding of these proteins, resulting in downstream alterations in gene expression. In this scenario, the relevant TFs and gene expression changes must occur in the cell type that alters disease risk. Collectively, Applicant's data identify the EBV-infected B cell as a possible site for gene action in multiple autoimmune diseases, with the caveat that existing data are biased, having been predominantly collected in this cell type. Notably, four of the top 20 TFs that co-occupy EBNA2 disorder loci with EBNA2 are targeted by at least one available drug (MED1, EP300, NFKB1, and NFKB2)45, and a recent study shows that the C-terminal domain of the BS69/ZMYND11 protein can bind to and inhibit EBNA246. These results offer promise for the development of future therapies for manipulating the action of these proteins in individuals harboring risk alleles at EBNA2-bound loci.
The disclosed results nominate particular TFs and cell types for 94 phenotypes, providing mechanisms possibly explaining the molecular and cellular origins of disease risk for experimental verification and exploration.
Methods Summary
Applicant compiled and curated a set of 99,733 variants associated with or in strong linkage disequilibrium with 213 phenotypes (based upon direct genotyping and/or standard variant imputation). Applicant collected a set of 2,511 functional genomics datasets (ChIP-seq for specific proteins, ChIP-seq for histone marks, DNase-seq, and eQTLs) from a variety of sources. Applicant developed a novel algorithm, RELI (Regulatory Element Locus Intersection), to estimate the significance of the intersection between the variants associated with a given phenotype and a given functional genomics dataset. To identify allelic binding of proteins within ChIP-seq datasets, Applicant genotyped five EBV-infected B cell lines, and developed a novel pipeline called MARIO (Measurement of Allelic Ratios Informatics Operator) to detect allelic read count imbalance at heterozygotes in the assayed cell line. To identify gene expression patterns dependent upon both genotype and EBV, Applicant performed RNA-seq in Ramos B cell lines with or without EBV infection. Details are provided in the Supplementary Methods.
Collection and Processing of Datasets
Applicant compiled a large collection of genetic and functional genomic datasets from a variety of sources. Phenotype-associated genetic variants were largely obtained from the NHGRI GWAS catalog29. This catalog does not contain candidate gene studies, including those from the widely-used ImmunoChip platform47. For SLE, MS, SSc, RA, and JIA, peer-reviewed literature was thus curated to maximize the number and accuracy of loci. Only associations exceeding genome-wide significance (P<5×10−8) were considered. Datasets were separated and annotated by ancestry, except where noted. Phenotypes were filtered to only include those with five or more associated loci separated by at least 500 kb, following Farh et al.30. Loci containing multiple variants were restricted to the single most strongly associated variant, and subsequently expanded to incorporate variants in strong linkage disequilibrium (LD) (r2>0.8) with this variant using Plink48. The resulting variants in each locus are referred to as plausibly causal.
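The locus filtering rules described above can be sketched in code. This is a minimal illustration under stated assumptions: the Variant record, the function names, and the greedy "keep the strongest variant, drop neighbors within 500 kb" strategy are hypothetical choices made for this example and are not a description of Applicant's actual implementation. Expansion by LD (r2>0.8) is not shown because it requires a reference panel.

```python
from typing import List, NamedTuple

class Variant(NamedTuple):
    rsid: str
    chrom: str
    pos: int       # base-pair position
    pvalue: float  # association P-value

GENOME_WIDE_P = 5e-8         # only associations with P < 5e-8 are considered
MIN_SEPARATION_BP = 500_000  # loci must be at least 500 kb apart
MIN_LOCI = 5                 # phenotypes need five or more loci

def independent_loci(variants: List[Variant]) -> List[Variant]:
    """Greedily keep the most strongly associated variant of each locus."""
    kept: List[Variant] = []
    for v in sorted(variants, key=lambda x: x.pvalue):
        if v.pvalue >= GENOME_WIDE_P:
            continue  # not genome-wide significant
        far_from_all = all(
            v.chrom != k.chrom or abs(v.pos - k.pos) >= MIN_SEPARATION_BP
            for k in kept
        )
        if far_from_all:
            kept.append(v)
    return kept

def phenotype_passes(variants: List[Variant]) -> bool:
    """True when the phenotype retains at least MIN_LOCI separated loci."""
    return len(independent_loci(variants)) >= MIN_LOCI
```

Sorting by P-value first means that, within any 500 kb neighborhood, the retained variant is the most strongly associated one, which mirrors the restriction of each locus to its single strongest variant.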
Functional genomics data, including ChIP-seq and DNase-seq, were obtained from a variety of sources, including ENCODE49 (downloaded on 4/14), Roadmap epigenomics50 (6/15), Cistrome51 (12/15), PAZAR52 (4/14), ReMap-ChIP53 (8/15), and Gene Expression Omnibus54. ChIP-seq datasets containing fewer than 500 peaks were removed. The genomic coordinates of the peaks for each dataset were stored as .bed files. eQTLs were obtained from GTExPortal55 (1/16), the Pritchard lab eQTL database (http://eqtl.uchicago.edu/) (4/14), and the Harvard eQTL database (https://www.hsph.harvard.edu/liming-liang/software/eqtl/) (4/14). TF binding motif models in the form of position frequency matrices were obtained from Cis-BP (build 1.02)56.
Regulatory Element Locus Intersection (RELI) Algorithm
Applicant created the RELI algorithm to search for potential shared regulatory mechanisms acting across phenotype-associated loci. In brief, RELI takes a set of variants as input, expands the set using LD blocks, and calculates the statistical intersection of the resulting loci with every dataset in a compendium (e.g., ChIP-seq datasets) (
RELI was designed to be flexible in terms of the null models it employs. The default null model, as described above, uses all common variants in the genome. Applicant also considered a higher-stringency null model by only considering common variants located within DNase-seq peaks in any of the 22 available EBV-infected B cell line datasets. This null model thus controls for the known association of SLE-associated variants with regulatory regions in B cells23.
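The general idea of comparing an observed locus intersection count against a null distribution can be sketched as follows. This is a simplified illustration and not the RELI implementation: all function names are hypothetical, the null is drawn by sampling from a caller-supplied pool of candidate null loci (for example, loci built from common variants, or only from variants inside DNase-seq peaks for the higher-stringency model) rather than by matching allele frequency and LD block structure, and an empirical P-value is used as a stand-in for whatever significance calculation RELI performs.

```python
import random
from bisect import bisect_right
from statistics import mean

def build_index(peaks):
    """peaks: iterable of (chrom, start, end) half-open intervals (BED style).
    Returns {chrom: (sorted_starts, merged_intervals)}."""
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for s, e in intervals:
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        index[chrom] = ([s for s, _ in merged], merged)
    return index

def variant_in_peaks(index, chrom, pos):
    """True if a 0-based position falls inside any peak on that chromosome."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < merged[i][1]

def count_intersecting_loci(index, loci):
    """A locus is a list of (chrom, pos) variants (lead variant plus LD
    partners); it intersects if any of its variants lies in a peak."""
    return sum(any(variant_in_peaks(index, c, p) for c, p in locus) for locus in loci)

def locus_intersection_test(peaks, disease_loci, null_pool, n_iter=2000, seed=0):
    """Return (observed, expected, relative_risk, empirical_p)."""
    rng = random.Random(seed)
    index = build_index(peaks)
    observed = count_intersecting_loci(index, disease_loci)
    null_counts = [
        count_intersecting_loci(index, rng.sample(null_pool, len(disease_loci)))
        for _ in range(n_iter)
    ]
    expected = mean(null_counts)
    relative_risk = observed / expected if expected > 0 else float("inf")
    empirical_p = (1 + sum(c >= observed for c in null_counts)) / (n_iter + 1)
    return observed, expected, relative_risk, empirical_p
```

Swapping the contents of null_pool is the only change needed to move between the default and the higher-stringency null model in this sketch, which is the sense in which the null model is a pluggable component.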
Applicant identified the optimal clusters depicted as red boxes in
Cell Line Genotyping and Imputation
Without genotyping data, it is not possible to distinguish between perfect allelic imbalance at a heterozygous variant (e.g., 10 reads on one allele and 0 on the other) and homozygosity. Applicant therefore genotyped five EBV-infected B cell lines that had previously been used for ChIP-seq experiments. Genotyping was performed as previously described59 on Illumina OMNI-5 genotyping arrays using Infinium2 chemistry. Genotypes were called using the Gentrain2 algorithm within Illumina Genome Studio. Quality control on the variants from autosomal chromosomes was performed as previously described59. Quality control data cleaning was performed in the context of a larger batch of non-disease controls to allow for the assessment of data quality. Briefly, all cell lines had call rates >99%, only common variants (minor allele frequency >0.01) were included, and all variants were previously shown to be in Hardy-Weinberg equilibrium in control populations at P>0.000159. To detect associated variants that were not directly genotyped on the OMNI-5, Applicant performed genome-wide imputation using overlapping 150 kb sections of the genome with IMPUTE260 and used a composite imputation reference panel of pre-phased integrated haplotypes from the 1,000 Genomes Project sequence data freeze from June 2014. Imputed genotypes were required to meet or exceed a probability threshold of 0.9, an information measure of >0.5, and the same quality-control criteria threshold described above for the genotyped markers.
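The variant-level quality-control thresholds stated above can be expressed as a simple filter. This is a hedged sketch: the GenotypeRecord fields and the passes_qc function are hypothetical names chosen for illustration, and the sketch covers only the per-variant thresholds given in the text (minor allele frequency, Hardy-Weinberg P-value, imputation probability, and information measure), not the per-cell-line call-rate check or the imputation itself.

```python
from dataclasses import dataclass

@dataclass
class GenotypeRecord:
    rsid: str
    maf: float                   # minor allele frequency
    hwe_p: float                 # Hardy-Weinberg P-value in control populations
    imputed: bool = False
    genotype_prob: float = 1.0   # posterior genotype probability (imputed only)
    info: float = 1.0            # imputation information measure (imputed only)

MIN_MAF = 0.01            # common variants only: MAF > 0.01
MIN_HWE_P = 1e-4          # Hardy-Weinberg equilibrium at P > 0.0001
MIN_GENOTYPE_PROB = 0.9   # imputed genotypes must meet or exceed 0.9
MIN_INFO = 0.5            # imputed genotypes need information measure > 0.5

def passes_qc(record: GenotypeRecord) -> bool:
    """Apply the thresholds shared by genotyped and imputed markers, then the
    additional thresholds that apply to imputed markers only."""
    if record.maf <= MIN_MAF or record.hwe_p <= MIN_HWE_P:
        return False
    if record.imputed:
        if record.genotype_prob < MIN_GENOTYPE_PROB or record.info <= MIN_INFO:
            return False
    return True
```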
Detection of Allele-Dependent Sequencing Reads Using MARIO
Applicant developed the MARIO (Measurement of Allelic Ratio Informatics Operator) pipeline to identify allele-dependent behavior at heterozygous variants in functional genomics datasets such as ChIP-seq. In brief, the pipeline downloads a set of reads, aligns them to the genome, calls peaks using MACS2⁴⁴ (parameters: --nomodel --extsize 147 -g hs -q 0.01), identifies allele-dependent behavior at heterozygotes within peaks (described below), and annotates the results (
To estimate the significance of the degree of allelic imbalance of a given ChIP-seq, ATAC-seq, or DNase-seq dataset at a given heterozygote, Applicant developed a value called the ARS (Allelic Reproducibility Score). The ARS is based on a combination of two predictive variables for a given heterozygous variant of a given dataset—the total number of reads available at the variant and the imbalance between the number of reads for each allele. Other variables were tested and deemed uninformative (see below). The ARS value also accounts for the number of available experimental replicates, and the degree to which they agree. ARS values were calibrated using seven TFs with ChIP-seq datasets available in four replicate experiments in GM12878 or K562 cell lines: SPI1 (set 1), SPI1 (set 2), NRSF, REST, RNF2, YY1 and ZBTB33. The presence of multiple replicates monitoring binding of the same TF in the same cell type enables the estimation of the degree to which allelic behavior is reproducible, given the values of the predictive variables.
ARS Values were Defined and Calculated Using the Following Procedure:
1) Determine the number of reads mapping to each allele of each heterozygous variant in each replicate. The pipeline was applied to each experimental replicate and counted the number of reads that overlap each heterozygous variant, corresponding to the two alternative alleles. All duplicate reads were removed using the “MarkDuplicates” tool from the PICARD software package (https://broadinstitute.github.io/picard/). Before mapping reads using Bowtie2⁶¹ (parameters: -N 1 --np 0 --n-ceil 10 --no-unal), Applicant masked all common variants in the GRCh37 (hg19) reference genome to N. This step removed bias generated by reads carrying non-reference alleles. Applicant designated the allele with the greater number of reads the strong allele, and the other the weak allele (
2) Identify predictive variables of reproducible allele-dependent behavior across replicates. Applicant identified variables that are predictive of reproducible allelic behavior across multiple ChIP-seq replicates within a dataset. Applicant collected a set of seven datasets, {D}, with each dataset comprised of four experimental replicates, {R} (
Applicant evaluated the performance of each of these variables using a true-positive set of reproducible variants. This set was created by identifying all variants that share the same strong allele across all four replicates (
3) Determine a function mapping the values of the predictive variables to a single ARS value. Applicant next created a function for mapping the values of predictive variables for any heterozygous variant to a single ARS value estimating the degree of reproducible allelic behavior. Applicant developed a scheme that accounts for the fact that any given dataset might contain any number of experimental replicates, with agreement between a larger number of replicates being a desirable trait. Within each of the seven datasets in the set {D}, all possible combinations of one, two, or three replicates are considered. Without loss of generality, the procedure for the case of two replicates is described, which considers the subsets {R1,R2}, {R1,R3}, {R1,R4}, etc. The set {H} of reproducible variants is first identified (as described above) for each subset. The WS_ratio is transformed into ranges, {(0-0.1), (0-0.2), (0-0.3), . . . (0-1)}, and for each range, the fraction of variants that are contained in the reproducible variant set as a function of num_reads is calculated (
where w is the WS_ratio, r is num_reads, and Aw and Bw are the fitting parameters. The resulting functions yield ARS values for any given heterozygous variant in any dataset, as a function of the number of experimental replicates, the WS_ratio, and num_reads. As a final step, when multiple replicates are available, an ARS value is only reported for a variant if the strong allele is consistent in the majority of cases, to account for the possibility of a failed experiment. A direct interpretation of the ARS values can be seen in the relationship between ARS values and the WS_ratio (
The corresponding NCBI experiment run identifiers for the seven ChIP-seq datasets with four available replicates are: NRSF (SRR1176035, SRR1176037, SRR1176039, SRR1176050), REST (SRR400395, SRR400396, SRR400397, SRR400398), RNF2 (SRR400400, SRR400401, SRR400402, SRR400403), SPI1 (set 1) (SRR1176055, SRR1176056, SRR1176057, SRR1176058), SPI1 (set 2) (SRR351880, SRR351881, SRR578180, SRR578181), YY1 (SRR351719, SRR351720, SRR578174, SRR578175), ZBTB33 (SRR1176059, SRR1176060, SRR1176061, SRR1176062).
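The bookkeeping in steps 1 to 3 (strong and weak allele designation, the reproducible variant set, enumeration of replicate subsets, and the majority rule for reporting) can be sketched as follows. This is an illustration, not the MARIO source code. All function names and the data layout are hypothetical, WS_ratio is assumed to be the weak-allele read count divided by the strong-allele read count, and the curve fit that produces the parameters Aw and Bw is not reproduced.

```python
from collections import Counter
from itertools import combinations

def strong_allele(counts):
    """Allele with more reads, or None on a tie.

    counts: dict mapping the two alleles to read counts, e.g. {"A": 12, "G": 3}.
    """
    (a1, n1), (a2, n2) = counts.items()
    if n1 == n2:
        return None
    return a1 if n1 > n2 else a2

def ws_ratio(counts):
    """Weak/strong read ratio in [0, 1] (assumed definition of WS_ratio)."""
    weak, strong = sorted(counts.values())
    return weak / strong if strong else None

def reproducible_variants(replicates):
    """Variants that share the same strong allele in every replicate given.

    replicates: list of dicts mapping variant id -> allele counts.
    """
    shared = set.intersection(*(set(r) for r in replicates))
    out = set()
    for v in shared:
        alleles = {strong_allele(r[v]) for r in replicates}
        if len(alleles) == 1 and None not in alleles:
            out.add(v)
    return out

def replicate_subsets(replicates, k):
    """All k-replicate subsets, e.g. {R1,R2}, {R1,R3}, ... for k = 2."""
    return list(combinations(replicates, k))

def majority_strong_allele(replicates, variant):
    """Strong allele backed by a strict majority of replicates, else None.

    Ties within a replicate are ignored (an assumption of this sketch).
    """
    calls = [strong_allele(r[variant]) for r in replicates if variant in r]
    calls = [c for c in calls if c is not None]
    if not calls:
        return None
    allele, n = Counter(calls).most_common(1)[0]
    return allele if n > len(calls) / 2 else None
```

With four replicates, `replicate_subsets(replicates, 2)` yields six pairs; for each pair, `reproducible_variants` gives the set {H} against which the fraction of reproducible variants per WS_ratio range and read depth would be tabulated.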
EBV Infection of Ramos Cells.
All cells were confirmed to be free of mycoplasma infection using PlasmoTest (InvivoGen, San Diego, Calif.) prior to use in experiments. Wild-type EBV was prepared from supernatants of B95-8 cells cultured in RPMI medium 1640 supplemented with 10% FBS for two weeks. Briefly, the cells were pelleted and the virus suspension was filtered through 0.45 μm Millipore filters. The concentrated virus stocks were aliquoted and stored at −80° C.
Applicant infected ~2×10⁶ Ramos cells (ATCC CRL-1596) in the presence of growth medium containing 2 μg/ml of phytohemagglutinin (PHA) for 4 hours. The infected cells were washed, cultured in growth media, and observed daily for multinuclear giant cell formation and morphological changes characteristic of EBV-infected B cells. After 10 passages, the infection was confirmed by measuring the expression of viral EBNA2 protein levels (
RNA-Seq
RNA was isolated from Ramos cell lines with and without EBV infection using the mirVana Isolation Kit (Ambion). RNA sequencing targeting 150 million mappable 125-base-pair reads from paired-end, poly-A enriched libraries was performed at the CCHMC DNA Sequencing and Genotyping Core Facility. Sequencing reads were aligned to the GRCh37 (hg19) build of the human genome using TopHat⁶² and Bowtie2⁶¹ with Ensembl⁶³ RNA transcript annotations as a guide. In parallel, these data were aligned to the EBV genome (NCBI). As expected, 0 reads mapped in the EBV-negative dataset, whereas 7,349 reads mapped in the EBV-infected dataset. 82.8% of the sequence reads aligned specifically to the human transcriptome, with a 2.6% increase in the aligned reads in the EBV-negative samples. No abnormal quality control (QC) flags were identified following QC analysis with the software FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). For allelic analysis, sequencing reads were aligned to the GRCh37 (hg19) build of the human genome using HISAT2⁶⁴. Differential expression analysis was performed using Cufflinks⁶⁵.
As additional QC, Applicant further compared the results to a study examining host gene expression changes in response to EBV infection in primary B cells²⁸. Of the 80 genes whose expression is significantly altered by the presence of EBV in Applicant's study, 18 are also significantly differentially expressed in that dataset. Further, many of the 80 differentially expressed genes are classic host genes whose expression is modulated by EBV. The expression of some genes is increased by the virus, while the expression of others is decreased. In all of these cases, the data agree with the established paradigm. Genes whose expression is activated by EBV include CD44⁶⁶, TNFAIP2⁶⁷, MX1⁶⁸, and IFI44⁶⁹; genes with lower expression include VAV3⁷⁰ and CD99⁷¹.
Allelic qPCR
gDNA and RNA were extracted from Ramos cells with and without B95-8 EBV infection using the DNeasy Blood & Tissue Kit (Qiagen) and mirVana miRNA Isolation Kit (Invitrogen), respectively. RNA was treated with DNase using the TURBO DNA-free Kit (Ambion) and converted to cDNA using the High-Capacity RNA-to-cDNA Kit (Applied Biosystems). qPCR was performed with a single set of TaqMan genotyping primers (Applied Biosystems) to rs8193 using the ABI 7500 PCR system. Fold change of expression was calculated with 2^(−ΔΔCT) values, where cDNA was normalized to gDNA.
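The fold-change arithmetic can be sketched as below. The text states only that 2^(−ΔΔCT) was used with cDNA normalized to gDNA. The specific construction here is an assumption: ΔCT is taken as the CT difference between the two allele-specific probes within one template, and ΔΔCT as the cDNA ΔCT minus the gDNA ΔCT. The function and argument names are hypothetical.

```python
def allelic_fold_change(ct_cdna_a, ct_cdna_b, ct_gdna_a, ct_gdna_b):
    """Relative expression of allele A versus allele B as 2^(-ddCT).

    ct_*_a / ct_*_b are threshold-cycle values for the two allele-specific
    probes; gDNA, where both alleles are present at equal copy number in a
    heterozygote, serves as the normalizer.
    """
    delta_cdna = ct_cdna_a - ct_cdna_b   # allele difference in expressed RNA
    delta_gdna = ct_gdna_a - ct_gdna_b   # probe-efficiency baseline from gDNA
    return 2 ** -(delta_cdna - delta_gdna)
```

Under these assumptions, an allele that crosses threshold one cycle earlier in cDNA, with no difference in gDNA, corresponds to a twofold higher expression of that allele.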
Data Availability
RNA-seq data are available in the Gene Expression Omnibus (GEO) database under accession number GSE93709. Full datasets and results, including disease variants (with alleles) and all RELI and MARIO output, are provided in the Supplementary Material.
Code Availability
The final RELI and MARIO source code, with documentation, will be made freely available under the GNU General Public License on the Weirauch Lab Bitbucket page: https://bitbucket.org/account/user/weirauchlab/projects/ci
All percentages and ratios are calculated by weight unless otherwise indicated.
All percentages and ratios are calculated based on the total composition unless otherwise indicated.
It should be understood that every maximum numerical limitation given throughout this specification includes every lower numerical limitation, as if such lower numerical limitations were expressly written herein. Every minimum numerical limitation given throughout this specification will include every higher numerical limitation, as if such higher numerical limitations were expressly written herein. Every numerical range given throughout this specification will include every narrower numerical range that falls within such broader numerical range, as if such narrower numerical ranges were all expressly written herein.
The dimensions and values disclosed herein are not to be understood as being strictly limited to the exact numerical values recited. Instead, unless otherwise specified, each such dimension is intended to mean both the recited value and a functionally equivalent range surrounding that value. For example, a dimension disclosed as “20 mm” is intended to mean “about 20 mm.”
Every document cited herein, including any cross referenced or related patent or application, is hereby incorporated herein by reference in its entirety unless expressly excluded or otherwise limited. The citation of any document is not an admission that it is prior art with respect to any invention disclosed or claimed herein or that it alone, or in any combination with any other reference or references, teaches, suggests or discloses any such invention. Further, to the extent that any meaning or definition of a term in this document conflicts with any meaning or definition of the same term in a document incorporated by reference, the meaning or definition assigned to that term in this document shall govern.
While particular embodiments of the present invention have been illustrated and described, it would be obvious to those skilled in the art that various other changes and modifications can be made without departing from the spirit and scope of the invention. It is therefore intended to cover in the appended claims all such changes and modifications that are within the scope of this invention.
This application claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 62/361,174, filed Jul. 12, 2016, entitled “Role for Epstein-Barr Virus EBNA2 in Autoimmunity,” U.S. Provisional Patent Application Ser. No. 62/385,197, filed Sep. 8, 2016, entitled “Transcription Factors Operating Across Disease Loci:EBNA2 in Autoimmunity,” U.S. Provisional Patent Application Ser. No. 62/455,649, filed Feb. 7, 2017, entitled “Drug Discovery in Lupus with Allele Specific Reporters,” U.S. Provisional Patent Application Ser. No. 62/459,326, filed Feb. 15, 2017, entitled “Drug Discovery in Lupus with Allele Specific Reporters,” and U.S. Provisional Patent Application Ser. No. 62/479,685, filed Mar. 31, 2017, entitled “Drug Discovery for Allele Specific Gene Regulation,” the contents of which are incorporated herein in their entirety for all purposes.
This invention was made with government support under A1024717 awarded by the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
62361174 | Jul 2016 | US
62385197 | Sep 2016 | US
62455649 | Feb 2017 | US
62459326 | Feb 2017 | US
62479685 | Mar 2017 | US