This disclosure relates generally to methods for treating complex diseases in a human subject. Specifically, this comprises constructing a genetic risk score orientated around genes related to the mode of action of a specific agent, and thus, selecting a suitable agent for treatment in an individual through the use of this genetic score.
The vast majority of global disease burden is underpinned by complex disorders, including, but not limited to, psychiatric and neurobehavioural disorders, neurodegenerative disorders, inflammatory and autoimmune disorders, metabolic and cardiovascular disease, cancer, and renal disease. Genetic risk plays an intrinsic role in common human disease and provides insights to may assist to improve patient outcomes. Although, genome-wide association studies (GWAS) have revealed much of the complexity of the heritable component of these traits, we will need innovative approaches to translate vast amounts of genetic data available into clinically actionable insights.
A key aspect of genetic component of complex disorders is that inter-individual heterogeneity is pervasive. In other words, the precise genetic risk factors carried by any given patient will be highly variable, and this can result in similarly variable biology being impacted by the genetic architecture of a disorder. Understanding these differences between individuals is likely crucial to facilitate precision management of these disorders and assist in treatment formulation. Conventional approaches that summate genetic risk burden in an individual, such as a polygenic risk scoring (PRS), also commonly referred to as polygenic scoring (PGS), do so by weighting individual alleles carried genome-wide by their association effect size from a well powered GWAS for the trait or disorder in question. These PRS/PGS approaches have demonstrated significant associations with a diverse range of phenotypes at the population level; for example, heart disease, breast cancer, type 2 diabetes, and inflammatory bowel disease (Khera et al., 2018, Nature Genetics, 50: 1219-1224).
Whilst genome wide PRS/PGS can model individual differences in genetic risk, a key limitation of these methods are their composition of heterogeneous genetic risk factors that lack biological salience and cannot provide specific information that would assist to formulate treatment for a complex disorder. As a result, there is an ongoing need for methodology that utilise the genetic architecture of complex disorders revealed by GWAS in a manner that is informative for treatment. Specialised genetic risk scores, termed pharmagenic enrichment scores (PES), are specifically oriented around clinically actionable, that is, targetable by drugs, biological pathways (Reay et al., 2020, Scientific Reports, 10(1):879). The use of this PES approach is designed around identifying targetable pathways for a disorder with no a priori hypothesis, however, there is no inherent information as to which direction of therapeutic modulation is appropriate.
This disclosure is predicated on the application of biologically directed polygenic scores, that is, apharmagenic enrichment score, directed to genes associated with either a specific therapeutic agent or a specific therapeutic target, whereby the direction of beneficial therapeutic modulation can be predicted genetically. A therapeutic agent in this context includes, but is not limited to, a pharmacological agent, lifestyle intervention, or non-prescription supplement; moreover, a therapeutic target encompasses a gene and its associated mRNA, mRNA isoforms thereof, protein, protein isoforms thereof, or post-translational modifications of said protein. Both of the above, that is, a therapeutic agent or therapeutic target, are hereafter referred to collectively as a directional anchor.
Accordingly, the present disclosure provides a mechanism for treating a complex disorder in a human subject comprising:
The method of treatment described above represents a key advance in that the application of the pharmagenic enrichment score is specifically orientated around a selected therapeutic, and thus, the direction of effect of the selected therapeutic is known or predicted.
Embodiments of the disclosure are described herein, by way on non-limiting example, with reference to the accompanying drawings.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art to which the invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, preferred methods and materials are described. All patents, patent applications, published applications and publications, databases, websites and other published materials referred to throughout the entire disclosure, unless noted otherwise, are incorporated by reference in their entirety. In the event that there is a plurality of definitions for terms, those in this section prevail. Where reference is made to a URL or other such identifier or address, it is understood that such identifiers can change and particular information on the internet can come and go, but equivalent information can be found by searching the internet. Reference to the identifier evidences the availability and public dissemination of such information.
The articles “a”, “an”, and “the” additionally include their plural aspects unless in the event that their context clearly states otherwise. Therefore, reference to “an agent” includes a single agent, as well as two or more agents, and so on, and so forth.
In its central aspect, the present disclosure provides a method for treating a complex disorder whereby it comprises;
The proceeding sections will further delineate specific terminology and components of the method for treatment described above.
The term “Complex Disorders” as used herein refers to disorders which do not display typical patterns of Mendelian inheritance in the majority of instances, that is, they do not arise from a single gene or small set of genes. Moreover, complex disorders result from a complex interplay between heritable (genetic) and environmental components. Complex disorders would be known to those skilled in the art, with some examples for illustration including heart disease, schizophrenia, breast cancer, Parkinson's disease, bipolar disorder, diabetes, asthma, and Crohn's disease.
A complex disorder may also encompass one or many “complex traits”, which is often used interchangeably by those skilled in the art with the term “quantitative trait.” These complex traits also do not exhibit Mendelian inheritance patterns, and exist as a distribution of continuous variables amongst individuals—examples thereof include, height, body-mass index, white blood cell count, high-density lipoprotein, blood pressure, and creatinine.
The term “variant” as used herein refers to any modification to the DNA sequence as compared to one or more reference DNA sequences. Variants may involve any number of adjacent or spaced apart bases or series of bases, and may include single nucleotide substitutions, insertions, deletions, and block substitutions of nucleotides, structural variants, fusion, copy number variants, repeat length variants, variable number tandem repeats, microsatilites, minisatelites.
In an embodiment, variants are selected from the group consisting of common SNPs, CNV, gene deletions, gene inversions, gene duplications, splice variants and haplotypes associated with the complex disorder. In a preferred embodiment, the variants are SNPs.
The term “genome-wide variants” as used herein refers to information pertaining to genetic variants across the whole genome. Such information includes variants in both coding and non-coding regions of the genome.
In an embodiment, the data representing genome-wide variants is selected from the group consisting of single nucleotide polymorphism (SNP) genotype data, copy number variant (CNV) data, gene deletion data, gene inversion data, gene duplication data, splice variant data, haplotype data, or combinations thereof.
In an embodiment, the data representing genome-wide variants is SNP genotype data.
As used herein, the term “SNP” or “single nucleotide polymorphism” refers to a genetic variation between individuals; e.g., a single nitrogenous base position in the DNA of organisms that is variable. As used herein, “SNPs” is the plural of SNP.
The term “polymorphism” as used herein refers to a locus that is variable; that is, within a population, the nucleotide sequence at a polymorphism has more than one version or allele. One example of a polymorphism is a “single nucleotide polymorphism”, which is a polymorphism at a single nucleotide position in a genome (the nucleotide at the specified position varies between individuals or populations).
The term “gene” as used herein refers to one or more sequence(s) of nucleotides in a genome that together encode one or more expressed molecules, e.g., an RNA, or polypeptide. The gene can include coding sequences that are transcribed into RNA, which may then be translated into a polypeptide sequence, and can include associated structural or regulatory sequences that aid in replication or expression of the gene.
The term “genotype” as used herein refers to the genetic constitution of an individual (or group of individuals) at one or more genetic loci. Genotype is defined by the allele(s) of one or more known loci of the individual, typically, the compilation of alleles inherited from its parents.
The term “haplotype” as used herein refers to the genotype of an individual at a plurality of genetic loci on a single DNA strand. Typically, the genetic loci described by a haplotype are physically and genetically linked, i.e., on the same chromosome strand.
The term “allele” refers to one of two or more different nucleotide sequences that occur or are encoded at a specific locus, or two or more different polypeptide sequences encoded by such a locus. For example, a first allele can occur on one chromosome, while a second allele occurs on a second homologous chromosome, e.g., as occurs for different chromosomes of a heterozygous individual, or between different homozygous or heterozygous individuals in a population. One example of a polymorphism is a SNP, which is a polymorphism at a single nucleotide position in a genome (the nucleotide at the specified position varies between individuals or populations).
The term “allele frequency” as used herein refers to the frequency (proportion or percentage) at which an allele is present at a locus within an individual, within a line, or within a population of lines. For example, for an allele “A” diploid individuals of genotype “AA”, “Aa” or “aa” may have allele frequencies of 2, 1, or 0, respectively. One can estimate the allele frequency within a line or population (e.g., cases or controls) by averaging the allele frequencies of a sample of individuals from that line or population. Similarly, one can calculate the allele frequency within a population of lines by averaging the allele frequencies of lines that make up the population.
An individual is “homozygous” if the individual has only one type of allele at a given locus (e.g., a diploid individual has a copy of the same allele at a locus for each of two homologous chromosomes). An individual is “heterozygous” if more than one allele type is present at a given locus (e.g., a diploid individual with one copy each of two different alleles). The term “homogeneity” indicates that members of a group have the same genotype at one or more specific loci. In contrast, the term “heterogeneity” is used to indicate that individuals within the group differ in genotype at one or more specific loci.
The term “locus” as used herein refers to a chromosomal position or region. For example, a polymorphic locus is a position or region where a polymorphic nucleic acid, trait determinant, gene or marker is located. In a further example, a “gene locus” is a specific chromosome location (region) in the genome of a species where a specific gene can be found.
Methods for obtaining data representing genome-wide variants would be known to persons skilled in the art, illustrative examples of which include performing microarray analysis, massively parallel sequencing, amplicon sequencing, multiplexed PCR, molecular inversion probe assay, GoldenGate assay, allele-specific hybridization, DNA-polymerase-assisted genotyping, ligase-assisted genotyping, and comparative genomic hybridization (CGH). Alternatively, data representing genome-wide variants may be obtained from published datasets.
In an embodiment, the data representing genome-wide variants is obtained from genome-wide association study (GWAS) summary statistics.
It is contemplated herein that the data representing genome-wide variants from the plurality of individuals with the complex disorder and the plurality of individuals without the complex disorder may be obtained using one method, which may differ from the method for obtaining data representing genome-wide variants from the subject. For example, SNP genotype data from the plurality of individuals with the complex disorder and the plurality of individuals without the complex disorder may be obtained by SNP microarray, while the SNP genotype from the subject may be obtained by massively parallel sequencing.
In an embodiment, the data representing genome-wide variants from a plurality of individuals with the complex disorder and a plurality of individuals without the complex disorder is obtained from a GWAS. GWAS are observational studies of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWAS have identified a large number of genetic variants significantly associated with human disease. These disease-associated variants have provided candidate genes for further study and hypotheses about disease mechanisms. GWAS have also confirmed the polygenic nature of complex disorders, particularly for psychiatric disorders. For example, GWAS studies have demonstrated that the cumulative effect of a large number of weakly associated SNPs, most of which are not statistically significant alone.
The term effect size would be understood by those skilled in the art as an output from a generalised linear model which represents the effect of a variant, per allele under an additive model, on the phenotype or complex disorder of interest. In an embodiment, these effect sizes represent mean genotype-disorder effects.
A directional anchor hereby refers to a specific biological factor to which the method of treatment is guided. In one embodiment, this is a therapeutic agent around which precision treatment of a complex disorder in a human subject could be implemented. A therapeutic agent in this context includes, but is not limited to, a pharmacological agent, lifestyle intervention, or non-prescription supplement. In another embodiment, this is a therapeutic targets, and includes, a gene and its associated mRNA, mRNA isoforms thereof, protein, protein isoforms thereof, or post-translational modifications of said protein.
The term pharmagenic enrichment score or PES as used herein refers to a polygenic score calculated for a pharmacologically relevant set of genes. Specifically annotating total polygenic risk for a disorder in this fashion facilitates a more therapeutically relevant implementation of this information for any given individual. The term “polygenic risk score” is used to define an individuals' risk of developing a complex disorder or progressing to a more advanced stage of a disorder, based on a large number, typically thousands, of common genetic variants each of which might have modest individual effect sizes contribute to the disease or its progression, but in aggregate have significant predicting value. In the present case, polygenic risk score may be used to predict the likelihood that an individual will develop a complex disorder using common single nucleotide SNPs associated with the complex disorder. However, genome-wide polygenic risk score (as a biologically unannotated instrument) does not necessarily provide insight into pathways suitable for pharmacologically intervention in individuals.
In accordance with the methods disclosed herein, an elevated PES for a given pharmacologically relevant pathway is indicative that the subject will be sensitive to a therapeutic agent that is known to interact with the pharmaceutically relevant pathway. As described elsewhere herein, elevated PES is not significantly related to polygenic risk. Accordingly, the PES approach can capture latent enrichment of polygenic signal in pathways relevant to pharmaceutical actions in subjects with a low overall trait PRS relative to others with the same complex disorder phenotype.
In an embodiment, PES is calculated from SNPs mapped to genes which form the candidate pharmacologically actionable geneset. This may comprise model (1) which sums the statistical effect size of each variant in the geneset multiplied by the allele count (dosage) for said variant. For example, for individual i, let denote the statistical effect size from the GWAS for each variant j in the geneset, multiplied by the dosage (G) of j in i.
The term “reference predictive polygenic score” is interchangeable with the terms “reference pharmagenic enrichment score” or “reference PES”. In an illustrative example, the comparison may be carried out using a reference predictive polygenic score that is representative of a known or predetermined predictive polygenic risk score from an individual, from a large reference cohort or a cohort of case and controls for the complex disorder phenotype in question, that is associated with sensitivity to a therapeutic agent, as described elsewhere herein.
The reference predictive polygenic score is typically a predetermined predictive polygenic score in a particular cohort or population of subjects (e.g., normal healthy controls, subjects with the complex disorder phenotype in question, subjects who had no sign of the complex disorder at the time the reference sample was obtained but who have gone on to develop the complex disorder, etc.). The reference value may be represented as an absolute number, or as a mean value (e.g., mean+/−standard deviation), such as when the reference value is derived from (i.e., representative of) a population of individuals.
Whilst persons skilled in the art would understand that using a reference predictive polygenic score that is derived from a sample population of individuals is likely to provide a more accurate representation of the predictive polygenic score in that particular population (e.g., for the purposes of the methods disclosed herein), in some embodiments, the reference predictive polygenic score can be a predictive polygenic score derived from the genome-wide variant information obtained from a single biological sample.
The pharmagenic enrichment score or PES, as described elsewhere herein, is calculated specifically relative to genes which are biologically interact with a directional anchor. The term “biologically interact” would be understood by those skilled in the art to encompass themes which include, but are not limited to, physical protein interaction, co-expression, co-occurrence in a database, and correlated expression. The two central embodiments of a directional anchor have been outlined elsewhere herein, with further elaboration in the proceeding text.
In an embodiment, a directional anchor is a therapeutic target, whereby the direction of beneficial therapeutic modulation can be predicted genetically. A therapeutic target encompasses a gene and its associated mRNA, mRNA isoforms thereof, protein, protein isoforms thereof, or post-translational modifications of said protein. This biological entity also satisfies the following criteria, i) statistically associated with the disorder or trait to be treated, ii) the direction in which modulating the therapeutic target would be therapeutically beneficial can be proposed, and iii) this entity can be modulated in said direction by some agent or other intervention. Persons skilled in the art would understand that “therapeutically beneficial” encompasses a reduction in a pathological process relative to the health of the individual to be treated. Moreover, “statistically associated” would be understood by persons skilled in the art as being related to the trait in a fashion that is greater than by chance alone, as indexed by metrics including a frequentist P value or a probabilistic Bayes' factor.
In another embodiment, a directional anchor is a therapeutic agent that would be used for the treatment of an individual. A therapeutic agent would be understood by persons skilled in the art and includes, but is not limited to, a pharmacological agent, lifestyle intervention, or non-prescription supplement.
The pharmacologically actionable gene-set around which the pharmagenic enrichment score is constructed, as described elsewhere herein, is composed of genes biologically related to the directional anchor in same fashion. Embodiments that derive these gene-sets related to the directional anchor would include: proteins predicted from experimental or in silico data to interact with the therapeutic target or a protein target of a therapeutic agent; proteins or genes that are annotated in a biological database or peer-reviewed literature article as being related to the directional anchor; genes which are statistically more likely than chance alone to be co-expressed with the directional anchor; and, genes or proteins correlated with the treatment of a directional anchor embodied as a therapeutic agent, with this treatment either in vitro, in vivo, or predicted in silico.
In an embodiment, PES is calculated from SNPs mapped to genes which form the candidate pharmacologically actionable geneset related to the directional anchor. This may comprise model (1), as defined elsewhere herein, which sums the statistical effect size of each variant in the geneset multiplied by the allele count (dosage) for said variant. In accordance with the methods disclosed herein, an elevated PES for a given pharmacologically relevant pathway is indicative that the subject will be sensitive to a therapeutic agent that comprises a directional anchor embodied as a therapeutic agent or a therapeutic agent that with a directional anchor embodied as a therapeutic target. As described elsewhere herein, elevated PES related to a directional anchor gene is not significantly related to polygenic risk.
The present disclosure will now be further described in greater detail by reference to the following specific examples, which should not be construed as in any way limiting the scope of the disclosure. In particular, this approach is amendable to the majority of disorders and traits, as outlined elsewhere herein, and thus, the named disorders and traits in the examples are only a small selection.
Methods Relating to this Example are Henceforth Outlined:
In this embodiment, the directional anchor was a therapeutic target, with a focus on therapeutic targets that are modulated by approved agents. In an example, if upregulation of a hypothetical gene, gene X, was associated with greater odds of a disease phenotype, then an antagonist of gene Xmay be clinically useful. If this gene Xantagonist is already approved for another indication, this may inform drug repurposing. However, there is immense heterogeneity between individuals for any given complex trait or disease in its genetic architecture, which often translates to highly variable clinical manifestation. We therefore state that individuals with a greater burden of disorder-associated genetic risk in the directional anchor gene, and its network of genes that physically and biologically interact with it, may benefit more specifically from a drug repurposing candidate targeting the DA-gene (pharmagenic enrichment score, PES). PES constructed from biological networks encompassing the directional anchor genes is likely to incorporate disorder-associated impacts on upstream processes that would modify the effect of a compound targeting the candidate gene, as well as downstream processes triggered by modulating the directional anchor (
We obtained GWAS summary statistics for schizophrenia (SZ) and bipolar disorder (BIP) from the psychiatric genomics consortium (Schizophrenia Working Group of the Psychiatric Genomics Consortium et al., 2020, medRxiv, https://doi.org/10.1101/2020.09.12.20192922; Stahl et al., 2019, Nature Genetics, 51:793-803). The SZ GWAS was a mega-analysis of majority European ancestry cohorts and comprised 67,390 cases and 94,015 controls, whilst the European ancestry BIP GWAS mega-analysis had 20,352 cases and 31,358 controls. In addition, we also utilised the same SZ GWAS with a constituent cohort removed (Australian Schizophrenia Research Bank) when we profiled PES within that dataset, as described in a proceeding section of the methods.
A transcriptome-wide association study (TWAS) and a proteome-wide association study (PWAS) was performed of SZ and BIP by leveraging genetically imputed models of mRNA and protein expression, respectively. Specifically, we utilised the FUSION approach for TWAS/PWAS, which would be understood by those skilled in the art (Gusev et al., 2016, Nature Genetics, 48:245-252). Expression weights for the TWAS were derived from postmortem brain (GTEx v7, PsychENCODE) and whole blood (GTEx v7), whilst protein expression weights were similarly from postmortem brain (ROSMAP) and whole blood (ARIC). The FUSION methodology integrates SNP-effects from the model of genetically predicted expression with the effects of the same SNPs on SZ or BIP, after accounting for linkage disequilibrium, such that the TWAS Z score can be conceptualized measure of genetic covariance between mRNA or protein expression of the gene and the GWAS trait of interest. We utilised a conservative method for multiple-testing correction whereby the Bonferroni methodology was implemented to divide the alpha level (0.05) by the total number of significantly cis-heritable models of genetically regulated expression (GReX) tested from any brain tissue considered or whole blood. Several genes had GReX available in multiple-tissues, thus rendering Bonferroni correction conservative, however, we implemented this approach to capture only the most confidently associated genes that could constitute drug-repurposing candidates. For candidate directional anchor genes derived from TWAS/PWAS, we probabilistically finemapped those regions using the FOCUS methodology using the default prior (p=1×10−3) and prior variance (nσ2=40) to approximate Bayes' factors such that the posterior inclusion probability (PIP) of each gene being member of a credible set with 90% probability of containing the causal gene could be derived (Mancuso et al., 2019, Nature Genetics, 51:675-682). We also investigated the impact of using a more conservative prior as outlined in the supplementary text. Moreover, we tested whether SNPs that constitute the GReX model and either SZ or BIP displayed statistical colocalisation with the coloc package as implemented by FUSION.
In addition, we leveraged variants strongly correlated with mRNA (expression quantitative trait loci—eQTL) and protein expression (protein expression quantitative trait loci—pQTL), respectively, as instrumental variables (IVs) in a two-sample Mendelian randomisation (MR) analysis (Hermani et al, 2018, eLife, 7:e34408). Analogous to the TWAS/PWAS, eQTL/pQTL were derived from brain (MetaBrain, ROSMAP) and blood (eQTLGen, Zhang et al., 2020, Nature Genetics, 52:1122-1131). Strict selection criteria were implemented to select suitable IVs, including only retaining independent genome-wide significant (P<5×10−8) SNPs that were associated with three or less mRNA/proteins in each relevant tissue/study. Moreover, we utilised a more stringent LD clumping procedure for eQTLs given the greater power and sample sizes for these datasets also results in immense pleiotropy amongst the SNP effects on mRNA by only selecting the most significant independent SNPs using one megabase clumps, with LD estimated using the 1000 genomes phase 3 panel. The effect of mRNA or protein expression for any given gene on SZ or BIP was estimated using the Wald ratio (single IV) or an inverse-variance weighted estimator (multiple IVs, with fixed effects due to the small number of IVs). As in the TWAS/PWAS, we utilised Bonferroni correction across all tissues in the mRNA and protein analyses respectively and then sought to identify candidate directional anchor genes from these signals. For any candidate directional anchor genes, that is, where an approved drug was predicted to reverse the odds increasing direction of expression, we performed a series of sensitivity analyses. Briefly, these involved: assessing the genomic locus of the IV SNP and which other genes, if any, it was associated with, testing evidence for a shared causal variant via colocalisation using default priors and conducting a phenome-wide Mendelian randomisation analysis (MR-pheWAS) using SNP effects from each trait in the IEUGWAS database. The above MR and sensitivity analyses were performed using the R packages TwoSampleMR v0.5.5, ieugwasr v0.1.5, and coloc v4.0.4.
We searched genes prioritised from the TWAS/PWAS or MR analyses in the Drug-gene interaction database (DGIdb v4.2.0—accessed April 2021) to identify approved compounds that could reverse the odds increasing direction of expression for SZ or BIP. DGIdb combines data from databases such as DrugBank, as well as curated literature sources. We defined high confidence drug-gene interactions as those reported in DrugBank as well as at least one other database or literature source.
Protein-protein interaction data was downloaded from the STRING database vI1. We utilised each of the six candidate directional anchor genes as a seed gene, separately, and constructed a network of genes predicted to interact with the seed gene by retaining high confidence edges (confidence score>0.7) derived from experimental evidence or curated protein-complex and pathway databases, as this is generally considered the most rigorous evidence from STRING. We then tested which gene-sets curated by the g:Profiler (version e104_eg51_p15_3922dba) resource (GO, KEGG, Reactome, WikiPathways, TRANSFAC, miRTarBase, Human Protein Atlas, CORUM, and Human phenotype ontology) were overrepresented amongst the genes in each network, using the g:SCS (set counts and sizes) multiple-testing correction method implemented by g:Profiler that has been shown to better account for the complex, overlapping nature of these data. We considered a corrected P value<0.05 as significant. We then tested the association of the genes in each of these networks, with and without the gene removed, with the common variant signal in the SZ and BIP GWAS using MAGMA v1.09. Briefly, SNP-wise P values were aggregated at gene-level, with SNPs annotated to genes using two different sets of genic boundary extensions to capture potential regulatory variation, conservative (5 kilobases (kb) upstream, 1.5 kb downstream), and liberal (35 kb upstream, 10 kb downstream). Gene-set association is implemented by MAGMA using linear regression, whereby the probit transformed genic P values (Z scores) are the outcome with a binary explanatory variable indicating whether a gene is in the set to be tested (βS), covaried for other confounders like gene size, as described previously. The test statistic of interest is a one-sided test of whether βS>0, and thus, quantifies if the genes in the set are more associated than all other genes. We also investigated the association of the approximately 34,000 gene-sets collated by g:Profiler, such that we could demonstrate whether gene-sets overrepresented in each network were also associated with SZ or BIP.
We sought to utilise variants annotated to the genes within the network of each candidate directional anchor genes as to develop pharmagenic enrichment scores for SZ and BIP, respectively. As described previously, a PES is analogous to a genome-wide PRS in its derivation, with the key difference that it only utilises variants mapped to the gene-set of interest (equation 1) [18]. Specifically, a PES profile in individual i comprises sum of the effect size of j variants from the GWAS ({circumflex over (β)}j) annotated to at least one gene in set M, multiplied by the allelic dosage under an additive model (GijϵG=0, 1, 2). PESi=Σj=1M{circumflex over (β)}jGij [1]. The genome wide PRS for SZ and BIP are essentially the same model but M incorporates the entire genome. In accordance with the MAGMA analyses, we tested two genic boundary configurations for evaluating the best performing PES for each directional anchor gene network—conservative and liberal. Our previous PES related approaches utilised the LD clumping and thresholding (C+T) approach, whereby SNPs are ‘clumped’ such that the retained SNPs are largely independent and ‘thresholded’ based on their association P-value in the GWAS. In each case the threshold is set at the level the optimal for the druggable gene-set association at the population level. However, given we selected the gene-sets in this study based on interactions with the candidate directional anchor gene we tested four different P value thresholds (PTϵT={0.005, 0.05, 0.5, 1}), which represent a model with all SNPs, nominally significant SNPs, and a threshold an order of magnitude above or below the nominal threshold. We utilised PRSice-2 v2.3.3 (linux) for the C+T models. In addition, we also utilised a penalised regression framework to shrink SNP effect sizes to optimize the model for each PES, as implemented by the standalone version of lassosum v0.4.5. The implementation for this method has been outlined extensively elsewhere, with the optimal tuning parameter (λ) based on the score that displays the highest correlation with the phenotype and the best performing constraint parameter (s) chosen from a range of apriori specified values to decrease computational burden (0.2, 0.5, 0.9, and 1).
We utilised the prospective UK Biobank (UKBB) cohort to define the best performing PES for each directional anchor gene network. Our group has previously processed the UKBB genotype data such that unrelated individuals of white British ancestry were retained, along with other sample and variant level quality control considerations applied [20]. As a result, the composition of the full UKBB cohort in this study was 336,896participants for which up to 13,568,914 autosomal variants were available (imputation INFO>0.8). SZ and BIP cases were defined in the UKBB using a combination of self-report data both from the general assessment visit and the mental health questionnaire (MHQ), along with hospital inpatient data (primary or secondary ICD-10 codes). In total, there we 631 UKBB participants from the study cohort defined as having SZ, with 1657 BIP cases identified. The controls were double the number of the respective case cohorts randomly, and independently for SZ and BIP, derived from 75,201 individuals with genotype data that completed the MHQ and did not self-report any mental illness. The full complement of SZ cases with the aforementioned controls (N=1262) was utilised as the training set for the SZ scores given the relatively small number of cases. As a result, we utilised the Australian Schizophrenia Research Bank (ASRB) cohort as a validation set to attempt to replicate the associations observed with the scores (Loughland et al., 2010, Aust N Z JPsychiatry, 44 :1029-1035). The ASRB was a component of the PGC3 SZ GWAS, and thus, we retrained all the best performing PES scores using summary statistics with the ASRB cohort removed before testing them in that dataset. The BIP analyses employed a 70/30 split for the training and validation cohort in the UKBB, with double the number of independent MHQ derived healthy controls utilised for each case-set. Further information regarding the demographic composition of these cohorts is provided in the supplementary text. The PES and PRS constructing using the C+T configurations and penalised regression were scaled to have a mean of zero and unit variance before evaluating their association with SZ or BIP for the respective scores in the UKBB training cohorts using binomial logistic regression covaried for sex, age, genotyping batch, and five SNP derived principal components. The optimal PES for each network was selected for each disorder separately by calculating the variance explained on the liability scale (Nagelkerke's R2), assuming a 0.7% and 1% prevalence for SZ and BIP, respectively. These PES/PRS that explained the most phenotypic variance were then profiled and tested in the validation sets. For PES that were significantly associated with either disorder, we conservatively constructed another model that also included genome wide PRS, with a χ2 test of residual deviance performed to ascertain whether adding the PES in addition to the PRS significantly improved model fit. Correlations (Pearson's) amongst scaled the PES and PRS were visualised using the corrplot package v0.84. Individuals with at least one elevated PES in the training cohorts (highest decile) were identified, with this binary variable tested for association with SZ or BIP using another logistic regression model. Finally, we also considered residualised PES, whereby the residuals were extracted and scaled from a linear model that regressed genome wide PRS against principal components and genotyping batch on the score in question. All analyses described in this section were performed utilising the programming language R (version 3.6.0).
We then wished to investigate the correlations between the best performing PES for each network and i) blood or urine biochemical traits, and ii) self-reported mental health disorders besides SZ or BIP. The biochemical analyses were performed in up to 70,625 individuals who did not self-report any mental health disorders in the MHQ and were also not included in the SZ or BIP training/validation sets as controls. There were 33 biochemical traits tested (raw measured values) in a linear model with each PES or PRS as an explanatory variable covaried for sex, age, sex x age, age2, 10 principal components, and genotyping batch. We also performed sex-stratified analyses, with oestradiol additionally considered in females. A number of sensitivity analyses were performed for PES-biochemical trait pairs that were significantly correlated after FDR correction—i) adjustment for genome-wide PRS, ii) natural log transformation of the biochemical outcome variable, iii) inverse-rank normal transformed residuals as the outcome variable from a model that regressed sex, sex x age, and age2, and iv) adjustment for statin use (given the number of lipid related signals uncovered). These correlations are observational in nature, and thus, there are several other potential confounders that could be considered—however, given the potential biases induced by adjusting for heritable covariates, we utilised the above strategies as a baseline suite of sensitivity analyses. A specific test of sexual dimorphisms between the regression results in males and females was also performed based on the sex-specific regression estimates and standard errors (Martin et al., 2021, Biological Psychiatry, 89:1127-1137). Moreover, we then evaluated the association between each score and 14 non-SZ or BIP mental health disorders which individuals who completed the MHQ were asked to self-report. In all instances, we used the 70,625 individuals who did not self-report any mental disorders as the controls in binomial logistic regression models covaried for the same terms as in the biochemical analyses.
The findings and data related to this example are detailed henceforth.
We sought to identify candidate directional anchor genes for SZ or BIP by integrating GWAS summary statistics for these traits with transcriptomic and proteomic data collected from either blood or post-mortem brain (
Moreover, we then utilised eQTL and pQTL as IVs in a Mendelian randomisation analysis to priortise candidate directional anchor genes, which seeks to estimate the causal effect of mRNA or protein expression on either disorder outcome, given more onerous assumptions are met (Supplementary Materials, Supplementary Tables 7-10). This is critical as the use of molecular QTLs related to variables like mRNA expression as IVs is challenging due to LD complexity and the potential effect of QTLs on multiple genes [28, 46]. As a result, we sought to complement the above discovery orientated TWAS/PWAS with more conservative selection criteria for an eQTL or pQTL to be an IV, particularly in the case of eQTLs where sample sizes for some tissues are now very large. Independent SNPs (LD r2<0.001) acting as eQTLs or pQTLs at a threshold of genome-wide significance (P<5×10−8) were selected from post-mortem brain or blood, as outlined in the methods and supplementary materials. Due to the conservative nature of these analyses, many of the genes considered in the TWAS/PWAS did not have a suitable IV available, whilst conversely, a small number of genes that did not display adequate multivariate cis-heritability in the TWAS/PWAS weights could now be included. The mRNA models after Bonferroni correction uncovered four genes for which expression exerted a potential causal effect on SZ or BIP with a suitable compound approved to reverse the odds-increasing direction of effect. There were three for SZ (PCCB, NEK1, and PTK2B), as well as FADS1 for bipolar. Interestingly, PCCB and FADS1 overlapped with the TWAS results—as an example, each standard deviation increase in cortical FADS1 expression was associated with an approximately 15.23% [95% CI. 8.69%, 21.77%] decrease in the odds of BIP, which could be accentuated by an FADS1 agonist like the omega-3 fatty acid supplement icosapent (Ethyl eicosapentaenoic acid). We then performed a series of sensitivity analyses to assess IV validity and for evidence of confounding pleiotropy. These analyses supported PCCB and FADS1 as candidate directional anchor genes. The index IV-SNPs mapped to PCCB and FADS1 expression, respectively, was then utilised to perform a phenome-wide scan spanning over 10,000 GWAS of the effect of expression of these two genes using SNP effect sizes the IEUGWAS database. Firstly, we found that increased cortical expression of PCCB, which was associated with deceased odds of SZ from a previous GWAS, was also linked to a reduction in other psychiatric phenotypes from self-reported UK Biobank GWAS such as worry, neuroticism, nervousness, and tenseness, supporting the utility of a PCCB agonist, like biotin, as a repurposing candidate. Secondly, the phenome-wide data for increased cortical FADS1 expression demonstrated, as expected, a strong effect on lipids, including increased HDL and decreased triglycerides. We did find one potential other candidate for BIP using the pQTL approach (MAP2K2), however, given this was a trans-pQTL, we did not consider it further as we wished to retain only the most biologically confident associations. As a result, the MR approach did not add any additional candidate directional anchor genes but provided more support to PCCB and FADS1. A less conservative MR paradigm in terms of IV selection would likely yield more genes but as our TWAS/PWAS analyses were already discovery focused, we believe this is appropriate given the underlying assumptions of MR. We summarize the candidate directional anchor genes in table 1.
We sought to define a network of genes that display high confidence interactions with each candidate directional anchor gene using data from the STRING database, such that we can then construct a pharmagenic enrichment score using variants annotated to these genes. The number of direct interactions identified for each of the six candidate genes, excluding the gene itself, were as follows: CACNA1C network (83 genes), FADS1 network (16 genes), FES network (37 genes), GRIN2A network (54 genes), PCCB network (26 genes), and RPS17 network (254 genes). All of these networks displayed significantly more interactions than what would be expected by chance alone (P<1×10−16), with an example of two of these networks (CACNA1C and FADS1) visualised in
We then tested whether there was enrichment of the common variant signal for SZ or BIP in any of these networks, with and without the directional anchor gene included, using MAGMA (
Pharmagenic enrichment scores (PES) were then constructed for the genes in each directional anchor gene network using SNP weights for SZ and BIP, respectively. SZ and BIP PES were considered for all six networks given the high genetic correlation between SZ and BIP, as well as extensive phenotypic overlap. We defined a training set of SZ (N=631) and BIP cases (N=1161) in the UK Biobank, with double the number of controls randomly, and independently, selected from individuals with no self-reported mental health conditions for each training set. Two methods were utilised to find the most parsimonious PES profile for each network, along with a genome-wide PRS for SZ and BIP—clumping and thresholding (C+T), and penalised regression (Table 2).
1SNPs with a non-zero coefficient after the reweighting in the penalised regression model or independent SNPs after linkage disequilibrium clumping and thresholding (C + T).
2Bipolar disorder or schizophrenia log odds per standard deviation increase in the score (standard error)
3The two models evaluated were clumping and thresholding (C + T) or penalised regression (as implemented by the lassosum package)
In the SZ cohort, there were three network SZ PES which were significantly associated with increased odds of SZ after multiple testing correction including networks for FES, GRIN2A, and RPS17 (Table 2). In the GRIN2A network PES featuring 5037 variants constructed using penalised regression explained approximately 0.35% of phenotypic variance on the liability scale (OR per SD in score=1.19 [95% CI. 1.09, 1.29], P=9.23×10−4). We then conservatively adjusted for the best performing genome-wide SZ PRS and found that the GRIN2A network PES remained significantly associated with SZ. In the FES and RPS17 networks, their PES were just below the threshold for significance after PRS adjustment. It is notable that the SZ network PES profiles were only marginally correlated with genome-wide SZ PRS (all r<0.11), which suggests these scores may capture biologically aggregated risk which is distinct from the undifferentiated genome-wide signal. Individuals with elevated SZ directional anchor network PES were then defined as those with scores in the highest decile (≥90th percentile). The majority of SZ cases (53.72%) had at least one elevated PES, with a significant enrichment of SZ cases amongst individuals with an elevated PES, even after covariation for genome-wide PRS−OR=1.45 [95% CI: 1.22, 1.67], P=1.57×10−3. Interestingly, amongst individuals in this cohort with relatively low SZ PRS (lowest decile), 12 out of the 19 SZ cases with low PRS had an elevated PES (63%), with a nominally significant association remaining between elevated PES and SZ amongst those with low genome wide PRS (P=0.027). Upon considering only SZ cases in terms of low PRS, we found that 46.88% had at least one elevated PES. Taken together, these data suggest that some individuals with otherwise low SZ PRS may have localised genetic risk within particular biologically related networks. Given the relatively small number of SZ cases in the UKBB, we sought to replicate our results using an independent case-control cohort from the ASRB (NCases=425, NControls=251) rather than splitting the UKBB cohort into a training and validation set. The PES and PRS models were retrained in the UKBB from the same GWAS with the ASRB cohort removed. We were able to nominally replicate the association of the FES network PES with SZ in the ASRB (OR per SD=1.21 [95% CI: 1.04, 1.38], P=0.024), whilst the observed association between the GRIN2A and RPS17 network PES and SZ and in the UKBB was not replicated.
BIP PES within these networks was then profiled in the UKBB training set (Table 2). Interestingly, there were more of the directional anchor gene network PES associated with BIP than SZ, which may reflect that larger number of BIP cases in the UKBB, and thus, greater statistical power. Specifically, all of the network BIP PES were significantly higher in cases, with the exception of the FADS1 network PES for which there was only a trend towards significance. Analogous to the SZ cohort, the GRIN2A network PES explained the most phenotypic variance on the liability scale (0.39%), with each SD in the score associated with approximately an 19% [95% CI: 12%, 26%] increase in the odds of BIP. Moreover, adjustment for BIP genome wide PRS did not ablate the significance of the GRIN2A network, RPS17 network PES, and FESnetwork PES, whilst the PCCB network PES trended towards significance (P=0.1) after PRS covariation. The correlations between each PES and BIP PRS were also small, however, the RPS17 network PES (r=0.13), CACNA1C network PES (r=0.14), and PCCB network PES (r=0.13) were slightly larger in terms of their PRS correlation than what was observed for the SZ scores. We then investigated the characteristics of individuals with elevated BIP PES and found like SZ that almost half of the BIP cases (49.1%) had at least one PES greater than or equal to the 90th percentile. There was also enrichment of BIP cases amongst participants with an elevated PES after adjusting for BIP PRS (OR=1.19 [95% CI: 1.04, 1.34], P=0.027). Considering BIP cases in the lowest decile of the BIP PRS distribution, 36% of them had at least one top decile PES despite their low genome-wide burden, although unlike SZ the association between elevated PES and case-status in this subcohort was not statistically significant. An independent BIP case-control cohort from the UKBB was utilised to attempt to replicate these associations, and we found that the network RPS17 PES was significantly enriched in BIP cases in this validation cohort: whilst there was a trend for the GRIN2A network PES (P=0.052).
The GRIN2A network PES explained the most phenotypic variance for SZ and BIP, and survived covariation for genome-wide PRS—thus, we wanted to test whether constructing a PES for this network with the GRIN2A gene excluded would still be associated. In other words, we investigated whether there was an effect from variants mapped to the network without the directional anchor gene itself. Indeed, we did find that the GRIN2A network PES with the GRIN2A gene removed was still significantly enriched in both SZ and BIP (PSZ=9.23×10−4 and PBIP=3.81×10−6). The relationship between genome-wide PRS and this PES was also examined in further detail by constructing a ‘residualised PES’ whereby we obtained the normalized residuals from a model that regressed SNP derived principal components, genotyping batch, and genome-wide PRS for BIP and SZ, respectively, on the GRIN2A network PES for either disorder. We posit that the individuals with an elevated residualised PES are more likely to represent true enrichment in that network given that the effect of the genome wide PRS, along with variables related to technical artefacts and population stratification, have been adjusted for. Encouragingly, we find that the correlation between the raw GRIN2A network PES for either disorder and their respective residualised PES were highly concordant, with the majority of individuals with an elevated GRIN2A PES (≥90th percentile) also in that same quantile for the residualised PES (
We then investigated the association of the directional anchor gene PES in an independent subset of the UKBB with other mental health phenotypes and systemic biochemical measures (
We also performed a phenome-wide association study of each score with 14 self-reported mental health disorders in the UKBB cohort, excluding SZ and BIP (
Methods relating to this example are henceforth outlined. In this embodiment, the directional anchor is a therapeutic agent, in the form of compounds that inhibit the activity of the FTO gene.
We sought to refine the target genes of two recent, potent small-molecular inhibitors of FTO—specifically, CS1 (bisantrene) and CS2 (brequinar). RNA sequencing was performed to explore the mRNA expression correlates in vitro of CS1 and CS2, as described extensively in that study (Su et al., 2020, Cancer Cell, 38(1):79-96.el1). The human monocytic leukaemia cell line NOMO-1 was treated with both compounds individually (three replicates each) relative to four control replicates. In addition, the effect of FTO knockdown via short-hairpin RNA (shFTO) was also investigated relative to a control construct (shNS). We obtained raw count data generated by HTSeq for the aforementioned experiments by correspondence with the lead author, with data in the form of transcript-per-million uploaded to GEO for this study (GSE136204).
Data normalisation, filtration, and differential expression analyses were performed using the edgeR package version 3.34.0. We considered three different contrasts: i) CS1 treated cells vs control replicates, ii) CS2 treated cells vs control replicates, and iii) shFTO treated cells vs shNS treated control replicates. Raw counts were firstly normalised to library size, followed by removing lowly expressed genes with fewer than 10 raw counts in the smallest library via a counts-per-million thresholding approach. Data were inspected before and after the filtration step via coefficient of variation (BCV) and multidimensional scaling (MDS) plots. Differential expression for each gene that survived quality control was then performed using exact tests for differences in the means between two groups of negative-binomially distributed counts. We defined a differentially expressed gene as those which survived multiple-testing correction using the Benjamini-Hochberg false discovery rate approach at the 1% threshold and had an absolute log2 fold change (FC)>0.6, which approximates an absolute FC of 1.5. In addition, we also investigated a more stringent cut-off of |log2FC|>1. The overrepresentation of each set of candidate genes amongst biological pathways and other ontology sets was performed using g:Profiler. The gene-ontology molecular process sets overrepresented for the genes implicated in all three treatments were further subjected to clustering by semantic similarity via the REVIGO online platform.
We obtained genome-wide association study (GWAS) summary statistics of overall breast-cancer susceptibility, that is, all subtypes included (Zhang et al., Nature Genetics, 62:572-581)). Considering all cohorts included in the meta-analysis there were 133,384 cases and 113,789 controls of European ancestry. Non-palindromic common variants (frequency>1%) outside of the major histocompatibility complex were retained in order to construct a breast cancer polygenic risk score (PRS) and pharmagenic enrichment scores (PES) based on the targets of the FTO inhibitors. The breast cancer GWAS did not feature the UK Biobank cohort.
The breast cancer GWAS summary statistics were filtered such that only genes annotated to the target of FTO inhibition were retained, thus, forming the PES investigated henceforth in this study. Genic boundaries were extended 5kb upstream and 1.5kb downstream to capture regulatory variation. There were three different gene-sets considered as the FTO inhibitor target network: i) genes differentially expressed after both CS1 and CS2 treatment, ii) genes differentially expressed in all three treatments (CS1, CS2, and shFTO), and iii) genes differentially expressed in all three treatments using a stricter |log2FC| cut-off of 1. After annotating variants to these three gene-sets, we performed clumping using the 1000 genomes phase 3 European reference panel (r2<0.1 per 250 kb clump) such that the remaining variants were in relative linkage equilibrium (LE). Three P-value thresholds were chosen to construct scores, and thus, each gene-set had three PES considered in the first instance: all SNPs mapped to the FTO inhibitor targets, nominally significant SNPs mapped to the FTO inhibitor targets (PGWAS<0.05), and genome-wide significant SNPs mapped to the FTO inhibitor targets (PGWAS<5×10−8). Specifically, a PES profile in individual i comprises sum of the effect size of j variants from the GWAS ({circumflex over (β)}j) annotated to at least one gene in set M, multiplied by the allelic dosage under an additive model (GijϵG=0, 1, 2). PESi=Σj=1M{circumflex over (β)}jGij [1]. A genome wide PRS was constructed using the same three clumping and thresholding parameters, with the final scores averaged by the number of alleles in each participant. Scores were profiled using plink2.
The breast cancer FTO inhibition PES and PRS were profiled in the large, prospective UK Biobank (UKBB) cohort, which features extensive self-reported and primary care data related to cancer diagnoses. This research was conducted using the UKBB resource under the application 58432. The UKBB genotype data was previously processed such that unrelated individuals of white British ancestry were retained, along with other sample and variant level quality control considerations applied (Reay et al., 2021, medRxiv, https://doi.org/10.1101/2021.01.24.21250424). As a result, the composition of the full UKBB cohort in this study was 336,896 participants for which up to 13,568,914 autosomal variants were available (imputation INFO>0.8).
We defined prevalent breast cancer cases in the UKBB at time of analysis (July 2021) as females who satisfied at least one of the following criteria: i) self-reported breast cancer in their interview upon their visit/s to an assessment centre, ii) a relevant ICD-9 or ICD-10 code recorded via the linked national cancer registry inpatient data (ICD-9: 1740-1749, ICD-10: C500-C506, C508, C509). Controls were female-participants who did not self-report any cancer or had a relevant linked diagnosis recorded on the cancer registry at this same time-point. We tested the association of each breast cancer PES and PRS with prevalent breast cancer separately using a binomial logistic regression model covaried for age, 20 SNP derived principal components, and genotyping batch. The effect of additionally adjusting for PRS on the association between the PES and prevalent breast cancer was then quantified. Moreover, variance explained (Nagelkerke's R2) by each score was converted to the liability scale assuming a 3.6% population prevalence. The correlation between PES and PRS was also assessed, whilst residualised PES were derived to further model the PES/PRS relationship. These residualised scores are the scaled residuals from a linear model with the PES as the outcome variable and PRS, principal components, batch as the predictor, with the underlying aim to assess the extent that individuals with elevated PES are driven by the genome-wide polygenic signal (PRS) and/or technical factors related to population stratification and genotyping.
The results of this example are detailed henceforth.
We investigated the mRNA correlates shared across in vitro treatment of the FTO inhibitors CS1 and CS2, as well as further comparing to the effect of FTO inhibition via a shRNA (shFTO,
The association of pharmagenic enrichment scores (PES), constructed from the three-sets of FTO inhibition related genes, was tested for prevalent breast cancer amongst female participants in the UK Biobank cohort (
The above embodiments in this example represent a mechanism by which a directional anchor PES for a compound that inhibits FTO could utilised in the precision treatment of breast cancer. In this embodiment, patients with high FTO associated PES may be expected to display a stronger response to an FTO inhibitor. Accordingly, these scores were significantly elevated in breast cancer cases even after adjustment for the background of elevated genome wide genetic risk among female participants in the UK Biobank cohort.
Number | Date | Country | Kind |
---|---|---|---|
2022900558 | Mar 2022 | AU | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/AU2023/050326 | 4/21/2023 | WO |