Alzheimer's disease (“AD”) is the most common cause of dementia, an affliction that ultimately occurs in over 43 million people worldwide. The majority of dementia cases occur after age 65, which imposes an increasing burden on societies with aging populations. AD is defined biologically by the presence of a specific neuropathology of the brain: extracellular deposition of amyloid-β (Aβ) in the form of diffuse and neuritic plaques and the presence of neuropil threads within dystrophic neurites that contain aggregated, hyperphosphorylated tau protein and intraneuronal neurofibrillary tangles (NFT). Both Aβ and NFT accumulation typically progress to targeted neuronal and synaptic loss, mainly in regions of the cerebral cortex and the hippocampus. Concurrent with the neuronal loss seen in AD, there is an additional coordinated breakdown across other brain cell types, including gliosis, demyelination, and inflammation, which exacerbates cognitive dysfunction.
Despite the heavy burden on society and on aging populations, there are only four medications currently approved by the FDA for treating AD, and they are approved only for managing the cognitive impairment that is present in symptomatic AD. One explanation for the present paucity of effective AD therapies appears in recent developments indicating that AD is a heterogeneous disease caused by a variety of pathophysiologic mechanisms, mechanisms that often lie outside the current dogma regarding AD. For instance, up to one-third of patients with a clinical diagnosis of AD have no accumulation of amyloid-beta (Sekiya, et al., 2018), and many diagnosed with AD at post-mortem biopsy do not show cognitive impairment (Iacono et al., 2014).
Sporadic Late-Onset Alzheimer's Disease (LOAD), the most prevalent form of dementia among people over age 65, is a progressive and irreversible brain disorder. Over 5.5 million in the US are affected by LOAD, which is currently the sixth leading cause of death in the US and costs more than $200 billion annually. There is an urgent need to develop effective methods to prevent, treat, or delay the onset or progression of LOAD. Among those at risk of LOAD, certain patients may carry a unique set of numerous genetic changes with greater risk for developing the disease, including CLU, TREM2, and most importantly Apolipoprotein E (APOE) variants (Lambert, et al. 2013). However, the interaction between specific LOAD risk alleles, changes in disease pathogenesis, and their effects on patients remains elusive.
Furthermore, it is very challenging to predict the progression of AD, suggesting high heterogeneity in disease progression among AD patients. There is growing evidence that disease progression and responses to interventions differ significantly within LOAD. For instance, patients with LOAD often branch into distinct groups including (a) slow vs. rapid cognitive decliners (Risacher et al., 2017); (b) amnestic vs. non-amnestic AD (Bredesen et al., 2015); (c) executive vs. cortical visual defect vs. dysphasia predominant AD (Phillips et al. 2018); (d) psychosis and/or depression associated AD (Qian et al. 2018); and (e) metabolic-dysfunction associated AD modulated by abnormalities in insulin resistance, hormonal deficiencies, or homocysteinemia (Huang et al. 2012). Finally, the relationship between the various forms of AD and other non-AD dementias such as primary age-related tauopathy (PART) (Crary et al., 2014), vascular contributions to cognitive impairment and dementia (VCID) (Chornenkyy et al. 2019), and frontotemporal dementia (Bang et al. 2015) must be better understood. Therefore, identifying unique molecular subtypes of AD resistant to other comorbid conditions may provide new insights into AD patient subpopulations and pave a way for precision medicine for AD.
Molecular biomarkers may hold the promise for improving methodologies for AD subtype identification and classification (Blalock et al. 2004; Courtney et al. 2010). Some recent studies have highlighted the great advantages of using RNA-seq to profile the transcriptome of the brain with neurodegenerative diseases. For instance, a multi-Omic molecular analysis of LOAD across four brain regions uncovered subnetworks and novel molecular drivers of the disease, including the vacuolar ATP-dependent proton pump ATP6V1A, which has now been shown to modulate cognitive function in Drosophila models of AD (Wang et al. 2019). Additionally, molecular network analysis of LOAD brains has identified an excess of dysregulated genes that cannot be fully predicted by a single model of the disease. Nevertheless, only a limited number of published papers describe RNA-seq studies of the most relevant materials, namely, AD patients' brains across multiple regions (Twine et al. 2011).
Thus, there is a need for characterizing the specific subtype signatures of AD, for identifying individual targets for treatment and for identifying drugs useful in the treatment.
The present disclosure overcomes the deficiencies noted above by identifying five molecular subtypes of AD and subsequently characterizing them with molecular signatures, network regulator genes, and matched mouse models. The identified subtypes are concentrated in the hippocampal area but distributed across brain regions.
The molecular AD subtypes identified in the present disclosure are well conserved across different independent cohorts and have independent molecular signatures, network regulator genes, and matched mouse models of AD.
Accordingly, these molecular subtypes can be used to: predict clinical features such as cognitive function or dementia; provide diagnostic signatures for classifying AD subtypes; identify key regulator genes across the subtypes and key genes unique to a given subtype; and provide methods for identifying new candidate drugs for treating AD and for stratifying patient populations for suitable AD treatments.
The present disclosure further provides methods for predicting such AD subtypes in affected subjects using whole genome sequencing. The present disclosure provides the first genomic copy number variation (CNV) study of LOAD based on whole genome sequencing data.
The present disclosure further provides methods for predicting AD subtypes in affected subjects using blood gene expression data.
The present disclosure also identifies FDA approved, investigational and experimental drugs that are useful in treating different AD subtypes.
The present disclosure provides for the first time a global landscape and detailed map of signaling circuits of complex molecular interactions in 4 key brain regions affected by LOAD, information that is critical for identifying specific treatment targets and identifying LOAD therapeutics. The present disclosure further provides multiple neuronal modules particularly relevant to LOAD pathology and predicts key regulators of these modules.
The present disclosure provides multiple AD subtypes and methods for effectively treating patients by correlating treatment methods with AD subtype.
The present disclosure provides methods of treating LOAD by administering to a subject in need thereof a therapeutically effective amount of an FDA approved drug.
The present disclosure provides a method of treating LOAD by administering to a subject in need thereof a therapeutically effective amount of a drug that targets individual AD subtypes.
The present disclosure provides methods for predicting AD subtypes based on whole genome sequencing alone.
The present disclosure provides methods for predicting AD subtypes using blood monocyte transcriptomes.
The present disclosure provides a method for treating Alzheimer's Disease (AD) in a patient in need thereof, wherein the method includes administering a therapeutically effective amount of a compound selected from the group consisting of: thioproperazine; nalbuphine; gabexate; mesoridazine; dimercaptosuccinic-acid; menadione; carbamazepine; diphenidol; epirizole; timolol; mestranol; naphazoline; hesperidin; ethisterone; amlodipine; amsacrine; febuxostat; famciclovir; ezetimibe; carbetocin; orphenadrine; hyoscyamine; amiodarone.hcl; erythromycin-ethylsuccinate; meclizine; dobutamine; phenazopyridine; spironolactone; meclofenamic-acid; parachorophenol; bemegride; ketorolac; brinzolamide; nortriptyline; hexylcaine; omeprazole; norgestrel; olmesartan-medoxomil; perphenazine; promazine; metolazone; citalopram; clonazepam; lamotrigine; mosapride; and ephedrine.
The present disclosure provides a method for treating Alzheimer's Disease (AD) in a patient in need thereof, wherein the method includes administering a therapeutically effective amount of a compound selected from the group consisting of: thioproperazine; nalbuphine; gabexate; mesoridazine; dimercaptosuccinic-acid; menadione; carbamazepine; diphenidol; epirizole; timolol; mestranol; naphazoline; hesperidin; ethisterone; amlodipine; amsacrine; febuxostat; famciclovir; ezetimibe; carbetocin; orphenadrine; hyoscyamine; amiodarone.hcl; erythromycin-ethylsuccinate; meclizine; dobutamine; phenazopyridine; spironolactone; meclofenamic-acid; parachorophenol; bemegride; ketorolac; brinzolamide; nortriptyline; hexylcaine; omeprazole; norgestrel; olmesartan-medoxomil; perphenazine; promazine; metolazone; citalopram; clonazepam; lamotrigine; mosapride; and ephedrine, and wherein the drug has been selected for treating at least one Alzheimer's Disease subtype.
The present disclosure provides a method for treating Alzheimer's Disease (AD) in a patient in need thereof, wherein the method includes administering a therapeutically effective amount of a compound selected from the group consisting of: thioproperazine; nalbuphine; gabexate; mesoridazine; dimercaptosuccinic-acid; menadione; carbamazepine; diphenidol; epirizole; timolol; mestranol; naphazoline; hesperidin; ethisterone; amlodipine; amsacrine; febuxostat; famciclovir; ezetimibe; carbetocin; orphenadrine; hyoscyamine; amiodarone.hcl; erythromycin-ethylsuccinate; meclizine; dobutamine; phenazopyridine; spironolactone; meclofenamic-acid; parachorophenol; bemegride; ketorolac; brinzolamide; nortriptyline; hexylcaine; omeprazole; norgestrel; olmesartan-medoxomil; perphenazine; promazine; metolazone; citalopram; clonazepam; lamotrigine; mosapride; and ephedrine, and wherein the drug has been selected for treating at least one Alzheimer's Disease subtype selected from the group consisting of: AD subtype A, AD subtype B1, AD subtype B2, AD subtype C1, and AD subtype C2.
The present disclosure provides a method for treating Alzheimer's Disease (AD) in a patient in need thereof, wherein the method includes administering a therapeutically effective amount of a compound selected from the group consisting of: GW-3965; ciglitazone; L-689560; fludroxycortide; sirtinol; Y-26763; mometasone-furoate; erythrosine; MDL-72832; NU-1025; cyclopentolate; ZM-306416; CP-93129; CGP-13501; CI-966; FK-888; PPT; cyclosporin-a; clofibric acid; neostigmine; FIT; ciproxifan; quinpirol-(−); clopidogrel; DMP-543; salmeterol; tremorine; piperidolate; pinacidil; erythrosine; mometasone-furoate; BML-284; GW-0742; adapalene; imatinib; CP-93129; proxyfan; tranylcypromine; ilomastat; FK-888; phenazopyridine; PNU-22394; clofibric acid; fenoterol; IBC-293; cyclopentolate; SQ-22536; UBP-296; atorvastatin; emetine; FH-535; altanserin; gamma-linolenic-acid; and alpha-linolenic-acid, and wherein the drug has been selected for treating at least one Alzheimer's Disease subtype selected from the group consisting of: AD subtype A, AD subtype B1, AD subtype B2, AD subtype C1, and AD subtype C2.
The present disclosure provides a computer-implemented method to predict an AD subtype of a subject, the method comprising: obtaining data of nucleotide characteristics for a sample collected from a particular human subject, wherein the sample is collected from the group consisting of: brain tissue; epithelial tissue; cerebrospinal fluid; and blood; providing the data as input to a trained machine learning model, wherein the model is selected from the group consisting of: Random Forest, hierarchical clustering, k-means clustering, WSCNA, MEGENA, Bayesian causal network, CNVnator, Pindel, MetaSV, Delly2, Quasipoisson regression, AdaBoost, logistic regression, decision tree, nearest neighbors (KNN), support vector machines (SVM), naïve Bayes, multi-layer perceptron, and Ensemble and wherein the model determines the AD subtype based on the data, wherein the AD subtype is any one of: AD subtype A, AD subtype B1, AD subtype B2, AD subtype C1, or AD subtype C2; and obtaining, from the model, the AD subtype.
The present disclosure provides a computer-implemented method to predict an AD subtype of a subject, the method comprising: obtaining data of nucleotide expression for a sample collected from a particular human subject, wherein the sample is collected from blood; providing the data as input to a trained machine learning model, wherein the model is selected from the group consisting of: Weighted Sample Gene Network Analysis (WSCNA) and Multi-scale Gene Expression Network Analysis (MEGENA); and wherein the model determines the AD subtype based on the data, wherein the AD subtype is any one of: AD subtype A, AD subtype B1, AD subtype B2, AD subtype C1, or AD subtype C2; and obtaining, from the model, the AD subtype.
The present disclosure provides a computer-implemented method to predict an AD subtype of a subject, the method comprising: obtaining data of nucleotide expression for a sample collected from a particular human subject, wherein the sample is collected from cerebrospinal fluid; providing the data as input to a trained machine learning model, wherein the model is selected from the group consisting of: Weighted Sample Gene Network Analysis (WSCNA) and Multi-scale Gene Expression Network Analysis (MEGENA); and wherein the model determines the AD subtype based on the data, wherein the AD subtype is any one of: AD subtype A, AD subtype B1, AD subtype B2, AD subtype C1, or AD subtype C2; and obtaining, from the model, the AD subtype.
The present disclosure provides a computer-implemented method to predict an AD subtype of a subject, the method comprising: obtaining data of gene expression levels for a sample collected from a particular human subject, wherein the sample is collected from blood; providing the data as input to a trained machine learning model, wherein the model is selected from the group consisting of: Random Forest, AdaBoost, logistic regression, decision tree, nearest neighbors (KNN), support vector machines (SVM), naïve Bayes, multi-layer perceptron, and an Ensemble with equal weights for each classifier and wherein the model determines the AD subtype based on the data; wherein the AD subtype is any one of: AD subtype A, AD subtype B1, AD subtype B2, AD subtype C1, or AD subtype C2; and obtaining, from the model, the predicted AD subtype.
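By way of non-limiting illustration, an Ensemble with equal weights for each classifier can be sketched as an unweighted average of per-classifier subtype probabilities. The averaging rule (soft voting), the function name `equal_weight_ensemble`, and the probability values below are assumptions made for this sketch and are not taken from the disclosure.

```python
from typing import Dict, List

SUBTYPES = ["A", "B1", "B2", "C1", "C2"]


def equal_weight_ensemble(prob_per_classifier: List[Dict[str, float]]) -> str:
    """Average subtype probabilities across classifiers with equal weights
    and return the subtype with the highest averaged probability."""
    if not prob_per_classifier:
        raise ValueError("need at least one classifier output")
    n = len(prob_per_classifier)
    averaged = {
        s: sum(p.get(s, 0.0) for p in prob_per_classifier) / n for s in SUBTYPES
    }
    return max(SUBTYPES, key=lambda s: averaged[s])


# Hypothetical probability outputs from three already-trained classifiers
# for one blood sample (illustrative numbers only).
outputs = [
    {"A": 0.10, "B1": 0.20, "B2": 0.10, "C1": 0.50, "C2": 0.10},
    {"A": 0.05, "B1": 0.15, "B2": 0.10, "C1": 0.30, "C2": 0.40},
    {"A": 0.10, "B1": 0.10, "B2": 0.10, "C1": 0.45, "C2": 0.25},
]
predicted = equal_weight_ensemble(outputs)
```

In this sketch each classifier contributes equally regardless of its individual accuracy; a majority vote over predicted labels would be another reading of "equal weights".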
The present disclosure provides a computer-implemented method for identifying candidate compounds for use in treating an AD subtype, the method comprising: obtaining data of drug induced signatures for candidate compounds and AD subtype signatures, wherein the AD subtypes are selected from the group consisting of: AD subtype A, AD subtype B1, AD subtype B2, AD subtype C1, or AD subtype C2; providing the data as input to a trained machine learning model, wherein the model is EDMURA; and obtaining from the model, the drug associated with an AD subtype.
Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this disclosure belongs. The meaning and scope of the terms should be clear; however, in the event of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition.
As used herein, the following terms have the meanings ascribed to them below, unless specified otherwise. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified unless clearly indicated to the contrary. Thus, as a non-limiting example, a reference to “A and/or B,” when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A without B (optionally including elements other than B); in another embodiment, to B without A (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
As used herein, the term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.
As used herein, the term “effective amount” or “therapeutically effective amount” refers to a quantity of a drug sufficient to achieve a desired effect or a desired therapeutic effect. In the context of therapeutic applications, the amount of the drug administered to the subject can depend on the type and severity of the disease or symptom and on the characteristics of the individual, such as general health, age, sex, body weight and tolerance to drugs. The skilled artisan will be able to determine appropriate dosages depending on these and other factors.
As used herein, the terms “treat,” “treatment,” or “treating” include any treatment of a condition or disease in a subject, or particularly a human, and may include: (i) preventing the disease or condition from occurring in the subject which may be predisposed to the disease but has not yet been diagnosed as having it; (ii) inhibiting the disease or condition, i.e., arresting or slowing down its progression; (iii) relieving the disease or condition, i.e., causing regression of the condition; or (iv) ameliorating or relieving the conditions caused by the disease, i.e., symptoms of the disease. “Treat,” “treatment,” or “treating,” as used herein, could be used in combination with other standard therapies or alone.
As used herein, the term “late-onset Alzheimer's Disease” or “LOAD” includes patients who were diagnosed with AD at age 65 or older.
As used herein, the term “Alzheimer's Disease subtype” or “AD subtype” refers to a category of Alzheimer's Disease that is determined by a distinct molecular signature comprised of up- and down-regulated genes which can be further characterized by the presence or absence of particular pathways (markers), as provided in the present disclosure, and falling into one of the following categories: AD subtype A; AD subtype B1; AD subtype B2; AD subtype C1; and AD subtype C2.
As used herein, the term “AD subtype A” refers to the Alzheimer's Disease subtype provided in this disclosure as having specific traits, including, for example: opposite to the differential expression directions of the Blalock signatures; up regulation of the GNF2_MAPT pathway; upregulation of glutaminergic, GABAergic, and dendritic synaptic pathways; upregulation of protein degradation related genes, including ubiquitination and polyubiquitination, protein catabolism, the proteasome, and proteins targeting for destruction; upregulated neuronal regulators (GABRB2, SYT1, NSF, SLC4A10, SLC9A6 and SCN2A); downregulated KDGs in astrocytes, endothelial cells, and microglia (LRP10, NOTCH1, ITGB5, MYO1C, and TLN1).
As used herein, the term “AD subtype B1” refers to the Alzheimer's Disease subtype provided in this disclosure as having specific traits, including, for example: up regulation of the GNF2_MAPT pathway; upregulation of glutaminergic, GABAergic, glycinergic, and dendritic synaptic pathways; upregulation of organic acid related genes, including acid secretion and acidic amino acid transport; downregulation of genes in oligodendrocytes (PLP1, UGT8, CLDND1, ERMN, and ENPP2); upregulation of genes in neurons (CACNA1B, BSN, FBXO41, CHD5, DGKZ, SYT7, CELSR3 and RAPGEFL1); and upregulation of genes in astrocytes (IQSEC2).
As used herein, the term “AD subtype B2” refers to the Alzheimer's Disease subtype provided in this disclosure as having specific traits, including, for example: up regulation of the GNF2_MAPT pathway; upregulation of glutaminergic, GABAergic, glycinergic, and dendritic synaptic pathways; upregulation of innate and adaptive immune response, immune system activation, inflammation, circulatory system development, and endothelial cell migration; upregulation of organic acid related genes, including acid secretion and acidic amino acid transport; increased APOE e2 allele dosage; downregulation of genes in oligodendrocytes (PLP1, UGT8, CLDND1, ERMN, and ENPP2); downregulation of PICALM and of PSMC6; upregulation of FBXO41, WIZ, PRRC2A, ZMIZ2, CIC, and TCEA1.
As used herein, the term “AD subtype C1” refers to the Alzheimer's Disease subtype provided in this disclosure as having specific traits, including, for example: consistent with the Blalock signatures; upregulation of amyloid-beta binding, clearance, fiber formation pathways and of scavenger receptor activity; down regulation of the GNF2_MAPT pathway; downregulation of glutaminergic, GABAergic, glycinergic, and dendritic synaptic pathways; upregulation of innate and adaptive immune response, immune system activation, inflammation, circulatory system development, and endothelial cell migration; increased APOE e4 allele dosage; downregulation of GABRB2, SYT1, and PREPL; and KDGs are upregulated in microglia (TLN1, MSN, and IL6R), endothelial cells (TAGLN2), and astrocytes (LRP10, GNA12, and LTBP3) and downregulated in neurons (ATP6V1A, SCN2A, GABRB2, and NAPB); downregulation of AMPH, MEF2C, and EPDR1.
As used herein, the term “AD subtype C2” refers to the Alzheimer's Disease subtype provided in this disclosure as having specific traits, including, for example: consistent with the Blalock signatures; down regulation of the GNF2_MAPT pathway; downregulation of glutaminergic, GABAergic, glycinergic, and dendritic synaptic pathways; upregulation of innate and adaptive immune response, immune system activation, inflammation, circulatory system development, and endothelial cell migration; downregulation of GABRB2, SYT1, GABRB2, SCN2A, NSF, GABBR2, and PREPL; upregulation of STAT3, SLC39A1, LRP10, GNA12, TAGLN2, IL6R, and MAPKAPK2.
The following examples are put forth to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the compositions, and assay, screening, and therapeutic methods of the invention, and are not intended to limit the scope of what the inventors regard as their invention.
Clinical and transcriptomic signatures from the MSBB-AD study of 364 human brains were obtained, including whole transcriptome RNA-seq data from four brain regions (FP, STG, PHG, IFG) from subjects with AD who showed neurocognitive decline as measured by a clinical dementia rating (CDR) score>1 and from non-demented controls (CDR=0). Table 1 summarizes the clinical and pathologic phenotypes for the samples in the MSBB-AD cohort with the transcriptomic data from the parahippocampal gyrus. The inventors identified numerous confounding factors in the RNA-seq data, including age of death, race, and post-mortem interval (PMI). To minimize the influence of clinical and technical covariates, the transcriptomic data was corrected for age of death, race, gender, post-mortem interval (PMI), batch number, and RNA integrity number (RIN) using a mixed effects model.
To understand which brain regions and molecular processes are most vulnerable to dysregulation in AD, the inventors performed differential gene expression analysis between AD and control, generating differentially expressed genes (DEGs) for each of the four brain regions in the MSBB-AD. The inventors discovered that the PHG brain region has the largest number of DEGs (3571 genes, adjusted FDR=0.05) compared to the FP (3 genes), STG (1 gene), and IFG (181 genes). These findings are consistent with previous DEG analyses of the MSBB-AD transcriptomic data and suggest that the PHG is most vulnerable in AD as manifested by marked transcriptomic dysregulation. Moreover, these findings are consistent with the inventors' previous pan-cortical atlas of AD (independent of the data described herein) in which the PHG brain region showed the strongest transcriptomic changes (Wang et al. 2016). Indeed, this prior work has shown that the hippocampus is strongly associated with Aβ and tau accumulation and early memory loss in AD. The inventors have also shown in previous studies of the MSBB-AD cohort that transcriptomic changes in the PHG region highly overlap the KEGG Alzheimer's and Parkinson's disease gene sets and are correlated with high Aβ plaque density (Wang et al. 2019), demonstrating that these changes are consistent with AD disease progression. Therefore, the present disclosure shows that the PHG region carries the strongest molecular signature of AD.
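By way of non-limiting illustration, the selection of DEGs at an adjusted FDR threshold of 0.05 can be sketched as follows. The disclosure does not state which FDR procedure was used; the Benjamini-Hochberg adjustment shown here is an assumption, and the p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from an AD versus control comparison.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
n_deg = sum(1 for v in q if v <= 0.05)  # genes called DEGs at FDR 0.05
```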
Patients with more severe AD-associated dementia, such as those in a later stage of the disease, are reported to have more neuronal loss at post-mortem biopsy. Therefore, it is important to control for AD stage before transcriptomic analysis is performed between subjects with AD. Previous work has shown that brain cell type proportions (SPVs), including the proportion of neurons in a sample, can be inferred from bulk RNAseq data when combined with measurements of brain cell-type specific gene expression patterns (McKenzie et al. 2018), and may serve as a marker of neuronal loss in AD.
To determine the extent of hippocampal neuronal loss in the MSBB-AD cohort, cell type proportion analysis of bulk tissue transcriptomic data from the PHG region was performed. The inventors discovered a strong relationship between PHG neuronal loss and clinical dementia rating (CDR) score in the MSBB-AD cohort, with astrogliosis and increased abundance of other cell types associated with disease progression. Currently, there is no universally accepted method to correct for the neuronal loss seen in AD, as this reduction in neurons is both the cause and effect of molecular changes leading to cognitive impairment. The inventors examined both normalization by neuronal cell type proportion SPVs and AD dementia (as measured by CDR score, range 0-6) to reduce the molecular signature of neuronal loss in AD, using a linear effect model. The inventors found that there is no significant correlation between cell type and AD staging after either normalization by neuronal cell type proportion or by dementia severity. Furthermore, the inventors did not see a further reduction in this correlation with additional neuronal cell type normalization after normalization by dementia severity. Thus, the inventors found that both normalization by CDR and cell-type proportion SPVs are effective in removing the confounding effects of neuronal loss along disease progression in the MSBB-AD cohort.
The inventors further normalized the PHG transcriptomic data by CDR score to remove the confounding effects of neuronal loss in later AD stages. Therefore, any identified differences between groups of AD subjects would be distinct from previously identified clinical subtypes of AD which rely on these metrics.
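By way of non-limiting illustration, normalization of one gene's expression by CDR score can be sketched as removing the fitted linear CDR trend and keeping the residual variation. The single-covariate ordinary least-squares fit, the function name `regress_out`, and the example values are assumptions made for this sketch; the exact model specification used by the inventors is not reproduced here.

```python
from typing import List


def regress_out(values: List[float], covariate: List[float]) -> List[float]:
    """Remove the linear effect of `covariate` from `values`.

    Fits values ~ intercept + slope * covariate by least squares and returns
    the residuals re-centered on the original mean of `values`.
    """
    n = len(values)
    if n != len(covariate) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(covariate) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    if sxx == 0:
        return list(values)  # covariate is constant, nothing to remove
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, values))
    slope = sxy / sxx
    return [y - slope * (x - mean_x) for x, y in zip(covariate, values)]


# Hypothetical expression of one neuronal gene that falls with CDR score.
cdr = [0.0, 1.0, 2.0, 3.0, 4.0]
expr = [10.0, 9.0, 8.0, 7.0, 6.0]
adjusted = regress_out(expr, cdr)
```

After adjustment, the hypothetical gene no longer tracks CDR score, so differences that remain between subjects are not attributable to a linear dementia-severity effect.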
Identification of AD Putative Subtypes from Molecular Data of the PHG
To robustly identify subgroups of AD subjects, the inventors evaluated the performance of several sample clustering methods for AD subjects using the normalized gene expression data within each brain region in the MSBB-AD cohort. The inventors used two classical clustering algorithms (hierarchical, k-means) as well as two novel network-based clustering algorithms (Weighted Sample Gene Network Analysis (WSCNA), Multi-scale Gene Expression Network Analysis (MEGENA)) to group similar samples together into putative AD subtypes. WSCNA shows the best performance in terms of clustering quality and thus is adopted to identify AD subtypes for the subsequent analyses.
The inventors successfully identified clusters of related samples using all four methods for each of the four brain regions. To determine the likelihood that these sample clusters may represent molecular subtypes of AD and that the sample grouping is consistent, 50 rounds of bootstrapped reclustering were performed using each clustering algorithm while withholding 20% of the samples and genes per round. An empirical calculation was then performed to determine the likelihood that samples are consistently grouped together from the observed clusters compared with a distribution of 100,000 possible random groupings. A specific subtype grouping is considered a putative subtype if its empirically adjusted p-value is less than 0.05. Employing this method, the inventors detected the presence of putative AD subtypes using all four clustering methods (emp. p-value: <0.05) based on molecular data from the PHG region alone. Among the four algorithms evaluated, the inventors found that their new network-based clustering approach, WSCNA, shows the highest likelihood of stable subtypes compared to random clustering of 7.08:1 (emp. p-value: <1×10⁻⁵) in the PHG region. The inventors found that the other clustering methods also identified subtypes in the PHG, but with a smaller likelihood ratio. Furthermore, among all four brain regions in the MSBB-AD, the inventors discovered that the PHG region shows the most robust AD subtype signal.
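By way of non-limiting illustration, the empirical comparison of an observed grouping against random groupings can be sketched as a permutation test. The co-grouping statistic, the label-shuffling null distribution, the add-one p-value estimate, and all names below are assumptions made for this sketch; the sketch defaults to 1,000 random groupings for speed, whereas the analysis described above used 100,000.

```python
import random
from itertools import combinations
from typing import Sequence


def co_grouping_rate(reference: Sequence[int], other: Sequence[int]) -> float:
    """Fraction of sample pairs sharing a cluster in `reference` that also
    share a cluster in `other`."""
    together = [
        (i, j)
        for i, j in combinations(range(len(reference)), 2)
        if reference[i] == reference[j]
    ]
    if not together:
        return 0.0
    kept = sum(1 for i, j in together if other[i] == other[j])
    return kept / len(together)


def empirical_p_value(
    reference: Sequence[int],
    reclustered: Sequence[int],
    n_random: int = 1000,
    seed: int = 0,
) -> float:
    """Share of random groupings (shuffled labels) whose co-grouping rate is
    at least as high as the observed one, with an add-one correction."""
    rng = random.Random(seed)
    observed = co_grouping_rate(reference, reclustered)
    shuffled = list(reclustered)
    hits = 0
    for _ in range(n_random):
        rng.shuffle(shuffled)
        if co_grouping_rate(reference, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_random + 1)


# Hypothetical labels: 18 samples in three clusters, with one sample
# changing cluster when the data are re-clustered.
reference = [0] * 6 + [1] * 6 + [2] * 6
reclustered = [0] * 5 + [1] * 7 + [2] * 6
p_emp = empirical_p_value(reference, reclustered)
```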
As shown in
Cluster stability is defined here as the rate at which sample pairs group together into the same subtypes upon repeated re-clustering on a random subset of the input data. Subtypes from WSCNA clustering are generally stable, with sample pairs grouping together, on average, between 60% and 91% of the time across all five detected AD subtypes. The class C subtypes have the strongest stability, followed by class A and class B. All subtypes demonstrate a cluster stability well above that of random clusters, which was empirically determined to lie in a range between 20% and 30%. Therefore, the subtypes found by the inventors show specific robust molecular signals suitable for classification into stable subtypes.
To characterize the molecular signatures of these AD subtypes, the inventors identified DEGs for each of the five subtypes compared with non-demented controls (CDR=0) from the RNA-seq data in the PHG region. The inventors found that each of the AD subtypes they discovered has a specific transcriptomic signature of up- and down-regulated genes that distinguishes it from the others at a molecular level, revealing a plurality of different mechanisms of AD. As shown by
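By way of non-limiting illustration, the cluster stability defined above can be computed per subtype as the rate at which pairs of samples from that subtype are placed in the same cluster across re-clustering rounds. The function name `subtype_stability`, the use of `None` to mark a sample withheld in a given round, and the toy labels are assumptions made for this sketch.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence


def subtype_stability(
    reference: Sequence[str],
    rounds: List[Sequence[Optional[str]]],
) -> Dict[str, float]:
    """For each reference subtype, return the fraction of (pair, round)
    observations in which two of its samples were clustered together.

    A label of None means the sample was withheld in that round, so any
    pair involving it is skipped for that round.
    """
    together: Dict[str, int] = {}
    observed: Dict[str, int] = {}
    for i, j in combinations(range(len(reference)), 2):
        if reference[i] != reference[j]:
            continue
        subtype = reference[i]
        for labels in rounds:
            if labels[i] is None or labels[j] is None:
                continue
            observed[subtype] = observed.get(subtype, 0) + 1
            if labels[i] == labels[j]:
                together[subtype] = together.get(subtype, 0) + 1
    return {s: together.get(s, 0) / n for s, n in observed.items()}


# Hypothetical reference subtypes for six samples and two re-clustering
# rounds; cluster names within a round are arbitrary.
reference = ["A", "A", "A", "C1", "C1", "C1"]
rounds = [
    ["x", "x", "y", "z", "z", "z"],
    ["p", "p", "p", "q", None, "q"],
]
stability = subtype_stability(reference, rounds)
```

Because only co-membership within a round is compared, the cluster names produced by each re-clustering round do not need to match the reference subtype names.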
Using mean gene expression levels grouped by gene ontology (GO) pathway as surrogate markers for the activity level of various molecular processes in the brain, the inventors identified several differences in key AD-related pathways between the subtypes, providing key insights into disease pathogenesis. As shown in
Surprisingly, the inventors observed only weak molecular enrichment of amyloid-beta and tau related pathways across all AD subjects as a group, but they saw strong enrichment of these pathways in the subtypes. For instance, the inventors discovered strong upregulation of amyloid-beta binding, clearance, and fiber formation pathways in the subtype C1, as well as scavenger receptor activity in the subtypes C1 and C2, while these same pathways are down-regulated in the subtype B1 and mildly down-regulated in the subtype A. On the other hand, they found that tau-neighborhood genes (“GNF2_MAPT” pathway) are strongly up-regulated in the subtypes A, B1, and B2 but down-regulated in C1 and C2. Tau protein binding and tau-related P35 pathway genes are up-regulated in the subtype A. Therefore, it is likely that AD subtypes may be characterized as either amyloid-beta activity predominant (class C) or MAPT activity predominant (classes A and B), though these two categories cannot fully explain all differences seen between the five subtypes.
The subtypes identified by the inventors also differ strongly in neuronal activity despite normalization for AD staging. The inventors have discovered broad downregulation of glutamatergic, GABAergic, glycinergic, and dendritic synaptic pathways in class C subtypes, with no changes in cholinergic and dopaminergic synaptic pathways, suggesting that these latter synapse types are selectively resilient to AD subtype molecular changes. On the other hand, the inventors found strong upregulation of these same synapse pathways in the classes A and B, except for the glycinergic synapse pathway in the class A. This pattern is consistent with differences in synaptic excitation pathways between subtypes: excitatory synapses are up-regulated in the classes A and B but down-regulated in the class C. These data suggest that AD subtypes may be split into those selectively vulnerable to synaptic depression (class C) versus synaptic excitation (classes A and B).
Dysregulated immune system activities, including reactive gliosis and the breakdown of the blood-brain barrier, have been repeatedly observed in AD brains (Sweeney et al. 2018). The present disclosure shows that in the subtypes B2, C2, and especially C1, immune-related pathways including the innate and adaptive immune response, immune system activation, inflammation, circulatory system development, and endothelial cell migration are up-regulated in comparison with the normal control. Such upregulation coincides with increased expression of blood-brain barrier, basement membrane, and cell matrix adhesion genes. However, these immune response pathways are down-regulated in the subtypes A and B1. These data and the findings relative to synaptic pathways suggest that disease progression across AD subtypes is characterized by either increased immune or increased synapse pathway activity.
Finally, the inventors found that certain molecular pathways are subtype-specific and thus provide greater insights into disease pathogenesis when considered alongside other enriched AD pathways. For example, the present disclosure shows that many protein degradation related genes, including those involved in ubiquitination and polyubiquitination, protein catabolism, the proteasome, and protein targeting for destruction, are up-regulated in the subtype A, while organic acid related genes, including those involved in acid secretion and acidic amino acid transport, are specifically up-regulated in the class B.
Association of Clinical and Pathological Phenotypes and APOE Variants with Putative AD Subtypes
The present disclosure provides a better understanding of the clinical characteristics of these molecularly defined AD subtypes, by comparing the relationship between characterized AD pathologic markers in the MSBB-AD study and each subtype. The inventors found that under the Kruskal-Wallis (KW) one-way analysis of variance test, AD subtypes are marginally associated with several clinical AD markers, including tau NFT levels in the medial frontal cortex (KW p=0.041), Aβ mean plaque levels (KW p=0.020), and APOE e4 (KW p=0.048) and APOE e2 (KW p=0.012) allele counts (
The present disclosure provides a better understanding of the differences in APOE allele dosages between AD subtypes. The inventors found that certain subtypes are preferentially enriched or depleted for the e4 and e2 alleles compared with others. For instance, the subtype C1 has a significantly increased APOE e4 allele dosage (median: 0.61 alleles/pt) compared with the subtypes A (p=0.035 under Welch's t-test), B1 (p=0.015), and B2 (p=0.017). This is consistent with the known influence of the e4 allele on AD pathogenesis, including the formation of amyloid-beta plaques and NFTs, a trait most similar to the molecular signature of the amyloid-predominant subtypes. On the other hand, subtype C2, which shares many molecular features with C1, does not show this association with APOE e4 in the present disclosure. Furthermore, subtype B2 has an increased APOE e2 allele dosage (median: 0.23 alleles/pt) compared with subtypes A (p=0.031) and C1 (p=0.0091); however, as with APOE e4 among the class C subtypes, the APOE e2 dosage in B2 is also much higher than in the related subtype B1 (p=0.049). Therefore, while APOE may modulate AD pathogenesis and contribute to some molecular signatures in a portion of subtypes, the present disclosure shows that APOE dosage cannot explain all the molecular similarities and differences between both related and distinct AD subtypes.
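As an illustration of the kind of pairwise comparison reported above, the following sketch computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for two groups of per-patient allele counts. The inputs are assumed to be lists of 0, 1, or 2 allele counts per patient; the function name is illustrative, and converting the statistic to a p-value (for example with a statistics library) is left out.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    x, y : per-patient values for two groups (e.g., APOE allele counts);
           the two groups may differ in size and in variance.
    """
    nx, ny = len(x), len(y)
    vx = variance(x) / nx  # squared standard error of the mean of x
    vy = variance(y) / ny  # squared standard error of the mean of y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df
```

Unlike Student's t-test, this form does not pool the two variances, which is why it suits subtypes with unequal sample sizes.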
The present disclosure shows that a subset of post-mortem Alzheimer's brains with PHG transcriptomic data available (n=55 out of 151) have additional quantification of amyloid-beta plaque and tau NFT amounts across multiple brain regions. The inventors found that tau NFT counts are significantly associated with the AD subtypes in the inferior parietal lobule (KW test p=0.017) and medial frontal gyrus (KW p=0.034). In these regions, both the class B and C subtypes have increased tau NFT burden. In contrast, the inventors found that amyloid-beta plaque rating is significantly elevated in the inferior parietal lobule (KW p=0.031), medial frontal gyrus (KW p=0.041), and lateral frontal gyrus (KW p=0.012) in only the class C (amyloid-predominant) subtypes. These discoveries are consistent with the signatures from the GO pathway analysis, indicating that class C subtypes are amyloid-beta predominant, while class B subtypes are tau NFT predominant. While class A shows increased MAPT pathway activity, it is resilient to the development of tau NFTs, perhaps via increased protein degradation pathway activity. Therefore, these disclosures indicate that class A subtypes are tau NFT resilient. As the inventors expected from the analysis on all samples, the present disclosure shows that neither CDR (KW p=0.155) nor Braak score (KW p=0.075) is associated with the AD subtypes. Therefore, the present disclosure shows that AD staging is not associated with the changes in amyloid-beta plaque and tau NFT levels in the subtypes.
The diverse molecular changes that the inventors have identified in the AD subtypes suggest distinct intrinsic molecular mechanisms underlying each subtype. To identify the key regulators of the molecular changes in each AD subtype, the inventors employed a network biology approach that integrates multiscale embedded gene co-expression network analysis (MEGENA) and Bayesian causal network (BN) inference. Towards this end, the inventors constructed a co-expression network based on all the AD samples in the PHG which includes 22,291 genes and 61,152 edges, and a Bayesian causal network comprised of 21,577 genes and 23,554 edges. The inventors performed key driver analysis of each resulting network and the subtype DEG signatures and identified a ranked list of 955 upregulated and 639 downregulated key network regulator genes (KNRs) in the MEGENA network and a ranked list of 1,226 upregulated and 846 downregulated KNRs in the BN network. Finally, the intersection of the BN and MEGENA network KNRs yields a subset of 233 up- and 164 down-regulated KNRs across the five subtypes (Table 2).
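The final intersection step can be sketched as follows. The sketch assumes that the intersection is taken separately for up- and down-regulated regulators, which is how the reported counts are broken down, and that the ranking of the first list is retained; both are assumptions, and the function name is illustrative.

```python
def shared_regulators(megena, bn):
    """Keep regulators called in both networks with the same direction.

    megena, bn : dicts mapping "up" / "down" to a ranked list of gene symbols
                 (key network regulators from each network).
    """
    shared = {}
    for direction in ("up", "down"):
        in_bn = set(bn[direction])
        # Preserve the MEGENA ranking for genes also found in the Bayesian network.
        shared[direction] = [g for g in megena[direction] if g in in_bn]
    return shared
```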
Finally, the present disclosure illustrates that these subtype key network regulator genes in the PHG are also key regulators in other brain regions in the MSBB-AD cohort. The same network analysis procedure is applied to the other regions, including the FP, IFG, and STG. As shown in
The present disclosure provides a better understanding of brain cell-type specificity of transcriptomic changes in each AD subtype. To achieve this, the inventors performed cell-type proportion analysis on each sample using the brain cell type marker signatures determined by cell-type specific sequencing previously conducted by the inventors (McKenzie et al. 2018).
Next, using RNA-seq data derived from cultured brain cells (including neurons, astrocytes, microglia, endothelial cells, and oligodendrocytes), the inventors examined the cell-type specificity of the key regulator genes of each AD subtype. As shown in
Enrichment of Known AD Genetic Risk Markers with Specific Subtypes
To identify the influence of genetic determinants on AD subtype, the inventors investigated the differences in polygenic risk score (PRS) between the predicted AD subtypes. First, each sample's PRS was computed against the Kunkle et al. GWAS meta-analysis of AD using the PRSice R package. As non-European samples are excluded from the meta-analysis, the present disclosure excludes non-European individuals in the MSBB-AD during the PRS calculation. The inventors discovered that two of the three subtype classes show a significant increase in PRS burden compared with non-demented controls: class A (median PRS=0.235, p=0.048 under Welch's t-test) and class C (PRS=0.44, p=0.013). While the inventors found that class B also shows an increase in PRS, it is not significantly different from non-demented controls (PRS=0.33, p=0.117). Additionally, the inventors found an increased PRS burden across all AD samples (median PRS: 0.35) compared with non-demented controls (PRS: −0.43, p=0.016). Despite these significant differences between individual AD subtype classes and non-demented controls, there is no significant difference in PRS between the AD subtype classes. Therefore, the present disclosure demonstrates that genetic factors likely predispose individuals in the MSBB cohort to developing AD across each subtype, but such factors fail to adequately discriminate the molecular subtypes.
To better interrogate the intersection between known AD-associated genetic loci and the AD subtypes, the inventors also intersected the AD risk genes compiled by the IGAP Consortium with the predicted subtype-specific key regulators. The inventors found that forty-nine key regulators of the MEGENA network across all five MSBB subtypes have genetic loci associated with AD (IGAP gene-level significance p≤0.01), including AMPH, PICALM, MEF2C, EPDR1, and PSMC6 (Table 3). They also discovered that AMPH, MEF2C, and EPDR1 are down-regulated in the class C AD subtypes and up-regulated in class A; PICALM is down-regulated in subtype B2; and PSMC6 is down-regulated in both the class B and C subtypes.
AMPH, otherwise known as amphiphysin 1, is a vesicle cell surface protein important in clathrin-mediated endocytosis and is primarily expressed in neurons. It is an important homolog of BIN1, which is highly expressed in oligodendrocytes as well as immune cell types and has been shown to be correlated with tau levels as well as GFAP and MBP expression. In neurons, increased AMPH expression is associated with increased tau pathology, while increased BIN1 expression is associated with decreased tau pathology. On the other hand, human autoantibodies to AMPH, which are observed in rare diseases (e.g., stiff person syndrome), have been shown to induce defective presynaptic vesicle dynamics and composition, leading to decreased GABAergic transmission. Many GABA pathway genes are also predicted as key regulators in multiple subtypes (GABRB2, GABRA1, GABRA4). Therefore, either up- or downregulation of AMPH may lead to synaptic defects, either through direct effects on the synaptic vesicles or through the secondary effects of tau accumulation. These data match the findings of the present disclosure: decreased AMPH, as seen in the class C subtypes, likely leads to decreased neuronal activity, while up-regulated AMPH likely increases tau pathway activity. Therefore, the effects of GABAergic signaling on cognitive dysfunction are likely critical to understanding and treating AD subtypes.
Sub-Classification of MSBB-AD Samples with Mild Cognitive Impairment (CDR=0.5) Recapitulates Three Major AD Subtype Classes
The present disclosure shows that patients with mild cognitive impairment (MCI), without clinically defined AD, exhibit a subtype-specific signature consistent with the AD subtypes identified here. Based on the clinical data available, the inventors defined MCI for the MSBB-AD cohort as having a CDR score of 0.5, which corresponds to possible or mild dementia (n=32). To determine whether MCI samples group together with AD subtypes or cluster separately based on their molecular signatures, the inventors repeated the WSCNA on the combined AD and MCI samples in the MSBB-AD cohort. The inventors discovered that MCI samples are distributed in the branches corresponding to all five AD subtypes, including both amyloid-beta and tau-predominant AD subtypes. A count of the number of MCI samples distributed to each branch, labeled by the corresponding AD subtype, is shown in Table 4. The inventors found that the distribution of the MCI samples across the subtypes is different from that of the AD samples (one-way ANOVA p-value: 4.7×10^−3, F-statistic=14.947, df=1). The MCI samples are distributed more often to tau-predominant AD subtypes than to amyloid-beta predominant AD subtypes. This could be due to a variety of reasons, including potential resilience to certain AD subtypes among the MCI group. Therefore, the present disclosure provides evidence that MCI samples may also be sub-classified into different subtypes, providing additional insights into molecular features of the disease.
WSCNA-based subtyping analysis on the gene expression data from the dorsolateral prefrontal cortex (DLPFC) in the ROSMAP cohort (n=610, with 388 AD cases) confirms the AD subtypes identified in the MSBB-AD cohort. The ROSMAP cohort is an independent study of a different brain region with more mild cognitive impairment (MCI; n=287) and severe AD patients. The post-mortem brains in the ROSMAP cohort were collected from individuals residing around multiple geographically distant sites across the United States, and patients enrolled in the study were evaluated multiple times over many years before death for cognitive impairment suggestive of AD. Cognitive impairment was measured by the Mini-Mental State Examination (MMSE) at multiple timepoints, and pathologic factors such as amyloid-beta and tau burden were measured post-mortem. Additionally, the study includes three predominantly African American and Hispanic communities in multiple locations. Therefore, re-identification of similar molecular subtypes of AD in the ROSMAP cohort should allow for greater generalization of the findings than the MSBB-AD cohort alone.
As with the normalization process performed on the MSBB RNA-seq data, the inventors corrected the ROSMAP gene expression data for batch effect, post-mortem interval, gender, RNA integrity number, and outliers, as well as dementia severity using MMSE scores, to remove any potential effect of increasing neuronal loss with AD staging on the ROSMAP subtypes. Previous studies have shown a strong correspondence between CDR scores and MMSE (Bennet et al. 2012), and, therefore, MMSE serves as a similar measure of dementia severity. The inventors measured the cell type proportion of ROSMAP samples with increasing dementia severity before and after MMSE normalization. Using these methods, the inventors discovered that, as in MSBB-AD, neuronal loss inferred from bulk transcriptomic data in ROSMAP is correlated with decreasing cognition (MMSE), and that normalization using MMSE score can eliminate this observed bias and allow for stage-free subtyping. Furthermore, the inventors show in the present disclosure that cell type proportion normalization is not sufficient to eliminate the bias from dementia severity, as a significant residual correlation persists between MMSE and various cell types. Therefore, the inventors used MMSE normalization to correct for this.
As shown in
The present disclosure also shows that the subjects in the ROSMAP cohort are distributed in the AD subtypes at the same relative proportion as those in the MSBB-AD cohort. Surprisingly, the distributions are roughly proportional to each other, with fewer samples in the C2 subtype than in C1, and with most samples falling into either class A or class B (Table 5). The inventors found no significant difference in the distribution of samples after performing a one-way ANOVA test across both MSBB-AD and ROSMAP while excluding samples that do not match any subtype across both datasets (p-value: 0.249, F-statistic=1.62, df=1).
As shown in
The present disclosure shows that, similar to the MSBB subtypes, the ROSMAP subtypes are not associated with clinical and pathological traits including biological sex, cognitive scores, age, CERAD pathologic scores, MMSE, and Braak staging, revealing that these subtypes are truly molecular subtypes with distinct expression patterns. As shown in
Additionally, the inventors computed the polygenic risk scores for ROSMAP subtypes and non-demented controls using the Kunkle et al. GWAS meta-analysis, applying the same methodology as for MSBB-AD. Similar to the MSBB-AD cohort, the present disclosure shows that each of the ROSMAP classes A (median: 0.339, p-value=5.1×10^−4 under Welch's t-test), B (median: 0.084, p=0.032), and C (median: 0.191, p=1×10^−5) has an increased PRS compared with non-demented controls (median: −0.239); however, the inventors found no significant PRS differences between subtypes.
Correspondence Between Human AD Subtypes and AD Mouse Models
In the past two decades, several different mouse models of AD have been developed to characterize AD pathology, biology, and behavioral changes. Many of these mouse models perturb various AD-related proteins or regulatory genes such as AD, APP, Tau, PSEN1, TYROBP, and HDAC1. Given the significant differences in molecular changes among the AD subtypes, the inventors examined whether these AD subtypes' transcriptomic signatures match the existing mouse model signatures.
The inventors collected aligned RNA-seq data from 19 mouse model studies of AD that are publicly available at the Accelerating Medicines Partnership-Alzheimer's Disease (AMP-AD) portal on Synapse.org and GEO. Many of these models harbor multiple amyloid precursor protein (APP) variants and/or Tau protein variants. The Swedish mice (APPK670NM671NL) develop amyloid plaques near neurons. The Dutch (APPE693Q) mice accumulate soluble Aβ in perivascular cells at the blood-brain barrier. The 5XFAD mice recapitulate APP variants seen in familial forms of AD but do not have related tau NFT seen in AD. Tau protein variant TauP301L (‘D35’) and TauP301S (‘PS19’) mice develop hyperphosphorylated tau, as well as presenilin 1 (PSEN1) Δexon9 and M146V variants. The inventors also examined APOE variant mice as well as mouse models with mutant HDAC1, TYROBP, TREM2, BIN1, CD2AP, CLU, and GFAP alleles.
As shown in
The inventors further examined the expression changes of the subtype-specific key regulators in the 5XFAD, TauP301L, and CLU mutant mouse models that match the three subtype classes.
Three major Alzheimer's Disease (AD) subtypes (i.e., typical, intermediate, and atypical subtypes) were identified using the gene expression data in multiple brain regions in the MSBB and ROSMAP cohorts using whole genome sequencing (WGS) data. Copy number variations (CNVs) were identified from the paired-end short read (2×150 bp) WGS data generated from postmortem brain tissues of 1,411 North American Caucasian individuals across two cohorts from the Accelerating Medicines Partnership-Alzheimer's Disease (AMP-AD) consortium, namely the Mount Sinai/JJ Peters VA Medical Center Brain Bank (MSBB) AD cohort and the ROSMAP cohort, using four complementary CNV calling approaches (i.e., CNVnator, Pindel, MetaSV, and Delly2). Within each cohort, individual-level calling results from the four approaches were integrated into a set of population-level CNVs. Furthermore, to exclude software bias, only consensus CNVs detected by three or more approaches in each cohort were used in the subsequent analyses.
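The consensus rule described above (keeping only CNVs supported by three or more of the four callers) can be illustrated with a short sketch. The criterion used to decide that two calls describe the same CNV is not stated here, so the sketch assumes, purely for illustration, the same chromosome, the same CNV type, and at least 50% reciprocal overlap; the `CNV` record and function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for a single call made by one CNV caller.
CNV = namedtuple("CNV", "chrom start end kind caller")

def reciprocal_overlap(a, b):
    """Shared length divided by the longer call; 0.0 if the calls are incompatible."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.end - a.start, b.end - b.start)

def consensus_cnvs(calls, min_callers=3, min_overlap=0.5):
    """Greedily group matching calls and keep groups backed by enough distinct callers."""
    groups = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for group in groups:
            # Assumed matching rule: compare against the first call in the group.
            if reciprocal_overlap(group[0], call) >= min_overlap:
                group.append(call)
                break
        else:
            groups.append([call])
    return [g for g in groups if len({c.caller for c in g}) >= min_callers]
```

Dividing the shared length by the longer of the two calls is equivalent to requiring that the overlap cover at least the threshold fraction of both calls.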
By comparing 701 LOAD cases with 710 non-AD cases, the inventors identified 3,012 rare AD-specific CNVs genome-wide. The inventors discovered that AD-specific CNVs were only observed in AD cases and found that sixty-four AD-specific CNVs were conserved across the two cohorts. The inventors further found that AD-specific CNVs are enriched in transcriptional regions for biological processes such as cellular glucuronidation, neuron projection, and multicellular organismal signaling, a novel finding not reported in AD GWAS. By further integrating clinical, pathophysiological, and transcriptomic data, the inventors discovered that common CNVs affect the transcription levels of genes involved in MHC Class II receptor activity across different brain regions, supporting previous reports of an increased immune response in AD. The inventors discovered that three CNVs (i.e., mCNV233, mCNV236, and mCNV11665) are significantly negatively correlated with the Braak score in the DLPFC region. The CNV-Gene-Trait correlation networks disclosed here, which integrate matched multi-omics and clinicopathological data, first pinpoint one novel CNV that is a key regulator for immune response (DEL6619.MSBB/mCNV21544.ROSMAP), and further provide many novel gene targets that connect CNVs with clinical and pathological traits of AD.
After excluding duplicates, contaminated samples, and outliers, the MSBB and ROSMAP cohorts contain 341 and 1,129 samples, respectively. To exclude bias from demographic history, the subsequent analysis focused on North American Caucasian samples. There were 1,411 samples left in total (MSBB: 284 samples, and ROSMAP: 1,127 samples). By integrating results from four different and complementary CNV calling approaches (CNVnator, Pindel, Delly2, and MetaSV), a set of CNVs was generated for each cohort (
The Consensus Class III includes 7,150 and 9,902 CNVs in the MSBB and ROSMAP cohorts, respectively (Table 6,
The inventors further categorized all samples of the MSBB and ROSMAP cohorts into three clinical diagnostic groups (i.e., the AD group, the mild cognitive impairment (MCI) group, and the normal control (NL) group) based on the disease severity measurement Clinical Dementia Rating (CDR). In the MSBB cohort, there are 224 AD samples with CDR>0.5, 27 MCI samples with CDR=0.5, and 33 NL samples without cognitive impairment (CDR=0). The ROSMAP cohort includes 477 AD samples, 285 MCI samples, and 365 NL samples. In total, there are 701 LOAD, 312 MCI, and 398 NL samples.
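The grouping rule and the cohort totals above can be checked with a few lines of code. The thresholds and counts are taken directly from the preceding paragraph; the function and variable names are illustrative.

```python
def diagnostic_group(cdr):
    """Map a Clinical Dementia Rating score to the diagnostic group used above."""
    if cdr == 0:
        return "NL"   # no cognitive impairment
    if cdr == 0.5:
        return "MCI"  # mild cognitive impairment
    if cdr > 0.5:
        return "AD"
    raise ValueError(f"unexpected CDR score: {cdr}")

# Per-cohort counts as reported in the text.
COUNTS = {
    "MSBB": {"AD": 224, "MCI": 27, "NL": 33},
    "ROSMAP": {"AD": 477, "MCI": 285, "NL": 365},
}

# Combined totals per diagnostic group across both cohorts.
totals = {g: sum(c[g] for c in COUNTS.values()) for g in ("AD", "MCI", "NL")}
```

Summing the two cohorts reproduces the reported totals of 701 LOAD, 312 MCI, and 398 NL samples, or 1,411 samples overall.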
Each CNV was assigned to a clinical diagnostic group to which the respective sample belonged (
Among these AD-specific CNVs, 64 were conserved in the two cohorts (
The inventors discovered that one of the sixty-four AD-specific CNVs conserved across the two cohorts resides within the duplication region encompassing the APP gene (chr21: 14,714,507-29,216,662: nsv1398044) (
Genes whose transcriptional regions reside in the genomic regions of AD-specific CNVs are defined as AD-CNV genes in the subsequent analyses. The inventors discovered that the AD-CNV genes are significantly enriched for important biological processes such as cellular glucuronidation, neuron projection, uronic acid metabolic process, extrinsic component of plasma membrane, synapse, catenin complex, and multicellular organismal signaling (
Copy number variations (CNVs) were identified from the WGS data generated from postmortem brain tissues in the MSBB and ROSMAP cohorts, as described earlier. Among 341 MSBB samples, 144 had predicted AD subtypes carrying 19,084 CNVs. Polygenic risk scores (PRS) of SNPs, as well as 22,296 pathway-based and 307 module-based polygenic risk enrichment scores were generated.
For classification and evaluation of classifier performance, the random forest algorithm was used via the caret R package. Ten-fold cross-validation was conducted using the createFolds function, and automatic tuning was applied with 5 randomly generated parameters. The performance was evaluated by taking the mean of the scores collected from the 10 iterations.
Data preprocessing and feature selection were applied using only the training data in each iteration. First, the preProcess function from the R caret package was used to center and scale the training data and to exclude near zero-variance predictors. Then, the Recursive Feature Elimination (RFE) method was used to select the most relevant features in the training set. The RFE algorithm was applied using repeated k-fold cross-validation with three repeats and 5 folds. Different feature set sizes (2000, 2100, 2200, . . . , 4000) were evaluated.
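The point that preprocessing is fitted on the training portion only in each iteration can be sketched as follows. This is a simplified Python stand-in, not the caret workflow itself: the folds come from a plain shuffle (caret's createFolds additionally balances the outcome classes), the near zero-variance filter is reduced to a standard deviation cutoff, feature selection is omitted, and `fit` and `score` are hypothetical callables supplied by the user.

```python
import random
from statistics import mean, stdev

def make_folds(n_samples, k=10, seed=0):
    """Split sample indices into k disjoint test folds after shuffling."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_scaler(rows, min_sd=1e-8):
    """Learn per-feature mean and sd on the training rows; list features to keep."""
    stats = [(mean(col), stdev(col)) for col in zip(*rows)]
    keep = [j for j, (_mu, sd) in enumerate(stats) if sd > min_sd]
    return stats, keep

def apply_scaler(rows, stats, keep):
    """Center and scale rows with previously learned statistics."""
    return [[(row[j] - stats[j][0]) / stats[j][1] for j in keep] for row in rows]

def cross_validate(X, y, fit, score, k=10, seed=0):
    """Mean score over k folds, with preprocessing fitted on training data only."""
    scores = []
    for test_idx in make_folds(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        stats, keep = fit_scaler([X[i] for i in train_idx])
        X_train = apply_scaler([X[i] for i in train_idx], stats, keep)
        X_test = apply_scaler([X[i] for i in test_idx], stats, keep)  # no refitting
        model = fit(X_train, [y[i] for i in train_idx])
        scores.append(score(model, X_test, [y[i] for i in test_idx]))
    return mean(scores)
```

Fitting the scaler inside the loop keeps information from the held-out fold from leaking into the training step, which is the purpose of restricting preprocessing to the training data.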
To avoid overfitting, the inventors applied 10-fold cross-validation. In each iteration, 2000-3600 relevant genetic features were selected by feature selection. These selected features were used to build random forest classifiers. The mean accuracy was estimated as 0.70. The inventors found Area Under the Curve (AUC) values of 0.82, 0.74, and 0.65 for the intermediate, typical, and atypical classes, respectively.
RNA sequencing libraries were prepared from peripheral blood mononuclear cells, selected based on the CD14+CD16− markers, of AD individuals. Among the sequenced samples, 102 have matched brain tissues with AD pathology in the ROSMAP database. The gene expression levels were normalized for library size by the TMM method of edgeR and adjusted for covariates including MMSE, ExonicRate, SEX, study, and Batch by a linear model. The standardized expression levels of different features were used as the input for the machine learning tools. Different parameters were evaluated during the feature selection and model training. Different feature set sizes (50, 100, 200, and 500) were chosen based on random forest importance scores or differentially expressed genes (DEGs, p-value<0.05) between AD subtypes. During the model training, the whole dataset was split into training and testing datasets according to K-fold cross-validation. Three K values (5, 10, 15) were used for evaluation. For each fold of the cross-validation experiment, nine methods were used to fit the training dataset, and the accuracy on the testing dataset was calculated. Finally, the mean accuracy and the standard deviation were calculated from the K-fold experiments. The nine machine learning methods implemented were random forest, AdaBoost, logistic regression, decision tree, nearest neighbors (KNN), support vector machines (SVM), naïve Bayes, multi-layer perceptron, and an Ensemble method with equal weights for each classifier.
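An equal-weight ensemble of the kind mentioned above can be sketched in a few lines. The text does not say whether the ensemble averages predicted probabilities or counts votes; this sketch assumes probability averaging, and the function name and input layout are illustrative.

```python
def ensemble_predict(prob_tables):
    """Average class probabilities from several classifiers with equal weights.

    prob_tables : list of dicts, one per classifier, each mapping a subtype
                  label to that classifier's predicted probability for one sample.
    Returns the winning label and the averaged probabilities.
    """
    classes = sorted(prob_tables[0])
    n = len(prob_tables)
    avg = {c: sum(table[c] for table in prob_tables) / n for c in classes}
    return max(classes, key=lambda c: avg[c]), avg
```

With this rule a classifier that is confidently wrong can be outvoted by two that are moderately right, which is one common motivation for combining methods.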
Based on the cross-validation results from different parameter combinations, the inventors identified several methods that can predict AD subtypes using monocyte gene expression. The inventors found that the logistic regression, SVM, and Ensemble methods can achieve 0.84 accuracy with 100 or 200 features selected by random forest importance. They found that the multi-layer perceptron and naïve Bayes methods can reach 0.80 accuracy in predicting the AD subtypes. In contrast, feature selection based on DEGs achieved an accuracy of at most 0.63 across the different methods. The Receiver Operating Characteristic (ROC) curve showed that the logistic regression, SVM, and Ensemble methods have Area Under the Curve (AUC) values over 0.92 with 200 features selected by random forest, followed by multi-layer perceptron and naïve Bayes with AUC values over 0.85. As shown in
For each AD subtype signature, the log2 fold change (logFC) of a gene was weighted by the number of genes that were predicted to be regulated by the key driver (KD) gene through the network KD analysis (KDA) of the subtype signature. Where n genes are predicted to be regulated by a KD gene i in the KDA, the weight is calculated as follows:
Drug-induced signatures of neural progenitor cells (NPC) were identified from the NIH Library of Integrated Network-Based Cellular Signatures (LINCS) Phase I and Phase II datasets. Normalized level 3 data were downloaded from the LINCS data portal. Batch effects of gene expression samples treated with compounds or DMSO from multiple batches were corrected to remove systematic biases. In total, drug-induced signatures were identified for 3,629 compound candidates.
Drug candidate scores between LINCS drug and weighted AD subtype signatures were calculated by the EMUDRA algorithm as previously described (Zhou, et al. 2018). The algorithm uses an ensemble approach of four distinct drug repositioning algorithms (cosine similarity, expression weighted cosines, eXtreme Pearson correlation, eXtreme Spearman rank-ordered correlation). Scores for each method were further processed by a normalization. Normalized scores from the 4 methods were combined into a final score. Drug annotation data was derived from deposited information across the DrugBank database.
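To make the matching idea concrete, the sketch below shows one of the four component measures (cosine similarity between a disease signature and a drug-induced signature) and one possible way to normalize and combine per-method scores. It is not the EMUDRA implementation: the z-score normalization and the simple averaging are assumptions made for illustration, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def cosine_score(disease, drug):
    """Cosine similarity over shared genes; negative values suggest reversal.

    disease, drug : dicts mapping gene symbol -> (weighted) log fold change.
    """
    genes = sorted(set(disease) & set(drug))
    dot = sum(disease[g] * drug[g] for g in genes)
    norm = sqrt(sum(disease[g] ** 2 for g in genes)) * sqrt(sum(drug[g] ** 2 for g in genes))
    return dot / norm if norm else 0.0

def zscores(scores):
    """Normalize one method's scores across drugs (assumed normalization)."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {drug: ((v - mu) / sd if sd else 0.0) for drug, v in scores.items()}

def combine(method_scores):
    """Average the normalized scores of several methods into one score per drug."""
    normalized = [zscores(s) for s in method_scores]
    return {drug: mean(n[drug] for n in normalized) for drug in normalized[0]}
```

Under this convention, a drug whose signature runs opposite to the disease signature receives a strongly negative combined score.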
For each subtype signature in each of the MSBB and ROSMAP cohorts, a candidate drug was nominated if its EMUDRA matching scores were less than −3 in both the LINCS Phase I and Phase II datasets. To further reduce false positives and ensure the robustness of predicted drugs, the analysis required that a drug be nominated in both the MSBB and ROSMAP cohorts. Using this stringent nomination process, the inventors identified 53 FDA-approved drugs targeting one or more AD subtypes. The inventors further excluded highly toxic oncology drugs and drugs unsuitable for oral administration, leading to 46 drugs, shown in Table 8.
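The nomination rule for a single subtype can be written as a simple filter. The threshold of −3 and the requirement of agreement across both LINCS phases and both cohorts come from the text; the data layout, the key names, and the drug names used in testing are hypothetical.

```python
def nominate(scores, threshold=-3.0):
    """Return drugs scoring below the threshold in every cohort and LINCS phase.

    scores : dict mapping drug name -> dict keyed by (cohort, phase) with the
             matching score for one subtype signature (hypothetical layout).
    """
    required = [(cohort, phase)
                for cohort in ("MSBB", "ROSMAP")
                for phase in ("Phase I", "Phase II")]
    return sorted(
        drug for drug, s in scores.items()
        # A missing score counts as a failure, so incomplete drugs are excluded.
        if all(key in s and s[key] < threshold for key in required)
    )
```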
Although some of these predicted drugs have been tested in the context of AD, none was tested in the context of AD molecular subtypes.
As shown in Table 8, fourteen of the 46 predicted drugs are predicted to be effective for multiple subtypes, but none of them is predicted to be effective for all five subtypes.
The inventors discovered that two drugs, thioproperazine and nalbuphine, can target all subtypes except the intermediate subtype B1. Thioproperazine is an antipsychotic indicated for the management of acute and chronic schizophrenia, including cases that are refractory to more common neuroleptics. Thioproperazine is used for treatment of behavioral and psychological symptoms in older people with dementia. Nalbuphine is an opioid analgesic that is used in the treatment of pain. It acts as a moderate-efficacy partial agonist or antagonist of the μ-opioid receptor (MOR) and as a high-efficacy partial agonist of the κ-opioid receptor (KOR), whereas it has relatively low affinity for the δ-opioid receptor (DOR) and sigma receptors. It is prescribed in older adults with or without Alzheimer's disease and related dementia.
The inventors discovered that two drugs, Gabexate and Mesoridazine, can each target three AD subtypes. Gabexate, targeting subtypes C1, C2, and B2, is a synthetic serine protease inhibitor that has been used as an anticoagulant. It is also known to decrease production of inflammatory cytokines. Gabexate has been investigated for use in cancer, ischemia-reperfusion injury, and pancreatitis. Gabexate also functions as a small-molecule inhibitor of serine protease, which may be related to Alzheimer's disease (Leung, D. et al. 2000), thereby supporting the inventors' discovery of its usefulness for treating AD subtypes C1, C2, and B2.
Mesoridazine (Serentil), targeting subtypes C2, B2 and A, is a piperidine neuroleptic drug belonging to the class of drugs called phenothiazines, used in the treatment of schizophrenia. Mesoridazine exhibited potent inhibitory effects on acetylcholinesterase, which is a therapeutic target in the treatment of Alzheimer's disease (Ko, L. et al. 1997), providing additional support for the inventors' discovery of mesoridazine's usefulness for treating AD subtypes C2, B2, and A.
The inventors discovered that Menadione (Vitamin K3) can target two subtypes (C2 and B2); it is a fat-soluble vitamin precursor that is converted into menaquinone in the liver. The primary known function of vitamin K is to assist in the normal clotting of blood, but it may also play a role in normal bone calcification. Menadione causes oxidative stress by generating reactive oxygen species. Menadione has been reported to induce tau dephosphorylation in cultured human neuroblastoma cells. Menadione sodium bisulfite inhibits the toxic aggregation of amyloid-β, further supporting the inventors' present disclosure of its utility for treating AD subtypes C2 and B2.
The inventors discovered that Carbamazepine (CBZ) targets subtypes C1 and C2; it is an anticonvulsant medication used primarily in the treatment of epilepsy and neuropathic pain. It is used as an adjunctive treatment in schizophrenia along with other medications and as a second-line agent in bipolar disorder. Carbamazepine may be effective in treating agitation in severely demented Alzheimer's patients who are refractory to neuroleptic medication alone, particularly those who fall into the subtype C1 and C2 categories.
The inventors discovered six drugs (amlodipine, amsacrine, febuxostat, famciclovir, ezetimibe, and carbetocin) that are specifically effective for the typical AD subtype C2. Amlodipine, targeting subtype C2, is an L-type calcium channel blocker used to treat hypertension and angina. Amlodipine is being tested in phase 2/3 trials to reduce the risk for Alzheimer's disease (NCT02913664), but the results have not been disclosed (Schampel, A. & Kuerten, S., 2017). Amlodipine cannot pass the blood-brain barrier, but when delivered into the brain across the barrier it may elicit neuroprotective effects that reverse the calcium-induced excitotoxicity and mitochondrial dysfunction underlying several neurologic disorders, including Alzheimer's disease (Alawdi, S. H. et al. 2019), supporting the present discovery of its utility in treating AD subtype C2.
Febuxostat is a xanthine oxidase/dehydrogenase inhibitor that achieves its therapeutic effect by decreasing serum uric acid. Febuxostat is used for the management of chronic hyperuricemia in adults with gout. The association of uric acid levels and dementia is an emerging area of interest. A dose-related reduction in the risk of dementia in older adults has been shown with febuxostat daily dose (Singh, J. A. et al. 2018), which supports the inventors' present disclosure showing its usefulness for treating AD subtype C2.
Ezetimibe is an anti-hyperlipidemic medication that selectively inhibits the intestinal absorption of cholesterol and related phytosterols. It has a mechanism of action that differs from those of other classes of cholesterol-reducing compounds. Ezetimibe does not inhibit cholesterol synthesis in the liver or increase bile acid excretion; rather, it localizes and appears to act at the brush border of the small intestine and inhibits the absorption of cholesterol, leading to a decrease in the delivery of intestinal cholesterol to the liver. This causes a reduction of hepatic cholesterol stores and an increase in clearance of cholesterol from the blood; this distinct mechanism is complementary to that of HMG-CoA reductase inhibitors. High cholesterol levels have been positively correlated with a higher incidence of memory impairment and dementia. Therefore, a study was undertaken to investigate the potential of ezetimibe in memory deficits associated with dementia of Alzheimer's type in mice (Dalla, Y. et al. 2009). Ezetimibe significantly attenuated streptozotocin-induced memory deficits and biochemical changes in mice. The memory-restorative effect of ezetimibe can be attributed to its cholesterol-dependent as well as cholesterol-independent effects. The prior mouse study, when combined with the inventors' discovery of ezetimibe's effectiveness in targeting the typical AD subtype C2, highlights ezetimibe's usefulness in addressing memory dysfunctions associated with dementia of AD.
The inventors discovered that the intermediate AD subtype B2 is specifically targeted by 21 different drugs. Orphenadrine, for example, is an anticholinergic drug used to treat muscle pain and to help with motor control in Parkinson's disease. It binds and inhibits both histamine H1 receptors and NMDA receptors. In addition, it has mild antihistaminic and local anesthetic properties. Moreover, the protective effect of orphenadrine on glutamate neurotoxicity has been shown in vitro and in vivo. Phenazopyridine is an effective oral urinary analgesic commonly used for the treatment of irritative lower urinary tract conditions. Erythromycin ethylsuccinate is a macrolide antibiotic used to treat and prevent a variety of bacterial infections. Erythromycin provided orally to TgCRND8 transgenic mice has been shown to consistently reduce brain Abeta (1-42) levels. Amiodarone is a class III antiarrhythmic. It blocks sodium channels at rapid pacing frequencies, and like class II drugs, amiodarone exerts a noncompetitive antisympathetic action. In addition to blocking sodium channels, amiodarone blocks myocardial potassium channels, which contributes to slowing of conduction and prolongation of refractoriness. It is indicated for initiation of treatment and prophylaxis of frequently recurring ventricular fibrillation and hemodynamically unstable ventricular tachycardia in patients who are refractory to other therapy. Hyoscyamine, the levo-isomer of atropine, is an anticholinergic and a natural plant alkaloid derivative. Hyoscyamine is indicated to treat various gastrointestinal problems such as cramps and irritable bowel syndrome. Because of its antimuscarinic properties, hyoscyamine is also used to treat bladder and bowel control problems, cramping pain caused by kidney stones and gallstones, and Parkinson's disease. In addition, it is used to decrease the side effects of certain medications and insecticides.
The inventors discovered that the intermediate AD subtype B1 is specifically targeted by three drugs (Citalopram, Clonazepam, and lamotrigine). Citalopram is a selective serotonin reuptake inhibitor (SSRI) used in the treatment of depression. It potentiates serotonergic activity in the central nervous system (CNS) due to its inhibition of CNS neuronal reuptake of serotonin (5-HT). The molecular target for citalopram is the serotonin transporter (SLC6A4), inhibiting its serotonin reuptake in the synaptic cleft. In vitro and in vivo studies in animals suggest that citalopram binds with significantly less affinity to histamine, acetylcholine, and norepinephrine receptors than tricyclic antidepressant drugs. Besides its FDA-approved indication for the treatment of depression, citalopram can be used off-label for the treatment of sexual dysfunction, post-stroke behavioral changes, ethanol abuse, obsessive-compulsive disorder in children, diabetic neuropathy, and many others. Additionally, citalopram is used in the treatment of agitation in Alzheimer's disease (Aga, V. M. 2019).
Serotonin is an important neurotransmitter that participates in the modulation of memory formation. Decreased levels of both serotonin and its receptors have been reported in human post-mortem AD studies (Xu, Y. et al. 2012). Serotonin reduces generation of amyloid-β in vitro and in animal models of AD (Cirrito, J. R. et al. 2011). As an SSRI drug, citalopram is indicated in lowering brain Aβ concentrations. It reduced Aβ formation and decreased the release of the proinflammatory factors of activated microglia in vitro (Dhami, K. S. et al. 2013). Chronic citalopram treatment reduced Aβ plaque load in APP/PS1 mice. Citalopram, by promoting synaptic plasticity and hippocampal neurogenesis, could also improve learning and memory in socially isolated rats (Gong, W. G. et al. 2017). Moreover, an acute dose of citalopram has been linked to a decreased amount of newly generated Aβ in young healthy humans (Sheline et al., 2014). It is inferred that citalopram has beneficial effects on memory deficit and non-cognitive neuropsychiatric behaviors in treating AD, although the underlying mechanism remains unclear. Recently, Zhang et al. showed that chronic citalopram administration in APP/PS1 mice could rescue impaired short-term memory, ameliorate non-cognitive behavioral deficits, and decrease the amyloid plaque load in the brain (Zhang et al. 2018). These prior findings support the inventors' present disclosure showing citalopram's utility in the early treatment of AD, particularly subtype B1.
Clonazepam is a long-acting benzodiazepine that can bind to benzodiazepine receptors, which are components of various varieties of gamma-aminobutyric acid (GABA) receptors, thereby potentiating the effects of GABA (DeVane et al. 1991). As GABA is an inhibitory neurotransmitter, this results in increased inhibition of the ascending reticular activating system. Clonazepam, in this way, facilitates various effects like sedation, hypnosis, skeletal muscle relaxation, anticonvulsant activity, and anxiolytic action (Nardi et al. 2013). The agent has been indicated for treating panic disorders, severe anxiety, and various seizures. Although it was shown that clonazepam might potentially reduce microglial neuroinflammation, there were no replicated findings for benzodiazepines (Wilms, H. et al. 2003). The present disclosure indicates that clonazepam would be useful for the treatment of AD, specifically subtype B1.
Lamotrigine is a phenyltriazine antiepileptic used to treat some types of epilepsy and bipolar disorder. Lamotrigine could have some clinical efficacy in certain neuropathic pain states as well (Jensen, T. S. 2002 & Pappagallo, M. 2003). While the precise mechanism by which lamotrigine exerts its anticonvulsant action is unknown, one proposed mechanism of action involves an effect on sodium channels. In vitro pharmacological studies demonstrate that lamotrigine inhibits voltage-sensitive sodium channels, thereby stabilizing neuronal membranes and consequently modulating the presynaptic release of excitatory amino acid transmitters. Lamotrigine is used to treat seizures associated with Alzheimer's disease (Vossel, K. A. et al. 2017 & Wu, H. et al. 2015). In addition, it was shown that lamotrigine could ameliorate executive dysfunction and brain inflammatory response in a mouse model of AD. In combination with the inventors' present disclosure showing lamotrigine's utility for targeting AD subtype B1, early lamotrigine intervention is a therapeutic strategy for AD.
The inventors discovered that atypical AD subtype A is specifically targeted by Mosapride and Ephedrine. Mosapride is a prokinetic serotonin 5HT4-receptor agonist and serotonin 5HT3-receptor antagonist, which stimulates gastric motility. This drug is used clinically to treat gastrointestinal motility disorders. Ephedrine is an alpha and beta-adrenergic agonist which indirectly increases the release of norepinephrine from sympathetic neurons. In combination, these actions lead to larger quantities of norepinephrine present in the synapse, for longer periods of time, increasing stimulation of the sympathetic nervous system. As a sympathomimetic amine, ephedrine has vasoconstrictive, positive chronotropic, and positive inotropic effects. Ephedrine crosses the blood brain barrier and stimulates the central nervous system. Ephedrine products are now banned in many countries, as they are a major source for the production of the addictive compound methamphetamine. The FDA has approved ephedrine only for the treatment of clinically important hypotension occurring in the setting of anesthesia, but the present disclosure shows its utility for treating atypical AD subtype A.
An in silico EMUDRA analysis was performed to match differential expression signatures from each subtype class with known drug differential expression signatures. AD subtype differential expression signatures were generated from the RNA-seq data previously described during molecular subtype characterization, using the log fold-change for each significant DEG as input to EMUDRA. The LINCS L1000 gene expression drug signature dataset of 3,629 therapeutic candidates in neural progenitor cells (NPCs) was also used. EMUDRA analysis was then performed to determine whether each drug is beneficial (i.e., the drug DEGs are opposite in direction to the subtype DEG signature) or detrimental (i.e., the drug DEGs are similar in direction to the subtype DEG signature). After EMUDRA processing, additional annotation was performed by DrugBank database matching to identify each drug's common name and mechanism of action (MOA).
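The beneficial versus detrimental distinction above rests on whether a drug signature runs opposite to, or along with, the subtype signature. The sketch below illustrates only that directional idea. It does not reproduce EMUDRA's scoring; cosine similarity is used here as a stand-in assumption, and the function names, gene names, and fold-change values are hypothetical.

```python
import math


def direction_score(subtype_logfc, drug_logfc):
    """Cosine similarity over shared genes (a stand-in, not the EMUDRA score).

    Negative values mean the drug signature opposes the subtype signature.
    """
    shared = sorted(set(subtype_logfc) & set(drug_logfc))
    if not shared:
        return 0.0
    dot = sum(subtype_logfc[g] * drug_logfc[g] for g in shared)
    norm_s = math.sqrt(sum(subtype_logfc[g] ** 2 for g in shared))
    norm_d = math.sqrt(sum(drug_logfc[g] ** 2 for g in shared))
    if norm_s == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_s * norm_d)


def label(score):
    """Map the sign of the score to the labels used in the text."""
    if score < 0:
        return "beneficial"
    if score > 0:
        return "detrimental"
    return "neutral"


# Toy log fold-changes keyed by hypothetical gene names.
subtype = {"G1": 2.0, "G2": -1.0, "G3": 0.5}
reversing_drug = {"G1": -1.5, "G2": 0.8, "G3": -0.2}
mimicking_drug = {"G1": 1.0, "G2": -0.5}

print(label(direction_score(subtype, reversing_drug)))
print(label(direction_score(subtype, mimicking_drug)))
```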
The inventors identified 1,126 drugs with predicted beneficial effect for class A, 966 for class B, and 1,035 for class C, under an adjusted q-value of <0.05; the top 10 per subclass are identified in Table 9. The inventors further discovered that 94 of the drugs were categorized as beneficial for classes A and B, 70 for classes A and C, and 273 for classes B and C. Using this analytical framework, the inventors did not find any drugs predicted to be beneficial for all subtype classes. EMUDRA analysis was also repeated for each of the five individual AD subtypes. The inventors discovered 1,046 beneficial drugs for subtype A, 931 for B1, 1,034 for B2, 1,003 for C1, and 1,077 for C2; the top ten drugs for each of the AD subtypes are provided in Table 10.
The DrugBank name, EMUDRA normalized score against LINCS drug signatures evaluated in NPCs (more negative is better; a positive score indicates a detrimental drug), mechanism of action, p-value, and adjusted q-value fit with the gene differential expression signature for each AD subtype class are provided for each of the top 10 drugs per subtype class. A drug must have a negative normalized score and a significant (<0.05) q-value to be a beneficial drug for a particular subtype class; positive normalized scores are detrimental.
The DrugBank name, EMUDRA normalized score against LINCS drug signatures evaluated in NPCs (more negative is better; a positive score indicates a detrimental drug), mechanism of action, p-value, and adjusted q-value fit with the gene differential expression signature for each AD subtype are provided for each of the top 10 drugs per subtype. A drug must have a negative normalized score and a significant (<0.05) q-value to be a beneficial drug for a particular subtype; positive normalized scores are detrimental.
The present disclosure employs AD cohorts of RNA-seq data: the Mount Sinai Brain Bank (MSBB) study and the Religious Orders Study-Memory and Aging Project (ROSMAP). The MSBB-AD cohort includes RNA expression data in the following four different brain regions: Frontal Pole (FP, Brodmann area 10; n=265 with 187 AD cases), Superior Temporal Gyrus (STG, Brodmann area 22; n=240 with 174 AD cases), Parahippocampal Gyrus (PHG, Brodmann area 36; n=215 with 151 AD cases) and the Inferior Frontal Gyrus (IFG, Brodmann area 44; n=222 with 157 AD cases). Clinical phenotypes for each subject are also collected including age, race, sex, hypoxia-induced encephalopathy (HIE) score, cognitive function scores, CDR, age of onset and death, and pathologic findings of Tau and amyloid on biopsy. This cohort was specifically selected to include cases with either no neuropathology, or only neuropathological lesions diagnostic of AD. Cases with mixed neuropathology, e.g., AD and cerebrovascular disease, AD with Lewy bodies, etc. were specifically excluded from the study cohort. Controls were defined as those presenting with no cognitive impairment (i.e., CDR=0) and no significant neuritic plaque or neurofibrillary tangle involvement.
The Religious Orders Study-Memory and Aging Project (ROSMAP) includes whole transcriptome RNA-seq data of the dorsolateral prefrontal cortex (DLPFC) from 615 subjects including those with AD (n=391), mild cognitive impairment (n=64), and non-demented controls (n=160) determined by a CERAD pathology score of Definite AD or Probable AD. Clinical and pathologic phenotypes, as well as demographic information, were also collected for each sample, including MMSE scores (at time of diagnosis and last known), CERAD score, Braak score, cognitive score, APOE genotype, age of death, age at diagnosis, post-mortem interval, gender, race, education level, and whether the subject was Spanish-speaking.
On the MSBB-AD cohort data, 50 rounds of bootstrapped reclustering were performed using each of the four clustering algorithms, with 20% of the samples and genes withheld per round. The rate at which pairs of samples shared the same cluster was calculated across all 50 bootstrapping rounds (e.g., a pair of samples clustered together in 35 out of 50 bootstrapped clustering rounds would have a rate of 70%), defined as the pairwise sample reclustering rate. The average pairwise sample reclustering rate was then calculated for all pairs of samples within the sample clusters identified by each algorithm, as well as the average rate of same-sized clusters drawn from a distribution of 100,000 random pairs of samples. These average rates were termed the cluster stability rate and the null cluster stability rate, respectively. Then a calculation was performed to determine the empirical likelihood that the cluster stability rate and the null cluster stability rate are the same, under the binomial distribution. Using this method, a specific subtype grouping is considered a putative subtype if its empirically-adjusted p-value is less than 0.05.
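The pairwise sample reclustering rate and the cluster stability rate can be sketched as follows. This is a minimal illustration, not the inventors' implementation. It assumes each round is stored as a dictionary from retained sample to cluster label, and it follows the worked example in the text by dividing by the total number of rounds, including rounds in which one member of the pair was withheld. The function names and toy data are hypothetical, and the null-distribution and binomial test steps are not shown.

```python
from itertools import combinations


def pairwise_recluster_rate(rounds, samples):
    """Fraction of rounds in which each pair of samples shares a cluster.

    rounds: list of dicts mapping retained sample -> cluster label.
    """
    n_rounds = len(rounds)
    together = {pair: 0 for pair in combinations(sorted(samples), 2)}
    for labels in rounds:
        for a, b in together:
            if a in labels and b in labels and labels[a] == labels[b]:
                together[(a, b)] += 1
    return {pair: count / n_rounds for pair, count in together.items()}


def cluster_stability(rates, cluster):
    """Average pairwise reclustering rate over all pairs inside one cluster."""
    pairs = list(combinations(sorted(cluster), 2))
    if not pairs:
        raise ValueError("a cluster needs at least two samples")
    return sum(rates[p] for p in pairs) / len(pairs)


# Four toy rounds; s2 is withheld in the third round.
toy_rounds = [
    {"s1": 0, "s2": 0, "s3": 1},
    {"s1": 1, "s2": 1, "s3": 0},
    {"s1": 0, "s3": 0},
    {"s1": 0, "s2": 0, "s3": 0},
]
rates = pairwise_recluster_rate(toy_rounds, ["s1", "s2", "s3"])
print(rates[("s1", "s2")])                     # 3 of 4 rounds
print(cluster_stability(rates, ["s1", "s2"]))
```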
MSBB-AD RNA-seq data was processed with the STAR aligner and normalized using mixed model correction for batch effect, RNA integrity number, rRNA rate, exonic RNA rate, post-mortem interval (PMI), age of death (AOD), inferred race, and inferred sex. Label swaps were inferred and corrected or removed if resolution was not possible.
To remove the disease stage effect, CDR was corrected in the MSBB-AD gene expression data through linear model normalization. This was verified by performing a second round of linear model fitting between CDR and gene expression, which showed no correlation; no significantly differentially expressed genes were observed between patients with and without dementia across all brain regions in the MSBB-AD.
ROSMAP DLPFC RNA-seq data was also normalized for age of death, gender, batch, RIN, and PMI using mixed model correction. Data were then normalized for last known MMSE score using a linear model, after which no genes showed a correlation with MMSE (R²=0).
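The idea of regressing out a clinical score and then confirming that no correlation remains can be illustrated for a single gene and a single covariate. This is a simplified sketch using ordinary least squares; the corrections described above also involve mixed models and multiple covariates, which are not reproduced here. The function names and toy values are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def residualize(expression, covariate):
    """Remove the linear effect of one covariate from one gene's expression."""
    mx, my = _mean(covariate), _mean(expression)
    sxx = sum((x - mx) ** 2 for x in covariate)
    if sxx == 0:
        return [y - my for y in expression]
    slope = sum((x - mx) * (y - my) for x, y in zip(covariate, expression)) / sxx
    return [(y - my) - slope * (x - mx) for x, y in zip(covariate, expression)]


def r_squared(expression, covariate):
    """Squared Pearson correlation between expression and the covariate."""
    mx, my = _mean(covariate), _mean(expression)
    sxx = sum((x - mx) ** 2 for x in covariate)
    syy = sum((y - my) ** 2 for y in expression)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, expression))
    return sxy ** 2 / (sxx * syy)


# Toy data: one gene whose expression tracks a clinical score.
score = [0, 0.5, 1, 2, 3]
gene = [5.0, 5.4, 6.1, 7.0, 8.2]

corrected = residualize(gene, score)
print(r_squared(gene, score))       # close to 1 before correction
print(r_squared(corrected, score))  # approximately 0 after correction
```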
Differential gene expression (DEG) analysis was performed to determine the molecular signatures of each of the AD subtypes compared with non-demented (CDR=0) controls, starting with the RNA-seq counts per million (CPM) data as input. The analysis was carried out separately for each comparison. Log-scaled (base 2) gene CPMs from samples in the comparison were first fit to a linear model using the lmFit() function provided by the limma R package before contrasts were fit. Empirical Bayes statistics for differential expression were then calculated using the eBayes() R function, followed by the topTable() R function to output significant DEGs. P-values were adjusted to q-values using the qvalue Bioconductor package with default parameters.
WSCNA identifies sample clusters by analyzing gene expression level correlations between pairs of samples to build a sample correlation network, which is then used to calculate a topological overlap (TOM) score that can be used to cluster similar samples together via k-means clustering. WSCNA extends the WINA algorithm to samples by transposing the input matrix so that sample-sample correlations are compared. Note that gene expression data is standardized to z-scores so that expression differences do not inflate the correlation metric.
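The first two steps described above, z-score standardization followed by sample-sample correlation on the transposed matrix, can be sketched as below. This is only an illustration of those two steps under stated assumptions: the topological overlap and k-means stages are not shown, the choice to z-score each gene across samples is an assumption, and the function names and toy matrix are hypothetical.

```python
from statistics import mean, pstdev


def zscore_genes(matrix):
    """Standardize each gene (row) across samples; constant rows become zeros."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        out.append([(v - mu) / sd if sd else 0.0 for v in row])
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is flat)."""
    ma, mb = mean(a), mean(b)
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / (saa * sbb) ** 0.5


def sample_correlation(matrix):
    """Transpose genes x samples to samples x genes, then correlate samples."""
    samples = [list(col) for col in zip(*zscore_genes(matrix))]
    n = len(samples)
    return [[pearson(samples[i], samples[j]) for j in range(n)] for i in range(n)]


# Toy genes x samples matrix: samples 0 and 1 resemble each other, as do 2 and 3.
expr = [
    [1, 2, 9, 10],
    [3, 4, 12, 11],
    [8, 7, 1, 2],
]
corr = sample_correlation(expr)
print(corr[0][1] > 0, corr[0][2] < 0)
```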
Key driver analysis (“KDA”) (McKenzie et al. 2017) was applied to the multiscale embedded gene expression network analysis (MEGENA) network generated from parahippocampal gyrus data in the MSBB. KDA first generates a subnetwork $N_G$, defined as the set of nodes in $N$ that are no more than $h$ layers away from the nodes in $G$, and then searches the $h$-layer neighborhood ($h = 1, \dots, H$) for each gene $g$ in $N_G$ ($HLN_{g,h}$) for the optimal $h^*$, such that

$$ES_{h^*} = \max\left(ES_{h,g}\right) \quad \forall g \in N_G,\ h \in \{1, \dots, H\}$$

where $ES_{h,g}$ is the computed enrichment statistic for $HLN_{g,h}$. This results in a list of predicted key network regulatory hub genes that may alter the expression pattern of its surrounding nodes and result in the DEG pattern observed.
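The neighborhood search can be sketched with a breadth-first traversal. This is a simplified illustration, not the KDA implementation. The text does not specify the enrichment statistic, so a plain fold enrichment of signature genes is used here as a stand-in assumption. For brevity the search runs over every node rather than a pre-computed subnetwork, and the function names and toy network are hypothetical.

```python
from collections import deque


def h_layer_neighborhood(adj, gene, h):
    """Nodes reachable from `gene` in at most h steps, excluding the gene."""
    seen = {gene}
    frontier = deque([(gene, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == h:
            continue
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    seen.discard(gene)
    return seen


def enrichment(neighborhood, signature, n_nodes):
    """Stand-in statistic (assumption): fold enrichment of signature genes."""
    if not neighborhood or not signature:
        return 0.0
    observed = len(neighborhood & signature) / len(neighborhood)
    expected = len(signature) / n_nodes
    return observed / expected


def best_enrichment(adj, signature, max_h):
    """For each gene, return (h*, ES at h*) maximizing the statistic over h."""
    nodes = set(adj) | {n for nbs in adj.values() for n in nbs}
    best = {}
    for g in nodes:
        scores = {
            h: enrichment(h_layer_neighborhood(adj, g, h), signature, len(nodes))
            for h in range(1, max_h + 1)
        }
        h_star = max(scores, key=scores.get)
        best[g] = (h_star, scores[h_star])
    return best


# Toy directed network: "hub" sits upstream of the signature genes.
toy_adj = {"hub": ["a", "b", "c"], "a": ["d"], "x": ["y"], "y": []}
toy_signature = {"a", "b", "c", "d"}

best = best_enrichment(toy_adj, toy_signature, max_h=2)
ranked = sorted(best, key=lambda g: best[g][1], reverse=True)
print(ranked[0], best[ranked[0]])
```

Ranking genes by their best statistic gives a candidate list of key regulators; in the toy network `hub` ranks above `x` because its neighborhood consists entirely of signature genes.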
The inventors developed a random forest (RF) model to classify samples into each AD subtype using the MSBB-AD PHG brain region data for training and then validated this model on the ROSMAP data. All RF models were built using the scipy Python library, with initial parameters of 300 decision trees and a maximum tree depth of 8. Before model creation, both datasets were first corrected for cohort effect between MSBB-AD and ROSMAP, using the ComBat program, to reduce technical differences between studies. Classifier creation was divided into three steps: feature selection, model training, and model validation. First, for the feature selection step, different numbers of top key network regulator genes were selected as features from each subtype (n=1 to 80 features per subtype, total: 5-400 features). Second, multiple RF models were trained to predict subtype classification within the MSBB-AD (PHG) cohort, and model accuracy was evaluated using leave-one-out cross-validation between the predicted and observed subtypes. In the model training step, an RF model was created on all AD patients' PHG samples in MSBB-AD using only the top-performing features identified in the previous step. Finally, for the model validation step, the RF model created from the MSBB-AD data was applied to the ROSMAP data, and model accuracy was evaluated by comparing the predicted ROSMAP subtypes from the RF model with the observed ROSMAP subtypes from network-based clustering analysis. The number of features used in the RF model was increased until maximum validation accuracy was achieved, and the top-performing set of features from this model was retained.
A Bayesian causal network was constructed by integrating genome-wide gene expression, SNP genotype, and known transcription factor (TF)-target relationships in the PHG in the MSBB-AD cohort. First, expression quantitative trait loci (eQTLs) are computed and then a formal statistical causal inference test (CIT) is employed to infer the causal probability between gene pairs associated with the same eQTL. The causal relationships inferred are used, together with TF-target relationships from the ENCODE project, as structural priors for building a causal gene regulatory network from the gene expression data through a Markov chain Monte Carlo (MCMC) simulation-based procedure. A network averaging strategy was followed, in which 1,000 networks are generated from the MCMC procedure starting with different random structures, and links that are shared by more than 30% of the networks are used to define a final consensus network structure. To ensure the consensus network is a directed acyclic graph, an iterative de-loop procedure was conducted, removing the most weakly supported link of all links involved in any loop. Key Driver Analysis (KDA) was performed on the consensus Bayesian network to identify key network regulatory genes that can potentially regulate numerous downstream nodes (Zhang et al. 2013; McKenzie et al. 2017).
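The network averaging and de-loop steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' procedure: each sampled network is represented as a list of directed edges, link support is the fraction of networks containing the edge, and the de-loop step here finds one cycle at a time and removes its weakest edge, which approximates the rule stated in the text. The function names and toy networks are hypothetical.

```python
from collections import Counter


def consensus_edges(networks, min_fraction=0.30):
    """Keep directed edges present in more than `min_fraction` of networks."""
    counts = Counter(edge for net in networks for edge in set(net))
    n = len(networks)
    return {edge: c / n for edge, c in counts.items() if c / n > min_fraction}


def find_cycle(edges):
    """Return the edges of one directed cycle, or None if the graph is acyclic."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    state = {}

    def visit(node, path):
        state[node] = "active"
        path.append(node)
        for nb in adj.get(node, ()):
            if state.get(nb) == "active":
                loop = path[path.index(nb):] + [nb]
                return list(zip(loop, loop[1:]))
            if nb not in state:
                found = visit(nb, path)
                if found:
                    return found
        path.pop()
        state[node] = "done"
        return None

    for start in list(adj):
        if start not in state:
            found = visit(start, [])
            if found:
                return found
    return None


def de_loop(support):
    """Iteratively drop the weakest edge of a remaining cycle until acyclic."""
    support = dict(support)
    while True:
        cycle = find_cycle(support)
        if cycle is None:
            return support
        del support[min(cycle, key=lambda e: support[e])]


# Ten toy networks: A->B in 9, B->C in 8, C->A in 4, A->D in 2.
toy_networks = []
for i in range(10):
    net = []
    if i < 9:
        net.append(("A", "B"))
    if i < 8:
        net.append(("B", "C"))
    if i < 4:
        net.append(("C", "A"))
    if i < 2:
        net.append(("A", "D"))
    toy_networks.append(net)

support = consensus_edges(toy_networks)
dag = de_loop(support)
print(sorted(dag))
```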
To estimate the cell-type proportion of bulk tissue RNA-seq data, a cell-type deconvolution was performed on each sample using the brain cell type marker signatures provided by the BRETIGEA R package. One thousand marker genes per cell type were used from the human brain cell marker gene set (neurons, endothelials, oligodendrocytes, microglia, and astrocytes) to generate all surrogate cell-type proportion (SPV) estimates, except for oligodendrocyte precursor cells which only had 500 marker genes available. Normalization of the bulk RNA-seq by brain cell type was also performed by BRETIGEA, using the default parameters and the calculated SPV values from the previous step.
To generate cell-type specificity plots, using the mean cell-type gene expression levels from (Zhang Y. et al. 2007), each squared expression value was plotted as a vector from the center on a polar coordinate system. The inventors then calculated the vector sum from each of the expression levels and multiplied the final result by a scaling parameter to create a final point as the estimate of the cell-type specificity of any gene under consideration.
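The vector-sum construction above can be sketched in a few lines. This is an illustrative sketch only: it assumes the cell types are assigned evenly spaced angles in a fixed order and uses a default scaling parameter of 1.0, neither of which is specified in the text. The function name and expression values are hypothetical.

```python
import math

# Assumed ordering and even angular spacing of cell types (not from the text).
CELL_TYPES = ["neuron", "astrocyte", "oligodendrocyte", "microglia", "endothelial"]


def specificity_point(expression, cell_types=CELL_TYPES, scale=1.0):
    """Sum squared-expression vectors, one per cell type, on a polar layout."""
    x = y = 0.0
    for i, cell_type in enumerate(cell_types):
        angle = 2 * math.pi * i / len(cell_types)
        radius = expression[cell_type] ** 2
        x += radius * math.cos(angle)
        y += radius * math.sin(angle)
    return scale * x, scale * y


# A gene expressed only in neurons lands on the neuron axis.
neuron_gene = {"neuron": 3.0, "astrocyte": 0.0, "oligodendrocyte": 0.0,
               "microglia": 0.0, "endothelial": 0.0}
# A gene expressed equally everywhere lands near the center.
uniform_gene = {cell_type: 2.0 for cell_type in CELL_TYPES}

print(specificity_point(neuron_gene))
print(specificity_point(uniform_gene))
```

Squaring the expression values exaggerates the dominant cell type, so a cell-type-specific gene is pulled far from the origin while a ubiquitously expressed gene stays near it.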
The gene expression profiles of the blood monocytes were downloaded from the AD Knowledge Portal (Synapse ID: syn22024496). The peripheral blood mononuclear cells (PBMCs) were isolated based on the CD14+CD16− markers using the EasySep Human Monocyte Isolation Kit (Negative selection kit, Stemcell Technologies, 19359). Then the Live (BV510−) CD14+/CD16− cells were sorted on a BD Influx cell sorter. RNA sequencing libraries were prepared using SMART-seq2 protocols for cDNA preparation followed by Nextera XT DNA library preparation. The pooled cDNA libraries were sequenced by HiSeq 2500 and NovaSeq 6000 (Illumina).
RNA sequencing libraries were prepared from peripheral blood mononuclear cells of AD individuals, selected based on the CD14+CD16− markers. Among the sequenced samples, 102 have matched brain tissues with AD pathology in the ROSMAP database. These brain tissues could be broadly classified into two AD subtypes: the typical (n=57) and atypical (n=45) subtypes. Gene expression was normalized for library size by the TMM method of edgeR and adjusted for covariates including MMSE, ExonicRate, SEX, study, and Batch using a linear model. The resulting standardized expression levels of the different features were used as the input for the machine learning tools. Different parameters were evaluated during feature selection and model training. Different feature set sizes (50, 100, 200 and 500) were chosen based on random forest importance scores or differentially expressed genes (DEGs, p-value<0.05) between AD subtypes. During model training, the whole dataset was split into training and testing datasets according to K-fold cross-validation. Three K values (5, 10, 15) were used for evaluation. For each fold of the cross-validation experiment, nine methods were used to fit the training dataset, and the accuracy on the testing dataset was calculated. Finally, the mean accuracy and the standard deviation were calculated from the K-fold experiments. Nine different machine learning methods were implemented with equal weights for each classifier, including random forest, AdaBoost, logistic regression, decision tree, nearest neighbors (KNN), support vector machines (SVM), naïve Bayes, multi-layer perceptron, and an Ensemble method.
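The K-fold evaluation loop, with mean accuracy and standard deviation across folds, can be sketched with the standard library alone. This is a hedged illustration rather than the inventors' pipeline: a simple nearest-centroid classifier stands in for the nine methods named above, the population standard deviation is an arbitrary choice, and the function names and the well-separated toy data are hypothetical.

```python
import random
from statistics import mean, pstdev


def k_fold_indices(n, k, seed=0):
    """Shuffle sample indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean vector per class."""
    centroids = {}
    for lab in set(y):
        rows = [x for x, row_label in zip(X, y) if row_label == lab]
        centroids[lab] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def cross_validate(X, y, k, seed=0):
    """Return (mean accuracy, standard deviation) over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return mean(accuracies), pstdev(accuracies)


# Toy two-class data with clearly separated feature values.
X = [[0.1 * i, 0.1 * i] for i in range(10)] + \
    [[10 + 0.1 * i, 10 + 0.1 * i] for i in range(10)]
y = ["typical"] * 10 + ["atypical"] * 10

mean_acc, sd_acc = cross_validate(X, y, k=5)
print(mean_acc, sd_acc)
```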
The whole-genome sequencing data in the MSBB cohort are available at the AMP-AD knowledge portal (synapse ID: syn10901600). The WGS data in MSBB were generated from 353 individuals, of which 341 had clinical and pathological data (i.e., age of death, Clinical Dementia Rating (CDR), Plaque Mean, CERAD score, and Braak stage score (bbscore)). The 284 North American Caucasian samples were used. Subjects with CDR scores larger than 0.5 were classified as AD, those with CDR equal to 0.5 were classified as mild cognitive impairment (MCI), and those with CDR equal to zero were classified as healthy controls (NL). Under this classification scheme, there are 224 AD cases, 27 MCI cases, and 33 NL cases. The mean sequencing depth of all samples is 36.58X. There is no significant difference in sequencing depth among the three groups.
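The CDR-based grouping rule stated above maps directly to a small function. The sketch below is illustrative; the function name and the toy list of CDR scores are hypothetical and do not reproduce the cohort counts.

```python
from collections import Counter


def classify_by_cdr(cdr):
    """Map a CDR score to the diagnostic group defined in the text."""
    if cdr > 0.5:
        return "AD"
    if cdr == 0.5:
        return "MCI"
    if cdr == 0:
        return "NL"
    raise ValueError(f"unexpected CDR score: {cdr}")


toy_cdr_scores = [0, 0.5, 1, 2, 3, 0, 0.5, 5]
print(Counter(classify_by_cdr(score) for score in toy_cdr_scores))
```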
There are 1,200 individuals with whole-genome sequencing data in the ROSMAP cohort (synapse ID: syn10901595). Outliers that contained more than 6,000 deletions or 1,000 duplications in the individual scanning stage and other dementia cases were excluded. The filtering process identified 71 outliers. Non-Caucasian samples were also excluded, and 1,127 Caucasian samples were used in the subsequent analysis. The subjects were classified into three diagnostic groups based on their final Clinical Consensus Diagnosis (AD: 4 or 5; MCI: 2 or 3; NL: 1). Under this definition, there are 477 AD, 285 MCI, and 365 NL subjects.
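The outlier filter and diagnostic grouping described above can be expressed as two small functions. This is a sketch under stated assumptions: the record layout, field names, function names, and toy subjects are hypothetical, and any diagnosis code outside 1 to 5 is simply treated as excluded.

```python
MAX_DELETIONS = 6000
MAX_DUPLICATIONS = 1000

# Final Clinical Consensus Diagnosis codes as grouped in the text.
DIAGNOSIS_GROUPS = {1: "NL", 2: "MCI", 3: "MCI", 4: "AD", 5: "AD"}


def is_outlier(n_deletions, n_duplications):
    """Flag samples exceeding either structural-variant count threshold."""
    return n_deletions > MAX_DELETIONS or n_duplications > MAX_DUPLICATIONS


def diagnostic_group(code):
    """Return AD, MCI, or NL, or None for codes outside the three groups."""
    return DIAGNOSIS_GROUPS.get(code)


# Hypothetical subject records: (id, deletions, duplications, diagnosis code).
toy_subjects = [
    ("s1", 4200, 600, 4),
    ("s2", 6500, 500, 1),   # too many deletions
    ("s3", 3900, 1200, 2),  # too many duplications
    ("s4", 4100, 700, 3),
    ("s5", 4000, 650, 6),   # not in any of the three groups
]

kept = {
    sid: diagnostic_group(code)
    for sid, n_del, n_dup, code in toy_subjects
    if not is_outlier(n_del, n_dup) and diagnostic_group(code) is not None
}
print(kept)
```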
The RNA-seq based transcriptomic data in MSBB are also available at the AMP-AD portal (synapse ID: syn3157743). The samples were extracted from the BM-10, BM-22, BM-36, and BM-44 regions. Information about the MSBB samples and sequencing data can be found in a previous publication (Wang M. et al. 2018). RNA-seq data normalization and covariate correction were detailed in (Wang et al. 2020). RNA-seq data in MSBB were adjusted for covariates, including postmortem interval (PMI), RNA integrity number (RIN), race, age of death (AOD), batch effect, and sex.
The RNA-seq transcriptomic data in the DLPFC region of the ROSMAP cohort was downloaded from the AMP-AD portal (synapse ID: syn3388564). The read alignment, gene expression quantification, normalization, and covariate correction were performed using the same pipeline as the MSBB data. Briefly, the reads were mapped to human genome hg19 using the STAR aligner (v2.3.0e), and then gene-level expression was quantified by featureCounts (v1.6.3) based on Ensembl gene model GRCh37.70. Next, gene-level count data was normalized using the R/limma's voom function and subsequently corrected for known covariates, including sequencing batch, PMI, AOD, sex, and RIN by a mixed model.
This application is a § 371 national stage of PCT International Application No. PCT/US21/56315, entitled “Methods for Identifying and Targeting the Molecular Subtypes of Alzheimer's Disease” filed on Oct. 22, 2021, which claims priority to U.S. Provisional Application No. 63/104,416, filed Oct. 22, 2020, which is incorporated herein by reference in their entirety.
This invention was made with government support under U01AG046170 awarded by NIH on Sep. 30, 2018. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US21/56315 | 10/22/2021 | WO | |

Number | Date | Country |
---|---|---|
63104416 | Oct 2020 | US |