 
Patent Application
20250146074
Inflammatory bowel disease (IBD) is a chronic, debilitating gastrointestinal disorder with high rates of treatment failure.1 At present, there is no systematic means to predict response to IBD therapy.
The anti-inflammatory drug mesalamine, also known as 5-aminosalicylic acid (5-ASA), is among the most commonly prescribed therapies for IBD,2,3 and is usually formulated to be active within the colon; however, over half of IBD patients fail to respond to 5-ASA or eventually lose response over time.4,5 There is thus a need to identify and eliminate the causes of such treatment failure. In this regard, experimental data suggest that the gut microbiome plays a role in the pharmacokinetics of several medications.6-9 For example, gut microbial enzymes that metabolize digoxin and L-dopa, medications for heart failure and Parkinson's disease, respectively, were recently identified and characterized in vitro.10-12 In humans, association studies suggest a role for the microbiome in modulating the efficacy of anti-cytokine biologics and cardiometabolic drugs,13,14 although these studies generally lack mechanistic explanations. As a result, few examples of gut microbial drug metabolism, whether of 5-ASA or of other drugs, have been linked both mechanistically and to clinical outcomes in humans.
Prior anaerobic stool culture experiments suggest that up to a third of 5-ASA can be metabolized by gut bacteria into N-acetyl 5-ASA,15,16 a compound that lacks anti-inflammatory activity in placebo-controlled trials.17,18 This is at least in part due to the metabolite's diminished (<5%) bioavailability to colonic epithelial cells.19 Experimental work has identified several bacteria capable of 5-ASA metabolism,20,21 but the majority of these are typically absent or of low abundance in patients with IBD. In any case, the specific microbial enzyme(s) responsible for the inactivation of 5-ASA in the gut microbiome have remained elusive for the last forty years. Additionally, prior to this work, the clinical implications of the microbial enzyme(s) which inactivate 5-ASA were unclear.
In one aspect, the invention features a method of determining whether a patient diagnosed with inflammatory bowel disease (IBD) will respond to 5-ASA therapy, the method including:
In some embodiments, the microbial acetyltransferase genes are three or more of the following genes:
In some embodiments, IBD is ulcerative colitis. In other embodiments, IBD is Crohn's Disease. In yet other embodiments, the second line therapy is a biologic for treating IBD. In still other embodiments, the second line therapy is an immunomodulator for treating IBD. In another embodiment, analyzing the stool sample includes employing quantitative polymerase chain reaction (qPCR).
In another aspect, the invention features a method of treating a patient having IBD that includes administering to the patient a biologic in an amount effective to treat IBD when a stool sample from the patient includes microbial acetyltransferase genes capable of converting 5-ASA to N-acetyl 5-ASA. In one embodiment, IBD is ulcerative colitis. In another embodiment, IBD is Crohn's Disease. In still another embodiment, the biologic for treating IBD is adalimumab, vedolizumab, or infliximab.
In another aspect, the invention features a method of treating a patient having IBD that includes administering to the patient an immunomodulator in an amount effective to treat IBD when a stool sample from the patient includes microbial acetyltransferase genes capable of converting 5-ASA to N-acetyl 5-ASA. In one embodiment, IBD is ulcerative colitis. In another embodiment, IBD is Crohn's Disease. In still another embodiment, the immunomodulator for treating IBD is 6-mercaptopurine or tofacitinib.
In another aspect, the invention features a method for treating a patient having IBD that includes administering to the patient 5-aminosalicylic acid (5-ASA) in an amount effective to treat IBD when microbial acetyltransferase genes are not present in a stool sample from the patient. In one embodiment, IBD is ulcerative colitis. In another embodiment, IBD is Crohn's Disease. In a further embodiment, the microbial acetyltransferase genes are three or more of the following genes:
In any of the aforementioned aspects and embodiments, the patient is a human patient.
Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application with color drawings will be provided by the Office upon request and payment of the necessary fee.
As described below, we have identified and biochemically validated gut microbial enzymes capable of generating the clinically ineffective 5-ASA derivative N-acetyl 5-ASA. In turn, we show that the presence of these microbial acetyltransferase genes in gut metagenomes is associated with an increased risk of 5-ASA treatment failure. These findings were driven and validated by metagenomics (MGX), metatranscriptomics (MTX), and metabolomics (MBX) of more than 1,000 stool samples collected over a 1-year period from patients with IBD and controls,22 as well as by subsequent phylogenetics, heterologous expression, and chemical characterization of the target proteins.
Dense Time-Series Multi-Omic Profiles from an IBD Patient Cohort Reliably Identify 5-ASA Use and Chemical Derivatives
To study the role of the gut microbiome in metabolizing 5-ASA and modulating its efficacy, we leveraged data from the Integrative Human Microbiome Project (iHMP or HMP2) Inflammatory Bowel Disease Multi-omics Database (IBDMDB, ibdmdb.org), a multi-center cohort of 132 individuals with and without IBD, each of whom provided repeated medication, dietary, and symptom assessments along with serial stool and blood samples over one year.
Concordance between self-reported use of 5-ASA (Methods) and detection of fecal 5-ASA was 80.3%.
Participants on 5-ASA were more likely to have UC than CD, consistent with clinical practice,25 and fewer than 10% were users of bonded 5-ASA (sulfasalazine, olsalazine, or balsalazide) (Table 1). Of note, 13 individuals started or resumed 5-ASA therapy during their sampling time courses; these “new users” provided an opportunity to examine the direct impact of the drug on the microbiome. Samples pre- and post-5-ASA administration were collected an average of 13.0 (±8.7) weeks apart.
Fecal metabolomic profiles segregated significantly according to 5-ASA user status.
To identify specific fecal metabolites directly modulated by 5-ASA use, we focused on the natural experiment of the 13 new users of 5-ASA, defined as those patients for whom pre- and post-treatment stool samples were available. In total, 2,306 of 81,868 metabolomic features (2.8%) were significantly differentially abundant when comparing stool pre- and post-5-ASA (paired Wilcoxon test, FDR-adjusted).
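The pre/post comparison above can be illustrated with a short, stdlib-only Python sketch: an exact two-sided Wilcoxon signed-rank test for paired samples (an exact null distribution is cheap to enumerate with only 13 paired participants), followed by Benjamini-Hochberg adjustment across features. The sketch assumes there are no zero or tied differences and that the FDR procedure is Benjamini-Hochberg; the function names are hypothetical, and this is not the study's analysis code.

```python
def signed_rank_exact_p(pre, post):
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples.

    Assumes no zero and no tied absolute differences, so ranks are the
    integers 1..n and the null distribution can be enumerated exactly.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    if any(d == 0 for d in diffs) or len({abs(d) for d in diffs}) < len(diffs):
        raise ValueError("this sketch assumes no zero or tied differences")
    n = len(diffs)
    # Rank absolute differences 1..n and sum the ranks of positive differences.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    # counts[s] = number of subsets of ranks {1..n} whose sum is s (0/1 knapsack).
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    total = 2 ** n
    lower = sum(counts[: w_plus + 1]) / total  # P(W+ <= observed)
    upper = sum(counts[w_plus:]) / total       # P(W+ >= observed)
    return min(1.0, 2 * min(lower, upper))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg FDR q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

Each metabolomic feature would contribute one p-value from `signed_rank_exact_p`, and `benjamini_hochberg` would then be applied to the full list before thresholding.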
We next sought to estimate the relative contributions of microbial, host, and other factors to these differentially abundant metabolites, given that many of these factors co-vary during 5-ASA use. In models containing fecal 5-ASA levels; microbiome taxonomic data; host variables, including diet, disease type, age, other medications, and sex; and an other/unexplained term, we quantified the variance explained (EV) in each 5-ASA-shifted metabolite. As expected, because these metabolites were selected for their change with 5-ASA use, 5-ASA drug levels had the largest predictive power (mean ~29% EV) for 5-ASA-modulated metabolite levels, followed by microbiome features (median ~19% EV) and other host factors.
We then asked whether any of the remaining 2,293 unannotated metabolomic features represented unrecognized biotransformations of 5-ASA. Specifically, we calculated the differences between the masses of significantly altered metabolites and that of 5-ASA, and then mapped these against the mass shifts of known microbial biotransformations.8 As a positive control, we started with N-acetyl 5-ASA. As expected, its mass shift corresponded to acetylation (+42 Da), and this peak distinguished 5-ASA users from non-users in the IBDMDB essentially as well as the parent compound (c-statistic 0.99).
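A minimal sketch of the mass-shift mapping follows. It assumes a small illustrative table of monoisotopic shifts (not the catalogue cited above) and a hypothetical 10 ppm tolerance; the 5-ASA m/z of 154.0502 is the IBDMDB value given in the Methods. The function name `annotate_mass_shifts` is hypothetical.

```python
# Illustrative monoisotopic mass shifts (Da); a small subset, not the study's catalogue.
KNOWN_SHIFTS = {
    "acetylation (+C2H2O)": 42.010565,
    "methylation (+CH2)": 14.015650,
    "hydroxylation (+O)": 15.994915,
    "sulfation (+SO3)": 79.956815,
    "glucuronidation (+C6H8O6)": 176.032088,
}

PARENT_MZ = 154.0502  # 5-ASA peak m/z reported for the IBDMDB data


def annotate_mass_shifts(feature_mzs, parent_mz=PARENT_MZ, tol_ppm=10.0):
    """Return (m/z, delta, transformation) for each feature whose mass difference
    from the parent matches a known shift within tol_ppm of the feature's m/z."""
    hits = []
    for mz in feature_mzs:
        delta = mz - parent_mz
        tolerance = mz * tol_ppm * 1e-6  # ppm tolerance scaled to this feature's m/z
        for name, shift in KNOWN_SHIFTS.items():
            if abs(delta - shift) <= tolerance:
                hits.append((mz, round(delta, 4), name))
    return hits
```

For the N-acetyl 5-ASA positive control, a protonated feature near m/z 196.0604 would give a shift of about 42.010, which this table maps to acetylation.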
Next, we sought to identify which gut microbial enzymes were involved in generating the clinically ineffective metabolite N-acetyl 5-ASA, as well as the other putative biotransformations of 5-ASA observed above. Notably, typical homology-based approaches6,9,11 did not yield any candidates, even with very generous thresholds across more than a thousand metagenomes. Specifically, using two arylamine N-acetyltransferase (NAT) sequences from Salmonella enterica serovar Typhimurium LT2 (nhoA) and Pseudomonas aeruginosa, previously shown to metabolize 5-ASA,20,21 as well as 105 NAT or N-hydroxyarylamine O-acetyltransferase microbial protein sequences predicted to metabolize 5-ASA, we queried the complete set of gut microbial proteins in the IBDMDB HUMAnN 3-formatted UniRef90 database at a minimum of 25% full-length homology. This search matched 5,685 unique gene clusters. Only 16 of these gene families were also detected in the IBDMDB gut metagenomic profiles, 14 of which mapped to E. coli and 2 of which were NATs. None were present in our metatranscriptomic data. As expected, neither Salmonella nor Pseudomonas aeruginosa was detected in the gut of IBD patients. Furthermore, bacterial species previously found to acetylate 5-ASA were absent or of low abundance and/or prevalence in the gut microbiome.
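The identity and e-value filtering of homology hits (the Methods give the thresholds as >25% identity and an e-value of 10) could be sketched as follows for BLAST/DIAMOND tabular output. This assumes the standard 12-column tabular layout (`qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore`); the function name and the example subject IDs are hypothetical, and the sketch does not enforce a full-length coverage requirement.

```python
def filter_diamond_hits(tabular_lines, min_identity=25.0, max_evalue=10.0):
    """Map each subject ID to the set of query IDs that hit it with percent
    identity above min_identity and e-value at or below max_evalue.

    Assumes 12-column tabular rows:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = {}
    for line in tabular_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, evalue = float(fields[2]), float(fields[10])
        if pident > min_identity and evalue <= max_evalue:
            hits.setdefault(subject, set()).add(query)
    return hits
```

The number of keys in the returned dictionary corresponds to the count of unique matched gene clusters.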
This prompted the development of our novel multi-omic criteria, combining 1) metatranscriptomics- and 2) metabolomics-based strategies. To first derive candidates from MTX profiles, we performed multivariate testing for differentially expressed microbial transcript families between 5-ASA users and non-users, accounting for DNA copy number31 (Methods). We identified two significantly overexpressed UniRef90 gene clusters with putative acetyltransferase function: 1) a GNAT family N-acetyltransferase (UniRef90 ID: C7H1G6) and 2) an acetyl-CoA acetyltransferase (UniRef90 ID: R6TIX3) (linear mixed effects models, β = 0.0003, FDR q = 0.24 and β = 0.0003, FDR q = 0.14, respectively).
In our second, MBX-based strategy, we correlated the presence or absence of microbial transcripts with fecal N-acetyl 5-ASA levels across samples from 5-ASA users. Specifically, we first represented each metatranscriptomic gene family as present or absent, based on detection in the IBDMDB (relative abundance >0). Next, we classified stool samples as N-acetyl 5-ASA high (>median) or N-acetyl 5-ASA low/negative (<median, including undetectable levels). We then calculated the sensitivity and specificity with which each metatranscriptomic gene cluster was associated with the dichotomized N-acetyl 5-ASA levels. At a 50% cutoff for sensitivity and specificity (as used previously, Methods), we uncovered an additional 7 putative acetyltransferase gene clusters.
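The MBX-based screen can be sketched in Python as below. Presence follows the text (relative abundance >0, represented here as membership in a per-sample set of detected gene families). Grouping samples that sit exactly at the median with the low/negative class and applying the 50% cutoff inclusively (≥) are assumptions, and `screen_gene_families` is a hypothetical name.

```python
from statistics import median


def screen_gene_families(samples, cutoff=0.5):
    """samples: list of (n_acetyl_5asa_level, set_of_detected_gene_families).

    Returns {gene_family: (sensitivity, specificity)} for families meeting both
    cutoffs against the median-dichotomized N-acetyl 5-ASA level.
    """
    threshold = median(level for level, _ in samples)
    high = [genes for level, genes in samples if level > threshold]
    # Assumption: samples exactly at the median count as low/negative.
    low = [genes for level, genes in samples if level <= threshold]
    families = set().union(*(genes for _, genes in samples))
    results = {}
    for family in sorted(families):
        sensitivity = sum(family in genes for genes in high) / len(high)
        specificity = sum(family not in genes for genes in low) / len(low)
        if sensitivity >= cutoff and specificity >= cutoff:
            results[family] = (sensitivity, specificity)
    return results
```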
Finally, we pooled all candidate gene families that met either the MTX or the MBX criterion.
Having identified thiolase and acyl-CoA N-acetyltransferase superfamilies with potential 5-ASA-inactivating capabilities, we sought to confirm these predicted activities biochemically in vitro. Because examining the genomic context of each candidate in assembled complete genomes revealed no obviously related regulatory genes, we synthesized codon-optimized DNA sequences for each candidate gene according to its UniRef90 amino acid sequence and heterologously expressed them in E. coli (Methods). We expressed the known NAT from Salmonella enterica, the candidate thiolase from Firmicutes CAG:176 (UniRef90 ID R6CZ24), and the acyl-CoA N-acetyltransferase from F. prausnitzii (UniRef90 ID C7H1G6) for further biochemical characterization.
We focused our next efforts on the thiolase superfamily because of its combination of phylogenetic consistency and substantially greater sequence conservation. The thiolase enzyme also accepted longer-chain acyl-CoA donors in a pooled assay.
To gain insight into how the Firmicutes CAG:176 thiolase (FcTHL) acetylates 5-ASA, we generated an acetylated unliganded crystal structure of FcTHL refined at 1.9 Å resolution (Methods, Table 3). As seen in other biosynthetic thiolases,35-37 protein subunits join to form dimers, which then link via four interacting loops (one from each subunit) to form a tetramer.
Gut Microbial 5-ASA-Inactivating Acetyltransferases are Associated with Greater Risk of Treatment Failure in 5-ASA Users
Having identified human gut microbial acetyltransferases capable of converting 5-ASA to the clinically ineffective N-acetyl 5-ASA, we next examined whether the presence of these twelve enzymes was associated with 5-ASA treatment failure. Consistent with the primary outcome in prior clinical trials of 5-ASA, we defined treatment failure as initiation of corticosteroid treatment within our HMP2 patient subpopulation38 (Methods). Although the applicable subcohort was small, 39 individuals in the HMP2 who were treated with 5-ASA at any point and who provided longitudinal information on steroid use (prednisone, budesonide, methylprednisolone) contributed 609 stool samples across the entire year-long cohort.22 Using multivariate logistic regression models, we adjusted for age, sex, smoking status, and IBD subtype, each of which is thought to be linked to risk of disease flare.39 We also attempted to account for host genetics by including each participant's NAT2 phenotype (i.e., “fast” vs. “slow” acetylator; Methods), as data on whether NAT2 is involved in 5-ASA metabolism are conflicting.40-42
We found that the presence of any of four acetyltransferase genes in stool samples was significantly associated with an increased risk of steroid initiation.
In sensitivity analyses treating these genes as components of a risk score, an increasing number of microbial acetyltransferase genes was significantly associated with a greater risk of steroid initiation (ptrend <0.0001), and models mutually adjusted for the other genes were essentially unchanged. Importantly, participants with scores of ≥3 genes were similar in age distribution and proportion with CD to those with 2 or fewer genes. When performing the same analyses among IBD participants who were never-users of 5-ASA (who were instead treated with other drugs, e.g., 6-MP, infliximab, or methotrexate), we found no positive association of these gene families with steroid use. Further, because samples from the same participant were not independent, we used mixed effects models adjusting for participant and found that the presence of 3 or more acetyltransferase genes (compared to 2 or fewer) was also associated with an increased risk of steroid use, albeit with wider confidence intervals given the small sample size (OR 3.87, 95% CI 1.02-14.70).
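To make the gene-count score and the odds-ratio scale concrete, the Python sketch below counts detected candidate genes per sample and computes an unadjusted 2×2 odds ratio with a Woolf (log-scale) 95% confidence interval. It is illustrative only: it does not reproduce the adjusted logistic or mixed effects estimates reported above, which account for covariates and for repeated samples per participant. Function names and the example counts are hypothetical.

```python
import math


def acetyltransferase_score(detected_families, candidate_families):
    """Count how many candidate acetyltransferase gene families a sample carries."""
    return len(set(detected_families) & set(candidate_families))


def crude_odds_ratio(exposed_cases, exposed_noncases, unexposed_cases,
                     unexposed_noncases, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    Here "exposed" would mean a score of >= 3 genes and "cases" steroid
    initiators. All four cells must be non-zero for the interval to exist.
    """
    odds_ratio = (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_noncases
                          + 1 / unexposed_cases + 1 / unexposed_noncases)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

A sample would fall in the higher-risk group when `acetyltransferase_score(...) >= 3`; the wide interval from small cell counts mirrors why the mixed effects estimate above has wide confidence limits.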
To test the reproducibility of our findings and to provide clarity about the temporal sequence between carriage of these microbial acetyltransferases and the transition from 5-ASA use to steroids, we validated our results in the independent Study of a Prospective Adult Research Cohort with IBD (SPARC IBD) (Methods). We note that we were blinded to these validation data, which were not available to us during the initial identification process for genes linked with steroid use in the IBDMDB. Among 208 users of 5-ASA who were steroid-free at study entry, we identified 60 new cases of corticosteroid use.
Finally, we observed that these acetyltransferase genes varied in prevalence across individuals, but that their presence did not differ between 5-ASA users and non-users.
Nearly forty years ago, researchers observed the conversion of 5-ASA to its inactive form (N-acetyl 5-ASA) in human stool cultures, leading to the hypothesis that gut bacteria may be responsible for this transformation.16 Fifteen years later, several bacteria, including S. Typhimurium, which encodes a known homologue of the human NAT, were shown to inactivate 5-ASA.20 However, the identity of the specific enzymes responsible for this conversion in healthy adults or in those with IBD remained unknown. Here, we offer evidence that the culprits are “moonlighting” gut commensal acetyltransferases,44 which canonically condense short-chain acyl-CoAs as part of an intracellular energy storage strategy,45 and that they may have a direct, deleterious impact on host health.
Intriguingly, these findings highlight both the strengths and weaknesses of enzyme identification by sequence homology alone, as compared with the multi-omic methods we employed. While no transcribed gut microbiome sequences shared even remote homology with the known Salmonella NAT sequence, the predicted structural similarity between its active site residues and those of the Firmicutes thiolase could suggest an analogous and near-convergent enzymatic mechanism. Both enzymes form acyl-enzyme intermediates via cysteine residues that are poised to donate an acetyl group to a nucleophile, and both rely upon stabilizing histidine residues, providing a possible explanation.
More broadly, while some medications are known to influence the microbiome,46 the converse effect of gut bacteria on drug safety and efficacy has even greater translational potential. One of the major challenges to discovering microbe-drug interactions is that their specificity can vary widely. In our study and in recent case examples,7,9-11 drug-metabolizing enzymes are restricted to very few microbial strains and sequences. Others, such as the microbial β-glucuronidases affecting cancer therapies,47 can be taxonomically and phylogenetically widespread. In either case, such microbial enzymes are likely present in the host gut before any encounter with these non-antibiotic medications. Thus, the biomarker of interest for precision medicine is the baseline presence of the relevant genes (which can often be identified through transcriptional responses to a compound), not the differential abundance of their microbial carriers. This necessitates novel, agnostic discovery methods such as those developed here, which begin with population-scale multi-omics (metatranscriptomics, metabolomics, and metagenomics) in the complex anaerobic nutritional and chemical environment of a natural human host and end with biochemical validation. Such methods will be needed for the hundreds of health-relevant drug-microbe interactions that remain to be discovered.
Finally, our findings provide the first direct link between specific gut microbial enzymes and 5-ASA treatment failure in IBD, and thus have immediate clinical potential. 5-ASA is the most commonly prescribed medication for UC. To remain effective, it must be available in unmodified form in the lumen of the colon, and thus, unlike many other drugs, it is designed to be poorly absorbed in the small bowel.48 Individuals in whom 5-ASA therapy fails are typically escalated to riskier immunosuppressive treatments. Detecting 5-ASA-modifying thiolase sequences in the gut microbiome would therefore provide a valuable biomarker for precision medicine, allowing such escalation only when necessary.
Furthermore, elucidating the enzymatic mechanisms by which microbes inactivate 5-ASA may lead to inhibitors of these microbial enzymes that enhance 5-ASA efficacy in the future. In short, these findings open a conceptual window into personalized microbiome-based medicine for IBD.
The following materials and methods were used to obtain the above results.
As previously published,22 a total of 132 IBDMDB participants were recruited between 2013 and 2014 from five major medical centers: Cincinnati Children's Hospital, Emory University Hospital, Massachusetts General Hospital, Massachusetts General Hospital for Children, and Cedars-Sinai Medical Center, and underwent an initial colonoscopy upon enrollment to determine study phenotype (UC, CD, or non-IBD). Based on a combination of endoscopic and histopathologic findings as well as gastrointestinal (GI) symptoms or positive imaging (for example, colonic wall thickening or ileal inflammation), 104 participants were classified as having UC or CD. Of these, 79 participants provided a total of 1,036 stool samples, collected every two weeks (either in person or by mail, deposited in 5 ml of molecular biology grade 100% ethanol; further processing details are described elsewhere22), which were characterized by MGX, MBX, and/or MTX (see below). Because of restrictions such as available sample mass and missing samples, sampling patterns differed slightly, and not every sample could be profiled by all of MGX, MBX, and MTX. Blood draws (whole blood) occurred at approximately quarterly follow-up visits at the clinic.22 The study was reviewed by the Institutional Review Boards at each sampling site.22 All participants provided informed consent.
Throughout the study, participants completed serial questionnaires with information on diet, symptoms, and medication use.22 Participants completed a long-format food frequency questionnaire at baseline and a short-format one with each stool collection. Collection questionnaires also reported use of antibiotics, chemotherapy, or immunosuppressants such as steroids. More detailed medication information was collected at three blood draws, at approximately the start, middle, and end of the study. Using these detailed data, we defined the following medication classes for mesalamine/5-ASA: oral (Asacol, Pentasa, Lialda, and Apriso), per-rectum (Rowasa enemas, Canasa suppositories), and bonded (Dipentum, Colazal, and Azulfidine). Steroids were grouped as use of Entocort, Medrol, or prednisone for the purposes of progression from 5-ASA in our case-cohort study (see ‘Case-cohort study’ below). Biologics were grouped as use of infliximab, adalimumab, certolizumab, or natalizumab. Missing data were handled by last observation carried forward (LOCF). Lifestyle and demographic data, including age at consent, age at diagnosis, smoking status, and BMI, were assessed at the start of the study. All questionnaires, as well as detailed protocols (including product numbers), can be found on the IBDMDB data portal at ibdmdb.org/protocols. Responses and metadata are available at ibdmdb.org/results. Dysbiosis was previously defined22 as a microbial excursion from the non-IBD microbiome and is considered a surrogate of both disease severity and disease activity.
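Last observation carried forward can be illustrated with a minimal sketch. It assumes responses are processed per participant in chronological order, with `None` marking a missing response; leading missing values stay missing because there is nothing to carry forward. The function name `locf` is hypothetical.

```python
def locf(responses):
    """Last observation carried forward over one participant's time-ordered
    responses; None marks a missing response."""
    filled = []
    last = None
    for value in responses:
        if value is not None:
            last = value  # update the value to carry forward
        filled.append(last)  # leading gaps remain None
    return filled
```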
As above, to maximize accuracy of 5-ASA user classification, use was determined according to detection of drug levels in stool (see section on ‘Metabolomics measurements and identification’). Based on two distinct groupings of 5-ASA levels in stool (
  
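Accuracy was calculated as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.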
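A minimal sketch of the concordance calculation follows, treating fecal 5-ASA detection as the reference and self-report as the test (an assumption; accuracy is symmetric in the two classifications, so the choice does not change the result). The function name is hypothetical.

```python
def concordance(self_reported, detected_in_stool):
    """Return (counts, accuracy) comparing self-reported 5-ASA use (test)
    with fecal 5-ASA detection (reference)."""
    pairs = list(zip(self_reported, detected_in_stool))
    counts = {
        "TP": sum(1 for r, d in pairs if r and d),
        "TN": sum(1 for r, d in pairs if not r and not d),
        "FP": sum(1 for r, d in pairs if r and not d),
        "FN": sum(1 for r, d in pairs if not r and d),
    }
    accuracy = (counts["TP"] + counts["TN"]) / sum(counts.values())
    return counts, accuracy
```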
In PRISM cohort metabolomics (
As previously described,22 total nucleic acid was extracted from an aliquot of each stool sample using the Chemagic DNA Blood Kit-96 from Perkin Elmer. This kit combines chemical and mechanical lysis with magnetic bead-based purification; full details are available from the IBDMDB. DNA samples were quantified using a fluorescence-based PicoGreen assay, and RNA samples using a fluorescence-based RiboGreen assay (see below). RNA quality was assessed via smear analysis on the Caliper LabChip GX.
Metagenomes were generated as described elsewhere.22 In brief, metagenomic DNA was quantified using the Quant-iT PicoGreen dsDNA Assay (Life Technologies) and normalized to a concentration of 50 pg/μl. Illumina sequencing libraries were prepared from 100-250 pg DNA using the Nextera XT DNA Library Preparation kit (Illumina). Prior to shotgun sequencing, libraries were pooled by collecting equal volumes of each library from batches of 96 samples. Insert sizes and concentrations for each pooled library were determined using an Agilent Bioanalyzer DNA 1000 kit (Agilent Technologies). Libraries were sequenced on an Illumina HiSeq 2000 or 2500 (2 × 101 bp) to yield ~10 million paired-end reads. Post-sequencing de-multiplexing and generation of BAM and FASTQ files were performed using the Picard suite (broadinstitute.github.io/picard).
Illumina cDNA libraries were generated using a modified version of the RNAtag-seq protocol.49 In brief, 500 ng-1 μg of total RNA was fragmented, depleted of genomic DNA, dephosphorylated, and ligated to DNA adapters carrying 5′-AN8-3′ barcodes of known sequence with a 5′ phosphate and a 3′ blocking group. Barcoded RNAs were pooled and depleted of rRNA using the RiboZero rRNA depletion kit (Epicentre). Pools of barcoded RNAs were converted to Illumina cDNA libraries and then sequenced as above.
Although only host NAT2 genotypes were used in this analysis (see below), whole-exome libraries were originally constructed and sequenced on an Illumina HiSeq 4000 sequencer with 151-bp paired end reads.22 Output from Illumina software was processed by the Picard pipeline to yield BAM files containing calibrated, aligned reads. Host genetic exome sequence data were processed using the Broad Institute sequencing pipeline by the Data Sciences Platform (Broad Institute).
A portion of each selected stool sample (40-100 mg) and the entire volume of the originating ethanol preservative were stored in 15-ml centrifuge tubes at −80 °C until all samples were collected. Samples were then thawed on ice and centrifuged (4 °C, 5,000 g) for 5 min. Ethanol was evaporated under a gentle stream of nitrogen gas using a nitrogen evaporator (TurboVap LV; Biotage), and the dried samples were stored at −80 °C until all samples in the study had been dried. After homogenization and vortexing, the mixture was aliquoted and stored at −80 °C until LC-MS analyses.
A combination of four untargeted LC-MS methods was used to profile metabolites in the fecal homogenates, as previously published:22 two methods that measure polar metabolites, a method that measures metabolites of intermediate polarity (for example, fatty acids and bile acids), and a lipid profiling method. Specifically, these were 1) HILIC-pos (positive ion mode MS analysis of polar metabolites), 2) HILIC-neg (negative ion mode MS analysis of polar metabolites), 3) C18-neg (negative ion mode analysis of metabolites of intermediate polarity; for example, bile acids and free fatty acids), and 4) C8-pos (positive ion mode analysis of polar and nonpolar lipids). Additionally, pairs of pooled reference samples were inserted into the queue at intervals of approximately 20 samples for quality control and data standardization. Samples were prepared for each method using extraction procedures matched to the chromatography conditions. Data were acquired using LC-MS systems composed of Nexera X2 U-HPLC systems (Shimadzu Scientific Instruments) coupled to Q Exactive/Exactive Plus orbitrap mass spectrometers (Thermo Fisher Scientific).
Taxonomic and functional profiles from the HMP2 were newly generated with the updated bioBakery3 meta'omics workflow using default parameters (huttenhower.sph.harvard.edu/biobakery_workflows) (20) (provenance log: ibdmdb.org/tunnel/cb/document/Public/HMP2/Metadata/anadama_run.log.gz). In brief, reads mapping to the human genome were first filtered out using KneadData v0.7.0. Taxonomic profiles of shotgun metagenomes were generated using MetaPhlAn 3.0, which uses a library of clade-specific markers to provide pan-microbial (bacterial, archaeal, viral, and eukaryotic) profiling (huttenhower.sph.harvard.edu/metaphlan3), quantifying a total of 578 microbial species across all participants (prior to quality control). Functional profiling was performed by HUMAnN v3.0.0.alpha.1 (huttenhower.sph.harvard.edu/humann3). HUMAnN constructs a sample-specific reference database from the pangenomes of the subset of species detected in the sample by MetaPhlAn (pangenomes are precomputed representations of the open reading frames of a given species). Sample reads are mapped against this database to quantify gene presence and abundance on a per-species basis. A translated search is then performed against a UniRef-based protein sequence catalog50 (UniRef90 version 2019_01) for all reads that fail to map at the nucleotide level, in this case assigning 71.0%±13.3% of all DNA reads and 31.6%±8.8% of all RNA reads to UniRef90 gene families. The results are abundance profiles of gene families (UniRef90s), for both metagenomics and metatranscriptomics, stratified by the species contributing those genes, which can then be summarized into higher-level gene groupings such as ECs or KOs. Relative abundance data were filtered to remove features with no variance or >90% zeros and then arcsine square-root transformed to reduce the effects of zero-inflation.
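The feature filtering and arcsine square-root transform described above can be sketched as follows. This assumes relative abundances are proportions in [0, 1] (percentages would first be divided by 100) and uses population variance to detect constant features; the function name and example data are hypothetical, and this is not the study's code.

```python
import math
from statistics import pvariance


def filter_and_transform(profiles, max_zero_frac=0.9):
    """profiles: {feature: [relative abundance per sample, as proportions]}.

    Drops features that are constant or zero in more than max_zero_frac of
    samples, then applies asin(sqrt(x)) to the remaining values.
    """
    transformed = {}
    for feature, values in profiles.items():
        zero_frac = sum(v == 0 for v in values) / len(values)
        if zero_frac > max_zero_frac or pvariance(values) == 0:
            continue  # mostly-zero or no-variance feature
        transformed[feature] = [math.asin(math.sqrt(v)) for v in values]
    return transformed
```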
As reported previously,22 multiple pilot studies and technical replicates were performed as multi-batch analyses, ensuring that data generation methods produced reproducible results.
Raw LC-MS data were acquired on the data acquisition computer interfaced to each LC-MS system and then stored on a robust and redundant file storage system (Isilon Systems) accessed via the internal network at the Broad Institute. Nontargeted data were processed using Progenesis QI software (v2.0, Nonlinear Dynamics) to detect and de-isotope peaks, perform chromatographic retention time alignment, and integrate peak areas. Peaks of unknown identity were tracked by method, m/z, and retention time. Nontargeted metabolite LC-MS peaks were identified by i) matching measured retention times and masses to mixtures of reference metabolites analyzed in each batch and ii) matching against an internal database of >600 compounds that have been characterized using the Broad Institute methods. Temporal drift was monitored and normalized using the intensities of features measured in the pooled reference samples.
For the identification of 5-ASA, two separate standards (SIGMA catalog PHR1060 and 18858) confirmed the identity of the 5-ASA peak in the IBDMDB (m/z: 154.0502, RT: 3.83 min) through retention time and spectral matching.
To assess the association of overall metabolome structure with 5-ASA use, omnibus testing was performed on Bray-Curtis dissimilarity matrices from MBX measurements. Profiles were first log-transformed and filtered to exclude features with more than 90% missing values before calculation of dissimilarities. Variance explained in the metabolomics data was quantified using PERMANOVA with the adonis function in the R (v4.0.1) package “vegan” (v2.5-6).51 The total variance explained by each variable was calculated independently of other variables to avoid issues related to ordering, as done previously.22 PCoA ordination in
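The omnibus test can be illustrated with a stdlib-only Python re-implementation of one-factor PERMANOVA on Bray-Curtis dissimilarities. The study used `adonis` in the R package vegan; this sketch only mirrors the one-variable-at-a-time R² and a label-permutation p-value. Using log(1 + x) and filling remaining missing values with zero are assumptions, as are all function names.

```python
import math
import random


def preprocess(samples, max_missing_frac=0.9):
    """Drop features missing (None) in more than max_missing_frac of samples,
    then apply log(1 + x). Assumption: remaining missing values become zero."""
    n_samples = len(samples)
    keep = [k for k in range(len(samples[0]))
            if sum(s[k] is None for s in samples) / n_samples <= max_missing_frac]
    return [[math.log1p(s[k] if s[k] is not None else 0.0) for k in keep]
            for s in samples]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative profiles."""
    denom = sum(a + b for a, b in zip(u, v))
    return sum(abs(a - b) for a, b in zip(u, v)) / denom if denom else 0.0


def permanova_r2(profiles, labels, n_perm=999, seed=0):
    """One-factor PERMANOVA: R^2 of labels on Bray-Curtis distances and a
    permutation p-value."""
    n = len(profiles)
    d2 = [[bray_curtis(profiles[i], profiles[j]) ** 2 for j in range(n)]
          for i in range(n)]
    ss_total = sum(d2[i][j] for i in range(n) for j in range(i + 1, n)) / n

    def r2(groups):
        ss_within = 0.0
        for g in set(groups):
            idx = [i for i, lab in enumerate(groups) if lab == g]
            ss_within += sum(d2[i][j] for a, i in enumerate(idx)
                             for j in idx[a + 1:]) / len(idx)
        return 1.0 - ss_within / ss_total

    observed = r2(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # With fixed group count and sample size, pseudo-F rises monotonically
        # with R^2, so comparing R^2 gives the same permutation test.
        if r2(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```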
To identify individual significantly differentially abundant metabolomic features before and after 5-ASA administration in 
To identify possible biotransformations of 5-ASA, and specifically to gain insight into the chemistry of microbial 5-ASA metabolism, we performed a four-step search, replicated in an independent cohort. First, we calculated mass differences (Δm/z) between the mass of each unannotated metabolite significantly altered by 5-ASA use (identified above, among the subpopulation of “new users”) and that of the expected parent compound (i.e., 5-ASA, m/z 154.05). Second, we mapped these against known mass shifts from microbial biotransformations (6). Third, we calculated a c-statistic for each candidate metabolite to determine how well it discriminated 5-ASA users from non-users in the entire study population, and retained features with a c-statistic >0.95. Finally, we further filtered this subset of annotated metabolomic features to those enriched among 5-ASA users compared to non-users (as opposed to enriched in non-users), given that drug derivatives should be more common among users. We then replicated this same four-step process in an independent cohort of 220 participants with and without IBD, the Prospective Registry in IBD Study at MGH (PRISM), which had fecal metabolomics generated using the same platform (26). The features from HMP2 and PRISM were then overlaid, yielding the proposed 5-ASA derivatives.
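Steps 3 and 4 can be sketched in Python as below. The c-statistic is computed as the probability that a randomly chosen user's feature intensity exceeds a randomly chosen non-user's (ties count one half), which equals the ROC AUC. Treating discrimination as direction-agnostic in step 3 and defining enrichment among users as c > 0.5 in step 4 are assumptions; function and feature names are hypothetical.

```python
def c_statistic(user_values, nonuser_values):
    """Probability that a random user's value exceeds a random non-user's;
    ties count one half (equivalent to the ROC AUC)."""
    wins = 0.0
    for u in user_values:
        for v in nonuser_values:
            if u > v:
                wins += 1.0
            elif u == v:
                wins += 0.5
    return wins / (len(user_values) * len(nonuser_values))


def select_candidates(features, threshold=0.95):
    """features: {name: (user_values, nonuser_values)}.

    Keeps features that discriminate users from non-users (c-statistic above
    threshold in either direction) and that are enriched among users.
    """
    selected = []
    for name, (users, nonusers) in features.items():
        c = c_statistic(users, nonusers)
        discrimination = max(c, 1.0 - c)  # step 3: direction-agnostic
        if discrimination > threshold and c > 0.5:  # step 4: higher in users
            selected.append(name)
    return selected
```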
As further confirmation of our proposed derivatives, we pursued an independent metabolomics identification approach, where the known MS2 spectra of N-acetyl 5-ASA (MassBank of North America) were passed to the GNPS workflow MASST,53 then input to the METABOLOMICS SNETS-V2 workflow, and finally passed to a second GNPS workflow, Network Annotation Propagation (NAP_CCMS v1.2.5), all with default parameters.54 The results were exported for visualization using the GNPS workflow MolNetEnhancer (v15). Nodes that were adjacent to (and thus co-occurring with) N-acetyl 5-ASA were inspected, and their m/z ratios were compared to the proposed candidates.
In order to identify relative contributions of microbial, host, and other factors to the subset of differentially abundant metabolomic features altered by 5-ASA in 
Differentially abundant microbial species between 5-ASA users and non-users were identified using arcsine square-root transformed taxonomic data with an FDR q<0.25. Abundances were fit with the following per-feature linear mixed-effects model:
  
    
  
Using the significant results of this differential abundance testing among species and the significantly altered metabolites identified among new users, we then conducted hierarchical all-against-all association testing with Spearman correlations using HAllA 0.8.20 for
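The core quantity in this all-against-all screen, a Spearman correlation for every species-metabolite pair, can be sketched in Python as below. This is a simplified, non-hierarchical stand-in written for illustration: HAllA additionally clusters features into blocks and controls the FDR across block tests, which this sketch does not attempt, and the function names are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman_matrix(species, metabolites):
    """Spearman rho for every (species column, metabolite column) pair; rows are samples."""
    def ranked_z(M):
        R = np.apply_along_axis(average_ranks, 0, np.asarray(M, dtype=float))
        return (R - R.mean(axis=0)) / R.std(axis=0)  # constant columns yield NaN
    A, B = ranked_z(species), ranked_z(metabolites)
    return A.T @ B / A.shape[0]  # Pearson correlation of the ranks
```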
We first performed an amino acid homology search (blastp, DIAMOND v0.9.24.125) of the Human Microbiome Project (HMP) reference gut microbial isolate genomes (UniRef90 version 2019_01) using an e-value threshold of 10, >25% identity, “sensitive” mode, and 107 input microbial protein sequences (Table 4). Two of these were arylamine N-acetyltransferases (NATs) from Salmonella enterica serovar Typhimurium LT2 (nhoA; UniProt accession Q00267) and Pseudomonas aeruginosa (UniProt accession Q9HUY3), both previously shown to metabolize 5-ASA.20,21 The remaining 105 sequences (the majority of which were NATs or N-hydroxyarylamine O-acetyltransferases) were generated by “Similarity to Identify MicrobioMe Enzymatic Reactions” (SIMMER),58 a computational tool that generates informed hypotheses about which microbial enzymes metabolize a given xenobiotic, drawing in part on prior experimental knowledge. For this tool, we provided the SMILES substrate input as:
  
  C1=CC(=C(C=C1N)C(=O)O)O.CC(=O)SCCNC(=O)CCNC(=O)[C@@H](C(C)(C)COP(=O)(O)OP(=O)(O)OC[C@@H]1[C@H]([C@H]([C@@H](O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)O
  
  And the SMILES product as:
  
  CC(=O)NC1=CC(=C(C=C1)O)C(=O)O.CC(C)(COP(=O)(O)OP(=O)(O)OC[C@@H]1[C@H]([C@H]([C@@H](O1)N2C=NC3=C(N=CN=C32)N)O)OP(=O)(O)O)[C@H](C(=O)NCCC(=O)NCCS)O
We requested that all alignments found be reported, and then overlapped the resulting list of 5,206 unique UniRef90 IDs with the full catalog of UniRef90 IDs in the HMP2 MGX and MTX datasets.
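The filtering and overlap steps of this homology search can be sketched in Python, assuming the alignments were written in DIAMOND's BLAST-style tabular format (format 6) with its default twelve columns and that subject identifiers are UniRef90 IDs. The thresholds mirror the text (e-value of 10, >25% identity); the function names and toy records are hypothetical.

```python
import csv
import io

# Default column order of BLAST-style tabular output (format 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def passing_subjects(tsv_text, max_evalue=10.0, min_pident=25.0):
    """Subject IDs with at least one alignment meeting both thresholds."""
    hits = set()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) <= max_evalue and float(rec["pident"]) > min_pident:
            hits.add(rec["sseqid"])
    return hits


def overlap_with_catalog(hit_ids, catalog_ids):
    """UniRef90 IDs found both by the homology search and in the MGX/MTX catalog."""
    return sorted(set(hit_ids) & set(catalog_ids))
```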
In our MTX-based strategy, we used differential abundance testing to identify significantly overexpressed acetyltransferases based on paired MTX and MGX profiles. After an arcsine square-root transformation was applied to both the MTX and MGX datasets, abundances were fit with the following per-feature linear mixed-effects model, which adjusts for DNA copy number, accommodating biological and technical zero values while controlling for underlying DNA levels (27):
  
    
  
as prior studies have shown that RNA levels can be a function of mere DNA abundance.59 Fitting was performed with the lme function from the R package “nlme” (v3.1-149),60 and the significance of each association was assessed using the Wald test. Nominal P values were adjusted for multiple hypothesis testing with a target FDR q of 0.25. To reduce the effect of zero inflation in microbiome data, features with a relative abundance of less than 1e-8 in at least 10% of samples were excluded. Among the significant hits, we then sought to further characterize those with putative acyl transfer function, defined as containing an acetyltransferase domain annotated in UniProt.
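Three numerical steps described above (the arcsine square-root transform, the zero-inflation filter, and adjustment of nominal P values to q-values for the FDR threshold of 0.25) can be sketched in Python. The mixed-effects fit itself (nlme's lme with the Wald test) is not reproduced. The sketch assumes the Benjamini-Hochberg procedure for FDR control, which the text does not name; the zero-inflation filter implements the exclusion rule literally as worded in the text, and all names are hypothetical.

```python
import numpy as np


def arcsine_sqrt(rel_abund):
    """Arcsine square-root transform of relative abundances in [0, 1]."""
    return np.arcsin(np.sqrt(np.clip(np.asarray(rel_abund, dtype=float), 0.0, 1.0)))


def zero_inflation_keep(X, floor=1e-8, frac=0.10):
    """Boolean mask of features (columns) to keep: a feature is excluded when its
    relative abundance is below `floor` in at least `frac` of samples (rows)."""
    below = np.asarray(X, dtype=float) < floor
    return below.mean(axis=0) < frac


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values); assumed FDR method."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

Features with `bh_qvalues(p) < 0.25` would then correspond to the significant hits carried forward to the UniProt domain screen.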
In our MBX-based strategy, we dichotomized each metatranscriptomic gene family as present (relative abundance >0) or absent. Next, we classified stool samples as N-acetyl 5-ASA high (>median) or N-acetyl 5-ASA low/negative (<median, including undetectable levels). We then calculated the sensitivity and specificity with which the presence of each metatranscriptomic gene cluster identified N-acetyl 5-ASA-high samples. After retaining gene clusters with greater than 50% sensitivity and specificity (as used previously),61 we further characterized those with putative acyl transfer function, defined as containing an acetyltransferase domain annotated in UniProt.62 For continuous correlations, the abundances of the seven additional acetyltransferase gene clusters were arcsine square-root transformed and associated with log-transformed N-acetyl 5-ASA by linear regression, with an FDR q<0.25 threshold applied (95% CI shown in
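The sensitivity/specificity screen can be sketched in Python as follows. The sketch assumes undetectable N-acetyl 5-ASA is recorded as 0 and, because the text does not say how values exactly at the median are handled, places them in the low/negative group; the names and toy values are hypothetical.

```python
import numpy as np


def sens_spec(gene_abundance, n_acetyl):
    """Sensitivity and specificity of gene presence (>0) for N-acetyl 5-ASA-high samples."""
    present = np.asarray(gene_abundance, dtype=float) > 0
    n_acetyl = np.asarray(n_acetyl, dtype=float)  # undetectable assumed coded as 0
    high = n_acetyl > np.median(n_acetyl)         # ties at the median fall in "low"
    tp = np.sum(present & high)
    fn = np.sum(~present & high)
    tn = np.sum(~present & ~high)
    fp = np.sum(present & ~high)
    return tp / (tp + fn), tn / (tn + fp)


def passes_screen(gene_abundance, n_acetyl, threshold=0.5):
    """Keep a gene cluster only if both sensitivity and specificity exceed the threshold."""
    sens, spec = sens_spec(gene_abundance, n_acetyl)
    return bool(sens > threshold and spec > threshold)
```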
The pooled twelve amino acid sequences were aligned with ClustalW v2 (default parameters).63 Candidate enzymes were then grouped in an average distance tree using neighbor-joining with BLOSUM62 (Jalview v2.11.1.3)64 and mapped to the InterPro database (v85.0)65 to identify protein superfamilies. Amino acid residues in the multiple sequence alignment were colored by ClustalX v2.63 The taxonomic lineage of each candidate enzyme was then determined using the UniProt database.62
For gene context analysis, we downloaded the assembled reference genomes of the strains of interest from the NCBI Assembly database66 in July 2019. We then queried the proteins of interest (our candidate UniRef90s) against the proteins encoded by the reference genomes using DIAMOND v0.9.24,67 and identified homologous loci by requiring ≥90% identity and ≥80% coverage. Next, we examined up to 10 genes upstream and downstream of these acetyltransferase gene loci to explore their genomic neighborhoods. The UniRef accessions of these neighboring genes (from NCBI) were then mapped to UniProt62 to assess their approximate functions and, in particular, were screened for any annotations indicating transcriptional regulators and/or promoters.
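The locus-matching and neighborhood steps can be sketched in Python under simplifying assumptions: genes on a contig are given in genomic order, strand and contig circularity are ignored, and annotations are free-text strings screened by keyword. The thresholds (≥90% identity, ≥80% coverage, up to 10 flanking genes) come from the text; the data structures, keywords, and names are hypothetical.

```python
REGULATOR_KEYWORDS = ("transcriptional regulator", "promoter")


def homologous_loci(hits, min_identity=90.0, min_coverage=80.0):
    """Genes whose DIAMOND hit meets both identity and coverage thresholds.
    `hits` is a list of dicts with 'gene', 'identity' and 'coverage' (percent)."""
    return [h["gene"] for h in hits
            if h["identity"] >= min_identity and h["coverage"] >= min_coverage]


def neighborhood(ordered_genes, locus, flank=10):
    """Up to `flank` genes upstream and downstream of `locus` on the same contig."""
    i = ordered_genes.index(locus)
    return ordered_genes[max(0, i - flank):i] + ordered_genes[i + 1:i + 1 + flank]


def flag_regulators(genes, annotations):
    """Neighbors whose annotation mentions a transcriptional regulator or promoter."""
    return [g for g in genes
            if any(k in annotations.get(g, "").lower() for k in REGULATOR_KEYWORDS)]
```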
Finally, we used previously defined phylogroups for F. prausnitzii,33 which were generated from a comparative analysis of genomes in which a phylogenetic tree of the family Ruminococcaceae was constructed from concatenated alignments of 245 highly conserved proteins. Within this hierarchy, F. prausnitzii was represented as a monophyletic group of strains, within which there were three clear and statistically significant splits into species/subspecies-level groups (
We examined the association of candidate acetyltransferases with risk of disease relapse among a convenience sample of 39 “ever users” of 5-ASA, defined as any use of 5-ASA in the cohort. Participants with missing corticosteroid data were excluded. First, we discretized each of the 12 candidate protein families as present/absent based on per-sample MGX abundance >0. We estimated multivariable odds ratios (ORs) for relapse with 95% confidence intervals (CIs) using logistic regression, adjusting for potential confounders including smoking status (never vs. ever), age at consent (continuous), disease type (CD vs. UC), and host acetylation phenotype (defined below). We also tested for trend by treating each protein as an independent risk factor and summing them into an additive score. In sensitivity analyses, we tested the association between presence of the E. coli NAT identified in our homology search and steroid use, as well as the association between these acetyltransferases and risk of steroid use among 5-ASA non-users. Finally, as shown above in the results, our results were similar when we used fixed-effects models and mixed-effects models (including a random-effects term for individual effects). In light of this similarity, and because steroid use can genuinely occur multiple times within a year, we focused on the results of the fixed-effects models.
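The odds-ratio estimation can be illustrated with a minimal Python sketch of maximum-likelihood logistic regression fitted by Newton-Raphson (iteratively reweighted least squares), with Wald 95% CIs obtained by exponentiating β ± 1.96·SE. It is an illustrative stand-in rather than the software used in the study; the covariate layout, the additive-score helper, and the toy data are hypothetical. With a single binary exposure and an intercept, the fitted OR reduces to the crude cross-product ratio, which makes the sketch easy to check.

```python
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Logistic regression via Newton-Raphson; returns (coefficients, standard errors).
    X holds exposure and confounder columns; an intercept column is prepended."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (mu * (1 - mu))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (mu * (1 - mu))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def odds_ratio_ci(beta, se, z=1.959964):
    """Exponentiate a log-odds coefficient and its Wald 95% CI."""
    return np.exp(beta), np.exp(beta - z * se), np.exp(beta + z * se)


def additive_score(presence):
    """Number of candidate protein families present (MGX abundance > 0) in a sample."""
    return int(sum(1 for v in presence.values() if v > 0))
```

For the trend test, the additive score would replace the individual presence indicators as the exposure column, alongside the same confounders.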
We validated the association between metagenomic carriage of gut microbial acetyltransferases with risk of 5-ASA treatment failure in an ongoing independent cohort, Study of a Prospective Adult Research Cohort with IBD (SPARC IBD), from the IBD Plexus platform of the Crohn's & Colitis Foundation. As previously published,68,69 3,029+ well-phenotyped adult patients with UC, CD, or unclassified IBD have been recruited since 2015 from 17 major academic medical centers with no overlap with those in the IBDMDB: Baylor College of Medicine, Baylor University Medical Center, Brigham & Women's Hospital, Indiana University Hospital, Mayo Clinic, Medical College of Wisconsin, NYU Langone Medical Center, University of Alabama Medicine, University of Cincinnati Medical Center, University of Chicago, University of Maryland, University of Michigan, University of Pennsylvania, University of Pittsburgh, University of Wisconsin, Vanderbilt University Medical Center, Washington University School of Medicine. The study protocol was approved by the University of Pennsylvania's institutional review board. All participants provided informed consent.
The study population consisted of 240 patients enrolled in SPARC IBD between 2016 and 2020 who 1) provided a stool sample at time of consent and 2) were on 5-ASA at cohort entry. To ensure a prospective design, participants who were on steroids at baseline were excluded from this analysis (n=26). Additionally, we excluded participants who withdrew consent from the study after enrollment (n=6). Thus, the remaining 208 individuals formed the analytic sample. Follow-up was through Oct. 19, 2021.
At the time of consent, samples were collected by the patient at home and shipped directly to the biobank through a courier service as previously described.69 A minority of individuals (n=33) provided more than one sample during the cohort. Although the SPARC IBD protocol does not include a prespecified schedule for repeated sampling after enrollment, participants are able to provide additional biosamples at time of usual care sigmoidoscopy or colonoscopy. Additional stool samples may also be obtained approximately 3 months after a change in therapy if the patient has a follow-up office visit during that time.
At enrollment, current medications were captured through the SPARC electronic data capture tool. Subsequent medication data in SPARC IBD were collected from an IBD Smartform and, more broadly, from the Epic electronic health record system. In addition, electronic surveys were delivered to patients every 3 months to capture IBD-related symptoms and current IBD therapies, tracking patient-reported disease activity between office visits and further confirming medications. Use of 5-ASA was defined identically as in the IBDMDB, as use of any of the following, including bonded formulations: oral mesalamine, mesalamine suppository, mesalamine enema, olsalazine, balsalazide, and sulfasalazine. Use of corticosteroids was defined identically as in the IBDMDB, as use of any of the following: budesonide, prednisone, or methylprednisolone.
Genomic DNA extraction was performed with the MagAttract PowerSoil kit (Qiagen, Cat #27000-4-EP) following the manufacturer's instructions. Total nucleic acids were extracted using the Qiagen MagAttract PowerMicrobiome kit (Qiagen, Catalog No. 27500-4-EP).
Metagenomic sequencing was performed by Diversigen. Genomic DNA was prepared into libraries for sequencing by the Nextera DNA Flex library preparation kit (Illumina, Catalog No. 20018705) with Nextera Index Kit (Illumina, Catalog No. 20027213). Library size estimation and quantification were determined with the fragment analyzer (Advanced Analytical Technologies, Inc.) electrophoresis system. The prepared libraries were sequenced via the NovaSeq 6000, 2×150 bp sequencing platform (Illumina).
Data from the validation cohorts were not available to us while developing the original prediction models in the HMP2. Sequencing data from SPARC IBD were processed through the same analysis pipeline (bioBakery 3) as in the IBDMDB to extract the metagenomic data on which our prediction models were based. Only microbial acetyltransferases that were significantly associated with steroid use in our discovery cohort (R6CZ24, T5S060, R5CY66, C7H1G6) were considered for further analysis. As in the IBDMDB, we discretized each of the four candidate protein families as present/absent based on per-sample abundance >0. To mitigate sampling bias, samples provided fewer than 30 days after the participant's previous sample were not included. Similarly, to avoid sampling bias, participants were censored when they reported using steroids.
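The sample-spacing rule and the per-sample carriage count can be sketched in Python. The sketch assumes that "the previous" sample means the participant's most recent retained sample, and it dichotomizes carriage of the four candidate families into 3-4 versus 0-2, the grouping used as the primary exposure in this analysis. Function names and toy values are hypothetical, and censoring at steroid use is not shown.

```python
from datetime import date
from typing import List

# Protein families carried forward from the discovery cohort (from the text).
CANDIDATES = ("R6CZ24", "T5S060", "R5CY66", "C7H1G6")


def space_samples(sample_dates: List[date], min_gap_days: int = 30) -> List[date]:
    """Keep samples in time order, dropping any taken < min_gap_days after the
    last retained sample (assumed reading of 'after the previous')."""
    kept: List[date] = []
    for d in sorted(sample_dates):
        if not kept or (d - kept[-1]).days >= min_gap_days:
            kept.append(d)
    return kept


def carriage_group(abundances: dict) -> str:
    """Count candidate families with abundance > 0 and dichotomize as '3-4' vs '0-2'."""
    n_present = sum(1 for acc in CANDIDATES if abundances.get(acc, 0) > 0)
    return "3-4" if n_present >= 3 else "0-2"
```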
The primary exposure was defined as metagenomic carriage of 3-4 acetyltransferases compared with 0-2 acetyltransferases, as in the IBDMDB. We were not powered to examine the association between each individual acetyltransferase and risk of steroid use. Given the possibility of correlation among repeated samples from the same participant, we calculated the odds ratios (ORs) and 95% confidence intervals (CIs) of steroid use using multivariable generalized estimating equations in the R (v4.0.1) package “gee” (v4.13),70 with adjustment for age and sex. In a sensitivity analysis, we limited our analysis to the single baseline sample and related metagenomic carriage of these genes to future steroid risk using multivariable logistic regression with adjustment for age, sex, and IBD disease type. Random-effects meta-analysis was performed using the R (v4.0.1) package “metafor” (v1.4-0).71
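The random-effects pooling step can be illustrated with a short Python sketch of inverse-variance meta-analysis of log odds ratios. It uses the closed-form DerSimonian-Laird estimator of between-study variance (τ²), swapped in here for simplicity; metafor's rma() defaults to REML estimation, so results may differ slightly. Function names and inputs are hypothetical.

```python
import numpy as np


def dersimonian_laird(log_or, se, z=1.959964):
    """Random-effects pooled OR (with 95% CI) and tau^2 from per-study log ORs and SEs."""
    y = np.asarray(log_or, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    w = 1.0 / v                                    # fixed-effect (inverse-variance) weights
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)             # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = 1.0 / (v + tau2)                        # random-effects weights
    mu = np.sum(w_re * y) / np.sum(w_re)
    se_mu = np.sqrt(1.0 / np.sum(w_re))
    return {"or": np.exp(mu), "ci": (np.exp(mu - z * se_mu), np.exp(mu + z * se_mu)),
            "se_log_or": se_mu, "tau2": tau2}
```

Inputs would be the log ORs and their standard errors from each cohort-level model, for example the IBDMDB and SPARC IBD estimates.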
Human NAT2 acetylator genotype status has previously been linked with increased risk of bladder cancer, owing to a decreased ability to detoxify carcinogens,72 and with drug-induced liver injury from isoniazid.73 Although studies attempting to link NAT2 acetylation phenotypes with 5-ASA response have not been fruitful,40 many continue to speculate whether NAT2 could play a role in inactivating 5-ASA.41 Therefore, using a simple panel of two SNPs identified in participants' exome sequencing data (rs1041983, rs1801280), previously shown to accurately predict acetylation phenotypes as “fast” vs. “slow”,74 we first calculated the sum of variant alleles in each patient. As in prior work, any participant with two or more variant alleles was categorized as a slow acetylator, inferring that they carried either the NAT2*5 or NAT2*6 haplotype.
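The two-SNP rule can be sketched in Python as below, assuming genotypes are available as VCF-style strings (for example "0/1") and that the ALT allele at each site is the NAT2 variant allele; if the reference allele were the variant at either site, the count would need to be inverted. The threshold of two or more variant alleles comes from the text; the function names are hypothetical.

```python
NAT2_PANEL = ("rs1041983", "rs1801280")


def count_variant_alleles(genotype: str) -> int:
    """Count ALT alleles in a diploid VCF-style genotype such as '0/1' or '1|1'."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        raise ValueError(f"missing or non-diploid genotype: {genotype!r}")
    return sum(allele != "0" for allele in alleles)


def acetylator_status(genotypes: dict) -> str:
    """'slow' if the two-SNP panel carries >= 2 variant alleles in total, otherwise 'fast'."""
    total = sum(count_variant_alleles(genotypes[rs]) for rs in NAT2_PANEL)
    return "slow" if total >= 2 else "fast"
```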
Candidate 5-ASA acetyltransferase genes (as well as a GFP control and the known Salmonella Typhimurium gene) were synthesized (GenScript) with codons optimized for E. coli based on the amino acid sequences in UniProt, and were then cloned into pET28b inducible expression vectors using Gibson assembly (including an in-frame N- or C-terminal polyhistidine sequence). Construct identities were confirmed by DNA sequencing, and the constructs were then transformed into E. coli BL21 strains for expression according to standardized protocols from GenScript. All E. coli expression constructs were grown anaerobically in sealed Hungate tubes in Terrific Broth (VWR) supplemented with 5 mM MgSO4 and kanamycin (50 μg/mL) and were incubated at 37° C. overnight before dilution at 1:100 into fresh media the following morning. These diluted cultures were grown anaerobically in sealed flasks at 37° C. to an OD600 of ~0.6, at which point protein expression was induced by the addition of 100 μM isopropyl β-D-1-thiogalactopyranoside (IPTG, TEKNOVA), followed by aerobic culture overnight at 16° C. The following morning, E. coli were pelleted by centrifugation and then lysed in 20 mM HEPES pH 8.0 buffer containing 30 mM imidazole and 300 mM NaCl, supplemented with 0.5% octyl-β-D-thioglucopyranoside (Chem-Impex), 0.5 mg/mL lysozyme (Sigma), and SIGMAFAST protease inhibitor cocktail (Sigma). After lysis and clarification by centrifugation, lysates were incubated for 1 hour at 4° C. with HisPur Cobalt purification beads (Thermo). After incubation, beads were washed with 6 column volumes of 20 mM HEPES pH 8.0 buffer containing 30 mM imidazole and 300 mM NaCl before elution with 1 column volume of 20 mM HEPES pH 8.0 containing 300 mM imidazole and 300 mM NaCl. Eluted protein was then buffer exchanged into 20 mM HEPES pH 8.0 with 300 mM NaCl and 10% glycerol for storage at −80° C. prior to further experiments.
Enzyme concentrations were calculated according to Beer's law, with extinction coefficients calculated by Benchling software from the amino acid sequences of the putative enzymes. Assay mixtures contained 20 mM HEPES pH 7.5 with 50 mM NaCl, 50 μM enzyme for the end-point assay or 5 μM enzyme for promiscuity assays, 1 mM of the specified substrate (5-ASA, 4-ASA, dapsone, isoniazid, procainamide, or hydralazine), and 1 mM of the indicated acyl-CoA (acetyl-CoA, propionyl-CoA, or butyryl-CoA; CoALA Biosciences), supplemented with 1 mM MgCl2 and 1 mM tris(2-carboxyethyl)phosphine (TCEP, Sigma). Reactions were conducted at 37° C. for 6 hours for the initial end-point assay and at room temperature for 1 hour for the promiscuity assay. Quenching/extraction for the promiscuity assay was performed with equal volumes of acetonitrile and methanol. Quenching/extraction for the end-point assay used one part sample to nine parts extraction mix (75% acetonitrile:25% methanol v/v with 10 μM 1,2-13C2-taurine from Cambridge Isotopes as an internal standard for quantification). Samples were vortexed, cooled to −20° C., and then centrifuged prior to LC-MS analysis.
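The Beer's law calculation, c = A / (ε·l), and the dilution to assay concentration can be illustrated with a short Python sketch. It assumes absorbance was read at 280 nm (the wavelength at which sequence-based extinction coefficients are usually defined) in a 1 cm path, with ε in M⁻¹ cm⁻¹; the function names and example numbers are hypothetical.

```python
def protein_concentration_uM(absorbance, ext_coeff, path_cm=1.0, dilution=1.0):
    """Beer's law: c = A / (epsilon * l), returned in micromolar.
    ext_coeff in M^-1 cm^-1; `dilution` corrects for a diluted reading."""
    return absorbance / (ext_coeff * path_cm) * dilution * 1e6


def stock_volume_uL(stock_uM, final_uM, final_volume_uL):
    """Volume of enzyme stock needed for a target concentration (C1*V1 = C2*V2)."""
    if final_uM > stock_uM:
        raise ValueError("stock is less concentrated than the target")
    return final_uM * final_volume_uL / stock_uM
```

For example, an A280 of 0.5 with ε = 25,000 M⁻¹ cm⁻¹ corresponds to 20 μM, and 25 μL of a 200 μM stock gives 50 μM in a 100 μL reaction.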
For the kinetic assay, assay conditions were the same as above (20 mM HEPES pH 7.5 with 50 mM NaCl, 1 mM MgCl2, and 1 mM TCEP, and 5 μM enzyme). Concentration of 5-ASA substrate was varied as indicated with 1 mM acetyl-CoA held constant. Reactions were carried out at 37° C. for 0, 10, 20, 30, 45, 60, 90, and 120 min in triplicate and quenched/extracted with one part sample and nine-parts extraction mix (75% acetonitrile: 25% methanol v/v with 10 μM 1,2-13C2-taurine from Cambridge Isotopes as an internal standard for quantification). Samples were vortexed and cooled to −20° C. and then centrifuged prior to LC-MS analysis.
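One way to turn these time courses into kinetic constants is sketched below in Python: an initial rate is taken as the least-squares slope of product versus time over the early (assumed linear) time points, and Km and Vmax are then estimated with the Hanes-Woolf linearization, [S]/v = [S]/Vmax + Km/Vmax. The text does not state how the kinetic data were fitted, so the linearization (a stand-in for nonlinear Michaelis-Menten regression), the number of early points, and the synthetic check data are assumptions.

```python
import numpy as np


def initial_rate(times_min, product_uM, n_early=None):
    """Least-squares slope of product vs. time (uM/min) over the first n_early points."""
    t = np.asarray(times_min, dtype=float)[:n_early]
    p = np.asarray(product_uM, dtype=float)[:n_early]
    slope, _intercept = np.polyfit(t, p, 1)
    return slope


def hanes_woolf(substrate_uM, rates):
    """Estimate (Km, Vmax) by regressing [S]/v on [S]: slope = 1/Vmax, intercept = Km/Vmax."""
    s = np.asarray(substrate_uM, dtype=float)
    v = np.asarray(rates, dtype=float)
    slope, intercept = np.polyfit(s, s / v, 1)
    vmax = 1.0 / slope
    return intercept * vmax, vmax
```

Using the time points listed above (0 to 120 min), `initial_rate(times, product, n_early=4)` would restrict the slope to the first 30 minutes; the appropriate window depends on where product formation remains linear.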
For the promiscuity assays, mass spectrometry analyses were conducted using an LC-MS system composed of an Agilent 1260 Infinity HPLC (Agilent Technologies) coupled to a 6530 Accurate-Mass QTOF LC-MS (Agilent Technologies). Samples were injected onto a 150×3 mm Hypersil GOLD aQ column (Thermo Scientific) at 30° C. The column was eluted isocratically at a flow rate of 500 μL/min with 99% mobile phase A (0.1% formic acid in water) for two minutes, followed by a 16-minute linear gradient to 99% mobile phase B (acetonitrile with 0.1% formic acid). This ratio was held for 5 minutes before returning to 99% mobile phase A over 1 minute and continuing at that ratio for an additional 6 minutes. MS analyses were performed in negative ion mode (N-acetyl-5-ASA, N-propionyl-5-ASA, N-butyryl-5-ASA, and N-acetyl-4-ASA) and positive ion mode (N-acetyldapsone, N-acetylprocainamide, N-acetylhydralazine, and N′-acetylisoniazid) using electrospray ionization, full-scan MS acquisition over m/z 100 to 3000, a resolution setting of 10,000, and centroided masses. Other MS settings were as follows: heater temperature 300° C., ESI nebulizer 35 psi, spray voltage 3.5 kV, 2 spectra/s. The identities of N-acetyl-5-ASA (Cayman Chemical, Ann Arbor, MI) and N-acetyl-4-ASA (Santa Cruz Biotechnology, Dallas, TX) were confirmed using authentic standards. The remaining identities were inferred from calculated masses with a mass detection tolerance of 10 ppm. Raw LC-MS data were analyzed using Agilent MassHunter Qualitative Analysis 10.0 software.
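Identity assignment by calculated mass at 10 ppm can be illustrated with a Python sketch that computes a monoisotopic mass from a molecular formula, converts it to the singly charged [M-H]- or [M+H]+ ion, and tests the ppm error of an observed m/z. The element masses are standard monoisotopic values; the formula parser handles only simple formulas without parentheses or charges, and the function names are hypothetical.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes, plus the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.007276466812


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of a simple formula such as 'C9H9NO4'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ion_mz(formula, mode):
    """m/z of the singly charged [M-H]- ('negative') or [M+H]+ ('positive') ion."""
    if mode not in ("negative", "positive"):
        raise ValueError("mode must be 'negative' or 'positive'")
    shift = -PROTON if mode == "negative" else PROTON
    return monoisotopic_mass(formula) + shift


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm
```

For N-acetyl 5-ASA (C9H9NO4), the calculated [M-H]- ion is about m/z 194.046, consistent with the 194 precursor used in the MRM transition for the end-point and kinetic assays.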
For end-point and kinetic assays, mass spectrometry analyses were conducted using an LC-MS system composed of an Agilent 1290 Infinity II UHPLC (capable of column switching) coupled to an Agilent 6470A Triple Quadrupole LC/MS. Samples were injected onto an InfinityLab Poroshell 120 HILIC column (2.1×100 mm, 2.7 μm) at 25° C. The column was eluted isocratically at a flow rate of 600 μL/min with 5% mobile phase A (10 mM ammonium formate with 0.1% formic acid in water) for 18 seconds, followed by a 132-second linear gradient to 60% mobile phase B (acetonitrile with 0.1% formic acid). This was followed by a 3-second gradient to 40% mobile phase B, and then by a 27-second gradient returning to 5% mobile phase A at a flow rate of 1200 μL/min. This flow rate and ratio were held for an additional 48 seconds. Additional column equilibration was carried out on a secondary pump for 117 seconds at 5% mobile phase A and a flow rate of 1000 μL/min. MS was conducted in negative ion mode using electrospray ionization. Data were collected via MRM (N-acetyl 5-ASA MRM 194->150, 1,2-13C2-taurine MRM 125.9->79.9). Other MS settings were as follows: heater temperature 300° C., ESI nebulizer 45 psi, spray voltage 3.5 kV, acquisition time 100 ms/spectrum. Raw LC-MS data were analyzed using Agilent MassHunter Quantitative Analysis Version 10.1 software. For absolute quantification, all samples were normalized to the taurine internal standard, and concentrations were calculated against a standard curve.
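Absolute quantification against the taurine internal standard can be sketched in Python: each analyte peak area is divided by the internal-standard area, an unweighted linear standard curve is fitted to the calibrator ratios, and sample ratios are back-calculated. The linear unweighted fit, the optional 10-fold factor (applicable only if calibrators were prepared directly in extraction mix, given the one part sample to nine parts mix extraction), and the example values are assumptions rather than details from the text.

```python
import numpy as np


def response_ratio(analyte_area, istd_area):
    """Analyte peak area normalized to the internal-standard peak area."""
    return np.asarray(analyte_area, dtype=float) / np.asarray(istd_area, dtype=float)


def fit_standard_curve(std_conc_uM, std_ratios):
    """Unweighted linear calibration: ratio = slope * concentration + intercept."""
    slope, intercept = np.polyfit(np.asarray(std_conc_uM, dtype=float),
                                  np.asarray(std_ratios, dtype=float), 1)
    return slope, intercept


def quantify(ratio, slope, intercept, dilution_factor=1.0):
    """Back-calculate concentration; use dilution_factor=10 only if calibrators were
    prepared directly in extraction mix (1 part sample : 9 parts mix)."""
    return (np.asarray(ratio, dtype=float) - intercept) / slope * dilution_factor
```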
  Oscillibacter sp. strain KLE 1745 was obtained through BEI Resources, National Institute of Allergy and Infectious Diseases (NIAID), NIH, as part of the Human Microbiome Project. This strain was cultured anaerobically, directly from freezer stocks, for three days in Reinforced Clostridial Medium (BD Difco) at 37° C. before 5-ASA was added to a final concentration of 1 mM (from a 100 mM stock in DMSO, sterile filtered through a 0.22 μm filter) for an additional 24 hours prior to harvesting. Spent medium was centrifuged at 10,000×g for 3 minutes to pellet bacteria and cell debris. Ten microliters of medium were then extracted with 90 μL of acetonitrile:methanol (75%:25% v/v) and measured via LC-QTOF as described elsewhere in the methods section.
The C-terminally His-tagged (non-cleavable) construct of the thiolase from Firmicutes CAG:176 (FcTHL) was overexpressed in E. coli BL21 (DE3) and purified using affinity chromatography and size-exclusion chromatography. Briefly, cells were grown at 37° C. in TB medium containing 50 μg/ml kanamycin to an OD of 0.8, cooled to 17° C., induced with 500 μM isopropyl β-D-1-thiogalactopyranoside (IPTG), incubated overnight at 17° C., collected by centrifugation, and stored at −80° C. Cell pellets were lysed in buffer A (25 mM HEPES, pH 7.5, 500 mM NaCl, 0.5 mM TCEP, and 20 mM imidazole) using a Microfluidizer (Microfluidics), and the resulting lysate was centrifuged at 30,000 g for 40 min. Ni-NTA beads (Qiagen) were mixed with the cleared lysate for 30 min and washed with buffer A. Beads were transferred to an FPLC-compatible column, and the bound protein was washed further with 10 column volumes of buffer A and eluted with buffer B (25 mM HEPES, pH 7.5, 500 mM NaCl, 0.5 mM TCEP, and 400 mM imidazole). The eluted sample was concentrated and further purified on a Superdex 200 16/600 column (Cytiva) in buffer C (20 mM HEPES, pH 7.5, 200 mM NaCl, and 10 μM TCEP). FcTHL-containing fractions were concentrated to ~50 mg/mL and stored at −80° C.
FcTHL at 800 μM was crystallized in 100 mM sodium acetate, pH 4.9, 55% MPD, and 20 mM CaCl2 by sitting-drop vapor diffusion at 20° C. Crystals were transferred briefly into crystallization buffer containing 25% glycerol prior to flash-freezing in liquid nitrogen.
Diffraction data were collected at beamline NE-CAT 24-ID-E at the Advanced Photon Source (Argonne National Laboratory). Data sets were integrated and scaled using XDS.75 Structures were solved by molecular replacement using the program Phaser with PDB entry 4XL2 as the search model. Iterative manual model building and refinement using Phenix76 and Coot77 led to a model with excellent statistics.
Protein structures were visualized in ChimeraX v1.15.78 The predicted structure was then overlaid on monomers from PDB entries 1DM3 and 1E2T using the MatchMaker tool (default settings). To compare active sites, residues were pre-specified and then overlaid using the match tool (default settings), which performs a least-squares fit of the specified active-site atoms, first moving one set of atoms onto the other and then applying the same transformation to the remainder of the model containing those atoms.
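The least-squares superposition performed by the match step can be illustrated with a Python sketch of the Kabsch algorithm: it finds the rotation and translation that minimize the RMSD between paired active-site atoms, then applies that same transform to every atom of the moving model. This is a generic implementation written for illustration, not ChimeraX's code; it assumes the two atom sets are already paired one-to-one, and the coordinates in the check are synthetic.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation R and centroids that best superimpose paired Nx3 `mobile` atoms onto `target`."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)                # 3x3 covariance of centered coordinates
    U, _s, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, p0, q0


def apply_transform(coords, R, p0, q0):
    """Move any coordinates (e.g. the whole model) with the fitted transform."""
    return (np.asarray(coords, dtype=float) - p0) @ R.T + q0


def rmsd(A, B):
    """Root-mean-square deviation between paired Nx3 coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((np.asarray(A) - np.asarray(B)) ** 2, axis=1))))
```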
a Bonded 5-ASA formulations include sulfasalazine, balsalazide, and olsalazine.
b As previously defined in the HMP2 Methods.22
              aIBD-undifferentiated
          
        
      
    
  
  
    
      
        
        
          
            
          
        
        
          
            
          
          
            
          
          
            
          
          
            
          
          
            
          
          
            
          
          
            
          
        
      
      
        
        
        
        
        
        
          
            
            
            
            
            
          
          
            
            
          
        
      
      
        
        
        
        
        
        
          
            
            
            
            
            
          
          
            
            
            
            
            
          
          
            
            
            
            
            
          
          
            
            
            
            
            
          
          
            
            
          
        
      
    
  
We identified and biochemically validated gut microbial enzymes capable of generating the clinically ineffective 5-ASA derivative, N-acetyl 5-ASA, and we have shown that the presence of these microbial acetyltransferase genes in gut metagenomes is associated with an increased risk for 5-ASA treatment failure. These findings were driven and validated by metagenomics (MGX), metatranscriptomics (MTX), and metabolomics (MBX) from >1,000 stool samples collected over a one-year period from patients with IBD and controls,22 as well as subsequent phylogenetics, heterologous expression, and chemical characterization of the target proteins (which would otherwise have been annotated to a sequence superfamily canonically involved in energy storage). The process is generalizable to other endpoints, chemicals, and microbiome multi-omics, improving our ability to understand gut microbial metabolism of drugs in patients and, therefore, opening new avenues for targeted or adjuvant therapies.
These gut microbial enzymes are human gut microbial acetyltransferases capable of converting 5-ASA to the clinically ineffective N-acetyl 5-ASA. The presence of any of four acetyltransferase genes in stool samples was significantly associated with an increased risk of steroid initiation: three from the thiolase superfamily (R5CY66, odds ratio [OR] 2.88, 95% confidence interval [CI] 1.66-5.00; T5S060, OR 3.24, 95% CI 1.63-6.42; and R6CZ24, the Firmicutes CAG:176 enzyme, OR 2.58, 95% CI 1.40-4.77) and one from the acyl-CoA superfamily (C7H1G6, OR 2.81, 95% CI 1.68-4.68). Accordingly, a method of determining whether a patient diagnosed with IBD will respond to 5-ASA therapy is provided. The method includes detecting, in the patient's gut microbiome, bacteria carrying three or more of these genes: R5CY66, T5S060, R6CZ24, and C7H1G6.
The presence of three or more of these genes indicates that the patient diagnosed with IBD will not respond to 5-ASA therapy and that other therapies should be administered, for example biologics (e.g. infliximab, adalimumab, vedolizumab) or immunomodulators (e.g. 6-mercaptopurine, tofacitinib). If fewer than three of the above genes are detected in the patient diagnosed with IBD, the patient should be administered 5-ASA as a first-line treatment according to standard treatment protocols.
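The odds ratios and 95% confidence intervals reported for the four genes can be read on the log scale. If the intervals are Wald-type (an assumption; the text does not say how they were computed), each OR equals the geometric mean of its CI bounds, the standard error of ln(OR) is the log-width of the interval divided by 2 × 1.96, and an approximate two-sided p-value follows from the normal distribution. The minimal Python sketch below performs that arithmetic on the reported values only; the function names are illustrative.

```python
import math

# Odds ratios and 95% CIs as reported in the text: accession -> (OR, lower, upper).
REPORTED = {
    "R5CY66": (2.88, 1.66, 5.00),
    "T5S060": (3.24, 1.63, 6.42),
    "R6CZ24": (2.58, 1.40, 4.77),
    "C7H1G6": (2.81, 1.68, 4.68),
}

Z_975 = 1.959964  # standard-normal quantile for a two-sided 95% interval


def implied_or(lower: float, upper: float) -> float:
    """Geometric midpoint of a CI; equals the OR if the CI is symmetric on the log scale."""
    return math.sqrt(lower * upper)


def implied_se(lower: float, upper: float) -> float:
    """Standard error of ln(OR) implied by a Wald-type 95% CI (assumed interval type)."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def wald_p_value(odds_ratio: float, se: float) -> float:
    """Approximate two-sided p-value for H0: OR = 1 under the normal approximation."""
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))


def summarize(reported):
    """Return (accession, OR, geometric midpoint, SE of ln OR, p) for each gene."""
    rows = []
    for acc, (odds_ratio, lo, hi) in reported.items():
        se = implied_se(lo, hi)
        rows.append((acc, odds_ratio, implied_or(lo, hi), se, wald_p_value(odds_ratio, se)))
    return rows


if __name__ == "__main__":
    for acc, odds_ratio, mid, se, p in summarize(REPORTED):
        print(f"{acc}: OR={odds_ratio:.2f} geo-mid={mid:.2f} SE(lnOR)={se:.3f} p~{p:.1e}")
```

Each reported OR agrees with the geometric midpoint of its interval to within rounding, which is consistent with (though does not prove) log-scale Wald intervals, for example from a logistic model.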
The invention therefore includes a method of determining whether a patient diagnosed with IBD will respond to 5-ASA therapy. In this method, a stool sample is obtained from the patient and analyzed for the presence of microbial acetyltransferase genes capable of converting 5-ASA to the clinically ineffective N-acetyl 5-ASA. If these microbial acetyltransferase genes are not present in the sample, the patient will respond to 5-ASA therapy and such therapy should be administered. If they are present in the sample, the patient will not respond to 5-ASA therapy and a second-line therapy should be administered. The microbial acetyltransferase genes are three or more of the following genes: R5CY66, T5S060, R6CZ24, and C7H1G6.
The amino acid sequences and GenBank accession numbers for each of the four microbial acetyltransferases can be found here:
The nucleotide sequences can also be accessed using links on the uniprot.org site.
The method can be used when the IBD diagnosis is either ulcerative colitis or Crohn's disease. The second-line therapy can be selected from biologics (e.g. adalimumab, vedolizumab, infliximab) or immunomodulators (e.g. 6-mercaptopurine, tofacitinib) used for treating these diseases, and such treatments are administered according to standard methods.
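The decision logic of the method, using the three-of-four threshold and the second-line options described above, can be sketched as follows. This is a minimal illustration assuming that detection has already been reduced to a set of accession identifiers per stool sample; the function names and return strings are hypothetical.

```python
from typing import Iterable

# The four acetyltransferase accessions named in the text.
ACETYLTRANSFERASES = frozenset({"R5CY66", "T5S060", "R6CZ24", "C7H1G6"})

# Threshold stated in the text: three or more of the four genes.
MIN_GENES_FOR_NONRESPONSE = 3


def count_detected(detected: Iterable[str]) -> int:
    """Number of the four acetyltransferase genes among the detected accessions."""
    return len(ACETYLTRANSFERASES.intersection(detected))


def recommend(detected: Iterable[str]) -> str:
    """Map one sample's detected genes to the therapy recommendation described in the text."""
    if count_detected(detected) >= MIN_GENES_FOR_NONRESPONSE:
        return "second-line (e.g. biologic or immunomodulator)"
    return "5-ASA first-line"


if __name__ == "__main__":
    print(recommend({"R5CY66", "T5S060", "R6CZ24"}))  # three of four genes present
    print(recommend({"C7H1G6", "unrelated_gene"}))    # one of four genes present
```

Identifiers outside the four accessions are ignored, so the same function can be fed a full list of detected genes from a sample.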
These microbial genes can be detected with routine assays, including shotgun sequencing of human stool samples (metagenomics) followed by computational analysis to determine the presence or absence of the relevant bacterial genes. The presence of these microbial genes at baseline predicts future loss of response to 5-ASA.
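The computational presence/absence step can be sketched as a threshold on per-sample gene abundances. The text does not specify the profiling pipeline, so this sketch assumes an upstream step has already aligned shotgun reads to the four reference sequences and produced a normalized abundance per gene (hypothetically, reads per kilobase); the `min_abundance` cutoff is likewise an assumed parameter.

```python
from typing import Dict, Set

# Accessions of the four acetyltransferases named in the text.
TARGETS = ("R5CY66", "T5S060", "R6CZ24", "C7H1G6")


def call_presence(abundance: Dict[str, float], min_abundance: float = 0.0) -> Set[str]:
    """Return the target genes whose abundance in one sample exceeds min_abundance.

    `abundance` maps a gene identifier to a normalized abundance produced upstream
    (assumed, e.g. reads per kilobase). Genes missing from the mapping count as absent.
    """
    return {gene for gene in TARGETS if abundance.get(gene, 0.0) > min_abundance}


def presence_table(
    samples: Dict[str, Dict[str, float]], min_abundance: float = 0.0
) -> Dict[str, Set[str]]:
    """Apply call_presence to every sample in a {sample_id: {gene: abundance}} table."""
    return {sid: call_presence(ab, min_abundance) for sid, ab in samples.items()}


if __name__ == "__main__":
    # Synthetic abundances for illustration only; not data from the text.
    samples = {
        "baseline_A": {"R5CY66": 12.4, "T5S060": 3.1, "R6CZ24": 0.8, "C7H1G6": 0.0},
        "baseline_B": {"C7H1G6": 5.2},
    }
    for sid, genes in presence_table(samples, min_abundance=1.0).items():
        print(sid, sorted(genes), len(genes))
```

Raising `min_abundance` above zero trades sensitivity for robustness against a few spurious read alignments; the appropriate value would have to be chosen for the actual pipeline.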
In other methods, qPCR (quantitative polymerase chain reaction) can be employed to identify the presence or absence of the relevant bacterial genes according to standard methods. Primer3 software, which can be accessed at primer3.org/, is employed to design primers highly specific to the above-identified genes encoding the acetyltransferase enzymes. Those primers are then validated in silico: GeneRunner, BLAST, or both can be used to check and confirm their efficiency and specificity. The primers are then further validated experimentally. See, for example, Kumar and Chordia (2015), In Silico PCR Primer Designing and Validation, in Basu C (ed), PCR Primer Design, Methods in Molecular Biology, vol. 1275, Humana Press, New York, NY. Once validated, the primers are used to amplify the four genes set forth above, with detection by TaqMan fluorogenic probes. This can be accomplished by one of skill in the art using standard methods known in the art. See also Barghouthi, A Universal Method for the Identification of Bacteria Based on General PCR Primers, Indian J Microbiol (October-December 2011) 51(4): 430-444; and Kralik and Ricchi, A Basic Guide to Real Time PCR in Microbial Diagnostics: Definitions, Parameters, and Everything, Frontiers in Microbiology, 2 Feb. 2017, Vol. 8, Article 108, doi: 10.3389/fmicb.2017.00108.
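Experimental validation of a TaqMan assay is commonly summarized with a standard curve from a serial dilution of template: fitting Ct against log10 copy number gives a slope, and the standard relation E = 10^(-1/slope) - 1 converts it to amplification efficiency (a slope near -3.32 corresponds to roughly 100%). The pure-Python sketch below assumes a dilution series and Ct values are already in hand; the Ct cutoff of 35 used for a presence call is a hypothetical placeholder, not a value given in the text.

```python
from typing import Optional, Sequence, Tuple


def fit_standard_curve(log10_copies: Sequence[float], ct: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    if n != len(ct) or n < 2:
        raise ValueError("need at least two paired (log10 copies, Ct) points")
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; 1.0 means the amplicon doubles every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def estimate_copies(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate starting template copies."""
    return 10 ** ((ct - intercept) / slope)


def is_detected(ct: Optional[float], ct_cutoff: float = 35.0) -> bool:
    """Call a target present if it amplified (ct is not None) below the assumed cutoff."""
    return ct is not None and ct < ct_cutoff


if __name__ == "__main__":
    # Synthetic, idealized 10-fold dilution series (not data from the text).
    log_copies = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    cts = [38.0 - 3.32 * x for x in log_copies]
    slope, intercept = fit_standard_curve(log_copies, cts)
    print(f"slope={slope:.2f} efficiency={amplification_efficiency(slope):.1%}")
    print(is_detected(27.5), is_detected(None))
```

Keeping the fit in plain Python avoids extra dependencies; in practice the instrument software or a statistics library would usually report the same slope and efficiency.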
  
While the invention has been disclosed in particular embodiments, it will be understood by those skilled in the art that certain substitutions, alterations and/or omissions may be made to the embodiments without departing from the spirit of the invention. Accordingly, the foregoing description is meant to be exemplary only, and should not limit the scope of the invention. All references, scientific articles, patent publications, and any other documents cited herein are hereby incorporated by reference for the substance of their disclosure.
This application claims benefit of U.S. Provisional Application No. 63/596,703 filed Nov. 7, 2023, the contents of which are incorporated by reference.
This invention was made with government support under 5R35CA253185-03 awarded by the National Institutes of Health—National Cancer Institute. The government has certain rights in the invention.
| Application Number | Filing Date | Country |
|---|---|---|
| 63596703 | Nov 2023 | US |