The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled 57123_703_601.xml, created Nov. 2, 2022, which is 363,456 bytes in size. The information in the electronic format of the Sequence Listing is incorporated by reference in its entirety.
Natural products (molecules produced by plants, microbes, and other living things) have been important sources of medicine throughout human history. From herbal remedies to carefully formulated therapeutics, the treatments for many illnesses, including infectious diseases, cancers, and metabolic disorders, have their origins in the natural world. Despite the ubiquity of natural products in modern medicine, the discovery of new natural compounds with therapeutically relevant activities is hampered by their limited abundance and synthetic complexity.
From 1981 to 2014, nearly 50% of all medicines approved for use in the United States by the U.S. Food and Drug Administration (FDA) were natural products, their derivatives, or molecules modeled after them. Of the 459 medicines deemed essential by the World Health Organization (WHO), 202 have natural sources or are derived from naturally-occurring compounds. The tendency for natural products to exert therapeutic effects is often attributed to their origins: molecules made in biological systems are more likely to be biologically active. Considering their important role and proclivity for success in medicine, scientists have exhaustively searched for such molecules.
Terpenoids make up a particularly interesting class of medicinally relevant molecules. These compounds comprise the largest family of natural products (with over 95,000 known structures to date) and account for at least 100 medicines. Found in all kingdoms of life, this family of compounds is produced primarily by terpene synthases. Classically, these enzymes act on prenyl diphosphate substrates of varying lengths, although a small number of synthases have been reported to accept additional substrates. These substrates are produced by prenyltransferases, which condense the five-carbon precursors isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP) into progressively longer diphosphate molecules. From minimally diverse starting materials, terpene synthases can produce, often with remarkable chemoselectivity, a wide range of linear and cyclic structures. Further diversification of these structures is completed by tailoring enzymes, such as cytochrome P450s or dehydrogenases, which introduce heteroatoms and other functional moieties. In nature, functionalized terpenoids play important roles in ecological defense and communication; in medicine, they serve as anti-cancer, anti-inflammatory, and hormone therapies.
Like most medicines, terpenoid-based drugs have often been discovered through serendipity. For example, paclitaxel, an important anti-cancer drug, was discovered in the 1960s by screening over 30,000 extracts of plant/animal material. In modern times, systematic discovery efforts have relied on high-throughput screening (HTS) approaches. HTS typically requires assays that can be miniaturized (<100 μL) and run in parallel with 96-, 384-, or 1536-well plates and liquid handling robotics. While HTS has yielded many important successes, it has several limitations. First, HTS requires highly specialized equipment, development of suitable assays, and infrastructure to efficiently complete experiments. HTS centers, where screens can be carried out at dedicated laboratories with the necessary instruments and expertise, are becoming more common, but screening costs can approach or exceed $1.00/well, leading to costs in the hundreds of thousands of dollars for comprehensive library screens. Additionally, the molecules in natural product libraries are obtained from biological material (e.g., plant matter, soil samples, coral reefs, etc.). Acquiring these samples often requires significant resources, and existing sampling strategies have yielded fewer and fewer novel compounds over time. Diversity-oriented chemical synthesis has successfully produced libraries of many compound classes, but this strategy has been limited to only a few terpenoid families that represent a small fraction of what can be found in nature.
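The cost figure above can be made concrete with a short calculation. The following is a minimal sketch: only the roughly $1.00/well price and the plate formats come from the text, while the library size, replicate count, and function names are hypothetical values chosen for illustration.

```python
import math


def screening_cost(n_compounds, replicates=1, cost_per_well=1.00):
    """Estimated cost in dollars, assuming one well per compound per replicate."""
    return n_compounds * replicates * cost_per_well


def plates_needed(n_wells, wells_per_plate=384):
    """Number of plates required to hold n_wells (no control wells assumed)."""
    return math.ceil(n_wells / wells_per_plate)


# Hypothetical library: 100,000 compounds screened in triplicate.
n_compounds, replicates = 100_000, 3
cost = screening_cost(n_compounds, replicates)
plates = plates_needed(n_compounds * replicates, wells_per_plate=384)
print(f"${cost:,.0f} across {plates} plates")
```

Under these assumed inputs the estimate lands in the hundreds of thousands of dollars, consistent with the range stated above. Control wells and assay development costs are ignored here and would raise the total.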
Early achievements in heterologous biosynthesis of terpenoids laid the foundation for subsequent microbial production systems by elucidating general principles for heterologous terpene biosynthesis and functionalization (e.g., the design of precursor pathways and the effects of enzyme expression levels and solubility) and by developing strategies and tools for carrying out this work (e.g., optimized combinatorial screening, analytical methods, biosensors for rapid production readouts). These efforts, however, spanned nearly a decade and resulted in high-level production of only a handful of terpenoids (most of which were already known to have medicinal value). This rate of productivity is incompatible with pharmaceutical discovery, which requires efficient access to many different molecules.
Combinatorial biosynthesis has emerged as an effective approach for producing structurally varied terpenoids. This strategy exploits the inherent promiscuity of certain enzymes (i.e., their ability to act on more than one substrate) by replacing their genes in a biosynthetic pathway with homologs. Often, these homologs carry out different chemical reactions on the same substrate, yielding different metabolites with minimal pathway manipulation. Combinatorial methods are well-suited for producing diverse terpenoids, as their biosynthetic pathways are inherently modular: most unfunctionalized terpenes can be produced by the activity of just two enzyme classes, prenyltransferases and terpene synthases, acting on IPP and DMAPP.
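The modularity described above can be illustrated by enumerating every pairing of the two enzyme classes. This is a minimal sketch: the prenyltransferase names are hypothetical placeholders, the synthase abbreviations are borrowed from the synthases named elsewhere in this disclosure, and the enumeration makes no claim about which pairings are actually productive in a cell.

```python
from itertools import product

# Hypothetical placeholder names for prenyltransferase homologs.
prenyltransferases = ["PT_1", "PT_2", "PT_3"]
# Synthase abbreviations used elsewhere in this disclosure.
terpene_synthases = ["GHS", "ADS", "ABS", "TXS"]


def enumerate_pathways(pts, synthases):
    """Return one candidate pathway per (prenyltransferase, synthase) pair."""
    return [
        {"prenyltransferase": pt, "terpene_synthase": ts}
        for pt, ts in product(pts, synthases)
    ]


pathways = enumerate_pathways(prenyltransferases, terpene_synthases)
print(len(pathways))   # number of two-enzyme combinations
print(pathways[0])
```

Swapping a single homolog into either list changes the whole row or column of candidate pathways, which is the sense in which a small number of parts yields many variants.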
Despite important achievements in synthetic biology, the use of engineered microbes in high-throughput discovery campaigns remains challenging. Microbial production systems have seen rapid development in the past decade, but advances in efficient functional characterization of biosynthetic compounds have been lacking. Today, a scientist wanting to screen microbially synthesized terpenoids for activity against a disease-relevant enzyme would have to purify them from cell culture in milligram-scale quantities to identify compounds with a desired activity (which may also require protein purification and assay development). This work can be laborious and, because purification is often not parallelizable, greatly reduces screening throughput.
Aspects disclosed herein provide methods of performing multiplexed discovery of bioactive molecules that modulate activity of a target enzyme, the methods comprising: (a) providing a plurality of cells; (b) introducing into each of the plurality of cells a synthetic genetically-encoded system that links expression of a gene of interest to biosynthesis of a bioactive molecule by a cell of the plurality of cells, wherein the synthetic genetically-encoded system encodes: the target enzyme, a gene of interest, a synthase of the bioactive molecule, a ligand, and a receptor specific to the ligand under conditions sufficient for binding of the ligand to the receptor to produce a ligand-receptor pair, wherein the ligand-receptor pair activates transcription of the gene of interest; (c) performing multiplexed sequencing of the plurality of cells; and (d) identifying a subset of the plurality of cells in which the expression of the gene of interest is increased relative to a reference expression level, wherein the reference expression level is obtained from an otherwise identical reference cell that does not comprise a metabolic pathway that produces the bioactive molecule, the ligand or the receptor. In some embodiments, the expression of the gene of interest is increased relative to the reference expression level when the bioactive molecule is present in the cell at concentrations sufficient to modulate the activity of the target enzyme. In some embodiments, modulation of the target enzyme by the bioactive molecule disrupts binding between the receptor and the ligand, thereby reducing transcriptional activation of the gene of interest. In some embodiments, the binding of the ligand to the receptor is phosphorylation dependent. In some embodiments, the plurality of cells are prokaryotic cells. In some embodiments, the prokaryotic cells comprise bacterial cells. In some embodiments, the bioactive molecule comprises a terpenoid. 
In some embodiments, the target enzyme comprises a proteolytic enzyme, a phosphatase, or a kinase. In some embodiments, the phosphatase comprises a tyrosine phosphatase. In some embodiments, the kinase comprises a tyrosine kinase. In some embodiments, the exogenous genetically-encoded system comprises a two-hybrid system encoding the target enzyme, the gene of interest, the ligand, and the receptor. In some embodiments, the synthase comprises a terpene synthase or a nonribosomal peptide synthetase. In some embodiments, the terpene synthase comprises γ-humulene synthase (GHS), amorphadiene synthase (ADS), α-bisabolene synthase (ABS), or taxadiene synthase (TXS). In some embodiments, the terpene synthase comprises an amino acid sequence that is greater than or equal to about 90% identical to any one of SEQ ID NO: 4, 7, 9, 11, 13, 15, 17, 19, or 23. In some embodiments, the ligand comprises a kinase substrate that binds to the receptor in a phosphorylated state, and the receptor comprises a phosphorylated protein binding domain that binds to the kinase substrate when the kinase substrate is phosphorylated. In some embodiments, the ligand is coupled to a subunit of RNA polymerase and the receptor is coupled to a DNA binding domain, or the ligand is coupled to the DNA binding domain and the receptor is coupled to the omega subunit of the RNA polymerase. In some embodiments, the gene of interest encodes a reporter polypeptide comprising a luciferase enzyme, a fluorescent polypeptide, secreted alkaline phosphatase, β-galactosidase, levansucrase, chloramphenicol acetyltransferase (CAT), or a protein that confers antibiotic resistance. In some embodiments, the gene of interest encodes a polymerizing enzyme that, when expressed, binds to a promoter operably linked to a gene encoding the reporter polypeptide to drive expression of the reporter polypeptide.
In some embodiments, the expression of the reporter polypeptide from the gene is greater than an expression of the reporter polypeptide if the reporter polypeptide were encoded by the gene of interest. In some embodiments, the expression of the reporter polypeptide is greater by more than or equal to about 2-fold. In some embodiments, the genetically-encoded system further encodes a metabolic pathway for biosynthesis of the bioactive molecule. In some embodiments, the metabolic pathway is an isoprenoid pathway. In some embodiments, the isoprenoid pathway comprises a mevalonate pathway, a methylerythritol 4-phosphate (MEP) pathway, a deoxyxylulose 5-phosphate (DXP) pathway, or an isopentenol utilization (IUP) pathway. In some embodiments, the multiplexed sequencing comprises long read sequencing. In some embodiments, the synthetic genetically-encoded system comprises one or more molecular barcode sequences that uniquely identify the target enzyme, the synthase, or a combination thereof. In some embodiments, the multiplexed sequencing further comprises performing demultiplexing, thereby assigning each of the one or more molecular barcodes to the target enzyme, the synthase, or the combination thereof, for each cell of the subset of the plurality of cells. In some embodiments, the method further comprises performing multiplexed sequencing of the plurality of cells prior to introducing in (b), wherein the identifying in (d) comprises detecting enrichment of the gene of interest following the introducing in (b).
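The demultiplexing and enrichment steps described above can be sketched in a few lines. This is an illustrative sketch only: it assumes each read begins with a fixed-length barcode that must match exactly, and the barcode sequences, construct labels, pseudocount, and function names are all hypothetical. Practical long-read demultiplexing would likely need error-tolerant barcode matching.

```python
from collections import Counter


def demultiplex(reads, barcode_map, barcode_len):
    """Count reads per construct, keyed by an exact leading barcode."""
    counts = Counter()
    for read in reads:
        label = barcode_map.get(read[:barcode_len])
        if label is not None:  # reads with unknown barcodes are skipped
            counts[label] += 1
    return counts


def enrichment(pre, post, pseudocount=1.0):
    """Fold change in relative abundance after selection versus before."""
    labels = set(pre) | set(post)
    pre_total = sum(pre.values()) + pseudocount * len(labels)
    post_total = sum(post.values()) + pseudocount * len(labels)
    return {
        label: ((post.get(label, 0) + pseudocount) / post_total)
        / ((pre.get(label, 0) + pseudocount) / pre_total)
        for label in labels
    }


# Hypothetical barcodes mapped to (target enzyme, synthase) pairs.
barcode_map = {"ACGT": ("target_A", "ADS"), "TGCA": ("target_A", "GHS")}
pre_reads = ["ACGTAAAA", "ACGTCCCC", "TGCAGGGG", "TGCATTTT", "NNNNAAAA"]
post_reads = ["ACGTAAAA"] * 6 + ["TGCAGGGG"] * 2

pre_counts = demultiplex(pre_reads, barcode_map, barcode_len=4)
post_counts = demultiplex(post_reads, barcode_map, barcode_len=4)
folds = enrichment(pre_counts, post_counts)
for label, fold in sorted(folds.items()):
    print(label, round(fold, 2))
```

A construct whose fold value exceeds 1 became more abundant after selection; the cutoff used to call a hit is a design choice that the text leaves open.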
Aspects disclosed herein provide systems of linking expression of a gene of interest to biosynthesis by the cell of a bioactive molecule that modulates activity of a target enzyme, the systems comprising: one or more nucleic acid molecules encoding a genetically-encoded system that, when expressed in a cell, links expression of a gene of interest to biosynthesis by the cell of a bioactive molecule that modulates activity of a target enzyme, wherein the genetically-encoded system comprises: the target enzyme; a synthase of the bioactive molecule; a ligand; and a receptor specific to the ligand, wherein, (i) the ligand is coupled to a subunit of RNA polymerase and the receptor is coupled to a DNA binding protein, or (ii) the ligand is coupled to the DNA binding protein and the receptor is coupled to the subunit of RNA polymerase, and wherein the one or more nucleic acid molecules comprises: one or more adaptor molecules comprising a sequencing primer binding site; the gene of interest; and a transcription initiation site for the gene of interest comprising: a binding site for the DNA binding protein; and a promoter sequence comprising a binding site for the RNA polymerase. In some embodiments, the system further comprises the cell comprising the one or more nucleic acid molecules. In some embodiments, the cell is a prokaryotic cell. In some embodiments, the prokaryotic cell comprises a bacterial cell. In some embodiments, the cell is isolated. In some embodiments, the bioactive molecule comprises a terpenoid. In some embodiments, the target enzyme comprises a proteolytic enzyme, a phosphatase, or a kinase. In some embodiments, the phosphatase comprises a tyrosine phosphatase. In some embodiments, the kinase comprises a tyrosine kinase. In some embodiments, the subunit of the RNA polymerase comprises an omega subunit of the RNA polymerase. 
In some embodiments, the exogenous genetically-encoded system comprises a two-hybrid system encoding the target enzyme, the ligand, the receptor, and the gene of interest. In some embodiments, the synthase comprises a terpene synthase or a nonribosomal peptide synthetase. In some embodiments, the terpene synthase comprises γ-humulene synthase (GHS), amorphadiene synthase (ADS), α-bisabolene synthase (ABS), or taxadiene synthase (TXS). In some embodiments, the terpene synthase comprises an amino acid sequence that is greater than or equal to about 90% identical to any one of SEQ ID NO: 4, 7, 9, 11, 13, 15, 17, 19, or 23. In some embodiments, the ligand comprises a kinase substrate that binds to the receptor in a phosphorylated state, and the receptor comprises a phosphorylated protein binding domain that binds to the kinase substrate when the kinase substrate is phosphorylated. In some embodiments, the ligand is coupled to a subunit of RNA polymerase and the receptor is coupled to a DNA binding domain, or the ligand is coupled to the DNA binding domain and the receptor is coupled to the subunit of the RNA polymerase. In some embodiments, the gene of interest encodes a reporter polypeptide comprising a luciferase enzyme, a fluorescent polypeptide, secreted alkaline phosphatase, β-galactosidase, levansucrase, chloramphenicol acetyltransferase (CAT), or a protein that confers antibiotic resistance. In some embodiments, the gene of interest encodes a modulator protein that is operably linked to a gene encoding the reporter polypeptide, wherein the modulator protein activates or represses expression of the reporter polypeptide. In some embodiments, the one or more adaptor molecules comprises one or more molecular barcode sequences unique to the target enzyme, the synthase, or the combination thereof. In some embodiments, the genetically-encoded system further encodes a metabolic pathway for biosynthesis of the bioactive molecule.
In some embodiments, the metabolic pathway is an isoprenoid pathway. In some embodiments, the isoprenoid pathway comprises a mevalonate pathway, a methylerythritol 4-phosphate (MEP) pathway, a deoxyxylulose 5-phosphate (DXP) pathway, or an isopentenol utilization (IUP) pathway. In some embodiments, the one or more adaptor molecules further comprises another barcode sequence unique to the metabolic pathway.
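The sequence identity threshold recited above for the terpene synthase can be checked computationally once two sequences are aligned. The sketch below is one common convention and not a definition taken from this disclosure: it assumes the two sequences are already aligned to equal length, counts identical residues over all alignment columns in which at least one sequence has a residue, and treats a residue opposite a gap as a mismatch. The example sequences are made up and are not any of the listed SEQ ID NOs.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a, aligned_b, threshold=90.0):
    """True if identity is at or above the threshold (hypothetical helper)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Made-up ten-residue sequences differing at one position.
print(percent_identity("MKTAYIAKQR", "MKTAYIAKQL"))
print(meets_identity_threshold("MKT-AY", "MKTAAY"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same pair, so the convention should be fixed before comparing against a threshold.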
Aspects disclosed herein provide methods of determining a presence of a bioactive molecule that modulates activity of a target enzyme, the methods comprising: (a) introducing into a cell a synthetic genetically-encoded system that links expression of a gene of interest to biosynthesis of the bioactive molecule by the cell, wherein the synthetic genetically-encoded system encodes: the target enzyme, a gene of interest encoding a modulatory protein that modulates expression of a reporter polypeptide, the reporter polypeptide, a synthase of the bioactive molecule, a ligand, and a receptor specific to the ligand under conditions sufficient for binding of the ligand to the receptor to form a ligand-receptor pair, wherein the ligand-receptor pair activates transcription of the gene of interest; (b) measuring the expression of the reporter polypeptide; and (c) determining the presence of the bioactive molecule in the cell if the expression of the reporter polypeptide is increased or decreased relative to a reference expression level obtained from an otherwise identical reference cell that does not comprise a functional metabolic pathway that produces the bioactive molecule, the ligand or the receptor. In some embodiments, the expression of the reporter polypeptide is increased relative to the reference expression level when the bioactive molecule is present in the cell at concentrations sufficient to modulate the activity of the target enzyme. In some embodiments, the modulatory protein comprises a polymerizing enzyme that activates transcription of the reporter polypeptide. In some embodiments, the expression of the reporter polypeptide is decreased relative to the reference expression level when the bioactive molecule is present in the cell at concentrations sufficient to modulate the activity of the target enzyme. In some embodiments, the modulatory protein comprises a transcriptional repressor that represses transcription of the reporter polypeptide.
In some embodiments, modulation of the target enzyme by the bioactive molecule disrupts binding between the receptor and the ligand, thereby reducing transcriptional activation of the gene of interest. In some embodiments, the binding of the ligand to the receptor is phosphorylation dependent. In some embodiments, the cell is a prokaryotic cell. In some embodiments, the prokaryotic cell is a bacterial cell. In some embodiments, the bioactive molecule comprises a terpenoid. In some embodiments, the target enzyme comprises a proteolytic enzyme, a phosphatase, or a kinase. In some embodiments, the phosphatase comprises a tyrosine phosphatase. In some embodiments, the kinase comprises a tyrosine kinase. In some embodiments, the exogenous genetically-encoded system comprises a two-hybrid system encoding the target enzyme, the ligand, the receptor, and the gene of interest. In some embodiments, the synthase comprises a terpene synthase or a nonribosomal peptide synthetase. In some embodiments, the terpene synthase comprises γ-humulene synthase (GHS), amorphadiene synthase (ADS), α-bisabolene synthase (ABS), or taxadiene synthase (TXS). In some embodiments, the terpene synthase comprises an amino acid sequence that is greater than or equal to about 90% identical to any one of SEQ ID NO: 4, 7, 9, 11, 13, 15, 17, 19, or 23. In some embodiments, the ligand comprises a kinase substrate that binds to the receptor in a phosphorylated state, and the receptor comprises a phosphorylated protein binding domain that binds to the kinase substrate when the kinase substrate is phosphorylated. In some embodiments, the ligand is coupled to a subunit of RNA polymerase and the receptor is coupled to a DNA binding domain, or the ligand is coupled to the DNA binding domain and the receptor is coupled to the subunit of the RNA polymerase.
In some embodiments, the reporter polypeptide comprises a luciferase enzyme, a fluorescent polypeptide, secreted alkaline phosphatase, β-galactosidase, levansucrase, chloramphenicol acetyltransferase (CAT), or a protein that confers antibiotic resistance. In some embodiments, the genetically-encoded system further encodes a metabolic pathway for biosynthesis of the bioactive molecule. In some embodiments, the metabolic pathway is an isoprenoid pathway. In some embodiments, the isoprenoid pathway comprises a mevalonate pathway, a methylerythritol 4-phosphate (MEP) pathway, a deoxyxylulose 5-phosphate (DXP) pathway, or an isopentenol utilization (IUP) pathway.
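The determining step of the method above compares a reporter measurement to a reference level, and the expected direction of change depends on whether the modulatory protein activates or represses the reporter. The following sketch shows one way to encode that comparison. The fold-change cutoff, the function name, and the example readings are hypothetical; the text does not specify a threshold for this comparison.

```python
def bioactive_molecule_detected(sample, reference, expected="increase", min_fold=2.0):
    """Compare reporter expression in a test cell to the reference cell.

    expected="increase" suits a modulatory protein that activates the
    reporter; expected="decrease" suits a transcriptional repressor.
    min_fold is an assumed cutoff, not a value taken from the text.
    """
    if sample <= 0 or reference <= 0:
        raise ValueError("expression levels must be positive")
    fold = sample / reference
    if expected == "increase":
        return fold >= min_fold
    if expected == "decrease":
        return fold <= 1.0 / min_fold
    raise ValueError("expected must be 'increase' or 'decrease'")


# Hypothetical reporter readings in arbitrary units.
print(bioactive_molecule_detected(900.0, 300.0, expected="increase"))
print(bioactive_molecule_detected(100.0, 300.0, expected="decrease"))
print(bioactive_molecule_detected(400.0, 300.0, expected="increase"))
```

In practice the comparison would be made on replicate measurements with an appropriate statistical test rather than on single readings.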
Aspects disclosed herein provide systems of linking expression of a gene of interest to biosynthesis by the cell of a bioactive molecule that modulates activity of a target enzyme, the systems comprising: one or more nucleic acid molecules encoding a genetically-encoded system that, when expressed in a cell, links expression of a gene of interest to biosynthesis by the cell of a bioactive molecule that modulates activity of a target enzyme, wherein the genetically-encoded system comprises: a reporter polypeptide; the target enzyme; a synthase of the bioactive molecule; a ligand; and a receptor specific to the ligand, wherein, (i) the ligand is coupled to a subunit of RNA polymerase and the receptor is coupled to a DNA binding protein, or (ii) the ligand is coupled to the DNA binding protein and the receptor is coupled to the subunit of RNA polymerase, and wherein the one or more nucleic acid molecules comprises: the gene of interest, wherein the gene of interest encodes a modulator protein configured to activate transcription or repress transcription of the reporter polypeptide; and a transcription initiation site for the gene of interest comprising: a binding site for the DNA binding protein; and a promoter sequence comprising a binding site for the RNA polymerase. In some embodiments, the modulatory protein comprises a polymerizing enzyme that activates transcription of the reporter polypeptide. In some embodiments, the modulatory protein comprises a transcriptional repressor that represses transcription of the reporter polypeptide. In some embodiments, the system further comprises the cell comprising the one or more nucleic acid molecules. In some embodiments, the cell is a prokaryotic cell. In some embodiments, the prokaryotic cell comprises a bacterial cell. In some embodiments, the cell is isolated. In some embodiments, the bioactive molecule comprises a terpenoid. 
In some embodiments, the target enzyme comprises a proteolytic enzyme, a phosphatase, or a kinase. In some embodiments, the phosphatase comprises a tyrosine phosphatase. In some embodiments, the kinase comprises a tyrosine kinase. In some embodiments, the subunit of the RNA polymerase comprises an omega subunit of the RNA polymerase. In some embodiments, the exogenous genetically-encoded system comprises a two-hybrid system encoding the target enzyme, the ligand, the receptor, and the gene of interest. In some embodiments, the synthase comprises a terpene synthase or a nonribosomal peptide synthetase. In some embodiments, the terpene synthase comprises γ-humulene synthase (GHS), amorphadiene synthase (ADS), α-bisabolene synthase (ABS), or taxadiene synthase (TXS). In some embodiments, the terpene synthase comprises an amino acid sequence that is greater than or equal to about 90% identical to any one of SEQ ID NO: 4, 7, 9, 11, 13, 15, 17, 19, or 23. In some embodiments, the ligand comprises a kinase substrate that binds to the receptor in a phosphorylated state, and the receptor comprises a phosphorylated protein binding domain that binds to the kinase substrate when the kinase substrate is phosphorylated. In some embodiments, the ligand is coupled to a subunit of RNA polymerase and the receptor is coupled to a DNA binding domain, or the ligand is coupled to the DNA binding domain and the receptor is coupled to the subunit of the RNA polymerase. In some embodiments, the reporter polypeptide comprises a luciferase enzyme, a fluorescent polypeptide, secreted alkaline phosphatase, β-galactosidase, levansucrase, chloramphenicol acetyltransferase (CAT), or a protein that confers antibiotic resistance. In some embodiments, the genetically-encoded system further encodes a metabolic pathway for biosynthesis of the bioactive molecule. In some embodiments, the metabolic pathway is an isoprenoid pathway.
In some embodiments, the isoprenoid pathway comprises a mevalonate pathway, a methylerythritol 4-phosphate (MEP) pathway, a deoxyxylulose 5-phosphate (DXP) pathway, or an isopentenol utilization (IUP) pathway. In some embodiments, the one or more nucleic acid molecules further comprises one or more barcode sequences unique to the metabolic pathway. In some embodiments, the one or more nucleic acid molecules further comprises one or more barcode sequences unique to the synthase, the target enzyme or a combination thereof.
Aspects disclosed herein provide methods of determining a presence of a bioactive molecule that modulates the activity of a target enzyme, the methods comprising: (a) introducing into a cell a synthetic genetically-encoded system that links expression of a gene of interest to biosynthesis of the bioactive molecule by the cell, wherein the synthetic genetically-encoded system encodes the target enzyme comprising: a proteolytic enzyme, the gene of interest, a synthase of the bioactive molecule, a ligand, and a receptor specific to the ligand under conditions sufficient for binding of the ligand to the receptor to form a ligand-receptor pair, wherein the ligand-receptor pair comprises a cleavage site recognized by the proteolytic enzyme, and activates transcription of the gene of interest; (b) measuring the expression of the gene of interest; and (c) determining the presence of the bioactive molecule in the cell if the expression of the gene of interest is increased relative to a reference expression level obtained from an otherwise identical reference cell that does not comprise a metabolic pathway that produces the bioactive molecule, the ligand or the receptor. In some embodiments, the expression of the gene of interest is increased relative to the reference expression level when the bioactive molecule is present in the cell at concentrations sufficient to modulate the activity of the target enzyme. In some embodiments, modulation of the target enzyme by the bioactive molecule disrupts binding between the receptor and the ligand, thereby reducing transcriptional activation of the gene of interest. In some embodiments, the binding of the ligand to the receptor is phosphorylation dependent. In some embodiments, the cell is a prokaryotic cell. In some embodiments, the prokaryotic cell comprises a bacterial cell. In some embodiments, the bioactive molecule comprises a terpenoid. In some embodiments, the proteolytic enzyme comprises a viral proteolytic enzyme. 
In some embodiments, the viral proteolytic enzyme comprises 3CL protease (3CLpro) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), NS2B/NS3 protease of West Nile Virus, or papain-like protease (PLpro) of SARS-CoV-2. In some embodiments, the proteolytic enzyme comprises a ubiquitin specific protease. In some embodiments, the ubiquitin specific protease is ubiquitin specific protease 7 (USP7). In some embodiments, the synthetic genetically-encoded system comprises a two-hybrid system encoding the target enzyme, the ligand, the receptor, and the gene of interest. In some embodiments, the synthase comprises a terpene synthase or a nonribosomal peptide synthetase. In some embodiments, the terpene synthase comprises γ-humulene synthase (GHS), amorphadiene synthase (ADS), α-bisabolene synthase (ABS), or taxadiene synthase (TXS). In some embodiments, the terpene synthase comprises an amino acid sequence that is greater than or equal to about 90% identical to any one of SEQ ID NO: 4, 7, 9, 11, 13, 15, 17, 19, or 23. In some embodiments, the ligand comprises a kinase substrate that binds to the receptor in a phosphorylated state, and the receptor comprises a phosphorylated protein binding domain that binds to the kinase substrate when the kinase substrate is phosphorylated. In some embodiments, the ligand is coupled to a subunit of RNA polymerase and the receptor is coupled to a DNA binding domain, or the ligand is coupled to the DNA binding domain and the receptor is coupled to the subunit of the RNA polymerase. In some embodiments, the ligand comprises a linker coupled to the subunit of RNA polymerase. In some embodiments, the receptor comprises a linker coupled to the subunit of RNA polymerase. In some embodiments, the linker comprises the cleavage site recognized by the proteolytic enzyme.
In some embodiments, the cleavage site comprises an amino acid sequence comprising AVLQSGFR (SEQ ID NO: 1), KARVLAEAM (SEQ ID NO: 2), LRGG (SEQ ID NO: 3), or SEQ ID NO: 25. In some embodiments, the linker comprises one or more alanine residues flanking the cleavage site. In some embodiments, the gene of interest encodes a reporter polypeptide comprising a luciferase enzyme, a fluorescent polypeptide, secreted alkaline phosphatase, β-galactosidase, levansucrase, chloramphenicol acetyltransferase (CAT), or a protein that confers antibiotic resistance. In some embodiments, the gene of interest encodes a modulator protein that is operably linked to a gene encoding a reporter polypeptide, wherein the modulator protein activates or represses expression of the reporter polypeptide. In some embodiments, the expression of the reporter polypeptide is greater than an expression of the reporter polypeptide if the reporter polypeptide were encoded by the gene of interest. In some embodiments, the expression of the reporter polypeptide is greater by more than or equal to about 2-fold. In some embodiments, the genetically-encoded system further encodes a metabolic pathway for biosynthesis of the bioactive molecule. In some embodiments, the metabolic pathway is an isoprenoid pathway. In some embodiments, the isoprenoid pathway comprises a mevalonate pathway, a methylerythritol 4-phosphate (MEP) pathway, a deoxyxylulose 5-phosphate (DXP) pathway, or an isopentenol utilization (IUP) pathway.
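When designing or checking a linker of the kind described above, it can be useful to locate the recited cleavage sites within a candidate sequence and confirm the alanine flanks. The sketch below is illustrative only. The three motifs are the ones spelled out in the text; SEQ ID NO: 25 is omitted because its sequence is not given here. The example linker and the helper names are hypothetical, and "flanking" is interpreted, as an assumption, to mean at least one alanine immediately on each side of the site.

```python
# Cleavage site motifs as recited in the text.
CLEAVAGE_SITES = {
    "SEQ ID NO: 1": "AVLQSGFR",
    "SEQ ID NO: 2": "KARVLAEAM",
    "SEQ ID NO: 3": "LRGG",
}


def find_cleavage_sites(linker):
    """Return (seq_id, start, end) for each occurrence of a recited motif."""
    hits = []
    for seq_id, motif in CLEAVAGE_SITES.items():
        start = linker.find(motif)
        while start != -1:
            hits.append((seq_id, start, start + len(motif)))
            start = linker.find(motif, start + 1)
    return hits


def alanine_flanked(linker, start, end):
    """True if an alanine sits immediately before and after the site."""
    left = start > 0 and linker[start - 1] == "A"
    right = end < len(linker) and linker[end] == "A"
    return left and right


# Hypothetical linker: flexible ends, alanine flanks, one cleavage site.
linker = "GGS" + "AA" + "AVLQSGFR" + "AA" + "GGS"
for seq_id, start, end in find_cleavage_sites(linker):
    print(seq_id, start, end, alanine_flanked(linker, start, end))
```

Exact string matching is enough for a design check; it does not predict whether a given protease will actually cleave the site in the context of the fusion protein.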
Aspects disclosed herein provide systems of linking expression of a gene of interest to biosynthesis by the cell of a bioactive molecule that modulates activity of a target enzyme, the systems comprising: one or more nucleic acid molecules encoding a genetically-encoded system that, when expressed in a cell, links expression of a gene of interest to biosynthesis by the cell of a bioactive molecule that modulates activity of a target enzyme, wherein the genetically-encoded system comprises: a metabolic pathway for biosynthesis of the bioactive molecule; the target enzyme, wherein the target enzyme comprises a proteolytic enzyme; a synthase of the bioactive molecule; a ligand; and a receptor specific to the ligand, wherein (i) the ligand is coupled to a subunit of RNA polymerase and the receptor is coupled to a DNA binding protein, or (ii) the ligand is coupled to the DNA binding protein and the receptor is coupled to the subunit of RNA polymerase; wherein the receptor or the ligand comprises a cleavage site recognized by the proteolytic enzyme; and wherein the one or more nucleic acid molecules comprises: the gene of interest; and a transcription initiation site for the gene of interest comprising: a binding site for the DNA binding protein; and a promoter sequence comprising a binding site for the RNA polymerase. In some embodiments, the system further comprises the cell comprising the one or more nucleic acid molecules. In some embodiments, the cell is a prokaryotic cell. In some embodiments, the prokaryotic cell comprises a bacterial cell. In some embodiments, the cell is isolated. In some embodiments, the bioactive molecule comprises a terpenoid. In some embodiments, the proteolytic enzyme comprises a viral proteolytic enzyme. In some embodiments, the viral proteolytic enzyme comprises 3CL protease (3CLpro) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), NS2B/NS3 protease of West Nile Virus, or papain-like protease (PLpro) of SARS-CoV-2.
In some embodiments, the proteolytic enzyme comprises a ubiquitin specific protease. In some embodiments, the ubiquitin specific protease is ubiquitin specific protease 7 (USP7). In some embodiments, the subunit of the RNA polymerase comprises an omega subunit of the RNA polymerase. In some embodiments, the synthetic genetically-encoded system comprises a two-hybrid system encoding the target enzyme, the ligand, the receptor, and the gene of interest. In some embodiments, the synthase comprises a terpene synthase or a nonribosomal peptide synthetase. In some embodiments, the terpene synthase comprises γ-humulene synthase (GHS), amorphadiene synthase (ADS), α-bisabolene synthase (ABS), or taxadiene synthase (TXS). In some embodiments, the terpene synthase comprises an amino acid sequence that is greater than or equal to about 90% identical to any one of SEQ ID NO: 4, 7, 9, 11, 13, 15, 17, 19, or 23. In some embodiments, the ligand comprises a kinase substrate that binds to the receptor in a phosphorylated state, and the receptor comprises a phosphorylated protein binding domain that binds to the kinase substrate when the kinase substrate is phosphorylated. In some embodiments, the ligand is coupled to a subunit of RNA polymerase and the receptor is coupled to a DNA binding domain, or the ligand is coupled to the DNA binding domain and the receptor is coupled to the subunit of the RNA polymerase. In some embodiments, the ligand comprises a linker coupled to the subunit of RNA polymerase. In some embodiments, the receptor comprises a linker coupled to the subunit of RNA polymerase. In some embodiments, the linker comprises the cleavage site recognized by the proteolytic enzyme. In some embodiments, the cleavage site comprises an amino acid sequence comprising AVLQSGFR (SEQ ID NO: 1), KARVLAEAM (SEQ ID NO: 2), LRGG (SEQ ID NO: 3), or SEQ ID NO: 25. In some embodiments, the linker comprises one or more alanine residues flanking the cleavage site. 
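The "greater than or equal to about 90% identical" criterion recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as `-`) and that identity is counted over all aligned columns; other denominators are in common use, and the text does not specify one. The function names and the toy sequences are hypothetical and are not taken from the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns (columns that are gaps in both are skipped)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep every column except those where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only when both residues are identical and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 90.0) -> bool:
    """True when the pair is at least `threshold` percent identical (assumed cutoff)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy 10-residue sequences differing at one position.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch only covers the counting step.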
In some embodiments, the reporter polypeptide comprises a luciferase enzyme, a fluorescent polypeptide, secreted alkaline phosphatase, β-galactosidase, levansucrase, chloramphenicol acetyltransferase (CAT), or a protein that confers antibiotic resistance. In some embodiments, the genetically-encoded system further encodes a metabolic pathway for biosynthesis of the bioactive molecule. In some embodiments, the metabolic pathway is an isoprenoid pathway. In some embodiments, the isoprenoid pathway comprises a mevalonate pathway, a methylerythritol 4-phosphate (MEP) pathway, a deoxyxylulose 5-phosphate (DXP) pathway, or an isopentenol utilization (IUP) pathway. In some embodiments, the one or more nucleic acid molecules further comprises one or more barcode sequences unique to the metabolic pathway. In some embodiments, the one or more nucleic acid molecules further comprises one or more barcode sequences unique to the synthase, the target enzyme, or a combination thereof.
Aspects disclosed herein provide systems for identifying a protease modulator, the systems comprising: a first nucleic acid sequence encoding a phosphorylated tyrosine binding domain, wherein the phosphorylated tyrosine binding domain optionally comprises a Src homology 2 (SH2) domain; a second nucleic acid sequence encoding a repressor element, wherein the repressor element optionally comprises a cI repressor; a third nucleic acid sequence encoding a subunit of a RNA polymerase, wherein the subunit of the RNA polymerase is optionally an omega subunit of the RNA polymerase (RpoZ); a fourth nucleic acid sequence encoding a tyrosine kinase substrate; a fifth nucleic acid sequence encoding a tyrosine kinase; a sixth nucleic acid encoding a target protease; a seventh nucleic acid sequence encoding a protease cleavage site; an eighth nucleic acid encoding an operator for the repressor element, wherein the operator for the repressor element optionally comprises a cI repressor; a ninth nucleic acid sequence comprising a binding site for the RNA polymerase; and a tenth nucleic acid sequence encoding a reporter gene. In some embodiments, the systems further comprise an eleventh nucleic acid sequence encoding Hsp90 co-chaperone Cdc37. In some embodiments, the tyrosine kinase comprises Src kinase. In some embodiments, the first nucleic acid sequence and the second nucleic acid sequence encode the phosphorylated tyrosine binding domain fused with the repressor element. In some embodiments, the third nucleic acid sequence and the fourth nucleic acid sequence encode the subunit of the RNA polymerase fused with the tyrosine kinase substrate. In some embodiments, the protease cleavage site is positioned in a linker region disposed between the subunit of the RNA polymerase and the tyrosine kinase substrate. 
In some embodiments, the first nucleic acid sequence and the fourth nucleic acid sequence encode the phosphorylated tyrosine binding domain fused with the tyrosine kinase substrate. In some embodiments, the second nucleic acid sequence and the third nucleic acid sequence encode the repressor element fused with the subunit of the RNA polymerase. In some embodiments, a first barcode sequence operably linked to the sixth nucleic acid sequence, wherein the first barcode is sufficient to identify the target protease. In some embodiments, the barcode comprises an index, wherein the index comprises greater than or equal to about 6 contiguous base pairs that are specific to the target protease. In some embodiments, an exogenous nucleic acid encoding a terpene synthase, nonribosomal peptide synthetase, or a combination thereof. In some embodiments, the exogenous nucleic acid encoding the terpene synthase, nonribosomal peptide synthetase, or a combination thereof comprises a second barcode sequence sufficient to identify the terpene synthase. In some embodiments, the second barcode comprises an index, and wherein the index comprises greater than or equal to about 6 contiguous base pairs that are specific to the terpene synthase. In some embodiments, the exogenous nucleic acid further encodes an enzyme configured to catalyze condensation of (i) isopentenyl diphosphate (IPP), (ii) dimethylallyl diphosphate (DMAPP), or (iii) a combination of IPP and DMAPP. In some embodiments, the enzyme comprises a geranylgeranyl diphosphate synthase (GGPPS). In some embodiments, the exogenous nucleic acid further encodes a cytochrome P450 enzyme, a cytochrome P450 reductase enzyme, a cytochrome b5 enzyme, an oxidase enzyme, an acyltransferase enzyme, a glycosyltransferase enzyme, a halogenase, a peroxidase, or any combination thereof. 
In some embodiments, another exogenous nucleic acid sequence encoding a metabolic pathway for (i) the IPP, (ii) the DMAPP, (iii) the combination of the IPP and the DMAPP, (iv) molecules resulting from the condensation of the IPP, the DMAPP, or the combination of the IPP and the DMAPP, or (v) any combination of (i) to (iv). In some embodiments, the tyrosine kinase substrate comprises a polypeptide, and wherein the polypeptide comprises a tyrosine residue configured to (i) be phosphorylated by the Src kinase, (ii) bind to the SH2 domain when the tyrosine residue is phosphorylated, (iii) bind to the SH2 domain with less binding affinity as compared to the binding affinity between the tyrosine residue and the SH2 domain when the tyrosine residue is dephosphorylated, or (iv) any combination of (i) to (iii). In some embodiments, the tyrosine kinase substrate comprises a substrate domain derived from a hamster polyomavirus middle T antigen (MidT). In some embodiments, the seventh nucleic acid sequence encoding the protease cleavage site comprises an amino acid sequence configured to be hydrolyzed by: (i) the 3CL protease (3CLpro) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), (ii) NS2B/NS3 protease of West Nile Virus, (iii) papain-like protease (PLpro) of SARS-CoV-2, or (iv) ubiquitin specific protease 7 (USP7). In some embodiments, the amino acid sequence comprises AVLQSGFR (SEQ ID NO: 1). In some embodiments, the amino acid sequence further comprises fewer than or equal to 4 alanine residues on an N-terminus, C-terminus, or combination of the N-terminus and C-terminus of the amino acid sequence. In some embodiments, the seventh nucleic acid sequence encoding the protease cleavage site comprises an amino acid sequence configured to be hydrolyzed by human immunodeficiency virus 1 protease (HIV1pro). In some embodiments, the amino acid sequence comprises KARVLAEAM (SEQ ID NO: 2).
In some embodiments, the amino acid sequence further comprises fewer than or equal to 4 alanine residues on an N-terminus, C-terminus, or combination of the N-terminus and C-terminus of the amino acid sequence. In some embodiments, a single nucleic acid molecule comprising any combination of the first nucleic acid sequence, the second nucleic acid sequence, the third nucleic acid sequence, the fourth nucleic acid sequence, the fifth nucleic acid sequence, the sixth nucleic acid sequence, the seventh nucleic acid sequence, the eighth nucleic acid sequence, the ninth nucleic acid sequence, and the tenth nucleic acid sequence. In some embodiments, the single nucleic acid molecule is a vector comprising a plasmid vector, a viral vector, a cosmid, an artificial chromosome, a region of a host chromosome, or any combination thereof. In some embodiments, one or more of the first nucleic acid sequence, the second nucleic acid sequence, the third nucleic acid sequence, the fourth nucleic acid sequence, the fifth nucleic acid sequence, the sixth nucleic acid sequence, the seventh nucleic acid sequence, the eighth nucleic acid sequence, the ninth nucleic acid sequence, and the tenth nucleic acid sequence is a vector comprising a plasmid vector, a viral vector, a cosmid, an artificial chromosome, a region of a host chromosome, or any combination thereof. In some embodiments, the reporter gene encodes: a luciferase enzyme; a fluorescent polypeptide; secreted alkaline phosphatase; β-galactosidase; levansucrase; chloramphenicol acetyltransferase (CAT); antibiotic resistance; or a polymerizing enzyme that, when expressed, binds to a promoter operably linked to a gene encoding a detectable polypeptide to drive expression of the detectable polypeptide. In some embodiments, the expression of the detectable polypeptide is greater than an expression of the detectable polypeptide when the gene encoding the detectable polypeptide is included as the reporter gene.
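The cleavage-site embodiments above, in which a recited site is flanked by up to 4 alanine residues, can be sketched as a simple string construction. This is only an illustration: the helper name `build_flanked_site` is hypothetical, and reading the limit of 4 as applying to each terminus separately is an assumption, since the text could also be read as a combined limit.

```python
# Cleavage-site sequences as recited in the text.
CLEAVAGE_SITES = {
    "SEQ ID NO: 1": "AVLQSGFR",
    "SEQ ID NO: 2": "KARVLAEAM",
    "SEQ ID NO: 3": "LRGG",
}

# Assumed per-terminus limit (the text says "fewer than or equal to 4").
MAX_FLANKING_ALANINES = 4


def build_flanked_site(site: str, n_term_ala: int = 0, c_term_ala: int = 0) -> str:
    """Return the cleavage site with alanine residues added at either terminus."""
    for count in (n_term_ala, c_term_ala):
        if not 0 <= count <= MAX_FLANKING_ALANINES:
            raise ValueError(
                f"alanine count must be between 0 and {MAX_FLANKING_ALANINES}"
            )
    return "A" * n_term_ala + site + "A" * c_term_ala


# Two alanines on each side of the SEQ ID NO: 1 site.
print(build_flanked_site(CLEAVAGE_SITES["SEQ ID NO: 1"], 2, 2))
```

Because AVLQSGFR itself begins with alanine, the flanked sequence shows three consecutive alanines at its N-terminus; only two of them are added flanking residues.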
Aspects disclosed herein provide isolated cells comprising systems for identifying a protease modulator, the systems comprising: a first nucleic acid sequence encoding a phosphorylated tyrosine binding domain, wherein the phosphorylated tyrosine binding domain optionally comprises a Src homology 2 (SH2) domain; a second nucleic acid sequence encoding a repressor element, wherein the repressor element optionally comprises a cI repressor; a third nucleic acid sequence encoding a subunit of a RNA polymerase, wherein the subunit of the RNA polymerase is optionally an omega subunit of the RNA polymerase (RpoZ); a fourth nucleic acid sequence encoding a tyrosine kinase substrate; a fifth nucleic acid sequence encoding a tyrosine kinase; a sixth nucleic acid encoding a target protease; a seventh nucleic acid sequence encoding a protease cleavage site; an eighth nucleic acid encoding an operator for the repressor element, wherein the operator for the repressor element optionally comprises a cI repressor; a ninth nucleic acid sequence comprising a binding site for the RNA polymerase; and a tenth nucleic acid sequence encoding a reporter gene. In some embodiments, the isolated cell comprises a prokaryotic cell. In some embodiments, the isolated cell is obtained from a unicellular organism. In some embodiments, the isolated cell comprises a bacterial cell, an algae cell, an archaea cell, a protozoa cell, or a fungal cell. In some embodiments, the fungal cell is a yeast cell. In some embodiments, the bacterial cell is an Escherichia coli (E. coli) cell.
Aspects disclosed herein provide methods of identifying a modulator of the target protease, the methods comprising (a) expressing in a cell an exogenous terpene synthase; (b) introducing into the cell the system of high throughput screening of bioactive molecules that modulate a target enzyme that links the modulation of the target protease with expression of the reporter gene; and (c) measuring expression of the reporter gene in the presence of expression of the terpene synthase, wherein an increased or decreased expression of the reporter gene as compared to a reference expression level indicates a presence of the modulator of the target protease produced by the cell. In some embodiments, expressing in the cell an exogenous nucleic acid sequence encoding a metabolic pathway for (i) isopentenyl diphosphate (IPP), (ii) dimethylallyl diphosphate (DMAPP), (iii) the combination of the IPP and the DMAPP, (iv) molecules resulting from condensation of the IPP, the DMAPP, or the combination of the IPP and the DMAPP, or (v) any combination of (i) to (iv). In some embodiments, the cell further comprises an enzyme configured to catalyze the condensation of (i) the IPP, (ii) the DMAPP, or (iii) a combination of the IPP and the DMAPP. In some embodiments, the enzyme comprises a geranylgeranyl diphosphate synthase (GGPPS). In some embodiments, culturing the cell in a growth cell medium, wherein the growth cell medium comprises glycerol at a concentration comprising less than or equal to about 2% (by volume). In some embodiments, culturing the cell in a growth cell medium, wherein the growth cell medium comprises mevalonate at a concentration comprising less than or equal to about 20 millimolar (mM). In some embodiments, isolating the modulator of the target protease. In some embodiments, the target protease comprises a viral protease.
In some embodiments, the viral protease comprises (i) HIV-1 protease (HIV-1Pr) or SARS-CoV-2 main protease (3CLpro), (ii) NS2B/NS3 protease (WNV) of West Nile Virus, (iii) papain-like protease (PLpro) of SARS-CoV-2, or (iv) Dengue Virus Protease (DVpro). In some embodiments, the target protease is a human protease. In some embodiments, the human protease is ubiquitin-specific protease 7 (USP7). In some embodiments, the USP7 is a cancer target. In some embodiments, the introducing of (b) is performed under conditions sufficient to cause the omega subunit of the RNA polymerase to recruit RNA polymerase to the binding site for the RNA polymerase in the absence of a protease, thereby expressing the reporter gene. In some embodiments, the reference expression level is derived from a reference cell expressing the second nucleic acid sequence that is modified such that the tyrosine kinase substrate contains a mutation that inhibits its binding to the SH2 domain. In some embodiments, the reference expression level is derived from a reference cell comprising a modified terpene synthase comprising a mutation that reduces its activity as compared with an otherwise identical terpene synthase that does not have the mutation. In some embodiments, repeating (a) to (c), wherein for each repetition, a new exogenous terpene synthase is used to identify a new modulator of the target protease. In some embodiments, the modulator of the target protease is a terpene.
Aspects disclosed herein provide systems for high throughput screening of bioactive molecules that inhibit a target enzyme, the systems comprising: a first nucleic acid sequence encoding a phosphorylated tyrosine binding domain; a second nucleic acid sequence encoding a repressor element; a third nucleic acid sequence encoding a subunit of RNA polymerase; a fourth nucleic acid sequence encoding a tyrosine kinase substrate; a fifth nucleic acid sequence encoding tyrosine kinase; a sixth nucleic acid encoding the target enzyme, wherein the sixth nucleic acid comprises a barcode sufficient to identify the target enzyme; a seventh nucleic acid encoding an operator for the repressor element; an eighth nucleic acid sequence comprising a binding site for the RNA polymerase; and a ninth nucleic acid sequence encoding a reporter gene. In some embodiments, the phosphorylated tyrosine binding domain comprises a Src homology 2 (SH2) domain. In some embodiments, the repressor element comprises a cI repressor. In some embodiments, the subunit of the RNA polymerase comprises an omega subunit of the RNA polymerase. In some embodiments, the tyrosine kinase comprises Src kinase. In some embodiments, the target enzyme is a tyrosine phosphatase, a protease, or a combination thereof. In some embodiments, the protease is a viral protease. In some embodiments, the protease prevents transcriptional activation by stopping fusion of two proteins. In some embodiments, the two proteins are middle T antigen and an RNA polymerase. In some embodiments, inactivation of the protease may reenable transcription. In some embodiments, the barcode comprises an index comprising greater than or equal to about 6 contiguous base pairs that are specific to the target enzyme. 
In some embodiments, a single nucleic acid molecule comprising any combination of the first nucleic acid sequence, the second nucleic acid sequence, the third nucleic acid sequence, the fourth nucleic acid sequence, the fifth nucleic acid sequence, the sixth nucleic acid sequence, the seventh nucleic acid sequence, the eighth nucleic acid sequence, and the ninth nucleic acid sequence. In some embodiments, one or more of the first nucleic acid sequence, the second nucleic acid sequence, the third nucleic acid sequence, the fourth nucleic acid sequence, the fifth nucleic acid sequence, the sixth nucleic acid sequence, the seventh nucleic acid sequence, the eighth nucleic acid sequence, and the ninth nucleic acid sequence is a vector comprising a plasmid vector, a viral vector, a cosmid, an artificial chromosome, a region of the host chromosome, or any combination thereof. In some embodiments, the reporter gene encodes: a luciferase enzyme; a fluorescent polypeptide; secreted alkaline phosphatase; β-galactosidase; levansucrase; chloramphenicol acetyltransferase (CAT); antibiotic resistance; or a polymerizing enzyme that, when expressed, binds to a promoter operably linked to a gene encoding a detectable polypeptide to drive expression of the detectable polypeptide. In some embodiments, the expression of the detectable polypeptide is greater than an expression of the detectable polypeptide when the gene encoding the detectable polypeptide is included as the reporter gene. In some embodiments, the system further comprises an exogenous nucleic acid encoding a terpene synthase, a nonribosomal peptide synthetase, or a combination thereof. In some embodiments, the exogenous nucleic acid comprises a second barcode sequence sufficient to identify the terpene synthase. In some embodiments, the second barcode comprises an index, and wherein the index comprises greater than or equal to about 6 nucleotide base pairs that are specific to the terpene synthase.
In some embodiments, the exogenous nucleic acid further comprises an enzyme configured to catalyze condensation of (i) isopentenyl diphosphate (IPP), (ii) dimethylallyl diphosphate (DMAPP), or (iii) a combination of the IPP and the DMAPP. In some embodiments, the enzyme comprises a geranylgeranyl diphosphate synthase (GGPPS). In some embodiments, the exogenous nucleic acid further encodes a cytochrome P450 enzyme, a cytochrome P450 reductase enzyme, a cytochrome b5 enzyme, an oxidase enzyme, an acyltransferase enzyme, a glycosyltransferase enzyme, a halogenase, a peroxidase, or any combination thereof.
Aspects disclosed herein provide isolated cells comprising the system for high throughput screening of bioactive molecules that inhibit a target enzyme, the systems comprising: a first nucleic acid sequence encoding a phosphorylated tyrosine binding domain; a second nucleic acid sequence encoding a repressor element; a third nucleic acid sequence encoding a subunit of RNA polymerase; a fourth nucleic acid sequence encoding a tyrosine kinase substrate; a fifth nucleic acid sequence encoding tyrosine kinase; a sixth nucleic acid encoding the target enzyme, wherein the sixth nucleic acid comprises a barcode sufficient to identify the target enzyme; a seventh nucleic acid encoding an operator for the repressor element; an eighth nucleic acid sequence comprising a binding site for the RNA polymerase; and a ninth nucleic acid sequence encoding a reporter gene. In some embodiments, the isolated cell comprises a prokaryotic cell. In some embodiments, the isolated cell is obtained from a unicellular organism. In some embodiments, the isolated cell comprises a bacterial cell, an algae cell, an archaea cell, a protozoa cell, or a fungal cell. In some embodiments, the fungal cell is a yeast cell. In some embodiments, the bacterial cell is an Escherichia coli (E. coli) cell.
Aspects disclosed herein provide methods of identifying a modulator of the target enzyme, the methods comprising: (a) introducing into a plurality of cells the system for high throughput screening of bioactive molecules that inhibit a target enzyme that links the modulation of the target enzyme with expression of the reporter gene; (b) measuring expression of the reporter gene in the plurality of cells; (c) detecting in a subset of the plurality of cells an increased or decreased expression of the reporter gene as compared to a reference expression level, thereby indicating a presence of the modulator of the target enzyme produced by cells within the subset of the plurality of cells; identifying the first barcode in cells within the subset of the plurality of cells, thereby identifying the target enzyme of the modulator produced by the cells; and optionally, isolating the modulator of the target enzyme produced by the cells to identify the modulator. In some embodiments, introducing into the plurality of cells an exogenous nucleic acid sequence encoding a terpene synthase, wherein the exogenous nucleic acid sequence comprises a second barcode. In some embodiments, the second barcode comprises an index, and wherein the index comprises greater than or equal to about 6 contiguous base pairs that are specific to the terpene synthase. In some embodiments, the exogenous nucleic acid sequence further encodes an enzyme configured to catalyze condensation of (i) isopentenyl diphosphate (IPP), (ii) dimethylallyl diphosphate (DMAPP), or (iii) a combination of the IPP and the DMAPP. In some embodiments, the enzyme comprises a geranylgeranyl diphosphate synthase (GGPPS). In some embodiments, the exogenous nucleic acid sequence further encodes a cytochrome P450 enzyme, a cytochrome P450 reductase enzyme, a cytochrome b5 enzyme, an oxidase enzyme, an acyltransferase enzyme, a glycosyltransferase enzyme, a halogenase, a peroxidase, or any combination thereof.
In some embodiments, introducing into the plurality of cells another exogenous nucleic acid sequence encoding a metabolic pathway for (i) the IPP, (ii) the DMAPP, (iii) a combination of the IPP and DMAPP, or (iv) molecules resulting from the condensation of the IPP, the DMAPP, or the combination of the IPP and the DMAPP. In some embodiments, identifying the second barcode in the cells within the subset of the plurality of cells, thereby identifying which of the unique exogenous terpene synthase in each of the cells produces the modulator of the target enzyme in that cell. In some embodiments, the measuring is performed by multiplex sequencing genetic information of the cells. In some embodiments, the multiplex sequencing comprises sequencing-by-synthesis, sequencing by transient binding, single-molecule real-time sequencing, ion semiconductor sequencing (Ion Torrent®), pyrosequencing, combinatorial probe anchor synthesis (cPAS), sequencing-by-ligation, nanopore sequencing, or semiconductor-based electronic sequencing (GenapSys™). In some embodiments, the methods may include associating the expression of the reporter gene in each cell of the subset of the plurality of cells with the barcode for each cell using a computer processor programmed to demultiplex genetic information that was sequenced. In some embodiments, associating the expression of the reporter gene in each cell of the subset of the plurality of cells with the second barcode for each cell of the plurality of cells using a computer processor programmed to demultiplex genetic information that was sequenced. In some embodiments, the plurality of cells comprises 10 to 10^10 colony-forming cells for a single implementation of the method.
In some embodiments, culturing the plurality of cells in a growth cell medium, wherein the growth cell medium comprises (i) glycerol at a concentration between about 1% and about 2%, (ii) mevalonate at a concentration comprising less than or equal to about 20 mM, or (iii) a combination of (i) and (ii). In some embodiments, the modulator of the target enzyme is an inhibitor of the target enzyme when the expression of the reporter gene detected in (c) is increased. In some embodiments, the modulator of the target enzyme is an activator of the target enzyme when the expression of the reporter gene detected in (c) is decreased.
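The demultiplexing step described above, in which sequenced reads are associated with a barcode, can be sketched as follows. This is a minimal illustration under stated assumptions: the barcode is taken to sit at a fixed offset in each read, only exact matches are assigned, and the 6-base barcodes, labels, and reads are invented for the example. The function name `demultiplex` is hypothetical and does not refer to any particular sequencing toolkit.

```python
from collections import Counter


def demultiplex(reads, barcodes, barcode_start=0):
    """Count reads per barcode label using exact matching at a fixed read offset.

    reads: iterable of read strings.
    barcodes: dict mapping barcode sequence -> label (all barcodes the same length).
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    k = lengths.pop()
    counts = Counter()
    for read in reads:
        # Extract the index region and look it up; unknown indexes go to "unassigned".
        index = read[barcode_start:barcode_start + k]
        counts[barcodes.get(index, "unassigned")] += 1
    return counts


# Hypothetical 6-base barcodes identifying two terpene synthases.
example_barcodes = {"ACGTAC": "synthase_A", "TGCATG": "synthase_B"}
example_reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTACTTTAAA", "GGGGGGAAAAAA"]
print(demultiplex(example_reads, example_barcodes))
```

Exact matching is the simplest choice; a production pipeline would typically tolerate sequencing errors in the index, which this sketch does not attempt.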
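The comparison of reporter expression against a reference level can be sketched as a simple classification. Several points here are assumptions rather than statements from the text: the 2-fold cutoff is arbitrary, and treating an increase as consistent with inhibition follows from the earlier statement that inactivation of the protease may reenable transcription. The function name and return labels are hypothetical.

```python
def classify_modulator(reporter_level: float, reference_level: float,
                       min_fold: float = 2.0) -> str:
    """Label a sample by its reporter signal relative to a reference signal.

    min_fold is an assumed cutoff; a real screen would set it from replicate
    variability rather than a fixed number.
    """
    if reporter_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    ratio = reporter_level / reference_level
    if ratio >= min_fold:
        # Increased reporter expression: consistent with inhibition of the target.
        return "candidate inhibitor"
    if ratio <= 1.0 / min_fold:
        # Decreased reporter expression: consistent with activation of the target.
        return "candidate activator"
    return "no call"


print(classify_modulator(400.0, 100.0))
```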
Aspects disclosed herein provide systems for high throughput screening of bioactive molecules that modulate a target enzyme, the systems comprising: a first nucleic acid sequence encoding a phosphorylated tyrosine binding domain; a second nucleic acid sequence encoding a repressor element; a third nucleic acid sequence encoding a subunit of RNA polymerase; a fourth nucleic acid sequence encoding a tyrosine phosphatase substrate; a fifth nucleic acid sequence encoding tyrosine kinase; a sixth nucleic acid encoding the target enzyme; a seventh nucleic acid encoding an operator for the repressor element; an eighth nucleic acid sequence comprising a binding site for the RNA polymerase; and a ninth nucleic acid sequence encoding a polymerizing enzyme, that when expressed, drives expression of a detectable polypeptide. In some embodiments, the phosphorylated tyrosine binding domain comprises a Src homology 2 (SH2) domain. In some embodiments, the repressor element comprises a cI repressor. In some embodiments, the subunit of the RNA polymerase comprises an omega subunit of the RNA polymerase. In some embodiments, the tyrosine kinase comprises Src kinase. In some embodiments, the target enzyme is a tyrosine phosphatase, a protease, or a combination thereof. In some embodiments, the protease is a viral protease. In some embodiments, the sixth nucleic acid sequence comprises a barcode sufficient to identify the target enzyme. In some embodiments, the barcode comprises an index comprising greater than or equal to about 6 contiguous base pairs that are specific to the target enzyme. In some embodiments, the system further comprises an exogenous nucleic acid encoding a terpene synthase, a nonribosomal peptide synthetase, or a combination thereof. 
In some embodiments, a single nucleic acid molecule comprising any combination of the first nucleic acid sequence, the second nucleic acid sequence, the third nucleic acid sequence, the fourth nucleic acid sequence, the fifth nucleic acid sequence, the sixth nucleic acid sequence, the seventh nucleic acid sequence, the eighth nucleic acid sequence, and the ninth nucleic acid sequence. In some embodiments, one or more of the first nucleic acid sequence, the second nucleic acid sequence, the third nucleic acid sequence, the fourth nucleic acid sequence, the fifth nucleic acid sequence, the sixth nucleic acid sequence, the seventh nucleic acid sequence, the eighth nucleic acid sequence, and the ninth nucleic acid sequence is a vector comprising a plasmid vector, a viral vector, a cosmid, an artificial chromosome, a region of the host chromosome, or a combination thereof. In some embodiments, the detectable polypeptide comprises a fluorescent polypeptide. In some embodiments, the expression of the detectable polypeptide is greater than an expression of the detectable polypeptide when the gene encoding the detectable polypeptide is included in place of the gene for the polymerizing enzyme. In some embodiments, the RNA polymerase is different than the polymerizing enzyme. In some embodiments, the polymerizing enzyme comprises an RNA polymerizing enzyme. In some embodiments, the RNA polymerizing enzyme comprises T7 RNA polymerase.
Aspects disclosed herein provide isolated cells comprising the system for high throughput screening of bioactive molecules that modulate a target enzyme, the systems comprising: a first nucleic acid sequence encoding a phosphorylated tyrosine binding domain; a second nucleic acid sequence encoding a repressor element; a third nucleic acid sequence encoding a subunit of RNA polymerase; a fourth nucleic acid sequence encoding a tyrosine phosphatase substrate; a fifth nucleic acid sequence encoding tyrosine kinase; a sixth nucleic acid encoding the target enzyme; a seventh nucleic acid encoding an operator for the repressor element; an eighth nucleic acid sequence comprising a binding site for the RNA polymerase; and a ninth nucleic acid sequence encoding a polymerizing enzyme, that when expressed, drives expression of a detectable polypeptide. In some embodiments, the isolated cell comprises a prokaryotic cell. In some embodiments, the isolated cell is obtained from a unicellular organism. In some embodiments, the isolated cell comprises a bacterial cell, an algae cell, an archaea cell, a protozoa cell, or a fungal cell. In some embodiments, the fungal cell is a yeast cell. In some embodiments, the bacterial cell is an Escherichia coli (E. coli) cell.
Aspects disclosed herein provide methods of amplifying expression of a reporter in vivo that is linked to modulation of a target enzyme, the methods comprising: introducing into a cell the system of high throughput screening of bioactive molecules that modulate a target enzyme; and measuring expression of the detectable polypeptide in the cell, wherein the expression of the detectable polypeptide is greater than the expression of the detectable polypeptide when the gene expressing the detectable polypeptide is included in place of the gene for the polymerizing enzyme. In some embodiments, the expression of the detectable polypeptide is greater by at least 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, or 100-fold. In some embodiments, the expression of the detectable polypeptide is greater by between about 2-fold and 100-fold, 3-fold and 90-fold, 4-fold and 80-fold, 5-fold and 70-fold, 6-fold and 60-fold, 7-fold and 50-fold, 8-fold and 40-fold, 9-fold and 30-fold, 10-fold and 20-fold. In some embodiments, the cell comprises a prokaryotic cell. In some embodiments, the cell is obtained from a unicellular organism. In some embodiments, the cell comprises a bacterial cell, an algae cell, an archaea cell, a protozoa cell, or a fungal cell. In some embodiments, the fungal cell is a yeast cell. In some embodiments, the bacterial cell is an Escherichia coli (E. coli) cell.
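The fold-increase ranges recited above can be checked against a measured signal with a short sketch. The signal values are invented for illustration, the function names are hypothetical, and treating both ends of each range as inclusive is an assumption.

```python
# Fold ranges as recited in the text (lower bound, upper bound).
RECITED_FOLD_RANGES = [
    (2, 100), (3, 90), (4, 80), (5, 70), (6, 60),
    (7, 50), (8, 40), (9, 30), (10, 20),
]


def fold_amplification(amplified_signal: float, direct_signal: float) -> float:
    """Ratio of signal with the polymerizing-enzyme relay to signal without it."""
    if direct_signal <= 0:
        raise ValueError("direct_signal must be positive")
    return amplified_signal / direct_signal


def matching_ranges(fold: float):
    """Return the recited ranges that contain the given fold value (inclusive)."""
    return [(low, high) for low, high in RECITED_FOLD_RANGES if low <= fold <= high]


# Hypothetical measurements: 2500 units with the relay, 100 units without.
fold = fold_amplification(2500.0, 100.0)
print(fold, matching_ranges(fold))
```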
Aspects disclosed herein provide terpene synthases comprising: an amino acid sequence encoding a terpene synthase, wherein the amino acid sequence comprises a mutation that increases modulation of a target tyrosine phosphatase as compared with an otherwise identical terpene synthase without the mutation. In some embodiments, the terpene synthase comprises γ-humulene synthase (GHS), amorphadiene (AD) synthase, or α-bisabolene (AB) synthase. In some embodiments, the target tyrosine phosphatase comprises a cysteine-specific protein tyrosine phosphatase. In some embodiments, the cysteine-specific protein tyrosine phosphatase comprises a dual-specificity phosphatase (DUSP). In some embodiments, the target tyrosine phosphatase comprises Protein Tyrosine Phosphatase 1B (PTP1B), Protein tyrosine phosphatase non-receptor type 2 (TC-PTP), Protein tyrosine phosphatase non-receptor type 6 (SHP1), Protein tyrosine phosphatase non-receptor type 11 (SHP2), Protein tyrosine phosphatase non-receptor type 12 (PTP-PEST), or Protein tyrosine phosphatase non-receptor type 22 (LYP). In some embodiments, the mutation is a single amino acid mutation. In some embodiments, the amino acid sequence comprises SEQ ID NO: 7, and wherein the mutation comprises A319Q or Y415C, or a combination thereof. In some embodiments, the mutation is with reference to SEQ ID NO: 7, and wherein the mutation comprises (a) A319Q and Y415F, (b) A319Q and S484G, or (c) A319Q and S484G, or a combination thereof. In some embodiments, the mutation comprises an amino acid mutation of an amino acid lacking a hydroxyl group. In some embodiments, the terpene synthase is isolated. In some embodiments, the terpene synthase is purified.
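Substitution labels such as A319Q follow the common convention of wild-type residue, 1-based position, and replacement residue. A minimal sketch of applying such a label to a sequence is shown here. SEQ ID NO: 7 is not reproduced in this passage, so a short invented sequence stands in for it; the function name `apply_mutation` is hypothetical.

```python
import re

# Matches labels such as "A319Q": wild-type residue, 1-based position, new residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutation(sequence: str, mutation: str) -> str:
    """Apply a single substitution, checking the wild-type residue first."""
    match = _MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + replacement + sequence[position:]


# Toy stand-in sequence (not SEQ ID NO: 7): apply two substitutions in turn.
toy = "MKAYS"
double_mutant = apply_mutation(apply_mutation(toy, "A3Q"), "Y4F")
print(double_mutant)
```

Checking the wild-type residue before substituting guards against numbering a mutation relative to the wrong reference sequence.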
Aspects disclosed herein provide methods of identifying a modulator of the target tyrosine phosphatase, the methods comprising: expressing in a cell the terpene synthase; introducing into the cell an expression system that links the modulation of the target tyrosine phosphatase with expression of a reporter gene; and measuring expression of the reporter gene in the presence of the terpene synthase, wherein an increased expression of the reporter gene as compared with a reference expression level indicates a presence of the modulator of the target tyrosine phosphatase produced by the cell. In some embodiments, the modulator of the target tyrosine phosphatase comprises himachalol, α-himachalene, or β-himachalene. In some embodiments, the cell comprises a prokaryotic cell. In some embodiments, the cell is obtained from a unicellular organism. In some embodiments, the cell comprises a bacterial cell, an algae cell, an archaea cell, a protozoa cell, or a fungal cell. In some embodiments, the fungal cell is a yeast cell. In some embodiments, the bacterial cell is an Escherichia coli (E. coli) cell. In some embodiments, the methods further comprise culturing the cell in a cell growth medium, wherein the cell growth medium comprises glycerol at a concentration of less than or equal to about 2% (by volume). In some embodiments, the methods further comprise culturing the cell in a cell growth medium, wherein the cell growth medium comprises mevalonate at a concentration of less than or equal to about 20 mM.
In some embodiments, the expression system comprises: a first nucleic acid sequence encoding a phosphorylated tyrosine binding domain, wherein the phosphorylated tyrosine binding domain optionally comprises a Src homology 2 (SH2) domain; a second nucleic acid sequence encoding a repressor element, wherein the repressor element is optionally a cI repressor; a third nucleic acid sequence encoding a subunit of an RNA polymerase, wherein the subunit of the RNA polymerase comprises an omega subunit of the RNA polymerase (RpoZ); a fourth nucleic acid sequence encoding a tyrosine phosphatase substrate; a fifth nucleic acid sequence encoding a tyrosine kinase, wherein the tyrosine kinase optionally comprises Src kinase; a sixth nucleic acid encoding the target tyrosine phosphatase; a seventh nucleic acid encoding an operator for the repressor element, wherein the repressor element optionally comprises a cI repressor; an eighth nucleic acid sequence comprising a binding site for the RNA polymerase; and a ninth nucleic acid sequence encoding the reporter gene. In some embodiments, the expression system further comprises a tenth nucleic acid sequence encoding Hsp90 co-chaperone Cdc37. In some embodiments, the first nucleic acid sequence and the second nucleic acid sequence encode the phosphorylated tyrosine binding domain fused with the repressor element. In some embodiments, the third nucleic acid sequence and the fourth nucleic acid sequence encode the subunit of the RNA polymerase fused with the tyrosine phosphatase substrate. In some embodiments, the first nucleic acid sequence and the fourth nucleic acid sequence encode the phosphorylated tyrosine binding domain fused with the tyrosine phosphatase substrate. In some embodiments, the second nucleic acid sequence and the third nucleic acid sequence encode the repressor element fused with the subunit of the RNA polymerase.
In some embodiments, the tyrosine phosphatase substrate comprises a polypeptide, wherein the polypeptide comprises a tyrosine residue configured to: (i) be phosphorylated by the Src kinase; (ii) be dephosphorylated by the target tyrosine phosphatase; (iii) bind to the SH2 domain when the tyrosine residue is phosphorylated; (iv) bind to the SH2 domain with less binding affinity when the tyrosine residue is dephosphorylated as compared to the binding affinity between the tyrosine residue and the SH2 domain when the tyrosine residue is phosphorylated; or any combination of (i) to (iv). In some embodiments, the tyrosine phosphatase substrate comprises a substrate domain derived from a hamster polyomavirus middle T antigen (MidT). In some embodiments, the expression system further comprises a first barcode sequence operably linked to the sixth nucleic acid sequence, wherein the first barcode is sufficient to identify the target tyrosine phosphatase. In some embodiments, the barcode comprises an index, and wherein the index comprises greater than or equal to about 6 contiguous base pairs that are specific to the target tyrosine phosphatase. In some embodiments, the expression system further comprises a single nucleic acid molecule comprising any combination of the first nucleic acid sequence, the second nucleic acid sequence, the third nucleic acid sequence, the fourth nucleic acid sequence, the fifth nucleic acid sequence, the sixth nucleic acid sequence, the seventh nucleic acid sequence, the eighth nucleic acid sequence, and the ninth nucleic acid sequence. In some embodiments, the single nucleic acid molecule comprises a vector comprising a plasmid vector, a viral vector, a cosmid, an artificial chromosome, or a region of a chromosome of the cell.
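For orientation only, the qualitative relationship between the components described above (kinase, target tyrosine phosphatase, substrate, SH2 domain, and reporter) can be sketched as a Boolean idealization in Python. This sketch is an assumption-laden simplification and not a model provided by the disclosure: it treats each component as fully on or off, ignores binding affinities and expression levels, and assumes that a modulator acts as an inhibitor. The function name is hypothetical.

```python
def reporter_is_on(kinase_present: bool,
                   phosphatase_active: bool,
                   inhibitor_present: bool) -> bool:
    """Idealized on/off view of the phosphorylation-dependent reporter."""
    # Assumption: an inhibitor, when present, fully blocks the phosphatase.
    effective_phosphatase = phosphatase_active and not inhibitor_present
    # The substrate tyrosine remains phosphorylated only if the kinase acts
    # on it and no effective phosphatase removes the phosphate.
    substrate_phosphorylated = kinase_present and not effective_phosphatase
    # A phosphorylated substrate binds the SH2 domain, bringing the RNA
    # polymerase subunit to the operator-bound repressor element, which in
    # this idealization switches the reporter gene on.
    return substrate_phosphorylated


# Enumerate the idealized states for a cell that expresses the kinase.
states = {
    "phosphatase active, no inhibitor": reporter_is_on(True, True, False),
    "phosphatase active, inhibitor present": reporter_is_on(True, True, True),
    "phosphatase absent": reporter_is_on(True, False, False),
}
```

In this idealization, the reporter switches on only when the phosphatase is absent or inhibited, which mirrors the statement that increased reporter expression indicates the presence of a modulator.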
In some embodiments, one or more of the first nucleic acid sequence, the second nucleic acid sequence, the third nucleic acid sequence, the fourth nucleic acid sequence, the fifth nucleic acid sequence, the sixth nucleic acid sequence, the seventh nucleic acid sequence, the eighth nucleic acid sequence, and the ninth nucleic acid sequence is a vector comprising a plasmid vector, a viral vector, a cosmid, an artificial chromosome, a region of the host chromosome, or any combination thereof. In some embodiments, the reporter gene encodes: a luciferase enzyme; a fluorescent polypeptide; secreted alkaline phosphatase; β-galactosidase; levansucrase; chloramphenicol acetyltransferase (CAT); antibiotic resistance; or a polymerizing enzyme that, when expressed, binds to a promoter operably linked to a gene encoding a detectable polypeptide to drive expression of the detectable polypeptide. In some embodiments, the expression of the detectable polypeptide is greater than an expression of the detectable polypeptide when a gene encoding the detectable polypeptide is included as the reporter gene. In some embodiments, the expressing the terpene synthase comprises introducing an exogenous nucleic acid into the cell, wherein the exogenous nucleic acid encodes the terpene synthase. In some embodiments, the exogenous nucleic acid comprises a second barcode sequence sufficient to identify the terpene synthase. In some embodiments, the second barcode comprises an index, and wherein the index comprises greater than or equal to about 6 contiguous base pairs that are specific to the terpene synthase. In some embodiments, the exogenous nucleic acid further encodes an enzyme configured to catalyze condensation of isopentenyl diphosphate (IPP), dimethylallyl diphosphate (DMAPP), or a combination of IPP and DMAPP. In some embodiments, the enzyme comprises a geranylgeranyl diphosphate synthase (GGPPS).
In some embodiments, the exogenous nucleic acid further encodes a cytochrome P450 enzyme, a cytochrome P450 reductase enzyme, a cytochrome b5 enzyme, an oxidase enzyme, an acyltransferase enzyme, a glycosyltransferase enzyme, a halogenase, a peroxidase, or any combination thereof. In some embodiments, the expressing the terpene synthase further comprises introducing another exogenous nucleic acid encoding a metabolic pathway for (i) the IPP, (ii) the DMAPP, (iii) molecules resulting from the condensation of the IPP or the DMAPP, or the combination of IPP and DMAPP, or (iv) any combination thereof. In some embodiments, the methods further comprise isolating the modulator of the target tyrosine phosphatase. In some embodiments, the reference expression level is derived from a reference cell expressing the second nucleic acid sequence that is modified such that the tyrosine phosphatase substrate contains a mutation that inhibits its binding to the SH2 domain. In some embodiments, the reference expression level is derived from a reference cell comprising a modified terpene synthase comprising a mutation that reduces its activity as compared with an otherwise identical terpene synthase that does not have the mutation.
Aspects provided herein provide isolated nucleic acid molecules encoding the terpene synthase comprising: an amino acid sequence encoding a terpene synthase, wherein the amino acid sequence comprises a mutation that increases modulation of a target tyrosine phosphatase as compared with an otherwise identical terpene synthase without the mutation. In some embodiments, the isolated nucleic acid molecule is a vector comprising a plasmid vector, a viral vector, a cosmid, an artificial chromosome, or a region of a host chromosome.
Aspects disclosed herein provide isolated cells comprising the terpene synthase comprising: an amino acid sequence encoding a terpene synthase, wherein the amino acid sequence comprises a mutation that increases modulation of a target tyrosine phosphatase as compared with an otherwise identical terpene synthase without the mutation. In some embodiments, the cell comprises a prokaryotic cell. In some embodiments, the cell is obtained from a unicellular organism. In some embodiments, the cell comprises a bacterial cell, an algae cell, an archaea cell, a protozoa cell, or a fungal cell. In some embodiments, the fungal cell is a yeast cell. In some embodiments, the bacterial cell is an Escherichia coli (E. coli) cell.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
The novel features of the inventive concepts are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present inventive concepts will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the inventive concepts are utilized, and the accompanying drawings of which:
Disclosed herein are systems, methods, and compositions for the discovery of bioactive molecules with therapeutic potential that modulate the activity of a target enzyme. The disclosure also provides systems, methods and compositions for directed evolution of metabolic pathways that produce bioactive molecules that modulate target enzyme function. The systems and methods disclosed herein have been optimized for high-throughput screens of bioactive modulators of a target enzyme (e.g., terpenoids) that, in some cases, mimic or recreate natural processes of diversification and selection. For instance, the methods and systems for high-throughput screens may involve large numbers of metabolic pathways, target enzymes, or both, thereby increasing the diversity and number of bioactive molecules that can be discovered. In some embodiments, the system comprises one or more expression systems including without limitation (i) a two-hybrid system that, when expressed in a cell, links a detectable output (e.g., luminescence or cell growth) to the modulation of a target enzyme (e.g., therapeutic target), and (ii) a metabolic system that enables the biosynthesis of structurally varied bioactive molecules that modulate a target enzyme (e.g., potential therapeutic agent). In some embodiments, the cell is a microorganism, such as a bacterial cell (e.g., E. coli). In some embodiments, the detectable output is amplified by linking the activity of the target enzyme to a gene of interest (GOI) encoding an enzyme that drives expression of a detectable polypeptide, such as a fluorescent or bioluminescent polypeptide.
Some aspects of this disclosure provide systems, methods and compositions for identifying bioactive molecules that modulate the activity of proteases, protein phosphatases (e.g., protein tyrosine phosphatase), or combinations thereof. In some embodiments, the systems, methods and compositions described herein are capable of identifying bioactive molecules with therapeutic potential that modulate the activity of various proteases utilizing a specific variety of the two-hybrid system that contains a protease cleavage recognition motif that, when cleaved by the protease, disrupts transcription of the GOI. In some embodiments, the target enzyme may be an enzyme of a pathogen. For example, a target enzyme may be a functional protein of a virus (e.g., viral protease), such that an implementation of the systems and methods disclosed herein is used to discover a bioactive molecule (e.g., therapeutic molecule) that targets the functional protein of a virus. Similarly, in some embodiments, the target enzyme may be a functional protein of a bacterial pathogen, a prion pathogen, or any one of various pathogens where the functional protein is tied to the infectivity, severity, and/or progression of a disease associated with the pathogen. Utilizing the systems and methods disclosed herein may accelerate the discovery process of therapeutic molecules.
Some aspects of this disclosure provide systems, methods and compositions for identifying novel synthases that produce the bioactive molecules disclosed herein. In some embodiments, the novel synthases are terpene synthases or non-ribosomal peptide synthetases. The present disclosure provides numerous modified synthases that have undergone single site mutagenesis (SSM) to improve production of bioactive molecules of interest. In some embodiments, modified terpene synthases disclosed herein produce an increased diversity of novel terpenoids with therapeutic potential.
Aspects of this disclosure also provide cells (e.g., microorganisms) that are configured to guide the discovery and biosynthesis of the bioactive molecules as novel targeted therapeutics. In some embodiments the cells are semi-synthetic. In some embodiments the cell comprises the one or more expression systems disclosed herein. In some embodiments, the cells produce the bioactive molecules, such as metabolic products or modulators (e.g., activators or inhibitors) of a target enzyme, disclosed herein. In some cases, discovered metabolic products may exhibit single-digit micromolar half maximal inhibitory concentrations (IC50s) or inhibitor constants (Kis), or unusual modes of inhibition, or a combination thereof.
Drug design is an exceedingly difficult problem. Despite advances in structural biology and computational chemistry, the design of molecules that bind tightly to specific disease-relevant proteins can still be extremely difficult. Some drug development processes may begin with screens of large molecular libraries. A molecule, once identified, may be synthesized in quantities sufficient for subsequent analysis, optimization, and clinical evaluation—which is a challenging feat. The economics of pharmaceutical development for infectious diseases may disincentivize costly discovery efforts until after an outbreak has occurred—which may constrain the time available to search a given chemical space accessible with some screening methodologies.
Meanwhile, nature has endowed living systems with the catalytic machinery to build an enormous variety of biologically active molecules. These living systems evolved to synthesize various biologically active molecules to carry out important metabolic and ecological functions (e.g., the phytochemical recruitment of predators of herbivorous insects), and some of these molecules exhibit useful medicinal properties in humans. Over the years, screens of environmental extracts and natural product libraries, augmented on occasion with combinatorial (bio)chemistry, have uncovered a diverse set of therapeutics, from aspirin to paclitaxel. Unfortunately, these screens may be resource intensive, limited by low natural titers, and largely subject to serendipity. Bioinformatic tools, in turn, have permitted the identification of biosynthetic gene clusters, where co-localized resistance genes can reveal the biochemical function of their products. The therapeutic applications of many natural products, however, differ from their native functions, and many biosynthetic pathways can, when appropriately reconfigured, produce entirely new and, perhaps, more effective therapeutic molecules. Methods for identifying and evolving natural products that solve specific, therapeutically relevant challenges remain largely undeveloped; as a result, the biomedical potential of these molecules, and of the enzymes that make them, has yet to be fully realized.
The system disclosed herein, in some embodiments, comprises a two-hybrid system (e.g., bacterial two-hybrid (B2H) system) that, when transfected into a cell, links survival or a detectable output of the cell to production of a modulator of a target enzyme (e.g., therapeutic target) encoded by the two-hybrid system. In some embodiments, the system also comprises one or more exogenous nucleic acid molecules encoding a metabolic pathway and a synthase responsible for expressing the bioactive molecules in the cell that modulate the target enzyme. In some embodiments, the cell is a genetically encoded microorganism (e.g., E. coli) engineered to express the two-hybrid system, the metabolic system, and the synthase under conditions sufficient to guide the cell to assemble various bioactive molecules that modulate the intended target enzyme. This approach has numerous important benefits over traditional drug discovery processes, including, but not limited to: (i) it can enable rapid, fermentation-based scale up for compound optimization, preclinical studies, and early human trials, and, thus, promises to accelerate the pace, and reduce the cost, of therapeutic development; (ii) it does not necessarily presuppose a specific molecular structure and thus facilitates the identification of nonintuitive relationships between modulators (e.g., inhibitors) and target enzymes (e.g., drug targets); (iii) it does not necessarily require the specification of a single binding site and thus permits the discovery of new sites; (iv) it can use cellular machinery (e.g., chaperones) to stabilize full-length drug targets; (v) it permits the construction of structurally varied leads, or “backups”, that can mitigate risk in drug development; and (vi) it is compatible with DNA barcoding technology and next-generation sequencing and, thus, permits multiplexing across many pathways and many targets.
The economics of the system are well suited for multi-target discovery campaigns designed to produce a broad set of new, synthetically tractable lead compounds before a pandemic has occurred (or rapidly after it begins). The inventive concepts disclosed herein build on certain aspects of the systems disclosed in U.S. patent application Ser. Nos. 17/141,321 and 17/859,509, each of which is hereby incorporated by reference in its entirety.
Provided herein, in some aspects, are genetically-encoded systems that have been modified to identify modulators of new target enzymes (e.g., therapeutic targets), such as proteases. In some embodiments, the two-hybrid system, the metabolic pathway, the synthase, or any combination thereof, of the genetically-encoded systems is modified. For example, referring to
In some aspects, the systems are engineered to produce natural and unnatural protease inhibitors of a particular drug target by harnessing the endogenous biosynthetic pathways of the cell. In some embodiments, the proteases are human proteases, viral proteases, or a combination thereof. Discovery of viral protease inhibitors may be relevant to preventing or treating diseases or conditions associated with pathogenic infections by disrupting the function(s) of a given virus (e.g., HIV-1 protease (HIV-1Pr) and 3-chymotrypsin-like protease (3CLpro) from SARS-CoV-2). In some embodiments, the human proteases comprise Ubiquitin-specific-processing protease 7 (USP7). Discovery of human protease inhibitors may be relevant to preventing or treating diseases or conditions associated with the overactivity or overexpression of proteases, including, for example, vascular disease, cancer, and others.
Further, the optimal design of each protease system and workflow disclosed herein is adaptable to the development of similar tools for the discovery of modulators of other types of therapeutic targets. The 74 terpenoid pathways were evolved using the system, and several enzyme combinations were identified that show altered resistance phenotypes (implying biosynthesis of protease inhibitors). The system, which may encompass a bacterial two-hybrid system, may enable the detection of biosynthetically accessible small molecules that inhibit proteases and other potential therapeutic targets.
Optimization of the systems disclosed herein to identify novel modulators of proteases has profound implications for treating difficult-to-treat diseases or conditions associated with protease activity. Proteases are centrally important to many biochemical processes and have provided a rich set of targets for treating human diseases. These enzymes, which catalyze the hydrolysis of peptide bonds, coordinate the dynamic remodeling, and functional rewiring, of the complex protein systems that underlie blood clotting, repair, and viral assembly, among other biochemical feats. Over the years, proteases have emerged as important targets for other viral diseases, notably hepatitis C and Coronavirus disease of 2019 (COVID-19), as well as cardiovascular disorders and cancer. Despite their therapeutic promise, proteases often evolve resistance mutations, which can emerge early in clinical trials, and remain subject to the same slow development timelines that plague other drugs. New approaches for discovering protease inhibitors could help address resistance mutations and accelerate drug development.
Natural products are a longstanding source of pharmaceuticals and bioactive compounds, including protease inhibitors, but have proven challenging to screen in high-throughput assays. Their low natural abundance and complex biological matrices (e.g., multicomponent extracts) tend to complicate compound detection and dereplication, while their chemical structures, which often include multiple stereocenters, tend to slow scale-up and hit optimization. Advances in microbial genetics and bioinformatics have led to an explosion of new biosynthetic gene clusters (BGCs) and uncovered enzymes capable of adding biochemically nonstandard functionalities (e.g., terminal alkynes, halogens, and hydrazines). The structures and biological activities of biosynthetic compounds, however, remain challenging to predict from sequence data alone, and functional characterization typically requires laborious extraction and purification steps.
The genetically encoded microorganisms disclosed herein, which are equipped with the systems disclosed herein, offer a promising means of accelerating the discovery of pharmaceutically relevant natural products. These in vivo systems link the inhibition of a heterologously expressed target enzyme to a biochemical output (e.g., growth, color formation, or fluorescence); they have several important advantages over in vitro assays: (i) they can screen DNA-encoded pathways, where library size is limited by transformation efficiency; (ii) they require only a small amount of target protein, which is maintained by a living cell, and can avoid the laborious protein purification and stabilization steps required for in vitro assays; (iii) they are designed to detect inhibitors within the cellular milieu and can thus provide an initial—if, largely, general—screen for inhibitor stability and toxicity; and (iv) they facilitate rapid scale-up of molecular synthesis via microbial fermentation.
Genetically encoded biosensors for enzyme inhibitors are sparse; to date, most have focused on controlling cell viability. Illustrative strategies for protease inhibitors include (i) the addition of protease recognition sites to antibiotic resistance proteins (e.g., the metal-tetracycline/H+ antiporter) or essential regulatory enzymes (e.g., adenylate cyclase, which synthesizes cyclic AMP), or (ii) the use of proteolyzable “pro” domains to cage toxic proteins (e.g., ribosomal protein S12, which restores the streptomycin sensitivity of streptomycin-resistant E. coli). Several of these systems have enabled the detection of peptide inhibitors synthesized in microbial hosts, but their direct modification of phenotype-specific proteins (e.g., the adenylate cyclase) tends to limit their rapid extension to other proteases or biochemical outputs.
Also provided, in some aspects, are modified synthase enzymes (e.g., terpene synthases) expressed by the genetically-encoded systems disclosed herein. In some embodiments, the system has been modified to increase the diversity of the modulators produced by the cell. In some embodiments, the nucleic acid molecules encoding the synthase (e.g., enzyme responsible for producing the therapeutic target, e.g., protease or a phosphatase) may be modified to produce mutant synthase enzymes in the cell that produce a more diverse range of therapeutic targets against which the cell produces a more diverse range of modulators. For example, as described herein, the synthase responsible for producing terpenoids (e.g., terpene synthase) may be modified to produce a wider range of terpenes or terpenoids. In some embodiments, γ-humulene synthase, a low-producing terpene synthase generating many products, is mutated at one or more (e.g., 2) amino acid positions under conditions sufficient to produce a larger number of diverse terpenoid inhibitors. In some embodiments, the synthase variants produced at least two potential terpenoid inhibitors with titers increased 12- and 50-fold compared to the starting enzyme.
Also provided herein, in some aspects, are extensions of the system described here to screen large numbers of pathways and target enzymes. In some embodiments, molecular barcodes may be applied to one or more components of the genetically-encoded systems, such as the synthase, the metabolic pathway, the target enzyme, or any combination thereof. In some embodiments, the efficiency of the system is increased by pooling cells having barcoded components and analyzing them using multiplex sequencing analysis. Secondary sequence data analysis utilizing suitable computer programs demultiplexes the cells, and assigns the unique molecular barcode to the one or more components of the genetically-encoded systems.
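As a non-limiting illustration of the demultiplexing step described above, the following Python sketch assigns sequencing reads to barcoded components by exact match on a leading barcode. The barcode sequences, component names, read layout (barcode at the start of the read), and function name are all hypothetical assumptions made for this example; the disclosure does not specify a particular program or read structure.

```python
from collections import Counter


def demultiplex(reads, barcode_to_name, barcode_length=6):
    """Count reads per barcoded component using an exact prefix match.

    Returns (counts, unassigned), where counts maps component names to read
    counts and unassigned is the number of reads with no matching barcode.
    """
    counts = Counter()
    unassigned = 0
    for read in reads:
        prefix = read[:barcode_length]
        name = barcode_to_name.get(prefix)
        if name is None:
            unassigned += 1
        else:
            counts[name] += 1
    return counts, unassigned


# Hypothetical 6-base barcodes and reads, used only to exercise the function.
example_barcodes = {"ACGTAC": "pathway_1", "TTGCAA": "pathway_2"}
example_reads = ["ACGTACGGGT", "TTGCAACCCA", "ACGTACTTTT", "NNNNNNAAAA"]
example_counts, example_unassigned = demultiplex(example_reads, example_barcodes)
```

Exact matching is the simplest choice; a production workflow might instead tolerate a bounded number of mismatches, which would require barcodes that differ from one another at several positions.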
In a pilot experiment described herein, the inventors of the instant disclosure combined (i) three isoprenoid pathways, (ii) 37 terpenoid pathways, and (iii) five protein tyrosine phosphatase (PTP)-specific B2H systems in a single screen. To overcome the challenge of screening the 555 possible combinations of these three sets of plasmids with drop-based plating, both (i) the terpenoid pathways and (ii) the B2H systems were barcoded, which reduced the required number of transformations to 15 (e.g., one for each precursor-B2H combination). In this pilot experiment, each transformation was plated on both selective and non-selective media, the pools were amplified from each plate with a PCR reaction that introduced a second barcode for the PTP of interest, and next generation sequencing was used to measure the enrichment of specific pathways. As disclosed herein, high quality, statistically significant data was obtained for 10^4 to 10^5 variants when sequencing short amplicons, illustrating that the proposed strategy is compatible with very large biosynthetic libraries and/or numerous target two-hybrid systems. Without being bound by any particular theory, the high throughput extensions of the systems disclosed herein are applicable to systems configured to identify bioactive molecules that modulate any target enzyme, not just phosphatases as illustrated in this pilot study.
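The arithmetic of the pilot experiment, and one possible way to express enrichment from barcode counts, can be sketched in Python as follows. The library sizes are taken from the description above. The enrichment metric (a log2 ratio of pseudocount-adjusted read frequencies on selective versus non-selective media), the function name, and the example counts are assumptions introduced for illustration and are not stated in the disclosure.

```python
import math

# Library sizes stated for the pilot experiment.
n_isoprenoid_pathways = 3
n_terpenoid_pathways = 37
n_b2h_systems = 5

# Every precursor/terpenoid/B2H combination that the screen covers.
total_combinations = n_isoprenoid_pathways * n_terpenoid_pathways * n_b2h_systems
# With terpenoid pathways pooled, one transformation per precursor-B2H pair.
pooled_transformations = n_isoprenoid_pathways * n_b2h_systems


def log2_enrichment(selective_counts, nonselective_counts, pseudocount=1.0):
    """Log2 ratio of each pathway's read frequency, selective vs non-selective.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for pathways that are absent from one of the two plates.
    """
    names = set(selective_counts) | set(nonselective_counts)
    sel_total = sum(selective_counts.values()) + pseudocount * len(names)
    non_total = sum(nonselective_counts.values()) + pseudocount * len(names)
    enrichment = {}
    for name in names:
        sel_freq = (selective_counts.get(name, 0) + pseudocount) / sel_total
        non_freq = (nonselective_counts.get(name, 0) + pseudocount) / non_total
        enrichment[name] = math.log2(sel_freq / non_freq)
    return enrichment


# Hypothetical read counts for two barcoded pathways on each type of plate.
example_enrichment = log2_enrichment(
    {"pathway_1": 30, "pathway_2": 10},
    {"pathway_1": 20, "pathway_2": 20},
)
```

Under this metric, a positive value indicates that a pathway is over-represented on selective media relative to non-selective media, and a negative value indicates depletion.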
Also provided, in some aspects, are kits comprising the systems disclosed herein, and instructions for how to use the systems disclosed herein to identify novel modulators of an intended therapeutic target, or purify novel modulators of an intended therapeutic target, or a combination thereof. Such kits may comprise a container to store the system components and instructions.
Provided herein are systems for identifying a novel modulator of a target enzyme, or identifying one or more metabolic pathways that produce bioactive molecules that modulate the activity of a target enzyme, or both. In some embodiments, the target enzyme is a therapeutic target (e.g., phosphatase, protease) disclosed herein. In some embodiments, the systems comprise genetically encoded systems that, when introduced into a cell under suitable conditions, induce the cell to produce novel modulators of the target enzyme. The systems disclosed herein comprise the cell, which, in some cases, is referred to herein as a genetically-encoded microorganism, once it has been engineered to contain the genetically-encoded systems disclosed herein. Also provided are systems for expanding screens for the novel modulators of the target enzyme or metabolic pathways from the genetically-encoded systems using high throughput analysis, such as multiplex sequencing. To that end, certain computer systems are also encompassed in the systems disclosed herein, which store and are programmed to perform instructions for analyzing the multiplex sequencing results, such as demultiplexing, sequence alignment, and so forth.
Provided herein, in some aspects, are genetically-encoded systems that comprise one or more system components, such as one or more nucleic acid molecules encoding a two-hybrid system, a metabolic pathway, an enzyme for producing the target enzyme, or any combination thereof. In some embodiments, the target enzyme comprises a protease. In some embodiments, the target enzyme comprises a phosphatase (e.g., tyrosine phosphatase). In some embodiments, the two-hybrid system comprises or is a bacterial two-hybrid system. In some embodiments, the enzyme for producing the target enzyme comprises a terpene synthase. In some embodiments, the metabolic pathway comprises a mevalonate pathway, a methylerythritol 4-phosphate pathway, or a combination thereof. In some embodiments, the one or more nucleic acid molecules encoding the metabolic pathway comprises one or more metabolic intermediates for terpene synthesis. In some embodiments, the system comprises a cell. In some embodiments, the cell comprises the one or more nucleic acid molecules encoding a two-hybrid system, a metabolic pathway, an enzyme for producing the target enzyme, or any combination thereof. In some embodiments, the cell is configured to express the gene expression products from the system to facilitate production of novel modulators of an intended target enzyme by the cell.
Provided herein are cells that may be engineered to contain or express one or more systems disclosed herein. In some embodiments, the cell comprises the two-hybrid system. In some embodiments, the cell comprises the metabolic pathway. In some embodiments, the cell comprises the enzyme for producing the target enzyme (e.g., therapeutic target). In some embodiments, the enzyme is a synthase (e.g., terpene synthase). In some embodiments, the cell comprises one or more nucleic acid molecules encoding the two-hybrid system, the metabolic pathway, the enzyme for producing the target enzyme, or any combination thereof.
In some embodiments, the cell comprises a microbial cell. In some embodiments, the microbial cell comprises an Escherichia coli cell. In some embodiments, the microbial cell comprises a Bacillus subtilis cell. In some embodiments, the microbial cell comprises a Cupriavidus necator cell. In some embodiments, the microbial cell comprises a Streptomyces lividans cell. In some embodiments, the microbial cell comprises a Streptomyces reveromyceticus cell. In some embodiments, the microbial cell comprises a Streptomyces venezuelae cell. In some embodiments, the microbial cell comprises a Synechococcus leopoliensis cell. In some embodiments, the microbial cell comprises a Saccharomyces cerevisiae cell. In some embodiments, the microbial cell comprises a Streptomyces coelicolor cell. In some embodiments, the microbial cell comprises a Pichia pastoris cell. In some embodiments, the microbial cell comprises a Pichia guilliermondii cell. In some embodiments, the microbial cell comprises a Yarrowia lipolytica cell. In some embodiments, the microbial cell comprises a Rhodosporidium toruloides cell. In some embodiments, the microbial cell comprises a Metarhizium brunneum cell. In some embodiments, the microbial cell comprises an Aspergillus niger cell. In some embodiments, the microbial cell comprises a Rhizopus oryzae cell.
In some embodiments, the cell comprises a mammalian cell. In some embodiments, the mammalian cell comprises a Chinese hamster ovary cell. In some embodiments, the mammalian cell comprises a baby hamster kidney cell. In some embodiments, the mammalian cell comprises a HeLa cell (a cervical cancer cell derived from Henrietta Lacks). In some embodiments, the mammalian cell comprises a human embryonic kidney cell. In some embodiments, the mammalian cell comprises a human retinal cell. In some embodiments, the mammalian cell comprises a Sp2/0 mouse myeloma cell. In some embodiments, the mammalian cell comprises a NS0 mouse myeloma cell.
In some embodiments, the cell is wild-type. In some embodiments, the cell is modified relative to a wild-type cell of the same type. For example, the cell may be modified to express the metabolic pathway prior to introducing the two-hybrid system into the cell. In another example, the cell may be modified to express the two-hybrid system prior to introducing the metabolic pathway into the cell. In another example, the cell may lack one or more endogenous genes, such as for example, a gene to encode the target enzyme where applicable. In another example, the cell may lack a gene for a subunit of RNA polymerase or portions thereof, such as the omega subunit. In another example, the cell may lack one or more native genes that enhance the intracellular production or intracellular accumulation of a bioactive molecule that modulates the activity of a target enzyme in the cell. In another example, the cell may have a deletion or mutation that reduces homologous recombination events likely to disrupt plasmids, such as a deletion of the recA1 gene. In some cases, the cell may have a deletion or mutation that improves the titratability of certain inducible promoters such as an arabinose-inducible promoter. In some embodiments, the cell is a cell line. In some embodiments, the cell line is immortalized.
In some embodiments, the cell is stored in a medium, such as Luria-Bertani liquid medium, Luria-Bertani solid medium, terrific broth liquid medium, terrific broth solid medium, yeast extract peptone dextrose liquid medium, yeast extract peptone dextrose solid medium, yeast synthetic drop-out medium, yeast nitrogen base, modified minimum essential medium, Dulbecco's modified Eagle medium, Ham's F10 medium, Ham's F12 medium, Roswell Park Memorial Institute medium, Glasgow's modified minimum essential medium, or Leibovitz L-15 medium. In some embodiments, the cell is stored in a medium as a suspension or attached to a surface (e.g., flask, plate, or well). In some embodiments, the media comprises one or more media components, such as an energy source (e.g., glucose), protein, vitamins, inorganic salts, serum, growth factors, hormones, attachment factors, amino acids, peptone, carbohydrates, minerals, pH buffer system, pH indicators, metals, blood, gelling agents (e.g., agar or pectin), or any combination thereof. In some embodiments, the media is selection media that contains a means for selecting only the cells that produced a modulator of a target enzyme (e.g., terpenoid inhibitor, protease inhibitor). In some embodiments, such selection media may contain an antibiotic, antiseptic, peptone, carbohydrate, inorganic salt, chemical substances (e.g., bile salts, lithium chloride, irgasan, tamoxifen, or potassium tellurite), adenosine deaminase, cytosine deaminase, dihydrofolate reductase, dye, phage, or any combination thereof. In some embodiments, such selection media may lack an amino acid, nutrient, carbohydrate, nucleoside, inorganic salt, serum, growth factor, or any combination thereof. 
In some embodiments, the antibiotic comprises penicillin, streptomycin, ampicillin, carbenicillin, spectinomycin, bleomycin, novobiocin, doxycycline, tetracycline, neomycin, kanamycin, zeocin, puromycin, geneticin, amphotericin, gentamicin, polymyxin B, hygromycin B, blasticidin, vancomycin, erythromycin, chloramphenicol, ticarcillin, or cefixime. In some embodiments, the media is a growth cell medium. In some embodiments, the growth cell medium may comprise glycerol at a concentration between 0% and 2% (by volume). In some embodiments, the growth medium comprises mevalonate at a concentration between 0 mM and 20 mM. In some embodiments, the growth medium comprises isopropyl β-D-thiogalactopyranoside (IPTG) at a concentration between 0 mM and 0.5 mM. In some embodiments, the growth medium comprises 3-morpholinopropane-1-sulfonic acid (MOPS) at a concentration between 0 mM and 50 mM. In some embodiments, the growth medium comprises sucrose at a concentration between 0% and 5% weight/volume.
The cells disclosed here may be isolated or purified. Suitable methods of purifying or isolating a cell may be found in Invitrogen, Gibco. “Cell culture basics.” Life technologies (2014), Sivashanmugam, Arun, et al. “Practical protocols for production of very high yields of recombinant proteins using Escherichia coli.” Protein science 18.5 (2009): 936-948, and Clontech. “Yeast Protocols Handbook.” Takara Bio (2009), each of which is incorporated by reference in its entirety.
In some embodiments, the cell comprises a prokaryotic cell. In some embodiments, the cell is obtained from a unicellular organism. In some embodiments, the cell is or comprises a bacterial cell, an algae cell, an archaea cell, a protozoa cell, or a fungal cell. In some embodiments, the fungal cell may be a yeast cell. In some embodiments, the bacterial cell may be an E. coli cell. In some embodiments, the cell is isolated or purified. In some embodiments, the cell is in a cell line or cell culture. In some embodiments, a plurality of cells are provided, wherein each cell comprises a unique expression system disclosed herein.
Provided herein are improved systems for producing novel modulators of a target enzyme (e.g., therapeutic target) by linking expression of a gene of interest (GOI) with production of a novel modulator with a two-hybrid system. In some embodiments, the two-hybrid system comprises a bacterial two-hybrid (B2H) system. In some embodiments, the two-hybrid system comprises a yeast two-hybrid (Y2H) system. In some embodiments, the two-hybrid system is a fluorescent two-hybrid system. In some embodiments, the two-hybrid system is an enzymatic two-hybrid system. In some embodiments, the Y2H is a split-ubiquitin Y2H system. In some embodiments, the GOI encodes a survival advantage (e.g., antibacterial resistance) for the cell such that the two-hybrid system utilizes cell survival as a selection pressure to identify cells that produced the modulators of the target enzyme.
In some embodiments, the two-hybrid system comprises one or more nucleic acid molecules encoding a receptor (e.g. phosphorylated protein binding domain), a DNA binding protein (e.g., repressor element), a subunit of RNA polymerase or portions thereof, a ligand (e.g. kinase substrate), a target enzyme, an operator for the repressor element, or a combination thereof. In some embodiments, where the receptor is or comprises a phosphorylated protein binding domain and the ligand is or comprises a kinase substrate, then the one or more nucleic acid molecules also encode a kinase. In some embodiments, the one or more nucleic acid molecules comprises a binding site for the subunit for the RNA polymerase configured to bind to the subunit for RNA polymerase and initiate transcription of a gene of interest (GOI), such as a reporter gene. In some embodiments, the phosphorylated protein binding domain is a phosphorylated tyrosine binding domain. In some embodiments, the kinase substrate is a tyrosine kinase substrate. In some embodiments, the kinase is a tyrosine kinase. In some embodiments the GOI is a reporter gene. In some embodiments, the one or more nucleic acid molecules further encodes a chaperone polypeptide. In some embodiments, the one or more nucleic acid molecules is or comprises an expression vector. In some embodiments, the expression vector is or comprises a plasmid vector, a viral vector, a cosmid, an artificial chromosome, or a region of the host chromosome. In some embodiments, the two-hybrid system comprises less than or equal to 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleic acid molecules encoding the two-hybrid system. In some embodiments, more than or equal to 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 nucleic acid molecules encode the two-hybrid system. In some embodiments, the two-hybrid system comprises two (2) nucleic acid molecules encoding the two-hybrid system. 
In the case of a two-hybrid system comprising or consisting of 2 nucleic acid molecules, in some embodiments, the first nucleic acid molecule encodes the receptor (e.g., phosphorylated tyrosine binding domain), a repressor element, a subunit of RNA polymerase or portions thereof, ligand (e.g., a tyrosine kinase substrate), tyrosine kinase, and the target enzyme; and the second nucleic acid molecule encodes the operator for the repressor element and comprises a binding site for the subunit for the RNA polymerase.
In some embodiments, the receptor comprises a polypeptide suitable for binding the ligand. In some embodiments, the receptor is or comprises a ligand-binding domain. In some embodiments, the receptor is or comprises an antibody, single-domain antibody, single-chain fragment (scFv), miniprotein, a phosphorylated protein binding protein or domain thereof, or a ligand-binding portion thereof. In some embodiments, the receptor and ligand binding (e.g., forming a receptor-ligand pair) is phosphorylation dependent. For example, the receptor is or comprises a phosphorylated protein binding domain and the ligand is or comprises a kinase substrate, such that when the kinase substrate is phosphorylated, it binds to the receptor. In some embodiments, the phosphorylated protein binding domain comprises a phosphorylated serine/threonine binding domain. In some embodiments the phosphorylated serine/threonine binding domain comprises a 14-3-3, polo box, FHA, FF, BRCT, WW, WD40, or MH2 domain. In some embodiments, the phosphorylated protein binding domain comprises or is a phosphorylated tyrosine binding domain. In some embodiments, the phosphorylated tyrosine binding domain comprises Src homology 2 (SH2) domain, a phosphotyrosine-binding domain (PTB), or phosphotyrosine-interaction (PI) domain. In some embodiments, the phosphorylated protein binding domain comprises a modified or truncated polypeptide. In some embodiments, the phosphorylated protein binding domain comprises a truncated SH2. In some embodiments, the receptor and ligand binding is not phosphorylation dependent. In some embodiments, the receptor is or comprises an antibody or antigen-binding fragment thereof. In some embodiments, the ligand comprises a monobody, such as the HA4 monobody. In some embodiments, the receptor comprises an SH2 domain that can bind to nonphosphorylated proteins. In some embodiments, the receptor comprises the SH2 domain from Abl kinase. 
In some embodiments the ligand comprises an SspA binding domain. In some embodiments, the SspA binding domain is coupled to a light oxygen voltage 2 (LOV2) domain from Avena sativa such that it is partially obscured when LOV2 is in its dark state. In some embodiments, the receptor comprises a SspB domain, which is capable of binding to the SspA domain.
In some embodiments, the DNA binding protein is suitable for binding to a transcriptional start site of a gene of interest disclosed here. In some embodiments, the DNA binding protein is or comprises a repressor element. In some embodiments, the repressor element functions to repress transcription of the gene of interest. In other embodiments, the repressor element does not function to repress transcription of the gene of interest. In such embodiments, virtually any DNA binding protein will work in the two-hybrid system disclosed herein. Non-limiting DNA binding proteins include enhancers, transcription factors, or repressors. In some embodiments, the repressor element comprises a cI repressor. In some embodiments, the repressor element is a CymR repressor. In some embodiments, the repressor element is a Cro repressor. In some embodiments, the repressor element is any protein that binds to DNA with an affinity sufficient to activate transcription of a nearby gene of interest when the repressor element is fused to a subunit of RNA polymerase or portions thereof such that it can localize RNA polymerase to the gene of interest. In some embodiments, the repressor element is a nuclease DNA binding element. In some embodiments, the repressor element is a Cas DNA binding element. In some embodiments, the repressor element is a transcription factor.
In some embodiments, the subunit of the RNA polymerase is derived from a prokaryotic organism. In some embodiments, the prokaryotic organism is a microbe, such as bacteria, archaea, protozoa, fungi, algae, lichens, slime molds, viruses, or prions. In some embodiments, the bacteria comprises Escherichia coli, Bacillus subtilis, Mycobacterium, Streptomyces, or Cyanobacteria. In some embodiments, the bacteria comprises E. coli. In some embodiments, the subunit of the RNA polymerase is derived from a eukaryotic organism. In some embodiments, the eukaryotic organism is Arabidopsis thaliana, yeast, fly (e.g., Drosophila melanogaster), worm (e.g., Caenorhabditis elegans), zebrafish (e.g., Danio rerio), or mouse (e.g., Mus musculus). In some embodiments, the subunit of the RNA polymerase or portions thereof comprises an omega subunit of RNA polymerase (RPω, encoded by gene RpoZ). In some embodiments, RPω (RpoZ) may be identified with National Center for Biotechnology Information (NCBI) Gene ID: 12930353. In some embodiments, the subunit of RNA polymerase or portions thereof comprises an alpha subunit of RNA polymerase (RPα, encoded by gene rpoA). In some embodiments, the subunit or portions thereof is a sigma factor. In the case of eukaryotic RNA polymerase, in some embodiments, the RNA polymerase is or comprises RNA polymerase II. A portion of a subunit of an RNA polymerase disclosed herein may be, for example, the portion of the subunit that recruits RNA polymerase to the transcriptional start site of a GOI disclosed herein. In some embodiments, the portion of the subunit of RNA polymerase comprises the N-terminus of the amino acid sequence of the subunit, the C-terminus of the amino acid sequence of the subunit, both the N-terminus and the C-terminus of the amino acid sequence of the subunit, or neither of the N-terminus and the C-terminus of the amino acid sequence of the subunit.
In some embodiments, the kinase comprises a serine/threonine kinase. In some embodiments, the kinase comprises or is a tyrosine kinase. In some embodiments, the tyrosine kinase comprises Src Kinase. In some embodiments, Src Kinase is derived from Homo sapiens (human), which may be identified with NCBI Gene ID: 6714. In some embodiments, the Src Kinase is derived from Mus musculus (Mouse), Gallus gallus (Chicken), Rattus norvegicus (Rat), or Bos taurus (Bovine). In some embodiments, Src Kinase comprises an amino acid sequence comprising SEQ ID NO: 74. In some embodiments, Src Kinase comprises an amino acid sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 74. In some embodiments, the kinase is or comprises isopentenyl kinase. In some embodiments, isopentenyl kinase comprises an amino acid sequence provided in SEQ ID NO: 269. In some embodiments, isopentenyl kinase comprises an amino acid sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 269. In some embodiments, the kinase is or comprises choline kinase. In some embodiments, choline kinase comprises an amino acid sequence provided in SEQ ID NO: 267. In some embodiments, choline kinase comprises an amino acid sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 267. In some embodiments, the kinase is a portion of a kinase enzyme, such as a truncated version of any one of SEQ ID NOS: 74, 269, or 267. In some embodiments, the truncation comprises a truncation of an N-terminus, a C-terminus, or both of the amino acid sequence.
In some embodiments, the Src kinase comprises a truncation of amino acids 1-250, such as in SEQ ID NO: 246. In some embodiments, the kinase is or comprises lymphocyte-specific protein tyrosine kinase (Lck). In some embodiments, the Lck kinase comprises a truncation of amino acids 1-206 and 497-509, such as in SEQ ID NO: 247. In some embodiments, the kinase is or comprises Fyn kinase. In some embodiments, Fyn kinase comprises an amino acid sequence provided in SEQ ID NO: 248. In some embodiments, Fyn kinase comprises an amino acid sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 248. In some embodiments, the kinase is or comprises proto-oncogene tyrosine-protein kinase (Yes). In some embodiments, Yes kinase comprises an amino acid sequence provided in SEQ ID NO: 249. In some embodiments, Yes kinase comprises an amino acid sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 249. In some embodiments, the kinase is or comprises tyrosine kinase EphA2 (EphA2). In some embodiments, EphA2 comprises an amino acid sequence provided in SEQ ID NO: 250. In some embodiments, EphA2 comprises an amino acid sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 250. In some embodiments, the kinase is or comprises Bruton's tyrosine kinase (BTK). In some embodiments, BTK comprises an amino acid sequence provided in SEQ ID NO: 251. In some embodiments, BTK comprises an amino acid sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 251.
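By way of illustration only, the residue-range truncations described above can be sketched in Python. The sketch assumes that "a truncation of amino acids M-N" means removal of residues M through N (1-indexed, inclusive) from the full-length sequence; that reading, the function name `remove_residues`, and the toy sequence are assumptions made for this example and are not taken from the Sequence Listing.

```python
def remove_residues(sequence: str, ranges: list[tuple[int, int]]) -> str:
    """Return `sequence` with the given residue ranges removed.

    Ranges are 1-indexed and inclusive, matching the way residue
    positions are usually written (e.g., "amino acids 1-206").
    """
    drop = set()
    for start, end in ranges:
        if start < 1 or end < start:
            raise ValueError(f"invalid residue range: {start}-{end}")
        drop.update(range(start, end + 1))
    # Keep every residue whose 1-indexed position is not in a dropped range.
    return "".join(aa for pos, aa in enumerate(sequence, start=1) if pos not in drop)


# Hypothetical 20-residue toy sequence (not a sequence from this disclosure).
toy = "MGSNKSKPKDASQRRRSLEP"

# Remove an N-terminal stretch (1-5) and a C-terminal stretch (18-20).
print(remove_residues(toy, [(1, 5), (18, 20)]))  # SKPKDASQRRRS
```

Applied to a 509-residue sequence with the ranges 1-206 and 497-509, the same function would leave the 290 residues at positions 207-496.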
In some embodiments, the chaperone polypeptide comprises Hsp90 co-chaperone Cdc37. In some embodiments, the chaperone polypeptide comprises the GroEL/GroES complex. In some embodiments, Cdc37 comprises an amino acid sequence comprising SEQ ID NO: 76. In some embodiments, Cdc37 comprises an amino acid sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 76.
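As a non-limiting illustration of how a percent-identity threshold such as those recited above might be evaluated, the following Python sketch computes identity between two sequences that have already been aligned (gaps written as "-"). The choice of denominator (all aligned columns except those where both sequences have a gap) is one common convention and is an assumption of this sketch; the disclosure does not prescribe a particular alignment tool or identity definition. The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Both inputs must be the same length and use '-' for gaps.
    Columns where both sequences have a gap are ignored; a column
    with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in alignment")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the identity is greater than or equal to `threshold` percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy sequences differing at one of ten positions.
reference = "MKTAYIAKQR"
variant = "MKTAYLAKQR"
print(percent_identity(reference, variant))     # 90.0
print(meets_threshold(reference, variant, 85))  # True
```

Other conventions (for example, dividing by the length of the shorter sequence) can give different values for the same pair, so the convention should be stated alongside any reported percentage.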
In some embodiments, the components above (e.g., kinase, chaperone, receptor, ligand, etc.) may be derived from a prokaryotic organism. In some embodiments, the prokaryotic organism is a microbe, such as bacteria, archaea, protozoa, fungi, algae, lichens, slime molds, viruses, or prions. In some embodiments, the bacteria comprises Escherichia coli, Bacillus subtilis, Mycobacterium, Streptomyces, or Cyanobacteria. In some embodiments, the bacteria comprises E. coli. In some embodiments, the components above may be derived from a eukaryotic organism. In some embodiments, the eukaryotic organism is Arabidopsis thaliana, yeast, fly (e.g., Drosophila melanogaster), worm (e.g., Caenorhabditis elegans), zebrafish (e.g., Danio rerio), or mouse (e.g., Mus musculus).
Two or more two-hybrid system components may be coupled to each other. In some embodiments, two or more of the receptor (e.g., phosphorylated tyrosine binding domain), the DNA binding protein (e.g., repressor element), the subunit of RNA polymerase or portions thereof, the ligand (e.g., tyrosine kinase substrate), the tyrosine kinase, the target enzyme, and the operator for the repressor element are coupled to each other. In some embodiments, the receptor (e.g., phosphorylated tyrosine binding domain) is coupled to the DNA binding protein (e.g., repressor element). In some embodiments, the SH2 domain is coupled with the cI repressor. In some embodiments, the subunit of the RNA polymerase or portions thereof is coupled with the ligand (e.g., tyrosine phosphatase substrate). In some embodiments, the RpoZ is coupled to the ligand (e.g., tyrosine phosphatase substrate). In some embodiments, the receptor (e.g., phosphorylated tyrosine binding domain) is coupled to the ligand (e.g., tyrosine phosphatase substrate). In some embodiments, the SH2 domain is coupled to the tyrosine phosphatase substrate. In some embodiments, the repressor element is coupled to the subunit of the RNA polymerase or portions thereof. In some embodiments, the cI repressor is coupled to the RpoZ. In some embodiments, the two or more components of the two-hybrid system are coupled to each other by fusion (e.g., expression of a fusion protein). In some embodiments, the two or more components of the two-hybrid system are coupled to each other with a linker. In some embodiments, the linker comprises a chemical linker, a peptide linker, or both. In some embodiments, the peptide linker is an alanine linker. In some embodiments, the linker binds components through peptide bonds, covalent bonds, ionic bonds, hydrogen bonds, disulfide bonds, or hydrophilic or hydrophobic interactions. Non-limiting examples of peptide linkers can be found in Chen, Xiaoying, Jennica L. Zaro, and Wei-Chiang Shen. “Fusion protein linkers: property, design and functionality.” Advanced drug delivery reviews 65.10 (2013): 1357-1369, which is hereby incorporated by reference in its entirety.
In some embodiments, the RNA polymerase binding site is suitable for binding with an RNA polymerase disclosed herein. In some embodiments, the subunit of RNA polymerase or portions thereof encoded by the genetically-encoded system disclosed herein recruits RNA polymerase to the RNA polymerase binding site to initiate transcription of a gene of interest. In such embodiments, the RNA polymerase binding site may be in a transcriptional activation site or region of the gene of interest. In some embodiments, the binding site for the RNA polymerase is a binding site for the subunit of the RNA polymerase or portions thereof. In some embodiments, a sigma factor enables binding of RNA polymerase to a gene promoter.
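For illustration only, the phosphorylation-dependent embodiment described above (kinase, kinase substrate, phosphorylated protein binding domain, and a phosphatase target enzyme) can be summarized as a simplified Boolean model in Python. This is an idealized sketch, not a description of any measured behavior: real expression levels are graded rather than on/off, and the class and function names below are hypothetical. The model assumes that the receptor-DNA binding protein fusion and the ligand-RNA polymerase subunit fusion are both present, so that GOI transcription depends only on whether the substrate is phosphorylated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellState:
    """Idealized on/off description of one cell (hypothetical model)."""
    kinase_expressed: bool
    phosphatase_expressed: bool
    inhibitor_present: bool


def substrate_phosphorylated(state: CellState) -> bool:
    # The phosphatase (target enzyme) removes the phosphate unless inhibited.
    phosphatase_active = state.phosphatase_expressed and not state.inhibitor_present
    return state.kinase_expressed and not phosphatase_active


def goi_transcribed(state: CellState) -> bool:
    # Phosphorylated substrate binds the receptor, which brings the RNA
    # polymerase subunit to the binding site near the GOI.
    return substrate_phosphorylated(state)


no_inhibitor = CellState(kinase_expressed=True, phosphatase_expressed=True,
                         inhibitor_present=False)
with_inhibitor = CellState(kinase_expressed=True, phosphatase_expressed=True,
                           inhibitor_present=True)

print(goi_transcribed(no_inhibitor))    # False
print(goi_transcribed(with_inhibitor))  # True
```

In this idealization, a cell that produces an inhibitor of the phosphatase is the one that transcribes the GOI, which is the link between modulator production and reporter output that the two-hybrid system is intended to provide.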
In some embodiments, the gene of interest (GOI) is a reporter gene that encodes a reporter polypeptide. In some embodiments, the reporter polypeptide comprises a luciferase enzyme, a fluorescent polypeptide, alkaline phosphatase, β-galactosidase, a fructosyltransferase (e.g., levansucrase), chloramphenicol acetyltransferase (CAT), or a polypeptide that confers resistance to an antibiotic. In some embodiments, the antibiotic is penicillin, streptomycin, ampicillin, carbenicillin, spectinomycin, bleomycin, novobiocin, doxycycline, tetracycline, neomycin, kanamycin, zeocin, puromycin, geneticin, amphotericin, gentamicin, polymyxin B, hygromycin B, blasticidin, vancomycin, erythromycin, chloramphenicol, ticarcillin, or cefixime. Non-limiting examples of reporter genes encoding resistance to an antibiotic include beta-lactamases, bleomycin binding protein Ble-MBL, blasticidin S deaminase, aminoglycoside adenylyltransferase, aminoglycoside phosphotransferase, tetracycline efflux protein, puromycin N-acetyltransferase, chloramphenicol acetyltransferase, neomycin phosphotransferase II, sterol 24-C-methyltransferase, bifunctional enzyme AAC/APH, or mobilized colistin resistance. Non-limiting examples of fluorescent polypeptides include green fluorescent protein, enhanced green fluorescent protein, green fluorescent protein ultra violet, blue fluorescent protein, enhanced blue fluorescent protein, yellow fluorescent protein, enhanced yellow fluorescent protein, red fluorescent protein, DsRed fluorescent protein, cyan fluorescent protein, enhanced cyan fluorescent protein, mCherry, mTurquoise, mVenus, mRuby, mWasabi, mTagBFP, mCitrine, mBanana, mOrange, dTomato, and Emerald.
In some embodiments, the GOI encodes a polymerizing enzyme or transcriptional activator that, when expressed, binds to a promoter or enhancer operably linked to a gene encoding a reporter polypeptide to drive expression of the reporter polypeptide disclosed herein. In some embodiments, the GOI encodes a polymerizing enzyme or repressor that, when expressed, binds to a promoter or transcriptional start site operably linked to a gene encoding the reporter polypeptide to reduce expression of the reporter polypeptide disclosed herein. In either case, the variant expression of the reporter polypeptide (e.g., increased expression in the case of the polymerizing enzyme or activator; decreased expression in the case of the polymerizing enzyme or repressor) as compared to a reference expression of the reporter polypeptide may be a readout of the genetically-encoded systems disclosed herein. In some embodiments, the reporter polypeptide is a detectable polypeptide. In some embodiments, a detectable polypeptide comprises a fluorescent polypeptide, such as those disclosed herein. In some embodiments, the polymerizing enzyme comprises an RNA polymerase. In some embodiments, the RNA polymerase comprises a prokaryotic RNA polymerase. In some embodiments, the RNA polymerase comprises a eukaryotic RNA polymerase. In some embodiments, the RNA polymerase is derived from a virus or bacteriophage. In some embodiments, the RNA polymerase comprises T7 RNA Polymerase (T7 RNAP), SP6 RNA Polymerase, or T3 RNA Polymerase. In some embodiments, the prokaryotic RNA polymerase is derived from a bacterium, archaea, or algae. In some embodiments, the RNA polymerase comprises Escherichia coli RNA Polymerase, Escherichia coli RNA Polymerase core enzyme, Escherichia coli RNA Polymerase holoenzyme, Poly(A) Polymerase, or plastid-encoded RNA polymerase. In some embodiments, the eukaryotic RNA polymerase is derived from a yeast, mammal, or plant. 
In some embodiments, the eukaryotic RNA polymerase comprises RNA polymerase I, RNA polymerase II, RNA polymerase III, RNA polymerase IV, RNA polymerase V, or chloroplast-derived plastid-encoded polymerase. In some embodiments, the RNA polymerase is a modified version of the wild-type RNA polymerase. In some embodiments, the RNA polymerase comprises one or more mutations of an amino acid sequence to improve fidelity, affinity, or both. In some embodiments, a subunit of the RNA polymerase or portions thereof sufficient to induce expression of the gene of interest is used rather than the entire RNA polymerase. In some embodiments, when the GOI encodes a polymerizing enzyme or a transcriptional activator that induces expression (e.g., activates transcription) of a reporter polypeptide that is detectable, the detectable signal or readout from the detectable polypeptide is greater than if the GOI encoded the detectable polypeptide. In some embodiments, the signal or readout is greater by about 1-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, or 10-fold. In some embodiments, the signal or readout from the detectable polypeptide is greater by about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%. In some embodiments, the signal or readout from the detectable polypeptide is from 1-fold to 10-fold, from 2-fold to 9-fold, from 3-fold to 8-fold, from 4-fold to 7-fold, or from 5-fold to 6-fold greater. In some embodiments, the signal or readout from the detectable polypeptide is from 50% to 100%, from 55% to 95%, from 60% to 90%, from 65% to 85%, or from 70% to 80% greater. In some embodiments, the extent of signal amplification cannot be quantified because the detectable polypeptide yields no detectable signal when included as the GOI, rather than as a gene regulated by an activator or polymerizing enzyme encoded by the GOI.
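As a worked illustration of the fold and percent comparisons above, the following Python sketch compares a readout from an amplified configuration (GOI encodes an activator or polymerizing enzyme that drives the detectable polypeptide) with a readout from a direct configuration (GOI encodes the detectable polypeptide itself). It assumes that "fold" means the ratio of the two signals and that "percent greater" means the relative increase over the direct signal; both conventions, the function names, and the numeric values are assumptions of this example rather than measurements from the disclosure.

```python
def fold_change(amplified: float, direct: float) -> float:
    """Ratio of the amplified readout to the direct readout."""
    if direct <= 0:
        # Mirrors the case where the direct configuration gives no
        # detectable signal, so amplification cannot be quantified.
        raise ValueError("direct signal must be positive to compute a ratio")
    return amplified / direct


def percent_increase(amplified: float, direct: float) -> float:
    """Relative increase of the amplified readout, in percent."""
    return (fold_change(amplified, direct) - 1.0) * 100.0


# Hypothetical readouts in arbitrary fluorescence units.
amplified_signal = 600.0
direct_signal = 100.0

print(fold_change(amplified_signal, direct_signal))       # 6.0
print(percent_increase(amplified_signal, direct_signal))  # 500.0
```

Under these conventions a 2-fold signal corresponds to a 100% increase, so fold and percent values should not be compared without stating which definition is used.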
As an example, the reporter gene may encode T7 RNA Polymerase (T7 RNAP), that when expressed in the presence of an inhibitor of the target enzyme, drives expression of a fluorescent protein (FP), as shown in
a. Target Enzymes
Disclosed herein are target enzymes. In some embodiments, the target enzymes disclosed herein are therapeutic targets. In some embodiments, the target enzymes are encoded by the two-hybrid systems described herein. In some embodiments, the target enzyme may be associated with, or cause, a disease or a condition disclosed herein, such as cancer. In some embodiments, the target enzyme may be associated with, or cause, an infection or a disease or a condition associated with an infection by a pathogen. In some embodiments, the pathogen may be a virus, a bacterium, a fungus, a parasite, or a prion. In some embodiments, the target enzyme may be an enzyme that is expressed by one or more cancer cells.
Non-limiting examples of diseases or conditions that are associated with, or caused by, an infection by a pathogen include the common cold or viral rhinitis, influenza, meningitis, herpes, warts, measles, viral gastroenteritis, toxoplasmosis, encephalitis, tuberculosis, certain types of cancer such as cervical cancer, pneumonia, sepsis, pre-term or still birth, Ebola virus disease, Zika virus disease, Coronavirus disease, Lassa fever, Crimean-Congo hemorrhagic fever, Cholera, Dengue, Hepatitis, HIV/AIDS, diarrhea, Echinococcosis, Malaria, Polio, Tetanus, Rabies, Monkeypox, or smallpox.
Non-limiting examples of diseases or conditions that are associated with, or caused by, aberrant protease activity include cancer, diabetes, cardiovascular disease, inflammation, neurological disease, atherosclerosis, thrombosis, aneurysm, pulmonary hypertension, arthritis, osteoporosis, and chronic obstructive pulmonary disease.
In some embodiments, the target enzyme comprises a wild-type sequence. In some embodiments, the target enzyme is derived from an animal (e.g., mammals, mollusks, or cnidarians), plant, bacteria, virus, bacteriophage, chromistan, protist, or fungus. In some embodiments, the mammal is a monkey, primate, or human. In some embodiments, the mammal is a human. In some embodiments, the target enzyme is modified relative to the wild-type target enzyme. In some embodiments, the modification is an insertion, a substitution, or a deletion of one or more amino acids with reference to the wild-type sequence. In some embodiments, the modification is at one or more amino acid positions of the wild-type sequence. In some embodiments, the target enzyme expressed by the genetically-encoded system comprises a truncation at an N terminus, a C terminus, or both of the amino acid sequence of the target enzyme. In some embodiments, the truncation comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 amino acids. In some embodiments, the truncation comprises fewer than or equal to about 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 amino acids. In some embodiments, the truncation comprises greater than or equal to about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 amino acids. In some embodiments, the truncation comprises 1-40, 2-39, 3-38, 4-37, 5-36, 6-35, 7-34, 8-33, 9-32, 10-31, 11-30, 12-29, 13-28, 14-27, 15-26, 16-25, 17-24, 18-23, 19-22, 20-21 amino acids. Non-limiting examples of truncated target enzymes are provided in Table 28.
In some embodiments, the target enzyme comprises a phosphatase or another enzyme capable of removing a phosphate group from a substrate, such as a protein, or a catalytically active portion thereof. In some embodiments, the phosphatase is capable of dephosphorylating a histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, alanine, asparagine, aspartic acid, glutamic acid, serine, arginine, cysteine, glutamine, glycine, proline, or tyrosine. In some embodiments, the phosphatase comprises or is a tyrosine phosphatase. Non-limiting examples of protein tyrosine phosphatases are provided in Tautz L, Critton D A, Grotegut S. Protein tyrosine phosphatases: structure, function, and implication in human disease. Methods Mol Biol. 2013; 1053:179-221, which is hereby incorporated by reference in its entirety. In some embodiments, the tyrosine phosphatase comprises Protein tyrosine phosphatase non-receptor type 1 (PTP1B), Protein tyrosine phosphatase non-receptor type 2 (TC-PTP), Protein tyrosine phosphatase non-receptor type 6 (SHP1), Protein tyrosine phosphatase non-receptor type 11 (SHP2), or Protein tyrosine phosphatase non-receptor type 12 (PTP-PEST). In some embodiments, the tyrosine phosphatase is a receptor tyrosine phosphatase. In some embodiments, the tyrosine phosphatase comprises a cysteine-specific protein tyrosine phosphatase. In some embodiments, the tyrosine phosphatase is derived from Homo sapiens (human). In some embodiments, human PTP1B can be identified by NCBI Gene ID: 5770. In some embodiments, human TCPTP can be identified by NCBI Gene ID: 5771. In some embodiments, human SHP1 can be identified by NCBI Gene ID: 5777. In some embodiments, human PTP-PEST can be identified by NCBI Gene ID: 5782.
Non-limiting examples of tyrosine phosphatases include PTP1B (SEQ ID NOS: 6 and 236), TCPTP (SEQ ID NOS: 237-238), PTPRB (SEQ ID NO: 239), PTPRC (SEQ ID NO: 240), PTPN6 (SEQ ID NO: 241), PTPN22 (SEQ ID NO: 242), PTPRS (SEQ ID NO: 243), PTPRM (SEQ ID NO: 244), or PTPRZ (SEQ ID NO: 245). In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 6. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 6. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 235. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 235. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 236. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 236. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 237. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 237. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 238. 
In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 238. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 239. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 239. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 240. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 240. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 241. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 241. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 242. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 242. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 243. 
In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 243. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 244. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 244. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 245. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 245.
In some embodiments, the tyrosine phosphatase is truncated. In some embodiments, the truncation is at the N-terminus, the C-terminus, or both of the amino acid sequence. In some embodiments, the truncated tyrosine phosphatase is or comprises a catalytic domain of the phosphatase (e.g., a portion thereof capable of performing a phosphatase catalytic function). In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in Table 27. In some embodiments, the catalytic domains of the tyrosine phosphatases described herein comprise an amino acid sequence provided in Table 28. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is provided in any one of SEQ ID NOS: 235-245. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOS: 235-245. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 235. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 235. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 236. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 236. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 237.
In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 237. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 238. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 238. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 239. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 239. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 240. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 240. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 241. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 241. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 242. 
In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 242. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 243. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 243. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 244. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 244. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence provided in SEQ ID NO: 245. In some embodiments, the tyrosine phosphatase comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 245.
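The percent-identity thresholds recited above presuppose some way of aligning a candidate sequence to a reference. The text does not specify an alignment method, so the following is only a sketch under stated assumptions: a plain Needleman-Wunsch global alignment with hypothetical scoring (match +1, mismatch -1, gap -1), and identity defined as identical aligned positions divided by alignment length. Other scoring schemes and identity definitions are in common use and can give different values. The function names are illustrative.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment; returns the two gapped strings.
    The scoring values are assumptions of this sketch."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of alignment length."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    al_a, al_b = global_align(a, b)
    matches = sum(1 for x, y in zip(al_a, al_b) if x == y and x != "-")
    return 100.0 * matches / len(al_a)


def meets_identity(a: str, b: str, threshold: float) -> bool:
    """True if a is at least `threshold` percent identical to b."""
    return percent_identity(a, b) >= threshold
```

For example, under these assumptions `meets_identity(candidate, reference, 90)` would test the "greater than or equal to about 90%" case, where `candidate` and `reference` stand in for a query sequence and one of the recited SEQ ID NOs.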
In some embodiments, the phosphatase comprises or is a serine phosphatase. In some embodiments, the serine phosphatase is a threonine phosphatase. In some embodiments, the phosphatase is a serine threonine phosphatase. Non-limiting examples of serine threonine phosphatases include Phosphoprotein phosphatases, Phosphoprotein phosphatases activated by magnesium, serine/threonine protein phosphatase 5/retinal degeneration C (PP5/rdgC), protein phosphatase with EF-hand domain 2 (PPEF2), protein phosphatase 5 catalytic subunit (PPP5C), and Carboxy Terminal Domain phosphatases. In some embodiments, the phosphatase comprises or is a tyrosine, serine, and threonine phosphatase. Non-limiting examples of protein tyrosine, serine, and threonine phosphatases include Lambda Protein Phosphatase.
In some embodiments, the target enzyme is a protein tyrosine phosphatase. In some embodiments, the protein tyrosine phosphatase is a nonreceptor protein tyrosine phosphatase. In some embodiments, the nonreceptor protein tyrosine phosphatase is PTP1B, PTPN2, or PTPN22. In some embodiments, the protein tyrosine phosphatase is a protein serine/threonine phosphatase. In some embodiments, the protein serine/threonine phosphatase is PP1, PP2A, or PP2B. In some embodiments, the protein tyrosine phosphatase is a dual specificity phosphatase. In some embodiments, the dual specificity phosphatase is a MAPK phosphatase, laforin, a PTEN-like phosphatase, or a Cdc14 phosphatase.
In some embodiments, the target enzyme is or comprises a proteolytic enzyme. In some embodiments, the proteolytic enzyme is a protease, peptidase, or proteinase, or any other enzyme capable of hydrolyzing peptide bonds, or a catalytically active portion thereof. In some embodiments, the proteolytic enzyme hydrolyzes a peptide bond of a serine or a tyrosine. In some embodiments, the proteolytic enzyme hydrolyzes a peptide bond of a histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, valine, alanine, asparagine, aspartic acid, glutamic acid, serine, arginine, cysteine, glutamine, glycine, proline, or tyrosine. In some embodiments, the protease is derived from Homo sapiens (human) (e.g., a human protease), bacteria, archaea, algae, a virus, or a plant. In some embodiments, the protease is derived from a virus (e.g., a viral protease). In some embodiments, the human protease comprises ubiquitin specific peptidase 7 (USP7) (also referred to herein as Ubiquitin-specific-processing protease 7 (USP7)), which may be identified by NCBI Gene ID: 7874. Non-limiting examples of other human ubiquitin specific proteases include Ubiquitin-specific-processing protease 4 (USP4), Ubiquitin-specific-processing protease 11 (USP11), Ubiquitin-specific-processing protease 32 (USP32), Ubiquitin-specific-processing protease 15 (USP15), Ubiquitin-specific-processing protease 9X (USP9X), Ubiquitin carboxyl-terminal hydrolase 14 (USP14), or Ovarian tumor (OTU) domain-containing protein 7B. In some embodiments, USP7 comprises an amino acid sequence comprising SEQ ID NO: 65. In some embodiments, USP7 comprises an amino acid sequence that is more than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 65. In some embodiments, the ubiquitin specific protease comprises USP11.
In some embodiments, the USP11 comprises an amino acid sequence comprising SEQ ID NO: 288. In some embodiments, the USP11 comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 288. In some embodiments, the ubiquitin specific protease comprises USP14. In some embodiments, USP14 comprises an amino acid sequence comprising SEQ ID NO: 289. In some embodiments, USP14 comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 289. In some embodiments, the ubiquitin specific protease comprises the Ovarian tumor (OTU) domain-containing protein 7B. In some embodiments, the OTU domain-containing protein 7B comprises an amino acid sequence comprising SEQ ID NO: 290. In some embodiments, the OTU domain-containing protein 7B comprises an amino acid sequence that is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 290.
In some embodiments, the protease may be 3CL protease (3CLpro), papain-like protease (PLpro), NS2B, NS3pro, NS2B-NS3pro fusion protein, 3C protease, K7L, I7L, OTU domain of L protein, or NSP2. In some embodiments, the viral protease may be a protease in the family of Caliciviridae, Coronaviridae, Flaviviridae, Picornaviridae, Poxviridae, Nairoviridae, or Togaviridae. In some embodiments, the viral protease comprises a protease from Norovirus GI.1, Norovirus GII.4, Severe acute respiratory syndrome (SARS), Middle East respiratory syndrome coronavirus (MERS-COV), Dengue Virus 1, Dengue Virus 2, Dengue Virus 3, Dengue Virus 4, West Nile Virus, Japanese encephalitis virus, St. Louis encephalitis virus, Yellow fever virus, Zika virus, Hepatitis A, Enterovirus 68, Enterovirus 71, Variola Major, small pox, Monkeypox virus, Crimean-Congo hemorrhagic fever orthonairovirus, Venezuelan equine encephalitis virus, Eastern equine encephalitis virus, Western equine encephalitis virus, or Chikungunya virus. In some embodiments, the viral protease is or comprises 3CLpro of severe acute respiratory syndrome coronavirus 2 (SARS-COV-2). In some embodiments, 3CLpro comprises an amino acid sequence comprising SEQ ID NO: 69. In some embodiments, 3CLpro comprises an amino acid sequence that is more than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 69. In some embodiments, the viral protease is or comprises NS2B/NS3 protease of West Nile Virus. In some embodiments, NS2B/NS3 protease comprises an amino acid sequence comprising SEQ ID NO: 78. In some embodiments, NS2B/NS3 protease comprises an amino acid sequence that is more than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 78. In some embodiments, the viral protease is or comprises PLpro of SARS-COV-2.
In some embodiments, PLpro comprises an amino acid sequence comprising SEQ ID NO: 67. In some embodiments, PLpro comprises an amino acid sequence that is more than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 67. In some embodiments, HIV protease (HIV-1Pr) comprises an amino acid sequence provided in SEQ ID NO: 63. In some embodiments, HIV-1Pr comprises an amino acid sequence that is more than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 63. In some embodiments, USP7 protease comprises an amino acid sequence provided in SEQ ID NO: 65. In some embodiments, USP7 protease comprises an amino acid sequence that is more than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 65.
In some embodiments, the target enzyme is encoded by the two-hybrid system disclosed herein. In some embodiments, the target enzyme is produced by the synthase enzyme encoded by the system disclosed herein. Certain trypsin-like serine proteases (e.g., NS3pro) may exhibit activity in the presence of a cofactor (e.g., NS2B). In some embodiments, the trypsin-like serine protease and its cofactor (e.g., NS3pro and NS2B) are expressed as a protein-protein fusion or as separate proteins that form a complex in the cell, as illustrated in
In some embodiments, the target enzyme is a protein kinase. In some embodiments, the protein kinase is a protein tyrosine kinase. In some embodiments, the protein tyrosine kinase is a receptor tyrosine kinase. In some embodiments, the receptor tyrosine kinase is EGFR, HER2/ErbB2, PDGFR, FGFR, Insulin receptor, or MET. In some embodiments, the protein tyrosine kinase is a non-receptor tyrosine kinase. In some embodiments, the non-receptor tyrosine kinase is Janus kinase (JAK), focal adhesion kinase, Feline Sarcoma kinase, SYK, TEC, or Abl. In some embodiments, the protein kinase is a protein serine/threonine kinase. In some embodiments, the serine/threonine kinase is JNK, Protein Kinase B/AKT, Casein Kinase 2, Protein Kinase A, MAPKs, or mTOR. In some embodiments, the protein kinase is a Cyclin Dependent Kinase (CDK). In some embodiments, the protein kinase comprises Src Kinase, lymphocyte-specific protein tyrosine kinase (Lck), Fyn kinase, Yes kinase, tyrosine kinase EphA2, or Bruton's tyrosine kinase (BTK). In some embodiments, the protein kinase is truncated. In some embodiments, the truncation is on the C-terminus, the N-terminus, or a combination thereof. In some embodiments, the protein kinase comprises an amino acid sequence provided in any one of SEQ ID NOS: 246-251. In some embodiments, the protein kinase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one of SEQ ID NOS: 246-251.
b. Ligand
Disclosed herein are ligands capable of binding receptors encoded by the two-hybrid systems disclosed herein. In some embodiments, the ligand is a polypeptide that includes short hydrophobic peptide segments that can bind to a receptor (e.g., Hsp70, Hsp90, Per-Arnt-Sim repeats). In some embodiments, the ligand is a polypeptide with an amino acid sequence that is similar to, in part or in full, or identical to, the amino acid sequence of the receptor (e.g., homodimer cytochrome c). In some embodiments, the ligand binds to the receptor in a manner that is not phosphorylation dependent. In some embodiments, the ligand is a polypeptide that binds to the receptor through hydrogen bonds (e.g., estrogen receptor alpha/beta heterodimer). In some embodiments, the ligand interacts with the receptor through agglutination (e.g., antibody-antigen binding). In some embodiments, the ligand binds to the receptor in a manner that is phosphorylation dependent. In some embodiments, the ligand is a kinase substrate. In some embodiments, the kinase substrate may comprise a polypeptide with an amino acid residue that can be phosphorylated by a protein kinase, dephosphorylated by a protein phosphatase, bind to a phosphorylated protein binding domain (e.g., SH2 domain) in its phosphorylated state, and bind less strongly to the phosphorylated protein binding domain (or not at all) when it is dephosphorylated. In some embodiments, the kinase substrate comprises or is a tyrosine kinase substrate. In some embodiments, the tyrosine kinase substrate may comprise a polypeptide with a tyrosine residue that can be phosphorylated by a protein tyrosine kinase, dephosphorylated by a protein tyrosine phosphatase, bind to an SH2 domain in its phosphorylated state, and bind less strongly to the SH2 domain (or not at all) when it is dephosphorylated. In some embodiments, where the binding to the SH2 domain is not phosphorylation dependent, the kinase substrate can be SH2ABL/HA4, as shown
c. Protease Cleavage Sites
Disclosed herein are protease cleavage sites that are defined by a protease recognition motif disclosed herein and configured to be cleaved by a proteolytic enzyme (e.g., a protease) disclosed herein. In some embodiments, the protease cleavage sites are engineered into a linker region between one or more components of the two-hybrid system. In some embodiments, the protease cleavage site is located outside the linker region. In some embodiments, the two-hybrid system is the phosphorylation sensitive B2H system disclosed herein. In some embodiments, the protease cleavage site is positioned in a linker between the subunit of the RNA polymerase or portions thereof (e.g., RpoZ) and the kinase/phosphatase substrate (e.g., MidT), as shown in
In some embodiments, the protease recognition motif is specific to a protease disclosed herein. In some embodiments, the protease recognition motif is provided in Table 11. In some embodiments, the protease comprises HIVpro, 3CLpro of severe acute respiratory syndrome coronavirus 2 (SARS-COV-2), the papain-like protease (PLpro) of SARS-COV-2, or ubiquitin-specific-processing protease 7 (USP7). In some embodiments, these proteases are important targets for viral diseases (e.g., HIVpro, 3CLpro, and PLpro) and cancer (e.g., USP7), have protease recognition motifs that range from 4 to 75 amino acids and exhibit different yields when overexpressed in a cell (e.g., E. coli). In some embodiments, the protease is provided in Table 11.
In some embodiments, the protease recognition motifs comprise less than or equal to about 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, or 2 amino acids. In some embodiments, the recognition motifs comprise more than or equal to about 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, or 2 amino acids. In some embodiments, the recognition motifs comprise 3-100, 3-75, 3-50, 3-25, 4-100, 4-75, 4-50, 4-25, 5-100, 5-75, 5-50, 5-25, 6-100, 6-75, 6-50, 6-25, 7-100, 7-75, 7-50, 7-25, 8-100, 8-75, 8-50, 8-25, 9-100, 9-75, 9-50, 9-25, 10-100, 10-75, 10-50, or 10-25 amino acids. In some embodiments, the recognition motifs comprise 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, or 2 amino acids. In some embodiments, the amino acids are contiguous.
In some embodiments, the linker is or comprises a peptide linker. In some embodiments, the linker comprises an alanine linker. In some embodiments, the linker (not including the protease cleavage site) comprises less than or equal to about 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 amino acids. In some embodiments, the linker (not including the protease cleavage site) comprises more than or equal to about 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 amino acids. In some embodiments, the linker (not including the protease cleavage site) comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In some embodiments, the linker (not including the protease cleavage site) comprises 1-10, 2-10, 3-10, 4-10, 5-10, 6-10, 7-10, 8-10, 9-10, 1-9, 2-9, 3-9, 4-9, 5-9, 6-9, 7-9, 8-9, 1-8, 2-8, 3-8, 4-8, 5-8, 6-8, 7-8, 1-7, 2-7, 3-7, 4-7, 5-7, 6-7, 1-6, 2-6, 3-6, 4-6, 5-6, 1-5, 2-5, 3-5, 4-5, 1-4, 2-4, 3-4, 1-3, 2-3, or 1-2 amino acids. In some embodiments, the amino acids are contiguous. In some embodiments, the peptide linker comprises proline-rich sequences, polar residues (e.g., serine, glycine, threonine), or stretches of glycine and serine residues. Non-limiting examples of peptide linkers can be found in Chen, Xiaoying, Jennica L. Zaro, and Wei-Chiang Shen. “Fusion protein linkers: property, design and functionality.” Advanced drug delivery reviews 65.10 (2013): 1357-1369, which is hereby incorporated by reference in its entirety.
In some embodiments, the protease recognition motif comprises an amino acid sequence that is capable of being hydrolyzed by the 3CL protease (3CLpro) of severe acute respiratory syndrome coronavirus 2 (SARS-COV-2). In some embodiments, the amino acid sequence comprises AVLQSGFR (SEQ ID NO: 1), which is a substrate recognition motif for 3CLpro. In some embodiments, the amino acid sequence further comprises a linker sequence. In some embodiments, the linker sequence comprises at least about 1, 2, 3, or 4 alanine residues on the N- and/or C-terminal sides of the protease cleavage site. In some embodiments, the protease cleavage site comprises a modification relative to SEQ ID NO: 1. In some embodiments, the modification is an insertion, a substitution, or a deletion of one or more amino acids in SEQ ID NO: 1. In some embodiments, the modification is at an amino acid position 1, 2, 3, 4, 5, 6, 7, or 8 of SEQ ID NO: 1. In some embodiments, the protease cleavage site is indicated by an “*”, such as for example, in
In some embodiments, the protease recognition motif comprises an amino acid sequence capable of being hydrolyzed by human immunodeficiency virus 1 protease (HIV-1pro). In some embodiments, the amino acid sequence comprises KARVLAEAM (SEQ ID NO: 2), which is a substrate recognition motif for HIV-1pro. In some embodiments, the amino acid sequence further comprises a linker sequence. In some embodiments, the linker sequence comprises at least about 1, 2, 3, or 4 alanine residues on the N- and/or C-terminal sides of the protease cleavage site. In some embodiments, the protease cleavage site comprises a modification relative to SEQ ID NO: 2. In some embodiments, the modification is an insertion, a substitution, or a deletion of one or more amino acids in SEQ ID NO: 2. In some embodiments, the modification is at an amino acid position 1, 2, 3, 4, 5, 6, 7, 8, or 9 of SEQ ID NO: 2. In some embodiments, the protease cleavage site is indicated by an “*”, such as for example, in
In some embodiments, the protease recognition motif comprises an amino acid sequence capable of being hydrolyzed by papain-like protease (PLpro). In some embodiments, the amino acid sequence comprises LRGG (SEQ ID NO: 3), which is a substrate recognition motif for PLpro. In some embodiments, the amino acid sequence further comprises a linker sequence. In some embodiments, the linker sequence comprises at least about 1, 2, 3, or 4 alanine residues on the N- and/or C-terminal sides of the protease cleavage site. In some embodiments, the protease cleavage site comprises a modification relative to SEQ ID NO: 3. In some embodiments, the modification is an insertion, a substitution, or a deletion of one or more amino acids in SEQ ID NO: 3. In some embodiments, the modification is at an amino acid position 1, 2, 3, or 4 of SEQ ID NO: 3. In some embodiments, the protease cleavage site is indicated by an “*”, such as for example, in
In some embodiments, the insertion comprises the ubiquitin protein. In some embodiments, the insertion comprises a native recognition site for PLpro. In some embodiments, the insertion comprises a nonnative recognition site for PLpro.
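The arrangement described above, in which a recognition motif is flanked by short alanine stretches, can be sketched as string assembly. This is an illustration only: the three motif strings are those recited above (SEQ ID NOS: 1-3), whereas the dictionary name `RECOGNITION_MOTIFS`, the function names, and the default flank lengths are hypothetical. The sketch does not model the position of the scissile bond.

```python
# Recognition motifs recited in the text (SEQ ID NOS: 1-3).
RECOGNITION_MOTIFS = {
    "3CLpro": "AVLQSGFR",
    "HIV-1pro": "KARVLAEAM",
    "PLpro": "LRGG",
}


def build_cleavable_linker(protease: str, n_ala: int = 2, c_ala: int = 2) -> str:
    """Flank the protease's recognition motif with alanine residues.

    n_ala / c_ala are the numbers of alanines on the N- and C-terminal
    sides; the defaults are arbitrary choices for this sketch.
    """
    if n_ala < 0 or c_ala < 0:
        raise ValueError("alanine counts must be non-negative")
    motif = RECOGNITION_MOTIFS[protease]
    return "A" * n_ala + motif + "A" * c_ala


def find_motif(sequence: str, motif: str) -> list:
    """Return the 0-based start index of every occurrence of motif."""
    hits, start = [], sequence.find(motif)
    while start != -1:
        hits.append(start)
        start = sequence.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    linker = build_cleavable_linker("3CLpro", n_ala=2, c_ala=2)
    print(linker, find_motif(linker, RECOGNITION_MOTIFS["3CLpro"]))
```

Keeping the motifs in a lookup table reflects the modularity described in the text: swapping the target protease changes only the key passed to `build_cleavable_linker`.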
Thus, by adding protease recognition motifs to the phosphorylation sensitive B2H system disclosed herein, the inventors of the instant disclosure modified the system to detect inhibitors of proteases rather than phosphatases. In some embodiments, ribosomal binding sites (RBS) were added to the two-hybrid system to enhance ribosomal binding to the mRNA encoding the protease described elsewhere, which had the strongest influence on dynamic range. In some embodiments, the RBS sequences are provided in SEQ ID NOS: 38-42. In some embodiments, the RBS sequences are greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one of SEQ ID NOS: 38-42. In some embodiments, the RBS is engineered. In some embodiments, the RBS is located to induce transcription of the RNA polymerase described elsewhere. In some embodiments, the RBS is located in the untranslated region in the 5′ direction of the RNA polymerase described elsewhere. In some embodiments, a luminescence-based screen was used to facilitate a rapid evaluation of whether the RBS that were added improved translation of the protease. In some embodiments, a fluorescence-based assay is used to evaluate whether the RBS improved translation of the gene of interest. In some embodiments, growth-coupled assays were used to evaluate whether the two-hybrid system had successfully been modified to detect inhibitors of proteases rather than phosphatases. Methods for screening both components in combination (and, ideally, within the final two-hybrid system intended for use in high-throughput assays) could accelerate the optimization of new protease-specific two-hybrid systems.
In addition, it was discovered that phosphorylation sensitive B2H systems disclosed herein may not require a protease cleavage site to detect inhibitors of proteases given the promiscuity of proteases and the sensitivity of the B2H systems. Thus, in some embodiments, the linker does not comprise a protease cleavage site or recognition motif.
The two-hybrid (e.g., B2H) system described herein has several important advantages over previous biosensors for protease inhibitors, including but not limited to: (i) the substrate-RpoZ fusion can accommodate a large range of linker lengths (e.g., the addition of peptide stretches of 4-75 amino acids) and, thus, facilitates the incorporation of different protease cleavage sites; (ii) the system controls the transcription of user-defined GOIs (e.g., genes for luminescence, antibiotic resistance, or, perhaps, fluorescence) and, thus, is compatible with a large variety of high-throughput screens; and (iii) the system relies on a set of adjustable components, from the protease cleavage site and protease RBS, which helped improve dynamic range in the systems, to the peptide substrate and kinase RBS, which can modulate the extent of protein-protein binding, and these components provide multiple routes to two-hybrid optimization. In general, the modularity of the two-hybrid system facilitates its extension to different targets, signals, and assay types.
The screen of terpenoid pathways highlights important challenges and opportunities for using genetically encoded detection systems. A previously unreported terpenoid inhibitor of 3CLpro, α-bisabolol, which has a reasonable IC50 (~30-80 μM) for a 15-carbon hydrocarbon, was identified. The production of this terpenoid alone, however, was insufficient to enhance antibiotic resistance, which has two implications: (i) simple comparisons of the product profiles of hits and non-hits can miss inhibitory products, which highlights the importance of including multiple pathways that generate the same product in starting libraries; and (ii) the survival advantage conferred by some pathways might peak at intermediate production levels (which could plausibly inhibit the target while avoiding off-target interactions), which motivates a systematic study of inhibitor-generating pathways under different levels of induction. Curiously, one hit identified in the screen (Q41594) produced small amounts of α-bisabolol in liquid culture, where intracellular titers were lower than the IC50, as described below with respect to Examples 12-14. These titers, which varied with media composition, motivate future efforts to screen and analyze pathways under identical growth conditions. By whittling down large pathway libraries such as those described herein to a small subset that generate inhibitors, such screens can reduce the throughput required for compound isolation and analysis.
d. Gene of Interest (GOI)
Provided herein, in some embodiments, are genes of interest (GOI), which refer to genes capable of producing a gene expression product that is detectable directly or indirectly. In some embodiments, the GOI encodes a detectable polypeptide, such as a fluorescent polypeptide, or an amplifying enzyme (e.g., T7 RNA polymerase). Non-limiting examples of fluorescent polypeptides include green fluorescent protein, enhanced green fluorescent protein, green fluorescent protein ultra violet, blue fluorescent protein, enhanced blue fluorescent protein, yellow fluorescent protein, enhanced yellow fluorescent protein, red fluorescent protein, DsRed fluorescent protein, cyan fluorescent protein, enhanced cyan fluorescent protein, mCherry, mTurquoise, mVenus, mRuby, mWasabi, mTagBFP, mCitrine, mBanana, mOrange, dTomato, and Emerald. In some embodiments, the GOI encodes an enzyme that produces a detectable signal when introduced to a substrate, such as, for example, luciferase, β-galactosidase, or bacterial luminescence (lux). In some embodiments, the GOI encodes a gene expression product that confers antibiotic resistance. Non-limiting examples of GOI that confer antibiotic resistance include SpecR, beta-lactamases, bleomycin binding protein Ble-MBL, blasticidin S deaminase, aminoglycoside adenylyltransferase, aminoglycoside phosphotransferase, tetracycline efflux protein, puromycin N-acetyltransferase, chloramphenicol acetyltransferase, neomycin phosphotransferase II, sterol 24-C-methyltransferase, bifunctional enzyme AAC/APH, or mobilized colistin resistance. In some embodiments, the amino acid sequence for SpecR comprises SEQ ID NO: 79. In some embodiments, the amino acid sequence for SpecR is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 79. In some embodiments, the GOI comprises LuxAB.
In some embodiments, the amino acid sequence for LuxAB comprises SEQ ID NO: 34. In some embodiments, the amino acid sequence for LuxAB is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 34.
In some embodiments, the GOI encodes a transcriptional repressor. In some embodiments, the GOI encodes a catalytically dead Cas protein. In some embodiments, the GOI encodes a transcriptional repressor such as tetracycline repressor, LexA repressor, lacI repressor, Centromere Binding Factor 1 (CBF1), or Krüppel-associated box (KRAB). In some embodiments, the repressor is SrpR, AmeR, BetI, PsrA, PhiF, or HlyII. In some embodiments, the repressor is derived from a bacterium, yeast, tetrapod, insect, plant, or mammal.
Provided herein are bioactive molecules produced by a genetically modified organism disclosed herein, which may or may not utilize a combination of complex metabolic pathways that work together to produce the bioactive molecule. In some embodiments, the bioactive molecule is a potential therapeutic agent, which may be useful for treating a disease or a condition disclosed herein.
In some embodiments, the bioactive molecule is a modulator of the target enzyme. In some embodiments, the modulator of the target enzyme is an inhibitor of the target enzyme. In some embodiments the inhibitor of the target enzyme is an allosteric modulator of the target enzyme. In some embodiments, the modulator of the target enzyme is an agonist of the target enzyme. In some embodiments the agonist of the target enzyme is an allosteric modulator of the target enzyme. In some embodiments, the modulator of the target enzyme binds the target enzyme directly or indirectly. Non-limiting examples of methods of analysis of protein-protein binding to determine whether the modulator binds the target enzyme include a co-immunoprecipitation (co-IP), pull-down, crosslinking protein interaction analysis, labeled transfer protein interaction analysis, or Far-western blot analysis, FRET based assay, including, for example FRET-FLIM, a yeast two-hybrid assay, BiFC, or split luciferase assay.
In some cases, the metabolic pathway may be known or unknown; the genetically engineered systems and methods of the present disclosure may be driven (e.g., through evolutionary selection) to find a combination of metabolic pathways to arrive at a desirable bioactive molecule. A bioactive molecule may comprise various classes of biologically produced molecules, where “classes” may refer to any named category that defines a group of molecules having a common characteristic (e.g., proteins, nucleic acids, carbohydrates, small molecules). In some cases, a bioactive molecule may undergo various modifications and/or transformations to its structure. For example, a bioactive protein molecule may be modified with various post-translational modifications and/or transform in conformation (which may be guided by other proteins such as chaperones, heat shock proteins, and any protein that serves a folding function).
A bioactive molecule may comprise one or a combination of molecular components from various biomolecule classes, for example, metabolites (e.g., terpenoids, peptides, or phenylpropanoids), amino acids, carbohydrates, nucleic acids, lipids, any monomeric forms thereof, any polymeric forms thereof, or any derivatives thereof. In some embodiments, a bioactive molecule may comprise one or more modifications. For example, a bioactive protein may comprise post-translational modifications, including, but not limited to: acylation, myristoylation, palmitoylation, isoprenylation, prenylation, farnesylation, geranylgeranylation, glypiation, glycosylphosphatidylinositol anchor formation, lipoylation, flavin functionalization, heme functionalization, phosphorylation, phosphopantetheinylation, retinylidene Schiff base formation, diphthamide formation, ethanolamine phosphoglycerol functionalization, hypusine formation, beta-lysine addition, acetylation, formylation, alkylation, methylation, amidation, amide bond formation, butyrylation, gamma-carboxylation, glycosylation, polysialylation, malonylation, hydroxylation, iodination, nucleotide addition, phosphate ester formation, phosphoramidate formation, adenylation, uridylylation, propionylation, pyroglutamate formation, glutathionylation, nitrosylation, sulfenylation, sulfinylation, sulfonylation, succinylation, sulfation, glycation, carbamylation, carbonylation, isopeptide bond formation, biotinylation, oxidation, pegylation, citrullination, deamidation, eliminylation, disulfide bond formation, proteolytic cleavage, isoaspartate formation, racemization, protein splicing, and chaperone-assisted folding.
In some embodiments, the bioactive molecule comprises a chemical compound. In some embodiments, the bioactive molecule comprises an intermediate of a metabolic pathway, such as, for example, farnesyl diphosphate. In some embodiments, the bioactive molecule comprises a sesquiterpene. In some embodiments, the bioactive molecule comprises himachalol, β-himachalene, γ-humulene, E-β-farnesene, E-α-bisabolene, β-bisabolene, γ-bisabolene, α-himachalene, γ-himachalene, α-longipinene, β-gurjunene, α-ylangene, β-ylangene, longifolene, β-longipinene, siberene, β-cubebene, cyclosativene, or sativene, or any combination thereof, as shown in
In some embodiments, the bioactive molecule is a flavonoid. In some embodiments, the flavonoid is a phenylpropanoid. In some embodiments, the phenylpropanoid comprises L-phenylalanine, L-tyrosine, cinnamic acid, p-coumaric acid, coumarin, umbelliferone, pinosylvin, resveratrol, pinocembrin, naringenin chalcone, naringenin, chrysin, apigenin, baicalein, scutellarein, or a combination thereof. In some embodiments, the bioactive molecule is a nonribosomal peptide. In some embodiments, the peptide is an aldehyde. In some embodiments, the peptide is a dipeptide. In some embodiments, the dipeptide has a dipeptide pyrazine core. In some embodiments, the dipeptide is an aldehyde.
In some embodiments, the bioactive molecule inhibits the target enzyme with a percent (%) inhibition that is greater than or equal to about 90%. In some embodiments, the bioactive molecule inhibits the target enzyme with a percent (%) inhibition that is equal to about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%. In some embodiments, the bioactive molecule inhibits the target enzyme with a percent (%) inhibition that is greater than or equal to about 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%. In some embodiments, the bioactive molecule inhibits the target enzyme with a percent (%) inhibition that is from 70%-100%, 75%-95%, or 80%-90%. In some embodiments, the bioactive molecule inhibits the target enzyme with a percent (%) inhibition that is from 80%-100%. In some embodiments, the bioactive molecule activates the target enzyme with a percent (%) activation that is greater than or equal to about 90%. In some embodiments, the bioactive molecule activates the target enzyme with a percent (%) activation that is equal to about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%. In some embodiments, the bioactive molecule activates the target enzyme with a percent (%) activation that is greater than or equal to about 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%. In some embodiments, the bioactive molecule activates the target enzyme with a percent (%) activation that is from 70%-100%, 75%-95%, or 80%-90%. In some embodiments, the bioactive molecule activates the target enzyme with a percent (%) activation that is from 80%-100%. In some embodiments, the bioactive molecule is or comprises α-bisabolol, or a derivative thereof.
In some embodiments, percent inhibition or percent activation may be measured using a fluorogenic peptide-based detection system, in which the proteolytic activity of the target enzyme liberates a fluorophore (7-amino-4-trifluoromethylcoumarin, AFC, λex=400 nm, λem=505 nm) from a peptide substrate (TSAVLQ* SEQ ID NO: 81), as shown in
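By way of illustration only, percent inhibition and percent activation in a fluorogenic assay of this kind may be computed from initial rates of fluorophore release. The following is a minimal sketch under stated assumptions: the function names, the rate inputs (e.g., relative fluorescence units per minute), and the optional background correction are hypothetical and are not taken from the present disclosure.

```python
def _net_ratio(rate_with_compound, rate_uninhibited, rate_background=0.0):
    """Ratio of background-corrected rates (compound / no-compound control).

    All three arguments are hypothetical initial rates of fluorophore
    release, e.g., in relative fluorescence units per minute.
    """
    span = rate_uninhibited - rate_background
    if span <= 0:
        raise ValueError("uninhibited rate must exceed the background rate")
    return (rate_with_compound - rate_background) / span


def percent_inhibition(rate_with_compound, rate_uninhibited, rate_background=0.0):
    """Percent reduction in enzyme activity relative to the control."""
    return 100.0 * (1.0 - _net_ratio(rate_with_compound, rate_uninhibited, rate_background))


def percent_activation(rate_with_compound, rate_uninhibited, rate_background=0.0):
    """Percent increase in enzyme activity relative to the control."""
    return 100.0 * (_net_ratio(rate_with_compound, rate_uninhibited, rate_background) - 1.0)


if __name__ == "__main__":
    # Illustrative numbers only; these are not measured data.
    print(percent_inhibition(rate_with_compound=12.0, rate_uninhibited=102.0, rate_background=2.0))
```

Under this convention, a compound that leaves the background-corrected rate at one tenth of the control corresponds to about 90% inhibition.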
In some embodiments, the bioactive molecule is present in the cell at a concentration that matches or exceeds the half-maximal inhibitory concentration (IC50) when measured using an in vitro kinetic assay carried out in buffer with purified target enzyme and purified bioactive molecule. In some embodiments, the concentration exceeds the IC50 by about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 200%, 250%, or 300%. In some embodiments, the concentration exceeds the IC50 by about 1-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, or 10-fold. In some embodiments, the bioactive molecule is present in the cell at a concentration that matches or exceeds the half-maximal activation concentration (AC50) when measured using an in vitro kinetic assay carried out in buffer with purified target enzyme and purified bioactive molecule. In some embodiments, the concentration exceeds the AC50 by about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 200%, 250%, or 300%. In some embodiments, the concentration exceeds the AC50 by about 1-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, or 10-fold.
Disclosed herein, in some embodiments, are metabolic pathways that facilitate the production of bioactive molecules in a cell. In some embodiments, the metabolic pathway comprises a pathway for producing the synthase (e.g., terpene synthase). In some embodiments, the metabolic pathway further comprises a metabolic precursor pathway encoding certain enzymes responsible for producing metabolic precursors that serve as substrates for the synthase to produce the bioactive molecules (e.g., terpenoids). In some embodiments, the metabolic pathway is unknown (e.g., randomized mutagenesis of metabolic components). In some embodiments, the metabolic pathway is known. In some embodiments, the metabolic pathway comprises a mevalonate pathway, a methylerythritol 4-phosphate (MEP) pathway, a deoxyxylulose 5-phosphate (DXP) pathway, or a combination thereof. In some embodiments, the metabolic precursor pathway comprises enzymes that convert mevalonate to isopentenyl pyrophosphate (IPP) and farnesyl pyrophosphate (FPP). In some embodiments, the metabolic precursor pathway generates geranyl pyrophosphate (GPP), farnesyl pyrophosphate (FPP), or geranylgeranyl pyrophosphate (GGPP), or any combination thereof. In some embodiments, the metabolic pathway and metabolic precursor pathway are exogenous to the cell. In some embodiments, the metabolic pathway and metabolic precursor pathway are derived from Homo sapiens (human), yeast (e.g., Saccharomyces cerevisiae), a plant, algae, or bacteria.
In some embodiments, the metabolic pathway comprises isoprenoid precursors isopentenyl diphosphate (IPP), dimethylallyl diphosphate (DMAPP), or a combination thereof. In some embodiments, IPP and DMAPP are synthesized from either (i) acetyl-CoA through the mevalonate pathway (MVA) or (ii) pyruvate and glyceraldehyde 3-phosphate through the non-mevalonate pathway (MEP or DXP). Condensation of IPP and DMAPP generates longer isoprenoids, such as geranyl diphosphate (GPP, C10), farnesyl diphosphate (FPP, C15), or geranylgeranyl diphosphate (GGPP, C20), which are substrates for terpene synthases disclosed herein. In some embodiments, the enzymes encoded by the metabolic pathway comprise mevalonate kinase (ERG12) (NCBI Gene ID: 855248), phosphomevalonate kinase (ERG8) (NCBI Gene ID: 855260), or diphosphomevalonate decarboxylase MVD1 (MVD1) (NCBI Gene ID: 855779), or a combination thereof.
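The carbon counts given above follow from the stepwise addition of five-carbon IPP units to a five-carbon DMAPP primer. A minimal sketch of that arithmetic is provided below for illustration; the function and constant names are hypothetical, and the sketch models only chain length, not enzyme specificity or stereochemistry.

```python
# Each isoprenoid building block (IPP or DMAPP) contributes five carbons.
CARBONS_PER_UNIT = 5

# Common names for the prenyl diphosphates named in the text, keyed by
# the total number of five-carbon units in the chain.
PRENYL_DIPHOSPHATE_NAMES = {2: "GPP", 3: "FPP", 4: "GGPP"}


def prenyl_diphosphate(n_ipp_additions):
    """Return (name, carbon_count) after adding IPP units to one DMAPP primer.

    n_ipp_additions is the number of IPP condensations (1 or more).
    Chains without an entry in the table are labeled by carbon count.
    """
    if n_ipp_additions < 1:
        raise ValueError("at least one IPP condensation is required")
    units = 1 + n_ipp_additions  # one DMAPP primer plus the added IPP units
    carbons = CARBONS_PER_UNIT * units
    name = PRENYL_DIPHOSPHATE_NAMES.get(units, "C%d diphosphate" % carbons)
    return name, carbons


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, prenyl_diphosphate(n))
```

For example, one condensation yields GPP (C10), two yield FPP (C15), and three yield GGPP (C20), consistent with the chain lengths recited above.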
In some embodiments, the metabolic precursor pathway comprises precursors to convert isoprenol into farnesyl diphosphate (FPP) or geranylgeranyl diphosphate (GGPP). In some embodiments, the metabolic pathway further comprises GGPP synthase (GGPPS) that synthesizes GGPP from FPP and IPP. In some embodiments, GGPP is a terpenoid precursor for certain terpene synthases disclosed herein, such as α-bisabolene synthase (ABS), or taxadiene synthase (TXS). In some embodiments, FPP is a terpenoid precursor for γ-humulene synthase (GHS) or amorphadiene synthase (ADS). Non-limiting examples of encoded metabolic pathways and terpenoid biosynthesis precursors can be found in Martin V J, Pitera D J, Withers S T, Newman J D, Keasling J D. Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nat Biotechnol. 2003 July; 21 (7): 796-802; and U.S. patent application Ser. Nos. 17/141,321 and 17/859,509, each of which is hereby incorporated by reference in its entirety.
In some embodiments, the metabolic pathway further includes an enzyme that selectively hydroxylates unactivated carbon-hydrogen bonds. In some embodiments, the enzyme comprises a cytochrome P450 enzyme, a cytochrome P450 reductase enzyme, a cytochrome b5 enzyme, an oxidase enzyme, an acyl transferase enzyme, a glycosyltransferase enzyme, a halogenase, or a peroxidase, or a combination thereof. Non-limiting examples of metabolic pathways that include these enzymes that selectively hydroxylate unactivated carbon-hydrogen bonds are provided in Chang M C, Eachus R A, Trieu W, Ro D K, Keasling J D. Engineering Escherichia coli for production of functionalized terpenoids using plant P450s. Nat Chem Biol. 2007 May; 3 (5): 274-7, which is hereby incorporated by reference in its entirety.
Disclosed herein are synthase enzymes that are engineered to produce a bioactive molecule that modulates the activity or expression of a target enzyme disclosed herein. In some embodiments, the system further comprises a nucleic acid encoding a synthase described herein. In some embodiments, the synthase enzyme has been modified relative to a wild-type (or otherwise unmodified) synthase enzyme. In some embodiments, the modified synthases increase diversity of the bioactive molecules produced by the engineered organism in vivo that modulate the activity or expression of the target enzyme. In some embodiments, the synthase is a terpene synthase or a non-ribosomal peptide synthetase.
In some embodiments, the synthase is derived from a prokaryotic organism. In some embodiments, the prokaryotic organism comprises bacteria, archaea, a virus, or cyanobacteria. In some embodiments, the synthase is derived from a eukaryotic organism. In some embodiments, the eukaryotic organism comprises a plant (e.g., Arabidopsis thaliana), a fungus (e.g., Ascomycetes), algae (e.g., Chlorella, Chlamydomonas), human (Homo sapiens), mouse (Mus musculus), chicken (Gallus gallus), rat (Rattus norvegicus), bovine (Bos taurus), or yeast (e.g., Saccharomyces cerevisiae).
In some embodiments, the terpene synthases disclosed herein are modified to produce terpenoids that modulate a target enzyme disclosed herein as compared with otherwise wild-type terpene synthases. In some embodiments, the terpene synthases convert GPP, FPP, and/or GGPP (generated by the metabolic precursor pathway) to one or more terpenoids. The modified terpene synthases disclosed herein produce novel terpenoids with therapeutic potential to target enzymes disclosed herein (e.g., protein tyrosine phosphatase, protease). In some embodiments, the terpenoids produced by the terpene synthase inhibit or activate the protein tyrosine phosphatase. In some embodiments, the terpenoids produced by the terpene synthases disclosed herein inhibit or activate a protease disclosed herein.
In some embodiments, the terpene synthase comprises γ-humulene synthase (GHS), amorphadiene synthase (ADS), α-bisabolene synthase (ABS), or taxadiene synthase (TXS). In some embodiments, a wild-type sequence for GHS is SEQ ID NO: 7. In some embodiments, a wild-type sequence for ADS is SEQ ID NO: 4. In some embodiments, a wild-type sequence for TXS is SEQ ID NO: 13. In some embodiments, ABS comprises an amino acid sequence provided in SEQ ID NO: 17.
In some cases, the terpene synthase may comprise a mutated form of GHS, ADS, ABS, or TXS, relative to a wild-type sequence. In some embodiments, the modified terpene synthase comprises a mutation in an amino acid sequence. In some embodiments, the mutation is a single amino acid mutation. In some embodiments, the mutation comprises two or more amino acid mutations. In some embodiments, the terpene synthase may comprise 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acid mutations. In some embodiments, the terpene synthase may comprise 1-10, 2-9, 3-8, 4-7, or 5-6 amino acid mutations. In some embodiments, the mutation comprises a substitution, insertion, or deletion of one or more amino acids. In some cases, the amino acid sequence is at least 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 7. In some embodiments, the mutation comprises A319Q with reference to SEQ ID NO: 7. In some embodiments, the mutation comprises Y415C with reference to SEQ ID NO: 7. In some embodiments, the mutation comprises a combination thereof. In some embodiments, the mutation comprises (a) A319Q and Y415F, (b) A319Q and S484G, or (c) A319Q and S484G, or a combination thereof, all with reference to SEQ ID NO: 7. In some embodiments, the mutation may comprise an amino acid mutation of an amino acid lacking a hydroxyl group.
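Substitution labels such as A319Q follow the common convention of wild-type residue, 1-based position, and replacement residue. For illustration only, a minimal sketch of applying such a label to a reference sequence is given below. The function name and the short toy sequence are hypothetical; the toy sequence is not SEQ ID NO: 7, and the sketch handles single-residue substitutions only, not insertions or deletions.

```python
import re

# Wild-type residue, 1-based position, replacement residue (e.g., "A319Q").
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(sequence, mutation):
    """Return a copy of `sequence` carrying the given point substitution.

    Raises ValueError if the label is malformed, the position is out of
    range, or the reference residue does not match the label.
    """
    match = _SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError("unrecognized substitution label: %r" % mutation)
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if position < 1 or position > len(sequence):
        raise ValueError("position %d is outside the sequence" % position)
    if sequence[position - 1] != wild_type:
        raise ValueError(
            "expected %s at position %d, found %s"
            % (wild_type, position, sequence[position - 1])
        )
    return sequence[: position - 1] + replacement + sequence[position:]


if __name__ == "__main__":
    toy = "MAYS"  # hypothetical four-residue sequence for demonstration
    double_mutant = apply_substitution(apply_substitution(toy, "A2Q"), "Y3F")
    print(double_mutant)
```

Checking the reference residue before substituting guards against numbering mismatches, for example between a full-length sequence and a truncated catalytic portion.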
In some embodiments, the terpene synthase is truncated such that only the catalytically active portion of the synthase is encoded. In some embodiments, the catalytic portion of GHS is SEQ ID NO: 295. In some embodiments, the catalytic portion of GHS comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 295. In some embodiments, a catalytic portion of ADS is SEQ ID NO: 293. In some embodiments, the catalytic portion of ADS comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 293. In some embodiments, a catalytic portion of TXS is SEQ ID NO: 297. In some embodiments, the catalytic portion of TXS comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 297.
In some embodiments, the terpene synthase comprises one or more mutations provided in
In some embodiments, the terpene synthase is a catalytically active portion thereof, such as those provided in Table 30. In some embodiments, the catalytically active portion of ADS comprises an amino acid sequence provided in SEQ ID NO: 293. In some embodiments, the catalytically active portion of ADS comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 293. In some embodiments, the catalytically active portion of GHS comprises an amino acid sequence provided in SEQ ID NO: 295. In some embodiments, the catalytically active portion of GHS comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 295. In some embodiments, the catalytically active portion of TXS comprises an amino acid sequence provided in SEQ ID NO: 297. In some embodiments, the catalytically active portion of TXS comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 297.
In some embodiments, the terpene synthase is provided in Table 31. In some embodiments, the terpene synthase is (S)-β-Bisabolene synthase, β-Bisabolene synthase, Taxadiene synthase, Terpene synthase from Cynara cardunculus var, (+)-α-Bisabolol synthase, (+)-epi-α-Bisabolol synthase, γ-Humulene synthase, Sesquiterpene synthase 14b, Artemisia annua (Sweet wormwood) Amorpha-4,11-diene synthase, or a combination thereof. In some embodiments, (S)-β-Bisabolene synthase comprises an amino acid sequence provided in SEQ ID NO: 9. In some embodiments, (S)-β-Bisabolene synthase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 9. In some embodiments, β-Bisabolene synthase comprises an amino acid sequence provided in SEQ ID NO: 11. In some embodiments, β-Bisabolene synthase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 11. In some embodiments, Terpene synthase from Cynara cardunculus var comprises an amino acid sequence provided in SEQ ID NO: 15. In some embodiments, Terpene synthase from Cynara cardunculus var comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 15. In some embodiments, (+)-α-Bisabolol synthase comprises an amino acid sequence provided in SEQ ID NO: 17. In some embodiments, (+)-α-Bisabolol synthase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 17. In some embodiments, (+)-epi-α-Bisabolol synthase comprises an amino acid sequence provided in SEQ ID NO: 19.
In some embodiments, (+)-epi-α-Bisabolol synthase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 19. In some embodiments, γ-Humulene synthase comprises an amino acid sequence provided in SEQ ID NO: 7. In some embodiments, γ-Humulene synthase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 7. In some embodiments, Sesquiterpene synthase 14b comprises an amino acid sequence provided in SEQ ID NO: 23. In some embodiments, Sesquiterpene synthase 14b comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 23. In some embodiments, Artemisia annua (Sweet wormwood) Amorpha-4,11-diene synthase comprises an amino acid sequence provided in SEQ ID NO: 4. In some embodiments, Artemisia annua (Sweet wormwood) Amorpha-4,11-diene synthase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 4.
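The percent identity thresholds recited throughout may be evaluated in several ways depending on the alignment method and the choice of denominator. The following minimal sketch assumes two sequences that have already been aligned to equal length, with "-" marking gaps, and counts identity over all alignment columns except those that are gaps in both sequences. The function name and this denominator convention are assumptions made for illustration and are not definitions drawn from the present disclosure.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap and residue_b == gap:
            continue  # uninformative column
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned peptides for demonstration; not sequences from the listing.
    print(percent_identity("MKTAYI", "MKSAY-"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for gapped alignments, so the convention in use should be stated alongside any reported figure.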
In some embodiments, the non-ribosomal peptide synthetase comprises a carrier protein domain, an adenylation domain, a condensation domain, a thioesterase domain, or a reductase domain, or a combination thereof. Non-limiting examples of non-ribosomal peptide synthetases and their substrates are discussed in Miller B R, Gulick A M. “Structural Biology of Nonribosomal Peptide Synthetases.” Methods Mol Biol. 1401 (2016) 3-29, which is hereby incorporated by reference. In some embodiments, the non-ribosomal peptide synthetase comprises GupB, Nterp, or a combination thereof. In some embodiments, the non-ribosomal peptide synthetase is a dipeptide synthase. In some embodiments, the non-ribosomal peptide synthetase is a cyclodipeptide synthase. In some embodiments the non-ribosomal peptide synthetase comprises domains from one or more naturally occurring non-ribosomal peptide synthetases. In some embodiments the non-ribosomal peptide synthase has one or more mutations in one or more adenylation (A) domains. In some embodiments, the non-ribosomal peptide synthase includes one or more adenylation (A) domains from a different source organism than other domains in the non-ribosomal peptide synthase.
In some cases, the one or more products of the terpene synthase are isolated. In some embodiments, the one or more products of the terpene synthase are purified. In some embodiments, the terpene synthase or modified terpene synthase, or catalytically active portion thereof is isolated or purified.
Provided herein are methods of amplifying expression of a reporter in vivo that may be linked to inhibition of a target enzyme. In some embodiments, the GOI encodes an enzyme capable of inducing expression of a detectable polypeptide disclosed herein, such as a polymerase. In some embodiments, the GOI encodes T7 RNA polymerase. Other non-limiting examples of RNA polymerases include other viral RNA polymerases, such as T3 polymerase, SP6 polymerase, and K11 polymerase; eukaryotic RNA polymerases, such as RNA polymerase I, RNA polymerase II, RNA polymerase III, RNA polymerase IV, and RNA polymerase V; or archaeal RNA polymerases. The polymerase encoded by the GOI can then bind to the promoter driving expression of a detectable polypeptide, resulting, in some cases, in amplification of the detectable signal by nearly 5-fold, as compared to the GOI encoding the detectable polypeptide itself.
Provided herein, in some embodiments, are one or more nucleic acid molecules encoding the systems disclosed herein. In some embodiments, the nucleic acid molecules comprise deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). In some embodiments, the one or more nucleic acid molecules encoding the target enzymes comprise a plasmid vector, a viral vector, a cosmid, an artificial chromosome, or a region of the host chromosome. In some embodiments, the plasmid vector is derived from bacteria, archaea, yeast, or plants. In some embodiments, the viral vector is derived from adenovirus, adeno-associated virus, retrovirus, lentivirus, poxvirus, baculovirus, or herpes simplex virus. In some embodiments, the one or more nucleic acid molecules encode a phosphorylated protein binding domain, a kinase substrate, a repressor element, a subunit of RNA polymerase or portions thereof, a kinase, the kinase/phosphatase substrate, the target enzyme (e.g., protease, phosphatase), an operator for the repressor element, binding site for the subunit of RNA polymerase or portions thereof, a chaperone polypeptide, a metabolic pathway, synthase (e.g., terpene synthase), a gene of interest (GOI), or any combination thereof.
In some embodiments, the systems disclosed herein comprise a single nucleic acid molecule encoding the phosphorylated protein binding domain, the repressor element, the subunit of RNA polymerase or portions thereof, the kinase, the kinase/phosphatase substrate, the target enzyme (e.g., protease, phosphatase), the operator for the repressor element, binding site for the subunit of RNA polymerase or portions thereof, the chaperone polypeptide, the metabolic pathway, the synthase (e.g., terpene synthase), the GOI, or any combination thereof. In some embodiments, the systems disclosed herein comprise more than one nucleic acid molecule encoding the phosphorylated protein binding domain, the repressor element, the subunit of RNA polymerase or portions thereof, the kinase, the kinase/phosphatase substrate, the target enzyme (e.g., protease, phosphatase), the operator for the repressor element, binding site for the subunit of RNA polymerase or portions thereof, the chaperone polypeptide, the metabolic pathway, the synthase (e.g., terpene synthase), the GOI, or any combination thereof. In some embodiments, the systems disclosed herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleic acid molecules. In some embodiments, the two-hybrid system comprises two separate nucleic acid molecules. For example, the two-hybrid system may comprise a first nucleic acid molecule (e.g., plasmid vector) encoding the phosphorylated protein binding domain, a repressor element, a subunit of RNA polymerase or portions thereof, the chaperone polypeptide, and the target enzyme; and a second nucleic acid molecule encoding the gene of interest (GOI), and comprising the binding site for the subunit of RNA polymerase or portions thereof, an operator for the repressor element. In some embodiments, the first nucleic acid molecule comprises a ribosomal binding site (RBS) disclosed herein.
Provided herein, in some embodiments, are systems comprising: (1) a first nucleic acid sequence encoding a phosphorylated protein binding domain; (2) a second nucleic acid sequence encoding a repressor element; (3) a third nucleic acid sequence encoding a subunit of RNA polymerase or portions thereof; (4) a fourth nucleic acid sequence encoding a kinase/phosphatase substrate; (5) a fifth nucleic acid sequence encoding a kinase; (6) a sixth nucleic acid sequence encoding the target enzyme; (7) a seventh nucleic acid sequence encoding an operator for the repressor element; (8) an eighth nucleic acid sequence comprising a binding site for the RNA polymerase; and (9) a ninth nucleic acid sequence encoding a polymerizing enzyme. In some embodiments, the kinase substrate is coupled to the subunit of the RNA polymerase or portions thereof. In some embodiments, the kinase substrate comprises MidT. In some embodiments, the subunit of the RNA polymerase or portions thereof comprises RpoZ. In some embodiments, there is a linker between the kinase substrate and the subunit of the RNA polymerase or portions thereof. In some embodiments, the repressor element is coupled to the phosphorylated protein binding domain. In some embodiments, the repressor element is or comprises cI repressor. In some embodiments, the phosphorylated protein binding domain is or comprises SH2. In some embodiments, the repressor element and the phosphorylated protein binding domain are coupled by a linker. In some embodiments, the target enzyme comprises a protease, such as those disclosed herein. In some embodiments, the systems further comprise (10) a tenth nucleic acid sequence encoding a metabolic pathway for producing the bioactive molecule described herein. In some embodiments, the systems further comprise (11) an eleventh nucleic acid sequence encoding a synthase enzyme for producing the bioactive molecule.
In some embodiments, the eleventh nucleic acid sequence further encodes an enzyme for synthesizing geranylgeranyl diphosphate (GGPP) from metabolic intermediates (e.g., farnesyl diphosphate (FPP) and isopentenyl diphosphate (IPP)), such as geranylgeranyl diphosphate synthase (GGPPS). In some embodiments, the first, second, third, fourth, fifth, sixth, seventh and eighth nucleic acid sequences are on a single nucleic acid molecule. In some embodiments, the first, second, third, fourth, fifth, sixth and ninth nucleic acid sequences are comprised in a single nucleic acid molecule. In some embodiments, the seventh and eighth nucleic acid sequences are comprised in a single nucleic acid molecule. In some embodiments, the tenth and eleventh nucleic acid sequences may be comprised in a single nucleic acid molecule or in more than one nucleic acid molecule.
In some embodiments, the one or more nucleic acid molecules encoding the above genetically-encoded system components comprise a promoter sequence configured to drive expression of a gene expression product. In some embodiments, the gene expression product comprises the phosphorylated protein binding domain, the repressor element, the subunit of RNA polymerase or portions thereof, the kinase, the kinase/phosphatase substrate, the target enzyme (e.g., protease, phosphatase), the operator for the repressor element, the binding site for the subunit of RNA polymerase, the chaperone polypeptide, the metabolic pathway, the synthase (e.g., terpene synthase), the GOI, or any combination thereof. In some embodiments, the one or more nucleic acid molecules comprise an operator or an inducer of transcription of the gene expression product. In some embodiments, the one or more nucleic acid molecules comprise an enhancer, a response element, or a silencer. In some embodiments, the one or more nucleic acid molecules comprise, in a 5′ to 3′ direction, a promoter and a nucleic acid sequence encoding the gene expression product (e.g., a component of the system). In some embodiments, the one or more nucleic acid molecules comprise, in a 5′ to 3′ direction, a promoter, an operator, and a nucleic acid sequence encoding the gene expression product (e.g., a component of the system). In some embodiments, the one or more nucleic acid molecules are comprised in an operon. In some embodiments, the promoter comprises a TATA box for forming the transcription initiation complex in a eukaryotic cell. In some embodiments, the promoter comprises a Pribnow box for forming the transcription initiation complex in a bacterial cell.
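By way of non-limiting illustration, the 5′ to 3′ arrangement of a promoter, an optional operator, and a coding sequence described above may be modeled in software as an ordered list of parts. The following is a minimal sketch only: the `Part` class, the `assemble` function, the ordering rule, and the short placeholder sequences are hypothetical and are not sequences of the Sequence Listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    kind: str  # one of "promoter", "operator", "rbs", "cds"
    seq: str   # placeholder sequence, written 5' to 3'


# Assumed simplification: parts must appear in this relative order, 5' to 3'.
ALLOWED_ORDER = ["promoter", "operator", "rbs", "cds"]


def assemble(parts):
    """Concatenate parts 5'->3' after checking that their order is plausible."""
    if not parts or parts[0].kind != "promoter" or parts[-1].kind != "cds":
        raise ValueError("construct must start with a promoter and end with a cds")
    ranks = [ALLOWED_ORDER.index(p.kind) for p in parts]
    if ranks != sorted(ranks):
        raise ValueError("parts are not in promoter/operator/rbs/cds order")
    return "".join(p.seq.upper() for p in parts)


# Hypothetical placeholder parts (not real regulatory elements).
construct = assemble([
    Part("promoter_1", "promoter", "TTGACA"),
    Part("operator_1", "operator", "TATCACC"),
    Part("gene_1", "cds", "ATGTAA"),
])
```

Because the rank check only requires a non-decreasing order, several coding sequences may follow one promoter, which loosely mirrors the operon arrangement mentioned above.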
In some embodiments, the promoter comprises a pBAD promoter, Prol promoter, placZopt promoter, ProD promoter, or any combination thereof. In some embodiments, the promoter comprises a nucleic acid sequence provided in any one of SEQ ID NOS: 82-85. In some embodiments, the promoter comprises a nucleic acid sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOS: 82-85.
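By way of non-limiting illustration, a percent identity threshold such as those recited above may be evaluated computationally once two sequences have been aligned. The sketch below assumes one common convention (matching columns divided by alignment columns, with gaps written as `-`); the present section does not specify an alignment method, and the function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumed convention: identical non-gap columns divided by all alignment
    columns, ignoring columns in which both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold_percent):
    """True if the identity is greater than or equal to the threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent
```

For example, two aligned 10-nucleotide sequences that differ at one position would be 90% identical under this convention and would satisfy a "greater than or equal to 90%" threshold but not a 95% threshold.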
Provided herein, in some embodiments, are one or more nucleic acid molecules encoding a repressor element. In some embodiments, the repressor element comprises a cI repressor. In some embodiments, the cI repressor can be identified with Primary Accession No. P03034 (UniProt) (SEQ ID NO: 86).
Provided herein, in some embodiments, are one or more nucleic acid molecules encoding a chaperone polypeptide. In some embodiments, the chaperone polypeptide comprises CDC37. In some embodiments, the one or more nucleic acid molecules encoding CDC37 is provided in SEQ ID NO: 75. In some embodiments, the one or more nucleic acid molecules encoding CDC37 is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 75.
Provided herein, in some embodiments, are one or more nucleic acid molecules encoding a subunit of RNA polymerase or portions thereof. In some embodiments, the binding site for the RNA polymerase is a binding site for a subunit of the RNA polymerase or portions thereof (e.g., RpoZ) (SEQ ID NO: 88).
Disclosed herein are one or more nucleic acid molecules encoding a phosphorylated protein binding domain disclosed herein. In some embodiments, the phosphorylated protein binding domain comprises or is a phosphorylated tyrosine binding domain. In some embodiments, the phosphorylated tyrosine binding domain comprises Src homology 2 (SH2). In some embodiments, the one or more molecules comprises a nucleic acid sequence encoding SH2, such as for example SEQ ID NO: 90. In some embodiments, the one or more nucleic acid molecules encoding the SH2 comprises a nucleic acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 90.
In some embodiments, the one or more molecules comprises a nucleic acid sequence encoding HA4, such as for example SEQ ID NO: 94. In some embodiments, the one or more nucleic acid molecules encoding the HA4 comprises a nucleic acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 94. In some embodiments, the one or more molecules comprises a nucleic acid sequence encoding SH2ABL, such as for example SEQ ID NO:92. In some embodiments, the one or more nucleic acid molecules encoding the SH2ABL comprises a nucleic acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO:92.
Disclosed herein are one or more nucleic acid molecules encoding a kinase/phosphatase substrate. In some embodiments, the kinase/phosphatase substrate comprises hamster polyomavirus middle T antigen (MidT). In some embodiments, the one or more molecules comprises a nucleic acid sequence encoding MidT, such as for example SEQ ID NO: 96 or SEQ ID NO: 98. In some embodiments, the one or more nucleic acid molecules encoding the MidT comprises a nucleic acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 96 or SEQ ID NO: 98.
Provided herein, in some embodiments, are one or more nucleic acid molecules encoding kinase. In some embodiments, the kinase comprises or is Src Kinase. In some embodiments, the one or more molecules comprises a nucleic acid sequence encoding Src Kinase, such as for example SEQ ID NO:73. In some embodiments, the one or more nucleic acid molecules encoding the Src Kinase comprises a nucleic acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO:73. In some embodiments, the one or more nucleic acid molecules encodes a truncated Src Kinase. In some embodiments, the Src Kinase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 246. In some embodiments, the one or more nucleic acid molecules encodes a Lck kinase. In some embodiments, the Lck kinase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 247. In some embodiments, the one or more nucleic acid molecules encodes a Fyn kinase. In some embodiments, the Fyn Kinase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 248. In some embodiments, the one or more nucleic acid molecules encodes a Yes kinase. In some embodiments, the Yes kinase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 249. 
In some embodiments, the one or more nucleic acid molecules encodes an Epha2 kinase. In some embodiments, the Epha2 kinase comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 250. In some embodiments, the one or more nucleic acid molecules encodes a BTK. In some embodiments, the BTK comprises an amino acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to SEQ ID NO: 251.
Provided herein, in some embodiments, are one or more nucleic acid molecules encoding a target enzyme. In some embodiments, the one or more nucleic acid molecules encoding the target enzymes disclosed herein further comprise a ribosomal binding site (RBS), which enhances translation of the mRNA encoding the target enzyme. In some embodiments, the RBS comprises or is an internal ribosome entry site (IRES). In some embodiments, the RBS comprises 5′-AGGAGG-3′. In some embodiments, the RBS comprises 5′-GGTG-3′. In some embodiments, RBS is modified to further enhance ribosomal binding. In some embodiments, the RBS is engineered via a degenerate primer. In some embodiments, the RBS variants are screened as libraries. In some embodiments, the RBS variants are screened in conjunction with variants in other GOIs or operators (e.g., T7 RNAP, GFPuv). In some embodiments, the RBS is exogenous to the cell. In some embodiments, the RBS is endogenous to the cell. In some embodiments, the RBS is encoded by a nucleic acid sequence comprising any one of SEQ ID NOS: 100-108 or SEQ ID NOS: 39-42. In some embodiments, the RBS is or comprises a nucleic acid sequence that is more than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to any one of SEQ ID NOS: 100-108 or SEQ ID NOS: 39-42.
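By way of non-limiting illustration, an RBS library engineered via a degenerate primer may be enumerated in software by expanding each IUPAC degenerate base into the nucleotides it represents. The sketch below uses the standard IUPAC nucleotide codes; the function name and the degenerate example sequence are hypothetical and are not taken from SEQ ID NOS: 100-108 or 39-42.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(seq):
    """Return every concrete sequence encoded by a degenerate sequence."""
    pools = [IUPAC[base] for base in seq.upper()]  # KeyError on unknown codes
    return ["".join(combo) for combo in product(*pools)]


# Hypothetical degenerate RBS: one purine position (R = A or G).
variants = expand_degenerate("AGGRGG")
```

The library size is the product of the pool sizes at each position, so two fully degenerate positions (`NN`) yield 16 variants; this is one way to estimate how many clones a screen of such a library would need to cover.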
In some embodiments, the one or more nucleic acid molecules encoding the target enzyme comprises a deoxyribonucleic acid (DNA) sequence encoding the target enzyme. In some embodiments, the DNA sequence encoding PTP1B is provided in SEQ ID NO: 5. In some embodiments, the DNA sequence encoding PTP1B is greater than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 5. In some embodiments, the one or more nucleic acid molecules encodes PTP1B321, PTP1B405, TCPTP317, TCPTP387, PEST (E57D)306, STEP282-563, or SHP2237-529. In some embodiments, the one or more nucleic acid molecules comprises a nucleic acid sequence provided in Table 28. In some embodiments, the one or more nucleic acid molecules comprises a nucleic acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one of the sequences provided in Table 28. In some embodiments, the one or more nucleic acid molecules encodes a protein kinase. In some embodiments, the one or more nucleic acid molecules comprises a nucleic acid sequence provided in Table 28. In some embodiments, the one or more nucleic acid molecules comprises a nucleic acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one of the sequences provided in Table 28.
In some embodiments, HIV protease (HIV-1Pr) is encoded by a DNA sequence provided in SEQ ID NO: 62. In some embodiments, HIV-1Pr is encoded by a DNA sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 62. In some embodiments, 3CLpro is encoded by a DNA sequence provided in SEQ ID NO: 68. In some embodiments, 3CLpro is encoded by a DNA sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 68. In some embodiments, NS2B/NS3 protease is encoded by a DNA sequence provided in SEQ ID NO: 77. In some embodiments, NS2B/NS3 protease is encoded by a DNA sequence that is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 77. In some embodiments, PLpro is encoded by a DNA sequence comprising SEQ ID NO: 66. In some embodiments, PLpro is encoded by a DNA sequence that is more than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 66. In some embodiments, USP7 is encoded by a DNA sequence comprising SEQ ID NO: 64. In some embodiments, USP7 is encoded by a DNA sequence that is more than or equal to about 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 64.
Provided herein, in some embodiments, are one or more nucleic acid molecules encoding a protease cleavage site. In some embodiments, the protease cleavage site is for recognition by 3CLpro. In some embodiments, the one or more nucleic acid molecules encoding the 3CLpro protease cleavage site is provided in SEQ ID NO: 109. In some embodiments, the protease cleavage site is for recognition by HIVpro. In some embodiments, the one or more nucleic acid molecules encoding the HIVpro protease cleavage site is provided in SEQ ID NO:110. In some embodiments, the protease cleavage site is for recognition by PLpro. In some embodiments, the one or more nucleic acid molecules encoding the PLpro protease cleavage site is provided in SEQ ID NO:111. In some embodiments, the protease cleavage site is for recognition by USP7. In some embodiments, the one or more nucleic acid molecules encoding the USP7 protease cleavage site is provided in SEQ ID NO:24.
Provided herein, in some embodiments, are one or more nucleic acid molecules encoding a gene of interest (GOI). In some embodiments, the GOI is or comprises LuxAB. In some embodiments, the one or more nucleic acid molecules encoding LuxAB comprises SEQ ID NO: 112. In some embodiments, the one or more nucleic acid molecules encoding LuxAB is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 112. In some embodiments, the GOI is or comprises SpecR. In some embodiments, the one or more nucleic acid molecules encoding SpecR comprises SEQ ID NO:79. In some embodiments, the one or more nucleic acid molecules encoding SpecR is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 79.
Provided herein, in some embodiments, are one or more nucleic acid molecules encoding an operator for the repressor element. In some embodiments, the one or more nucleic acid molecules encoding the operator comprises SEQ ID NOS: 113-117. In some embodiments, the one or more nucleic acid molecules encoding the operator is greater than or equal to 50%, 60%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NOS: 113-117.
Provided herein, in some embodiments, are one or more nucleic acid molecules encoding a metabolic pathway. In some embodiments, the one or more nucleic acid molecules encoding the metabolic pathway encodes an enzyme that catalyzes the condensation of isopentenyl diphosphate (IPP) or dimethylallyl diphosphate (DMAPP), such as a geranylgeranyl diphosphate synthase (GGPPS). In some embodiments, the one or more nucleic acid molecules encoding the metabolic pathway further encodes an enzyme that selectively hydroxylates unactivated carbon-hydrogen bonds disclosed herein. In some embodiments, the enzyme comprises a cytochrome P450 enzyme, a cytochrome P450 reductase enzyme, a cytochrome b5 enzyme, an oxidase enzyme, an acyl transferase enzyme, a glycosyltransferase enzyme, a halogenase, and/or a peroxidase. In some embodiments, the one or more nucleic acid molecules encoding the metabolic pathway further encodes mevalonate kinase (ERG12) (NCBI Gene ID: 855248), phosphomevalonate kinase (ERG8) (NCBI Gene ID: 855260), or diphosphomevalonate decarboxylase MVD1 (MVD1) (NCBI Gene ID: 855779), or a combination thereof. In some embodiments, the metabolic pathway is encoded by one nucleic acid molecule. In some embodiments, the metabolic pathway is encoded by two separate nucleic acid molecules. In some embodiments, the metabolic pathway is encoded by three separate nucleic acid molecules. In some embodiments, the metabolic pathway is encoded by four separate nucleic acid molecules. In some embodiments, the system comprises a first nucleic acid molecule encoding mevalonate kinase (ERG12), phosphomevalonate kinase (ERG8), or diphosphomevalonate decarboxylase MVD1 (MVD1), or a combination thereof; and a second nucleic acid molecule encoding a synthase disclosed herein. In some embodiments, the second nucleic acid molecule further encodes geranylgeranyl diphosphate synthase (GGPPS).
In some embodiments, the first nucleic acid molecule and the second nucleic acid molecules are plasmid vectors in operable combination with one another. Alternatively, the first and second nucleic acid molecules may be on the same plasmid.
Provided herein are one or more nucleic acid molecules encoding the terpene synthases described herein. In some embodiments, the nucleic acid molecules encoding the terpene synthase comprise a plasmid vector, a viral vector, a cosmid, an artificial chromosome, or a region of the host chromosome. In some embodiments, the nucleic acid molecules encoding the terpene synthases further encode the metabolic pathway or metabolic precursor pathway disclosed herein. For example, the nucleic acid molecule encoding the terpene synthase may also encode an enzyme that catalyzes the condensation of isopentenyl diphosphate (IPP) or dimethylallyl diphosphate (DMAPP), such as a geranylgeranyl diphosphate synthase (GGPPS). In some embodiments, the nucleic acid encoding the terpene synthase described herein further encodes an enzyme that selectively hydroxylates unactivated carbon-hydrogen bonds disclosed herein. In some embodiments, the enzyme comprises a cytochrome P450 enzyme, a cytochrome P450 reductase enzyme, a cytochrome b5 enzyme, an oxidase enzyme, an acyl transferase enzyme, a glycosyltransferase enzyme, a halogenase, and/or a peroxidase. In some embodiments, the nucleic acid encoding the terpene synthase described herein further encodes mevalonate kinase (ERG12) (NCBI Gene ID: 855248), phosphomevalonate kinase (ERG8) (NCBI Gene ID: 855260), or diphosphomevalonate decarboxylase MVD1 (MVD1) (NCBI Gene ID: 855779), or a combination thereof. In some embodiments, the one or more nucleic acid molecules encoding the terpene synthases are provided in Table 30. In some embodiments, the one or more nucleic acid molecules comprises a nucleic acid sequence that is greater than or equal to about 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to any one of the sequences provided in Table 30.
In some embodiments, the system further encodes or comprises various transcription factors, transcription activators, or transcription repressors. In some embodiments, the cell comprises the various transcription factors. In some embodiments, the system further comprises one or more inducers of transcription, such as for example, a substance that binds to a repressor and prevents the repressor from inhibiting transcription. Also provided herein, in some aspects, are molecular barcodes capable of being added to the one or more nucleic acid molecules disclosed herein that enable identification of a component of the system disclosed herein using multiplexed sequence analysis. In some embodiments, the nucleic acid molecules disclosed herein comprise a molecular barcode sequence unique to a target enzyme, a synthase, a metabolic pathway, or a combination thereof. In some embodiments, the nucleic acid molecule encoding the target enzyme also comprises a unique barcode sequence that enables identification of the target enzyme. In some embodiments, the nucleic acid molecule encoding the synthase also comprises a unique barcode sequence that enables identification of the synthase. In some embodiments, the barcode is sufficient to identify a target tyrosine phosphatase. In some embodiments, the target enzyme comprises or is a proteolytic enzyme disclosed herein. In some embodiments, the target enzyme comprises or is a protein phosphatase disclosed herein (e.g., tyrosine phosphatase).
In some embodiments, the molecular barcode comprises or is a unique molecular identifier (UMI) comprising a nucleic acid sequence coupled to a 5′ or a 3′ end (or both 5′ and 3′ end) of a nucleic acid sequence encoding a phosphorylated protein binding domain, a repressor element, a subunit of RNA polymerase or portions thereof, a kinase substrate, kinase, the target enzyme, an operator for the repressor element, a synthase (e.g., terpene synthase), or a metabolic pathway, or any combination thereof.
In some embodiments, the molecular barcode has a length comprising from about 5 nucleotides to 25 nucleotides, 6 nucleotides to 24 nucleotides, 7 nucleotides to 23 nucleotides, 8 nucleotides to 22 nucleotides, 9 nucleotides to 21 nucleotides, 10 nucleotides to 20 nucleotides, 11 nucleotides to 19 nucleotides, 12 nucleotides to 18 nucleotides, 13 nucleotides to 17 nucleotides, or 14 nucleotides to 16 nucleotides. In some embodiments, the length of a molecular barcode comprises less than or equal to 25 nucleotides. In some embodiments, the length of a molecular barcode comprises at least or equal to about 1, 2, 3, 4, 5, or 6 nucleotides. In some embodiments, the molecular barcode comprises at least or equal to about 6 nucleotides. In some embodiments, the nucleotides are contiguous.
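By way of non-limiting illustration, the barcode length limits described above, and the coupling of a barcode to the 5′ end, the 3′ end, or both ends of a coding sequence, may be checked in software. The sketch below is an assumption-laden example: the function names, the default limits of 6 and 25 nucleotides (chosen from the ranges recited above), and the placeholder sequences are hypothetical.

```python
VALID_BASES = set("ACGT")


def is_valid_barcode(barcode, min_len=6, max_len=25):
    """True if the barcode is a contiguous A/C/G/T string of allowed length."""
    return min_len <= len(barcode) <= max_len and set(barcode.upper()) <= VALID_BASES


def barcode_capacity(length):
    """Number of distinct barcodes of a given length (four bases per position)."""
    return 4 ** length


def attach_barcode(coding_seq, barcode, end="5"):
    """Couple a barcode to the 5' end, the 3' end, or both ends of a sequence."""
    if not is_valid_barcode(barcode):
        raise ValueError("barcode has an unsupported length or character")
    if end == "5":
        return barcode + coding_seq
    if end == "3":
        return coding_seq + barcode
    if end == "both":
        return barcode + coding_seq + barcode
    raise ValueError("end must be '5', '3', or 'both'")
```

A 6-nucleotide barcode allows at most 4^6 = 4,096 distinct sequences, which bounds the number of components that such a barcode could distinguish before any error-tolerance spacing between barcodes is imposed.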
In some embodiments, the nucleic acid molecules disclosed herein may comprise an adaptor. In some embodiments, the adaptor comprises one or more primer sites, such as a site for a sequencing primer or an amplification primer. In some embodiments, the primer is a universal primer. In some embodiments, the adaptor comprises an index site comprising a nucleic acid sequence that may be capable of identifying the sample. In some embodiments, the index site comprises a nucleic acid sequence that has a length comprising from about 5 nucleotides to 25 nucleotides, 6 nucleotides to 24 nucleotides, 7 nucleotides to 23 nucleotides, 8 nucleotides to 22 nucleotides, 9 nucleotides to 21 nucleotides, 10 nucleotides to 20 nucleotides, 11 nucleotides to 19 nucleotides, 12 nucleotides to 18 nucleotides, 13 nucleotides to 17 nucleotides, or 14 nucleotides to 16 nucleotides. In some embodiments, the length of the index site comprises less than or equal to 25 nucleotides. In some embodiments, the length of the index site comprises at least or equal to about 1, 2, 3, 4, 5, or 6 nucleotides. In some embodiments, the index site comprises at least or equal to about 6 nucleotides. In some embodiments, the nucleotides are contiguous. In some embodiments, the adaptor comprises more than one index site. In some embodiments, the adaptor comprises 2, 3, 4, 5, 6, 7, 8, 9, or 10 index sites. In some embodiments, the adaptor comprises one or more of the UMI disclosed herein. A non-limiting example of an adaptor comprises xGen™ Dual Index UMI.
In some embodiments, the adaptors disclosed herein are designed for a specific next generation sequencing platform, such as by including sequences that allow template molecules (for a sequencing reaction) to be immobilized to a solid surface. In some embodiments, the adaptor comprises P5 and P7 sequences suitable for sequencing using Illumina® sequencing-by-synthesis. Non-limiting examples of sequencing platforms and methods include bisulfite-free sequencing, bisulfite sequencing, TET-assisted bisulfite (TAB) sequencing, APOBEC-Coupled Epigenetic (ACE) sequencing, high-throughput sequencing, Maxam-Gilbert sequencing, massively parallel signature sequencing, Polony sequencing, 454 pyrosequencing, Sanger sequencing, sequencing-by-synthesis, SOLiD sequencing, Ion Torrent™ semiconductor sequencing, DNA nanoball sequencing, Heliscope single molecule sequencing, single molecule real time (SMRT) sequencing, nanopore DNA sequencing, shotgun sequencing, RNA sequencing, Enzyme-assisted Identification of Genome Modification Assay (EnIGMA) sequencing, nanopore sequencing, sequencing-by-binding, or any combination thereof.
In some embodiments, a molecular barcode is not used to demultiplex sequencing data from a multiplex sequencing reaction. In such an embodiment, a nucleic acid sequence encoding a system component may be used to identify the sample, the terpene synthase, the metabolic pathway, or the target enzyme. For example, the nucleic acid sequence encoding the terpene synthase may be used to identify the terpene synthase, and so on. In some embodiments, such implementation of the method is particularly suited to long-read sequencing, using platforms such as (but not limited to) SMRT® sequencing or nanopore DNA sequencing (e.g., Oxford Nanopore).
Provided herein are genetically-encoded systems comprising a gene of interest (GOI) that encodes a polymerizing enzyme that, when expressed, drives expression of a detectable polypeptide in the presence of an inhibitor of the target enzyme. In some embodiments, the genetically-encoded system disclosed herein comprises: (i) a first nucleic acid sequence encoding a phosphorylated protein binding domain (e.g., phosphorylated tyrosine binding domain); (ii) a second nucleic acid sequence encoding a repressor element; (iii) a third nucleic acid sequence encoding a subunit of RNA polymerase or portions thereof; (iv) a fourth nucleic acid sequence encoding a phosphatase substrate (e.g., tyrosine phosphatase substrate); (v) a fifth nucleic acid sequence encoding a kinase (e.g., tyrosine kinase); (vi) a sixth nucleic acid sequence encoding the target enzyme; (vii) a seventh nucleic acid sequence encoding an operator for the repressor element; (viii) an eighth nucleic acid sequence comprising a binding site for the RNA polymerase; and (ix) a ninth nucleic acid sequence encoding a polymerizing enzyme that, when expressed, drives expression of a detectable polypeptide in the presence of an inhibitor of the target enzyme. In some embodiments, the signal from the detectable polypeptide is amplified by at least 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, or 100-fold, as compared with expression of the detectable polypeptide by the GOI itself. In some embodiments, the signal may be amplified by about 2-fold to 100-fold, 3-fold to 90-fold, 4-fold to 80-fold, 5-fold to 70-fold, 6-fold to 60-fold, 7-fold to 50-fold, 8-fold to 40-fold, 9-fold to 30-fold, or 10-fold to 20-fold.
In some embodiments, the one or more nucleic acid molecules disclosed herein comprise a molecular switch that enables precise control over the on/off state of the genetically-encoded system (e.g., the two-hybrid system). In some embodiments, the molecular switch is an optical switch. In some embodiments, the two-hybrid system comprises one or more nucleic acid molecules encoding (i) a variant of a light-oxygen-voltage 2 (LOV2) domain that contains a bacterial SsrA peptide and (ii) a modified SspB peptide in place of the substrate and the phosphorylated protein binding domain (e.g., SH2 domain). Exposure of LOV2 to light causes a conformational change that exposes the SsrA peptide and enables an SsrA-SspB interaction that promotes transcription of a gene of interest (GOI). In some embodiments, the GOI is or comprises a gene for LuxAB. This type of photo-switchable system is valuable for controlling the dynamics of the two-hybrid system to improve the production and/or detection of inhibitors. In some embodiments, the GOI comprises a gene for a fluorescent protein. In some embodiments, the fluorescent protein comprises GFP. In some embodiments, the SsrA-SspB interaction is replaced by a different set of protein binding partners modulated by light. In some embodiments, these binding partners are BphP1 and PpsR2.
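By way of non-limiting illustration, barcode-free identification of a component from a long read may be sketched as a comparison of shared k-mers between the read and a set of reference coding sequences. This is one possible approach assumed here for illustration and is not stated in the disclosure; the function names, the parameters `k` and `min_fraction`, and the reference sequences are hypothetical placeholders. The sketch does not consider reverse-complement reads.

```python
def kmers(seq, k):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identify_component(read, references, k=8, min_fraction=0.5):
    """Return the name of the reference sharing the most k-mers with the read.

    The score is the fraction of a reference's k-mers found in the read.
    Returns None if no reference reaches min_fraction.
    """
    read_kmers = kmers(read.upper(), k)
    best_name, best_score = None, 0.0
    for name, ref in references.items():
        ref_kmers = kmers(ref.upper(), k)
        if not ref_kmers:
            continue  # reference shorter than k
        score = len(read_kmers & ref_kmers) / len(ref_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_fraction else None


# Hypothetical placeholder references (not sequences from Table 30).
references = {
    "synthase_A": "ATGGCTAGCTAGGATCCGATCGATCGGCTA",
    "synthase_B": "ATGTTTCCCGGGAAATTTCCCGGGTTTAAA",
}
assigned = identify_component("TT" + references["synthase_A"] + "CC", references)
```

In practice an aligner suited to long, error-prone reads would likely replace the exact k-mer comparison; the sketch only shows how the coding sequence itself can serve as the identifier.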
The methods and systems may utilize or comprise one or more processors or computers. The processor may be a hardware processor such as a central processing unit (CPU), a graphic processing unit (GPU), a general-purpose processing unit, or a computing platform. The processor may be comprised of any of a variety of suitable integrated circuits, microprocessors, logic devices, field-programmable gate arrays (FPGAs) and the like. In some instances, the processor may be a single core or multi core processor, or a plurality of processors may be configured for parallel processing. Although the disclosure is described with reference to a processor, other types of integrated circuits and logic devices are also applicable. The processor may have any suitable data operation capability. For example, the processor may perform 512 bit, 256 bit, 128 bit, 64 bit, 32 bit, or 16 bit data operations. In some embodiments, such processors and computer systems are programmed to perform analysis of sequencing data from the multiplex sequencing analysis described herein. In some embodiments, the processors and computer systems are programed to demultiplex sequencing data, by assigning the one or molecular barcodes to the tepee synthase, the two-hybrid system, or both.
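By way of non-limiting illustration, demultiplexing by assigning molecular barcodes to system components may be sketched as a lookup of each read's leading nucleotides against a barcode table. The sketch below assumes that the barcode is at the 5′ end of each read, that all barcodes have the same length, and that one mismatch is tolerated; these assumptions, the function names, and the example barcodes are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcode_to_component, max_mismatches=1):
    """Group reads by the component whose barcode matches the read prefix.

    Reads with no match, or with more than one candidate barcode, are
    placed under "unassigned" with the barcode region left in place.
    """
    lengths = {len(bc) for bc in barcode_to_component}
    if len(lengths) != 1:
        raise ValueError("barcodes must be non-empty and share one length")
    n = lengths.pop()
    bins = defaultdict(list)
    for read in reads:
        prefix = read[:n].upper()
        hits = [
            bc for bc in barcode_to_component
            if len(prefix) == n and hamming(prefix, bc) <= max_mismatches
        ]
        if len(hits) == 1:
            bins[barcode_to_component[hits[0]]].append(read[n:])
        else:
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical barcode table and reads.
table = {"AAAAAA": "terpene_synthase_1", "CCCCCC": "terpene_synthase_2"}
result = demultiplex(["AAAAAAGATTACA", "CCCCCAGGGG", "GTGTGTACAC"], table)
```

Tolerating a mismatch is only safe when the barcodes differ from one another at more than twice the tolerated number of positions, which is why ambiguous reads are left unassigned rather than guessed.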
The computer system includes a central processing unit (CPU, also “processor” and “computer processor” herein), which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system also includes memory or memory location (e.g., random-access memory, read-only memory, flash memory), electronic storage unit (e.g., hard disk), communication interface (e.g., network adapter) for communicating with one or more other systems, and peripheral devices, such as cache, other memory, data storage and/or electronic display adapters. The memory, storage unit, interface and peripheral devices are in communication with the CPU through a communication bus (solid lines), such as a motherboard. The storage unit can be a data storage unit (or data repository) for storing data. The computer system can be operatively coupled to a computer network (“network”) with the aid of the communication interface. The network can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network in some cases is a telecommunication and/or data network. The network can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network, in some cases with the aid of the computer system, can implement a peer-to-peer network, which may enable devices coupled to the computer system to behave as a client or a server.
The CPU can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory. The instructions can be directed to the CPU, which can subsequently program or otherwise configure the CPU to implement methods of the present disclosure. Examples of operations performed by the CPU can include fetch, decode, execute, and writeback.
In some embodiments, the program or software is for primary sequence data analysis. In some embodiments, the program or software is for secondary sequence data analysis, such as DNA sequencing analysis or RNA sequencing analysis. In some embodiments, the secondary sequence data analysis comprises demultiplexing, trimming, read alignment, and UMI reference building. Non-limiting examples of programs or software for performing secondary sequence data analysis include Velvet, DRAGEN BioIT (Illumina®), SMRT (PacBio), MinKNOW and EPI2ME (Oxford Nanopore), or Burrows-Wheeler Alignment based algorithms (e.g., bowtie and SOAP2).
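As a minimal illustration of the demultiplexing and trimming steps of secondary sequence data analysis, the sketch below bins reads by an exact-match barcode at the start of each read and trims the barcode off. The barcode sequences, construct labels, fixed barcode length, and the function name `demultiplex` are hypothetical and chosen only for illustration; a practical pipeline would typically also tolerate sequencing errors and would rely on tools such as those listed above.

```python
from collections import defaultdict

# Hypothetical barcode-to-construct table (illustrative values only).
BARCODES = {
    "ACGTAC": "synthase_A",
    "TGCATG": "synthase_B",
}
BARCODE_LEN = 6  # assumed fixed barcode length at the start of each read


def demultiplex(reads, barcodes=BARCODES, barcode_len=BARCODE_LEN):
    """Group reads by exact-match leading barcode and trim the barcode off."""
    bins = defaultdict(list)
    for read in reads:
        prefix, insert = read[:barcode_len], read[barcode_len:]
        label = barcodes.get(prefix, "unassigned")
        # Unassigned reads are kept whole so they can be inspected later.
        bins[label].append(insert if label != "unassigned" else read)
    return dict(bins)


reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "NNNNNNGGGAAA", "ACGTACTTTT"]
result = demultiplex(reads)
```

Exact matching is the simplest possible assignment rule; it is used here only to make the barcode-to-construct mapping explicit.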
The CPU can be part of a circuit, such as an integrated circuit. One or more other components of the system can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).
The storage unit can store files, such as drivers, libraries and saved programs. The storage unit can store user data, e.g., user preferences and user programs. The computer system in some cases can include one or more additional data storage units that are external to the computer system, such as located on a remote server that is in communication with the computer system through an intranet or the Internet.
The computer system can communicate with one or more remote computer systems through the network. For instance, the computer system can communicate with a remote computer system of a user. Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smartphones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system via the network.
Disclosed herein, in some embodiments, are methods of utilizing the systems disclosed herein to identify bioactive molecules that modulate activity of a target enzyme (e.g., protease, phosphatase) or metabolic pathways that produce intermediates for producing the bioactive molecules, or both. The methods disclosed herein may be modified by applying molecular barcodes to the nucleic acid molecules encoded by the two-hybrid system or the metabolic system that can be used to demultiplex samples from multiplex sequencing analysis.
Provided herein are methods for performing multiplexed discovery of bioactive molecules that inhibit activity of a target enzyme, the method comprising: (a) providing a plurality of cells; (b) introducing into each of the plurality of cells a genetically-encoded system that links expression of a gene of interest to biosynthesis of a bioactive molecule by a cell of the plurality of cells, wherein the genetically-encoded system encodes the target enzyme, a synthase of the bioactive molecule, a ligand, and a receptor specific to the ligand under conditions sufficient for binding of the ligand to the receptor to produce a ligand-receptor pair, wherein the ligand-receptor pair induces expression of the gene of interest that is increased relative to a reference expression level when the bioactive molecule that inhibits the activity of the target enzyme is present in the cell, wherein the binding does not induce the expression of the gene of interest that is increased relative to the reference expression level when the bioactive molecule that inhibits the activity of the target enzyme is not present in the cell; (c) measuring the expression of the gene of interest that is increased relative to the reference expression level in a subset of the plurality of cells; and (d) performing multiplexed sequencing of the subset of the plurality of cells to discover the bioactive molecules that inhibit the activity of the target enzyme produced by the subset of the plurality of cells. In some embodiments, the reference expression level is obtained from an otherwise identical reference cell that does not comprise a functional version of the synthase, the ligand or the receptor. In some embodiments, the target enzyme comprises a proteolytic enzyme or a phosphatase. In some embodiments, the phosphatase comprises a tyrosine phosphatase.
In some embodiments, inhibition of the target enzyme by the bioactive molecule disrupts binding between the receptor and the ligand, thereby preventing the ligand-receptor pair from inducing expression of the gene of interest. In some embodiments, the binding of the ligand to the receptor is phosphorylation dependent. In some embodiments, the synthase comprises a terpene synthase or a nonribosomal peptide synthetase. In some embodiments, the terpene synthase comprises γ-humulene synthase (GHS), amorphadiene synthase (ADS), α-bisabolene synthase (ABS), or taxadiene synthase (TXS). In some embodiments, the terpene synthase comprises an amino acid sequence that is greater than or equal to about 90% identical to any one of SEQ ID NO: 4, 7, 13, or 17. In some embodiments, the ligand comprises a kinase substrate that binds to the receptor in a phosphorylated state, and the receptor comprises a phosphorylated protein binding domain that binds to the kinase substrate when the kinase substrate is phosphorylated. In some embodiments, the ligand is coupled to an omega subunit of RNA polymerase and the receptor is coupled to a DNA binding domain, or the ligand is coupled to the DNA binding domain and the receptor is coupled to the omega subunit of the RNA polymerase. In some embodiments, the gene of interest encodes a reporter polypeptide comprising a luciferase enzyme, a fluorescent polypeptide, secreted alkaline phosphatase, β-galactosidase, levansucrase, chloramphenicol acetyltransferase (CAT), or antibiotic resistance. In some embodiments, the gene of interest encodes a polymerizing enzyme that, when expressed, binds to a promoter operably linked to a gene encoding the reporter polypeptide to drive expression of the reporter polypeptide. In some embodiments, the genetically-encoded system further encodes a metabolic pathway for biosynthesis of the bioactive molecule. In some embodiments, the metabolic pathway is an isoprenoid pathway.
In some embodiments, the isoprenoid pathway comprises a mevalonate pathway, a methylerythritol 4-phosphate (MEP) pathway, or a deoxyxylulose 5-phosphate (DXP) pathway. In some embodiments, the multiplex sequencing comprises long read sequencing. In some embodiments, the exogenous genetically-encoded system comprises one or more molecular barcode sequences that uniquely identify the target enzyme, the synthase, or a combination thereof. In some embodiments, the multiplex sequencing further comprises performing demultiplexing, thereby assigning each of the one or more molecular barcodes to the target enzyme, the synthase, or the combination thereof, for each cell of the subset of the plurality of cells.
Provided herein are methods of identifying a bioactive molecule that modulates a target enzyme disclosed herein. In some embodiments, the target enzyme is a protease or a phosphatase (e.g., tyrosine phosphatase). In some embodiments, the bioactive molecule is an inhibitor of the target enzyme. In some embodiments, the methods disclosed herein for identifying a bioactive molecule that inhibits a target enzyme comprise: (a) expressing in a cell an exogenous synthase for producing the bioactive molecule in the cell; (b) expressing in the cell a metabolic pathway under conditions suitable to provide metabolic intermediates for producing the bioactive molecule by the exogenous synthase; (c) introducing into the cell a two-hybrid system disclosed herein that links modulation of the target enzyme with expression of a gene of interest (GOI); and (d) measuring expression of the GOI. In some embodiments, the exogenous synthase comprises a terpene synthase. In some embodiments, the target enzyme comprises a protease. In some embodiments, the target enzyme comprises a phosphatase. In some embodiments, the phosphatase is a tyrosine phosphatase. In some embodiments, the metabolic pathway comprises enzymes, metabolites, and/or intermediates of a mevalonate pathway, a methylerythritol 4-phosphate (MEP) pathway, a deoxyxylulose 5-phosphate (DXP) pathway, or a combination thereof. In some embodiments, the metabolic pathway results in isopentenyl diphosphate (IPP), or dimethylallyl diphosphate (DMAPP), or a combination thereof. In some embodiments, an increased expression of the gene of interest (GOI) as compared to a reference expression level indicates a presence of the inhibitor of the target enzyme produced by the cell. In some embodiments, a decreased expression of the GOI as compared to a reference expression level indicates an absence of an inhibitor of the target enzyme produced by the cell.
In some embodiments, the reference expression level may be derived from a reference cell expressing a modified phosphatase/kinase substrate (e.g., MidT) containing a mutation that inhibits its binding to the phosphorylated protein binding domain (e.g., SH2). In some embodiments, the reference expression level may be derived from a reference cell expressing a modified synthase that contains a mutation that reduces the activity of the synthase. In some embodiments, the method further comprises comparing cell survival or growth, cell size, fluorescence, luminescence, or light absorption between the reference cell and a cell disclosed herein. In some embodiments, the method comprises repeating (a) to (d), wherein for each repetition, a new exogenous synthase may be used to identify a new bioactive molecule of the target enzyme. In some embodiments, the inhibitor of the target protease may be a terpene or terpenoid.
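The comparison of GOI expression against a reference expression level can be sketched as a simple fold-change calculation. The following is an illustration only: the function names, the 2-fold threshold, and the luminescence readings are assumptions made for this example and are not values taken from the disclosure.

```python
def fold_change(goi_signal, reference_signal):
    """Ratio of GOI expression in a test cell to the reference expression level."""
    if reference_signal <= 0:
        raise ValueError("reference expression level must be positive")
    return goi_signal / reference_signal


def indicates_inhibitor(goi_signal, reference_signal, threshold=2.0):
    """True when GOI expression exceeds the reference by the assumed threshold."""
    return fold_change(goi_signal, reference_signal) >= threshold


# Hypothetical luminescence readings (arbitrary units).
reference = 150.0
samples = {"strain_1": 900.0, "strain_2": 160.0}
calls = {name: indicates_inhibitor(sig, reference) for name, sig in samples.items()}
```

In this sketch the reference reading stands in for a reference cell such as one carrying a binding-deficient substrate or a reduced-activity synthase, as described above; the threshold would need to be chosen empirically for a given reporter.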
Also provided are methods of identifying metabolic pathways that produce bioactive molecules that modulate target enzymes. Referring to
Provided herein are methods of expressing one or more heterologous nucleic acid molecules in a cell. In some embodiments, the heterologous nucleic acid may be introduced into the cell by transfection, transduction, or other suitable method. Suitable methods may be found in Chong Z X, Yeap S K, Ho W Y. Transfection types, methods and strategies: a technical review. PeerJ. 2021 Apr. 21; 9:e11165, which is incorporated by reference in its entirety. In some embodiments, the transfection is transient. In some embodiments, the one or more heterologous nucleic acid molecules are comprised in one or more plasmid vectors. In some embodiments, the transfection is performed using electroporation, injection, nucleofection, sonoporation, magnetofection, or using a laser beam. In some embodiments, the transfection is performed using a chemical to aid in the transfection, such as, for example, lipid-based transfection. In some embodiments, another chemical approach is used, such as, for example, using micro-/nano-particles, polymers, peptides/cations, calcium phosphate or dendrimers. In some embodiments, the one or more heterologous nucleic acid molecules may be introduced to the cell by transduction. In some embodiments, the one or more heterologous nucleic acid molecules are comprised in one or more viral vectors. In some embodiments, the transduction is transient. In some embodiments, the transient transduction may be performed using adenovirus, adeno-associated virus, lentivirus, or Herpes virus mediated transduction.
In some embodiments, methods comprise providing a cell described herein, and introducing one or more heterologous nucleic acid molecules encoding a synthase disclosed herein. In some embodiments, the synthase is a terpene synthase. In some embodiments, the terpene synthase is a modified terpene synthase relative to a wild-type terpene synthase. In some embodiments, the cell is a microbial cell, such as a bacterial cell (e.g., E. coli). In some embodiments, the cell has been previously engineered to express the metabolic pathway under conditions suitable to provide metabolic intermediates for producing the bioactive molecule by the exogenous synthase. In other embodiments, the methods further comprise introducing one or more heterologous nucleic acid molecules encoding the metabolic pathway disclosed herein. In some embodiments, the methods further comprise introducing one or more heterologous nucleic acid molecules encoding the two-hybrid system disclosed herein. In some embodiments, the method of introducing the two-hybrid system into the cell is performed under conditions sufficient to cause the RNA polymerase omega subunit to recruit RNA polymerase to the binding site for RNA polymerase in the absence of a target enzyme (e.g., protease, phosphatase) inhibitor, thereby expressing the reporter gene.
In some embodiments, methods further comprise culturing the cell in a growth medium. In some embodiments, the growth medium may comprise glycerol at a concentration between 0% and 2% (by volume). In some embodiments, the growth medium comprises mevalonate at a concentration between 0 mM and 20 mM. In some embodiments, the growth medium comprises IPTG at a concentration between 0 mM and 0.5 mM. In some embodiments, the growth medium comprises MOPS at a concentration between 0 mM and 50 mM. In some embodiments, the growth medium comprises sucrose at a concentration between 0% and 5% weight/volume.
In some embodiments, the cell is incubated for a certain length of time. In some embodiments, the length of time comprises more than 10 seconds and no more than 4 weeks, 1-10 minutes, 1-60 minutes, 0-24 hours, 0-48 hours, 0-72 hours, 1-5 days, 1-7 days, 0-4 weeks, or 1-4 weeks. In some embodiments, the length of time comprises about 10 seconds, 11 seconds, 12 seconds, 13 seconds, 14 seconds, 15 seconds, 16 seconds, 17 seconds, 18 seconds, 19 seconds, 20 seconds, 21 seconds, 22 seconds, 23 seconds, 24 seconds, 25 seconds, 26 seconds, 27 seconds, 28 seconds, 29 seconds, 30 seconds, 31 seconds, 32 seconds, 33 seconds, 34 seconds, 35 seconds, 36 seconds, 37 seconds, 38 seconds, 39 seconds, 40 seconds, 41 seconds, 42 seconds, 43 seconds, 44 seconds, 45 seconds, 46 seconds, 47 seconds, 48 seconds, 49 seconds, 50 seconds, 51 seconds, 52 seconds, 53 seconds, 54 seconds, 55 seconds, 56 seconds, 57 seconds, 58 seconds, 59 seconds, 60 seconds, 2 minutes, 3 minutes, 5 minutes, 6 minutes, 7 minutes, 8 minutes, 9 minutes, 10 minutes, 15 minutes, 30 minutes, 45 minutes, 60 minutes, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 9 hours, 10 hours, 11 hours, 12 hours, 24 hours, 36 hours, 48 hours, 56 hours, 72 hours, 96 hours, 120 hours, 4 days, 5 days, 6 days, 7 days, 8 days, 10 days, 11 days, 12 days, 13 days, 2 weeks, 3 weeks, or 4 weeks. In some embodiments, the cell is incubated at a temperature comprising no lower than 4 degrees Celsius and no higher than 40 degrees Celsius. In some embodiments, the cell is incubated at a temperature comprising about 4-40 degrees Celsius, 4-37 degrees Celsius, 20-40 degrees Celsius, 20-37 degrees Celsius, 20-30 degrees Celsius, 20-25 degrees Celsius, 30-35 degrees Celsius, or 25-37 degrees Celsius.
In some embodiments, the cell is incubated at a temperature comprising about 4 degrees Celsius, 20 degrees Celsius, 21 degrees Celsius, 22 degrees Celsius, 23 degrees Celsius, 24 degrees Celsius, 25 degrees Celsius, 26 degrees Celsius, 27 degrees Celsius, 28 degrees Celsius, 29 degrees Celsius, 30 degrees Celsius, 31 degrees Celsius, 32 degrees Celsius, 33 degrees Celsius, 34 degrees Celsius, 35 degrees Celsius, 36 degrees Celsius, 37 degrees Celsius, 38 degrees Celsius, 39 degrees Celsius, or 40 degrees Celsius. In some embodiments, the cell is cultured in a suspension. In some embodiments, the cell is cultured in a solid medium, such as an agarose plate.
In some embodiments, the methods further comprise selecting the cell colonies containing the modulator of the target enzyme for further analysis by identifying the colonies that express the GOI. In some embodiments, where the GOI encodes for antibiotic resistance, the cells are plated and incubated on a solid medium containing an antibiotic (e.g., kanamycin, tetracycline, chloramphenicol) that is lethal to the cells that do not express the GOI. In some embodiments, where the GOI encodes an enzyme that produces a luminescent biomolecule (e.g., LuxAB), the cell colonies are expanded on solid medium, and introduced to a substrate of the enzyme, and cell colonies that produce the luminescent biomolecule are visible. In some embodiments, where the GOI encodes a fluorescent biomolecule (e.g., GFP) directly or indirectly, the cell colonies comprising the fluorescent biomolecule are visible. In some embodiments, the signal produced from the fluorescent biomolecule is increased by 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, or 100-fold when it is expressed indirectly (e.g., when the GOI encodes an RNA polymerase that induces expression of the fluorescent biomolecule) than if expressed directly from the GOI. In some embodiments, the signal produced from the fluorescent biomolecule is increased by 2-fold to 100-fold, 3-fold to 90-fold, 4-fold to 80-fold, 5-fold to 70-fold, 6-fold to 60-fold, 7-fold to 50-fold, 8-fold to 40-fold, 9-fold to 30-fold, or 10-fold to 20-fold when it is expressed indirectly rather than directly from the GOI. In some embodiments, the cell colonies that express the modulator of the target enzyme are cultured in suspension and expanded until they reach a certain optical density (OD) of about 600. In some embodiments, the cells are isolated from the liquid medium and pelleted using centrifugation, and stored as necessary before further analysis.
Provided herein are methods of determining a presence of a bioactive molecule that inhibits activity of a target enzyme, the method comprising: (a) introducing into a cell an exogenous genetically-encoded system that links expression of a gene of interest to biosynthesis of the bioactive molecule by the cell, wherein the exogenous genetically-encoded system encodes the target enzyme, a synthase of the bioactive molecule, a ligand and a receptor specific to the ligand under conditions sufficient for binding of the ligand to the receptor to form a ligand-receptor pair, wherein the ligand-receptor pair induces expression of the gene of interest that is increased relative to a reference expression level when the bioactive molecule that inhibits the activity of the target enzyme is present in the cell, wherein the binding does not induce the expression of the gene of interest that is increased relative to the reference expression level when the bioactive molecule that inhibits the activity of the target enzyme is not present in the cell; (b) measuring the expression of the reporter polypeptide; and (c) determining the presence of the bioactive molecule in the cell if the expression of the gene of interest is increased relative to the reference expression level. In some embodiments, the reference expression level is obtained from an otherwise identical reference cell that does not comprise a functional version of the synthase, the ligand or the receptor. In some embodiments, inhibition of the target enzyme by the bioactive molecule disrupts binding between the receptor and the ligand, thereby preventing the ligand-receptor pair from inducing expression of the gene of interest. In some embodiments, the binding of the ligand to the receptor is phosphorylation dependent. In some embodiments, the synthase comprises a terpene synthase or a nonribosomal peptide synthetase. 
In some embodiments, the terpene synthase comprises γ-humulene synthase (GHS), amorphadiene synthase (ADS), α-bisabolene synthase (ABS), or taxadiene synthase (TXS). In some embodiments, the terpene synthase comprises an amino acid sequence that is greater than or equal to about 90% identical to any one of SEQ ID NO: 4, 7, 13, or 17. In some embodiments, the terpene synthase is a catalytically active portion thereof. In some embodiments, the catalytically active portion of the terpene synthase comprises an amino acid sequence that is greater than or equal to about 90% identical to any one of the amino acid sequences provided in Table 30. In some embodiments, the ligand comprises a kinase substrate that binds to the receptor in a phosphorylated state, and the receptor comprises a phosphorylated protein binding domain that binds to the kinase substrate when the kinase substrate is phosphorylated. In some embodiments, the ligand is coupled to an omega subunit of RNA polymerase or portions thereof and the receptor is coupled to a DNA binding domain, or the ligand is coupled to the DNA binding domain and the receptor is coupled to the omega subunit or portions thereof of the RNA polymerase. In some embodiments, the gene of interest encodes a reporter polypeptide comprising a luciferase enzyme, a fluorescent polypeptide, secreted alkaline phosphatase, β-galactosidase, levansucrase, chloramphenicol acetyltransferase (CAT), or antibiotic resistance. In some embodiments, the gene of interest encodes a polymerizing enzyme that, when expressed, binds to a promoter operably linked to a gene encoding the reporter polypeptide to drive expression of the reporter polypeptide. In some embodiments, the genetically-encoded system further encodes a metabolic pathway for biosynthesis of the bioactive molecule. In some embodiments, the metabolic pathway is an isoprenoid pathway.
In some embodiments, the isoprenoid pathway comprises a mevalonate pathway, a methylerythritol 4-phosphate (MEP) pathway, or a deoxyxylulose 5-phosphate (DXP) pathway. In some embodiments, the signal produced from the reporter polypeptide is increased by 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, or 100-fold when it is expressed indirectly (e.g., when the GOI encodes an RNA polymerase that induces expression of the reporter polypeptide) than if expressed directly from the GOI. In some embodiments, the signal produced from the fluorescent biomolecule is increased by 2-fold to 100-fold, 3-fold to 90-fold, 4-fold to 80-fold, 5-fold to 70-fold, 6-fold to 60-fold, 7-fold to 50-fold, 8-fold to 40-fold, 9-fold to 30-fold, or 10-fold to 20-fold when it is expressed indirectly rather than directly from the GOI.
In some embodiments, the method may further comprise measuring the expression of said reporter polypeptide comprising a protein that confers antibiotic resistance by using drops of liquid culture to seed cells on solid media containing different concentrations of antibiotic such that cells that produce a bioactive molecule that modulates the activity of said target enzyme grow to higher concentrations of antibiotic than cells that do not produce that molecule or that produce less of it.
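The drop-based antibiotic readout described above can be summarized as the highest antibiotic concentration at which a strain still grows. The sketch below is illustrative only: the concentrations, growth observations, and the function name `max_tolerated_concentration` are hypothetical and are not data from the disclosure.

```python
def max_tolerated_concentration(growth_by_conc):
    """Highest antibiotic concentration at which growth was observed, or None."""
    grown = [conc for conc, grew in growth_by_conc.items() if grew]
    return max(grown) if grown else None


# Hypothetical plate observations: antibiotic concentration (ug/mL) -> growth seen.
producer = {0: True, 50: True, 100: True, 200: True, 400: False}
control = {0: True, 50: True, 100: False, 200: False, 400: False}

producer_max = max_tolerated_concentration(producer)
control_max = max_tolerated_concentration(control)

# A producer of a modulating molecule is expected to tolerate more antibiotic.
producer_survives_higher = producer_max > control_max
```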
In some embodiments, the method may further comprise isolating a bioactive molecule that modulates the target enzyme (e.g., inhibitor of the protease or phosphatase). In some embodiments, the target enzyme comprises a target phosphatase disclosed herein. In some embodiments, the target enzyme comprises a target protease. In some embodiments, the target protease may comprise a viral protease. In some embodiments, the viral protease may comprise HIV-1 protease (HIV-1Pr) or SARS-CoV-2 main protease (3CLpro). Methods of isolating the bioactive molecule comprise (1) breaking the cells to release their chemical constituents; (2) extracting the sample using a suitable solvent (or through distillation or the trapping of compounds); (3) separating the desired bioactive molecule (e.g., terpene or terpenoid) from other undesired contents of the extracts that confound analysis and quantification; and (4) using an appropriate method of analysis (e.g., thin layer chromatography [TLC], gas chromatography [GC], or liquid chromatography [LC]), as discussed in Jiang Z, Kempinski C, Chappell J. “Extraction and Analysis of Terpenes/Terpenoids.” Curr Protoc Plant Biol. 1 (2016) 345-358, which is hereby incorporated by reference in its entirety. In some embodiments, Nuclear Magnetic Resonance (NMR) is also used to examine molecules. In some embodiments, the cell cultures are spun down and only the culture supernatant is analyzed. In some embodiments, the cells are spun down, washed, and lysed, and only the intracellular molecules are analyzed. In some embodiments, both extracellular and intracellular molecules are analyzed.
Provided herein are high throughput methods of identifying a plurality of modulators of a plurality of target enzymes using multiplex sequencing analysis. In some embodiments, the method may comprise: (a) introducing into a plurality of cells a nucleic acid sequence encoding exogenous synthases for producing the bioactive molecule in the cell; (b) introducing into the plurality of cells one or more nucleic acid sequences encoding a metabolic pathway under conditions suitable to provide metabolic intermediates for producing the bioactive molecule by the exogenous synthases; (c) introducing into the plurality of cells a two-hybrid system that links the modulation of a target enzyme of the plurality of target enzymes with expression of a GOI, wherein the two-hybrid system comprises a nucleic acid sequence encoding a target enzyme, wherein the nucleic acid sequence comprises one or more molecular barcodes corresponding to the target enzyme, the metabolic pathway, the synthase, or a combination thereof; (d) measuring expression of the reporter gene in the plurality of cells; (e) performing multiplexed sequencing analysis to produce sequencing data; and (f) demultiplexing the sequencing data by assigning the one or more molecular barcodes of each of the plurality of cells to the respective target enzyme produced by that cell. In some embodiments, the nucleic acid sequence encoding each of the exogenous synthases is also barcoded to enable assignment of the synthase (or mutant thereof) and the resulting bioactive molecule produced by the cell. In some embodiments, methods further comprise detecting an increased expression of the GOI in a subset of the plurality of cells, as compared to a reference expression level, which is thereby indicative that the subset of the plurality of cells produced an inhibitor of the target enzyme.
In some embodiments, multiplexed sequencing analysis is performed on the plurality of cells prior to (d) to measure baseline expression of a terpene synthase in each of the plurality of cells. In some embodiments, the multiplex sequencing in (e) is performed on a subset of the plurality of cells with an increase or a decrease in the expression for the reporter gene (e.g., indicating presence of a bioactive molecule modulating the activity of the target enzyme) to measure the expression of a terpene synthase. In some embodiments, enrichment of the terpene synthase is determined by comparing the expression of the terpene synthase to identify the subset of the plurality of cells that produced a bioactive molecule modulating the activity of the target enzyme.
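One way to picture the enrichment comparison is as a ratio of each synthase's read frequency in the selected subset to its frequency in the full pool before selection. The sketch below is a simplified illustration under stated assumptions: barcode read counts are used as a proxy for the sequencing-based measurement, a pseudocount guards against division by zero, and all counts and the function name `enrichment` are hypothetical. Only the synthase abbreviations (GHS, ADS, ABS) come from the disclosure.

```python
def enrichment(pre_counts, post_counts, pseudocount=1.0):
    """Ratio of post-selection to pre-selection read frequency per synthase."""
    names = set(pre_counts) | set(post_counts)
    # The pseudocount is added once per synthase to both pools.
    pre_total = sum(pre_counts.values()) + pseudocount * len(names)
    post_total = sum(post_counts.values()) + pseudocount * len(names)
    scores = {}
    for name in names:
        pre_freq = (pre_counts.get(name, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(name, 0) + pseudocount) / post_total
        scores[name] = post_freq / pre_freq
    return scores


# Hypothetical barcode read counts before and after reporter-based selection.
pre = {"GHS": 99, "ADS": 99, "ABS": 99}
post = {"GHS": 9, "ADS": 279, "ABS": 9}

scores = enrichment(pre, post)
top = max(scores, key=scores.get)  # synthase most enriched after selection
```

A score above 1 means the synthase is over-represented in the selected subset relative to the starting pool, which in this sketch flags it as a candidate producer of a molecule modulating the target enzyme.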
Provided herein are high throughput methods of identifying a plurality of modulators of a plurality of target enzymes using multiplex sequencing analysis. In some embodiments, the method may comprise: (a) introducing into a plurality of cells a nucleic acid sequence encoding exogenous synthases for producing the bioactive molecule in the cell; (b) introducing into the plurality of cells one or more nucleic acid sequences encoding a metabolic pathway under conditions suitable to provide metabolic intermediates for producing the bioactive molecule by the exogenous synthases, wherein the one or more nucleic acid sequences encoding the metabolic pathway comprise one or more molecular barcodes corresponding to the metabolic pathway; (c) introducing into the plurality of cells a two-hybrid system that links the inhibition of a target enzyme of the plurality of target enzymes with expression of a GOI, wherein the two-hybrid system comprises a nucleic acid sequence encoding a target enzyme; (d) measuring expression of the reporter gene in the plurality of cells; (e) using multiplexed sequencing analysis to produce sequencing data; and (f) demultiplexing the sequencing data by assigning the one or more molecular barcodes of each of the plurality of cells to the respective target enzyme produced by that cell. In some embodiments, the nucleic acid sequence encoding each of the exogenous synthases is also barcoded to enable assignment of the synthase (or mutant thereof) and the resulting bioactive molecule produced by the cell. In some embodiments, methods further comprise detecting an increased expression of the GOI in a subset of the plurality of cells, as compared to a reference expression level, which is thereby indicative that the subset of the plurality of cells produced an inhibitor of the target enzyme.
In some embodiments, the nucleic acid sequence encoding each of the target enzymes comprises a unique molecular barcode enabling the identification of the target enzyme with the bioactive molecule that is identified. In some embodiments, multiplexed sequencing analysis is performed on the plurality of cells prior to (d) to measure baseline expression of a terpene synthase in each of the plurality of cells. In some embodiments, the multiplex sequencing in (e) is performed on a subset of the plurality of cells with an increase or a decrease in the expression for the reporter gene (e.g., indicating presence of a bioactive molecule modulating the activity of the target enzyme) to measure the expression of a terpene synthase. In some embodiments, enrichment of the terpene synthase is determined by comparing the expression of the terpene synthase to identify the subset of the plurality of cells that produced a bioactive molecule modulating the activity of the target enzyme.
In some embodiments, the plurality of cells are pooled prior to measuring in (d), thereby reducing the time to perform the analysis. In this manner, from 10^2 to 10^10 colony-forming cells may be analyzed in parallel, thereby drastically reducing the time of analysis for large screens. In some embodiments, multiple bioactive molecules may be identified in a single implementation of the method. In some embodiments, from 10^2 to 10^10, 10^3 to 10^9, 10^4 to 10^8, or 10^5 to 10^7 colony-forming cells may be analyzed in parallel. In some embodiments, more than or equal to about 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, or 10^10 colony-forming cells may be analyzed in parallel. In some embodiments, fewer than or equal to about 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, or 10^10 colony-forming cells may be analyzed in parallel.
In some embodiments, the multiplex sequencing comprises sequencing-by-synthesis, sequencing by transient binding, single-molecule real-time sequencing, ion semiconductor sequencing (Ion Torrent®), pyrosequencing, combinatorial probe anchor synthesis (cPAS), sequencing-by-ligation, nanopore sequencing, or semiconductor-based electronic sequencing (GenapSys™). In some embodiments, assessing the enrichment of target enzymes within the subset of cells, as compared to the plurality of cells, is performed by a computer processor programmed to demultiplex the genetic information that was sequenced. In some embodiments, the primary or secondary sequencing data analysis is performed by a computer system disclosed herein. In some embodiments, the secondary sequence data analysis comprises demultiplexing the molecular barcodes sufficient to identify the metabolic pathway, the synthase, or both that produced the bioactive molecule with therapeutic potential, or the target enzyme that the bioactive molecule inhibits, or a combination thereof.
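By way of non-limiting illustration, the demultiplexing of molecular barcodes described above may be sketched as follows. This is a minimal sketch only: the barcode sequences, the barcode length, the assumption that the barcode occupies the first bases of each read, and the names `BARCODE_TO_TARGET` and `demultiplex` are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

# Hypothetical barcode-to-target lookup; the sequences are invented for illustration.
BARCODE_TO_TARGET = {
    "ACGTAC": "target_enzyme_A",
    "TGCATG": "target_enzyme_B",
}
BARCODE_LENGTH = 6  # assumed: fixed-length barcode at the start of each read


def demultiplex(reads, barcode_to_target=BARCODE_TO_TARGET,
                barcode_length=BARCODE_LENGTH):
    """Group reads by the target enzyme that their leading barcode maps to.

    Reads whose barcode is absent from the lookup are kept under "unassigned".
    The barcode itself is trimmed from each stored read.
    """
    assigned = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        target = barcode_to_target.get(barcode, "unassigned")
        assigned[target].append(read[barcode_length:])
    return dict(assigned)


reads = ["ACGTACGGGTTT", "TGCATGCCCAAA", "ACGTACAAACCC", "NNNNNNGGGGGG"]
groups = demultiplex(reads)
counts = {target: len(inserts) for target, inserts in groups.items()}
```

In this sketch an exact-match lookup is used for simplicity; a practical implementation would likely tolerate sequencing errors in the barcode, which is not shown here.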
In some embodiments, methods further comprise introducing into each cell of the plurality of cells a nucleic acid sequence encoding a unique terpene synthase. In some embodiments, the nucleic acid sequence comprises a barcode sufficient to identify the terpene synthase. In some embodiments, the method may further comprise: (a) identifying the second barcode in cells within each of (i) the plurality of cells and (ii) a subset of the plurality of cells with an increased expression level of the reporter gene, and (b) assessing the enrichment of terpene synthases within the subset of cells, as compared to the plurality of cells, thereby identifying which of the unique exogenous terpene synthases in each cell produces the inhibitor of the target enzyme in that cell. In some embodiments, the one or more nucleic acid sequences encoding the metabolic pathway encode an enzyme that catalyzes the condensation of isopentenyl diphosphate (IPP), dimethylallyl diphosphate (DMAPP), or a combination of IPP and DMAPP. In some embodiments, the enzyme comprises geranylgeranyl diphosphate synthase (GGPPS). In some embodiments, the one or more nucleic acid sequences further encode a cytochrome P450 enzyme, a cytochrome P450 reductase enzyme, a cytochrome b5 enzyme, an oxidase enzyme, an acyl transferase enzyme, a glycosyltransferase enzyme, a halogenase, or a peroxidase, or a combination thereof. In some embodiments, the one or more nucleic acid sequences encode a metabolic pathway for IPP, DMAPP and/or molecules resulting from the condensation of IPP and/or DMAPP. In some embodiments, the GOI encodes a polymerizing enzyme that, when expressed, binds to a promoter operably linked to a gene encoding a detectable polypeptide to drive expression of the detectable polypeptide, wherein the detectable polypeptide optionally may comprise a fluorescent polypeptide.
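By way of non-limiting illustration, the assessment of terpene synthase enrichment within the subset of cells, as compared to the plurality of cells, may be sketched as a comparison of barcode frequencies. The log2 ratio, the pseudocount, the example counts, and the name `barcode_enrichment` are assumptions made for this sketch; the disclosure does not specify a particular enrichment statistic.

```python
import math


def barcode_enrichment(pool_counts, selected_counts, pseudocount=1.0):
    """Return the log2 ratio of each barcode's frequency in the selected
    subset to its frequency in the starting pool.

    A pseudocount (an assumption of this sketch) avoids division by zero
    for barcodes that are absent from the selected subset.
    """
    n_barcodes = len(pool_counts)
    pool_total = sum(pool_counts.values()) + pseudocount * n_barcodes
    selected_total = sum(selected_counts.values()) + pseudocount * n_barcodes
    enrichment = {}
    for barcode, pool_n in pool_counts.items():
        selected_n = selected_counts.get(barcode, 0)
        pool_freq = (pool_n + pseudocount) / pool_total
        selected_freq = (selected_n + pseudocount) / selected_total
        enrichment[barcode] = math.log2(selected_freq / pool_freq)
    return enrichment


# Hypothetical read counts per synthase barcode (invented for illustration).
pool = {"synthase_1": 100, "synthase_2": 100, "synthase_3": 100}
selected = {"synthase_1": 90, "synthase_2": 5, "synthase_3": 5}

scores = barcode_enrichment(pool, selected)
most_enriched = max(scores, key=scores.get)
```

A positive score indicates a barcode that is over-represented in the subset with increased reporter expression; in this sketch such a barcode would flag the corresponding synthase as a candidate producer of an inhibitor.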
In some embodiments, the expression of the detectable polypeptide may be greater than an expression of the detectable polypeptide when its gene may be included as the reporter gene.
Also provided are methods of adding one or more molecular barcodes to one or more nucleic acids disclosed herein. In some embodiments, the one or more barcodes may be added by polymerase chain reaction (PCR), ligation, or transposition. Suitable techniques for attaching molecular barcodes to one or more nucleic acids disclosed herein are provided in Head, Steven R., et al. “Library construction for next-generation sequencing: overviews and challenges.” Biotechniques 56.2 (2014): 61-77; in Hu, Taishan, et al. “Next-generation sequencing technologies: An overview.” Human Immunology 82.11 (2021): 801-811; and in Gkazi, Athina. “An Overview of Next-Generation Sequencing.” (2021), each of which is hereby incorporated by reference in its entirety.
In some embodiments, the method may further comprise culturing the plurality of cells in a growth cell medium, provided in Section I (A) (1) herein. In some embodiments, the growth cell medium comprises (i) glycerol at a concentration between 1 and 2%, (ii) mevalonate at a concentration between 0 and 20 mM, or (iii) a combination of (i) and (ii).
Also provided are methods of determining a presence of a bioactive molecule that inhibits activity of a target enzyme, the method comprising: (a) introducing into a cell an exogenous genetically-encoded system that links expression of a gene of interest to biosynthesis of the bioactive molecule by the cell, wherein the exogenous genetically-encoded system encodes the target enzyme, a synthase of the bioactive molecule, a ligand and a receptor specific to the ligand under conditions sufficient for binding of the ligand to the receptor to form a ligand-receptor pair, wherein the ligand-receptor pair comprises a cleavage site recognized by the proteolytic enzyme, and induces expression (e.g., activates transcription) of the gene of interest that is increased relative to a reference expression level when the bioactive molecule that inhibits the activity of the target enzyme is present in the cell, wherein the binding does not induce the expression of the gene of interest that is increased relative to the reference expression level when the bioactive molecule that inhibits the activity of the target enzyme is not present in the cell, wherein the target enzyme comprises a proteolytic enzyme; (b) measuring the expression of the gene of interest; and (c) determining the presence of the bioactive molecule in the cell if the expression of the gene of interest is increased relative to the reference expression level. In some embodiments, the reference expression level is obtained from an otherwise identical reference cell that does not comprise a functional version of the synthase, the ligand or the receptor. In some embodiments, inhibition of the target enzyme by the bioactive molecule disrupts binding between the receptor and the ligand, thereby preventing the ligand-receptor pair from inducing expression of the gene of interest. In some embodiments, the binding of the ligand to the receptor is phosphorylation dependent. 
In some embodiments, the synthase comprises a terpene synthase or a nonribosomal peptide synthetase. In some embodiments, the terpene synthase comprises γ-humulene synthase (GHS), amorphadiene synthase (ADS), α-bisabolene synthase (ABS), or taxadiene synthase (TXS). In some embodiments, the terpene synthase comprises an amino acid sequence that is greater than or equal to about 90% identical to any one of SEQ ID NO: 4, 7, 13, or 17. In some embodiments, the ligand comprises a kinase substrate that binds to the receptor in a phosphorylated state, and the receptor comprises a phosphorylated protein binding domain that binds to the kinase substrate when the kinase substrate is phosphorylated. In some embodiments, the ligand is coupled to an omega subunit of RNA polymerase or portions thereof and the receptor is coupled to a DNA binding domain, or the ligand is coupled to the DNA binding domain and the receptor is coupled to the omega subunit or portions thereof of the RNA polymerase. In some embodiments, the gene of interest encodes a reporter polypeptide comprising a luciferase enzyme, a fluorescent polypeptide, secreted alkaline phosphatase, β-galactosidase, levansucrase, chloramphenicol acetyltransferase (CAT), or antibiotic resistance. In some embodiments, the gene of interest encodes a polymerizing enzyme that, when expressed, binds to a promoter operably linked to a gene encoding the reporter polypeptide to drive expression of the reporter polypeptide. In some embodiments, the genetically-encoded system further encodes a metabolic pathway for biosynthesis of the bioactive molecule. In some embodiments, the metabolic pathway is an isoprenoid pathway. In some embodiments, the isoprenoid pathway comprises a mevalonate pathway, a methylerythritol 4-phosphate (MEP) pathway, or a deoxyxylulose 5-phosphate (DXP) pathway.
In some embodiments, the signal produced from the reporter polypeptide is increased by 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, or 100-fold when it is expressed indirectly (e.g., when the GOI encodes an RNA polymerase that induces expression of the reporter polypeptide) relative to when it is expressed directly from the GOI. In some embodiments, the signal produced from the fluorescent biomolecule is increased by 2-fold to 100-fold, 3-fold to 90-fold, 4-fold to 80-fold, 5-fold to 70-fold, 6-fold to 60-fold, 7-fold to 50-fold, 8-fold to 40-fold, 9-fold to 30-fold, or 10-fold to 20-fold when it is expressed indirectly rather than directly from the GOI.
Provided herein, in some aspects are kits comprising the one or more system components disclosed herein. In some embodiments, the kits comprise one or more components of the genetically-encoded system described herein. In some embodiments, the kits comprise the one or more nucleic acid molecules encoding the two-hybrid system (e.g., B2H system). In some embodiments, the kits comprise the one or more nucleic acid molecules encoding the metabolic pathway. In some embodiments, the kits comprise the one or more nucleic acid molecules encoding the terpene synthase. In some embodiments, the kits further comprise a cell, or a plurality of cells. In some embodiments, the kits further comprise cell media, such as growth media. In some embodiments, the kits further comprise additional constituents of the cell media, such as mevalonate, antibiotics, and so forth. The exact nature of the components configured in the inventive kit depends on its intended purpose. For example, some kits are configured for the purpose of producing a genetically encoded microorganism, isolating a bioactive molecule from a plurality of genetically encoded microorganisms, or investigating therapeutic potential of the one or more bioactive molecules.
Instructions for use may be included in the kit. “Instructions for use” typically include a tangible expression describing the technique to be employed in using the components of the kit to effect a desired outcome, e.g., producing a genetically encoded microorganism, isolating a bioactive molecule from a plurality of genetically encoded microorganisms, or investigating therapeutic potential of the one or more bioactive molecules. Optionally, the kit also contains other useful components, such as, diluents, buffers, pharmaceutically acceptable carriers, syringes, catheters, applicators, pipetting or measuring tools, bandaging materials or other useful paraphernalia as will be readily recognized by those of skill in the art.
The materials or components assembled in the kit can be provided to the user stored in any convenient and suitable ways that preserve their operability and utility. For example, the components can be in dissolved, dehydrated, or lyophilized form; they can be provided at room, refrigerated or frozen temperatures. The components are typically contained in suitable packaging material(s). As employed herein, the phrase “packaging material” refers to one or more physical structures used to house the contents of the kit, such as inventive compositions and the like. The packaging material is constructed by well-known methods, preferably to provide a sterile, contaminant-free environment. The packaging materials employed in the kit are those customarily utilized in gene expression assays and in the administration of treatments. As used herein, the term “package” refers to a suitable solid matrix or material such as glass, plastic, paper, foil, and the like, capable of holding the individual kit components. Thus, for example, a package can be a plastic vial or tube used to contain suitable quantities of the genetically-encoded system, and/or cells. The packaging material generally has an external label which indicates the contents and/or purpose of the kit and/or its components.
Unless defined otherwise, all terms of art, notations and other technical and scientific terms or terminology used herein are intended to have the same meaning as is commonly understood by one of ordinary skill in the art to which the claimed subject matter pertains. In some cases, terms with commonly understood meanings are defined herein for clarity and/or for ready reference, and the inclusion of such definitions herein should not necessarily be construed to represent a substantial difference over what is generally understood in the art.
Throughout this application, various embodiments may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
As used in the specification and claims, the singular forms “a”, “an” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a sample” includes a plurality of samples, including mixtures thereof.
The terms “determining,” “measuring,” “evaluating,” “assessing,” “assaying,” and “analyzing” are often used interchangeably herein to refer to forms of measurement. The terms include determining if an element is present or not (for example, detection). These terms can include quantitative, qualitative or quantitative and qualitative determinations. Assessing can be relative or absolute. “Detecting the presence of” can include determining the amount of something present in addition to determining whether it is present or absent depending on the context.
The terms “subject,” “individual,” or “patient” are often used interchangeably herein. A “subject” can be a biological entity containing expressed genetic materials. The biological entity can be a plant, animal, or microorganism, including, for example, bacteria, viruses, fungi, and protozoa. The subject can be tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro. The subject can be a mammal. The mammal can be a human. The subject may be diagnosed or suspected of being at high risk for a disease. In some cases, the subject is not necessarily diagnosed or suspected of being at high risk for the disease.
The term “in vivo” is used to describe an event that takes place in a subject's body.
The term “ex vivo” is used to describe an event that takes place outside of a subject's body. An ex vivo assay is not performed on a subject. Rather, it is performed upon a sample separate from a subject. An example of an ex vivo assay performed on a sample is an “in vitro” assay.
The term “in vitro” is used to describe an event that takes place in a container for holding laboratory reagent such that it is separated from the biological source from which the material is obtained. In vitro assays can encompass cell-based assays in which living or dead cells are employed. In vitro assays can also encompass a cell-free assay in which no intact cells are employed.
The term “about,” as used herein with reference to a number, refers to that number plus or minus 10% of that number. The term “about” a range refers to that range minus 10% of its lowest value and plus 10% of its greatest value.
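By way of non-limiting illustration, the definition of “about” given above may be expressed as a short calculation. The function names `about_number` and `about_range` are hypothetical, and the sketch assumes positive values.

```python
def about_number(value):
    """Return the (low, high) bounds for "about" a number:
    the number minus and plus 10% of that number."""
    return (value - 0.10 * value, value + 0.10 * value)


def about_range(lowest, greatest):
    """Return the (low, high) bounds for "about" a range:
    the lowest value minus 10% of itself, and the greatest
    value plus 10% of itself."""
    return (lowest - 0.10 * lowest, greatest + 0.10 * greatest)


# "About 90%" (as used for sequence identity) and "about 1 to 2%".
identity_bounds = about_number(90)
glycerol_bounds = about_range(1, 2)
```

Under this definition, “about 90%” spans approximately 81% to 99%, and “about 1 to 2%” spans approximately 0.9% to 2.2%.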
The terms, “polynucleotide,” or “nucleic acid,” are used interchangeably herein to refer to polymers of nucleotides of any length, and include DNA and RNA. The nucleotides can be deoxyribonucleotides, ribonucleotides, modified nucleotides or bases, and/or their analogs, or any substrate that can be incorporated into a polymer by DNA or RNA polymerase. A polynucleotide may comprise modified nucleotides, such as, but not limited to methylated nucleotides and their analogs or non-nucleotide components. Modifications to the nucleotide structure may be imparted before or after assembly of the polymer. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.
The term “cell,” as used herein, generally refers to a biological cell.
The term “gene,” as used herein, refers to a segment of nucleic acid that encodes an individual protein or RNA (also referred to as a “coding sequence” or “coding region”), optionally together with associated regulatory region such as promoter, operator, terminator and the like, which may be located upstream or downstream of the coding sequence. A “genetic locus” referred to herein, is a particular location within a gene.
The terms “increased” or “increase” are used herein to generally mean an increase by a statistically significant amount.
The terms “decreased” or “decrease” are used herein generally to mean a decrease by a statistically significant amount.
The terms “polypeptide,” “peptide” and “protein” may be used interchangeably herein in reference to a polymer of amino acid residues. A protein may refer to a full-length polypeptide as translated from a coding open reading frame, or as processed to its mature form, while a polypeptide or peptide may refer to a degradation fragment or a processing fragment of a protein that nonetheless uniquely or identifiably maps to a particular protein. A polypeptide may be a single linear polymer chain of amino acids bonded together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. Polypeptides may be modified, for example, by the addition of carbohydrate, phosphorylation, etc.
The terms “homologous,” “homology,” or “percent homology” when used herein to describe an amino acid sequence or a nucleic acid sequence, relative to a reference sequence, can be determined using the formula described by Karlin and Altschul (Proc. Natl. Acad. Sci. USA 87:2264-2268, 1990, modified as in Proc. Natl. Acad. Sci. USA 90:5873-5877, 1993). Such a formula is incorporated into the basic local alignment search tool (BLAST) programs of Altschul et al. (J Mol Biol. 1990 Oct. 5; 215 (3): 403-10; Nucleic Acids Res. 1997 Sep. 1; 25 (17): 3389-402). Percent homology of sequences can be determined using the most recent version of BLAST, as of the filing date of this application. Percent identity of sequences can be determined using the most recent version of BLAST, as of the filing date of this application.
The term “percent (%) identity”, or “percent sequence identity,” with respect to a reference polypeptide sequence is the percentage of amino acid residues in a candidate sequence that are identical with the amino acid residues in the reference polypeptide sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. As used herein, the term “percent (%) identity”, or “percent sequence identity,” with respect to a reference nucleic acid sequence is the percentage of nucleotides in a candidate sequence that are identical with the nucleotides in the reference nucleic acid sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. Alignment for purposes of determining percent sequence identity can be achieved in various ways that are known for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN or Megalign (DNASTAR) software. Appropriate parameters for aligning sequences are able to be determined, including algorithms needed to achieve maximal alignment over the full length of the sequences being compared. For purposes herein, however, % amino acid sequence identity values are generated using the sequence comparison computer program ALIGN-2. The ALIGN-2 sequence comparison computer program was authored by Genentech, Inc., and the source code has been filed with user documentation in the U.S. Copyright Office, Washington D.C., 20559, where it is registered under U.S. Copyright Registration No. TXU510087. The ALIGN-2 program is publicly available from Genentech, Inc., South San Francisco, Calif., or may be compiled from the source code. The ALIGN-2 program should be compiled for use on a UNIX operating system, including digital UNIX V4.0D. All sequence comparison parameters are set by the ALIGN-2 program and do not vary.
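By way of non-limiting illustration, the percent identity calculation defined above may be sketched for two sequences that have already been aligned. This sketch is not ALIGN-2 or BLAST and does not perform the alignment itself; the gap character, the example sequences, and the name `percent_identity` are assumptions. Following the wording of the definition, the denominator is taken to be the number of residues in the candidate sequence.

```python
def percent_identity(aligned_candidate, aligned_reference, gap="-"):
    """Percent of candidate residues identical to the reference residue
    at the same aligned position.

    Both inputs must already be aligned (equal length, gaps marked with
    `gap`). Conservative substitutions are not counted as identical.
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    candidate_residues = sum(1 for c in aligned_candidate if c != gap)
    if candidate_residues == 0:
        raise ValueError("candidate sequence contains no residues")
    matches = sum(
        1
        for c, r in zip(aligned_candidate, aligned_reference)
        if c != gap and c == r
    )
    return 100.0 * matches / candidate_residues


# Invented toy alignment: 7 candidate residues, 6 of which match.
identity = percent_identity("MKT-LLVA", "MKTALLIA")
meets_90_percent = identity >= 90.0
```

In this toy alignment, six of the seven candidate residues match the reference, so the candidate would not satisfy a 90% identity threshold.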
The term “two-hybrid system”, as used herein, refers to a genetic system for identifying protein-protein interactions (PPIs) and protein-DNA interactions. In some embodiments, the two-hybrid system detects interactions between the target enzyme and the target enzyme substrate by measuring activity of the target enzyme on the substrate evidenced by a readout of the genetic system, such as fluorescence or cell survival.
The terms “gene of interest” or “GOI,” as used interchangeably herein, refer to a gene encoding a gene expression product that is detectable directly or indirectly.
Amino acids disclosed herein may be represented by a one letter or three letter code under the International Union of Pure and Applied Chemistry (IUPAC) naming convention, as set forth in Table 23A.
A nucleotide disclosed herein may be represented by a one letter code or symbol under the IUPAC naming convention, as set forth in Table 23B below.
The term “bioactive molecule” as used herein refers to a molecule having a biologic effect on a living organism, tissue or cell.
The term, “metabolic pathway” as used herein refers to one or more chemical reactions carried out by constituents (e.g., reactants, products, intermediates) of the metabolic pathway within a cell. The phrase, “encoding a metabolic pathway” with reference to a nucleic acid molecule refers to a nucleic acid molecule encoding one or more of the constituents of the metabolic pathway. For instance, the metabolic pathway may be an isoprenoid pathway involved in the synthesis of isoprenoids. Non-limiting examples of isoprenoid pathways are the mevalonate pathway and the non-mevalonate pathway (e.g., methylerythritol 4-phosphate (MEP) or deoxyxylulose 5-phosphate (DXP) pathways). The isoprenoid pathway may produce isopentenyl diphosphate (IPP) or dimethylallyl diphosphate (DMAPP), which are precursors of isoprenoid biosynthesis. The metabolic pathway may be naturally occurring. The metabolic pathway may be synthetic. In either case, the metabolic pathway may be exogenous to the cell. A non-limiting example of a synthetic metabolic pathway is the isopentenol utilization pathway (IUP) described in AO Chatzivasileiou et al., “Two-step pathway for isoprenoid synthesis.” Applied Biological Sciences. 116 (2) 506-511 (Dec. 24, 2018), which is hereby incorporated by reference.
The term, “synthase” or “synthetase,” as used interchangeably herein, refers to an enzyme that is capable of catalyzing synthesis of a molecule. In some embodiments, the molecule is a bioactive molecule disclosed herein.
The term, “ligand” as used herein refers to a molecule that binds to another molecule, such as a receptor. In some cases, the ligand binds to a receptor disclosed herein to serve a biological purpose, such as, for example, to activate transcription of a gene of interest (GOI). In some embodiments, the ligand is a phosphorylated amino acid.
The term, “receptor” as used herein refers to a protein that binds to a molecule, such as a ligand disclosed herein.
The term, “transcription” as used herein refers to the process by which the information in a strand of DNA is copied into a new molecule of messenger RNA (mRNA).
The term “polymerizing enzyme” as used herein refers to an enzyme or catalytically active portion thereof capable of polymerizing the synthesis of a polymer, such as a nucleic acid molecule. In some cases, the polymerizing enzyme is a “DNA polymerase,” which refers to an enzyme or catalytically active portion thereof capable of polymerizing the synthesis of DNA. In some cases, the polymerizing enzyme is an “RNA polymerase,” which refers to an enzyme or catalytically active portion thereof capable of polymerizing the synthesis of RNA.
The term “terpene” as used herein refers to an organic compound that is a simple hydrocarbon. In some cases, the terpene has 1, 2, 3, 4, 5, 6, 7, 8, or more isoprene units. Non-limiting examples of terpenes include isoprene, monoterpenes, sesquiterpenes, diterpenes, sesterterpenes, triterpenes, tetraterpenes and polyterpenes.
The term “terpenoid” or “isoprenoid” as used interchangeably herein refers to a terpene that has been modified to contain one or more functional groups, oxidized methyl groups, or a combination thereof. In some cases, the terpene has 1, 2, 3, 4, 5, 6, 7, 8, or more isoprene units. Non-limiting examples of terpenoids include hemiterpenoids, monoterpenoids, sesquiterpenoids, diterpenoids, sesterterpenoids, triterpenoids, tetraterpenoids, and polyterpenoids.
The term “isoprene,” as referred to herein, refers to 2-methyl-1,3-butadiene (e.g., CH2=C(CH3)-CH=CH2).
The term “isoprenoid” as used herein refers to an organic molecule containing two or more isoprene units.
The term “multiplex sequencing,” as used herein, refers to sequencing genetic information from two or more samples in a single sequencing run. In some cases, the two or more samples each comprise cells harboring a distinct two-hybrid system, a metabolic pathway, or both. In some embodiments, the multiplex sequencing includes pooling two or more samples prior to sequencing.
The term, “proteolytic enzyme” as used herein is an enzyme or catalytically active portion thereof capable of proteolysis. In some embodiments, the proteolytic enzyme is a protease, a peptidase, or proteinase. In some embodiments, the proteolytic enzyme is an exopeptidase or an endopeptidase.
The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.
The following examples are included for illustrative purposes only and are not intended to limit the scope of the inventive concepts.
The molecules produced in the natural world have been shaped, over millennia, by enzymes evolving under selective pressure. Secondary metabolites—compounds that are not essential to the survival of their producers—are a remarkable outcome of this process; this class of chemicals encompasses an enormous diversity of molecular structures capable of carrying out complex biological functions. Many of the enzymes comprising secondary metabolic pathways are highly evolvable: mutations can alter their substrate specificity and reactivity to dramatically change the structures of the products they produce. This plasticity has been exploited to produce various compounds by engineering terpene synthases, the class of enzymes responsible for producing terpenoids (a vast natural product family including many secondary metabolites); however, systems that pair product diversification with a selective pressure to observe evolutionary trajectories of a terpene synthase are lacking. Here, a genetically encoded bacterial two-hybrid system conferring antibiotic resistance in response to the inactivation of heterologously expressed protein tyrosine phosphatase 1B (an important drug target) can be used to evolve a terpene synthase in E. coli. Starting with γ-humulene synthase, a low-producing terpene synthase generating many products, the work described herein shows that 1-2 mutations are sufficient to enhance resistance to an antibiotic in the system—an indication of drug target inhibition. The best mutants are better tolerated by E. coli and increase total terpene production, and one of these variants exhibits a product profile shifted towards two potential terpenoid inhibitors with titers increased 12- and 50-fold compared to the starting enzyme. The results demonstrate the feasibility of using genetically encoded selection pressures to evolve biosynthetic enzymes towards the production of biologically active molecules.
Terpenoids are the largest and most structurally diverse group of natural products and include a striking variety of biologically active compounds, from flavors to medicines. Terpenoids play an outsized role in the evolution and adaptation of living systems. These secondary metabolites carry out a broad set of physiological functions in their native hosts (e.g., signaling, protein localization, and protection from abiotic stress) and mediate essential interactions between unlike organisms (e.g., plants and pollinators, microbial pathogens, and symbionts). For millennia, their sophisticated biological activities have found use in flavors, fragrances, and medicines. Despite this well-documented biochemical versatility, the evolutionary processes that generate new functional terpenoids are poorly understood and difficult to recapitulate in engineered systems. This study uses a synthetic biochemical objective, namely a transcriptional system that links the inhibition of protein tyrosine phosphatase 1B (PTP1B), a human drug target, to the expression of a gene for antibiotic resistance in E. coli, to evolve γ-humulene synthase (GHS) to build terpenoid inhibitors. Site-saturation mutagenesis of poorly conserved residues yielded mutants that improved fitness (e.g., the antibiotic resistance of E. coli) by reducing GHS toxicity and/or by increasing inhibitor production. Intriguingly, a combination of two mutations enhanced the titer of a minority product (a terpene alcohol that inhibits PTP1B) by over fifty-fold, and a comparison of similar mutants enabled the identification of a site where mutations permit efficient hydroxylation. These findings illustrate how the plasticity of terpene synthases enables an efficient sampling of structurally distinct starting points for building new functional molecules and provide an experimental framework for exploiting this plasticity in activity-guided screens.
All natural terpenoids have a common biosynthetic origin. Their assembly begins with two C5 isoprenoid precursors—isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP)—which are synthesized from either (i) acetyl-CoA through the mevalonate pathway (MVA) or (ii) pyruvate and glyceraldehyde 3-phosphate through the non-mevalonate pathway (MEP or DXP). Condensation of IPP and DMAPP generates longer isoprenoids, such as geranyl diphosphate (GPP, C10), farnesyl diphosphate (FPP, C15), or geranylgeranyl diphosphate (GGPP, C20), which are substrates for terpene synthases, P450 monooxygenases, and acyltransferases. Metabolic engineers have resolved the biosynthetic pathways of many important terpenoids (e.g., artemisinin, paclitaxel, and momilactone B); the evolutionary transformations that allow them to build new functional molecules, however, are difficult to probe without a framework for carrying out biosynthesis under selective pressures.
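By way of non-limiting illustration, the chain lengths noted above follow from simple arithmetic: each prenyl diphosphate is built from one DMAPP unit (C5) extended by successive IPP units (C5 each). The function name `prenyl_diphosphate_carbons` is hypothetical and is used only for this sketch.

```python
ISOPRENE_UNIT_CARBONS = 5  # IPP and DMAPP are both C5 precursors


def prenyl_diphosphate_carbons(ipp_additions):
    """Carbon count of a prenyl diphosphate formed from one DMAPP
    starter unit plus `ipp_additions` condensations with IPP."""
    if ipp_additions < 0:
        raise ValueError("ipp_additions must be non-negative")
    return ISOPRENE_UNIT_CARBONS * (1 + ipp_additions)


# Number of IPP condensations for each substrate named in the text.
IPP_ADDITIONS = {"GPP": 1, "FPP": 2, "GGPP": 3}
chain_lengths = {
    name: prenyl_diphosphate_carbons(n) for name, n in IPP_ADDITIONS.items()
}
```

The resulting carbon counts (C10, C15, and C20) correspond to the substrates for monoterpene, sesquiterpene, and diterpene synthases, respectively.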
Terpene synthases are centrally important to terpenoid diversity. These enzymes convert a few linear substrates into hundreds of complex scaffolds (e.g., hydrocarbons with multiple fused rings and stereocenters), which form the core of more than 95,000 known natural products. These enzymes are intriguing because they share a small set of domain architectures (a, αβ, βγ, or αβγ) and catalytic motifs (e.g., DDXDD and NSE for class I cyclases, and DXDD for class II cyclases), given their diverse product profiles. In general, terpene synthases can act on a small set of linear substrates by initiating a carbocation cyclization cascade and controlling it by constraining the conformational space and termination steps accessible to intermediates; as a result, mutations that affect the volume, contour, and solvation structure of the active site tend to alter product profiles. As a case study, γ-humulene synthase (GHS) from Abies grandis converts FPP into over 50 sesquiterpenes (
In this study, E. coli was modified with a synthetic biochemical objective (the inhibition of protein tyrosine phosphatase 1B (PTP1B) from Homo sapiens) and was used to evolve mutants of GHS that achieve this objective. PTP1B is an influential regulatory enzyme, an important model system for biophysical studies, and an elusive drug target; new inhibitors could find broad use. GHS has a diverse, mutation-sensitive product profile and does not generate potent inhibitors of PTP1B in its wild-type form; it is a promising starting point for directed evolution. Using the artificial objective as a guide, mutants of GHS were uncovered in which 1-2 amino acid substitutions confer a survival advantage by reducing GHS toxicity and/or by generating PTP1B inhibitors; these two effects summarize the mechanisms by which mutants enhance antibiotic resistance. The findings illustrate how terpene synthases can evolve quickly under artificial selection pressures to build biologically active molecules. The best-performing mutants exhibited altered product profiles with a shared major product that inhibits PTP1B. These mutants illustrate how terpene synthases (TSs) can evolve with heterologous hosts to build molecules that solve new challenges.
To begin, a selective pressure was engineered to guide terpenoid biosynthesis in E. coli. A bacterial two-hybrid (B2H) system that links the inhibition of PTP1B to the expression of a gene for antibiotic resistance was specifically chosen. (
Early studies of enzyme evolution used GHS as a model system. In one seminal study, researchers mutated 19 residues that line the active site of GHS and used the product profiles of single mutants to design variants with very narrow, and very different, product profiles. In a follow-up study, the researchers showed that the rational redistribution of glycine and proline residues (i.e., rational mutations informed by residue conservation in a multiple sequence alignment) could improve terpenoid production in E. coli. Both studies suggested that the effects of mutations were additive (e.g., substitutions that enhanced terpenoid production by GHS also did so for mutants with different product profiles). This work provided a powerful framework for the rational redesign of terpene synthases, but it did not attempt to evolve them under selective pressures. The present disclosure instead evolves these enzymes to produce molecules that address a genetically encoded challenge in a heterologous host. The B2H system provides an opportunity for such studies (
To search for evolutionarily accessible changes in the activity of GHS that might improve its ability to generate inhibitors of PTP1B, site-saturation mutagenesis was carried out at sites likely to influence the volume and/or hydration structure of the active site. The amino acids that line the active sites of terpene synthases are not always amenable to mutagenesis or likely to shift product profiles. At notable extremes, mutations at catalytic residues (e.g., the DXDD motif) can inactivate the enzymes, while mutations at other sites can disrupt folding. Mutable yet influential sites were sought by targeting poorly conserved residues that are likely to affect the volume or hydration structure of the active site. These features help dictate the conformational space, entropic constraints, and termination steps available to reacting intermediates. The following procedure was used to identify the residues for the site-saturation mutagenesis: (i) X-ray crystal structures of abietadiene synthase (ABS) from Abies grandis and taxadiene synthase (TXS) from Taxus brevifolia were aligned (GHS does not have a crystal structure). (ii) All residues within 8 Å of the substrate analog (2-fluoro-geranylgeranyl diphosphate) in the class I active site of TXS were selected, and a subset of sites that differ between ABS and TXS was identified. (iii) The sequences of ABS, TXS, GHS, EIS, and δ-selinene synthase (DSS) from Abies grandis were aligned (
In this equation, σ_V² is the variance in volume, σ_HW² is the variance in Hopp-Woods index, and n_V and n_HW are normalization factors (i.e., the highest variances measured in this study). (v) Each site was ranked according to S, and the six highest-scoring sites were selected (
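The ranking step can be sketched in code. The equation for S is not reproduced in this text, so the sketch below assumes a simple normalized sum, S = σ_V²/n_V + σ_HW²/n_HW, which is consistent with the symbol definitions above but is not confirmed by them. The use of population variance, the residue volume and Hopp-Woods tables (standard published scales, worth checking against the original references), the function names, and the example alignment columns are all illustrative assumptions.

```python
from statistics import pvariance

# Residue volumes in cubic angstroms (Zamyatnin scale; values assumed, verify before use).
VOLUME = {
    "A": 88.6, "R": 173.4, "N": 114.1, "D": 111.1, "C": 108.5,
    "Q": 143.8, "E": 138.4, "G": 60.1, "H": 153.2, "I": 166.7,
    "L": 166.7, "K": 168.6, "M": 162.9, "F": 189.9, "P": 112.7,
    "S": 89.0, "T": 116.1, "W": 227.8, "Y": 193.6, "V": 140.0,
}

# Hopp-Woods hydrophilicity index (values assumed, verify before use).
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def site_variances(column):
    """Return (variance in volume, variance in Hopp-Woods index) for one alignment column."""
    volumes = [VOLUME[aa] for aa in column]
    indices = [HOPP_WOODS[aa] for aa in column]
    return pvariance(volumes), pvariance(indices)


def score_sites(columns):
    """Score each site with the ASSUMED form S = var_V / n_V + var_HW / n_HW.

    n_V and n_HW are the largest variances observed across all sites.
    """
    variances = {site: site_variances(col) for site, col in columns.items()}
    n_v = max(v for v, _ in variances.values()) or 1.0    # guard against all-zero variance
    n_hw = max(h for _, h in variances.values()) or 1.0
    return {site: v / n_v + h / n_hw for site, (v, h) in variances.items()}


def top_sites(columns, k=6):
    """Return the k highest-scoring sites, best first."""
    scores = score_sites(columns)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical alignment columns: one residue per aligned synthase at each site.
example_columns = {
    "site_1": "AAAAA",   # fully conserved, so both variances are zero
    "site_2": "AGSTA",   # small, similar residues
    "site_3": "WDGKF",   # large spread in both volume and hydrophilicity
}
example_scores = score_sites(example_columns)
example_ranking = top_sites(example_columns, k=2)
```

Under this assumed form, a fully conserved site scores zero and a site that carries the largest variance in both properties scores 2, so the ranking favors positions where the aligned synthases differ most in side-chain size and polarity.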
The B2H system was used to search for single-site mutants that confer a survival advantage. Briefly, the mutant library was transformed into cells harboring both the B2H system (pB2H) and the mevalonate-dependent pathway for FPP and IPP (pMBIS), colonies that grew at high concentrations of spectinomycin (e.g., 400-600 μg/ml) were picked, the identified mutations were cloned into a new plasmid (to reduce the effects of random mutations outside of the terpene synthase), and GC-MS was used to examine the product profiles of the selected resistance-enhancing mutants. In the initial screen, two mutants were identified that generated major shifts in terpenoid production (
A drop-based assay was used to examine the survival advantage conferred by both mutants (
In the initial screen, many hits (e.g., cells that survived at high concentrations of spectinomycin) had an incomplete or missing GHS gene (44%,
To bias the screen against B2H activation by FPP, which does not require terpene synthase activity, media conditions were sought that would disfavor this mode of survival by increasing FPP concentrations to toxic levels. In brief, the concentrations of glycerol and mevalonate were increased (modifications known to enhance terpenoid production in E. coli), and the influence of these media conditions on antibiotic resistance was evaluated. These conditions reduced the resistance conferred by incomplete TS plasmids (here, plasmids that lacked a GHS gene) but not the resistance afforded by A319Q (
Bias of the screen against FPP accumulation was sought by creating a penalty for this mechanism of B2H activation. In brief, the effects of higher glycerol and mevalonate concentrations were tested in solid media on empty vector survival. In studies using similar terpenoid pathways (e.g., pMBIS and a terpene synthase expressed from the pTrc99 plasmid), these modifications have increased terpenoid production, suggesting they may cause higher flux through the MBIS pathway. As expected, these media changes reduced the antibiotic resistance conferred by the empty vector but left the resistance conferred by A319Q unchanged (
To evolve the terpenoid pathway further, additional rounds of mutagenesis were carried out on GHS. Mutations that might improve upon A319Q were sought by using SSM and error-prone PCR (ePCR). For SSM, the five remaining sites identified with Eq. 3-1 were selected; for ePCR, a homology model was generated, and all residues within 8 Å of the substrate analog used to select SSM sites were targeted (
The eight hits that grew at high concentrations of spectinomycin (400-600 μg/ml) were examined: five from SSM and three from ePCR (
Certain features of mutated synthases enhance antibiotic resistance, while others do not. For example, some mutated terpene synthases enhance antibiotic resistance because they produce inhibitors that activate the B2H system (here, inhibitors of PTP1B); in general, mutants that do not make an inhibitor do not enhance resistance as much as those that do. Whether a mutated synthase enhances antibiotic resistance also depends on other sources of toxicity. For example, mutants might form toxic aggregates, produce bactericidal metabolites or metabolites that disrupt cellular function, or compete more effectively than other enzymes in the cell for essential metabolic intermediates (a siphoning effect). The resistance described herein is likely a non-linear combination of multiple biochemical properties, which are challenging to predict from structure or sequence data alone. Properties that have been examined include inhibitor production (the goal of the genetically encoded system is to find mutants that produce inhibitors), enzyme toxicity (some mutations also make the GHS enzyme less toxic, perhaps for one of the reasons listed above), and enzyme solubility (some mutations appear to improve stability).
The drop-based plating of these mutants was performed, as before. A319Q/Y415F conferred a survival advantage and exceeded the growth of A319Q (
Mutants A319Q and A319Q/Y415F enhanced antibiotic resistance (albeit, mildly) in the presence of an inactive B2H system (
Mutant A319Q/Y415F afforded the most spectinomycin resistance of any variant (
Y415C also generates large amounts of himachalol (e.g., intracellular concentrations of 389±33 μM) and merits further discussion. Like A319Q/Y415F, Y415C improved the specific growth rate of E. coli in liquid culture; however, unlike the double mutant, it failed to improve antibiotic resistance when paired with an inactive B2H system in our selection assay. At first glance, these results seem contradictory, but they probably reflect the different cellular stresses imposed by the two experiments. To collect growth curves, E. coli strains were used that lack both the FPP pathway and the B2H system; for selection experiments, both were included. As a result, the selection experiments place four additional stresses on the cell: (i) the isoprenoid pathway, which generates FPP, a toxic intermediate, (ii) the B2H system, which has no apparent toxicity but requires cellular resources for plasmid maintenance and constitutive protein expression, (iii) the antibiotics required to maintain pMBIS and pB2H, and (iv) spectinomycin (the variable selection pressure used in our assay). It is speculated that these stresses may accentuate differences in the toxicity of GHS mutants. This theory was explored, in part, by comparing the soluble fractions of Y415C, A319Q, and A319Q/Y415F overexpressed in E. coli (
Comparisons of GHS mutants, taken together, indicate that the pronounced fitness advantage of A319Q/Y415F results from both (i) its ability to overproduce himachalol, a PTP1B inhibitor, and (ii) its reduced cellular toxicity. Importantly, the himachalol titer of A319Q/Y415F is over fifty-fold higher than that of wild-type GHS; this mutant illustrates the efficiency with which terpene synthases can adapt to produce new biologically active molecules.
A single carbocation intermediate can undergo either (i) a 6,1-ring closure to form himachalanes or (ii) a 1,3-hydride shift to form humulanes (
The rational combination of the two best-performing mutants from the single-site screen was investigated. Unlike in previous studies, combining mutations did not yield additive effects: compared to Y415C, the A319Q/Y415C mutant showed similar survival characteristics and a ~57% reduction in terpenoid titers (the profiles, though, were shifted to β-himachalene and himachalol in both strains,
Throughout the evolution effort, three different mutations were identified at residue Y415: alanine, cysteine, and phenylalanine. Curiously, each mutation at this position shifted the product profile of the enzyme towards himachalane-type sesquiterpenoids (products 7-10,
This study evolved γ-humulene synthase to solve a genetically encoded problem in E. coli: inhibition of a medicinally relevant enzyme. This is the first use of a growth-coupled selection to guide terpene synthase evolution towards production of a biologically active molecule. Potential inhibitors of PTP1B were identified that merit further investigation: himachalol and β-himachalene. Importantly, the final mutant showed a 50-fold increase in himachalol production over the wild-type enzyme. This remarkable improvement in titer of a minor product through only two rounds of evolution reveals a powerful feature of the approach for molecular discovery: mutations that enhance the production of compounds that activate the two-hybrid system will be enriched, making their isolation for characterization easier (isolating these compounds from wild-type γ-humulene synthase, which produces much lower titers, would be much more difficult). Many other terpene synthases, as well as functionalizing enzymes like cytochrome P450s, produce different products with various titers in response to mutations, suggesting the approach could be used to evaluate large and diverse sets of molecules by evolving a minimal number of starting pathways to solve genetically encoded problems.
Challenges to evolving a metabolic pathway under a selection pressure were also identified. Farnesyl pyrophosphate, an intermediate in sesquiterpene biosynthesis, was found to be an inhibitor of PTP1B; its production enriched variants in which the evolving terpene synthase gene (which, for GHS, is slightly toxic to E. coli) was removed from the cell. Strategies for reducing the viability of “uninteresting” solutions, like a promoter that responds to FPP accumulation by expressing a toxic gene, could further bias the cell towards producing a more interesting molecule. Intermediates may trigger selection systems in other natural product pathways as well; systematically characterizing their effects may be necessary to ensure the target(s) for directed evolution will be retained following selection.
Many natural terpenoid pathways evolved to improve the fitness of living systems in response to specific biochemical challenges (e.g., cellular responses to both biotic and abiotic stresses). In this study, a B2H system was used to define an artificial challenge (the inhibition of PTP1B), and a terpene synthase was evolved to address it. The screen of a relatively small library of GHS mutants (i.e., SSM at 6 sites) identified single and double mutants that improved the fitness of B2H-encoded E. coli cells by reducing GHS toxicity and/or by increasing inhibitor production. These distinct biochemical traits highlight the multi-objective optimization problems that guide the evolution of specialized secondary metabolites in biological systems.
Terpene synthases have been the subject of a myriad of detailed enzymological studies, but they remain challenging to engineer. Mutations that alter their product profiles often reduce catalytic activity, and substitutions required to generate specific products are challenging to predict de novo. The growth-coupled assays disclosed herein identified a combination of mutations in GHS that improves the titer of a minority product, a terpene alcohol that inhibits PTP1B, by over fifty-fold, and enabled the identification of a residue where mutations can improve water capture, a historically challenging feat given the complexity of the carbocation cyclization cascade and the contributions of water. Sesquiterpene synthases that generate a single hydroxylated product are rare, but the analysis described herein allowed one to be built: Y415S, which produces mainly himachalol. The findings suggest that activity-guided screens (and, perhaps in the future, screens carried out with generalist biosensors for specific classes of terpenoids) can accelerate the discovery of active, functionally distinct variants of terpene synthases, which are valuable starting points for structure-function studies and protein engineering.
A genetically encoded objective has several important differences from some complex biochemical challenges encountered in nature (e.g., inter-organism communication). First, the target of inhibition is located within the same cell, and within the same cellular region (the cytosol), as the terpenoid pathway, so terpenoid transport between cells is not a selection criterion. Second, two system properties, an overabundance of terpenoid precursor and inefficient terpenoid export, lead to high intracellular concentrations that make potent inhibitors unnecessary. Notably, the analysis culminated in a double mutant with major products that were easy to purify; mutants with potent, low-abundance inhibitors may have been overlooked. New approaches to reduce intracellular titer (e.g., a reduction in precursor supply) or to survey minority products could yield more potent molecules. Finally, the E. coli cells used in this study lack P450 monooxygenases and other terpenoid-functionalizing enzymes that could generate more soluble or potent molecules. Future efforts to integrate these enzymes into terpenoid pathways could expand the solution space explored in activity-guided screens.
The study of evolution focused on a single enzyme in a terpenoid pathway, but in nature, enzymes do not evolve one at a time and the intermediate metabolites are not strictly acted upon by individual enzymes in a sequence. As an example, plant diterpenes are often produced through two cyclization steps (in distinct active sites) that can be carried out by multiple enzymes, and the resulting diterpenoids can then be acted upon by multiple cytochrome P450s (whose products can sometimes react even further with the same enzymes). Each of these enzymes can simultaneously undergo random mutation/recombination to yield pathways producing different final products. Extending the work in this study to multiple component pathways like those of plant diterpenoids (within the limits of heterologous expression in E. coli) could be a powerful approach for enlarging the chemical space being searched for inhibitors of PTP1B (or another genetically encoded objective). Multicomponent pathways could also be used with the two-hybrid system to explicitly investigate the propensity for a pathway to produce a biologically active compound when it is evolved sequentially (e.g., one enzyme at a time) versus when multiple components are evolved at once.
E. coli DH10B, chemically competent NEB Turbo, and electrocompetent One Shot Top10 (Invitrogen) cells were used for cloning and library preparation. E. coli BL21 (DE3) cells were used to express proteins for in vitro studies, E. coli s1030 cells for all B2H analyses, and DH5α cells for terpenoid isolation. When necessary, chemically competent and electrocompetent cells were generated with well-documented protocols (RbCl and washing, respectively).
Farnesyl pyrophosphate (FPP) and methyl abietate were purchased from Santa Cruz Biotechnology; tris(2-carboxyethyl)phosphine (TCEP), bovine serum albumin (BSA), M9 minimal salts, phenylmethylsulfonyl fluoride (PMSF), and DMSO (dimethyl sulfoxide) from Millipore Sigma; longifolene, glycerol, bacterial protein extraction reagent II (B-PERII), and lysozyme from VWR; cloning reagents from New England Biolabs; and α-bisabolol and all other reagents (e.g., antibiotics and media components) from Thermo Fisher. Cedarwood oil (for β-himachalene isolation) was purchased from King Soopers. Mevalonate was prepared by mixing 1 volume of 2 M DL-mevalonolactone with 1.05 volumes of 2 M KOH, followed by incubation at 37° C. for 30 minutes. Vanillin-sulfuric acid solution was prepared by adding 7 g of vanillin and 1.3 mL of concentrated H2SO4 to 200 mL of methanol. Gene sources for new plasmids are outlined in Table 13.
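The mevalonate preparation implies a stock concentration slightly below 1 M, which matters when dosing cultures to 20 mM. The arithmetic can be sketched as follows, assuming complete hydrolysis of the lactone, additive volumes, and a negligible volume change when the stock is added to the culture. The function names are hypothetical and the 4.5 mL culture volume is only an example.

```python
def mevalonate_stock(lactone_molar=2.0, koh_molar=2.0, lactone_vol=1.0, koh_vol=1.05):
    """Return (mevalonate M, excess KOH M) after mixing lactone and KOH.

    Assumes the lactone is fully hydrolyzed (1:1 with KOH) and volumes are additive.
    """
    total_vol = lactone_vol + koh_vol
    mevalonate = lactone_molar * lactone_vol / total_vol
    excess_koh = (koh_molar * koh_vol - lactone_molar * lactone_vol) / total_vol
    return mevalonate, excess_koh


def stock_volume_ul(target_mm, culture_ml, stock_molar):
    """Microliters of stock needed to reach target_mm (mM) in culture_ml (mL).

    Uses C1*V1 = C2*V2; mM * mL / M gives microliters. The added volume is ignored.
    """
    return target_mm * culture_ml / stock_molar


stock_m, koh_m = mevalonate_stock()                 # about 0.976 M mevalonate
dose_ul = stock_volume_ul(20.0, 4.5, stock_m)       # about 92 microliters for 20 mM in 4.5 mL
```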
All plasmids were constructed by using Gibson Assembly. Table 2 describes the composition, antibiotic resistance, and availability of all final plasmids. Table 3 and Table 4 list the primers used for plasmid assembly. NEB Turbo was used for all cloning, BL21 (DE3) for all protein expression, DH10B for large-scale terpenoid production, and s1030 for all B2H experiments.
A homology model of GHS was constructed by using SWISS-MODEL with α-bisabolene synthase (pdb entry 3SAE) as a template. This software package uses ProMod3 to build models from a target-template alignment, which preserves the structures of conserved regions and remodels insertions and deletions with a fragment library.
Multiple sequence alignments were carried out for the amino acid sequences of ABS, TXS, GHS, EIS, and DSS by using Clustal Omega (
Libraries of enzyme mutants were prepared by using site-saturation mutagenesis (SSM) and error-prone PCR (ePCR). For SSM, the following steps were performed: (i) The genes were amplified with primers containing degenerate codons (NNK) at the residues of interest. (ii) The amplified genes were digested with DpnI and purified with gel electrophoresis, and circular polymerase extension cloning (CPEC) was used to integrate them into plasmids (e.g., pTS). (iii) Heat shock was used to transform the fully assembled plasmids (10 μL) into chemically competent NEB Turbo cells (100 μL). (iv) After 1 hour of shaking (37° C., 225 RPM) in 1 mL SOC, serial dilutions were plated on LB agar plates (20 g/L agar, 10 g/L tryptone, 10 g/L sodium chloride, 5 g/L yeast extract, 50 μg/ml carbenicillin) to ensure that the library size was greater than 10-fold the maximum number of transformants required for full coverage of all possible codons (e.g., greater than 2,240 transformants for a single site-saturation mutagenesis library using NNK codons), and all remaining cells were spread over several plates for overnight growth (37° C.). (v) 3-5 colonies were sequenced to verify the presence of mutated genes. If over 50% of the colonies were missing mutations, the library was remade. (vi) The plates were scraped into LB media (10 g/L tryptone, 10 g/L sodium chloride, 5 g/L yeast extract), and the final transformants were miniprepped to recover the DNA library (E.Z.N.A. Plasmid DNA Mini Kit, Omega). (vii) All final libraries were frozen in MilliQ water at −20° C.
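The NNK degenerate codon used in step (i) restricts the third position to G or T. A short enumeration, given here as an illustrative sketch with hypothetical variable names, shows what that set encodes under the standard genetic code.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: N = any base at positions 1 and 2, K = G or T at position 3.
nnk_codons = [n1 + n2 + k for n1, n2, k in product("ACGT", "ACGT", "GT")]

encoded = [CODON_TABLE[codon] for codon in nnk_codons]
nnk_amino_acids = sorted(set(encoded) - {"*"})                    # distinct amino acids
nnk_stops = [c for c in nnk_codons if CODON_TABLE[c] == "*"]      # stop codons retained

print(len(nnk_codons), "codons,", len(nnk_amino_acids), "amino acids, stops:", nnk_stops)
```

The enumeration gives 32 codons that cover all 20 amino acids while retaining a single stop codon (TAG), which is why NNK is a common choice for saturation libraries.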
For ePCR, the same procedure was carried out with the following modifications: (i) The structures of γ-humulene synthase (modeled) and 5-epi-aristolochene synthase (PDB entry 5EAT) were aligned. (ii) The GeneMorph II kit (Agilent) was used to amplify residues 304-593 (comprising all amino acids within 8 Å of the substrate analog from PDB structure 5EAT aligned to the homology model) with a high error rate (50 ng template DNA; predicted 9-16 mutations/kb), and the final plasmid mixture was dialyzed into MilliQ water for 2 hours. (iii) Two 100-μL aliquots of electrocompetent One Shot Top10 cells were transformed with 10 μL of the dialyzed CPEC reaction, and each aliquot was recovered in 900 μL SOC for 1 hour. (iv) The outgrowths were pooled, serial dilutions were plated on 100 mm petri dishes, and the remaining cells were plated on a single large bioassay dish (245 mm×245 mm×25 mm with 20 g/L agar, 10 g/L tryptone, 10 g/L sodium chloride, 5 g/L yeast extract, and 100 μg/ml carbenicillin). The cells were grown at 30° C. overnight to minimize lawn formation, the resulting colonies were scraped, and the final libraries were frozen as above.
The SSM libraries were screened in eight steps: (i) 100 ng of each frozen DNA library (one per site) was pooled, and the pooled library was dialyzed into MilliQ water for two hours. (ii) 10 μL of the dialyzed library was electroporated into a 100-μL aliquot of E. coli s1030 cells harboring a mevalonate-dependent isoprenoid pathway producing IPP and FPP (pMBIS) and the two-hybrid system (pB2H), and the cells were recovered in 900 μL SOC for 1 hour (37° C., 225 RPM). (iii) Serial dilutions of each transformation reaction were plated on LB agar supplemented with antibiotics for plasmid maintenance (50 μg/ml kanamycin, 50 μg/ml carbenicillin, 10 μg/ml tetracycline, and 34 μg/ml chloramphenicol) to estimate screening coverage. (iv) The remaining transformants were grown in 50 mL Terrific Broth (TB: 12 g/L tryptone, 24 g/L yeast extract, 20 mL/L glycerol, 2.28 g/L KH2PO4, 12.53 g/L K2HPO4, plasmid antibiotics) overnight (37° C., 225 RPM). (v) An aliquot of each culture was diluted 1:75 in 4.5 mL TB (pH=7.0 with plasmid antibiotics), and this dilution was grown to an OD600 of 0.3-0.6 (37° C. and 225 RPM). (vi) 500 μM IPTG and 20 mM mevalonate were added, and each induced culture was grown for 20 hours (22° C. and 225 RPM). (vii) Each culture was diluted to an OD600 of 0.001, and 100 μL was spread on LB agar plates (pH=7.0) supplemented with 20 mM mevalonate, 500 μM IPTG, 20 mL/L glycerol (omitted in single-site library screens), plasmid antibiotics, and varying concentrations of spectinomycin. For steps ii-vii, a plasmid harboring the parent terpene synthase was included alongside each library as a control. (viii) The cells were grown at 22° C., and colony growth was checked every 24 hours. Hits were picked from plates for which the library produced a greater number of colonies than the control (i.e., the parent template used for mutagenesis). The ePCR libraries were screened in an analogous fashion (steps ii-viii).
To identify hits meriting further analysis, the terpene synthase gene from either a plasmid extraction or a PCR amplification was sequenced. The mutations identified by this process were introduced into a new pTrc vector harboring GHS (to minimize the impact of random mutations occurring outside of the targeted gene). The re-cloned mutants were transformed into s1030 cells harboring pB2H and pMBIS and plated on LB agar supplemented with antibiotics for plasmid maintenance (50 μg/ml kanamycin, 50 μg/ml carbenicillin, 10 μg/ml tetracycline, and 34 μg/ml chloramphenicol). Colonies were picked to determine product profiles in 4 mL cultures (see below); mutants producing different or greater amounts of products were subjected to drop-based plating to measure spectinomycin resistance.
For the SSM libraries, the aim was to screen library sizes of at least ten times the maximum number of variants. The first SSM library was constructed by pooling six single-site libraries in an equimolar ratio; it had a maximum diversity of 120, and 15,000 and 9,000 mutants were screened in two separate screens. The second SSM library had a maximum diversity of 100, and 58,500 mutants were screened in one screen. Both library sizes were estimated by counting colonies generated by transforming the SSM reaction. For ePCR, 18,900 transformants were screened, or about 1% of the total library of 1.8×10^6 (and well below the maximum number of 20^276 variants, which is experimentally inaccessible). Larger mutant libraries may be screened. In a typical screen, over 100 colonies were observed on both the wild-type and library plates in the absence of spectinomycin, and 0-100 colonies were observed on plates that contained spectinomycin (≥400 μg/ml).
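The screening depth reported above can be restated as an oversampling factor for each screen. The sketch below also estimates the fraction of variants expected to be sampled at least once, under the simplifying assumption (not stated in the source) that all variants are equally represented and sampled independently, so that coverage is approximately 1 − exp(−N/V). The function names and dictionary labels are hypothetical.

```python
import math


def oversampling(screened, diversity):
    """Number of mutants screened per possible variant."""
    return screened / diversity


def expected_coverage(screened, diversity):
    """Expected fraction of variants seen at least once, assuming equal representation."""
    return 1.0 - math.exp(-screened / diversity)


# (mutants screened, maximum diversity) taken from the text above.
screens = {
    "SSM library 1, screen 1": (15_000, 6 * 20),
    "SSM library 1, screen 2": (9_000, 6 * 20),
    "SSM library 2": (58_500, 5 * 20),
}

for name, (n_screened, n_variants) in screens.items():
    print(name, oversampling(n_screened, n_variants), expected_coverage(n_screened, n_variants))

# For ePCR, the fraction of the transformed library that was screened.
epcr_fraction = 18_900 / 1.8e6
```

All three SSM screens exceed the ten-fold target by a wide margin (125-, 75-, and 585-fold), whereas the ePCR screen samples about 1% of the transformed library.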
The spectinomycin resistance of B2H-containing strains was examined by following these steps: (i) s1030 cells were transformed with pMBIS and variants of pTS and pB2H (Table 2), plated on LB agar supplemented with antibiotics for plasmid maintenance (50 μg/ml kanamycin, 50 μg/ml carbenicillin, 10 μg/ml tetracycline, and 34 μg/ml chloramphenicol), and grown overnight (37° C.). To minimize the influence of mutations outside of the terpene synthase, the terpene synthase gene was recloned from all hits. (ii) Single colonies were used to inoculate 1-2 mL TB (pH=7.0 supplemented with plasmid antibiotics), and the cells were grown overnight (37° C. and 225 RPM). (iii) An aliquot of each culture was diluted 1:75 in 4.5 mL TB (as above), and this dilution was grown to an OD600 of 0.3-0.6 (37° C. and 225 RPM). (iv) 500 μM IPTG and 20 mM mevalonate were added to each liquid culture, and the induced cultures were grown for 20 hours (22° C. and 225 RPM). (v) Fresh TB (no antibiotics) was used to dilute each culture to an OD600 of 0.5 unless specified otherwise, and 5-10 μL of the dilution was plated on LB agar plates (pH=7.0) supplemented, unless otherwise specified, with 20 mM mevalonate, 500 μM IPTG, 20 mL/L glycerol, antibiotics for plasmid maintenance (as above), and varying concentrations of spectinomycin. (vi) The cells were grown at 22° C. for at least 48-72 hours before they were photographed.
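The dilution steps in this protocol reduce to two small calculations, sketched below with hypothetical helper names. The sketch treats 4.5 mL as the final volume of the 1:75 dilution (the text could also be read as adding the inoculum to 4.5 mL of medium) and assumes that OD600 scales linearly with cell density over the range used; the measured OD of 4.0 is an example value only.

```python
def inoculum_volume_ml(final_volume_ml, dilution_factor):
    """Volume of overnight culture for a 1:dilution_factor dilution at a given final volume."""
    return final_volume_ml / dilution_factor


def dilute_to_od(od_measured, od_target, final_volume_ml):
    """Return (culture mL, fresh medium mL) needed to reach od_target in final_volume_ml."""
    if od_target > od_measured:
        raise ValueError("target OD exceeds measured OD; dilution cannot concentrate cells")
    culture_ml = final_volume_ml * od_target / od_measured
    return culture_ml, final_volume_ml - culture_ml


# 1:75 dilution into 4.5 mL: 0.06 mL (60 microliters) of overnight culture.
inoculum_ml = inoculum_volume_ml(4.5, 75)

# Example: an induced culture measured at OD600 = 4.0, diluted to OD600 = 0.5 in 1 mL.
culture_ml, medium_ml = dilute_to_od(4.0, 0.5, 1.0)
```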
Small-scale terpenoid production was carried out in TB (pH=7.0 with plasmid maintenance antibiotics: 50 μg/ml kanamycin, 50 μg/ml carbenicillin, 10 μg/ml tetracycline, and 34 μg/ml chloramphenicol). Briefly, s1030 cells harboring pMBIS and pB2H were transformed with pTS, plated on LB agar supplemented with plasmid antibiotics, and grown overnight (37° C.). On the following day, colonies were picked to inoculate 2 mL of TB (see above), and the cultures were grown overnight (37° C., 225 RPM). On the subsequent morning, each culture was diluted 1:75 into either 4 mL or 10 mL TB (as above) and grown to an OD600 of 0.3-0.6 (37° C., 225 RPM). After it reached the desired OD600, the culture was induced with 20 mM mevalonate and 500 μM IPTG and grown at 22° C. for 48-88 hours.
DH5α cells were used to carry out large-scale terpenoid production. Briefly, these cells were transformed with pTS and pAM45 (a plasmid that enables mevalonate biosynthesis and conversion to IPP/FPP), plated on LB agar, and grown overnight (37° C.). On the following day, isolated colonies were picked to inoculate 4 mL TB (pH=7.0) supplemented with 20 mL/L glycerol, and the cultures were grown overnight (37° C., 225 RPM). On the next morning, the culture was diluted 1:50 into Difco TB mix supplemented with 20 mL/L glycerol, and this dilution was grown to an OD600 of 0.3-0.6 (37° C., 225 RPM). The culture was induced by adding 500 μM IPTG and grown at 22° C. for at least 84 hours. Table 2 describes the antibiotics added to LB and TB media for plasmid maintenance.
Hexane was used to extract terpenoids from liquid culture; the procedure varied by culture volume. For 4 mL cultures, 0.6 mL hexane was added to 1.0 mL of culture, the mixture was vortexed for 3 minutes and centrifuged at 13,300 RPM for 2 minutes, and 0.4 mL of the hexane layer was removed for analysis. Intracellular terpenoids (always collected from 4 mL cultures) were extracted by (1) recording the OD600 of each culture at the time of extraction (for determining total intracellular volume per mL of culture), (2) removing 1 mL of culture and centrifuging it at 4,000×g for 3 minutes, (3) discarding the supernatant and adding 100 μL disruptor beads (Chemglass, CLS-1835-BG1) and 600 μL hexane, and (4) vortexing the bead/hexane mixture for 3 minutes. Samples were centrifuged and stored as before.
For ~10 mL cultures, 14 mL of hexane was added to 10 mL of culture, and the mixture was shaken at 100 RPM (room temperature) for 30 minutes, transferred to a 50-mL Falcon tube, and centrifuged at ~5,000×g for 5-10 minutes; the hexane layer was then removed for analysis.
For large (e.g., 1.0-2.0 L) cultures, hexane was added to 16.7% v/v and mixed by stirring at room temperature for at least 2 hours. The organic layer was recovered with a separation funnel and centrifuged at 5,000×g for 5-10 minutes. The final hexane layer was removed for further analysis.
Intracellular concentrations of terpenoids were examined by extracting these compounds from cells grown in 4-mL cultures. Briefly, at 48 hours, 1 mL of cell culture was removed and centrifuged for 3 minutes (4000×g), and the supernatant was discarded. Terpenoids were extracted from the cell pellet by adding 600 μL hexane and 100 μL of 0.1-mm disrupter beads (Chemglass, CLS-1835-BG1) and vortexing the suspension for 3 minutes. The resulting lysate was centrifuged at 17,000×g for 2 minutes, and the resulting hexane layer was analyzed using GC/MS as described below. Finally, the intracellular concentration of each terpenoid (Ccell) was determined per below:
All samples were analyzed with a gas chromatograph/mass spectrometer (GC-MS; a Trace 1310 GC fitted with a TG5-SilMS column and an ISQ 7000 MS; Thermo Fisher Scientific). All samples were prepared by adding 20 μg/ml of methyl abietate as an internal standard, except when estimating purity. When the peak area of the internal standard deviated by more than 50% from the average area of all samples containing that standard, the corresponding samples were re-analyzed. For all runs, the following GC method was used: hold at 80° C. (3 min), increase to 250° C. (15° C./min), hold at 250° C. (6 min), increase to 280° C. (30° C./min), and hold at 280° C. (3 min). To identify various analytes, m/z ratios were scanned from 50 to 550. The molecules were identified by using the NIST MS library and, when necessary, this identification was confirmed with mass spectra reported in the literature. When displaying chromatograms, the peak for methyl abietate or himachalol was aligned if necessary (due to shifting retention times arising from column trimming carried out as part of routine maintenance). Purity was estimated as the fraction of the total chromatogram area comprised by the peak of interest.
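Two parts of this method lend themselves to a quick check: the total length of the oven program and the re-analysis rule for the internal standard. The sketch below is illustrative only. The function names and the example peak areas are hypothetical, and the re-analysis rule is read here as flagging any sample whose internal-standard area differs from the mean by more than 50% of that mean.

```python
from statistics import mean

# Oven program from the text: ("hold", temperature in C, minutes)
# or ("ramp", target temperature in C, rate in C per minute).
OVEN_PROGRAM = [
    ("hold", 80, 3.0),
    ("ramp", 250, 15.0),
    ("hold", 250, 6.0),
    ("ramp", 280, 30.0),
    ("hold", 280, 3.0),
]


def run_time_min(program, start_temp=80.0):
    """Total run time in minutes for a hold/ramp oven program."""
    temp, total = start_temp, 0.0
    for kind, target, value in program:
        if kind == "hold":
            total += value
        else:  # ramp: time = temperature change / rate
            total += (target - temp) / value
            temp = target
    return total


def flag_for_reanalysis(standard_areas, tolerance=0.5):
    """Return sample names whose internal-standard area deviates from the mean by > tolerance."""
    average = mean(standard_areas.values())
    return [name for name, area in standard_areas.items()
            if abs(area - average) > tolerance * average]


total_minutes = run_time_min(OVEN_PROGRAM)   # 3 + 170/15 + 6 + 1 + 3, about 24.3 min

# Hypothetical internal-standard peak areas (arbitrary units).
example_areas = {"s1": 100.0, "s2": 110.0, "s3": 90.0, "s4": 300.0}
flagged = flag_for_reanalysis(example_areas)
```

In the example, the mean area is 150, so only the sample at 300 falls outside the ±75 window and would be re-run.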
Sesquiterpenes were quantified by using select ion mode (SIM) to scan for the molecular ion (m/z=204) and an ion common to both sesquiterpenes and methyl abietate, the internal standard (m/z=121). The peaks that made up <1% of the total integrated area in the m/z=204 chromatogram were ignored. The remaining peaks were quantified using the common ion m/z=121 and Eq. 3-2, where Ai is the
Himachalol was isolated from two 2-L cultures of GHS A319Q/Y415F grown in 4-L Erlenmeyer flasks. Terpenoid biosynthesis and extraction were carried out as described above. The hexane extract was dried to ˜500 μL with a rotary evaporator, and the sample was dry loaded onto a 12 g C18 column (Biotage Sfar HC Duo). Indole was removed from the terpenoids using C18 chromatography with a Biotage Selekt (5 CVs 70% acetonitrile in water, 5 CVs 85% acetonitrile in water, 5 CVs 100% acetonitrile in water; 10 mL fractions). The terpenoid content of various fractions was checked by using thin layer chromatography (TLC, 3:7 ethyl acetate/hexane) supplemented with a vanillin/sulfuric acid detection method (heating at 125° C. for ˜30 seconds). Himachalol was identified using the NIST MS library (
Himachalol-containing fractions were pooled and dried using a rotary evaporator, using ethanol to form an azeotrope for removing water. The dried material was resuspended in 200 μL hexane and loaded onto a 5 g silica column (Biotage Sfar HC Duo) for normal phase purification. Using a Biotage Selekt system, the compound of interest was isolated using an isocratic gradient (10% ethyl acetate in hexane), collecting 5 mL fractions. TLC was used with vanillin/sulfuric acid charring to identify himachalol-containing fractions; himachalol appeared on the TLC plates as a purple spot. One 85% pure himachalol fraction (GC/MS) was obtained.
γ-humulene was isolated from two 2-L cultures of GHS A319Q grown in 4-L Erlenmeyer flasks. Terpenoid biosynthesis and extraction were carried out as described above. The hexane extract was dried to ˜500 μL with a rotary evaporator. The material was loaded onto a 5 g silica column (Sigma) and γ-humulene was isolated using vacuum liquid chromatography (isocratic elution with 100% hexane, 3 mL fractions). The terpenoid content of various fractions was checked by using thin layer chromatography (TLC, 3:7 ethyl acetate/hexane) supplemented with a vanillin/sulfuric acid detection method. A single fraction containing >85% pure γ-humulene was obtained. γ-humulene appeared as a purple spot on the TLC plates. The composition of terpenoid-containing fractions was analyzed with GC-MS and, owing to the thermal instability of γ-humulene, its purity was estimated using 1H NMR (
β-himachalene was isolated from cedarwood oil (King Soopers). 502 mg of the oil was loaded onto a 20 g silica column (Sigma) and the non-himachalene components were removed using VLC (10 fractions, 0% ethyl acetate in hexanes; 5 fractions, 5% ethyl acetate in hexanes; 5 fractions, 10% ethyl acetate in hexanes; 10 mL fractions). The fractions were analyzed using the vanillin/sulfuric acid detection method and GC/MS, and a fraction enriched in α-, β-, and γ-himachalene was obtained. β-himachalene was identified using the NIST MS library (
PTP1B was purified as described previously. Briefly, E. coli BL21 (DE3) cells were transformed with a pET21b vector containing the catalytic domain of PTP1B (residues 1-321) modified with a 6× polyhistidine tag on its C-terminus. The cells were grown in 1-L cultures to an OD600 of 0.3-0.6 (37° C., 225 RPM), induced with 500 μM IPTG, and grown at 22° C. for 20 hours. The cells were lysed with B-PERII, and PTP1B was purified by using desalting, nickel affinity, and anion exchange chromatography (HiPrep 26/10, HisTrap HP, and HiPrep Q HP, respectively; GE Healthcare). The final protein was stored (50 μM) in HEPES buffer (50 mM, pH 7.5, 0.5 mM TCEP) in 20% glycerol at −80° C.
The soluble fraction of several GHS variants was measured by using the Nano-Glo HiBit Lytic Detection System (Promega). E. coli BL21 was transformed with a pET28a vector containing Y415C, A319Q, or A319Q/Y415F with a HiBit tag on the N-terminus. Individual colonies were used to inoculate 2 mL of TB media, which were grown overnight (37° C., 225 RPM). Each overnight culture was diluted 1:50 in 4 mL TB in a 24-deep well block and grown (37° C., 225 RPM) to an OD600 of 0.5-0.9, at which point 500 μM IPTG was added and the cultures were grown for an additional 24 hours (37° C., 225 RPM).
Following protein expression, 200 μL of each culture was transferred to a 96-deep well block and cells were lysed by adding 200 μL of the HiBit Lytic reagent (prepared and incubated according to the manufacturer's instructions). In preparation for measuring total concentrations of terpene synthase, 100 μL of each lysis reaction was transferred to a 96-well white microplate (Nunc). In preparation for measuring soluble protein concentrations, the remaining volume of each lysed culture was centrifuged (3,000 RPM, 10 minutes) and 100 μL of each supernatant was transferred to the same microplate. For all wells, the luminescent signals of the total and soluble samples were measured with a Spectramax M5 plate reader, and the soluble fraction of terpene synthase was determined by dividing the soluble signal by the total signal.
Inhibition by terpenoids was examined by measuring their influence on PTP1B-catalyzed hydrolysis of p-nitrophenylphosphate (pNPP). Briefly, 100-μL reactions were prepared comprising 50 nM PTP1B, 0.167-20 mM pNPP, and 75-150 μM terpenoids in 50 mM HEPES (pH=7.3) with 50 mM TCEP, 50 μg/mL BSA, and 2-10% DMSO. The reactions were initiated by adding pNPP, and the production of p-nitrophenol (pNP) was monitored by measuring absorbance at 405 nm every 10 s for 5 min (SpectraMax iD3 plate reader). When necessary, the solubility of the terpenoids in individual wells was assessed by plotting the A405 values for each well (including a no-inhibitor well) in a single read.
Kinetic data was analyzed by using a custom Matlab script supplemented with a user-generated standard curve (e.g., a plot of absorbance at 405 nm vs. pNP concentration in μM,
Inhibition by FPP, a costly reagent, was examined with three modifications of the above assay: (i) The pNPP concentration was held constant (5 mM). (ii) The 10% DMSO was replaced with 10% of a mixture of methanol:10 mM NH4OH (7:3). (FPP was purchased as a 1.1 mg/mL solution in methanol:10 mM NH4OH (7:3).) (iii) IC50 was estimated by using a linear fit to the initial rate data; 95% confidence intervals were propagated from the confidence intervals on the regression parameters, generated using Matlab's coefCI function. This approach, which reflects the limited number of measurements afforded by the FPP stock (measurements which include very high and very low initial rates), may result in a greater error than a more standard approach for estimating IC50 (e.g., 4-parameter logistic curves) but nonetheless provides an order of magnitude estimate of potency.
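One way to picture the linear-fit estimate of IC50 is sketched below in Python. The sketch rests on an assumption that is not stated in the text: that IC50 is read off as the inhibitor concentration at which the fitted initial rate falls to half of the fitted rate at zero inhibitor. The function names and the example data are hypothetical, and the confidence-interval propagation performed with Matlab's coefCI function is not reproduced here.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def ic50_linear(inhibitor_conc, initial_rates):
    """Estimate IC50 from a linear fit of initial rate vs. inhibitor concentration.

    ASSUMPTION: IC50 is the concentration at which the fitted rate equals
    half of the fitted uninhibited rate (the intercept), i.e.
    intercept + slope * IC50 = intercept / 2.
    """
    intercept, slope = linear_fit(inhibitor_conc, initial_rates)
    if slope >= 0:
        raise ValueError("initial rates do not decrease with inhibitor concentration")
    return -intercept / (2.0 * slope)


if __name__ == "__main__":
    # Hypothetical data: concentrations in uM, rates in arbitrary units.
    conc = [0.0, 10.0, 20.0, 30.0]
    rates = [1.0, 0.8, 0.6, 0.4]
    print(f"Estimated IC50: {ic50_linear(conc, rates):.1f} uM")
```

Because a straight line is only a local approximation of a dose-response curve, an estimate of this kind is best read as an order-of-magnitude figure, consistent with the caveat given above.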
The influence of GHS mutants on E. coli growth was examined by expressing them with pET29b vectors (including a C-terminal HiBit tag: GSSGGSSGVSGWRLFKKIS; Promega). These plasmids were transformed into BL21 cells, which were plated on LB agar supplemented with 50 μg/mL kanamycin and grown overnight at 37° C. The resulting colonies were used to inoculate 2-mL liquid cultures of each transformation (Difco TB mix supplemented with kanamycin), which were grown overnight (37° C. and 225 RPM). The next morning, each culture was diluted 1:100 in 200 μL liquid media (Difco TB mix supplemented with kanamycin and 50 μM) in a clear 96-well plate (Costar flat bottom). Growth curves were measured using a SpectraMax iD3 plate reader (OD600, measurements every 15 minutes after 5 seconds of shaking). When analyzing data, wells with OD600>0.04 at t=0, an indication of cell aggregates, were ignored.
Specific growth rate was determined by first identifying the exponential growth region for each curve (i.e., the span of time over which instantaneous growth rate was constant).
The data were transformed and plotted according to the above equation, where ODt is the OD600 at time t, ODt0 is the OD600 at the beginning of the exponential growth phase, and μ is the specific growth rate. “μ” was determined as the slope of each transformed plot (using the fitlm function in Matlab) and the error in μ was determined from the 95% confidence intervals for each slope (using the coefCI function in Matlab).
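The slope-based determination of μ can be sketched in Python as follows. The sketch assumes the standard exponential growth transform, ln(ODt/ODt0) = μ·(t − t0), which is consistent with the variable definitions above but is an assumption about the exact form of the equation. The function name and example data are hypothetical. The sketch returns the slope and its standard error; a 95% confidence interval would additionally require the t-distribution quantile for n − 2 degrees of freedom, which is not computed here.

```python
import math


def specific_growth_rate(times_h, od600):
    """Estimate mu as the slope of ln(OD_t / OD_t0) versus (t - t0).

    ASSUMPTION: the inputs cover only the exponential growth region, and the
    transform is ln(OD_t / OD_t0) = mu * (t - t0). Returns (mu, standard error).
    """
    t0, od0 = times_h[0], od600[0]
    x = [t - t0 for t in times_h]
    y = [math.log(od / od0) for od in od600]

    # Ordinary least squares with an intercept term.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    mu = sxy / sxx
    intercept = mean_y - mu * mean_x

    # Standard error of the slope from the residual variance (n - 2 d.o.f.).
    residual_ss = sum((yi - (intercept + mu * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    return mu, std_err


if __name__ == "__main__":
    # Hypothetical exponential-phase data with a true rate of 0.5 per hour.
    times = [0.0, 0.25, 0.5, 0.75, 1.0]
    ods = [0.05 * math.exp(0.5 * t) for t in times]
    mu, err = specific_growth_rate(times, ods)
    print(f"mu = {mu:.3f} per hour (standard error {err:.2e})")
```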
Statistical significance was determined with a one-tailed Welch's t-test (Table 18), and an F-test was used to compare one- and two-parameter models of inhibition (Table 17).
All drug discovery efforts begin by identifying functional molecules. Many small-molecule discovery programs rely on expensive and laborious high-throughput screens of large compound libraries; in contrast, biological systems (i.e., the natural world) are constantly discovering functional molecules through natural selection. Discovery approaches that emulate nature by introducing genetically encoded selection pressures into microbes that can produce structurally diverse compounds could be useful for the efficient discovery of novel molecules with pharmaceutically relevant activities. This study used a bacterial two-hybrid architecture to encode selection pressures that link gene expression to the activity of important drug targets: HIV-1 protease (HIV-1Pr) and 3-chymotrypsin-like protease (3ClPro) from SARS-CoV-2. This work identified differences in the optimal design of each protease system and presents a workflow that should be adaptable to the development of similar tools. Each protease B2H was screened against 74 terpenoid pathways, and several enzyme combinations were identified that show altered resistance phenotypes (implying biosynthesis of protease inhibitors). These results expand on the early work by showing that bacterial two-hybrid systems enable the detection of biosynthetically accessible small molecules that inhibit proteases and, more broadly, suggest that these systems provide a particularly versatile means of screening biosynthetic pathways that produce medicinally relevant natural products.
Nature is replete with enzymes that produce an enormous variety of biologically active molecules. Over millennia of evolution, compounds carrying out specific biological activities (pheromones, pest repellants, toxins, etc.) have been enriched—or “discovered”—through selective pressures. Many of these natural products exhibit useful medicinal activities in humans, but these properties are often discovered serendipitously or through screens of chemical libraries. Unfortunately, these screens usually require compound isolation from natural sources—a laborious and expensive endeavor. Microbial systems have excelled at producing terpenoids, alkaloids, peptides, and other natural products in laboratories; however, identifying functionally valuable molecules still requires non-trivial purification schemes followed by in vitro assays. Consequently, microbial systems producing drug-like molecules have been limited to the production of single compounds with known value (e.g. the pharmaceutical precursors dihydroartemisinic acid or taxadiene) or many diverse compounds that lack functional characterization.
Genetically encoded systems that connect the activity of compounds in a cell to easily measurable outputs (e.g. fluorescence, luminescence, or growth) could be useful for purification-free, functional characterization of microbially-produced molecules. Unfortunately, linking the activity of a compound to transcription of such signals is not straightforward, especially if the desired activity is modulating a drug target completely orthogonal to microbial gene expression. A limited number of systems have been developed to respond to a biosynthesized molecule's activity against a medicinally relevant enzyme in a cell (e.g. rho bacterial termination factor, HIV-1 protease, and protein tyrosine phosphatase 1B), but these strategies have not been generalized to other targets or used with large biosynthetic libraries (e.g. >50 pathways).
In this work, the previously reported phosphatase-based system was expanded by developing genetically encoded bacterial two-hybrids that respond to the activity of HIV-1Pr and 3ClPro. To date, the associated viruses (HIV and SARS-CoV-2) are responsible for >40 million deaths worldwide. While no 3ClPro inhibitors are approved for use today, 10 HIV-1Pr inhibitors have been. Even so, resistance to HIV-1Pr drugs frequently emerges (especially in the developing world) and they often require suboptimal dosing/delivery strategies due to their poor pharmacokinetic properties. Thus, new inhibitors of both enzymes could be useful for drug development. To screen for such molecules, luminescent systems responding to the activity of each protease were developed and optimized, and the best constructs were used to inform the design of growth-coupled systems. These tools were used to screen >100 metabolic pathway/inhibitor target combinations with a simple drop-plating assay, allowing quick identification of pathways producing molecules with different survival phenotypes alongside each protease B2H (implying varying levels of inhibitory activity). The findings suggest that these tools can quickly screen biosynthetic pathways for molecules with broad or specific inhibitory activities through parallel screens of two-hybrid systems harboring different drug targets. Coupled with large biosynthetic pathway libraries, these designs should prove useful in the discovery of novel viral protease inhibitors.
To screen metabolic pathways for protease inhibitors, a genetically encoded system that links protease activity to a selective pressure was sought. To this end, a bacterial two-hybrid (B2H) system was developed to control the expression of an essential gene. A system was previously created in which MidT (a phosphotyrosine substrate) and a superbinder v-Src SH2 domain were fused to the omega subunit of RNA polymerase or portions thereof (RpoZ) and a DNA-binding cI repressor protein, respectively. Adding Src kinase phosphorylates the substrate, enabling binding to the SH2 domain, localization of RNA polymerase, and transcription of an antibiotic resistance gene from an optimized B2H promoter, pLacZopt. It was hypothesized that cleavage sites could be encoded in the MidT-RpoZ linker to make a protease-responsive B2H: active protease would cleave the fusion, preventing RpoZ from localizing RNA polymerase, and protease inactivation would restore localization and, thus, transcription (
In some embodiments, a protease recognition sequence was added to the MidT-RpoZ linker. The protease recognition sequences reduced luminescence but maintained a 4 to 5-fold dynamic range. In those embodiments, E. coli was transformed with a protease induction system and a bacterial 2-hybrid system modified with an inactive PTP1B and a protease-specific cleavage site, which allowed for monitoring of changes caused by protease expression. Monitoring the changes indicated that two HIVpro systems and one 3CLpro system exhibited a decrease in luminescence in response to protease expression, and inactive proteases showed a small decrease in luminescence, which may have been an effect resulting from weak substrate binding and/or a general cellular stress response to protease overexpression. Additional proteases and protease-specific cleavage sites were then screened by adding recognition sites for the papain-like protease of SARS-CoV-2 (PLpro), the NS2B/NS3 proteases of the West Nile and Dengue Viruses (WNVpro and DVpro, respectively), and ubiquitin-specific protease 7 (USP7). The proteases and protease-specific cleavage sites were screened alongside bacterial 2-hybrid systems. For USP7, a catalytic domain with and without a C-terminal extension required for activity was included. As a result of the screen, 3CLpro reduced luminescence for multiple recognition sites, indicating that one or more components of the underlying bacterial 2-hybrid system contained a cleavage site for 3CLpro, which was later confirmed to be a site in RpoZ.
In some embodiments, new bacterial 2-hybrid systems promoting spectinomycin resistance were created. To build survival-modulating systems for 3CLpro and HIVpro, earlier bacterial 2-hybrid systems including a gene for spectinomycin resistance were changed by replacing PTP1B with a protease and adding the best-performing cleavage site from previous screens.
A kinetic characterization of α-bisabolol based on the embodiments described above suggests that α-bisabolol is an inhibitor of 3CLpro.
To begin, B2H systems that respond to the activity of HIV-1 protease (HIV-1Pr) or SARS-CoV-2's 3-chymotrypsin-like protease (3ClPro) were created. A cleavage sequence for each protease was inserted into the MidT/RpoZ linker, and constructs with different numbers of alanine residues around the insertion were tested. Designs lacking cleavage sequences were also tested. To measure transcription from the B2H promoter with a wide range of protease expression levels, HIV-1Pr or 3CLpro was introduced on an arabinose-inducible plasmid, and a luciferase gene, LuxAB, was placed under control of the B2H promoter (
3CLpro showed significant reductions in luminescence (6-fold) even in the absence of a cleavage sequence in the MidT/RpoZ linker (
When using non-cognate or no recognition sites, HIV-1Pr showed smaller reductions in luminescence (2-3-fold). This effect could be consistent with low-level proteolysis; unfortunately, HIV-1Pr can act on a broad range of recognition sequences, precluding simple predictions of cleavage sites in the system. One exception to this trend was a 6-fold change observed with the 3CLpro cleavage sequence and a three-alanine linker. Although the observed effect on transcription was high in this system, the basal signal (i.e., without arabinose induction) was lower compared to the HIV-1Pr plus three-alanine linker. Therefore, the HIV-1Pr site was chosen for the growth-coupled system, on the hypothesis that it would afford better survival characteristics due to higher expression of an essential gene in the absence of an active protease.
Next, B2H systems were created that are compatible with selection. To this end, (i) each protease and the best performing linker identified in the luminescence screen were introduced into the B2H system; and (ii) aadA (indicated as “SpecR”), a gene encoding resistance to the antibiotic spectinomycin, was introduced in place of LuxAB. The arabinose screen suggested high levels of protease would be important for maximal reduction in aadA expression, so multiple ribosome binding sites (RBS) were tested to achieve high translation initiation rates (TIR) of each protease (
The RBS Calculator was used to design sequences with a wide range of predicted TIRs for each protease, at least 2 of which were tested with each system. To confirm functionality, both WT and inactive enzymes (D25N mutation in HIV-1Pr, H41A mutation in 3CLpro) were tested. These systems were plated on solid media containing spectinomycin, and RBSs with TIRs of 20,000 (HIV-1Pr) and 90,000 (3CLpro) were identified that showed poor growth when the proteases were active and robust growth when they were inactive. In agreement with the observed fold-changes with the luminescent system, more striking growth differences were observed with 3ClPro than with HIV-1Pr. These B2H designs were then used to screen metabolic pathways.
To search for biosynthetic pathways producing protease inhibitors, the B2H systems were paired with terpenoid pathways. These molecules and their derivatives have been shown to inhibit viral proteases, and the construction of many diverse terpenes in E. coli can be achieved by exchanging just 1-2 genes in a biosynthetic pathway (a terpene synthase and/or prenyltransferase). To produce terpenes in E. coli, the isopentenol utilization pathway (pIUP, which produces IPP and DMAPP from the cheap precursor, isoprenol) was coupled with an in-house terpene synthase library including 37 genes from a diverse set of organisms (
To search for terpenoid inhibitors of viral proteases, the pathways were paired with each protease-responsive B2H. In the presence of a GGPP-producing precursor pathway, pathways conferring survival against 3ClPro were not observed. Nearly all pathways, however, did provide a survival advantage with HIV-1Pr, suggesting (i) the GGPP precursor may be inhibiting this enzyme or (ii) the stringency of the HIV-1Pr selection against GGPP pathways needs to be increased (e.g., cells expressing the active HIV-1Pr should die at lower concentrations of spectinomycin). In the presence of an FPP-producing precursor pathway, several terpene synthases were observed conferring high levels of resistance (e.g., growth at 800 μg/mL spectinomycin) with one or both protease targets (
Nature excels in the development and production of functional molecules. Over millennia, random mutations and recombination events have produced myriad enzymatic pathways which, challenged by natural selection, yield useful compounds. In this study, the expression systems disclosed herein were engineered to apply artificial selection pressures to recapitulate this process in engineered E. coli. Using a bacterial two-hybrid architecture, systems were developed and/or characterized for detecting inhibitors of HIV-1Pr and 3ClPro, proteases necessary for the infectiousness of two epidemic-causing viruses, by inserting cleavage sites into the B2H's flexible linkers. This strategy allowed for creation of systems producing either luminescence or spectinomycin resistance as output signals. The luminescent output allowed for quantification of system performance and identification of optimal linker constructs, streamlining the development of the final antibiotic resistance-based system. This optimization revealed that the residues flanking the inserted cleavage site affect system performance depending on the protease used. It also helped identify a putative protease recognition site within RpoZ. Fortunately, a functional B2H system could still be developed, but non-targeted proteolysis of different B2H components (e.g., Src, CDC37, or the luciferase/spectinomycin resistance proteins) could complicate other designs.
To demonstrate the utility of the B2H, 74 terpene synthase pathways were screened against the HIV-1Pr and 3ClPro systems. Using a drop-plating assay, several terpene synthase pathways were identified conferring resistance to spectinomycin (implying protease inhibition) in the presence of one or both proteases. These pathways and their products merit follow-up, both for evaluation of biosynthesized inhibitors (some of which may exhibit target selectivity) and investigations into the functionality of unexpected prenyltransferase/terpene synthase combinations (e.g., FPPS and 1,8-cineole synthase). Larger screens would benefit from improvements in assay throughput, such as barcoding strategies that allow pooling of pathways and measurements of fitness differences with next generation sequencing. Similar approaches have streamlined genome-wide studies of fitness-enhancing or reducing alterations, suggesting comparable improvements in the throughput of fitness measurements using plasmid-borne systems (like the terpenoid pathways and B2H systems) could be possible.
Bacterial two-hybrid systems were developed that detect the activity of two important disease-relevant proteases in E. coli, and these systems were used to screen 74 terpenoid pathways for potential inhibitors. Several pathways were identified that improve resistance in the presence of HIV-1Pr, 3ClPro, or both, an indication of inhibitor biosynthesis. The findings described herein show that the B2H architecture can be adapted to other classes of drug targets. When paired with existing biosynthetic pathways for building diverse compounds in E. coli, these B2H systems could accelerate the development of drugs against challenging targets.
Chemically competent NEB Turbo cells were used to carry out cloning, and E. coli s1030 was used for all B2H analyses.
Methyl abietate was purchased from Santa Cruz Biotechnology. Tris(2-carboxyethyl)phosphine (TCEP), bovine serum albumin (BSA), M9 minimal salts, phenylmethylsulfonyl fluoride (PMSF), and dimethyl sulfoxide (DMSO) were purchased from Millipore Sigma; glycerol from VWR; cloning reagents from New England Biolabs; and all other reagents (e.g., antibiotics and media components) from Thermo Fisher.
All plasmids were constructed using Gibson assembly. Table 6 describes the source of each gene; Table 7 describes the composition of all final plasmids. In all cases, LB indicates the LB Miller recipe. Agar concentration was 2% for all solid media. When necessary, chemically competent cells were generated for cloning with the standard RbCl protocol, and electrocompetent cells with a washing protocol as previously described. For screening terpene synthases against proteases, the chemically competent cells were generated as follows: (i) From a glycerol stock, the s1030 cells harboring the pIUP and pB2H variant of interest were streaked on LB agar with plasmid antibiotics (kanamycin, tetracycline, chloramphenicol, concentrations listed in Table 7) and grown at 37° C. (ii) The following day, a single colony was picked to inoculate 2 mL LB with the same antibiotics, and the culture was grown at 37° C. and 225 RPM for 16 hours. (iii) A 1:100 dilution in 50 mL LB with the same antibiotics was prepared, and the culture was grown at 37° C. and 225 RPM until the OD600 reached 0.3-0.6. (iv) The cells were pelleted at 5,000 RPM for 5 minutes. (v) The cells were resuspended in 500 μL ice cold 100 mM CaCl2+7% (v/v) DMSO and frozen in 100 μL aliquots at −80° C.
Preliminary B2H systems (which contained LuxAB as the GOI) were characterized with luminescence assays. Plasmids were transformed into s1030, the transformed cells were plated onto LB agar plates+plasmid antibiotics (Table 7), and all plates were incubated overnight at 37° C. The following day, colonies were picked to inoculate 1 mL LB cultures with the same antibiotics, and the cultures were grown at 37° C. and 225 RPM for 16 hours. The following morning, each culture was diluted 100-fold into 1 mL of TB media, and these cultures were incubated in individual wells of a deep 96-well plate for 5.5 hours (37° C., 225 RPM), including arabinose when a pBAD plasmid was present. 100 μL of each culture was transferred into a single well of a standard 96-well clear plate, and both OD600 and luminescence were measured on a Spectramax iD3 plate reader (standard luminescence settings). Cell-free media was measured, and its signals were subtracted from each measurement prior to calculating OD-normalized luminescence (i.e., Lum/OD600).
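The blank subtraction and normalization described above can be illustrated with a brief Python sketch. The function names and all numerical readings are hypothetical; the sketch assumes that the cell-free media reading is subtracted from both the luminescence and the OD600 measurement before the ratio is taken, and that a fold change is reported as the ratio of two normalized signals.

```python
def od_normalized_luminescence(lum, od600, lum_blank, od_blank):
    """Return blank-corrected luminescence divided by blank-corrected OD600."""
    lum_corrected = lum - lum_blank
    od_corrected = od600 - od_blank
    if od_corrected <= 0:
        raise ValueError("blank-corrected OD600 must be positive")
    return lum_corrected / od_corrected


def fold_change(signal_without_protease, signal_with_protease):
    """Fold reduction in normalized luminescence upon protease expression."""
    return signal_without_protease / signal_with_protease


if __name__ == "__main__":
    # Hypothetical plate-reader values, for illustration only.
    uninduced = od_normalized_luminescence(12050.0, 0.65, 50.0, 0.05)
    induced = od_normalized_luminescence(2050.0, 0.65, 50.0, 0.05)
    print(f"Uninduced Lum/OD600: {uninduced:.0f}")
    print(f"Induced Lum/OD600:   {induced:.0f}")
    print(f"Fold change:         {fold_change(uninduced, induced):.1f}")
```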
Drop-plating of E. coli cells lacking a metabolic pathway was carried out as follows: (i) The B2H plasmid was transformed into s1030 cells (electroporation) and plated on LB agar+plasmid maintenance antibiotics (kanamycin and tetracycline, antibiotic concentrations listed in Table 7) and grown overnight at 37° C. (ii) Colonies were picked to inoculate 1 mL TB (12 g/L tryptone, 24 g/L yeast extract, 12 mL/L 100% glycerol, 2.28 g/L KH2PO4, 12.53 g/L K2HPO4), pH=7.0+plasmid maintenance antibiotics and shaken overnight (225 RPM, 37° C.). (iii) The following morning, each culture was diluted to OD600=0.1 in 1 mL TB, pH=7.0 (no antibiotics), 5-7 μL of each dilution was drop-plated onto LB agar (pH=7.5+plasmid maintenance antibiotics and increasing concentrations of spectinomycin), and the plates were grown at 37° C. (iv) The following morning, plates were photographed.
Drop-plating of E. coli cells producing terpenoids was carried out as follows: (i) pB2H, pIUP, and pTS were transformed into s1030 cells (electroporation) and plated on LB agar containing kanamycin, tetracycline, chloramphenicol, and carbenicillin. (ii) Colonies were picked to inoculate 2 mL TB, pH=7.0+plasmid maintenance antibiotics and shaken overnight (225 RPM, 37° C.). (iii) The following morning, each culture was diluted as before and each dilution was drop-plated on LB agar (pH=7.0)+2% glycerol, 10 mM isoprenol, 50 μM IPTG, plasmid maintenance antibiotics, and increasing amounts of spectinomycin, and the plates were grown at 22° C. (iv) Plates were photographed after 72 hours.
Terpenes were produced by transforming all necessary plasmids into E. coli cells (see Table 7 for strain/plasmid details) and plating on LB agar plates containing antibiotics for plasmid maintenance. Following overnight growth at 37° C., colonies were picked to inoculate 2 mL TB+antibiotics and grown overnight in an incubator shaker (37° C., 225 RPM). The following morning, cultures were diluted 1:75 into TB+antibiotics and grown (37° C., 225 RPM) until the OD600=0.3-0.6. Once reaching the required OD600, cultures were induced by adding isoprenol (50 mM) and IPTG (50 μM or 500 μM) and then transferred to an incubator shaker at 22° C., 225 RPM for 48 hours.
Terpenoids were produced in vivo in 4 mL cultures as described above. At the completion of each fermentation, the OD600 was measured and the total cellular volume in 1 mL of the culture was determined from the specific cellular volume for complex media containing glycerol and amino acids (assumed to be similar to TB). Lysate terpenoids (cells+media) were extracted by: (1) adding 1 mL culture to 600 μL hexane; (2) vortexing the hexane/cell mixture for 3 minutes; (3) centrifuging the mixture at 17,000×g for 2 minutes; and (4) retaining 400 μL of the resulting hexane layer and storing it at −20° C. for further analysis. Intracellular terpenoids were extracted by: (1) removing an additional 1 mL culture and centrifuging at 4,000×g for 3 minutes; (2) discarding the supernatant and adding 100 μL disruptor beads (Chemglass, CLS-1835-BG1)+600 μL hexane; and (3) vortexing the bead/hexane mixture for 3 minutes. Samples were centrifuged and stored as before.
Terpene titers and compound identity were analyzed by using a Trace 1310 GC fitted with a TG5-SilMS column and an ISQ 7000 MS with the following GC method: hold at 80° C. (3 min), increase to 250° C. (15° C./min), hold at 250° C. (6 min), increase to 280° C. (30° C./min), and hold at 280° C. (3 min). For compound identification, the m/z ratios from 50-550 were scanned and IDs were assigned (when possible) using comparisons to compounds in the NIST MS library. For compound quantification, single ions (m/z=121 for sesquiterpenes, and m/z=93 for diterpenes) were scanned. Samples included an internal standard, caryophyllene, at a constant 20 μg/mL. Injections in which the internal standard area differed by more than 50% from the average of all samples from a given day were repeated. Terpene titers for each compound i were determined as caryophyllene equivalents using equation 4-1, where sid=caryophyllene:
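For illustration, the internal-standard check and the conversion to caryophyllene equivalents can be sketched in Python. The relation used here, concentration of compound i = (peak area of i / peak area of the standard) × concentration of the standard, is the conventional form of an internal-standard equivalents calculation and is an assumption about equation 4-1 rather than a reproduction of it. The function names, sample names, and peak areas are hypothetical.

```python
STD_CONC_UG_PER_ML = 20.0  # caryophyllene concentration stated in the text


def flag_for_reinjection(std_areas, tolerance=0.5):
    """Return samples whose internal standard area differs from the
    day's mean standard area by more than `tolerance` (50% by default)."""
    mean_area = sum(std_areas.values()) / len(std_areas)
    return [
        name
        for name, area in std_areas.items()
        if abs(area - mean_area) > tolerance * mean_area
    ]


def titer_std_equivalents(area_i, area_std, std_conc=STD_CONC_UG_PER_ML):
    """ASSUMED form of the equivalents calculation:
    C_i = (A_i / A_std) * C_std, in the units of std_conc."""
    return area_i / area_std * std_conc


if __name__ == "__main__":
    # Hypothetical internal standard peak areas for one day of injections.
    std_areas = {"s1": 100.0, "s2": 110.0, "s3": 90.0, "s4": 300.0}
    print("Repeat injections:", flag_for_reinjection(std_areas))
    # Hypothetical analyte and standard areas from a single chromatogram.
    print("Titer (ug/mL equivalents):", titer_std_equivalents(50.0, 100.0))
```

The value returned is a concentration in the hexane extract; converting it to a culture titer would require the extraction volumes given earlier.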
Multiple sequence alignments were created for all cladograms using the MUSCLE algorithm in MEGA X. Following alignment, a maximum-likelihood tree was created in MEGA X with default settings. Tree visualization was carried out in RStudio using the ggtree package.
To align 3ClPro from SARS-CoV and SARS-CoV-2, EMBOSS Needle was used.
An approach for using genetically encoded systems was developed to guide the discovery of targeted, biologically active molecules in microbial hosts. The work began with the development of a bacterial two-hybrid (B2H) system that links the activity of protein tyrosine phosphatase 1B (PTP1B), an elusive drug target, to the expression of an antibiotic resistance gene in E. coli. This system was used to screen 29 terpenoid pathways and identified two inhibitors with surprising potency and binding modes. Building on these results, the same system was used to evolve a terpene synthase to confer a survival advantage in the presence of the PTP1B-focused B2H system; in this effort, mutants were identified that increase the production of total terpenes in E. coli and/or shift the product profile to enhance the titers of minor components. This study also revealed a previously unreported residue important for directing 6,11 ring closure during catalysis; removal of the hydroxyl functionality at this site yielded significant shifts in product profile toward bicyclic molecules. This work concluded with the development of two-hybrid systems that detect the activity of viral proteases; using these systems, a combinatorial screen of 74 biosynthetic pathways was carried out, identifying enzyme combinations that modulate the activity of each protease system in distinct ways. These findings demonstrate the compatibility of the two-hybrid architecture with other classes of disease-relevant enzymes.
The work with a PTP1B-specific bacterial two-hybrid motivates screens of other PTP-based systems. Combining the two-hybrid system harboring PTP1B with terpenoid pathways led to the discovery of molecules with surprising degrees of specificity against other phosphatases—a property that has eluded past drug development efforts. Motivated by these results, the approach showed that the two-hybrid system could incorporate other PTPs of medicinal relevance without further optimization and that the responses of these systems are consistent with the selectivity of biosynthesized inhibitors (e.g., the pathway for amorphadiene, which is a more potent inhibitor of PTP1B than TCPTP, conferred a better survival advantage alongside the PTP1B-specific B2H system than it did for the TCPTP-specific system). The approach envisioned using alternative PTP-specific B2H systems not only for identifying inhibitors of alternative PTPs, but also for carrying out high-throughput screens that enable the identification of metabolic pathways for selective inhibitors. A screen of biosynthetic libraries against multiple PTP-specific B2H systems, for example, should enable the identification of pathways that produce selective inhibitors.
Long isoforms of PTP1B and TCPTP (harboring disordered and/or hydrophobic domains) are also compatible with the B2H design (
Unfortunately, not all PTPs are easy to incorporate into the B2H design. For example, striatal-enriched phosphatase (STEP, a potential target for neurological diseases) and SHP2 (a validated cancer target) did not yield functional two-hybrid systems, most likely due to low activity against the MidT substrate (
The extension of the B2H screens to large numbers of pathways and protein targets will require enhanced throughput. Pooling of barcoded biosynthetic pathways followed by next-generation sequencing measurements of barcode abundance could improve screening efficiency. In a pilot experiment, (i) three isoprenoid pathways, (ii) 37 terpenoid pathways, and (iii) five PTP-specific B2H systems were combined in a single screen. The 555 possible combinations of these three sets of plasmids would be challenging to screen with drop-based plating (
This study demonstrated that the detection systems are compatible with large mutagenesis libraries, in addition to large pathway libraries. The first growth-coupled assay for the directed evolution of a terpene synthase toward improved production of a biologically active molecule was reported. This assay allowed thousands of enzyme variants to be screened on selective media to identify variants with improved titers, shifted product profiles, and lowered burdens on cell growth. Although growth-based selections are valuable for their throughput, their reliance on survival can be confounded by other fitness effects, such as the toxicity or metabolic burden of heterologously expressing many genes. Screening mutagenesis libraries of a poorly tolerated terpene synthase in E. coli against the PTP1B-based two-hybrid system yielded strains with improved growth that was partially independent of B2H modulation. Using T7 RNAP as the gene expressed by the B2H system, amplification systems were built that, following PTP1B inactivation, show large increases in fluorescent protein expression from a plasmid-encoded T7 promoter (
The two-hybrid architecture was modified to accommodate viral proteases. The resulting systems demonstrate how the original detection system can be extended to other drug targets. Although the targets were screened with the terpene synthase library, the structures of some previously reported protease inhibitors resemble those of other natural products (e.g., flavonoids or non-ribosomal peptides); incorporating pathways responsible for their production may also yield compounds with pharmaceutically relevant properties. To take better advantage of more natural product classes, hosts other than E. coli may be important. Organisms like those of the Streptomyces genus are capable of producing more complex molecules, and genome-minimized versions of certain species are available for heterologous biosynthesis with minimal background natural product production. Intriguingly, the RNA polymerase structure of Streptomyces coelicolor (previously engineered to produce non-native molecules) could be compatible with a bacterial two-hybrid system similar to the one developed herein. Specifically, the α-subunit (rpoA) shares ~60% sequence identity with E. coli's rpoA and is functional in both organisms. In E. coli, RpoA can play a role similar to that of rpoZ in the bacterial two-hybrid system without any genomic modification (rpoZ requires a scarless deletion). Initial systems will likely focus on detecting proteases or peptidases; several of these enzymes are known to express in the Streptomyces genus. The resulting systems can then be screened with biosynthetic pathways producing a wide range of natural product classes.
This example describes a B2H system that includes a protease recognition sequence in a linker that connects MidT to RpoZ (
This work begins with a B2H system that links the inactivation of PTP1B to the expression of a gene of interest (GOI). In this system, Src kinase phosphorylates a substrate domain, causing it to bind to a Src homology 2 (SH2) domain, and the substrate-SH2 complex activates transcription of the GOI. PTP1B dephosphorylates the substrate domain, preventing transcription; the inactivation of PTP1B reenables it. Protease-specific detection systems do not require phosphorylation, but it was speculated that the substrate-SH2 interaction could be modified to detect proteases through the addition of protease-specific cleavage sites.
It was determined how protease-specific cleavage sites affect B2H function. In brief, recognition sequences for 3CLpro and HIVpro were added to the linker that connects the substrate domain to RpoZ (the omega subunit of RNA polymerase); these sites were flanked with 0-4 alanine residues (which were speculated to modulate protease access); and the output afforded by active and inactive PTP1B was measured, as shown in
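The qualitative logic of the B2H system described above can be summarized as a truth table. The following Python sketch is a steady-state on/off simplification and not a quantitative model: it assumes that active PTP1B fully reverses Src phosphorylation and that cleavage of the linker between the substrate domain and RpoZ fully abolishes activation (the latter reflects the speculated protease extension). The function and parameter names are hypothetical.

```python
from itertools import product


def goi_transcribed(src_active: bool, ptp1b_active: bool,
                    linker_cleaved: bool = False) -> bool:
    """Qualitative on/off model of B2H output (steady-state simplification)."""
    # Src phosphorylates the substrate domain; active PTP1B removes the phosphate.
    substrate_phosphorylated = src_active and not ptp1b_active
    # Only the phosphorylated substrate binds the SH2 domain.
    sh2_bound = substrate_phosphorylated
    # Assumed protease extension: cleaving the substrate-RpoZ linker uncouples
    # SH2 binding from transcriptional activation.
    return sh2_bound and not linker_cleaved


if __name__ == "__main__":
    for src, ptp, cut in product([True, False], repeat=3):
        state = goi_transcribed(src, ptp, cut)
        print(f"Src={src!s:5} PTP1B={ptp!s:5} cleaved={cut!s:5} -> GOI={state}")
```

In this simplification, an inhibitor of PTP1B corresponds to setting `ptp1b_active` to False, and an inhibitor of the protease corresponds to setting `linker_cleaved` to False; either change can switch the GOI on.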
Next, the sensitivity of the luminescent systems to protease overexpression was assessed. Here, we used B2H systems modified to contain both (i) protease-specific cleavage sites flanked by 0- or 4-alanine segments and (ii) an inactive PTP1B. In brief, we transformed E. coli with two plasmid-borne modules—(i) a B2H system and (ii) a protease induction system (an arabinose-inducible protease)—and we monitored changes in luminescence caused by protease expression (e.g., arabinose titration;
Additional proteases and protease-specific cleavage sites were screened. In short, recognition sites were added for the papain-like protease of SARS-CoV-2 (PLpro), the NS2B/NS3 proteases of West Nile and dengue viruses (WNVpro and DVpro, respectively), and ubiquitin-specific protease 7 (USP7). These B2H systems were screened alongside the associated proteases (
Table 10 provides a non-limiting list of viral proteases. In this example, 30 viral proteases were considered on the basis that associated viruses contribute to viral diseases with significant unmet medical need, high epidemic potential, and/or relevance to US biodefense. These diseases are listed as (i) priority pathogens by the National Institute of Allergy and Infectious Diseases (NIAID) 19 and/or (ii) priority emerging infectious diseases by the World Health Organization (WHO) 20. The disclosed list of viral proteases in Table 10 includes 25 enzymes; each selected protein (or a close homologue) has at least one crystal structure and has been expressed in an active form in E. coli.
These proteases are considered for several reasons: (i) they may complement the modularity of the systems and methods disclosed (e.g., the platforms and/or workflows) for integrating new targets into the B2H system; (ii) data generated in screens may be used to prioritize hits based on unmet medical need, commercial opportunity, and molecular progressivity (e.g., ‘drug-likeness’ or synthetic tractability); and (iii) studying these proteases may reveal patterns of inhibitor specificity that could inform the design of broad-spectrum antivirals or shift focus away from non-selective inhibitors with potential toxicity issues.
To build survival-modulating systems for 3CLpro and HIVpro, two changes were made to a B2H system that includes a gene for spectinomycin resistance as the GOI: (i) a protease was swapped for PTP1B, and (ii) the best-performing cleavage site from our luminescence-based screen was added. To optimize these systems, ribosome binding sites (RBSs) with different translation initiation rates (TIRs) were screened, and RBSs that enhanced sensitivity to spectinomycin were selected.
The analysis of luminescence-based systems suggests that protein expression is an important adjustable parameter for B2H development. To sample different expression levels without adding an inducer, the ribosome binding site (RBS) calculator, developed by the Salis Lab 174, was used to design RBSs with different translation initiation rates (TIRs), and these sites were screened with drop-based plating. This screen allowed the identification of RBSs for 3CLpro and HIVpro that link protease inactivation to an increase in spectinomycin resistance (
B2H development was continued by focusing on PLpro. To reduce the cloning required to sample different RBSs, degenerate primers were used to screen a small (~200 member) library of TIRs spanning several orders of magnitude (50-100,000). This rapid screen uses drop-based plating to identify RBSs that confer sensitivity to spectinomycin (i.e., the approach assumes that a reduction in spectinomycin resistance reflects the expression of an active enzyme). Several “hits” (i.e., RBSs that confer sensitivity to spectinomycin) were found for two recognition sequences (
This example describes using microbial systems to guide the discovery and biosynthesis of natural products that inhibit therapeutic protease targets. One to three pathways that confer a survival advantage by producing inhibitors for each of two disease-relevant proteases may be used. By way of example, natural products were formed that inhibit 3CLpro, PLpro, HIV1pro, WNVpro, DVpro, and USP7.
Screening of protease inhibitors began by focusing on terpenoids. This class of natural products was chosen for a number of reasons: (i) Terpenoids include over 80,000 known compounds and represent nearly one-third of all characterized natural products (the basis of approximately 50% of FDA approved drugs); they define a rich molecular landscape for the discovery of bioactive molecules. (ii) Terpenoids can be synthesized and functionalized in E. coli. (iii) A docking study of 3CLpro suggests that it may bind to terpenoids. (iv) Many allosteric sites are only partially solvent exposed and include large nonpolar patches (it was hypothesized that these terpenoids, which are largely nonpolar, might be well suited for finding cryptic allosteric sites, the allosteric site on PTP1B providing a validating example).
Engineered microbial systems provide a powerful tool for screening genes for their ability to generate enzyme inhibitors. For example, most terpenoids are not commercially available, and even when their metabolic pathways are known, their biosynthesis, purification, and in vitro analysis is a resource-intensive process that is difficult to parallelize with existing methods. The B2H systems offer a potential solution: They can identify inhibitor-synthesizing genes with a simple growth-coupled assay. A PTP1B-specific B2H system was used to screen a diverse set of uncharacterized biosynthetic genes. Briefly, a bioinformatic analysis of the largest terpene synthase family (PF03936) was carried out by building and annotating a cladogram of its 4,464 constituent members; from here, three uncharacterized genes were synthesized from each of eight clades: six with no characterized genes and two with some characterized genes (
In a screen of over 70 terpenoid pathways, several pathways were identified to generate inhibitors of 3CLpro. In other applications, pathways that produce inhibitors of other target enzymes (e.g., HIVpro, PLpro, and USP7) may be used.
The library of biosynthetic pathways was expanded to include a larger set of terpenoid pathways, as well as pathways for non-ribosomal peptides (which include many potent, cell permeable protease inhibitors) and phenylpropanoids (which include inhibitors of flavivirus and coronavirus proteases). In brief, the isopentenol utilization pathway (IUP) was coupled with (i) two prenyltransferases (e.g., farnesyl pyrophosphate synthase [FPPS] or geranylgeranyl pyrophosphate synthase [GGPPS]) and (ii) 37 terpene synthases (e.g., the above 24 genes supplemented with 13 others known to generate structurally distinct products). This library includes 74 pathways and—as estimated—at least several hundred structurally distinct terpenoids (e.g., a single terpene synthase can generate as many as 50 products). IUP was chosen over the mevalonate-dependent pathway because it can generate terpenoids from a cheap precursor (e.g., isoprenol), rather than mevalonate; in liquid culture, it produced amorphadiene (C15) and abietadiene (C20) at titers of 1.88-15.05 mg/L and 121.16-1463.01 mg/L intracellularly (caryophyllene equivalents). These titers are sufficient for the intracellular detection of compounds with IC50s less than or equal to 440 μM (it was assumed that the intracellular concentration must be greater than or equal to the IC50).
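The comparison between titers and IC50 values above requires converting mass concentrations to molar concentrations. The following Python sketch shows that conversion under stated assumptions: the molecular formulas C15H24 (amorphadiene) and C20H32 (abietadiene) are standard values supplied here for illustration, the titers are treated as true mass concentrations even though they were reported as caryophyllene equivalents, and the function names are hypothetical.

```python
# Standard atomic masses (g/mol) used to build molar masses from formulas.
ATOMIC_MASS = {"C": 12.011, "H": 1.008}


def molar_mass(carbons, hydrogens):
    """Molar mass (g/mol) of a hydrocarbon with the given atom counts."""
    return carbons * ATOMIC_MASS["C"] + hydrogens * ATOMIC_MASS["H"]


def mg_per_l_to_um(titer_mg_per_l, mw_g_per_mol):
    """Convert a titer in mg/L to a concentration in micromolar."""
    return titer_mg_per_l / mw_g_per_mol * 1000.0


if __name__ == "__main__":
    compounds = {
        # name: (molar mass, low titer, high titer), titers in mg/L from the text
        "amorphadiene (C15H24)": (molar_mass(15, 24), 1.88, 15.05),
        "abietadiene (C20H32)": (molar_mass(20, 32), 121.16, 1463.01),
    }
    for name, (mw, low, high) in compounds.items():
        print(f"{name}: {mg_per_l_to_um(low, mw):.1f}-"
              f"{mg_per_l_to_um(high, mw):.1f} uM")
```

Under these assumptions, the lower abietadiene titer converts to roughly 445 μM, which is close to the 440 μM figure quoted above; the text does not state how that threshold was derived.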
The protease inhibitor discovery effort began by focusing on 3CLpro and HIVpro. For each target, the B2H system was used to assess the antibiotic resistance conferred by different pathways (
The large set of pathways identified in the screen of GGPP pathways against HIVpro was intriguing. This result was followed up by (i) measuring the inhibition of HIVpro by GGPP, a potential inhibitor common to all GGPP pathways, (ii) investigating the stress response associated with GGPP production, and (iii) attempting to stabilize HIV protease. HIVpro does not have a positively charged active site, so GGPP was not expected to inhibit it. It was hypothesized that a stress response was a more likely cause. Briefly, GGPP can slow the growth of E. coli, and a stress response might inactivate HIVpro, which is prone to aggregation. Quantitative proteomics will be performed to compare differences in protein levels between GGPPS-harboring and GGPPS-free strains of E. coli. Additionally, an attempt to stabilize HIVpro inside the cell by attaching it to fusion partners (e.g., thioredoxin and glutathione S-transferase) was performed; these fusion partners can improve the expression of active soluble protein in E. coli, and they do not interfere with inhibition because they are cleaved off by the protease in the cell.
Intriguingly, two diterpene synthases, O64405 and Q41594 (taxadiene synthase and abietadiene synthase, respectively), and one monoterpene synthase, UPI0018D1934E (1,8-cineole synthase), conferred resistance when paired with a sesquiterpene precursor. Previous biochemical studies of the two diterpene synthases have shown that they can act on FPP to produce bisabolene- and farnesene-type sesquiterpenes; however, the FPP activity of the monoterpene synthase was unexpected. This finding highlights the value of pairing terpene synthases, which are highly promiscuous, with nonnative precursors (a feat unachievable in screens of natural libraries).
The first screen was followed up by focusing on FPP pathways. In brief, two sets of experiments were performed: (i) drop-based plating to confirm the survival advantage conferred by each hit (i.e., the terpene synthase and associated precursor pathway), and (ii) 10-30 mL cultures to examine the product profiles of each hit. Intriguingly, all ten terpene synthase genes afforded a reproducible survival advantage, but many failed to generate terpenoids in liquid culture. This apparent discrepancy between the results of screens on solid media and terpenoid production in liquid culture may have resulted from differences in strains, precursor pathways, or culture conditions (see below). Nonetheless, for 3CLpro, the three pathways that conferred the greatest survival advantage generated α-bisabolol, β-bisabolene, or eucalyptol as major products (
To expand the molecular search space explored in our high-throughput screens, pathways were assembled for nonribosomal peptides and phenylpropanoids. These pathways facilitated the incorporation of heteroatoms (e.g., oxygen, nitrogen, and halogens) at early stages of inhibitor biosynthesis (for terpenoid pathways, the first cyclic molecule is typically a hydrocarbon scaffold). Both sets of molecules also include numerous potent, cell-permeable protease inhibitors (including inhibitors of 3CLpro).
Plasmid-borne biosynthetic routes were chosen for building each new class of natural product. For nonribosomal peptides, nonribosomal peptide synthetases (NRPSs) are identified in bioinformatic analyses of large genomic databanks (e.g., antiSMASH or the NIH Human Microbiome Project). NRPSs are assembly-line enzymes encoded by large gene clusters; they are compatible with expression in E. coli. For phenylpropanoids, one or two plasmids encoding 1-7 bacterial and/or plant genes that convert L-tyrosine or L-phenylalanine to different products were used. Unlike NRPSs, these pathways include discrete enzymes that can be reconfigured to produce different products via combinatorial biocatalysis. Altogether, the plan was to build eight NRPSs and fourteen phenylpropanoid genes that, in various combinations, should generate over 40 distinct products.
Two carboxylic acid reductases were chosen to study in detail: GupB and Nterp. These enzymes activate two L-tyrosine molecules and reduce them to amino aldehydes, which react to form an unstable imine product that generates a dipeptide pyrazine core (
Pathways were assembled for a structurally diverse set of compounds produced from L-phenylalanine or L-tyrosine (
This example describes using kinetic assays, X-ray crystallography, and in vitro cell studies to characterize new protease inhibitors. Detailed biochemical studies of inhibitors will inform compound optimization efforts that focus on improving potency, solubility, and other drug-like properties. Crystallographic data and cell-based studies of one or more inhibitors may demonstrate a potency supportive of compound optimization (e.g., IC50<5 μM).
Terpenoid biosynthesis was scaled up by coupling large-scale liquid cultures with flash chromatography. Amorphadiene (an early indication of an inhibitor of PTP1B) was produced at titers greater than 200 mg/L from shake flasks, with complete purification (>95% purity) within one week.
Kinetic characterization of α-bisabolol suggested that it was an inhibitor of 3CLpro.
Recombinant 3CLpro was produced in a lab to grow crystals of this protein. X-ray diffraction data were obtained to help complete structural refinement. A 2.1-Å crystal structure of 3CLpro was obtained. Co-crystallization and ligand soaking are used to prepare crystals of the protein-ligand complex. Both approaches have been used in the past, but co-crystallization may be more effective for α-bisabolol, which is nonpolar and may have trouble diffusing to the active site without disrupting the crystal. Crystals of the protein-inhibitor complex may help resolve the mode of inhibition. Proteomics experiments may be performed to test a hypothesis that α-bisabolol forms a covalent complex with the catalytic cysteine of the specific binding site for α-bisabolol.
This example describes a prophetic example for designing B2H systems that incorporate various proteases, analogous to the systems disclosed in Example 4: “B2H System with a Protease Recognition Sequence in the Linker”. Two elements are integrated into the B2H systems: (i) new protease-specific cleavage sites, and (ii) new viral proteases. Luminescent systems are used, which allow the assessment of both the influence of new cleavage sites on B2H function and the susceptibility of these sites to proteolysis. Starting with functional and operable systems (for example, systems disclosed elsewhere in the present application), the luminescence gene is swapped with a gene for antibiotic resistance. Problematic designs will be screened with alternative RBSs, cleavage sites, and protease expression strategies (e.g., chaperones and/or partial truncations).
It is assessed whether proteolysis of the MidT-RpoZ fusion inhibits B2H activation. The protein-protein interaction that controls expression of the GOI occurs between (i) the kinase substrate (MidT), which is fused to the omega subunit or portions thereof of the RNA polymerase (MidT-RpoZ), and (ii) an SH2 domain, which is fused to 434cI, as shown in
It is assessed whether native E. coli proteases act on the protease recognition sequences. E. coli has native proteases that could act on the cleavage sites present in our B2H systems. This interaction was not observed in B2H systems so far. Without being bound to a particular theory, the lack of observation regarding this effect may be because of the uniqueness of the chosen sites. If the interaction is observed in experimental systems described in this example, alternative protease-specific cleavage sites may be screened or evolved. For the screening methodology, 3-5 alternative sites may be identified from literature. For the evolution methodology, B2H systems that (i) lack a target protease, (ii) contain SpecR as the GOI, and (iii) include sequences with alternative residues flanking the cut site, will be used. First, B2H systems will be screened for growth on spectinomycin to identify sequences that are stable in E. coli. Then, these hits will be paired with target protease in a luminescence-based screen (such as the one depicted in
This example describes a prophetic example using DNA barcoding and next-generation sequencing to parallelize screens of multiple targets and pathways. This example discloses (i) screening ten targets against 100 pathways in a single experiment, and (ii) a set of three potent inhibitors (e.g., IC50<10 μM) for each of five viral proteases.
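One way to read out a pooled, barcoded screen is to compare barcode frequencies before and after selection. The following Python sketch computes a log2 enrichment score per barcode from sequencing read counts. It is a generic illustration and not the analysis pipeline of the present disclosure; the pseudocount, function name, barcode labels, and read counts are all hypothetical.

```python
import math


def barcode_enrichment(pre_counts, post_counts, pseudocount=1.0):
    """Return log2(post-selection frequency / pre-selection frequency)
    for every barcode seen in either sample.

    A pseudocount is added to each count so that barcodes missing from
    one sample do not cause division by zero."""
    barcodes = sorted(set(pre_counts) | set(post_counts))
    pre_total = sum(pre_counts.get(b, 0) + pseudocount for b in barcodes)
    post_total = sum(post_counts.get(b, 0) + pseudocount for b in barcodes)
    scores = {}
    for b in barcodes:
        pre_freq = (pre_counts.get(b, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(b, 0) + pseudocount) / post_total
        scores[b] = math.log2(post_freq / pre_freq)
    return scores


if __name__ == "__main__":
    # Hypothetical read counts; each barcode stands for one target/pathway pair.
    before = {"target1|pathwayA": 99, "target1|pathwayB": 99}
    after = {"target1|pathwayA": 399, "target1|pathwayB": 99}
    ranked = sorted(barcode_enrichment(before, after).items(),
                    key=lambda item: item[1], reverse=True)
    for barcode, score in ranked:
        print(f"{barcode}: {score:+.2f}")
```

Positive scores indicate barcodes that became more abundant under selection (candidate hits); comparing the score of one pathway across several target-specific systems gives a first indication of selectivity.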
Natural products represent a longstanding source of pharmaceuticals and medicinal preparations. Without being bound to a particular theory, natural products, as a result of their biological origin, tend to exhibit favorable pharmacological properties (e.g., bioavailability and “metabolite-likeness”) and exert a striking variety of therapeutic effects (e.g., analgesic, antiviral, antineoplastic, anti-inflammatory, immunosuppressive, and immunostimulatory). This example describes adding new targets and pathways and enhancing the throughput of screens.
Disclosed herein is a broad set of modular metabolic pathways for terpenoids, nonribosomal peptides, and phenylpropanoids. These classes include some natural products and protease inhibitors. Also disclosed herein is a library that includes (i) 40 terpene synthases (each of which can be paired with one of three precursor pathways), (ii) 8 nonribosomal peptide synthetases (NRPSs, which can be reconfigured to generate alternative products), and (iii) 23 phenylpropanoid-generating enzymes (e.g., three precursor-enzymes, nine phenylpropanoid enzymes, and 10 tailoring enzymes). This set includes over 100 biosynthetic pathways. The precise number and diversity of possible products is difficult to quantify (a single terpene synthase can produce over 50 terpenoids), but 1,000 is a conservative estimate. This number may seem small in comparison to drug discovery campaigns that begin with libraries having ten million molecules (or more). However, such libraries are influenced by historical successes and failures, include only a fraction of potential biologically active molecules, and are typically whittled down to libraries of ~10,000 likely inhibitors early in the discovery process. The libraries of the present disclosure include a unique set of molecules that are both (i) absent in contemporary libraries (even existing libraries of natural products include molecules pre-optimized by living systems for their own ends) and (ii) biased to be biologically active (e.g., living systems typically use these classes of molecular structures for defense and inter-species signaling). The ability of the systems and methods disclosed herein to find a novel inhibitor of 3CLpro, one of the most widely screened enzymes in the world, highlights the advantages of our approach, even when used with relatively small libraries as disclosed herein.
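The pathway counts above follow from simple combinatorics. The following Python sketch enumerates the terpenoid portion of the library (40 terpene synthases, each paired with one of three precursor pathways) and, for comparison, recounts the pilot screen described elsewhere herein (three isoprenoid pathways, 37 terpenoid pathways, and five PTP-specific B2H systems). The component names are placeholders, and the helper function is hypothetical.

```python
from itertools import product


def enumerate_combinations(*component_sets):
    """Return every combination that takes one member from each component set."""
    return list(product(*component_sets))


# Placeholder names; real gene and plasmid identifiers would replace these.
terpene_synthases = [f"TS{i:02d}" for i in range(1, 41)]      # 40 synthases
precursor_pathways = [f"precursor_{i}" for i in range(1, 4)]  # 3 precursor pathways

terpenoid_pathways = enumerate_combinations(precursor_pathways, terpene_synthases)
print(len(terpenoid_pathways))  # 120

# Pilot screen: 3 isoprenoid pathways x 37 terpenoid pathways x 5 B2H systems.
pilot = enumerate_combinations(range(3), range(37), range(5))
print(len(pilot))  # 555
```

The terpenoid pairings alone give 120 pathways, which is consistent with the statement that the set includes over 100 biosynthetic pathways; the NRPS and phenylpropanoid pathways add to that total.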
Large-Scale Screens with Many Targets and Many Pathways.
A high-throughput method for screening many targets and pathways in parallel accelerates the discovery of early hits and provides insights about hit selectivity and off-target activity before kinetic assays. The following describes an approach for combining at least ten targets and 100 metabolic pathways into a single screen, as shown in
Enzymes from secondary metabolism facilitate evolutionary adaptation by enabling rapid changes in enzyme function; a single mutation can dramatically alter their substrate specificities and product profiles. This example describes using directed evolution and combinatorial biosynthesis to diversify biosynthetic pathways and broaden early screens (e.g., a barcode could represent a collection of mutated pathways).
Terpenoid pathways are diversified by using directed evolution. For background, the active sites of terpene synthases contain constellations of amino acids that guide catalysis by controlling the conformational space and solvation environment available to reacting substrates. These attributes are modified by using (i) random mutagenesis and (ii) site-saturation mutagenesis (SSM). For SSM, poorly conserved residues located near (<8 Å) the active site are mutated. Resulting mutant libraries will be screened in six steps: (i) The mutant libraries are transformed into B2H-containing E. coli cells. (ii) The transformed cells are plated on solid media with different concentrations of spectinomycin. (iii) Colonies are picked that grow on plates with concentrations of spectinomycin at which the wild-type enzymes do not permit growth. (iv) The terpene synthase genes are sequenced. (v) All hits are verified, and potential background mutations are removed by reintroducing the associated mutations into the starting (e.g., non-surviving) pathway, and by carrying out drop-based plating to retest resistance. (vi) The final products are analyzed, purified, and tested as described above. This effort will focus on γ-humulene synthase and epi-isozizaene synthase, which produce many products,
Non-ribosomal peptide and phenylpropanoid pathways are diversified by using domain shuffling and combinatorial biosynthesis. Both efforts focus on the incorporation of non-native tailoring enzymes (e.g., halogenases and cytochrome P450s). Note: Cytochrome P450s, which are membrane-bound, can be challenging to express in bacterial systems. Eukaryotic P450s are expressed in bacterial hosts by, for example, engineering the N-terminal transmembrane helix or co-expressing an appropriate reductase enzyme.
Compounds generated by pathways that confer a survival advantage are identified and purified by using any of the relevant methods or systems disclosed herein. Briefly, flash chromatography and high-performance liquid chromatography (HPLC) are used to purify compounds, and gas chromatography-mass spectrometry (GC-MS) and nuclear magnetic resonance (NMR) spectroscopy are used to identify them. Note: In some cases, compounds may be identified by sampling crude extract (i.e., identification is not dependent on purification).
Potential inhibitors are characterized, and their mode of inhibition is investigated by combining in vitro kinetic assays, X-ray crystallography, and mutational analyses. Briefly, viral proteases are expressed, purified, and crystallized with methods (Table 10). IC50 curves are constructed, and crystallographic and mutational studies are conducted on verified inhibitors.
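As an illustration of how an IC50 curve might be constructed from in vitro activity measurements, the following Python sketch fits a simple Hill-type inhibition model by a least-squares grid search over log-spaced candidate IC50 values. The model (fractional activity = 1 / (1 + ([I]/IC50)^h) with a fixed Hill coefficient), the grid-search fitting method, the function names, and the data are assumptions chosen for illustration. The data are synthetic values generated from the model itself and are not experimental results.

```python
def fractional_activity(inhibitor_um, ic50_um, hill=1.0):
    """Fraction of uninhibited enzyme activity remaining at a given
    inhibitor concentration (assumed Hill-type model)."""
    return 1.0 / (1.0 + (inhibitor_um / ic50_um) ** hill)


def fit_ic50(concs_um, activities, lo=1e-3, hi=1e4, steps=2000, hill=1.0):
    """Return the candidate IC50 (uM) on a log-spaced grid between `lo` and
    `hi` that minimizes the sum of squared residuals."""
    best_ic50, best_sse = None, float("inf")
    for k in range(steps + 1):
        ic50 = lo * (hi / lo) ** (k / steps)
        sse = sum((fractional_activity(c, ic50, hill) - a) ** 2
                  for c, a in zip(concs_um, activities))
        if sse < best_sse:
            best_ic50, best_sse = ic50, sse
    return best_ic50


if __name__ == "__main__":
    # Synthetic, noise-free data generated with an IC50 of 5 uM.
    concentrations = [0.1, 1.0, 3.0, 10.0, 30.0, 100.0]
    activities = [fractional_activity(c, 5.0) for c in concentrations]
    print(f"fitted IC50: {fit_ic50(concentrations, activities):.2f} uM")
```

A grid search is used here only to keep the example dependency-free; a nonlinear least-squares routine that also fits the Hill coefficient and the upper and lower plateaus would be a natural replacement for real data.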
On-target activity is assessed by using cell-based assays. A wide range of antiviral assays may be employed (e.g., assessment of microscopic cytopathic effects and plaque reduction neutralization tests).
It is assessed if the number of metabolic pathways is sufficient to generate protease inhibitors. It is difficult to estimate the library size required to find an inhibitor of a given target a priori, given the importance of compatibility between library diversity and target structure. A library of the present disclosure produced at least one novel inhibitor of 3CLpro, and the growing collection includes NRPSs that generate peptide aldehydes, a class of molecules that includes potent (IC50 ~10 nM) inhibitors of serine and cysteine proteases. A peptide aldehyde served as the basis for bortezomib, an FDA-approved proteasome inhibitor. If initial screening efforts do not yield potent protease inhibitors, the library of biosynthetic pathways may be expanded by adding new genes.
It is assessed if some pathways generate too many products. Highly promiscuous terpene synthases can generate many products, but some tend to synthesize only 2-3 major ones (~50-75% of total). Some examples are δ-selinene synthase and γ-humulene synthase, which convert farnesyl pyrophosphate into ~30 and ~50 detectable products, respectively, but only three major products. Inhibitors are isolated from mixtures by using dereplication methods with proteases and phosphatases, which are disclosed elsewhere in the present application.
Strategies for finding inhibitors with improved potencies are assessed. Pathways that generate 1-200 mg/L of natural products are used. At the higher titers, pathways could produce weak inhibitors at sufficient quantities to inhibit target proteases inside the cell. To improve the stringency of the screen, inducer or precursor concentrations can be lowered to reduce inhibitor biosynthesis during drop-based plating. This condition biases the search toward potent inhibitors that function at low concentrations.
Natural products are sometimes considered difficult starting points for pharmaceutical development, in some cases because of their limited natural availability and high synthetic complexity (e.g., compounds with multiple stereocenters). This example describes an approach for identifying molecules having improved potency and drug-like properties over α-bisabolol (and, perhaps, over other 3CLpro inhibitors identified) for the treatment of COVID-19. Contemplated herein is an inhibitor having an IC50 of <100 nM. This IC50 can be sufficient for some animal studies and represents a 30-fold improvement over the initial IC50. This example also describes a workflow for the (bio)synthetic optimization of hits identified with a platform disclosed herein. This work seeks to develop a general (bio)synthetic workflow for progressing early-stage hits into drug-like compounds.
Coronaviruses contain a single-stranded positive-sense RNA genome encased in a membrane envelope, as illustrated in
The SARS-CoV-2 genome encodes 16 non-structural proteins, 9 accessory factors, and 4 structural proteins. As illustrated in
As disclosed elsewhere herein, α-bisabolol was found to inhibit 3CLpro. Briefly, α-bisabolol is an unusual (and, possibly, covalent) inhibitor, whereas some of the other 3CLpro inhibitors in clinical development are peptide mimics. A crystal structure of 3CLpro bound to α-bisabolol will be collected, and α-bisabolol will be evaluated for off-target activity against human cysteine proteases (e.g., Cathepsin L and B) and tested for cell-based antiviral activity with IAR.
Clinical candidates may be developed in stages: (i) hit identification, (ii) hit-to-lead optimization, and (iii) lead optimization (in broad terms). A promising hit often possesses at least five features to enter hit-to-lead optimization: (i) single-digit micromolar potency or less, (ii) a crystal structure of the protein-inhibitor complex to guide synthetic chemistry, (iii) a strategy for chemical functionalization, (iv) a readily synthesizable core structure, and (v) several alternative structures as backups. Features (iii)-(v) may hinder the progression of natural products into promising candidates; natural products can have low natural titers and complex chemistries.
Though α-bisabolol is commercially available, most structural variants are not. Some examples of structural variants are shown in
Synthetic chemistry and enzymatic functionalization are combined to improve the potency and drug-like properties of potent cores, starting with α-bisabolol. Inspired by plant secondary metabolism, cytochrome P450 enzymes will be used to selectively hydroxylate unactivated carbon-hydrogen bonds. An enzyme panel that includes human, insect, and plant P450s is screened. Note: Human P450s can selectively functionalize (+)-epi-α-bisabolol and likely accept α-bisabolol as a substrate. For the screening effort, three strategies are pursued: (i) a B2H screen, (ii) whole cell biocatalysis, and (iii) biocatalysis with purified membrane fractions. The first approach could identify potency-enhancing functional groups in growth-coupled assays (e.g., selection); the latter two will use GC-MS and LC-MS to identify functionalized molecules. This work may resolve enzymatic schemes for the diversification and optimization of terpenoids.
Improved inhibitors (e.g., more potent or more soluble inhibitors) will be characterized with cell-based assays and in vitro absorption, distribution, metabolism, and excretion (ADME) studies. Briefly, cell-based assays and ADME studies may be conducted.
Upon identifying an inhibitor with a potency less than 100 nM, an animal study is used to evaluate bioavailability, pharmacokinetics, and basic toxicity.
In some embodiments of high-throughput screens, S1030 cells are transformed with a B2H system, an isopentenol utilization pathway, and a terpene synthase, and are grown on solid media (LB agar plates). The S1030 cells lacked the RpoZ subunit of RNA polymerase; the isopentenol utilization pathway allowed for modulation of terpenoid production by controlling the concentration of isoprenol, an essential precursor; and the LB agar improves the stringency of the screen. To follow up on interesting hits, DH5α or DH10B cells were transformed with a complete mevalonate-dependent isoprenoid pathway (e.g., one that generates FPP or GGPP from acetyl-CoA) and a terpene synthase and were grown in 0.01-1 L of liquid TB media. The complete isoprenoid pathway reduces the cost of high-titer expression by avoiding the use of exogenously added substrates, and the strains (e.g., DH5α and DH10B) and media improve titers. Moving forward, an intermediary step may be added in which the terpenoid profile will be analyzed directly from the solid media (e.g., the screen).
A procedure for testing solid media was developed: A small section of the agar was removed, cells were lysed, and hexane overlay was used to extract a sample for GC-MS; this approach yielded detectable terpenoids (
The screening approach allows identifying pathways that confer a survival advantage in the presence of a B2H system. This advantage may result from target inhibition, but this connection may be tested in additional ways. In vitro kinetic assays can be used to carry out such tests.
In brief, 3CLpro was expressed with both (i) an N-terminal GST tag (which is cleaved by the protein during expression) and (ii) a C-terminal polyhistidine tag (which facilitates purification with nickel-affinity chromatography). A precision protease was used to remove the C-terminal tag prior to anion exchange. This protocol yielded titers of purified protein (~10 mg/L) sufficient for in vitro kinetic assays and X-ray crystallography. Notably, similar protocols, which yield proteins without expression artifacts (e.g., tag or linker), are compatible with the other target proteases explored in this proposal.
Purified terpenoids were produced by coupling large-scale liquid cultures with chromatographic separation. 1-L liquid cultures in high-yield flasks were cultured for 2-4 days, after which (i) a hexane overlay was used to extract all terpenoids, (ii) vacuum liquid chromatography was performed for initial separation, and (iii) flash chromatography (normal or reverse phase silica) was performed to isolate products of interest. At various steps in this process, 1H NMR, GC-MS, and thin layer chromatography (TLC) were performed to monitor purification. Analysis of amorphadiene highlights the results afforded by these methods (
Förster resonance energy transfer (FRET) peptides provide a facile means of assaying protease inhibitors. These peptides contain a fluorophore and a quencher separated by a protease recognition domain; peptide cleavage separates the fluorophore and quencher and increases fluorescence.
For 3CLpro, a commercially available substrate was used: Mca-AVLQSGFRK(Dnp)K (SEQ ID NO: 1), where Mca ((methoxycoumarin-4-yl) acetyl) was the fluorophore and 2,4-dinitrophenyl (Dnp) was the quencher. Using this assay, inhibition by eucalyptol and α-bisabolol (
The inhibitory mechanisms of newly discovered hits are analyzed by collecting X-ray crystal structures of protein-inhibitor complexes. These structures can reveal (i) the contribution of protein-ligand contacts (hydrogen bonds, halogen bonds, and van der Waals contacts) to differences in binding affinity and (ii) new modes of covalent inhibition.
Biostructural analyses began with 3CLpro. The asymmetric unit of this enzyme contains one polypeptide, which associates with another polypeptide across a crystallographic two-fold axis of symmetry. Inhibitors of 3CLpro can be co-crystallized or soaked into ligand-free crystals. This enzyme has 3 domains: domain I (8-101), domain II (102-184), and domain III (201-303). The Cys-His catalytic dyad and substrate-binding cleft sit between domains I and II. Previous covalent inhibitors of 3CLpro have involved the formation of a covalent adduct with the catalytic cysteine (C145); however, unlike N3, a well-characterized covalent inhibitor, α-bisabolol is not a Michael acceptor. A variety of crystallization buffers were screened and a structure of 3CLpro was collected, shown in
Cell-based assays of 3CLpro inhibitors are described herein. The workflow begins with a plaque assay, which quantifies the plaques formed in cell culture upon infection with serial dilutions of a virus and is the standard methodology for quantifying concentrations of replication-competent lytic virions. Neutral red is used to stain monolayers of mammalian cells to look for differences in plaque formation associated with α-bisabolol treatment. Next, real time quantitative PCR (RT-qPCR) is used to measure viral yield reduction in cells treated with α-bisabolol. For both studies, Vero E6 cells are used.
Cell-based assays for USP7 inhibitors are described herein. For background, USP7 is an important regulator of MDM2, an E3 ligase that promotes proteasomal degradation of the tumor suppressor p53. Human colon cancer cells (HCT 116) are treated with increasing concentrations of inhibitor and lysed, and a ubiquitin-propargylamine (Ub-PA) probe is used to measure on-target engagement (e.g., this probe should compete with the inhibitor for binding to USP7 but not USP47 or other off-target USPs). Next, a similar experiment is performed, but Western blots are used to examine the influence of inhibitors on downstream signaling targets. In particular, a concentration-dependent decrease in MDM2 and increase in p53 and p21 will be examined. This analysis will help to establish on-target activity in mammalian cells.
A bacterial two-hybrid (B2H) system in which a phosphorylation-mediated binding event activates transcription of a GOI (as shown with respect to
A general architecture to detect protease inhibitors was selected. Two protein fusions formed the core of the base B2H: (i) a Src homology 2 (SH2) domain fused to the cI repressor, and (ii) a kinase substrate domain (MidT) fused to the omega subunit of RNA polymerase (RpoZ). Src-mediated phosphorylation of the substrate domain allowed the substrate domain to bind to the SH2 domain, and the resulting substrate-SH2 complex activated transcription of the GOI by localizing RNA polymerase to its promoter; PTP1B-mediated dephosphorylation of the substrate, in turn, prevents activation. 3CLpro overexpression was found to reduce GOI transcription (as shown with respect to
A fusion of a protease recognition (PR) site to the substrate-RpoZ fusion by adding PR sites for HIVpro and 3CLpro, each flanked by 0-4 alanine residues (as shown with respect to
Luminescent B2H systems were used to screen different combinations of proteases and protease-specific cleavage sites (as shown with respect to
Single-plasmid B2H systems were constructed by making three modifications to the base system (i) exchanging the gene for PTP1B with protease genes, (ii) adding the PR sites selected in the luminescent screen, and (iii) exchanging the luciferase gene (LuxAB) for a spectinomycin resistance gene (SpecR). The B2H for USP7 worked immediately (
The B2H systems of example 13 were evaluated to find unexpected inhibitors by using them to screen terpenoid pathways, as the pathways generate mixtures of products that are challenging to purify. As nonpolar molecules, terpenoids are also scaffolds for building protease inhibitors, which are typically peptide mimics. Briefly, each terpenoid pathway was assembled with two plasmid-borne modules, (1) pIUP, which converts isoprenol to farnesyl pyrophosphate (FPP), and (2) pTS, which encodes a terpene synthase (TS, as shown with respect to
The initial hits were refined through two steps. First, product profiles in liquid culture were examined by pairing each TS with a plasmid harboring the mevalonate-dependent isoprenoid pathway from Saccharomyces cerevisiae (pAM45), which pathway afforded high titers of sesquiterpenes in E. coli and, thus, facilitated TS characterization. Of nine initial hits, five generated products detectable with GC-MS (with respect to
In vitro kinetic assays allowed for examination of the inhibitory effect of α-bisabolol. The small amount of α-bisabolol produced by Q41594 in liquid culture was difficult to purify, so three commercially available diastereomers ((−)-α-bisabolol, (+)-α-bisabolol, and (+)-epi-α-bisabolol) were tested. All three diastereomers had similar IC50s, which ranged from 30±5 μM to 80±37 μM, a range consistent with the Kis of compounds identified with previous genetic screens (regarding
TSs may exhibit different toxicities in E. coli, even in the absence of isoprenoid pathways and/or active B2H systems, which may be a result of differences in protein expression or solubility. To evaluate the contribution of TS toxicity to the fitness advantage conferred by Q41594, E. coli was transformed with plasmids harboring Q41594, E3W205, and O64405 (e.g., a hit and two non-hits), and the transformed strains were grown in liquid culture (regarding
A handful of well-characterized TSs produce α-bisabolenes. These enzymes were used to carry out a systematic analysis of the link between α-bisabolol production and antibiotic resistance. In a B2H screen of seven additional TSs, two α-bisabolol producers emerged as hits: (i) A0A1L7NYG3, a (+)-α-bisabolol synthase from Artemisia kurramensis, and (ii) J7LH11, a (+)-epi-α-bisabolol synthase from Phyla dulcis (
In vitro analysis was completed by examining the inhibitory effects of three other bisabolenes produced by TSs included in the screen (as shown with respect to
Materials for carrying out the methods as described in Examples 13-14 may include M9 minimal salts, tris(2-carboxyethyl) phosphine (TCEP), bovine serum albumin (BSA), phenylmethylsulfonyl fluoride (PMSF), 3-methyl-2-buten-1-ol (prenol), dimethyl sulfoxide (DMSO), isopropyl-D-thiogalactopyranoside (IPTG), (−)-α-bisabolol, 3CLpro fluorogenic peptide substrate (TSAVLQ_AFC), 7-Amino-4-trifluoromethylcoumarin (AFC), BugBuster® 10× Protein Extraction Reagent, Steriflip filters, and ACS grade hexane from Millipore Sigma; glycerol and lysozyme from VWR; deuterated chloroform from Cambridge Isotope Laboratories (99.8% D); cloning reagents from New England Biolabs; BL21 (DE3) pLysS competent cells from Novagen; pGEX-4T-1 GST vector from GenScript; 2.5-liter Ultra Yield Flasks from Thomson Instrument Company; antibiotics, media components, pre-made HEPES buffer (1 M pH 7.3), and Human Rhinovirus (HRV) 3C protease from Thermo Fisher; lysozyme from Thermo Scientific; imidazole from Teknova; 30-kDa Spin-X UF spin columns from Corning; HisTrap HP and HiTrap Q-HP columns from Cytiva; glycerol, bacterial protein extraction reagent II (B-PERII), and lysozyme from VWR; and (+)-α-bisabolol, (+)-epi-α-bisabolol, (−)-β-bisabolol and (−)-β-bisabolene from Toronto Research Chemicals. A vanillin-sulfuric acid solution was prepared by adding 7 g of vanillin and 1.3 mL of concentrated H2SO4 to 200 mL of methanol for TLC visualization. Certain bacterial strains were also used in the methods, such as E. coli. Chemically competent NEB Turbo cells were used for molecular cloning, chemically competent or electrocompetent S1030 cells (Addgene #105063) for luminescence studies and drop-based plating, DH5α for terpenoid production, and E. coli NEB BL21 (DE3) for protein overexpression.
Chemically competent cells were generated in six steps: (i) cells were plated on LB agar plates with the requisite antibiotics (listed in
Electrocompetent cells were generated by following an approach similar to the one above. In step iv, the cells were resuspended in 1 mL of ice-cold Milli-Q water, then recentrifuged and resuspended in sterile ice-cold 20% glycerol twice. The pellets were frozen as before.
Luminescence assays were carried out in seven steps: (i) S1030 cells were transformed with protease-free B2H systems with and without pBad plasmids listed in
The spectinomycin resistance of B2H-containing strains was examined through the following steps: (i) S1030 cells were transformed with pIUP_FPP and variants of pTS and pB2H (Table S2), and the transformed cells were plated on LB agar supplemented with antibiotics for plasmid maintenance (50 μg/ml kanamycin, 100 μg/ml carbenicillin, 10 μg/ml tetracycline, and 34 μg/ml chloramphenicol) and grown overnight (37° C.); (ii) single colonies were used to inoculate 1 mL TB (pH=7.0, supplemented with plasmid antibiotics), and the cells were grown overnight (37° C. and 225 rpm); (iii) an aliquot of each culture was diluted 1:100 in TB (as above), and 3 μL of the dilution was plated on LB agar plates (pH=7.0) supplemented with 10 mM isoprenol, 50 μM IPTG, 20 mL/L glycerol, antibiotics for plasmid maintenance (as above), and varying concentrations of spectinomycin, unless otherwise specified in the figures; and (iv) the cells were grown at 22° C. for 48-72 hours before they were photographed.
Small-scale terpenoid production was carried out in TB (pH=7.0) supplemented with antibiotics (
Terpenoids generated in liquid culture were measured with a gas chromatograph/mass spectrometer (GC-MS; a Trace 1310 GC fitted with a TG5-SilMS column, 15 m×0.25 mm, film thickness 0.25 μm and an ISQ 7000 MS; Thermo Fisher Scientific). All samples were prepared in hexane, and highly concentrated samples were diluted 10-20 times prior to analysis to bring concentrations within the MS detection limit. For full scans, the following GC method was used: hold at 40° C. (1 min), increase to 250° C. (30° C./min), hold at 250° C. (10 min). For the select-ion scans (SIM;
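As a small worked check on the full-scan oven program above, the total run time follows from the two holds and the single ramp. The sketch below is illustrative only; the function and parameter names are hypothetical and are not part of the instrument software.

```python
def oven_program_minutes(start_c, start_hold_min, ramp_c_per_min,
                         final_c, final_hold_min):
    """Total run time (min) of a single-ramp GC oven program."""
    if ramp_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    # Time spent ramping from the start temperature to the final temperature.
    ramp_min = (final_c - start_c) / ramp_c_per_min
    return start_hold_min + ramp_min + final_hold_min


# Full-scan method from the text: 40 C (1 min), 30 C/min to 250 C, 250 C (10 min).
full_scan_minutes = oven_program_minutes(40, 1, 30, 250, 10)
print(full_scan_minutes)  # 18.0
```

The ramp covers 210° C. at 30° C./min (7 min), so the full-scan method runs for 18 min per injection.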
To quantify α-bisabolol and β-bisabolene, GC/MS standard curves were built from structurally similar molecules. Store-bought (−)-α-bisabolol was used, and, in the absence of a highly pure analytical standard of β-bisabolene, α-bisabolene isolated from bacterial cultures was used. A series of stocks of both standards in hexane was created and analyzed with GC/MS as outlined above.
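A standard curve of this kind can be sketched as an ordinary least-squares line relating peak area to standard concentration, which is then inverted for unknown samples. The concentrations and peak areas below are hypothetical placeholders (not measured data), and a linear detector response over the calibrated range is assumed.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_area(area, slope, intercept):
    """Invert the standard curve to estimate concentration from peak area."""
    return (area - intercept) / slope


# Hypothetical standards (ug/mL in hexane) and integrated peak areas.
standards_ug_per_ml = [1.0, 5.0, 10.0, 50.0, 100.0]
peak_areas = [21000.0, 101000.0, 201000.0, 1001000.0, 2001000.0]

slope, intercept = fit_line(standards_ug_per_ml, peak_areas)
unknown_ug_per_ml = concentration_from_area(410000.0, slope, intercept)
print(round(unknown_ug_per_ml, 2))
```

Samples whose peak areas fall outside the range spanned by the standards would need dilution before the curve is applied, consistent with the 10-20 fold dilution of concentrated samples described above.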
α-bisabolene was produced by carrying out the following steps: (i) E. coli DH5α was transformed with pAM45 and pTS containing α-bisabolene synthase (UniProt ID: O81086), and individual colonies were used to inoculate six 20-mL starter cultures (TB, pH=7.0, supplemented with plasmid antibiotics); (ii) each starter culture was used to inoculate a 50-mL culture (e.g., a 1:50 dilution in TB, pH=7.0), which was grown to an OD of 0.3-0.6 (37° C., 225 rpm), induced with 500 μM IPTG, and then grown for 144 hours (22° C., 225 rpm); (iii) the six 50-mL cultures were combined with 90 mL hexanes and agitated at room temperature for 30 minutes (vortexer); (iv) a separatory funnel was used to remove the hexanes, which were added to a 500-mL centrifuge tube and spun at 4000 rpm for 20 minutes; (v) the supernatant was moved to a round bottom flask, and the hexanes were evaporated under vacuum to produce crude oil; (vi) 71.4 mg of crude oil was loaded onto a 5-g silica column (Sigma), and the non-α-bisabolene components were removed with vacuum liquid chromatography (VLC). This method yielded 25 5-mL fractions: 15 fractions with 0% ethyl acetate in hexanes, 5 fractions with 5% ethyl acetate in hexanes, and 5 fractions with 20% ethyl acetate in hexanes; (vii) the fractions were analyzed with thin layer chromatography (TLC, 3:7 ethyl acetate/hexane) using vanillin-sulfuric acid as the detection method, where α-bisabolene appears as a dark blue spot on the TLC plates. Three fractions were enriched in α-bisabolene; (viii) these fractions were combined and dried with a rotary evaporator; (ix) the final composition was confirmed with 1H NMR in CDCl3 (300 MHz,
NMR spectroscopy was carried out at the BioFrontiers Nuclear Magnetic Resonance Facility at CU Boulder. All experiments were completed at 25° C. with a Bruker Accent 300 MHz spectrometer equipped with a Bruker 5 mm Smart Broadband Observe solution probe (BBFO), and final spectra were processed with MestReNova 14.2 software.
The stereochemistry of (−)-α-bisabolol, (+)-α-bisabolol, (+)-epi-α-bisabolol, (−)-β-bisabolol, and (−)-β-bisabolene reflects stereochemistry or specific rotation values reported in vendor certificates. The specific rotation of (+)-α-bisabolene was determined by using an Anton Paar MCP-200 polarimeter. In brief, the sodium D line was used at 589 nm with a cell path length of 100 mm. For α-bisabolene, 12.18 mg of the colorless oil was dissolved in 3.0 mL CHCl3 (0.406 g/100 mL CHCl3), the resulting solution was placed inside the cell, and the temperature was allowed to equilibrate to 25° C. before a reading was collected.
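The concentration quoted above, and the conversion from an observed rotation to a specific rotation, can be checked with the conventional relation [α] = 100·α_obs/(l·c), with l in dm and c in g/100 mL. The sketch below is illustrative; the observed rotation used in it is a hypothetical placeholder, since the measured reading is not stated in the text.

```python
def concentration_g_per_100ml(mass_mg, volume_ml):
    """Convert a mass (mg) dissolved in a volume (mL) to g per 100 mL."""
    return (mass_mg / 1000.0) / volume_ml * 100.0


def specific_rotation(observed_deg, path_mm, conc_g_per_100ml):
    """Specific rotation from observed rotation, path length, and concentration."""
    path_dm = path_mm / 100.0  # 100 mm = 1 dm
    return 100.0 * observed_deg / (path_dm * conc_g_per_100ml)


# Values from the text: 12.18 mg in 3.0 mL CHCl3, 100 mm cell.
conc = concentration_g_per_100ml(12.18, 3.0)
print(round(conc, 3))  # 0.406

# Hypothetical observed rotation (degrees); NOT a value reported in the text.
example_rotation = specific_rotation(0.203, 100.0, conc)
print(round(example_rotation, 1))
```

With a 1-dm cell, the specific rotation is simply the observed rotation scaled by 100/c, so small errors in the weighed mass propagate directly into the reported value.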
Intracellular concentrations of terpenoids were examined by extracting these compounds from cells grown in 4-mL cultures. Briefly, at 72 hours, 1 mL of cell culture was removed and centrifuged for 3 minutes (4000×g), and the supernatant was discarded. Terpenoids were extracted from the cell pellet by adding 600 μL hexane and 100 μL of 0.1-mm disrupter beads (Chemglass, CLS-1835-BG1) and vortexing the suspension for 30 minutes. The resulting lysate was centrifuged at 17,000×g for 10 minutes, and the resulting hexane layer was analyzed using GC/MS as described above. Finally, intracellular concentrations of each terpenoid (Ccell) were determined by using Eq. 1:
where Cculture is the concentration of terpenoids in the hexane, Vhexane is 600 μL, η is the extraction efficiency, COD is the OD-specific cell concentration (7.8×10⁸ cells mL⁻¹ OD⁻¹), and Vcell is the volume of a single cell (4.4 fL/cell). For initial estimates, η=1 was used, which assumes both complete cell lysis and complete partitioning of terpenoids from the aqueous to the organic layer; accordingly, the approach may underestimate intracellular terpenoid concentrations.
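Because the equation itself does not appear in this text, the following is only one plausible mass-balance reading of the quantities defined above, not the equation of record: the amount of terpenoid recovered in hexane (corrected for η) is divided by the total volume of the cells in the sampled culture. The culture OD and the 1-mL sample volume enter as explicit inputs, which is an assumption; the example hexane concentration and OD are hypothetical.

```python
def intracellular_concentration_um(c_hexane_um, od600, v_hexane_l=600e-6,
                                   eta=1.0, v_sample_ml=1.0,
                                   cells_per_ml_per_od=7.8e8,
                                   v_cell_l=4.4e-15):
    """Estimate intracellular terpenoid concentration (uM) by mass balance.

    Assumed form: C_cell = C_hexane * V_hexane / (eta * N_cells * V_cell),
    with N_cells = OD * C_OD * V_sample.
    """
    if not 0 < eta <= 1:
        raise ValueError("extraction efficiency must be in (0, 1]")
    extracted_umol = c_hexane_um * v_hexane_l          # uM x L = umol in hexane
    total_umol = extracted_umol / eta                  # correct for incomplete extraction
    n_cells = od600 * cells_per_ml_per_od * v_sample_ml
    total_cell_volume_l = n_cells * v_cell_l
    return total_umol / total_cell_volume_l


# Hypothetical inputs: 10 uM terpenoid in the hexane layer, culture OD of 5.
estimate_um = intracellular_concentration_um(10.0, 5.0)
print(round(estimate_um, 2))  # 349.65
```

Under this reading, any extraction efficiency below 1 raises the estimate (η=0.5 doubles it), which is consistent with the statement that assuming η=1 may underestimate intracellular concentrations.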
3CLpro was overexpressed in E. coli. In brief, BL21 (DE3) pLysS competent cells were transformed with a pGEX-4T-1 GST vector containing full-length 3CLpro with a 6× polyhistidine tag and a Human Rhinovirus (HRV) 3C protease site on its C-terminus (e.g., Q*GPHHHHHH (SEQ ID NO: 291), where Q is the C-terminal residue of the protein and * is the protease cleavage site). Two colonies were used to inoculate two 10-ml liquid cultures (LB supplemented with 50 μg/ml carbenicillin and 34 μg/ml chloramphenicol), which were grown overnight in an incubator shaker (37° C. and 200 rpm). These starter cultures were used to inoculate two one-liter cultures in 2.5-liter Ultra Yield Flasks, which were placed in an incubator shaker (37° C. and 200 rpm). At an OD600 of 0.65, the temperature was lowered to 16° C., protein expression was induced by adding 0.5 mM dioxane-free isopropyl-D-thiogalactopyranoside (IPTG), and the cultures were grown for 18 hours. Final cultures were centrifuged, and the pellets were resuspended in 20 mL of Lysis Buffer (50 mM Tris, 1% Triton X-100, 300 mM NaCl, pH 8.0) and stored at −80° C.
3CLpro was purified from cell pellets by using fast protein liquid chromatography (FPLC). To begin, the frozen cell pellets were lysed by adding to each pellet a solution containing 120 μl of Bond Breaker with 500 mM TCEP (Thermo Scientific), 100 μg lyophilized lysozyme (Thermo Scientific), 2 mL BugBuster® 10× Protein Extraction Reagent (EMD Millipore), and 20 μl of 25 U/μl Benzonase (Millipore Sigma). These samples were rocked at room temperature for 1 hour and spun down at 16000×g for 25 minutes. The supernatants from the lysis reactions were combined, imidazole (Teknova) was added to a final concentration of 5 mM, and the final solution was filtered with a 0.22 μm Steriflip filter (Millipore Sigma). The filtered solution was loaded onto a 5-mL HisTrap HP column (Cytiva) using a GE Akta Purifier 10, the column was washed with five column volumes of Tris buffer (50 mM Tris, 300 mM NaCl, 50 mM imidazole, 0.5 mM TCEP, pH 8.0), and the protein was eluted with imidazole (50 mM to 200 mM imidazole). A 30-kDa Spin-X UF spin column (Corning) was used to concentrate the final protein to 10 mg/mL in cold HRV 3C cleavage buffer (50 mM Tris pH 7.0, 150 mM NaCl, 1 mM EDTA, 0.5 mM TCEP). Rhinovirus 3C Protease (Thermo Pierce) was added at a ratio of 1 mg HRV 3C for every 3 mg of 3CLpro, and the proteolysis reaction was incubated at 4° C. for 16 hours. To remove the his-tagged HRV 3C protease and unproteolyzed 3CLpro, the final sample was diluted in Tris buffer (50 mM Tris, 300 mM NaCl, 0.5 mM TCEP, pH 8.0) to lower the imidazole concentration below 10 mM and loaded onto a 5-mL HisTrap HP column, and the flowthrough was collected. The final protein was filtered with a 0.45-μm filter, diluted 20-fold into Tris buffer (25 mM, pH 8.0), and loaded onto an equilibrated 5-mL HiTrap Q-HP column (Cytiva). The loaded column was washed with five column volumes of Tris buffer (25 mM, pH 8.0) and eluted with salt (25 mM Tris, 500 mM NaCl, pH 8.0).
The fractions containing 3CLpro were pooled and exchanged into cold Tris buffer (50 mM Tris, 1 mM EDTA, 0.5 mM TCEP, pH 7.3), and the protein was concentrated to >10 mg/mL with a 30-kDa cutoff Spin-X UF spin column prior to freezing at −80° C.
The inhibitory effects of various compounds were characterized by measuring their influence on 3CLpro-catalyzed proteolysis of a fluorogenic peptide substrate (TSAVLQ_AFC). Briefly, 100-μL reactions were prepared consisting of 5 μg/mL SARS-CoV-2 3CLpro, 10 μg/mL TSAVLQ_AFC, and 0.01-10,000 μM terpenoid in HEPES buffer (25 mM, pH=7.3) with 1% DMSO. These reactions were initiated by adding peptide substrate, and the proteolysis of fluorogenic peptide was monitored by measuring fluorescence (λex=400 nm, λem=505 nm) every 10 s for 10 min (SpectraMax iD3 plate reader).
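One way such fluorescence time courses could be reduced to an IC50 is sketched below: each trace is fit to a line to obtain a rate, rates are normalized to the uninhibited rate, and the IC50 is estimated by log-linear interpolation between the two concentrations that bracket 50% activity. This is an illustrative analysis, not necessarily the one used for the reported values; all traces are synthetic (generated from an assumed IC50 of 50 μM), and the function names are hypothetical.

```python
import math


def read_times(interval_s=10, duration_s=600):
    """Time points (s) for a read every `interval_s` over `duration_s`."""
    return list(range(0, duration_s + 1, interval_s))


def initial_rate(times_s, fluorescence):
    """Least-squares slope of fluorescence versus time (RFU/s)."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_f = sum(fluorescence) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_s, fluorescence))
    return sxy / sxx


def ic50_by_interpolation(concs_um, fractional_activity):
    """Log-linear interpolation of the concentration giving 50% activity."""
    pairs = sorted(zip(concs_um, fractional_activity))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(pairs, pairs[1:]):
        if a_lo >= 0.5 >= a_hi and a_lo != a_hi:
            frac = (a_lo - 0.5) / (a_lo - a_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None  # 50% activity is not bracketed by the tested concentrations


# Synthetic data only: linear traces simulated with an assumed IC50 of 50 uM.
ASSUMED_IC50_UM = 50.0
UNINHIBITED_RATE = 2.0  # RFU/s, hypothetical
times = read_times()
concs = [1.0, 10.0, 100.0, 1000.0]
rates = []
for c in concs:
    v = UNINHIBITED_RATE / (1.0 + c / ASSUMED_IC50_UM)
    trace = [100.0 + v * t for t in times]
    rates.append(initial_rate(times, trace))

activities = [r / UNINHIBITED_RATE for r in rates]
estimate = ic50_by_interpolation(concs, activities)
print(round(estimate, 1))
```

Interpolation on a coarse concentration series only approximates the assumed IC50; a denser dilution series or a nonlinear dose-response fit would tighten the estimate. With real data, only the early, linear portion of each trace should be used for the rate.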
While preferred embodiments of the present inventive concepts have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the inventive concepts. It should be understood that various alternatives to the embodiments of the inventive concepts described herein may be employed in practicing the inventive concepts. It is intended that the following claims define the scope of the inventive concepts and that methods and structures within the scope of these claims and their equivalents be covered thereby.
A. grandis
E. coli
T. canadensis
E. coli
A denotes NIAID Priority Category A;
B denotes NIAID Priority Category B;
C denotes NIAID Priority Category C; and
W denotes WHO Priority Emerging Infectious Disease.
H. sapiens
H. sapiens
P. luminescens
Escherichia coli
Escherichia virus Lambda
Rous sarcoma virus
H. sapiens
H. sapiens
S. cerevisiae, A. thaliana,
E. coli
E. coli
Artemisia annua
Abies grandis
Abies grandis
Taxus brevifolia
Abies grandis
S. suecicum HHB10207 ss-3
L. perrieri
H. vulgare
O. sativa
O. glumipatula
S. oleracea
A. aquimarinus
N. tabacum
B. oleracea
P. trichocarpa
J. curcas
A. muscaria Koide BX008
E. guttata
S. indica
P. subalpine
B. napus
S. stellatus SS14
T. terrestris ATCC 38088
A. gallica
H. sublateritium FD-334
T. urartu
H. vulgare
A. gephyra
P. cablin
Z. officinale
S. austrocaledonicum
S. clavuligerus
H. impetiginosus
A. maritima
N. tabacum
S. tuberosum
S. lycopersicum
H. sapiens
H. sapiens
E. coli.
cerevisiae
A. thaliana
E. coli
Zingiber officinale
Santalum austrocaledonicum
cardunculus
Artemisia kurramensis
dulcis
grandis
Solanum habrochaites
Artemisia annua (Sweet
Novosphingobium aromaticivorans
An inverted bacterial two-hybrid system was developed based on the phosphorylation dependent B2H system. DH10BΔRpoω cells (e.g., DH10B cells with the gene for the omega subunit of RNA polymerase knocked out) were produced harboring either (i) the system depicted in
An inverted bacterial two-hybrid system was also developed that links kinase activity to the repression of a gene for spectinomycin resistance, as shown in
An inverted bacterial two-hybrid system was developed based on the phosphorylation-dependent B2H system. E. coli cells were engineered to express the Src kinase inverted B2H system in
In other embodiments of the B2H system, a cI-SH2 fusion partner is expressed constitutively from a prol promoter (
The B2H system has been developed to use different DNA-binding domains (
A B2H system with iLID-SsrA/SspB as binding partners was used to interrogate the effect of an rpoA substitution on transcriptional activity in multiple strains of E. coli. The red fluorescent protein mRuby3 served as the gene of interest and as a reporter of transcriptional activation (
A next-generation sequencing experiment was carried out in E. coli cells harboring a B2H system that links PTP1B inactivation to the expression of a gene for spectinomycin resistance and a terpenoid pathway comprising an isoprenoid pathway (pAM45), a terpene synthase, and a cytochrome P450 (CYP2A6) (
Selection experiments were carried out with two strains of E. coli grown in liquid media in the presence of spectinomycin (LB with antibiotics for plasmid maintenance and concentrations of spectinomycin as indicated) (
Next generation sequencing was used to identify terpenoid pathways that confer a survival advantage under spectinomycin selection (
The two-hybrid system may be used to detect a presence of bioactive molecules that enhance activity of the target enzyme, instead of inhibitors of the target enzyme. This may be accomplished using the same two-hybrid system described elsewhere herein (see, e.g., Examples 1-22) by measuring a decrease in expression of the gene of interest (GOI) rather than an increase in expression of the GOI relative to a reference expression level. A reference level may be obtained from an otherwise identical cell that does not comprise a functional metabolic pathway that produces the bioactive molecule, the ligand, or the receptor.
The two-hybrid system may be used to detect a presence of a bioactive molecule that modulates activity of a kinase. A phosphate-dependent two-hybrid system described elsewhere herein can be used to detect a presence of a bioactive molecule that enhances activity of a kinase. In each case, an increase in expression of the gene of interest (GOI) would be expected.
An inverted phosphate-dependent two-hybrid system described in Example 16 can be used to detect a presence of a bioactive molecule that inhibits activity of a kinase. Inhibition of a kinase will prevent the kinase from phosphorylating the kinase substrate, thereby preventing the kinase substrate from binding to the phosphorylated protein binding domain. Without formation of the kinase substrate-phosphorylated protein binding domain pair, transcriptional activation of the gene of interest does not occur, thereby increasing expression of a reporter polypeptide (that is inversely correlated with the expression of the GOI).
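The readout logic described for the standard and inverted phosphate-dependent systems can be summarized as a truth table. The sketch below is a deliberately simplified Boolean model (real systems respond in a graded fashion to the balance of kinase and phosphatase activity), and the function names are hypothetical.

```python
def goi_expressed(kinase_active, phosphatase_active=False):
    """Standard system: the GOI is transcribed when the substrate stays phosphorylated.

    Simplifying assumption: the substrate is phosphorylated only if the kinase
    is active and no phosphatase is removing the phosphate.
    """
    substrate_phosphorylated = kinase_active and not phosphatase_active
    return substrate_phosphorylated


def inverted_reporter_expressed(kinase_active, phosphatase_active=False):
    """Inverted system: reporter expression is inversely correlated with the GOI."""
    return not goi_expressed(kinase_active, phosphatase_active)


# Print the readouts for an active versus an inhibited kinase.
for kinase_active in (True, False):
    print(
        f"kinase_active={kinase_active}: "
        f"GOI={goi_expressed(kinase_active)}, "
        f"inverted reporter={inverted_reporter_expressed(kinase_active)}"
    )
```

In this model, a kinase inhibitor switches the GOI off and the inverted reporter on, which is why the inverted system converts kinase inhibition into a gain of signal.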
This application claims the benefit of U.S. Provisional Application No. 63/274,988, filed Nov. 3, 2021, U.S. Provisional Application No. 63/281,023, filed Nov. 18, 2021, U.S. Provisional Application No. 63/318,302, filed Mar. 9, 2022, and U.S. Provisional Application No. 63/397,780, filed Aug. 12, 2022, each of which is incorporated herein by reference in its entirety.
This invention was made with Government support under Grant Nos. 2030347 and 1750244 awarded by the National Science Foundation, and Grant No. 1R35GM143089 awarded by the National Institutes of Health. The Government has certain rights to this invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/079253 | 11/3/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63274988 | Nov 2021 | US | |
63281023 | Nov 2021 | US | |
63318302 | Mar 2022 | US | |
63397780 | Aug 2022 | US |