Non-ribosomal peptides (NRPs) and polyketides (PKs) are classes of secondary metabolites produced in a variety of organisms. Many members of these classes of natural products exhibit medicinally relevant properties including antimicrobial (e.g., vancomycin and erythromycin), antitumor (e.g., bleomycin and epothilone), antifungal (e.g., soraphen and fengycin), immunosuppressant (e.g., cyclosporin and rapamycin) and cholesterol-lowering (e.g., lovastatin) activity.
Although NRP and PK natural products are chemically diverse, these types of compounds are biosynthesized in their cognate producer organisms in a similar manner by multienzymatic megacomplexes known as non-ribosomal peptide synthetases and polyketide synthases. These large proteins construct the framework of NRPs and PKs in an assembly-line fashion from simple chemical monomers (amino acids in the case of NRPSs, and acyl-CoA thioesters in the case of PKSs). For more information on classification of NRPs and PKs, see Cane D E, Walsh C T, and Khosla C, Science, 1998, 282, 63 and references therein.
The power of NRPs and PKs as potential drugs lies in their diverse and complicated chemical structures. Generally, it is the intricacy of these natural products that makes them (or variants thereof) difficult to access synthetically. Several examples exist where laborious synthetic routes have been developed, rarely successfully, for NRPs or PKs. Additionally, various moieties on such molecules are inaccessible to modification by organic synthesis, or can only be produced at low yields using such techniques. This difficulty in synthesis and modification of the NRP and PK natural products underscores the need for alternative strategies to enhance synthesis and create variants of these molecules.
Despite the apparent modular structure of the NRPSs, it has, prior to the present invention, in practice been difficult to swap domains so that the resulting NRPS is active. Substitution of one domain for another generally yields great (e.g., >10-fold) reductions in yield (see
Thus, there is a need for new methods to produce novel varieties of NRPs and PKs and a need for methods that increase the yields of such NRPs and PKs.
In a first aspect, the invention provides a method of generating a modified assembly line which includes the steps of (a) providing a first gene encoding a polypeptide, wherein the polypeptide includes at least one (e.g., at least 2, 3, 4, 5, 7, 10, 15, 20, 25, 30) domain of a first assembly line (e.g., an NRPS, PKS, or NRPS-PKS hybrid); (b) creating at least 15 (e.g., at least 20, 25, 30, 40, 50, 60, 75, 100, 200, 500, 750, 1000, 5000, 7500, 10,000, 25,000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 100,000,000, 1,000,000,000) unique mutations in the nucleic acid encoding the domain, thereby creating unique variants; (c) introducing the unique mutations into second genes (e.g., genes derived from the same biosynthetic gene as the first gene) encoding at least one domain of a second assembly line (e.g., an assembly line derived from the same or different biosynthetic gene or genes as the first assembly line); (d) expressing the second assembly line in a cell, for example a bacterium (e.g., Bacillus subtilis, Pseudomonas syringae, Streptomyces sp., or Escherichia coli) or a fungal cell (e.g., a yeast cell); and (e) identifying a variant generated from step (b) using the cell in a selection or screen, wherein the selection or screen identifies a modified assembly line that alters the amount or structure of a product (e.g., an antibiotic, antifungal, antineoplastic agent, or immunosuppressant) of the second assembly line. The method may further include repeating steps (b) through (e) at least once (e.g., at least 2, 3, 4, 5, 6, 7, 10, 15, 20, 25, 30, 50, 75, 100 times). The method may also include replacing at least one domain in the second assembly line with a domain from a third assembly line (e.g., an assembly line derived from the same biosynthetic assembly line as the first assembly line) prior to the identifying step (e).
The method may include a creating step (b) which further includes modifying a second domain of the polypeptide coded for by the first gene. The creating step (b) may be performed in vitro. The creating step (b) may be performed by random mutagenesis (e.g., error prone PCR). The introducing step (c) may include replacing at least one domain of the second gene with the variant. The selection or screen may be performed by observing antibacterial or antifungal activity of the product. The selection may be performed on solid media or may be performed in liquid media. The second assembly line may be an NRPS, a PKS, or an NRPS-PKS hybrid. The polypeptide may include all domains of the assembly line.
In another aspect the invention also provides an organism including a modified assembly line of the first aspect.
In another aspect, the invention provides a library produced by the method including steps (a)-(c) of the first aspect to produce a library including at least 15 (e.g., at least 20, 25, 30, 40, 50, 60, 75, 100, 200, 500, 750, 1000, 5000, 7500, 10,000, 25,000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 100,000,000, 1,000,000,000) nucleic acids encoding unique variants.
By “assembly line” is meant a polypeptide or plurality of interacting polypeptides that form multimodular enzymes which synthesize one or more of the following categories of small molecules: (i) nonribosomal peptides, (ii) polyketides, and (iii) nonribosomal peptide-polyketide hybrids. Assembly lines comprise an initiation module and a termination module. Assembly lines may further comprise one, two, three, four, five, six, seven, or more elongation modules. Assembly lines may be synthases, synthetases, or a combination thereof.
By “module” is meant a set of domains. A plurality of modules comprise an assembly line (e.g., an NRPS or PKS). One or more polypeptides may comprise a module. Combinations of modules can catalyze a series of reactions to form larger molecules. In one example, a module may comprise a C (condensation) domain, an A (adenylation) domain, and a peptidyl carrier protein domain.
By “initiation module” is meant a module which is capable of providing a monomer to a second module (e.g., an elongation or termination module). In the case of an NRPS, an initiation module comprises, for example, an A (adenylation) domain and a PCP (peptidyl carrier protein) (e.g., a T (thiolation)) domain. The initiation module may also contain an E (epimerization) domain. In the case of a PKS, the initiation module comprises an AT (acyltransferase) domain and an acyl carrier protein (ACP) domain. Initiation modules are preferably at the amino terminus of a polypeptide of the first module of an assembly line, and each assembly line preferably contains one initiation module.
By “elongation module” is meant a module which adds a monomer to another monomer or to a polymer. An elongation module may comprise a C (condensation), Cy (heterocyclization), E, MT (methyltransferase), Ox (oxidase), or Re (reductase) domain; an A domain; or a T domain. An elongation module may further comprise additional E, Re, DH (dehydration), MT, NMet (N-methylation), or Cy domains.
By “termination module” is meant a module that releases the molecule (e.g., an NRP, PK, or combination thereof) from the assembly line. The molecule may be released by, for example, hydrolysis or cyclization. Termination modules may comprise a TE (thioesterase), C, or Re domain. The termination module is preferably at the carboxy terminus of a polypeptide of an NRPS or PKS. The termination module may further comprise additional enzymatic activities (e.g., oligomerase activity).
By “domain” is meant a polypeptide sequence, or a fragment of a larger polypeptide sequence, with a single enzymatic activity. Thus, a single polypeptide may comprise multiple domains. Multiple domains may form modules. Examples of domains include C (condensation), Cy (heterocyclization), A (adenylation), T (thiolation), TE (thioesterase), E (epimerization), MT (methyltransferase), Ox (oxidase), Re (reductase), KS (ketosynthase), AT (acyltransferase), KR (ketoreductase), DH (dehydratase), and ER (enoylreductase).
By “nonribosomally synthesized peptide,” “nonribosomal peptide,” or “NRP” is meant any polypeptide not produced by a ribosome. NRPs may contain cyclized or branched amino acids, or any combination thereof. NRPs include peptides produced by an assembly line.
By “polyketide” is meant a compound comprising multiple ketyl units.
By “nonribosomal peptide synthetase” is meant a polypeptide or series of interacting polypeptides that produce a nonribosomal peptide.
By “polyketide synthase” is meant a polypeptide or series of polypeptides that produce a polyketide.
By “alter an amount” is meant to change the amount, by either increasing or decreasing. An increase or decrease may be by 3%, 5%, 8%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or more.
By “alter a structure” is meant any change in a chemical (e.g., covalent or noncovalent) bond as compared to a reference structure.
By “mutation” is meant an alteration in the nucleic acid sequence such that the amino acid sequence encoded by the nucleic acid sequence has at least one amino acid alteration from a naturally occurring sequence. The mutation may, without limitation, be an insertion, deletion, frameshift mutation, or a missense mutation. This term also describes a protein encoded by the mutant nucleic acid sequence.
By “variant” is meant a polypeptide or polynucleotide with at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 99% sequence identity to a reference sequence.
Sequence identity is typically measured using sequence analysis software (for example, Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705, BLAST, BESTFIT, GAP, or PILEUP/PRETTYBOX programs). Such software matches identical or similar sequences by assigning degrees of homology to various substitutions, deletions, and/or other modifications. Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. In an exemplary approach to determining the degree of identity, a BLAST program may be used, with a probability score between e−3 and e−100 indicating a closely related sequence.
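As a minimal illustration of what a percent sequence identity figure means, the following sketch computes identity for two sequences that have already been aligned. It is not the algorithm used by BLAST or the GCG package; the function name and the convention of ignoring only those columns in which both sequences carry a gap are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: columns in which both sequences carry a gap are
    skipped; every other column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no residue in either sequence
        columns += 1
        if a == b:
            matches += 1  # identical residues (a gap cannot match here)
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


# Hypothetical four-residue example: three of four columns are identical.
print(percent_identity("ACDE", "ACDF"))
```

Under this convention, a variant would be compared with its reference sequence after alignment by one of the programs named above, and the resulting percentage compared with the chosen identity threshold.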
Other features and advantages of the invention will be apparent from the following Detailed Description, the drawings, and the claims.
The present invention provides methods for generating a modified assembly line. These modified assembly lines are useful for producing novel compounds (e.g., NRPs and PKs) that have activities including but not limited to antimalarial, immunosuppressory, antitumor, anticholesterolemic, antibiotic (e.g., antibacterial), and antifungal activities.
Assembly lines are multimodular enzymes composed of individual domains arranged on one polypeptide or on a plurality of interacting polypeptides. Each individually folded “globule” is called a domain. Domains are organized into fundamental units called modules, which are defined operationally as a set of domains responsible for incorporating one monomer into the growing chain. The complete set of modules responsible for assembling a natural product or the precursor of a natural product is called an assembly line (e.g., a synthase or synthetase).
The product of an assembly line may be a precursor to a product that undergoes further modification by, e.g., glycosyltransferases (e.g., the glycosyltransferases that add sugars to the erythromycin aglycone) or oxidases (e.g., oxidases that form aryl-ether and aryl-aryl crosslinks in the vancomycin-family glycopeptides). The modifications may be necessary for the natural product to be active. If the product of the assembly line itself does not have biological activity, then either (i) the new product may still be recognized by the enzyme catalyzing the further modification (e.g., oxidases and glycosyltransferases) that recognized the product of the original assembly line, or (ii) the new product may be a precursor for a semisynthetic drug. The precursor may then be modified by standard organic synthesis techniques, thereby transforming the precursor into an active drug. Taxol, for example, is produced in this manner.
The following domains may be included within an NRPS: C (condensation), Cy (heterocyclization), A (adenylation), T (thiolation) or PCP (peptidyl carrier protein), TE (thioesterase), E (epimerization), MT (methyltransferase), Ox (oxidase), and Re (reductase) domains. Nonribosomal peptide synthetases generally have the following structure: A-T-(C-A-T)n-TE where A-T is the initiation module, C-A-T are the elongation modules, and TE is the termination module (see
The NRPS core domains include the A and PCP (or T) domains (
NRPSs are generally modular, and the series of catalytic steps moves from the amino to carboxy terminus of each polypeptide that makes up the NRPS. For example, the NRPS that produces tyrocidine is encoded by three genes producing three polypeptides. TycA contains the initiation module; TycB contains three elongation modules; and TycC contains six additional elongation modules plus a termination module (
The following domains may be included within a PKS: KS (ketosynthase), AT (acyltransferase), T (thiolation), KR (ketoreductase), DH (dehydratase), ER (enoylreductase), TE (thioesterase). PKSs generally have the following structure: AT-T-(KS-AT-T)n-TE. AT-T is the initiation module, KS-AT-T are the elongation modules, and TE is the termination module. The structure of a PKS is very similar to NRPS structure. There are many examples (e.g., yersiniabactin, epothilone, bleomycin) of hybrid PKS-NRPS systems in which both types of assembly line are pieced together to form a coherent unit. Within each PKS module, one finds either a KR, a KR and DH, a KR and DH and ER, or no additional domains. These extra domains within a module determine the chemical functionality at the beta carbon (e.g., carbonyl, hydroxyl, olefin, or saturated carbon).
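The general architectures A-T-(C-A-T)n-TE and AT-T-(KS-AT-T)n-TE, and the dependence of beta-carbon functionality on the optional reductive domains, can be sketched as follows. The function names are hypothetical, and the pairing of each domain combination with a functional group follows the standard order of reductive processing in polyketide biosynthesis (no domain, KR, KR plus DH, KR plus DH plus ER), which is assumed to correspond to the order of the examples listed above.

```python
# Initiation and elongation module layouts taken from the general
# architectures given in the text.
LAYOUTS = {
    "NRPS": ("A-T", "C-A-T"),
    "PKS": ("AT-T", "KS-AT-T"),
}

# Assumed mapping: optional reductive domains in a PKS module -> resulting
# functionality at the beta carbon.
BETA_CARBON = {
    frozenset(): "carbonyl",
    frozenset({"KR"}): "hydroxyl",
    frozenset({"KR", "DH"}): "olefin",
    frozenset({"KR", "DH", "ER"}): "saturated carbon",
}


def architecture(kind: str, n_elongation: int) -> str:
    """Expand the general formula into an explicit domain string."""
    if kind not in LAYOUTS:
        raise ValueError("kind must be 'NRPS' or 'PKS'")
    if n_elongation < 0:
        raise ValueError("n_elongation must be non-negative")
    initiation, elongation = LAYOUTS[kind]
    return "-".join([initiation] + [elongation] * n_elongation + ["TE"])


def beta_carbon_functionality(reductive_domains) -> str:
    """Look up the beta-carbon functionality for a set of reductive domains."""
    key = frozenset(reductive_domains)
    if key not in BETA_CARBON:
        raise ValueError("unsupported combination of reductive domains")
    return BETA_CARBON[key]


print(architecture("NRPS", 2))
print(beta_carbon_functionality(["KR", "DH"]))
```

Real assembly lines frequently deviate from these idealized layouts (for example, by containing E, MT, or Cy domains), so the sketch describes only the canonical case.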
Assembly lines produce, for example, NRPs, PKs, and combinations/hybrids of NRPs and PKs. A comparison between NRPs and ribosomal peptides is shown in Table 1. In one example of an assembly line, epothilone synthetase has a molecular mass of ˜1800 kDa and includes six polypeptides, whereas the ribosome is ˜2600 kDa and includes 55 proteins and 3 rRNAs.
The present invention includes identification of modified assembly lines (MALs) using a screen or selection. Any screen or selection method standard in the art may be used to create the MALs of the present invention. Typically, one or more random mutations are introduced into a domain to create a variant domain of a nucleic acid encoding an NRP, PK, or PK-NRP hybrid. Selective pressure or a screen is then applied to cells encoding the assembly lines having the variant domains. The steps of mutation and selection may be repeated. Surprisingly, despite many prior unsuccessful attempts to alter assembly lines and their natural products, we have discovered that the approach of directed evolution of specific domains rapidly and readily produces MALs having improved biosynthetic capacity and/or synthesizing novel variants of natural products. Our findings have resulted in a method of enhancing production of natural products and creating new natural products even when little tertiary or quaternary structural information is available regarding the assembly line and the natural product is one which is inaccessible for chemical modification. We believe this approach has tremendous ramifications for the production of therapeutically important molecules.
A preferred screen for secretion of a PK, NRP, or PK-NRP product includes a library of producer cells (produced by transformation with a library of plasmids or other vectors encoding a portion of the assembly line which are autologous or integrated) plated on top of a lawn of a tester strain. The tester strain may be a bacterial or fungal strain sensitive to the product of the assembly line or predicted to be sensitive to the novel product produced by a modified assembly line (see
The readout for the screen may include, for example, fusing a metabolite-responsive promoter element to a reporter gene (e.g., luciferase or GFP) and screening by FACS. In this format, the metabolite-responsive promoter might be the target of a two-component system that normally senses the presence of the assembly line product and initiates a host self-protection response in the producing organism. For example, the Tet (tetracycline) repressor and the associated Tet-on and Tet-off plasmid constructs, which are standard in the art, could be used to perform such a screen.
A selection may be performed by growing two strains, a producer (e.g., a library of producers) and a tester, together in culture (e.g., liquid or solid culture); strains that successfully produce the desired assembly line product win the competition with their unsuccessful counterparts and take over the population. An example of this assay is described by Arndt et al. ((1999), Microbiol 145, 1989-2000). Alternatively, selection for a trait in a single cultured strain is also possible using selective media conditions. Selection conditions for the biosynthetic products of assembly lines are known in the art.
Libraries of assembly lines may be generated using molecular biology methods standard in the art. Random mutagenesis of a domain or domains of an assembly line may be performed using known methods such as error prone PCR described herein. While we have discovered that functional MALs may be generated with as few as four mutations in a domain using our selection and screening protocols, it will be appreciated that the degree of variation introduced into a domain may be controlled by the practitioner.
Mutagenesis may be accomplished by a variety of means, including the GeneMorph® II EZClone Domain Mutagenesis Kit (Stratagene, La Jolla, Calif.). Error-prone PCR is a method standard in the art and described in Beaudry and Joyce (Science 257:635 (1992)) and Bartel and Szostak (Science 261:1411 (1993)). This technique may be used to introduce random mutations into genes coding for proteins. Kits for performing random mutagenesis by PCR are commercially available, for example, the Diversify™ PCR Random Mutagenesis Kit (BD Biosciences, Mountain View, Calif.). Chemical mutation, radiation, and any other technique known in the art for modifying the nucleic acid sequence are appropriate for use in the present invention.
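For planning purposes, the effect of random mutagenesis on a domain-encoding sequence can be mimicked in silico. The sketch below is an illustration only: it places a fixed number of uniformly random base substitutions in a template and collects unique variants, and it does not model the substitution bias of any polymerase or kit. The function names, the template sequence, and the choices of four substitutions per variant and a library of 15 variants are assumptions selected to echo the figures mentioned above.

```python
import random


def random_point_mutations(seq: str, n_mutations: int, rng: random.Random) -> str:
    """Return a copy of seq carrying exactly n_mutations base substitutions."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    if not 0 <= n_mutations <= len(seq):
        raise ValueError("n_mutations out of range")
    bases = list(seq)
    # Choose distinct positions so every substitution changes a new site.
    for pos in rng.sample(range(len(bases)), n_mutations):
        alternatives = [b for b in "ACGT" if b != bases[pos]]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)


def make_library(seq: str, n_mutations: int, size: int, seed: int = 0) -> list:
    """Collect `size` unique variants; gives up after size * 100 attempts."""
    rng = random.Random(seed)
    variants = set()
    attempts = 0
    while len(variants) < size:
        attempts += 1
        if attempts > size * 100:
            raise RuntimeError("could not generate enough unique variants")
        variants.add(random_point_mutations(seq, n_mutations, rng))
    return sorted(variants)


template = "ACGT" * 15  # hypothetical 60-nt template, not a real domain
library = make_library(template, n_mutations=4, size=15, seed=1)
print(len(library), "unique variants")
```

In practice the mutation load is set experimentally (for example, by template amount and cycle number), and the variants are cloned and subjected to the selection or screen described above.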
The following examples are meant to illustrate the invention and should not be construed as limiting. Other examples of modified assembly lines can be found, for example, in Lai et al., Proc. Natl. Acad. Sci. USA 103:5314-5319, 2006, hereby incorporated by reference.
Enterobactin is a small, iron-chelating molecule known as a siderophore (
Following production of enterobactin, apo-enterobactin is exported from the E. coli cytoplasm by EntS. Enterobactin then interacts with Fe3+ and forms a complex, which is then imported across the outer membrane into the periplasm by FepA and transported by FepB to FepD and FepG, which import Fe3+-enterobactin into the cytoplasm, a process driven by ATP hydrolysis by FepC. Fes converts the complex into Fe and DHB-Ser (
An EntF− strain grown on minimal media in the presence of an iron chelator such as 2,2′-dipyridyl is not capable of rapid growth, while an EntF+ strain does grow quickly (
Using standard molecular biology techniques, the Ser-specific A domain from EntF was replaced with a Ser-specific A domain from the syringomycin synthetase (Pseudomonas syringae), SyrE-A1, creating a hybrid module, EntF-SyrE-A1 (
After two rounds of selection, two clones had emerged that had colony diameters similar to that of cells harboring wild-type EntF. The EntF-SyrE-A1 genes from these clones (410B-02 and 410B-06) were isolated and sequenced (
The bacillomycin gene cluster comprises bmyA, bmyB, bmyC, and bmyD (
As described above, the enterobactin synthetase assembly line comprises EntB, EntD, EntE, and EntF. Interactions between EntB and the other proteins (EntD, EntE, and EntF) are known to occur (
Using the above approach, one can modify the protein-protein interactions within an assembly line to enhance biosynthesis or produce novel natural products.
As substrates for biosynthetic operations are presented on carrier proteins as covalently-attached thioesters (through a 4′-phosphopantetheine cofactor), a detailed understanding of protein-protein interactions between carrier proteins and other domains is required for reprogramming of NRPS/PKS machinery. In this example, we report the identification of a protein interaction surface on the EntB aryl carrier protein (EntB-ArCP) for phosphopantetheinyl transferases (PPTases), such as EntD and Sfp, by combinatorial mutagenesis and selection. This protein interaction surface is highly localized, consisting of just two surface residues, and is distinct from the previously identified interface for the downstream elongation module, EntF.
As noted above, enterobactin (1) is an iron-chelating siderophore produced by Escherichia coli upon iron starvation. The enterobactin synthetase consists of four protein components, EntBDEF, that use three molecules each of 2,3-dihydroxybenzoate (DHB) and serine to produce 1 via NRPS logic (
Over 65 non-redundant surviving clones from each library were isolated and sequenced. From these data, WT/Ala ratios for each position, defined as the number of times WT was observed to the number of times Ala was observed, were determined. The degree of conservation for each residue was classified as high (WT/Ala≧20), intermediate (6<WT/Ala<20), or low (WT/Ala≦6). Only five residues fell into the intermediate or high conservation categories (Table 2).
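The classification just described can be written out as a short sketch. The thresholds (high at 20 or above, intermediate between 6 and 20, low at 6 or below) are those stated above; the function names, and the treatment of a position at which Ala was never observed as having an infinite ratio, are assumptions made for illustration.

```python
import math


def wt_ala_ratio(wt_count: int, ala_count: int) -> float:
    """Ratio of wild-type to Ala observations at one position."""
    if wt_count < 0 or ala_count < 0:
        raise ValueError("counts must be non-negative")
    if wt_count == 0 and ala_count == 0:
        raise ValueError("position was not observed in any clone")
    if ala_count == 0:
        return math.inf  # assumption: Ala never observed -> unbounded ratio
    return wt_count / ala_count


def classify_conservation(ratio: float) -> str:
    """Apply the high / intermediate / low thresholds from the text."""
    if ratio >= 20:
        return "high"
    if ratio > 6:
        return "intermediate"
    return "low"


# Hypothetical counts: WT seen 56 times and Ala 4 times at one position.
ratio = wt_ala_ratio(56, 4)
print(ratio, classify_conservation(ratio))
```

Because the ratio is computed from a finite number of sequenced clones, values near a threshold are best read as approximate.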
The sequencing results revealed that the residues G242 and D244 form a conserved, surface-exposed patch that immediately precedes the phosphopantetheinylated S245 (
Three other residues displayed intermediate or high conservation: L238, L243, and D234. The residues L238 and L243, located on the loop, point toward the carrier protein core. The high WT/Ala ratios at these positions are likely due to the role of the Leu side-chain in maintaining the stability of the EntB-ArCP fold. Aspartate at position 234 was preferred about 14-fold over Ala, presumably because D234 participates in charge-charge interactions with K215 and R219 of helix 1.
Collectively, we now have scanned ˜80% of the EntB-ArCP surface using a combinatorial mutagenesis and selection scheme. Overall, the majority of EntB-ArCP surface residues were highly tolerant to mutation. Thirty-six of 44 total surface residues that were examined here and in our earlier report showed low conservation. This result implies that the majority of EntB-ArCP surface residues are not involved in interactions with other synthetase components.
We and others have found that aryl carrier proteins from EntBDEF and related synthetases are surprisingly impervious to mutation while maintaining their ability to be recognized by free-standing adenylation domains in vitro. Thus, the interface for EntE may be malleable for presentation of aminoacyl-O-AMP to the pantetheinyl arm of EntB.
This example suggests that reprogramming NRPS and PKS assembly lines by engineering selective carrier protein interactions should optimally focus on interaction “hot spots,” similar to those on EntB-ArCP for EntD/Sfp and EntF. This process can be facilitated by directed evolution approaches (e.g., using the methods described herein) that target these regions.
The E. coli K12-derived strain entB::kanR contains a chromosomal replacement of the entB gene with a kanamycin resistance marker. When transformed with a plasmid harboring the entB gene, these cells are able to grow on iron-depleted media. This complementation format allowed us to rapidly process large libraries of EntB variants for function. We used a structural homology model of EntB-ArCP based on a PCP domain from the tyrocidine synthetase (TycC3-PCP) for our analysis. A crystal structure of full-length EntB (apo-form) (Drake et al., Chem. Biol. 13:409-419, 2006), which we used for our subsequent library design
Three shotgun alanine scanning libraries that span helix 1 (library H1) and the long loop between helix 1 and helix 2 (loop 1, libraries L1A and L1B) were constructed as described below. (Regions of EntB-ArCP corresponding to helices 2 and 3 and loop 2 were examined as described herein.)
Library Selection and Functional Mapping onto the EntB Crystal Structure
Selection for functional EntB variants was achieved by plating the libraries onto minimal media made iron-deficient by the addition of 100 μM 2,2′-dipyridyl. After incubation at 37° C. for two overnights, colonies of varying diameters were observed. The largest colonies were picked, restreaked onto selective media, and sequenced. Table 3 contains the compiled data from sequencing of 69, 88, and 75 nonredundant clones from the surviving pools of H1, L1A and L1B, respectively. For each position, the WT/Ala ratio was used as a measure of conservation, where the WT/Ala ratio is defined as the number of times WT side-chain identity was observed to the number of times Ala was observed. For position 216 (in which Ala is the WT residue), the WT/Lys ratio was used. The degree of conservation at each position was categorized as high (WT/Ala≧20), intermediate (6<WT/Ala<20), or low (WT/Ala≦6). The surface representation of the apo-EntB-ArCP crystal structure, where each position is color coded according to these classifications, is shown in
Several positions fell into the high or intermediate conservation category. The sidechains for L238 and L243 point toward the core of the ArCP domain (
We prepared a variant of the EntB ArCP domain containing a G242A mutation. The ability of EntD and Sfp to recognize and efficiently phosphopantetheinylate this mutant was examined by monitoring incorporation of 1-[14C]-acetyl-CoA onto the ArCP over time.
In order to determine the role of D244, we expressed and purified the ArCP mutants D244A and D244R. Using an HPLC assay, D244A was readily converted to the holo-form by EntD, but not by Sfp (
The plasmid pJRL16 contains the entB gene cloned into a pET22b-based plasmid. For each library, an inactive template based on pJRL16 was produced that contained two sequential TAA stop codons and a unique EcoRI site in the region of entB to be randomized. The appropriate inactive template was used for full plasmid replication with the phosphorylated primers 5′-CCA GCA CCT ATC CCC GCC KCC RMA RMA GMA CTG SST GMA GYT ATC SYT SCA TTG CTG GAC GAG TCC GAT-3′ for H1, 5′-GAG GTG ATC CTG CCG SYT CTG GMT GMA KCC GMT GMA SCA KYT GMT GMT GAC AAC CTG ATC GAC-3′ for L1A, or 5′-GAA CCC TTC GAT GAC GMT RMC SYT RYT GMT KMT GST SYT GMT TCG GTG CGC ATG ATG GCG-3′ for L1B (regions of randomization indicated in bold, hybridization regions indicated in italic; the standard abbreviations for DNA degeneracies are used: K=G/T, M=A/C, R=A/G, S=G/C, Y=C/T). Ligation of the nascent DNA was accomplished by addition of Taq ligase to the reaction mixture. Plasmid replication with the library primers resulted in replacement of the stop codons and the EcoRI site with the desired regions of randomization. The template was then destroyed by double digestion with DpnI and EcoRI, and the library DNA was purified by phenol/chloroform extraction. Transformation of library DNA into entB::kanR cells was achieved by electroporation. In a typical selection, 10^4-10^7 cells were plated onto 241 × 241 mm plates of minimal media containing 100 μM 2,2′-dipyridyl and 100 μg/mL carbenicillin, and grown for two overnights at 37° C. The largest colonies were restreaked onto selective media and sequenced.
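To make the degenerate codons in the library primers easier to read, the following sketch expands a degenerate codon into its concrete codons and the residues they encode, using the degeneracy abbreviations defined above and the standard genetic code. The function names are hypothetical, and the example codons (GMA, RMA, SYT) are simply taken from the primer sequences; no claim is made here about which primer positions constitute the randomized regions.

```python
from itertools import product

# Degeneracy codes as defined in the text.
DEGENERATE = {"K": "GT", "M": "AC", "R": "AG", "S": "GC", "Y": "CT"}

# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def expand_codon(codon: str) -> list:
    """List every concrete codon represented by a degenerate codon."""
    codon = codon.upper()
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three letters")
    choices = []
    for base in codon:
        if base in "ACGT":
            choices.append(base)
        elif base in DEGENERATE:
            choices.append(DEGENERATE[base])
        else:
            raise ValueError("unsupported base code: " + base)
    return sorted("".join(p) for p in product(*choices))


def encoded_residues(codon: str) -> set:
    """One-letter amino acids encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand_codon(codon)}


for example in ("GMA", "RMA", "SYT"):
    print(example, expand_codon(example), sorted(encoded_residues(example)))
```

For example, GMA expands to GAA and GCA, so that position varies between Glu and Ala, whereas a codon with two degenerate bases such as RMA admits two additional residues alongside the wild-type residue and Ala.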
The DNA for G242A, D244A, and D244R was prepared using standard methods. Expression and purification of EntB-ArCP (WT and mutants) and EntD was as previously described. Phosphopantetheinylation assays monitored by radioactivity were performed in 75 mM Tris pH 7.5, 10 mM MgCl2, 0.5 mM TCEP using 69 μM 1-[14C]-acetyl-CoA (6.6 Ci/mol) and 15 μM EntB-ArCP (WT or mutants). The total reaction volume was 50 μL. Reactions were initiated by addition of EntD or Sfp and quenched in 500 μL 10% (w/v) trichloroacetic acid (TCA). The protein pellet was recovered by centrifugation, washed with 10% (w/v) TCA, and then redissolved in 100 μL formic acid. Scintillation fluid was added (4 mL) and the amount of incorporated radiolabel was determined by liquid scintillation counting. Conditions for the HPLC phosphopantetheinylation assay were similar. Following incubation with CoASH (5 mM) and EntD or Sfp, reactions were quenched in water/0.1% TFA. Analysis was performed using a C4 HPLC column with water/0.1% TFA and acetonitrile as the mobile phases.
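As a worked check on the radioactive assay conditions, the sketch below converts the stated concentrations, reaction volume, and specific activity into the amount of label present and the maximum signal expected if every carrier protein molecule were modified once. The one-label-per-protein stoichiometry and the assumption of 100% counting efficiency are illustrative simplifications, and the function names are hypothetical; the conversion 1 Ci = 2.22 × 10^12 disintegrations per minute is the standard definition.

```python
DPM_PER_CI = 2.22e12  # disintegrations per minute in one curie


def nanomoles(conc_uM: float, volume_uL: float) -> float:
    """Amount in nmol: micromolar x microliters gives pmol, / 1000 gives nmol."""
    return conc_uM * volume_uL / 1000.0


def dpm_from_nmol(amount_nmol: float, specific_activity_ci_per_mol: float) -> float:
    """Disintegrations per minute carried by a given amount of label."""
    return amount_nmol * 1e-9 * specific_activity_ci_per_mol * DPM_PER_CI


# Values from the assay description: 50 uL reaction, 69 uM labeled
# acetyl-CoA at 6.6 Ci/mol, 15 uM EntB-ArCP.
acetyl_coa_nmol = nanomoles(69, 50)   # total label supplied
arcp_nmol = nanomoles(15, 50)         # carrier protein available

# Assumption: one labeled group per protein, counted at 100% efficiency.
max_signal_dpm = dpm_from_nmol(arcp_nmol, 6.6)

print(round(acetyl_coa_nmol, 2), "nmol acetyl-CoA")
print(round(arcp_nmol, 2), "nmol EntB-ArCP")
print(round(max_signal_dpm), "dpm at complete modification")
```

On these assumptions the labeled substrate is in roughly 4.6-fold molar excess over the carrier protein, so the protein rather than the label limits the maximum incorporation.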
To assess surface features of the EntF T domain recognized by C, A, and TE, regions of the EntF T domain were submitted to shotgun alanine scanning and Ent production selection, which revealed residues that could not be substituted by Ala. EntF mutants bearing Ala in such positions were assayed in vitro for Ent production with EntEB and A-T, C-T, and T-TE communications. From these studies, G1027A and M1030A were found to be specifically defective in acyl transfer from T to TE. Thus, these mutants define an interaction surface between these two in cis domains in an NRPS module.
In the two-module EntEBF system, EntEB acts as the initiation module, while EntF functions as both an elongation and a termination module. Given that the four-helix T domain scaffolds can be distinguished, at least by some partner proteins that work in trans, we sought to determine if the EntB T domain presents different faces to its distinct partners, EntD (the PPTase), EntE (the A domain), and EntF (C domain). To do so, we employed a selection under low iron conditions where E. coli require the capacity to produce enterobactin to grow on low iron media. By combinatorial mutagenesis of selected regions on EntB, we identified a surface of the EntB T domain that, upon mutation of its constituent residues, was specifically impaired for recognition by the EntF elongation module but not for interaction with EntD or EntE. In this example, we have turned to the in cis T domain of the 142 kDa protein EntF to assess comparable libraries by combinatorial mutagenesis.
Carrier protein domains are approximately 80 to 100 residues in length. A structure of the EntF T domain is not currently available; we therefore produced a structural model based on homology with a T domain from the tyrocidine NRPS system (TycC3-PCP). Residues 960-1047 of EntF were aligned with TycC3-PCP by using the ClustalW algorithm (
Helix II of the B. subtilis ACP from primary metabolism has been reported to be important for interaction of the ACP with its cognate phosphopantetheinyl transferase ACPS (ACP synthase). Also, helix II residues on PCPs have been reported to be important for interaction with catalytic partners. Residues in helix III of EntB-ArCP constitute an interaction interface for the downstream elongation module, EntF. Therefore, we targeted these portions on the EntF T domain surface (predicted to lie in the helix II/loop II/helix III region) for combinatorial mutagenesis via shotgun alanine scanning. In this combinatorial mutagenesis strategy, codons are used that allow the residues to vary between wt, Ala, and sometimes a third or fourth residue. For cases where the wt residue was Ala, we used a combinatorial codon set that allowed the side-chain identity to vary between Ala, Glu, Gln, and Pro. Three libraries spanning regions of helix II, helix III, and loop II/helix III were prepared (
Selection for functional EntF clones was based on the fact that enterobactin production is essential for the survival of E. coli under low iron conditions. The E. coli strain entF::cat (ER 1100A) contains a chromosomal replacement of the entF gene by a chloramphenicol resistance marker. The entF::cat strain is not able to grow in minimal media in which iron is sequestered by the chelator 2,2′-dipyridyl. However, the entF knockout cells can be complemented by transformation with a pET29-based plasmid that harbors the wild-type entF gene.
Bacteria harboring the EntF libraries were subjected to the iron-deficient selection conditions. Colonies of varying sizes were observed after 24 hr at 37° C., the largest of which were isolated and sequenced. Twenty-nine, 16, and 17 nonredundant surviving clones from the helix II, helix III, and loop II/helix III libraries (respectively) were analyzed. Further sequencing of surviving colonies from helix III and loop II/helix III (40 and 44 total colonies sequenced, respectively) yielded redundant sequences. This result might be due to the small sequence diversity of these two libraries. The survival rate on selection medium was estimated by comparing the number of colonies that grew on rich media with the number of colonies that grew on low-iron media. We observed survival rates of 30% for the helix II library, 8% for the helix III library, and 15% for the loop II/helix III library.
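The relative sequence diversity of the three libraries can be estimated from the degenerate primers given in the library construction section of this example. The following Python sketch is an illustration only and is not part of the reported procedure: it multiplies, over every position of each primer, the number of bases that the IUPAC symbol at that position allows. The function and variable names are hypothetical, and the count is at the DNA level rather than the protein level.

```python
from math import prod

# Number of bases represented by each symbol (degeneracy key as given in this example:
# K=G/T, M=A/C, R=A/G, S=G/C, Y=C/T).
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1,
              "K": 2, "M": 2, "R": 2, "S": 2, "Y": 2}

# Primer sequences copied from the library construction section.
PRIMERS = {
    "helix II": "GCG CTT GGC GGT CAT TCG SYT SYT GCA RYG RMA CTG GCA SMA "
                "CAG TTA AGT CGG CAG GTT",
    "helix III": "CGC CAG GTG ACG CCG GGG SMA GYT RYG GYT SMA TCA ACT GTC "
                 "GCC AAA CTG",
    "loop II/helix III": "CAG TTA AGT CGG CAG GTT GCA SST SMA GYT RCT SCA GST "
                         "CAA GTG ATG GTC GCG TCA",
}


def dna_diversity(primer: str) -> int:
    """Return the number of distinct DNA sequences a degenerate primer encodes."""
    return prod(DEGENERACY[base] for base in primer.replace(" ", ""))


for name, primer in PRIMERS.items():
    print(f"{name}: {dna_diversity(primer)} DNA variants")
```

Under these assumptions the helix II primer encodes 1024 DNA variants, whereas the helix III and loop II/helix III primers each encode 256, which is in line with the redundancy observed on further sequencing of the latter two libraries.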
For residues L1007, G1027, V1029, and M1030, the wt amino acid was strongly preferred over Ala (no Ala residues were observed in surviving clones at these positions). The residue V1029 is predicted to be a core residue in the EntF T domain homology model; furthermore, NMR studies of an EntF fragment confirmed that V1029 points toward the core of the EntF T domain (D. Frueh, D. Vosburg, C. T. W., G. Wagner, unpublished data). We therefore reasoned that mutation of V1029 would be likely to cause disruption of the EntF T domain structure, and thus we did not characterize any point mutants at this position. Residue L1007 is located on helix II of the EntF T domain homology model, immediately C-terminal to the phosphopantetheinylated Ser. The analogous position was found to be important for interactions between the PPTase and ACP of the B. subtilis FAS. Therefore, we believe that mutation of L1007 affects posttranslational modification of EntF. The residues G1027 and M1030 lie on helix III of the EntF T domain model. A representation of the EntF T domain homology model with the locations of the conserved residues is shown in
From the sequencing results for the survivors, proline was prohibited within α-helical regions, except at the beginning of helix III. Proline is an α-helix-breaking residue and would likely disrupt the structure of the EntF T domain if placed in the middle of an α-helix. The observation that proline was not observed in α-helical positions of the EntF T domain (where proline was permitted as an option) suggests that E. coli survival under low iron conditions is tightly coupled to EntF function. Under low iron conditions, E. coli thus are under selective pressure for well-folded and functional EntF variants. This result therefore confirms that the information from sequencing results is valuable for dissecting EntF function.
Phosphopantetheinylation Assay
Enterobactin production by the Ent synthetase requires that the T domains of EntB and EntF be primed with the 4′-phosphopantetheine prosthetic group. Two endogenous PPTases are found in E. coli: one for primary metabolism (ACPS) and the dedicated PPTase EntD, which is encoded in the enterobactin biosynthetic gene cluster. The PPTase ACPS is responsible for the modification of ACP for fatty acid synthesis but does not accept the EntF T domain as a substrate. However, expression of EntD is upregulated in response to low iron conditions, resulting in the posttranslational modification of the EntB and EntF T domains to their holo forms. In order to determine whether the observed conservation of L1007, G1027, and M1030 during in vivo enterobactin production selection was due to recognition defects between EntF and EntD, a phosphopantetheinylation assay was performed with EntD and EntF (wt and mutants).
We characterized the mutants G1027A and M1030A in a previously reported enterobactin reconstitution assay involving EntE and EntB (Gehring et al., Biochemistry 37:2648-2659, 1998). This assay allows validation of the sequence results from combinatorial mutagenesis and affords the opportunity to quantitatively evaluate the overall competence of the EntF mutants for the three steps of the enterobactin biosynthesis reaction cascade (shown in
To prepare the holo form of EntF (wt and mutants), we used the broad-substrate PPTase Sfp from B. subtilis. Both mutants could be efficiently phosphopantetheinylated by Sfp. As the Km for DHB-S-EntB-ArCP as the substrate of EntF is approximately 1 μM, reconstitution assays were performed at 15 μM EntB-ArCP so that catalysis involving EntF would be the rate-limiting step in enterobactin production. This condition allowed us to evaluate whether the EntF mutants were deficient in any of the in cis interactions listed above.
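The choice of 15 μM EntB-ArCP against a Km of approximately 1 μM can be illustrated with a short calculation. The sketch below assumes simple Michaelis-Menten behavior, which is an assumption made here for illustration and is not stated in this example; the function name is hypothetical.

```python
def fractional_saturation(substrate_um: float, km_um: float) -> float:
    """Fraction of Vmax expected under simple Michaelis-Menten kinetics.

    Both concentrations are in micromolar. Michaelis-Menten behavior is an
    illustrative assumption, not a result reported in the example.
    """
    return substrate_um / (km_um + substrate_um)


# 15 uM DHB-S-EntB-ArCP with a Km of about 1 uM (values from the text).
print(fractional_saturation(15.0, 1.0))  # 0.9375
```

Under this assumption EntF would operate at roughly 94% of its maximal rate with respect to the EntB-ArCP substrate, so differences in enterobactin production would mainly reflect steps catalyzed within EntF.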
The production of enterobactin is shown in
Loading of Ser onto the EntF T Domain
The loading of Ser onto the EntF T domain by the EntF A domain is a two-step process. First, Ser is adenylated by the A domain to form the activated Ser-O-AMP ester. Second, this activated Ser-O-AMP species is coupled to the thiol on the phosphopantetheinyl arm of the EntF T domain. To examine the kinetics of Ser covalent loading onto the EntF T domain, the time course for loading of 14C-labeled serine was determined (
Following the loading of serine onto the EntF T domain, the C domain of EntF catalyzes the condensation of DHB (loaded on EntB-ArCP) with the serine loaded on the EntF T domain to form a DHB-Ser condensation product (
Acyl-Transfer from the T Domain to the TE Domain
The EntF TE domain is a unique thioesterase because it is responsible for elongation (trimerization of DHB-Ser via the sidechain hydroxyl of Ser) followed by macrocyclization and release of the mature enterobactin product. This process requires well-timed communication events between the T and TE domains. From the enterobactin reconstitution assay, we concluded that the overall competence of the mutants G1027A and M1030A for the three steps involved in enterobactin production ([1] Ser loading, [2] condensation, and [3] elongation/macrocyclization) was reduced by 15- and 30-fold, respectively. However, neither of these mutations had defects in the Ser loading step or the condensation step as judged by assays that tested each of these steps separately. Therefore, we infer that G1027A and M1030A must be defective in the macrocyclization step (i.e., communication between the T and TE domains of EntF). As a direct assay for T-TE communication using the native DHB and Ser substrates is not available, we developed an assay to examine transfer of an independently primed acyl group from the T domain to the TE domain of EntF. In this assay, T-TE communication was detected by monitoring the net hydrolysis of a noncognate acyl group from EntF. In particular, a limiting amount of 1-[14C]-acetyl-CoA was used with Sfp to load the apo form of EntF (wt and mutants) with 1-[14C]-acetyl-pantetheine onto the T domain. In wt EntF, this radiolabeled acyl group is transferred to the active site serine of the downstream TE domain but is not capable of participating in macrocyclization. As shown in
The T domains that are the centerpiece of the covalent attachment strategy for PKS and NRPS assembly line logic must first be primed by dedicated PPTases that add the 20 Å phosphopantetheine arm, thereby installing the nucleophilic thiol and bringing the assembly lines to the ready position. The thiols of the thiolation domains in turn capture acyl chains in covalent thioester linkage during natural product chain growth. The structures of a number of T domains, of both the ACP and PCP subcategories, have been determined by NMR and/or X-ray in both apo and holo forms and show a three- or four-helix scaffold with the Ser residue to be primed with phosphopantetheine near the N-terminal end of helix II. Priming by PPTase requires the folded architecture of the apo T domains for modification to proceed.
Despite the very similar folds among the 80-100 residue T domains, they can exist in several contexts. One major subgroup is that of free-standing T domains in type II PKS systems such as the actinorhodin and frenolicin synthases. At the other extreme are type I PKSs, such as deoxyerythronolide B synthase and rapamycin synthase, where a T domain is embedded in cis in every module. Most NRPS assembly lines follow type I assembly logic, e.g., ACV synthetase, tyrocidine synthetase, and the three-subunit heptapeptide synthetase in vancomycin construction. However, in coumermycin formation, there is a free-standing A and T domain for channeling proline down that antibiotic pathway. The EntEBF synthetase is a hybrid of type I (EntF) and type II (EntBE) contexts with one T domain (EntB) in trans and one T domain (EntF) in cis.
Here, we have turned to the other T domain in the Ent synthetase, which is embedded within the four-domain EntF, and have used the same approach of shotgun alanine scanning and selection for survivors on low iron medium. We kept side chains of core residues in the EntF T domain constant and varied surface residues on helices II and III and in corresponding loops. The positions L1007, G1027, and M1030 could not be mutated to Ala without impaired enterobactin production. The L1007A, G1027A, and M1030A mutants of EntF were constructed, purified, and assayed in vitro to validate the defect in Ent formation and to determine which of the domain-domain interactions was affected. First, the priming from apo-EntF to the holo form of the T domain still occurs in G1027A and M1030A but not L1007A. This assay provided a readout that the architecture of the T domain in the vicinity of the critical Ser to be primed is in a native state, and the results indicated that G1027A and M1030A were still competent in this regard, but L1007A was not. Second, the A domain within G1027A and M1030A still activates Ser and installs it on the holo form of the T domain as assayed by covalent loading of radiolabeled Ser onto EntF. The C domain was assayed in truncated three-domain C-A-T constructs of G1027A and M1030A with 14C-labeled Ser and unlabeled DHB with EntE and EntB. In the absence of a TE domain, if the C domain is functioning, it should transfer DHB from DHB-S-EntB to [14C]-Ser-S-EntF and yield DHB-[14C]-Ser-S-EntF. Cleavage of the thioester allowed detection and quantitation of DHB-[14C]-Ser. Both the G1027A and M1030A forms of EntF were as active as wild-type EntF in this assay, suggesting recognition of the T domain mutants by the C domain in cis was unaffected.
With the C and A domains of EntF unaffected, the most likely effect of the G1027A and M1030A mutations in the EntF T domain is on its recognition by the in cis downstream TE domain. A result consistent with the impairment of T-TE interaction was obtained in an acyl transfer assay. EntF was primed with 1-[14C]-acetyl-CoA. Wild-type EntF hydrolyzes the acetyl thioester, presumably by transfer to the adjacent TE domain, which then acts as an acetyl-thioesterase. The half-life for acetyl group hydrolytic release is about 5 min. Compared to normal enterobactin cyclotrimerization of 100 min−1, the hydrolysis of the noncognate acetyl group occurs at about 1/500th the rate, slow enough to be inconsequential for normal turnover but useful as an assay for a slow default hydrolytic activity of the EntF TE domain. The G1027A and M1030A mutants in EntF can be stably primed with the acetyl-S-pantetheine, consistent with failure to transfer the acetyl group from T to TE.
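The comparison between the acetyl hydrolysis half-life and the cyclotrimerization rate can be made explicit with a short calculation. The sketch below assumes that acetyl release follows first-order kinetics, so that k = ln 2 / t1/2; this assumption and the function name are introduced here for illustration only.

```python
import math


def first_order_rate_constant(half_life_min: float) -> float:
    """Rate constant (min^-1) for a first-order process with the given half-life."""
    return math.log(2) / half_life_min


# Values from the text: half-life of about 5 min, cyclotrimerization at 100 min^-1.
k_hydrolysis = first_order_rate_constant(5.0)
fold_slower = 100.0 / k_hydrolysis

print(f"k(hydrolysis) = {k_hydrolysis:.3f} min^-1")
print(f"cyclotrimerization is about {fold_slower:.0f}-fold faster")
```

With these inputs the rate constant is approximately 0.14 min−1, several hundred-fold below 100 min−1, which is of the same order as the approximate ratio quoted above.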
Both of the T domains in EntB and EntF have surface patches that are loci of specific recognition by particular partner enzymes. In the EntB T domain, two residues on helix III (F264 and A268) and one on helix II (M249) interact with the downstream EntF and are critical for C domain function (
We believe that T domains use helix III as a general interaction surface for immediate downstream domains (
Production of a Homology Model for EntF T Domain
The T domain of EntF (residues 960-1047) was aligned with TycC3-PCP (PDB code: 1DNY) with the ClustalW algorithm. A homology model was generated by Swiss-Pdb Viewer and refined by SWISS-MODEL software. All structural figures were prepared with PyMOL software (DeLano Scientific).
Library Construction and Selection for Enterobactin Production
For each library, an inactive template based on the wild-type EntF construct pER311A was generated by the SOE method (Ho et al., Gene 77:51-59, 1989). The inactive templates contained tandem TAA stop codons followed by a unique SacI restriction site in the region of the EntF T domain to be randomized. These inactive templates were used for full plasmid replication with the primers 5′-GCG CTT GGC GGT CAT TCG SYT SYT GCA RYG RMA CTG GCA SMA CAG TTA AGT CGG CAG GTT-3′ for the helix II library, 5′-CGC CAG GTG ACG CCG GGG SMA GYT RYG GYT SMA TCA ACT GTC GCC AAA CTG-3′ for the helix III library, and 5′-CAG TTA AGT CGG CAG GTT GCA SST SMA GYT RCT SCA GST CAA GTG ATG GTC GCG TCA-3′ for the loop II/helix III library, respectively (sites of randomization indicated in bold; DNA degeneracies are represented as: K=G/T, M=A/C, R=A/G, S=G/C, Y=C/T). DpnI and SacI were used to destroy the templates. Library DNA was transformed into electrocompetent entF::cat cells and plated onto minimal media in which iron was sequestered by the addition of 100 μM 2,2′-dipyridyl. The transformants were allowed to grow for 24 hr, and the largest colonies were isolated and sequenced.
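The amino acids permitted by each degenerate codon in these primers can be enumerated directly. The following Python sketch is illustrative only: it expands a degenerate codon using the degeneracy key given above and translates each resulting codon with the standard genetic code. The function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy key from the text: K=G/T, M=A/C, R=A/G, S=G/C, Y=C/T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "K": "GT", "M": "AC", "R": "AG", "S": "GC", "Y": "CT"}


def encoded_residues(degenerate_codon: str) -> list[str]:
    """Return the sorted one-letter amino acids a degenerate codon can encode."""
    options = (IUPAC[symbol] for symbol in degenerate_codon)
    return sorted({CODON_TABLE["".join(bases)] for bases in product(*options)})


for codon in ("SMA", "SYT", "RYG", "RMA", "GYT"):
    print(codon, encoded_residues(codon))
```

For example, the codon SMA expands to GAA, GCA, CAA, and CCA, that is, Glu, Ala, Gln, and Pro, matching the codon set described above for positions where the wild-type residue is Ala.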
Site-Directed Mutagenesis, Protein Expression, and Purification
The EntF site-directed mutants L1007A, K1011A, G1027A, and M1030A were constructed by the SOE method (Ho et al., supra). The generation of H1271A and H138A was previously described (Roche and Walsh, Biochemistry 42:1334-1344, 2003). The overexpression and purification of EntF (wild-type and mutants), EntE, EntB-ArCP, EntD, and Sfp were performed as reported (Roche and Walsh, supra). Protein concentrations were determined by Bradford assay.
Phosphopantetheinylation Assays
Phosphopantetheinylation was measured by incorporation of radiolabeled [3H]CoASH onto EntF (wt and mutants). Reactions were performed under the following conditions: 75 mM Tris (pH 7.5), 10 mM MgCl2, 0.5 mM Tris(2-carboxyethyl)phosphine (TCEP), 6 μM EntF (wt and mutants), and 30 μM [3H]CoASH (66.8 Ci/mol); they were initiated by the addition of 1 μM EntD. Reactions were quenched with 10% (wt/vol) TCA, and then BSA (100 mg) was added as a carrier. The protein pellet was washed with 10% (wt/vol) TCA and resuspended in formic acid, and the amount of radioactive label was measured by liquid scintillation counting.
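Scintillation counts from this type of assay are commonly converted to moles of label using the specific activity of the radiolabeled substrate. The sketch below shows one way such a conversion could be done; it is not taken from the reported procedure. The reaction volume and the disintegration count in the usage lines are hypothetical values chosen for illustration, and counts are assumed to have already been corrected for counting efficiency.

```python
DPM_PER_CURIE = 2.22e12  # disintegrations per minute in one curie


def pmol_from_dpm(dpm: float, specific_activity_ci_per_mol: float) -> float:
    """Convert efficiency-corrected dpm to pmol of label."""
    mol = dpm / (DPM_PER_CURIE * specific_activity_ci_per_mol)
    return mol * 1e12


def fraction_labeled(dpm: float, specific_activity_ci_per_mol: float,
                     protein_um: float, volume_ul: float) -> float:
    """Fraction of protein carrying label; micromolar x microliter gives pmol."""
    protein_pmol = protein_um * volume_ul
    return pmol_from_dpm(dpm, specific_activity_ci_per_mol) / protein_pmol


# Specific activity (66.8 Ci/mol) and EntF concentration (6 uM) are from the text;
# the 50 uL sample volume and the 20000 dpm reading are hypothetical.
print(f"{pmol_from_dpm(20000, 66.8):.1f} pmol label")
print(f"{fraction_labeled(20000, 66.8, 6.0, 50.0):.2f} of EntF labeled")
```

The same conversion applies to the [14C] L-Ser and 1-[14C]acetyl-CoA assays in this example when the corresponding specific activities are substituted.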
Enterobactin Reconstitution Assay
Holo EntB-ArCP and EntF were prepared by incubating the apo proteins with 300 nM Sfp and 500 μM CoASH in 75 mM Tris (pH 7.5), 10 mM MgCl2, and 0.5 mM TCEP for 20 min. The enterobactin reconstitution assay was performed as in [37] and modified to the following conditions: 75 mM Tris (pH 7.5), 10 mM MgCl2, 0.5 mM TCEP, 500 μM DHB, 1 mM L-serine, 10 mM ATP, 300 nM EntE, 15 μM holo EntB-ArCP, and 100 nM holo EntF (wt or mutants). Reaction progress was monitored by high-performance liquid chromatography (HPLC) with water/acetonitrile/trifluoroacetic acid mobile phases. Duplicate experiments were performed to determine initial rates for enterobactin reconstitution.
Ser Incorporation Assay
Reactions were performed under the following conditions: 75 mM Tris (pH 7.5), 10 mM MgCl2, 0.5 mM TCEP, 5 μM holo EntF (wt and mutants), and 200 μM [14C] L-Ser (52.38 Ci/mol, Sigma); they were initiated by the addition of 10 mM ATP. The amount of radioactive label on proteins was measured as described in the Phosphopantetheinylation Assays section. Experiments were performed in duplicate.
Condensation Assay
Holo form EntF C-A-T (wt and mutants) proteins were prepared as above. The reaction mixture containing 75 mM Tris (pH 7.5), 10 mM MgCl2, 0.5 mM TCEP, 5 μM holo EntF C-A-T (wt and mutants), 10 μM EntB-ArCP, 900 nM EntE, 100 μM [14C] L-Ser (52.38 Ci/mol, Sigma), and 10 mM ATP was preincubated for 5 min to allow Ser loading. The condensation reactions were started by adding 500 μM DHB. Reactions were quenched within 15 s and washed with 10% TCA. The protein pellets were resuspended in 100 μl 0.5 M KOH. After 10 min incubation at room temperature, which allows the release of Ser or DHB-Ser from proteins, 10 μl of 50% TFA (trifluoroacetic acid) was added to acidify the mixture. Precipitate was removed by centrifugation, and supernatants were analyzed by HPLC. Flow-through radioactivity was monitored by using a Radioisotope Detector β-RAM Model 3 (Beckman).
Acyl Transfer Assay
Reactions were performed under the following conditions: 75 mM Tris (pH 7.5), 10 mM MgCl2, 0.5 mM TCEP, 75 μM 1-[14C]acetyl-CoA (31.10 Ci/mol, Amersham Pharmacia), and 6 μM EntF (wt and mutants). Reactions were started by adding 300 nM Sfp. Reactions were quenched, and the amount of radioactive label was measured as described in the Phosphopantetheinylation Assays section.
All publications, patent applications including U.S. provisional patent application No. 60/701,807, filed Jul. 21, 2005, and patents mentioned in this specification are herein incorporated by reference.
Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific desired embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the fields of medicine, immunology, pharmacology, oncology, or related fields are intended to be within the scope of the invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US06/28487 | 7/21/2006 | WO | 00 | 10/14/2009 |
Number | Date | Country | |
---|---|---|---|
60701807 | Jul 2005 | US |