The present invention is in the field of inhibiting the growth of certain bacteria.
The microbiota of plants and animals have co-evolved with their hosts for millions of years1-3. Due to photosynthesis, plants serve as a rich source of carbon for diverse bacterial communities. These include mutualists and commensals, as well as pathogens. Phytopathogens and plant growth-promoting bacteria significantly affect plant growth, health, and productivity4-7. Except for intensively studied relationships such as root nodulation in legumes8, T-DNA transfer by Agrobacterium9, and type III secretion-mediated pathogenesis10, the understanding of molecular mechanisms governing plant-microbe interactions is quite limited. It is therefore important to identify and characterize the bacterial genes and functions that help microbes thrive in the plant environment. Such knowledge should improve our ability to combat plant diseases and harness beneficial bacterial functions for agriculture, directly impacting global food security, bioenergy, and carbon sequestration.
Cultivation-independent methods based on profiling of marker genes or shotgun metagenome sequencing have considerably improved our understanding of microbial ecology in the plant environment11-15. In parallel, the reduction of sequencing costs has enabled the genome sequencing of plant-associated (PA) bacterial isolates at a large scale16. Importantly, isolates enable functional validation of in silico predictions. Isolate genomes also provide genomic and evolutionary context for individual genes and the ability to access genomes of rare organisms that might be missed by metagenomics due to limited sequencing depth. While metagenome sequencing has the advantage of capturing the DNA of uncultivated organisms, multiple 16S rRNA gene surveys have reproducibly shown that the most common plant-associated bacteria are mainly derived from four phyla13,17 (Proteobacteria, Actinobacteria, Bacteroidetes, and Firmicutes) that are amenable to cultivation. Thus, bacterial cultivation is not a major limitation when sampling the abundant members of the plant microbiome16.
The present invention provides for a composition comprising a purified or isolated Hyde1 gene product, or a functional fragment thereof.
In some embodiments, the Hyde1 gene product comprises an amino acid sequence having at least 70%, 80%, 90%, 95%, or 99% amino acid identity with any one of SEQ ID NOs:1-11. In some embodiments, the Hyde1 gene product comprises one or more of the following conserved amino acid sequences: VYRLE (SEQ ID NO:12), VYRLD (SEQ ID NO:13), QRXXH (SEQ ID NO:14), VRLYRI (SEQ ID NO:15), VRLYRV (SEQ ID NO:16), VRLHRI (SEQ ID NO:17), VRLHRV (SEQ ID NO:18), IRLYRI (SEQ ID NO:19), IRLYRV (SEQ ID NO:21), IRLHRI (SEQ ID NO:22), IRLHRV (SEQ ID NO:23), PXXLLGXSXXVDXW (SEQ ID NO:24), PXXLLGXSXXVDLW (SEQ ID NO:25), and PXXLLGXSXXVDIW (SEQ ID NO:26), wherein X is any naturally occurring amino acid.
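As an illustrative sketch only, the conserved sequences above, in which X denotes any naturally occurring amino acid, can be searched for in a candidate protein sequence by treating X as a wildcard. The function names and the example sequence in the usage note are hypothetical and are not taken from SEQ ID NOs:1-26; treating "naturally occurring" as the 20 standard amino acids is likewise an assumption.

```python
from __future__ import annotations

import re

# Assumption: X is approximated by the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_to_regex(motif: str) -> str:
    """Convert a motif such as 'QRXXH' into a regex string (X = any residue)."""
    return "".join(
        f"[{AMINO_ACIDS}]" if ch == "X" else re.escape(ch) for ch in motif
    )


def find_motifs(sequence: str, motifs: list[str]) -> dict[str, list[int]]:
    """Return the 0-based start positions of each motif found in the sequence."""
    hits: dict[str, list[int]] = {}
    for motif in motifs:
        # A lookahead is used so that overlapping occurrences are also reported.
        pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
        positions = [m.start() for m in pattern.finditer(sequence)]
        if positions:
            hits[motif] = positions
    return hits
```

For a hypothetical sequence such as MAVYRLEGGQRTAHKK, find_motifs(sequence, ["VYRLE", "VYRLD", "QRXXH"]) reports VYRLE and QRXXH and omits VYRLD, which does not occur.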
In some embodiments, the Hyde1 gene product is Aave_0989, Aave_3191, or any other Hyde1 gene described herein.
In some embodiments, the Hyde1 gene product is capable of killing a broad array of plant pathogenic bacterial species.
The present invention provides for a pharmaceutical composition comprising the composition of claim 1 and a pharmaceutically acceptable carrier.
The present invention provides for a medicant manufactured using the composition of the present invention.
The present invention provides for a modified host cell comprising one or more genes encoding, and/or capable of expressing, a Type VI secretion system (T6SS), Hyde1, and/or Hyde2, or a functional fragment thereof.
The present invention provides for a modified bacterial cell comprising one or more genes encoding, and/or capable of expressing, a Type VI secretion system (T6SS), wherein the bacterial cell is naturally occurring and pathogenic to a subject but is modified to be non-pathogenic to the subject.
In some embodiments, the subject is a plant or a mammal, such as a human. In some embodiments, the subject is known to be, suspected to be, or has a high probability of being infected or contaminated with pathogenic bacteria. In some embodiments, the subject is a human patient.
In some embodiments, the bacterial cell is modified to reduce expression of, or is knocked out for, a Type III secretion system (T3SS) or Type IV secretion system (T4SS) that the unmodified bacterial cell naturally is capable of expressing.
In some embodiments, the bacterial cell is a Hyde1 positive strain. In some embodiments, the bacterial cell is modified to make it not pathogenic. In some embodiments, the bacterial cell naturally contains or expresses Hyde1, wherein optionally the bacterial cell is modified to make it not pathogenic.
The present invention provides for a method of treating a plant disease caused in whole or in part by a bacterial cell, comprising: administering a composition of claim 3 to a plant, or a part thereof, in need thereof.
In some embodiments, the part is a seed, root, stem, stalk, branch, leaf, flower, or fruit.
The present invention provides for a method of treating a disease caused all or in part by a bacterial cell, comprising: administering a pharmaceutical composition or medicant of the present invention to a subject in need thereof.
In some embodiments, the bacterial cell is a human pathogen and the subject is a human patient.
In some embodiments, the bacterial cell is a species from a genus selected from the group consisting of Escherichia, Enterococcus, Staphylococcus, Klebsiella, Acinetobacter, Pseudomonas, and Enterobacter.
In some embodiments, the bacterial cell is an Escherichia coli, Enterococcus faecium, Enterobacter cloacae, Enterobacter aerogenes, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, or Pseudomonas aeruginosa.
The present invention provides for a method to limit or reduce growth of pathogenic bacteria in an environment, comprising: introducing a non-pathogenic bacterial cell comprising one or more genes encoding, and/or capable of expressing, a Type VI secretion system (T6SS), Hyde1, and/or Hyde2, or a functional fragment thereof, to an environment; whereby expression of the Type VI secretion system (T6SS), Hyde1, and/or Hyde2, or functional fragment thereof, limits or reduces growth of the pathogenic bacteria in the environment.
In some embodiments, the environment is an intensive care unit (ICU), or is known to be, suspected to be, or has a high probability of being infected or contaminated with pathogenic bacteria.
A group of novel proteins (Hyde1 proteins) in the bacterial genus Acidovorax are used to kill competing organisms, including bacterial plant pathogens. The proteins are likely injected through a type VI secretion system. Most of the organisms encoding these proteins are plant pathogens, but they can be mutated into non-pathogens, or the relevant toxic genes can be transferred into non-pathogenic bacteria. The resulting bacterial strains can be used as biocontrol agents to limit plant pathogens and possibly also human pathogens. Thirteen of 16 bacterial strains tested as prey cells are killed in vitro by the Acidovorax strain encoding Hyde1 proteins. Killing of prey cells is significantly reduced when the genes encoding the Hyde proteins are deleted or when the T6SS is deleted. Direct expression of the toxic protein is shown to be highly toxic to the recipient bacterial cell. The antibacterial properties of Hyde proteins are tested both in bacteria dwelling in plant environments and as pure protein. Many Hyde-like proteins with potential antimicrobial properties are predicted and identified.
The foregoing aspects and others will be readily appreciated by the skilled artisan from the following description of illustrative embodiments when read in conjunction with the accompanying drawings.
Before the invention is described in detail, it is to be understood that, unless otherwise indicated, this invention is not limited to particular sequences, expression vectors, enzymes, host microorganisms, or processes, as such may vary. It is also to be understood that the terminology used herein is for purposes of describing particular embodiments only, and is not intended to be limiting.
In this specification and in the claims that follow, reference will be made to a number of terms that shall be defined to have the following meanings:
The terms “optional” or “optionally” as used herein mean that the subsequently described feature or structure may or may not be present, or that the subsequently described event or circumstance may or may not occur, and that the description includes instances where a particular feature or structure is present and instances where the feature or structure is absent, or instances where the event or circumstance occurs and instances where it does not.
As used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to an “expression vector” includes a single expression vector as well as a plurality of expression vectors, either the same (e.g., the same operon) or different; reference to “cell” includes a single cell as well as a plurality of cells; and the like.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
The term “about” refers to a value including 10% more than the stated value and 10% less than the stated value.
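By way of a minimal, non-limiting sketch, the definition of “about” above can be expressed as a numeric range. The function names are hypothetical, and treating the bounds as inclusive is an assumption.

```python
def about_range(stated_value: float) -> tuple:
    """Return (lower, upper): 10% less and 10% more than the stated value."""
    delta = abs(stated_value) * 0.10
    return stated_value - delta, stated_value + delta


def is_about(value: float, stated_value: float) -> bool:
    """True if value lies within the 'about' range of stated_value.

    Assumption: both bounds are treated as inclusive.
    """
    lower, upper = about_range(stated_value)
    return lower <= value <= upper
```

Under this reading, “about 100” spans 90 to 110.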
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.
One aspect of the invention comprises the use of a bacterial strain that colonizes plants as a way of eradicating plant pathogens (for example, by serving as a biocontrol agent).
In some embodiments, the strain is Acidovorax citrulli AAC00-1 (wild-type strain), which is shown in vitro to kill 13 of 16 bacterial strains tested when co-cultured with a competing bacterial strain for 19 hours.
Using computational biology tools developed herein, new genes (“Hyde1”) are identified as being used in the killing. The genes are genetically associated with the T6SS loci through a putative adaptor gene, named Hyde2. Two Hyde1 genes, sharing 53% amino acid sequence identity, are tested by expression in E. coli and are shown to be highly toxic to E. coli, leading to a nearly one million-fold reduction in colony forming units.
The wild-type (WT) and Hyde1 mutant strains are then tested against 16 different plant-associated bacterial strains, including the following plant pathogens: Pseudomonas syringae B728a, Pseudomonas syringae tomato DC3000, Ralstonia solanacearum AW1, Xanthomonas campestris LMG568, and Agrobacterium tumefaciens C58. The WT strain clears 10- to 10,000-fold more bacteria than the Hyde1 deletion mutant. For example, the plant pathogen Pseudomonas syringae B728a is reduced by 10,000-fold when challenged with WT Acidovorax citrulli AAC00-1. This effect is abolished when the Hyde1 genes are deleted. The most toxic protein may be Aave_0989, as a mutant carrying a deletion of this gene no longer kills two competing bacterial strains.
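As a hedged illustration of how a fold reduction of the kind reported above may be computed, the sketch below divides the prey colony forming units (CFU) recovered after co-culture with the deletion mutant by the prey CFU recovered after co-culture with the WT strain. The function names are hypothetical, the CFU values in the usage note are invented for illustration and are not experimental data, and this particular ratio is an assumed convention.

```python
import math


def fold_reduction(prey_cfu_with_mutant: float, prey_cfu_with_wt: float) -> float:
    """Fold reduction in prey viability attributable to the deleted genes.

    Assumption: the reduction is expressed as the ratio of prey CFU
    recovered with the deletion mutant to prey CFU recovered with WT.
    """
    if prey_cfu_with_wt <= 0 or prey_cfu_with_mutant <= 0:
        raise ValueError("CFU counts must be positive")
    return prey_cfu_with_mutant / prey_cfu_with_wt


def log10_fold_reduction(prey_cfu_with_mutant: float, prey_cfu_with_wt: float) -> float:
    """The same quantity on a log10 scale (4.0 corresponds to 10,000-fold)."""
    return math.log10(fold_reduction(prey_cfu_with_mutant, prey_cfu_with_wt))
```

For example, hypothetical counts of 2e8 prey CFU with the mutant and 2e4 prey CFU with the WT strain correspond to a 10,000-fold reduction.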
The invention encompasses a large number of organisms that encode Hyde1 genes and Hyde1-like genes and may kill other competing pathogens.
In the case of plant pathogens, the main mechanism of pathogenesis is known (the T3SS and its secreted proteins); hence, by deleting these genes, non-pathogenic bacterial strains that are efficient in killing competitor cells can be produced.
By using the same computational biology approach described herein, one skilled in the art can predict other novel protein families with putative antibacterial effects against other bacteria.
The following are exemplary Hyde1 amino acid sequences, which are all from the Acidovorax avenae citrulli AAC00-1 strain.
It is to be understood that, while the invention has been described in conjunction with the preferred specific embodiments thereof, the foregoing description is intended to illustrate and not limit the scope of the invention. Other aspects, advantages, and modifications within the scope of the invention will be apparent to those skilled in the art to which the invention pertains.
All patents, patent applications, and publications mentioned herein are hereby incorporated by reference in their entireties.
The invention having been described, the following examples are offered to illustrate the subject invention by way of illustration, not by way of limitation.
The present invention provides for a new genetic mechanism that efficiently inhibits growth of different bacteria grown in culture, including plant pathogens.
The Type VI secretion system (T6SS) is used by bacteria to secrete proteins (“effectors”) that are toxic to neighboring cells, mostly bacteria, but occasionally to eukaryotic host cells (plant or animal cells).
A set of new genes (Hyde1) is discovered in bacteria that are pathogenic to plants (genus Acidovorax). The genes are restricted to the Acidovorax phytopathogens and are found in all 10 analyzed strains. The strains were originally isolated from the leaves of a large set of plants, including sugarcane, rice, lamb's lettuce, Konjac, watermelon, melon, maize and Citrullus lanatus. The genes are associated with the T6SS. This association is established by identifying the Hyde1 genes next to a set of other novel genes (Hyde2) that are either located next to or within the T6SS gene locus or are fused to T6SS-associated proteins. Hyde2 may be an adaptor gene connecting Hyde1 genes to the T6SS.
When two different Hyde1 genes (Aave_0989 and Aave_3191) are expressed in E. coli recipient cells, the number of bacterial colonies is reduced by 10^5-fold in comparison to the expression of a non-toxic gene. Nine of the 11 Hyde1 genes in Acidovorax citrulli AAC00-1 are deleted. See
When the wild-type Acidovorax citrulli strain (serving as a predator strain) is co-cultured with competing bacteria (serving as prey cells), the viability of the competing bacteria is reduced by up to 10^5-fold in comparison to the same experiment with deletion mutants for either Hyde1 or the T6SS. This is shown for 7/10 strains of prey cells tested. See
The Acidovorax citrulli AAC00-1 strain also inhibits the growth of, or kills, a set of plant pathogens. A deletion mutant for the Aave_0989 gene abolishes the Hyde1 toxicity against two prey strains (E. coli and L434). Hence, the Aave_0989 gene demonstrates the strongest antibacterial property of the genes tested. See
Plants intimately associate with diverse bacteria. Plant-associated bacteria have ostensibly evolved genes that enable them to adapt to plant environments. However, the identities of such genes are mostly unknown, and their functions are poorly characterized. 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize are sequenced. 3,837 bacterial genomes are compared to identify thousands of plant-associated gene clusters. Genomes of plant-associated bacteria encode more carbohydrate metabolism functions and fewer mobile elements than related non-plant-associated genomes do. Candidates from two sets of plant-associated genes are experimentally validated: one involved in plant colonization, and the other serving in microbe-microbe competition between plant-associated bacteria. 64 plant-associated protein domains are identified that potentially mimic plant domains; some are shared with plant-associated fungi and oomycetes. This expands the genome-based understanding of plant-microbe interactions and provides potential leads for efficient and sustainable agriculture through microbiome engineering.
An objective is to characterize the genes that contribute to bacterial adaptation to plants (plant-associated genes) and those genes that specifically aid in bacterial root colonization (root-associated genes). The genomes of 484 new bacterial isolates and single bacterial cells from the roots of Brassicaceae, maize, and poplar trees are sequenced. The newly sequenced genomes are combined with existing genomes into a dataset of 3837 high quality, non-redundant genomes. A computational approach is developed to identify plant-associated (PA) genes and root-associated (RA) genes based on comparison of phylogenetically-related genomes with knowledge of the origin of isolation. Two sets of PA genes, including a novel gene family that functions in plant-associated microbe-microbe competition, are experimentally validated. In addition, many PA genes that are shared between bacteria of different phyla, and even between bacteria and PA eukaryotes, are characterized. This represents a comprehensive and unbiased effort to identify and characterize candidate genes required at the bacterial-plant interface.
To obtain a comprehensive PA bacterial reference genome set, 191, 135, and 51 novel bacterial strains from the roots of Brassicaceae (91% from Arabidopsis thaliana), poplar trees (Populus trichocarpa and Populus deltoides), and maize, respectively, are isolated and sequenced (Table 1). The bacteria are specifically isolated from either the root interior (endophytic compartment), the root surface (rhizoplane), or the soil attached to the root (rhizosphere) of plants. In addition, 107 single bacterial cells from surface-sterilized roots of A. thaliana are isolated and sequenced. All genomes are assembled, annotated and deposited in public databases and in a dedicated website.
A Broad, High-Quality Bacterial Genome Collection
In addition to the newly sequenced genomes noted above, public databases are mined to collect 5587 bacterial genomes belonging to the four most abundant phyla of PA bacteria13 (Methods). Each genome is manually classified as PA, non-plant associated (NPA), or soil-derived based on its unambiguous isolation niche. The PA genomes include organisms isolated from plants or rhizospheres. A subset of the PA bacteria is also annotated as ‘RA’ when isolated from the rhizoplane or the root endophytic compartment. Genomes from bacteria isolated from soil are considered as a separate group, as it is unknown whether these strains can actively associate with plants. Finally, the remaining genomes are labeled as non-plant associated (NPA) genomes; these are isolated from diverse environments, including humans, animals, air, sediments, and aquatic environments. A stringent quality control process is performed to remove low quality or redundant genomes. This leads to a final dataset of 3837 high quality and non-redundant genomes, including 1160 PA genomes, 523 of which are also RA. These 3837 genomes are grouped into nine monophyletic taxa to allow comparative genomics among phylogenetically-related genomes (
To determine whether the genome collection from cultured isolates is representative of plant-associated bacterial communities, cultivation-independent 16S rDNA surveys and metagenomes from the plant environment of Arabidopsis11,12, barley18, wheat, and cucumber14 are analyzed. The nine taxa analyzed here account for 33-76% (median 41%, Supplementary Table 4) of the total bacterial communities found in PA environments and therefore represent a significant portion of the plant microbiota, consistent with previous reports13,16,19.
The genomes of bacteria isolated from plant environments are compared with those of bacteria of shared ancestry that were isolated from non-plant environments. The two groups should differ in the set of accessory genes that evolved as part of their adaptation to a specific niche. Comparison of the sizes of PA, soil, and NPA genomes reveals that PA and/or soil genomes are significantly larger than NPA genomes (P<0.05, PhyloGLM and t-tests). The trend is observed in 6-7 of the nine analyzed taxa (depending on the test), representing all four phyla. Pangenome analyses within a few genera having both PA and NPA isolation sites reveal similar pangenome sizes between PA and NPA genomes.
Next, 26 broad functional gene categories are examined to determine whether certain categories are enriched or depleted in PA genomes compared to their NPA counterparts. Enrichments are detected using the PhyloGLM test (
Specific genes that are enriched in PA and RA genomes, compared to NPA and soil-derived genomes, respectively, are then identified. First, the proteins/protein domains of each taxon are clustered based on homology using different annotation resources: COG20, KEGG Orthology21 and TIGRFAM22, which typically cover 35%-75% of all genes in bacterial genomes23. To capture genes that do not have existing functional annotations, Orthofinder24 is used (following benchmarking) to cluster all protein sequences within each taxon into homology-based orthogroups. Finally, protein domains are clustered using Pfam25. These five protein/domain clustering approaches are used in parallel comparative genomics pipelines. Each protein/domain sequence is additionally labeled as originating from either a PA or an NPA genome.
Next, it is tested whether protein/domain clusters are significantly associated with a PA lifestyle using five independent statistical approaches: hypergbin and hypergcn (two versions of the hypergeometric test), phyloglmbin and phyloglmcn (two phylogenetic tests based on PhyloGLM26), and Scoary27, a stringent combined test. These analyses are based on either gene presence/absence or gene copy number. A gene is defined as significantly PA (henceforth “PA gene”) if it belongs to a significant PA gene cluster by at least one test and originates from a PA genome. Significant NPA, RA and soil genes are defined in the same way. Significant gene clusters found using the different methods have varying degrees of overlap. In general, there is a high degree of overlap between PA and RA genes and between NPA and soil genes. Overall, PA genes are depleted from NPA genomes from heterogeneous isolation sources. Principal coordinates analysis (PCoA) using matrices containing only the PA and NPA genes derived from each method as features increases the separation of PA from NPA genomes along the first two axes.
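To make the presence/absence form of the hypergeometric test more concrete, the following is a minimal sketch of a one-sided enrichment test for a single gene cluster within one taxon. It is an assumption-laden illustration and not the pipeline actually used: the function name and argument names are hypothetical, and no multiple-testing correction or phylogenetic correction is applied.

```python
from math import comb


def hypergeom_enrichment_pvalue(
    total_genomes: int,
    pa_genomes: int,
    genomes_with_cluster: int,
    pa_genomes_with_cluster: int,
) -> float:
    """One-sided P(X >= observed) under the hypergeometric distribution.

    X is the number of PA genomes among the genomes carrying the cluster,
    assuming the cluster is distributed independently of the PA label.
    """
    N, K = total_genomes, pa_genomes
    n, k = genomes_with_cluster, pa_genomes_with_cluster
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    # Sum the upper tail; math.comb returns 0 when the draw is impossible.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

A small P value indicates that the cluster is present in more PA genomes than expected by chance. Because this test ignores phylogeny, lineage-specific genes in a PA lineage can appear enriched, which is the motivation for running the PhyloGLM-based tests and Scoary in parallel.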
As a validation of predictions, the abundance patterns of PA/RA genes in natural environments are assessed. 38 publicly available PA, NPA, RA and soil shotgun metagenomes are retrieved, including some from PA environments that are not used for isolation of the bacteria analyzed here14,28,29. Reads from these culture-independent metagenomes are mapped to PA genes from all statistical approaches. PA genes in up to seven taxa are more abundant (P <0.05, t-test) in PA metagenomes than in NPA metagenomes (
In addition, eight genes that are predicted as PA by multiple approaches are selected for experimental validation using an in planta bacterial fitness assay. The roots of surface-sterilized rice seedlings (n=9-30 seedlings/experiment) are inoculated with wild type Paraburkholderia kururiensis M130 (a rice endophyte30) or a knock-out mutant strain for each of the eight genes. The plants are grown for 11 days and collected, and the bacteria that are tightly attached to the roots are quantified. Mutations in two genes lead to four- to six-fold reduced colonization (FDR corrected Wilcoxon rank sum test, q <0.1) relative to wild type bacteria (
Functions for which co-expression and cooperation between different proteins are needed are often encoded by gene operons in bacteria. It is tested whether the methods correctly predict known PA operons. PA and RA genes are grouped into putative PA and RA operons based on their genomic proximity and orientation. This analysis yielded some well-known PA functions, for example, the nodABCSUIJZ and nifHDKENXQ operons (
In summary, thousands of PA and RA gene clusters are identified by five different statistical approaches and validated by computational and experimental approaches, broadening our understanding of the genetic basis of plant-microbe interactions and providing a valuable resource to drive further experimentation.
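The grouping of PA and RA genes into putative operons based on genomic proximity and orientation, described above, can be sketched as follows. This is an illustration under stated assumptions only: the Gene record, the function name, and in particular the max_gap threshold of 200 bp are hypothetical and are not parameters disclosed herein.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    contig: str
    start: int   # 1-based start coordinate
    end: int     # 1-based end coordinate
    strand: str  # '+' or '-'


def group_into_putative_operons(genes, max_gap=200):
    """Group genes that are adjacent, on the same strand, and close together.

    Assumption: max_gap (in bp) is an illustrative threshold only.
    """
    ordered = sorted(genes, key=lambda g: (g.contig, g.start))
    operons, current = [], []
    for gene in ordered:
        if current:
            prev = current[-1]
            same_unit = (
                gene.contig == prev.contig
                and gene.strand == prev.strand
                and gene.start - prev.end <= max_gap
            )
            if not same_unit:
                # Close the current group and start a new one.
                operons.append(current)
                current = []
        current.append(gene)
    if current:
        operons.append(current)
    return operons
```

Groups containing a single gene can be discarded afterwards if only multi-gene putative operons are of interest.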
PA and RA proteins and protein domains conserved across evolutionarily diverse taxa are potentially pivotal to the interaction of bacteria with plants. 767 Pfam domains are identified that are significant PA domains in at least three taxa based on multiple tests. Two of these domains, a DNA binding (pfam00356) and a ligand binding (pfam13377) domain, are characteristic of the LacI transcription factor (TF) family. These TFs regulate gene expression in response to different sugars41 and their copy number is expanded in the genomes of PA and RA bacteria of eight of the nine taxa analyzed (
Another domain, Aldo-keto reductase (pfam00248), is a metabolic domain enriched within the genomes of PA and RA bacteria from eight taxa belonging to all four phyla (
Convergent evolution or horizontal transfer of protein domains from eukaryotes to bacteria has been suggested for some microbial effector proteins that are secreted into eukaryotic host cells to suppress defense and facilitate microbial proliferation43-45. New candidate effectors or other functional plant protein mimics are searched for. A set of significant PA/RA Pfam domains that are reproducibly predicted by multiple approaches or in multiple taxa is retrieved and cross-referenced with protein domains that are also more abundant in plant genomes than in bacterial genomes. This analysis yields 64 Plant-Resembling PA and RA Domains (PREPARADOs) encoded by 11,916 genes. The number of PREPARADOs is four-fold higher than the number of domains that overlap with reproducible NPA/soil domains and plant domains (n=15). The PREPARADOs are relatively abundant in genomes of PA Bacteroidetes and Xanthomonadaceae (>0.5% of all domains on average). Some PREPARADOs are previously described as domains within effector proteins, such as Ankyrin repeats46, regulator of chromosome condensation repeat (RCC1)47, Leucine-rich repeat (LRR)48, and pectate lyase49. Intriguingly, PREPARADOs from plant genomes are enriched 3-14-fold (P<10^-5, Fisher exact test) as domains predicted to be ‘integrated effector decoys’ when fused to plant intracellular innate immune receptors of the NLR class50-53 (compared against two random domain sets). Surprisingly, 2201 bacterial proteins that encode 17/64 of the PREPARADOs share ≥40% identity across the entire protein sequence with eukaryotic proteins from plants, PA fungi or PA oomycetes, and therefore likely maintain a similar function. The patchy distribution among this class could have resulted from convergent evolution or from cross-kingdom HGT between phylogenetically distant organisms experiencing the shared selective forces of the plant environment.
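As a minimal sketch of how an identity threshold such as the ≥40% figure above may be checked, the function below computes percent identity from a pair of already-aligned sequences in which gaps are written as '-'. The function name is hypothetical, the aligned strings in the usage note are invented, and the choice of denominator (all alignment columns that are not gaps in both sequences) is an assumption; other conventions exist and give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pairwise alignment (gaps written as '-').

    Assumption: the denominator is the number of alignment columns that
    are not gaps in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        columns += 1
        if a == b:
            matches += 1  # identical residues (neither can be a gap here)
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns
```

For the hypothetical aligned pair MKT-LLV and MRTALLV, five of seven columns match, giving roughly 71% identity, which would pass a 40% threshold.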
Seven PREPARADO-containing protein families are characterized by N-terminal eukaryotic or bacterial signal peptides followed by a PREPARADO dedicated to carbohydrate binding or metabolism. One of these domains, Jacalin, is a mannose-binding lectin domain that is found in 48 genes in the Arabidopsis thaliana genome compared with three genes in the human genome25. Mannose is found on the cell wall of different bacterial and fungal pathogens and could serve as a microbial-associated molecular pattern (MAMP) that is recognized by the plant immune system54-61. A family of ~430 AA-long microbial proteins is identified with a signal peptide, followed by a functionally ill-defined endonuclease/exonuclease/phosphatase family domain (pfam03372) and ending with a Jacalin domain (pfam01419). Strikingly, this domain architecture is absent in plants but is distributed across diverse microorganisms, many of which are phytopathogens, including Gram-negative and -positive bacteria, fungi from the Ascomycota and Basidiomycota phyla, and oomycetes (
To conclude, a large set of protein domains is discovered that are shared between plants and the microbes colonizing them. In many cases the entire protein is conserved across evolutionarily distant PA microorganisms.
Numerous cases of PA gene clusters (orthogroups) are identified that demonstrate high co-occurrence between genomes. When the PA genes are derived from phylogeny-aware tests (i.e. PhyloGLM and Scoary) they are candidates for inter-taxon HGT events. For example, a cluster predicted by Scoary of up to 11 co-occurring genes (mean pairwise Spearman correlation=0.81) is identified in a flagellum-like locus from sporadically distributed PA/soil genomes across 12 different genera in Burkholderiales (
In addition to successfully capturing several known PA operons (
The typical Jekyll gene is 97 AAs long, contains an N-terminal signal peptide, lacks a transmembrane domain, and in 98.5% of cases appears in non-pathogenic PA or soil-associated Acidovorax isolates (
The Hyde putative operons, on the other hand, are composed of two distinct gene families unrelated to Jekyll. A typical Hyde1 protein has 135 AAs and an N-terminal transmembrane helix. Hyde1 proteins are also highly variable as measured by copy number variation, sequence divergence and intra-locus transposon insertions (
The elevated sequence diversity of Jekyll and Hyde1 genes suggests that these two PA protein families could be involved in molecular arms races with other organisms within the plant environment. Since many type VI effectors are used in inter-bacterial warfare, Acidovorax Hyde1 proteins are tested for antibacterial properties. Expression of two variants of the gene in E. coli led to 10^5- to 10^6-fold reduction in cell numbers (
There is increasing awareness that plant-associated microbial communities play important roles in host growth and health. An understanding of plant-microbe relationships at the genomic level could enable enhancement of agricultural productivity using microbes. Most studies have focused on specific plant microbiomes, with more emphasis on microbial diversity than on gene function12,14,16,18,68-74. Nearly 500 RA bacterial genomes isolated from different plant hosts are sequenced. These new genomes are combined in a collection of 3837 high quality bacterial genomes for comparative analysis. A systematic approach is developed to identify PA and RA genes and putative operons. This method is accurate as reflected by the ability to capture numerous operons previously shown to have a PA function, the enrichment of PA genes in PA metagenomes, the validation of Hyde1 proteins as likely type VI effectors in Acidovorax directed against other PA bacteria, and the validation of two new genes in Paraburkholderia kururiensis that affect rice root colonization. Bacterial genes that are enriched in genomes from the plant environment are also likely to play a role in adaptation to the many other organisms that share the same niche, as demonstrated for Hyde1.
Five different statistical approaches are used to identify genes significantly associated with the plant/root environment, each with its advantages and disadvantages. The phylogeny-correcting approaches (phyloglmbin, phyloglmcn, and Scoary) allow accurate identification of genes that are polyphyletic and correlate with an environment independently of ancestral state. Based on metagenome validation, the hypergeometric test predicts more genes that are abundant in plant-associated communities than PhyloGLM does. It also enables identification of monophyletic PA genes but yields more false positives than the phylogenetic tests, since in every PA lineage many lineage-specific genes will be considered PA. Scoary is the most stringent method of all and yields the lowest number of predictions. Future experimental validation should prioritize genes predicted in multiple taxa and/or by multiple approaches.
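The prioritization suggested above, favoring genes predicted in multiple taxa and/or by multiple approaches, can be sketched as a simple tally. The function name, the thresholds, and the cluster and taxon labels in the usage note are hypothetical; the ranking rule (approaches first, then taxa) is an assumed design choice rather than a disclosed procedure.

```python
from collections import defaultdict


def prioritize_clusters(predictions, min_approaches=2, min_taxa=2):
    """Rank gene clusters by how many approaches and taxa support them.

    predictions: iterable of (cluster_id, taxon, approach) tuples, one per
    significant call. A cluster is kept if it meets either threshold.
    """
    approaches = defaultdict(set)
    taxa = defaultdict(set)
    for cluster_id, taxon, approach in predictions:
        approaches[cluster_id].add(approach)
        taxa[cluster_id].add(taxon)
    kept = [
        c for c in approaches
        if len(approaches[c]) >= min_approaches or len(taxa[c]) >= min_taxa
    ]
    # Most-supported clusters first; ties are broken by cluster identifier.
    return sorted(kept, key=lambda c: (-len(approaches[c]), -len(taxa[c]), c))
```

For example, a cluster called by hypergbin, phyloglmbin, and Scoary in three taxa would rank ahead of a cluster called by two approaches in a single taxon, and a cluster called once would be dropped.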
64 PREPARADOs are discovered. Proteins containing 19 of these domains are predicted to be secreted by Sec or T3SS. Notably, plant proteins carrying 35 of these domains belong to the NLR class of intracellular innate immune receptors. Hence, these PREPARADO protein domains may serve as molecular mimics. Some may interfere with plant immune functions through disruption of key plant protein interactions75,76. Likewise the Jacalin-containing proteins in PA bacteria, fungi and oomycetes may represent a strategy of avoiding MAMP-triggered immunity by binding to extracellular microbial mannose molecules, thereby serving as a molecular invisibility cloak77,78.
Finally, it is demonstrated that numerous PA functions are surprisingly consistent across phylogenetically-diverse bacterial taxa and that some functions are even shared with PA eukaryotes. Some of these traits may facilitate plant colonization by microbes and therefore might prove useful in genome engineering of agricultural inoculants to eventually yield a more efficient and sustainable agriculture.
Bacterial strains from Brassicaceae and Poplar are isolated using previously described protocols79,80. Poplar strains are cultured from root tissues collected from Populus deltoides and Populus trichocarpa trees in Tennessee, North Carolina, and Oregon. Root samples are processed as described previously15,81. Briefly, rhizosphere strains are isolated by plating serial dilutions of root wash, while for endosphere strains, surface sterilized roots are pulverized with a sterile mortar and pestle in 10 mL of MgSO4 (10 mM) solution followed by plating serial dilutions. Strains are isolated on R2A agar media, and resulting colonies are picked and re-streaked a minimum of three times to ensure isolation. Isolated strains are identified by 16S rDNA PCR followed by Sanger sequencing.
For maize isolates, soils associated with the Il14h and Mo17 maize genotypes grown in Lansing, N.Y. and Urbana, Ill. are used. Rhizosphere soil samples of each maize genotype grown at each location are collected at week 12 as previously described68. From each rhizosphere soil sample, soil is washed and samples are plated onto Pseudomonas Isolation Agar (BD Diagnostic Systems). The plates are incubated at 30° C. until colonies form, and DNA is extracted from the cells.
For isolation of single cells, A. thaliana accessions Col-0 and Cvi-0 are grown to maturity. Roots are washed in distilled water multiple times. Root surfaces are sterilized using bleach. Surface-sterilized roots are then ground using a sterile mortar and pestle. Individual cells are isolated using FACS followed by DNA amplification using MDA, and 16S rDNA screening as described previously82.
DNA from isolates and single cells is sequenced using NGS platforms, mostly using the Illumina HiSeq technology. Sequenced genomic DNA is assembled using different assembly methods. Genomes are annotated using the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)23 and are deposited at the IMG database83, ENA or Genbank for public usage.
5586 bacterial genomes are retrieved from the IMG system. Isolation sites are identified through a manual curation process that included scanning of IMG metadata, DSMZ, ATCC, NCBI Biosample, and the scientific literature. Based on its isolation site, each genome is labeled as one of PA, NPA, or soil. PA organisms are also labeled as RA when isolated from the EC or from the rhizoplane. A stringent quality control is applied to ensure a high quality and minimally biased set of genomes:
To generate a bacterial phylogenetic tree of the 3837 high-quality and non-redundant genomes, 31 universal single copy genes from each genome are retrieved using AMPHORA287. For each individual marker gene an alignment is constructed using Muscle with default parameters. The 31 alignments are masked using Zorro88 to filter out the low-quality columns of each alignment. Finally, the 31 alignments are concatenated into an overall merged alignment from which an approximately-maximum-likelihood phylogenetic tree is built using the WAG model implemented in FastTree 2.189.
The dataset is divided into different taxa (taxonomic groups) in order to allow downstream identification of genes enriched in the PA or RA genomes of each taxon over the NPA or soil genomes from the same taxon, respectively. In order to determine the number of taxonomic groups to analyze, the phylogenetic tree is converted into a distance matrix using the cophenetic function implemented in the R package ape. The 3837 genomes are clustered into 9 groups using k-medoids clustering as implemented in the partitioning around medoids (PAM) algorithm from the R package fpc. k-medoids clusters a data set of n objects into k a priori defined clusters. In order to identify the optimal k for the dataset, the silhouette coefficient for values of k ranging from 1 to 30 is compared. A value of k=9 is selected as it yields the maximal average silhouette coefficient (0.66). In addition, when using k=9 the taxa are monophyletic, contain hundreds of genomes, and are relatively balanced between PA and NPA genomes in most taxa (Table 1). The resulting genome clusters generally overlap with annotated taxonomic units. One exception is in the Actinobacteria phylum. The clustering divides its genomes into two taxa that are named “Actinobacteria 1” and “Actinobacteria 2”. However, this rigorous phylogenetic analysis supports previous suggestions for revisions in the taxonomy of phylum Actinobacteria90.
In addition, the tree revealed very divergent bacterial taxa in the Bacteroidetes phylum that cannot be separated into monophyletic groups. Specifically, the Sphingobacteriales order (from Class Sphingobacteria) and the Cytophagaceae (from class Cytophagia) are paraphyletic. Therefore, all Bacteroidetes are unified into one phylum-level taxon.
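The silhouette-based choice of k described above can be sketched as follows. This is a minimal, hypothetical illustration: it assumes a precomputed symmetric distance matrix (such as one derived from a tree) and candidate cluster labelings that have already been produced, for example by a k-medoids run. It is not the fpc implementation, and the function names are invented for this example.

```python
def mean_silhouette(dist, labels):
    """Average silhouette width for one labeling (needs >= 2 clusters)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(dist, labelings_by_k):
    """Return the k whose labeling has the highest average silhouette."""
    return max(labelings_by_k, key=lambda k: mean_silhouette(dist, labelings_by_k[k]))


# Toy example: four genomes placed at positions 0, 1, 10 and 11 on a line.
positions = [0, 1, 10, 11]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
print(best_k(toy_dist, candidates))
```

In this toy case the two-cluster labeling wins, because splitting the tight pair at positions 10 and 11 produces singleton clusters that contribute a silhouette of zero.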
The following description applies to PA, NPA, RA, and soil genes. PA genes are identified using a two-step process that includes protein/domain clustering based on AA sequence similarity and subsequent identification of the protein/domain clusters significantly enriched in protein/domains from PA bacteria. Clustering of genes and protein domains involves five independent methods: Orthofinder24, COG20, Kegg orthology (KO)21, TIGRFAM22, and Pfam25. Orthofinder is selected (following the aforementioned benchmarking) as a clustering approach that includes all proteins, including those that lack any functional annotation. First, for each taxon separately, a list of all proteins in the genomes is compiled. For COG, KO, TIGRFAM, and Pfam, the existing annotations of IMG genes, which are based on blast alignments to the different protein/domain models23, are used. This process yields gene/domain clusters. Next, the clusters are tested for significant enrichment with genes derived from PA genomes. Clusters that pass this test are termed ‘PA clusters’. In the statistical analysis, only clusters of more than five members are used. P values are corrected with Benjamini-Hochberg FDR, and q<0.05 is used as the significance threshold unless stated otherwise. The proteins in each cluster are categorized as either PA or NPA, based on the label of their encoding genome.
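The filtering and multiple-testing steps described above can be sketched as follows. The cluster names, sizes, and P values are made-up toy data, and the helper names are hypothetical; only the size cutoff (more than five members), the Benjamini-Hochberg correction, and the q<0.05 threshold come from the text.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


def significant_clusters(clusters, q_threshold=0.05, min_size_exclusive=5):
    """clusters maps a name to (member_count, p_value); toy structure."""
    tested = [name for name, (size, _) in clusters.items() if size > min_size_exclusive]
    qvalues = benjamini_hochberg([clusters[name][1] for name in tested])
    return [name for name, q in zip(tested, qvalues) if q < q_threshold]


toy_clusters = {
    "cluster_a": (40, 0.01),
    "cluster_b": (12, 0.04),
    "cluster_c": (8, 0.03),
    "cluster_d": (25, 0.2),
    "cluster_e": (3, 0.0001),  # too small, so it is never tested
}
print(significant_clusters(toy_clusters))
```

Note that the small cluster is excluded before correction, so it does not count toward the number of tests.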
Metagenome samples (n=38) are downloaded from NCBI and GOLD. The reads are translated into proteins and proteins of at least 40 aa long are aligned using HMMsearch95 against the different protein references. The protein references include the predicted PA, RA, soil, and NPA proteins from Orthofinder found significant by the different approaches.
In order to visualize the overall contribution of statistically significant enriched/depleted orthogroups to the differentiation of PA and NPA genomes, PCoA and logistic regression are utilized. For each of the nine taxa analyzed, this analysis is run over a collection of matrices. The first matrix is the full pan genome matrix; this matrix depicts the distribution of all the orthogroups contained across all the genomes in a given taxon. The subsequent matrices represent subsets of the full pan genome matrix. Each of these matrices depicts the distribution of only the statistically significant orthogroups as called by one of the five different algorithms utilized to test for the genotype-phenotype association.
The function cmdscale from the R (v 3.3.1) stats package is used to run PCoA over all the matrices described above using the Canberra distance as implemented in the vegdist function from the vegan (v 2.4-2) R package (see URLs). Then, the first two axes output from the PCoA are used as independent variables to fit a logistic regression over the labels of each genome (PA, NPA). Finally, the Akaike Information Criterion (AIC) is computed for each of the different models fitted. Briefly, the AIC estimates how much information is lost when a model is applied to represent the true model of a particular dataset. See URLs for the scripts used to perform the PCoA.
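Two of the quantities named above can be written out in a few lines. The sketch below is illustrative only: it uses the textbook, unscaled Canberra distance (vegan's vegdist may additionally scale the sum by the number of non-zero terms, so absolute values can differ), and it computes the AIC from fitted probabilities that are assumed to come from an already fitted logistic model. The function names and toy values are hypothetical.

```python
import math


def canberra(x, y):
    """Unscaled Canberra distance; positions where both values are 0 are skipped."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def logistic_aic(probabilities, labels, n_params):
    """AIC = 2k - 2 ln(L) for a binary model with fitted probabilities.

    `labels` are 1 (for example PA) or 0 (for example NPA). A model with
    two PCoA axes plus an intercept has n_params = 3.
    """
    log_likelihood = sum(
        math.log(p if y == 1 else 1.0 - p)
        for p, y in zip(probabilities, labels)
    )
    return 2 * n_params - 2 * log_likelihood


# Toy comparison: the same four genomes scored by two hypothetical models.
labels = [1, 1, 0, 0]
uninformative = logistic_aic([0.5, 0.5, 0.5, 0.5], labels, 3)
informative = logistic_aic([0.9, 0.8, 0.2, 0.1], labels, 3)
print(uninformative, informative)
```

A lower AIC indicates that the two PCoA axes separate the PA and NPA labels with less information loss, which is how the matrices derived from the different algorithms can be compared.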
Validation of PA Genes in Paraburkholderia kururiensis M130 Affecting Rice Root Colonization
Growth and transformation details of Paraburkholderia kururiensis M130 are determined.
Internal fragments of 200-900 bp from each gene of interest are PCR amplified using primers. Fragments are first cloned in the pGem2T easy vector (Promega) and sequenced (GATC Biotech; Germany), then excised with EcoRI restriction enzyme and cloned in the corresponding site in pKNOCK Km R96. These plasmids are then used as a suicide delivery system in order to create the knockout mutants and transferred to P. kururiensis M130 by triparental mating. All the mutants are verified by PCR using primers specific to the pKNOCK-Km vector and to the genomic DNA sequences upstream and downstream from the targeted genes.
Rhizosphere Colonization Experiments with P. kururiensis and Mutant Derivatives
Seeds of Oryza sativa (BALDO variety) are surface sterilized and are left to germinate in sterile conditions at 30° C. in the dark for seven days. Each seedling is then aseptically transferred into a 50 mL Falcon tube containing 35 mL of half strength Hoagland solution semisolid substrate (0.4% agar). The tubes are then inoculated with 10⁷ cfu of a P. kururiensis suspension. Plants are grown for eleven days at 30° C. (16-8 h light-dark cycle). For the determination of the bacterial counts, plants are washed under tap water for 1 min and then cut below the cotyledon to excise the roots. Roots are air dried for 15 min, weighed and then transferred to a sterile tube containing 5 mL of PBS. After vortexing, the suspension is serially diluted to 10⁻¹ and 10⁻² in PBS and aliquots are plated on KB plates containing the appropriate antibiotic (Rif 50 μg/mL for the wt, Rif 50 μg/mL and Km 50 μg/mL for the mutants). After three days incubation at 30° C., cfu are counted. Three replicates for each dilution from ten independent plantlets are used to determine the average cfu values.
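The arithmetic behind converting plate counts into a root colonization value can be sketched as below. The plated aliquot volume and the normalization to root fresh weight are assumptions made for this example (the text states only that roots are weighed and that aliquots are plated); the 5 mL suspension volume and the dilution levels come from the protocol above.

```python
import math


def cfu_per_gram_root(colonies, dilution, plated_ml, suspension_ml, root_g):
    """Back-calculate cfu per gram of root from one plate count.

    dilution      -- dilution of the plated sample (for example 1e-2)
    plated_ml     -- aliquot volume spread on the plate (assumed value)
    suspension_ml -- volume of PBS the root was vortexed in (5 mL above)
    root_g        -- root weight in grams
    """
    cfu_per_ml = colonies / (dilution * plated_ml)
    return cfu_per_ml * suspension_ml / root_g


# Toy numbers: 50 colonies on the 1e-2 plate, 0.1 mL plated, 0.25 g of root.
value = cfu_per_gram_root(50, 1e-2, 0.1, 5.0, 0.25)
print(f"{value:.2e} cfu per gram of root")
```

Averaging such values over the replicate plates and the ten plantlets would give the per-strain figure used to compare the wild type with each mutant.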
Pfam25 version 30.0 metadata is downloaded. Protein domains that appear in both Viridiplantae and bacteria and occur at least twice as frequently in Viridiplantae as in bacteria are considered plant-like domains (n=708). In parallel, the set of significant PA, RA, NPA, and soil Pfam protein domains predicted by the five algorithms in the nine taxa is scanned. A list of domains is compiled that are significant PA/RA in at least four tests, and significant NPA/soil in up to two tests (n=1779). The overlap between these two sets is defined as PREPARADOs (n=64). In parallel, two control sets of 500 random plant-like Pfam domains and 500 random PA/RA Pfam domains are created. Enrichment of PREPARADOs integrated into plant NLR proteins in comparison to the domains in the control groups is tested using the Fisher exact test. In order to identify domains found in plant disease resistance proteins, all proteins are retrieved from Phytozome and BrassicaDB, and hmmscan is used to search the protein sequences for the presence of either NB-ARC (PF00931.20), TIR (PF01582.18), TIR_2 (PF13676.4), or RPW8 (PF05659.9) domains. Bacterial proteins carrying the PREPARADO domains are considered as having full-length identity to fungal, oomycete or plant proteins based on LAST alignments to all Refseq proteins of plants, fungi, and protozoa. Full-length is defined as an alignment length of at least 90% of the length of both query and reference proteins. The threshold used for considering a high amino acid identity is 40%. Explanation about prediction of secretion of proteins with PREPARADOs appears in the Supplementary Information.
Significant PA, NPA, RA, and soil genes of each genome are clustered based on genomic distance: genes sharing the same scaffold and strand that are up to 200 bp apart are clustered into the same predicted operon. Up to one spacer gene, which is a non-significant gene, is allowed between each pair of significant genes within an operon. Operons are predicted for the genes in COG and OrthoFinder clusters using all five approaches. Operons are annotated as Biosynthetic Gene Clusters (BGCs) if at least one of the constituent genes is part of a BGC from the IMG-ABC database97.
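The selection rules in this paragraph reduce to a few comparisons and a set intersection, sketched below on toy data. The domain names, field names, and counts are invented; using raw occurrence counts as the "frequency" and treating the 40% identity threshold as inclusive are both assumptions made for the example.

```python
def is_plant_like(plant_freq, bacteria_freq):
    """Present in both groups and at least twice as frequent in plants."""
    return plant_freq > 0 and bacteria_freq > 0 and plant_freq >= 2 * bacteria_freq


def is_reproducibly_pa(pa_ra_tests, npa_soil_tests):
    """Significant PA/RA in >= 4 tests and NPA/soil in <= 2 tests."""
    return pa_ra_tests >= 4 and npa_soil_tests <= 2


def select_preparados(domains):
    """Intersect the plant-like set with the reproducibly PA/RA set."""
    plant_like = {
        name for name, d in domains.items()
        if is_plant_like(d["plant_freq"], d["bacteria_freq"])
    }
    pa_associated = {
        name for name, d in domains.items()
        if is_reproducibly_pa(d["pa_ra_tests"], d["npa_soil_tests"])
    }
    return plant_like & pa_associated


def is_full_length_high_identity(aln_len, query_len, ref_len, identity_pct):
    """Alignment covers >= 90% of both proteins with >= 40% identity."""
    covers_both = aln_len >= 0.9 * query_len and aln_len >= 0.9 * ref_len
    return covers_both and identity_pct >= 40.0


toy_domains = {
    "domA": {"plant_freq": 10, "bacteria_freq": 2, "pa_ra_tests": 5, "npa_soil_tests": 0},
    "domB": {"plant_freq": 10, "bacteria_freq": 8, "pa_ra_tests": 6, "npa_soil_tests": 1},
    "domC": {"plant_freq": 9, "bacteria_freq": 3, "pa_ra_tests": 4, "npa_soil_tests": 3},
    "domD": {"plant_freq": 4, "bacteria_freq": 2, "pa_ra_tests": 4, "npa_soil_tests": 2},
}
print(sorted(select_preparados(toy_domains)))
```

Here domB fails the plant-like ratio and domC is significant NPA/soil in too many tests, so only domA and domD remain.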
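The distance-based operon rule above can be sketched as a single pass over genes sorted by position. Several details are assumptions made for illustration: the gene record layout, measuring the 200 bp gap between adjacent genes (including a spacer gene), and reporting only groups with at least two significant genes.

```python
from collections import defaultdict

MAX_GAP_BP = 200   # maximum distance between adjacent genes (from the text)
MAX_SPACERS = 1    # non-significant genes allowed between significant ones


def predict_operons(genes):
    """genes: dicts with id, scaffold, strand, start, end, significant."""
    by_track = defaultdict(list)
    for gene in genes:
        by_track[(gene["scaffold"], gene["strand"])].append(gene)

    operons = []
    for track in by_track.values():
        track.sort(key=lambda g: g["start"])
        current = []     # significant genes in the operon being built
        spacers = 0      # spacer genes seen since the last significant gene
        prev_end = None
        for gene in track:
            too_far = prev_end is not None and gene["start"] - prev_end > MAX_GAP_BP
            second_spacer = not gene["significant"] and spacers >= MAX_SPACERS
            if too_far or second_spacer:
                if len(current) >= 2:
                    operons.append(current)
                current, spacers = [], 0
            if gene["significant"]:
                current.append(gene["id"])
                spacers = 0
            elif current:
                spacers += 1
            prev_end = gene["end"]
        if len(current) >= 2:
            operons.append(current)
    return operons


def gene(gid, start, end, significant, strand="+", scaffold="s1"):
    """Small helper for building toy gene records."""
    return {"id": gid, "scaffold": scaffold, "strand": strand,
            "start": start, "end": end, "significant": significant}


toy_genes = [
    gene("A", 0, 100, True), gene("B", 150, 300, True),
    gene("C", 350, 500, False), gene("D", 550, 700, True),
    gene("E", 1500, 1600, True), gene("F", 1650, 1700, False),
    gene("G", 1720, 1800, False), gene("H", 1850, 1900, True),
    gene("Z", 120, 140, True, strand="-"),
]
print(predict_operons(toy_genes))
```

In the toy data, gene C acts as the single permitted spacer between B and D, while the two consecutive non-significant genes F and G prevent E and H from being joined.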
To find all homologs and paralogs of Jekyll and Hyde genes, IMG blast search with an e value threshold of 1e-5 is used against all IMG isolates. Hyde1 homologs of Acidovorax, Hyde1 homologs of Pseudomonas, Hyde2, and Jekyll genes are searched using proteins of genes Aave_1071, A243_06583, Ga0078621_123530, and Ga0102403_10160 as the query sequence, respectively. Multiple sequence alignments are done using Mafft98. A phylogenetic tree of Acidovorax species is produced using RaxML99 based on concatenation of 35 single copy genes110.
To verify the toxicity of Hyde1 and Hyde2 proteins to E. coli, genes encoding proteins Aave_0990 (Hyde2), Aave_0989 (Hyde1) and Aave_3191 (Hyde1) or GFP as a control, are cloned into the inducible pET28b expression vector using the LR reaction. After sequencing validation, the recombinant vectors are transformed into E. coli C41 competent cells using electroporation. Five colonies are selected and cultured in LB liquid media supplemented with kanamycin with shaking overnight. The OD600 of the bacterial culture is adjusted to 1.0 and then diluted 10²-, 10⁴-, 10⁶- and 10⁸-fold successively. The resulting dilution series is spotted (5 μL) on LB plates with or without 0.5 mM IPTG to induce gene expression.
A Δ5-Hyde1 strain is constructed. Acidovorax citrulli strain AAC00-1 and its derived mutants are grown on nutrient agar medium supplemented with rifampicin (100 μg/ml). To delete a cluster of five Hyde1 genes (Aave_3191-3195), a marker-exchange mutagenesis is performed as previously described101. The marker-free mutant is designated as Δ1-Hyde1, and its genotype is confirmed by PCR amplification and sequencing. The marker-exchange mutagenesis procedure is repeated to further delete four Hyde1 loci. The final mutant with deletion of 9 out of 11 Hyde1 genes (in five loci) is designated as Δ5-Hyde1 and is used for the competition assay. A ΔT6SS mutant is obtained from Ron Walcott's lab.
Competition assay of Acidovorax citrulli AAC00-1 Against Different Strains
E. coli BW25113 pSEVA381 is grown aerobically in LB broth (5 g/L NaCl) at 37° C. in presence of chloramphenicol. Naturally antibiotic resistant bacterial leaf isolates16 and Acidovorax strains are grown aerobically in NB medium (5 g/L NaCl) at 28° C. in presence of the appropriate antibiotic.
Competition assays are conducted as described elsewhere66,102. Briefly, bacterial overnight cultures are harvested and washed in PBS (pH 7.4) to remove excess antibiotics and resuspended in fresh NB medium to an optical density of 10. Predator and prey strains are mixed at a 1:1 ratio, and 5 μL of the mixture is spotted onto dry NB agar plates and incubated at 28° C. As a negative control, the same volume of NB medium is mixed with prey cells instead of the predator strain. After 19 h of co-incubation, bacterial spots are excised from the agar, resuspended in 500 μL of NB medium, and spotted on NB agar containing an antibiotic selective for the prey strains. CFUs of recovered prey cells are determined after incubation at 28° C. All assays are performed in at least three biological replicates.
References cited (wherein each reference is incorporated by reference):
While the present invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process step or steps, to the objective, spirit and scope of the present invention. All such modifications are intended to be within the scope of the claims appended hereto.
This application claims priority as a continuation application to PCT International Patent Application No. PCT/US2018/059277, filed Nov. 5, 2018, which claims priority to U.S. Provisional Patent Application Ser. No. 62/581,556, filed on Nov. 3, 2017, which are hereby both incorporated by reference in their entireties.
The invention was made with government support under Contract Nos. DE-AC02-05CH11231 awarded by the U.S. Department of Energy and Grant No. IOS-1343020 awarded by the National Science Foundation. The government has certain rights in the invention.
Number | Date | Country
---|---|---
62581556 | Nov 2017 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/US2018/059277 | Nov 2018 | US
Child | 16866308 | | US