Biosynthetic Approach For Heterologous Production And Diversification Of Bioactive Lyciumin Cyclic Peptides

Information

  • Patent Application
  • 20200347396
  • Publication Number
    20200347396
  • Date Filed
    January 21, 2019
  • Date Published
    November 05, 2020
Abstract
Lyciumin cyclic peptides and methods of producing lyciumin cyclic peptides are described. A host cell can include a transgene encoding a lyciumin precursor peptide, or a biologically-active fragment thereof. The lyciumin precursor peptide, or biologically-active fragment thereof, can include one or more core lyciumin peptide domains. The transgene can be expressed in the host cell to thereby produce a lyciumin precursor peptide, or biologically-active fragment thereof. The lyciumin precursor peptide, or biologically-active fragment thereof, can be converted to one or more lyciumin cyclic peptides in the host cell. A library of nucleic acids encoding lyciumin precursor peptides, or biologically-active fragments thereof, can be generated.
Description
INCORPORATION BY REFERENCE OF MATERIAL IN ASCII TEXT FILE

This application incorporates by reference the Sequence Listing contained in the following ASCII text file being submitted concurrently herewith:

    • a) File name: 03992061003_SEQUENCELISTING.txt; created Jan. 21, 2019; 317 KB in size.


BACKGROUND

Cyclic peptides are an emerging source of pharmaceutical and agrochemical innovation for treating human and plant diseases, respectively [1]. Cyclic peptide natural products can be used in medicine as immunosuppressants, antibiotics and anti-cancer agents [2] and in agriculture for plant pathogen control [3]. The emergence of resistance against many current antibiotics and pest control agents in human pathogens and plant pathogens [5], respectively, demands new discovery and metabolic engineering platforms for peptide-based drug and agrochemical development. Additionally, plant peptides are important endogenous chemicals that modulate the rhizosphere during plant development [6] and under abiotic stresses [7], and thus they have potential for optimizing plant fitness in changing climates through rhizosphere engineering.


SUMMARY

Described herein is a method of producing one or more lyciumin cyclic peptides. In some embodiments, the method of producing one or more lyciumin cyclic peptides can include providing a host cell that includes a transgene encoding a lyciumin precursor peptide, or a biologically-active fragment thereof, and expressing the transgene in the host cell to thereby produce a lyciumin precursor peptide, or biologically-active fragment thereof. The lyciumin precursor peptide, or biologically-active fragment thereof, can include one or more core lyciumin peptide domains. In some embodiments, the lyciumin precursor peptide, or biologically-active fragment thereof, can be converted to one or more lyciumin cyclic peptides in the host cell.


In some embodiments, the transgene is operably linked to a heterologous promoter in the host cell. In some embodiments, the transgene is introduced in a vector. In some embodiments, the method includes introducing the transgene into the host cell. In some embodiments, the method includes introducing a vector that includes the transgene into the host cell. In some embodiments, the lyciumin precursor peptide includes a plurality of core lyciumin peptide domains. In some embodiments, the core lyciumin peptide domains can encode two or more different lyciumin cyclic peptides.


The host cell can express one or more of: an enzyme that cyclizes the lyciumin precursor peptide; an endopeptidase; a glutamine cyclotransferase; and/or an exopeptidase. In some embodiments, arginine is immediately N-terminal to the core lyciumin peptide domain. In some embodiments, the endopeptidase is an arginine endopeptidase. In some embodiments tyrosine is immediately C-terminal to the core lyciumin peptide domain.
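The flanking-residue pattern described above (arginine immediately N-terminal and tyrosine immediately C-terminal to a core lyciumin peptide domain) can be illustrated as a simple sequence scan. The following is a minimal sketch, assuming an eight-residue core that begins with glutamine and ends with tryptophan, as in the core peptide QPWGVGSW given for lyciumin B in the description of FIGS. 38B-D. The function name, the regular expression, and the toy sequence are hypothetical illustrations and are not taken from the Sequence Listing or from any method of this application.

```python
import re
from typing import List, Tuple

# Assumption (illustrative only): a core domain is 8 residues long,
# starts with Q, ends with W, is immediately preceded by R and
# immediately followed by Y. Lookarounds keep the flanks out of the match.
CORE_PATTERN = re.compile(r"(?<=R)(Q[A-Z]{6}W)(?=Y)")


def find_core_domains(precursor: str) -> List[Tuple[int, str]]:
    """Return (0-based start index, core sequence) for each candidate core."""
    sequence = precursor.upper()
    return [(m.start(1), m.group(1)) for m in CORE_PATTERN.finditer(sequence)]


# Toy sequence invented for this example; it is NOT a real precursor.
toy_precursor = "MASTRQPWGVGSWYDDNKRQPYGVYTWYEE"

for start, core in find_core_domains(toy_precursor):
    print(start, core)
```

A scan of this kind would only flag candidates; whether a flagged motif is actually processed into a cyclic peptide would still have to be confirmed experimentally, for example by LC-MS chemotyping.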


In some embodiments, the host cell is a plant cell. In some embodiments, the plant cell is an Amaranthaceae family plant cell. In some embodiments, the plant cell is an Amaranthus genus plant cell, such as an Amaranthus hypochondriacus plant cell. In some embodiments, the plant cell is a Beta genus plant cell, such as a Beta vulgaris plant cell. In some embodiments, the plant cell is a Chenopodium genus plant cell, such as a Chenopodium quinoa plant cell. In some embodiments, the plant cell is a Fabaceae family plant cell. In some embodiments, the plant cell is a Glycine genus plant cell, such as a Glycine max plant cell. In some embodiments, the plant cell is a Medicago genus plant cell, such as a Medicago truncatula plant cell. In some embodiments, the plant cell is a Solanaceae family plant cell. In some embodiments, the plant cell is a Solanum genus plant cell, such as a Solanum melongena plant cell or a Solanum tuberosum plant cell. In some embodiments, the plant cell is a Nicotiana genus plant cell, such as a Nicotiana benthamiana plant cell. In some embodiments, the plant cell is a Capsicum genus plant cell, such as a Capsicum annuum plant cell.


In some embodiments, the lyciumin precursor peptide includes SEQ ID NO: 1. In some embodiments, the lyciumin precursor peptide consists of SEQ ID NO: 1. In some embodiments, the lyciumin precursor peptide consists essentially of SEQ ID NO: 1. In some embodiments, the lyciumin precursor peptide includes SEQ ID NO: 2. In some embodiments, the lyciumin precursor peptide consists of SEQ ID NO: 2. In some embodiments, the lyciumin precursor peptide consists essentially of SEQ ID NO: 2. In some embodiments, the lyciumin cyclic peptide is Lyciumin A, Lyciumin B, Lyciumin C, or Lyciumin D, or a combination thereof.


Described herein also is a method of generating a library of nucleic acids encoding lyciumin precursor peptides, or biologically-active fragments thereof. The method can include constructing a plurality of vectors, each vector comprising a nucleic acid encoding a different lyciumin precursor peptide, or biologically-active fragment thereof, operably linked to a heterologous promoter for expression in a host cell. In some embodiments, the library can include at least hundreds of nucleic acids, e.g., at least 10^3 nucleic acids, at least 10^4 nucleic acids, at least 10^5 nucleic acids, at least 10^6 nucleic acids, or at least 10^7 nucleic acids.
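To give a sense of how library sizes of this order can arise, the sketch below counts and enumerates core peptide variants obtained by varying selected positions of an eight-residue core. It is a hypothetical illustration only: the application does not state that libraries are built by position-wise randomization, and the template string, the wildcard character "X", and the function names are assumptions introduced for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def library_size(n_variable: int) -> int:
    """Number of distinct cores when n_variable positions are fully randomized."""
    return len(AMINO_ACIDS) ** n_variable


def enumerate_cores(template: str, wildcard: str = "X"):
    """Yield every core obtained by filling each wildcard with any residue."""
    slots = [AMINO_ACIDS if ch == wildcard else ch for ch in template]
    for combo in product(*slots):
        yield "".join(combo)


# Six variable positions (e.g. holding the first and last residue fixed).
print(library_size(6))

# Small worked case: vary positions 3 and 6 of a hypothetical template.
small_library = list(enumerate_cores("QPXGVXSW"))
print(len(small_library))
```

Under these assumptions, fully randomizing six positions gives 20^6 = 64,000,000 distinct cores, which exceeds 10^7, while the two-position template yields 400 cores and includes QPWGVGSW.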


In some embodiments, the method of generating a library of nucleic acids can include introducing the plurality of vectors into host cells. In certain embodiments, the lyciumin precursor peptide, or biologically-active fragments thereof, can be converted to one or more lyciumin cyclic peptides in the host cell. In some embodiments, the host cell is a plant cell. In some embodiments, the plant cell is a Solanaceae family plant cell. In some embodiments, the plant cell is a Nicotiana genus plant cell, such as a Nicotiana benthamiana plant cell.


In some embodiments, the method can include isolating a lyciumin cyclic peptide from the host cell. In some embodiments, the method can include assaying either crude extract from the host cell or a lyciumin peptide isolated from the host cell for an activity of interest.


In some embodiments, the method of generating a library of nucleic acids can include introducing a nucleic acid encoding a lyciumin peptide having an activity of interest into a second host cell. In some embodiments, the second host cell is a plant cell. In some embodiments, the plant cell is an Amaranthaceae family plant cell. In some embodiments, the plant cell is an Amaranthus genus plant cell, such as an Amaranthus hypochondriacus plant cell. In some embodiments, the plant cell is a Beta genus plant cell, such as a Beta vulgaris plant cell. In some embodiments, the plant cell is a Chenopodium genus plant cell, such as a Chenopodium quinoa plant cell. In some embodiments, the plant cell is a Fabaceae family plant cell. In some embodiments, the plant cell is a Glycine genus plant cell, such as a Glycine max plant cell. In some embodiments, the plant cell is a Medicago genus plant cell, such as a Medicago truncatula plant cell. In some embodiments, the plant cell is a Solanaceae family plant cell. In some embodiments, the plant cell is a Solanum genus plant cell, such as a Solanum melongena plant cell or a Solanum tuberosum plant cell. In some embodiments, the plant cell is a Nicotiana genus plant cell, such as a Nicotiana benthamiana plant cell. In some embodiments, the plant cell is a Capsicum genus plant cell, such as a Capsicum annuum plant cell.


Also described herein are isolated nucleic acids comprising a nucleotide sequence encoding a lyciumin precursor peptide, or a biologically-active fragment thereof, operably linked to a heterologous promoter. In some embodiments, the lyciumin precursor peptide includes a plurality of core lyciumin peptide domains. In some embodiments, the core lyciumin peptide domains encode two or more different lyciumin cyclic peptides. In some embodiments, the lyciumin precursor peptide comprises SEQ ID NO: 1. In some embodiments, the lyciumin precursor peptide comprises SEQ ID NO: 2. In some embodiments, the nucleic acid is a cDNA.


Described herein are vectors that include any of the nucleic acids described herein.


Described herein are host cells that include any of the nucleic acids or vectors described herein. In some embodiments, the host cell is a plant cell. In some embodiments, the plant cell is an Amaranthaceae family plant cell. In some embodiments, the plant cell is an Amaranthus genus plant cell, such as an Amaranthus hypochondriacus plant cell. In some embodiments, the plant cell is a Beta genus plant cell, such as a Beta vulgaris plant cell. In some embodiments, the plant cell is a Chenopodium genus plant cell, such as a Chenopodium quinoa plant cell. In some embodiments, the plant cell is a Fabaceae family plant cell. In some embodiments, the plant cell is a Glycine genus plant cell, such as a Glycine max plant cell. In some embodiments, the plant cell is a Medicago genus plant cell, such as a Medicago truncatula plant cell. In some embodiments, the plant cell is a Solanaceae family plant cell. In some embodiments, the plant cell is a Solanum genus plant cell, such as a Solanum melongena plant cell or a Solanum tuberosum plant cell. In some embodiments, the plant cell is a Nicotiana genus plant cell, such as a Nicotiana benthamiana plant cell. In some embodiments, the plant cell is a Capsicum genus plant cell, such as a Capsicum annuum plant cell.


Further described herein is a library that includes a plurality of nucleic acid molecules, each nucleic acid molecule including a nucleotide sequence encoding a lyciumin precursor peptide, or a biologically-active fragment thereof. In some embodiments, the nucleotide sequence encoding a lyciumin precursor peptide, or a biologically-active fragment thereof, is operably linked to a heterologous promoter in each nucleic acid molecule. In some embodiments, the nucleic acid molecules are complementary DNA (cDNA) molecules.


In addition, described herein are lyciumin cyclic peptides produced by a method described herein.


Described herein is a method of producing one or more lyciumin cyclic peptides. The method can include: a) providing a host cell that includes a transgene encoding a polypeptide that includes one or more core lyciumin peptide domains; and b) expressing the transgene in the host cell to thereby produce a polypeptide that comprises one or more core lyciumin peptide domains. In some embodiments, the polypeptide is converted to one or more lyciumin cyclic peptides in the host cell.


The methods and products described herein can be used to produce a platform for lyciumin expression and diversification, which can be used to create a library of lyciumin cyclic peptides. The lyciumin precursor peptides described herein can be expressed in planta. The lyciumin cyclic peptides described herein can be used in agrochemical and pharmaceutical applications that aim to increase plant fitness towards abiotic and biotic stresses and treat human diseases, respectively.





BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.


The foregoing will be apparent from the following more particular description of example embodiments, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments.



FIGS. 1A-D show characterization of lyciumins as plant ribosomal peptides. FIG. 1A is an image of the lyciumin-producing plant Lycium barbarum. FIG. 1B shows lyciumin structures from Lycium plants. FIG. 1C is the lyciumin precursor peptide sequence from the Lycium barbarum root transcriptome. FIG. 1D is chromatograms showing that heterologous expression of lyciumin precursor peptide gene LbaLycA in Nicotiana benthamiana via the Agrobacterium tumefaciens LBA4404 pEAQ-HT system results in lyciumin A, B and D formation after 6 days. Base peak chromatograms (BPC) of lyciumin A, B and D mass signals are shown.



FIGS. 2A-B show lyciumin discovery by plant genome mining. FIG. 2A is a schematic showing precursor gene-guided genome mining for lyciumin discovery from plant genomes by identification of lyciumin precursors (BURP domain proteins) in a plant genome by core peptide prediction, prediction of lyciumin chemotype from the core peptide sequence and lyciumin chemotyping by LC-MS and MS/MS analysis of plant extracts. FIG. 2B shows structural diversity of lyciumins characterized by genome mining from Amaranthaceae, Fabaceae, and Solanaceae plants with corresponding precursor gene accession numbers and core peptide sequences. The stereochemistry of lyciumin glycine α-carbon is inferred from lyciumin A and lyciumin I structure elucidation.



FIGS. 3A-E pertain to investigation of lyciumin biosynthesis in Lycium barbarum. FIG. 3A is a schematic showing an example biosynthetic pathway for lyciumin B formation in Lycium barbarum from precursor peptide LbaLycA. FIG. 3B shows detection of [Gln1]-lyciumin B mass signals in L. barbarum root extract and Nicotiana benthamiana leaf extracts after heterologous expression of lyciumin precursor LbaLycA for six days. FIG. 3C shows genomic co-localization of lyciumin precursor genes and glutamine cyclotransferase genes for putative N-terminal lyciumin protection in Chenopodium quinoa and Beta vulgaris. FIG. 3D shows detection of abolished mass signals for [Gln1]-lyciumin species in N. benthamiana leaf extracts after heterologous expression of lyciumin precursor LbaLycA and glutamine cyclotransferase LbaQC from Lycium barbarum root transcriptome. FIG. 3E shows detection of [Tyr9]-lyciumin B and [Tyr9-Gln10]-lyciumin B mass signals in L. barbarum root extract and N. benthamiana leaf extracts after heterologous expression of lyciumin precursor LbaLycA for six days. Abbreviations: LbaLycA (6d)—Nicotiana benthamiana infiltrated with Agrobacterium tumefaciens LBA4404 containing pEAQ-HT-LbaLycA (six days), pEAQ-HT (6d)—Nicotiana benthamiana infiltrated with Agrobacterium tumefaciens LBA4404 containing pEAQ-HT (six days), BPC—base peak chromatogram, PTM—post-translational modification.



FIGS. 4A-B show investigation of promiscuity of the lyciumin pathway by heterologous expression of lyciumin I precursor gene Sali3-2 from Glycine max and corresponding mutants in Nicotiana benthamiana via Agrobacterium tumefaciens LBA4404 pEAQ-HT system (6 days). FIG. 4A shows LC-MS detection of lyciumin I after heterologous expression of Sali3-2 in N. benthamiana. FIG. 4B is a table of core peptide mutants of Sali3-2 in the lyciumin pathway in Nicotiana benthamiana. Abbreviations: aa—amino acid, N.b.—Nicotiana benthamiana, A.t.—Agrobacterium tumefaciens.



FIG. 5 is candidate transcripts of lyciumin precursor from Lycium barbarum root transcriptome. Predicted lyciumin core peptides are highlighted in green (lyciumin A), blue (lyciumin B) and red (lyciumin D).



FIG. 6 is a signal peptide prediction of LbaLycA with SignalP-4.1.



FIGS. 7A-C are MS/MS analyses of lyciumin B exemplifying planar structure elucidation of lyciumin peptides by tandem mass spectrometry. FIG. 7A is the lyciumin B MS/MS spectrum with highlighted amino acid iminium ions. FIG. 7B is the lyciumin B spectrum with highlighted fragments of linear lyciumin N-terminus. FIG. 7C is the lyciumin MS/MS spectrum of fragments of C-terminal lyciumin macrocycle.



FIGS. 8A-E show lyciumin precursor gene expression. FIG. 8A shows lyciumin precursor gene expression in Amaranthus hypochondriacus. FIG. 8B shows lyciumin precursor gene expression in Chenopodium quinoa. FIG. 8C shows lyciumin precursor gene expression in Glycine max. FIG. 8D shows lyciumin precursor gene expression in Medicago truncatula. FIG. 8E shows lyciumin precursor gene expression in Solanum tuberosum.



FIGS. 9A-D show lyciumin chemotyping in source plant tissues. FIG. 9A shows lyciumin chemotyping in Amaranthus hypochondriacus tissues. FIG. 9B shows lyciumin chemotyping in Beta vulgaris tissues. FIG. 9C shows lyciumin chemotyping in Glycine max tissues. FIG. 9D shows lyciumin chemotyping in Chenopodium quinoa tissues.



FIGS. 10A-D show an investigation of sequences and distribution of lyciumin precursors in plants. FIG. 10A shows types of lyciumin precursors based on primary structure analysis (Abbreviation: Core—core peptide (core lyciumin peptide domain)). FIG. 10B shows a neighbor-joining phylogenetic tree of the BURP-domain sequences of lyciumin precursors predicted from plant genomes (denoted by “g” after species name) and plant transcriptomes (denoted by “t” after species name) and founding members of the BURP domain protein family (BMN2A [Brassica napus], Polygalacturonase-1 non-catalytic subunit beta [Solanum lycopersicum], USP87/92 [Vicia faba], RD22 [Arabidopsis thaliana]) generated with 2000 bootstrap generations using the p-distance method. The scale measures evolutionary distances in substitutions per amino acid. Precursors with characterized chemotypes are noted with an asterisk. Predicted cyclization sites at the fourth core peptide position of precursor peptides are noted after species name as a capital letter (Abbreviations: A—alanine, G—glycine, L—leucine, P—proline, R—arginine, T—threonine). FIG. 10C is a Venn diagram of core peptide sequences of predicted and characterized lyciumin chemotypes based on genome and transcriptome mining (Table 4). FIG. 10D is a phylogenetic relationship of plant families with predicted and characterized (noted by asterisks) lyciumin chemotypes (both highlighted in red).



FIG. 11 shows a precursor gene-guided genome mining workflow for lyciumin discovery.



FIGS. 12A-B pertain to genome mining of lyciumins from Amaranthus hypochondriacus. FIG. 12A is Amaranthus hypochondriacus lyciumin precursor peptide (BURP domain underlined, core peptides highlighted in blue and red). FIG. 12B is predicted lyciumin chemotypes.



FIGS. 13A-B pertain to genome mining of lyciumins from Beta vulgaris. FIG. 13A is Beta vulgaris lyciumin precursor peptide (BURP domain underlined, core peptides highlighted in blue and green). FIG. 13B is predicted lyciumin chemotype.



FIGS. 14A-B pertain to genome mining of lyciumins from Chenopodium quinoa. FIG. 14A is Chenopodium quinoa lyciumin precursor peptide (BURP domain underlined, core peptides highlighted in blue, red and purple). FIG. 14B is predicted lyciumin chemotypes.



FIGS. 15A-D pertain to genome mining of lyciumins from Glycine max. FIG. 15A is Glycine max lyciumin H precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 15B is predicted lyciumin H chemotype. FIG. 15C is Glycine max lyciumin I precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 15D is predicted lyciumin I chemotype.



FIGS. 16A-D pertain to genome mining of lyciumins from Solanum melongena. FIG. 16A is Solanum melongena lyciumin I precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 16B is predicted lyciumin I chemotype. FIG. 16C is Solanum melongena lyciumin B precursor peptide (BURP domain underlined, core peptides highlighted in blue and red). FIG. 16D is predicted lyciumin B chemotype.



FIGS. 17A-B pertain to genome mining of lyciumins from Medicago truncatula. FIG. 17A is Medicago truncatula lyciumin I precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 17B is predicted lyciumin I chemotype.



FIGS. 18A-D pertain to genome and transcriptome mining of lyciumins from Solanum tuberosum. FIG. 18A is lyciumin precursor peptide derived from Solanum tuberosum genome (BURP domain underlined, core peptides highlighted in red). FIG. 18B is lyciumin precursor peptide transcripts derived from Solanum tuberosum tuber transcriptome (SRR5970148) de novo assembled with Trinity (v2.4, BURP domain underlined, core peptides highlighted in red). FIG. 18C is lyciumin precursor peptide transcripts derived from Solanum tuberosum tuber transcriptome (SRR5970148) de novo assembled with rnaSPAdes (v1.0, BURP domain underlined, core peptides highlighted in red). FIG. 18D is predicted lyciumin core peptides derived from genome mining and transcriptome mining of Solanum tuberosum. Bold core peptides indicate detected lyciumin chemotypes.



FIGS. 19A-O pertain to genome and transcriptome mining of lyciumins from Solanum tuberosum. FIG. 19A is Solanum tuberosum lyciumin J precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 19B is predicted lyciumin J chemotype. FIG. 19C is Solanum tuberosum lyciumin B precursor peptide transcript (BURP domain underlined, core peptides highlighted in red). FIG. 19D is predicted lyciumin B chemotype. FIG. 19E is Solanum tuberosum lyciumin K precursor peptide transcript (BURP domain underlined, core peptides highlighted in red). FIG. 19F is predicted lyciumin K chemotype. FIG. 19G is Solanum tuberosum lyciumin L precursor peptide transcript (BURP domain underlined, core peptides highlighted in red). FIG. 19H is predicted lyciumin L chemotype. FIG. 19I is Solanum tuberosum lyciumin M precursor peptide transcript (BURP domain underlined, core peptides highlighted in red). FIG. 19J is predicted lyciumin M chemotype. FIG. 19K is Solanum tuberosum lyciumin N precursor peptide transcript (BURP domain underlined, core peptides highlighted in red). FIG. 19L is predicted lyciumin N chemotype. FIG. 19M is Solanum tuberosum lyciumin O precursor peptide transcript (BURP domain underlined, core peptides highlighted in red). FIG. 19N is predicted lyciumin O chemotype. FIG. 19O shows lyciumin chemotyping in Solanum tuberosum tuber tissue and sprout tissue.



FIGS. 20A-B pertain to transcriptome mining of lyciumin peptide in Selaginella uncinata. FIG. 20A is predicted lyciumin precursor transcript (5′-partial, Table 4) from de novo rnaSPAdes assembly of Selaginella uncinata transcriptome (SRR7132763, BURP domain underlined, core peptide highlighted in red). FIG. 20B is predicted [QPYSVFAW]-lyciumin chemotype.



FIGS. 21A-B pertain to characterization of Lycium barbarum glutamine cyclotransferase (LbaQC). FIG. 21A is LbaQC sequence with predicted secretory pathway signaling peptide underlined (SignalP v4.1). FIG. 21B is bioinformatic analysis of candidate lyciumin-glutamine cyclotransferase LbaQC from root transcriptome of Lycium barbarum.



FIGS. 22A-E pertain to heterologous expression of Sali3-2-[QAYGVYTW] in Nicotiana benthamiana. FIG. 22A is Sali3-2-[QAYGVYTW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 22B is predicted [QAYGVYTW]-lyciumin chemotype. FIG. 22C is LC-MS chemotyping of predicted [QAYGVYTW]-lyciumin in peptide extract of N. benthamiana leaves infiltrated with A. tumefaciens LBA4404 pEAQ-HT-Sali3-2-[QAYGVYTW] for six days. FIG. 22D is MS analysis of predicted [QAYGVYTW]-lyciumin chemotype. FIG. 22E is MS/MS analysis of predicted [QAYGVYTW]-lyciumin chemotype.



FIGS. 23A-B pertain to heterologous expression of Sali3-2-[QPAGVYTW] in Nicotiana benthamiana. FIG. 23A is Sali3-2-[QPAGVYTW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 23B is predicted [QPAGVYTW]-lyciumin chemotype.



FIGS. 24A-B pertain to heterologous expression of Sali3-2-[QPYAVYTW] in Nicotiana benthamiana. FIG. 24A is Sali3-2-[QPYAVYTW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 24B is predicted [QPYAVYTW]-lyciumin chemotype.



FIGS. 25A-B pertain to heterologous expression of Sali3-2-[QPYGAYTW] in Nicotiana benthamiana. FIG. 25A is Sali3-2-[QPYGAYTW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 25B is predicted [QPYGAYTW]-lyciumin chemotype.



FIGS. 26A-B pertain to heterologous expression of Sali3-2-[QPYGVATW] in Nicotiana benthamiana. FIG. 26A is Sali3-2-[QPYGVATW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 26B is predicted [QPYGVATW]-lyciumin chemotype.



FIGS. 27A-B pertain to heterologous expression of Sali3-2-[QPYGVYAW] in Nicotiana benthamiana. FIG. 27A is Sali3-2-[QPYGVYAW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 27B is predicted [QPYGVYAW]-lyciumin chemotype.
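The core peptide variants named in FIGS. 22A-27B each differ from the sequence QPYGVYTW by a single alanine substitution at one of positions 2 through 7. A minimal sketch of how such a substitution series could be enumerated follows. Treating QPYGVYTW as the parent core is an inference from those figure descriptions, and the function name and the position range are hypothetical choices made for this illustration.

```python
from typing import Iterable, List


def alanine_scan(core: str, positions: Iterable[int]) -> List[str]:
    """Return single-alanine variants of `core` at the given 1-based positions.

    Positions that already hold alanine are skipped, since substituting
    them would reproduce the parent sequence.
    """
    variants = []
    for pos in positions:
        if core[pos - 1] != "A":
            variants.append(core[: pos - 1] + "A" + core[pos:])
    return variants


parent_core = "QPYGVYTW"  # inferred parent core (assumption, see lead-in)

# Scan positions 2-7, leaving the first and last residue unchanged.
for variant in alanine_scan(parent_core, range(2, 8)):
    print(variant)
```

Run on the inferred parent core, the scan reproduces the six variant names used in the figure descriptions: QAYGVYTW, QPAGVYTW, QPYAVYTW, QPYGAYTW, QPYGVATW and QPYGVYAW.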



FIGS. 28A-I pertain to heterologous expression of Sali3-2-[QPYTVYTW] in Nicotiana benthamiana. FIG. 28A is Sali3-2-[QPYTVYTW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 28B is predicted [QPYTVYTW]-lyciumin chemotype. FIG. 28C is LC-MS chemotyping of predicted [QPYTVYTW]-lyciumin in peptide extract of N. benthamiana leaves infiltrated with A. tumefaciens LBA4404 pEAQ-HT-Sali3-2-[QPYTVYTW] for six days. FIG. 28D is MS analysis of predicted [QPYTVYTW]-lyciumin chemotype. FIG. 28E is MS/MS analysis of predicted [QPYTVYTW]-lyciumin chemotype. FIG. 28F is predicted [QPYTVYTW]-dehydrothreonine lyciumin chemotype. FIG. 28G is LC-MS chemotyping of predicted [QPYTVYTW]-dehydrothreonine lyciumin in peptide extract of N. benthamiana leaves infiltrated with A. tumefaciens LBA4404 pEAQ-HT-Sali3-2-[QPYTVYTW] for six days. FIG. 28H is MS analysis of predicted [QPYTVYTW]-dehydrothreonine lyciumin chemotype. FIG. 28I is MS/MS analysis of predicted [QPYTVYTW]-dehydrothreonine lyciumin chemotype.



FIGS. 29A-B pertain to heterologous expression of Sali3-2-[QPYGVYTY] in Nicotiana benthamiana. FIG. 29A is Sali3-2-[QPYGVYTY] precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 29B is predicted [QPYGVYTY]-cyclic peptide chemotype with a putative C-terminal cyclization site.



FIGS. 30A-B pertain to heterologous expression of Sali3-2-[QPYGVYFY] and CanBURP in Nicotiana benthamiana and MS characterization of putative cyclic lyciumin-type peptide in Capsicum annuum. FIG. 30A is Capsicum annuum precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 30B is predicted [QPYGVYFY]-cyclic peptide chemotype with a putative C-terminal cyclization site.



FIGS. 31A-B pertain to heterologous expression of Sali3-2-[QPWGVGTW] in Nicotiana benthamiana. FIG. 31A is Sali3-2-[QPWGVGTW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 31B is predicted [QPWGVGTW]-lyciumin chemotype.



FIGS. 32A-B pertain to heterologous expression of Sali3-2-[QPWGVGAW] in Nicotiana benthamiana. FIG. 32A is Sali3-2-[QPWGVGAW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 32B is predicted [QPWGVGAW]-lyciumin chemotype.



FIGS. 33A-B pertain to heterologous expression of Sali3-2-[QPWGVYTW] in Nicotiana benthamiana. FIG. 33A is Sali3-2-[QPWGVYTW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 33B is predicted [QPWGVYTW]-lyciumin chemotype.



FIGS. 34A-B pertain to heterologous expression of Sali3-2-[QPFGVYTW] in Nicotiana benthamiana. FIG. 34A is Sali3-2-[QPFGVYTW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 34B is predicted [QPFGVYTW]-lyciumin chemotype.



FIGS. 35A-B pertain to heterologous expression of Sali3-2-[QPFGFFSW] in Nicotiana benthamiana. FIG. 35A is Sali3-2-[QPFGFFSW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 35B is predicted [QPFGFFSW]-lyciumin chemotype.



FIGS. 36A-B pertain to heterologous expression of Sali3-2-[QPWGVYSW] in Nicotiana benthamiana. FIG. 36A is Sali3-2-[QPWGVYSW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 36B is predicted [QPWGVYSW]-lyciumin chemotype.



FIGS. 37A-B pertain to heterologous expression of Sali3-2-[QPYGVYFW] in Nicotiana benthamiana. FIG. 37A is Sali3-2-[QPYGVYFW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 37B is predicted [QPYGVYFW]-lyciumin chemotype.



FIG. 38A is a chart showing Lyciumin B production in tobacco and source plant (Lycium barbarum). FIG. 38B shows the nucleotide and amino acid sequences of an engineered lyciumin precursor containing one repeat of a single core peptide (SEQ ID NO: 50; QPWGVGSW=lyciumin B). FIG. 38C shows the nucleotide and amino acid sequences of an engineered lyciumin precursor containing five repeats of a single core peptide (SEQ ID NO: 50; QPWGVGSW=lyciumin B). FIG. 38D shows the nucleotide and amino acid sequences of an engineered lyciumin precursor containing ten repeats of a single core peptide (SEQ ID NO: 50; QPWGVGSW=lyciumin B).



FIGS. 39A-B show lyciumin chemotypes and types of precursor peptides in plants. FIG. 39A is a chemical structure of lyciumins. FIG. 39B shows two types of BURP-domain lyciumin precursor peptides categorized based on primary structure (Abbreviation: Core—core lyciumin peptide domain).



FIGS. 40A-C show characterization of lyciumin chemo- and genotype in lycophyte Selaginella uncinata. FIG. 40A is a lyciumin precursor peptide from S. uncinata. The BURP domain sequence is underlined, the lyciumin core peptide motifs are highlighted in red, and the signaling peptide sequence is highlighted in blue. FIG. 40B is the predicted lyciumin-[QPYSVFAW] chemotype from S. uncinata. FIG. 40C is LC-MS detection of predicted lyciumin-[QPYSVFAW] chemotype in peptide extract of S. uncinata root.



FIGS. 41A-C show taxonomic distribution and phylogenetic relationship of lyciumin precursors in land plants. FIG. 41A is a simplified maximum-likelihood phylogenetic tree built using the BURP-domain sequences of lyciumin precursors predicted from plant genomes (Kersten and Weng, 2018) and plant transcriptomes (FIGS. 43A-I) and founding members of BURP-domain protein family. Bootstrap values (based on 1000 replicates) of key branches are displayed. The scale measures evolutionary distances in substitutions per amino acid. The fully annotated tree is shown in FIG. 45. A large-scale neighbor-joining tree also including non-lyciumin-producing BURP-domain proteins from several sequenced plant genomes is shown in Dataset S2. FIG. 41B is a Venn diagram of core peptide sequences of predicted and characterized lyciumin chemotypes based on genome and transcriptome mining (Table 8). FIG. 41C is taxonomic distribution of predicted and characterized lyciumin chemotypes (both highlighted in red) in land plants. Plant families with characterized lyciumin chemotypes are denoted by asterisks.



FIGS. 42A-F show convergent evolution of lyciumin-[QPFGVFGW] from nonhomologous precursor proteins in Celtis occidentalis (Cannabaceae) and Achyranthes bidentata (Amaranthaceae). FIG. 42A is a photograph of a C. occidentalis tree used for chemotyping experiments in Example 2. FIG. 42B is the predicted lyciumin-[QPFGVFGW] precursor peptide from C. occidentalis with DUF2775 domain (type 3 lyciumin precursor). The DUF2775-domain sequence is underlined, the core peptide motifs are highlighted in red, and the signal peptide sequence is highlighted in blue. FIG. 42C is the predicted lyciumin-[QPFGVFGW] (lyciumin Q) chemotype. FIG. 42D is a greenhouse-grown A. bidentata plant used in Example 2. FIG. 42E is the predicted lyciumin-[QPFGVFGW] BURP-domain precursor peptide from A. bidentata transcriptome. The core peptide motifs are highlighted in red, orange and purple, the signal peptide is highlighted in blue, and the BURP domain is underlined. FIG. 42F is LC-MS characterization of lyciumin-[QPFGVFGW] in peptide extracts of C. occidentalis leaves, A. bidentata seeds, and N. benthamiana leaves sampled six days after infiltration with Agrobacterium tumefaciens LBA4404 carrying the pEAQ-HT-Sali3-2-[QPFGVFGW] construct. The detailed MS/MS analysis is shown in FIGS. 47A-G.



FIGS. 43A-I are Dataset S1. Candidate lyciumin precursor peptides from plant transcriptomes. Underlined—BURP domain, red—core peptide.



FIGS. 44A-B show characterization of lyciumin-[QPYGVFAW] in Selaginella uncinata. FIG. 44A is heterologous expression of Sali3-2-[QPYGVFAW] in Nicotiana benthamiana. Sali3-2-[QPYGVFAW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 44B is predicted [QPYGVFAW]-lyciumin chemotype.



FIG. 45 is a phylogenetic analysis of predicted and characterized lyciumin precursors from analyzed plant genomes and transcriptomes. The maximum-likelihood phylogenetic tree of BURP-domain sequences of lyciumin precursors predicted from plant genomes and transcriptomes, as well as several founding members of the BURP-domain protein family, was generated with 1000 bootstrap replicates. The scale measures evolutionary distances in substitutions per amino acid. The predicted motif sequences and their flanking residues are expressed in the protein names. Precursors with characterized chemotypes are noted with an asterisk. Predicted cyclization sites at the fourth core peptide position of precursor peptides are noted after the species name as a capital letter (Abbreviations: A, alanine; E, glutamate; F, phenylalanine; G, glycine; L, leucine; P, proline; R, arginine; T, threonine). For the condensed tree, see FIG. 41A.



FIGS. 46A-B pertain to CocDUF2775-homolog from Parasponia andersonii (Cannabaceae). FIG. 46A is protein sequence of CocDUF2775-homolog from Parasponia andersonii (Cannabaceae). FIG. 46B is a sequence alignment of CocDUF2775 and its closest NCBI-nr sequence database homolog. Predicted core peptide sequences are highlighted in red.



FIGS. 47A-G pertain to characterization of lyciumin-[QPFGVFGW] in Celtis occidentalis and Achyranthes bidentata. FIG. 47A is an MS analysis of predicted [QPFGVFGW]-lyciumin chemotype in peptide extract of Celtis occidentalis leaves. FIG. 47B is an MS/MS analysis of predicted [QPFGVFGW]-lyciumin chemotype in peptide extract of Celtis occidentalis leaves. FIG. 47C shows heterologous expression of Sali3-2-[QPFGVFGW] in Nicotiana benthamiana. Sali3-2-[QPFGVFGW] lyciumin precursor peptide (BURP domain underlined, core peptide highlighted in red). FIG. 47D is an MS analysis of predicted [QPFGVFGW]-lyciumin chemotype in peptide extract of N. benthamiana leaves infiltrated with A. tumefaciens LBA4404 pEAQ-HT-Sali3-2-[QPFGVFGW] for six days. FIG. 47E is an MS/MS analysis of predicted [QPFGVFGW]-lyciumin chemotype in peptide extract of N. benthamiana leaves infiltrated with A. tumefaciens LBA4404 pEAQ-HT-Sali3-2-[QPFGVFGW] for six days. FIG. 47F is an MS analysis of predicted [QPFGVFGW]-lyciumin chemotype in peptide extract of Achyranthes bidentata seeds. FIG. 47G is an MS/MS analysis of predicted [QPFGVFGW]-lyciumin chemotype in peptide extract of Achyranthes bidentata seeds.





DETAILED DESCRIPTION

A description of example embodiments follows.


The number of defined classes of ribosomally synthesized and post-translationally modified peptides (RiPPs) has expanded rapidly in the era of whole genome sequencing. While most RiPPs have been characterized from bacteria and fungi, few examples are known from plants. Described herein are lyciumins as a plant RiPP class. A lyciumin precursor gene was identified from the lyciumin producer Lycium barbarum. A precursor gene-guided genome mining approach was used to show that lyciumin genotypes and chemotypes are widely distributed in crop and forage plants. The promiscuity of the lyciumin pathway led to the discovery of peptide macrocyclization chemistry in lyciumin-type peptides from pepper seeds and suggests a largely untapped peptide chemical space in the plant kingdom. Based on the physical connection of lyciumin core peptides to protein domains associated with abiotic stress responses in plants, a platform for lyciumin expression and diversification was developed, which can be used to create a library of lyciumin cyclic peptides. The lyciumins described herein can be expressed in planta. The lyciumin cyclic peptides described herein can be used in agrochemical and pharmaceutical applications that aim to increase plant fitness towards abiotic and biotic stresses and treat human diseases, respectively.


Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a rapidly growing class of natural products, since whole genome sequencing enabled the discovery of many RiPP precursor genes and corresponding biosynthetic pathways [8]. While most RiPPs have been discovered from bacteria and fungi, few examples are known from plants. The two biosynthetically defined classes of characterized plant RiPPs are cyclotides and orbitides, which are “head-to-tail” cyclic peptides with or without disulfide bonds, respectively [8-10]. Beyond “head-to-tail” cyclic peptides, the phytochemical repertoire of cyclic peptides suggests that there is a largely untapped diversity of branched cyclic plant peptide chemistry and underlying biochemistry to be discovered [11].


Discovery of peptide natural products and their biosynthetic pathways from microbes and fungi has been revolutionized by genome mining approaches using a vast resource of microbial and fungal genome sequences and biosynthetic knowledge of peptide natural product biosynthesis [12-16]. During the past decade, the number of publicly available plant genomes has exponentially increased due to the improvement of sequencing technologies and lowered genome sequencing costs [17]. In addition, characterization of plant biosynthetic pathways has accelerated due to synthetic biology approaches [18,19]. In analogy to microbial and fungal genomics, the growing plant genomic resource has inspired genome mining approaches for known classes of plant natural products [20-22] based on the current knowledge of plant natural product biosynthesis [23]. Generally, in a genome mining experiment predicted biosynthetic genotypes are connected with corresponding chemotypes based on applied biosynthetic knowledge in three steps: (A) Genotype prediction (the prediction of biosynthetic genes), (B) chemotype prediction (the prediction of structural features of a natural product from its biosynthetic genes) and (C) structure-guided chemotyping (the connection of an analyte structure with the predicted natural product structure) [13]. However, there are several challenges associated with plant natural product discovery by genome mining: (A) Genotype prediction is complicated by knowledge gaps in plant natural product biosynthesis and no or only partial clustering of biosynthetic genes in some plant natural product pathways, (B) chemotype prediction from biosynthetic genes can be difficult for certain natural product classes, and (C) structure-guided chemotyping can be problematic in terms of identification of structural information that can be connected to a predicted chemotype from given biosynthetic genes for a successful genome mining experiment. 
RiPPs have advantages to circumvent these general problems of plant genome mining because the peptide sequence is directly encoded in the genome as a core peptide within a precursor peptide [8]. Thus, identification of a precursor peptide gene specific to a chemically defined RiPP class yields structural information via the core peptide to enable the prediction of a RiPP structure for subsequent structure-guided chemotyping, for example by mass spectrometry [24]. Such a precursor gene-guided genome mining approach for plant RiPPs should not require knowledge of other biosynthetic genes encoding post-translationally modifying enzymes or proteases.


A precursor gene-guided genome mining approach was employed to identify a candidate class of plant RiPPs: the branched cyclic lyciumin peptides [25]. Lyciumins were originally isolated as inhibitors of the angiotensin-converting enzyme and renin from the roots of Lycium barbarum (Solanaceae, FIG. 1A), a Chinese herbal medicine for the treatment of hypertension [25]. Lyciumins are branched cyclic peptides with an N-terminal pyroglutamate and a macrocyclic linkage involving a C-terminal tryptophan or tyrosine residue. A lyciumin cyclic peptide typically consists of eight amino acids of a core lyciumin peptide domain. Typically, the C-terminal residue cyclizes with the α-carbon of the fourth residue. An example is a macrocyclic linkage between a C-terminal tryptophan indole-nitrogen and a glycine-α-carbon. Lyciumin A and C derivatives have also been isolated from the seeds of the medicinal plant Celosia argentea (Amaranthaceae) suggesting that these peptides are produced by multiple plant families [26].


Described herein is the discovery of lyciumin cyclic peptides from crop and forage plants by precursor gene-guided genome mining. Based on the physical connection of lyciumin biosynthesis and abiotic stress response reception in lyciumin precursor peptides, a platform is established to metabolically engineer and produce lyciumin peptide libraries, e.g., in planta. Such lyciumin peptide libraries can be used for, e.g., future engineering applications towards crops with increased stress tolerance. The discovery of lyciumins is a blueprint for peptide discovery by genome mining in the plant kingdom and lyciumin metabolic engineering sets the stage for their potential application as agrochemicals or pharmaceuticals to increase crop fitness and treat human diseases, respectively.


As used herein, the term “lyciumin precursor peptide” refers to a peptide that includes an N-terminal leader domain, one or more core lyciumin peptide domains, and, optionally, a C-terminal BURP domain or C-terminal DUF2775 domain. In some instances, one or more core lyciumin peptide domains can be within a BURP domain. In some instances, one or more core lyciumin peptide domains can be within a DUF2775 domain. In some instances, one or more core lyciumin peptide domains are not within (e.g., outside) a BURP domain. In some instances, one or more core lyciumin peptide domains can be within the N-terminal leader domain. In some instances, one or more core lyciumin peptide domains are not within (e.g., outside) the N-terminal leader domain. In some embodiments, a lyciumin precursor peptide includes from one to twenty core lyciumin peptide domains. In some embodiments, a lyciumin precursor peptide includes from one to ten core lyciumin peptide domains. In some instances, lyciumin precursor peptides can include more than twenty core lyciumin peptide domains. In some embodiments, the lyciumin precursor peptide includes a C-terminal BURP domain. In some embodiments, the lyciumin precursor peptide, or biologically-active fragment thereof, can include a signal peptide sequence. For example, a signal peptide sequence can direct a lyciumin precursor peptide, or biologically-active fragment thereof, through a portion of the secretory pathway and can facilitate localization to a particular organelle, such as a vacuole, which can be relevant for subsequent processing or conversion from a lyciumin precursor peptide to a lyciumin cyclic peptide. A signal peptide can be endogenous for a particular host cell or plant cell, or it can be heterologous. Typically, a signal peptide is located N-terminal to one or more core lyciumin peptide domains. In some instances, a signal peptide can be part of an N-terminal leader domain. 
In certain host cells (e.g., mammalian or plant host cells), expression and/or secretion of a protein can be increased by using a signal sequence, such as a heterologous signal sequence. Therefore, in some embodiments, the lyciumin precursor peptide includes a heterologous signal sequence at its N-terminus.


As used herein, the term “core lyciumin peptide domain” refers to a peptide domain that includes eight amino acids. The peptide is of the form QXX(G/A/T/S/P/E/F/L/R)XXX(Y/W), where X is any amino acid. For example, in some embodiments of interest, the peptide is of the form QXX(G/A/T/S/P)XXX(Y/W), where X is any amino acid. For example, in some embodiments of interest, the peptide is of the form QXX(G/A/T/S)XXX(Y/W), where X is any amino acid. For example, in some embodiments of interest, the peptide is of the form QXX(G/A/T)XXX(Y/W), where X is any amino acid. In particular embodiments, X is any of the twenty-two naturally occurring amino acids. In particular embodiments, X is any of the twenty amino acids encoded by the universal genetic code. In some embodiments, a core lyciumin peptide domain is a sequence listed in Table 1, Table 2, Table 3, or Table 4. In some embodiments, a core lyciumin peptide domain differs in sequence from a sequence listed in Table 1, Table 2, Table 3, or Table 4. For example, a core lyciumin peptide domain can have at least one substitution (e.g., 2, 3, 4, 5, etc. substitutions) relative to a sequence listed in Table 1, Table 2, Table 3, or Table 4. In some embodiments, the core lyciumin peptide domain differs in sequence from a naturally occurring core lyciumin peptide domain. In some embodiments, the sequence of the lyciumin precursor peptide, or biologically-active fragment thereof, differs from a naturally occurring sequence. In particular embodiments, as described herein, the variable X in the peptide QXX(G/A/T/S/P/E/F/L/R)XXX(Y/W), the peptide QXX(G/A/T/S/P)XXX(Y/W), the peptide QXX(G/A/T/S)XXX(Y/W), or the peptide QXX(G/A/T)XXX(Y/W) may be further restricted at individual positions, as described in the following paragraphs. A wide variety of core lyciumin peptide domains can be created. For example, in some embodiments, one of the X positions can be restricted. In other embodiments, two of the X positions can be restricted. 
In other embodiments, three of the X positions can be restricted. In other embodiments, four of the X positions can be restricted. In other embodiments, five of the X positions can be restricted.
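
The broadest motif defined above, QXX(G/A/T/S/P/E/F/L/R)XXX(Y/W), can be written as a pattern and used to scan a precursor sequence for candidate core domains. The following is a minimal sketch only: it assumes X is one of the twenty amino acids encoded by the universal genetic code, the function name is hypothetical, and the toy sequence is invented for illustration and is not taken from the Sequence Listing.

```python
import re

# Twenty amino acids encoded by the universal genetic code (assumption for X).
AA = "[ACDEFGHIKLMNPQRSTVWY]"

# Q, two free positions, a restricted fourth position, three free positions,
# then Y or W. The lookahead lets the scan report overlapping candidates too.
CORE_MOTIF = re.compile(f"(?=(Q{AA}{{2}}[GATSPEFLR]{AA}{{3}}[YW]))")


def find_core_domains(precursor):
    """Return (start_index, sequence) for each candidate eight-residue core domain."""
    seq = precursor.upper()
    return [(m.start(), m.group(1)) for m in CORE_MOTIF.finditer(seq)]


# Hypothetical toy precursor fragment containing two core domains.
toy = "MAARQPFGVFGWYDDSRQPYGVFAWY"
for start, core in find_core_domains(toy):
    print(start, core)
```

A pattern hit identifies only a candidate; whether a candidate is converted to a lyciumin cyclic peptide depends on the precursor context and the host cell, as described elsewhere herein.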


In some embodiments of the core lyciumin peptide, the second position is proline or alanine. In some embodiments, the second position is proline. In some embodiments, the second position is not proline. In some embodiments, the second position is alanine. In some embodiments, the second position is not alanine.


In some embodiments of the core lyciumin peptide, the third position is tryptophan, alanine, tyrosine, phenylalanine, leucine, isoleucine, or serine. In some embodiments, the third position is tryptophan. In some embodiments, the third position is not tryptophan. In some embodiments, the third position is alanine. In some embodiments, the third position is not alanine. In some embodiments, the third position is tyrosine. In some embodiments, the third position is not tyrosine. In some embodiments, the third position is phenylalanine. In some embodiments, the third position is not phenylalanine. In some embodiments, the third position is leucine. In some embodiments, the third position is not leucine. In some embodiments, the third position is isoleucine. In some embodiments, the third position is not isoleucine. In some embodiments, the third position is serine. In some embodiments, the third position is not serine.


In some embodiments of the core lyciumin peptide, the fourth position is glycine. In some embodiments, the fourth position is not glycine. In some embodiments, the fourth position is alanine. In some embodiments, the fourth position is not alanine. In some embodiments, the fourth position is threonine. In some embodiments, the fourth position is not threonine. In some embodiments, the fourth position is serine. In some embodiments, the fourth position is not serine. In some embodiments, the fourth position is proline. In some embodiments, the fourth position is not proline. In some embodiments, the fourth position is glutamic acid. In some embodiments, the fourth position is not glutamic acid. In some embodiments, the fourth position is phenylalanine. In some embodiments, the fourth position is not phenylalanine. In some embodiments, the fourth position is leucine. In some embodiments, the fourth position is not leucine. In some embodiments, the fourth position is arginine. In some embodiments, the fourth position is not arginine.


In some embodiments of the core lyciumin peptide, the fifth position is valine, alanine, phenylalanine, serine, glycine, threonine, isoleucine, glutamine, or leucine. In some embodiments, the fifth position is valine. In some embodiments, the fifth position is not valine. In some embodiments, the fifth position is alanine. In some embodiments, the fifth position is not alanine. In some embodiments, the fifth position is phenylalanine. In some embodiments, the fifth position is not phenylalanine. In some embodiments, the fifth position is serine. In some embodiments, the fifth position is not serine. In some embodiments, the fifth position is glycine. In some embodiments, the fifth position is not glycine. In some embodiments, the fifth position is threonine. In some embodiments, the fifth position is not threonine. In some embodiments, the fifth position is isoleucine. In some embodiments, the fifth position is not isoleucine. In some embodiments, the fifth position is glutamine. In some embodiments, the fifth position is not glutamine. In some embodiments, the fifth position is leucine. In some embodiments, the fifth position is not leucine.


In some embodiments of the core lyciumin peptide, the sixth position is glycine, tyrosine, alanine, threonine, serine, phenylalanine, leucine, cysteine, methionine, isoleucine, arginine, histidine, asparagine, valine, or aspartate. In some embodiments, the sixth position is glycine. In some embodiments, the sixth position is not glycine. In some embodiments, the sixth position is tyrosine. In some embodiments, the sixth position is not tyrosine. In some embodiments, the sixth position is alanine. In some embodiments, the sixth position is not alanine. In some embodiments, the sixth position is threonine. In some embodiments, the sixth position is not threonine. In some embodiments, the sixth position is serine. In some embodiments, the sixth position is not serine. In some embodiments, the sixth position is phenylalanine. In some embodiments, the sixth position is not phenylalanine. In some embodiments, the sixth position is leucine. In some embodiments, the sixth position is not leucine. In some embodiments, the sixth position is cysteine. In some embodiments, the sixth position is not cysteine. In some embodiments, the sixth position is methionine. In some embodiments, the sixth position is not methionine. In some embodiments, the sixth position is isoleucine. In some embodiments, the sixth position is not isoleucine. In some embodiments, the sixth position is arginine. In some embodiments, the sixth position is not arginine. In some embodiments, the sixth position is histidine. In some embodiments, the sixth position is not histidine. In some embodiments, the sixth position is asparagine. In some embodiments, the sixth position is not asparagine. In some embodiments, the sixth position is valine. In some embodiments, the sixth position is not valine. In some embodiments, the sixth position is aspartate. In some embodiments, the sixth position is not aspartate.


In some embodiments of the core lyciumin peptide, the seventh position is serine, isoleucine, threonine, alanine, phenylalanine, glycine, tyrosine, methionine, lysine, valine, or arginine. In some embodiments, the seventh position is serine. In some embodiments, the seventh position is not serine. In some embodiments, the seventh position is isoleucine. In some embodiments, the seventh position is not isoleucine. In some embodiments, the seventh position is threonine. In some embodiments, the seventh position is not threonine. In some embodiments, the seventh position is alanine. In some embodiments, the seventh position is not alanine. In some embodiments, the seventh position is phenylalanine. In some embodiments, the seventh position is not phenylalanine. In some embodiments, the seventh position is glycine. In some embodiments, the seventh position is not glycine. In some embodiments, the seventh position is tyrosine. In some embodiments, the seventh position is not tyrosine. In some embodiments, the seventh position is methionine. In some embodiments, the seventh position is not methionine. In some embodiments, the seventh position is lysine. In some embodiments, the seventh position is not lysine. In some embodiments, the seventh position is valine. In some embodiments, the seventh position is not valine. In some embodiments, the seventh position is arginine. In some embodiments, the seventh position is not arginine.


In some embodiments of the core lyciumin peptide, the eighth position is tyrosine. In some embodiments, the eighth position is not tyrosine. In some embodiments, the eighth position is tryptophan. In some embodiments, the eighth position is not tryptophan.
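
The position-by-position restrictions listed in the preceding paragraphs can be collected into a lookup table, which makes it easy to check a candidate core peptide or to estimate the size of a library built from those residues. This is an illustrative sketch: the table simply transcribes the residues named above for positions two through eight, position one is fixed as glutamine per the motif, and the function names are hypothetical.

```python
from itertools import product
from math import prod

# Residues named in the text for each position of the eight-residue core peptide.
ALLOWED = {
    1: "Q",
    2: "PA",
    3: "WAYFLIS",
    4: "GATSPEFLR",
    5: "VAFSGTIQL",
    6: "GYATSFLCMIRHNVD",
    7: "SITAFGYMKVR",
    8: "YW",
}


def matches(core, allowed=ALLOWED):
    """True if every residue of the core peptide is allowed at its position."""
    return len(core) == len(allowed) and all(
        core[pos - 1] in residues for pos, residues in allowed.items()
    )


def library_size(allowed=ALLOWED):
    """Number of distinct core peptides obtainable from the table."""
    return prod(len(set(residues)) for residues in allowed.values())


def enumerate_library(allowed):
    """Yield every core peptide in the table, varying the last position fastest."""
    columns = [allowed[pos] for pos in sorted(allowed)]
    for combo in product(*columns):
        yield "".join(combo)


# A small restricted table: only positions three and seven vary.
small = {1: "Q", 2: "P", 3: "FY", 4: "G", 5: "V", 6: "F", 7: "GA", 8: "W"}
print(list(enumerate_library(small)))
print(library_size())
```

Restricting one or more positions, as contemplated above, corresponds to shortening the residue string for that position in the table.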


As used herein, the term “biologically-active fragment,” when referring to a lyciumin precursor peptide, refers to a fragment of a lyciumin precursor peptide that includes at least one core lyciumin peptide domain and that can be converted to a lyciumin cyclic peptide (e.g., in a host cell). Typically, the biologically-active fragment is cyclized in the host cell. In some instances, the biologically-active fragment may have shorter N-terminal or C-terminal domains compared to a lyciumin precursor peptide. In some instances, biologically-active fragments can be fragments of naturally-occurring lyciumin precursor peptides. In some instances, a biologically-active fragment can be a portion of a lyciumin precursor peptide having at least one core lyciumin peptide, which is embedded in, or linked to (e.g., at the N-terminus of, at the C-terminus of), a heterologous amino acid sequence that is not generally found in a lyciumin precursor peptide.


In some embodiments, the invention provides a method of producing one or more lyciumin cyclic peptides that includes: (a) providing a host cell that includes a transgene encoding a polypeptide that comprises one or more core lyciumin peptide domains; and (b) expressing the transgene in the host cell to thereby produce a polypeptide that includes one or more core lyciumin peptide domains. In some embodiments, the polypeptide is converted to one or more lyciumin cyclic peptides in the host cell.


As used herein, the term “lyciumin cyclic peptide” refers to a branched cyclic peptide with an N-terminal pyroglutamate and a macrocyclic linkage involving a C-terminal tryptophan or tyrosine residue. A lyciumin cyclic peptide typically consists of the eight amino acids of the core lyciumin peptide domain. Typically, the C-terminal residue cyclizes with the α-carbon of the fourth residue. An example is a macrocyclic linkage between a C-terminal tryptophan indole-nitrogen and a glycine-α-carbon.
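
For structure-guided chemotyping by mass spectrometry, the structural definition above can be turned into a predicted monoisotopic mass. The sketch below rests on two stated assumptions rather than on values given in this description: conversion of the N-terminal glutamine to pyroglutamate is modeled as loss of ammonia, and formation of the macrocyclic linkage between the C-terminal residue and the α-carbon of the fourth residue is modeled as a net loss of two hydrogen atoms. The function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the twenty encoded amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once for the free termini of the linear peptide
AMMONIA = 17.02655  # assumed loss on pyroglutamate formation from glutamine
H2 = 2.01565        # assumed net loss on forming the macrocyclic linkage
PROTON = 1.00728


def predicted_lyciumin_mass(core):
    """Predicted monoisotopic mass (Da) of the cyclic peptide from a core domain."""
    core = core.upper()
    if len(core) != 8 or core[0] != "Q" or core[-1] not in "YW":
        raise ValueError("expected an eight-residue core of the form Q......(Y/W)")
    linear = sum(RESIDUE_MASS[aa] for aa in core) + WATER
    return linear - AMMONIA - H2


def predicted_mz(core, charge=1):
    """Predicted m/z of the protonated ion [M + zH]z+."""
    return (predicted_lyciumin_mass(core) + charge * PROTON) / charge


print(round(predicted_lyciumin_mass("QPFGVFGW"), 4))
print(round(predicted_mz("QPFGVFGW"), 4))
```

The predicted value is a starting point for searching MS data; assignment of a chemotype still relies on MS/MS analysis such as that shown in the figures.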


The BURP domain (Pfam 03181) is around 230 amino acid residues and has the following conserved features: two phenylalanine residues at its N-terminus; two cysteine residues; and four repeated cysteine-histidine motifs, arranged as: CH-X(10)-CH-X(25-27)-CH-X(25-26)-CH, where X can be any amino acid.


The DUF2775 domain (Pfam 10950) is a eukaryotic protein family which includes a number of plant organ-specific proteins. Their predicted amino acid sequence is often repetitive and suggests that these proteins could be exported and glycosylated. Multiple sequence alignment shows a highly conserved motif of 135 amino acids. This motif includes approximately 20 amino acids from the non-repeating area of the peptide, 2 tandem repeats and 1 truncated tandem repeat (Albornos et al., 2012). The first seven amino acids of the DUF2775 domain are typically KDXYXGW, where X can be any amino acid.
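
The sequence features given above for the BURP domain (the CH-X(10)-CH-X(25-27)-CH-X(25-26)-CH arrangement) and for the DUF2775 domain (the KDXYXGW start) can be expressed as simple patterns for a first-pass screen of candidate precursors. This sketch assumes X is one of the twenty encoded amino acids; the labels and function name are hypothetical. It checks only the features named in the text and is not a substitute for a profile-based domain search against the Pfam models.

```python
import re

AA = "[ACDEFGHIKLMNPQRSTVWY]"

# Four cysteine-histidine motifs with the spacings stated in the text.
BURP_CH_MOTIF = re.compile(f"CH{AA}{{10}}CH{AA}{{25,27}}CH{AA}{{25,26}}CH")

# Typical first seven residues of the DUF2775 domain.
DUF2775_START = re.compile(f"KD{AA}Y{AA}GW")


def classify_c_terminal_domain(protein):
    """Label a protein by which of the two stated sequence features it contains."""
    seq = protein.upper()
    if BURP_CH_MOTIF.search(seq):
        return "BURP-like"
    if DUF2775_START.search(seq):
        return "DUF2775-like"
    return "unclassified"


# Synthetic test strings, invented for illustration only.
burp_like = "FF" + "CH" + "A" * 10 + "CH" + "S" * 26 + "CH" + "T" * 25 + "CH"
duf_like = "MKDAYSGWPPP"
print(classify_c_terminal_domain(burp_like))
print(classify_c_terminal_domain(duf_like))
```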


Embodiments described herein also include engineered nucleic acids that encode engineered lyciumin precursor peptides (and engineered lyciumin precursor peptides encoded by such engineered nucleic acids). An example is an engineered nucleic acid that encodes n core lyciumin peptide domains, wherein n is an integer. The core lyciumin peptide domains within an engineered lyciumin precursor peptide can be identical or non-identical. Multiple identical core lyciumin peptide domains can allow for increased production of a homogeneous population of core lyciumin peptides and lyciumin cyclic peptides. Typically, n is an integer from 1 to 10, preferably from 5 to 10. In some instances, n can be greater than 10. In some instances, an engineered nucleic acid encodes from 5 to 10 identical lyciumin precursor peptides. The core lyciumin peptide domains are typically separated by an intervening sequence.


In the example shown in FIGS. 38B-D, the core lyciumin peptide domains, which are indicated in red, are separated by a thirteen amino acid sequence, though different lengths are permissible. In the embodiment of FIGS. 38B-D, arginine is immediately N-terminal to the core lyciumin peptide domain, except for an instance where serine is immediately N-terminal to the core lyciumin peptide domain. In the embodiment of FIGS. 38B-D, tyrosine is immediately C-terminal to the core lyciumin peptide domain. Other amino acids can be immediately N-terminal or C-terminal. A wide variety of core lyciumin peptide domains can be expressed from an engineered nucleic acid.
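
The repeat arrangement described above, in which n core lyciumin peptide domains are flanked by defined residues and separated by an intervening sequence, can be sketched as a small helper for designing an engineered precursor region. The function name and the spacer used here are hypothetical placeholders and are not the sequences of FIGS. 38B-D; whether the flanking residues count toward the length of the intervening sequence is left to the designer.

```python
def build_repeat_region(cores, spacer, n_flank="R", c_flank="Y"):
    """Join core domains, each flanked N- and C-terminally, with a spacer between units."""
    for core in cores:
        if len(core) != 8:
            raise ValueError(f"core domain {core!r} is not eight residues long")
    units = [n_flank + core + c_flank for core in cores]
    return spacer.join(units)


# Five identical core domains separated by a placeholder spacer (hypothetical).
PLACEHOLDER_SPACER = "GSGSGSGSGSGSG"
region = build_repeat_region(["QPFGVFGW"] * 5, PLACEHOLDER_SPACER)
print(region.count("QPFGVFGW"), len(region))
```

Passing a list of non-identical core domains to the same helper gives a precursor region encoding several different core peptides.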


As used herein, “converting the lyciumin precursor peptide, or biologically-active fragment thereof, to one or more lyciumin cyclic peptides in a host cell,” “converted to one or more lyciumin cyclic peptides in a host cell,” and similar phrases refer to one or more enzymatic reactions that convert a lyciumin precursor peptide, or biologically-active fragment thereof, to one or more lyciumin cyclic peptides. In some instances, conversion is facilitated by one or more enzymes that cyclize the lyciumin precursor peptide, or biologically-active fragment thereof. In some instances, conversion is catalyzed, in part, by one or more endopeptidases, such as an arginine endopeptidase, which acts N-terminal to a core lyciumin peptide domain. In some instances, conversion is catalyzed by one or more glutamine cyclotransferases, which cyclize an N-terminal glutamine in a core lyciumin peptide domain. In some instances, conversion is catalyzed by one or more exopeptidases. Conversion to a lyciumin cyclic peptide can, but need not, occur within a host cell.


Host cells include cells that are capable of converting a lyciumin precursor peptide to a lyciumin cyclic peptide, as well as cells that are incapable of converting a lyciumin precursor peptide to a lyciumin cyclic peptide. For example, a host cell can express a lyciumin precursor peptide but lack one or more enzymes required to convert the lyciumin precursor peptide to a lyciumin cyclic peptide. In such circumstances, the lyciumin precursor peptide can be isolated or obtained from the host cell and then converted to a lyciumin cyclic peptide in another environment (e.g., in a cell free system, such as in a cell lysate (or fractionated cell lysate) from a source that is capable of converting a lyciumin precursor peptide to a lyciumin cyclic peptide).


In some embodiments, a lyciumin precursor peptide can include a tag, which can be used to isolate the lyciumin precursor peptide from a cell that expresses it. Such a tag can be useful for a manufacturing process that involves recombinant expression of a lyciumin precursor peptide and subsequent cyclization using purified enzyme. In some embodiments, a nucleotide sequence encoding a lyciumin precursor peptide is fused in-frame with a nucleotide sequence encoding an epitope tag, also known as an affinity tag, which can be useful for, e.g., protein purification. Examples of suitable epitope tags are known in the art and include FLAG, HA, His, GST, CBP, MBP, c-Myc, DHFR, GFP, CAT and others.


Nucleic Acids

As used herein, the term “nucleic acid” refers to a polymer comprising multiple nucleotide monomers (e.g., ribonucleotide monomers or deoxyribonucleotide monomers). “Nucleic acid” includes, for example, DNA (e.g., genomic DNA and cDNA), RNA, and DNA-RNA hybrid molecules. Nucleic acid molecules can be naturally occurring, recombinant, or synthetic. In addition, nucleic acid molecules can be single-stranded, double-stranded or triple-stranded. In certain embodiments, nucleic acid molecules can be modified. In the case of a double-stranded polymer, “nucleic acid” can refer to either or both strands of the molecule.


The terms “nucleotide” and “nucleotide monomer” refer to naturally occurring ribonucleotide or deoxyribonucleotide monomers, as well as non-naturally occurring derivatives and analogs thereof. Accordingly, nucleotides can include, for example, nucleotides comprising naturally occurring bases (e.g., adenosine, thymidine, guanosine, cytidine, uridine, inosine, deoxyadenosine, deoxythymidine, deoxyguanosine, or deoxycytidine) and nucleotides comprising modified bases known in the art.


As used herein, the term “sequence identity” refers to the extent to which two nucleotide sequences, or two amino acid sequences, have the same residues at the same positions when the sequences are aligned to achieve a maximal level of identity, expressed as a percentage. For sequence alignment and comparison, typically one sequence is designated as a reference sequence, to which test sequences are compared. The sequence identity between reference and test sequences is expressed as the percentage of positions across the entire length of the reference sequence where the reference and test sequences share the same nucleotide or amino acid upon alignment of the reference and test sequences to achieve a maximal level of identity. As an example, two sequences are considered to have 70% sequence identity when, upon alignment to achieve a maximal level of identity, the test sequence has the same nucleotide or amino acid residue at 70% of the same positions over the entire length of the reference sequence.
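
The definition above can be made concrete with a short calculation on two sequences that have already been aligned. This is a minimal sketch: it assumes the aligned sequences are supplied as equal-length strings with "-" marking gaps, counts identical residues only at positions where the reference has a residue, and divides by the full (ungapped) length of the reference sequence. The function name is hypothetical.

```python
def percent_identity(aligned_ref, aligned_test, gap="-"):
    """Percent identity of a test sequence relative to the full reference length."""
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have the same length")
    ref_length = sum(1 for r in aligned_ref if r != gap)
    if ref_length == 0:
        raise ValueError("reference sequence is empty")
    identical = sum(
        1 for r, t in zip(aligned_ref, aligned_test) if r != gap and r == t
    )
    return 100.0 * identical / ref_length


# Seven of the ten reference positions are shared, giving 70% sequence identity.
print(percent_identity("QPFGVFGWAA", "QPFGVFGTTT"))
```

Because the denominator is the reference length, a test sequence that aligns to only part of the reference scores lower than it would under a definition based on the aligned region alone.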


Alignment of sequences for comparison to achieve maximal levels of identity can be readily performed by a person of ordinary skill in the art using an appropriate alignment method or algorithm. In some instances, the alignment can include introduced gaps to provide for the maximal level of identity. Examples include the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), and visual inspection (see generally Ausubel et al., Current Protocols in Molecular Biology).
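
As an illustration of how an alignment with introduced gaps can be computed, the following is a compact sketch of global alignment by dynamic programming in the manner of Needleman and Wunsch. The scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions chosen for readability and are not the parameters used by the software packages named above; the function name is hypothetical.

```python
def needleman_wunsch(ref, test, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned_ref, aligned_test, score)."""
    n, m = len(ref), len(test)
    # score[i][j] is the best score for aligning ref[:i] with test[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == test[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in the test sequence
                score[i][j - 1] + gap,      # gap in the reference sequence
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_ref, out_test = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and ref[i - 1] == test[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_ref.append(ref[i - 1])
            out_test.append(test[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_ref.append(ref[i - 1])
            out_test.append("-")
            i -= 1
        else:
            out_ref.append("-")
            out_test.append(test[j - 1])
            j -= 1
    return "".join(reversed(out_ref)), "".join(reversed(out_test)), score[n][m]


aligned_ref, aligned_test, total = needleman_wunsch("QPFGVFGW", "QPFVFGW")
print(aligned_ref)
print(aligned_test)
print(total)
```

The aligned strings returned here are in the form needed to count identical positions over the length of the reference sequence.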


When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters. A commonly used tool for determining percent sequence identity is the Protein Basic Local Alignment Search Tool (BLASTP), available through the National Center for Biotechnology Information, National Library of Medicine, of the United States National Institutes of Health (Altschul et al., 1990).


In various embodiments, two nucleotide sequences, or two amino acid sequences, can have at least, e.g., 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, sequence identity. When ascertaining percent sequence identity to one or more sequences described herein, the sequences described herein are the reference sequences.


For many of the nucleotide sequences described herein, additional 5′- and 3′-nucleotides can be appended to the nucleotide sequence in order to perform Gibson cloning of the sequence into an expression vector. Gibson cloning utilizes Gibson assembly, an exonuclease-based method for joining DNA fragments. For example, a 5′ adapter (see SEQ ID NO: 123) and a 3′ adapter (see SEQ ID NO: 124) can be appended 5′ and 3′, respectively, to SEQ ID NOS: 9 through 33 for Gibson cloning and assembly into tobacco expression vector pEAQ-HT.
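Appending adapters for Gibson cloning can be sketched as a simple string operation. The adapter sequences below are hypothetical placeholders and are not SEQ ID NO: 123 or SEQ ID NO: 124; the function name is likewise an assumption introduced for illustration.

```python
VALID_BASES = set("ACGT")

# Hypothetical placeholder adapters, NOT the sequences of SEQ ID NOS: 123-124.
PLACEHOLDER_5P_ADAPTER = "AAAACCCC"
PLACEHOLDER_3P_ADAPTER = "GGGGTTTT"


def add_gibson_adapters(insert: str, five_prime: str, three_prime: str) -> str:
    """Return the insert flanked by a 5' adapter and a 3' adapter."""
    parts = [five_prime.upper(), insert.upper(), three_prime.upper()]
    for part in parts:
        # Reject anything that is not an unambiguous DNA base.
        if not part or not set(part) <= VALID_BASES:
            raise ValueError("sequences must be non-empty and contain only A, C, G, T")
    return "".join(parts)


construct = add_gibson_adapters("ATGGCT", PLACEHOLDER_5P_ADAPTER,
                                PLACEHOLDER_3P_ADAPTER)
print(construct)
```

In practice the adapters would be chosen to overlap the ends of the linearized expression vector, so that the exonuclease-based assembly can join the fragments.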


Vectors

The terms “vector”, “vector construct” and “expression vector” mean the vehicle by which a DNA or RNA sequence (e.g. a foreign gene) can be introduced into a host cell, so as to transform the host and promote expression (e.g. transcription and translation) of the introduced sequence. Vectors typically comprise the DNA of a transmissible agent, into which foreign DNA encoding a protein is inserted by, e.g., restriction enzyme technology. Some viral vectors comprise the RNA of a transmissible agent. A common type of vector is a “plasmid”, which generally is a self-contained molecule of double-stranded DNA that can readily accept additional (foreign) DNA and which can readily be introduced into a suitable host cell. A large number of vectors, including plasmid and fungal vectors, have been described for replication and/or expression in a variety of eukaryotic and prokaryotic hosts.


The terms “express” and “expression” mean allowing or causing the information in a gene or DNA sequence to become manifest, for example producing a protein by activating the cellular functions involved in transcription and translation of a corresponding gene or DNA sequence. A DNA sequence is expressed in or by a cell to form an “expression product” such as a protein. The expression product itself, e.g. the resulting protein, may also be said to be “expressed” by the cell. A polynucleotide or polypeptide is expressed recombinantly, for example, when it is expressed or produced in a foreign host cell under the control of a foreign or native promoter, or in a native host cell under the control of a foreign promoter.


Gene delivery vectors generally include a transgene (e.g., nucleic acid encoding an enzyme) operably linked to a promoter and other nucleic acid elements required for expression of the transgene in the host cells into which the vector is introduced. Suitable promoters for gene expression and delivery constructs are known in the art. For bacterial host cells, suitable promoters include, but are not limited to, promoters obtained from the E. coli lac operon, Streptomyces coelicolor agarase gene (dagA), Bacillus subtilis levansucrase gene (sacB), Bacillus licheniformis alpha-amylase gene (amyL), Bacillus stearothermophilus maltogenic amylase gene (amyM), Bacillus amyloliquefaciens alpha-amylase gene (amyQ), Bacillus licheniformis penicillinase gene (penP), Bacillus subtilis xylA and xylB genes, and prokaryotic beta-lactamase gene (See e.g., Villa-Komaroff et al., Proc. Natl. Acad. Sci. USA 75: 3727-3731, 1978), as well as the tac promoter (See e.g., DeBoer et al., Proc. Natl. Acad. Sci. USA 80: 21-25, 1983). Examples of promoters for filamentous fungal host cells include, but are not limited to, promoters obtained from the genes for Aspergillus oryzae TAKA amylase, Rhizomucor miehei aspartic proteinase, Aspergillus niger neutral alpha-amylase, Aspergillus niger acid stable alpha-amylase, Aspergillus niger or Aspergillus awamori glucoamylase (glaA), Rhizomucor miehei lipase, Aspergillus oryzae alkaline protease, Aspergillus oryzae triose phosphate isomerase, Aspergillus nidulans acetamidase, and Fusarium oxysporum trypsin-like protease (See e.g., WO 96/00787), as well as the NA2-tpi promoter (a hybrid of the promoters from the genes for Aspergillus niger neutral alpha-amylase and Aspergillus oryzae triose phosphate isomerase), and mutant, truncated, and hybrid promoters thereof.
Examples of yeast cell promoters can be from the genes for Saccharomyces cerevisiae enolase (ENO-1), Saccharomyces cerevisiae galactokinase (GAL1), Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP), and Saccharomyces cerevisiae 3-phosphoglycerate kinase. Other useful promoters for yeast host cells are known in the art (See e.g., Romanos et al., Yeast 8:423-488, 1992). For plant host cells, examples of suitable promoters include the cauliflower mosaic virus 35S promoter (CaMV 35S), and promoters (e.g., constitutive promoters) of genes that are highly expressed in plants (e.g., plant housekeeping genes, genes encoding Ubiquitin, Actin, Tubulin, or EIF (eukaryotic initiation factor)). Plant virus promoters can also be used. Additional useful plant promoters include those discussed in [50, 51], the entire contents of which are incorporated herein by reference. The selection of a suitable promoter is within the skill in the art. The recombinant plasmids can also comprise inducible, or regulatable, promoters for expression of a lyciumin precursor peptide, or biologically-active fragment thereof, in cells.


Various gene delivery vehicles are known in the art and include both viral and non-viral (e.g., naked DNA, plasmid) vectors. Viral vectors suitable for gene delivery are known to those skilled in the art. Such viral vectors include, e.g., vector derived from the herpes virus, baculovirus vector, lentiviral vector, retroviral vector, adenoviral vector and adeno-associated viral vector (AAV). Vectors derived from plant viruses can also be used, such as the viral backbones of the RNA viruses Tobacco mosaic virus (TMV), Potato virus X (PVX) and Cowpea mosaic virus (CPMV), and the DNA geminivirus Bean yellow dwarf virus. The viral vector can be replicating or non-replicating.


Non-viral vectors include naked DNA and plasmids, among others. Non-limiting examples include pKK plasmids (Clontech), pUC plasmids, pET plasmids (Novagen, Inc., Madison, Wis.), pRSET or pREP plasmids (Invitrogen, San Diego, Calif.), or pMAL plasmids (New England Biolabs, Beverly, Mass.), and such vectors may be introduced into many appropriate host cells, using methods disclosed or cited herein or otherwise known to those skilled in the relevant art.


In certain embodiments, the vector comprises a transgene operably linked to a promoter. The transgene encodes a biologically-active molecule, such as a lyciumin precursor peptide described herein.


To facilitate the introduction of the gene delivery vector into host cells, the vector can be combined with different chemical means such as colloidal dispersion systems (macromolecular complex, nanocapsules, microspheres, beads) or lipid-based systems (oil-in-water emulsions, micelles, liposomes).


Some embodiments relate to a vector comprising a nucleic acid encoding a lyciumin precursor peptide, or biologically-active fragment thereof, described herein. In certain embodiments, the vector is a plasmid, and includes any one or more plasmid sequences such as, e.g., a promoter sequence, a selection marker sequence, or a locus-targeting sequence. Suitable plasmid vectors include p423TEF 2μ, p425TEF 2μ, and p426TEF 2μ. Another suitable vector is pHis8-4 (Whitehead Institute, Cambridge, Mass., United States of America). Another suitable vector is pEAQ-HT [50].


Although the genetic code is degenerate in that most amino acids are represented by multiple codons (called “synonyms” or “synonymous” codons), it is understood in the art that codon usage by particular organisms is nonrandom and biased towards particular codon triplets. Accordingly, in some embodiments, the vector includes a nucleotide sequence that has been optimized for expression in a particular type of host cell (e.g., through codon optimization). Codon optimization refers to a process in which a polynucleotide encoding a protein of interest is modified to replace particular codons in that polynucleotide with codons that encode the same amino acid(s), but are more commonly used/recognized in the host cell in which the nucleic acid is being expressed. In some aspects, the polynucleotides described herein are codon optimized for expression in a bacterial cell, e.g., E. coli. In some aspects, the polynucleotides described herein are codon optimized for expression in a yeast cell, e.g., S. cerevisiae. In some aspects, the polynucleotides described herein are codon optimized for expression in a tobacco cell, e.g., N. benthamiana.
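Codon optimization as described above can be sketched as replacing each codon with a synonymous codon preferred by the host. The table below is an illustrative assumption: each entry is a valid codon for its amino acid, but the choices are not taken from a measured codon usage table for any host, and the function names are hypothetical.

```python
# One illustrative "preferred" codon per amino acid (assumed, not measured).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
# Reverse lookup used only to check that the encoded peptide is unchanged.
AMINO_ACID_FOR = {codon: aa for aa, codon in PREFERRED_CODON.items()}


def back_translate(peptide: str) -> str:
    """Encode a peptide using the single preferred codon for each residue."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in peptide.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None


def translate_preferred(dna: str) -> str:
    """Translate DNA that uses only codons from PREFERRED_CODON."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(AMINO_ACID_FOR[c] for c in codons)


# Core peptide of lyciumin A (SEQ ID NO: 148).
optimized = back_translate("QPYGVGSW")
print(optimized)
print(translate_preferred(optimized))
```

A single-codon table is the simplest possible scheme; practical optimization would typically weight synonymous codons by host usage frequency and also avoid unwanted motifs, which this sketch does not attempt.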


Host Cells

A wide variety of host cells can be used in the present invention, including fungal cells, bacterial cells, plant cells, insect cells, and mammalian cells.


In some embodiments, the host cell is a fungal cell, such as a yeast cell or an Aspergillus spp. cell. A wide variety of yeast cells are suitable, such as cells of the genus Pichia, including Pichia pastoris and Pichia stipitis; cells of the genus Saccharomyces, including Saccharomyces cerevisiae; cells of the genus Schizosaccharomyces, including Schizosaccharomyces pombe; and cells of the genus Candida, including Candida albicans.


In some embodiments, the host cell is a bacterial cell. A wide variety of bacterial cells are suitable, such as cells of the genus Escherichia, including Escherichia coli; cells of the genus Bacillus, including Bacillus subtilis; cells of the genus Pseudomonas, including Pseudomonas aeruginosa; and cells of the genus Streptomyces, including Streptomyces griseus.


In some embodiments, the host cell is a plant cell. A wide variety of cells from a plant are suitable, including cells from a Nicotiana benthamiana plant. In some embodiments, the plant belongs to a genus selected from the group consisting of Arabidopsis, Beta, Glycine, Helianthus, Solanum, Triticum, Oryza, Brassica, Medicago, Prunus, Malus, Hordeum, Musa, Phaseolus, Citrus, Piper, Sorghum, Daucus, Manihot, Capsicum, and Zea. In some embodiments, the host cell is a plant cell from the Amaranthaceae family. In some embodiments, the plant cell is an Amaranthus genus plant cell, such as an Amaranthus hypochondriacus plant cell. In some embodiments, the plant cell is a Beta genus plant cell, such as a Beta vulgaris plant cell. In some embodiments, the plant cell is a Chenopodium genus plant cell, such as a Chenopodium quinoa plant cell. In some embodiments, the plant cell is a Fabaceae family plant cell. In some embodiments, the plant cell is a Glycine genus plant cell, such as a Glycine max plant cell. In some embodiments, the plant cell is a Medicago genus plant cell, such as a Medicago truncatula plant cell. In some embodiments, the plant cell is a Solanaceae family plant cell. In some embodiments, the plant cell is a Solanum genus plant cell, such as a Solanum melongena plant cell or a Solanum tuberosum plant cell. In some embodiments, the plant cell is a Nicotiana genus plant cell, such as a Nicotiana benthamiana plant cell. In some embodiments, the plant cell is a Capsicum genus plant cell, such as a Capsicum annuum plant cell.


In some embodiments, the host cell is an insect cell, such as a Spodoptera frugiperda cell, for example a cell of the Spodoptera frugiperda Sf9 cell line or the Spodoptera frugiperda Sf21 cell line.


In some embodiments, the host cell is a mammalian cell.


In some embodiments, the host cell is an Escherichia coli cell. In some embodiments, the host cell is a Nicotiana benthamiana cell. In some embodiments, the cell is a Saccharomyces cerevisiae cell.


As used herein, the term “host cell” encompasses cells in cell culture and also cells within an organism (e.g., a plant). In some embodiments, the host cell is part of a transgenic plant.


Some embodiments relate to a host cell comprising a vector as described herein. In certain embodiments, the host cell is an Escherichia coli cell, a Nicotiana benthamiana cell, or a Saccharomyces cerevisiae cell.


In some embodiments, the host cells are cultured in a cell culture medium, such as a standard cell culture medium known in the art to be suitable for the particular host cell.


Methods of Making Transgenic Host Cells

Described herein are methods of making a transgenic host cell. The transgenic host cells can be made, for example, by introducing one or more of the vector embodiments described herein into the host cell.


In some embodiments, the method comprises introducing into a host cell a vector that includes a nucleic acid transgene that encodes a lyciumin precursor peptide, or a biologically-active fragment thereof. The lyciumin precursor peptide, or biologically-active fragment thereof, can include one or more core lyciumin peptide domains.


In some embodiments, one or more of the nucleic acids are integrated into the genome of the host cell. In some embodiments, the nucleic acids to be integrated into a host genome can be introduced into the host cell using any of a variety of suitable methodologies known in the art, including, for example, CRISPR-based systems (e.g., CRISPR/Cas9; CRISPR/Cpf1), TALEN systems and Agrobacterium-mediated transformation. However, as those skilled in the art would recognize, transient transformation techniques can be used that do not require integration into the genome of the host cell. In some embodiments, nucleic acid (e.g., plasmids) can be introduced that are maintained as episomes, which need not be integrated into the host cell genome.


In certain embodiments, the nucleic acid is introduced into a tissue, cell, or seed of a plant. Various methods of introducing nucleic acid into the tissue, cell, or seed of plants are known to one of ordinary skill in the art, such as protoplast transformation. The particular method can be selected based on several considerations, such as, e.g., the type of plant used. For example, a floral dip method is a suitable method for introducing genetic material into a plant. In other embodiments, agroinfiltration can be useful for transient expression in plants. In certain embodiments, the nucleic acid can be delivered into the plant by an Agrobacterium.


In some embodiments, a host cell is selected or engineered to have increased activity of the synthesis pathway.


Some of the methods described herein include assaying for an activity of interest. For example, crude extract from a host cell that expresses a lyciumin precursor peptide and/or lyciumin cyclic peptide, or a lyciumin cyclic peptide isolated from the host cell, can be assayed for an activity of interest. An example of an activity of interest is modulation (enhancement or inhibition) of fungal or bacterial growth, such as the ability to inhibit growth of a pathogenic fungal or bacterial species or the ability to promote growth of a potentially desirable fungal or bacterial species. Another example of an activity of interest is a protease inhibitor activity, which can include inhibition of a viral, bacterial, fungal, or mammalian protease.


EXEMPLIFICATION
Example 1
Results
Lyciumins are Plant RiPPs

The requirement for precursor gene-guided genome mining of a class of plant ribosomal peptides is the identification of a peptide-specific precursor gene, which provides the peptide sequence information via the core peptide. In order to identify the lyciumin precursor gene, a de novo transcriptome was generated from the roots of a Lycium barbarum plant, which produced lyciumin A, B and D based on liquid chromatography-mass spectrometry (LC-MS). A tblastn search of predicted core peptide sequences of lyciumin A (SEQ ID NO: 148; QPYGVGSW), lyciumin B (SEQ ID NO: 50; QPWGVGSW) and lyciumin D (SEQ ID NO: 174; QPYGVGIW) yielded three partial transcripts of candidate lyciumin precursor genes, and a full-length sequence of a candidate lyciumin precursor gene was obtained by cloning guided by these transcripts (FIGS. 1C and 5). The identified lyciumin precursor from Lycium barbarum, LbaLycA, consists of an N-terminal signal peptide indicating processing through the secretory pathway (FIG. 6) [27], an N-terminal domain with twelve repeats, each including a core peptide for lyciumin A, B or D, and a C-terminal BURP domain (Pfam 03181) [28]. BURP domain proteins are terrestrial plant-specific proteins, which exhibit diverse temporal and spatial expression patterns in plants and are often associated with abiotic stress responses of plants [29]. BURP domain proteins are named after their initial members: BNM2, a microspore protein from Brassica napus [30], VfUSP, an unknown seed protein from Vicia faba [31], RD22, an Arabidopsis thaliana dehydration-responsive protein [32], and PG1β, a β-subunit of polygalacturonase isoenzyme 1 of tomato [33].


In order to test whether LbaLycA is a precursor peptide for lyciumin biosynthesis, LbaLycA was expressed heterologously in Nicotiana benthamiana via infiltration of Agrobacterium tumefaciens LBA4404 pEAQ-HT-LbaLycA. LC-MS analysis of an organic extract of N. benthamiana leaves six days after inoculation with A. tumefaciens LBA4404 pEAQ-HT-LbaLycA showed mass signals for lyciumin A, B and D, as detected in Lycium barbarum root extracts (FIG. 1D), while no lyciumin mass signals appeared in the empty vector control. This result showed that lyciumins are plant RiPPs. In addition, the reconstitution of lyciumin biosynthesis by sole expression of a lyciumin precursor gene in N. benthamiana (Solanaceae) leaves revealed that tobacco leaf cells must have the enzymes necessary to produce lyciumins.









TABLE 1

Genome mining of BURP domain proteins in plant genomes and predicted lyciumin core peptides. Asterisks indicate transcriptome-derived lyciumin core peptide sequences.

| Organism (Genome version) | Family | Database | Predicted BURP domain # | Predicted core peptides |
|---|---|---|---|---|
| Amaranthus hypochondriacus (v1.0) | Amaranthaceae | JGI Phytozome 12.1 | 19 | QPYTVGSW (SEQ ID NO: 41), QPYTVFSW (SEQ ID NO: 42) |
| Amborella trichopoda (v1.0) | Amborellaceae | JGI Phytozome 12.1 | 34 | |
| Anacardium occidentale (v0.9) | Anacardiaceae | JGI Phytozome pre-release species | 13 | |
| Ananas comosus (v3) | Bromeliaceae | JGI Phytozome 12.1 | 7 | |
| Aquilegia coerulea (v3.1) | Ranunculaceae | JGI Phytozome 12.1 | 5 | |
| Arabidopsis halleri (v1.1) | Brassicaceae | JGI Phytozome 12.1 | 5 | |
| Arabidopsis lyrata (v2.1) | Brassicaceae | JGI Phytozome 12.1 | 7 | |
| Arabidopsis thaliana (TAIR10) | Brassicaceae | JGI Phytozome 12.1 | 5 | |
| Arachis duranensis (Aradu1.1) | Fabaceae | NCBI (GenBank JQIN00000000.1) | 18 | QPYGVYTW (SEQ ID NO: 43) |
| Arachis ipaensis (Araip1.1) | Fabaceae | NCBI (GenBank JQIO00000000.1) | 17 | QPYGVYTW (SEQ ID NO: 43) |
| Aegilops tauschii (Aet_MR_1.0) | Poaceae | NCBI (GenBank MCGU00000000.1) | 24 | |
| Asparagus officinalis (V1.1) | Asparagaceae | JGI Phytozome pre-release species | 7 | |
| Beta vulgaris (RefBeet-1.2.2) | Amaranthaceae | NCBI (GenBank AYZS00000000.2) | 18 | QPWTVYGW (SEQ ID NO: 44), QPWTVAGW (SEQ ID NO: 45), QPFTISAW (SEQ ID NO: 46), QPWTVAAW (SEQ ID NO: 47) |
| Boechera stricta (v1.2) | Brassicaceae | JGI Phytozome 12.1 | 5 | |
| Brachypodium distachyon (v3.1) | Poaceae | JGI Phytozome 12.1 | 14 | |
| Brachypodium stacei (v1.1) | Poaceae | JGI Phytozome 12.1 | 16 | |
| Brachypodium sylvaticum (v1.1) | Poaceae | JGI Phytozome pre-release species | 19 | |
| Brassica oleracea capitata (v1.0) | Brassicaceae | JGI Phytozome 12.1 | 7 | |
| Brassica rapa FPsc (v1.3) | Brassicaceae | JGI Phytozome 12.1 | 9 | |
| Cajanus cajan (C.cajan_V1.0) | Fabaceae | NCBI (GenBank AGCT00000000.1) | 13 | |
| Camelina sativa (Cs) | Brassicaceae | NCBI (GenBank JFZQ00000000.1) | 17 | |
| Capsella grandiflora (v1.1) | Brassicaceae | JGI Phytozome 12.1 | 5 | |
| Capsella rubella (v1.0) | Brassicaceae | JGI Phytozome 12.1 | 5 | |
| Capsicum annuum (Zunla 1 Ref_v1.0) | Solanaceae | NCBI (GenBank ASJU00000000.1) | 23 | QPYGGLTW (SEQ ID NO: 48), QPWGVCLW (SEQ ID NO: 49), QPWGVGSW (SEQ ID NO: 50), QPWGVGFW (SEQ ID NO: 51) |
| Capsicum baccatum (ASM227188v2) | Solanaceae | NCBI (GenBank MLFT00000000.2) | 21 | |
| Capsicum chinense (ASM227189v2) | Solanaceae | NCBI (GenBank MCIT00000000.2) | 21 | QPWGVCFW (SEQ ID NO: 52), QPWGVGSW (SEQ ID NO: 50), QPWGVGFW (SEQ ID NO: 51) |
| Carica papaya (ASGPBv0.4) | Caricaceae | JGI Phytozome 12.1 | 11 | |
| Chenopodium quinoa (v1.0) | Amaranthaceae | JGI Phytozome pre-release species | 42 | QPFTVVGW (SEQ ID NO: 53), QPYTVMAW (SEQ ID NO: 54), QPYTVWGW (SEQ ID NO: 55), QPYTVMGW (SEQ ID NO: 56), QPYTVYGW (SEQ ID NO: 57), QPFTVFGW (SEQ ID NO: 58), QPYTVDGW (SEQ ID NO: 59) |
| Cicer arietinum (v1.0) | Fabaceae | JGI Phytozome pre-release species | 15 | |
| Citrus clementina (v1.0) | Rutaceae | JGI Phytozome 12.1 | 11 | |
| Citrus sinensis (v1.1) | Rutaceae | JGI Phytozome 12.1 | 9 | |
| Coffea arabica (UCDv0.5) | Rubiaceae | JGI Phytozome pre-release species | 61 | |
| Cucumis melo (ASM31304v1) | Cucurbitaceae | NCBI (GenBank CAJI00000000.1) | 7 | |
| Cucumis sativus (v1.0) | Cucurbitaceae | JGI Phytozome 12.1 | 7 | |
| Cucurbita moschata (Cmos_1.0) | Cucurbitaceae | NCBI (GenBank NEWM00000000.1) | 10 | |
| Daucus carota (v2.0) | Apiaceae | JGI Phytozome 12.1 | 14 | |
| Dichanthelium oligosanthes (ASM163321v2) | Poaceae | NCBI (GenBank LWDX00000000.2) | 11 | |
| Durio zibethinus (Duzib1.0) | Malvaceae | NCBI (GenBank NSDW00000000.1) | 20 | |
| Elaeis guineensis (EG5) | Arecaceae | NCBI (GenBank ASJS00000000.1) | 21 | |
| Erythranthe guttata (Mimgu1_0) | Phrymaceae | NCBI (GenBank APLE00000000.1) | 6 | |
| Eucalyptus grandis (v2.0) | Myrtaceae | JGI Phytozome 12.1 | 14 | |
| Eutrema salsugineum (v1.0) | Brassicaceae | JGI Phytozome 12.1 | 5 | |
| Fragaria vesca (v1.1) | Rosaceae | JGI Phytozome 12.1 | 10 | |
| Glycine max (Wm82.a2.v1) | Fabaceae | JGI Phytozome 12.1 | 26 | QPFTVFAW (SEQ ID NO: 60), QPWGVGTW (SEQ ID NO: 61), QPYGVYTW (SEQ ID NO: 43) |
| Gossypium hirsutum (v1.1) | Malvaceae | JGI Phytozome pre-release species | 27 | |
| Gossypium raimondii (v2.1) | Malvaceae | JGI Phytozome 12.1 | 18 | |
| Hevea brasiliensis (ASM165405v1) | Euphorbiaceae | NCBI (GenBank LVXX00000000.1) | 19 | |
| Helianthus annuus (r1.2) | Asteraceae | JGI Phytozome pre-release species | 17 | |
| Hordeum vulgare (r1) | Poaceae | JGI Phytozome pre-release species | 15 | |
| Ipomoea nil (Asagao_1.1) | Convolvulaceae | NCBI (GenBank BDFN00000000.1) | 12 | |
| Jatropha curcas (JatCur_1.0) | Euphorbiaceae | NCBI (GenBank AFEW00000000.1) | 32 | |
| Juglans regia (wgs.5d) | Juglandaceae | NCBI (GenBank LIHL00000000.1) | 13 | |
| Kalanchoe fedtschenkoi (v1.1) | Crassulaceae | JGI Phytozome 12.1 | 32 | |
| Kalanchoe laxiflora (v1.1) | Crassulaceae | JGI Phytozome 12.1 | 31 | |
| Lactuca sativa (V8) | Asteraceae | JGI Phytozome pre-release species | 12 | |
| Linum usitatissimum (v1.0) | Linaceae | JGI Phytozome 12.1 | 18 | |
| Lupinus angustifolius (LupAngTanjil_v1.0) | Fabaceae | NCBI (GenBank MLAU00000000.1) | 13 | |
| Malus domestica (v1.0) | Rosaceae | JGI Phytozome 12.1 | 28 | |
| Manihot esculenta (v6.1) | Euphorbiaceae | JGI Phytozome 12.1 | 31 | |
| Marchantia polymorpha (v3.1) | Marchantiaceae | JGI Phytozome 12.1 | 1 | |
| Medicago truncatula (Mt4.0v1) | Fabaceae | JGI Phytozome 12.1 | 52 | QPLLFIYW (SEQ ID NO: 62), QPYGVYFW (SEQ ID NO: 63), QPYGVYTW (SEQ ID NO: 43), QPLTTRMW (SEQ ID NO: 64), QPLITRMW (SEQ ID NO: 146), QPLTTSMW (SEQ ID NO: 65), QPITTHMW (SEQ ID NO: 66), QPFGINIW (SEQ ID NO: 67), QPFGVLTW (SEQ ID NO: 68), QPFGFFSW (SEQ ID NO: 69), QPLPAHKW (SEQ ID NO: 70), QPFRTIGW (SEQ ID NO: 71), QPLGAVKW (SEQ ID NO: 72), QPFGSLTW (SEQ ID NO: 73), QPFGVAAW (SEQ ID NO: 74), QPFGFRAW (SEQ ID NO: 75), QPFEAHTW (SEQ ID NO: 76) |
| Mimulus guttatus (v2.0) | Phrymaceae | JGI Phytozome 12.1 | 8 | |
| Miscanthus sinensis (v7.1) | Poaceae | JGI Phytozome pre-release species | 23 | |
| Morus notabilis (ASM41409v2) | Moraceae | NCBI (GenBank ATGF00000000.1) | 10 | |
| Musa acuminata (v1) | Musaceae | JGI Phytozome 12.1 | 7 | |
| Nelumbo nucifera (Chinese Lotus 1.1) | Nelumbonaceae | NCBI (GenBank AQOG00000000.1) | 8 | |
| Nicotiana attenuata (v2) | Solanaceae | NCBI (GenBank MJEQ00000000.1) | 14 | QPWGVYSW (SEQ ID NO: 77) |
| Nicotiana benthamiana (v1.0.1) | Solanaceae | Sol Genomics Network | 12 | |
| Nicotiana sylvestris (Nsyl) | Solanaceae | NCBI (GenBank ASAF00000000.1) | 10 | |
| Nicotiana tabacum (v1.0 Edwards 2017) | Solanaceae | Sol Genomics Network | 15 | |
| Nicotiana tomentosiformis (Ntom_v01) | Solanaceae | NCBI (GenBank ASAG00000000.1) | 10 | |
| Olea europaea var. sylvestris (v1.0) | Oleaceae | JGI Phytozome pre-release species | 16 | |
| Oropetium thomaeum (v1.0) | Poaceae | JGI Phytozome 12.1 | 4 | |
| Oryza sativa (v7_JGI) | Poaceae | JGI Phytozome 12.1 | 18 | |
| Panicum hallii (v2.0) | Poaceae | JGI Phytozome 12.1 | 13 | |
| Panicum virgatum (v1.1) | Poaceae | JGI Phytozome 12.1 | 36 | |
| Petunia axillaris (v1.6.2) | Solanaceae | Sol Genomics Network | 13 | QPYGVFAW (SEQ ID NO: 78), QPFGVFAW (SEQ ID NO: 79) |
| Petunia inflata (v1.0.1) | Solanaceae | Sol Genomics Network | 14 | QPYGPFGW (SEQ ID NO: 80), QPFGDYVW (SEQ ID NO: 81), QPYGVFGW (SEQ ID NO: 82), QPFGVFGW (SEQ ID NO: 83), QPFGVFVW (SEQ ID NO: 84) |
| Phalaenopsis equestris (ASM126359v1) | Orchidaceae | NCBI (GenBank APLD00000000.1) | 7 | |
| Phaseolus vulgaris (v2.1) | Fabaceae | JGI Phytozome 12.1 | 11 | |
| Phoenix dactylifera (DPV01) | Arecaceae | NCBI (GenBank ATBV00000000.1) | 11 | |
| Physcomitrella patens (v3.3) | Funariaceae | JGI Phytozome 12.1 | 9 | |
| Populus deltoides (WV94 v2.1) | Salicaceae | JGI Phytozome pre-release species | 21 | |
| Populus euphratica (PopEup_1.0) | Salicaceae | NCBI (GenBank AOFL00000000.1) | 14 | |
| Populus trichocarpa (v3.0) | Salicaceae | JGI Phytozome 12.1 | 20 | |
| Prunus avium (PAV_r1.0) | Rosaceae | NCBI (GenBank BDGV00000000.1) | 19 | QPAPQLYW (SEQ ID NO: 85) |
| Prunus persica (v2.1) | Rosaceae | JGI Phytozome 12.1 | 29 | QPAAQLYW (SEQ ID NO: 86), QPAPQLYW (SEQ ID NO: 85) |
| Pyrus × bretschneideri (Pbr_v1.0) | Rosaceae | NCBI (GenBank AJSU00000000.1) | 21 | |
| Raphanus sativus (Rs1.0) | Brassicaceae | NCBI (GenBank JRUI00000000.2) | 11 | |
| Ricinus communis (v0.1) | Euphorbiaceae | JGI Phytozome 12.1 | 24 | |
| Salix purpurea (v1.0) | Salicaceae | JGI Phytozome 12.1 | 12 | |
| Selaginella moellendorffii (v1.0) | Selaginellaceae | JGI Phytozome 12.1 | 7 | |
| Sesamum indicum (v1.0) | Pedaliaceae | NCBI (GenBank APMJ00000000.1) | 9 | |
| Setaria italica (v2.2) | Poaceae | JGI Phytozome 12.1 | 15 | |
| Setaria viridis (v1.1) | Poaceae | JGI Phytozome 12.1 | 15 | |
| Solanum lycopersicum (iTAG2.4) | Solanaceae | JGI Phytozome 12.1 | 14 | QPWGVGAW (SEQ ID NO: 87), QPWGVYRW (SEQ ID NO: 88), QPYGVYRW (SEQ ID NO: 89), QPYGVYSW (SEQ ID NO: 90), QPWGVGSW (SEQ ID NO: 50) |
| Solanum melongena (v2.5.1) | Solanaceae | Sol Genomics Network | 10 | QPWGVNSW (SEQ ID NO: 91), QPWGVLRW (SEQ ID NO: 92), QPWGVGSW (SEQ ID NO: 50), QPWGVLGW (SEQ ID NO: 93), QPYGVYTW (SEQ ID NO: 43) |
| Solanum pennellii | Solanaceae | Sol Genomics Network | 13 | QPWGVGAW (SEQ ID NO: 87), QPFGVYRW (SEQ ID NO: 94), QPWGVFRW (SEQ ID NO: 95), QPWGVGSW (SEQ ID NO: 50) |
| Solanum pimpinellifolium (LA1589) | Solanaceae | Sol Genomics Network | 13 | QPWGVGAW (SEQ ID NO: 87), QPWGVYRW (SEQ ID NO: 88), QPYGVYRW (SEQ ID NO: 89), QPYGVYSW (SEQ ID NO: 90), QPWGVGSW (SEQ ID NO: 50) |
| Solanum tuberosum (v4.03) | Solanaceae | JGI Phytozome 12.1 | 20 | QPWGVDSW (SEQ ID NO: 97), QPYGVGVW (SEQ ID NO: 98), QPFGVGRW* (SEQ ID NO: 99), QPWGVGRW* (SEQ ID NO: 100), QPFGVVAW* (SEQ ID NO: 101), QPYGVLAW* (SEQ ID NO: 102), QPYGVSRW* (SEQ ID NO: 103), QPWGVVAW* (SEQ ID NO: 104), QPYGVFRW* (SEQ ID NO: 105), QPYGVFAW* (SEQ ID NO: 78), QPYGVDGW* (SEQ ID NO: 107), QPYGVYRW* (SEQ ID NO: 89), QPWGVGAW* (SEQ ID NO: 87), QPYGVFGW* (SEQ ID NO: 82), QPFGVFGW* (SEQ ID NO: 83), QPYGVFAW* (SEQ ID NO: 78), QPWGVGSW* (SEQ ID NO: 50) |
| Sorghum bicolor (v3.1.1) | Poaceae | JGI Phytozome 12.1 | 11 | |
| Sphagnum fallax (v0.5) | Sphagnaceae | JGI Phytozome 12.1 | 8 | |
| Spinacia oleracea (ASM200726v1) | Amaranthaceae | NCBI (GenBank LZYP00000000.1) | 16 | |
| Spirodela polyrhiza (v2) | Araceae | JGI Phytozome 12.1 | 11 | |
| Tarenaya hassleriana (ASM46358v1) | Cleomaceae | NCBI (GenBank AOUI00000000.1) | 8 | |
| Theobroma cacao (v1.1) | Malvaceae | JGI Phytozome 12.1 | 14 | |
| Trifolium pratense (v2) | Fabaceae | JGI Phytozome 12.1 | 36 | QPLGTWIW (SEQ ID NO: 108), QPFGIAAW (SEQ ID NO: 109), QPSGVYIW (SEQ ID NO: 110), QPFGINIW (SEQ ID NO: 67), QPYGVYTW (SEQ ID NO: 43) |
| Triticum aestivum (v2.2) | Poaceae | JGI Phytozome pre-release species | 33 | |
| Vicia faba (VfEP_Reference-Unigene) | Fabaceae | NCBI (GenBank CSVX00000000.1) | 5 | |
| Vigna angularis (Vigan1.1) | Fabaceae | NCBI (GenBank JZJH00000000.1) | 13 | |
| Vigna radiata (release 101) | Fabaceae | NCBI (GenBank JJMO00000000.1) | 24 | |
| Vigna unguiculata (v1.1) | Fabaceae | JGI Phytozome pre-release species | 19 | QPATLLAW (SEQ ID NO: 111) |
| Vitis vinifera (Genoscope.12X) | Vitaceae | JGI Phytozome 12.1 | 7 | |
| Zea mays PH207 (v1.1) | Poaceae | JGI Phytozome 12.1 | 10 | |
| Ziziphus jujuba (ZizJuj_1.1) | Rhamnaceae | NCBI (GenBank JREP00000000.1) | 20 | |
| Zostera marina (v2.2) | Zosteraceae | JGI Phytozome 12.1 | 9 | |









Genome Mining Reveals Hidden Chemical Diversity of Lyciumin Plant Ribosomal Peptides

With a precursor gene for lyciumin biosynthesis in hand, lyciumin genotypes and chemotypes in genome-sequenced plants were identified. The precursor gene-guided genome mining approach (FIG. 2A) started with a tblastn search of plant genomes for homologs of lyciumin precursor LbaLycA and, generally, BURP domain proteins (Pfam 03181). A candidate lyciumin precursor was identified by one or multiple candidate core peptide sequences of the motif QP(X)5W in its N-terminal half, i.e. the non-BURP domain. If a putative lyciumin precursor was identified from a plant genome, corresponding lyciumin structures were predicted based on its core peptide sequences. For this prediction, the N-terminal glutamine was transformed into a pyroglutamate and the C-terminal tryptophan-indole-nitrogen was linked to the α-carbon of the amino acid at the fourth position. Subsequently, these predicted lyciumin chemotypes were searched for in the LC-MS-based metabolomics dataset of peptide extracts prepared from the target plant host, by querying both peptide parent masses in MS data and predicted peptide fragment masses in MS/MS data (e.g. the pyroglutamate-proline b-ion ([M+H]+, 209.09207 m/z) or amino acid iminium ions). Finally, MS/MS analysis of lyciumins enables the characterization of a planar structure (FIGS. 7A-C) and, thus, could verify a connection of a candidate lyciumin mass spectrum with a lyciumin genotype as a successful plant genome mining experiment.
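The motif search and mass prediction steps of this workflow can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy precursor are hypothetical, the residue masses are standard monoisotopic values, pyroglutamate formation is modeled as loss of NH3, and the tryptophan N to α-carbon linkage is modeled as a net loss of two hydrogen atoms (an inference from the structural description above, not a value stated herein).

```python
import re

# QP(X)5W motif; the lookahead allows overlapping candidates to be reported.
CORE_MOTIF = re.compile(r"(?=(QP[A-Z]{5}W))")

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER, AMMONIA, HYDROGEN, PROTON = 18.010565, 17.026549, 1.007825, 1.007276


def find_core_peptides(precursor, burp_start=None):
    """List (position, sequence) of QP(X)5W motifs in the non-BURP region.

    If the BURP domain start is unknown, the N-terminal half is searched.
    """
    end = burp_start if burp_start is not None else len(precursor) // 2
    region = precursor[:end].upper()
    return [(m.start(1), m.group(1)) for m in CORE_MOTIF.finditer(region)]


def predicted_lyciumin_mh(core):
    """Predicted [M+H]+ of the cyclic peptide derived from a core peptide."""
    linear = sum(RESIDUE_MASS[aa] for aa in core) + WATER
    # Assumed modifications: pyroglutamate (-NH3) and Trp N-C link (-2 H).
    return linear - AMMONIA - 2 * HYDROGEN + PROTON


def pyroglu_pro_b_ion():
    """m/z of the pyroglutamate-proline b-ion used as an MS/MS query."""
    return RESIDUE_MASS["Q"] - AMMONIA + RESIDUE_MASS["P"] + PROTON


# Hypothetical toy precursor fragment, not a sequence from the listing.
toy = "MASLLQPYGVGSWDDKEQPWGVGSWNNE"
for position, core in find_core_peptides(toy, burp_start=len(toy)):
    print(position, core, round(predicted_lyciumin_mh(core), 4))
print(round(pyroglu_pro_b_ion(), 5))
```

The b-ion value computed this way agrees with the 209.09207 m/z query mass given above, which serves as a consistency check on the assumed pyroglutamate mass shift.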


Genome mining of LbaLycA homologs revealed that 21 of 116 analyzed plant genomes have a candidate lyciumin precursor peptide gene (Table 1). The putative lyciumin-producing plants belong to the Amaranthaceae, Fabaceae, Rosaceae or Solanaceae. Bioinformatic analysis of identified BURP domain proteins yielded 71 distinct core peptide sequences with 60 of them being species-specific (Table 2), indicating a large untapped diversity of lyciumin chemotypes, and several core peptide sequences, which are present in multiple species and families (QPYGVYTW (SEQ ID NO: 43) and QPWGVGSW (SEQ ID NO: 50)), indicating functional selection of their products. Subsequently, ten plants with candidate lyciumin genotypes were selected and their organic extracts were analyzed for predicted lyciumin chemotypes by LC-MS. For seven of those plants, predicted lyciumin analytes could be detected and verified by MS and MS/MS analysis, including economically important crop and forage plants such as Amaranthus hypochondriacus (amaranth), Beta vulgaris (beet), Chenopodium quinoa (quinoa), Glycine max (soy), Solanum melongena (eggplant), and Medicago truncatula (FIGS. 2B, 12A-B, 13A-B, 14A-B, 15A-D, 16A-D, and 17A-B). Identities were verified by LC-MS, MS and MS/MS. No lyciumin peptides could be detected in peptide extracts of Solanum lycopersicum Heinz 1702 (tomato), Capsicum annuum (pepper) and Trifolium pratense (red clover). Characterized lyciumin precursor genes are differentially expressed in plant tissues and developmental stages, with generally the highest expression in roots and embryo-developing tissues (FIGS. 8A-E). Accordingly, characterized lyciumin concentrations are generally the highest in roots and seeds, while some lyciumins are detected in the whole plant, such as in soy, quinoa and amaranth (FIGS. 9A-D).
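The single- versus multi-species classification of core peptides can be sketched as a simple grouping step. The function name is hypothetical, and the input below is only an excerpt of the Capsicum entries listed in Table 2, not the full dataset.

```python
from collections import defaultdict


def classify_core_peptides(cores_by_species):
    """Map each core peptide to (label, sorted list of species)."""
    species_by_core = defaultdict(set)
    for species, cores in cores_by_species.items():
        for core in cores:
            species_by_core[core].add(species)
    return {
        core: (
            "Multi-species" if len(species) > 1 else "Single-species",
            sorted(species),
        )
        for core, species in species_by_core.items()
    }


# Excerpt of Table 2 (Capsicum entries only), used purely as an illustration.
excerpt = {
    "Capsicum annuum": ["QPYGGLTW", "QPWGVCLW", "QPWGVGSW", "QPWGVGFW"],
    "Capsicum chinense": ["QPWGVGSW", "QPWGVGFW", "QPWGVCFW"],
}
for core, (label, species) in sorted(classify_core_peptides(excerpt).items()):
    print(core, label, ", ".join(species))
```

Counting the entries labeled "Single-species" over the complete table would give the number of species-specific core peptides.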


For Solanum tuberosum (potato), several lyciumin peptides could be characterized by LC-MS analysis. However, none of the detected peptides matched the predicted core peptide sequences from the genome-derived lyciumin precursor. Analysis of the corresponding genome location showed that the 5′-region of the lyciumin precursor gene PGSC0003DMG400047074 was incomplete (FIGS. 18A-D and 19A-O). Therefore, a de novo transcriptome was assembled for a Russet potato tuber (NCBI SRA: SRR5970148) in order to recover the missing core peptide sequences of the predicted lyciumin precursor. A BLAST search in the transcriptome using LbaLycA as query yielded eleven additional core peptide sequences from candidate lyciumin precursor transcripts, including all core peptide sequences that matched detected potato lyciumin peptides (FIGS. 18A-D and 19A-O; Tables 1 and 2). To verify this result, one precursor peptide gene of a detected potato lyciumin, StuBURP, was cloned and transiently expressed in N. benthamiana leaves, which resulted in the detection of its predicted product, lyciumin J (StuBURP, FIGS. 18A-D and 19A-O). LC-MS, MS, and MS/MS were performed. The characterization of potato lyciumins indicates that peptide genome mining can be complicated by sequencing gaps in early or draft genomes and that de novo transcriptomics can complement plant RiPP genome mining. A limitation of de novo transcriptome assembly for lyciumin precursor identification is that assembly programs such as Trinity [34] and rnaSPAdes [35] have problems with complete assembly of short repeats such as the N-terminal domain of LbaLycA (FIGS. 6 and 19A-O).









TABLE 2

Examples of lyciumin core peptides; asterisks indicate detected peptides.

| Core peptide (SEQ ID NO:) | Chemotype | Single- vs. Multi-Species | Organism(s) |
|---|---|---|---|
| QPYTVGSW* (SEQ ID NO: 41) | Lyciumin A | Single-species | Amaranthus hypochondriacus |
| QPYTVFSW* (SEQ ID NO: 42) | Lyciumin C | Single-species | Amaranthus hypochondriacus |
| QPYGVYTW* (SEQ ID NO: 43) | Lyciumin I | Multi-species | Arachis duranensis, Arachis ipaensis, Glycine max, Medicago truncatula, Solanum melongena, Trifolium pratense |
| QPWTVYGW (SEQ ID NO: 44) | | Single-species | Beta vulgaris |
| QPWTVAGW (SEQ ID NO: 45) | | Single-species | Beta vulgaris |
| QPFTISAW (SEQ ID NO: 46) | | Single-species | Beta vulgaris |
| QPWTVAAW* (SEQ ID NO: 47) | Lyciumin E | Single-species | Beta vulgaris |
| QPYGGLTW (SEQ ID NO: 48) | | Single-species | Capsicum annuum |
| QPWGVCLW (SEQ ID NO: 49) | | Single-species | Capsicum annuum |
| QPWGVGSW* (SEQ ID NO: 50) | Lyciumin B | Multi-species | Capsicum annuum, Capsicum chinense, Solanum lycopersicum, Solanum melongena, Solanum pennellii, Solanum pimpinellifolium, Solanum tuberosum |
| QPWGVGFW (SEQ ID NO: 51) | | Multi-species | Capsicum annuum, Capsicum chinense |
| QPWGVCFW (SEQ ID NO: 52) | | Single-species | Capsicum chinense |
| QPFTVVGW* (SEQ ID NO: 53) | Lyciumin G | Single-species | Chenopodium quinoa |
| QPYTVMAW (SEQ ID NO: 54) | | Single-species | Chenopodium quinoa |
| QPYTVWGW* (SEQ ID NO: 55) | Lyciumin F | Single-species | Chenopodium quinoa |
| QPYTVMGW (SEQ ID NO: 56) | | Single-species | Chenopodium quinoa |
| QPYTVYGW (SEQ ID NO: 57) | | Single-species | Chenopodium quinoa |
| QPFTVFGW (SEQ ID NO: 58) | | Single-species | Chenopodium quinoa |
| QPYTVDGW (SEQ ID NO: 59) | | Single-species | Chenopodium quinoa |
| QPFTVFAW (SEQ ID NO: 60) | | Single-species | Glycine max |
| QPWGVGTW* (SEQ ID NO: 61) | Lyciumin H | Single-species | Glycine max |
| QPLLFIYW (SEQ ID NO: 62) | | Single-species | Medicago truncatula |
| QPYGVYFW (SEQ ID NO: 63) | | Single-species | Medicago truncatula |
| QPLTTRMW (SEQ ID NO: 64) | | Single-species | Medicago truncatula |
| QPLTTSMW (SEQ ID NO: 65) | | Single-species | Medicago truncatula |
| QPITTHMW (SEQ ID NO: 66) | | Single-species | Medicago truncatula |
| QPFGINIW (SEQ ID NO: 67) | | Multi-species | Medicago truncatula, Trifolium pratense |
| QPFGVLTW (SEQ ID NO: 68) | | Single-species | Medicago truncatula |
| QPFGFFSW (SEQ ID NO: 69) | | Single-species | Medicago truncatula |
| QPLPAHKW (SEQ ID NO: 70) | | Single-species | Medicago truncatula |
| QPFRTIGW (SEQ ID NO: 71) | | Single-species | Medicago truncatula |
| QPLGAVKW (SEQ ID NO: 72) | | Single-species | Medicago truncatula |
| QPFGSLTW (SEQ ID NO: 73) | | Single-species | Medicago truncatula |
| QPFGVAAW (SEQ ID NO: 74) | | Single-species | Medicago truncatula |
| QPFGFRAW (SEQ ID NO: 75) | | Single-species | Medicago truncatula |
| QPFEAHTW (SEQ ID NO: 76) | | Single-species | Medicago truncatula |
| QPWGVYSW (SEQ ID NO: 77) | | Single-species | Nicotiana attenuata |
| QPYGVFAW* (SEQ ID NO: 78) | Lyciumin J | Multi-species | Petunia axillaris, Solanum tuberosum |
| QPFGVFAW (SEQ ID NO: 79) | | Single-species | Petunia axillaris |
| QPYGPFGW (SEQ ID NO: 80) | | Single-species | Petunia inflata |
| QPFGDYVW (SEQ ID NO: 81) | | Single-species | Petunia inflata |
| QPYGVFGW (SEQ ID NO: 82) | | Multi-species | Petunia inflata, Solanum tuberosum |
| QPFGVFGW (SEQ ID NO: 83) | | Multi-species | Petunia inflata, Solanum tuberosum |
| QPFGVFVW (SEQ ID NO: 84) | | Single-species | Petunia inflata |
| QPAPQLYW (SEQ ID NO: 85) | | Multi-species | Prunus avium, Prunus persica |
| QPAAQLYW (SEQ ID NO: 86) | | Single-species | Prunus persica |
| QPWGVGAW* (SEQ ID NO: 87) | Lyciumin K | Multi-species | Solanum lycopersicum, Solanum pennellii, Solanum pimpinellifolium, Solanum tuberosum |
| QPWGVYRW (SEQ ID NO: 88) | | Single-species | Solanum lycopersicum |
| QPYGVYRW* (SEQ ID NO: 89) | Lyciumin M | Multi-species | Solanum lycopersicum, Solanum pimpinellifolium, Solanum tuberosum |
| QPYGVYSW (SEQ ID NO: 90) | | Multi-species | Solanum lycopersicum, Solanum pimpinellifolium |
| QPWGVNSW (SEQ ID NO: 91) | | Single-species | Solanum melongena |
| QPWGVLRW (SEQ ID NO: 92) | | Single-species | Solanum melongena |
| QPWGVLGW (SEQ ID NO: 93) | | Single-species | Solanum melongena |
| QPFGVYRW (SEQ ID NO: 94) | | Single-species | Solanum pennellii |
| QPWGVFRW (SEQ ID NO: 95) | | Single-species | Solanum pennellii |
| QPYGVYSW (SEQ ID NO: 90) | | Single-species | Solanum pimpinellifolium |
| QPWGVDSW (SEQ ID NO: 97) | | Single-species | Solanum tuberosum |
| QPYGVGVW (SEQ ID NO: 98) | | Single-species | Solanum tuberosum |
| QPFGVGRW (SEQ ID NO: 99) | | Single-species | Solanum tuberosum |
| QPWGVGRW* (SEQ ID NO: 100) | Lyciumin O | Single-species | Solanum tuberosum |
| QPFGVVAW (SEQ ID NO: 101) | | Single-species | Solanum tuberosum |
| QPYGVLAW (SEQ ID NO: 102) | | Single-species | Solanum tuberosum |
| QPYGVSRW* (SEQ ID NO: 103) | Lyciumin N | Single-species | Solanum tuberosum |
| QPWGVVAW* (SEQ ID NO: 104) | Lyciumin L | Single-species | Solanum tuberosum |
| QPYGVFRW (SEQ ID NO: 105) | | Single-species | Solanum tuberosum |
| QPYGVFAW (SEQ ID NO: 78) | | Single-species | Solanum tuberosum |
| QPYGVDGW (SEQ ID NO: 107) | | Single-species | Solanum tuberosum |
| QPLGTWIW (SEQ ID NO: 108) | | Single-species | Trifolium pratense |
| QPFGIAAW (SEQ ID NO: 109) | | Single-species | Trifolium pratense |
| QPSGVYIW (SEQ ID NO: 110) | | Single-species | Trifolium pratense |
| QPATLLAW (SEQ ID NO: 111) | | Single-species | Vigna unguiculata |

The Sequence Rules of Naturally Occurring Lyciumins

Next, the structural diversity of the observed lyciumins was analyzed (FIGS. 1B and 2B). In this example, all detected lyciumins have an aromatic amino acid (Phe, Tyr, Trp) at the third position and a valine at the fifth position. The sixth and seventh residues vary greatly, with ten different amino acids including aromatic, polar and charged residues. In this example, the cyclization site is a glycine at the fourth position of all detected lyciumins, except for the identified Amaranthaceae lyciumin precursor peptides, which contain a threonine at the fourth residue of the core peptides (FIG. 2B and Table 2). In organic extracts of investigated Amaranthaceae plants, several lyciumin derivatives were detected that show a mass shift at the fourth residue corresponding to C2H3O or a putative dehydrothreonine, supporting a biosynthetic route via a threonine cyclization site. Predicted and characterized lyciumin genotypes can be divided into two types based on primary structure. Type 1 lyciumin precursors have core peptides within the BURP domain (e.g. in Fabaceae), while type 2 lyciumin precursors contain core peptides N-terminal to the BURP domain (e.g. in Amaranthaceae and Solanaceae) (FIG. 10A).


Parallel Evolution of Lyciumin Biosynthesis in Angiosperms and Lycophytes

The discovery of lyciumin biosynthesis in three plant families via genome mining motivated a more detailed exploration of the distribution of lyciumin genotypes in the plant kingdom. To do this, plant transcriptomes were targeted as an alternative source for the discovery of lyciumins in plants, as the sequenced plant genomes analyzed represented only 42 of the estimated 667 plant families. Given the success in improving BURP domain precursor gene assembly for lyciumin discovery from the potato transcriptome, large-scale de novo re-assembly of transcriptomes of 793 plant species representing 317 plant families was performed using rnaSPAdes [35], starting from raw sequencing reads generated as part of the 1 kp project. Subsequently, tblastn searches of type 1 and type 2 lyciumin precursors in the assembled transcriptomes identified candidate lyciumin precursors in multiple plant families, including Aizoaceae, Molluginaceae, Nyctaginaceae, Petiveriaceae, Phytolaccaceae (all Caryophyllales) and Selaginellaceae (Table 3). Corresponding to these transcriptome predictions, a lyciumin chemotype was detected in the peptide extract of Selaginella uncinata roots, showcasing that lyciumins can also be discovered by mining transcriptomes. The putative Selaginella peptide lyciumin P is derived from the core peptide QPYSVFAW and features a serine as the putative cyclization site (FIGS. 20A-B), in contrast to the glycine and threonine residues found in other plants. It is noteworthy that Selaginellaceae represents lycophytes, which are basal vascular plants that diverged from the rest of the vascular plant lineages over 400 million years ago.


Several observations support independent diversification of lyciumin peptides in Selaginellaceae, Fabaceae, Solanaceae and Caryophyllales families, with a few cases of parallel evolution of identical lyciumin peptide natural products in distantly related plant families. First, the phylogenetic analysis of the BURP domains of the predicted and characterized lyciumin precursor proteins reveals five well-defined clades of sequences from Caryophyllales, Fabaceae, Rosaceae, Solanaceae, and Selaginellaceae (FIG. 10B), suggesting that the diversity of lyciumin biosynthesis arose after these plant families had split from each other. Second, both Fabaceae and Selaginellaceae contain type 1 lyciumin precursors, while Caryophyllales and Solanaceae contain type 2 lyciumin precursors (FIG. 10B, Table 3), implicating possible independent recruitments of non-peptide-producing ancestral BURP domain proteins for lyciumin biosynthesis in various plant families. Third, the predicted cyclization site at the fourth position of lyciumin core peptides is a threonine in most Caryophyllales precursors, a glycine in most Fabaceae and Solanaceae precursors, and a serine or a glycine in Selaginellaceae precursors (FIG. 10B), which suggests independent diversification and potentially independent occurrences of the lyciumin biosynthetic machineries in these plant families. Finally, core peptides of predicted and characterized lyciumin precursors are mostly unique to each of these clades (75% of core peptides are found in one plant family, FIG. 10C, Table 4). However, two lyciumin chemotypes (lyciumin A and J) are predicted to occur in three of the five clades, presenting exemplary cases of parallel evolution of identical metabolic traits from homologous ancestral states (FIG. 10C). The repeated occurrences of lyciumin A and J in distantly related plant lineages also imply important biological functions of these natural products for their plant hosts.
Given these results in the context of the phylogenetic relationships among the predicted and characterized lyciumin-producing plants, it is reasonable to conclude that homologous BURP domain protein progenitors likely gave rise to independent occurrences of lyciumin biosynthesis at least twice in lycophytes and angiosperms during land plant evolution, and that the diversity of lyciumin genotypes and chemotypes present in extant plants largely arose through independent divergent evolution within the host plant families (FIG. 10D).









TABLE 3

Transcriptome mining of lyciumin precursor genes in 1 kp database of plant transcriptomes (de novo rnaSPAdes re-assembly)

| Organism | NCBI SRA | Transcript product (Dataset S3) | Lyciumin precursor type | Predicted core peptides (QP(X)5W) | Predicted lyciumin chemotype (threonine at fourth position from predicted core peptide converted to glycine) |
|---|---|---|---|---|---|
| Selaginella willdenowii | ERR2040880 | SwiBURP | 1 | QPYSVFAW (SEQ ID NO: 147) | QPYSVFAW (SEQ ID NO: 147) |
| Selaginella bryopteris | SRR3136708 | SbrBURP | 1 | QPYGVGSW (SEQ ID NO: 148), QPYGVIRW (SEQ ID NO: 149) | QPYGVGSW (SEQ ID NO: 148), QPYGVIRW (SEQ ID NO: 149) |
| Selaginella moellendorffii | SRR7132766 | SmoBURP | 1 | QPYGVGAW (SEQ ID NO: 150) | QPYGVGAW (SEQ ID NO: 150) |
| Selaginella uncinata | SRR7132763 | SunBURP | 1 | QPYSVFAW (SEQ ID NO: 147) | QPYSVFAW (SEQ ID NO: 147) |
| Acacia argyrophylla | ERR2040348 | AarBURP | 1 | QPWGVGTW (SEQ ID NO: 61) | QPWGVGTW (SEQ ID NO: 61) |
| Acacia pycnantha | ERR2040344 | ApyBURP | 1 | QPWGVGTW (SEQ ID NO: 61) | QPWGVGTW (SEQ ID NO: 61) |
| Apios americana | ERR706837 | AamBURP | 1 | QPYGVYAW (SEQ ID NO: 151) | QPYGVYAW (SEQ ID NO: 151) |
| Astragalus membranaceus | ERR706814 | AmeBURP1 | 1 | QPFGARTW (SEQ ID NO: 152) | QPFGARTW (SEQ ID NO: 152) |
| | | AmeBURP2 | 1 | QPFGALVW (SEQ ID NO: 153), QPFGGFAW (SEQ ID NO: 154) | QPFGALVW (SEQ ID NO: 153), QPFGGFAW (SEQ ID NO: 154) |
| | | AmeBURP3 | 1 | QPFGFLIW (SEQ ID NO: 155) | QPFGFLIW (SEQ ID NO: 155) |
| Bituminaria bituminosa | ERR2040332 | BbiBURP | 1 | QPYGVLYW (SEQ ID NO: 156) | QPYGVLYW (SEQ ID NO: 156) |
| Glycine soja | ERR2040335 | GsoBURP1 | 1 | QPWGVGTW (SEQ ID NO: 61) | QPWGVGTW (SEQ ID NO: 61) |
| | | GsoBURP2 | 1 | QPYGVYTW (SEQ ID NO: 43) | QPYGVYTW (SEQ ID NO: 43) |
| Glycyrrhiza lepidota | ERR2040333 | GleBURP | 1 | QPYGVYTW (SEQ ID NO: 43) | QPYGVYTW (SEQ ID NO: 43) |
| Lathyrus sativus | ERR706828 | LsaBURP | 1 | QPFGINSW (SEQ ID NO: 157) | QPFGINSW (SEQ ID NO: 157) |
| Senna hebecarpa | ERR706829 | SheBURP1 | 1 | QPFGVFAW (SEQ ID NO: 79) | QPFGVFAW (SEQ ID NO: 79) |
| | ERR706829 | SheBURP2 | 1 | QPYGVFAW (SEQ ID NO: 78) | QPYGVFAW (SEQ ID NO: 78) |
| Xanthocercis zambesiaca | ERR706865 | XzaBURP | 1 | QPYGVYSW (SEQ ID NO: 90) | QPYGVYSW (SEQ ID NO: 90) |
| Delosperma echinatum | ERR2040192 | DecBURP1 | 2 | QPWTVSLW (SEQ ID NO: 158) | QPWGVSLW (SEQ ID NO: 159) |
| | | DecBURP2 | 2 | QPWTVSSW (SEQ ID NO: 160) | QPWGVSSW (SEQ ID NO: 161) |
| Alternanthera brasiliana | ERR2040215 | AbrBURP | 2 | QPYTVGAW (SEQ ID NO: 162) | QPYGVGAW (SEQ ID NO: 150) |
| Alternanthera sessilis | ERR2040219 | AseBURP | 2 | QPFTVGAW (SEQ ID NO: 163) | QPFGVGAW (SEQ ID NO: 164) |
| Amaranthus tricolor | ERR2040205 | AtrBURP1 | 2 | QPYTVGSW (SEQ ID NO: 41), QPFTVGSW (SEQ ID NO: 165) | QPYGVGSW (SEQ ID NO: 148), QPFGVGSW (SEQ ID NO: 166) |
| | | AtrBURP2 | 2 | QPYTVGSW (SEQ ID NO: 41), QPFTVGSW (SEQ ID NO: 165) | QPYGVGSW (SEQ ID NO: 148), QPFGVGSW (SEQ ID NO: 166) |
| Atriplex hortensis | ERR2040208 | AhoBURP | 2 | QPFTVGAW (SEQ ID NO: 163) | QPFGVGAW (SEQ ID NO: 164) |
| Atriplex prostrata | ERR2040210 | AprBURP1 | 2 | QPFTVGAW (SEQ ID NO: 163) | QPFGVGAW (SEQ ID NO: 164) |
| | | AprBURP2 | 2 | QPFTVGAW (SEQ ID NO: 163) | QPFGVGAW (SEQ ID NO: 164) |
| | | AprBURP3 | 2 | QPFTVGAW (SEQ ID NO: 163), QPFTFRAW (SEQ ID NO: 167) | QPFGVGAW (SEQ ID NO: 164), QPFGFRAW (SEQ ID NO: 75) |
| Chenopodium quinoa | ERR2040214 | CquBURP | 2 | QPFTVVGW (SEQ ID NO: 53), QPYTVMAW (SEQ ID NO: 54), QPYTVWGW (SEQ ID NO: 55), QPYTVMGW (SEQ ID NO: 56) | QPFGVVGW (SEQ ID NO: 168), QPYGVMAW (SEQ ID NO: 169), QPYGVWGW (SEQ ID NO: 170), QPYGVMGW (SEQ ID NO: 171) |
| Hypertelis cerviana | ERR2040235 | HceBURP | 2 | QPFTVLGW (SEQ ID NO: 179), QPFTVFGW (SEQ ID NO: 58) | QPFGVLGW (SEQ ID NO: 180), QPFGVFGW (SEQ ID NO: 83) |
| Bougainvillea spectabilis | ERR2040242 | BspBURP1 | 2 | QPFTVGSW (SEQ ID NO: 165), QPYTVGSW (SEQ ID NO: 41), QPYTVGAW (SEQ ID NO: 162) | QPFGVGSW (SEQ ID NO: 166), QPYGVGSW (SEQ ID NO: 148), QPYGVGAW (SEQ ID NO: 150) |
| | | BspBURP2 | 2 | (Q)PYTVGAW (SEQ ID NO: 162) | (Q)PYGVGAW (SEQ ID NO: 150) |
| | | BspBURP3 | 2 | QPYTVGGW (SEQ ID NO: 181), QPYTVGAW (SEQ ID NO: 162), QPCTVGAW (SEQ ID NO: 183) | QPYGVGGW (SEQ ID NO: 182), QPYGVGAW (SEQ ID NO: 150), QPCGVGAW (SEQ ID NO: 184) |
| Petiveria alliacea | ERR2040253 | PalBURP | 2 | QPYTVGAW (SEQ ID NO: 162) | QPYGVGAW (SEQ ID NO: 150) |
| Phytolacca bogotensis | ERR2040254 | PboBURP | 2 | QPYTVFAW (SEQ ID NO: 185), QPYTVFSW (SEQ ID NO: 42) | QPYGVFAW (SEQ ID NO: 78), QPYGVFSW (SEQ ID NO: 175) |
| Microtea debilis | ERR2040255 | MdeBURP | 2 | QPYTVFAW (SEQ ID NO: 185) | QPYGVFAW (SEQ ID NO: 78) |
| Hilleria latifolia | ERR2040256 | HlaBURP | 2 | QPYTVGSW (SEQ ID NO: 41), QPYIAILW (SEQ ID NO: 186) | QPYGVGSW (SEQ ID NO: 148), QPYIAILW (SEQ ID NO: 186) |
| Atropa belladonna | ERR2040625 | AbeBURP1 | 2 | QPYGVFSW (SEQ ID NO: 175), QPYGVGFW (SEQ ID NO: 176), QPYGVGSW (SEQ ID NO: 148) | QPYGVFSW (SEQ ID NO: 175), QPYGVGFW (SEQ ID NO: 176), QPYGVGSW (SEQ ID NO: 148) |
| | | AbeBURP2 | 2 | QPWEVFSW (SEQ ID NO: 177) | QPWEVFSW (SEQ ID NO: 177) |
| Lycium barbarum | ERR2040629 | LbaBURP1 | 2 | QPYGVGSW (SEQ ID NO: 148) | QPYGVGSW (SEQ ID NO: 148) |
| | | LbaBURP2 | 2 | QPYGVGSW (SEQ ID NO: 148), QPYGVFSW (SEQ ID NO: 175), QPWGVGSW (SEQ ID NO: 50) | QPYGVGSW (SEQ ID NO: 148), QPYGVFSW (SEQ ID NO: 175), QPWGVGSW (SEQ ID NO: 50), QPFGVGSW (SEQ ID NO: 166) |
| Solanum cheesmaniae | ERR2040630 | SchBURP1 | 2 | QPYGVYTW (SEQ ID NO: 43) | QPYGVYTW (SEQ ID NO: 43) |
| | | SchBURP2 | 2 | QPWGVGSW (SEQ ID NO: 50) | QPWGVGSW (SEQ ID NO: 50) |
| Solanum dulcamara | ERR2040627 | SduBURP | 2 | QPYGVSIW (SEQ ID NO: 173), QPWGVGSW (SEQ ID NO: 50), QPYGVFSW (SEQ ID NO: 175), QPYGVGIW (SEQ ID NO: 174) | QPYGVSIW (SEQ ID NO: 173), QPWGVGSW (SEQ ID NO: 50), QPYGVFSW (SEQ ID NO: 175), QPYGVGIW (SEQ ID NO: 174) |
| Solanum lasiophyllum | ERR2040632 | SlaBURP1 | 2 | QPWGVGAW (SEQ ID NO: 87), QPYGVYSW (SEQ ID NO: 90), QPWGVGSW (SEQ ID NO: 50) | QPWGVGAW (SEQ ID NO: 87), QPYGVYSW (SEQ ID NO: 90), QPWGVGSW (SEQ ID NO: 50) |
| | | SlaBURP2 | 2 | QPWGVYRW (SEQ ID NO: 88), QPYGVYRW (SEQ ID NO: 89), QPWGVGAW (SEQ ID NO: 87) | QPWGVYRW (SEQ ID NO: 88), QPYGVYRW (SEQ ID NO: 89), QPWGVGAW (SEQ ID NO: 87) |
| Solanum ptychanthum | ERR2040626 | SptBURP | 2 | QPYGVFAW (SEQ ID NO: 78) | QPYGVFAW (SEQ ID NO: 78) |
| Solanum sisymbriifolium | ERR2040632 | SsiBURP | 2 | QPYDAYSW (SEQ ID NO: 178) | QPYDAYSW (SEQ ID NO: 178) |
| Solanum virginianum | ERR2040628 | SviBURP1 | 2 | QPYGVYSW (SEQ ID NO: 90) | QPYGVYSW (SEQ ID NO: 90) |
| | | SviBURP2 | 2 | QPYGVYGW (SEQ ID NO: 172), QPYGVYVW (SEQ ID NO: 251) | QPYGVYGW (SEQ ID NO: 172), QPYGVYVW (SEQ ID NO: 251) |
















TABLE 4







Core peptide analysis of lyciumin chemotypes predicted


from genomes and transcriptomes.











Core






peptide
Chemotype
Organism(s)
Plant family
Plant order





QPYGVGSW*
Lyciumin

Amaranthus


Amaranthaceae,


Caryophyllales,



(SEQ ID NO:
A

hypochondriacus,


Phytolaccaceae,


Selaginellales,



148)


Amaranthus


Nyctaginaceae,


Solanales






tricolor, Atropa


Selaginellaceae,







belladonna,


Solanaceae







Bougainvillea








spectabilis, Hilleria








latifolia, Lycium








barbarum,








Selaginella








bryopteris








QPYGVFSW*
Lyciumin

Amaranthus


Amaranthaceae,


Caryophyllales,



(SEQ ID NO:
C

hypochondriacus,


Phytolaccaceae,


Solanales



175)


Phytolacca


Solanaceae







bogotensis, Atropa








belladonna, Lycium








barbarum,








Solanum








dulcamara,








QPYGVYTW*
Lyciumin

Arachis


Fabaceae,


Fabales,



(SEQ ID NO:
I

duranensis,


Solanaceae


Solanales



43)


Arachis ipaensis,








Glycine max,








Glycine soja,








Glycyrrhiza








lepidota, Medicago








truncatula,








Solanum








cheesmaniae,








Solanum








melongena,








Trifolium pratens








QPWGVYGW


Beta vulgaris


Amaranthaceae


Caryophyllales



(SEQ ID NO:






187)









QPWGVAGW


Beta vulgaris


Amaranthaceae


Caryophyllales



(SEQ ID NO:






188)









QPFGISAW


Beta vulgaris


Amaranthaceae


Caryophyllales



(SEQ ID NO:






189)









QPWGVAAW*
Lyciumin

Beta vulgaris


Amaranthaceae


Caryophyllales



(SEQ ID NO:
E





190)









QPYGGLTW


Capsicum annuum


Solanaceae


Solanales



(SEQ ID NO:






48)









QPWGVCLW


Capsicum annuum


Solanaceae


Solanales



(SEQ ID NO:






49)









QPWGVGSW*
Lyciumin

Capsicum annuum,


Solanaceae


Solanales



(SEQ ID NO:
B

Capsicum





50)


chinense, Lycium








barbarum,








Solanum








cheesmaniae,








Solanum








dulcamara,








Solanum








lasiophyllum,








Solanum








lycopersicum,








Solanum








melongena,








Solanum pennellii,








Solanum








pimpinellifolium,








Solanum








tuberosum








QPWGVGFW


Capsicum annuum,


Solanaceae


Solanales



(SEQ ID NO:


Capsicum chinense





51)









QPWGVCFW


Capsicum chinense


Solanaceae


Solanales



(SEQ ID NO:






52)









QPFGVVGW*
Lyciumin

Chenopodium


Amaranthaceae


Caryophyllales



(SEQ ID NO:
G

quinoa





168)









QPYGVMAW


Chenopodium


Amaranthaceae


Caryophyllales



(SEQ ID NO:


quinoa





169)









QPYGVWGW*
Lyciumin

Chenopodium


Amaranthaceae


Caryophyllales



(SEQ ID NO:
F

quinoa





170)









QPYGVMGW


Chenopodium


Amaranthaceae


Caryophyllales



(SEQ ID NO:


quinoa





171)









QPYGVYGW


Chenopodium


Amaranthaceae,


Caryophyllales,



(SEQ ID NO:


quinoa, Solanum


Solanaceae


Solanales



172)


virginianum








QPFGVFGW


Chenopodium


Amaranthaceae,


Caryophyllales,



(SEQ ID NO:


quinoa, Hypertelis


Molluginaceae,


Solanales



83)


cerviana, Petunia


Solanaceae







inflata, Solanum








tuberosum








QPYGVDGW


Chenopodium


Amaranthaceae,


Caryophyllales,



(SEQ ID NO:


quinoa, Solanum


Solanaceae


Solanales



107)


tuberosum








QPFGVFAW


Glycine max,


Fabaceae,


Fabales,



(SEQ ID NO:


Petunia axillaris,


Solanaceae


Solanales



79)


Senna hebecarpa








QPWGVGTW*
Lyciumin

Glycine max,


Fabaceae


Fabales



(SEQ ID NO:
H

Acacia





61)


argyrophylla,








Acacia pycnantha,








Glycine soja








QPLLFIYW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





62)









QPYGVYFW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





63)









QPLGTRMW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





191)









QPLGTSMW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





192)









QPIGTHMW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





193)









QPFGINIW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula,





67)


Trifolium pratense








QPFGVLTW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





68)









QPFGFFSW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





69)









QPLPAHKW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





70)









QPFRTIGW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





71)









QPLGAVKW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





72)









QPFGSLTW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





73)









QPFGVAAW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





74)









QPFGFRAW


Atriplex prostrata,


Amaranthaceae,


Caryophyllales,



(SEQ ID NO:


Medicago


Fabaceae


Fabales



75)


truncatula








QPFEAHTW


Medicago


Fabaceae


Fabales



(SEQ ID NO:


truncatula





76)









QPWGVYSW


Nicotiana


Solanaceae


Solanales



(SEQ ID NO:


attenuata





77)









QPYGVFAW*
Lyciumin

Microtea debilis,


Phytolaccaceae,


Caryophyllales,



(SEQ ID NO:
J

Petunia axillaris,


Fabaceae,


Fabales,



78)


Phytolacca


Solanaceae


Solanales






bogotensis, Senna








hebecarpa,








Solanum








ptychanthum,








Solanum








tuberosum








QPYGPFGW


Petunia inflata


Solanaceae


Solanales



(SEQ ID NO:






80)









QPFGDYVW


Petunia inflata


Solanaceae


Solanales



(SEQ ID NO:






81)









QPYGVFGW


Petunia inflata,


Solanaceae


Solanales



(SEQ ID NO:


Solanum





82)


tuberosum








QPFGVFVW


Petunia inflata


Solanaceae


Solanales



(SEQ ID NO:






84)









QPAPQLYW


Prunus avium,


Rosaceae


Rosales



(SEQ ID NO:


Prunus persica





85)









QPAAQLYW


Prunus persica


Rosaceae


Rosales



(SEQ ID NO:






86)









QPWGVGAW*
Lyciumin

Solanum


Solanaceae


Solanales



(SEQ ID NO:
K

lasiophyllum,





87)


Solanum








lycopersicum,








Solanum pennellii,








Solanum








pimpinellifolium,








Solanum








tuberosum








QPWGVYRW


Solanum


Solanaceae


Solanales



(SEQ ID NO:


lasiophyllum,





88)


Solanum








lycopersicum








QPYGVYRW*
Lyciumin

Solanum


Solanaceae


Solanales



(SEQ ID NO:
M

lasiophyllum,





89)


Solanum








lycopersicum,








Solanum








pimpinellifolium,








Solanum








tuberosum








QPYGVYSW


Solanum


Fabaceae,


Fabales,



(SEQ ID NO:


lasiophyllum,


Solanaceae


Solanales



90)


Solanum








lycopersicum,








Solanum








pimpinellifolium,








Solanum








virginianum,








Xanthocercis








zambesiaca








QPWGVNSW


Solanum


Solanaceae


Solanales



(SEQ ID NO:


melongena





91)









QPWGVSSW


Delosperma


Aizoaceae


Caryophyllales



(SEQ ID NO:


echinatum





161)









QPWGVLRW


Solanum


Solanaceae


Solanales



(SEQ ID NO:


melongena





92)









QPWGVLGW


Solanum


Solanaceae


Solanales



(SEQ ID NO:


melongena





93)









QPFGVYRW


Solanum pennellii


Solanaceae


Solanales



(SEQ ID NO:






94)









QPWGVFRW


Solanum pennellii


Solanaceae


Solanales



(SEQ ID NO:






95)









QPWGVDSW


Solanum


Solanaceae


Solanales



(SEQ ID NO:


tuberosum





97)









| QPYGVGVW (SEQ ID NO: 98) |  | Solanum tuberosum | Solanaceae | Solanales |
| QPFGVGRW (SEQ ID NO: 99) |  | Solanum tuberosum | Solanaceae | Solanales |
| QPWGVGRW* (SEQ ID NO: 100) | Lyciumin O | Solanum tuberosum | Solanaceae | Solanales |
| QPFGVVAW (SEQ ID NO: 101) |  | Solanum tuberosum | Solanaceae | Solanales |
| QPYGVLAW (SEQ ID NO: 102) |  | Solanum tuberosum | Solanaceae | Solanales |
| QPYGVSRW* (SEQ ID NO: 103) | Lyciumin N | Solanum tuberosum | Solanaceae | Solanales |
| QPWGVVAW* (SEQ ID NO: 104) | Lyciumin L | Solanum tuberosum | Solanaceae | Solanales |
| QPYGVFRW (SEQ ID NO: 105) |  | Solanum tuberosum | Solanaceae | Solanales |
| QPLGTWIW (SEQ ID NO: 108) |  | Trifolium pratense | Fabaceae | Fabales |
| QPFGIAAW (SEQ ID NO: 109) |  | Trifolium pratense | Fabaceae | Fabales |
| QPSGVYIW (SEQ ID NO: 110) |  | Trifolium pratense | Fabaceae | Fabales |
| QPAGLLAW (SEQ ID NO: 194) |  | Vigna unguiculata | Fabaceae | Fabales |
| QPYGVIRW (SEQ ID NO: 149) |  | Selaginella bryopteris | Selaginellaceae | Selaginellales |
| QPYGVGAW (SEQ ID NO: 150) |  | Alternanthera brasiliana, Bougainvillea spectabilis, Petiveria alliacea, Selaginella moellendorffii | Amaranthaceae, Nyctaginaceae, Petiveriaceae, Selaginellaceae | Caryophyllales, Selaginellales |
| QPYGVYAW (SEQ ID NO: 151) |  | Apios americana | Fabaceae | Fabales |
| QPFGARTW (SEQ ID NO: 152) |  | Astragalus membranaceus | Fabaceae | Fabales |
| QPFGALVW (SEQ ID NO: 153) |  | Astragalus membranaceus | Fabaceae | Fabales |
| QPFGGFAW (SEQ ID NO: 154) |  | Astragalus membranaceus | Fabaceae | Fabales |
| QPFGFLIW (SEQ ID NO: 155) |  | Astragalus membranaceus | Fabaceae | Fabales |
| QPFGFLIW (SEQ ID NO: 155) |  | Astragalus membranaceus | Fabaceae | Fabales |
| QPYGVLYW (SEQ ID NO: 156) |  | Bituminaria bituminosa | Fabaceae | Fabales |
| QPFGINSW (SEQ ID NO: 157) |  | Lathyrus sativus | Fabaceae | Fabales |
| QPWGVSLW (SEQ ID NO: 159) |  | Delosperma echinatum | Aizoaceae | Caryophyllales |
| QPFGVGAW (SEQ ID NO: 164) |  | Atriplex hortensis, Atriplex prostrata, Alternanthera sessilis | Amaranthaceae | Caryophyllales |
| QPFGVLGW (SEQ ID NO: 195) |  | Hypertelis cerviana | Molluginaceae | Caryophyllales |
| QPYGVGGW (SEQ ID NO: 182) |  | Bougainvillea spectabilis | Nyctaginaceae | Caryophyllales |
| QPCGVGAW (SEQ ID NO: 184) |  | Bougainvillea spectabilis | Nyctaginaceae | Caryophyllales |
| QPYGVGFW (SEQ ID NO: 176) |  | Atropa belladonna | Solanaceae | Solanales |
| QPWEVFSW (SEQ ID NO: 177) |  | Atropa belladonna | Solanaceae | Solanales |
| QPFGVGSW (SEQ ID NO: 166) |  | Amaranthus tricolor, Bougainvillea spectabilis, Lycium barbarum | Solanaceae | Solanales |
| QPYGVSIW (SEQ ID NO: 173) |  | Solanum dulcamara | Solanaceae | Solanales |
| QPYGVGIW (SEQ ID NO: 174) |  | Solanum dulcamara | Solanaceae | Solanales |
| QPYGVYGW (SEQ ID NO: 172) |  | Solanum virginianum | Solanaceae | Solanales |
| QPYSVFAW* (SEQ ID NO: 147) | Lyciumin P | Selaginella uncinata, Selaginella willdenowii | Selaginellaceae | Selaginellales |
| QPYDAYSW (SEQ ID NO: 178) |  | Solanum sisymbriifolium | Solanaceae | Solanales |





If the fourth amino acid is a threonine, it is converted to glycine based on sequence rules of naturally occurring lyciumins (FIG. 2B) from predicted and characterized lyciumin precursors from genome and transcriptome mining.


Asterisks denote characterized lyciumin chemotypes.






The Lyciumin Biosynthetic Pathway is Promiscuous in Substrate and Macrocyclization

A biosynthetic proposal for lyciumins was established from heterologous expression, sequences and genome locations of precursor peptide genes. Following the general dogma of RiPP biosynthesis, lyciumin biosynthesis starts with translation of a precursor peptide gene such as LbaLycA by the ribosome (FIG. 3A) [8]. The precursor peptide is then cyclized between the tryptophan and glycine in each core peptide, which is supported by the absence of detectable linear core peptides in the LbaLycA heterologous expression experiments in N. benthamiana and in L. barbarum root extracts. Cyclization of the tryptophan-indole nitrogen to an unactivated α-carbon suggests a radical-oxidative cyclization mechanism [36], although candidate lyciumin cyclases have yet to be identified. In the next step, the modified LbaLycA is cleaved by an endopeptidase N-terminal to the core peptide. This is supported by the detection of lyciumin derivatives with an N-terminal glutamine in leaf extracts of N. benthamiana transiently expressing LbaLycA, and in L. barbarum root extracts (FIG. 3B). Subsequently, core peptides are N-terminally protected by pyroglutamate formation, which can be catalyzed by a glutamine cyclotransferase (QC). Indeed, QC-encoding genes were identified next to the lyciumin precursor genes in the genomes of Chenopodium quinoa and Beta vulgaris (FIG. 3C, Table 5). Furthermore, co-expression of LbaLycA with an L. barbarum homolog of these QCs in N. benthamiana resulted in the loss of mass signals of N-terminally unprotected lyciumins, confirming the enzymatic role of QCs in forming the N-terminal pyroglutamate moiety in lyciumins (FIGS. 3D and 21A-B). In the final step, lyciumins are produced by C-terminal exoproteolytic maturation. This step is supported by the detection of multiple C-terminally extended lyciumin derivatives in leaf extracts of N. benthamiana transiently expressing LbaLycA, and in L. barbarum root extracts (FIG. 3E). See also FIG. 4E.









TABLE 5

Bioinformatic analysis of lyciumin precursor genes and co-clustered glutamine cyclotransferase genes in genomes of Beta vulgaris and Chenopodium quinoa (FIG. 3C).

| Gene | Predicted function | Reference | Gene product length [aa] | Closest functional blastp homolog (organism) [Similarity/Identity, %/%] |
|---|---|---|---|---|
| Chenopodium quinoa (v1.0) locus |  |  |  |  |
| AUR62017095-RA (CquBURP1) | BURP domain lyciumin precursor | XP_021740703.1 | 619 | XP_010675925.1 PREDICTED: BURP domain protein USPL1 [Beta vulgaris subsp. vulgaris] [68/57] |
| AUR62017096-RA | Glutamine cyclotransferase | XP_021740704.1 | 286 | XP_010675927.1 PREDICTED: glutaminyl-peptide cyclotransferase isoform X1 (Beta vulgaris subsp. vulgaris) [80/71] |
| Beta vulgaris locus (RefBeet-1.2.2) |  |  |  |  |
| LOC104891851 (BvuBURP2) | BURP domain lyciumin precursor | XP_010675925.1 | 446 | XP_010676059.1 PREDICTED: BURP domain protein USPL1-like (Beta vulgaris subsp. vulgaris) [93/89] |
| LOC104891854 | Glutamine cyclotransferase | XP_010675927.1 | 306 | XP_021771347.1 glutaminyl-peptide cyclotransferase-like isoform X1 [Chenopodium quinoa] |
| LOC104891968 (BvuBURP1) | BURP domain lyciumin precursor | XP_010676059.1 | 404 | XP_010675925.1 PREDICTED: BURP domain protein USPL1 (Beta vulgaris subsp. vulgaris) [93/89] |
| LOC104891853 | Glutamine cyclotransferase | XP_010675926.1 | 201 | XP_021740704.1 glutaminyl-peptide cyclotransferase-like (Chenopodium quinoa) [73/68] |









Despite the lack of a characterized lyciumin cyclase, the promiscuity of the lyciumin biosynthetic pathway in N. benthamiana was investigated. In order to generate lyciumin core peptide mutants, a lyciumin precursor from Glycine max with only one core peptide (QPYGVYTW (SEQ ID NO: 43)) in the N-terminal domain was characterized. Heterologous expression of this precursor, namely Sali3-2 (Glyma.12G217400) [39], in N. benthamiana resulted in the formation of its predicted lyciumin product, lyciumin I (FIGS. 2B and 4A). With a single-core-peptide precursor in hand, an alanine scan was first performed through its core peptide region to identify mutable positions (FIG. 4B). Based on MS analysis, mutation to alanine of every residue except the N-terminal glutamine and the C-terminal tryptophan of the Sali3-2 core peptide resulted in lyciumin formation, indicating core peptide promiscuity of the lyciumin pathway (FIGS. 4B, 22A-E, 23A-B, 24A-B, 25A-B, 26A-B, and 27A-B).


Next, it was tested whether the length of the linear N-terminus or the size of the peptide macrocycle could be modified in the N. benthamiana heterologous expression system (FIG. 4B). No lyciumin production was observed for a four-, two-, or one-amino acid-long N-terminal branch, suggesting a conserved N-terminus length of three amino acids (FIG. 4B). Similarly, no lyciumin formation occurred from core peptides with four- or six-amino acid-long C-termini, indicating a conserved C-terminal macrocycle of five amino acids (FIG. 4B). Finally, it was tested whether the cyclization residues could be altered. Based on Amaranthaceae lyciumin precursors, the Sali3-2 core peptide was mutated from glycine to threonine at the fourth position. When this mutant precursor peptide gene was expressed in N. benthamiana, lyciumin I was produced and, in addition, a putative dehydrothreonine derivative was detected (FIGS. 28A-I), as observed for Amaranthaceae lyciumin chemotypes with corresponding [Thr4]-core peptides (FIG. 2B). Mutation of the C-terminal tryptophan to the other aromatic amino acids phenylalanine and histidine abolished lyciumin formation (FIG. 4B). Surprisingly, mutation to tyrosine yielded a cyclic peptide with a putative new tyrosine [Tyr8] macrocyclization based on MS analysis, suggesting the discovery of a new peptide macrocyclization (FIG. 4B).


In order to investigate whether the lyciumin pathway can produce unknown peptide macrocyclizations, the BURP domain sequences from genome-sequenced plants were revisited and searched for core peptides with the motif QP(X)5Y, i.e., a C-terminal tyrosine instead of a tryptophan. This search identified a candidate precursor peptide from Capsicum annuum, CanBURP, with a predicted QPYGVYFY core peptide, which suggested a lyciumin derivative with a tyrosine macrocyclization. CanBURP was transiently expressed in N. benthamiana. In a parallel experiment, Sali3-2 was transiently expressed with the same core peptide sequence. Furthermore, peptide metabolic profiling was also conducted with C. annuum seed extract. In all three experiments, analytes were detected suggesting cyclic peptide chemistry derived from the tyrosine-terminal core peptide (FIGS. 30A-F). Ultimately, the investigation of the lyciumin pathway in tobacco with a soybean precursor peptide shows specificity in peptide architecture but promiscuity in peptide sequence and macrocyclization, allowing for the discovery of a new RiPP class by genome mining.


Taken together, these structure-function relationship studies, which varied the precursor core peptide sequence in the heterologously reconstituted lyciumin pathway in N. benthamiana, suggest restriction in peptide length but promiscuity in peptide sequence and macrocyclization, presenting tremendous opportunity for branched cyclic RiPP diversification via metabolic engineering. Several lyciumins have been produced via the Sali3-2-based heterologous expression system in N. benthamiana: known lyciumins such as lyciumin H and K (FIGS. 2C, 4B, 31A-B, and 32A-B), “unnatural” lyciumins such as lyciumin-[QPFGVYTW] and lyciumin-[QPWGVYTW] (FIGS. 4B, 33A-B, and 34A-B), and predicted lyciumins with up to four mutations from genome-derived BURP domain sequences such as lyciumin-[QPFGFFSW] and lyciumin-[QPYGVYFW] from M. truncatula (FIGS. 4B, 35A-B, and 36A-B) and lyciumin-[QPYGVYSW] from Nicotiana attenuata (FIGS. 4B and 37A-B). These results highlight the utility of this expression platform for unlocking cryptic peptide chemotypes from diverse plant species and producing peptide libraries based on existing, predicted or unknown lyciumin chemical space.


Heterologous Lyciumin Production in Tobacco Vs. Source Plant (Lycium barbarum)


Transient expression was measured for an engineered lyciumin precursor from Lycium barbarum (LbaLycA) with one, five or ten repeats of a single core peptide (QPWGVGSW=lyciumin B (SEQ ID NO: 50)) after infiltration of six-week-old tobacco leaves with A. tumefaciens LBA4404 pEAQ-HT-LycA-1×/5×/10× (three plants per construct). See SEQ ID NOS: 112-117, FIGS. 38B-D.


Peptides were extracted from freeze-dried, infiltrated leaf samples of each plant (0.1 g) six days after infiltration, and from freeze-dried Lycium barbarum roots (0.1 g) of six-month-old plants.


LC-MS analysis for the lyciumin B mass signal (single ion monitoring: 896.3-897.3 m/z) and manual peak integration were performed in QualBrowser (Thermo).


The results, shown in FIG. 38A, indicate that the lyciumin B concentration in tobacco increases with the number of lyciumin B repeats in the transiently expressed engineered precursor (1<5<10), and that the lyciumin B concentration is higher in tobacco transiently expressing lyciumin precursors than in the lyciumin B source plant tissue (Lycium barbarum root).


Discussion

Here, lyciumins are described as a branched cyclic plant RiPP class that is characterized by an N-terminal pyroglutamate and a macrocyclic bond between a C-terminal tryptophan or tyrosine residue and an internal α-carbon, e.g., between the C-terminal tryptophan-indole nitrogen and a glycine α-carbon. Lyciumins were determined to be RiPPs by identification of their precursor peptide from Lycium barbarum. The characterized precursor peptide of this RiPP class enabled successful genome mining of other lyciumin genotypes and chemotypes in genome-sequenced crop and forage plants. Biosynthetic investigation of the lyciumin pathway in tobacco indicates promiscuity in peptide sequence, but not in peptide architecture. The pathway tolerates mutation of the C-terminal tryptophan to a tyrosine to form a putative new peptide macrocyclization, which was found by genome mining in lyciumin-[QPYGVYFY] from pepper seeds. The connection of lyciumin core peptides with BURP domains suggests a physical connection of abiotic stress responses via heavy metal-binding BURP domains in plant vacuoles and lyciumin peptide signaling from roots and young plants for potential rhizosphere modulation to alleviate stress such as drought and acidic soil. As rhizosphere engineering has potential to increase crop and forage plant fitness in changing climates and growth conditions, a platform was engineered to produce lyciumin peptide libraries in planta for potential agricultural applications of the lyciumin pathway as transgenes or gene editing targets. Lyciumin cyclic peptide libraries may be used for other purposes, such as development of peptide-based drugs such as protease inhibitors.


Genome mining for plant natural product discovery has been realized for multiple natural product classes on a single-pathway scale [20] and on a multi-pathway scale [21,22,41,42]. Increasing biosynthetic knowledge catalyzed by synthetic biology [18,23] and increasing genomic plant resources prime the field of plant natural products for automated natural product discovery by genome mining. Plant RiPPs such as the lyciumins are a suitable class to be added to plant genome mining pipelines as they could be readily connected from a genotype to a chemotype in a similar fashion as microbial peptide natural products [43].


Overall, described herein is a blueprint for genome mining of branched cyclic RiPPs in plants by identification of pathway-specific precursor peptides. These cyclic lyciumin peptides have potential utility for increasing crop fitness and developing peptide-based drugs.


Materials and Methods
Materials and Instruments

All chemicals were purchased from Sigma-Aldrich, unless otherwise specified. Oligonucleotide primers and synthetic genes were purchased as gBlocks® from Integrated DNA Technologies, Inc. Solvents for liquid chromatography high-resolution mass spectrometry were Optima® LC-MS grade (Fisher Scientific) or LiChrosolv® LC-MS grade (Millipore). High resolution mass spectrometry analysis was performed on a Thermo ESI-Q-Exactive Orbitrap MS coupled to a Thermo Ultimate 3000 UHPLC system. Low-resolution mass spectrometry analysis was done on a Thermo ESI-QQQ MS coupled to a Thermo Ultimate 3000 UHPLC system. NMR analysis was performed on a Bruker Avance II 600 MHz NMR spectrometer equipped with a High Sensitivity Prodigy Cryoprobe. Preparative HPLC was performed on a Shimadzu LC-20AP liquid chromatograph equipped with a SPD-20A UV/VIS detector and a FRC-10A fraction collector.


Plant Material


Lycium barbarum was purchased as three-year-old plants for extraction and cultivation. Amaranthus hypochondriacus seeds for cultivation were purchased from Strictly Medicinal® Seeds. Amaranth grain for extraction was Arrowhead Mills® amaranth. Chenopodium quinoa seeds for cultivation were purchased from Earthcare Seeds. Quinoa for extraction was Trader Joe's® Tricolor quinoa. Beta vulgaris seeds (Detroit Dark Red cultivar) for cultivation and extraction were purchased from David's Garden Seeds. Glycine max seeds (Chiba green soybean) for cultivation and extraction were purchased from High Mowing Organic Seeds®. Seeds of wild-type Medicago truncatula for cultivation were a gift from Prof. Dong Wang (U Mass Amherst). Capsicum annuum seeds (Jalapeno Early) for cultivation and extraction were purchased from EdenBrothers®. Solanum lycopersicum seeds (cultivar Heinz 1706-BG) for cultivation were provided by the Tomato Genetics Resource Center (UC Davis). Solanum melongena seeds for cultivation were purchased from Seedz®. Solanum tuberosum tubers for cultivation (Russet or Red potato) were purchased from Trader Joe's®. Trifolium pratense seeds were purchased from OutsidePride.com®. Nicotiana benthamiana seeds for cultivation were a gift from the Lindquist lab (Whitehead Institute, MIT). A Selaginella uncinata plant was purchased from Plant Delights Nursery®.


Plant Cultivation


Lycium barbarum was grown from three-year-old live roots in MiracleGro® potting soil as a potted plant in full sun with occasional application of organic fertilizer. Lycium barbarum seeds from fruits of the three-year-old plant were grown in Sun Gro® Propagation Mix soil with added vermiculite (Whittemore Inc.) and added fertilizer in a greenhouse with a 16 h light/8 h dark cycle for six months. Amaranthus hypochondriacus, Chenopodium quinoa, Beta vulgaris, Glycine max, Medicago truncatula, Capsicum annuum, Solanum lycopersicum, Solanum melongena and Trifolium pratense were grown from seeds in Sun Gro® Propagation Mix soil with added vermiculite (Whittemore Inc.) and added fertilizer in a greenhouse with a 16 h light/8 h dark cycle for six months. Nicotiana benthamiana was grown from seeds in Sun Gro® Propagation Mix soil with added vermiculite (Whittemore Inc.) and added fertilizer in a greenhouse with a 16 h light/8 h dark cycle for three months. Solanum tuberosum tubers were sprouted under natural light for three weeks.


Transcriptomic Analysis of Lycium barbarum and Identification of Candidate Precursor Gene LbaLycA



Lycium barbarum roots were removed from a three year-old plant, washed with sterile water, and total RNA was extracted with the QIAGEN RNeasy Plant Mini kit. RNA quality was assessed by Agilent Bioanalyzer. A strand-specific mRNA library was prepared (TruSeq Stranded Total RNA with Ribo Zero Library Preparation Kit, Illumina) and sequenced with a HiSeq2000 Illumina sequencer in HISEQRAPID mode (100×100). Illumina sequence raw-files were combined and assembled by the Trinity package [34]. Gene expression was estimated by mapping raw sequencing reads to the assembled transcriptomes using RSEM [37]. The Lycium barbarum root transcriptome was analyzed for lyciumin precursors by searching predicted core peptide sequences for known lyciumin A (SEQ ID NO: 148; QPYGVGSW), lyciumin B (SEQ ID NO: 50; QPWGVGSW), lyciumin C (SEQ ID NO: 175; QPYGVFSW), and lyciumin D (SEQ ID NO: 174; QPYGVGIW) by blastp algorithm on an internal Blast server. In order to clone and sequence a lyciumin precursor gene from Lycium barbarum, cDNA was prepared from root total RNA with SuperScript® III First-Strand Synthesis System (Invitrogen). Transcripts with lyciumin core peptide sequences were used to design cloning primers (LbaLycA-pEAQ-AgeI (SEQ ID NO: 118): AGACCGGTATGGAGTTGCATCACCATTAC, LbaLycA-pEAQ-XhoI (SEQ ID NO: 119): AGCTCGAGTTAGTTTTCAGACACTTGAGTTGCG) for amplification of precursor peptide gene LbaLycA with Phusion® High-Fidelity DNA polymerase (New England Biolabs) and directional cloning with restriction enzymes AgeI and XhoI (New England Biolabs) and T4 DNA ligase (New England Biolabs) into pEAQ-HT, which was linearized by restriction enzymes AgeI and XhoI [38]. Cloned LbaLycA was sequenced by Sanger sequencing from pEAQ-HT-LbaLycA.
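The directional cloning design can be sanity-checked by locating the restriction sites in the two primers. The following is a minimal sketch, not part of the described procedure: the helper name site_positions is hypothetical, the primer sequences are those given above, and the recognition sequences used are the standard ones for AgeI (ACCGGT) and XhoI (CTCGAG).

```python
# Recognition sequences of the two restriction enzymes named in the text.
SITES = {"AgeI": "ACCGGT", "XhoI": "CTCGAG"}

# Cloning primers as given in the text (5' to 3').
PRIMERS = {
    "LbaLycA-pEAQ-AgeI": "AGACCGGTATGGAGTTGCATCACCATTAC",
    "LbaLycA-pEAQ-XhoI": "AGCTCGAGTTAGTTTTCAGACACTTGAGTTGCG",
}


def site_positions(primer, site):
    """Return every 0-based start index of `site` in `primer` (hypothetical helper)."""
    positions = []
    start = primer.find(site)
    while start != -1:
        positions.append(start)
        start = primer.find(site, start + 1)
    return positions


# Report where each enzyme site occurs in each primer.
for name, seq in PRIMERS.items():
    for enzyme, site in SITES.items():
        print(name, enzyme, site_positions(seq, site))
```

Each primer should carry exactly one site, for its own enzyme only, placed directly upstream of the gene-specific sequence; any additional hit would indicate a design that could be cut internally.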


Cloning of Lyciumin Precursor Gene StuBURP from Solanum tuberosum


Tuber sprout tissue was removed from a sprouting potato tuber and total RNA was extracted with the QIAGEN RNeasy Plant Mini kit. cDNA was prepared from sprout total RNA with SuperScript® III First-Strand Synthesis System (Invitrogen). A de novo transcriptome was assembled from a Russet potato RNA-seq dataset (NCBI SRA: SRR5970148) and transcripts homologous to target lyciumin precursor PGSC0003DMG400047074 were used to design cloning primers (StuBURP-pEAQ-fwd (SEQ ID NO: 120): TGCCCAAATTCGCGACCGGTATGGAGTTGCATCACCAATA; StuBURP-pEAQ-rev (SEQ ID NO: 121): CCAGAGTTAAAGGCCTCGAGTTAGTTTTCAGCCACTTGAAGAACTG) for amplification of precursor gene StuBURP with Phusion® High-Fidelity DNA polymerase (New England Biolabs). StuBURP was cloned into pEAQ-HT, which was linearized by restriction enzymes AgeI and XhoI, by Gibson cloning assembly (New England Biolabs). Cloned StuBURP (SEQ ID NO: 126) was sequenced by Sanger sequencing from pEAQ-HT-StuBURP.


Heterologous Expression of Lyciumin Precursor Genes in Nicotiana benthamiana



Agrobacterium tumefaciens LBA4404 was transformed with pEAQ-HT-LbaLycA, other pEAQ-HT constructs with lyciumin precursor genes (pEAQ-HT-StuBURP, pEAQ-HT-CanBURP, pEAQ-HT-Sali3-2, pEAQ-HT-Sali3-2-mutants) or pEAQ-HT-LbaQC by electroporation (2.5 kV), plated on YM agar (0.4 g yeast extract, 10 g mannitol, 0.1 g sodium chloride, 0.2 g magnesium sulfate (heptahydrate), 0.5 g potassium phosphate (dibasic, trihydrate), 15 g agar, ad 1 L Milli-Q Millipore water, adjusted to pH 7) with 100 μg/mL rifampicin, 50 μg/mL kanamycin and 100 μg/mL streptomycin and incubated for two days at 30° C. A 5 mL starter culture of YM medium with 100 μg/mL rifampicin, 50 μg/mL kanamycin and 100 μg/mL streptomycin was inoculated with a clone of Agrobacterium tumefaciens LBA4404 pEAQ-HT-LbaLycA and incubated for 24-36 h at 30° C. on a shaker at 225 rpm. Subsequently, the starter culture was used to inoculate a 50 mL culture of YM medium with 100 μg/mL rifampicin, 50 μg/mL kanamycin and 100 μg/mL streptomycin, which was incubated for 24 h at 30° C. on a shaker at 225 rpm. The cells from the 50 mL culture were centrifuged for 30 min at 3000 g, the YM medium was discarded and cells were resuspended in MMA medium (10 mM MES KOH buffer (pH 5.6), 10 mM magnesium chloride, 100 μM acetosyringone) to give a final optical density of 0.8. The Agrobacterium suspension was infiltrated into the bottom of leaves of Nicotiana benthamiana plants (six weeks old). N. benthamiana plants were placed in the shade two hours before infiltration. After infiltration, N. benthamiana plants were grown as described above for six days. Subsequently, infiltrated leaves were collected and subjected to chemotyping.


Chemotyping of Lyciumin Peptides from Plant Material


For peptide chemotyping, 0.2 g plant material (fresh weight) was frozen in liquid nitrogen and ground with mortar and pestle. Ground plant material was extracted with 10 mL methanol for 1 h at 37° C. in a glass vial. Plant methanol extract was dried under nitrogen gas in a separate glass vial. Dried plant methanol extract was resuspended in water (10 mL) and partitioned with hexane (2×10 mL) and ethyl acetate (2×10 mL), and subsequently extracted with n-butanol (10 mL). The n-butanol extract was dried in vacuo and resuspended in 2 mL methanol for liquid chromatography-mass spectrometry (LC-MS) analysis. Peptide extracts were subjected to high resolution MS analysis with the following LC-MS parameters: LC: Phenomenex Kinetex® 2.6 μm C18 reverse phase 100 Å 150×3 mm LC column; LC gradient: solvent A: 0.1% formic acid, solvent B: acetonitrile (0.1% formic acid), 0-2 min: 5% B, 2-23 min: 5-95% B, 23-25 min: 95% B, 25-30 min: 5% B, 0.5 mL/min; MS: positive ion mode, Full MS: resolution 70000, mass range 425-1250 m/z, dd-MS2 (data-dependent MS/MS): resolution 17500, loop count 5, collision energy 15-35 eV (stepped), dynamic exclusion 1 s. LC-MS data were analyzed with QualBrowser in the Thermo Xcalibur software package (version 3.0.63, ThermoScientific).
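The LC gradient above is a piecewise-linear program, which can be written down as a list of breakpoints. The sketch below is only an illustration of how the stated segments translate into a solvent B percentage at a given time; the function name percent_b is hypothetical, and the instantaneous drop from 95% B to 5% B at 25 min is an assumption, since the text lists the re-equilibration segment without a ramp.

```python
# Breakpoints (time in min, % solvent B) for the high-resolution chemotyping
# gradient. The 25 min time point appears twice to model the step from 95% B
# back to 5% B (an assumption; no ramp is stated in the text).
GRADIENT = [
    (0.0, 5.0),
    (2.0, 5.0),
    (23.0, 95.0),
    (25.0, 95.0),
    (25.0, 5.0),
    (30.0, 5.0),
]


def percent_b(t_min, gradient=GRADIENT):
    """Linearly interpolate % solvent B at time t_min (hypothetical helper)."""
    if not gradient[0][0] <= t_min <= gradient[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        # Skip zero-length segments, which only encode a step change.
        if t1 > t0 and t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return gradient[-1][1]


for t in (0.0, 12.5, 24.0, 27.0):
    print(t, percent_b(t))
```

The same breakpoint representation applies to the shorter 15 min gradient used for low-resolution analysis; only the list of (time, % B) pairs changes.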


For comparative chemotyping of lyciumin concentrations in different plant tissues, peptides were extracted from plant tissues as described above from three different plants of the same age. Analyzed tissues of Amaranthus hypochondriacus and Chenopodium quinoa (three month old) were flower, leaf, root, seed and stem. Analyzed tissues of Beta vulgaris (three month old) were leaf, root, seed and stem. Analyzed tissues of Glycine max (three month old) were bean, leaf, pod, root and stem. Analyzed tissues for Solanum tuberosum (three week old) were sprout and tuber. Peptide extracts were subjected to low resolution MS analysis by selected-ion monitoring (SIM) of lyciumin masses specific to each plant and the following LC-MS parameters: LC—Phenomenex Kinetex® 2.6 μm C18 reverse phase 100 Å 150×3 mm LC column, LC gradient: solvent A—0.1% formic acid, solvent B—acetonitrile (0.1% formic acid), 0-1 min: 5% B, 1-8 min: 5-95% B, 8-10 min: 95% B, 10-15 min: 5% B, MS—positive ion mode, SIM (Amaranthus hypochondriacus: 872.8-873.8 m/z and 963.8-964.8 m/z, Chenopodium quinoa: 869.8-870.8 m/z and 972.8-973.8 m/z, Beta vulgaris: 894.8-895.8 m/z, Glycine max: 910.8-911.8 m/z and 993.8-994.8 m/z, Solanum tuberosum: 880.8-881.8 m/z, 896.8-897.8 m/z, 913.8-914.8 m/z, 922.8-923.8 m/z, 947.8-948.8 m/z, 972.8-973.8 m/z and 1048.8-1049.8 m/z). Lyciumin ion abundance values were determined by peak area integration from each lyciumin SIM chromatogram in QualBrowser in the Thermo Xcalibur software package (version 3.0.63, ThermoScientific).


Lyciumin Genome Mining

Prediction of lyciumin genotypes: For prediction of lyciumin precursor peptide genes in a plant genome, LbaLycA homologs were searched by tblastn search in the 6-frame translated genome sequence (JGI Phytozome v12.1 and pre-release genomes) or by blastp of Refseq protein sequences (NCBI genomes, Table 1). In addition, annotated BURP domains were identified by ‘BURP domain’ Keyword search (JGI Phytozome v12.1 and pre-release genomes). All identified BURP domain proteins from a plant genome were then searched for lyciumin core peptide sequences with the search criteria of a glutamine and proline as the first and second amino acid, respectively, in the core peptide sequence and a tryptophan at the eighth position of the core peptide sequence. A BURP domain protein, which contained one or multiple sequences matching these lyciumin core peptide criteria, was a candidate lyciumin precursor peptide and, thus, its gene a predicted lyciumin genotype in the target plant genome.
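The core peptide search criteria can be sketched as a short pattern search over a protein sequence. This is a minimal illustration rather than the pipeline used here: the function name find_core_peptides and the example precursor fragment are hypothetical, and only the rule itself (glutamine and proline at the first and second positions, tryptophan at the eighth position) is taken from the text. A lookahead is used so that overlapping candidates would also be reported.

```python
import re

# Core peptide rule from the text: Q at position 1, P at position 2,
# any residue at positions 3-7, W at position 8.
CORE_PATTERN = re.compile(r"(?=(QP[A-Z]{5}W))")


def find_core_peptides(protein_seq):
    """Return (0-based start, core peptide) pairs, including overlapping hits."""
    return [(m.start(), m.group(1)) for m in CORE_PATTERN.finditer(protein_seq.upper())]


# Hypothetical precursor fragment: the spacer residues are invented for
# illustration; the two core peptides are those of lyciumin A (QPYGVGSW)
# and lyciumin B (QPWGVGSW) named in this document.
example = "MELHHSNAAQPYGVGSWDKSNQPWGVGSWEN"

hits = find_core_peptides(example)
for start, core in hits:
    print(start, core)

# A BURP domain protein with at least one hit is a candidate precursor.
is_candidate = len(hits) > 0
```

Replacing the final W in the pattern with the character class [WY] would extend the same search to the QP(X)5Y motif with a C-terminal tyrosine.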


In order to complement missing core peptide sequences from a lyciumin precursor gene with a sequence gap in the potato genome (PGSC0003DMG400047074), a Russet potato tuber transcriptome (NCBI SRA: SRR5970148) was assembled by Trinity (v2.4) [34] and rnaSPAdes (v1.0, kmer 25,75) [35]. Precursor peptide transcripts with missing core peptide sequences were searched in both de novo transcriptome assemblies by LbaLycA tblastn search.


Prediction of lyciumin chemotypes: A lyciumin structure was predicted from a putative lyciumin core peptide sequence by transformation of the glutamine at the first position to a pyroglutamate and formation of a covalent bond between the indole-nitrogen of the tryptophan at the eighth position with the α-carbon of the residue at the fourth position by loss of two hydrogens.


Lyciumin chemotyping: LC-MS data of peptide extracts from a predicted lyciumin-producing plant were analyzed for lyciumin mass signals by (a) parent mass search (base peak chromatogram of calculated [M+H]+ of predicted lyciumin structure, Δm=5 ppm), (b) fragment mass search of pyroglutamate-proline-b-ion in MS/MS data (C10H13N2O3+, 209.09207 m/z, Δm=5 ppm), and (c) iminium ion mass search of specific amino acids of predicted structure in MS/MS data (for example, pyroglutamate iminium ion [M+H]+ 84.04439 m/z) with QualBrowser in the Thermo Xcalibur software package (version 3.0.63, ThermoScientific). Putative mass signals of predicted lyciumin structures were confirmed by MS/MS data analysis.
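The chemotype prediction and the mass searches can be illustrated with a small calculation. The sketch below is an illustration under stated assumptions, not the software used here (QualBrowser): the function names are hypothetical, the residue masses are standard monoisotopic values, and the structure rule (glutamine to pyroglutamate, i.e., loss of NH3, plus loss of two hydrogens on macrocyclization) follows the description above.

```python
# Standard monoisotopic residue masses (Da) of the 20 proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
AMMONIA = 17.026549
H2 = 2.015650
PROTON = 1.007276


def predicted_lyciumin_mh(core_peptide):
    """[M+H]+ of the lyciumin predicted from an 8-residue core peptide."""
    if len(core_peptide) != 8 or core_peptide[0] != "Q":
        raise ValueError("expected an 8-residue core peptide starting with Q")
    linear = sum(RESIDUE_MASS[aa] for aa in core_peptide) + WATER
    # Q1 -> pyroglutamate loses NH3; the macrocyclic bond between residue 8
    # and the alpha-carbon of residue 4 loses two hydrogens.
    return linear - AMMONIA - H2 + PROTON


def pyroglu_pro_b_ion():
    """m/z of the pyroglutamate-proline b-ion used as a diagnostic fragment."""
    return RESIDUE_MASS["Q"] - AMMONIA + RESIDUE_MASS["P"] + PROTON


def within_ppm(observed, theoretical, tol_ppm=5.0):
    """True if an observed m/z matches the theoretical m/z within tol_ppm."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


print(round(predicted_lyciumin_mh("QPYGVGSW"), 4))  # lyciumin A core peptide
print(round(pyroglu_pro_b_ion(), 5))
```

With these residue masses, the calculated pyroglutamate-proline b-ion agrees with the 209.09207 m/z value stated above, which serves as a check on the mass bookkeeping before applying the same function to predicted core peptides.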


Lyciumin Transcriptome Mining

For lyciumin transcriptome mining, transcriptomes of terrestrial plants from the 1 kp database were assembled by rnaSPAdes (v1.0, kmer 25,75 or, if failed, default kmer 55). De novo assembled transcriptomes were searched for LbaLycA homologs (type 1 lyciumin precursor) and Sali3-2 (Glycine max, AAB66369.1, type 2 lyciumin precursor) by tblastn search on an internal Blast server. Candidate lyciumin precursors were predicted with the same core peptide search criteria as for lyciumin genome mining with some precursors being partial sequences due to failed complete de novo assembly (Table 3). In order to verify lyciumin genotype prediction in Selaginellaceae (1 kp dataset: ERR2040880—Selaginella willdenowii), other Selaginella transcriptomes from the NCBI SRA were de novo assembled (rnaSPAdes, v1.0, kmer 25,75, SRR3136708—Selaginella bryopteris, SRR4762537—Selaginella martensii, SRR5499403—Selaginella uncinata, SRR7132763 —Selaginella uncinata, SRR7132764—Selaginella rupestris, SRR7132766—Selaginella moellendorffii, SRR7132767—Selaginella peruviana, SRR7132768—Selaginella borealis, SRR7132769—Selaginella braunii), and searched for lyciumin genotypes as described above. For core peptide analysis of predicted lyciumin genotypes (Table 4, FIG. 10C) from genomes and transcriptomes, predicted core peptides from predicted lyciumin precursor protein sequences were transformed at the fourth position to a glycine in case of a fourth position threonine based on glycine as the common cyclization residue in Amaranthaceae lyciumin chemotypes (FIG. 2C).


Phylogenetic Analysis of Lyciumin Precursor Peptides

Protein sequences of characterized and predicted lyciumin precursors from genomes and transcriptomes (except 3′-partial sequences) and four founding members of the BURP domain family (NP_001303011.1: BURP domain-containing protein BNM2A precursor [Brassica napus], NP_001234835.1: Polygalacturonase-1 non-catalytic subunit beta precursor [Solanum lycopersicum], CAA31603.1/CAA31602.1: Embryonic abundant protein USP87/Embryonic abundant protein USP92 [Vicia faba], NP_197943.1: BURP domain protein RD22 [Arabidopsis thaliana]) [30-33] were reduced to their BURP domain (Pfam PF03181) and aligned using the Muscle algorithm in MEGA (ver. 7.0.9). A neighbor-joining phylogenetic tree was generated with 2000 bootstrap generations using the p-distance method in MEGA.


Lyciumin Metabolic Engineering in Nicotiana benthamiana


Predicted lyciumin precursor Sali3-2 (Glyma.12G217400) was synthesized as an IDT gBlock® with a 5′-adapter (TGCCCAAATTCGCGACCGGT (SEQ ID NO: 252)) and a 3′-adapter (CTCGAGGCCTTTAACTCTGG (SEQ ID NO: 253)) for Gibson assembly. pEAQ-HT was digested by AgeI and XhoI restriction enzymes and the Sali3-2 gBlock® was cloned into the digested pEAQ-HT with Gibson Assembly Master Mix (New England Biolabs). pEAQ-HT-Sali3-2 was verified by Sanger sequencing and transformed into Agrobacterium tumefaciens LBA4404 for heterologous expression as described above. Constructs for metabolic engineering of lyciumins were Sali3-2 mutants of its core peptide sequence (SEQ ID NOS: 8-33). Sali3-2 mutants were synthesized as gBlocks® and cloned into pEAQ-HT for heterologous expression in N. benthamiana as described above. Chemotyping of infiltrated N. benthamiana leaves for lyciumins was done as described above.
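The Gibson adapters can be compared against the StuBURP cloning primers given earlier, which appear to carry the same 20-nucleotide overlaps with linearized pEAQ-HT. The following is a minimal sketch of that comparison; the helper reverse_complement and the variable names are hypothetical, while all sequences are copied from this document.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


ADAPTER_5 = "TGCCCAAATTCGCGACCGGT"  # 5' adapter, SEQ ID NO: 252
ADAPTER_3 = "CTCGAGGCCTTTAACTCTGG"  # 3' adapter, SEQ ID NO: 253

STU_FWD = "TGCCCAAATTCGCGACCGGTATGGAGTTGCATCACCAATA"  # SEQ ID NO: 120
STU_REV = "CCAGAGTTAAAGGCCTCGAGTTAGTTTTCAGCCACTTGAAGAACTG"  # SEQ ID NO: 121

# The forward primer should start with the 5' adapter; the reverse primer
# anneals to the opposite strand, so it should start with the reverse
# complement of the 3' adapter.
fwd_has_adapter = STU_FWD.startswith(ADAPTER_5)
rev_has_adapter = STU_REV.startswith(reverse_complement(ADAPTER_3))

# The adapters also span the AgeI (ACCGGT) and XhoI (CTCGAG) sites that were
# used to linearize the vector.
adapters_span_sites = ADAPTER_5.endswith("ACCGGT") and ADAPTER_3.startswith("CTCGAG")

print(fwd_has_adapter, rev_has_adapter, adapters_span_sites)
```

A check of this kind is a quick way to confirm that a synthesized insert or primer pair shares the expected overlaps with the digested vector before ordering.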


Purification and Structure Elucidation of Lyciumins

For lyciumin A, B and D isolation, Lycium barbarum roots (100 g wet weight) were ground with a tissue homogenizer and extracted for 16 h with methanol, shaking at 225 rpm and 37° C. For lyciumin C isolation, amaranth grain (4.5 kg) was ground in a tissue homogenizer and extracted for 16 h with methanol, shaking at 225 rpm and 37° C. Methanol extracts were filtered and dried in vacuo. Dried methanol extracts were resuspended in water and partitioned twice with hexane and twice with ethyl acetate and then extracted twice with n-butanol. The n-butanol extracts were dried in vacuo, resuspended in 10% methanol and separated by flash liquid chromatography with Sephadex LH20 as a stationary phase and a gradient of 10-100% methanol as a mobile phase. Fractions were collected with a fraction collector and analyzed for lyciumin content by LC-QQQ-MS with the following LC-MS settings: LC: Phenomenex Kinetex® 2.6 μm C18 reverse phase 100 Å 150×3 mm LC column; LC gradient: solvent A: 0.1% formic acid, solvent B: acetonitrile (0.1% formic acid), 0.5 mL/min, 0-1 min: 5% B, 1-8 min: 5-95% B, 8-10 min: 95% B, 10-15 min: 5% B; MS: positive ion mode, Full MS: Lyciumin A/B/D: 860-920 m/z, Lyciumin C/I: 950-1010 m/z. LH20 fractions with lyciumins were combined, dried in vacuo, resuspended in 10% acetonitrile (0.1% trifluoroacetic acid) and subjected to preparative HPLC with a Phenomenex Kinetex® 5 μm C18 reverse phase 100 Å 150×21.2 mm LC column as a stationary phase for two rounds of separation. 
LC settings were as follows: solvent A—0.1% trifluoroacetic acid, solvent B—acetonitrile (0.1% trifluoroacetic acid), 10 mL/min, Lyciumin A (20 mg)—1.LC: 0-3 min: 10% B, 3-43 min: 10-50% B, 43-45 min: 50-95% B, 45-48 min: 95% B, 48-49 min: 95-10% B, 49-69 min: 10% B, 2.LC: 0-5 min: 35% B, 5-35 min: 35-50% B, 35-38 min: 50-95% B, 38-40 min: 95% B, 40-40.1 min: 95-35% B, 40.1-60 min: 35% B, Lyciumin B (13 mg)—1.LC: 0-3 min: 20% B, 3-48 min: 20-40% B, 48-50 min: 40-95% B, 50-54 min: 95% B, 54-55 min: 95-20% B, 55-70 min: 20% B, 2.LC: 0-3 min: 30% B, 3-35 min: 30-45% B, 35-38 min: 45-95% B, 38-40 min: 95% B, 40-40.1 min: 95-30% B, 40.1-60 min: 30% B, Lyciumin C—1.LC: 0-3 min: 10% B, 3-43 min: 10-50% B, 43-45 min: 50-95% B, 45-48 min: 95% B, 48-49 min: 95-10% B, 49-69 min: 10% B, 2.LC: 0-3 min: 40% B, 3-48 min: 40-55% B, 48-50 min: 55-95% B, 50-54 min: 95% B, 54-55 min: 95-40% B, 55-70 min: 40% B, Lyciumin D (5 mg)—1.LC: 0-3 min: 20% B, 3-48 min: 20-40% B, 48-50 min: 40-95% B, 50-54 min: 95% B, 54-55 min: 95-20% B, 55-70 min: 20% B, 2.LC: 0-3 min: 30% B, 3-48 min: 30-50% B, 48-50 min: 50-95% B, 50-54 min: 95% B, 54-55 min: 95-30% B, 55-70 min: 30% B, Lyciumin I—1.LC: 0-3 min: 20% B, 3-48 min: 20-50% B, 48-50 min: 50-95% B, 50-54 min: 95% B, 54-55 min: 95-20% B, 55-70 min: 20% B, 2.LC: 0-3 min: 25% B, 3-48 min: 25-45% B, 48-50 min: 45-95% B, 50-54 min: 95% B, 54-55 min: 95-25% B, 55-70 min: 25% B. Preparative HPLC fractions with lyciumin C and lyciumin I, respectively, were combined, dried in vacuo, resuspended in 30% acetonitrile (0.1% trifluoroacetic acid) and subjected to semipreparative HPLC with a Phenomenex Kinetex® 5 μm C18 reverse phase 100 Å 250×10 mm LC column as a stationary phase.
LC settings were as follows: Solvent A—0.1% trifluoroacetic acid, solvent B—acetonitrile (0.1% trifluoroacetic acid), 1.5 mL/min, Lyciumin C (25 mg)—0-5 min: 40% B, 5-15 min: 40-42% B, 15-17 min: 42-95% B, 17-20 min: 95% B, 20-20.1 min: 95-40% B, and lyciumin I (2.5 mg)—0-5 min: 30% B, 5-30 min: 30-35% B, 30-32 min: 35-95% B, 32-36 min: 95% B, 36-40 min: 95-30% B, 40-60 min: 30% B. For NMR analysis, lyciumin A, B, C, D and I were each dissolved in DMSO-d6. Lyciumin A was analyzed for 1H and 13C NMR data; lyciumin B, D and C were analyzed for 1H NMR data. Lyciumin I was analyzed for 1H NMR, 1H-1H COSY, 1H-1H TOCSY, HSQC, HMBC and ROESY data. NMR data were analyzed with TopSpin software (v3.5) from Bruker. Stereochemistry of the crosslinked glycine α-carbons at the fourth position of lyciumins was inferred as (R) based on lyciumin A analysis [52] and on the same ROESY correlations of lyciumin I glycine-Hα as reported for lyciumin A [52]. Stereochemistry of the other amino acids of lyciumin I was inferred as (L) because of its ribosomal biosynthesis and the (L)-amino acid stereochemistry in all reported lyciumins [25,26,52].
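The gradient tables above are easier to check when written as data. The following sketch encodes the analytical LC-QQQ-MS gradient given above (0-1 min: 5% B, 1-8 min: 5-95% B, 8-10 min: 95% B, 10-15 min: 5% B) and returns the solvent B percentage at a given time. It assumes a linear ramp within each segment and an immediate step back to 5% B at 10 min; the function and variable names are illustrative only.

```python
# Analytical gradient from the text, as (start_min, end_min, start_%B, end_%B).
ANALYTICAL_GRADIENT = [
    (0.0, 1.0, 5.0, 5.0),
    (1.0, 8.0, 5.0, 95.0),
    (8.0, 10.0, 95.0, 95.0),
    (10.0, 15.0, 5.0, 5.0),
]


def percent_b(t_min, gradient=ANALYTICAL_GRADIENT):
    """Return %B at time t_min, interpolating linearly within a segment."""
    for start, end, b_start, b_end in gradient:
        if start <= t_min < end:
            fraction = (t_min - start) / (end - start)
            return b_start + fraction * (b_end - b_start)
    # The very last time point belongs to the end of the final segment.
    _, last_end, _, last_b = gradient[-1]
    if t_min == last_end:
        return last_b
    raise ValueError(f"time {t_min} min is outside the gradient")


if __name__ == "__main__":
    for t in (0.5, 4.5, 9.0, 12.0):
        print(f"{t:>5.1f} min: {percent_b(t):.1f}% B")
```

The same tuple layout could hold any of the preparative or semipreparative programs listed above by swapping in their segment boundaries.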


Gene Expression Analysis of Characterized Lyciumin Precursors

Gene expression of characterized lyciumin precursors was estimated by mapping raw sequencing reads to de novo assembled transcriptomes using RSEM [37]. For Solanum tuberosum, gene expression of lyciumin precursor peptide gene (PGSC0003DMG400047074, SEQ ID NO: 38) was analyzed in 16 tissue samples (NCBI SRA datasets: ERR029909, ERR029910, ERR029911, ERR029912, ERR029913, ERR029914, ERR029915, ERR029916, ERR029917, ERR029918, ERR029919, ERR029920, ERR029921, ERR029922, ERR029923, ERR029924) by RSEM against the combined de novo Trinity-assembled transcriptome of all 16 samples (FIG. 8E). For Amaranthus hypochondriacus, gene expression of lyciumin precursor peptide gene AHYPO_007393-RA was analyzed in eight tissues and conditions (NCBI SRA: SRR1598916, SRR1598915, SRR1598914, SRR1598913, SRR1598912, SRR1598911, SRR1598910, SRR1598909) by RSEM against the combined de novo Trinity-assembled transcriptome of all eight samples (FIG. 8A). For Chenopodium quinoa, gene expression of lyciumin precursor peptide AUR62017095 was analyzed in 15 tissue samples (NCBI SRA: SRR5974430, SRR5974427, SRR5974436, SRR5974438, SRR5974437, SRR5974435, SRR5974432, SRR5974433, SRR5974425, SRR5974426, SRR5974424, SRR5974431, SRR5974428, SRR5974429, SRR5974434) against the combined de novo Trinity-assembled transcriptome of all 15 samples (FIG. 8B). For Medicago truncatula, gene expression of lyciumin precursor peptide Medtr2g081610 (SEQ ID NO: 40) was assessed and displayed with the eFP web browser (bar.utoronto.ca) using gene expression data from a Medicago truncatula RNA-seq dataset (FIG. 8D). For Glycine max, gene expression of lyciumin precursor peptides Sali3-2 (SEQ ID NO: 8) and Glyma.12G217300 (SEQ ID NO: 36) was assessed and displayed with the eFP web browser (bar.utoronto.ca) using gene expression data from Glycine max RNA-seq datasets (FIG. 8C).
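RSEM reports expression as expected counts together with normalized values such as transcripts per million (TPM). The sketch below illustrates only that final normalization step, assuming per-transcript expected counts and effective lengths are already available; it does not reproduce RSEM's read assignment, and the function name and input values are made up for illustration.

```python
def tpm(expected_counts, effective_lengths):
    """Convert per-transcript expected counts to transcripts per million."""
    rates = {}
    for name, count in expected_counts.items():
        length = effective_lengths[name]
        # Transcripts without a usable effective length contribute nothing.
        rates[name] = count / length if length > 0 else 0.0
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in rates}
    return {name: rate / total * 1e6 for name, rate in rates.items()}


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    counts = {"precursor": 100.0, "other": 300.0}
    lengths = {"precursor": 1000.0, "other": 1000.0}
    for name, value in tpm(counts, lengths).items():
        print(name, round(value, 1))
```

Because TPM values sum to one million within a sample, they allow a precursor transcript to be compared across the tissue samples of one species.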


Glutamine Cyclotransferase Co-Expression Assays with LbaLycA in Nicotiana benthamiana


Glutamine cyclotransferase LbaQC was characterized as the closest homolog of Chenopodium quinoa glutamine cyclotransferase (AUR62017096, Phytozome 12.1) by blastp search of the L. barbarum root transcriptome on an internal Blast server. LbaQC was synthesized as a gBlock® (ATGGTTTCTTCTACTTCATATCTACCTACCAATCACACAAAAATGCCTCTGCTAA ATCCAAGGTTTCTAGTCATAAGCTTGATTGTTCTACTGAGCATCACCGTATTCAGA GAAGCTGAAGCATCATATAGAGTTTACAAAGTCAAAGTAGTCAATGAATTCCCTC ACGACCCCCAAGCCTACACTCAGGGGCTTCTCTATGCAGAAAATAATACACTCTT TGAATCAACTGGACTTTACGGACGTTCATCTGTTCGAAAAGTTGCATTGCTGGAC GGTAAGGTTGAGAGACTTCATGAAATGGAGTCTTCTTACTTTGGAGAGGGTCTAA CTCTTCTTGGTGAGAGGTTGTTCCAACTAACATGGTTGCTGGATACAGGTTTCATA TATGATCGATACAACTTCAGCAAATTCAAAAAGTTTACTCATCACATGCAAGATG GTTGGGGATTGGCAACCGATGGGAAAGTACTTITTGGAAGTGATGGAACATCAA CATTATATAAGATTGACCCTAAAACAATGAAAGTCATCAGAAAACAAGTTGTCAA GTCTCAAGGGCATGAAGTGCGCTACCTGAATGAGCTGGAGTATGTGAAAGCTGA AGTCTGGGCAAATGTTTATGTGACTGATTGCATTGCTAGAATTTCACCAAAAGAT GGCACTGTGATCGGGTGGATTCTCCTTCAATCTCTAAGAGAAGAGTTAATATCAA GAGGATATAAGGACTTCGAGGTCCTGAATGGAATCGCATGGGACAGAGATGGTG ACCGTATTTTTGTGACAGGGAAACTATGGCCAAAGCTCTTTGAGATCAAGTTGCT CCCCCTCACACCGAATGATCCATTGGCTGGAGAAATCAATAACTTGTGCATCCCG AAAACCAGTTTTCTCTTGGAAATTTAG (SEQ ID NO: 122)) with a 5′-adapter (TGCCCAAATTCGCGACCGGT) (SEQ ID NO: 123) and a 3′-adapter (CTCGAGGCCTTTAACTCTGG) (SEQ ID NO: 124) for Gibson assembly. pEAQ-HT was digested with the AgeI and XhoI restriction enzymes, and the LbaQC gBlock® was cloned into linearized pEAQ-HT with Gibson Assembly Master Mix (New England Biolabs). pEAQ-HT-LbaQC was verified by Sanger sequencing. For glutamine cyclotransferase co-expression assays, pEAQ-HT-LbaQC and pEAQ-HT-LbaLycA were transformed into Agrobacterium tumefaciens LBA4404 for heterologous expression as described above. For the co-expression assay, leaves of three plants of Nicotiana benthamiana (six weeks old) were infiltrated with a 1:1 mixture of resuspended A. tumefaciens LBA4404 pEAQ-HT-LbaQC (OD 0.8) and A. tumefaciens LBA4404 pEAQ-HT-LbaLycA (OD 0.8).
For LbaLycA control expression without LbaQC, leaves of three plants of Nicotiana benthamiana (six weeks old) were infiltrated with resuspended A. tumefaciens LBA4404 pEAQ-HT-LbaLycA (OD 0.4). Infiltrated plants were cultivated as described before for six days for heterologous expression. After six days, leaves of three plants of the LbaQC-LbaLycA co-expression and leaves of three plants of the LbaLycA expression control were collected and freeze-dried. For comparative chemotyping of [Gln1]-lyciumin B, [Gln1]-lyciumin D, lyciumin B and lyciumin D, peptides were extracted from 0.1 g of freeze-dried tobacco leaves as described above for peptide chemotyping from the LbaQC-LbaLycA co-expression plants and from the LbaLycA expression plants. Peptide extracts were subjected to low-resolution MS analysis by selected-ion monitoring (SIM) of masses of [Gln1]-lyciumins and lyciumins with the following LC-MS parameters: LC—Phenomenex Kinetex® 2.6 μm C18 reverse phase 100 Å 150×3 mm LC column, LC gradient: solvent A—0.1% formic acid, solvent B—acetonitrile (0.1% formic acid), 0.5 mL/min, 0-1 min: 5% B, 1-8 min: 5-95% B, 8-10 min: 95% B, 10-15 min: 5% B, MS—positive ion mode, SIM: 896.8-897.8 m/z (lyciumin B), 899.8-900.8 m/z (lyciumin D), 913.8-914.8 m/z ([Gln1]-lyciumin B), 916.8-917.8 m/z ([Gln1]-lyciumin D). Lyciumin and [Gln1]-lyciumin ion abundance values were determined by peak area integration from each peptide SIM chromatogram in QualBrowser in the Thermo Xcalibur software package (version 3.0.63, ThermoScientific).
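The four SIM windows above differ pairwise by 17 m/z units, which is consistent with the loss of ammonia when an N-terminal glutamine cyclizes to pyroglutamate. The sketch below stores the windows as data, assigns an observed m/z to a peptide, and compares the window spacing with the monoisotopic mass of NH3. The helper names are hypothetical, and the windows are low-resolution nominal values, so only approximate agreement is expected.

```python
# SIM windows (m/z) as listed in the text.
SIM_WINDOWS = {
    "lyciumin B": (896.8, 897.8),
    "lyciumin D": (899.8, 900.8),
    "[Gln1]-lyciumin B": (913.8, 914.8),
    "[Gln1]-lyciumin D": (916.8, 917.8),
}

# Monoisotopic mass of NH3 (N + 3 H), lost on Gln-to-pyroglutamate cyclization.
NH3_MASS = 14.003074 + 3 * 1.007825


def assign_peptide(mz):
    """Return the peptide whose SIM window contains mz, or None."""
    for name, (low, high) in SIM_WINDOWS.items():
        if low <= mz <= high:
            return name
    return None


def window_center(name):
    low, high = SIM_WINDOWS[name]
    return (low + high) / 2


def gln_shift(variant):
    """Spacing between the [Gln1] window and the lyciumin window ('B' or 'D')."""
    return window_center(f"[Gln1]-lyciumin {variant}") - window_center(f"lyciumin {variant}")


if __name__ == "__main__":
    print(assign_peptide(897.3))
    print(round(gln_shift("B"), 3), round(NH3_MASS, 3))
```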


Example 2
Results

This experiment investigates the taxonomic distribution of lyciumin-type RiPPs in the plant kingdom and further probes the evolutionary mechanisms that could explain the observed distribution pattern. This endeavour was greatly facilitated by the extensive plant transcriptome sequencing efforts of recent years, e.g., the 1 kp project, which covers more than half of the extant plant families on earth (Matasci et al., 2014). This experiment establishes an evolutionary framework of how lyciumin-type RiPPs have emerged over the last 450 million years of land plant evolution.


Example 2 is an extension of the results described in Example 1 under the subheading “Parallel evolution of lyciumin biosynthesis in angiosperms and lycophytes.” Some of the results presented under that subheading of Example 1 may be repeated in Example 2 for clarity.


Lyciumin Genotypes are Present in Multiple Angiosperm Families and in the Lycophyte Family Selaginellaceae

With the recently revealed knowledge of the lyciumin precursor protein (Kersten and Weng, 2018), the distribution of lyciumin genotypes in the plant kingdom was explored using available transcriptome resources generated by the plant community in recent years (Matasci et al., 2014). However, an apparent obstacle to identifying lyciumin precursor genes in plant transcriptomes is the repetitive nature of most lyciumin-precursor-peptide-encoding genes, which causes misassembly of these genes from short-read RNA-seq data by de novo transcriptome assembly programs. For example, known lyciumin precursor peptides from Amaranthaceae, Fabaceae and Solanaceae comprise repeating motifs of lyciumin core peptides either in the N-terminal domain (type 1 lyciumin precursor) or within the BURP domain (type 2 lyciumin precursor) (FIG. 39B) (Kersten and Weng, 2018). In a survey of numerous de novo transcriptome assembly programs, it was observed that rnaSPAdes (Grabherr et al., 2011; Bankevich et al., 2012; Bushmanova et al., 2018) is generally more robust in de novo assembly of lyciumin precursor genes containing repetitive core-peptide motifs compared to other assemblers (Grabherr et al., 2011; Bankevich et al., 2012; Bushmanova et al., 2018). To further assess whether rnaSPAdes is more suitable for lyciumin genotype identification from transcriptome data, RNA-seq datasets of characterized and predicted lyciumin producers based on genome mining were assembled by rnaSPAdes (v1.0) and by Trinity (v2.6.6) in three repeated runs. Trinity was selected for this comparison as it was the second best assembler for repetitive lyciumin precursor peptide genes in the initial survey. Assemblies of previously characterized and predicted lyciumin precursor peptides were then compared between the rnaSPAdes assembly and the Trinity assembly.
Whereas Trinity assembly yielded only one fully assembled precursor gene sequence in six test cases, rnaSPAdes enabled the complete assembly of five precursor gene sequences of the six test cases. Both assemblers yielded the correct core peptide gene sequences in each successfully assembled test case with the exception that rnaSPAdes missed one core peptide sequence in one assembled Solanum melongena precursor compared to Trinity (Sme2.5_02115.1_g00002.1). Whereas Trinity showed small variations in assembly between repeated runs, e.g., in repeat number per lyciumin precursor (Nicotiana attenuata OIT08186.1), rnaSPAdes assembly of lyciumin precursors was consistent over repeated runs.


Given the results of improved BURP-domain precursor gene assembly using rnaSPAdes at the time of the transcriptome analysis, de novo reassembly of transcriptomes of 793 plant species was performed using rnaSPAdes starting from raw sequencing reads generated as part of the 1 kp project (Matasci et al., 2014), which represent a total of 317 land plant families. Subsequently, lyciumin genotypes were searched for in these reassembled transcriptomes by tblastn using type 1 (LbaLycA, GenBank: MH124242) and type 2 (Sali3-2, GenBank: AAB66369) lyciumin precursors as queries. This exercise readily identified a battery of candidate lyciumin precursor genes distributed across diverse plant families that extend beyond the previously reported Amaranthaceae, Fabaceae and Solanaceae (Kersten and Weng, 2018). These newly identified lyciumin-genotype-containing plant families include Aizoaceae, Molluginaceae, Nyctaginaceae, Petiveriaceae and Phytolaccaceae, which are all under the order of Caryophyllales, as well as Selaginellaceae. It is noteworthy that Selaginellaceae is one of the three extant families of lycophytes which are basal vascular plants separated from all other euphyllophytes over 400 million years ago (Banks, 2009).


Lyciumin Chemotyping Confirms Lyciumin Biogenesis in Lycophytes

Since plant RiPPs had previously been reported only in angiosperms, we sought to confirm the predicted lyciumin production in Selaginella. To do this, several additional transcriptomes of Selaginella species were assembled using rnaSPAdes, starting from RNA-seq raw reads available from the NCBI SRA, and searched for lyciumin genotypes. In addition to Selaginella willdenowii (1 kp dataset, NCBI SRA: ERR2040880) (Matasci et al., 2014), lyciumin precursor genes were found in three other Selaginella species: S. uncinata, S. moellendorffii and S. bryopteris (FIGS. 43A-I). One predicted lyciumin precursor gene was cloned and sequenced from root cDNA of S. uncinata (FIG. 40A, SunBURP, GenBank: MK089798). The corresponding lyciumin precursor peptide has five repeats including the putative lyciumin core peptide QPYSVFAW, indicating a serine serving as a lyciumin cyclization residue (FIG. 40B). Subsequent metabolic profiling experiments using liquid chromatography-mass spectrometry (LC-MS) further revealed an analyte in the peptide extract of S. uncinata roots that matched the mass of a predicted lyciumin-[QPYSVFAW] peptide (FIGS. 40C and 40D) and had a lyciumin-characteristic pyroglutamate-proline b-ion in its MS/MS spectrum (FIG. 40D, FIG. S3) (Kersten and Weng, 2018). Further MS/MS analysis of the candidate lyciumin-[QPYSVFAW] peptide confirmed the cyclization site at the fourth amino acid, enabling the prediction of a new peptide macrocyclization between the tryptophan-indole nitrogen and the α-carbon of a serine (FIG. 40C) based on MS/MS data, core peptide sequence and comparative MS/MS analysis of lyciumin-[QPYGVFAW] (FIGS. 44A-B). This newly identified lyciumin-[QPYSVFAW] (lyciumin P) represents the first RiPP from a non-seed plant and highlights that new RiPP chemistry could be discovered by large-scale transcriptome mining in plants.


Phylogenetic and Sequence Analysis of BURP-Domain Lyciumin Precursors Suggests Divergent and Parallel Evolution of Lyciumins in Land Plants

Given the occurrence of lyciumins in multiple families of angiosperms and lycophytes, the evolutionary history of the characterized and predicted lyciumin precursor genes was examined (FIGS. 43A-I) (Kersten and Weng, 2018). Two general scenarios of lyciumin evolution are plausible. In the first scenario, the ability to produce lyciumin-type cyclic peptides from BURP-domain-containing genes evolved prior to the bifurcation of lycophytes and euphyllophytes. This trait was independently lost in plant families that do not contain lyciumin genotypes (Griesmann et al., 2018). Alternatively, ancestral non-peptide-producing BURP-domain genes could be independently recruited as lyciumin-type peptide precursor genes in distantly related plant families.


To test these alternative evolutionary hypotheses, phylogenetic analyses were performed on BURP-domain proteins from several sequenced plant genomes together with BURP domains of the predicted and characterized lyciumin-producing precursor proteins, or on BURP domains of the predicted and characterized lyciumin-producing precursor proteins alone (FIGS. 41A and 45). The resulting phylogenies show five well-resolved clades of sequences from Caryophyllales, Fabaceae, Rosaceae, Solanaceae, and Selaginellaceae with no support for shared ancestry between any two or more of these families, suggesting that ancestral BURP-domain proteins were likely recruited independently to serve as lyciumin precursor peptides within each of these five plant lineages.


The core peptide motif sequences of all the predicted and characterized lyciumin precursors were systematically examined, and it was found that the lyciumin core-peptide motifs are mostly unique to each of the phylogenetic clades (75% of core peptides are found in one plant family, FIG. 41B), suggesting extensive diversification of lyciumin peptide chemistry within each of the lyciumin-producing plant families. Notably, several lyciumin chemotypes are predicted to occur in three of the five clades: lyciumin A in Caryophyllales, Selaginellales and Solanales, lyciumin J and lyciumin-[QPFGVFAW] in Caryophyllales, Fabales and Solanales, and lyciumin-[QPFGVFGW] in Caryophyllales, Rosales and Solanales (FIG. 41B). A number of lyciumins are present in two plant orders. These include lyciumin C, lyciumin I, lyciumin-[QPYGVYGW] and lyciumin-[QPFGVGSW] in Caryophyllales and Solanales, lyciumin-[QPFGFRAW] in Caryophyllales and Fabales, lyciumin-[QPYGVYSW] in Fabales and Solanales, and lyciumin-[QPYGVGAW] in Caryophyllales and Selaginellales. These observations suggest that the chemotypes shared between distantly related plant families are exemplary cases of metabolic trait convergence against the backdrop of extensive parallel divergence. Previous structure-function exploration of the sequence rules of lyciumin core peptides suggests that over 3,000,000 lyciumin chemotypes are theoretically possible (Kersten and Weng, 2018). The fact that several identical lyciumin chemotypes evolved repeatedly in multiple plant families implies that these peptides likely serve generally important biological functions in their host plants (FIG. 41B).
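One back-of-envelope reading of the "over 3,000,000" figure, offered here only as an assumption, is that the core peptide motif QP(X)5W given in the Methods has five freely variable positions, each of which can hold any of the 20 proteinogenic amino acids. The cited sequence rules may impose different constraints, so the sketch below shows the arithmetic rather than the derivation used in the reference.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 proteinogenic residues
VARIABLE_POSITIONS = 5                # the (X)5 stretch between QP and W


def count_core_peptides(alphabet=AMINO_ACIDS, positions=VARIABLE_POSITIONS):
    """Number of distinct sequences when every position is unconstrained."""
    return len(alphabet) ** positions


if __name__ == "__main__":
    print(count_core_peptides())  # 20 ** 5 = 3200000
```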


From the perspective of lyciumin precursor gene structure, both Fabaceae and Selaginellaceae contain type 1 lyciumin precursors, while Caryophyllales and Solanaceae contain type 2 lyciumin precursors (FIG. 41A, FIG. 45, and FIGS. 43A-I) (Kersten and Weng, 2018). The predicted cyclization site at the fourth position of lyciumin core peptides is a threonine in most Caryophyllales precursors, a glycine in most Fabaceae precursors, a glycine in most Solanaceae precursors, a proline in most Rosaceae precursors, and a serine or a glycine in Selaginellaceae precursors (FIG. 45 and FIGS. 43A-I).


Given these results in the context of the taxonomic relationship of the predicted and characterized lyciumin-producing plants, homologous non-lyciumin-producing BURP-domain protein progenitors likely gave rise to independent occurrences of lyciumin biogenesis at least once in lycophytes and four times in eudicots followed by extensive divergent and parallel evolution to yield the extant lyciumin chemodiversity (FIG. 41C).


A Shared Lyciumin Chemotype in Hemp Plant Celtis occidentalis and Amaranth Plant Achyranthes bidentata is Derived from Two Disparate Families of Precursor Peptides


When mining new lyciumin genotypes from diverse plant transcriptomes, a lyciumin core-peptide-containing gene from hackberry (Celtis occidentalis, Cannabaceae, FIG. 42A, NCBI SRA: ERR2040412) was serendipitously identified, which does not contain a C-terminal BURP domain. Instead, this putative lyciumin precursor is a DUF2775-domain protein (PF10950) with three repeats including the putative lyciumin core peptide QPFGVFGW (FIG. 42B). Searching proteins related to this newly discovered candidate C. occidentalis lyciumin precursor CocDUF2775 identified an organ-specific protein (PF10950) from the Cannabaceae plant Parasponia andersonii as the closest homolog, which does not contain lyciumin core peptides (FIGS. 46A-B). Given the possibility of a new lyciumin precursor gene family in plants, several tissues of a C. occidentalis tree were analyzed by LC-MS to look for the predicted lyciumin-[QPFGVFGW] (lyciumin Q) chemotype (FIG. 42C). The peptide extract of C. occidentalis leaves indeed contained an analyte that matches the predicted lyciumin-[QPFGVFGW] chemotype by MS and MS/MS analysis (FIGS. 42F and 47A-B). To further verify the detected lyciumin-[QPFGVFGW] chemotype, lyciumin-[QPFGVFGW] was generated in transgenic Nicotiana benthamiana via a previously described lyciumin expression platform (Kersten and Weng, 2018), and all of its chromatographic and MS features were identical to those of lyciumin-[QPFGVFGW] isolated from C. occidentalis (FIGS. 42F and 47A-B). In brief, this platform is based on the Glycine max lyciumin precursor gene Sali3-2 (Tang et al., 2014). Sali3-2 harbors a single core peptide sequence, which allows for convenient production of any lyciumin chemotype of interest in N. benthamiana via transient expression of Sali3-2 with an engineered core peptide sequence. Since the reassembled C. occidentalis transcriptome does not encode any BURP-domain proteins that contain the core peptide QPFGVFGW (FIGS. 47A-B), it was concluded that CocDUF2775 is a new type of lyciumin precursor peptide defined by its DUF2775 domain, which was dubbed the type 3 lyciumin precursor (FIG. 42B) (Albornos et al., 2012).


The emergence of lyciumins from precursor peptides unrelated to the BURP-domain proteins indicates a potential case of convergent evolution of lyciumins in plants. To probe this hypothesis further, all characterized and predicted BURP-domain lyciumin precursor peptides were queried for potential lyciumin-[QPFGVFGW] producers. This exercise identified several BURP-domain lyciumin precursor genes from Chenopodium quinoa (Amaranthaceae), Hypertelis (Kewa) cerviana (Molluginaceae), Petunia inflata and Solanum tuberosum (both Solanaceae) that harbor the capacity to produce lyciumin-[QPFGVFGW]. Because lyciumin-[QPFGVFGW] could not be detected in peptide extracts of greenhouse-grown C. quinoa and S. tuberosum plants, whereas H. cerviana and P. inflata plants were not available for chemical analysis, additional Amaranthaceae transcriptomes were further queried to search for BURP-domain precursor genes containing lyciumin-[QPFGVFGW] motifs. One such candidate gene was identified in the transcriptome of ox knee (Achyranthes bidentata) (FIGS. 42D and 42E), which contains several repeats with QPFTVFGW core peptides. Example 1 shows that lyciumin-producing plants of the Amaranthaceae family have a threonine at the fourth position of the lyciumin core peptide which is transformed into a glycine during lyciumin biosynthesis (Kersten and Weng, 2018). Therefore, the identified core peptide QPFTVFGW in a lyciumin precursor protein from A. bidentata indicates the formation of a putative lyciumin-[QPFGVFGW] chemotype. Peptide chemotyping of multiple tissues of a greenhouse-grown A. bidentata plant revealed the accumulation of lyciumin-[QPFGVFGW] (lyciumin Q) in seeds (FIGS. 42F and 47A-G). The biogenesis of the same lyciumin chemotype from nonhomologous precursor proteins, i.e., DUF2775-domain and BURP-domain proteins, in two distantly related angiosperm families illustrates a classic case of convergent evolution of a metabolic trait (Weng, 2014).
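The reasoning above, in which a threonine at the fourth core-peptide position is converted to glycine in the mature peptide, can be expressed as a small rule. The sketch below is an illustrative helper (its name and flag are assumptions) that maps a precursor core peptide to the chemotype name used in this document. It applies only the threonine-to-glycine conversion described for Amaranthaceae and leaves other fourth-position residues, such as the serine reported for Selaginella, unchanged.

```python
def predicted_chemotype(core_peptide, thr_to_gly=True):
    """Return a name such as 'lyciumin-[QPFGVFGW]' for an 8-residue core peptide."""
    core = core_peptide.upper()
    if len(core) != 8:
        raise ValueError("a lyciumin core peptide is expected to have 8 residues")
    residues = list(core)
    # Position 4 (index 3) is the cyclization residue; Thr is assumed to end up as Gly.
    if thr_to_gly and residues[3] == "T":
        residues[3] = "G"
    return "lyciumin-[" + "".join(residues) + "]"


if __name__ == "__main__":
    print(predicted_chemotype("QPFTVFGW"))  # A. bidentata core peptide
    print(predicted_chemotype("QPFGVFGW"))  # C. occidentalis core peptide
```

Both calls print the same name, which mirrors the observation that the two unrelated precursors are predicted to yield one chemotype.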


Discussion

Knowledge of plant specialized metabolism is incomplete. The increase of genomic and transcriptomic data from taxonomically diverse plants can greatly accelerate the discovery of new plant chemotypes and unlock the trajectories of metabolic evolution towards these chemical adaptations. In Example 2, large-scale transcriptome mining followed by peptide chemotyping was carried out, which revealed that the branched cyclic lyciumin RiPPs most likely evolved independently at least once in lycophytes and four times in angiosperms. The results also suggest that lyciumins have emerged in BURP-domain precursor genes and DUF2775-domain precursor genes via convergent evolution.


Lyciumins share several similar features in terms of their biosynthetic origin and taxonomic distribution with head-to-tail cyclic RiPPs such as cyclotides and orbitides. Like other classes of plant cyclic RiPPs, lyciumins might have arisen through serendipitous emergence of a single lyciumin core peptide motif sequence in progenitor BURP-domain proteins or DUF2775-domain proteins, which could be processed into small, stable, branched cyclic peptides by post-translationally modifying enzymes and proteases already present in the host plant cells. Lyciumins with favorable properties that render selective advantage to the plant host were more likely to be retained. Through core peptide mutagenesis and internal motif duplication, the ancestral lyciumins could undergo subsequent chemical optimization, amplification and diversification to yield extant lyciumins (Mylne et al., 2012). Indeed, lyciumin precursor peptides have diverse core peptide sequences that often exist as repetitive motifs in one precursor gene (FIGS. 43A-I) (Kersten and Weng, 2018). Lyciumin genotypes and chemotypes were mainly found in angiosperms prior to this study, which is also the case for cyclotides and orbitides discovered to date. Moreover, like cyclotide and orbitide precursors, all three types of lyciumin precursors contain N-terminal signal peptides that target them to the secretory pathway (FIGS. 39B and 42B), suggesting that lyciumin precursor proteins are likely directed to the vacuole to complete RiPP biosynthesis, similar to cyclotides (Conlan et al., 2011).


There are several unique aspects of lyciumins in the context of plant RiPP evolution. First, lyciumins occur in non-angiosperms, i.e., Selaginella plants, and, therefore, represent the only known RiPP family from non-angiosperms to date. Nevertheless, a report of a cyclotide-like protein domain with protease inhibitory activity in S. moellendorffii (James et al., 2017) implies that head-to-tail cyclic RiPPs may also exist in non-angiosperms and await discovery. Second, lyciumins have evolved from different precursor peptides than those that give rise to head-to-tail cyclic RiPPs. Whereas precursor peptides of cyclotides and orbitides are mainly stand-alone proteins (Jennings et al., 2001; Mylne et al., 2012) or seed storage albumins (Poth et al., 2011), lyciumin precursors are either BURP-domain proteins or DUF2775-domain proteins. The full elucidation of lyciumin biosynthetic steps such as proteolytic cleavage and cyclization in characterized lyciumin-producing plant families will reveal additional differences and similarities between the evolutionary trajectories underlying the head-to-tail cyclic RiPPs and the branched cyclic RiPPs in plants.


This study reveals a complex history of lyciumin evolution within land plants. First, lyciumin genotypes and the corresponding chemotypes were identified in distantly related lycophytes (i.e., Selaginellaceae) and angiosperms (i.e., Cannabaceae, Caryophyllales families, Fabaceae, Solanaceae), whereas no lyciumin genotypes were found in taxa immediately sister to these lyciumin-producing plant lineages nor in any ferns and gymnosperms which are two major vascular plant lineages intermediate between lycophytes and angiosperms. Specifically, no BURP-domain proteins with lyciumin-like core peptide motifs were found in the transcriptomes of the families Lycopodiaceae and Isoetaceae, the only two other extant lycophyte families besides Selaginellaceae. Similarly, lyciumin-producing BURP-domain proteins are absent from plant orders neighboring Fabales (i.e., Malpighiales and Rosales), Caryophyllales (i.e., Berberidopsidales and Cornales), and Solanales (i.e., Gentianales and Lamiales). Moreover, no lyciumin-producing DUF2775-domain proteins were found in plant orders neighboring Rosales (i.e., Fabales and Cucurbitales). Based on these observations, it is most likely that the ability to produce lyciumins arose independently in lycophytes and angiosperms. Phylogenetic reconstruction of lyciumin-producing and non-lyciumin-producing BURP-domain sequences and analysis of the cyclization residues further suggest that the recruitment of BURP-domain proteins for lyciumin biogenesis occurred independently at least four times within angiosperms. Second, although lyciumin chemotypes vary greatly between lyciumin-producing plant families, a few overlapping lyciumin chemotypes were also found (FIG. 41B), illustrating potential cases of parallel evolution. Third, the occurrence of identical lyciumins from unrelated BURP-domain and DUF2775-domain precursor proteins demonstrates a case of convergent evolution (Weng, 2014) in terms of the RiPP precursor peptide recruitment. 
Future analysis of the growing number of plant transcriptomes and genomes from all plant lineages (Cheng et al., 2018) will continue to test and refine the model of branched cyclic RiPP evolution in plants.


Lyciumins have been characterized as protease inhibitors, which suggests potential physiological functions in host defense similar to other plant RiPP classes (Hernandez et al., 2000). While specific functions of DUF2775 proteins in plants are unknown (Albornos et al., 2012), BURP domains have been associated with plant responses to abiotic stresses such as drought (Wang et al., 2012). For example, the lyciumin I precursor peptide Sali3-2 is highly expressed in G. max roots under acidic soil conditions (Ragland and Soliman, 1997), and its overexpression in Arabidopsis alleviates heavy metal stress (Tang et al., 2014). Future research will help elucidate why lyciumins evolved from BURP-domain or DUF2775-domain proteins. It is possible that lyciumins first evolved to enhance host defense or metal chelation when host plants were under certain abiotic stresses. A comprehensive understanding of the biosynthetic mechanism and evolution of lyciumin-type RiPPs in the plant kingdom will ultimately facilitate engineering of this cyclic peptide class for crop improvement and drug development.


Methods
Materials and Methods

All chemicals were purchased from Sigma-Aldrich, unless otherwise specified. Oligonucleotide primers and synthetic genes were purchased as gBlocks® from Integrated DNA Technologies, Inc. Solvents for liquid chromatography high-resolution mass spectrometry were Optima® LC-MS grade (Fisher Scientific) or LiChrosolv® LC-MS grade (Millipore). High resolution mass spectrometry analysis was performed on a Thermo ESI-Q-Exactive Orbitrap MS coupled to a Thermo Ultimate 3000 UHPLC system. Low-resolution mass spectrometry analysis was done on a Thermo ESI-QQQ MS coupled to a Thermo Ultimate 3000 UHPLC system.


Plant Material


Nicotiana benthamiana seeds for cultivation were a gift from the Lindquist lab (Whitehead Institute, MIT). Selaginella uncinata plants were purchased from Plant Delights Nursery, Inc. Celtis occidentalis leaves were collected from a living tree (Accession No. 7894*A) on Aug. 3, 2018 in the Arnold Arboretum of Harvard University (Project No. 25-2018). Achyranthes bidentata seeds for cultivation were purchased from Frozen Seed Capsules™. Chenopodium quinoa seeds for cultivation were purchased from Earthcare Seeds.


Plant Cultivation


Achyranthes bidentata, Chenopodium quinoa and Nicotiana benthamiana were grown in Sun Gro® Propagation Mix soil with added vermiculite (Whittemore Inc.) and added fertilizer in a greenhouse with a 16 h light/8 h dark cycle for two to six months.


Transcriptome Assembly and Transcriptome Mining of Lyciumin Precursor Genes

For comparative assembly of lyciumin precursor genes, selected plant transcriptome datasets from the NCBI Sequence Read Archive (Lycium barbarum—SRR6896657, Amaranthus hypochondriacus—SRR1598913, Chenopodium quinoa—ERR2040214, Solanum melongena—SRR1104129, Medicago truncatula—SRR5732302, Nicotiana attenuata—SRR1950612) were assembled in triplicate with Trinity (v2.6.6, Grabherr et al., 2011) or rnaSPAdes (v1.0, kmer=25,75 or 55, Bankevich et al., 2012; Bushmanova et al., 2018). Target lyciumin precursor peptides (Lycium barbarum—MH124242, Amaranthus hypochondriacus—AHYPO_007393, Chenopodium quinoa—XP_021740703.1, Solanum melongena—Sme2.5_02115.1_g00002.1, Medicago truncatula—Medtr8g045890, Nicotiana attenuata—OIT08186.1) were searched in the corresponding de novo transcriptome assemblies by tblastn on an internal Blast server (Priyam et al., 2015).


For large-scale transcriptome mining of lyciumin precursor genes, land plant transcriptome datasets from the 1 kp database (Matasci et al., 2014, Table S1) were assembled by rnaSPAdes (v1.0, kmer=25,75). If an rnaSPAdes (kmer=25,75) assembly failed, the transcriptome was assembled with rnaSPAdes (kmer=55) if possible. The resulting rnaSPAdes contig files of the 1 kp transcriptomes were searched on an internal Blast server by blastn for (A) homologs of type 1 lyciumin precursor Sali3-2 (GenBank: AAB66369) or (B) homologs of type 2 lyciumin precursor LbaLycA (GenBank: MH124242). Candidate lyciumin precursors were identified by the presence of a lyciumin core peptide motif defined as QP(X)5W (glutamine and proline, followed by any five amino acids X, followed by tryptophan) in the N-terminal domain (type 2 lyciumin precursor) or within the BURP domain (type 1 lyciumin precursor).
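The motif screen described above can be illustrated with a short sketch. It assumes the candidate sequences are already translated protein strings; the function name `find_core_peptides` and the use of a regular expression are illustrative choices, not the procedure actually used for the mining.

```python
import re
from typing import List, Tuple

# Core peptide motif from the text: Q, P, any five residues (X), then W.
# A lookahead is used so that overlapping candidate motifs are all reported.
CORE_MOTIF = re.compile(r"(?=(QP[A-Z]{5}W))")


def find_core_peptides(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched core peptide) for each motif hit."""
    return [(m.start() + 1, m.group(1)) for m in CORE_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    # Short fragment taken from the Glycine max precursor (SEQ ID NO: 2).
    fragment = "KRPYAQPYGVYTWLTDIKD"
    for position, core in find_core_peptides(fragment):
        print(position, core)
```

A hit only marks a candidate; whether the motif lies in the N-terminal domain or within the BURP domain still has to be checked against a domain annotation.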


Cloning of Selaginella uncinata Lyciumin Precursor Gene SunBURP


Illumina sequence raw files of a Selaginella uncinata transcriptome (NCBI-SRA: SRR7132763) were combined and assembled by rnaSPAdes (v1.0, kmer 25,75, Bankevich et al., 2012; Bushmanova et al., 2018). The Selaginella uncinata root transcriptome was analyzed for lyciumin precursor genes with the blastp algorithm on an internal Blast server (Priyam et al., 2015). Root tissue was removed from a Selaginella uncinata plant and total RNA was extracted with the QIAGEN RNeasy Plant Mini kit. cDNA was prepared from root total RNA with SuperScript® III First-Strand Synthesis System (Invitrogen). Transcripts homologous to target lyciumin precursor LbaLycA were used to design cloning primers (SunBURP-pEAQ-fwd (SEQ ID NO: 128): TGCCCAAATTCGCGACCGGTATGGCATCTAATCTCCTTTACTTGC, SunBURP-pEAQ-rev (SEQ ID NO: 129): CCAGAGTTAAAGGCCTCGAGTTACCACACAATGGTTTCGTACC) for amplification of precursor peptide gene SunBURP with Phusion® High-Fidelity DNA polymerase (New England Biolabs). SunBURP was cloned into pEAQ-HT (Sainsbury et al., 2009), which was linearized by restriction enzymes AgeI and XhoI, by Gibson cloning assembly (New England Biolabs) (Gibson et al., 2009). Cloned SunBURP was sequenced by Sanger sequencing from pEAQ-HT-SunBURP.


Peptide Chemotyping

For peptide chemotyping, 0.2 g plant material (fresh weight) was frozen and ground with mortar and pestle. Ground plant material was extracted with 10 mL methanol for 1 h at 37° C. in a glass vial. Plant methanol extract was dried under nitrogen gas in a separate glass vial. Dried plant methanol extract was resuspended in water (10 mL) and partitioned with hexane (2×10 mL) and ethyl acetate (2×10 mL), and subsequently extracted with n-butanol (10 mL). The n-butanol extract was dried in vacuo and resuspended in 2 mL methanol for liquid chromatography-mass spectrometry (LC-MS) analysis. Peptide extracts were subjected to high-resolution MS analysis with the following LC-MS parameters: LC: Phenomenex Kinetex® 2.6 μm C18 reverse phase 100 Å 150×3 mm LC column; LC gradient: solvent A, 0.1% formic acid; solvent B, acetonitrile (0.1% formic acid); 0-2 min: 5% B, 2-23 min: 5-95% B, 23-25 min: 95% B, 25-30 min: 5% B; 0.5 mL/min; MS: positive ion mode; Full MS: resolution 70000, mass range 425-1250 m/z; dd-MS2 (data-dependent MS/MS): resolution 17500, loop count 5, collision energy 15-35 eV (stepped), dynamic exclusion 1 s. LC-MS data of peptide extracts from a predicted lyciumin-producing plant were analyzed for lyciumin mass signals by (a) parent mass search (base peak chromatogram of calculated [M+H]+ of predicted lyciumin structure, Δm=5 ppm), (b) fragment mass search of pyroglutamate-proline-b-ion in MS/MS data (C10H13N2O3+, 209.09207 m/z, Δm=5 ppm), and (c) iminium ion mass search of specific amino acids of predicted structure in MS/MS data (for example, pyroglutamate iminium ion [M+H]+ 84.04439 m/z). Putative mass signals of predicted lyciumin structures were confirmed by MS/MS data analysis with QualBrowser in the Thermo Xcalibur software package (version 3.0.63, ThermoScientific).
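As an illustration of the mass searches above, the following sketch recomputes the two diagnostic fragment masses from their elemental formulas and applies a 5 ppm tolerance. The helper names `cation_mz` and `within_ppm` are hypothetical, the pyroglutamate iminium ion is taken to be C4H6NO+ (an assumption consistent with the quoted m/z), and the atomic masses are standard monoisotopic values rounded to eight decimals.

```python
# Monoisotopic atomic masses (u) and the electron mass (u).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON = 0.00054858


def cation_mz(formula: dict, charge: int = 1) -> float:
    """m/z of a cation whose elemental formula already includes all atoms of the ion."""
    neutral_sum = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    # Remove one electron mass per positive charge, then divide by the charge.
    return (neutral_sum - charge * ELECTRON) / charge


def within_ppm(observed: float, theoretical: float, tol_ppm: float = 5.0) -> bool:
    """True if the observed m/z lies within tol_ppm of the theoretical m/z."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


if __name__ == "__main__":
    pglu_pro_b_ion = cation_mz({"C": 10, "H": 13, "N": 2, "O": 3})
    pglu_iminium = cation_mz({"C": 4, "H": 6, "N": 1, "O": 1})
    print(f"{pglu_pro_b_ion:.5f}")
    print(f"{pglu_iminium:.5f}")
    print(within_ppm(209.0925, pglu_pro_b_ion))
```

The same tolerance check applies to the parent mass search once the [M+H]+ value of a predicted lyciumin structure has been calculated.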


Transient Expression of Lyciumin Precursor Genes Sali3-2-[QPFGVFGW] and Sali3-2-[QPYGVFAW] in Nicotiana benthamiana


Sali3-2-[QPFGVFGW] and Sali3-2-[QPYGVFAW] were cloned into pEAQ-HT (Sainsbury et al., 2009), which was linearized by restriction enzymes AgeI and XhoI, by Gibson cloning assembly (New England Biolabs) (Gibson et al., 2009). Agrobacterium tumefaciens LBA4404 was transformed with pEAQ-HT-Sali3-2-[QPFGVFGW] or pEAQ-HT-Sali3-2-[QPYGVFAW] by electroporation (2.5 kV), plated on YM agar (0.4 g yeast extract, 10 g mannitol, 0.1 g sodium chloride, 0.2 g magnesium sulfate (heptahydrate), 0.5 g potassium phosphate (dibasic, trihydrate), 15 g agar, made up to 1 L with Milli-Q Millipore water, adjusted to pH 7) with 100 μg/mL rifampicin, 50 μg/mL kanamycin and 100 μg/mL streptomycin and incubated for two days at 30° C. A 5 mL starter culture of YM medium with 100 μg/mL rifampicin, 50 μg/mL kanamycin and 100 μg/mL streptomycin was inoculated with a clone of Agrobacterium tumefaciens LBA4404 pEAQ-HT-Sali3-2-[QPFGVFGW] or pEAQ-HT-Sali3-2-[QPYGVFAW] and incubated for 24-36 h at 30° C. on a shaker at 225 rpm. Subsequently, the starter culture was used to inoculate a 50 mL culture of YM medium with 100 μg/mL rifampicin, 50 μg/mL kanamycin and 100 μg/mL streptomycin, which was incubated for 24 h at 30° C. on a shaker at 225 rpm. The cells from the 50 mL culture were centrifuged for 30 min at 3000 g, the YM medium was discarded and cells were resuspended in MMA medium (10 mM MES KOH buffer (pH 5.6), 10 mM magnesium chloride, 100 μM acetosyringone) to give a final optical density of 0.8. The Agrobacterium suspension was infiltrated into the bottom of leaves of Nicotiana benthamiana plants (six weeks old). N. benthamiana plants were placed in the shade two hours before infiltration. After infiltration, N. benthamiana plants were grown as described above for six days. Subsequently, infiltrated leaves were collected and subjected to chemotyping.
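The resuspension step above amounts to a simple dilution calculation. The sketch below assumes that optical density scales linearly with cell density over the range used, which is an approximation; the function name and the example readings are hypothetical and are not values from the protocol.

```python
def final_volume_for_target_od(measured_od: float, volume_ml: float,
                               target_od: float = 0.8) -> float:
    """Total volume (mL) that brings a suspension of known OD to the target OD.

    Uses C1 * V1 = C2 * V2 and assumes a linear OD-to-density relationship.
    """
    if measured_od <= 0 or volume_ml <= 0 or target_od <= 0:
        raise ValueError("all inputs must be positive")
    if target_od > measured_od:
        raise ValueError("suspension is already more dilute than the target")
    return measured_od * volume_ml / target_od


if __name__ == "__main__":
    # Hypothetical example: cells resuspended in 10 mL MMA medium read OD 4.0.
    total_ml = final_volume_for_target_od(4.0, 10.0)
    print(f"dilute to {total_ml:.1f} mL total, i.e. add {total_ml - 10.0:.1f} mL MMA medium")
```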


Phylogenetic Analysis of Lyciumin Precursor Genes (BURP Domains)

Protein sequences of characterized and predicted lyciumin precursors from genomes (Kersten et al., 2018, except 3′-partial sequences) and transcriptomes (FIGS. 43A-I, precursors with full-length BURP domains only) and four founding members of the BURP domain family (NP_001303011.1: BURP domain-containing protein BNM2A precursor [Brassica napus], NP_001234835.1: Polygalacturonase-1 non-catalytic subunit beta precursor [Solanum lycopersicum], CAA31603.1/CAA31602.1: Embryonic abundant protein USP87/Embryonic abundant protein USP92 [Vicia faba], NP_197943.1: BURP domain protein RD22 [Arabidopsis thaliana]) (Bassüner et al., 1988, Yamaguchi-Shinozaki et al., 1993, Zheng et al., 1992, Boutilier et al., 1994) were reduced to their BURP domain (Pfam PF03181) and aligned using the MUSCLE algorithm (Edgar, 2004) in MEGA (ver. 7.0.9) (Kumar et al., 2016). A maximum-likelihood phylogenetic tree was generated with 1000 bootstrap generations using the p-distance method (Nei et al., 2000) in MEGA.
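For readers unfamiliar with the p-distance named above, it is the proportion of aligned sites at which two sequences differ. The sketch below is a minimal illustration and not MEGA's implementation; skipping any site where either sequence has a gap (pairwise deletion) is an assumption, and the function name is hypothetical.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence carries a gap character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # gapped site: not counted
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


if __name__ == "__main__":
    # Two native core peptides from the listing (SEQ ID NO: 2 and SEQ ID NO: 1).
    print(p_distance("QPYGVYTW", "QPWGVGSW"))
```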


Accession Numbers

LC-MS datasets (MassIVE) (Wang et al., 2016): MSV000083215 (Celtis occidentalis leaf), MSV000083216 (Achyranthes bidentata seed), MSV000083217 (Selaginella uncinata root). GenBank: SunBURP—MK089798.










SEQUENCES



LbaLycA (SEQ ID NO: 1):


MELHHHYFFILLSLAFIASHAANLSPEVYWKVKLPNTPMPRPIKDALHYSEASEGDV





HKLRQPWGVGSWYQAANEGDIKKLRQPYGVGIWYQAANEGDVKKLRQPWGVGS





WYQAANEGDVKKLRQPWGVGSWYQAANEGDVKKLRQPWGVGSWYQAANEGDA





NEGDVKKLRQPYGVGIWYQAANEGDVKKLRQPWGVGSWYQAANEGDVKKLRQP





WGVGSWYQAANEGDVKKLHQPWGVGSWYQAANEGDVKKLPQPWGVGSWYQAA





NEGDVKKLRQPYGVGIWYEAANEGQVKKLRQPYGVGSWYNTATKKDVNENLPVT





PYFFETDLHQGKKMNLPSLKNYNPAPILPRKVADSIPFSSDKIEEILKHFSIDKDSEGA





KMIKKTIKMCEEQAGNGEKKYCATSLESMVDFTSSYLGTNNIIALSTLVEKETPEVQI





YTIEEVKEKANGKGVICHKVAYPYAIHYCHSVGSTRTFMVSMVGSDGTKVNAVSEC





HEDTAPMNPKALPFQLLNVKPGDKPICHFILDDQIALVPSQDATQVSEN






Glycine max: Glyma.12G217400 Org_Gmax peptide: Glyma.12G217400.1.p (1 of 11) PTHR31236:SF2-DEHYDRATION-RESPONSIVE PROTEIN RD22 (PAC:30547846) (SEQ ID NO: 2):


MEFRCSVISFTILFSLALAGESHVHASLPEEDYWEAVWPNTPIPTALRELLKPLPAGVE





IDELPKQIDDTQYPKTFFYKEDLHPGKTMKVQFTKRPYAQPYGVYTWLTDIKDTSKE





GYSFEEICIKKEAFEGEEKFCAKSLGTVIGFAISKLGKNIQVLSSSFVNKQEQYTVEGV





QNLGDKAVMCHGLNFRTAVFYCHKVRETTAFMVPLVAGDGTKTQALAVCHSDTSG





MNHHMLHELMGVDPGTNPVCHFLGSKAILWVPNLSMDTAYQTNVVV





BURP domain-containing protein 5-like (CanBURP, Capsicum annuum) (SEQ ID NO: 3):


MAMLYQYYFFTLLSLVFVVISHAANLSPEVYWKIKLPNTPMPKPIKDALHISEKTSQP





YGGLTWDWFHVFSKNELHKLHQLSQPYGVYFYGVSLKNLNEDHLVTRFFFETDLH





QGKKVNLKSLKNNNPAPLLPRKVVDSISFSSNRIEEILDHFSVDNNSEDAKVIKRTVE





LCEQPAADGEIKYCATSLESIIDFASSRLETNNILAIHTEVEKETPVLQTYTIKEVKEKA





NGKCVICHKVPYPYAVHFCHDVGSTRAFRVTMVGADGTKVNAVSVCHEDTASMNP





KALVFQLLNIKPGDKPICHFIMDDQIALFPSQNAVLQMAEG





AUR62017096-RA [CquBURP1, Chenopodium quinoa] [glutamine cyclotransferase] (SEQ ID NO: 4):


MLKFLYFPFAYYLHSLAEVEAFLVMSSSYFGEGLTLVGERLYQLTYDQNTGFIYDRT





TLSKVSNSGVPLLLTIGIFNHQMKDGWGLTTDGKIMFGSDGSSTLYHIDPRTMKVIKR





QNVRYKDLDVHYLNELEYVHGEVWANVFRTDCIIRISPEDGTVLGWILLPMLRERLE





AAGEIESEDVLNGIAWDSDGKRIFVTGKLWPKLFEIKVHSSNDHSQVDIERMCIQML





TRLEGMK





XP_010675926.1 PREDICTED: glutaminyl-peptide cyclotransferase isoform X1 [Beta vulgaris subsp. vulgaris] (SEQ ID NO: 5):


MASECILVPCYKRLSRAVSIACLLGFLVPLSILSNTLSALPLDSQKNIQLPQIYTIEVVN





VYPHDPRAFTEGLLYGGNNTLYESTGLYGMSTVRRVTLQTGKVEALQTMDLSYFGE





GLTLVDERLYQLTYEHNTGFIHDRSNLSKVRNSGNPFLFCWNLSFEHSCSPTGHFCLD





LDLSRVGMLQEDELIDFYTLQSPCYSR





XP_010675927.1 PREDICTED: glutaminyl-peptide cyclotransferase isoform X1 [Beta vulgaris subsp. vulgaris] (SEQ ID NO: 6):


MASECILAPCYKRLSRAVSIACLLGFLVPLSILSNTLSALPLDSQKNIQLPQIYTIEVVN





VYPHDPRAFTEGLLYGGNNTLYESTGLYGMSTVRRVTLQTGKVEAFQTMDLSYFGE





GLTLVDERLYQLTYEHNTGFIYDRSNLSKIGQFTHQMADGWGLASDGKVLFGSDGS





STLYQIDPKTMKEIQRQTVRYMDLDVPYLNELEYVNGEVWANVATTDCIVRISPEDG





TVLGWILLPILRERMMADGELDVFDILNGIAWDKDEQRVFVTGKCWPKVFEIKVNQ





SKDHSDADVRRLCIPVPASVEAMK





LbaQC1 [Lycium barbarum] [glutamine cyclotransferase] (SEQ ID NO: 7):


MPLLNPRFLVISLIVLLSITVFREAEASYRVYKVKVVNEFPHDPQAYTQGLLYAENNT





LFESTGLYGRSSVRKVALLDGKVERLHEMESSYFGEGLTLLGERLFQLTWLLDTGFI





YDRYNFSKFKKFTHhMQDGWGLATDGKVLFGSDGTSTLYKIDPKTMKVIRKQVVK





SQGHEVRYLNELEYVKAEVWANVYVTDCIARISPKDGTVIGWILLQSLREELISRGY





KDFEVLNGIAWDRDGDRIFVTGKLWPKLFEIKLLPLTPNDPLAGEINNLCIPKTSFL





LEI





Sali3-2 (SEQ ID NO: 8):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTATATACATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA
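The bracketed names of the Sali3-2 variants in this listing give the core peptide encoded at the position of the native core. As a sketch of how that correspondence can be checked, the code below translates the core-encoding stretch of SEQ ID NO: 8 and of SEQ ID NO: 9 with the standard genetic code. The helper `translate` is illustrative and assumes the input is already in frame.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    native_core = "CAACCTTATGGTGTATATACATGG"   # from Sali3-2 (SEQ ID NO: 8)
    variant_core = "CAACCTTGGGGTGTATATACATGG"  # from SEQ ID NO: 9
    print(translate(native_core))
    print(translate(variant_core))
```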





Sali3-2-[QPWGVYTW] (SEQ ID NO: 9):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTGGGGTGTATATACATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYGVYFW] (SEQ ID NO: 10):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTATATTTCTGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPWGVGAW] (SEQ ID NO: 11):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTGGGGTGTAGGTGCATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPFGVYTW] (SEQ ID NO: 12):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTTTGGTGTATATACATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPWGVGTW] (SEQ ID NO: 13):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTGGGGTGTAGGTACATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[EPYGVYTW] (SEQ ID NO: 14):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCAGAACCTTATGGTGTATATACATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QAYGVYTW] (SEQ ID NO: 15):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAAGCTTATGGTGTATATACATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPAGVYTW] (SEQ ID NO: 16):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTGCTGGTGTATATACATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPFGFFSW] (SEQ ID NO: 17):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTTTGGTTTCTTCTCATGGTTAACGGATATT





AAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGAA





GCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGGT





TTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYGVYW] (SEQ ID NO: 18):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTATATTGGTTAACGGATATTAA





AGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGAAGC





GTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGGTTTT





GCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCAATA





AGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCAGTG





ATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCCGTG





AAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCAGG





CACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCATGA





ACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCAAG





GCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGTTG





TTGTTTAA





Sali3-2-[QPYGVYTAW] (SEQ ID NO: 19):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTATATACAGCATGGTTAACGGA





TATTAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAA





AGAAGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAAT





TGGTTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTT





GTCAATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAA





GCAGTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAG





TCCGTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAAC





TCAGGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTT





CATGAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAA





GCAAGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAA





CGTTGTTGTTTAA





Sali3-2-[QPWGVYSW] (SEQ ID NO: 20):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTGGGGTGTATATTCATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYAVYTW] (SEQ ID NO: 21):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGCTGTATATACATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYTVYTW] (SEQ ID NO: 22):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATACTGTATATACATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYGAYTW] (SEQ ID NO: 23):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGCATATACATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYGVATW] (SEQ ID NO: 24):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTAGCTACATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYGVYAW] (SEQ ID NO: 25):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTATATGCATGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYGVYTA] (SEQ ID NO: 26):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTATATACAGCGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYGVYTY] (SEQ ID NO: 27):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTATATACATACTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QYGVYTW] (SEQ ID NO: 28):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAATATGGTGTATATACATGGTTAACGGATATTAA





AGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGAAGC





GTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGGTTTT





GCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCAATA





AGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCAGTG





ATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCCGTG





AAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCAGG





CACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCATGA





ACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCAAG





GCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGTTG





TTGTTTAA





Sali3-2-[QGVYTW] (SEQ ID NO: 29):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAAGGTGTATATACATGGTTAACGGATATTAAAGA





CACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGAAGCGTT





TGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGGTTTTGCC





ATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCAATAAGC





AAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCAGTGATGT





GTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCCGTGAAAC





AACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCAGGCACTT





GCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCATGAACTCA





TGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCAAGGCCAT





TTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGTTGTTGTTT





AA





Sali3-2-[QAPYGVYTW] (SEQ ID NO: 30):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAAGCACCTTATGGTGTATATACATGGTTAACGGA





TATTAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAA





AGAAGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAAT





TGGTTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTT





GTCAATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAA





GCAGTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAG





TCCGTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAAC





TCAGGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTT





CATGAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAA





GCAAGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAA





CGTTGTTGTTTAA





Sali3-2-[QPYGVYTF] (SEQ ID NO: 31):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTATATACATTCTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYGVYTH] (SEQ ID NO: 32):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTATATACACACTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYGVYFY] (SEQ ID NO: 33):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTATATTTCTACTTAACGGATATT





AAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGAA





GCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGGT





TTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





AHYPO_007393 Org_Ahypochondriacus peptide: AHYPO_007393-RA (1 of 19) PF03181-BURP domain (BURP) (PAC:32833029) (SEQ ID NO: 34):


MAMDLRLQFPALFLLTFLALHASSCKQEDYWKMKLPKVPMPEAIKQSLLHSGGENK





LKDDSALKQPYTVGSWKYDVDTNKVKDDSVVKQPYTVGSWKYDADKNKVPDESA





LKQPYTVFSWKYDAGENKVKDESALKQPYTVFSWKYDAGENKVKDESALKQPYTV





FSWKYDAGENKVKDESALKQPYTVFSWKYDAGENKVKDESALKQPYTVFSWKYD





AGENKVKDESALKQPYTVFSWKYDAGENKVKDESALKQPYTVFSWKYDAGENKV





KDESALKQPYTVFSWKYDAGENKVKDESALKQPYTVFSWKYDAGENKVKDESALK





QPYTVFSWKYDAGENKVKDESALKQPYTVFSWKYDAGENKVKDESALKQPYTVGS





WKYDAGENKVKDESALKQPYTVGSWKYNENDESKQASPHHLHHHKLMHDNVNSK





DQEDLTDGSVFFVEKSLHIGSKLKHDFQKTPKTSFLSKQEAQSIPFSMEKIGDILNLTC





AQSMEDIVDFVVGELGTNEVEIKMMNNNIEVPNGIQDYVLSKVEKLVVPGNTAVAC





HRMSYPYIVYYCHHQQDIGQYNVTLVSPSTGAAFQTTAVCHYDTYAWQPDVVALK





YLGIRPGDAPVCHFSAINDMFWNRKNNDFKSLDMVQ*





AUR62017095 Org_Cquinoaearly-release peptide: AUR62017095-RA BURP5: BURP domain-containing protein 5 (PAC:36309717) (SEQ ID NO: 35):


MSLYSNDADKAKKANTNQPFTVVGWKYNADGAKERVGMSQPYTVMAWKYNVD





DAKERVGIDQPYTVWGWNYNTDSANKEKVKEAYKPLSIETNTKKTGIDQPYTVWG





WNYNTNSANKEKVKEAEKPLSIETNTKKTGVDQPYTVWGWNYNTNNANKEKVKE





AEKPLSIETNTKKTGIDQPYTVWGWNYNTDSANKEKVKEANKPLSIETDTKKTGIDQ





PYTVWGWNYNTDSANKEKVKEAEKSLSIETNTKKTGIDQPYTVWGWNYNTNSGNK





EKVKEADKVFTMDTSTKKAGTKQPYTVMGWKYNADNGKREKVGHEVSVGSVFFIE





KSLRLGDKLKHDFQKTPSVPFLPKHIAKSIPFSEDKFTEILNLFSIKPGSVEATGIKGTL





DVCLHRPKVEKENRTCAQSMEDVVDFVVRELGSNDVELRMMKNDIEVPKGIQDYVI





TKVKKLVVPGNTAAACHRMSYPYVVYYCHHQQDIGHYDVTLVSPTTGNAIQTTAV





CHYDTYAWKPNVPALQYLGIRPGDAPVCHFSAINDMFWSLKANSKSLDMVV*





Glyma.12G217300 Org_Gmax peptide: Glyma.12G217300 PTHR31236:SF2-DEHYDRATION-RESPONSIVE PROTEIN RD22 (PAC:30548760) (SEQ ID NO: 36):


MALRCLVMSLSVLFTLGLARESHARDEDFWHAVWPNTPIPSSLRDLLKPGPASVEID





DHPMQIEETQYPKTFFYKEDLHPGKTMKVQFSKPPFQQPWGVGTWLKEIKDTTKEG





YSFEELCIKKEAIEGEEKFCAKSLGTVIGFAISKLGKNIQVLSSSFVNKQDQYTVEGVQ





NLGDKAVMCHRLNFRTAVFYCHEVRETTAFMVPLVAGDGTKTQALAICHSNTSGM





NHQMLHQLMGVDPGTNPVCHFLGSKAILWVPNLSVDTAYQTNIVA*





PGSC0003DMG400047074 Org_Stuberosum peptide: PGSC0003DMP400069178 BURP domain-containing protein (PAC:37467747, SEQ ID NO: 38):


MELHHQYYFFTFFSVIFVVSHAANLSPEVYWRVKLPNTPMPTPIKDALHISDGIRLPL





RTSFTKYANHGEWVDGIRLPFENELHKVRQPWGVDSWYQAAPENELHKVRQPYGV





GVWYNDAAKKDLNDNHPVTPYFFETDLHQGKKMNLQSLKNYNPAPILPRKVVDSI





AFSSDKIEEILNHFSVDKDSERAKDIKKTIKTCEEPAGNGEVKHCATSLESMIDFTLSH





LGTNNIIAMSTEVEKETPEVQAYTIEKVEEKANGKGVVCHKVAYPYAVHFCHDVGS





TRTFMVSMVGADGTKVNAVSVCHEDTASMNPKALPFQLLNVKPGDKPICHFTLDD





QIALFPSQNAVLQVAEN*





Medtr2g081590 Org_Mtruncatula peptide: Medtr2g081590.1 PTHR31236:SF2-DEHYDRATION-RESPONSIVE PROTEIN RD22 (PAC:31067976, SEQ ID NO: 39):


MGFQHLLIFISVLSLALAGGSHASVPEEEYWEAVWPNTPIPTSLLELLKPGPKGVEID





DLPTEIDDTQFPTNFFYEHELYPGKTMNMQFSKRPLAQPYGVYFWMHDIKDLQKEG





YTIDEMCVKNKPKKVEEKFCAKSLGTLIGFAISKLGKNIQSLSSSFIDKHEQYKIESVQ





NLGDKAVMCHRLNFQKVVFYCHEVHGTTAFKVPLVANDGTKTHAIATCHADISGM





NQHMLHQIMKGDPGSNHVCHFLGNKAILWVPNLGLDNAYGANAAL*





Medtr2g081610 Org_Mtruncatula peptide: Medtr2g081610.1 PTHR31236:SF2-DEHYDRATION-RESPONSIVE PROTEIN RD22 (PAC:31064010, SEQ ID NO: 40):


MELKHILIFISVLSLALAGGSHASLPEEEYWEAVWPNTPIPSSLRELLKPGPEGVEIDD





LPMEVDDTQYPKTFFYEHELYPGKTMKVQFSKRPFAQPYGVYTWMREIKDIEKEGY





TFNEVCVKKAAAEGEQKFCAKSLGTLIGFSISKLGKNIQALSSSFIDKHEQYKIESVQN





LGEKAVMCHRLNFQKVVFYCHEIHGTTAFMVPLVANDGRKTQALAVCHTDTSGMN





HEMLQQIMKADPGSKPVCHFLGNKAILWVPNLGLDNAYGANAAV*





CanBURP nucleotide sequence (SEQ ID NO: 37):


ATGGCGATGCTTTACCAATATTACTTCTTCACACTTCTTTCTCTTGTTTTTGTCGTA





ATTAGTCATGCAGCAAATTTATCTCCTGAGGTGTATTGGAAAATCAAACTACCC





AATACTCCTATGCCCAAACCTATCAAGGATGCCCTACACATTTCTGAGAAAACGT





CCCAACCATATGGAGGTCTTACTTGGGATTGGTTTCACGTTTTCTCCAAGAACGAG





CTACACAAATTACACCAATTAAGCCAACCATATGGAGTGTACTTTTATGGTGTTT





CTTTGAAAAACCTTAATGAAGATCACCTAGTTACACGTTTCTTTTTTGAAACCGAT





TTACATCAAGGGAAAAAAGTGAATCTTAAGTCGCTCAAAAACAACAATCCAGCT





CCCCTTTTGCCTCGCAAAGTTGTAGATTCCATCTCTTTCTCATCGAACAGAATTGA





GGAAATTCTTGATCACTTTTCTGTTGACAACAATTCAGAAGATGCTAAAGTGATCA





AGAGAACAGTCGAACTCTGTGAACAGCCCGCAGCTGATGGAGAGATAAAATATTGT





GCCACTTCCTTGGAATCTATAATTGATTTCGCCTCATCTCGCTTGGAAACAAACA





ATATTTTGGCAATTCACACCGAGGTAGAGAAGGAAACTCCAGTGCTGCAAACATAT





ACTATCAAAGAAGTGAAAGAGAAAGCAAACGGTAAATGTGTCATATGCCACAAA





GTACCTTACCCATATGCAGTACACTTTTGCCATGATGTAGGAAGCACCAGGGCTT





TTAGGGTCACTATGGTGGGTGCTGATGGAACAAAAGTTAATGCAGTATCAGTCTGC





CATGAGGATACTGCATCCATGAATCCTAAGGCATTGGTTTTTCAGTTGCTCAATA





TTAAGCCCGGAGATAAGCCTATTTGCCATTTTATTATGGATGATCAAATTGCCCTGT





TTCCTTCACAAAACGCAGTTCTTCAAATGGCTGAAGGCTAA





LbaLycA (Lycium barbarum) nucleotide sequence (SEQ ID NO: 125):


ATGGAGTTGCATCACCATTACTTCTTCATACTTCTTTCTCTTGCTTTTATAGCAAG





TCATGCAGCTAATTTATCTCCTGAGGTGTATTGGAAAGTCAAGCTGCCCAACACT





CCTATGCCCAGACCCATTAAGGATGCTCTACACTATTCTGAAGCCTCCGAGGGTG





ACGTTCACAAGTTGCGCCAACCATGGGGAGTGGGTTCGTGGTATCAAGCAGCAA





ACGAGGGTGATATTAAAAAATTACGCCAACCATATGGAGTTGGTATATGGTATC





AAGCAGCAAACGAGGGTGATGTTAAAAAATTACGCCAACCATGGGGAGTTGGTT





CCTGGTATCAAGCAGCAAACGAGGGTGATGTTAAAAAATTACGCCAACCATGGG





GAGTGGGTTCCTGGTATCAAGCAGCAAACGAGGGTGATGTTAAAAAATTACGCC





AACCATGGGGAGTGGGTTCCTGGTATCAAGCAGCAAACGAGGGTGATGCAAATG





AGGGTGATGTTAAAAAATTACGCCAACCATATGGAGTTGGTATATGGTATCAAG





CAGCAAACGAGGGTGATGTTAAAAAATTACGCCAACCATGGGGAGTGGGTTCTT





GGTATCAAGCAGCAAACGAGGGTGATGTTAAAAAATTACGCCAACCATGGGGAG





TGGGTTCCTGGTATCAAGCAGCAAACGAGGGTGATGTTAAAAAATTACACCAAC





CATGGGGAGTGGGTTCCTGGTATCAAGCAGCAAACGAGGGTGATGTTAAAAAAT





TACCCCAACCATGGGGAGTGGGTTCCTGGTATCAAGCAGCAAACGAGGGTGATG





TTAAAAAATTACGCCAACCATATGGAGTTGGTATATGGTATGAAGCAGCAAACG





AGGGTCAAGTTAAAAAATTACGCCAACCCTATGGAGTGGGTTCGTGGTATAATA





CTGCTACAAAGAAAGATGTTAATGAAAACCTCCCAGTCACCCCTTACTTTTTTGA





AACAGATTTACATCAAGGGAAAAAGATGAATCTTCCATCTCTCAAAAATTATAAT





CCAGCTCCCATTTTGCCTCGCAAAGTTGCAGATTCCATCCCCTTCTCATCAGACA





AGATTGAAGAAATTCTAAAGCACTTTTCCATTGATAAGGACTCAGAGGGGGCTA





AAATGATCAAGAAAACTATCAAAATGTGTGAGGAGCAAGCGGGTAATGGCGAG





AAGAAATATTGTGCCACTTCCTTAGAATCAATGGTTGATTTCACCTCATCTTATCT





GGGAACAAATAATATTATAGCACTTTCCACTTTAGTAGAGAAGGAAACTCCAGA





GGTGCAAATATATACCATCGAAGAAGTGAAAGAGAAAGCAAATGGCAAAGGCG





TGATATGCCACAAAGTGGCTTACCCGTATGCGATACATTATTGCCATAGTGTAGG





AAGCACAAGGACCTTTATGGTCTCAATGGTGGGTTCTGATGGAACAAAAGTTAAT





GCAGTATCAGAGTGTCATGAGGATACTGCACCCATGAACCCTAAGGCATTGCCTT





TTCAATTGCTCAACGTTAAGCCAGGAGATAAACCTATTTGCCATTTCATATTGGA





TGATCAGATTGCCTTAGTTCCTTCTCAAGACGCAACTCAAGTGTCTGAAAACTAA





StuBURP (Solanum tuberosum) nucleotide sequence (SEQ ID NO: 126):


ATGGAGTTGCTTCACCAATATTATTTCTTCACATTTTTTTCTGTAATTTTTGTGGTA





AGTCATGCAGCAAATTTATCTCCTGAGGTGTATTGGAGAGTCAAATTGCCTAATA





CTCCCATGCCCACACCTATCAAAGATGCACTACACATTTCTGAGAAAACTGCATA





TAATGGAGATGGAAACACCAAAATATCCCAACCATATGGAGTGTTTGCATGGTA





TCAGGCTGCCTCCGAGAATGAGCTTCACAAAGTACGCCAACCATATGGAGTGGA





TGGATGGTACAAGGCTGCCTCCGAGAATGAGCTTCACAAAGTACGCCAACCATA





TGGAGTGTTTGCATGGTACAAGGCTATCACCGAGAATGAGCTTCACAAAGTACG





CCAACCATATGGAGTGTTTGCATGGTACAAGGCTGCCACCGAGAATGAGCTTCA





CAAAGTACGCCAACCATATGGAGTGTTTGCATGGTACAAGGCTGCCTCCGAGAA





TGTGCTTCACAAAGTACGCCAACCATATGGAGTGTTTGCATGGTACAATGATGCT





GCTAAGAAAGATCTTAATGACAATCACCCAGTGACGCCATACTTCTTTGAAACAG





ATTTACATCAAGGGAAAAAAATGAATCTTCAGTCTCTCAAAAACTACAATCCAG





CACCCATTTTGCCACGCAAAGTTGTAGATTCAATTGCTTTCTCATCGGACAAAAT





TGAGGAAATTCTTAATCACTTCTCTGTTGATAAGGACTCGGAACGTGCTAAAGAC





ATCAAGAAAACAATCAAAATGTGTGAAGAGCCTGCGGGTAACGGAGAGGTAAA





ACATTGTGCCACTTCTTTGGAATCTATGATTGATTTCACCTTATCTCACCTGGGAA





CAAACAATATTGTAGCAATTTCCACTGAAGTAGACAAGGAAACTCCAGAGGTGC





AAACATATACCATCGAAAAAGTGGAAGAGAAAGCAAATGGCAAAGGTGTTGTAT





GTCACAAAGTAGCTTACCCATATGCAGTACACTTTTGCCATGATGTAGGAAGCAC





TAGGACATTTGTGGTGTCTATGGTGGGTGCTGACGGAACAAAAGTTAATGCAGTA





TCAGTCTGCCATGAGGATACTGCATCCATGAACCCTAAGGCATTGCCTTTTCAGT





TGCTCAACGTTAAGCCTGGAGACAAGCCTATTTGCCATTTCACTTTGGACGATCA





AATTGCCCTGTTTCCTTCTCAAAACGCACTTCTTCAAGTGGCTGAAAACTAA





StuBURP (Solanum tuberosum) (SEQ ID NO: 127):


MELLHQYYFFTFFSVIFVVSHAANLSPEVYWRVKLPNTPMPTPIKDALHISEKTAYNG





DGNTKISQPYGVFAWYQAASENELHKVRQPYGVDGWYKAASENELHKVRQPYGVF





AWYKAITENELHKVRQPYGVFAWYKAATENELHKVRQPYGVFAWYKAASENVLH





KVRQPYGVFAWYNDAAKKDLNDNHPVTPYFFETDLHQGKKMNLQSLKNYNPAPIL





PRKVVDSIAFSSDKIEEILNHFSVDKDSERAKDIKKTIKMCEEPAGNGEVKHCATSLES





MIDFTLSHLGTNNIVAISTEVDKETPEVQTYTIEKVEEKANGKGVVCHKVAYPYAVH





FCHDVGSTRTFVVSMVGADGTKVNAVSVCHEDTASMNPKALPFQLLNVKPGDKPIC





HFTLDDQIALFPSQNALLQVAEN





Sali3-2-[QPFGVFGW] ordered from IDT as gBlocks® (SEQ ID NO: 196):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCGTTTGGAGTATTTGGTTGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA





Sali3-2-[QPYGVFAW] ordered from IDT as gBlocks® (SEQ ID NO: 197):


ATGGAATTTCGATGCTCAGTCATCTCTTTTACCATTCTCTTCTCTCTTGCTCTTGCA





GGAGAGAGCCATGTCCATGCATCGCTACCTGAGGAAGATTATTGGGAAGCTGTT





TGGCCAAACACTCCCATTCCCACTGCACTGCGAGAGCTTCTAAAGCCTCTCCCTG





CAGGTGTTGAAATCGATGAACTCCCTAAGCAAATTGATGATACACAGTACCCAA





AAACATTCTTCTATAAAGAAGACCTTCATCCAGGCAAAACAATGAAAGTACAAT





TCACCAAGCGTCCCTATGCACAACCTTATGGTGTATTCGCCTGGTTAACGGATAT





TAAAGACACCTCTAAAGAAGGATATAGTTTTGAAGAGATATGCATCAAGAAAGA





AGCGTTTGAGGGAGAAGAGAAGTTTTGTGCAAAATCCTTGGGAACAGTAATTGG





TTTTGCCATTTCAAAGCTGGGAAAGAACATTCAAGTACTTTCAAGTTCCTTTGTCA





ATAAGCAAGAGCAATACACTGTGGAAGGAGTGCAGAATCTTGGAGACAAAGCA





GTGATGTGTCATGGGCTAAATTTCAGAACTGCAGTATTTTACTGCCATAAAGTCC





GTGAAACAACAGCTTTCATGGTTCCATTGGTGGCTGGTGATGGAACCAAAACTCA





GGCACTTGCTGTTTGCCACTCAGATACTTCTGGAATGAATCATCACATGCTTCAT





GAACTCATGGGAGTTGATCCTGGAACTAACCCTGTTTGCCATTTCCTTGGAAGCA





AGGCCATTTTATGGGTACCCAATTTATCTATGGACACTGCCTATCAGACTAACGT





TGTTGTTTAA
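The bracketed name of each Sali3-2 construct gives the core peptide encoded at the swapped position, and this can be checked by translating the corresponding codons. The following is a minimal sketch, not part of the original listing, assuming the standard genetic code; the function name `translate`, the dictionary `CORE_FRAGMENTS`, and the fragment boundaries are illustrative choices made here, with the 24-nucleotide fragments copied from the listings for SEQ ID NO: 196 and SEQ ID NO: 197.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; whitespace and line breaks are ignored."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Hypothetical helper data: core-peptide codons copied from the listings above,
# paired with the peptide named in each construct's header.
CORE_FRAGMENTS = {
    "SEQ ID NO: 196": ("CAACCGTTTGGAGTATTTGGTTGG", "QPFGVFGW"),
    "SEQ ID NO: 197": ("CAACCTTATGGTGTATTCGCCTGG", "QPYGVFAW"),
}

for seq_id, (fragment, expected) in CORE_FRAGMENTS.items():
    peptide = translate(fragment)
    status = "matches" if peptide == expected else "differs from"
    print(f"{seq_id}: {peptide} {status} {expected}")
```

Because `translate` strips whitespace, a full open reading frame pasted across several lines can be passed in directly, provided it starts at the ATG and its length is a multiple of three.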






REFERENCES FOR EXAMPLE 1



  • 1. Craik, D. J., Fairlie, D. P., Liras, S. and Price, D., 2013. The future of peptide-based drugs. Chemical biology & drug design, 81(1), pp. 136-147.

  • 2. Nolan, E. M. & Walsh, C. T. How nature morphs peptide scaffolds into antibiotics. ChemBioChem 10, 34-53 (2009).

  • 3. Gao, A. G., Hakimi, S. M., Mittanck, C. A., Wu, Y., Woerner, B. M., Stark, D. M., Shah, D. M., Liang, J. and Rommens, C. M., 2000. Fungal pathogen protection in potato by expression of a plant defensin peptide. Nature biotechnology, 18(12), pp. 1307-1310.

  • 4. Ventola, C. L., 2015. The antibiotic resistance crisis: part 1: causes and threats. Pharmacy and Therapeutics, 40(4), p.277.

  • 5. Tabashnik, B. E. and Carrière, Y., 2017. Surge in insect resistance to transgenic crops and prospects for sustainability. Nature biotechnology, 35(10), p.926.

  • 6. Chaparro, J. M., Badri, D. V. and Vivanco, J. M., 2014. Rhizosphere microbiome assemblage is affected by plant development. The ISME journal, 8(4), pp. 790-803.

  • 7. Dimkpa, C., Weinand, T. and Asch, F., 2009. Plant-rhizobacteria interactions alleviate abiotic stress conditions. Plant, cell & environment, 32(12), pp. 1682-1694.

  • 8. Arnison, P. G., Bibb, M. J., Bierbaum, G., Bowers, A. A., Bugni, T. S., Bulaj, G., Camarero, J. A., Campopiano, D. J., Challis, G. L., Clardy, J. and Cotter, P. D., 2013. Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Natural product reports, 30(1), pp. 108-160.

  • 9. Craik, D. J., Daly, N. L., Bond, T. and Waine, C., 1999. Plant cyclotides: a unique family of cyclic and knotted proteins that defines the cyclic cystine knot structural motif. Journal of molecular biology, 294(5), pp. 1327-1336.

  • 10. Craik, D. J., Lee, M. H., Rehm, F. B., Tombling, B., Doffek, B. and Peacock, H., 2017. Ribosomally-synthesised cyclic peptides from plants as drug leads and pharmaceutical scaffolds. Bioorganic & medicinal chemistry.

  • 11. Tan, N. H. and Zhou, J., 2006. Plant cyclopeptides. Chemical reviews, 106(3), pp. 840-895.

  • 12. Lautru, S., Deeth, R. J., Bailey, L. M. and Challis, G. L., 2005. Discovery of a new peptide natural product by Streptomyces coelicolor genome mining. Nature chemical biology, 1(5), pp. 265-269.

  • 13. Kersten, R. D., Yang, Y. L., Xu, Y., Cimermancic, P., Nam, S. J., Fenical, W., Fischbach, M. A., Moore, B. S. and Dorrestein, P. C., 2011. A mass spectrometry-guided genome mining approach for natural product peptidogenomics. Nature chemical biology, 7(11), pp. 794-802.

  • 14. Winter, J. M., Behnken, S. and Hertweck, C., 2011. Genomics-inspired discovery of natural products. Current opinion in chemical biology, 15(1), pp. 22-31.

  • 15. Ziemert, N., Alanjary, M. and Weber, T., 2016. The evolution of genome mining in microbes—a review. Natural product reports, 33(8), pp. 988-1005.

  • 16. Hetrick, K. J. and van der Donk, W. A., 2017. Ribosomally synthesized and post-translationally modified peptide natural product discovery in the genomic era. Current Opinion in Chemical Biology, 38, pp. 36-44.

  • 17. Goodstein, D. M., Shu, S., Howson, R., Neupane, R., Hayes, R. D., Fazo, J., Mitros, T., Dirks, W., Hellsten, U., Putnam, N. and Rokhsar, D. S., 2011. Phytozome: a comparative platform for green plant genomics. Nucleic acids research, 40(D1), pp. D1178-D1186.

  • 18. Lau, W. and Sattely, E. S., 2015. Six enzymes from mayapple that complete the biosynthetic pathway to the etoposide aglycone. Science, 349(6253), pp. 1224-1228.

  • 19. Owen, C., Patron, N. J., Huang, A. and Osbourn, A., 2017. Harnessing plant metabolic diversity. Current Opinion in Chemical Biology, 40, pp. 24-30.

  • 20. Medema, M. H. and Osbourn, A., 2016. Computational genomic identification and functional reconstitution of plant natural product biosynthetic pathways. Natural product reports, 33(8), pp. 951-962.

  • 21. Kautsar, S. A., Suarez Duran, H. G., Blin, K., Osbourn, A. and Medema, M. H., 2017. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic acids research, 45(W1), pp. W55-W63.

  • 22. Töpfer, N., Fuchs, L. M. and Aharoni, A., 2017. The PhytoClust tool for metabolic gene clusters discovery in plant genomes. Nucleic acids research, 45(12), pp. 7049-7063.

  • 23. Anarat-Cappillino, G. and Sattely, E. S., 2014. The chemical logic of plant natural product biosynthesis. Current opinion in plant biology, 19, pp. 51-58.

  • 24. Mohimani, H. and Pevzner, P. A., 2016. Dereplication, sequencing and identification of peptidic natural products: from genome mining to peptidogenomics to spectral networks. Natural product reports, 33(1), pp. 73-86.

  • 25. Yahara S, Shigeyama C, Nohara T. Tetrahedron Lett. (1989)

  • 26. Morita, H., Suzuki, H. and Kobayashi, J. I., 2004. Celogenamide A, a New Cyclic Peptide from the Seeds of Celosia argentea. Journal of natural products, 67(9), pp. 1628-1630.

  • 27. Petersen, T. N., Brunak, S., von Heijne, G. & Nielsen, H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods, 8:785-786, 2011.

  • 28. Hattori, J., Boutilier, K. A., Campagne, M. L. and Miki, B. L., 1998. A conserved BURP domain defines a novel group of plant proteins with unusual primary structures. Molecular and General Genetics MGG, 259(4), pp. 424-428.

  • 29. Ding, X., Hou, X., Xie, K. and Xiong, L., 2009. Genome-wide identification of BURP domain-containing genes in rice reveals a gene family with diverse structures and responses to abiotic stresses. Planta, 230(1), pp. 149-163.

  • 30. Boutilier, K. A., Gines, M. J., DeMoor, J. M., Huang, B., Baszczynski, C. L., Iyer, V. N. and Miki, B. L., 1994. Expression of the BnmNAP subfamily of napin genes coincides with the induction of Brassica microspore embryogenesis. Plant molecular biology, 26(6), pp. 1711-1723.

  • 31. Bassüner, R., Bäumlein, H., Huth, A., Jung, R., Wobus, U., Rapoport, T. A., Saalbach, G. and Müntz, K., 1988. Abundant embryonic mRNA in field bean (Vicia faba L.) codes for a new class of seed proteins: cDNA cloning and characterization of the primary translation product. Plant molecular biology, 11(3), pp. 321-334.

  • 32. Yamaguchi-Shinozaki, K. and Shinozaki, K., 1993. The plant hormone abscisic acid mediates the drought-induced expression but not the seed-specific expression of rd22, a gene responsive to dehydration stress in Arabidopsis thaliana. Molecular and General Genetics MGG, 238(1), pp. 17-25.

  • 33. Zheng, L., Heupel, R. C. and DellaPenna, D., 1992. The beta subunit of tomato fruit polygalacturonase isoenzyme 1: isolation, characterization, and identification of unique structural features. The plant cell, 4(9), pp. 1147-1156.

  • 34. Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q. and Chen, Z., 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature biotechnology, 29(7), pp. 644-652.

  • 35. Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., Lesin, V. M., Nikolenko, S. I., Pham, S., Prjibelski, A. D. and Pyshkin, A. V., 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology, 19(5), pp. 455-477.

  • 36. Tang, M. C., Zou, Y., Watanabe, K., Walsh, C. T. and Tang, Y., 2016. Oxidative cyclization in natural product biosynthesis. Chemical reviews, 117(8), pp. 5226-5333.

  • 37. Li, B. and Dewey, C. N., 2011. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC bioinformatics, 12(1), p.323.

  • 38. Sainsbury, F., Thuenemann, E. C. and Lomonossoff, G. P., 2009. pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants. Plant biotechnology journal, 7(7), pp. 682-693.

  • 39. Ragland M, Soliman KM (1997) Sali5-4a and Sali3-2, two genes induced by aluminum in soybean roots. Plant Physiol 114:395.

  • 40. Tang, Y., Cao, Y., Qiu, J., Gao, Z., Ou, Z., Wang, Y. and Zheng, Y., 2014. Expression of a vacuole-localized BURP-domain protein from soybean (SALI3-2) enhances tolerance to cadmium and copper stresses. PloS one, 9(6), p. e98830.

  • 41. Fazio, G. C., Xu, R. and Matsuda, S. P., 2004. Genome mining to identify new plant triterpenoids. Journal of the American Chemical Society, 126(18), pp. 5678-5679.

  • 42. Huang, A. C., Kautsar, S. A., Hong, Y. J., Medema, M. H., Bond, A. D., Tantillo, D. J. and Osbourn, A., 2017. Unearthing a sesterterpene biosynthetic repertoire in the Brassicaceae through genome mining reveals convergent evolution. Proceedings of the National Academy of Sciences, p. 201705567.

  • 43. Mohimani, H., Kersten, R. D., Liu, W. T., Wang, M., Purvine, S. O., Wu, S., Brewer, H. M., Pasa-Tolic, L., Bandeira, N., Moore, B. S. and Pevzner, P. A., 2014. Automated genome mining of ribosomal peptide natural products. ACS chemical biology, 9(7), pp. 1545-1551.

  • 44. Mylne, J. S., Chan, L. Y., Chanson, A. H., Daly, N. L., Schaefer, H., Bailey, T. L., Nguyencong, P., Cascales, L. and Craik, D. J., 2012. Cyclic peptides arising by evolutionary parallelism via asparaginyl-endopeptidase-mediated biosynthesis. The Plant Cell, 24(7), pp. 2765-2778.

  • 45. J. A. Condie, G. Nowak, D. W. Reed, J. J. Balsevich, M. J. Reaney, P. G. Arnison and P. S. Covello, Plant J., 2011, 67, 682.

  • 46. Mylne, J. S., Colgrave, M. L., Daly, N. L., Chanson, A. H., Elliott, A. G., McCallum, E. J., Jones, A. and Craik, D. J., 2011. Albumins and their processing machinery are hijacked for cyclic peptides in sunflower. Nature Chemical Biology, 7(5), pp. 257-259.

  • 47. Saska, I., Gillon, A. D., Hatsugai, N., Dietzgen, R. G., Hara-Nishimura, I., Anderson, M. A. and Craik, D. J., 2007. An asparaginyl endopeptidase mediates in vivo protein backbone cyclization. Journal of Biological Chemistry, 282(40), pp. 29721-29728.

  • 48. Goodbody, A. E., Endo, T., Vukovic, J., Kutney, J. P., Choi, L. S. and Misawa, M., 1988. Enzymic coupling of catharanthine and vindoline to form 3′, 4′-anhydrovinblastine by horseradish peroxidase. Planta medica, 54(02), pp. 136-140.

  • 49. Sterjiades, R., Dean, J. F. and Eriksson, K. E. L., 1992. Laccase from sycamore maple (Acer pseudoplatanus) polymerizes monolignols. Plant Physiology, 99(3), pp. 1162-1168.

  • 50. Sainsbury, F., Thuenemann, E. C., and Lomonossoff, G. P. pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants. Plant Biotechnol J. 7(7): 682-93 (2009).

  • 51. Hernandez-Garcia, C. M. and Finer, J. J. Identification and validation of promoters and cis-acting regulatory elements. Plant Sci. 217-218 (2014) 109-119.

  • 52. Morita, H., Yoshida, N., Takeya, K., Itokawa, H. & Shirota, O. Configurational and conformational analyses of a cyclic octapeptide, lyciumin A, from Lycium chinense Mill. Tetrahedron 52, 2795-2802 (1996).



REFERENCES FOR EXAMPLE 2

  • 53. Albornos, L., Martin, I., Iglesias, R., Jimenez, T., Labrador, E., and Dopico, B. (2012). ST proteins, a new family of plant tandem repeat proteins with a DUF2775 domain mainly found in Fabaceae and Asteraceae. BMC Plant Biol. 12:207.

  • 54. Arnison, P. G., Bibb, M. J., Bierbaum, G., Bowers, A. A., Bugni, T. S., Bulaj, G., Camarero, J. A., Campopiano, D. J., Challis, G. L., Clardy, J., et al. (2013). Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat. Prod. Rep. 30:108-160.
  • 55. Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., Lesin, V. M., Nikolenko, S. I., Pham, S., Prjibelski, A. D., et al. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19:455-477.
  • 56. Banks, J. A. (2009). Selaginella and 400 million years of separation. Annu. Rev. Plant Biol. 60:223-238.
  • 57. Barber, C. J. S., Pujara, P. T., Reed, D. W., Chiwocha, S., Zhang, H., and Covello, P. S. (2013). The Two-step Biosynthesis of Cyclic Peptides from Linear Precursors in a Member of the Plant Family Caryophyllaceae Involves Cyclization by a Serine Protease-like Enzyme. J. Biol. Chem. 288:12500-12510.
  • 58. Bassüner, R., Bäumlein, H., Huth, A., Jung, R., Wobus, U., Rapoport, T. A., Saalbach, G., and Müntz, K. (1988). Abundant embryonic mRNA in field bean (Vicia faba L.) codes for a new class of seed proteins: cDNA cloning and characterization of the primary translation product. Plant Mol. Biol. 11:321-334.
  • 59. Belknap, W. R., McCue, K. F., Harden, L. A., Vensel, W. H., Bausher, M. G., and Stover, E. (2015). A family of small cyclic amphipathic peptides (SCAmpPs) genes in citrus. BMC Genomics 16:303.
  • 60. Boutilier, K. A., Gines, M. J., DeMoor, J. M., Huang, B., Baszczynski, C. L., Iyer, V. N., and Miki, B. L. (1994). Expression of the BnmNAP subfamily of napin genes coincides with the induction of Brassica microspore embryogenesis. Plant Mol. Biol. 26:1711-1723.
  • 61. Bushmanova, E., Antipov, D., Lapidus, A., and Przhibelskiy, A. D. (2018). rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. Advance Access published 2018, doi:10.1101/420208.
  • 62. Chekan, J. R., Estrada, P., Covello, P. S., and Nair, S. K. (2017). Characterization of the macrocyclase involved in the biosynthesis of RiPP cyclic peptides in plants. Proc. Natl. Acad. Sci. U.S.A 114:6551-6556.
  • 63. Cheng, S., Melkonian, M., Smith, S. A., Brockington, S., Archibald, J. M., Delaux, P.-M., Li, F.-W., Melkonian, B., Mavrodiev, E. V., Sun, W., et al. (2018). 10KP: A phylodiverse genome sequencing plan. Gigascience 7:1-9.
  • 64. Claeson, P., Göransson, U., Johansson, S., Luijendijk, T., and Bohlin, L. (1998). Fractionation Protocol for the Isolation of Polypeptides from Plant Biomass. J. Nat. Prod. 61:77-81.
  • 65. Condie, J. A., Nowak, G., Reed, D. W., Balsevich, J. J., Reaney, M. J. T., Arnison, P. G., and Covello, P. S. (2011). The biosynthesis of Caryophyllaceae-like cyclic peptides in Saponaria vaccaria L. from DNA-encoded precursors. Plant J. 67:682-690.
  • 66. Conlan, B. F., Gillon, A. D., Barbeta, B. L., and Anderson, M. A. (2011). Subcellular targeting and biosynthesis of cyclotides in plant cells. Am. J. Bot. 98:2018-2026.
  • 67. Craik, D. J., Lee, M.-H., Rehm, F. B. H., Tombling, B., Doffek, B., and Peacock, H. (2018). Ribosomally-synthesised cyclic peptides from plants as drug leads and pharmaceutical scaffolds. Bioorg. Med. Chem. 26:2727-2737.
  • 68. Dutton, J. L., Renda, R. F., Waine, C., Clark, R. J., Daly, N. L., Jennings, C. V., Anderson, M. A., and Craik, D. J. (2004). Conserved structural and sequence elements implicated in the processing of gene-encoded circular proteins. J. Biol. Chem. 279:46858-46867.
  • 69. Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792-1797.
  • 70. Gibson, D. G., Young, L., Chuang, R.-Y., Venter, J. C., Hutchison, C. A., 3rd, and Smith, H. O. (2009). Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6:343-345.
  • 71. Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., et al. (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29:644-652.
  • 72. Griesmann, M., Chang, Y., Liu, X., Song, Y., Haberer, G., Crook, M. B., Billault-Penneteau, B., Lauressergues, D., Keller, J., Imanishi, L., et al. (2018). Phylogenomics reveals multiple losses of nitrogen-fixing root nodule symbiosis. Science 361.
  • 73. Gruber, C. W., Cemazar, M., Clark, R. J., Horibe, T., Renda, R. F., Anderson, M. A., and Craik, D. J. (2007). A novel plant protein-disulfide isomerase involved in the oxidative folding of cystine knot defense proteins. J. Biol. Chem. 282:20435-20446.
  • 74. Gui, B., Shim, Y. Y., Datla, R. S. S., Covello, P. S., Stone, S. L., and Reaney, M. J. T. (2012). Identification and Quantification of Cyclolinopeptides in Five Flaxseed Cultivars. J. Agric. Food Chem. 60:8571-8579.
  • 75. Hernandez, J. F., Gagnon, J., Chiche, L., Nguyen, T. M., Andrieu, J. P., Heitz, A., Trinh Hong, T., Pham, T. T., and Le Nguyen, D. (2000). Squash trypsin inhibitors from Momordica cochinchinensis exhibit an atypical macrocyclic structure. Biochemistry 39:5722-5730.
  • 76. James, A. M., Jayasena, A. S., Zhang, J., Berkowitz, O., Secco, D., Knott, G. J., Whelan, J., Bond, C. S., and Mylne, J. S. (2017). Evidence for Ancient Origins of Bowman-Birk Inhibitors from Selaginella moellendorffii. Plant Cell 29:461-473.
  • 77. Jayasena, A. S., Fisher, M. F., Panero, J. L., Secco, D., Bernath-Levin, K., Berkowitz, O., Taylor, N. L., Schilling, E. E., Whelan, J., and Mylne, J. S. (2017). Stepwise Evolution of a Buried Inhibitor Peptide over 45 My. Mol. Biol. Evol. 34:1505-1516.
  • 78. Jennings, C., West, J., Waine, C., Craik, D., and Anderson, M. (2001). Biosynthesis and insecticidal properties of plant cyclotides: the cyclic knotted proteins from Oldenlandia affinis. Proc. Natl. Acad. Sci. U.S.A 98:10614-10619.
  • 79. Kaufmann, H. P., and Tobschirbel, A. (1959). Über ein Oligopeptid aus Leinsamen. Chem. Ber. 92:2805-2809.
  • 80. Kersten, R. D., and Weng, J.-K. (2018). Gene-guided discovery and engineering of branched cyclic peptides in plants. Proc. Natl. Acad. Sci. U.S.A 115:E10961-E10969.
  • 81. Koehbach, J., O'Brien, M., Muttenthaler, M., Miazzo, M., Akcan, M., Elliott, A. G., Daly, N. L., Harvey, P. J., Arrowsmith, S., Gunasekera, S., et al. (2013). Oxytocic plant cyclotides as templates for peptide G protein-coupled receptor ligand design. Proc. Natl. Acad. Sci. U.S.A 110:21183-21188.
  • 82. Kumar, S., Stecher, G., and Tamura, K. (2016). MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol. Biol. Evol. 33:1870-1874.
  • 83. Luckett, S., Garcia, R. S., Barker, J. J., Konarev, A. V., Shewry, P. R., Clarke, A. R., and Brady, R. L. (1999). High-resolution structure of a potent, cyclic proteinase inhibitor from sunflower seeds. J. Mol. Biol. 290:525-533.
  • 84. Matasci, N., Hung, L.-H., Yan, Z., Carpenter, E. J., Wickett, N. J., Mirarab, S., Nguyen, N., Warnow, T., Ayyampalayam, S., Barker, M., et al. (2014). Data access for the 1,000 Plants (1KP) project. Gigascience 3:17.
  • 85. Mylne, J. S., Colgrave, M. L., Daly, N. L., Chanson, A. H., Elliott, A. G., McCallum, E. J., Jones, A., and Craik, D. J. (2011). Albumins and their processing machinery are hijacked for cyclic peptides in sunflower. Nat. Chem. Biol. 7:257-259.
  • 86. Mylne, J. S., Chan, L. Y., Chanson, A. H., Daly, N. L., Schaefer, H., Bailey, T. L., Nguyencong, P., Cascales, L., and Craik, D. J. (2012). Cyclic peptides arising by evolutionary parallelism via asparaginyl-endopeptidase-mediated biosynthesis. Plant Cell 24:2765-2778.
  • 87. Nei, M., and Kumar, S. (2000). Molecular Evolution and Phylogenetics. Oxford University Press, USA.
  • 88. Nguyen, G. K. T., Lian, Y., Pang, E. W. H., Nguyen, P. Q. T., Tran, T. D., and Tam, J. P. (2013). Discovery of linear cyclotides in monocot plant Panicum laxum of Poaceae family provides new insights into evolution and distribution of cyclotides in plants. J. Biol. Chem. 288:3370-3380.
  • 89. Nguyen, G. K. T., Wang, S., Qiu, Y., Hemu, X., Lian, Y., and Tam, J. P. (2014). Butelase 1 is an Asx-specific ligase enabling peptide macrocyclization and synthesis. Nat. Chem. Biol. 10:732-738.
  • 90. Park, S., Yoo, K.-O., Marcussen, T., Backlund, A., Jacobsson, E., Rosengren, K. J., Doo, I., and Göransson, U. (2017). Cyclotide Evolution: Insights from the Analyses of Their Precursor Sequences, Structures and Distribution in Violets (Viola). Front. Plant Sci. 8:2058.
  • 91. Poth, A. G., Colgrave, M. L., Lyons, R. E., Daly, N. L., and Craik, D. J. (2011). Discovery of an unusual biosynthetic origin for circular proteins in legumes. Proc. Natl. Acad. Sci. U.S.A 108:10127-10132.
  • 92. Poth, A. G., Mylne, J. S., Grassl, J., Lyons, R. E., Millar, A. H., Colgrave, M. L., and Craik, D. J. (2012). Cyclotides associate with leaf vasculature and are the products of a novel precursor in petunia (Solanaceae). J. Biol. Chem. 287:27033-27046.
  • 93. Priyam, A., Woodcroft, B. J., Rai, V., Munagala, A., Moghul, I., Ter, F., Gibbins, M. A., Moon, H., Leonard, G., Rumpf, W., et al. (2015). Sequenceserver: a modern graphical user interface for custom BLAST databases. bioRxiv, doi:10.1101/033142.
  • 94. Ragland, M., and Soliman, K. M. (1997). A molecular approach to understanding aluminum tolerance in soybean (Glycine max L.). In Global Environmental Biotechnology, pp. 125-138.
  • 95. Saether, O., Craik, D. J., Campbell, I. D., Sletten, K., Juul, J., and Norman, D. G. (1995). Elucidation of the primary and three-dimensional structure of the uterotonic polypeptide kalata B1. Biochemistry 34:4147-4158.
  • 96. Sainsbury, F., Thuenemann, E. C., and Lomonossoff, G. P. (2009). pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants. Plant Biotechnol. J. 7:682-693.
  • 97. Saska, I., Gillon, A. D., Hatsugai, N., Dietzgen, R. G., Hara-Nishimura, I., Anderson, M. A., and Craik, D. J. (2007). An asparaginyl endopeptidase mediates in vivo protein backbone cyclization. J. Biol. Chem. 282:29721-29728.
  • 98. Tan, N.-H., and Zhou, J. (2006). Plant Cyclopeptides. Chem. Rev. 106:840-895.
  • 99. Tang, Y., Cao, Y., Gao, Z., Ou, Z., Wang, Y., Qiu, J., and Zheng, Y. (2014). Expression of a vacuole-localized BURP-domain protein from soybean (SALI3-2) enhances tolerance to cadmium and copper stresses. PLoS One 9:e98830.
  • 100. van den Berg, A. J. J., S F A, den Bosch, J. J. K., Kroes, B. H., Beukelman, C. J., Leeflang, B. R., and Labadie, R. P. (1995). Curcacycline A—a novel cyclic octapeptide isolated from the latex of Jatropha curcas L. FEBS Lett. 358:215-218.
  • 101. Wang, H., Zhou, L., Fu, Y., Cheung, M.-Y., Wong, F.-L., Phang, T.-H., Sun, Z., and Lam, H.-M. (2012). Expression of an apoplast-localized BURP-domain protein from soybean (GmRD22) enhances tolerance towards abiotic stress. Plant Cell Environ. 35:1932-1947.
  • 102. Wang, M., Carver, J. J., Phelan, V. V., Sanchez, L. M., Garg, N., Peng, Y., Nguyen, D. D., Watrous, J., Kapono, C. A., Luzzatto-Knaan, T., et al. (2016). Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 34:828-837.
  • 103. Weng, J.-K. (2014). The evolutionary paths towards complexity: a metabolic perspective. New Phytol. 201:1141-1149.
  • 104. Weng, J.-K., and Chapple, C. (2010). The origin and evolution of lignin biosynthesis. New Phytol. 187:273-285.
  • 105. Weng, J.-K., Philippe, R. N., and Noel, J. P. (2012). The rise of chemodiversity in plants. Science 336:1667-1670.
  • 106. Yamaguchi-Shinozaki, K., and Shinozaki, K. (1993). The plant hormone abscisic acid mediates the drought-induced expression but not the seed-specific expression of rd22, a gene responsive to dehydration stress in Arabidopsis thaliana. Mol. Gen. Genet. 238:17-25.
  • 107. Zheng, L., Heupel, R. C., and DellaPenna, D. (1992). The β Subunit of Tomato Fruit Polygalacturonase Isoenzyme 1: Isolation, Characterization, and Identification of Unique Structural Features. Plant Cell 4:1147.


INCORPORATION BY REFERENCE; EQUIVALENTS

The teachings of all patents, published applications and references cited herein are incorporated by reference in their entirety.


While example embodiments have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the embodiments encompassed by the appended claims.

Claims
  • 1. A method of producing one or more lyciumin cyclic peptides, the method comprising: a) providing a host cell comprising a transgene encoding a lyciumin precursor peptide, or a biologically-active fragment thereof, wherein the lyciumin precursor peptide, or biologically-active fragment thereof, comprises one or more core lyciumin peptide domains; b) expressing the transgene in the host cell to thereby produce a lyciumin precursor peptide, or biologically-active fragment thereof, wherein the lyciumin precursor peptide, or biologically-active fragment thereof, is converted to one or more lyciumin cyclic peptides in the host cell.
  • 2. The method of claim 1, wherein the transgene is operably linked to a heterologous promoter in the host cell.
  • 3. The method of claim 1 or 2, wherein the transgene is introduced in a vector.
  • 4. The method of claim 1 or 2, further comprising introducing the transgene into the host cell.
  • 5. The method of claim 4, further comprising introducing a vector comprising the transgene into the host cell.
  • 6. The method of claim 1 or 2, wherein the lyciumin precursor peptide comprises a plurality of core lyciumin peptide domains.
  • 7. The method of claim 6, wherein the core lyciumin peptide domains encode two or more different lyciumin cyclic peptides.
  • 8. The method of claim 1 or 2, wherein the host cell expresses one or more enzymes that cyclize the lyciumin precursor peptide; one or more endopeptidases; one or more glutamine cyclotransferases; and one or more exopeptidases, or a combination thereof.
  • 9. The method of claim 1 or 2, wherein arginine is immediately N-terminal to the core lyciumin peptide domain.
  • 10. The method of claim 9, wherein the endopeptidase is an arginine endopeptidase.
  • 11. The method of claim 1 or 2, wherein tyrosine is immediately C-terminal to the core lyciumin peptide domain.
  • 12. The method of claim 1 or 2, wherein the host cell is a plant cell.
  • 13. The method of claim 12, wherein the plant cell is an Amaranthaceae family plant cell.
  • 14. The method of claim 13, wherein the plant cell is an Amaranthus genus plant cell.
  • 15. The method of claim 14, wherein the plant cell is an Amaranthus hypochondriacus plant cell.
  • 16. The method of claim 13, wherein the plant cell is a Beta genus plant cell.
  • 17. The method of claim 16, wherein the plant cell is a Beta vulgaris plant cell.
  • 18. The method of claim 13, wherein the plant cell is a Chenopodium genus plant cell.
  • 19. The method of claim 18, wherein the plant cell is a Chenopodium quinoa plant cell.
  • 20. The method of claim 12, wherein the plant cell is a Fabaceae family plant cell.
  • 21. The method of claim 20, wherein the plant cell is a Glycine genus plant cell.
  • 22. The method of claim 21, wherein the plant cell is a Glycine max plant cell.
  • 23. The method of claim 20, wherein the plant cell is a Medicago genus plant cell.
  • 24. The method of claim 23, wherein the plant cell is a Medicago truncatula plant cell.
  • 25. The method of claim 12, wherein the plant cell is a Solanaceae family plant cell.
  • 26. The method of claim 25, wherein the plant cell is a Solanum genus plant cell.
  • 27. The method of claim 26, wherein the plant cell is a Solanum melongena plant cell.
  • 28. The method of claim 26, wherein the plant cell is a Solanum tuberosum plant cell.
  • 29. The method of claim 25, wherein the plant cell is a Nicotiana genus plant cell.
  • 30. The method of claim 29, wherein the plant cell is a Nicotiana benthamiana plant cell.
  • 31. The method of claim 25, wherein the plant cell is a Capsicum genus plant cell.
  • 32. The method of claim 31, wherein the plant cell is a Capsicum annuum plant cell.
  • 33. The method of claim 1 or 2, wherein the lyciumin precursor peptide comprises SEQ ID NO: 1.
  • 34. The method of claim 1 or 2, wherein the lyciumin precursor peptide comprises SEQ ID NO: 2.
  • 35. The method of claim 1 or 2, wherein the lyciumin cyclic peptide is Lyciumin A, Lyciumin B, Lyciumin C, or Lyciumin D, or a combination thereof.
  • 36. A method of generating a library of nucleic acids encoding lyciumin precursor peptides, or biologically-active fragments thereof, the method comprising constructing a plurality of vectors, each vector comprising a nucleic acid encoding a different lyciumin precursor peptide, or biologically-active fragment thereof, operably linked to a heterologous promoter for expression in a host cell.
  • 37. The method of claim 36, further comprising introducing the plurality of vectors into host cells, wherein the lyciumin precursor peptide, or biologically-active fragments thereof, is converted to one or more lyciumin cyclic peptides in the host cell.
  • 38. The method of claim 37, wherein the host cell is a plant cell.
  • 39. The method of claim 38, wherein the plant cell is a Solanaceae family plant cell.
  • 40. The method of claim 39, wherein the plant cell is a Nicotiana genus plant cell.
  • 41. The method of claim 40, wherein the plant cell is a Nicotiana benthamiana plant cell.
  • 42. The method of any of claims 37-41, further comprising isolating a lyciumin cyclic peptide from the host cell.
  • 43. The method of any of claims 37-41, further comprising assaying for an activity of interest either crude extract from the host cell or a lyciumin peptide isolated from the host cell.
  • 44. The method of any of claims 37-41, further comprising introducing a nucleic acid encoding a lyciumin peptide having an activity of interest into a second host cell.
  • 45. The method of claim 44, wherein the second host cell is a plant cell.
  • 46. The method of claim 45, wherein the plant cell is an Amaranthaceae family plant cell.
  • 47. The method of claim 46, wherein the plant cell is an Amaranthus genus plant cell.
  • 48. The method of claim 47, wherein the plant cell is an Amaranthus hypochondriacus plant cell.
  • 49. The method of claim 46, wherein the plant cell is a Beta genus plant cell.
  • 50. The method of claim 49, wherein the plant cell is a Beta vulgaris plant cell.
  • 51. The method of claim 46, wherein the plant cell is a Chenopodium genus plant cell.
  • 52. The method of claim 51, wherein the plant cell is a Chenopodium quinoa plant cell.
  • 53. The method of claim 45, wherein the plant cell is a Fabaceae family plant cell.
  • 54. The method of claim 53, wherein the plant cell is a Glycine genus plant cell.
  • 55. The method of claim 54, wherein the plant cell is a Glycine max plant cell.
  • 56. The method of claim 53, wherein the plant cell is a Medicago genus plant cell.
  • 57. The method of claim 56, wherein the plant cell is a Medicago truncatula plant cell.
  • 58. The method of claim 45, wherein the plant cell is a Solanaceae family plant cell.
  • 59. The method of claim 58, wherein the plant cell is a Solanum genus plant cell.
  • 60. The method of claim 59, wherein the plant cell is a Solanum melongena plant cell.
  • 61. The method of claim 59, wherein the plant cell is a Solanum tuberosum plant cell.
  • 62. The method of claim 58, wherein the plant cell is a Nicotiana genus plant cell.
  • 63. The method of claim 62, wherein the plant cell is a Nicotiana benthamiana plant cell.
  • 64. The method of claim 58, wherein the plant cell is a Capsicum genus plant cell.
  • 65. The method of claim 64, wherein the plant cell is a Capsicum annuum plant cell.
  • 66. An isolated nucleic acid comprising a nucleotide sequence encoding a lyciumin precursor peptide, or a biologically-active fragment thereof, operably linked to a heterologous promoter.
  • 67. The isolated nucleic acid of claim 66, wherein the lyciumin precursor peptide comprises a plurality of core lyciumin peptide domains.
  • 68. The isolated nucleic acid of claim 67, wherein the core lyciumin peptide domains encode two or more different lyciumin cyclic peptides.
  • 69. The isolated nucleic acid of claim 66, wherein the lyciumin precursor peptide comprises SEQ ID NO: 1.
  • 70. The isolated nucleic acid of claim 66, wherein the lyciumin precursor peptide comprises SEQ ID NO: 2.
  • 71. The isolated nucleic acid of any of claims 66-70, wherein the nucleic acid is a cDNA.
  • 72. A vector comprising the nucleic acid of any of claims 66-70.
  • 73. A host cell comprising the nucleic acid of any of claims 66-71 or the vector of claim 72.
  • 74. The host cell of claim 73, wherein the host cell is a plant cell.
  • 75. The host cell of claim 74, wherein the plant cell is an Amaranthaceae family plant cell.
  • 76. The host cell of claim 75, wherein the plant cell is an Amaranthus genus plant cell.
  • 77. The host cell of claim 76, wherein the plant cell is an Amaranthus hypochondriacus plant cell.
  • 78. The host cell of claim 75, wherein the plant cell is a Beta genus plant cell.
  • 79. The host cell of claim 78, wherein the plant cell is a Beta vulgaris plant cell.
  • 80. The host cell of claim 75, wherein the plant cell is a Chenopodium genus plant cell.
  • 81. The host cell of claim 80, wherein the plant cell is a Chenopodium quinoa plant cell.
  • 82. The host cell of claim 74, wherein the plant cell is a Fabaceae family plant cell.
  • 83. The host cell of claim 82, wherein the plant cell is a Glycine genus plant cell.
  • 84. The host cell of claim 83, wherein the plant cell is a Glycine max plant cell.
  • 85. The host cell of claim 82, wherein the plant cell is a Medicago genus plant cell.
  • 86. The host cell of claim 85, wherein the plant cell is a Medicago truncatula plant cell.
  • 87. The host cell of claim 74, wherein the plant cell is a Solanaceae family plant cell.
  • 88. The host cell of claim 87, wherein the plant cell is a Solanum genus plant cell.
  • 89. The host cell of claim 88, wherein the plant cell is a Solanum melongena plant cell.
  • 90. The host cell of claim 88, wherein the plant cell is a Solanum tuberosum plant cell.
  • 91. The host cell of claim 87, wherein the plant cell is a Nicotiana genus plant cell.
  • 92. The host cell of claim 91, wherein the plant cell is a Nicotiana benthamiana plant cell.
  • 93. The host cell of claim 87, wherein the plant cell is a Capsicum genus plant cell.
  • 94. The host cell of claim 93, wherein the plant cell is a Capsicum annuum plant cell.
  • 95. A library comprising a plurality of nucleic acid molecules, each nucleic acid molecule comprising a nucleotide sequence encoding a lyciumin precursor peptide, or a biologically-active fragment thereof.
  • 96. The library of claim 95, wherein the nucleotide sequence encoding a lyciumin precursor peptide, or a biologically-active fragment thereof, is operably linked to a heterologous promoter in each nucleic acid molecule.
  • 97. The library of claim 95 or 96, wherein the nucleic acid molecules are cDNA molecules.
  • 98. A lyciumin cyclic peptide produced by the method of any of claims 1-65.
  • 99. A method of producing one or more lyciumin cyclic peptides, comprising: a) providing a host cell comprising a transgene encoding a polypeptide that comprises one or more core lyciumin peptide domains; b) expressing the transgene in the host cell to thereby produce a polypeptide that comprises one or more core lyciumin peptide domains.
  • 100. The method of claim 99, wherein the polypeptide is converted to one or more lyciumin cyclic peptides in the host cell.
RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/619,905, filed on Jan. 21, 2018; U.S. Provisional Application No. 62/620,420, filed on Jan. 22, 2018; and U.S. Provisional Application No. 62/732,957, filed on Sep. 18, 2018. The entire teachings of the above applications are incorporated herein by reference.

PCT Information
Filing Document Filing Date Country Kind
PCT/US2019/014430 1/21/2019 WO 00
Provisional Applications (3)
Number Date Country
62732957 Sep 2018 US
62620420 Jan 2018 US
62619905 Jan 2018 US