Heterologous biosynthesis of nodulisporic acid

REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

The contents of the electronic sequence listing (sequencelisting.xml; Size: 118,571 bytes; and Date of Creation: Aug. 1, 2022) are herein incorporated by reference in their entirety.

FIELD OF THE INVENTION

This invention generally relates to novel polypeptides that catalyze at least one biochemical reaction leading to the production of a nodulisporic acid (NA), polynucleotides encoding such polypeptides, methods of making such polypeptides and polynucleotides, and methods of using such polypeptides and polynucleotides to produce at least one NA by heterologous expression in a permissive host.

BACKGROUND

Filamentous fungi produce a diverse repertoire of interesting and useful chemical compounds. Members of one such class of compounds, the indole diterpenes (IDTs), are of particular interest due to their wide range of chemical diversity and concomitant bioactivities, which include anti-MRSA (Ogata, M.; Ueda, J.; Hoshi, M.; Hashimoto, J.; Nakashima, T.; Anzai, K.; Takagi, M.; Shin-ya, K. J. Antibiot. (Tokyo) 2007, 60 (10), 645-648), anti-cancer, anti-H1N1, insecticidal and tremorgenic⁶ activities. NAs (FIG. 1) are a group of notably bioactive quasi-paspaline-like IDTs produced by Hypoxylon pulicicidum, formerly classified as Nodulisporium sp. Nodulisporic acid A (NAA) 10 is of particular significance because it exhibits highly potent insecticidal activity against blood-feeding arthropods while exhibiting no observable adverse effects on mammals.

NAs are especially difficult to biosynthesize from the natural producer, H. pulicicidum. Reported NA biosynthesis methods require that H. pulicicidum be grown for 21 days in complete darkness in highly nutrient rich media. Due to the difficulty of NAA 10 biosynthesis in H. pulicicidum, obtaining useful quantities of NAA 10 using published fermentation methods is challenging, and production of commercial quantities of NAA 10 is essentially unachievable. Accordingly, attempts have been made to chemically synthesize NAA 10, resulting in mechanisms for the synthesis of nodulisporic acid F (NAF) 5a and nodulisporic acid D 7a, but full synthesis of NAA 10 has not been achieved.¹² Consequently, there is a need in the art for new methods of NAA 10 synthesis and/or biosynthesis that will provide useful quantities of NAA 10.

It is an object of the present invention to provide a polynucleotide encoding at least one enzyme in the NAA 10 biosynthesis pathway of H. pulicicidum and/or to provide a method of using such a vector to produce at least one indole diterpene compound that is a NA and/or to produce a precursor to NAA 10 in a heterologous host and/or to at least provide the public with a useful choice.

In this specification where reference has been made to patent specifications, other external documents, or other sources of information, this is generally for the purpose of providing a context for discussing the features of the invention. Unless specifically stated otherwise, reference to such external documents is not to be construed as an admission that such documents, or such sources of information, in any jurisdiction, are prior art, or form part of the common general knowledge in the art.

SUMMARY OF THE INVENTION

In one aspect the invention relates to an isolated polypeptide comprising an amino acid sequence selected from the group consisting of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof.

In another aspect the invention relates to an isolated polynucleotide encoding a polypeptide comprising an amino acid sequence selected from the group consisting of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof.

In another aspect the invention relates to an isolated polynucleotide comprising at least 70% nucleic acid sequence identity to a nucleic acid sequence selected from the group consisting of nodW cDNA (SEQ ID NO:2), nodW genomic DNA (SEQ ID NO:1), nodR cDNA (SEQ ID NO:5), nodR genomic DNA (SEQ ID NO:4), nodX cDNA (SEQ ID NO:8), nodX genomic DNA (SEQ ID NO:7), nodM cDNA (SEQ ID NO:11), nodM genomic DNA (SEQ ID NO:10), nodB cDNA (SEQ ID NO:14), nodB genomic DNA (SEQ ID NO:13), nodO cDNA (SEQ ID NO:17), nodO genomic DNA (SEQ ID NO:16), nodJ cDNA (SEQ ID NO:20), nodJ genomic DNA (SEQ ID NO:19), nodC cDNA (SEQ ID NO:23), nodC genomic DNA (SEQ ID NO:22), nodY1 cDNA (SEQ ID NO:26), nodY1 genomic DNA (SEQ ID NO:25), nodD2 cDNA (SEQ ID NO:29), nodD2 genomic DNA (SEQ ID NO:28), nodD1 cDNA (SEQ ID NO:32), nodD1 genomic DNA (SEQ ID NO:31), nodY2 cDNA (SEQ ID NO:35), nodY2 genomic DNA (SEQ ID NO:34), nodZ cDNA (SEQ ID NO:38), nodZ genomic DNA (SEQ ID NO:37), nodS cDNA (SEQ ID NO:49), nodS genomic DNA (SEQ ID NO:48), nodI cDNA (SEQ ID NO:55), and nodI genomic DNA (SEQ ID NO:54).

In another aspect the invention relates to a transcription unit (TU) comprising at least one isolated polynucleotide according to the invention.

In another aspect the invention relates to a vector that encodes an isolated polypeptide according to the invention.

In another aspect the invention relates to a vector comprising an isolated nucleic acid sequence or a TU according to the invention.

In another aspect the invention relates to an isolated host cell comprising an isolated polypeptide, isolated polynucleotide, TU and/or vector according to the invention.

In another aspect the invention relates to a method of making at least one NA comprising heterologously expressing at least one polypeptide, isolated nucleic acid sequence, TU or vector according to the invention in an isolated host cell.

In another aspect the invention relates to at least one NA made by a method of the invention.

In another aspect the present invention relates to an isolated polypeptide or functional fragment or variant thereof from Hypoxylon spp. that catalyzes a biochemical reaction in the biosynthetic pathway leading from 3-geranylgeranyl indole (GGI) 2 to NAA 10.

In another aspect the present invention relates to an isolated polynucleotide encoding at least one polypeptide or functional variant or fragment thereof from Hypoxylon spp. that catalyzes a biochemical reaction in the biosynthetic pathway leading from GGI 2 to NAA 10.

In another aspect the invention relates to a method of making at least one Hypoxylon spp. polypeptide or functional variant or fragment thereof comprising heterologously expressing an isolated nucleic acid sequence or vector according to the invention in an isolated host cell.

In another aspect the invention relates to a method of making at least one NA comprising heterologously expressing in an isolated host cell, at least one polypeptide that catalyzes a biochemical reaction in the biosynthetic pathway leading from GGI 2 to NAA 10.

In another aspect the invention relates to an isolated host cell that expresses at least one heterologous polypeptide that catalyzes the transformation of a substrate in the biosynthetic pathway leading from GGI 2 to the formation of NAA 10.

In another aspect the invention relates to an isolated host cell that produces by heterologous expression, at least one polypeptide involved in the biosynthetic pathway leading from GGI 2 to NAA 10.

In another aspect the invention relates to a method of producing at least one NA comprising contacting a carbohydrate comprising substrate with a recombinant cell transformed with a nucleic acid that results in an increased level of activity of a polypeptide selected from the group consisting of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof compared to the cell prior to transformation, such that the substrate is metabolized to at least one NA.

In another aspect the invention relates to an isolated strain of Hypoxylon pulicicidum that comprises at least one heterologous nucleic acid sequence encoding an enzyme in a biosynthetic pathway leading to NAA 10.

In another aspect the invention relates to an isolated strain of Hypoxylon pulicicidum that expresses at least two different GGPPS enzymes.

In another aspect the invention relates to an isolated strain of Hypoxylon pulicicidum that comprises a genetic modification that leads to an increased biosynthesis of NAA 10.

In another aspect the invention relates to a method of making NAA 10 comprising expressing at least one heterologous nucleic acid sequence in Hypoxylon pulicicidum, wherein the at least one heterologous nucleic acid sequence encodes an enzyme in a biosynthetic pathway leading to NAA 10.

Various embodiments of the different aspects of the invention as discussed above are also set out below in the detailed description of the invention, but the invention is not limited thereto.

Other aspects of the invention may become apparent from the following description which is given by way of example only and with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described with reference to the figures in the accompanying drawings.

FIG. 1: Collection of known nodulisporic acids (NAs).

FIG. 2: Branch points in the biosynthetic pathway of indole diterpenes (IDTs) that give rise to the diverse array of IDT structures. Arrows represent enzymatic steps in IDT biosynthesis.

FIG. 3: HPLC analysis (271 nm) of extracts of P. paxilli knockout (KO) strains (in gray (⋅⋅⋅⋅⋅⋅⋅)) expressing different H. pulicicidum (Nod) enzymes and/or P. paxilli (Pax) enzymes (in black (-)). A black X covers the enzyme(s) that are not expressed in the P. paxilli KO strains (traces i.a, ii.a, iii.a, iv.a, and v.a). The enzyme(s) that have been newly expressed in the P. paxilli KO strain are depicted below the corresponding KO strain and next to their UV traces (i.b, ii.b, iii.b, iv.b, and v.b). Notably, there is a compound that elutes at the same retention time as emindole SB 4a, but emindole SB 4a is only present in three traces (ii.b, iii.b, and v.b) as confirmed by corresponding 406.31±0.01 m/z EICs (FIGS. 5, 6, and 9). Traces correspond to fungal extracts as follows: i.a=PN2290, i.b=pKV27:PN2290, ii.a=PN2257, ii.b=pKV63:PN2257, iii.a=PN2250, iii.b=pSK66:PN2250, iv.a=PN2250, iv.b=pKV74:PN2250, v.a=PN2257, v.b=pKV64:PN2257.

FIG. 4: Extracted ion chromatogram for pKV27:PN2290 (nodC:ΔpaxC) showing MS peak for paxilline 6b (5.3 min, 436.248±0.01 m/z).

FIG. 5: Extracted ion chromatograms for pKV63:PN2257 (nodM:ΔpaxM) showing MS peak for emindole SB 4a (19.3 min, 406.31±0.01 m/z) but not paspaline 4b (17.6 min, 422.305±0.01 m/z).

FIG. 6: Extracted ion chromatograms for pSK66:PN2250 (paxG, nodC, nodM, and nodB:ΔPAX cluster) showing MS peak for emindole SB 4a (19.3 min, 406.31±0.01 m/z) but not paspaline 4b (17.6 min, 422.305±0.01 m/z).

FIG. 7: Extracted ion chromatogram for pKV74:PN2250 (paxG, paxC, paxM, nodB:ΔPAX cluster) showing MS peak for paspaline 4b (17.6 min, 422.305±0.01 m/z) but not emindole SB 4a (19.3 min, 406.31±0.01 m/z).

FIG. 8: Depiction of the predicted NA gene cluster from H. pulicicidum (A) and the NAF 5a biosynthetic pathway (B). Arrows represent individual genes and arrow decorations represent gene function. Figure is not to exact scale and does not include exon/intron structure. Notably the gene cluster lacks a GGPPS responsible for the first secondary-metabolic step in IDT synthesis.

FIG. 9: Extracted ion chromatograms for pKV64:PN2257 (nodM and nodW:ΔpaxM) showing MS peaks for emindole SB 4a (19.3 min, 406.31±0.01 m/z) and NAF 5a (6.2 min, 436.284±0.01 m/z).

FIG. 10: Extracted ion chromatograms for pSK68:PN2250 (paxG, nodC, nodM, nodB, and nodW:ΔPAX cluster) showing MS peaks for emindole SB 4a (19.3 min, 406.31±0.01 m/z) and NAF 5a (6.2 min, 436.284±0.01 m/z).

FIG. 11: Overview of MIDAS Level-1 cloning. (A) ggtctcgtgagacg (SEQ ID NO: 125); cgtctctagacc (SEQ ID NO: 126); ccagagcactctgc (SEQ ID NO: 127); gcagagatctgg (SEQ ID NO: 128); (B) cgtctcactcgggag (SEQ ID NO: 129); aatgtgagacagagacg (SEQ ID NO: 130); gcagagtgagccctc (SEQ ID NO: 131); ttacactctgtctctgc (SEQ ID NO: 132); (C) ggtctcgggag (SEQ ID NO: 133); aatgtgagacc (SEQ ID NO: 134); ccagagccctc (SEQ ID NO: 135); ttacactctgg (SEQ ID NO: 136).
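By way of illustration only, the strand pairs listed for panels (B) and (C) of FIG. 11 can be checked for base-by-base complementarity. The following is a minimal sketch in Python; the pairing of each upper strand with a lower strand (e.g., SEQ ID NO: 129 with SEQ ID NO: 131) is an assumption drawn from the order in which the sequences are listed, and the function names are hypothetical.

```python
# Strand pairs copied from panels (B) and (C) of the FIG. 11 caption.
# Each tuple is (upper strand, lower strand), both written left to right.
# The pairing itself is an assumption based on the listing order.
PAIRS = [
    ("cgtctcactcgggag", "gcagagtgagccctc"),      # SEQ ID NO: 129 / 131
    ("aatgtgagacagagacg", "ttacactctgtctctgc"),  # SEQ ID NO: 130 / 132
    ("ggtctcgggag", "ccagagccctc"),              # SEQ ID NO: 133 / 135
    ("aatgtgagacc", "ttacactctgg"),              # SEQ ID NO: 134 / 136
]

_COMPLEMENT = str.maketrans("acgt", "tgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement of seq, without reversing it."""
    return seq.lower().translate(_COMPLEMENT)


def mismatched_pairs(pairs):
    """Return the pairs whose lower strand is not the complement of the upper."""
    return [(top, bottom) for top, bottom in pairs if complement(top) != bottom]


if __name__ == "__main__":
    # An empty list means every listed pair is complementary.
    print(mismatched_pairs(PAIRS))
```

A check of this kind is useful mainly for catching transcription errors in short printed sequences; it does not replace the electronic sequence listing.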

FIG. 12: MIDAS module address system.

FIG. 13: Overview of MIDAS cassettes.

FIG. 14: Principle of MIDAS multigene assembly (level-3).

FIG. 15: Overview of MIDAS format.

DETAILED DESCRIPTION OF THE INVENTION
Definitions

The term “comprising” as used in this specification and claims means “consisting at least in part of”; that is to say when interpreting statements in this specification and claims which include “comprising”, the features prefaced by this term in each statement all need to be present but other features can also be present. Related terms such as “comprise” and “comprised” are to be interpreted in similar manner.

The term “consisting essentially of” as used herein means the specified materials or steps and those that do not materially affect the basic and novel characteristic(s) of the claimed invention.

The term “consisting of” as used herein means the specified materials or steps of the claimed invention, excluding any element, step, or ingredient not specified in the claim.

The terms “recognition site” and “restriction site” are used interchangeably herein and mean the same thing. These terms as used herein with reference to a restriction enzyme mean the nucleic acid sequence or sequences of a polynucleotide that define the binding site on the polynucleotide for a given restriction enzyme.

The term “indole diterpene (IDT) compound” or “indole diterpenoid” refers to any compound derived from an indole containing precursor, preferably indole-3-glycerol phosphate 1b, and geranylgeranyl pyrophosphate (GGPP) 1a.

In some embodiments an IDT compound is selected from the group consisting of GGI 2, emindole SB 4a, and NAF 5a.

The term “genetic construct” refers to a polynucleotide molecule, usually double-stranded DNA, which has been conjugated to another polynucleotide molecule. In one non-limiting example a genetic construct is made by inserting a first polynucleotide molecule into a second polynucleotide molecule, for example by restriction/ligation as known in the art. In some embodiments, a genetic construct comprises a single polynucleotide module, at least two polynucleotide modules, or a series of multiple polynucleotide modules assembled into a single contiguous polynucleotide molecule (also referred to herein as a “multigene construct”), but not limited thereto.

A genetic construct may contain the necessary elements that permit transcription of a polynucleotide molecule, and, optionally, translation of the transcript into a polypeptide. A polynucleotide molecule comprised in and/or by the genetic construct may be derived from the host cell, or may be derived from a different cell or organism and/or may be a recombinant polynucleotide. Once inside the host cell the genetic construct may become integrated in the host chromosomal DNA. The genetic construct may be linked to a vector.

The term “transcription unit” (TU) as used herein refers to a polynucleotide comprising a sequence of nucleotides that code for a single RNA molecule including all the nucleotide sequences necessary for transcription of the single RNA molecule, including a promoter, an RNA-coding sequence, and a terminator, but not limited thereto.

The term “transcription unit module” (TUM) as used herein refers to a polynucleotide comprising a sequence of nucleotides that encode a single RNA molecule, or parts thereof; or that encode a protein coding sequence (CDS), or parts thereof; or that encode sequence elements, or parts thereof, that control transcription of that RNA molecule; or that encode sequence elements or parts thereof that control translation of the CDS. Such sequence elements may include, but are not limited to, promoters, untranslated regions (UTRs), terminators, polyadenylation signals, ribosome binding sites, transcriptional enhancers and translational enhancers.

The term “multigene construct” as used herein means a genetic construct that is a polynucleotide comprising at least two TUs.

The term “marker” as used herein means a nucleic acid sequence in a polynucleotide that encodes a selectable marker or scorable marker.

The term “selectable marker” as used herein refers to a TU, which when introduced into a cell, confers at least one trait on the cell that allows the cell to be selected based on the presence or absence of that trait. In one embodiment the cell is selected based on survival under conditions that kill cells not comprising the at least one selectable marker.

The term “scorable marker” as used herein refers to a TU, which when introduced into a cell, confers at least one trait on the cell that allows the cell to be scored based on the presence or absence of that trait. In one embodiment the cell comprising the TU is scored by identifying the cell phenotypically from a plurality of cells.

The term “genetic element” as used herein refers to any polynucleotide sequence that is not a TU or does not form part of a TU. Such polynucleotide sequences may include, but are not limited to origins of replication for plasmids and viruses, centromeres, telomeres, repeat sequences, sequences used for homologous recombination, site-specific recombination sequences, and sequences controlling DNA transfer between organisms.

The term “vector” as used herein refers to any type of polynucleotide molecule that may be used to manipulate genetic material so that it can be amplified, replicated, manipulated, partially replicated, modified and/or expressed, but not limited thereto. In some embodiments a vector may be used to transport a polynucleotide comprised in that vector into a cell or organism.

The term “source vector” as used herein refers to a vector into which polynucleotide sequences of interest can be cloned. In some embodiments the polynucleotide sequences are TUs and TUMs as described herein. In some embodiments a source vector is selected from the group consisting of plasmids, bacterial artificial chromosomes (BACs), phage artificial chromosomes (PACs), yeast artificial chromosomes (YACs), bacteriophage, phagemids, and cosmids. In some embodiments, a source vector comprising a polynucleotide sequence of interest is termed an entry clone. In some embodiments the entry clone can serve as a shuttle or destination vector for receiving further polynucleotide sequences.

The term “shuttle vector” as used herein refers to a vector into which polynucleotide sequences of interest can be cloned and from which they can be manipulated. In some embodiments the polynucleotide sequences are TUs and TUMs as described herein. In some embodiments a shuttle vector is selected from the group consisting of plasmids, BACs, PACs, YACs, bacteriophage, phagemids, and cosmids. In some embodiments, a shuttle vector comprising a polynucleotide sequence of interest can serve as a destination vector for receiving further polynucleotide sequences.

The term “destination vector” as used herein refers to a vector into which polynucleotide sequences of interest can be cloned. In some embodiments the polynucleotide sequences are TUs and TUMs as described herein. In some embodiments a destination vector is selected from the group consisting of plasmids, BACs, PACs, YACs, bacteriophage, phagemids, and cosmids. In some embodiments, a destination vector comprising a polynucleotide sequence of interest is an entry clone. In some embodiments the entry clone can serve as a destination vector for receiving further polynucleotide sequences.

The term “polynucleotide(s),” as used herein, means a single or double-stranded deoxyribonucleotide or ribonucleotide polymer of any length, and include as non-limiting examples, coding and non-coding sequences of a gene, sense and antisense sequences, exons, introns, genomic DNA, cDNA, pre-mRNA, mRNA, rRNA, siRNA, miRNA, tRNA, ribozymes, recombinant polynucleotides, isolated and purified naturally occurring DNA or RNA sequences, synthetic RNA and DNA sequences, nucleic acid probes, primers, fragments, genetic constructs, vectors and modified polynucleotides. Reference to nucleic acids, nucleic acid molecules, nucleotide sequences and polynucleotide sequences is to be similarly understood.

The term “gene” as used herein refers to the biologic unit of heredity, self-reproducing and located at a definite position (locus) on a particular chromosome. In one embodiment the particular chromosome is a eukaryotic or bacterial chromosome. The term bacterial chromosome is used interchangeably herein with the term bacterial genome.

The term “gene cluster” as used herein refers to a group of genes located closely together on the same chromosome whose products play a coordinated role in a specific aspect of cellular primary or secondary metabolism. In one example a gene cluster comprises a group of CDSs the products of which all participate in a series of biochemical reactions that comprise the biosynthetic pathway or array that produces a given metabolite, particularly a secondary metabolite.

The term “secondary metabolite” as used herein refers to compounds that are not involved in primary metabolism, and therefore differ from the more prevalent macromolecules such as proteins and nucleic acids that make up the basic machinery of life.

The terms “under conditions wherein the . . . enzyme is active” and “under conditions wherein the . . . enzymes are active”, and grammatical variations thereof when used in reference to enzyme activity mean that the enzyme will perform its expected function; e.g., a restriction endonuclease will cleave a nucleic acid at an appropriate restriction site, and a DNA ligase will covalently join two polynucleotides together.

The term “endogenous” as used herein refers to a constituent of a cell, tissue or organism that originates or is produced naturally within that cell, tissue or organism. An “endogenous” constituent may be any constituent including but not limited to a polynucleotide, a polypeptide including a non-ribosomal polypeptide, a fatty acid or a polyketide, but not limited thereto.

The term “exogenous” as used herein refers to any constituent of a cell, tissue or organism that does not originate or is not produced naturally within that cell, tissue or organism. An exogenous constituent may be, for example, a polynucleotide sequence that has been introduced into a cell, tissue or organism, or a polypeptide expressed in that cell, tissue or organism from that polynucleotide sequence.

“Naturally occurring” as used herein with reference to a polynucleotide sequence according to the invention refers to a primary polynucleotide sequence that is found in nature. A synthetic polynucleotide sequence that is identical to a wild polynucleotide sequence is, for the purposes of this disclosure, considered a naturally occurring sequence. What is important for a naturally occurring polynucleotide sequence is that the actual sequence of nucleotide bases that comprise the polynucleotide is found or known from nature.

For example, a wild type polynucleotide sequence is a naturally occurring polynucleotide sequence, but not limited thereto. A naturally occurring polynucleotide sequence also refers to variant polynucleotide sequences as found in nature that differ from wild type, for example, allelic variants and naturally occurring recombinant polynucleotide sequences due to hybridization or horizontal gene transfer, but not limited thereto.

“Non-naturally occurring” as used herein with reference to a polynucleotide sequence according to the invention refers to a polynucleotide sequence that is not found in nature. Examples of non-naturally occurring polynucleotide sequences include artificially produced mutant and variant polynucleotide sequences, made for example by point mutation, insertion, or deletion, but not limited thereto. Non-naturally occurring polynucleotide sequences also include chemically evolved sequences. What is important for a non-naturally occurring polynucleotide sequence according to the invention is that the actual sequence of nucleotide bases that comprise the polynucleotide is not found or known from nature.

The term “wild type” when used herein with reference to a polynucleotide refers to a naturally occurring, non-mutant form of a polynucleotide. A mutant polynucleotide means a polynucleotide that has sustained a mutation as known in the art, such as point mutation, insertion, deletion, substitution, amplification or translocation, but not limited thereto.

The term, “wild type” when used herein with reference to a polypeptide refers to a naturally occurring, non-mutant form of a polypeptide. A wild type polypeptide is a polypeptide that is capable of being expressed from a wild type polynucleotide.

The term “coding sequence” or “open reading frame” (ORF) refers to the sense strand of a genomic DNA sequence or a cDNA sequence that is capable of producing a transcription product and/or a polypeptide under the control of appropriate regulatory sequences. The CDS is identified by the presence of a 5′ translation start codon and a 3′ translation stop codon. When inserted into a genetic construct or an expression cassette, a “coding sequence” (CDS) is capable of being expressed when it is operably linked to a promoter sequence and/or other regulatory elements.
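As a non-limiting illustration of the start codon and stop codon criterion above, the following minimal Python sketch tests whether a DNA string has the basic shape of a CDS. It assumes the standard genetic code, an ATG start codon, and an intron-free sequence (e.g., a cDNA); the function name and the toy sequences in the usage lines are hypothetical and are not taken from the sequence listing.

```python
# Stop codons of the standard genetic code (an assumption; some organisms
# and organelles use variant codes).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def looks_like_cds(seq: str) -> bool:
    """Return True if seq starts with ATG, ends with a stop codon,
    has a length divisible by three, and has no in-frame internal stop."""
    s = seq.upper()
    if len(s) < 6 or len(s) % 3 != 0:
        return False
    codons = [s[i:i + 3] for i in range(0, len(s), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Any stop codon before the last codon would truncate translation.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


if __name__ == "__main__":
    print(looks_like_cds("ATGGCTTAA"))     # start, one codon, stop
    print(looks_like_cds("ATGTAAGCTTAA"))  # in-frame internal stop
```

Such a check addresses only the form of the sequence; whether a CDS is actually expressed depends on the regulatory elements to which it is operably linked.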

“Operably-linked” means that the sequence to be expressed is placed under the control of regulatory elements.

“Regulatory elements” as used herein refers to any nucleic acid sequence element that controls or influences the expression of a polynucleotide insert from a vector, genetic construct or expression cassette and includes promoters, transcription control sequences, translation control sequences, origins of replication, tissue-specific regulatory elements, temporal regulatory elements, enhancers, polyadenylation signals, repressors and terminators. Regulatory elements can be “homologous” or “heterologous” to the polynucleotide insert to be expressed from a genetic construct, expression cassette or vector as described herein. When a genetic construct, expression cassette or vector as described herein is present in a cell, a regulatory element can be “endogenous”, “exogenous”, “naturally occurring” and/or “non-naturally occurring” with respect to the cell.

The term “noncoding region” refers to untranslated sequences that are upstream of the translational start site and downstream of the translational stop site. These sequences are also referred to respectively as the 5′ UTR and the 3′ UTR. These regions include elements required for transcription initiation and termination and for regulation of translation efficiency.

Terminators are sequences, which terminate transcription, and are found in the 3′ untranslated ends of genes downstream of the translated sequence. Terminators are important determinants of mRNA stability and in some cases have been found to have spatial regulatory functions.

The term “promoter” refers to nontranscribed cis-regulatory elements upstream of the coding region that regulate the transcription of a polynucleotide sequence. Promoters comprise cis-initiator elements which specify the transcription initiation site and conserved boxes. In one non-limiting example, bacterial promoters may comprise a “Pribnow box” (also known as the −10 region), and other motifs that are bound by transcription factors and promote transcription. Promoters can be homologous or heterologous with respect to the polynucleotide sequence to be expressed. When the polynucleotide sequence is to be expressed in a cell, a promoter may be an endogenous or exogenous promoter. Promoters can be constitutive promoters, inducible promoters or regulatable promoters as known in the art.

“Homologous” as used herein with reference to polynucleotide regulatory elements, means a polynucleotide regulatory element that is a native and naturally-occurring polynucleotide regulatory element. A homologous polynucleotide regulatory element may be operably linked to a polynucleotide of interest such that the polynucleotide of interest can be expressed from a TU, genetic element or vector according to the invention.

“Homologous” as used herein with reference to a polynucleotide or polypeptide in a host organism means that the polynucleotide or polypeptide is a native and naturally-occurring polynucleotide or polypeptide within that host organism. A homologous polynucleotide may be operably linked to a homologous or heterologous regulatory element so that a homologous polypeptide may be expressed from a TU, genetic element or vector comprising the homologous polynucleotide as described herein.

“Introduced Homologous” as used herein with reference to a polynucleotide or polypeptide in a host organism means that the polynucleotide or polypeptide is a native and naturally-occurring polynucleotide or polypeptide within that host organism that has been introduced into the organism by experimental techniques. An introduced homologous polynucleotide may be operably linked to a homologous or heterologous regulatory element so that a homologous polypeptide may be expressed from a TU, genetic element or vector comprising the homologous polynucleotide as described herein.

“Heterologous” as used herein with reference to polynucleotide regulatory elements, means a polynucleotide regulatory element that is not a native and naturally-occurring polynucleotide regulatory element. A heterologous polynucleotide regulatory element is not normally associated with the CDS to which it is operably linked. A heterologous regulatory element may be operably linked to a polynucleotide of interest such that the polynucleotide of interest can be expressed from a vector, genetic construct or expression cassette according to the invention. Such promoters may include promoters normally associated with other genes, ORFs or coding regions, and/or promoters isolated from any other bacterial, viral, eukaryotic, or mammalian cell.

“Heterologous” as used herein with reference to a polynucleotide or polypeptide in a host organism means a polynucleotide or polypeptide that is not a native and naturally-occurring polynucleotide or polypeptide in that host organism. A heterologous polynucleotide may be operably linked to a heterologous or homologous regulatory element so that a heterologous polypeptide may be expressed from a TU, genetic element or vector comprising the heterologous polynucleotide as described herein.

The terms “heterologously expressing” and “heterologous expression” mean the expression of a heterologous polypeptide in a host cell.

A “biochemical reaction in the biosynthetic pathway leading from GGI 2 to NAA 10” means one of the specific reactions catalyzed by one of the specific enzymes involved in transforming the substrate molecule GGI 2 through the following intermediates: mono-epoxidized GGI 3a, emindole SB 4a, NAF 5a, NAE 6a, NAD 7a, NAC 8, NAB 9, to NAA 10, and does not include similar enzymes within a host cell that may have similar functions but that do not act on the particular named intermediates above.
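For illustration only, the ordered series of intermediates named in the definition above can be represented as a simple list, from which the substrate and product of each step follow directly. The Python sketch below assumes, as the definition implies, that the intermediates are traversed in the order listed; the function name is hypothetical, and no enzyme is assigned to any step because this definition does not state those assignments.

```python
# Intermediates in the order named in the definition, from GGI 2 to NAA 10.
PATHWAY = [
    "GGI 2",
    "mono-epoxidized GGI 3a",
    "emindole SB 4a",
    "NAF 5a",
    "NAE 6a",
    "NAD 7a",
    "NAC 8",
    "NAB 9",
    "NAA 10",
]


def steps_between(start: str, end: str):
    """Return the (substrate, product) pairs leading from start to end."""
    i, j = PATHWAY.index(start), PATHWAY.index(end)
    if i > j:
        raise ValueError(f"{end!r} precedes {start!r} in the pathway")
    return list(zip(PATHWAY[i:j], PATHWAY[i + 1:j + 1]))


if __name__ == "__main__":
    # List every conversion from GGI 2 up to NAF 5a.
    for substrate, product in steps_between("GGI 2", "NAF 5a"):
        print(f"{substrate} -> {product}")
```

Under this representation, a reaction falls within the definition only if both its substrate and its product are adjacent entries in the list.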

A “functional variant or fragment thereof” of a polypeptide is a subsequence of the polypeptide that performs a function that is required for the biological activity or binding of that polypeptide and/or provides the three dimensional structure of the polypeptide. The term may refer to a polypeptide, an aggregate of a polypeptide such as a dimer or other multimer, a fusion polypeptide, a polypeptide fragment, a polypeptide variant, or functional polypeptide derivative thereof that is capable of performing the polypeptide activity.

“Isolated” as used herein with reference to polynucleotide or polypeptide sequences describes a sequence that has been removed from its natural cellular environment. An isolated molecule may be obtained by any method or combination of methods as known and used in the art, including biochemical, recombinant, and synthetic techniques. The polynucleotide or polypeptide sequences may be prepared by at least one purification step.

“Isolated” when used herein in reference to a cell or host cell describes a cell or host cell that has been obtained or removed from an organism or from its natural environment and is subsequently maintained in a laboratory environment as known in the art. The term encompasses single cells, per se, as well as cells or host cells comprised in a cell culture and can include a single cell or single host cell.

The term “isolated host cell” as used herein with reference to a fungal host cell encompasses single cells of unicellular fungi and the hyphae and mycelia of filamentous fungi including septate and non-septate forms.

The term “recombinant” refers to a polynucleotide sequence that is removed from sequences that surround it in its natural context and/or is recombined with sequences that are not present in its natural context. A “recombinant” polypeptide sequence is produced by translation from a “recombinant” polynucleotide sequence.

As used herein, the term “variant” refers to polynucleotide or polypeptide sequences different from the specifically identified sequences, wherein one or more nucleotides or amino acid residues is deleted, substituted, or added. Variants may be naturally occurring allelic variants, or non-naturally occurring variants. Variants may be from the same or from other species and may encompass homologues, paralogues and orthologues. In certain embodiments, variants of the polypeptides useful in the invention have biological activities that are the same or similar to those of a corresponding wild type molecule; i.e., the parent polypeptides or polynucleotides.

In certain embodiments, variants of the polypeptides described herein have biological activities that are similar, or that are substantially similar to their corresponding wild type molecules. In certain embodiments the similarities are similar activity and/or binding specificity.

In certain embodiments, variants of polypeptides described herein have biological activities that differ from their corresponding wild type molecules. In certain embodiments the differences are altered activity and/or binding specificity.

The term “variant” with reference to polynucleotides and polypeptides encompasses all forms of polynucleotides and polypeptides as defined herein.

Variant polynucleotide sequences preferably exhibit at least 50%, at least 60%, preferably at least 70%, preferably at least 71%, preferably at least 72%, preferably at least 73%, preferably at least 74%, preferably at least 75%, preferably at least 76%, preferably at least 77%, preferably at least 78%, preferably at least 79%, preferably at least 80%, preferably at least 81%, preferably at least 82%, preferably at least 83%, preferably at least 84%, preferably at least 85%, preferably at least 86%, preferably at least 87%, preferably at least 88%, preferably at least 89%, preferably at least 90%, preferably at least 91%, preferably at least 92%, preferably at least 93%, preferably at least 94%, preferably at least 95%, preferably at least 96%, preferably at least 97%, preferably at least 98%, and preferably at least 99% identity to a sequence of the present invention. Identity is found over a comparison window of at least 8 nucleotide positions, preferably at least 10 nucleotide positions, preferably at least 15 nucleotide positions, preferably at least 20 nucleotide positions, preferably at least 27 nucleotide positions, preferably at least 40 nucleotide positions, preferably at least 50 nucleotide positions, preferably at least 60 nucleotide positions, preferably at least 70 nucleotide positions, preferably at least 80 nucleotide positions, preferably over the entire length of a polynucleotide used in or identified according to a method of the invention.

Polynucleotide variants also encompass those which exhibit a similarity to one or more of the specifically identified sequences that is likely to preserve the functional equivalence of those sequences and which could not reasonably be expected to have occurred by random chance.

Polynucleotide sequence identity and similarity can be determined readily by those of skill in the art.
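By way of illustration only, percent identity over a comparison window can be computed as sketched below. The sketch assumes the two sequences have already been aligned to equal length by some other tool, with "-" marking gaps; the function names percent_identity and best_window_identity are hypothetical and are not part of any method described herein.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Every alignment column counts toward the denominator; a column is a
    match only when both sequences carry the same non-gap character.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


def best_window_identity(seq_a: str, seq_b: str, window: int) -> float:
    """Highest percent identity found over any window of `window` columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    if window <= 0 or window > len(seq_a):
        raise ValueError("window must be between 1 and the alignment length")
    return max(
        percent_identity(seq_a[i:i + window], seq_b[i:i + window])
        for i in range(len(seq_a) - window + 1)
    )
```

The same functions apply to aligned amino acid sequences. Setting the window to the full alignment length gives identity over the entire sequence.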

Variant polynucleotides also encompass polynucleotides that differ from the polynucleotide sequences described herein but that, as a consequence of the degeneracy of the genetic code, encode a polypeptide having similar activity to a polypeptide encoded by a polynucleotide of the present invention. A sequence alteration that does not change the amino acid sequence of the polypeptide is a “silent variation”. Except for ATG (methionine) and TGG (tryptophan), other codons for the same amino acid may be changed by art recognized techniques, e.g., to optimize codon expression in a particular host organism.
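The degeneracy described above can be illustrated with a minimal sketch. It assumes the standard nuclear genetic code and DNA codons; the helper names translate, synonymous_codons and is_silent are hypothetical, chosen only for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
# (third position varying fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def synonymous_codons(codon: str) -> list:
    """All codons encoding the same amino acid as `codon` (itself included)."""
    aa = CODON_TABLE[codon.upper()]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def is_silent(original: str, altered: str) -> bool:
    """True when the altered CDS encodes the same amino acid sequence."""
    return translate(original) == translate(altered)
```

Under this table, synonymous_codons("ATG") and synonymous_codons("TGG") each return a single codon, which is why those two codons cannot be exchanged for a synonym.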

Polynucleotide sequence alterations resulting in conservative substitutions of one or several amino acids in the encoded polypeptide sequence without significantly altering its biological activity are also included in the invention. A skilled artisan will be aware of methods for making phenotypically silent amino acid substitutions (see, e.g., Bowie et al., 1990, Science 247, 1306).

The term “variant” with reference to polypeptides also encompasses naturally occurring, recombinantly and synthetically produced polypeptides. Variant polypeptide sequences preferably exhibit at least 35%, preferably at least 40%, preferably at least 50%, preferably at least 60%, preferably at least 70%, preferably at least 71%, preferably at least 72%, preferably at least 73%, preferably at least 74%, preferably at least 75%, preferably at least 76%, preferably at least 77%, preferably at least 78%, preferably at least 79%, preferably at least 80%, preferably at least 81%, preferably at least 82%, preferably at least 83%, preferably at least 84%, preferably at least 85%, preferably at least 86%, preferably at least 87%, preferably at least 88%, preferably at least 89%, preferably at least 90%, preferably at least 91%, preferably at least 92%, preferably at least 93%, preferably at least 94%, preferably at least 95%, preferably at least 96%, preferably at least 97%, preferably at least 98%, and preferably at least 99% identity to a sequence of the present invention. Identity is found over a comparison window of at least 2 amino acid positions, preferably at least 3 amino acid positions, preferably at least 4 amino acid positions, preferably at least 5 amino acid positions, preferably at least 7 amino acid positions, preferably at least 10 amino acid positions, preferably at least 15 amino acid positions, preferably at least 20 amino acid positions, preferably over the entire length of a polypeptide used in or identified according to a method of the invention.

Polypeptide variants also encompass those which exhibit a similarity to one or more of the specifically identified sequences that is likely to preserve the functional equivalence of those sequences and which could not reasonably be expected to have occurred by random chance.

Polypeptide sequence identity and similarity can be determined readily by those of skill in the art.

A variant polypeptide includes a polypeptide wherein the amino acid sequence differs from a polypeptide herein by one or more conservative or non-conservative amino acid substitutions, deletions, additions or insertions which do not affect the biological activity of the peptide.

Conservative substitutions typically include the substitution of one amino acid for another with similar characteristics, e.g., substitutions within the following groups: valine, glycine; glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid; asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine.
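As a non-limiting illustration, the groups listed above can be encoded directly and used to classify a single-residue substitution. The one-letter codes and the function name is_conservative are conventions adopted only for this sketch; the groups are those recited above and are not an exhaustive definition of conservative substitution.

```python
# Substitution groups as recited above, in one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"V", "G"},        # valine, glycine
    {"G", "A"},        # glycine, alanine
    {"V", "I", "L"},   # valine, isoleucine, leucine
    {"D", "E"},        # aspartic acid, glutamic acid
    {"N", "Q"},        # asparagine, glutamine
    {"S", "T"},        # serine, threonine
    {"K", "R"},        # lysine, arginine
    {"F", "Y"},        # phenylalanine, tyrosine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues fall within one of the listed groups."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

Note that membership is checked group by group, so glycine to leucine is not treated as conservative even though each residue shares a group with valine.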

Analysis of evolved biological sequences has shown that not all sequence changes are equally likely, reflecting at least in part the differences in conservative versus non-conservative substitutions at a biological level. For example, certain amino acid substitutions may occur frequently, whereas others are very rare. Evolutionary changes or substitutions in amino acid residues can be modelled by a scoring matrix also referred to as a substitution matrix. Such matrices are used in bioinformatics analysis to identify relationships between sequences and are known to the skilled worker.

Other variants include peptides with modifications which influence peptide stability. Such analogs may contain, for example, one or more non-peptide bonds (which replace the peptide bonds) in the peptide sequence. Also included are analogs that include residues other than naturally occurring L-amino acids, e.g. D-amino acids or non-naturally occurring synthetic amino acids, e.g. beta or gamma amino acids and cyclic analogs.

Substitutions, deletions, additions or insertions may be made by mutagenesis methods known in the art. A skilled worker will be aware of methods for making phenotypically silent amino acid substitutions. See for example Bowie et al., 1990, Science 247, 1306.

A polypeptide as used herein can also refer to a polypeptide that has been modified during or after synthesis, for example, by biotinylation, benzylation, glycosylation, phosphorylation, amidation, by derivatization using blocking/protecting groups and the like. Such modifications may increase stability or activity of the polypeptide.

The terms “modulate(s) expression”, “modulated expression” and “modulating expression” of a polynucleotide or polypeptide, are intended to encompass the situation where genomic DNA corresponding to a polynucleotide to be expressed according to the invention is modified thus leading to modulated expression of a polynucleotide or polypeptide of the invention. Modification of the genomic DNA may be through genetic transformation or other methods known in the art for inducing mutations. The “modulated expression” can be related to an increase or decrease in the amount of messenger RNA and/or polypeptide produced and may also result in an increase or decrease in the activity of a polypeptide due to alterations in the sequence of a polynucleotide and polypeptide produced.

The terms “modulate(s) activity”, “modulated activity” and “modulating activity” of a polynucleotide or polypeptide, are intended to encompass the situation where genomic DNA corresponding to a polynucleotide to be expressed according to the invention is modified thus leading to modulated expression of a polynucleotide or modulated expression or activity of polypeptide of the invention. Modification of the genomic DNA may be through genetic transformation or other methods known in the art for inducing mutations. The “modulated activity” can be related to an increase or decrease in the amount of messenger RNA and/or polypeptide produced and may also result in an increase or decrease in the functional activity of a polypeptide due to alterations in the sequence of a polynucleotide and polypeptide produced.

It is intended that reference to a range of numbers disclosed herein (for example 1 to 10) also incorporates reference to all related numbers within that range (for example, 1, 1.1, 2, 3, 3.9, 4, 5, 6, 6.5, 7, 8, 9 and 10) and also any range of rational numbers within that range (for example 2 to 8, 1.5 to 5.5 and 3.1 to 4.7) and, therefore, all sub-ranges of all ranges expressly disclosed herein are expressly disclosed. These are only examples of what is specifically intended and all possible combinations of numerical values between the lowest value and the highest value enumerated are to be considered to be expressly stated in this application in a similar manner.

DETAILED DESCRIPTION

Since the identification of the biosynthetic pathway for the IDT paxilline 6b in Penicillium paxilli, gene functionality in seven other IDT biosynthetic pathways has been elucidated. These IDT pathways share homologous genes that encode enzymes for the first three steps in IDT biosynthesis (FIG. 2): (I) a geranylgeranyl pyrophosphate synthase (GGPPS) converts farnesyl pyrophosphate and isopentenyl pyrophosphate into GGPP 1a, (II) a geranylgeranyl transferase (GGT) catalyzes the indole condensation of GGPP 1a and indole-3-glycerol phosphate 1b to make GGI 2, and (III) a regioselective flavin adenine dinucleotide (FAD) dependent epoxidase creates the single and/or double epoxidized-GGI products 3a/3b. At the fourth enzymatic step, involving IDT cyclization, the pathways diverge into four key branches giving rise to mono/di-oxygenated anti-Markovnikov-derived cyclic cores like emindole SB 4a and paspaline 4b, or mono/di-oxygenated Markovnikov-derived cyclic cores like aflavinine 4c and emindole DB 4d. These cyclic cores are often further modified by decorative enzymes that create the bioactive diversity seen across IDTs.

NAs are bioactive IDTs produced by Hypoxylon pulicicidum, with NAA 10 being of particular significance due to its highly potent insecticidal activity against blood-feeding arthropods and lack of mammalian toxicity. However, as described herein, the production of NAA 10 by direct synthesis has not been achieved, and the biosynthetic production of this compound in quantities that would be useful at even a small scale commercial level would be difficult to achieve, if at all.

Accordingly, the present invention generally relates to a series of isolated genes from the fungus H. pulicicidum, which, combined, form a gene cluster that mediates the production of NAs, and to the use of that gene cluster to direct the heterologous expression of NAs in an isolated host cell, preferably an isolated fungal cell. Using a recently developed technique for manipulating gene sequences termed the Modular Idempotent DNA Assembly System (MIDAS), the inventors have reconstituted the biosynthetic pathway for NAF 5a from H. pulicicidum in an alternate fungal host, Penicillium paxilli. The MIDAS platform and method of using the MIDAS platform are described herein and in related patent application AU2017903955, the entirety of which is incorporated herein by reference.

The inventors analyzed the genomic sequence of H. pulicicidum and identified a cluster comprising 15 predicted coding sequences (CDSs): nodW cDNA (SEQ ID NO:2) and genomic DNA (SEQ ID NO:1), nodR cDNA (SEQ ID NO:5) and genomic DNA (SEQ ID NO:4), nodX cDNA (SEQ ID NO:8) and genomic DNA (SEQ ID NO:7), nodM cDNA (SEQ ID NO:11) and genomic DNA (SEQ ID NO:10), nodB cDNA (SEQ ID NO:14) and genomic DNA (SEQ ID NO:13), nodO cDNA (SEQ ID NO:17) and genomic DNA (SEQ ID NO:16), nodJ cDNA (SEQ ID NO:20) and genomic DNA (SEQ ID NO: 19), nodC cDNA (SEQ ID NO:23) and genomic DNA (SEQ ID NO:22), nodY1 cDNA (SEQ ID NO:26) and genomic DNA (SEQ ID NO:25), nodD2 cDNA (SEQ ID NO:29) and genomic DNA (SEQ ID NO:28), nodD1 cDNA (SEQ ID NO:32) and genomic DNA (SEQ ID NO:31), nodY2 cDNA (SEQ ID NO:35) and genomic DNA (SEQ ID NO:34), nodZ cDNA (SEQ ID NO:38) and genomic DNA (SEQ ID NO:37), nodS cDNA (SEQ ID NO:49) and genomic DNA (SEQ ID NO:48), nodI cDNA (SEQ ID NO:55) and genomic DNA (SEQ ID NO:54) that are expected to encode enzymes necessary for the biosynthesis of NAA 10.

The boundaries of this cluster were determined by identifying flanking genes that have high similarity and syntenic organisation compared with an equivalent genomic locus in another Hypoxylon strain that does not produce nodulisporic acids. Details of these predicted genes in the cluster and their proposed function are shown in Table 1. Seven of the cluster genes are homologous to those found in other IDT biosynthetic gene clusters (Tables 2-6). The protein products of the seven predicted genes that are homologous to IDT biosynthesis genes from other fungi have at least 35% amino acid identity to their homologues in the PAX cluster of P. paxilli, the JAN cluster of P. janthinellum, and/or the PEN cluster of P. crustosum and include a GGT (NodC (SEQ ID NO:24)), two FAD-dependent oxidases (NodM (SEQ ID NO:12) and NodO (SEQ ID NO:18)), an IDT cyclase (NodB (SEQ ID NO:15)), two prenyl transferases (NodD2 (SEQ ID NO:30) and NodD1 (SEQ ID NO:33)), and one cytochrome P450 oxygenase (NodR (SEQ ID NO:6)). The other eight putative ORFs were predicted to encode four cytochrome P450 oxygenases (NodW (SEQ ID NO:3), NodX (SEQ ID NO:9), NodJ (SEQ ID NO:21), and NodZ (SEQ ID NO:39)), a pair of paralogous FAD-dependent oxygenases (NodY1 (SEQ ID NO:27) and NodY2 (SEQ ID NO:36)), and two gene products of unknown function that may be involved in NA biosynthesis (NodS (SEQ ID NO:50) and NodI (SEQ ID NO:56)). Similar to the TER gene cluster from Chaunopycnis alba (Tolypocladium album) responsible for terpendole biosynthesis, the NOD cluster does not appear to contain a secondary metabolite-specific GGPPS gene. Notably, the inventors identified only one GGPPS-encoding gene in the genome of H. pulicicidum and the amino acid sequence of its predicted protein product, its exon/intron structure, and its location outside of the identified cluster strongly suggest that it is responsible for primary metabolic function similar to ggs1 in P. paxilli.
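For convenience, the cluster genes, their sequence identifiers and their predicted enzyme classes as recited above can be collected in a simple lookup table, sketched below. The data structure, the names ClusterGene, NOD_CLUSTER and class_tally, and the single label "FAD-dependent oxygenase" used for NodM, NodO, NodY1 and NodY2 are choices made only for this illustration; the sequence identifiers themselves are those given in the description.

```python
from collections import Counter
from typing import NamedTuple


class ClusterGene(NamedTuple):
    genomic: int          # SEQ ID NO of the genomic DNA
    cdna: int             # SEQ ID NO of the cDNA
    protein: int          # SEQ ID NO of the polypeptide
    predicted_class: str  # predicted enzyme class of the gene product


NOD_CLUSTER = {
    "nodW": ClusterGene(1, 2, 3, "cytochrome P450 oxygenase"),
    "nodR": ClusterGene(4, 5, 6, "cytochrome P450 oxygenase"),
    "nodX": ClusterGene(7, 8, 9, "cytochrome P450 oxygenase"),
    "nodM": ClusterGene(10, 11, 12, "FAD-dependent oxygenase"),
    "nodB": ClusterGene(13, 14, 15, "IDT cyclase"),
    "nodO": ClusterGene(16, 17, 18, "FAD-dependent oxygenase"),
    "nodJ": ClusterGene(19, 20, 21, "cytochrome P450 oxygenase"),
    "nodC": ClusterGene(22, 23, 24, "geranylgeranyl transferase"),
    "nodY1": ClusterGene(25, 26, 27, "FAD-dependent oxygenase"),
    "nodD2": ClusterGene(28, 29, 30, "prenyl transferase"),
    "nodD1": ClusterGene(31, 32, 33, "prenyl transferase"),
    "nodY2": ClusterGene(34, 35, 36, "FAD-dependent oxygenase"),
    "nodZ": ClusterGene(37, 38, 39, "cytochrome P450 oxygenase"),
    "nodS": ClusterGene(48, 49, 50, "unknown"),
    "nodI": ClusterGene(54, 55, 56, "unknown"),
}


def class_tally(cluster=NOD_CLUSTER) -> Counter:
    """Count cluster genes by predicted enzyme class."""
    return Counter(gene.predicted_class for gene in cluster.values())
```

A lookup such as NOD_CLUSTER["nodB"].protein returns 15, the SEQ ID NO of the NodB polypeptide.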

To confirm the function of gene products and directly establish their respective roles in NAA 10 biosynthesis the inventors constructed a series of plasmids harbouring various combinations of these genes, which they then transformed into appropriate P. paxilli hosts (Table 7) for heterologous production of NAA 10 precursors. Accordingly, CDSs of the H. pulicicidum genes of interest were amplified (see Table 8 for primers) and cloned into a MIDAS Level-1 destination vector, pML1 (Table 9). At MIDAS Level-2, the cloned CDSs were placed under the control of heterologous promoter (ProUTR) and transcriptional terminator (UTRterm) modules to generate full-length TUs (Table 10), which were then used to generate the multi-gene plasmids (Table 11). The inventors used a repertoire of P. paxilli knockout strains (Table 7) to carry out functional complementations and pathway reconstitution to determine the functions of genes in the NOD cluster. Following transformation of P. paxilli hosts with multi-gene plasmids, the inventors determined the chemical phenotypes of the transformants initially by normal-phase thin-layer chromatography (TLC, results not shown) and subsequently by reversed-phase liquid chromatography-mass spectrometry (LC-MS) analysis of fungal extracts. The inventors purified the newly expressed metabolites, as determined by high resolution mass spectrometry (HRMS), by semi-preparative reversed-phase high-performance liquid chromatography (HPLC) and subjected compounds to nuclear magnetic resonance (NMR) spectroscopic analysis (¹H, ¹³C, and HSQC, HMBC, COSY) for final identification.

Using this methodology, the inventors identified that nodC (cDNA (SEQ ID NO:23) and genomic DNA (SEQ ID NO:22)) is a functional ortholog of paxC (cDNA (SEQ ID NO:43) and genomic DNA (SEQ ID NO:42)), and that NodC (SEQ ID NO:24) mediates the production of GGI 2, the second step in IDT biosynthesis, in H. pulicicidum (FIG. 3, trace i.b, FIG. 4). NodC (SEQ ID NO:24) shares 52.3% amino acid sequence identity with PaxC (SEQ ID NO:44) from P. paxilli (Table 2). The inventors also identified that nodM (cDNA (SEQ ID NO:11) and genomic DNA (SEQ ID NO:10)), is a homolog of paxM (cDNA (SEQ ID NO:46) and genomic DNA (SEQ ID NO:45)), and that NodM (SEQ ID NO:12) is a GGI 2 mono-epoxidase that catalyzes the production of monoepoxidized-GGI 3a (FIG. 3, trace ii.b, FIG. 5). NodM (SEQ ID NO:12) shares 48.6% sequence identity with PaxM (SEQ ID NO:47) from P. paxilli (Table 3). Notably, the inventors showed that NodM (SEQ ID NO: 12) is specifically a mono-epoxidase, unlike PaxM (SEQ ID NO:47), which is a mono- or di-epoxidase. The inventors further confirmed that NodB (SEQ ID NO:15) from H. pulicicidum acts as the IDT cyclase that cyclizes the monoepoxidized-GGI product 3a to form emindole SB 4a (FIG. 3, trace iii.b, FIG. 6) and that nodB (cDNA (SEQ ID NO:14) and genomic DNA (SEQ ID NO:13)) is a functional ortholog of paxB (cDNA (SEQ ID NO:52) and genomic DNA (SEQ ID NO:51)) from P. paxilli (FIG. 3, trace iv.b, FIG. 7). NodB (SEQ ID NO:15) from H. pulicicidum shares 63% identity with PaxB (SEQ ID NO:53) from P. paxilli (Table 4).

The dedicated NAA 10 core is NAF 5a, generated by oxidation of the terminal methyl carbon, C5″, of emindole SB 4a (FIG. 8B). Accordingly the inventors confirmed the identity of this oxidase as a P450 oxygenase encoded by H. pulicicidum nodW (cDNA (SEQ ID NO:2) and genomic DNA (SEQ ID NO: 1)) by co-expression of nodW (cDNA (SEQ ID NO:2) and genomic DNA (SEQ ID NO:1)) with nodM (cDNA (SEQ ID NO:11) and genomic DNA (SEQ ID NO:10)) into a paxM (cDNA (SEQ ID NO:46) and genomic DNA (SEQ ID NO:45)) deletion background (PN2257) resulting in the production of NAF 5a (FIG. 3, trace v.b, FIG. 9).

To confirm that only five genes are essential for the production of NAF 5a in P. paxilli, and to establish that no other P. paxilli IDT genes from the PAX cluster in the paxM KO strain (PN2257) were contributing to NAF 5a production, the inventors assembled a multigene construct comprising paxG (genomic DNA (SEQ ID NO:40)) from P. paxilli, and nodC (genomic DNA (SEQ ID NO:22)), nodM (genomic DNA (SEQ ID NO:10)), nodB (genomic DNA (SEQ ID NO:13)), and nodW (genomic DNA (SEQ ID NO:1)) from H. pulicicidum. This multigene construct was transformed into a P. paxilli PAX gene cluster knockout strain (PN2250). As expected, expression of the five genes showed that NAF 5a was produced (FIG. 10) and indicated that these five genes are indeed required to biosynthesize NAF 5a.
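The logic of this five-gene reconstitution can be sketched as a simple linear model, in which each gene product acts on the product of the preceding step. The model is a deliberate simplification for illustration: it assumes a strictly linear pathway in a host lacking the PAX cluster, it treats paxG as supplying the GGPP-forming step (an assumption drawn from the description of the first pathway step above), and it ignores any contribution from host primary metabolism. The names NAF_PATHWAY and furthest_product are hypothetical.

```python
# Ordered (gene, product) pairs for the steps leading to NAF 5a.
NAF_PATHWAY = [
    ("paxG", "GGPP 1a"),                 # assumed GGPP-forming step
    ("nodC", "GGI 2"),                   # geranylgeranyl transferase
    ("nodM", "mono-epoxidized GGI 3a"),  # mono-epoxidase
    ("nodB", "emindole SB 4a"),          # IDT cyclase
    ("nodW", "NAF 5a"),                  # P450 oxygenase
]


def furthest_product(expressed_genes):
    """Return the last pathway product reachable with the given genes.

    The walk stops at the first missing gene; None means that not even
    the first step is present.
    """
    product = None
    for gene, compound in NAF_PATHWAY:
        if gene not in expressed_genes:
            break
        product = compound
    return product
```

In this model, omitting nodW from the five-gene set yields emindole SB 4a as the furthest product, whereas the complete set yields NAF 5a.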

Based on the work described herein the inventors disclose the use of heterologous expression to identify the first five steps that deliver the NAA 10 core compound, NAF 5a. In particular, the inventors have confirmed the function of four previously unknown genes from H. pulicicidum: nodC (cDNA (SEQ ID NO:23) and genomic DNA (SEQ ID NO:22)), nodM (cDNA (SEQ ID NO:11) and genomic DNA (SEQ ID NO:10)), nodB (cDNA (SEQ ID NO:14) and genomic DNA (SEQ ID NO:13)), and nodW (cDNA (SEQ ID NO:2) and genomic DNA (SEQ ID NO:1)), and discovered a second filamentous fungal species, H. pulicicidum, that does not appear to have a secondary metabolic GGPPS gene but can still produce IDTs. Without wishing to be bound by theory, the inventors believe that H. pulicicidum relies upon its primary metabolic GGPPS to provide the GGPP for IDT synthesis. The lack of a secondary-metabolic GGPPS may explain why H. pulicicidum produces such low quantities of NAs. The low quantities of NAs produced by H. pulicicidum are a challenge both for resolving the biosynthetic details and for usage of the compounds or their derivatives. Using the efficient gene reassembly of MIDAS and heterologous expression in P. paxilli the inventors have overcome both these issues. Furthermore, the inventors demonstrated that P. paxilli, with its far more favourable growth conditions, is a suitable host for heterologous expression studies, which enabled the inventors to confirm the function of genes more quickly and easily than would have been possible had they relied on the biosynthetic machinery of H. pulicicidum.

Elucidation of the biosynthetic routes for heterologous production of NAF 5a in P. paxilli provides a reasonable expectation of success in being able to fully identify the gene products from H. pulicicidum that are responsible for the ‘decoration’ steps that lead to the production of fully functionalized NAA 10. This reasonable expectation comes from the identification, by the inventors, of the nucleic acid CDSs that are predicted to encode the enzymes required for prenylation to form nodulisporic acid E 6a, and for oxidations to form nodulisporic acid D 7a. Each of these nucleic acid sequences has been putatively identified from the H. pulicicidum biosynthetic gene cluster described herein. Overall, the inventors' work described herein confirms that heterologous expression of IDT genes in a heterologous host is a viable method that can be employed to produce natural products that are difficult to obtain in other ways, and that could be useful across many industries.

Polypeptides

Accordingly, in one aspect the invention relates to an isolated polypeptide comprising an amino acid sequence selected from the group consisting of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof.

Preferably the functional variant or fragment thereof comprises at least 70%, preferably at least 75%, preferably at least 80%, preferably at least 85%, preferably at least 90%, preferably at least 95%, preferably at least 99% amino acid sequence identity to an amino acid sequence selected from the group consisting of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56).

Preferably the isolated polypeptide comprising NodW (SEQ ID NO:3) or a functional variant or fragment thereof has oxygenase activity, preferably cytochrome P450 oxygenase activity.

Preferably the isolated polypeptide comprising NodR (SEQ ID NO:6) or a functional variant or fragment thereof has oxygenase activity, preferably cytochrome P450 oxygenase activity.

Preferably the isolated polypeptide comprising NodX (SEQ ID NO:9) or a functional variant or fragment thereof has oxygenase activity, preferably cytochrome P450 oxygenase activity.

Preferably the isolated polypeptide comprising NodM (SEQ ID NO:12) or a functional variant or fragment thereof has oxygenase activity, preferably FAD-dependent oxygenase activity.

Preferably the isolated polypeptide comprising NodB (SEQ ID NO:15) or a functional variant or fragment thereof has cyclase activity, preferably IDT cyclase activity.

Preferably the isolated polypeptide comprising NodO (SEQ ID NO:18) or a functional variant or fragment thereof has oxygenase activity, preferably FAD-dependent oxygenase activity.

Preferably the isolated polypeptide comprising NodJ (SEQ ID NO:21) or a functional variant or fragment thereof has oxygenase activity, preferably cytochrome P450 oxygenase activity.

Preferably the isolated polypeptide comprising NodC (SEQ ID NO:24) or a functional variant or fragment thereof has transferase activity, preferably GGT activity.

Preferably the isolated polypeptide comprising NodY1 (SEQ ID NO:27) or a functional variant or fragment thereof has oxygenase activity, preferably FAD-dependent oxygenase activity.

Preferably the isolated polypeptide comprising NodD2 (SEQ ID NO:30) or a functional variant or fragment thereof has transferase activity, preferably prenyl transferase activity.

Preferably the isolated polypeptide comprising NodD1 (SEQ ID NO:33) or a functional variant or fragment thereof has transferase activity, preferably prenyl transferase activity.

Preferably the isolated polypeptide comprising NodY2 (SEQ ID NO:36) or a functional variant or fragment thereof has oxygenase activity, preferably FAD-dependent oxygenase activity.

Preferably the isolated polypeptide comprising NodZ (SEQ ID NO:39) or a functional variant or fragment thereof has oxygenase activity, preferably cytochrome P450 oxygenase activity.

In one embodiment the isolated polypeptide comprises NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof.

In one embodiment the isolated polypeptide consists essentially of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof.

In one embodiment the isolated polypeptide consists of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof.

Polynucleotides

In another aspect the invention relates to an isolated polynucleotide encoding a polypeptide comprising an amino acid sequence selected from the group consisting of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof.

Preferably the functional variant or fragment thereof comprises at least 70%, preferably at least 75%, preferably at least 80%, preferably at least 85%, preferably at least 90%, preferably at least 95%, preferably at least 99% amino acid sequence identity to an amino acid sequence selected from the group consisting of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodW (SEQ ID NO:3) or a functional variant or fragment thereof having oxygenase activity, preferably cytochrome P450 oxygenase activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodR (SEQ ID NO:6) or a functional variant or fragment thereof having oxygenase activity, preferably cytochrome P450 oxygenase activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodX (SEQ ID NO:9) or a functional variant or fragment thereof having oxygenase activity, preferably cytochrome P450 oxygenase activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodM (SEQ ID NO:12) or a functional variant or fragment thereof having oxygenase activity, preferably FAD-dependent oxygenase activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodB (SEQ ID NO:15) or a functional variant or fragment thereof having cyclase activity, preferably IDT cyclase activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodO (SEQ ID NO:18) or a functional variant or fragment thereof having oxygenase activity, preferably FAD-dependent oxygenase activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodJ (SEQ ID NO:21) or a functional variant or fragment thereof having oxygenase activity, preferably cytochrome P450 oxygenase activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodC (SEQ ID NO:24) or a functional variant or fragment thereof having transferase activity, preferably GGT activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodY1 (SEQ ID NO:27) or a functional variant or fragment thereof having oxygenase activity, preferably FAD-dependent oxygenase activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodD2 (SEQ ID NO:30) or a functional variant or fragment thereof having transferase activity, preferably prenyl transferase activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodD1 (SEQ ID NO:33) or a functional variant or fragment thereof having transferase activity, preferably prenyl transferase activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodY2 (SEQ ID NO:36) or a functional variant or fragment thereof having oxygenase activity, preferably FAD-dependent oxygenase activity.

Preferably the isolated polynucleotide encodes a polypeptide comprising NodZ (SEQ ID NO:39) or a functional variant or fragment thereof having oxygenase activity, preferably cytochrome P450 oxygenase activity.

In one embodiment the isolated polynucleotide encodes a polypeptide comprising NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof.

In one embodiment the isolated polynucleotide encodes a polypeptide consisting essentially of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof.

In one embodiment the isolated polynucleotide encodes a polypeptide consisting of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof.

In another aspect the invention relates to an isolated polynucleotide comprising at least 70% nucleic acid sequence identity to a nucleic acid sequence selected from the group consisting of nodW cDNA (SEQ ID NO:2), nodW genomic DNA (SEQ ID NO:1), nodR cDNA (SEQ ID NO:5), nodR genomic DNA (SEQ ID NO:4), nodX cDNA (SEQ ID NO:8), nodX genomic DNA (SEQ ID NO:7), nodM cDNA (SEQ ID NO:11), nodM genomic DNA (SEQ ID NO:10), nodB cDNA (SEQ ID NO:14), nodB genomic DNA (SEQ ID NO:13), nodO cDNA (SEQ ID NO:17), nodO genomic DNA (SEQ ID NO:16), nodJ cDNA (SEQ ID NO:20), nodJ genomic DNA (SEQ ID NO:19), nodC cDNA (SEQ ID NO:23), nodC genomic DNA (SEQ ID NO:22), nodY1 cDNA (SEQ ID NO:26), nodY1 genomic DNA (SEQ ID NO:25), nodD2 cDNA (SEQ ID NO:29), nodD2 genomic DNA (SEQ ID NO:28), nodD1 cDNA (SEQ ID NO:32), nodD1 genomic DNA (SEQ ID NO:31), nodY2 cDNA (SEQ ID NO:35), nodY2 genomic DNA (SEQ ID NO:34), nodZ cDNA (SEQ ID NO:38), nodZ genomic DNA (SEQ ID NO:37), nodS cDNA (SEQ ID NO:49), nodS genomic DNA (SEQ ID NO:48), nodI cDNA (SEQ ID NO:55), and nodI genomic DNA (SEQ ID NO:54).

Preferably the isolated polynucleotide comprises at least 75%, preferably at least 80%, preferably at least 85%, preferably at least 90%, preferably at least 95%, preferably at least 99% nucleic acid sequence identity to a nucleic acid sequence selected from the group consisting of nodW cDNA (SEQ ID NO:2), nodW genomic DNA (SEQ ID NO:1), nodR cDNA (SEQ ID NO:5), nodR genomic DNA (SEQ ID NO:4), nodX cDNA (SEQ ID NO:8), nodX genomic DNA (SEQ ID NO:7), nodM cDNA (SEQ ID NO:11), nodM genomic DNA (SEQ ID NO:10), nodB cDNA (SEQ ID NO:14), nodB genomic DNA (SEQ ID NO:13), nodO cDNA (SEQ ID NO:17), nodO genomic DNA (SEQ ID NO:16), nodJ cDNA (SEQ ID NO:20), nodJ genomic DNA (SEQ ID NO:19), nodC cDNA (SEQ ID NO:23), nodC genomic DNA (SEQ ID NO:22), nodY1 cDNA (SEQ ID NO:26), nodY1 genomic DNA (SEQ ID NO:25), nodD2 cDNA (SEQ ID NO:29), nodD2 genomic DNA (SEQ ID NO:28), nodD1 cDNA (SEQ ID NO:32), nodD1 genomic DNA (SEQ ID NO:31), nodY2 cDNA (SEQ ID NO:35), nodY2 genomic DNA (SEQ ID NO:34), nodZ cDNA (SEQ ID NO:38), nodZ genomic DNA (SEQ ID NO:37), nodS cDNA (SEQ ID NO:49), nodS genomic DNA (SEQ ID NO:48), nodI cDNA (SEQ ID NO:55), and nodI genomic DNA (SEQ ID NO:54).

In one embodiment the isolated polynucleotide comprises a nucleic acid sequence selected from the group consisting of nodW cDNA (SEQ ID NO:2), nodW genomic DNA (SEQ ID NO:1), nodR cDNA (SEQ ID NO:5), nodR genomic DNA (SEQ ID NO:4), nodX cDNA (SEQ ID NO:8), nodX genomic DNA (SEQ ID NO:7), nodM cDNA (SEQ ID NO:11), nodM genomic DNA (SEQ ID NO:10), nodB cDNA (SEQ ID NO:14), nodB genomic DNA (SEQ ID NO:13), nodO cDNA (SEQ ID NO:17), nodO genomic DNA (SEQ ID NO:16), nodJ cDNA (SEQ ID NO:20), nodJ genomic DNA (SEQ ID NO:19), nodC cDNA (SEQ ID NO:23), nodC genomic DNA (SEQ ID NO:22), nodY1 cDNA (SEQ ID NO:26), nodY1 genomic DNA (SEQ ID NO:25), nodD2 cDNA (SEQ ID NO:29), nodD2 genomic DNA (SEQ ID NO:28), nodD1 cDNA (SEQ ID NO:32), nodD1 genomic DNA (SEQ ID NO:31), nodY2 cDNA (SEQ ID NO:35), nodY2 genomic DNA (SEQ ID NO:34), nodZ cDNA (SEQ ID NO:38), nodZ genomic DNA (SEQ ID NO:37), nodS cDNA (SEQ ID NO:49), nodS genomic DNA (SEQ ID NO:48), nodI cDNA (SEQ ID NO:55), and nodI genomic DNA (SEQ ID NO:54).

In one embodiment the isolated polynucleotide consists essentially of a nucleic acid sequence selected from the group consisting of nodW cDNA (SEQ ID NO:2), nodW genomic DNA (SEQ ID NO:1), nodR cDNA (SEQ ID NO:5), nodR genomic DNA (SEQ ID NO:4), nodX cDNA (SEQ ID NO:8), nodX genomic DNA (SEQ ID NO:7), nodM cDNA (SEQ ID NO:11), nodM genomic DNA (SEQ ID NO:10), nodB cDNA (SEQ ID NO:14), nodB genomic DNA (SEQ ID NO:13), nodO cDNA (SEQ ID NO:17), nodO genomic DNA (SEQ ID NO:16), nodJ cDNA (SEQ ID NO:20), nodJ genomic DNA (SEQ ID NO:19), nodC cDNA (SEQ ID NO:23), nodC genomic DNA (SEQ ID NO:22), nodY1 cDNA (SEQ ID NO:26), nodY1 genomic DNA (SEQ ID NO:25), nodD2 cDNA (SEQ ID NO:29), nodD2 genomic DNA (SEQ ID NO:28), nodD1 cDNA (SEQ ID NO:32), nodD1 genomic DNA (SEQ ID NO:31), nodY2 cDNA (SEQ ID NO:35), nodY2 genomic DNA (SEQ ID NO:34), nodZ cDNA (SEQ ID NO:38), nodZ genomic DNA (SEQ ID NO:37), nodS cDNA (SEQ ID NO:49), nodS genomic DNA (SEQ ID NO:48), nodI cDNA (SEQ ID NO:55), and nodI genomic DNA (SEQ ID NO:54).

In one embodiment the isolated polynucleotide consists of a nucleic acid sequence selected from the group consisting of nodW cDNA (SEQ ID NO:2), nodW genomic DNA (SEQ ID NO:1), nodR cDNA (SEQ ID NO:5), nodR genomic DNA (SEQ ID NO:4), nodX cDNA (SEQ ID NO:8), nodX genomic DNA (SEQ ID NO:7), nodM cDNA (SEQ ID NO:11), nodM genomic DNA (SEQ ID NO:10), nodB cDNA (SEQ ID NO:14), nodB genomic DNA (SEQ ID NO:13), nodO cDNA (SEQ ID NO:17), nodO genomic DNA (SEQ ID NO:16), nodJ cDNA (SEQ ID NO:20), nodJ genomic DNA (SEQ ID NO:19), nodC cDNA (SEQ ID NO:23), nodC genomic DNA (SEQ ID NO:22), nodY1 cDNA (SEQ ID NO:26), nodY1 genomic DNA (SEQ ID NO:25), nodD2 cDNA (SEQ ID NO:29), nodD2 genomic DNA (SEQ ID NO:28), nodD1 cDNA (SEQ ID NO:32), nodD1 genomic DNA (SEQ ID NO:31), nodY2 cDNA (SEQ ID NO:35), nodY2 genomic DNA (SEQ ID NO:34), nodZ cDNA (SEQ ID NO:38), nodZ genomic DNA (SEQ ID NO:37), nodS cDNA (SEQ ID NO:49), nodS genomic DNA (SEQ ID NO:48), nodI cDNA (SEQ ID NO:55), and nodI genomic DNA (SEQ ID NO:54).
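For convenience, the correspondence between each gene and its sequence identifiers, as recited in the preceding paragraphs, may be tabulated as in the following sketch. The identifier numbers are those given in the text; the data structure and function name are hypothetical and are offered only as one possible way to organize the listing.

```python
from typing import NamedTuple


class SeqIds(NamedTuple):
    """SEQ ID NOs for one gene: genomic DNA, cDNA, and encoded polypeptide."""
    genomic: int
    cdna: int
    polypeptide: int


# Values transcribed from the lists recited in the text.
NOD_SEQ_IDS = {
    "nodW": SeqIds(1, 2, 3),
    "nodR": SeqIds(4, 5, 6),
    "nodX": SeqIds(7, 8, 9),
    "nodM": SeqIds(10, 11, 12),
    "nodB": SeqIds(13, 14, 15),
    "nodO": SeqIds(16, 17, 18),
    "nodJ": SeqIds(19, 20, 21),
    "nodC": SeqIds(22, 23, 24),
    "nodY1": SeqIds(25, 26, 27),
    "nodD2": SeqIds(28, 29, 30),
    "nodD1": SeqIds(31, 32, 33),
    "nodY2": SeqIds(34, 35, 36),
    "nodZ": SeqIds(37, 38, 39),
    "nodS": SeqIds(48, 49, 50),
    "nodI": SeqIds(54, 55, 56),
}


def seq_id(gene: str, kind: str) -> int:
    """Look up a SEQ ID NO; kind is 'genomic', 'cdna', or 'polypeptide'."""
    return getattr(NOD_SEQ_IDS[gene], kind)


print(seq_id("nodW", "cdna"))
```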

The nucleic acid molecules of the invention or otherwise described herein are preferably isolated. They can be isolated from a biological sample using a variety of techniques known to those of ordinary skill in the art. By way of example, such polynucleotides can be isolated through use of the polymerase chain reaction (PCR) as known in the art. The nucleic acid molecules of the invention can be amplified using primers, as defined herein, derived from the polynucleotide sequences of the invention.

Further methods for isolating polynucleotides include use of all, or portions of, a polynucleotide of the invention as hybridization probes. The technique of hybridizing labeled polynucleotide probes to polynucleotides immobilized on solid supports, such as nitrocellulose filters or nylon membranes, can be used to screen genomic or cDNA libraries. Similarly, probes may be coupled to beads and hybridized to the target sequence. Isolation can be effected using protocols known in the art, such as magnetic separation. The choice of appropriately stringent hybridization and wash conditions is believed to be within the skill of those in the art.

Polynucleotide fragments may be produced by techniques well-known in the art such as restriction endonuclease digestion and oligonucleotide synthesis.

A partial polynucleotide sequence may be used as a probe, in methods well-known in the art, to identify the corresponding full-length polynucleotide sequence in a sample. Such methods include PCR-based methods, 5′ RACE, hybridization-based methods, and computer/database-based methods as known in the art. Detectable labels such as radioisotopes and fluorescent, chemiluminescent and bioluminescent labels may be used to facilitate detection. Inverse PCR also permits acquisition of unknown sequences flanking the polynucleotide sequences disclosed herein, starting with primers based on a known region, as known and used in the art. The method uses several restriction enzymes to generate a suitable fragment in the known region of a gene. The fragment is then circularized by intramolecular ligation and used as a PCR template. Divergent primers are designed from the known region. In order to physically assemble full-length clones, standard molecular biology approaches can be utilized as known in the art. Primers and primer pairs which allow amplification of polynucleotides of the invention also form a further aspect of this invention.
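The logic of inverse PCR described above can be illustrated with a simple string model. The sketch below is a simplification made for this example only: it tracks the top strand alone, treats the restriction fragment as a plain string, and uses toy sequences that are not taken from the sequence listing. The function name and its parameters are hypothetical.

```python
def inverse_pcr_product(fragment: str, fwd_primer: str, rev_site: str) -> str:
    """Top-strand sequence amplified from a self-ligated restriction fragment.

    fragment   : linear fragment laid out as left flank + known region + right flank
    fwd_primer : top-strand primer near the 3' end of the known region,
                 reading outwards into the right flank
    rev_site   : top-strand sequence of the site near the 5' end of the known
                 region bound by the reverse primer (the physical primer would
                 be its reverse complement, reading outwards into the left flank)
    """
    f = fragment.find(fwd_primer)
    r = fragment.find(rev_site)
    if f < 0 or r < 0:
        raise ValueError("primer site not found in fragment")
    rev_end = r + len(rev_site)
    if rev_end > f:
        raise ValueError("primers are not divergent")
    # Intramolecular ligation joins the fragment's end to its start, so the
    # amplicon runs from the forward primer, across the ligation junction,
    # and around to the end of the reverse-primer site.
    return fragment[f:] + fragment[:rev_end]


left_flank = "AAACCC"            # unknown sequence upstream of the known region
known = "GATTACAGGCTTAAGC"       # known region used for primer design
right_flank = "TTTGGG"           # unknown sequence downstream of the known region

product = inverse_pcr_product(
    left_flank + known + right_flank,
    fwd_primer=known[-6:],
    rev_site=known[:6],
)
print(product)
```

In this model the amplified product contains both previously unknown flanks, joined at the ligation junction and bracketed by the two primer sites from the known region.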

Variants (including orthologues) may be identified by the methods described. Variant polynucleotides may be identified using PCR-based methods as known in the art. Typically, the polynucleotide sequence of a primer, useful to amplify variants of polynucleotide molecules by PCR, may be based on a sequence encoding a conserved region of the corresponding amino acid sequence.

Further methods for identifying variant polynucleotides include use of all, or portions of the specified polynucleotides as hybridization probes to screen genomic or cDNA libraries as described above. Typically probes based on a sequence encoding a conserved region of the corresponding amino acid sequence may be used. Hybridization conditions may also be less stringent than those used when screening for sequences identical to the probe.

In another aspect the invention relates to a TU comprising at least one isolated polynucleotide as described herein. In one embodiment the TU is comprised in a vector, preferably an expression vector. In one embodiment the vector is selected from the group consisting of plasmids, BACs, PACs, YACs, bacteriophage, phagemids, and cosmids. Preferably the vector is a plasmid.

In another aspect the invention relates to a vector that encodes an isolated polypeptide or functional variant or fragment thereof according to the invention.

In another aspect the invention relates to a vector comprising an isolated nucleic acid sequence according to the invention.

In one embodiment the isolated nucleic acid sequence is comprised in a TU.

In one embodiment the vector is selected from the group consisting of plasmids, BACs, PACs, YACs, bacteriophage, phagemids, and cosmids. Preferably the vector is a plasmid. In one embodiment the vector is an expression vector.

A TU comprising a polynucleotide of the invention can be incorporated into any suitable vector capable of expressing that polynucleotide or, where applicable, an encoded polypeptide of the invention in vitro or in a host cell. Preferably the vector is an expression vector. Examples of suitable expression vectors include, but are not limited to, plasmid DNA vectors, viral DNA vectors (such as adenovirus and adeno-associated virus), or viral RNA vectors (such as retroviral vectors). In some embodiments the plasmid and/or phage vectors may be selected from the following vectors or variants thereof including pUC18, pUC19, Mp18, Mp19, ColE1, PCR1 and pKRC; lambda gt10 and M13 plasmids such as pBR322, pACYC184, pT127, RP4, p1J101, SV40 and BPV. Also included are vectors such as, but not limited to, cosmids, YACs, BACs, shuttle vectors such as pSA3, PAT28 transposons (such as described in U.S. Pat. No. 5,792,294) and the like.

Suitable viral vectors include, but are not limited to vectors derived from adenovirus (AV); adeno-associated virus (AAV); retroviruses (e.g., lentiviruses (LV), Rhabdoviruses, murine leukemia virus); herpes virus, and the like. Viral vectors employed herein can be appropriately modified by pseudotyping with envelope proteins or other surface antigens from other viruses, or by substituting different viral capsid proteins, as known and used in the art.

In one embodiment the expression vector comprises at least one, preferably at least two, preferably at least three, preferably at least four, preferably at least five, preferably at least six, preferably at least seven, preferably at least eight, preferably at least nine, preferably at least 10 isolated polynucleotides as described herein.

In one embodiment the vector is a component in a cloning system. In one embodiment the cloning system is useful for making a gene construct comprising at least one TU.

In one embodiment the vector is comprised in a vector set, the vector set being part of a cloning system. In one embodiment the cloning system is useful for making a gene construct comprising at least one TU.

In one embodiment the gene construct is a multigene construct comprising at least two TUs. In one embodiment the multigene construct comprises at least three, preferably at least four, preferably at least five, preferably at least six, preferably at least seven, preferably at least eight, preferably at least nine, preferably at least ten TUs.

The TUs described herein may comprise one or more of the disclosed polynucleotide sequences and/or polynucleotides encoding the disclosed polypeptides of the invention. The TU can be constructed to drive expression of at least one polypeptide involved in the biosynthesis of NAA 10, either in vitro or in vivo. In one embodiment, the TU comprises a polynucleotide of the invention operatively linked to 5′ or 3′ untranslated regulatory sequences. The design of a particular TU will depend on various factors including the host cells in which the operatively linked polynucleotide is to be expressed and the desired level of polynucleotide expression.

Likewise, the selection of various promoters, enhancers and/or other genetic elements for a TU will depend on various factors including the host cells and expression levels discussed above. In one embodiment, the TU comprises a homologous promoter operatively linked to a polynucleotide of the invention. In another embodiment, the expression cassette comprises a heterologous promoter operatively linked to a polynucleotide of the invention. In one embodiment, the homologous or heterologous promoter is an inducible, repressible or regulatable promoter. A suitable promoter may be chosen and used under the appropriate conditions to direct high-level expression of a polynucleotide of the invention. Many such elements are described in the literature and are available through commercial suppliers.

By way of example only, promoters useful in the expression cassettes can be any suitable eukaryotic or prokaryotic promoter. In one embodiment, the eukaryotic promoter can be a eukaryotic RNA polymerase I (pol I), RNA polymerase II (pol II), or RNA polymerase III (pol III) promoter. Expression levels of an operably linked polynucleotide in a particular cell type will be determined by the nearby presence (or absence) of specific gene regulatory sequences (e.g., enhancers, silencers and the like). Any suitable promoter/enhancer combination (see: Eukaryotic Promoter Data Base EPDB) can be used to drive expression of a polynucleotide of the invention.

Additional promoters useful in expression cassettes include the β-lactamase, alkaline phosphatase, tryptophan, and tac promoter systems, which are all well known in the art. Yeast promoters include, but are not limited to, 3-phosphoglycerate kinase, enolase, hexokinase, pyruvate decarboxylase, glucokinase, and glyceraldehyde-3-phosphate dehydrogenase.

Prokaryotic promoters useful in expression cassettes include constitutive promoters as known in the art (such as the int promoter of bacteriophage lambda and the bla promoter of the beta-lactamase gene sequence of pBR322) and regulatable promoters (such as lacZ, recA and gal). A ribosome binding site upstream of the CDS may also be required for expression.

Enhancers useful in a TU include SV40 enhancer, cytomegalovirus early promoter enhancer, globin, albumin, insulin and the like.

In one embodiment, a TU may be driven by a T3, T7 or SP6 cytoplasmic expression system.

The choice of a particular promoter/enhancer/cell type combination for protein expression is within the ordinary skill of those in the art of molecular biology (see, for example, Sambrook et al. (1989) which is incorporated herein by reference).

In another aspect the invention relates to an isolated host cell comprising an isolated polypeptide, isolated polynucleotide, TU and/or isolated vector according to the invention.

In one embodiment the isolated host cell is a prokaryotic or eukaryotic cell. Prokaryotes most commonly employed as host cells are strains of Escherichia coli (E. coli). Other prokaryotic hosts include Pseudomonas, Bacillus, Serratia, Klebsiella, Streptomyces, Listeria, Salmonella and Mycobacteria but are not limited thereto.

In one embodiment the eukaryotic cell is an animal cell, a plant cell, a fungal cell or a protist cell. In one embodiment the animal cell is an insect cell or a mammalian cell. In one embodiment the fungal cell is a single cell of a unicellular fungal host strain. In one embodiment the fungal cell comprises fungal hyphae or the mycelia of a fungal host strain.

In one embodiment the fungal cell, hyphae or mycelia of the fungal host strain are from the genus Aspergillus, Trichoderma, Neurospora, Fusarium, Mortierella, Chrysosporium, Candida, Geotrichum, Yarrowia, Eremothecium, Trichoplusia, Ashbya, Hansenula, Pichia, Kluyveromyces, Schizosaccharomyces, Monascus, Talaromyces, Cryphonectria, Endothia, Tolypocladium, Hypocrea, Gibberella, Acremonium, Agaricus, Pleurotus, Penicillium, Volvariella, Flammulina, Lentinula, Auricularia, Ganoderma, (Rhizo)mucor, Rhizopus, or Saccharomyces, preferably Penicillium, Aspergillus, Saccharomyces, Pichia, Trichoplusia, or Spodoptera. Preferably the fungal cell is from Saccharomyces. Preferably the fungal hyphae or mycelia are from Penicillium, preferably P. paxilli.

In one embodiment the NA is selected from the group of NAs depicted in FIG. 1. Preferably the NA is NAF 5a or NAA 10, preferably NAA 10.

In one embodiment the polypeptide is a polypeptide or functional variant or fragment according to the invention.

Specifically contemplated as embodiments within this aspect of the invention are various embodiments set out herein with regards to any other aspect of the invention that relate to heterologous expression (including choice of appropriate regulatory sequences), expression cassettes, genetic elements, TUs, multigene constructs, host cells, and vectors.

In a particular embodiment, heterologous expression of the polypeptide comprises expression of at least one polynucleotide according to the invention, or at least one TU encoding at least one polypeptide of the invention, from at least one vector as described herein in an isolated fungal host cell or in the mycelia of an isolated fungal strain as described herein. In one embodiment the polypeptide is NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), or NodZ (SEQ ID NO:39); preferably the polypeptide is an enzyme that catalyzes a biological transformation from NAB 9 to NAA 10. In one embodiment the fungal cell or strain is a cell or strain of Penicillium, preferably P. paxilli.

In one embodiment the TU is comprised in a multigene construct comprising at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine and/or at least 10 polynucleotides encoding polypeptides according to the invention.

In another aspect the invention relates to at least one NA made by a method of the invention. In one embodiment the NA is selected from the group of NAs depicted in FIG. 1. Preferably the NA is NAF 5a or NAA 10, preferably NAA 10.

In another aspect the present invention relates to an isolated polypeptide or functional variant or fragment thereof from Hypoxylon spp. that catalyzes a biochemical reaction in the biosynthetic pathway leading from GGI 2 to NAA 10.

In another aspect the present invention relates to an isolated polynucleotide encoding at least one polypeptide from Hypoxylon spp. that catalyzes a biochemical reaction in the biosynthetic pathway leading from GGI 2 to NAA 10.

In one embodiment the isolated polypeptide is an oxygenase, preferably a cytochrome P450 oxygenase or a FAD-dependent oxygenase. Preferably the cytochrome P450 oxygenase is NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodJ (SEQ ID NO:21), or NodZ (SEQ ID NO:39). Preferably the FAD-dependent oxygenase is NodM (SEQ ID NO:12), NodO (SEQ ID NO:18), NodY1 (SEQ ID NO:27), or NodY2 (SEQ ID NO:36). In one embodiment the isolated polypeptide is a transferase, preferably a GGT or a prenyl transferase. Preferably the GGT is NodC (SEQ ID NO:24). Preferably the prenyl transferase is NodD1 (SEQ ID NO:33) or NodD2 (SEQ ID NO:30). In one embodiment the isolated polypeptide is an IDT cyclase. Preferably the IDT cyclase is NodB (SEQ ID NO:15). In one embodiment the isolated polypeptide is NodS (SEQ ID NO:50). In one embodiment the isolated polypeptide is NodI (SEQ ID NO:56).
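The enzyme classes assigned to the polypeptides in the preceding paragraph may be summarized as a simple lookup, sketched below. The class assignments are those stated in the text; NodS and NodI are recorded without a class because none is stated for them in this passage. The variable and function names are hypothetical.

```python
from typing import Dict, List, Optional

# Enzyme class per polypeptide, as stated in the text.
# None means no class is assigned in this passage.
ENZYME_CLASS: Dict[str, Optional[str]] = {
    "NodW": "cytochrome P450 oxygenase",
    "NodR": "cytochrome P450 oxygenase",
    "NodX": "cytochrome P450 oxygenase",
    "NodJ": "cytochrome P450 oxygenase",
    "NodZ": "cytochrome P450 oxygenase",
    "NodM": "FAD-dependent oxygenase",
    "NodO": "FAD-dependent oxygenase",
    "NodY1": "FAD-dependent oxygenase",
    "NodY2": "FAD-dependent oxygenase",
    "NodC": "GGT",
    "NodD1": "prenyl transferase",
    "NodD2": "prenyl transferase",
    "NodB": "IDT cyclase",
    "NodS": None,
    "NodI": None,
}


def members_of(enzyme_class: str) -> List[str]:
    """Return the polypeptides assigned to a class, in alphabetical order."""
    return sorted(name for name, cls in ENZYME_CLASS.items() if cls == enzyme_class)


print(members_of("FAD-dependent oxygenase"))
```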

In one embodiment the isolated polypeptide catalyzes a biochemical reaction in the biosynthetic pathway leading from GGI 2 to NAF 5a. Preferably the isolated polypeptide is a GGT, a FAD-dependent oxygenase, an IDT cyclase, or a cytochrome P450 oxygenase. Preferably the GGT is NodC (SEQ ID NO:24). Preferably the FAD-dependent oxygenase is NodM (SEQ ID NO:12). Preferably the IDT cyclase is NodB (SEQ ID NO:15). Preferably the cytochrome P450 oxygenase is NodW (SEQ ID NO:3).

In one embodiment the isolated polypeptide or functional variant or fragment thereof is encoded by a nucleic acid according to the invention.

In another aspect the invention relates to a method of making at least one Hypoxylon spp. polypeptide or functional variant or fragment thereof comprising heterologously expressing an isolated nucleic acid sequence, TU or vector according to the invention in an isolated host cell.

In one embodiment the at least one Hypoxylon spp. polypeptide is a polypeptide according to the invention as contemplated herein for any other aspect of the invention.

In one embodiment the at least one Hypoxylon spp. polypeptide is a polypeptide comprising the amino acid sequence of NodW (SEQ ID NO:3) or a functional variant or fragment thereof. Preferably the polypeptide consists essentially of or consists of NodW (SEQ ID NO:3). In one embodiment the isolated host cell comprises fungal mycelia of the genus Penicillium, preferably P. paxilli.

Specifically contemplated for this aspect of the invention are various embodiments set out for any other aspect of the invention that relate to the heterologous expression (including choice of appropriate regulatory sequences), genetic elements, TUs, multigene constructs, host cells, and vectors.

In one embodiment the at least one polypeptide is an oxygenase, preferably a cytochrome P450 oxygenase or a FAD-dependent oxygenase. Preferably the cytochrome P450 oxygenase is NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodJ (SEQ ID NO:21), or NodZ (SEQ ID NO:39). Preferably the FAD-dependent oxygenase is NodM (SEQ ID NO:12), NodO (SEQ ID NO:18), NodY1 (SEQ ID NO:27), or NodY2 (SEQ ID NO:36). In one embodiment the isolated polypeptide is a transferase, preferably a GGT or a prenyl transferase. Preferably the GGT is NodC (SEQ ID NO:24). Preferably the prenyl transferase is NodD1 (SEQ ID NO:33) or NodD2 (SEQ ID NO:30). In one embodiment the isolated polypeptide is an IDT cyclase. Preferably the IDT cyclase is NodB (SEQ ID NO:15). In one embodiment the isolated polypeptide is NodS (SEQ ID NO:50). In one embodiment the isolated polypeptide is NodI (SEQ ID NO:56).

In one embodiment the at least one polypeptide catalyzes a biochemical reaction in the biosynthetic pathway leading from GGI 2 to NAF 5a. Preferably at least one polypeptide is a GGT, a FAD-dependent oxygenase, an IDT cyclase, or a cytochrome P450 oxygenase. Preferably the GGT is NodC (SEQ ID NO:24). Preferably the FAD-dependent oxygenase is NodM (SEQ ID NO:12). Preferably the IDT cyclase is NodB (SEQ ID NO:15). Preferably the cytochrome P450 oxygenase is NodW (SEQ ID NO:3).

In one embodiment the at least one polypeptide comprises the amino acid sequence of NodW (SEQ ID NO:3) or a functional variant or fragment thereof. Preferably the polypeptide consists essentially of or consists of NodW (SEQ ID NO:3). In one embodiment the isolated host cell comprises fungal mycelia of the genus Penicillium, preferably P. paxilli.

In one embodiment at least one heterologous polypeptide catalyzes the transformation of a substrate in the biosynthetic pathway leading to the formation of NAF 5a.

In one embodiment the substrate is selected from the group consisting of GGPP 1a, indole-3-glycerol phosphate 1b, GGI 2, mono-epoxidized GGI 3a, emindole SB 4a, NAF 5a, NAE 6a, NAD 7a, NAC 8, and NAB 9.

In one embodiment the transformation is selected from the group consisting of a condensation, an oxidation, or a cyclization.

In one embodiment the substrates that are transformed are GGPP 1a and indole-3-glycerol phosphate 1b, and the transformation is a condensation.

In one embodiment the substrate that is transformed is GGI 2 and the transformation is an oxidation.

In one embodiment the substrate that is transformed is mono-epoxidized GGI 3a and the transformation is a cyclization.

In one embodiment the substrate that is transformed is emindole SB 4a and the transformation is an oxidation.

In one embodiment the substrate that is transformed is NAF 5a and the transformation is a condensation.

In one embodiment the substrate that is transformed is NAE 6a and the transformation is an oxidation.

In one embodiment the substrate that is transformed is NAD 7a and the transformations are an oxidation and a condensation.

In one embodiment the substrate that is transformed is NAC 8 and the transformation is an oxidation.

In one embodiment the substrate that is transformed is NAB 9 and the transformation is an oxidation.
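The substrate and transformation pairings set out in the preceding embodiments may be collected into a single table, as in the following sketch. Only the pairings stated in the text are encoded; the product of each step is deliberately not recorded, because the embodiments above do not recite it. The names used in the code are hypothetical.

```python
from typing import List, Tuple

# Transformation types recited in the text.
ALLOWED_TRANSFORMATIONS = {"condensation", "oxidation", "cyclization"}

# (substrates, transformations) for each embodiment, in the order recited.
PATHWAY_STEPS: List[Tuple[Tuple[str, ...], Tuple[str, ...]]] = [
    (("GGPP 1a", "indole-3-glycerol phosphate 1b"), ("condensation",)),
    (("GGI 2",), ("oxidation",)),
    (("mono-epoxidized GGI 3a",), ("cyclization",)),
    (("emindole SB 4a",), ("oxidation",)),
    (("NAF 5a",), ("condensation",)),
    (("NAE 6a",), ("oxidation",)),
    (("NAD 7a",), ("oxidation", "condensation")),
    (("NAC 8",), ("oxidation",)),
    (("NAB 9",), ("oxidation",)),
]


def transformations_for(substrate: str) -> Tuple[str, ...]:
    """Return the transformation(s) recited for a given substrate."""
    for substrates, transformations in PATHWAY_STEPS:
        if substrate in substrates:
            return transformations
    raise KeyError(f"no embodiment recites substrate {substrate!r}")


for substrates, transformations in PATHWAY_STEPS:
    print(" + ".join(substrates), "->", " and ".join(transformations))
```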

In another aspect the invention relates to an isolated host cell that produces, by heterologous expression, at least one polypeptide involved in the biosynthetic pathway leading to NAA 10.

In one embodiment the at least one polypeptide catalyzes a biochemical reaction in the biosynthetic pathway leading from GGI 2 to NAF 5a. Preferably at least one polypeptide is a GGT, a FAD-dependent oxygenase, an IDT cyclase, or a cytochrome P450 oxygenase. Preferably the GGT is NodC (SEQ ID NO:24). Preferably the FAD-dependent oxygenase is NodM (SEQ ID NO:12). Preferably the IDT cyclase is NodB (SEQ ID NO:15). Preferably the cytochrome P450 oxygenase is NodW (SEQ ID NO:3).

In some embodiments specifically contemplated for this aspect of the invention, the at least one polypeptide is a polypeptide involved in the biosynthetic pathway leading to NAA 10 as defined herein for any other aspect of the invention.

In one embodiment at least one polypeptide is a polypeptide or functional variant or fragment thereof of the invention. In one embodiment the polypeptide or functional variant or fragment thereof is encoded by a nucleic acid sequence of the invention.

In one embodiment the at least one polypeptide is involved in the biosynthetic pathway leading to NAF 5a. In one embodiment the at least one polypeptide comprises the amino acid sequence of NodW (SEQ ID NO:3) or a functional variant or fragment thereof. Preferably the polypeptide consists essentially of or consists of NodW (SEQ ID NO:3). In one embodiment the isolated host cell comprises fungal mycelia of the genus Penicillium, preferably P. paxilli.

In another aspect the invention relates to a method of producing at least one NA comprising contacting a carbohydrate-comprising substrate with a recombinant cell transformed with a nucleic acid that results in an increased level or activity of a polypeptide selected from the group consisting of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56), or a functional variant or fragment thereof, compared to the cell prior to transformation, such that the substrate is metabolized to at least one NA.

In one embodiment the nucleic acid encodes at least one polypeptide that catalyzes a biochemical reaction in the biosynthetic pathway leading from GGI 2 to NAF 5a, preferably that catalyzes the biochemical reaction that leads from emindole SB 4a to NAF 5a.

In one embodiment the recombinant host cell is an isolated host cell of the invention as described herein.

In one embodiment the carbohydrate is comprised in a culture medium. In one embodiment the culture medium is CDYE or a variation thereof that supports the growth of the recombinant cell.

In one embodiment the nucleic acid encodes at least one polypeptide that is an oxygenase, preferably a cytochrome P450 oxygenase or a FAD-dependent oxygenase. Preferably the cytochrome P450 oxygenase is NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodJ (SEQ ID NO:21), or NodZ (SEQ ID NO:39). Preferably the FAD-dependent oxygenase is NodM (SEQ ID NO:12), NodO (SEQ ID NO:18), NodY1 (SEQ ID NO:27), or NodY2 (SEQ ID NO:36). In one embodiment the isolated polypeptide is a transferase, preferably a GGT or a prenyl transferase. Preferably the GGT is NodC (SEQ ID NO:24). Preferably the prenyl transferase is NodD1 (SEQ ID NO:33) or NodD2 (SEQ ID NO:30). In one embodiment the isolated polypeptide is an IDT cyclase. Preferably the IDT cyclase is NodB (SEQ ID NO:15). In one embodiment the isolated polypeptide is NodS (SEQ ID NO:50). In one embodiment the isolated polypeptide is NodI (SEQ ID NO:56).

In one embodiment the nucleic acid encodes at least one GGT, FAD-dependent oxygenase, IDT cyclase, or cytochrome P450 oxygenase. In one embodiment the nucleic acid encodes at least two, preferably at least three, preferably all four of the GGT, FAD-dependent oxygenase, IDT cyclase, and cytochrome P450 oxygenase. Preferably the GGT is NodC (SEQ ID NO:24). Preferably the FAD-dependent oxygenase is NodM (SEQ ID NO:12). Preferably the IDT cyclase is NodB (SEQ ID NO:15). Preferably the cytochrome P450 oxygenase is NodW (SEQ ID NO:3).

In one embodiment a polypeptide selected from the group consisting of NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or a functional variant or fragment thereof comprises the amino acid sequence of a NodW (SEQ ID NO:3), NodR (SEQ ID NO:6), NodX (SEQ ID NO:9), NodM (SEQ ID NO:12), NodB (SEQ ID NO:15), NodO (SEQ ID NO:18), NodJ (SEQ ID NO:21), NodC (SEQ ID NO:24), NodY1 (SEQ ID NO:27), NodD2 (SEQ ID NO:30), NodD1 (SEQ ID NO:33), NodY2 (SEQ ID NO:36), NodZ (SEQ ID NO:39), NodS (SEQ ID NO:50), and NodI (SEQ ID NO:56) or functional variant or fragment thereof of the invention.

In one embodiment the polypeptide comprises the amino acid sequence of NodW (SEQ ID NO:3) or a functional variant or fragment thereof. Preferably the polypeptide consists essentially of, or consists of, NodW (SEQ ID NO:3). In one embodiment the isolated host cell comprises fungal mycelia of the genus Penicillium, preferably P. paxilli.

In one embodiment the at least one heterologous or introduced homologous nucleic acid sequence is at least one NAA 10 biosynthetic gene selected from the group consisting of nodW, nodR, nodX, nodM, nodB, nodO, nodJ, nodC, nodY1, nodD2, nodD1, nodY2, nodZ, nodS, and nodI as described herein.

In one embodiment one of the two different GGPPS enzymes is produced in H. pulicicidum by heterologous expression.

In one embodiment one of the two different GGPPS enzymes is encoded by a second copy of a native H. pulicicidum gene that encodes a GGPPS enzyme.

In another aspect the invention relates to an isolated strain of Hypoxylon pulicicidum that comprises a genetic modification that leads to an increased biosynthesis of NAA 10.

In one embodiment the isolated strain comprises increased expression of at least one GGPPS enzyme as compared to a control strain of H. pulicicidum, preferably H. pulicicidum ATCC 74245.

In one embodiment the increased expression is increased expression of the native primary GGPPS gene of H. pulicicidum via modification of genetic regulatory elements.

In one embodiment modification of genetic regulatory elements comprises operatively linking the native primary GGPPS gene to an alternative or modified promoter.

In one embodiment modification of genetic regulatory elements comprises operatively linking a native primary GGPPS gene to a more robust native promoter. In one embodiment the native primary GGPPS gene is an introduced homologous gene.

In one embodiment the increased expression is the result of heterologous expression of biosynthetic genes that contribute to NAA 10 biosynthesis.

In one embodiment the increased expression is due to expression of heterologous genes in H. pulicicidum that have equivalent biochemical function to genes identified in the Nod cluster, wherein the Nod cluster is as described herein.

In one embodiment the increased expression is due to expression of heterologous genes in H. pulicicidum to remediate limitations in the supply of substrate compounds or biosynthetic intermediates that are necessary for NAA 10 biosynthesis.

In one embodiment the increased expression is due to heterologous expression in H. pulicicidum of any gene that encodes a GGT that catalyzes the condensation of GGPP 1a and indole-3-glycerol phosphate 1b to produce 3-geranylgeranyl indole 2.

In one embodiment the increased expression is due to heterologous expression in H. pulicicidum of any gene that encodes a GGPPS.

In one embodiment the increased expression is due to heterologous expression in H. pulicicidum of any gene that encodes a FAD-dependent oxidase that creates the single epoxidized-GGI product 3a.

In one embodiment the increased expression is due to heterologous expression in H. pulicicidum of any gene that encodes an enzyme that cyclises the single epoxidized-GGI 3a to produce emindole SB 4a.

In one embodiment the increased expression is due to heterologous expression in H. pulicicidum of any gene that encodes an oxidase that oxidises emindole SB 4a to produce a nodulisporic acid.

In one embodiment the increased expression is due to at least one genetic modification that leads to the increased expression of a NA biosynthetic gene selected from the group consisting of nodW, nodR, nodX, nodM, nodB, nodO, nodJ, nodC, nodY1, nodD2, nodD1, nodY2, nodZ, nodS, and nodI as described herein.

Specifically contemplated for this aspect of the invention are various embodiments set out for any other aspect of the invention that relate to the isolated strains of Hypoxylon pulicicidum as described herein including as relates to increased expression of NAA 10, and also including all embodiments set out regarding heterologous expression (including choice of appropriate regulatory sequences), genetic elements, TUs, multigene constructs, host cells, and vectors as described herein.

In this specification where reference has been made to patent specifications, other external documents, or other sources of information, this is generally for the purpose of providing a context for discussing the features of the invention. Unless specifically stated otherwise, reference to such external documents is not to be construed as an admission that such documents, or such sources of information, in any jurisdiction, are prior art, or form part of the common general knowledge in the art.

The invention will now be illustrated in a non-limiting way by reference to the following examples.

EXAMPLES

Materials and Methods

gDNA Isolation for Genome Sequencing and TUM Amplification

Genomic DNA for genome sequencing and TUM amplification by PCR was isolated from Penicillium paxilli strain ATCC® 26601™ (PN2013) and Hypoxylon pulicicidum strain ATCC® 74245™, according to Byrd, A. D.; Schardl, C. L.; Songlin, P. J.; Mogen, K. L.; Siegel, M. R. Curr. Genet. 1990, 18 (4), 347-354, with modifications. Sterile 2.4% (w/v) Difco™ potato dextrose broth (Becton, Dickinson and Company, Maryland, U.S.A.) in Milli-Q® water was prepared in 25 mL aliquots in 125 mL Erlenmeyer flasks and inoculated with 5×10⁶ spores or ~1 cm² of freshly ground mycelia (for non-sporulating strains). Cultures were incubated for 2-4 days at 22° C. with shaking (200 rpm). The fermentation broth was filtered through a sterile nappy liner and the mycelia were rinsed three times with sterile water. The mycelia were transferred to a sterile 15 mL centrifuge tube, flash frozen in liquid nitrogen, and lyophilized for 24-48 hours. 15-20 mg of freeze-dried mycelia was placed in a mortar with liquid nitrogen and ground into a powder. The ground mycelia were transferred into a 2 mL tube and resuspended in 1 mL extraction buffer (150 mM EDTA, 50 mM Tris-HCl, and 1% (w/v) sodium lauroyl sarcosine). 1.6 mg proteinase K was added to the tube and the contents were incubated at 37° C. for 30 min. The tube was centrifuged at 13,000 rpm for 10 min and the supernatant was transferred to a fresh 2 mL tube. 500 μL phenol and 500 μL chloroform were added to the tube and the contents were mixed by vortex before centrifugation for 10 min at 13,000 rpm. The aqueous phase was transferred to a fresh 2 mL tube and washed two more times with 500 μL phenol and 500 μL chloroform as previously described. The aqueous phase was then transferred to a fresh 2 mL tube and washed (vortex and centrifuge at 13,000 rpm for 10 min) with 1 mL chloroform. The aqueous phase was transferred to a fresh 2 mL tube and mixed with 1 mL chilled isopropanol. The DNA was precipitated overnight at −20° C. and pelleted at 13,000 rpm for 10 min. 
The supernatant was discarded and the DNA was resuspended in 1 mL 1 M NaCl. The tube was incubated for 10 min at room temperature and then centrifuged at 13,000 rpm for 10 min to pellet polysaccharides. The supernatant was transferred to a fresh tube and mixed with 1 mL isopropanol. The tube was incubated at room temperature for 10 min and DNA was pelleted by centrifugation at 13,000 rpm for 10 min. The supernatant was discarded and 1 mL chilled 70% ethanol was added to the pellet without resuspension. The tube was centrifuged for 2 min at 13,000 rpm and the supernatant was discarded. The tube was centrifuged for 1 min at 13,000 rpm and residual 70% ethanol was pipetted off. The pellet was air dried at room temperature, resuspended in 50 μL Milli-Q® water and stored at −20° C.

MIDAS Design Overview

The MIDAS toolkit is based on the Golden Gate assembly technique, which utilises the ability of Type IIS restriction enzymes to seamlessly join multiple DNA fragments together in a single reaction. MIDAS makes use of three Type IIS restriction enzymes, AarI, BsaI and BsmBI, which generate user-defined 4 bp overhangs upon cleavage. Through the appropriate choice of these user-defined overhangs, and the appropriate orientation of the Type IIS sites flanking each of the DNA fragments, multiple fragments can be assembled into a recipient plasmid (also called a destination vector) in an ordered (directional) fashion using a one-pot restriction-ligation reaction. The recipient plasmid contains a marker gene (typically the lacZα gene for blue/white screening) flanked by two divergently oriented recognition sites for a Type IIS enzyme; these elements, collectively called the ‘Golden Gate cloning cassette’, are replaced by the insert during the assembly reaction.

As with other recently described Golden Gate-based modular assembly techniques, assembly of genes and multigene constructs using MIDAS is a hierarchical process. At the first level (MIDAS Level-1), functional modules (promoters, CDS, terminators, tags, etc.) are cloned into the Level-1 destination vector (pML1), where they form libraries of reusable, sequence-verified parts. The complementary design of the modules and destination vector ensures that, once cloned into pML1, these modules can be released from the vector by digestion with BsaI.

At the second level (Level-2), compatible sets of the sequence-verified Level-1 modules are released from pML1 and assembled into a Level-2 destination vector (pML2) using a BsaI-mediated Golden Gate reaction, leading to creation of a Level-2 plasmid containing a eukaryotic TU. Once again, the design rules ensure that each assembled TU can be released from the pML2 vector—this time by digestion either with AarI or BsmBI (depending on the pML2 vector in which the TU was assembled).

At Level-3, the TUs that were assembled at Level-2 are released from the pML2 plasmids and are sequentially assembled together in a Level-3 destination vector (pML3), using either AarI- or BsmBI-mediated Golden Gate reactions, to form functional multigene constructs, which can then be transformed into the desired expression host.

Level-1: Module Cloning

At Level-1, functional TUMs are generated, either as a PCR product, or as a synthetic polynucleotide sequence (from a gene synthesis company), and are cloned into the Level-1 destination vector (pML1) by BsmBI-mediated Golden Gate cloning. In order to be cloned into the pML1 vector, the PCR primers are designed so that each amplified TUM is flanked by two convergent BsmBI sites, BsmBI[CTCG] and [AGAC]BsmBI, which upon restriction enzyme cleavage, generate sticky ends that are compatible with those of the BsmBI sites present in the pML1 destination vector. Thus, the Golden Gate cloning cassette present in pML1 consists of two divergent BsmBI sites flanking a lacZα scoreable marker: 5′-[CTCG]BsmBI-lacZα-BsmBI[AGAC]-3′ (FIG. 11A).

To enable subsequent (i.e., Level-2) assembly of full-length TUs, each TUM is designed to be flanked by four module-specific nucleotides (NNNN) at the 5′ end, and four module-specific nucleotides (NNNN) at the 3′ end, which are included as part of the PCR primer sequences. The complementary design of the amplified modules and the pML1 vector ensures that, when amplified TUMs are cloned into pML1 using the BsmBI-mediated Golden Gate reaction, each TUM becomes flanked by convergent BsaI recognition sites, and the module-specific nucleotides (NNNN and NNNN) become the BsaI-specific 4 bp overhangs when the module is released from pML1 during the subsequent (i.e., Level-2) BsaI-mediated Golden Gate assembly of the full-length TU. Thus, the overall structure of each module in the PCR product (or synthetic polynucleotide) takes the form: 5′-BsmBI[CTCG]NNNN-TUM-NNNNtg[AGAC]BsmBI-3′ (FIG. 11B), which becomes 5′-BsaI[NNNN]-TUM-[NNNN]BsaI-3′ in pML1, following BsmBI-mediated cloning (FIG. 11C).

As each TUM is defined by its flanking four nucleotides, these module-specific bases effectively form an address system for each TUM and they determine its position and orientation within the assembled TU. The developers of MoClo and GoldenBraid2.0 have already worked in concert to develop a common syntax or set of standard addresses for plant expression (referred to as ‘fusion sites’ in the MoClo system and ‘barcodes’ in GoldenBraid2.0) for a wide variety of TUMs to facilitate part exchangeability, and this standard is also adopted here for MIDAS-based assembly of TUs for expression in filamentous fungi (FIG. 12).

Thus, for filamentous fungal expression, a ProUTR module (comprising a promoter, 5′ untranslated region (UTR) and ATG initiation codon) would have GGAG as the module-specific 5′ nucleotides, and AATG as the module-specific 3′ nucleotides (i.e., 5′-GGAG-ProUTR-AATG-3′), with the translation initiation codon underlined. Similarly, a CDS module would be flanked by AATG and GCTT (i.e., 5′-AATG-CDS-GCTT-3′), while a UTRterm module (consisting of a 3′UTR and a 3′ non-transcribed region, including the polyadenylation signal) would have the form 5′-GCTT-UTRterm-CGCT-3′. Considerations for the design of PCR primers for amplifying these three types of TUM are shown in Table 12.
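The address system described above can be illustrated with a short sketch. The following Python snippet is only an illustration, not part of the MIDAS toolkit: it records the 5′ and 3′ module-specific nucleotides given in the text for the ProUTR, CDS and UTRterm modules and checks that the 3′ address of each module matches the 5′ address of the next. The function name `check_chain` and the dictionary name are hypothetical.

```python
# Module-specific 4 nt addresses as given in the text: (5' address, 3' address).
MODULE_ADDRESSES = {
    "ProUTR": ("GGAG", "AATG"),
    "CDS": ("AATG", "GCTT"),
    "UTRterm": ("GCTT", "CGCT"),
}


def check_chain(order):
    """Check that adjacent modules share an address; return the outer addresses.

    `order` is a list of module type names in 5'->3' order.
    Raises ValueError if the 3' address of one module does not match
    the 5' address of the next.
    """
    for left, right in zip(order, order[1:]):
        three_prime = MODULE_ADDRESSES[left][1]
        five_prime = MODULE_ADDRESSES[right][0]
        if three_prime != five_prime:
            raise ValueError(
                f"{left} ends with {three_prime} but {right} starts with {five_prime}"
            )
    return MODULE_ADDRESSES[order[0]][0], MODULE_ADDRESSES[order[-1]][1]


outer = check_chain(["ProUTR", "CDS", "UTRterm"])
print(outer)
```

For the three-module TU the outer addresses returned are GGAG and CGCT, i.e., the addresses that remain free at the two ends of the assembled TU.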

Following the BsmBI-mediated assembly of TUMs in pML1, reactions are transformed into an E. coli strain such as DH5α (or equivalent) and spread onto LB plates supplemented with spectinomycin, IPTG and X-Gal. Plasmids harbouring a cloned TUM are identified by screening white colonies and confirmed by sequencing.

At MIDAS Level-1, it is important that all internal recognition sites for AarI, BsaI and BsmBI are masked or eliminated from the TUMs. The process of masking or removal of such forbidden sites, referred to as “domestication”, can be achieved by: (i) excluding these sites when ordering the sequences from a gene synthesis company, (ii) directed mutagenesis, or (iii) using masking oligonucleotides that form triplexes with the target DNA, thereby preventing restriction enzyme cleavage. In the same way that Type IIS enzymes have previously been utilised for mutagenesis and for Golden Gate domestication purposes, we domesticated MIDAS modules by designing PCR primers (referred to as domestication primers) that overlap the internal Type IIS restriction site and which contain a single nucleotide mismatch that destroys the site. Because the PCR products are designed to be assembled together in MIDAS using a BsmBI-mediated Golden Gate reaction to form the full-length domesticated TUM in pML1, it is important that the MIDAS domestication primers be designed with BsmBI restriction sites that generate compatible overhangs at their 5′ ends.
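As an illustration of how candidate sequences might be screened before domestication, the following Python sketch scans a sequence on both strands for the recognition sequences of AarI (CACCTGC), BsaI (GGTCTC) and BsmBI (CGTCTC). It is a minimal example under stated assumptions: the function names are hypothetical, only unambiguous A/C/G/T input is handled, and the short test sequences are invented for demonstration.

```python
# Recognition sequences of the three Type IIS enzymes used by MIDAS.
SITES = {"AarI": "CACCTGC", "BsaI": "GGTCTC", "BsmBI": "CGTCTC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_forbidden_sites(seq):
    """Return (enzyme, strand, 0-based start) for every internal site found."""
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES.items():
        # A site on the bottom strand appears as its reverse complement on top.
        for motif, strand in ((site, "+"), (reverse_complement(site), "-")):
            start = seq.find(motif)
            while start != -1:
                hits.append((enzyme, strand, start))
                start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[2])


# Invented example: one BsaI site, then the same sequence with a single
# nucleotide change of the kind a domestication primer would introduce.
print(find_forbidden_sites("AAAGGTCTCAAA"))
print(find_forbidden_sites("AAAGGTCTTAAA"))
```

A sequence that returns an empty list contains none of the three recognition sequences in either orientation; whether a given single-nucleotide change is acceptable (for example, silent within a CDS) has to be judged separately.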

Level-2: TU Assembly

At Level-2, compatible sets of cloned and sequence-verified Level-1 TUMs (for example ProUTR, CDS and UTRterm modules) are assembled into a pML2 destination vector using a BsaI-mediated Golden Gate reaction, leading to creation of a Level-2 plasmid (pML2 entry clone) containing a complete (i.e., full-length) eukaryotic TU. The module address standard described earlier ensures that the assembly of a TU proceeds in an ordered, directional fashion, with the 3′ end of one module being compatible with the 5′ end of the next module.

The module-specific bases GGAG, located at the 5′ end of ProUTR modules, and CGCT, at the 3′ end of UTRterm modules, are compatible with the overhangs generated by BsaI digestion of the pML2 destination vectors, and these bases therefore define the outermost cloning boundaries of a Level-2 assembly.

In MIDAS, there are eight Level-2 (pML2) destination vectors into which a TU can be assembled, the choice of which depends on the desired configuration of TUs in the multigene plasmid produced at Level-3, namely: (i) the desired order in which TUs are added to the multigene assembly, (ii) the desired direction in which the multigene plasmid is assembled and (iii) the desired orientation of each TU in the multigene plasmid. These features are discussed further below.

The pML2 vectors are distinguished from one another by the arrangement of specific sequence features that are central to the operation of MIDAS. These sequence features, collectively called the MIDAS cassette (FIG. 13), define the Level-2 assembly of TUs and govern the assembly of multigene constructs produced at Level-3.

Each MIDAS cassette is defined by (i) having a Golden Gate cloning cassette with flanking, divergent BsaI recognition sites, (ii) differing arrangements of recognition sites for AarI and BsmBI and (iii) the presence or absence of a lacZα scoreable marker. These features are described in greater detail.

In contrast to the usual Golden Gate cloning cassette (which typically contains a lacZα gene for blue/white screening), the Golden Gate cloning cassettes in all eight pML2 vectors contain a mutant E. coli pheS gene (driven by the promoter of the E. coli gene for chloramphenicol acetyltransferase) flanked by divergent BsaI recognition sites. The Thr²⁵¹Ala/Ala²⁹⁴Gly double mutant of the E. coli pheS gene used here confers high lethality to cells grown on LB media supplemented with the phenylalanine analogue 4-chloro-phenylalanine (4CP).³⁶ During BsaI-mediated Level-2 assembly of TUs, the mutant pheS gene is eliminated from the pML2 vectors and can therefore be used as a negative selection marker.

The eight pML2 vectors can be divided into two classes, “Blue” and “White”, depending on the presence or absence, respectively, of a lacZα gene in the MIDAS cassette (see FIG. 13). There are four “Blue” pML2 vectors (indicated by the “B” in the plasmid name) and four “White” pML2 vectors (indicated by the “W” in the plasmid name). The “Blue” and “White” vectors also differ in the relative configuration of the AarI and BsmBI restriction sites in their MIDAS cassettes. Thus, in the “Blue” vectors, the entire MIDAS cassette is flanked by convergent BsmBI sites and nested within is the lacZα gene flanked by divergent AarI sites. In the “White” vectors, the enzyme configuration is switched (the entire MIDAS cassette is flanked by convergent AarI sites and nested within are two divergent BsmBI sites) and there is no lacZα gene. It is important to note that the lacZα chromogenic marker in the pML2 vectors is not used for blue/white screening during the Level-2 Golden Gate assembly of TUs (it is reserved for the Level-3 cloning), but the choice of “Blue” or “White” vector into which a TU should be assembled must be made during Level-2 assembly of TUs as this will determine the order in which that TU is added to the multigene construct at Level-3. Likewise, the AarI and BsmBI sites are also not used for Level-2 assembly of TUs; instead they are integral to the Level-3 assembly of multigene constructs. These considerations, including the differences between the (+) and (−) vectors, are discussed further below, under the Level-3 description.

The orientation (direction of transcription) of each TU can be freely defined by assembling each TU in either a pML2 “Forward” vector (indicated by “F” in the plasmid name) or a pML2 “Reverse” vector (indicated by “R” in the plasmid name). The pML2 “Reverse” vectors have their BsaI recognition sites (for Golden Gate assembly of TUs) switched relative to the BsaI fusion sites in the pML2 “Forward” vectors. Thus, pML2 “Forward” vectors have their pheS-based Golden Gate cassette oriented 5′-[GGAG]BsaI-pheS-BsaI[CGCT]-3′, while the pML2 “Reverse” vectors have their BsaI recognition sites switched: 5′-[AGCG]BsaI-pheS-BsaI[CTCC]-3′ (an arrowhead adjacent to pheS, not reproduced here, indicates the direction of transcription of the mutant pheS gene).
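The relationship between the fusion sites of the “Forward” and “Reverse” vectors can be checked with a short calculation: the “Reverse” overhangs are the reverse complements of the “Forward” overhangs, taken in the opposite order. The Python sketch below is illustrative only; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_vector_overhangs(forward_overhangs):
    """Derive the (left, right) overhangs of a Reverse vector.

    Flipping the cassette swaps the two ends and reverse-complements each.
    """
    left, right = forward_overhangs
    return reverse_complement(right), reverse_complement(left)


# Forward fusion sites as given in the text.
FORWARD = ("GGAG", "CGCT")
print(reverse_vector_overhangs(FORWARD))
```

Applied to GGAG and CGCT, this yields AGCG and CTCC, which agrees with the “Reverse” cassette notation given above.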

In contrast to the cloned Level-1 modules, the pML2 destination vectors confer kanamycin resistance, allowing efficient counter selection against Level-1 module backbones, while the mutant pheS gene provides powerful negative selection against any parental pML2 destination plasmids when E. coli DH5α cells (or equivalent) transformed with the assembly reactions are spread onto LB plates supplemented with kanamycin and 4CP.

Level-3: Assembly of Multigene Constructs

At MIDAS Level-3, TUs that were assembled in the pML2 plasmids are sequentially loaded (by binary assembly) into the Level-3 destination vector (pML3) to form the multigene construct.

Assembly of multigene constructs at Level-3 is crucially dependent on the relative configuration of the AarI and BsmBI restriction sites in the MIDAS cassettes located in the “Blue” and “White” pML2 vectors; the nested and inverted configuration of these restriction sites in the White vectors compared to the Blue vectors is a defining feature of the MIDAS multigene assembly process. In the “Blue” vectors, the entire MIDAS cassette has flanking convergent BsmBI sites and nested within is a lacZα gene flanked by divergent AarI sites. In the “White” vectors, the enzyme configuration is inverted (the entire MIDAS cassette has flanking convergent AarI sites and nested within are two divergent BsmBI sites) and there is no lacZα gene. As illustrated in FIG. 14, the nesting and inversion of the restriction sites in the “Blue” and “White” vectors mean that TUs assembled into “White” MIDAS cassettes can be inserted into “Blue” MIDAS cassettes using AarI-mediated Golden Gate reactions and, conversely, TUs assembled into “Blue” MIDAS cassettes can be cloned into “White” MIDAS cassettes using BsmBI-mediated Golden Gate reactions. This cycle of cloning (i.e., alternating between “White” and “Blue” pML2 entry clones) can be repeated indefinitely.

The Golden Gate cloning cassette found in the Level-3 destination vector, pML3, consists of a lacZα gene flanked by divergent AarI sites: [CATT]AarI-lacZα-AarI[CGTA], so the MIDAS Level-3 assembly is always initiated (i.e., the first TU is always added) using an AarI-mediated Golden Gate reaction between pML3 and a TU that has been assembled into a pML2 “White” destination vector (FIG. 14). The plasmid generated is then used in a BsmBI-mediated Golden Gate reaction with a TU cloned into a pML2 “Blue” destination vector. Further TUs are added by following this approach of alternating between AarI- and BsmBI-mediated Golden Gate reactions using pML2 “White” and pML2 “Blue” entry clones, respectively. Thus, each plasmid generated by cloning a TU into the multigene construct becomes the destination vector for the next cycle of TU addition (FIG. 14).

Following each cloning cycle, E. coli DH5α cells (or equivalent) are transformed with the Golden Gate reactions, spread onto LB plates supplemented with spectinomycin, IPTG and X-Gal, and positive clones are identified by blue/white screening. Spectinomycin selects for cells that have taken up the Level-3 plasmid and counter selects against any pML2 plasmid backbones. Note that, whereas the lacZα chromogenic marker present in the pML2 “Blue” vectors was not previously utilised during Level-2 assembly of TUs, it is now, at the level of multigene assembly (Level-3), that it becomes used for blue/white screening. Thus, for TUs assembled into the multigene construct using AarI-mediated Golden Gate reactions, white colonies are picked for analysis, while TUs assembled into the multigene construct using BsmBI-mediated Golden Gate reactions are analysed by picking blue colonies (see Table 13).
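The alternation described above can be summarised as a simple planning aid. The following Python sketch is illustrative only (the function name `level3_plan` and the TU labels are hypothetical): for each TU, in order of addition, it lists the enzyme used, the class of pML2 vector in which the TU must have been assembled, and the colony colour to pick.

```python
def level3_plan(tu_names):
    """Return one dict per cloning cycle for a Level-3 assembly.

    Odd cycles use AarI with a pML2 "White" entry clone (pick white colonies);
    even cycles use BsmBI with a pML2 "Blue" entry clone (pick blue colonies).
    """
    plan = []
    for cycle, name in enumerate(tu_names, start=1):
        if cycle % 2 == 1:
            step = {"enzyme": "AarI", "pml2_class": "White", "pick": "white"}
        else:
            step = {"enzyme": "BsmBI", "pml2_class": "Blue", "pick": "blue"}
        plan.append({"cycle": cycle, "tu": name, **step})
    return plan


for step in level3_plan(["TU1", "TU2", "TU3"]):
    print(step)
```

Because the colour class is fixed by the position of a TU in the order of addition, the plan has to be decided before Level-2 assembly, when the pML2 vector for each TU is chosen.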

In its simplest configuration, MIDAS can achieve multigene assembly using only two pML2 destination vectors: one “White” vector and one “Blue” vector (FIG. 15A). The full set of eight pML2 vectors is provided to enable maximum user control over: (i) the order in which each TU is added to the growing multigene construct, (ii) the desired orientation (that is, the direction of transcription) of each TU and (iii) the polarity of assembly, i.e., the direction in which incoming TUs are loaded into the multigene construct.
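The count of eight vectors follows from three independent two-way choices. The Python sketch below simply enumerates those combinations; it is an illustration only, the dictionary keys are hypothetical labels, and no attempt is made to reproduce the actual plasmid names.

```python
from itertools import product


def pml2_vector_space():
    """Enumerate every combination of polarity, colour class and orientation."""
    polarities = ("+", "-")  # ASCII "-" stands in for the minus sign used in the text
    colours = ("White", "Blue")
    orientations = ("Forward", "Reverse")
    return [
        {"polarity": p, "colour": c, "orientation": o}
        for p, c, o in product(polarities, colours, orientations)
    ]


vectors = pml2_vector_space()
print(len(vectors))
for vector in vectors:
    print(vector)
```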

Firstly, and as described earlier, the order of addition of each TU to the growing multigene construct is governed by the choice of “White” or “Blue” pML2 destination vector into which the TUs are assembled.

Secondly, and as described previously when discussing the Level-2 features, the orientation (direction of transcription) of a TU can be freely defined by the choice of “Forward” or “Reverse” pML2 vector into which the TU is assembled. Extending MIDAS to include the option of assembling TUs in either orientation expands the vector suite to four pML2 plasmids (see FIG. 13 and FIG. 15B).

Thirdly, the polarity of multigene assembly (i.e., the direction in which new TUs are added to the growing multigene assembly) can also be freely defined, in this case by assembling TUs in pML2 destination vectors of either “plus” (+) or “minus” (−) polarity (FIG. 13). The use of a pML2(+) entry clone for Level-3 assembly ensures that the TU added next will be added in the same direction as the direction of transcription of the Spec^R gene found in pML3, i.e., the TU assembled next in the multigene construct will be added to the right of the TU that was added using the pML2(+) entry clone (as illustrated in FIG. 15A and FIG. 15B). In contrast, use of a pML2(−) entry clone for Level-3 assembly forces the next TU to be added in the direction opposite to that of the direction of transcription of the Spec^R gene found in pML3, so that the next TU loaded into the multigene construct will be added to the left of the TU that was added using the pML2(−) entry clone. If, however, entry clones of both polarities (i.e., both pML2(+) and pML2(−) entry clones) are used to build the multigene construct, then this confers on MIDAS the ability to switch the direction in which new TUs are added to the Level-3 assembly, and for the hypothetical assembly shown in FIG. 15C all subsequently added TUs will be nested between TU3 and TU2.
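The effect of polarity on the final arrangement of TUs can be modelled as a moving insertion point. The Python sketch below is a simplified model under stated assumptions, not a description of the figures: each TU is inserted at the current insertion point, and the insertion point then sits to the right of that TU for a (+) entry clone or to its left for a (−) entry clone. The function name and TU labels are hypothetical.

```python
def assemble_order(entries):
    """Return the left-to-right order of TUs in the multigene construct.

    `entries` is a list of (tu_name, polarity) tuples in order of addition,
    where polarity is "+" or "-" (ASCII minus).
    """
    construct = []
    slot = 0  # index at which the next TU will be inserted
    for name, polarity in entries:
        if polarity not in ("+", "-"):
            raise ValueError(f"unknown polarity: {polarity!r}")
        construct.insert(slot, name)
        if polarity == "+":
            slot += 1  # next TU goes to the right of this one
        # for "-", slot is unchanged: next TU goes to the left of this one
    return construct


# Two (+) entry clones followed by a (-) entry clone, then further TUs.
print(assemble_order([("TU1", "+"), ("TU2", "+"), ("TU3", "-"), ("TU4", "+")]))
```

In this model, once TU3 has been added from a (−) entry clone after two (+) additions, every later TU is placed between TU2 and TU3, which matches the nesting behaviour described above.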

Bacterial and Fungal Strains

Routine growth of Escherichia coli was performed at 37° C. in LB broth. Chemically competent E. coli HST08 Stellar cells (Clontech Laboratories, Inc.) were used for routine transformations and maintenance of plasmids. Penicillium paxilli strains used in this study are shown in Table 7.

Protocols for MIDAS Level-1 Module Cloning

PCR-amplified modules were purified using spin-column protocols and cloned into the MIDAS Level-1 plasmid, pML1, by BsmBI-mediated Golden Gate assembly. Typically, 1-2 μL (approximately 50-200 ng) of pML1 plasmid DNA from a miniprep was mixed with 1-2 μL of each purified PCR fragment, 1 μL of BsmBI (20 U/μL), 1 μL of T4 DNA Ligase (20 U/μL) and 2 μL of 10×T4 DNA Ligase buffer in a total reaction volume of 20 μL. Reactions were incubated at 37° C. for 1 to 3 hours and an aliquot (typically 2-3 μL) was transformed into 30 μL of E. coli HST08 Stellar competent cells by heat shock. Following the recovery period (i.e., addition of 250 μL SOC medium and incubation at 37° C. for 1 hour), aliquots of the transformation mix were spread onto LB agar plates supplemented with 50 μg/mL spectinomycin, 1 mM IPTG and 50 μg/mL X-Gal. Plates were incubated overnight at 37° C., and white colonies were chosen for analysis.

Protocols for MIDAS Level-2 TU Assembly

Using the modules cloned at Level-1, full-length TUs were assembled into MIDAS Level-2 plasmids by BsaI-mediated Golden Gate assembly. Typically, 40 fmol of pML2 plasmid DNA was mixed with 40 fmol of plasmid DNA of each Level-1 entry clone, 1 μL of BsaI-HF (20 U/μL), 1 μL of T4 DNA Ligase (20 U/μL) and 2 μL of 10×T4 DNA Ligase buffer in a total reaction volume of 20 μL. Reactions were incubated in a DNA Engine PTC-200 Peltier Thermal Cycler (Bio-Rad) using the following parameters: 45 cycles of (2 minutes at 37° C. and 5 minutes at 16° C.), followed by 5 minutes at 50° C. and 10 minutes at 80° C. Reactions were transformed as described for the Level-1 assembly and spread onto LB agar plates containing 75 μg/mL kanamycin and 1.25 mM 4CP. Following overnight incubation at 37° C., colonies were picked for analysis.
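As a worked example of the quantities above, the following Python sketch converts a molar amount of plasmid into a mass and totals the thermal cycling time. It is illustrative only: the average mass of 650 g/mol per base pair is a common approximation rather than a value from this specification, the 5,000 bp plasmid length is an invented example, and the function names are hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of double-stranded DNA (approximation)


def fmol_to_ng(fmol, length_bp):
    """Mass in ng of `fmol` femtomoles of a double-stranded plasmid."""
    # fmol * 1e-15 mol/fmol * (length_bp * AVG_BP_MASS) g/mol * 1e9 ng/g
    return fmol * length_bp * AVG_BP_MASS * 1e-6


def cycling_minutes(cycles=45, digest_min=2, ligate_min=5,
                    final_digest_min=5, inactivate_min=10):
    """Total programmed time, excluding ramping between temperatures."""
    return cycles * (digest_min + ligate_min) + final_digest_min + inactivate_min


# 40 fmol of a hypothetical 5,000 bp plasmid.
print(fmol_to_ng(40, 5000), "ng")
# Programmed time for 45 cycles of (2 min + 5 min) plus the two final holds.
print(cycling_minutes(), "min")
```

Under these assumptions, 40 fmol of a 5,000 bp plasmid corresponds to about 130 ng, and the cycling programme runs for 330 minutes before ramp times are counted.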

Protocols for MIDAS Level-3 Multigene Assembly

Full-length TUs assembled at Level-2 were used to create multigene assemblies in the Level-3 destination vector by alternating Golden Gate assembly using either AarI (for TUs cloned into pML2 “White” vectors) or BsmBI (for TUs cloned into pML2 “Blue” vectors). Typically, 40 fmol of Level-3 destination vector plasmid DNA was mixed with 40 fmol of Level-2 entry clone plasmid DNA, 1 μL of BsaI-HF (20 U/μL), 1 μL of T4 DNA Ligase (20 U/μL) and 2 μL of 10×T4 DNA Ligase buffer in a total reaction volume of 20 μL. Reactions were incubated in a DNA Engine PTC-200 Peltier Thermal Cycler (Bio-Rad) using the following parameters: 45 cycles of (2 minutes at 37° C. and 5 minutes at 16° C.), followed by 5 minutes at 37° C. and 10 minutes at 80° C. Reactions were transformed as described for the Level-1 assembly and spread onto LB agar plates supplemented with 50 μg/mL spectinomycin, 1 mM IPTG and 50 μg/mL X-Gal. Plates were incubated overnight at 37° C. For AarI-mediated assembly reactions, white colonies were chosen for analysis while, for BsmBI-mediated assembly reactions, blue colonies were selected.

Media and Reagents Used for Fungal Work.

CDYE (Czapek-Dox/Yeast extract) medium with trace elements was made with deionized water and contained 3.34% (w/v) Czapek-Dox (Oxoid Ltd., Hampshire, England), 0.5% (w/v) yeast extract (Oxoid Ltd., Hampshire, England), and 0.5% (v/v) trace element solution. For agar plates, Select agar (Invitrogen, California, U.S.A.) was added to 1.5% (w/v).
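The percentages above translate directly into amounts per batch, since 1% (w/v) is 1 g per 100 mL and 1% (v/v) is 1 mL per 100 mL. The Python sketch below shows the arithmetic for an arbitrary batch volume; it is an illustration only, and the dictionary and function names are hypothetical.

```python
# Percent composition of CDYE as stated in the text; units of the resulting
# amount are given in each key (g for w/v components, mL for v/v components).
CDYE_PERCENT = {
    "Czapek-Dox (g)": 3.34,
    "yeast extract (g)": 0.5,
    "trace element solution (mL)": 0.5,
}
AGAR_PERCENT = {"Select agar (g)": 1.5}  # added only for agar plates


def amounts_for_volume(recipe_percent, volume_ml):
    """Scale a percent recipe to a batch volume in mL."""
    return {
        name: round(percent / 100 * volume_ml, 3)
        for name, percent in recipe_percent.items()
    }


print(amounts_for_volume(CDYE_PERCENT, 1000))
print(amounts_for_volume(AGAR_PERCENT, 1000))
```

For a 1 L batch this gives 33.4 g Czapek-Dox, 5 g yeast extract and 5 mL trace element solution, with 15 g Select agar added for plates.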

Trace element solution was made in deionized water and contained 0.004% (w/v) cobalt(II) chloride hexahydrate (Ajax Finechem, Auckland, New Zealand), 0.005% (w/v) copper(II) sulfate pentahydrate (Scharlau, Barcelona, Spain), 0.05% (w/v) iron(II) sulfate heptahydrate (Merck, Darmstadt, Germany), 0.014% (w/v) manganese(II) sulfate tetrahydrate, and 0.05% (w/v) zinc sulfate heptahydrate (Merck, Darmstadt, Germany). The solution was preserved with 1 drop of 12 M hydrochloric acid.

Regeneration (RG) medium was made with deionized water and contained 2% (w/v) malt extract (Oxoid Ltd., Hampshire, England), 2% (w/v) D(+)-glucose anhydrous (VWR International BVBA, Leuven, Belgium), 1% (w/v) mycological peptone (Oxoid Ltd., Hampshire, England), and 27.6% sucrose (ECP Ltd. Birkenhead, Auckland, New Zealand). Depending on whether the media was to be used for plates (1.5% RGA) or overlays (0.8% RGA), Select agar (Invitrogen, California, U.S.A.) was added to 1.5% or 0.8% (w/v), respectively.

Fungal Protocols—Protoplast Preparation

Fungal protoplasts were prepared for transformation according to Yelton, M. M.; Hamer, J. E.; Timberlake, W. E. Proc. Natl. Acad. Sci. 1984, 81 (5), 1470-1474, with modifications. Five 25 mL aliquots of CDYE medium with trace elements, in 100 mL Erlenmeyer flasks, were inoculated with 5×10⁶ spores and incubated for 28 hours at 28° C. with shaking (200 rpm). The fermentation broth from all five flasks was filtered through a sterile nappy liner and the combined mycelia were rinsed three times with sterile water and once with OM buffer (10 mM Na₂HPO₄ and 1.2 M MgSO₄·7H₂O, brought to pH 5.8 with 100 mM NaH₂PO₄·2H₂O). Mycelia were weighed, resuspended in 10 mL of filter-sterilized Lysing Enzymes solution (prepared by resuspending Lysing Enzymes from Trichoderma harzianum (Sigma) at 10 mg/mL in OM buffer) per gram of mycelia, and incubated for 16 hours at 30° C. with shaking at 80 rpm. Protoplasts were filtered through a sterile nappy liner into a 250 mL Erlenmeyer flask. Aliquots (5 mL) of filtered protoplasts were transferred into sterile 15 mL centrifuge tubes and overlaid with 2 mL of ST buffer (0.6 M sorbitol and 0.1 M Tris-HCl at pH 8.0). Tubes were centrifuged at 2600×g for 15 minutes at 4° C. The white layer of protoplasts that formed between the OM and ST buffers in each tube was transferred (in 2 mL aliquots) into sterile 15 mL centrifuge tubes, gently washed by pipette resuspension in 5 mL of STC buffer (1 M sorbitol, 50 mM Tris-HCl at pH 8.0, and 50 mM CaCl₂), and centrifuged at 2600×g for 5 minutes at 4° C. The supernatant was decanted and pelleted protoplasts from multiple tubes were combined by resuspension in 5 mL aliquots of STC buffer. The STC buffer wash was repeated three times until protoplasts were pooled into a single 15 mL centrifuge tube. The final protoplast pellet was resuspended in 500 μL of STC buffer and the protoplast concentration was estimated with a hemocytometer. The protoplast stock was diluted to give a final concentration of 1.25×10⁸ protoplasts per mL of STC buffer. Aliquots of protoplasts (100 μL) were used immediately for fungal transformations and excess protoplasts were preserved in 8% PEG solution (80 μL of protoplasts were added to 20 μL of 40% (w/v) PEG 4000 in STC buffer) in 1.7 mL micro-centrifuge tubes and stored at −80° C.
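The dilution arithmetic in this protocol can be checked with a short calculation. The following is a minimal sketch only: the function names are illustrative, and the hemocytometer estimate of 5×10⁸ protoplasts per mL is a hypothetical value that does not come from this specification. The remaining numbers (1.25×10⁸ protoplasts per mL, 100 μL aliquots, 80 μL of protoplasts added to 20 μL of 40% PEG) are taken from the text above.

```python
def diluent_volume(stock_conc, target_conc, stock_volume):
    """Volume of buffer to add so stock_volume at stock_conc reaches target_conc (C1*V1 = C2*V2)."""
    if not 0 < target_conc <= stock_conc:
        raise ValueError("target concentration must be positive and no higher than the stock")
    final_volume = stock_conc * stock_volume / target_conc
    return final_volume - stock_volume


def mixed_concentration(conc, volume, added_volume):
    """Concentration of a solute after its solution is mixed with solute-free liquid."""
    return conc * volume / (volume + added_volume)


# Hypothetical hemocytometer estimate: 5e8 protoplasts/mL in the 500 uL resuspension.
extra_stc_ul = diluent_volume(5e8, 1.25e8, 500.0)      # uL of STC buffer to add

# Values taken from the protocol text.
protoplasts_per_aliquot = 1.25e8 * (100.0 / 1000.0)    # protoplasts in one 100 uL aliquot
peg_percent = mixed_concentration(40.0, 20.0, 80.0)    # % (w/v) PEG in the frozen stock

print(extra_stc_ul, protoplasts_per_aliquot, peg_percent)
```

Under these assumptions the 100 μL aliquot contains the 1.25×10⁷ protoplasts used per transformation, and the 80 μL plus 20 μL mixture gives the stated 8% PEG.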

Fungal Protocols—Transformation of P. paxilli

Fungal transformations (modified from Vollmer, S. J.; Yanofsky, C. Proc. Natl. Acad. Sci. 1986, 83 (13), 4869-4873 and Oliver, R. P.; Roberts, I. N.; Harling, R.; Kenyon, L.; Punt, P. J.; Dingemanse, M. A.; van den Hondel, C. A. M. J. J. Curr. Genet. 1987, 12 (3), 231-233) were carried out in 1.7 mL micro-centrifuge tubes containing 100 μL (1.25×10⁷) protoplasts, either freshly prepared in STC buffer or stored in 8% PEG solution (as described above). A solution containing 2 μL of spermidine (50 mM in H₂O), 5 μL of heparin (5 mg/mL in STC buffer), and 5 μg of plasmid DNA (250 μg/mL) was added to the protoplasts and, following incubation on ice for 30 minutes, 900 μL of 40% PEG solution (40% (w/v) PEG 4000 in STC buffer) was added. The transformation mixture was incubated on ice for a further 15-20 minutes, transferred to 17.5 mL of 0.8% RGA medium (prewarmed to 50° C.) in sterile 50 mL tubes, mixed by inversion, and 3.5 mL aliquots were dispensed onto 1.5% RGA plates. Following overnight incubation at 25° C., 5 mL of 0.8% RGA (containing sufficient geneticin to achieve a final concentration of 150 μg per mL of solid media) was overlaid onto each plate. Plates were incubated for a further 4 days at 25° C. and spores were picked from individual colonies and streaked onto CDYE agar plates supplemented with 150 μg/mL geneticin. Streaked plates were incubated at 25° C. for a further 4 days. Spores from individual colonies were suspended in 50 μL of 0.01% (v/v) Triton X-100 and 5×5 μL aliquots of the spore suspension were transferred onto new CDYE agar plates supplemented with 150 μg/mL geneticin. Sporulation plates were incubated at 25° C. for 4 days and spore stocks were prepared as follows. Colony plugs from the sporulation plates were suspended in 2 mL of 0.01% (v/v) Triton X-100, and 800 μL of suspended spores were mixed with 200 μL of 50% (w/v) glycerol in a 1.7 mL micro-centrifuge tube.
Spore stocks were either used to inoculate 50 mL of CDYE media or flash frozen in liquid nitrogen and stored at −80° C.

Indole Diterpene Production and Extraction

Fungal transformants were grown in 50 mL of CDYE medium with trace elements for 7 days at 28° C. in shaker cultures (≥200 rpm), in 250 mL Erlenmeyer flasks capped with cotton wool. Mycelia were isolated from fermentation broths by filtration through nappy liners, transferred to 50 mL centrifuge tubes (Lab Serv®, Thermo Fisher Scientific) and indole diterpenes were extracted by vigorously shaking the mycelia (≥200 rpm) in 2-butanone for ≥45 minutes.

Thin-Layer Chromatography

The 2-butanone supernatant (containing extracted indole diterpenes) was used for thin-layer chromatography (TLC) analysis on solid phase silica gel 60 aluminium plates (Merck). Indole diterpenes were chromatographed with 9:1 chloroform:acetonitrile or 8:2 dichloromethane:acetonitrile and visualized with Ehrlich's reagent (1% (w/v) p-dimethylaminobenzaldehyde in 24% (v/v) HCl and 50% ethanol).

Liquid Chromatography-Mass Spectrometry

Samples were prepared for liquid chromatography-mass spectrometry (LC-MS) from those transformants that tested positive by TLC. A 1 mL sample of the 2-butanone supernatant (containing extracted indole diterpenes) was transferred to a 1.7 mL micro-centrifuge tube and the 2-butanone was evaporated overnight. Contents were resuspended in 100% acetonitrile and filtered through a 0.2 μm membrane into an LC-MS vial. LC-MS samples were chromatographed on a reverse phase Thermo Scientific Accucore 2.6 μm C18 (50×2.1 mm) column attached to an UltiMate® 3000 Standard LC system (Dionex, Thermo Fisher Scientific) run at a flow rate of 0.200 mL/minute and eluted with aqueous solutions of acetonitrile containing 0.01% formic acid using a multistep gradient method (Table 14). Mass spectra were captured through in-line analysis on a maXis™ II quadrupole-time-of-flight mass spectrometer (Bruker).

Large Scale Indole Diterpene Purification for NMR Analysis

Fungal transformants that produced high levels of novel indole diterpenes were grown in ≥1 litre of CDYE medium with trace elements, as described under “Indole diterpene production and extraction”. Mycelia were pooled into 1 litre Schott bottles containing stir bars. 2-butanone was added and indole diterpenes were extracted overnight with stirring (≥700 rpm). Extracts were filtered through Celite® 545 (J. T. Baker®) and dry loaded onto silica with rotary evaporation for crude purification by silica column prior to a final purification by semi-preparative HPLC. A 1 mL aliquot of crude extract was injected onto a semi-preparative reversed phase Phenomenex 5 μm C18(2) 100 Å (250×15 mm) column attached to an UltiMate® 3000 Standard LC system (Dionex, Thermo Fisher Scientific) run at a flow rate of 8.00 mL/minute. Multistep gradient methods were optimized for the purification of different sets of indole diterpenes. The purity of each indole diterpene was assessed by LC-MS and the structure was identified by NMR.

NMR

NMR samples were prepared in deuterated chloroform. Compounds were analysed by standard one-dimensional proton and carbon-13 NMR, two-dimensional correlation spectroscopy (COSY), heteronuclear single quantum correlation spectroscopy (HSQC), and heteronuclear multiple bond correlation spectroscopy (HMBC).

Tables 1-14 referenced in this specification are set out below:

TABLE 1

Functional assignment of predicted genes in the putative nodulisporic acid gene cluster.

| Gene | Size of encoded protein (aa) | Predicted function [Specific function] | Most notable BLASTp match: organism | E-value | % identity/% coverage | Protein name and accession number |
|---|---|---|---|---|---|---|
| nodI | 1664 | WD40 domain protein | Hypoxylon sp. CO27-5 | 0 | 36% ID/80% | OTA80149 |
| nodW | 608 | Cytochrome P450 oxygenase [terminal-C dioxygenase] | Aspergillus aculeatus | 9.00E−153 | 44% ID/97% | XP_020058732 |
| nodR* | 511 | Cytochrome P450 oxygenase | Penicillium simplicissimum | 3.00E−108 | 36% ID/97% | PtmU/BAU61563 |
| nodX | 593 | Cytochrome P450 oxygenase | Hypoxylon sp. | 0 | 62% ID/70% | OTA78491 |
| nodM* | 463 | FAD-dependent oxygenase [IDT mono-epoxidase] | Aspergillus flavus | 5.00E−173 | 55% ID/93% | AtmD/Q672V4 |
| nodB* | 243 | IDT cyclase [IDT cyclase] | Penicillium crustosum | 9.00E−119 | 68% ID/99% | PenB/AGZ20190 |
| nodO* | 448 | FAD-dependent oxygenase | Penicillium janthinellum | 2.00E−160 | 60% ID/97% | JanO/AGZ20488 |
| nodJ | 514 | Cytochrome P450 oxygenase | Aspergillus clavatus | 3.00E−148 | 42% ID/99% | XP_001270361 |
| nodC* | 326 | Geranylgeranyl transferase [Geranylgeranyl transferase] | Penicillium crustosum | 2.00E−136 | 66% ID/83% | PenC/AGZ20189 |
| nodY1 | 431 | FAD-dependent oxygenase | Penicillium oxalicum | 2.00E−71 | 34% ID/99% | OxaD/AOC80388 |
| nodD2* | 434 | prenyl transferase | Penicillium janthinellum | 1.00E−144 | 48% ID/96% | JanD/AGZ20478 |
| nodD1* | 431 | prenyl transferase | Penicillium janthinellum | 1.00E−155 | 53% ID/94% | JanD/AGZ20478 |
| nodY2 | 461 | FAD-dependent oxygenase | Aspergillus alliaceus | 3.00E−105 | 42% ID/98% | AspB/P0DOW1 |
| nodZ | 477 | Cytochrome P450 oxygenase | Penicillium flavigenum | 7.00E−166 | 48% ID/96% | OQE14847 |
| nodS | 535 | Not stated | Hypoxylon sp. CO27-5 | 9.00E−139 | 46% ID/94% | OTA93952 |
Naming of genes in IDT clusters has followed the A. nidulans naming convention, where genes are given a name with a three-letter prefix in lower case that designates species, followed by a single-letter suffix in upper case that designates gene function, written in italic font (e.g. paxC). Naming of the corresponding protein product follows the same rules except that the initial letter of the prefix is upper case and the entire name is written in normal (non-italic) font (e.g. PaxC is the protein product of paxC). Thus, a nod name was assigned to each H. pulicicidum gene in the NAA 10 gene cluster. H. pulicicidum genes that share homology (>35% amino acid identity of predicted translational products) with genes found in known IDT pathways are followed by an asterisk (*) and, with the exception of nodR, were given letters corresponding to known confirmed genes (e.g. the protein encoded by nodC shares 52.8% amino acid identity with the protein product of paxC). The genes that do not share homology with known IDT genes were assigned letters that are not shared with any of the confirmed IDT genes. Notably, the cluster contains two sets of paralogous genes (sharing >40% amino acid identity with each other), which we have distinguished using numbers (i.e. nodD1/nodD2 and nodY1/nodY2). Closest matches were identified using BLASTp (protein-protein BLAST) against the non-redundant protein sequence database with ‘expect threshold’ set at 10 and ‘word size’ set at 6. The BLOSUM62 scoring matrix was applied with a gap opening penalty of 11 and a gap extension penalty of 1 with conditional compositional score matrix adjustment.
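The BLASTp settings listed above can be written down as a command line for the NCBI BLAST+ `blastp` program. This is a sketch under assumptions: the specification does not say whether the web interface or the command-line tool was used, and the file names `nod_proteins.fasta` and `nod_hits.txt` are hypothetical. The code only builds and prints the command; it does not run a search.

```python
import shlex


def blastp_command(query_fasta, out_path):
    """Build a BLAST+ blastp command mirroring the search settings described in the text."""
    return [
        "blastp",
        "-query", query_fasta,       # hypothetical FASTA file of predicted Nod proteins
        "-db", "nr",                 # non-redundant protein sequence database
        "-remote",                   # assumption: search run on NCBI servers
        "-evalue", "10",             # 'expect threshold' of 10
        "-word_size", "6",           # 'word size' of 6
        "-matrix", "BLOSUM62",       # scoring matrix
        "-gapopen", "11",            # gap opening penalty
        "-gapextend", "1",           # gap extension penalty
        "-comp_based_stats", "2",    # conditional compositional score matrix adjustment
        "-out", out_path,
    ]


command = blastp_command("nod_proteins.fasta", "nod_hits.txt")
print(shlex.join(command))
```

Keeping the settings in one list makes it easy to rerun the same search for each predicted protein and to record exactly which parameters produced the matches in Table 1.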

TABLE 2

Similarity matrix of geranylgeranyl transferases (‘C’ enzymes).

| Enzyme | PaxC | NodC | LtmC | AtmC | JanC | PenC |
|---|---|---|---|---|---|---|
| PaxC | 100 | 52.3 | 35.3 | 54.3 | 70.9 | 60.5 |
| NodC | 66.7 | 100 | 38.8 | 55.1 | 55.4 | 54.4 |
| LtmC | 54 | 61.7 | 100 | 40.2 | 36.4 | 39.5 |
| AtmC | 69.9 | 70.5 | 61.5 | 100 | 58.3 | 63.6 |
| JanC | 79.5 | 70.3 | 55.5 | 74.1 | 100 | 66.7 |
| PenC | 73.4 | 70.5 | 59.1 | 78 | 79.2 | 100 |

Numbers in italics (above the diagonal) represent % identity scores. Numbers in bold (below the diagonal) represent % similarity scores for amino acid residues.
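A percent identity score of the kind tabulated in these matrices can be computed from a pairwise alignment as follows. This is an illustrative sketch, not the method used to generate the tables: the specification does not name the alignment tool or state the denominator, so dividing by the number of aligned columns is an assumption, and the two short sequences are invented. A % similarity score would additionally need a substitution scoring scheme to decide which non-identical pairs count as similar.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must be rows of the same pairwise alignment (equal length).
    Columns that are gaps in both rows are ignored; the denominator is the
    number of remaining columns (an assumption, other conventions exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Invented six-column alignment: four identical columns, one mismatch, one gap.
score = percent_identity("MKT-LV", "MRTALV")
print(round(score, 1))
```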

TABLE 3

Similarity matrix of 3-geranylgeranylindole epoxidases (‘M’ enzymes).

| Enzyme | LtmM | NodM | AtmM | PenM | JanM | PaxM |
|---|---|---|---|---|---|---|
| LtmM | 100 | 37.3 | 38.2 | 36.8 | 36.9 | 37.9 |
| NodM | 57.2 | 100 | 48.3 | 51.2 | 52.9 | 48.6 |
| AtmM | 56.3 | 63.4 | 100 | 48.3 | 47.9 | 48.9 |
| PenM | 54.8 | 66 | 62.8 | 100 | 61.6 | 60.6 |
| JanM | 54.7 | 67 | 61.6 | 75.1 | 100 | 66.7 |
| PaxM | 56.6 | 65.8 | 64.4 | 74 | 79.9 | 100 |

Numbers in italics (above the diagonal) represent % identity scores and numbers in bold (below the diagonal) represent % similarity scores for amino acid residues.

TABLE 4

Similarity matrix of indole diterpene cyclases (‘B’ enzymes).

| Enzyme | PaxB | NodB | LtmB | AtmB | JanB | PenB |
|---|---|---|---|---|---|---|
| PaxB | 100 | 63 | 49.6 | 62.1 | 77 | 72.4 |
| NodB | 78.2 | 100 | 48.8 | 64.2 | 65.4 | 67.9 |
| LtmB | 65.6 | 63.5 | 100 | 48.8 | 51.6 | 52 |
| AtmB | 77 | 78.2 | 65.2 | 100 | 67.5 | 70.8 |
| JanB | 87.2 | 77.8 | 66.4 | 79.8 | 100 | 78.2 |
| PenB | 87.7 | 80.2 | 66.4 | 82.3 | 86.4 | 100 |

Numbers in italics (above the diagonal) represent % identity scores and numbers in bold (below the diagonal) represent % similarity scores for amino acid residues.

TABLE 5

Similarity matrix of indole diterpene prenyl transferases (‘D’ and ‘E’ enzymes compared to NodD1 and NodD2).

| Enzyme | NodD2 | PaxD | NodD1 | JanD | AtmD | LtmE | PenD | PenE |
|---|---|---|---|---|---|---|---|---|
| NodD2 | 100 | 42.3 | 44.7 | 45 | 31.6 | 11.3 | 32.6 | 23.3 |
| PaxD | 61.9 | 100 | 44.9 | 65.8 | 31.3 | 11.4 | 31.4 | 24.3 |
| NodD1 | 60.7 | 63.6 | 100 | 49.2 | 29.2 | 11.2 | 29.7 | 22.7 |
| JanD | 63.1 | 80.6 | 65.6 | 100 | 30.5 | 11.7 | 31.9 | 25.2 |
| AtmD | 49.6 | 49.4 | 47.6 | 50.2 | 100 | 11.2 | 28.6 | 25.2 |
| LtmE | 20 | 20.7 | 21 | 21.6 | 19.5 | 100 | 10.8 | 11.6 |
| PenD | 54.4 | 53.5 | 50.7 | 51.9 | 48.1 | 20.9 | 100 | 24.3 |
| PenE | 41.1 | 40.5 | 40.2 | 41.3 | 37.4 | 21.7 | 41.3 | 100 |

Numbers in italics (above the diagonal) represent % identity scores and bold numbers (below the diagonal) represent % similarity scores for amino acid residues.

TABLE 6

Similarity matrix of indole diterpene FAD dependent oxidative cyclases (‘O’ enzymes).

| Enzyme | PenO | JanO | NodO | PaxO |
|---|---|---|---|---|
| PenO | 100 | 42.9 | 44.9 | 40.3 |
| JanO | 59.3 | 100 | 50.7 | 71.9 |
| NodO | 61.9 | 69.2 | 100 | 48.7 |
| PaxO | 56.9 | 84 | 67 | 100 |

Numbers in italics (above the diagonal) represent % identity scores and numbers in bold (below the diagonal) represent % similarity scores for amino acid residues.

TABLE 7

Table of fungal species used in this study.

| Hypoxylon pulicicidum (Nodulisporium sp.) strain | Description | Indole diterpene phenotype: Nodulisporic acid A | Source (reference) |
|---|---|---|---|
| ATCC® 74245™ | Wild type | + | ATCC²⁵ |

| Penicillium paxilli strain | Description | Indole diterpene phenotype: Paspaline | Indole diterpene phenotype: Paxilline | Source (reference) |
|---|---|---|---|---|
| PN2013 (ATCC® 26601™) | Wild type | + | + | Barry Scott, Massey University²⁴ |
| PN2250 (CY2) | PN2013/Deletion of entire PAX locus (ΔPAX); Hyg^R | − | − | Barry Scott, Massey University¹⁵ |
| PN2257 | PN2013/ΔpaxM::P_glcA-hph-T_trpC; Hyg^R | − | − | Barry Scott, Massey University¹⁵ |
| PN2290 | PN2013/ΔpaxC::P_trpC-hph; Hyg^R | − | − | Barry Scott, Massey University²⁸ |

TABLE 8

PCR primers for amplification of transcription unit modules (TUMs).

Hypoxylon pulicicidum primers

| TUM | Primer name | Primer sequence (5′ to 3′) | SEQ ID NO |
|---|---|---|---|
| nodW_CDS | P4502 frag 1 F | cgatgtacgtctcaCTCGAATGactttagctattttaggcatcagttgcc | 57 |
| | P4502 frag 1 R | actgctcgtctcaACTCccgctgcgagccgct | 58 |
| | P4502 frag 2 F | acgtaccgtctccGAGTccggtcctggtggagtgatc | 59 |
| | P4502 frag 2 R | gacctttcgtctctGTCTcaAAGCctaagttatgcccagatatttccag | 60 |
| nodM_CDS | nodM frag1 F | cgatgtacgtctcaCTCGAATGtctacccctgagttcaagg | 61 |
| | nodM frag1 R | cagtcacgtctcaACGCctctcaagaacgatgtgggaaattc | 62 |
| | nodM frag2 F | gtgcatcgtctcaGCGTagtgtaatcgcaccagag | 63 |
| | nodM frag2 R | gacctttcgtctctGTCTcaAAGCctatgaagcgatgtctctaatatggagtaac | 64 |
| nodB_CDS | nodB F | cgatgtacgtctcaCTCGAATGgatggattcgatcgttccaatg | 65 |
| | nodB R | gacctttcgtctctGTCTcaAAGCttattgagccttccgcgcattg | 66 |
| nodC_CDS | nodC frag1 F | cgatgtacgtctcaCTCGAATGtccttaggtttacagtgcttgg | 67 |
| | nodC frag1 R | cattgacgtctcgGTCAcgtcgccaaaccagcga | 68 |
| | nodC frag2 F | gtcacgcgtctctTGACggcctcactagctttcc | 69 |
| | nodC frag2 R | gacctttcgtctctGTCTcaAAGCtcaatgcgtaagatcgagtttctcctttct | 70 |

Penicillium paxilli primers

| TUM | Primer name | Primer sequence (5′ to 3′) | SEQ ID NO |
|---|---|---|---|
| paxG_ProUTR | PpaxG F | cgatgtacgtctcaCTCGGGAGattcacgacctgtgactagtcaa | 71 |
| | PpaxG R | gacctttcgtctctGTCTcaCATTggcgtcgaacttgatgaagttttc | 72 |
| paxG_CDS | paxG frag1 F | cgatgtacgtctcaCTCGAATGtcctacatccttgcagaag | 73 |
| | paxG frag1 R | cttctacgtctcgTACTgttctaatcgtgcttggtg | 74 |
| | paxG frag2 F | gcacgacgtctccAGTAcaggtgctagaagatgacgttgac | 75 |
| | paxG frag2 R | aggcgccgtctccACCAatctctttcaatcttgcttgttgga | 76 |
| | paxG frag3 F | gattgacgtctctTGGTgacccccgegcctt | 77 |
| | paxG frag3 R | gtcgaccgtctctTTCCctagtatattggaagctccccg | 78 |
| | paxG frag4 F | tccaatcgtctcgGGAAaccctaagtcgacttagtgcg | 79 |
| | paxG frag4 R | gacctttcgtctctGTCTcaAAGCttaaactcttcctttctcattagtaggg | 80 |
| paxG_UTRterm | TpaxG F | cgatgtacgtctcaCTCGGCTTtcaatcgtgctgcatttctctt | 81 |
| | TpaxG R | gacctttcgtctctGTCTcaAGCGtcactcccgagcaatattgct | 82 |
| paxC_ProUTR | PpaxC F2 | cgatgtacgtctcaCTCGGGAGacaacaaaaagatcagccaatgg | 83 |
| | PpaxC R2 | gacctttcgtctctGTCTcaCATTaaaatgggacctacaccctgaa | 84 |
| paxC_CDS | paxC frag1 F | cgatgtacgtctcaCTCGAATGggcgtagcaggga | 85 |
| | paxC frag1 R | cattgacgtctccACGGcgccagacaaggga | 86 |
| | paxC frag2 F | cccttgcgtctcgCCGTgacggagtcaatgggttc | 87 |
| | paxC frag2 R | gacctttcgtctctGTCTcaAAGCtcatgccttcaggtcaagcttc | 88 |
| paxC_UTRterm | TpaxC F | cgatgtacgtctcaCTCGGCTTttggccttgtgaaatatgggactac | 89 |
| | TpaxC R | gacctttcgtctctGTCTcaAGCGatctctgtcatgtcggatatcagat | 90 |
| paxM_ProUTR | PpaxM F | cgatgtacgtctcaCTCGGGAGgttgttggcatgggagtaggat | 91 |
| | PpaxM R | gacctttcgtctctGTCTcaCATTggtttctgaatcttaaagatacatgaaaag | 92 |
| paxM_CDS | paxM frag1 F | cgatgtacgtctcaCTCGAATGgaaaaggccgagtttcaag | 93 |
| | paxM frag1 R | tgacaacgtctcgTCCAtcgaataaagcgttgacttgc | 94 |
| | paxM frag2 F | acgcttcgtctcaTGGActcactattgtcacaatccatggaaaag | 95 |
| | paxM frag2 R | gacctttcgtctctGTCTcaAAGCttaaacttgaagaaaataaaacttcagggcac | 96 |
| paxM_UTRterm | TpaxM frag1 F | cgatgtacgtctcaCTCGGCTTaccattggagcaatttttggttttc | 97 |
| | TpaxM frag1 R | gttcgccgtctcgACTCgattgcttgtgggtct | 98 |
| | TpaxM frag2 F | acaagccgtctccGAGTccagccagcgaacttg | 99 |
| | TpaxM frag2 R | gacctttcgtctctGTCTcaAGCGttttggcttacttcagtttaactgttttg | 100 |
| paxB_ProUTR | PpaxB F | cgatgtacgtctcaCTCGGGAGaaggctgtgttggagagaatc | 101 |
| | PpaxB R | gacctttcgtctctGTCTcaCATTagtttctaaggttgacgtgggaaaaag | 102 |
| paxB_CDS | paxB F | cgatgtacgtctcaCTCGAATGgacggttttgatgtttcccaa | 103 |
| | paxB R | gacctttcgtctctGTCTcaAAGCtcaatttgcttttttcggcccgcttatgc | 104 |
| paxB_UTRterm | TpaxB F | cgatgtacgtctcaCTCGGCTTtcggcagttgagggtgaaac | 105 |
| | TpaxB R | gacctttcgtctctGTCTcaAGCGggttaacaatgaggaacgatgaacag | 106 |

Additional primers

| TUM | Primer name | Primer sequence (5′ to 3′) | SEQ ID NO |
|---|---|---|---|
| trpC_ProUTR | PtrpC frag1 F | cgatgtacgtctcaCTCGGGAGgaattcatgccagttgttcccag | 107 |
| | PtrpC frag1 R | cgatgtacgtctcaGCTTggccgactcgctg | 108 |
| | PtrpC frag2 F | cacctttcgtctccAAGCagacgtgaagcaggacgg | 109 |
| | PtrpC frag2 R | cgatgtcgtctcgCAGAccattgcacaagcctc | 110 |
| | PtrpC frag3 F | gacctttcgtctcgTCTGcgcatggatcgctgc | 111 |
| | PtrpC frag3 R | gacctttcgtctctGTCTcaCATTatcgatgcttgggtagaataggtaag | 112 |
| trpC_UTRterm | T trpC frag1 F | cgatgtacgtctcaCTCGGCTTgatccacttaacgttactgaaatcatcaaac | 113 |
| | T trpC frag1 R | gacctttcgtctctCTGCttgatctcgtctgccga | 114 |
| | T trpC frag2 F | cgatgtacgtctcaGCAGatcaacggtcgtcaaga | 115 |
| | T trpC frag2 R | gacctttcgtctctGTCTcaAGCGtctagaaagaaggattacctctaaacaagtgt | 116 |
| nptII_CDS | ntpII F | cgatgtacgtctcaCTCGAATGattgaacaagatggattgcacg | 117 |
| | ntpII R | gacctttcgtctctGTCTcaAAGCctcagaagaactcgtcaagaaggc | 118 |

The forward and reverse PCR primers used for amplification of TUMs (i.e. promoters (ProUTR), coding sequences (CDSs), and terminators (UTRterm)) are listed. Primers used to amplify TUM fragments for domestication purposes (i.e. removal of internal sites for AarI, BsaI or BsmBI) are underlined (e.g., P4502 frag 1 R). The template for amplification of nod CDSs was genomic DNA from Hypoxylon pulicicidum strain ATCC® 74245™.²⁵ The template for amplification of pax gene TUMs was genomic DNA from Penicillium paxilli strain ATCC® 26601™ (PN2013).²⁴ The PCR products used to produce the trpC_ProUTR module, nptII_CDS module (conferring resistance to geneticin), and trpC_UTRterm module were all amplified from plasmid pII99.⁴¹ The BsmBI recognition sites are shown in bold lower case text (cgtctc), with the overhangs generated following BsmBI cleavage shown in upper case italic text. The 5′ (prefix) and 3′ (suffix) nucleotide bases, which flank each TUM and form the basis of the address system for each of the MIDAS modules, are shown in bold upper case text and bold upper case italic text, respectively.
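Domestication starts with locating internal AarI, BsaI and BsmBI recognition sites in a module. The sketch below scans both strands of a sequence for the standard recognition sequences of these enzymes (AarI CACCTGC, BsaI GGTCTC, BsmBI CGTCTC). The function names and the example sequence are illustrative assumptions, not part of the MIDAS protocol described here.

```python
# Recognition sequences (5' to 3') of the type IIS enzymes named in the text.
SITES = {"AarI": "CACCTGC", "BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_internal_sites(seq):
    """List (enzyme, strand, 0-based start) for every recognition site in seq.

    Strand '+' means the site reads 5' to 3' on the given strand; '-' means
    it reads 5' to 3' on the complementary strand.
    """
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES.items():
        for strand, pattern in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(pattern)
            while start != -1:
                hits.append((enzyme, strand, start))
                start = seq.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[2])


# Invented example: one BsmBI site on the top strand, one BsaI site on the bottom strand.
example = "atgCGTCTCaaaGAGACCttt"
for hit in find_internal_sites(example):
    print(hit)
```

Each reported position marks where a module would be split into fragments, with primers that introduce a silent change at the site, as was done for the underlined fragment primers in Table 8.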

TABLE 9

MIDAS Level-1 plasmid library: Assembly of TUMs in pML1.

Flanking addresses: [GGAG] ProUTR modules [AATG] CDS modules [GCTT] UTRterm modules [CGCT]

| ProUTR plasmid name | ProUTR description | CDS plasmid name | CDS description | UTRterm plasmid name | UTRterm description |
|---|---|---|---|---|---|
| pSK1 | paxG_ProUTR | pKV45 | nodW_CDS | pSK3 | paxG_UTRterm |
| pKV28 | paxC_ProUTR | pKV59 | nodM_CDS | pSK12 | paxC_UTRterm |
| pSK4 | paxM_ProUTR | pSK18 | | pSK6 | paxM_UTRterm |
| pSK7 | paxB_ProUTR | pSK19 | nodB_CDS | pSK9 | paxB_UTRterm |
| pSK17 | trpC_ProUTR | pSK20 | nodC_CDS | pSK15 | trpC_UTRterm |
| | | pSK2 | paxG_CDS | | |
| | | pSK11 | paxC_CDS | | |
| | | pSK5 | paxM_CDS | | |
| | | pSK16 | nptII_CDS | | |

This table represents the MIDAS level-1 TUMs that we used to assemble MIDAS level-2 TUs (Table 10). The 4 base prefixes and suffixes (5′ to 3′) that flank each TUM are shown at the top of the table to highlight the sequences used to bind the TUMs together to make MIDAS level-2 TUs. These 4 base flanking regions are depicted in the primer table (Table 8) in bold upper case text (forward addresses) and bold upper case italics text (reverse addresses).

TABLE 10

MIDAS Level-2 plasmid library: Assembly of TUs in pML2 destination vectors

| TU | Level-1 ProUTR | Level-1 CDS | Level-1 UTRterm | pML2 destination vector | Level-2 entry clone name | Level-2 entry clone description |
|---|---|---|---|---|---|---|
| nodW | pSK17 | pKV45 | pSK15 | pML2(+)WR | pKV52 | [arrowhead] (T_trpC-nodW-P_trpC):pML2(+)WR |
| | | | | pML2(+)BR | pSK67 | [arrowhead] (T_trpC-nodW-P_trpC):pML2(+)BR |
| nodM | pSK4 | pKV59 | pSK6 | pML2(+)BF | pKV57 | (P_trpC-nodM-T_trpC) [arrowhead] :pML2(+)BF |
| | | pSK18 | | pML2(+)WF | pSK28 | (P_trpC-nodM-T_trpC) [arrowhead] :pML2(+)WF |
| nodB | pSK7 | pSK19 | pSK9 | pML2(+)BR | pSK29 | [arrowhead] (T_trpC-nodB-P_trpC):pML2(+)BR |
| nodC | pSK17 | pSK20 | pSK15 | pML2(+)BF | pKV26 | (P_trpC-nodC-T_trpC) [arrowhead] :pML2(+)BF |
| | pKV28 | | pSK12 | pML2(+)WF | pSK60 | (P_trpC-nodC-T_trpC) [arrowhead] :pML2(+)WF |
| paxG | pSK1 | pSK2 | pSK3 | pML2(+)BR | pSK21 | [arrowhead] (T_paxG-paxG-P_paxG):pML2(+)BR |
| paxC | pKV28 | pSK11 | pSK12 | pML2(+)WF | pSK59 | (P_paxC-paxC-T_paxC) [arrowhead] :pML2(+)WF |
| paxM | pSK4 | pSK5 | pSK6 | pML2(+)WR | pSK22 | [arrowhead] (T_paxM-paxM-P_paxM):pML2(+)WR |
| nptII | pSK17 | pSK16 | pSK15 | pML2(+)WF | pSK26 | (P_trpC-nptII-T_trpC) [arrowhead] :pML2(+)WF |

This table represents the construction of the MIDAS level-2 TUs that were used to assemble MIDAS level-3 multi-gene plasmids for heterologous expression studies. The names of the Level-2 entry plasmids produced are shown in bold. TUs are described by the CDS they contain, and TU orientation, determined by the pML2 destination vector, is shown by the arrowhead ([arrowhead]) in the Level-2 description: the arrowhead follows the TU for a forward (F) destination vector and precedes the TU for a reverse (R) destination vector.

TABLE 11

MIDAS Level-3 plasmid library: Multi-gene assemblies in pML3

| Step | Level-2 entry clone name | Level-2 entry clone description | Destination vector | Golden Gate reaction | Product Level-3 plasmid name | Product Level-3 plasmid description | Plasmid size (kb) |
|---|---|---|---|---|---|---|---|
| 1 | pSK26 | (P_trpC-nptII-T_trpC) [arrowhead] :pML2(+)WF | pML3 | AarI | pKV22 | pML3:nptII [arrowhead] | 5.6 |
| 2 | pKV26 | (P_trpC-nodC-T_trpC) [arrowhead] :pML2(+)BF | pKV22 | BsmBI | pKV27 | pML3:nptII [arrowhead] :nodC | 9.0 |
| 2 | pKV57 | (P_trpC-nodM-T_trpC) [arrowhead] :pML2(+)BF | pKV22 | BsmBI | pKV63 | pML3:nptII [arrowhead] :nodM | 9.4 |
| 3 | pKV52 | [arrowhead] (T_trpC-nodW-P_trpC):pML2(+)WR | pKV63 | AarI | pKV64 | pML3:nptII [arrowhead] :nodM [arrowhead] nodW | 13.0 |
| 1 | pSK26 | (P_trpC-nptII-T_trpC) [arrowhead] :pML2(+)WF | pML3 | AarI | pSK33 | pML3:nptII [arrowhead] | 5.6 |
| 2 | pSK21 | [arrowhead] (T_paxG-paxG-P_paxG):pML2(+)BR | pSK33 | BsmBI | pSK34 | pML3:nptII [arrowhead] paxG | 8.2 |
| 3 | pSK22 | [arrowhead] (T_paxM-paxM-P_paxM):pML2(+)WR | pSK34 | AarI | pSK36 | pML3:nptII [arrowhead] paxG: [arrowhead] paxM | 11.5 |
| 4 | pSK29 | [arrowhead] (T_trpC-nodB-P_trpC):pML2(+)BR | pSK36 | BsmBI | pKV73 | pML3:nptII [arrowhead] paxG: [arrowhead] paxM: nodB | 14.1 |
| 5 | pSK59 | (P_paxC-paxC-T_paxC) [arrowhead] :pML2(+)WF | pSK73 | AarI | pKV74 | pML3:nptII [arrowhead] paxG: [arrowhead] paxM: nodB:paxC | 16.3 |
| 3 | pSK28 | (P_trpC-nodM-T_trpC) [arrowhead] :pML2(+)WF | pSK34 | AarI | pSK35 | pML3:nptII [arrowhead] paxG: nodM [arrowhead] | 11.5 |
| 4 | pSK29 | [arrowhead] (T_trpC-nodB-P_trpC):pML2(+)BR | pSK35 | BsmBI | pSK38 | pML3:nptII [arrowhead] paxG: nodM [arrowhead] nodB | 14.1 |
| 5 | pSK60 | (P_trpC-nodC-T_trpC) [arrowhead] :pML2(+)WF | pSK38 | AarI | pSK66 | pML3:nptII [arrowhead] paxG: nodM [arrowhead] nodB:nodC | 16.3 |
| 6 | pSK67 | [arrowhead] (T_trpC-nodW-P_trpC):pML2(+)BR | pSK66 | BsmBI | pSK68 | pML3:nptII [arrowhead] paxG: nodM [arrowhead] nodB: nodC [arrowhead] nodW | 20.5 |

The table shows the Level-2 entry clone and Level-3 destination vectors used to construct the multi-gene plasmids. The names of the plasmids produced during each cycle of Level-3 assembly are shown in bold. The number of level 3 assembly reactions used to create the level-3 plasmid is indicated by number in the step column. TUs are annotated with the name of the CDS they contain. TU orientation is shown by the arrowhead.

TABLE 12

Generalised primer design for amplification of ProUTR, CDS and UTRterm modules to be cloned into pML1.

| TUM | Primer | Primer sequence (5′ to 3′) | Notes |
|---|---|---|---|
| [GGAG]-ProUTR-[AATG] | Forward | 5′-cgatgtacgtctcaCTCGGGAG (SEQ ID NO: 119) (+18-25 bases specific for the 5′ end of the promoter)-3′ | |
| | Reverse | 5′-gacctttcgtctctGTCTcaCATT (SEQ ID NO: 120) (+18-25 bases specific for the 3′ end of the 5′UTR)-3′ | The CAT sequence (reverse-complement = ATG) underlined within the CATT module-specific nucleotides specifies the translation initiation codon for the CDS of interest, while the final T (not underlined) represents the base immediately upstream of the initiation codon. |
| [AATG]-CDS-[GCTT] | Forward | 5′-cgatgtacgtctcaCTCGAATG (SEQ ID NO: 121) (+18-25 bases specific for the 5′ end of the CDS, beginning at the 2nd codon)-3′ | The ATG sequence (underlined within the AATG module-specific nucleotides) specifies the translation initiation codon for the CDS of interest, while the initial A (not underlined) represents the base immediately upstream of the initiation codon. |
| | Reverse | 5′-gacctttcgtctctGTCTcaAAGC* (SEQ ID NO: 122) (+18-25 bases specific for the 3′ end of the CDS)-3′ | Remember to include a stop codon (*) at the end of the CDS. |
| [GCTT]-UTRterm-[CGCT] | Forward | 5′-cgatgtacgtctcaCTCGGCTT (SEQ ID NO: 123) (+18-25 bases specific for the 5′ end of the 3′UTR)-3′ | |
| | Reverse | 5′-gacctttcgtctctGTCTcaAGCG (SEQ ID NO: 124) (+18-25 bases specific for the 3′ end of the terminator)-3′ | |

Generalised features of forward and reverse PCR primers used for amplification of TUMs are listed. The BsmBI recognition sites are shown in lower case bold (cgtctc), with the overhangs generated following BsmBI cleavage shown by upper case italics (e.g., CTCG). The 5′ and 3′ nucleotide-specific bases, which flank each TUM and form the basis of the address system for each of the MIDAS modules, are shown in upper case bold (e.g., GGAG) and upper case bold italics (e.g., CATT), respectively.
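The generalised primer layout in Table 12 can be expressed as a small helper that prepends the fixed BsmBI-containing tails and the module addresses to template-specific bases. This is a sketch under assumptions: the function name, the default annealing length of 20 bases (within the 18-25 range given in the table), and the toy CDS template are all illustrative, and no melting-temperature or domestication checks are made.

```python
FWD_TAIL = "cgatgtacgtctcaCTCG"     # forward tail from Table 12 (BsmBI site + overhang)
REV_TAIL = "gacctttcgtctctGTCTca"   # reverse tail from Table 12

# 5' (prefix) and 3' (suffix) addresses of each module type, from Table 12.
ADDRESSES = {
    "ProUTR": ("GGAG", "AATG"),
    "CDS": ("AATG", "GCTT"),
    "UTRterm": ("GCTT", "CGCT"),
}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def design_primers(module, template, anneal_len=20):
    """Return (forward, reverse) primers for one module.

    template is the top-strand sequence of the module, 5' to 3'. For a CDS it
    must start with ATG and end with the stop codon; the ATG is supplied by
    the AATG address, so annealing starts at the second codon.
    """
    if not 18 <= anneal_len <= 25:
        raise ValueError("Table 12 specifies 18-25 template-specific bases")
    prefix, suffix = ADDRESSES[module]
    template = template.lower()
    start = 0
    if module == "CDS":
        if not template.startswith("atg"):
            raise ValueError("CDS template must begin with ATG")
        start = 3
    if len(template) - start < anneal_len:
        raise ValueError("template is shorter than the annealing length")
    forward = FWD_TAIL + prefix + template[start:start + anneal_len]
    reverse = REV_TAIL + reverse_complement(suffix) + reverse_complement(template[-anneal_len:])
    return forward, reverse


# Toy CDS (invented for illustration) starting with ATG and ending with a TGA stop codon.
toy_cds = "atggacggttttgatgtttcccaagcataagcgggccgaaaaaagcaaattga"
fwd, rev = design_primers("CDS", toy_cds, anneal_len=18)
print(fwd)
print(rev)
```

The reverse primer carries the reverse complement of the 3′ address (AAGC for GCTT, CATT for AATG, AGCG for CGCT), which matches the reverse primer tails listed in Table 12.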

TABLE 13

Level-3 multigene assemblies are constructed by alternating Golden Gate cloning reactions using TUs assembled in “White” and “Blue” pML2 vectors.

| Step | Level-2 entry clone | Destination plasmid | Golden Gate reaction | Product plasmid | Screen |
|---|---|---|---|---|---|
| 1 | TU1 in a White pML2 vector | pML3 | AarI-mediated | pML3:TU1 | White colonies |
| 2 | TU2 in a Blue pML2 vector | pML3:TU1 | BsmBI-mediated | pML3:TU1:TU2 | Blue colonies |
| 3 | TU3 in a White pML2 vector | pML3:TU1:TU2 | AarI-mediated | pML3:TU1:TU2:TU3 | White colonies |
| 4 | TU4 in a Blue pML2 vector | pML3:TU1:TU2:TU3 | BsmBI-mediated | pML3:TU1:TU2:TU3:TU4 | Blue colonies |

The table shows the cloning steps used to produce a hypothetical multigene construct containing four TUs, with each row depicting the input plasmids (Level-2 entry clone and destination plasmid), the type of Golden Gate reaction used for assembly, the product plasmid and the type of colonies screened.
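The alternating pattern in Table 13 can be generated for any number of TUs. The sketch below is illustrative only: the function name and the dictionary keys are assumptions, and it simply reproduces the rule that odd-numbered steps use a TU in a White pML2 vector with an AarI-mediated reaction and white-colony screening, while even-numbered steps use a TU in a Blue pML2 vector with a BsmBI-mediated reaction and blue-colony screening.

```python
def level3_plan(tu_names):
    """Return one dict per Level-3 assembly step, following the pattern of Table 13."""
    steps = []
    destination = "pML3"
    for step, tu in enumerate(tu_names, start=1):
        odd = step % 2 == 1
        colour = "White" if odd else "Blue"
        product = destination + ":" + tu
        steps.append({
            "step": step,
            "entry_clone": f"{tu} in a {colour} pML2 vector",
            "destination": destination,
            "reaction": "AarI-mediated" if odd else "BsmBI-mediated",
            "product": product,
            "screen": f"{colour} colonies",
        })
        # The product of this step is the destination plasmid of the next one.
        destination = product
    return steps


for row in level3_plan(["TU1", "TU2", "TU3", "TU4"]):
    print(row)
```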

TABLE 14

Multistep acetonitrile gradient used for LC-MS analysis of fungal extracts.

| Time (minutes) | % (v/v) of acetonitrile + 0.01% (v/v) formic acid |
|---|---|
| 0 | 50 |
| 1 | 50 |
| 15 | 70 |
| 20 | 95 |
| 25 | 95 |
| 28 | 50 |
| 38 | 50 |
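The gradient in Table 14 can be evaluated at any time point with a short helper. This sketch assumes linear ramps between the tabulated set points, which the specification does not state explicitly; the function name is illustrative. The flow rate of 0.200 mL/minute is taken from the LC-MS method description.

```python
# (time in minutes, % v/v acetonitrile) set points from Table 14.
GRADIENT = [(0, 50), (1, 50), (15, 70), (20, 95), (25, 95), (28, 50), (38, 50)]

FLOW_RATE_ML_PER_MIN = 0.200  # from the LC-MS method description


def percent_acetonitrile(t):
    """Percent acetonitrile at time t (minutes), assuming linear ramps between set points."""
    if t < GRADIENT[0][0] or t > GRADIENT[-1][0]:
        raise ValueError("time is outside the gradient programme")
    for (t0, p0), (t1, p1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


run_time = GRADIENT[-1][0]
total_eluent_ml = run_time * FLOW_RATE_ML_PER_MIN  # eluent used per injection

for t in (0, 8, 17.5, 26.5, 38):
    print(t, percent_acetonitrile(t))
print(total_eluent_ml)
```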

REFERENCES

(1) Nakazawa, J.; Yajima, J.; Usui, T.; Ueki, M.; Takatsuki, A.; Imoto, M.; Toyoshima, Y. Y.; Osada, H. Chem. Biol. 2003, 10 (2), 131-137.

(2) Sallam, A. A.; Ayoub, N. M.; Foudah, A. I.; Gissendanner, C. R.; Meyer, S. A.; El Sayed, K. A. Eur. J. Med. Chem. 2013, 70, 594-606.

(3) Byrd, A. D.; Schardl, C. L.; Songlin, P. J.; Mogen, K. L.; Siegel, M. R. Curr. Genet. 1990, 18 (4), 347-354.

(4) Yelton, M. M.; Hamer, J. E.; Timberlake, W. E. Proc. Natl. Acad. Sci. 1984, 81 (5), 1470-1474.

(5) Vollmer, S. J.; Yanofsky, C. Proc. Natl. Acad. Sci. 1986, 83 (13), 4869-4873.

(6) Oliver, R. P.; Roberts, I. N.; Harling, R.; Kenyon, L.; Punt, P. J.; Dingemanse, M. A.; van den Hondel, C. A. M. J. J. Curr. Genet. 1987, 12 (3), 231-233.

	Number	Date	Country
Parent	16651065		US
Child	17816630		US

Non-Patent Literature Citations (9)

International Search Report and Written Opinion corresponding to International Patent Application No. PCT/IB2018/057528 mailed Dec. 13, 2018.
Nicholson, M. J. et al. “Molecular Cloning and Functional Analysis of Gene Clusters for the Biosynthesis of Indole-Diterpenes in Penicillium crustosum and P. janthinellum”, Toxins, 2015, vol. 7, pp. 2701-2722, DOI: 10.3390/toxins7082701 Abstract, Results, Discussion.
Bills, G. F. et al. “Hypoxylon pulicicidum sp. nov. (Ascomycota, Xylariales), a Pantropical Insecticide-Producing Endophyte” PLoS One. 2012; 7(10): e46687, DOI: 10.1371/journal.pone.0046687 Abstract, Figure 1, Table 1, p. 16, “Morphology and Culture Studies” and “Fermentation for Detection of Nodulisporic Acids”.
Protein Sequence, Hypoxylon pulicicidum strain MF5954 hypothetical protein, cytochrome P450 oxygenase (nodW). GenBank MG182145.1 (Accession No. MG182145), Published: Jan. 10, 2018 Whole Document.
Nucleotide Sequence, cytochrome P450 oxygenase [Hypoxylon pulicicidum], GenBank: AUM60065.1 (Accession No. AUM60065), Published: Jan. 10, 2018 Whole Document.
Nucleotide Sequence, cytochrome P450 oxygenase [Hypoxylon pulicicidum], GenBank: AUM60066.1 (Accession No. AUM60066), Published: Jan. 10, 2018 Whole Document.
Nucleotide Sequence, cytochrome P450 oxygenase [Hypoxylon pulicicidum], GenBank: AUM60053.1, Published: Jan. 10, 2018 Whole Document.
Van de Bittner, K. C., et al., “Heterologous Biosynthesis of Nodulisporic Acid F”, J. Am. Chem. Soc., Dec. 28, 2017, 140 (2), pp. 582-585, DOI: 10.1021/jacs.7b10909 Whole Document.
Nicholson, M. et al., “Draft Genome Sequence of the Filamentous Fungus Hypoxylon pulicicidum ATCC 74245”, Genome Announcements, Jan. 11, 2018, vol. 6, No. 2, e01380-17, DOI: 10.1128/genomeA.01380-17.