The present disclosure relates generally to targeted gene editing constructs, including methods of designing a DNA-recognition moiety for modulation of gene expression in plants, DNA-recognition moieties, gene editing constructs, methods for the modulation of gene expression in plants using gene editing constructs, and plants or regenerable plant cells produced therefrom.
This application claims priority from Australian Provisional Patent Application No. 2019904146 filed on 4 Nov. 2019, the entire content of which is hereby incorporated by reference.
This application contains a Sequence Listing, which has been submitted electronically and is hereby incorporated by reference in its entirety. Nucleotide bases are defined in accordance with the International Union of Pure and Applied Chemistry (IUPAC) nucleic acid notation, which is consistent with the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25.
Cannabis sativa is an herbaceous flowering plant of the Cannabis genus (Rosales) that has been used for its fibre and medicinal properties for thousands of years. The medicinal qualities of cannabis have been recognised since at least 2800 BC, with use of cannabis featuring in ancient Chinese and Indian medical texts. Although use of cannabis for medicinal purposes has been known for centuries, research into the pharmacological properties of the plant has been limited due to its illegal status in most jurisdictions.
The chemistry of cannabis is varied. It is estimated that cannabis plants produce more than 400 different molecules, including phytocannabinoids, terpenes and phenolics. Δ-9-tetrahydrocannabinol (THC) and cannabidiol (CBD) are the most well-known and researched cannabinoids. THC and CBD are naturally present in planta in their acidic forms, Δ-9-tetrahydrocannabinolic acid (THCA) and cannabidiolic acid (CBDA), respectively, which are alternative products of a shared precursor, cannabigerolic acid (CBGA). Since different cannabinoids are likely to have different therapeutic potential, it is important to be able to identify and extract different cannabinoids that are suitable for medicinal use.
Despite advances in plant breeding technologies and the increasing commercial importance of cannabis plant varieties, there remains a need for improved methods of producing cannabis plants with one or more desirable phenotypic and/or chemotypic traits, including for large-scale production and breeding programs.
In an aspect disclosed herein, there is provided a method of producing a nucleic acid sequence encoding a DNA-recognition moiety for a targeted gene editing construct, the method comprising:
In another aspect disclosed herein, there is provided a nucleic acid sequence encoding a DNA-recognition moiety produced by the methods disclosed herein.
In another aspect disclosed herein, there is provided a gene editing construct comprising the nucleic acid sequence encoding the DNA-recognition moiety disclosed herein.
In another aspect disclosed herein, there is provided a method of modulating gene expression in a plant cell, the method comprising:
In another aspect disclosed herein, there is provided a transformed plant cell comprising the gene editing construct disclosed herein.
In another aspect disclosed herein, there is provided a method for producing a regenerable plant cell with modified gene expression, the method comprising:
In another aspect disclosed herein, there is provided a plant comprising the transformed plant cell described herein.
In another aspect disclosed herein, there is provided a regenerable plant cell produced according to the methods disclosed herein.
Throughout this specification, unless the context requires otherwise, the word “comprise”, or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated element or integer or group of elements or integers but not the exclusion of any other element or integer or group of elements or integers.
The reference in this specification to any prior publication (or information derived from it), or to any matter which is known, is not, and should not be taken as an acknowledgement or admission or any form of suggestion that that prior publication (or information derived from it) or known matter forms part of the common general knowledge in the field of endeavour to which this specification relates.
Unless specifically defined otherwise, all technical and scientific terms used herein shall be taken to have the same meaning as commonly understood by one of ordinary skill in the art.
Unless otherwise indicated, the molecular biology, cell culture, laboratory, plant breeding and selection techniques utilised in the present invention are standard procedures, well known to those skilled in the art. Such techniques are described and explained throughout the literature in sources such as J. Perbal, A Practical Guide to Molecular Cloning, John Wiley and Sons (1984); J. Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989); T. A. Brown (editor), Essential Molecular Biology: A Practical Approach, Volumes 1 and 2, IRL Press (1991); D. M. Glover and B. D. Hames (editors), DNA Cloning: A Practical Approach, Volumes 1-4, IRL Press (1995 and 1996); F. M. Ausubel et al. (editors), Current Protocols in Molecular Biology, Greene Pub. Associates and Wiley-Interscience (1988, including all updates until present); Janick, J. (2001) Plant Breeding Reviews, John Wiley & Sons, 252 p.; Jensen, N. F. ed. (1988) Plant Breeding Methodology, John Wiley & Sons, 676 p.; Richard, A. J. ed. (1990) Plant Breeding Systems, Unwin Hyman, 529 p.; Walter, F. R. ed. (1987) Plant Breeding, Vol. I, Theory and Techniques, MacMillan Pub. Co.; Slavko, B. ed. (1990) Principles and Methods of Plant Breeding, Elsevier, 386 p.; Allard, R. W. ed. (1999) Principles of Plant Breeding, John Wiley & Sons, 240 p.; and The ICAC Recorder, Vol. XV no. 2: 3-14; all of which are incorporated by reference. The procedures described are believed to be well known in the art and are provided for the convenience of the reader. All other publications mentioned in this specification are also incorporated by reference in their entirety.
As used in the subject specification, the singular forms “a”, “an” and “the” include plural aspects unless the context clearly dictates otherwise. Thus, for example, reference to “a plant” includes a single plant, as well as two or more plants; reference to “an endonuclease” includes a single endonuclease, as well as two or more endonucleases, and so forth.
As used herein, “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (or).
The present disclosure is predicated, at least in part, on the unexpected finding that multiple reference genomes may be used to construct a pan-genome useful in the production of high-efficiency gene editing constructs, while minimising the possibility of off-target effects. Such gene editing constructs may be used in advantageous plant production methods, including the screening of gene editing constructs for the efficient and effective modulation of plant gene expression in transient in vitro plant cell models, and the stable transformation of plant cells capable of regenerating a whole plant with modified gene expression as a result of the expression of the gene editing constructs.
Accordingly, in an aspect disclosed herein, there is provided a method of producing a nucleic acid sequence encoding a DNA-recognition moiety for a targeted gene editing construct, the method comprising:
The term “DNA-recognition moiety” as used herein refers to a molecule that is capable of hybridising to a target DNA sequence, or its complement, for use in gene editing.
Preferred DNA-recognition moieties will hybridise under stringent conditions to a target DNA sequence, or its complement. The term “hybridise under stringent conditions”, and grammatical equivalents thereof, refers to the ability of a nucleic acid molecule to hybridise to a target nucleic acid molecule under defined conditions of temperature and salt concentration. With respect to nucleic acid molecules greater than about 100 bases in length, typical stringent hybridisation conditions are no more than 25° C. to 30° C. (for example, 10° C.) below the melting temperature (Tm) of the native duplex (see generally, Sambrook et al., supra). Tm for nucleic acid molecules greater than about 100 bases can be calculated by the formula Tm = 81.5 + 0.41(% G+C) - log(Na+). With respect to nucleic acid molecules having a length less than 100 bases, exemplary stringent hybridisation conditions are 5° C. to 10° C. below Tm.
Persons skilled in the art would understand that the DNA-recognition moiety may be DNA, RNA or a polypeptide.
Illustrative examples of suitable DNA molecules include antisense, as well as sense (e.g., coding and/or regulatory) DNA molecules. Antisense DNA molecules include short oligonucleotides. Other examples of inhibitory DNA molecules include those encoding interfering RNAs, such as shRNA and siRNA. Yet another illustrative example of an inhibitor of gene expression is catalytic DNA, also referred to as DNAzymes.
Illustrative examples of suitable RNA molecules include siRNA, dsRNA, stRNA, shRNA and miRNA (e.g. short temporal RNAs and small modulatory RNAs), ribozymes, and guide (i.e., gRNA or single-guide RNA (sgRNA)) or clustered regularly interspaced short palindromic repeats (CRISPR) RNAs used in combination with Cas or other endonucleases (van der Oost et al. 2014, Nature Reviews Microbiology, 12(7): 479-92).
In an embodiment, the DNA-recognition moiety is a CRISPR RNA. Suitable CRISPR RNA will be known to persons skilled in the art, illustrative examples of which include guide RNA (gRNA) and single-guide RNA (sgRNA).
In an embodiment, the DNA-recognition moiety is a polypeptide. Illustrative examples of suitable polypeptide molecules are zinc finger nucleases or “ZFN”, and transcription activator-like (TAL) targeting domains, as described elsewhere herein.
The terms “guide RNA” or “gRNA” refer to an RNA sequence that is complementary to a target DNA and directs a CRISPR endonuclease to the target DNA. A gRNA comprises a CRISPR RNA (crRNA) and a tracr RNA (tracrRNA). The crRNA is a 17-20 nucleotide sequence that is complementary to the target DNA, while the tracrRNA provides a binding scaffold for the endonuclease. crRNA and tracrRNA exist in nature as two separate RNA molecules, an arrangement that has been adapted for molecular biology techniques using, for example, 2-piece gRNAs such as CRISPR tracer RNAs (cr:tracrRNAs).
The terms “single-guide RNA” or “sgRNA” refer to a single RNA sequence that comprises the crRNA fused to the tracrRNA.
Accordingly, the skilled person would understand that the term “gRNA” describes all CRISPR guide formats, including two separate RNA molecules or a single RNA molecule. By contrast, the term “sgRNA” will be understood to refer to single RNA molecules combining the crRNA and tracrRNA elements into a single nucleotide sequence.
In a preferred embodiment, the DNA-recognition moiety is a single-guide RNA (sgRNA).
Methods to optimise the design and efficiency of sgRNAs will be known to persons skilled in the art, illustrative examples of which include the paired nicking strategy described by Cho et al. (2014, Genome Research, 24: 132-41) and Ran et al. (2013, Cell, 154: 1380-9), dimeric-Cas9 based systems as described by Wyvekens et al. (2015, Human Gene Therapy, 26: 425-31), truncation of the 3′ end of the sgRNA scaffold as described by Hsu et al. (2013, Nature Biotechnology, 31: 827-32), or addition of two guanine nucleotides to the 5′ end of the sgRNA as described by Cho et al. (2014, supra). The length of the sgRNA has also been demonstrated to result in different effects of CRISPR-mediated modification of gene expression (Zhang et al., 2016, Scientific Reports, 6: 28566).
In an embodiment, the sgRNA is complementary to a target DNA sequence of between 10 and 30 nucleotides in length.
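By way of illustration only, the relationship between a target DNA sequence and an RNA complementary to it may be sketched as follows. The function name, the example target sequence and the decision to reject targets outside the 10 to 30 nucleotide range are assumptions made for this sketch; the sketch derives only the complementary portion and does not model the tracrRNA scaffold.

```python
# Illustrative sketch only: the names and the example target are hypothetical.
COMPLEMENT_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}


def complementary_rna(target_dna: str, min_len: int = 10, max_len: int = 30) -> str:
    """Return the RNA complementary to a target DNA sequence, written 5'->3'."""
    target = target_dna.upper()
    if not min_len <= len(target) <= max_len:
        raise ValueError(
            f"target must be {min_len}-{max_len} nucleotides, got {len(target)}"
        )
    if set(target) - set(COMPLEMENT_RNA):
        raise ValueError("target must contain only A, C, G and T")
    # Complement each base, walking the target backwards so that the
    # returned RNA reads 5'->3'.
    return "".join(COMPLEMENT_RNA[base] for base in reversed(target))


print(complementary_rna("ATGCATGCATGC"))  # GCAUGCAUGCAU
```

The length bounds are exposed as parameters so that a narrower window, such as the 17-20 nucleotides mentioned for the crRNA, could be applied instead.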
In an embodiment, the sgRNA consists of a sequence provided in Table 5, or complementary sequences thereof.
In an aspect disclosed herein, there is provided a DNA-recognition moiety produced according to the methods disclosed herein.
The term “targeted gene editing construct” as used herein refers to a recombinant nucleic acid molecule formed in vitro by the manipulation of nucleic acid into a form not normally found in nature.
In an embodiment, the targeted gene editing construct is an expression vector.
The term “vector” as used herein refers to a nucleic acid molecule, preferably a DNA molecule derived from a plasmid or plant virus, into which a nucleic acid sequence may be inserted. The vector may also include a selection marker such as an antibiotic resistance gene that can be used for selection of suitable bacterial or plant transformants, or sequences that enhance transformation of prokaryotic or eukaryotic (especially cannabis) cells such as T-DNA or P-DNA sequences. Examples of such resistance genes and sequences are well known to those of skill in the art.
In an embodiment, the targeted gene editing construct is a plasmid. In another embodiment, the plasmid is a Ti plasmid.
As used herein, the terms “encode,” “encoding” and the like refer to the capacity of a nucleic acid to provide for another nucleic acid or a polypeptide. For example, a nucleic acid sequence is said to “encode” a polypeptide if it can be transcribed and/or translated to produce the polypeptide or if it can be processed into a form that can be transcribed and/or translated to produce the polypeptide. Such a nucleic acid sequence may include a coding sequence or both a coding sequence and a non-coding sequence. Thus, the terms “encode,” “encoding” and the like include an RNA product resulting from transcription of a DNA molecule, a protein resulting from translation of an RNA molecule, a protein resulting from transcription of a DNA molecule to form an RNA product and the subsequent translation of the RNA product, or a protein resulting from transcription of a DNA molecule to provide an RNA product, processing of the RNA product to provide a processed RNA product (e.g., mRNA) and the subsequent translation of the processed RNA product.
The term “endogenous” refers to a gene or nucleic acid sequence or segment that is normally found in a host organism.
The terms “expressible,” “expressed,” and variations thereof refer to the ability of a cell to transcribe a nucleotide sequence to RNA and optionally translate the mRNA to synthesise a peptide or polypeptide that provides a biological or biochemical function.
As used herein, the term “gene” includes a nucleic acid molecule capable of being used to produce mRNA optionally with the addition of elements to assist in this process. Genes may or may not be capable of being used to produce a functional protein. Genes can include both coding and non-coding regions (e.g., introns, regulatory elements, promoters, enhancers, termination sequences and 5′ and 3′ untranslated regions).
The terms “heterologous nucleic acid sequence,” “heterologous nucleotide sequence,” “heterologous polynucleotide,” “foreign polynucleotide,” “exogenous polynucleotide” and the like are used interchangeably to refer to any nucleic acid (e.g., a nucleotide sequence encoding at least one targeting RNA), which is introduced into the genome of an organism by experimental manipulations.
The terms “heterologous polypeptide,” “foreign polypeptide” and “exogenous polypeptide” are used interchangeably to refer to any peptide or polypeptide, which is encoded by a “heterologous nucleic acid sequence,” “heterologous nucleotide sequence,” “heterologous polynucleotide,” “foreign polynucleotide” and “exogenous polynucleotide,” as defined above.
The term “operably connected” or “operably linked” as used herein refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For example, a regulatory element or regulatory sequence “operably linked” to a coding sequence refers to positioning and/or orientation of the regulatory sequence relative to the coding sequence to permit expression of the coding sequence under conditions compatible with the regulatory sequence.
By “regulatory element” or “regulatory sequence” it is meant a nucleic acid sequence (e.g., DNA) necessary for expression of an operably linked coding sequence in a particular host cell. The regulatory sequences that are suitable for eukaryotic cells include promoters, polyadenylation signals, transcriptional enhancers, translational enhancers, leader or trailing sequences that modulate mRNA stability, as well as targeting sequences that target a product encoded by a transcribed polynucleotide to an intracellular compartment within a cell or to the extracellular environment.
In an embodiment, the regulatory element is a promoter. In another embodiment, the promoter is a 35S promoter.
The terms “polynucleotide,” “polynucleotide sequence,” “nucleotide sequence,” “nucleic acid” or “nucleic acid sequence” as used herein designate mRNA, RNA, cRNA, cDNA or DNA. The terms typically refer to a polymeric form of nucleotides of at least 10 bases in length, either ribonucleotides or deoxynucleotides or a modified form of either type of nucleotide. The terms include single and double stranded forms of RNA or DNA.
“Polypeptide,” “peptide,” “protein” and “proteinaceous molecule” are used interchangeably herein to refer to molecules comprising or consisting of a polymer of amino acid residues and to variants and synthetic analogues of the same. Thus, these terms apply to amino acid polymers in which one or more amino acid residues are synthetic non-naturally occurring amino acids, such as a chemical analogue of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers.
As used herein the term “recombinant” as applied to “nucleic acid molecules,” “polynucleotides” and the like is understood to mean artificial nucleic acid structures (i.e., non-replicating cDNA or RNA; or replicons, self-replicating cDNA or RNA) which can be transcribed and/or translated in host cells or cell-free systems described herein. Recombinant nucleic acid molecules or polynucleotides may be inserted into a vector. Non-viral vectors such as plasmid expression vectors or viral vectors may be used. The kind of vectors and the technique of insertion of the nucleic acid construct would be known to persons skilled in the art. A nucleic acid molecule or polynucleotide according to this disclosure does not occur in nature in the arrangement described by the present invention. In other words, a heterologous nucleotide sequence is not naturally combined with elements of a parent virus genome (e.g., promoter, ORF, polyadenylation signal, DNA-recognition moiety, endonuclease).
In an embodiment, the targeted gene editing construct further comprises a nucleic acid encoding an endonuclease.
Suitable endonucleases will be known to persons skilled in the art, illustrative examples of which include RNA-guided DNA endonucleases, zinc finger nucleases (ZFN), transcription activator-like effector nucleases (TALEN) and CRISPR-associated (Cas) nucleases.
In an embodiment, the endonuclease is selected from the group consisting of an RNA-guided DNA endonuclease, a ZFN, and a TALEN.
“Transcription activator-like effector nucleases” or “TALEN” are restriction enzymes that can be engineered to cut specific sequences of DNA. They are made by fusing a TAL effector DNA-binding domain to a DNA cleavage domain (a nuclease that cuts DNA strands). Transcription activator-like effectors (TALEs) can be engineered to bind practically any desired DNA sequence, so when combined with a nuclease, DNA can be cut at specific locations. The restriction enzymes can be introduced into cells, for use in gene editing or for genome editing in situ, a technique known as genome editing with engineered nucleases. The mechanism of TALEN-mediated cleavage of target DNA sequences would be known to persons skilled in the art and has been described, for example, by Boch (2011, Nature Biotechnology, 29: 135-136), Joung et al. (2013, Nature Reviews Molecular Cell Biology, 14: 49-55) and Sun et al. (2013, Biotechnology and Bioengineering, 110: 1811-1821).
“Zinc finger nucleases” or “ZFN” are proteins comprising nucleic acid binding domains that are stabilised by zinc. The individual DNA binding domains are typically referred to as “fingers”, such that a ZFN has at least one finger, preferably two fingers, preferably three fingers, preferably four fingers, preferably five fingers, or more preferably six fingers. Each finger binds from two to four base pairs of a target DNA sequence, and typically comprises an about 30 amino acid zinc-chelating, DNA binding region. ZFN facilitate site-specific cleavage within a target DNA sequence, allowing endogenous or other end-joining repair mechanisms to introduce insertions or deletions to repair the gap. The mechanism of ZFN-mediated cleavage of target DNA sequences would be known to persons skilled in the art and has been described, for example, by Liu et al. (2010, Biotechnology and Bioengineering, 106: 97-105).
In an embodiment, the RNA-guided DNA endonuclease is a CRISPR-associated (Cas) endonuclease.
The CRISPR-Cas system evolved in bacteria and archaea as an adaptive immune system to defend against viral attack. Upon exposure to a virus, short segments of viral DNA are integrated in the clustered regularly interspaced short palindromic repeats (i.e., CRISPR) locus. RNA is transcribed from a portion of the CRISPR locus that includes the viral sequence. That RNA, which contains sequence complementarity to the viral genome, mediates targeting of a Cas endonuclease to the sequence in the viral genome. The Cas endonuclease cleaves the viral target sequence to prevent integration or expression of the viral sequence.
The mechanisms of CRISPR-mediated gene editing would be known to persons skilled in the art and have been described, for example, by Doudna et al., (2014, Methods in Enzymology, 546) and Belhaj et al., (2013, Plant Methods, 9:39) and in WO 2013/188638 and WO 2014/093622.
Suitable Cas endonucleases will be known to persons skilled in the art, illustrative examples of which include Cas9, Cas12a (also referred to as Cpf1), Cas12b (also referred to as C2c1), Cas13a (also referred to as C2c2), Cas13b, CasX, Cas3 and Cas10. The term “Cas endonucleases” as used herein also contemplates the use of natural and engineered Cas endonucleases, described, for example, by Wu et al. (2018, Nature Chemical Biology, 14: 642-651).
In a preferred embodiment, the Cas endonuclease is Cas9.
In an aspect, the present disclosure provides a gene editing construct comprising a nucleic acid sequence encoding the DNA-recognition moiety disclosed herein.
In an embodiment, the gene editing construct further comprises a nucleic acid encoding an endonuclease.
The term “genome” as used herein refers to the total inherited genetic complement of the cell, plant or plant part, and includes chromosomal DNA, plastid DNA, mitochondrial DNA and extrachromosomal DNA molecules.
In an embodiment, the genome is a de novo assembled genome sequence. In another embodiment, the genome is a published assembled genome sequence.
A skilled person would understand that genomes for use in accordance with the methods disclosed herein may be derived from both male and female plants of a reference species.
The terms “consensus sequence” or “canonical sequence” may be used interchangeably herein to refer to a nucleic acid sequence that represents the most frequent residues of a nucleic acid sequence found at each position in a sequence alignment. Accordingly, the skilled person would understand that the consensus sequence described herein represents the result of the comparison between the nucleic acid sequence of a genome from a plant of a reference species, with the corresponding nucleic acid sequence of a genome from one or more additional plants of the reference species. Methods for comparison of nucleic acid sequences would be known to persons skilled in the art, illustrative examples of which include multiple sequence alignment.
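By way of illustration only, the derivation of a consensus sequence from sequences that have already been aligned may be sketched as follows. The function name and the example sequences are hypothetical, the input is assumed to be pre-aligned to a common length, and ties are resolved in favour of the residue seen first, which is an arbitrary choice made for this sketch.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Return the most frequent residue at each column of an alignment."""
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) yields one tuple per alignment column; Counter picks the
    # most frequent residue (the first one seen wins when counts are tied).
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


# Hypothetical aligned fragments from three plants of a reference species.
print(consensus(["ACGT", "ACGA", "TCGT"]))  # ACGT
```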
Multiple sequence alignment may be conducted by computerised implementations of algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Drive Madison, Wis., USA) or by inspection and the best alignment (i.e., resulting in the highest percentage homology over the comparison window) generated by any of the various methods selected. Reference also may be made to the BLAST family of programs as described by, for example, Altschul et al. (1997, Nucleic Acids Research, 25:3389). A detailed discussion of sequence analysis can be found in Unit 19.3 of Ausubel et al., supra.
In an embodiment, sequences of similar length with an alignment similarity of between 80 and 100% are incorporated into the consensus sequence. The term “between 80 and 100%” as used herein means preferably about 80%, preferably about 81%, preferably about 82%, preferably about 83%, preferably about 84%, preferably about 85%, preferably about 86%, preferably about 87%, preferably about 88%, preferably about 89%, preferably about 90%, preferably about 91%, preferably about 92%, preferably about 93%, preferably about 94%, preferably about 95%, preferably about 96%, preferably about 97%, preferably about 98%, preferably about 99%, or more preferably about 100% alignment similarity.
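By way of illustration only, the selection of sequences falling within the 80 to 100% range may be sketched as follows. The sketch uses percent identity over aligned positions as a simple stand-in for alignment similarity, which is an assumption; the function names, the treatment of gap characters as mismatches and the example sequences are likewise hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which two sequences carry the same base."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    # A gap character ("-") never counts as a match.
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def select_for_consensus(reference: str, candidates: list[str],
                         threshold: float = 80.0) -> list[str]:
    """Keep candidates whose identity to the reference is at least `threshold`."""
    return [seq for seq in candidates if percent_identity(reference, seq) >= threshold]


reference = "ACGTACGTAC"
candidates = ["ACGTACGTAA", "ACGTACGTTT", "TTTTACGTAC"]  # 90%, 80% and 70% identity
print(select_for_consensus(reference, candidates))
```

With the default threshold the first two candidates are retained and the third is excluded.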
The term “pan-genome” as used herein refers to the entire gene set of all strains in a species. It includes genes present in all strains (i.e., the core genome) and genes present only in some strains of a species (i.e., variable or accessory genome). The core genome represents the genes present in all strains of a species. It typically includes housekeeping genes for cell envelope or regulatory functions. The variable or accessory genome refers to genes not present in all strains of a species. These include genes present in two or more strains or even genes unique to a single strain only, for example, genes for a strain-specific adaptation, such as increased expression of a particular cannabinoid (e.g., THC and/or CBD).
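By way of illustration only, the partition of a pan-genome into core and accessory components may be sketched with set operations as follows. The strain and gene identifiers are placeholders, and the representation of each strain as a set of gene identifiers is an assumption made for this sketch.

```python
from collections import Counter


def partition_pan_genome(gene_sets: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split the genes of several strains into pan, core, accessory and unique sets."""
    if not gene_sets:
        raise ValueError("at least one strain is required")
    strains = list(gene_sets.values())
    pan = set().union(*strains)          # every gene seen in any strain
    core = set.intersection(*strains)    # genes present in all strains
    accessory = pan - core               # genes missing from at least one strain
    counts = Counter(gene for genes in strains for gene in genes)
    unique = {gene for gene, n in counts.items() if n == 1}  # single-strain genes
    return {"pan": pan, "core": core, "accessory": accessory, "unique": unique}


# Hypothetical gene content of three strains.
example = {
    "strain_1": {"g1", "g2", "g3"},
    "strain_2": {"g1", "g2", "g4"},
    "strain_3": {"g1", "g3", "g4", "g5"},
}
parts = partition_pan_genome(example)
print(sorted(parts["core"]), sorted(parts["accessory"]), sorted(parts["unique"]))
```

In this example g1 forms the core genome, g2 to g5 form the accessory genome, and g5 is unique to a single strain.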
In an embodiment, the consensus sequence is a Cannabis sativa pan-genome.
The term “genomic variation” as used herein refers to differences in the genomes of a plant from a reference species, as compared to the genomes from one or more additional plants of the reference species.
In an embodiment, the genomic variation is selected from the group consisting of a single nucleotide polymorphism (SNP) location, SNP frequency, copy number variation (CNV) and presence/absence variation (PAV).
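By way of illustration only, SNP locations and the frequency of each base at those locations may be tabulated from aligned sequences as follows. The function name and the example sequences are hypothetical, positions are reported as 0-based alignment columns, and characters other than A, C, G and T are ignored; each of these is an assumption made for this sketch.

```python
from collections import Counter


def snp_sites(aligned: list[str]) -> dict[int, dict[str, float]]:
    """Map each variable column (0-based) to the frequency of every base seen there."""
    if not aligned or any(len(seq) != len(aligned[0]) for seq in aligned):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    sites: dict[int, dict[str, float]] = {}
    for position, column in enumerate(zip(*aligned)):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) > 1:  # more than one base observed: a candidate SNP
            total = sum(counts.values())
            sites[position] = {base: n / total for base, n in counts.items()}
    return sites


# Hypothetical aligned fragments from four plants of a reference species.
for position, freqs in snp_sites(["ACGT", "ACGA", "ACGT", "TCGT"]).items():
    print(position, freqs)
```

A guide sequence could then be screened against the reported positions so that it avoids, or deliberately spans, variable sites.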
In an embodiment, the genomic variation is a genomic variation shown in any one of the sequences selected from the group consisting of SEQ ID NOs: 199-233.
The term “polymorphism” refers to any change in the nucleotide sequence of the gene, including silent nucleotide substitutions.
A “single nucleotide polymorphism” or “SNP” is a substitutional variant that occurs at a specific position in the genome. Substitutional nucleotide variants are those in which at least one nucleotide in the sequence has been removed and a different nucleotide inserted in its place. In some embodiments, the number of nucleotides affected by substitutions in a mutant gene relative to the wild-type gene is a maximum of ten nucleotides, more preferably a maximum of 9, 8, 7, 6, 5, 4, 3, or 2, or most preferably only one nucleotide. Substitutions may be “silent” in that the nucleotide substitution does not change the amino acid defined by the codon. Alternatively, the nucleotide substitution(s) may change the encoded amino acid sequence and thereby alter the activity of the encoded enzyme, particularly if a conserved amino acid is substituted with another amino acid which is quite different, i.e., a non-conservative substitution.
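By way of illustration only, the distinction between a silent substitution and one that changes the encoded amino acid may be sketched as follows. The sketch assumes the standard genetic code and DNA codons; the function name and example codons are hypothetical, and the sketch does not attempt to distinguish conservative from non-conservative amino acid changes.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Return 'silent' if both codons encode the same residue, else 'non-silent'."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if ref not in CODON_TABLE or alt not in CODON_TABLE:
        raise ValueError("codons must be three-letter DNA triplets of A, C, G, T")
    if ref == alt:
        raise ValueError("codons are identical; there is no substitution to classify")
    return "silent" if CODON_TABLE[ref] == CODON_TABLE[alt] else "non-silent"


print(classify_substitution("GCT", "GCC"))  # silent: both codons encode alanine
print(classify_substitution("GAA", "GTA"))  # non-silent: glutamate to valine
```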
The term “copy number variation” or “CNV” refers to a duplication or deletion event that affects a number of base pairs. These structural variants result in a change in the number of copies of a particular gene between one reference genome and the next.
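By way of illustration only, per-genome gene copy counts may be compared to flag copy number and presence/absence variation as follows. The gene and genome identifiers are placeholders, and the convention of reporting a gene as PAV when it is absent from at least one genome, and as CNV when it is present in every genome at differing copy numbers, is a simplifying assumption made for this sketch.

```python
def classify_copy_variation(copies: dict[str, dict[str, int]]) -> dict[str, str]:
    """Label each gene as 'invariant', 'CNV' or 'PAV' from per-genome copy counts."""
    labels: dict[str, str] = {}
    for gene, per_genome in copies.items():
        observed = set(per_genome.values())
        if len(observed) == 1:
            labels[gene] = "invariant"   # same copy number in every genome
        elif 0 in observed:
            labels[gene] = "PAV"         # absent from at least one genome
        else:
            labels[gene] = "CNV"         # present everywhere, counts differ
    return labels


# Hypothetical copy counts for three genes across three assembled genomes.
example = {
    "gene_a": {"genome_1": 1, "genome_2": 1, "genome_3": 1},
    "gene_b": {"genome_1": 1, "genome_2": 3, "genome_3": 2},
    "gene_c": {"genome_1": 1, "genome_2": 0, "genome_3": 1},
}
print(classify_copy_variation(example))
```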
An allele is a variant of a gene at a single genetic locus. Each chromosome of a pair of chromosomes has one copy (i.e., one allele) of each gene. If both alleles of a gene are the same, the organism is homozygous with respect to that allele or gene. If the two alleles are different, the organism is heterozygous with respect to that gene. The two alleles of a gene in the plant may have the same mutation as each other, so are said to be homozygous for that mutation, or the two alleles may comprise different mutations to each other and are said to be heterozygous for those mutations.
Cannabis is an erect annual herb with a dioecious breeding system, although monoecious plants exist. Wild and cultivated forms of Cannabis are morphologically variable, which has resulted in difficulty defining the taxonomic organisation of the genus.
In an embodiment, the reference species is of the genus Cannabis. Plants of the genus Cannabis will be known to persons skilled in the art, illustrative examples of which include Cannabis sativa, Cannabis indica and Cannabis ruderalis.
In an embodiment, the reference species is Cannabis sativa, also referred to as C. sativa.
In an embodiment, the reference species is Cannabis sativa, and wherein the genome of the reference species comprises one or more nucleic acid sequences selected from the group consisting of SEQ ID NOs: 164-198.
The terms “plant”, “cultivar”, “variety”, “strain” or “race” are used interchangeably herein to refer to a plant or a group of similar plants according to their structural features and performance (i.e., morphological and physiological characteristics).
The published reference genome for C. sativa is the assembled draft genome and transcriptome of “Purple Kush” or “PK” (van Bakel et al., 2011, Genome Biology, 12(10): R102). C. sativa has a diploid genome (2n=20) with a karyotype comprising nine autosomes and a pair of sex chromosomes (X and Y). Female plants are homogametic (XX) and males heterogametic (XY) with sex determination controlled by an X-to-autosome balance system. The estimated size of the haploid genome is 818 Mb for female plants and 843 Mb for male plants.
The term “cannabinoid”, as used herein, refers to a family of terpeno-phenolic compounds, of which more than 100 compounds are known to exist in nature. Cannabinoids will be known to persons skilled in the art, illustrative examples of which are provided in Table 1, below, including acidic and decarboxylated forms thereof.
Cannabinoid biosynthesis in plants typically involves the production of fatty acid and isoprenoid precursors via the hexanoate, methylerythritol 4-phosphate (MEP) and geranyl diphosphate (GPP) pathways, as described by, for example, Marks et al. (2009, Journal of Experimental Botany, 60: 3715).
The hexanoate pathway involves desaturase, lipoxygenase (LOX), hydroperoxide lyase (HPL) and an acyl-activating enzyme (AAE) step that produces hexanoyl-CoA. Hexanoyl-CoA produced via the hexanoate pathway acts as the substrate for the polyketide synthase enzyme (OLS) that yields olivetolic acid.
The MEP pathway results in the synthesis of a prenyl side-chain, which is utilised as the substrate for GPP synthesis (Phillips et al., 2008, Trends in Plant Science, 13(12): 619-23). GPP is added by an aromatic prenyltransferase (PT) that yields CBGA (WO 2011/017798). The final steps involve catalysis by the oxidocyclases THCAS and CBDAS resulting in the production of THCA and CBDA, respectively (van Bakel et al., supra).
Cannabinoids are synthesised in cannabis plants as carboxylic acids. While some decarboxylation may occur in the plant, decarboxylation typically occurs post-harvest and is increased by exposing plant material to heat (Sanchez and Verpoorte, 2008, Plant Cell Physiology, 49(12): 1767-82). Decarboxylation is usually achieved by drying and/or heating the plant material. Persons skilled in the art would be familiar with methods by which decarboxylation of cannabinoids can be promoted, illustrative examples of which include air-drying, combustion, vaporisation, curing, heating and baking.
The term “terpene” as used herein, refers to a class of organic hydrocarbon compounds, which are produced by a variety of plants. Cannabis plants produce and accumulate different terpenes, such as monoterpenes and sesquiterpenes, in the glandular trichomes of the female inflorescence. The term “terpene” includes “terpenoids” or “isoprenoids”, which are modified terpenes that contain additional functional groups.
Terpenes are responsible for much of the scent of cannabis flowers and contribute to the unique flavour qualities of cannabis products. Terpenes will be known to persons skilled in the art, illustrative examples of which are provided in Table 2.
Terpene biosynthesis in plants typically involves two pathways to produce the general 5-carbon isoprenoid diphosphate precursors of all terpenes: the MEP pathway as described elsewhere herein, and the cytosolic mevalonate (MEV) pathway. These pathways control the different substrate pools available for terpene synthases (TPS).
In an embodiment, the target DNA sequence comprises one or more cannabinoid biosynthesis genes.
Reference to “gene” includes DNA corresponding to the exons or the open reading frame of a gene. Reference herein to a “gene” is also taken to include a classical genomic gene consisting of transcriptional and/or translational regulatory sequences and/or a coding region and/or non-translated sequences (i.e., introns, 5′- and 3′-untranslated sequences), or mRNA or cDNA corresponding to the coding regions (i.e., exons) and 5′- and 3′-untranslated sequences of the gene.
The term “cannabinoid biosynthesis gene” as used herein refers to any gene encoding a protein involved in the biosynthesis of a cannabinoid.
In an embodiment, the cannabinoid biosynthesis gene is selected from the group consisting of DXS1, DXS2, DXR, MCT, CMK, MDS, HDS, HDR, IPP/IPI, GPP_LSU, GPP_SSU, FAD2#1, FAD2#2, FAD2#3, FAD2#4, LOX, HPL, AAE1, OLS, OAC, OAC#2, GOT, CBCAS, CBCAS-like#a, CBCAS-like#b, CBCAS-like#c, CBCAS-like#d, CBCAS-like#e, CBCAS-like#f, CBCAS-like#g, CBCAS#a, CBCAS#b, and THCAS.
In an embodiment, the DNA-recognition moiety is complementary to a target sequence in at least one cannabinoid biosynthesis gene within the consensus sequence.
As described elsewhere herein, some aspects of terpene biosynthesis are also regulated by the MEP pathway, encoded by genes including DXS1, DXS2, MCT, CMK, HDS, HDR and GPPS. Accordingly, persons skilled in the art would understand that modulation of cannabinoid biosynthesis genes may also be useful in modulating the expression of some terpenes. Terpenes have been associated with therapeutic benefits independent from cannabinoids (Brahmkshatriya and Brahmkshatriya, 2013, in Ramawat and Merillon (eds), Natural Products, Springer, Berlin, Heidelberg). Therefore, modulation of terpene biosynthesis may also be advantageous for cannabis plant production.
Methods for Modulating Gene Expression
In an aspect disclosed herein, there is provided a method of modulating gene expression in a plant cell, the method comprising:
Modulation of gene expression by gene editing may be performed by introducing a targeted gene editing construct comprising a DNA-recognition moiety and an endonuclease that is capable of being functionally expressed in a cell to modify gene expression. Accordingly, modulation of gene expression includes activating or inhibiting the expression of endogenous genes, inducing or enhancing the expression of endogenous genes and introducing and expressing one or more exogenous genes in a cell.
Modulation of gene expression in accordance with the methods disclosed herein may comprise the inhibition of gene expression or inducing or enhancing gene expression.
The term “inhibition of gene expression” and the like typically refer to a decrease in the level of mRNA in a plant cell as derived from a target DNA sequence (e.g., a cannabinoid biosynthesis gene). Such reduction may be the result of reduction of transcription, including by methylation of promoter regions via chromatin re-modelling, or post-transcriptional modification of the RNA molecules, including via RNA degradation, or both. Inhibition of gene expression should not necessarily be interpreted as an abolishing of the expression of the target nucleic acid or gene. In some embodiments, the introduction of a gene editing construct in a plant cell will decrease the level of mRNA by at least about 5%, preferably by at least about 10%, preferably by at least about 20%, preferably by at least about 30%, preferably by at least about 40%, preferably by at least about 50%, preferably by at least about 60%, preferably by at least about 70%, preferably by at least about 80%, preferably by at least about 90%, preferably by at least about 95%, preferably by at least about 99%, or preferably by about 100% of the mRNA level found in the plant cell in the absence of the gene editing construct.
Conversely, the term “inducing or enhancing gene expression” and the like refer to an increase in the level of mRNA in a plant cell for an endogenous (i.e., homologous or native) target gene (e.g., a cannabinoid biosynthesis gene). In some embodiments, the introduction of the gene editing construct in a cell will increase the level of endogenous mRNA by at least about 5%, preferably by at least about 10%, preferably by at least about 20%, preferably by at least about 30%, preferably by at least about 40%, preferably by at least about 50%, preferably by at least about 60%, preferably by at least about 70%, preferably by at least about 80%, preferably by at least about 90%, preferably by at least about 95%, preferably by at least about 99%, or preferably by about 100% of the mRNA level found in the cell in the absence of the gene editing construct.
Methods for the measurement of gene expression in plant cells would be known to persons skilled in the art, illustrative examples of which include RT-PCR, RNA-Seq, Northern blot analysis, and the like.
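By way of non-limiting illustration only, the percentage change in mRNA level relative to a cell lacking the gene editing construct could be computed as sketched below. The function names and the default 5% threshold (mirroring the "at least about 5%" lower bound recited above) are assumptions made for this example, and the input values are taken to be normalised expression measurements from any of the methods mentioned above.

```python
def percent_change_in_mrna(treated_level, control_level):
    """Signed % change of the treated mRNA level relative to the control level."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return (treated_level - control_level) / control_level * 100.0


def classify_modulation(treated_level, control_level, threshold_pct=5.0):
    """Label the change using a hypothetical threshold (assumption: 5%)."""
    change = percent_change_in_mrna(treated_level, control_level)
    if change <= -threshold_pct:
        return "inhibited"
    if change >= threshold_pct:
        return "induced or enhanced"
    return "unchanged"
```

For example, a treated level of 50 against a control level of 100 corresponds to a 50% decrease and would be labelled as inhibited under these assumptions.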
The modulation of gene expression in accordance with the methods disclosed herein may be stable, transient or conditional gene expression modulation.
In an embodiment, the gene editing construct transiently modulates the expression of one or more target genes in the plant cell.
In an embodiment, the gene editing construct stably modulates the expression of one or more target genes in the plant cell.
In an embodiment, the plant cell is a protoplast. In another embodiment, the protoplast is a mesophyll-derived protoplast.
In an aspect disclosed herein, there is provided a transformed plant cell comprising the gene editing construct as disclosed elsewhere herein.
The inventors have surprisingly shown that in vitro propagated plant strains provide a source of mesophyll-derived protoplasts that are highly effective for the transient and rapid evaluation of gene editing constructs to identify effective gene editing constructs for use in the stable transduction of regenerable plant cells for the production of plants with modified gene expression.
Accordingly, in another aspect disclosed herein, there is provided a method for screening gene editing constructs in plant cells, comprising:
In an embodiment, the plant cell is a protoplast. In another embodiment, the protoplast is a mesophyll-derived protoplast.
Methods for producing a regenerable plant cell with modified gene expression
In an aspect disclosed herein, there is provided a method for producing a regenerable plant cell with modified gene expression, the method comprising:
A number of techniques are available for the introduction of nucleic acid molecules into regenerable cells derived from germinated plant tissue, well known to persons skilled in the art.
The term “transformation” as used herein means alteration of the genotype of a cell, for example, a bacterium or a plant, particularly a cannabis plant, by the introduction of a foreign or exogenous nucleic acid. By “transformant” is meant an organism so altered. Introduction of DNA into a plant by crossing parental plants or by mutagenesis per se is not included in transformation. The nucleic acid molecule may be replicated as an extrachromosomal element or is preferably stably integrated into the genome of the plant.
The most commonly used methods to produce fertile, transgenic plants comprise two steps: the delivery of DNA into regenerable cells and plant regeneration through in vitro tissue culture. Two methods are commonly used to deliver the DNA: T-DNA transfer using Agrobacterium tumefaciens or related bacteria and direct introduction of DNA via particle bombardment. It will be apparent to the skilled person that the particular choice of a transformation system to introduce a nucleic acid construct into plant cells is not essential to or a limitation of the present disclosure, provided it achieves an acceptable level of nucleic acid transfer.
Agrobacterium-mediated transformation of cannabis may be performed by methods known in the art. Any Agrobacterium strain with sufficient virulence may be used. Bacteria related to Agrobacterium may also be used. The DNA that is transferred (T-DNA) from the Agrobacterium to the recipient plant cells is comprised in a gene editing construct (i.e., chimeric plasmid) that contains one or two border regions of a T-DNA region of a wild-type Ti plasmid flanking the nucleic acid to be transferred. The genetic construct may contain two or more T-DNAs, for example, where one T-DNA contains the gene of interest and a second T-DNA contains a selectable marker gene, providing for independent insertion of the two T-DNAs and possible segregation of the selectable marker gene away from the transgene of interest.
In an embodiment, the regenerable plant cell is transformed with the gene editing construct using Agrobacterium tumefaciens, or a related bacterium.
In another embodiment, the regenerable plant cell is transformed with the gene editing construct using Agrobacterium tumefaciens strain EHA105. In another embodiment, the regenerable plant cell is transformed with the gene editing construct using the Agrobacterium tumefaciens strain LBA4404. In yet another embodiment, the regenerable plant cell is transformed with the gene editing construct using the Agrobacterium tumefaciens strain GV3101.
Transformed plants can be produced by introducing a gene editing construct described elsewhere herein into a recipient cell and growing a new plant that comprises and expresses the polynucleotide encoded by the gene editing construct, thereby modulating gene expression in the new plant. The process of growing a new plant from a transformed cell, which is in cell culture, is referred to herein as “regeneration”.
In an embodiment, the germinated plant tissue is selected from the group consisting of embryogenic cotyledons, primordial root and radicle of mature embryos.
The term “transgenic plant” as used herein refers to a plant that contains a genetic construct (“transgene”) not found in a wild-type plant of the same species, variety or cultivar. That is, transgenic plants (transformed plants) contain genetic material that they did not contain prior to the transformation. A “transgene” as referred to herein has the normal meaning in the art of biotechnology and refers to a genetic sequence, which has been produced or altered by recombinant DNA or RNA technology. If present in a plant cell, the transgene had been introduced into the plant cell or a progenitor cell by a human. The transgene may include genetic sequences obtained from or derived from a plant cell, or another plant cell, or a non-plant source, or a synthetic sequence. Typically, the transgene has been introduced into the plant by human manipulation such as, for example, by transformation but any method can be used as one of skill in the art recognises. The genetic material is typically stably integrated into the genome of the plant. The introduced genetic material may comprise sequences that naturally occur in the same species but in a rearranged order or in a different arrangement of elements, for example an antisense sequence or a sequence expressing an inhibitory double-stranded RNA. Plants containing such sequences are included herein in “transgenic plants”. Transgenic plants as defined herein include all progeny of an initial transformed and regenerated plant (T0 plant) which has been genetically modified using recombinant techniques, where the progeny comprise the transgene. Such progeny may be obtained by self-fertilisation of the primary transgenic plant or by crossing such plants with another plant of the same species. In an embodiment, the transgenic plants are homozygous for each and every gene that has been introduced (transgene) so that their progeny do not segregate for the desired phenotype.
Transgenic plant parts include all parts and cells of said plants, which comprise the transgene such as, for example, seeds, cultured tissues, callus and protoplasts.
A “non-transgenic plant”, preferably a non-transgenic cannabis plant, is one that has not been genetically modified by the introduction of genetic material by recombinant DNA techniques. The presence in a plant or seed of deletions of part of a gene as generated by site-specific endonucleases such as ZFN, TAL effectors or CRISPR-type nucleases, followed by non-homologous end-joining repair in the plant cell, and progeny thereof are included herein as “non-transgenic”.
In an aspect disclosed herein, there is provided a plant comprising the transformed plant cell disclosed herein.
In an embodiment, the plants comprising the transformed plant cell disclosed herein are plants of the genus Cannabis. In another embodiment, the plants comprising the transformed plant cell disclosed herein are Cannabis sativa plants.
In a preferred embodiment, the plants comprising the transformed plant cell disclosed herein are Cannabis sativa plants with modified expression of one or more cannabinoid biosynthesis genes. The person skilled in the art would understand that modifying the expression of one or more cannabinoid biosynthesis genes may result in a Cannabis sativa plant that can produce cannabinoids at optimised levels for medicinal applications.
In another aspect disclosed herein, there is provided a regenerable plant cell produced according to the methods disclosed herein.
The DNA-recognition moiety and gene editing constructs of the present disclosure may also be provided in a kit. The kit may comprise additional components to assist in performing the methods as described herein, such as administration device(s), excipient(s), and/or diluent(s). The kits may also include containers for housing the various components and instructions for using the kit components in such methods.
Those skilled in the art will appreciate that the invention described herein is susceptible to variations and modifications other than those specifically described. It is to be understood that the invention includes all such variations and modifications that fall within the spirit and scope. The invention also includes all of the steps, features, compositions and compounds referred to or indicated in this specification, individually or collectively, and any and all combinations of any two or more of said steps or features.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs.
The various embodiments enabled herein are further described by the following non-limiting examples.
Genome assembly of a female Cannabis plant (“C1”) was performed by preparing Single Molecule, Real Time (SMRT) SMRTbell libraries from extracted DNA as per the manufacturer's recommendations (Pacific Biosciences of California, Inc., Menlo Park, Calif., US). Generated SMRTbell templates were sequenced using PacBio (Pacific Biosciences of California, Inc., Menlo Park, Calif., US) Sequel as per the manufacturer's recommendations. Raw reads were error corrected and assembled using SMRT Link's Hierarchical Genome Assembly Process (HGAP4).
Sequences (i.e., cannabinoid biosynthesis genes) from the C1 genome are shown in SEQ ID NOs: 164-198.
The CBDrx genome was obtained from The European Nucleotide Archive (PRJEB29284) (https://www.ebi.ac.uk/ena/data/view/PRJEB29284). PK and Finola genome assemblies were obtained from the NCBI BioProject database (PRJNA73819) (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA73819).
Cannabinoid biosynthesis genes were accessed from a variety of sources and public databases (Table 3). Sequences were downloaded and used as a query for BLAST analysis against the genome assembly with an e-value threshold set at <10⁻¹⁰. Identified scaffold regions of interest from the reference genome were annotated and visualised using FGENESH (Solovyev et al., 2006, Genome Biology, 7(1): S10) and MEGANTE (Numa and Itoh, 2013, Plant and Cell Physiology, 55(1): e2-e2).
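As a minimal sketch only, the e-value filtering step could be applied to BLAST results exported in the standard 12-column tabular format (outfmt 6), in which the eleventh column is the e-value. The function name, dictionary keys and the two example hits below are hypothetical and are not taken from the analysis described above.

```python
import csv
import io

# Threshold taken from the text: hits must have an e-value below 1e-10.
MAX_EVALUE = 1e-10

# Hypothetical two-line BLAST tabular result, for illustration only.
EXAMPLE_HITS = (
    "THCAS_query\tscaffold_12\t99.5\t1635\t8\t0\t1\t1635\t5000\t6634\t0.0\t2950\n"
    "THCAS_query\tscaffold_40\t82.0\t60\t10\t1\t1\t60\t100\t159\t2e-05\t40.1\n"
)


def filter_blast_hits(tabular_text, max_evalue=MAX_EVALUE):
    """Keep hits whose e-value is strictly below max_evalue."""
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, comment or malformed lines
        evalue = float(row[10])  # column 11 of outfmt 6 is the e-value
        if evalue < max_evalue:
            kept.append(
                {
                    "query": row[0],
                    "scaffold": row[1],
                    "percent_identity": float(row[2]),
                    "scaffold_start": int(row[8]),
                    "scaffold_end": int(row[9]),
                    "evalue": evalue,
                }
            )
    return kept
```

Calling `filter_blast_hits(EXAMPLE_HITS)` retains only the first hypothetical hit, whose scaffold coordinates could then be passed to an annotation tool.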
For SNP discovery, five hundred and thirty-four whole genomes were re-sequenced on a HiSeq3000 instrument at varying depths. The resulting sequence data were reference aligned to the genome assembly of C1, using the BWA MEM algorithm (Li, 2013, arXiv preprint arXiv:1303.3997). Variants were identified using samtools (Li et al., 2009, Bioinformatics, 25(16): 2078-79) and a bed file with scaffold regions of interest matching to gene sequences of cannabinoid biosynthesis genes was created (see, e.g., variants comprised in any one of the sequences of SEQ ID NOs: 199-233). Alignments were sorted and used for variant calling with an adjusted mapping quality (−C 50) and minimum read depth of 5 to generate a consensus sequence.
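The effect of the minimum read depth on consensus generation can be illustrated with the following sketch, which is not the samtools procedure itself. It assumes that mapping-quality filtering has already been applied upstream, represents each position as a string of observed bases, and uses a hypothetical minimum allele fraction of 0.2 to decide whether a second base is reported with an IUPAC ambiguity code.

```python
from collections import Counter

MIN_DEPTH = 5  # minimum read depth, as stated in the text

# IUPAC codes for single bases and two-base ambiguities.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus_base(bases, min_depth=MIN_DEPTH, min_allele_fraction=0.2):
    """Call one consensus base from the bases observed at a position.

    min_allele_fraction is an assumption made for this example only.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth < min_depth:
        return "N"  # insufficient coverage
    alleles = frozenset(
        b for b, n in counts.items() if n / depth >= min_allele_fraction
    )
    # Positions with three or more supported bases are reported as N here.
    return IUPAC.get(alleles, "N")


def consensus_sequence(pileup_columns, min_depth=MIN_DEPTH):
    """Join per-position calls into a consensus string."""
    return "".join(consensus_base(col, min_depth) for col in pileup_columns)
```

In this sketch, positions covered by fewer than five reads are masked rather than called, so that downstream guide design does not rely on poorly supported bases.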
Presence of an allele, or extra copies of a gene, was determined based on genomic nucleotide multiple sequence alignments using MUSCLE (Edgar, 2004, Nucleic Acids Research, 32(5): 1792-97). Sequences of similar length with alignment similarity between 80-98%, which produced identical translated proteins, were determined as alleles. Where large variation existed between genomic nucleotide sequence length and content, or where nucleotide sequences were <1000 bp, predicted mRNA sequences were used from FGENESH for alignment. Alleles were determined if similarity was >98%. Extra copies of genes were determined if similarity was <98%.
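As a hedged illustration of the 98% similarity rule applied to predicted mRNA alignments, the sketch below computes percent identity over two pre-aligned sequences (for example, rows taken from a MUSCLE alignment) and labels the pair. The function names are hypothetical, and the handling of a similarity of exactly 98%, which the text leaves open, is an assumption of this example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns, ignoring gap-gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column is empty in both sequences
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def classify_pair(aligned_a, aligned_b, threshold=98.0):
    """Label a pair as alleles (>98%) or an extra gene copy (otherwise).

    Assumption: a similarity of exactly 98% is treated as an extra copy.
    """
    if percent_identity(aligned_a, aligned_b) > threshold:
        return "allele"
    return "extra copy"
```

A column in which only one sequence carries a gap is counted as a mismatch in this sketch, which is one of several reasonable conventions.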
CHOPCHOP (Labun et al., 2016, Nucleic Acids Research, 44(W1): W272-76), CRISPR MultiTargeter (Prykhozhij et al., 2015, PLoS One, 10(3): e0119372), Crispor (Haeussler et al., 2016, Genome Biology, 17(1): 148) and ZiFit (Hwang et al., 2013, Nature Biotechnology, 31(3): 227) were used for the selection of sgRNAs. For visual confirmation of SNP avoidance, sgRNAs were manually aligned to C1 and consensus sequences using Sequencher (Gene Codes Corporation).
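The SNP-avoidance check can be sketched programmatically as follows. This example is not the procedure used by any of the tools named above; it assumes a 20-nucleotide protospacer followed by an NGG PAM (as used by SpCas9, which is an assumption because the nuclease is not specified here), scans the forward strand only for brevity, and uses 0-based SNP positions on the consensus sequence.

```python
def find_guides(sequence, snp_positions, guide_length=20):
    """Return (start, protospacer) pairs whose protospacer and PAM avoid SNPs.

    Assumptions for this example: NGG PAM, forward strand only,
    0-based SNP positions.
    """
    seq = sequence.upper()
    snps = set(snp_positions)
    guides = []
    # pam_start is the index of the N in the NGG PAM.
    for pam_start in range(guide_length, len(seq) - 2):
        if seq[pam_start + 1:pam_start + 3] != "GG":
            continue
        start = pam_start - guide_length
        # Reject the site if any known SNP falls in the protospacer or PAM.
        if any(pos in snps for pos in range(start, pam_start + 3)):
            continue
        protospacer = seq[start:pam_start]
        # Reject protospacers containing ambiguity codes such as N or R.
        if set(protospacer) <= set("ACGT"):
            guides.append((start, protospacer))
    return guides
```

Candidates returned by such a scan would still need the scoring and visual confirmation steps described above before selection.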
To locate all the genes involved in cannabinoid biosynthesis, query references were downloaded from publicly available databases (Table 3) and BLAST analysis was performed against the C1 genome assembly. All genes in the MEP, GPP, Hexanoate and Cannabinoid pathway were identified (Table 3). Two versions of 1-deoxy-D-xylulose 5-phosphate synthase (DXS) were identified in the MEP pathway, with single copies of 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR), 4-diphosphocytidyl-2C-methyl-D-erythritol synthase (MCT), 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (CMK), 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MDS), 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (HDS) and 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase (HDR). Single genes of isopentenyl diphosphate isomerase (IPP/IPI), geranyl pyrophosphate synthase (GPP), small and large subunits, were identified in the GPP pathway. In the hexanoate pathway, four copies of fatty-acid desaturase (FAD2) were identified using the Purple Kush (PK) desaturase gene sequence as the query and all are believed to be involved in cannabinoid biosynthesis. Lipoxygenase (LOX) and hydroperoxide lyase (HPL) were identified using the associated PK gene sequences as the queries. Acyl-activating enzyme (AAE1) was found using previously published sequences (Table 3) amongst the AAE superfamily, containing 15 AAE homologs. In the cannabinoid pathway a single copy of olivetol synthase (OLS) and three copies of olivetolic acid cyclase (OAC) were found. Two complete CBDAS genes were identified with seven closely related homologs. A single, complete copy of cannabichromenic acid synthase (CBCAS) was identified with two closely related homologs, and a single copy of THCAS was identified.
Within the publicly available cannabis genomes, the assembled gene set was then used to query gene copy number and identify potential homologs. Differences exist between the datasets in terms of gene copy number due to the resolution of the sequence data, genetic mapping, scaffolding technologies and natural variation in different genomes. Variations in gene presence and copy number, using the assembled reference gene list, exist for DXS1, DXS2, DXR, IPP/IPI, GPP_SSU, FAD2, AAE1, OLS, OAC, CBDAS, THCAS and CBCAS (Table 3). Within the Finola genome, DXS1, DXS2, GPP_SSU and AAE1 were not identified, with copy number variation observed for FAD2, OLS and OAC when compared to C1. Within the CBDrx genome, no copy of IPP/IPI was identified, while copy number variations were identified for FAD2 and synthase genes compared to C1. The updated PK genome had at least one copy of each gene, with variations in copy number existing for DXR, FAD2, OLS and OAC compared to C1.
To assess gene variation, an established resource of SNP locations was overlaid onto the identified genes integral to cannabinoid biosynthesis (Table 4). With the exception of FAD2, which belongs to a large, diverse family of desaturases, and CBDAS#a, a homolog of CBDAS, the cannabinoid biosynthesis genes contain relatively conserved total variations in their sequences (Table 5). Each consensus sequence, containing SNP locations, was then used for intelligent guide designs to avoid all known nucleotide variations, creating universal guides, which can be broadly used on any plant genotype within the species, and, in the instance of highly similar gene sequences, unique guides designed to target only a specific gene of interest (
Phytocannabinoids are of particular interest for their pharmacological applications in a growing number of medical conditions. Knowledge and understanding of the gene interactions and their relationship to final cannabinoid concentration can facilitate improved cannabis strains with desired novel cannabinoid levels. Creating a pangenome consensus of each gene in the contributing pathways allows for genomically informed decisions, based on known SNP location and frequency as well as presence/absence variations (PAV), for crop improvement by means of genome editing. Using publicly available sequence information (Table 3), at least one full-length transcript was found for every gene involved in cannabinoid biosynthesis. Two DXS genes were also identified. Single copies of DXR, HDR and IPP/IPI were identified in the C1 genome. Fatty acid desaturase enzymes belong to two large multifunctional classes, either membrane bound, or soluble. The desaturase of interest in cannabinoid production is involved in the hexanoate pathway, leading to the production of hexanoyl-CoA, the first precursor in the cannabinoid pathway. Despite the complexity of the number of FAD2 gene sequences, four copies of this gene were identified. The THC-rich PK cultivar was shown to have two copies of OLS and OAC, whereas the CBD-rich cultivar, CBDrx, had just one copy of each. The C1 cultivar, with relatively equal cannabinoid levels, was shown to contain a single copy of OLS and 2 copies of OAC. Using the synthase genes from the C1 sequence as the query against CBDrx, Finola and PK genomes, the total number of synthase genes varied considerably between the cultivars. In the CBDrx genome (Grassa et al., 2018, BioRxiv, 458083, doi: https://doi.org/10.1101/458083), 13 synthase genes were reported, of which 11 were identified using our sequences as queries. Determining which synthase genes were not identified is difficult due to the nested repeating nature of synthase genes around the centromere.
However, variation in synthase genes is most likely due to PAV across different cultivars, which in the case of maize is common (Springer et al., 2009, PLoS Genetics, 5(11): e1000734). Total synthase gene number is not given for Finola or PK (Laverty et al., 2019, Genome Research, 29(1): 146-56); however, 9 and 14 genes were found, respectively.
Within the Finola genome, 4 genes could not be identified. Neither form of DXS was present. GPP_SSU and AAE1 were also not identified. AAE1 was found to be the gene that synthesises hexanoyl-CoA from hexanoate supplying the cannabinoid pathway (Stout et al., 2012, The Plant Journal, 71(3): 353-65) and since Finola still produces cannabinoids, this result was considered an assembly error. GPP is a heterodimer requiring both subunits, large and small, for optimum activity. GPP activity has been previously shown to be active but at lower levels when the small subunit was inactive (Wang and Dixon, 2009, Proceedings of the National Academy of Sciences U.S.A., 106(24): 9914-19); however, both subunits were still present, suggesting the absence of GPP_SSU in the Finola genome is also due to assembly error. The absence of IPP/IPI in the CBDrx genome is also strongly suggested to be due to assembly error, since previous studies on Arabidopsis double mutant knockdown of IPP/IPI produced dwarfism and male sterility (Okada et al., 2008, Plant and Cell Physiology, 49(4): 604-16).
The SNP location resource revealed that some genes are more highly conserved than others (Table 5). Comparative analysis of SNPs present in genes of variable copy number in C1, CBDrx, Finola and PK genomes was performed (excluding results of no gene presence). Through multiple sequence alignments of coding sequences, it was observed that SNPs occurred in the extra gene copy where homozygous alleles exist, suggesting that either sequencing error has occurred, or in fact there is an extra copy of the gene and a set of alleles. Within the C1 genome, OAC produced three hits with two sequences determined as alleles with an extra copy of the gene existing. When sequences were aligned, SNPs occurred in all sequences and when sequences were translated, nearly identical protein sequences (>99%) were produced, confirming that an extra copy of the gene was present, potentially in a hemizygous condition. Within the PK genome, copy number variation was shown for OLS and OAC. Like OAC in the C1 genome, OLS produced three hits, two of which were determined to be alleles and one to be an extra copy. SNPs were identified in all three sequences when coding regions were aligned, with similar results obtained from protein sequence alignment. Initial alignment of both OAC hits in PK found a 98.5% similarity in genomic sequences; however, no gene prediction was possible on one of the sequences, possibly due to a premature stop codon from a SNP rendering this gene inactive, potentially indicating that it exists as a pseudogene.
Using multiple tools for the design of sgRNAs ensured that all possible guide designs could be assessed for in silico off-targeting. Each tool implemented different scoring rules based on off-targets, mismatches, efficiency score, existence of self-complementary regions, GC content, location of guide and multiple sequence alignments (Prykhozhij et al., supra; Labun et al., supra). Due to the absence of a fully developed pan-genome for analysis by these tools, the use of multiple tools was necessary. The presence of a PAM site is necessary for sgRNA binding and while these tools scanned the gene sequence for the PAM sites, the results obtained varied between the online tools. Visualisation of guides was clearer using CHOPCHOP than with the other tools, and CHOPCHOP regularly provided the best guide designs. However, when highly homologous sequences were used, MultiTargeter was able to perform sequence alignments and produce unique guides for each sequence, a feature not available in the other tools. Guide designs for the unique synthases were first run using MultiTargeter and further verified using CHOPCHOP for visualisation. Guides were targeted to the earliest possible exon for maximum likelihood of a frameshift mutation. The error-prone nature of NHEJ often results in small deletions or insertions at the DSB, leading to protein misfolding and thus production of a knockout gene. Each identified gene, with accompanying allele where applicable, was analysed and sgRNAs were designed to be either universal, inactivating both alleles, or, if sequence heterozygosity exists, specific (Table 5). In genome editing, sequence homogeneity between synthase genes could potentially lead to off-target editing, with targets suggested to have at least several nucleotides different for discrimination (Soyars et al., 2018, Plant and Cell Physiology, 59(8): 1608-20).
Where possible, each synthase gene, and accompanying homologs, had universal and specific guides designed that could be used regardless of cultivar chosen as the target.
The reported sequence similarity between THCAS, CBDAS and CBCAS, of up to 95% (Laverty et al., supra) requires precise, intelligent design, using multiple online tools and a large consensus population to improve the likelihood of correct gene knockout. Off-targeting predictions, given by sgRNA online tools, currently use the previously fragmented genome of PK (van Bakel et al., supra). To circumvent this, each sgRNA was used as a query to BLAST against the C1 genome for potential off-targets. From the BLAST results no sgRNA had an unexpected sequence match elsewhere in the genome.
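The off-target check described above was performed with BLAST. Purely as an illustrative alternative, and not as the method used, a naive scan for near-identical sites on both strands could be sketched as follows; the function names and the default of three tolerated mismatches are assumptions, and the approach is only practical for short sequences.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (other characters are kept)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def near_matches(guide, genome, max_mismatches=3):
    """List (position, strand, mismatches) for sites close to the guide.

    Naive sliding-window comparison; max_mismatches is a hypothetical default.
    """
    guide = guide.upper()
    genome = genome.upper()
    n = len(guide)
    hits = []
    for strand, target in (("+", guide), ("-", reverse_complement(guide))):
        for i in range(len(genome) - n + 1):
            window = genome[i:i + n]
            mismatches = sum(1 for a, b in zip(target, window) if a != b)
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits
```

A guide would be flagged for review if the scan returned any site other than its intended target.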
¹ 2 genes and 7 homologs (9 in total);
² 1 gene and 2 homologs (3 in total).
Buds were excised from the initial mother plant and then cleaned by rinsing several times under tap water. The buds were then surface sterilised by stirring in 80% ethanol (v/v) for 1 minute. The ethanol was then decanted off from the plant tissue, and the cannabis buds were rinsed with tap water a minimum of three times, changing the water between each rinse. The buds were then immersed in 15% Domestos® [4.75% available chlorine m/v] for 15 min with shaking at 150 rpm (
Once the in vitro plants were established and generated sufficient leaf material for protoplast isolation, leaf strips were taken. Protoplasts were isolated from well rooted, 1 month old, young leaves from a plantlet (
After incubation, the enzyme mixture was filtered through a sterile 70 μm cell strainer and centrifuged at 700 rpm for 10 minutes (Eppendorf Model 5910R) before decanting off the supernatant. The pellet was then resuspended in 3 ml W5 buffer (Table 8) and transferred to a 14 ml round-bottomed tube, 3 ml of 20% sucrose was added and the centrifugation was repeated (
Isolated protoplasts were divided into aliquots of 1×10⁶ and centrifuged at 700 rpm for 10 min, and the supernatant was removed. The pelleted protoplasts were resuspended by adding 100 μl of transformation buffer (Table 9) to the protoplast pellet, followed by 50 μl containing 20-30 μg of plasmid DNA and immediately 150 μl of pre-warmed 40% PEG solution (warmed to 42° C. for 1 hr prior to transformation; Table 10). The contents were mixed gently after each addition and incubated at ambient room temperature (22° C.) for 15 minutes in the dark. Following the incubation, 5 ml of W5 buffer was added dropwise to the sides of the tube to gently mix the protoplasts. A further 5 ml of W5 buffer was added as gently as possible to the sides of the tube to gently mix the protoplasts. The protoplasts were centrifuged again at 700 rpm for 10 minutes, and the supernatant was carefully discarded without disturbing the pellet. The pellet was resuspended in 150 μl of W5 buffer and incubated in the dark at room temperature (22° C.) for 48 hrs. The expression of the GFP and dsRED proteins was observed under a fluorescence microscope (OLYMPUS CKX53, Tokyo, Japan) (excitation emission wavelengths 470-490 and 550-570 nm) (
The transfected protoplasts were collected into individual 1.5 or 2.0 ml microfuge tubes at c. 1.0×10⁶ protoplasts/ml per tube. The cells were pelleted by centrifugation, lysis buffer was added, and the samples were snap frozen in liquid nitrogen. DNA was then extracted using the Qiagen (Hilden, Germany) DNeasy Plant kit according to the manufacturer's instructions. Each target genome edit site was amplified with multiple pairs of PCR primers that generate amplicons spanning the site, which can be sequenced by 100-200 bp Illumina sequencing technology. To cover all possible deletions, the primers are required to be a minimum of 10-20 bp away from the target site. Amplicons carrying specific DNA barcodes for each tube were then pooled for DNA sequencing on an Illumina sequencing-by-synthesis platform. Sequence data of c. 10 million reads per sample were generated and aligned to the reference sequence, and variant sequences were detected at the target site. Specific deletions at the target nuclease site were counted and compared with the number of unedited reference sequences. The size of the deletion could then be determined for each edited read, and the construct with the highest number of edits at the site was identified for further stable editing.
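The deletion counting described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the examples. It assumes each read has already been aligned to the reference and is supplied as a reference start position plus a gapped string in which deleted bases appear as `-`, with no insertions. The function names, the `window` parameter and the toy reads are hypothetical and chosen only for illustration.

```python
from collections import Counter
import re

GAP_RUN = re.compile(r"-+")


def deletions(aligned_read: str, ref_start: int = 0) -> list[tuple[int, int]]:
    """Return (reference_start, length) for each run of '-' in a gapped read."""
    return [(ref_start + m.start(), len(m.group()))
            for m in GAP_RUN.finditer(aligned_read)]


def summarise_site(reads, cut_site: int, window: int = 10) -> dict:
    """Count edited and unedited reads at one target site.

    reads: iterable of (ref_start, aligned_read) pairs (assumed input format).
    A read counts as edited if any deletion overlaps the interval
    [cut_site - window, cut_site + window]. Reads that do not span
    that whole interval are ignored.
    """
    lo, hi = cut_site - window, cut_site + window
    edited = unedited = 0
    sizes = Counter()
    for ref_start, aligned in reads:
        ref_end = ref_start + len(aligned)  # exclusive; valid because no insertions
        if ref_start > lo or ref_end <= hi:
            continue  # read does not cover the target window
        hits = [(s, n) for s, n in deletions(aligned, ref_start)
                if s <= hi and s + n - 1 >= lo]
        if hits:
            edited += 1
            sizes.update(n for _, n in hits)
        else:
            unedited += 1
    total = edited + unedited
    return {
        "edited": edited,
        "unedited": unedited,
        "fraction": edited / total if total else 0.0,
        "sizes": dict(sizes),
    }


# Toy reads for two hypothetical constructs (not real data).
WILD_TYPE = "ACGTACGTACGTACGTACGTACGTACGTAC"
reads_by_construct = {
    "construct_A": [
        (0, WILD_TYPE),
        (0, "ACGTACGTACGTAC---GTACGTACGTAC"),   # 3 bp deletion at the site
        (0, "AC-TACGTACGTACGTACGTACGTACGTAC"),  # deletion far from the site
        (20, "ACGTACGTAC"),                     # does not span the site
    ],
    "construct_B": [
        (0, "ACGTACGTACGT-----CGTACGTACGTAC"),  # 5 bp deletion at the site
        (5, "CGTACGTAC--TACGTACGTA"),           # 2 bp deletion at the site
        (0, WILD_TYPE),
    ],
}

results = {name: summarise_site(reads, cut_site=15, window=3)
           for name, reads in reads_by_construct.items()}
best = max(results, key=lambda name: results[name]["edited"])
print(results)
print("Most edits at the site:", best)
```

The sketch ranks constructs by the raw number of edited reads, following the wording of the text; where read depth differs between samples, the `fraction` value may be the more comparable figure.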
To prepare the digestion media, the components (excluding enzymes) were mixed in MilliQ water, pH balanced and filtered through a 0.22 μm filter. The enzymes were then added to the desired concentrations and dissolved. The mixture was allowed to sit in a 55° C. water bath for 10 min to enhance solubility, followed by 0.22 μm filtration.
To prepare the W5 wash buffer, all components were mixed in MilliQ water, pH balanced and filter sterilised through a 0.22 μm filter.
The transformation buffer was prepared fresh for every transformation. 30 mL aliquots of the transformation buffer were prepared, pH balanced and filter sterilised through a 0.22 μm syringe filter.
The PEG 400 was prepared fresh for every transformation. 30 mL aliquots were prepared in MilliQ water and placed in a Falcon tube. The Falcon tube was then placed in a water bath at 42° C. for 1 hour before transformation.
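As a worked check of the transformation mix described earlier (100 μl transformation buffer, 50 μl plasmid DNA containing 20-30 μg, and 150 μl of 40% PEG solution), the following Python sketch computes the resulting concentrations by simple dilution. It assumes the volume of the protoplast pellet is negligible and that the volumes are additive; the function and variable names are illustrative only.

```python
# Volumes (microlitres) and stock values taken from the transformation step.
BUFFER_UL = 100.0          # transformation buffer
DNA_UL = 50.0              # plasmid DNA solution
PEG_UL = 150.0             # pre-warmed PEG solution
PEG_STOCK_PERCENT = 40.0   # PEG stock concentration
DNA_MASS_UG = (20.0, 30.0) # plasmid DNA mass range


def final_percent(stock_percent: float, stock_ul: float, total_ul: float) -> float:
    """Concentration after dilution, from C1 * V1 = C2 * V2."""
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_percent * stock_ul / total_ul


def concentration_range(mass_ug: tuple[float, float], volume_ul: float) -> tuple[float, float]:
    """Lower and upper concentration (ug/ul) for a mass range in a given volume."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return (mass_ug[0] / volume_ul, mass_ug[1] / volume_ul)


# Assumption: pellet volume is ignored, so the mix is the sum of the three additions.
total_ul = BUFFER_UL + DNA_UL + PEG_UL
peg_final = final_percent(PEG_STOCK_PERCENT, PEG_UL, total_ul)
dna_stock = concentration_range(DNA_MASS_UG, DNA_UL)
dna_final = concentration_range(DNA_MASS_UG, total_ul)

print(f"Total volume: {total_ul:.0f} ul")
print(f"Final PEG: {peg_final:.1f}%")
print(f"DNA stock needed: {dna_stock[0]:.2f}-{dna_stock[1]:.2f} ug/ul")
print(f"DNA in final mix: {dna_final[0]:.3f}-{dna_final[1]:.3f} ug/ul")
```

Under these assumptions the PEG is diluted to half its stock concentration in a 300 μl mix, and the plasmid preparation needs to be at 0.4-0.6 μg/μl to deliver 20-30 μg in 50 μl.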
Cannabis strains have proven to be highly variable, so the effect of a given protocol, and in particular of a given media composition, varies across strains. Alternative media compositions that can be substituted for those described above, should those media not be suitable for the specific plants used, are provided in Tables 11, 12 and 13.
Seed germination and callus induction from embryogenic cotyledons
Seeds were initially cleaned by rinsing several times under tap water, then surface sterilised by stirring in 80% ethanol (v/v) for 1 minute. The ethanol was decanted off the seeds, which were rinsed with tap water at least three times, changing the water between rinses. The seeds were then immersed in 15% Domestos® (4.75% available chlorine m/v) for 15 min with shaking at 180 rpm (
Once the seed coat has been split by the germinating seed (
Planting out and hardening of regenerated plantlets
Healthy, rooted plantlets (
To prepare the callus induction media, all components (excluding Kinetin and NAA) were mixed together in MilliQ water, pH balanced and autoclaved at 121° C. for 15 min. Once the media had cooled to 55° C., the Kinetin and NAA were added, before pouring the media into sterile petri dishes.
To prepare the regeneration media, the components (excluding TDZ) were mixed together in MilliQ water, pH balanced and autoclaved at 121° C. for 15 min. Once the media had cooled to 55° C., the TDZ was added, before pouring the media into sterile petri dishes/culture vessels.
To prepare the rooting media, the components (excluding IBA) were mixed together in MilliQ water, pH balanced and autoclaved at 121° C. for 15 min. Once the media had cooled to 55° C., the IBA was added, before pouring the media into sterile culture vessels.
Again, each cannabis strain requires different hormones and carbohydrate sources to initiate undifferentiated callus formation and regeneration. For all tissue culture media, the carbohydrate source (e.g., maltose in place of sucrose), the agar concentration and potentially the agar source, and the hormone choice and concentration need to be empirically adjusted on a genotype-by-genotype basis.
Number | Date | Country | Kind |
---|---|---|---|
2019904146 | Nov 2019 | AU | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/AU2020/051102 | 10/14/2020 | WO |