This invention relates to plant genes regulated by transcription factors that control the gene network response to an environmental perturbation or signal, and the manipulation of the expression of these “response genes” and/or their regulatory transcription factors in transgenic plants to confer a desired phenotype. The invention also relates to a rapid technique named “TARGET” (Transient Assay Reporting Genome-wide Effects of Transcription factors) for determining such “response genes” and their regulatory transcription factors as well as the structure of the involved gene regulatory networks (GRN)—including “transient” targets of transcription factors (TF)—by transiently perturbing the expression of the transcription factors of interest and the signals they transduce in protoplasts of any plant species.
Determining the fundamental structure of gene regulatory networks (GRN) is a major challenge of systems biology. In particular, inferring GRN structure from comprehensive gene expression and transcription factor (TF)-promoter interaction datasets has become an increasingly sought-after aim in both fundamental and agronomical research in plant biology (Bonneau et al., 2007, Cell 131:1354-1365; Ruffel et al., 2010, Plant Physiol 152:445-452). A crucial step for the assessment of GRN is the identification of the direct TF-target genes.
Transgenic plant lines expressing tagged versions of the TF-of-interest can be used together with transcriptomic and DNA-binding analyses to obtain high-confidence lists of direct targets (see e.g., Mönke et al., 2012, Nucleic Acids Research 40:8240-825). However, the generation of such transgenics can be a limiting factor, especially in large-scale studies or in non-model species.
Another major challenge in systems biology is the generation of gene regulatory networks (GRNs) that describe, and ideally predict, how the network will respond to perturbation. Currently, the global structure of a GRN is modeled by inferring regulatory relationships between transcription factors (TFs) and their target genes from genomic data (Krouk et al., 2010, Genome Biology 11:R123; Brady et al., 2011, Molecular Systems Biology 7:459; Petricka et al., 2011, Trends in Cell Biology 21:442). While diverse experimental approaches have been devised to validate interactions between specific TFs and their targets (Matallana-Ramirez et al., 2013, Molecular Plant [epub ahead of print, doi: 10.1093/mp/sst012]; Bargmann et al., 2013, Molecular Plant 6(3):978; Gorte et al., 2011, Plant Transcription Factors, vol. 754, pp. 119-141; Iwata et al., 2011, Plant Transcription Factors, vol. 754, pp. 107-117; Wehner et al., 2011, Frontiers in Plant Science 2:68), the “gold standard” in the field has been to identify primary TF-targets as genes that are both transcriptionally regulated and whose promoter region is bound by the TF of interest (Oh et al., 2009, The Plant Cell Online 21:403). However, a GRN built purely on this “gold standard” rule (Reeves et al., 2011, Plant Molecular Biology 75:347; Gorski et al., 2011, Nucleic Acids Research 39:9536; Hull et al., 2013, BMC Genomics 14:92; Fujisawa et al., 2011, Planta 235:1107) renders a static network that only includes targets stably bound by a TF under the studied conditions, and likely underestimates the dynamic interactions occurring in vivo.
For example, in higher plants, fluctuating nitrogen levels in the soil cause rapid and dramatic changes in plant gene expression. Nitrogen is both a metabolic nutrient and signal that broadly and rapidly reprograms genome-wide responses. While genomic responses to nitrogen have been studied for many years, only a small number of genes in nitrogen genome-wide reprogramming have been identified. The unidentified genes represent the so-called “dark matter” of such metabolic regulatory circuits, a crucial problem in understanding system-wide genetic regulation in many fields.
Plant genes regulated by transcription factors that control the gene network response to an environmental perturbation or signal (e.g., nitrogen, water, sunlight, oxygen, temperature) are described. These genes respond rapidly to their environment, but surprisingly, there is no evidence of direct transcription factor interaction. More particularly, the large class of genes described herein (and exemplified in Tables 1, 2, 19, 20, and 23) respond to the perturbation of a regulatory transcription factor and the signal it transduces, but in fact are not stably bound to the transcription factor, and yet are most relevant to the signal induced in vivo—in other words, they represent members of the “dark matter” of metabolic regulatory circuits. The invention involves the transgenic manipulation of these “response genes” and/or the genes encoding their regulatory transcription factors in plants so that their respective gene products are either overexpressed or underexpressed in the plant in order to confer a desired phenotype; e.g., increased N usage (to enhance plant growth/biomass) or N storage/yield (to enhance N storage and/or protein accumulation in seeds of seed crops).
The invention is based, in part, on the development of a rapid technique named “TARGET” (Transient Assay Reporting Genome-wide Effects of Transcription factors) that uses transient transformation of a plasmid containing a glucocorticoid receptor (GR)-tagged TF in protoplasts to study the genome-wide effects of TF activation. The TARGET system can be used to rapidly retrieve information on direct TF target genes in less than two weeks. The technique can be used as a part of various experimental designs, as shown in
While not intending to be bound to any theory of operation, using the TARGET system, gene networks have been identified that are regulated by TFs via transient associations with the target gene. Unexpectedly, these transient TF targets were found to be biologically relevant in controlling responsiveness to the applied signal/perturbation/cue. The target genes of interest are referred to herein as “response genes” that are regulated by what is referred to herein as their transiently associated “touch and go” or “hit and run” transcription factors. Conventional wisdom has focused on the “Golden Set” of genes stably bound and regulated by a TF, and has failed to uncover the transient associations described herein.
As a proof-of-principle candidate, the well-studied transcription factor, Abscisic acid insensitive 3 (ABI3), was investigated using TARGET, as described in more detail herein in Section 6 (Example 1). The de novo identification of the abscisic acid response element (ABRE) and a majority of the previously classified direct targets was established by use of the TARGET method, confirming its applicability. The TARGET system was then further modified, as described in further detail in Sections 7 and 10 (Examples 2 and 5), to identify genes transiently bound and regulated by the TF of the system in response to an environmental signal. These modifications allowed for the discovery of a “hit-and-run” (“touch-and-go”) mode-of-action for a proof-of-principle transcription factor candidate, bZIP1, where bZIP1 “hits” its target, initiates transcription, then dissociates (“run”), leaving transcription ongoing even without bZIP1 bound to the promoter. As evidence that transcription of a gene initiated by “the Hit” continues after “the Run,” an affinity-tagged UTP was used to label and capture newly synthesized mRNA, as described in Section 11 (Example 6). By adding this UTP affinity label at a time-point when bZIP1 is not detectably bound, it was determined that response genes were still actively transcribed. Section 12 (Example 7) describes the discovery that the transient TF-targets detected specifically in the TARGET cell-based system make a unique contribution to understanding how signal transduction occurs in planta, while eluding detection in planta.
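The logic of the labeling experiment described above can be illustrated with a minimal sketch. It assumes that each gene has already been scored for two observations at the labeling time-point: whether the TF is detectably bound, and whether affinity-labeled (newly synthesized) mRNA was captured. The record type, field names, and gene names are hypothetical and chosen only for illustration; they are not part of the disclosed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelingObservation:
    """Per-gene observations at the time the UTP affinity label is added (hypothetical record)."""
    gene: str
    tf_bound: bool          # TF binding detected at the labeling time-point
    nascent_labeled: bool   # affinity-labeled, newly synthesized mRNA captured


def consistent_with_hit_and_run(observations):
    """Return genes still being transcribed although the TF is not detectably bound."""
    return sorted(
        obs.gene
        for obs in observations
        if obs.nascent_labeled and not obs.tf_bound
    )


# Illustrative input only; gene names are placeholders, not data from the examples.
example = [
    LabelingObservation("geneA", tf_bound=True, nascent_labeled=True),
    LabelingObservation("geneB", tf_bound=False, nascent_labeled=True),
    LabelingObservation("geneC", tf_bound=False, nascent_labeled=False),
]
print(consistent_with_hit_and_run(example))
```

A gene passing this filter is only consistent with a hit-and-run mode of action; the absence of detected binding depends on the sensitivity of the binding assay.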
In Section 8 (Example 3), a method for identifying nitrogen-regulated connections conserved across model species and crops is detailed. This method is a rapid way to assess whether the function of a gene of interest is conserved across species and enables the enhancement of the translational discoveries of the TARGET system. The method of Section 8 may be used as an alternative or supplement to using the TARGET system directly in protoplasts of crops or other plant species. Section 9 (Example 4) also describes a method for identifying networks conserved across species to identify translational targets that may be used as an alternative or supplement to the TARGET system.
One advantage of the TARGET system is the ability to study gene regulatory networks and targets of transcription factors in a transient assay system, which means the method can be applied to plants that cannot be stably transformed. Protoplasts can be made from any plant species, and a transcription factor of interest can be transiently expressed to identify its targets genome-wide. Target genes of transcription factors can be rapidly identified because the method does not rely on the use of transgenic plants, which normally have to be stably transformed. Also, the TARGET technique allows for cross-species studies in order to analyze evolutionarily conserved networks using genes from a poorly characterized plant genus or species in a better characterized model genus, such as Arabidopsis, which has a fully sequenced genome and has microarray chip data available. This also has important implications for translational studies of gene function, from data-rich models (e.g., Arabidopsis) to data-poor crops. By providing the ability to do reciprocal cross-species genetic network comparisons, the TARGET technique allows for the determination of TF-target connections that are evolutionarily conserved and therefore likely the most important elements of transcription factor networks. The optional modifications to the TARGET system confer the further advantage of the ability to detect gene networks that are controlled transiently in response to environmental signals by TF interactions that have been previously ignored. TF regulation is not always associated with stable TF binding. The TARGET system uncovers TF targets that would otherwise be missed in other systems that require TF binding to identify gene targets. The TARGET system allows for the identification of the functional mode of action for any TF within and across species.
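By way of illustration only, a cross-species comparison of TF-target connections can be sketched as a set intersection after mapping gene identifiers between species. The sketch below assumes a simple one-to-one ortholog table, which is a simplification, and every identifier in it is a hypothetical placeholder; the function name and data layout are not taken from the examples.

```python
def conserved_connections(edges_model, edges_crop, crop_to_model):
    """Return TF-target edges found in both species.

    edges_model   -- iterable of (tf, target) pairs using model-species gene IDs
    edges_crop    -- iterable of (tf, target) pairs using crop gene IDs
    crop_to_model -- dict mapping a crop gene ID to its model-species ortholog
                     (assumed one-to-one here for simplicity)
    """
    mapped = set()
    for tf, target in edges_crop:
        tf_m = crop_to_model.get(tf)
        target_m = crop_to_model.get(target)
        # Edges with an unmapped TF or target cannot be compared and are skipped.
        if tf_m is not None and target_m is not None:
            mapped.add((tf_m, target_m))
    return set(edges_model) & mapped


# Placeholder identifiers for illustration only.
model_edges = [("TF1", "G1"), ("TF1", "G2")]
crop_edges = [("cTF1", "cG1"), ("cTF1", "cG9")]
orthologs = {"cTF1": "TF1", "cG1": "G1"}
print(conserved_connections(model_edges, crop_edges, orthologs))
```

In practice, ortholog relationships are often many-to-many, so a real comparison would need a rule for handling gene families.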
The most recent advance in the field of nitrogen-signaling uncovered a master transcription factor, NLP7, which when mutated, affects >58% of the nitrogen-responsive genes in plants, yet can be shown to bind to only 10% of these targets. This conundrum represents a general problem in the field of transcription, and a particular problem in metabolic signaling, where TF binding is a poor indicator of system-wide gene regulation. In fact, most GRN studies have focused on determining when and how TF binding does, or does not, result in activation of its target genes. Such TF-binding approaches have missed the “dark matter” of signal transduction. The TARGET system has revealed that the largest class of genes responding to the perturbation of a TF and a signal it transduces are in fact not stably bound to the TF, and this class of genes which has the most relevance to the signal transduced has been missed in all TF studies to date. Several unique aspects of the system described enable the discovery of this large set of primary TF targets that are regulated by, but do not stably bind to the TF.
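The distinction drawn above between genes that are regulated and stably bound and genes that are regulated without detected binding can be expressed with simple set operations. The following is a minimal sketch under the assumption that two gene lists are available, one from expression profiling and one from a DNA-binding assay; the class labels and gene names are hypothetical.

```python
def classify_targets(regulated, bound):
    """Partition genes by regulation and binding evidence.

    regulated -- genes whose expression changes on TF perturbation
    bound     -- genes at which TF binding was detected
    """
    regulated, bound = set(regulated), set(bound)
    return {
        "regulated_and_bound": regulated & bound,      # the "gold standard" class
        "regulated_not_bound": regulated - bound,      # candidate transient targets
        "bound_not_regulated": bound - regulated,
    }


# Placeholder gene names for illustration only.
classes = classify_targets(
    regulated=["g1", "g2", "g3", "g4"],
    bound=["g1", "g5"],
)
for label in sorted(classes):
    print(label, sorted(classes[label]))
```

The "regulated_not_bound" class only flags candidates: a gene may fall into it because binding is transient or because the binding assay missed it.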
In one embodiment, the present invention is directed to a transgenic plant that ectopically expresses one or more touch and go (hit and run) transcription factor genes and exhibits a desired phenotype, wherein the said one or more genes comprises a polynucleotide that encodes At1g01060, At1g01720, At1g13300, At1g15100, At1g22070, At1g25550, At1g25560, At1g29160, At1g43160, At1g51700, At1g51950, At1g53910, At1g66140, At1g68670, At1g68840, At1g74660, At1g74840, At1g75390, At1g77450, At1g80840, At2g04880, At2g20570, At2g22430, At2g22850, At2g24570, At2g25000, At2g28510, At2g28550, At2g30250, At2g33710, At2g38470, At2g46830, At3g01560, At3g04070, At3g06590, At3g20770, At3g25790, At3g46130, At3g47620, At3g51920, At3g54620, At3g60490, At3g61150, At3g61890, At3g62420, At4g17490, At4g17500, At4g24240, At4g27410, At4g31800, At4g34590, At4g36540, At4g37180, At4g37260, At4g37610, At4g37730, At5g05410, At5g06800, At5G10030, At5g13080, At5g14540, At5g24800, At5g39610, At5g44190, At5g47230, At5g48655, At5g49450, At5g49520, At5g56270, At5g60850, At5g63790, At5G65210, or At5g65640. In another embodiment, the present invention is directed to a transgenic plant that ectopically expresses one or more touch and go (hit and run) transcription factor genes and exhibits a desired phenotype, wherein the said one or more genes comprises a polynucleotide that encodes At1g01060, At1g01720, At1g13300, At1g15100, At1g25550, At1g25560, At1g29160, At1g51700, At1g51950, At1g53910, At1g66140, At1g68670, At1g68840, At1g74660, At1g75390, At1g77450, At1g80840, At2g04880, At2g22850, At2g24570, At2g28510, At2g28550, At2g30250, At2g33710, At3g04070, At3g06590, At3g20770, At3g25790, At3g46130, At3g47620, At3g51920, At3g54620, At3g60490, At3g62420, At4g17490, At4g24240, At4g27410, At4g31800, At4g34590, At4g36540, At4g37180, At4g37610, At4g37730, At5g05410, At5g06800, At5G10030, At5g13080, At5g39610, At5g47230, At5g49520, At5g56270, At5g60850, At5g63790, At5G65210, or At5g65640.
In one embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker. In another embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the nucleic acid molecule is a DNA plasmid. In yet another embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the domain comprising an inducible nuclear localization signal is glucocorticoid receptor. In yet another embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the nucleic acid molecule is a DNA plasmid and the domain comprising an inducible nuclear localization signal is glucocorticoid receptor.
In one embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the selectable marker is a fluorescent selection marker. In another embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the nucleic acid molecule is a DNA plasmid, and wherein the selectable marker is a fluorescent selection marker. In yet another embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the domain comprising an inducible nuclear localization signal is glucocorticoid receptor. In yet another embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the nucleic acid molecule is a DNA plasmid, the domain comprising an inducible nuclear localization signal is glucocorticoid receptor, and the selectable marker is a fluorescent selection marker.
In one embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the selectable marker is green fluorescent protein, yellow fluorescent protein, red fluorescent protein, cyan fluorescent protein, or blue fluorescent protein. In another embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the nucleic acid molecule is a DNA plasmid, and wherein the selectable marker is a green fluorescent protein, yellow fluorescent protein, red fluorescent protein, cyan fluorescent protein, or blue fluorescent protein. In yet another embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the domain comprising an inducible nuclear localization signal is glucocorticoid receptor. In yet another embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the nucleic acid molecule is a DNA plasmid, the domain comprising an inducible nuclear localization signal is glucocorticoid receptor, and the selectable marker is green fluorescent protein, yellow fluorescent protein, red fluorescent protein, cyan fluorescent protein, or blue fluorescent protein.
In one embodiment, the present invention is directed to an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the isolated nucleic acid is DNA plasmid pBeaconRFP_GR, which comprises the nucleotide sequence of SEQ ID NO: 1.
In one embodiment, the present invention is directed to a host cell comprising an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker.
In one embodiment, the present invention is directed to a host cell comprising an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the host cell is a plant protoplast.
In one embodiment, the present invention is directed to a host cell comprising an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the host cell is a plant protoplast, and wherein the plant protoplast is derived from one of the following genera: Acorus, Aegilops, Allium, Amborella, Antirrhinum, Apium, Arabidopsis, Arachis, Beta, Betula, Brassica, Capsicum, Ceratopteris, Citrus, Cryptomeria, Cycas, Descurainia, Eschscholzia, Eucalyptus, Glycine, Gossypium, Hedyotis, Helianthus, Hordeum, Ipomoea, Lactuca, Linum, Liriodendron, Lotus, Lupinus, Lycopersicon, Medicago, Mesembryanthemum, Nicotiana, Nuphar, Pennisetum, Persea, Phaseolus, Physcomitrella, Picea, Pinus, Poncirus, Populus, Prunus, Robinia, Rosa, Saccharum, Schedonorus, Secale, Sesamum, Solanum, Sorghum, Stevia, Thellungiella, Theobroma, Triphysaria, Triticum, Vitis, Zea, or Zinnia.
In one embodiment, the present invention is directed to a host cell comprising an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the host cell is transfected with the nucleic acid molecule.
In one embodiment, the present invention is directed to a host cell comprising an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the host cell is transiently transfected with the nucleic acid molecule.
In one embodiment, the present invention is directed to a host cell comprising an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the host cell is derived from a genus that is different from the genus from which the transcription factor is derived.
In one embodiment, the present invention is directed to a host cell comprising an isolated nucleic acid molecule that encodes (a) a chimeric protein comprising a transcription factor fused to a domain comprising an inducible nuclear localization signal; and (b) an independently expressed selectable marker, wherein the host cell is a plant protoplast derived from the genus Arabidopsis and the transcription factor is derived from the genus Zea.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; and (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor.
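The comparison in step (iv) can be sketched, for illustration only, as a per-gene fold-change filter between cells with and without nuclear localization of the chimeric protein. The function name, the log2 fold-change cutoff, and the pseudocount are assumptions introduced here and are not specified by the embodiment; a real analysis would normally use biological replicates and a statistical test.

```python
import math


def candidate_targets(induced, uninduced, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose mRNA level is altered.

    induced   -- dict of gene -> expression level in cells treated with the inducing agent
    uninduced -- dict of gene -> expression level in cells not treated with the agent
    The cutoff and pseudocount are illustrative assumptions.
    """
    hits = {}
    for gene in induced.keys() & uninduced.keys():
        ratio = (induced[gene] + pseudocount) / (uninduced[gene] + pseudocount)
        log2fc = math.log2(ratio)
        if abs(log2fc) >= min_abs_log2fc:
            hits[gene] = log2fc
    return hits


# Placeholder expression values for illustration only.
induced_levels = {"g1": 7.0, "g2": 3.0, "g3": 0.0}
uninduced_levels = {"g1": 1.0, "g2": 3.0, "g3": 3.0}
print(candidate_targets(induced_levels, uninduced_levels))
```

Both induced and repressed genes pass the filter because the embodiment refers to an alteration in mRNA level, not specifically to an increase.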
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor; and (v) identifying direct target genes of the transcription factor using a method comprising: (a) contacting the host cells with cycloheximide; and (b) detecting the level of mRNA expressed in the host cells; wherein an alteration in the level of the mRNA expressed in the host cells treated with cycloheximide compared to the level of the mRNA expressed in the host cells not treated with cycloheximide indicates the identification of direct target genes of the transcription factor.
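Cycloheximide blocks translation, so a gene that still responds to TF induction in its presence is unlikely to depend on a newly synthesized intermediate regulator. One possible way to use this, sketched below, compares cells treated with the inducing agent plus cycloheximide against cells treated with cycloheximide alone. This particular comparison is an assumption made for illustration and is not the wording of the embodiment; the function names, cutoff, and pseudocount are likewise hypothetical.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two expression levels, with a pseudocount to avoid division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def direct_targets(inducer_plus_chx, chx_only, min_abs_log2fc=1.0):
    """Genes whose mRNA level still changes on induction while translation is blocked.

    inducer_plus_chx -- dict of gene -> expression with inducing agent and cycloheximide
    chx_only         -- dict of gene -> expression with cycloheximide alone
    The comparison design and cutoff are illustrative assumptions.
    """
    shared = inducer_plus_chx.keys() & chx_only.keys()
    return sorted(
        gene
        for gene in shared
        if abs(log2_fold_change(inducer_plus_chx[gene], chx_only[gene])) >= min_abs_log2fc
    )


# Placeholder expression values for illustration only.
with_inducer = {"g1": 15.0, "g2": 4.0}
without_inducer = {"g1": 3.0, "g2": 4.0}
print(direct_targets(with_inducer, without_inducer))
```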
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; and (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor, wherein the host cell is a plant protoplast.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor; and (v) identifying direct target genes of the transcription factor using a method comprising: (a) contacting the host cells with cycloheximide; and (b) detecting the level of mRNA expressed in the host cells; wherein an alteration in the level of the mRNA expressed in the host cells treated with cycloheximide compared to the level of the mRNA expressed in the host cells not treated with cycloheximide indicates the identification of direct target genes of the transcription factor, wherein the host cell is a plant protoplast.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; and (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor, wherein the host cell is a plant protoplast derived from one of the following genera: Acorus, Aegilops, Allium, Amborella, Antirrhinum, Apium, Arabidopsis, Arachis, Beta, Betula, Brassica, Capsicum, Ceratopteris, Citrus, Cryptomeria, Cycas, Descurainia, Eschscholzia, Eucalyptus, Glycine, Gossypium, Hedyotis, Helianthus, Hordeum, Ipomoea, Lactuca, Linum, Liriodendron, Lotus, Lupinus, Lycopersicon, Medicago, Mesembryanthemum, Nicotiana, Nuphar, Pennisetum, Persea, Phaseolus, Physcomitrella, Picea, Pinus, Poncirus, Populus, Prunus, Robinia, Rosa, Saccharum, Schedonorus, Secale, Sesamum, Solanum, Sorghum, Stevia, Thellungiella, Theobroma, Triphysaria, Triticum, Vitis, Zea, or Zinnia.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor; and (v) identifying direct target genes of the transcription factor using a method comprising: (a) contacting the host cells with cycloheximide; and (b) detecting the level of mRNA expressed in the host cells; wherein an alteration in the level of the mRNA expressed in the host cells treated with cycloheximide compared to the level of the mRNA expressed in the host cells not treated with cycloheximide indicates the identification of direct target genes of the transcription factor, wherein the host cell is a plant protoplast derived from one of the following genera: Acorus, Aegilops, Allium, Amborella, Antirrhinum, Apium, Arabidopsis, Arachis, Beta, Betula, Brassica, Capsicum, Ceratopteris, Citrus, Cryptomeria, Cycas, Descurainia, Eschscholzia, Eucalyptus, Glycine, Gossypium, Hedyotis, Helianthus, Hordeum, Ipomoea, Lactuca, Linum, Liriodendron, Lotus, Lupinus, Lycopersicon, Medicago, Mesembryanthemum, Nicotiana, Nuphar, Pennisetum, Persea, Phaseolus, Physcomitrella, Picea, Pinus, Poncirus, Populus, Prunus, Robinia, Rosa, Saccharum, Schedonorus, Secale, Sesamum, Solanum, Sorghum, Stevia, Thellungiella, Theobroma, Triphysaria, Triticum, Vitis, Zea, or Zinnia.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; and (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor, wherein the host cells are transiently transfected with the nucleic acid molecules.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; and (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor, wherein the agent that induces nuclear localization of the chimeric protein is dexamethasone.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; and (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor, wherein the step of detecting host cells that express the selectable marker is performed by Fluorescence Activated Cell Sorting (FACS).
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; and (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor, wherein the step of detecting the level of mRNA expressed in the host cells is performed by quantitative PCR, high throughput sequencing, or gene microarrays.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; and (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor, wherein the host cell is derived from a genus that is different from the genus from which the transcription factor is derived.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; and (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor, wherein the host cell is a plant protoplast derived from the genus Arabidopsis and the transcription factor is derived from the genus Zea.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting plant protoplasts with a DNA plasmid that encodes (a) a chimeric protein comprising a transcription factor fused to a glucocorticoid receptor; and (b) an independently expressed red fluorescent protein; (ii) detecting the plant protoplasts that express the red fluorescent protein by performing Fluorescence Activated Cell Sorting (FACS); (iii) contacting the plant protoplasts that express the red fluorescent protein with dexamethasone; and (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the plant protoplasts that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the plant protoplasts that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor; and (v) detecting transcription factor binding to genomic DNA in the host cells.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; and (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor, and wherein the transcription factor is not ABI3.
In one embodiment, the present invention is directed to a method for identifying target genes of a transcription factor comprising: (i) transfecting host cells with a nucleic acid molecule described above; (ii) detecting host cells that express the selectable marker; (iii) contacting the host cells that express the selectable marker with an agent that induces nuclear localization of the chimeric protein; (iv) detecting the level of mRNA expressed in the host cells, wherein an alteration in the level of the mRNA expressed in the host cells that have nuclear localization of the chimeric protein compared to the level of the mRNA expressed in the host cells that do not have nuclear localization of the chimeric protein indicates the identification of target genes of the transcription factor; and (v) detecting transcription factor binding to genomic DNA in the host cells, wherein the transcription factor is not ABI3.
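By way of non-limiting illustration only, the comparison recited in step (iv) of the foregoing embodiments may be sketched computationally as follows. The sketch assumes that mRNA levels have already been measured for host cells treated with the agent that induces nuclear localization and for untreated host cells. The function name `call_targets`, the gene names, the expression values, the pseudocount, and the fold-change cutoff are all hypothetical and are not taken from this specification; in practice, replicate measurements and a statistical test would ordinarily be used.

```python
import math

def call_targets(induced_levels, control_levels, min_abs_log2_fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose mRNA level is altered.

    induced_levels / control_levels map gene name -> mRNA level measured in
    cells with / without nuclear localization of the chimeric protein.
    The cutoff and pseudocount are illustrative assumptions.
    """
    targets = {}
    for gene, induced in induced_levels.items():
        control = control_levels.get(gene)
        if control is None:
            continue  # gene not measured in both conditions
        log2_fc = math.log2((induced + pseudocount) / (control + pseudocount))
        if abs(log2_fc) >= min_abs_log2_fc:
            targets[gene] = log2_fc
    return targets

# Hypothetical mRNA levels (arbitrary units), for illustration only.
induced = {"GENE_A": 31.0, "GENE_B": 10.0, "GENE_C": 3.0}
control = {"GENE_A": 7.0, "GENE_B": 9.0, "GENE_C": 15.0}

targets = call_targets(induced, control)
for gene, log2_fc in sorted(targets.items()):
    print(gene, log2_fc)
```

In this sketch both increases and decreases in mRNA level are reported, consistent with the term "alteration" used in the embodiments above.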
Units, prefixes, and symbols may be denoted in their SI accepted form. Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxyl orientation, respectively. Numeric ranges recited within the specification are inclusive of the numbers defining the range and include each integer within the defined range. Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes. Unless otherwise provided for, software, electrical, and electronics terms as used herein are as defined in The New IEEE Standard Dictionary of Electrical and Electronics Terms (5th edition, 1993). The terms defined below are more fully defined by reference to the specification as a whole.
As used herein, the term “agronomic” includes, but is not limited to, changes in root size, vegetative yield, seed yield or overall plant growth. Other agronomic properties include factors desirable to agricultural production and business.
By “amplified” is meant the construction of multiple copies of a nucleic acid sequence or multiple copies complementary to the nucleic acid sequence using at least one of the nucleic acid sequences as a template. Amplification systems include the polymerase chain reaction (PCR) system, ligase chain reaction (LCR) system, nucleic acid sequence based amplification (NASBA, Cangene, Mississauga, Ontario), Q-Beta Replicase systems, transcription-based amplification system (TAS), and strand displacement amplification (SDA). See, e.g., Diagnostic Molecular Microbiology: Principles and Applications, D. H. Persing et al., Ed., 1993, American Society for Microbiology, Washington, D.C. The product of amplification is termed an amplicon.
As used herein, “antisense orientation” includes reference to a duplex polynucleotide sequence that is operably linked to a promoter in an orientation where the antisense strand is transcribed. The antisense strand is sufficiently complementary to an endogenous transcription product such that translation of the endogenous transcription product is often inhibited.
In its broadest sense, a “delivery system,” as used herein, is any vehicle capable of facilitating delivery of a nucleic acid (or nucleic acid complex) to a cell and/or uptake of the nucleic acid by the cell.
The term “ectopic” is used herein to mean abnormal subcellular (e.g., switch between organellar and cytosolic localization), cell-type, tissue-type and/or developmental or temporal expression (e.g., light/dark) patterns for the particular gene or enzyme in question. Such ectopic expression does not necessarily exclude expression in tissues or developmental stages normal for said enzyme but rather entails expression in tissues or developmental stages not normal for the said enzyme.
By “endogenous nucleic acid sequence” and similar terms, it is intended that the sequences are natively present in the recipient plant genome and not substantially modified from their original form.
The term “exogenous nucleic acid sequence” as used herein refers to a nucleic acid foreign to the recipient plant host, or native to the host if the native nucleic acid is substantially modified from its original form. For example, the term includes a nucleic acid originating in the host species, where such sequence is operably linked to a promoter that differs from the natural or wild-type promoter.
By “encoding” or “encoded”, with respect to a specified nucleic acid, is meant comprising the information for translation into the specified protein. A nucleic acid encoding a protein may comprise non-translated sequences (e.g., introns) within translated regions of the nucleic acid, or may lack such intervening non-translated sequences (e.g., as in cDNA). The information by which a protein is encoded is specified by the use of codons. Typically, the amino acid sequence is encoded by the nucleic acid using the “universal” genetic code. However, variants of the universal code, such as are present in some plant, animal, and fungal mitochondria, the bacterium Mycoplasma capricolum, or the ciliate Macronucleus, may be used when the nucleic acid is expressed therein.
When the nucleic acid is prepared or altered synthetically, advantage can be taken of known codon preferences of the intended host where the nucleic acid is to be expressed. For example, although nucleic acid sequences of the present invention may be expressed in both monocotyledonous and dicotyledonous plant species, sequences can be modified to account for the specific codon preferences and GC content preferences of monocotyledons or dicotyledons as these preferences have been shown to differ (Murray et al., 1989, Nucl. Acids Res. 17: 477-498). Thus, the maize preferred codon for a particular amino acid may be derived from known gene sequences from maize. Maize codon usage for 28 genes from maize plants is listed in Table 4 of Murray et al., supra.
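By way of non-limiting illustration only, codon preferences of an intended host may be estimated by tallying codon usage across known coding sequences from that host. The following sketch assumes the standard (“universal”) genetic code and in-frame coding sequences; the function name `preferred_codons` and the two toy sequences are hypothetical and are not maize sequences or data from Murray et al.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def preferred_codons(coding_sequences):
    """Return {amino acid: most frequently used codon} for in-frame sequences."""
    counts = defaultdict(Counter)
    for cds in coding_sequences:
        cds = cds.upper().replace("U", "T")
        # Walk the sequence codon by codon, ignoring any trailing partial codon.
        for pos in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[pos:pos + 3]
            amino_acid = CODON_TABLE.get(codon)
            if amino_acid is not None:  # skip codons with ambiguous bases
                counts[amino_acid][codon] += 1
    return {aa: usage.most_common(1)[0][0] for aa, usage in counts.items()}

# Hypothetical toy coding sequences, for illustration only.
toy_sequences = ["ATGGCCGCCGCGTAA", "ATGGCCCTGCTGCTCTGA"]
preferred = preferred_codons(toy_sequences)
print(preferred["A"], preferred["L"])
```

A synthetic sequence could then be designed by substituting, for each amino acid, the codon reported for the intended host, subject to any GC content preference.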
By “fragment” is intended a portion of the nucleotide sequence. Fragments of the modulator sequence will generally retain the biological activity of the native suppressor protein. Alternatively, fragments of the targeting sequence may or may not retain biological activity. Such targeting sequences may be useful as hybridization probes, as antisense constructs, or as co-suppression sequences. Thus, fragments of a nucleotide sequence may range from at least about 20 nucleotides, about 50 nucleotides, about 100 nucleotides, and up to the full-length nucleotide sequence of the invention.
As used herein, “full-length sequence” in reference to a specified polynucleotide or its encoded protein means having the entire amino acid sequence of a native (non-synthetic), endogenous, biologically active form of the specified protein. Methods to determine whether a sequence is full-length are well known in the art, including such exemplary techniques as northern or western blots, primer extension, S1 protection, and ribonuclease protection. See, e.g., Plant Molecular Biology: A Laboratory Manual, Clark, Ed., 1997, Springer-Verlag, Berlin. Comparison to known full-length homologous (orthologous and/or paralogous) sequences can also be used to identify full-length sequences of the present invention. Additionally, consensus sequences typically present at the 5′ and 3′ untranslated regions of mRNA aid in the identification of a polynucleotide as full-length. For example, the consensus sequence ANNNNAUGG, where the AUG codon represents the N-terminal methionine, aids in determining whether the polynucleotide has a complete 5′ end. Consensus sequences at the 3′ end, such as polyadenylation sequences, aid in determining whether the polynucleotide has a complete 3′ end.
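By way of non-limiting illustration only, the presence of the 5′ consensus sequence ANNNNAUGG may be checked with a simple pattern search. The function name `find_start_context` and the example sequences below are hypothetical. A match is merely an aid, as stated above, and does not by itself establish that a polynucleotide is full-length.

```python
import re

# A, any four nucleotides, the AUG start codon, then G (ANNNNAUGG).
START_CONTEXT = re.compile(r"A[ACGU]{4}AUGG")

def find_start_context(sequence):
    """Return the 0-based index of the AUG within the first ANNNNAUGG match, or None.

    DNA input is accepted: T is converted to U before searching.
    """
    rna = sequence.upper().replace("T", "U")
    match = START_CONTEXT.search(rna)
    if match is None:
        return None
    return match.start() + 5  # AUG begins five positions after the leading A

# Hypothetical sequences, for illustration only.
print(find_start_context("GGCACCGCAUGGCU"))  # consensus present
print(find_start_context("GGCUCCGCAUGGCU"))  # leading A absent
```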
The term “gene activity” refers to one or more steps involved in gene expression, including transcription, translation, and the functioning of the protein encoded by the gene.
The term “genetic modification” as used herein refers to the introduction of one or more exogenous nucleic acid sequences as well as regulatory sequences, into one or more plant cells, which in certain cases can generate whole, sexually competent, viable plants. The term “genetically modified” or “genetically engineered” as used herein refers to a plant which has been generated through the aforementioned process. Genetically modified plants of the invention are capable of self-pollinating or cross-pollinating with other plants of the same species so that the foreign gene, carried in the germ line, can be inserted into or bred into agriculturally useful plant varieties.
As used herein, “heterologous” in reference to a nucleic acid is a nucleic acid that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. For example, a promoter operably linked to a heterologous structural gene is from a species different from that from which the structural gene was derived, or, if from the same species, one or both are substantially modified from their original form. A heterologous protein may originate from a foreign species or, if from the same species, is substantially modified from its original form by deliberate human intervention.
By “host cell” is meant a cell that contains a vector and supports the replication and/or expression of the vector. Host cells may be prokaryotic cells such as E. coli, or eukaryotic cells such as yeast, plant, insect, amphibian, or mammalian cells. Preferably, host cells are monocotyledonous or dicotyledonous plant cells. A particularly preferred monocotyledonous host cell is a maize host cell.
The term “introduced” in the context of inserting a nucleic acid into a cell, means “transfection” or “transformation” or “transduction” and includes reference to the incorporation of a nucleic acid into a eukaryotic or prokaryotic cell where the nucleic acid may be incorporated into the genome of the cell (e.g., chromosome, plasmid, plastid or mitochondrial DNA), converted into an autonomous replicon, or transiently expressed (e.g., transfected mRNA).
The term “isolated” refers to material, such as a nucleic acid or a protein, which is: (1) substantially or essentially free from components which normally accompany or interact with it as found in its natural environment (the isolated material optionally comprises material not found with the material in its natural environment); or (2) if the material is in its natural environment, the material has been synthetically altered or synthetically produced by deliberate human intervention and/or placed at a different location within the cell. The synthetic alteration or creation of the material can be performed on the material within or apart from its natural state. For example, a naturally-occurring nucleic acid becomes an isolated nucleic acid if it is altered or produced by non-natural, synthetic methods, or if it is transcribed from DNA which has been altered or produced by non-natural, synthetic methods. See, e.g., Compounds and Methods for Site Directed Mutagenesis in Eukaryotic Cells, Kmiec, U.S. Pat. No. 5,565,350; In vivo Homologous Sequence Targeting in Eukaryotic Cells; Zarling et al., PCT/US93/03868. The isolated nucleic acid may also be produced by the synthetic re-arrangement (“shuffling”) of a part or parts of one or more allelic forms of the gene of interest. Likewise, a naturally-occurring nucleic acid (e.g., a promoter) becomes isolated if it is introduced to a different locus of the genome. Nucleic acids which are “isolated,” as defined herein, are also referred to as “heterologous” nucleic acids.
As used herein, the term “marker” refers to a gene encoding a trait or a phenotype which permits the selection of, or the screening for, a plant or plant cell containing the marker.
As used herein, “nucleic acid” includes reference to a deoxyribonucleotide or ribonucleotide polymer, or chimeras thereof, in either single- or double-stranded form, and unless otherwise limited, encompasses known analogues having the essential nature of natural nucleotides in that they hybridize to single-stranded nucleic acids in a manner similar to naturally occurring nucleotides (e.g., peptide nucleic acids).
By “nucleic acid library” is meant a collection of isolated DNA or RNA molecules which comprise and substantially represent the entire transcribed fraction of a genome of a specified organism or of a tissue from that organism. Construction of exemplary nucleic acid libraries, such as genomic and cDNA libraries, is taught in standard molecular biology references such as Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, Vol. 152, Academic Press, Inc., San Diego, Calif. (Berger); Sambrook et al., 1989, Molecular Cloning—A Laboratory Manual, 2nd ed., Vol. 1-3; and Current Protocols in Molecular Biology, F. M. Ausubel et al., Eds., 1994, Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.
As used herein “operably linked” includes reference to a functional linkage between a promoter and a second sequence, wherein the promoter sequence initiates and mediates transcription of the DNA sequence corresponding to the second sequence. Generally, operably linked means that the nucleic acid sequences being linked are contiguous and, where necessary to join two protein coding regions, contiguous and in the same reading frame.
The term “orthologous” as used herein describes a relationship between two or more polynucleotides or proteins. Two polynucleotides or proteins are “orthologous” to one another if they are derived from a common ancestral gene and serve a similar function in different organisms. In general, orthologous polynucleotides or proteins will have similar catalytic functions (when they encode enzymes) or will serve similar structural functions (when they encode proteins or RNA that form part of the ultrastructure of a cell).
The term “overexpression” is used herein to mean above the normal expression level in the particular tissue, cell and/or developmental or temporal stage for said enzyme/expressed protein product.
As used herein, the term “plant” is used in its broadest sense, including, but not limited to, any species of woody, ornamental or decorative, crop or cereal, fruit or vegetable plant, and algae (e.g., Chlamydomonas reinhardtii). Non-limiting examples of plants include plants from the genus Arabidopsis or the genus Oryza. Other examples include plants from the genera Acorus, Aegilops, Allium, Amborella, Antirrhinum, Apium, Arachis, Beta, Betula, Brassica, Capsicum, Ceratopteris, Citrus, Cryptomeria, Cycas, Descurainia, Eschscholzia, Eucalyptus, Glycine, Gossypium, Hedyotis, Helianthus, Hordeum, Ipomoea, Lactuca, Linum, Liriodendron, Lotus, Lupinus, Lycopersicon, Medicago, Mesembryanthemum, Nicotiana, Nuphar, Pennisetum, Persea, Phaseolus, Physcomitrella, Picea, Pinus, Poncirus, Populus, Prunus, Robinia, Rosa, Saccharum, Schedonorus, Secale, Sesamum, Solanum, Sorghum, Stevia, Thellungiella, Theobroma, Triphysaria, Triticum, Vitis, Zea, or Zinnia. Plants included in the invention are any plants amenable to transformation techniques, including gymnosperms and angiosperms, both monocotyledons and dicotyledons. Examples of monocotyledonous angiosperms include, but are not limited to, asparagus, field and sweet corn, barley, wheat, rice, sorghum, onion, pearl millet, rye and oats and other cereal grains. Examples of dicotyledonous angiosperms include, but are not limited to, tomato, tobacco, cotton, rapeseed, field beans, soybeans, peppers, lettuce, peas, alfalfa, clover, cole crops or Brassica oleracea (e.g., cabbage, broccoli, cauliflower, Brussels sprouts), radish, carrot, beets, eggplant, spinach, cucumber, squash, melons, cantaloupe, sunflowers and various ornamentals. Examples of woody species include poplar, pine, sequoia, cedar, oak, etc. Still other examples of plants include, but are not limited to, wheat, cauliflower, tomato, tobacco, corn, petunia, trees, etc.
As used herein, the term “cereal crop” is used in its broadest sense. The term includes, but is not limited to, any species of grass or grain plant (e.g., barley, corn, oats, rice, wild rice, rye, wheat, millet, sorghum, triticale, etc.), or non-grass plants (e.g., buckwheat, flax, legumes or soybeans, etc.). As used herein, the term “crop” or “crop plant” is used in its broadest sense. The term includes, but is not limited to, any species of plant or algae edible by humans or used as a feed for animals or used, or consumed by humans, or any plant or algae used in industry or commerce. As used herein, the term “plant” also refers to either a whole plant, a plant part, or organs (e.g., leaves, stems, roots, etc.), a plant cell, or a group of plant cells, such as plant tissue, plant seeds and progeny of same. Plantlets are also included within the meaning of “plant.” The class of plants which can be used in the methods of the invention is generally as broad as the class of higher plants amenable to transformation techniques, including both monocotyledonous and dicotyledonous plants.
The term “plant cell” as used herein refers to protoplasts, gamete producing cells, and cells which regenerate into whole plants. Plant cell, as used herein, further includes, without limitation, cells obtained from or found in: seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, and microspores. Plant cells can also be understood to include modified cells, such as protoplasts, obtained from the aforementioned tissues.
As used herein, “polynucleotide” includes reference to a deoxyribopolynucleotide, ribopolynucleotide, or chimeras or analogs thereof that have the essential nature of a natural deoxy- or ribo-nucleotide in that they hybridize, under stringent hybridization conditions, to substantially the same nucleotide sequence as naturally occurring nucleotides and/or allow translation into the same amino acid(s) as the naturally occurring nucleotide(s). A polynucleotide can be full-length or a subsequence of a native or heterologous structural or regulatory gene. Unless otherwise indicated, the term includes reference to the specified sequence as well as the complementary sequence thereof. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritylated bases, to name just two examples, are polynucleotides as the term is used herein. It will be appreciated that a great variety of modifications have been made to DNA and RNA that serve many useful purposes known to those of skill in the art. The term polynucleotide as it is employed herein embraces such chemically-, enzymatically- or metabolically-modified forms of polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including among other things, simple and complex cells.
The terms “polypeptide”, “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residue is an artificial chemical analogue of a corresponding naturally-occurring amino acid, as well as to naturally-occurring amino acid polymers. The essential nature of such analogues of naturally-occurring amino acids is that, when incorporated into a protein, that protein is specifically reactive to antibodies elicited to the same protein but consisting entirely of naturally occurring amino acids. The terms “polypeptide”, “peptide” and “protein” are also inclusive of modifications including, but not limited to, glycosylation, lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation. Further, this invention contemplates the use of both the methionine-containing and the methionine-less amino terminal variants of the protein of the invention.
As used herein, “promoter” includes reference to a region of DNA upstream from the start of transcription and involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. A “plant promoter” is a promoter capable of initiating transcription in plant cells whether or not its origin is a plant cell. Exemplary plant promoters include, but are not limited to, those that are obtained from plants, plant viruses, and bacteria which comprise genes expressed in plant cells, such as Agrobacterium or Rhizobium. Examples of promoters under developmental control include promoters that preferentially initiate transcription in certain tissues, such as leaves, roots, or seeds. Such promoters are referred to as “tissue preferred.” Promoters which initiate transcription only in certain tissue are referred to as “tissue specific.” A “cell type” specific promoter primarily drives expression in certain cell types in one or more organs, for example, vascular cells in roots or leaves. An “inducible” or “repressible” promoter is a promoter which is under environmental control. Examples of environmental conditions that may affect transcription by inducible promoters include anaerobic conditions or the presence of light. Tissue specific, tissue preferred, cell type specific, and inducible promoters represent the class of “non-constitutive” promoters. A “constitutive” promoter is a promoter which is active under most environmental conditions.
As used herein “recombinant” includes reference to a cell or vector that has been modified by the introduction of a heterologous nucleic acid, or to a cell derived from a cell so modified. Thus, for example, recombinant cells express genes that are not found in identical form within the native (non-recombinant) form of the cell, or exhibit altered expression of native genes, as a result of deliberate human intervention. The term “recombinant” as used herein does not encompass the alteration of the cell or vector by events (e.g., spontaneous mutation, natural transformation, transduction, or transposition) occurring without deliberate human intervention.
As used herein, a “recombinant expression cassette” is a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements which permit transcription of a particular nucleic acid in a host cell. The recombinant expression cassette can be incorporated into a plasmid, chromosome, mitochondrial DNA, plastid DNA, virus, or nucleic acid fragment. Typically, the recombinant expression cassette portion of an expression vector includes, among other sequences, a nucleic acid to be transcribed, and a promoter.
The term “regulatory sequence” as used herein refers to a nucleic acid sequence capable of controlling the transcription of an operably associated gene. Therefore, placing a gene under the regulatory control of a promoter or a regulatory element means positioning the gene such that the expression of the gene is controlled by the regulatory sequence(s). Because a microRNA (miRNA) binds to its target, it provides a post-transcriptional mechanism for regulating levels of mRNA. Thus, regulation is not limited to transcription factors: an miRNA can also be considered a “regulatory sequence” herein.
The terms “residue,” “amino acid residue,” and “amino acid” are used interchangeably herein to refer to an amino acid that is incorporated into a protein, polypeptide, or peptide (collectively “protein”). The amino acid may be a naturally occurring amino acid and, unless otherwise limited, may encompass non-natural analogs of natural amino acids that can function in a similar manner as naturally occurring amino acids.
The term “tissue-specific promoter” refers to a polynucleotide sequence that specifically binds to transcription factors expressed primarily or only in such specific tissue.
The term “selectively hybridizes” includes reference to hybridization, under stringent hybridization conditions, of a nucleic acid sequence to a specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid sequences and to the substantial exclusion of non-target nucleic acids. Selectively hybridizing sequences typically have at least about 80% sequence identity, preferably 90% sequence identity, and most preferably 100% sequence identity (i.e., complementary) with each other.
As used herein, a “stem-loop motif” or a “stem-loop structure,” sometimes also referred to as a “hairpin structure,” is given its ordinary meaning in the art, i.e., in reference to a single nucleic acid molecule having a secondary structure that includes a double-stranded region (a “stem” portion) composed of two regions of nucleotides (of the same molecule) forming either side of the double-stranded portion, and at least one “loop” region, comprising uncomplemented nucleotides (i.e., a single-stranded region).
The term “stringent conditions” or “stringent hybridization conditions” includes reference to conditions under which a probe will selectively hybridize to its target sequence, to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified which are 100% complementary to the probe (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). Generally, a probe is less than about 1000 nucleotides in length, optionally less than 500 nucleotides in length.
Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37° C., and a wash in 1× to 2× SSC (20× SSC = 3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1× SSC at 55 to 60° C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1× SSC at 60 to 65° C.
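By way of non-limiting illustration only, the composition of a diluted SSC wash solution follows from the 20× SSC definition given above. In the sketch below, the function name `ssc_components` is hypothetical, and the total sodium figure rests on the assumption that each trisodium citrate formula unit contributes three sodium ions in addition to the sodium from NaCl.

```python
def ssc_components(fold):
    """Return molar concentrations for an SSC solution of the given strength.

    Based on 20x SSC = 3.0 M NaCl / 0.3 M trisodium citrate.
    """
    nacl = 3.0 * fold / 20.0
    citrate = 0.3 * fold / 20.0
    # Assumption: trisodium citrate contributes three Na+ per formula unit.
    total_sodium = nacl + 3.0 * citrate
    return {
        "NaCl_M": nacl,
        "trisodium_citrate_M": citrate,
        "total_Na_M": total_sodium,
    }

# Wash solutions named in the exemplary stringency conditions.
for fold in (2.0, 1.0, 0.5, 0.1):
    print(fold, ssc_components(fold))
```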
Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl, 1984, Anal. Biochem., 138:267-284: Tm = 81.5° C. + 16.6 (log M) + 0.41 (% GC) − 0.61 (% form) − 500/L; where M is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe. Tm is reduced by about 1° C. for each 1% of mismatching; thus, Tm, hybridization and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with ≥90% identity are sought, the Tm can be decreased 10° C. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4° C. lower than the thermal melting point (Tm); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10° C. lower than the thermal melting point (Tm); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20° C. lower than the thermal melting point (Tm). Using the equation, hybridization and wash compositions, and desired Tm, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a Tm of less than 45° C. (aqueous solution) or 32° C. (formamide solution), it is preferred to increase the SSC concentration so that a higher temperature can be used. An extensive guide to the hybridization of nucleic acids is found in Tijssen, 1993, Laboratory Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Acid Probes, Part I, Chapter 2 “Overview of principles of hybridization and the strategy of nucleic acid probe assays”, Elsevier, N.Y.; and Current Protocols in Molecular Biology, Chapter 2, Ausubel et al., Eds., 1995, Greene Publishing and Wiley-Interscience, New York. Hybridization and/or wash conditions can be applied for at least 10, 30, 60, 90, 120, or 240 minutes.
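By way of illustration only, the Tm approximation of Meinkoth and Wahl and the adjustment of about 1° C. per 1% of mismatching may be sketched as follows. The function names and the example inputs (a 500 base pair hybrid with 50% GC content) are assumptions chosen for illustration; the 1 M monovalent cation and 50% formamide values correspond to the exemplary high stringency hybridization buffer described above.

```python
import math


def meinkoth_wahl_tm(na_molar, gc_percent, formamide_percent, length_bp):
    """Approximate Tm (deg C) of a DNA-DNA hybrid:
    Tm = 81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(%form) - 500/L
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)


def adjusted_tm(tm, min_identity_percent):
    """Lower Tm by about 1 deg C for each 1% of mismatching tolerated."""
    return tm - (100.0 - min_identity_percent)


# Illustrative inputs (assumed): 1 M Na+, 50% GC, 50% formamide, 500 bp hybrid.
tm = meinkoth_wahl_tm(na_molar=1.0, gc_percent=50.0,
                      formamide_percent=50.0, length_bp=500)
print(round(tm, 1))                     # about 70.5
print(round(adjusted_tm(tm, 90.0), 1))  # about 60.5 when >=90% identity is sought
```

The result is an approximation only; hybridization and wash temperatures would then be selected relative to this value as described above.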
As used herein, “transcription factor” (“TF”) includes reference to a protein which interacts with a DNA regulatory element to affect expression of a structural gene or expression of a second regulatory gene. “Transcription factor” may also refer to the DNA encoding said transcription factor protein. The function of a transcription factor may include activation or repression of transcription initiation.
The term “transfection,” as used herein, refers to the introduction of a nucleic acid into a cell. The term “transient transfection,” as used herein, refers to the introduction of a nucleic acid into a cell, wherein the nucleic acids introduced into the transfected cell are not permanently incorporated into the cellular genome.
As used herein, “transgenic plant” includes reference to a plant which comprises within its genome a heterologous polynucleotide or which lacks, by means of homologous recombination or other methods, a native polynucleotide. Generally, the heterologous polynucleotide is stably integrated within the genome such that the polynucleotide is passed on to successive generations. The heterologous polynucleotide may be integrated into the genome alone or as part of a recombinant expression cassette. “Transgenic” is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of heterologous nucleic acid or lacks a native nucleic acid including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic. The term “transgenic” as used herein does not encompass the alteration of the genome (chromosomal or extra-chromosomal) by conventional plant breeding methods or by naturally occurring events such as random cross-fertilization, non-recombinant viral infection, non-recombinant bacterial transformation, non-recombinant transposition, or spontaneous mutation.
The term “underexpression” is used herein to mean below the normal expression level in the particular tissue, cell and/or developmental or temporal stage for said enzyme/expressed protein product.
As used herein, “vector” includes reference to a nucleic acid used in introduction of a polynucleotide of the present invention into a host cell. Vectors are often replicons. Expression vectors permit transcription of a nucleic acid inserted therein.
The following terms are used to describe the sequence relationships between a polynucleotide/polypeptide of the present invention with a reference polynucleotide/polypeptide: (a) “reference sequence”, (b) “comparison window”, (c) “sequence identity”, and (d) “percentage of sequence identity”.
(a) As used herein, “reference sequence” is a defined sequence used as a basis for sequence comparison with a polynucleotide/polypeptide of the present invention. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence.
(b) As used herein, “comparison window” includes reference to a contiguous and specified segment of a polynucleotide/polypeptide sequence, wherein the polynucleotide/polypeptide sequence may be compared to a reference sequence and wherein the portion of the polynucleotide/polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides/amino acids residues in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide/polypeptide sequence, a gap penalty is typically introduced and is subtracted from the number of matches.
Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, 1981, Adv. Appl. Math. 2: 482; by the homology alignment algorithm of Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443; by the search for similarity method of Pearson and Lipman, 1988, Proc. Natl. Acad. Sci. 85: 2444; by computerized implementations of these algorithms, including, but not limited to: CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, Calif.; GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis., USA; the CLUSTAL program is well described by Higgins and Sharp, 1988, Gene 73: 237-244; Higgins and Sharp, 1989, CABIOS 5: 151-153; Corpet et al., 1988, Nucleic Acids Research 16: 10881-90; Huang et al., 1992, Computer Applications in the Biosciences 8: 155-65; and Pearson et al., 1994, Methods in Molecular Biology 24: 307-331.
The BLAST family of programs which can be used for database similarity searches includes: BLASTN for nucleotide query sequences against nucleotide database sequences; BLASTX for nucleotide query sequences against protein database sequences; BLASTP for protein query sequences against protein database sequences; TBLASTN for protein query sequences against nucleotide database sequences; and TBLASTX for nucleotide query sequences against nucleotide database sequences. See, Current Protocols in Molecular Biology, Chapter 19, Ausubel et al., Eds., 1995, Greene Publishing and Wiley-Interscience, New York.
Software for performing BLAST analyses is publicly available, e.g., through the National Center for Biotechnology Information (world-wide web at ncbi.nlm.nih.gov). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold. These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=−4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, 1992, Proc. Natl. Acad. Sci. USA 89:10915).
In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, 1993, Proc. Natl. Acad. Sci. USA 90:5873-5877). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
BLAST searches assume that proteins can be modeled as random sequences. However, many real proteins comprise regions of nonrandom sequences which may be homopolymeric tracts, short-period repeats, or regions enriched in one or more amino acids. Such low-complexity regions may be aligned between unrelated proteins even though other regions of the protein are entirely dissimilar. A number of low-complexity filter programs can be employed to reduce such low-complexity alignments. For example, the SEG (Wootton and Federhen, 1993, Comput. Chem., 17:149-163) and XNU (Claverie and States, 1993, Comput. Chem., 17:191-201) low-complexity filters can be employed alone or in combination.
Unless otherwise stated, nucleotide and protein identity/similarity values provided herein are calculated using GAP (GCG Version 10) under default values.
GAP (Global Alignment Program) can also be used to compare a polynucleotide or polypeptide of the present invention with a reference sequence. GAP uses the algorithm of Needleman and Wunsch (J. Mol. Biol. 48: 443-453,1970) to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. GAP considers all possible alignments and gap positions and creates the alignment with the largest number of matched bases and the fewest gaps. It allows for the provision of a gap creation penalty and a gap extension penalty in units of matched bases. GAP must make a profit of gap creation penalty number of matches for each gap it inserts. If a gap extension penalty greater than zero is chosen, GAP must, in addition, make a profit for each gap inserted of the length of the gap times the gap extension penalty. Default gap creation penalty values and gap extension penalty values in Version 10 of the Wisconsin Genetics Software Package for protein sequences are 8 and 2, respectively. For nucleotide sequences the default gap creation penalty is 50 while the default gap extension penalty is 3. The gap creation and gap extension penalties can be expressed as an integer selected from the group of integers consisting of from 0 to 100. Thus, for example, the gap creation and gap extension penalties can each independently be: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60 or greater.
GAP presents one member of the family of best alignments. There may be many members of this family, but no other member has a better quality. GAP displays four figures of merit for alignments: Quality, Ratio, Identity, and Similarity. The Quality is the metric maximized in order to align the sequences. Ratio is the quality divided by the number of bases in the shorter segment. Percent Identity is the percent of the symbols that actually match. Percent Similarity is the percent of the symbols that are similar. Symbols that are across from gaps are ignored. A similarity is scored when the scoring matrix value for a pair of symbols is greater than or equal to 0.50, the similarity threshold. The scoring matrix used in Version 10 of the Wisconsin Genetics Software Package is BLOSUM62 (see Henikoff & Henikoff, 1992, Proc. Natl. Acad. Sci. USA 89:10915).
Multiple alignment of the sequences can be performed using the CLUSTAL method of alignment (Higgins and Sharp, 1989, CABIOS. 5:151-153) with the default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10). Default parameters for pairwise alignments using the CLUSTAL method are KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5.
(c) As used herein, “sequence identity” or “identity” in the context of two nucleic acid or polypeptide sequences includes reference to the residues in the two sequences which are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. Where sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences which differ by such conservative substitutions are said to have “sequence similarity” or “similarity”. Means for making this adjustment are well-known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., according to the algorithm of Myers and Miller, 1988, Computer Applic. Biol. Sci., 4:11-17, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif., USA).
Polynucleotide sequences having “substantial identity” are those sequences having at least about 50%, 60% sequence identity, generally 70% sequence identity, preferably at least 80%, more preferably at least 90%, and most preferably at least 95%, compared to a reference sequence using one of the alignment programs described above. Preferably sequence identity is determined using the default parameters determined by the program. Substantial identity of amino acid sequences generally means sequence identity of at least 50%, more preferably at least 70%, 80%, 90%, and most preferably at least 95%. Nucleotide sequences are generally substantially identical if the two molecules hybridize to each other under stringent conditions.
(d) As used herein, “percentage of sequence identity” means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
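By way of illustration only, the calculation of percentage of sequence identity described in (d) may be sketched as follows for two sequences that have already been optimally aligned. The function name, the use of “-” as the gap symbol, and the example sequences are assumptions made for illustration; the alignment itself would be produced by one of the programs described above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over a comparison window.

    Both inputs are rows of one pairwise alignment (equal length, gaps
    written as `gap`). Matched positions are divided by the total number
    of positions in the window, then multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != gap)
    return 100.0 * matched / len(aligned_a)


# Hypothetical 9-position window with one gap and one substitution.
print(round(percent_identity("ACGT-ACGT", "ACGTTACCT"), 2))  # 77.78
```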
As used herein, the term “transgenic,” when used in reference to a plant (i.e., a “transgenic plant”) refers to a plant that contains at least one heterologous gene in one or more of its cells, or that lacks at least one native gene, such as by means of homologous recombination, in one or more of its cells.
As used herein, “substantially complementary,” in reference to nucleic acids, refers to sequences of nucleotides (which may be on the same nucleic acid molecule or on different molecules) that are sufficiently complementary to be able to interact with each other in a predictable fashion, for example, producing a generally predictable secondary structure, such as a stem-loop motif. In some cases, two sequences of nucleotides that are substantially complementary may be at least about 75% complementary to each other, and in some cases, are at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, at least about 99.5%, or 100% complementary to each other. In some cases, two molecules that are sufficiently complementary may have a maximum of 40 mismatches (e.g., where one base of the nucleic acid sequence does not have a complementary partner on the other nucleic acid sequence, for example, due to additions, deletions, substitutions, bulges, etc.), and in other cases, the two molecules may have a maximum of 30 mismatches, 20 mismatches, 10 mismatches, or 7 mismatches. In still other cases, the two sufficiently complementary nucleic acid sequences may have a maximum of 0, 1, 2, 3, 4, 5, or 6 mismatches.
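By way of illustration only, the number of mismatches and the percent complementarity between two nucleotide sequences may be sketched as follows. The sketch assumes two DNA sequences of equal length, each written 5′ to 3′ and paired in antiparallel orientation; it does not model additions, deletions, or bulges. The function names and example sequences are hypothetical.

```python
# Watson-Crick partners for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def count_mismatches(strand_a, strand_b):
    """Count positions where strand_a lacks a complementary partner on
    strand_b. Both strands are given 5'->3', so strand_b is read in
    reverse to pair the strands antiparallel."""
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(1 for x, y in zip(strand_a, reversed(strand_b))
               if COMPLEMENT.get(x) != y)


def percent_complementary(strand_a, strand_b):
    """Percentage of positions in strand_a that are base-paired."""
    mismatches = count_mismatches(strand_a, strand_b)
    return 100.0 * (len(strand_a) - mismatches) / len(strand_a)


# Hypothetical 8-nucleotide stem with a single mismatch.
print(count_mismatches("GATTACAG", "CTGTTATC"))       # 1
print(percent_complementary("GATTACAG", "CTGTTATC"))  # 87.5
```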
By “variants” is intended substantially similar sequences. For “variant” nucleotide sequences, conservative variants include those sequences that, because of the degeneracy of the genetic code, encode the amino acid sequence of the modulator of the invention. Variant nucleotide sequences include synthetically derived sequences, such as those generated, for example, using site-directed mutagenesis. Generally, variants of a particular nucleotide sequence of the invention will have at least about 40%, 50%, 60%, 65%, 70%, generally at least about 75%, 80%, 85%, preferably at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, and more preferably at least about 98%, 99% or more sequence identity to that particular nucleotide sequence as determined by sequence alignment programs described elsewhere herein using default parameters. By “variant” protein is intended a protein derived from the native protein by deletion or addition of one or more amino acids to the N-terminal and/or C-terminal end of the native protein; deletion or addition of one or more amino acids at one or more sites in the native protein; or substitution of one or more amino acids at one or more sites in the native protein. Such variants may result from, for example, genetic polymorphism or human manipulation. Conservative amino acid substitutions will generally result in variants that retain biological function.
As used herein, the term “yield” or “plant yield” refers to increased plant growth, and/or increased biomass. In one embodiment, increased yield results from increased growth rate and increased root size. In another embodiment, increased yield is derived from shoot growth. In still another embodiment, increased yield is derived from fruit growth.
The present invention involves plant genes that are regulated by transcription factors that control the gene network response to an environmental perturbation or signal (e.g., nitrogen, water, sunlight, oxygen, temperature). These genes respond rapidly to their environment, but surprisingly, there is no evidence of direct transcription factor interaction. More particularly, the large class of genes described herein (and exemplified in Tables 1, 2, 19, 20, and 23) respond to the perturbation of a regulatory transcription factor and the signal it transduces, but in fact are not stably bound to the transcription factor, and yet are most relevant to the signal induced in vivo—in other words, they represent members of the “dark matter” of metabolic regulatory circuits. In some embodiments, these “response genes” are transgenically manipulated so that their respective gene products are either overexpressed or underexpressed in a plant in order to confer a desired phenotype. In other embodiments, the genes encoding the transcription factors regulating these “response genes” are transgenically manipulated so that their respective gene products are either overexpressed or underexpressed in a plant in order to confer a desired phenotype. In a particular embodiment, the desired phenotype is increased nitrogen usage, which may be desired to enhance plant growth. In another embodiment, the desired phenotype is increased nitrogen storage, which may be desired to enhance the storage of nitrogen in seeds of seed crops. In yet other embodiments, the desired phenotype is
In certain embodiments, the transgenically manipulated response gene is one or more of the following (also listed in Tables 1 and 2): At3g28510, At1g73260, At1g22400, At1g80460, At1g05570, At5g22570, At5g65110, At1g24440, At5g04310, At3g16150, At4g13430, At1g08090, At5g57655, At1g62660, At3g14050, At5g18670, At1g15380, At5g56870, or At2g43400.
In certain embodiments, the transgenically manipulated TF is one or more of the following (also listed in Table 3): At1g01060, At1g01720, At1g13300, At1g15100, At1g22070, At1g25550, At1g25560, At1g29160, At1g43160, At1g51700, At1g51950, At1g53910, At1g66140, At1g68670, At1g68840, At1g74660, At1g74840, At1g75390, At1g77450, At1g80840, At2g04880, At2g20570, At2g22430, At2g22850, At2g24570, At2g25000, At2g28510, At2g28550, At2g30250, At2g33710, At2g38470, At2g46830, At3g01560, At3g04070, At3g06590, At3g20770, At3g25790, At3g46130, At3g47620, At3g51920, At3g54620, At3g60490, At3g61150, At3g61890, At3g62420, At4g17490, At4g17500, At4g24240, At4g27410, At4g31800, At4g34590, At4g36540, At4g37180, At4g37260, At4g37610, At4g37730, At5g05410, At5g06800, At5g10030, At5g13080, At5g14540, At5g24800, At5g39610, At5g44190, At5g47230, At5g48655, At5g49450, At5g49520, At5g56270, At5g60850, At5g63790, At5g65210, or At5g65640.
In certain embodiments, the transgenically manipulated plant is a woody, ornamental, decorative, crop, cereal, fruit, or vegetable species. In other embodiments, the plant is a species of one of the following genera: Acorus, Aegilops, Allium, Amborella, Antirrhinum, Apium, Arabidopsis, Arachis, Beta, Betula, Brassica, Capsicum, Ceratopteris, Citrus, Cryptomeria, Cycas, Descurainia, Eschscholzia, Eucalyptus, Glycine, Gossypium, Hedyotis, Helianthus, Hordeum, Ipomoea, Lactuca, Linum, Liriodendron, Lotus, Lupinus, Lycopersicon, Medicago, Mesembryanthemum, Nicotiana, Nuphar, Pennisetum, Persea, Phaseolus, Physcomitrella, Picea, Pinus, Poncirus, Populus, Prunus, Robinia, Rosa, Saccharum, Schedonorus, Secale, Sesamum, Solanum, Sorghum, Stevia, Thellungiella, Theobroma, Triphysaria, Triticum, Vitis, Zea, or Zinnia.
The invention is based, in part, on the development of a rapid technique named “TARGET” that uses transient expression of a glucocorticoid receptor (GR)-tagged TF in protoplasts to study the genome-wide effects of TF activation. In some embodiments, the TARGET system can retrieve information on direct target genes in less than two weeks' time. Multiple experimental designs exist for use of the TARGET system, as shown in
In certain embodiments, the method of the present invention further comprises identifying direct target genes of the transcription factor, comprising: (v) contacting the host cells with cycloheximide; and (vi) detecting the level of mRNA expressed in the host cells; wherein an alteration in the level of the mRNA expressed in the host cells treated with cycloheximide compared to the level of the mRNA expressed in the host cells not treated with cycloheximide indicates the identification of direct target genes of the transcription factor.
In some embodiments, the nucleic acid molecule utilized in the methods of the invention is a DNA plasmid. In some embodiments, the domain comprising an inducible cellular localization signal encoded by the nucleic acid molecule used in the method of the invention is glucocorticoid receptor and the agent that allows for nuclear localization of the chimeric protein is dexamethasone. Dexamethasone prevents sequestration of the GR-TF fusion in the cytoplasm, allowing for localization to the nucleus. In some embodiments, the cellular localization signal encoded by the nucleic acid molecule allows for localization to the chloroplast or mitochondria upon treatment with the inducing agent.
In one embodiment, a) an isolated nucleic acid encoding a GR-TF fusion construct and an independently expressed selectable marker (e.g. a fluorescent protein such as RFP) is transiently transfected into plant protoplasts; b) treatment of the protoplasts with dexamethasone releases the GR-TF fusion from sequestration in the cytoplasm, allowing the TF to reach target genes; c) protoplasts that have been transiently transfected are identified by means of the detectable signal gene (e.g. by fluorescence activated cell sorting (FACS) to determine the presence of a fluorescent protein such as RFP); d) mRNA transcripts are measured from the transiently transfected protoplasts through use of a microarray analysis.
In some embodiments, the protoplasts are optionally exposed to an environmental signal, such as nitrogen, before treatment with dexamethasone, allowing for the measurement of transcription factor activity in response to the signal. In some embodiments, protoplasts may optionally be treated with cycloheximide prior to or concurrently with dexamethasone treatment, which blocks translation, allowing for the distinction of primary target genes, which are still expressed in the presence of cycloheximide, from secondary target genes, which are not expressed in the presence of cycloheximide. In some embodiments, TF binding to response genes in transiently transfected protoplasts may optionally be analyzed using ChIP-Seq. In some embodiments, ChIP-Seq or microarray analysis is performed at differing time points after an environmental signal in order to determine temporal changes in TF binding or gene expression.
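By way of illustration only, the distinction between primary and secondary target genes may be sketched as a comparison of two gene sets: genes whose transcript levels respond to dexamethasone alone, and genes that still respond to dexamethasone in the presence of cycloheximide. The function name and the gene identifiers are hypothetical placeholders, and the sketch assumes that a separate statistical analysis has already determined which genes respond in each condition.

```python
def classify_targets(responsive_dex, responsive_dex_chx):
    """Split TF-responsive genes into primary and secondary targets.

    responsive_dex:     genes responding to dexamethasone alone
    responsive_dex_chx: genes responding to dexamethasone while
                        translation is blocked by cycloheximide
    """
    dex = set(responsive_dex)
    chx = set(responsive_dex_chx)
    primary = dex & chx    # response does not require new protein synthesis
    secondary = dex - chx  # response is lost when translation is blocked
    return primary, secondary


# Placeholder gene names, not experimental results.
primary, secondary = classify_targets(
    {"gene_A", "gene_B", "gene_C"},
    {"gene_A", "gene_C"},
)
print(sorted(primary))    # ['gene_A', 'gene_C']
print(sorted(secondary))  # ['gene_B']
```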
In certain embodiments, gene networks are identified that are regulated by TFs which demonstrate only transient association with a target gene. The identified TFs that regulate a target gene but are only transiently associated with that target gene can be referred to as “touch and go” or “hit and run” TFs. Touch and go (hit and run) TFs are implicated when (i) one or more particular gene transcript levels are perturbed when the TF-fusion construct is transiently expressed and released from sequestration in the cytoplasm, and (ii) stable binding to the gene or genes is not detected by ChIP-Seq analysis. In some embodiments, these touch and go (hit and run) TFs regulate genes that control responsiveness to an environmental signal, perturbation, or cue. The identified genes targeted by these transiently-associating TFs in response to an environmental signal, perturbation, or cue can be referred to as “response genes.” “Response genes” are implicated when, in the presence of an environmental signal, perturbation, or cue, “touch and go” (hit and run) TFs perturb the levels of one or more particular gene transcripts yet do not stably bind the gene as measured by ChIP-Seq analysis. The identification of a particular response gene or set of genes may vary with time after the protoplast is exposed to the environmental signal, perturbation, or cue.
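By way of illustration only, the two criteria for implicating “response genes” (transcript levels perturbed by the TF, with no stable binding detected by ChIP-Seq) may be sketched per time point as follows. The function name, the time points, and the gene identifiers are hypothetical placeholders; the regulated and bound gene sets are assumed to come from separate expression and ChIP-Seq analyses.

```python
def response_genes_by_time(regulated_by_time, bound_by_time):
    """For each time point, return genes that are TF-regulated but for
    which stable TF binding was not detected.

    regulated_by_time: {time point: genes with perturbed transcript levels}
    bound_by_time:     {time point: genes with detected TF binding}
    """
    result = {}
    for time_point, regulated in regulated_by_time.items():
        bound = set(bound_by_time.get(time_point, ()))
        result[time_point] = set(regulated) - bound
    return result


# Placeholder data (minutes after the signal), not experimental results.
regulated = {10: {"gene_A", "gene_B"}, 60: {"gene_B", "gene_C"}}
bound = {10: {"gene_B"}, 60: {"gene_B"}}
for t, genes in response_genes_by_time(regulated, bound).items():
    print(t, sorted(genes))
# 10 ['gene_A']
# 60 ['gene_C']
```

Because the sets are computed separately for each time point, the sketch reflects that the identified response genes may vary with time after exposure to the signal.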
The present invention uses nucleic acid molecules, compositions and methods for determining the target genes of transcription factors and the structure of gene regulatory networks (GRN) by transiently expressing transcription factors of interest in host cells, such as protoplasts. The protoplasts can be isolated and utilized from virtually any plant genus and species in the methods of the invention so that target genes and gene regulatory networks in poorly characterized plant genera and species can be studied. The methods of the invention allow for cross-species studies in order to analyze evolutionarily conserved networks using genes from a poorly characterized plant genus or species in a better characterized model genus, such as Arabidopsis, which has a fully sequenced genome and has microarray chip data available. By providing the ability to do reciprocal cross-species genetic network comparisons, the TARGET technique allows for the determination of which elements of transcription factor networks are evolutionarily conserved and therefore likely the most important.
In some embodiments, the selectable marker encoded by the nucleic acid molecule used in the method of the invention is a fluorescent selection marker. A fluorescent selection marker that can be used in the method of the invention includes, but is not limited to, green fluorescent protein, yellow fluorescent protein, red fluorescent protein, cyan fluorescent protein, or blue fluorescent protein. In a specific embodiment, the fluorescent selection marker used in the method of the invention is red fluorescent protein. In certain embodiments, the step of detecting host cells that express the selectable marker is performed by Fluorescence Activated Cell Sorting (“FACS”).
In a specific embodiment, the nucleic acid molecule utilized in the methods of the invention is DNA plasmid pBeaconRFP_GR, which comprises the nucleotide sequence of SEQ ID NO: 1.
In certain embodiments, the host cell utilized in the methods of the present invention is transiently transfected with the nucleic acid molecules of the invention. In some embodiments, the host cell utilized in the methods of the present invention is a plant protoplast. In particular embodiments, the plant protoplast is derived from one of the following genera: Acorus, Aegilops, Allium, Amborella, Antirrhinum, Apium, Arabidopsis, Arachis, Beta, Betula, Brassica, Capsicum, Ceratopteris, Citrus, Cryptomeria, Cycas, Descurainia, Eschscholzia, Eucalyptus, Glycine, Gossypium, Hedyotis, Helianthus, Hordeum, Ipomoea, Lactuca, Linum, Liriodendron, Lotus, Lupinus, Lycopersicon, Medicago, Mesembryanthemum, Nicotiana, Nuphar, Pennisetum, Persea, Phaseolus, Physcomitrella, Picea, Pinus, Poncirus, Populus, Prunus, Robinia, Rosa, Saccharum, Schedonorus, Secale, Sesamum, Solanum, Sorghum, Stevia, Thellungiella, Theobroma, Triphysaria, Triticum, Vitis, Zea, or Zinnia. In some embodiments, the host cell is derived from a genus that is different from the genus from which the transcription factor is derived. For example, the host cell is a plant protoplast derived from the genus Arabidopsis and the transcription factor is derived from the genus Zea.
The tables below list transcription factors and response genes for which expression may be modified in transgenic plants to produce desired phenotypes. In Section 5.2, methods for the production of transgenic plants with modified expression of one or more of these genes are enumerated.
Table 1 shows 20 genes that are (1) ClassIIIA, i.e. no TF binding but TF-activated and (2) transiently upregulated by N. These genes are examples of “response” genes. Table 2 shows 14 genes that are (1) ClassIIIA, i.e. no binding but activated and (2) early (9-20 min) upregulated by N. These are also “response” genes. Table 3 lists “touch and go” (“hit and run”) transcription factors that may be utilized with the TARGET system to discover more response genes, which may be modified in transgenic plants to create a desired phenotype. Likewise, the transcription factor genes listed in Table 3 may themselves be modified in transgenic plants to create a desired phenotype.
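The selection logic described for Tables 1 and 2 (TF-activated, no TF binding detected, and upregulated by N) can be pictured as a set operation. The following is a minimal sketch, assuming each input is simply a collection of gene identifiers; the function name `class_iiia_response_genes` and the identifiers `geneA` to `geneE` are hypothetical placeholders and are not genes from the tables.

```python
def class_iiia_response_genes(tf_activated, tf_bound, n_upregulated):
    """Return genes that are TF-activated, not TF-bound, and upregulated by N."""
    # "No binding but activated": remove every gene with detected TF binding.
    activated_unbound = set(tf_activated) - set(tf_bound)
    # Keep only the subset that is also upregulated by the N signal.
    return sorted(activated_unbound & set(n_upregulated))


# Placeholder identifiers for illustration only (not genes from Tables 1-3).
activated = ["geneA", "geneB", "geneC", "geneD"]
bound = ["geneB", "geneE"]
n_up = ["geneA", "geneC", "geneE"]
print(class_iiia_response_genes(activated, bound, n_up))  # ['geneA', 'geneC']
```

The same function could be applied to an "early" or a "transient" N-upregulated list, depending on which of the two table criteria is of interest.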
The methods of the invention involve modulation of the expression of one, two, three or more target nucleotide sequences (i.e., target genes) in a host cell, such as a plant protoplast. That is, the expression of a target nucleotide sequence of interest may be increased or decreased.
The target nucleotide sequences may be endogenous or exogenous in origin. By “modulate expression of a target gene” is intended that the expression of the target gene is increased or decreased relative to the expression level in a host cell that has not been altered by the methods described herein.
By “increased expression” or “overexpression” it is intended that expression of the target nucleotide sequence is increased over expression observed in conventional transgenic lines for heterologous genes and over endogenous levels of expression for homologous genes. Heterologous or exogenous genes comprise genes that do not occur in the host cell of interest in its native state. Homologous or endogenous genes are those that are natively present in the plant genome. Generally, expression of the target sequence is substantially increased. That is, expression is increased at least about 25%-50%, preferably about 50%-100%, more preferably about 100%, 200% and greater.
By “decreased expression” or “underexpression” it is intended that expression of the target nucleotide sequence is decreased below expression observed in conventional transgenic lines for heterologous genes and below endogenous levels of expression for homologous genes. Generally, expression of the target nucleotide sequence of interest is substantially decreased. That is, expression is decreased at least about 25%-50%, preferably about 50%-100%, more preferably about 100%, 200% and greater.
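The percentage thresholds in the two definitions above can be illustrated with a short calculation. This is a minimal sketch, assuming expression is summarized as one positive number per sample (for example, a normalized transcript level) and compared against an unaltered control; the function names and the 25% default cutoff argument are illustrative choices rather than part of the described method.

```python
def percent_change(treated: float, control: float) -> float:
    """Percent change of treated expression relative to the unaltered control."""
    if control <= 0:
        raise ValueError("control expression must be positive")
    return (treated - control) * 100.0 / control


def classify_modulation(treated: float, control: float,
                        threshold_pct: float = 25.0) -> str:
    """Label expression as increased, decreased, or unchanged at a given cutoff."""
    change = percent_change(treated, control)
    if change >= threshold_pct:
        return "increased"
    if change <= -threshold_pct:
        return "decreased"
    return "unchanged"


print(percent_change(150.0, 100.0))       # 50.0
print(classify_modulation(70.0, 100.0))   # decreased
```

A stricter cutoff, such as 50% or 100%, can be passed as `threshold_pct` to mirror the preferred ranges.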
Expression levels may be assessed by determining the level of a gene product by any method known in the art including, but not limited to, determining the levels of the RNA and protein encoded by a particular target gene. For genes that encode proteins, expression levels may be determined, for example, by quantifying the amount of the protein present in plant cells, or in a plant or any portion thereof. Alternatively, if a desired target gene encodes a protein that has a known measurable activity, then activity levels may be measured to assess expression levels.
Any method or delivery system may be used for the delivery and/or transfection of the nucleic acid vectors encoding any of the genes of interest of the present invention in the host cell, e.g., plant protoplast. The vectors may be delivered to the host cell either alone, or in combination with other agents. Transient expression systems may also be used. Homologous recombination may also be used.
Transfection may be accomplished by a wide variety of means, as is known to those of ordinary skill in the art. Such methods include, but are not limited to, Agrobacterium-mediated transformation (e.g., Komari et al., 1998, Curr. Opin. Plant Biol., 1:161), particle bombardment mediated transformation (e.g., Finer et al., 1999, Curr. Top. Microbiol. Immunol., 240:59), protoplast electroporation (e.g., Bates, 1999, Methods Mol. Biol., 111:359), viral infection (e.g., Porta and Lomonossoff, 1996, Mol. Biotechnol. 5:209), microinjection, and liposome injection. Other exemplary delivery systems that can be used to facilitate uptake by a cell of the nucleic acid include calcium phosphate and other chemical mediators of intracellular transport, microinjection compositions, and homologous recombination compositions (e.g., for integrating a gene into a preselected location within the chromosome of the cell). Alternative methods may involve, for example, the use of liposomes, electroporation, or chemicals that increase free (or “naked”) DNA uptake, transformation using viruses or pollen and the use of microprojection. Standard molecular biology techniques are common in the art (e.g., Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York).
One of skill in the art will be able to select an appropriate vector for introducing the encoding nucleic acid sequence in a relatively intact state. Thus, any vector which will produce a host cell, e.g., plant protoplast, carrying the introduced encoding nucleic acid should be sufficient. The selection of the vector, or whether to use a vector, is typically guided by the method of transformation selected.
The transformation of plant cells in accordance with the invention may be carried out in essentially any of the various ways known to those skilled in the art of plant molecular biology. (See, for example, Methods in Enzymology, Vol. 153, 1987, Wu and Grossman, Eds., Academic Press, incorporated herein by reference).
Plant cells can comprise two or more nucleotide sequence constructs. Any means for producing a plant cell, e.g., protoplast, comprising the nucleotide sequence constructs described herein are encompassed by the present invention. For example, a nucleotide sequence encoding the modulator can be used to transform a plant cell at the same time as the nucleotide sequence encoding the precursor RNA. The nucleotide sequence encoding the precursor mRNA can be introduced into a plant cell that has already been transformed with the modulator nucleotide sequence. Likewise, viral vectors may be used to express gene products by various methods generally known in the art. Suitable plant viral vectors for expressing genes should be self-replicating, capable of systemic infection in a host, and stable. Additionally, the viruses should be capable of containing the nucleic acid sequences that are foreign to the native virus forming the vector.
Homologous recombination may be used as a method of gene inactivation.
The particular choice of a transformation technology will be determined by its efficiency to transform certain plant species as well as the experience and preference of the person practicing the invention with a particular methodology of choice. It will be apparent to the skilled person that the particular choice of a transformation system to introduce nucleic acid into plant cells is not essential to or a limitation of the invention, nor is the choice of technique for plant regeneration.
Agrobacterium. The nucleic acid sequences utilized in the present invention can be introduced into plant cells using Ti plasmids of Agrobacterium tumefaciens (A. tumefaciens), root-inducing (Ri) plasmids of Agrobacterium rhizogenes (A. rhizogenes), and plant virus vectors. For reviews of such techniques see, for example, Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, New York, Section VIII, pp. 421-463; Grierson & Covey, 1988, Plant Molecular Biology, 2d Ed., Blackie, London, Ch. 7-9; and Horsch et al., 1985, Science, 227:1229.
In using an A. tumefaciens culture as a transformation vehicle, it is most advantageous to use a non-oncogenic strain of Agrobacterium as the vector carrier so that normal non-oncogenic differentiation of the transformed tissues is possible. It is also preferred that the Agrobacterium harbor a binary Ti plasmid system. Such a binary system comprises 1) a first Ti plasmid having a virulence region essential for the introduction of transfer DNA (T-DNA) into plants, and 2) a chimeric plasmid. The chimeric plasmid contains at least one border region of the T-DNA region of a wild-type Ti plasmid flanking the nucleic acid to be transferred. Binary Ti plasmid systems have been shown effective in the transformation of plant cells (De Framond, Biotechnology, 1983, 1:262; Hoekema et al., 1983, Nature, 303:179). Such a binary system is preferred because it does not require integration into the Ti plasmid of A. tumefaciens, which is an older methodology.
In some embodiments, a disarmed Ti-plasmid vector carried by Agrobacterium exploits its natural gene transferability (EP-A-270355, EP-A-0116718, Townsend et al., 1984, NAR, 12:8711, U.S. Pat. No. 5,563,055).
Methods involving the use of Agrobacterium in transformation according to the present invention include, but are not limited to: 1) co-cultivation of Agrobacterium with cultured isolated protoplasts; 2) transformation of plant cells or tissues with Agrobacterium; or 3) transformation of seeds, apices or meristems with Agrobacterium.
In addition, gene transfer can be accomplished by in planta transformation by Agrobacterium, as described by Bechtold et al., (C. R. Acad. Sci. Paris, 1993, 316:1194). This approach is based on the vacuum infiltration of a suspension of Agrobacterium cells.
In certain embodiments, the nucleic acid molecule is introduced into plant cells by infecting such plant cells, an explant, a meristem or a seed, with transformed A. tumefaciens as described above. Under appropriate conditions known in the art, the transformed plant cells are grown to form shoots and roots, and develop further into plants.
Other methods described herein, such as microprojectile bombardment, electroporation and direct DNA uptake can be used where Agrobacterium is inefficient or ineffective. Alternatively, a combination of different techniques may be employed to enhance the efficiency of the transformation process, e.g., bombardment with Agrobacterium-coated microparticles (EP-A-486234) or microprojectile bombardment to induce wounding followed by co-cultivation with Agrobacterium (EP-A-486233).
CaMV. In some embodiments, cauliflower mosaic virus (CaMV) is used as a vector for introducing a desired nucleic acid into plant cells (U.S. Pat. No. 4,407,956). CaMV viral DNA genome can be inserted into a parent bacterial plasmid creating a recombinant DNA molecule which can be propagated in bacteria. After cloning, the recombinant plasmid again can be cloned and further modified by introduction of the desired nucleic acid sequence. The modified viral portion of the recombinant plasmid can then be excised from the parent bacterial plasmid, and used to inoculate the plant cells or plants.
Mechanical and Chemical Means. In some embodiments, a nucleic acid molecule of the invention is introduced into a plant cell using mechanical or chemical means. Exemplary mechanical and chemical means are provided below.
As used herein, the term “contacting” refers to any means of introducing a nucleic acid molecule into a plant cell, including chemical and physical means as described above. Preferably, contacting refers to introducing the nucleic acid or vector containing the nucleic acid into plant cells (including an explant, a meristem or a seed), via A. tumefaciens transformed with the nucleic acid molecule.
Microinjection. In one embodiment, the nucleic acid molecule can be mechanically transferred into the plant cell by microinjection using a micropipette. See, e.g., WO 92/09696, WO 94/00583, EP 331083, EP 175966, Green et al., 1987, Plant Tissue and Cell Culture, Academic Press, Crossway et al., 1986, Biotechniques 4:320-334.
PEG. In another embodiment, the nucleic acid can also be transferred into the plant cell by using polyethylene glycol (PEG), which forms a precipitation complex with genetic material that is taken up by the cell.
Electroporation. Electroporation can be used, in another set of embodiments, to deliver a nucleic acid to the cell (see, e.g., Fromm et al., 1985, PNAS, 82:5824). “Electroporation,” as used herein, is the application of electricity to a cell, such as a plant protoplast, in such a way as to cause delivery of a nucleic acid into the cell without killing the cell. Typically, electroporation includes the application of one or more electrical voltage “pulses” having relatively short durations (usually less than 1 second, and often on the scale of milliseconds or microseconds) to a medium containing the cells. The electrical pulses typically facilitate the non-lethal transport of extracellular nucleic acids into the cells. The exact electroporation protocols (such as the number of pulses, duration of pulses, pulse waveforms, etc.) will depend on factors such as the cell type, the cell medium, the number of cells, the substance(s) to be delivered, etc., and can be determined by those of ordinary skill in the art. Electroporation is discussed in greater detail in, e.g., EP 290395; WO 8706614; Riggs et al., 1986, Proc. Natl. Acad. Sci. USA 83:5602-5606; and D′Halluin et al., 1992, Plant Cell 4:1495-1505. Other forms of direct DNA uptake can also be used in the methods provided herein, such as those discussed in, e.g., DE 4005152, WO 9012096, U.S. Pat. No. 4,684,611, Paszkowski et al., 1984, EMBO J. 3:2717-2722.
Ballistic and Particle Bombardment. Another method for introducing a nucleic acid molecule is high velocity ballistic penetration by small particles with the nucleic acid to be introduced contained either within the matrix of such particles, or on the surface thereof (Klein et al., 1987, Nature 327:70). Genetic material can be introduced into a cell using particle gun (“gene gun”) technology, also called microprojectile or microparticle bombardment. In this method, small, high-density particles (microprojectiles) are accelerated to high velocity in conjunction with a larger, powder-fired macroprojectile in a particle gun apparatus. The microprojectiles have sufficient momentum to penetrate cell walls and membranes, and can carry RNA or other nucleic acids into the interiors of bombarded cells. It has been demonstrated that such microprojectiles can enter cells without causing death of the cells, and that they can effectively deliver foreign genetic material into intact tissue. Bombardment transformation methods are also described in Sanford et al. (Techniques 3:3-16, 1991) and Klein et al. (Bio/Techniques 10:286, 1992). Although, typically only a single introduction of a new nucleic acid sequence(s) is required, this method particularly provides for multiple introductions.
Particle or microprojectile bombardment is discussed in greater detail in, e.g., the following references: U.S. Pat. No. 5,100,792, EP-A-444882, EP-A-434616; Sanford et al., U.S. Pat. No. 4,945,050; Tomes et al., 1995, “Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment,” in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg and Phillips (Springer-Verlag, Berlin); and McCabe et al., 1988, Biotechnology 6:923-926.
Colloidal Dispersion. In other embodiments, a colloidal dispersion system may be used to facilitate delivery of a nucleic acid into the cell. As used herein, a “colloidal dispersion system” refers to a natural or synthetic molecule, other than those derived from bacteriological or viral sources, capable of delivering to and releasing the nucleic acid to the cell. Colloidal dispersion systems include, but are not limited to, macromolecular complexes, beads, and lipid-based systems including oil-in-water emulsions, micelles, mixed micelles, and liposomes. One example of a colloidal dispersion system is a liposome. Liposomes are artificial membrane vesicles. It has been shown that large unilamellar vesicles (“LUV”), which range in size from 0.2 to 4.0 microns, can encapsulate large macromolecules within the aqueous interior and these macromolecules can be delivered to cells in a biologically active form (e.g., Fraley et al., 1981, Trends Biochem. Sci., 6:77).
Lipids. Lipid formulations for the transfection and/or intracellular delivery of nucleic acids are commercially available, for instance, from QIAGEN, for example as EFFECTENE® (a non-liposomal lipid with a special DNA condensing enhancer) and SUPERFECT® (a novel acting dendrimeric technology) as well as Gibco BRL, for example, as LIPOFECTIN® and LIPOFECTACE®, which are formed of cationic lipids such as N-[1-(2,3-dioleyloxy)-propyl]-N,N,N-trimethylammonium chloride (“DOTMA”) and dimethyl dioctadecylammonium bromide (“DDAB”). Liposomes are well known in the art and have been widely described in the literature, for example, in Gregoriadis, G., 1985, Trends in Biotechnology 3:235-241; Freeman et al., 1984, Plant Cell Physiol. 29:1353.
Other Methods. In addition to the above, other physical methods for the transformation of plant cells are reviewed in Oard, 1991, Biotech. Adv. 9:1-11, and can be used in the methods provided herein. See generally, Weissinger et al., 1988, Ann. Rev. Genet. 22:421-477; Sanford et al., 1987, Particulate Science and Technology 5:27-37; Christou et al., 1988, Plant Physiol. 87:671-674; McCabe et al., 1988, Bio/Technology 6:923-926; Finer and McMullen, 1991, In vitro Cell Dev. Biol. 27P:175-182; Singh et al., 1998, Theor. Appl. Genet. 96:319-324; Datta et al., 1990, Biotechnology 8:736-740; Klein et al., 1988, Proc. Natl. Acad. Sci. USA 85:4305-4309; Klein et al., 1988, Biotechnology 6:559-563; Tomes, U.S. Pat. No. 5,240,855; Buising et al., U.S. Pat. Nos. 5,322,783 and 5,324,646; Klein et al., 1988, Plant Physiol. 91:440-444; Fromm et al., 1990, Biotechnology 8:833-839; Hooykaas-Van Slogteren et al., 1984, Nature (London) 311:763-764; Bytebier et al., 1987, Proc. Natl. Acad. Sci. USA 84:5345-5349; De Wet et al., 1985, The Experimental Manipulation of Ovule Tissues, ed. Chapman et al. (Longman, N.Y.), pp. 197-209; Kaeppler et al., 1990, Plant Cell Reports 9:415-418 and Kaeppler et al., 1992, Theor. Appl. Genet. 84:560-566; Li et al., 1993, Plant Cell Reports 12:250-255 and Christou and Ford, 1995, Annals of Botany 75:407-413; Osjoda et al., 1996, Nature Biotechnology 14:745-750; all of which are herein incorporated by reference.
The nucleic acid molecules of the invention may be provided in nucleotide sequence constructs or expression cassettes for expression in the plant cell of interest. The cassette will include 5′ and 3′ regulatory sequences operably linked to an encoding nucleotide sequence of the invention.
The expression cassette may additionally contain at least one additional gene to be co-transformed into the organism. Alternatively, the additional gene(s) can be provided on multiple expression cassettes.
In certain embodiments, an expression cassette can be used with a plurality of restriction sites for insertion of the sequences of the invention to be under the transcriptional regulation of the regulatory regions. The expression cassette can additionally contain selectable marker genes (see below).
The expression cassette will generally include in the 5′-3′ direction of transcription, a transcriptional and translational initiation region, a DNA sequence of the invention, and a transcriptional and translational termination region functional in plants. The transcriptional initiation region, the promoter, may be native or analogous or foreign or heterologous to the plant host. Additionally, the promoter may be the natural sequence or alternatively a synthetic sequence. By “foreign” is intended that the transcriptional initiation region is not found in the native plant into which the transcriptional initiation region is introduced. As used herein, a chimeric gene comprises a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence.
The termination region may be native with the transcriptional initiation region, may be native with the operably linked DNA sequence of interest, or may be derived from another source. Convenient termination regions are available from the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also Guerineau et al., 1991, Mol. Gen. Genet. 262:141-144; Proudfoot, 1991, Cell 64:671-674; Sanfacon et al., 1991, Genes Dev. 5:141-149; Mogen et al., 1990, Plant Cell 2:1261-1272; Munroe et al., 1990, Gene 91:151-158; Ballas et al., 1989, Nucleic Acids Res. 17:7891-7903; and Joshi et al., 1987, Nucleic Acid Res. 15:9627-9639.
In some embodiments, a nucleic acid can be delivered to the cell in a vector. As used herein, a “vector” is any vehicle capable of facilitating the transfer of the nucleic acid to the cell such that the nucleic acid can be processed and/or expressed in the cell. The vector may transport the nucleic acid to the cells with reduced degradation, relative to the extent of degradation that would result in the absence of the vector. The vector optionally includes gene expression sequences or other components (such as promoters and other regulatory elements) able to enhance expression of the nucleic acid within the cell. The invention also encompasses the cells transfected with these vectors, including those cells previously described.
To commence a transformation process in certain embodiments, it is first necessary to construct a suitable vector and properly introduce it into the plant cell. Vector(s) employed in the present invention for transformation of a plant cell include an encoding nucleic acid sequence operably associated with a promoter, such as a leaf-specific promoter. Details of the construction of vectors utilized herein are known to those skilled in the art of plant genetic engineering.
In general, vectors useful in the invention include, but are not limited to, plasmids, phagemids, viruses, and other vehicles derived from viral or bacterial sources that have been manipulated by the insertion or incorporation of the nucleotide sequences (or precursor nucleotide sequences) of the invention. Viral vectors useful in certain embodiments include, but are not limited to, nucleic acid sequences from the following viruses: retroviruses and other RNA viruses; adenoviruses; adeno-associated viruses; mosaic viruses such as tobamoviruses; potyviruses; and nepoviruses. One can readily employ other vectors not named but known to the art. Some viral vectors can be based on non-cytopathic eukaryotic viruses in which non-essential genes have been replaced with the nucleotide sequence of interest. Non-cytopathic viruses include retroviruses, the life cycle of which involves reverse transcription of genomic viral RNA into DNA with subsequent proviral integration into host cellular DNA.
Genetically altered retroviral expression vectors can have general utility for the high-efficiency transduction of nucleic acids. Standard protocols for producing replication-deficient retroviruses (including the steps of incorporation of exogenous genetic material into a plasmid, transfection of a packaging cell line with plasmid, production of recombinant retroviruses by the packaging cell line, collection of viral particles from tissue culture media, and infection of the cells with viral particles) are well known to those of ordinary skill in the art. Examples of standard protocols can be found in Kriegler, M., 1990, Gene Transfer and Expression, A Laboratory Manual, W.H. Freeman Co., New York, or Murray, E. J., Ed., 1991, Methods in Molecular Biology, Vol. 7, Humana Press, Inc., Clifton, N.J.
Another example of a virus for certain applications is the adeno-associated virus, which is a single-stranded DNA virus. The adeno-associated virus can be engineered to be replication-deficient and is capable of infecting a wide range of cell types and species. The adeno-associated virus further has advantages, such as heat and lipid solvent stability; high transduction frequencies in cells of diverse lineages; and/or lack of superinfection inhibition, which may allow multiple series of transductions.
Another vector suitable for use with the method provided herein is a plasmid vector. Plasmid vectors have been extensively described in the art and are well-known to those of skill in the art. See, e.g., Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press. These plasmids may have a promoter compatible with the host cell, and the plasmids can express a peptide from a gene operatively encoded within the plasmid. Some commonly used plasmids include pBR322, pUC18, pUC19, pRC/CMV, SV40, and pBlueScript. Other plasmids are well-known to those of ordinary skill in the art. Additionally, plasmids may be custom-designed, for example, using restriction enzymes and ligation reactions, to remove and add specific fragments of DNA or other nucleic acids, as necessary. The present invention also includes vectors for producing nucleic acids or precursor nucleic acids containing a desired nucleotide sequence (which can, for instance, then be cleaved or otherwise processed within the cell to produce a precursor miRNA). These vectors may include a sequence encoding a nucleic acid and an in vivo expression element, as further described below. In some cases, the in vivo expression element includes at least one promoter.
Where appropriate, the gene(s) for enhanced expression may be optimized for expression in the transformed plant. That is, the genes can be synthesized using plant-preferred codons corresponding to the plant of interest. Methods are available in the art for synthesizing plant-preferred genes. See, for example, U.S. Pat. Nos. 5,380,831, and 5,436,391, and Murray et al., 1989, Nucleic Acids Res. 17:477-498.
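The idea of synthesizing a gene with plant-preferred codons can be sketched as a synonymous recoding step. The following is an illustrative sketch only: it uses the standard genetic code, and the `PREFERRED` table is a hypothetical placeholder, not measured codon usage for any plant. The function names `translate` and `recode` are likewise assumptions made for this example.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def _codons(dna):
    """Split an in-frame DNA sequence into upper-case codons."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate an in-frame DNA sequence ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[c] for c in _codons(dna))


def recode(dna, preferred):
    """Swap each codon for the preferred synonymous codon, where one is given."""
    for aa, codon in preferred.items():
        if CODON_TABLE.get(codon) != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    return "".join(preferred.get(CODON_TABLE[c], c) for c in _codons(dna))


# Hypothetical preference table for illustration; real tables are host-specific.
PREFERRED = {"L": "CTC", "S": "TCC", "K": "AAG"}

original = "ATGTTATCAAAATAA"
optimized = recode(original, PREFERRED)
assert translate(optimized) == translate(original)  # protein is unchanged
print(optimized)  # ATGCTCTCCAAGTAA
```

Codons for amino acids absent from the preference table are left as they are, so the encoded protein is never altered by the recoding.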
Additional sequence modifications are known to enhance gene expression in a cellular host. These include elimination of sequences encoding spurious polyadenylation signals, exon-intron splice site signals, transposon-like repeats, and other such well-characterized sequences that may be deleterious to gene expression. The G-C content of the sequence may be adjusted to levels average for a given cellular host, as calculated by reference to known genes expressed in the host cell. When desired, the sequence is modified to avoid predicted hairpin secondary mRNA structures. However, it is recognized that in the case of nucleotide sequences encoding the miRNA precursors, one or more hairpin and other secondary structures may be desired for proper processing of the precursor into a mature miRNA and/or for the functional activity of the miRNA in gene silencing.
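Two of the checks mentioned above, G-C content and the presence of unwanted sequence motifs, are simple to compute. The sketch below is illustrative: the helper names are assumptions, and the motif `AATAAA` is used only as an example of a polyadenylation-like signal; which motifs matter in a given host is not specified here.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_motif(seq, motif):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


print(gc_content("ATGC"))                      # 0.5
print(find_motif("CCAATAAAGG", "AATAAA"))      # [2]
```

The G-C fraction of a candidate sequence can then be compared with the average computed the same way from known genes expressed in the host cell.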
The expression cassettes can additionally contain 5′ leader sequences. Such leader sequences can act to enhance translation. Translation leaders are known in the art and include: picornavirus leaders, for example, EMCV leader (Encephalomyocarditis 5′ noncoding region) (Elroy-Stein et al., 1989, PNAS USA 86:6126-6130); potyvirus leaders, for example, TEV leader (Tobacco Etch Virus) (Allison et al., 1986, Virology 154:9-20) and MDMV leader (Maize Dwarf Mosaic Virus); human immunoglobulin heavy-chain binding protein (BiP) (Macejak et al., 1991, Nature 353:90-94); untranslated leader from the coat protein mRNA of alfalfa mosaic virus (AMV RNA 4) (Jobling et al., 1987, Nature 325:622-625); tobacco mosaic virus leader (TMV) (Gallie et al., 1989, Molecular Biology of RNA, ed. Cech (Liss, New York), pp. 237-256); and maize chlorotic mottle virus leader (MCMV) (Lommel et al., 1991, Virology 81:382-385). See also, Della-Cioppa et al., 1987, Plant Physiol. 84:965-968.
In preparing the expression cassette, the various DNA fragments can be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers can be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.
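Whether a coding fragment has ended up in the proper reading frame after such manipulations can be checked computationally before any construct is built. The following is a minimal sketch, assuming the coding sequence is supplied as a plain DNA string that should begin with ATG and end with a single stop codon; the function name `check_reading_frame` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_reading_frame(cds):
    """Return a list of reading-frame problems found in a coding sequence."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    # Only complete codons are inspected; a trailing partial codon is ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    return problems


print(check_reading_frame("ATGAAATAA"))      # []
print(check_reading_frame("ATGTAAAAATAA"))   # ['internal stop codon']
```

An empty list means none of the listed problems was detected; it does not by itself confirm that a fragment is correctly oriented within a larger cassette.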
Provided herein are host cells that contain a vector, e.g., a DNA plasmid, and support the replication and/or expression of the vector. Host cells may be prokaryotic cells such as E. coli, or eukaryotic cells such as yeast, plant, insect, amphibian, or mammalian cells. In some embodiments, host cells are monocotyledonous or dicotyledonous plant cells. In other embodiments, the monocotyledonous host cell is a maize host cell. In certain embodiments, the host cell utilized in the methods of the present invention is transiently transfected with the nucleic acid molecules of the invention.
In preferred embodiments, the host cell utilized in the methods of the present invention is a plant protoplast. Plant protoplasts are plant cells whose entire cell wall has been enzymatically removed prior to the introduction of the molecule of interest. The complete removal of the cell wall disrupts the connections between cells, producing a homogeneous suspension of individualized cells, which allows more uniform and larger-scale transfection experiments. Suitable transfection methods include, but are not restricted to, protoplast fusion, electroporation, liposome-mediated transfection, and polyethylene glycol-mediated transfection. Protoplast preparation is therefore a reliable and inexpensive way to produce millions of cells.
In particular embodiments, the plant protoplast is derived from one of the following genera: Acorus, Aegilops, Allium, Amborella, Antirrhinum, Apium, Arabidopsis, Arachis, Beta, Betula, Brassica, Capsicum, Ceratopteris, Citrus, Cryptomeria, Cycas, Descurainia, Eschscholzia, Eucalyptus, Glycine, Gossypium, Hedyotis, Helianthus, Hordeum, Ipomoea, Lactuca, Linum, Liriodendron, Lotus, Lupinus, Lycopersicon, Medicago, Mesembryanthemum, Nicotiana, Nuphar, Pennisetum, Persea, Phaseolus, Physcomitrella, Picea, Pinus, Poncirus, Populus, Prunus, Robinia, Rosa, Saccharum, Schedonorus, Secale, Sesamum, Solanum, Sorghum, Stevia, Thellungiella, Theobroma, Triphysaria, Triticum, Vitis, Zea, or Zinnia. In some embodiments, the host cell is derived from a genus that is different from the genus from which the transcription factor is derived. For example, the host cell is a plant protoplast derived from the genus Arabidopsis and the transcription factor is derived from the genus Zea.
Also provided herein are plant cells having the nucleotide sequence constructs of the invention. A further aspect of the present invention provides a method of making such a plant cell involving introduction of a vector including the construct into a plant cell. For integration of the construct into the plant genome, such introduction will be followed by recombination between the vector and the plant cell genome to introduce the sequence of nucleotides into the genome. RNA encoded by the introduced nucleic acid construct may then be transcribed in the cell and descendants thereof, including cells in plants regenerated from transformed material. A gene stably incorporated into the genome of a plant is passed from generation to generation to descendants of the plant, so such descendants should show the desired phenotype.
Optionally, germ line cells may be used in the methods described herein rather than, or in addition to, somatic cells. The term “germ line cells” refers to cells in the plant organism which can trace their eventual cell lineage to either the male or female reproductive cell of the plant. Other cells, referred to as “somatic cells” are cells which give rise to leaves, roots and vascular elements which, although important to the plant, do not directly give rise to gamete cells. Somatic cells, however, also may be used. With regard to callus and suspension cells which have somatic embryogenesis, many or most of the cells in the culture have the potential capacity to give rise to an adult plant. If the plant originates from single cells or a small number of cells from the embryogenic callus or suspension culture, the cells in the callus and suspension can therefore be referred to as germ cells. In the case of immature embryos which are prepared for treatment by the methods described herein, certain cells in the apical meristem region of the plant have been shown to produce a cell lineage which eventually gives rise to the female and male reproductive organs. With many or most species, the apical meristem is generally regarded as giving rise to the lineage that eventually will give rise to the gamete cells. An example of a non-gamete cell in an embryo would be the first leaf primordia in corn which is destined to give rise only to the first leaf and none of the reproductive structures.
In the broad method of the invention, the nucleic acid molecule of the invention is operably linked with a promoter. It may be desirable to introduce more than one copy of a polynucleotide into a plant cell for enhanced expression.
In general, promoters are found positioned 5′ (upstream) of the genes that they control. Thus, in the construction of promoter gene combinations, the promoter is preferably positioned upstream of the gene and at a distance from the transcription start site that approximates the distance between the promoter and the gene it controls in the natural setting. As is known in the art, some variation in this distance can be tolerated without loss of promoter function. Similarly, the preferred positioning of a regulatory element, such as an enhancer, with respect to a heterologous gene placed under its control reflects its natural position relative to the structural gene it naturally regulates.
Thus, the nucleic acid, in one embodiment, is operably linked to a gene expression sequence, which directs the expression of the nucleic acid within the cell. A “gene expression sequence,” as used herein, is any regulatory nucleotide sequence, such as a promoter sequence or promoter-enhancer combination, which facilitates the efficient transcription and translation of the nucleotide sequence to which it is operably linked. The gene expression sequence may, for example, be a eukaryotic promoter or a viral promoter, such as a constitutive or inducible promoter. Promoters and enhancers consist of short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription, for instance, as discussed in Maniatis et al., 1987, Science 236:1237. Promoter and enhancer elements have been isolated from a variety of eukaryotic sources including genes in plant, yeast, insect and mammalian cells and viruses (analogous control elements, i.e., promoters, are also found in prokaryotes). In some embodiments, the nucleic acid is linked to a gene expression sequence which permits expression of the nucleic acid in a plant cell. A sequence which permits expression of the nucleic acid in a plant cell is one which is selectively active in the particular plant cell and thereby causes the expression of the nucleic acid in these cells. Those of ordinary skill in the art will be able to easily identify promoters that are capable of expressing a nucleic acid in a cell based on the type of plant cell.
A number of promoters can be used in the practice of the invention. The promoters can be selected based on the desired outcome. Generally, the nucleotide sequence and the modulator sequences can be combined with promoters of choice to alter gene expression of the target sequences in the tissue or organ of choice. Thus, the nucleotide sequence or modulator nucleotide sequence can be combined with constitutive, tissue-preferred, inducible, developmental, or other promoters for expression in plants depending upon the desired outcome.
The selection of a particular promoter and enhancer depends on what cell type is to be used and the mode of delivery. For example, a wide variety of promoters have been isolated from plants and animals, which are functional not only in the cellular source of the promoter, but also in numerous other plant species. There are also other promoters (e.g., viral and Ti-plasmid) which can be used. For example, these promoters include promoters from the Ti-plasmid, such as the octopine synthase promoter, the nopaline synthase promoter, the mannopine synthase promoter, and promoters from other open reading frames in the T-DNA, such as ORF7, etc. Promoters isolated from plant viruses include the 35S promoter from cauliflower mosaic virus. Promoters that have been isolated and reported for use in plants include the ribulose-1,5-bisphosphate carboxylase small subunit promoter, the phaseolin promoter, etc. Thus, a variety of promoters and regulatory elements may be used in the expression vectors of the present invention.
Promoters useful in the compositions and methods provided herein include both natural constitutive and inducible promoters as well as engineered promoters. The CaMV promoters are examples of constitutive promoters. Other constitutive mammalian promoters include, but are not limited to, polymerase promoters as well as the promoters for the following genes: hypoxanthine phosphoribosyl transferase (“HPRT”), adenosine deaminase, pyruvate kinase, and alpha-actin.
Promoters useful as expression elements of the invention also include inducible promoters. Inducible promoters are expressed in the presence of an inducing agent. For example, a metallothionein promoter can be induced to promote transcription in the presence of certain metal ions. Other inducible promoters are known to those of ordinary skill in the art. The in vivo expression element can include, as necessary, 5′ non-transcribing and 5′ non-translating sequences involved with the initiation of transcription, and can optionally include enhancer sequences or upstream activator sequences.
For example, in some embodiments an inducible promoter is used to allow control of nucleic acid expression through the presentation of external stimuli (e.g., environmentally inducible promoters), as discussed below. Thus, the timing and amount of nucleic acid expression can be controlled in some cases. Non-limiting examples of expression systems, promoters, inducible promoters, environmentally inducible promoters, and enhancers are well known to those of ordinary skill in the art. Examples include those described in International Patent Application Publications WO 00/12714, WO 00/11175, WO 00/12713, WO 00/03012, WO 00/03017, WO 00/01832, WO 99/50428, WO 99/46976 and U.S. Pat. Nos. 6,028,250, 5,959,176, 5,907,086, 5,898,096, 5,824,857, 5,744,334, 5,689,044, and 5,612,472. General descriptions of plant expression vectors and reporter genes can also be found in Gruber et al., 1993, “Vectors for Plant Transformation,” in Methods in Plant Molecular Biology & Biotechnology, Glick et al., Eds., p. 89-119, CRC Press.
For plant expression vectors, viral promoters that can be used in certain embodiments include the 35S RNA and 19S RNA promoters of CaMV (Brisson et al., Nature, 1984, 310:511; Odell et al., Nature, 1985, 313:810); the full-length transcript promoter from Figwort Mosaic Virus (FMV) (Gowda et al., 1989, J. Cell Biochem., 13D: 301); and the coat protein promoter of TMV (Takamatsu et al., 1987, EMBO J. 6:307). Alternatively, plant promoters may be used, such as the light-inducible promoter from the small subunit of ribulose bis-phosphate carboxylase (ssRUBISCO) (Coruzzi et al., 1984, EMBO J., 3:1671; Broglie et al., 1984, Science, 224:838); the mannopine synthase promoter (Velten et al., 1984, EMBO J., 3:2723); the nopaline synthase (NOS) and octopine synthase (OCS) promoters (carried on tumor-inducing plasmids of Agrobacterium tumefaciens); or heat shock promoters, e.g., soybean hsp17.5-E or hsp17.3-B (Gurley et al., 1986, Mol. Cell. Biol., 6:559; Severin et al., 1990, Plant Mol. Biol., 15:827). Exemplary viral promoters which function constitutively in eukaryotic cells include, for example, promoters from the simian virus, papilloma virus, adenovirus, human immunodeficiency virus, Rous sarcoma virus, cytomegalovirus, the long terminal repeats of Moloney leukemia virus and other retroviruses, and the thymidine kinase promoter of herpes simplex virus. Other constitutive promoters are known to those of ordinary skill in the art.
To be most useful, an inducible promoter should 1) provide low expression in the absence of the inducer; 2) provide high expression in the presence of the inducer; 3) use an induction scheme that does not interfere with the normal physiology of the plant; and 4) have no effect on the expression of other genes. Examples of inducible promoters useful in plants include those induced by chemical means, such as the yeast metallothionein promoter which is activated by copper ions (Mett et al., Proc. Natl. Acad. Sci., U.S.A., 90:4567, 1993); In2-1 and In2-2 regulator sequences which are activated by substituted benzenesulfonamides, e.g., herbicide safeners (Hershey et al., Plant Mol. Biol., 17:679, 1991); and the GRE regulatory sequences which are induced by glucocorticoids (Schena et al., Proc. Natl. Acad Sci., U.S.A., 88:10421, 1991). Other promoters, both constitutive and inducible will be known to those of skill in the art.
A number of inducible promoters are known in the art. For resistance genes, a pathogen-inducible promoter can be utilized. Such promoters include those from pathogenesis-related proteins (PR proteins), which are induced following infection by a pathogen; e.g., PR proteins, SAR proteins, beta-1,3-glucanase, chitinase, etc. See, for example, Redolfi et al., 1983, Neth. J. Plant Pathol. 89:245-254; Uknes et al., 1992, Plant Cell 4:645-656; and Van Loon, 1985, Plant Mol. Virol. 4:111-116. Of particular interest are promoters that are expressed locally at or near the site of pathogen infection. See, for example, Marineau et al., 1987, Plant Mol. Biol. 9:335-342; Matton et al., 1989, Molecular Plant-Microbe Interactions 2:325-331; Somssich et al., 1986, Proc. Natl. Acad. Sci. USA 83:2427-2430; Somssich et al., 1988, Mol. Gen. Genet. 2:93-98; and Yang, 1996, Proc. Natl. Acad. Sci. USA 93:14972-14977. See also, Chen et al., 1996, Plant J. 10:955-966; Zhang et al., 1994, Proc. Natl. Acad. Sci. USA 91:2507-2511; Warner et al., 1993, Plant J. 3:191-201; Siebertz et al., 1989, Plant Cell 1:961-968; U.S. Pat. No. 5,750,386; Cordero et al., 1992, Physiol. Mol. Plant Path. 41:189-200; and the references cited therein.
Additionally, as pathogens find entry into plants through wounds or insect damage, a wound-inducible promoter may be used in the DNA constructs of the invention. Such wound-inducible promoters include those of the potato proteinase inhibitor (pin II) gene (Ryan, 1990, Ann. Rev. Phytopath. 28:425-449; Duan et al., 1996, Nature Biotechnology 14:494-498); wun1 and wun2, U.S. Pat. No. 5,428,148; win1 and win2 (Stanford et al., 1989, Mol. Gen. Genet. 215:200-208); systemin (McGurl et al., 1992, Science 255:1570-1573); WIP1 (Rohrmeier et al., 1993, Plant Mol. Biol. 22:783-792; Eckelkamp et al., 1993, FEBS Letters 323:73-76); MPI gene (Cordero et al., 1994, Plant J. 6(2):141-150); and the like. Such references are herein incorporated by reference.
Chemical-regulated promoters can be used to modulate the expression of a gene in a plant through the application of an exogenous chemical regulator. Depending upon the objective, the promoter may be a chemical-inducible promoter, where application of the chemical induces gene expression, or a chemical-repressible promoter, where application of the chemical represses gene expression. Chemical-inducible promoters are known in the art and include, but are not limited to, the maize In2-2 promoter, which is activated by benzenesulfonamide herbicide safeners, the maize GST promoter, which is activated by hydrophobic electrophilic compounds that are used as pre-emergent herbicides, and the tobacco PR-1a promoter, which is activated by salicylic acid. Other chemical-regulated promoters of interest include steroid-responsive promoters (see, for example, the glucocorticoid-inducible promoter in Schena et al., 1991, Proc. Natl. Acad. Sci. USA 88:10421-10425 and McNellis et al., 1998, Plant J. 14(2):247-257) and tetracycline-inducible and tetracycline-repressible promoters (see, for example, Gatz et al., 1991, Mol. Gen. Genet. 227:229-237, and U.S. Pat. Nos. 5,814,618 and 5,789,156), herein incorporated by reference.
Where enhanced expression in particular tissues is desired, tissue-preferred promoters can be utilized. Tissue-preferred promoters include those described by Yamamoto et al., 1997, Plant J. 12(2):255-265; Kawamata et al., 1997, Plant Cell Physiol. 38(7):792-803; Hansen et al., 1997, Mol. Gen. Genet. 254(3):337-343; Russell et al., 1997, Transgenic Res. 6(2):157-168; Rinehart et al., 1996, Plant Physiol. 112(3):1331-1341; Van Camp et al., 1996, Plant Physiol. 112(2):525-535; Canevascini et al., 1996, Plant Physiol. 112(2):513-524; Yamamoto et al., 1994, Plant Cell Physiol. 35(5):773-778; Lam, 1994, Results Probl. Cell Differ. 20:181-196; Orozco et al., 1993, Plant Mol. Biol. 23(6):1129-1138; Matsuoka et al., 1993, Proc. Natl. Acad. Sci. USA 90(20):9586-9590; and Guevara-Garcia et al., 1993, Plant J. 4(3):495-505.
The particular promoter selected should be capable of causing sufficient expression to result in the production of an effective amount of structural gene product in the plant cell to cause upregulation of genes as compared to wild type. The promoters used in the vector constructs of the present invention may be modified, if desired, to affect their control characteristics. In certain embodiments, chimeric promoters can be used.
There are promoters known which limit expression to particular plant parts or in response to particular stimuli. One skilled in the art will know of many such plant part-specific promoters which would be useful in the present invention. In certain embodiments, to provide pericycle-specific expression, any of a number of promoters from genes in Arabidopsis can be used. In some embodiments, the promoter from one (or more) of the following genes may be used: (i) At1g11080, (ii) At3g60160, (iii) At1g24575, (iv) At3g45160, or (v) At1g23130. In specific embodiments, (vi) promoter elements from the GFP-marker line used in Gifford et al. (in preparation) will be used (see also, Bonke et al., 2003, Nature 426, 181-6; Tian et al., 2004, Plant Physiol 135, 25-38). Several of the predicted genes have a number of potential orthologs in rice and poplar and are thus predicted to be applicable for use in crop species: (i) Os04g44410, Os10g39560, Os06g51370, Os02g42310, Os01g22980, Os05g06660, and Poptr1#568263, Poptr1#555534, Poptr1#365170; (ii) Os04g49900, Os04g49890, Os01g67580, and Poptr1#87573, Poptr1#80582, Poptr1#565079, Poptr1#99223.
Promoters used in the nucleic acid constructs of the present invention can be modified, if desired, to affect their control characteristics. For example, the CaMV 35S promoter may be ligated to the portion of the ssRUBISCO gene that represses the expression of ssRUBISCO in the absence of light, to create a promoter which is active in leaves but not in roots. The resulting chimeric promoter may be used as described herein. For purposes of this description, the phrase “CaMV 35S” promoter thus includes variations of CaMV 35S promoter, e.g., promoters derived by means of ligation with operator regions, random or controlled mutagenesis, etc. Furthermore, the promoters may be altered to contain multiple “enhancer sequences” to assist in elevating gene expression.
An efficient plant promoter that may be used in specific embodiments is an “overproducing” or “overexpressing” plant promoter. Overexpressing plant promoters that can be used in the compositions and methods provided herein include the promoter of the small subunit (“ss”) of the ribulose-1,5-bisphosphate carboxylase from soybean (e.g., Berry-Lowe et al., 1982, J. Molecular & App. Genet., 1:483), and the promoter of the chlorophyll a-b binding protein. These two promoters are known to be light-induced in eukaryotic plant cells. For example, see Cashmore, Genetic Engineering of Plants: An Agricultural Perspective, p. 29-38; Coruzzi et al., 1983, J. Biol. Chem., 258:1399; and Dunsmuir et al., 1983, J. Molecular & App. Genet., 2:285.
The promoters and control elements of, e.g., SUCS (root nodules; broadbean; Kuster et al., 1993, Mol Plant Microbe Interact 6:507-14) for roots can be used in compositions and methods provided herein to confer tissue specificity.
In certain embodiments, two promoter elements can be used in combination, such as, for example, (i) an inducible element responsive to a treatment that can be provided to the plant prior to N-fertilizer treatment, and (ii) a plant tissue-specific expression element to drive expression in the specific tissue alone.
Any promoter or other expression element described herein or known in the art may be used either alone or in combination with any other promoter or other expression element described herein or known in the art. For example, promoter elements that confer tissue-specific expression of a gene can be used with other promoter elements conferring constitutive or inducible expression.
Promoters and promoter control elements that are related to those described herein can also be used in the compositions and methods provided herein. Such related sequences can be isolated utilizing (a) nucleotide sequence identity; (b) coding sequence identity of related, orthologous genes; or (c) common function or gene products.
Relatives can include both naturally occurring promoters and non-natural promoter sequences. Non-natural related promoters include nucleotide substitutions, insertions or deletions of naturally-occurring promoter sequences that do not substantially affect transcription modulation activity. For example, the binding of relevant DNA binding proteins can still occur with the non-natural promoter sequences and promoter control elements of the present invention.
According to current knowledge, promoter sequences and promoter control elements exist as functionally important regions, such as protein binding sites, and spacer regions. These spacer regions are apparently required for proper positioning of the protein binding sites. Thus, nucleotide substitutions, insertions and deletions can be tolerated in these spacer regions to a certain degree without loss of function.
In contrast, less variation is permissible in the functionally important regions, since changes in the sequence can interfere with protein binding. Nonetheless, some variation in the functionally important regions is permissible so long as function is conserved.
The effects of substitutions, insertions and deletions to the promoter sequences or promoter control elements may be to increase or decrease the binding of relevant DNA binding proteins to modulate transcript levels of a polynucleotide to be transcribed. Effects may include tissue-specific or condition-specific modulation of transcript levels of the polynucleotide to be transcribed. Polynucleotides representing changes to the nucleotide sequence of the DNA-protein contact region by insertion of additional nucleotides, changes to identity of relevant nucleotides, including use of chemically-modified bases, or deletion of one or more nucleotides are considered encompassed by the present invention.
Typically, related promoters exhibit at least 80% sequence identity, preferably at least 85%, more preferably at least 90%, and most preferably at least 95%, even more preferably, at least 96%, at least 97%, at least 98% or at least 99% sequence identity. Such sequence identity can be calculated by the algorithms and computer programs described above.
Usually, such sequence identity is exhibited in an alignment region that is at least 75% of the length of a sequence or corresponding full-length sequence of a promoter described herein; more usually at least 80%; more usually, at least 85%, more usually at least 90%, and most usually at least 95%, even more usually, at least 96%, at least 97%, at least 98% or at least 99% of the length of a sequence of a promoter described herein.
The percentage of the alignment length is calculated by counting the number of residues of the sequence in the region of strongest alignment, e.g., a continuous region of the sequence that contains the greatest number of residues that are identical between the two sequences that are being aligned. The number of residues in the region of strongest alignment is divided by the total residue length of a sequence of a promoter described herein. These related promoters may exhibit similar preferential transcription as those promoters described herein.
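By way of illustration only, the two quantities discussed above (percent sequence identity and the percentage of the promoter length covered by the alignment region) can be sketched as follows. This sketch assumes that the two sequences have already been aligned by one of the programs referred to above and are supplied as equal-length strings in which "-" marks a gap; the function names and the default thresholds of 80% identity and 75% coverage (the lowest values recited above) are chosen for the example and are not part of any particular alignment program.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [
        (r, q)
        for r, q in zip(aligned_ref.upper(), aligned_query.upper())
        if r != "-" and q != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for r, q in columns if r == q)
    return 100.0 * matches / len(columns)


def alignment_coverage(aligned_ref: str, ref_length: int) -> float:
    """Percent of the full-length reference promoter that lies in the aligned region."""
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    residues_in_region = sum(1 for r in aligned_ref if r != "-")
    return 100.0 * residues_in_region / ref_length


def is_related_promoter(
    aligned_ref: str,
    aligned_query: str,
    ref_length: int,
    min_identity: float = 80.0,   # assumed default: lowest identity recited above
    min_coverage: float = 75.0,   # assumed default: lowest coverage recited above
) -> bool:
    """True when both the identity and the coverage thresholds are met."""
    return (
        percent_identity(aligned_ref, aligned_query) >= min_identity
        and alignment_coverage(aligned_ref, ref_length) >= min_coverage
    )
```

For instance, a 10-residue aligned region with one mismatch, taken from a 12-residue reference promoter, gives 90% identity over about 83% of the reference length and would therefore meet the lowest thresholds recited above.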
In certain embodiments, a promoter, such as a leaf-preferred or leaf-specific promoter, can be identified by sequence homology or sequence identity to any root-specific promoter identified herein. In other embodiments, orthologous genes identified herein as leaf-specific genes (e.g., the same gene or a different gene that is functionally equivalent) for a given species can be identified and the associated promoter can also be used in the compositions and methods provided herein. For example, using high, medium or low stringency conditions, standard promoter rules can be used to identify other useful promoters from orthologous genes for use in the compositions and methods provided herein. In specific embodiments, the orthologous gene is a gene expressed only or primarily in the root, such as in pericycle cells.
Polynucleotides can be tested for activity by cloning the sequence into an appropriate vector, transforming plants with the construct and assaying for marker gene expression. Recombinant DNA constructs can be prepared, which comprise the polynucleotide sequences of the invention inserted into a vector suitable for transformation of plant cells. The construct can be made using standard recombinant DNA techniques (Sambrook et al., 1989) and can be introduced to the species of interest by Agrobacterium-mediated transformation or by other means of transformation as referenced below.
The vector backbone can be any of those typical in the art such as plasmids, viruses, artificial chromosomes, BACs, YACs and PACs and vectors of the sort described by (a) BAC: Shizuya et al., 1992, Proc. Natl. Acad. Sci. USA 89:8794-8797; Hamilton et al., 1996, Proc. Natl. Acad. Sci. USA 93:9975-9979; (b) YAC: Burke et al., 1987, Science 236:806-812; (c) PAC: Sternberg N. et al., 1990, Proc. Natl. Acad. Sci. USA 87(1):103-107; (d) Bacteria-Yeast Shuttle Vectors: Bradshaw et al., 1995, Nucl. Acids Res. 23:4850-4856; (e) Lambda Phage Vectors: Replacement Vector, e.g., Frischauf et al., 1983, J. Mol. Biol. 170:827-842; or Insertion Vector, e.g., Huynh et al., 1985, In: Glover N M (ed) DNA Cloning: A Practical Approach, Vol. 1, Oxford: IRL Press; (f) T-DNA gene fusion vectors: Walden et al., 1990, Mol Cell Biol 1:175-194; and (g) Plasmid vectors: Sambrook et al., infra.
Typically, the construct comprises a vector containing a sequence of the present invention operationally linked to any marker gene. The polynucleotide is identified as a promoter by the expression of the marker gene. Although many marker genes can be used, Green Fluorescent Protein (GFP) is preferred. The vector may also comprise a marker gene that confers a selectable phenotype on plant cells. The marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, bleomycin, or hygromycin, or herbicide resistance, such as resistance to chlorsulfuron or phosphinothricin (see below). Vectors can also include origins of replication, scaffold attachment regions (SARs), markers, homologous sequences, introns, etc.
Specific promoters may be used in the compositions and methods provided herein. As used herein, “specific promoters” refers to a subset of promoters that have a high preference for modulating transcript levels in a specific tissue or organ or cell and/or at a specific time during development of an organism. By “high preference” is meant at least a 3-fold, preferably at least a 5-fold, more preferably at least a 10-fold, still more preferably at least a 20-fold, 50-fold or 100-fold increase in transcript levels under the specific condition over the transcription under any other reference condition considered. Typical examples of temporal and/or tissue or organ specific promoters of plant origin that can be used in the compositions and methods of the present invention include RCc2 and RCc3, promoters that direct root-specific gene transcription in rice (Xu et al., 1995, Plant Mol. Biol. 27:237), and TobRB27, a root-specific promoter from tobacco (Yamamoto et al., 1991, Plant Cell 3:371). Examples of tissue-specific promoters under developmental control include promoters that initiate transcription only in certain tissues or organs, such as roots.
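The “high preference” criterion defined above can be illustrated with a short sketch. It assumes that transcript levels have been measured on a common scale for the specific condition and for each reference condition considered; the function names and the example values are hypothetical. Because the increase must hold over any other reference condition, the comparison is made against the highest reference level.

```python
from typing import Sequence


def fold_preference(specific_level: float, reference_levels: Sequence[float]) -> float:
    """Fold increase of the specific condition over the highest reference condition."""
    if not reference_levels:
        raise ValueError("at least one reference condition is required")
    highest_reference = max(reference_levels)
    if highest_reference <= 0:
        raise ValueError("reference transcript levels must be positive")
    return specific_level / highest_reference


def has_high_preference(
    specific_level: float,
    reference_levels: Sequence[float],
    min_fold: float = 3.0,  # lowest threshold recited above; 5, 10, 20, 50 or 100 are stricter
) -> bool:
    """True when the specific condition exceeds every reference condition by min_fold."""
    return fold_preference(specific_level, reference_levels) >= min_fold
```

With a hypothetical root transcript level of 120 units and leaf, stem and flower levels of 10, 30 and 24 units, the fold preference is 120/30 = 4, which satisfies the 3-fold threshold but not the 5-fold threshold.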
“Preferential transcription” is defined as transcription that occurs in a particular pattern of cell types or developmental times or in response to specific stimuli or combination thereof. Non-limitative examples of preferential transcription include: high transcript levels of a desired sequence in root tissues; detectable transcript levels of a desired sequence in certain cell types during embryogenesis; and low transcript levels of a desired sequence under drought conditions. Such preferential transcription can be determined by measuring initiation, rate, and/or levels of transcription.
Typically, promoter or control elements, which provide preferential transcription in cells, tissues, or organs of a root, produce transcript levels that are statistically significant as compared to other cells, organs or tissues. For preferential up-regulation of transcription, promoter and control elements produce transcript levels that are above background of the assay.
The method of the present invention comprises detecting host cells that express a selectable marker. In certain embodiments, the step of detecting host cells that express the selectable marker is performed by Fluorescence Activated Cell Sorting (FACS). FACS is a well-known method for separating particles, including cells, based on the fluorescent properties of the particles (see, e.g., Kamarck, 1987, Methods Enzymol, 151:150-165). Laser excitation of fluorescent moieties in the individual particles results in a small electrical charge allowing electromagnetic separation of positive and negative particles from a mixture. In one embodiment, cell surface marker-specific antibodies or ligands are labeled with distinct fluorescent labels. Cells are processed through the cell sorter, allowing separation of cells based on their ability to bind to the antibodies used. FACS sorted particles may be directly deposited into individual wells of 96-well or 384-well plates to facilitate separation and cloning.
Also, desired plants may be obtained by engineering the disclosed gene constructs into a variety of plant cell types, including but not limited to, protoplasts, tissue culture cells, tissue and organ explants, pollens, embryos as well as whole plants. In an embodiment of the present invention, the engineered plant material is selected or screened for transformants (those that have incorporated or integrated the introduced gene construct(s)) following the approaches and methods described below. An isolated transformant may then be regenerated into a plant. Alternatively, the engineered plant material may be regenerated into a plant or plantlet before subjecting the derived plant or plantlet to selection or screening for the marker gene traits. Procedures for regenerating plants from plant cells, tissues or organs, either before or after selecting or screening for marker gene(s), are well known to those skilled in the art.
A transformed plant cell, callus, tissue or plant may be identified and isolated by selecting or screening the engineered plant material for traits encoded by the marker genes present on the transforming DNA. For instance, selection may be performed by growing the engineered plant material on media containing an inhibitory amount of the antibiotic or herbicide to which the transforming gene construct confers resistance. Further, transformed plants and plant cells may also be identified by screening for the activities of any visible marker genes (e.g., the β-glucuronidase, luciferase, B or C1 genes) that may be present on the recombinant nucleic acid constructs of the present invention. Such selection and screening methodologies are well known to those skilled in the art.
Physical and biochemical methods also may be used to identify plant or plant cell transformants containing the gene constructs of the present invention. These methods include but are not limited to: 1) Southern analysis or PCR amplification for detecting and determining the structure of the recombinant DNA insert; 2) Northern blot, S1 RNase protection, primer-extension or reverse transcriptase-PCR amplification for detecting and examining RNA transcripts of the gene constructs; 3) enzymatic assays for detecting enzyme or ribozyme activity, where such gene products are encoded by the gene construct; 4) protein gel electrophoresis, Western blot techniques, immunoprecipitation, or enzyme-linked immunoassays, where the gene construct products are proteins. Additional techniques, such as in situ hybridization, enzyme staining, and immunostaining, also may be used to detect the presence or expression of the recombinant construct in specific plant organs and tissues. The methods for doing all these assays are well known to those skilled in the art.
Following transformation, a plant may be regenerated, e.g., from single cells, callus tissue or leaf discs, as is standard in the art. Almost any plant can be entirely regenerated from cells, tissues, and organs of the plant. Available techniques are reviewed in Vasil et al., 1984, in Cell Culture and Somatic Cell Genetics of Plants, Vols. I, II, and III, Laboratory Procedures and Their Applications (Academic Press); and Weissbach et al., 1989, Methods For Plant Mol. Biol.
The transformed plants may then be grown, and either pollinated with the same transformed strain or different strains, and the resulting hybrid having expression of the desired phenotypic characteristic identified. Two or more generations may be grown to ensure that expression of the desired phenotypic characteristic is stably maintained and inherited and then seeds harvested to ensure expression of the desired phenotypic characteristic has been achieved.
Normally, a plant cell is regenerated to obtain a whole plant from the transformation process. The term “growing” or “regeneration” as used herein means growing a whole plant from a plant cell, a group of plant cells, a plant part (including seeds), or a plant piece (e.g., from a protoplast, callus, or tissue part).
Regeneration from protoplasts varies from species to species of plants, but generally a suspension of protoplasts is first made. In certain species, embryo formation can then be induced from the protoplast suspension. The culture media will generally contain various amino acids and hormones, necessary for growth and regeneration. Examples of hormones utilized include auxins and cytokinins. Efficient regeneration will depend on the medium, on the genotype, and on the history of the culture. If these variables are controlled, regeneration is reproducible.
Regeneration also occurs from plant callus, explants, organs or parts. Transformation can be performed in the context of organ or plant part regeneration (see Methods in Enzymology, Vol. 118 and Klee et al., Annual Review of Plant Physiology, 38:467, 1987). Utilizing the leaf disk-transformation-regeneration method of Horsch et al., Science, 227:1229, 1985, disks are cultured on selective media, followed by shoot formation in about 2-4 weeks. Shoots that develop are excised from calli and transplanted to appropriate root-inducing selective medium. Rooted plantlets are transplanted to soil as soon as possible after roots appear. The plantlets can be repotted as required, until reaching maturity.
In vegetatively propagated crops, the mature transgenic plants are propagated by utilizing cuttings or tissue culture techniques to produce multiple identical plants. Selection of desirable transgenics is made and new varieties are obtained and propagated vegetatively for commercial use.
In seed propagated crops, mature transgenic plants can be self crossed to produce a homozygous inbred plant. The resulting inbred plant produces seed containing the newly introduced foreign gene(s). These seeds can be grown to produce plants that would produce the selected phenotype, e.g., increased lateral root growth, uptake of nutrients, overall plant growth and/or vegetative or reproductive yields.
Parts obtained from the regenerated plant, such as flowers, seeds, leaves, branches, fruit, and the like are included in the invention, provided that these parts comprise cells comprising the isolated nucleic acid of the present invention. Progeny, variants, and mutants of the regenerated plants are also included within the scope of the invention, provided that they comprise the introduced nucleic acid sequences. Transgenic plants expressing the selectable marker can be screened for transmission of the nucleic acid of the present invention by, for example, standard immunoblot and DNA detection techniques. Transgenic lines are also typically evaluated on levels of expression of the heterologous nucleic acid. Expression at the RNA level can be determined initially to identify and quantitate expression-positive plants. Standard techniques for RNA analysis can be employed and include PCR amplification assays using oligonucleotide primers designed to amplify only the heterologous RNA templates and solution hybridization assays using heterologous nucleic acid-specific probes. The RNA-positive plants can then be analyzed for protein expression by Western immunoblot analysis using the specifically reactive antibodies of the present invention. In addition, in situ hybridization and immunocytochemistry according to standard protocols can be done using heterologous nucleic acid specific polynucleotide probes and antibodies, respectively, to localize sites of expression within transgenic tissue. Generally, a number of transgenic lines are screened for the incorporated nucleic acid to identify and select plants with the most appropriate expression profiles.
A preferred embodiment is a transgenic plant that is homozygous for the added heterologous nucleic acid; i.e., a transgenic plant that contains two added nucleic acid sequences, one gene at the same locus on each chromosome of a chromosome pair. A homozygous transgenic plant can be obtained by sexually mating (selfing) a heterozygous transgenic plant that contains a single added heterologous nucleic acid, germinating some of the seed produced and analyzing the resulting plants produced for altered expression of a polynucleotide of the present invention relative to a control plant (i.e., native, non-transgenic). Back-crossing to a parental plant and out-crossing with a non-transgenic plant are also contemplated.
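As an illustrative calculation only, and assuming a single transgene locus that segregates in simple Mendelian fashion (an assumption not stated above), the expected genotype classes among the progeny of a selfed heterozygous plant can be enumerated as follows. The function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def selfing_ratios():
    """Expected genotype fractions among progeny of a selfed plant that is
    heterozygous (T/-) for one transgene locus (Mendelian segregation assumed)."""
    gametes = ["T", "-"]  # each gamete either carries the transgene or does not
    counts = {"T/T": 0, "T/-": 0, "-/-": 0}
    for egg, pollen in product(gametes, repeat=2):
        n_copies = (egg == "T") + (pollen == "T")
        key = {2: "T/T", 1: "T/-", 0: "-/-"}[n_copies]
        counts[key] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


ratios = selfing_ratios()
# Fraction of transgene-carrying progeny expected to be homozygous
homozygous_among_carriers = ratios["T/T"] / (ratios["T/T"] + ratios["T/-"])
```

Under these assumptions one quarter of the progeny are expected to be homozygous, which is why several seedlings are germinated and analyzed after selfing.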
Transformed plant cells which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant which possesses the transformed genotype. Such regeneration techniques often rely on manipulation of certain phytohormones in a tissue culture growth medium. For transformation and regeneration of maize see, Gordon-Kamm et al., 1990, The Plant Cell, 2:603-618.
Plant cells transformed with a plant expression vector can be regenerated, e.g., from single cells, callus tissue or leaf discs according to standard plant tissue culture techniques. It is well known in the art that various cells, tissues, and organs from almost any plant can be successfully cultured to regenerate an entire plant. Plant regeneration from cultured protoplasts is described in Evans et al., 1983, Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, Macmillan Publishing Company, New York, pp. 124-176; and Binding, Regeneration of Plants, Plant Protoplasts, 1985, CRC Press, Boca Raton, pp. 21-73.
The regeneration of plants containing the foreign gene introduced by Agrobacterium from leaf explants can be achieved as described by Horsch et al., 1985, Science, 227:1229-1231. In this procedure, transformants are grown in the presence of a selection agent and in a medium that induces the regeneration of shoots in the plant species being transformed as described by Fraley et al., 1983, Proc. Natl. Acad. Sci. (U.S.A.), 80:4803. This procedure typically produces shoots within two to four weeks and these transformant shoots are then transferred to an appropriate root-inducing medium containing the selective agent and an antibiotic to prevent bacterial growth. Transgenic plants of the present invention may be fertile or sterile.
The regeneration of plants from either single plant protoplasts or various explants is well known in the art. See, for example, Methods for Plant Molecular Biology, A. Weissbach and H. Weissbach, eds., 1988, Academic Press, Inc., San Diego, Calif. This regeneration and growth process includes the steps of selection of transformant cells and shoots, rooting the transformant shoots and growth of the plantlets in soil. For maize cell culture and regeneration see generally, The Maize Handbook, Freeling and Walbot, Eds., 1994, Springer, N.Y. 1994; Corn and Corn Improvement, 3rd edition, Sprague and Dudley Eds., 1988, American Society of Agronomy, Madison, Wis.
The present invention also provides a plant comprising a plant cell as disclosed. Transformed seeds and plant parts are also encompassed.
In addition to a plant, the present invention provides any clone of such a plant, seed, selfed or hybrid progeny and descendants, and any part of any of these, such as cuttings, seed. The invention provides any plant propagule, that is any part which may be used in reproduction or propagation, sexual or asexual, including cuttings, seed and so on. Also encompassed by the invention is a plant which is a sexually or asexually propagated off-spring, clone or descendant of such a plant, or any part or propagule of said plant, off-spring, clone or descendant. Plant extracts and derivatives are also provided.
Any species of woody, ornamental or decorative, crop or cereal, fruit or vegetable plant, and algae (e.g., Chlamydomonas reinhardtii) may be used in the compositions and methods provided herein. Non-limiting examples of plants include plants from the genus Arabidopsis or the genus Oryza. Other examples include plants from the genuses Acorus, Aegilops, Allium, Amborella, Antirrhinum, Apium, Arachis, Beta, Betula, Brassica, Capsicum, Ceratopteris, Citrus, Cryptomeria, Cycas, Descurainia, Eschscholzia, Eucalyptus, Glycine, Gossypium, Hedyotis, Helianthus, Hordeum, Ipomoea, Lactuca, Linum, Liriodendron, Lotus, Lupinus, Lycopersicon, Medicago, Mesembryanthemum, Nicotiana, Nuphar, Pennisetum, Persea, Phaseolus, Physcomitrella, Picea, Pinus, Poncirus, Populus, Prunus, Robinia, Rosa, Saccharum, Schedonorus, Secale, Sesamum, Solanum, Sorghum, Stevia, Thellungiella, Theobroma, Triphysaria, Triticum, Vitis, Zea, or Zinnia.
Plants included in the invention are any plants amenable to transformation techniques, including gymnosperms and angiosperms, both monocotyledons and dicotyledons.
Examples of monocotyledonous angiosperms include, but are not limited to, asparagus, field and sweet corn, barley, wheat, rice, sorghum, onion, pearl millet, rye and oats and other cereal grains.
Examples of dicotyledonous angiosperms include, but are not limited to, tomato, tobacco, cotton, rapeseed, field beans, soybeans, peppers, lettuce, peas, alfalfa, clover, cole crops or Brassica oleracea (e.g., cabbage, broccoli, cauliflower, Brussels sprouts), radish, carrot, beets, eggplant, spinach, cucumber, squash, melons, cantaloupe, sunflowers and various ornamentals.
Examples of woody species include poplar, pine, sequoia, cedar, oak, etc.
Still other examples of plants include, but are not limited to, wheat, cauliflower, tomato, tobacco, corn, petunia, trees, etc.
In certain embodiments, plants of the present invention are crop plants (for example, cereals and pulses, maize, wheat, potatoes, tapioca, rice, sorghum, millet, cassava, barley, pea, and other root, tuber, or seed crops). Exemplary cereal crops used in the compositions and methods of the invention include, but are not limited to, any species of grass, or grain plant (e.g., barley, corn, oats, rice, wild rice, rye, wheat, millet, sorghum, triticale, etc.), non-grass plants (e.g., buckwheat, flax, legumes or soybeans, etc.). Grain plants that provide seeds of interest include oil-seed plants and leguminous plants. Other seeds of interest include grain seeds, such as corn, wheat, barley, rice, sorghum, rye, etc. Oil seed plants include cotton, soybean, safflower, sunflower, Brassica, maize, alfalfa, palm, coconut, etc. Other important seed crops are oil-seed rape, sugar beet, maize, sunflower, soybean, and sorghum. Leguminous plants include beans and peas. Beans include guar, locust bean, fenugreek, soybean, garden beans, cowpea, mungbean, lima bean, fava bean, lentils, chickpea, etc.
Horticultural plants to which the present invention may be applied may include lettuce, endive, and vegetable brassicas including cabbage, broccoli, and cauliflower, and carnations and geraniums. The present invention may also be applied to tobacco, cucurbits, carrot, strawberry, sunflower, tomato, pepper, chrysanthemum, poplar, eucalyptus, and pine.
The present invention may be used for transformation of other plant species, including, but not limited to, corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), sunflower (Helianthus annuus), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana tabacum, Nicotiana benthamiana), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), sugar beets (Beta vulgaris), oats, barley, Arabidopsis spp., vegetables, ornamentals, and conifers.
Methods of cultivation of plants are well known in the art. For example, for the cultivation of wheat see Alcoz et al., 1993, Agronomy Journal 85:1198-1203; Rao and Dao, 1992, J. Am. Soc. Agronomy 84:1028-1032; Howard and Lessman, 1991, Agronomy Journal 83:208-211; for the cultivation of corn see Tollenear et al., 1993, Agronomy Journal 85:251-255; Straw et al., Tennessee Farm and Home Science: Progress Report, Spring 1993, 166:20-24; Miles, S. R., 1934, J. Am. Soc. Agronomy 26:129-137; Dara et al., 1992, J. Am. Soc. Agronomy 84:1006-1010; Binford et al., 1992, Agronomy Journal 84:53-59; for the cultivation of soybean see Chen et al., 1992, Canadian Journal of Plant Science 72:1049-1056; Wallace et al., 1990, Journal of Plant Nutrition 13:1523-1537; for the cultivation of rice see Oritani and Yoshida, 1984, Japanese Journal of Crop Science 53:204-212; for the cultivation of linseed see Diepenbrock and Porksen, 1992, Industrial Crops and Products 1:165-173; for the cultivation of tomato see Grubinger et al., 1993, Journal of the American Society for Horticultural Science 118:212-216; Cerne, M., 1990, Acta Horticulturae 277:179-182; for the cultivation of pineapple see Magistad et al., 1932, J. Am. Soc. Agronomy 24:610-622; Asoegwu, S. N., 1988, Fertilizer Research 15:203-210; Asoegwu, S. N., 1987, Fruits 42:505-509; for the cultivation of lettuce see Richardson and Hardgrave, 1992, Journal of the Science of Food and Agriculture 59:345-349; for the cultivation of mint see Munsi, P. S., 1992, Acta Horticulturae 306:436-443; for the cultivation of camomile see Letchamo, W., 1992, Acta Horticulturae 306:375-384; for the cultivation of tobacco see Sisson et al., 1991, Crop Science 31:1615-1620; for the cultivation of potato see Porter and Sisson, 1991, American Potato Journal, 68:493-505; for the cultivation of brassica crops see Rahn et al., 1992, Conference “Proceedings, second congress of the European Society for Agronomy” Warwick Univ., p. 424-425; for the cultivation of banana see Hegde and Srinivas, 1991, Tropical Agriculture 68:331-334; Langenegger and Smith, 1988, Fruits 43:639-643; for the cultivation of strawberries see Human and Kotze, 1990, Communications in Soil Science and Plant Analysis 21:771-782; for the cultivation of sorghum see Mahalle and Seth, 1989, Indian Journal of Agricultural Sciences 59:395-397; for the cultivation of plantain see Anjorin and Obigbesan, 1985, Conference “International Cooperation for Effective Plantain and Banana Research” Proceedings of the third meeting. Abidjan, Ivory Coast, p. 115-117; for the cultivation of sugar cane see Yadav, R. L., 1986, Fertiliser News 31:17-22; Yadav and Sharma, 1983, Indian Journal of Agricultural Sciences 53:38-43; for the cultivation of sugar beet see Draycott et al., 1983, Conference “Symposium Nitrogen and Sugar Beet” International Institute for Sugar Beet Research, Brussels, Belgium, p. 293-303. See also Goh and Haynes, 1986, “Nitrogen and Agronomic Practice” in Mineral Nitrogen in the Plant-Soil System, Academic Press, Inc., Orlando, Fla., p. 379-468; Engelstad, O. P., 1985, Fertilizer Technology and Use, Third Edition, Soil Science Society of America, p. 633; Yadav and Sharma, 1983, Indian Journal of Agricultural Sciences, 53:3-43.
Engineered plants exhibiting the desired physiological and/or agronomic changes can be used directly in agricultural production.
Thus, provided herein are products derived from the transgenic plants or methods of producing transgenic plants provided herein. In certain embodiments, the products are commercial products. Some non-limiting examples include genetically engineered trees, e.g., for the production of pulp, paper, paper products or lumber; tobacco, e.g., for the production of cigarettes, cigars, or chewing tobacco; crops, e.g., for the production of fruits, vegetables and other food, including grains, e.g., for the production of wheat, bread, flour, rice, corn; and canola, sunflower, e.g., for the production of oils or biofuels.
In certain embodiments, commercial products are derived from a genetically engineered (e.g., comprising overexpression of GLK1 in the vegetative tissues of the plant) species of woody, ornamental or decorative, crop or cereal, fruit or vegetable plant, and algae (e.g., Chlamydomonas reinhardtii), which may be used in the compositions and methods provided herein. Non-limiting examples of plants include plants from the genus Arabidopsis or the genus Oryza. Other examples include plants from the genuses Acorus, Aegilops, Allium, Amborella, Antirrhinum, Apium, Arachis, Beta, Betula, Brassica, Capsicum, Ceratopteris, Citrus, Cryptomeria, Cycas, Descurainia, Eschscholzia, Eucalyptus, Glycine, Gossypium, Hedyotis, Helianthus, Hordeum, Ipomoea, Lactuca, Linum, Liriodendron, Lotus, Lupinus, Lycopersicon, Medicago, Mesembryanthemum, Nicotiana, Nuphar, Pennisetum, Persea, Phaseolus, Physcomitrella, Picea, Pinus, Poncirus, Populus, Prunus, Robinia, Rosa, Saccharum, Schedonorus, Secale, Sesamum, Solanum, Sorghum, Stevia, Thellungiella, Theobroma, Triphysaria, Triticum, Vitis, Zea, or Zinnia.
In some embodiments, commercial products are derived from genetically engineered gymnosperms and angiosperms, both monocotyledons and dicotyledons. Examples of monocotyledonous angiosperms include, but are not limited to, asparagus, field and sweet corn, barley, wheat, rice, sorghum, onion, pearl millet, rye and oats and other cereal grains. Examples of dicotyledonous angiosperms include, but are not limited to, tomato, tobacco, cotton, rapeseed, field beans, soybeans, peppers, lettuce, peas, alfalfa, clover, cole crops or Brassica oleracea (e.g., cabbage, broccoli, cauliflower, Brussels sprouts), radish, carrot, beets, eggplant, spinach, cucumber, squash, melons, cantaloupe, sunflowers and various ornamentals.
In certain embodiments, commercial products are derived from a genetically engineered woody species, such as poplar, pine, sequoia, cedar, oak, etc.
In other embodiments, commercial products are derived from a genetically engineered plant including, but not limited to, wheat, cauliflower, tomato, tobacco, corn, petunia, trees, etc.
In certain embodiments, commercial products are derived from genetically engineered crop plants, for example, cereals and pulses, maize, wheat, potatoes, tapioca, rice, sorghum, millet, cassava, barley, pea, and other root, tuber, or seed crops. In one embodiment, commercial products are derived from genetically engineered cereal crops, including, but not limited to, any species of grass, or grain plant (e.g., barley, corn, oats, rice, wild rice, rye, wheat, millet, sorghum, triticale, etc.), non-grass plants (e.g., buckwheat, flax, legumes or soybeans, etc.). In another embodiment, commercial products are derived from genetically engineered grain plants that provide seeds of interest, oil-seed plants and leguminous plants. In other embodiments, commercial products are derived from genetically engineered grain seed plants, such as corn, wheat, barley, rice, sorghum, rye, etc. In yet other embodiments, commercial products are derived from genetically engineered oil seed plants, such as cotton, soybean, safflower, sunflower, Brassica, maize, alfalfa, palm, coconut, etc. In certain embodiments, commercial products are derived from genetically engineered oil-seed rape, sugar beet, maize, sunflower, soybean, or sorghum. In some embodiments, commercial products are derived from genetically engineered leguminous plants, such as beans and peas (e.g., guar, locust bean, fenugreek, soybean, garden beans, cowpea, mungbean, lima bean, fava bean, lentils, chickpea, etc.).
In certain embodiments, commercial products are derived from a genetically engineered horticultural plant of the present invention, such as lettuce, endive, and vegetable brassicas including cabbage, broccoli, and cauliflower, and carnations and geraniums; tobacco, cucurbits, carrot, strawberry, sunflower, tomato, pepper, chrysanthemum, poplar, eucalyptus, and pine.
In still other embodiments, commercial products are derived from genetically engineered corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), sunflower (Helianthus annuus), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana tabacum, Nicotiana benthamiana), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), sugar beets (Beta vulgaris), oats, barley, Arabidopsis spp., vegetables, ornamentals, and conifers.
5.3. Components of the TARGET System
The TARGET system utilizes a nucleic acid encoding a chimeric protein comprising a transcription factor fused to a domain comprising an inducible cellular localization signal and an independently expressed selectable marker. Nucleic acids for use with the TARGET system may be plasmids or other appropriate nucleic acid constructs as described in Section 5.2.3. The TARGET system also comprises methods of measuring mRNA expression levels and may additionally comprise methods of detecting TF binding to gene targets.
5.3.1. Transcription Factors
The transcription factor component of the chimeric protein encoded by the nucleic acid construct may be, but is not limited to, one of those listed in Table 3. The transcription factor used is not limited to nuclear transcription factors, but may also include proteins that modulate mitochondrial or chloroplast gene expression.
5.3.2. Localization Signals and Inducing Agents
The glucocorticoid receptor (GR) may be used as the inducible cellular localization signal in the chimeric protein encoded by the nucleic acid construct. In the case of a TF-GR chimeric protein, dexamethasone may be used as the inducing agent. Alternatively, another glucocorticoid may be used instead of dexamethasone. Treatment with dexamethasone releases the glucocorticoid receptor from sequestration in the cytoplasm, allowing the TF-GR fusion protein to access its target genes (e.g., in the nucleus). The GR is not the only such inducible cellular localization signal that may be used in this method. Any receptor component or other protein known in the art that is capable of being released from sequestration or otherwise re-localized to the destination of the transcription factor component by treatment of the protoplasts with an inducing agent may potentially be used in the TARGET system.
5.3.3. Expression System and Selectable Markers
Using any gene transfer technique, such as the above-listed techniques (of Section 5.2), an expression vector harboring the nucleic acid may be transformed into a cell to achieve temporary or prolonged expression. Any suitable expression system may be used, so long as it is capable of undergoing transformation and expressing the precursor nucleic acid in the cell. In one embodiment, a pET vector (Novagen, Madison, Wis.), or a pBI vector (Clontech, Palo Alto, Calif.) is used as the expression vector. In some embodiments an expression vector further encoding a green fluorescent protein (“GFP”) is used to allow simple selection of transfected cells and to monitor expression levels. Non-limiting examples of such vectors include Clontech's “Living Colors Vectors” pEYFP and pEYFP-C.
The recombinant construct of the present invention may include a selectable marker for propagation of the construct. For example, a construct to be propagated in bacteria preferably contains an antibiotic resistance gene, such as one that confers resistance to kanamycin, tetracycline, streptomycin, or chloramphenicol. Suitable vectors for propagating the construct include plasmids, cosmids, bacteriophages or viruses, to name but a few.
In some embodiments, the selectable marker encoded by the nucleic acid molecule used in the method of the invention is a fluorescent selection marker. A fluorescent selection marker that can be used in the method of the invention includes, but is not limited to, green fluorescent protein, yellow fluorescent protein, red fluorescent protein, cyan fluorescent protein, or blue fluorescent protein. In a specific embodiment, the fluorescent selection marker used in the method of the invention is red fluorescent protein. In certain embodiments, the step of detecting host cells that express the selectable marker is performed by Fluorescence Activated Cell Sorting (FACS). Any selectable marker known in the art that may be encoded in the nucleic acid construct and which is selectable using a cell sorting or other selection technique may be used to identify those cells that have expressed the nucleic acid construct containing the chimeric protein.
In addition, the recombinant constructs may include plant-expressible selectable or screenable marker genes for isolating, identifying or tracking of plant cells transformed by these constructs. Selectable markers include, but are not limited to, genes that confer antibiotic resistances (e.g., resistance to kanamycin or hygromycin) or herbicide resistance (e.g., resistance to sulfonylurea, phosphinothricin, or glyphosate). Screenable markers include, but are not limited to, the genes encoding β-glucuronidase (Jefferson, 1987, Plant Molec Biol. Rep 5:387-405), luciferase (Ow et al., 1986, Science 234:856-859), and the B and C1 gene products that regulate anthocyanin pigment production (Goff et al., 1990, EMBO J 9:2517-2522).
In some cases, a selectable marker may be included with the nucleic acid being delivered to the cell. A selectable marker may refer to the use of a gene that encodes an enzymatic or other detectable activity (e.g., luminescence or fluorescence) that confers the ability to distinguish cells expressing the nucleic acid construct from those that do not. A selectable marker may confer resistance to an antibiotic or drug upon the cell in which the selectable marker is expressed. Selectable markers may be “dominant” in some cases; a dominant selectable marker encodes an enzymatic or other activity (e.g., luminescence or fluorescence) that can be detected in any cell or cell line.
In some embodiments, the marker gene is an antibiotic resistance gene whereby the appropriate antibiotic can be used to select for transformed cells from among cells that are not transformed. Examples of suitable selectable markers include adenosine deaminase, dihydrofolate reductase, hygromycin-B-phosphotransferase, thymidine kinase, xanthine-guanine phosphoribosyltransferase and aminoglycoside 3′-O-phosphotransferase II. Other suitable markers will be known to those of skill in the art.
5.3.4. Detecting the Level of mRNA Expressed in Host Cells
The methods of the present invention comprise a step of detecting the level of mRNA expressed in the host cells of the invention.
In some embodiments, the level of mRNA expressed in host cells is determined by quantitative real-time PCR (qPCR), a method for DNA amplification in which fluorescent dyes are used to detect the amount of PCR product after each PCR cycle (Higuchi et al., 1992, Simultaneous amplification and detection of specific DNA-sequences, Bio-Technology 10(4):413-417). The qPCR method has become the tool of choice for many scientists because of the method's dynamic range, accuracy, high sensitivity, specificity and speed. Quantitative PCR is carried out in a thermal cycler with the capacity to illuminate each sample with a beam of light of a specified wavelength and detect the fluorescence emitted by the excited fluorochrome. The thermal cycler is also able to rapidly heat and chill samples, thereby taking advantage of the physicochemical properties of the nucleic acids and DNA polymerase.
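The text does not specify how qPCR cycle values are converted into relative expression. As a minimal sketch only, the following uses the widely used 2^(-ΔΔCt) calculation, which assumes roughly 100% amplification efficiency and a stably expressed reference gene; both are assumptions, and the function name and example Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript (treated vs. control) by the
    2^(-ddCt) calculation, normalizing each sample to a reference gene."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    ddct = delta_treated - delta_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the treated sample, while the reference gene is unchanged.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
```

With these illustrative values the target is estimated to be four-fold more abundant in the treated sample than in the control.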
In some embodiments, the level of mRNA expressed in host cells is determined by high-throughput sequencing (next-generation sequencing; also ‘next-gen sequencing’ or NGS). NGS methods are highly parallelized processes that enable the sequencing of thousands to millions of molecules at once. Popular NGS methods include pyrosequencing developed by 454 Life Sciences (now Roche), which makes use of luciferase to read out signals as individual nucleotides are added to DNA templates; Illumina sequencing, which uses reversible dye-terminator techniques that add a single nucleotide to the DNA template in each cycle; and SOLiD sequencing by Life Technologies, which sequences by preferential ligation of fixed-length oligonucleotides.
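Sequencing read counts are usually scaled for library size before expression levels are compared between samples. The text does not prescribe a normalization, so the following is only a sketch under assumed conventions (counts per million and a log2 ratio with a pseudocount); the gene names and counts are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw per-gene read counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2(treated/control) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in treated
        if gene in control
    }


# Hypothetical raw counts for a two-gene library
cpm = counts_per_million({"geneA": 300, "geneB": 700})
# Hypothetical normalized values for one gene in treated and control samples
lfc = log2_fold_change({"geneA": 7.0}, {"geneA": 1.0})
```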
In some embodiments, the level of mRNA expressed in host cells is determined by gene microarrays. A microarray works by exploiting the ability of a given mRNA molecule to bind specifically to, or hybridize to, the DNA template from which it originated. By using an array containing many DNA samples, the expression levels of hundreds or thousands of genes within a cell can be determined in a single experiment by measuring the amount of mRNA bound to each site on the array. With the aid of a computer, the amount of mRNA bound to the spots on the microarray is precisely measured, generating a profile of gene expression in the cell.
5.3.5. Detecting TF Binding to Gene Targets
In some embodiments, the method comprises detection of the level of TF binding to gene targets by ChIP-Seq analysis. ChIP-Seq analysis utilizes chromatin immunoprecipitation in parallel with DNA sequencing to map the binding sites of a TF or other protein of interest. First, proteins are cross-linked to the chromatin with which they interact, and the chromatin is fragmented. Then, immunoprecipitation is used to isolate the TF with bound chromatin/DNA. The associated chromatin/DNA fragments are sequenced to determine the genomic location of protein binding. Other assays known in the art may be used to detect the location of TF binding to genomic regions of DNA.
In some embodiments, the yeast one hybrid method may be used. The yeast one hybrid method detects protein-DNA interactions, and may be adapted for use in plants. The DNA binding domains unveiled by ChIP-Seq may be cloned upstream of a reporter gene in a vector or may be introduced into the plant genome by homologous recombination, which allows the transcription factor to interact with the DNA element in a natural environment. A fusion protein containing a constitutive TF activation domain and the DNA binding domain of the TF of interest may then be expressed, and the interaction of the binding domain with the DNA will be detected by reporter gene expression. The yeast one hybrid method can thus be used in some embodiments as a way to interrogate the relationship between binding and activation, as only the binding domain of the TF of interest is used in the fusion protein in the heterologous system.
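Where both expression data (Section 5.3.4) and binding data are available for the same TF, the two gene lists can be compared to separate genes that are regulated and bound from genes that are only regulated or only bound. The sketch below is one simple way to do this with set operations; the category labels and gene identifiers are hypothetical and are not terms defined in this section.

```python
def classify_targets(regulated, bound):
    """Partition genes by whether they respond to TF induction (regulated),
    show TF binding (bound), or both."""
    regulated, bound = set(regulated), set(bound)
    return {
        "regulated_and_bound": regulated & bound,
        "regulated_only": regulated - bound,
        "bound_only": bound - regulated,
    }


# Hypothetical gene identifiers for illustration
classes = classify_targets(
    regulated=["gene1", "gene2", "gene3"],
    bound=["gene2", "gene3", "gene4"],
)
```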
5.3.6. Identifying Conserved Connections Across Species
In some embodiments, gene networks conserved between Arabidopsis (or another model species) and a species of interest may be determined by a data mining approach. In this approach, Arabidopsis plants are grown under the same conditions as plants from another species of interest, including perturbation of environmental signals (e.g. nitrogen). RNA is then extracted from the roots and shoots of the plants, and cDNA synthesized from the extracted RNA. A microarray analysis and filtering approach may be used to determine the genes of each species regulated by the environmental signal when compared with control conditions. An ortholog analysis may then determine the genes orthologous between the two species. Data integration and network analysis then allows for the determination of a core translational network. In some embodiments, the response genes in a species of plant for which a protoplast system is not feasible may be discovered by using such a data mining approach, as described, in combination with the TARGET system for Arabidopsis or another species used as a model.
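The ortholog analysis and data integration steps described above can be sketched as follows. This is an illustration under assumed inputs: the ortholog table is taken to be a mapping from each model-species gene to a set of genes in the other species, and all identifiers and names are hypothetical.

```python
def conserved_response_genes(model_responsive, other_responsive, orthologs):
    """Return model-species response genes that have at least one ortholog
    among the response genes of the other species.

    orthologs: dict mapping a model-species gene ID to a set of ortholog IDs
    in the other species (assumed input format).
    """
    other = set(other_responsive)
    conserved = {}
    for gene in model_responsive:
        hits = orthologs.get(gene, set()) & other
        if hits:
            conserved[gene] = hits
    return conserved


# Hypothetical identifiers for illustration
orthologs = {"modelA": {"cropA1", "cropA2"}, "modelB": {"cropB1"}}
conserved = conserved_response_genes(
    model_responsive=["modelA", "modelB", "modelC"],
    other_responsive=["cropA2", "cropX"],
    orthologs=orthologs,
)
```

Only gene pairs that respond to the signal in both species are retained, which corresponds to the conserved core that the network analysis step would take as input.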
6.1. Introduction
A rapid technique to study the genome-wide effects of TF activation in protoplasts that uses transient expression of a glucocorticoid receptor (GR)-tagged TF has been developed in the present invention. This system can be used to rapidly retrieve information on direct target genes in less than two weeks' time. As a proof-of-principle candidate, the well-studied transcription factor Abscisic acid insensitive 3 (ABI3; Koornneef et al., 1989, Plant physiology, 90:463-469; Mönke et al., 2012, Nucleic acids research 40:8240-8254) was used. The de novo identification of the abscisic acid response element (ABRE) and a majority of the previously classified direct targets was established by use of this method. This technique was named TARGET, for Transient Assay Reporting Genome-wide Effects of Transcription factors.
Technically, plant protoplasts are transfected with a plasmid (pBeaconRFP_GR) that expresses the TF-of-interest fused to GR, which allows the controlled entry of the chimeric GR-TF into the nucleus by addition of the GR-ligand dexamethasone (DEX; Schena and Yamamoto, 1988, Science 241:965-967). In addition, the vector contains a separate expression cassette with a positive fluorescent selection marker (red fluorescent protein; RFP) which enables fluorescence activated cell sorting (FACS) of successfully transformed protoplasts (see
6.2. Materials and Methods
Plant materials and treatment. Wild-type Arabidopsis thaliana seed (Col-0, Arabidopsis Biological Resource Center) was sterilized by 5 min incubation with 96% ethanol followed by 20 min incubation with 50% household bleach and rinsing with sterile water. Seeds were plated on square 10×10 cm plates (Fisher Scientific) with MS-agar (2.2 g/l Murashige and Skoog Salts [Sigma-Aldrich], 1% [w/v] sucrose, 1% [w/v] agar, 0.5 g/l MES hydrate [Sigma-Aldrich], pH 5.7 with KOH) on top of a sterile nylon mesh (NITEX 03-100/47, Sefar filtration Inc.) to facilitate harvesting of the roots. Seeds were plated in two dense rows. Plates were vernalized for 2 days at 4° C. in the dark and placed vertically in an Advanced Intellus environmental controller (Percival) set to 35 μmol m⁻² s⁻¹ and 22° C. with an 18 h-light/6 h-dark regime.
Vector construction. pBeaconRFP_GR was constructed by PCR amplification of the glucocorticoid receptor from pJCGLOX (Joubes et al., 2004, The Plant Journal 37: 889-896) with primers GR-F and GR-R, both with an SpeI restriction site, using Phusion polymerase (New England Biolabs). The PCR product was ligated into the SpeI site upstream of the GATEWAY (Invitrogen) cassette in pBeaconRFP (Bargmann and Birnbaum, 2009; Plant physiology 149:1231-1239). The orientation of the insert was checked by PCR. The pBeaconRFP_GR vector (as well as the pMON999_mRFP control vector, containing only 35S::mRFP) will be made available through the VIB website: http://gateway.psb.ugent.be/.
ABI3 cDNA was PCR amplified with primers ABI3_AttB1 and ABI3_AttB2, and subsequently re-amplified with primers AttB1 and AttB2 using Phusion polymerase. The PCR product was recombined into pDONR221 using BP clonase and subsequently shuttled into pBeaconRFP_GR with LR clonase (Invitrogen).
Protoplast preparation, transfection, treatment and cell sorting. Protoplasts were prepared, transfected and sorted as described in Bargmann and Birnbaum, 2009; Plant physiology 149:1231-1239; and Bargmann and Birnbaum, 2010, JoVE. Briefly, roots of 10-day-old seedlings were harvested and treated with cell wall digesting enzymes (Cellulase and Macerozyme; Yakult, Japan) for 3 hours. Cells were filtered and washed, and 10^6 cells were transfected by polyethylene glycol treatment using 50 μg of plasmid DNA and incubated at room temperature overnight. Protoplast suspensions were pretreated with 35 μM cycloheximide (CHX; Sigma-Aldrich) for 30 min, after which 10 μM dexamethasone (DEX; Sigma-Aldrich) was added and cells were incubated at room temperature. Controls were treated with solvent alone. A 10 mM DEX stock was dissolved in ethanol and a 50 mM CHX stock was dissolved in dimethylsulfoxide; both were stored at −20° C. All transfections and treatments were performed in triplicate. Treated protoplast suspensions were sorted with a FACSAria (BD Biosciences), using 488 nm excitation and measuring emission at 530/30 nm for green fluorescence and 610/20 nm for red fluorescence. RFP-positive cells were sorted directly into RNA extraction buffer. Twenty thousand RFP-positive cells (approximately 10% of sorted events were RFP-positive under these experimental conditions) were then isolated by FACS and RNA was extracted for transcript analysis by qPCR.
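The working concentrations given above follow from the stated stocks by simple dilution (C1·V1 = C2·V2). The following Python sketch illustrates the arithmetic; the function name and the 1 ml suspension volume are assumptions made for illustration and are not part of the described protocol.

```python
def stock_volume_ul(stock_mM, final_uM, final_volume_ml):
    """Volume of stock (in µl) giving final_uM in final_volume_ml (C1*V1 = C2*V2)."""
    stock_uM = stock_mM * 1000.0            # mM -> µM
    final_volume_ul = final_volume_ml * 1000.0
    return final_uM * final_volume_ul / stock_uM

# Hypothetical 1 ml protoplast suspension (volume is an assumption)
dex_ul = stock_volume_ul(stock_mM=10, final_uM=10, final_volume_ml=1.0)  # 10 mM DEX stock -> 10 µM
chx_ul = stock_volume_ul(stock_mM=50, final_uM=35, final_volume_ml=1.0)  # 50 mM CHX stock -> 35 µM
print(f"DEX stock: {dex_ul:.2f} µl, CHX stock: {chx_ul:.2f} µl per ml")
```

Under these assumptions, both stocks are added at roughly a 1:1000 dilution, which keeps the solvent concentration in the suspension low.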
A temporal qPCR analysis of PER1 and CRU3 induction by DEX in the presence of CHX was performed after a 1-hour, 5-hour and overnight (16-hour) incubation (see
qPCR and microarray analysis. RNA was extracted using an RNeasy Micro Kit with RNase-free DNase Set according to the manufacturer's instructions (QIAGEN). RNA was quantified with a Bioanalyzer (Agilent Technologies). Gene expression was determined by quantitative real-time PCR (LightCycler; Roche Diagnostics) using gene-specific primers and LightCycler FastStart DNA Master SYBR Green (Roche Diagnostics). Expression levels of tested genes were normalized to expression levels of the ACT2/8 and CLATHRIN genes as described in Krouk et al., 2006, Plant Physiol 142:1075-1086. For microarray analysis, RNA was amplified and labeled with WT-Ovation Pico RNA Amplification System and FL-Ovation cDNA Biotin Module V2, respectively (NuGEN). The labeled cDNA was hybridized, washed and stained on an ATH1-121501 Arabidopsis full genome microarray using a Hybridization Control Kit, a GeneChip Hybridization, Wash, and Stain Kit, a GeneChip Fluidics Station 450 and a GeneChip Scanner (Affymetrix). The microarray data reported in this paper have been deposited in the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) database (accession #GSE33344). Raw microarray data were normalized using MAS5.0 (scaling factor of 250, Flexarray; http://www.gqinnovationcenter.com/services/bioinformatics/flexarray/index.aspx?1=e). Data were log-transformed prior to running a Tukey post hoc test on the significance coefficients of a two-way ANOVA carried out on CHX versus DEX treatment (in-house [R] script) for differential responses to DEX with or without CHX on non-ambiguous probesets. Heatmaps were created using Multiple Experiment Viewer software (TIGR; http://www.tm4.org/mev/).
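The qPCR normalization to reference genes can be sketched as follows. This is a minimal illustration assuming the common 2^−ΔCt approach with ideal amplification efficiency; it is not necessarily the exact procedure of Krouk et al. (2006), and all Ct values shown are hypothetical.

```python
from statistics import mean

def relative_expression(ct_target, ct_refs):
    """2^-(Ct_target - mean reference Ct).

    Averaging reference Ct values corresponds to taking the geometric
    mean of the reference transcript quantities (assumes 100% efficiency).
    """
    return 2.0 ** -(ct_target - mean(ct_refs))

def fold_change(treated, control):
    """Ratio of normalized expression, treated over control.

    Each argument is a (ct_target, [ct_ref1, ct_ref2]) tuple.
    """
    return relative_expression(*treated) / relative_expression(*control)

# Hypothetical Ct values: target gene, then the two reference genes
dex_sample = (24.0, [20.0, 22.0])
mock_sample = (27.0, [20.0, 22.0])
fc = fold_change(dex_sample, mock_sample)
print(f"Fold induction by DEX: {fc:.1f}")
```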
For the overlap analysis with previously identified targets of ABI3 (Mönke et al., 2012, Nucleic acids research 40:8240-8254), VP1 (Suzuki et al., 2003, Plant physiology 132:1664-1677) and ABI5 (Reeves et al., 2011, Plant molecular biology, 75:347-363), distance between non-parametric distributions (one from the overlap of sampled input gene sets and one from two randomly sampled sets of genes represented on the ATH1 array) was calculated using the genesect [R] script (Krouk et al., 2010, Genome biology 11:R123). For the overlap with VP1 targets, the background consisted of genes represented on both the ATH1- and the 8k AG array [Affymetrix] used by Suzuki and co-workers.
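The idea of scoring an observed overlap against overlaps of randomly sampled gene sets can be sketched as below. This is not the genesect script itself; it is a simplified Monte Carlo illustration that assumes random sets of the same sizes are drawn from a common background and that the result is summarized as a z-score. All gene identifiers are placeholders.

```python
import random
from statistics import mean, pstdev

def overlap_zscore(set_a, set_b, background, n_iter=1000, seed=0):
    """Z-score of the observed overlap against overlaps of random same-size sets."""
    rng = random.Random(seed)
    background = sorted(background)          # fixed order for reproducible sampling
    observed = len(set(set_a) & set(set_b))
    null = []
    for _ in range(n_iter):
        rand_a = set(rng.sample(background, len(set_a)))
        rand_b = set(rng.sample(background, len(set_b)))
        null.append(len(rand_a & rand_b))
    sd = pstdev(null)
    z = (observed - mean(null)) / sd if sd > 0 else float("inf")
    return observed, z

# Placeholder gene universe and two gene lists sharing 25 members
universe = [f"gene{i}" for i in range(1000)]
list_a = universe[0:50]
list_b = universe[25:75]
observed, z = overlap_zscore(list_a, list_b, universe)
print(observed, round(z, 1))
```

In the VP1 comparison described above, the background passed to such a function would be restricted to genes represented on both arrays.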
GO-term and promoter analysis. GO-term analysis was performed online using the BioMaps function on the VirtualPlant website (www.virtualplant.org) with a default corrected p-value cutoff on the Fisher exact test of p<10^-3 (Katari et al., 2010; Plant Physiology, 152:500-515). To determine enrichment of known promoter motifs, the number of 1 kb upstream promoters, out of the top fifty ABI3 up-regulated genes, having one or more of the motifs described in the PLACE database was counted (http://www.dna.affrc.go.jp/PLACE/). p-values were generated using the hypergeometric distribution, and values were FDR corrected using an FDR q-value cutoff of 0.01. Promoter element enrichment analysis was performed using [R] (http://www.r-project.org/). For the sliding window analysis for promoter element enrichment (see
6.3. Results
As a first test of the TARGET system, the expression of known direct ABI3 targets PER1 and CRU3 was assayed by qPCR. Compared to control gene expression, both PER1 and CRU3 showed significant induction of transcript levels upon DEX treatment in the ABI3-GR transfected protoplasts in the presence of CHX (
The list of 186 putative direct up-regulated genes was highly significantly enriched for genes previously identified as direct targets of ABI3 in whole plant studies (Ze=54.3), as well as targets of the maize homolog VIVIPAROUS1 (Ze=20.8) and co-regulator ABI5 (Ze=20.9) (
6.4. Discussion
One advantage of the TARGET system lies in the speed at which identification of genome-wide TF targets can be performed. A candidate TF can now be scrutinized for its target genes in a genome in a matter of weeks rather than the months required for the generation of stable transgenic plant lines. The TARGET transient transformation system can also be used purely as a verification of specific TF-target interactions by qPCR, much as yeast-one-hybrid (Y1H) assays are often used, but now in the context of endogenous gene activation in plant cells rather than promoter binding in a yeast strain. The TARGET approach brings the convenience of microbiological systems like Y1H to the genome-wide transcriptomic capabilities of in planta studies. Another advantage of the use of protoplast transformation in the TARGET system is that it can be done in a wide range of species where the generation of transgenic plant lines is either impossible or problematic and more time-consuming (Sheen et al., 2001, Plant physiology 127:1466-1475). The TARGET system combined with RNA sequencing can enable rapid and systematic assessment of TF function in numerous plant species, for example in important crop model species.
This system is not a replacement for in-depth studies using transcriptional and chromatin immunoprecipitation (ChIP) analyses in transgenic plants. Rather, TARGET is a rapid tool for GRN investigations that may have uses in particular circumstances. There are considerations associated with the use of this system. On its own, a genome-wide analysis will yield results that contain false positives and false negatives. Identification of directly regulated genes by TARGET is therefore not unequivocal; additional assays for direct TF-target interaction (e.g. ChIP, Y1H, gel shift assays) are required for definitive identification of TF targets. The functionality of the chimeric GR-TF is not tested in this system, other than by the substance of the results. CHX treatment by itself may have effects on transcription that influence the DEX effect on certain direct target genes. Lastly, the cellular dissociation procedure itself may induce gene expression responses that could conceal the effects of TF activation. One can envisage two ways of using the TARGET system: either in combination with other techniques to get high-confidence target lists for a particular TF, or as a high-throughput analysis of numerous TFs in a given GRN to get a broad view of putative interactions.
Overall, the results presented here demonstrate that TARGET represents a novel and rapid transient system for TF investigation that can be used to help map GRN. Important indications of TF operation, such as direct target genes, biological function by GO-term associations and cis-regulatory elements involved in its action, can be obtained in a rapid and straightforward manner. The proof-of-principle analysis with ABI3 offers a new dataset of transcripts affected by this TF, adding to the understanding of the downstream significance of this central regulator.
The pBeaconRFP_GR vector will be made available through the VIB website (http://gateway.psb.ugent.be/).
7.1. Introduction
Evidence for temporal, signal-induced TF-target associations that involve the rapid and transient induction of genes related to the signal has been developed in the present invention. This discovery was enabled by a combination of conceptual and technical advances in a cell-based system, which enabled overexpression of a specific TF of interest and temporal induction of its nuclear localization. By temporally inducing TF nuclear localization using dexamethasone (DEX) in the presence of cycloheximide (CHX) to block translation, identification of the primary targets of a TF of interest was possible, based on either TF-regulation or TF-binding assayed in the same samples, exposed to a signal. Moreover, the perturbation of both the TF and the signal it transduces uncovered three distinct TF modes-of-action, “poised”, “active” and “transient”, the latter encompassing signal-dependent, transient TF-target associations. This discovery was made for bZIP1 (BASIC LEUCINE ZIPPER 1), a TF implicated as an integrator of cellular and metabolic signaling in Arabidopsis and shared in other eukaryotes (Weltmeier et al., 2008, Plant Molecular Biology 69:107; Sun et al., 2011, Journal of Plant Research 125:429; Baena-Gonzalez et al., 2007, Nature 448:938; Dietrich et al., 2011, The Plant Cell 23:381; Kang et al., 2010, Molecular Plant 3:361; Gutierrez et al., 2008, Proc. Natl. Acad. Sci. U.S.A., 105:4939; Obertello et al., 2010, BMC systems biology 4:111). The discovery of this new class of “transient”, signal-induced TF-target interactions opens a window into TF network dynamics that has been missed in previous TF studies in plants and animals. The inclusion of such context-dependent TF-target interactions in GRNs will improve the predictive capability of GRN models to generate hypotheses that will direct future experimental efforts in living systems.
7.2. Materials and Methods
Plant Materials and DNA Constructs. Wild-type Arabidopsis thaliana seeds [Columbia ecotype (Col-0)] were vapor-phase sterilized, vernalized for 3 days, then 1 ml of seeds were sown on 24 agar plates containing MS [2.2 g/l custom made Murashige and Skoog salts without N or sucrose [Sigma-Aldrich]; 1% [w/v] sucrose; 0.5 g/l MES hydrate [Sigma-Aldrich]; 1 mM KNO3; 2% [w/v] agar; pH 5.7 with HCl]. Plants were grown vertically in an Intellus environment controller [Percival Scientific, Perry, Iowa] set to 35 μmol m−2 s−1 and 16 h-light/8 h-dark regime at constant 22° C. bZIP1 [At5g49450] cDNA in pENTR was obtained from the REGIA collection (Paz-Ares et al., 2002, Comparative and functional genomics 3:102) and was then cloned into the destination vector pBeaconRFP_GR (Bargmann et al., 2013, Molecular Plant 6(3):978) by LR recombination [Life Technologies].
Protoplast Preparation, Transfection, Treatment and Cell Sorting. Protoplasts were prepared, transfected and sorted as previously described (Bargmann et al., 2013, Molecular Plant 6(3):978; Yoo et al., 2007, Nature Protocols 2:1565; Bargmann et al., 2009, Plant physiology 149:1231). Briefly, roots of 10-day-old seedlings were harvested and treated with cell wall digesting enzymes [Cellulase and Macerozyme; Yakult, Japan] for 4 h. Cells were filtered and washed, then transfected with 40 μg of pBeaconRFP_GR::bZIP1 plasmid DNA per 1×10^6 cells, facilitated by polyethylene glycol treatment [PEG; Fluka 81242] for 25 minutes (Bargmann et al., 2013, Molecular Plant 6(3):978). Cells were washed drop-wise, concentrated by centrifugation, then resuspended in wash solution for overnight incubation at room temperature. Protoplast suspensions were treated sequentially with an N-signal treatment of either a 20 mM KNO3 and 20 mM NH4NO3 solution [N] or 20 mM KCl [control] for 2 h, either cycloheximide [CHX] [35 μM in DMSO; Sigma-Aldrich] or solvent alone as mock for 20 min, and then with either dexamethasone [DEX] [10 μM in EtOH; Sigma-Aldrich] or solvent alone as mock for 4 h at room temperature. Treated protoplast suspensions were sorted as in (Bargmann et al., 2009, Plant physiology 149:1231): approximately 10,000 RFP-positive cells were sorted directly into RLT buffer [QIAGEN].
RNA Extraction And Microarray. RNA was extracted from protoplasts [6 replicates: 3 treatment replicates and 2 biological replicates] using an RNeasy Micro Kit with RNase-free DNaseI Set [QIAGEN] and quantified on a Bioanalyzer RNA Pico Chip [Agilent Technologies]. RNA was then converted into cDNA, amplified and labeled with Ovation Pico WTA System V2 [NuGEN] and Encore Biotin Module [NuGEN], respectively. The labeled cDNA was hybridized, washed and stained on an ATH1-121501 Arabidopsis Genome Array [Affymetrix] using a Hybridization Control Kit [Affymetrix], a GeneChip Hybridization, Wash, and Stain Kit [Affymetrix], a GeneChip Fluidics Station 450 and a GeneChip Scanner [Affymetrix].
Analysis of microarray data with CHX treatment: Microarray intensities were normalized using the GCRMA [http://www.bioconductor.org/packages/2.11/bioc/html/gcrma.html] package. Differentially expressed genes were then determined by a 3-way ANOVA with N, DEX and biological replicates as factors. The raw p-value from ANOVA was adjusted by False Discovery Rate [FDR] to control for multiple testing (Benjamini et al., 2005, Genetics 171:783). Genes significantly regulated by N and/or bZIP1 were then selected with an FDR cutoff of 5%, while genes significantly regulated by the interaction of N and bZIP1 [N×bZIP1] were selected with a p-value [ANOVA] cutoff of 0.01. Only unambiguous probes were included. Heatmaps were created using Multiple Experiment Viewer software [TIGR; http://www.tm4.org/mev/]. The significance of overlaps of gene sets was calculated using the genesect [R] script (Krouk et al., 2010, Genome Biology 11:R123) or the hypergeometric method [R].
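The FDR adjustment of raw ANOVA p-values and the subsequent 5% cutoff can be illustrated with the sketch below. It assumes a standard Benjamini-Hochberg step-up adjustment, which may differ from the exact procedure in the cited reference; the probe names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])   # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four probes
raw = {"probe1": 0.01, "probe2": 0.04, "probe3": 0.03, "probe4": 0.005}
adj = benjamini_hochberg(list(raw.values()))
significant = [p for p, q in zip(raw, adj) if q <= 0.05]   # FDR cutoff of 5%
print(dict(zip(raw, adj)), significant)
```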
Analysis of microarray data without CHX treatment: The analysis was identical to that used for the CHX-treated samples, except that a 2-way ANOVA with N and bZIP1 as factors was used to identify differentially expressed genes.
Micro Chromatin Immunoprecipitation. For each combination of protoplast treatments (see above), an unsorted suspension of protoplasts containing approximately 5,000-10,000 GR::bZIP1 transfected cells was incubated with gentle rotation in 1% formaldehyde in W5 buffer for 7 minutes, then washed with W5 buffer and frozen in liquid N2. μChIP was performed according to Dahl et al. (2008, Nucleic Acids Research, 36:e15) with a few modifications. The GR::bZIP1-DNA complexes were captured using anti-GR antibody [GR [P-20]-Santa Cruz biotech] bound to Protein A beads [Life Biotechnologies]. A washing step with LiCl buffer [0.25M LiCl, 1% Na deoxycholate, 10 mM Tris-HCl (pH8), 1% NP-40] was added in between the wash with RIPA buffer and TE (Dahl et al., 2008, Nucleic Acids Research, 36:e15). After elution from the beads, the ChIP material and the INPUT DNA were cleaned and concentrated using QIAGEN MinElute Kit [QIAGEN]. The protoplast suspension used for micro ChIP was not FACS sorted, to maintain a comparable incubation time between the samples that were used for microarray analyses and for micro ChIP. Additionally, FACS sorting of transformed cells was not required to identify DNA targets, as it is required for microarray studies.
ChIP-Seq library prep. The ChIP DNA and Input DNA were prepared for the Illumina HiSeq sequencing platform following the Illumina ChIP-Seq protocol [Illumina, San Diego, Calif.] with modifications. Barcoded adaptors and enrichment primers [BiOO Scientific, TX, USA] were used according to the manufacturer's protocol. The concentration and the quality of the libraries were determined by the Qubit Fluorometric DNA Assay [Invitrogen, NY, USA], DNA 12000 Bioanalyzer chip [Agilent, Calif., USA] and KAPA Quant Library Kit for Illumina [KAPA Biosystems, Mass., USA]. A total of 8 libraries were then pooled equimolarly and sequenced on two lanes of an Illumina HiSeq platform for 100 cycles in paired-end configuration [Cold Spring Harbor Lab, N.Y.].
ChIP-Seq Analysis. Reads obtained from the four treatments were filtered and aligned to the Arabidopsis thaliana genome [TAIR10] and clonal reads were removed. The ChIP alignment data was compared to its partner Input DNA and peaks were called using the QuEST package (Valouev et al., 2008, Nature Methods 5:829.) with a ChIP seeding enrichment ≥5, and extension and background enrichments ≥2. These regions were overlapped with the genome annotation to identify genes within 500 bp downstream of the peak. The gene lists from multiple treatments were largely overlapping sets and hence were pooled to generate a single list of 850 genes that show significant binding of bZIP1. Due to technical issues, the experimental design used for ChIP-Seq precludes the observation of significant differences between the genes bound by bZIP1 under the different treatment conditions. This is because the samples fixed for ChIP included a variable number of transfected cells that were not sorted by FACS.
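The step of overlapping called peaks with the genome annotation and pooling the per-treatment gene lists can be sketched as follows. This is an illustrative simplification: it assumes that "within 500 bp downstream of the peak" is evaluated relative to the gene's start coordinate and strand, and the coordinates, gene identifiers and treatment labels are all hypothetical. It does not reproduce QuEST peak calling.

```python
def genes_near_peaks(peaks, genes, window=500):
    """Return gene IDs whose start lies within `window` bp downstream of a peak.

    peaks: list of (chrom, position)
    genes: list of (gene_id, chrom, start, strand)
    "Downstream" is interpreted per strand (an assumption of this sketch).
    """
    hits = set()
    for chrom, pos in peaks:
        for gene_id, g_chrom, start, strand in genes:
            if g_chrom != chrom:
                continue
            offset = (start - pos) if strand == "+" else (pos - start)
            if 0 <= offset <= window:
                hits.add(gene_id)
    return hits

# Hypothetical annotation and peaks from two treatments
genes = [
    ("geneA", "Chr1", 1300, "+"),
    ("geneB", "Chr1", 900, "-"),
    ("geneC", "Chr2", 5600, "+"),
    ("geneD", "Chr2", 4800, "-"),
]
peaks_by_treatment = {
    "treatment_1": [("Chr1", 1000)],
    "treatment_2": [("Chr1", 1200), ("Chr2", 5000)],
}

# Pool the largely overlapping per-treatment lists into one bound-gene list
pooled = set()
for treatment, peaks in peaks_by_treatment.items():
    pooled |= genes_near_peaks(peaks, genes)
print(sorted(pooled))
```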
Cis-element Motif Analysis. 1 Kb regions upstream of the TSS (Transcription Start Site) for target genes were extracted based on TAIR10 annotation and submitted to the Elefinder program (Li et al., 2011, Plant physiology 156:2124.) or MEME (53) to determine over-representation of known binding sites. (Different parameters used in specific cases were notified in the paper if applicable). The E-value of significance for each motif was used to cluster the occurrence of motifs in the various subsets using the HCL algorithm in MeV (Saeed et al., 2006, Methods in Enzymology 411:134). Motifs that show a higher specificity to a particular category or a sub-group were identified with the PTM algorithm in MeV. De novo motif identification was performed on 1 Kb upstream sequence of the genes regulated by bZIP1 from microarray and ChIP-Seq data separately using the MEME suite (Bailey et al., 2009, Nucleic Acids Research 37:W202).
7.3. Results
Perturbation of a TF and the signal it transduces uncovers context-dependent primary TF target genes. To discern mechanisms by which TFs controlling GRNs respond to a signal perceived in vivo, both a TF (bZIP1) and a metabolic signal that it transduces (nitrogen, N) were perturbed (Gutierrez et al., 2008, Proc. Natl. Acad. Sci. U.S.A. 105:4939; Obertello et al., 2010, BMC systems biology 4:111). The Arabidopsis TF bZIP1 was transiently overexpressed as a glucocorticoid receptor fusion (35S::GR-bZIP1) in a rapid cell-based system called TARGET (Transient Assay Reporting Genome-wide Effects of Transcription factors) (Bargmann et al., 2013, Molecular Plant 6(3):978) and genome-wide responses were monitored (
Forty-eight bZIP1 primary targets (FDR<0.05) were uncovered that show a significant TF×N-signal interaction (pval<0.01) (Table 6). These genes responding to bZIP1×N interactions form four distinct expression clusters (
To next identify primary bZIP1 targets whose promoter was bound by the GR-bZIP1 fusion protein either directly or indirectly through an interacting TF partner in a protein complex, a micro-ChIP protocol (Dahl et al., 2008, Nucleic Acids Research 36:e15) was adapted using anti-GR antibodies to pull down genomic regions bound to bZIP1 (
It was confirmed that the 1,218 genes responding to bZIP1 perturbation and the 850 genes with significant binding to bZIP1 are enriched in bZIP1 primary targets by cis-regulatory motif analysis using MEME (Bailey et al., 2009, Nucleic Acids Research 37:W202) and elefinder (Li et al., 2011, Plant physiology 156:2124), which searches for known bZIP1 binding sites. Genes induced or bound by bZIP1 (644 genes) showed a highly significant overrepresentation of “G/C-box” (
Identification of temporal modes of bZIP1 primary target gene regulation. Mechanisms underlying temporal, signal-mediated modes of TF action were identified by integrating results from transcriptome and ChIP-Seq, and then performing analysis of signal context, biological function, and cis-element enrichment in bZIP1 primary target genes (
In planta cross-validation of the three classes of bZIP1 primary targets. The in vivo relevance of all three classes of bZIP1 primary targets was validated based on comparison to targets identified in planta in i) a constitutive bZIP1 overexpression line (Kang et al., 2010, Molecular Plant 3:361) (122/449 genes; p-val<0.001) (
Cis-element analysis of the three classes of bZIP1 targets. Cis-element analysis of each of the three subclasses of bZIP1 regulated gene targets show enrichment of known bZIP binding sites (
Class I “poised” bZIP1 targets: TF Binding, No regulation. This class of bZIP1 primary targets was specifically and significantly overrepresented in genes involved in “regulation of transcription” and “calcium transport” (FDR<0.01) (
Class II “active” bZIP1 targets: TF Binding and Regulation. The 190 primary bZIP1 target genes in Class II represent a 29% overlap (p-val<0.001) between the transcriptome and ChIP-Seq data, which compares favorably to such overlaps in other TF studies in planta (23% ABI3 (Mönke et al., 2012, Nucleic Acids Research 40:8240); 25% PIL5 (Oh et al., 2009, The Plant Cell Online 21:403)). Class II genes are the classical “gold standard” set that are the only primary targets identified in other TF studies that require TF-binding to define primary targets. For bZIP1, these primary targets in Class II have an overrepresentation in genes involved in “response to stress/stimulus” (FDR<0.01), which was a term common to all three classes of bZIP1 targets. No class-specific GO-terms were identified for these “classic” Class II bZIP1 primary target genes (
Class III “transient” bZIP1 targets: TF Regulation, but no detectable TF binding. Unexpectedly, the Class III bZIP1 primary target genes, that are regulated by, but not detectably bound to the TF, turned out to be the largest set of bZIP1 primary target genes (1,028) detected in this study. The Class III genes were identified as primary bZIP1 targets based on gene regulation in response to the nuclear import of bZIP1 performed in the presence of CHX (to block activation of secondary targets), but were not detected in the parallel ChIP-Seq analysis to be bound by bZIP1 directly or indirectly in a protein complex containing bZIP1. In either scenario—direct binding of bZIP1 to its gene target or bZIP1 binding via interacting TF partners—the bZIP1 target gene should be detected by ChIP-Seq if the interaction is stable. This led to the hypothesis that the Class III primary bZIP1 target genes that are regulated in response to DEX-induced bZIP1 nuclear import may be the result of a transient TF-target association not detectable by ChIP-Seq at the time of sampling. A series of results supports this view, and also indicates that the Class III “transient” bZIP1 primary targets are most relevant to the function of bZIP1 in transducing the N-signal provided. First, the Class III “transient” bZIP1 primary target genes show a substantial (117/328) and the most significant overlap with N-responsive genes (
Class III “transient” bZIP1 target genes show an early and transient N-response in planta. To assess the significance of the three classes of bZIP1 targets identified in this cell-based system, the classes were compared to studies that have implicated bZIP1 as a master hub in mediating responses to N nutrient signals in planta (Gutierrez et al., 2008, Proc. Natl. Acad. Sci. U.S.A. 105:4939; Obertello et al., 2010, BMC Systems Biology 4:111). Indeed, all three classes of bZIP1 primary targets identified in this cell-based system were significantly enriched (pval<0.001) in genes regulated by an identical nitrogen treatment (NH4NO3) in an in planta study (
Cis-element context analysis uncovers elements associated with signal×TF interactions. A distinguishing feature of the Class III “transient” bZIP1 primary targets is their significant enrichment in genes responding to a bZIP1×N-signal interaction (
7.4. Discussion and Concluding Remarks
A previously unrecognized “transient” mode of TF action was uncovered by a conceptual innovation in the experimental design to temporally perturb both a TF and signal, and in the integration and interpretation of TF-binding and TF-regulation data. This allowed for identification of primary TF targets based on either gene regulation or TF-binding, and the association of this regulation with a signal. This contrasts with previous studies of TFs in both plants and animals, where the identification of primary targets has been limited to TF-binding and/or the overlap between TF-regulation and TF-binding (Reeves et al., 2011, Plant Molecular Biology 75:347; Gorski et al., 2011, Nucleic Acids Research 39:9536; Hull et al., 2013, BMC Genomics 14:92; Fujisawa et al., 2011, Planta 235:1107; Wagner et al., 2004, The Plant Journal: for cellular and molecular biology 39:273). The approach enabled discovery of a new class of “transient” TF targets that are regulated by the TF but not detectably bound by it, because of three complementary features of the system: i) the ability to temporally induce the nuclear import of the TF bZIP1 in the presence or absence of a signal; ii) the use of a protein synthesis inhibitor (CHX) to identify primary TF-targets based solely on gene regulation; and iii) the ability to perform transcriptome analysis and ChIP-Seq on the same samples which allowed direct data comparison. Combining these features enabled the distinction between three temporal modes of bZIP1 action in regulating primary TF-target genes: “poised”, “active” and “transient”. By examining the TF modes of action in the presence or absence of a signal it transduces (N), it was found that Class III “transient” gene targets (TF-regulated but not bound) were most relevant to the N-signal provided, as they show unique and significant: i) enrichment in N-responsive genes (
This discovery suggests that the Class III “transient” TF-target genes are likely the result of a temporal association between bZIP1 with these targets, acting either directly on the primary target DNA and/or through TF partner interactions (
To place these findings in perspective, the general field of GRN validation has focused on determining when and how TF binding does, or does not, result in gene activation (Reeves et al., 2011, Plant Molecular Biology 75:347; Gorski et al, 2011, Nucleic Acids Research 39:9536). This focus has limited the field to studying the more stable and static “gold standard” interactions exemplified by the bZIP1 Class II genes (TF-bound and regulated). The discovery of the Class III “transient” TF-targets (TF-regulated, no binding) now opens the opposite question/perspective in the general field of transcriptional control: How and why can TF-induced changes in mRNA occur in the absence of stable TF binding? The simple explanation that the Class IIIA mRNA is stabilized by CHX or bZIP1 is not supported by the data, as +/-CHX results are comparable (
In support of this “hit-and-run” model, the Class III “transient” genes are enriched in mRNAs with short half-lives (<2 hour) (Chiba et al., 2013, Plant & cell physiology 54:180) indicating that they are actively transcribed at the 5 hour time-point when the gene is induced by the TF but is not stably bound to it (
The “transient”, signal-induced association of a target with a TF can be analogized to a “touch-and-go” (hit-and-run) landing or circuit maneuver used in aviation. This involves landing a plane on a runway and taking off again without coming to a full stop, allowing many landings in a short time. This maneuver also allows pilots to rapidly detect or avoid another plane or object on the runway, and could serve an analogous role for bZIP1 and its TF partners. The “touch-and-go” (hit-and-run) mode may enable bZIP1 to “direct”, “detect” or “avoid” TFs on a gene target, or alternatively to rapidly activate and leave the promoter “empty” for its TF partners to occupy. By contrast, the more traditional “stop-and-go” action, requiring a full stop before taking off again, is a more stable maneuver which can be analogized to the classic Class II “gold standard” set, in which the TF lands (stably binds) and regulates a gene. While these more stable and static interactions have been the focus of most TF studies, the discovery of this new “touch-and-go” (hit-and-run) mode of TF action opens a new concept and field of inquiry in the study of dynamic GRNs in plants and animals.
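The three classes of bZIP1 primary targets discussed in this section amount to set operations on two gene lists, one from TF-regulation (transcriptome, in the presence of CHX) and one from TF-binding (ChIP-Seq). A minimal sketch with placeholder gene identifiers (the function and key names are illustrative only):

```python
def classify_targets(regulated, bound):
    """Split primary targets into the three classes by regulation and binding."""
    regulated, bound = set(regulated), set(bound)
    return {
        "class_I_poised": bound - regulated,        # bound, not regulated
        "class_II_active": bound & regulated,       # bound and regulated
        "class_III_transient": regulated - bound,   # regulated, not detectably bound
    }

# Placeholder gene lists
regulated_genes = {"g1", "g2", "g3", "g4"}
bound_genes = {"g3", "g4", "g5"}
classes = classify_targets(regulated_genes, bound_genes)
for name, members in classes.items():
    print(name, sorted(members))
```

This partition is consistent with the counts reported above: 1,218 regulated genes, of which 190 are also bound (Class II), leave 1,028 regulated-but-unbound genes (Class III).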
8.1. Plant Growth and Treatment
Rice seeds (Oryza sativa ssp. japonica) were kindly provided by Dale Bumpers of the National Rice Research Center (AR, USA). Seeds were surface-sterilized and vernalized on 1× Murashige and Skoog (MS) basal salts (custom-made; GIBCO) with 0.5 mM ammonium succinate and 3 mM sucrose, 0.8% BactoAgar at pH 5.5 for 3 days in dark conditions at 27° C. Germinated seeds were transferred to a hydroponic system (Phytatray II, Sigma Aldrich) containing basal MS salts (custom-made; GIBCO) with 0.5 mM ammonium succinate and 3 mM sucrose at pH 5.5 to grow for 12 days under long-day conditions (16 h light: 8 h dark) at 27° C., at a light intensity of 180 μE·s−1·m−2. Media was replaced every 3 days and the plants were transferred to fresh media containing basal MS salts for 24 h prior to treatment. On day 13, plants were transiently treated for 2 h at the start of their light cycle by adding Nitrogen (N) at a final concentration of 20 mM KNO3 and 20 mM NH4NO3 (referred to here as 1×N). Control plants were treated with KCl at a final concentration of 20 mM. After treatment, roots and shoots were harvested separately using a blade, immediately submerged in liquid nitrogen and stored at −80° C. prior to RNA extraction.
Arabidopsis seeds were placed for 2 days in the dark at 4° C. to synchronize germination. Seeds were surface-sterilized and then transferred to a hydroponic system (Phytatray I, Sigma Aldrich) containing the same media previously described for rice (pH 5.7). Growth conditions were the same as in rice, except that plants were under 50 μE·s−1·m−2 light intensity at 22° C. N-starvation and treatments were done as described above (
8.2. Microarray Experiments and Analysis
cDNA synthesis, array hybridization and normalization of the signal intensities were performed according to the instructions provided by Affymetrix. Affymetrix Arabidopsis ATH1 Genome Array Chip and Rice Genome Array Chip were used for the respective species. Data normalization was performed using the RMA (Robust Multi-array Average) method in the Bioconductor package in the R statistical environment. A two-way Analysis of Variance (ANOVA) was performed using a custom-made function in R to identify probes that were differentially expressed following N treatment. The p-values for the model were corrected for multiple hypotheses testing using FDR correction at 5% (Benjamini and Hochberg, 1995, Journal of the Royal Statistical Society 57:289). The probes passing the cut-off (p≤0.05) for the model and for N treatment or the interaction of N treatment and tissue were deemed significant. A Tukey's HSD post-hoc analysis was performed on significant probes to determine the tissue specificity of N-regulation at p-value cut-off ≤0.05 and fold-change ≥1.5-fold (
Orthologous N-regulated genes between Rice and Arabidopsis were obtained using reverse Blast (Camacho et al., 2008, BMC Bioinformatics 10:421) with an e-value≤1e−20, thereby allowing for multiple ortholog hits (
8.3. Network Analysis
A Rice Multinetwork was generated using the following interactions (
Metabolic interactions were obtained from RiceCyc (Dharmawardhana et al., 2013, Rice 6:15).
Protein-Protein interactions were obtained from the PRIN database (Gu et al., 2011, BMC Bioinformatics 12:161), and published work, which include experimentally determined and computationally predicted interactions (Ding et al., 2009, Plant Physiology 149(3):1478; Rohila et al., 2006, The Plant Journal 46:1; Ho et al., 2012, The Rice Journal 5:15).
Predicted Regulatory interactions were created between a Transcription Factor (TF) and its putative target using TF family membership obtained from Grassius (Yilmaz et al., 2009, Plant Physiology 149:171) and identification of cis-regulatory motifs, obtained from AGRIS (Palaniswamy et al., 2006, Plant Physiology 140:818), in the 1000 bp of promoter sequence upstream of the target genes. Motifs were searched using the DNA pattern search tool from the RSA tools server with default parameters (van Helden, 2003, Nucleic Acids Research 31:3593).
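The promoter motif search described above can be illustrated with a minimal sketch. It assumes the promoter sequence is supplied 5′ to 3′ and ends at the start of the target gene, so that the last 1000 bp are scanned on both strands. The function names and the example motif are hypothetical, and the code is not the RSA tools implementation.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Return the reverse complement of an IUPAC motif."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def count_motif(upstream_seq, motif, window=1000):
    """Count motif occurrences on either strand in the last `window` bp.

    Assumption: `upstream_seq` ends at the start of the target gene.
    """
    region = upstream_seq.upper()[-window:]
    # A set avoids double counting when the motif is palindromic.
    patterns = {motif_to_regex(motif),
                motif_to_regex(reverse_complement(motif))}
    total = 0
    for pattern in patterns:
        # The lookahead lets overlapping occurrences be counted.
        total += len(re.findall("(?=" + pattern + ")", region))
    return total


# Hypothetical promoter fragment and motif, for illustration only.
hits = count_motif("AATGACGTTCGTCAGG", "TGACG")
```

A TF-to-target edge would then be proposed only for targets whose promoter gives a non-zero count for a motif associated with the TF's family.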
The 451 N-regulated rice genes were queried against the Rice Multinetwork to create a N-regulated gene network in Rice. Additionally, conserved correlation edges between two N-regulated Rice genes were proposed if the respective Arabidopsis N-regulated orthologs were also correlated significantly in the same direction (both positively or negatively) with Pearson correlation coefficient ≥0.8. Predicted regulatory interactions were further restricted to those TF and Target pairs where the two were also significantly correlated (Pearson correlation coefficient ≥0.8 and p-value ≤0.01), which resulted in a network of 206 Rice genes, of which 21 are transcription factors, with 6,818 edges (
The network was further refined by removing conserved correlation edges that are not supported by predicted regulatory edges, which resulted in a “N-regulated correlated network” containing 151 Rice genes, of which 16 were TFs (Table 8). All network visualizations were created using Cytoscape (v2.8.3) software (Shannon et al., 2003, Genome Research 13:2498).
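The conserved-correlation rule used to build this network can be sketched as follows. The sketch assumes that the 0.8 cut-off applies to the absolute value of the Pearson coefficient and that the sign must agree between species. The significance test (p-value ≤0.01) is omitted, and all gene identifiers and expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def ortholog_pair_agrees(a_list, b_list, ath_expr, sign_positive, cutoff):
    """True if any Arabidopsis ortholog pair is correlated in the same direction."""
    for a1 in a_list:
        for a2 in b_list:
            if a1 == a2:
                continue
            r = pearson(ath_expr[a1], ath_expr[a2])
            if abs(r) >= cutoff and (r > 0) == sign_positive:
                return True
    return False


def conserved_edges(rice_expr, ath_expr, orthologs, cutoff=0.8):
    """Propose rice gene pairs whose correlation is conserved in Arabidopsis."""
    edges = []
    for g1, g2 in combinations(sorted(rice_expr), 2):
        r_rice = pearson(rice_expr[g1], rice_expr[g2])
        if abs(r_rice) < cutoff:
            continue
        if ortholog_pair_agrees(orthologs.get(g1, []), orthologs.get(g2, []),
                                ath_expr, r_rice > 0, cutoff):
            edges.append((g1, g2))
    return edges


# Hypothetical toy data.
rice = {"OsA": [1, 2, 3, 4], "OsB": [2, 4, 6, 8.5], "OsC": [4, 1, 3, 2]}
ath = {"AtA": [1, 3, 2, 5], "AtB": [2, 5, 4, 9], "AtC": [3, 1, 4, 2]}
orth = {"OsA": ["AtA"], "OsB": ["AtB"], "OsC": ["AtC"]}
edges = conserved_edges(rice, ath, orth)
```

Restricting these edges to TF and target pairs that also carry a predicted regulatory edge corresponds to the refinement step described above.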
A comparison of the number of TF targets at various network building steps as shown in
Arabidopsis Orthologs
Arabidopsis Orthologs of Rice TF
9.1. Building Crop Networks
Network analysis and tools can be used to translate knowledge from models to crops to aid in translation to agriculture. Using a publicly available microarray N-treatment dataset from maize that identified biomarkers of nitrogen status in the field, a step-by-step analysis incorporating Arabidopsis network knowledge yields networks that enable focused hypothesis generation with translational value.
5,057 N-responsive genes were identified using functions in VirtualPlant maize, which form a correlation network of 4,278 maize genes. This network is too large to enable focused translational targets, and more than 50% of the maize genes are unannotated. This maize transcriptome data may be interpreted in the context of the Arabidopsis network to derive networks and focused translational targets.
First, the 5,057 maize genes were mapped to 3,756 Arabidopsis homologs using VirtualPlant maize, which uses the maize “best-hit” to Arabidopsis data provided by Phytozome (www.phytozome.net).
Next, the “gene network” function in VirtualPlant (protein:protein, metabolic, cis-binding, and text-mining edges) was used to obtain a network of 2,262 connected maize genes. A GO term over-representation test on this network identifies Nitrogen metabolic process (p<1e−33) and sulfur metabolic process (p<0.005) among the significant terms. Hypotheses were focused for translational studies using conserved N-networks, and the maize translational network was refined by selecting genes that are N-regulated in both maize and Arabidopsis in Step 3.
Subsequently, an Arabidopsis nitrogen response gene set (1,254 genes) was created as a union of genes responsive in shoots (Gutierrez et al., 2008, Proc Natl Acad Sci USA, 105(12):4939) and roots (Schena and Yamamoto, 1988, Science 241(4868):965). These Arabidopsis genes and the 2,262 maize genes were intersected to produce a highly significant (p<0.001) overlapping gene list of 223 N-regulated genes. The regulatory edges in this conserved network were required to have a correlation of >0.7 or <−0.7 (within maize), as described in (Gutierrez et al., 2008, Proc Natl Acad Sci USA, 105(12):4939) and (Sheen, 2001, Plant physiology 149(3):1231). BioMaps analysis in VirtualPlant uncovered significant GO terms, including photoperiodism (p-val<0.005) and nitrate transport (p-val<0.01), and 15 TF hubs for focused generation of translational targets.
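The significance of a gene-list overlap such as the 223 shared genes can be estimated with a hypergeometric tail probability. The sketch below is one common way to do this and is not necessarily the test used here. The background size of 20,000 genes is a placeholder assumption, since the text does not state it, and the function name is hypothetical.

```python
from math import comb


def hypergeom_overlap_pvalue(background, set_a, set_b, overlap):
    """P(X >= overlap) when set_b genes are drawn without replacement
    from a background that contains set_a marked genes."""
    upper = min(set_a, set_b)
    total = comb(background, set_b)
    tail = sum(
        comb(set_a, k) * comb(background - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    # Exact integer arithmetic; the ratio is converted to a float only here.
    return tail / total


# Counts from the text: 1,254 Arabidopsis genes, 2,262 maize genes, 223 shared.
# The background of 20,000 genes is a placeholder, not a value from the text.
p = hypergeom_overlap_pvalue(background=20000, set_a=1254, set_b=2262, overlap=223)
print(f"p = {p:.3g}")
```

Exact integer binomials are used so that the tail sum does not lose precision for large gene lists; the value obtained depends strongly on the assumed background.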
Using the VirtualPlant-meets-Cytoscape function, a “hubbiness” table was generated to identify the master regulatory nodes in the core N-regulatory network conserved between maize and Arabidopsis. Remarkably, the 5 top TF hubs include TFs (CCA1, GLK1 and bZIP9) (
The TF hubs of this N-regulatory network between maize and Arabidopsis (
10.1. Introduction
Signal propagation through gene regulatory networks (GRNs) enables organisms to rapidly respond to changes in environmental signals. For example, dynamic GRN studies in plants have uncovered genome-wide responses that occur within as little as three minutes following a nitrogen (N) nutrient signal perturbation (Krouk et al., 2010, Genome Biology 11:R123). Yet, many of the underlying rapid and temporal network connections between transcription factors (TFs) and their targets elude detection even in fine-scale time-course studies (Ni et al., 2009, Gene Dev 23(11):1351-1363; Chang et al., 2013, Elife 2:e00675), as current methods used (e.g. chromatin immunoprecipitation, ChIP) require stable TF-binding in at least one time-point to identify primary targets (Gorski et al., 2011, Nucleic Acids Research 39(22):9536-9548; Hughes et al., 2013, Genetics 195(1):9-36; Marchive et al., 2013, Nature Communications 4). However, recent models suggest that GRNs built solely on TF-binding data are insufficient to recapture transcriptional regulation (Biggin MD, 2011, Dev Cell 21(4):611-626; Walhout A J M, 2011, Genome Biol 12(4); Lickwar et al., 2012, Nature 484(7393):251-255). Compounding this dilemma, TFs have been found to stably bind to only a small percentage (5-32%) of the TF-regulated genes across eukaryotes (Gorski et al., 2011, Nucleic Acids Research 39(22):9536-9548; Hughes et al., 2013, Genetics 195(1):9-36; Marchive et al., 2013, Nature Communications 4; Monke et al., 2012, Nucleic Acids Research 40:82401; Arenhart et al., 2014, Molecular plant 7(4):709-721; Bolduc et al., 2012, Gene Dev 26(15):1685-1690; Bianco et al., 2014, Cancer research 74(7):2015-2025).
Since TF-binding is required to define the primary targets in current GRN studies, the large set of TF-regulated, but not TF-bound genes must be categorically dismissed as indirect or secondary targets (Gorski et al, 2011, Nucleic Acids Research 39(22):9536-9548; Hughes et al., 2013, Genetics 195(1):9-36; Arenhart et al., 2014, Molecular plant 7(4):709-721; Bolduc et al., 2012, Gene Dev 26(15):1685-1690; Bianco et al., 2014, Cancer research 74(7):2015-2025). Provided herein is an alternative—and more intriguing conclusion—that these typically dismissed targets comprise the “dark matter” of rapid and transient signal transduction that has previously eluded detection across eukaryotes.
To capture these rapid and dynamic network connections that elude detection by biochemical TF-binding assays, an approach was developed that can identify primary targets based on a functional read out—TF-induced gene regulation—even in the absence of detectable TF-binding. This study focuses on the master TF bZIP1 (BASIC LEUCINE ZIPPER 1), a central integrator of metabolic signaling including sugar (Baena-Gonzalez et al., 2007, Nature 448:938; Kang et al., 2010, Molecular Plant 3:361-373; Dietrich et al., 2011, The Plant Cell 23:381-395) and N nutrient signals (Gutierrez et al., 2008, Proc. Natl. Acad. Sci. U.S.A. 105:4939; Obertello et al., 2010, BMC Systems Biology 4:111). To uncover the underlying dynamic GRNs, both bZIP1 and the N-signal it transduces were temporally perturbed in a cell-based system designed for temporal TF perturbation. This cell-based system named TARGET (Transient Assay Reporting Genome-wide Effects of Transcription factors), which involves inducible TF nuclear localization, is able to identify primary TF targets based solely on TF-induced gene regulation, as shown for a well-studied TF involved in plant hormone signaling—ABI3 (Bargmann et al., 2013, Molecular Plant 6(3):978). In this study, by adapting a micro-ChIP protocol (Dahl et al., 2008, Nucleic Acids Research, 36:e15) to the cell-based TARGET system, primary targets were monitored based on either TF-induced gene regulation or TF-binding quantified in the same cell samples, enabling a direct comparison. The use of isolated cells allowed the capture of rapid and transient regulatory events including the formation of TF-DNA complexes within 1-5 min from the onset of TF translocation to the nucleus. Such a short-lived interaction would likely be missed in planta, as effective protein-DNA cross-linking in intact plant tissues requires prolonged (for a minimum of 15 minutes) infiltration under vacuum. 
Unexpectedly, the primary TF targets that are regulated by, but not stably bound to, bZIP1 (termed “transient”) were the most biologically relevant to rapid transduction of the N-signal. These transient TF-targets include first-responder genes, induced as early as 3-6 minutes after N-signal perturbation in planta (Krouk et al., 2010, Genome Biology 11:R123). This discovery suggests that the current “gold-standard” of GRNs built solely on the intersection of TF-binding and TF-regulation data misses a large and important class of transient TF targets, which are at the heart of dynamic networks. Moreover, the shared features of these transient bZIP1 targets and their role in rapid N-signaling provide genome-wide support for a classic, but largely forgotten, model of “hit-and-run” transcription (Schaffner, 1988, Nature 336:427-428). This transient mode-of-action can enable a master TF to catalytically and rapidly activate a large set of genes in response to a signal.
10.2. Materials and Methods
Plant Materials and DNA Constructs. Wild-type Arabidopsis thaliana seeds [Columbia ecotype (Col-0)] were vapor-phase sterilized, vernalized for 3 days, then 1 ml of seed were sown on agar plates containing 2.2 g/l custom made Murashige and Skoog salts without N or sucrose (Sigma-Aldrich), 1% [w/v] sucrose, 0.5 g/l MES hydrate (Sigma-Aldrich), 1 mM KNO3 and 2% [w/v] agar. Plants were grown vertically on plates in an Intellus environment controller (Percival Scientific, Perry, Iowa), whose light regime was set to 50 μmol m−2 s−1 and 16 h-light/8 h-dark at constant temp of 22° C. The bZIP1 (At5g49450) cDNA in pENTR was obtained from the REGIA collection (Paz-Ares et al., 2002 Comp Funct Genomics 3(2):102-108) and was then cloned into the destination vector pBeaconRFP_GR used in the protoplast expression system (Bargmann et al., 2009, Plant physiology 149:1231) by LR recombination (Life Technologies). The pBeaconRFP_GR vector is available through the VIB website (http://gateway.psb.ugent.be/).
Protoplast Preparation, Transfection, Treatments and Cell Sorting. Root protoplasts were prepared, transfected and sorted as previously described (Bargmann et al., 2013, Molecular Plant 6(3):978; Yoo et al., 2007, Nature Protocols 2:1565; Bargmann et al., 2009, Plant physiology 149:1231). Briefly, roots of 10-day-old seedlings were harvested and treated with cell wall digesting enzymes (Cellulase and Macerozyme; Yakult, Japan) for 4 h. Cells were filtered and washed, then transfected with 40 μg of pBeaconRFP_GR::bZIP1 plasmid DNA per 1×10^6 cells, facilitated by polyethylene glycol treatment (PEG; Fluka 81242) for 25 minutes (Bargmann et al., 2009, Plant physiology 149:1231). Cells were washed drop-wise, concentrated by centrifugation, then resuspended in wash solution W5 (154 mM NaCl, 125 mM CaCl2, 5 mM KCl, 5 mM MES, 1 mM Glucose) for overnight incubation at room temperature. Protoplast suspensions were treated sequentially with: 1) a N-signal treatment of either a 20 mM KNO3 and 20 mM NH4NO3 solution (N) or 20 mM KCl (control) for 2 h, 2) either CHX (35 μM in DMSO, Sigma-Aldrich) or solvent alone as mock for 20 min, and then 3) either DEX (10 μM in EtOH, Sigma-Aldrich) or solvent alone as mock for 5 h at room temperature. Treated protoplast suspensions were FACS sorted as in (13): approximately 10,000 RFP-positive cells were FACS sorted directly into RLT buffer (QIAGEN) for RNA extraction.
RNA Extraction and Microarray. RNA from 6 replicates (3 treatment replicates and 2 biological replicates) was extracted from protoplasts using an RNeasy Micro Kit with RNase-free DNaseI Set (QIAGEN) and quantified on a Bioanalyzer RNA Pico Chip (Agilent Technologies). RNA was then converted into cDNA, amplified and labeled with Ovation Pico WTA System V2 (NuGEN) and Encore Biotin Module (NuGEN), respectively. The labeled cDNA was hybridized, washed and stained on an ATH1-121501 Arabidopsis Genome Array (Affymetrix) using a Hybridization Control Kit (Affymetrix), a GeneChip Hybridization, Wash, and Stain Kit (Affymetrix), a GeneChip Fluidics Station 450 and a GeneChip Scanner (Affymetrix).
Analysis of microarray data with CHX treatment. Microarray intensities were normalized using the GCRMA (http://www.bioconductor.org/packages/2.11/bioc/html/gcrma.html) package. Differentially expressed genes were then determined by a 3-way ANOVA with N, DEX and biological replicates as factors. The raw p-value from ANOVA was adjusted by False Discovery Rate (FDR) to control for multiple testing (Benjamini et al., 2005, Genetics 171:783). Genes significantly regulated by the N-signal and/or DEX-induced bZIP1 nuclear localization were then selected with an FDR cutoff of 5%. Genes significantly regulated by the interaction of the N-signal and bZIP1 (N-signal×bZIP1) were selected with a p-val (ANOVA) cutoff of 0.01. Only unambiguous probes were included. Heat maps were created using Multiple Experiment Viewer software (TIGR; http://www.tm4.org/mev/). The significance of overlaps of gene sets was calculated using the GeneSect (R)script (Katari et al., 2010, Plant physiology 152:500) using the microarray as background. Hypergeometric distribution was used in one case (specified in the manuscript) to evaluate the enrichment of gene sets, when a specific background, namely N-responsive genes identified in different root cell types (Gifford et al., 2008, Proc. Natl. Acad. Sci. U.S.A. 105:803-808), was needed.
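The FDR adjustment of raw ANOVA p-values can be illustrated with the Benjamini-Hochberg step-up procedure. This is a minimal sketch under the assumption that this standard procedure was applied; the text states only that p-values were FDR-adjusted, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four probes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
# Probes passing a 5% FDR cutoff.
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

Genes would then be selected by comparing the adjusted values, rather than the raw p-values, with the 5% cutoff.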
Filtering bZIP1 targets for the effects of protoplasting, and response to CHX or DEX. In this step, genes whose expression states responded to protoplasting, or to treatments of DEX or CHX that were not related to bZIP1-mediated regulation, were filtered out in the following three steps: Filter 1: DEX-response filter: Genes responding to DEX independent of TF. Genes significantly induced/repressed by DEX-treatment in protoplasts transfected with the empty pBeaconRFP_GR plasmid (ANOVA analysis; FDR<0.05) were excluded from analysis (1.6% genes filtered). Filter 2: Protoplast-response filter: Genes induced by protoplasting. Genes that are induced by root protoplasting (Birnbaum K, et al., 2003, Science 302(5652):1956-1960) were removed from the list of bZIP1 targets (12.3% genes filtered). Filter 3: DEX×CHX interaction filter. Genes whose DEX-regulation is modified by CHX. This filter removes genes from the analysis in cases where the effects of DEX-induced TF nuclear import on gene regulation are affected by CHX treatment. To do this, a 3-way ANOVA was performed (Factors Nitrogen, DEX, and CHX) and bZIP1 primary targets were identified whose gene expression regulation by the DEX-induced nuclear import of bZIP1 is different between +CHX and −CHX conditions (FDR cutoff of interaction term CHX*DEX<0.05). This eliminated genes that are regulated by bZIP1 in the presence of CHX, but not in the absence of CHX. This gene set may contain bZIP1 targets under a self-control negative feedback loop, and bZIP1 targets for which the half-lives of the transcripts are affected by CHX. While the first case is potentially interesting, the second case represents a CHX artifact to be removed. Since it is difficult to differentiate between the two outcomes, these CHX-sensitive DEX-responsive genes dependent on bZIP1 were eliminated from the list of bZIP1 target genes (17.4% genes filtered), thus favoring precision over recall.
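The three filters amount to successive set subtractions from the candidate target list. The sketch below assumes the filters are applied in the order listed and that each gene is attributed to the first filter that removes it; the text does not state whether the reported percentages were computed this way. All gene identifiers and names in the code are hypothetical.

```python
def apply_filters(candidates, filters):
    """Remove genes flagged by each filter in turn.

    `filters` is an ordered list of (name, gene_set) pairs. Returns the
    surviving genes and, per filter, the genes that filter removed.
    """
    remaining = set(candidates)
    removed = {}
    for name, flagged in filters:
        hits = remaining & set(flagged)
        removed[name] = hits
        remaining -= hits
    return remaining, removed


# Hypothetical candidate targets and filter lists.
candidates = {"g1", "g2", "g3", "g4", "g5", "g6"}
filters = [
    ("DEX-response", {"g1"}),             # responds to DEX with the empty vector
    ("Protoplast-response", {"g2", "g1"}),  # induced by protoplasting
    ("DEX x CHX interaction", {"g3"}),    # DEX effect differs with and without CHX
]
kept, removed = apply_filters(candidates, filters)
fractions = {name: len(genes) / len(candidates) for name, genes in removed.items()}
```

Dropping every gene flagged by the third filter, rather than trying to separate feedback-regulated targets from CHX artifacts, is what trades recall for precision.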
Micro-Chromatin Immunoprecipitation. For each combination of protoplast treatments (see above), an unsorted suspension of protoplasts containing approximately 5,000-10,000 GR::bZIP1 transfected cells was fixed for ChIP analysis, using an adapted version of the micro-ChIP protocol by Dahl et al. (Dahl et al., 2008, Nucleic Acids Research 36:e15). An advantage of ChIP analysis in protoplasts is that it can capture short-lived interactions that would likely be missed in planta, as effective protein-DNA cross-linking in intact plant tissues requires prolonged (for a minimum of 15 minutes) infiltration under vacuum (Gendrel et al., 2005, Nat Methods 2(3):213-218). Cells were incubated with gentle rotation in 1% formaldehyde in W5 buffer for 7 minutes, then washed with W5 buffer and frozen in liquid N2. μChIP was performed according to Dahl et al. (2008, Nucleic Acids Research 36:e15) with the few modifications described below. The GR::bZIP1-DNA complexes were captured using anti-GR antibody [GR (P-20) (Santa Cruz biotech) bound to Protein-A beads (Life Biotechnologies)]. A washing step with LiCl buffer [0.25M LiCl, 1% Na deoxycholate, 10 mM Tris-HCl (pH8), 1% NP-40] was added in between the wash with RIPA buffer and TE (Dahl et al., 2008, Nucleic Acids Research 36:e15). After elution from the beads, the ChIP material and the Input DNA were cleaned and concentrated using the QIAGEN MinElute Kit (QIAGEN). The protoplast suspension used for micro-ChIP was not FACS sorted in order to maintain a comparable incubation time between the samples that were used for microarray analyses and for micro-ChIP. Importantly, while FACS sorting of transformed cells is required for microarray studies, it was not required to identify DNA targets using ChIP-seq.
ChIP-Seq library preparation. The ChIP DNA and Input DNA were prepared for the Illumina HiSeq sequencing platform following the Illumina ChIP-Seq protocol (Illumina, San Diego, Calif.) with modifications. Barcoded adaptors and enrichment primers (BiOO Scientific, TX, USA) were used according to the manufacturer's protocol. The concentration and the quality of the libraries were determined by the Qubit Fluorometric DNA Assay (InVitrogen, NY, USA), DNA 12000 Bioanalyzer chip (Agilent, CA, USA) and KAPA Quant Library Kit for Illumina (KAPA Biosystems, MA, USA). A total of 8 libraries were then pooled in equimolar amounts and sequenced on two lanes of an Illumina HiSeq platform for 100 cycles in paired-end configuration (Cold Spring Harbor Lab, N.Y.).
ChIP-Seq Analysis. Reads obtained from the four treatments (with DEX and N in the presence of CHX) were filtered and aligned to the Arabidopsis thaliana genome (TAIR10) and clonal reads were removed. The ChIP alignment data was compared to its partner Input DNA and peaks were called using the QuEST package (20) with a ChIP seeding enrichment ≥3, and extension and background enrichments ≥2. These regions were overlapped with the genome annotation to identify genes within 500 bp downstream of the peak. The gene lists from multiple treatments were largely overlapping sets, and hence were pooled to generate a single list of genes that show significant binding of bZIP1. Due to technical issues, the experimental design used for ChIP-Seq precludes the observation of significant differences between the genes bound by bZIP1 under the different treatment conditions. This is because the samples fixed for ChIP included a variable number of transfected cells that were not sorted by FACS.
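The assignment of peaks to genes and the pooling of gene lists across treatments can be sketched as follows. The sketch makes a simplifying assumption that is not taken from the text: "downstream" is read as increasing genomic coordinate and gene strand is ignored, so a gene is assigned to a peak when it overlaps the interval from the peak start to 500 bp past the peak end. Coordinates, identifiers and function names are hypothetical.

```python
def genes_near_peaks(peaks, genes, window=500):
    """Return IDs of genes overlapping a peak or the `window` bp after it.

    peaks: iterable of (chrom, start, end)
    genes: iterable of (gene_id, chrom, start, end)
    Assumption: strand is ignored and "downstream" means higher coordinate.
    """
    bound = set()
    for chrom, p_start, p_end in peaks:
        for gene_id, g_chrom, g_start, g_end in genes:
            if g_chrom != chrom:
                continue
            # Interval overlap between [p_start, p_end + window] and the gene.
            if g_start <= p_end + window and g_end >= p_start:
                bound.add(gene_id)
    return bound


def pool_treatments(peaks_by_treatment, genes, window=500):
    """Union of bound genes over all treatments, as one pooled gene list."""
    pooled = set()
    for peaks in peaks_by_treatment.values():
        pooled |= genes_near_peaks(peaks, genes, window)
    return pooled


# Hypothetical annotation and peak calls for two treatments.
genes = [("G1", "Chr1", 1500, 3000), ("G2", "Chr1", 1800, 2500),
         ("G3", "Chr2", 1100, 1300), ("G4", "Chr1", 100, 900)]
peaks_by_treatment = {
    "+N +DEX": [("Chr1", 1000, 1200)],
    "-N +DEX": [("Chr2", 900, 1000)],
}
pooled = pool_treatments(peaks_by_treatment, genes)
```

The nested loop is adequate for a sketch; for genome-scale peak sets an interval index or sorted sweep would be the usual choice.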
The ChIP-seq studies were performed using a micro-ChIP protocol on ~10,000 cells, which results in a low DNA input compared to standard ChIP studies. It has been shown that peak discovery from ChIP data becomes more challenging as the number of cells goes down (
Time-series ChIP-seq. The ChIP time-series samples were pre-treated with a N-signal treatment of 20 mM KNO3 and 20 mM NH4NO3 solution (N) for 2 h, followed by CHX (35 μM in DMSO, Sigma-Aldrich) for 20 min. Protoplasts were then treated with DEX (10 μM in Ethanol, Sigma-Aldrich) and samples were harvested at 1, 5, 30 and 60 min after the start of the DEX-induced bZIP1 nuclear localization.
Cis-element Motif Analysis. 1 Kb regions upstream of the TSS (Transcription Start Site) for target genes were extracted based on TAIR10 annotation and submitted to the Elefinder program (all promoters from the genome as background) (Li et al., 2011, Plant physiology 156:2124-2140) or MEME (against a randomized dinucleotide background) (Bailey et al., 2009, Nucleic Acids Research 37:W202-208) to determine over-representation of known cis-element binding sites (where different parameters were used in specific cases, they are noted in the paper). The E-value of significance for each motif was used to cluster the occurrence of motifs in the various subsets using the HCL algorithm in MeV (Saeed et al., 2006, Methods in Enzymology 411:134-193). Motifs that show a higher specificity to a particular category or a sub-group were identified with the PTM algorithm in MeV. De novo motif identification was performed on the 1 Kb upstream sequence of the genes regulated by bZIP1 from microarray and ChIP-Seq data separately using the MEME suite (Bailey et al., 2009, Nucleic Acids Research 37:W202-208).
Accession numbers. The raw data from all microarray assays were submitted to NCBI GEO and are available under the accession number GSE54049. The raw sequencing data from ChIP-Seq assays are available from NCBI SRA under the accession SRX425878.
10.3. Results
Temporal perturbation of both bZIP1 and the N-signal it transduces. To identify how bZIP1 mediates the rapid propagation of a N-signal in a GRN, both bZIP1 and the N-signal it transduces were temporally perturbed in the cell-based TARGET system (
ARABIDOPSIS THALIANA TRNA ADENOSINE DEAMINASE A
ARABIDOPSIS THALIANA CELLULOSE SYNTHASE LIKE G2
ARABIDOPSIS NITRATE REDUCTASE 2
ARABIDOPSIS THALIANA SULFOTRANSFERASE 2A
ARABIDOPSIS THALIANA DROUGHT-INDUCED 8
ARABIDOPSIS THALIANA HIGH AFFINITY NITRATE TRANSPORTER 2.6
ARABIDOPSIS THALIANA NITRITE REDUCTASE
ARABIDOPSIS THALIANA PLASMA MEMBRANE INTRINSIC PROTEIN 1
ARABIDOPSIS THALIANA WRKY DNA-BINDING PROTEIN 7
ARABIDOPSIS THALIANA PROTEIN-SERINE KINASE 1
ARABIDOPSIS THALIANA WRKY DNA-BINDING PROTEIN 72
Arabidopsis thaliana dirigent protein 6
ARABIDOPSIS THALIANA VOLTAGE DEPENDENT ANION CHANNEL 1
ARABIDOPSIS THALIANA CATION/H+ EXCHANGER 18
ARABIDOPSIS FATTY ACID HYDROXYLASE 2
thaliana protein match is: unknown protein (TAIR: AT1G49500.1); Has 22 Blast hits to 22 proteins in 2
Primary targets of bZIP1 can be identified by either TF-regulation or TF-binding. bZIP1 primary targets were first identified based solely on TF-induced gene regulation. A total of 901 genes were identified as primary bZIP1 targets based on significant regulation in response to DEX-induced TF nuclear import, compared to minus DEX controls (ANOVA analysis; FDR adjusted p-value<0.05) (
bZIP1 primary targets were next identified based solely on TF-DNA binding. Genes bound by bZIP1 were identified as genic regions enriched in the ChIP DNA, compared to the background (input DNA), using the QuEST peak-calling algorithm (
bZIP1 (FDR < 0.05)
bZIP1 (FDR < 0.05) AND
NitrogenXbZIP1
bZIP1 bound genes
Arabidopsis NAC domain containing protein 2
Arabidopsis thaliana lipoxygenase 3
ARABIDOPSIS THALIANA MILDEW RESISTANCE
ARABIDOPSIS THALIANA DELTA(3), DELTA(2)-ENOYL
ARABIDOPSIS THALIANA NUDIX HYDROLASE
Arabidopsis NAC domain containing protein 29
Arabidopsis thaliana lipoxygenase 4
ARABIDOPSIS GOLDEN2-LIKE 1
ARABIDOPSIS NDR1/HIN1-LIKE 1
Arabidopsis thaliana eukaryotic translation initiation factor 3
ARABIDOPSIS ORTHOLOG OF SUGAR BEET HS1 PRO-1 2
ARABIDOPSIS THALIANA CALMODULIN LIKE 4
ARABIDOPSIS THALIANA VOLTAGE DEPENDENT
ARABIDOPSIS THALIANA
ARABIDOPSIS MANGANESE SUPEROXIDE DISMUTASE 1
ARABIDOPSIS THALIANA WOUND-INDUCED PROTEIN
Arabidopsis thaliana RING and Domain of Unknown Function
Arabidopsis thaliana late embryogenensis abundant like 5
Arabidopsis thaliana Nudix hydrolase homolog 7
ARABIDOPSIS THALIANA CALCINEURIN B-LIKE
ARABIDOPSIS THALIANA HEAT SHOCK
ARABIDOPSIS FATTY ACID HYDROXYLASE 2
ARABIDOPSIS THALIANA WRKY DNA-BINDING
ARABIDOPSIS THALIANA HEAVY METAL ATPASE 1
ARABIDOPSIS THALIANA FERRETIN 1
ARABIDOPSIS SULFATE TRANSPORTER 68
Arabidopsis thaliana NADP-malic enzyme 2
ARABIDOPSIS THALIANA VOLTAGE DEPENDENT
ARABIDOPSIS THALIANA WRKY DNA-BINDING
Arabidopsis NAC domain containing protein 87
ARABIDOPSIS THALIANA SPX DOMAIN GENE 1
Arabidopsis NAC domain containing protein 91
ARABIDOPSIS THALIANA BASIC LEUCINE ZIPPER 9
Arabidopsis toxicos en levadura 31
ARABIDOPSIS THALIANA WRKY DNA-BINDING
ARABIDOPSIS THALIANA MYB DOMAIN PROTEIN 44
ARABIDOPSIS GLUTATHIONE S-TRANSFERASE 11
ARABIDOPSIS GLUTATHIONE S-TRANSFERASE 1
ARABIDOPSIS 12-OXOPHYTODIENOATE REDUCTASE 1
ARABIDOPSIS 12-OXOPHYTODIENOATE REDUCTASE 2
ARABIDOPSIS THALIANA PYRIDOXINE BIOSYNTHESIS
Arabidopsis NAC domain containing protein 81
ARABIDOPSIS THALIANA RAS-RELATED NUCLEAR
ARABIDOPSIS HEAVY METAL ATPASE 8
Integration of TF-regulation and TF-binding data identifies three modes-of-action for bZIP1 and its primary targets: poised, stable, and transient. To understand the underlying mechanisms by which bZIP1 propagates N-signals through a GRN, primary targets identified either by TF-induced gene regulation or TF-binding were integrated. To enable a direct comparison of transcriptome and TF-binding data, of the 850 genes bound to bZIP1, 187 genes not represented on the ATH1 microarray were omitted. 136 genes that did not pass the stringent filters for effects of protoplasting, DEX, or CHX treatment were also omitted. This resulted in a filtered total of 527 bZIP1 bound genes (
Arabidopsis thaliana protein match is: unknown protein (TAIR: AT5G23890.1); Has 30201
Arabidopsis protein of unknown function (DUF241)
Arabidopsis thaliana protein match is: unknown protein (TAIR: AT1G65486.1); Has 22 Blast
Arabidopsis protein of unknown function (DUF241)
Arabidopsis thaliana protein match is: S-adenosyl-L-methionine-dependent
thaliana protein match is: unknown protein (TAIR: AT4G37290.1); Has 36 Blast hits to 35
armeniaca}
To next explore the biological relevance of the three distinct classes of primary bZIP1 targets, the following features were examined: (1) enrichment of cis-regulatory elements (
In addition to these common features consistent with the role of bZIP1 in planta (Baena-Gonzalez et al., 2007, Nature 448:938; Kang et al., 2010, Molecular Plant 3:361-373; Gutierrez et al., 2008, Proc. Natl. Acad. Sci. U.S.A. 105:4939-4944), distinctive features for the Class III transient bZIP1 primary targets specifically relevant to rapid N-signaling were uncovered. These class-specific features are outlined below.
Class I “Poised” targets (TF Binding only). Class I bZIP1 primary targets (407 genes) that are bound, but not regulated by bZIP1, are significantly enriched in genes involved in response to biotic/abiotic stimuli, and transport of divalent ions (FDR<0.01) (
Class II “Stable” targets (TF Binding and Regulation). Class II targets (120 genes) are regulated and bound by bZIP1. This 23% overlap (p-val<0.001) between transcriptome and ChIP-Seq data (
Class III “Transient” targets (TF Regulation, but no detectable TF binding). Unexpectedly, the largest group of bZIP1 primary targets (781 genes), is represented by the Class III “transient” targets i.e., primary targets regulated by bZIP1 perturbation but not detectably bound by it (
Arabidopsis
thaliana protein match is: glucan synthase-like 4 (TAIR: AT3G14570.2); Has 315
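The three classes follow directly from set operations on the bound and regulated gene lists, and the reported counts can be checked against one another. In the sketch below the gene identifiers are placeholder integers arranged only to reproduce the counts given in the text (527 filtered bound genes, 901 regulated genes); they are not real gene IDs, and the function name is hypothetical.

```python
def classify_targets(bound, regulated):
    """Split primary targets into the three classes described in the text."""
    bound, regulated = set(bound), set(regulated)
    return {
        "Class I (poised)": bound - regulated,        # bound, not regulated
        "Class II (stable)": bound & regulated,       # bound and regulated
        "Class III (transient)": regulated - bound,   # regulated, not bound
    }


# Filtering arithmetic from the text: 850 bound genes, minus 187 not on the
# ATH1 array, minus 136 removed by the protoplasting/DEX/CHX filters.
filtered_bound = 850 - 187 - 136

# Placeholder integer IDs chosen so that the set sizes match the text.
bound = set(range(0, filtered_bound))
regulated = set(range(407, 407 + 901))

classes = classify_targets(bound, regulated)
counts = {name: len(genes) for name, genes in classes.items()}

# Share of bound genes that are also regulated (the Class II overlap).
overlap_percent = 100 * counts["Class II (stable)"] / len(bound)
```

With these counts the classes contain 407, 120 and 781 genes, and the Class II overlap is about 23% of the 527 bound genes, consistent with the figures given for each class.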
The Class III transient bZIP1 primary targets comprise “first responders” in rapid N-signaling. In line with its role as a master regulator in a N-response gene network, all three classes of bZIP1 primary targets uncovered in this cell-based study are significantly enriched with N-responsive genes observed in whole plants (Krouk et al., 2010, Genome Biology 11(12):R123; Gutierrez et al., 2008, Proc. Natl. Acad. Sci. U.S.A. 105:4939-4944; Wang et al., 2003, Plant Physiol. 132(2):556-567; Wang et al., 2004, Plant physiology 136(1):2512-2522) (
branched chain family amino acid metabolic process; amine catabolic process; cellular amino acid catabolic process; branched chain family amino acid catabolic process
Lastly, Class III transient target genes are uniquely enriched in genes that respond early and transiently to the N-signal in planta (
A transient mode of bZIP1 action invokes a “hit-and-run” model for N-signaling. The significant enrichment of N-relevant genes in Class III targets, links the transient mode-of-action of bZIP1 with early and transient aspects of N-nutrient signaling (
To experimentally determine if any of the Class III transient targets are bound by bZIP1 at very early time-points, ChIP-Seq analysis was performed on four additional time-points after the DEX-induced nuclear import of bZIP1. 41 genes were revealed from Class III transient targets that have detectable bZIP1 binding at one or more of the earlier time-points (1, 5, 30, 60 min) (
Finally, the hypothesis that bZIP1 acts as a “pioneer/catalyst” TF in N-signal propagation through a GRN, is further supported by cis-motif analysis. Specifically, the promoters of Class III “transient” bZIP1 target genes contained the largest number and most significant enrichment of cis-regulatory motifs, in addition to bZIP1-binding sites (
10.4. Discussion
The discovery of a large and typically overlooked class of transient primary targets of the master TF bZIP1, disclosed herein, introduces a novel perspective in the general field of dynamic GRNs. Dynamic TF-target binding studies across eukaryotes have captured many transient TF-targets (Ni et al., 2009, Gene Dev 23(11):1351-1363; Chang et al., 2013, Elife 2:e00675). However, even those fine-scale time-series ChIP studies likely miss highly temporal connections, as they require biochemically detectable TF binding in at least one time-point to identify primary TF targets. Key to the discovery of the transient targets of bZIP1 involved in rapid N-signaling, disclosed herein, is the ability to identify primary targets based on TF-induced changes in mRNA that can occur even in the absence of detectable TF binding. The cell-based system also enabled the detection of rapid and transient binding within 1 minute of TF nuclear import, owing to rapid fixation of protein-DNA complexes in plant cells lacking a cell wall. Importantly, the in planta relevance of the cell-based TARGET studies disclosed herein (
The discovery of these transient TF targets, disclosed herein, adds a new perspective to the field of dynamic GRNs. Recent time-series studies in yeast by Lickwar et al. reported transitive TF-target binding described as a “tread-milling” mechanism, in which a TF exhibits weak and transitive binding to some of its targets, resulting in a lower level of gene activation (Lickwar et al., 2012, Nature 484(7393):251-255). The transient bZIP1 targets detected in this study do not fit this “tread-milling” model, since there is no significant difference between the expression fold-change distributions for Class III “transient” targets versus Class II “stable” targets. Instead, the transient TF-target interactions uncovered herein are conceptualized in terms of a classic, but largely forgotten, “hit-and-run” model of transcription proposed in the 1980s (Schaffner, 1988, Nature 336:427-428) (
In support of this “hit-and-run” transcription model, Class III “transient” targets include genes that are rapidly and transiently bound by bZIP1 at very early time-points (1-5 min) after TF nuclear import, and whose level of expression is maintained at a higher level, despite being no longer bound by bZIP1 at later time-points. Continued regulation of the bZIP1 targets (after bZIP1 is no longer bound) might be mediated by other TF partners recruited by the “trigger/pioneer” TF (
Importantly, these results have significance beyond bZIP1 and N-signaling, and indeed transcend plants. Across eukaryotes, TFs are found to bind only to a small percentage of their regulated targets, as shown in plants (Mönke et al., 2012, Nucleic Acids Research 40:82401; Arenhart et al., 2014, Molecular Plant 7(4):709-721; Bolduc et al., 2012, Gene Dev 26(15):1685-1690), yeast (Hughes et al., 2013, Genetics 195(1):9-36) and animals (Gorski et al., 2011, Nucleic Acids Research 39:9536; Bianco et al., 2014, Cancer Research 74(7):2015-2025). The large number of TF-regulated but unbound genes, including the false negatives of ChIP-seq (Chen et al., 2012, Nat Methods 9(6):609), must be dismissed as putative secondary targets in approaches that can only identify primary targets based on TF-DNA binding. Instead, it is shown herein that these typically dismissed targets, which can be identified as primary TF targets by a functional read-out in this cell-based TARGET approach (e.g., TF-induced regulation), are crucial for rapid and dynamic signal propagation, thus uncovering the “dark matter” of signal transduction that has been missed. More broadly, the approach described herein is applicable across eukaryotes, and can also be adapted to studying cell-specific GRNs by using GFP-marked cell lines in the assay (Birnbaum et al., 2003, Science 302(5652):1956-1960). Moreover, this approach can identify primary targets even in cases where TF binding can never be physically detected. The transient targets thus uncovered will reveal the elusive temporal interactions that mediate rapid and dynamic responses of GRNs to external signals.
As described herein, using the cell-based TARGET system, a novel class of transient TF targets that are directly regulated by the bZIP1 TF, but not detectably bound by it, was identified. This class of transient targets (Class III) suggests a “hit-and-run” mode-of-action for bZIP1, in which bZIP1 “hits” its target, initiates transcription, and then dissociates (“runs”), leaving transcription ongoing even without bZIP1 bound to the promoter.
To test the hypothesis that transcription of a gene initiated by “the Hit” continues after “the Run,” an affinity-tagged UTP was used to label and capture newly synthesized mRNA. By adding this label at a time-point when the TF is not detectably bound, it can be determined whether a gene is still actively transcribed. Briefly, biosynthetic tagging of newly synthesized RNA using 4-thiouracil and uracil phosphoribosyltransferase (referred to as “4sU tagging” hereinafter) (Sidaway-Lee et al., 2014, Genome Biology 15(3):R45; Zeiner et al., 2008, Methods in Molecular Biology 419:135-46) was adapted for the cell-based TARGET system in plants (Bargmann et al., 2013, Molecular Plant 6(3):978). Technically, 4sU is fed to plant protoplasts and incorporated into newly synthesized RNA. Total RNA is then extracted from the protoplasts, and the newly synthesized, 4sU-tagged RNA is isolated from the total RNA through biotinylation and capture on streptavidin magnetic beads. Next, the RNA is purified and used for transcriptomic profiling. The 4sU-tagged RNA represents only the newly transcribed genes.
4sU-tagged RNA can be detected as early as 20 min after feeding 4sU to isolated protoplasts (
Transient TF-targets detected in cells help to decipher dynamic N-regulatory networks operating in planta. The transient TF-targets detected specifically in the TARGET cell-based system make a unique contribution to understanding how signal transduction occurs in planta. First, as the TARGET cell-based system detects only primary TF targets, these data enable the identification of direct TF-targets in the in planta TF perturbation data, which on their own cannot distinguish primary from secondary targets. Second, the network inference studies described herein for the proof-of-principle example bZIP1 predict that the transient bZIP1 targets (detected only in cells) are TF2's predicted to regulate secondary bZIP1 targets (detected only in planta) (
Transient TF1→TF2 targets detected in the TARGET cell-based system are predicted to regulate secondary targets of TF1 identified in planta. The hypothesis that “transient” targets of bZIP1 detected in the cell-based TARGET system mediate N-regulation of downstream bZIP1 targets in planta was developed by the preliminary implementation of the “Network Walking” pipeline outlined in
In Step 1, to identify genes potentially involved in bZIP1-mediated N-signaling in planta, bZIP1 targets identified using the cell-based TARGET system (primary targets), described herein, were combined with bZIP1 targets identified by TF perturbation in planta (primary and secondary targets) (Kang et al., 2010, Molecular Plant 3:361), and then this union of bZIP1 targets was intersected with the list of N-regulated genes from a time-course study of N-treatments performed in planta.
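The set logic of Step 1 can be illustrated with a minimal Python sketch. The function name and the gene identifiers below are hypothetical placeholders introduced only for illustration; they are not identifiers or data from the studies described herein.

```python
def candidate_targets(cell_targets, planta_targets, n_regulated):
    """Union of cell-based and in planta target lists, restricted to N-regulated genes."""
    union = set(cell_targets) | set(planta_targets)
    return union & set(n_regulated)


# Hypothetical placeholder identifiers (assumptions, not real gene IDs).
cell_targets = {"g1", "g2", "g3"}      # primary targets from the cell-based assay
planta_targets = {"g3", "g4", "g5"}    # primary and secondary targets found in planta
n_regulated = {"g2", "g3", "g5", "g9"} # genes responding in an N-treatment time course

candidates = candidate_targets(cell_targets, planta_targets, n_regulated)
print(sorted(candidates))
```

In this sketch, a gene is retained if it appears in either target list and also responds to the N-treatment; genes that are N-regulated but in neither target list are excluded.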
In Step 2, TF→target connections were inferred between the bZIP1 targets identified in the cell-based TARGET system and those identified by TF perturbation in planta, using the N-treatment time-series data and a network inference approach that was previously validated in silico and experimentally (Directed Factor Graphs) (Krouk et al., 2010, Genome Biology 11:R123) (Step 2,
The resulting network (shown in
Remarkably, 18/22 of these TF2's are Class III transient targets of bZIP1 detected only in the TARGET cell-based system, described herein (Inner ring of
This result supports the hypothesis that transient bZIP1 targets detected only in the TARGET cell-based system described herein, are intermediate effectors of secondary bZIP1 targets detected only in planta (Kang et al., 2010, Molecular Plant 3:361). This combined experimental and computational approach is called “Network Walking”, because it enables a “walk” from pioneer TF1→transient target (TF2)→effector target in planta (e.g. N-assimilation gene), as described below.
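As an illustration only, the “walk” from TF1 through an intermediate TF2 to an in planta target can be sketched in Python as a filter over inferred edges. The edge list, the gene labels, and the function name are hypothetical; the sketch assumes that a network inference step has already produced a list of (TF2, target) pairs and does not reproduce any particular inference method.

```python
def network_walks(tf1, edges, cell_only, planta_only):
    """Return sorted (tf1, tf2, z) walks where tf2 was detected only in cells
    and z was detected only in planta."""
    walks = []
    for tf2, z in edges:
        if tf2 in cell_only and z in planta_only:
            walks.append((tf1, tf2, z))
    return sorted(walks)


# Hypothetical inferred edges (intermediate TF -> downstream target).
edges = [("tfA", "z1"), ("tfA", "z2"), ("tfB", "z1"), ("tfC", "z3")]
cell_only = {"tfA", "tfB"}          # targets seen only in the cell-based assay
planta_only = {"z1", "z2", "z3"}    # targets seen only in planta

walks = network_walks("TF1", edges, cell_only, planta_only)

# Fraction of intermediate TFs in the inferred network that are cell-only.
intermediates = {tf2 for tf2, _ in edges}
n_cell_only = len(intermediates & cell_only)
print(walks)
print(n_cell_only, "of", len(intermediates), "intermediates are cell-only")
```

Here the edge from the hypothetical "tfC" is excluded from the walks because "tfC" is not among the cell-only targets.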
The general “Network Walking” Pipeline (
Step 1A: Experimental: Perturb pioneer TF1 and identify the symmetric difference between cell-based targets identified in TARGET (TF2.1-j) and in planta targets defined by TF perturbation in planta (Z1-j), as well as their overlap.
Step 1B: Computational: Infer edges in network. This will infer edges between potential “transient” targets detected in the cell-based TARGET system (TF2.1-j) and in planta targets (Z1-j) of TF1 using time-series data and network inference approaches such as DFG (Krouk et al., 2010, Genome Biology 11:R123), GENIE3 or Inferelator (Krouk et al., 2013, Genome Biology 14(6): 123).
Step 2A: Experimental: Perturb TF2 in cell-based TARGET system to validate primary TF2→gene Z edges and also identify new transient targets of TF2 (e.g. TF3.1-j).
Step 2B: Computational: Rerun network inference (e.g. DFG) using time-series data from N-treated plants, this time using a directed matrix that starts with priors defined experimentally by TF2 target data (Step 3).
Outcome: This combined computational/experimental pipeline will result in a validated “Network Walk” from pioneer TF1→transient TF2.1 (identified in TARGET)→target gene Z's in planta. Another outcome will be new transient TF2→TF3i-j's, which may drive a new round of TF perturbation (e.g., Step 3A) in a true systems biology cycle. Each iterative cycle of TF perturbation and network modeling will build a new set of edges in the network out from the original TF1. The networks generated in Aim 2A will test the general hypothesis that transient targets detected only in the rapid and temporal cell-based system reveal “hidden steps” that mediate downstream responses in planta but cannot be detected in planta. Thus, rather than merely using the in planta data to confirm TF-targets identified in the TARGET cell-based system, these network connections show that the transient targets identified in the cell-based TARGET system add to and refine our understanding of how dynamic networks operate in vivo, and that their specific connections elude detection in planta.
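The bookkeeping behind the pipeline steps above can be sketched in Python under stated assumptions. The partition in `partition_targets` corresponds to the symmetric difference and overlap of Step 1A. The function `expand_network` models the iterative cycle as a breadth-first expansion in which a lookup table (`targets_of`) stands in for the experimental results of each TF perturbation. All names and data are hypothetical, and the sketch omits the network inference steps entirely.

```python
def partition_targets(cell_targets, planta_targets):
    """Split two target lists into cell-only, planta-only, and shared sets."""
    cell, planta = set(cell_targets), set(planta_targets)
    return {
        "cell_only": cell - planta,
        "planta_only": planta - cell,
        "both": cell & planta,
    }


def expand_network(tf1, targets_of, tfs, max_rounds):
    """Grow edges outward from tf1, one perturbation round at a time.

    targets_of: dict mapping a perturbed TF to its validated primary targets
                (a stand-in for experimental data; an assumption of this sketch).
    tfs:        set of genes that are themselves TFs and can be perturbed next.
    """
    edges, seen, frontier = set(), {tf1}, [tf1]
    for _ in range(max_rounds):
        next_frontier = []
        for tf in frontier:
            for target in sorted(targets_of.get(tf, ())):
                edges.add((tf, target))
                if target in tfs and target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
        if not frontier:
            break
    return edges


# Hypothetical data for illustration.
parts = partition_targets({"TF2a", "TF2b", "z1"}, {"z1", "z2"})
targets_of = {"TF1": {"TF2a", "z1"}, "TF2a": {"TF3a", "z2"}, "TF3a": {"z3"}}
edges = expand_network("TF1", targets_of, tfs={"TF2a", "TF3a"}, max_rounds=2)
print(parts)
print(sorted(edges))
```

With `max_rounds=2`, the sketch stops after perturbing the hypothetical "TF2a", so the targets of "TF3a" are left for a further round.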
Although the invention is described in detail with reference to specific embodiments thereof, it will be understood that variations which are functionally equivalent are within the scope of this invention. Indeed, various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Such modifications are intended to fall within the scope of the appended claims. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.
All publications, patents and patent applications mentioned in this specification are herein incorporated by reference into the specification to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference in their entireties.
This application is a continuation of U.S. patent application Ser. No. 14/457,402 filed on Aug. 12, 2014, which claims the benefit of U.S. Provisional Application No. 61/865,438 filed on Aug. 13, 2013 and U.S. Provisional Application 62/011,729 filed on Jun. 13, 2014, the entire contents of each of which are incorporated by reference herein in their entireties.
Number | Date | Country |
---|---|---|
62011729 | Jun 2014 | US |
61865438 | Aug 2013 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 14457402 | Aug 2014 | US |
Child | 16211900 | | US |