Synthetic arcelin-5 promoter and methods of use thereof

Abstract
The present invention relates to methods for the design and production of synthetic promoters with a defined specificity and promoters produced with these methods.
Description
FIELD OF THE INVENTION

The present invention relates to methods for the design and production of synthetic promoters with a defined specificity and promoters produced with these methods.


BACKGROUND OF THE INVENTION

Manipulation of plants to alter and/or improve phenotypic characteristics such as productivity or quality requires expression of heterologous genes in plant tissues. Such genetic manipulation relies on the availability of a means to drive and to control gene expression as required. For example, genetic manipulation relies on the availability and use of suitable promoters which are effective in plants and which regulate gene expression so as to give the desired effect(s) in the transgenic plant.


Advanced traits often require the coordinated expression of more than one gene in a transgenic plant. For example, producing polyunsaturated fatty acids such as arachidonic acid in a plant requires the expression of at least 5 genes. There is also an increasing demand for trait stacking, which requires the combination of more than one gene in transgenic plants.


The availability of suitable promoters for such coordinated expression is limited. The promoters often need to have the same tissue and/or developmental specificity and, preferably, comparable expression strength. One solution has been to use the same promoter for the expression of several genes. However, expression constructs comprising more than one expression cassette with tandem or inverted sequence repeats, for example of a promoter, cause various problems. When such repeats are located on one vector, handling of the vector in bacteria for cloning, amplification and transformation is difficult because recombination events lead to the loss and/or rearrangement of part of the expression construct. Moreover, sequence verification of constructs comprising repeated sequences is difficult and sometimes impossible. A further problem of expression constructs comprising repeats of the same promoter sequence is that recombination may also occur after introduction into the genome of the target organism, such as a plant.


Additionally, it is well known that repeated promoter sequences in the genome of organisms such as plants may induce silencing of expression driven by these promoters, for example by methylation of the promoter or by an increase in chromatin density at the site of the promoters, which makes the promoters inaccessible to transcription factors.


The use of different promoters in expression constructs comprising more than one expression cassette is one possibility to circumvent these problems. However, isolation and analysis of promoters are laborious and time-consuming. The expression pattern and expression strength of an isolated promoter cannot be predicted, so a high number of promoters needs to be tested in order to find at least two promoters with comparable expression patterns and, optionally, comparable expression strength.


There is, therefore, a great need in the art for new sequences that may be used for the expression of selected transgenes in economically important plants. It is thus an objective of the present invention to provide new methods for the production of synthetic promoters with identical and/or overlapping expression patterns or expression specificities and, optionally, similar expression strength. This objective is solved by the present invention.







DETAILED DESCRIPTION OF THE INVENTION

A first embodiment of the invention is a method for the production of one or more synthetic regulatory nucleic acid molecules of a defined specificity comprising the steps of

    • a) identifying at least one naturally occurring nucleic acid molecule of the defined specificity (starting molecule) and
    • b) identifying conserved motifs in the at least one nucleic acid sequence (starting sequence) of the starting molecule of the defined specificity as defined in a) and
    • c) mutating the starting sequence while
      • i) leaving unaltered at least 70%, preferably 80%, 85% or 90%, more preferably at least 95%, even more preferably at least 98% or at least 99%, for example 100%, of the motifs known to be involved in regulation of the respective defined specificity (also called preferentially associated motifs) and
      • ii) leaving unaltered at least 80%, preferably at least 90% or 95%, for example 100%, of the motifs involved in transcription initiation (also called essential motifs) and
      • iii) leaving unaltered at least 10%, preferably at least 20%, 30%, 40% or 50%, more preferably at least 60%, 70% or 80%, even more preferably at least 90% or 95% of the other identified motifs (also called non-exclusively associated motifs) and
      • iv) keeping the arrangement of the identified motifs substantially unchanged and
      • v) avoiding the introduction of new motifs known to influence expression with a specificity other than said defined specificity and
      • vi) avoiding identical stretches of more than 50 base pairs, preferably 45 base pairs, more preferably 40 base pairs, most preferably 35 base pairs, for example 30 base pairs, between each of the starting sequence and the one or more mutated sequences and
    • d) producing a nucleic acid molecule comprising the mutated sequence and
    • e) optionally testing the specificity of the mutated sequence in the respective organism.
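The retention thresholds of step c) i) to iii) lend themselves to an automated check once the motifs of the starting sequence have been annotated. The following Python sketch is illustrative only: the record type `MotifHit`, the class labels and the toy sequences are hypothetical, the thresholds are the lowest values named above, and a motif is counted as unaltered only if its bases are identical at the same position, which is stricter than the functional criterion the text allows for essential motifs.

```python
from dataclasses import dataclass

# Lowest minimum fractions of unaltered motifs named in step c): i) 70 %, ii) 80 %, iii) 10 %.
MIN_RETAINED = {
    "preferentially_associated": 0.70,
    "essential": 0.80,
    "non_exclusively_associated": 0.10,
}


@dataclass
class MotifHit:
    """A motif found in the starting sequence (hypothetical record type)."""
    motif_class: str  # one of the keys of MIN_RETAINED
    start: int        # 0-based start in the starting sequence
    end: int          # exclusive end


def is_unaltered(hit: MotifHit, start_seq: str, mutated_seq: str) -> bool:
    # Simplification: both sequences are assumed to be aligned position by position,
    # and "unaltered" means identical bases rather than "still functional".
    return start_seq[hit.start:hit.end] == mutated_seq[hit.start:hit.end]


def check_retention(hits, start_seq, mutated_seq, thresholds=MIN_RETAINED):
    """Return {motif class: (fraction unaltered, meets threshold)} for classes present."""
    report = {}
    for motif_class, minimum in thresholds.items():
        class_hits = [h for h in hits if h.motif_class == motif_class]
        if not class_hits:
            continue
        kept = sum(is_unaltered(h, start_seq, mutated_seq) for h in class_hits)
        fraction = kept / len(class_hits)
        report[motif_class] = (fraction, fraction >= minimum)
    return report


start = "AAAACACGTGTTTTTATAAATT"
mutated = "GCGTCACGTGAGCATATAAAGC"
hits = [
    MotifHit("preferentially_associated", 4, 10),    # CACGTG, kept
    MotifHit("essential", 14, 20),                   # TATAAA, kept
    MotifHit("non_exclusively_associated", 20, 22),  # TT, mutated to GC
]
report = check_retention(hits, start, mutated)
print(report)
```

In this toy case the non-exclusively associated class fails its threshold because its only motif was mutated, while the other two classes pass.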


In one embodiment of the invention, additional preferentially associated motifs may be introduced into the sequence of the synthetic nucleic acid molecule.


Production of the nucleic acid molecule comprising the mutated sequence could for example be done by chemical synthesis or by oligo ligation whereby smaller oligos comprising parts of the sequence of the invention are stepwise annealed and ligated to form the nucleic acid molecule of the invention.
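Oligo ligation requires splitting the target sequence into overlapping pieces. As a hedged illustration, the Python sketch below tiles the top strand end to end and cuts the bottom strand at positions offset by half an oligo length, so that each bottom-strand oligo spans the junction between two top-strand oligos and can guide their annealing. The function names, the default length and the half-length offset are assumptions; practical designs also balance melting temperatures and provide 5′ phosphates for ligation, which is not modelled here.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def staggered_oligos(seq: str, length: int = 40):
    """Split a target sequence into overlapping top- and bottom-strand oligos.

    Top-strand oligos tile the sequence end to end; bottom-strand oligos are cut
    at positions shifted by length // 2, so each bottom oligo bridges the junction
    between two adjacent top oligos.
    """
    half = length // 2
    top = [seq[i:i + length] for i in range(0, len(seq), length)]
    cuts = [0] + list(range(half, len(seq), length)) + [len(seq)]
    bottom = [reverse_complement(seq[a:b]) for a, b in zip(cuts, cuts[1:])]
    return top, bottom


target = "ATGCGTACGTTAGCCTAGGCATCGATCGGATCCAAGCTTGAATTCTCGAGCTAGC"
top, bottom = staggered_oligos(target, length=20)
for oligo in top + bottom:
    print(oligo)
```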


In a preferred embodiment of the invention, the synthetic regulatory nucleic acid molecule is a synthetic promoter; in a more preferred embodiment, it is a synthetic promoter functional in a plant, plant tissue or plant cell.


The at least one starting molecule comprising the starting sequence may, for example, be identified by searching the literature or internet resources such as sequence and/or gene expression databases. In another example, it may be identified by isolation and characterization of a naturally occurring promoter from the respective organism, for example plants, algae, fungi, animals and the like. Such methods are well known to a person skilled in the art and are described, for example, in Back et al., 1991, Keddie et al., 1992, and Keddie et al., 1994.


Motifs in a series of nucleic acid molecules may be identified by a variety of bioinformatic tools available in the art; see, for example, Hehl and Wingender, 2001, Hehl and Bulow, 2002, Cartharius et al., 2005, Kaplan et al., 2006, and Dare et al., 2008.


In addition, various databases specialized in promoter analysis and motif prediction in any given sequence are available, as reviewed, for example, in Hehl and Wingender, 2001.


It is also possible to identify motifs necessary for regulation of expression of the defined specificity with experimental methods known to a skilled person, for example deletion or mutation analysis of the respective starting sequence as described in Montgomery et al., 1993.


Essential motifs known to be involved in transcription initiation, for example by being bound by general initiation factors and/or RNA polymerases, as described above under ii), are for example the TATA box, the CCAAT box, the GC box or other functionally similar motifs as identified, for example, in Roeder (1996, Trends in Biochemical Sciences, 21(9)) or Baek et al. (2006, Journal of Biological Chemistry, 281). These motifs tolerate a certain degree of degeneracy or variation in their sequence without their function in transcription initiation being changed or destroyed. The skilled person is aware of such sequence variations that leave the respective motifs functional. Such variations are given, for example, in the Transfac database as described by Matys et al. (2003, NAR 31(1)) and the literature cited therein. The Transfac database may for example be accessed via ftp://ftp.ebi.ac.uk/pub/databases/transfac/transfac32.tar.Z. Hence the term “leaving the motifs involved in transcription initiation unaltered” is to be understood to mean that the respective motifs may be mutated, i.e. altered in their sequence, as long as their function of enabling transcription initiation is not altered, i.e. as long as the essential motifs remain functional. In another embodiment of the invention, the first 49, preferably 44, more preferably 39, even more preferably 34, most preferably 29 bp directly upstream of the transcription initiation site are kept unaltered.
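The embodiment in which the region directly upstream of the transcription initiation site is kept unaltered can be verified with a simple comparison. This is a minimal sketch, assuming that the 0-based index of the +1 nucleotide is known in both the starting and the synthetic sequence; the function name and the toy sequences are hypothetical.

```python
def core_upstream_unaltered(start_seq: str, mutated_seq: str,
                            tss_start: int, tss_mutated: int, length: int = 29) -> bool:
    """Compare the `length` bp directly upstream of the transcription initiation site.

    tss_start / tss_mutated: 0-based index of the +1 nucleotide in each sequence.
    length: 29 (most preferred) or 34, 39, 44, 49 bp for the other embodiments.
    """
    if tss_start < length or tss_mutated < length:
        raise ValueError("transcription initiation site too close to the 5' end")
    return (start_seq[tss_start - length:tss_start]
            == mutated_seq[tss_mutated - length:tss_mutated])


core = "TATAAATA" * 3 + "CCCCC"       # 29 bp toy window (hypothetical sequence)
start = "AAAAA" + core + "ATGC"       # +1 nucleotide at index 34
mutated = "GGGGGGG" + core + "ATGC"   # different 5' part, +1 nucleotide at index 36
print(core_upstream_unaltered(start, mutated, 34, 36))  # True
```

Because the window is anchored at each sequence's own initiation site, upstream permutations that change the overall length do not affect the comparison.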


The term “keeping the arrangement of the motifs unchanged” as used above under iv) means that the order of the motifs and/or the distances between the motifs are kept substantially unchanged, preferably unchanged. Substantially unchanged means that the distance between two motifs in the starting sequence does not differ substantially from the distance between these motifs in the synthetic regulatory nucleic acid sequence, i.e. the distance between said motifs in the synthetic regulatory nucleic acid sequence is not longer or shorter than in the starting sequence by more than 100%, for example 90%, 80% or 70%, preferably 60%, 50% or 40%, more preferably not more than 30% or 20%, most preferably not more than 10%. Preferably, the distance between two motifs in the starting sequence differs from the distance in the permutated sequence by not more than 10, preferably 9, more preferably 8, 7, 6, 5 or 4, even more preferably not more than 3 or 2, most preferably not more than 1 base pair.
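The spacing criterion can be expressed as a check on consecutive motif pairs. The Python sketch below assumes, as an illustration, that motifs are given as (start, end) coordinates listed in the same order for both sequences and that the “distance” is the gap between the end of one motif and the start of the next; the text does not fix this convention. `max_rel` corresponds to the relative limits (1.0 = 100%) and `max_abs` to the absolute limits in base pairs; the function name is hypothetical.

```python
def spacing_preserved(orig, new, max_rel=1.0, max_abs=None):
    """Check that motif order and spacing are kept between two sequences.

    orig, new: lists of (start, end) tuples, 0-based with exclusive end, listing the
    same motifs in the same order in the starting and the synthetic sequence.
    """
    if len(orig) != len(new):
        return False
    # The motifs must still appear in the same 5'->3' order
    if any(b[0] < a[0] for a, b in zip(new, new[1:])):
        return False
    for (o1, o2), (n1, n2) in zip(zip(orig, orig[1:]), zip(new, new[1:])):
        d_orig = o2[0] - o1[1]  # gap between consecutive motifs
        d_new = n2[0] - n1[1]
        deviation = abs(d_new - d_orig)
        if max_abs is not None and deviation > max_abs:
            return False
        # max(d_orig, 1) avoids division by zero for adjacent motifs (assumption)
        if deviation / max(d_orig, 1) > max_rel:
            return False
    return True


orig = [(0, 6), (20, 26), (50, 56)]
close = [(0, 6), (21, 27), (52, 58)]
print(spacing_preserved(orig, close, max_rel=0.1, max_abs=1))  # True
```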


Inverted and/or direct repeats may lead to the formation of secondary structures in plasmids or genomic DNA. Repeated sequences may lead to recombination, deletion and/or rearrangement in the plasmid in both E. coli and Agrobacterium. In eukaryotic organisms, for example plants, repeated sequences also tend to be silenced by methylation. Recombination events that lead to deletions or rearrangements of one or more expression cassettes and/or T-DNAs are likely to cause loss of function, for example loss of expression of such constructs in the transgenic plant (Que and Jorgensen, 1998; Hamilton et al., 1998). It is therefore a critical feature of the invention at hand to avoid identical stretches of 50 base pairs, preferably 45 base pairs, more preferably 40 base pairs, most preferably 35 base pairs, for example 30 base pairs, between each of the starting sequence and the one or more permutated sequences. If more than one permutated sequence is produced, said identical stretches must be avoided between the starting sequence and each of the permutated sequences in a pairwise comparison. In another embodiment, such identical stretches must be avoided among all permutated sequences and the starting sequence, i.e. none of the permutated and starting sequences shares such an identical stretch with any of the other sequences.
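Checking for shared identical stretches amounts to computing the longest common substring for each pair of sequences. The following Python sketch uses plain dynamic programming, which is adequate for promoter-sized sequences of a few kilobases. The function names and the dictionary layout (with the key "start" for the starting sequence) are hypothetical, and also testing the reverse complement is an assumption motivated by the mention of inverted repeats above.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_stretch(a: str, b: str) -> int:
    """Length of the longest identical contiguous stretch shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common stretch ending one position earlier in both sequences
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def identical_stretch_violations(sequences: dict, max_identical: int = 50,
                                 all_pairs: bool = True):
    """Return (name1, name2, length) for pairs sharing a stretch longer than max_identical.

    all_pairs=False: compare only "start" with each permutated sequence.
    all_pairs=True: compare every pair of sequences (the stricter embodiment).
    """
    names = list(sequences)
    if all_pairs:
        pairs = list(combinations(names, 2))
    else:
        pairs = [("start", n) for n in names if n != "start"]
    found = []
    for x, y in pairs:
        a, b = sequences[x].upper(), sequences[y].upper()
        # Direct repeats, plus inverted repeats via the reverse complement (assumption)
        longest = max(longest_common_stretch(a, b),
                      longest_common_stretch(a, reverse_complement(b)))
        if longest > max_identical:
            found.append((x, y, longest))
    return found


seqs = {"start": "A" * 60, "syn1": "C" * 60, "syn2": "T" * 60}
print(identical_stretch_violations(seqs, max_identical=50))  # [('start', 'syn2', 60)]
```

Here `syn2` is flagged only because its reverse complement repeats the starting sequence, which is the inverted-repeat case.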


The skilled person is aware that regulatory nucleic acids may comprise promoters and 5′ UTRs functionally linked to said promoters, and that the 5′ UTRs may comprise at least one intron. It has been shown that introns may lead to increased expression levels from the promoter to which the intron-containing 5′ UTR is functionally linked. The 5′ UTR and the intron may be altered in their sequence as described, wherein the splice sites and the putative branch point are not altered, in order to ensure correct splicing of the intron after permutation. No nucleotide exchanges are introduced within at least 2, preferably at least 3, more preferably at least 5, even more preferably at least 10 bases up- and downstream of the splice sites (5′ GT; 3′ CAG), i.e. these sequences are kept unchanged. In addition, “CURAY” and “TNA” sequence elements, which are potential branch points of the intron, are kept unchanged within the last 200 base pairs, preferably the last 150 base pairs, more preferably the last 100 base pairs, even more preferably the last 75 base pairs of the respective intron.
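The positions excluded from mutation in an intron-containing 5′ UTR can be collected before any nucleotide exchanges are made. In this hedged Python sketch, the coordinate convention, the function name and the defaults (a 10-base flank, the most preferred value above, and a 100 bp branch-point window) are assumptions; “CURAY” is searched as CTRAY in DNA (R = A or G, Y = C or T) and “TNA” with N as any base.

```python
import re


def protected_positions(seq, intron_start, intron_end, flank=10, branch_window=100):
    """Return the set of 0-based positions that must not be mutated.

    intron_start: index of the G of the 5' GT; intron_end: index just past the G
    of the 3' AG (hypothetical coordinate convention).
    """
    if seq[intron_start:intron_start + 2] != "GT" or seq[intron_end - 2:intron_end] != "AG":
        raise ValueError("intron does not start with GT and end with AG")
    protected = set()
    # 5' splice site (GT) plus `flank` bases on each side
    protected.update(range(max(0, intron_start - flank),
                           min(len(seq), intron_start + 2 + flank)))
    # 3' splice site (CAG) plus `flank` bases on each side
    protected.update(range(max(0, intron_end - 3 - flank),
                           min(len(seq), intron_end + flank)))
    # Potential branch points in the last `branch_window` bp of the intron;
    # the lookahead patterns also report overlapping matches
    window_start = max(intron_start, intron_end - branch_window)
    window = seq[window_start:intron_end]
    for pattern in (r"(?=(CT[AG]A[CT]))", r"(?=(T[ACGT]A))"):
        for match in re.finditer(pattern, window):
            first = window_start + match.start()
            protected.update(range(first, first + len(match.group(1))))
    return protected


seq = "AAAAA" + "GTAAGG" + "CCCCCCCC" + "CTAAC" + "CCCAG" + "GGGGG"
p = protected_positions(seq, intron_start=5, intron_end=29, flank=2)
print(sorted(p))  # [3, 4, 5, 6, 7, 8, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
```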


The 5′UTR may be permutated according to the rules as defined above, wherein preferably at least 25, more preferably at least 20, even more preferably at least 15, for example at least 10, most preferably at least 5 base pairs up- and downstream of the transcription start are kept unchanged. The AT content of both the 5′ UTR and the intron is not changed by more than 20%, preferably not more than 15%, for example 10% or 5% compared to the AT content of the starting sequence.
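A minimal sketch of the AT content criterion follows. The text does not state whether the 20% limit is an absolute difference in AT content or a relative change; the sketch assumes the absolute reading (percentage points), and the function names are hypothetical.

```python
def at_content(seq: str) -> float:
    """Fraction of A and T bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def at_content_within(start_seq: str, mutated_seq: str, max_change: float = 0.20) -> bool:
    # Assumption: the limit is an absolute difference in AT fraction (percentage points);
    # a relative reading would divide the difference by at_content(start_seq).
    return abs(at_content(mutated_seq) - at_content(start_seq)) <= max_change


print(at_content_within("AATTGGCC", "AATTAGCC"))        # True  (0.500 -> 0.625)
print(at_content_within("AATTGGCC", "AATTAGCC", 0.10))  # False
```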


A further embodiment of the invention is a synthetic regulatory nucleic acid molecule produced according to the method of the invention.


An expression construct comprising the said synthetic regulatory nucleic acid molecule is another embodiment of the invention.


A vector comprising the regulatory nucleic acid molecule or the expression construct of the invention is also comprised in this invention, as well as microorganisms, plant cells or animal cells comprising the regulatory nucleic acid molecule, the expression construct and/or the vector of the invention.


A further embodiment of the invention is a plant, plant seed, plant cell or part of a plant comprising the regulatory nucleic acid molecule, the expression construct and/or the vector of the invention.


Further embodiments of the invention are exemplary recombinant seed-specific or seed-preferential synthetic regulatory nucleic acid molecules produced according to the method of the invention, wherein the regulatory nucleic acid molecule is selected from the group consisting of

    • I) a nucleic acid molecule represented by SEQ ID NO: 2, 4 or 6 and
    • II) a nucleic acid molecule comprising at least 1000 consecutive base pairs, for example 1000 base pairs, preferably at least 800 consecutive base pairs, for example 800 base pairs, more preferably at least 700 consecutive base pairs, for example 700 base pairs, even more preferably at least 600 consecutive base pairs, for example 600 base pairs, most preferably at least 500 consecutive base pairs, for example 500 base pairs or at least 400, at least 300, at least 250 for example 400, 300 or 250 base pairs of a sequence described by SEQ ID NO: 2, 4 or 6 and
    • III) a nucleic acid molecule having an identity of at least 70%, for example at least 75%, 76%, 77%, 78%, 79%, preferably at least 80%, for example at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, more preferably 90%, for example at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, even more preferably 98%, most preferably 99% over a sequence of at least 250, 300, 400, 500, 600, preferably 700, more preferably 800, even more preferably 900, most preferably 1000 consecutive nucleic acid base pairs to a sequence described by SEQ ID NO: 2, 4 or 6 and
    • IV) a nucleic acid molecule having an identity of at least 70%, for example at least 75%, 76%, 77%, 78%, 79%, preferably at least 80%, for example at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, more preferably 90%, for example at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, even more preferably 98%, most preferably 99% to a sequence consisting of at least 50%, 60%, 70%, 80%, 90% or 100% of any of the sequences described by SEQ ID NO: 2, 4 or 6 and
    • V) a nucleic acid molecule hybridizing under high stringency, preferably very high stringency, conditions with a nucleic acid molecule of at least 250, 300, 400, 500, 600, 700, 800, 900, 1000 or the complete consecutive base pairs of a nucleic acid molecule described by any of SEQ ID NO: 2, 4 or 6 and
    • VI) a complement of any of the nucleic acid molecules as defined in I) to V).


Further embodiments of the invention are exemplary recombinant seed-specific or seed-preferential synthetic regulatory nucleic acid molecules produced according to the method of the invention, wherein the regulatory nucleic acid molecule is selected from the group consisting of

    • i) a nucleic acid molecule represented by SEQ ID NO: 2, 4 or 6 and
    • ii) a nucleic acid molecule comprising at least 1000 consecutive base pairs, for example 1000 base pairs, preferably at least 800 consecutive base pairs, for example 800 base pairs, more preferably at least 700 consecutive base pairs, for example 700 base pairs, even more preferably at least 600 consecutive base pairs, for example 600 base pairs, most preferably at least 500 consecutive base pairs, for example 500 base pairs or at least 400, at least 300, at least 250 for example 400, 300 or 250 base pairs of a sequence described by SEQ ID NO: 2, 4 or 6 and
    • iii) a nucleic acid molecule having an identity of at least 75% over a sequence of at least 250, 300, 400, 500, 600, preferably 700, more preferably 800, even more preferably 900, most preferably 1000 or the complete consecutive nucleic acid base pairs to a sequence described by SEQ ID NO: 6,
    • iv) a nucleic acid molecule having an identity of at least 90% over a sequence of at least 250, 300, 400, 500, 600, preferably 700, more preferably 800, even more preferably 900, most preferably 1000 or the complete consecutive nucleic acid base pairs to a sequence described by SEQ ID NO: 2 or 4 and
    • v) a nucleic acid molecule hybridizing under high stringency, preferably very high stringency, conditions with a nucleic acid molecule of at least 250, 300, 400, 500, 600, 700, 800, 900, 1000 or the complete consecutive base pairs of a nucleic acid molecule described by any of SEQ ID NO: 2, 4 or 6 and
    • vi) a complement of any of the nucleic acid molecules as defined in i) to v).


Further embodiments of the invention are exemplary recombinant constitutive regulatory nucleic acid molecules produced according to the method of the invention, wherein the regulatory nucleic acid molecule is selected from the group consisting of

    • I) a nucleic acid molecule represented by SEQ ID NO: 14 or 15 and
    • II) a nucleic acid molecule comprising at least 1750, 1500, 1250 or 1000 consecutive base pairs, for example 1000 base pairs, preferably at least 800 consecutive base pairs, for example 800 base pairs, more preferably at least 700 consecutive base pairs, for example 700 base pairs, even more preferably at least 600 consecutive base pairs, for example 600 base pairs, most preferably at least 500 consecutive base pairs, for example 500 base pairs or at least 400, at least 300, at least 250 for example 400, 300 or 250 base pairs of a sequence described by SEQ ID NO: 14 or 15 and
    • III) a nucleic acid molecule having an identity of at least 70%, for example at least 75%, 76%, 77%, 78%, 79%, preferably at least 80%, for example at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, more preferably 90%, for example at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, even more preferably 98%, most preferably 99% over a sequence of at least 250, 300, 400, 500, 600, preferably 700, more preferably 800, even more preferably 900, for example 1000, most preferably 1250, for example 1500 or 1750 or 2000 consecutive nucleic acid base pairs to a sequence described by SEQ ID NO: 14 or 15 and
    • IV) a nucleic acid molecule having an identity of at least 70%, for example at least 75%, 76%, 77%, 78%, 79%, preferably at least 80%, for example at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, more preferably 90%, for example at least 91%, 92%, 93%, 94%, 95%, 96%, 97%, even more preferably 98%, most preferably 99% to a sequence consisting of at least 50%, 60%, 70%, 80%, 90% or 100% of any of the sequences described by SEQ ID NO: 14 or 15 and
    • V) a nucleic acid molecule hybridizing under high stringency, preferably very high stringency, conditions with a nucleic acid molecule of at least 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1250, 1500, 1750 or 2000 or the complete consecutive base pairs of a nucleic acid molecule described by any of SEQ ID NO: 14 or 15 and
    • VI) a complement of any of the nucleic acid molecules as defined in I) to V).


Further embodiments of the invention are exemplary recombinant constitutive synthetic regulatory nucleic acid molecules produced according to the method of the invention, wherein the regulatory nucleic acid molecule is selected from the group consisting of

    • i) a nucleic acid molecule represented by SEQ ID NO: 14 or 15 and
    • ii) a nucleic acid molecule comprising at least 2000, 1750, 1500, 1250 or 1000 consecutive base pairs, for example 1000 base pairs, preferably at least 800 consecutive base pairs, for example 800 base pairs, more preferably at least 700 consecutive base pairs, for example 700 base pairs, even more preferably at least 600 consecutive base pairs, for example 600 base pairs, most preferably at least 500 consecutive base pairs, for example 500 base pairs or at least 400, at least 300, at least 250 for example 400, 300 or 250 base pairs of a sequence described by SEQ ID NO: 14 or 15 and
    • iii) a nucleic acid molecule having an identity of at least 95%, preferably 97%, more preferably 98%, most preferably 99% over a sequence of at least 250, 300, 400, 500, 600, preferably 700, more preferably 800, even more preferably 900, for example 1000, most preferably 1500, for example 2000 or the complete consecutive nucleic acid base pairs to a sequence described by SEQ ID NO: 14 or 15,
    • iv) a nucleic acid molecule hybridizing under high stringency, preferably very high stringency, conditions with a nucleic acid molecule of at least 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1250, 1500, 1750, 2000 or the complete consecutive base pairs of a nucleic acid molecule described by any of SEQ ID NO: 14 or 15 and
    • v) a complement of any of the nucleic acid molecules as defined in i) to iv).


It is to be understood that the group of exemplary recombinant seed-specific or seed-preferential or constitutive synthetic regulatory nucleic acid molecules produced according to the method of the invention as defined above under I) to V) and i) to vi) does not comprise the starting molecules as defined by SEQ ID NO: 1, 3, 5 and 13 or a complement thereof, or a nucleic acid molecule having at least 250 consecutive base pairs of a sequence described by SEQ ID NO: 1, 3, 5 or 13 or a complement thereof, or any other nucleic acid molecule occurring in a wild type plant, as such nucleic acid molecules are not produced according to the invention but are naturally present in wild type plants.


An expression construct comprising any of said synthetic regulatory nucleic acid molecules as defined above under I) to VI) and i) to vi) is another embodiment of the invention.


A vector comprising the regulatory nucleic acid molecule or the expression construct of the invention is also comprised in this invention, as well as microorganisms, plant cells or animal cells comprising the regulatory nucleic acid molecule, the expression construct and/or the vector of the invention.


A further embodiment of the invention is a plant, plant seed, plant cell or part of a plant comprising the regulatory nucleic acid molecule, the expression construct and/or the vector of the invention.


DEFINITIONS

Abbreviations: GFP, green fluorescent protein; GUS, beta-glucuronidase; BAP, 6-benzylaminopurine; 2,4-D, 2,4-dichlorophenoxyacetic acid; MS, Murashige and Skoog medium; NAA, 1-naphthaleneacetic acid; MES, 2-(N-morpholino)ethanesulfonic acid; IAA, indole acetic acid; Kan, kanamycin sulfate; GA3, gibberellic acid; Timentin™, ticarcillin disodium/clavulanate potassium.


It is to be understood that this invention is not limited to the particular methodology or protocols. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims. It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural reference unless the context clearly dictates otherwise. Thus, for example, reference to “a vector” is a reference to one or more vectors and includes equivalents thereof known to those skilled in the art, and so forth. The term “about” is used herein to mean approximately, roughly, around, or in the region of. When the term “about” is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term “about” is used herein to modify a numerical value above and below the stated value by a variance of 20 percent, preferably 10 percent up or down (higher or lower). As used herein, the word “or” means any one member of a particular list and also includes any combination of members of that list. The words “comprise,” “comprising,” “include,” “including,” and “includes” when used in this specification and in the following claims are intended to specify the presence of one or more stated features, integers, components, or steps, but they do not preclude the presence or addition of one or more other features, integers, components, steps, or groups thereof. For clarity, certain terms used in the specification are defined and used as follows:


Antiparallel: “Antiparallel” refers herein to two nucleotide sequences paired through hydrogen bonds between complementary base residues with phosphodiester bonds running in the 5′-3′ direction in one nucleotide sequence and in the 3′-5′ direction in the other nucleotide sequence.


Antisense: The term “antisense” refers to a nucleotide sequence that is inverted relative to its normal orientation for transcription or function and so expresses an RNA transcript that is complementary to a target gene mRNA molecule expressed within the host cell (e.g., it can hybridize to the target gene mRNA molecule or single stranded genomic DNA through Watson-Crick base pairing) or that is complementary to a target DNA molecule such as, for example genomic DNA present in the host cell.


“Box” or, as synonymously used herein, “motif” or “cis-element” of a promoter means a transcription factor binding sequence defined by a highly conserved core sequence of approximately 4 to 6 nucleotides surrounded by a conserved matrix sequence of up to 20 nucleotides in total within the plus or minus strand of the promoter, which is capable of interacting with the DNA binding domain of a transcription factor protein. The conserved matrix sequence allows some variability in the sequence without losing its ability to be bound by the DNA binding domain of a transcription factor protein.


One way to describe transcription factor binding sites (TFBS) is by nucleotide or position weight matrices (NWM or PWM) (for review see Stormo, 2000). A weight matrix pattern definition is superior to a simple IUPAC consensus sequence as it represents the complete nucleotide distribution for each single position. It also allows the quantification of the similarity between the weight matrix and a potential TFBS detected in the sequence (Cartharius et al. 2005).
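As an illustration of the weight-matrix approach, the Python sketch below derives a log-odds position weight matrix from a few aligned binding sites and scores every window of a sequence. It is a generic textbook construction, not the matrix similarity score used by Cartharius et al. (2005); the pseudocount, the uniform background, the toy sites and the restriction to the plus strand are assumptions, and the function names are hypothetical.

```python
import math


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds weight matrix from equal-length, aligned binding sites (A/C/G/T only)."""
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({base: math.log2((column.count(base) + pseudocount) / total / background)
                    for base in "ACGT"})
    return pwm


def scan(seq, pwm):
    """Score each window of the plus strand; returns a list of (position, score)."""
    width = len(pwm)
    return [(i, sum(pwm[k][seq[i + k]] for k in range(width)))
            for i in range(len(seq) - width + 1)]


sites = ["CACGTG", "CACGTG", "CACGTT"]  # toy aligned sites (hypothetical)
pwm = build_pwm(sites)
hits = scan("AAACACGTGAAA", pwm)
best = max(hits, key=lambda hit: hit[1])
print(best[0])  # 3
```

Unlike an IUPAC consensus, every window receives a graded score, so a cut-off can be chosen to trade sensitivity against false positives.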


Coding region: As used herein the term “coding region” when used in reference to a structural gene refers to the nucleotide sequences which encode the amino acids found in the nascent polypeptide as a result of translation of a mRNA molecule. The coding region is bounded, in eukaryotes, on the 5′-side by the nucleotide triplet “ATG” which encodes the initiator methionine and on the 3′-side by one of the three triplets which specify stop codons (i.e., TAA, TAG, TGA). In addition to containing introns, genomic forms of a gene may also include sequences located on both the 5′- and 3′-end of the sequences which are present on the RNA transcript. These sequences are referred to as “flanking” sequences or regions (these flanking sequences are located 5′ or 3′ to the non-translated sequences present on the mRNA transcript). The 5′-flanking region may contain regulatory sequences such as promoters and enhancers which control or influence the transcription of the gene. The 3′-flanking region may contain sequences which direct the termination of transcription, post-transcriptional cleavage and polyadenylation.
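Under the simplifying assumptions that the input is a spliced, intron-free transcript sequence and that the first ATG followed by an in-frame stop codon is the initiator, the coding region can be located as follows. This is a sketch with a hypothetical function name, not a gene prediction method.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_region(seq: str):
    """Return (start, end) of the first ATG...stop reading frame, end exclusive, or None."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the frame defined by this ATG
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = seq.find("ATG", start + 1)  # no in-frame stop: try the next ATG
    return None


print(find_coding_region("GGATGAAATTTTAGCC"))  # (2, 14)
```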


Complementary: “Complementary” or “complementarity” refers to two nucleotide sequences which comprise antiparallel nucleotide sequences capable of pairing with one another (by the base-pairing rules) upon formation of hydrogen bonds between the complementary base residues in the antiparallel nucleotide sequences. For example, the sequence 5′-AGT-3′ is complementary to the sequence 5′-ACT-3′. Complementarity can be “partial” or “total.” “Partial” complementarity is where one or more nucleic acid bases are not matched according to the base pairing rules. “Total” or “complete” complementarity between nucleic acid molecules is where each and every nucleic acid base is matched with another base under the base pairing rules. The degree of complementarity between nucleic acid molecule strands has significant effects on the efficiency and strength of hybridization between nucleic acid molecule strands. A “complement” of a nucleic acid sequence as used herein refers to a nucleotide sequence whose nucleic acid molecules show total complementarity to the nucleic acid molecules of the nucleic acid sequence.
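The example above can be reproduced in code. In the Python sketch below, both strands are written 5′ to 3′, so a complement in the sense defined here is the reverse of the base-wise complement. The function names are hypothetical, and `complementarity` simply reports the fraction of paired positions for two equal-length strands as one simple measure of partial complementarity.

```python
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Complement written 5'->3', i.e. the reversed base-wise complement."""
    return seq.upper().translate(PAIRING)[::-1]


def complementarity(a: str, b: str) -> float:
    """Fraction of paired positions when equal-length strands a and b (both 5'->3')
    are aligned antiparallel; 1.0 means total complementarity."""
    if len(a) != len(b):
        raise ValueError("strands must have equal length")
    expected = complement(a)
    return sum(x == y for x, y in zip(expected, b.upper())) / len(a)


print(complement("AGT"))              # ACT
print(complementarity("AGT", "ACT"))  # 1.0
print(complementarity("AGT", "ACC"))  # 0.6666666666666666
```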


Conserved motifs: A conserved motif as used herein means a sequence motif or box found in various promoters having the same or overlapping specificity. Overlapping specificity means the specificity of at least two promoters wherein the expression driven by one promoter occurs partly or completely in the same tissue, for example, as that of the other promoter, wherein the latter may drive expression in additional tissues in which the first promoter may not drive expression. Motifs may be grouped into three classes:


Essential: motifs present in the promoters of most genes that are transcribed by RNA polymerase II and which are preferentially localized close to the transcription start site. Such motifs must not be made dysfunctional by mutations according to the method of the invention. Hence they must not be altered in a way that prevents them from being bound by the respective DNA binding domain of the transcription factor protein that would have bound to the unaltered sequence.


non-exclusively associated: motifs present in the promoters of genes that are associated with certain tissues/physiological states/treatments, but not exclusively; these genes may also be expressed in other tissues/physiological states/treatments. According to the method of the invention, such motifs should preferably not be made dysfunctional by mutations, or only a certain percentage of such motifs present in one particular promoter or starting sequence should be. Hence they should preferably not be altered in a way that prevents them from being bound by the respective DNA binding domain of the transcription factor protein that would have bound to the unaltered sequence.


preferentially associated: motifs present in the promoters of genes that are expressed preferentially in specific tissues/physiological states/treatments. The vast majority of such motifs identified in a starting sequence must not be made dysfunctional by mutations according to the method of the invention. Hence they must not be altered in a way that prevents them from being bound by the respective DNA binding domain of the transcription factor protein that would have bound to the unaltered sequence.


Defined specificity: the term “defined specificity” means any expression specificity of a promoter, preferably a plant specific promoter, which is beneficial for the expression of a distinct coding sequence or RNA. A defined specificity may for example be a tissue or developmental specificity or the expression specificity could be defined by induction or repression of expression by biotic or abiotic stimuli or a combination of any of these.


Double-stranded RNA: A “double-stranded RNA” molecule or “dsRNA” molecule comprises a sense RNA fragment of a nucleotide sequence and an antisense RNA fragment of the nucleotide sequence, which both comprise nucleotide sequences complementary to one another, thereby allowing the sense and antisense RNA fragments to pair and form a double-stranded RNA molecule.


Endogenous: An “endogenous” nucleotide sequence refers to a nucleotide sequence, which is present in the genome of the untransformed plant cell.


Expression: “Expression” refers to the biosynthesis of a gene product, preferably to the transcription and/or translation of a nucleotide sequence, for example an endogenous gene or a heterologous gene, in a cell. For example, in the case of a structural gene, expression involves transcription of the structural gene into mRNA and, optionally, the subsequent translation of the mRNA into one or more polypeptides. In other cases, expression may refer only to the transcription of the DNA encoding an RNA molecule. Expression may also refer to a change in the steady-state level of the respective RNA in a plant or part thereof due to a change in the stability of that RNA.


Similar expression strength: Two or more regulatory nucleic acid molecules have a similar expression strength when the expression driven by any one of them in a given cell, tissue or plant organ does not deviate by more than a factor of 2 from the expression driven by any of the others in the same cell, tissue or organ.
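The factor-of-2 criterion can be checked per tissue. No pair of values differs by more than twofold exactly when the largest value divided by the smallest is at most 2. The Python sketch below assumes expression has already been quantified as positive numbers on a common scale. The input layout (tissue mapped to promoter mapped to level) and the promoter names are illustrative assumptions.

```python
def similar_expression_strength(levels_by_tissue: dict, max_fold: float = 2.0) -> dict:
    """For each tissue, True if no two promoters differ by more than max_fold."""
    result = {}
    for tissue, levels in levels_by_tissue.items():
        values = list(levels.values())
        if not values or min(values) <= 0:
            raise ValueError(f"need positive expression values for {tissue!r}")
        # Pairwise deviation <= max_fold is equivalent to max/min <= max_fold.
        result[tissue] = max(values) / min(values) <= max_fold
    return result
```

For example, hypothetical promoters `pA` and `pB` measured at 120 and 80 units in seed (1.5-fold) qualify, whereas 10 and 35 units in leaf (3.5-fold) do not.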


Expression construct: “Expression construct” as used herein means a DNA sequence capable of directing expression of a particular nucleotide sequence in an appropriate part of a plant or plant cell, comprising a promoter functional in the part of the plant or plant cell into which it will be introduced, operatively linked to the nucleotide sequence of interest, which is optionally operatively linked to termination signals. If translation is required, the construct typically also comprises sequences required for proper translation of the nucleotide sequence. The coding region may code for a protein of interest but may also code for a functional RNA of interest, for example RNAa, siRNA, snoRNA, snRNA, microRNA, ta-siRNA or any other noncoding regulatory RNA, in the sense or antisense direction. The expression construct comprising the nucleotide sequence of interest may be chimeric, meaning that one or more of its components is heterologous with respect to one or more of its other components. The expression construct may also be one that is naturally occurring but has been obtained in a recombinant form useful for heterologous expression. Typically, however, the expression construct is heterologous with respect to the host, i.e., the particular DNA sequence of the expression construct does not occur naturally in the host cell and must have been introduced into the host cell or an ancestor of the host cell by a transformation event. The expression of the nucleotide sequence in the expression construct may be under the control of a constitutive promoter or of an inducible promoter, which initiates transcription only when the host cell is exposed to some particular external stimulus. In the case of a plant, the promoter can also be specific to a particular tissue or organ or stage of development.


Expression pattern or expression specificity of a regulatory nucleic acid molecule as used herein defines the tissue and/or developmental and/or environmentally modulated expression of a coding sequence or RNA under the control of a distinct regulatory nucleic acid molecule.


Foreign: The term “foreign” refers to any nucleic acid molecule (e.g., gene sequence) which is introduced into the genome of a cell by experimental manipulations and may include sequences found in that cell so long as the introduced sequence contains some modification (e.g., a point mutation, the presence of a selectable marker gene, etc.) and is therefore distinct relative to the naturally-occurring sequence.


Functional linkage: The term “functional linkage” or “functionally linked” is to be understood as meaning, for example, the sequential arrangement of a regulatory element (e.g., a promoter) with a nucleic acid sequence to be expressed and, if appropriate, further regulatory elements (such as a terminator or an enhancer) in such a way that each of the regulatory elements can fulfill its intended function to allow, modify, facilitate or otherwise influence expression of said nucleic acid sequence. As a synonym, the wording “operable linkage” or “operably linked” may be used. Depending on the arrangement of the nucleic acid sequences, expression may result in sense or antisense RNA. To this end, direct linkage in the chemical sense is not necessarily required. Genetic control sequences such as, for example, enhancer sequences can also exert their function on the target sequence from positions which are further away, or indeed from other DNA molecules. Preferred arrangements are those in which the nucleic acid sequence to be expressed recombinantly is positioned downstream of the sequence acting as promoter, so that the two sequences are linked covalently to each other. The distance between the promoter sequence and the nucleic acid sequence to be expressed recombinantly is preferably less than 200 base pairs, especially preferably less than 100 base pairs, very especially preferably less than 50 base pairs. In a preferred embodiment, the nucleic acid sequence to be transcribed is located downstream of the promoter in such a way that the transcription start is identical with the desired beginning of the chimeric RNA of the invention. Functional linkage, and an expression construct, can be generated by means of customary recombination and cloning techniques as described (e.g., in Maniatis T, Fritsch E F and Sambrook J (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor (NY); Silhavy et al. (1984) Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor (NY); Ausubel et al. (1987) Current Protocols in Molecular Biology, Greene Publishing Assoc. and Wiley Interscience; Gelvin et al. (Eds) (1990) Plant Molecular Biology Manual, Kluwer Academic Publisher, Dordrecht, The Netherlands). However, further sequences which, for example, act as a linker with specific cleavage sites for restriction enzymes, or as a signal peptide, may also be positioned between the two sequences. The insertion of sequences may also lead to the expression of fusion proteins. Preferably, the expression construct, consisting of a linkage of a regulatory region (for example a promoter) and the nucleic acid sequence to be expressed, can exist in a vector-integrated form and be inserted into a plant genome, for example by transformation.
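The preferred promoter-to-coding-sequence distances (less than 200, 100 or 50 bp) amount to a simple spacer measurement. The Python sketch below assumes the promoter and coding sequences occur verbatim in the construct and uses the first occurrence of each. The helper names are hypothetical.

```python
from typing import Optional


def spacer_length(construct: str, promoter: str, coding: str) -> int:
    """Base pairs between the 3' end of the promoter and the start of the coding sequence."""
    p = construct.find(promoter)
    if p < 0:
        raise ValueError("promoter not found in construct")
    promoter_end = p + len(promoter)
    c = construct.find(coding, promoter_end)
    if c < 0:
        raise ValueError("coding sequence not found downstream of the promoter")
    return c - promoter_end


def preference_tier(spacer_bp: int) -> Optional[int]:
    """Tightest preferred upper limit (50, 100 or 200 bp) the spacer satisfies, else None."""
    for limit in (50, 100, 200):
        if spacer_bp < limit:
            return limit
    return None
```

A 30 bp spacer falls into the "very especially preferred" tier (returns 50). A 250 bp spacer returns `None`, although such a linkage can still be functional.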


Gene: The term “gene” refers to a region operably joined to appropriate regulatory sequences capable of regulating the expression of the gene product (e.g., a polypeptide or a functional RNA) in some manner. A gene includes untranslated regulatory regions of DNA (e.g., promoters, enhancers, repressors, etc.) preceding (upstream) and following (downstream) the coding region (open reading frame, ORF) as well as, where applicable, intervening sequences (i.e., introns) between individual coding regions (i.e., exons). The term “structural gene” as used herein is intended to mean a DNA sequence that is transcribed into mRNA, which is then translated into a sequence of amino acids characteristic of a specific polypeptide.


Genome and genomic DNA: The terms “genome” or “genomic DNA” refer to the heritable genetic information of a host organism. Said genomic DNA comprises the DNA of the nucleus (also referred to as chromosomal DNA) but also the DNA of the plastids (e.g., chloroplasts) and other cellular organelles (e.g., mitochondria). Preferably, the terms genome or genomic DNA refer to the chromosomal DNA of the nucleus.


Heterologous: The term “heterologous” with respect to a nucleic acid molecule or DNA refers to a nucleic acid molecule which is operably linked to, or is manipulated to become operably linked to, a second nucleic acid molecule to which it is not operably linked in nature, or to which it is operably linked at a different location in nature. A heterologous expression construct comprising a nucleic acid molecule and one or more regulatory nucleic acid molecules (such as a promoter or a transcription termination signal) linked thereto is, for example, a construct originating from experimental manipulations in which either a) said nucleic acid molecule, or b) said regulatory nucleic acid molecule, or c) both (i.e., (a) and (b)) is not located in its natural (native) genetic environment or has been modified by experimental manipulations, an example of a modification being a substitution, addition, deletion, inversion or insertion of one or more nucleotide residues. Natural genetic environment refers to the natural chromosomal locus in the organism of origin, or to the presence in a genomic library. In the case of a genomic library, the natural genetic environment of the sequence of the nucleic acid molecule is preferably retained, at least in part. The environment flanks the nucleic acid sequence at least on one side and has a sequence of at least 50 bp, preferably at least 500 bp, especially preferably at least 1,000 bp, very especially preferably at least 5,000 bp, in length. A naturally occurring expression construct (for example the naturally occurring combination of a promoter with the corresponding gene) becomes a transgenic expression construct when it is modified by non-natural, synthetic “artificial” methods such as, for example, mutagenization. Such methods have been described (U.S. Pat. No. 5,565,350; WO 00/15815). For example, a protein-encoding nucleic acid molecule operably linked to a promoter which is not the native promoter of this molecule is considered to be heterologous with respect to the promoter. Preferably, heterologous DNA is not endogenous to or not naturally associated with the cell into which it is introduced, but has been obtained from another cell or has been synthesized. Heterologous DNA also includes an endogenous DNA sequence that contains some modification, non-naturally occurring multiple copies of an endogenous DNA sequence, or a DNA sequence which is not naturally associated with another DNA sequence physically linked thereto. Generally, although not necessarily, heterologous DNA encodes RNA or proteins that are not normally produced by the cell in which it is expressed.


Hybridization: The term “hybridization” as used herein includes “any process by which a strand of nucleic acid molecule joins with a complementary strand through base pairing” (J. Coombs (1994) Dictionary of Biotechnology, Stockton Press, New York). Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acid molecules) are affected by such factors as the degree of complementarity between the nucleic acid molecules, the stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acid molecules. As used herein, the term “Tm” refers to the “melting temperature”, the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. The equation for calculating the Tm of nucleic acid molecules is well known in the art. As indicated by standard references, a simple estimate of the Tm value may be calculated by the equation Tm = 81.5 + 0.41 × (% G+C) when a nucleic acid molecule is in aqueous solution at 1 M NaCl [see e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)]. Other references include more sophisticated computations, which take structural as well as sequence characteristics into account for the calculation of Tm. Stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6.
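The simple Tm estimate above depends only on the G+C percentage. The Python sketch below implements that formula as stated, for the 1 M NaCl case. It deliberately omits the salt, length and structure corrections that the more sophisticated computations add. The function name is illustrative.

```python
def tm_simple(seq: str) -> float:
    """Rough Tm estimate (deg C): 81.5 + 0.41 * (%G+C), for a nucleic acid in 1 M NaCl."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return 81.5 + 0.41 * gc_percent
```

A sequence with 50% G+C gives 81.5 + 0.41 × 50 = 102.0 °C. Such values well above 100 °C show that this is only a rough comparative estimate, not a literal melting point.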


Medium stringency conditions, when used in reference to nucleic acid hybridization, comprise conditions equivalent to binding or hybridization at 68° C. in a solution consisting of 5×SSPE (43.8 g/L NaCl, 6.9 g/L NaH2PO4·H2O and 1.85 g/L EDTA, pH adjusted to 7.4 with NaOH), 1% SDS, 5× Denhardt's reagent [50× Denhardt's contains the following per 500 mL: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 μg/mL denatured salmon sperm DNA, followed by washing (preferably once for 15 minutes, more preferably twice for 15 minutes, even more preferably three times for 15 minutes) in a solution comprising 1×SSC (1×SSC is 0.15 M NaCl plus 0.015 M sodium citrate) and 0.1% SDS at room temperature or, preferably, at 37° C., when a DNA probe of preferably about 100 to about 500 nucleotides in length is employed.


High stringency conditions, when used in reference to nucleic acid hybridization, comprise conditions equivalent to binding or hybridization at 68° C. in a solution consisting of 5×SSPE (43.8 g/L NaCl, 6.9 g/L NaH2PO4·H2O and 1.85 g/L EDTA, pH adjusted to 7.4 with NaOH), 1% SDS, 5× Denhardt's reagent [50× Denhardt's contains the following per 500 mL: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Fraction V; Sigma)] and 100 μg/mL denatured salmon sperm DNA, followed by washing (preferably once for 15 minutes, more preferably twice for 15 minutes, even more preferably three times for 15 minutes) in a solution comprising 0.1×SSC (1×SSC is 0.15 M NaCl plus 0.015 M sodium citrate) and 1% SDS at room temperature or, preferably, at 37° C., when a DNA probe of preferably about 100 to about 500 nucleotides in length is employed.


Very high stringency conditions, when used in reference to nucleic acid hybridization, comprise conditions equivalent to binding or hybridization at 68° C. in a solution consisting of 5×SSPE, 1% SDS, 5× Denhardt's reagent and 100 μg/mL denatured salmon sperm DNA, followed by washing (preferably once for 15 minutes, more preferably twice for 15 minutes, even more preferably three times for 15 minutes) in a solution comprising 0.1×SSC and 1% SDS at 68° C., when a probe of preferably about 100 to about 500 nucleotides in length is employed.
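The three definitions share the same hybridization step and differ mainly in the wash: salt concentration, SDS and temperature. The Python sketch below tabulates the preferred wash conditions. It computes the Na+ concentration of each wash on the standard assumption that the sodium citrate in SSC is trisodium citrate, so 1×SSC carries 0.15 + 3 × 0.015 = 0.195 M Na+. It then orders the washes with a simple heuristic: lower Na+ first, then higher temperature, means more stringent. The `WashCondition` type and the ordering rule are illustrative, not part of the definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WashCondition:
    name: str
    ssc_fold: float     # concentration of SSC in the wash, in "x SSC"
    sds_percent: float
    temp_c: float       # preferred wash temperature

    def sodium_molar(self) -> float:
        # Assumes 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate -> 0.195 M Na+.
        return self.ssc_fold * (0.15 + 3 * 0.015)


WASHES = [
    WashCondition("medium", 1.0, 0.1, 37.0),
    WashCondition("high", 0.1, 1.0, 37.0),
    WashCondition("very high", 0.1, 1.0, 68.0),
]


def by_stringency(washes):
    """Least to most stringent: lower Na+ ranks higher, then higher temperature (heuristic)."""
    return sorted(washes, key=lambda w: (-w.sodium_molar(), w.temp_c))
```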


“Identity”: “Identity” when used in respect to the comparison of two or more nucleic acid or amino acid molecules means that the sequences of said molecules share a certain degree of sequence similarity, the sequences being partially identical.


To determine the percentage identity (homology is herein used interchangeably) of two amino acid sequences or of two nucleic acid molecules, the sequences are written one underneath the other for an optimal comparison (for example gaps may be inserted into the sequence of a protein or of a nucleic acid in order to generate an optimal alignment with the other protein or the other nucleic acid).


The amino acid residues or nucleotides at the corresponding amino acid positions or nucleotide positions are then compared. If a position in one sequence is occupied by the same amino acid residue or the same nucleotide as the corresponding position in the other sequence, the molecules are homologous at this position (i.e., amino acid or nucleic acid “homology” as used in the present context corresponds to amino acid or nucleic acid “identity”). The percentage identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % homology = number of identical positions / total number of positions × 100). The terms “homology” and “identity” are thus to be considered as synonyms.
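Given two sequences already written one underneath the other, the percentage identity is a direct count. The Python sketch below assumes gaps are written as `-` and that gap columns count toward the total number of positions but never as identities. Whether gap columns are counted differs between programs, so this choice is an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """% identity = identical positions / total aligned positions * 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)
```

For `ACGT-A` against `ACGTTA`, five of six columns are identical, giving about 83.3%.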


Several computer programs have been developed for determining the percentage identity of two or more amino acid sequences or of two or more nucleotide sequences. The identity of two or more sequences can be calculated with, for example, the software fasta, which has presently been used in the version fasta 3 (W. R. Pearson and D. J. Lipman, PNAS 85, 2444 (1988); W. R. Pearson, Methods in Enzymology 183, 63 (1990)). Another useful program for calculating the identity of different sequences is the standard blast program, which is included in the Biomax pedant software (Biomax, Munich, Federal Republic of Germany). Unfortunately, this sometimes leads to suboptimal results, since blast does not always include the complete sequences of the subject and the query. Nevertheless, as this program is very efficient, it can be used for the comparison of a huge number of sequences. The following settings are typically used for such comparisons of sequences:


-p Program Name [String]
-d Database [String]; default=nr
-i Query File [File In]; default=stdin
-e Expectation value (E) [Real]; default=10.0
-m alignment view options: 0=pairwise; 1=query-anchored showing identities; 2=query-anchored no identities; 3=flat query-anchored, show identities; 4=flat query-anchored, no identities; 5=query-anchored no identities and blunt ends; 6=flat query-anchored, no identities and blunt ends; 7=XML Blast output; 8=tabular; 9=tabular with comment lines [Integer]; default=0
-o BLAST report Output File [File Out] Optional; default=stdout
-F Filter query sequence (DUST with blastn, SEG with others) [String]; default=T
-G Cost to open a gap (zero invokes default behavior) [Integer]; default=0
-E Cost to extend a gap (zero invokes default behavior) [Integer]; default=0
-X X dropoff value for gapped alignment (in bits) (zero invokes default behavior); blastn 30, megablast 20, tblastx 0, all others 15 [Integer]; default=0
-I Show GI's in deflines [T/F]; default=F
-q Penalty for a nucleotide mismatch (blastn only) [Integer]; default=-3
-r Reward for a nucleotide match (blastn only) [Integer]; default=1
-v Number of database sequences to show one-line descriptions for (V) [Integer]; default=500
-b Number of database sequences to show alignments for (B) [Integer]; default=250
-f Threshold for extending hits, default if zero; blastp 11, blastn 0, blastx 12, tblastn 13, tblastx 13, megablast 0 [Integer]; default=0
-g Perform gapped alignment (not available with tblastx) [T/F]; default=T
-Q Query Genetic code to use [Integer]; default=1
-D DB Genetic code (for tblast[nx] only) [Integer]; default=1
-a Number of processors to use [Integer]; default=1
-O SeqAlign file [File Out] Optional
-J Believe the query defline [T/F]; default=F
-M Matrix [String]; default=BLOSUM62
-W Word size, default if zero (blastn 11, megablast 28, all others 3) [Integer]; default=0
-z Effective length of the database (use zero for the real size) [Real]; default=0
-K Number of best hits from a region to keep (off by default, if used a value of 100 is recommended) [Integer]; default=0
-P 0 for multiple hit, 1 for single hit [Integer]; default=0
-Y Effective length of the search space (use zero for the real size) [Real]; default=0
-S Query strands to search against database (for blast[nx], and tblastx); 3 is both, 1 is top, 2 is bottom [Integer]; default=3
-T Produce HTML output [T/F]; default=F
-l Restrict search of database to list of GI's [String] Optional
-U Use lower case filtering of FASTA sequence [T/F] Optional; default=F
-y X dropoff value for ungapped extensions in bits (0.0 invokes default behavior); blastn 20, megablast 10, all others 7 [Real]; default=0.0
-Z X dropoff value for final gapped alignment in bits (0.0 invokes default behavior); blastn/megablast 50, tblastx 0, all others 25 [Integer]; default=0
-R PSI-TBLASTN checkpoint file [File In] Optional
-n MegaBlast search [T/F]; default=F
-L Location on query sequence [String] Optional
-A Multiple Hits window size, default if zero (blastn/megablast 0, all others 40) [Integer]; default=0
-w Frame shift penalty (OOF algorithm for blastx) [Integer]; default=0
-t Length of the largest intron allowed in tblastn for linking HSPs (0 disables linking) [Integer]; default=0


Results of high quality are obtained with the algorithm of Needleman and Wunsch or that of Smith and Waterman; programs based on these algorithms are therefore preferred. Advantageously, sequence comparisons can be done with the program PileUp (J. Mol. Evolution 25, 351 (1987); Higgins et al., CABIOS 5, 151 (1989)) or, preferably, with the programs “Gap” and “Needle”, which are both based on the algorithm of Needleman and Wunsch (J. Mol. Biol. 48, 443 (1970)), and “BestFit”, which is based on the algorithm of Smith and Waterman (Adv. Appl. Math. 2, 482 (1981)). “Gap” and “BestFit” are part of the GCG software package (Genetics Computer Group, 575 Science Drive, Madison, Wis., USA 53711 (1991); Altschul et al., Nucleic Acids Res. 25, 3389 (1997)); “Needle” is part of The European Molecular Biology Open Software Suite (EMBOSS) (Trends in Genetics 16 (6), 276 (2000)). Therefore, the calculations to determine the percentages of sequence identity are preferably done with the programs “Gap” or “Needle” over the whole length of the sequences. The following standard settings were used for the comparison of nucleic acid sequences with “Needle”: matrix: EDNAFULL, Gap_penalty: 10.0, Extend_penalty: 0.5. The following standard settings were used for the comparison of nucleic acid sequences with “Gap”: gap weight: 50, length weight: 3, average match: 10.000, average mismatch: 0.000.
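To show what a Needleman-Wunsch global alignment with the “Needle” settings does, the Python sketch below implements the affine-gap variant (Gotoh's three-matrix formulation). It is not EMBOSS code, and several details are assumptions. Only plain A/C/G/T are scored, using the EDNAFULL values for them (+5 identical, -4 otherwise). A gap of length k is taken to cost 10.0 + (k - 1) × 0.5. End gaps are penalized like internal ones, whereas needle does not penalize end gaps by default, so scores can differ from EMBOSS output.

```python
import math

# Assumed scoring: EDNAFULL values for plain A/C/G/T; IUPAC codes are not handled.
MATCH, MISMATCH = 5.0, -4.0
GAP_OPEN, GAP_EXTEND = 10.0, 0.5


def _argmax(candidates: dict) -> str:
    """Key of the highest-scoring candidate (first one wins ties)."""
    return max(candidates, key=candidates.get)


def needle_like(a: str, b: str, gap_open: float = GAP_OPEN, gap_extend: float = GAP_EXTEND):
    """Global alignment, affine gaps: a gap of length k costs open + (k - 1) * extend.

    End gaps are penalized like internal gaps. Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    neg = -math.inf
    # M: a[i-1] paired with b[j-1]; X: a[i-1] opposite a gap; Y: b[j-1] opposite a gap.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)

    # Trace back from the best of the three end states.
    i, j = n, m
    state = _argmax({"M": M[i][j], "X": X[i][j], "Y": Y[i][j]})
    best = max(M[i][j], X[i][j], Y[i][j])
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if state == "M":
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            state = _argmax({"M": M[i - 1][j - 1], "X": X[i - 1][j - 1], "Y": Y[i - 1][j - 1]})
            i, j = i - 1, j - 1
        elif state == "X":
            out_a.append(a[i - 1])
            out_b.append("-")
            state = _argmax({"M": M[i - 1][j] - gap_open, "X": X[i - 1][j] - gap_extend,
                             "Y": Y[i - 1][j] - gap_open})
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            state = _argmax({"M": M[i][j - 1] - gap_open, "Y": Y[i][j - 1] - gap_extend,
                             "X": X[i][j - 1] - gap_open})
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Aligning `ACGTACGT` with `ACGTCGT` yields seven matches and one single-base gap, for a score of 7 × 5 - 10 = 25.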


For example, a sequence which is said to have 80% identity with sequence SEQ ID NO: 1 at the nucleic acid level is understood as meaning a sequence which, upon comparison with the sequence represented by SEQ ID NO: 1 by the above program “Needle” with the above parameter set, has 80% identity. The identity is calculated on the complete length of the query sequence, for example SEQ ID NO: 1.
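Calculating identity “on the complete length of the query sequence” changes the denominator. Identical positions are divided by the ungapped length of the query rather than by the number of alignment columns. The Python sketch below assumes a global alignment that covers the whole query, with gaps written as `-`. The function name is hypothetical.

```python
def identity_over_query(aligned_query: str, aligned_subject: str) -> float:
    """% identity = identical positions / full (ungapped) query length * 100."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    query_length = sum(1 for c in aligned_query if c != "-")
    if query_length == 0:
        raise ValueError("query contains no residues")
    identical = sum(1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-")
    return 100.0 * identical / query_length
```

With the query `ACGTACGT` aligned to `ACGT-CGT`, 7 of the 8 query bases are matched, giving 87.5%.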


Isogenic: organisms (e.g., plants), which are genetically identical, except that they may differ by the presence or absence of a heterologous DNA sequence.


Isolated: The term “isolated” as used herein means that a material has been removed by the hand of man and exists apart from its original, native environment and is therefore not a product of nature. An isolated material or molecule (such as a DNA molecule or enzyme) may exist in a purified form or may exist in a non-native environment such as, for example, in a transgenic host cell. For example, a naturally occurring polynucleotide or polypeptide present in a living plant is not isolated, but the same polynucleotide or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotides can be part of a vector and/or such polynucleotides or polypeptides could be part of a composition, and would be isolated in that such a vector or composition is not part of its original environment. Preferably, the term “isolated” when used in relation to a nucleic acid molecule, as in “an isolated nucleic acid sequence”, refers to a nucleic acid sequence that is identified and separated from at least one contaminant nucleic acid molecule with which it is ordinarily associated in its natural source. An isolated nucleic acid molecule is a nucleic acid molecule present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acid molecules are nucleic acid molecules, such as DNA and RNA, which are found in the state in which they exist in nature. For example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs which encode a multitude of proteins. However, an isolated nucleic acid sequence comprising, for example, SEQ ID NO: 1 includes, by way of example, such nucleic acid sequences in cells which ordinarily contain SEQ ID NO: 1 where the nucleic acid sequence is in a chromosomal or extrachromosomal location different from that of natural cells, or is otherwise flanked by a different nucleic acid sequence than that found in nature. The isolated nucleic acid sequence may be present in single-stranded or double-stranded form. When an isolated nucleic acid sequence is to be utilized to express a protein, the nucleic acid sequence will contain at a minimum a portion of the sense or coding strand (i.e., the nucleic acid sequence may be single-stranded). Alternatively, it may contain both the sense and antisense strands (i.e., the nucleic acid sequence may be double-stranded).


Minimal Promoter: promoter elements, particularly a TATA element, that are inactive or that have greatly reduced promoter activity in the absence of upstream activation. In the presence of a suitable transcription factor, the minimal promoter functions to permit transcription.


Naturally occurring as used herein means a cell or molecule, for example a plant cell or nucleic acid molecule that occurs in a plant or organism which is not manipulated by man, hence which is for example neither mutated nor genetically engineered by man.


Non-coding: The term “non-coding” refers to sequences of nucleic acid molecules that do not encode part or all of an expressed protein. Non-coding sequences include but are not limited to introns, enhancers, promoter regions, 3′ untranslated regions, and 5′ untranslated regions.


Nucleic acids and nucleotides: The terms “nucleic acids” and “nucleotides” refer to naturally occurring or synthetic or artificial nucleic acids or nucleotides. The terms “nucleic acids” and “nucleotides” comprise deoxyribonucleotides or ribonucleotides or any nucleotide analogue and polymers or hybrids thereof in either single- or double-stranded, sense or antisense form. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. The term “nucleic acid” is used interchangeably herein with “gene”, “cDNA”, “mRNA”, “oligonucleotide” and “polynucleotide”. Nucleotide analogues include nucleotides having modifications in the chemical structure of the base, sugar and/or phosphate, including, but not limited to, 5-position pyrimidine modifications, 8-position purine modifications, modifications at cytosine exocyclic amines, substitution of 5-bromo-uracil, and the like; and 2′-position sugar modifications, including but not limited to, sugar-modified ribonucleotides in which the 2′-OH is replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, NR2, or CN. Short hairpin RNAs (shRNAs) can also comprise non-natural elements such as non-natural bases, e.g., inosine and xanthine, non-natural sugars, e.g., 2′-methoxy ribose, or non-natural phosphodiester linkages, e.g., methylphosphonates, phosphorothioates and peptides.


Nucleic acid sequence: The phrase “nucleic acid sequence” refers to a single- or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases read from the 5′- to the 3′-end. It includes chromosomal DNA, self-replicating plasmids, infectious polymers of DNA or RNA and DNA or RNA that performs a primarily structural role. “Nucleic acid sequence” also refers to a consecutive list of abbreviations, letters, characters or words, which represent nucleotides. In one embodiment, a nucleic acid can be a “probe”, which is a relatively short nucleic acid, usually less than 100 nucleotides in length. Often a nucleic acid probe is from about 10 to about 50 nucleotides in length. A “target region” of a nucleic acid is a portion of a nucleic acid that is identified to be of interest. A “coding region” of a nucleic acid is the portion of the nucleic acid which is transcribed and translated in a sequence-specific manner to produce a particular polypeptide or protein when placed under the control of appropriate regulatory sequences. The coding region is said to encode such a polypeptide or protein.


Oligonucleotide: The term “oligonucleotide” refers to an oligomer or polymer of ribonucleic acid (RNA) or deoxyribonucleic acid (DNA) or mimetics thereof, as well as oligonucleotides having non-naturally-occurring portions which function similarly. Such modified or substituted oligonucleotides are often preferred over native forms because of desirable properties such as, for example, enhanced cellular uptake, enhanced affinity for nucleic acid target and increased stability in the presence of nucleases. An oligonucleotide preferably includes two or more nucleomonomers covalently coupled to each other by linkages (e.g., phosphodiesters) or substitute linkages.


Overhang: An “overhang” is a relatively short single-stranded nucleotide sequence at the 5′ or 3′ end of a double-stranded oligonucleotide molecule (also referred to as an “extension,” “protruding end,” or “sticky end”).
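Restriction digestion is a common source of overhangs. The Python sketch below infers the overhang left by cutting a palindromic recognition site. It relies on one fact: for a palindrome, a top-strand cut after position k implies a bottom-strand cut at len(site) - k in top-strand coordinates. Non-palindromic sites are rejected. The enzymes in the usage note (EcoRI, PstI, SmaI) are standard examples that do not come from the source. The function name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sticky_end(site: str, top_cut: int) -> tuple:
    """Classify the end produced by cutting a palindromic site after `top_cut` top-strand bases."""
    site = site.upper()
    if site != site.translate(COMPLEMENT)[::-1]:
        raise ValueError("sketch handles palindromic sites only")
    if not 0 <= top_cut <= len(site):
        raise ValueError("cut position outside the site")
    bottom_cut = len(site) - top_cut  # bottom-strand cut, in top-strand coordinates
    if top_cut < bottom_cut:
        return "5' overhang", site[top_cut:bottom_cut]
    if top_cut > bottom_cut:
        return "3' overhang", site[bottom_cut:top_cut]
    return "blunt", ""
```

EcoRI (G^AATTC) gives a 5′ AATT overhang, PstI (CTGCA^G) a 3′ TGCA overhang, and SmaI (CCC^GGG) blunt ends.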


Overlapping specificity: The term “overlapping specificity”, when used herein in relation to the expression specificity of two or more promoters, means that the expression regulated by these promoters occurs partly in the same plant tissues, developmental stages or conditions. For example, a promoter active in leaves and a promoter active in roots and leaves overlap in expression specificity in the leaves of a plant.
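If each expression specificity is recorded as a set of tissues, developmental stages or conditions, overlap reduces to a non-empty set intersection. The Python sketch below assumes such sets are available. The labels are illustrative.

```python
def overlapping_specificity(spec_a: set, spec_b: set) -> tuple:
    """Return (overlap?, shared tissues/stages/conditions) for two promoter specificities."""
    shared = set(spec_a) & set(spec_b)
    return bool(shared), shared
```

For the leaf versus root-and-leaf example, the shared set is `{"leaf"}`.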


Plant: “Plant” is generally understood as meaning any eukaryotic single- or multi-celled organism, or a cell, tissue, organ, part or propagation material (such as seeds or fruit) of same, which is capable of photosynthesis. Included for the purpose of the invention are all genera and species of higher and lower plants of the Plant Kingdom. Annual, perennial, monocotyledonous and dicotyledonous plants are preferred. The term includes the mature plants, seed, shoots and seedlings and their derived parts, propagation material (such as seeds or microspores), plant organs, tissue, protoplasts, callus and other cultures, for example cell cultures, and any other type of plant cell grouping to give functional or structural units. Mature plants refer to plants at any desired developmental stage beyond that of the seedling. Seedling refers to a young immature plant at an early developmental stage. Annual, biennial, monocotyledonous and dicotyledonous plants are preferred host organisms for the generation of transgenic plants. The expression of genes is furthermore advantageous in all ornamental plants, useful or ornamental trees, flowers, cut flowers, shrubs or lawns. Plants which may be mentioned by way of example but not by limitation are angiosperms; bryophytes such as, for example, Hepaticae (liverworts) and Musci (mosses); Pteridophytes such as ferns, horsetail and club mosses; gymnosperms such as conifers, cycads, ginkgo and Gnetatae; and algae such as Chlorophyceae, Phaeophyceae, Rhodophyceae, Myxophyceae, Xanthophyceae, Bacillariophyceae (diatoms), and Euglenophyceae.
Preferred are plants which are used for food or feed purposes, such as the family of the Leguminosae such as pea, alfalfa and soya; Gramineae such as rice, maize, wheat, barley, sorghum, millet, rye, triticale, or oats; the family of the Umbelliferae, especially the genus Daucus, very especially the species carota (carrot), and the genus Apium, very especially the species graveolens dulce (celery) and many others; the family of the Solanaceae, especially the genus Lycopersicon, very especially the species esculentum (tomato), and the genus Solanum, very especially the species tuberosum (potato) and melongena (eggplant), and many others (such as tobacco); and the genus Capsicum, very especially the species annuum (peppers) and many others; the family of the Leguminosae, especially the genus Glycine, very especially the species max (soybean), alfalfa, pea, lucerne, beans or peanut and many others; the family of the Cruciferae (Brassicaceae), especially the genus Brassica, very especially the species napus (oil seed rape), campestris (beet), oleracea cv Tastie (cabbage), oleracea cv Snowball Y (cauliflower) and oleracea cv Emperor (broccoli), and the genus Arabidopsis, very especially the species thaliana and many others; the family of the Compositae, especially the genus Lactuca, very especially the species sativa (lettuce) and many others; the family of the Asteraceae such as sunflower, Tagetes, lettuce or Calendula and many others; the family of the Cucurbitaceae such as melon, pumpkin/squash or zucchini; and linseed. Further preferred are cotton, sugar cane, hemp, flax, chillies, and the various tree, nut and wine species.


Polypeptide: The terms “polypeptide”, “peptide”, “oligopeptide”, “gene product”, “expression product” and “protein” are used interchangeably herein to refer to a polymer or oligomer of consecutive amino acid residues.


Pre-protein: A protein that is normally targeted to a cellular organelle, such as a chloroplast, and that still comprises its transit peptide.


Primary transcript: The term “primary transcript” as used herein refers to a premature RNA transcript of a gene. A “primary transcript”, for example, still comprises introns and/or does not yet comprise a poly(A) tail or a cap structure and/or lacks other modifications necessary for its correct function as a transcript, such as trimming or editing.


Promoter: The terms “promoter” and “promoter sequence” are equivalent and, as used herein, refer to a DNA sequence which, when ligated to a nucleotide sequence of interest, is capable of controlling the transcription of the nucleotide sequence of interest into RNA. Such promoters can be found, for example, in the following public databases: www.grassius.org/grasspromdb.html, mendel.cs.rhul.ac.uk/mendel.php?topic=plantprom and ppdb.gene.nagoya-u.ac.jp/cgi-bin/index.cgi. Promoters listed there may be addressed with the methods of the invention and are hereby incorporated by reference. A promoter is located 5′ (i.e., upstream) of, and proximal to, the transcriptional start site of a nucleotide sequence of interest whose transcription into mRNA it controls, and provides a site for specific binding by RNA polymerase and other transcription factors for initiation of transcription. The promoter comprises, for example, the 10 kb, 5 kb or 2 kb proximal to the transcription start site. It may also comprise at least the 1500 bp proximal to the transcriptional start site, preferably at least the 1000 bp, more preferably at least the 500 bp, even more preferably at least the 400 bp, the 300 bp, the 200 bp or the 100 bp. In a further preferred embodiment, the promoter comprises at least the 50 bp proximal to the transcription start site, for example at least the 25 bp. The promoter does not comprise exon and/or intron regions or 5′ untranslated regions. The promoter may, for example, be heterologous or homologous to the respective plant. A polynucleotide sequence is “heterologous to” an organism or a second polynucleotide sequence if it originates from a foreign species or, if from the same species, is modified from its original form.
For example, a heterologous coding sequence operably linked to a promoter is a coding sequence from a species different from that from which the promoter was derived or, if from the same species, a coding sequence which is not naturally associated with the promoter (e.g., a genetically engineered coding sequence or an allele from a different ecotype or variety). Suitable promoters can be derived from genes of the host cells in which expression should occur or from pathogens of these host cells (e.g., plants or plant pathogens such as plant viruses). A plant-specific promoter is a promoter suitable for regulating expression in a plant. It may be derived from a plant, but also from plant pathogens, or it may be a synthetic promoter designed by man. If a promoter is an inducible promoter, the rate of transcription increases in response to an inducing agent. The promoter may also be regulated in a tissue-specific or tissue-preferred manner, such that it is only or predominantly active in transcribing the associated coding region in specific tissue types such as leaves, roots or meristem. The term “tissue specific” as it applies to a promoter refers to a promoter that is capable of directing selective expression of a nucleotide sequence of interest to a specific type of tissue (e.g., petals) in the relative absence of expression of the same nucleotide sequence of interest in a different type of tissue (e.g., roots). Tissue specificity of a promoter may be evaluated by, for example, operably linking a reporter gene to the promoter sequence to generate a reporter construct, introducing the reporter construct into the genome of a plant such that the reporter construct is integrated into every tissue of the resulting transgenic plant, and detecting the expression of the reporter gene (e.g., detecting mRNA, protein, or the activity of a protein encoded by the reporter gene) in different tissues of the transgenic plant.
The detection of a greater level of expression of the reporter gene in one or more tissues relative to the level of expression of the reporter gene in other tissues shows that the promoter is specific for the tissues in which greater levels of expression are detected. The term “cell type specific” as applied to a promoter refers to a promoter that is capable of directing selective expression of a nucleotide sequence of interest in a specific type of cell in the relative absence of expression of the same nucleotide sequence of interest in a different type of cell within the same tissue. The term “cell type specific” when applied to a promoter also means a promoter capable of promoting selective expression of a nucleotide sequence of interest in a region within a single tissue. Cell type specificity of a promoter may be assessed using methods well known in the art, e.g., GUS activity staining, GFP detection or immunohistochemical staining. The term “constitutive” when made in reference to a promoter or the expression derived from a promoter means that the promoter is capable of directing transcription of an operably linked nucleic acid molecule in the absence of a stimulus (e.g., heat shock, chemicals, light) in the majority of plant tissues and cells throughout substantially the entire lifespan of a plant or part of a plant. Typically, constitutive promoters are capable of directing expression of a transgene in substantially any cell and any tissue.
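The reporter-based comparison described above can be illustrated with a short sketch. It assumes that reporter expression has already been quantified per tissue (for example as relative GUS activity or reporter mRNA level); the function names, the example values and the two-fold cut-off are hypothetical choices for illustration and are not taken from this specification.

```python
from typing import Dict, List


def specificity_ratios(expression: Dict[str, float]) -> Dict[str, float]:
    """Ratio of each tissue's reporter level to the mean level of all other tissues."""
    if len(expression) < 2:
        raise ValueError("at least two tissues are needed for a comparison")
    ratios = {}
    for tissue, level in expression.items():
        others = [v for t, v in expression.items() if t != tissue]
        background = sum(others) / len(others)
        if background > 0:
            ratios[tissue] = level / background
        else:
            # No detectable expression elsewhere: any signal here is fully specific.
            ratios[tissue] = float("inf") if level > 0 else 0.0
    return ratios


def enriched_tissues(expression: Dict[str, float], fold: float = 2.0) -> List[str]:
    """Tissues whose reporter level is at least `fold` times the mean of the other tissues."""
    ratios = specificity_ratios(expression)
    return sorted(t for t, r in ratios.items() if r >= fold)


# Hypothetical reporter measurements (arbitrary units) from one transgenic line.
reporter = {"seed": 120.0, "leaf": 3.5, "root": 2.0, "flower": 6.0}
print(enriched_tissues(reporter))  # ['seed']
```

Comparing each tissue with the mean of the remaining tissues, rather than with the single highest other tissue, lets two or more tissues qualify at the same time, which matches the "one or more tissues" wording above.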


Promoter specificity: The term “specificity” when referring to a promoter means the pattern of expression conferred by the respective promoter. The specificity describes the tissues and/or developmental status of a plant or part thereof in which the promoter confers expression of the nucleic acid molecule under its control. Specificity of a promoter may also comprise the environmental conditions under which the promoter may be activated or down-regulated, such as induction or repression by biological or environmental stresses such as cold, drought, wounding or infection.


Purified: As used herein, the term “purified” refers to molecules, either nucleic or amino acid sequences that are removed from their natural environment, isolated or separated. “Substantially purified” molecules are at least 60% free, preferably at least 75% free, and more preferably at least 90% free from other components with which they are naturally associated. A purified nucleic acid sequence may be an isolated nucleic acid sequence.


Recombinant: The term “recombinant” with respect to nucleic acid molecules refers to nucleic acid molecules produced by recombinant DNA techniques. Recombinant nucleic acid molecules as such do not exist in nature but are modified, changed, mutated or otherwise manipulated by man. A “recombinant nucleic acid molecule” is a non-naturally occurring nucleic acid molecule that differs in sequence from a naturally occurring nucleic acid molecule by at least one nucleotide. The term “recombinant nucleic acid molecule” may also comprise a “recombinant construct” which comprises, preferably operably linked, a sequence of nucleic acid molecules which do not naturally occur in that order, wherein each of the nucleic acid molecules may or may not be a recombinant nucleic acid molecule. Preferred methods for producing said recombinant nucleic acid molecule may comprise cloning techniques, directed or non-directed mutagenesis, synthesis or recombination techniques.


Sense: The term “sense” is understood to mean a nucleic acid molecule having a sequence which is complementary or identical to a target sequence, for example a sequence which binds to a protein transcription factor and which is involved in the expression of a given gene. According to a preferred embodiment, the nucleic acid molecule comprises a gene of interest and elements allowing the expression of the said gene of interest.


Starting sequence: The term “starting sequence” when used herein defines the sequence of a promoter of a defined specificity which is used as a reference sequence for analyzing the presence of motifs. The starting sequence is the reference for defining the degree of identity to the sequences of the promoters of the invention. The starting sequence could be any wild-type, naturally occurring promoter sequence or any artificial promoter sequence. The sequence of a synthetic promoter produced with the method of the invention may also be used as a starting sequence.


Substantially complementary: In its broadest sense, the term “substantially complementary”, when used herein with respect to a nucleotide sequence in relation to a reference or target nucleotide sequence, means a nucleotide sequence having a percentage of identity between the substantially complementary nucleotide sequence and the exact complementary sequence of said reference or target nucleotide sequence of at least 60%, more desirably at least 70%, more desirably at least 80% or 85%, preferably at least 90%, more preferably at least 93%, still more preferably at least 95% or 96%, yet still more preferably at least 97% or 98%, yet still more preferably at least 99% or most preferably 100% (the latter being equivalent to the term “identical” in this context). Preferably, identity is assessed over a length of at least 19 nucleotides, preferably at least 50 nucleotides, more preferably the entire length of the nucleic acid sequence relative to said reference sequence (if not specified otherwise below). Sequence comparisons are carried out using default GAP analysis with the University of Wisconsin GCG SEQWEB application of GAP, based on the algorithm of Needleman and Wunsch (Needleman and Wunsch (1970) J. Mol. Biol. 48: 443-453; as defined above). A nucleotide sequence “substantially complementary” to a reference nucleotide sequence hybridizes to the reference nucleotide sequence under low stringency conditions, preferably medium stringency conditions, most preferably high stringency conditions (as defined above).
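The percentage used in this definition can be illustrated with a simplified sketch. It assumes that the "exact complementary sequence" is the reverse complement read 5′→3′, and it compares two sequences of equal length position by position without gaps. This is a deliberate simplification: it is not the GCG GAP (Needleman–Wunsch) global alignment that the definition actually specifies, and the function names and the 20-nt reference are hypothetical.

```python
# Complement table for unambiguous DNA bases (upper- and lower-case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Exact complementary sequence, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def percent_complementarity(candidate: str, reference: str) -> float:
    """Identity between `candidate` and the exact complement of `reference`."""
    return percent_identity(candidate, reverse_complement(reference))


reference = "ATGGCTAGCTAGGATCCGTA"      # hypothetical 20-nt reference
perfect = reverse_complement(reference)  # 'TACGGATCCTAGCTAGCCAT'
one_mismatch = "A" + perfect[1:]         # first base changed T -> A
print(percent_complementarity(perfect, reference))       # 100.0
print(percent_complementarity(one_mismatch, reference))  # 95.0
```

For sequences that contain insertions or deletions relative to each other, a gapped global alignment such as GAP would be needed before counting identical positions.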


Transgene: The term “transgene” as used herein refers to any nucleic acid sequence, which is introduced into the genome of a cell by experimental manipulations. A transgene may be an “endogenous DNA sequence,” or a “heterologous DNA sequence” (i.e., “foreign DNA”). The term “endogenous DNA sequence” refers to a nucleotide sequence, which is naturally found in the cell into which it is introduced so long as it does not contain some modification (e.g., a point mutation, the presence of a selectable marker gene, etc.) relative to the naturally-occurring sequence.


Transgenic: The term “transgenic” when referring to an organism means transformed, preferably stably transformed, with a recombinant DNA molecule that preferably comprises a suitable promoter operatively linked to a DNA sequence of interest.


Vector: As used herein, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid molecule to which it has been linked. One type of vector is a genomic integrated vector, or “integrated vector”, which can become integrated into the chromosomal DNA of the host cell. Another type of vector is an episomal vector, i.e., a nucleic acid molecule capable of extra-chromosomal replication. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as “expression vectors”. In the present specification, “plasmid” and “vector” are used interchangeably unless otherwise clear from the context. Expression vectors designed to produce RNAs as described herein in vitro or in vivo may contain sequences recognized by any RNA polymerase, including mitochondrial RNA polymerase, RNA pol I, RNA pol II and RNA pol III. These vectors can be used to transcribe the desired RNA molecule in the cell according to this invention. A plant transformation vector is to be understood as a vector suitable for use in plant transformation.


Wild-type: The terms “wild-type”, “natural” and “natural origin” mean, with respect to an organism, polypeptide or nucleic acid sequence, that said organism is naturally occurring, or that said polypeptide or nucleic acid sequence is available in at least one naturally occurring organism, which is not changed, mutated or otherwise manipulated by man.


EXAMPLES
Chemicals and Common Methods

Unless indicated otherwise, cloning procedures carried out for the purposes of the present invention, including restriction digestion, agarose gel electrophoresis, purification of nucleic acids, ligation of nucleic acids, transformation, and selection and cultivation of bacterial cells, were performed as described (Sambrook et al., 1989). Sequence analyses of recombinant DNA were performed with a laser fluorescence DNA sequencer (Applied Biosystems, Foster City, Calif., USA) using the Sanger method (Sanger et al., 1977). Unless described otherwise, chemicals and reagents were obtained from Sigma Aldrich (St. Louis, USA), Promega (Madison, Wis., USA), Duchefa (Haarlem, The Netherlands) or Invitrogen (Carlsbad, Calif., USA). Restriction endonucleases were from New England Biolabs (Ipswich, Mass., USA) or Roche Diagnostics GmbH (Penzberg, Germany). Oligonucleotides were synthesized by Eurofins MWG Operon (Ebersberg, Germany).


Example 1
1.1 Directed Permutation of the Promoter Sequence

Using publicly available data, two promoters showing seed-specific expression in plants were selected for analyzing the effects of sequence permutation at periodic intervals throughout the full length of the promoter DNA sequence (WO2009016202, WO2009133145). The wild-type or starting sequences of the Phaseolus vulgaris p-PvARC5 promoter (SEQ ID NO 1; the prefix “p-” denotes promoter) and the Vicia faba p-VfSBP promoter (SEQ ID NO 3) were analyzed and annotated for the occurrence of motifs, boxes and cis-regulatory elements using, e.g., the GEMS Launcher software (www.genomatix.de) with default parameters (core similarity 0.75, matrix similarity 0.75).


The “core sequence” of a matrix is defined as the most highly conserved consecutive positions of the matrix, usually four.


The core similarity is calculated as described in the publications related to MatInspector (Cartharius K, et al. (2005) Bioinformatics 21; Cartharius K (2005), DNA Press; Quandt K, et al. (1995) Nucleic Acids Res. 23).


The maximum core similarity of 1.0 is reached only when the most highly conserved bases of a matrix match exactly in the sequence. More important than the core similarity is the matrix similarity, which takes into account all bases over the whole matrix length. The matrix similarity is calculated as described in the MatInspector publications. A perfect match to the matrix gets a score of 1.00 (each sequence position corresponds to the most highly conserved nucleotide at that position in the matrix); a “good” match to the matrix has a similarity of >0.80.


Mismatches in highly conserved positions of the matrix decrease the matrix similarity more than mismatches in less conserved regions.
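The behaviour described in the preceding paragraphs can be sketched as follows. This is a minimal sketch based on our reading of the MatInspector approach (Quandt et al., 1995). Each matrix position is weighted by its conservation, and a hit scores (current − min)/(max − min). The sketch simplifies that approach in three ways: only the four bases are used, with no gap column; the constant scaling factor of the conservation weight is omitted because it cancels in the ratio; and the core is taken as the four consecutive positions with the highest summed conservation. The Genomatix software may differ in detail, and all function names and the toy matrix are hypothetical.

```python
import math
from typing import Dict, List

Column = Dict[str, float]  # base -> relative frequency at one matrix position


def conservation(column: Column) -> float:
    """Conservation weight sum_b f(b) * ln(4 * f(b)): 0 if uniform, ln 4 if fully conserved."""
    return sum(f * math.log(4 * f) for f in column.values() if f > 0)


def _similarity(matrix: List[Column], site: str, positions: List[int]) -> float:
    """(current - min) / (max - min), each position weighted by its conservation."""
    weights = [conservation(matrix[i]) for i in positions]
    current = sum(w * matrix[i][site[i]] for w, i in zip(weights, positions))
    best = sum(w * max(matrix[i].values()) for w, i in zip(weights, positions))
    worst = sum(w * min(matrix[i].values()) for w, i in zip(weights, positions))
    if best == worst:
        raise ValueError("the selected positions carry no information")
    return (current - worst) / (best - worst)


def core_positions(matrix: List[Column], core_len: int = 4) -> List[int]:
    """The `core_len` consecutive positions with the highest summed conservation."""
    weights = [conservation(c) for c in matrix]
    start = max(range(len(matrix) - core_len + 1),
                key=lambda s: sum(weights[s:s + core_len]))
    return list(range(start, start + core_len))


def matrix_similarity(matrix: List[Column], site: str) -> float:
    if len(site) != len(matrix):
        raise ValueError("site and matrix must have the same length")
    return _similarity(matrix, site.upper(), list(range(len(matrix))))


def core_similarity(matrix: List[Column], site: str, core_len: int = 4) -> float:
    if len(site) != len(matrix):
        raise ValueError("site and matrix must have the same length")
    return _similarity(matrix, site.upper(), core_positions(matrix, core_len))


def col(a: float, c: float, g: float, t: float) -> Column:
    return {"A": a, "C": c, "G": g, "T": t}


# Hypothetical 6-position matrix: positions 1 and 4 fully conserved, 0 and 3 weakly.
toy = [col(.4, .3, .2, .1), col(1, 0, 0, 0), col(0, 0, .9, .1),
       col(.4, .3, .2, .1), col(0, 0, 0, 1), col(.25, .25, .25, .25)]

print(core_positions(toy))               # [1, 2, 3, 4]
print(matrix_similarity(toy, "AAGATC"))  # 1.0 (perfect match)
print(matrix_similarity(toy, "AAGTTC"))  # mismatch at a weakly conserved position
print(matrix_similarity(toy, "ACGATC"))  # mismatch at a fully conserved position
```

In this toy example, a mismatch at the weakly conserved position 3 leaves the matrix similarity close to 1. The same kind of mismatch at the fully conserved position 1 lowers it far more, as stated above.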


Opt. gives the optimized matrix threshold: the matrix similarity value chosen so that a minimum number of matches is found in non-regulatory test sequences, i.e., so that the number of false-positive matches is minimized. This value is used when “Optimized” is selected as the matrix similarity threshold in MatInspector.

Subsequently, the DNA sequences of the promoters were permutated according to the method of the invention to yield p-PvArc5_perm (SEQ ID NO 2) and p-VfSBP_perm (SEQ ID NO 4). In the p-PvArc5 promoter, 6.6% of the motifs not associated with seed-specific/seed-preferential expression or transcription initiation were altered; in the p-VfSBP promoter, 7.8% were altered. DNA permutation was conducted so as not to affect cis-regulatory elements previously associated with seed-specific gene expression or initiation of transcription. The permutations were distributed periodically over the full promoter DNA sequence, with less than 46 nucleotides between permutated nucleotide positions and within a stretch of 5 nucleotides having at least one nucleotide permutated. Permutations were carried out with the aim of keeping most of the cis-regulatory elements, boxes and motifs present in the native promoter and of avoiding the creation of new putative cis-regulatory elements, boxes and motifs.
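Some of the distribution constraints described above can be checked mechanically for a starting/permutated sequence pair, as in the hypothetical helper sketched below. It is not part of the described method, and it makes four assumptions. First, the permutation consists only of base substitutions, so both sequences have equal length. Second, coordinates are 1-based and inclusive, as in Tables 1 to 4. Third, "less than 46 nucleotides between permutated nucleotide positions" means at most 45 unchanged nucleotides in any run, including the runs before the first and after the last change. Fourth, the coordinates of the elements to be protected are supplied by the user.

```python
from typing import Dict, List, Sequence, Tuple


def permutated_positions(start_seq: str, perm_seq: str) -> List[int]:
    """1-based positions at which the permutated sequence differs from the starting one."""
    if len(start_seq) != len(perm_seq):
        raise ValueError("this sketch handles substitutions only (equal lengths)")
    return [i + 1 for i, (a, b) in enumerate(zip(start_seq.upper(), perm_seq.upper()))
            if a != b]


def unchanged_runs(positions: List[int], length: int) -> List[int]:
    """Lengths of the unchanged stretches before, between and after permutated positions."""
    if not positions:
        return [length]
    runs = [positions[0] - 1]
    runs += [b - a - 1 for a, b in zip(positions, positions[1:])]
    runs.append(length - positions[-1])
    return runs


def check_permutation(start_seq: str, perm_seq: str,
                      protected: Sequence[Tuple[int, int]],
                      max_unchanged: int = 45) -> Dict[str, object]:
    """Summarise a permutation against the spacing rule and a list of protected elements.

    `protected` holds 1-based, inclusive (from, to) coordinates of elements that must
    stay untouched, e.g. seed-specific elements taken from an annotation table."""
    positions = permutated_positions(start_seq, perm_seq)
    runs = unchanged_runs(positions, len(start_seq))
    touched = sorted({(s, e) for p in positions for s, e in protected if s <= p <= e})
    return {
        "changed_positions": positions,
        "longest_unchanged_run": max(runs),
        "spacing_ok": max(runs) <= max_unchanged,
        "touched_protected": touched,
    }


# Hypothetical 100-nt example with substitutions at positions 10, 50 and 95.
start = "A" * 100
perm_list = list(start)
for pos in (10, 50, 95):
    perm_list[pos - 1] = "G"
perm = "".join(perm_list)
print(check_permutation(start, perm, protected=[(60, 80)]))
```

A real check would also need to re-annotate the permutated sequence to confirm that no new putative elements were created. The comparison of annotation lists is a separate step.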


The motifs, boxes and cis-regulatory elements in the PvARC5 promoters before and after permutation are listed in Tables 1 and 2.


The motifs, boxes and cis-regulatory elements in the VfSBP promoters before and after permutation are listed in Tables 3 and 4.


Empty lines represent motifs, boxes and cis-regulatory elements that were found in one sequence but not in the corresponding sequence, i.e., motifs, boxes and cis-regulatory elements that were deleted from the starting sequence or introduced into the permutated sequence.
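The comparison of annotations before and after permutation can also be carried out in code. The sketch below treats two hits as the same element when matrix, from-to position and strand all agree. This matching rule is an assumption, because the specification does not state how hits were paired. The example uses a few rows copied from Tables 1 and 2, with strands written as "+" and "-"; the class and function names are hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    family: str
    matrix: str
    start: int      # 1-based "from" position
    end: int        # 1-based "to" position
    strand: str     # "+" or "-"
    matrix_sim: float


def _key(hit: Hit) -> Tuple[str, int, int, str]:
    # Assumption: a hit is "the same" if matrix, position and strand all agree.
    return (hit.matrix, hit.start, hit.end, hit.strand)


def _by_position(key: Tuple[str, int, int, str]) -> Tuple[int, int, str, str]:
    return (key[1], key[2], key[0], key[3])


def compare_annotations(starting: Iterable[Hit], permutated: Iterable[Hit]):
    """Return (deleted, introduced, rescored) hits between two annotation lists."""
    s = {_key(h): h for h in starting}
    p = {_key(h): h for h in permutated}
    deleted = [s[k] for k in sorted(s.keys() - p.keys(), key=_by_position)]
    introduced = [p[k] for k in sorted(p.keys() - s.keys(), key=_by_position)]
    rescored = [(s[k], p[k]) for k in sorted(s.keys() & p.keys(), key=_by_position)
                if s[k].matrix_sim != p[k].matrix_sim]
    return deleted, introduced, rescored


# Selected rows from Table 1 (starting) and Table 2 (permutated) of the PvARC5 promoter.
starting = [
    Hit("P$MYBL", "P$GAMYB.01", 52, 68, "-", 0.932),
    Hit("P$AHBP", "P$WUS.01", 110, 120, "-", 0.963),
    Hit("P$TEFB", "P$TEF1.01", 111, 131, "-", 0.761),
    Hit("P$CCAF", "P$CCA1.01", 113, 127, "+", 0.856),
    Hit("P$IBOX", "P$GATA.01", 121, 137, "-", 0.964),
]
permutated = [
    Hit("P$MYBL", "P$GAMYB.01", 52, 68, "-", 0.932),
    Hit("P$STKM", "P$STK.01", 58, 72, "+", 0.894),
    Hit("P$AHBP", "P$WUS.01", 110, 120, "-", 0.963),
    Hit("P$IBOX", "P$GATA.01", 121, 137, "-", 0.939),
]

deleted, introduced, rescored = compare_annotations(starting, permutated)
print([h.matrix for h in deleted])     # ['P$TEF1.01', 'P$CCA1.01']
print([h.matrix for h in introduced])  # ['P$STK.01']
print([(a.matrix, a.matrix_sim, b.matrix_sim) for a, b in rescored])
```

Elements in `deleted` and `introduced` correspond to the empty lines in the side-by-side tables. Hits in `rescored` are retained elements whose matrix similarity changed after permutation.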









TABLE 1

Boxes and Motifs identified in the starting sequence of the PvARC5 promoter

PvARC5 promoter

Family | Further Family Information | Matrix | Opt. | Position (from-to) | Strand | Core sim. | Matrix sim.
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 9-25 | (+) | 1 | 0.862
P$IDDF | ID domain factors | P$ID1.01 | 0.92 | 36-48 | (−) | 1 | 0.922
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 47-63 | (+) | 1 | 0.887
P$RAV5 | 5′-part of bipartite RAV1 binding site | P$RAV1-5.01 | 0.96 | 48-58 | (+) | 1 | 0.96
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 52-68 | (−) | 1 | 0.932
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 75-85 | (+) | 0.97 | 0.988
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 84-94 | (−) | 1 | 0.963
P$MIIG | MYB IIG-type binding sites | P$PALBOXL.01 | 0.80 | 87-101 | (+) | 0.84 | 0.806
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 106-116 | (+) | 1 | 0.99
P$GAPB | GAP-Box (light response elements) | P$GAP.01 | 0.88 | 108-122 | (+) | 0.81 | 0.884
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 110-120 | (−) | 1 | 0.963
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 111-131 | (−) | 0.96 | 0.761
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 113-127 | (+) | 0.77 | 0.856
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 121-137 | (−) | 1 | 0.964
P$GAGA | GAGA elements | P$GAGABP.01 | 0.75 | 125-149 | (−) | 0.75 | 0.768
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 126-140 | (+) | 1 | 0.799
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 144-154 | (−) | 1 | 0.923
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 147-157 | (−) | 1 | 0.928
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 151-167 | (+) | 1 | 0.839
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 164-174 | (−) | 1 | 0.898
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 175-191 | (+) | 0.75 | 0.872
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 177-187 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 177-187 | (−) | 1 | 1
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 184-200 | (+) | 0.75 | 0.797
P$TELO | Telo box (plant interstitial telomere motifs) | P$ATPURA.01 | 0.85 | 186-200 | (−) | 0.75 | 0.857
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 213-227 | (−) | 1 | 0.826
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 233-251 | (+) | 0.75 | 0.824
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 238-254 | (+) | 0.82 | 0.798
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 261-271 | (−) | 1 | 0.851
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 264-280 | (+) | 1 | 0.774
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 267-283 | (+) | 1 | 0.872
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 298-310 | (−) | 1 | 0.872
P$BRRE | Brassinosteroid (BR) response element | P$BZR1.01 | 0.95 | 303-319 | (−) | 1 | 0.953
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.02 | 0.76 | 319-335 | (+) | 0.89 | 0.762
P$GBOX | Plant G-box/C-box bZIP proteins | P$TGA1.01 | 0.90 | 327-347 | (−) | 1 | 0.909
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 337-353 | (+) | 1 | 0.854
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 337-353 | (−) | 1 | 0.935
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 351-367 | (−) | 1 | 0.919
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 362-378 | (−) | 0.75 | 0.797
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 367-377 | (−) | 1 | 0.788
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 367-377 | (+) | 1 | 0.926
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 367-383 | (+) | 1 | 0.894
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 369-385 | (+) | 1 | 0.827
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 371-381 | (−) | 1 | 1
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 396-412 | (−) | 1 | 0.857
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 397-407 | (+) | 1 | 0.921
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 401-411 | (−) | 1 | 0.916
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 403-419 | (−) | 1 | 0.9
P$MYBS | MYB proteins with single DNA binding repeat | P$OSMYBS.01 | 0.82 | 416-432 | (+) | 0.75 | 0.837
P$TELO | Telo box (plant interstitial telomere motifs) | P$ATPURA.01 | 0.85 | 440-454 | (−) | 0.75 | 0.854
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 461-479 | (−) | 0.75 | 0.826
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 468-478 | (+) | 1 | 0.892
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 473-489 | (−) | 1 | 0.913
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 476-490 | (−) | 1 | 0.889
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 482-498 | (+) | 1 | 0.831
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 499-513 | (−) | 1 | 0.91
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 499-517 | (−) | 1 | 0.878
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 509-525 | (−) | 1 | 0.885
P$GARP | Myb-related DNA binding proteins (Golden2, ARR, Psr) | P$ARR10.01 | 0.97 | 540-548 | (+) | 1 | 0.976
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 558-568 | (−) | 1 | 0.775
P$L1BX | L1 box, motif for L1 layer-specific expression | P$PDF2.01 | 0.85 | 558-574 | (−) | 1 | 0.865
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 558-568 | (−) | 0.88 | 0.927
P$EINL | Ethylene insensitive 3 like factors | P$TEIL.01 | 0.92 | 572-580 | (+) | 1 | 0.921
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 583-593 | (+) | 0.94 | 0.977
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 583-593 | (−) | 0.83 | 0.94
P$L1BX | L1 box, motif for L1 layer-specific expression | P$HDG9.01 | 0.77 | 607-623 | (+) | 1 | 0.772
P$IBOX | Plant I-Box sites | P$IBOX.01 | 0.81 | 610-626 | (+) | 0.75 | 0.824
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.90 | 613-629 | (−) | 1 | 0.953
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 616-632 | (+) | 1 | 0.942
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 616-636 | (+) | 0.96 | 0.778
P$MYBS | MYB proteins with single DNA binding repeat | P$TAMYB80.01 | 0.83 | 625-641 | (−) | 1 | 0.859
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.90 | 631-645 | (+) | 1 | 0.927
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.90 | 632-646 | (−) | 1 | 0.929
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 646-662 | (−) | 0.75 | 0.825
P$L1BX | L1 box, motif for L1 layer-specific expression | P$HDG9.01 | 0.77 | 648-664 | (+) | 1 | 0.791
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 649-663 | (−) | 1 | 0.902
P$DOFF | DNA binding with one finger (DOF) | P$PBF.01 | 0.97 | 654-670 | (+) | 1 | 0.979
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 682-692 | (−) | 1 | 0.975
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 696-716 | (−) | 0.84 | 0.78
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 699-715 | (−) | 1 | 0.88
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 704-730 | (+) | 1 | 0.94
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.01 | 0.77 | 716-736 | (−) | 0.75 | 0.856
P$GBOX | Plant G-box/C-box bZIP proteins | P$ROM.01 | 0.85 | 717-737 | (+) | 1 | 1
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 719-735 | (−) | 0.75 | 0.857
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.02 | 0.84 | 722-742 | (−) | 0.75 | 0.862
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$MYCRS.01 | 0.93 | 739-757 | (−) | 0.86 | 0.943
P$OPAQ | Opaque-2 like transcriptional activators | P$GCN4.01 | 0.81 | 745-761 | (−) | 1 | 0.85
P$AREF | Auxin response element | P$ARE.01 | 0.93 | 747-759 | (+) | 1 | 0.941
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 783-803 | (−) | 0.84 | 0.78
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 786-802 | (−) | 1 | 0.876
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 788-814 | (−) | 1 | 0.929
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 791-817 | (+) | 1 | 0.984
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 796-820 | (+) | 1 | 0.812
P$GBOX | Plant G-box/C-box bZIP proteins | P$CPRF.01 | 0.95 | 803-823 | (−) | 1 | 0.989
P$GBOX | Plant G-box/C-box bZIP proteins | P$CPRF.01 | 0.95 | 804-824 | (+) | 1 | 0.98
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$MYCRS.01 | 0.93 | 804-822 | (−) | 1 | 0.956
P$ABRE | ABA response elements | P$ABRE.01 | 0.82 | 805-821 | (+) | 1 | 0.874
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$PIF3.01 | 0.82 | 805-823 | (+) | 1 | 0.914
P$OPAQ | Opaque-2 like transcriptional activators | P$RITA1.01 | 0.95 | 805-821 | (−) | 1 | 0.992
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 806-822 | (−) | 1 | 0.977
P$OPAQ | Opaque-2 like transcriptional activators | P$RITA1.01 | 0.95 | 806-822 | (+) | 1 | 0.973
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSTF.01 | 0.73 | 809-829 | (−) | 0.85 | 0.747
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 823-839 | (−) | 1 | 0.794
P$LFYB | LFY binding site | P$LFY.01 | 0.93 | 839-851 | (−) | 0.91 | 0.935
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 840-866 | (−) | 1 | 0.948
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 843-869 | (+) | 1 | 0.966
P$LEGB | Legumin Box family | P$IDE1.01 | 0.77 | 847-873 | (+) | 1 | 0.779
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.01 | 0.77 | 855-875 | (−) | 0.75 | 0.856
P$GBOX | Plant G-box/C-box bZIP proteins | P$ROM.01 | 0.85 | 856-876 | (+) | 1 | 1
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 858-874 | (−) | 0.75 | 0.857
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.02 | 0.84 | 861-881 | (−) | 0.75 | 0.862
P$SALT | Salt/drought responsive elements | P$ALFIN1.02 | 0.95 | 871-885 | (−) | 1 | 0.963
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 895-921 | (+) | 1 | 0.927
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.01 | 0.77 | 907-927 | (−) | 0.75 | 0.856
P$GBOX | Plant G-box/C-box bZIP proteins | P$ROM.01 | 0.85 | 908-928 | (+) | 1 | 0.938
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 910-926 | (−) | 0.75 | 0.857
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.02 | 0.84 | 913-933 | (−) | 0.75 | 0.871
P$MADS | MADS box proteins | P$SQUA.01 | 0.90 | 960-980 | (−) | 1 | 0.908
P$L1BX | L1 box, motif for L1 layer-specific expression | P$PDF2.01 | 0.85 | 963-979 | (+) | 1 | 0.856
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 972-982 | (+) | 1 | 0.858
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 974-988 | (−) | 0.83 | 0.886
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 974-990 | (+) | 0.75 | 0.83
O$VTBP | Vertebrate TATA binding protein factor | O$MTATA.01 | 0.84 | 976-992 | (+) | 1 | 0.843
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 983-999 | (−) | 1 | 0.787
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 984-1002 | (−) | 1 | 0.818
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.90 | 991-1001 | (+) | 1 | 0.989
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 991-1001 | (−) | 1 | 0.943
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 992-1006 | (+) | 1 | 0.913
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 1003-1015 | (+) | 1 | 0.881
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSTF.01 | 0.73 | 1004-1024 | (+) | 1 | 0.776
P$GBOX | Plant G-box/C-box bZIP proteins | P$UPRE.01 | 0.86 | 1009-1029 | (−) | 1 | 0.974
P$GBOX | Plant G-box/C-box bZIP proteins | P$TGA1.01 | 0.90 | 1010-1030 | (+) | 1 | 0.991
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 1011-1027 | (+) | 1 | 0.828
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 1011-1027 | (−) | 1 | 0.99
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 1012-1028 | (+) | 0.95 | 0.893
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 1013-1037 | (−) | 1 | 0.771
P$LEGB | Legumin Box family | P$LEGB.01 | 0.65 | 1025-1051 | (+) | 1 | 0.656
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1042-1052 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1042-1052 | (−) | 1 | 1
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 1045-1061 | (+) | 1 | 0.904
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 1070-1080 | (−) | 0.97 | 0.949
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 1093-1107 | (−) | 1 | 0.952
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 1098-1114 | (−) | 0.75 | 0.843
P$CARM | CA-rich motif | P$CARICH.01 | 0.78 | 1102-1120 | (−) | 1 | 0.791
P$MADS | MADS box proteins | P$SQUA.01 | 0.90 | 1108-1128 | (−) | 1 | 0.928
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1111-1125 | (+) | 1 | 0.961
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1112-1128 | (+) | 1 | 0.968
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 1130-1156 | (−) | 1 | 0.922
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 1135-1145 | (+) | 1 | 1
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 1138-1164 | (−) | 1 | 0.914
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 1138-1162 | (+) | 0.75 | 0.794
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 1141-1157 | (+) | 0.75 | 0.833

TABLE 2







Boxes and Motifs identified in the permutated sequence of the PvARC5 promoter.

Preferably associated boxes are annotated in lines 38, 43, 116, 121, 124, 128, 129, 137, 138, 143, 145, 146, 147, 151, 152, 153, 156, 162, 165, 175, 184, 186, 188, 203 and 205 of Tables 1 and 2. Essential boxes are annotated in lines 83, 111, 112, 172 and 201 of Tables 1 and 2.

PvARC5 promoter permutated















Further Family


Position

Core
Matrix


Family
Information
Matrix
Opt.
from-to
Strand
sim.
sim.

















P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 9-25 | (+) | 1 | 0.862
P$IDDF | ID domain factors | P$ID1.01 | 0.92 | 36-48 | (−) | 1 | 0.922
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 47-63 | (+) | 1 | 0.887
P$RAV5 | 5′-part of bipartite RAV1 binding site | P$RAV1-5.01 | 0.96 | 48-58 | (+) | 1 | 0.96
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 52-68 | (−) | 1 | 0.932
P$STKM | Storekeeper motif | P$STK.01 | 0.85 | 58-72 | (+) | 0.79 | 0.894
P$MYBL | MYB-like proteins | P$MYBPH3.01 | 0.80 | 59-75 | (+) | 0.75 | 0.806
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.02 | 0.76 | 62-78 | (+) | 0.89 | 0.791
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 75-85 | (+) | 0.97 | 0.988
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 84-94 | (−) | 1 | 0.963
P$MIIG | MYB IIG-type binding sites | P$PALBOXL.01 | 0.80 | 87-101 | (+) | 0.84 | 0.806
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 106-116 | (+) | 1 | 0.99
P$GAPB | GAP-Box (light response elements) | P$GAP.01 | 0.88 | 108-122 | (+) | 0.81 | 0.884
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 110-120 | (−) | 1 | 0.963
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 121-137 | (−) | 1 | 0.939
P$GAGA | GAGA elements | P$GAGABP.01 | 0.75 | 125-149 | (−) | 0.75 | 0.764
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 126-140 | (+) | 1 | 0.799
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 144-154 | (−) | 1 | 0.923
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 147-157 | (−) | 1 | 0.928
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 149-165 | (+) | 1 | 0.78
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 151-167 | (+) | 1 | 0.825
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 164-174 | (−) | 1 | 0.898
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 175-191 | (+) | 0.75 | 0.872
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 177-187 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 177-187 | (−) | 1 | 1
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 184-200 | (+) | 0.75 | 0.797
P$TELO | Telo box (plant interstitial telomere motifs) | P$ATPURA.01 | 0.85 | 186-200 | (−) | 0.75 | 0.857
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 188-204 | (−) | 1 | 0.843
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 213-227 | (−) | 1 | 0.826
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 221-237 | (+) | 1 | 1
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 233-251 | (+) | 0.75 | 0.824
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 238-254 | (+) | 0.82 | 0.798
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 243-261 | (−) | 0.75 | 0.824
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 250-260 | (+) | 1 | 0.892
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 257-273 | (+) | 1 | 0.881
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 261-271 | (−) | 1 | 0.851
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 264-280 | (+) | 1 | 0.774
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 267-283 | (+) | 1 | 0.872
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 289-305 | (−) | 1 | 0.919
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 298-310 | (−) | 1 | 0.872
P$BRRE | Brassinosteroid (BR) response element | P$BZR1.01 | 0.95 | 303-319 | (−) | 1 | 0.953
P$GBOX | Plant G-box/C-box bZIP proteins | P$TGA1.01 | 0.90 | 327-347 | (−) | 1 | 0.909
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 337-353 | (+) | 1 | 0.854
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 337-353 | (−) | 1 | 0.935
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 342-358 | (−) | 1 | 0.896
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 343-353 | (−) | 1 | 0.869
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 343-353 | (−) | 0.88 | 0.915
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 344-360 | (−) | 0.75 | 0.827
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 345-355 | (+) | 0.97 | 0.945
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 351-367 | (−) | 1 | 0.919
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 362-378 | (−) | 0.75 | 0.797
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 367-377 | (−) | 1 | 0.788
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 367-377 | (+) | 1 | 0.926
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 367-383 | (+) | 1 | 0.894
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 369-385 | (+) | 1 | 0.827
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 371-381 | (−) | 1 | 1
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 376-392 | (−) | 0.86 | 0.924
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 387-401 | (+) | 1 | 0.851
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 392-410 | (+) | 1 | 0.864
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 396-412 | (−) | 1 | 0.852
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 397-407 | (+) | 1 | 0.911
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 401-411 | (−) | 1 | 0.916
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 403-419 | (−) | 1 | 0.9
P$MYBS | MYB proteins with single DNA binding repeat | P$OSMYBS.01 | 0.82 | 416-432 | (+) | 0.75 | 0.829
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 420-436 | (−) | 0.75 | 0.821
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 426-442 | (+) | 0.75 | 0.819
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 426-442 | (−) | 1 | 0.902
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 428-444 | (−) | 1 | 0.772
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSL.01 | 0.69 | 428-448 | (+) | 0.77 | 0.692
P$TELO | Telo box (plant interstitial telomere motifs) | P$ATPURA.01 | 0.85 | 440-454 | (−) | 0.75 | 0.854
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 455-465 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 455-465 | (−) | 1 | 0.979
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 461-479 | (−) | 0.75 | 0.815
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 468-478 | (+) | 1 | 0.901
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 473-489 | (−) | 1 | 0.913
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 476-490 | (−) | 1 | 0.889
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 489-505 | (−) | 0.75 | 0.825
P$L1BX | L1 box, motif for L1 layer-specific expression | P$HDG9.01 | 0.77 | 491-507 | (+) | 1 | 0.791
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 492-506 | (−) | 1 | 0.902
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 498-512 | (+) | 0.76 | 0.862
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 499-513 | (−) | 1 | 0.909
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 499-517 | (−) | 1 | 0.827
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 509-525 | (−) | 1 | 0.885
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 520-532 | (−) | 1 | 0.905
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 526-542 | (−) | 1 | 0.936
P$GARP | Myb-related DNA binding proteins (Golden2, ARR, Psr) | P$ARR10.01 | 0.97 | 540-548 | (+) | 1 | 0.976
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 558-568 | (−) | 1 | 0.775
P$L1BX | L1 box, motif for L1 layer-specific expression | P$PDF2.01 | 0.85 | 558-574 | (−) | 1 | 0.865
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 558-568 | (−) | 0.88 | 0.927
P$EINL | Ethylene insensitive 3 like factors | P$TEIL.01 | 0.92 | 572-580 | (+) | 1 | 0.921
P$SBPD | SBP-domain proteins | P$SBP.01 | 0.88 | 573-589 | (+) | 1 | 0.885
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 583-593 | (+) | 0.94 | 0.977
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 583-593 | (−) | 0.83 | 0.94
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$MYCRS.01 | 0.93 | 591-609 | (−) | 0.86 | 0.958
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 593-609 | (+) | 1 | 0.838
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.02 | 0.89 | 603-619 | (+) | 1 | 0.89
P$L1BX | L1 box, motif for L1 layer-specific expression | P$HDG9.01 | 0.77 | 607-623 | (+) | 1 | 0.772
P$IBOX | Plant I-Box sites | P$IBOX.01 | 0.81 | 610-626 | (+) | 0.75 | 0.824
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.90 | 613-629 | (−) | 1 | 0.953
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 616-632 | (+) | 1 | 0.942
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 616-636 | (+) | 0.96 | 0.778
P$MYBS | MYB proteins with single DNA binding repeat | P$TAMYB80.01 | 0.83 | 625-641 | (−) | 1 | 0.861
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.90 | 631-645 | (+) | 1 | 0.927
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.90 | 632-646 | (−) | 1 | 0.929
P$L1BX | L1 box, motif for L1 layer-specific expression | P$HDG9.01 | 0.77 | 648-664 | (+) | 1 | 0.822
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 649-663 | (−) | 1 | 0.923
P$DOFF | DNA binding with one finger (DOF) | P$PBF.01 | 0.97 | 654-670 | (+) | 1 | 0.979
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 682-692 | (−) | 1 | 0.975
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 689-705 | (+) | 1 | 0.884
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 696-716 | (−) | 0.84 | 0.779
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 699-715 | (−) | 1 | 0.88
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 704-730 | (+) | 1 | 0.94
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.01 | 0.77 | 716-736 | (−) | 0.75 | 0.856
P$GBOX | Plant G-box/C-box bZIP proteins | P$ROM.01 | 0.85 | 717-737 | (+) | 1 | 1
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 719-735 | (−) | 0.75 | 0.857
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.02 | 0.84 | 722-742 | (−) | 0.75 | 0.862
P$GBOX | Plant G-box/C-box bZIP proteins | P$HBP1B.01 | 0.83 | 734-754 | (+) | 0.77 | 0.852
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$MYCRS.01 | 0.93 | 739-757 | (−) | 0.86 | 0.953
P$ABRE | ABA response elements | P$ABF1.01 | 0.79 | 741-757 | (−) | 0.75 | 0.796
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 741-757 | (+) | 1 | 0.871
P$OPAQ | Opaque-2 like transcriptional activators | P$GCN4.01 | 0.81 | 745-761 | (−) | 1 | 0.85
P$AREF | Auxin response element | P$ARE.01 | 0.93 | 747-759 | (+) | 1 | 0.941
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 754-770 | (+) | 1 | 0.933
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 757-767 | (+) | 1 | 0.943
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 780-796 | (+) | 1 | 0.942
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 783-803 | (−) | 0.84 | 0.779
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 786-802 | (−) | 1 | 0.876
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 788-814 | (−) | 1 | 0.929
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 791-817 | (+) | 1 | 0.984
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 796-820 | (+) | 1 | 0.812
P$GBOX | Plant G-box/C-box bZIP proteins | P$CPRF.01 | 0.95 | 803-823 | (−) | 1 | 0.989
P$GBOX | Plant G-box/C-box bZIP proteins | P$CPRF.01 | 0.95 | 804-824 | (+) | 1 | 0.98
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$MYCRS.01 | 0.93 | 804-822 | (−) | 1 | 0.956
P$ABRE | ABA response elements | P$ABRE.01 | 0.82 | 805-821 | (+) | 1 | 0.874
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$PIF3.01 | 0.82 | 805-823 | (+) | 1 | 0.922
P$OPAQ | Opaque-2 like transcriptional activators | P$RITA1.01 | 0.95 | 805-821 | (−) | 1 | 0.992
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 806-822 | (−) | 1 | 0.977
P$OPAQ | Opaque-2 like transcriptional activators | P$RITA1.01 | 0.95 | 806-822 | (+) | 1 | 0.973
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSL.01 | 0.69 | 809-829 | (−) | 1 | 0.819
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 823-839 | (−) | 1 | 0.802
P$LFYB | LFY binding site | P$LFY.01 | 0.93 | 839-851 | (−) | 0.91 | 0.936
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 840-866 | (−) | 1 | 0.948
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 843-869 | (+) | 1 | 0.966
P$LEGB | Legumin Box family | P$IDE1.01 | 0.77 | 847-873 | (+) | 1 | 0.779
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.01 | 0.77 | 855-875 | (−) | 0.75 | 0.856
P$GBOX | Plant G-box/C-box bZIP proteins | P$ROM.01 | 0.85 | 856-876 | (+) | 1 | 1
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 858-874 | (−) | 0.75 | 0.857
P$GCCF | GCC box family | P$ERE_JERE.01 | 0.85 | 870-882 | (−) | 0.81 | 0.86
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 880-894 | (−) | 1 | 0.827
P$MYBS | MYB proteins with single DNA binding repeat | P$ZMMRP1.01 | 0.79 | 881-897 | (+) | 0.81 | 0.867
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 895-921 | (+) | 1 | 0.924
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.01 | 0.77 | 907-927 | (−) | 0.75 | 0.856
P$GBOX | Plant G-box/C-box bZIP proteins | P$ROM.01 | 0.85 | 908-928 | (+) | 1 | 0.938
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 910-926 | (−) | 0.75 | 0.864
P$GBOX | Plant G-box/C-box bZIP proteins | P$BZIP910.02 | 0.84 | 913-933 | (−) | 0.75 | 0.871
P$SBPD | SBP-domain proteins | P$SBP.01 | 0.88 | 939-955 | (+) | 1 | 0.887
P$EINL | Ethylene insensitive 3 like factors | P$TEIL.01 | 0.92 | 942-950 | (+) | 0.84 | 0.922
P$MADS | MADS box proteins | P$SQUA.01 | 0.90 | 960-980 | (−) | 1 | 0.908
P$L1BX | L1 box, motif for L1 layer-specific expression | P$PDF2.01 | 0.85 | 963-979 | (+) | 1 | 0.856
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 972-982 | (+) | 1 | 0.858
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 974-988 | (−) | 0.83 | 0.905
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 974-990 | (+) | 0.75 | 0.83
O$VTBP | Vertebrate TATA binding protein factor | O$MTATA.01 | 0.84 | 976-992 | (+) | 1 | 0.855
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 983-999 | (−) | 1 | 0.867
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 984-1002 | (−) | 1 | 0.81
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.90 | 991-1001 | (+) | 1 | 0.989
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 991-1001 | (−) | 1 | 0.943
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 992-1006 | (+) | 1 | 0.913
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSL.01 | 0.69 | 1004-1024 | (+) | 1 | 0.827
P$GBOX | Plant G-box/C-box bZIP proteins | P$UPRE.01 | 0.86 | 1009-1029 | (−) | 1 | 0.974
P$GBOX | Plant G-box/C-box bZIP proteins | P$TGA1.01 | 0.90 | 1010-1030 | (+) | 1 | 0.991
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 1011-1027 | (+) | 1 | 0.828
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 1011-1027 | (−) | 1 | 0.99
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 1012-1028 | (+) | 0.95 | 0.893
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 1013-1037 | (−) | 1 | 0.771
P$LEGB | Legumin Box family | P$LEGB.01 | 0.65 | 1025-1051 | (+) | 1 | 0.656
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1042-1052 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1042-1052 | (−) | 1 | 1
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 1045-1061 | (+) | 1 | 0.888
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 1046-1062 | (−) | 1 | 0.888
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 1060-1076 | (+) | 1 | 0.949
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 1070-1080 | (−) | 0.97 | 0.949
P$NACF | Plant specific NAC [NAM (no apical meristem), ATAF1/2, CUC2 (cup-shaped cotyledons 2)] transcription factors | P$TANAC69.01 | 0.68 | 1078-1100 | (+) | 1 | 0.775
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 1093-1107 | (−) | 1 | 0.949
P$MADS | MADS box proteins | P$SQUA.01 | 0.90 | 1097-1117 | (+) | 1 | 0.908
P$CARM | CA-rich motif | P$CARICH.01 | 0.78 | 1102-1120 | (−) | 1 | 0.791
P$MADS | MADS box proteins | P$SQUA.01 | 0.90 | 1108-1128 | (−) | 1 | 0.928
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1111-1125 | (+) | 1 | 0.961
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1112-1128 | (+) | 1 | 0.968
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 1130-1156 | (−) | 1 | 0.932
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 1138-1164 | (−) | 1 | 0.914
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 1138-1162 | (+) | 0.75 | 0.794
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 1141-1157 | (+) | 0.75 | 0.833

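Each row of the tables reports one matrix match in the layout Family | Further Family Information | Matrix | Opt. | Position from-to | Strand | Core sim. | Matrix sim. The family prefixes (P$, O$) and matrix names (e.g. P$GAAA.01) appear to follow Genomatix MatInspector conventions. The Python sketch below parses rows in this layout and checks each match against an assumed reporting rule: a core similarity of at least 0.75 and a matrix similarity at least equal to the matrix-specific optimized threshold ("Opt."). That rule is the usual MatInspector convention and is an assumption here, not something stated in this document; the names `MatrixMatch`, `parse_row` and `passes_thresholds` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MatrixMatch:
    """One table row; a hypothetical container, not part of any tool's API."""
    family: str
    info: str
    matrix: str
    opt: float          # "Opt.": matrix-specific optimized threshold
    start: int          # "from" position of the match
    end: int            # "to" position of the match
    strand: str         # "+" or "-"
    core_sim: float
    matrix_sim: float

    @property
    def length(self) -> int:
        # Assumes 1-based, inclusive positions, so 9-25 spans 17 bp.
        return self.end - self.start + 1


def parse_row(line: str) -> MatrixMatch:
    """Parse 'Family | Info | Matrix | Opt. | from-to | (strand) | core | matrix'."""
    family, info, matrix, opt, pos, strand, core, msim = (
        field.strip() for field in line.split("|")
    )
    start, end = (int(p) for p in pos.split("-"))
    # The tables print the minus strand with U+2212; normalise it to ASCII "-".
    strand = strand.strip("()").replace("\u2212", "-")
    return MatrixMatch(family, info, matrix, float(opt), start, end,
                       strand, float(core), float(msim))


def passes_thresholds(m: MatrixMatch, core_threshold: float = 0.75) -> bool:
    """Assumed reporting rule: core sim >= 0.75 and matrix sim >= Opt."""
    return m.core_sim >= core_threshold and m.matrix_sim >= m.opt


# Three rows copied from Table 2.
ROWS: List[str] = [
    "P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 9-25 | (+) | 1 | 0.862",
    "O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 149-165 | (+) | 1 | 0.78",
    "P$LEGB | Legumin Box family | P$LEGB.01 | 0.65 | 1025-1051 | (+) | 1 | 0.656",
]
matches = [parse_row(r) for r in ROWS]
for m in matches:
    print(m.matrix, m.length, passes_thresholds(m))  # e.g. "P$GAAA.01 17 True"
```

The rows chosen include two borderline cases from Table 2 (O$ATATA.01 with a matrix similarity exactly at its threshold of 0.78, and P$LEGB.01 at 0.656 against 0.65), which is why the comparison uses `>=` rather than `>`.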

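Several matches in Table 2 overlap and recur as similar modules; for example, the G-box/C-box and ABRE matrices P$BZIP910.01, P$ROM.01, P$ABF1.03 and P$BZIP910.02 appear together around positions 716-742, 855-876 and 907-933. A minimal Python sketch, assuming that grouping overlapping position ranges is a useful way to locate such modules, merges matches whose inclusive ranges overlap. Only those G-box/ABRE rows are used here; other overlapping rows in the table (such as P$HBP1B.01 at 734-754) are deliberately left out, and treating the resulting clusters as "associated boxes" is an interpretation, not a statement of the document. `cluster_overlapping` is a hypothetical helper.

```python
from typing import List, Tuple

Hit = Tuple[str, int, int]            # (matrix, from, to) as printed in the table
Cluster = Tuple[int, int, List[str]]  # (cluster start, cluster end, member matrices)


def cluster_overlapping(hits: List[Hit]) -> List[Cluster]:
    """Merge matches whose position ranges overlap (inclusive ends) into clusters."""
    clusters: List[Cluster] = []
    for matrix, start, end in sorted(hits, key=lambda h: (h[1], h[2])):
        if clusters and start <= clusters[-1][1]:
            # Overlaps the current cluster: extend it and record the member.
            c_start, c_end, members = clusters[-1]
            clusters[-1] = (c_start, max(c_end, end), members + [matrix])
        else:
            # No overlap: open a new cluster.
            clusters.append((start, end, [matrix]))
    return clusters


# G-box/ABRE rows copied from Table 2 (other overlapping rows omitted on purpose).
GBOX_ABRE_HITS: List[Hit] = [
    ("P$BZIP910.01", 716, 736), ("P$ROM.01", 717, 737),
    ("P$ABF1.03", 719, 735), ("P$BZIP910.02", 722, 742),
    ("P$BZIP910.01", 855, 875), ("P$ROM.01", 856, 876), ("P$ABF1.03", 858, 874),
    ("P$BZIP910.01", 907, 927), ("P$ROM.01", 908, 928),
    ("P$ABF1.03", 910, 926), ("P$BZIP910.02", 913, 933),
]

clusters = cluster_overlapping(GBOX_ABRE_HITS)
for start, end, members in clusters:
    print(f"{start}-{end}: {', '.join(members)}")
```

With these rows the sketch reports three clusters (716-742, 855-876 and 907-933); running it over every row of the table would merge more matches, since many other entries overlap these ranges.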














TABLE 3

Boxes and Motifs identified in the starting sequence of the VfSBP promoter.

p-VfSBP (native)

Family | Further Family Information | Matrix | Opt. | Position from-to | Strand | Core sim. | Matrix sim.
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.90 | 12-28 | (+) | 1 | 0.918
P$GAGA | GAGA elements | P$BPC.01 | 1.00 | 25-49 | (−) | 1 | 1
P$LEGB | Legumin Box family | P$IDE1.01 | 0.77 | 80-106 | (−) | 1 | 0.805
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 85-101 | (−) | 1 | 0.843
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 101-117 | (−) | 1 | 0.883
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 118-130 | (+) | 1 | 0.897
P$GBOX | Plant G-box/C-box bZIP proteins | P$HBP1B.01 | 0.83 | 138-158 | (+) | 1 | 0.834
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 165-181 | (−) | 0.78 | 0.788
P$NACF | Plant specific NAC [NAM (no apical meristem), ATAF1/2, CUC2 (cup-shaped cotyledons 2)] transcription factors | P$TANAC69.01 | 0.68 | 173-195 | (−) | 0.81 | 0.729
P$MADS | MADS box proteins | P$AGL1.01 | 0.84 | 174-194 | (−) | 0.98 | 0.862
P$MADS | MADS box proteins | P$AGL1.01 | 0.84 | 175-195 | (+) | 0.98 | 0.863
P$TCPF | DNA-binding proteins with the plant specific TCP-domain | P$ATTCP20.01 | 0.94 | 189-201 | (+) | 1 | 0.968
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.02 | 0.76 | 194-210 | (−) | 0.89 | 0.8
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 198-208 | (+) | 0.83 | 0.936
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 207-223 | (+) | 0.75 | 0.811
P$EINL | Ethylene insensitive 3 like factors | P$TEIL.01 | 0.92 | 215-223 | (−) | 0.96 | 0.924
P$GBOX | Plant G-box/C-box bZIP proteins | P$HBP1A.01 | 0.88 | 217-237 | (−) | 1 | 0.908
P$GBOX | Plant G-box/C-box bZIP proteins | P$GBF1.01 | 0.94 | 218-238 | (+) | 1 | 0.963
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 218-234 | (+) | 1 | 0.821
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 219-235 | (+) | 1 | 0.825
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 221-245 | (−) | 1 | 0.803
P$CE1F | Coupling element 1 binding factors | P$SBOX.01 | 0.87 | 222-234 | (−) | 0.78 | 0.916
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 233-249 | (−) | 1 | 0.916
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.90 | 236-250 | (−) | 1 | 0.9
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 256-266 | (+) | 0.94 | 0.896
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 256-266 | (−) | 0.88 | 0.871
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 290-300 | (−) | 1 | 0.931
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 292-302 | (−) | 1 | 0.984
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 306-316 | (+) | 1 | 0.938
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 308-324 | (−) | 1 | 0.854
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 354-368 | (+) | 1 | 0.895
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 375-389 | (−) | 1 | 0.861
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 392-408 | (−) | 1 | 0.87
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 394-410 | (+) | 1 | 0.95
P$MSAE | M-phase-specific activator elements | P$MSA.01 | 0.80 | 395-409 | (−) | 0.75 | 0.808
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 415-429 | (+) | 1 | 0.811
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 421-439 | (−) | 0.75 | 0.852
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 426-442 | (+) | 1 | 0.939
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 431-447 | (−) | 0.76 | 0.782
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 453-469 | (+) | 1 | 0.958
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 468-484 | (−) | 0.82 | 0.849
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 486-502 | (+) | 1 | 0.818
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 498-514 | (−) | 1 | 0.919
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 512-526 | (−) | 1 | 0.85
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 533-549 | (−) | 1 | 0.966
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 543-559 | (+) | 1 | 0.966
P$WBXF | W Box family | P$ERE.01 | 0.89 | 562-578 | (+) | 1 | 0.972
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 614-630 | (+) | 0.76 | 0.766
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 630-646 | (+) | 1 | 0.819
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 636-646 | (−) | 1 | 0.913
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 637-647 | (+) | 1 | 0.915
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 649-663 | (+) | 0.78 | 0.87
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 654-668 | (−) | 1 | 0.815
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 660-670 | (−) | 1 | 0.944
P$GAPB | GAP-Box (light response elements) | P$GAP.01 | 0.88 | 702-716 | (−) | 1 | 0.897
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 723-739 | (−) | 1 | 0.925
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 726-736 | (−) | 1 | 1
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 773-789 | (+) | 1 | 0.951
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 775-791 | (+) | 1 | 0.899
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 801-817 | (−) | 1 | 0.837
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 803-819 | (−) | 1 | 0.811
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 819-835 | (−) | 0.75 | 0.874
P$MADS | MADS box proteins | P$AGL15.01 | 0.79 | 827-847 | (−) | 0.83 | 0.791
P$MADS | MADS box proteins | P$AGL15.01 | 0.79 | 828-848 | (+) | 1 | 0.895
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 843-857 | (−) | 1 | 0.883
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 844-860 | (−) | 1 | 0.948
P$CARM | CA-rich motif | P$CARICH.01 | 0.78 | 845-863 | (+) | 1 | 0.806
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 858-874 | (+) | 0.75 | 0.831
P$MYBL | MYB-like proteins | P$NTMYBAS1.01 | 0.96 | 867-883 | (+) | 1 | 0.963
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 869-885 | (+) | 1 | 0.883
P$RAV5 | 5′-part of bipartite RAV1 binding site | P$RAV1-5.01 | 0.96 | 882-892 | (+) | 1 | 0.96
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 888-898 | (−) | 1 | 1
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 897-913 | (+) | 1 | 0.886
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 906-916 | (+) | 1 | 1
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 907-917 | (−) | 1 | 0.903
P$CARM | CA-rich motif | P$CARICH.01 | 0.78 | 908-926 | (−) | 1 | 0.826
P$MYBL | MYB-like proteins | P$NTMYBAS1.01 | 0.96 | 916-932 | (−) | 1 | 0.962
P$MIIG | MYB IIG-type binding sites | P$PALBOXP.01 | 0.81 | 918-932 | (−) | 0.94 | 0.817
P$DOFF | DNA binding with one finger (DOF) | P$DOF1.01 | 0.98 | 929-945 | (−) | 1 | 0.983
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 933-949 | (+) | 0.97 | 0.854
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 944-960 | (+) | 1 | 0.829
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 959-969 | (+) | 0.75 | 0.816
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 959-969 | (−) | 1 | 0.909
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 970-980 | (+) | 1 | 0.916
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.90 | 973-983 | (+) | 1 | 0.989
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 973-983 | (−) | 1 | 0.976
P$IDDF | ID domain factors | P$ID1.01 | 0.92 | 976-988 | (+) | 1 | 0.928
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 995-1011 | (+) | 1 | 0.96
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 1008-1018 | (+) | 1 | 0.937
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 1012-1022 | (−) | 1 | 1
P$SPF1 | Sweet potato DNA-binding factor with two WRKY-domains | P$SP8BF.01 | 0.87 | 1029-1041 | (−) | 0.78 | 0.879
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1036-1054 | (−) | 1 | 0.822
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.90 | 1054-1064 | (+) | 1 | 0.99
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1054-1064 | (−) | 0.83 | 0.94
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 1066-1082 | (+) | 1 | 0.889
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.90 | 1086-1100 | (+) | 1 | 0.94
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1087-1103 | (+) | 0.89 | 0.927
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1088-1102 | (+) | 1 | 0.958
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1089-1105 | (+) | 1 | 0.971
P$E2FF | E2F-homolog cell cycle regulators | P$E2F.01 | 0.82 | 1117-1131 | (−) | 1 | 0.833
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 1146-1162 | (+) | 1 | 0.908
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 1153-1169 | (+) | 1 | 0.8
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 1170-1186 | (−) | 1 | 0.797
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1173-1191 | (+) | 1 | 0.813
P$MADS | MADS box proteins | P$AGL2.01 | 0.82 | 1174-1194 | (+) | 1 | 0.9
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1189-1199 | (+) | 0.83 | 0.919
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 1229-1245 | (−) | 0.76 | 0.763
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 1234-1250 | (−) | 0.94 | 0.88
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1241-1255 | (+) | 1 | 0.964
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1242-1258 | (+) | 1 | 0.967
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 1265-1281 | (−) | 0.76 | 0.762
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 1265-1281 | (+) | 0.75 | 0.839
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1274-1284 | (−) | 1 | 0.928
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSL.01 | 0.69 | 1278-1298 | (+) | 0.77 | 0.732
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$MYCRS.01 | 0.93 | 1284-1302 | (−) | 0.86 | 0.963
P$TALE | TALE (three amino acid loop extension) class homeodomain proteins | P$KN1_KIP.01 | 0.88 | 1289-1301 | (−) | 1 | 1
P$AREF | Auxin response element | P$SEBF.01 | 0.96 | 1292-1304 | (+) | 1 | 0.98
P$MSAE | M-phase-specific activator elements | P$MSA.01 | 0.80 | 1295-1309 | (−) | 0.75 | 0.818
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 1296-1312 | (−) | 1 | 0.776
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 1310-1326 | (−) | 0.94 | 0.876
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1319-1329 | (+) | 1 | 0.93
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 1323-1339 | (−) | 1 | 0.881
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 1327-1337 | (−) | 1 | 0.936
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 1338-1354 | (+) | 1 | 0.896
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1338-1356 | (−) | 1 | 0.819
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1345-1355 | (+) | 0.83 | 0.902
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1345-1355 | (−) | 1 | 0.998
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 1354-1364 | (−) | 1 | 0.916
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1376-1392 | (−) | 1 | 0.949
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 1377-1391 | (+) | 1 | 0.952
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 1379-1393 | (−) | 1 | 0.883
P$IBOX | Plant I-Box sites | P$IBOX.01 | 0.81 | 1399-1415 | (−) | 0.75 | 0.822
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 1417-1433 | (−) | 1 | 0.86
P$IBOX | Plant I-Box sites | P$IBOX.01 | 0.81 | 1419-1435 | (−) | 0.75 | 0.824
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 1429-1445 | (−) | 1 | 0.958
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 1457-1473 | (+) | 0.82 | 0.798
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.02 | 0.77 | 1458-1482 | (+) | 0.75 | 0.786
P$LFYB | LFY binding site | P$LFY.01 | 0.93 | 1486-1498 | (−) | 0.91 | 0.987
P$CAAT | CCAAT binding factors | P$CAAT.01 | 0.97 | 1490-1498 | (−) | 1 | 0.982
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 1526-1540 | (+) | 1 | 0.833
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 1550-1560 | (−) | 1 | 0.93
P$IDDF | ID domain factors | P$ID1.01 | 0.92 | 1563-1575 | (+) | 1 | 0.952
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 1565-1579 | (+) | 0.75 | 0.845
O$VTBP | Vertebrate TATA binding protein factor | O$MTATA.01 | 0.84 | 1570-1586 | (+) | 1 | 0.846
P$DOFF | DNA binding with one finger (DOF) | P$PBF.01 | 0.97 | 1571-1587 | (+) | 1 | 0.988
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 1572-1598 | (−) | 1 | 0.898
P$MADS | MADS box proteins | P$AGL3.01 | 0.83 | 1637-1657 | (+) | 1 | 0.851
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 1654-1670 | (−) | 1 | 0.909
P$URNA | Upstream sequence element of U-snRNA genes | P$USE.01 | 0.75 | 1659-1675 | (+) | 1 | 0.758
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.90 | 1671-1681 | (−) | 1 | 0.989
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 1671-1681 | (+) | 1 | 0.955
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSL.01 | 0.69 | 1677-1697 | (+) | 1 | 0.763
P$GBOX | Plant G-box/C-box bZIP proteins | P$GBF1.01 | 0.94 | 1682-1702 | (−) | 1 | 0.968
P$ABRE | ABA response elements | P$ABRE.01 | 0.82 | 1685-1701 | (−) | 1 | 0.855
P$BRRE | Brassinosteroid (BR) response element | P$BZR1.01 | 0.95 | 1696-1712 | (−) | 1 | 0.954
P$GBOX | Plant G-box/C-box bZIP proteins | P$GBF1.01 | 0.94 | 1696-1716 | (−) | 1 | 0.963
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 1696-1716 | (−) | 0.96 | 0.826
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 1698-1714 | (−) | 0.95 | 0.824
P$DPBF | Dc3 promoter binding factors | P$DPBF.01 | 0.89 | 1700-1710 | (+) | 1 | 0.943
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 1701-1727 | (−) | 1 | 0.887
P$LEGB | Legumin Box family | P$IDE1.01 | 0.77 | 1708-1734 | (+) | 1 | 0.871
P$MYBS | MYB proteins with single DNA binding repeat | P$TAMYB80.01 | 0.83 | 1727-1743 | (+) | 1 | 0.85
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.02 | 0.77 | 1740-1764 | (+) | 1 | 0.786
P$GBOX | Plant G-box/C-box bZIP proteins | P$EMBP1.01 | 0.84 | 1747-1767 | (−) | 1 | 0.84
P$ABRE | ABA response elements | P$ABRE.01 | 0.82 | 1750-1766 | (−) | 1 | 0.831
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 1756-1772 | (+) | 1 | 0.963
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 1765-1781 | (−) | 1 | 0.781

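Tables 3 and 4 list the matches found before and after permutation of the VfSBP promoter, so comparing them shows which boxes the permutation preserved, removed or introduced. The Python sketch below does this for the matches starting at or before position 160 in both tables. It assumes that the permutated sequence keeps the same coordinate frame, so that a match can be identified by (matrix, from, to, strand); the many identical positions shared by the two tables are consistent with this, but the document does not state it. `compare_hits` is a hypothetical helper.

```python
from typing import Dict, Set, Tuple

Hit = Tuple[str, int, int, str]  # (matrix, from, to, strand)

# Matches starting at or before position 160 in Table 3 (native p-VfSBP).
NATIVE: Set[Hit] = {
    ("P$MYBST1.01", 12, 28, "+"),
    ("P$BPC.01", 25, 49, "-"),
    ("P$IDE1.01", 80, 106, "-"),
    ("P$GT3A.01", 85, 101, "-"),
    ("P$GAAA.01", 101, 117, "-"),
    ("P$SP8BF.01", 118, 130, "+"),
    ("P$HBP1B.01", 138, 158, "+"),
}

# The same window in Table 4 (p-VfSBP_perm).
PERMUTED: Set[Hit] = {
    ("P$MYBST1.01", 12, 28, "+"),
    ("P$AGP1.01", 25, 35, "-"),
    ("P$BPC.01", 25, 49, "-"),
    ("P$AGP1.01", 26, 36, "+"),
    ("P$IDE1.01", 80, 106, "-"),
    ("P$GT3A.01", 85, 101, "-"),
    ("P$GAAA.01", 101, 117, "-"),
    ("P$HBP1B.01", 138, 158, "+"),
    ("P$ERE.01", 154, 170, "-"),
}


def compare_hits(before: Set[Hit], after: Set[Hit]) -> Dict[str, Set[Hit]]:
    """Classify matches as kept, lost or gained between two scans."""
    return {
        "kept": before & after,
        "lost": before - after,
        "gained": after - before,
    }


diff = compare_hits(NATIVE, PERMUTED)
for label in ("kept", "lost", "gained"):
    print(label, sorted(diff[label], key=lambda h: h[1]))
```

In this window the sketch reports six shared matches, one lost match (P$SP8BF.01 at 118-130) and three new ones (two P$AGP1.01 matches at 25-36 and P$ERE.01 at 154-170). Using exact coordinates as the key is a deliberate simplification; a match that shifted by a few bases would be counted as both lost and gained.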















TABLE 4

Boxes and Motifs identified in the permutated sequence of the VfSBP promoter. Preferably associated boxes are annotated in lines 8, 14, 26, 56, 58, 59, 66, 121, 144, 148, 158, 185, 200, 201, 211, 215, 218, 219, 220, 225, 226 and 228 of tables 3 and 4. Essential boxes are annotated in lines 130, 132 and 146 of tables 3 and 4.

p-VfSBP_perm

Family | Further Family Information | Matrix | Opt. | Position from-to | Strand | Core sim. | Matrix sim.
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.90 | 12-28 | (+) | 1 | 0.918
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 25-35 | (−) | 1 | 0.914
P$GAGA | GAGA elements | P$BPC.01 | 1.00 | 25-49 | (−) | 1 | 1
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 26-36 | (+) | 1 | 0.914
P$LEGB | Legumin Box family | P$IDE1.01 | 0.77 | 80-106 | (−) | 1 | 0.805
P$GTBX | GT-box elements | P$GT3A.01 | 0.83 | 85-101 | (−) | 1 | 0.843
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 101-117 | (−) | 1 | 0.883
P$GBOX | Plant G-box/C-box bZIP proteins | P$HBP1B.01 | 0.83 | 138-158 | (+) | 1 | 0.834
P$WBXF | W Box family | P$ERE.01 | 0.89 | 154-170 | (−) | 1 | 0.935
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 165-181 | (−) | 0.78 | 0.788
P$NACF | Plant specific NAC [NAM (no apical meristem), ATAF1/2, CUC2 (cup-shaped cotyledons 2)] transcription factors | P$TANAC69.01 | 0.68 | 173-195 | (−) | 0.81 | 0.728
P$MADS | MADS box proteins | P$AGL1.01 | 0.84 | 174-194 | (−) | 0.98 | 0.856
P$MADS | MADS box proteins | P$AGL1.01 | 0.84 | 175-195 | (+) | 0.98 | 0.844
P$TCPF | DNA-binding proteins with the plant specific TCP-domain | P$ATTCP20.01 | 0.94 | 189-201 | (+) | 1 | 0.968
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.02 | 0.76 | 194-210 | (−) | 0.89 | 0.795
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 198-208 | (+) | 0.83 | 0.936
P$EINL | Ethylene insensitive 3 like factors | P$TEIL.01 | 0.92 | 215-223 | (−) | 0.96 | 0.924
P$GBOX | Plant G-box/C-box bZIP proteins | P$HBP1A.01 | 0.88 | 217-237 | (−) | 1 | 0.908
P$GBOX | Plant G-box/C-box bZIP proteins | P$GBF1.01 | 0.94 | 218-238 | (+) | 1 | 0.963
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 218-234 | (+) | 1 | 0.821
P$ABRE | ABA response elements | P$ABF1.03 | 0.82 | 219-235 | (+) | 1 | 0.825
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 221-245 | (−) | 1 | 0.803
P$CE1F | Coupling element 1 binding factors | P$SBOX.01 | 0.87 | 222-234 | (−) | 0.78 | 0.916
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.90 | 233-249 | (−) | 1 | 0.939
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 245-261 | (−) | 1 | 0.963
P$MYBS | MYB proteins with single DNA binding repeat | P$HVMCB1.01 | 0.93 | 248-264 | (+) | 1 | 0.957
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 256-266 | (+) | 0.94 | 0.896
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 256-266 | (−) | 0.88 | 0.871
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 260-276 | (+) | 1 | 0.819
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 290-300 | (−) | 1 | 0.931
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 292-302 | (−) | 1 | 0.984
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 293-303 | (+) | 1 | 0.915
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 306-316 | (+) | 1 | 0.938
P$MYBL | MYB-like proteins | P$CARE.01 | 0.83 | 308-324 | (−) | 1 | 0.854
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 319-335 | (+) | 1 | 0.87
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 322-332 | (+) | 1 | 0.969
P$MADS | MADS box proteins | P$AGL15.01 | 0.79 | 345-365 | (+) | 0.85 | 0.825
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 354-368 | (+) | 1 | 0.895
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 375-389 | (−) | 1 | 0.861
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 392-408 | (−) | 1 | 0.87
P$MYBL | MYB-like proteins | P$WER.01 | 0.87 | 394-410 | (+) | 1 | 0.95
P$MSAE | M-phase-specific activator elements | P$MSA.01 | 0.80 | 395-409 | (−) | 0.75 | 0.808
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 402-416 | (−) | 1 | 0.929
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 404-418 | (+) | 1 | 0.871
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.90 | 407-417 | (−) | 1 | 0.901
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 411-421 | (+) | 1 | 0.916
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 415-429 | (+) | 1 | 0.811
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 421-439 | (−) | 0.75 | 0.849
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 431-447 | (−) | 0.76 | 0.782
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 453-469 | (+) | 1 | 0.958
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 468-484 | (−) | 0.82 | 0.849
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 486-502 | (+) | 1 | 0.818
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 498-514 | (−) | 1 | 0.919
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 512-526 | (−) | 1 | 0.824
P$NCS2 | Nodulin consensus sequence 2 | P$NCS2.01 | 0.79 | 525-539 | (−) | 0.75 | 0.815
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 533-549 | (−) | 1 | 0.966
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 543-559 | (+) | 1 | 0.966
P$WBXF | W Box family | P$ERE.01 | 0.89 | 562-578 | (+) | 1 | 0.972
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 614-630 | (+) | 0.76 | 0.766
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 630-646 | (+) | 1 | 0.819
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 636-646 | (−) | 1 | 0.913
P$AGP1 | Plant GATA-type zinc finger protein | P$AGP1.01 | 0.91 | 637-647 | (+) | 1 | 0.921
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 640-656 | (−) | 1 | 0.918
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 649-663 | (+) | 0.78 | 0.87
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 654-668 | (−) | 1 | 0.815
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 660-670 | (−) | 1 | 0.944


P$PREM
Motifs of plastid
P$MGPROTORE.01
0.77
691-721
(−)
1
0.789



response



elements


P$GAPB
GAP-Box (light
P$GAP.01
0.88
702-716
(−)
1
0.897



response elements)


P$GTBX
GT-box elements
P$GT1.01
0.85
723-739
(−)
1
0.925


P$AHBP

Arabidopsis

P$WUS.01
0.94
726-736
(−)
1
1



homeobox protein


P$CARM
CA-rich motif
P$CARICH.01
0.78
731-749
(+)
1
0.855


P$MYCL
Myc-like basic
P$ICE.01
0.95
734-752
(+)
0.95
0.961



helix-loop-helix



binding factors


P$MYBL
MYB-like proteins
P$GAMYB.01
0.91
773-789
(+)
1
0.951


P$GTBX
GT-box elements
P$GT3A.01
0.83
775-791
(+)
1
0.899


P$MYBL
MYB-like proteins
P$CARE.01
0.83
801-817
(−)
1
0.837


O$VTBP
Vertebrate TA-
O$ATATA.01
0.78
803-819
(−)
1
0.811



TA binding



protein factor


P$L1BX
L1 box, motif
P$PDF2.01
0.85
814-830
(−)
1
0.869



for L1 layer-



specific expression


P$GTBX
GT-box elements
P$GT1.01
0.85
815-831
(−)
0.97
0.854


O$VTBP
Vertebrate TA-
O$ATATA.01
0.78
819-835
(−)
0.75
0.874



TA binding



protein factor


P$MADS
MADS box
P$AGL15.01
0.79
828-848
(+)
1
0.857



proteins


P$CCAF
Circadian control
P$CCA1.01
0.85
843-857
(−)
1
0.883



factors


P$GTBX
GT-box elements
P$SBF1.01
0.87
844-860
(−)
1
0.948


P$CARM
CA-rich motif
P$CARICH.01
0.78
845-863
(+)
1
0.806


P$MYBL
MYB-like proteins
P$CARE.01
0.83
849-865
(−)
1
0.876


P$GTBX
GT-box elements
P$SBF1.01
0.87
869-885
(+)
1
0.883


P$RAV5
5′-part of bipartite
P$RAV1-5.01
0.96
882-892
(+)
1
0.96



RAV1 binding



site


P$L1BX
L1 box, motif
P$PDF2.01
0.85
884-900
(−)
0.85
0.853



for L1 layer-



specific expression


P$AHBP

Arabidopsis

P$WUS.01
0.94
888-898
(−)
1
1



homeobox protein


P$MYBL
MYB-like proteins
P$ATMYB77.01
0.87
895-911
(+)
1
0.962


P$GTBX
GT-box elements
P$SBF1.01
0.87
897-913
(+)
1
0.883


P$AHBP

Arabidopsis

P$BLR.01
0.90
906-916
(+)
1
1



homeobox protein


P$AHBP

Arabidopsis

P$BLR.01
0.90
907-917
(−)
1
0.903



homeobox protein


P$CARM
CA-rich motif
P$CARICH.01
0.78
908-926
(−)
1
0.826


P$MYBL
MYB-like proteins
P$NTMYBAS1.01
0.96
916-932
(−)
1
0.962


P$MIIG
MYB IIG-type
P$PALBOXP.01
0.81
918-932
(−)
0.94
0.817



binding sites


P$SPF1
Sweet potato
P$SP8BF.01
0.87
931-943
(−)
1
0.889



DNA-binding



factor with two



WRKY-



domains


P$L1BX
L1 box, motif
P$ATML1.01
0.82
948-964
(+)
1
0.908



for L1 layer-



specific expression


P$AHBP

Arabidopsis

P$ATHB9.01
0.77
959-969
(+)
0.75
0.816



homeobox protein


P$AHBP

Arabidopsis

P$ATHB9.01
0.77
959-969
(−)
1
0.909



homeobox protein


P$AHBP

Arabidopsis

P$HAHB4.01
0.87
970-980
(+)
1
0.916



homeobox protein


P$AHBP

Arabidopsis

P$ATHB1.01
0.90
973-983
(+)
1
0.989



homeobox protein


P$AHBP

Arabidopsis

P$HAHB4.01
0.87
973-983
(−)
1
0.976



homeobox protein


P$IDDF
ID domain factors
P$ID1.01
0.92
976-988
(+)
1
0.928


P$AHBP

Arabidopsis

P$HAHB4.01
0.87
985-995
(+)
1
0.916



homeobox protein


P$GTBX
GT-box elements
P$SBF1.01
0.87
 985-1001
(+)
1
0.891


P$GTBX
GT-box elements
P$SBF1.01
0.87
 986-1002
(−)
1
0.877


P$AHBP

Arabidopsis

P$HAHB4.01
0.87
 992-1002
(−)
1
0.916



homeobox protein


P$IBOX
Plant I-Box
P$GATA.01
0.93
 995-1011
(+)
1
0.935



sites


P$LEGB
Legumin Box
P$LEGB.01
0.65
 998-1024
(+)
0.75
0.676



family


P$AHBP

Arabidopsis

P$HAHB4.01
0.87
1008-1018
(+)
1
0.937



homeobox protein


P$AHBP

Arabidopsis

P$WUS.01
0.94
1012-1022
(−)
1
1



homeobox protein


P$MYBL
MYB-like proteins
P$GAMYB.01
0.91
1022-1038
(−)
1
0.925


P$SPF1
Sweet potato
P$SP8BF.01
0.87
1029-1041
(−)
0.78
0.879



DNA-binding



factor with two



WRKY-



domains


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
1036-1054
(−)
1
0.83


P$AHBP

Arabidopsis

P$ATHB1.01
0.90
1054-1064
(+)
1
0.99



homeobox protein


P$AHBP

Arabidopsis

P$ATHB5.01
0.89
1054-1064
(−)
0.83
0.94



homeobox protein


P$GTBX
GT-box elements
P$GT3A.01
0.83
1066-1082
(+)
1
0.889


O$PTBP
Plant TATA
O$PTATA.02
0.90
1086-1100
(+)
1
0.94



binding protein



factor


O$VTBP
Vertebrate TA-
O$VTATA.01
0.90
1087-1103
(+)
0.89
0.927



TA binding



protein factor


O$PTBP
Plant TATA
O$PTATA.01
0.88
1088-1102
(+)
1
0.958



binding protein



factor


O$VTBP
Vertebrate TA-
O$VTATA.01
0.90
1089-1105
(+)
1
0.971



TA binding



protein factor


P$DOFF
DNA binding
P$DOF3.01
0.99
1098-1114
(+)
1
0.995



with one finger



(DOF)


P$E2FF
E2F-homolog
P$E2F.01
0.82
1117-1131
(−)
1
0.833



cell cycle regulators


P$SPF1
Sweet potato
P$SP8BF.01
0.87
1130-1142
(+)
1
0.881



DNA-binding



factor with two



WRKY-



domains


P$PSRE
Pollen-specific
P$GAAA.01
0.83
1146-1162
(+)
1
0.873



regulatory elements


P$GTBX
GT-box elements
P$S1F.01
0.79
1170-1186
(−)
1
0.797


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
1173-1191
(+)
1
0.813


P$MADS
MADS box
P$AGL2.01
0.82
1174-1194
(+)
1
0.9



proteins


P$AHBP

Arabidopsis

P$BLR.01
0.90
1189-1199
(+)
0.83
0.919



homeobox protein


P$IDDF
ID domain factors
P$ID1.01
0.92
1205-1217
(−)
1
0.97


P$DOFF
DNA binding
P$PBOX.01
0.75
1229-1245
(−)
0.76
0.763



with one finger



(DOF)


P$MYBL
MYB-like proteins
P$WER.01
0.87
1234-1250
(−)
0.94
0.88


O$PTBP
Plant TATA
O$PTATA.01
0.88
1241-1255
(+)
1
0.964



binding protein



factor


O$VTBP
Vertebrate TA-
O$VTATA.01
0.90
1242-1258
(+)
1
0.967



TA binding



protein factor


P$DOFF
DNA binding
P$PBOX.01
0.75
1265-1281
(−)
0.76
0.762



with one finger



(DOF)


P$GTBX
GT-box elements
P$GT3A.01
0.83
1265-1281
(+)
0.75
0.839


P$AHBP

Arabidopsis

P$BLR.01
0.90
1274-1284
(−)
1
0.928



homeobox protein


O$PTBP
Plant TATA
O$PTATA.01
0.88
1277-1291
(+)
1
0.908



binding protein



factor


O$VTBP
Vertebrate TA-
O$VTATA.01
0.90
1278-1294
(+)
1
0.918



TA binding



protein factor


P$OCSE
Enhancer element
P$OCSL.01
0.69
1278-1298
(+)
0.77
0.712



first identified



in the



promoter of the



octopine synthase



gene



(OCS) of the




Agrobacterium





tumefaciens T-




DNA


P$MYCL
Myc-like basic
P$MYCRS.01
0.93
1284-1302
(−)
0.86
0.933



helix-loop-helix



binding factors


P$TALE
TALE (3-aa
P$KN1_KIP.01
0.88
1289-1301
(−)
1
1



acid loop extension)



class



homeodomain



proteins


P$AREF
Auxin response
P$SEBF.01
0.96
1292-1304
(+)
1
0.98



element


P$MSAE
M-phase-
P$MSA.01
0.80
1295-1309
(−)
0.75
0.803



specific activator



elements


P$DOFF
DNA binding
P$PBOX.01
0.75
1296-1312
(−)
1
0.797



with one finger



(DOF)


P$MYBL
MYB-like proteins
P$WER.01
0.87
1310-1326
(−)
0.94
0.876


P$AHBP

Arabidopsis

P$BLR.01
0.90
1319-1329
(+)
1
0.93



homeobox protein


O$VTBP
Vertebrate TA-
O$ATATA.01
0.78
1323-1339
(−)
1
0.833



TA binding



protein factor


P$LREM
Light responsive
P$RAP22.01
0.85
1327-1337
(−)
1
0.936



element



motif, not modulated



by different



light



qualities


P$IBOX
Plant I-Box
P$GATA.01
0.93
1328-1344
(+)
1
0.939



sites


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
1334-1352
(+)
1
0.816


P$AHBP

Arabidopsis

P$ATHB5.01
0.89
1335-1345
(−)
0.83
0.904



homeobox protein


P$AHBP

Arabidopsis

P$BLR.01
0.90
1335-1345
(+)
1
0.998



homeobox protein


P$GTBX
GT-box elements
P$SBF1.01
0.87
1338-1354
(+)
1
0.896


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
1338-1356
(−)
1
0.819


P$AHBP

Arabidopsis

P$ATHB5.01
0.89
1345-1355
(+)
0.83
0.902



homeobox protein


P$AHBP

Arabidopsis

P$BLR.01
0.90
1345-1355
(−)
1
0.998



homeobox protein


P$AGP1
Plant GATA-
P$AGP1.01
0.91
1354-1364
(−)
1
0.916



type zinc finger



protein


P$AHBP

Arabidopsis

P$HAHB4.01
0.87
1365-1375
(−)
1
0.896



homeobox protein


O$VTBP
Vertebrate TA-
O$VTATA.01
0.90
1376-1392
(−)
1
0.949



TA binding



protein factor


P$HMGF
High mobility
P$HMG_IY.01
0.89
1377-1391
(+)
1
0.952



group factors


O$PTBP
Plant TATA
O$PTATA.01
0.88
1379-1393
(−)
1
0.883



binding protein



factor


P$IDDF
ID domain factors
P$ID1.01
0.92
1387-1399
(+)
1
0.926


P$MYBL
MYB-like proteins
P$GAMYB.01
0.91
1389-1405
(+)
1
0.939


O$INRE
Core promoter
O$DINR.01
0.94
1392-1402
(+)
1
0.943



initiator elements


P$IBOX
Plant I-Box
P$IBOX.01
0.81
1399-1415
(−)
0.75
0.822



sites


P$MYBL
MYB-like proteins
P$WER.01
0.87
1410-1426
(+)
1
0.875


P$SPF1
Sweet potato
P$SP8BF.01
0.87
1412-1424
(+)
1
0.91



DNA-binding



factor with two



WRKY-



domains


O$VTBP
Vertebrate TA-
O$LTATA.01
0.82
1417-1433
(−)
1
0.847



TA binding



protein factor


P$IBOX
Plant I-Box
P$IBOX.01
0.81
1419-1435
(−)
0.75
0.824



sites


P$WBXF
W Box family
P$WRKY.01
0.92
1429-1445
(−)
1
0.958


P$MYBL
MYB-like proteins
P$MYBPH3.02
0.76
1457-1473
(+)
0.82
0.798


P$ROOT
Root hair-
P$RHE.02
0.77
1458-1482
(+)
0.75
0.786



specific cis-



elements in



angiosperms


P$LFYB
LFY binding
P$LFY.01
0.93
1486-1498
(−)
0.91
0.987



site


P$CAAT
CCAAT binding
P$CAAT.01
0.97
1490-1498
(−)
1
0.982



factors


P$HEAT
Heat shock
P$HSE.01
0.81
1526-1540
(+)
1
0.833



factors


P$GTBX
GT-box elements
P$GT1.01
0.85
1536-1552
(−)
0.84
0.869


P$WBXF
W Box family
P$ERE.01
0.89
1537-1553
(+)
1
0.9


P$SPF1
Sweet potato
P$SP8BF.01
0.87
1546-1558
(+)
1
0.919



DNA-binding



factor with two



WRKY-



domains


P$AHBP

Arabidopsis

P$BLR.01
0.90
1550-1560
(−)
1
0.93



homeobox protein


P$LREM
Light responsive
P$RAP22.01
0.85
1555-1565
(−)
1
0.882



element



motif, not modulated



by different



light



qualities


P$NCS1
Nodulin consensus
P$NCS1.01
0.85
1559-1569
(−)
0.8
0.855



sequence 1


P$GARP
Myb-related
P$ARR10.01
0.97
1560-1568
(+)
1
0.97



DNA binding



proteins (Golden2,



ARR, Psr)


P$IDDF
ID domain factors
P$ID1.01
0.92
1563-1575
(+)
1
0.952


P$NCS2
Nodulin consensus
P$NCS2.01
0.79
1565-1579
(+)
0.75
0.845



sequence 2


O$VTBP
Vertebrate TA-
O$MTATA.01
0.84
1570-1586
(+)
1
0.846



TA binding



protein factor


P$DOFF
DNA binding
P$PBF.01
0.97
1571-1587
(+)
1
0.988



with one finger



(DOF)


P$LEGB
Legumin Box
P$RY.01
0.87
1572-1598
(−)
1
0.898



family


P$NCS2
Nodulin consensus
P$NCS2.01
0.79
1610-1624
(+)
1
0.867



sequence 2


P$MADS
MADS box
P$AGL3.01
0.83
1637-1657
(+)
1
0.851



proteins


P$GTBX
GT-box elements
P$GT3A.01
0.83
1652-1668
(−)
1
0.854


P$MYBL
MYB-like proteins
P$NTMYBAS1.01
0.96
1654-1670
(−)
1
0.971


P$AHBP

Arabidopsis

P$HAHB4.01
0.87
1671-1681
(+)
1
0.934



homeobox protein


P$OCSE
Enhancer element
P$OCSL.01
0.69
1677-1697
(+)
1
0.763



first identified



in the



promoter of the



octopine synthase



gene



(OCS) of the




Agrobacterium





tumefaciens T-




DNA


P$GBOX
Plant G-box/C-
P$GBF1.01
0.94
1682-1702
(−)
1
0.968



box bZIP proteins


P$ABRE
ABA response
P$ABRE.01
0.82
1685-1701
(−)
1
0.855



elements


P$BRRE
Brassinosteroid
P$BZR1.01
0.95
1696-1712
(−)
1
0.954



(BR) response



element


P$GBOX
Plant G-box/C-
P$GBF1.01
0.94
1696-1716
(−)
1
0.963



box bZIP proteins


P$TEFB
TEF-box
P$TEF1.01
0.76
1696-1716
(−)
0.84
0.799


P$DPBF
Dc3 promoter
P$DPBF.01
0.89
1700-1710
(+)
1
0.943



binding factors


P$EREF
Ethylene response
P$ANT.01
0.81
1701-1717
(+)
1
0.862



element



factors


P$LEGB
Legumin Box
P$RY.01
0.87
1701-1727
(−)
1
0.925



family


P$LEGB
Legumin Box
P$RY.01
0.87
1704-1730
(+)
1
0.967



family


P$LEGB
Legumin Box
P$IDE1.01
0.77
1708-1734
(+)
1
0.888



family


P$MADS
MADS box
P$MADS.01
0.75
1722-1742
(+)
1
0.758



proteins


P$MYBS
MYB proteins
P$TAMYB80.01
0.83
1727-1743
(+)
1
0.861



with single



DNA binding



repeat


P$URNA
Upstream sequence
P$USE.01
0.75
1731-1747
(+)
1
0.77



element



of U-



snRNA genes


P$ROOT
Root hair-
P$RHE.02
0.77
1740-1764
(+)
1
0.79



specific cis-



elements in



angiosperms


P$GBOX
Plant G-box/C-
P$EMBP1.01
0.84
1747-1767
(−)
1
0.84



box bZIP proteins


P$ABRE
ABA response
P$ABRE.01
0.82
1750-1766
(−)
1
0.831



elements


O$VTBP
Vertebrate TA-
O$VTATA.01
0.90
1756-1772
(+)
1
0.957



TA binding



protein factor


P$MYBL
MYB-like proteins
P$MYBPH3.02
0.76
1765-1781
(−)
1
0.781









1.2 Vector Construction

Using the Multisite Gateway System (Invitrogen, Carlsbad, Calif., USA), promoter::reporter-gene cassettes were assembled into binary constructs for plant transformation. The beta-Glucuronidase (GUS or uidA) gene, which encodes an enzyme for which various chromogenic substrates are known, was used as the reporter for determining the expression features of the permutated p-PvArc5_perm (SEQ ID NO2) and p-VfSBP_perm (SEQ ID NO4) promoter sequences.


The DNA fragments representing promoters p-PvArc5_perm (SEQ ID NO2) and p-VfSBP_perm (SEQ ID NO4) were generated by gene synthesis. Endonucleolytic restriction sites suitable for cloning the promoter fragments into beta-Glucuronidase reporter gene cassettes were included in the synthesis. The p-PvArc5_perm (SEQ ID NO2) promoter was cloned into a pENTR/A vector harboring the beta-Glucuronidase reporter gene c-GUS (with the prefix c- denoting coding sequence) followed by the t-PvArc (with the prefix t- denoting terminator) transcription terminator sequence using restriction endonucleases FseI and NcoI, yielding construct LJB2012. Similarly, the p-VfSBP_perm (SEQ ID NO4) promoter was cloned into a pENTR/B vector harboring the beta-Glucuronidase reporter gene c-GUS followed by the t-StCatpA transcription terminator sequence using restriction endonucleases FseI and NcoI, yielding construct LJB2007.


The complementary pENTR vectors without any expression cassettes were constructed by introducing a multiple cloning site via KpnI and HindIII restriction sites. By performing a site-specific recombination (LR reaction), the created pENTR/A, pENTR/B and pENTR/C were combined with the pSUN destination vector (pSUN derivative) according to the manufacturer's (Invitrogen, Carlsbad, Calif., USA) Multisite Gateway manual. The reactions yielded a binary vector with the p-PvArc5_perm (SEQ ID NO2) promoter, the beta-Glucuronidase coding sequence c-GUS and the t-PvArc terminator, for which the full construct sequence is given (SEQ ID NO7), and, accordingly, a binary vector with the p-VfSBP_perm (SEQ ID NO4) promoter, the beta-Glucuronidase reporter gene and the t-StCatpA terminator, for which the full construct sequence is given (SEQ ID NO8). The resulting plant transformation vectors are summarized in table 5:









TABLE 5

Plant expression vectors for B. napus transformation

Plant expression vector | Composition of the expression cassette (promoter::reporter gene::terminator) | SEQ ID NO
LJB2045 | p-PvArc5_perm::c-GUS::t-PvArc | 7
LJB2043 | p-VfSBP_perm::c-GUS::t-StCatpA | 8









1.3 Generation of Transgenic Rapeseed Plants (Amended Protocol According to Moloney et al., 1992, Plant Cell Reports, 8: 238-242)

In preparation for the generation of transgenic rapeseed plants, the binary vectors were transformed into Agrobacterium tumefaciens C58C1:pGV2260 (Deblaere et al., 1985, Nucl. Acids Res. 13: 4777-4788). A 1:50 dilution of an overnight culture of Agrobacteria harboring the respective binary construct was grown in Murashige-Skoog medium (Murashige and Skoog, 1962, Physiol. Plant 15, 473) supplemented with 3% saccharose (3MS medium). For the transformation of rapeseed plants, petioles or hypocotyls of sterile plants were incubated with a 1:50 Agrobacterium solution for 5-10 minutes, followed by a three-day co-incubation in darkness at 25° C. on 3MS medium supplemented with 0.8% Bacto agar. After three days, the explants were transferred to MS medium containing 500 mg/l Claforan (cefotaxime sodium), 100 nM imazethapyr, 20 µM benzylaminopurine (BAP) and 1.6 g/l glucose under a 16 h light/8 h dark regime; this step was repeated at weekly intervals. Growing shoots were transferred to MS medium containing 2% saccharose, 250 mg/l Claforan and 0.8% Bacto agar. After 3 weeks, the growth hormone indole-butyric acid was added to the medium to promote root formation. Following root development, shoots were transferred to soil, grown for two weeks in a growth chamber and grown to maturity under greenhouse conditions.
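To help when preparing the regeneration medium, the molar concentrations above can be converted to mass concentrations. The short Python sketch below assumes approximate molar masses of 225.25 g/mol for BAP (C12H11N5) and 289.33 g/mol for imazethapyr (C15H19N3O3); these values are not stated in the protocol, and the helper name is hypothetical.

```python
# Approximate molar masses (g/mol) from standard atomic weights.
# Assumed values for illustration; they are not part of the original protocol.
MOLAR_MASS_G_PER_MOL = {
    "BAP": 225.25,          # 6-benzylaminopurine, C12H11N5
    "imazethapyr": 289.33,  # C15H19N3O3
}


def molar_to_mg_per_l(concentration_molar: float, molar_mass: float) -> float:
    """Convert a concentration in mol/L to mg/L."""
    return concentration_molar * molar_mass * 1000.0


if __name__ == "__main__":
    # Concentrations from the regeneration medium: 20 µM BAP, 100 nM imazethapyr.
    bap = molar_to_mg_per_l(20e-6, MOLAR_MASS_G_PER_MOL["BAP"])
    imaz = molar_to_mg_per_l(100e-9, MOLAR_MASS_G_PER_MOL["imazethapyr"])
    print(f"BAP: {bap:.2f} mg/l")                  # about 4.5 mg/l
    print(f"imazethapyr: {imaz * 1000:.1f} µg/l")  # about 28.9 µg/l
```

On these assumptions, 20 µM BAP corresponds to roughly 4.5 mg/l and 100 nM imazethapyr to roughly 29 µg/l.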


Example 2: Expression Profile of the p-PvArc5_perm and p-VfSBP_perm Gene Control Elements

To demonstrate and analyze the transcription-regulating properties of a promoter, it is useful to operably link the promoter or its fragments to a reporter gene that can be used to monitor its expression both qualitatively and quantitatively. Preferably, bacterial β-glucuronidase is used (Jefferson 1987). β-Glucuronidase activity can be monitored in planta with chromogenic substrates such as 5-bromo-4-chloro-3-indolyl-β-D-glucuronic acid in corresponding activity assays (Jefferson 1987). For determination of promoter activity and tissue specificity, plant tissue is dissected, stained and analyzed as described (e.g., Bäumlein 1991).


The regenerated transgenic T0 rapeseed plants harboring single or double insertions of the transgene deriving from constructs LJB2043 or LJB2045 were used for reporter gene analysis.


Table 6 summarizes the reporter gene activity observed in plants harboring transgenes containing SEQ ID NO2 and SEQ ID NO4 in constructs LJB2045 and LJB2043, respectively:









TABLE 6

beta-Glucuronidase reporter gene activity in selected rapeseed plants harboring transgenes with SEQ ID NO2 (p-PvArc5_perm) and SEQ ID NO4 (p-VfSBP_perm) compared to the GUS expression derived from the respective starting sequence in rapeseed (p-VfSBP) or in Phaseolus and Arabidopsis plants (p-PvArc5).

Tissue | p-VfSBP_perm (LJB2043) | p-VfSBP | p-PvArc5_perm (LJB2045) | p-PvArc5*
leaves | negative | negative | negative | negative
stem | negative | negative | negative | negative
roots | negative | negative | negative | negative
flower | negative | negative | negative | negative
silique (without seed) | negative | not analyzed | negative | not assayed
embryo (early) | weak | weak | strong | strong**
embryo (young) | weak | weak | strong | strong**
embryo (medium) | strong | strong | strong | strong**
embryo (mature) | strong | strong | strong | strong
seed shell | weak | not analyzed | strong | strong

*expression in Phaseolus and Arabidopsis according to Goossens et al.
**no separate analyses of the different embryo stages (early, young and medium)






Examples of the gene expression activity conferred by p-PvArc5_perm and p-VfSBP_perm are shown in FIG. 1 (p-PvArc5_perm) and FIG. 2 (p-VfSBP_perm).


General results for SEQ ID NO2: Strong GUS expression was detected in all stages of embryo development and in seed shells. No activity was found in the other tissues analyzed.


General results for SEQ ID NO4: Weak GUS expression was detected in early and young embryo stages; strong GUS expression was observed in medium and mature embryos. Weak expression was observed in seed shells. No activity was found in the other tissues investigated.
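Table 6 compares each permutated promoter with its starting sequence, and this comparison can be made explicit by encoding the table and listing the tissues where two promoters disagree. The Python sketch below is illustrative only: tissues that were not analyzed or not assayed are encoded as None and skipped, the jointly reported early/young/medium embryo result for p-PvArc5 is entered as "strong" for each of those stages, and all names are hypothetical.

```python
from typing import Dict, List, Optional

TISSUES = [
    "leaves", "stem", "roots", "flower", "silique (without seed)",
    "embryo (early)", "embryo (young)", "embryo (medium)",
    "embryo (mature)", "seed shell",
]

# Columns of Table 6 in TISSUES order; None = "not analyzed" / "not assayed".
PROFILES: Dict[str, List[Optional[str]]] = {
    "p-VfSBP_perm": ["negative"] * 5 + ["weak", "weak", "strong", "strong", "weak"],
    "p-VfSBP": ["negative"] * 4 + [None, "weak", "weak", "strong", "strong", None],
    "p-PvArc5_perm": ["negative"] * 5 + ["strong"] * 5,
    # Early, young and medium embryo stages were reported jointly as "strong".
    "p-PvArc5": ["negative"] * 4 + [None] + ["strong"] * 5,
}


def differing_tissues(a: str, b: str) -> List[str]:
    """Tissues assayed for both promoters where the reported GUS activity differs."""
    return [
        tissue
        for tissue, x, y in zip(TISSUES, PROFILES[a], PROFILES[b])
        if x is not None and y is not None and x != y
    ]


if __name__ == "__main__":
    print(differing_tissues("p-VfSBP_perm", "p-VfSBP"))
    print(differing_tissues("p-PvArc5_perm", "p-PvArc5"))
```

Both permutated-versus-starting comparisons return an empty list, whereas comparing p-VfSBP_perm with p-PvArc5_perm returns the early and young embryo stages and the seed shell.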


Example 3
3.1 Random Permutation of the Promoter Sequence

Using publicly available data, a promoter showing seed-specific expression in plants was selected for analyzing the effects of sequence permutation at periodic intervals throughout the full length of the promoter DNA sequence. The wild-type sequence of the Brassica napus p-BnNapin promoter was analyzed and annotated for the occurrence of cis-regulatory elements using available literature data (Ellerström et al., Ericson et al., Ezcurra et al.). Subsequently, the DNA sequence of the promoter was permutated in the region from −1000 to +1 to yield p-BnNapin_perm (SEQ ID NO6) according to the following criteria: the permutation was conducted so as not to affect cis-regulatory elements previously shown to be essential for seed-specific gene expression, or other motifs essential for gene expression. The remaining promoter sequence was randomly permutated, resulting in a promoter sequence with an overall nucleotide homology of 75% to the initial p-BnNapin sequence.
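The random permutation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to generate SEQ ID NO6: "permutation" is modeled as random single-nucleotide substitutions outside protected intervals, the protected intervals (0-based, half-open) stand in for the essential cis-regulatory elements, and the function names are hypothetical. Because each substituted position is forced to a different base, identity to the starting sequence lands at the requested value up to rounding. The sketch does not screen for cis-regulatory elements that substitution may create de novo.

```python
import random
from typing import List, Optional, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def permute_sequence(
    seq: str,
    protected: List[Tuple[int, int]],
    target_identity: float,
    seed: Optional[int] = None,
) -> str:
    """Randomly substitute nucleotides outside `protected` so that the result
    shares `target_identity` (up to rounding) with `seq`.

    protected: 0-based, half-open (start, end) intervals left unchanged,
    standing in for essential cis-regulatory elements (an assumption).
    """
    rng = random.Random(seed)
    seq = seq.upper()
    locked = set()
    for start, end in protected:
        locked.update(range(start, end))
    free = [i for i in range(len(seq)) if i not in locked]

    n_changes = round(len(seq) * (1.0 - target_identity))
    if n_changes > len(free):
        raise ValueError("target identity unreachable without altering protected sites")

    out = list(seq)
    for i in rng.sample(free, n_changes):
        # Force a different base so every sampled position becomes a mismatch.
        out[i] = rng.choice([b for b in "ACGT" if b != seq[i]])
    return "".join(out)
```

For a region of roughly 1000 nucleotides and a target of 0.75, roughly 250 unprotected positions are substituted while the protected intervals are copied unchanged.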


3.2 Vector Construction

Using the Multisite Gateway System (Invitrogen, Carlsbad, Calif., USA), promoter::reporter-gene cassettes were assembled into binary constructs for plant transformation. The beta-Glucuronidase (GUS or uidA) gene, which encodes an enzyme for which various chromogenic substrates are known, was used as the reporter for determining the expression features of the permutated p-BnNapin_perm (SEQ ID NO6) promoter sequence.


The DNA fragment representing promoter p-BnNapin_perm (SEQ ID NO6) was generated by gene synthesis. Endonucleolytic restriction sites suitable for cloning the promoter fragment into a beta-Glucuronidase reporter gene cassette were included in the synthesis. The p-BnNapin_perm (SEQ ID NO6) promoter was cloned into a pENTR/A vector harboring the beta-Glucuronidase reporter gene c-GUS (with the prefix c- denoting coding sequence) followed by the t-nos transcription terminator sequence using restriction endonucleases BamHI and NcoI, yielding pENTR/A LLL1168.


A 1138 bp DNA fragment representing the native promoter p-BnNapin (SEQ ID NO5) was generated by PCR with the following primers.











SEQ ID NO 11 (Loy963): GATATAGGTACCTCTTCATCGGTGATTGATTCCT

SEQ ID NO 12 (Loy964): GATATACCATGGTCGTGTATGTTTTTAATCTTGTTTG






Endonucleolytic restriction sites suitable for cloning the promoter fragment into a beta-Glucuronidase reporter gene cassette were included in the primers. p-BnNapin (SEQ ID NO5) promoter was cloned into a pENTR/A vector harboring the beta-Glucuronidase reporter gene c-GUS (with the prefix c- denoting coding sequence) followed by the t-nos transcription terminator sequence using restriction endonucleases KpnI and NcoI, yielding pENTR/A LLL1166.
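The presence of the cloning sites in the two primers can be checked with a short script. The Python sketch below scans each primer for the recognition sequences of the restriction enzymes named in this document (KpnI GGTACC, NcoI CCATGG, BamHI GGATCC, HindIII AAGCTT, FseI GGCCGGCC); since all five sites are palindromic, scanning one strand suffices. The function name is hypothetical.

```python
from typing import Dict, List

# Recognition sequences of restriction enzymes named in this document.
# All are palindromic, so scanning the given strand is sufficient.
SITES: Dict[str, str] = {
    "KpnI": "GGTACC",
    "NcoI": "CCATGG",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "FseI": "GGCCGGCC",
}

PRIMERS: Dict[str, str] = {
    "Loy963": "GATATAGGTACCTCTTCATCGGTGATTGATTCCT",     # SEQ ID NO 11
    "Loy964": "GATATACCATGGTCGTGTATGTTTTTAATCTTGTTTG",  # SEQ ID NO 12
}


def find_sites(seq: str, sites: Dict[str, str] = SITES) -> Dict[str, List[int]]:
    """Map each enzyme to the 0-based start positions of its site in `seq`."""
    seq = seq.upper()
    hits: Dict[str, List[int]] = {}
    for enzyme, motif in sites.items():
        positions = [i for i in range(len(seq) - len(motif) + 1)
                     if seq.startswith(motif, i)]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, find_sites(primer))
```

Loy963 carries a single KpnI site and Loy964 a single NcoI site, each starting at position 6 (0-based) after a GATATA leader, consistent with the KpnI/NcoI cloning described above.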


By performing a site-specific recombination (LR reaction), the newly created pENTR/A vectors LLL1168 and LLL1166 were each combined with pENTR/B and pENTR/C and the pSUN destination vector (pSUN derivative) according to the manufacturer's (Invitrogen, Carlsbad, Calif., USA) Multisite Gateway manual. The reactions yielded binary vector LLL1184 with the p-BnNapin_perm (SEQ ID NO6) promoter, the beta-Glucuronidase coding sequence c-GUS and the t-nos terminator, and binary vector LLL1176 with the native p-BnNapin (SEQ ID NO5) promoter, the beta-Glucuronidase coding sequence c-GUS and the t-nos terminator. For both vectors the full construct sequence is given (SEQ ID NO9 and 10, respectively). The resulting plant transformation vectors are shown in table 7:









TABLE 7

Plant expression vectors for A. thaliana transformation

Plant expression vector | Composition of the expression cassette (promoter::reporter gene::terminator) | SEQ ID NO
LLL1184 | p-BnNapin_perm::c-GUS::t-nos | 9
LLL1176 | p-BnNapin::c-GUS::t-nos | 10









3.3 Generation of Arabidopsis thaliana Plants


A. thaliana plants were grown in soil until they flowered. Agrobacterium tumefaciens (strain C58C1 [pMP90]) transformed with the construct of interest was grown in 500 mL of liquid YEB medium (5 g/L Beef extract, 1 g/L Yeast Extract (Duchefa), 5 g/L Peptone (Duchefa), 5 g/L sucrose (Duchefa), 0.49 g/L MgSO4 (Merck)) until the culture reached an OD600 of 0.8-1.0. The bacterial cells were harvested by centrifugation (15 minutes, 5,000 rpm) and resuspended in 500 mL infiltration solution (5% sucrose, 0.05% SILWET L-77 [distributed by Lehle seeds, Cat. No. VIS-02]). Flowering plants were dipped for 10-20 seconds into the Agrobacterium solution. Afterwards, the plants were kept in the dark for one day and then in the greenhouse until seeds could be harvested. Transgenic seeds were selected on soil by spraying them directly after sowing with a solution of 0.016 g/l Imazamox. After 12 to 14 days, surviving plants were transferred to pots and grown in the greenhouse.


Example 4: Expression Profile of the Native p-BnNapin and the p-BnNapin_perm Gene Control Elements

To demonstrate and analyze the transcription-regulating properties of a promoter, it is useful to operably link the promoter or its fragments to a reporter gene that can be used to monitor its expression both qualitatively and quantitatively. Preferably, bacterial β-glucuronidase is used (Jefferson 1987). β-Glucuronidase activity can be monitored in planta with chromogenic substrates such as 5-bromo-4-chloro-3-indolyl-β-D-glucuronic acid in corresponding activity assays (Jefferson 1987). For determination of promoter activity and tissue specificity, plant tissue is dissected, stained and analyzed as described (e.g., Bäumlein 1991).


The regenerated transgenic T0 Arabidopsis plants harboring single or double insertions of the transgene deriving from construct LLL1184 (SEQ ID NO9) or construct LLL1176 (SEQ ID NO10) were used for reporter gene analysis. Table 8 summarizes the reporter gene activity observed in plants harboring transgenes containing SEQ ID NO9 and SEQ ID NO10 in constructs LLL1184 and LLL1176, respectively:









TABLE 8

beta-Glucuronidase reporter gene activity in selected Arabidopsis plants harboring transgenes with SEQ ID NO9 or SEQ ID NO10, respectively.

Tissue | LLL1176 (p-BnNapin) | LLL1184 (p-BnNapin_perm)
leaves | negative | negative
stem | negative | negative
roots | negative | negative
flower | negative | negative
silique | weak | weak
embryo (medium) | strong | strong
embryo (mature) | strong | strong









Examples of the gene expression activity conferred by p-BnNapin and p-BnNapin_perm are shown in FIG. 3 (p-BnNapin, SEQ ID NO5; p-BnNapin_perm, SEQ ID NO6).


General results for SEQ ID NO5 and 6: For both promoters, p-BnNapin and p-BnNapin_perm, strong GUS expression was detected in medium to mature stages of embryo development. Weak expression was observed in seed shells and in siliques. No activity was found in the other tissues analyzed.


Example 5: Directed Permutation of a Constitutive Promoter Sequence

Using publicly available data, a promoter showing constitutive expression in plants was selected (de Pater, B. S., van der Mark, F., Rueb, S., Katagiri, F., Chua, N. H., Schilperoort, R. A. and Hensgens, L. A. (1992) The promoter of the rice gene GOS2 is active in various different monocot tissues and binds rice nuclear factor ASF-1. Plant J. 2 (6)) for analyzing the effects of sequence permutation at periodic intervals throughout the full length of the promoter DNA sequence. The wild-type or starting sequence of the Oryza sativa p-GOS2 promoter (SEQ ID NO 13) (with the prefix p- denoting promoter) was analyzed and annotated for the occurrence of motifs, boxes and cis-regulatory elements using, e.g., the GEMS Launcher software (www.genomatix.de) as described above in example 1.


The promoter p-Gos2 encompasses a 5′UTR sequence with an internal intron. To ensure correct splicing of the intron after permutation, the splice sites and the putative branch point were not altered. No nucleotide exchanges were introduced within 10 bp up- and downstream of the splice sites (5′ GT; 3′ CAG), and “TNA” sequence elements within the last 100 base pairs of the original p-Gos2 were preserved after permutation.
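The positional constraints described above can be turned into a set of positions that the permutation must leave untouched. The Python sketch below is a minimal illustration under assumptions not stated in the text: the intron boundaries are supplied as 0-based indices (the G of the 5′ GT, and the position just past the 3′ CAG), the 10 bp flanks are counted outward from each splice-site motif, and a “TNA” element is read as T, any nucleotide, A, starting within the last 100 bp. Function and parameter names are hypothetical.

```python
from typing import Set


def protected_positions(
    seq: str, intron_start: int, intron_end: int, flank: int = 10, tail: int = 100
) -> Set[int]:
    """Return 0-based positions that must not be changed during permutation.

    intron_start: index of the G of the 5' splice-site GT (hypothetical input).
    intron_end: index just past the G of the 3' splice-site CAG (hypothetical input).
    """
    seq = seq.upper()
    if seq[intron_start:intron_start + 2] != "GT":
        raise ValueError("no GT at the given 5' splice site")
    if seq[intron_end - 3:intron_end] != "CAG":
        raise ValueError("no CAG at the given 3' splice site")

    n = len(seq)
    locked: Set[int] = set()
    # 5' splice site: the GT itself plus `flank` nt on either side.
    locked.update(range(max(0, intron_start - flank), min(n, intron_start + 2 + flank)))
    # 3' splice site: the CAG itself plus `flank` nt on either side.
    locked.update(range(max(0, intron_end - 3 - flank), min(n, intron_end + flank)))
    # "TNA" elements (read here as T, any base, A) starting within the last `tail` nt.
    for i in range(max(0, n - tail), n - 2):
        if seq[i] == "T" and seq[i + 2] == "A":
            locked.update((i, i + 1, i + 2))
    return locked
```

The returned set could, for instance, be converted into the protected intervals used when substituting the remaining positions.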


Subsequently, the DNA sequence of the promoter was permutated according to the method of the invention to yield p-GOS2_perm1 and p-GOS2_perm2 (SEQ ID NO 14 and 15, respectively).


The motifs, boxes and cis-regulatory elements in the p-GOS2 promoters before and after permutation are listed in Table 9 for the starting sequence of p-GOS2, Table 10 for p-GOS2_perm1 (SEQ ID NO 14) and Table 11 for p-GOS2_perm2 (SEQ ID NO 15).


Empty lines represent motifs, boxes and cis-regulatory elements that are absent from one sequence but present in the corresponding sequence, i.e., elements that were deleted from the starting sequence or introduced into the permutated sequence.
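Identifying these empty lines amounts to a set difference between the matches found in the starting sequence and those found in a permutated sequence. The Python sketch below assumes matches are identified by matrix name and from-to position (a permutation without insertions or deletions keeps coordinates comparable); the record type and function are hypothetical and do not correspond to any Genomatix output format.

```python
from typing import List, NamedTuple, Tuple


class MotifMatch(NamedTuple):
    """One annotated match, e.g. a row of Tables 9 to 11 (hypothetical record type)."""
    family: str  # e.g. "P$NCS1"
    matrix: str  # e.g. "P$NCS1.01"
    start: int   # position "from"
    end: int     # position "to"


def diff_matches(
    original: List[MotifMatch], permuted: List[MotifMatch]
) -> Tuple[List[MotifMatch], List[MotifMatch]]:
    """Return (deleted, introduced) matches, each sorted by position.

    Matches are compared by (matrix, start, end); this assumes the permutation
    introduced no insertions or deletions, so coordinates remain comparable.
    """
    def key(m: MotifMatch) -> Tuple[str, int, int]:
        return (m.matrix, m.start, m.end)

    orig_by_key = {key(m): m for m in original}
    perm_by_key = {key(m): m for m in permuted}

    deleted = [orig_by_key[k] for k in orig_by_key.keys() - perm_by_key.keys()]
    introduced = [perm_by_key[k] for k in perm_by_key.keys() - orig_by_key.keys()]

    def order(m: MotifMatch) -> Tuple[int, int, str]:
        return (m.start, m.end, m.matrix)

    return sorted(deleted, key=order), sorted(introduced, key=order)
```

Applied to the rows of Table 9 and of Table 10, the first list would correspond to elements shown as empty lines for p-GOS2_perm1 and the second to elements shown as empty lines for the starting sequence.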









TABLE 9







Boxes and Motifs identified in the starting sequence of the p-GOS2 promoter













Position
Core
Matrix


p-GOS2

Position
sim.
sim.













Family
Further Family Information
Matrix
Opt.
from-to



















P$NCS1
Nodulin consensus sequence 1
P$NCS1.01
0.85
6
16
1
0.857


P$MSAE
M-phase-specific activator
P$MSA.01
0.8
15
29
1
0.832



elements


P$MYBL
MYB-like proteins
P$GAMYB.01
0.91
29
45
1
0.927


P$MYBL
MYB-like proteins
P$WER.01
0.87
33
49
1
0.897


P$MADS
MADS box proteins
P$AGL2.01
0.82
35
55
0.79
0.82


P$NACF
Plant specific NAC [NAM
P$IDEF2.01
0.96
48
60
1
0.96



(no apical meristem),



ATAF1/2, CUC2 (cup-



shaped cotyledons 2)]



transcription factors


P$BRRE
Brassinosteroid (BR) response
P$BZR1.01
0.95
48
64
1
0.954



element


O$PTBP
Plant TATA binding protein
O$PTATA.01
0.88
60
74
1
0.883



factor


O$VTBP
Vertebrate TATA binding
O$VTATA.01
0.9
61
77
1
0.961



protein factor


O$INRE
Core promoter initiator elements
O$DINR.01
0.94
65
75
0.97
0.94


O$VTBP
Vertebrate TATA binding
O$LTATA.01
0.82
69
85
1
0.842



protein factor


O$VTBP
Vertebrate TATA binding
O$VTATA.01
0.9
71
87
0.89
0.921



protein factor


O$YTBP
Yeast TATA binding protein
O$SPT15.01
0.83
74
90
1
0.832



factor


O$YTBP
Yeast TATA binding protein
O$SPT15.01
0.83
75
91
1
0.876



factor


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
76
92
0.75
0.781



protein factor


O$YTBP
Yeast TATA binding protein
O$SPT15.01
0.83
77
93
0.76
0.835



factor


P$MIIG
MYB IIG-type binding sites
P$PALBOXL.01
0.8
118
132
0.77
0.841


P$DOFF
DNA binding with one finger
P$DOF1.01
0.98
126
142
1
0.99



(DOF)


P$DOFF
DNA binding with one finger
P$PBF.01
0.97
149
165
1
0.989



(DOF)


P$WNAC
Wheat NAC-domain transcription
P$TANAC69.01
0.68
170
192
0.81
0.712



factors


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
187
203
1
0.922



protein factor


P$E2FF
E2F-homolog cell cycle
P$E2F.01
0.82
193
207
1
0.829



regulators


O$INRE
Core promoter initiator elements
O$DINR.01
0.94
200
210
0.97
0.945


P$AHBP

Arabidopsis homeobox

P$ATHB5.01
0.89
207
217
0.83
0.903



protein


P$AHBP

Arabidopsis homeobox

P$HAHB4.01
0.87
207
217
1
0.967



protein


P$CNAC
Calcium regulated NAC-
P$CBNAC.02
0.85
215
235
1
0.947



factors


P$MYBS
MYB proteins with single
P$PHR1.01
0.84
217
233
1
0.944



DNA binding repeat


P$OCSE
Enhancer element first
P$OCSL.01
0.69
216
236
1
0.722



identified in the promoter of



the octopine synthase gene



(OCS) of the Agrobacterium




tumefaciens T-DNA



P$MYBS
MYB proteins with single
P$PHR1.01
0.84
222
238
1
0.979



DNA binding repeat


P$GTBX
GT-box elements
P$SBF1.01
0.87
246
262
1
0.901


P$STKM
Storekeeper motif
P$STK.01
0.85
251
265
1
0.85


P$AHBP

Arabidopsis homeobox

P$ATHB5.01
0.89
254
264
0.83
0.904



protein


P$AHBP

Arabidopsis homeobox

P$BLR.01
0.9
254
264
1
0.998



protein


P$HEAT
Heat shock factors
P$HSFA1A.01
0.75
284
300
1
0.757


P$CCAF
Circadian control factors
P$CCA1.01
0.85
297
311
1
0.953


P$LFYB
LFY binding site
P$LFY.01
0.93
318
330
0.91
0.945


P$GAGA
GAGA elements
P$BPC.01
1
329
353
1
1


P$CCAF
Circadian control factors
P$EE.01
0.84
335
349
0.75
0.865


P$GAGA
GAGA elements
P$BPC.01
1
331
355
1
1


P$CCAF
Circadian control factors
P$CCA1.01
0.85
337
351
1
0.968


P$GTBX
GT-box elements
P$SBF1.01
0.87
341
357
1
0.875


P$MADS
MADS box proteins
P$SQUA.01
0.9
345
365
1
0.925


P$CCAF
Circadian control factors
P$EE.01
0.84
363
377
1
0.925


O$VTBP
Vertebrate TATA binding
O$MTATA.01
0.84
383
399
1
0.895



protein factor


P$CARM
CA-rich motif
P$CARICH.01
0.78
388
406
1
0.785


P$AHBP

Arabidopsis homeobox

P$HAHB4.01
0.87
397
407
1
0.902



protein


O$VTBP
Vertebrate TATA binding
O$LTATA.01
0.82
395
411
1
0.889



protein factor


O$VTBP
Vertebrate TATA binding
O$LTATA.01
0.82
396
412
1
0.844



protein factor


O$PTBP
Plant TATA binding protein
O$PTATA.01
0.88
398
412
1
0.892



factor


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
397
413
0.75
0.781



protein factor


P$AHBP

Arabidopsis homeobox

P$HAHB4.01
0.87
400
410
1
0.902



protein


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
402
418
0.75
0.781



protein factor


O$VTBP
Vertebrate TATA binding
O$VTATA.02
0.89
405
421
1
0.983



protein factor


O$PTBP
Plant TATA binding protein
O$PTATA.02
0.9
408
422
1
0.917



factor


P$OCSE
Enhancer element first
P$OCSTF.01
0.73
426
446
1
0.784



identified in the promoter of



the octopine synthase gene



(OCS) of the Agrobacterium




tumefaciens T-DNA



P$AHBP

Arabidopsis homeobox

P$HAHB4.01
0.87
440
450
1
0.926



protein


P$AHBP

Arabidopsis homeobox

P$WUS.01
0.94
444
454
1
1



protein


P$OPAQ
Opaque-2 like transcriptional
P$O2_GCN4.01
0.81
447
463
1
0.819



activators


P$SEF4
Soybean embryo factor 4
P$SEF4.01
0.98
472
482
1
0.984


P$GTBX
GT-box elements
P$SBF1.01
0.87
481
497
1
0.922


P$DOFF
DNA binding with one finger
P$DOF1.01
0.98
482
498
1
0.994



(DOF)


P$GTBX
GT-box elements
P$SBF1.01
0.87
482
498
1
0.9


P$WBXF
W Box family
P$WRKY11.01
0.94
493
509
1
0.963


P$SEF4
Soybean embryo factor 4
P$SEF4.01
0.98
504
514
1
0.994


P$IBOX
Plant I-Box sites
P$GATA.01
0.93
509
525
1
0.961


P$NCS1
Nodulin consensus sequence 1
P$NCS1.01
0.85
515
525
1
0.948


P$GTBX
GT-box elements
P$S1F.01
0.79
518
534
0.75
0.793


P$LREM
Light responsive element
P$RAP22.01
0.85
527
537
1
0.897



motif, not modulated by



different light qualities


P$L1BX
L1 box, motif for L1 layer-
P$ATML1.01
0.82
525
541
0.75
0.825



specific expression


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
539
555
0.75
0.782



protein factor


P$ROOT
Root hair-specific cis-
P$RHE.01
0.77
568
592
0.75
0.772



elements in angiosperms


P$ABRE
ABA response elements
P$ABRE.01
0.82
591
607
1
0.837


P$ASRC
AS1/AS2 repressor complex
P$AS1_AS2_II.01
0.86
599
607
1
0.867


P$L1BX
L1 box, motif for L1 layer-
P$HDG9.01
0.77
629
645
1
0.89



specific expression


P$L1BX
L1 box, motif for L1 layer-
P$HDG9.01
0.77
631
647
0.8
0.783



specific expression


P$L1BX
L1 box, motif for L1 layer-
P$ATML1.01
0.82
638
654
1
0.877



specific expression


P$CCAF
Circadian control factors
P$EE.01
0.84
649
663
1
0.899


P$DOFF
DNA binding with one finger
P$PBF.01
0.97
687
703
1
0.987



(DOF)


P$GTBX
GT-box elements
P$SBF1.01
0.87
689
705
1
0.888


P$AHBP

Arabidopsis homeobox

P$BLR.01
0.9
695
705
1
0.929



protein


P$CCAF
Circadian control factors
P$EE.01
0.84
694
708
1
0.954


P$LREM
Light responsive element
P$RAP22.01
0.85
701
711
1
1



motif, not modulated by



different light qualities


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
699
715
0.75
0.822



protein factor


P$HMGF
High mobility group factors
P$HMG_IY.01
0.89
711
725
1
0.929


P$AHBP

Arabidopsis homeobox

P$ATHB1.01
0.9
716
726
0.79
0.901



protein


P$AHBP

Arabidopsis homeobox

P$BLR.01
0.9
716
726
1
0.998



protein


O$VTBP
Vertebrate TATA binding
O$VTATA.02
0.89
716
732
1
0.893



protein factor


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
715
733
1
0.856


P$DOFF
DNA binding with one finger
P$PBOX.01
0.75
718
734
0.76
0.762



(DOF)


P$HEAT
Heat shock factors
P$HSE.01
0.81
718
734
1
0.833


P$GAPB
GAP-Box (light response
P$GAP.01
0.88
733
747
1
0.924



elements)


P$MYBL
MYB-like proteins
P$MYBPH3.02
0.76
744
760
0.78
0.834


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
754
770
0.75
0.831



protein factor


P$TELO
Telo box (plant interstitial
P$ATPURA.01
0.85
756
770
0.75
0.869



telomere motifs)


P$MYCL
Myc-like basic helix-loop-
P$OSBHLH66.01
0.85
789
807
1
0.851



helix binding factors


P$BRRE
Brassinosteroid (BR) response
P$BZR1.01
0.95
793
809
1
0.998



element


P$URNA
Upstream sequence element
P$USE.01
0.75
812
828
0.75
0.797



of U-snRNA genes


P$MADS
MADS box proteins
P$AGL1.01
0.84
812
832
1
0.895


P$MADS
MADS box proteins
P$AGL1.01
0.84
813
833
0.92
0.911


P$NCS1
Nodulin consensus sequence 1
P$NCS1.01
0.85
872
882
0.81
0.888


P$LREM
Light responsive element
P$RAP22.01
0.85
879
889
1
0.896



motif, not modulated by



different light qualities


P$MSAE
M-phase-specific activator
P$MSA.01
0.8
880
894
1
0.877



elements


P$MYBL
MYB-like proteins
P$NTMYBAS1.01
0.96
900
916
0.95
0.968


P$GTBX
GT-box elements
P$SBF1.01
0.87
909
925
1
0.905


P$MYBL
MYB-like proteins
P$AS1_AS2_I.01
0.99
911
927
1
1


P$LREM
Light responsive element
P$RAP22.01
0.85
981
991
1
0.893



motif, not modulated by



different light qualities


O$PTBP
Plant TATA binding protein
O$PTATA.02
0.9
982
996
1
0.951



factor


P$L1BX
L1 box, motif for L1 layer-
P$PDF2.01
0.85
982
998
1
0.884



specific expression


O$VTBP
Vertebrate TATA binding
O$VTATA.01
0.9
983
999
1
0.955



protein factor


P$MADS
MADS box proteins
P$AGL15.01
0.79
1006
1026
0.83
0.793


P$MYBS
MYB proteins with single
P$ZMMRP1.01
0.79
1008
1024
0.78
0.811



DNA binding repeat


O$PTBP
Plant TATA binding protein
O$PTATA.02
0.9
1010
1024
1
0.91



factor


P$CGCG
Calmodulin binding/
P$ATSR1.01
0.84
1051
1067
1
0.859



CGCG box binding proteins


P$ABRE
ABA response elements
P$ABF1.01
0.79
1053
1069
1
0.837


P$CE3S
Coupling element 3 sequence
P$CE3.01
0.77
1052
1070
1
0.893


P$NACF
Plant specific NAC [NAM
P$ANAC092.01
0.92
1055
1067
1
0.927



(no apical meristem),



ATAF1/2, CUC2 (cup-shaped



cotyledons 2)]



transcription factors


P$DPBF
Dc3 promoter binding factors
P$DPBF.01
0.89
1057
1067
1
0.908


P$PREM
Motifs of plastid response
P$MGPROTORE.01
0.77
1059
1089
1
0.806



elements


O$MTEN
Core promoter motif ten
O$HMTE.01
0.88
1072
1092
0.96
0.94



elements


P$DREB
Dehydration responsive
P$HVDRF1.01
0.89
1079
1093
1
0.922



element binding factors


P$PREM
Motifs of plastid response
P$MGPROTORE.01
0.77
1077
1107
1
0.805



elements


O$MTEN
Core promoter motif ten
O$DMTE.01
0.77
1097
1117
0.84
0.805



elements


P$OPAQ
Opaque-2 like transcriptional
P$O2.02
0.87
1135
1151
1
0.915



activators


P$SALT
Salt/drought responsive
P$ALFIN1.02
0.95
1136
1150
1
0.954



elements


P$L1BX
L1 box, motif for L1 layer-
P$PDF2.01
0.85
1179
1195
1
0.882



specific expression


P$SBPD
SBP-domain proteins
P$SBP.01
0.88
1199
1215
1
0.912


P$PALA
Conserved box A in PAL
P$PALBOXA.01
0.84
1201
1219
1
0.863



and 4CL gene promoters


P$MYBS
MYB proteins with single
P$ZMMRP1.01
0.79
1230
1246
1
0.833



DNA binding repeat


P$AHBP

Arabidopsis homeobox

P$ATHB9.01
0.77
1244
1254
1
0.867



protein


P$MADS
MADS box proteins
P$AGL2.01
0.82
1248
1268
0.97
0.828


P$MYBS
MYB proteins with single
P$MYBST1.01
0.9
1262
1278
1
0.953



DNA binding repeat


P$HEAT
Heat shock factors
P$HSE.01
0.81
1278
1294
1
0.864


P$LEGB
Legumin Box family
P$RY.01
0.87
1277
1303
1
0.871


P$MYBS
MYB proteins with single
P$OSMYBS.01
0.82
1343
1359
0.75
0.822



DNA binding repeat


O$INRE
Core promoter initiator elements
O$DINR.01
0.94
1349
1359
0.97
0.955


P$STKM
Storekeeper motif
P$STK.01
0.85
1355
1369
1
0.95


P$GTBX
GT-box elements
P$GT1.01
0.85
1403
1419
0.97
0.865


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
1439
1455
0.75
0.797



protein factor


P$OCSE
Enhancer element first
P$OCSL.01
0.69
1437
1457
0.77
0.745



identified in the promoter of



the octopine synthase gene



(OCS) of the Agrobacterium




tumefaciens T-DNA



P$HEAT
Heat shock factors
P$HSFA1A.01
0.75
1478
1494
1
0.764


P$WBXF
W Box family
P$WRKY.01
0.92
1488
1504
1
0.94


P$TEFB
TEF-box
P$TEF1.01
0.76
1491
1511
0.96
0.858


P$MYBS
MYB proteins with single
P$HVMCB1.01
0.93
1498
1514
1
0.934



DNA binding repeat


P$MYBS
MYB proteins with single
P$TAMYB80.01
0.83
1509
1525
0.75
0.837



DNA binding repeat


P$MSAE
M-phase-specific activator
P$MSA.01
0.8
1551
1565
1
0.802



elements


P$OPAQ
Opaque-2 like transcriptional
P$O2.01
0.87
1558
1574
1
0.883



activators


P$AHBP

Arabidopsis homeobox

P$ATHB5.01
0.89
1569
1579
0.83
0.904



protein


P$AHBP

Arabidopsis homeobox

P$ATHB5.01
0.89
1569
1579
0.94
0.978



protein


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
1609
1625
0.75
0.781



protein factor


P$LREM
Light responsive element
P$RAP22.01
0.85
1613
1623
1
0.966



motif, not modulated by



different light qualities


P$TEFB
TEF-box
P$TEF1.01
0.76
1617
1637
0.84
0.812


P$WNAC
Wheat NAC-domain transcription
P$TANAC69.01
0.68
1625
1647
0.9
0.775



factors


P$NACF
Plant specific NAC [NAM
P$ANAC019.01
0.94
1632
1644
0.95
0.968



(no apical meristem),



ATAF1/2, CUC2 (cup-shaped



cotyledons 2)]



transcription factors


P$GTBX
GT-box elements
P$S1F.01
0.79
1642
1658
1
0.917


P$PSRE
Pollen-specific regulatory
P$GAAA.01
0.83
1644
1660
1
0.864



elements


P$MYBL
MYB-like proteins
P$MYBPH3.01
0.8
1647
1663
1
0.938


P$DOFF
DNA binding with one finger
P$DOF1.01
0.98
1694
1710
1
1



(DOF)


P$HEAT
Heat shock factors
P$HSFA1A.01
0.75
1703
1719
0.86
0.757


P$CCAF
Circadian control factors
P$EE.01
0.84
1719
1733
1
0.955


P$MADS
MADS box proteins
P$AG.01
0.8
1717
1737
0.9
0.816


P$GTBX
GT-box elements
P$ASIL1.01
0.93
1732
1748
1
0.967


O$INRE
Core promoter initiator elements
O$DINR.01
0.94
1749
1759
1
0.957


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
1749
1767
0.75
0.837


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
1754
1772
0.75
0.815


P$L1BX
L1 box, motif for L1 layer-
P$ATML1.02
0.76
1757
1773
0.89
0.848



specific expression


P$AHBP

Arabidopsis homeobox

P$ATHB9.01
0.77
1761
1771
0.75
0.815



protein


O$VTBP
Vertebrate TATA binding
O$VTATA.02
0.89
1777
1793
1
0.996



protein factor


P$DOFF
DNA binding with one finger
P$DOF3.01
0.99
1778
1794
1
0.995



(DOF)


O$PTBP
Plant TATA binding protein
O$PTATA.02
0.9
1780
1794
1
0.923



factor


P$IBOX
Plant I-Box sites
P$GATA.01
0.93
1787
1803
1
0.967


P$MYBS
MYB proteins with single
P$MYBST1.01
0.9
1790
1806
1
0.972



DNA binding repeat


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
1803
1819
0.75
0.797



protein factor


P$IBOX
Plant I-Box sites
P$GATA.01
0.93
1847
1863
1
0.945


P$MYBS
MYB proteins with single
P$MYBST1.01
0.9
1850
1866
1
0.966



DNA binding repeat


P$MADS
MADS box proteins
P$SQUA.01
0.9
1866
1886
1
0.916


P$GTBX
GT-box elements
P$SBF1.01
0.87
1872
1888
1
0.905


O$VTBP
Vertebrate TATA binding
O$LTATA.01
0.82
1873
1889
1
0.837



protein factor


P$AHBP

Arabidopsis homeobox

P$HAHB4.01
0.87
1878
1888
1
0.902



protein


P$L1BX
L1 box, motif for L1 layer-
P$ATML1.01
0.82
1882
1898
0.75
0.824



specific expression


O$INRE
Core promoter initiator elements
O$DINR.01
0.94
1886
1896
0.97
0.949


P$EPFF
EPF-type zinc finger factors,
P$ZPT22.01
0.75
1887
1909
1
0.774



two canonical



Cys2/His2 zinc finger motifs



separated by spacers of



various lengths


P$GAPB
GAP-Box (light response
P$GAP.01
0.88
1907
1921
1
0.903



elements)


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
1912
1930
1
0.849


P$HMGF
High mobility group factors
P$HMG_IY.01
0.89
1920
1934
1
0.892


P$SEF4
Soybean embryo factor 4
P$SEF4.01
0.98
1927
1937
1
0.984


P$MYBL
MYB-like proteins
P$ATMYB77.01
0.87
1973
1989
1
0.9


P$GTBX
GT-box elements
P$ASIL1.01
0.93
1998
2014
1
0.971


P$OPAQ
Opaque-2 like transcriptional
P$O2_GCN4.01
0.81
2001
2017
1
0.83



activators


P$IBOX
Plant I-Box sites
P$GATA.01
0.93
2018
2034
1
0.964


P$MYBS
MYB proteins with single
P$MYBST1.01
0.9
2021
2037
1
0.957



DNA binding repeat


P$LREM
Light responsive element
P$RAP22.01
0.85
2035
2045
1
0.858



motif, not modulated by



different light qualities


P$MIIG
MYB IIG-type binding sites
P$MYBC1.01
0.92
2033
2047
1
0.941


P$HEAT
Heat shock factors
P$HSFA1A.01
0.75
2041
2057
1
0.792


P$MYBL
MYB-like proteins
P$GAMYB.01
0.91
2054
2070
1
0.918


P$GTBX
GT-box elements
P$GT1.01
0.85
2056
2072
1
0.876


P$ASRC
AS1/AS2 repressor complex
P$AS1_AS2_II.01
0.86
2067
2075
1
0.906


P$EINL
Ethylene insensitive 3 like
P$TEIL.01
0.92
2098
2106
0.96
0.926



factors


O$VTBP
Vertebrate TATA binding
O$LTATA.01
0.82
2110
2126
1
0.828



protein factor


P$MYBL
MYB-like proteins
P$MYBPH3.02
0.76
2110
2126
1
0.807
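Each row in these tables gives the matrix family, a description, the matrix name, the optimized matrix threshold (Opt.), the start and end position, the core similarity and the matrix similarity. The following is a minimal sketch that holds such rows as records. It assumes two things not stated in the tables: that positions are inclusive from-to ranges, and that, following the usual MatInspector convention, a hit is listed only when its matrix similarity reaches the optimized threshold. `MotifHit` and `is_reported` are hypothetical helpers; the values are copied from the last three rows of the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifHit:
    """One table row (hypothetical representation, not part of the source)."""
    family: str        # e.g. "P$MYBL"
    matrix: str        # e.g. "P$MYBPH3.02"
    opt: float         # optimized matrix similarity threshold ("Opt.")
    start: int         # position "from"
    end: int           # position "to"
    core_sim: float    # core similarity
    matrix_sim: float  # matrix similarity

    @property
    def length(self) -> int:
        # Assumes the from-to range is inclusive at both ends.
        return self.end - self.start + 1


def is_reported(hit: MotifHit) -> bool:
    # Assumed rule: a match is listed when its matrix similarity
    # reaches or exceeds the matrix-specific optimized threshold.
    return hit.matrix_sim >= hit.opt


rows = [
    MotifHit("P$EINL", "P$TEIL.01", 0.92, 2098, 2106, 0.96, 0.926),
    MotifHit("O$VTBP", "O$LTATA.01", 0.82, 2110, 2126, 1.0, 0.828),
    MotifHit("P$MYBL", "P$MYBPH3.02", 0.76, 2110, 2126, 1.0, 0.807),
]

for hit in rows:
    print(hit.matrix, hit.length, is_reported(hit))
```

Under these assumptions all three rows pass the threshold check, and the two TATA-region hits at 2110-2126 each span 17 positions.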
















TABLE 10







Boxes and Motifs identified in the permutated sequence of the p-GOS2_perm1 promoter.













Family
Further Family Information
Matrix
Opt.
Position from (p-GOS2_perm1)
Position to (p-GOS2_perm1)
Core sim.
Matrix sim.

















P$NCS1
Nodulin consensus sequence 1
P$NCS1.01
0.85
6
16
1
0.857


P$MSAE
M-phase-specific activator
P$MSA.01
0.8
15
29
1
0.832



elements


P$MYBL
MYB-like proteins
P$GAMYB.01
0.91
29
45
1
0.92


P$MYBL
MYB-like proteins
P$WER.01
0.87
33
49
1
0.897


P$MADS
MADS box proteins
P$AGL2.01
0.82
35
55
0.79
0.82


P$NACF
Plant specific NAC [NAM (no
P$IDEF2.01
0.96
48
60
1
0.96



apical meristem), ATAF1/2,



CUC2 (cup-shaped cotyledons



2)] transcription factors


P$BRRE
Brassinosteroid (BR) response
P$BZR1.01
0.95
48
64
1
0.954



element


O$PTBP
Plant TATA binding protein
O$PTATA.01
0.88
60
74
1
0.887



factor


O$VTBP
Vertebrate TATA binding
O$VTATA.01
0.9
61
77
1
0.961



protein factor


O$INRE
Core promoter initiator elements
O$DINR.01
0.94
65
75
0.97
0.94


O$VTBP
Vertebrate TATA binding
O$LTATA.01
0.82
69
85
1
0.867



protein factor


O$VTBP
Vertebrate TATA binding
O$VTATA.01
0.9
71
87
0.89
0.92



protein factor


O$YTBP
Yeast TATA binding protein
O$SPT15.01
0.83
74
90
1
0.832



factor


O$YTBP
Yeast TATA binding protein
O$SPT15.01
0.83
75
91
1
0.877



factor


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
76
92
0.75
0.781



protein factor


O$YTBP
Yeast TATA binding protein
O$SPT15.01
0.83
77
93
0.76
0.835



factor


P$MIIG
MYB IIG-type binding sites
P$PALBOXL.01
0.8
118
132
0.77
0.841


P$DOFF
DNA binding with one finger
P$DOF1.01
0.98
126
142
1
0.99



(DOF)


P$DOFF
DNA binding with one finger
P$PBF.01
0.97
149
165
1
0.989



(DOF)


P$WNAC
Wheat NAC-domain transcription
P$TANAC69.01
0.68
170
192
0.81
0.712



factors


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
187
203
1
0.878



protein factor


P$E2FF
E2F-homolog cell cycle regulators
P$E2F.01
0.82
193
207
1
0.826


P$AHBP

Arabidopsis homeobox protein

P$ATHB5.01
0.89
207
217
0.83
0.903


P$AHBP

Arabidopsis homeobox protein

P$HAHB4.01
0.87
207
217
1
0.967


P$CNAC
Calcium regulated NAC-
P$CBNAC.02
0.85
215
235
1
0.937



factors


P$MYBS
MYB proteins with single
P$PHR1.01
0.84
217
233
1
0.944



DNA binding repeat


P$OCSE
Enhancer element first identified
P$OCSL.01
0.69
216
236
1
0.735



in the promoter of the



octopine synthase gene



(OCS) of the Agrobacterium




tumefaciens T-DNA



P$MYBS
MYB proteins with single
P$PHR1.01
0.84
222
238
1
0.979



DNA binding repeat


P$GTBX
GT-box elements
P$SBF1.01
0.87
246
262
1
0.901


P$STKM
Storekeeper motif
P$STK.01
0.85
251
265
1
0.85


P$AHBP

Arabidopsis homeobox protein

P$ATHB5.01
0.89
254
264
0.83
0.904


P$AHBP

Arabidopsis homeobox protein

P$BLR.01
0.9
254
264
1
0.998


P$HEAT
Heat shock factors
P$HSFA1A.01
0.75
284
300
1
0.757


P$CCAF
Circadian control factors
P$CCA1.01
0.85
297
311
1
0.94


P$LFYB
LFY binding site
P$LFY.01
0.93
318
330
0.91
0.945


P$WBXF
W Box family
P$ERE.01
0.89
322
338
1
0.893


P$GAGA
GAGA elements
P$BPC.01
1
329
353
1
1


P$CCAF
Circadian control factors
P$EE.01
0.84
335
349
0.75
0.865


P$GAGA
GAGA elements
P$BPC.01
1
331
355
1
1


P$CCAF
Circadian control factors
P$CCA1.01
0.85
337
351
1
0.968


P$GTBX
GT-box elements
P$SBF1.01
0.87
341
357
1
0.875


P$MADS
MADS box proteins
P$SQUA.01
0.9
345
365
1
0.925


P$CCAF
Circadian control factors
P$EE.01
0.84
363
377
1
0.924


O$VTBP
Vertebrate TATA binding
O$MTATA.01
0.84
383
399
1
0.895



protein factor


P$CARM
CA-rich motif
P$CARICH.01
0.78
388
406
1
0.8


P$AHBP

Arabidopsis homeobox protein

P$HAHB4.01
0.87
397
407
1
0.902


O$VTBP
Vertebrate TATA binding
O$LTATA.01
0.82
395
411
1
0.889



protein factor


O$VTBP
Vertebrate TATA binding
O$LTATA.01
0.82
396
412
1
0.844



protein factor


O$PTBP
Plant TATA binding protein
O$PTATA.01
0.88
398
412
1
0.892



factor


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
397
413
0.75
0.781



protein factor


P$AHBP

Arabidopsis homeobox protein

P$HAHB4.01
0.87
400
410
1
0.902


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
402
418
0.75
0.781



protein factor


O$VTBP
Vertebrate TATA binding
O$VTATA.02
0.89
405
421
1
0.983



protein factor


O$PTBP
Plant TATA binding protein
O$PTATA.02
0.9
408
422
1
0.917



factor


P$OCSE
Enhancer element first identified
P$OCSTF.01
0.73
426
446
1
0.762



in the promoter of the



octopine synthase gene



(OCS) of the Agrobacterium




tumefaciens T-DNA



P$AHBP

Arabidopsis homeobox protein

P$HAHB4.01
0.87
440
450
1
0.926


P$AHBP

Arabidopsis homeobox protein

P$WUS.01
0.94
444
454
1
1


P$OPAQ
Opaque-2 like transcriptional
P$O2_GCN4.01
0.81
447
463
1
0.819



activators


P$SEF4
Soybean embryo factor 4
P$SEF4.01
0.98
472
482
1
0.988


P$GTBX
GT-box elements
P$SBF1.01
0.87
481
497
1
0.922


P$DOFF
DNA binding with one finger
P$DOF1.01
0.98
482
498
1
0.994



(DOF)


P$GTBX
GT-box elements
P$SBF1.01
0.87
482
498
1
0.9


P$WBXF
W Box family
P$WRKY11.01
0.94
493
509
1
0.957


P$SEF4
Soybean embryo factor 4
P$SEF4.01
0.98
504
514
1
0.988


P$IBOX
Plant I-Box sites
P$GATA.01
0.93
509
525
1
0.961


P$NCS1
Nodulin consensus sequence 1
P$NCS1.01
0.85
515
525
1
0.948


P$GTBX
GT-box elements
P$S1F.01
0.79
518
534
0.75
0.793


P$LREM
Light responsive element
P$RAP22.01
0.85
527
537
1
0.897



motif, not modulated by different



light qualities


P$L1BX
L1 box, motif for L1 layer-
P$HDG9.01
0.77
525
541
0.75
0.78



specific expression


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
539
555
0.75
0.782



protein factor


P$ROOT
Root hair-specific cis-
P$RHE.01
0.77
568
592
0.75
0.772



elements in angiosperms


P$ABRE
ABA response elements
P$ABRE.01
0.82
591
607
1
0.837


P$ASRC
AS1/AS2 repressor complex
P$AS1_AS2_II.01
0.86
599
607
1
0.867


P$L1BX
L1 box, motif for L1 layer-
P$HDG9.01
0.77
629
645
1
0.888



specific expression


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
631
647
0.75
0.831



protein factor


P$L1BX
L1 box, motif for L1 layer-
P$HDG9.01
0.77
631
647
0.8
0.783



specific expression


P$L1BX
L1 box, motif for L1 layer-
P$PDF2.01
0.85
638
654
1
0.861



specific expression


P$CCAF
Circadian control factors
P$EE.01
0.84
649
663
1
0.899


P$DOFF
DNA binding with one finger
P$PBF.01
0.97
687
703
1
0.987



(DOF)


P$GTBX
GT-box elements
P$SBF1.01
0.87
689
705
1
0.888


P$AHBP

Arabidopsis homeobox protein

P$BLR.01
0.9
695
705
1
0.929


P$CCAF
Circadian control factors
P$EE.01
0.84
694
708
1
0.954


P$LREM
Light responsive element
P$RAP22.01
0.85
701
711
1
0.98



motif, not modulated by different



light qualities


P$HMGF
High mobility group factors
P$HMG_IY.01
0.89
711
725
1
0.929


P$AHBP

Arabidopsis homeobox protein

P$ATHB1.01
0.9
716
726
0.79
0.901


P$AHBP

Arabidopsis homeobox protein

P$BLR.01
0.9
716
726
1
0.998


O$VTBP
Vertebrate TATA binding
O$VTATA.02
0.89
716
732
1
0.893



protein factor


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
715
733
1
0.856


P$DOFF
DNA binding with one finger
P$PBOX.01
0.75
718
734
0.76
0.762



(DOF)


P$HEAT
Heat shock factors
P$HSE.01
0.81
718
734
1
0.833


P$GAPB
GAP-Box (light response
P$GAP.01
0.88
733
747
1
0.917


P$MYBL
MYB-like proteins
P$MYBPH3.02
0.76
744
760
0.78
0.834


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
754
770
0.75
0.831



protein factor


P$TELO
Telo box (plant interstitial
P$ATPURA.01
0.85
756
770
0.75
0.869



telomere motifs)


P$MYCL
Myc-like basic helix-loop-
P$OSBHLH66.01
0.85
789
807
1
0.851



helix binding factors


P$BRRE
Brassinosteroid (BR) response
P$BZR1.01
0.95
793
809
1
0.998



element


P$URNA
Upstream sequence element
P$USE.01
0.75
812
828
0.75
0.797



of U-snRNA genes


P$MADS
MADS box proteins
P$AGL1.01
0.84
812
832
1
0.895


P$MADS
MADS box proteins
P$AGL1.01
0.84
813
833
0.92
0.911


P$NCS1
Nodulin consensus sequence 1
P$NCS1.01
0.85
872
882
0.81
0.888


P$LREM
Light responsive element
P$RAP22.01
0.85
879
889
1
0.896



motif, not modulated by different



light qualities


P$MSAE
M-phase-specific activator
P$MSA.01
0.8
880
894
1
0.877



elements


P$MYBL
MYB-like proteins
P$NTMYBAS1.01
0.96
900
916
0.95
0.968


P$GTBX
GT-box elements
P$SBF1.01
0.87
909
925
1
0.905


P$MYBL
MYB-like proteins
P$AS1_AS2_I.01
0.99
911
927
1
1


P$LREM
Light responsive element
P$RAP22.01
0.85
981
991
1
0.893



motif, not modulated by different



light qualities


O$PTBP
Plant TATA binding protein
O$PTATA.02
0.9
982
996
1
0.951



factor


P$L1BX
L1 box, motif for L1 layer-
P$PDF2.01
0.85
982
998
1
0.884



specific expression


O$VTBP
Vertebrate TATA binding
O$VTATA.01
0.9
983
999
1
0.955



protein factor


P$MADS
MADS box proteins
P$AGL15.01
0.79
1006
1026
0.83
0.8


P$MYBS
MYB proteins with single
P$ZMMRP1.01
0.79
1008
1024
0.78
0.811



DNA binding repeat


O$PTBP
Plant TATA binding protein
O$PTATA.02
0.9
1010
1024
1
0.91



factor


P$CGCG
Calmodulin binding/CGCG
P$ATSR1.01
0.84
1051
1067
1
0.859



box binding proteins


P$ABRE
ABA response elements
P$ABF1.01
0.79
1053
1069
1
0.837


P$CE3S
Coupling element 3 sequence
P$CE3.01
0.77
1052
1070
1
0.863


P$NACF
Plant specific NAC [NAM (no
P$ANAC092.01
0.92
1055
1067
1
0.927



apical meristem), ATAF1/2,



CUC2 (cup-shaped cotyledons



2)] transcription factors


P$DPBF
Dc3 promoter binding factors
P$DPBF.01
0.89
1057
1067
1
0.908


P$PREM
Motifs of plastid response
P$MGPROTORE.01
0.77
1059
1089
1
0.806



elements


O$MTEN
Core promoter motif ten elements
O$HMTE.01
0.88
1072
1092
0.96
0.94


P$DREB
Dehydration responsive element
P$HVDRF1.01
0.89
1079
1093
1
0.917



binding factors


P$PREM
Motifs of plastid response
P$MGPROTORE.01
0.77
1077
1107
1
0.807



elements


O$MTEN
Core promoter motif ten elements
O$DMTE.01
0.77
1097
1117
0.84
0.805


P$OPAQ
Opaque-2 like transcriptional
P$O2.02
0.87
1135
1151
1
0.915



activators


P$SALT
Salt/drought responsive elements
P$ALFIN1.02
0.95
1136
1150
1
0.954


P$L1BX
L1 box, motif for L1 layer-
P$PDF2.01
0.85
1179
1195
1
0.882



specific expression


P$SBPD
SBP-domain proteins
P$SBP.01
0.88
1199
1215
1
0.912


P$PALA
Conserved box A in PAL and
P$PALBOXA.01
0.84
1201
1219
1
0.863



4CL gene promoters


P$MYBS
MYB proteins with single
P$ZMMRP1.01
0.79
1230
1246
1
0.833



DNA binding repeat


P$AHBP

Arabidopsis homeobox protein

P$ATHB9.01
0.77
1244
1254
1
0.89


P$AHBP

Arabidopsis homeobox protein

P$ATHB9.01
0.77
1244
1254
0.75
0.777


P$MADS
MADS box proteins
P$AGL2.01
0.82
1248
1268
0.97
0.835


P$MYBS
MYB proteins with single
P$MYBST1.01
0.9
1262
1278
1
0.953



DNA binding repeat


P$HEAT
Heat shock factors
P$HSE.01
0.81
1278
1294
1
0.864


P$LEGB
Legumin Box family
P$RY.01
0.87
1277
1303
1
0.871


P$MYBS
MYB proteins with single
P$OSMYBS.01
0.82
1343
1359
0.75
0.822



DNA binding repeat


O$INRE
Core promoter initiator elements
O$DINR.01
0.94
1349
1359
0.97
0.955


P$STKM
Storekeeper motif
P$STK.01
0.85
1355
1369
1
0.927


P$GTBX
GT-box elements
P$GT1.01
0.85
1403
1419
0.97
0.865


P$OCSE
Enhancer element first identified
P$OCSL.01
0.69
1437
1457
0.77
0.703



in the promoter of the



octopine synthase gene



(OCS) of the Agrobacterium




tumefaciens T-DNA



P$HEAT
Heat shock factors
P$HSFA1A.01
0.75
1478
1494
1
0.764


P$WBXF
W Box family
P$ERE.01
0.89
1488
1504
1
0.968


P$TEFB
TEF-box
P$TEF1.01
0.76
1491
1511
0.96
0.852


P$MYBS
MYB proteins with single
P$HVMCB1.01
0.93
1498
1514
1
0.934



DNA binding repeat


P$MYBS
MYB proteins with single
P$TAMYB80.01
0.83
1509
1525
0.75
0.837



DNA binding repeat


P$MSAE
M-phase-specific activator
P$MSA.01
0.8
1551
1565
1
0.82



elements


P$OPAQ
Opaque-2 like transcriptional
P$O2.01
0.87
1558
1574
1
0.883



activators


P$AHBP

Arabidopsis homeobox protein

P$ATHB5.01
0.89
1569
1579
0.83
0.904


P$AHBP

Arabidopsis homeobox protein

P$ATHB5.01
0.89
1569
1579
0.94
0.978


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
1609
1625
0.75
0.781



protein factor


P$LREM
Light responsive element
P$RAP22.01
0.85
1613
1623
1
0.966



motif, not modulated by different



light qualities


P$TEFB
TEF-box
P$TEF1.01
0.76
1617
1637
0.84
0.761


P$WNAC
Wheat NAC-domain transcription
P$TANAC69.01
0.68
1625
1647
0.9
0.75



factors


P$NACF
Plant specific NAC [NAM (no
P$ANAC019.01
0.94
1632
1644
0.95
0.968



apical meristem), ATAF1/2,



CUC2 (cup-shaped cotyledons



2)] transcription factors


P$GTBX
GT-box elements
P$S1F.01
0.79
1642
1658
1
0.882


P$PSRE
Pollen-specific regulatory
P$GAAA.01
0.83
1644
1660
1
0.864



elements


P$MYBL
MYB-like proteins
P$MYBPH3.01
0.8
1647
1663
1
0.938


P$DOFF
DNA binding with one finger
P$DOF1.01
0.98
1694
1710
1
1



(DOF)


P$HEAT
Heat shock factors
P$HSFA1A.01
0.75
1703
1719
0.86
0.765


P$CCAF
Circadian control factors
P$EE.01
0.84
1719
1733
1
0.955


P$MADS
MADS box proteins
P$AG.01
0.8
1717
1737
0.9
0.816


P$GTBX
GT-box elements
P$ASIL1.01
0.93
1732
1748
1
0.98


O$INRE
Core promoter initiator elements
O$DINR.01
0.94
1749
1759
1
0.957


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
1749
1767
0.75
0.837


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
1754
1772
0.75
0.815


P$L1BX
L1 box, motif for L1 layer-
P$ATML1.02
0.76
1757
1773
0.89
0.848



specific expression


P$AHBP

Arabidopsis homeobox protein

P$ATHB9.01
0.77
1761
1771
0.75
0.815


P$MADS
MADS box proteins
P$AGL3.01
0.83
1768
1788
0.97
0.838


O$VTBP
Vertebrate TATA binding
O$VTATA.02
0.89
1777
1793
1
0.996



protein factor


P$DOFF
DNA binding with one finger
P$DOF3.01
0.99
1778
1794
1
0.995



(DOF)


O$PTBP
Plant TATA binding protein
O$PTATA.02
0.9
1780
1794
1
0.923



factor


P$IBOX
Plant I-Box sites
P$GATA.01
0.93
1787
1803
1
0.967


P$MYBS
MYB proteins with single
P$MYBST1.01
0.9
1790
1806
1
0.972



DNA binding repeat


O$VTBP
Vertebrate TATA binding
O$ATATA.01
0.78
1803
1819
0.75
0.797



protein factor


P$IBOX
Plant I-Box sites
P$GATA.01
0.93
1847
1863
1
0.945


P$MYBS
MYB proteins with single
P$MYBST1.01
0.9
1850
1866
1
0.966



DNA binding repeat


P$MADS
MADS box proteins
P$SQUA.01
0.9
1866
1886
1
0.916


P$GTBX
GT-box elements
P$SBF1.01
0.87
1872
1888
1
0.905


O$VTBP
Vertebrate TATA binding
O$LTATA.01
0.82
1873
1889
1
0.837



protein factor


P$AHBP

Arabidopsis homeobox protein

P$HAHB4.01
0.87
1878
1888
1
0.902


P$L1BX
L1 box, motif for L1 layer-
P$ATML1.01
0.82
1882
1898
0.75
0.824



specific expression


O$INRE
Core promoter initiator elements
O$DINR.01
0.94
1886
1896
0.97
0.949


P$EPFF
EPF-type zinc finger factors,
P$ZPT22.01
0.75
1887
1909
1
0.752



two canonical Cys2/His2 zinc



finger motifs separated by



spacers of various lengths


P$GAPB
GAP-Box (light response
P$GAP.01
0.88
1907
1921
1
0.903



elements)


P$SUCB
Sucrose box
P$SUCROSE.01
0.81
1912
1930
1
0.849


P$HMGF
High mobility group factors
P$HMG_IY.01
0.89
1920
1934
1
0.892


P$SEF4
Soybean embryo factor 4
P$SEF4.01
0.98
1927
1937
1
0.984


P$MYBL
MYB-like proteins
P$ATMYB77.01
0.87
1973
1989
1
0.9


P$GTBX
GT-box elements
P$ASIL1.01
0.93
1998
2014
1
0.958


P$OPAQ
Opaque-2 like transcriptional
P$O2_GCN4.01
0.81
2001
2017
1
0.875



activators


P$IBOX
Plant I-Box sites
P$GATA.01
0.93
2018
2034
1
0.964


P$MYBS
MYB proteins with single
P$MYBST1.01
0.9
2021
2037
1
0.957



DNA binding repeat


P$LREM
Light responsive element
P$RAP22.01
0.85
2035
2045
1
0.868



motif, not modulated by different



light qualities


P$MIIG
MYB IIG-type binding sites
P$MYBC1.01
0.92
2033
2047
1
0.938


P$HEAT
Heat shock factors
P$HSFA1A.01
0.75
2041
2057
1
0.792


P$MYBL
MYB-like proteins
P$GAMYB.01
0.91
2054
2070
1
0.918


P$GTBX
GT-box elements
P$GT1.01
0.85
2056
2072
1
0.876


P$ASRC
AS1/AS2 repressor complex
P$AS1_AS2_II.01
0.86
2067
2075
1
0.906


P$ASRC
AS1/AS2 repressor complex
P$AS1_AS2_II.01
0.86
2075
2083
1
0.906


P$EINL
Ethylene insensitive 3 like factors
P$TEIL.01
0.92
2098
2106
0.96
0.926


O$VTBP
Vertebrate TATA binding
O$LTATA.01
0.82
2110
2126
1
0.828



protein factor


P$MYBL
MYB-like proteins
P$MYBPH3.02
0.76
2110
2126
1
0.807
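To see which predicted sites differ between two of these tables, the hits can be keyed by matrix name and position and then compared as sets. The sketch below is illustrative only. `Hit` and `by_position` are hypothetical helpers, and the sets hold only a handful of rows copied from the table that ends just before Table 10 and from Table 10 (p-GOS2_perm1). They are not the full tables, so the differences shown are not a complete comparison.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical key for one predicted site: matrix name plus from-to position."""
    matrix: str
    start: int
    end: int


def by_position(hit: Hit) -> tuple:
    # Order hits along the sequence, then by matrix name.
    return (hit.start, hit.end, hit.matrix)


# Subset of rows copied from the table that ends just before Table 10.
previous_table = {
    Hit("P$E2F.01", 193, 207),
    Hit("O$DINR.01", 200, 210),
    Hit("P$LFY.01", 318, 330),
    Hit("P$BPC.01", 329, 353),
    Hit("P$ATML1.01", 525, 541),
    Hit("P$AS1_AS2_II.01", 2067, 2075),
}

# Subset of rows copied from Table 10 (p-GOS2_perm1), same position ranges.
table_10 = {
    Hit("P$E2F.01", 193, 207),
    Hit("P$LFY.01", 318, 330),
    Hit("P$ERE.01", 322, 338),
    Hit("P$BPC.01", 329, 353),
    Hit("P$HDG9.01", 525, 541),
    Hit("P$AS1_AS2_II.01", 2067, 2075),
    Hit("P$AS1_AS2_II.01", 2075, 2083),
}

# Set differences give hits present in only one of the two tables.
lost = sorted(previous_table - table_10, key=by_position)
gained = sorted(table_10 - previous_table, key=by_position)

print("only in previous table:", lost)
print("only in Table 10:", gained)
```

Keying on matrix and position, rather than family alone, separates cases where the same family is still predicted at a site but by a different matrix. An example is the L1 box at 525-541, which is matched by P$ATML1.01 in one table and by P$HDG9.01 in the other.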
















TABLE 11







Boxes and Motifs identified in the permutated sequence of the p-GOS2_perm2 promoter.













Family
Further Family Information
Matrix
Opt.
Position from (p-GOS2_perm2)
Position to (p-GOS2_perm2)
Core sim.
Matrix sim.

















P$NCS1
Nodulin consensus sequence 1
P$NCS1.01
0.85
6
16
1
0.857


P$MSAE
M-phase-specific activator
P$MSA.01
0.8
15
29
1
0.832



elements


P$MYBL
MYB-like proteins
P$GAMYB.01
0.91
29
45
1
0.95


P$MYBL
MYB-like proteins
P$WER.01
0.87
33
49
1
0.897


P$MADS
MADS box proteins
P$AGL2.01
0.82
35
55
0.789
0.82


P$NACF
Plant specific NAC [NAM
P$IDEF2.01
0.96
48
60
1
0.96



(no apical meristem),



ATAF1/2, CUC2 (cup-shaped



cotyledons 2)] transcription



factors


P$BRRE
Brassinosteroid (BR) response
P$BZR1.01
0.95
48
64
1
0.954



element


O$PTBP
Plant TATA binding protein
O$PTATA.01
0.88
60
74
1
0.883



factor


O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.9 | 61 | 77 | 1 | 0.961
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 65 | 75 | 0.969 | 0.94
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 69 | 85 | 1 | 0.867
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.9 | 71 | 87 | 0.892 | 0.92
O$YTBP | Yeast TATA binding protein factor | O$SPT15.01 | 0.83 | 74 | 90 | 1 | 0.832
O$YTBP | Yeast TATA binding protein factor | O$SPT15.01 | 0.83 | 75 | 91 | 1 | 0.877
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 76 | 92 | 0.75 | 0.781
O$YTBP | Yeast TATA binding protein factor | O$SPT15.01 | 0.83 | 77 | 93 | 0.755 | 0.835
P$MIIG | MYB IIG-type binding sites | P$PALBOXL.01 | 0.8 | 118 | 132 | 0.768 | 0.841
P$DOFF | DNA binding with one finger (DOF) | P$DOF1.01 | 0.98 | 126 | 142 | 1 | 0.99
P$DOFF | DNA binding with one finger (DOF) | P$PBF.01 | 0.97 | 149 | 165 | 1 | 0.989
P$WNAC | Wheat NAC-domain transcription factors | P$TANAC69.01 | 0.68 | 170 | 192 | 0.812 | 0.713
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 187 | 203 | 1 | 0.869
P$E2FF | E2F-homolog cell cycle regulators | P$E2F.01 | 0.82 | 193 | 207 | 1 | 0.829
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 200 | 210 | 0.969 | 0.945
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 207 | 217 | 0.83 | 0.903
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 207 | 217 | 1 | 0.967
P$CNAC | Calcium regulated NAC-factors | P$CBNAC.02 | 0.85 | 215 | 235 | 1 | 0.95
P$MYBS | MYB proteins with single DNA binding repeat | P$PHR1.01 | 0.84 | 217 | 233 | 1 | 0.975
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSL.01 | 0.69 | 216 | 236 | 1 | 0.71
P$MYBS | MYB proteins with single DNA binding repeat | P$PHR1.01 | 0.84 | 222 | 238 | 1 | 0.922
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 246 | 262 | 1 | 0.901
P$STKM | Storekeeper motif | P$STK.01 | 0.85 | 251 | 265 | 1 | 0.85
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 254 | 264 | 0.83 | 0.904
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.9 | 254 | 264 | 1 | 0.998
P$HEAT | Heat shock factors | P$HSFA1A.01 | 0.75 | 284 | 300 | 1 | 0.784
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 297 | 311 | 1 | 0.947
P$LFYB | LFY binding site | P$LFY.01 | 0.93 | 318 | 330 | 0.914 | 0.945
P$GAGA | GAGA elements | P$BPC.01 | 1 | 329 | 353 | 1 | 1
P$CCAF | Circadian control factors | P$EE.01 | 0.84 | 335 | 349 | 0.75 | 0.865
P$GAGA | GAGA elements | P$BPC.01 | 1 | 331 | 355 | 1 | 1
P$CCAF | Circadian control factors | P$CCA1.01 | 0.85 | 337 | 351 | 1 | 0.968
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 341 | 357 | 1 | 0.875
P$MADS | MADS box proteins | P$SQUA.01 | 0.9 | 345 | 365 | 1 | 0.925
P$CCAF | Circadian control factors | P$EE.01 | 0.84 | 363 | 377 | 1 | 0.925
O$VTBP | Vertebrate TATA binding protein factor | O$MTATA.01 | 0.84 | 383 | 399 | 1 | 0.91
P$CARM | CA-rich motif | P$CARICH.01 | 0.78 | 388 | 406 | 1 | 0.785
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 397 | 407 | 1 | 0.902
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 395 | 411 | 1 | 0.889
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 396 | 412 | 1 | 0.844
O$PTBP | Plant TATA binding protein factor | O$PTATA.01 | 0.88 | 398 | 412 | 1 | 0.892
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 397 | 413 | 0.75 | 0.781
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 400 | 410 | 1 | 0.902
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 402 | 418 | 0.75 | 0.781
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.02 | 0.89 | 405 | 421 | 1 | 0.983
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.9 | 408 | 422 | 1 | 0.917
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSTF.01 | 0.73 | 426 | 446 | 1 | 0.733
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 440 | 450 | 1 | 0.921
P$AHBP | Arabidopsis homeobox protein | P$WUS.01 | 0.94 | 444 | 454 | 1 | 1
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 447 | 463 | 1 | 0.819
P$SEF4 | Soybean embryo factor 4 | P$SEF4.01 | 0.98 | 472 | 482 | 1 | 0.987
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 481 | 497 | 1 | 0.922
P$DOFF | DNA binding with one finger (DOF) | P$DOF1.01 | 0.98 | 482 | 498 | 1 | 0.994
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 482 | 498 | 1 | 0.9
P$WBXF | W Box family | P$WRKY11.01 | 0.94 | 493 | 509 | 1 | 0.957
P$SEF4 | Soybean embryo factor 4 | P$SEF4.01 | 0.98 | 504 | 514 | 1 | 0.998
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 509 | 525 | 1 | 0.986
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 515 | 525 | 1 | 0.948
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 518 | 534 | 0.75 | 0.793
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 527 | 537 | 1 | 0.897
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 525 | 541 | 0.75 | 0.825
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 539 | 555 | 0.75 | 0.782
P$ROOT | Root hair-specific cis-elements in angiosperms | P$RHE.01 | 0.77 | 568 | 592 | 0.75 | 0.787
P$ABRE | ABA response elements | P$ABRE.01 | 0.82 | 591 | 607 | 1 | 0.837
P$ASRC | AS1/AS2 repressor complex | P$AS1_AS2_II.01 | 0.86 | 599 | 607 | 1 | 0.867
P$L1BX | L1 box, motif for L1 layer-specific expression | P$HDG9.01 | 0.77 | 629 | 645 | 1 | 0.883
P$L1BX | L1 box, motif for L1 layer-specific expression | P$HDG9.01 | 0.77 | 631 | 647 | 0.797 | 0.776
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 638 | 654 | 1 | 0.886
P$CCAF | Circadian control factors | P$EE.01 | 0.84 | 649 | 663 | 1 | 0.891
P$DOFF | DNA binding with one finger (DOF) | P$PBF.01 | 0.97 | 687 | 703 | 1 | 0.987
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 689 | 705 | 1 | 0.888
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.9 | 695 | 705 | 1 | 0.929
P$CCAF | Circadian control factors | P$EE.01 | 0.84 | 694 | 708 | 1 | 0.954
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 701 | 711 | 1 | 1
P$MADS | MADS box proteins | P$RIN.01 | 0.77 | 699 | 719 | 1 | 0.776
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 711 | 725 | 1 | 0.924
P$AHBP | Arabidopsis homeobox protein | P$ATHB1.01 | 0.9 | 716 | 726 | 0.789 | 0.901
P$AHBP | Arabidopsis homeobox protein | P$BLR.01 | 0.9 | 716 | 726 | 1 | 0.998
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.02 | 0.89 | 716 | 732 | 1 | 0.893
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 715 | 733 | 1 | 0.856
P$DOFF | DNA binding with one finger (DOF) | P$PBOX.01 | 0.75 | 718 | 734 | 0.761 | 0.762
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 718 | 734 | 1 | 0.833
P$GAPB | GAP-Box (light response elements) | P$GAP.01 | 0.88 | 733 | 747 | 1 | 0.885
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 744 | 760 | 0.779 | 0.834
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 754 | 770 | 0.75 | 0.831
P$TELO | Telo box (plant interstitial telomere motifs) | P$ATPURA.01 | 0.85 | 756 | 770 | 0.75 | 0.869
P$MYCL | Myc-like basic helix-loop-helix binding factors | P$OSBHLH66.01 | 0.85 | 789 | 807 | 1 | 0.851
P$BRRE | Brassinosteroid (BR) response element | P$BZR1.01 | 0.95 | 793 | 809 | 1 | 0.998
P$URNA | Upstream sequence element of U-snRNA genes | P$USE.01 | 0.75 | 812 | 828 | 0.75 | 0.797
P$MADS | MADS box proteins | P$AGL1.01 | 0.84 | 812 | 832 | 1 | 0.895
P$MADS | MADS box proteins | P$AGL1.01 | 0.84 | 813 | 833 | 0.915 | 0.911
P$NCS1 | Nodulin consensus sequence 1 | P$NCS1.01 | 0.85 | 872 | 882 | 0.805 | 0.888
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 879 | 889 | 1 | 0.896
P$MSAE | M-phase-specific activator elements | P$MSA.01 | 0.8 | 880 | 894 | 1 | 0.877
P$MYBL | MYB-like proteins | P$NTMYBAS1.01 | 0.96 | 900 | 916 | 0.949 | 0.968
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 909 | 925 | 1 | 0.905
P$MYBL | MYB-like proteins | P$AS1_AS2_I.01 | 0.99 | 911 | 927 | 1 | 1
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 981 | 991 | 1 | 0.893
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.9 | 982 | 996 | 1 | 1
P$L1BX | L1 box, motif for L1 layer-specific expression | P$PDF2.01 | 0.85 | 982 | 998 | 1 | 0.884
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.01 | 0.9 | 983 | 999 | 1 | 0.973
P$MADS | MADS box proteins | P$AGL15.01 | 0.79 | 1006 | 1026 | 0.825 | 0.793
P$MYBS | MYB proteins with single DNA binding repeat | P$ZMMRP1.01 | 0.79 | 1008 | 1024 | 0.778 | 0.811
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.9 | 1010 | 1024 | 1 | 0.91
P$CGCG | Calmodulin binding/CGCG box binding proteins | P$ATSR1.01 | 0.84 | 1051 | 1067 | 1 | 0.859
P$ABRE | ABA response elements | P$ABF1.01 | 0.79 | 1053 | 1069 | 1 | 0.797
P$CE3S | Coupling element 3 sequence | P$CE3.01 | 0.77 | 1052 | 1070 | 1 | 0.874
P$NACF | Plant specific NAC [NAM (no apical meristem), ATAF1/2, CUC2 (cup-shaped cotyledons 2)] transcription factors | P$ANAC092.01 | 0.92 | 1055 | 1067 | 1 | 0.924
P$DPBF | Dc3 promoter binding factors | P$DPBF.01 | 0.89 | 1057 | 1067 | 1 | 0.908
P$PREM | Motifs of plastid response elements | P$MGPROTORE.01 | 0.77 | 1059 | 1089 | 1 | 0.806
O$MTEN | Core promoter motif ten elements | O$HMTE.01 | 0.88 | 1072 | 1092 | 0.961 | 0.94
P$DREB | Dehydration responsive element binding factors | P$HVDRF1.01 | 0.89 | 1079 | 1093 | 1 | 0.922
P$PREM | Motifs of plastid response elements | P$MGPROTORE.01 | 0.77 | 1077 | 1107 | 1 | 0.784
O$MTEN | Core promoter motif ten elements | O$DMTE.01 | 0.77 | 1097 | 1117 | 0.844 | 0.802
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.02 | 0.87 | 1135 | 1151 | 1 | 0.915
P$SALT | Salt/drought responsive elements | P$ALFIN1.02 | 0.95 | 1136 | 1150 | 1 | 0.954
P$L1BX | L1 box, motif for L1 layer-specific expression | P$PDF2.01 | 0.85 | 1179 | 1195 | 1 | 0.882
P$SBPD | SBP-domain proteins | P$SBP.01 | 0.88 | 1199 | 1215 | 1 | 0.912
P$PALA | Conserved box A in PAL and 4CL gene promoters | P$PALBOXA.01 | 0.84 | 1201 | 1219 | 1 | 0.863
P$MYBS | MYB proteins with single DNA binding repeat | P$ZMMRP1.01 | 0.79 | 1230 | 1246 | 1 | 0.838
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 1244 | 1254 | 1 | 0.777
P$MADS | MADS box proteins | P$AGL2.01 | 0.82 | 1248 | 1268 | 0.969 | 0.828
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.9 | 1262 | 1278 | 1 | 0.953
P$HEAT | Heat shock factors | P$HSE.01 | 0.81 | 1278 | 1294 | 1 | 0.864
P$LEGB | Legumin Box family | P$RY.01 | 0.87 | 1277 | 1303 | 1 | 0.871
P$MYBS | MYB proteins with single DNA binding repeat | P$OSMYBS.01 | 0.82 | 1343 | 1359 | 0.75 | 0.822
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 1349 | 1359 | 0.969 | 0.955
P$STKM | Storekeeper motif | P$STK.01 | 0.85 | 1355 | 1369 | 1 | 0.95
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 1403 | 1419 | 0.969 | 0.85
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 1439 | 1455 | 0.75 | 0.797
P$OCSE | Enhancer element first identified in the promoter of the octopine synthase gene (OCS) of the Agrobacterium tumefaciens T-DNA | P$OCSL.01 | 0.69 | 1437 | 1457 | 0.769 | 0.734
P$HEAT | Heat shock factors | P$HSFA1A.01 | 0.75 | 1478 | 1494 | 1 | 0.764
P$WBXF | W Box family | P$WRKY.01 | 0.92 | 1488 | 1504 | 1 | 0.94
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 1491 | 1511 | 0.957 | 0.859
P$MYBS | MYB proteins with single DNA binding repeat | P$HVMCB1.01 | 0.93 | 1498 | 1514 | 1 | 0.934
P$MYBS | MYB proteins with single DNA binding repeat | P$TAMYB80.01 | 0.83 | 1509 | 1525 | 0.75 | 0.845
P$MSAE | M-phase-specific activator elements | P$MSA.01 | 0.8 | 1551 | 1565 | 1 | 0.807
P$OPAQ | Opaque-2 like transcriptional activators | P$O2.01 | 0.87 | 1558 | 1574 | 1 | 0.883
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1569 | 1579 | 0.83 | 0.904
P$AHBP | Arabidopsis homeobox protein | P$ATHB5.01 | 0.89 | 1569 | 1579 | 0.936 | 0.978
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 1609 | 1625 | 0.75 | 0.781
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 1613 | 1623 | 1 | 0.966
P$TEFB | TEF-box | P$TEF1.01 | 0.76 | 1617 | 1637 | 0.839 | 0.812
P$WNAC | Wheat NAC-domain transcription factors | P$TANAC69.01 | 0.68 | 1625 | 1647 | 0.896 | 0.811
P$NACF | Plant specific NAC [NAM (no apical meristem), ATAF1/2, CUC2 (cup-shaped cotyledons 2)] transcription factors | P$ANAC019.01 | 0.94 | 1632 | 1644 | 0.953 | 0.968
P$GTBX | GT-box elements | P$S1F.01 | 0.79 | 1642 | 1658 | 1 | 0.917
P$PSRE | Pollen-specific regulatory elements | P$GAAA.01 | 0.83 | 1644 | 1660 | 1 | 0.864
P$MYBL | MYB-like proteins | P$MYBPH3.01 | 0.8 | 1647 | 1663 | 1 | 0.938
P$DOFF | DNA binding with one finger (DOF) | P$DOF1.01 | 0.98 | 1694 | 1710 | 1 | 1
P$HEAT | Heat shock factors | P$HSFA1A.01 | 0.75 | 1703 | 1719 | 0.857 | 0.757
P$CCAF | Circadian control factors | P$EE.01 | 0.84 | 1719 | 1733 | 1 | 0.953
P$MADS | MADS box proteins | P$AG.01 | 0.8 | 1717 | 1737 | 0.902 | 0.813
P$GTBX | GT-box elements | P$ASIL1.01 | 0.93 | 1732 | 1748 | 1 | 0.967
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 1749 | 1759 | 1 | 0.965
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1749 | 1767 | 0.75 | 0.83
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1754 | 1772 | 0.75 | 0.822
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.02 | 0.76 | 1757 | 1773 | 0.89 | 0.848
P$AHBP | Arabidopsis homeobox protein | P$ATHB9.01 | 0.77 | 1761 | 1771 | 0.75 | 0.815
O$VTBP | Vertebrate TATA binding protein factor | O$VTATA.02 | 0.89 | 1777 | 1793 | 1 | 0.996
P$DOFF | DNA binding with one finger (DOF) | P$DOF3.01 | 0.99 | 1778 | 1794 | 1 | 0.995
O$PTBP | Plant TATA binding protein factor | O$PTATA.02 | 0.9 | 1780 | 1794 | 1 | 0.923
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 1787 | 1803 | 1 | 0.967
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.9 | 1790 | 1806 | 1 | 0.972
O$VTBP | Vertebrate TATA binding protein factor | O$ATATA.01 | 0.78 | 1803 | 1819 | 0.75 | 0.812
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 1847 | 1863 | 1 | 0.945
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.9 | 1850 | 1866 | 1 | 0.966
P$MADS | MADS box proteins | P$SQUA.01 | 0.9 | 1866 | 1886 | 1 | 0.916
P$GTBX | GT-box elements | P$SBF1.01 | 0.87 | 1872 | 1888 | 1 | 0.905
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 1873 | 1889 | 1 | 0.837
P$AHBP | Arabidopsis homeobox protein | P$HAHB4.01 | 0.87 | 1878 | 1888 | 1 | 0.902
P$L1BX | L1 box, motif for L1 layer-specific expression | P$ATML1.01 | 0.82 | 1882 | 1898 | 0.75 | 0.824
O$INRE | Core promoter initiator elements | O$DINR.01 | 0.94 | 1886 | 1896 | 0.969 | 0.949
P$EPFF | EPF-type zinc finger factors, two canonical Cys2/His2 zinc finger motifs separated by spacers of various length | P$ZPT22.01 | 0.75 | 1887 | 1909 | 1 | 0.755
P$GAPB | GAP-Box (light response elements) | P$GAP.01 | 0.88 | 1907 | 1921 | 1 | 0.903
P$SUCB | Sucrose box | P$SUCROSE.01 | 0.81 | 1912 | 1930 | 1 | 0.849
P$HMGF | High mobility group factors | P$HMG_IY.01 | 0.89 | 1920 | 1934 | 1 | 0.892
P$SEF4 | Soybean embryo factor 4 | P$SEF4.01 | 0.98 | 1927 | 1937 | 1 | 0.984
P$MYBL | MYB-like proteins | P$ATMYB77.01 | 0.87 | 1973 | 1989 | 1 | 0.894
P$GTBX | GT-box elements | P$ASIL1.01 | 0.93 | 1998 | 2014 | 1 | 0.971
P$OPAQ | Opaque-2 like transcriptional activators | P$O2_GCN4.01 | 0.81 | 2001 | 2017 | 1 | 0.83
P$IBOX | Plant I-Box sites | P$GATA.01 | 0.93 | 2018 | 2034 | 1 | 0.964
P$MYBS | MYB proteins with single DNA binding repeat | P$MYBST1.01 | 0.9 | 2021 | 2037 | 1 | 0.957
P$LREM | Light responsive element motif, not modulated by different light qualities | P$RAP22.01 | 0.85 | 2035 | 2045 | 1 | 0.858
P$MIIG | MYB IIG-type binding sites | P$MYBC1.01 | 0.92 | 2033 | 2047 | 1 | 0.941
P$HEAT | Heat shock factors | P$HSFA1A.01 | 0.75 | 2041 | 2057 | 1 | 0.801
P$MYBL | MYB-like proteins | P$GAMYB.01 | 0.91 | 2054 | 2070 | 1 | 0.918
P$GTBX | GT-box elements | P$GT1.01 | 0.85 | 2056 | 2072 | 1 | 0.876
P$ASRC | AS1/AS2 repressor complex | P$AS1_AS2_II.01 | 0.86 | 2067 | 2075 | 1 | 0.906
P$EINL | Ethylene insensitive 3 like factors | P$TEIL.01 | 0.92 | 2098 | 2106 | 0.964 | 0.926
O$VTBP | Vertebrate TATA binding protein factor | O$LTATA.01 | 0.82 | 2110 | 2126 | 1 | 0.828
P$MYBL | MYB-like proteins | P$MYBPH3.02 | 0.76 | 2110 | 2126 | 1 | 0.807
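Each row of the table above reports one matrix match in the promoter sequence. The minimal Python sketch below shows how such hits can be checked, filtered and compared. It assumes that the columns are family, description, matrix, optimized matrix threshold, start, end, core similarity and matrix similarity (the column header is not part of this excerpt, so this reading is an assumption) and that start and end are inclusive positions. The `Match` type, the `strong_hits` margin and the overlap test are hypothetical illustrations, not part of MatInspector.

```python
from collections import Counter
from typing import List, NamedTuple


class Match(NamedTuple):
    family: str           # e.g. "P$AHBP"
    matrix: str           # e.g. "P$HAHB4.01"
    opt_threshold: float  # assumed: optimized matrix similarity threshold
    start: int            # assumed: inclusive start position
    end: int              # assumed: inclusive end position
    core_sim: float
    matrix_sim: float

    @property
    def length(self) -> int:
        # Inclusive coordinates, so both ends are counted.
        return self.end - self.start + 1


# A few rows copied from the table above (description column omitted).
MATCHES = [
    Match("O$VTBP", "O$VTATA.01", 0.90, 61, 77, 1.0, 0.961),
    Match("P$DOFF", "P$DOF1.01", 0.98, 126, 142, 1.0, 0.990),
    Match("P$AHBP", "P$HAHB4.01", 0.87, 207, 217, 1.0, 0.967),
    Match("P$AHBP", "P$BLR.01", 0.90, 254, 264, 1.0, 0.998),
    Match("P$GAGA", "P$BPC.01", 1.00, 329, 353, 1.0, 1.000),
    Match("P$OCSE", "P$OCSL.01", 0.69, 216, 236, 1.0, 0.710),
]


def passes(m: Match) -> bool:
    # Assumed reporting rule: matrix similarity reaches the optimized threshold.
    return m.matrix_sim >= m.opt_threshold


def strong_hits(matches: List[Match], margin: float = 0.05) -> List[Match]:
    # Hypothetical stricter filter: keep hits clearing the threshold by `margin`.
    return [m for m in matches if m.matrix_sim - m.opt_threshold >= margin]


def overlapping(a: Match, b: Match) -> bool:
    # Two inclusive intervals overlap if each starts before the other ends.
    return a.start <= b.end and b.start <= a.end


print([m.matrix for m in strong_hits(MATCHES)])
print(Counter(m.family for m in MATCHES).most_common(1))
```

Grouping by family, as in the last line, is one way to see which binding-site types recur along the promoter; the margin in `strong_hits` is an arbitrary illustrative value.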









5.2 Vector Construction

The DNA fragments representing promoters p-GOS2_perm1 (SEQ ID NO14) and p-GOS2_perm2 (SEQ ID NO15) were generated by gene synthesis. Restriction endonuclease sites suitable for cloning the promoter fragments were included in the synthesized sequences. The p-GOS2_perm1 (SEQ ID NO14) and p-GOS2_perm2 (SEQ ID NO15) promoters are cloned, using the SwaI restriction endonuclease, into destination vectors compatible with the MultiSite Gateway System, upstream of an attachment site and a terminator.


The beta-glucuronidase (GUS) or uidA gene, which encodes an enzyme for which various chromogenic substrates are known, is used as the reporter for determining the expression features of the permutated p-GOS2_perm1 (SEQ ID NO14) and p-GOS2_perm2 (SEQ ID NO15) promoter sequences.


A pENTR/A vector harboring the beta-glucuronidase reporter gene c-GUS (the prefix c- denoting coding sequence) is constructed by site-specific recombination (BP reaction).


By performing a site-specific recombination (LR reaction), the created pENTR/A is combined with the destination vector according to the manufacturer's (Invitrogen, Carlsbad, Calif., USA) MultiSite Gateway manual. The reaction yields a binary vector carrying the p-GOS2_perm1 promoter (SEQ ID NO14) or the p-GOS2_perm2 promoter (SEQ ID NO15), respectively, the beta-glucuronidase coding sequence c-GUS and a terminator.
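The BP and LR steps described above can be pictured as swapping the cassette between two att sites while converting the site types (BP: attB x attP gives attL and attR; LR: attL x attR gives attB and attP). The toy Python model below is a sketch under simplifying assumptions: molecules are lists of named segments, each carrying exactly one pair of att sites, and the additional site specificities used by MultiSite Gateway are not modelled. All segment names (`KanR`, `ccdB` placement, the donor and destination layouts) are hypothetical illustrations, not the actual vector maps.

```python
from typing import Dict, List, Tuple

# Simplified site conversions. For each key (sites of the insert-carrying
# molecule, sites of the vector), the value gives (sites flanking the moved
# payload in the product, sites flanking the vector's old cassette in the by-product).
SITE_RULES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("attB", "attP"): ("attL", "attR"),  # BP reaction
    ("attL", "attR"): ("attB", "attP"),  # LR reaction
}


def att_span(mol: List[str]) -> Tuple[int, int]:
    """Return the indices of the two att sites in a molecule."""
    idx = [i for i, seg in enumerate(mol) if seg.startswith("att")]
    if len(idx) != 2:
        raise ValueError("expected exactly two att sites")
    return idx[0], idx[1]


def recombine(insert_mol: List[str], vector_mol: List[str]) -> Tuple[List[str], List[str]]:
    """Exchange the cassettes between the att sites of two molecules.

    Returns (product, by_product): the product keeps the backbone of
    `vector_mol` but carries the payload of `insert_mol`."""
    i1, i2 = att_span(insert_mol)
    v1, v2 = att_span(vector_mol)
    key = (insert_mol[i1][:-1], vector_mol[v1][:-1])  # drop the "1"/"2" suffix
    if key not in SITE_RULES:
        raise ValueError(f"incompatible att sites: {key}")
    ins_site, vec_site = SITE_RULES[key]
    payload = insert_mol[i1 + 1:i2]
    cassette = vector_mol[v1 + 1:v2]
    product = vector_mol[:v1] + [ins_site + "1"] + payload + [ins_site + "2"] + vector_mol[v2 + 1:]
    by_product = insert_mol[:i1] + [vec_site + "1"] + cassette + [vec_site + "2"] + insert_mol[i2 + 1:]
    return product, by_product


# Hypothetical layouts for illustration only.
pcr_product = ["attB1", "c-GUS", "attB2"]
donor_vector = ["KanR", "attP1", "ccdB", "attP2"]
entry_clone, _ = recombine(pcr_product, donor_vector)  # BP reaction
destination = ["p-GOS2_perm1", "attR1", "ccdB", "attR2", "terminator"]
expression_clone, by_product = recombine(entry_clone, destination)  # LR reaction
print(expression_clone)
```

In this model the final expression clone reads promoter, attB1, c-GUS, attB2, terminator, mirroring the promoter::c-GUS::terminator arrangement described in the text, while the ccdB cassette ends up in the by-product.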


5.3 Generation of Transgenic Rice Plants

The Agrobacterium containing the respective expression vector is used to transform Oryza sativa plants. Mature dry seeds of the rice japonica cultivar Nipponbare are dehusked. Sterilization is carried out by incubating for one minute in 70% ethanol, followed by 30 minutes in 0.2% HgCl2, followed by six 15-minute washes with sterile distilled water. The sterile seeds are then germinated on a medium containing 2,4-D (callus induction medium). After incubation in the dark for four weeks, embryogenic, scutellum-derived calli are excised and propagated on the same medium. After two weeks, the calli are multiplied or propagated by subculture on the same medium for another 2 weeks. Embryogenic callus pieces are sub-cultured on fresh medium 3 days before co-cultivation (to boost cell division activity).



Agrobacterium strain LBA4404 containing the respective expression vector is used for co-cultivation. Agrobacterium is inoculated on AB medium with the appropriate antibiotics and cultured for 3 days at 28° C. The bacteria are then collected and suspended in liquid co-cultivation medium to a density (OD600) of about 1. The suspension is then transferred to a Petri dish and the calli are immersed in the suspension for 15 minutes. The callus tissues are then blotted dry on filter paper, transferred to solidified co-cultivation medium and incubated for 3 days in the dark at 25° C. Co-cultivated calli are grown on 2,4-D-containing medium for 4 weeks in the dark at 28° C. in the presence of a selection agent. During this period, rapidly growing resistant callus islands develop. After transfer of this material to a regeneration medium and incubation in the light, the embryogenic potential is released and shoots develop over the next four to five weeks. Shoots are excised from the calli and incubated for 2 to 3 weeks on an auxin-containing medium, from which they are transferred to soil. Hardened shoots are grown under high humidity and short days in a greenhouse.
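The protocol specifies a target density of about OD600 = 1 for the co-cultivation suspension but not how to reach it. A minimal sketch of the usual dilution arithmetic (C1·V1 = C2·V2) follows, assuming the OD600 reading lies in the photometer's linear range so that optical density scales with cell density; the function name and the example values are hypothetical.

```python
def resuspension_volume(reading: float, dilution_factor: float,
                        current_volume_ml: float, target_od: float = 1.0) -> float:
    """Total volume (mL) to which a bacterial suspension should be brought so
    that its OD600 equals `target_od`.

    `reading` is the OD600 measured on a sample diluted `dilution_factor`-fold.
    Assumes OD600 is proportional to cell density (linear range)."""
    true_od = reading * dilution_factor
    if true_od < target_od:
        raise ValueError("suspension is already below the target density")
    # C1 * V1 = C2 * V2  ->  V2 = C1 * V1 / C2
    return true_od * current_volume_ml / target_od


# Example: a 1:10 dilution of a 5 mL suspension reads 0.25 at 600 nm.
total = resuspension_volume(0.25, 10, 5.0)
medium_to_add = total - 5.0
print(total, medium_to_add)  # 12.5 7.5
```

In the example the undiluted suspension has an OD600 of 2.5, so 7.5 mL of co-cultivation medium brings the 5 mL to 12.5 mL at OD600 = 1.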


The primary transformants are transferred from a tissue culture chamber to a greenhouse. After a quantitative PCR analysis to verify the copy number of the T-DNA insert, only single-copy transgenic plants that exhibit tolerance to the selection agent are kept for harvest of T1 seed. Seeds are then harvested three to five months after transplanting. The method yields single-locus transformants at a rate of over 50% (Aldemita and Hodges 1996, Chan et al. 1993, Hiei et al. 1994).
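Because only single-copy events are kept, the stated single-locus rate of over 50% can be used to estimate how many primary transformants to regenerate. The sketch below is an illustrative planning calculation, not part of the described method: it treats events as independent, models the number of single-copy events as binomial, and uses p = 0.5 as a conservative lower bound for the "over 50%" rate. The function names and the 95% confidence level are hypothetical choices.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def events_needed(k: int, p: float = 0.5, confidence: float = 0.95) -> int:
    """Smallest number of independent T0 events that yields at least `k`
    single-copy events with probability >= `confidence`."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    n = k
    while prob_at_least(k, n, p) < confidence:
        n += 1
    return n


print(events_needed(1), events_needed(5))  # 5 16
```

With p = 0.5, five events give a 31/32 chance of at least one single-copy line, and sixteen events are needed for a 95% chance of at least five; a higher true rate would lower these numbers.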


Example 6: Expression Profile of the p-GOS2_perm1 (SEQ ID NO14) and p-GOS2_perm2 (SEQ ID NO15) Control Elements

To demonstrate and analyze the transcription regulating properties of a promoter, it is useful to operably link the promoter or its fragments to a reporter gene, which can be employed to monitor its expression both qualitatively and quantitatively. Preferably, bacterial β-glucuronidase is used (Jefferson 1987). β-glucuronidase activity can be monitored in planta with chromogenic substrates such as 5-bromo-4-chloro-3-indolyl-β-D-glucuronic acid (X-Gluc) in corresponding activity assays (Jefferson 1987). For determination of promoter activity and tissue specificity, plant tissue is dissected, stained and analyzed as described (e.g., Bäumlein 1991).


The regenerated transgenic T0 rice plants are used for reporter gene analysis.


General results for SEQ ID NO14: Medium-strong GUS expression is detected in all plant tissues analyzed.


General results for SEQ ID NO15: Medium-strong GUS expression is detected in all plant tissues analyzed.


General results for SEQ ID NO13: Medium-strong GUS expression is detected in all plant tissues analyzed.

Claims
  • 1. A synthetic promoter comprising SEQ ID NO: 2.
  • 2. An expression construct comprising the synthetic promoter of claim 1 operably linked to a nucleotide sequence of interest.
  • 3. A vector comprising the expression construct of claim 2.
  • 4. A host cell comprising the synthetic promoter of claim 1.
  • 5. The host cell of claim 4, wherein said host cell is a plant cell.
  • 6. A plant or plant part comprising the synthetic promoter of claim 1.
  • 7. A plant seed comprising the synthetic promoter of claim 1.
  • 8. A method of making a transgenic plant or plant cell, the method comprising: a) transforming a plant or plant cell with a construct comprising SEQ ID NO: 2 operably linked to a sequence of interest to produce a transgenic plant or plant cell; and optionally, b) regenerating a transgenic plant from said transformed plant cell.
  • 9. The method of claim 8 further comprising producing seed from said transgenic plant and collecting said seed.
Priority Claims (1)
Number Date Country Kind
10193800 Dec 2010 EP regional
Parent Case Info

This application is a National Stage application of International Application No. PCT/IB2011/055412, filed Dec. 1, 2011, which claims the benefit of U.S. Provisional Application No. 61/419,895, filed Dec. 6, 2010. This application also claims priority under 35 U.S.C. §119 to European Patent Application No. 10193800.9, filed Dec. 6, 2010.

PCT Information
Filing Document Filing Date Country Kind 371c Date
PCT/IB2011/055412 12/1/2011 WO 00 6/4/2013
Publishing Document Publishing Date Country Kind
WO2012/077020 6/14/2012 WO A
Related Publications (1)
Number Date Country
20130263330 A1 Oct 2013 US
Provisional Applications (1)
Number Date Country
61419895 Dec 2010 US