The present invention relates to the field of plant molecular biology and plant genetic engineering and more specifically relates to polynucleotide molecules useful for control of gene expression in plants and the identification of candidate gene promoter regulatory elements using bioinformatics.
One of the goals of plant genetic engineering is to produce plants with desirable characteristics or traits. Technological advances have provided the requisite tools to transform plants to contain and express foreign genes. The technological advances in plant transformation and regeneration have enabled researchers to take an exogenous polynucleotide molecule, such as a gene from a heterologous or native source, and incorporate that polynucleotide molecule into a plant genome. The gene can then be expressed in a plant cell to exhibit the added characteristic or trait. In one approach, expression of a gene in a plant cell or a plant tissue that does not normally express such a gene may confer a desirable phenotypic effect. In another approach, transcription of a gene or part of a gene in an antisense orientation may produce a desirable effect by preventing or inhibiting expression of an endogenous gene.
Expression of heterologous DNA sequences in a plant host is dependent upon the presence of an operably linked promoter that is functional within the plant host. Choice of the promoter sequence will determine when and where within the organism the heterologous DNA sequence is expressed. Thus, where expression is desired in a preferred tissue of a plant, tissue-preferred promoters are utilized. In contrast, where gene expression throughout the cells of a plant is desired, constitutive promoters are preferred. Additional regulatory sequences upstream and/or downstream from the core promoter sequence may be included in expression constructs of transformation vectors to bring about varying levels of tissue-preferred or constitutive expression of heterologous nucleotide sequences in a transgenic plant. Isolation and characterization of promoters and terminators that can serve as regulatory elements for expression of isolated nucleotide sequences of interest are needed for impacting various traits in plants.
Numerous promoters, which are active in plant cells, have been described in the literature. These promoters and numerous others have been used in the creation of constructs for transgene expression in plants. Despite the number of promoters, there is still a need for novel promoters and regulatory elements with beneficial expression characteristics.
For production of transgenic plants with various desired characteristics, it would be advantageous to have a variety of promoters to provide gene expression such that a gene is transcribed efficiently in the amount necessary to produce the desired effect. The commercial development of genetically improved germplasm has also advanced to the stage of introducing multiple traits into crop plants, often referred to as a gene stacking approach. In this approach, multiple genes conferring different characteristics of interest can be introduced into a plant. It is often desired when introducing multiple genes into a plant that each gene is modulated or controlled for optimal expression, leading to a requirement for diverse regulatory elements. In light of these and other considerations, it is apparent that optimal control of gene expression and regulatory element diversity are important in plant biotechnology.
According to one aspect, a computer-assisted method of identifying regulatory elements includes receiving a first orthologous species sequence, receiving a word length, receiving a relative offset, and receiving at least one additional orthologous species sequence, wherein each of the orthologous species sequences is associated with a species, and each of the species is an orthologous species. The method further includes performing a pairwise comparison between each pair of orthologous species sequences, computing using a computing device, overlapping portions of the sequence overlapping the sequences of all of the orthologous species sequences within the relative offset and greater than or equal to the word length. The method further includes providing an output to a user identifying the overlapping portions of the sequence for all of the orthologous species sequences to identify candidate regulatory elements.
According to another aspect, a system for identifying regulatory elements includes a computer and an article of software executing on the computer. The article of software is adapted for performing steps of receiving a first orthologous species sequence, receiving a word length, receiving a relative offset, receiving at least one additional orthologous species sequence, wherein each of the orthologous species sequences is associated with a species, and each of the species is an orthologous species. The article of software is further adapted for performing a pairwise comparison between each pair of orthologous species sequences, computing overlapping portions of the sequence overlapping the sequences of all of the orthologous species sequences within the relative offset and greater than or equal to the word length, and providing an output to a user identifying the overlapping portions of the sequence for all of the orthologous species sequences to identify candidate regulatory elements.
According to another aspect of the present invention, a computer-assisted method of identifying regulatory elements is provided. The method includes receiving a first sequence, receiving a word length, receiving a relative offset, receiving at least one additional sequence, performing a pairwise comparison between each pair of sequences, computing, using a computing device, overlapping portions of the first sequence overlapping the sequences of all of the sequences within the relative offset and greater than or equal to the word length, and providing an output to a user identifying the overlapping portions of the first sequence for all sequences to identify candidate regulatory elements.
The following description is merely exemplary in nature and is in no way intended to limit the methods, their application, or uses.
As used herein, the term “orthologs” may refer to two genes of different species that share a common evolutionary ancestry. They can be derived from a speciation event and belong to different species.
As used herein, the term “orthologous” may refer to two or more species that share a common evolutionary ancestry.
As used herein, the term “regulatory element” may refer to sequences responsible for expression of the associated coding sequence including, but not limited to, promoters, terminators, enhancers, introns, and the like. A “regulatory element” may be in different portions of the gene.
As used herein, the term “promoter” may refer to a regulatory region of DNA capable of regulating the transcription of a linked sequence. It may, but need not, include a TATA box capable of directing RNA polymerase II to initiate RNA synthesis at the appropriate transcription initiation site for a particular coding sequence. A promoter may also include other recognition sequences generally positioned upstream or 5′ to the TATA box, which may be referred to as upstream promoter elements.
It should be appreciated that confident identification of orthologs can also rely on the availability of a suitably comprehensive collection of genes from both organisms. However, whether a particular set of species is appropriate can be readily determined from results obtained using the methodology. For example, if too many or too few candidate regulatory elements are consistently found, then it is apparent that the orthologous species used should be changed.
Where a maize species is of interest and one wants to find a particular promoter within a sequence associated with the maize species, other species that may be used include rice and sorghum. Alternatively, another monocot may be used, such as onion, barley, or wheat.
Given three orthologous species, species A, species B, and species C, three pairwise comparisons are performed, namely A and B, A and C, and B and C. A distance is defined by the user which is a relative distance to an ATG start site (where DNA is used).
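By way of illustration only, the enumeration of pairwise comparisons described above can be sketched as follows. The function name `pairwise_species` is hypothetical and is not part of the described software; the sketch merely shows that three species yield three unordered pairs and that the count grows as n(n-1)/2 for n species.

```python
from itertools import combinations


def pairwise_species(names):
    """Return every unordered pair of species names, preserving input order."""
    return list(combinations(names, 2))


# Three species give three comparisons: A-B, A-C, and B-C.
for first, second in pairwise_species(["A", "B", "C"]):
    print(first, "vs", second)
```
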
Although distance is a matter of user preference, useful distances include those on the order of about 100 bases or 150 bases. Of course, lesser or greater distances may be used. A shared element size is also selected by the user. The shared element size is a minimum size of interest to the user. Although shared element size is a matter of user preference, usually the shared element size is in the range of 6 to 25. Having a size of at least six reduces the likelihood of random occurrences unrelated to conservation. A shared element size that is too large may miss possible regulatory elements. It is to be appreciated that the shared element size is a minimum size of interest to the user, so providing a relatively small shared element size of 6 or 7 will still capture much larger regulatory elements where present. If two or more common elements overlap each other in every sequence used in the comparison, they are merged into a single element. Thus, specifying a 6-letter word size can produce a 30-letter common element.
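The merging of overlapping common elements may be illustrated with the following minimal sketch. It is an assumption-laden simplification: it merges word hits within a single sequence only, whereas the description requires that the elements overlap in every sequence compared. The function name `merge_word_hits` and the representation of hits by their start index are hypothetical.

```python
def merge_word_hits(starts, word_len):
    """Merge overlapping word hits into (start, end) spans, end exclusive.

    `starts` holds the start index of each shared word in one sequence.
    Two hits are merged only when they share at least one base.
    """
    spans = []
    for start in sorted(starts):
        end = start + word_len
        if spans and start < spans[-1][1]:
            # Overlaps the previous span: extend it.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans


# Twenty-five consecutive 6-letter hits collapse into one 30-letter element.
print(merge_word_hits(range(10, 35), 6))
```

In this sketch, hits that merely abut without sharing a base are kept separate, which is one possible reading of "overlap" in the description.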
The pairwise comparisons performed take into account the distance specified by the user in determining relative similarity. Thus, for example, where a distance of 100 bases or more is specified, the first word of the shared element size in species A is searched for within the 100 bases of the other species. Stretches whose length is greater than or equal to the minimum size of interest are retained for each pairwise comparison. Only those stretches of sequence common to all of the pairwise comparisons are considered to be candidate regulatory elements. It should be appreciated that this methodology preserves relative order and approximate spacing across the entire set of species. It should further be noted that this approach does not rely upon complex scoring or statistical methods for evaluating possible alignments between the sequences of the different species, and thus does not have the same types of limitations and issues associated with such systems. It is also observed that gains in performance can be made by implementing the method using a non-linear binary search instead of a linear approach. This reduces processing time significantly.
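A minimal sketch of the comparison described above is given below. It is not the described software. All function names are hypothetical. Positions are assumed to be measured as the distance of a word's start from the 3′ end of each upstream sequence (that is, from the ATG). The use of Python's `bisect` module is only one possible reading of the binary search remark. The sketch reports words shared in every pairwise comparison within the relative offset; it does not perform the merging step and does not separately verify relative order.

```python
from bisect import bisect_left
from itertools import combinations


def word_positions(seq, word_len):
    """Map each word to an ascending list of distances from the 3' (ATG) end."""
    index = {}
    # Walk from the 3' end toward the 5' end so each list is built already sorted.
    for i in range(len(seq) - word_len, -1, -1):
        index.setdefault(seq[i:i + word_len], []).append(len(seq) - i)
    return index


def has_hit_within(sorted_pos, x, max_offset):
    """True if any value in sorted_pos lies in [x - max_offset, x + max_offset]."""
    i = bisect_left(sorted_pos, x - max_offset)
    return i < len(sorted_pos) and sorted_pos[i] <= x + max_offset


def shared_words(idx_x, idx_y, max_offset):
    """Words found in both indexes at positions differing by at most max_offset."""
    shared = set()
    for word, xs in idx_x.items():
        ys = idx_y.get(word)
        if ys and any(has_hit_within(ys, x, max_offset) for x in xs):
            shared.add(word)
    return shared


def candidate_words(seqs, word_len, max_offset):
    """Words shared in every pairwise comparison of the given sequences."""
    indexes = [word_positions(s.upper(), word_len) for s in seqs]
    result = None
    for idx_x, idx_y in combinations(indexes, 2):
        pair = shared_words(idx_x, idx_y, max_offset)
        result = pair if result is None else result & pair
    return result or set()


# Made-up toy sequences, not real promoter data.
toy = ["TTGACGTCATAAACCC", "GGTGACGTCAGGGTTT", "CATGACGTCACCCAAA"]
print(sorted(candidate_words(toy, word_len=6, max_offset=3)))
```

The three overlapping 6-letter words reported for the toy input could then be merged into a single 8-letter candidate element, consistent with the merging behavior described earlier.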
In addition, it is contemplated that more nuanced pattern searches may be used in making comparisons. In particular, some of the ‘letters’ in a word may be variables. It is further contemplated that the analysis need not be performed only on forward-written words. In particular, words can be searched in both the forward and the reverse direction. Some regulatory elements, especially those with ‘enhancer-like’ function, can work in both directions.
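The pattern-search variations contemplated above may be illustrated by the following sketch. Two assumptions are made that are not stated in the description: the variable letter is written as `N`, and the "reverse direction" is interpreted as the reverse complement of the word. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def matches(pattern, word):
    """True if word matches pattern, where 'N' in the pattern matches any letter."""
    return len(pattern) == len(word) and all(
        p == "N" or p == w for p, w in zip(pattern, word)
    )


def find_pattern(seq, pattern, both_strands=True):
    """Return (start, strand) for each match; '-' marks a reverse-complement match."""
    seq = seq.upper()
    pattern = pattern.upper()
    targets = [("+", pattern)]
    if both_strands:
        targets.append(("-", reverse_complement(pattern)))
    hits = []
    width = len(pattern)
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        for strand, target in targets:
            if matches(target, window):
                hits.append((i, strand))
    return hits


# Made-up sequence: one forward match and one reverse-direction match.
print(find_pattern("AAGATCAGTTCTGATCAA", "GATNAG"))
```

A palindromic pattern would be reported on both strands at the same position in this sketch; a fuller implementation might deduplicate such hits.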
Once candidate regulatory elements have been identified, this information may be used in various applications. Such applications may be relevant to transgenic research, such as improvement of crop plants. The method may be used for defining the boundaries of functional promoters. This may simplify sub-cloning processes and focus the research on promoter regions more likely to yield the full and desired expression pattern. It also enables efficient use of cloning vector space; some cloning vectors become unstable with large inserts. This issue is particularly germane to transgenic stacking experiments, because with more gene constructs packed into the same vector, the risk of vector instability increases, and once in the plant there is added risk to transformation efficiency and stability.
Various methods are available for using candidate sequences. Functional fragments can be obtained by use of restriction enzymes to cleave naturally occurring regulatory element nucleotide sequences. Alternatively, such elements may be synthesized from the naturally occurring DNA sequence; or can be obtained through the use of PCR technology. See particularly, Mullis et al. (1987) Methods Enzymol. 155:335-350, and Erlich, ed. (1989) PCR Technology (Stockton Press, New York), all of which are herein incorporated by reference. Where transformation vectors are formed, activity can be measured by Northern blot analysis, reporter activity measurements when using transcriptional fusions, and the like. See, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.), herein incorporated by reference. Reporter genes can be included in the transformation vectors. Examples of suitable reporter genes known in the art can be found in, for example: Jefferson et al. (1991) in Plant Molecular Biology Manual, ed. Gelvin et al. (Kluwer Academic Publishers), pp. 1-33; DeWet et al. (1987) Mol. Cell. Biol. 7:725-737; Goff et al. (1990) EMBO J. 9:2517-2522; Kain et al. (1995) BioTechniques 19:650-655; and Chiu et al. (1996) Current Biology 6:325-330, all of which are incorporated by reference. Additional information regarding transformation may be found in Regeneration of plants after transformation: McCormick et al. (1986) Plant Cell Reports 5:81-84, herein incorporated by reference in its entirety.
It may also be desired that expression associated with the candidate regulatory elements identified be suppressed. Methods of co-suppression are known in the art and can be similarly applied. These methods involve the silencing of a targeted gene by spliced hairpin RNAs and similar methods, also called RNA interference and promoter silencing (see Smith et al. (2000) Nature 407:319-320; Waterhouse and Helliwell (2003) Nat. Rev. Genet. 4:29-38; Waterhouse et al. (1998) Proc. Natl. Acad. Sci. USA 95:13959-13964; Chuang and Meyerowitz (2000) Proc. Natl. Acad. Sci. USA 97:4985-4990; Stoutjesdijk et al. (2002) Plant Physiol. 129:1723-1731; and Patent Application WO 99/53050; WO 99/49029; WO 99/61631; WO 00/49035; and U.S. Pat. No. 6,506,559).
Thus, it should be apparent that once candidate regulatory elements are found, various methods may be applied. One example of a promoter which has been identified using the software methodology described herein is disclosed in U.S. Provisional Patent Application No. 60/963,878, entitled A Plant Regulatory Region That Directs Transgene Expression in the Maternal and Supporting Tissue of Maize Ovules and Pollinated Kernels, filed Aug. 7, 2007, and herein incorporated by reference in its entirety. See also U.S. Published Patent Application No. 2009-0094713, herein incorporated by reference in its entirety. The Published Patent Application discloses compositions comprising nucleotide sequences for a reproductive-tissue-preferred and preferentially an immature-ear-preferred promoter region for an actin depolymerization factor (ADF) gene, more particularly, the ADF4 promoter. Regulatory motifs of about six or eight bases within the ADF4 promoter sequence were identified by comparison to upstream sequences from orthologous genes from sorghum and rice. The 1000 base pairs upstream of the ADF4 promoter, relative to the ATG start of translation, were compared to the 1000 base pairs upstream sequence of the orthologous rice and sorghum genes. The comparison was carried out by performing pairwise comparisons of multiple regulatory sequences from a plurality of orthologous species, here maize, rice and sorghum, to identify the regulatory motifs.
There, the methodology and system described herein were applied to identify regulatory motifs in the ADF4 promoter. The 1000 base pairs upstream of the ADF4 promoter, relative to the ATG start of translation, were compared to the 1000 base pairs upstream sequence of the orthologous rice and sorghum genes to provide the output shown in
It should be appreciated that the methodology described does not require complex scoring rules such as may be associated with other methodologies. The process allows users to identify conserved candidate regulatory elements in gene promoters. Multiple promoters can be compared. The main approach is to compare promoters for orthologous genes across species, such as maize, rice and sorghum, or to compare genes within and/or between species that share expression patterns. The result is a listing of short promoter sequences that are preserved in the same relative order and approximate spacing across the set of promoters compared, and that also defines the likely promoter functional boundary.
The method may be used in various applications. Such applications may be relevant to transgenic research, such as improvement of crop plants. The method may be used for defining the boundaries of functional promoters. This may simplify sub-cloning processes and focus the research on promoter regions more likely to yield the full and desired expression pattern. It also enables efficient use of cloning vector space; some cloning vectors become unstable with large inserts. This issue is germane to transgenic stacking experiments, because with more gene constructs packed into the same vector, the risk of vector instability increases, and once in the plant there is added risk to transformation efficiency and stability. By allowing less DNA to be used, there is the practical advantage of having to describe and account for less introduced DNA, often a regulatory concern.
These methods allow identification of novel regulatory elements which, alone or in combination, may lead to methods for novel recombined or synthetic promoters having enhanced or novel expression capability. It should also be clear that multiple promoters may be searched for simultaneously. It should be appreciated that the methods may be used for comparing promoters and related types of diffuse regulatory elements, not necessarily promoters, and may be used for any organism, not just plants.
In addition, although discussed in the context of a comparative genomics method, sets of co-regulated genes (similar mRNA expression patterns), such as those of a common biochemical or signaling pathway may be used. These genes, from one or multiple species, also may serve as inputs to the program.
Although various specific embodiments and examples are provided herein, it should be understood that such examples and specific disclosure, while indicating embodiments of the invention, are given by way of illustration only. From the above discussion, one skilled in the art can ascertain the essential characteristics of the embodiments, and without departing from the spirit and scope thereof, can make various changes and modifications of them to adapt to various usages, conditions, and environments. Thus, various modifications of the embodiments in addition to those shown and described herein will be apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims.
This application claims priority under 35 U.S.C. §119(e) to provisional application Ser. No. 61/086,372 filed Aug. 5, 2008 herein incorporated by reference in its entirety.
Number | Date | Country
---|---|---
61086372 | Aug 2008 | US