SYNTHETIC ALGAL PROMOTERS

Information

  • Patent Application
  • 20190382779
  • Publication Number
    20190382779
  • Date Filed
    February 16, 2017
  • Date Published
    December 19, 2019
Abstract
This invention provides synthetic promoters capable of promoting and/or initiating transcription of a polynucleotide in an algal cell, and methods of designing, producing and using such promoters.
Description
INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS A TEXT FILE

A Sequence Listing is provided herewith as a text file, “UCSDP044US_ST25.txt” created on Jan. 8, 2019 and having a size of 204,189 bytes. The contents of the text file are incorporated by reference herein in their entirety.


BACKGROUND

Algae are among the most ancient and diverse organisms on the planet. Microalgae have evolved to adapt to a wide range of environments and consequently have proven to be a rich source of genetic and chemical diversity (Blunt et al., 2012; Gimpel et al., 2013; Parker et al., 2008). This diversity has been exploited as a unique source of bioactive compounds, including antioxidants, omega 3 fatty acids, and potentially novel therapeutic drugs (Cardozo et al., 2007). In addition, microalgae have also proven to be cost-effective and safe hosts for expressing a wide array of recombinant proteins, including human and animal therapeutics, vaccines, and industrial enzymes (Georgianna et al., 2013; Griesbeck and Kirchmayr, 2012; Rosales-Mendoza et al., 2012; Specht et al., 2010).



Chlamydomonas reinhardtii is a long established model system for studying molecular and genetic systems of algae. The most successful advances in recombinant protein expression within C. reinhardtii have been within the chloroplast where exogenous protein levels have reached almost 10% of total soluble protein (Manuell et al., 2007). This progress has been aided by the fact that gene integration occurs exclusively by homologous recombination within the plastid (Fischer et al., 1996). The chloroplast also has strong, well-characterized promoters and regulatory untranslated regions (UTRs) to enable high levels of transgene expression (Rosales-Mendoza et al., 2012; Specht et al., 2010). The most successful regulatory elements are those from endogenous highly expressed photosynthetic proteins (Gimpel and Mayfield, 2013; Rosales-Mendoza et al., 2012; Specht et al., 2010). However, recent work in the Mayfield laboratory has shown that high-throughput analysis of synthetic 5′ UTRs can identify novel regulatory elements and lead to increased transgene expression within the plastid (Specht and Mayfield, 2012).


While advancements have been made in heterologous nuclear gene expression in C. reinhardtii over the last several years (Rasala et al., 2013; Rasala et al., 2012; Schroda et al., 2000), these tools still lag significantly behind both plastid gene expression in algae and heterologous gene expression in many other eukaryotic organisms. Controlled nuclear gene expression is an essential tool for synthetic biology in any industrial microorganism. Recent advances also allow protein products to be targeted to any cellular location in C. reinhardtii (Rasala et al., 2013). Targeted expression is essential for metabolic engineering, since enzymes need to be localized to their functional site. Proper localization is also important for the production of high-value protein products. Specific organelles may be better suited for proper post-translational modification and folding of complex proteins. In particular, chloroplasts lack the enzymes involved in protein glycosylation, an essential modification for many therapeutic proteins (Lingg et al., 2012). Finally, nuclear expression allows for the secretion of recombinant proteins, which can lead to simpler and cheaper downstream processing (Corchero et al., 2013).


One of the main reasons for poor heterologous gene expression from the nuclear genome of algae is the lack of strong promoters (Rosales-Mendoza et al., 2012; Specht et al., 2010). Studies have identified several endogenous promoters that promote exogenous gene expression, including those from well-characterized and highly expressed genes such as those for the Rubisco small subunit (RBCS2), heat shock protein 70A (HSP70A), and photosystem I protein psaD (Cerutti et al., 1997; Schroda et al., 2000; Fischer and Rochaix, 2001). In an attempt to increase expression above the modest levels achieved with these native promoters, chimeric promoters have been developed that contain the heat shock 70A promoter region fused upstream of the RBCS2 promoter (ar1), which has led to increased transcription (Schroda et al., 2002; Schroda et al., 2000; Wu et al., 2008). However, protein accumulation from exogenous genes expressed using this best chimeric promoter is still poor, with recombinant protein levels peaking around 0.25% of total soluble protein, which is well below the level of economic viability for almost any recombinant protein product. Finally, viral promoters that are favored in higher plant expression systems have been shown to be minimally successful in algal systems (Diaz-Santos et al., 2013). Therefore, novel regulatory elements must be identified or generated and combined into robust promoters capable of driving high rates of transcription in order to achieve the robust exogenous protein expression required to make algae a true industrial organism.


Several recent reviews have highlighted the generation of synthetic promoters and promoter libraries as important biobricks for protein expression and, in particular, systems engineering (Blazeck and Alper, 2013; Hammer et al., 2006; Mukherji and van Oudenaarden, 2009; Ruth and Glieder, 2010). Engineered promoters have demonstrated the ability to drive exogenous gene expression above levels achieved by the best native promoter systems. In addition, development of libraries of designer promoters is essential for systems engineering. The synthetic nature of these promoters reduces or eliminates the chance of homology dependent gene silencing and can potentially allow them to be utilized in multiple species or cell lines. In this study, publicly available mRNA expression data were utilized to identify cis-motifs found in promoters of highly expressed C. reinhardtii genes. These motifs were then used to generate a novel set of completely synthetic algal promoters (saps) that allowed for high constitutive gene expression within the C. reinhardtii nucleus. A combination of analyses of these native promoters and novel saps revealed previously uncharacterized C. reinhardtii promoter structures including a newly identified core DNA motif important for promoter function in highly transcribed genes.


SUMMARY

Provided are synthetic promoters useful for high level transcription or expression of polynucleotides in an algal cell. Accordingly, in one aspect, provided is a synthetic promoter capable of promoting and/or initiating transcription of a polynucleotide in an algal cell. In varying embodiments, the synthetic promoter comprises from 3 to 30, e.g., from 3 to 27, e.g., from 3 to 25, e.g., from 3 to 20, e.g., from 3 to 15, e.g., from 3 to 10, e.g., from 3 to 5, promoter (cis)-elements selected from the group consisting of the sequences in Tables 1 and 2, and FIGS. 16A and 16B. In varying embodiments, the promoter (cis)-elements are positioned or located within the promoter relative to the transcriptional start site (TSS) as indicated in Table 1. In varying embodiments, the synthetic promoter comprises one or more transcriptional factor binding site motifs selected from the group consisting of the sequences in FIGS. 17A, 17B, and 17C. In varying embodiments, the promoter comprises a nucleic acid sequence of any one of the sequences in Table 4 (e.g., any one of SEQ ID NOs: 38-62). In varying embodiments, the promoter is responsive to light exposure and comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16A. In varying embodiments, the promoter is responsive to dark exposure and comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16B. In varying embodiments, the promoter is at least about 200 bp in length and up to about 500 bp, 600 bp, 700 bp, 750 bp, 800 bp, 900 bp or 1000 bp in length. In varying embodiments, the synthetic promoter promotes transcription levels that are at least about 2-fold greater, e.g., 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, or more, greater than a control promoter (e.g., a random polynucleotide sequence or a native promoter). 
In varying embodiments, the promoter (cis)-elements are positioned or arranged within a promoter scaffold or backbone. In varying embodiments, the nucleic acid base of highest probability or second highest probability at a particular position of the promoter scaffold or backbone (e.g., based on known native promoter sequences) relative to the transcriptional start site (TSS) is assigned to that position, e.g., as indicated in Table 3. In varying embodiments, the algal cell is a green algal cell. In varying embodiments, the green algal cell is a Chlamydomonas cell. In varying embodiments, the green algal cell is a Chlamydomonas reinhardtii cell.
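
As an illustration of the position-wise base assignment described above, the following Python sketch builds a scaffold by choosing, at each position, either the most probable or the second most probable base. It is a minimal sketch under stated assumptions: the probability values, the function name `build_scaffold`, and the `second_choice_positions` parameter are hypothetical and are not taken from Table 3.

```python
# Minimal sketch: assign a base to each scaffold position from per-position
# base probabilities. All numbers below are made-up placeholders, not Table 3.

def build_scaffold(position_probs, second_choice_positions=frozenset()):
    """Return a scaffold string (5' to 3').

    position_probs: list of dicts mapping 'A', 'C', 'G', 'T' to a probability,
        one dict per position.
    second_choice_positions: indices at which the second most probable base
        is used instead of the most probable one.
    """
    bases = []
    for i, probs in enumerate(position_probs):
        # Rank bases by descending probability; ties are broken alphabetically.
        ranked = sorted(probs, key=lambda b: (-probs[b], b))
        rank = 1 if i in second_choice_positions else 0
        bases.append(ranked[rank])
    return "".join(bases)


# Hypothetical three-position probability table.
example_probs = [
    {"A": 0.40, "C": 0.30, "G": 0.20, "T": 0.10},
    {"A": 0.10, "C": 0.20, "G": 0.30, "T": 0.40},
    {"A": 0.25, "C": 0.45, "G": 0.20, "T": 0.10},
]

print(build_scaffold(example_probs))       # ATC
print(build_scaffold(example_probs, {1}))  # AGC
```

Allowing the second most probable base at selected positions is one simple way to generate several distinct scaffolds from the same probability table.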


In another aspect, provided is an expression cassette comprising a synthetic promoter as described above and herein.


In another aspect, provided is a vector comprising the expression cassette comprising a synthetic promoter as described above and herein. In varying embodiments, the vector is a plasmid vector.


In another aspect, provided is a cell comprising a synthetic promoter, or an expression cassette or vector comprising the synthetic promoter, as described above and herein. In varying embodiments, the cell is a green algal cell. In varying embodiments, the cell is a Chlamydomonas cell. In varying embodiments, the cell is a Chlamydomonas reinhardtii cell. In varying embodiments, the cell overexpresses, e.g., by at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 150%, 200%, or more, greater than a control, one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity, e.g., 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 100% sequence identity, to SEQ ID NOs: 87-178, e.g., SEQ ID NO: 150 (TF64). In varying embodiments, the cell underexpresses, e.g., by at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 150%, 200%, or more, less than a control, one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity, e.g., 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 100% sequence identity, to SEQ ID NOs: 87-178, e.g., SEQ ID NO: 150 (TF64).


In a further aspect, provided is a method of transcribing or expressing a polynucleotide, e.g., in vitro or in an algal cell. In varying embodiments, the methods comprise contacting a polymerase to a polynucleotide comprising the synthetic promoter operably linked to a coding polynucleotide under conditions that allow the polymerase to transcribe the coding polynucleotide under the control of the synthetic promoter. In varying embodiments, the methods comprise introducing into the algal cell the polynucleotide operably linked to, e.g., and under the promoter control of, a synthetic promoter as described above and herein. In a further aspect, provided is a method of increasing the transcription of a polynucleotide in an algal cell. In varying embodiments, the methods comprise introducing into the algal cell the polynucleotide operably linked to, e.g., and under the promoter control of, a synthetic promoter as described above and herein. In varying embodiments, transcription of the polynucleotide is increased in response to light exposure and the synthetic promoter comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16A. In varying embodiments, transcription of the polynucleotide is increased in response to dark exposure and the synthetic promoter comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16B. In some embodiments, the transcription levels of the polynucleotide are increased at least about 2-fold greater, e.g., 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, or more, greater than a control promoter (e.g., a random polynucleotide sequence or a native promoter). In varying embodiments, the (coding) polynucleotide operably linked to the synthetic promoter is codon-biased or codon-optimized for expression in an algal cell. In varying embodiments, the algal cell is a green algal cell. In varying embodiments, the algal cell is a Chlamydomonas cell. 
In varying embodiments, the algal cell is a Chlamydomonas reinhardtii cell. In some embodiments, the cell comprises one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity, e.g., 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 100% sequence identity, to SEQ ID NOs: 87-178, e.g., SEQ ID NO: 150 (TF64). In varying embodiments, the cell overexpresses, e.g., by at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 150%, 200%, or more, greater than a control, one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity, e.g., 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 100% sequence identity, to SEQ ID NOs: 87-178, e.g., SEQ ID NO: 150 (TF64). In varying embodiments, the cell underexpresses, e.g., by at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 150%, 200%, or more, less than a control, one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity, e.g., 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 100% sequence identity, to SEQ ID NOs: 87-178, e.g., SEQ ID NO: 150 (TF64).


In a further aspect, provided is a method of designing, constructing and/or assembling a synthetic promoter, e.g., as described herein. In varying embodiments, the methods comprise assembling or arranging at least about 3 (cis)-elements, e.g., from 3 to 30, e.g., from 3 to 27, e.g., from 3 to 25, e.g., from 3 to 20, e.g., from 3 to 15, e.g., from 3 to 10, e.g., from 3 to 5, promoter (cis)-elements selected from the sequences in Tables 1 and 2, and FIGS. 16A and 16B within a promoter scaffold or backbone. In varying embodiments, the synthetic promoter comprises one or more transcriptional factor binding site motifs selected from the group consisting of the sequences in FIGS. 17A, 17B, and 17C. In varying embodiments, the promoter (cis)-elements are positioned or located within the promoter relative to the transcriptional start site (TSS) as indicated in Table 1. In varying embodiments, the promoter is at least about 200 bp in length and up to about 500 bp, 600 bp, 700 bp, 750 bp, 800 bp, 900 bp or 1000 bp in length. In varying embodiments, the synthetic promoter promotes transcription levels that are at least 2-fold greater, e.g., 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, or more, greater than a control promoter (e.g., a random polynucleotide sequence or a native promoter). In varying embodiments, the nucleic acid base of highest probability or second highest probability at a particular position of the promoter scaffold or backbone relative to the transcriptional start site (TSS) is assigned to that position, e.g., as indicated in Table 3. In varying embodiments, the method is computer implemented.
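
Because the design method may be computer implemented, the arrangement of (cis)-elements within a scaffold can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the promoters of Table 4: the function name `place_motifs`, the placeholder scaffold, the TSS index, and the offsets are all hypothetical, and only the two motif sequences are taken from the motifs recited in this disclosure.

```python
# Minimal sketch: place cis-elements into a scaffold at offsets relative to
# the transcriptional start site (TSS). Offsets and scaffold are hypothetical.

def place_motifs(scaffold, tss_index, placements):
    """Return the scaffold with motifs written over the chosen positions.

    scaffold: scaffold sequence (5' to 3').
    tss_index: 0-based index in `scaffold` that corresponds to the TSS.
    placements: list of (motif, offset) pairs, where offset is the position
        of the motif's first base relative to the TSS (negative = upstream).
    """
    seq = list(scaffold)
    occupied = set()
    for motif, offset in placements:
        start = tss_index + offset
        end = start + len(motif)
        if start < 0 or end > len(seq):
            raise ValueError(f"motif {motif} at offset {offset} is outside the scaffold")
        span = set(range(start, end))
        if span & occupied:
            raise ValueError(f"motif {motif} overlaps a previously placed motif")
        occupied |= span
        seq[start:end] = motif  # same length, so the scaffold length is unchanged
    return "".join(seq)


scaffold = "A" * 40  # placeholder scaffold; TSS assumed at index 30
promoter = place_motifs(scaffold, 30, [("CACGAACC", -25), ("ACGGCCAT", -10)])
print(promoter)
```

Rejecting overlapping placements is a design choice of this sketch; an implementation could instead rank motifs and let higher-priority motifs overwrite lower-priority ones.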


In a further aspect, provided is a synthetic nuclear transcription system, the system comprising a synthetic promoter as described above and herein, operably linked to a polynucleotide of interest, and one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity, e.g., 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 100% sequence identity, to SEQ ID NOs: 87-178, e.g., SEQ ID NO: 150 (TF64). The systems can be used for in vitro or in vivo transcription. In some embodiments of the system, transcription of the polynucleotide is increased in response to light exposure and the synthetic promoter comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16A. In some embodiments of the system, transcription of the polynucleotide is increased in response to dark exposure and the synthetic promoter comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16B. Further provided is a cell or population of cells comprising the system as described above and herein. In some embodiments, the cell is a green algal cell. In some embodiments, the cell is a Chlamydomonas cell. In some embodiments, the cell is a Chlamydomonas reinhardtii cell. In varying embodiments, the cell overexpresses, e.g., by at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 150%, 200%, or more, greater than a control, one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity, e.g., 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 100% sequence identity, to SEQ ID NOs: 87-178, e.g., SEQ ID NO: 150 (TF64). 
In varying embodiments, the cell underexpresses, e.g., by at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 150%, 200%, or more, less than a control, one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity, e.g., 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 100% sequence identity, to SEQ ID NOs: 87-178, e.g., SEQ ID NO: 150 (TF64).


In another aspect, provided is a kit comprising a synthetic promoter, or an expression cassette or vector or cell comprising the synthetic promoter, as described above and herein. In another aspect, provided is a kit comprising the synthetic nuclear transcription system, including green algal cells comprising the synthetic promoters and optionally overexpressed or underexpressed transcription factors, as described herein.


Definitions

Unless otherwise provided, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art of genetics, bioinformatics, and gene design. General dictionaries containing many of the terms used in this disclosure are: Singleton et al. (1994) Dictionary of Microbiology and Molecular Biology, 2nd Ed., John Wiley and Sons, N.Y.; and Hale and Margham (1991) The Harper Collins Dictionary of Biology, Harper Perennial, N.Y. Any methods and materials similar or equivalent to those described herein may be used in the practice or testing of embodiments of the invention, though certain methods and materials are exemplified by those disclosed herein.


Codon optimization: As used herein, the term “codon optimization” refers to processes employed to modify an existing coding sequence, or to design a coding sequence in the first instance, for example, to improve translation in an expression host cell or organism of a transcript RNA molecule transcribed from the coding sequence, or to improve transcription of a coding sequence. Codon optimization includes, but is not limited to, processes including selecting codons for the coding sequence to suit the codon preference of the expression host organism. Codon optimization also includes, for example, the process sometimes referred to as “codon harmonization,” wherein codons of a codon sequence that are recognized as low-usage codons in the source organism are altered to codons that are recognized as low-usage in the new expression host. This process may help expressed polypeptides to fold normally by introducing natural and appropriate pauses during translation/extension. Birkholtz et al. (2008) Malaria J. 7:197-217. Codon optimization can also include codon abundance in relation to tRNA availability under certain conditions.


It will be understood that, due to the redundancy of the genetic code, multiple DNA sequences may be designed to encode a single amino acid sequence. Thus, optimized DNA sequences may be designed, for example, to remove superfluous restriction sites and undesirable RNA secondary structures, while optimizing the nucleotide sequence of the coding region so that the codon composition resembles the overall codon composition of the host in which the DNA is to be expressed.
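
The simplest form of codon optimization described above, selecting codons to suit the codon preference of the expression host, can be sketched as follows. The usage frequencies in `HYPOTHETICAL_USAGE` are illustrative placeholders rather than measured values for any organism, and the function name `back_translate` is likewise an assumption of this sketch.

```python
# Minimal sketch: back-translate a peptide using the most frequent codon for
# each amino acid. The frequencies are placeholders, not real host data.

HYPOTHETICAL_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAG": 0.90, "AAA": 0.10},
    "L": {"CTG": 0.70, "CTC": 0.20, "TTA": 0.10},
    "A": {"GCC": 0.60, "GCT": 0.25, "GCA": 0.15},
}


def back_translate(peptide, usage):
    """Return a coding sequence built from the highest-frequency codons."""
    codons = []
    for aa in peptide:
        if aa not in usage:
            raise KeyError(f"no codon usage data for amino acid {aa!r}")
        table = usage[aa]
        # Pick the codon with the largest usage frequency for this residue.
        codons.append(max(table, key=table.get))
    return "".join(codons)


print(back_translate("MKLA", HYPOTHETICAL_USAGE))  # ATGAAGCTGGCC
```

This sketch does not screen for restriction sites or RNA secondary structure, and it does not attempt codon harmonization; those steps would be layered on top of the codon choice shown here.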


Modify: As used herein, the terms “modify” or “alter,” or any forms thereof, mean to modify, alter, replace, delete, substitute, remove, vary, or transform.


Nucleic acid molecule: As used herein, the term “nucleic acid molecule” may refer to a polymeric form of nucleotides, which may include both sense and anti-sense strands of RNA, cDNA, genomic DNA, and synthetic forms and mixed polymers of the above. A nucleotide may refer to a ribonucleotide, deoxyribonucleotide, or a modified form of either type of nucleotide. A “nucleic acid molecule” as used herein is synonymous with “nucleic acid” and “polynucleotide.” A nucleic acid molecule is usually at least 10 bases in length, unless otherwise specified. The term includes single- and double-stranded forms of DNA. A nucleic acid molecule can include either or both naturally occurring and modified nucleotides linked together by naturally occurring and/or non-naturally occurring nucleotide linkages.


Nucleic acid molecules may be modified chemically or biochemically, or may contain non-natural or derivatized nucleotide bases, as will be readily appreciated by those of skill in the art. Such modifications include, for example, labels, methylation, substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications (e.g., uncharged linkages: for example, methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.; charged linkages: for example, phosphorothioates, phosphorodithioates, etc.; pendent moieties: for example, peptides; intercalators: for example, acridine, psoralen, etc.; chelators; alkylators; and modified linkages: for example, alpha anomeric nucleic acids, etc.). The term “nucleic acid molecule” also includes any topological conformation, including single-stranded, double-stranded, partially duplexed, triplexed, hairpinned, circular, and padlocked conformations.


Operably linked: A first nucleic acid sequence is operably linked with a second nucleic acid sequence when the first nucleic acid sequence is in a functional relationship with the second nucleic acid sequence. When recombinantly produced, operably linked nucleic acid sequences are generally contiguous, and, where necessary to join two protein-coding regions, in the same reading frame (e.g., in a polycistronic ORF). However, nucleic acids need not be contiguous to be operably linked.


The term, “operably linked,” when used in reference to a regulatory sequence and a coding sequence, means that the regulatory sequence affects the expression of the linked coding sequence. “Regulatory sequences,” or “control elements,” refer to nucleotide sequences that influence the timing and level/amount of transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include promoters; translation leader sequences; introns; enhancers; stem-loop structures; repressor binding sequences; termination sequences; and polyadenylation recognition sequences. Particular regulatory sequences may be located upstream and/or downstream of a coding sequence operably linked thereto. Also, particular regulatory sequences operably linked to a coding sequence may be located on the associated complementary strand of a double-stranded nucleic acid molecule.


Promoter: As used herein, the term “promoter” refers to a region of DNA that may be upstream from the start of transcription, and that may be involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. A promoter may be operably linked to a coding sequence for expression in a cell, or a promoter may be operably linked to a nucleotide sequence encoding a signal sequence which may be operably linked to a coding sequence for expression in a cell.


Vector: A nucleic acid molecule as introduced into a cell, for example, to produce a transformed cell. A vector may include nucleic acid sequences that permit it to replicate in the host cell, such as an origin of replication. Examples of vectors include, but are not limited to: a plasmid; cosmid; bacteriophage; or virus that carries exogenous DNA into a cell. A vector may also include one or more genes, antisense molecules, and/or selectable marker genes and other genetic elements known in the art. A vector may transduce, transform, or infect a cell, thereby causing the cell to express the nucleic acid molecules and/or proteins encoded by the vector. A vector optionally includes materials to aid in achieving entry of the nucleic acid molecule into the cell (e.g., a liposome, and protein coating).


Expression: As used herein, the term “expression” may refer to the transcription and stable accumulation of mRNA encoded by a polynucleotide, or to the translation of such an mRNA into a polypeptide. The term “over-expression,” as used herein, refers to expression that is higher than endogenous expression of the same or a closely related gene. A heterologous gene is over-expressed if its expression is higher than that of a closely-related endogenous gene (e.g., a homolog).


The terms “identical” or percent “identity,” and variants thereof in the context of two or more polynucleotide sequences, refer to two or more sequences or subsequences that are the same. Sequences are “substantially identical” if they have a specified percentage of nucleic acid residues or nucleotides that are the same (i.e., at least 60% identity, optionally at least 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identity over a specified region (or the whole reference sequence when not specified)), when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using one of the following sequence comparison algorithms (e.g., as described below and herein) or by manual alignment and visual inspection. The present invention provides polynucleotides improved for expression in algal host cells that are substantially identical to the polynucleotides AAACCCAAC, AAACCCATC, AACAGCCAG, AACTGAGG, ACCCCATCGC (SEQ ID NO: 24), ACGGCCAT, AGCAAGTC, AGCAAGTC, AGCAATTT, ATGCATTA, CAACACACC, CACGAACC, CACGCCCTG, CGCTCGGC, and/or CGGGCCCA. Optionally, the identity exists over a region that is at least about 50 nucleotides in length, or more preferably over a region that is 100, 200, 300, 400, 500, 600, 800, 1000, or more, nucleotides in length, or over the full-length of the sequence.


For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
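
As a minimal illustration of how a percent identity value can be calculated once a test sequence has been aligned to a reference sequence, consider the Python sketch below. The function name `percent_identity` is hypothetical. The sketch assumes the two sequences are already aligned to equal length with `-` marking gaps, and it counts a gap opposite a base as a mismatch; alignment programs differ in how they treat gaps and which denominator they use, so this is one convention among several.

```python
# Minimal sketch: percent identity of two pre-aligned sequences.
# Assumption: '-' marks a gap, and a gap opposite a base counts as a mismatch.

def percent_identity(aligned_test, aligned_reference):
    """Return percent identity over the aligned columns."""
    if len(aligned_test) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_test, aligned_reference):
        if a == "-" and b == "-":
            continue  # column with no residues in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


# One substitution in a 10-base motif gives 90% identity.
print(percent_identity("ACCCGATCGC", "ACCCCATCGC"))  # 90.0
# One gap in a 4-column alignment gives 75% identity under this convention.
print(percent_identity("AC-T", "ACGT"))              # 75.0
```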


The term “comparison window”, and variants thereof, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well known in the art. Optimal alignment of sequences for comparison can also be conducted by the local homology algorithm of Smith and Waterman Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman and Wunsch J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman Proc. Natl. Acad. Sci. (U.S.A.) 85: 2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis.), Karlin and Altschul Proc. Natl. Acad. Sci. USA, 87: 2264-2268 (1990), or by manual alignment and visual inspection (see, e.g., Ausubel et al., Current Protocols in Molecular Biology (1995 supplement)). Examples of an algorithm that is suitable for determining percent sequence identity and sequence similarity include the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1997) Nuc. Acids Res. 25:3389-3402, and Altschul et al. (1990) J. Mol. Biol. 215:403-410, respectively. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (on the internet at ncbi.nlm.nih.gov/).
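
For readers unfamiliar with local alignment, the following Python sketch computes a Smith-Waterman style local alignment score by dynamic programming. It is a simplified illustration only: the scoring values (match +2, mismatch -1, linear gap penalty -2) and the function name `local_alignment_score` are assumptions of this sketch, it returns the best score without recovering the alignment itself, and it is not the implementation found in the software packages named above.

```python
# Minimal sketch: best local alignment score with a linear gap penalty.
# Scoring parameters are illustrative assumptions.

def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the highest-scoring local alignment score between a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# A five-base motif found exactly within a longer sequence scores 5 * 2 = 10.
print(local_alignment_score("TTCCCATGG", "CCCAT"))  # 10
```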





BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.



FIG. 1, panels A-E, illustrates design of synthetic algal promoters and expression vector construction. Panel A) Relative GC content of the top 50 native promoters was analyzed (moving window 20 bp). Synthetic and random promoters were generated to mimic the AT-skew. Panel B) Motifs discovered in the top 50 native promoters were placed in a synthetic backbone in positions similar to their position in the native promoters. The overall promoter was designed to mimic −450 to +50 bp relative to TSS. Panel C) Synthetic algal promoters (saps) were placed upstream of mCherry expression cassette, which included the RBCS2 5′ and 3′ UTR (U) and first intron (I) in order to drive expression. A separate hygromycin expression cassette was placed upstream of the mCherry cassette to allow for screening of transformants independent of synthetic promoter function. Synthetic promoters were compared to the hsp70/rbcs2 hybrid promoter (ar1). Panel D) Randomly generated sequences are used to drive mCherry. The relative mCherry fluorescence of 5,000 transformants is compared to 5,000 transformants of the ar1 construct by flow cytometry. Populations that are statistically different are indicated (a-b, Tukey's test, p<0.05). Box and whisker plot indicates max (top of line), min (bottom of line), first quartile (bottom of box), second quartile (median; middle line), third quartile (top of box). Panel E) sap transformants were compared to ar1 transformants by flow cytometry. Populations transformed with seven of the sap promoters have more mCherry fluorescence than ar1 transformed cells (*, Tukey's test, p<0.05).



FIG. 2 illustrates frequency of POWRs motifs in the top 50 native promoters and the 25 sap promoters.



FIG. 3 illustrates TC-rich motifs identified by POWRs in the top 50 native promoters.



FIG. 4, panels A and B, illustrates a comparison of robustness of plate vs flow cytometry data for C. reinhardtii promoter strength analysis. Panel A) Constructs were transformed into two independent C. reinhardtii cultures (Replicate 1 and 2) and plated on two separate plates (ex: 1-1, 1-2). Twenty-four individuals were picked from each plate and screened using a Tecan plate reader. The remainder of the transformants from each plate were pooled and screened by flow cytometry. Populations that are statistically different are indicated (a-b, Tukey's test, p<0.05). Panel B) C. reinhardtii was transformed with ar1 and sap11 rearranged so that the hyg construct was downstream of mCherry in two independent transformation events. mCherry expression was measured for the pooled transformants. Rearrangement did not alter promoter function for either promoter.



FIG. 5, panels A-D, illustrates promoter and motif deletions of sap11. Panel A) The expression vector was rearranged to have the hygromycin resistance cassette downstream of the mCherry cassette. sap11 was cloned upstream of the mCherry cassette with the rbcs2 5′ and 3′ UTRs (U) and the first rbcs2 intron (I). Portions of the sap11 promoter were removed through SLiCE cloning to leave −250, −150, and −50 bp of sap11 sequence upstream of the sap11 TSS. Panel B) Flow cytometry analysis for mCherry fluorescence of 5,000 transformants of the original and shortened sap11 constructs. Populations that are statistically different are indicated (a-c, Tukey's test, p<0.05). Panel C) Putative cis-motifs (underlined) in the −150 to 0 bp region of sap11 (SEQ ID NO: 1) were targeted for mutational analysis. Eight residues (bold) were replaced with either polyA (A) or polyT (T) residues to generate six sap11 Δm mutants, including one in which both motifs 3 and 4 were replaced (sap11 Δm3-4). Panel D) Flow cytometry analysis for mCherry fluorescence of 5,000 transformants of the sap11 construct compared with sap11 motif deletion constructs.



FIG. 6, panels A-C, illustrates locally enriched POWRs and DREME motifs in the top 4,412 promoters from the C. reinhardtii nuclear genome. EST validated promoters were analyzed with CentriMo for locally enriched motifs. Enrichment of motifs relative to the TSS for the top three categories of motifs is shown (panels A-C).



FIG. 7 illustrates alignment of CCCAT motif with homologous motifs in H. sapiens and Arabidopsis thaliana.



FIG. 8 illustrates GC and AT content of top 4,412 EST validated C. reinhardtii promoters.



FIG. 9 illustrates production of transcription factor (TF) library proteins in yeast. Immunoblot of whole cell lysates of S. cerevisiae strains producing TF library proteins separated by SDS-PAGE and probed with anti-GAL4-AD antibody. Numbers below each blot indicate TF library number.



FIG. 10 illustrates C. reinhardtii TF library tested for transcription activation from select promoters via yeast one-hybrid assay. Y1H assay performed with all 92 TF library proteins against five C. reinhardtii promoters (LCIC, LCI5, SEBP1, Nar1.2, and LHCBM5), each in 300 bp fragments (labeled A, B, and C). Functional read out was expression of the lux gene. Red data points indicate statistical significance of increased lux expression compared to an empty vector control (see Materials and Methods). x axes: TF, transcription factor library number.



FIG. 11 illustrates yeast one-hybrid assay using orthologous promoters (2.1) and TF64-associated promoters (2.2). Y1H assay performed with all 92 TF library proteins against promoters (LCIC, LCI5, SEBP1, Nar1.2, and LHCBM5), each in 300 bp fragments (labeled A, B, and C) from V. carteri (Vca), C. vulgaris (Cvu), A. thaliana (Ath), and Z. mays (Zma). Functional read out was expression of the lux gene. Red data points indicate statistical significance of increased lux expression compared to an empty vector control (see Materials and Methods of Example 2).



FIG. 12, panels A and B, illustrates alignment of TF64-associated promoter sequences. MEME analysis of the promoter fragments associated with TF64 via Y1H assay. Panel A) Top motif identified among promoters analyzed. Panel B) Promoter sequences showing top motif location. CANNTG sequences are underlined. Sequences: Cre_NAR1.2_C (SEQ ID NO: 2), Cre_NAR1.2_C (SEQ ID NO: 3), Cre_LCIC_C (SEQ ID NO: 4), Vca_LCIC_A (SEQ ID NO: 5), Vca_SEBP1_A (SEQ ID NO: 6), Zma_SEBP1_B (SEQ ID NO: 7), Cre_SEBP1_C (SEQ ID NO: 8), Vca_LCIC_A (SEQ ID NO: 9), Cre_SEBP1_C (SEQ ID NO: 10), Vca_LHCB5_C (SEQ ID NO: 11), Vca_LHCB5_C (SEQ ID NO: 12), Cre_LCIC_C (SEQ ID NO: 13), Cre_LCIC_C (SEQ ID NO: 14), Cre_SEBP1_B (SEQ ID NO: 15), Cre_LCIC_C (SEQ ID NO: 16), Cre_LCI5_C (SEQ ID NO: 17).



FIG. 13, panels A-D, illustrates Basic Helix-Loop-Helix transcription factor alignment, strain construction and growth. Panel A) Protein sequence alignment of TF64-related proteins. The C. reinhardtii TF64 sequence from the PlnTFD was used as a query in a BLAST search for related proteins. Selected top hits are shown. C. reinhardtii strain 503 (in bold, used as a reference strain in this study due to the lack of a published sequence for strain cc1010) was among the top hits. Proteins from other related algal species are also shown. Alignment is focused on the basic Helix-Loop-Helix region. Functionally important conserved residues are indicated by color. C. rein PTFD (SEQ ID NO: 18), C. rein cc503 (SEQ ID NO: 19), V. carteri (SEQ ID NO: 20), A. protothecoides (SEQ ID NO: 21), C. subellipsoidea (SEQ ID NO: 22). Panel B) Schematic of the pTM207 vectors used to constitutively express the gene encoding TF64 and GFP. The ble gene confers zeocin resistance and 2A is a linker peptide that is cleaved post-translationally. The pTM207 vector also encodes an N-terminal 3×FLAG-tag fused to each TF, not shown. Panel C) Immunoblot of whole cell lysates of wild type (WT) C. reinhardtii and engineered strains producing TF64 (64-4, 64-7, 64-8, 64-9, 64-11) or GFP, separated by SDS-PAGE and probed with anti-FLAG antibody. The higher molecular weight product is the protein prior to 2A cleavage. Panel D) Growth curves of wild type (cc1010) C. reinhardtii and strains producing TF64 (TF64-7) or GFP, cultured for four days in TAP medium under constant light. Growth was measured at OD750. Data are plotted from three biological replicates with the SEM for each strain. The “Exponential Growth” graph indicates the slope of the line during log phase growth for each strain by color.



FIG. 14, panels A-C, illustrates RNA-sequencing data from two strains constitutively producing either low or high amounts of TF64. Panel A) Differential transcription analysis of strains cc1010::TF64-7 and cc1010::TF64-9 compared to cc1010::GFP by RNA-sequencing. The log2 (fold change) was plotted for each unique read with an FPKM value≥1.0 (see Materials and Methods). Panel B) Comparison of RNA-Seq data from each TF64-producing strain (TF64-7 and TF64-9). Each data point represents a unique read. The log2 (fold change) was plotted. Purple line represents the best-fit line for all data, R²=0.498, slope=0.560. Panel C) Heat map of expression profiles from the top 20 activated and inhibited genes and Y1H-assayed genes in strains cc1010::TF64-7 and cc1010::TF64-9 compared to cc1010::GFP. Units for heat map key values are log2 (fold change). Genes of interest are labeled below the heat map. RNA-sequencing data were compiled from three biological replicates.



FIG. 15, panels A-B, illustrates transcription regulation of light harvesting complex II components and Yeast One-Hybrid-assayed genes by TF64. Expression data for A) genes LHCBM1-9 and B) genes LCI5, SEBP1, LCIC, NAR1.2 from strain cc1010::TF64-7 compared to cc1010::GFP analyzed by RT-qPCR and RNA-Seq. The log2 (fold change) was plotted. RT-qPCR data are from two biological replicates with SEM. RNA-Seq data are the average of three biological replicates. Note that there were multiple unique reads for certain genes.



FIGS. 16A and 16B illustrate position frequency matrices rendered with Weblogo (Crooks et al., Genome Res. 2004 Jun;14(6):1188-90). Letter height indicates relative frequency of nucleotides in the 8-letter motif. Below the position weight matrices is a nucleotide consensus sequence given for the motif. A probability cutoff of 0.1 (out of 1) in the position probability matrix for the motif was used for inclusion in the consensus sequence. N=A, T, G, or C. [X/Z] notation indicates that either nucleotide X or Z could be represented at a single position (e.g., A[G/C]T indicates that the first nucleotide in the motif is A, the second is either G or C, and the third is T, resulting in the variants AGT or ACT of the motif). FIG. 16A shows the unique light-upregulated motif as a position weight matrix rendered with Weblogo and the IUPAC nucleotide consensus of light-upregulated motifs. FIG. 16B shows the unique dark-upregulated motif as a position weight matrix rendered with Weblogo and the IUPAC nucleotide consensus of dark-upregulated motifs.



FIGS. 17A, 17B, and 17C illustrate predicted binding sites for Chlamydomonas reinhardtii transcription factor families as deduced by the Plant Transcription Factor Database. Letter height indicates relative frequency of nucleotides in the proposed binding sequence. To the right of the position weight matrices is a nucleotide consensus sequence given for the motif. A probability cut off of 0.1 (out of 1) in the position probability matrix for the motif was used for the inclusion in the consensus sequence.



FIG. 18 illustrates AR1 promoter sequence (SEQ ID NO: 23) with putative bHLH-family TF binding sites identified by underlined and bolded text.



FIG. 19 illustrates orange fluorescent protein (OFP) fluorescence when driven by AR1 in a TF64 expressing strain.
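
The consensus convention described for FIGS. 16A-16B and 17A-17C (a 0.1 probability cutoff, N for any base, and [X/Z] for alternatives) can be sketched as follows. This is a minimal illustration, not the procedure actually used to prepare the figures: the function name, the input layout (one dictionary of base probabilities per motif position), the use of a strict "greater than" comparison at the cutoff, and the example matrix are all assumptions.

```python
def consensus_from_ppm(ppm, cutoff=0.1):
    """Build a consensus string from a position probability matrix.

    ppm is a list with one dict per motif position, mapping each of
    'A', 'C', 'G', 'T' to its probability (hypothetical input layout).
    """
    parts = []
    for column in ppm:
        # Keep every base above the cutoff, most probable first.
        kept = sorted(
            (base for base in "ACGT" if column[base] > cutoff),
            key=lambda base: -column[base],
        )
        if len(kept) == 4:
            parts.append("N")            # any of A, T, G, or C
        elif len(kept) == 1:
            parts.append(kept[0])        # a single dominant base
        else:
            parts.append("[" + "/".join(kept) + "]")
    return "".join(parts)


# Hypothetical three-position matrix, chosen to reproduce the A[G/C]T example.
example_ppm = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.02, "C": 0.38, "G": 0.58, "T": 0.02},
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
]
print(consensus_from_ppm(example_ppm))
```

With the hypothetical matrix above, the second position keeps both G and C, so the motif expands to the two variants AGT and ACT.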





DETAILED DESCRIPTION
1. Introduction

Algae have enormous potential as bio-factories for the efficient production of a wide array of high-value products, and eventually as a source of renewable biofuels. However, tools for engineering the nuclear genomes of algae remain scarce and limited in functionality. We generated synthetic algal promoters (saps) as a tool for increasing nuclear gene expression and as a model for understanding promoter elements and structure in green algae. Promoters were generated to mimic native cis-motif elements, structure, and overall nucleotide composition of top expressing genes from Chlamydomonas reinhardtii. Twenty-five saps were used to drive expression of a fluorescent reporter in transgenic algae. A majority of the promoters were functional in vivo, and seven were identified to drive expression of the fluorescent reporter better than the current best endogenous promoter in C. reinhardtii, the chimeric hsp70/rbcs2 promoter. Further analysis of the best synthetic promoter, sap11, revealed a new DNA motif essential for promoter function that is widespread and highly conserved in C. reinhardtii. These data demonstrate the utility of synthetic promoters to drive gene expression in green algae, and lay the groundwork for the development of a suite of saps capable of driving the robust and complex gene expression that will be required for algae to reach their potential as an industrial platform for photosynthetic bio-manufacturing.


2. Synthetic Promoters

Provided are synthetic promoters useful for high level transcription or expression of polynucleotides in an algal cell. Accordingly, in one aspect, provided is a synthetic promoter capable of promoting and/or initiating transcription of a polynucleotide in an algal cell. In varying embodiments, the synthetic promoter comprises from 3 to 30, e.g., from 3 to 27, e.g., from 3 to 25, e.g., from 3 to 20, e.g., from 3 to 15, e.g., from 3 to 10, e.g., from 3 to 5, promoter (cis)-elements selected from the group consisting of promoter (cis)-elements shown in Table 1 and FIGS. 16A and 16B. In varying embodiments, the promoter (cis)-elements are positioned or located within the promoter relative to the transcriptional start site (TSS) as indicated in Table 1.









TABLE 1

Location of motif (cis)-elements in the synthetic algal promoters (saps) relative to the transcription start site (TSS).

Motif number  Promoter  Start  Stop  Strand  Matched sequence (promoter element)  SEQ ID NO:
20            sap_19    −377   −369  +       AAACCCAAC
20            sap_25    −199   −191  −       AAACCCATC
11            sap_15    −178   −170  −       AACAGCCAG
100           sap_9     −408   −401  +       AACTGAGG
1             sap_12    −372   −363  +       ACCCCATCGC                           24
62            sap_18    −80    −73   −       ACGGCCAT
104           sap_1     −54    −47   −       AGCAAGTC
104           sap_25    −106   −99   +       AGCAAGTC
104           sap_22    −129   −122  +       AGCAATTT
104           sap_8     −104   −97   +       AGCAATTT
51            sap_7     −359   −352  −       AGCGCTTT
5             sap_14    −116   −109  −       ATGCATTA
5             sap_4     −419   −412  +       ATGCATTT
20            sap_15    20     28    +       CAACACACC
20            sap_22    −9     −1    +       CAACCGACC
46            sap_17    −380   −372  −       CACACCTTG
46            sap_21    −368   −360  +       CACACTTCG
46            sap_25    −4     4     +       CACACTTCG
69            sap_2     −208   −201  −       CACGAACC
69            sap_15    −203   −196  −       CACGCAAC
69            sap_24    −354   −347  −       CACGCAAC
37            sap_1     −432   −425  −       CACGCATG
37            sap_1     −366   −359  +       CACGCATG
37            sap_24    −363   −356  −       CACGCATG
37            sap_4     −363   −356  −       CACGCATG
14            sap_4     −437   −429  −       CACGCCCTG
37            sap_1     −161   −154  +       CATGCATG
37            sap_1     −161   −154  −       CATGCATG
37            sap_10    −137   −130  +       CATGCATG
37            sap_10    −137   −130  −       CATGCATG
37            sap_11    −152   −145  +       CATGCATG
37            sap_11    −152   −145  −       CATGCATG
37            sap_13    −148   −141  +       CATGCATG
37            sap_13    −148   −141  −       CATGCATG
37            sap_14    −67    −60   +       CATGCATG
37            sap_14    −63    −56   +       CATGCATG
37            sap_14    −67    −60   −       CATGCATG
37            sap_14    −63    −56   −       CATGCATG
37            sap_15    −151   −144  +       CATGCATG
37            sap_15    −151   −144  −       CATGCATG
37            sap_16    −81    −74   +       CATGCATG
37            sap_16    −81    −74   −       CATGCATG
37            sap_18    −154   −147  +       CATGCATG
37            sap_18    −154   −147  −       CATGCATG
37            sap_19    −104   −97   +       CATGCATG
37            sap_19    −104   −97   −       CATGCATG
37            sap_2     −140   −133  +       CATGCATG
37            sap_2     −140   −133  −       CATGCATG
37            sap_20    −114   −107  +       CATGCATG
37            sap_20    −114   −107  −       CATGCATG
37            sap_5     −150   −143  +       CATGCATG
37            sap_5     −150   −143  −       CATGCATG
37            sap_1     −432   −425  +       CATGCGTG
37            sap_1     −366   −359  −       CATGCGTG
37            sap_24    −363   −356  +       CATGCGTG
37            sap_4     −363   −356  +       CATGCGTG
64            sap_1     −261   −254  +       CCATTTGG
1             sap_9     −71    −62   +       CCCCCATCGC                           25
117           sap_7     36     43    +       CCCTCCGC
116           sap_21    −42    −35   +       CCGAGCAA
116           sap_20    −353   −346  +       CCGAGCAC
116           sap_11    −46    −39   +       CCGAGCGA
116           sap_20    −63    −56   −       CCGAGCGA
116           sap_11    −231   −224  −       CCGCGCAA
54            sap_11    −41    −34   +       CGAGCCCG
54            sap_17    −395   −388  −       CGAGCTCA
54            sap_11    −220   −213  +       CGAGTCCA
60            sap_12    −42    −35   +       CGCCAAAG
1             sap_11    −76    −67   +       CGCCCATTGC                           26
69            sap_1     −352   −345  +       CGCGAAAC
69            sap_11    −232   −225  −       CGCGCAAC
69            sap_2     −347   −340  −       CGCGCAAC
117           sap_11    −184   −177  +       CGCGCCGC
117           sap_24    −274   −267  −       CGCGCCGC
14            sap_16    35     43    −       CGCGGACTG
117           sap_9     −326   −319  −       CGCTCAGC
117           sap_11    −349   −342  +       CGCTCCGC
117           sap_5     −54    −47   −       CGCTCCGC
2             sap_19    −35    −28   +       CGCTCCTT
117           sap_11    −355   −348  −       CGCTCGGC
117           sap_11    −47    −40   −       CGCTCGGC
117           sap_14    −354   −347  −       CGCTCGGC
24            sap_14    37     44    +       CGGGCACG
54            sap_12    −196   −189  +       CGGGCCCA
54            sap_15    −324   −317  −       CGGGCCCA
54            sap_20    −130   −123  −       CGGGCCCA
54            sap_21    −73    −66   +       CGGGCCCA
54            sap_23    −312   −305  −       CGGGCCCA
54            sap_25    −271   −264  +       CGGGCCCA
54            sap_25    −156   −149  +       CGGGCCCA
54            sap_6     −135   −128  −       CGGGCCCA
54            sap_8     −210   −203  −       CGGGCCCA
3             sap_1     −85    −77   +       CGTACGGCA
3             sap_14    −88    −80   +       CGTACGGCA
3             sap_2     −84    −76   −       CGTACGGCA
3             sap_23    −65    −57   +       CGTACTGCA
14            sap_16    −338   −330  +       CTCGCACAG
2             sap_13    −24    −17   +       CTCTCCCT
2             sap_18    −19    −12   +       CTCTCCTT
2             sap_20    −26    −19   +       CTCTCCTT
2             sap_19    −25    −18   +       CTCTCTTT
2             sap_2     −16    −9    +       CTCTCTTT
2             sap_23    −20    −13   +       CTCTCTTT
2             sap_24    −28    −21   +       CTCTCTTT
2             sap_25    −19    −12   +       CTCTCTTT
2             sap_3     −25    −18   +       CTCTCTTT
2             sap_5     −27    −20   +       CTCTCTTT
2             sap_5     −19    −12   +       CTCTCTTT
2             sap_8     −19    −12   +       CTCTCTTT
116           sap_12    −303   −296  −       CTGAGCAA
2             sap_20    −35    −28   +       CTTTCCTT
2             sap_20    −21    −14   +       CTTTCCTT
2             sap_6     −273   −266  −       CTTTCCTT
2             sap_11    −21    −14   +       CTTTCTTT
2             sap_16    −29    −22   +       CTTTCTTT
2             sap_18    −14    −7    +       CTTTCTTT
2             sap_21    −19    −12   +       CTTTCTTT
2             sap_3     −37    −30   +       CTTTCTTT
2             sap_4     −25    −18   +       CTTTCTTT
20            sap_20    −186   −178  −       GAACCCACC
46            sap_16    5      13    +       GACACCTCA
24            sap_1     −274   −267  +       GAGGCGCG
24            sap_21    −198   −191  −       GAGGCGCG
86            sap_10    −122   −115  −       GCACGGGC
86            sap_19    −134   −127  −       GCACGGGC
86            sap_14    40     47    +       GCACGGGT
86            sap_6     5      12    −       GCACGGTC
86            sap_9     −374   −367  +       GCACGGTC
50            sap_23    −259   −252  +       GCCAGAGC
50            sap_24    −285   −278  +       GCCAGAGC
50            sap_21    −414   −407  −       GCCAGGAC
50            sap_15    39     46    −       GCCAGGGC
50            sap_21    −177   −170  +       GCCAGGGC
50            sap_3     41     48    −       GCCAGGGC
50            sap_4     −439   −432  +       GCCAGGGC
50            sap_5     −273   −266  +       GCCAGGGC
50            sap_7     −182   −175  −       GCCAGGGC
1             sap_3     −97    −88   −       GCCCCAATGC                           27
1             sap_21    −408   −399  +       GCCCCAGCGC                           28
1             sap_17    −83    −74   −       GCCCCATTGC                           29
50            sap_24    −188   −181  +       GCCCGAGC
50            sap_25    −159   −152  −       GCCCGAGC
113           sap_12    −66    −59   −       GCGAGCGA
113           sap_14    −204   −197  −       GCGAGCGA
113           sap_18    −117   −110  +       GCGAGCGA
113           sap_20    −67    −60   −       GCGAGCGA
113           sap_23    −220   −213  −       GCGAGCGA
113           sap_3     −224   −217  −       GCGAGCGA
113           sap_7     −259   −252  +       GCGAGCGA
113           sap_8     −261   −254  +       GCGAGCGA
113           sap_8     −257   −250  +       GCGAGCGA
113           sap_9     −52    −45   +       GCGAGCGA
113           sap_1     −40    −33   +       GCGAGCGC
113           sap_10    −43    −36   +       GCGAGCGC
113           sap_12    −344   −337  −       GCGAGCGC
113           sap_13    −252   −245  +       GCGAGCGC
113           sap_15    −215   −208  −       GCGAGCGC
113           sap_15    −111   −104  +       GCGAGCGC
113           sap_16    −341   −334  −       GCGAGCGC
113           sap_17    −238   −231  +       GCGAGCGC
113           sap_17    30     37    −       GCGAGCGC
113           sap_18    −43    −36   +       GCGAGCGC
113           sap_23    −241   −234  +       GCGAGCGC
113           sap_24    −69    −62   −       GCGAGCGC
113           sap_25    −292   −285  −       GCGAGCGC
113           sap_25    −44    −37   +       GCGAGCGC
113           sap_6     −188   −181  +       GCGAGCGC
113           sap_6     −63    −56   +       GCGAGCGC
113           sap_7     −41    −34   +       GCGAGCGC
113           sap_9     −48    −41   +       GCGAGCGC
1             sap_15    −221   −212  −       GCGCCATCGC                           30
1             sap_23    −75    −66   +       GCGCCATCGC                           30
1             sap_23    −81    −72   −       GCGCCATCGC                           30
1             sap_25    −229   −220  −       GCGCCATCGC                           30
1             sap_8     −35    −26   −       GCGCCATTGC                           30
113           sap_1     −36    −29   +       GCGCGCGA
113           sap_12    −246   −239  −       GCGCGCGA
113           sap_16    −174   −167  +       GCGCGCGA
113           sap_19    −248   −241  −       GCGCGCGA
113           sap_19    −224   −217  +       GCGCGCGA
113           sap_20    −252   −245  +       GCGCGCGA
113           sap_23    −245   −238  +       GCGCGCGA
113           sap_3     −189   −182  +       GCGCGCGA
113           sap_4     −187   −180  +       GCGCGCGA
113           sap_7     −45    −38   +       GCGCGCGA
113           sap_8     −231   −224  +       GCGCGCGA
113           sap_9     −237   −230  +       GCGCGCGA
113           sap_10    −39    −32   +       GCGCGCGC
113           sap_10    −39    −32   −       GCGCGCGC
113           sap_11    −187   −180  +       GCGCGCGC
113           sap_11    −187   −180  −       GCGCGCGC
113           sap_11    −99    −92   +       GCGCGCGC
113           sap_11    −99    −92   −       GCGCGCGC
113           sap_12    −244   −237  +       GCGCGCGC
113           sap_12    −242   −235  +       GCGCGCGC
113           sap_12    −244   −237  −       GCGCGCGC
113           sap_12    −242   −235  −       GCGCGCGC
113           sap_13    −248   −241  +       GCGCGCGC
113           sap_13    −246   −239  +       GCGCGCGC
113           sap_13    −248   −241  −       GCGCGCGC
113           sap_13    −246   −239  −       GCGCGCGC
113           sap_14    −42    −35   +       GCGCGCGC
113           sap_14    −42    −35   −       GCGCGCGC
113           sap_16    −176   −169  +       GCGCGCGC
113           sap_16    −176   −169  −       GCGCGCGC
113           sap_16    −128   −121  +       GCGCGCGC
113           sap_16    −128   −121  −       GCGCGCGC
113           sap_18    −39    −32   +       GCGCGCGC
113           sap_18    −39    −32   −       GCGCGCGC
113           sap_19    −246   −239  +       GCGCGCGC
113           sap_19    −244   −237  +       GCGCGCGC
113           sap_19    −242   −235  +       GCGCGCGC
113           sap_19    −246   −239  −       GCGCGCGC
113           sap_19    −244   −237  −       GCGCGCGC
113           sap_19    −242   −235  −       GCGCGCGC
113           sap_19    −226   −219  +       GCGCGCGC
113           sap_19    −226   −219  −       GCGCGCGC
113           sap_19    −42    −35   +       GCGCGCGC
113           sap_19    −40    −33   +       GCGCGCGC
113           sap_19    −42    −35   −       GCGCGCGC
113           sap_19    −40    −33   −       GCGCGCGC
113           sap_20    −254   −247  +       GCGCGCGC
113           sap_20    −254   −247  −       GCGCGCGC
113           sap_3     −191   −184  +       GCGCGCGC
113           sap_3     −191   −184  −       GCGCGCGC
113           sap_6     −238   −231  +       GCGCGCGC
113           sap_6     −238   −231  −       GCGCGCGC
113           sap_8     −233   −226  +       GCGCGCGC
113           sap_8     −233   −226  −       GCGCGCGC
113           sap_8     −43    −36   +       GCGCGCGC
113           sap_8     −41    −34   +       GCGCGCGC
113           sap_8     −43    −36   −       GCGCGCGC
113           sap_8     −41    −34   −       GCGCGCGC
113           sap_9     −239   −232  +       GCGCGCGC
113           sap_9     −239   −232  −       GCGCGCGC
59            sap_10    −364   −357  −       GCGCGCGT
59            sap_15    −244   −237  +       GCGCGCGT
59            sap_15    −246   −239  −       GCGCGCGT
59            sap_16    −130   −123  −       GCGCGCGT
59            sap_19    −240   −233  +       GCGCGCGT
59            sap_25    −223   −216  +       GCGCGCGT
113           sap_11    −191   −184  −       GCGCTCGA
113           sap_1     −40    −33   −       GCGCTCGC
113           sap_10    −43    −36   −       GCGCTCGC
113           sap_12    −344   −337  +       GCGCTCGC
113           sap_13    −252   −245  −       GCGCTCGC
113           sap_15    −215   −208  +       GCGCTCGC
113           sap_15    −111   −104  −       GCGCTCGC
113           sap_16    −341   −334  +       GCGCTCGC
113           sap_17    −238   −231  −       GCGCTCGC
113           sap_17    30     37    +       GCGCTCGC
113           sap_18    −43    −36   −       GCGCTCGC
113           sap_23    −241   −234  −       GCGCTCGC
113           sap_24    −69    −62   +       GCGCTCGC
113           sap_25    −292   −285  +       GCGCTCGC
113           sap_25    −44    −37   −       GCGCTCGC
113           sap_6     −188   −181  −       GCGCTCGC
113           sap_6     −63    −56   −       GCGCTCGC
113           sap_7     −41    −34   −       GCGCTCGC
113           sap_9     −48    −41   −       GCGCTCGC
59            sap_12    −342   −335  +       GCTCGCGT
59            sap_21    −273   −266  +       GCTCGCGT
59            sap_6     −264   −257  +       GCTCGCGT
60            sap_17    −420   −413  +       GGCCAGCG
1             sap_24    −215   −206  +       GGCCCAACGC                           31
1             sap_22    −346   −337  −       GGCCCACTGC                           32
1             sap_21    −185   −176  +       GGCCCAGCGC                           33
1             sap_21    −71    −62   +       GGCCCATCGC                           34
1             sap_11    −165   −156  −       GGCCCATTCC                           35
1             sap_14    −177   −168  +       GGCCCATTCC                           35
1             sap_16    −150   −141  −       GGCCCATTCC                           35
1             sap_17    −348   −339  −       GGCCCATTCC                           35
1             sap_2     −239   −230  +       GGCCCATTCC                           35
1             sap_22    −340   −331  +       GGCCCATTCC                           35
1             sap_24    −221   −212  −       GGCCCATTCC                           35
1             sap_25    −154   −145  +       GGCCCATTCC                           35
1             sap_3     −274   −265  −       GGCCCATTCC                           35
1             sap_5     −117   −108  −       GGCCCATTCC                           35
1             sap_7     −288   −279  +       GGCCCATTCC                           35
1             sap_22    −91    −82   +       GGCCCATTGC                           36
1             sap_3     −154   −145  +       GGCCCATTGC                           36
60            sap_13    −346   −339  −       GGCCGAAG
60            sap_25    −60    −53   +       GGCCGGAG
47            sap_12    −46    −39   −       GGCGAGAC
47            sap_17    −252   −245  +       GGCGAGAC
47            sap_16    −225   −218  −       GGCGCGAC
47            sap_19    −65    −58   +       GGCGCGAC
47            sap_25    −101   −94   −       GGCGCGAC
117           sap_14    −106   −99   +       GGCTCCGC
117           sap_19    −323   −316  −       GGCTCCGC
47            sap_20    −7     0     −       GGCTCGAC
1             sap_12    −90    −81   −       GGGCCATTGC                           37
24            sap_13    −123   −116  −       GGGGCCCG
24            sap_14    −334   −327  −       GGGGCCCG
24            sap_11    −96    −89   −       GGGGCGCG
24            sap_19    −79    −72   −       GGGGCGCG
24            sap_20    −321   −314  −       GGGGCGCG
24            sap_25    −287   −280  −       GGGGCGCG
3             sap_9     −102   −94   +       GGTACGGCA
57            sap_23    −434   −427  +       GTCCACTG
14            sap_23    −443   −435  −       GTCGCCCTG
47            sap_8     −6     1     +       GTCGCGAC
47            sap_8     −6     1     −       GTCGCGAC
47            sap_17    −63    −56   −       GTCGCGAT
105           sap_19    −130   −123  +       GTGCGCCC
105           sap_11    −9     −2    +       GTGTGCCC
57            sap_18    −161   −154  −       GTTCAATG
57            sap_23    −383   −376  +       GTTCGCTG
11            sap_17    −159   −151  +       TACAGCAAG
11            sap_25    −260   −252  +       TACAGCAAG
11            sap_21    −115   −107  −       TACGGCCAG
26            sap_5     −285   −278  +       TCAAACCA
113           sap_11    −191   −184  +       TCGAGCGC
113           sap_1     −36    −29   −       TCGCGCGC
113           sap_12    −246   −239  +       TCGCGCGC
113           sap_16    −174   −167  −       TCGCGCGC
113           sap_19    −248   −241  +       TCGCGCGC
113           sap_19    −224   −217  −       TCGCGCGC
113           sap_20    −252   −245  −       TCGCGCGC
113           sap_23    −245   −238  −       TCGCGCGC
113           sap_3     −189   −182  −       TCGCGCGC
113           sap_4     −187   −180  −       TCGCGCGC
113           sap_7     −45    −38   −       TCGCGCGC
113           sap_8     −231   −224  −       TCGCGCGC
113           sap_9     −237   −230  −       TCGCGCGC
59            sap_19    −353   −346  +       TCGCGCGT
59            sap_25    −323   −316  −       TCGCGCGT
113           sap_12    −66    −59   +       TCGCTCGC
113           sap_14    −204   −197  +       TCGCTCGC
113           sap_18    −117   −110  −       TCGCTCGC
113           sap_20    −67    −60   +       TCGCTCGC
113           sap_23    −220   −213  +       TCGCTCGC
113           sap_3     −224   −217  +       TCGCTCGC
113           sap_7     −259   −252  −       TCGCTCGC
113           sap_8     −261   −254  −       TCGCTCGC
113           sap_8     −257   −250  −       TCGCTCGC
113           sap_9     −52    −45   −       TCGCTCGC
59            sap_10    −338   −331  −       TCTCGCGA
59            sap_24    −201   −194  +       TCTCGCGA
59            sap_6     −207   −200  +       TCTCGCGA
59            sap_9     −65    −58   −       TCTCGCGA
59            sap_19    −289   −282  +       TCTCGCGT
54            sap_19    38     45    +       TGAGCCCA
63            sap_1     −372   −365  −       TGCACACC
63            sap_17    −377   −370  −       TGCACACC
63            sap_8     −286   −279  −       TGCACACC
3             sap_21    −435   −427  −       TGCAGGGCA
109           sap_21    −200   −193  +       TGCGCGCC
109           sap_6     −219   −212  +       TGCGCGCC
51            sap_5     −230   −223  +       TGCGCTTT
51            sap_6     −368   −361  −       TGCGCTTT
109           sap_4     −344   −337  −       TGCTCACC
109           sap_4     35     42    −       TGCTCACC
109           sap_23    −38    −31   −       TGCTCGCA
109           sap_8     −145   −138  +       TGCTCGCA
38            sap_5     −447   −440  +       TGGAAAGG
38            sap_19    2      9     −       TGGTAAGG
3             sap_15    −63    −55   −       TGTACGGCA
3             sap_19    −93    −85   +       TGTACGGCA
109           sap_23    −414   −407  +       TGTTCGCC
109           sap_8     −223   −216  −       TGTTCGCC
108           sap_18    −348   −341  +       TTCGCAAA
108           sap_5     −314   −307  +       TTCGCGAA
108           sap_5     −314   −307  −       TTCGCGAA
108           sap_8     −302   −295  +       TTCGCGAA
108           sap_8     −302   −295  −       TTCGCGAA
51            sap_18    −205   −198  −       TTCGCTTG





* The start and stop values are relative to the artificial TSS that is part of the synthetic promoter sequence. Thus, a motif at −50 would actually be at −100 relative to the 3′ end of the whole sap sequence.
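
The coordinate shift in the footnote, and the way Table 1 lists some sites twice at the same coordinates with sequences that are reverse complements of each other (for example GCGAGCGC and GCGCTCGC in sap_1 at −40 to −33), can be checked with a short sketch. The function names are hypothetical, and the 50-base offset between the artificial TSS and the 3′ end is an assumption inferred from the footnote's own −50 to −100 example.

```python
# Assumed number of bases between the artificial TSS and the 3' end of a sap,
# inferred from the footnote's example (-50 relative to the TSS is -100
# relative to the 3' end).
DOWNSTREAM_LENGTH = 50

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tss_to_three_prime(position, downstream_length=DOWNSTREAM_LENGTH):
    """Convert a TSS-relative coordinate to one relative to the sap 3' end."""
    return position - downstream_length


def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


print(tss_to_three_prime(-50))
print(reverse_complement("GCGAGCGC"))
```

Palindromic elements such as CATGCATG are their own reverse complement, which is consistent with their appearing with identical sequence in both listings at a given site.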






Various additional cis elements are shown in Table 2.









TABLE 2

Illustrative additional cis elements.

Sequence
TCTTTACTT
TACAGCCAG
CTCGCACTG
CAACCCAGC
CAGGCGCG
TCAAACCA
ACATACAA
CACGCGTG
TGGAAACG
TACACCTCG
GCCAGAAC
TTCGCTTT
CGAGCCCA
GTTCACTG
GGCCAAAG
ACGGCCGA
TACACACC
CCGTTCGG
CACGAAAC
GCACGTGC
TGATATCA
AACTCAGG
GTGGGACC
TTCGCCAA










In certain embodiments, the synthetic promoter comprises one or more Myb family, SBP family, bHLH family, C2H2 family, bZIP family, C3H family, Dof family or G2 family transcription factor binding site motifs. In certain embodiments, the synthetic promoter comprises one or more transcription factor binding site motifs selected from the group consisting of the sequences in FIGS. 17A-17C.


The (cis)-elements are positioned or arranged within a promoter scaffold or backbone. In varying embodiments, the nucleic acid base of highest probability or second highest probability at a particular position of the promoter scaffold or backbone (e.g., based on known native promoter sequences) relative to the transcriptional start site (TSS) is assigned to that position, e.g., as indicated in Table 3.
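
The assignment rule described above can be illustrated with a minimal sketch that picks the highest-probability (or second-highest-probability) base at each scaffold position. The three positions shown use the values listed in Table 3 for −449 to −447; the function names and the dictionary layout are hypothetical, and the sketch does not attempt to reproduce how the two choices were combined when the actual saps were designed.

```python
# Base frequencies for positions -449 to -447, as listed in Table 3.
COMPOSITION = {
    -449: {"A": 0.191, "C": 0.298, "G": 0.317, "T": 0.192},
    -448: {"A": 0.191, "C": 0.299, "G": 0.316, "T": 0.192},
    -447: {"A": 0.191, "C": 0.299, "G": 0.316, "T": 0.192},
}


def ranked_bases(frequencies):
    """Return the four bases ordered from most to least probable."""
    return sorted(frequencies, key=frequencies.get, reverse=True)


def scaffold(composition, rank=0):
    """Assign one base per position, 5' to 3'.

    rank=0 takes the highest-probability base at each position,
    rank=1 the second-highest.
    """
    return "".join(
        ranked_bases(composition[position])[rank]
        for position in sorted(composition)
    )


print(scaffold(COMPOSITION))          # highest-probability base per position
print(scaffold(COMPOSITION, rank=1))  # second-highest per position
```

Because G and C are close in frequency across this region, always taking the top base yields a G run, which is one reason a design might alternate between the first and second choices.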









TABLE 3





Average nucleotide composition of native C. reinhardtii promoters.























position relative to TSS:
−449
−448
−447
−446
−445
−444
−443
−442





A
0.191
0.191
0.191
0.191
0.191
0.191
0.191
0.191


C
0.298
0.299
0.299
0.299
0.299
0.299
0.299
0.299


G
0.317
0.316
0.316
0.316
0.315
0.315
0.315
0.315


T
0.192
0.192
0.192
0.193
0.193
0.193
0.193
0.194





position relative to TSS:
−441
−440
−439
−438
−437
−436
−435
−434





A
0.191
0.191
0.191
0.191
0.191
0.192
0.192
0.192


C
0.299
0.299
0.299
0.299
0.300
0.299
0.299
0.299


G
0.314
0.314
0.314
0.313
0.313
0.313
0.313
0.312


T
0.194
0.194
0.194
0.194
0.194
0.194
0.195
0.195





position relative to TSS:
−433
−432
−431
−430
−429
−428
−427
−426





A
0.193
0.193
0.193
0.193
0.193
0.193
0.193
0.193


C
0.299
0.299
0.299
0.299
0.299
0.299
0.299
0.299


G
0.312
0.311
0.311
0.310
0.310
0.310
0.310
0.309


T
0.194
0.195
0.195
0.196
0.196
0.196
0.196
0.197





position relative to TSS:
−425
−424
−423
−422
−421
−420
−419
−418





A
0.194
0.194
0.194
0.194
0.194
0.194
0.195
0.195


C
0.299
0.299
0.298
0.298
0.298
0.298
0.298
0.298


G
0.309
0.309
0.309
0.309
0.309
0.308
0.308
0.308


T
0.197
0.197
0.197
0.197
0.197
0.197
0.197
0.198





position relative to TSS:
−417
−416
−415
−414
−413
−412
−411
−410





A
0.195
0.194
0.195
0.195
0.195
0.195
0.195
0.195


C
0.297
0.297
0.297
0.297
0.297
0.297
0.297
0.296


G
0.308
0.308
0.308
0.308
0.308
0.308
0.308
0.308


T
0.198
0.198
0.198
0.198
0.198
0.198
0.199
0.199





position relative to TSS:
−409
−408
−407
−406
−405
−404
−403
−402





A
0.195
0.195
0.195
0.195
0.195
0.196
0.196
0.196


C
0.296
0.296
0.296
0.295
0.294
0.294
0.294
0.294


G
0.307
0.308
0.308
0.308
0.308
0.308
0.308
0.308


T
0.199
0.200
0.200
0.200
0.200
0.201
0.200
0.200





position relative to TSS:
−401
−400
−399
−398
−397
−396
−395
−394





A
0.196
0.196
0.196
0.196
0.196
0.196
0.196
0.196


C
0.294
0.294
0.293
0.293
0.293
0.293
0.293
0.292


G
0.308
0.308
0.308
0.308
0.308
0.308
0.308
0.308


T
0.200
0.200
0.201
0.201
0.201
0.201
0.201
0.201





position relative to TSS:
−393
−392
−391
−390
−389
−388
−387
−386





A
0.196
0.196
0.196
0.196
0.196
0.196
0.196
0.196


C
0.292
0.291
0.292
0.291
0.291
0.291
0.290
0.290


G
0.309
0.309
0.309
0.309
0.310
0.309
0.310
0.310


T
0.201
0.202
0.202
0.202
0.201
0.202
0.202
0.202





position relative to TSS:
−385
−384
−383
−382
−381
−380
−379
−378





A
0.195
0.195
0.195
0.195
0.195
0.195
0.195
0.195


C
0.290
0.289
0.290
0.290
0.290
0.289
0.289
0.289


G
0.311
0.311
0.311
0.312
0.312
0.312
0.313
0.313


T
0.202
0.202
0.202
0.202
0.202
0.202
0.201
0.201





position relative to TSS:
−377
−376
−375
−374
−373
−372
−371
−370





A
0.194
0.194
0.195
0.195
0.195
0.195
0.194
0.194


C
0.289
0.289
0.289
0.289
0.290
0.290
0.290
0.289


G
0.313
0.313
0.313
0.313
0.313
0.313
0.314
0.314


T
0.201
0.201
0.201
0.201
0.201
0.201
0.201
0.200





position relative to TSS:
−369
−368
−367
−366
−365
−364
−363
−362





A
0.195
0.195
0.195
0.195
0.195
0.194
0.194
0.194


C
0.289
0.289
0.289
0.289
0.290
0.290
0.290
0.290


G
0.314
0.315
0.315
0.315
0.315
0.315
0.315
0.315


T
0.200
0.200
0.200
0.200
0.199
0.199
0.199
0.199





position relative to TSS:
−361
−360
−359
−358
−357
−356
−355
−354





A
0.194
0.194
0.194
0.193
0.193
0.193
0.193
0.193


C
0.290
0.290
0.290
0.290
0.290
0.290
0.290
0.291


G
0.316
0.316
0.316
0.316
0.316
0.316
0.316
0.316


T
0.199
0.199
0.199
0.198
0.198
0.198
0.198
0.198





position relative to TSS:
−353
−352
−351
−350
−349
−348
−347
−346





A
0.193
0.193
0.193
0.193
0.193
0.193
0.193
0.193


C
0.291
0.291
0.292
0.292
0.292
0.292
0.292
0.293


G
0.316
0.316
0.316
0.316
0.316
0.316
0.316
0.316


T
0.198
0.198
0.198
0.198
0.198
0.197
0.197
0.196





position relative to TSS:
−345
−344
−343
−342
−341
−340
−339
−338





A
0.193
0.193
0.193
0.193
0.193
0.193
0.193
0.193


C
0.293
0.293
0.293
0.293
0.293
0.293
0.293
0.293


G
0.316
0.316
0.316
0.316
0.316
0.316
0.316
0.316


T
0.196
0.196
0.196
0.196
0.196
0.196
0.196
0.196





position relative to TSS:
−337
−336
−335
−334
−333
−332
−331
−330





A
0.193
0.193
0.193
0.193
0.194
0.194
0.193
0.194


C
0.293
0.293
0.293
0.293
0.292
0.293
0.293
0.293


G
0.316
0.316
0.316
0.316
0.316
0.315
0.315
0.315


T
0.196
0.196
0.196
0.196
0.196
0.196
0.197
0.197





position relative to TSS:
−329
−328
−327
−326
−325
−324
−323
−322





A
0.193
0.193
0.193
0.193
0.193
0.193
0.194
0.194


C
0.293
0.293
0.293
0.293
0.293
0.293
0.293
0.293


G
0.315
0.314
0.314
0.314
0.314
0.314
0.314
0.313


T
0.197
0.197
0.197
0.198
0.198
0.198
0.198
0.198





position relative to TSS:
−321
−320
−319
−318
−317
−316
−315
−314





A
0.194
0.195
0.195
0.195
0.195
0.195
0.195
0.196


C
0.293
0.293
0.293
0.293
0.293
0.293
0.293
0.293


G
0.313
0.312
0.311
0.312
0.311
0.311
0.311
0.311


T
0.198
0.198
0.198
0.198
0.198
0.199
0.199
0.199





position relative to TSS:
−313
−312
−311
−310
−309
−308
−307
−306





A
0.196
0.196
0.196
0.196
0.196
0.196
0.196
0.197


C
0.292
0.292
0.292
0.292
0.292
0.292
0.292
0.291


G
0.311
0.310
0.310
0.310
0.309
0.309
0.308
0.308


T
0.199
0.200
0.200
0.201
0.201
0.201
0.201
0.202





position relative to TSS:
−305
−304
−303
−302
−301
−300
−299
−298





A
0.197
0.198
0.198
0.198
0.199
0.199
0.200
0.200


C
0.290
0.290
0.289
0.289
0.288
0.288
0.288
0.288


G
0.307
0.307
0.307
0.307
0.306
0.305
0.305
0.304


T
0.203
0.203
0.203
0.204
0.205
0.205
0.205
0.206





position relative to TSS: | −297 | −296 | −295 | −294 | −293 | −292 | −291 | −290
A | 0.201 | 0.201 | 0.202 | 0.202 | 0.202 | 0.202 | 0.202 | 0.203
C | 0.287 | 0.287 | 0.287 | 0.286 | 0.286 | 0.285 | 0.284 | 0.284
G | 0.304 | 0.303 | 0.303 | 0.302 | 0.302 | 0.302 | 0.303 | 0.302
T | 0.206 | 0.206 | 0.207 | 0.207 | 0.207 | 0.208 | 0.208 | 0.209





position relative to TSS: | −289 | −288 | −287 | −286 | −285 | −284 | −283 | −282
A | 0.203 | 0.204 | 0.204 | 0.205 | 0.206 | 0.206 | 0.206 | 0.207
C | 0.284 | 0.283 | 0.282 | 0.281 | 0.281 | 0.280 | 0.280 | 0.279
G | 0.302 | 0.302 | 0.302 | 0.301 | 0.301 | 0.301 | 0.300 | 0.300
T | 0.209 | 0.209 | 0.209 | 0.210 | 0.210 | 0.211 | 0.211 | 0.212





position relative to TSS: | −281 | −280 | −279 | −278 | −277 | −276 | −275 | −274
A | 0.207 | 0.207 | 0.208 | 0.209 | 0.209 | 0.210 | 0.210 | 0.210
C | 0.278 | 0.278 | 0.277 | 0.276 | 0.276 | 0.275 | 0.275 | 0.274
G | 0.300 | 0.300 | 0.300 | 0.300 | 0.299 | 0.299 | 0.300 | 0.299
T | 0.212 | 0.213 | 0.213 | 0.213 | 0.213 | 0.214 | 0.214 | 0.215





position relative to TSS: | −273 | −272 | −271 | −270 | −269 | −268 | −267 | −266
A | 0.210 | 0.210 | 0.211 | 0.211 | 0.212 | 0.212 | 0.213 | 0.213
C | 0.273 | 0.273 | 0.273 | 0.272 | 0.272 | 0.271 | 0.270 | 0.270
G | 0.299 | 0.299 | 0.299 | 0.299 | 0.299 | 0.298 | 0.298 | 0.299
T | 0.215 | 0.215 | 0.215 | 0.215 | 0.215 | 0.216 | 0.216 | 0.216





position relative to TSS: | −265 | −264 | −263 | −262 | −261 | −260 | −259 | −258
A | 0.213 | 0.213 | 0.213 | 0.213 | 0.213 | 0.213 | 0.213 | 0.213
C | 0.270 | 0.270 | 0.269 | 0.270 | 0.269 | 0.269 | 0.269 | 0.269
G | 0.299 | 0.299 | 0.299 | 0.298 | 0.299 | 0.298 | 0.298 | 0.298
T | 0.216 | 0.216 | 0.217 | 0.217 | 0.217 | 0.217 | 0.217 | 0.217





position relative to TSS: | −257 | −256 | −255 | −254 | −253 | −252 | −251 | −250
A | 0.214 | 0.214 | 0.214 | 0.214 | 0.214 | 0.214 | 0.214 | 0.214
C | 0.268 | 0.268 | 0.268 | 0.268 | 0.268 | 0.268 | 0.268 | 0.267
G | 0.299 | 0.299 | 0.299 | 0.300 | 0.300 | 0.300 | 0.300 | 0.301
T | 0.217 | 0.216 | 0.216 | 0.216 | 0.216 | 0.216 | 0.216 | 0.215





position relative to TSS: | −249 | −248 | −247 | −246 | −245 | −244 | −243 | −242
A | 0.214 | 0.213 | 0.213 | 0.213 | 0.213 | 0.213 | 0.213 | 0.213
C | 0.268 | 0.267 | 0.267 | 0.268 | 0.268 | 0.268 | 0.268 | 0.268
G | 0.301 | 0.302 | 0.302 | 0.302 | 0.303 | 0.303 | 0.303 | 0.304
T | 0.215 | 0.215 | 0.215 | 0.215 | 0.214 | 0.214 | 0.214 | 0.213





position relative to TSS: | −241 | −240 | −239 | −238 | −237 | −236 | −235 | −234
A | 0.212 | 0.212 | 0.212 | 0.212 | 0.211 | 0.211 | 0.211 | 0.211
C | 0.268 | 0.268 | 0.268 | 0.268 | 0.269 | 0.269 | 0.269 | 0.270
G | 0.305 | 0.305 | 0.305 | 0.306 | 0.306 | 0.306 | 0.306 | 0.306
T | 0.213 | 0.212 | 0.212 | 0.212 | 0.211 | 0.211 | 0.211 | 0.210





position relative to TSS: | −233 | −232 | −231 | −230 | −229 | −228 | −227 | −226
A | 0.211 | 0.212 | 0.212 | 0.212 | 0.212 | 0.212 | 0.212 | 0.212
C | 0.270 | 0.269 | 0.270 | 0.270 | 0.270 | 0.270 | 0.270 | 0.270
G | 0.307 | 0.307 | 0.307 | 0.307 | 0.308 | 0.308 | 0.308 | 0.308
T | 0.210 | 0.210 | 0.210 | 0.209 | 0.209 | 0.208 | 0.208 | 0.207





position relative to TSS: | −225 | −224 | −223 | −222 | −221 | −220 | −219 | −218
A | 0.212 | 0.212 | 0.212 | 0.212 | 0.212 | 0.212 | 0.212 | 0.212
C | 0.270 | 0.270 | 0.270 | 0.270 | 0.270 | 0.271 | 0.271 | 0.270
G | 0.308 | 0.309 | 0.309 | 0.309 | 0.310 | 0.310 | 0.310 | 0.310
T | 0.207 | 0.207 | 0.207 | 0.206 | 0.206 | 0.205 | 0.205 | 0.205





position relative to TSS: | −217 | −216 | −215 | −214 | −213 | −212 | −211 | −210
A | 0.212 | 0.212 | 0.212 | 0.212 | 0.212 | 0.212 | 0.213 | 0.213
C | 0.270 | 0.270 | 0.270 | 0.270 | 0.270 | 0.271 | 0.270 | 0.271
G | 0.310 | 0.310 | 0.310 | 0.310 | 0.310 | 0.309 | 0.309 | 0.308
T | 0.205 | 0.205 | 0.206 | 0.206 | 0.206 | 0.206 | 0.206 | 0.206





position relative to TSS: | −209 | −208 | −207 | −206 | −205 | −204 | −203 | −202
A | 0.213 | 0.213 | 0.214 | 0.215 | 0.215 | 0.215 | 0.216 | 0.216
C | 0.271 | 0.271 | 0.271 | 0.271 | 0.271 | 0.271 | 0.270 | 0.271
G | 0.308 | 0.308 | 0.307 | 0.307 | 0.306 | 0.306 | 0.306 | 0.305
T | 0.206 | 0.206 | 0.206 | 0.206 | 0.206 | 0.207 | 0.207 | 0.207





position relative to TSS: | −201 | −200 | −199 | −198 | −197 | −196 | −195 | −194
A | 0.216 | 0.216 | 0.216 | 0.217 | 0.217 | 0.217 | 0.218 | 0.218
C | 0.270 | 0.270 | 0.270 | 0.270 | 0.270 | 0.269 | 0.269 | 0.268
G | 0.305 | 0.304 | 0.303 | 0.303 | 0.303 | 0.302 | 0.302 | 0.301
T | 0.208 | 0.208 | 0.209 | 0.209 | 0.209 | 0.210 | 0.210 | 0.211





position relative to TSS: | −193 | −192 | −191 | −190 | −189 | −188 | −187 | −186
A | 0.218 | 0.218 | 0.218 | 0.218 | 0.218 | 0.218 | 0.219 | 0.219
C | 0.268 | 0.268 | 0.268 | 0.268 | 0.268 | 0.267 | 0.267 | 0.266
G | 0.301 | 0.300 | 0.300 | 0.299 | 0.299 | 0.299 | 0.298 | 0.297
T | 0.212 | 0.212 | 0.213 | 0.213 | 0.214 | 0.215 | 0.216 | 0.216





position relative to TSS: | −185 | −184 | −183 | −182 | −181 | −180 | −179 | −178
A | 0.219 | 0.219 | 0.219 | 0.220 | 0.220 | 0.221 | 0.221 | 0.221
C | 0.266 | 0.265 | 0.265 | 0.264 | 0.264 | 0.263 | 0.262 | 0.261
G | 0.297 | 0.296 | 0.296 | 0.295 | 0.294 | 0.294 | 0.293 | 0.293
T | 0.217 | 0.218 | 0.219 | 0.220 | 0.221 | 0.222 | 0.223 | 0.223





position relative to TSS: | −177 | −176 | −175 | −174 | −173 | −172 | −171 | −170
A | 0.221 | 0.222 | 0.222 | 0.223 | 0.224 | 0.224 | 0.225 | 0.225
C | 0.260 | 0.260 | 0.259 | 0.258 | 0.257 | 0.257 | 0.255 | 0.254
G | 0.293 | 0.293 | 0.292 | 0.292 | 0.291 | 0.291 | 0.291 | 0.290
T | 0.224 | 0.225 | 0.225 | 0.225 | 0.226 | 0.227 | 0.228 | 0.229





position relative to TSS: | −169 | −168 | −167 | −166 | −165 | −164 | −163 | −162
A | 0.226 | 0.226 | 0.227 | 0.228 | 0.228 | 0.228 | 0.228 | 0.229
C | 0.253 | 0.252 | 0.251 | 0.250 | 0.249 | 0.248 | 0.247 | 0.246
G | 0.290 | 0.290 | 0.290 | 0.289 | 0.289 | 0.289 | 0.289 | 0.288
T | 0.230 | 0.230 | 0.231 | 0.232 | 0.233 | 0.234 | 0.234 | 0.235





position relative to TSS: | −161 | −160 | −159 | −158 | −157 | −156 | −155 | −154
A | 0.230 | 0.231 | 0.232 | 0.232 | 0.232 | 0.233 | 0.233 | 0.233
C | 0.245 | 0.244 | 0.243 | 0.241 | 0.241 | 0.240 | 0.238 | 0.238
G | 0.288 | 0.288 | 0.287 | 0.288 | 0.288 | 0.287 | 0.288 | 0.288
T | 0.235 | 0.236 | 0.237 | 0.238 | 0.238 | 0.239 | 0.239 | 0.239





position relative to TSS: | −153 | −152 | −151 | −150 | −149 | −148 | −147 | −146
A | 0.234 | 0.235 | 0.235 | 0.236 | 0.237 | 0.237 | 0.238 | 0.238
C | 0.237 | 0.236 | 0.235 | 0.234 | 0.233 | 0.232 | 0.231 | 0.231
G | 0.288 | 0.288 | 0.288 | 0.288 | 0.288 | 0.288 | 0.288 | 0.288
T | 0.240 | 0.240 | 0.241 | 0.241 | 0.241 | 0.241 | 0.241 | 0.242





position relative to TSS: | −145 | −144 | −143 | −142 | −141 | −140 | −139 | −138
A | 0.239 | 0.239 | 0.240 | 0.240 | 0.241 | 0.241 | 0.241 | 0.241
C | 0.230 | 0.229 | 0.229 | 0.228 | 0.227 | 0.227 | 0.227 | 0.227
G | 0.289 | 0.289 | 0.289 | 0.290 | 0.290 | 0.291 | 0.291 | 0.292
T | 0.241 | 0.242 | 0.241 | 0.241 | 0.240 | 0.240 | 0.240 | 0.239





position relative to TSS: | −137 | −136 | −135 | −134 | −133 | −132 | −131 | −130
A | 0.242 | 0.242 | 0.242 | 0.241 | 0.241 | 0.240 | 0.240 | 0.240
C | 0.226 | 0.226 | 0.226 | 0.226 | 0.226 | 0.227 | 0.227 | 0.227
G | 0.292 | 0.293 | 0.293 | 0.294 | 0.295 | 0.295 | 0.296 | 0.297
T | 0.239 | 0.238 | 0.238 | 0.238 | 0.237 | 0.237 | 0.236 | 0.235





position relative to TSS: | −129 | −128 | −127 | −126 | −125 | −124 | −123 | −122
A | 0.240 | 0.240 | 0.239 | 0.239 | 0.238 | 0.238 | 0.237 | 0.237
C | 0.227 | 0.228 | 0.228 | 0.229 | 0.229 | 0.229 | 0.230 | 0.230
G | 0.299 | 0.300 | 0.300 | 0.301 | 0.303 | 0.304 | 0.305 | 0.306
T | 0.233 | 0.232 | 0.231 | 0.230 | 0.229 | 0.228 | 0.228 | 0.227





position relative to TSS: | −121 | −120 | −119 | −118 | −117 | −116 | −115 | −114
A | 0.236 | 0.235 | 0.234 | 0.233 | 0.233 | 0.233 | 0.232 | 0.231
C | 0.231 | 0.231 | 0.232 | 0.233 | 0.234 | 0.235 | 0.235 | 0.236
G | 0.308 | 0.309 | 0.310 | 0.312 | 0.313 | 0.314 | 0.315 | 0.316
T | 0.225 | 0.224 | 0.222 | 0.220 | 0.219 | 0.218 | 0.217 | 0.215





position relative to TSS: | −113 | −112 | −111 | −110 | −109 | −108 | −107 | −106
A | 0.231 | 0.230 | 0.229 | 0.228 | 0.227 | 0.226 | 0.225 | 0.224
C | 0.238 | 0.238 | 0.239 | 0.240 | 0.241 | 0.242 | 0.243 | 0.244
G | 0.316 | 0.318 | 0.319 | 0.320 | 0.321 | 0.322 | 0.323 | 0.325
T | 0.214 | 0.213 | 0.212 | 0.210 | 0.209 | 0.208 | 0.207 | 0.206





position relative to TSS: | −105 | −104 | −103 | −102 | −101 | −100 | −99 | −98
A | 0.223 | 0.222 | 0.221 | 0.220 | 0.219 | 0.218 | 0.217 | 0.216
C | 0.245 | 0.246 | 0.247 | 0.248 | 0.249 | 0.251 | 0.251 | 0.253
G | 0.326 | 0.327 | 0.328 | 0.328 | 0.329 | 0.330 | 0.331 | 0.331
T | 0.204 | 0.204 | 0.202 | 0.202 | 0.201 | 0.200 | 0.199 | 0.198





position relative to TSS: | −97 | −96 | −95 | −94 | −93 | −92 | −91 | −90
A | 0.216 | 0.215 | 0.215 | 0.214 | 0.214 | 0.213 | 0.212 | 0.211
C | 0.254 | 0.255 | 0.256 | 0.257 | 0.257 | 0.258 | 0.260 | 0.261
G | 0.331 | 0.332 | 0.332 | 0.332 | 0.332 | 0.332 | 0.333 | 0.333
T | 0.198 | 0.197 | 0.196 | 0.195 | 0.195 | 0.195 | 0.194 | 0.193





position relative to TSS: | −89 | −88 | −87 | −86 | −85 | −84 | −83 | −82
A | 0.211 | 0.210 | 0.209 | 0.209 | 0.209 | 0.208 | 0.207 | 0.207
C | 0.262 | 0.263 | 0.264 | 0.265 | 0.266 | 0.267 | 0.268 | 0.269
G | 0.332 | 0.332 | 0.332 | 0.332 | 0.331 | 0.331 | 0.330 | 0.330
T | 0.193 | 0.193 | 0.193 | 0.193 | 0.193 | 0.193 | 0.192 | 0.192





position relative to TSS: | −81 | −80 | −79 | −78 | −77 | −76 | −75 | −74
A | 0.207 | 0.206 | 0.206 | 0.205 | 0.205 | 0.204 | 0.204 | 0.203
C | 0.271 | 0.271 | 0.273 | 0.274 | 0.275 | 0.275 | 0.276 | 0.277
G | 0.329 | 0.328 | 0.327 | 0.327 | 0.326 | 0.325 | 0.325 | 0.324
T | 0.192 | 0.192 | 0.192 | 0.192 | 0.193 | 0.193 | 0.193 | 0.194





position relative to TSS: | −73 | −72 | −71 | −70 | −69 | −68 | −67 | −66
A | 0.203 | 0.203 | 0.203 | 0.202 | 0.202 | 0.202 | 0.202 | 0.201
C | 0.278 | 0.279 | 0.280 | 0.282 | 0.283 | 0.284 | 0.285 | 0.286
G | 0.323 | 0.322 | 0.321 | 0.320 | 0.319 | 0.318 | 0.317 | 0.316
T | 0.194 | 0.194 | 0.194 | 0.194 | 0.194 | 0.194 | 0.195 | 0.195





position relative to TSS: | −65 | −64 | −63 | −62 | −61 | −60 | −59 | −58
A | 0.201 | 0.202 | 0.202 | 0.203 | 0.203 | 0.204 | 0.204 | 0.205
C | 0.287 | 0.288 | 0.290 | 0.290 | 0.291 | 0.291 | 0.292 | 0.293
G | 0.314 | 0.311 | 0.309 | 0.306 | 0.304 | 0.301 | 0.299 | 0.296
T | 0.195 | 0.196 | 0.197 | 0.199 | 0.200 | 0.202 | 0.203 | 0.204





position relative to TSS: | −57 | −56 | −55 | −54 | −53 | −52 | −51 | −50
A | 0.205 | 0.206 | 0.206 | 0.206 | 0.206 | 0.207 | 0.208 | 0.209
C | 0.293 | 0.293 | 0.294 | 0.295 | 0.295 | 0.296 | 0.297 | 0.297
G | 0.294 | 0.292 | 0.289 | 0.287 | 0.285 | 0.283 | 0.280 | 0.278
T | 0.206 | 0.207 | 0.209 | 0.210 | 0.211 | 0.212 | 0.213 | 0.214





position relative to TSS: | −49 | −48 | −47 | −46 | −45 | −44 | −43 | −42
A | 0.209 | 0.210 | 0.210 | 0.211 | 0.211 | 0.212 | 0.213 | 0.215
C | 0.297 | 0.298 | 0.298 | 0.299 | 0.299 | 0.299 | 0.299 | 0.299
G | 0.275 | 0.273 | 0.271 | 0.268 | 0.266 | 0.263 | 0.260 | 0.258
T | 0.217 | 0.218 | 0.219 | 0.221 | 0.223 | 0.224 | 0.225 | 0.227





position relative to TSS: | −41 | −40 | −39 | −38 | −37 | −36 | −35 | −34
A | 0.215 | 0.216 | 0.217 | 0.218 | 0.219 | 0.220 | 0.221 | 0.222
C | 0.299 | 0.299 | 0.299 | 0.299 | 0.298 | 0.298 | 0.298 | 0.298
G | 0.255 | 0.253 | 0.250 | 0.247 | 0.244 | 0.241 | 0.239 | 0.237
T | 0.229 | 0.231 | 0.233 | 0.234 | 0.236 | 0.239 | 0.240 | 0.242





position relative to TSS: | −33 | −32 | −31 | −30 | −29 | −28 | −27 | −26
A | 0.223 | 0.225 | 0.226 | 0.227 | 0.228 | 0.230 | 0.232 | 0.233
C | 0.297 | 0.296 | 0.296 | 0.295 | 0.294 | 0.293 | 0.291 | 0.289
G | 0.234 | 0.231 | 0.229 | 0.226 | 0.223 | 0.221 | 0.220 | 0.218
T | 0.244 | 0.246 | 0.248 | 0.250 | 0.253 | 0.254 | 0.255 | 0.257





position relative to TSS: | −25 | −24 | −23 | −22 | −21 | −20 | −19 | −18
A | 0.235 | 0.237 | 0.239 | 0.240 | 0.242 | 0.245 | 0.245 | 0.246
C | 0.289 | 0.287 | 0.285 | 0.284 | 0.282 | 0.280 | 0.281 | 0.279
G | 0.217 | 0.214 | 0.213 | 0.213 | 0.212 | 0.209 | 0.211 | 0.209
T | 0.258 | 0.260 | 0.262 | 0.262 | 0.263 | 0.264 | 0.261 | 0.264





position relative to TSS: | −17 | −16 | −15 | −14 | −13 | −12 | −11 | −10
A | 0.247 | 0.250 | 0.252 | 0.253 | 0.254 | 0.255 | 0.256 | 0.257
C | 0.278 | 0.276 | 0.275 | 0.274 | 0.273 | 0.273 | 0.271 | 0.271
G | 0.208 | 0.207 | 0.205 | 0.205 | 0.204 | 0.203 | 0.204 | 0.204
T | 0.266 | 0.266 | 0.267 | 0.267 | 0.267 | 0.268 | 0.268 | 0.268





position relative to TSS: | −9 | −8 | −7 | −6 | −5 | −4 | −3 | −2
A | 0.257 | 0.259 | 0.259 | 0.260 | 0.261 | 0.262 | 0.262 | 0.263
C | 0.270 | 0.269 | 0.268 | 0.268 | 0.267 | 0.266 | 0.265 | 0.265
G | 0.203 | 0.203 | 0.203 | 0.202 | 0.202 | 0.202 | 0.202 | 0.202
T | 0.268 | 0.268 | 0.268 | 0.269 | 0.269 | 0.269 | 0.269 | 0.269





position relative to TSS: | −1 | 0 | 1 | 2 | 3 | 4 | 5 | 6
A | 0.264 | 0.264 | 0.265 | 0.266 | 0.266 | 0.267 | 0.267 | 0.267
C | 0.264 | 0.264 | 0.262 | 0.261 | 0.261 | 0.261 | 0.260 | 0.260
G | 0.202 | 0.202 | 0.203 | 0.204 | 0.204 | 0.205 | 0.206 | 0.206
T | 0.269 | 0.269 | 0.269 | 0.268 | 0.268 | 0.267 | 0.266 | 0.266





position relative to TSS: | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
A | 0.268 | 0.268 | 0.268 | 0.269 | 0.269 | 0.269 | 0.268 | 0.267
C | 0.259 | 0.260 | 0.260 | 0.259 | 0.259 | 0.259 | 0.259 | 0.260
G | 0.207 | 0.207 | 0.208 | 0.209 | 0.210 | 0.211 | 0.212 | 0.212
T | 0.265 | 0.264 | 0.263 | 0.262 | 0.262 | 0.260 | 0.260 | 0.260





position relative to TSS: | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22
A | 0.268 | 0.267 | 0.266 | 0.265 | 0.264 | 0.263 | 0.260 | 0.261
C | 0.261 | 0.261 | 0.262 | 0.263 | 0.263 | 0.264 | 0.266 | 0.264
G | 0.212 | 0.212 | 0.214 | 0.215 | 0.216 | 0.218 | 0.221 | 0.219
T | 0.259 | 0.259 | 0.258 | 0.256 | 0.256 | 0.255 | 0.253 | 0.256





position relative to TSS: | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30
A | 0.261 | 0.260 | 0.258 | 0.256 | 0.255 | 0.255 | 0.255 | 0.254
C | 0.265 | 0.267 | 0.268 | 0.269 | 0.270 | 0.270 | 0.270 | 0.271
G | 0.221 | 0.223 | 0.224 | 0.227 | 0.228 | 0.228 | 0.229 | 0.230
T | 0.253 | 0.250 | 0.250 | 0.248 | 0.248 | 0.247 | 0.246 | 0.246





position relative to TSS: | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38
A | 0.253 | 0.253 | 0.252 | 0.251 | 0.250 | 0.250 | 0.249 | 0.248
C | 0.271 | 0.271 | 0.272 | 0.272 | 0.272 | 0.273 | 0.274 | 0.275
G | 0.231 | 0.232 | 0.233 | 0.234 | 0.235 | 0.236 | 0.237 | 0.238
T | 0.246 | 0.244 | 0.244 | 0.243 | 0.242 | 0.241 | 0.240 | 0.240





position relative to TSS: | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46
A | 0.247 | 0.246 | 0.245 | 0.244 | 0.243 | 0.242 | 0.241 | 0.241
C | 0.275 | 0.275 | 0.276 | 0.277 | 0.278 | 0.279 | 0.279 | 0.280
G | 0.239 | 0.241 | 0.242 | 0.242 | 0.243 | 0.243 | 0.244 | 0.244
T | 0.239 | 0.238 | 0.237 | 0.237 | 0.236 | 0.236 | 0.235 | 0.234
















position relative to TSS: | 47 | 48 | 49 | 50
A | 0.240 | 0.240 | 0.239 | 0.238
C | 0.281 | 0.281 | 0.282 | 0.283
G | 0.245 | 0.246 | 0.247 | 0.247
T | 0.234 | 0.233 | 0.233 | 0.232







*Mimics promoter positions −449 to +50 bp relative to the TSS and is calculated as described herein.
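
One way to read a table of per-position nucleotide frequencies is as a recipe for drawing random sequences whose base composition follows the listed frequencies at each position. The following is a minimal sketch under that assumption, not the procedure actually used to design the promoters. The `FREQS` dictionary holds only three columns copied from the table (positions −345 to −343), and the function name `sample_sequence` is hypothetical.

```python
import random

# Per-position frequencies in (A, C, G, T) order, copied from three columns
# of the table: positions -345, -344 and -343 relative to the TSS.
FREQS = {
    -345: (0.193, 0.293, 0.316, 0.196),
    -344: (0.193, 0.293, 0.316, 0.196),
    -343: (0.193, 0.293, 0.316, 0.196),
}


def sample_sequence(freqs, rng=None):
    """Draw one base per position, weighted by that position's frequencies.

    Hypothetical helper for illustration. random.choices normalizes the
    weights, so rows that sum to slightly less than 1 (rounding) are fine.
    """
    rng = rng or random.Random()
    bases = "ACGT"
    seq = []
    for pos in sorted(freqs):  # most upstream position first
        seq.append(rng.choices(bases, weights=freqs[pos], k=1)[0])
    return "".join(seq)


if __name__ == "__main__":
    print(sample_sequence(FREQS, random.Random(0)))
```

Extending `FREQS` to every column of the table would give a full-length random sequence with the same position-specific composition.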






In varying embodiments, the synthetic promoter scaffold or backbone is derived from a promoter capable of expression of a polynucleotide in an algal cell, e.g., in the nucleus or a plastid organelle (e.g., a chloroplast). In varying embodiments, the synthetic promoter scaffold or backbone is derived from a promoter capable of driving expression in an algal cell selected from the group consisting of psbA, atpA, psbD, TufA and atpB. See, e.g., U.S. Patent Publication No. 2012/0309939.


In varying embodiments, the promoter comprises a nucleic acid sequence of a synthetic promoter shown in Table 4 (e.g., any one of SEQ ID NOs: 38-62).









TABLE 4

Illustrative synthetic algal promoters. Underlined sequences show location of elements.

Promoter
Sequence
SEQ ID NO




sap8
CACCAGGACATCCCTCTCTCAGCTCCTAGAAGCTGTCTCGT
38



GCCAGCTTCGGTCGGGCCGCAAGTAAAGCGAGACCCAAGA




GCGACGTTTGCCACCTTGCGCGTGCTTTGAGCATGTCGCGA




AGAAACCCCGAAGGCATGGGGCCCATTCGCGAAGCAAATC




TGGTGTGCAACCATTAAGGCTTTAAAGCGAGCGAGCGAGC




AGGAGGCCCATGCAGCGCGCGCGAGGCGAACATAGAATG





GGCCCGCTCTTCCGCTGCGCGTTAGAAGCGAGGCAGCATC





ATATTCATATTCATTAGCACCAATGCTCGCAGGTATACAAA




TTTTGTGCAGAAGCGAAAATGCAAGCAATTTGCATGGGGC




GTACGGCCGCATGGGGCTTTTTTTTTTGGGGCTCAAGTCTC




AGAGCGCGCGCGCAATGGCGCCCTCTCCTCTCTTTTCCTCG





TCGCGACCGAACCCAGCAAGGTGCGTCAAGATCGCTGTCG





GGTAAGAGCCAAGGCT






sap11
CACATGCTGACTACGAGCAGGCGCTGGGCAGAATGGCATG
39



AAGGCTTCTGAGCGACTCGGCGACGAACTCATCCCTCAAG




TGTTGCACAAAAGCGCCGAGCGCTCCGCGTTCGAGGGCGA




ATGACCCGCGCGAATGGGCCCCACAAATGACCAGGCAACC




TCAAGCTAACGCAGCGGCCTTTTACGTATAGAGCGACTGC




AAGCAAGTATGCAGCTCGTTGCGCGGTCGCGAGTCCAAGT




CGCGCTGCGCGCACATCCTCGAGCGCGCGCCGCGGCCACC




AAGTGGAATGGGCCCATCATGCATGTTTGCTTGGCCCCGAT




AAAGCCCGCAATTTTGGGAAAAAGGTACGGCGCGCGCCCC




ATGCGAGATGTACGCCCATTGCATGGGGCAACTTGCTCAA




AGCCGAGCGAGCCCGCTGCAGGTTAGTCTTTCTTTTAGCGT





GTGCCCACACCTTTCTAGTCGTTCTTCGCCACCACCAACAA





GAAAGCCGGCGGCCTCG






sap22
GAAGCCCTCCATAATGGCCCCGTCTCCGCATCTCCCGCACT
40



GTTCGCGGGCAACAGCAGGGAGACGAGAGGAACCCAAGA




AGCGCGCCACTGCAGCGCTTCGCGCAGTGGGCCCATTCCG




GCAATTATGACCCCCGACCGCGCGGGTATGAAGCTGTTTTC




AAGCAACTCGGCGCAGTTCTTGGCACTCGATTTGCGCGAG




AGCGAGTTTCAGAATGGGCCCTCTTTTTGCTTGCTTTTGCG




CGTCGACCGCCTCGCGAAATGGTGGGGCCTGCACCCATTGT




TTCATTCTATGTATCAATGCCATTTATAATCATTAGGAGCA





ATTTTGGTACGGCGTGCGTCACTTGCATGGGGCTGGCCCAT






TGCAATGAGATGGGCGCATGGGGCGCTCAATTGTCTGCGA





CTTGCGAGCCACTTCTCTCTTCCCTCTCTCGCCGTCAACCGA





CCGACTCACTTCGTCGCAACCACCTTTCGTGAGTAGGTAGT





GTGTAAGAAGGT






sap1
CCCCCTGCCTCCTCGCGCATGCGTGAGGCATGAGAGCGTG
41



GCATAAGGCCGTAAAGCAAAGCGACAAGGGGCTTCCAGGT




GTGCACGCATGCAAGCACGCGAAACTTTTTTTCTGCGCTGG




GTTTGTCGCTTTCCTAGTTTGTAATGTGTTCCAACCCTTTTA




GGCGTGGCAGCAGAGGCGCGCGGCGCCATTTGGGAAAGCA




AGTTAGTGCAAAATGCAAACATGCGCAAGGGCGCGGGGTT




CGCGACCATCGCGAGCTCCATAGCGCTGGTGGCTATGCAC




CATTCCATGCATGCATACAATTCATTATGGGCCCATTCAAA




TTTTGGGGGCGTTCTTATCCTTCCCTGGAGGGCCCATTCTC




GTACGGCATTGCATGGGGCCGCCCCATGCGGACTTGCTTAT




CCTGCGAGCGCGCGACAGCTTTCTCTTTTACTTGTCGCAGG




TTGCGCCGAACACTTCTCTTTCAAAACACCAGTGAGCAGGC




CCTCGCCCCCAA






sap2
CGGGTGTTGTGCTCAGAGTGGCTTCCGCATGATAAACGCA
42



GCGCTGAAGCTATTAAAGCAGGGGGAACCCTCGCTCAAGA




GATCGCAAGCACCAGCGCACGCGTTGCGCGCATGTCGCGC




AGCAATTGGCAGAAACCGCTTGAAATTCGCATCAATGCAT




GTCAAGGCGCAATAGCTATGCGCAAGGCCTCCCGGCTATG




CGTAGACAAGGGCCCATTCCTAGAATCAGGGGAATCAAGC




GGGTTCGTGCAAGCGTGGGCCCATTCTCAGGCCAGCATAG




CGAGGATAAAGCTAGCATAAATTGCGCCCCATGCATGGGC




AGAATTTTTGGCGCTTCCAACGCGAAGCAGCAGCGCATGG




GGCGATGCCGTACGGCGAGATCGCCTCTCAAGTCTTTGTCG




CAAGTCGCGAGCCACTGCACCACCTTTCCTCTCTCTCTTTGT




CCACCGCTAGGCAAGGGTGGCCGCAAAAAACAAGTACAGG




GTAAGAACAGGGCTCTT






sap3
AGGCTAGAACAGTTTCTCCTCTCCATGGCAATATCCCGCAC
43



CAGGGCACGAGGGCACTTAAAGCACGGGAGAGGGTGTTGG




GGTCTCCGAAAGCACTAGAACCTGACAGTGAATGGGCCCT




TTCCCCGGCATGGGCAAGCAAGCAAGAAGGCAAGCAGCGG




CAGAAGCAAAGTGCGGAATGGGCCCTTGCGCGTATATATT




TCGGGCAAGAGCGACGGAAAGCGGTCGCTCGCCTGCAGAG




GCGTTGAATTAAATTCTGCGCGCGCGAATGCGATTAAAGC




ATACAGCATGCACTGGCCCATTGCATACAATTCAAATTATC




TGGGCCCCATGCGCGGTCCACGAAAAGGCTGCATTGGGGC




GCCGTACGGCGTCGCGCTCATGCGCCCCATGCAGATGGCC




GCCGGTCTTCCTTTCTTTCTCTCTCTCTTTCTCTTTCAGGTGC




CCCTCCTAGGACACTTCGCCTTAAAGTAACACCAACAAGA




AGCGCGCCCTGGCCC






sap4
CCTGCTTCAGGCCAGGGCGTGAGATAAAGCATGCATTTGG
44



CAGCGATGTCAGGGGCTTTCTGAAAGCCGCTTTTGGCACGG




TGTGACATGCGTGCACGCGTTTCGGGTGAGCAGCAATGTTC




AGCAACCCCCGCAATGCGGGGCCCATTCTGGGCAACCCTT




CCAACAAAGTTGAAGTGAGCAATCGATTTTGGCAGAATGG




GCCCACGCGGGTCGCGGCATGCGCTTGCGCCGGGGAGAAT




TCATGGCCTCGCGCAAGGCAGCGCGCGAAATATTGCGGTG




GTCTCACGCATAGCAACCAGGGGGCACTCGCAAAGGCTGT




ATATTAGTTTATAGGCCCTAGGCCCCATGCGGTTTGTACGG




CCCATTGAGGCCCCATGCCCCATGCAAATTTTGCGCCAGCG




CTCACCTCCCCACTCTTTCTCTTTCTTTCCTCCCGTGGAACA




CCAGTCACCAGTCCTCATTCAGCAAGGAGCAAGCCGCCGG




TGAGCAGGTGAGCC






sap5
CCTGGAAAGGAGGCTAGGGCGCATGTCGTTTTGCAAAAAA
45



ACGCGTGGCAGGAGTGGGACAAGGAACCGCTTCTTCGCTT




CTTCTTTGGCAGTGCAAGGCGCAGCACCAAGTGCAGCGAG




CAGTGAAACAATGGGTTCGCGAATGGGCCCTCTTGGAAGC




AACCTCAAACCATTCTGCCAGGGCTCAACTGAGCACGCGG




CGCTATGCGTGAGCAAACATGCGCTTTTTGTGCTGCAAGAA




TTCCTCGGCAAGCTGATTTTCGTCGCTCCCAGCGTCACCCA




GGGCCTTGGCTTCTATGCATGCATGGGGCAGAGCATGGGT




GTTTAATTTTGGAATGGGCCCCAGCCCCATGCGCCCAATTA




ACGCCCCATTCGCCCGCCGTACGGCGAGTCTTGCGGAGCG




CAAGTCTCTTTCTCCTTGCCTCTCTTTCTCTCTTTCTCGTCGA




CCGTCGCCGACCACCTAGGTCAATTTTGAAGTCAAGACCTG




AAGCGCGCTCTTC






sap6
ATGGGAGCAGCTCCTCCTCTCTCTGTCTGCTTCTGGGCCTA
46



CACGAGTGTCGATGTGCCTTTGGCACGGAGAAGCGAGAGG




AAAGCGCATGCCTCAAAAATCCCGAAGTGCCAAGCATGGG




GCAACCCCCGACGCGAAATTATTGTCAAAGCCAGCAGTGT




CATTCATGCTGGCAGAAGGAAAGTGCTCGCGTTTAAAGGA




GGCAGACAGAGCGCGCGCGGGCGGTCGCATGCGCGCCAAA




ATCTCGCGACCTCGCGAAATGCGAGCGCGGGCCACCTTTA




GAAGTAGCAAAATGCCATTGAATGGGCCCAGAATGGGCCC




GTGATGTCTATGTGCATGAGGGCCCCATGCAAGGCAGAAA




GTCGATCGTACCGAGATCGCCCCATGCGAGCGCCGTACTCC




GCGGAGAAGTCGCGCGGGCGCAAGCTAGTTCTCTTTCTCAC




TTCCCGTAGTCGACCGTGCTTCACGTCAGTCCACCACCACG




CGGCCATCTTTAGCCG






sap7
GCTTCGTCACGCAGGCAGCTGGGCAGGCAGGAAAAGCATA
47



AGGGCACTTCATCATCGTGGGAGAGAAGGCCTGGAAGGAG




AAGGGACACAAAAGCGCTTCGACCTTGCGCCCTTGAGGCA




CCGTCGACCCTTTGGAGCTACCTTTTGGAGCAGTGTTCTGG




GGCCCATTCCCAAAAGGGTGCTGCGCAAGGCGAGCGACTT




TTAGGCAGAGCAAAAGCATGCTTGCCAGTCTGGGCGCCAA




GCCTTCCGCGCACGGTGCTCGAATGGGCCCTGGCCTTTCAT




GCCTTGCTCTGATTTTCATTAGCATCGTGGCCCCATGCGAA




AGCCGAAAGCGCGAGCTCCTGCGCATGGGGCGATCTTCCT




GGCGCCACGGCAGAGATCGCCGTACGAGTGCAGAGTCTTC




CGCGCGCGAGCGCGACTTTCTCTTTCTCTTTCCCATCTTAGG




AAACACTTCGCCACTGCTTTCGTTAAGAGCCGCCGGAAGG




CCCTCCGCGCCCTGG






sap9
CTGGTCCCAGTTGTGCATTCTCATGTGAGGAACCCTGGGCC
48



AACTGAGGGGCAGAGGGCAGACGAGAAACGGTCCGCACG




GTCGCAAGCGCACAAAGCACGCGTTCGACTGCGCTCTAAT




GGGGCTGAGCGTGTCTGACCTTTTAGCTCAGCAAATCAGGC




AGAAGCAGAAAGCTAACCTACAAGTGGGCCTCATAGAATG




GGCCCCACGGCGCGCGCGATGACACGCAGTCGCTTGCGTC




GCGGCAAGCGGAAGCTGCGAGCCACGAGCGAATGGGCCCT




TTCATGCCATGCTAGATGCTAAATTTCCACAAAGAGACAA




AATTAATGCGAGGGCCCCATGCAGGCGGTACGGCAGATCG




CTTGCCCCATGCGATCGCCCCCATCGCGAGACCCTTGCGAG




CGAGCGCCTGCACCGTTGCCCTCTTTCTCTCTCTTGTCCTGT




CGCCTTTCTAGGAAAGGGCGCCACCTTTGCAGAAAGAACA




AGAGGGCCTCGCAGGT






sap10
ATGCCTCCTCGCTTAGCGCTAGAAAGCCGTCTGTCCTTAAA
49



AAAGCCAGCGCAGAGCGACTGCACTTCTTGGCTCAAGAGA




TCGCACGCGCGCCGACCCGCCAGGTCTGGGTCGCGAGAGC




GTCTCTCGCCGGGCGCTGTCGACCGCTTTAGCACTGTGTCA




TTTCAAGTCATGAGCTGCTACAAGTCGCAGCCGAGGAGCA




GAATGGGCCCTGGGCGGCATGCGCATTTCCCGCTCGCCAG




GGTTCACTCAGCAAGCCCTCAGCGCTGCAGGCTCACACATT




CTTTGCTGATTATGCATGCAAGCATGCCCCATGCATGGTAC




TGCGCCCGTGCGAGAGAATGGGCCCCTCTCGCCGTACCATT




CTCGCCGCAATTGCATGGGGCGACTTTTGAAGGCCGACTTT




GCGAGCGCGCGCCGAGCCTCTTTCTCTTTGTCGTCGCCTTG




TTCGACACTTCAGTCACCTCGCCTCCACCAAGGGTGGCCCT




CGCAAGAAGGAG






sap12
CTGCGTGCATTTTAGGAGGAAGAAAGCCTCCGCAGAGCCG
50



CACTGACTTCGCGAGCCCTTGCGTAGAAATCTCTGAAACCC




CATCGCACCAAGTGACCTTTCTCAGCGCTCGCGTTGGCACG




CGTCGCTTTCTGCCGCACACGCAATTGCTCAGCAACAAAGA




GGCAAGCTATTAGTATCAAGGCTATGCGCGAGCGGAGACC




TCGCGCGCGCGCTGGCGGCTCACGGCGCCTGGGCAACTTG




GGGTTCGCTTCGGGCCCATTCATAGCGCTGAGTGGCCATTC




AAGGGCCCATTCAAGGTCGCAGGGGATTAGCATACCAAAA




TGTAATGCAGAATGCCTTCTCTGCGCGCATGGGGCGCAATG




GCCCAATTCTCGCCGTACTCGCTCGCGCATGGGGCGGAGTC




TCGCCAAAGCGCGTTCTTTCTCTTTGTGCCGCTAGTCGTCG




CAGGTGAGCGTTAGATCACCTTGCTCCTTTTTTCCGCCCCG




CGCTGTGAGTAC






sap13
TGCCTCCAGAAGATAAAGCATCTCATGTAGGTCAGGAAGA
51



ACTCCAGGAAAAGCAACAGCAAGCAAGGGGACACGCTGCT




ACACAGAGCTTCGAAAATCGAAACTTCGGCCCTGACATAA




CCGCAAGTGTGTGCAGCGAGGGCCCATTCTGTTCTAAGAA




AGCCCACCAACCTCAAGTGCTGGTCGACGCAGCATCCGCG




AGCGCGCGCGCCAAAAAGTTGTGCAGTTTGGGTGCGCGTC




GTGCGACGGTCGCTCTTCCCTCAGCGCGAAATCCATTCCCC




ATCATTTGGGTCTCTGCACCCATGCATGTTTGTGCGAGCGT




CGCGCGGGCCCCATGCGGTACGGCTTTTCTGAATGGGCCCC




CCCGCTTGCATGGGCGCGGTCGACCGCATGGGGCGAGAGC




GCAACAAAACAGCGCGTTCTCTCTCTCCCTCTTTCCAAACC




GGTTGGCCGAACAACCACTTATCATCTTCGTTGCCCCAGCA




GGCCCTGTCCAAGAA






sap14
AATGGCCCGCCCTGGACATGGCGCAGCCTGAGGGCCCTGT
52



TGCAAAACGGCTTAAAAACACTTAAATCGCTGGCAGGGAC




ACTTCGTGCGGGTCTGCCGAGCGCAAGGCGCGTTTCGGGC




CCCGGCACCGTCGCTGTTTCGGACCCCCGTTCGTGCCAGCG




CGCTCAACTAATGCGAGAATGGGCCCAGAAAACAGAGCAA




AATGCAAGAGCAGCAAAACTGCGCATGCGCCACTGTTGTC




TCACTCGCTCGCGCAAGCTCCACGGCCCTGGGGCCCATTCC




AGCGCGTAAATAAGCCACCATTTTGCGGTCTGGCAGCAGC




ACCAAAATTTTTAATGCATGGGGCTCCGCGAAATGGCGCC




GTACGGCACCGAGATCTGCCCATGCATGCATGGGGCGGAG




TCAAAGCGCGCGCCGAGCTCTTTCTTCTTGTCAGCACCGCA




GGTTGCTCACGTAGGACACTTCTTTGCGCGTCGCCCCTGCC




TTCGGGCACGGGTAAG






sap15
CACGAGTTTGCTGGACATCCTGGCTTTCTCAGTGGCAGCGC
53



CGTAGGTCGGGCAGAGGGAGAAACCCTTCGCTTCTCAGGA




GAAGCATACGTTCGTTCGGTGGGGGGCGAAGAACCACAGC




AGAATGGGCCCGCTTTCGCGGCATCAATGCATGCTCATCAC




CAAGCAGAGGCTCAGAGCCTCCTCAAATCAGGGGAAAACT




GACGCGCGCGTGAGCGCGCTTCCGACGCGATGGCGCTCGC




TTGGGTTGCGTGAGCAGGCTGCGAGAGCGCTGGCTGTTAC




ATTCATTGAATGGGCCCATGCATGGGGCAAATAGTGCGGC




GCTTCCATGCAAGCAAGCGAGCGCGACGCGCATGGGGCGC




CTGTACGGCCGCCCCCATTCCCCATGCCGTACAGAGTCTGG




GTCTTCCTTCCTGCACAGCACTTCTTTCCTCGAGTTGTTCGT




CGTCGCATCGCCACTTCTGGCCAGCAACACACCGGAAGCG




CAGGCCCTGGCCCTC






sap16
GTTGCCCTGCTTCCGTCCATGATGGCGCATGCCTGAAGCAG
54



GGCAGGCCGCACATGACTTCAAGCGTCCTGGGGTTCGCAA




TCAAGAGCTTTCGCGTGTCTGCGGGTCGCGCTCGCACAGCG




GCCCCGCGCGTGCCGAGCTCGACACTCGTTCGCGTTAGGCA




ACTCAAAACCAAGCTACAACAAGCAGTATACCTTGCGCAG




CAAGGAGCATGCTTTTCTCCGGTCGCGCCCAACGACGATTT




CCTCGCTGGTGCAAGCTCCCGAGCTCCCAGCGCGCGCGAA




TAGCAAATAGCAAATGGAATGGGCCCTTGTTTATAACGCG




CGCGCATGGGGCGAACGTACGGCGAAATTTGCATCGGTTT




GCCCCATGCATGCAGAATGGGCCCATTTTTGCCCTCGCGCT




GCGCAAGCGCGAGCTCTTTCTTTCTCTTTCGGGTCTTTCTCC




GTTTGTTGACACCTCAAGTAAAAGGCTTTTCTCACACCAGT




CCGCGGTGAGCC






sap17
CACCTGCTGCTGGGGCAGAATGGCCATGTGGCCAGCGCAC
55



TGTTGTTGTGACACTGAGCTCGAGAAGGACAAGGTGTGCA




AGTGACATGTGCACGCGAAGGGGAATGGGCCCCAAGGGCC




CATTCGTGCAGCGGGTGCTGCCGCATTGAAGCAACCAACA




AAGCTAATGCGCTAATGCGCTGACGCGTTCCGTGGAAGGC




GAGACGCAAGCGCGAGCGCGGAAAGCAGGCGATTCACTCG




CGCCAAGCCTCGCGGGAGCGCTACTAGCCCATACGGCCCA




ATAGCAAGCATACAGCAAGCCTCTGCGCATGGGGCCAATG




CATGGGGCCGTTCTGGTACGGCTATGCCTTTCTCCCATTTG




CAATGGCAATGGGGCCCCCATGCAGATCGCGACGAGGGTC




TCTTCCGCTCAGTCAGCGTTCTCTTTCTCTTTTCGAGCTCCC




GTCGTCGCTTGCACAAGAAGGCCGCACAGCAGTCTTGCGC




TCGCCCAATTAGCCCTG






sap18
GGATGCTGGACAAGAGAAGAACATGCCAGCCATGACACCT
56



GCCTGAACTCCAGCTCGAGAGACACTATTTCGACCCAAGG




TGTTGAGTGCAGATCGCAGCTTTCGCAAACGCAGCTCTCGG




GTTTGTGAAATGACCCCGTGTCTGAAGCAGTCAGCGGGGG




CATGTCTTGGTTATTGGAAGGGCGCGGTGGAAGTGGGTCC




AGCAAAACGGGTCTCGCAGCGCGAGCAGCGCCAAGAACG




AGTGCAAGCGAATGGGCCCTCAAAGGCCATCGCCCCCAGC




GCTGACCCCATTGAACATGCATGTTTGCGCATGGGGCAAC




ATAGTGCAGCCCGCGAGCGAAAAAGGGCCCATTCTTGCAT




GGGGCGCCAATGGCCGTACGAGCGAGTCGGGGTCTCTCAA




GTGCTTGCGAGCGCGCGCTCTTTCTCTTTCCTCTCCTTTCTT




TGAGCAGCTTCACTGATCACGTACTTCTTCGCAACAAGCAG




GGTAAGAAGCGGTGCGT






sap19
GGATGACTCCGTGCATGCAAATGCCGCACGTCTGCGAGGG
57



CTTTCGCGACGAGAAGGAAATCAAGAAGGGAGAAACCCA




ACCTCCGAGAAGCATGTTCGCGCGTTTGAGCAGCGAGGGA




CTCTCTCGCGGAGCCTTCCCGAAGAAAGTCTTGGGGCCCAT




TCTCGCGTTTTCACCAATGGCCTCGAGGCTCAGTAGGATTT




TCGCGCGCGCGCGCGTGAGCATGCGCGCGCGAGTCTGGGT




TGAATGGGCCCTCCTGCGAGCTTCCCCAGGCAGCGGGGCC




CATTCAGCAAGCATACAATGCTTGTGATTGCTTAGCCCGTG




CGCCCCATGCGCAGAGAGAGCCCCATGCATGGGCTGTACG




GCAGATCTCGCGCCCCCCGTACGGCGCGACGAGTCTGCTG




CGAGAGCGCGCGCGCTCCTTCTCTCTCTTTCACGTGTAGGC




GCAGGTCGCCTTACCACCTAGGAAGGTGCGTCCCTCACCCT




CTGTGAGCCCAAGGGC






sap20
CTGCCCCAGTTTGCTTAAATGCGTGCATGATGCATTCTCGT
58



AGGTCGTTCATGGCAGCTCGAGATAGTTCCGAAACGACCG




CAAGCACCCCGCCACCCGAGCACGCTCTTTTTTCGACCGCA




AAGAACCGCGCCCCGCTGTTCCAATGCATGTCAAGCAATG




TCAACTCGCCGCTATTAAGGGCCCATTCTTTCTGCGCGCGC




GACATGCTTTGAGAGCAAAATGCAACTGCTTTTGTTTTGCA




AGCTCAAAGGCCTTCTTCGGGTGGGTTCAGTTCTATATCAC




CATTCATTCATTGCGCGCAGGCAGATAAATAGAATGGGCC




CGCGGCGCCCCATGCATGAGGCCGTACTTGGCAGATGCAT




GGGGCGCCCCCTGGAGCTCGCTCGCTCGGGGTGAAGAGCG




CCTTCTTGTCTTTCCTTTCTCTCCTTTCCTTACCTTCGTCGAG




CCTGCCAAGATCGGTGGCGTCAGTGCGTCGCCTTAAGCAG




GCCCTGTGAGTA






sap21
ATGACTTGGTGGACTGCCCTGCACGCCTTCCGCATGTCCTG
59



GCCCCAGCGCACTTCTTGGCAGTAAAGCGGCAAGCGGGGA




CACACTTCGCGTGCGCGCTGCCAAGTGCCCGGGAGTGCCCT




CGACCCGCGACTCCTATCAATAAAGCCCGCTCGCCTTCCTT




CCTTGGTGTTGGTGCTCGCGTCAATCCTGCAAGCAGAAGCC




CAGCTCGCAAAATGCAGCGCGAGCAAGTTGCGCCACTCAT




TCACTTGCGCGCCTCGAATGGGCCCAGCGCCAGGGCCCATT




CAAGTGGTTAAGCTATGTATGCAATGCGGCGCTCCAAATTA




TTTTGTTTCTGGCCGTACAGGGTCGGTACGACCCAAGATCT




CGCCCCATGCGGGCCCATCGCATGGGGCGCCCCTTGCAAG




CCGAGCAAGCGCGAGTTCTCGCCCTTTCTTTCTCTTCGACC




TAGGCACACCGTGGGCGCCGCACACCACAGCAGCAGTGTG




TCCTCCCGGCAA






sap23
CCCCGGCAGGGCGACGTCCACTGCACAGCCAGCCATGTTC
60



GCCTGCCCATATTTGGTCCGGCGAGGGTTCGCTGCTACACA




GGGGGGAGTGCAAGCGCTACCTTGCGTCGACAGCGGCATG




AAGGGCCCACGCAGAATGGGCCCGCAATGCATTGCAATGT




TCAAGCTCATGATTAACGCGCTGCAACGCGCCAGAGCGAG




AGAGCGCGCGAGCGCTCTGGGGTCCTTGTCGCTCGCTTTTG




TTTTCGCGGGCAAGCTCGCTGTGGGCCCTCCAGCGCATTTT




TTTTCTATCATAGTGACATGACCTTTGAATGGGCCCTGTGG




GCGCGGCCCAGAAAATTTTTTTTTCTCTTTCTCCGCCCCATG




CGGCGATGGCGCCATCGCCGTACTGCATGGGGCTCTTTTGA




GAAGTGCGAGCAACACTCTTTCCTCTCTTTCTCTCAAACAC




CAGTCGATCCAACCACACCATTTTCCTATCTGTGCGCTCTT




CCGCGGCGGCC






sap24
TGCTCCAGGATCTGGGCTTTGGGCATGTGTCTGTCCTTAAC
61



CAGGCACTGAAGCCTGCAACACTTCCCCTTTGGCTTCCGAG




AAAGCATGCGTGCGTTGCGTGTGGGGCCCATTCGGGAGTG




AAATTATGTCTGCTAGGCATTGTGAAGCTATGCAGTGTTGG




TGCCAGAGCCTCGCGGCGCGGCCGCGTAAAGCAAGAGCCA




TTTTGCGCAAAGTCGCGGAATGCCGGGAATGGGCCCAACG




CTTCCTCTCGCGAGTTGCGCCCGAGCGTAGCGCCTTTCAGT




TTCATTCCAGCTGGGTATGCGCCCCATGCAATTTTGCGCAT




GGGGCGCTTCCGCAGTTTGCGCGAAATCGTACGGCGTACG




GCTTGCATTCCCCATGCGCTCGCGCTCTTCTCTTGCTGCGCG




CGGACTTCACCTTTCTCTCTTTGAACGGTCTAGCCCGCAGG




CCGAACACCAGATCTTCACGTCCCGCCAAGCCGCAACTTGC




AGGTGCCGCGG






sap25
GGTAGTGGCCCTCTCCTCTTGCACCTATTTGCCCCGCACAG
62



CAGCGCAGGAGGGCAGCGCTGCCTTCACTTCCCCTCCTTCG




AGAGATCGCAAGCTGGCTCATCACACGCTCGGAAAAGAAC




CGGCACGCGCGAGCAATTGAATCGCAGTAGCTCCAGCGCT




CGCGCCCCGGCTGGTGCGGGCCCATTCTACAGCAAGGCGA




AGTATGCGGGCCTTCAGCGCGATGGCGCGCGTCGCGAACG




AGTCATAAGATGGGTTTTGCCAGCGCCAGCGTAGCACCAG




CCATTCATGCTCGGGCCCATTCCACAGTGTTTGCGAGGCCA




AAAATTTTGCAAGGCAAGCAAGCAAGTCGCGCCGTACGAT




GGCCCCATGCAGCAAATGGCGCATGGGGCCGGAGTCTGCA




GAGCGAGCGCACTTCTTTCTTCTCTCTCTCTCTTTAGGTGCC




CACACTTCGCTTCGCAAGATCAGCAACCTCGCAAGGTTGA




GCTTCGGGGAAGCTT









In varying embodiments, the promoter is at least about 200 bp in length and up to about 500 bp, 600 bp, 700 bp, 750 bp, 800 bp, 900 bp or 1000 bp in length. In varying embodiments, the synthetic promoter promotes transcription levels that are at least about 2-fold greater, e.g., 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, or more, greater than a control promoter (e.g., a random polynucleotide sequence or a native promoter). In varying embodiments, the control promoter is the ar1 promoter. In varying embodiments, the control promoter is selected from psbA, atpA, psbD, TufA and atpB.


The synthetic promoters find use, e.g., for the expression of a polynucleotide of interest in an algal cell, e.g., a green algal cell, including a Chlamydomonas, Dunaliella, Haematococcus, Chlorella, or Scenedesmaceae cell.


3. Expression Cassettes, Vectors, Algal Cells, Kits

a. Expression Cassettes


Further provided are expression cassettes comprising the synthetic promoters as described above and herein, operably linked to a polynucleotide of interest to be transcribed. In some embodiments, the polynucleotide encodes a protein of interest, e.g., for expression in an algal cell. In varying embodiments, coding polynucleotide sequences can be improved for expression in photosynthetic organisms (e.g., algae) by changing codons that are not common in the algal host cell (e.g., used less than ~20% of the time). A codon usage database of use is found at kazusa.or.jp/codon/. For improved expression of coding polynucleotide sequences in C. reinhardtii host cells, codons that are rare or uncommon in the nucleus or chloroplast of C. reinhardtii are reduced or eliminated from the native nucleic acid sequences. A representative codon table summarizing codon usage in the C. reinhardtii chloroplast is found on the internet at kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=3055.chloroplast.
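
The codon substitution described above can be sketched as follows. This is a minimal illustration that assumes a per-amino-acid usage table is available. The fractions in `CODON_USAGE` are hypothetical placeholders (not values from the Kazusa database), the table covers only three amino acids, and `replace_rare_codons` is a name introduced here for illustration.

```python
# Hypothetical usage fractions among synonymous codons, for illustration only.
# Real values would be taken from a codon usage table for the host organism.
CODON_USAGE = {
    "R": {"CGT": 0.60, "CGC": 0.10, "AGA": 0.30},
    "L": {"TTA": 0.70, "CTG": 0.05, "CTT": 0.25},
    "M": {"ATG": 1.00},
}

# Reverse lookup: codon -> one-letter amino acid code.
CODON_TO_AA = {
    codon: aa for aa, codons in CODON_USAGE.items() for codon in codons
}


def replace_rare_codons(cds, threshold=0.20):
    """Swap codons used less often than `threshold` for the most-used synonym.

    Codons that are absent from the toy table are left unchanged.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        aa = CODON_TO_AA.get(codon)
        if aa is not None and CODON_USAGE[aa][codon] < threshold:
            usage = CODON_USAGE[aa]
            codon = max(usage, key=usage.get)  # most frequent synonymous codon
        out.append(codon)
    return "".join(out)


if __name__ == "__main__":
    print(replace_rare_codons("ATGCGCCTGTTA"))
```

With the toy table, `CGC` and `CTG` fall below the 20% cutoff and are replaced, while `ATG` and `TTA` are kept. The encoded amino acid sequence does not change because only synonymous codons are exchanged.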


As appropriate, the expression cassettes can further comprise terminating sequences, enhancers and other regulatory and/or linking sequences. In varying embodiments, the expression cassette comprises a transcriptional and translational initiation region, which may be inducible or constitutive, where the coding region is operably linked under the transcriptional control of the transcriptional initiation region, and a transcriptional and translational termination region. Certain control regions (including subsequences within the synthetic promoter) may be native to the gene, or may be derived from an exogenous source.


b. Vectors


Further provided are vectors comprising the synthetic promoters and/or expression cassettes as described above and herein. The vector can be any appropriate form known in the art for introduction of a recombinant expression cassette comprising the synthetic promoters in an algal cell. In varying embodiments, the vectors can integrate into the genome of an algal cell (nuclear or plastid, e.g., chloroplast), or can support episomal expression (e.g., in either the algal cell nucleus or plastid, e.g., chloroplast). In varying embodiments, the vector is a DNA plasmid. In varying embodiments, the vector is a virus. In varying embodiments, the vector is a polynucleotide suitable for homologous recombination, e.g., into the genome of an algal cell.


Numerous suitable expression vectors are known to those of skill in the art. The following vectors are provided by way of example; for bacterial host cells: pQE vectors (Qiagen), pBluescript plasmids, pNH vectors, lambda-ZAP vectors (Stratagene), pTrc99a, pKK223-3, pDR540, and pRIT2T (Pharmacia); for eukaryotic host cells: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, pET21a-d(+) vectors (Novagen), and pSVLSV40 (Pharmacia). However, any other plasmid or other vector may be used so long as it is compatible with the host cell. For example, illustrative vectors, including without limitation a psbA-kanamycin vector, can be used for the expression of one or more proteins, e.g., in the plastids of a photosynthetic organism. The synthetic promoters described herein can replace the promoters in the commercially available plasmids.


Knowledge of the chloroplast genome of the host organism, for example, C. reinhardtii, is useful in the construction of vectors for use in the disclosed embodiments. Chloroplast vectors and methods for selecting regions of a chloroplast genome for use as a vector are well known (see, for example, Bock, J. Mol. Biol. 312:425-438, 2001; Staub and Maliga, Plant Cell 4:39-45, 1992; and Kavanagh et al., Genetics 152:1111-1122, 1999, each of which is incorporated herein by reference). The entire chloroplast genome of C. reinhardtii is available to the public on the world wide web, at the URL “biology.duke.edu/chlamy_genome/chloro.html” (see “view complete genome as text file” link and “maps of the chloroplast genome” link; J. Maul, J. W. Lilly, and D. B. Stern, unpublished results; revised Jan. 28, 2002; to be published as GenBank Acc. No. AF396929; and Maul, J. E., et al. (2002) The Plant Cell, Vol. 14 (2659-2679)). Generally, the nucleotide sequence of the chloroplast genomic DNA that is selected for use is not a portion of a gene, including a regulatory sequence or coding sequence. For example, the selected sequence is not a gene that, if disrupted by the homologous recombination event, would produce a deleterious effect with respect to the chloroplast, for example, on the replication of the chloroplast genome, or with respect to a plant cell containing the chloroplast. In this respect, the website containing the C. reinhardtii chloroplast genome sequence also provides maps showing coding and non-coding regions of the chloroplast genome, thus facilitating selection of a sequence useful for constructing a vector (also described in Maul, J. E., et al. (2002) The Plant Cell, Vol. 14 (2659-2679)).
For example, the chloroplast vector, p322, is a clone extending from the Eco (Eco RI) site at about position 143.1 kb to the Xho (Xho I) site at about position 148.5 kb (see, world wide web, at the URL “biology.duke.edu/chlamy_genome/chloro.html”, and clicking on “maps of the chloroplast genome” link, and “140-150 kb” link; also accessible directly on world wide web at URL “biology.duke.edu/chlamy/chloro/chloro140.html”).


Expression vectors generally have convenient restriction sites located near the promoter sequence to provide for the insertion of nucleic acid sequences encoding exogenous proteins. A selectable marker operative in the expression host may be present in the vector.


The expression cassettes comprising the synthetic promoters disclosed herein may be inserted into a vector by a variety of methods. In the most common method the sequences are inserted into an appropriate restriction endonuclease site(s) using procedures commonly known to those skilled in the art and detailed in, for example, Green and Sambrook, Molecular Cloning, A Laboratory Manual, 4th Ed., Cold Spring Harbor Press, (2012) and Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons (through 2016). Polymerase and recombinase methods such as restriction free cloning (Bond, et al., Nucleic Acids Res. (2012) Jul;40(Web Server issue):W209-13; PMID: 22570410) and Seamless Ligation Cloning Extract (SLiCE) (Zhang, et al, Nucleic Acids Res. (2012) Apr;40(8):e55; PMID: 22241772) may also be employed.


c. Algal Cells


Further provided is a cell or population of cells comprising the synthetic promoters and/or expression cassettes and/or vectors as described above and herein. The algal cells may comprise the synthetic promoter integrated into their genome (plastid or nuclear), or within an episomal vector. In varying embodiments, the cell or population of cells are algal cells. In some embodiments, the cell or population of cells are green algal cells. In varying embodiments, the green algae is selected from the group consisting of Chlamydomonas, Dunaliella, Haematococcus, Chlorella, and Scenedesmaceae. In some embodiments, the Chlamydomonas is a Chlamydomonas reinhardtii. In varying embodiments, the green algae can be a Chlorophycean, a Chlamydomonas, C. reinhardtii, C. reinhardtii 137c, or a psbA deficient C. reinhardtii strain.


Transformation of host cells to contain the synthetic promoters and/or expression cassettes and/or vectors as described above and herein includes transformation with circular vectors, linearized vectors, linearized portions of a vector, or any combination of the above. Thus, a host cell comprising a vector may contain the entire vector in the cell (in either circular or linear form), or may contain a linearized portion of a vector of the present disclosure.


d. Kits


Further provided is a kit comprising the synthetic promoters and/or expression cassettes and/or vectors and/or cells or population of cells and/or synthetic nuclear transcription systems as described above and herein. In varying embodiments, the expression cassettes and/or vectors can comprise multiple cloning sites to allow for the convenient insertion of a coding polynucleotide that is operably linked to the synthetic promoter. In varying embodiments, the kits comprising a synthetic nuclear transcription system additionally comprise one or more transcription factors, or cell comprising one or more transcription factors, e.g., as encoded by one or more of SEQ ID NOs: 87-178, e.g., SEQ ID NO: 150 (TF64). In varying embodiments, the kits can comprise an algal cell or population of algal cells as described herein. As appropriate, the algal cells can be fresh or frozen. The algal cells may comprise the synthetic promoter integrated into their genome (nuclear or plastid, e.g., chloroplast), or within an episomal vector.


4. Methods of Designing Synthetic Promoters

Further provided is a method of designing, constructing and/or assembling a synthetic promoter, e.g., as described herein. In varying embodiments, the methods comprise assembling or arranging at least about 3 (cis)-elements, e.g., from 3 to 30, e.g., from 3 to 27, e.g., from 3 to 25, e.g., from 3 to 20, e.g., from 3 to 15, e.g., from 3 to 10, e.g., from 3 to 5, promoter (cis)-elements selected from the group consisting of the sequences in Tables 1 and 2 within a promoter scaffold or backbone. As appropriate, either the placement of the (cis)-elements or the promoter scaffold or backbone can be designed, constructed or assembled first. In varying embodiments, the promoter (cis)-elements are positioned or located within the promoter relative to the transcriptional start site (TSS) as indicated in Table 1. In varying embodiments, the promoter is at least about 200 bp in length and up to about 500 bp, 600 bp, 700 bp, 750 bp, 800 bp, 900 bp or 1000 bp in length. In varying embodiments, the synthetic promoter promotes transcription levels that are at least 2-fold greater, e.g., 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, or more, greater than a control promoter (e.g., a random polynucleotide sequence or a native promoter). In varying embodiments, the nucleic acid base of highest probability or second highest probability at a particular position of the promoter scaffold or backbone relative to the transcriptional start site (TSS) is assigned to that position, e.g., as indicated in Table 3. In varying embodiments, the method is computer implemented.
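
By way of illustration only, the base-assignment step can be sketched in a few lines of Python. The function name `assign_bases`, its `rank` parameter, and the three-position probability table are hypothetical placeholders; in practice the per-position probabilities would be taken from Table 3.

```python
def assign_bases(position_probs, rank=0):
    """Return one base per position; rank=0 is the most probable, rank=1 the second."""
    bases = []
    for probs in position_probs:
        # Sort bases by descending probability; ties are broken alphabetically.
        ordered = sorted(probs, key=lambda b: (-probs[b], b))
        bases.append(ordered[rank])
    return "".join(bases)

# Hypothetical probabilities for three consecutive positions (not from Table 3).
example = [
    {"A": 0.1, "C": 0.4, "G": 0.3, "T": 0.2},
    {"A": 0.5, "C": 0.2, "G": 0.2, "T": 0.1},
    {"A": 0.2, "C": 0.1, "G": 0.1, "T": 0.6},
]
print(assign_bases(example))          # CAT
print(assign_bases(example, rank=1))  # GCA
```

The alphabetical tie-break is an arbitrary choice made for this sketch so that the output is deterministic.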


5. Methods of Making Synthetic Promoters

The synthetic promoters can be made using any method known in the art, including recombinant techniques and chemical synthesis. Chemically synthesized promoters can be comprised entirely of native or naturally occurring DNA bases, or can contain one or more modified bases or derivatives. Modified bases are well known in the art, and include, e.g., 2-Aminopurine, 2,6-Diaminopurine (2-Amino-dA), 5-Bromo-deoxyuridine, deoxyUridine, inverted dT, Inverted Dideoxy-T, Dideoxycytidine (ddC), 5-Methyl deoxycytidine, 2′-deoxyInosine (dI), 5-hydroxybutynl-2′-deoxyuridine, 8-aza-7-deazaguanosine, locked nucleic acids (LNAs), 5-Nitroindole, 2′-O-Methyl RNA, Hydroxymethyl dC, Unlocked Nucleic Acids (UNAs) (UNA-A, UNA-U, UNA-C, UNA-G), Iso-dG, Iso-dC, and 2′ Fluoro bases (Fluoro A, Fluoro C, Fluoro G, Fluoro U).


6. Methods of Promoting Transcription

Further provided is a method of transcribing or expressing a polynucleotide, e.g., in vitro or in an algal cell. In varying embodiments, the methods comprise contacting a polymerase to a polynucleotide comprising the synthetic promoter operably linked to a coding polynucleotide under conditions that allow the polymerase to transcribe the coding polynucleotide under the control of the synthetic promoter. In varying embodiments, the methods comprise introducing into the algal cell the polynucleotide operably linked to, e.g., and under the promoter control of, a synthetic promoter as described herein. In a further aspect, provided is a method of increasing the transcription of a polynucleotide in an algal cell. In varying embodiments, the methods comprise introducing into the algal cell the polynucleotide operably linked to, e.g., and under the promoter control of, a synthetic promoter as described herein. In some embodiments, the transcription levels of the polynucleotide are increased at least about 2-fold, e.g., 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, or more, relative to a control promoter (e.g., a random polynucleotide sequence or a native promoter). In varying embodiments, the (coding) polynucleotide operably linked to the synthetic promoter is codon-biased or codon-optimized for expression in an algal cell. A representative codon table summarizing codon usage in the C. reinhardtii chloroplast is found on the internet at “kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=3055.chloroplast.” In various embodiments, preferred or more common codons for amino acid residues in C. reinhardtii are shown in Table 5.









TABLE 5
Codons for amino acid residues in C. reinhardtii.

Amino Acid Residue    Preferred codons for improved expression in algae
Ala                   GCT, GCA
Arg                   CGT
Asn                   AAT
Asp                   GAT
Cys                   TGT
Gln                   CAA
Glu                   GAA
Gly                   GGT
Ile                   ATT
His                   CAT
Leu                   TTA
Lys                   AAA
Met                   ATG
Phe                   TTT
Pro                   CCA
Ser                   TCA
Thr                   ACA, ACT
Trp                   TGG
Tyr                   TAT
Val                   GTT, GTA
STOP                  TAA
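
By way of illustration only, the preferred codons of Table 5 can be applied to a protein sequence as in the following Python sketch. The one-letter amino acid keys, the function name `back_translate`, and the choice of the first listed codon where Table 5 gives two are assumptions made for this example; the short peptide is arbitrary.

```python
# Preferred codons from Table 5, keyed by one-letter amino acid code.
# Where Table 5 lists two codons, the first one is used (an assumption).
PREFERRED_CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "I": "ATT", "H": "CAT",
    "L": "TTA", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCA",
    "S": "TCA", "T": "ACA", "W": "TGG", "Y": "TAT", "V": "GTT",
    "*": "TAA",  # STOP
}

def back_translate(protein):
    """Map a one-letter protein sequence to DNA using the preferred codons."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())

print(back_translate("MKV*"))  # ATGAAAGTTTAA
```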









In varying embodiments, the algal cell is a green algal cell, as described herein. In varying embodiments, the algal cell is a Chlamydomonas cell. In varying embodiments, the algal cell is a Chlamydomonas reinhardtii cell.


To generate a genetically modified host cell, a polynucleotide, or a polynucleotide cloned into a vector, is introduced stably or transiently into a host cell, using established techniques, including, but not limited to, electroporation, biolistic transformation, calcium phosphate precipitation, DEAE-dextran mediated transfection, and liposome-mediated transfection. For transformation, a polynucleotide of the present disclosure will generally further include a selectable marker, e.g., any of several well-known selectable markers such as restoration of photosynthesis, kanamycin resistance or spectinomycin resistance.


A polynucleotide or recombinant nucleic acid molecule described herein can be introduced into a cell (e.g., an algal cell) using any method known in the art. Such methods are well known in the art and are selected, in part, based on the particular host cell. For example, the polynucleotide can be introduced into a cell using a direct gene transfer method such as electroporation or microprojectile mediated (biolistic) transformation using a particle gun, or the “glass bead method,” or by pollen-mediated transformation, liposome-mediated transformation, transformation using wounded or enzyme-degraded immature embryos, or wounded or enzyme-degraded embryogenic callus (for example, as described in Potrykus, Ann. Rev. Plant. Physiol. Plant Mol. Biol. 42:205-225, 1991).


As discussed above, microprojectile mediated transformation can be used to introduce a polynucleotide into a cell (for example, as described in Klein et al., Nature 327:70-73, 1987). This method utilizes microprojectiles such as gold or tungsten, which are coated with the desired polynucleotide by precipitation with calcium chloride, spermidine or polyethylene glycol. The microprojectile particles are accelerated at high speed into a cell using a device such as the BIOLISTIC PD-1000 particle gun (BioRad; Hercules, Calif.). Methods for transformation using biolistic methods are well known in the art (for example, as described in Christou, Trends in Plant Science 1:423-431, 1996). Microprojectile mediated transformation has been used, for example, to generate a variety of transgenic plant species, including cotton, tobacco, corn, hybrid poplar and papaya.


Important cereal crops such as wheat, oat, barley, sorghum and rice also have been transformed using microprojectile mediated delivery (for example, as described in Duan et al., Nature Biotech. 14:494-498, 1996; and Shimamoto, Curr. Opin. Biotech. 5:158-162, 1994). The transformation of most dicotyledonous plants is possible with the methods described above. Monocotyledonous plants also can be transformed using, for example, biolistic methods as described above, protoplast transformation, electroporation of partially permeabilized cells, introduction of DNA using glass fibers, and the glass bead agitation method.


The basic techniques used for transformation and expression in photosynthetic microorganisms are similar to those commonly used for E. coli, Saccharomyces cerevisiae and other species. Transformation methods customized for photosynthetic microorganisms, e.g., the chloroplast of a strain of algae, are known in the art. These methods have been described in a number of texts for standard molecular biological manipulation (see Packer & Glaser, 1988, “Cyanobacteria”, Meth. Enzymol., Vol. 167; Weissbach & Weissbach, 1988, “Methods for plant molecular biology,” Academic Press, New York; Green and Sambrook, Molecular Cloning, A Laboratory Manual, 4th Ed., Cold Spring Harbor Press, (2012); and Clark M S, 1997, Plant Molecular Biology, Springer, N.Y.). These methods include, for example, biolistic devices (see, for example, Sanford, Trends In Biotech. (1988) 6: 299-302; U.S. Pat. No. 4,945,050); electroporation (Fromm et al., Proc. Nat'l. Acad. Sci. (USA) (1985) 82: 5824-5828); use of a laser beam; microinjection; or any other method capable of introducing DNA into a host cell.


Plastid transformation is a routine and well known method for introducing a polynucleotide into a plant cell chloroplast (see U.S. Pat. Nos. 5,451,513, 5,545,817, and 5,545,818; WO 95/16783; McBride et al., Proc. Natl. Acad. Sci., USA 91:7301-7305, 1994). In some embodiments, chloroplast transformation involves introducing regions of chloroplast DNA flanking a desired nucleotide sequence, allowing for homologous recombination of the exogenous DNA into the target chloroplast genome. In some instances one to 1.5 kb flanking nucleotide sequences of chloroplast genomic DNA may be used. Using this method, point mutations in the chloroplast 16S rRNA and rps12 genes, which confer resistance to spectinomycin and streptomycin, can be utilized as selectable markers for transformation (Svab et al., Proc. Natl. Acad. Sci. USA, 87:8526-8530, 1990), and can result in stable homoplasmic transformants, at a frequency of approximately one per 100 bombardments of target leaves.


In some embodiments, an alga is transformed with one or more polynucleotides which encode one or more polypeptides, as described herein. In one embodiment, a transformation may introduce a nucleic acid into a plastid of the host alga (e.g., chloroplast). In another embodiment, a transformation may introduce a second nucleic acid into the chloroplast genome of the host alga. In still another embodiment, a transformation may introduce two protein coding regions into the plastid genome on a single gene, or may introduce two genes on a single transformation vector.


Transformed cells can be plated on selective media following introduction of exogenous nucleic acids. This method may also comprise several steps for screening. A screen of primary transformants can be conducted to determine which clones have proper insertion of the exogenous nucleic acids. Clones which show the proper integration may be propagated and re-screened to ensure genetic stability. Such methodology ensures that the transformants contain the genes of interest. In many instances, such screening is performed by polymerase chain reaction (PCR); however, any other appropriate technique known in the art may be utilized. Many different methods of PCR are known in the art (e.g., nested PCR, real time PCR). For any given screen, one of skill in the art will recognize that PCR components may be varied to achieve optimal screening results. For example, magnesium concentration may need to be adjusted upwards when PCR is performed on disrupted algal cells to which EDTA (which chelates magnesium) is added to chelate toxic metals. Following the screening for clones with the proper integration of exogenous nucleic acids, clones can be screened for the presence of the encoded protein(s) and/or products. Protein expression screening can be performed by Western blot analysis and/or enzyme activity assays. Product screening may be performed by any method known in the art, for example mass spectrometry, SDS PAGE protein gels, or HPLC or FPLC chromatography.


The expression of the protein can be accomplished by inserting a polynucleotide sequence (gene) encoding the protein or enzyme into the chloroplast genome of a microalga. The modified strain of microalgae can be made homoplasmic to ensure that the polynucleotide will be stably maintained in the chloroplast genome of all descendants. A microalga is homoplasmic for a gene when the inserted gene is present in all copies of the chloroplast genome, for example. It is apparent to one of skill in the art that a chloroplast may contain multiple copies of its genome, and therefore, the term “homoplasmic” or “homoplasmy” refers to the state where all copies of a particular locus of interest are substantially identical. Plastid expression, in which genes are inserted by homologous recombination into all of the several thousand copies of the circular plastid genome present in each plant cell, takes advantage of the enormous copy number relative to nuclear-expressed genes to permit expression levels that can readily exceed 10% of the total soluble plant protein. The process of determining the plasmic state of an organism of the present disclosure involves screening transformants for the presence of exogenous nucleic acids and the absence of wild-type nucleic acids at a given locus of interest.


EXAMPLES

The following examples are offered to illustrate, but not to limit, the claimed invention.


Example 1
Synthetic Promoters Capable of Driving Robust Nuclear Gene Expression in the Green Alga Chlamydomonas reinhardtii

Materials and methods.


POWRS Motif Identification. The top 50 highest-expressed endogenous genes were identified based on their RNA accumulation under ambient conditions according to previously published RNA-seq data (Fang et al., 2012). Since promoter structure is not strictly defined in Chlamydomonas reinhardtii, the sequences between −1000 and +50 for the top 50 genes were analyzed using the POWRS motif identification program (Davis et al., 2012) (Phytozome 10.2, Chlamydomonas reinhardtii v4.3 and/or v5.5). All default POWRS settings were used, except that the minimum number of sequences that a valid motif must match was lowered to ten.
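
By way of illustration only, extraction of a promoter window around an annotated start site can be sketched as follows. The function name `promoter_window`, the 0-based indexing, and the convention that the window holds `upstream` bases before the TSS plus `downstream` bases beginning at the TSS are assumptions made for this example; they are not taken from the POWRS program or from Phytozome.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def promoter_window(chromosome, tss, strand, upstream=1000, downstream=50):
    """Return the window around a 0-based TSS, read 5'->3' on the gene's strand."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    else:
        # On the minus strand, upstream sequence lies at higher coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    window = chromosome[max(start, 0):end]  # clipped at the chromosome start
    if strand == "-":
        window = window.translate(COMPLEMENT)[::-1]
    return window

toy = "AAAACCCCGG"  # toy sequence, for illustration only
print(promoter_window(toy, 4, "+", upstream=2, downstream=2))  # AACC
print(promoter_window(toy, 4, "-", upstream=2, downstream=2))  # GGGT
```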


Generation of synthetic promoters. Promoters were generated using random insertion of POWRS motifs, with positions constrained relative to the positions of the motif clusters in the native sequences. Promoter backbones were generated to ensure a GC content similar to that of the native promoters, including periodic AT-rich regions (FIG. 1, panel A). Finally, all promoters contained at least one copy of a TC-rich motif around the TSS (FIG. 2). Random promoters were generated by choosing 500 random nucleotides based on the Markov model that described the native promoter GC content without periodic AT-rich regions (Table 6).









TABLE 6
Markov model for random promoter generation.

Base    −500 to −200    −199 to −100    −99 to 0
A       0.2             0.2             0.28
C       0.3             0.25            0.24
G       0.3             0.35            0.2
T       0.2             0.2             0.28
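
By way of illustration only, a random promoter can be drawn from the base frequencies of Table 6 as in the following Python sketch. It assumes that each position is drawn independently of its neighbors and that the 500 nucleotides occupy positions −500 to −1; the function name `random_promoter` and the `seed` parameter are hypothetical.

```python
import random

# Base probabilities per region, copied from Table 6.
REGIONS = [
    (-500, -200, {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}),
    (-199, -100, {"A": 0.2, "C": 0.25, "G": 0.35, "T": 0.2}),
    (-99, 0, {"A": 0.28, "C": 0.24, "G": 0.2, "T": 0.28}),
]

def random_promoter(length=500, seed=None):
    """Draw bases for positions -length..-1 using the Table 6 weights."""
    if not 0 < length <= 500:
        raise ValueError("Table 6 only covers positions -500 to 0")
    rng = random.Random(seed)
    bases = []
    for pos in range(-length, 0):
        for first, last, probs in REGIONS:
            if first <= pos <= last:
                letters, weights = zip(*probs.items())
                bases.append(rng.choices(letters, weights=weights)[0])
                break
    return "".join(bases)

promoter = random_promoter(seed=1)
print(len(promoter))  # 500
```

Passing a fixed `seed` makes a given random promoter reproducible, which is convenient when the same control sequence must be regenerated later.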









Plasmid construction. The synthetic algal promoters were synthesized as gBlocks (IDT, Coralville, Iowa) incorporating DNA ends that allowed cloning via SLiCE technology (Zhang et al., 2012) (Table 7). All restriction enzymes were purchased from New England Biolabs (Ipswich, Mass.). The pBR4 expression vector with the hygromycin B resistance gene under the control of the β-tubulin promoter and a separate cassette with the mCherry gene driven by the ar1 promoter was used as the backbone (Berthold et al., 2002; Rasala et al., 2012). pBR4 was digested with NdeI and XbaI to remove the ar1 promoter up to the end of the RBCS2 5′UTR and generate ends for SLiCE cloning. Synthetic promoters were cloned with the RBCS2 5′UTR, which was amplified with appropriate primers to allow 15 bp overhangs with the synthetic promoters as well as the digested backbone (Table 7), resulting in the constructs in FIG. 1, panel B. To rearrange sap11 with the hygromycin cassette downstream of the mCherry cassette, each half of pBR4 was amplified with appropriate primers for USER cloning into HCR1, a modified pBlueScript II (Agilent, Santa Clara, Calif.), as previously described (Specht et al., 2015) (Table 7). The rearranged construct was then digested with NdeI and XbaI to remove ar1 and replace it with sap11, which was PCR amplified and SLiCE cloned into the rearranged pBR4. Promoter and motif deletions were performed by SLiCE cloning. PolyA and polyT mutations were introduced using overlapping primers, and the PCR pieces generated were cloned into a pBR4-rearranged backbone which had been digested with EcoRI and NdeI (Table 7). All constructs were confirmed by restriction digest and sequencing.









TABLE 7
Primers used for expression vector constructions.

Primer Use                                 Primer Name     Sequence                               SEQ ID NO
5′UTR amplification                        5′UTR_F         GTTGAGTGACTTCTCTTGTAAAAAAGT            63
5′UTR amplification                        5′UTR_R         CCCTTGGACACCATATGCATGGCCATCCTG         64
Expression Vector Rearrange                mCherry_F       GGGTTTAAUTCTAGACGGCGGGGAGCTCG          65
Expression Vector Rearrange                mCherry_R       ATCGCGCTUCAAATACGCCC                   66
Expression Vector Rearrange                hyg_F           AAGCGCGAUATCAAGCTTCTT                  67
Expression Vector Rearrange                hyg_R           GGTCTTAAUGGTACCCGCTTCAAATACGCCC        68
sap11 introduction into rearranged vector  sap11_F         GCTGAGGGTTTAATTCTAGAACATGCT            69
sap11 introduction into rearranged vector  sap11_R         CCCTTGGACACCATATGC                     70
Promoter deletion sap11                    sap11Δ-230_F    GCTGAGGGTTTAATTCTAGAAAGCAAGTATGCAGC    71
Promoter deletion sap11                    sap11Δ-130_F    GCTGAGGGTTTAATTCTAGAGCATGTTTGCTTGGC    72
Promoter deletion sap11                    sap11Δ-30_F     GCTGAGGGTTTAATTCTAGAAAGCCGAGCGAGCCC    73
Promoter deletion sap11                    sap11_R         CCCTTGGACACCATATGC                     74
Motif Deletion                             sap11_F         GCTGAGGGTTTAATTCTAGAACATGCTGACTACGA    75
Motif Deletion                             sap11_R         CCCTTGGACACCATATGC                     76
Motif Deletion                             m1_F            GGGGTTTTTTTTACATGCATGATGGGC            77
Motif Deletion                             m1_R            TGTAAAAAAAACCCCGATAAAGCCCG             78
Motif Deletion                             m2_F            CGCAAAAAAAATTTCCCAAAATTGCG             79
Motif Deletion                             m2_R            GGGAAATTTTTTTTGCGCGCGCCCCATGC          80
Motif Deletion                             m3_F            ACATTTTTTTTGGGGCGCGCGCCG               81
Motif Deletion                             m3_R            CCCCAAAAAAAATGTACGCCCATTGC             82
Motif Deletion                             m4_F            TGCTTTTTTTTATGGGCGTACATCTC             83
Motif Deletion                             m4_R            CCATAAAAAAAAGCAACTTGCTCAAAG            84
Motif Deletion                             m5_F            CTATTTTTTTTACTAACCTGCAGCGG             85
Motif Deletion                             m5_R            TAGTAAAAAAAATAGCGTGTGCCCACA            86
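
By way of illustration only, the following Python sketch checks that a forward and reverse mutagenic primer from Table 7 share the overlap expected of an overlapping primer pair. The helper names `reverse_complement` and `shared_overlap` are hypothetical; the two primer sequences are m1_F and m1_R as listed in Table 7.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def shared_overlap(forward, reverse):
    """Longest prefix of `forward` that ends the reverse complement of `reverse`."""
    rc = reverse_complement(reverse)
    for size in range(min(len(forward), len(rc)), 0, -1):
        if rc.endswith(forward[:size]):
            return forward[:size]
    return ""

# m1_F and m1_R from Table 7.
m1_f = "GGGGTTTTTTTTACATGCATGATGGGC"
m1_r = "TGTAAAAAAAACCCCGATAAAGCCCG"
overlap = shared_overlap(m1_f, m1_r)
print(overlap, len(overlap))
```

A non-empty overlap spanning the homopolymer run indicates that both primers carry the same replacement sequence and can prime overlapping PCR pieces.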










C. reinhardtii growth and transformation. Wild-type (cc1690) C. reinhardtii were grown and transformed using the methods described previously using 1 μg of plasmid DNA (Rasala et al., 2012). Plasmid constructs were digested with KpnI to linearize them prior to electroporation. Transformants were first screened on TAP (Tris-acetate-phosphate)/agar plates containing 15 μg/ml hygromycin, resulting in approximately 5,000 to 12,000 transformants per selection. The entire transformant pool was then collected and transferred to liquid TAP medium for two days, followed by screening on the flow cytometer.


Flow cytometry measurement of mCherry fluorescence. mCherry fluorescence was visualized by a BD LSRII flow cytometer and analyzed using FlowJo v10.0.8. The population was gated using the following strategy: the FSC and SSC parameters were obtained using a 488 nm blue laser and were used to eliminate smaller non-algal samples and clumps of algae that can be misread as a single cell. Next, the 488 nm laser using a 685 LP and a 710/50 filter set was used in combination with a 405 nm violet laser and 450/50 filter to remove dead cells and remaining debris from the population. The mCherry fluorescence was then measured with a 561 nm yellow/green laser with a 600 LP and 610/20 filter set. To better visualize the population, the mCherry fluorescence channel was plotted against the window created by the 405 nm laser with a 505 LP and 535/30 filter set. Using the untransformed parent strain as a reference, the events containing only background fluorescence were removed from the analysis. The remaining events were considered single, living C. reinhardtii cells expressing mCherry. A representative window was selected from the remaining population and the mCherry fluorescence channel was broken down into individual events, resulting in 80 to 10,000 data points.
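
By way of illustration only, the sequential gating logic can be sketched in Python as follows. This is a simplification and not the FlowJo workflow itself: the dictionary keys, the rectangular FSC/SSC gate, the single `violet_max` threshold standing in for the chlorophyll/violet gate, the use of the brightest gated parent event as the background cutoff, and all numbers are hypothetical.

```python
def gate_events(events, parent_events, fsc_range, ssc_range, violet_max):
    """Return mCherry values of single, live events brighter than the parent strain."""
    def single_live(e):
        return (fsc_range[0] <= e["fsc"] <= fsc_range[1]
                and ssc_range[0] <= e["ssc"] <= ssc_range[1]
                and e["violet"] <= violet_max)

    # Background cutoff: brightest mCherry-channel event in the gated parent strain.
    background = max((e["mcherry"] for e in parent_events if single_live(e)),
                     default=0.0)
    return [e["mcherry"] for e in events
            if single_live(e) and e["mcherry"] > background]

# Arbitrary illustrative numbers, not measured data.
parent = [{"fsc": 50, "ssc": 40, "violet": 5, "mcherry": 10}]
sample = [
    {"fsc": 55, "ssc": 42, "violet": 4, "mcherry": 80},   # kept
    {"fsc": 5, "ssc": 3, "violet": 2, "mcherry": 90},     # too small: debris
    {"fsc": 52, "ssc": 41, "violet": 60, "mcherry": 70},  # fails the violet gate
    {"fsc": 51, "ssc": 39, "violet": 3, "mcherry": 8},    # background only
]
print(gate_events(sample, parent, (20, 100), (20, 100), 20))  # [80]
```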


Genomic promoter motif analysis. For whole genome promoter analysis, genome sequence and annotation for Creinhardtii_281_v5.5 were obtained from phytozome.jgi.doe.gov (Merchant et al., 2007). Annotated 5′ UTR start sites were compared to PASA assembled EST start sites. Only 4,412 of the 22,892 total annotated 5′ UTR start sites were within 10 bp of a PASA EST start site and considered EST validated sites. Sequences from −1000 bp upstream to +500 bp downstream of the validated 5′ UTR start sites were analyzed for new motifs using DREME (Bailey, 2011). Then the promoter sequences were analyzed by CentriMo to identify POWRS or DREME motifs that are enriched in specific regions relative to the TSS (Bailey and Machanick, 2012).
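
By way of illustration only, the EST validation filter can be sketched as follows. The function name `est_validated`, the treatment of "within 10 bp" as inclusive, and the restriction to coordinates on a single chromosome and strand are assumptions made for this example; the coordinates are arbitrary.

```python
from bisect import bisect_left

def est_validated(annotated_starts, est_starts, tolerance=10):
    """Keep annotated 5' UTR starts lying within `tolerance` bp of any EST start."""
    ests = sorted(est_starts)
    kept = []
    for pos in annotated_starts:
        # First EST start at or beyond the lower edge of the tolerance window.
        i = bisect_left(ests, pos - tolerance)
        if i < len(ests) and ests[i] <= pos + tolerance:
            kept.append(pos)
    return kept

print(est_validated([100, 500, 905], [95, 300, 920]))  # [100]
```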


96-well vs flow cytometry mCherry fluorescence measurement. Two independent pools of C. reinhardtii were grown and transformed as described in experimental procedures. Differences in transformation efficiencies resulted in twice as many transformants in pool 2 as in pool 1. Each pool was transformed twice each with ar1 or sap11, resulting in four independent pools of transformants. After selection on solid media, 24 transformants were picked from each plate and transferred to a 96-well plate with 200 μl TAP, grown to saturation, then diluted 1:20 in TAP. Transformed cells were grown until late log phase in TAP media without antibiotics. Cells (100 μl) were transferred to a black 96-well plate (Corning Costar, Tewksbury, Mass.). mCherry fluorescence (575 nm/608 nm) was read using a Tecan plate reader (Tecan Infinite M200 PRO, Männedorf, Switzerland). Fluorescence signals were normalized to chlorophyll fluorescence (440 nm/680 nm). After the first 24 transformants were selected, the remaining transformants were collected from each plate and transferred to 50 ml TAP. mCherry fluorescence was measured as in experimental procedures. While measurement of 24 transformants per construct gave variable results between experiments, measurement of 6000+ transformants gave consistent, reproducible results. This result was also independent of transformation efficiency.


Results

Native motif identification and sap generation. In order to generate saps capable of driving high heterologous gene expression, native C. reinhardtii genes were analyzed that showed the highest RNA accumulation in wild type (wt) cells grown under ambient conditions. The top 50 genes were identified based on previously published RNA-seq data (Fang et al., 2012). This data set was chosen because the growth conditions best match typical ambient small scale laboratory growth conditions for green algae. Promoter regions (−1000 to +50 nt from the transcription start site) from these genes were analyzed using the POWRS software (Davis et al., 2012). POWRS identifies motifs based not only on enriched sequences but also on the position of these elements within the promoter region. POWRS clusters sequences together based on similarity to create motif clusters that can be characterized by position weight matrices. POWRS identified 127 motif clusters containing 979 unique motifs within the top 50 native gene promoters (FIG. 2). Upon inspection of the motifs, nine TC-rich motifs were identified, some of which were localized around the transcription start site (TSS; FIG. 3). In Arabidopsis thaliana, a TC-like motif near the TSS may function similarly to the TATA box (Bernard et al., 2010). Therefore, these TC-rich motifs were added to every synthetic promoter and enriched around the TSS.


Analysis of the top 50 native promoters also revealed that there is a decrease in the GC content within 500 bp around the transcription start site (FIG. 1, panel A). This trend is in direct contrast with the promoters of higher plant species, which skew towards higher GC content near the TSS (Calistri et al., 2011; Fujimori et al., 2005). C. reinhardtii promoter GC content structure most resembles Saccharomyces cerevisiae and some prokaryotic species that increase AT-content towards the TSS. This trend in C. reinhardtii does not appear to be due simply to the higher overall GC content of its nuclear genome, since species like the red alga Cyanidioschyzon merolae also have high GC content but have an increase in GC towards the TSS (Calistri et al., 2011). In addition to a general AT-increase at the TSS, there also appeared to be smaller dips in GC content at approximately −280 and −140 bp upstream of the TSS. These AT-rich regions have a similar periodicity as that of nucleosome wrapped DNA, which is around 147 bp (Lodha and Schroda, 2005). These AT-rich regions were incorporated in the synthetic promoters.
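
By way of illustration only, a GC-content profile of the kind used to reveal AT-rich dips can be computed with a sliding window, as in the following Python sketch. The function name `gc_profile` and the window and step sizes are assumptions; the window parameters used to produce FIG. 1, panel A are not stated here.

```python
def gc_profile(seq, window=50, step=10):
    """Return (window_start, GC fraction) pairs along `seq`."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        profile.append((start, gc))
    return profile

# Toy sequence, for illustration only.
print(gc_profile("GGGGAAAA", window=4, step=2))  # [(0, 1.0), (2, 0.5), (4, 0.0)]
```

Averaging such profiles over a set of aligned promoters, position by position, would give a curve in which AT-rich regions appear as local minima.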


Synthetic promoters were generated to include nucleotide backbones that had a GC profile similar to that of the native promoters, including the aforementioned AT-bias towards the TSS and AT-rich regions at −280 and −140 bp (FIG. 1, panel A). Promoters were designed to be 500 bp in length for ease of synthesis and analysis. Since many motifs are localized across and downstream of the TSS, promoters were designed to mimic −450 bp upstream and 50 bp downstream of the TSS in order not to cut off important motifs. This is a similar strategy to previous native hybrid promoter designs (Schroda et al., 2000). Motifs were overlaid onto nucleotide backbones, constrained to a region similar to where they were found in the native sequences (Davis et al., 2012; FIG. 2; FIG. 1, panel B).
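
By way of illustration only, overlaying motifs onto a backbone at positions defined relative to the TSS can be sketched as follows. The function name `overlay_motifs` and the placement list are hypothetical; the default `tss_index=450` reflects a 500 bp promoter spanning −450 to +50. Where placements overlap, later ones overwrite earlier ones in this sketch.

```python
def overlay_motifs(backbone, placements, tss_index=450):
    """Overwrite backbone bases with motifs placed relative to the TSS.

    `placements` is a list of (offset_from_tss, motif) pairs; an offset of
    -30 means the motif starts 30 bp upstream of the TSS.
    """
    seq = list(backbone)
    for offset, motif in placements:
        start = tss_index + offset
        if start < 0 or start + len(motif) > len(seq):
            raise ValueError(f"motif {motif!r} at offset {offset} is outside the backbone")
        seq[start:start + len(motif)] = motif
    return "".join(seq)

# Toy 10 bp backbone with the TSS at index 5, for illustration only.
print(overlay_motifs("A" * 10, [(-2, "CC"), (1, "GG")], tss_index=5))  # AAACCAGGAA
```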


Synthetic promoters drive transcription in vivo. Twenty-five saps were studied for their ability to drive the expression of the mCherry fluorescent reporter protein. The saps were synthesized and cloned in front of an mCherry reporter gene, which also contained the 5′ and 3′ RBCS2 UTRs as well as the first RBCS2 intron (FIG. 1, panel C). These elements have all been previously shown to improve mRNA accumulation and protein synthesis of heterologous genes in C. reinhardtii (Rasala et al., 2013; Lumbreras et al., 1998). The vector construct also included a hygromycin resistance cassette, which was driven by the beta tubulin (TUBB2) promoter to select for transformed algae independent of synthetic promoter function (Berthold et al., 2002). This allowed large scale mCherry analysis of all promoters, including weak or non-functioning promoters.


Transformation of the C. reinhardtii nucleus occurs almost exclusively through non-homologous end-joining (Gumpel et al., 1994; Sodeinde and Kindle, 1993). This results in random insertion, multiple insertions, and highly variable exogenous gene expression. Typical promoter analysis involves measuring the expression of 10-50 individual transformants. However, measuring individual transformants is time and resource consuming, and the variability in expression is still high unless many individuals are measured. Alternatively, if many transformants are pooled and protein or RNA levels are measured for the total population, noise from positional insertion effects can be reduced, but this does not allow measurement of the range of expression over the population pool. Therefore, for this study flow cytometry was used to measure promoter strength. Flow cytometry allows measurement of a large number of transformants while also recording the data for individual transgenic cells. This provides a highly confident average as well as the range of expression for our reporter gene for each promoter tested.


To determine if our synthetic promoters were functional based on our design principles, and not just coincidental, random promoters were also generated whose sequence had a similar GC content to both native and our synthetic promoters, but with no periodical AT rich regions upstream or placement of motifs (FIG. 1, panel A, Table 1). These promoters would also serve as a negative control for random positional effects since exogenous gene expression can occur simply due to insertion next to a native promoter (Haring and Beck, 1997).


Analysis of mCherry expression driven by the 25 saps revealed a wide range of functionality compared to ar1. As expected, there was a low level of mCherry fluorescence above the WT background in our random promoter transformants (FIG. 1, panel D). It is important to note that while five random promoters were generated, only two provided enough mCherry-positive transformants to perform proper statistical analysis; these are shown in FIG. 1, panel D. Multiple transformations and screenings were performed to increase the number of positive events for statistical analysis, but none could be successfully reproduced. Eight saps were found to be no better than these randomly generated promoters (FIG. 1, panel E). However, 10 saps were not only better than our random controls, but were as good as ar1. Encouragingly, seven saps were actually better than ar1 (Tukey HSD, p < 0.05), with both average and maximum mCherry fluorescence levels almost twice as high as ar1. These results were consistent over multiple transformations and screenings (FIG. 4, panel A).


sap11 contains a positive cis-effector motif. In order to determine which motifs contribute to the promoter strength of the high-expressing saps, we chose sap11 for further analysis, as it consistently produced the greatest amount of mCherry. First, a deletion series was performed in which nucleotides were deleted from the 5′ end so that only the region from −250, −150, or −50 bp to the TSS remained (FIG. 5, panel A). For this study, the expression vector was rearranged so that the hygromycin resistance cassette was downstream of the mCherry cassette. This rearrangement avoided any confounding effects due to the relative shift in the position of the 3′ UTR from the hygromycin cassette after promoter deletion. Rearrangement did not affect the promoter function of either ar1 or sap11 (FIG. 4, panel B). The relative mCherry fluorescence from sap11 in this rearranged vector was unchanged from the original design (FIG. 1, panel E, and FIG. 5, panel B). Analysis of mCherry fluorescence in sap11Δ mutants revealed only a slight reduction in expression in sap11Δ-250 and sap11Δ-150 mutants (FIG. 5, panel B). However, a significant drop in expression was observed in sap11Δ-50, where expression did not exceed that found for the random promoters. These results are consistent with the fact that core motifs are often found within 200 bp upstream of the TSS (Berendzen et al., 2006; Maston et al., 2006; Yamamoto et al., 2007).
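The 5′ deletion series can be sketched in Python as a simple truncation relative to the TSS. The function name `truncate_upstream`, the placeholder sequence, and the TSS index are hypothetical and stand in for the actual sap11 sequence, which is not reproduced here.

```python
def truncate_upstream(promoter, tss_index, keep_bp):
    """Keep only keep_bp bases upstream of the TSS plus everything downstream.

    promoter  : promoter sequence written 5' to 3'
    tss_index : 0-based index of the TSS within promoter
    keep_bp   : number of upstream bases to retain
    """
    if keep_bp > tss_index:
        raise ValueError("cannot keep more upstream bases than exist")
    return promoter[tss_index - keep_bp:]

# Placeholder promoter: 500 bp upstream of the TSS followed by 20 bp downstream.
promoter = "A" * 250 + "C" * 100 + "G" * 100 + "T" * 50 + "N" * 20
deletions = {keep: truncate_upstream(promoter, 500, keep) for keep in (250, 150, 50)}
for keep, seq in deletions.items():
    print(keep, len(seq))
```

Each truncated construct retains the same sequence from the TSS onward, so differences in expression can be attributed to the removed upstream region.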


To further narrow down specific motifs essential for sap11 function, motif deletion analysis was performed. Four regions between −150 and −50 bp from the TSS contained POWRS-identified motifs (FIG. 5, panel C). Eight A residues were used to replace the entire motif, or the majority of the bases of the motif for those longer than 8 nucleotides. For motif 2, polyT residues were used to replace the motif since the region was highly A-rich. Motif 5 comprised a TC-rich motif that resided around the TSS. This motif was also deleted since it is homologous to the TC motifs found in Arabidopsis, and was therefore thought to be a functional element (Bernard et al., 2010). However, deletion of motif 5 (sap11Δm5) did not result in a significant reduction in mCherry production (FIG. 5, panel D). Therefore, either this particular iteration of the motif was not utilized in sap11 or the TC motifs are not essential in C. reinhardtii. Deletion of either motif 3 or motif 4 (sap11Δm3 and sap11Δm4) resulted in significant decreases in promoter function, while deletion of motif 1 or motif 2 (sap11Δm1 and sap11Δm2) had little effect. Interestingly, regions 3 and 4 contain motifs that are nearly identical reverse complements of each other (CCCATGCGA and TGCATGGG, respectively), suggesting they could be targeted by the same transcription factor. In order to determine if regions 3 and 4 were redundant, a double mutant was generated in which both regions were replaced with polyA nucleotides (sap11Δm3-4). This promoter functioned similarly to the individual motif 3 and 4 KOs, suggesting that motif 4 may be redundant with motif 3 or that KO of either motif alone already eliminates any expression above background (FIG. 5, panel D). It is important to note that while this motif was essential for promoter function in sap11, this motif alone is not sufficient for expression, as several of the non-functioning saps also contained this motif in a similar location (see, e.g., FIG. 2).
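The reverse-complement relationship between the motifs in regions 3 and 4, and the homopolymer replacement used for the knockouts, can be checked with a short Python sketch. The helper names are hypothetical, and the choice to overwrite the first eight bases of a longer motif is an assumption made for illustration; the text above does not state which bases were replaced.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def replace_motif(promoter, start, motif_len, filler="A", max_fill=8):
    """Overwrite up to max_fill bases of a motif with a homopolymer, preserving length."""
    n = min(motif_len, max_fill)
    return promoter[:start] + filler * n + promoter[start + n:]

motif3 = "CCCATGCGA"
motif4 = "TGCATGGG"
rc4 = reverse_complement(motif4)
# Compare the two motifs over their shared length (8 bases).
mismatches = sum(a != b for a, b in zip(motif3, rc4))
print(rc4, mismatches)

# Knock out a motif embedded in a short placeholder sequence.
knocked_out = replace_motif("GG" + motif4 + "TT", 2, len(motif4))
print(knocked_out)
```

The reverse complement of TGCATGGG is CCCATGCA, which differs from the first eight bases of CCCATGCGA at a single position.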


Because the CCCAT motifs had such a significant impact on sap11 function, we set out to determine whether they may represent a core motif within C. reinhardtii. One method to identify core motifs is to identify motifs that are relatively enriched at specific locations relative to the TSS in a large number of promoters. Therefore, we analyzed the promoter regions of 4,412 genes in C. reinhardtii. Promoters were chosen if their 5′ UTR start sites (Chlamydomonas reinhardtii v5.5) were within 10 bp of the start site of a PASA (Program to Assemble Spliced Alignments; Phytozome 10.2) assembled EST. Promoter sequences from −1000 to +500 of the 5′ UTR start site were analyzed to identify motifs that are enriched in similar regions (Bailey and Machanick, 2012). Surprisingly, the top eight motifs identified were all CCCAT-like motifs that were highly enriched only at −100 to −40 bp upstream of the TSS, with a peak at −65 bp (FIG. 6, panel A). Moreover, 10.6% (467 promoters) of all the promoters analyzed had the exact CCCATGCA sequence at this location, while 35.4% (1564 promoters) had some variation of this motif at this location. This suggests that the CCCAT motif is a core motif within the C. reinhardtii promoter.
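The reported proportions follow directly from the counts given above, and the idea of testing for a motif within a fixed window relative to the TSS can be sketched in Python. This is a simplified exact-match check with hypothetical names and placeholder sequences; it is not the positional enrichment analysis of Bailey and Machanick (2012) used in this study.

```python
def motif_in_window(promoter, tss_index, motif, far=100, near=40):
    """True if motif starts between -far and -near bp relative to the TSS."""
    lo = max(tss_index - far, 0)
    hi = tss_index - near
    pos = promoter.find(motif, lo)  # first occurrence at or after the window start
    return pos != -1 and pos <= hi

def percent(count, total):
    """Percentage of total represented by count."""
    return 100.0 * count / total

# Counts reported in the text: 4,412 promoters analyzed.
exact_pct = percent(467, 4412)
variant_pct = percent(1564, 4412)
print(round(exact_pct, 1), round(variant_pct, 1))

# Placeholder promoters spanning -1000 to +500, so the TSS is at index 1000.
inside = "T" * 935 + "CCCATGCA" + "T" * 557    # motif starts at -65
outside = "T" * 990 + "CCCATGCA" + "T" * 502   # motif starts at -10
print(motif_in_window(inside, 1000, "CCCATGCA"), motif_in_window(outside, 1000, "CCCATGCA"))
```

Rounded to one decimal place, 467 of 4,412 is 10.6% and 1,564 of 4,412 is 35.4%, matching the values stated in the text.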


A motif sequence similarity search using TOMTOM revealed some homology between this motif and the cis-motif recognized by the Arabidopsis phytochrome interacting factors (PIFs; FIG. 4; Gupta et al., 2007). PIFs are involved in light-regulated gene expression (Castillon et al., 2007). Similarly, functional analysis of CCCAT motif-containing genes revealed enrichment in pathways that are diurnally regulated (e.g., ribosomes, antenna proteins). However, the CCCAT motif was found in over 1,500 genes, the vast majority of which were not diurnally regulated (<5% overlap with differentially regulated genes identified in Zones et al., 2015). The role of the CCCAT motif within the context of these native promoters remains to be determined. Interestingly, only one helix-loop-helix transcription factor (Cre14.g620850) could be identified in C. reinhardtii with homology to the PIF proteins in Arabidopsis, based on amino acid similarity. It will be interesting to determine whether this putative transcription factor can bind to the CCCAT motif in C. reinhardtii. If it does, it most likely has a function distinct from that in Arabidopsis, based on its target genes in C. reinhardtii.



C. reinhardtii promoters contain AT- and TC-rich motifs near the TSS. CentriMo analysis of the C. reinhardtii promoters revealed other motifs that were enriched at specific regions relative to the TSS. Of note, AT-rich motifs appeared to peak at the TSS and then at periodic but decreasing intervals both upstream and downstream of the TSS (FIG. 6, panel B). These intervals appeared ~130 bp apart from each other. These regions correspond to the AT-rich regions found in the top 50 genes (FIG. 1, panel A), and when the relative GC content is analyzed in the larger genomic promoter set, a similar pattern of AT-rich regions is seen (FIG. 8). At first glance, this periodicity suggests a relationship to nucleosome positioning. However, nucleosomes in C. reinhardtii protect 147 bp of DNA and typically have a period of ~170 bp (Fu et al., 2015; Lodha and Schroda, 2005). Interestingly, this period more closely follows the period of 6mA methylated sites around the TSS, which have a period of ~134 bp (Fu et al., 2015). However, the AT-rich sites are not located at the same positions as either the nucleosomes or the 6mA sites. Finally, CentriMo analysis found TC-rich motifs that were enriched around the TSS of C. reinhardtii promoters. However, their enrichment was far less significant than that of the CCCAT or AT-rich motifs (FIG. 6, panel C). This is consistent with the motif deletion analysis, which demonstrated that this motif is not essential in the sap11 promoter.
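A periodic pattern of AT-rich regions of this kind can be visualized with a sliding-window AT-content profile. The Python sketch below is illustrative only: the function name, the window and step sizes, and the artificial sequence with AT-rich blocks every 130 bp are assumptions, and the sketch does not reproduce the CentriMo analysis described above.

```python
def at_content_profile(seq, window=20, step=10):
    """AT fraction in sliding windows; returns a list of (window_start, at_fraction)."""
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        at = chunk.count("A") + chunk.count("T")
        profile.append((start, at / window))
    return profile

# Artificial sequence: a 20 bp AT-rich block followed by 110 bp of GC, repeated.
unit = "AT" * 10 + "GC" * 55   # 130 bp repeat unit
seq = unit * 3
profile = at_content_profile(seq)
peaks = [start for start, frac in profile if frac == 1.0]
spacing = [b - a for a, b in zip(peaks, peaks[1:])]
print(peaks, spacing)
```

In a real promoter set the profile would be averaged over many sequences aligned at the TSS, and the peak spacing would be estimated rather than read off exactly.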


Discussion

In this study, synthetic promoters were successfully generated that were capable of driving exogenous gene expression within the C. reinhardtii nucleus. The saps generated in this study were based on native DNA motifs identified using the POWRS algorithm. Using a stochastic method of motif placement that was based on motif location relative to the TSS in native promoters, we were able to generate saps that were as successful as, or better than, the best native promoters in C. reinhardtii (Schroda et al., 2002; Schroda et al., 2000). The current best promoter for C. reinhardtii is the non-native promoter ar1, which is a hybrid between two endogenous promoter regions. Our novel saps rely on a completely synthetic promoter backbone with a cis-regulatory motif structure informed by annotation-based and experimentally derived genomic information. It should be noted that the HSP70A promoter acts as a transcriptional state enhancer, which increases the probability of transcription of the neighboring promoter (Schroda et al., 2008). It would be interesting to see whether fusing the HSP70A promoter upstream of our synthetic promoters further improves their function, similarly to HSP70A's effect on RBCS2. Alternatively, our promoters could also be fused with other native 5′ and 3′ UTRs, such as those of psaD, which in one study showed similar improvements over ar1 for luciferase expression (Kumar et al., 2013).


Bioinformatic analysis used to identify motifs within native promoters led to the identification of novel elements as well as information about promoter structure within the nuclear genome of C. reinhardtii. First, C. reinhardtii promoters have an AT-bias near the TSS, which is distinct from other plant species studied thus far (FIG. 1, panel A; Calistri et al., 2011; Fujimori et al., 2005). This bias more than likely affects the structure of the DNA in this location and may be important for nucleosome organization or other DNA-protein interactions (Gabrielian et al., 1999; Kanhere and Bansal, 2005). In addition to an overall AT-bias, there were also pockets of AT-rich regions upstream of the TSS, which correlated with AT-rich motifs found in the EST-validated promoters (FIG. 1, panel A, and FIG. 6, panel B). The pattern of the AT-rich regions corresponds to a similar periodic pattern of 6mA methylation sites around the TSS, but is shifted by ~30 bp (Fu et al., 2015). It has been suggested that the periodicity of the 6mA sites may help establish nucleosome organization around the TSS. Therefore, the AT-bias with specific AT-rich periodic regions may work together with the 6mA methylation sites to establish nucleosome packing and encourage transcription factor and RNA polymerase binding around the TSS.


In addition to AT-rich regions, TC-rich regions were also enriched in C. reinhardtii promoters. This enrichment was more significant in the top 50 expressed genes than in the genome as a whole (FIG. 6, panel C). This enrichment in top expressed genes is consistent with similar motifs found in Arabidopsis (Bernard et al., 2010). However, when this motif was removed from sap11, there was little loss in promoter function. It is important to note that TC motif analysis in Arabidopsis was only performed in silico. Therefore, the relative importance or function of these motifs has yet to be established in vivo. It is also possible that this motif is a consequence of the relative AT enrichment around the TSS and that only its relative AT content is important. Since the motif was replaced with a polyA sequence, the AT content was not significantly changed. Further work is still required to rule out the relevance of the TC-rich motifs in C. reinhardtii.


Promoter motif deletion analysis did reveal the presence of an essential motif within the sap11 promoter. Motif regions 3 and 4 contained nearly identical CCCAT motifs. Knockout of these motifs led to a severe reduction of sap11 function. Bioinformatic analysis further revealed that this motif is highly enriched at −65 bp upstream of the TSS of 1564 genes, with 467 having the exact CCCATGCA sequence (FIG. 6, panel A). Notably, many versions of the CCCAT motif contain the conserved CATG 6mA sequence (Fu et al., 2015). Therefore, the CCCAT motif may function as a target for DNA methylation in its role in transcriptional regulation. While one putative C. reinhardtii transcription factor has been predicted to bind to the CCCAT motif based on in silico homology analysis, further in vitro and in vivo work is required to identify the true transcription factor partner.


The combination of bioinformatic analysis of gene structure and expression and in vivo testing of synthetic promoters based on these analyses has proven a fruitful area of research for discovery of unknown cis elements and for use in designing strong synthetic promoters (Blazeck and Alper, 2013; Koschmann et al., 2012; Venter, 2007). The knowledge gained in this study gives us a synthetic template to generate large promoter libraries. These libraries will be used to generate more significant data about the importance of individual motifs and overall promoter structure in C. reinhardtii, which will ideally enable us to generate successive rounds of engineered promoters to achieve exogenous gene expression above currently achieved levels. Large promoter libraries will also allow for the integration of multiple genes into the same host by allowing separate transgenes to be driven by unique promoters, reducing the genomic rearrangements brought about by sequence-specific targeting that may arise from a genome laced with identical sequences. This latter feature is particularly important in metabolic engineering, which often requires the introduction of multiple enzymes into the host organism. Finally, as we have demonstrated in this study, synthetic promoters provide a platform on which to identify motifs in vivo involved in transcriptional regulation in C. reinhardtii. In the future, this can be expanded to motifs predicted to be involved in inducible regulation, such as heat shock, nickel or nitrate addition, or iron deficiency. Together these tools will represent a large step forward in the synthetic engineering of algae for the production of biofuels and bio-products.


References for Example 1

Bailey, T. L. (2011) DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653-1659.


Bailey, T. L. and Machanick, P. (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res 40, e128.


Berendzen, K. W., Stuber, K., Harter, K. and Wanke, D. (2006) Cis-motifs upstream of the transcription and translation initiation sites are effectively revealed by their positional disequilibrium in eukaryote genomes using frequency distribution curves. BMC Bioinformatics 7, 522.


Bernard, V., Brunaud, V. and Lecharny, A. (2010) TC-motifs at the TATA-box expected position in plant genes: a novel class of motifs involved in the transcription regulation. BMC Genomics 11, 1-15.


Berthold, P., Schmitt, R. and Mages, W. (2002) An engineered Streptomyces hygroscopicus aph 7″ gene mediates dominant resistance against hygromycin B in Chlamydomonas reinhardtii. Protist 153, 401-412.


Blazeck, J. and Alper, H. (2013) Promoter engineering: recent advances in controlling transcription at the most fundamental level. Biotechnology Journal 8, 46-58.


Blunt, J. W., Copp, B. R., Keyzers, R. A., Munro, M. H. G. and Prinsep, M. R. (2012) Marine natural products. Natural Product Reports 29, 144-222.


Calistri, E., Livi, R. and Buiatti, M. (2011) Evolutionary trends of GC/AT distribution patterns in promoters. Molecular Phylogenetics and Evolution 60, 228-235.


Cardozo, K. H. M., Guaratini, T., Barros, M. P., Falcão, V. R., Tonon, A. P., Lopes, N. P., Campos, S., Torres, M. A., Souza, A. O., Colepicolo, P. and Pinto, E. (2007) Metabolites from algae with economical impact. Comparative Biochemistry and Physiology. Toxicology & Pharmacology 146, 60-78.


Castillon, A., Shen, H. and Huq, E. (2007) Phytochrome Interacting Factors: central players in phytochrome-mediated light signaling networks. Trends Plant Sci 12, 514-521.


Cerutti, H., Johnson, A., Gillham, N. and Boynton, J. (1997) A eubacterial gene conferring spectinomycin resistance on Chlamydomonas reinhardtii: integration into the nuclear genome and gene expression. Genetics 145, 97-110.


Corchero, J., Gasser, B., Resina, D., Smith, W., Parrilli, E., Vázquez, F., Abasolo, I., Giuliani, M., Jäntti, J., Ferrer, P., Saloheimo, M., Mattanovich, D., Schwartz, S., Tutino, M. and Villaverde, A. (2013) Unconventional microbial systems for the cost-efficient production of high-quality protein therapeutics. Biotechnology Advances 31, 140-153.


Davis, I., Benninger, C., Benfey, P. and Elich, T. (2012) POWRS: position-sensitive motif discovery. PLoS ONE 7, e40373.


Diaz-Santos, E., de la Vega, M., Vila, M., Vigara, J. and León, R. (2013) Efficiency of different heterologous promoters in the unicellular microalga Chlamydomonas reinhardtii. Biotechnology Progress 29, 319-328.


Dufresne, A., Ostrowski, M., Scanlan, D. J., Garczarek, L., Mazard, S., Palenik, B. P., Paulsen, I. T., de Marsac, N. T., Wincker, P., Dossat, C., Ferriera, S., Johnson, J., Post, A. F., Hess, W. R. and Partensky, F. (2008) Unraveling the genomic mosaic of a ubiquitous genus of marine cyanobacteria. Genome Biology 9, R90.


Fang, W., Si, Y., Douglass, S., Casero, D., Merchant, S., Pellegrini, M., Ladunga, I., Liu, P. and Spalding, M. (2012) Transcriptome-wide changes in Chlamydomonas reinhardtii gene expression regulated by carbon dioxide and the CO2-concentrating mechanism regulator CIA5/CCM1. Plant Cell 24, 1876-1893.


Fischer, N. and Rochaix, J. (2001) The flanking regions of PsaD drive efficient gene expression in the nucleus of the green alga Chlamydomonas reinhardtii. Molecular Genetics and Genomics 265, 888-894.


Fischer, N., Stampacchia, O., Redding, K. and Rochaix, J. D. (1996) Selectable marker recycling in the chloroplast. Molecular and General Genetics 251, 373-380.


Fu, Y., Luo, G. Z., Chen, K., Deng, X., Yu, M., Han, D., Hao, Z., Liu, J., Lu, X., Dore, L. C., Weng, X., Ji, Q., Mets, L. and He, C. (2015) N6-methyldeoxyadenosine marks active transcription start sites in Chlamydomonas. Cell 161, 879-892.


Fujimori, S., Washio, T. and Tomita, M. (2005) GC-compositional strand bias around transcription start sites in plants and fungi. BMC Genomics 6.


Gabrielian, A. E., Landsman, D. and Bolshoy, A. (1999) Curved DNA in promoter sequences. In Silico Biol 1, 183-196.


Georgianna, D. R., Hannon, M. J., Marcuschi, M., Wu, S., Botsch, K., Lewis, A. J., Hyun, J., Mendez, M. and Mayfield, S. P. (2013) Production of recombinant enzymes in the marine alga Dunaliella tertiolecta. Algal Research 2, 2-9.


Gimpel, J., Specht, E., Georgianna, D. and Mayfield, S. (2013) Advances in microalgae engineering and synthetic biology applications for biofuel production. Current Opinion in Chemical Biology 17, 489-495.


Gimpel, J. A. and Mayfield, S. P. (2013) Analysis of heterologous regulatory and coding regions in algal chloroplasts. Applied Microbiology and Biotechnology 97, 4499-4510.


Griesbeck, C. and Kirchmayr, A. (2012) Algae: An alternative to the higher plant system in gene farming. In: Molecular Farming in Plants: Recent Advances and Future Prospects (Wang, A. and Ma, S. eds), pp. 125-143. Dordrecht, Netherlands: Springer Science & Business Media.


Gumpel, N. J., Rochaix, J. D. and Purton, S. (1994) Studies on homologous recombination in the green alga Chlamydomonas reinhardtii. Curr Genet 26, 438-442.


Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. and Noble, W. S. (2007) Quantifying similarity between motifs. Genome Biology 8, R24.


Hammer, K., Mijakovic, I. and Jensen, P. (2006) Synthetic promoter libraries-tuning of gene expression. Trends in Biotechnology 24, 53-55.


Haring, M. A. and Beck, C. F. (1997) A promoter trap for Chlamydomonas reinhardtii: Development of a gene cloning method using 5′ RACE-based probes. Plant J 11, 1341-1348.


Kanhere, A. and Bansal, M. (2005) Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes. Nucleic Acids Res 33, 3165-3175.


Koschmann, J., Machens, F., Becker, M., Niemeyer, J., Schulze, J., Bülow, L., Stahl, D. and Hehl, R. (2012) Integration of bioinformatics and synthetic promoters leads to the discovery of novel elicitor-responsive cis-regulatory sequences in Arabidopsis. Plant Physiology 160, 178-191.


Kumar, A., Falcao, V. R. and Sayre, R. T. (2013) Evaluating nuclear transgene expression systems in Chlamydomonas reinhardtii. Algal Res 2, 321-332.


Lingg, N., Zhang, P., Song, Z. and Bardor, M. (2012) The sweet tooth of biopharmaceuticals: importance of recombinant protein glycosylation analysis. Biotechnology Journal 7, 1462-1472.


Lodha, M. and Schroda, M. (2005) Analysis of chromatin structure in the control regions of the Chlamydomonas HSP70A and RBCS2 genes. Plant Mol Biol 59, 501-513.


Lodha, M., Schulz-Raffelt, M. and Schroda, M. (2008) A new assay for promoter analysis in Chlamydomonas reveals roles for heat shock elements and the TATA box in HSP70A promoter-mediated activation of transgene expression. Eukaryotic Cell 7, 172-176.


Lumbreras, V., Stevens, D. and Purton, S. (1998) Efficient foreign gene expression in Chlamydomonas reinhardtii mediated by an endogenous intron. The Plant Journal 14, 441-447.


Manuell, A. L., Beligni, M. V., Elder, J. H., Siefker, D. T., Tran, M., Weber, A., McDonald, T. L. and Mayfield, S. P. (2007) Robust expression of a bioactive mammalian protein in Chlamydomonas chloroplast. Plant Biotechnology Journal 5, 402-412.


Maston, G. A., Evans, S. K. and Green, M. R. (2006) Transcriptional regulatory elements in the human genome. Annual Review of Genomics and Human Genetics 7, 29-59.


Merchant, S. S., Prochnik, S. E., Vallon, O., Harris, E. H., Karpowicz, S. J., Witman, G. B., Terry, A., Salamov, A., Fritz-Laylin, L. K., Maréchal-Drouard, L. and others (2007) The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318, 245-250.


Mukherji, S. and van Oudenaarden, A. (2009) Synthetic biology: understanding biological design from synthetic circuits. Nature Reviews Genetics 10, 859-871.


Parker, M. S., Mock, T. and Armbrust, E. V. (2008) Genomic insights into marine microalgae. Annual Review of Genetics 42, 619-645.


Rasala, B., Barrera, D., Ng, J., Plucinak, T., Rosenberg, J., Weeks, D., Oyler, G., Peterson, T., Haerizadeh, F. and Mayfield, S. (2013) Expanding the spectral palette of fluorescent proteins for the green microalga Chlamydomonas reinhardtii. The Plant Journal 74, 545-556.


Rasala, B. A., Lee, P. A., Shen, Z. X., Briggs, S. P., Mendez, M. and Mayfield, S. P. (2012) Robust expression and secretion of xylanase1 in Chlamydomonas reinhardtii by fusion to a selection gene and processing with the FMDV 2A peptide. PLoS ONE 7, e43349.


Rosales-Mendoza, S., Paz-Maldonado, L. M. T. and Soria-Guerra, R. E. (2012) Chlamydomonas reinhardtii as a viable platform for the production of recombinant proteins: current status and perspectives. Plant Cell Rep 31, 479-494.


Ruth, C. and Glieder, A. (2010) Perspectives on synthetic promoters for biocatalysis and biotransformation. Chembiochem 11, 761-765.


Schroda, M., Beck, C. and Vallon, O. (2002) Sequence elements within an HSP70 promoter counteract transcriptional transgene silencing in Chlamydomonas. The Plant Journal 31, 445-455.


Schroda, M., Blocker, D. and Beck, C. (2000) The HSP70A promoter as a tool for the improved expression of transgenes in Chlamydomonas. The Plant Journal 21, 121-131.


Sharma, N. K., Tiwari, S. P., Tripathi, K. and Rai, A. K. (2011) Sustainability and cyanobacteria (blue-green algae): facts and challenges. Journal of Applied Phycology 23, 1059-1081.


Sodeinde, O. A. and Kindle, K. L. (1993) Homologous recombination in the nuclear genome of Chlamydomonas reinhardtii. Proceedings of the National Academy of Sciences 90, 9199-9203.


Specht, E. and Mayfield, S. P. (2012) Synthetic oligonucleotide libraries reveal novel regulatory elements in Chlamydomonas chloroplast mRNAs. ACS Synthetic Biology 2, 34-46.


Specht, E., Miyake-Stoner, S. and Mayfield, S. (2010) Micro-algae come of age as a platform for recombinant protein production. Biotechnology Letters 32, 1373-1383.


Specht, E. A., Nour-Eldin, H. H., Hoang, K. T. D. and Mayfield, S. P. (2015) An improved ARS2-derived nuclear reporter enhances the efficiency and ease of genetic engineering in Chlamydomonas. Biotechnology Journal 10, 473-479.


Venter, M. (2007) Synthetic promoters: genetic control through cis engineering. Trends Plant Sci 12, 118-124.


Wu, J., Hu, Z., Wang, C., Li, S. and Lei, A. (2008) Efficient expression of green fluorescent protein (GFP) mediated by a chimeric promoter in Chlamydomonas reinhardtii. Chinese Journal of Oceanology and Limnology 26, 242-247.


Yamamoto, Y. Y., Ichida, H., Matsui, M., Obokata, J., Sakurai, T., Satou, M., Seki, M., Shinozaki, K. and Abe, T. (2007) Identification of plant promoter constituents by analysis of local distribution of short sequences. BMC Genomics 8, 67.


Zhang, Y., Werling, U. and Edelmann, W. (2012) SLiCE: a novel bacterial cell extract-based DNA cloning method. Nucleic Acids Res 40, e55.


Zones, J. M., Blaby, I. K., Merchant, S. S. and Umen, J. G. (2015) High-Resolution Profiling of a Synchronized Diurnal Transcriptome from Chlamydomonas reinhardtii Reveals Continuous Cell and Metabolic Differentiation. The Plant Cell 27, 2743-2769.


Example 2
A Synthetic Nuclear Transcription System in Green Algae: Characterization of Chlamydomonas reinhardtii Nuclear Transcription Factors and Identification of Targeted Promoters

This example is published as Anderson et al., Algal Research (2017) 22:47-55, which is hereby incorporated herein by reference in its entirety for all purposes.


Methods

Algal Strains, Culture Conditions, and Genetic Transformation. Chlamydomonas reinhardtii cc1010 (Chlamydomonas Resource Center, St. Paul, MN) was used as the wild type strain for this study. Algal strains were cultured in TAP (Tris-Acetate-Phosphate) medium [25] at 23° C. under constant illumination (5,000 lux) and with constant shaking (100 rpm). C. reinhardtii was transformed by electroporation as previously described [19] with the exception of the 40 mM sucrose supplement. Transformants were selected on TAP medium agar plates supplemented with 10 μg/ml zeocin. Gene-positive colonies were screened by PCR.


Generation of Transcription Factor Library. Initial gene models for 346 identified C. reinhardtii TFs were obtained from the PlnTFDB (http://plntfdb.bio.uni-potsdam.de/v3.0/) [24,26]. These were then cross-referenced by BLAST against the Phytozome database (http://phytozome.jgi.doe.gov) to obtain the most up-to-date and accurate gene models. Primers were designed to anneal to the 5′ and 3′ ends of each gene (Integrated DNA Technologies). RNA was isolated from cc1010 cultures grown to 6×10⁸ cells per ml using PureLink Plant RNA Reagent (Ambion by Life Technologies), and cDNA libraries were generated with the Verso cDNA Synthesis Kit (Thermo Fisher Scientific). Gene coding sequences were amplified with Phusion Polymerase using the GC buffer (Thermo Fisher Scientific) supplemented with 0.5 to 1 M betaine (Sigma) with a touchdown PCR protocol [27]. Successfully amplified CDSs were then cloned into the pENTR/D-TOPO vector in E. coli via TOPO cloning (Life Technologies). Resulting clones were verified by Sanger sequencing. Silent mutations were deemed acceptable. Non-silent mutations were accepted only after multiple independent clones were confirmed to carry the same difference(s) from the published gene model. Clones were transferred to pDEST22 (S. cerevisiae Y1H vector) or pTM207 (ble2A-derived [19] C. reinhardtii nuclear expression vector) via Gateway LR-Clonase (Life Technologies).


Yeast Culture Conditions and Yeast One-Hybrid Assay. Culture conditions and mating of Saccharomyces cerevisiae YM4271 and the Y1H assay were performed following the MATCHMAKER One-Hybrid System protocol (CLONTECH Laboratories, Inc.). Reporter plasmids were chromosomally integrated in the S. cerevisiae YM4271 genome. Briefly, Y1H library strains were inoculated into 96-well plates and cultured overnight (O/N). The OD600 was measured from 100 μl of O/N culture. Using white 96-well plates (Greiner), 50 μl of O/N culture was combined with 50 μl of Luciferase Assay Reagent (Promega) using an injector on a Tecan plate reader (Tecan Infinite M200 PRO). Luminescence was measured five seconds post-injection. Luminescence was first normalized to the OD600 and then, for each TF, normalized to the empty vector control. A one-sided Grubbs' test for outliers (0.05 level) was used to identify fold increases in luminescence that fell outside the distribution. Assays were repeated with replicates for outlier samples. Values were deemed significant by Student's t-test and/or if they were greater than two standard deviations from the mean of the empty vector luminescence control.
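The two normalization steps and the two-standard-deviation criterion described above can be sketched in Python. The function names and all readings are hypothetical, and the sketch implements only the standard-deviation criterion; the Grubbs' test and Student's t-test mentioned in the text are not reproduced here.

```python
from statistics import mean, stdev

def normalize(luminescence, od600):
    """Luminescence per unit of culture density."""
    return luminescence / od600

def flag_hits(tf_readings, control_readings, n_sd=2.0):
    """Fold change over the empty vector control, flagged if above mean + n_sd * SD.

    tf_readings      : {tf_name: (luminescence, od600)}
    control_readings : list of (luminescence, od600) for empty-vector wells
    """
    control = [normalize(lum, od) for lum, od in control_readings]
    ctrl_mean = mean(control)
    ctrl_sd = stdev(control)
    # Express the cutoff as a fold change relative to the control mean.
    threshold = (ctrl_mean + n_sd * ctrl_sd) / ctrl_mean
    results = {}
    for name, (lum, od) in tf_readings.items():
        fold = normalize(lum, od) / ctrl_mean
        results[name] = (fold, fold > threshold)
    return results

# Hypothetical plate readings (luminescence, OD600).
control_readings = [(1000.0, 1.0), (1100.0, 1.0), (900.0, 1.0)]
tf_readings = {"TF_a": (5000.0, 1.0), "TF_b": (525.0, 0.5)}
hits = flag_hits(tf_readings, control_readings)
print(hits)
```

With these placeholder numbers the control mean is 1000 and its standard deviation is 100, so the cutoff is a 1.2-fold increase; `TF_a` (5-fold) is flagged and `TF_b` (1.05-fold) is not.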


Immunoblotting. Cells were cultured until mid to late log phase, washed in PBS-T (Phosphate-Buffered Saline-Tween) buffer, and lysed by sonication. Total soluble protein pellets were resuspended in SDS-PAGE loading buffer. Boiled samples were separated on a 12% SDS-PAGE gel, transferred to nitrocellulose, and probed with anti-GAL4-AD antibody (Sigma) for S. cerevisiae or anti-FLAG monoclonal antibody conjugated to alkaline phosphatase (Sigma) for C. reinhardtii.


RNA Purification. RNA was extracted from C. reinhardtii strains of interest after 3-4 days of growth in TAP medium under constant light using PureLink Plant RNA Reagent (Ambion by Life Technologies) according to the manufacturer's protocol. RNA was treated with 4 U of TURBO DNase (Thermo Fisher Scientific) for 30 min at 37° C.


RNA-Sequencing and Analysis. RNA from three biological replicates for each strain analyzed was sent to the Institute for Genomic Medicine at the University of California, San Diego for Next-Generation Sequencing on an Illumina HiSeq2500. Single-end 50 bp reads were generated. Reads were aligned to the latest reference index (Chlre4_Augustus5_transcripts.fasta) downloaded from the Joint Genome Institute (JGI) at www.phytozome.net using the open-source software TopHat on Galaxy (usegalaxy.org) [28-30]. Differential expression analysis was performed using Cufflinks, also on Galaxy. For gene identification, C. reinhardtii strain 503 was used as a reference strain due to the lack of a published sequence for strain cc1010. The average log2 (fold change) of all FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values ≥ 1.0 for the experimental strain (transcription factor constitutive-expression strain) compared to the control strain (GFP constitutive-expression strain) was plotted.
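The filtering and log2 fold-change calculation can be sketched in Python as follows. The function name, gene labels, and FPKM values are hypothetical, and applying the FPKM ≥ 1.0 cutoff to both strains is an assumption made here to avoid dividing by very small values; the text does not specify how the cutoff was applied.

```python
import math

def log2_fold_changes(experimental, control, min_fpkm=1.0):
    """log2(experimental / control) for genes with FPKM >= min_fpkm in both strains."""
    changes = {}
    for gene, exp_fpkm in experimental.items():
        ctrl_fpkm = control.get(gene)
        if ctrl_fpkm is None:
            continue  # gene not quantified in the control strain
        if exp_fpkm >= min_fpkm and ctrl_fpkm >= min_fpkm:
            changes[gene] = math.log2(exp_fpkm / ctrl_fpkm)
    return changes

# Hypothetical mean FPKM values for a TF-expressing strain and a GFP control strain.
experimental = {"gene1": 8.0, "gene2": 0.5, "gene3": 2.0}
control = {"gene1": 2.0, "gene2": 4.0, "gene3": 8.0}
changes = log2_fold_changes(experimental, control)
print(changes)
```

Here `gene1` is up four-fold (log2 = 2), `gene3` is down four-fold (log2 = −2), and `gene2` is dropped because its experimental FPKM is below the cutoff.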


Reverse Transcriptase Quantitative Polymerase Chain Reaction. 1 μg of purified RNA was reverse transcribed using the Verso cDNA Synthesis Kit (Thermo Fisher Scientific). cDNA was diluted 1:2 for qPCR analysis using Power SYBR Green PCR Master Mix (Applied Biosystems). qPCR was performed on a MyiQ thermocycler (Bio-Rad). Two biological replicates were performed, each with technical triplicates. The ΔΔCt method was used for relative quantification of gene expression [31]. RACK1 was used as an internal standard. The mean log2 (fold change) and SEM from biological replicates were plotted.
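The ΔΔCt calculation can be written out in a few lines of Python. The function names and Ct values below are hypothetical, and the sketch assumes approximately 100% amplification efficiency for both the target gene and the reference gene, which is the usual assumption behind the 2^(−ΔΔCt) formula.

```python
import math
from statistics import mean, stdev

def ddct_log2_fold_change(ct_target_exp, ct_ref_exp, ct_target_ctrl, ct_ref_ctrl):
    """log2 fold change by the delta-delta-Ct method (fold change = 2 ** -ddCt)."""
    d_exp = ct_target_exp - ct_ref_exp      # target normalized to reference, experimental
    d_ctrl = ct_target_ctrl - ct_ref_ctrl   # target normalized to reference, control
    ddct = d_exp - d_ctrl
    return -ddct

def mean_and_sem(values):
    """Mean and standard error of the mean for replicate values."""
    return mean(values), stdev(values) / math.sqrt(len(values))

# Hypothetical mean Ct values (technical triplicates already averaged).
rep1 = ddct_log2_fold_change(22.0, 18.0, 25.0, 18.0)
rep2 = ddct_log2_fold_change(23.0, 18.0, 25.0, 18.0)
avg, sem = mean_and_sem([rep1, rep2])
print(rep1, rep2, avg, sem)
```

In this example the reference gene plays the role that RACK1 plays in the text; the two replicates give log2 fold changes of 3 and 2, with a mean of 2.5 and an SEM of 0.5.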


Promoter Motif Identification. Promoter sequences were obtained from NCBI. DNA sequences were analyzed using the software programs MEME [32,33], AME [34], and Jalview [35].


Results.

Construction of a putative transcription factor library. One of our main goals with this project was to narrow down the list of potential cognate TF-promoter pairs, i.e., which TFs bind and regulate which nuclear promoters, in C. reinhardtii. An understanding of the global network of regulatory interactions within the nuclear genome is critical for the engineering of synthetic transcription systems, a long-range goal for our laboratory. Therefore, we set out to construct a library of recombinant C. reinhardtii nuclear transcription factors (TFs). Just after the C. reinhardtii genome sequence was completed [11], putative TFs, as well as transcription regulators (TRs), were identified by homology to known TF/TR domains and made available at the Plant Transcription Factor Database (PlnTFDB) [24,26]. In order to have the most up-to-date gene model for the TFs and TRs, we took fragments from the identified genes and used a BLAST search against the latest gene models from Phytozome. The TF/TR library (referred to simply as the TF library from here on) was generated using TOPO cloning such that the gene encoding each TF was PCR amplified from C. reinhardtii cc1010 cDNA and ligated into the pENTR/D-TOPO vector, followed by transformation into Escherichia coli (see Methods). We were able to successfully construct plasmid vectors encoding 92 different putative TFs predicted in the C. reinhardtii genome (Table 8), from a total of over 300 TFs identified by bioinformatics. Our library contains TFs belonging to multiple TF families including but not limited to: High Mobility Group (HMG) box, basic Helix-Loop-Helix (bHLH), Cys2His2 zinc finger (C2H2), Cys3His zinc finger (C3H), Forkhead-associated (FHA), basic Leucine Zipper (bZIP), MYB (myeloblastosis), Gcn5-related N-acetyltransferase (GNAT), Tubby bipartite (TUB), Tumor necrosis factor receptor-associated (TRAF), SET (histone methyltransferases), and CCAAT-enhancer-binding proteins (CCAAT).
A complete list of each TF and relevant information can be found in Table 8.
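The clone notes in Table 8 describe single-base differences between a cloned CDS and the reference gene model as either silent (e.g., "silent T681C") or amino-acid changing (e.g., "C222G causes D→E mutation"). The following is a minimal sketch of how such a note could be checked, not the procedure used to annotate the library. It assumes that a note reads reference base, 1-based CDS position, clone base, that the CDS starts in frame at position 1, and that the standard nuclear genetic code applies. The function name `classify_substitution` and the 12 bp toy CDS are hypothetical and are not taken from the library.

```python
import re

# Standard genetic code, built from the canonical T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(cds: str, note: str) -> str:
    """Classify a note such as 'T681C' against an in-frame CDS.

    Assumed note format: reference base, 1-based CDS position, clone base.
    Returns 'silent' or a string such as 'D->E' (reference aa -> clone aa).
    """
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", note)
    if match is None:
        raise ValueError(f"unrecognized note format: {note!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)

    cds = cds.upper()
    if not 1 <= pos <= len(cds):
        raise ValueError(f"position {pos} is outside the CDS")
    if cds[pos - 1] != ref:
        raise ValueError(f"CDS has {cds[pos - 1]} at {pos}, note says {ref}")

    # Locate the codon containing the substituted base.
    start = (pos - 1) // 3 * 3
    codon = cds[start:start + 3]
    if len(codon) < 3:
        raise ValueError("CDS length is not a multiple of three")

    offset = pos - 1 - start
    mutated = codon[:offset] + alt + codon[offset + 1:]

    aa_ref, aa_alt = CODON_TABLE[codon], CODON_TABLE[mutated]
    return "silent" if aa_ref == aa_alt else f"{aa_ref}->{aa_alt}"


# Hypothetical toy CDS (Met-Asp-Lys-stop), for illustration only.
toy_cds = "ATGGACAAATAA"
print(classify_substitution(toy_cds, "C6T"))  # GAC -> GAT, prints "silent"
print(classify_substitution(toy_cds, "C6G"))  # GAC -> GAG, prints "D->E"
```

In the toy example, position 6 is the third base of an Asp codon, so C6T leaves the residue unchanged while C6G replaces Asp with Glu, the same kind of third-position change reported for C222G in the notes.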









TABLE 8
Transcription factor library.

TF# | Augustus u10.2 gene ID # | v4.0 gene ID # | PTFDB TF family | CDS length (bp) | TF Library Clone Notes
1 | Cre06.g268600 | 126810 | CSD | 744 |
2 | Cre06.g261450.t1.1 | 142283 | HMG (high mobility group) box | 540 |
3 | Cre14.g620850.t1.1 | 183777 | bHLH | 1368 | silent T681C
4 | Cre13.g596300.t1.1 | 159133 | C2H2 / C2C2-CO-like | 1233 | silent T1074C
5 | Cre06.g250950.t1.1 | 142476 | C3H | 822 |
6 | Cre16.g672300.t1.1 | 184386 | HMG (high mobility group) box | 621 |
7 | Cre14.g620500.t1.1 | 347049 | AP2-EREBP | 1032 |
8 | Cre02.g082550.t1.1 | 53522 | FHA | 2034 |
9 | Cre20.g758600.t1.1 | 290169 | bZIP | 1731 |
10 | Cre05.g242600.t1.1 | 187360 | C2C2-GATA | 1194 |
11 | Cre03.g193900 | 147364 | CCAAT | 696 |
12 | Cre08.g378800.t1.1 | 345074 | C2C2-GATA | 633 |
13 | Cre07.g341800.t1.1 | 378904 | CCAAT | 837 |
14 | Cre32.g781700.t1.1 | 22211 | C3H | 534 |
15 | Cre16.g671900.t1.1 | 34069 | FHA | 768 |
16 | Cre03.g197100 | 117291 | MYB | 1437 |
17 | Cre01.g014050.t1.1 | 146239 | C3H | 1227 |
18 | Cre03.g198800.t1.1 | 417388 | MYB-related | 1368 |
19 | Cre12.g521150.t1.1 | 205894 | C2C2-Dof | 1875 |
20 | Cre06.g293750 | 194555 | C3H | 1725 | silent G1380A, appears to have 15 bp repeat at 322 & 685
21 | Cre02.g118250.t1.1 | 194816 | SWI/SNF-BAF60b | 828 | 6 silent A147T, G444A, G465A, G555A, T567C, C738T; plus C222G causes D→E mutation. Apr. 20, 2012 - Confirmed real differences between CC1010 (WT) and CC503 (reference sequence)
22 | Cre03.g152150.t1.1 | 149734 | C2H2 | 1242 | 15 bp repeat at 222 & 1050
23 | Cre03.g194950.t1.1 | 190458 | Sigma70 | 2259 |
24 | Cre05.g238250.t1.1 | 410640 | bZIP | 1575 |
25 | Cre04.g228400.t1.2 | 205718 | WRKY | 1920 |
26 | Cre12.g520650.t1.1 | 17453 | TUB | 1356 |
27 | Cre07.g326150.t1.2 | 205729 | C3H | 2253 |
28 | Cre04.g216200 | 177225 | bHLH | 1407 | silent G1056A
29 | Cre14.g624800.t1.1 | 147817 | | 1416 | real length is 1485, different splice site from predicted
30 | Cre02.g136800.t1.1 | 205561 | MYB-related | 2022 |
31 | Cre02.g096300.t1.2 | 186972 | C2H2 | 2100 |
32 | Cre03.g184150.t1.1 | 115555 | GNAT | 516 |
33 | Cre16.g657150.t1.2 | 287999 | GNAT | 837 |
34 | Cre02.g101850.t1.1 | 377090 | GNAT | 480 |
35 | Cre01.g048800.t1.2 | 283458 | GNAT | 1005 |
36 | Cre11.g480950.t1.1 | 192899 | HMG | 471 |
37 | Cre12.g542500.t1.1 | 79755 | mTERF | 474 |
38 | Cre12.g560200.t1.1 | 165420 | GNAT | 447 |
39 | Cre01.g063450.t1.1 | 193681 | PHD | 591 |
40 | Cre02.g091550.t1.1 | 186648 | PBF-2-like | 717 |
41 | Cre10.g420100.t1.1 | 96716 | SBP | 1026 | 7 silent T159C, A180G, G159A, C660T, C861T, T873G, C966T
42 | Cre11.g475100.t1.1 | 160596 | GNAT | 396 |
43 | Cre02.g079200.t1.1 | 111791 | CCAAT | 630 |
44 | Cre09.g402350.t1.1 | 191829 | FHA | 555 |
45 | Cre13.g590350.t1.1 | 147286 | PZIP | 1005 |
46 | Cre16.g667450.t1.1 | 26047 | TUB | 1476 | silent C939T
47 | Cre14.g623800.t1.2 | 117568 | GNAT | 642 |
48 | Cre10.g431450.t1.1 | 420467 | GNAT | 1515 |
49 | Cre17.g729750.t1.1 | 289541 | GNAT | 501 |
50 | Cre10.g430750.t1.2 | 338485 | MYB-related | 972 | 702-773 in frame deletion (present in 6 clones, 24 AA long mostly Alanine repeat)
51 | Cre27.g774300.t1.2 | 154505 | SET | 1566 |
52 | Cre29.g778700.t1.1 | 407701 | SlFa-like | 222 |
53 | Cre02.g108450.t1.1 | 76570 | MBF1 | 420 |
54 | Cre07.g351850.t1.1 | 337711 | GNAT | 672 |
55 | Cre16.g668200.t1.1 | 288229 | PHD | 744 |
56 | Cre06.g305200.t1.2 | 156694 | C2H2 | 1005 |
57 | Cre06.g254650.t1.1 | 134186 | C3H | 1023 |
58 | Cre07.g321550.t1.1 | 187531 | bZIP | 1182 |
59 | Cre17.g702650.t1.1 | 145251 | HMG | 1212 |
60 | Cre01.g022950.t1.1 | 146398 | TRAF | 1212 |
61 | Cre16.g672400.t1.1 | 149109 | MYB-related | 1506 |
62 | Cre12.g540400.t1.1 | 137355 | Rcd1 | 900 |
63 | Cre06.g286700.t1.1 | 402799 | TRAF | 999 |
64 | Cre02.g109700.t1.2 | 415443 | bHLH | 1011 |
65 | Cre01.g035150.t1.2 | 406697 | C3H | 1197 |
66 | Cre12.g516050.t1.1 | 423729 | FHA | 1065 |
67 | Cre04.g218400.t1.2 | 423158 | TRAF | 1179 |
68 | Cre10.g441000.t1.1 | 379612 | IWS1 | 1590 |
69 | Cre06.g269100.t1.2 | 142152 | GNAT | 861 |
70 | Cre13.g586450.t1.1 | 143712 | GNAT | 861 |
71 | Cre11.g479800.t1.2 | 379890 | TRAF | 1182 |
72 | Cre04.g226400.t1.1 | 189471 | CCAAT | 1230 |
73 | Cre16.g695600.t1.1 | 178083 | MYB-related | 1416 |
74 | Cre09.g392300.t1.2 | 148265 | GNAT | 1458 |
75 | Cre13.g593900.t1.1 | 205788 | GNAT | 1023 | v4.3 had extra intron (263-271) corrected v5.3
76 | Cre17.g739450.t1.2 | 135809 | CCAAT | 618 |
77 | Cre02.g084550.t1.1 | 290467 | GNAT | 894 |
78 | Cre13.g597500.t1.1 | 151334 | TRAF | 1068 |
79 | Cre07.g316600.t1.1 | 142718 | FHA | 1467 |
80 | Cre23.g766800.t1.1 | 391557 | MED7 | 753 |
81 | Cre13.g581150.t1.1 | 413200 | GNAT | 1128 |
82 | Cre26.g772400.t1.2 | 398164 | Coactivator p15 | 1371 | deletion 1027-1053 from v4.3
83 | Cre04.g215450.t1.1 | 151740 | TRAF | 1587 |
84 | Cre08.g364450.t1.1 | 95444 | GNAT | 543 |
85 | Cre12.g556400.t1.2 | 117655 | CCAAT | 891 |
86 | Cre06.g283200.t1.1 | 295365 | SET | 1008 |
87 | Cre07.g319701.t1.1 | 127044 | C2C2-GATA | 1329 |
88 | Cre16.g662650.t1.2 | 288117 | GNAT | 1044 |
89 | Cre06.g256200.t1.1 | 142398 | GNAT | 1173 | silent G510A
90 | Cre12.g520850.t1.1 | 424885 | SOH1 | 426 |
91 | Cre10.g446450.t1.1 | 281993 | Orphan | 1311 |
92 | Cre02.g075650.t1.1 | 417182 | C3H | 1254 |













TF# | SEQ ID NO | Reference Sequence





1
87
ATGGGCGAGCAGCTGAGGCAACAGGGAACCGTAAAG




TGGTTCAACGCCACCAAAGGCTTCGGCTTCATCACGC




CTGGTGGTGGCGGCGAGGACCTCTTTGTGCACCAGAC




CAACATCAACTCGGAGGGCTTCCGCAGCCTGCGGGAG




GGTGAAGTCGTCGAGTTCGAGGTTGAGGCTGGGCCGG




ATGGACGCTCTAAGGCTGTGAACGTGACGGGCCCCGG




AGGGGCCGCGCCCGAGGGCGCTCCGCGGAACTTCCGC




GGTGGCGGCCGCGGCCGCGGCCGCGCTCGCGGCGCCC




GCGGCGGCTATGCTGCTGCGTACGGCTACCCGCAGAT




GGCGCCGGTCTACCCCGGCTACTACTTCTTCCCCGCGG




ACCCCACGGGCCGGGGACGGGGTCGCGGCGGCCGCG




GCGGCGCCATGCCCGCCATGCAGGGCGTGATGCCGGG




TGTGGCGTACCCGGGCATGCCCATGGGCGGGGTGGGC




ATGGAGCCGACGGGCGAGCCGTCGGGGCTGCAGGTG




GTGGTGCACAACCTGCCGTGGAGCTGCCAGTGGCAGC




AGCTCAAGGACCACTTCAAGGAGTGGCGGGTGGAGCG




CGCAGACGTCGTGTACGACGCCTGGGGCCGCTCGCGG




GGCTTCGGCACCGTGCGCTTCACGACCAAGGAGGACG




CCGCGACGGCGTGCGACAAGTTGAACAACAGCCAAAT




CGACGGGCGCACGATAAGCGTCCGGCTCGACCGTTTC




GCTTGA





2
88
ATGGCTGGTGACAAGGCTGCCACCAAGGAGAAGAAG




GCCGCAGAGCCCAAGGGCAAGCGGAAGGAGACTGAG




GGCAAGGCCGAGCCCCCCGCCAAGAAGGCTGCCAAG




GCTCCCCCCAAGGAGAAGCCCGCCAAGAAGGCGCCC




GCCAAGAAGGAGAAGAAGGCCAAGGACCCCAACGCC




CCCAAGAAGCCCCTCACTTCCTTCATGTACTTCTCGAA




CGCCATCCGTGAGAGCGTGAAGTCCGAGAACCCTGGC




ATTGCCTTCGGCGAGGTCGGCAAGGTGATCGGCGAGA




AGTGGAAGGGCCTGTCCGCTGACGACAAGAAGGAGT




ACGATGAGAAGGCGGCTAAGGACAAGGAGCGCTACC




AGAAGGAGATGGAGTCTTACGGCGGCTCGTCGGGTGC




CTCCAAGAAGCCCGCGGCCAAGAAGGAGAAGGCTGC




GCCCAAGAAGAAGGCTAAGGAGGAGGAGGAGGAGGA




CGAGCCTGAGGCCGATGACGATGGTGATGACGACGAC




GAGGACGATGATGGTGATGACGATGAGTAA





3
89
ATGCAGCAGTCTTCGCAGCTTGGGCTGCCTGACCAGC




TCGCTCTGCTCAGCGGATTCCCGGCCGCGCTCTTCCCC




CAGCAGTACGGGTCGGGAGACCGCGACCTACAGCTCG




GCGGCCTGCGTAATGTGGGCAAAACGAAGTCTTCTGA




CAGCCGGAGCTCAAGTGCCTACGCGAGCAGGCACCAA




GCGGCTGAGCAACGCCGCCGAACTCGAATCAATGAGA




GGCTGGAGCTCCTGCGCAAGCTGGTGCCGCATGCGGA




GCGCGCCAACACGGCGTGCTTTCTGGAGGAGGTCATC




AAGTACATCGAGGCGCTGAAGGCGCGCACACTGGATC




TAGAGTCGCAGGTGGAGGCCCTGACGGGCAAGCCGGT




GCCCAAGTCGCTGGCGCTGCCCACCGGCATGCCGTCG




GTGCTGGCCGGAGGCTCCACCAGCGCGGACAACACCA




ACGCCAGCCCGCGCATGGTTGGCGCAGCGACATCGTC




GCAGGGCGGGCCCGCGGGCTCGCTGCCATCGGGGCAG




CCGGGCGCCGGCGGGGCGGGCGCGGGCTCCCTAGCCA




GCCCCTCCACCACGCCGCCCCCTACCATGACCGCGCA




GCAGGCCTCCCAGCAGCTCTCGCTCATGCAGTCGGGC




GGGCAGGCGGGCGGCTCGCAGGGCCTGCCGTCACAGC




TGACGCTGCCCAGTGGCGGCGCCGGCGCGGGGCTGCT




CTCGGCGGCGCAGCAGAGCCTGCTGGGTTTCCCCCAG




TCGGGCGGCCTGTCCCTCTCAGGCGCCGGCCTGTCACT




GGGCGGCAGCGGCCTGGGCCACGGCACCAGCGGCAT




CAGCCTGACCCAGTTCGCCGGCAACCTGCAGGCGGCC




GCCGCGGCCGCCGCCGCGGCGTCGCACGGCGCCGGCA




GCCAGTCCCACTCGCAGTCGCAGTCGCAGCACTCCGG




CCTCAGCCTGGGCTCGCACCACGTCACCGCGTCGCAG




CTGAACGAACTGCAGGCCATGCAAATGATGCAGTCGC




TGCAGCAGCACCACAACCAGCACGCGGCGGCCGCCGC




GGTGGTCGCGGCCGCGGGTGGCGGCGGCGGTTCCCGC




CCGGGATCCACGTTCCACCCCACCAACAACAAGGCGT




TCCTGCACTTCAACGAGGACGCCTACGCCTTCAGCGG




CAAGCCCGAGCTGTCGCTACCCGCGCGCAGCCTGCTG




GGTGCAGCCGCGGCCTCCGCCGCCACGCCCAGCACGT




CTCTCCAGCTGACCACCGTGCAGCTGCCCGCGGACTC




GAACACGCTGCTCCAGGTGGAGATGGCGCGCAAGGCC




GCGTCGGGCTCTCCCGTGTCCAGCGAGGAGAGCGGCG




TGCCGCTGAAGAAGCGCAAAGTGCTGGTGCTGTAA





4
90
ATGTCGAGTTGCGTCGTGTGCGCGGCCGCAGCGGTCG




TTTGGTGCCAGAATGACAAGGCGCTGCTTTGCAAGGA




CTGCGATGTGCGCATCCACACCAGCAACGCGGTCGCT




GCGCGCCATACCCGCTTCGTGCCCTGCCAGGGCTGCA




ACAAGGCCGGTGCTGCGCTCTACTGCAAGTGCGACGC




CGCGCACATGTGCGAGGCTTGCCACAGCTCCAACCCC




CTAGCTGCTACGCACGAGACCGAGCCGGTGGCGCCGC




TGCCGTCAGTCGAGCAGGGCGCTGCACCGGAGCCTCA




GGTCCTGAACATGCCCTGCGAGTCTGTGGCGCAGTCT




GCGGCCAGCCCCGCGGCTTGGTTTGTGGACGACGAGA




AGATGGGCACGACCAGCTTCTTTGATGCGCCTGCGGT




GCTGTCGCCCTCGGGCAGCGAGGCCGTGGTGCCCGTC




ATGTCCGCCCCTATCGAGGACGAGTTTGCATTCGCGG




CCGCCCCGGCGACGTTCAAGGAAATCAAGGACAAGCT




CGAGTTCGAGGCTCTGGACCTGGACAACAACTGGCTC




GACATGGGCTTCGATTTCACTGATATCCTGTCCGACGG




CCCCTCTGATGTGGGCCTGGTCCCCACCTTCGATGCCG




TCGATGAGGCCGCGGATGCCGTGGCTGACGCTATCGT




GCCCACCTTCGAGGAGGAGCAGCCCCAGTTACAGCAG




CAGGAGCCCCTGGTGCTGGCTCCCGCCCCGGAGGAGT




CGGCTGCTAGCCGCAAGCGCGCTGCCGCCGAGGAGGC




CGCGGAGGAGCCGGCCGCCAAGGTGCCGGCCCTGACT




CACCAGGCGCTGCTGCAGGCGCAGGCCGCCGCCTTCC




AGGCCGTGCCCCAGGCGTCAGCGCTGTTCTTCCAGCC




GCAGATGCTGGCCGCGCTGCCGCACCTGCCGCTGCTG




CAGCAGCCCATGATGCCGGCAGCCGTCGCCCCGGCGC




CCGTGCCCAAGAGCGGCAGCGCCGCCGCCAGCGCGGC




CCTCGCCGCCGGTGCCAACCTGACTCGCGAGCAGCGC




GTGGCGCGCTACCGCGAGAAGCGGAAGAACCGCTCTT




TCGCCAAGACCATCCGCTACGCTTCCCGCAAGGCGTA




TGCGGAGATCCGCCCCCGCATTAAGGGCCGCTTCGCC




AAGAAGGAGGAGATTGAGGCCTGGAAGGCGGCGCAC




GGCGGCGACGACGCCATTGTTCCCGAGGTCCTGGACG




CTGAGTGCTAA





5
91
ATGGCCGAGCACTTGGCTAGCATCTTCGGCACGGAGA




AGGACCGCGTGAACTGCCCGTTCTACTTCAAGATTGG




AGCGTGCCGCCATGGCGATCGCTGCTCGCGCCTGCAC




AACCGGCCGACGATTAGCCCGACCATTCTAATGGCGA




ACATGTACCAGAATCCGCTTTTGAACGCTCCGCTGGG




GCCGGACGGGCTGCCCATTCGGGTGGATCCCAGGGCT




GCTCAGGAACACTTCGAGGACTTCTATGAGGACGTGT




TTGAGGAGCTGGCGGCGCACGGTGAACTGGAGAACCT




GAACGTGTGCGATAACTTCGCTGACCATATGGTCGGG




AACGTGTACGCCAAGTTCCGGGACGAGGACGCGGCTG




CACGCGCGCTGACGGCGCTGCAGGGCCGCTACTACGA




CGGGCGGCCCATCATCGTGGAATTCAGCCCCGTGACT




GACTTCCGTGAGGCCACGTGCCGCCAGTACGAGGAAA




ACACGTGCAACCGCGGCGGCTACTGCAACTTCATGCA




CCTGAAGCCCATCAGCCGGGAGCTGCGCAAGAAGCTG




TTTGGGAGGTACAAGCGCCGGGAGCGCAGCCGCAGCC




CACGGCGCGACCGCGGCGACCGCGGGGACCGCGGCG




ATCGGCGCGAGCGGGACCGTGACTGGGACCGTGGCGA




CCGGGACCGCGGGCGGGGTCGCAGCCGCAGCCGCAG




CCGCGAGCGGGGGGGTGGCGACCGGCGCCGCGAGAC




GTCGGAGGAGCGCCGCGCAAAGATTGCAGCATGGAA




CACAGAGCGTGACGGAAGTGCTGGTGGCGGCGGCGGT




GGTGGGTGGTGA





6
92
ATGCTGCGCTACGCTGCTCTCCGCACTGTCCCGCGCGC




CATCGCGCCCGCCCGCCGGGCCATGGTGATTCGGTCTT




TCTCGGAAAGCAACGATGCCGCGCCCCCGGCTAAGAA




GGCAACCAAGCCCGCCAAGGCGGAGAAGGCGCCGAA




GGCGGAGAAGGCGCCGAAGGTGGAGAAGCCGAAGGC




GATGCGCGCGCCAAGCGCTTACAACCTGTTCTATAAG




GCGATCTTCCAGCAAGTGCGCAGCGAGAACCCCGACA




AGAAGGTTACTGAGCTCGGGTCAAAGGTCCGCGACAA




GTGGGCTTCCATTTCGGCACTGGAGCGGGCGCCGTAT




GAGGCGCAGGCTGCCGCGCGCAAGAAGGAAGTGGAT




GCCAAGAGGGCTGAGGTGCTGGCTGCCAAGAAGGCC




GCCGCCCGGCCCGTGACCGCCTACATCGCGTTCGCCA




ATGCCAAGCGTCCCGAGATCAAGGCGCAGAACCCTGA




CAAGACCATGGCGCAGGTGGCGAGCCTGCTGGGGTCC




ATTTGGAAGGGGATGTCGGAGGAGCAGCAGAAGCCG




TACCGTGACCAGGCCAAGGCGGCGATGGACGCGTGGA




AGGCCAAGCAGCAGGCGCAGCAGTCCGCGTAA





7
93
ATGGAGACGCTGTGGCCGGCTCCATACGCCCTACCGC




TCCAGTCTGCGGCGATGGCGCTGTCCGAACAGCAGCT




TGGCCAACACATTGATTCTGGCAGCGAGGAGGACCAC




ATCGCGGTCGTGGCGCAGGTCCAGACTGGCAAGAAGC




GACGCAGTGTGAGCGCGGAAGAGGACCCAGACTATG




AGGACGCCGCGCAAGGCGCGCAAGGCATAACGCATG




ATGGTACATCAAACAAGGCCGGCTACCGAGGCGTACG




GCGCCGGCCATGGGGCTCCTACGCCGCCGAGATTCGG




GACGCAGGCTGCGGCAAGCGCCGGTGGATTGGCACGT




TCAAGACTGCTGAGGAGGCTGCACGGGCGTACGATGA




GGCCGCCATTGCGCTGCATGGGCCTCGCGCCAAGACC




AACTTCACCTACCCCTGCCAGCAGCAGAGCGCCGCCG




CCGCGCCAGCCGCCGCACACAAGGCCCACAAGCCGCA




CGCCGCCGCCGCGCCGCAGCACCACAAGCCGGCGCAC




CACAGCCAGCAACCTGCTCAGCCGCGCAAGCAGCCGC




TGCACCCCCGGCAGCCGTACCAGCAGCACCAGCCCCC




CCAGCTGCCGACGCATCAGGAGGAGGAGCAGTACCG




GCGCAAGTCGGACGACTCAGACACCTCTATGACCGCT




GCGCTGCCGCTGCCGCTGTCGCTGACGGGGCAGCTGG




GCCTGCCGCCGCTGACGCTGCCGGGGCTTGAGGGTCT




GGACCTGATGGCGCTGCAGTCCAACCCCGCGCTGCTA




GCCGCGCTGCTCGCCGCCACGCGGCAGCACCTCCCGG




GGTTGGCCGGGCCGGATGCGCAGCCCGCCTGCCTGCC




GGAGCAGCAGCTGTCGGAGCGGGTCTGGGTCCAGGAG




CAGCCGGTGCAGGGGTGCGAGGAGGAGGAGGACGGG




TTGGAGGAGCCGGAGCCGCCGCAGGTGCTGCGGCCGG




AGCAGCTTCGGTCGCTGCAGGTGCTGGCGGAGGTGGC




GCACCTGTTCGGGCGCCGCGACTTCTGCATGTCGTGA





8
94
ATGAAGGTTATTATCGCCGGCGCGGGCATCGGCGGCC




TGGTGCTAGCCGTTGCACTTCTGAAGCAGGGCTTCCA




GGTTCAGGTCTTTGAGCGCGACCTGACGGCCATCCGC




GGCGAGGGCAAGTACCGTGGACCCATCCAGGTTCAAA




GCAATGCGCTCGCTGCGCTGGAGGCTATCGATCCCGA




GGTGGCCGCGGAGGTGCTGCGCGAGGGCTGCATCACT




GGCGACCGTATCAACGGGCTCTGCGACGGCCTGACTG




GCGAGTGGTACGTCAAGTTCGACACGTTCCACCCGGC




GGTCAGCAAGGGCCTGCCGGTGACCCGCGTCATCAGC




CGCCTCACGCTGCAGCAGATCCTGGCCAAAGCCGTGG




AGCGCTACGGCGGCCCCGGCACCATCCAGAACGGCTG




CAACGTGACCGAGTTCACGGAGCGCCGCAACGACACC




ACCGGCAACAACGAGGTGACTGTGCAGCTGGAGGAC




GGGCGCACGTTTGCGGCCGACGTGCTGGTGGGCGCCG




ACGGCATCTGGTCCAAGATCCGTAAGCAGCTCATTGG




CGAGACCAAGGCCAACTACAGCGGGTACACCTGCTAC




ACCGGCATCTCGGACTTTACGCCGGCGGACATTGACA




TTGTGGGCTACCGCGTGTTCCTGGGCAACGGCCAGTA




CTTTGTCAGCAGCGACGTGGGCAACGGCAAGATGCAG




TGGTACGGCTTCCACAAGGAGCCGTCTGGCGGCACCG




ACCCCGAGGGCAGCCGCAAGGCGCGCCTGCTGCAGAT




CTTTGGCCACTGGAACGACAACGTGGTGGACCTGATC




AAGGCCACGCCCGAGGAGGACGTGCTGCGCCGCGAC




ATCTTTGACAGGCCGCCCATCTTCACCTGGAGCAAGG




GCCGCGTGGCCCTGCTGGGCGACAGCGCGCACGCCAT




GCAGCCCAACCTGGGCCAGGGCGGCTGCATGGCCATT




GAGGACGCCTACGAGCTGGCCATCGACCTCAGCCGCG




CCGTGTCCGACAAGGCCGGAAACGCGGCGGCGGTGG




ACGTGGAGGGCGTGCTGCGCAGCTACCAGGACAGCCG




CATTTTGCGCGTCAGCGCCATTCACGGCATGGCGGGC




ATGGCTGCCTTCATGGCCAGCACCTACAAGTGCTACCT




GGGCGAGGGCTGGAGCAAGTGGGTTGAGGGGCTGCG




CATCCCGCACCCCGGCCGCGTGGTGGGGCGGCTGGTG




ATGCTGCTCACCATGCCCAGCGTGCTGGAGTGGGTGC




TGGGCGGCAACACCGACCACGTGGCGCCGCACCGCAC




CAGCTACTGCTCGCTGGGCGACAAGCCCAAGGCTTTC




CCCGAGAGCCGCTTCCCCGAGTTCATGAACAACGACG




CCTCCATCATCCGCTCCTCCCACGCCGACTGGCTGCTG




GTGGCGGAGCGCGACGCCGCCACGGCCGCCGCCGCCA




ACGTGAACGCCGCCACCGGCAGCAGCGCCGCCGCGGC




CGCCGCCGCCGACGTGAACAGCAGCTGCCAGTGCAAG




GGCATCTACATGGCGGACTCGGCGGCCCTGGTGGGCC




GCTGCGGCGCCACCTCGCGCCCCGCGCTGGCCGTGGA




CGACGTGCACGTCGCCGAGAGTCACGCGCAGGTCTGG




CGCGGCCTCGCCGGCCTCCCCCCCTCCTCGTCGTCCGC




CTCCACCGCCGCCGCCTCTGCGTCCGCCGCCTCCTCTG




CCGCCAGCGGCACCGCCAGCACCCTGGGCAGCTCGGA




GGGCTACTGGCTCCGCGACCTGGGCAGCGGCCGCGGC




ACCTGGGTCAACGGCAAGCGCCTGCCCGACGGCGCCA




CGGTGCAGCTGTGGCCCGGCGACGCGGTGGAGTTCGG




CCGGCACCCCAGCCACGAGGTGTTCAAGGTGAAGATG




CAGCACGTGACGCTGCGCAGCGACGAGCTCAGCGGCC




AGGCCTACACCACGCTCATGGTGGGCAAGATCCGGAA




CAACGACTACGTCATGCCCGAGTCGCGGCCGGACGGC




GGCAGCCAGCAGCCGGGCCGCCTGGTGACGGCTTAA





9
95
ATGGCTCGACAACAGCAGCATCAGCAGCAAGCCTCTG




ACCAGCAGCAGACCGGCGCTCGAGCGAACGGCCGGC




GAGCTTGTCGGCGCGGCAGCGACGAGCCCGCAGAGG




AGGTGAACGCCATGGACAGCCCCTCCTCCTCACCAGC




AGGTGCCGGGAAGGTGAGCCAGCGCGGCCGCAAGGC




CGCAGCGGCCTCCGGCGCCGCGGCGACCAAGCGCGGC




ACCAGCGCATCCGGAGCCGGCTCAGGGCCGGACGAG




GGTGGCGCCCCCGGCAACAACGGCAGCGGCAGCTTCG




CGCTGCCCCTGTCTACCGGCGGCGGCGCACGCAGCCG




GCACCGGCGCAGCCCCAGTGACCTCAGCGAGCCCTCG




GCCAGCGGCCTGCCGGGCGCACTGCCACTGCCGCTGC




CCCTAGTGGCCGACAAGCCGCTGAGCGAGTTCGTGGG




CCAGACCCGCGCCAACGCGCTGGACCCGGCGCAGCTG




GACCCCAAGCGCGCGCGCCGCATCATCGCCAACCGGC




AGTCGGCGCACCGCAGCCGCATGAAGAAGCTGCAGCT




CATCCACGAGCTGGAGCAGCGGGTGACGACCGCGCGC




GCCGCCACGGACGCGGTGCGGCAGCAGAACGTCGCG




GCGGCGGAGCGGCGGCGCGAGCTGCTCACGGCGGCG




GCGACGGCGCAGCAGCAGCTGGCGGAGCTGCGGCGC




GAGGCGGCGGCTGTGGCGGCCATGCACAGCGCCCTGG




CGGCGGAGCTCGCCAAGATAGGCATCGCGGGGCCGCC




GCCAGCGCCCGCGGCAGCAGAGCCGGCGGCGGCGCC




CGCCGACGGCATGGAGGTTGGGCTGCGTGGCTCGAGC




GGCGGTGCGGTGGCGCCCGCGACGCCGCCTAATGGCT




CGGAGGTGGGCGCCGAGCTGCACGGCCGCATGTCAGT




CAACGGGGCCGCCACCCGCGCCGCCGGCGGCCCGTCG




GCTTCCGGCAGCTCCGGCACATCGGCGTCCATGGGTC




AGGCTGGGGCTGCGGGCTCCCAGCCTGGCGGCGCGGC




GGTGCCTGAGAGCCCCTTCCTCCTGCCGCACCTGCCGC




CGCCGCACATCATGTCCGCTCACACCGCCGCCGCGGC




TGGCAGTGGCGGTGGCGGCGGCTCGTTTTCAAACCAC




CACCATCACCACCACAGCCACAGTCACAGTGGGAGCG




GCAGCGCTATGCCGCTGCTGTCCGCTCCCGGTGCCGC




CTCCTACACCTTTGGGCAGCAGCACAACCCAGCCCAC




CAGCAGCAGCACCAGCAGCAGCCCGCGCCGTTCCTGC




AAGGTGCCCTGCCGCAGCACACGCAGCTGGCGCACCC




CGCGCCCTCGCACAGCCGCAACCCCTCCGCCAGCAGC




CTGGCCGGCCCGGCGCCTTCGCAACCCAGCGCCGCCG




TGGAGGCTGCGGCTGCCTTCCAGCAGGCGCCCACAGC




CGCTGACGTCACGCCGGAGCCGGGCGCCAGGCAGGAT




GGCGGCGGCGGCGGTGGCGGCGAAGTGGCTCACGGC




AGTTCGCCCATGGCCCTGGACGGGTTTGGCCTGGCAG




GGCTGATGGGGCTGGGCATGGGCAACGACGGCCTGGC




AGGAGGCGGCGGCATCGGAGGAGGCGGAGGCGAGGG




GGAGGCGGGGGCGGTGGGGGACAGTGACACGGACGT




GGGCGACTTCTTGTTGATGGGCATGGGAGACGGCGAT




GGGGACGACACGGCGCCCACGGACGGGGCGGGATTG




TGA





10
96
ATGGCCCCCGCCCCAGCTTTCGAGCCGTCCTGCTCCAT




GCTGTCCGTCTTCAGCATGTGCACCGCGCTACCGCTGG




CGGAGCGTGACGTGAACGGCGCCGGCGCCTGCTTCTC




AGGAGCCTCCGCGCTGGCGTGCCCCTCCAAACCGGCT




TCGATACGCCGTGGGGCGTCGTTCCTCGATGTGGAGG




ATGCCTGTGTGGGCCTGACTAGCGCCGACCGTGCCTG




CTTCCTCATACCTGAGGACAGCGTGTATGTGTCGCCCG




CCTGCTCCGCTCGCGAGAACGCCGGCGCCGGCCCCCG




CCTGCCGCTGCCCAGCGGCACCTTCACCACCGCCGTC




GCCACCTCGACGAGCGGTGCCAGCCTCAGCGGCCTCT




CCGCTGCGCCCACCGGCTTTCTGGCGGGCTGCGAGGA




GTTTGTCCATGCGTCCGTGTGCTTTGAGAAGGCAGCCC




AGGCGCTGGAGGCCGTCACCCGCCCGCCGCCCGCGGT




TCCCTCGTGTAGCCCTAGCACGAGCTCCGGTGCCGCG




AACGGCGCGCAGGCCGACGAGCCCGCTGCCGGTCTCT




TCCGGCGCGTGAGCTCTCTGGCGCCCTCCCCCGCTGCC




AGCAGCCATGAGAACCACCAGCACCAGCACCAGGAC




GGCTCCTGTTGCTCTTCGGCGGAGGCGGTGGAGGCGC




CGGCGGCGCCCGTCGTGTCGGACGGTGCGGCGGCCTG




TGCGGAGCAGCTTCCCCAGCAGGTATTGCTGCCCCAG




GTGCCTCTGGAGCACCACCGGCATGAATACCTGGACG




CGTCGAGCGCAGCGCTGCAGCTGCAGGCTCAGCTGCC




CACGATGCTCGAGGAGCAGCAGCAGCAATCGCCGGA




GGAGGCGGCTCAGCCTGAGCAGTTGCAGCTGCTGCAG




GCGGTCCCGGCCCCGGCTCCGGCTCCCCGGGCCTTCC




ACCACAAGACTGGTGGCCCCTGTGATCACTGCGGCGC




CACGGAGTCGCCGCAGTGGCGCCGCGGCCCGCCCGCC




AAGCCCATGCTGTGCAACGCCTGCGGCACCCGATACC




GCCGCACTAACCAGCTCGGCCCTGTGGGCGCACACAC




GCCGGCGGGCCGTGCTGCAGCCGCGGCAGCAGCTGCG




GGCGCGTCCGTGTCTGGCGGCAAGCGCATCAGCAAGG




GACACGGCGGCGCCGCGGCCAAACGCAACCGTGCGA




GCTACTGA





11
97
ATGGCTCCCACGGCATATATGCTCTTCTGCAATCAGCA




TAGAGAATCCGTGCGCCAGCGGCTAGCAGCAGAGGGC




CAGGAGAAGATAGCGGTGACGGTCGTGGCCAAGGAG




CTGGGCCAAATATGGAAAGCTCTTACCGAGGAGGAAA




AGGCCAAGTACCGGGCGCAAGCAGAGGAGCAGAAGC




AGCAGCAACAGCAGCAACAAGCGGGCGACGGGAGCG




AGACGCAAGGCGAGGGGAACGCGGAGGGGGGCCAGA




GGGCTGGCAGCCCCGCCAAGGCTGCCGCTGCTGCTTC




GCTACCGGCGTCCTGGGTGCGCAAAGTGGTCAACCTG




GACCCTGAAATCCAGCGCTGCTCCGCTGAGGGCGTGC




TGGCGCTGTCGGCGGCCGCGGAGGTGTTCCTGTCCGC




CGTGTGCGCCAAGGCCACGGCGGCGGCGGCGGCAGG




CAAGCGGCGCACGGTGCGCCTGGATGACATGGAGAA




GTGCATTCGGGGCGACAAGCGGCTCATGGCCGCGGGC




TTCACCGCCGTCATCAACATGGTGTCGGCTGCAGCGG




CCACAGAGGCGGAGGGCAAGGCTGCTGCGGTGGCTGC




AGCGGGCGCGCCGCCGGGAAAAAAGCAAAAGGTGGA




CAAGGCCGCCGCACCGGCGGCAGGGGCGGATAAGCA




CAACAGCATTGAGAAGGCGTTTGGTATGGCGTCATGA





12
98
ATGCGAGGCTCCACTGGCGGCCCCTGCTGCCACTGCG




GCACCGTCGCGACTCCCTGCTGGCGAAAGGGGCCCTG




CGACAAGCCGGTGCTCTGCAATGCGTGCGGCAGCCGG




TACCTGGTCAAGGGCTCACTCGCTGGGTACTTCCCTGG




CGCGCGCCGGGCGAGTGCGGGCACCCGTAGCGAGGC




GCCTCAGATTCAGGCGACCGTCGTTTCCGCGGCCGGC




AAGTCTGCTGCGCGGAAATCCGCCGCGCTGTCGTCAG




TAGCCGCATCTGCTGGTGCCAAGCGCAAGGTGCAAGA




GCTGGACGGGAACGAAACCGGTGCCAAGCGCATCTTC




AACAACTACGAGGCCCTGGAGGAGCTGCGCGCGTTCT




TTGCCAGCAGCCGAGGGCCGCAGGCGCCAGCCCAGAC




CTCGGACTCTCAGGACTCGCAAGGCCAATTCCGGGAC




GAGGCGCAGTACCTAGACGCGAGCTCCGACGATGGCC




TGGAGCACCCCGACTCGGAGCCGGTGGCGGCTTTGCG




CCACATGCGTGCCCCCCTCAACGCCACCACGGCGGCA




AACTACTCGGCACCGCACGTGCCGACTTTCCAGCGGC




GGCCGCGCAAGCAGCTGCACCCGGTGCCGTGCTCCTG




CTAA





13
99
ATGGAGGCACAAATAGAGAAGCCTGAGGCAGATGCG




GAGCTGCCGCGAGCGCTAATTCGGCGAATTGTCAAGT




CTAAACTCGCACTCCTCGCGGGCGACGATGCAAAGGA




ATTCAGTGTGAATAAGGACGCTCTTACAGCACTTGCA




GAGTGCACCAAAGTCTTCATAAGCTGCTTGGCATCGA




CTTCCAATGACATTTGCCAGGAGAAGCGGCGGTCAAC




CGTGAACGCTGACGACGTGCTCACGGCGCTGCACGAC




CTGGATTTCCCAGAGCTCGTGGGGCCCCTGCGGGAGC




AGCTTGAAGCCTTCAAGGAGGCAGCAAAGGAGCGCA




ACAAGAACCGGCAGCAGGCCGGCGGCAACAAGAAGC




GCAAGAGCGGCGCCGCAGCCGACGAGCCGCCCCCAG




TGGCGCCGCGCAGCTCTCTGCAGGCGGCGCCAGCGGA




GGCCGCGCCGGAGGCTGAGGACGGCAGCGGCGGCGC




GGGCCCCAGCCATGCCGACGACGACGACGACGGCGC




ACTGGTGCCGGGGACCGGCATGGGCATTGGCGGCGCC




GGCGGCTTTGGCGAGGACGGGCTTGGAGGCATCGGGC




TGGGTGTGGGCATGGGCGTGGGCGTGGGATTAGACGC




GCCGGGGCTGGCGCTGTCTCCTGGCGGCCTGGCGATG




GGCGGCGCGGAGGCCGGCGCGGTGGCGGCGGCGGAT




GTGGCGGCGCACCCGCAGCAGCAGGAAGCGGCAGGT




GCTGCTGCGCAACAGCAGCAGCGAGCAGTGGAGGAA




GTGGCGCCGGAGGCGGTGGTGGAGGAGGAGGTGCAA




GTGGAGGACATGTTGGTCGACGCGCTGCCGTGA





14
100
ATGGACGGCGCCTTCCCCAATCGTCGGGGGGACGGAT




ACGGGGGCAGCCAGGGTGATGGCGAGGGCCAGGGAG




GGAAGCCTCGCGGCTTCAGGGGCACCGCGGAGAATGC




CAAGACCAAGGTCTGCACTAGGTGGCTGCAGGGCGAT




TGCCGCTTTGGCGCGCGCTGCAACTTTGCCCATGGCGA




GCACGAGCTGCGGAAGCTGCCCGAGCGTCAGGGCGG




GCGCGGTGGTGGTGGCCGGGGCTATGGAGGCAATGCT




GGTCCCTACGGTGGCCGGGGCGGCTACGGCGGTGGTG




GCTACGGCGGCCAGCCCGGCATGCCCGGCGGCTACGG




CGGCGGCCAGGGCGGCGCGCCCGGCCCCAACGTGTCG




GAGGACGTGTGGGCGGCGCAGGGCTACCCGGTGCAG




GGCCCTAACGGTTGGGTGCAGTACCGCACCCGCGACA




CCGGGGAGCCCTACTTCCACAACCACCGGACAAACGA




GACGGTGTGGGACCGGCCCGCGGACTGGCCGGTCACG




ATGCAGGGCCAGATCTGA





15
101
ATGCTGTTCAATCCACCTGAGTGGGCCAGCCAACCCT




GTAGAATCGCGAGCCTTGAGGTTTATTCCGGCAACCG




ACGGATTGTTGTTCATCCTGTGGACATCGAGCCCTATT




ACACGTTCGGACGGCAAGCTGAGTCGGTGTCAATTGC




ACTCGAGCACCATTCGTGTAGCCGCGTGCACGCTGCT




CTCGTCCACCACAACGACGGTCGCATCTTCTTAATCGA




CCTCCAGTCGACACAAGGCACGACTGTTGACGGCCGC




CGCATCGCACCCAACAAGCCGGTAGTGCTTAAAGACA




ACACGCGCATTCGCTTCGGCGAGCTAGAGTACGACTA




CGTTCTTCGCTGCGAGTCTGCAGCCGAGAAGCGCTCC




GCCGCCGGTGACCCCGACGCCGCCCACGCGCAGCCGC




ACAAGCGCGCCGCCATGGCCGACGCCCGCGTCCGCGC




CTCCCACCTGCTGGTCAAACACAAGGACGTGCGCCGC




CCCAGCTCCTGGAAGGAGCCCGTGGTGACCCGCACCC




GGGAGGAGGCGCTGGCCATGATCGAGCACTTCCACTC




CATGCTGGTCAAGGGCGAGGTGGAGTTCGCGGCGCTG




GCCGCACAGGAGAGCCACTGCAGCAGCGCCAAGCGC




GGCGGGGACCTGGGGGAGTTCGGTCGCGGCGAGATGC




AGAAGCCGTTCGAGGACGCCACCTACGCCCTCAAGGT




GGGCGAGCTGAGCGGCCCCGTGTTCAGCGACTCGGGC




GTGCACCTCATCCTGCGCACAGGCTGA





16
102
ATGTCCGGCGACAGCAGCGCCGGCGAGCGCCGTAGGC




GATATCCACTGGCTAACATAAAGGGCGGCTGGTCTGC




GGTGGAGGACACAACACTGAAGAGGCTTGTGGAGGA




GTTTGGTGAGGGCAACTGGAGCGTCATCGCCCGTCAC




CTTAACGCATCGCTGGGCAAGCCCTCGGACTCGGGCC




GCATCGGCAAGCAGTGCCGCGAGCGCTACAACCACCA




CCTTCGGCCAGACATCAAGAAGGATGCCTGGACTGAG




GAGGAGGAGTCGCTGCTAGTGGCGGCACACCTGCGCT




ACGGCAACCGCTGGAGTGACATCGCCAAGGTCATTCG




CGGCCGTACCGAGAACGCAGTGAAGAACCACTGGAA




CGCAACCCTGAGGCGCAAGGACGGCGACAAGGCCAT




CCGCAGCGGTACCGCACCGCAATCGTGCGTGCTTAAG




AACTACATGATCCGCCTGCACCTGCTGCCCGGGCCAC




CAGTCGGCCCGACCGCCGCCACGACGGCACTGCCTGA




CAACGCGGCGGCTGCCGTTGCACCGCTCCCCGCCAAG




CCCGTCGCCAAGCGCGCCCGGTCCTCGGTGGCGGCTG




AGTCTCCCAAGGTCGCTGGTGGCGTCCACCCAGCGGA




CCCGGCGCAGCCCGGCCCATCGCCCTCCTCCTCCACC




AGCACTCACGACGGCGTCAGCTCCAGCCCGCACCGCA




GCTTTGATGCCAGCGTGGCGTCGCCGGCCGGCGGGGC




AGCCGCCAACCGCAAGCGGCCGCGCATCATCACTTTT




GCCGCCGCGCCCGACCCGGCGGCCGCTATCGCAGCCT




CCACCCTGTCGCGTCACGCTTCGCCGGCGCCCCTGGCT




GCAATGCCCATGCAGGACGGCATGCCCATGCCCCTCT




TCGCGCCGCTGTCGCTCCTGGCCGTGCCCAACTTAACC




GGCCAGGTGACAGCCGCGCCCACGGCGCCCGTGGCGA




TGCGGATGCAGTTCCAGATGCAGCAGCAGCAACAGCA




AGACATGCACCCGCAGATGCAGCAGCAGGTGGCCATG




CAGCCGTCCGCGCCGGCCATGCGTCGCCCCAGCCCGC




GTCCGCAGCCGGTGCAGCAGCAGCAGCAGCAGCAGC




AGATGCGCGGCAGCAGCCAGCCGCGCACGTCGCAGCC




ACCGCAGCGCGGCTCGGCGCCGCTGGGCTGGGCGTCC




GACAGCGCCGAGGACAGCCTGTACGGCAGCCCCGTGT




CTGACAGGTTTGTGGACATGCAGTTTGAGGAGGACTA




CCTGTGCAGCCACGGTGCCGGGGGCCAGAAGGCGGCA




GCGATCGCAGCCCCGGCCTCCTATAAGGCAGCTGATG




AGACGCAAGGGCAGGAGCTACAGCTGCAGTTGGCGG




GCGTGGGCAGCAGCGAGGTGCAGGCGGCGCAGATCA




TGCTCGCCCTGCGGAGCCTGGCGGGCGGCCTGTGA





17
103
ATGGCGCCGAAGGCAGCCCCCAAAGTAGACAAGGCG




AAAGCGGCTGCCAAACAGAAGGCCGCTGAGGACAAG




ACTTTCGGCCTTAAAAATAAGAACAAGTCGGCCAAGG




TGCAAAAGTATGTGCAAAACGTCAAGACGAACGCGAC




GCAGAACCTTGGCGCCTACAAGCCCGTGGAGGCGAAG




AAGAAGGACAAGGCTCCGGATGAGCTGGGCAACATTT




TTCTGCCGACCATTAAGCAGCCAAAGGTGCCGGACGG




CGTGGACCCCAAGTCCATCGTGTGCGAGTTCTTCCGCC




ACAACCAGTGCACCAAGGGCAACAAGTGCAAGTTCAG




CCACGACCTGTCGGTGGAGCGCAAGGGCCCCAAGATC




TCGCTGTACGCCGACCAGCGCGACCTGGGCAAGGACG




GCGAGGACAAGGAGGGCATGGAGGACTGGGACCAGG




CCACGCTGGAGGCGGCGGTGAAGCAGAAGCACGCCA




ACGAGAACAAGCCCACGGACATCATCTGCAAATTCTT




CCTGGAGGCCGTGGAGAAGAAGCTGTATGGATGGTTC




TGGAAGTGCCCCAACGGCGAGGACTGCAAGTACCGGC




ACGCGCTGCCGCACAACTACGTGCTCAAGAGCCAGAT




GAAGGAGCTGCTAGAGGAGGAGGCGCGCAACACCAA




GGACATTGCGGAGTCCATTGAGGAGGAGCGCGCCAAG




GTGGTGGCGCGCACGCCCATCACCCAGGAGACGTTCA




GTGCCTGGCACCGGGCGAAGCGCGAGGCCAAGGCGG




CCAAGCGGGCGACGGACGAGGAGGAGCGGCGCAAGA




AGGGCATCCTCAACGGCCGCGAGATCTTCATGCAGGA




GGGCTTCGTGGCCAACGACGACGCCAGCGCGGCGGAC




GAGTACGGCTTCGAGGTGGACGAGGAGGAGGAAATC




AAGGCCATGATCGAGCGCGCGGCGGCGGCGGCGGAG




GCGGCCAGGCAGCAGGCGGAGCTGGGGCCAGTGCCG




GAGGAGGCGGAGGAGGCGAACGAGGGCGCGGGGCCA




TCCGGCAGCGGCGCCGGGCCATCCACACACCTCAACC




TAGAAGACGAGGAGGCGCAGGAGCTGTTCGATGACG




ATGATGACGACGACGAGGAAATGGAGGACGACGAGG




AAATGGACGACGACGACGACGACGACGACGAGCTGG




AGGGGCTGGAGGACCACGTGAAGGGGATGCACGTGG




GCGGGGCAGCAGGGCAATGA





18
104
ATGAGCGGCGAGCCCTCGCCCCTCGAGGAGCAACCGG




ACCTAGATAACTCTGAGGACCTACACAACAGCTCTGA




CGCTGCGAACGCCAGCAGCCGGAAGGGTCAGCCATGG




AGCGAGGAGGAGCACAGGGCGTTCTTGGCAGGCCTGA




AGTCACTCGGCAAAGGTAGCTGGCGACAAATTAGCCA




GCAGTTCGTGCCGACGCGGACCCCTACGCAGGTGGCC




AGCCACGCACAAAAGCACTTTATGCGTGTAGCCGGTG




CTACCAAGCGGAAGAGCCGCTTCACGGCGCTCGAGAC




CGAGGTTCTGCCGCCCGCCAAGATTGCTCATGTTGATT




CGAGGCAGCACGGTTCGGAGCAGACGGAGCAGCTGG




AGCCGCAGCCCCAGGCGCAGGCGCGACAGCCGGCGA




TGGCCCCGCAGGCGCAGCAGGCAGGCGCACCCGCGG




CCTCGCAGTTTGGGCCGATGGCCGCCTTTGGGCCTATG




GCTGCGTTCCCGTTCATGAACCCCATGATGTTCGGCTT




CCCGGCGCCCTTCTTCCCGCCCTTCATGTGCCCGCCCC




CCGCCTTCGCGGCCGCGGCGATGCAGAGCATGAACGC




GATGCAGAAGTCTGGTATGGCTCCCGGCATGATGATG




CCGCCGCTGTTCGCGCCCATGATGGCCGCCATGGCCG




CAGCCTCCACGCCCTTCTTCATGGCGCAGCAAATGCA




GGCCATGGCGGCGCAGGCGGCGGCAGCGCAGCAGCA




GGCGGCGCAGGCCGCAGCGGCACAGCAGCAGCAGCA




GTACGCAGCGACGCAGGCGGCCACCAGCGGCGCCGC




CACCACGGCCGGCACCGCCACCGCCACATCCGACACA




GCCAACAGCGATGACGCGGTGCGGCGCCGCCACGCCT




CCGTCGCCGCGCCCAGCGTTGGCAACAATGCCGGCTT




GGGCGGCTCCTCGCCTGCGGTCAAGGCCGAGCCCGTG




TTGCACGTGCAGATCCCCGCGCGGCCGCCGTCGGCCT




GCGGCGTCGCCGGCAGCACCAACACCAGCCCAGGCCG




TGTTGCGGCCGCGACGCCGGGGCCTGACGCAGTGGCG




GCGACGGGCGGAGAGTCGCCGGCAGCGGCACAGGCC




GGCGCCAGCAATGCGGCGCCGCCGCGGGAGCAGGCG




AAGAGCTGTGGCGGCGCCCCTGGCGGCGTTGGTGCCA




GGTGTAGCGGCAGCGGCGTGGCGGTGCCCGCGGGCGG




CTGCGGCCTGGAGCAGCAGCAGCAGCCGCTGCAGCGG




CGGGTGTCGGGTGGGCGCGGCGAGGAAGGTGGTGCG




GCGTTGCCCTTCCATGCGTCCTCGCACTCGGCTTTCCG




GCCGCCGCAGGCGCAGCAGGAGATCAAGGCCGAGAG




CTAG





19
105
ATGGTAGACGGTGGTTCGCGTGCTGCCTCTGGCCAGC




TGGATGACTGGGCCGCAGGCGTCGCGGCTGACCTAGA




CCAGGGAGAGGGCGACCGCGCAGGGGCGAGGCGACG




ACCTGCGCGCGACGCCAGCCCGGCGCCGGATGCTCGC




AAAGTGACAACGTTCACAAACAAAAAGCGCCCGGCAT




CGGACAGGGACAGCAGCCCGGAGGAGGACGACGAGG




AGCAGGCTCAGAAAGGCTCCCTCAAAGCGGATGGAAC




TCGCCCCAAGCTTCCACGCCCCGACAAGAAGGAGGCA




TGCCCTCGCTGCAACAGCATGGACACCAAATTCTGCT




ACTACAACAATTACAACATCAAGCAGCCCCGCTTTTA




CTGCAAGACGTGTCAGCGGTACTGGACTGCCGGCGGC




ACGTTGAGGAACATCGCTCCGGGCTCCGGTCGGCGCA




AGAGCAAGAGCAAAGCCGCGCGTGAGAAGAACAGCC




CCTCGCTCGCCGAGCAGCTCACGGCGGTTGCGGCGGG




ACAGGGCATGTTCGGGCTCGGAGGCGGGGGCGGGTAC




AACGGCATCAGCCCGGCGCTGGCGCTCGCCGCGGCCA




CCGATCCCACAGGGCTGCTGGCCGCGAATAGCGCCGC




GGCGTACGGTCTGGGTGGCCACGGAACCATCTCCGGC




CTGAAGCTGGGCGGTGTGGGTGGGCTGCCGGCGCAGT




TCAACAGTGAGTTGGCGCTGCGGGAGCACCTTGCAGG




GCAGCACAGCCTGGAGACACGGCTGCTGCTGAACGGG




CACCTCAGCGCCGAGGACCTGCCGAACGGCATGTCGG




CGGCGGCGCTGGCACAGGCCAGTGCACAGCTGCACGC




TCTGCACGGGCAGGGCAGTGGCATTGCGCAGTCGCTG




GCGGCCGGCAACGGGCACACGGGGTCGCCCTCGCCCT




CACCTCCTCCGGCCGGGAACGGCGGGCAGCAGCACCC




GCTGTCTTCCTCCCCGCAGCACGGCGGCGGCTCGCAG




GCCTCGCAGCAGCCGTCTCCTCCTCAGCAGGGCTCGG




ACGACGCCGAGGGCGGTGGCGAGGAGCGCTATGTGG




CGCAGGGCCGCCGCGTGCGCGTGAAGGCGGAGTTGGA




CGGCAACGCCGTCAGCAGCAGCCTCGCAATGGGCGGC




GGCGGTGGCTCGGGTGCGTACGCCAACGGCGCTAGCA




TTGCCTCCTCCATTGCCAACGCCCAGCTCGCGGCCAGC




CTCAGCATGCCGCCCAGCATGGGCGCGCTGGCGGCTG




TGATGGGCCCTGGCGGCGGCCCCAGCGGCCTCCACCC




ACTGCTTGCGCAGGACAATGGTGGCAGCCTGCTTGAC




GCCGGCCTGACGCGGCAGCAACTGCTAGTGCTGCAAC




AGCACCAGGCCATGCAGCAGGCGCAGCAGCAGGAGA




GCCTCCAGCAGCTCAGCAGCTTGCAGCAGCTGCAGGG




CCTTGCCGCGCTGCACGGCCAGCACTCGGCGGCGGGC




CTGGCGGGGCTGGACCCGCTGCAGCGCAGCGCGCTGC




TGCACTCGGCGGCCGGGCTAGGCGGCGTGGGCGTGGG




TGGCTGGCTGCAGGGCGGCGGCGGAGGGAACTCGCTC




GCAGCCGCTGCTGCGCTGGAGTCGCTTCAGGCGCAGC




ACCTTCTCCAGGCGCAGCAGGTGCACCCCTCGGCGGC




CGCTGCCCTCATCGGTGGCGGTGGCAGCAGCGCCGCA




GCGCAGATGTTGCAGGCGCAGGCCGCCGCCGCCGCCG




CGGGTGGGGGCGGAGGCTGGCAGGGCGTGGCCTCAG




CAGCGAATTGGCCGTCGGCCTGGTCGTCGTACAGCGG




CCCGTCGTCTGGCAGCTACGCCGGCTACGCACTGCAG




GCGGCGGCCGCTTACTCGGGTGCTAGGTGA





20
106
ATGGACCAATACCAGCTTGCTCAGCTTCAGCAGCGGT




TTCAGGAGGTTAACCTGAGCGGCGGGGTTGACCAGGG




CGCCATGCTCAAGTCAGCAGGTGACCTGCTGTCATCC




GCTGAGGCCACAACACAGTACAGCTCATCAGAGTCTA




GCTCTGGAGCCGACAACTTGAACCAGCTGGACAGCTC




CAGCCTCCTGGACACAGGCATGCTCGCTACAGCGCGG




CAAAGTGATGGCGCGCGCTCTACCGGGCAACCGTCGC




AGGAGGGAAAGGCGCAGATTTGCTTCGACTTCACAAA




GGGCGTGTGCTCACGTGGCGACAAGTGCAAGTACTCG




CACGACCTCGCAACCATCGTGCATTTCAACAGCAAGG




AGAAGGGCATCTGCTTTGACTACCTGCGCAACCAGTG




CCACCGCGGCCTCCTGTGCCGGTTCAGCCACGACCTCT




CAAACATTGCGCAACAGTGCCAGGTGAACAACGGTGT




AGCCCGCGGTCCGGCACAGGGCGCCAAGCCAAACGC




CATCTGCTACGACTTCGTCAAAGGCGTCTGCCAACGC




GGCGCGGAGTGCCGCTACAGCCACGACCTGTCCCTCA




TCGCGCGCATGGCCCGCGGCGGCAGCGCGCAGCCCAA




GGCTGGCGAGGTCTGCTACGACTACCTCAGGGGCCGC




TGCAACCGCGGCGCCACCTGCAAGTACTCGCACAACA




TCGCCTTCCTGGCGGCGCCCGGTTTCCTGGGCAACGCC




ATGTCGTCGGACGGTGTGCCCATGGCTGCGCAGGCGC




CGGGCGGCCACATGTCGGCTGGCGGTGCGCCGCCGCT




CGGCCCCATGCCTGTCCCCGGCGGCCCAGGCTTCATG




GGCATGGGCGGCATGTCCGGCATGGGCCCGCGCCCCC




TGCACACCGCGCTGAGCGCCGACCAGGCCACGCTGAG




CCACGTCCTGGCGGCGGCGGGGCCGGGCGCCGTCAGC




CAGATGCTGGCGGCACAGGCGGCGGCGCAGCAGAGC




AACGGCTTGGCGGCCGAGGCGGCGGACGGGCGCCGG




CGCTCCAACAGCCTGAACGGCGACATGGGCAACGACA




CGCTCGCCGTCAACGACCAGCCGCACTGGAACGCCAA




GGGCCTGGCCATGGCACAGCACGCGGCCATCATGCAG




CGCATGGCGGGCATGGCGGCGGCTGCTGGCATGCAGC




AGGCCTTCGGCGGCGGCATGGGCCAGGGCATGCCCGG




ACGAGGCATGCCGCCGGGCGCTGACGCCATGTCCCAC




TTGTACGGCAAGCCGCCGCCATCCATGGGCTCCTACG




GCGGCCACGACACAAGCGCGGGCATGCGGCGGCCGC




CGCTGCCGCCCGGCGGCGGCAGCGTGCCCGCCGAGTT




TGCGGCCCTGCTGGCGGCCGGCGGCATGGCGGACAGT




CATGCGCTGTACGCTGAGACGATCAAGGCGCAGCTCC




AGGCGCAGCAGGGCGCGCGCATGGTGCCCAACCTCAG




CGGTGGCGGCGCGCCGCCCATGATGGCCGCTGCGCCG




CAACCCATCCCCGGACGCGACAGCCAGGGCTACGACG




TCGCCGCGGCGCAATACGCGCAGCAGGGCGGCTCGCA




GTCTGGCGGCGGCGCGCCATCCTCGGACAGCGGCAGC




CTCTCGCGGAGCGCGCCGTCGGCAGGCGCCCCGGTCA




ACCCCGACCTACTCCCGATGATCAAGGAGATTTGGAG




CAAGCCCGGGCAGATAGCGGCATGA





21
107
ATGACAATCCCTGACGAGGAGGTTCTCACTAAGCTGC




GTGAGCTTCTGAAACACGCAGACCTGAATGTCACCAC




CGAAAAGATGCTGCGCAAGCAGCTTGAGGAGCACTTT




AAGCAGGACATGACAGACCGGAAGCCCATTATTCGAG




CCGAGGTTGAGCGATATTTAGCTGAGGGAGCAGGGGA




TGAGGAAGAGGAGGAGGAAGAGGAGGAGGACGACG




ACGACGCGCCGGCTCGGGGAAGCGGCATGGGCTCGTG




GTTGTCAGAGCCGCTGCAGGCCTTCCTGGGGGTGGAG




TCGCTGCCCCGCACGCAAGTAGTCAAGCGGCTGTGGG




AGTACATCAAGGCCAACAACCTGCAGGACCCCAAGGA




CAAGCGCAAAATCCTGCTGGATGACAAGCTCAAGACA




TTGTTCACCTCGCCGCTCACCATGTTCACCATGAATTC




GCAGCTGAGCAAACACGTCAAGGTGTATGACGGGGAC




GATGAGGAGCCCAAGGCCAAGTCAGCCAAGCGGCCA




GCGAGCAAAGCGGGCAAGGAGAAGCCCAAGAAGGTC




AAGACCGAGATGGATGAGGAGAAGCGGAAGAAGAAC




GCGTTCACCAAGCCCGTGCGGCTGTCCCCGGAGCTGG




CGGCGCTGACGGGCAAGGAGTCCATGGGGCGGCCGG




AGGTGACGTCGTTCTTCTGGGCGTACGTCAAGGAGAA




GGGCCTCAAGGATCCCGCGAACGGCCAGTTCATTATC




TGCGACGCGGCGCTCAAGAAGATCACAGGCGAGGAG




CGCTTCAAGGGGTTTGGCTTCATGAAGTACTTCGCGCC




GCACATGCTCAAGGACTGA





22
108
ATGGCGACCAACCTGTGCGCCGAGTGCGGCATAAAGC




TGTCGCGGCCCGAGTATCAGAAGCACATGCAGGAGGT




GCACGGCGTCTCCATCCAGCACGACAGCGACGACGAG




CGCGATAAGGAGGCCCCGGCCGCCGGCGAGGACGGC




GCCGATGCCAAGCCGCAGCGCCAGCGCCGCCGTGGCG




GCAACAAGGAGGGCGCCGCCGAGGGCGCGGAGGGTG




CCGAGGAGGGCGCCGCCGGCGAGGACGGCGCCCGGC




CTCCGCGCGAGCGTCGGCGCCGCGGTGGGCGCAAGCC




CGCCGGTGAGGGTGCAGATGGCGAGGCTCCCGCTGGC




GACTTCGCCTGCGGCGACTGCGACCGCACCTTCGCCA




GCCAGCAGGGCCTGAGGCGCCACCTGCAGGCCAAGC




ACCCTGAGTCTGAGGCGACGGCCGCTGCGGTGGCCGC




CGCGGCAGCCGAGGCCCCGGCGGCCGGCGCGCGCCGT




GGTGGCCGCGGCCGTGGCCGCGGCACCAAGGCCGAG




GCCGGTGAGGGCGCCGCGGATGGCGCGGCAGCTGAC




GGCGCGGAGGGTGCCAAGCCCGCGGCGGGTGGTCGC




AGCCGTGGCCGTGGCGGCCGCACCGGCCGCGGCCGCG




GCGCCGCCGCTGCCCCCGTGCCGGACGACCCCACCGC




CGCCGCTGCAATGGCTGCGGCTGCGGCCAAGGCGGCG




ATTGGCGGCGCGGCCGCTGAGCCCGCGGCGGAGCGTC




AGGTGACGCTGTTCCGCTGCAAGCAGTGCGAGCAGGG




CTTCAAGAGCCGCAATCGCGCCCGCGAGCACGTTATT




GAGGCACACGCCGCCGACGTGCCCGCGGAGGCCCCTG




CCGAGGCGCCGGGCGTCAAGCCGCCGCCGCCGGAGG




GCGTGGAGCTGCCGCCGGGCCGCCGCGCGCCGGCGCC




GGTGGTGCCCGTGCCCGCTGACGCCCTGCTGGAGGTC




GCGGAGGTGACCACCAAGCGCGCGCCCCGCGCGTCGC




GCCGGCGGAAGCCGCGCACGGCCAACGGCGACGAGC




CGTCAGGCGATGCGGCGGAGGGCGAGGAGGGCGCCG




CGGAGGGCGGCCGCGCGCGCGGCGGTCGCCGCGGTG




GTCGCGGCGGCGACGCCGCGGCTCCGGCTGCCGGTGG




TGACGCGGCCGCTGCCTCCGGTGATGCGCCCGCGCCT




GCTGGCGGCGCCAAGCGCAGTGGCGCGCCCACGGAC




GAGCTGGCGGCTCTGGGCATCACCGCGAGCTAA





23
109
ATGGCGCTTCCAGGCTCCACAATGAACCTTACAACCC




GCTGCTCTACTACACCGCGGTCGGCTGTGGTTGCGCGC




GCGGTGGCTGCGCCCACGCGACCCACCACCAAGTCTG




CGGTGCCAGAGCTGCTGGATAGCCGGCCAGGCGAGCG




CAATCTCAACTTCATGGAGTATGCTCAGGCGACTCAG




ATGCTGGACCGGCTCAAGGGCCAGGCCTCTGACCTGG




AATTGCTGCTGGACCAGCTCAACGCGCTGGAGGCCAG




CCTCGACGAGAGCGTTCTGGCGCCGCCCACGGTGGAC




GACCCCAAGGAGCGGGCTGCGCGACAAGCACGGCGC




GCTGCCAAGCGTGCAGAGCGTAGGGCCCAGGCGACAT




CCGCAACAGTCGCGGCCGCGGCTGGGCCGGCAATGTC




AGCAGTGGTCTCGCATTCCACGCCGACGAAGGCTGCT




GCTGCGCCGGCCACGTCAACAGCGAGCAGCAGCTCCA




GCGATAGTGGTTTGCTAGACCTGGTGAGCTTTGTTGGC




GGCTTTGACACGCGGCCGATCCCGGCAACGACGTCTG




CACCCCCTGCTGGCGCCAGCAGCTCCGACGTGCAGCA




CCTGGAGGACCTCTTCAAACTCAGCGTCGGCGAGCCC




GACATCCCCCGGGCCTCCGCTTCAGCAGCGCCTGCGG




TGCTGCGGCCACGCAAGCTCACACCAAAGAAGCCCTC




TGCGGCACCCTCCGCGGCGGTGACGGCAGCACCCTCG




CCGGCACCCACGCTCCCCAGCACGCCCAGCACCAGCG




CGCGCATTGCGCCCGCGCCCGGCTCCCTCGCGGATGA




GCTGGAGCGGTTACTGGGGCCCACCACGTCACGGGAG




GCGGCTGAGTCTGAGGACGAGGACAGCTTCGCGGGGC




CGTCTGAGGACGACCTGCTGGCGCTGGAGCAGGAGGT




GTCGCGCAAGTCGTCACGGCTGCCTGTGCTAGACGAG




GAAGACGAGGAGGATGAGCAGCAGCAGCTGGAGGAC




AACGAGGAGGACGCGGTGGCGGGGCCCGGCTCTTTGG




AGGCGTCGGCAATGGCGACTCGGACGTCCAGCCAGCT




GTCCATCATGCAGACGGGGCCGTCGCTGCTTAGCCTG




GTCCCAGCATCCGCGGCGCCAGGCCGCAGCGCCAAGG




CGCGCGCCTCCCGGCGCGCGGCGCGCAACGGTCACGC




TAGCGGGCGGCTGGGTGGCGCGACAGCTAACGCGGCG




GGGCGGGGCAAGGTGGGCAGCAAGGACGGGACCATG




AACTTCCTGGGCAAGGTGGAGTCATTGTCAACGCTGG




ACGTGGAGAAGGAACGCGAGGTGACGGCAGTTTGCC




GCGACTTCCTGTTCCTGGAGAAGGTGAAGCGGCAGTG




CGAGAAGACGCTGCACCGGCCCGCCACGTCTGAGGAG




ATTGCGGCGGCCGTGGCCATGGATGTCGAGAGCCTGA




AGCTCCGCTATGACGCCGGTCTGAAGGCCAAGGAGCT




GCTGCTCAAGTCCAACTACAAGCTGGTCATGACGGTG




TGCAAGTCGTTTGTGGGCAAGGGCCCGCACATCCAGG




ACCTGGTGTCGGAGGGCGTCAAGGGCCTGCTCAAGGG




CGTGGAAAAGTACGACGCCACCAAGGGCTTCCGCTTC




GGCACGTACGCGCACTGGTGGATCCGCCAGGCCGTGT




CGCGCTCGCTGGCGGAGACGGGCCGCGCAGTCAGGCT




GCCCATGCACATGATCGAGCAGCTGACGCGGCTCAAG




AACCTGTCCGCCAAGCTGCAGACGCAGCTGGCGCGAG




AGCCCACGCTGCCCGAGCTGGCCAAGGCGGCTGGTCT




GCCTGTGACGCGCGTTCAGATGCTCATGGAGACGGCG




CGCTCCGCCGCGTCCCTGGACACGCCCATCGGCGGCA




ACGAGCTGGGCCCGACCGTGAAGGACTCCGTGGAGGA




CGAGCGCGAGGCGGCGGACGAGGAGTTTGGCAGCGA




CAGTCTGCGCAACGACATGGAGGCGATGTTGTTGGAG




CTGCCGGAGCGCGAGGCGCGCGTGGTGCGGCTGCGCT




TCGGGCTGGACGACGGCAAGGAGTGGACGCTGGAGG




AGATTGGAGAGGCGCTGAACGTAACACGCGAGCGCAT




CCGTCAGATTGAGGCCAAGGCGCTGCGCAAGCTGCGT




GTGAAGACTATTGACGTGAGCGGCAAGCTGATGGAGT




ACGGCGAGAACCTGGAGATGCTGATGGACGGCTCGCG




CGAGATGGCTGCGCGCACCAGCAGCGGCACCCGCAA




GACGTAA





24
110
ATGGACGTGGATGGCCTGGACCTGGCGGCCCTTCTGG




CTGAAGGGCCAGACTCGGGAGTCGGCCCGTCGCTTCT




GGACGATGAACTGTTTTCCGAGGATCTGATGCAGTTCT




TGGAAACGATAGAGGGCCAGCCGACTTCAACGCAGTG




CCACCAGAAGCTAGCCGCACAGCAACAGCAGCAGCC




GGTGCCGGCGCCTGCCCCAGCTCCTGCTCCCGCGGTG




CCCATTCCTGTTGCGACCTCCTCGCCCGCGGTCGCGAT




GTCGCCCACTGCGTCGACCTCGTCCGGCTGCTCTTCGG




GCGTCGTGGCTGCACCTGCGCCCATTCCTACACCAGTA




GCGCCAGCAGCTGCGGCCGTGGCCCTGGCTGCGCTGC




AACAGGCGCAAATGCAGCAATTGCAGCCGGCCTGCGC




TGCCATGCTGCCGCGCCTGGTCACCACAACAGCAGCG




CAGCAGATGTGGACTGCTATGCTTCAAGCAGCGTGCA




CAGTTACGCCGGCACTGGCTGCCGCTCACGCACCTGC




TGCGGCCTCGGTCGATGACGCTAAGGCGCGCGCCCAG




CCCGCTGGGACGAGCCGACAAGGAAGCCGCGAGGAC




TCCGGCGACTCCTCCGACACCGACCAAGATGATGATA




TGGTGGACTCCAAGGGCAAGTCCGTGGGCAACAAGCG




CAAGGCACCCGAGGTGGACTGGCGGCAAATCGAGGA




CCCGGCGGAGAGGCGCCGGCAACGGCGACTGGCGAA




GAACCGTGTTACCGCGGCGCGGTCCCGCGAGCGCAAG




AAGGCCGCCTGGAGCGAGCTTGAGGAGCGCCTGAAG




GGCATCGAGACCGAGAATGCGCAGCTCCGCGCCATGC




TGGAGACCTTCGCGCGCGAGAACACCGCCCTCAAGGC




GCAGCTGCTCACCGTGGCAGCAGCCGGCGGCGTGCCA




GGCCTGAACCACGGCCAGGCGGGCAAGACCATGGAC




CCTGCTAGCGTCCTCCCAGTATTTATAGCTATCATGCT




GGTGGTCTCTGCCCTCCTGCCTGGTGACAAGGCCTGCG




CGCTGCTCGGCTCGCTGCTGCCGCTGGCGCTGATCGCC




TCGATGATGGGCGCCGCCGGCTCGGGCGCTAACGCCA




ACGGCGGCGCCGCCTTCGACTGCCTGTTCCGCCTAATG




CACAGCCTCAGCACGCTGCTATCCAAGAGCAGTAGAA




CGCTGCAGCGCAGCCTAAAGCGCATGCTACTGGCTCG




ACAGCGTTATCTGGGCGCCAAAGGCATGGCCAAGCTC




GGCACCGCCGGCGCGCGGCTCTTCGACCAGCTCCTGA




CGACCCCTTCGCCAACGTCGCCGAGTGCCGCTGAGGA




CCCCGGGATAGCGCCTGGGTCTCCTTCGGACTCGGAC




GGCCGCAACAACGCCGACATGGATGTTGACGTGGCCA




CCGTGCTTGCCGCAGAGCCGGCCGAGCAGGCGCCGCC




AACCGCCACCTGCGCCGCTGCCGTATTGGGCGCTAAG




CCCACGGCGGAGGCGCCGGTGGTGGCGATGGCGGGG




GCCCTGCAGGCGGGCTGCGGAGGCGTGGTGGTGGTGA




AGCAGGAGCCGGTGTGCTAA





25
111
ATGGACACCAGCATTCCATTTCCGCGACCTATCAACG




CGCGGGGCCCTGCTCCGGGCCAGACTCCATCTCAATT




GAGCTCGCTGCCCCCGAGCCTTCAAGCGCGGCTCGGA




CTGGGCGCCACGCACGACTCGCCTGTTCTGCTTCCACT




ACTCCAGCAGGTCGAGGCTTCTCCTACAACCGGCATT




CATCAGCTGTGCCCGCCGCTGTTCCAGCCAGCTCAGC




CGGCTCGGGTGCCCCTGCCGATTCCAGCCCGAACGGA




GGCGGCCTCGGCAGCGCCAGAGCCCACTCGGGCCATT




AAACGCGAGTACGAGCCCCGCGCTGGAAATGGCAAA




CAGTCAGTGGCCAACTCGGACGGCTGGCAGTGGCGGA




AGTACGGCGAGAAGCTGGTGAAGGGCAGCCCGAACC




CGCGCAGCTACTACAAGTGCAGCCATCCGGGCTGCCT




GGCCAAGAAGATTGTTGAGCGCTCCGACTCGGACGGC




ACAGTGCTGTCCACGGAGTACAAGGGGGATCACTGCC




ACCCGGCGCCCAGCGCCGTCAAGGCCTCACGCTTCAA




GCCGAAGCCCAAGACGGAGCCGCCGGTCATGGTTGCA




CCGCCAGTGTTCAGTGCCGTCGACATCACGGTGCCCA




ACGGATTTCCGCCGGGCGCGAACGGGCGGGTCGGCTT




TCCGCTGTCTGGCGGTGACATGCTCCCCATCCCGGAG




GCGCTGAAGAGCGACTTCCCAGTGCCGCACGCTGCTG




GTGCGGCGGCCGCACACGAGGACGACACGGACACAA




GTGAACCGGAGCCCGCTGCGGCGCTGAAGGCGGCGCC




ACAGGACACTCGTGCTGCGCAGGCTGCCGCCACTGCT




ATCCGCAAAGTCCGCGACAGCGCTGAATCGCCGAGCA




AGCGCCTCGACATGCTGGCAGCGTACGCTGAGGAGGC




GGAGCGCCAGCTCAAATCAAGCAGCAACAGCCCGGA




GCAAGGCCCCAGCGCCAAGCGCCAGCGGACAGAAGC




TGGGGCTATGCGGACGCGCGCCAATCCCGACGATGAC




GACGATGGCAGCGGCGCACCTAGCACGTCGGGCATGC




AGCGTGTGGTGGACATCACCAACATGGACGATGGCTA




CAGGTGGCGCAAATACGGCCAGAAGCAGGTGAAGGG




CAGTCCCTTCCCCCGCGCGTACTACAAGTGCACGCAC




ATGGGCTGCTCGGTCCGCAAGCACGTGGAGCGCAGCG




CGGAGGACGAGACACGGTTCGTAGTCACGTACGAGGG




CACACATAGCCACCGGCTACCAACCGGGAGCCGGCGG




CGGAGCGCCAGGGATATGGCGGAAGATGACGAGGAT




TACGAGGGCGAGGACGCCGAGGAGGACAGCTCGCAG




CCCACCAGCCCGCAGTACGGCAATGTCAACGGTTCGG




GGGGTCCGGGCCAGCACGCAGCCTCCAAGGCCGCGGC




GCAGGGCGCGCAGCTGGTGCACCCGTCGGGTGCGCAG




CCGGCCAGCGCGGACTTCGGCCAGCAGCTGCAGCAGC




TCTCGACCAGCCTGCTGGCGTCCACCGTACTGCAGCA




GGCGGCACTGAGCGGCGTGCTGCCGCTGCTGCAGTAC




AACTCGCTGTCGTCGGAGGCGCTCGCCAGCCTGGGCG




TGAACTCGGAGGCGCTCCAGGGCGTGGAGCAGCTCAA




CCTCGCGTCGGTCGGCAACTTAGCCGACTTGACCAAC




CTTCTTCGCCAGCACGCGCAGATGGACCTGGCGCTGG




CAGCGCAGGCTCAGGCCATCGACGCGGCGAACGCGA




ACTGGGACCCGCTGGCGTGCCTTATCACGCCACGGCC




CAACGTCTCGCCGGCGGGCCAGGGTCACGCCATGGGC




CAGGCGCCGTCCGCGGGCACTGGCCGGCAGACTAAAG




CAGCTGTGTTTCAGAAGCAAGTGGCGACTACTGAAGC




GTGA





26
112
ATGGACTCTGATAGCGACGATGAGCGTGCGGCGGGCT




ACGTGCCAGTGTTGGCAGCATCAATGCCACGAGCTGC




TGCAGCGGCGGCAGTGGCCAGCCCCGCGGCGAAGCA




ACCTTCCAACGTTCTACAAGATGGTGTTTCGCTTTACA




CCAATGAGCTGTTCACCGACAACAACGGGGATGTGCT




GGGCGAGGGTCCTGGGCTCGCGTCTCCCAGCGGAGCG




GCGCCCGGCAGCGCACGAAAAGGCCTGGCTGCGAAA




CGGCAGGAGCGGTTGCAGGGGAACGCATACACGCCA




AACTCGCTCCTAAAGAACGCCTCACTGCGTAACCCCG




GTGCCCCTGCGTCGCCGGGTATGCGGGACTCGCCCTC




CTCCTTCCGGCCATCCACCCTGTCGCAAACGGGGACC




GCCACCACAGTGGAAACGACATTGGTCAGCCCCAACC




GCAACAGCAACAACCAGGGCATCGCCGGGGGCGTGG




GAATGGTGCACGGCTTGCGCGCCAGCTACGACCCCAA




CGAGGGGCAGGAGGAGCCTGTGCCCTCCACGCGGTAC




GTGGCGCCGGCAGCGGTGCCGGTGGCACGCGCCGTGC




CCCAGCTGGACCTTTCAGACATGCCGGCATTCCTGCA




GCAGCCGGGGCCTAAGAATGGGCCGGTGCAGTGCGTC




ATCGTGCGCGACCGCGGGTCTGCAAAGATGTACCCGC




GGTACTCGCTGTTCCTGGAGGAGGGGCGGCGCTTTCT




GCTGTCAGCGCGCAAGCGGAAGAAGCAGACCACCAG




CAACTACATCATATCCATGGACTACGAGGACCTCAGC




CGGGAGAGCGGGTCGTTCTTTGGGAAGGTCCGCGCCA




ACTTCGTGGGTACGGAGTTCACGGTGTATGACCGGGG




GGTTAAGGCGGGCAAGAAGGACGCCCAGGGCGACGG




CCAGCGCGAGGAGCTGGGGGCGGTGACGTACCAGTAC




AACGTGCTGGGCACGCGGGGGCCGCGCAAGATGATG




GCGGCCATCCCCGGGGTGGACGGCAGCGGGCGGCGC




ATGTTCAACCCCAGCGGCGACGCGGACACCATCCTGG




AGCGGCTCAAACACCGGAAGGGACTGGAGGAGCTGG




TGGTGATGGGCAACAAGCCGCCGCGCTGGAATGACGA




GCTGAACGCCTACTGCCTGAACTTCAACGGGCGCGTG




ACGGAGGCGTCCGTGAAGAACTTCCAGCTGGTGTCGG




ACGACAACCACAACCACGTCATCCTGCAGTTCGGCAA




GGTCGGCAAGGACACGTTCACCATGGACTACCAGTGG




CCCATCTCCGCGTTTCAGGCGTTCGCCATCTGCATGTC




GTCCTTTGACAACAAGCTGGCGTGCGAGTAA





27
113
ATGTTGCCTTCCGAGCCGCCCTCAGCACCGAGCTCCG




ACCCGAAGGGAGCCGGCCAGGAGGCTCAGCAAGCTG




AAGACTCGCCGCTATACAAGACGGATGAATTCCGCAT




GTTTTGCTTCAAGGTGCTGCCATGCTCCAAGCGATATG




TGCACGACTGGACAGTATGTCCGTTCGCGCACCCTGG




CGAGAAGGCTAAGCGCCGGGACCCTCGCGTGTTCACC




TACACTGGCGTCGCGTGCCCGGATATGAAGAAGTGCC




AACGCGGAGACGCGTGCCCATACGCGCACAACGTGTT




CGAGTACTGGATGCACCCAAGCAGGTATCGCACGCAG




CTGTGCAACGACGGCATTGGGTGCAAGCGGAAGGTGT




GCTTCTTCGCGCACACGCTGGAGGAGCTGCGCGTCTC




CAACGTCAAGCTGCTGCCCGCCGACATCGCGGCGGGG




GTGGACGTGGACCTGGACCCCTTCCGCCGCCCGGAGC




CCGCCAGTGGCCTGCGCTCCGCCAACAAGGCGGGTGG




GGGCGGCTCCAATGCGGCCGCGTCGTCCGGCAACGAG




GCCCTGGTGGAGGCGCTGCGTGTGCAGCAGCAGCAAC




AACAGCAAGTCAAGAAGGCGGCGGCGGCGCTGCAGC




GCAACGCATCGCGCGGGCTGGCGGTAGAGCTGCAGCA




GCTGCAGGCGCTACAGCAGCTACAGGCGGTGCTGGCC




AGCACTCCCGGCTTGGCAGCTCTGGCGCCGCAGCTGC




AGGCGCAGCAGATGGCGGCAGCCGCGGCCGCCTCGCC




CGACTCATTCCTGAACGCCATGATGGCCAACCTCCGC




ATGGCGGGTGCTGGTGCAGGGGCCGGGTCGGGAATGC




CGCACGGCGGCGGCTCCGGCCACGGCGGCCTGGGCAG




CGGCGCCGCGGGCAACGGCGCGGCACTGATTGACGCG




GTGGTGCAGCAGGCGGTGCAGCAGGTGCTGTCAAACA




GCGCGGCGCAGCAGGCCGCCACGGCGCTGCTGATCAT




GCAACAGCAGCAGCAACACCAGCAGCAGGCTGCCGC




TGCTGCGGCGGCTGCCGCGGCGATGGCGCAGCAGCAG




CAGCAACACCAGCAGCAGCAGGCCGCGGCGGCCAAC




CACCAGGCGGCGCAGGCGCAAGCGCACGCGCTGCTTG




GGCACCTGCTCATGCAGCAGCAGCACCACCAGCAGCA




GCAACAACAGGGCGGCCCCAGCCCCGCCGCCATGCAG




GCTGCGCTGGCCATGCTGCAGCAGCAGCAGGCCGCGG




CAGGCCACGGCGGCCCGCACATGCCGCCGCAATACAT




GCAGGGCGCCCGCCCGCTGAGCCCCATGGGTTCGGGC




ATGGAGGCGGCCATGGCGGCTATGCATGCGCATCAGC




AGCACCAACACCAGCAGCACCAACAGCACATGGGCC




AGCAGCCCTCGCTGCCGGGCTCGGTGCGCTCCTCCGC




CACTGGCATGATGTCGGCTGTCGGCGGCCCCGTCGGC




CCGCCCGGCTCGCGCAACGGCGACGCCGCCGCCGTCC




CTGGCGGCCCGGGCTCCCCTCACGGCTCGCCCTCTGG




CTCGCCGCCGGGCGACGGCCCGCTGGGCGGTCCCGGT




GGCGCTGCCGCGGCGGGCGCCGCATTCTCGGCAGCTG




CTACTGCCGCTGCCAGCTATTACAGCCAGGAGGCCAG




CCGCAGTAGCTTTGAGAGCTACCGCAGCAGCGAGGTC




GACCTGGGCCTGGGGCTGGGCCTGGGGCTGGGCGCGC




ACCACTCGATGCACCACCACCACCAACAGCAGCAGCA




CGCCATGCAGCAGCAGCAGCAGCACCAGTTCGGCGGC




GCCGGCATGCACTCGAGCGGCCCCAGCAGCGGCGGCA




CGCAGCGCAGCTCGCTGGAGCTCATGCAGCCGCCGCC




GCAGCAGCAGCAGCAGCAGCAGCAGCATGGCTACAG




CCACTTCGCCGGCGGCCCGCAGCCGCCGCACAAGGCC




TTCATGGGCGATGCGGCCTTTGCGGGCCCGCCCTTCGC




CGGCGGCCTGCCGTCGCATGCCGCGGCGCCCGGCCCG




CGCAGCCCCAGCGCCACGTCGTCGGGCCTGCCCGCCG




CCGCCGAGGAGGAGGCCGCGCGTCAGCAGGCGAACG




CCAACGGCCTGTTTGCGGCGGTGCAGGCGGCGGCGGC




GGCGGGCGCGCAGGCCGGCGGTGCCGGCGCCGGTGC




GCAGCTTAACCTGCCCGAGTCGCTGCTCGCCGAGCCC




GTAGGCCCCGCGGCGATGGCCGCGGCGTTCCGGATTT




GA





28
114
ATGAACGAGGCGCTGGACTTTGGGATCGGCGACTCGC




AGTATGTCTTCACGGATTTAGAGCTCAACGAGCTGCT




GGGCGTGATAGAGCGCAAAGCAGCCGGCGAGGCCGA




GCCTGACGCTCTCGATTTCCTGCGCGCCACTGACGGCA




ATGGACTTGCTCTTCAGTTCCAACCGCGTTCTCAAAAG




GACAACGGCAGTGGGTGCAGCCTCGAGCAGAGCGCG




GTTGCAGCAGCGGTCAAGCTGGAGGATAGCGCGCTGT




CATCGGCACTGGCGTCACCGGTAGACACACCCGCACT




CACCGGCGTCGCCGACCCAGCGTCCCTCTACGGTAGC




GGTGCAGAGATATCGATCATGCCCATGCCTCACGCCG




CCGCTGCTTCCGCTCCGACGTCACTTCACGCCTACACC




CTGCCGGGCACCGCGGGGCACGCGGCGCTTGTTGGCA




GCTCGCCGGCGCTAGTGAGCACCCTTGTCGCCGCCGC




CACTGCCGCACAGCAGGCGCAACACAATGCGCAACTG




GCGGCAGCCGCGGCCGGCTGCCTGCACGTGCACGCCC




CACTCCAGCTGGCGCGCTTCGCATCGGTTCCGGCACC




GCCGGGCAAAGCCATGTCCATGTCCATGTCCATGGCT




GAGCCCAAGGGCCAGATCAGCCACTCCACGGTGGAGA




AGCAGCGCCGCGACCGCATCAACTCACTGATTGACGA




GCTGCGCGAGCTTGTGCCGCCGCAGCAGCGTGGTGGA




GCCAACGGTGCCGCCGCCGCCGCCGCCAACGACGCGG




GAGGCCTGGAAGCTCGGCGGCCCAAGCACGTTGTACT




GGCAGACACCATCCAACTGCTCAAGCACCTGCAGCTC




AAGCTATCAATGGGCGCGCTGGAAGTGGGCGGCGCCA




CCAATGGCTGCTACGTCAACGGGAATGGCGGCTACTG




CAATGGAGGCGGCGGCGGCGGCAGCGGCGGCGCCGT




CGGGCGGCTGGGCAGTGGCTTCAACGGGGAGGAGGA




CACGGCCAACTCGGAGGGCAAGGCCAGCAAGGGATC




CTCCAGTCACGAGGAGATGGAGGTCGGCGGCGCTCCT




CAGATGCCACACATCCCCTGCCAGATGACGCAGATGT




CGGGCGTGACGGTGGAGCGCGGCCCCGACTGCTACTA




CGTGCAGGTCAAGTGCCGCGACCGCAAGGGGCTGCTG




TCCGACATCATCAACGCCCTGAGACAGCTGCCACTGG




AGATCCGCACCGCCGCCGTGACCACCACCAACGGCAC




GGTGCGTGACGTGTTTGAGGTGAAGTTGGACGACCCC




GGGCTCAGCCCCGAGGACGTCCAGAACCTGGTGCACG




ACGCCCTGTTCCAGAGCCACCTGTTGGCGGCGCAGAG




CGAGAGCCTGGCCGCAGCCGGCAAGCGGCCTCGCGCC




TAG





29
115
ATGCGCACCTCAGATAATAGAAACACGCTGTCTCTCG




AGACAGCAGCGCCGGTCTATGGCGCAGCGGAGCTGGT




GGAGGGACAGGCGGTGCTCAGCCTTTTAGAGAGCTTG




GATGTCGAATCGATCGACCTGATGGTGTATGGGTACG




AGGTCGTGGGCTGGGAGGAGGCGCACGCGAAGGAGC




CCAAGCTCCCGGCGGCGGACCCATACGCCCCTAGCCA




GCTGGTGACACCCTTGGACTCACAGCAGCAGCAACAG




CAGCAGCAACAGCCGCCGCCGCCATCTGCGGCCTCCA




AGGCTTCGCCACTGGGCGTGCCCAGACACGGCCAGCG




AACCATCTTCAATATCTGCCAGGTATGCGTGGACGGC




CGGACGTTTCGGCTGGCCGGCACACCAGCACGCACCA




TTGGAGACGTGAGCTACCGGAACCTCTCTGGCGAGGT




CAGCTACGGCTTGCAGGTGGAGGTGCGGCGTCCGAGC




AGTTTCGCGTCGGCAGCCGAACAGCAGCAGCACCAGT




TGGCGGTTCTGCGTGCTGATTGCGAGCTCGTGATTATA




CAGCGCGCGGAGGCGGCGCAGGGCCCGCCAGCCCCC




GAGGAGCATACGTCGGCTGGGGCGGCGGCGGCCAGG




GGCCCAGCAGCAGGCGGAGCTGAAGCGGCGGAGGCG




GCCGCGCCGGTGCCGTGCGATGAGGTGGTGACCCTGG




TGCCGGCCTTCTTCTTCTGCTGCAGTAGCGGCGGCCGC




GTGACGGTGCGGCTGCGGCCGGGGCGGGATGGCTACG




TGGCAGGCGAGGCGGCGGAGGTGGTGGTCGAGGTTG




ACAACCGGTCGAATCAGGAGTTTCGGGATGTGCGGCT




TGAAGTGGAGCGCCGCCTCACATTGGTCAGCAACAGC




GCCGGCGGAGGCGGTAGCGCCGGCAGCAGCGGCAGC




GGCAGTAGCAGCGCCACCGCGGGGCTTGTGCCGGGAT




GCTTCACTGAAGAGGAGCGGATCTTCAAGAGCAAGAC




CACGGCCGCCCTACTACCGGGAGCCTGCTACCTGGGA




GCCAACGCGCTGCGGCTGCCGGTGCCCCTGCCCTCCA




ACACGCCGCCCTCCACCTCCGGCGCGCTTGTGCGCTG




CTCCTACACCGCCACGGTGGAGGTGCTGCCGGCGTCG




GCGACAGCGCTGCGCGGCGCGGCGCCGCCGCGGCTGC




GTGTGCCGCTGACCGTGTTCGCATCCGCGCCGAGCTC




GTTCGCCACGGCGGCGGCACGGCATGCTCACCTGCAG




CAGGACGCAAGCGAGCAAGCGCCGGCGCACGTGTTG




GTGGTGGTGCCGCCCGTGGATGTAGTGCTCCCCGCAG




CTGCGCCGCAGCTGCCTCCCACCGCCGAGGTAAATGT




CAAACAGCACAACGGCGTGGCTGGCGCAAACCCGATG




TACGCGGGCCCGTAG





30
116
ATGACCGAGACCGACCACCGCCGCAGCCGCCCCGACT




GGTCCCGCGCTCAGAGCCTGCGCCTGATCCAGCTGCA




CGTCAAGCTGGGCAACAGCTGGACCGAGATCGCCAAG




CAGCTGCCCGGCCGCACCCAGAACGACTGCAAGAACT




TCTTCTTCGGCGCCCTGCGCGCCAAGCGCGGCTACCG




CGACAACCTGGTGTACGCCTACGCTCGCGCTCTGCCC




CCCGCTAGCGCTTCCGCTTGCGGCAGCTGGGAGCAGG




ACAAGCGCGGCCCCGACGCTCTGACCCGCGCTGCTGC




TTACAAGGCCGCCATGCAGCAGGTCGCCGCTCAGGAG




GTGGCCGAGCAGATGGAGAAGCAGCAGCGGAGCCAG




CAGCAGGAGGGCGAGGACGGCGGCTGCGGCAGCGGC




GCTGCTGGCGCTACCGCTGAGGACGGCGGCGAGCCCG




GCGCTGTGGCTGCTGCTAGCCGCCGCAGCAGCAGCGT




GTCCGTGGGCGCTGACGGCGCTGCTCCCACCGCTCAG




GGCGACGGCATGGACACCCAGGAGGACGCTGCTTCCG




CTCCCGCTTGCCCCGCTTCGGCTGCTGCTTCCCCCGTG




GGCCCCGGCGACGTGTCCGTGCGCCGCCTGAGCAGCA




CCGGCGACACCGTGGTCACCGACGCTGCTGGCACCCG




CACCGTGGTGGCTGCTGGCGTGGTCGCTGGCGGCTGG




CGCAGCGTGGCCGCTGCCGCTAGCATGCCCGCTCACC




CCGCTGCTGTGGTGTCGATGCCCCCCGTGGTGCCCGCT




TCGGTGGTGGCGGCTGCTTCCGGCGTGCTGGGCGCTG




CTGCCGTGCCCGCTGCCGGCGCTCCCGGCGACCGCCT




GAGCCTGCAGTCCCTGCAGCCCCCCCCCCACGGCTTC




GCTGCTCTGCCGCAGTCCGCTGCTCCCGCCATTGGCAG




CAGCTCCGCTAGCCCCTTCTGGCAGCACCAGCAGCAG




CACCACCTGATGGGCCCCCGCGTGCAGCTGCTGAGCC




ACGAGAGCCTGGCTCTGCTGCACCAGCAGCACCAGCA




GGCCCAGCAGCACAGCCACGTGGTGCTGCACGTGGCG




CCCCCGTTCCTGCAGCAGCACCACCAGAACCCCCACC




ACCAGCACCTGATGGTGCAGCTGGAGGGCGCTGGCGC




TGGCGCTCCCGCTGGCGCGTTCCAGCTGCAGCACCAC




CAGCACCTGCACCCCCACCACGTGCAGGGCTCCGGCC




CCGCTGACGGCAGCTCGGGCCCCGTGCTGCTGATGGG




CCCGGCTGGCCCCCACGCCGCTGCTCTGCAGCTGCTG




GGCAGCCACCCGCACCACCAGCACCAGCACCACCAGC




AGCTGGTCCTGCTGCCCTCCAGCGTGCCCGGCGCTCC




GCCCCAGCACGTCCTGCTGCCGATGGCTGTGCGCCCC




CCCCACCTGCTGCAGTACGGCGGCGCCCACGGCGCTT




CCGCCGCTGCTAGCGCTGCCGCGGCTGCTCCCTCGGCT




GGCATGGGCGCTTTCGTGTTCCACCCCCACCCCCAGC




AGCAGCAGCTGCCCCCCGCTGCTGCCGCTGCTTTCGCT




GCTGCCAGCGCCGCTCCCTCCCAGCCCGCTGCGGTGG




CTGCCGCTGTGCACTCCCTGGCTCCCGCTGCTTCGGCC




GCTCTGAGCCTGAGCGGCAGCTCCGTGCTGGAGGCTA




CCACCACCACGACCCGCATCACCACGACCACCGCTGC




TGCTGTCGCCGCTGCGGCTGCTGGCGCGGCTGTCGCTG




CCGGCGTCAAGACCGAGCCCGCTTCCGCTGAGGCTGC




TACCGGCTGGGCTCAGCAGCAGCAGCAGAAGGCTCAC




GCTGGCGTCAGCCGCAGCTGCAGCTCCAGCTCGAGCA




GCTCGGCTGCCTGCGGCGCTTGCTCGACCTGCACCGCT




GGCGTCGGCGCTACCCCCGCTACCGCTACCCAGCTGC




CCCAGCACCAGCAGGACCACCAGCTGCTGGGCGACGA




CTGGTGCGCTGGCGACGAGGAGTGGGCTGAGCTGGGC




CGCATTCTGCTGGGCTGA





31
117
ATGGAGGCCCTGGACGCCCAGGACAGCCTGCAGCTGG




ACGTGGTGTCCCCCAGCGCTCGCCCCGCTGCTGCTGG




CGGCGACAAGCGCGACCCCGAGCGCTTCTACTGCCCC




TACCCCGGCTGCAACCGCAGCTTCGCTGAGCTGTGGC




GCCTGAAGGTGCACTACCGCGCTCCCCCCGACATTCG




CGGCAGCGGCAAGGAGCGCGGCCACGGCACCGAGCT




GACCCACTGCCCCAAGTGCGGCAAGACCCTGAAGCCC




GGCAAGCACCACGTGGGCTGCAGCGGCGGCAAGAGC




GCTCCCCGCCAGACCGCTAGCAAGCGCAACCGCACCG




GCGCTGACGACGCCGACGAGGCTGTGCCCGGCAGCCC




CCACAGCAAGCACGTGCGCGGCACCGACATGGACGG




CGACCCCCACAAGAGCTGGCAGGACTTCGCTCTGACC




CACGCCGGCTACGCCATCGGCGCTCCCGCTATGCTGG




CTCCCCTGAAGCAGGAGCACCCCGAGTGGCCCCCCAC




CGTGCCCCAGGGCGTGTTCGTGGGCCACGGCGACCGC




GTGTCCTGGCTGCCCGGCCAGGTCAACGGCTTCGTGC




CCCAGCTGCAGCCCCAGCGCTACCAGCAGCCCCAGTT




CCCGCCCGAGCTGGCCCAGGCTTTCGCCGCTGCTGGC




ACCCACGCTCCCCACGTGTACGCTCAGCAGGTCCCCTT




CGCCAGCATTCCCGGCTACCCCGGCCAGCCCGGCGTG




GCCACCCTGCAGGTCACCACCGAGAGCGGCCAGGTGC




TGAGCATCCCCGCCAACATGGCTGGCATGCCCCCCGG




CATGGCCGGCCTGCCCGGCACCCTGGTGTACCACCAG




CAGCCGCCCCCCCACGACGCTGCTGCTAGCTACCTGG




CTCAGGCCCAGGCCCACGCTCAGCACGCCGCTGCTAT




GCACGCCGTGAACAGCGCTCACGCCCAGCAGCAGCAG




CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCCCGGC




GTGCCCGCTGCTCCCCCCGCTGTGCCGGGCGTGCACG




ACGGCATGCCGCCGGGCACCGTCGCCGCTGCCGCTGC




GGCCGCTGCTGCGGCTGCCGCCGTGGGCGGCAGCGCT




CCCAGCGCTCTGCAGACCGACGTCGGCGGCCGCCCCG




GCGCTGCTCTGCCGCCGCAGGCTGCTCCCGGCACGGG




CGCTGGCCAGGGCGCTGGCGCTCCGGCTGGCGCTGCT




GACGGCGGCGCGGCTCCGGCTGCTGGCGACGCTGCCG




CTTCGGGCGGCGCTAAGCCCGTGGCTGACGAGGACAA




CCTGGGCACCGTGTTCGACGACGTCGAGGAGTTCACC




CGCGACTTCGGCCGCATTCCCAGCCCCCCCCCCCTGCC




CCCCGACTTCCACACCGCTGCTACCGGCGGCAACGGC




ATGCTGTTCAACTTCAGCCAGTTCGGCCAGAAGCTGC




CCCGCACCCAGAGCCACACCCGCCTGGACCGCAGCCT




GAGCGCTGTCGGCCTGGGCCACCTGGACGTGGGCGTC




GACGGCGACGTGATGTACGACCACACCGACGACGGCG




ACCTGATGCAGCTGCTGTTCGGCGTGCCGGACGAGCT




GCCCACCATGGCCACCATCCACCTGCACAAGTGGTCC




AACGAGGAGGACGAGGACGACGACGCCGCTGAGCCC




GGCGGCGGCGGCGCGGCCGCGGCGGGCGGCGGCGGC




GGCGCTGCTGCTGGCGCTGGCGGCGAGGGCGGCGGCG




GCGCGGGCGCGGGCGGCGGCGGCGCCGGCGCTGGCG




CTGGCGAGGCTAACGCTGCTGCTGGCCGGGGCGGCGC




GGGCCCCGGCCCCGGCCTGGAGGCTGGCGGCGGCGGC




GGCGGCGGCGGCGCCGGCGAGGGCGGCCCCGGCGCT




GGCCAGCAGCCCCCCCACCACCAGCAGAGCGTGGGCG




GCCACGACCAGCGCCCCCTGAACGGCAAGACGCTGCA




CGGCCACGACGCCAGCCTGGCTGTGCTGCCCGCTCCC




GGCGGCAAGTCGCTGATGAACGGCGGCGCTGGCCACG




CTGGCGAGGAGCACCACCGCGACCACCTGCTGGACGC




TGAGACCTTCCGCCTGCTGCAGAGCTGCGACTAG





32
118
ATGCAGGACCCCCATTTACAAGAAACGACAGCTTCGG




AGCCGCTGACAATGGAGGAGGAGTATGAAATGCAGC




GCTCCTGGGCGCAAGATGAGGACAAGCTCACATTCAT




AGTGCTGGACAGGGGTTTCCCCGATGTGCCGGGCACC




GGCAGCCATGGCGGCGGCATGGCGGGCGATGTAAACC




TGTTTTTTACGCTGGACGAGGAGGAGGGCGGGCGGCA




GGCGGCGGAGATTGAGGTCATGGTGGCAGAGCAGGG




CTCGCGGGGCAAGGGCATCGCCAAGGAAGCGCTCCGT




GCGCTTATGGCATACGCCAGCAGGGAGCTGGGGGTGA




AGCGCTTCGTGGCCAAGATACACGAGGTCAATGCGCC




GTCCCGAAAGCTGTTTGAGGGCCTCGGCTTCGAGGAG




TTCAAGAGGGTGGCATGCTTTGGCGAGGTCCACTACC




AGCTCTCGACGGACAAGGCTGCCGACTGGCTGCCGCA




ACTGCAGGAGGGGCTCAATCTTGGCAAGTACGAGTAG





33
119
ATGCACACAATAAAATGCAACCGGCCCTGCTCTGTGG




CGTCGTCACGCGCAAAGAACTTGCCGACGCATTTCAA




GCTCGGGGCGCTGCCCATTTTGCACAGCGCTGAAACA




GCACTACATAGCGCTAGGGAGCATGGATCAGCTCCAC




ACACCCGGCGATGCGGCGTGGTCCGCTGCGCGTCGGA




AGCCCCAGCGGGCCCGCACACCACCGTGCCGCATCAC




ACGGAGGTGGCTGTGCTGGGTGGCCGCCTGGTCGTGA




GACCCATCACCGCCGGGGAGATCCAGGCCGCAGGCGT




GGTCCTGACCCGTGCGTTCGCGGGCTCATCGGAGGCG




GTGTCCTTGAAGGAAGTGCTGCAAGATCTGGAGACCC




AGGGCGGCGCCGGAGGCGCTGCCGCGGCAACTGGCTG




CTTCCTGGTTGCCCGCCTGTACCCCTCCACCTCCTCCT




CGGGCGCCAGTGGCAGCAGCAACGTACAGCTGCCGCC




GGGCCAGGACTCGCGACTGGTGGCCACTGCTTCCGTG




TCGCTGAGCGCACAGGACATGCTGGTGCGCCGCCTGC




CGCCGCCCAACCCGCCGCCGGCCGCCGCCGCCTACAT




AAGTAACATGGCGGTGGACCCCAAGTTTCGGAGACAG




GGCATTGCGCGAGCCCTGCTGGCGGCGTGCGAGGAGG




TGGCGCGCGGCGCGGGGCTCCGGGAGGCGTCGCTGCA




CGTGCGGGAGGCTGACTCGGCGGCGCGTGCGCTGTAC




GATAGTTCCGGGTACACAGTCGTGGTCAAGGACTCAT




GGGTGGACACCATGCGGCACAATATTCGGCCACGACT




CCTGATGAAGCGGACGCTTTAA





34
120
ATGGCTAAACGCGAGCTTGCTGTCAGCTTTGACATTGT




TAGAGAAAAGAACCTTGAGCAACTTAAGCTGCTAAAC




AGCGTTATCTTCCCGATGAAGTATGCGGATGAGGTGT




ACCGGCAATGCATGGCGTGCGGCGACCTGACTCAGCT




AGCATACCACAACGACGTCCTGGTGGGGGCCATCACG




GTGCGCTGCGAGCGCCAGCCCAATGGCAAGGCGAAG




GCCTACATCGCCACGCTAGGCGTGCTGGCGCCGTATC




GCAACTTCGCTATCGGCGCCAAGCTGCTGCAGCGCTC




GCTGGCTGCGGCGCAGCAGGACCCCAACATCGAGGAG




GCGTTTGTGCATGTGCAGGTCGACAACGAGGACGCCA




TCCGCTTCTACCAGCGGCACGGCTTTGAGAAGGGCGA




GGTGGTCAAGGACTATTACAAGAAGCTGTCGCCGCCG




GACGCAGTGGTCATGAGCAAGAAGCTGGCAGCATAG





35
121
ATGCTCCGCTGCGACCGGTTCTTCTCGAGCACACGCCT




CGTCGACAATCAGACTCTTCAAATCAGCTGCAAATAT




ATCAACAACAAGTTATCTAGTCCACTTTACGCATCTTG




CAATTGCAATCAAGGAAGCGGCCTTGCAAGTCTGCGA




CGCAGCTCGAGCAGCTGTTATAGCTCAAGACAGGTCC




CTGCGGCCATTGCGGAAGTTAATGTCCGCTCGGTGCG




CAGCCTCAGCCGCTGGCGATGGCAGGACCTCGCTCAG




GTGGCCTTCCTGCTAGCGGCATCGTTCTATGAGGACG




GCGAATCATGGCAGCTCGAGAGCCCCCCGGTCCCGGC




GACAAGTAGCACCGCAGTAAGCCAGGATGTAACAGA




ACTGTTGCCTGCATCGACTGATGCCAGGCAGTCGGCA




AGCGGCGCCGGCCGTAGCAGCAGGAACAGCAGCAGC




AGCAGGGCAAGCAGCAGTGGCAGTGGAGCAATCAAC




CGGCCACTGTCAGGAGCTGCTCTGCTGTTCGGTGCGTC




CTCGCTCCTGGCCATCTTGATCCAGCACGCCACATATG




GTGCGCGCCACGTCACGCTGCTTGCGGAGTTGCAGGA




GTCCGGCGAGGTGATTGGCTGCTGCGGGCTGACGTTC




GATGCTGCTCCAGCCGACGTCGTGGAGGCCACCGGCG




CGCCACAGGGCTGCGAGTATGCGCTGCTCACGGGCTT




AGCAGTTGCGCCGCCGCAGCGCCGCCGTGGTGTCGCA




TCGGCACTGCTGCAGGCGGCAGAGCAGGAGGCGCGG




CGGGGCCCTGGCCAGGCACGGCGGGGCCCTGGCCCGG




CACGGCGCCGGCTGCCGGCACTTCTGGCATTGCTGGTT




TCCAAACTCAACGCCGCGGGAAGGAGGCTGTACGAGC




GGAACTTGTACGAGGAGGCAGAAGACTGGGTGGACA




CGCGGTGGGAGCTGGACGCAGAGAAGGGCCGTGTTG




GGAAGCCCCGGCGGCTGCTCCTCTTTCGCCGATTGAC




ACAATAG





36
122
ATGCACTGGCAATATCCATGTTCCAATCTTTACTGTTT




TACATTGCTGCTCTGCATCCGCTGCGTTTCGCAAGGGG




ACGGTGAGCTTTCTGGGCCTGTTGTTTCGCAAGTCGCG




GCAAGCCGGCAAAGTGCACCAGCGCATTTGCATAGGC




GATCCTACTACAGAAGGAAAATGCCACGCGCTGCCAA




GGAGAAGCCGGAGAAGAAAGAGAAGAAGGTCAAGGA




CCCCAATGCCCCTAAGAAGCCCATGGGCGCCTACATG




TGGTTCTGCAAGGAGATGCGGGAGCAGGTGAAGGCCG




ACAACCCGGAGTTCAGCGTCACCGACATCGGCCGGCG




GTTGGGGGAGCTATGGAAGGAGTGCGAGGACGACGA




CAAGAAGAAGTTCCAGGACTTGGCGGACAAGGACAA




GGAGCGGTACAACAAGGAGAACGCCGCGTACCAGAA




GAAGGAGAAGGAGGCAAAGTCGGAATAA





37
123
ATGATTGACCTACTGCTGGGAGCATCGTTGTCTCCCTC




GGATATCGGACAGGTTCTGCTAGCGTATCCACAGGCC




TTCCAGCTCTCCCTGGACCGCGCTCGGGAGGTGCTGG




ACTTCCTGCGCGACGACATGCACCTCAGCGAGTCCCA




GGTCCGCACGGTGCTGACGCGCTATCCAAGCATCCTC




AACATGAACGTCAAGGGCCAGTTGCGCCCCCAGGTAG




CGTACCTCAACTCGCTGGGCGTGGGCCCAGAGTCGCT




GCCGGAGCTGGTGCTGAGCCGGCCTCTGGTGCTGGGG




CCCGGCATCGACACCGTCATCACCTTCCTCAAGCGGC




TGGGCGTGCCGCGCTCGCAGATGCACCGCATGCTGCG




CTCCTGCCCTCTGGACTACCGGGTTCAGTTCAAGAGCT




TTAGCGCCGCGGCGCCGGGTGGCAGCTCTTCCTCCTCG




TCCTCCGGCGGCATGGGCCGCAACTAG





38
124
ATGACGTCAGAGGAGCTATCTGTACGCAAACTTGAGC




AAGGAGATTTCGATAAGGGCTTTCTTACTGTCCTTGGG




CATCTGACAACGGTGGGGGATGTGACGCGGGAGATGT




TTGAAGAGCAAATACGTCGGCGAGATGCAGTGGGTGG




CTACCACACGGTGGTCATAGAAGACAACAGCCGCATC




GTCGCCACGGCCAGCATGGTGGTGGAGCTCAAGTTCA




TCCACGGCTGCAGCAAGGTGGGGCACATCGAGGATGT




GGTGGTGGACCCCGCGTACCGGGGCAAGCGCCTGGGG




CTCAAGCTGATCGAGGCGCTCATCGAGTCGGCCCGCG




GAGATGGCTGTTACAAGGTGATCCTGGACTGCGCGGA




GGGCAATGTGCCCTTTTACGAGAAGGCCGGGCTGGTG




CGCAAGGAGGTGCAGATGGTGCGCTACCTGGACCGGT




GA





39
125
ATGACAAAGCATAAACGCCGAGAGCTGCCCAGTGCGG




TCCACGATGGAGAGGAGTATAAACCAGGGGACTGCGT




GCTAATCAACCCGGACGCCTCTGCGCCCGCCTACATT




GCACGGATCCGGAAGCTCATACAGATCGGCGCGGAGC




CAGAGCAGGTGGAACTGGAGGTGACCTGGTTCTACCG




ACCAGAGGAGGCCATCGGGGGGCGCAAGGCCTTCCAC




GGCGAGGCGGAGGTGTTCGACTCTGACCACCAGGATA




AAGCACCACTAGCTGCCATCCTGGGTCGCTGCAACGT




ACACAACGTGTCACGGTATGAGTCGCTAGAACGGCGA




GACGAGAACGACTTTTTCTGCCGCTTCACATACAAGC




CCCGCACCAAGCAGTTTGAGCCGGATCGCGTGCCAGT




GTACTGCGTATGCGAGCTGCCATACAACCCAGACAGG




CCGATGATCAACTGCGACAACTGCGACGAGTGGTACC




ACCCGCAGTGCCTGGGCCTTGGCCAGCACGTGCTGCA




GCAGGACCACTTCGTGTGCCCTACTTGCACCACGCCG




CAGCAGCCCGCCAAGAAGTCCCGTCCTGGGGCATGA





40
126
ATGCTGCTGTCACGTCTCGCTCATTCCGCTCTCCCTGC




CTCGCTCCGCGCCTCGGCCGCGAGCTCGGCCTCGTCG




CAGCTCCATGCTGTGCCCCGTGTCGCGAGCGCCGCTC




CGCGGGCGCCGTCGCACGTCGCGCAGTACAGCAACGG




CTCTGCGGCGCCCGTCCCTCCCAACTTCGCTGCTCCCA




ATGACCGCGCCGCCACCAGCTCCAGCGACCGTGTATA




CACCAACTATTACGTGTACAAGACCCGCGCGGCCATG




TGCCTGCGGCTGCTGCCGCCCACGTTCGCCAAGGCGC




AAGCCGGCAAGGTCCTGGAACGTGACGGCACCATGCT




GCTTGAGTTTGCCACTGCCAACGCGGCCGCACCGGGC




GCTGGCAGCGGCCCCGCAGGCAACGTCAACCGCACCT




ACAACTGGGGCAACAAGGTGACGTTCGCTCTGAGCCC




GGTGGAGCTTGGAAACATCCTGGCGGGGGATGCGGTG




GCCTCGGACAAGGGGCTGGTGCTGTGGCACGACCCAG




CCAAGCTAGGCAAGACCGGCGAGCCCATTAAGAAGCT




GAGTCTGAAGCAGCTCCCAGACGGCAACATCAGCTTC




AACCTCACCGCCGGGCCCGAGAACTTCAGCGTGCCCG




TCACCAAGGGCGAGTTTGAGGTGATAAAGTCGGTCGC




GCAGTTCGCCATCCCCCGGCTGCTGGGCTTTGACGCCG




TTTTCGAATAG





41
127
ATGGGCAAGGACTACTATGCAATCCTTGGAGTGCAGA




AAGGAGCAGATGAAAATGAACTTAAGAAAGCGTATC




GAAAATTGGCGATGAAGTGGCACCCGGACAAGAACC




CAGACAACAAGGAGGAGGCTGCCGCCAAGTTCAAGG




AGATCTCTGAAGCTTACGAGGTGCTGACGGATCCAGA




CAAGCGGGAGGTGTACGACAAGTTCGGGGAGGAGGG




GCTCAAGGGAGGCATGGGCGGCGGGCCGGGCGGCGG




ACCGGGCGGGCCAGGCGGCTTCCACTTCCGGAGACCC




GAGGACATCTTCGCGGAGCTGTTCGGGGGCCGCAGTC




CGTTCGGCATGGACGACGACGACATGTACGCGGGCGG




CAGCTTCGGCGGCGGCGGCGGCGGCTTCCCCTTTGGC




GCGTTCGGCGGCATGGGCGGCTTCCCGGGCGGCGGCA




TGGGCGGCATGGGCGGGATGCCTGGCATGGGGCAACG




GCGGCCATCCGGGCCAGTCAAGGCCAAGGCCATTGAG




CACAAGCTCAACCTCTCGCTCGAGGAGCTGTACGCGG




GCACCACCAAGAAGATGAAGATCAACCGCAAGGTCA




AGGGCCGGCCGCAGGAGGAGATCCTGGAGATCGCGG




TCCGCCCGGGCTGGAAGAAGGGCACCAAGATCACCTT




CCAGGAGAAGGGCGACGAGGATCAAGGCATCATTCCC




GCGGACATTGTCTTCGTCATTGATGAGAAGCCGCACC




CACGGTTCAGGCGCGAGGGCAACGACCTGTACTTCAC




GGCGGTGGTGTCGCTGGCGGACGCGCTGTGCGGCACC




ACGTTGCAGATTCCGCACTTGGACGGCACCACGATAG




ACCTGCCAATCCGGGACGTCATCCGGCCTGGCGAGAG




CAAGGTGTTGCGCGGCAAGGGCATGCCCGTCACCAAG




GAGCCGGGCGCGTTTGGGAACATGGTGCTCAAGTTCG




ACGTCAAGTTCCCGCGCGAGCTCAGCGACGCCACTAA




GCAGCAGCTGCGAGCCATCCTGCCCTCGCACTGA





42
128
ATGGCCATGGCCAAGGAGACCGAGGACCTGGACCTGC




CAGAGGCAACCGCCCACGCGGGCGTGCTCGCTGTGCT




GGAGGGCAAAACGCACGCGGCGTATTACCTGCTGGAG




CAGTCGGGGGAGGTCGTGGCGCAGCTGATGATCACAC




TGGAATGGAGCGATTGGCGAGCCTCCGACATCTGGTG




GATCCAATCTGTGTACGTTAGGCCAGACTGCCGGCGC




CGGGGCCACTTCCGGGCACTGTACGCGCACGTGCGGG




AGGAGTGCCGGCGGGCGGGTGCCTGCGGGCTGCGGCT




GTACGCGGACACTGGGAACGAGCGGGCACACGCCGC




GTACGAGGGCCTGGGCATGAGCAGCCACTACAAGGTG




TTTGAAGACATGTTCACCCAGTACTGA





43
129
ATGAGCGGGGACGAGGGCGACGGTCGAGATGGCAAC




AGCAATGCGCGTGAGCAGGACAGGTTCCTGCCCATCG




CCAACATCAGCAGAATTATGAAGAAGGCGCTCCCGAA




CAACGCGAAAATAGCCAAGGATGCAAAGGAGACGGT




CCAGGAGTGCGTCTCGGAGTTCATTAGCTTCATCACGT




CGGAGGCTAGTGACAAGTGCCAGCGGGAGAAGCGGA




AAACAATTAACGGCGACGACCTGCTGTGGGCCATGAC




GACGTTGGGCTTTGAGGAGTACCTGGAGCCGCTCAAA




CTCTACTTAGCCAAGTTCAGAGAGGCTGAGGCGGCGA




CATCCAATAAGCCAGGGGGCGGCTCAGGTGCCAACGC




GGAGGCAAAGCGTGAGGCGGCCGCGGCGGCTGCGGC




TGCGGCCGCAGCTGCGGCTGCAGTTTCGCAGCAACAG




GCGGCGCAGCAGCAGATGGCGGCGCAGCTGCAAGCT




GGCATGGCGTTCCCGGGGCTCATGCCGGCGCAGTTCC




AGGGGCTACCGCCCGGCATGATTCCCGCTGGCTTCCC




CGGACTGCCGCTGCCTCCGGGCGTGCCGGGCCTGATG




ATGCCAGGTGGCGTTGTGCCCAAGCAGGAGCCCCCCA




AGTAG





44
130
ATGGCCGATGAGGGACCGTCAACGTCTGGGGACGTGC




GCTTCACTGTTCCCACACGCCTAAAGCTGATTGTGACC




GAGGGGCCTTGCGAGGGACAGATTTTTGACGCCGCAG




AAATGGACGCCTGTTTCCTGACGCTCGGGCGGACAAA




GAAAACCAAAATCCACCTGAAGGATGACTCCATCTCG




GAGAAGCACGCCGAGTTCGCATGGACTGGGAGCCACT




GGACGGTCACAGACACGTGCAGCTCCAACGGCACCCG




AGTGAATGGGGCCAAGCTCAAACCAAACGAGCCGCA




CGTGCTAAAGGCGGGTGAGCACGTGGCGCTGGGTGAT




GAGACCATCATGACCGTGGAGCTGTCGCAGCAGTCGC




TCGCGAACGTGTCACTGGAATGGCTGATGCGGGCGCA




CTTCGAGAGCAGCTGCCAGGGGCTGGAGGCTGGCGGC




GCGGACAAGGCGCGGGAGATGGTCCGCCGCTGCCACG




AGGCCCTGGACTCGCTGATGGACCCGGCGGCGGCTGT




AGCGCCCGCGGCCGCAGCCACGGCGGGAGGGAAGTA




G





45
131
ATGGAGCTTGGACTCGCAGAGAGTCTGGGCGACGCCG




ACTCCCTAGCAGCCTACCTAAATGGCAGTTTCATCGGT




GGAGGCTCGCTGGAGCAGACGCTGGAGGCACCTTCAT




TTTTAGGCGAGCTCGCTGCCATTACGGGGTCTATGGA




GGCTCCTTATGCGGCGGCAGCACCTGAGCTGCCGGCA




GAGCTCAAGCCAGAGGAGCTGCCTTCGACAAGCGGCG




CAGGCTTCCTGCCACAGTCGGAGGCGGGGCCGATGTC




CGAGGCCGGGCTCTCCGCCGATGGCGGGCTTATGTCG




GAGGACGACGCGGAGGGCGGCGCAACGTCCTGCAAG




GGCGGCGGCAAGCGCCGTCGGCGGATACGCACCGAG




AGGCAGCAGGTGCTGAATCGCCTAGCACAGCAGCGAT




ACAGGCAGCGCAAGAAGGAGAAGGTCCAGGCGCTTC




AGCACAACGTGGACGCCTTGCAGATGCAGCTGGAGCG




GGTCAGCTTCCTGGAGTCGCAGTGCGACTCACTGCGC




GGCACGGTGGCTCAGCTAGGCGCGGACCTTGCTGCCA




AGGACGCGGGGCTGGCGGCGGCGCAGGCGCAGCTGC




GGCAGGCGGCGGTACTGCTAAAGGGCGCGCAGGACA




AATGCGCTTCGCAGGAGCGGCAGCTGGCGGAGCAGGC




GCAGGCGCTGGAGGCGCAGCGCTCACAGCTGCGTGTG




TCCAACCTGGCCAGCCTGGACCCCCAGGCCCTGTCCG




ACCGGCTGCTGGCGCTGGTGAAGGAGGCCTTCGCCGC




CGCTGCCGCAGAGCGCAGCTCGGAGATTGACGGCTCC




GGGATGGCGGCGCCGGCGGCGGCTGCCGCGGCGCCTT




CGGCGCCGCCACCGCTGGCGATGTCGGAGGAGGTGGT




GGCGGCTCTGAGCCGCAGCCTCACCAGCTGCTGCCGC




GAGCTGGTGTTTGCTAGCAAGGGCCTGGGCGGCAAGC




AGGCGGCGGCGGAGGCACCGTCCGTCATCCCCGTGCA




GTGCTGCTAA





46
132
ATGGCCAAGCTCATTAAGAACGTCGGAGCTTCACTAA




GGGCAAGGACCCACGACGAGGACGACACAATGATGA




AGCAGAAAGGAGCGACAGGGGTGTTCAGAAACCTCG




CGTTCGCGGACGCTGACGACAACTTGGTCTCCACCTCC




GCACGCGCGATGGCAACTTCGGAAAGTACCAAGAAG




AACAACTTCTTTGGTGGCAGTCAGGACAACATTGCGT




CCATAGATGTCACGCCGCGGTCACGCGACGCGGGCAA




CGGAGCGTCCTCCTGGGCGCACGCTGACCTCCCCACC




TCGGCCAGCAAGCGCGTGGGCAGCACCGGCAGCGCAT




CTACACCTGTGAAGAGCGCAACCTTTGCACGCACCGC




TTCGGCACAAAAGCGCGCCAAGAACGCGACAGCCATT




CAGGAAATCTCTGCGTTTGAGCACGAGCACGCTGTGA




TGGACGAGATGTCGGGCTCCGAAGACGGCGAGCGGCC




AGCGGGCCTAGTGAGCGGCGGCAGCGCCATCGGCGCC




ACCACTAGCACCACCGTCATTGCCGTGCGCTCCGTCG




CGCGCGGCCCCAGCATCACGCAGCAGGTCAGCACCAG




CGGCAGCGTGCGGGCGTGGGAGGAGGAGGTGAAGCG




GCTTATCGCCAGCGGGCGGCACGAGGACGCGGTGCGG




TGGGTGGCCCCCTCGGACGGCATCATCCGCTGCACTG




TGCGTCGCGTGAAGAACTTCCTGGGGCATACGCTCGC




CTACCAGCTCTTCTTGGACTCTGGAGACACGTTCGTGC




TGGCGGCGCGTAAGCGCAAGAAGAGCAAGGCCTCCA




ACTTCGTGCTGAGCACCAGCCAGGAGGACCTCGGCAA




GGACTCGGACCACTGCATCGCCAAGCTGCGAGCCAAC




TTCGTGGGCACTGAGTACGGCCTGGTGTCGCGCACCG




GCGGCCACATCAGCGGCAGCATGGACATTGACGGCGG




CGCGCAGTCGGGCGGCAAGCTGGCGCCGCCGGCCGA




GCCCTTCTCCCGCGAGGAGATTGCGGTGCACTACAAG




CAGACCGCGCTGACGGCCAAGGGCGGACCCCGCACC




ATGCTGGTCGCCACGCCGCTGCCGGAAGTGAGCTGGG




CCCCCAGCGCCGCTGACGGCTCGGACTCGCTCGCCAA




CTGCCTTGAGGCGGCGCGCCGGCGGGAGCTGTCGCCG




CGCATGGAGCGGCAGCTGTGCATGCTGGCCACGCGGC




CGCCGGAGTGGGACCCCAGCCTGAAGGCGTACACGCT




CGACTTCCACGGCCGCATCCGCGCCAGCAGCGTGAAG




AACTTCCAGCTGGTGCACTGGGACCACAACACGGACC




GCAAGGGCTCTGACCTGGTGCTGCAGTTTGGAAAGAT




TGACGAGAACACTGACGACTTCGCGCTGGATTTCACC




TACCCGCTCAGCCTGCAGAAGGCGTTCGCCATCGCGC




TCGCAAGCACCGACACAAAGCTGTGCTACGCGTTGTA




A





47
133
ATGGCAGAAGAGACAGGCCGGTCGCAGAGCGGCGCC




GAGGCGACGACCAGCGATGCCATCCGATATGTCCAAT




ACAAAGGCGAGGAGGACCTGCCCATCGTAATGGGCCT




GGTCGACAAGGAGCTCAGCGAGCCCTACAGCATCTTC




ACGTATCGCTACTTTCTGCAGCAATGGCCACACCTATG




TTACATTGCATATGACGGTGACAAGCCGTTCGGCACG




GTCGTGTGCAAAATGGACATGCACCGGGACCGGGCGC




TGCGCGGCTATGTCGCAATGCTCGTGGTTGACAAGGA




GTACCGTGGCAAGCGCGTGGGCTCTGAGCTGGTGAAG




ATGGGGATTCGGGAGATGATTGCGGGCGGCTGCGAGG




AGGTGGTGCTGGAGGCGGAGGTCGTAAACACCGGCGC




CCTCAAGCTATACCAGGGGCTGGGCTTCGTGCGGGAA




AAGCGCCTTCACAGGTACTACCTGAACGGTGTGGACG




CCTACCGCCTCAAGCTGCTGCTGCCGCTGACCGAAGA




GAAGAAGGCGGCGCTGGCGGCGGCCGCGGCGGCGGA




GGCGGCGGAGCTGGAGGGGGTGGAGCTGGAGGCGGC




GGCGGTGGACGCAGGGGCAGTCGCGGCGGCGGCGGA




GCCTGCCATTGCGTGA





48
134
ATGGTCGGCAACAAGCTGTCAGCTGTAAGGTCTGTGC




TGCGAAAGGCTCGACAGCTCAAGGACCCTCTCGGTGA




GCTCGTCAGCACTGCAAGGCCCTGCCGCGTCGACGGC




CAGCAACACACGCACTTCCGGTCGCACCATGCCGCCG




ACCTTCCCAAGCAGCAGCTGGAATGGTGTCTGGACGT




GTGCCGGGAGAACATGGCGGCCTTTTATGAGCGCGTG




TGGTCTTGGAGCGATGTGAAAAAGAGGCGGCAGTTCA




CCTCGAGCGCTTCTCGGTTCCTGATAGCATATGACGTG




AACGCTGCTCGCGTCCCTGTTGGCTACATCAACTTCAG




GTTCGAGTACGAGGACGGCGAGGCGGTGCTGTACTGC




TACGAGCTGCAGGTGGCGCGGGCGGCGCAGCAGCGG




GGCCTGGGCCGAGCCATGATGGAGCTGCTGGAGCAAA




TTGCGTGGGGCGCCGGAATGAGCAAGGTGATGCTGAC




GGTGTTCACCGAAAACGTCCCGGCACTGGCGTTCTAC




TCCAAACTGGGTTACCGGCTTGATGAGACGTCCCCCG




ACTATAGCCCCGCAAGCGGCAACTGTAGTCCCCTGGA




GTTGGCGCACAGCGCGGGCGGCGGTGGCAGTAGCCGG




TGCAGTCCGGAGCTTGGCGCGGCGGCGGCGGTGACAG




CTACTGGTACGGGCTGCAGTGGCAACCGTAGCGCAAG




CGGAAGCCCGGAGGGCGGTGGCAGCGCTGCTGTCAGC




AGCAGCATGGCTGTCAGCAGCGGGAGCGCTGGGGGTG




CTGGGAGCGGCGAGGGCAGCGGGAGCGGCTACCACA




TTCTCAGCAAGCGGATTCCATCGGACTGGCGGGAGGA




GGTGAGGCTTCAGCAGGAGGCGCAGCAGCAGCGAGA




CGTGCAGCGTGCTGAGGTGCAGCAGCAAGTGGCAGTG




CGGAACGTGGCGCCCGGGCATCAGGCTCACGAGGAGC




ACCAGGTGCACCAGCAAGGCCAGTCGCCGCAGCCACT




GCCACAGCAGCTGGCACCGCTGCGGCAGGCAGTGGAG




GCCGTGGCTGCCATGGCAGAGGCGGCCTTGCCTGTGG




CAGCAGCAGCGGCCTCGCCGGCCGCAGTCTGCGCCCC




GGAGGCCGAGGCTGAAGAGCCTGGCAGTCGGAAAAA




GCAGCGCGTATCCTGCACGCCGGATGTCACCGGCGCA




GGCAGGAGCGGCAGTTGCGGGCCGGAGCTGGAGGAC




CGCGCTGAGGGAGCAGCGCAAAGCGACGTCGCGGCC




ACCGCCGGACACGACCTGTCACGGAATGGCACACCGG




TGCCCATGGTGATCCATGAGGGCACGGGTGCTGGTTC




TGGCGCCGGTGCTGCAGCGGCTGGGACCTCGAGCACA




GAGCAGGAGAAGGCAGAGCAGGTGAAGCCGGGGGCT




GCAGAGCCCGCGGCGGTACCGCCGGCGCAGGATGGC




GAGGCCGCGGGGGCTGGCATGAAGATATGTGGAGCGT




GCAGCAGCAATGGTGCAGCGGCCGCTGAGCACATACC




GTAG





49
135
ATGCTGGACCGAATTCATGAACTTGAAGCTGCCTCTTA




CCCAGAGGACGAGGCCGCTACTTACGATAAGCTAAAG




TTCAGGATCGAAAACGCGTCGAACGTGTTCCTGGTCG




CGCTGTCGGCGGAGGGCGACGGGGAGCCCAAGGTCGT




CGGGTTTGTGTGCGGCACGCAAACGCGCGCGTCTAAG




CTGACACACGAGTCCATGTCAACGCACGATGCCGACG




GCGCACTACTGTGCATCCACTCGGTGGTGGTGGACGC




CGCGCTGCGCCGGCGCGGCCTGGCCACCCGCATGCTC




CGAGCCTACACCGCCTATGTGGCCGCTACCTCCCCGG




ACCTGACCGGGATACGGCTGCTGACCAAGCAGAACCT




GATCCCGCTGTACGAGGGCGCGGGCTTTACGCTGCTG




GGTCCCTCGGATGTGGAGCACGGCGCCGATCTGTGGT




ACGAATGCGCCATGGAGCTTGAGGCGGAGGAGGAGG




CGGAGGTGGCGGAAGCCTAG





50
136
ATGGCAGCCAGCTTCTCTATCTCTGGCGATTTTGCCTG




TGGCCAGTCTACTGGTCACGCGACGTTCTGGCGGCTTG




AAGAGAACAAAGTCTTCGAGGTAGCCCTTGCAAGACA




CTACGCGGACGTGGACAGGTTCGAGCGCATCGCCTCT




TATCTGCCAAACAAGACGCCTAACGACATTCAGAAGC




GGCTCCGCGACCTCGAGGACGACTTGCGACGCATCGA




TGAGGGGTGTAACGAGGGCGCCTCAGCTCAGAGCGCC




CCCGCGGCGACCCCCGCACGTTCAGAGGACTCGGCGC




CGAACGCCAAGCGGCCAAAGACCGATGTGCCAGCCA




ACGGTGACCGTCGCAAGGGTGTGCCCTGGACGGAGGA




GGAGCACCGGTTGTTCCTGCTCGGGCTCGCCAAGTTC




GGCAAGGGTGACTGGCGTTCCATCGCCCGCAACTTCG




TCATTTCTCGGACGCCAACCCAGGTGGCGAGCCATGC




GCAAAAGTATTTCATCCGCTTAAACAGCATGAACAAG




AAGGACAAGCGCCGGGCGTCGATCCACGACATCACCA




GCCCGACGCTGCCCGCCTCGGTGGCCAACCCCGCCCC




GACCACGGGGCTAGCGCCTGCAGCGGCCTCGGGCAAG




GCCACCTCGTCATTGGTGCAGGGCGCGACCTCCTCCG




CCACCACTGCCACCTCGCAGCCCATGGCCGCCGCGGC




GGCCGCTGCAGCGGCAGCCTTCCCCGCGGCTGCGCAC




GTCGCCGCTGCCGCTGCCGCGGCCGCCGCCGCCGCCA




CCAGCACCACCAGCGTTTTCGCGCAGCTGGCTATGCA




CGGGCTTGCCATGCAGCCGGTGATGCAGCAAGCGGCT




GCGGCTGCGGCAGCAGCGGGCATGATGCCTCAGCTCA




ACGCGGCGGCCGCGGCCGCTGCGGCCGCCGGCATGCC




GGCGCCCGTGCTTCCCAACGCGGCGCAGTACATGGTG




CAGGTCTAA





51
137
ATGCGCAGCCAATACTTGCTTAACACACGCCGGTGGG




TGGTTCGCCTTGCCGATCAGTGCAGCCAGCGCGCGAG




CCTTACGGTGAGCGCGCAAGCCGCCGCCGCAAACGAG




CCAGTCACTGATCTACCGGAGCTAGTATCTTGGGTCTT




GCACCGAGGAGGTCGAGTGGATGGCGCAACGCTCGCG




AACCTGGCTGGGCGCGATGGCGGCAGCGGCTGGGGGC




TGAAGTGCACCAGAGACGTGCAGCAAGGGCATCGGCT




CATCACGCTGCCGAACGCAGCGCACCTGACCTACGGC




GCCAACGACGATCCTCGGCTCCTGGCTCTGATCGAGA




AGGTGCCCTCAGAGTTGTGGGGCGCTAAGCTGGCGCT




CCAGCTGATCGCTCAGCGGCTTCAGGGGGGCGAGTCG




CAGTTTGCCTCGTACGTGGCGGAGCTACCCAAGGGCT




TCCCCGGCATCCCCGTGTTCTTCCCCCGCACCGCGCTG




GACATGATCGACTACCCACCCTGCTCGCAGCAGGTGA




AGAAGCGCTGCAAGTGGCTGTACGAGTTCAGCACTGA




GGTGCTGGCCAGACTGCCGGGTAGCCCCGAGGACCCC




TTCGGCGGCGTGGCGGTGGACATCAACGCCCTGGGCT




GGGCCATGGCGGCGGTGAGCTCACGTGCCTTCCGCAC




GCGCGGCCCCACACAGCCCGCCGCCATGCTGCCGCTG




ATCGACATGGCCAACCACACCTTTAGCCCCAACGCCG




AGGTGCTGCCGCTTGAGGGCGGCGGCGGCGCGGTGGG




CCTGTTTGCGCGGCGGGCCATTACTGAGGGCGAGCCG




CTGCTGCTGAGCTACGGCCAGCTGTCCAACGACTTCCT




GTTCATGGACTATGGCTTCATCGTGGAGGACAACCCG




TACGACTCTGTGCAGCTGAGGTTCGACGTCAACCTGCT




GCAGGCCGGCGCGCTGGTGGCCAACGTGAGTGATGCA




CTGGGCGCCCCCCTGGACCTGGCGCCCCGCACCTGGC




AGCTGCAGCTGCTGGCCGAGCTGGGGCTGGTGGGCCC




AGCCGCCAACACCGAGCTCAACATCGGCGGCGGCGGC




CCGGGCGCTGAGCTGCTGGACGGGCGGCTGCTGGCGG




CGGCGCGCATCATGGTGGCGCGGGCCGATGGCGAGGT




GTCGGGGCGCGGCGTGGAGCGGCTGTGTGCTGTGGAC




CGACCGCTGGGTCGGGACAACGAGCTGGCGGCACTGC




GCACTGTGGGCGGCGTGCTGGCGTTTGCGCTGAGCAA




TTTTGCAACCACCCTGGACCAGGACAAGACACTGCTG




GCGGGGCAGCCCGTGGCGGTGCCGCAGGCGGGCGGG




GTGGGCGAGCGCGAGCTGCCACCCCTTGCCAGTGAGG




ACGAGGCTCTGGCGGTGCGGTTCCGGCTGGAGAAGAA




GAAGATCCTCAGCCGGGCGCTGCAGCGGGTGGGCGCA




TTAAGTCAGGCGGCCGCGGGCAACAGCGAGCTGAGGC




AGACGGCAGGCTCTGCAGCAGCAAAGAAGGGCAGCA




AGCCGGCGCCGGCCACTGGCAAGGGCTTCGGCTCCAA




GAAGCGGTGA





52
138
ATGGCAGACGCAACGGGCTCAACGCAAGACGACGGC




TCCAACACCGTGATTGTTATTGTAGGAGTGGTGCTTGT




CATAGTTGGAGGCGCGCTGCTTTATTCTTTTATTCAAT




ACCAGCGGATGATGGCCAACGCGCCCGCACGGCCAA




AGAAGAAGCTAGGGGCGAAGCAGATCAAGCGCGAAA




AGCTGAAGATGGGCGTTCGGCCGCCGGGCGACGACTG




A





53
139
ATGAACATGAACTCTCAAGACTGGGACACCGTTGTGC




TTCGCAAGAAGCAGCCTACTGGCGCAGCGCTGAAGGA




CGAAGCCGCTGTCAATGCGGCACGGCGGCAAGGTGCA




GCTGTGGAGACGTCGCAGAAATTTAACGCTGGAAAGA




ACAAGCCTGGTGCGGCTCAGACTGTGAGCGGCAAGCC




TGCAGCCAAGCTGGAGCAGGAGACGGAGGACTTCCAT




CACGAGCGCGTGTCTTCGAACCTCAAGCAGCAGATTG




TGCAGGCGCGCACGGCGAAGAAGATGACCCAGGCGC




AGCTAGCGCAGGCTATCAACGAGAAGCCGCAGGTGAT




CCAGGAGTACGAGCAGGGCAAGGCCATCCCCAACCCC




CAGGTGCTCTCGAAGCTGTCCCGTGCGCTCGGCGTGG




TGCTGAAGAAGTAA





54
140
ATGGGCAGCACATCAGGTGTTCGCACGTTCAGCAAAT




CCGATGACCCGGTCGCAGCGGAGGAGTGCTGCAACAC




GGTTGGCAAGGGTTTCGCCTCCGAGCCCAACAACGTG




TTCTTCTGTGCGGACCCCGCGCTCTTCGAGGGCAGGTG




GAGGGCCATCGCCCACAACAGCCTACTGCGCAGCCCC




GAGACCCCCCTGCTGCACTCGGTGGCCTCCGGCGATA




CGCAGCACGCGGCCGTTGCATTTGCTTACTCCTACCCC




GAGCAGAAGACACCGGATGACGCGCCGGAGCCGCCC




GGTGTCATCGACCTGTCCGGCAGCGGCCGGCCCGAGG




CGGTACCCACACGGGATGAGATGCTCAAGTACCTCGG




GGACAAGAAGACCGAGTTCTACCAGCGGCGCGGGCC




GTTCGAGTACGTGGCCTTCCTCGCCACTCGGCCCGAGC




ACTGGGGGCGAGGCCTGGGGTCGCGGCTGCTGAAGCA




CCTGACCGACAGGGCTGACGCCGGGGGCCGGTGGGCG




TACCTGGAGGCGACCAACGCGGACAACGCGCGGCTGT




ATGCCAGGCACGGCTTCCGCGAGATCGAGACCAAGGT




GTGGACGCTCGAGTGCCTGCCCGGGCAGCGCATGATG




CTGATTTACATGGAGCGACCACCCTCGGCACAGCAGC




AGTAG





55
141
ATGACGGATTACCTAAAGGACTTCATTGACAGGGCTG




CAGATGTGCCCCTGCAGCTGCGTCGGCGCCTTGCCCTC




ATCCGTGACCTAGACGAGAAGGCACAGGCGCTGCATC




GTGAAATAGATGAGCACTGCAAGCGCACGCTGGCGGA




GAAATCGCAGCAGCACGCAGCTAAGAAACAGAAGCA




GGCTGCGGGGGAGGACGCTGGCGGGTCAGCAGCGGC




GCCGTACGACGTGGAGTCGGCTCTGAAGCGGCTCATA




GGTCTCGGGGACGAGAAGGTCAACATTGCTAACCAGA




TTTACGACTTCATGGACAACCACATCAACCAGCTAGA




CACGGACTTGCAGCAGCTGGACGGGGAGATTGAGGCG




GACCGCAAGGAGCTAGGGCTGGAGGGTGACGAGACG




GCCTGCGAAAAGCTGGGCATAGAGGCGCCGCAGGGG




TCACGGCCGCACACGGTCGGGAAAGGGGCAGCGGAC




CAGAAGAAGAAGCGCGGGCGGAAGAAGGACGAGTCG




ACGGCAGCTGCAGCCGGTGGGCTGCCGCCCATCGAGA




ACGAGCCGGCGTACTGCATCTGCAACAAGCCGTCGGC




GGGGCAGATGGTGGGCTGCGACAACCCCGAGTGCACC




ATCGAGTGGTTCCACTTCGAGTGCGTGGGGCTGACGG




AGGAGCCCAAGGGCAAGTGGTACTGCCCCGTGTGCCG




CGGGGACCTGCAGGTCAAGTCGGGCAAGAAGAGCGG




GCGGCGGTGA





56
142
ATGGGGAAGAAGAAGAAGCAGAAGGAAATCGAGCAG




TGCTTTTGCTATTATTGCGACCGCATTTTCGATGATGA




GTCGGCGTTGATTGTGCACCAGAAAAACAAGCACTTC




AAATGCCCAGAATGCAACCGCAAAATGAACACCGCCC




AGGGCCTGGCAACGCACGCGTTCCAGGTGCACAAACT




AACCATCACTGCTGTGCCCGCCGCCAAGGCCGGGAGA




GATTCCATGGCTGTGGAGATCTTCGGCATGGCGGGCG




TGCCGGACGACGTGCGGCCCGCCAAGCTTCAGGGTGA




TGGGCCTGCGCTCAAGAAGGCGCGCGCGGACGACGAC




GATGACGTGACGCCGCCGCCCGCGCCGCCGCCGCCGC




CGGGCGGCATGCCGCCGCCGATGGGCGGCTACCACCC




TGGCATGCCGCCGCCCATGGGCTACCCGCCCTACGGC




GCACCACCGCCGTATGGGTATCCGCCCTACGGGCCGC




CCCCGCCGGGGTACCCGCCGCGCCCGGGCATGCCGCC




TCCCTACGGCGCGCCGCCTCCCTACGGCATGCCGCCTC




CCGGCTACCCGCCTCGCCCCGGGATGCCGCCCCCAGG




CATGCCACCGGGTGCGCCGCCGCCGCTGGGCGGCCCG




CGGCCGCCCTTCCCGCCCTACGGCATGCCGCCACCGG




GCATGCCGCCTCCGGGCATGCCTCCCCCCGGAATGCC




GCCACCAGGCATGCCGCCGCCAGGGGCACCAGGCGG




GCCCCTCTTCCCCATCGGGCAAGCGCCACCGGGCGCG




CCGCCGGCACTTTTCCCCATTGGCTCTTCGGCGCAGCC




GCCGGCTGCAGGGGCAGATGCAGGGGCAGGGGCCGC




CGCAGCGCCCGCCGCGGCGGGATCGGTGGCGCCGGCG




CCCGGCGACGGGTCGGTGGTGGTGTGGACGGATGAGG




AGTGTTCCATTGAGGAGCGGCGGGCGCAGCTGCCTCG




CTACGCGATCGCGGCCGGGGGGCCAGGGCGCAACGG




GGCATGA





57
143
ATGAAGGACGACGCGGCAGCGGCAGCGGAGCGCCCG




GCGGACATGCCCACGGACGCCGCGGACGCTGCCGGGC




CGGGCCCCAACTCAGCTGCCGTGGCCGCGGCCGCTGG




CTCAGCAGGCATGTTCCGCCGCAAAAAGGGTGGCGCC




AACATTCGTAAGCGCGGCGGGGCGGAGGGCGGCAGC




GACGACGACGAGGCGGGGGGTGGCGTGGTGCGCAAG




GCCAAGGCCGCCAAGTCGGACGCGCCGCTGGCGTTCA




CGACCAAGAAGGACGACAAGGAGACGTTAATGGTGG




AGTTTGCGGGCTCCAAGGCGCTGCAGGACGGGAAAGA




CACGCTCGCGACACGCGTGCTGGAGACGGAGACGGA




ATATGACCGGGACGCACGGGCGCGGCGCGAGGAGGT




GCTTAAGCAGGCCACGGCGGCGGAGGGCGCGGCGGA




CGACGGCACGTACAAGGGCATGAACGCATACGTCGAC




TACCGCAAGGGCTTCCGGCGCGAGCACACGGTGGCGG




CAGAGAAGGGCACCGGCTCGCACGGCCCCCTGCGCGG




CAACGCCTACGTGCGCGTGACGGCCCGCTTCGACTAC




CAGCCGGACGTGTGCAAGGACTACAAGGAGACCGGCT




ACTGCTCGTACGGCGACACGTGCAAGTTCATGCACGA




CCGTGGAGACTACAAGAGCGGCTGGGAGCTGGATAA




GATGTGGGAGGAGGAGCAGAAGCGCAAGGCGGAGGC




CCTTGCCAAGGGCTGGAACCCGGACGCCGATGGCGAG




GAGGAGGAGGAGCAGGGAGGCGGCCGGGAGGATGAC




GAGCTGCCGTTCGCTTGCTTCATCTGCCGCGAGCCCTG




GGAGGCCTGCAAGTCGCCGCCGGTGGTGACGCGCTGT




AAACACTACTTTTGTGAAAAGTGCGCGCTCAAACACA




ACGCCAAGACGACCAAGTGTGCGGTGTGCGGAGTGGC




CACACAGGGCATCTTTAATGTGGCGCAGGACATCATC




AAGCGCCAGAAGCGCATGGGCGTGGTGGGGTGA





58
144
ATGGAGCGCTTTGACTCCCAGATGCTGTTCAGCGTCTT




TAGGAACGACGAGGGTGAAAACCTTTTGCCGTTTGAT




GAACTGGCGGAGCTGCTTCAGATGGATCTGGCTCCCA




ATGGCGACGCCGGGGCCACGCCAGCATCGTTCGCACC




GGACGCCGCTCTGCCCCTAGACCTCCCACACCTGCAC




CACGCGCCACCCATCATCACCGCGCCGCTAGTCACCA




CCGCGCCGCCCACCGGCCCCATTCCCTCTGACGAGCG




CGCCGCAGCGCTGACGCACCAAAGCACTCTGCCCAGC




CCCAGCGGCGGTAGCAGCGACCACACACGCGCCCAG




AACTGGGCCGGCTCGAACCCATCATCAGAGGACGGCG




ACGGAGATGGCGACCGCGACGGACGCGACGGTGACG




GCGACAGCGGAGACTCAGACATGGACCACACCACAC




AGACGCCGGGCGTCAGCGGGGCCGGCGACGCGGGCG




GCCGCGGGCGGCGGGGCAGCAGCAAGGGCGGCAAGG




CGTCATCGGGTGTGAAGAAGCGGCGGCAGCGCAATGC




CGAGCAGATGGAGTCCAACCGCATCGCGCAGCAGAA




GTACAGGCAGCGTAAGAAGGGCGAGCAGAGCGCGCT




GCAGACGGCTGTGGACTTGCTCACGGCGCAGGTGGCG




GCACTCAAGGCCGTGGAGGTCCGCAACGGCGAGCTGG




AGGCGGCCGCAGCGGCTCTGCAGTCCACGGTGTCTCA




GCAGGCCGCCGCCGTGGCCTCGCTGCAGCAGCACAGC




GCCGGGCAGGCGGCGGAGCTGGAGGCAACTCGCGCG




GCGCTGGGGCACAGCCAGCAGCAGGTGGCCGCCCAG




CACCGCATCATCGTGGACCAGGGCACCAAGCTGAGGC




TGCAGGAGCAGGTGATTGCAAGCCTGAAGGACCGACT




GAAGGAGGAGATCGACGAGGCATTGAAGTGCGTGGC




GCCAAACACCGTGTGCGAGAAGATGGTGGCGGCGGTC




AAGGCCGCGCTGTACGGTGCCAAGGACGTCAGCGGAC




TGCAGGACGTGCTGTCCCAGCTGCCGGAGCACCTGGT




GCACGACATCTGCAAGAACATCTGGCAGGTGTGCAAG




GAGTCCTGGCCCGACCTGCGCAGCCGCTGCGCCACCC




TGCACGCCGCCGGCTGCCCCACCAGCGGCTTCGGCAC




TGCCTGA





59
145
ATGTTGCGCCAGCTTTGCAGCCGCAGCCTGCAGAGCC




TGGCATCTCTGCAGGGCCGCTGCACCTCGGGCTTGGC




GACGACGCTTCGTGCTGCGAGCAGCCTGAGCGAGCTG




TCACGGCCAGCCCCTTCAGTGGCGACCTCGCAATCAC




CAGCATGGTCATATAGAAATAGCAACTTGCTAGCGGC




GCCACCTCTGGGCTTGGGACTGGCGCCCCAGGTCCGC




GTAACCCCGGACGCCTCCACCATCCTCAGCCTCTTTGT




AAGCCAGCGGCGCAACGCAGCCGCAGCGGCTGCCGC




GGCCGCCGTAAAGAAGGCCGCACCGGCAAAGAAGAA




GAAGAAGAGCGCGCCGAAAACGGCGGCAAGCAGCAA




GCCTAAGCCCAAGCCCAAATCGACAGCAGCAGCCGCA




ACCAAGGGCCGCGTGCGGACCAGACCCGCCAAAGCC




CCGGCGCGCAAGTCGACCACCACCGCCGCGGCCAAAC




GCAAGAAGCCCGTCCGCAATTCCATCTCCGCCGCCGG




CCGCAAGGCCGCGAAGGCCGCGGAGGTCAAGGCCCG




GCTGCGAGTGCGCGCGACAGCGCAGCGCGCACGCGC




GCGTGCCGCCAAGGCCCTGGCCATGAAGCGGGAGCGC




GCCAAGCTCGCGCGGATCAGGCGGCGCGAGCGCGAA




GCGCTCAGGAAGCAGAAGCAGCGGGAAAAGCTGGCC




GCGGCAAAGGCCAGGGCCAAGGAGAAGGAGGCGGCA




CGCATCAAGAAGGCGCCATCGGCCTTCGGCCTGTACC




TGCAAGACCACTCCAAGGCGGTGCGCGACGCCCTGCC




CGCCGGCGCCGCCAGCGGCATGCAGCGCCAGGCGCTC




GCGTTCAAGGTGCTGGCGGAGCGCTTCAAGGTGCTGC




CGGAGGCGGAGAAGGCGCCGTACGAGGCGCGCTCGG




CGGCGCTGAAGGCGAAGGTGGCGGAGGCGCGCGCCC




AGGCCAAGGCGGAGAACAGCGCCAAGGCGGCCCTCA




CGCCCTACATCTTGTTCTTCAAGGAGTCCTACAGCGCC




ACGCGCGCCGCGCACCCGGACCTCAACGCGAAGCAG




GTGGCTGCCAAGATGGGGCAGTTGTGGAAGGCGATGC




CGGCGGAGCAGCAGCAGCGCTACCGCGACCTTTCAGA




GGCGGACCGGAAGGCGAAGGGCCTGCCTGAGCTGAA




GAAGAAGGCGGCAGCGCAGACTCAGGCCAAGCGGGC




GTGA





60
146
ATGGCTAGCCTGGTCTACTCCCACGAGTGGCTGATCTC




CAACTTTTTGAAAGTGGAGGCCCAGTCCGTCGACTCG




CCTTCCTTTAAGCTGGGCCCTCATGCCTGGAAGCTTCA




ACTCTACCCCTCTCAGGATAAAACGCACCTGTCCGTGT




ACCTGCGCTCCGTGGAGCCGAAAGCACCGCGAGCAGT




GAACTTCAAGTTCGTGCTGCGCAATTGGCAAGACCCC




AAGGATGACTTCAAAAGCGCAGACGCAAGCTACACCT




ACACCGACGCGTGCGTGGCGGGATATGGCTTTCCCAG




CTTCATTCCTCGCGAGAAGCTCAGTATCGCCTCCGGCT




TCCTGCGTCCCACTAGTCCCACCAACGGCGGCGCGTT




GCTGCTGCGTATAGAGCTCGAGTACAACACACTTCCG




GCGGCCTCCAGCGCGGCGGCGGATGGCAGCAGCGGC




GGTGACGGCGGCGGTGGCGTTTACCCGGCAACTGTGT




GCGACGGCGCGGTCTCTGCCGGTAGCGGCGACATTGC




CACGGACCTGCTCTCACTCTGGAAGCGCCCCGGCCCC




ACCTCCGATCTCATTATCATCGCTACCGCGCCCGCCGG




TGCGGCGGCGGCAGTGGCGGCCAACCCAACAGCAGA




GGTCTTGGGAACGGGAGCGGGCGCGGCTGCTACCATC




AAACCCACCACTGCCACGGCGGCGGCTGACGGCGGCG




GCAGCAGCTGCGGCCCCAGCAACACCGGCATGCGGCG




CTTCGACGTGCACCGCGCCATCGTGGCCGCGCGCTGC




CCCTACTTCGCCACGCTGTTTGACAGCGGCATGCGCG




ACAGCAGCGCACGCGAACTGCCGCTGCCCGACACCGA




CCCCGCCGCTCTGGAGCCGCTGCTGCACTTCATGTACG




GTGGCGGGCTCACCGTCACTACCCGCCAGCAGGCGCG




CAGCTCCTTGGAGCTGGCAGACCGGCTGCTGCTGCCC




AAGGTGGCGGCGCTGCTGCGGACGCACCTGCTGTCCA




CCGTGACTGTGGCCAGCGTGGTGCAAGAAGTTCTGTG




GGCGGCGGACGCGGCGCAAACAGAGCTGTTGACGGG




CCTGCTAGATTTCGCGGCGGAGGCAGAGGCTGACCTG




CCAGAGCGCGACCTGCAGCAGCTGGCGGCGCAGCAG




CCGGCGCTGATGGCACAGCTGTTCACGGCCGCTCGCC




GCGCCGCGAAACGCTCGTGCACGTAA





61
147
ATGAAGATGTTGGAATTTCGCCTGAAGCTGGGCACCG




GAGCAGACTGGGAGGCGCTCGGACCTATTCCAGAGCC




GTTTCCGTTCTCCATCGACGCGGACTGCACCACTTTGG




CTTTTAAGCACTACCTCAGCCACAAAATCCTAAATGG




GGTCTTCGAGCCTGGAAACTTTCAGCTTCGGCTGCAG




GGCTGCGACAAGGAGCTGGAGGACGTCGCTGACGCCG




GCCAACCCACCACTTCCACGCACCAACCCCAGCTGCG




GCGGCTTGCCAGCCAGGGCGTGTGCAACGGCAGCGTG




CTGCAGCTTGACGTGTGTGCGACTGAGGAGGAGCTGC




AGCGGTTCCTGGACGCAGCGGAGGAGTGCGGCACGGC




TAACGAGCTTGGGCACGTCGAGGAGCAGCAGGAGGC




GCAGACGCCACCAGCGGCAGGTGTCGATCCGCGGCAG




CGGCACGCAGCAGAAGGCAGTGCGGCGGCGGCGGGC




GACGGGCCAACCGGGCGGCCCAGCCTTGGCATGATGC




ACACGCCTGCGGGCACGGTGGGCACCTTTCTGGACGA




TGAAGACGCGGACTACCTGCAGGAGGACCTAGAGGC




GCTGGTGCAGCCGGCGGCGCAGCGGGCTGGGGAGGA




GGAGCTCGATCACTTAAACGTGGCGGCCGACGGCGAG




CCTTTTGAGGCCGAGGACGCTGAGGACTTTGAGGCGC




ATGGCAGGGAGTTGCGAGGAGCAGGAGGCGTTGTGG




GGGCCCCTCAGCAACAGCATCCGGCCTTCGCGGCTGC




GGCGGAGGGGCGAGAGCAGGAAGGTGACGACGAGGA




CTGGGGCGACATGGGCCTGCGGTCAGCCGGCACTCGG




ACCGCGGGCCAGCCCGAACGGCGGGCTGCGGTCGCG




ACGCCGGCGCGGAAGCAGCAACAGCAACAGCAGCGA




CCGCGGGCTAACCTCCAGTCAGCGGCCAAGCGGGCGC




GCAGGGAGGCGCCGGAAGAGGAGCTTGACTTCGTGTC




GGGGTCAGCGGACGAGGGCGCTCAGCCCGCCCAGCA




ACAGCAGTGCACGCATGGCGCGGCGATCGTCGGCGGC




AGCACCAGAGGCGCCGCCGCGCCTGCACGCGCGGCG




GCAACGGGTGCTGCCACTGCTGCTGGCGCCGCCGCAC




CTAGGTCGCAGCCGCCGCGACAGCCGGCACTTGCACG




GTCTACGGGACTGCCGGCGGCCATGCAGCCTGCAGTG




GACACGGGCGCATTTAGCGCCTACGGCGGTGGCGGTG




GCCAGCAGCGAGCCTCAAGTGGCTTCTGGTCGCTGGA




GGAGACTGAACGCCTGGTGGAGTGGGTTGACTCGCAC




GGCGCGCGGCAGTGGACCATGTTCGTACAGCTGAACA




CCGACTTACACAGAGACGTGGAGCAAGTGAAGATGA




AGTGGCGTAACCTCAAGAACGCCAGCAAGAAGCGCTG




GACCGTTGCGCGGAGAGTACCTCCGCCGGACCTGCGG




GCACGCATCGACGAGATTGTGCGTCGGGACACTTAG





62
148
ATGGCTGCAAGCACGCTCGGGGATGCGCAGCAGGTCG




AATCCTTTGTGCACCAGCTCATAAATCCTGCGACACGC




GAGAATGCGTTGTTGGAGCTGAGCAAGAAGCGGGAG




AATTTCCCGGAGCTTGCGCCCTACCTCTGGCACTCCTT




CGGGGCAATCGCGGCGCTGCTTCAGGAGATCGTGGCC




ATTTACCCGCTGCTCTCGCCGCCGTCGTTGACAGCACA




TGCATCAAATCGCGTGTGCAATGCTCTGGCGCTGCTGC




AATGCGTGGCGTCTCACAATGAGACGAGGGCACTGTT




CCTCCAAGCGCACATCCCGCTCTTCCTGTACCCCTTCC




TCCAAACCATGAGCAAAACGCGGCCGTTCGAGTACCT




GCGCCTGACCAGCCTGGGCGTGATCGGCGCGCTGGTC




AAGGTGGACGACACGGACGTGATCAACTTCCTGCTGT




CCACCGAGATCATCCCGCTGTGCCTGCGCACCATGGA




GATCGGCACGGAGCTGTCCAAGACCGTGGCCACCTTC




ATCGTGCAGAAGATCCTGCTGGACGACGTGGGCCTGA




ACTACATCTGCGCCACTGCCGAGCGCTTCTTCGCGGTG




GGCGCCGTGCTGGGCAACATGGTGGTGGCGCAGGCGC




AGATGGTGGACCAGCCCAGCCAGCGGCTGCTCAAGCA




CATCATCCGCTGCTACCTGCGCCTGTCCGACAACCCGC




GCGCGCGCGAGGCGCTGCGGTCCTGCCTGCCGGAGCT




GCTGCGCAACACGCAGTTCACGGCGTGCCTGAAGAAC




GACGACACCACGCGCAGGTGGCTGGCGCAGCTGCTCA




TGAACGTGGGCTTCTCCGACTCCGCCGCGGCACTGGG




TGCGCCCGACGTGGTGCAGCCATCGCCCGTCATGGGC




GCGTGA





63
149
ATGGGGAGCAGCAGCGAACGATTGCCAGCAGGTTCTG




GTAGCTGCCTACACCCTGGCTGCAGCGGATTGTGCTGT




CTGGCAAAAGCCCCAGTCTCCGACACCATCGTCGTTTC




TACCGCGGCCCCCTCCGCGGGTTGTGACCTGAAGCTG




GTGTGCTGCGACGGCGCGCTGATGGCCAGCCGCTGCG




TGCTGTGCCGCGCCTCGTCCGTGCTGCGGTCAACGCTG




GAGCTGGAGCTGCCGGAAGCAGGCGAGCTGCGCCTGC




CGGCAGACAAGGCCGAGTCGTGGCGCATGGCCCTCAG




CTTGCTGAGCCTGGAGGCGTACCCGCTATCGCTCGTG




ACATCGGACAACGTCGTGGACCTGCTGCTGCTGGCCG




ACAAGTACGACATACCCATCGTCCGGGGCGCCTGTGC




GCACTTCCTGCACCTGAACGCGCGGCAGCTATCTCTA




GTGCCGCCGCTGTCCTCTGCCTCCAACCTGCTCACCGC




CGCCAGCCTGGTCATCAAGTTCGTACAGCCGTACCCG




GGGCTGCAGCAGTACGGCAGTACGGTACAGGCCCGAC




TGGATGATGAGCTGGCGATGCTGAGGATGCCGCCGGA




CGTGCTGCTGGCGGCTGTCCAGGCTGCGGGCGGCCCG




GGCGCCCCGGACCGCGCCGCCTCCGCTCTGGCGGCCT




GGCAGCGCGACCTGGTGCGGCTGACGTCCGAGCTGCA




CGTCCTGGTGGGCGCCGCCGACTACGCAGGCACCGTG




GCGCCGGAGGTGCAGGCGGCTGTGACCTTGGGGCTGC




TGGCGGCGGTGCGGCACAGCGCCTCCCGTGTGGCGCC




CACGTGCGGCCGCTGCGGCGGCGTGCTGCAGGCGGGC




CCAGGGGCACTGCACGCAGACTGCGCGGCAGCGCAAT




ACACAGACCTGCACACACGCGGCTGCCGGCTGTGCAA




TGCGCCCATGCTGCCCACCCATGCGCGCTTCTGCAACT




CGTGCGCCTACCGCAAGCACAAGAAGTCATAA





64
150
ATGGGGTTTCCGCAGCTGATGGTGCAGGTGCTGCCAG




CGCAGGCGGCCCTGGCAGCCCACCTTCAACAGCAGCA




ACAGCAGTCCATAGCGGCGGCACTCGCGCCCCAGCTG




GCGGCGGCGGTGCACGCACACGCTGCGCCCATGGCGC




CTCTAGCTGCGCCGCCGGCGCAGATACCCGCGCGCGT




GGCCTCGCCCACGTACCGTCATACCGGGAGAGCGCAA




GCCGCGGAAGCCGCCGCCGGGTCGCGAGCACCGGTTA




GCCATAGCACGGTGGAAAAGCAGCGGCGCGACCGCA




TCAACTCGCTGATCGACGAGCTGCGGGACCTCGTGCC




GCCGACGCAGCAGCAACAGCAGCAACAGCAGCAGAT




TGGGGTGGTCACCATTGGTGTGAGCGACAACCCGGAG




GCCTCGTCGCGGCGGCCCAAGCACGTGGTTCTGGCGG




ACACTATCAACCTGCTGAAAGCGCTGAGGCAGCGGGT




GTCGTTTGCGGCTGTGACGGCGGAGCTGCAGCAGCTA




CCGGCGGGCGGCAGCGGCGGTGGCGGTGGCGGCGGC




GGCGGCGCACTACCACTGCCACTGCCGGTGCCAGGCA




TGTACGGTGCTGTGGCAGGCGCGGGCATGGTGCCGGG




CATGCCCGGGAGCGGCGTGCAGCCAGTGAAGCAGGA




GCCGCAGGGGTCATCCAGCCAAGATGATGACGACATG




GGACACCCCGGGGGCCCCGGCGTCACAGTCAAGAAG




GGGCCAGACTGTTTCTACGTCCAGGTCACATGTCCGG




ACCGCAAGGGGCTGCTGTCGGACATCACCGACACGTT




GCGGAACTTATCACTGGAGGTCCGCACGGCCGCCGTC




ACCACCAATGGCGGCTCGGTGCGTGACGTGTTCGAAG




TGGTGCCCCCTGACGGCGCCGTCGCACTGGCGCCCGA




GGCGGTCCAAAGCATGGTGCAGGGCGCGCTGTCGCAG




CGCGTGGCGGAGGGGCAGCAGGAGGTCACGGCAGGC




AAGCGCCCGCGTGCATGA





65
151
ATGTTTCCAAACCCATTTTTCGGCATGGGCGCGCCCTT




CGGGCCGGGCATGAATAACATGGGTGGTATGCCGGGA




CAGGAGATGGCTGGGATGCCGGGGTTCCCGGGCATGC




CTGGTGGCACGATGGGTCCGGGGATGCCCGGCGGCAA




CATGGGTGGTGGAGGCGGTATGATGGGCGGTGGCCCG




ATGGGCGGCCAGGGACACGGCGGAGGCGGAGGAGGC




GGAGGCGGTGGCGGCGAGGGCCACCGGGGCGGCATG




GGCGGCGGCGGAGGGCGGGGCCCTGGCGGCGACAAG




CGCCCGGGTATGTGCGTCAGGTGGTCCAACAGCGGCA




GCTGCCAGTTCGGAGACAGGTGCAGGTACCTGCACGG




GCAAGGCGACAGCCGGTACCCGCCAGGGCCATCCGAC




GGCGGGCCGGGAGGGTTCATGGGCGGCGGCGGTGGC




GGCGGCGGCGGCGGTCCCATCCGCCGCGGCGGGAGA




GGCGGCGGCGGCGACGATGGGCCGGGCGGCCGCGGC




TCCCGCCCTACGGGCCCCAAGACGCGCCTGTGTGAGA




AGTTCATGGCCACGGGAACGTGTCGGTACGGCGACAC




CTGCATATTTGCACACGGGATGGAGGAGCTGCGGCCG




GGCCGTGACGCCGGAGGCCCGCCTCCCCCGCAGCCGC




CGCCACAACAGGCGCAGCAGATGCAGCAGCAGCAAC




AGCAGCAGCACCAACAGCAGCACCAGCAGCAGCAAC




AGCAGCAACACCGGCAGCAGCAACAAGACGGGGGCA




ACGCCACGTCACCTTCGCGAGGCGCCTTTGGCGGGCC




GAGCGGGAGACCGCAGGCGCAGCAGGGTGGGCCGGC




GGCGGGTGCCCGTGGGCAGCCACCACCAGCAGCAGC




AAGTGCGCCACAGGATGCAGCCGCTGCGACAGGCGCA




CAAGCAGCCACAGCTGCAGCAACCGCAGCCGCACCCT




CCAAGCCCCAGGAGGTCACCTTTGTGGACAAGGTGCG




CGCGCTGTGCGGCGTGCTGCACATCGGCCAGGCGGCG




GCGCTGGCGGCGGAGAAGCCGCTGGCGCTCACCACCG




CGGCCATGTCGCTGCGAGCCGGCACCGCCTACAAGGA




GAACCCTTTTGCGGACGGAGTGGAGAGATACGTGGCG




ATCTCGGCCGGTGGAGGGGGAGGGGGAGGCAGCGCC




GGCCAGGGGCAGATGCAGCACTAG





66
152
ATGATTAAAGGCGTGAACCGACCAGCCATTTTCTATG




ACTTGGTCGGGCTTGCGCACTCAGGCGTGGTATCTGTG




GGCAGAGAGAGCGATTCTACTATACGCCTCGACTGCC




CGGAAGTCCCTTTCCTGCTCTCGCGCAAGCATGCTAAA




ATTTGCGTCAATCCAGACGGCAGCCTTATTCTGAAGG




ACATCAACTCCACGAACGGCACCTACATCGCTCGTGA




AGGCGAATTTCTCAGGCGGCTGCGGTCGGATGAGGGC




TGGGAGCTACGCCGCGGCGACCTGATTGGCTTTGGCG




GGCCGGAGACCATTGTTGCGCGTAGCGATGTGCCGGA




CGTCACCGTCGCCAACCCCTTCCTGTTCCGCTACACGC




CGCTGGACGACGATGCAGATAGTGCGTTTAACTCGTC




TGCAGAGCAGCAGCTGCTGGGCAACGGAGCGCAGGG




TCGCGCACGGAAATTCAGCGAAATTGAAGACCGCTGC




CAGCAAGAGGACATGGACTGCGAGGTGGCGTCCACCA




GCTCACCCGACAAAGAGAGCAAGAAGGCCAAGACGG




CGGTGACAGCAAAGGACATCGTGTCTAACCTAGCAAA




CCACCTAACGTGTGCCATCTGCCACGACTGGCTCGCTG




GTGCACATGCGCTAACATGCGGGCATATGTTCTGCGG




CATCTGCCTCGCGGGGTGGCTGGCACAAAAGCAATCC




TGCCCGGAGTGCCGGAAACCGAGTGCAGGTGTCCCTG




TGAGGTGCCGCGGTGTCGACAACTCCATCTCTGACAT




CCTCCAACACAACCTGGTGTCGCCGAACTCAAAGCGT




GAAAGGCGTCGGAAGCAGCTGGCGTGGGAGGAGGTC




GGAGACGGTGTGCTTGAAAGCTGGACAAATGCGATGC




AGCAGCGGCGGCAACAGGCTGTGAACGTAGCATCGCA




ACACCTAGCAAACCTGACGGGGCAACCTGCACCTGCG




CCGGTTGCTGCTGCACCACGACCAGACGGCGCCGTGG




TTGGTCAGAACACGCGGCGGGCCAACCAAGGCGGCGC




GCCGCCAGTCAACCGGCCAGCCCGGTGA





67
153
ATGGAGGGCTGGGGAGCAGCATCCACCATCCTCGGGG




TGGCCCATTTGCCACCCGGCAACGCTTGGGGCCCGGA




GGAGTGCCTAACCTTTCATACCCGAACCGCGGTGTAC




CGCCTTCCGCTCAGCGCAACCGCCTCCGGAGGCGGCG




CGACGCCGCAGCTGCTGGCGGGGCAGGAGAGCGAGC




GGGGCGCAGCGAGGGTCGATGGCAGCGGTGCCGACG




CCCGGTTCCATCACCTCAGCAGTGCAGGCCTCCAGGT




CAACGCAGACGGTCGGCTGTTGCTCCTTGACTTGGACT




CAACAGCGGATGTAACGCGCCTGCGCCTCGTTGCTCC




TGGTGGAACTGTTAGCACGGTGACAGGCGTGGAGCTG




GCTGGCCGGTGGGTAGACCTGGTAATCCTGCCAAACG




GCTACCTAGCTGCACGTGAAATTGCACAAGTGCAGTC




TGACGGGGACCTGGATGAAGACGAAATGGCAGAGCC




GTACTGGGAGAGCAAGCGCGTTGCGGTGATTGCGACC




AGCTTCACACCACTGGCGCTTGTGGCAACAGCAGCAG




CGGCGGGGCCGCCGCCGCGCAGCCTGCCCGCCGACCT




GGGTGCGCTGGTGGAGGACGCGCAGCAACCTGGCGGC




GGCGGCGCAGTAGCAGACCTGGTCATTCGCGTGGGCG




AGCGGCGCTTTCACTGCCACCGGGCCATCCTGTCCGC




GCGCTGCGACTACTTCAAGCACCGCCTGGCGGGCGAC




GCGTTCGAAGACGCGCGCGCGGCGGAGCTGGAGCTGC




CGGACGCGGACCCCGACACCTTCGCGCTGCTGCTGCG




CTGGTTGTACACGGGCGGCGCGGACATTTTGCCTAAA




CAGGCGCGCGGCGTGGCTGAGCTGGCGGACCGGCTGC




TCCTGCCTGAGCTGTGCGCCCGCGCGTTGGACGTGTTG




TTCGCGTCAGTGGACGCCGGAAGCATCGTGGACAGCC




TGCTGTGGGCCGCGGGCTGCTGCGAGGCGCACGGTGG




CGGCGGCGCTTTCGATCAGCTGTTGCTGCGGCTGAAG




CGCTGGTACGTCGAGCGGGCGGCGGAGGTGCGGGCCG




CGGCGCGAGACAGCCTGCGGGCGCTGATGACCCAGCA




GCCTGACCTGATGCTAGAGCTGATGGAGGCGAGCGAG




CAGCGGGCGGTGAAGCGGGCCCGGACCAAGTAG





68
154
ATGGCGGAGCTTGAGGATGATGTCCTCGTTCAGGCCG




GCGAGCAGGACGATGCCAACGACCTCAACCGGCAGCT




GTTCGGTGCCGATAGCGACGATGAGGGCGCGCCGCCC




GCGGCCGACCCGCACGCCCAGGCGCAGCACCTGGCGG




AGCAGGAGGCGCTGCTGGAGGATGACTTGGAGGACG




CAGACGTAGACGCCGAGGCGGCGCTAGAGGACGAGC




TGTCGGGCGGCAGCAGCGACGACGGCGGGGCGGTCA




AGAAGGGCAAGAAGGATAAGAAGCTGCGCAAGAAGC




GCGAGGGTGGCAAGGACGACAAGCCCAAAAAGAAGC




GCCAGCGGGGCGAGGGCGGCAAGGGTGAGAAGGGCG




ACAAGGCGGGAAAGAAGGGCAAAGCCCCGAAGGAGA




CCATCGCCACGGGCAGGTCTCGGCGGACGCCGGGCGG




TGGCGAGGCGGGCGAGGAGCAGCAGCCGCGCCCACG




CCGCCCCGTGGGCGAGGGCGGAGACGACCTGCCCAGT




GATGAGCTGCAGGAGCAGGAGGCGGACCGTGCCTTCA




TTGACGATGACGGTGCGGAGCCGGTTGCCAGTGATGA




TGAGAATGCGCCGCGTGTGGTGGCGGACGAGGCGGA




GGAGGCGATTGACGCGGACGAGGACCACCCCTTCAAG




CGCAAGAAGCGGAAGAAGGAGAACACCGGCAACGTG




GAGCTGGAGATCAAGGAGATGCTGGGCAAGATGGAG




GCGGCCATGGAGCATGACTTCGAGACGGTGGCGCGCA




ACGCGGGCGTGGAGCTGAAGAAGGACAGCGGCGACA




ACCTGGTGACGGACGCGGAGGGGCACTACGTGGTGGC




GCGCAAGGGGCCGCCGCCGGCCTCCAAGAGCCCCGCC




ATCAGCAAGCTCAGGCTGCTGCCGGAGCTGGAGCTGT




TCCTGGCGCAGCGCAAGTACCACGAGAGCTTCCTGCA




GCAGGGCGGGCTGGGTGTGCTGAAGGGCTGGCTGGAG




CCCTACTTTGACGGCACGCTGCCCACCATGCGCGTGC




GCACGGCGGTGCTCAAGGGGCTGCAGACCCTGCCCAT




CGACACGCGATTTGAGGACCACAAGGAGATGCTGCGC




AAAAGCCAGGTGGGCAAGAACGTGATGTTCCTGTTCA




AGTGCTCGGAGGAGACGGCCGACAACCGCCGCATCGC




CAAGGAGCTGGTGCACCGCTGGAGCAGACCCATCTTC




TACGACCAGGAGGCGGAGGAGGCCAAGAAGCAGCTG




CACCAGCAGCAGCTGCTGGAGGCTCGGCGCATGGAGC




TGGAGCGCCGCCAGGCAGACGGCGGCGAGGAGGACA




AGAGCGCGTCGGCGCAAGTGCGCAACAAGGCCATGC




GCATCCACGCGCTCATCCCGCGGGCGTCCAAGCTGGA




CTACGTGAACAACCCGGGTGCGGCCAAGGACTTCAAC




GAGAGCGAGGTGGCCAACGCCGCCGCCGCCGCCGGC




CCCAAGTCCAAGCAGGTGGACGCGCTCACCAAGCGCC




TGCGTGAGCAGCAGAAGAAGCTCAAGGACGGCAGCG




CACGCGCCATGAAGCCCAGTGTGGAGGGCCGCAACAT




TGTGCTCATGAAGTAG





69
155
ATGTCGGTCGTGTCAGCGAACAGCAGCACTGGCCGGG




AGCCGGAGCCCGCCACCTCCAGCACCTCCTCTCCCGC




CACAGCCGCGCCCACGCTGCCACTACGCAGTGCCGCA




TCCGGGGACGCCACGGATTCTGAGTCCAACAGCCCCG




GCCCCAGCACCCCCTCCGCCCCGGGGCCGCGGCAGGT




ACCCACCGTGGATGCAGTATTCCCCACGCGGTACGGC




ACACGCTTCCGCGTGCGGCCGTACAGCAACAACGAGT




ACGGCTCCATCATTGACTTGCAGTCAGAGGCCTTCCAC




ACGCTCAACCCGGTGCCCTTCCTGAATGACTTCACCTA




CAAGCGCTTCCGGGCCGAGGTGGTGGATGCGTTGAAG




CAGAAGACCAAATACTCGGACCCCTCCGTCTTCCAGC




TCCTCGTGGCGTTGGAGCAGGAGCCGGAGCAGGAGCC




ATCAGGCAGCAGCAGCAGCAGCAGCAGCAACAACGG




CGATGGCAGTAGCAACGGCAACAGCAGCAGCAGCAG




CAGCAGTGCCAAGGTGGTGGGGGTGGTGGAGGTGTCC




CTGATGGAGGAGCGGGGGGTGCTGGGGTGCCTGCCGC




CCGGCACGCGCGAGTACGCCTACGTCAGCAGCATGTG




TGTGGCGCCCACCGCCAGGCGGCGAGGCGTGGCGCAG




GCGCTCATGAGCGCGGCGGAGGAGCAGGCGCGGCTGT




GGGGTCAGCAGCAGCTGGCGCTGCACGTGTACCGCGA




CAACACGCCCGCGGTGCAGCTGTACGGCGGCTGGGGC




ATGGCCGTACTCAACACCGACCCCGACTGGAAGGCCT




GGTTCGGAGACCGCGTGCGGCTGCTCATGCACAAGCG




GTTGGCGTAG





70
156
ATGCGTACCGCAATCCTCGCCCCATCCCACGGCCCTG




CCTCCTCCTTCCAGCAACGCACAAATTCGGTGCACAC




GCGGACTGTACTCGCGCACGGCGCTGCGGGGTCGGCG




AATCGCTCCTCTGCACCATCGGCATCGACGACCCCCTC




GGCCTCATCCGCGCTGGATGCAACCCAACCCATCATC




CGGACGCTGAAGGAGTGCGACACTGGAGCCATCACGC




GCGCGTCGGTGTGCTTTGGCCGGTCGATGCGGACCGA




CCCCACCATGACCTACGTCACCGGGGGCCGCTGCCCG




GAGCGCGTGGGGGCGCTGTTCGAGCAGGTGGCAACCA




TGTGCATGCGCGGTGCCCGCGACCCCGCCACCACCTG




GCTGCTGGAGACGCCCCGCAGCGGCGGCGACAGCGA




CAGTGCGGATGTGGTGTGCATCGCATGCGAGTACCCG




GCGGCCTACCCCAGCGACTGGGAGCTGCTGCGCGCCG




GGCTGCTGCGTGTGCTGCTGGCCTGCCCGGGCTGGGG




CGTGCTGCGCGCGCTGATGAACATGCTGGACCAGTTC




AACGCCACCAAGGCGCAGTTCAACAAGGAGCACGGC




GATTTCCTGTACATTGCGTGCTTTGGCACTGCCCCGGA




GCAGCAGGGCCGCGGGCTGGGCTCACAGCTGATGCGG




CGGGTGCTGCAGCACGCAGACGCCAAAGACCTGCCCG




TCTACCTGGAGGCCAGTGGCGCCGCGTCGGCGGCGTT




CTACCGCCGCCACGGATTCCAGGACATTAAGCAGGTC




CGGGCCAGCCCCGGCGCCCCAGACCTCATCATCATGG




CCCGGCCCCGCGCCTCGCAGCTGCAGCAGCACGGCCA




GCAGCAGTAG





71
157
ATGAGCGTCGCCAAGTATACATACGAGTGGCTCATCA




AGCGTTCCGCTGAGCTCCCTGACGCTGTCGAGACACC




CGACTTCGTGCTGGGCTTCTATACCTGGAGGCTGCGGC




TGCATCTGCGCCAGTCGATCAACCTTCGAAAGCACGT




GCCCCTGTACCTGCACCATGTGCCAGTACGGGGAGGC




GTGGACGCGCCGCCGCCCCTGAAGTACACTTTTGTAG




TGAAGAACTGGAAGGACCCATCCAAGGACCATGTGAC




TGAGGGCAAGCCCGGTACGGTCTTCAACCTCAAAAAC




GCAAAATGGGGCAAAGAGCTGATCTTGCGGGACCAGC




TGATGTCCATTGACACGGGGTTCCTGCGCTGTGACGG




CTCCCTGCTGCTGCGGCTGGAGCTTCAAATGCCGGAG




AAGAAACAATGGAGCGATGACGATGACGACTCGAAA




TATGACTCGGATGAGGAGGAGGCCTACCCTGCGGTCC




TCAAGGAGGGCTCGGGCGGCGGCAGCAGCATCGGCA




GCGATTTCCTCTCGCTGCTGGCCGATCCCGGCCCCACC




ACTGACCTCACCATCACCGCGACAGCAGCGGTCGCGG




GCGGTGTTACGGGGGCCGGGAAAGAGGGGGGAAGTA




AGAAGCGAAAAGCCGACACCGCCAGCAGCAACGGCG




GCAGCACTGGCGCAAGCAGCAGCCGCTTCCCCGTGCA




CCGCGCCATCCTGGCCGCGCGCTGCCCCTACTTCGCCA




CGCACTTCGCCAGCGGGCTCGGCGACAGCAACACGCG




CGAGCTGCACATGCCGGACACCGACCCGGACGCGCTG




GCGGCACTGCTGCGCTTCGTGTACGGCGGGGAGCTTC




GTGTGGCTTCCCGGGAGCAGGCGTCGCGCTGCCTAGC




GCTGGCGGACCGGCTGCTGCTGCCCAAGGCGGCAGGG




CTGCTGCGAGCGCACCTGCTGGCCACCCTGTCTCCGG




CTACCGTCATGGCGGACCTGACGTGGGCGGCGGGTCT




GGCGGAGGGCCAGGGGCAGGCGGAGTTGCTGACGGG




GCTTGTGGACTACGCCGCAGAGCAGGAGGCGGACATT




GCAGAGGAGCAGGTGGAGCAGCTGGCGGCGGCACAG




CCCGCGCTCATGGCGAAGCTCTTTACGGCGCGGGTGC




AGGCTGCCAAGCGCTGCCGCGTGTGGAAGGCATGCTG




A





72
158
ATGGATAACTCACCTGCAGTGCTCAATGGAGCAGCGG




ACAACTCGGAACTGCCCATGGCTCAAGTTAAAAGGAT




AATGCACAGTAGAGGCGTCACGTCAAATGCGGAAAGC




AGCTTTCTGGTCGCCCGTGCTGCGGAGATGTTCTTGGA




TGCGCTTGTGGCGCGCGCCGGCGGCGCCATGGCAGCG




GGGGGCGAGGCGGAGCTCCGATACGATCACGTGGCCG




ACGGCGTCCAGACCTGGGCGCCAGGGAGCCGCCTGCT




GTCAGACGCGGTACCGAAGCGCGTGCATGCCGGGCAG




CTGCGACGGGACCCGCGCTTCAACGGCCGCACGCCGT




GGGTGCTGCCGCCGCCAGCCGGGCAGCAGCAGCAGCC




TCATCAGGAGCACACGGCGGTGGCGGCGGCGGCACA




ACGAGGCCCGGCGGCGGCCGCAGCGGTGGCGCAGCC




GATGGGTGTGCCCCAGGGCGTGCCTCTGGGTGTGCCT




CAAGCGTCCGCGCCAGGGATGGCGCATGCGCACGTGC




CGCATCTGCCCATACACGCAGCTGCCATGCAGCAGCA




GCAGCAGTCGCACAACCATGTTGGCCCGGCGCAGGTA




CCGCAAGCGGTGTTGCCACCGCCGCAGCAGCATCAGC




ACCAACACCAACAACAGCAACAGCAGCAGCAACAGC




AACAGCAGCAAGCCGCTTTCGCGCAGCACCTGCAGCA




ACAGATGCTCATGCAGCAGCAAGTACTACTGCAGCAG




CAGCAGCAGCAGGCGCAGGCACAGCAGCAAGCGCTT




GCGCAGCAGCAGGCGCAGCAGCAACAACAACAGCAA




CAGGAGGCCGCTGCGGCGGCGGCGGCGGCGGCGGCG




GTGGCAGCCGCGACGGCGGCGCAGCAGCAGGCTGTG




AGCTCTGTAGCGACCGTGTCGCAAGCTGTTGCGGGTA




TGGTGCCGGGCGGCGTGCCGGCGCCGCAGGACCCGCA




CCAGCAACATCAACAACAAGCCGCAGCCCTGGCCATG




CAGCATCAGCTTATGCTACAGCTGCAGCACCAGCAGC




AAATGCAGATGACGTTGATGTTTCAACAACAGCTACA




GCAGCAGCAGCAACAACAGCAGCAGCAACAGCAGCA




TATGATGATGATAGGGGCAGGTCAGCATCCCTACTTC




CTCGGCGGCGCGGCGGCGGCGGCGGCGAGTGCTGGCG




GCGGCTTCGGCGGGGGCTCTGTGATGGGCATGCCGGC




ACAGGGCGGGCAGTGA





73
159
ATGTCAAGATGTTCGTTGGCGCTGGGGCTGTTTGGACT




TTTGCTGGCGGGCATGGCGGGCATGGATGGTGTGGAT




GCTGCTGGCAGCAAAATAACTGCCGCGGACCTAGCAA




ACCTCAACCTATACAAGGTGTTGGGTGTCACAGCCAA




GGCTACTTCCGTGGAGATTGCAAAGGCCTACCGCAAG




CTGGCCATCAAGTATCACCCTGATAAGAATCCTCAGG




GTCAGGACCAGTTCATCAAAATTGCATACGCCTATGA




GATCCTGGGTGATGAGACCAAGCGGGCGCGCTACGAC




GCCGGCGGCTTCGCTGCGGCCACCGAGTTCGCGGCGC




AGGCGCCCAACTGGGACACCTGGCAGCCGCCCGAGGC




GCCCAGCGCCACTGTGTTCGAGGAGTGGCAAAACCAC




AACATCTACTACGACCTGGCCATGCTAGTGGCACTGC




TGGCGGGCGGCGCGGCGGCCTGGGTGGCGTGGGTGCA




GGCCTCTGAGCGGCTCAAGCGGGCACGCAAGGCAGCA




CGCAAGGCAGCGGGGGGCGGCAAGTCGGCTCCGGCA




AGCGGCGCCGGCAGCAGCCGCCCACGGCGGCAGCGG




GTGCAGTCCAGCGGCGCGCTGTCAACAGGGTCGGGCG




CGGGCATGGGCGGCAGCGACAGCGACAGTGACGAGG




CCGGCGGGGGCGCCAGGGCCGATCAACCAGACACGG




CGGCCCCGGGTCCCTCCGGCTCGCTGCTGCTGCAGCC




CGCCAAACCGGCCGGTGGCTCGGCTGCAGCTGCCATG




CGGGAGTGGAGCGCCGAGGAGCTGCGGCTGCTAGAC




AAGGGTCTGAAGAAGTTCCCCGTGGGCACCGTCAAGC




GCTGGGAGGCGGTGACGGGCGTGGTACGCACTCGCAC




CCTGGAGGAGGTGCTGGTCATGGTCAAGAACTACAAG




GGCGGGTCGCATCTGCGGGCCAGAGTGCAGGAGGATT




GGAAGGCGGGGCGGAAGGCGGGGGCCGCAACGGTAG




CGGTGGCAGCCTCTCAGGCGGCGCCCGACATACGTTA




CGATGGCCCGCCCACTGTGAATGGCGGACCAGCGGAC




GGAGAGCACACAGCAGCGGTGGCGGCAGCGGTGGCA




GCAACAGCAGCAGCAACCGCCGGGGGTCAGGTGCTA




GCCACCGGTGGCGGGACCAAGGCGGCCAAGGCGCCG




GCAGGGACAGAGAAGGCGGGCGTGGATGCGCCATGG




ACCGAGGCCCAGGAGGTGGCACTGGTGGCGGCGCTGA




AGCAGTGCCCCAAGGAGCTGGGCGCGGAGCGCTGGG




ACGCGGTGGCCAAGCTGGTGCCGGGGCGCAGCAAGG




CGCAGTGCTTCAAGCGCTTCAAGGAGCTGAGGGACGC




CTTCCGCAGCAAGAAGGGGGCGGGGGGTGGAGCGGA




GGGAGATGACGGCGACGACTGA





74
160
ATGCATGAAGGACAAAACACATGTGGCCCTGCGACCA




GAGGTCATGCCGACGGAGGTGGTCTCGGCGTGCACTT




GCTTGTGGCGGGAGCGATTCTCCACGGCCTTGCGTGT




GACGCGCCGGCTGCGCTCGCAGCACTTTGGCTTGAAC




GCTGTATCGCCTATAATCCTGTGCTTCTGACACACCTC




GACGGCGTCAACGACCTGCCAGCGCCACGGAGGTGCG




GCTGGGGCCGCGCGGCCCTGCCCTGGGCGGCGGTGAG




CTTGGCCGGCGGCCTCCCAGCCATTGACAAGGGCAGC




ACCACACGTCACGTGTGTGCTGGCTGCCACCAGCACC




TCACCACCTCTGACCTCGCACGCCTGGAGGAGCAGCA




GGAGCAGTCGGCGCCGCGCCACCTGCACCCGCACCCG




CAACCGCAGACCCCAGGCGCAGTGCTGCAGTGCGATG




GCTGCCACCGCTGCTTCCACGGCCCCTGCCACCGGCG




GTGGGCCGCTGCGGCGGAGCAGGAGCAGCGTCGGCG




GGCACGGGTACACGCGGCACGTGATGGCCGGGACGG




GTCGGGGAGGCGGCAACAGCCGGAGGCCGTGAGGGC




ATCGGCGGCGGCGGTGGAGGCTGGGGACCCGGGCGA




CGACGGGGCTTGGTTCCATGATACGGAGTGCAAACAG




GTCCGGGTGGCGCTGCTGCGGCTGTGCCGGCGGGGGG




ACATATGGCTGCCTGAGGGCACATCAACATCGCCGCC




AGCAATAGCAGCTGCACCACCCGCACCAGCAGCCGCG




AGCAGCAGCAGCGGCAGCAGCCTCGTTGCAGCACCAG




ACCACGCGGCCGCTTCAGGTCGCCCAGATGCTGCGCC




AGGCGCCAGCCCGCCCACGACCTTGACCTCGACCGCG




ACACCACACAGCGTACCTGAGTCCCCGCAGCAGCCGC




GGCAACGGCTGCGCATGCGGGTGTACGACTGCAATGA




CGGCGGGCCGGCGGCGGCTGTCGGTCTGCGGCGTGTG




CACGGCGTGCTGCGTGCCGCGGGCTTTGGCTACGGCC




TGAGCGACCTCCGGCAGTTTGATGTGGCGGCGTTGCT




GATGGCCGAGGACTCGGGCCAGGCCCTGTCCGCCGCC




GTACTGGACGTGTACGGCTCACACTTTGCGGAGCTGT




ACCTGCTGGCCACATGCGCCGCCGTACAGCGGCGCGG




GTACGGCCGGGCGCTGGTGCGGCAACTGGAGCAGGA




GCTAGCGGCCAGCGGCGTGCGGCGGCTTCTGGTGTCG




GTGGACGATGACGACCTGGTCAATCAGGGGCTGTGGC




ACCACGCGATGGGGTTTGGGTCCGTGCCTGACGCAGA




GCTCCGGCAGCTGGCGAGGAGCTGGGGGGCGTTCGGG




CCGGCGGCGCGGCGCGGCACCGTGTTCCTGTACCGGC




CCCTGCTTGGCGGAGCTGGCGAGGCGCAGGGGCAGGG




GCAGCACGGCAAGCGGTGA





75
161
ATGGTTGCCAGCAGCAGCGCCGAGGAGCAGCCGCGCG




TAGTCTCGTTGAGCTCGGCCAATCGGCAGCAGCTCTC




GCGCGCGGCAGTCTGCTTCGGTGCGTCTATGGTGGAG




GACCCGATCCTCATGTGGGCAACGGACGGCAAGAACC




CCGCCGGCTCAGTAGGCTTCTACACAAAGATGGCGGA




GGTGTTCTTCAATGCGATGGCGGACCGCAGCTGGTGC




TGGGCGTTGCAGGCGCCAGCCAATGCCAAAGCGCTAC




CCGTGGTGGGCGGTGAACTGGACGCCCACACTCCGCA




GAGCGTGTGCCTTGCTTGTGAGGTGCCGCGCGCCTAC




CCCTCCGACTGGCAGCTCCTGTGCGCGGGCATGGTGG




GGCTGGGCCTGCGCTCCCCCAGTTGGCGCTGCGTGCG




GATGTTCCTGCACCTCACGCCCGAGTTCCAGAAGCGG




CACAAGGCCTTCCACACGGAGCACGGGCCCTTCGTCT




ACATCGCCGCGTTCGGTACCCGGCCCAAGCTGTGGCG




CCGCGGCCGCGGCTCCCAGCTCATGTCGGCTGTCCTC




AAGATGGCAGACCAGAAGAACATGCACTGCTACCTGG




AGGCCAGCAGCGACGACAGCCGCCGCTTCTACGCCCG




ACACGGCTTTGCGCTGAAGGAGGAGCTCTGCGTGCTG




CCGCTCACAGCCTCCGACGCCGCCGGCGCGCCGCTGC




TGTACATTATGGTGCGGCCGCCCCAGGGCGCCGGTGC




TGGAGGTGCGGGCGGTGGTGGTGGCGGCGCGGGTGCG




CTGGCGGCCGGTGTTGGAGGCAAGGGCGCCGCTGCGG




CTGGCGCTGCGGTGGGACCGGTGGCGGCGCCGGCGAA




AGCGGCGGAGGTGGTGGTGACGGCGGCGGGCGGCAT




CGCGGCGACGGTGGCGGTGCCAGAGGCGGCGGCGGC




AGCGGCTGCATCCACAGAGCCGCAGAAGCAGACGGC




GGCGGCGGCGGCTGAGGCTGGGCAAGCTGGAGAGCG




TGCGCGACAGGGGGATGAGCAGGTGTAG





76
162
ATGTCTGACGATAGCGATGTTTCATTGCCAAGGACTA




CCTTACAAAAAATGATCAAGGACTTACTTCCACCGGA




CATGCGCTGCGCTAATGACACGGTGGAGATGGTCATT




GCGTGCTGCACCGAGTTCATCCAGCTTCTGTCCAGCGA




GTCTAATGAGGTGGCGACGCGGGAGGGCCGCTCCATC




ATCCACCCTGACCACGTCATGCGCGCGCTCACGGAGC




TGGGCTTCCAGGAGTTTGTGGGCGAGGTGAACGCAGC




GCTGCACACCTTCAAGGAAGAGACCAAGACGGCGCAC




TCGCGGAAGGCCGACCTGAGGAAGACGGGCGCCGAG




CAGGCGGGGCTCACGGAGGAGGAGCAGATCGCTCTAC




AACAGCAGATGTTTGCAGCGGCACGTGCGCAGTCCAT




GACCACGAGTGAGGTCGCCGCCTCCATGACCGCCTCC




TACGACCGAATGGCAATGGCGGCGGCGGCGGCAGCG




GCGGCGGCGGGGGGCGGCGGAGGCGCCGGCGGCGCG




GCGGGGCAAGCGCCAGGGATAGCGCCAGGCCTTGCG




GCGCCGATGCCGCCGTTGCAGGGGCAGGTGCCGCTGC




CGGATGCGGCGCCGCCAGCTGAGCAGTAG





77
163
ATGCTAGCGCGCAGCGCTCACGTGCAGCGCTGTGCAT




GCAGCCAGCGCCGGCGCTTGTCGGTGTGGGGCCGGCG




CATACGCGCCCGCCCCGTAGCCCCCGCCTCGGCGTCC




GCGCCCGCGGTCTCGTCATCCAGCGGACCCCCACGAC




TGGTGGATGTAAACGTCCGGAAAGCGTCCACCGCCGC




GGAGCTGCGCGCAGCTGCCTACCTGCGCGCCATCAGC




TTCTACACCTACCCAGAGGGCCGGAGCGAGTTCGCGG




CTCGGTCACACCGGCGCATGAAAGCGGATACGGAGTG




GGAGACCGTCACCAAGAAAGTGGAAGGCCGCGATGA




AGCCTACAAGGACCTGGACGTGAGCTGCTTCGTGGCG




TGTGTGGCGGACGACCTGGTGGCGCTGCCCGGGCCCG




GCAGTAGCGCCGCCAGCGTCAGTGGCAGTAGCGGCGG




CGACCCAGATCGGCAGGAGCTGCTGGCGGCGCTGCGG




GCGGGGCTGGACGCGTCGGCGCAGCTTCCTGCGGATC




CGGCAGCGGGTGTCAGCCGTCAGCTGGTGGTGGGGTC




GTTGGATCTGAACGTGGGGCACACGTTGCCGTCGGAG




GAGCTGATTGGCAGGCAGCCGAAGGAAGACCCGCGC




CACCGGAGAGCCTACCTAAGCAACGTGTGTGTGGCGC




CGGCGGCGCGGCGGATGGGCCTGGCGCGGGCGCTGCT




GCGCGTTGCGGAGGAGGAGGCGCGCAGCAAAGGTGT




GCAGTGGCTGTACGTACATGTGGTGGCAGACAACCAG




CCCGCCGTGAAGCTGTACTGTGAGGCAATGGGGTTCG




AGGTGGAGCAGGCGGAGTCGGAGGGTTACGCACGCTC




GCTGCAGCGGCCCCGGCGATTGATTCTTGCAAAGGAA




CTTGCGTGA





78
164
ATGTACCCACACCAAGATAAGGAGCCCCGCACGCACA




TCTCTTTGTTCCTGGAGGCTGTCGATGTCGCAGCAGGG




GCACAGCCGCCCACACTAGCATTCAAGCTTTACGTGA




AGCACTGGAAGGACTCCAACAAAGACTCCATCTGCGA




AAGCAAGGAGCCGAAAACCTTCAACGTGAGGTGGGG




CTTCAGCGCTTTCTTTCCCCGCGCTCAACTCACGACGG




ACTCTGGTTTCATCCGCCGCCGCGATGGCGCCCTGCTC




CTGGCCGCGGAGATTGAGCTGCCGGCTGGGCTGGCGG




CGGCAGCAGGAGCAGCTGCCGGCGGCAGCTGCCGCA




GCAGCAGCTCCAGCGCATACCCAGCTAGCATCACAGA




CGGCGCGGCGCGCCAGGACGTTAGCGGTGACCTCCTG




GCCCTGCTGGAAAAGCCAGGCTCCACCTCTGACCTGA




CCATCGTCGCGATCGCTGGCAGCGACAGCGGTGCCGA




TACGGGAGGCTCAGGAAATGGTGAGGCACCGGCGGCT




ACGTGGCTGAAACGGAAGTTAGTCACGGACAAGGGA




CGGAAGGGCGGCTGCGTGGGCAGCCCGGACACGAGG




CGCAGGTTCGACGTGCACCGCGCCATCCTGGCGGCGC




GCTGCCCCTACTTCGCCACACACTTCGCCAGCGGCAT




GGGCGACAGCGCGGCCCGCGAGCTAGATATGCCGGAC




ACGGACCCGGGCGCGCTGGCGGCGCTGCTTCGCTTCA




TCTACGGCGGCGAGCTTGTTGTCGCCTCCCGCGCGCA




GGCCCGCGCCGGCCTGGCCCTGGCGGACCGGCTGCTG




CTGCCCAAGGCGGTGGCGCTGCTGCGCGCGCAGCTGC




TGGCCAGCCTGTGCCCCAGCGCCATCGCCGCCGACCT




GATGTGGGCGGCTGGGTGCGGCGACCAGGCGGGGCTG




CTGGTGGAGCTGCTGGACTTCGCGGCGGAGGCTGCAG




ACGAGGTGCCCCAGTCCGACTTGCAGCAGCTGGCGGC




GGCGCACCCGGGGCTCACGGCGCAGCTGTTCGCCGCC




AGCGTGCGCGCCGCCAAGCGCTCGAAATCTTGA





79
165
ATGGCGCATAAAGAAAAGGGCGGCTCGGAGGCGAAG




ACCGTGGACGCAGACGCAATCTTCAGGATTTTCACAG




CTTGCCAGGGCGACATCCCCACGATTGTCATAGACAC




TCGGGCGCAGAAGGAGTTCAAGGTGTCCCACATATGC




GGCGCGTTCTGCGTCCGACTCAGCGCCAACGGGCAGG




TCCTGGCGGACTACTCCTCATCCAGCTACAACATCAA




GTGGAGCCAGGACTGCTGGTGGGGCCGTAACGTGCTT




GTGTACGGCGAGCCGGGCCTCAAGAAGGACCACCCTG




TGATCGCCTTCCTGTCGCGCCAGGGCAAGTGCCGCAA




CCTGCGCTACTACAAGGATGGGTTTGAGGCCTTCGCC




AAGGCGTACCCCTACCTGTGCACCACCTCCCTCAAGT




CCATTTGCATTAAGCGCTACCCCAGCCAGATCCTGCCG




GGGCAGTTGTACCTAGGTGACTGGGAGCACGCCGCGG




ACAACGAGCGGCTGGCAGAGATGGGCATAAGGAGGA




TCCTGACCATCCACAACCACCCCGAGAACCTCCGGCC




GCCGGCCGGCATCAAGCACCTGCGGCAACAGCTACCG




GACATCGAGGACGCGGACATCTCCGCCTACTTCTCTG




AGGCGTTTGACTTCATTGACGAGGGGAGAGAGCGCAA




GCAACCTGTGCTGGTGCACTGCGGCGCGGGCGTAAGC




CGTAGCGCCACCCTGGTCATGATGTACCTCATGCGCC




GCAACAGCTGGTCGGCGGCCCGGGCGCGCGGCTACGT




GGTGGAGCGGCGCAGTGTGGTGTGCATCAACGACGGC




TTCTACATGACCCTATGCGCCCTGGAGCCGCAGCTGG




GCATCGCGGAGCGGAGCGACCCCAACGCCACATTCGG




GTTCCGTGGCGCCGATGCACCCGAGCCGCAGCAGATC




AAGGTGGTGCTGAGTGAAGACGCGGCGGGGCAGAAG




GTGCCGGTGCGCCTGCTGGCAGCCAAGGAGGCGGCGC




AGGCGGCGGAGGCGGACAAGGCCGGCGCGGCGGGGG




CCAAGCGGCCGCGGGAGGGTGGCGAGGGCGGCGATA




CCCTGGCAGCCAAGCGCAGCCGACCGGGCGAGCCGG




CGTCCGCCGCAGGCGGCGCGGGTGCGTTCACACTGGT




GTTCGATGTGGTGAAGCCGGAAGGGCTGGTGGGGCGG




CTGGAGGCGGGGCCCATGCGGCCCAGCCAGCGCCTGC




TGCTGGGCCGCCAGCCGGGCGTGTGCGATGTGGTGCT




GGAGCACGCATCCATCAGCAGGCAGCACGCGGCGTTG




AGTGTGGACCGGGCCGGTGCGGCTTTCGTGACAGACC




TGCAGAGCGCCCATGGCACCAAGGTGGCGGACACCTG




GATCAAGCCCAACGCGCCGCGGCAGCTGACCCCGGGG




ACGGTGGTCAGCTTCGGCGCCAGCACGCGAGCCTACA




AGTTGGTCCGCGTCAGCAAGGCGGACTAG





80
166
ATGGCCGCGGCGGCCACCAACGGTGCCACCATGCGCG




AGGCCTACCCGCCGCCGCCCTCGCTGTTCAACCTGTAC




CGCCCGGATGACGGCGTGTCGCCGCTGCCGCCCGGGC




CCCCGCCCATCCCCACGCCCGCGGACGTGTCGGCGCT




GCGGGAGCGCAAGGTGGAGCTCAAGGTGCTGGGCAA




TCCCCTGAAGCTGCACGAGGAGCTGGTGCCGCCGCTC




ACCACCGCGGCGCTGTACCGGCCGGCGGGTCCGGACG




GACACATAGACTTCAAGTCTGAGCTGCGGCGGCTCAG




CCGCGAGCTGGCCTTCATGCTGCTTGAGCTGACCAAA




GCAGTGGCGGAGCAGCCCGGCAGCTATGCCTCCCAGC




TGACGCACGTGAACCTGCTGTTCGCCAACCTGGTGCA




GCTCACCAACATGCTAAGGCCGTACCAGGCACGTGCC




ACCCTGGAAGCCACCTTGGGCCTGCAGCTGTCCAACA




TGCGGGCGGCGCTGGGCCGGCTGCGGCAGCAGGTGGC




GGCGGCAGATGCGGCTCTGGGCGGCATGGCGCGAGCG




CTGGTGGAGGCGGGAGAGGGGGACAGCGCGGAGAGC




GCGGCACGACCTGCAGAGGCGGGGACAGCGGAGGCG




GGGGCGGCGGGTGCTGAAGCTGGTGTTGCAGCAGGGG




AGGGGGCAGGGACAGAGGCGGCGGTGGCGGCGGCGC




GAGGGGCGGATGCGGGCAGGACAGCGGCTCCGGACG




CCATGGAGGAGTTTTGA





81
167
ATGGAGGACACAAAGGAGGTGGCGCTCATATTTGCTG




AGTCCTTTGGCCGCGGCAACTTCCCTGGTGTCCAGGCA




GAGGCACTGGATGCGTTAGAAACCAGCTATGTGGGCG




CCATTGAGCGCGAGATGACCGATAAACTGCGGGAAAC




TATGGAGGCCAAGGTGCAGGCCTCTCGCGAGCACCGC




GAGTACCGGATGCAGCAGTACCTGCAGTTACTGCGGG




CGCAGCTGGCGGCGCTGAGAGGCGAGCCCGCGCGCTT




CCCCACACAGCCCTCGCCCTCGGATGAGCGCAACCTG




CAACGGCTGCGGCGGGCGCGGCAGTTCCTGGTGCTCG




TGGCGGAGGAACGGCCGACGGCTGAGGCTGGTGAGG




CTGGTGGCCAAGCCTCTGCCTCGTCCTCAGTAGCAGC




GGAGGCGGCGGCGGAGCCGGAACCGGAGGCAGCGGC




GCCGGGGCCCGGGCCCGGCTCGGCGGCTTGTGCTACA




GGGGCCGCGGCCTCGGCAGCGGCGTATGGGGGGGCG




CGGAGGCGGGGCCAGGCGGTGGCGGCGGCGTCACTGT




CGCTGCTGCAGCCAGAGGCTCTGCTGCCGCCGCCCTT




CCCCTCCAACAAGCCCTACCGCCTGTACGTGTCCAAC




ATGAGTGTGGTGCCCGCGCACCGGCGGCGCGGCCTGG




CCAAAAGGCTGCTGCTGCAGTGCGAGCGCGTGGCCCG




GCTATGGGGCCATGAGTCCATCTGGCTCCACGTCAAG




CGCAGCAACGCCGCCGCCGCCGCGTTGTACGCCTCCA




TGGGCTACACACCGGTGGAGTCGGGCGGCATGAGGCT




GCTGCCGGGGCCGCTCAGCCAGGTGCTGATGACTAAG




ACCCTGCCGCCGCTCAGAGGCAGCTGCCGAGTGGAGC




TGGGACGGGGCGGGGCCAGCAGGTCGCAGGCGGCAG




CCGGCAGCAGCAGCAGCAGTGGCAGCAGCGGCAACG




GCGGCAGTAGCAGCAGCGGAGCCGGCGGCGTGTCGG




CGGGCGAGGCGGTAGTGAGCGGGGTGTCGGGGAGGT




CCCGAGAGAAGGATGGTGTGTTTGTGTGGGGTGCCGT




GGTGGAGGGGGCAGGAGACGTGGGGCCCACCGACAA




GGGGGCGGAGCGGCCAGGGCAGTAG





82
168
ATGGCAGACGAAACGGGTATCGTAAAGCAGGCCGTGC




TCGAGTTCCTGAAGACGGCCGACATGAATGTAACAAC




GGAGCGCACAGTCCTGAATCACCTGGCGGCCACGCTG




CAGCTAAGCCAGGAGGTCAAGGCGTACAAGGCGGTC




GTGTCGGCCACGATTGACGACTACCTATCGGCTCTGG




ATGACGCCGAGGATGAGGAGGAAGCCGCGGAGCAAG




AGGAGGAGGAGGACGCAGGCGCAGCCAAGGCAGGCG




GCCGCAAGCGCGCCGGCGGCGCAGCCGGCGGCGCTG




CCGCTAAGAAGAGCCGCAGCAGCAGTGGCGCCGCTGG




CGGCGGCGGCGACGACGTGCTGCTGCACGTGGACCTG




AGCGAGCGGCGCAAGGCGCGTGTACGGCGCTACGAG




GGGCGGCTGCACGTTGATGTACGGGAGTTCTACAAGA




AGGACGGCGAGGACGCGCCCACACAGAAGGGGCTGT




CCATGGACCCGGGGCAGTGGGCCCGACTGGCGCGGGA




GCTGCCGCGGCTGGTGGCGGCGCAGCGGGCGGGCGCT




GCAGGCGGCGGCGGCGGCGAGGTGCCGCCGGCGCAG




CTGGCCAAGACTCGGCTGGCCTCCGTCAGCGAGTTCA




AGGGCACTTACTACCTAGGGTTGCGCGAGTACTACGA




GAAGGATGGCCAGCTGCTGCCGGGCAAGAAGGGCGT




GAGCCTGAACCCCTCGGAAGCGGAGGCCCTGCTCGCC




GCCGCCGCCGCCATCACCACTGCCGCCGGCGGCGTGC




CGGCCGACCTGCCGCCGCTCGAGCCCTCTGCACTGCT




GCCCACCGCCGGCTCCGGCTCCGCAGCCTCCGGGGCC




ACTGCCAAAGCCAGCGCGAGCGCGGGGCCCTCCAAG




GCGGCGGCGGCGGCAGCAGCGGCGCCAGCGGCCGGT




ACCGTTGCCAGCGGCGAGCCGACTGAGGTGGTGGAGC




TGGGGTCGAACAAGCGGCTGAGCATCAGTCACTTCGG




CGGGCGCACCAGCGTAGACCTGCGCGAGTTCTACGAC




GTAAGCTACAGAGGTGTTGGTGCTGAGAAAGACGGGC




AGAAGCTTCCAGGCAAGAAGGGCATTGCGCTGGCCCC




GGCTGACTGGGCCACGATGTGCGCCGCCCTGCCCGCC




ATCAGCTCCGCCCTGGCCAAACGCGACATGGGCTATG




TGCTGCAGCTCAGCGGCAAGCGGCGTGTGTCCTTGTC




CGAATTCAAGGGTGCGGTGTATGTGGGCGTGCGCGAG




TTCTACGAGAAGGACGGTCAGCTGCTGCCGGGCGCCA




AGGGCCTGTCTATGAACGCGGCCCAGTGGGCGGCGCT




GGTGGCGGGCGCGCCGGGCTTCAACGCCGCACTCCAG




AGCCAAGAGTAG





83
169
ATGTTTTCGCTCAGCACGACGAATATATCCGATGTGCC




GCTGTTCTGGGAAACTGTCAACCTAGTGTACGATTCCT




TTACCGAGAGCTTCATCGTGGTCACTGGCGCATGCATT




CAGCAGCTGATCCCTGCCCTCCACGGCGAGGACGACG




AGCCGCTCGTGCTCGCTGCAGTGGCGGGAGCTATACT




ACCGGTCCGTGTGCAGGCAAATGGCCGTGGTAACGTG




GCGCAGTTCGGCAAGCCCACGCATATTGCCACCGACG




GCAAGGGCACGCTGTACGTGCTCGATCAGGCCAACAT




CCGCAAGCTGCAGCTGCCGGCGGCGGCGCGCTACCAG




CCCCATCAGCAGCGCCAGCGCATCAACTCCATGCAGG




TGGAGGTCACCACGTTGTCGCAGCAGCTTCCCCCGGA




TATGACAGCCAGCGGAATGGTTTACGTCCCCGCGGGG




GAGAGCCCTGGCGGCAGCGAGTGCCTGATCCTGGCGG




GCACCAAGGGCATCTACCGGCTGCCCCTGTGCAATAA




TGACGCAGCAATTGAAGCAGGCGGCAAGGCTGGGAT




GCAGGGCAGCGGCAGTGGTGCCGTGGCTGGCGGCACG




GGTGGAGCAGCGGAGGCCACCACCGCCACTGGCAGC




CTACACCGGTTGGCAGGCAATAGTGACACCGCAGGAA




GCTGGGGAATCCGTTTTGATGCATTTGGTGCGCAGGC




CAAGATGCTCGCCATCTCCTCCGGCCTTGCACTCACTG




GTGATGGCCGCGTGGTGTTCTTGGACTATTCCGCAACC




CAGAGGGACACGGCCGTGCGGTGCATACGGATGTCCG




ATGGGCGCGTGTCCACGCTTTACGAAGGCCTGGACGG




GCAGTGGCAGTGGCCGTGCCTGCTCCCCAGCGGCTGC




CTGGCCATGACGAGTGGCAAGGACCTCTTCATCATCG




ACCTGGCCCTTCCGCCGCCACGGCCGCCGCCACCGCC




GCCCAGCACCGGCCCGCCGCCGCGTAGCCTGGCCTCG




GACCTGGGCGCGCTGCTAGACGGCGCGGCGGGCGCGG




CCAGCTCCGACCTGACCATCCTGGTCGGCGGACGGGC




CTTCAAGGCGCACCGCGTCATCCTGGCCGCGCGCTGC




GAGTACTTCGCCAAGCGCCTGGAGGAGGGCGCCTACG




CGGACGGCGCCAAGCAGGAGCTGGAGCTGCCGGAAG




CGGAGCCCGCGGCGTTCGAGGTGCTGCTTCGCTGGCT




GTACACCGGCGCCGCGGACGTCCCGGCTGAGCTGGCG




CAGGAGGTGGCGGTCCTGGCGGACCGCCTCGTGCTGC




CGGAGCTGTGCGATGCTGCGCAGGCGGTGGTGCTCGA




GTCTGTGACCCCTGGGTCGGTTGCGGCGGCGCTGGTG




TGGGCGGCGAGCTGCGTGCCTGGGCGTGGCAGCAGCT




TCGAGCAGGTGCTGCGCCGGCTGAAGAAGTGGTACGT




GGCGCACTATGACAAGGTGCGGAGCGAGGCGCGCGC




GAGCGTGGTGGCGCTGATGGCCAGCAACCCCGAGCTG




GCGATGGAGCTGCAGGAGGAGGTGCTGGGGGCCACG




GAGCGGCGGGTGAGCAAGAAGCAGCGGGTTTAG





84
170
ATGGTCTGCATTCGCCCAGCAACGATTGACGACCTAA




TGCAGATGCAGCGGTGCAACCTGCTGTGTCTACCTGA




GAACTACCAGCTGAAGTACTACCTGTACCACATCCTG




TCCTGGCCCCAGCTGCTGCAAGTGGCGGAGGACTACG




ACGGCAAGATTGTGGGATACGTGCTGGCCAAGATGGA




GGAGGAGGCCAGCGAGCAGCACGGACACATCACCTC




GGTGGCGGTGGCGCGCACGCACCGCAAACTTGGCCTG




GCCACAAAGCTCATGAGCTCCACGCACAAGGCCATGG




AGGAGGTGTTCGGCGCGCAGTACGTGTCGCTGCACGT




GCGCGTCACCAACAAGGTGGCCGTGCACCTGTACACG




CAGACCCTGGGCTACCAGATCTACGACATCGAGGGCA




AGTACTACGCCGACGGTGAGGACGCCTACGAGATGCG




CAAGTACTTTGGCCCTGCGCCGCCCGCCCTGGCCAAG




AAGGCCGCGGCGCTCACGGCGCAGGCCACCGGACTGC




CCGCGCCCACAGCCGCCAGCAGCTGA





85
171
ATGGGGGACCAGTATAACTATTATCCGGGCGGGTACA




CTGGTGGAATCCCGCCGAACCACCACCAAGCTGAGGC




GCTCAAGTCTTTTTGGCAAGCACAGCTGGTCGAGGTG




TCTGAGGTCCCACCTGACCCAACTGTATTCAAGAACC




ACCAGCTGCCTCTGGCCCGCATCAAAAAGATTATGAA




GTCGGATGAGGACGTGCGCATGATCAGCGCGGAGGCC




CCCGTGCTGTTTGCCAAGGCGTGTGAGATGTTCATCCT




GGAGCTGACGCTGCGGTCGTGGATGCACGCGGAGGAA




AACAAGCGGCGCACGCTGCAGCGCAACGACGTGGCG




GCGGCTATCACCAAAACAGACATCTTTGACTTCCTGAT




CGACATTGTGCCCCGGGAGGATGGCAAGCCGGAGGA




GGGCGGCGCCGCGGCGCCCGGCGGCGCGGCCCCCGC




GACTGCGCCGTCACCGGCCGGGCCCGGCGGCTCCGGA




AACCAGCAGGCAGCTTCCGCTGCCTCGACGGCTGCCC




CGGCAGCGGCCGCGCCGCGGCCGCCCGCGCCACCGGG




CATGCCCACCGCGCCAGGCATGTTCTTCCCGCCGCCCT




TCCCAATGCCGCCGGGCGCGCTGGGGGACCCCAGCCA




CGCGGCCGCGGCGGCAGCGGCGGCGGCGGTGATGAT




GCGGCCACCCATGGGTGTGGACCCCAACCTGGTCCTG




CAGTACCAGCAGCAGATATTGGCGGGGCAGGCGCCAG




GGTGGCCGCACCTGCCGGGGTTGCCGCCGCCGCCGAC




GTCGCAGCCGGGCGCCGCGGCTGCGGCCGCTGCGGCG




GCGGCGGCGGCGGCAGCTGCCGCAGCAGCGGGAGCT




GCGGCAGCAGAGGGGCAGGCGGAGGCTGCAAAGCAG




GAGTAA





86
172
ATGACGAAGGATGAGCAGGCATTGCTAGATTGGGTTA




TTGCTGAGGGCGGCGAACTGCGGGTGACGATTTCCCG




CGATGAGGCGGGGGTGCGGGGCCTTTACACCACGCAG




CCAGTGAAGAAGGGCGAGGTAATAGTCTCCATCCCTC




AGCACATCGTCCTCAGCGTGAAGAATGTGGCAGCTGC




GGAAGCCTCCCCCCAGCTGCTCAAGGAGATTCACTCG




CCCTGCTCACGGCTCAGACCGTACCTGGACACACTGC




CTGGGCCTGACGGGGTGCTCACGGCGTACAACTGGCC




TGAGGAGTACATCAAGTACCTGGCCGACCCCGCGATG




GAGGAGCAGTTGAAGAACTCCTTCAAGTTGCACGCGC




GCAACACGTGGCTCGGGCACAACGACGATGAAATGG




AGGTGACCATCCCAGAGGCCATCGGCCGCAAGAACAT




TACATTGAAGGAGTGGGAGCACGTTGTGTCACTGCTG




AGCTCGCGGACGTTCAGCATCCGCAAGGGCGCCTTGT




CGCTGGTGCCCGTGCTAGATCTGGTCAACCACGATGT




GCGGGACATCAACCAGCTCGGCAACAGCAGCACTGTC




GATCTGGTCGCCGGCAAGGACCTGGCTGCTGGCGAGC




AAGTGACCATCACCTACGGCTCCATGCGCAATGACGA




GCTGCTCATGTACTATGGGTTCGTTGACACGGTGACG




GAGCCGCCCCGCCTGTTCTCCGTTGACCACCGCGATTT




CAAGCTGTACGAGGCCAACCCGCTCAGCGACAGTCCG




TTGGAAGGCCCGCCGGAGGTGCTGCGGACAGAGCTGG




CGCGTCTGCGTGGCATCCTCACCGCGTTTGAGGCCAG




ACTGGACGGGCTGGGCCCAATTCCCGACACACAGCCG




TACGTGGCGTCGCTGCTGCGGGACGCACACGACCGGA




GGCGGCGCGCGCTGCATGCGGAGATAGGCCGCCTGGA




GCAGCAGCTGCAAGGGGCCAGCGGCAGCGGCGGCGA




GGAGCTATAG





87
173
ATGTCGATGCGCAACAACAAGCGCCGCGCTCTGGCAA




GCGCTGGCGCCGCCAGCAAGCAATCTGCGGTGGCCGA




CGCCGTCCTGGACGTGGCCAACCGCAAGGGCGTCCGC




TGCTGCGTAGAGTGCGGGGCGACGTCCACTCCGCAGT




GGCGTGAAGGCCCGATGGGCCCCAAGACGCTGTGCAA




CGCCTGTGGCGTGCGCCGCCAGCGCCTCATCCGCAAG




CAGCAGGCCGCTGTCGCTGGCGTCACGCCCACCGCGC




CTGTCGCCGCCGTGCAGGCTCGCCGCCGTCTGGCCAC




CCGCCGCCGCCCCGGCGCCTCTGCCTCGCTCATCGCCG




ACGAGGATGTCTTTGCGCCCGCGGGCGCCGGCTCCGT




GTCGGAGCAGTCGAGCGACGAGGCGGAGATGACGGT




GATGGGCTGGCGCACAACGGCGGCGGAGGTGCCCCG




GCCGCAGCGCGGGCAGCACTCGGCTGCCACCGGCACC




GACGTTGAGGACAGCTGCAACGAAGAGGAGACGGCC




GCCTACGACCTGCTCTTCTTCGCCGGCTTTGACTGCGG




CGACTATGGCTACTCGGCGCCGTCCGGGCCCAGCCAC




GGCCACAACACACGCCGCCAAGCCGCGCCGCAGCGCC




GCTCGGACGACTTCTATTATTACGAGGAGCAGGACCA




CGAGGGCGAGCACGGGGTGGCCGCCGGAGAGCATGA




GCGGCTGCCCATGTCGGCTCCGGCGCTGCAGCAGGTG




TCGTCCATCAAGCGCCGGCGCGTGCTGGCGGCCCCGC




CCAAAGTGCACATCCGCCCCGGCCGGTCCGCGATGAC




GAGCTTCCCGTCTTCCTCGGCCGAGCACGAGGCAGCG




GCTGTACCGGCCGTGAGCAACATGAGCAGCCTGCCGG




CGGCCGCGGGGCCTGCGCCTGCATCGTCCTCAGACGC




CGCAACGGCGGAGTTGCTGCCGGCGGCGCCGGCGGTG




CTACCGTCCTCTGCCATGCTGGCGCTGCAGCTGCCGCT




GCTGCCGCTCGCGCTTCCGGCGCTGTCGCTTCCGGGGG




CGGTTGTGGCGGGCGGCGCAAGCCCGGCGGACCTGGA




GATGATTGCCGCACTGCACGCCGAGTTCCAGCGTGCC




TGCATGCAGATGCAGCAGGCTGTGGCTGCGGCGGAGG




CGGTCGGCGCGGTAGCGGCAGAGCGGCGCGACGCCG




CGGACGCGGCGCATGCTGTCGCCGCTGTGGCGTCGCA




GCGCCTGGCGGACGGCGCTAAGGTCGTGGCGGCCCTG




CCGGAGGTGCGTGACGTGCTCGCGGAGCTGCACACCG




GCCCAGTCGCCATGGCCGTTGCGCCGCCCCTGTAA





88
174
ATGGCGCTCGTATCACATCATGGTGTATATAACCAGC




GTTGTAAACATGCAAACGGCGGTCGTTCCGCTCCTGG




GTGGCGCCTCTCGCAACCACAGCCTGCTCAGCCCCGG




CGACATCGCCATGTCGTGTCCGCCGCGCGTTCGCCGC




AGCAGCCCGCTCCGCTGCCGCCTCGGGTGAGCTGTGG




CGAGGAGGGCGGAGCGCCGCTGCACATACGCGCCGC




GGAGCTCCGCGACTACTGGCCGGCAGCGGACCTACAC




ACGCGGGTGTTCTGTCCGGAGGCGGAGTCAGACCGAA




GTAAGGCGCTGTCCATGCGTGTGGACCGCATCATAGC




GCTGCAGATCAACGACCGCATATCCAGAGAGGGCGGC




GGCAACTCTGTGTTGCTGCTGGCATTCAACGGGGAGG




CGCCGGGCAGTGCGGAGGAGCGCACGGCGGCGGAGG




CGGCGTTTGCGGCGGCGGCGCAGGCGGCACAGACGCC




CGGGTCTGTCACCCACCTGTCCACCGCCTTCCCCAACC




CCATGTGGTGGCTGGCGCGGCCGCTGGGGCCGGGCGT




GCGGGCCGGCATGGGCGTGGCGGCCGAGTCCGTGGGC




CTGGTGGGGGTGGCGGCGGTGGACAGCTTCTGTGACC




TGGTGCCGCCGCGGGAGCTGGACCCGCGGCGGGACGG




CGCGTTCGGCTTGTACCGCCGGGACGGCTACGCCTAC




GTGAGCAACGTGGCGGTGCTGCCGGCGGCGCGGCGGC




GCGGCGTGGCGCGTCAGCTCATGGCGGCGGCGGAGGC




GCTGGCGGCGGAGTGGGGGTGCAAGGCGGTGGGGCT




GCACTGCAACACCAAGAAGACGGCGCCATGGGCGCTG




TACCGCAGCCTGGGCTACCGGGACAGCGGTGTGGTGG




AGCCCTGGATCATGCCCTACCTGCAGGGCCGGCCGCC




CGACCGCTGCTCGTTCCTGGTGAAACGCGTGCCGCTG




CAACCGCAGCCGCAACCGCAGCCGGAGGCAGGGGCG




GGGGGGGCGGGGCGCACGGAGGGTTCGGGGCCAGCC




GGGCTCCGGTAG





89
175
ATGCCCAAGGAGTACATCGTGCGCCTGGTGTTTGACC




GGCGGCACCGCTCCGTGGCGCTGCTGAAGCGCAACGG




CACCGTCATCGGCGGCATCACCTACCGCGCCTTCCAC




GAGCAGGCATTCGGCGAGATCGCCTTCTGCGCCGTGA




CCAGCCACGAGCAGGTCAAGGGCTACGGCACGCGGCT




CATGAACCAGACCAAGGAGTTCGCGCGCACCGTGGAC




CGCCTCACGCACTTCCTCACCTACGCCGACAACAACG




CGGTGGGGTACTTTGAGAAGCAGGGCTTCACGCGCGA




GATCACGCTGGCGCGGGAGCGCTGGCAGGGCTACATC




AAGGACTACGACGGCGGCACGCTGATGGAGTGCGTCA




TGCACCCGCGCGTCAGCTACACCGCCCTGCCCGACCT




CATCCGCACGCAGCGCCTGGCGCTGGACGACCGCGTT




CGCCAGGTCTCCAACTCCCACGTGGTGCGGACCGGGC




TGAGGCACTTCCAGGAGGAGGACGCGCGGCTGGCGGC




GGCCACGGCAGCAGCAGCGGCGGCGGCGGGGGCAGC




AGGAGGGAGAGGCGCGGGCGGTGTAGGGGCCGGGGC




GCCGGCTGGTGACGCGGCGGCGGCAACAGCGGACAC




CGACCCGGCGTTGCGGCGACGTATGCTGGACATCGGC




GGCATCCCAGGGGTGCGGGAGGCGGGCTGGTCGCCGG




ACATGGTGCAGCAGGGGCCGCGCTTCCGGCTGCTGCT




GGACGAGGCGGGGGCGGGTCCGGCGGTGGAGGCGGG




GTCGGAGGCGCTGCACCGGTTCCTGGTGCTGCTGCTG




GAGCACGTCAAGGGGCTGGAGGACGCCTGGCCGTTCC




GGGAGCGGGTGGCGGTGCAGGACGCGCCCGACTACTA




CGACATCATCAAGGACCCCATGGCTCTGGACGTGATG




GAGGAGCGCCTGGCCTCGCGCGGCTACTACGTCACCC




TGGACATCTTCACCGCCGACCTGCGCCGCGTGTTCGAC




AACTGCCGCCTCTACAACGCGCCGGACACCATCTACT




ACAAGCTGGCCAACAAGCTGGAGGCGCAGGTCAACG




CCTTCATGTCCAACCACGTGCTGTACGAGGATGAGGC




AGGGCCGGCGGCGGCGGCAGCGGCAGCGGCAGCTGG




GACTGGGGCTGGAGCAGGCGCTGGGCGGTAG





90
176
ATGCAGCAGCCCGCTCGCAGGACCTGGACGGACCAGG




AACTGGCAATCAGCGGCTTTGAGCGGTTCGCCCTTGA




ATTGGAGTTCTTGCAGTGCCTGGCCAATCCTCTTTACA




TCAATTGGCTCGCAACGAAACAGTATTTTGACAACCC




AGCGTTTTTGAACTACCTTAAGTACCTGCAGTACTGGA




AGCAGCCTGCATACGCAGTGCACATCACGTACCCGCA




CTGCCTGTTCTTCTTAGACCTGGTTCAGGATGCGGACT




TCCGCAACGCAATAAAGGATTTCTCATACGCGGAGCA




TATCCGCCAGGCACAGGACTCGTTTTTCCGCAACTTCC




ACTCCAACCGGGTGGCGGAGGCGGAGGGCAAGGCCA




CGGCCGCGCCGGCAGCAGATGGCGACGGTGGCGCAG




GTGATGCCATGGATTGA





91
177
ATGGACTCGGAGCAGCAGCCGGCCAGCCCGAGGGCTG




CGCCTGGTGCAAGCGGAGGCCGACGCTTGCCTGGTCG




GACACCTTCTGGTCTATTGGGACAGGCAGCGCAGGGG




CCGCAGCAACCTCAGCCCCAACTTGGCAAGGGAGCAC




TTCAGCTCAATCAGTCCAGCAGCGCAGCGACAACCGC




GTTGCCGGTGAAACGTCGGGGGAGTTTCCAGCAGTTG




AAGAAAATAGGTGCCGCCGGGGGGCGAGATGGCAGC




TCTTCGCACCTGGACTCGGACTCGGCACCATCAATTTT




CGCCATTGTGAAAAAGTCCACACACTGGGAAAAGTAT




GGCACGGTGCTCGTGCTGCTCGTTGCCGACGAGCTCA




GCAGTGACAAGGAGGCGGTGGTGCAGATGCTGAGCG




CAGAGGGATACGATGACCAGACGTCGGACAGCATCG




AGGAGGCGGTGAAGTTGTTTTCGGAAAGGGAGGTGTA




CCCGGACATTGTTATTGTTGATTCAGACAATGAGCTGG




TGGACACCAAACAGCTCATCAAGGCGCTGCAGGCGCT




GAACCCCACGGTGGCGGTGCTGGTACTGGGCAGCCGC




GGCGGGCCCATGGGCGCGGTGGCGGCGCTGCAGGCG




GGCGCGGCGGACTACATGGTGAAGCCGCTGGATCTGG




ATGAGGTGGTTGCCCGCGTGGAGCGACACGTGCAGCG




ACAGCACTGCATCAAGTTGGAAATGGAAAAGGCGCTG




GAGCACGCCAAGGAGATGATGCAGCAGCTCATGCCGG




CATCACTACTCGGGGACGTGATGTTGCGGAAAGACGG




CAGCGCCGCGGGCGGCGCGCCGGCGGGCGGCAAGGC




GAGTCTCAACAGCGTGGCGGAGACCGACTTTGAGGAG




CAGATGAGCGAGCTGAGCGAGGAGAACCACCGCTTG




GGCCAGAAGGTGCAGGAGATGGAGCGCAAGCTTGAG




CTCAAGGACCAGGAGAACCGCGACCTGGAAGCCAAA




CTCAACGCCATCGACCGCAAAGTCAGCGCGCTGGCCG




CCAGCCGCGAGATGGGCGGCGGCAACGGCGGCGGCA




ACGGCGGCGGCGGGGGGTCGGGCTGCACGGCCGTGG




GGCCTGAGCAGCGTGCCGCGGCGCAGCAGGCGGCGC




AGGCGGCCCAGGCCTCGTTGCAGGGGCAGCTGAACAG




CGTGGCACAGGCCAACGAGGACCTCCGACATAAAGTG




GACGAGCTGGAGCGGCTGATGCAGTCGCACACAGGCG




TCACCAGCGCCAGCAACCAAAACCTGCGCCTGAGCGT




CAACGGTGGGCAGCAGCAGGGCTAG





92
178
ATGGCGGCCCGGCTCCTGCGGGATCCTGAAGCAGACG




GATGGGAGCGCTCGGATATGCCCATCGTGTGCGAGAC




GTGCTTGGGACCCAATCCTTTCGTGCGCATGCAGCGG




ATCGAGTTCGGCGGCACCTGCCACATTTCTGGTCGCCC




CTACACGGTCTTCCGCTGGCGCCCCGGCAACGACGCT




AGGTACAAGAAGACGGTGATCTGCCAGGAGGTGGCC




AAGGCCAAGAACGTGTGCCAGGTGTGCCTGCTGGACC




TCGAGTACGGACTGCCCGTGCAGGTCCGTGACGCCGC




CATGGGCGTGAAGCCGGACGAGGAGCCCCAGAGCGA




GGTGGGCAAGGAGTACAAGCTGCAGATGGAGGCGGA




CGCGGGCACACTGGGCGGCGGCGGCGTGGGCGGGGC




CAGCAGCAGCTACGCGGCGGGCCGGCCCAACGAGAT




GCTGCAGAAGCTGCAGCGCTCGCAGCCCTACTACAAG




CGCAACCAAGCGCGCGTGTGCTCCTTCTTCGCCAAGG




GGCAGTGCACGCGCGGCGCCGAGTGCCCCTACCGGCA




CGAGCTGCCCACCGCCGACCCGGCGCTGGCCAACCAG




TCCTACAAGGACCGCTACTACGGCACAAACGACCCCG




TGGCCGCCAAGATGCTCAAGCGGGTGGACGAGCTCAA




CAAGCTCACGCCGCCGGAGGACACCTCCATCACCACG




CTGTACGTGGGCGGGGTGGACGCCTCCATCACCGAGG




ACGACGTGCGGGACGCCTTCTACTCATTCGGAGAGCT




GGCCAGCGTGCGCAAGATGGACGTCAAGAGCTGCGCC




TTCGTGACCTACACCACGCGCTCCGCCGCGGAGAAGG




CGGCGGAGGAGCTGGGCGGCAACCCGCTCATCAAGG




GCGCGCGCGTCAAGCTCATGTGGGGCCGCCCGCCGCC




CGCGCCCGCAGCCCGCAACGCCGCCGCCGCCGACCCC




ATGCAGCCCTCCACCAGCGGCGCCGGCGGCTACGGCG




GCGCGGCGCCCGGCAGCGCCGCCTCCTACTACCCGTC




CATGGACCCCTCGGCCATGGGCTCGCGGGCGCCGGGC




GGGCCGCCCGGCATGCGGCCAGGCGGGGAAGGCGGC




GGCCCCGGAGGCCCCGGAGGCATGGCGCCGCCGCGG




CCCATGGGCTACGGCGCGCCGCCCGGGTACGGCGCGC




CGCCGCCTGGCTACATGCCGCCGCCGCGCCCCATGGT




GTCTGCCAGCATGCAGCCGCCGCAGCAGCAGCACCAG




TAG









Putative transcription factors initiate transcription from C. reinhardtii promoters in yeast. As an initial screen for potential DNA-binding activity, we performed a high-throughput yeast one-hybrid (Y1H) assay to test our TFs' ability to activate transcription from known C. reinhardtii promoters [36,37]. We transferred our entire pENTR-TF library to the Y1H vector pDEST22 via Gateway LF-transferase, which fused each TF to the yeast GAL4 transcription activation domain [38]. Separately, “bait” promoters of interest were cloned in 300 base pair (bp) fragments, labeled A, B, and C (5′ to 3′) for a total of 900 bp per promoter (Table 9), 5′ to a yeast minimal promoter element followed by the reporter gene Gaussia luciferase [39]. Each TF vector was transformed separately into haploid Saccharomyces cerevisiae YM4271 cells, and the resulting strains were crossed against strains of the opposite mating type harboring the DNA bait promoters of interest. S. cerevisiae strains producing each TF were also cultured so that whole cells could be processed for western blot analysis of TF protein production (FIG. 9).
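The bait design described above, in which each 900 bp promoter is divided into three 300 bp fragments labeled A, B, and C from 5′ to 3′, can be illustrated with a minimal Python sketch. The function name `split_promoter` and its parameters are hypothetical and are not part of the described method. The sketch assumes consecutive, non-overlapping fragments; adjacent fragments listed in Table 9 appear to share a short overlap at their junctions, which this sketch does not model.

```python
def split_promoter(seq, fragment_len=300, labels=("A", "B", "C")):
    """Split a promoter sequence (written 5' to 3') into labelled fragments.

    Hypothetical helper for illustration only: it assumes consecutive,
    non-overlapping fragments of equal length.
    """
    seq = seq.upper()
    expected_len = fragment_len * len(labels)
    if len(seq) != expected_len:
        raise ValueError(
            f"expected a {expected_len} bp promoter, got {len(seq)} bp"
        )
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {sorted(invalid)}")
    # Fragment A is the 5'-most piece, fragment C the 3'-most piece.
    return {
        label: seq[i * fragment_len:(i + 1) * fragment_len]
        for i, label in enumerate(labels)
    }


# Usage with a synthetic placeholder sequence (not a real promoter).
placeholder = "A" * 300 + "C" * 300 + "G" * 300
fragments = split_promoter(placeholder)
for label, fragment in fragments.items():
    print(label, len(fragment))
```

Each fragment returned this way would correspond to one bait construct, placed 5′ to the minimal promoter and reporter gene as described in the text.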









TABLE 9







Promoter sequences used in yeast one-hybrid assay.














Fragment
Gene
Species
SEQ ID NO
Sequence





A
LHCBM5
CRE
179
TGAAAGACGGGCAAGACACGATTATCCTGC






AGGCAATTGCCGGCGCGAGCTTGGGGCGCC






CCTTCAGCGTCCCATCGGCGGTCGCTTTTTG






CCCCGGTGTCGCCGTTCCTGGTTCTCGGCAG






CCCAAGATAATTTAATCTAGTAGTAATAATC






ATGTGCAGCGTTGTGGCAGCTGCCCCCAAAG






GAAACTGTGGCGGGAAGCGCCCCAGTCGCG






CAAGCTTATCGCTCGGTCGCGCGTCGGGGCC






ACCCTGAAGACCCTGAATTATTTGTGCGACA






ATATAGCAGCCACTTCTTTTCATTTGAATGG






TTT





C
LHCBM5
CRE
180
AGGGGAGGGGAGGGGCGGGGCGGGGCGGG






GCGGGGCGGGGCGGGGCGGGGAGGGGAGG






GGCGGGGCGGGGCGGGGCGGGGCGGGGAG






GGGAGGGGCGGGGCGGGGCGGACAAATAG






GTCAGCAAATGGATGAACATGACCGCAAAT






TGATAATCATACCTGGCTTGCAAGCTCGCGC






CCAGCGAGATGGAGTACGGACGATGGAGAT






CTGGCCGCGATTGGCGAGCCGGGCAAGAAA






AACAGCCGAGCGCTGCATATAACACTTGTCA






CACCGTCGACCTTGTTCGTTCAGTCACTTGA






ACAGCAACACC





A
LCIC
CRE
181
CACAACACCTCGCCACGGGCACACCGCCAG






CCACCCGCCCCACCAGCGAACTAGACCGAC






CCGACAAACAGGCACGCGCGCGCCCGGAGG






CGAACAGGCGCACCAGCCGCCCGGGCGCCC






GGGCAACAGCCGCCCAGGCACTCACAACCC






GACACCCGGGACTACCCGACCAGCGTCATCT






GCTGCCTAACGGTCCCTGAACCGCCATGCTA






CGAACGGCACCCGCAACCTAACTATCTGCTG






AGCCAGCAAGGCCGCCGGTGGAGACGACAG






CGGGCCAGGCGGCACGAGGAGAGGCGCACA






GGGCTGC





B
LCIC
CRE
182
GGCGCACAGGGCTGCGTGCATGGCCAAACC






CTCAGTTGGGAAATTCGGACAGGAAGCAGT






GAATGGGGCACAGTACTATACTAGGGGAAA






CGATAACGTGATCTCAGGGGCGTGGGGGGG






GGGCTAGAAGGGAAGGGGCGCTGTAACTGG






ATTGCGTGGTGTGCGCGGTGCATTCTTCGCA






CACCTCGGCAGCAGCCCGGCCCCGCGTTCCC






TGGCCTAGTGACGCCGGTTGCCACCAGCAAC






CAAATGCCATGCATGCGGCCAGTATGCGCAT






GCGTCGCCCCCGCGGCCGAGCTGCACGCAC






ATGCCG





C
LCIC
CRE
183
TGCACGCACATGCCGACCGAAAGGAAATGG






GTGTTGCGCGTCAGAGCGGGTTTGAACAAGT






GATTTCTTCGCTCCGCCATGCACAGCAAGCT






AGCTAAGCTGGATGTATTAGGGGCTTGGTTT






GTTCATTTGCACCTCTCCAACACGTACGACC






TCCAACCCTCCTACAATTGCCCATGCGCCGG






GTTTTATAGGTCGCCGGTGCGTATGATGGGC






TGCAGTAACAACATTCTTCTCGTGGTTGTGT






GTTAAACGTGCACAGTTAAATACATTACATA






TCTCGTTGACACTACAAACCAGCGATAGAA






GG





A
LCI5
CRE
184
CGTGATTGCCGGCGGCGAGGCGGGGCCATG






GACGGGGCTACGGGCAGGGCGACGCCACGG






TTACTCGCACTGCCCAGCCGTTCACCTGTGC






TGACATGCATGGCAGTCTGGCAGACCTCACG






CAAGACCACTGGATGAGGCGTGGCCGTGTG






GGGCTCGTCGTCGCACTCAGCTGTTGGCAGG






CCCCCGCTAGTTGCCCTGTGTCCGCCCTCTTC






GGTGCTCAGCCTGACCAAGGCCTTGGGGGC






GCCGGCAACCACAAACCCAACTGAGGCTGT






ATACTTGGACGCAACCCATCCGTGGCCAGGT






TTCT





B
LCI5
CRE
185
CGTGGCCAGGTTTCTATCGACGTCCTCCGAC






AGTGAAGGGTTCCGCAAAACCGCCTCACCG






ACATGTGAGACATGCGACATGTGCCCTCAG






GTCTCTCAGCCCCTGTGCTCCTGGAGCGCTA






CGTTATGCGCAGCATGACCATCGCAGCTACT






CAAGAAAACAAAAGACCATAAGCTGTGAGC






CGTTGACTGAGTTGACCGTCGCGAAACAGC






GTCCTTTCTCAGCAAGCCTTGCCAGCCGAAC






CCGAATTTATTTACCTTCACGGCAATACACC






ATGTACGTTTTGAATGCCTGCAATCGGGTTT






CGGC





C
LCI5
CRE
186
CAATCGGGTTTCGGCCTCGCCCTGGGCCTGC






TAAGAAATTCACAACTCCCCGCGAGAATGCT






GGCCGTGCACTCAATTAAATATGCTCATGCA






AGTAAGCTGATTACATGCATATTTGAGGAGC






GGGGCGGGGCCATTCCTCCAGGAAATGGGG






AACTCCTACCACAACCTCCTACAATGTACGG






AATGGCCCATCGCCGCGGGCAGCTTGCACTT






AAGCTTGCCGGCCGGCGCGCACAGATTCAC






CTTCAGGCAAGCACTCGCAGCCGCTCCATCT






GTAGCGTCGACCTTTCAGAACCACTCCAAAA






CA





A
SEBP1
CRE
187
TGGATGAGGCAGGGGTTCCCCTCAGCTGAG






GCAACCATGCTGCCGTGGCAAGGCGGCGCG






TTAATGTGCTCCGTTGCTCACGGTCACAGGC






GTGCATAGGCTGCATTACGCTGCGTGTCGCT






TATTACTCTGGACGCCCTCTGCTTCGGGTGG






GGCTATGCCAGTGCCGTGCGCACCTCGTGCA






AGTAGACTATGGTACCAAGGTAGACCCAGC






TTGATTCCACGCGTGATCCATGTTAGTGCGT






AGGCTCATAGAAAGACACACCGGTGAGAAA






GACACATGGAGGCGCGGCACTGCGGACGCT






GCGGA





B
SEBP1
CRE
188
TGCGGACGCTGCGGAGAAAGGCACATGGAG






GCGTGCCGGTGTGTTCCGGCAGTGCTGCTGA






CATGCAACTGTGTTGACCGTTGACATGCCCG






TGCCGTAAGTGCCCCAGCGACGAGTTCGTGG






CCCCTAGCAGTGGCTTGACATGGGGCTTTGG






GCCCACAATTAAGCCATGTGAGCAACGCAC






CTTGACGCGGGCTTAAATCTGGCAGTCCAAA






CGACACGCGTGTGAAACCCGCCAGCTTCTTT






TCCCTGTTGACGATTCGCCAAGCTCCCGGCA






ACCCCCGCTTGCCCATTGCAAATTCCCAAGT






GT





C
SEBP1
CRE
189
CAAATTCCCAAGTGTACTCCCGTCCTCGCGG






CTTTAAAATATGGCAGTCCGTCCGGCTTGAA






CATGCGCAAGTCGCATTTCCCAACGACAATC






CTCTTCGTAGCGCGCACGTTGCCAGGCAGCG






AAATATTCTATCATGTTTTTGCTGGGTTGAA






TGCAATTGAACACCGGTTTGGTTTCGGCAGG






CAGCTCCCCGACCGTCAAGGCTTGCATGGGA






TAGGGTTGCCCATCGCCGATAGCGACCGGCT






ACTTCAGCCAGCCCTCGCAGTGAGGTAGTGC






TTTTGGGTCTATATACAAAATGGCCGCTATG





A
NAR1.2
CRE
190
CGGCACACAACAGGGACACAGCACGGCGCA






CAGCACATGGCACACACTGCAGTGGCAGGC






TGACGCTGCACATTGGCTGTCTGCAGTCTTG






CTTGCGGCCCCTCCTAAATCTTGTTCCGGGC






TCGCGGGTTAGCTCTCGCCAGTCCCCCAGCC






CCCAGCACGCCTGCACTGTTGGCCCTGGCCC






TGGCCCTGGTTTTCGTGGGACAGTTGTCGAG






CAATGTCACTTCAACTCCTTGACGTTCGGGC






GCATCATGTGTGAACCTACGGGGGCTCTCCT






GGCGGTTGGGGGGTATACATTACGATACTAT






TT





B
NAR1.2
CRE
191
ATTACGATACTATTTTTTAGGGGCCGACATT






TGGGGTGAGTATTGAGTAGAGGGACGCCTG






GACTGCGGTGCCTAGATGCGCGAGGCGGCA






ACTCGGCACGGTCAGCGCGTTTCGCCCCCCG






CACCCAGGGCTGACCCGCTCGCTCGCTTGCG






CCAACCGACCGAAGTTCAAACGTCAGCGTC






GCGTCGAAACCCCAAATCATGCCGTCAGTA






AGTCGGCAGCGGATGACACGGCACATGCAA






TGAGGTCAGCCTTTGTTCCAAGGACTGCACA






TGTGGGGCGAAAGGGCGCCGTCGACGGCGC






GACTGC





C
NAR1.2
CRE
192
CGACGGCGCGACTGCAAATGCAACCACCGC






CGACAGCGCGAGCAAGCGGCCACAATTTTG






TTCTACGCGGTTGCAGCATGCTCAATACGAT






GTGCAATTTTGCAGCGCATGAGCGCGCACGT






TGGTGGGGTCTCCGACGTAGAGTAGGGCGG






TTGTGTACGGAACATACAACGGGGCTCTGCG






CGAACTCAATAAACTCCGCTGTTGGTGTGCA






ATTTTCAAACATCTGTAGCGGCAAGTACTGG






CAATAGTCCAGGCTATAACGCAACGATTCA






GGGCTAGACGCACAGTCGAGTTTAGACGCG






CAAAG





A
LHCBM5
CVU
193
TAGAGAAGAGCACTGGCGGCCGAAGGCTCG






GCAGCGCTGGCTGCTCGACACCGCGCTGCGC






AAACGCTTACCCACTAGCGCAAACAGCACC






ACCAGCACAAGTTTGAGCAGGGCCGCGGGG






CACACCATCGCAACCAGATCCCTGGTCACGC






CAGTTGCGCTGCGCTACCCCACAGAGACTGC






GCGGGCAGCAGCGAAGGCTGGCGCCTGACA






CACTTTCAAAAGGGCCCAGGGCAGCTGTAC






AGCGCTGTACCCTCGGCACCAGCGGGGAAG






CTGGCAGGGAAGCTGTAACAACACCATCAG






CAGCATC





B
LHCBM5
CVU
194
ACCATCAGCAGCATCAATTCTGGAGCCACG






ACAAGCCCTCCACGCTGCCCAATGTGCATTT






GATTGGATTTGATCCCCAAAAGGCAGCTGCA






CTCTGCCCCCCTCTCCTGTCCTCCTGCTGCCT






GTGGCGCCCCGCTCAAAAGCCGTGTGCATG






GAGCAGCTGGTTGGACAGCGGGTTTTGACCC






ACAAGCAGCCAGTCGCGAGGAAGGGATTTG






GGCCCGGCTGCTGAGGCCAGGCCTCATGGA






GCTGGCAGAGCCCTGACCACCGTCGCCACC






GACCAGCGCCAACCGCCCCACGGTCTCGTCC






GCCA





C
LHCBM5
CVU
195
CGGTCTCGTCCGCCAACACCCTGCTCCAGGC






GCCACACACCCTCCCCTCCCCGCCTCTCCCT






CCTCTCTAGCTTCCAGGAAGTAGCAAAGAAC






GGTTACTGTGGTGTTACAGCGCGCATACGCG






GCTGGGGGTGGATGCGAGTATAATCGTGTC






GAGGTGGGAGTTGAAAATTATCCTCTCTGGG






GACGAGTGGCGGGGCACCAAACCAAATGCT






GAAAGCACAAGCAGAACAAAGGGAGACAA






GCTAAAAGCTACAACACCTGCGCCGCCATC






AAGCGGGCGCCGGCGGACCAAGCGGGGGTG






CGGCAT





A
LCIC
CVU
196
TCTTACTGTTGTGGGGCTGCGCCTGTGCTAA






GCTGGCTGCCCGCCGCCTGCACTGAACACCT






GGCATGCCTGCCCTGGAGCTGCGGTGCAGAT






GCATGTGCATGTGGCGCAGCTCGACACAGC






ACTGCAGACCTTCCTCAAAAGCGTGGCAGTG






GATGCCCCAGACTGGAAATATGCAAATTGC






ACCGGGTGGCAGAGCTTGAGGTGTGCAGCC






ACCAACAAAGCCACGGGAGTGGCTGCTGTG






TGCAAGTCGGTCAACGCTGGGCGGGGCCCC






TCCGATGCGGTGCCTTTTGAAAGCGTCTACG






GCACA





B
LCIC
CVU
197
AGCGTCTACGGCACATACAACAGCACTGCT






ACCATGCTGGCCACCACAGCAGTTTACTCGC






CGCGTGACAATGTCTTTTGCGTCCTTCGGGC






AACTGACCGGCCGGTGGGCAGGCGGCCAGC






TGCGGCATGCCCTGCTGCCGTCTGGGCGGCA






CAGGCTGCTTCCTTCCCATCTGTGTGTTGGG






TTGATGGTGTGCTGGCTGCCCCTGTTGCAGG






CTGAGTGTCTGCTCCGATGCAAGACGGAGTG






CCAATCAAAGGCTGGCATCAAGTGCCCGTG






AGCCGCCCCACCTTCCTGTGGTGGTCAGCGC






CTC





A
SEBP1
CVU
198
GCCGGTTTACGCAAGGCGCGGCAAAGCAAA






GCACCCGGCGCAGGCGTGCACGAAGGATCG






CAGGGTGGGGCAGGCTGAGGCATGCCGGCA






GGCATGGGAGGCGGTGAGTGCGAGCCAGCA






CAGCGCGGGTGGAGGCTCACGCTTTGCTGCC






AGAGGCCTTGCCGCTGCCAGCGGTGGGCCC






CTCCTCCCGCCGCCGCTTGTTCCTGCATGCG






GGTGCGGCGCGGAAATGCAGCATGCTTGGC






AGCATCACGGTGTAGCGGTGCCCCCGGGGC






TGGTGTGGGGCAATGCCAGCCAGCTGCAGT






GTCCCGGC





B
SEBP1
CVU
199
CTGCAGTGTCCCGGCGGTGTGGCCCAAAAC






GGCACCGCCCAGGTCGGGCGACGCTGGCGG






CAGCGACGGCGGCGGCGCAGGGGTGGGGCC






TGGCCCCCATCTGCGGGCGGCATCTAGGTGG






CGGAGGGATGCTGCGTAGTTTCAAGGCGCA






GGGAGCGCACCTGGAGGGCGGCAAAGCGGT






GGGCGGCCCCATCTCCACGACAGCTGTTCCG






CTGCGCCCCTCCCCGCTGCCAGGGCTGTTCA






CTGCGTCAACCGCTCCCGATTGCGCGGTCAG






ACGCCCAGCTTTTGGGTCGCCAGCCGGTACA






GGTGT





C
SEBP1
CVU
200
AGCCGGTACAGGTGTACCCCAGGCTGGGTT






GACGCCCAAAGTCGCAATGCGCGTGGGATC






GGGCCTCTGTGTTGCTTGTGTGCCCAGGACA






GAAGCAGCAGAGCAGGCACCATGGCCGCTG






CCACCTTCTCCGCCCAGGCGACCGTCGCAGC






CCGTGTGGCGACCACCGCCAAGAGCTCCAC






CAGCATGAAGGTCCGATGGGGCGCCGGGGG






CATCGTTGCCGGCCTTCGATATGCCAGGGAG






CCAAGCGGGGCCCTGGGCGCCGTCTTATCCG






CTGCCTTGCATTGATGCCCTGCAGGTGGCTC






CCCGC





A
NAR1.2
CVU
201
GGCGGGGGACACGCGGCGGGCAGCCCCGAG






GCGGCACCGGGCGCCGGCCCCGGCAGCGCC






GGCGTCAGCCCGCCGCAGCCGCCCGCCGCG






GCCGAGCGCGCGCAGCCCAGCCCCGGCAGC






GGCGGCGGCGGCGAGGTGCGCCGCTCCTGG






GGCAGCCTCAAGTCCAAGTTTGGCAGCCTGA






GCGGGCGGGGAGGCAGCAAGGAGGAGGAG






GCGGTGGCGGCTGGGGCGGCCGCCAACACA






CCACGCAAATAGGGGCACGCGCATCTGCTG






CCTGGCCCCTGCCGGATGGTTGATGTGTACA






GAAGAGTTG





B
NAR1.2
CVU
202
TGTACAGAAGAGTTGAGAGCGTCAGTAGGG






TTGTGGTGGGGTGCCGGTTGCCCCGCCCATC






TCATCCCAGTTGTTTCCCTTCAAAACCAACC






CCAGCCAATAGGTTCTTAACCAGTACATCGT






AGACGCAACTCTGAACATCCGGGCCACTGA






TTCTTGTCGATTTATCTTGTTGATTGGTTGAG






CAGCACGTGTGCATCCCCGCTACTCTGTATG






TATCCAGCCATGCCGTCTGTTCCCCTTGCCA






GCGGTGCAACACTTGTTTTCTTTGTCTTGCA






ACATTTCGGTGTGATGGAAGTGAAGGAAAA






AA





C
NAR1.2
CVU
203
AAGTGAAGGAAAAAAGCCACAGTGAAGAA






ATGAGGTAAGCAATGAAGGCAGGGACAAAG






GGAGAGCAGGGCACCGGGAAAGAGAGCAG






CATGACACGGGACGAGTAGACGGCTCACAA






CCCACCGGCGGGAGCAGGGAAGAATGGAAG






GGGAGGCGAGCCAGGCGGCAGCACCCGTCT






CAATGTGACTTCTACTTGGCATCGGCGGCAC






CTGGCAGGCGGAACCTGCCTCCTCGAAGGG






CGCGGGTGCGCCCCGCCAGGCTTACGGCTG






GGCAGCGGCCATGCCAGTCGCTGCGTTGCCC






TGACAACTCC





A
LHCB5
VCA
204
AGGCCCATGGTTCGCCTTGGAGTTTGTGCCT






TCTTGGAAATTACAATAGAAGGCGTGCAGA






ACACATTTAGTGCATTTTTATATAAGGTATT






CTCATGGGCTTCTCTGACAGTTAAACAACAC






TACGTAGAGCCGCGCACCCGCCCCTGCGCTG






TGTTTCGGCCCGGTCAGGGCCCCCGGTGCTC






GTCCTTTTTCGGGGTGAGCCGTGAGCCGCCC






CACAGCGTAACACCCCAACACTCCTGTAGA






AACATGACATTAGCCAAAAGCATCTCCCTGT






CACAGCTTCGCTAATGATTGTGGTTGTGAAC






AA





B
LHCB5
VCA
205
TGTGGTTGTGAACAAAATCCCTCCTTGGACA






GGGTCGTTTGCAGGTAACATAACTCCCTCGA






GCCTCGTAACTTTACTCCAGCGTACTTGTAC






TGTGCGTTAACAAGACAACCTGTCTGGAAGT






AATGCTTTGCTAGGAATCCTTCTACAACGCT






TCATGCATGTAAACAGCGACTACGAAGAAA






ACTAAAAGGGAGCAATCCATATCAGTATCA






TACGTAAAGGGGTACTACATTTCTCACGTAG






TGGCCCATTCAGTTTCAGGGGTGTATACTTG






CTTTTGCAAGTGGTTTGCAAAATCATGTAAG






CT





C
LHCB5
VCA
206
AAAATCATGTAAGCTATTTGATTTAGCCACG






CAAATCCGAAAGAATGCCATACAAGCAGTG






TCATCCTGTACCCGAAGCTTCAGAGCTCTTC






ACTTGCCCATCATTATAAATAAGCTAAGAGA






GTAATGCACAAACTTTTATAACCTAATGCAC






ACAGGTACAGGAAGCGGTCCTGACGGAAGA






GGTCACGTCGTACGCATAGGGCCCTCGATCA






CAGCAAGGAACACCCTTTTATGGGCGCAGC






AGCGCTGGTATGGACACTTGCGCTGCCCTTC






TCTTCTTGTGTGTTCTAAACAGTAGCCAGTC






AAA





A
LCIC
VCA
207
AAGCTCCACTAGCTCCGAAGTTCCGACACGG






TCTCACACGCGCTCGTTAACTAACTTCAAAA






CATTACACTGCAAGTCAAAATTGCGCAGCGC






TGCTTGATCAGCTACCTTAACGCGCGGCACG






ACAAGACGCGTTGGTTATGCCAGCACTGACC






CGCCTCAAGCAATACGGCAAGATAAGGATC






TTCCCCGTGGCAGGGTTGGAAGTTGCTGTTG






GCATGCGGAGAGCTGTGAGGTCACATCTCA






CATGGAAACGCGTGTAGCACAACTCTTGGCT






GCCTATGCCAGTCCTGAAGGACACTTTCAGA






AC





B
LCIC
VCA
208
GGACACTTTCAGAACTGTTGAGATCATAAGC






TACTCGGCTACAACACATCTGTAAAGTTAAC






TGCCAGCGACAACTCTAAAAACTGCGGCCTT






TTGCGGCCACATGCCGTGCGATTGCCAACTG






CTTGGGTGTAAAGGTTGGAATTCCGGTAGTT






GATGCACAATTTCTCACTGTTTCTAAGCATT






ATTCATGAGAATGTGGCTTAGTAATCTAATT






AAGTCATCTTGGCTCGATACTGTAGTCTACA






TCCACATGGTTCAGGCTGCCGAAGGCCTGGC






CATACGATGACCGGAAGTCAGTCGCGCTAC






A





C
LCIC
VCA
209
GTCAGTCGCGCTACATACATGAGCTATGCTT






CTTTAGTTTGGCATTCTAAGCGAAGCTGATA






CAATTTCATTTCATCATGTTTAAATGCCACT






ACGCCCCATTTCTCCTTTACACATCCCGGGG






AAGACGAGTTACAATGTATTAAATCTTCAAT






CATATATACTTGATTCTTGGCATGCAGGATG






GAAAGCGAGTTGTAGGGTGTGTTGTCGTGCA






TCGCACGACATCGCATGTAGTAGTAGTAGG






AACATGTCCTCACCCGCCAACACATAAGGA






GCCAACGCTAACCAAGTCTGGCCAATCAGTT






CA





A
LCI5
VCA
210
ATAGCGACTTGGCGGGGCCATTGCTTTGCGG






TTTAGGATTTAACCGGGTTTTCTCTGGATGA






AGAGCGCGGACAGCTGACGAGCTTTCCTGC






AACCGTATGTTGGCGACCCTGGAAGTGTTAG






AAAGCTTAGAAAGCTTAGAAAGTTAGAAAG






CTCGATATAGTCGAACAATGAGCACAAAGG






AATGTGCTATGTGCTTGGGAAATTGCAAGAG






GCCAGCACAAATTTGCTATGTTGTCCTCAGC






GCCCACCCAAAGCCTTCGGGCCTCAGCTTTG






CATGGGCCAAGTTCCTGCTCTTAATTTCGGC






AAT





B
LCI5
VCA
211
CTTAATTTCGGCAATTCCATCAATTAGGCAT






ACAACATCGTTAGCAGGCATAAATCTCTGCT






GTCCATGACTATGTAGAGGAGGCGCGCAAG






CATAACAGTTGAGTATCTCTACTGCCGAACC






ATTTTTTTATAGATGCATTGTCTTCAAGACCT






AGTCCTGTTCTTCTTATGCTTTACCACAACG






AGAAGCGCGGAGGGATATCGCTGTACCTAT






GTGTAACGAAAAGGGCTTGCATGCATGCAT






GCACCATGAAGCAAATCCTAAAGAAAGGCG






TAAATGTAAAAACATGTATGGCAAAGCCAA






CGAT





C
LCI5
VCA
212
GGCAAAGCCAACGATGTTAAACATGTGAGC






GTGGAACTGACGTGTGCAAAGTACAACTCG






AACTTGCAGCAGTAAATCTTCCAAATAGCTA






ACGTATCCATATAGCATAGGAAAATTAAAT






ACACATGCGCTCCATGCATAAATTCTCCAAC






TGGACGAGCTACCATGTCTGGTTGAGAGACC






TGCCGTACCCCAACCCTACCACGTCCGTACT






CTTTTGGATAAAACAAAGGTGGCCCCAATGT






CCAAGCATCATTCACATTTTGAGCTGCACCG






CATTCGTCGTTCATTGTAATCTCCTTATAACA






AG





A
SEBP1
VCA
213
GTACGGTTGCGTGCTATTATATCTATGGGTT






GTGTTTGGAAGTTTTTAGCAAGACATGCTAT






CGAGGGGTCACATTTGAAGTTGCATCATGGT






AGCGAATCATGATGCACAACCAATTGACAG






CTCCTCCTCATTGCAGCTTGACGTAATCCGC






TAATGTCCCCGACCGCAGTGAGCCCATGTTG






ACGAGTTTGGCAAATCATAAGATGGGGTAT






GCGTACACACCCACGTGTCAAGCGGTTAGA






CTTGAGGACAAACCATAAGCTTCGGAGCTTC






AGATGCTATCGGTGCACTTGCGGACAACTGC






AGC





B
SEBP1
VCA
214
GCGGACAACTGCAGCTCCAGAGGGGGAATT






CAAAGGTCTTGGAGTCGCGGGTTTAGGGTGC






ATTTCCAGTGCGGATTAAGGCCAAAGATTAA






CCCTCTGTCCTCCATCGATACTTGCTCAAAC






GGCTAAGTTGTTGGCAAACTTACCTCGACTT






TTCAACCTTTGGTTCCCTTATGGAACAAAAC






TATGTGGTAAGCTCGTACCAAGGACTTCCGT






GCCCTAATCCCTGGCCTTAATCCGCACTGGA






AATGCGCCCTTAAAGATGGAGTGATGTCCCA






TTGCAAGGCCGCAATTGAAAGGAGCTCCTTG






C





C
SEBP1
VCA
215
AAAGGAGCTCCTTGCCAGCATCGCCTGAGTA






GTCTATATGGTCTTTTAAACTCTGACTTCCCT






GCAAGAGGCTTGCTATTGCCTGACCCATACG






CAGCGGACAGTGTCCTGTTTCACAAGTAATG






TGCAATAAAACTATGCAAAGAAACTTTTCAT






AATATGACTAAATATTGTAATAGTCTGAGTC






TCCCTATTTAGTAGGAATGCGCACCGCGGTA






CTATAGCAGATAAGGTGCCGTACATAGACT






GAAGCGGCAAGAACAAGAGGGGTGCAGCA






GCATAGATCCTTGCTTTAGGGTCAATTGCAA






AG





A
NAR1.2
VCA
216
GTACTTGGCAAGGTGCTATAGAAGTAGAAG






ATAGGAGACGATGATTGACACTTTGGTCCGA






CTATTTGGCTCGACATTCGCACGACATTCCT






AGCTGATGAGAGGGATGTCAAGATGTCAGG






GCAATCAATCCTGTCACATTCAGTCTTGTTG






AAATAATCGTAGTGTCTTGGTTTCATTATAA






ATCGGGGAGTTGCAGAGGAGACGTTCCCAC






CAGCGAGCGATGCCTGAAGATGTCTATGTGC






ACAGACTGTTGCATTTTCAGATGATATGCAA






TAAAGATAAGAACACAAGTCGTGCAGGAAA






AACG





B
NAR1.2
VCA
217
CGTGCAGGAAAAACGCGCAACGATGCTTTA






ACGCATAGTGGTTTAAGATGGGCGCGCTGA






ATTGATCCGGCATGGAGCGCGATGCGAATT






ATGTTTGAATACATGAAGCATTCATGTAAAC






AATTAAATACGTTTGGTCAAAAATAAAGTGC






GCACCACCAACGCATCGTCCCTGTCTCGCAG






AAAATCATACTTCCAATTTCTCATCTAAACG






GATCAAATTGCAGCTACTGAAACATCAAGC






AAATATAACGACATCCTCCGTGCAAGATCA






AAAATGATTCACATTGCACTTTCGCCATTGA






TCCCG





C
NAR1.2
VCA
218
TCGCCATTGATCCCGGAATTCGTTTGACAGC






GCGAACCCATAAGCCAATCACCCTATCATAA






AGCATAAATCTTCCATTAAACATACCCTATC






AACCTGGCCGCAACTTGTGGGGATGTAACTG






TATGTGGGTTTGTGTGTGTGGGTGCTCGGCC






AAATACAGCCGGCGTACGACATCACACTGA






CCTACTACCTTTCTTATCTTTTTTATATATGC






TGCTATGCACCCGGCTTACTCGTATAGCAGT






GTTACAAAGCTAGTTGGTTTCAGTAGTGTGT






TGTTCCTCATTGATCATCATATCTGGAAAGC






AGTTGTCACCACAAACCAAACGGGCGTTATT






TGTTCTTCCATCTTATTGCCTTTTCAAGGATG





A
LHCB5
ZMA
219
AGTCATGTCTTGGACAAAACTTCAGCAATTT






TCTAATAAAAGAACATTCCTATGGTGTATGA






TGTTAATCATCGTTTCTCCCACCTCTCTTTTC






CAGGGACACTGTCGATGCAATATTTGAAGA






GCTGGTTATAAACACCAAGAAGCTTGTGGCT






GCAACGTCAAAATGAATCGAAAAATAGCGT






TGAGTGGCACCACTGCATTGTCGTCTCTATT






AATCAGCTTGAACAGGCGGTAGGACTTAGT






C





B
LHCB5
ZMA
220
CGGTAGGACTTAGTCCTAGAATGCAGCCTGT






TGATCTCATGACATTCTATTAATTATGAGCG






TAGTTAGGTAGGATACTGACACAACACACA






TGGTTTCTGGTCCATATTTATTAGTTACATTC






CAGTATATTGTGGATTGCTCATCACTTGTTA






AATTAGAGAAAATTGATGCTCTGAGCTTCAG






ATGAACTTTGTTTCGTGCTTGTGCGTGTGTTC






TTCACCCTTCTGGTATCAGTGTGTGGCCAGC






ACTTGTTGTCTCGGCGCTCTCTCTCACTCACT






CTGGTTGGTTCCCCTAGGTCTTTGTCTAT





C
LHCB5
ZMA
221
TAGGTCTTTGTCTATCTTGTTTGGGCCATTTG






GCGCTAACTAACCAACAAGTGCACAAGAGG






CCCCTCAAGCTGCCACATCAGCACCCTCATC






TGCCAAGTCAGCACAGCCTGCCCAATCGCCT






CCAGGCAACAGATAGCCCTGATGGGCACCC






ATCCAATGGCAGCTCCGATGGCCAAATCTCT






GCTAGGCCCACAGCATCCTCCGATCCTCATT






TTCATCCATTTAAACTAGCTCGCCTTTTCCTC






CACAAGCCCCCATCAGCCATCCCCTCCCGCG






GCAAGTCTCTCTGAATTGTGGGTCTCCGGCG





B
SEBP1
ZMA
222
CAGTGAGAAAAGGCCTTGCCACTCTACGTAT






CTGATGTTGTTAATAATTTCAGAAGTCGTCG






TATATACCATGGGGTGTTTAATTGTCGTATA






TACGATGGGATGCTTAATTGTCGTATATACG






ATGGTATGATGAAACAACTGACTTAAACATC






ACACTGAACAATTTCAGAAAACGATCCATG






CCGTCGTATATATACGACAACAAAATACCA






GAAGCAAACCTCCCAGACCCAAGGGGAAAT






AAACGGGCCTGCTTCTGGTCGCTAGCTTGGG






GGCGCTGGAGCTGCAGTGCGTAGGCCCGTC






CGAT





C
SEBP1
ZMA
223
GTAGGCCCGTCCGATCCGTGGCTCGTCTCGG






CATGGCCACACAAACCACGAACGGTCGTCG






TGCACCGCAGCGCGGCCCCCCCGTTCTATCT






TCTCCAGCTCCAAATGGCGCCATCGCGGCGG






CCGGGTTATCTTGTCCAGACGTGCATCATAT






CCTCCGTGTGATCCATTCATCCCCGCGCCGT






GCTAGCTTGCTAGTTGCAAGCACCAGCCGAC






CACCAAACGGTAGCGCACGCGGACAATTTA






ACAGCATCAGGTTTAGGCCCTGCTGCCGTCG






TCGAGCGCCCGGGCCACCGCACACCTGAAA






GCA





A
LHCB5
ATH
224
TTCTGGTAATGTGTATGGTTTGAGTGCTGAT






TTTTGGTGCTATGAGTTGTTCTTTATGGCTCA






ACTTGGATCAATATGGAGGTTGAGTTTGAGA






TTTTCTCTCAGTTTAAGGAGGTAGAATAGTG






CGTATAGTGGCACAGTGAGCTCAGCTCTAGG






GCCAAAGGGCATAAATTCATTATAGCTCTTT






CGATTCTACCGTAGTACTGTGTGTGAACCGG






CACTGTGAACCAAGATGATTAAATTTTCGTA






TTCTCTATGTACATGATCCTGCGGCTCAATC






GCTTCAGTTTCGATCCACATGATGTATATG





B
LHCB5
ATH
225
CACATGATGTATATGTTATAGAATTGTGGGA






AACTCCTTGTAGAAAGAGTATGTTCACGTCT






AGGACTAGTCGGATGATTCGTTTCTCTTTTT






GGTGTAATGAGTATGTTCATAACTGTTGATA






CAATGTGAAAATCTAACCGTTGAGCTTGGGA






GTTTTACGTCTATATGAAAATTCCGGTTGTC






GTCTACATTACGGTAGTAAACAGGACCACA






GTGATTCCAAATGTCCCAAGGAATTTACTGA






AAACCCCAACTAGGACTGTGAAAGGCTTGT






GGATGACATTTAACAGTTGAGATTTTCATGT






GT





C
LHCB5
ATH
226
GAGATTTTCATGTGTTTGAGATTCTTGTAAC






ACATTTTGCTGTATAGGTGAAAGCTTAGCCA






CACAAAAGGAGAAACAGAGGATATGGATAA






AATAAATTATCCAACAAAAACCAATCTAAA






AGCCACATCAGCATCCACAACCAATCAGAG






GACAGAATCATATTTCACATTTTCAATCCAG






ACCAATCAAAATCCTGAACGAATCCTACTCT






CCACCTTATAGGAGCAGTTTCGTCTCTTCCT






CCTTCTTTCACTTAGCTCTTCCTAGTGTTAAA






CCAGAGTAAAGCTTGAAACTTTGGACTAAA






AGA





C
SEBP1
ATH
227
TATAATTTGGTTTGTATGTCATTGGTGATGT






AAACTGAAATTGAAGATAATAGAATCTCAT






AACCACACAAAAAATGAATGAACGCAAATC






AAAGCCTCTCAACACATCTCTTTGCCTCGGT






CTCTCTCTCGCCCAATTGCCCATCACCAGAG






CTTAATCATATCTTCTTCAGTTACTGCCACGT






GTCACTCTGACCGTGAACAGCCTTTATCTCT






TCCAAGTCCACTTGTGTTCTTGATTATTTTGT






CTTCACCATTCTCTCTACTCAAAGCTCTTCTT






CTTCGATCAAAAAACCTCGAGCTTCTAACA









We assayed all 92 TFs against five C. reinhardtii nuclear promoters: LCIC, LCI5, SEBP1, Nar1.2, and LHCBM5 (FIG. 10). LCIC, LCI5, and Nar1.2 are low CO2-induced genes that play roles in the CO2-concentrating mechanism (CCM) [40-42]. SEBP1 encodes sedoheptulose-1,7-bisphosphatase which functions during the Calvin cycle [43]. LHCBM5 encodes a component of light harvesting complex II and is involved in photosynthesis [44]. These genes were chosen because they were identified from a published RNA-sequencing dataset as highly regulated genes (i.e., they were expressed under laboratory conditions) in C. reinhardtii [45].


TFs 2, 3, 9, 28, 34, 45, 64, 69, and 81 each activated transcription from LCIC promoter fragment C (FIG. 10). TF64 activated transcription from LCI5 promoter fragment A; TFs 39 and 78 activated transcription from LCI5 promoter fragment C. TFs 3, 6, 27, 30, and 64 activated transcription from SEBP1 promoter fragment A; TF64 activated transcription from SEBP1 promoter fragment B; TFs 27, 30, 56, and 64 activated transcription from SEBP1 promoter fragment C. TFs 10, 30, and 64 activated transcription from Nar1.2 promoter fragment C. Finally, TF34 activated transcription from LHCBM5 promoter fragment C (FIG. 10). Note that LHCBM5 promoter fragment B could not be cloned (due to repeat sequences) and therefore was not assayed here. (See Materials and Methods for statistical information on the Y1H assay.)


To summarize these Y1H assays, our data provide information on 1,288 potential TF-promoter binding interactions, 26 of which were positive hits. TF64 was the most active in this assay, activating transcription from four of the five promoters tested. TFs 3, 30, and 34 each activated transcription from two promoters. Note that some TFs bound multiple fragments of the same promoter. Many TFs, however, did not show activity with any of the five C. reinhardtii promoters we assayed. These data are summarized in Table 10.
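The totals in this summary can be re-derived from the per-fragment results listed above. The following is a minimal sketch, assuming each of the five promoters was split into three fragments (A, B, C) and that the uncloned LHCBM5 fragment B is the only fragment missing; the dictionary layout and variable names are illustrative and not part of the original analysis.

```python
# Positive Y1H hits per C. reinhardtii promoter fragment, transcribed from the text.
# Keys are (promoter, fragment); values are TF numbers.
hits = {
    ("LCIC", "C"): [2, 3, 9, 28, 34, 45, 64, 69, 81],
    ("LCI5", "A"): [64],
    ("LCI5", "C"): [39, 78],
    ("SEBP1", "A"): [3, 6, 27, 30, 64],
    ("SEBP1", "B"): [64],
    ("SEBP1", "C"): [27, 30, 56, 64],
    ("Nar1.2", "C"): [10, 30, 64],
    ("LHCBM5", "C"): [34],
}

N_TFS = 92
# Assumption: 5 promoters x 3 fragments, minus the uncloned LHCBM5 fragment B.
N_FRAGMENTS = 5 * 3 - 1

total_tests = N_TFS * N_FRAGMENTS
total_hits = sum(len(tfs) for tfs in hits.values())

# Collect the distinct promoters each TF activated.
promoters_by_tf = {}
for (promoter, _fragment), tfs in hits.items():
    for tf in tfs:
        promoters_by_tf.setdefault(tf, set()).add(promoter)

# TFs that activated transcription from two or more promoters.
multi_promoter_tfs = {
    tf: sorted(proms) for tf, proms in promoters_by_tf.items() if len(proms) >= 2
}

print(total_tests, total_hits)
print(multi_promoter_tfs)
```

Counting by promoter rather than by fragment is what separates TF64 (four promoters) from a TF such as TF27, which hit two fragments of the same promoter.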


TABLE 10

Yeast one-hybrid data summary

Species                    Promoter  Transcription Factor
Chlamydomonas reinhardtii  SEBP1     3, 6, 27, 30, 56, 64
Chlamydomonas reinhardtii  LCI5      39, 64, 78
Chlamydomonas reinhardtii  LCIC      2, 3, 9, 28, 34, 45, 64, 69, 81
Chlamydomonas reinhardtii  NAR1.2    10, 30, 64
Chlamydomonas reinhardtii  LHCBM5    34
Volvox carteri             SEBP1     64
Volvox carteri             LCI5      2, 64
Volvox carteri             LCIC      2, 21, 45, 57, 64, 69
Volvox carteri             NAR1.2    2, 3, 4, 5, 13
Volvox carteri             LHCBM5    58, 64
Chlorella vulgaris         SEBP1     64
Chlorella vulgaris         LCIC      10
Chlorella vulgaris         NAR1.2    7
Chlorella vulgaris         LHCBM5    2, 7, 18, 27, 51
Zea mays                   SEBP1     30, 64
Zea mays                   LHCBM5    2, 6, 14, 28, 37, 64, 76
Arabidopsis thaliana       SEBP1     56
Arabidopsis thaliana       LHCBM5    85









Putative transcription factors initiate transcription from orthologous promoters from multiple species. We also assayed our TF library with bait promoters from the closely related algal species Volvox carteri and Chlorella vulgaris, as well as from the distantly related plant species Arabidopsis thaliana and Zea mays. Again, we tested promoters LCIC, LCI5, SEBP1, Nar1.2, and LHCBM5 (Table 10, FIG. 11). As with the C. reinhardtii promoters, TF64 was the most active TF in combination with promoter fragments from other species, specifically V. carteri LCIC, LCI5, SEBP1, and LHCB5; C. vulgaris SEBP1; and Z. mays SEBP1 and LHCBM5 (Table 10, FIG. 11). In total, we analyzed 49 promoter fragments against 92 TFs, for 4,508 potential binding interactions. We found 65 positive hits and, most importantly, 28 TFs with potential DNA binding activity.


Analysis of potential TF64-binding promoters identified from the Y1H assay. Using our collected Y1H data, we hypothesized that we could identify commonalities among promoters that may function as specific motifs or binding sites important for gene regulation. We chose to analyze the promoter fragments that activated transcription in combination with TF64 because they provided the largest sample size, 13 promoter fragments in total. We used the software program MEME (Multiple Em (Expectation maximization) for Motif Elicitation) [32,33] to search for enriched DNA motifs. Unfortunately, no statistically significant motifs were identified. The top motif found was an 11-nucleotide sequence, TGNGCANNTNN (SEQ ID NO: 228) (FIG. 12A). Interestingly, this motif does contain remnants of the canonical binding site, CANNTG (nucleotides 5-10) (FIG. 12B), typical of the basic Helix-Loop-Helix family of transcription factors to which TF64 belongs [46,47].
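To make the comparison between the top MEME motif and the canonical E-box concrete, the following is a minimal sketch of a degenerate-motif scan. It is not the MEME analysis itself: it only expands the IUPAC code N into a regular expression and reports matches. The function names and the toy sequence are hypothetical and chosen for illustration.

```python
import re

# Minimal IUPAC expansion; only the symbols used in the motifs above are covered.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Translate a degenerate motif such as CANNTG into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return (1-based start, matched text) for every match, overlaps included."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


EBOX = "CANNTG"
TOP_MEME_MOTIF = "TGNGCANNTNN"  # SEQ ID NO: 228

# Nucleotides 5-10 of the top motif, compared position by position with the E-box.
core = TOP_MEME_MOTIF[4:10]
shared_positions = sum(a == b for a, b in zip(core, EBOX))

# Hypothetical toy sequence, not taken from any promoter fragment in this document.
toy_sequence = "TTCACGTGAACAGCTGTT"
ebox_sites = find_motif(toy_sequence, EBOX)

print(core, shared_positions)
print(ebox_sites)
```

The slice shows why the text speaks of "remnants": positions 5-10 of the top motif read CANNTN, which agrees with CANNTG everywhere except the final position, where the motif is degenerate.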


Constitutive expression of the TF library in C. reinhardtii. We next attempted to study our TF library expressed in C. reinhardtii cc1010. The gene encoding each TF was cloned from the pENTR vector into a ble-2A expression vector [19], pTM207 (see FIG. 13, panel B). This expression vector results in co-transcription of a gene of interest along with the ble gene (conferring zeocin resistance), followed by post-translational cleavage of the two peptides at the 2A linker peptide site. Each pTM207 plasmid encoding a unique TF under control of the constitutive promoter PAR1 was electroporated into the C. reinhardtii nuclear genome. However, we were unable to obtain colonies of C. reinhardtii constitutively expressing the genes encoding most TFs. While we attempted transformation of all 92 TFs, gene-positive colonies were recovered for only 59 TFs, and only 21 TFs (1, 2, 4, 5, 14, 22, 31, 34, 38, 40, 41, 47, 52, 53, 55, 62, 63, 64, 75, 76, 84) had over 20% gene-positive colonies of those tested (data not shown). Western blot analyses of whole cell lysates were performed to verify production of the TFs; however, protein was detected only in strains transformed with TFs 1, 2, 5, 13, 22, 31, 40, and 64.


In deciding which TF to carry forward with our study, we considered our Y1H data concurrently with our limited ability to produce the recombinant TFs in C. reinhardtii. TFs 2 and 64 both showed potential DNA binding activity and were capable of being constitutively produced in C. reinhardtii. Ultimately, we chose TF64 to continue our study of TF-promoter binding partners in C. reinhardtii.
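The selection logic described here amounts to intersecting the TF sets reported above. The following is a minimal sketch, assuming that only the C. reinhardtii Y1H hits are used as the DNA-binding criterion (including hits on promoters from other species would enlarge the set); the variable names are illustrative.

```python
# TFs with at least one positive Y1H hit on a C. reinhardtii promoter (from the text).
y1h_active = {2, 3, 6, 9, 10, 27, 28, 30, 34, 39, 45, 56, 64, 69, 78, 81}

# TFs with over 20% gene-positive colonies after transformation (from the text).
over_20_percent_positive = {
    1, 2, 4, 5, 14, 22, 31, 34, 38, 40, 41, 47, 52, 53, 55, 62, 63, 64, 75, 76, 84,
}

# TFs whose protein was detected by western blot (from the text).
western_detected = {1, 2, 5, 13, 22, 31, 40, 64}

# Candidates: DNA-binding activity in Y1H and detectable protein in C. reinhardtii.
candidates = sorted(y1h_active & western_detected)

# Stricter variant that also requires efficient recovery of gene-positive colonies.
strict_candidates = sorted(y1h_active & western_detected & over_20_percent_positive)

print(candidates, strict_candidates)
```

Under these assumptions both intersections contain only TFs 2 and 64, which matches the two candidates named in the text.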


Production of TF64 in C. reinhardtii. Basic Helix-Loop-Helix (bHLH) transcription factor family members, like TF64, are highly conserved in their functional and DNA-binding domains, even across distantly related species and genera [46-49]. They recognize a canonical binding site, CANNTG (called the E-box), in promoters of genes they regulate [47,49]. A BLAST search of the PlnTFDB TF64 sequence showed conservation in DNA binding, E-box specificity site, and dimerization interface domains among top hits of TF-like proteins from other microalgae species (FIG. 13, panel A). The remainder of the TF64 protein sequence is highly variable with the exception of a conserved ACT domain in the C-terminus of unknown function typically found in bacterial species [50] (FIG. 12).


We generated multiple strains of cc1010 that constitutively produced TF64 (cc1010::TF64-4, -7, -8, -9, and -11), as shown by western blot (FIG. 13 panels B, C). The pTM207 vector encodes an N-terminal 3×FLAG-tag fused to each TF (not shown in FIG. 13, panel B), and the TF64 proteins were detected using antibodies against FLAG-tag. TF64 is predicted to be a 33 kDa protein (FIG. 13, panel C). The 3×FLAG-tag adds 2.7 kDa to the protein product. The higher molecular weight band is the Ble2A-TF64 fusion product prior to 2A cleavage. Across multiple western blot analyses, strain cc1010::TF64-7 appeared to produce the least transcription factor protein, and strain cc1010::TF64-9 appeared to produce the most (representative data shown in FIG. 13, panel C).


As a control, we also used the pTM207 vector to generate a strain that constitutively produced GFP under control of PAR1 (FIG. 13, panel B). Whole cell lysate of strain cc1010::GFP is shown on the western blot in FIG. 13, panel C.


Growth curves were performed on strains cc1010::TF64-7, cc1010::GFP, and wild type cc1010 cultured in TAP medium under constant light for four days (FIG. 13, panels C, D). While cc1010::TF64-7 did exhibit an extended lag phase in growth, it was capable of reaching an OD750 similar to that of cc1010::GFP and the wild type cc1010 strain (FIGS. 13 panels C, D).


TF64 regulates many endogenous nuclear genes. To identify the genes/promoters TF64 regulates in C. reinhardtii, we performed an RNA-sequencing experiment on two independent strains, cc1010::TF64-7 (referred to as the low-constitutive strain) and cc1010::TF64-9 (referred to as the high-constitutive strain), along with our control strain cc1010::GFP (FIG. 14). RNA from three biological replicates for each strain was sequenced at the UCSD Institute for Genomic Medicine. Transcript abundance and differential expression analysis for each TF64-producing strain was compared to the GFP-producing strain (FIG. 14A). The data indicate that approximately 2.4% and 1.0% of the genome was affected at least 10-fold (log2≥16B, R2=0.498). Furthermore, a greater range of regulation was observed in the low-constitutive strain (TF64-7) compared to the high-constitutive strain (TF64-9) (FIG. 14, panels A, B, C).
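For readers moving between fold change and the log2 scale used in Tables 11a-11c, the conversion is a single exponentiation. The following is a minimal sketch; the function name is hypothetical, and the example values are the largest up-regulated (7.58) and most strongly down-regulated (−6.45) log2 fold changes listed for strain TF64-7.

```python
import math


def log2_to_fold(log2_fc):
    """Convert a log2 fold change to a linear fold change (2 ** x)."""
    return 2.0 ** log2_fc


# A 10-fold change expressed on the log2 scale (about 3.32).
ten_fold_as_log2 = math.log2(10)

# Extremes reported for TF64-7: top up-regulated and top down-regulated genes.
top_up = log2_to_fold(7.58)
top_down = log2_to_fold(-6.45)

print(round(ten_fold_as_log2, 2))
print(round(top_up, 1), round(1 / top_down, 1))
```

Negative log2 values are easier to read as the reciprocal, that is, as the factor by which expression decreased relative to the GFP control.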


The most highly regulated genes, both activated and inhibited, from the low-constitutive and high-constitutive TF64-producing strains were identified by bioinformatics using the BLASTx search function from NCBI (Table 11a, 11b, 11c). Inhibited genes were mostly uncharacterized and showed little similarity in function. Activated genes, particularly from the low-constitutive TF64-7 dataset, fell into relatively distinct functional categories including: photosynthesis, cell structure, cell cycle, and metabolism. Table 12 lists the top 20 activated genes (that have also been previously characterized) identified from the TF64-7 RNA-Seq data. These data suggest TF64, like many bHLH transcription factor family members [51,52], regulates many genes involved in a wide variety of developmental and cellular processes in C. reinhardtii.









TABLE 11a

Identification of TF64-regulated genes.

Top 40 Up-Regulated Genes in C. reinhardtii TF64-7

No.  Gene ID                                     Log2 Fold Change  Gene Symbol  Accession No.  Protein Length
1    jgi|Chlre4|513883|au5.g4042_t1:0-146        7.58              LHCBM7       XP_001694115   249
2    jgi|Chlre4|523567|au5.g13085_t1:285-1460    7.09              LHCBM8       XP_001695467   254
3    jgi|Chlre4|512488|au5.g2746_t1:76-967       6.82                           XP_001697347   385
4    jgi|Chlre4|523561|au5.g13079_t1:149-1280    6.81              LHCBM4       XP_001695344   254
5    jgi|Chlre4|520677|au5.g10379_t1:97-2184     6.80                           XP_001697417   258
6    jgi|Chlre4|518507|au5.g8360_t1:204-4111     6.80              FAP211       XP_001701654   698
7    jgi|Chlre4|521087|au5.g10761_t1:39-2944     6.66              METE         XP_001702934   815
8    jgi|Chlre4|513788|au5.g3953_t1:2032-4745    6.37                           XP_001693945   370
9    jgi|Chlre4|512994|au5.g3208_t1:314-2817     6.13
10   jgi|Chlre4|521595|au5.g11226_t1:266-2760    6.09              SAH1         XP_001693339   483
11   jgi|Chlre4|522358|au5.g11951_t1:421-1892    5.96                           XP_001697707   306
12   jgi|Chlre4|517273|au5.g7220_t1:576-2741     5.89                           XP_001691691   381
13   jgi|Chlre4|515402|au5.g5474_t1:537-2854     5.79              PHC13        XP_001690309   506
14   jgi|Chlre4|520083|au5.g9823_t1:3664-4112    5.77              GCP3         XP_001699475   930
15   jgi|Chlre4|524734|au5.g14197_t1:2040-2317   5.71                           XP_001700124   124
16   jgi|Chlre4|519722|au5.g9487_t1:300-2262     5.51                           XP_001694801   130
17   jgi|Chlre4|520120|au5.g9859_t1:2-129        5.39              LHCBM1       XP_001700243   266
18   jgi|Chlre4|524285|au5.g13771_t1:3002-3795   5.34              MCM4         XP_001700810   544
19   jgi|Chlre4|513665|au5.g3835_t1:58-185       5.33                           XP_001692967   581
20   jgi|Chlre4|518165|au5.g8046_t1:194-1066     5.30                           XP_001701406   86
21   jgi|Chlre4|526354|au5.g15724_t1:5515-5842   5.20                           XP_001696801   304
22   jgi|Chlre4|524988|au5.g14435_t1:1814-1954   5.17                           XP_001692594   241
23   jgi|Chlre4|512084|au5.g2359_t1:10431-10587  5.15              DCL2         XP_001698921   5684
24   jgi|Chlre4|512529|au5.g2782_t1:35-1932      5.11              GAP1         XP_001703199   371
25   jgi|Chlre4|518966|au5.g8779_t1:1773-1883    5.09              SYP72        XP_001700031   270
26   jgi|Chlre4|519390|au5.g9173_t1:283-2258     5.07              FTSZ1        XP_001702420   479
27   jgi|Chlre4|515943|au5.g5981_t1:176-2507     4.99              FTSZ2        XP_001700508   434
28   jgi|Chlre4|512163|au5.g2437_t1:1012-1109    4.91                           XP_001699495   346
29   jgi|Chlre4|513021|au5.g3230_t1:36-1751      4.90                           XP_001691021   93
30   jgi|Chlre4|520083|au5.g9823_t1:4197-4255    4.79              GCP3         XP_001699475   930
31   jgi|Chlre4|519414|au5.g9197_t1:7768-7861    4.77                           XP_001702440   1844
32   jgi|Chlre4|518566|au5.g8414_t1:7331-7954    4.76                           XP_001701683   863
33   jgi|Chlre4|523024|au5.g12580_t1:45-2087     4.75              EFG8         XP_001696344   395
34   jgi|Chlre4|521599|au5.g11230_t1:910-4556    4.74                           XP_001693192   1300
35   jgi|Chlre4|513496|au5.g3676_t1:1531-1934    4.73              GLN3         XP_001692927   375
36   jgi|Chlre4|512150|au5.g2424_t1:2797-2877    4.70                           XP_001699532   660
37   jgi|Chlre4|513333|au5.g3525_t1:167-1848     4.69              MIND1        XP_001697031   351
38   jgi|Chlre4|520302|au5.g10033_t1:278-1558    4.66              TEF13        XP_001703033   150
39   jgi|Chlre4|514112|au5.g4259_t1:7-1195       4.62                           XP_001703138   150
40   jgi|Chlre4|525978|au5.g15362_t1:230-2771    4.62                           XP_001694482   133















No.  Function                                            Closest Hit for Hypotheticals                                 Category
1    Chlorophyll a-b binding protein of LHCII                                                                          Photosynthesis
2    Chlorophyll a-b binding protein of LHCII                                                                          Photosynthesis
3    Hypothetical protein                                Extracellular matrix glycoprotein pherophorin-V32 (Volvox)    Cell structure
4    Chlorophyll a-b binding protein of LHCII                                                                          Photosynthesis
5    Predicted protein                                   Hydroxyproline-rich glycoprotein (Chlamydomonas reinhardtii)  Cell structure
6    Flagellar associated protein                                                                                      Motility
7    Cobalamin-independent methionine synthase                                                                         Metabolism
8    Predicted protein                                   Flagellar associated protein (Chlamydomonas reinhardtii)      Motility
9                                                        Cell wall protein pherophorin-C4 (Chlamydomonas reinhardtii)  Cell structure
10   S-Adenosyl homocysteine hydrolase                                                                                 Metabolism
11   Hypothetical protein                                None
12   Hypothetical protein                                Flagellar associated protein (Chlamydomonas reinhardtii)      Motility
13   Cell wall protein pherophorin-C13                                                                                 Cell structure
14   Gamma tubulin interacting protein                                                                                 Cell structure
15   Predicted protein                                   None
16   Predicted protein                                   None
17   Chlorophyll a-b binding protein of LHCII                                                                          Photosynthesis
18   Minichromosome maintenance protein 4                                                                              Cell cycle
19   Predicted protein, zinc finger DNA binding domain   GATA transcription factor 26 (Auxenochlorella)                Regulation
20   Predicted protein                                   None
21   Cohesin subunit SCC1b (Rad21/Rec8 homolog)                                                                        Cell cycle
22   Predicted protein                                   Hypotheticals
23   Dicer-like protein                                                                                                Regulation
24   Glyceraldehyde 3-phosphate dehydrogenase                                                                          Metabolism
25   Qc-SNARE protein, SYP7-family                                                                                     Localization
26   Plastid division protein                                                                                          Cell cycle
27   Plastid division protein                                                                                          Cell cycle
28   Predicted protein                                   Hypotheticals
29   Hypothetical protein                                Hypotheticals
30   Gamma tubulin interacting protein                                                                                 Cell structure/Localization
31   Predicted protein                                   Forkhead-associated protein (Geitlerinema)                    Regulation/Localization
32   Predicted protein                                   Hypotheticals
33   Mitochondrial translation factor Tu                                                                               Translation
34   Predicted protein                                   Flagellar associated protein (Chlamydomonas reinhardtii)      Motility
35   Glutamine synthetase                                                                                              Metabolism
36   Predicted protein (Peptidase M7)                    Hypotheticals                                                 Metabolism
37   Chloroplast septum site-determining protein                                                                       Cell cycle
38   Predicted protein                                   Aminoacyl-tRNA synthase CAAD domain, Curvature thylakoid      Localization
39   Glutathione S-transferase                                                                                         Metabolism
40   RAN binding protein, RANBP1                                                                                       Cell cycle
















TABLE 11b

Identification of TF64-regulated genes.

Top 20 Down-Regulated Genes in C. reinhardtii TF64-7

No.  Gene ID                                     Log2 Fold Change  Gene Symbol  Accession No.  Protein Length
1    jgi|Chlre4|516390|au5.g6397_t1:9021-11277   −6.45                          XP_001701467   415
2    jgi|Chlre4|518525|au5.g8375_t1:24-151       −5.94                          XP_001701867   274
3    jgi|Chlre4|525738|au5.g15143_t1:11-124      −5.91                          XP_001694214   433
4    jgi|Chlre4|525694|au5.g15099_t1:14-125      −5.31                          XP_001694228   264
5    jgi|Chlre4|522989|au5.g12549_t1:37-201      −5.24             MSRA2        XP_001696359   335
6    jgi|Chlre4|511147|au5.g1489_t1:2524-2687    −5.22                          XP_001690001   198
7    jgi|Chlre4|515954|au5.g5992_t1:1663-3048    −5.19                          XP_001700503   335
8    jgi|Chlre4|523962|au5.g13460_t1:2301-9088   −5.14                          XP_001691410   1549
9    jgi|Chlre4|515035|au5.g5129_t1:1-2180       −5.13                          XP_001699067   202
10   jgi|Chlre4|518356|au5.g8226_t1:0-167        −4.96                          XP_001703564   182
11   jgi|Chlre4|521856|au5.g11476_t1:1355-1480   −4.87                          XP_001691165   516
12   jgi|Chlre4|512501|au5.g2756_t1:79-213       −4.85
13   jgi|Chlre4|516261|au5.g6278_t1:87-255       −4.82                          XP_001697937   590
14   jgi|Chlre4|517935|au5.g7833_t1:73-154       −4.80
15   jgi|Chlre4|521621|au5.g11252_t1:1529-1690   −4.78
16   jgi|Chlre4|510735|au5.g1093_t1:2936-3258    −4.68                          XP_001702142   268
17   jgi|Chlre4|519614|au5.g9382_t1:52-2262      −4.68             VIG1         XP_001694669   361
18   jgi|Chlre4|520495|au5.g10220_t1:53-192      −4.65                          XP_001697557   91
19   jgi|Chlre4|519116|au5.g8918_t1:1732-2084    −4.64                          XP_001699975   185
20   jgi|Chlre4|521566|au5.g11198_t1:0-150       −4.63                          XP_001693207   5234















Closest Hit for



No.
Function
Hypotheticals
Category





1
Predicted protein
Snurportin-1 (nuclear
Regulation/Localization




import) (Monoraphidium)



2
Predicted protein
Serine/threonine protein
Signaling/Cell cycle




kinase (Microcystis)



3
Predicted protein
Hypotheticals



4
Predicted protein
Transmembrane E3
Localization/Regulation




ubiquitin-protein ligase 1-





like (Zn-finger) (Camelina)



5
Peptide methionine-S-

Metabolism/Redox



sulfoxide reductase




6
Predicted protein
Inositol oxygenase
Metabolism/Redox




(Monoraphidium)



7
Predicted protein
Hypotheticals



8
Hypothetical protein
T-complex protein 10
Protein stability




(chaperone) domain-





containing protein





(Rozella)



9
Hypothetical protein
None



10
Predicted protein
DNA-directed RNA
Regulation




polymerase (Ostreococcus)



11
Hypothetical protein
ATP-dependent DNA
Regulation




helicase (Rhizoctonia)



12





13
Hypothetical protein
Kinesin-like protein





(Oxytricha)
Localization


14





15

Dicer-like protein
Regulation




(Chlamydomonas






reinhardtii)




16
Hypothetical protein
Hypotheticals





(Chlamydomonas






reinhardtii)




17
Vasa intronic gene

Regulation



(putative RISC





associated factor)




18
Predicted protein
Calcium/calmodulin-
Signaling/Cell cycle




dependent protein kinase





(Cladophialophora)



19
Hypothetical protein
Carboxylesterase
Metabolism




(Chondromyces)



20
Predicted protein
None











Top 20 Up-Regulated Genes in C. reinhardtii TF64-9














Log2 Fold
Gene

Protein


No.
Gene ID
Change
Symbol
Accession No.
Length





1
jgi|Chlre4|523567|
5.88
LHCBM8
XP_001695467
254



au5.g13085_t1:285-1460






2
jgi|Chlre4|521087|
5.86
METE
XP_001702934
815



au5.g10761_t1:39-2944






3
jgi|Chlre4|512084|
5.51
DCL2
XP_001698921
5684



au5.g2359_t1:10431-10587






4
jgi|Chlre4|512529|
5.47
GAP1
XP_001703199
371



au5.g2782_t1:35-1932






5
jgi|Chlre4|521595|
5.16
SAH1
XP_001693339
483



au5.g11226_t1:266-2760






6
jgi|Chlre4|523561|
4.98
LHCBM4
XP_001695344
254



au5.g13079_t1:149-1280






7
jgi|Chlre4|518569|
4.95
BIP2
XP_001701884
662



au5.g8417_t1:356-3190






8
jgi|Chlre4|526287|
4.91

XP_001696684
577



au5.g15661_t1:0-160






9
jgi|Chlre4|522775|
4.82

XP_001697724
262



au5.g12346_t1:17-1931






10
jgi|Chlre4|514561|
4.56






au5.g4680_t1:249-354






11
jgi|Chlre4|518501|
4.51

XP_001701651
825



au5.g8356_t1:6-97






12
jgi|Chlre4|520083|
4.50
GCP3
XP_001699475
930



au5.g9823_t1:3664-4112






13
jgi|Chlre4|515402|
4.47
PHC13
XP_001690309
506



au5.g5474_t1:537-2854






14
jgi|Chlre4|522427|
4.46

XP_001702210
320



au5.g12017_t1:8-1276






15
jgi|Chlre4|524246|
4.42
GGH1
XP_001700978
395



au5.g13735_t1:144-263






16
jgi|Chlre4|518951|
4.41

XP_001699834
565



au5.g8765_t1:2102-2222






17
jgi|Chlre4|524734|
4.32

XP_001700124
124



au5.g14197_t1:2040-2317






18
jgi|Chlre4|520302|
4.29
TEF13
XP_001703033
150



au5.g10033_t1:278-1558






19
jgi|Chlre4|524988|
4.27

XP_001692594
241



au5.g14435_t1:1814-1954






20
jgi|Chlre4|513993|
4.12






au5.g4144_t1:98-242



















Closest Hit for



No.
Function
Hypotheticals
Category





1
Chlorophyll a-b binding

Photosynthesis



protein of LHCII




2
Cobalamin-independent

Metabolism



methionine synthase




3
Dicer-like protein

Regulation


4
Glyceraldehyde 3-

Metabolism



phosphate





dehydrogenase




5
S-Adenosyl

Metabolism



homocysteine hydrolase




6
Chlorophyll a-b binding

Photosynthesis



protein of LHCII




7
Binding protein 2

Regulation



(HSP70-like)




8
Cell wall protein

Cell structure


9
Hypothetical protein
Hypotheticals



10





11
Predicted protein
Hypotheticals
Cell structure



(Pherophorin)




12
Gamma tubulin

Cell



interacting protein

structure/Localization


13
Cell wall protein

Cell structure



pherophorin-C13




14
Hypothetical protein
None



15
Gamma-glutamyl

Metabolism



hydrolase




16
Predicted protein
Kinetochore protein
Cell cycle




(Monoraphidium)



17
Predicted protein
None



18
Predicted protein
Aminoacyl-tRNA synthase
Localization




CAAD domain, Curvature





thylakoid



19
Predicted protein
Hypotheticals



20



















TABLE 11c





Identification of TF64-regulated genes.







Top 20 Down-Regulated Genes in C. reinhardtii TF64-9














Log2 Fold
Gene

Protein


No.
Gene ID
Change
Symbol
Accession No.
Length





1
jgi|Chlre4|516390|
−8.08

XP_001701467
415



au5.g6397_t1:9021-11277






2
jgi|Chlre4|526060|
−7.81

XP_001694632
205



au5.g15439_t1:1-1486






3
jgi|Chlre4|515007|
−6.22






au5.g5104_t1:7-111






4
jgi|Chlre4|515035|
−6.16

XP_001699067
202



au5.g5129_t1:1-2180






5
jgi|Chlre4|525250|
−5.62
CNX3
XP_001696086
158



au5.g14686_t1:1808-1933






6
jgi|Chlre4|519344|
−5.54

XP_001699873
285



au5.g9128_t1:1286-1407






7
jgi|Chlre4|519746|
−5.34

XP_001694814
849



au5.g9511_t1:3078-3200






8
jgi|Chlre4|519781|
−5.32






au5.g9545_t1:8-295






9
jgi|Chlre4|525292|
−4.89

XP_001696021
509



au5.g14727_t1:6992-7141






10
jgi|Chlre4|515252|
−4.85

XP_001699041
368



au5.g5337_t1:24-2393






11
jgi|Chlre4|524801|
−4.83

XP_001692414
358



au5.g14261_t1:85-214






12
jgi|Chlre4|509820|
−4.76

XP_001702523
249



au5.g239_t1:3263-3428






13
jgi|Chlre4|517501|
−4.70
ZYS1a
XP_001703789
183



au5.g7428_t1:0-158






14
jgi|Chlre4|518295|
−4.69

XP_001699461
454



au5.g8166_t1:3257-3355






15
jgi|Chlre4|522765|
−4.66

XP_001702143
345



au5.g12336_t1:12-248






16
jgi|Chlre4|512725|
−4.65

XP_001700531
139



au5.g2956_t1:743-841






17
jgi|Chlre4|522065|
−4.63






au5.g11678_t1:3348-3418






18
jgi|Chlre4|523269|
−4.61

XP_001696499
500



au5.g12806_t1:2071-2239






19
jgi|Chlre4|512204|
−4.54






au5.g2477_t1:657-808






20
jgi|Chlre4|512657|
−4.53






au5.g2894_t1:3901-4046



















Closest Hit for



No.
Function
Hypotheticals
Category





1
Predicted protein
Snurportin-1 (nuclear
Regulation/Localization




import) (Monoraphidium)



2
Hypothetical protein
Hypotheticals



3





4
Hypothetical protein
None



5
Molybdenum cofactor

Metabolism/Redox



synthesis-step 1 protein




6
Hypothetical protein
Antibiotic biosynthesis
Metabolism/Redox




monooxygenase





(Acidovorax), Negative





regulatory factor (HIV)



7
Predicted protein
GRIP (glutamate receptor-
Metabolism




interacting protein)





(Auxenochlorella)



8

Putative ribonuclease H
Regulation




protein



9
Hypothetical protein
Chitin binding domain-
Metabolism




containing protein





(Strongyloides)



10
Hypothetical protein
Hypotheticals



11
Predicted protein
Hypotheticals



12
Predicted protein
Hypotheticals



13
Transcription factor,

Regulation



zygote-specific




14
Hypothetical protein
AP2 family transcription
Regulation




factor (Volvox)



15
Hypothetical protein
Reverse transcriptase
Regulation




(Chlorella)



16
Predicted protein
Hypotheticals



17





18
Hypothetical protein
KDEL motif-containing
Localization




protein 1 (Chlamydotis)



19





20

Hypotheticals











TF64-7 RNA-Seq data for Yeast One-Hybrid Assayed Genes

No.  Gene ID                                     Log2 Fold Change  Gene Symbol  Protein Accession No.  Length
1a   jgi|Chlre4|516524|au5.g6524_t1:5-1994       2.10              LHCBM5       XP_001695927           289
1b   jgi|Chlre4|516524|au5.g6524_t1:5-1994       1.56              LHCBM5
2a   jgi|Chlre4|509966|au5.g377_t1:5-1831        −0.47             LCI5         XP_001690584           235
2b   jgi|Chlre4|509966|au5.g377_t1:5-1831        −0.85             LCI5
2c   jgi|Chlre4|509966|au5.g377_t1:5-1831        −1.52             LCI5
3a   jgi|Chlre4|521190|au5.g10858_t1:251-1857    −0.40             SEBP1        XP_001691997           389
4a   jgi|Chlre4|524083|au5.g13574_t1:501-1961    −0.61             Nar1.2       XP_001691213           336
5a   jgi|Chlre4|524053|au5.g13545_t1:9-2267      −1.57             LCIC         XP_001691223           443

No.  Function                                                  Closest Hit for Hypotheticals  Category
1a   Minor chlorophyll a-b binding protein of photosystem II                                  Photosynthesis
1b
2a   Low-CO2-inducible protein
2b
2c
3a   Sedoheptulose-1,7-bisphosphatase                                                         Metabolism
4a   Anion transporter                                                                        Metabolism/Redox
5a   Low-CO2 inducible protein                                                                Carbon-concentrating mechanism










TF64-9 RNA-Seq data for Yeast One-Hybrid Assayed Genes

No.  Gene ID                                     Log2 Fold Change  Gene Symbol  Protein Accession No.  Length
1a   jgi|Chlre4|516524|au5.g6524_t1:5-1994       0.92              LHCBM5       XP_001695927           289
1b   jgi|Chlre4|516524|au5.g6524_t1:5-1994       1.04              LHCBM5
2a   jgi|Chlre4|509966|au5.g377_t1:5-1831        −3.66             LCI5         XP_001690584           235
2b   jgi|Chlre4|509966|au5.g377_t1:5-1831        1.43              LCI5
2c   jgi|Chlre4|509966|au5.g377_t1:5-1831        −1.77             LCI5
2d   jgi|Chlre4|509966|au5.g377_t1:5-1831        −1.82             LCI5
3a   jgi|Chlre4|521190|au5.g10858_t1:251-1857    −0.62             SEBP1        XP_001691997           389
5a   jgi|Chlre4|524053|au5.g13545_t1:9-2267      −2.33             LCIC         XP_001691223           443

No.  Function                                                  Closest Hit for Hypotheticals  Category
1a   Minor chlorophyll a-b binding protein of photosystem II                                  Photosynthesis
1b
2a   Low-CO2-inducible protein
2b
2c
2d
3a   Sedoheptulose-1,7-bisphosphatase                                                         Metabolism
5a   Low-CO2 inducible protein                                                                Carbon-concentrating mechanism
















TABLE 12

Top 20 up-regulated genes in C. reinhardtii cc1010::TF64-7.

No.  Gene ID                                     Log2 Fold Change  Gene Symbol  Function                                     Category
1    jgi|Chlre4|513883|au5.g4042_t1:0-146        7.58              LHCBM7       Chlorophyll a-b binding protein of LHCII     Photosynthesis
2    jgi|Chlre4|523567|au5.g13085_t1:285-1460    7.09              LHCBM8       Chlorophyll a-b binding protein of LHCII     Photosynthesis
3    jgi|Chlre4|523561|au5.g13079_t1:149-1280    6.81              LHCBM4       Chlorophyll a-b binding protein of LHCII     Photosynthesis
4    jgi|Chlre4|518507|au5.g8360_t1:204-4111     6.80              FAP211       Flagellar associated protein                 Motility
5    jgi|Chlre4|521087|au5.g10761_t1:39-2944     6.66              METE         Cobalamin-independent methionine synthase    Metabolism
6    jgi|Chlre4|521595|au5.g11226_t1:266-2760    6.09              SAH1         S-Adenosyl homocysteine hydrolase            Metabolism
7    jgi|Chlre4|515402|au5.g5474_t1:537-2854     5.79              PHC13        Cell wall protein pherophorin-C13            Cell structure
8    jgi|Chlre4|520083|au5.g9823_t1:3664-4112    5.77              GCP3         Gamma tubulin interacting protein            Cell structure
9    jgi|Chlre4|520120|au5.g9859_t1:2-129        5.39              LHCBM1       Chlorophyll a-b binding protein of LHCII     Photosynthesis
10   jgi|Chlre4|524285|au5.g13771_t1:3002-3795   5.34              MCM4         Minichromosome maintenance protein 4         Cell cycle
11   jgi|Chlre4|512084|au5.g2359_t1:10431-10587  5.15              DCL2         Dicer-like protein                           Regulation
12   jgi|Chlre4|512529|au5.g2782_t1:35-1932      5.11              GAP1         Glyceraldehyde 3-phosphate dehydrogenase     Metabolism
13   jgi|Chlre4|518966|au5.g8779_t1:1773-1883    5.09              SYP72        Qc-SNARE protein, SYP7-family                Localization
14   jgi|Chlre4|519390|au5.g9173_t1:283-2258     5.07              FTSZ1        Plastid division protein                     Cell cycle
15   jgi|Chlre4|515943|au5.g5981_t1:176-2507     4.99              FTSZ2        Plastid division protein                     Cell cycle
16   jgi|Chlre4|520083|au5.g9823_t1:4197-4255    4.79              GCP3         Gamma tubulin interacting protein            Cell structure/Localization
17   jgi|Chlre4|523024|au5.g12580_t1:45-2087     4.75              EFG8         Mitochondrial translation factor Tu          Translation
18   jgi|Chlre4|513496|au5.g3676_t1:1531-1934    4.73              GLN3         Glutamine synthetase                         Metabolism
19   jgi|Chlre4|513333|au5.g3525_t1:167-1848     4.69              MIND1        Chloroplast septum site-determining protein  Cell cycle
20   jgi|Chlre4|520302|au5.g10033_t1:278-1558    4.66              TEF13        Aminoacyl-tRNA synthase CAAD domain          Localization









Bioinformatic analysis of promoters of genes regulated by TF64. We chose three sets of promoters (TF64-activated, TF64-inhibited, and TF64-non-regulated) from the low-constitutive TF64-7 RNA-Seq dataset to analyze for common motifs. Promoters comprised the 1,000 bp 5′ of the ATG translation start site of the 30 top activated, inhibited, and non-regulated (log2 fold change = 0) genes. Most genes did not have annotated 5′ UTRs. Promoters from each regulatory category were analyzed with MEME to identify common motifs; however, no statistically significant sequences were found for any group. Additionally, we used the program AME (Analysis of Motif Enrichment) [34] to determine whether the canonical bHLH binding site, CANNTG, was present with statistical significance; it was not for any of the three promoter categories.
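As a rough illustration of what locating the CANNTG site in a promoter involves, the sketch below scans a sequence with a regular expression. It is not the MEME or AME procedure used in the study; the function name and the toy sequence are hypothetical. Because CANNTG is its own reverse complement as a pattern, scanning one strand reports sites on both strands.

```python
import re

# CANNTG: canonical bHLH binding site, with N standing for any base.
# The lookahead lets the scan report overlapping sites as well.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_ebox_sites(promoter):
    """Return (offset, site) pairs; offset is 0-based from the 5' end."""
    seq = promoter.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]


# Hypothetical toy sequence, not taken from any C. reinhardtii promoter.
toy_promoter = "ttCAGCTGaaCATATGccCACGTGtt"
sites = find_ebox_sites(toy_promoter)
for offset, site in sites:
    print(offset, site)
```

Counting or positioning sites per promoter group, as described in the text, would then amount to running this scan over each 1,000 bp sequence and comparing the resulting lists.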


We further analyzed the promoter groups using the alignment software Jalview [35]. Promoters were aligned without gaps and all CANNTG sequences were identified for each group. Analysis of CANNTG composition as well as relative location within the promoter did not reveal significant differences among the three promoter groups analyzed. These data suggest that the CANNTG sequence is ubiquitous throughout the C. reinhardtii genome. While this motif may play a role in TF64-DNA binding, it is not solely responsible for the gene regulation observed in the TF64-constitutive expression strains. It is likely that other co-factors and/or regulatory elements are important for transcription of the genes we identified to be regulated by TF64, further underscoring the complex nature of nuclear gene regulation in eukaryotic microalgae.
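One way to see why a short degenerate site such as CANNTG would be expected in nearly every promoter is a simple chance estimate. The sketch below assumes independent bases with a fixed GC fraction, which is a simplification and not a model used in the study; the function name is hypothetical, and the GC fraction of 0.64 is an assumed illustrative value for a GC-rich genome rather than a figure from this document.

```python
def expected_ebox_count(length=1000, gc=0.5):
    """Expected number of CANNTG sites in a random sequence of given length.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2. The two N positions match any base.
    """
    p_c = p_g = gc / 2.0
    p_a = p_t = (1.0 - gc) / 2.0
    p_site = p_c * p_a * p_t * p_g      # C, A, (N, N), T, G
    n_positions = length - 6 + 1        # possible start positions for a 6-mer
    return n_positions * p_site


uniform = expected_ebox_count(1000, gc=0.5)    # equal base frequencies
gc_rich = expected_ebox_count(1000, gc=0.64)   # assumed GC-rich composition
print(round(uniform, 2), round(gc_rich, 2))
```

Under either composition the estimate is several sites per 1,000 bp promoter by chance alone, which is consistent with the motif failing to discriminate between the three promoter groups.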


TF64 activates transcription of light-harvesting complex II components. To validate our RNA-Seq analysis, we performed reverse transcriptase quantitative PCR (RT-qPCR) on selected genes. Strains cc1010::TF64-7 and cc1010::GFP were cultured in TAP medium under constant light for three days until mid-log phase growth was reached. RNA was isolated from cells and cDNA was synthesized for RT-qPCR analysis. Among the top activated genes from the TF64-7 RNA-Seq dataset were LHCBM7, LHCBM8, LHCBM4, and LHCBM1 (Table 10), which encode components of light-harvesting complex II (LHCII) of photosystem II (PSII) [44]. RT-qPCR confirmed that transcripts from these genes were approximately 16 times (LHCBM7), four times (LHCBM8 and LHCBM4), and eight times (LHCBM1) more abundant in the TF64-producing strain than in the GFP-producing strain (FIG. 15, panel A). Furthermore, the genes LHCBM5, LHCBM2, LHCBM3, LHCBM6, and LHCBM9, also of PSII [44], were analyzed and found to be activated in the TF64-producing strain (FIG. 15, panel A). Interestingly, the promoter of LHCBM5 was assayed in our Y1H screen but was not detected to activate transcription with TF64 in yeast. FIG. 15, panel A shows transcript abundance data for each of these genes by RNA-Seq and RT-qPCR. These data indicate that TF64 plays a role in activating PSII components and possibly in the regulation of photosynthesis. The nine PSII promoters were analyzed in the same way as the promoter sets discussed above. Again, MEME did not identify any new motifs, AME did not find CANNTG to be present with statistical significance (data not shown), and CANNTG composition and location did not differ from those of any promoter group selected from the RNA-Seq data.
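Fold differences of this kind are commonly derived from RT-qPCR threshold cycles with the 2^(-ΔΔCt) method described in reference 31 of the list below. The following is a minimal sketch of that calculation, assuming that method applies here; the passage does not give Ct values or name a reference gene, so every number and variable name in the example is hypothetical.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative transcript abundance (test vs. control) by 2^(-ddCt).

    Each Ct is a threshold cycle; the reference gene normalizes for
    input cDNA. Assumes roughly 100% amplification efficiency.
    """
    delta_test = ct_target_test - ct_ref_test   # dCt in the test strain
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # dCt in the control strain
    return 2.0 ** (-(delta_test - delta_ctrl))


# Hypothetical Ct values: the target crosses threshold 4 cycles earlier in
# the TF64-producing strain than in the GFP control, reference gene unchanged.
fold = fold_change_ddct(
    ct_target_test=20.0, ct_ref_test=18.0,
    ct_target_ctrl=24.0, ct_ref_ctrl=18.0,
)
print(fold)
```

A ΔΔCt of -4 corresponds to a 16-fold increase, -3 to eight-fold, and -2 to four-fold, which is how cycle differences map onto the fold values quoted above.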


Transcription analysis of Y1H-assayed genes. We also investigated transcription of the genes whose promoters were found to activate transcription with TF64 by Y1H (i.e., LCI5, SEBP1, LCIC, and Nar1.2). RNA-Seq data indicated that each of these genes was down-regulated in C. reinhardtii cells constitutively expressing the gene encoding TF64 (FIG. 15, panel B, Table 11). By RT-qPCR, we confirmed that transcription of LCI5, SEBP1, and LCIC was in fact inhibited by constitutive expression of the gene encoding TF64. Nar1.2, however, was activated in our RT-qPCR analysis (FIG. 15, panel B). Overall, these data support our RNA-Seq analysis.
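The RNA-Seq tables report log2 fold changes, whereas the RT-qPCR results are quoted as linear fold differences, so comparing the two requires a conversion. The helper below is a small sketch of that conversion (the function name is hypothetical); the two input values are the LHCBM7 entry of Table 12 and the first row of the TF64-9 down-regulated table.

```python
def log2fc_to_fold(log2_fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_fc


lhcbm7_fold = log2fc_to_fold(7.58)      # top up-regulated gene, Table 12
top_down_fold = log2fc_to_fold(-8.08)   # top down-regulated gene, TF64-9

print(f"log2 7.58  -> about {lhcbm7_fold:.0f}-fold increase")
print(f"log2 -8.08 -> about {1 / top_down_fold:.0f}-fold decrease")
```

A log2 fold change of 0 corresponds to no change, and each unit doubles or halves the transcript level.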


Collectively, these results highlight the nature of high-throughput screens, like the Y1H, and high-throughput sequencing data, as generated here by RNA-sequencing: they produce large amounts of data that can serve as an excellent starting point for narrowing down potential molecular interactions of interest. Here, we successfully used these two screens to identify potential TF-promoter binding partners in C. reinhardtii.


Conclusions.

In this study, we successfully constructed a recombinant transcription factor library that includes 92 (nearly one third of the putative) transcription factors (TFs) encoded by the nuclear genome of C. reinhardtii. To date, very few TFs have actually been characterized from this species of microalgae [20]. We analyzed the 92 TFs' ability to activate transcription via a yeast one-hybrid screen, studied the TFs' abilities to be constitutively expressed in their native organism C. reinhardtii, and finally assessed transcription profiles by RNA-Seq from two independent strains constitutively expressing one specific TF (TF64). These high-throughput studies were designed to narrow down the vast number of hypothetical transcription factor-promoter binding pairs in C. reinhardtii (~350 TFs × ~15,000 nuclear genes = 5,250,000 potential interactions). Our results establish a clear direction for investigation of direct binding partners that could be used in an engineered synthetic nuclear transcription system in green algae.


Using a yeast one-hybrid assay [37], we were able to analyze 4,508 potential binding interactions between TFs and promoter fragments. Sixty-five of these were found to be positive hits, corresponding to 28 TFs with potential DNA binding activity. We assayed five promoters (LCIC, LCI5, SEBP1, Nar1.2, and LHCBM5) in different combinations from C. reinhardtii, V. carteri, C. vulgaris, A. thaliana, and Z. mays. The ability of a number of the putative TFs analyzed to activate transcription from unique DNA sequences supports the bioinformatic data [24] suggesting that these proteins are in fact functional transcription factors, capable of regulating transcription in C. reinhardtii.
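The scale of the search problem and of the screen can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the conclusions above (approximately 350 TFs, approximately 15,000 nuclear genes, 4,508 assayed interactions, 65 positive hits); the variable names are illustrative.

```python
n_tfs = 350          # approximate number of putative TFs
n_genes = 15_000     # approximate number of nuclear genes
assayed = 4_508      # TF-promoter fragment combinations tested by Y1H
hits = 65            # positive Y1H interactions

search_space = n_tfs * n_genes           # hypothetical TF-promoter pairs
hit_rate = hits / assayed                # fraction of assays that were positive
relative_size = assayed / search_space   # screen size relative to the space

print(f"search space:  {search_space:,}")
print(f"hit rate:      {hit_rate:.2%}")
print(f"relative size: {relative_size:.3%}")
```

The relative size is only a scale comparison: the screen tested promoter fragments, including fragments from other species, so the assayed combinations are not a strict subset of the 5,250,000 hypothetical pairs.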


Compiling the yeast one-hybrid data, we sought to identify common motifs among promoter fragments found to activate transcription in combination with an individual TF. The promoters, however, proved to be more cryptic than anticipated. We studied TF64-associated promoters, 13 sequences in total, and were unable to identify commonalities by bioinformatics. It may be that a larger number of promoters need to be analyzed before such a characterization is possible. In the future, it would be interesting to compare DNA sequences from a larger dataset of C. reinhardtii promoters and also determine if identified motifs were conserved in the promoters of other closely or distantly related species.


Our TF library was cloned into a C. reinhardtii constitutive expression vector for production in C. reinhardtii. To our knowledge, this was the first attempt to constitutively produce a recombinant library of native TFs in C. reinhardtii. Of the 92 TF-encoding vectors that were transformed, only eight resulted in successful production of protein under the conditions attempted. As almost all of the TFs produced protein in S. cerevisiae, the algal expression data suggest that the failure of most TFs to produce protein in C. reinhardtii is possibly due to adverse effects of constitutively expressing their genes. It is possible that these TFs could be produced under more tightly controlled experimental conditions, or when placed under inducible or conditional expression systems.


TF64 was our most successful TF in that it was able to be produced in multiple strains of C. reinhardtii and it was the most active TF in the yeast one-hybrid assay. From RNA-sequencing data on strains constitutively producing TF64, compared to a GFP-constitutive strain, we were able to determine that TF64 likely plays a role in regulating transcription of genes involved in multiple cellular and developmental processes in wild type C. reinhardtii. Constitutive production of TF64 led to an increase in transcript levels of genes functioning in photosynthesis and the cell cycle, as well as many others. Follow-up studies on the biological role of TF64 should prove to be interesting from a basic science perspective, leading to greater insights into the C. reinhardtii lifecycle.


Our goal with this study was to identify potential cognate transcription factor-promoter pairs from C. reinhardtii that, once validated, could be used in a synthetic nuclear transcription system. From our yeast one-hybrid data, we identified 28 TFs with possible DNA binding activity. Further studies are required to confirm these interactions in vivo in C. reinhardtii. Specifically focusing on TF64, we were able to verify the activation of transcription of nine genes, LHCBM1-9, by both RNA-Seq and RT-qPCR. It is yet to be determined if this gene activation is in fact due to a direct TF-promoter binding interaction.


These data lay the groundwork for the construction of a synthetic transcription system. This line of work provides the scientific community the necessary tools for sophisticated and robust genetic engineering in microalgae.


References for Example 2.

1. Blunt J W, Copp B R, Keyzers R A, Munro M H, Prinsep M R Marine natural products. Nat Prod Rep 29: 144-222.


2. Dufresne A, Ostrowski M, Scanlan D J, Garczarek L, Mazard S, et al. (2008) Unraveling the genomic mosaic of a ubiquitous genus of marine cyanobacteria. Genome Biol 9: R90.


3. Parker M S, Mock T, Armbrust E V (2008) Genomic insights into marine microalgae. Annu Rev Genet 42: 619-645.


4. Gimpel J A, Specht E A, Georgianna D R, Mayfield S P Advances in microalgae engineering and synthetic biology applications for biofuel production. Curr Opin Chem Biol 17: 489-495.


5. Cardozo K H, Guaratini T, Barros M P, Falcao V R, Tonon A P, et al. (2007) Metabolites from algae with economical impact. Comp Biochem Physiol C Toxicol Pharmacol 146: 60-78.


6. Rosales-Mendoza S, Paz-Maldonado L M, Soria-Guerra R E Chlamydomonas reinhardtii as a viable platform for the production of recombinant proteins: current status and perspectives. Plant Cell Rep 31: 479-494.


7. Specht E, Miyake-Stoner S, Mayfield S Micro-algae come of age as a platform for recombinant protein production. Biotechnol Lett 32: 1373-1383.


8. Jones C S, Mayfield S P Algae biofuels: versatility for the future of bioenergy. Curr Opin Biotechnol 23: 346-351.


9. Stephens E, Ross I L, King Z, Mussgnug J H, Kruse O, et al. An economic and technical evaluation of microalgal biofuels. Nat Biotechnol 28: 126-128.


10. Georgianna D R, Mayfield S P Exploiting diversity and synthetic biology for the production of algal biofuels. Nature 488: 329-335.


11. Merchant S S, Prochnik S E, Vallon O, Harris E H, Karpowicz S J, et al. (2007) The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318: 245-250.


12. Tran M, Van C, Barrera D J, Pettersson P L, Peinado C D, et al. Production of unique immunotoxin cancer therapeutics in algal chloroplasts. Proc Natl Acad Sci USA 110: E15-22.


13. Gregory J A, Li F, Tomosada L M, Cox C J, Topol A B, et al. Algae-produced Pfs25 elicits antibodies that inhibit malaria transmission. PLoS One 7: e37179.


14. Gimpel J A, Hyun J S, Schoepp N G, Mayfield S P Production of recombinant proteins in microalgae at pilot greenhouse scale. Biotechnol Bioeng 112: 339-345.


15. Lingg N, Zhang P, Song Z, Bardor M The sweet tooth of biopharmaceuticals: importance of recombinant protein glycosylation analysis. Biotechnol J 7: 1462-1472.


16. Corchero J L, Gasser B, Resina D, Smith W, Parrilli E, et al. Unconventional microbial systems for the cost-efficient production of high-quality protein therapeutics. Biotechnol Adv 31: 140-153.


17. Rasala B A, Chao S S, Pier M, Barrera D J, Mayfield S P Enhanced genetic tools for engineering multigene traits into green algae. PLoS One 9: e94028.


18. Neupert J, Karcher D, Bock R (2009) Generation of Chlamydomonas strains that efficiently express nuclear transgenes. Plant J 57: 1140-1150.


19. Rasala B A, Lee P A, Shen Z, Briggs S P, Mendez M, et al. Robust expression and secretion of Xylanase1 in Chlamydomonas reinhardtii by fusion to a selection gene and processing with the FMDV 2A peptide. PLoS One 7: e43349.


20. Riano-Pachon D M, Correa L G, Trejos-Espinosa R, Mueller-Roeber B (2008) Green transcription factors: a Chlamydomonas overview. Genetics 179: 31-39.


21. Yoshioka S, Taniguchi F, Miura K, Inoue T, Yamano T, et al. (2004) The novel Myb transcription factor LCR1 regulates the CO2-responsive gene Cah1, encoding a periplasmic carbonic anhydrase in Chlamydomonas reinhardtii. Plant Cell 16: 1466-1477.


22. Ibanez-Salazar A, Rosales-Mendoza S, Rocha-Uribe A, Ramirez-Alonso J I, Lara-Hernandez I, et al. Over-expression of Dof-type transcription factor increases lipid production in Chlamydomonas reinhardtii. J Biotechnol 184: 27-38.


23. Tsai C H, Warakanont J, Takeuchi T, Sears B B, Moellering E R, et al. The protein Compromised Hydrolysis of Triacylglycerols 7 (CHT7) acts as a repressor of cellular quiescence in Chlamydomonas. Proc Natl Acad Sci USA 111: 15833-15838.


24. Riano-Pachon D M, Ruzicic S, Dreyer I, Mueller-Roeber B (2007) PlnTFDB: an integrative plant transcription factor database. BMC Bioinformatics 8: 42.


25. Gorman D S, Levine R P (1965) Cytochrome f and plastocyanin: their sequence in the photosynthetic electron transport chain of Chlamydomonas reinhardi. Proc Natl Acad Sci USA 54: 1665-1669.


26. Perez-Rodriguez P, Riano-Pachon D M, Correa L G, Rensing S A, Kersten B, et al. PlnTFDB: updated content and new features of the plant transcription factor database. Nucleic Acids Res 38: D822-827.


27. Korbie D J, Mattick J S (2008) Touchdown PCR for increased specificity and sensitivity in PCR amplification. Nat Protoc 3: 1452-1456.


28. Goecks J, Nekrutenko A, Taylor J Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11: R86.


29. Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, et al. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol Chapter 19: Unit 19 10 11-21.


30. Giardine B, Riemer C, Hardison R C, Burhans R, Elnitski L, et al. (2005) Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15: 1451-1455.


31. Livak K J, Schmittgen T D (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25: 402-408.


32. Bailey T L, Boden M, Buske F A, Frith M, Grant C E, et al. (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37: W202-208.


33. Bailey T L, Elkan C (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2: 28-36.


34. McLeay R C, Bailey T L Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data. BMC Bioinformatics 11: 165.


35. Waterhouse A M, Procter J B, Martin D M, Clamp M, Barton G J (2009) Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25: 1189-1191.


36. Reece-Hoyes J S, Marian Walhout A J Yeast one-hybrid assays: a historical and technical perspective. Methods 57: 441-447.


37. Gaudinier A, Zhang L, Reece-Hoyes J S, Taylor-Teeples M, Pu L, et al. Enhanced Y1H assays for Arabidopsis. Nat Methods 8: 1053-1055.


38. Wilson T E, Fahrner T J, Johnston M, Milbrandt J (1991) Identification of the DNA binding site for NGFI-B by genetic selection in yeast. Science 252: 1296-1300.


39. Verhaegent M, Christopoulos T K (2002) Recombinant Gaussia luciferase. Overexpression, purification, and analytical application of a bioluminescent reporter for DNA hybridization. Anal Chem 74: 4378-4385.


40. Yamano T, Tsujikawa T, Hatano K, Ozawa S, Takahashi Y, et al. Light and low-CO2-dependent LCIB-LCIC complex localization in the chloroplast supports the carbon-concentrating mechanism in Chlamydomonas reinhardtii. Plant Cell Physiol 51: 1453-1468.


41. Turkina M V, Blanco-Rivero A, Vainonen J P, Vener A V, Villarejo A (2006) CO2 limitation induces specific redox-dependent protein phosphorylation in Chlamydomonas reinhardtii. Proteomics 6: 2693-2704.


42. Mariscal V, Moulin P, Orsel M, Miller A J, Fernandez E, et al. (2006) Differential regulation of the Chlamydomonas Nar1 gene family by carbon and nitrogen. Protist 157: 421-433.


43. Hahn D, Kaltenbach C, Kuck U (1998) The Calvin cycle enzyme sedoheptulose-1,7-bisphosphatase is encoded by a light-regulated gene in Chlamydomonas reinhardtii. Plant Mol Biol 36: 929-934.


44. Stauber E J, Fink A, Markert C, Kruse O, Johanningmeier U, et al. (2003) Proteomics of Chlamydomonas reinhardtii light-harvesting proteins. Eukaryot Cell 2: 978-994.


45. Fang W, Si Y, Douglass S, Casero D, Merchant S S, et al. Transcriptome-wide changes in Chlamydomonas reinhardtii gene expression regulated by carbon dioxide and the CO2-concentrating mechanism regulator CIA5/CCM1. Plant Cell 24: 1876-1893.


46. Pireyre M, Burow M Regulation of MYB and bHLH transcription factors: a glance at the protein level. Mol Plant 8: 378-388.


47. Robinson K A, Lopes J M (2000) SURVEY AND SUMMARY: Saccharomyces cerevisiae basic helix-loop-helix proteins regulate diverse biological processes. Nucleic Acids Res 28: 1499-1505.


48. Feller A, Machemer K, Braun E L, Grotewold E Evolutionary and comparative analysis of MYB and bHLH plant transcription factors. Plant J 66: 94-116.


49. Kewley R J, Whitelaw M L, Chapman-Smith A (2004) The mammalian basic helix-loop-helix/PAS family of transcriptional regulators. Int J Biochem Cell Biol 36: 189-204.


50. Lang E J, Cross P J, Mittelstadt G, Jameson G B, Parker E J Allosteric ACTion: the varied ACT domains regulating enzymes of amino-acid metabolism. Curr Opin Struct Biol 29: 102-111.


51. Zhao H, Li X, Ma L Basic helix-loop-helix transcription factors and epidermal cell fate determination in Arabidopsis. Plant Signal Behav 7: 1556-1560.


52. Castilhos G, Lazzarotto F, Spagnolo-Fonini L, Bodanese-Zanettini M H, Margis-Pinheiro M Possible roles of basic helix-loop-helix transcription factors in adaptation to drought. Plant Sci 223: 1-7.


53. Curtis D J, Salmon J M, Pimanda J E Concise review: Blood relatives: formation and regulation of hematopoietic stem cells by the basic helix-loop-helix transcription factors stem cell leukemia and lymphoblastic leukemia-derived sequence 1. Stem Cells 30: 1053-1058.


54. Fritzsch B, Eberl D F, Beisel K W The role of bHLH genes in ear development and evolution: revisiting a 10-year-old hypothesis. Cell Mol Life Sci 67: 3089-3099.


55. Powell L M, Jarman A P (2008) Context dependence of proneural bHLH proteins. Curr Opin Genet Dev 18: 411-417.


Example 3
Identifying Conditional Regulatory Elements in C. reinhardtii Nuclear Genome

For photosynthetic organisms, light and dark cycles act as major drivers of variation in metabolism and gene expression patterns. During the day, green algae can use photosynthesis to drive the production of sugars that are then used for energy in a myriad of metabolic processes, including the production of starches and sugars. During the night, the cells must use stored energy in the form of sugars, starches, or lipids to continue metabolic activity. Switching from phototrophic to heterotrophic metabolism requires large sets of genes to be switched on or off. In Chlamydomonas, ~80% of the genome displays detectable periodic gene expression changes throughout a 24-hour day/night cycle (Zones et al., 2015). We therefore predicted that unique regulatory motifs may be used to regulate these light-induced or dark-induced genes in response to light intensity. If identified, these motifs can then be used to drive transgene expression specifically in response to light or dark conditions. Since light is one of the easiest variables to control in commercial-scale cultivation of algae, the design and production of light/dark-responsive synthetic promoters would be highly useful for inducing or silencing transgene expression.


Using high resolution RNA-seq data taken from Chlamydmonas reinhardtii on a 12 hour light-12 hour dark cycle (Zones et al., 2015, supra) we identified genes that were differentially expressed by at least two fold between the middle of the light-period (day) and the middle of the dark-period (night) while displaying moderate to high expression levels overall during their upregulated time period. Specifically, we averaged the Reads Per Kilobase of transcript per Million mapped reads (RPKM) for each transcript during the middle 4 hours of the 12-hour light period and the middle 4 hours of the 12-hour dark period. Genes with at least a 2-fold increase in averaged read count during the light phase compared to the dark phase and an average RPKM of more than 100 were determined to be light-upregulated strong expressers. Similarly genes with at least a 2-fold increase in average read count during the dark phase compared to the light and an average RPKM of more than 100 were determined to be dark-upregulated strong expressers. Collectively this represented 255 light-upregulated genes and 248 dark-upregulated genes. The 1000 bp region 5′ from the transcriptional start site of these genes was retrieved (Phytozome 12, Chlamydomonas reinhardtii genome v5.5) and analyzed using the POWRS motif identification program (Davis et al., 2012). All default settings on POWRS were used and −1000 bp regions from all 17737 annotated genes in the whole genome used as the background control data set. POWRS identified 31 and 32 enriched motif clusters in the light-upregulated and dark-upregulated promoter datasets, respectively compared to promoters in the rest of the genome. Motifs enriched in the light-upregulated or dark-upregulated data sets were compared each other using the Tomtom motif comparison tool (Gupta, et al., (2007) Genome Biol. 8(2):R24). FIGS. 16A and 16B identify motifs unique to either the light up-regulated (FIG. 16A) or dark-upregulated (FIG. 16B) data sets. 
Many of the light/dark-regulated motifs are different from the motifs identified by simply looking at the most highly expressed genes during logarithmic growth in the previous example. Taken together, this shows that comparing promoters from genes that are up- or down-regulated in unique abiotic contexts can be used to identify unique motifs that may regulate those genes in a specific context, for selective expression or repression of a transgene construct. These motifs can then be assembled into synthetic algal promoters, as was shown in the first example.


References for Example 3

Crooks G. E., Hon G., Chandonia J. M., Brenner S. E. WebLogo: A sequence logo generator, Genome Research. 2004. 14:1188-1190.


Zones J. M., Blaby I. K., Merchant S. S., Umen J. G. High-Resolution Profiling of a Synchronized Diurnal Transcriptome from Chlamydomonas reinhardtii Reveals Continuous Cell and Metabolic Differentiation. Plant Cell. 2015. 27(10):2743-69.


Davis I. W., Benninger C., Benfey P. N., Elich T. POWRS: position-sensitive motif discovery. PLoS One. 2012. 7(7):e40373.


Gupta S., Stamatoyannopoulos J. A., Bailey T. L., Noble W. S. Quantifying similarity between motifs. Genome Biol. 2007. 8(2):R24.


Example 4
Other Systems For Regulatory Elements

Statistical analyses such as those presented above serve as an unbiased method for identifying conserved nucleotide motifs that correlate with increased transcription levels. This strategy removes the need to understand the mechanism of action of the associated sequence. For an organism like Chlamydomonas reinhardtii, this approach is favorable because of large gaps in the understanding of regulatory elements in the species. However, a wealth of knowledge is available across the kingdom Plantae, which serves as a guide to understanding the complex transcriptional regulation found in C. reinhardtii. One of the best-understood aspects of the regulatory system is that, by encouraging an activating transcription factor to bind in a regulatory region associated with a transgene, one can increase transcript abundance and subsequent protein accumulation. Systems have been derived in S. cerevisiae and E. coli that take advantage of known DNA-binding proteins to engineer complex circuits of protein expression for a wide variety of purposes (Wang et al. 2011, Ellis et al. 2009, Kotula et al. 2014).


Transcription factor families are easily identifiable in silico, and homology analysis against better-understood systems can provide a groundwork for understanding transcriptional regulation in C. reinhardtii. The Plant Transcription Factor Database (PTFDB) (//planttfdb.cbi.pku.edu.cn/) has identified each family of transcription factor found in C. reinhardtii based on sequence homology to other plants. The PTFDB has also compiled data from across the literature to provide putative binding sites for those families of transcription factors. Transcription factor (TF) binding sites have been studied across plants through one of the following processes: ampDAP, ChIP/ChIP-seq, DAP, PBM, or SELEX. TF binding sites found in the literature that are associated with a given TF family are projected to other species to help characterize binding in a previously uncharacterized system. The sequence motifs attributed to TF families found in C. reinhardtii are provided as position-weight matrices in FIGS. 17A-C. These serve as a promising set of sequences for synthetic promoter engineering. By integrating these sequences into a novel synthetic promoter, we can project the regulation of the transgene onto one or many specific transcription factors. We know that certain transcription factors have variable function based on external stimuli (Riaño-Pachón et al. 2008), and as such these sequences are clear candidates for inducible promoter engineering.
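To illustrate how a position-weight matrix of the kind referenced above can be used to locate candidate binding sites in a promoter sequence, the following Python sketch scores every window of a sequence with a log-odds matrix. It is a generic sketch under stated assumptions, not a method described in this application: the matrix toy_pwm is invented and does not come from FIGS. 17A-C, the uniform background of 0.25 per base and the pseudocount are arbitrary choices, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def log_odds(pwm, background=0.25, pseudo=0.01):
    """Convert per-position base probabilities into log2-odds scores."""
    return [
        {b: math.log2((col[b] + pseudo) / (1 + 4 * pseudo) / background)
         for b in BASES}
        for col in pwm
    ]


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for each window scoring >= threshold."""
    scores = log_odds(pwm)
    width = len(scores)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(scores[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


# Invented 4-column matrix that strongly prefers the word GATA.
toy_pwm = [
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
]

for start, window, score in scan("TTGATATT", toy_pwm, threshold=4.0):
    print(start, window, score)
```

In practice the threshold, the background base composition, and scanning of the reverse complement would need to be chosen for the genome and matrix in question.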


In an effort to better characterize the in vivo TF/sequence cognate pairs for C. reinhardtii, 90 predicted transcription factors were cloned from C. reinhardtii cDNA into a constitutive nuclear expression construct (Anderson et al. 2017). Upon characterization of their binding in a Y1H assay, a bHLH-family transcription factor (Cre02.g109700.t1.2, hereafter referred to as TF64) was selected for further analysis. Three strains were designed to determine whether constitutive expression of a transgenic transcription factor can increase recombinant protein abundance in C. reinhardtii. We generated a strain that expressed high levels of TF64, one that expressed low levels of TF64, and a control strain that used the same construct to express GFP, a non-DNA-binding protein. These three strains, in addition to an untransformed wild-type strain, were transformed with an expression cassette that drives OFP expression, which is easily detected by a fluorescent plate reader. The promoter associated with the OFP gene must contain one or more binding sites associated with the bHLH transcription factor family (CANNTG). Conveniently, the AR1 promoter, which is well established in the field, has three putative bHLH binding sites, identified in FIG. 18. The AR1 promoter was used to drive the expression of OFP in the TF64 expression strains, as shown in FIG. 19. These data indicate that the presence of putative TF-binding site motifs in an expression construct, when combined with their associated transcription factors, can help drive recombinant protein accumulation. The generation of more in vivo cognate TF/site pairs based on the putative TF binding sites shown in FIGS. 17A-C will facilitate the development of more advanced promoters with the added functionality of orthogonal regulation.
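The bHLH consensus CANNTG mentioned above is simple enough to locate with a regular expression. The Python sketch below shows one way to list putative sites in a promoter sequence. The sequence toy_promoter is invented for illustration and is not the AR1 promoter, and the function name find_ebox_sites is hypothetical. Because the reverse complement of CANNTG is again CANNTG, scanning a single strand is sufficient for this particular consensus.

```python
import re

# A lookahead is used so that overlapping sites are all reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_ebox_sites(promoter):
    """Return (0-based start, matched site) for every CANNTG occurrence."""
    seq = promoter.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]


# Invented sequence containing two putative sites.
toy_promoter = "ttCACGTGaaCAGCTGtt"
print(find_ebox_sites(toy_promoter))
# prints [(2, 'CACGTG'), (10, 'CAGCTG')]
```

A match to the consensus only marks a putative site. Whether a given transcription factor binds it in vivo has to be established experimentally, as with the Y1H assay and expression strains described above.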


References for Example 4

Wang B., Kitney R. I., Joly N., Buck M. Engineering modular and orthogonal genetic logic gates for robust digital-like synthetic biology. Nat Commun. 2011 Oct 18;2:508.


Ellis T., Wang X., Collins J. J. Diversity-based, model-guided construction of synthetic gene networks with predicted functions. Nat Biotechnol. 2009 May;27(5):465-71.


Kotula J. W., Kerns S. J., Shaket L. A., Siraj L., Collins J. J., Way J. C., Silver P. A. Programmable bacteria detect and record an environmental signal in the mammalian gut. Proc. Natl. Acad. Sci. U.S.A. 2014 Apr 1;111(13):4838-4843.


Anderson M. S., Muff T. J., Georgianna D. R., Mayfield S. P. Towards a synthetic nuclear transcription system in green algae: Characterization of Chlamydomonas reinhardtii nuclear transcription factors and identification of targeted promoters. Algal Research. 2017. 22:47-55.


Riaño-Pachón D. M., Corrêa L. G. G., Trejos-Espinosa R., Mueller-Roeber B. Green Transcription Factors: A Chlamydomonas Overview. Genetics. 2008. 179(1):31-39.


It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.

Claims
  • 1. A synthetic promoter capable of promoting and/or initiating transcription of a polynucleotide in an algal cell, the synthetic promoter comprising from 3 to 30 promoter (cis)-elements selected from the group consisting of the sequences in Tables 1 and 2, FIGS. 16A and 16B.
  • 2-7. (canceled)
  • 8. The synthetic promoter of claim 1, wherein the synthetic promoter comprises one or more transcriptional factor binding site motifs selected from the group consisting of the sequences in FIGS. 17A, 17B, and 17C.
  • 9. The synthetic promoter of claim 1, wherein: the promoter (cis)-elements are positioned or located within the promoter relative to the transcriptional start site (TSS) as indicated in Table 1; and/or the promoter comprises a nucleic acid sequence of any one of the sequences in Table 4 (SEQ ID NOs: 38-62).
  • 10. (canceled)
  • 11. The synthetic promoter of claim 1, wherein: the promoter is responsive to light exposure and comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16A, or the promoter is responsive to dark exposure and comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16B.
  • 12. (canceled)
  • 13. The synthetic promoter of claim 1, wherein: the promoter is at least about 200 bp in length and up to about 1000 bp in length; and/or the synthetic promoter promotes transcription levels that are at least 2-fold greater than a control promoter; and/or the promoter (cis)-elements are positioned within a promoter scaffold or backbone; and/or the nucleic acid base of highest probability or second highest probability at a particular position of the promoter scaffold or backbone relative to the transcriptional start site (TSS) is assigned to that position.
  • 14-16. (canceled)
  • 17. The synthetic promoter of claim 1, wherein the algal cell is a green algal cell.
  • 18-19. (canceled)
  • 20. An expression cassette comprising a synthetic promoter of claim 1.
  • 21. A vector comprising the expression cassette of claim 20.
  • 22. (canceled)
  • 23. A cell comprising the synthetic promoter of claim 1.
  • 24-25. (canceled)
  • 26. The cell of claim 23, wherein the cell is a Chlamydomonas reinhardtii cell.
  • 27. The cell of claim 23, wherein the cell overexpresses or underexpresses one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity to SEQ ID NOs: 87-178.
  • 28. (canceled)
  • 29. A kit comprising the synthetic promoter of claim 1.
  • 30. A method of transcribing a polynucleotide in an algal cell, comprising expressing in the algal cell the polynucleotide operably linked to a synthetic promoter of claim 1.
  • 31. A method of increasing the transcription of a polynucleotide in an algal cell, comprising expressing in the algal cell the polynucleotide operably linked to a synthetic promoter of claim 1.
  • 32. The method of claim 31, wherein: the transcription levels of the polynucleotide are increased at least about 2-fold in comparison to a control promoter; and/or transcription of the polynucleotide is increased in response to light exposure and the synthetic promoter comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16A, or transcription of the polynucleotide is increased in response to dark exposure and the synthetic promoter comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16B.
  • 33-38. (canceled)
  • 39. The method of claim 30, wherein: the cell comprises one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity to SEQ ID NOs: 87-178; and/or the cell overexpresses one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity to SEQ ID NOs: 87-178, or the cell underexpresses one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity to SEQ ID NOs: 87-178.
  • 40-41. (canceled)
  • 42. A method of designing, constructing and/or assembling a synthetic promoter of claim 1, comprising arranging at least about 3 (cis)-elements selected from the group consisting of the sequences in Tables 1 and 2, and FIGS. 16A and 16B within a promoter scaffold or backbone.
  • 43. The method of claim 42, wherein: the promoter (cis)-elements are positioned or located within the promoter relative to the transcriptional start site (TSS) as indicated in Table 1; and/or the promoter is at least about 200 bp in length and up to about 1000 bp in length; and/or the synthetic promoter promotes transcription levels that are at least 2-fold greater than a control promoter; and/or the nucleic acid base of highest probability or second highest probability at a particular position of the promoter scaffold or backbone relative to the transcriptional start site (TSS) is assigned to that position.
  • 44-47. (canceled)
  • 48. A synthetic nuclear transcription system, the system comprising a synthetic promoter of claim 1 operably linked to a polynucleotide of interest, and one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity to SEQ ID NOs: 87-178.
  • 49. The system of claim 48, wherein: transcription of the polynucleotide is increased in response to light exposure and the synthetic promoter comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16A; or transcription of the polynucleotide is increased in response to dark exposure and the synthetic promoter comprises one or more promoter (cis)-elements selected from the group consisting of the sequences in FIG. 16B.
  • 50. (canceled)
  • 51. A cell comprising the system of claim 48.
  • 52-54. (canceled)
  • 55. The cell of claim 51, wherein: the cell overexpresses one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity to SEQ ID NOs: 87-178; or the cell underexpresses one or more transcription factors encoded by a polynucleotide comprising at least about 60% sequence identity to SEQ ID NOs: 87-178.
  • 56. (canceled)
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a U.S. 371 National Phase of International Application No. PCT/US2017/018196, filed on Feb. 16, 2017, which claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 62/295,997, filed on Feb. 16, 2016, which are hereby incorporated herein by reference in their entireties.

STATEMENT OF GOVERNMENTAL SUPPORT

This work was supported in part by Grant No. DE-EE-0003373 from the United States Department of Energy. The Government has certain rights in this invention.

PCT Information
  • Filing Document: PCT/US2017/018196
  • Filing Date: 2/16/2017
  • Country: WO
  • Kind: 00
Provisional Applications (1)
  • Number: 62295997
  • Date: Feb 2016
  • Country: US