COMPOSITIONS AND METHODS FOR THE TREATMENT OF DBA USING GATA1 GENE THERAPY

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jun. 3, 2020, is named 701039-094470WOPT_SL.txt and is 188,598 bytes in size.

TECHNICAL FIELD

The technology described herein relates to compositions and methods of GATA-1 gene therapy for the treatment of Diamond-Blackfan anemia and uses thereof.

BACKGROUND

Diamond-Blackfan anemia (DBA) is one of a group of rare inherited bone marrow failure syndromes (IBMFSs) and is characterized by red cell failure, the presence of congenital anomalies, and cancer predisposition. DBA is usually diagnosed in children during their first year of life. Children with DBA do not make enough red blood cells, the cells that carry oxygen to all other cells in the body. In children with DBA, many of the cells that would have become red blood cells die before they mature. In addition to being an inherited bone marrow failure syndrome, DBA is also categorized as a ribosomopathy because, in more than 50% of cases, the syndrome appears to result from haploinsufficiency of a ribosomal protein associated with either the small or the large ribosomal subunit.

DBA is characterized by a specific reduction in the production of red blood (erythroid) cells and their precursors without defects in other hematopoietic lineages. Over the past decade, the identification of mutations in the ribosomal protein gene RPS19, followed by the discovery of mutations in 9 other ribosomal protein genes, has led to the hypothesis that DBA is a disorder of ribosomal biogenesis. However, approximately 50% of DBA cases have no identified causative mutation, despite systematic sequencing of all ribosomal protein and other candidate genes in these cases.

The GATA-1 gene is located on the X-chromosome and encodes a transcription factor that regulates the development of erythrocytes. Recently, loss-of-function mutations in GATA-1 have been found in patients with Diamond-Blackfan anemia (DBA). However, no treatment targeting GATA-1 augmentation specifically in erythroid cells is currently available. Thus, therapeutic approaches that directly target GATA-1 dysfunction in erythroid cells are necessary in order to provide effective treatment.

SUMMARY

Recent studies have shown that GATA-1 augmentation in erythroid cells may have therapeutic effects in Diamond-Blackfan anemia (DBA). However, increasing the lineage-specific expression of therapeutic proteins, including GATA-1, in vivo remains challenging. Attempts to increase GATA1 expression with existing technology necessarily increase GATA1 expression in cells (e.g., HSCs) in which it is overwhelmingly deleterious to the subject, negating any possible therapeutic effect.

As described herein, the inventors have identified compositions and methods that increase GATA1 expression specifically in early erythroid progenitors, but not in hematopoietic stem cells, as a gene therapy approach for the treatment of Diamond-Blackfan anemia.

In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising: a. at least one heterologous regulatory sequence selected from a hematopoietic enhancer element and a miRNA binding site for an HSC-restricted miRNA; and b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.

In some embodiments of any of the aspects, the nucleic acid sequence comprises at least one hematopoietic enhancer element.

In some embodiments of any of the aspects, the enhancer element comprises a sequence having at least 80% homology to a nucleotide sequence selected from the group consisting of: SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 38, and SEQ ID NO: 39.
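To make the "at least 80% homology" language concrete, a candidate enhancer can be scored against a reference SEQ ID NO. The sketch below is a minimal illustration only, assuming that homology is measured as percent identity over a global (Needleman-Wunsch) alignment with simple unit scores, counting gap columns in the denominator. The scoring values and function names are hypothetical choices; other identity conventions (for example, dividing by the length of the shorter sequence) give different numbers.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned columns divided by alignment length (gaps included)."""
    aln_q, aln_r = global_align(query.upper(), reference.upper())
    if not aln_q:
        raise ValueError("both sequences are empty")
    identical = sum(1 for x, y in zip(aln_q, aln_r) if x == y and x != "-")
    return 100.0 * identical / len(aln_q)


def meets_threshold(query: str, reference: str, threshold: float = 80.0) -> bool:
    """True if the query reaches the identity threshold against the reference."""
    return percent_identity(query, reference) >= threshold
```

The quadratic dynamic-programming table is fine for enhancer-sized fragments; for longer sequences a dedicated aligner would normally be used, and its identity definition should be stated alongside any reported percentage.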

In some embodiments of any of the aspects, the enhancer element comprises an enhancer element of a gene selected from the group consisting of: Kell metalloendopeptidase (KEL); 5′ aminolevulinate synthase 2 (ALAS2); and glycophorin A (GYPA).

In some embodiments of any of the aspects, the nucleic acid comprises at least one miRNA binding site for at least one HSC-restricted miRNA.

In some embodiments of any of the aspects, the at least one miRNA binding site for at least one HSC-restricted miRNA is selected from the group consisting of binding sites for miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e.
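A miRNA binding site of this kind (the "T" suffix, as in miR142T or miR223T, appears to denote such a target sequence) is commonly built as one or more copies of a sequence perfectly complementary to the mature miRNA, placed in the transcript (typically the 3′ untranslated region) so that cells expressing that miRNA repress the transgene. The sketch below shows only the sequence arithmetic, assuming perfect complementarity; the function names, the default of four tandem copies, and the optional spacer are hypothetical design choices, and the example miRNA in the tests is a toy sequence rather than any of the miRNAs listed above.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mirna_target_site(mature_mirna: str) -> str:
    """DNA sequence of a site perfectly complementary to a mature miRNA.

    The mature miRNA (written 5'->3' in the RNA alphabet) is converted to DNA
    and reverse-complemented, so the transcribed site base-pairs with the miRNA.
    """
    dna = mature_mirna.upper().replace("U", "T")
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def tandem_target_array(mature_mirna: str, copies: int = 4, spacer: str = "") -> str:
    """Concatenate `copies` target sites separated by a (hypothetical) spacer."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    site = mirna_target_site(mature_mirna)
    return spacer.join([site] * copies)
```

Multiple tandem copies are a common way to strengthen repression; the number of copies and the spacer are parameters to be tuned experimentally rather than fixed values taken from this disclosure.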

In some embodiments of any of the aspects, the nucleic acid comprises at least one hematopoietic enhancer element and at least one miRNA binding site for at least one HSC-restricted miRNA.

In some embodiments of any of the aspects, the nucleic acid sequence further comprises: a. a heterologous 5′ UTR comprising: i. a 5′ UTR sequence of a hematopoietic transcription factor other than GATA1; ii. a sequence of at least 20 nucleotides; and/or iii. 1-25 upstream AUG codons (uAUGs); and/or b. a hematopoietic enhancer minigene.

In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising: a. a 5′ UTR comprising: i. a 5′ UTR sequence of a hematopoietic transcription factor other than GATA1; ii. a sequence of at least 20 nucleotides; and/or iii. 1-25 upstream AUG codons (uAUGs); and b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
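Option iii. counts upstream AUG codons in the 5′ UTR. The hypothetical helper below is a minimal sketch of that count, assuming the 5′ UTR string ends immediately before the main GATA1 start codon. It also reports whether each uAUG is in frame with the main start codon, a distinction the text does not draw but that can matter for how a uAUG affects translation.

```python
from typing import List, Tuple


def find_uaugs(utr5: str) -> List[Tuple[int, int]]:
    """Locate upstream AUGs in a 5' UTR that ends right before the main start codon.

    Returns (position, frame_offset) pairs with 0-based positions; a frame_offset
    of 0 means the uAUG is in frame with the main AUG, 1 or 2 means out of frame.
    """
    seq = utr5.upper().replace("U", "T")
    hits: List[Tuple[int, int]] = []
    for pos in range(len(seq) - 2):
        if seq[pos:pos + 3] == "ATG":
            hits.append((pos, (len(seq) - pos) % 3))
    return hits


def uaug_count_in_range(utr5: str, low: int = 1, high: int = 25) -> bool:
    """True if the number of uAUGs falls within the 1-25 range recited above."""
    return low <= len(find_uaugs(utr5)) <= high
```

For example, the 91 nucleotides of NM_002049.3 (SEQ ID NO: 2) that precede the ATGGAGTTC start of the GATA1 coding sequence contain no ATG triplet, so the native GATA1 5′ UTR would not satisfy option iii.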

In some embodiments of any of the aspects, the 5′ UTR comprises a 5′ UTR of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1), LIM Domain Only 2 (LMO2), and ETS Variant 6 (ETV6).

In some embodiments of any of the aspects, the nucleic acid further comprises at least one hematopoietic enhancer element, at least one miRNA binding site for an HSC-restricted miRNA, and/or a hematopoietic enhancer minigene (G1HEM).

In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising: a. a hematopoietic enhancer minigene (G1HEM); and b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.

In some embodiments of any of the aspects, the hematopoietic enhancer minigene (G1HEM) comprises a sequence having at least 80% homology to the nucleotide sequence of SEQ ID NO: 13.

In some embodiments of any of the aspects, the nucleic acid further comprises: a 5′ UTR comprising: i. a 5′ UTR sequence of a hematopoietic transcription factor other than GATA1; ii. a sequence of at least 20 nucleotides; and/or iii. 1-25 upstream AUG codons (uAUGs); and/or at least one hematopoietic enhancer element; and/or at least one miRNA binding site for an HSC-restricted miRNA.

In some embodiments of any of the aspects, the nucleic acid further comprises: a 5′ UTR comprising a 5′ UTR of Runt-related transcription factor 1 (RUNX1); at least one hematopoietic enhancer element; and/or at least one miRNA binding site for an HSC-restricted miRNA.

In some embodiments of any of the aspects, the nucleic acid sequence comprises a promoter operably linked to the regulatory element(s) and to the sequence encoding the GATA1 polypeptide.

In some embodiments of any of the aspects, the promoter is not a GATA1 promoter.

In some embodiments of any of the aspects, the promoter comprises a promoter sequence of Elongation factor 1-alpha 1 (eEF1a1).

In some embodiments of any of the aspects, the nucleic acid sequence comprises a posttranscriptional regulatory element operably linked to the sequence encoding the GATA1 polypeptide.

In some embodiments of any of the aspects, the posttranscriptional regulatory element comprises a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE).

In some embodiments of any of the aspects, the nucleic acid sequence further comprises an internal ribosome entry site (IRES).

In some embodiments of any of the aspects, the internal ribosome entry site is operably linked to a marker gene, and the marker gene encodes an optically visible protein or an enzyme.
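The element combinations above can be easier to reason about when a vector cassette is modeled as an ordered 5′→3′ list and checked against a few of the stated embodiments (a single non-GATA1 promoter, a GATA1 coding sequence, and an IRES followed by a marker gene). This is a hypothetical sketch: the `Element` type, the kind labels, and the rule that miRNA target sites sit downstream of the GATA1 coding sequence (3′ UTR placement) are illustrative assumptions, not a description of the vectors shown in the figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    # Hypothetical kind labels: "enhancer", "promoter", "utr5", "cds",
    # "ires", "marker", "mirT", "wpre".
    kind: str
    name: str


def validate_cassette(elements: List[Element]) -> List[str]:
    """Check a 5'->3' element list against illustrative rules; return problems found."""
    problems: List[str] = []
    promoter_idx = [i for i, e in enumerate(elements) if e.kind == "promoter"]
    cds_idx = [i for i, e in enumerate(elements) if e.kind == "cds" and e.name == "GATA1"]

    if len(promoter_idx) != 1:
        problems.append("expected exactly one promoter")
    elif elements[promoter_idx[0]].name.upper().startswith("GATA1"):
        problems.append("promoter should not be a GATA1 promoter")

    if len(cds_idx) != 1:
        problems.append("expected exactly one GATA1 coding sequence")
    else:
        cds = cds_idx[0]
        if promoter_idx and promoter_idx[0] > cds:
            problems.append("promoter must lie upstream of the GATA1 coding sequence")
        # Assumption: miRNA target sites are placed 3' of the coding sequence.
        if any(e.kind == "mirT" and i < cds for i, e in enumerate(elements)):
            problems.append("miRNA target sites should lie downstream of the GATA1 coding sequence")

    # An IRES is only useful here if a marker gene follows it directly.
    for i, e in enumerate(elements):
        if e.kind == "ires" and (i + 1 == len(elements) or elements[i + 1].kind != "marker"):
            problems.append("IRES should be followed by a marker gene")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a design with several issues be reviewed in a single pass.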

In some embodiments of any of the aspects, the sequence comprises a sequence selected from SEQ ID NOs: 8, 9, and 62.

In some embodiments of any of the aspects, the nucleic acid sequence is a vector.

In some embodiments of any of the aspects, the vector is a plasmid, or an adenoviral, lentiviral or retroviral vector.

In one aspect of any of the embodiments, described herein is a lentiviral particle comprising the nucleic acid sequence.

In one aspect of any of the embodiments, described herein is a composition comprising a nucleic acid sequence or particle and a pharmaceutically acceptable carrier.

In one aspect of any of the embodiments, described herein is a method of treating Diamond-Blackfan Anemia in a subject in need thereof, the method comprising administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition to the subject.

In one aspect of any of the embodiments, described herein is a method of restoring early erythroid progenitor cell-specific GATA1 expression, the method comprising contacting a population of cells comprising early erythroid progenitor cells with a nucleic acid sequence, particle, or composition.

In some embodiments of any of the aspects, the early erythroid progenitor cells comprise a DBA-associated gene mutation.

In one aspect of any of the embodiments, described herein is a nucleic acid sequence, particle, or composition described herein for use in the treatment of Diamond-Blackfan Anemia in a subject in need thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a schematic of the molecular pathways involved in Diamond-Blackfan anemia (DBA) pathogenesis.

FIG. 2A, FIG. 2B, and FIG. 2C demonstrate reduced ribosome levels in the presence of DBA-associated molecular lesions.

FIG. 3 demonstrates reduced GATA1 expression levels in hematopoietic stem and progenitor cells (HSPCs) from DBA patients with ribosomal protein (RP) gene mutations (the patients shown carry RPS19, RPL5, and RPL35A mutations).

FIG. 4A, FIG. 4B, and FIG. 4C demonstrate that GATA1 lentiviral transduction rescues erythroid lineage commitment and differentiation in DBA patient HSPCs, as assessed by morphology (FIG. 4B) and by markers of terminal differentiation (FIG. 4C, bottom). FIG. 4A. The three patients shown have mutations in RPS19 (Patients 2 and 3) and RPL35A (Patient 1).

FIG. 5 depicts a schematic of the claimed vectors allowing regulated GATA1 expression. The endogenous GATA1 locus is shown at the top, and the pRRL.PPT.EFS vectors (including self-inactivating long terminal repeat elements [LTR] with safety modifications and the posttranscriptional regulatory element of the woodchuck hepatitis virus) are shown below. The vectors include either the endogenous GATA1 promoter or the short EF1α (EFS) promoter. The GATA1 cDNA is codon optimized for improved expression. FIG. 5 discloses SEQ ID NOS 67-69, respectively, in order of appearance.
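Because the GATA1 cDNA in these vectors is codon optimized, the recoded sequence must still encode the same polypeptide even though its nucleotide identity to the native sequence drops. A minimal sketch of that check, using the standard genetic code (NCBI translation table 1), is shown below. The function names and the short example sequences in the tests are hypothetical; the recoded codons are not the optimized sequence actually used in the vectors.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order at each position;
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon (DNA or RNA alphabet)."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def nucleotide_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


def is_synonymous(native: str, recoded: str) -> bool:
    """True if both coding sequences encode the same amino acid sequence."""
    return translate(native) == translate(recoded)
```

Running `translate` on CCDS14305.1 (SEQ ID NO: 1) and comparing the result, minus the terminal stop, with NP_002040.1 (SEQ ID NO: 6) is one way to confirm that a native or optimized cDNA encodes the intended polypeptide.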

FIG. 6 depicts a schematic of the use of the claimed GATA1 vectors in primary human hematopoietic cells.

FIG. 7 depicts a schematic of the various combinations of vectors to achieve developmentally faithful expression of GATA1 in early erythroid progenitors but not in hematopoietic stem cells.

FIG. 8A and FIG. 8B show genomic plots of human GATA1 and diagrams of two vectors. FIG. 8A demonstrates the chromatin accessibility upstream of human GATA1. FIG. 8B depicts two vectors designed to achieve developmentally faithful expression of GATA1 in early erythroid progenitors but not in hematopoietic stem cells.

FIG. 9A, FIG. 9B, FIG. 9C, FIG. 9D, and FIG. 9E depict the five vectors, including a control vector, used to achieve developmentally faithful expression of GATA1 in early erythroid progenitors but not in hematopoietic stem cells. FIG. 9A. R18 EF-1α IRES GFP Control. FIG. 9B. R21 EF-1α IRES GFP miR126. FIG. 9C. R49 EF-1α 1 peak enhancer GFP. FIG. 9D. R50 3 Peak Enhancer GFP. FIG. 9E. GATA1 vector with enhancer and miR126 binding site.

FIG. 10 shows FACS analysis plots of CD71 and CD235a expression on day 4, day 9, and day 11 of in vitro differentiation of cells transfected with the R18 EF-1α IRES GFP Control vector. As cells move from quadrant 1 to quadrant 4, they mature along the erythroid lineage.

FIG. 11 shows a FACS analysis plot of cells transfected with the R21 EF-1α IRES GFP.

FIG. 12 shows a FACS analysis plot of cells transfected with the R21 EF-1α IRES GFP miR126.

FIG. 13 shows a FACS analysis plot of cells transfected with the R49 EF-1α 1 peak enhancer GFP.

FIG. 14 shows a FACS analysis plot of cells transfected with the R50 3 Peak Enhancer GFP vector.

FIG. 15 shows FACS analysis plots of cells transfected with R18 EF-1α IRES GFP Control, R21 EF-1α IRES GFP miR126, R49 EF-1α 1 peak enhancer GFP, and R50 3 Peak Enhancer GFP.

FIG. 16 demonstrates that the human GATA1 enhancer in R50 (3 Peak Enhancer GFP) preferentially drives transgene expression in erythroid cells but not in CD34+ cells.

FIG. 17 depicts FACS analysis plots at HSC day 4 for the Ef1a-GFP, miR126, miR223T, 1peak, 3peak, 1peak-miR126, 1peak-miR223T, 3peak-miR126, and 3peak-miR223T vectors. Experimental outline: D0: thaw CD34+ cells into SSII+cc100+TPO and culture at 5% O2. D2: lentiviral infection; recover overnight in SSII+cc100+TPO. HSC D3: split the culture, with half in HSC conditions and half in RBC differentiation conditions. HSC D4 and D7: analysis by flow cytometry. RBC D4: analysis by flow cytometry (continued every 3-4 days).

FIG. 18A and FIG. 18B show bar graphs depicting GFP expression in a CD34+CD38-CD45RA-CD90+ subset at day 4 (FIG. 18A) and at day 7 (FIG. 18B).

FIG. 19 depicts FACS analysis plots at RBC day 4 for the Ef1a-GFP, miR126, miR223T, 1peak, 3peak, 1peak-miR126, 1peak-miR223T, 3peak-miR126, and 3peak-miR223T vectors.

FIG. 20 shows a bar graph depicting GFP expression in CD71+CD235+ cells at RBC day 4.

FIG. 21 depicts the percentage of GFP+ cells in the CD71-CD235-, CD71+CD235-, and CD71+CD235+ erythroid subsets.

FIG. 22 shows a bar graph depicting the fold increase in %GFP in RBC versus HSC conditions. Results are shown for Ef1a-GFP, miR126, miR223T, 1peak, 3peak, 1peak-miR126, 1peak-miR223T, 3peak-miR126, and 3peak-miR223T.
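The fold increase plotted in FIG. 22 is a ratio of two percentages, one per culture condition, and a higher value indicates more erythroid-selective expression. A minimal sketch of the arithmetic follows, assuming the metric is %GFP+ cells in the RBC arm divided by %GFP+ cells in the HSC arm for the same vector; the exact gates and days used for each arm are not specified here, and the vector names and numbers in the tests are hypothetical placeholders, not experimental results.

```python
from typing import Dict, List, Tuple


def gfp_fold_increase(pct_gfp_rbc: float, pct_gfp_hsc: float) -> float:
    """Fold increase in %GFP+ cells under RBC versus HSC conditions."""
    if pct_gfp_hsc <= 0:
        raise ValueError("%GFP in HSC conditions must be positive to form a ratio")
    return pct_gfp_rbc / pct_gfp_hsc


def rank_vectors(measurements: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Sort vectors by fold increase, most erythroid-selective first.

    `measurements` maps a vector name to (%GFP in RBC arm, %GFP in HSC arm).
    """
    folds = [(name, gfp_fold_increase(rbc, hsc)) for name, (rbc, hsc) in measurements.items()]
    return sorted(folds, key=lambda item: item[1], reverse=True)
```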

FIG. 23 shows FACS analysis plots demonstrating that RPS19 knockdown impairs erythroid differentiation. Experimental outline: D0: thaw cells into Phase I media. D2: spinfect with shRNA lentivirus +/− GATA1 expression constructs. D4: begin puromycin selection. D6: remove puromycin. D7: flow analysis.

FIG. 24 shows FACS analysis plots demonstrating rescue of the RPS19 knockdown phenotype by GATA1 overexpression.

FIG. 25 shows FACS analysis plots demonstrating rescue of the RPS19 knockdown phenotype by GATA1 overexpression.

FIG. 26 shows a bar graph depicting the CD235+/CD235- ratio for EF1a-GFP, EF1a-GATA-IRES-GFP, 1 peak-GATA-GFP, 3 peak-GATA-GFP, and HMD-GATA-GFP.

FIG. 27 shows a schematic depicting key features and a summary of the experimental validation of a GATA1 gene therapy vector to cure DBA.

FIG. 28A, FIG. 28B, FIG. 28C, and FIG. 28D show that developmentally regulated expression of GATA1 rescues the DBA phenotype in vitro. FIG. 28A. Accessible chromatin upstream of human GATA1, in descending order from HSPCs to reticulocytes (top), and a schematic of the lentiviral vector used to achieve regulated GATA1 expression (bottom). FIG. 28B. shRNA knockdown of RPS19 in primary human HSPCs impairs erythroid development, which is rescued by GATA1 expression. FIG. 28C. Erythroid differentiation of murine G1E cells is achieved with regulated GATA1 expression. FIG. 28D. The GFP ratio in erythroid progenitors compared to HSCs shows developmentally regulated expression.

FIG. 29A, FIG. 29B, and FIG. 29C show exogenous GATA1 expression during erythroid differentiation. FIG. 29A. Differentiating erythroid precursors first express CD71, then CD235, and finally lose CD71 during terminal erythroid differentiation. FIG. 29B. The percentage of erythroid progenitors that express CD71 (dark grey) or both CD71 and CD235 (light grey) on day 4 is higher after infection with GATA1 virus. FIG. 29C. The ratio of GFP expression in CD71-CD235+ cells compared to CD71+CD235+ cells reveals decreased expression from hG1E during terminal erythroid differentiation, mimicking endogenous GATA1 expression.
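The gating logic behind these panels can be sketched as classifying each cell by CD71 and CD235a into the four stages in the maturation order described for FIG. 29A, then summarizing per stage. This is a minimal illustration that assumes per-event fluorescence values and fixed, hypothetical positivity cutoffs (real gates are set from controls), and it uses mean GFP intensity as an assumed readout for the CD71-CD235a+ versus CD71+CD235a+ comparison of FIG. 29C. All names and the example events in the tests are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Event = Tuple[float, float, float]  # (CD71, CD235a, GFP) fluorescence for one cell

# Maturation order per FIG. 29A: CD71 appears first, then CD235a, then CD71 is lost.
STAGE_ORDER = ("CD71-CD235a-", "CD71+CD235a-", "CD71+CD235a+", "CD71-CD235a+")


def erythroid_stage(cd71: float, cd235a: float, cd71_cut: float, cd235a_cut: float) -> str:
    """Label one event by thresholding each marker (cutoffs are assumptions)."""
    cd71_label = "CD71+" if cd71 >= cd71_cut else "CD71-"
    cd235a_label = "CD235a+" if cd235a >= cd235a_cut else "CD235a-"
    return cd71_label + cd235a_label


def stage_fractions(events: Sequence[Event], cd71_cut: float, cd235a_cut: float) -> Dict[str, float]:
    """Fraction of events in each stage, listed in maturation order."""
    if not events:
        raise ValueError("no events to classify")
    counts = Counter(erythroid_stage(c71, c235, cd71_cut, cd235a_cut) for c71, c235, _ in events)
    return {stage: counts[stage] / len(events) for stage in STAGE_ORDER}


def late_to_early_gfp_ratio(events: Sequence[Event], cd71_cut: float, cd235a_cut: float) -> float:
    """Mean GFP in CD71-CD235a+ cells divided by mean GFP in CD71+CD235a+ cells."""
    groups: Dict[str, List[float]] = {"CD71-CD235a+": [], "CD71+CD235a+": []}
    for c71, c235, gfp in events:
        stage = erythroid_stage(c71, c235, cd71_cut, cd235a_cut)
        if stage in groups:
            groups[stage].append(gfp)
    late, early = groups["CD71-CD235a+"], groups["CD71+CD235a+"]
    if not late or not early:
        raise ValueError("both CD235a+ populations must contain events")
    return (sum(late) / len(late)) / (sum(early) / len(early))
```

A ratio below 1 would be consistent with reduced transgene expression as cells lose CD71, the pattern the text attributes to hG1E.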

FIG. 30A and FIG. 30B show that regulated GATA1 rescues the erythroid block after RPS19 editing. FIG. 30A. The proportion of CD71+ cells that also express CD235 is higher after GATA1 infection. FIG. 30B. Regulated GATA1 promotes erythroid colony formation.

DETAILED DESCRIPTION

As described herein, GATA-1 augmentation in erythroid cells can have therapeutic effects in Diamond-Blackfan anemia (DBA). However, existing methods of increasing GATA-1 expression in erythroid cells also necessarily increase expression in other cell types, e.g., in hematopoietic stem cells. These off-target effects can lead to damaging side effects and must be avoided in order to provide an actual treatment to subjects. Increasing the lineage-specific expression of therapeutic proteins, including GATA-1, in vivo has proven challenging and had not previously been achieved.

As described herein, the inventors have identified nucleic acid sequences comprising regulatory sequences that can restore early erythroid progenitor cell-specific GATA1 expression, thereby permitting a therapeutic approach for DBA. Briefly, described herein are compositions and methods to increase lineage-specific expression of GATA1 in early erythroid progenitors, but not in hematopoietic stem cells, as a therapy for DBA. More specifically, described herein are methods of restoring early erythroid progenitor cell-specific GATA1 expression by contacting a population of early erythroid progenitor cells (including, but not limited to, cells that comprise a DBA-associated gene mutation) with a nucleic acid sequence, particle, or composition as described herein.

Provided herein are methods of treating Diamond-Blackfan Anemia in a subject in need thereof, the methods comprising administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition as described herein, including but not limited to vectors with specific gene regulatory elements, for the development of broadly applicable hematopoietic gene therapy approaches for DBA patients.

Furthermore, provided herein are methods of restoring early erythroid progenitor cell-specific GATA1 expression, the method comprising contacting a population of cells comprising early erythroid progenitor cells with a nucleic acid sequence, particle, or composition as described herein.

Diamond-Blackfan anemia (DBA) is a congenital erythroid aplasia that usually presents in infancy. DBA causes low red blood cell counts (anemia), without substantially affecting the other blood components (the platelets and the white blood cells). About 47% of affected individuals also have a variety of congenital abnormalities, including craniofacial malformations, thumb or upper limb abnormalities, cardiac defects, urogenital malformations, and cleft palate. Low birth weight and generalized growth delay are sometimes observed. DBA patients have a modest risk of developing leukemia and other malignancies.

DBA is characterized by a specific reduction in the production of red blood (erythroid) cells and their precursors without defects in other hematopoietic lineages. In more than 50% of cases, DBA is caused by heterozygous loss-of-function mutations (haploinsufficiency) in one of 11 genes encoding ribosomal proteins, including the RPL5, RPL11, RPL35A, RPS10, RPS17, RPS19, RPS24, and RPS26 genes. These and other genes associated with Diamond-Blackfan anemia provide instructions for making ribosomal proteins. Approximately 25 percent of individuals with Diamond-Blackfan anemia have mutations in the RPS19 gene, and another 25 to 35 percent have mutations in the RPL5, RPL11, RPL35A, RPS10, RPS17, RPS24, or RPS26 gene. Mutations in any of these genes are believed to cause problems with ribosome function. Studies indicate that a shortage of functioning ribosomes may increase the self-destruction of blood-forming cells in the bone marrow, resulting in anemia. Abnormal regulation of cell division or inappropriate triggering of apoptosis may contribute to the other health problems that affect some people with Diamond-Blackfan anemia.

Haploinsufficiency of ribosomal proteins can contribute to other cell-type specific diseases in humans, including congenital asplenia and T-cell lymphocytic leukemia. It is striking that mutations of such ubiquitously expressed ribosomal proteins result in such specific human disorders. Numerous theories have been proposed for the pathogenesis underlying these diseases. However, these models are unable to explain the exquisite cell-type specificity of DBA and the other ribosomal disorders.

In various embodiments, described herein are methods of restoring early erythroid progenitor cell-specific GATA1 expression, comprising contacting a population of cells comprising early erythroid progenitor cells with the nucleic acid sequences, particles, or compositions described herein. Furthermore, it is contemplated that the nucleic acid sequences, particles, or compositions described herein can be used to treat DBA by administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition as described herein to a patient in need of treatment for DBA.

As used herein, “GATA-1”, “GATA1”, or “GATA binding protein 1” refers to a protein encoded by the GATA1 gene. The encoded protein is a member of the GATA family of transcription factors and plays an important role in erythroid development, including by regulating the switch from fetal to adult hemoglobin. The GATA1 gene is located on the X chromosome (Xp11.23) and encodes a transcription factor that regulates the development of erythrocytes. Loss-of-function mutations in GATA-1 are linked to hematopoietic disorders, including DBA.

The GATA-1 polypeptide has three functional domains: an N-terminal transactivation domain (TD), essential for transcriptional activation activity; an N-terminal zinc finger (NF); and a C-terminal zinc finger (CF) responsible for binding to DNA. Exon 4 mutations have been identified in families with dyserythropoietic anemia, thrombocytopenia, thalassemia, and erythropoietic porphyria. Related germline mutations have also been described. The loss-of-function mutations of GATA-1 in DBA occur at the donor splice site of exon 2 of the GATA-1 gene and result in exon skipping.

Sequences for GATA1 are known for a number of species; e.g., human GATA1 (NCBI Gene ID: 2623) mRNA sequences (e.g., NM_002049.3, XM_011543897.2, XM_011543898.2, and XM_024452363.1) and polypeptide sequences (e.g., NP_002040.1, XP_011542199.1, XP_011542200.1, and XP_024308131.1) are known in the art. These, together with any naturally occurring allelic variants, splice variants, and processed forms thereof that retain GATA1 activity, are contemplated for use in the methods and compositions described herein.

In some embodiments of any of the aspects, the GATA1 nucleic acid includes or is derived from human GATA1 having the following coding sequence, CCDS14305.1 (SEQ ID NO: 1):

ATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCA

GTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCT

TCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCG

AGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGA

GGCCTACAGACACTCCCCAGTCTTTCAGGTGTACCCATTGCTCAACTGTA

TGGAGGGGATCCCAGGGGGCTCACCATATGCCGGCTGGGCCTACGGCAAG

ACGGGGCTCTACCCTGCCTCAACTGTGTGTCCCACCCGCGAGGACTCTCC

TCCCCAGGCCGTGGAAGATCTGGATGGAAAAGGCAGCACCAGCTTCCTGG

AGACTTTGAAGACAGAGCGGCTGAGCCCAGACCTCCTGACCCTGGGACCT

GCACTGCCTTCATCACTCCCTGTCCCCAATAGTGCTTATGGGGGCCCTGA

CTTTTCCAGTACCTTCTTTTCTCCCACCGGGAGCCCCCTCAATTCAGCAG

CCTATTCCTCTCCCAAGCTTCGTGGAACTCTCCCCCTGCCTCCCTGTGAG

GCCAGGGAGTGTGTGAACTGCGGAGCAACAGCCACTCCACTGTGGCGGAG

GGACAGGACAGGCCACTACCTATGCAACGCCTGCGGCCTCTATCACAAGA

TGAATGGGCAGAACAGGCCCCTCATCCGGCCCAAGAAGCGCCTGATTGTC

AGTAAACGGGCAGGTACTCAGTGCACCAACTGCCAGACGACCACCACGAC

ACTGTGGCGGAGAAATGCCAGTGGGGATCCCGTGTGCAATGCCTGCGGCC

TCTACTACAAGCTACACCAGGTGAACCGGCCACTGACCATGCGGAAGGAT

GGTATTCAGACTCGAAACCGCAAGGCATCTGGAAAAGGGAAAAAGAAACG

GGGCTCCAGTCTGGGAGGCACAGGAGCAGCCGAAGGACCAGCTGGTGGCT

TTATGGTGGTGGCTGGGGGCAGCGGTAGCGGGAATTGTGGGGAGGTGGCT

TCAGGCCTGACACTGGGCCCCCCAGGTACTGCCCATCTCTACCAAGGCCT

GGGCCCTGTGGTGCTGTCAGGGCCTGTTAGCCACCTCATGCCTTTCCCTG

GACCCCTACTGGGCTCACCCACGGGCTCCTTCCCCACAGGCCCCATGCCC

CCCACCACCAGCACTACTGTGGTGGCTCCGCTCAGCTCATGA

In some embodiments of any of the aspects, the GATA1 mRNA sequence includes or is derived from human GATA1 having the following sequence, NM_002049.3 (SEQ ID NO: 2):

GACACCCCCTGGGATCACACTGAGCTTGCCACATCCCCAAGGCGGCCGAA

CCCTCCGCAACCACCAGCCCAGGTTAATCCCCAGAGGCTCCATGGAGTTC

CCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGA

TCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTG

GGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCC

ACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAG

ACACTCCCCAGTCTTTCAGGTGTACCCATTGCTCAACTGTATGGAGGGGA

TCCCAGGGGGCTCACCATATGCCGGCTGGGCCTACGGCAAGACGGGGCTC

TACCCTGCCTCAACTGTGTGTCCCACCCGCGAGGACTCTCCTCCCCAGGC

CGTGGAAGATCTGGATGGAAAAGGCAGCACCAGCTTCCTGGAGACTTTGA

AGACAGAGCGGCTGAGCCCAGACCTCCTGACCCTGGGACCTGCACTGCCT

TCATCACTCCCTGTCCCCAATAGTGCTTATGGGGGCCCTGACTTTTCCAG

TACCTTCTTTTCTCCCACCGGGAGCCCCCTCAATTCAGCAGCCTATTCCT

CTCCCAAGCTTCGTGGAACTCTCCCCCTGCCTCCCTGTGAGGCCAGGGAG

TGTGTGAACTGCGGAGCAACAGCCACTCCACTGTGGCGGAGGGACAGGAC

AGGCCACTACCTATGCAACGCCTGCGGCCTCTATCACAAGATGAATGGGC

AGAACAGGCCCCTCATCCGGCCCAAGAAGCGCCTGATTGTCAGTAAACGG

GCAGGTACTCAGTGCACCAACTGCCAGACGACCACCACGACACTGTGGCG

GAGAAATGCCAGTGGGGATCCCGTGTGCAATGCCTGCGGCCTCTACTACA

AGCTACACCAGGTGAACCGGCCACTGACCATGCGGAAGGATGGTATTCAG

ACTCGAAACCGCAAGGCATCTGGAAAAGGGAAAAAGAAACGGGGCTCCAG

TCTGGGAGGCACAGGAGCAGCCGAAGGACCAGCTGGTGGCTTTATGGTGG

TGGCTGGGGGCAGCGGTAGCGGGAATTGTGGGGAGGTGGCTTCAGGCCTG

ACACTGGGCCCCCCAGGTACTGCCCATCTCTACCAAGGCCTGGGCCCTGT

GGTGCTGTCAGGGCCTGTTAGCCACCTCATGCCTTTCCCTGGACCCCTAC

TGGGCTCACCCACGGGCTCCTTCCCCACAGGCCCCATGCCCCCCACCACC

AGCACTACTGTGGTGGCTCCGCTCAGCTCATGAGGGCACAGAGCATGGCC

TCCAGAGGAGGGGTGGTGTCCTTCTCCTCTTGTAGCCAGAATTCTGGACA

ACCCAAGTCTCTGGGCCCCAGGCACCCCCTGGCTTGAACCTTCAAAGCTT

TTGTAAAATAAAACCACCAAAGTCCTGAAAAAAAAAAAAAAAAAAAAAAA

A

In some embodiments of any of the aspects, the GATA1 mRNA sequence includes or is derived from human GATA1 having the following sequence, XM_011543898.2 (SEQ ID NO: 3):

GACACCCCCTGGGATCACACTGAGCTTGCCACATCCCCAAGGCGGCCGAACCCTCCGCAACCACCAGCCC

AGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCC

AGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGG

CTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTAC

AGGGACGCTGAGGCCTACAGACACTCCCCAGTCTTTCAGGTGTACCCATTGCTCAACTGTATGGAGGGGA

TCCCAGGGGGCTCACCATATGCCGGCTGGGCCTACGGCAAGACGGGGCTCTACCCTGCCTCAACTGTGTG

TCCCACCCGCGAGGACTCTCCTCCCCAGGCCGTGGAAGATCTGGATGGAAAAGGCAGCACCAGCTTCCTG

GAGACTTTGAAGACAGAGCGGCTGAGCCCAGACCTCCTGACCCTGGGACCTGCACTGCCTTCATCACTCC

CTGTCCCCAATAGTGCTTATGGGGGCCCTGACTTTTCCAGTACCTTCTTTTCTCCCACCGGGAGCCCCCT

CAATTCAGCAGCCTATTCCTCTCCCAAGCTTCGTGGAACTCTCCCCCTGCCTCCCTGTGAGGCCAGGGAG

TGTGTGAACTGCGGAGCAACAGCCACTCCACTGTGGCGGAGGGACAGGACAGGCCACTACCTATGCAACG

CCTGCGGCCTCTATCACAAGATGAATGGGCAGAACAGGCCCCTCATCCGGCCCAAGAAGCGCCTGATTGT

CAGTAAACGGGCAGGTACTCAGTGCACCAACTGCCAGACGACCACCACGACACTGTGGCGGAGAAATGCC

AGTGGGGATCCCGTGTGCAATGCCTGCGGCCTCTACTACAAGCTACACCAGGTGAACCGGCCACTGACCA

TGCGGAAGGATGGTATTCAGACTCGAAACCGCAAGGCATCTGGAAAAGGGAAAAAGAAACGGGGCTCCAG

TCTGGGAGGCACAGGAGCAGCCGAAGGACCAGCTGGTGGCTTTATGGTGGTGGCTGGGGGCAGCGGTAGC

GGGAATTGTGGGGAGGTGGCTTCAGGCCTGACACTGGGCCCCCCAGGTACTGCCCATCTCTACCAAGGCC

TGGGCCCTGTGGTGCTGTCAGGGCCTGTTAGCCACCTCATGCCTTTCCCTGGACCCCTACTGGGCTCACC

CACGGGCTCCTTCCCCACAGGCCCCATGCCCCCCACCACCAGCACTACTGTGGTGGCTCCGCTCAGCTCA

TGAGGGCACAGAGCATGGCCTCCAGAGGAGGGGTGGTGTCCTTCTCCTCTTGTAGCCAGAATTCTGGACA

ACCCAAGTCTCTGGGCCCCAGGCACCCCCTGGCTTGAACCTTCAAAGCTTTTGTAAAATAAAACCACCAA

AGTCCTGAAAAAAAAAAAAAAAAAAAAAAAA

In some embodiments of any of the aspects, the GATA1 mRNA sequence includes or is derived from human GATA1 having the following sequence, XM_024452363.1 (SEQ ID NO: 4):

GGAAGGGAGCCTCAAAGGCCAAGGCCAGCCAGGACACCCCCTGGGATCACACTGAGCTTGCCACATCCCC

AAGGCGGCCGAACCCTCCGCAACCACCAGCCCAGTCTTTCAGGTGTACCCATTGCTCAACTGTATGGAGG

GGATCCCAGGGGGCTCACCATATGCCGGCTGGGCCTACGGCAAGACGGGGCTCTACCCTGCCTCAACTGT

GTGTCCCACCCGCGAGGACTCTCCTCCCCAGGCCGTGGAAGATCTGGATGGAAAAGGCAGCACCAGCTTC

CTGGAGACTTTGAAGACAGAGCGGCTGAGCCCAGACCTCCTGACCCTGGGACCTGCACTGCCTTCATCAC

TCCCTGTCCCCAATAGTGCTTATGGGGGCCCTGACTTTTCCAGTACCTTCTTTTCTCCCACCGGGAGCCC

CCTCAATTCAGCAGCCTATTCCTCTCCCAAGCTTCGTGGAACTCTCCCCCTGCCTCCCTGTGAGGCCAGG

GAGTGTGTGAACTGCGGAGCAACAGCCACTCCACTGTGGCGGAGGGACAGGACAGGCCACTACCTATGCA

ACGCCTGCGGCCTCTATCACAAGATGAATGGGCAGAACAGGCCCCTCATCCGGCCCAAGAAGCGCCTGAT

TGTCAGTAAACGGGCAGGTACTCAGTGCACCAACTGCCAGACGACCACCACGACACTGTGGCGGAGAAAT

GCCAGTGGGGATCCCGTGTGCAATGCCTGCGGCCTCTACTACAAGCTACACCAGGTGAACCGGCCACTGA

CCATGCGGAAGGATGGTATTCAGACTCGAAACCGCAAGGCATCTGGAAAAGGGAAAAAGAAACGGGGCTC

CAGTCTGGGAGGCACAGGAGCAGCCGAAGGACCAGCTGGTGGCTTTATGGTGGTGGCTGGGGGCAGCGGT

AGCGGGAATTGTGGGGAGGTGGCTTCAGGCCTGACACTGGGCCCCCCAGGTACTGCCCATCTCTACCAAG

GCCTGGGCCCTGTGGTGCTGTCAGGGCCTGTTAGCCACCTCATGCCTTTCCCTGGACCCCTACTGGGCTC

ACCCACGGGCTCCTTCCCCACAGGCCCCATGCCCCCCACCACCAGCACTACTGTGGTGGCTCCGCTCAGC

TCATGAGGGCACAGAGCATGGCCTCCAGAGGAGGGGTGGTGTCCTTCTCCTCTTGTAGCCAGAATTCTGG

ACAACCCAAGTCTCTGGGCCCCAGGCACCCCCTGGCTTGAACCTTCAAAGCTTTTGTAAAATAAAACCAC

CAAAGTCCTGAAA

In some embodiments of any of the aspects, the GATA1 mRNA sequence includes or is derived from human GATA1 having the following sequence, XM_011543897.2 (SEQ ID NO: 5):

GACACCCCCTGGGATCACACTGAGCTTGCCACATCCCCAAGGCGGCCGAACCCTCCGCAACCACCAGCCC

AGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCC

AGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGG

CTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTAC

AGGGACGCTGAGGCCTACAGACACTCCCCAGTCTTTCAGGTGTACCCATTGCTCAACTGTATGGAGGGGA

TCCCAGGGGGCTCACCATATGCCGGCTGGGCCTACGGCAAGACGGGGCTCTACCCTGCCTCAACTGTGTG

TCCCACCCGCGAGGACTCTCCTCCCCAGGCCGTGGAAGATCTGGATGGAAAAGGCAGCACCAGCTTCCTG

GAGACTTTGAAGACAGAGCGGCTGAGCCCAGACCTCCTGACCCTGGGACCTGCACTGCCTTCATCACTCC

CTGTCCCCAATAGTGCTTATGGGGGCCCTGACTTTTCCAGTACCTTCTTTTCTCCCACCGGGAGCCCCCT

CAATTCAGCAGCCTATTCCTCTCCCAAGCTTCGTGGAACTCTCCCCCTGCCTCCCTGTGAGGCCAGGGAG

TGTGTGAACTGCGGAGCAACAGCCACTCCACTGTGGCGGAGGGACAGGACAGGCCACTACCTATGCAACG

CCTGCGGCCTCTATCACAAGATGAATGGGCAGAACAGGCCCCTCATCCGGCCCAAGAAGCGCCTGATTGT

CAGTAAACGGGCAGGTACTCAGTGCACCAACTGCCAGACGACCACCACGACACTGTGGCGGAGAAATGCC

AGTGGGGATCCCGTGTGCAATGCCTGCGGCCTCTACTACAAGCTACACCAGGTGAACCGGCCACTGACCA

TGCGGAAGGATGGTATTCAGACTCGAAACCGCAAGGCATCTGGAAAAGGGAAAAAGAAACGGGGCTCCAG

TCTGGGAGGCACAGGAGCAGCCGAAGGACCAGCTGGTGGCTTTATGGTGGTGGCTGGGGGCAGCGGTAGC

GGGAATTGTGGGGAGGTGGCTTCAGGCCTGACACTGGGCCCCCCAGGTACTGCCCATCTCTACCAAGGCC

TGGGCCCTGTGGTGCTGTCAGGGCCTGTTAGCCACCTCATGCCTTTCCCTGGACCCCTACTGGGCTCACC

CACGGGCTCCTTCCCCACAGGCCCCATGCCCCCCACCACCAGCACTACTGTGGTGGCTCCGCTCAGCTCA

TGAGGGCACAGAGCATGGCCTCCAGAGGAGGGGTGGTGTCCTTCTCCTCTTGTAGCCAGAATTCTGGACA

ACCCAAGTCTCTGGGCCCCAGGCACCCCCTGGCTTGAACCTTCAAAGCTTTTGTAAAATAAAACCACCAA

AGTCCTGAAAAAAAAAAAAAAAAAAAAAAAA

In some embodiments of any of the aspects, the GATA1 polypeptide includes or is derived from human GATA1 having the following amino acid sequence NP_002040.1 (SEQ ID NO: 6):

MEFPGLGSLGTSEPLPQFVDPALVSSTPESGVFFPSGPEGLDAAASSTAPSTATAAAAALAYYRDAEAYR

HSPVFQVYPLLNCMEGIPGGSPYAGWAYGKTGLYPASTVCPTREDSPPQAVEDLDGKGSTSFLETLKTER

LSPDLLTLGPALPSSLPVPNSAYGGPDFSSTFFSPTGSPLNSAAYSSPKLRGTLPLPPCEARECVNCGAT

ATPLWRRDRTGHYLCNACGLYHKMNGQNRPLIRPKKRLIVSKRAGTQCTNCQTTTTTLWRRNASGDPVCN

ACGLYYKLHQVNRPLTMRKDGIQTRNRKASGKGKKKRGSSLGGTGAAEGPAGGFMVVAGGSGSGNCGEVA

SGLTLGPPGTAHLYQGLGPVVLSGPVSHLMPFPGPLLGSPTGSFPTGPMPPTTSTTVVAPLSS

In some embodiments of any of the aspects, the GATA1 polypeptide includes or is derived from human GATA1 having the following amino acid sequence XP_011542199.1 (SEQ ID NO: 7):

MEFPGLGSLGTSEPLPQFVDPALVSSTPESGVFFPSGPEGLDAAASSTAPSTATAAAAALAYYRDAEAYR

HSPVFQVYPLLNCMEGIPGGSPYAGWAYGKTGLYPASTVCPTREDSPPQAVEDLDGKGSTSFLETLKTER

LSPDLLTLGPALPSSLPVPNSAYGGPDFSSTFFSPTGSPLNSAAYSSPKLRGTLPLPPCEARECVNCGAT

ATPLWRRDRTGHYLCNACGLYHKMNGQNRPLIRPKKRLIVSKRAGTQCTNCQTTTTTLWRRNASGDPVCN

ACGLYYKLHQPPFWQVNRPLTMRKDGIQTRNRKASGKGKKKRGSSLGGTGAAEGPAGGFMVVAGGSGSGN

CGEVASGLTLGPPGTAHLYQGLGPVVLSGPVSHLMPFPGPLLGSPTGSFPTGPMPPTTSTTVVAPLSS

In some embodiments of any of the aspects, the GATA1 polypeptide includes or is derived from human GATA1 having the following amino acid sequence XP_011542200.1 (SEQ ID NO: 64):

MEGIPGGSPYAGWAYGKTGLYPASTVCPTREDSPPQAVEDLDGKGSTSFLETLKTERLSPDLLTLGPALP

SSLPVPNSAYGGPDFSSTFFSPTGSPLNSAAYSSPKLRGTLPLPPCEARECVNCGATATPLWRRDRTGHY

LCNACGLYHKMNGQNRPLIRPKKRLIVSKRAGTQCTNCQTTTTTLWRRNASGDPVCNACGLYYKLHQPPF

WQVNRPLTMRKDGIQTRNRKASGKGKKKRGSSLGGTGAAEGPAGGFMVVAGGSGSGNCGEVASGLTLGPP

GTAHLYQGLGPVVLSGPVSHLMPFPGPLLGSPTGSFPTGPMPPTTSTTVVAPL

In some embodiments of any of the aspects, the GATA1 polypeptide includes or is derived from human GATA1 having the following amino acid sequence XP_024308131.1 (SEQ ID NO: 65):

MEGIPGGSPYAGWAYGKTGLYPASTVCPTREDSPPQAVEDLDGKGSTSFLETLKTERLSPDLLTLGPALP

SSLPVPNSAYGGPDFSSTFFSPTGSPLNSAAYSSPKLRGTLPLPPCEARECVNCGATATPLWRRDRTGHY

LCNACGLYHKMNGQNRPLIRPKKRLIVSKRAGTQCTNCQTTTTTLWRRNASGDPVCNACGLYYKLHQVNR

PLTMRKDGIQTRNRKASGKGKKKRGSSLGGTGAAEGPAGGFMVVAGGSGSGNCGEVASGLTLGPPGTAHL

YQGLGPVVLSGPVSHLMPFPGPLLGSPTGSFPTGPMPPTTSTTVVAPLSS

In some embodiments of any of the aspects, the sequence encoding a GATA-binding factor 1 (GATA1) polypeptide comprises a nucleotide sequence having at least 60% sequence identity to a nucleotide sequence encoding a human GATA1 polypeptide. In some embodiments of any of the aspects, the sequence encoding a GATA-binding factor 1 (GATA1) polypeptide comprises a nucleotide sequence encoding a human GATA1 polypeptide.

In some embodiments of any of the aspects, a sequence encoding a GATA1 polypeptide comprises, consists of, or consists essentially of a nucleic acid sequence selected from any of SEQ ID NOs: 1-5. In some embodiments of any of the aspects, a sequence encoding a GATA1 polypeptide comprises, consists of, or consists essentially of a nucleic acid sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to one of SEQ ID NOs: 1-5. In some embodiments of any of the aspects, a sequence encoding a GATA1 polypeptide comprises, consists of, or consists essentially of a nucleic acid sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to one of SEQ ID NOs: 1-5 and encodes a polypeptide that retains GATA1 wild-type activity, e.g., transcription factor activity as described herein.

In some embodiments of any of the aspects, a GATA1 polypeptide comprises, consists of, or consists essentially of an amino acid sequence selected from any of SEQ ID NOs: 6, 7, 64, and/or 65. In some embodiments of any of the aspects, a GATA1 polypeptide comprises, consists of, or consists essentially of an amino acid sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to one of SEQ ID NOs: 6, 7, 64, and/or 65. In some embodiments of any of the aspects, a GATA1 polypeptide comprises, consists of, or consists essentially of an amino acid sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to one of SEQ ID NOs: 6, 7, 64, and/or 65 and retains GATA1 wild-type activity, e.g., transcription factor activity as described herein.
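The specification recites identity thresholds (at least 60%, 80%, 85%, 90%, 95%, 98%) but does not say here how percent identity is computed. The following is a minimal Python sketch under stated assumptions: a Needleman-Wunsch global alignment scoring +1 for a match, -1 for a mismatch, and -1 per gap position, with identity defined as identical aligned positions divided by the full alignment length including gap columns. The function names are hypothetical. Established tools (e.g., BLAST or EMBOSS Needle) use substitution matrices and affine gap penalties and can report different values for the same pair of sequences.

```python
# Sketch only: the scoring scheme (+1 match, -1 mismatch, -1 gap) and the identity
# definition below are assumptions, not taken from the specification.
CLAIMED_THRESHOLDS = (60, 80, 85, 90, 95, 98)  # "at least X%" tiers recited above


def _pair_score(x: str, y: str, match: int, mismatch: int) -> int:
    return match if x == y else mismatch


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch),
                score[i - 1][j] + gap,  # gap inserted in b
                score[i][j - 1] + gap,  # gap inserted in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch)):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions / alignment length (gap columns included) x 100."""
    if not a and not b:
        raise ValueError("cannot compare two empty sequences")
    aln_a, aln_b = global_align(a.upper(), b.upper())
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


def tiers_met(identity: float) -> list[int]:
    """Which of the recited 'at least X%' tiers a given identity satisfies."""
    return [t for t in CLAIMED_THRESHOLDS if identity >= t]
```

For example, `percent_identity("ACGT", "AGT")` aligns the two sequences as ACGT/A-GT and returns 75.0, which meets only the 60% tier.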

Hematopoietic stem cells (HSCs) are the stem cells that give rise to all other blood cells through a process called hematopoiesis. This process occurs in the red bone marrow, in the core of most bones. During embryonic development, the red bone marrow is derived from the layer of the embryo called the mesoderm. Hematopoiesis, the process by which all mature blood cells are produced, must balance enormous production needs against the need to precisely regulate the number of each blood cell type in the circulation. In vertebrates, the vast majority of hematopoiesis occurs in the bone marrow and derives from a limited number of HSCs that are multipotent and capable of extensive self-renewal. In adults, HSCs are found in the bone marrow, especially in the pelvis, femur, and sternum. They are also found in umbilical cord blood and, in small numbers, in peripheral blood. Mammalian hematopoiesis produces approximately 10 distinct cell types, the most abundant of which belongs to the erythroid lineage. Erythropoiesis produces large numbers of red blood cells, which supply oxygen to developing embryonic and fetal tissues and to adult tissues. Red blood cells also help maintain blood viscosity and provide the shear stress required for vascular development and remodeling.

As used herein, the term “hematopoietic stem cell” or “HSC” refers to a clonogenic, self-renewing pluripotent cell capable of ultimately differentiating into all cell types of the hematopoietic system, including B cells, T cells, NK cells, lymphoid dendritic cells, myeloid dendritic cells, granulocytes, macrophages, megakaryocytes, and erythroid cells. As with other cells of the hematopoietic system, HSCs can be defined by the presence of a characteristic set of cell markers. In some embodiments of any of the aspects, an HSC can be a cell that expresses CD34, CD90, or the combination thereof. Other marker signatures used to identify HSCs include, but are not limited to: EMCN⁺, CD34⁺, CD59⁺, CD90⁺, CD117⁺, CD133⁺, CD38⁻, lin⁻, CD150⁺, CD48⁻, and CD244⁻.
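The marker signatures listed above pair each marker with a required state (⁺ or ⁻). A small sketch, assuming a cell's phenotype has already been reduced to positive/negative calls per marker (e.g., by flow-cytometry gating, which is not modeled here), shows how such a signature can be parsed and checked. The helper names and the example panel are hypothetical; which subset of markers defines an HSC varies by embodiment.

```python
POSITIVE = "\u207a"  # superscript plus, as in "CD34⁺"
NEGATIVE = "\u207b"  # superscript minus, as in "CD38⁻"


def parse_signature(markers: str) -> dict[str, bool]:
    """Turn a comma-separated signature such as 'CD34⁺, CD38⁻' into required states."""
    panel: dict[str, bool] = {}
    for token in (t.strip() for t in markers.split(",")):
        if token.endswith(POSITIVE):
            panel[token[:-1]] = True
        elif token.endswith(NEGATIVE):
            panel[token[:-1]] = False
        else:
            raise ValueError(f"marker has no +/- sign: {token!r}")
    return panel


def matches_panel(cell: dict[str, bool], panel: dict[str, bool]) -> bool:
    """True if the cell shows every marker state the panel requires.

    A marker absent from the cell's profile counts as not matching.
    """
    return all(marker in cell and cell[marker] == state
               for marker, state in panel.items())


# Hypothetical example: a CD34/CD90 panel with two negative markers.
panel = parse_signature("CD34\u207a, CD90\u207a, CD38\u207b, lin\u207b")
cell = {"CD34": True, "CD90": True, "CD38": False, "lin": False}
print(matches_panel(cell, panel))  # True
```

Treating an unmeasured marker as a mismatch is a deliberately conservative choice; a looser reading would skip markers the profile does not report.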

GATA1 protein levels are suppressed in HSCs from DBA patients, and increasing GATA1 expression specifically in those cells can ameliorate the erythroid lineage commitment defect characteristic of DBA. GATA1 expression during terminal erythropoiesis, however, needs to be regulated.

In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising a) at least one heterologous regulatory sequence selected from i) a hematopoietic enhancer element and/or ii) a binding site for an HSC-restricted miRNA; and b) a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
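The structure recited in this aspect (at least one of elements a)(i) or a)(ii), plus the GATA1-encoding sequence b)) can be captured as a simple validation rule. The sketch below is hypothetical: the class and field names are invented for illustration; promoters, polyadenylation signals, and the vector backbone are omitted; and the 5'-to-3' order (enhancers upstream, miRNA binding sites downstream of the coding sequence, as in a 3' UTR) is an assumed layout, a common arrangement for miRNA-regulated transgenes, rather than an order stated in this passage.

```python
from dataclasses import dataclass, field


@dataclass
class Gata1Construct:
    """Hypothetical model of the nucleic acid described above (names are illustrative)."""
    gata1_cds: str                                        # b) sequence encoding GATA1
    enhancers: list[str] = field(default_factory=list)    # a)(i) hematopoietic enhancer elements
    mirna_sites: list[str] = field(default_factory=list)  # a)(ii) HSC-restricted miRNA binding sites

    def validate(self) -> None:
        if not self.gata1_cds:
            raise ValueError("a sequence encoding GATA1 is required")
        if not (self.enhancers or self.mirna_sites):
            raise ValueError("at least one heterologous regulatory sequence is required")

    def assemble(self) -> str:
        """Join parts 5'->3': enhancers, GATA1 CDS, then miRNA sites (assumed 3' UTR)."""
        self.validate()
        return "".join(self.enhancers) + self.gata1_cds + "".join(self.mirna_sites)


# Toy placeholder sequences, not real elements.
print(Gata1Construct("ATG", enhancers=["AAA"], mirna_sites=["CCC"]).assemble())
```

The call above prints AAAATGCCC; a construct with neither enhancers nor miRNA sites raises ValueError, mirroring the "at least one" requirement.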

Regulatory sequences as disclosed herein include, but are not limited to, promoters, enhancers, and other expression control elements (e.g., polyadenylation signals) that control the transcription or translation of a gene to which they are operably linked. Such regulatory sequences are described, for example, in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Examples of regulatory sequences for mammalian host cell expression include viral elements that direct high levels of protein expression in mammalian cells, such as promoters and/or enhancers derived from cytomegalovirus (CMV), Simian Virus 40 (SV40), adenovirus (e.g., the adenovirus major late promoter (AdMLP)), and polyoma. Alternatively, nonviral regulatory sequences may be used, such as the ubiquitin promoter, the elongation factor 1-alpha 1 (eEF1a1) promoter, or the β-globin promoter. A eukaryotic promoter is a regulatory region of DNA located upstream of a gene that binds transcription factor II D (TFIID) and allows the subsequent coordination of components of the transcription initiation complex, facilitating recruitment of RNA polymerase II and initiation of transcription.

In some embodiments of any of the aspects, disclosed herein are heterologous regulatory sequences or combinations thereof that permit carefully regulated expression of GATA1 in hematopoietic progenitors to improve erythropoiesis in DBA without unwanted effects on hematopoiesis.

As used herein, “HSC-restricted”, e.g., as used in reference to regulatory sequences, refers to an activity or element that preferentially occurs or exists in HSCs as compared to other cells of the hematopoietic lineage (e.g., erythrocytes or erythroid precursors). In some embodiments of any of the aspects, the activity or element occurs or exists in HSCs at a level at least 10×, at least 100×, or more above its level in other cells of the hematopoietic lineage (e.g., erythrocytes or erythroid precursors). More specifically, an HSC-restricted miRNA is a miRNA that is expressed at higher (e.g., 10×, 100×, or higher) levels in HSCs than in other cells of the hematopoietic lineage (e.g., erythrocytes or erythroid precursors).
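The fold-change criterion in this definition can be expressed directly. This sketch assumes expression levels are already normalized and comparable across cell types, and it takes the strictest reading of "higher than in other cells", namely comparison against the highest-expressing comparison cell type. Function names and the numbers in the example are illustrative only, not measured data.

```python
def fold_enrichment(hsc_level: float, other_levels: dict[str, float]) -> float:
    """HSC expression divided by the highest level seen in any comparison cell type."""
    if not other_levels:
        raise ValueError("need at least one comparison cell type")
    highest_other = max(other_levels.values())
    if highest_other == 0:
        return float("inf") if hsc_level > 0 else 0.0
    return hsc_level / highest_other


def is_hsc_restricted(hsc_level: float, other_levels: dict[str, float],
                      min_fold: float = 10.0) -> bool:
    """Apply an 'at least min_fold higher in HSCs' criterion (10x by default)."""
    return fold_enrichment(hsc_level, other_levels) >= min_fold


# Illustrative numbers only, in arbitrary normalized expression units.
others = {"erythroid_precursor": 20.0, "erythrocyte": 1.0}
print(fold_enrichment(500.0, others))           # 25.0
print(is_hsc_restricted(500.0, others))         # True  (meets 10x)
print(is_hsc_restricted(500.0, others, 100.0))  # False (below 100x)
```

Comparing against the maximum rather than, say, the mean of the other cell types means one moderately expressing lineage is enough to disqualify a candidate miRNA.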

The term “heterologous” refers to a combination of elements which is not naturally occurring. For example, a heterologous regulatory sequence is one that is not naturally found operably connected to the coding sequence being considered. In some embodiments of any of the aspects, the heterologous regulatory sequence can be a regulatory sequence not naturally found in that species.

As used herein, “regulatory sequence” refers to a nucleic acid sequence that is capable of increasing or decreasing the expression of specific genes, nucleic acid sequences or polypeptides.

In some embodiments of any of the aspects, the heterologous regulatory sequence is a hematopoietic enhancer element. A hematopoietic enhancer element is an enhancer element that is active in hematopoietic cells, e.g., in HSCs and/or in other cells of the erythroid lineage. In some embodiments, the hematopoietic enhancer element is active in cells undergoing erythropoiesis. A hematopoietic enhancer element need not be active exclusively in any of the foregoing cells. Alternatively, in some embodiments of any of the aspects, the hematopoietic enhancer element can be HSC-restricted and/or restricted to erythroid precursors/progenitors. In some embodiments, the enhancer element is located distal to the sequence encoding GATA1, i.e., it is a distal enhancer element. Suitable enhancer elements can readily be identified by one of skill in the art by consulting, e.g., expression data freely available on the world wide web for one or more cell types in the erythroid lineage, identifying genes that are expressed or highly expressed in those cells, and selecting enhancer elements of those genes.
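The screening approach described above (consult expression data for erythroid-lineage cell types and keep genes that are expressed or highly expressed there) can be sketched as a filter. The threshold, the units, and the rule that a gene must pass in every listed cell type are assumptions; the gene names and values are placeholders rather than real data; and locating the enhancer of each candidate gene is a separate step not covered here.

```python
def candidate_genes(expression: dict[str, dict[str, float]],
                    cell_types: list[str], min_level: float) -> list[str]:
    """Genes expressed at >= min_level in every listed erythroid-lineage cell type.

    A cell type missing from a gene's record is treated as zero expression.
    """
    return sorted(
        gene for gene, levels in expression.items()
        if all(levels.get(ct, 0.0) >= min_level for ct in cell_types)
    )


# Toy table with placeholder gene names and made-up values (not real data).
table = {
    "GENE_A": {"proerythroblast": 120.0, "basophilic_erythroblast": 300.0},
    "GENE_B": {"proerythroblast": 2.0, "basophilic_erythroblast": 150.0},
    "GENE_C": {"proerythroblast": 80.0},
}
print(candidate_genes(table, ["proerythroblast", "basophilic_erythroblast"], 50.0))
# ['GENE_A']
```

Requiring expression in every listed cell type favors genes active across erythroid maturation; relaxing `all` to `any` would instead capture stage-specific genes.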

In some embodiments of any of the aspects, the heterologous enhancer element comprises the following nucleic acid sequence: NC_000023.11:48638900-48639300 on Homo sapiens chromosome X, GRCh38.p12 Primary Assembly (SEQ ID NO: 10):

ACTTTCATGAAATTACTGACATAATTTTGGGTCCAAAATTTCAAAATTTTAAATATTTTTATTTGGAATT

TTAAAATAATTTATATGCTCTTTTTACTGGCTAATAATGCTATTCATTATAATCTGATATTCAAACTGTC

TAAAAAAGTTAACAATCATTGATTTATTTGTTGTATATACAGTTTATTTCTATGACAGTTTTAATGTCAC

CTAATATTATTTTTAATGTTTCAATTTCTCATTTAAATACATTTTGTGTTGTTTATTTTAATCTCATTCA

ATCTGTATGTGCAAATGGCTTAGAAAAAAAGGCCATATATGACAAGCCCACAGCTAACATCATATAGTCA

ACAGTGAAAAACTAAAAGCTTCTCCTTTAAGATCAGGAACAAGGCAAGGAT

In some embodiments of any of the aspects, the heterologous enhancer element comprises the following nucleic acid sequence: NC_000023.11:48641200-48641700 on Homo sapiens chromosome X, GRCh38.p12 Primary Assembly (SEQ ID NO: 11):

TTTTATTATTTATTTATTTTTTTGAGACAGATTCTCACTCTGTCGCCTAGGCTGGAATGCAATGGCGTGA

TCCCGGCTCACTGCAACCTCTGCCTCCCAGGTTCAAGCGATTCTCCTGCCTCAGCCTCCCGAGTAGCTGG

GATTACAGGCATGCGCCACCACGCCTGGCTAATTTTTTGTATTTTTAGTAGAGACAGGGTTTCTCCATGT

TGGTCAGGCTGGTCTCGAACTACCGACCTTAGGTAATCCTCCCACCTCGGCCTCCGAAAGTGCTGGGATT

ACAGGCGTGAGCCACTGCGCCCGGCCTACATTTATTTTTAAATAAATGGATTTAAATGTTAAGACCTGAA

CCTATAAAAATGGGACACCTGCATAGGGCATTAACCATGAGTAGAGCTTGCAGGACTGGAAGTTGCTATG

GGTGAGTCAGTGTGTGAGTGGTGAGTGAATGGGAAGGCCTAGGACATTCCTGTACACTACCATGGACTTT

ATAAATTCTGT

In some embodiments of any of the aspects, the heterologous enhancer element comprises the following nucleic acid sequence: NC_000023.11:48644250-48645100 on Homo sapiens chromosome X, GRCh38.p12 Primary Assembly (SEQ ID NO: 12):

TCATAGAAACAAAACACTAGGATGGTGGTTGCCAGGGGCTGAGAGGATGGGGAAATGGGGAGTTGCTGTT

CAATGGATATTGCGCCCGGCCAGCCACACCAATTCTTACACCAAGAAGTGATGGAGCACAAGTGCTGATG

GGCCTTAACACCATCATAAACATCTTTTGTTTGTCCCGGGGAAGAAATTCCCAACTCCTTCCAAAGGTCT

GCCAAAGTCTACCAGTATCCCAAGCTGATTTCCTTATCCCCTCAGCAGATGCTGGAAAGCTGGAAGTCTC

CTTCCTTCTCACTCTCCTGCTTGACATCTGCACAGCCATTCTTCTTCCTCCCCTTGCTCCCCTTCCTCCC

CTTCTCCTTCTCCTACTTATTGAGACAGAGTCTCGCTCTGTCGCCGAGGCTGGAGTGCAGTGGTGTCATC

TCGGCTCACTGCAACCTCTGCCTCCTGGGTTCAAGCAATTCTCTTGCCTCCACCTCCTGAGTAGGTGGGA

TTACAGGTGTGTGCCACCACAGCAGGCTAATTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATATTGG

CCAGGATGGTCTCGAACTCCTGACCTCAGGTGATCTGCCTGTCTTGGCCTCCCAAAGTGCCGGGATTACA

GGCATGAGCCACCGGCGCCCGGCCCTTTTTATTATTATATATTATTTTTGAGACTGGGTCTCACTCTGTA

ATCCAGGCTGGAGGGCAGTGGCGTGATCACAGCTCACTGCAGCCCTGACCTCTTGGGCACAAGCAGTCCT

CCCGCGTCAGCCACCCAAAGTGCTGGGTCTACAGGCATGAGCTACTGTGCCCAGTCTACGATTTTTTTAA

AATTTATAATT
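The genomic locations above use the accession:start-end form. Reading them as 1-based ranges that include both endpoints (the convention NCBI uses for such coordinates), the length of each element follows directly, as in this minimal sketch; the regular expression is an assumption that covers only this accession format, and the function name is hypothetical. Under that reading the three ranges span 401, 501, and 851 bases, which appears consistent with the lengths of the sequences listed as SEQ ID NOs: 10, 11, and 12.

```python
import re

# Accession:start-end, e.g. "NC_000023.11:48638900-48639300".
REGION = re.compile(r"^(?P<acc>[A-Z]{2}_\d+\.\d+):(?P<start>\d+)-(?P<end>\d+)$")


def region_length(region: str) -> int:
    """Length in bases of a 1-based, inclusive accession:start-end range."""
    m = REGION.match(region.strip())
    if m is None:
        raise ValueError(f"unrecognized region: {region!r}")
    start, end = int(m["start"]), int(m["end"])
    if end < start:
        raise ValueError("end precedes start")
    return end - start + 1  # both endpoints are included


for r in ("NC_000023.11:48638900-48639300",
          "NC_000023.11:48641200-48641700",
          "NC_000023.11:48644250-48645100"):
    print(r, region_length(r))
# 401, 501, and 851 bases respectively
```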

In some embodiments of any of the aspects, the heterologous enhancer element comprises the following nucleic acid sequence (SEQ ID NO: 38):

ATGAAACCATATCTGCTATTTTCATTTATCTTGGTTTCAGCCTATTTTGCTTGTCTGGACACTACAGTCCACGGGAGCCTAGG

TCGAGCGAGGTCCAAGAATCCCCAGGGTGGGCAGGGAGGGTGGAAGAGGGCCTCCAGTGCCCAAGAGGTGCCCCACAAGCATG

GGACCCGCCCCCTCCCCTGGACTGCCCCACCCACTGGGGCACCAGCCACTCCCTGGGGAGGAGGGAGGAGGGAGAAGGGAGGG

AGGGAGGGAGGGAGGAAGGGAGCCTCAAAGGCCAAGGCCAGCCAGGACACCCCCTGGGATCACACTGAGCTTGCCACATCCCC

AAGGCGGCCGAACCCTCCGCAACCACCAGCCCAGAGATCTAGAGTTAATCCCCAGAGGCTCCATGGTGAGCAAGGGCGAGGAG

CTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGA

CCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC

GAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGA

CACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACT

ACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG

GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTA

CCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCG

GGATCACTCTCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGCATCGATACCGTCGACCTCGATCGAGACCTAGAAAAACAT

GGAGCAATCACAAGTAG

In some embodiments of any of the aspects, the heterologous enhancer element comprises the following nucleic acid sequence (SEQ ID NO: 39):

ATGGCGGGCAAGAAGTTGAGGCCACTGTCCCTGGGTGTTCCTACCCCCACACCCTCACCCCAAGACAGCCTGTTACTGCGGCG

CCAACAGCCACGGTCGCCTACATCTGATAAGACTTATCTGCTGCCCCAGGGCAGGCCGGAGCTGGCGTAAGCCCCAGTGGGGC

GCTAAGTGAGTGTGCCCCTGCCTCCCGCCAGCACTGGCCTGGCCTGCAGGCTTAGCCTGGGTCATCAAGGTATCCCACAGGCT

CTAGTTCAAATCCAGCAGAACCTCTCTGAGCCTCACTCTTCTCACCTGCAAAATGGGTACAGCCACATCCCTTCTCTCCCTGC

AGCCAGGAAGACGCACATACACAGGAGTCTAGCCCACACCGGCCCCGCACAAATTAAGGGCTTTACTCTCTGAAAAGCCCAGT

GAAGTCATGAAACCATATCTGCTATTTTCATTTATCTTGGTTTCAGCCTATTTTGCTTGTCTGGACACTACAGTCCACGGGAG

CCTAGGTCGAGCGAGGTCCAAGAATCCCCAGGGTGGGCAGGGAGGGTGGAAGAGGGCCTCCAGTGCCCAAGAGGTGCCCCACA

AGCATGGGACCCGCCCCCTCCCCTGGACTGCCCCACCCACTGGGGCACCAGCCACTCCCTGGGGAGGAGGGAGGAGGGAGAAG

GGAGGGAGGGAGGGAGGGAGGAAGGGAGCCTCAAAGGCCAAGGCCAGCCAGGACACCCCCTGGGATCACACTGAGCTTGCCAC

ATCCCCAAGGCGGCCGAACCCTCCGCAACCACCAGCCCAGAGATCTAGA

In some embodiments of any of the aspects, a hematopoietic enhancer element comprises, consists of, or consists essentially of a sequence having at least 80% homology to a nucleotide sequence selected from the group consisting of SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 38, and SEQ ID NO: 39. In some embodiments of any of the aspects, a hematopoietic enhancer element comprises, consists of, or consists essentially of a sequence having at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to one of SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 38, and/or SEQ ID NO: 39. In some embodiments of any of the aspects, the nucleic acid sequence described herein comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 20, at least 25, or at least 30 hematopoietic enhancer elements. Where a subset of the five foregoing hematopoietic enhancer elements is used, any combination of those elements can be used in each of the various embodiments of the aspects described herein. For example, it is specifically contemplated herein that any pairwise combination of the five hematopoietic enhancer elements can be used, e.g., any combination shown in Table 1.

TABLE 1

Contemplated exemplary combinations of enhancer elements are indicated by “X”

Enhancer element          SEQ ID    SEQ ID    SEQ ID    SEQ ID    SEQ ID
                          NO: 10    NO: 11    NO: 12    NO: 38    NO: 39
SEQ ID NO: 10                         X         X         X         X
SEQ ID NO: 11               X                   X         X         X
SEQ ID NO: 12               X         X                   X         X
SEQ ID NO: 38               X         X         X                   X
SEQ ID NO: 39               X         X         X         X
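Table 1 is a symmetric grid over the five enhancer elements with the diagonal left blank, so it marks each unordered pair of distinct elements twice. A short sketch enumerating those pairs, and, more broadly, every non-empty combination of the five elements, follows. It assumes that the order and copy number of elements in a construct are not distinguished, which the table itself does not address; the function names are hypothetical.

```python
from itertools import combinations

ENHANCER_SEQ_IDS = (10, 11, 12, 38, 39)  # SEQ ID NOs of the enhancer elements in Table 1


def pairwise(ids=ENHANCER_SEQ_IDS) -> list[tuple[int, int]]:
    """Unordered pairs of distinct elements: the off-diagonal cells of Table 1."""
    return list(combinations(ids, 2))


def nonempty_subsets(ids=ENHANCER_SEQ_IDS) -> list[tuple[int, ...]]:
    """Every subset of one or more elements ("any combination")."""
    return [c for k in range(1, len(ids) + 1) for c in combinations(ids, k)]


def table_marks(ids=ENHANCER_SEQ_IDS) -> list[list[str]]:
    """Rebuild Table 1's grid: "X" off the diagonal, blank on it."""
    return [["" if row == col else "X" for col in ids] for row in ids]


print(len(pairwise()))          # 10
print(len(nonempty_subsets()))  # 31
```

The grid holds 20 marks for 10 distinct pairs, since each pair appears once above and once below the diagonal.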

In some embodiments of any of the aspects, the hematopoietic enhancer element can be an enhancer element of a gene selected from the group consisting of Kell metallo-endopeptidase (KEL), 5-aminolevulinate synthase 2 (ALAS2), and glycophorin A (GYPA).

As used herein, “KEL”, “ECE3”, “CD238”, or “Kell metallo-endopeptidase” refers to a type II transmembrane glycoprotein that is the highly polymorphic Kell blood group antigen. Sequences for KEL are known for a number of species, e.g., human KEL (NCBI Gene ID: 3792); its nucleic acid sequence (e.g., NG_007492.2), mRNA sequences (e.g., NM_000420.3), and polypeptide sequences (e.g., NP_000411.1) are known in the art. These, together with any naturally occurring allelic variants, splice variants, and processed forms thereof that catalyze the same reaction, are contemplated for use in the methods and compositions described herein.

In some embodiments of any of the aspects, the KEL enhancer element includes or is derived from the human KEL sequence having the following nucleic acid sequence NG_007492.2 (SEQ ID NO: 40):

NG_007492.2: 5001-26303 Homo sapiens Kell metallo-endopeptidase (Kell blood group) (KEL), RefSeqGene on chromosome 7

GGGAGGAGAAGCCTGGGTGCCCCCCACTGATAAGCAGGCTCCACCCAGAGGCCAGTCCTGTGTGTCTGGG

GACAAGGCGAAAGAGCAGCAGAAGTGCCCCTTCTCCAGGATCAAGGAACTGGGGCGGGGGGTGTTTCCTG

GACCCCAGTCCTCCGAATCAGCTCCTAGAGTGGAACCAGGAAGGATTCTGGAGCCACAGAAGATAGACAG

ATGGTAAGTCCCCTTTTGGAGTCAGAGGCTTAGCGGGGAGGGGTGAGGGTGGCTGTGTGCAAAAGTCCTG

CCCCCACTGGAGGGGAGGGAATGTAAGGCTTACAGAGTAGAAAGGTGGGGAGAGAGGGAGGTAATGGGAG

AGGGATCGAGAAATGGCACATTCAGGGGACAGGTT GTTCTGAAGCCCATCTGGGAACACTGCTCCGAGA

TAAAAATATGTGTGTGGGGGCAGGGCAGGCAGCGAGGGTATCAAAATGGCCTGATAAAACTCTCTTCAAT

GCACCATTTCCTGAACCAGCTTCTCTCTCCTCCTTCTCCCTCCACTCACTTCAGGAAGGTGGGGACCAAA

GTGAGGAAGAGCCGAGGGAACGCAGCCAGGCAGGTGGAATGGGAACTCTCTGGAGCCAAGAGGTAAGTGG

CCTCCTCTCCTGGGTCTGGAATACACTGATGTTGTCACTCTCGGCTCTAAAATCCCACAAACACTCATCT

ACTAACTGTCTGCTTCATCCTCACCCAAAACAGTTGACATTCCTTGTTTTCTCATCTCCCAGGAGTTAAA

GTAGGGCTGGGTTTAGGAAGAATTGGGATAATTATTTCTGTATAAAGGGACTGTAGCACCAACAGATTCA

TTCTCTCTCCTCTTCTTCCCATCCCTGTCTCTCAACCCCCATCTTGTATCTTTCACCTCTTGGTTCCTCC

CACAGAGCACTCCAGAAGAGAGGCTGCCCGTGGAAGGGAGCAGGCCATGGGCAGTGGCCAGGCGGGTGCT

GACAGCTATCCTGATTTTGGGCCTGCTCCTTTGTTTTTCTGTGCTTTTGTTCTACAACTTCCAGAACTGT

GGCCCTCGTAAGCAAGATCCCAGACCCCCTAACCTAGTCAGCCCTCCCCCAGCCCTGGGGCCCAGGCCCA

GTCCCTGCTCCTGGGGCTTCTGCCCACCCTGACCCTTGGGGTCCCCATGGTTCTTCTTCCTCCCTGCATC

CTAACCATTTCTTTTTCATCAGCTCCCCACTTAGTTACTCACCTGATGTTCTTTGCCTAGCCCCTTGGGG

GAGCCCTTGTCTTTTTGCCTCTTCTTTCCCAGCTCTGAGCTTTTCCCCACAGGCCCCTGTGAGACATCTG

TGTGTTTGGATCTCCGGGATCATTACCTGGCCTCTGGGAACACAAGTGTGGCCCCCTGCACCGACTTCTT

CAGCTTTGCCTGTGGAAGGGCCAAAGAGACCAATAATTCTTTTCAGGAGCTTGCCACAAAGAACAAAAAC

CGACTTCGGAGAATACTGGGTGAGGAAAGCAGGGTGGAAGATGCTCTGTGCAAGTGGGTGACTCTGTGCC

TAAAATGACCATGACTGCTCCAAACCCTGTGTAGTTGTGGAACAACTGATTTGCACCATCCCAGGTGGGA

TTATACGGGTGGATGATTGGAGATGATGGGGGAGTAAAAGAGGCAGGATGGCGGGAGCTGCCTGGGTTTG

CTCATCTCTCACTGTTTCCTGTTGCCTTGCCTTGGGTACCCTTCTTCCGTTTCTCTTGGTCCCTTTCTGC

ATTTTTTTCTTTATCTAATTTCCATCTTCTTTGCTTCTCCATGTATCCATAATTACTCCATTCTCTCCAA

CTTGTCCCTTTTAGCAAGCTCCATCTTTGTTGCTTCCTCCAAATGTTCAGTTTCTATCCTATGCATGGTG

TTTTCCTCCACAAGCATCTCTTCAGCATCTCCTGCATTTCAATTCTTTTGTCCATCACTCTCATTCTCTA

ACCTCCAAAACCTCAGTCTCCCAATGACTCCTTGTCAACATTACCCTCTCCCTCTCACCATGCCGGAGCT

CCCCTCTCTCACAATGATCTCTTGCTTCTTGCTTCTCCATTGAAACCTTGAACCATGGCAAGCAAGTTGA

CCTGGAACAAGTGGGATGTTAGAGATGGATGATTGGAGATGATGGATGATGGTGGAATGAAAGGGGTAGG

ATGGTGGGGTGAGAAGTGAGAGAGGGCTTCATCACTGTGCATAAGAGAAAAAGTGGGTAAGTACAAAGGA

TATGCTGGAAGAAGAGGAGAGCTGAGTTAATTGGCAGTGGAAGTAAAGTTCCTGCAGATGGAGGCTGGAG

AGGAAAACTGCCAGGACTGAGAGGAAAACCAGAAGGATGAGCTGAAACTGAGTAGGAGGTTGGAAGTGCG

TCCCAGGAAGTTGGTGGATGGTGGTGAGGATTTGGGAATAAGAACATATAAGATAGACATGCATTTCCAG

TGCAAGGGAACCTAAAGAATGTGTTGACACTATCAATTAGAATCTGGGAAAAGTAAATGCACCCCTCTGC

CCTCTTTTTTTGATGGGGAAAGAGTGGGAGGGGGCCTCTCTTTGGGTAAATGGATACTTTCAGGGAAGGC

ACAGAGATAAAAAGAAAAAATATGCTCAGGATAAATTATATTGCCTACAATGGGATGAATAGATATCAGG

GGGACTGAGGGTGAAAAGAGTGTTAGATATTAGAGGGTGGATGATTCAGAGAGACTTGCATTTGATTATT

GTAGTGTGTTTGTTTCCTGGGATCAATGGATGAGGAGTCTGGACTAGAAGAGTCTTCCCCTGTTTCTTCT

CTTTGCTAAACCTTTCCTTATGAGTTTTCTTCTCTCCAAATCCTTAAAGTTCTCTAGTTCCCTGAATTTG

TCTAATTTCTTCAATCATTTCTTTTGTCTTTCATTTCTCTCTTTTCTCCTTTGCCCATATCCCACTTATT

GCTACCTTTCTCCTTTCTTCCCTGTCTTTTCCTTCTTGGTTTCTTCCCCACATTTCTTTTATTTTCCATA

TTGTCTTCTTCTCCTCATTCTCTTTCCCTGCTTTCATCATTTCATCAAGTTGATCCATTCCAAATTGGGC

AGTCCTCTCATCTTTCTTATTTTCCTCATCTCTATTCCTCCCCCTCCTTCCATATTCTGTGGGAGTCTTT

CTTTCCTGTAAGCTCCCTGTCTCCCACCCTCCCTCTTTGCCTCTATACCAGTTGCCACTCCTTTAATTCT

CCTGCCGACAAAAAGAGTCAAACTCTGTAAAATATTTGAAAAGATTTATTTTGAGCCAAATATGAGTGAC

CATGGCCCATGATACAGTCCTCAGGAGATCCTGAGAACATGTGCCCAAGGTGGCTGGGGCACAGCTTGGT

TTTATACATTTTAGAGAGTCATGAGACATCAATCAAATACATTTAAGAAATACATTGGTTTGGTCCAGAA

AGGTGGAACAACTCAAAGGGGTGGGGGTGGCTTCCAGGGTACAGGTGAATTTAAACATTTCCGGATTGAC

AGTTGCTTGAGTTTGTCTAAAGATCTGGGATAGATAGAAAGGGAATGTTCAGGGTAAGATAAAGATTGCG

GAGACCGAAGTTCTTTTGAAGTCTTATAGTGGCTGCCCTTAGAGACAATAGGTGACAAATGTTTCCTATT

CAGATCTTAGTTAATCAAAAGATCTAGCTATGTTAATGAGATATGTTAATAGCTAATAGAGATGCTTTAC

AGATGCAAATTTTCCTCCACAAAGAACAGCTTTGCAGGGCCATTTCAAAATGTGGCAAAGAAACATGTTT

TGGGGTAAAATATTTTTGTTTTCTTCTTTGTCTCGTAATGTTATGCCAGAATCAGGTTAGAAAGTAAATC

ATGTTACATGGGTTAAATAAAACCCATCTGATGAGAACTTATGATATAGGGCATGACTCCCCAGACCCCT

TTGATAGGAATTTGGGGCAAGATAAAAAAAATCAGAGTTTAGTCCTCACTCCCATGCTTCCTTTCTAGAG

GTCCAGAATTCCTGGCACCCAGGCTCTGGGGAGGAGAAAGCCTTCCAGTTCTACAACTCCTGCATGGATA

CACTTGCCATTGAAGCTGCAGGGACTGGTCCCCTCAGACAAGTTATTGAGGAGGTGAGAAAAGTTGGGAT

ATTAACTTTTCTGGATACATAACATATGGGACCAATGCATGCTTAGGGCTGCCATTTTTTTTTCTAGAGG

GTGGGTCTTCTTCCTAGGGCCCCCCAATTTCTAGGAGGGAGATGGAGATGGAAATGGTTATGCCCTATGA

AAGTATCAGGACCTTGGGAGAAGGCAGATAAAAAAGGATAGATGTGGCTTCCTAGAGGAATCGAAGGGCG

CAGGGCAGAGGTCAGGCAGTAGCAGCTGTGTAAGAGCCGATCCAGACAATGGGGGATGGGCTCCACGGAT

CCTTATGCTCAGCCCCCTCTCTCTCCTTTAAAGCTTGGAGGCTGGCGCATCTCTGGTAAATGGACTTCCT

TAAACTTTAACCGAACGCTGAGACTTCTGATGAGTCAGTATGGCCATTTCCCTTTCTTCAGAGCCTACCT

AGGACCTCATCCTGCCTCTCCACACACACCAGTCATCCAGGTGAGGGATGCACTGGCGAAGACACAGTTG

GACCTGGCCTGCCTCCAACTCTAGCCAATCATCCCTTAGAGGAAGGTTGCAGGTTGGGAAGAGAGGACAC

CTGTGTGATATAGGAAACAACCCTACCTTAAGGGAAAATTATTGATGTGAAAGTCAGGGACATTAGCTGG

GGGTGGGAAATGGAGCAGCAGAGCCAGTGCTGGGAAGACAGAAGTAGGCCTGGTCTTTCTTACTGTTAAT

CTGGATTAGTCTCAGAGCCCCTTAACCAGTCCTCCTATCTCTAGGATTGCCCTCATTTTATTTACTCTTT

ATTTTTACTAGAGGGAACTTTTCTAAACCAAGGGCTAACTAACTATGCTACTGTCTGTATTTAAATGCTT

GTCAGTGACCCAGTGGCTTGCCAGGTCATCAGAATCTAGTCCCTAATCTTTAGTAAAGCTTTGCAAGCAC

CTTGTGATCTGACCCCTACACACTTCTCCAGCCTTATCTCCCGTACATTCCTTCTCTCCCTTACCCCCAA

GCCATGCTGACTCACTGCTGCTTCCAGGAATATTCCTCAGTTCTTTGCCTATGCTGCTCCCTGTGCCTGC

AACCATCCCCCACACTGAACCTGGAAAACTTACATGTTTTTCAAATGTTGGCTTTATTATCTCTTCCAGG

AAGTCTTCACCGACACCCTAGTTATGAGTTAGGTGAAGCCCTGCTCTCCCTACTTTCGTTTCCTCATGCT

CTCAGCATTTATCACTCTGTGTTGAAGATTGTGAGCCTCTTTAGAACAGGACCATGCTTTATTCACCTTT

GTTTCTCAGGACCTATCACAGGGCCAGGCAGCTAGAAGTTTTGCCAGGTATTTGTAGTGAGTGAGTAACT

AAATAAAAACACTGGAGCTATCACTCTTGTGGTTAAACAATGTAATGCTATCTGCATATTTGGGCCCTAC

TGTCAAAAGAGCCACAAAATTACCAAAGGATAAGTACAAAAGAAGAATTGATTATCATTATGAGGTGTTC

TAAAATTTAGTTTTAAACAGTCTGCTCAGGAGTTTAACTGATGTGGCCTTTAGGGGCCGGTTAAGATCTG

GTTAAGGAGAGGCTCAGAGAGGAGAGAATGAGAGAAGGTGAGCTAAGCCAGCCTTGAAACATGGTTAATT

CACACAAGTGGAGGTGAAGCTATGGGGCGTTGGAAATGCTGAGCCAGGGGGAGGACCTGGAATGGTGTGA

TTCCTTCGTGGAGTCAGTGAGGAGGCTGATCTATTTAATTGAGGATTTGGGAGGCAAGGTGGGGTGCAGT

GGGAGGTAAAAGTGAGACTGAAGACATAAGGTTGAGCCTGATTATTTCTAAGAAGCCAGGCGAAGGTGAA

ACATTTGACATAATAGAAAAAAAAAAAAGAGCTACTGAGGCCATCCAACTCTTATGACAATTGTGCATAG

AGCAAGTATTTTGATGGTTGTGCGTAGAGTCAGCAGTTTTGAAGGTCAGTCTGGGGGTGTTGAGGAAACT

AAATGAGCATTTTTGAGGCCCTGAGATAGAGGTAGAAATGGAAAGGAAGAGCCAGGCACAAGGATTTAGG

CAACTTCACCCTAGTGATGATAGTTCATGCTGTTTCTAGAGGATTTGGTGACTGATTGGATATAAAGAAA

GAAAGTGGGGGATTACACAGTGATCCCATTGTTTTGATTTAGTGTGAGTGGGAGGAGGGTGATTATCATC

AGTGTGAGCCTGGATAGTCTCTTGGGTTAAAAGCAGGTAGGAAGAATGGACTACAGAAAGAGAAGTCCAA

AGACTGAGGGCAGAAGGGAGCCAGGGAAGAGAGAGTACTATTGGAGAGATGGGAGCTAGACCAGTATGGT

GGGCCACAAAGGAAAGAAAAGGAGCTTCAGGAAGGAGGGGTCAGCTCAGAGAAGAAGGAATGAGAAGACA

CCCTTGGATACCTAGAGATACTTTCCAAACAGTTATGGCAGTGGACACAGACTGCACAGAGCTTAGGAGG

AAGATAAGAAAGTGGAAACAATGGGCATAGATGCTTTTTTGTTCTTTGAACTGTGGACATACAATGTAGC

AAAAGGGTCAAGTGAAAGTTTTTTTCGAGACAGAAGGAAAAGTATATGGCTCAAGATAAGAGTGGGATAT

TGAAATTGGAGAAGAAAAGGGAAAGAGTAGAAGCAAAGATCTTCAGAATAGAAACAAGGGTTCATCAGGG

CCAGACTAAGGTGAAATATACATGGTGCTTACCTGGGGTGCTAATTTAAGAAGGTCCCCAAAACTCAGTA

TCATGATAAATAGTATTTTATTAAATATTCCTAAAAAATCAAAATCAATGCAACAATACATGATGGAACA

AAATATCAAACTTTTCTTCATTATGAATTTTTTTGAAAAAAGATTATGCTTTTTTTCCCAAAAAATGGGA

CAAAATTCTGTGTGAATCTTTTTGAAAATACTAATTTTTTTATTCAAAATGAATCAAAAATACATTGAGG

ACTTTTCTTGAACACATCATGATTCTTTTCAAAATTGACTAAAAGTATGTTTTTTTGGGGAAAAAAAGTC

CATGATAAGCAAAGTTTTGAGATTTTATTTATCATACATTTTTGGTAGTAATTTTGATTTTTTAAAATGT

TAATTATTTATCTTGATTACTGAGTTTTTTTAAAAAAGAGTTTATTTGAGCAAAGACTGATTTATGAATT

GGGCAGCATCCTGAAGCAGTAGAGGTTCAGAGAGCTCCACCCAACAATGCAGGCAGGCAGTATTTACAGA

AAGAGGAAGTGACACCCAGAAACAGCTTGATTGGTTACAGCTTAGCAATTGTCTTTAATGGGCATGGTCT

GATCACTTGACAGCCTGTGGTTGCCTGAAGATCAGCTGGTATGGCTGGCTGAGATGGAGCTACCTGTTGC

AAGAATATACTCCTAAGTTAGGTTGCAGTTTGATTACTGAGTTTTTGGTACCTCTTAGATTTTGTACCTG

GGACAGGTTCCTCACCTCACTCACCCTGGCCCTGTTCCTGAGACAAGGAATAGCTCCTTTTAAGATGCTG

ATTATCATGCTTCTGCCTTGCTGGGCACACCCACACTGGTTGTAATACTCACCATCTCTTCCCATTTTCA

CATCTGGACTCTTCTTCTCATGCCCCTCAACCCTTAATCCCTCCCTTTCTTTGTACTCTTGCTTCTCTTC

TGTCCAATCTTTGTGTCCATCTCCCAAGGCCATCTCCCATGGTATATTCCCCACCTCCCCACACCTGCCC

TCTCCATCCGCCATGCTCCCTGCTTCTCTCCAGTCTCTCTTGTGCCCAGATAGACCAGCCAGAGTTTGAT

GTTCCCCTCAAGCAAGATCAAGAACAGAAGATCTATGCCCAGGTAAGATGGCACATGGACAAAGGCCCTG

CCCTCTGAGGCCAGGAGAAAAGCAGGGACCTCTGGCACCTGTGACTGACATTTCCTTCCTCCAGATCTTT

CGGGAATACCTGACTTACCTGAATCAGCTGGGAACCTTGCTGGGAGGAGACCCAAGCAAGGTGCAAGAAC

ACTCTTCCTTGTCAATCTCCATCACTTCACGGCTGTTCCAGTTTCTGAGGCCCCTGGAGCAGCGGCGGGC

ACAGGGCAAGCTCTTCCAGATGGTCACTATCGACCAGCTCAAGGTGCCTGGAACTGGGGGCCAGAAGACT

GTGGGCATGGGGATCTTCCTCTCAAACATTACCTCCTTTCCTTCTTCCTCCTAGTGCCCTTAATACCTTT

TCATTCTGTCTCTGACTCCATCCCCTCCCCCAGTTAGCCTGTTCTCTTCTTTTTCTCACACCCAAGGGGA

AGCCCTTTCCCCTTCCTTCTCTTTTCCTTTTCCCCCTCAGCTTTGTGTCCCTCCTCTAAGGAAATGGCCC

CCGCCATCGACTGGTTGTCCTGCTTGCAAGCGACATTCACACCGATGTCCCTGAGCCCTTCTCAGTCCCT

CGTGGTCCATGACGTGGAATATTTGAAAAACATGTCACAACTGGTGGAGGAGATGCTGCTAAAGCAGAGG

TTCGCCGCAGGTGGGATTGGGGAGATCATGGAAATGGAGGAGAGCCTGAGCACCGTAGATCTTGGGGGCA

AAGGAAACCTTGGGGAAGGCAGGCTGGTAAGGGCCTCCCAGGAGGATAAGAGGAACCTGCCACCTGTGCG

GGCAGAGAAGCGTGGGGTGGGTGGCACAGAGAGGATGGAGGGATCAAGAAGGATGTGTCTTGGGAGCACG

AGTAAGGGAGGATACACACGACATGAGGAACGCAGGGTCAGCCAAGACACGGGGTTTCCTGAGAGTAGAA

CACCAGCCAGTCAAGAGCCTCTGAGCTGTAGAAGATGCTGGAAGACCCAGACACAGAAGACAGTTAAGTG

TATGTATGTCTTTTTAGCAGCTGAGGACTGTGGGCAGGAGGAGGAGGCACATGAGATGAGGAGATGAAGA

TGGTGAAGGCTGGGGATGCTTAGGGGAAGAAAGGAAGAGGAGGGGCCATTCCTCAGGTGTGGTGTGAAGA

TGCTGGAGCTCTTATGGGAAACAATGTCTAAGAGCATTTCTGCTGGTGTCAGGAAATCAAGGGGGTGTTG

GGGTTGGGGACATGAAAGAGTGGCTCTTTGTTGGGCTCTCTGCCTCCCCTGATACCTGGGTGGCTACCAC

CTGAAAGCAGTGGCTTTCTTCCAGGGGCTTGGACCTAAGGGCCTTCTTCATGGTGGCAGCAGCATCTGGA

AATCCTTTTTGAGGGAGGTAGCTGCCCATTCACATGGCAGTGAGCAGGCTTACATAAGGGTGCAATGCAG

CCCTGGCAGGAGCATTGCTGGTGGAGGAGAGAGCAGTCACAGAGACCAGCTTACTTATGCTTATGAGATA

CATCTGAGGATAACCAGAGATATCTTGACTGTGGAAGCAGAATCTGTTTCATGACATGAGTCCAGACTCC

ATCTAGCCCAGAACTTTCTTTCCCTGTGACTTTGAAGGCTGCCTCTTCATCTAGTTTCTTTTACTAAGGA

GCTAGATCCCACCCCAACCTACATCATGAAAAGCTCTTTTTGACTTGGGTGCATGTTAAAACACTTATTA

ATACAGAGGAGAAGGAGCTGCCTTCACGAGTATCAAGGTGACTTACACAAGGAGAGGCTCTTCTTGAAGC

ATCCCCAGATTCCTGGGGTATATGTGTGGGTCTCTTTTGTCTCCATAGGGACTTTCTGCAGAGCCACATG

ATCTTAGGGCTGGTGGTGACCCTTTCTCCAGCCCTGGACAGTCAATTCCAGGAGGCACGCAGAAAGCTCA

GCCAGAAACTGCGGGAACTGACAGAGCAACCACCCATGGTGAGGAGAGGAGCGGGTGTATTTGCCCAGAT

ACTCGAAAGGAGTATCTACTCTTTTGAGGGGTAAATGTCGGCATCTCTCTCTCAGGGAGGGGGCCGTGAT

GGTAGATGCCCCTCCATGTCTTGGCTTTCCATAGAAGCAGGCAAGTTGGACAGACAAAGTTTAACTTGAA

AACCAAGATGCCACGTGCCAGACCTTCAGGCACACATCTCCCAGCCTGACTACCTCTCTGGCTTCTTGCT

GGGTGTTTGAGCTCAAATATAAAACTCTGATATTATCAAAACTGCCCTTTCTTTGTCATGATGCTTACAC

TATTTGCTCAGGATAACTTGGACTTAGAGCTTACAATTTATTGGGATGACAGAGAGATATGTTACGCAGT

GGCCTTCCTTATGTCTAGTTGATTCCATGTTCAAACGTGCTTCACAAAGAGTTTATCTCTGACATCCAGT

GGGATCCACTGGGCCACATGTAGACTTTGTGGCACAGATGTGGATATATCTGAGGAGGGGCCTGGGTAGA

AAATGCACTTCACTAACCAGAGTCTACTTATTACATAAGATGCAGAGATGCTCCTTTGCTGAGAATCTTG

AAATCCCAAGTTGGATATATCCAAATGCAAGCAGAAGAGTCTAGTACATTGGATACATCCCAACCTCAGT

GAAGGCCTCAGTTTAGTCTTAAAAATCACTGGATTTTTTTTCTTAGTAATTTGTGGTCCATTTCCCTGCC

TTGGAGAAACTCTCTGCTTTGGCAACCTAAAATTGCTGTGGAATTCAGAGAAGATAAATGTATTCACAGG

GACTGGAATGTAGTTATTGCTTATCAAGAGCTAATGGTGTGCTAGACACTCTGAAATCCTTTAGATCTAA

ATCTAGATTTAGATTTAATCTTTACAATTCCATGAGGTACCATGGATGCCATTTGGTTCCTATTTTAAAG

AGGAGGAGACAGAGGCACGAAAGATAAGGAAGTTGCTCAGGTATGACAGTAAGTTAGTGGGGTGAGGATT

TGAACCCTGGCAGTCTGGCTCCAGGGTCTGTGTTGTTTACTCATTGTGCTAAAAAAGCAGTCTTCCTGAG

GAACATCACTTGGGTTGGAGAGTGGCCAAGAAGCTTCTGCCCAGCTTTTCTCTTGATTCAGATGAAGCAG

ACCAGAGCCCCAAGTTATCTTAATTGGGGTTGCTACAAAATCCTGGCAACAAACAGCTACCTATAAATGC

CAGCACCATGGCCTCATGGCACTTCTTGGAGGCTGTAAGAGTGCTAATGTTGAGGCTTAGGCTTAAAGAA

TGCAGAAGGCTTAGATGTCCTGAAGCCATTATCTTTTCCACTAGGGCACATAATTGTCCTTGGGCTTAAA

AGCTGAACTAATCTCTGCCAACAAATAGTTGTGTGACCTTGGGGACGCCACTTCACCTTTCTGGAACAAT

AGTATAAAAGATGGCACTTAATAATAATGATAATAGCTGCTATACATGGAGTAGTCACTGTCTGTCAGCA

CTTGGGACAGGTTATTCATTTAAATCTTCCAGAAACACTTGGAGGTTTTTAATCCCCATTTTGCAGAAGC

AAAAATAGGCTCAGAAAGGTCAAGAAACTTTCTCAAGACCACACAGCTCACAAGTAAGTGAACAGACTCC

AAAACAGATGTTTTGGCTCATAAAGTCATGTTTTTAACCACACACTATACAGGATTGAGAAACAAGTAGG

TGCTACAAACAAAGGTTAGAAAACTTTTTTATAAAGGGCAACATAGTAAATATCGACTTCGTGATCCATA

AATGGTTGGTGTTACAAACTACTCAACTCTGTCCCTGTAGTGCAAAAACAACTGTACACTAAGTAAATGC

TGTGTTCCCAGGGGATCCTGGTTGAGACAGCAGATATTCTTGGAGTTCCCAAGAGGGAGAGATCAGGGAG

CATTTGAAGGATCAGTGGCATCTCTGTGCAGGAGGCAGAACTGACAAAATGTCTAGAGAGAGGAAGGAGT

TTTCTGGTGAAGAAAGGGGTATCATCTCATGGGGACAGGGCAGGAGGCAGGCTGGCTAAAACTTGGTGCA

GGGTGAGGGATCCTCCTGGTGGCTCTGGTTGAGAGGAGAAGACTAGGCTTGCTGTGTCCACTGATGCCCC

TGGAGCATGCTCCAGGTGTTTGAGAATCAGCAAGGGAGCCAGGGCACCTGGATCAGAGTGACTAGGACAA

TAGTGGGGAGGGAATCAGAGCAGGAAGGAGAGAACCATACAAGGTCTGGTAGGTTGCTGAAGGACTTTTG

CTTCTCTCTGTATGAAATAAAGACATGCAGAGGGATTTATCTCATTTATGTTTTAAAAGAACATATTTTA

AGGTTAGTAATGGGATGTCCTGATGATGAGTGATGTGAGAAGGAGAATGGAATCAAAGACATCACCTAGA

GTTTGGCCTTGATATGATCAAAATGTTTGGTTTTATTCAGTGGCCATTAATTACCGACTTCTGATCATAT

TCTTTTGAATGAATTATAATTTATAGTGCCCTTATACAGAAAGATTTCTAAATCTCATTATTGGCCCATC

TTTGGATGATTAGTTTTGAATAGAGTTATAGTCAATGAAAATGGCTGTTAAGTCAGGTTTTCTTTTATGA

AACTTGGGAAGGTGGGTTTTGAGAAGTAAAAGCAGAACTTCACATTTGTGATGATTAAATGTGAATGATT

TATATTCAGCCCAACATCTCAATTTATTCAGGTCTTCCAGCTTTGGATCATTTGCAATTTTATTCAGTGT

ATCTTCGTCCAGACTACTGTTAAGATCCTGAAGGGAGAAGGGCATCGGGTCAGGTTATTGAAGACCTAGA

TATGGATTTATGCATTCATTTATGTAACAAACATTTATTGAGAACCTAGTGTACTTCAGGTACTTCTCCA

GGCACTTGGAATGCAGCAATGAACAAAAAAGACAAATAAATAATCCTGCCTTCAGCCACATATCCTGGTG

AAAGAAGAAAGACAATAAACAAACTAATAAAATAATAAAATATGTTAGGAGGTGTTATGAAGAAAAGCAA

AACAGGAAATGAGGAAAGGAAATGCTAAGTGAGTGGTAGTTAGGATTCTTAGTAGGAATGTCACTGGAGG

TCAAGTTAACTTGAAATCATTCACCATTGATGTTTACTTTTGATTCAGCCAGATGAGACTCCACTCAAAT

TGCACTATCATTCAACATCAGTTTCTCTATCTAATTCACGAGGACTCAATCTGTGTTTTTCAAGCCTGGC

TAAATCAAGATAATGCCAACAGAGTGGGGTAGTGCCTTAGAGTACTTGAAAGGTATTATTTCACCTGATC

CCCAAACCTGTGAGGAAGGTAGACTAGATATTGTTTTCATTTCGACAACTGGTGTCACTGAACCACAGGG

GTTTAAGTTAATAACTCAAACTTAGTAAGTGCTAATACTCTATTCAGTGGTAGGATGGTAGTGGTGCTTG

AGGATGTATTTCGTCTATAGATGTGTTTTGTTAGCCTGTAGAATCTTTTGCAAACTTTGAATTAATCACC

AACATTCAAAAACTAGGATATGGCATGCCAGCATTCAGGTTTCTAGTGTGTGTGTGTGTGTGTGTGTGTG

TGTGTGTCTGTGAAGCTTGGGAAACACTGGGCTACCCTTCTCCTGTGGCAACAACTGACTGTCGCTACAT

GATGCAGCTCAGGGCTGGGTGCGCTCTCTGAAGCCCCACCACAGCCTGTAGCTCTGATGTTGCACTGCTG

TTCTCTGTTATGCCTCTGCATGGCCCCTATTGGAGTTTGCGGCTTCCGGTCTTTCATATGCCTCAGTTAC

ATAAGCCTTTTAGCCAGAAGAATTTTTATCATTTTGGCATTATTTTTCTTCAGTGATCCTATCATAGCCC

TTAGTAGTTACACATTATTTTCCAAGTGTTAAAAAACTGTTTAATGATTCGTTCCACAATTTTGTTTAGA

AATTAACATTAAGGATTCCTGGTTGGCTCGTAATCCCTAAAATTTCCTTTCATCCTATAGAAGATTGGTC

AAATTTTTGCTTCCCTCCGGACTCTTAGAATCTGTCCTGATTTCTATCATTTCTCAAATACTATCTGTGG

TTCTGAGGTTGTATATGGAACTTTTTTTTTCTGGTGCCCTAAAATTAGTCCACTGAGTTTCATTATCTTG

GGTTTGAAGTATTTCTTCTATTGTTTATATTTTGGAGACTTTTTTTTCTCGAATTCTATTTCTCTCCCTC

TCTTTCTCTCTCTGACTCTCCCTTTGCAGTCAATGTGGTATACACTACCATTCCACATCTTGAGAGAGAG

CTGTAGTAGTGGTCTGAGGTGGCGATTGTATTATCCAGTAGTCAGGTCCCACGGCAAAGCATGTTGGAGA

AATGATCAGGCTCCAGCAAAGGGCATCAGGAAACAAATCAAGAATGAGAAGGGGTGAGAAGAATAGGCAG

ATCTACACTTCCAAGCTCAAGTGGTCTCCCTGCTGATGCTGGTTGCTGCTCCACATGTAGCAACTGTCTG

GTAAGAGGTATTCCTGGAGCCAAGCTTGTCCAGCAGAATGTGGCTGGCAGATTCTCAACTTGGCCTATAA

TTGCTTTCAGACCCGGACTTCTTTTTAGTTCCTGTTGTTTCAGAGCTCCAACTCATGCAGCATGAGAAGA

ATCTGAGCCTCTTCTCTTTATCAGAGACAAGGTTGGCCAGGTGCGGTGGCTCTTGCCTGCAATCCCAGCA

CTTTGGGAGGCCAAGGCAGATGGACCACTTGAGCCCAGGAGTTTGAGACCAGCCTGGCCAACATGGCAAA

ACTTCATCTCTGGTGGTAGCCACCTGTAATCCCAGCTACTTGGGAGACTGAAGCAGAAGACTCACTTGAA

CCCGGGAGGTGAAAGTTGCAGTGAGCCGAGATTGCACCACTGCACTCCAGCCTGGGTCACAGAGTGAGAC

TCTGTTACAAAATAAAAATAAAAATAAGACTCAAGGTTAGCAGACCTCAAGGTTCAATAGAACACAGATG

TGGACAGCCAGGCCTGCAGCAACCTCCAAAATGATAACCTCTTTAACTGGTGGGTTCGGGAGTTTTTTCT

TCGGTGACTACCAGACTGGCCTCTTTGGTCTGTTTCCTGTAGTGGGATGCACATAAACCCCCTCCATTCC

CAGGACCAGCCTAGCTCCTGCGGGGAGAGTATTAGTGGCAGCCTTCCTACCTTCCCCGTGGGCAGGTCTT

TGGGAAGTAAAAAAATCACAGGAATAAAGTTTTGAGGCTTCATCCTGCCTAACCCAAATTAGCATATTAG

CTGGTATTTATCAGTTCCAGCTCAGCTTTCCCTCAGGCCAGCTACCTCCTCCTGTCCCTGGGTTCCTTGA

GTGTGTGTCTCCATTTACCGTGTCATCTCTGGGTTTATGCCTTGGTCAAGTTTTTAAAGCCATGCAAGCC

CACCGCCAAGACCTTCTCAGCATCTGTCTCTTCTGTTTCTCATTCTTGAGGTCCTCAGCTGGCACTGCCC

TCTTGGATGTTTGTCCATGGCCTCCTGCCTCTGCAGTGAAAGCCCTCCACCTTCCTGTTCTATTCTCTCC

TCTCTGACTTGGCTGGAAGTCTTCCAGCTCTATGAATTTATACACTGAGTCTTGTCTTGTGTCCTCTTTT

CCTAGCAAACAATATGGCATCTAAAACCCAGTTCTACTCTGATAATTTTTTCTTTACAAGATGCTACAGT

ATGATACACCATGCCCACCTGGAGAGAGGATAAAGGTGATGGTGGTAGGACAGAATTTCCATCCGCAATC

TCCGTTTTGAGCAAAGAAGCATGGAGGATGGAAGTCATTGCTGGGACCCCGGAGTAGAGTGGTGGTGGGG

GAACAGGGGGAACATCAGACTGCCGAGGTATGAGTTTGGGTTCTCATCTTCTTCCCAGGAGGCTTTTGAA

ACCCCAGGATGATGCCTCCTAGAGGCCTTGCTGTCAAATTCAATAGGCAATAACATGAAGGATTTACTCA

GCCAGGCTCATGAGACCAGCTCTGAGGAAGCTGTGCTTTTCTTGTACTGATCGGTGATGTGCATCACCCT

AAGGGATAGTAAACAGATGAAACCCAGAAAGTCCAGTCAAAAGAGCACCCTCTGGGAATGAAGATCTAGT

GAAGACTGGGGAGACAGATGAGGAAAGAGTCCTGAACAGGAGCCACTCATTCCAGCTTTGTCTCCATAGC

CTGCCCGCCCACGATGGATGAAGTGCGTGGAGGAGACAGGCACGTTCTTCGAGCCCACGCTGGCGGCTTT

GTTTGTTCGTGAGGCCTTTGGCCCGAGCACCCGAAGTGCTGTATGTGAGAGCTCTTCCCAGCCCACATCC

CTCCACCCCTTCCTACCCAAAGCAGCCTTCCCTCTTCTATTAACTTTGACTTTCTCAGTGGTGTGTGTGA

TTGGGGAATTGGGCAGTCAGAGAAGGGCCACTGAGAGAGGGAACCCAAAGGCCTGCTCCATCCCTGGTGT

GGAAACAGTTCAGCTTCAGGCCACAAATTCTCCATGACATGCTCTCACTTGGACAAGTCACCCAACTTTC

CTGGTCTTGTGTTTCTTCAACCATCAAATGAGAAAATCGAGCCAGGCTCGGTGGCTCACACCTGTAATCC

CAGCACTTTGGGAGGCTGAGGTGGGCGGATCACCTGAGGTCAGGAGTTCAAGACCAGCCTGACCAACATG

GAGAAACCCCATCTCTACTAAAAATACAAAATTAGCTGGGCGTGGTGGTGCATGCCTGTAATCCCAGCTA

CTCGGGAGGCCGAGGCAGGCGAATCGCTTGAACCTGGGCGGCAGAGGTTGCAGTGAGCCGAGATCACGCC

ATTGTACTCTAGCCTGGGTGACAAGAGTGAAACTCCATCTCCAAAAAAAAAAAAGGAAAATTGAACACTA

TCATCTCTAAGTCTCCTCCCTGTTGTAGCTAAGATTTTTTTAACAACACATGACGTGACATCAGAACAGA

TGACATAATCTTGAAGAGGGCAAATAAATCAAATAAATCACCACTGAATACTTTCTGAGTACCTACCACA

TGCCTGGGACTCCTTCAAGAACTTTGCATGAACTACGTCATTTAGTTCCTATTATGATCCTGATTTTATA

CAAGAGGGAACTGAAGCAAAGAGAGGTTAAGTGACTTGCCCAAAGTCACACAGTTACCAAAAAGCAGAGA

CAGGGTTTGAACTCAGGCATTCTGATGCCAGAGCCCAGGCTCTCGATATTGCCTTTCATTTTCCTCCAGG

AAAGGATTTACATGAGATGGCAGGTGGCTGGGGAAGCAGTGAGTACACACTCACGTTGTGAAGGCAGGGA

GACTTGTGGGGGACTTGCTGGGAAGCTGAAGAGCTCAGGAGGATGAGGAGAGGGAGTGGACGGTTTAAAA

AAGACAGTGTGAGAACAAGAGCCCTGAGCCAGAGGAGAAAATGACAGCCCTCTCCTCCCTCTGATTTCTG

AGAGGTGTTCCTGCCCCCAGGAGTGAGGACACTGTCTTTCTCCTGTGTCAGGCTATTTCCCCATGGAAAG

GAACTATATCTCCCTGATGGCCCTCACGGATGGCCAGGCCCCACCTTCCCTTTGTGGGCTTGGCACTGCC

TTCCTTTCTCCACAGATCCTTTAGTTGCTTTAGTTGAGCTGCTCCTCTAGCAGCAGCTCCAGCCCAGGCA

GCTCCTTGGGGCCAAGCCCTTTTCCAAGGGTCAGAAGCTGTGGGCAGGGCCAGGCTGAGGCCTCTCCTGA

TCCTGTCCCCCTGTCCCTGGACCTCACTCCCACAGGCCATGAAATTATTCACTGCGATCCGGGATGCCCT

CATCACTCGCCTCAGAAACCTTCCCTGGATGAATGAGGAGACCCAGAACATGGCCCAGGACAAGGTCAGG

CCAGGCGTCCTGGCTGGTGTGGGAGCCTGTGCAGGGAATGGAGTATTGGAACAAGCGAGATGGGGATTGG

AAGCAAATGCCAAAGGCCCCCCCAGGCACATGCTAAGTAGGGAAGCCACTGGGCTGTATACTCACACTGG

CAACAATGTGAGAGGCTGGGACAGGGCAACGAGTGGGAGAAATTTCCTCTGGTAGACTCGGAGAGTATTC

CTAGCCTCTTCTGTGTCTCTCTCCAGGTTGCTCAACTGCAGGTGGAGATGGGGGCTTCAGAATGGGCCCT

GAAGCCAGAGCTGGCCCGACAAGAATACAACGATGTGGGTCCCTGTGTTTTCCAGCTCCTTTTCAGTCCT

TGACTTCTCGTCACTTCTCTGACCCTCCTAAGTCTTTGTTGGACAATCAGTTTTCCCTGGGTGACTTAGC

TCTGTCCTTACTCTGGTGCTGGCTGGGGTTGATGGGGAAATATCCACACTGTACGTCTTGCTGGCAGAAG

AACAGAATCTTTTCAGGTCCCAACGCATGTGCCAACACACATGCATGCATCCTGTGACTTGTCTGGGCGT

GTTCATCTGTGTGCTGATATGTGTAAAGCCTGGGTGTGCTGTGTAGTGATGCCATTGGGCTGCTCTCTCC

TAATCCCTGGATGCCTGCCTGTCAGGGCTTGCCTGTTTGGGGTCAAATGGTCCCATTGGTGTTTGTCAGC

GTGCATCTATAGAAGTCTCTGTGTGCCCAAGTCACCTCCTGCCTCTTCCCCAGATACAGCTTGGATCGAG

CTTCCTGCAGTCTGTCCTGAGCTGTGTCCGGTCCCTCCGAGCTAGAATTGTCCAGAGCTTCTTGCAGCCT

CACCCCCAACACAGGTATGACAGCAGGGGAGACACAGGCACTCCATCCCAGAGAGACCCATCCATGATTC

ACAGGAAAGGAAGCCAGGGCTCAGGGCAGGCAGCATGAACAGTAATGGTAGTTGGGAGGGACTGTGTAGG

TCTCAGGGTGGCAGGGCAATACGTGGTGGGGGCTGGAGTTCACATGTCCTCTTCCCACAGGTGGAAGGTG

TCCCCTTGGGACGTCAATGCTTACTATTCGGTATCTGACCATGTGGTAGTCTTTCCAGCTGGACTCCTCC

AACCCCCATTCTTCCACCCTGGCTATCCCAGGTATGGGTCACTCTGTAAGGGTAGGTAGGGAGTTTCCCA

AGAGGGGCCGACAGGTGTTATGATGGATGGGACTTACGGTTGGAGAATTGGGGTCACAAATGCTGAGAGA

TTCTGGGGGTCAAATAAGCCCTTGTCTCCCTAGAGCCGTGAACTTTGGCGCTGCTGGCAGCATCATGGCC

CACGAGCTGTTGCACATCTTCTACCAGCTCTGTGGGTAACAGGGGCCACTGGGAGGTGGGATAATAGGGA

ACCTAAGGGAAGACCACAAGGGAGGCCTGGAGGGGAAAGGGAGGTTATTTGAGGGTTTGAGGTGGGGCAG

TCCTGGGAACTTTGCCATGCTCCTGGGAGCTGATTCAGTCTGTGGTACCACCCACATCCTCACCTAGGCA

GCACCAACCCTATGTTCTCTTGCTGTATGTTCTCTTGTCCCATTTTCAACAGTACTGCCTGGGGGCTGCC

TCGCCTGTGACAACCATGCCCTCCAGGAAGCTCACCTGTGCCTGAAGCGCCATTATGCTGCCTTTCCATT

ACCTAGCAGAACCTCCTTCAATGACTCCCTCACATTCTTAGAGAATGCTGCAGACGTTGGGGGGCTAGCC

ATCGCGCTGCAGGTATGCAAGTGTCAAGGGCCACAGTTTATGTGTACTGGCAGACTAGAAAACATGTCCT

CAAGTTTTCCTTCCACCATTCCTGACACAAGTACAGTTGCATGGCTTTCTGCCCTTCGCATCCCCACTGA

ATAGACGGCAACTTGGGGATCCCCCTCCTACCCCAGAGATCCTCCATTTTAGGACATCTATAGGTCTTCT

GGGAAGTACTCTTTCTTCTGGCTCAGATCAACTAGTCAGTGCAGAACCAGTGAGCAAGGGCCATGGGTTT

TGGGTACTGTGTGGAGGGACTTTCAAATGGCCACAGGTCTAGAGCCTGATGGCCCTTCTCTACCCACCCC

TACCCAGGCATACAGCAAGAGGCTGTTACGGCACCATGGGGAGACTGTCCTGCCCAGCCTGGACCTCAGC

CCCCAGCAGATCTTCTTTCGAAGCTATGCCCAGGTAGGCAGCGGCCACCTCCCGCCACAGCTTGCTTTAT

GTCAGTTGAACGCCTTATTACTGAAGCTCATGGAAGTCCCCTCTTCAGACACTCCGTCAAATACCCCAAA

CCCTCTTCTGCAGATGTCCTCACTGTTATCTTTTCTCTTCCCTCCCTACCCCTTGGAATCACCCCTCAGA

TGACTACAGGTTCTTCTACCTAATTCAGCACCCCCACAACTCAAAAGGTAGAAAAAACTCTATTCCCAAG

TTCCTCCAGGAGAGGAGGAGACCAACTTTTTTTTCCTCTCATACCCCCAAAATACAGATGCCTTAAAAAT

GAGCCTGTGGTTGGGCACAGTGGCTCACACCTGTAATCCTGGCACTCTAGGAGGCCGAGGTGGGCGGATC

ACTTGAGATCAGGAGTTTAAGACCAGCCTGGCCAATATGGTGAAACCCCGTCTCTACTAAAAATACAAAA

CTTAGCTGGGCTTGGTGGCGGGCGCCTGTAATCCCAGCTACTTGAGAGGCTGAGGCACGAGAATCGCTTG

AACCTGGGAGGCGGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCCAGGCTGGGTGGTAGAGCAAG

ACTCAGTCTCACAAAAAAAAAAAAAAAGCCTGCGACAGGCTGACTGTGTGCCACATTCCTCTTCAGACAC

CTGACCTTAGGTGTGGCGCCCACTTGACATCACCTCCTTAAGCACCCTGTACTCCCTCAACAGACTCAGG

TGCCAGGTCTTCAACACGCTTAGATTAGACTTCACCCCAGAGCTCCTGCGCTAGACCCTGCCTCTCTGTC

ATTGATAAATGGTATCATTACACAGCCCAGGCCCTCCTCCTGGACTCCTATTGCCAGATTAAATGAACTA

TACATTTCAAATGCTCCATGTGGCCCTTGGGGCACTTGATCCCCTGGTTCCCCTCTTTGTCTGCTGTCCC

TGATCACCCCTTGTCACCGGGTCAGCTTTGTCCTGTGGACCCTCCCCCTTCAATGACCTCTCTTCCTGCT

CAGGTGATGTGTAGGAAGCCCAGCCCCCAGGACTCTCACGACACTCACAGCCCTCCACACCTCCGAGTCC

ACGGGCCCCTCAGCAGCACCCCAGCCTTTGCCAGGTATTTCCGCTGTGCACGTGGTGCTCTCTTGAACCC

CTCCAGCCGCTGCCAGCTCTGGTAACTTGGTTACCAAAGATGCCACAGCACAGAAATATCGACCAACACC

TCCCTGGTCACATCCATGGAATCAGAGCAAGATTTCCTTTCTGCTTCTGTTCCAAAAATAAAAGCTGGCA

CTTGGCTTCCGCTTGTCTCTTAA

As used herein, “ALAS2”, “ASB”, “ANH1”, or “5′-aminolevulinate synthase 2” refers to an erythroid-specific, mitochondrially located enzyme that catalyzes the condensation of glycine and succinyl-CoA to form 5-aminolevulinic acid, the first step of heme biosynthesis. Sequences for ALAS2 are known for a number of species, e.g., human ALAS2 (NCBI Gene ID: 212); its nucleic acid sequence (e.g., NG_008983.1), mRNA sequences (e.g., NM_001037967.3), and polypeptide sequences (e.g., NP_001033056.1) are known in the art. These sequences, together with any naturally occurring allelic and splice variants and processed forms thereof that catalyze the same reaction, are contemplated for use in the methods and compositions described herein.

In some embodiments of any of the aspects, the ALAS2 enhancer element includes or is derived from human ALAS2 sequences having the following nucleic acid sequence NG_008983.1 (SEQ ID NO: 41):

NG_008983.1: 5088-27010 Homo sapiens 5′-aminolevulinate synthase 2 (ALAS2), RefSeqGene (LRG_L163) on chromosome X

ACCTGTCATTCGTTCGTCCTCAGTGCAGGGCAACAGGTAAGAGCTGCTTTCAGCCTGGCACCCTATCTCT

GGTCTGCCAGCTGGTCTCTCAGGGCTGTACACACTGACTCTCTGGTCTGAGTAGATCTGACTTTTTCCTT

TGTTTGTTTCTTAGAATCTGTCTCTTTTTCATTTTCTTTTTATCTCCCATGTCTCTTTCTGTCTTTCCTC

ATTTTCAGCTTTTTTCTCTCTTTTTCCCTTCGTTACTTTCTTTTGTTAGTTTTCAAGATCATTCATTTCA

TTTCATCATTCTCTGACACTCTTGCTTTCTCTTATTTTTCCCTCTGAATTCTAACTATCTTTTTCTCTAA

ATTTCTTTCTCTCCCCCTTTTTGTCTCTTTCCTCGGCTTTGTATCTCTCCGTCTCTGTGTTTCTGTCTCT

CTCTTCCTCTCTATCAAGAACGATGGCTTAATATTTCTTCCTGCAATTCCCCATTCCTCTCTCCCTTTGA

CTCCCTCTACCTGCTGGGCTGACAGCAGAGCTCAGTGGGTCAGAGCCCATGGGGAGCCTAGGGGTGGGGG

AAGAGCTAGGGAGGGAAACTAAGAGGATGTGGGGGTGATGGGAATGATGAATTGGGTAAGGAGAGATTTG

GGGAATTGAGAGATGAATAATTAGCAGAAATAAGTGAAGAAAGTGGAAGAGGAATGTAGTGTCACTATAC

AGAAAGTAAACAGATTTCTATTCTCATCCTAATTCACTGTGAGACCCTAGGCAAGTCATTCACTCTCTGA

AAAAAAGGCTTGGCCTGTAATTTCCACCACCCTTTCTAGTTTTGATTTTGTGATCTTCTAAATTTTCCTG

TTTCTAAGAATTTCTGATTCTCTGATTACAGTTATCTAAAGTTCTGTATGATTCTTTCATGGTGGGAAAG

GGGTACTAGGAAGAGAAGTAAGGCCTGATGTTTCCAACTCCTGAAGAGAAATTACCACTTCCCTTCCAGA

CCTAATTGACTTTTGCAAAGCAGGCCACAAAAGGGGTGGGGGGGTGGGGGACAAGGAATGCTGCAATGAG

TGTTTTCTGGCTGTCTGCTGGGGTAGAGTTGCAGTTGGCCCTTTTCACCTCTGGGAGTACAGATTGGGTG

CTGACACAAGAGAGGATTTTAAAGTCGTAGGGAAAAACTTTCAGTAATGATCTGTTACTTGGTCTCAAAT

TTCACCATCATCTCTTTGGTTAAAAGTATTGTTTTAAGAAGATGCCTGGCAAGCATTATCACACATTAGG

TACATAAGTTATTGAATGGTAGAGTAAATGAATATTCAACAGTACCTGAAATTCCACTGTAGTTACAGAT

CTGTTCCTTTGGTAAGGCATTGGTGACAAATGGCATATGACCTGGAAAGAGGCCTATGTTAGTGCAGCAG

AGGAGATAAATGTCTAGAGTCAGGCCCTCAGTCAAGAAAAAAAGGTAGTAATATTTGAATCACAGATCCA

TAATGGTTAAGTTAGGAATCTCTGGAAACAGATTGCCTAGGTTCAAATCCTGCTTCTCCTATGTACTAGC

TTTCTGATCTAGACAGGTTACTTAATCTTTTTGGGATTCAGTTTCCCTATCATCACAGGGTTGACATGAG

AACACGGCCTGGCACAGAGGGCTCTGTAAGTGTTTGACTATCAGAACTAGGCGGAATCTATGAAATTATC

TAGTCCAATGTCAGTGGAGAAACGGAAGCCCAGAGAGGGGAATTACAGAGCCCAAGTTCACACAATAAAT

TGTAACAGGATTGGGACAAGAATCAATTCTCTAGCTTCCCAAACCCAGCCTGGTATATTCATGTGACTTC

CCTTGGCTGTACGTTCATTTTTTCTACATGGGAAATGGAGAAAATAAAAATAATAAAGTCTATCAATTAA

ATATAATATTTAACACTTTTTTACTGTTTACTCTGGGATAGGTACTCTGCTAAATGCTTTATATGGATTA

TCTTACTGAATCTTCACAACATTCCTGTGATGCAGATTGTCCTTGTTATTACCAACATTTTCCAGATATA

AGATGTACAGCAGGGAAGTGACTTTTCTAAGGTCCCAAAGCTAGTGAGTGGTGGAGCCAGGATTCAAACC

CAAGTAGTTTGGCTCTAGAGCCTATACTCTTTATACCCTAAATTGACTAAAATGCTTCCTTGATTCAATT

TTACTCACTCTAGTCTCTTGGTAGGTAATGAGATGGAATAGAAACAGAGCCCATGGTAACTAGACTACAA

GGTCATGGGTATAATGATGGCCAGGCAGAGTGAGGCAGAGCAAATTTCAGGAAAGGAGTAACAGAACAAG

AGAAATGAGAACAGGAGCTTGAAAGAACTTGAGAATTCAACAAATTCCAAGAAGTGGTCTATATTTTCCC

AGGACCCTGAGCATATCATGGCCAAAAGCCCCCTAGTAATGATGTGTGTTAATTTCTCCTGTTTTTATAT

ACAGGAGGTAGGTCTTCTCCACCATCCCAAGGCAGGACTGGACTTTGCCTCCAATATTGGGGGCTTTCCT

TCCCACTACATACCCCAATGTTGTTGGCATTATTGTTGCCAGTATTGATGTTAGGGGAGTTTACAGGAGC

CTGGAGCCTTGTCATCTGCCTTGCCTGCACTTCTGGGCCATCCATTTCTTACCACCAATAGCCAGGGCCA

GCTCTAGCCAGATGCTCAGACGTGATTCCAGGAAGGGGCTCCTCTTCTCTCCCACGCCCTGGTCTCAGCT

TGGGGAGTGGTCAGACCCCAATGGCGATAAACTCTGGCAACTTTATCTGTGGTCTGCAGGCTCAGCCCCA

AGTGCTTTAGCTTTCACAAGCAGGCAGGGGAAGGGAAACACATATCTCCAGATATGAGGTAGGCACTGGA

TCCAATTCCTTACCTACCTTGTGAAGTGGCCATAATTACCTCACGTTTGACAGCTGATGAAGGCCAAGAT

CCAGAGAGGGGAAGTGATTTGAACAAGAACATCCAACAATGAAATTGGAGAGCTGGAATTTTAATAAGAA

AAGCTAACATTTATTGAAGATTTACTATGTGCCAAAAACTATACTAAAGGCTTAACTTGGATTGTTTCAT

TTAGTCCCTCCAACAACCCTTCTGTCTTTTCCAATTTCAGGGCCCACATGCCTTGGCCCCACATACCAAC

CCAGGCTGCTGTGACAGCCCATGAGAGGGGGAGAGGTTGCTCTGGGATGGAACAAGAAAAAGAGGTTGTT

TTGTGAGGTACGGGGAGGGTGCTTGTTCTATGAGATCAGGAAGGGAGGGAGATGAAGGAGGTTGCCATAT

GAGGGCAGGGCCATGAGCTGACCTGTCCCTCAAAACATAAGGCTGAGGGTGCTAGTAGATTCTACTCAGT

AACTTTCTTCACAGTGTCAGTGCTTTAGTCTTCTCACATTCTCCCATGTCTCTCCCATTGTACTGTCCCT

TATCTTGTCTCACTTTTTGACTCTGTCTTTCCAATTTGCCCTTTTTCTTTACATCTGTCTCTCCTTCTTG

CTCTCTCTAGCTGTCTTTCTCTTGGTGTCTCTCAGCTCTCACCCCTCTTAACCCTCATCCCCCTGCTTTA

GTCACCTCTCTGTCTCTATCCTTTGATCTTGTCATTTTCTCTACTCTCTTCTCTCTGTCCCTCAGTCTCT

CTCTCATCTCCCTCAATTAGGGCCATGATTCTCTTCCCTAAACTTACTTAGCCTTTTGCAATTTCTGGCA

GCATTTTTTTATGTTTGTGTCTGACTGACTCTCTACCCCTGCTGGATCCTCTCCACTCCTGTTCTCACTT

CTATGAATCTTTGTATAATCCTCTAGACTCATTGATCCCTCCTCATGTCCCTTTCGTGCCCCTTGGTCTA

TCTGTCTCTGCCTTTATCCCTGTGTGCACTATCACCACCCCCTTTTTCTTTTTTCATTTTCTCTTTCTCT

CGACTCAATCTCTGTTTTCATCTCTACCCTGCTCCCTTTCCCTCTACCTTTGATCTCTTTTTCCCCCTCA

ATTTCTGTTCTTTTAACTCTACCACCACCACCACATCTTTGTTCTCTCTCTACTTTCCTCCTTTTATCTT

TCCTAAATTTTCTTTTCTTCTGGCTTTTCTCCTAGTCCCTTCTCCTTCCTCAATTTCAGACTCTGTTCAT

TCATCAATTTACCCCAAAATTCAACAAATATTTATTGAGTGCCTGTGTGTCATTTGCTTTCTCTTTTTCT

GATCTCTTTGCCCCCTTTCTCTTCTCTGTCTTGGCCTCTGCCTGTTTCACTAATCCATAGACTATGTCTT

TGTCCCTGTTTTCCAGCCCCACTGGGACTTGCTTTCACCTCTTCCTATATCTGTGCTTATCCAAGAGACA

GGAGCAAATTCAAAGACAGCATAATATCAGGCTGGTGGTACACATTCTGTAGGACCTAGGGCCTACCCTT

CCTTCCGGATCCCTTGATTTCCTTAAACTGATACATGTGACCTCAAGCTCCTTCTCCCCTCTGGCTGATC

CTGCTTAGGAAACACCCTGGGCCAAGCCTCAGGAGCTCTACTCAATGACATATGTTTGCATTAGCAGGCT

GAATCTTCACTTGGCTAAGACCAACATTCTTAGAAAGATTCTTGGCCTTAAGTATTGATCAAAGGGTTAG

TGGGTTGGCAGTTCTCATCCTGCCACACAAAAACACATTTCAGTGATCCTCATCATCACAGAGGTAGTCA

GTGCCAGAATGTGAGTCAGAATCCAGGCTTTCTGACCTCCAGTTAGAACTGTTTCCTTCACCCCTTTGCC

CAGTAGTCAGTTTCCTATTTCTTCCTCCCTCATGTTTTATTGGTACATGTTAACATTGGGAAAGAAGTTC

TTTCCCTGGAAGGGCAATAAGAGCATCTCGGAGGCAGCAAGTTTTGGGTGGGAAGCTGAAGACGAGGATC

AAAGGCTTGGCTTTTTGCCAGGCCCTCATGATGGAACCTCATCTCTTCCATGTCTTCTGCAGGACTTTAG

GTTCAAGATGGTGACTGCAGCCATGCTGCTACAGTGCTGCCCAGTGCTTGCCCGGGGCCCCACAAGCCTC

CTAGGCAAGGTGGTTAAGACTCACCAGTTCCTGTTTGGTATTGGACGCTGTCCCATCCTGGCTACCCAAG

GACCAAACTGTTCTCAAATCCACCTTAAGGCAACAAAGGCTGGAGGAGGTAAGAAGAGGCTGCTAGCAAA

AGGGGAGAATGTTAGGGTCCTGGGGTAAAAGTTCCAAGTTATACTGGCCATCTTTGCCTAATAATTAGGA

CGGTTCATGTGAAAAGTGTCAAGATAGCATGAACTGGCCCCAAAATATACCCAGAATCTGTCTTCTGCCA

GGTTCTCTAGAAAGAGTCTCATTCTCGGCCAGGCACAGTGGCTCACGCCTGTAATCCCAGCACTTTGGGA

GGCCGAGGCGAGTGGATCACGAGGTCAGGAGTTCAAGACCACCCTGGCCAAGATGGTGAAATCCCATATC

TACTAAAAATAAAAAAATTAGCCAGGAGTGGTGGTGGGCGCCTGTAATCCCAGCTGCTTGGGAGGCTGAG

GCAGAGAATTGCTTGAACCCAGGAGGCGGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCCAGCCT

GGGCAACAGAGCGAGAATCTGTCAAAGAAAAGAAAAGAAAAGAAAAGAAACAGTCTCACTGTCATGTCCC

TCACACACTATACTCCAGACATGCTGAAACTACTTAAAATTGCCTAAATCAACTATTCTGTCAAGAGTTT

GTGCCTTTGCTCCTGTCAGATTACCCTCTCCTAGACCCTGTACTGGAGAATCTCATACTTCTCATTTGAC

ACTAAGCTTGGCCATCATCTCCTCTGCAAAGCCTGCTTAGACCTCCAAACTGTCTAATTCCAATTCTGGC

TCATTTCCCCTCCCTCTTCTGGACTTCTGTAGCCCATGTACTTCCTCTATCCCAGCACTGTTCACAATGT

GTCTTCAGTGTATGCCATTCCCACCAGTTTAGTAGCTCCCCTAGCACAGGGACCAGACTCATCTATCTCT

GTGTCTCTACAATAGCCTGAGATAGGGCTTTAGGGGTACATTAGATCTCAGCAATTATTGTTGAGCTGAA

CTTATGACTAGAAATGCACCCCAAATTACTCTCTTACCTTTGCATAGATTCTCCATCTTGGGCGAAGGGC

CACTGTCCCTTCATGCTGTCGGAACTCCAGGATGGGAAGAGCAAGATTGTGCAGAAGGCAGCCCCAGAAG

TCCAGGAAGATGTGAAGGCTTTCAAGACAGGTTGGAGTCAAGTTCCACCTTATGCAACCTTTACTCCTAA

TGCTTGAACACACTACGTCACAGTCCTGAGCTAGGCTAATACAAAAGCAGCCAGTACACATCCCATGATG

AGAAGTCCAGTCTTTCCAGGGGAGCCATGGTAGGCAACAGTTTAGGCTGTATGCTGAAGCACACCATACC

TGACAAACACATATGTACGGGCTCCTGAAACTTTTAGTCATTATTCTAAGATGAGCCCTCTAGAATTTTG

ACTCCTCTTTTTCAGGTGGCTAAACTGATCCCAACAGGCTGGGGTCCCACATTTCAGCAAGACCACTCTA

TGAGAATATGGATTTGCATGAAAGAGAAAGAGCTGGGAGTAGGTACCTCCTTTAACCAGGGTGCAGATCC

CCAGGTCAACTTAATTAGTGCAGACCACCCAAGATAATCACCCTTGAGATATGGCCACACTGTTGACATC

TTTCATAGGCCCCTTTGGGATATCATTAAGGACAAAAACTTCAAAATTGAAATTTAATGATGTTTAGAAA

AGAAGAGTAAGGTACATTATCCTGCATCTACTTTCTAAATGCAGGACCCAGGGTGGCTGCTCCAGTTACC

TGAGCCAAGGGAAAATCCTAGTGGAGAGAAGTATGATTCACCTTATAGAAGGTTTCCTAACAATGTAATA

GTCTCCATTCGGGGGGATAAATAGAAGCTCACCTTGGAGAAGATTTCTTCTCGCTGTAGAAGCTGCCCTT

ACCTTATAAACTTGAATTTTCATGTGTTGCATTGAGCTTAAAGAGGACAACACATGCTTTCTTTTTCCCC

CATTCTCTTCACGGCCAATGAATCTCACATTCCGTCTCAGATCTGCCTAGCTCCCTGGTCTCAGTCAGCC

TAAGGAAGCCATTTTCCGGTCCCCAGGAGCAGGAGCAGATCTCTGGGAAGGTCACACACCTGATTCAGAA

CAATATGCCTGGTGAGTTTGCTGAGGTGGAAAAAAAGGGGACCGGAATAGGGAAGGCATTCTGAAAGGGC

CTCTGTCACAGTAGGGGAAACAGTACAGAAGGGCCTTGGAACCAAAGGAAATTTGAGTTTAAAATTTAAT

GCTGGCACTTGCTGGATCTAGGTGTTTTGGCAAGTAAGACACTTTCCTTCAGTGGCATTTAATACCTACC

TCAATAGGTTACCATGAGAAGAAAGTGAAATTACATTTATGGAAGTGTTTCTAATGAGGCTTCATTAAAT

ATTAGGCTTATTTCCATTATTTCTTCTCTATGCTTCCCTCAAAAACTTTCACCCTTCATACAGCACCTTT

TCCCCATTCTTATATGTGTTTATATTCCTTTCCATAATGACATTTACATTATTTTCTAATGTAAAAGGAA

TATGATTCATGGTAAAATATTTTTCAACATATACAGGAAAGTATAAGGAGGGAAATTTAAGTCATGCAGA

GTTCCACCATTAAGTTTTTGTTATATTTTCTCCCAGATATTTTTCTATGGCTACACACACACACACACAC

ACACACACACACACACCCTCTGCTCTCTTCACCACACCCATGCTTTTGTTAGAAGTGTGATCTTATTTTA

CCTGGAGTTCGTTATGCTGTTTTGTTCACTTAAAAATATGTCATGGGTATAGTATGGATTCAATATCATT

CAGTTAATCAAGCATCTATAATTTAAGTTGTTTCCAATTTTTTGTATTCTCTCAGTTTAGATTGTAGGTT

GGTTTTACATACATACAAATGTACTCAAAGAAAATGTATAGTATTACTTTTTTCAATTTTTATTTTTACC

TAATAATATCTTGCTATATATTTTACTCTGTGCCCTTTTTTCACTCAACAATATACTGTGGAAATGCTTC

CACTTTAACACATATGTATCTACCTTATTTTTCAATGCTTCAAAATATTTTGTAGTATAGATATAATAGA

GATTATTTGGCTACTCCTCTATTTGGTTGCTTCCAATTTTTTCTATTACAAACAGTGGTGCAACAAACAT

CCTTGAATGTATCTCCTTGTGTACACAGGCAAGTGTTTCTCCAGGATAAACACTCAGTGGTGGAAATTCT

TGGGATGTAAGGATGTGTACATTTTTGATATTAATACATTTTGTCAATTAGCCCTCCAACATGGCTGTAC

CAGTTATCAAGGAGGGTATCCATAGTCTCATACCCTTACCAGCCCTTGATATTATCAAACTTTAAATCTT

TATCAATTGATAGGTGAAATTTTGTTTTCCCAGTTTTATTTTTCCTGATTAAGAATCTTTTTCTACATTT

ATTGAATTGTCTGTTCATATTCTATGCCCATTTTTCTACTGAGTTGAAATTTTTCATGTTAATTTTTCAG

AGATTATATAATAAATTCTGAGTATCAATCATTTGTCTGTTAAGTATGCTGCAAATATTTCTCTAGATAT

GTCAGTATGTGCATTTAAAAAACTTTTGATATGTATTTCCAAACATCTCTGCAGCAAGGATGTTACCAGT

TTGCACCTCCAGCAGCCATATAAATTGCTGTCTGCAACATGATTTCTGTCTCACGTAAAGAGTTCTAGAG

TTTAACAAGCTCTTTGGCAAACGTTATTTCAATTTATCCTAGAAATAAAGTTACCCCATTTTGTAGTGGT

AATGGTTAAAGAAGTGGGCTCTGAGTTACTTACTTGATGAACACTTACTTGCTGCATGACCCTGGTCAAG

TTGTCTAACACTTAATGCCCCAGTTCCCTCATCTGTAAAATGGAGATACTAATAGAACTGTCCATGGAGC

ATTGTTGTGAGGAATAAATTAAATATTTATAAAGTTCCTAGGAAAGAACTTACATGTACTAGGCATTCAT

TAAATGTTAGCTATAATGATGTAATTGAATATTAGCTATCTTTATTAGTATTATTATGACTACTAATACT

ATAGCAGTAATAATACTACTATTACCATGTGCCATTTATTAGTTTGAATATATTACATGTTGTTGGTTGT

CAGATGCTCACAACTCTCCAAGGAAAGTATTATTAGCCTCATTCTACAAATAAAGAAATTTAAAGTAAGA

AAGAAGATTCATGACTTGTTCAAGGCCACACAGCTAGGAAGTGGCAAAGAGATCGCTAGAAACAAGATCT

GTTGATACTCCTTCCAGTGAGACTGAAAGCAGTGATTCTAGTAAGGAGGCTGCCACACCAACCCGGGAAG

AGAGATGAGGCCATAAGAAAGTCTAAATGAATGTGTGAATGAACTACTGAGTGAATGAGTGAATGAGTAA

GCAAAAGGATGGCTGAATGAAGTAGTAGAGAGTTAATGTGGTCCATAAGTCAATGACTGAGCAAATAAAT

GAATATGTGGAAAAAGAGTTGGAGAACTCAAAATCAGCAACATGGGTAAAATACAGACTAGCCAGGGAGA

GACTTAAAACGAATTCTTTTCATCCTCATATCTGCTCCTGCAGGAAACTATGTCTTCAGTTATGACCAGT

TTTTCAGGGACAAGATCATGGAGAAGAAACAGGATCACACCTACCGTGTGTTCAAGACTGTGAACCGCTG

GGCTGATGCATATCCCTTTGCCCAACATTTCTCTGAGGCATCTGTGGCCTCAAAGGATGTGTCCGTCTGG

TGTAGTAATGATTACCTGGGCATGAGCCGACACCCTCAGGTCTTGCAAGCCACACAGTGAGTAGTAGGCT

TTCAGCCATCAGCAGTGGCCAGAGGAGATGAAAAACCACACATGGAAAAAAAAAAAAGGCAGAGCTGGCA

GTGGAAACTTGGGTTCTATCACCACTTCTTTTGTCCAAGGTCCTCCATCATATCTATTCCTTGGATATGA

AATAAGTCAACACACCATGTTTCCCAAACTCTTCGGTGTCCAATGCTATGGAGGGGAAGGATGGGAGACC

AAGCAAGGCCCACTCTGCCTGAGTTTTTAATCTAGCTGCAGAATTAGTATTGCCAGAGATGGAGTGTGAC

TTCCTCTAGGTCTTCCAAACTACTCAAGCTCAACCTAGCTTCTCCCTCTCTCCCTGAGTACCTCCAGTCC

TAGAAGGAAGGCACATGTCTCCCTATCCTCCCCATCCTTCCCTCTACTTTGTCTCATAGGACACAGTTTA

TATAGGATCACTAACTCAACATTGACTCCCATCAAGGAAGAGAAACCTACCCAGTTCCTCGATGCCTGAC

AAGAGTTTCTTTTTCTCCTTTTCTCCTGTTTTCTCCTGGCCAGGGAGACCCTGCAGCGTCATGGTGCTGG

AGCTGGTGGCACCCGCAACATCTCAGGCACCAGTAAGTTTCATGTGGAGCTTGAGCAGGAGCTGGCTGAG

CTGCACCAGAAGGACTCAGCCCTGCTCTTCTCCTCCTGCTTTGTTGCCAATGACTCTACTCTCTTCACCT

TGGCCAAGATCCTGCCAGGTAAGCCTGAGGCCTGAGCTTTGTTCAGGGCTGGTATCCTGCAATACAGCAT

CCAGTTTCACTGGTTCCATCACTCCTTCCCTGTATTTGGAGTTCCCTCACTCCCATTGTTCTTCCTTCTT

ATCCACCTTGCATATCCTCAACACTGGATAATTATATCCCTCTGCTTTCTCTCCTTCTGCACGTAGAGAG

GACCATTACCGGGGAACATTACCCCACCTCACAGAAAGGAAACACTATAAATTCATCACCTCCCAACTCA

ACTGAGCTCTTAACACACATACATAGTTATTTTATGTCTCCACAGGAGCTTTTTCAAACTTCTTCTCCTC

TTCTAAAACCTCTGACTACCTTCTCCTCCACACTTAGCAAATAACCTCACATCTTACTTCACAATAAAAA

CAGAAGCCCCAGACAGAGAATCCTTATTTATTGCCACCAAACCTACGAACTTATCTAATTGTTTATCTAG

CCTTGCCTCATTCTTTCCTTTTACAATGGAAGGCATATCTCTCCTTCTGCCTAAAACCAATCCCTTCACT

TGTACACTGGTTCCCATATTCCCAGTCTCCTACTCTCTAGTCTGTAATGTCCTCACCTCATACGCCTTGT

TGTCCTTCCGCCAAGGCCCAATCCAGAATGAATACAACCCTCCATCTTCACTATATCAATTCCGGGCTCA

TACAGTTGCTCAGACAGGAGTCACTAAAAATTCATACTCTTAACCTCTACTGGGTTCTCCATGGTCTCTG

ACAATCCCATTTCCCTGGTCAGTTCTCGAAGTTTATGGGGCAGTTTTGCCAAACCACCATTATCCTCAGC

CTTCCCACACCCCCTCCTCCCCATCTCCCTCAGCAGACAACTTCATGTTCTACTACATTCAAAATAGAAG

ATACCAGACAGCAATGTCCTTGACTCCCAGCCACAAAGCACCTACAAACTCATAAGCATCTTCAAATGTC

CTCTCCTCACTCCTTCTCTTCTGTCATAGTGGAAGAAGTATCCTTTTTCTTGTGACTAATCCTTCCACTG

TTGCTCTGTGCCCCATTCCCCTCTACCACCTTAGGAATCTTGACCTATTGGCTCTCTCCTCCTCTCCTGT

ATCTTCAGCCTCTCCCTCTCTTTAAACATGTTTTCAAGTCTCTTGTATCTTATAAAAAAACATTGCCTCA

ACCCCTGATCACTCTCTAGCTACTGCCCTCTTTCCTCCCTATAACAGGCAAACTGCTTGAGAGAAGTCTT

CGCTCTTACTATCTACTTCCTCACCTCCTGCTGATTCTTCAGCACAGCAAAAATATTACCACCACTTCTC

AGAAACTTTTTTTGAGTCCACCCATAAGCCCCAACTAAACTCAACATCTTTAAGTTGTTTTTAGTCCATC

CCCTCCTCAACCATTAAACTTCTTTCCATCTCTACTGCCAGCATCCTAGCCTGATCCAACATCATTTTTT

AAAGAAAATTTTACCTTTGCCCTCCGATAATCTATTCTTTACAACAGTCAGAATTTTTTTTAATGCAAAA

CTATCTTTGTCACCCCACCCTCAGCCCTGGTCAAAACCCTTTAGTGGACCCCCATTCCCCCAGGACCAAA

TCCAAATTTCTTATCACAGCTTCTAAAGTTCTCAATAATCTGGCTTCTATGTATCTCTTCGGTCTCACCT

TTTTGCATCCCTCCTCTCACTATTTCATTCAGTAATACATTCATTCATATACTCATTCACTTACTTATAA

ATCTGTCATCAGTTTATTTATCCATTCATTTAATAAATGTTTACTTAGCATCTACTGTGTGCTTACTCTT

ATACTGGACACCAGAGACAGAGAGATAATAAGATGTTTTTGCTCCCATGCAACTCCCAGTCTGCTTGTCT

TTCAAGCCATTTTCTCCAGAAAGCCATAACTCATTTTCTCAGGTGGAAGTTATCCCTTAATCTTATAATA

AGGCCACAGTTCCTTGATGGCAGTGCAGTTGGTGGCAGGGGTTGGGGAGGTCCAGGAATCAACTCCCTCT

ACCAATTTCACATGCCCACCTGCCCCACCAGGATTGCCCAGTAAAAAGCCCTGCATTCTTCAAATCTTTC

TGGACCTTAGCTTTCTCACTTGTATAGTAAAGGGATGAATCCCATGATCACTAACAGCCCTGCCAGCTCT

GACATGCCATAAGCTTATGATTCCAACAGTAAAAGCCTGATAAATATCCATCCCTGTAACCACAAGCAGA

TGCTACCTGGAATGGATGGAATTTCATCTAGACTAGGAACAATCTAGCATCAGTCCGAGTCAACAAACAT

TCCCTGGGGTAATCCCTTTTTCAAGTCTTGATCTTATATATTGGGGAGAAGGAAAATAGGTCCCGTCCTC

AAAAAACTCTGAAGCTTCTTGGGAAATTAAATGTTCTTCCACCCCAAGGCAGTCAGAGGCTAGACCAGGG

TTACAAATGACTGGAGGGAAGGATGTAGGGGTCAGAATTTGGGAACAGTGAAGTCCTTCCAAGGGAGAAA

GAAGTGTCACAAAAGTTCCCAGAGAAGGAAGAAGCAGAGCAAGGTCTTCAAAGGGAAGAAAGGGTTGGCC

CTTTTCTTTGCCAGGTCAAACCTGAAGGTTGAAGTGGGAGTACTGGGACAGAAGCTTAAGGATTATACAT

CTGCTTCCTCAGGGTGCGAGATTTACTCAGACGCAGGCAACCATGCTTCCATGATCCAAGGTATCCGTAA

CAGTGGAGCAGCCAAGTTTGTCTTCAGGCACAATGACCCTGACCACCTAAAGAAACTTCTAGAGAAGTCT

AACCCTAAGATACCCAAAATTGTGGCCTTTGAGACTGTCCACTCCATGGATGGTATGTATATGAGTGAGT

GTATGTTTACTAGTGTTGGTCTCACAAAAACCATGATGATCATGATGATGATGATGACGATAACATTATA

ACAGCTAATATTTATAGTGTTTATTATGTGCCAAGCAAAATTATTAGTATTTTACATGTATTAATTCATT

TAATTTTCTGAACAATTCTATGTGATAGGTGTTATTATTATTTTGATTTTTTACATGAGGAAACTGAGAC

ATAAGAGTAATTTGTCCAAGGTCACACAGCTAGTAAATGCCAAAGAATGGAGGCAGCTATTACATTCATC

TTATAGGTAAAGAAACTAAAGTTCAGAGTTGGCATCCAATTCATCTTGAGTGGCTCAGCAAGTTGGTGCT

AAAGTGAGTATCTGCACCCTAACACATATAACTCCAATTCCTCGAGTAACACTTCTCTTGTTAGAAATGA

TATGTAAATCAATAATCCCAGTGTTTGGTTTTTATGAAGGAAATTTCAAAAACCATTGCCTAGGATTTTT

TTCAAGGTCCAGTATGAAGCATTGGGGTCAAAACAGGTTTTCAAGTCAGAGAGACCTGGGTTCAAATCCC

ACCTTTGACAGTTACTGGCTATGACCATGGGTAACTCTTTAACTGTCTAAGCCTCAATTTTCCCAAAGGT

AAAATATCTGGTTGTAAGAATTAGAGATGATAGAAACCATTCTAGTTATTATGCTTTAGTAGAATTAAAT

GATCTTCACACTCCTACCTCCTTTCTTTGCTCAATTGAAACAATGTCCAAAGCTTTCTATTGCTGGCCCT

GTTGTGTAGAAATCATGTGTTTTAGGCATCCTCTTATGGATTTATTTAAGGGAAGAGGTCCTCAACTCAT

TTCAGTTTGTCCCTTTTCCAACTGAAACAAAAGAGTCCATAGTATTCCCTGATTTAGGTATCTTAAGTGG

CATGTAATGACTATACACACAGGCTCTAAAACCAGACTATCCATGTTCAAATCCTAGCATGACCATTTAC

TAGCTTGGGCAAGCTTCTTAATTGCTCTGTGTCTCAGTTCTCAGTTGCTTATTTGAAAAATGTAAGTGAT

AATAATTAAATAGGTATGCAAATTAAATGAGTTAATATATGTAAGAAACTTACTATTATGCCCACTCCCA

CATTTCTAACACTAGCAATAAAGTAAAACTATCCTATCCCTTTTGTATATTTCTACCACTGAGACTATTC

AAATTCATTATTTCTCTAGTGGAAACTATGTTGGTACCATTCTACCTCGTTACATTTGCAAATAAATAGT

TATTTACCTATTTTTGGGGTGCAAACTCTGCCCAAACTGTTGATCCTTAGGCTGAATCTCTCCCATTGAA

ATGATGCTAGGCTGAACACAGCAGAAACAGGAAAATAGACATTGTCAGAATGAAGTAAAAACAGAAAGAC

AAAGAGTCAAGCCTTGATCCCAGGCTGGGGAACACACACACATGCGCACACACACGTACACACACACACA

CACACACACACACACACACACACACACACACACACACAGAGAGACAGAGAGAGAGAGAGAGAAGGCAGGG

ATGAGATACAGGCAATCGATCCATACACAGAGGTTTGTAATAGTTCTAAATGAAGGCGCACATCCTCCTT

CCTCTCTACAACACCCTTTTCCAACCCAAAGTAGGCATGTATGGGAAATTCCACATTGGAGATGGAGCTG

GGGAAGGGTTATGATGTCCTACCTCTATCCCTTGGCTTTGCTCAGGTGCCATCTGTCCCCTCGAGGAGTT

GTGTGATGTGTCCCACCAGTATGGGGCCCTGACCTTCGTGGATGAGGTCCATGCTGTAGGACTGTATGGG

TCCCGGGGCGCTGGGATTGGGGAGCGTGATGGAATTATGCATAAGATTGACATCATCTCTGGAACTCTTG

GTAAGTGAATGCTTTGGGCCTTCTTATATACCCTCCAGAGAGGAGGCCCTTACAAAATTCTTTTCTGCCT

CCTCCCCAAAGCTATAGGGGTTGTTTGGACAGAATTCACAGCCCCAGGCTGCTGCCATCCTGGACTCCCT

CTCTCCACTCGCATCCCACTGCAGAGTTGATGAGAAAGTCTGGTAGAGTTTTTTGAAAAGACCTTGAACT

AGGCCAAATAGTTAGATTCAACTTGAGTATGTGAAGAGCTGTGTTTCTAAACCCCTCCCCCACCCTAGCC

CCAAGCTTCATCTTAGCTCCACTCCTGACCCTATCCAGCTAAAGGTCCCCACCCAGCTCCTGCCTATCTA

GTCATTGCATATGGCAAGACTTGAAAGTCCTATCTCAAAGCAGCAGAATTATCAGCTACGACTGCCTTGT

CATGGACAGATGAGCAGAGGCCTGGGAAGACAGCCTGGAGCCCCAACTTCTGGTGCACCCCCTTGTGTTA

TCTGGCACATGATCCTGTTGCTCTGGGACTGATTATGGGATCTGTGTATATCTTATTCCTTTCTGTCTCC

AGGCAAGGCCTTTGGCTGTGTGGGCGGCTACATTGCCAGCACCCGTGACTTGGTGGACATGGTGCGCTCC

TATGCTGCAGGCTTCATCTTTACCACTTCTCTGCCCCCCATGGTGCTCTCTGGAGCTCTAGAATCTGTGC

GGCTGCTCAAGGGAGAGGAGGGCCAAGCCCTGAGGCGAGCCCACCAGCGCAATGTCAAGCACATGCGCCA

GCTACTCATGGACAGGGGCCTTCCTGTCATCCCCTGCCCCAGCCACATCATCCCCATCCGGGTGAGAGCC

CCACCATGCCCATTGCCCTCTCCACCTATTTATTCTGGGAGCCTCACGCTCCCAACAAACCTACATCTGT

TGCTGTCTTCAATTATTTGCTTTCCTGCTAACCATTCCCTTTATTGCCAGCTTTGTTTCCCTTTTTGAAA

AATTATCAGCCATTCTGGATTAACCAGTCTTTTCCTTGCATCAGCCATTACCTCATGCTTATTAGATTAT

CCTAACCCTAACAATAGCGAGTGCTCACAGCCTATAATTCAGAGTTTTTCAAACTGGATCAAGACAATTA

ATGGGTCACAAAATCAGCTTAGTGGGTTATCATTAGCATTAAAAAAAGAAAAGAAACAGAAAATGTTGGA

GTACATCACATACTAAGGGTATCATCAATTTGTGAAAAATTTGTATGCATTTTGGGTATTTGCATATACA

CATGTATGTGTATGTGTGCGTTTATGGTCACGGTGTAAAACGTACTTCTTATTGAGAAATGAGGGCAGAA

AAATAAAATCAAAAGCCATAGGATTAGCTGCTACTTTGGATCCTCAATATGAGCATTTACTGCCTTTAAA

AATGAACTGCTACTTCTTTCTTAAATAACACGTATTTGTGTGAGTCAGTAAGCCAGGGCAGGGAAAGGAC

ACTTATTTGTGACAATTTTGTGGATGAGAAATAGTCACTGCTCTTTAGACTAACCTAGTATTTCCTTTAA

ACACTCATTTTATGAATTAATTTAGTGACAGCACCCCAGAATTGGCTTGGCGGGGGTTCCAGAATTGGCT

TGGTGGGGGGTATCTTCTCACCCAGAACCATCCCAAACTAAGATATTAGCTAAGTAAAATCAGTGTGCTT

GCTCTGCAAACAGCTTCCAAACAGGGCTCCTGGTACCACCTCTGCTCCATCCTTTTCAAACCAAATTGCT

AGCTCTGAGCTCCTCCTTGATAGAAATTCTGGAGCTGCCACTAAGCCCCTAATGGAAAAAAAAAATCTAT

CCCAAAATTCAGTGATGTTCCCTCATCTAGTTCCCTCCATCTGCTTAATGGAGCTAGTGATGGTGGAGCC

AGAGTGGCAGGTACTGATTAGCCTTTCTCCTGAGTCCAGGTGGGCAATGCAGCACTCAACAGCAAGCTCT

GTGATCTCCTGCTCTCCAAGCATGGCATCTATGTGCAGGCCATCAACTACCCAACTGTCCCCCGGGGTGA

AGAGCTCCTGCGCTTGGCACCCTCCCCCCACCACAGCCCTCAGATGATGGAAGATTTTGTGGGTAAGTTC

TCAACATGGGTGCCTACAGGACCTCCCTCCCCTCAGCCCCAGGATCTGAAAGAGAAGCTGAGAGGACAGA

GACCACTGAGTTTACAAAATATTTCTGGAACATCTAATGTGTGCCAGCACCTATACTAGGGTCACAAATA

AATGAGAAGCAGCCCCTACACTTGTAGGGCTCCAGTTTGGTTGGGGATACCATAGTGAACACAAACAATG

ACACTAAGGGATGATCAAAGCTCCACAAGGCAGTGCATGATAGAGTTGTCGGAGCAGAGAGGAGGGGCCT

GACTCAGCCTGAGGGATGCAAGACCCACTTCCTAGTAGAGGTGACACCTGAGCTGAGTCTTGCAAAGTGA

GTGGTATTAAAAGAAAGAGGGCATGGAAGAAGTATTCCTACCAGAGGGAAGAGCATGAAGATAGGTGAGG

AGAATGAGAAGCAGCCAGGGATATATCAAGAACAATAAGCAGGTGGTATTGGAATGTAGGGTCATAGGAA

TGGAGTGGGGCAGGGGAGTATCAATCTATGAGTCTACAAAGACAACATGAGATAGAGACTGGATTGAGAG

GCTTGTAGAGCTGAGTAGTTTGAGATTTACCCTGAAAATGCCAGTTTAGTCAATTCACCTAATGTTTGTT

GGATTTCTGTTGGGTAGTTTTGTTTTTGTTTGTTTGTTTTTGTTTTTGTTTTTTTGAGACAGAGTCTGGC

TCTGTAGCCCAGGCTGGAGTGCAGTGGCACGATCTTGGCTCACTGCTACCTCTGCCTCCCGGGTCCTGGC

TCAAGCAATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGATTACAGGCACGTGCCACCATGCCTAGCTAA

TTTCTGTATTTTTAGTAGAGATGGGGTTTCACCATGTTGGCCAGGCTAGTCTCGAACTCCTGACCTCGTA

ATCCACCTGCCTAGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCATGCCCGGCCTGGGTAGTT

TTTAATGCAGGGCCTGACATTGAATAGGTGCTCATTCCAGGCCTGTTGGATGAAAGACATGTAGGCAGTT

GATGGTCTAGCAGAGGAGCCAGATATAGATGGTACTGGTCCAGTATGATGAGCTCCAGTATTCTGGGAGC

TAGAGGGAGTGGACACATTATGGAGAGAGAGGGTGGGAAGGATGAAATTGGAGAGGCTTTGTGAGTAAGG

AAGTTTTTATGATGCATGTTGAAGTACATGTGAATATGTTGTAAGAATATTCCAGAATAAGGGAATTCCA

CGAGCAATGACCTAGAGATAGGAAAGCAGTGGGTATGTATTGACAACATAATTCTGTTTGTCTGAAGCAT

GGGCAGTATGAGAATTCAAGGAAGACAAGCTAGGTAGGCGCCATTCATTCATTCAAAAACATTAAATAAT

GCTGGCTAACATTAAGTACTTACCATGTGCCAAGCACTGTTCTAAACACTTTACACGTATTAACTCATCT

AATCCCCACAACAACCTCAAGAGTTAGAGATCCTCTTATCATTTCCATTTTGTACATGTGGAAATTGAGG

CACAAAAATATATAGTCGCTGATCCAAGGTCACACAGCTTCTAAGTTGCAACTGGGAGGTCTGTCTCTAC

CTCCATGGTCATAACTGCTAGGTCTACCACCTCTCTGAGCTGATGACCCAGACTCCTGGGCCTTTTGTTC

AGTATTCTCTTTTGCTCTGGGCTTCAATTGTAGAGCTCTCAGTATTCTTGGTTCTCTGAATGTCCACCTA

GGCTAGGCTTTTGTAAGAATATATGAGGCATCCACGATGGCTCCACCAGTCCCTAAGTTCCATAGCCAAT

CCATCCTGAAATCCTGCAAAAGTTATCTATAATCTCTCTCAAACCTATTTGCTTTTCTCCCCTGCCACTT

CTTTAATCCATGTCAACATGATTTTTTTCCTAATTTCTCTGCTTCTCTCTTGCTCCTCTCAAATCCTTTC

TCGATGATGACCACTAGAGGGATTTTTCTAAAATTCTGACTATATTGCTCCCTTGCTTAAACCCCTTCAT

GTTTCCCTCTAGACTCTAAAGCAGTGACCTCCAAGGGGTATGCAAAATGATTACAGGGTGAAGGAACAGA

ATATGTATTAGAATTTTATGTTTTTTTATCTTAAAAATAGGAAATCAAGCATCACTGATACTGATCTTTA

ATATACAGACTGACAGTTATACATGTATATAATATATAAACAAATATAGAGATTGGAGGTACATGCTAAA

ACATTTGTACTGATAGGGATGTATAGTCCAAAATTTGGAAACATTGACATATAGGACAGAGTTGAAGCTC

TTCAGCATAGCATTCAATGCCTTCCACATGGTGATCTCTATGCCCTCACCTCCTCCCCACATGCATTTTG

TTTTTTCAGCTACACTGAAGGACTTGTCGTTCCCTCATTTTTTTCTGCTCTCTTACCTCTGGGACTTTGC

TCATGCTGCTCTCTTTTGATTGGAATGCCCTCCCTCACACTTTCCTCTGGCTTACTTTCCTTCATCTTGT

AGACTTAACTTAGGCATTCTTTCAACAAATATTTATTGAGTACCAACTGTGTACTAGATACTGTTCTAGG

CACTGGGGATGCAGTAGCAAACAAATCAGACACAAAATTCCTACCCTCTGGAGCTTACATTCTAGTGGAA

GGGGTAGTAAAAAAAATTACCAAAAATAAGCAAATTAAGTAGCACATTAGTTCTAAGTGCTATGGGAAAA

AATAAAGCAGGATAAGGAGAATGGGATAAGGGGCCAGGGGCGAGTTCAGAGAAGGGTTGTAGTATTAGAG

TGGCAAGGGTAGAAGACGCTGAGGTGAAACTTGAGCAAAAATTTGAAGGAGGTGAAGTTAGTGAGGCAGA

TATCTAAGGGAATGGCATCGCAGGCAGAGGGAACATCCTAAGGCAGGGAAGACACAGGAGTATTCCTTTT

ATATTTGAGGAACAGTAAGAAGATGGGTGTGGGTGGAATGGTATAAGCAAGTGGGAGACAGAAAAATTGA

GTACATAGAGGCAATGTGGGACCAGATTGTATAGGGTATGGTAGGCCATTAGAAGGAGTTTGGCTTTTAC

TCTGAGAGCCCTTGAAAGGATTTGAACACAGGACTGATATTTCTGACTCGGGTTTTAACAAAATTGCTCC

AACTTCTATGTAGAGAATACACTAAAAGGGAGCAAGGGTGGAAGCAGGGAGACCCAAGAGTGGGCTACAG

TAATATCCCAGGTGAGAGATGATGGTGGCTCAGACTTGATCATAATGAAGGCAATAAGAAGTGGTCAGAT

TTTGAAGGTAGAGCCAAGGGTCTTTGCTGATAGATGGGATATAGGGTAAGAGAGAAAGAGAAAAATAAAG

GATAGCTCTGAAATTTTTGGACTGAGCAACTGGAATTGCCATCCACTGAGATGGGAAAAGCTAAAAGTAG

AATAGCTTGGTGGAGGGTAGGGACATGAGTAGCTCAGTTGTACTCCTAAGTTAGAAATGCATATTAGACA

TCTAGGTGGAGATGGAGAAAAGCCATTGGATATACAAGATTGGAAACCAGTAGAGTGGCGTGAGCTGGAG

ATTAAAATTTCTGAACCATCAGCATATAGATGGTCTTTAAAGTCATGTGACTAGACAAGATCAACAAGGG

CATGAACACAGAAAAGGCCAAGAACAGAGCCCTGGAACGTACCTGGGGTACTTCCTCCAGCTAGGTCAGG

TTCCCTTCTCTGGGTTTTCACACCCCCAGGTGGACCCCCTACCCCAGGTTTCCTGGTCATAGCACCAATG

ACACAGTATAGTTACTGTCATTATCATTGTCCTCATAGGGCTTAGAGTTCCCAAGCAGACAGTCATTCTT

GGGCCACAGCACATCCTATACTTAGGGAGTGGTCCAGGCCAGGACAGTATGGCTTCAAATTGTGTCAAAG

GAGAGCTTCCAAATCTTTTATAATATATATCCCAGCATCCAGATACAAATGGTAATATTCACGGCACACA

CAGAAGCAAACAGTAGGCTACTTCTGGCCCTGAGGTATCTTGAAGGGTTGAGGGGGATCAATATCTTGGC

TCATCTGTACTGTGACAGATTTGGAAGATCTAGTCTAACCCATTTTTTCCCTCCCCTCCCCCTACCACCT

TCAGAGAAGCTGCTGCTGGCTTGGACTGCGGTGGGGCTGCCCCTCCAGGATGTGTCTGTGGCTGCCTGCA

ATTTCTGTCGCCGTCCTGTACACTTTGAGCTCATGAGTGAGTGGGAACGTTCCTACTTCGGGAACATGGG

GCCCCAGTATGTCACCACCTATGCCTGAGAAGCCAGCTGCCTAGGATTCACACCCCACCTGCGCTTCACT

TGGGTCCAGGCCTACTCCTGTCTTCTGCTTTGTTGTGTGCCTCTAGCTGAATTGAGCCTAAAAATAAAGC

ACAAACCACAGCA

As used herein, “GYPA”, “GPA”, “MN”, or “glycophorin A” refers to a sialoglycoprotein of the human erythrocyte membrane that, together with glycophorin B, bears the antigenic determinants for the MN and Ss blood groups. Sequences for GYPA are known for a number of species, e.g., human GYPA (NCBI Gene ID: 2993); its nucleic acid sequence (e.g., NG_007470.3), mRNA sequences (e.g., NM_001308190.1), and polypeptide sequences (e.g., NP_001295119.1) are known in the art. These sequences, together with any naturally occurring allelic and splice variants and processed forms thereof that retain the same function, are contemplated for use in the methods and compositions described herein.

In some embodiments of any of the aspects, the GYPA enhancer element includes or is derived from a human GYPA sequence having the following nucleic acid sequence, NG_007470.3 (SEQ ID NO: 42):

NG_007470.3: 5001-36438 Homo sapiens glycophorin A (MNS blood group) (GYPA), RefSeqGene on chromosome 4

GCAGGAAGGTGGGCCTGGAAGATAACAGCTAGCAGGCTAAGGTCAGACACTGACACTTGCAGTTGTCTTT

GGTAGTTTTTTTGCACTAACTTCAGGAACCAGCTCATGATCTCAGGATGTATGGAAAAATAATCTTTGTA

TTACTATTGTCAGGTAAGTGATTTTATTTCATCTTGGTTCTGTTATATTGGGTATGAGATCATAGAATAA

AATATGAACTACCCTATTTTAGTTCTATCTTATTTAAATCAATAAATGAGTAGTATTTCCTCTTCCAGTC

TGGTGGATGGATTTTACTGGAACTCAGCTACCAATGTGGGGGAAATGGCACAAGGGAGCCCAGTATTTAT

GGCCAAATCCAGTTTTCTAGTATGAGAAGCTTACTTCAATTCTAAGTCTAGCTAGAATTAAAATAATTTT

ATCAAATGCTATGAGAAATACCTCTCTGTGAATAAATGTATTGCTTTGTTTGAGTTATAAGGAGATTCAT

TTCCAAACTAAAGAGTTATTAACGAAGATGTTGGTAGCTATATGGCTTTTAGTTTTCAAAAGGTATAATT

TCCTATTTCTGCCAAATGGCGAGAAGCCAAAAGCATGAACACTGAAACCGTGGGGAGTTGTTCGCTTCTC

TGTGGGTCCATTACTAAAGTGTCACATAGGAAGAAAAAAAACAAAAACAACTCTTACTGGCTTAGGTATC

CTGTGAATTTTAGGAGAAATTTAAATCCATTAAAATAAAGAAATATCATAGGGTTATTATTAAATTGTAT

TAATTCAATAATTTGAATTTAACTTAGTTTAAATTTAATTATTAATTTAGTGTCTTAAATTAACATGATT

TTGGCCTCTTTCTGAGAATATTATAGTTAAACATCCTCTCAAGTGCAGTGCTTATGTGTTAGCAATACTA

GTGCCCAGCACACAGCGGGCAGGCAGTTGCTTGAAACATTCTGAGTCTATTAGACATTGCTGTATCCCAA

GTGAGAGCAAGTATCAAGGAGCTACTGAGCACTCTGTAGCACACAGGGAGGAGAGATCAGCATTTTCTAA

GATACCCTAGGGGAGGATAAAATAGTGCAATAGTTAAGAGCACAGGCATGAGGAACAGACAGAACTGGGT

TCAAATCTACTTTTACTTCTCAAGGCTGGGGAACATTAAGGCAAATTATGTGCCCACATTTTTATGTGTC

CTCGTCTTTAAAATGCAGGCAGTGTTGGTACTTACCTCATAATAATTGCATAAAGATTAAACAAAATATT

TAATGGAATACACTTACTGATGCCTGAAACAAAGTAAAATGTTAAGATTACTATGCATTTTCTGTGATTA

GAATTAACTATCATGATTAAAAAGTATTAATAATATATTATTAAAATAAGCAGTAGCTATCAATAGTTAC

AGACTAGGGAACAAACCTACGTATGTGATTGGTGATTTCTGAAAAGTCAGAGAGAAAAGAAAATTACAGA

AAGAAAACAGAAAACAAACATAGCTACTCTAATTTTTTAAGCAGAAAAGTATGAAAACATTTAGTTTGAA

GAAAAGAAAACAAATGAAAGGGATGTAGTGTAATATTTGTATATATATTCATATATTTGAAGTGCTATTA

CACAGAAAAAAAGATGTATTCTTTGTGTTGCTCCATGGGGCAAACCAAACTGGATGTAACTCAAGCAAAA

TTAGACACTGCATACTCTACTGGGGGTGTGCCCAGCATTTGGGAAAACTCTGTGTGACTTACAAGTGCCC

CAAATTTGGAAAGGGTTCCTGGCAAAGAAATGATTTTTTTTTTAAATTTCTACAACTACACAAGCAGATA

GTGTATTAAAGCCTTAAATGGCACTTGGTCACTGGGGCAAGATGACCCTGAAAGCTACAATGGTCTCCAG

TACCCAAGCTGTTATCATCTTTGTAGCTTCAGAAACCCTCCAAGGAAACTCTCTTGATGTGGCTACTTTA

TAGTATAACAGAAAGGTGTAAGATCAAGTTTTTCCCCCATACTGATTAGCTGAAGAGTAAACATGGTGAA

GTCTTTTTCTTTTTCTTTTATGTTGCTATAAAAAAAAAGATGATTGCCTTGCTTTCTCCAGGAATCTTAA

GAATAAAGCCAATATTTCTAATTCTAAACTTACCAGAGATCTCCTTCCAAATGGAGAATCCATTTTTTCT

AATATGACTTGATTCCCAGTCCCTGAATTCCTGCACTCATTTGATGATTCAGTCATTACATGTCAGATTG

TGAACCAGACACTGAGCCCACAGCAGGAAGAAAAATGGGCTCCCATGGAGGATACACGGAGGGTAGGCGC

AGTGGATGATGGGAGGGAACGCAGATAATAAATGGAACAACAACTATCTTATTAAAATAAGATAAAAACA

GTCAAAACTAATACAAAGCATATAAAACCAGGTAAGATGATAAACATGAATGCCGAAAGCTGCTTAAGAA

AAGGGTAGCAGGGAGTTATTTTCTGAGTAGATGACATTTATGCTAAATGTGGAACAAGGAGACGGAGCCA

ACCCTGAAAATTCTGGGAAAAGAGGACAGAAGGCAGAGGGAAGAGCAAGAGCAAAAATTCTGAAACAGCA

GGTAAGTTAGTGTTTTCAAGGAAAAGCTGGAGCTTTTATCTGAAAATCAGATTCTGAAGCTAAGAACCAA

TTTGAAAATACAATACAATATCACTTCGACTAGGAAATTATGGCATAAACCAGGAGTCTCCAAAAGCTTT

TTGTGTTTACTTAAAAATTCATACAAAATTTGCATTCTAGGTCATAATATACTAATTTAATTGGAGGAAA

CAAAGGCACTGGTATGATATCATCATGCCTACTTTATTCATCCGTGTATCCCCAGAATCTAGCACAGTTC

CCGATTGGTATTTATAGTAGCATATTGGTTGAATAAGCAAGGAAGGAGGTGAAGGGAGGGAGAAGGAGAG

AGAAGCAGAGAGGGAGAGGAAGGAAGAAAGAAAAGGAAAAAGGGAAGGAAAGAAGAGAGGAGGGAGAGAG

GGAGGGAGGCAAGAAGGGAGAAGAGAGAAGGGAAGGGAAGAGACAGGAGGAAGGGGAGGAGGAAAGGAAA

GAGGAAATATTTGTTTTCATCTGGTTAGACACAGTGAGTGCTCCGCATAGACAGATCATTATTACCCTGT

GCATCTGACTCATACCCCTGCAAGTACATCAGTCTGAGAAGCACATGTTAAGTGAAGAAACAAGGCATCT

CTTTTTTTTTTTTTTTTCAGGGATCCAAGAAGAGAGCCTTGCTAGCTGCTATTTAATTGGCACAGGAAAG

AGTTACAGGAACTGTATGCCAGGGAATACATGACTATAAATTCTTTAAAAGCAAAACCTGTGTCTTCGCT

TATGTGTCCCACACATTGTCAGCCACATAGTAGGCAGTCAATATCAACTACTCAAAATGACAAATGACAA

ATGACCAGAATTCTGCGGCAGACTAGTTTAGCCATGAAAAATCATTTAACACCCGTGGGCCTCAGTTTTC

TTGTGCCTATTCAATAAAGCGCCGAGTAGATGGTATCTACAAGCATTTTTCAACTGTAAACCCCAATGAA

TCCCCAAAATTCAGCCTGAGATGAGCTGGACTAGTTGCCAAACCTATAAATATCTTTAGCATGGTGTGAA

ATAGGGTTTTTAGAAAGAAACAGACACCCACTGTGAACTCCTTTGCAGAAAAGGTCTGAATAGAGGGGAA

AGTAGGGATGGTATCTCAAACTTACTTTGTAGTGATTTTAAATTAGGAAATTTAGCTTCACATTCTTGTG

ATAAATTTCTTTTCACCTTGGTTTCTAGAAGATTATTCAAAACATCTGTGAGACTATTTGAGAAGTATAC

TTTTGGGGAATTTCCCCAAGTTATCTTTATAGATTATATTTTGACATCAACTGCAAATGTAATATCTTTT

ACTCAAAAAAAACCCAATCCTACTTACATGGTGCTGACAAAATCAGGCTGGACCTACATTTTTACATCAT

AGATTTCCAGCCATTATTATCATATCCACATCTTTAGTAAGTACCTATCTGTGTAGTTTTCTGTGATAAA

TGAACTAAACTAAAACTAAAGCAAAAATGTTGAAAAAAAATTCCAGGTTTATCTCTGAGTGTTGGGATTG

CAAGGTTTTTTTTTCTCATTTTAAATACTTTCTAAATTTTCTGCAAAGAGAACCATATAATCTAATCAGG

ACAAGTTTTAATATATTTTAAAAAGTAAACCGAACAAACACAATCTCTGCTTTCTAAGAAGTCTTTAATT

TTTGTACGTTGGTCATAGACTATGACTATACAATTTATTTGTGATATGTATTAAGAATTTCTGTCTAACC

CAAATTATTATATGTAAGCACGGGAAAAATGATGTCATCTTTGTTTGTAGTGTACAAAGTTCTATAAACA

GCTATTTGATCAACTTTGGTATTTCCATCCCTAGATTTATATACAGCAGGTTAGGTTCCATACAGAGGCA

GGTTCTGAATAATAATAACCAACACTGATAATAGCACTTACTTTGTGCCGTGCACTGTTCTAAGCAATTT

ACATACACTTAATTTTTAAAATTGTAGTAAAATACACATAATATAAATTTACCATTTGAACCATTTTAAA

GTGTACAATGGGTAGCATTTAATGCAGTCAAAATGATGCACACCCATCACCATTATGTAGCTCCAGAACA

TTTTCATCACTCCAAAAGGAAACCTCTTACCCATTAGCAGCCACTTCCAATTCCTCCAGCCCCTGGAAAC

CACTAATTTGTTTTCTACATCTACAGATATACCCATTGTAGATATTTCATATAAATGGAATCATATAATA

GGTAGCCTTTTGTGTATGTCCTCTTTCACTTAAAATAATGTGTTTAAAGTTCATCCATATTGTAGCATGT

ATCAGTATTTCATTCCTTTTATAATTGTGTTGGTATATCTCATTTTGTTTATCCACCCATCATTTGATTA

AAATTTGGGTTGGCATATCACATTTTGCTTATCGATCCATCATTTGATTAAAATTTGTGTTGTTTCCACC

TTTTGGCTATTGTGAATAGTGCTGCTATAAATATTCCTGTACTAGTTTTGTTTGAACCCACTTTTAATAC

TCAAAGATGTATAGGGGTAGAATTGCTGGGTCATAGTAATTTTATGTTTAACTTACTAAGGAACTGCTCA

ACTCTTTTCCACAGGAGCTGCACCTTTTGACCTTTTCACCAGGGTGTATGAGGTGCCAATTTCTCCACAA

TCTTGCCAGAAATTGTACTTTTTCATTTTTTTAATTATAGCCATTTCAGAGGGTATGAAATGGTTTTTCA

CTGTGGTTTCTTGCATTTTCCTAATAACTAATGACGCTGAGAATCTTCTCATGTAATTGTTGGTAACTGC

ATTTTGCATATCTTTGGAGAAATGTTGGTACTAGTCCTTCACCCATTTTTCAATCTATTTTTCTTTTTGT

GTTGCTAAGTTGTAAGAGTTCTTTCTATGTTCTGGATAAAGAGTCTTATCAGATATACTATTTGCAAATC

TTTTCCTTCATTCTGTAGATTTTTGTTTTTACTTTTGATAGTGTCCTTTGATGCACAAATGTTTTTCATT

TTCAAGTCCAATTTATTTTTTTTTCTTTTGCTGCTTACGCTTTTGATATCATATCTAAAAATAATTGCCA

AATTTAAAGTCATAAAAATTTCTCCCTATGTTTTCTTCTAAGAGTTTTGTATTTCTTCTCTTATATTTAG

ATCTTTGGTTTATTATCAGTTAATTTTTCTATATGATGTATGATAAGAGTCCACCTTTATTATTTTGCAG

CTGTCCCAGCACCATTTGTTGAAGAGACTATCCTTTGCCCATTGAATGGTCTTGACACCCTTCTTGAAAG

TTAATTGGCCATGGATATATGAGTTTATTTCTGGAGTCTCAATTCTATCCTAAGAATATGTCTGTTCTTG

GGGCAAAATCACACAGTTTTTATTGCTGTTACTTGGTTATACGTTTTTAATTCATGAAGTGTGATTCACC

AAACTTTGTTCTTCAAGATTGTTTTGCCTATTTAGATCCCTAACAATTTCATAGAAATTTTAGGATTAGG

TTTTCCATTCTTGCAAAAAAATAATTATGTGCATTTTAACTTAACCTGTTCAATAACTCTATAAGGTAGA

GACTAATCCATGTATAATGATGGAACAAAAATATAGAGATTAAGTAAATTTTGCAAGGTCTCAGGTAGTT

GCTAGAGGAATTAGTTTGAGCCTAGGCAGTTCCACTGCAGAATCTGTGCACTTAGAGAATATGTCATGTT

GCCTGTACCATACCTAGTGATGTTCCAGGATTGGCTCCTTTACTCTTACAACATTGTCACTCAGTGTTCT

GCCTGTGCTTTCACCAAGCTGAAGACTTTAATGAAGGTTGACGGTCTGTCTTCCTCACGTGGTGCAGCTA

AGGAACTCTAACTGTGTGGCTGTTATGTTAGCCTTTTGCTCCTTTTTATATGGGCTATAGAAAATGTTTT

TAAATCCTGGAGGCCTCCTTTTGATGTTATCACTTATTTCCCAGTCATCACTATATTTTTAAAAGCCAAA

ATAGAAGGAAATAAATACAAAACATAAAACATGAATAGTACAGCTATTTGAGGCAACTGAGAATAGAGAT

CATGGCACTGAAATTGCATTTTGCTAGGAAAAAGACCACAAAAGTTCTCCCCTTGCTACCTTTCCTGAAC

TATTCTGCTAGATTCAGACTTCAAAAACATTGTATCAGGAAATACAGAAATGTTCTTTCAAAATGAGTGT

ATGGGAATGTGGGAATGCCTAATAAAATCTGTCCTCATTGATTCGTTAGCAAAAATCATATAAATCAATA

CCTTGTGATTGCAAGCAGATATATTTCAGATCCTTTCTGTGTTTGTTTTTTTGCTTTCTTGATCTATCAC

AATTGGAGAAAACTTAAAATTTCTCAATGGTATTGTATTTTTGCCAATTTCTTATTCTGCTTTATGTTTC

TCGTTGCTATATTATTGGGCTATAATGGTCCATAATTACTTAAGAATCACTGTGAAATATATTGCTTAAT

GACACAAGTAAATCTTTTTCATTGTTTGTAATGTCTTTGCTCTTAATTCTACTTTGCCTAAGATTAATAC

GGTTATTCCTGTTTAGTTTTATATGTATTTATTTATTTATTTTGAAGATGGAGTCTCGTTCTGTCGCCCA

GGCTGGAGTGCAGTTGCATGATCTCGGCTCACTGCAACCTCTGCCTCCCGGGTTCAAGCAATTCTCCTGC

CTCAGCCTCCCAAGTAGCTGAGAATACAGGCGCACACCATCGCGCCCAGTTAATTTTTTGTATTTTAGTA

GAGACGGGGTTTCACTGTGTTGCCCATGCTGGTCTCCAACTCCTGAGCTCAGGCAATCCACCTGCCTCGG

CCTCCCAAAGTGTTGGGATTACAGGCATGAGCCATTGCACCAGTCCTAACCTATCTCTTTTGACTCAATC

TAAAAGTTTCTGTCTTTTAATACAAAACCACAATCCATATGCATTCATTAATTCACAACTGACATTTAGT

ATCTTATTTCTGTTATCCTATTTCATATTTTATGATTCCTTGTTTCTGCTCTTTTGATATATAAATTATG

TTTTATTTGCCCTTATCCTTTCATGTGTTTCTAAAGTATATAGCCTACGTGTAATTGTCCCATTAGCTAA

CTTTATGTTTTTGAAAGCATTCTCTCTCAGAATTCCCATTTTAGTGGTGCAGCACACATAGAAAGTCTAA

GTGCTTTCTGGAGCTAGATAAGCTGGATAAAGGTGTGCATGAGCCACTGGTCAATGGCTTGTGCAGGCGG

TGAGTGCATTTCTGGTATTTCATATGCTATTGATCTGGCAGCCAGGTATTCAGATAGGGTATAACCAGGT

TCATCAGGCTCAAAACATAATCAAGTATTATTGAGACATAGTTAATGTGCACTACAACTCACAGCACACA

GGCTCACACACACACTTGTCTGAAATAAAATTCCACAAAATAATACCTTCCCTTATTCTGTGTGATGTAC

TTTGATATATTCTCTCCTGTTTTATACAACTTAATTTTTTTTAGAGAAAAGATTTTGCTCTGTGGCCTAA

GCTGGACTGCAACGGCACAGTCATAGCTTACTTCAGTCTTGAACTGCTGGATTCAAGTGATTCTCCAGCT

TCTGCCTCTCAAGTAGCTGAGACTTCAGGTGTGCTCAACCACACCTGACTAATTTTTTGGTTATTTAATT

TGTAAATATGGGGTCTTGCTATGTTGCCCAGGCTGGTCTCGAGCTCCTGGCCTCAAGCGATCCTCCTGCC

TTGGCCTCCCAAAGCACTGGGGTTACAGGCATGAGCCACCACACCTAGAATACAACTTAATTTTTTAGTG

CCAGTGACAACCCACTGGACTGATTTCATAACCCATTAGTAGAGGAATGCACCATCTTGACTGAAGGTTG

GAATTTTCTCAGGGAATCTATGTAGCACTGATGATTGGGTTTCATATCCAGAGATTCTAGTTATGCTAAT

ACAGAGGCCAAGCAAACTATAGCCTGTGAATGGCCGGCCCCCTGGTTTTGTATACCTTACAAGTTACAAA

TGATTTTTACTTTTTTAAGTGCTTAAAAAAACCAAAATAGGCCGGGTGCAGTGGTTCAAGCCTGTAATCC

CATCACTTTGGGAGGCTGAGGCAGGCGGATCACGAGGTCAGAGGATCGAGATCGTCCTGGCTAACACAGT

GAAACCCCATCTCTCCTAAAAATACAAAAAATTAGCCAGGCTTGGTGGTGGGCGCCTGTAGTCCTAGCTA

CTTGGGAGGCTGAGGCAGGAGAATGGAGTGAACCCGGGAGGCAGAGCTTGCAGTGAGCCAAGATCATGCC

ACTTCACTCTAGCCTGGGCAACAGAGCAAGCCTCTGTCTCAAAGAAAAAAAAAAAGAAAGACACAAAAAA

AATCAAAATAATAATAATAATATGTGAATATTATATGAAATTCAAATTCTACTGCCCACAAATCATTATT

GGAACATAGTCATACTCATTTATTTATGCTTTGGTTTACATATTGTCTGTAGCTGCTTTTGCACAGTGAC

AGAGTTGAATATTTGTAATAGATGGTCCACAAAGCCTAAAGTAGTTGTGGCCCACAAATCCTAAAGTAGT

TACTCTCTCTCCCTTTACATAGGAAGTTTACTAATACTTGTGCTAAGGGATCTCAACAGACAATCTGAAA

AACTTAAGTTTTAGACTAAAGATTTCCAATCTAAATTCCTGTGGAGCTTTCTGAAGCTGCCAGGTGGAGA

TGGGAACAGGTTGTGAGGCTGCAGGCCAAACACTCAGGCCAGCTTCCACCAAGCAGTTCAACTCTGTCTG

TTTCACACACTGATGAGCTTATCCTTGGAAAGTGATTAAAGTAAAATTAAATGCGAATTGAGGGAGGAAG

TGAGGGAGACTGTGGCTCTAAAACAAAACCCTAAGAAACACCAACATTTAAGATGGCAAATGATGTTATT

TCTAAAGTCGTTCAGGCTAATATCACATACTATAGCTGTTCACTTTATAGATAAAGGTGACACTACAACC

ATAGAAAATGTAAGAGTGGACCTCGAAACTCAGGAAGATGAAGTTTACATATATTAATCTATATTACCAA

CTGGAGCAGTTGTTCTCACTGCTGGCCGCACATCAGAATCCAATTCCTGGGATATCACAGATGATTCTAC

CATGCAGTCAAGGATGAGAACAAACTAGGTTCATTTCTGCAATTTTTTTATTGTTCAACCAGTGAAAAGG

AAGTACCAGTGGTGTGAGAACTTTGGGATAAAGTTTTTGTTTTCAATTAAAATTATTTTCATCCAGCCCA

ACTTCCTTAAGCCCAAATTTAATGTGTGTGAAGTTCAGCTACAGAAATACCAAACCTTAGACTAAAGCGG

ACACAGGTAAAATATGTGAAATCCTCTTTTGTTCTGAGGATTCTTTAGTAGGCAGGAGTGACCAGATAGG

AATATGCTTGGCTGGAAAAATTAAGATTCAAGTTAACAAACTGTTAATAACCAGGACCATCTGCTCTTCC

GTAATGTGGATTTGCCACTGCAGGTCACCCTACAATGCTATGTTAGAGGTACAACACTCTTACCCTCAGG

CTATAAACAAGGTGAATTATTATCTTTATATCTCTTCATTTAGCCCTGATTTGCTGAAGTGAAGGCTCGC

TTGAGAGTTGGTTGCATTATAATTTGGTGAGAATTTAATCTCTCAATGACAACTTACTTGATTCCCTCAT

TCTCTTTCTGCTACATAGATCACAGTAGACCTTGGCAGACAGTTCTGTAGTTACATAGGTCTGAATTCAA

AATCCAGGTCTGCCACTTGGTGGCTGTGTGAACTTAAGCAAGTCAGGCAATGCTTCTGATGTTTTTTTCC

TCCTCCACAAAGAATAATTAACATATAACAATAGGGTCTCAGCTAGTTGTTTTAAAAATGGTTAGAGAGA

TGTGTGGAATGAAGTAAGTGTGCAGTAAGTGTTAACTACAAATATTATTATCTTAGACATACAGATTTCC

ATGATTCATGAATGGTGAAGCATCTTAGAAGACATCCATTCCAGGCCAGGCATGGTGGTGTGCACCTATA

GTCCAAGTTGCTCAGTAGAATGAGGCAGGAGAATTGCTTGAGCCTAGGAGTTTGAGGCTAGTATGGGCAA

TATGGTGAAACCCTATCTCAAGAAAAAAGCAAAACATTTTTTAAAGTTTAAAAAGAGAGACATCTGTTCC

ACTACTCTCATCTTAGAGGCCATAAAACTGAGGCTCAGATAATTTCAGAGACTTGCACAGATCCCCCAAC

CATTTGGTGGCAAAGCCAGGAAGAGAACTCTGCTCTCCTTTCCCACTGGGACAGTGGAAGAAATTCGTCT

TGATTTCCATCTGTCCAGGCTGAAGAATGTGCACTGGCTGGAATGACAGACTGACCGACTTTTTTTCTCC

ACCTCTGCTGTCTCAGCAATGGTTTGGGACAGTGTGGATGACCAGAAGCTGGATAGTACAGAGCCAGGCT

AAAGAGTTCAGGCTTCCTGAAGGGAAGCTGCAGTCCTCCTAGGCCACAACACCTTCGAGATAGAATACAT

AAAGCACCCTTCTCTACCAAGTTAGGAAAGGAAGAAGTGTGACCAATTAGCTGTATGGGGACTGCCAAAG

CATGCCAGTCTGAAGATGAGCAGAAACTGGCTCATTCCATTTGGCACCTAGCACACTAACTGCATCCGTT

AATAGGCCATGCTTTTCTCCAGAGCCATTGGCTGAAGAGATCAAATAAAAAGTATTGAGAATAGGCTACC

CAAAACAGTAGGCTCAGATGCTATCACACAAAGCACTTTATCCTTAAGTTCAATTTTTCTAAATTGTAGT

TGGCTGCTTTGGCTTAATAAAAACTTCCAAAAAAGAAAAACGAATGGCCACAGACAGTATGGGTATCTAA

CTATATTATCACAACTTGACCAAGATTGAACTTGCCAATCCTTTGGTTCAAGAGCCAAACAAAATCGTTC

CCTTAAAATATTGCTTCATGGGAACAGTCTTCTTCAAACATCTTTTAGCACAGGCAAGATTCCCATTTAT

ACATTAATTCTGTTCAAGACAATGAGATTGGGCAGAAAAGGCATTGAGTTGGAAGTCAATGGATATGAGT

TTTTATCCCAGTTTTACCACAAATTAGCTGAGCATAACTTCCACAGATGCATTTATCAAGTAGTTTTCAT

GGTCATTGCAATGCCAAAAAACTGTAGCATTTAGAAAATTTAGTTTTCAGACTTGGAAACTATTTAAGGC

ATTTCATATGAAGGGTGTGTCCTTGTGAGAGTTTGCTTATGCAAGATAAGGCTTCTTTCAGCTGCAAGTC

AGGAGCGAACCAAAACTCAAAGCAGCAGCTGCATGAGCTGACTTTATCACATCTTGACAAGAGCTCAGCC

ACTGGAAGTTTTGGCATACAGCGAAACTGAAGCGTACTTATACAATATCACATTTTATTTTTATTGTTTC

TAATAGCATTCCAGGTTAGAAATGTCAATTATTTGGGAAAGCTGAGGGTCTGGTAGATAAAGCATGCAGC

AGAGAGCTAGGAGGCTGGCTATTTCCAGTCGTTATCCTAACATGTCTTGGGCCCCCAAGTCACCCCACCT

CCATGGTACAATGGGAACTGTGGCAGAAGTCCACGCTCTCTCCCCCAACACATGGGGATAAGAGACAAGA

GAGGTGAAATGTTCTGGAACATATCCGATGTTATACAAGTATAAGCTGTGAGATGATCCAAACGCAAATA

TTGAATATTTCATTTTCTAGAAAGTATACCAATTCATTCCACCCTTCTCAAACCTAAATTACAGAATTCA

ATTCAGGTCACACAGATTTACTTTGTACTAAGTACCATAGCAAATGCCATTTCAGTGCCTGAAAACTGAA

AAACATAAATTTAAAGTAGGAGTTTGAGGCCTCACTAATATGACAAAACATACCTTTATATTTTATTTTG

CAGTAATTTGCCACTTAATCATTAAACTCTTATCAATCTGAGAGATTTGCCAACACTTGCCTGCTAGGTG

ACCTAAGCCTCCACATCAATGCATGTTATACTCCCCTTTCTCCATATGTTAGGCCCATGCTATTTCTTTA

TCCCTCCTCCTCTGCATCTTCACCTAAAACTCTGCCCATCCTTCAGGGTTCATCCAGTGATTCATTTGCA

AGCAGGCATGGGGTAAGGTCTTCAGAGTATGTTTCTCAGAGGCCCATGCAGCTAAGAAAATGTGCAGTGT

TGGCACAAGGTCTGTCTATTCCTGGGTAGCCAGATGCTGGACACATCTTTCATAACACCACAAGGTAAAT

ATACTTCACTTGGAGAGAGAGGTGAAATTTTGCAGGTATAGACTGGATGTGTTCCTGCCAGAAGATGTGA

AGGGATTAAGAAACTGACTCTCATCTCCGTATTGCTAGAGCAAAACATAATTTCTCATAGTGGCTATAGT

ATAAGGACACTGAGGGGTAAGAGATATAATCTAAGTAATACAATAAATTAGTGTGGAAAAATCATCAAAA

TGAAGACTACATGGTTTTTACTAAAATTCTAGCTTTTAGGATGTCCAGGGAGCTCAGGAATTTAGCTGTC

CTTTTTTGTATGTACAATATGCCCCAATGCTTGCTGACTAATGTACTAAAACATTAGAGAAATCTTGCTG

ACAAGATCTCAACCAGTCAGCGAGATCCGGAAGGTGAGACTAATATTGAGGGTCAGCAGAATTAAGTCTC

AGTTCTGCTGCTTACCAGATATGCTGATCTGAGCTAGTCATTTAATTTTTATGAGACCAAATGTCTATCT

GTAAAGTCGGCAATTTGGATTAGATGTGCTGCAAGTGGTTTTCTAGCTTAAATGTACCTTCTGAATTCAA

CAGGACAATACTTAAACTGACCTTTAATCTAGGAATGACACAAGTAGATTTTTGAAAGCTACTTTAGCTA

CAGAAAGCTGAGAGCACCAAAGGCAAAGAGATAAAAATAACAGGAGAGCCTTCCCTTAATCCAGTCCCTA

AGCAGTTTTGGCAAACTAAAGTTTGTTGTTCAATGGTTACGAGTTTGCTTCAATGCTTTCTACCCAGTTT

ACTGAACTAAATAGTATATAGCTATAGTAAAAAGTCCTATTCAAAAACCAGCTTCTCACAGATATTTTGC

AGCTTTGCAGAATTGAATATGTCCACAGACGTCTATTAGCTGGTTAGGGTCTTAGGAATCTAGGAGAGCC

AAGTAGTTGTGTGAGCTGTTGTTATCAAATGTAGTTTTGAACATTCTTGGTGATTTTAAGGGATCATATT

GTGGAAATTTGGTTTCCTTACCTTGAATTTTGAATGAAGCTTTAGAATTTGAGGATGTTTCTTTGGTTTC

TCCTTCCAGGTAAGTGATTTTTTTTTTTTTCAACCAGATGCTGGTTTATTTAATTTGAAGGTATTGATGA

AATTCTTTAAATTGCCCCCATGTGATTCTACTCTGGAATAACTACGAAATTATTTAAAAGTTAATTAATA

CAAGAAAATATGAAAACTCATTTTTATGGGAGCTATTGTTCCTTCAAGATGACACTGTTTTGTAAACTAT

AGACTTCCAGTAACAAGCCTCTGTGCCTTCTTCTTACCACTAAGCATGCATGGGTATTAATTCCTACTGA

AAGACTTATGCTATCTTTTTTCCAGAAATGGAAGAAAAATGAACTATGAAAAAGGTCATTTTATAGGTCA

GCTACCACTATGAGATTGTTGAGGAAATGATATAAAAAACAATTTTTATCAAATTATCTTTAGGGCATTT

ATATGTTTATTTTCTTACTATGTTGACTTAGGTGACTATAAGAAGTTGTATCAGAGCAACTGATTCTGGT

GAATTAAAGCAAGTATTTCTAAGAACATAAGTGGCAACTTTCAGTCTCAAATCAATTTGGCCACCAATCA

GTTTTTGTAAGGGTACAAATAGGACATAACATGCTCAGATGGGACTTGGATAAAGTGTATACAATTTTAC

ATCGAGGAAATTGTGTCAATGTGTTACCTTCAATGTTAGAAATTCCCAAGTTCTGACAATAGTTCAGAGC

CTTGTTAAAAGCCAGAGTGGAGGCATGTAGATCCAGCTGGAAAGAGAGGCATTATGGTCTAACTTAGGAC

AAATTTTAAAGCCAGTGTTAGGGTCTGAGTCCAGCTTTGTAAACTTGAGTACAGTGTTTGATCTCTGGGG

TTTCAGCCTTCACTTCAGAACAAAATTTCCACCAAGTGCTCTTTTACTGTGAGGAGTAGCTGTTGAAGAA

GAAAGAAGTCTACTTATTTGCTAGAGTGTTACAATTGTTTTGATAAAGCTCAAAACTTATCTAAATAAGC

TCTCTCTCCCTAAGCATGTTTTCATTTTTATAAAAAAGTTACATATACTTTGCTTATAAATTTAAAATAC

TTTTCACCTCCTCTGACTTCATTTAAAATTAAAATAATTAAAGTGCCAATTTTAAGAGATGTTAGCTCCC

ATTATTGGTTCTTTGCCATATTCTTTTGACAACCTGCTGTAATTTTCTGCCCCCTTTAAAGCCTCAGGCT

ATAGGCCTTCTCCACCAAAGGAATATTAAGAAGTGATAAGGACCTTCTGTGAGCAGAAGTGGCTTGTTTG

CAAAGGGACTGCTTATCTTGGCCACTCTTGAACACAAGATGGGACCCTCTACTGCAAAGCTCTGGCATGT

TTTTTTTTCCCCTAAGTTATCCTCCATACTACTGACAGTGATTTTCCCTAAATAAAAAACTGCTTCAAAC

CATTCATTGTCTTTCCACTGCCTTAAAGATAAAGTCCAAATTCTAGAACATGGCCCACAGCATTTGGTGC

CTCACCACCTCTTCAGCCTCTCAGTTGCTGTTCACCCATTTCTCTATTCCTCTCCTTCTCACACCTTGTG

CTGCAGCCACATAGATAACCTGCAGTTTTTGTAACGTGCAATGATGTCTCAAATTCCAAGGCATTGCTGG

TACCACACAGCCTGCCTGGTAAAATCCTAGACTTCTTTCAAGATAAATTCAAAGACACCTCCATGAGGTC

TTTCTACCTCTCCAAGTAGAGTTGACCGCTGTCTCCTTTGTGTCCCCACTTCCACCACCATCCTAAAATA

CTTATTATACTTAGATTAATAATTGTCGCTCTTACTGCACTGGAATTACCCTGAAAGGAAAGGCCATGTA

TTATTTATCATTGTCTTCCTAGTACATAGCCCACAGCCTATACCTCCCACCCCAAAAAAAACCTTTTGTA

AATAATTGAACAAATTAAGAAACACCCAAGGCCCCCAGTAAACATCAAGGCCTAAGGAATGCATATCTGG

ATTCTAAATAATCATAAGGTTTTACAACACCATGTTAAGCACCAGGGACTTCAGAGAGCTTTTAGTCTAA

ATCTTATTAGAGAGGCCAGCGAAGACCTCCCAAAGGAAGTGGCATTGAACTGAGACTTGAAAAGCCAGTA

GTTAGGCAAAGATAGGGAGGGAAATATTTCAGACGAAGGGAGGAGATGGCACAAGATTTAGGACACGGAA

AAGGGTATGGTGCAGTCATAGAGAAAACAGATGTGCAGAATGGCTGGAGCCCCAAGAGGGAAGGGAAGGG

CGAAGCAATGAAGATGTGAGGCAAGCAGGACTGGACCATGCAGAGTCTTGCAGATGTTCACAAAGAAAAT

TGCAGCAGGTAGTCCCTAACATCGTGCTGAACAGTTAGGCAACTTGGAGGAATATGTATATTTGTACTCA

TAGTCAAAACCACTAGATGGCATTTACAGACTACGTTTTGTGTATTTTTATTTTTTACTTTTTGTTTTTT

TTTTCTTATGTTAGCAAAAGTATGCTCGCTATTGAAATGTTGAAAATATTTCATTGGTCTTAAAATGATG

CTTATTTTTCCAGATGCTTGCATTCATTCTGCATGTGCTATTTTGTCATGTGGTTTGCTTAATTTATTAA

ACAATTGTATTAATTAAATATATTAATTATAAATTGATTAATTTATAATTAATTATGTGTTATAATTAAG

TTAAATTTATTAATTACTTAAATTATTATATTCACATTCAGATGCAATCTGAAAACCCATTTGTTCTCAC

ACTGCTATAAAGAAATAACTGATACTGGGTAATTTATAAAGAAAAGAGGTTCCATTTGACCCAGCCATCC

CATTACTGGGTATATACCCAAAGGACTATAAATCATGCTGCTATAAAGACACATGGACGTGTATGTTTAT

TGCGGCACTATTCATAATATCAAAGACTTGGAACCAATCCAAATGTCCAACAATGATAGACTGGATTAAG

AAAATGTGGCAAATATACACCATGGAATACTATGCAGCCATAAAAAATGATGAGTTCATGTCCTTTGTAG

GAACAGGGATGAAATTGGAAATCATCATTCTCAGTAAACTGTCGCAAGAACAAAAAACCAAACACCGCAT

ATTCTCACTCATAGGTGGGAATTGAACAGTGAGAACACATGGACACAGGAAGGGGAACATCACACTCTGG

AGACTGTTGTGGGGTGGGGGGAGGGGGGAGGGATAGCATTAGGAGATATACCTAATGCTAAATGACGAGT

TAATGGGTGCAGCACACCAGCATGGCACATGTATACATATGTAACTAACCTGCACATTGTGCACAGGTAC

CCAAAAACTTAAAGTATAATAATAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAA

TAAAATAAAAGAGGTTTAATTGCCTCATGGTTCTGCAGGCTATACAAGAAGCATAGTGCTTCTGCTTCTG

GGGAGGCCTCAGGAAACAATCATGGCAAAAGACGAAGGGAAAGTAGGCACGTCTTACATGGTTGGAACAA

GAGCAAGAGAGAGAGTGGGGAGAGAGAGCCTTGGAGCAGGAGCAAGAGAGAGTGGGGAGGTGCCACACAC

TTTTAAACAACCAGATCTTATGAGAAATCACTATCTCCCAGACAGCATCAAGGGGGATGATGTTAAGCAA

TGAGAAACCAGCCCCATGATTCAATTACCACCCACCAGTCCCCACTTCCAACATTGGGGATTACATTTCC

CCATGAGATTTGGATGATGCCACAGATCCAAACCATACCACTCACCTAATTCTTTCTACGTAAGAATTTG

TCCAAGCATTTATAACAATTAGCATTTCATTTAACATCTTTTATGAATAAAGCACTATTCTCATGCTGAG

AAGATTCAAAATAATGGGAAATTGAAGTCCTAGGAACAAGTTTTATGTTTCAGAAGAGCCCATTTGGTAT

CCACAGGGCTAAGAAATGTGCACCCTAAATGTAAGTGGATTACACTGAACTGAAAGGTGTAAAGAAGGAG

TGGAAGATTAAAGGGAGAAGCTTGGAGAGGATGAAAGTTAGAAATGGAAGTGACGAGCACACCTGAGTGA

AGGATGAGAGCTCCAGCTGCATTTTCCAGTTGTATTCCCATGTTGCTGAGCCAAAGGCTGATCTCAAGTT

TATTGTTACATGCCCATTTAAGGCTTCTGGCCATTAACACTTTTGATTTTTTTTGGCTTGTTGTTTTACT

AGCTATTTTCACAACACTTTCATAGCTAAACCTATTTTACTCAGATTGTATGCCTTTTCAAAAATACAAT

AGAAGGTCCATATTCCATTATCTAGAAATAAGCCAAAGCTCATATCTAACATTTATTAAGAGAGATGGAT

TATTTTTGTTCATTAGTTATCTTTATAAATAATTTTTACGTACTTTAGTTGACTCATAAAGATGTTTCTT

TCTGTAATTTTAATCTTAATATTTGTTGAACTTCAAAATCCCTATCACCAGGTTATTGTTTAAAAGCATT

GGTTTTTATATTATCTTAAAAGCCATTATACCTGAGTGCTGAACAACTTAGAAACATTCAGTAATTGTTT

TGCATGCTATTTAGTGAATTCATATGGCAATCGTTTATACATACATGATGGAATCAGGTGGCAGGCCAAG

TTAAAGAGCAAGGCCAGAAAAGAACTTAAAAGAGAAGAGAAAAAATAGACAGTTTAGGAACAATAGATCA

TGTCTTCTCCATGATTTGGAGGTAAACTGATTACCTATCAGCTGATAAATAGAGGAAGGTTTTAGAAGTC

TTCAGTTGGGTAGACTAATGAGAGGTGTCAGAGAAGATGTTTTCTGTTGTTTGTGGGTTCTCCAGGAAAC

TTTGAGCATTCAGCTGAGGGGCCAAGTTGGCTGCCTCTGAGAAGAAGCCCTTCCACCTCCACTCCATTGC

ACTTGGGTGCCATTCCCCTCAGTTGAATATCTCCAAGAGATGAGCAAATGTACATCTACAGAGTTCAGGG

TACTGACTTTTATCATAATGATTTATAACTCTCAGAAGAGTGAAAAACACATGAATGCACAGAATAGGAG

ATTGAAATATAAACCACAGAACATTCATACAATGGAATACTCTGCAGTCATAAAAATCTTCTCATAGAAG

AATATTTGACAGCATAGGGATATCTGTGGCATATTAAGTAGAAAGTCAGACTTGTAAACATTATATACAT

ATTCACGTATATTTAAACACCATGATCCCATATTTAGATATAACAACTAAAAGTTCAGATGGCTATATAT

CAAAATGTGTCAAATGTTCAACCTTGCATAGGCTGACTGTAGATGAATTTTATATTATTCTTTGTGCTTT

CTTGTAGTTCCCAAATTTTCTTTACTGAATCTATATTACTTTTGCAATTTAAAGAATTTAATTTATAAAA

TTTTATAAAATAACTTATAAATTTGAAATGTATTGCATTTAAGAATAAAAAGTGTTTAATTACAAAAATA

ATTCACAATTTATTTAATGAGATTTTAAAAGGATATATGTGAGTCTACATTCTGATTTCATGTTTGCATG

CATGGTTTTTTTTTTCTTTTGAGACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGGCGTGATCT

CGGCTCACTGCAAGCTTTGCCTCCTGGGTTCACACAATGTAATAGTGTTTTATTATTGTTTCCATTTTTA

TTGAAGAAGTAAGATTGTCCCTAGCAGATGGAGACACTGAGATATGGGACAGAAGTTTTGTTCTATATAA

TTATTATGCGCTTCCACCTTTCTTAGCATAGACAGTTTCCAAAATGCAACTTCAAGTTACCCCTTTATAA

GCATAATAACAATAATACCCAACATATATGTAATGCTCTTTATGTGCCAAGTACTATACTAACACATGCA

CATTACATACACACACACCACATACACACACATATTTAAACTAATTTCGTTCTCACAATGACATTTTGAG

GCAAGTATTATTATTGTACAGATGAGAAAACCAAGGCACGCTTTATCTGTAAACCTCTGCTATGCAGAAA

TTCTGGAGGGGCTTCTGGCCCCTTAATTTTAAAATAAGGCCAATAATACAATACTTACCACATAGCAATT

CTCTAAACATTATGTAAGATATATACCAAAGCGCTTAGCTCAGGGACTGGAGGGATGTGAGGGAATTTGT

CTTTTGCAATATGCTTTATGGTCCGCTCAGTCACCTCGTTCTTAATCCCTTTCTCAACTTCTATTTTATA

CAGCAATTGTGAGCATATCAGCATCAAGTACCACTGGTGTGGCAATGCACACTTCAACCTCTTCTTCAGT

CACAAAGAGTTACATCTCATCACAGACAAATGGTTTGTTTTCATTTTTATTTTTAAATTGTGGCTCCGAA

ATCATTTTTGTGATGTAACCCATTTTAGGGGACCTGTCACTGCAGAGAAACTGACAAACACTGAGAAATG

CGAGCTAAGTAGACACAGCCTACTAAGTAGACACAATTCCTACTATGGAGGAATTCTTGCCTCTGAAATA

TCTCACAGAAATAATACTGTGAGTTAAAGAAATTAAAACAATGTGGCAAAGCACAGAAATGATGCACGTG

ACCATGAAATAGTGGGCCAGATAAAGGGGACCTAATAGTGCGGTGGTGCGGAGGGTCTGTGGGCAAACTG

AGTTCAGCTCAGACCCGGGCTCAGCTCTATGCCAGCTGCTGACCCAGGGTGAGTTGCCCTGCAGGGTTTC

TATCCCATTAATTTTAAAATGGGGCCAATAACACAGTACTTATCTCACAGCATTTCTCTAAAGGCTAAAT

AAGAAGATGTATCTAAAAGTTATTAGCTCAGAGCCTCACACATTCTCAGTGACTGATAAACAATAAGCAA

AGCTGGGTGCTGAGATAAGAGTAATCTGGTGGCAGTCTCTCTTGTTAGTTTTCAGGGGAGAAGAAGAAAT

TCTGGAGCCGCTGCTGGGAGGGATGTGGGAGAGTTTGTCTTTCATAATACGCTCTATGTCCACGCAGTCA

CCTCATTCTTGTGCCCTTTCTCAACTTCTCTTATATGCAGATACGCACAAACGGGACACATATGCAGCCA

CTCCTAGAGCTCATGAAGTTTCAGAAATTTCTGTTAGAACTGTTTACCCTCCAGAAGAGGAAACCGGTAT

GTTCTTAGTTTTAAATAGTTGCTCTGGAGTCATTGTTGTGATTGAACTCTATTTACACGAGCTGTAACTC

ATGACAGTTCTCAAGCTTTCGTGACAGAAAACCCATCTCTTTTACTCCAAAGCCCATATAGCACCCACAA

CTATTAACTGTGACCAAGAAAGAGAAGGCAAGCCCCAATTAACCTTTGTACGTAAAGCCTAAAGAATGAA

AAAATATACCTGAATCCTCAATCATCAAACAGCATAGTATATACTAAGTAATTTGTAATAATTAAACTCT

AGAAAATTGTGTGGCTTCGGTAGTAAGAGAGCTTCATGATGTAAAATGGCAAGTGGAGACAGAGACAAAA

GTAGGATGTGGACTGAGAGGGAAGGTTAGCACAGGTGGAACAGTAAGGCAACCATACTATCAATTGCTGC

TGACATAGAATCCAGAGAGACTATTGGCAAAAGCTCAAATGAGACACAGTAACAGTTTAGATTCAGACAG

TGGCTGTGGCATAAATCAGAAAATTGATAGTCGCATGATCCCTCTTTGCATGGGACTGGCATCTGTGTGG

AGTAATGGTTCCATATGCCTCCTTTCTTCTCCTTATTTTTAAATTTTTTAAAAATGCATTGCTTCTTGTG

GAAGTCAATAAGTGATTCTTCCAATACTTTCTCATTCCTTCCCCCTCAGTTATGAGACAATTTGCTTATT

TCTCATCCATGAATACTTGTTGGGTCATTAAAAGTAGATACTGAAATTACTAATGGTACGACTGACATAT

TACCTCATAAATGTTACTAGCTAGATGTTGAAAGTTGACCAACAACTCTCAAAATATGATTAAGAAAAGG

AAACCCACAGAACAGTTTGATTCCAAAATGATTTTTTTCTTTGCACATGCCTTACTTATTTGGACTTACA

TTGAAATTTTGCTTTATAGGAGAAAGGGTACAACTTGCCCATCATTTCTCTGAACCAGGTATGTTAATAT

TTGACAAAGAATAAAAGTCATTCCATTTTAAACTATCCATTGCTTGTTTCAAATGCCTAAGAAAATGTGT

CTATCTTAGAAGAGCATATGTTGTTAACTTTATTCACACAAAATTGTAAAGGCAAAGAAAATATTCTCTT

TTTAAAATTAAAATAGGCATTTCTTATTTTTAAAAACATTTTGGGGGCCAGGGGCCGTGGCTCATGCCTA

TAATCCCAGAACTTTGGGAGGCTGAGCCTGGCTAATCGCTTGAGCCCAGGAATTTGAGAACAGCCTGGGC

AATATGGCGAAATCCATCTCTACAAAAAATACAAAAATTAGCTGGCATGGGGCACGCACCTGTAGTCTCA

GCTACTTGGGAGGCTGGCTGAGGTGGGAGGATCGGATCCATTGCCTGAGTCTGGGAGTTTAAGGCTGCAG

TGAGCTATGACTGTGCCACTGTACTCTAGCCTTGGTAAGACCCTGTCTCAAAAACAAATACATAAGTAAA

TAAAAATAAATAAAAACATTTTGGAAATAGAAATACATAATTTGGTAATAGTTTTTCTCTTAAGTTAGAT

GTTTTACCTTTCTAACCAAGCCTGAGTACTTGAAAAAAGCCTCATAAGAGCTTATAAAACAAATGAACTT

CCCTCATATAAAAAGCAAGGCATTTAAAATCATCTAATTAACTGGTACTGTATTTCAAGGGTAAATCTCA

GCCTTGATTCATTTTTGGCCCAATGCAACCACTTAGGGACCATCTTGACAACCTCTGCTGAAGGGACATC

CCTTCCCCTCACTTGAGTATCACTGTGTGTGCTCATTTGCTATTCTGCATTCCAACCCTCCCTTCACACT

TGGCTGTGTCCACGGCTCACAGGGTAAAAAGCACATCATAGAACTTCATCACTATCGCATACATTCAAGC

TAAGTGGTCAAGAAGGCTGGGCAACACCAGCAAGAGGAAATGCTACTTTTACTTTTTATCAACAATAGGG

CTTTTAAATATTAATTAGGCAAATAAATGAGCCATTTTACCTTTATGTCTAGCCTTCCATTCTATTTACT

TCAACTGGAAGCACTACAAATATGCTATAAATATGGAAATATCTCTTAATTGATTTCAATTGTTTCATTC

CCAACATATAAATGACTCAACAAGCATTTTTAGTGACTACATTGGAGACTATGCATAAGAATACTATGGA

AGGAATAAAGCTTAGAACATAGATGACCTGCATTATAATTATAATTCTACTTTTAACTAGTTGTCTGACC

AAGGCTAAGTTAACCTTATTCAGCTTCTTTTCTTCATTTGTAAACTGTTTATACCAGTTTCTTTCCAAAA

TTATGATTCTATGATCTGTTCAATGCTCTTTTATACATTAAGACATTATTTTCTCTCATAACTTCCAAAC

TATGGGAGAATTTGTGGTTTTTTCCCCATATCTGAGGAGAACGTCCACTGAGTTCTTATCTACAGTTACA

CTAGTGAAGAACGCTGGGTCTGGAATCAGAAGCTTCAGGTCTTAGTTCTGTCATCAACTATTTTGCGACC

TTGGACAAAAGACTTGATCACTCACAGTCCCAGTTTCCCACAAGGTTACTGTAAAGCACACAATTTAAAA

AAAGACAAAATCTACATAATAGTATATTAATTGTGCTTTCTATTAAAAGGCAAGGTGATGGTATGCTGAT

GTTATCTGTCTTATTTTTCAGTTGCTATATGGTCATTTATTTCAGACTTTCATAATTTTGCTGCTCTCTT

TATCTCCTGTAGAGATAACACTCATTATTTTTGGGGTGATGGCTGGTGTTATTGGAACGATCCTCTTAAT

TTCTTACGGTATTCGCCGACTGATAAAGGTGAGAATTCAGTTTTTAATTTTGCTGTAAATACCAATGTGA

ACAGCTCTAAGAGGGTTTATTCCTCTGAGTTCAGTTAAACTCAAAAGAGAAACAGAACTGCATAAAATTC

CATATTTTTCAACTGGACACATAGAAGTCACTGTGTTTCTCTAGCAGAATTTTTCTTTGCATTTGCCCAA

TTAAAGGGAACCTCTAAATATAAATCTGTCCCCCATTTTCCCAATGAAAGATCTCCCTAAGTTTTTGTCT

AACTTGCTGTCACATATTTTGATGGATATTGAGGAAATATTAAGATTCTACTTATAGTATTTACCCTATT

AGTGTATAAAATATTTAAAATAATATATTTACATATGTTTAAAACTTTGAGGGAAGCCAAGGCAGGAGGA

TTGCTTGAGCTCAGGAGTTTGAGACCAGCCTGAGCAAAAAGGTGAAACCTAGTCTATACAAAAAATATGA

AAATTAGAAAGGCGTGGTGGTGCACATGTGTAGTATCAGCTACTCAGGGGGCTGAAGTGGGAGGATTGCT

TGAGCCTGGGAAATCAAGGCTGCAGTGAGCTGTGATCATGCTACTGCACTCCAGCCTGGGCAACAGAGTG

AGACCCTGTCTCAATAATTATATAAATAAATAAATAAAAATAAACAAAATAAAACTTTTGCCTTTCTTAA

TTCTCACATATTCTGAAACAGATTTTTCAAATTTCCACCCATGAATTCTTAACATCAGTGATTTTTTTTG

AATCATTAATGCTTTTTTTAATTTTTTTTTTTTTTTTTGAGACAAGAGTTTCCCTCTGTCACCCAGGCTC

GAGTGCAAAGTGGTGCAATCTCTGCTCACTGCAGCCTCTGCCTCCCTGGTTTAAGTGATTCTCGTGCTTC

AGCCTCCGCAGTAGTTGGGACTACAGGTGCGGGACACCATGCCTGACTAATTTTTGTATTTTTTTAATAG

CAGAGATGGGGTTTCGCTGTGTTGGCCAGGCTGGTTTCAAACTCCTGACCTCAAGTGATCCATCTGCCCT

TGGCCTCCAAAGTGCTGGGATTACAAGCATGAGCCACCACGCCCAGCCCACTAATGCTATTTTTACATCC

ATACAACACAGCTTATCGAAGTGCATAACTTTTGCTATCACTTTCTATTCACGATATTTAAGACATAATA

TGTGTGTGTGTATTTATGATGCTGTCACTGTCTCTGTAATCCTAGATCAGAAGTACTTAGTCACATGAGA

TTGGTACAGTTGTGTTTTCATTCATCCTCTATTCTTAATCTCTCTTTGTGATTTTTGAGACCATAACCAC

TATATAATTCTTTTAAAAAGGCTGAGAGGTGTGACAGCACTGCAATTGTGGGGCCATCAGAAGATATGAT

AGTAATATCTACATTAAGTTCCTTTGCCTCTTTTCTTTTTTAACTACTTCTAACAGTTAACTTCTACCAT

CATCCAATCCTATAATTGATTTTCAGTATTCCATGTAAATATATCTTCCTTAAATAATACTTTTTGTTAA

TCAAAGAAAAGTAACTGAAAATGCCTACTCTTGTGTGAGATATTTTGTAAGGACTTTAATATAAGATAGC

TTTTTTTGCCTGGAGTATAAAAGAGAAAAGTCATCTTCTTACATGGGCATATATGGCAAAGTGGGTTGTC

TTCTCTCTTCGTCAATGTTCTAAAACCTGAAAAAGCCAAGGAAATATTTAGTTGGCAAAGTTCAGAGAAT

TTTCTAAGTGTATATGGATGAATTTTGTCCTGGTCAACATGATGCAGAGATCACACACTTTATTTTTATT

TTTATTTTCACTTTCACTATTTATTACAGCAGGGAAATATGTAAGTATCAGTGTTTGAGGTGATATTTCT

CCTACTGAAATACCAAATACTATAGAGGAACACAAATACAAGTTTAAATCAATGCTTATACCAGTAACTA

GTAACAACAACAATAACAAAATCTCTGCAAAGGGGATTTCAACCAAAAGAAAAAAAATTTTAGAAAAAAA

TATTTTTAAGCTGAAGCATTTTACTTTTTACTGTCTTAAGACTAGAAAATTGTGTTATTAATATTTTATG

GTATTTCTTCATAGAAAAGCCCATCTGATGTAAAACCTCTCCCCTCACCTGACACAGACGTGCCTTTAAG

TTCTGTTGAAATAGAAAATCCAGGTTGGTGTTAATATTTGCAGTTCCTTTTGCCTTTTAGGAAAAAAAAA

TCAAACCAGTGAGTTACTTCTTTCTGATTTGAGGGAGGAGGGAACCAGTTATGATTCATTTCTATTCTAT

CTCATTAATTCTACTTCTTTGACTTTTTAGAAATGTCTGCAGCATAGTGAGATTCTCCTTTGGACACAAA

GTGTTTTGTTTTGTTTTGTTTTTTTAACAAAAAAAAAAAAACTCAATCAAATAGTAAAAGCAAAAGAGAA

AACCAAGTGTACTTCGTATTTCCCAAACTGCAAAGTTATGTGTATAGGAGACTCTATGGTCAGTATGGTG

TAGCATAGTGAATTAGCCCCAGATCTGAAATCAGACTTGGATTTGAATCCATGCTCCAACACCTATTAGC

TGTGTAACCCTGAGCAAGCTACTAAACCTCTTTTAATATGGGGATAATGATAGTATCAACCTCACAAAGT

TTAATGAGAATTAAATGAGCTACAACCGGTAAAGCATTTAAAACCATTTGTGGCCATCATAAGTCCTCAT

GCCTGTTAGCTGTTATCAATATAGCACTGACATCAATGCTATATCAATATAGCATGTTATCAATATAGTG

TCATTCCCAAATGACCTCCTGTGCACACTGGCAAGCCATCTGGCACATGCTTTCATCTCCACTCCCAGGT

GCTAAGCAGATACAAAACATGTGAAAGGCCATGGATATATTTTGTTTATCCAGAACAGTATTAAACCACA

TAGTGCTTTTTGAAAAGAATATTTATTGTCAACCTTTAAAAGTCGGAAATTGTTACATTTTAAAAATCAA

GTATTGCTATTCCTCTGGGGAAAAATGTAAACTCCCAAAATGCTGAGAGCCTTCATACCAGCATGAGACC

AATTCCTAAGAGCTGAGTAGTGGCTGCTACCTGTACTGTCTGTCTAAATCCCTAGCCAATTGCATTTGTT

TTATTCACCGTGGCCCCTGGTATGAACTCACTAAGAAAGCATATAGTTTCTATTAAACTTTGCCTGAAGC

ATAAACCCAAATGACATCTATTTTGGGAGATAGTTACTAAGAACAAGTCTCTGGAATGAGCTTTATTTCT

CAAGCAAAAGAGATTTCATTCTGCCTTCTACAAAATCAACTGATTTTACTCCCATAATTTTCAGAAATCA

TGACAGATCAGAGGTCCTGTATGCTTCTGGATTTCGATTTTAACCCTGGGCCAGTCTAGGTTTTCTAGAC

TTTAGAGTCACAGAACACAGAGTTTTCAAGATCCATCACAGCTACACAGGTTATATGCAGGATTTGCCAC

ATCACATTATCATGTGAATTCTTAAAGCTTAAGAGTAATTGTTACATAAGTTTATAATCCTAAGACATTC

CTGCTATGTGGAAATGAATGGCATAGATATGATTCTCAGCTAAAAGGATTAATAAAATCCAATCTGCAGA

TACTTGAAACAACGGAAGTTTTTGAGTCATATGCCAGATTCACTTCATTTACTAAGGTTATCTTGTTATT

GGACTGGCAGCTGGAACAAGTATCTGTAAAATATTCATTTTATCTGCATTCTGCCTTGTTCCACAAAAAA

GTCTTGATGTAGTTTTTCAAGTGGAGCAATTACAACCTAAAGCCTATTTTTCGAACTGAAATTTATATAC

ATTTTTAGCTACTTATTTATTCTAGAGACAAATTTATTGTTTAGAGTTTCCCCTGCCATTTTTTTCATAC

AATTTTAAGCATCTCAAATGTTTGGCACAATTTAATACGCCACAGTGCATCAAGATGTCCTTGTAGTTTA

ATTCAGTTAAGTGCAACAAACATTTGCTAAATGCATACAGTGGGGTAGGCACCACACTCACATTAGATAT

ACCAATATGAGTCTTCGTCCTTTAGAAGCTGAGAGACTAATGGAAAAAACAGAATGTCATTGCAGTGAAC

AAGTTCTACAGTAGTGGAGGCAATAGCTCCACTTGTCCCAGAGACTGAGACAGGTATCAAAGGCTTCTGA

AGATGAAATCACCTGGGATTAGCCTTAAAAGACAGATAGATATTAGCTAGGGCAGGGTAGTTTTAGCAGA

AGGGCAGCCTGAGTGAGTAAAAGCATGGAAGACAGAATATGTTTACTTAAAGAATTGTATGCATTTCCAC

ATTAGCAGGATTGCTGCTTTGGTTCTCTGTTCACATCTCAAATATGTGTAATGGCAGTGGAAAGTCAGAA

GAACCAAACTTTAGGCTCACTTTATTTCCCCACATTTGTGCAAGTGAAGTTATTAAATGTCTTAGTATGT

TAGTGAGACAAGTTATGAATTCTGACTGCACCTCACAGAAAACATAGGAAAACACATTATTAAAGATTAT

TTAAAATGCTTTATTTCTACTTTTATAGAATATGGCTCTAAATTAGTTTATAAGCCAAAGGCATAAGAGG

TTAAAATGACAGTACCATCTCAACAAGAACTAATGATGTAAAGGAGTAATTAGAGTATAAATTGTTTTAA

CCTTCTAAAAGTGCACATGATCTGTGATTGGTGAAAAATGAGAATAAGCGAATCTGAGTCAGCTGGCCAC

TGTGGCATGCATATGTGACCCACTAGCCTATTTCCCACAGGAGAATGTTTGAGATGCACAGTTCCTGTGG

TGCCCAAATAGAAGAAGGCTGGAAAAGCTCTGCTTCTGGAAGAGCAAGGGCTCCCCTCTCCCTTTCATGC

AGTTTCTAGGAGCAACATAAATTCAACCTTCCAACCAGGAAAAGTGGAGCATCGGGTTTACTGGAGAAAA

CTAGCCCAGTGCCCTTCTTTTACACCCTAGAACCAGAGAGGAACTTGGCCATAAGCTTTTGTGCAGACTT

CTCCTTGGGGGAAAAAAAAAGTCATTATTTAAAAAGACATGACAGACTTAGACACATGCCTTAAATTTTA

ACATGCATATGTGATTCAACTTATCATTTACTGGCTTCACATTATATTTTGCCTCTATACAAGTTTGGCT

GTTTGTTTCTTATCTCTGTAGAAACTAGGAGCAGAGCAATTATATTTATTCTTTACCTAAGGCTTTTAGA

ATAGATATTCTAAGAAATTCTGTATTTTTCTTTACACAAAACTTGACAATAGAGCTAATATGTAAGGAGA

GTCCTTTCGTTTCCTACTAATTACATTCAAGAACAACTCTGCAAGAATGTAGAATCCTAAAATGTATACT

GTGCATTAATTTCCTGTTGTGTTTAAACATAACTATGTCTCATATTTCGGTCTTGTATTTTTTTTACTAT

AATCCTTCTAGAGACAAGTGATCAATGAGAATCTGTTCACCAAACCAAATGTGGAAAGAACACAAAGAAG

ACATAAGACTTCAGTCAAGTGAAAAATTAACATGTGGACTGGACACTCCAATAAATTATATACCTGCCTA

AGTTGTACAATTTCAGAATGCAATTTTCATTATAATGAGTTCCAGTGACTCAATGATGGGGAAAAAAATC

TCTGCTCATTAATATTTCAAGATAAAGAACAAATGTTTCCTTGAATGCTTGCTTTTGTGTGTTAGCATAA

TTTTTAGAATTGTTTGAGAATTCTGATCCAAAACTTTAGTTGAATTCATCTACGTTTGTTTAATATTAAC

TTAACCTATTCTATTGTATTATAATGATGATTCTGTCAAATGAAAGGCTTGAAATACCTAGATGAAGTTT

AGATTTTCTTCCTATTGTAAACTTTTGAGTCTGGTTTCATTGTTTTAAATAAATTAAGGGGACACTAAAG

TCCTATCATTCATTTCCTTCATTGCTGAACAGGCAAGATATAATATTACATGAATGATTACTATATTTTG

TTCACACTAATAAAGCTTATGCTCAGAAATGCCATACACACACACAAACACACACATTTATCATTTAATG

CATAAATCAACACAAAAGGTTTTCCCATTAATATGAAATATTACATATATATAAGTGCCATATTTAAAAT

AATTTGTCTAACAGTAGAACTATGTCGGAGCACTCACTGAAGCTTGCATTCCACTGAAAGAGTTATTTGT

GTAAGTAGAGTATCCGGAGAAGGAAAAGAACTTACGACCTTTCTTTATAACAGAAACTCAACTCTAAATT

CAACAAGATGTGCAAACCGGACATGCAGGTGAATATTTTAATAGGTTACTATAAGGTTCTCAATTAAATT

CTTTAATCTGTCCAGTCCCAGTTTCTCTTATTAATAAAACTTTGGAAATTGCTTTAAACCATTTAAAGGA

AATTTCTAGATATAGAAACTAAGGACTGTGACTATACAGCTGTCACTCATTTGTAGTAAAACTTAAAAAG

CAAAAACAAAAAACAAAAAAGACCTTCCTGTGATACTTTATTTCCGAACTAATAAAAATCTATATGACTT

TTTATTATTGTGTGATAACCAAGTAAATGTTTTCTATTTTGCATATTTTCAGGCATGGTAACAGAAATTT

ACCTTTTAATAAATTAAAAAATCTAAATTTTAACCTACTTGTATGTTCGGAGAGTGTTTTTGTACTATAT

TGACTACTTAAAATAGAGAATGAGACTAAGAAGGGAACATTTCTGTTGATACATGTTTTTTAAAAGAAAT

TTTAAGAGCATTATTAGGTTAATTTTAATCCAATTAATGACCCAAATGCCAAGGTAATTTTAAATTTACA

TTTTTAATAAAAGCAACATGTTGAAACAAGAGAGGGTGAGATTAACCTTTTTGCTAAAGTAATTTACAAG

TCAAAGACAGGAAGAGATCAGAGTGAATGTGCCTTCTTAACCAGAGCTACAGAATTTAGTGAATAATTAA

AGTACAAACTGCTTTGACCTCCTTGAACTTTTCCAAGCAATTTCTCTGTACTTCTATATATGAATGTCTT

AGCCAATTTTCTGCTACTATAACAGAATACGACAGACTGGGTAATTTAAAAAGAAAAGAAATTTATTTTC

TTCCTAGTTCTGGAGGCTGGGAAGGCGAAGGGCATGGCACTGACATCTGCCTTGTAACTGATGAGAACCT

TCTTACTGCATGATAACAAAGCAGCAAGGCAAGCAAAAGCGTAAGATGAAGAGAGAGGAAATGAAGCCAA

ACACATCCTTTCATCAGAAGCCCATTCCCTCTATAAGGCGTTACTACATTTATGAGAATGGAGTCCTCAT

GACCTAATCGTGACCTTAAAGGCCCCTCCCAACACTGTTACAATGGCAATTAAATTTCAACAAAGGTTCC

AGAGGTGACATTCGAATCAGCAATGAAATTTTCATAGTTAAATTTGGTATTCGTGGGGGAAGAAATGACC

ATTTCCCTTGTATTTTTATAATTAAATCAGCAAAATATTGTAATAAAGAAATCTTTCCTGTGAAGATACC

ATGACCCC

Enhancer elements used in the nucleic acids described herein can be single instances of an enhancer element sequence, or concatenations or repeats of one or more individual unique enhancer element sequences. Concatenations and repeats can comprise 2, 3, 4, 5, or more instances of a single sequence, or a collection of 2, 3, 4, 5, or more distinguishable enhancer element sequences (e.g., different elements from one gene or different elements from different genes).

In some embodiments of any of the aspects, the hematopoietic enhancer element is located at least about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least about 5 kb, at least about 6 kb, at least about 7 kb, at least about 8 kb, at least about 9 kb, at least about 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the hematopoietic enhancer element sequence is located at least 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, at least 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the hematopoietic enhancer element sequence is located at about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at about 5 kb, at about 6 kb, at about 7 kb, at about 8 kb, at about 9 kb, or at about 10 kb from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the hematopoietic enhancer element sequence can be in intergenic sequence or in the sequence of an intervening gene. In some embodiments of any of the aspects described herein, the hematopoietic enhancer element sequence can be located within the sequence that is about 500 bp to about 10 kb from the end of the open reading frame, e.g., about 1 kb to about 9 kb, about 2 kb to about 8 kb, about 3 kb to about 7 kb, or about 4 kb to about 6 kb from the open reading frame. In some embodiments of any of the aspects described herein, the hematopoietic enhancer element sequence can be located within the sequence that is 500 bp to 10 kb from the end of the open reading frame, e.g., 1 kb to 9 kb, 2 kb to 8 kb, 3 kb to 7 kb, or 4 kb to 6 kb from the open reading frame.
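The distance ranges recited above can be expressed as a simple interval check. The following Python sketch is a minimal illustration only, assuming that distance is measured as the gap in base pairs between an element and the nearest boundary of the open reading frame on a single coordinate axis; the `Interval` type, the function names, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end) in base pairs on one coordinate axis."""
    start: int
    end: int


def distance_to_orf(element: Interval, orf: Interval) -> int:
    """Gap in bp between an element and the nearest ORF boundary (0 if overlapping)."""
    if element.end <= orf.start:        # element lies entirely upstream of the ORF
        return orf.start - element.end
    if element.start >= orf.end:        # element lies entirely downstream of the ORF
        return element.start - orf.end
    return 0                            # element overlaps the ORF


def in_distance_window(element: Interval, orf: Interval,
                       min_bp: int = 500, max_bp: int = 10_000) -> bool:
    """True if the element lies between min_bp and max_bp (inclusive) from the ORF."""
    return min_bp <= distance_to_orf(element, orf) <= max_bp


if __name__ == "__main__":
    orf = Interval(20_000, 28_000)            # hypothetical ORF coordinates
    enhancer = Interval(16_500, 17_000)       # hypothetical element about 3 kb upstream
    print(distance_to_orf(enhancer, orf))     # 3000
    print(in_distance_window(enhancer, orf))  # True
```

The same check applies unchanged to a GATA1 hematopoietic enhancer minigene or a miRNA binding site, since the recited distance ranges have the same form.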

In some embodiments of any of the aspects, the heterologous regulatory sequence is a GATA1 hematopoietic enhancer minigene (G1HEM). The G1HEM can permit lineage-specific expression of GATA1 in early erythroid progenitors but not in hematopoietic stem cells, e.g., as a gene therapeutic approach for the treatment of Diamond-Blackfan anemia. The GATA1 hematopoietic enhancer minigene (G1HEM) comprises a concatenation of 4 distinct regulatory elements to achieve lineage-specific expression of GATA1 in early erythroid progenitors. The G1HEM elements disclosed herein include a −3 kb hematopoietic enhancer, an upstream double GATA motif, an upstream CACCC box, and a segment of the first intron of GATA1. Indeed, the 979 nucleotides present in this minigene are sufficient to drive appropriate expression of a Gata1 cDNA, rescuing a Gata1 knockout mouse and allowing ostensibly normal erythropoiesis.

In some embodiments of any of the aspects, the GATA1 hematopoietic enhancer minigene (G1HEM) comprises the following nucleic acid sequence (SEQ ID NO: 13):

ACCGGTGGCGCGCCGATCCAAGGAAGAGAGGACATTAGCATGGGTCTCAA

ATGGAAGCCTGACAGAGAAGACGCTTCAACCCGGACACCCCACCCCCGCC

TGCAATGGGCTCCCCCAAGCCTAGCCTGGCCCCCGCTGATTCCCTTATCT

ATGCCTTCCCAGCTGCCTCCCTGCTGGCTGAACTGTGGCCACAGACTTCT

GGGCCTTGCACCCCCTCCACTGCCCCCCAGCCCCAAGACAGCCTGTTACT

GCGGCACCAACAGCCACAGTCGAGTCCATCTGATAAGACTTATCTGCTGC

CCCAGAGCAGGCCAGAGCTGGCGTAAGCCCCAGGCACGAGCCGAAGCACT

AAAGAAGTGTATGTACCCTTACCCACTAGTAGTAAAACATGAAACTTAGA

TCTTGACTAATTGCTCATATGACTTGACTGGACACTGGACTCCACAGAAG

CCAAAGGCAAAGGGGATCCAACAACCTGCAGGATAGACAGGAAGGGCGGA

GGGACTAGAGCCTAAAAGGTCCTCCACAAGGAGGCGGCACACCCCCTCCC

CTGCACTGCCCCACCCACTGGGGCACCAGCCACTCCCTGGGGAGGAAAGA

GGAGGGAGAAGGTGAGTGGGAGGGAGGGAGGGCGGGCGGGCTGGCAGGAG

GGAGAGAAGGGAGACTCAGAGGCCGAGCTCCAAGGATAAATTACTTGTTG

AATAAGGATCTAATGTGTAGAACCCATACTGACATGGTAGCAGGCACATC

AGCACAGTTTTAGGGAAATGGGAGATGGAGAAGACTCACTGGAGGCTCAC

AGGCCTGTCCTGGTACACACGGTGGAAAAATATGAGACCCTCTTTAAAAA

GGAAGTGGATGGTAAGGACCAACACCCATGTTTGTCCACTGACCTCCAGA

TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATA

GATAGACAGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGA

TTGACTGCAG

In some embodiments of any of the aspects, described herein is a GATA1 hematopoietic enhancer minigene (G1HEM) comprising, consisting of, or consisting essentially of a sequence having at least 80% homology to SEQ ID NO: 13. In some embodiments of any of the aspects, a GATA1 hematopoietic enhancer minigene (G1HEM) comprises, consists of, or consists essentially of a sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to SEQ ID NO: 13.
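Percent sequence identity can be computed in several ways, and the text above does not fix one. As a minimal sketch, assuming a Needleman-Wunsch global alignment with simple linear scoring (match +1, mismatch -1, gap -1, all hypothetical choices) and identity defined as identical aligned columns divided by total alignment columns, a candidate sequence could be compared with SEQ ID NO: 13 as follows.

```python
def percent_identity(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over one optimal Needleman-Wunsch global alignment.

    Identity = identical aligned columns / total alignment columns (gaps included).
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting identical and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1                      # gap in b
        else:
            j -= 1                      # gap in a
        columns += 1
    return 100.0 * identical / columns
```

For example, `percent_identity(candidate, seq_id_13) >= 80.0` would test the 80% threshold, where `candidate` and `seq_id_13` are hypothetical variables holding the two sequences. Dedicated tools, such as EMBOSS needle or Biopython's `PairwiseAligner`, typically use affine gap penalties and can report different values for the same pair of sequences.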

In some embodiments of any of the aspects, the nucleic acid sequence comprises at least one, or at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 20, or at least 25, or at least 30 GATA1 hematopoietic enhancer minigenes (G1HEM).

In some embodiments of any of the aspects, the GATA1 hematopoietic enhancer minigene is located at least about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least about 5 kb, at least about 6 kb, at least about 7 kb, at least about 8 kb, at least about 9 kb, at least about 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the GATA1 hematopoietic enhancer minigene sequence is located at least 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, at least 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the GATA1 hematopoietic enhancer minigene is located at about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at about 5 kb, at about 6 kb, at about 7 kb, at about 8 kb, at about 9 kb, or at about 10 kb from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the GATA1 hematopoietic enhancer minigene sequence can be in intergenic sequence or in the sequence of an intervening gene. In some embodiments of any of the aspects described herein, the GATA1 hematopoietic enhancer minigene sequence can be located about 500 bp to about 10 kb from the end of the open reading frame, e.g., about 1 kb to about 9 kb, about 2 kb to about 8 kb, about 3 kb to about 7 kb, or about 4 kb to about 6 kb from the open reading frame. In some embodiments of any of the aspects described herein, the GATA1 hematopoietic enhancer minigene sequence is located 500 bp to 10 kb from the end of the open reading frame, e.g., 1 kb to 9 kb, 2 kb to 8 kb, 3 kb to 7 kb, or 4 kb to 6 kb from the open reading frame.

In some embodiments of any of the aspects, disclosed herein are binding sites for HSC-restricted miRNAs that permit regulated expression of GATA1 in hematopoietic progenitors to improve erythropoiesis in DBA without unwanted effects on hematopoiesis.

Non-limiting examples of HSC-restricted miRNAs include miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e. Sequences for these miRNAs are known in the art for a number of species, e.g., human miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e.

Binding sites for each of these miRNAs are similarly known in the art and include those readily available on miRBase, miRDB, and/or TargetScan. Briefly, animal miRNA binding sites are complementary to at least the "seed region" (6-8 nt in length) of the miRNA's sequence. Seed regions for each of the miRNAs described herein are publicly available, e.g., at TargetScan, and are provided herein as SEQ ID NOs: 43-55 in Table 2.

In some embodiments of any of the aspects, a binding site for a given miRNA described herein can be a sequence that comprises, consists of, or consists essentially of a sequence complementary to the seed region of that miRNA. In some embodiments of any of the aspects, a nucleic acid sequence described herein can comprise 2, 3, 4, or more repeats of a sequence complementary to the seed region of a single HSC restricted miRNA. Such a sequence can include repeats of an individual sequence and/or combinations of different sequences in series.
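Complementarity to a seed region, and tandem repeats of a binding site, can be illustrated with a few string operations. The sketch below assumes the conventional seed definition (nucleotides 2-8 counted from the 5′ end of the mature miRNA), which is not necessarily the convention used for the exemplary seed regions in Table 2; it returns binding sites as DNA, and the optional spacer between repeats is purely hypothetical. Applied to the mature miR10aT sequence listed in Table 2 (SEQ ID NO: 18), the fully complementary site it returns matches the miR10aT binding site listed there (SEQ ID NO: 31).

```python
# Map each base to its DNA complement; RNA U is complemented like DNA T.
_COMPLEMENT = str.maketrans("ACGUT", "TGCAA")


def reverse_complement_dna(seq: str) -> str:
    """Reverse complement of an RNA or DNA sequence, returned as DNA."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def seed_region(mature_mirna: str, first: int = 2, last: int = 8) -> str:
    """Nucleotides first..last (1-based, inclusive) from the miRNA 5' end."""
    return mature_mirna.upper()[first - 1:last]


def perfect_binding_site(mature_mirna: str) -> str:
    """DNA site fully complementary to the whole mature miRNA."""
    return reverse_complement_dna(mature_mirna)


def seed_match_site(mature_mirna: str) -> str:
    """Minimal DNA site complementary to the seed region only."""
    return reverse_complement_dna(seed_region(mature_mirna))


def tandem_sites(site: str, copies: int = 4, spacer: str = "") -> str:
    """Join `copies` repeats of a site, optionally separated by a spacer."""
    return spacer.join([site] * copies)


if __name__ == "__main__":
    mir10a = "UACCCUGUAGAUCCGAAUUUGUG"        # mature miR10aT (SEQ ID NO: 18)
    site = perfect_binding_site(mir10a)
    print(site)                               # CACAAATTCGGATCTACAGGGTA
    print(seed_match_site(mir10a))            # ACAGGGT
    print(len(tandem_sites(site, copies=4)))  # 92
```

The seed-match site is contained in the fully complementary site, which is why either can serve as a binding site; four tandem copies correspond to the 4-repeat embodiment described below.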

In some embodiments of any of the aspects, a binding site for two or more miRNAs described herein can be a sequence that comprises, consists of, or consists essentially of sequences complementary to the seed region(s) of those miRNAs. In some embodiments of any of the aspects, a binding site for two or more miRNAs described herein can be a sequence that comprises, consists of, or consists essentially of 2, 3, 4, or more repeats of sequences complementary to the seed region(s) of those miRNAs. Such a sequence can include repeats of an individual sequence and/or combinations of different sequences in series.

In some embodiments of any of the aspects, a binding site for one or more miRNAs described herein can be a sequence that comprises, consists of, or consists essentially of a sequence or sequences selected from SEQ ID NOs: 31-37. In some embodiments of any of the aspects, a binding site for one or more miRNAs described herein can be a sequence that comprises, consists of, or consists essentially of 2, 3, 4, or more sequences selected from SEQ ID NOs: 31-37. Such a sequence can include repeats of an individual sequence and/or combinations of different sequences in series. In some embodiments of any of the aspects, a nucleic acid sequence described herein can comprise a sequence that comprises, consists of, or consists essentially of 4 repeats of a sequence selected from SEQ ID NOs: 31-37.

TABLE 2

Non-limiting examples of HSC-restricted miRNA names, miRBase accession number, nucleotide sequence, exemplary seed regions and exemplary nucleotide sequence of the miRNA binding site.

| miRNA name | miRBase accession number | Nucleotide sequence of the mature miRNA | Exemplary seed regions | Nucleotide sequence of exemplary miRNA binding site |
| miR10aT | MI0000266 | UACCCUGUAGAUCCGAAUUUGUG (SEQ ID NO: 18) | UGUCCCA (SEQ ID NO: 43) | CACAAATTCGGATCTACAGGGTA (SEQ ID NO: 31) |
| miR99 | MI0000101 | AACCCGUAGAUCCGAUCUUGUG (SEQ ID NO: 19) | AUGCCCA (SEQ ID NO: 44) |  |
| miR125 | MI0000469 | ACAGGUGAGGUUCUUGGGAGCC (SEQ ID NO: 20) | GAGUCCC (SEQ ID NO: 45) |  |
| miR126 | MI0000471 | CAUUAUUACUUUUGGUACGCG (SEQ ID NO: 21) | GCCAUGC (SEQ ID NO: 46) | GCATTATTACTCACGGTACGA (SEQ ID NO: 32) |
| miR155 | MI0000681 | CUGUUAAUGCUAAUCGUGAUAGGGGUUUUUGCCUCCAACUGACUCCUACAUAUUAGCAUUAACAG (SEQ ID NO: 22) | CGUAAU (SEQ ID NO: 47) |  |
| miR181 | MI0000289 | AACAUUCAACGCUGUCGGUGAGU (SEQ ID NO: 23) | ACUUACA (SEQ ID NO: 48) |  |
| miR193 | MI0000487 | AACUGGCCUACAAAGUCCCAGU (SEQ ID NO: 24) | CCGGUCA (SEQ ID NO: 49) |  |
| miR196bT | MI0000238 | CAACAACAUUAAACCACCCGA (SEQ ID NO: 25) | UGAUGGA (SEQ ID NO: 50) | CCAACAACAGGAAACTACCTA (SEQ ID NO: 33) |
| miR223T | MI0000300 | UGUCAGUUUGUCAAAUACCCCA (SEQ ID NO: 26) | UUGACUG (SEQ ID NO: 51) | TGTCAGTTTGTCAAATACCCC (SEQ ID NO: 34) |
| miR542 | MI0003686 | UGUGACAGAUUGAUAACUGAAA (SEQ ID NO: 27) | AGGGGC (SEQ ID NO: 52) |  |
| let7e | MI0000066 | UGAGGUAGGAGGUUGUAUAGUU (SEQ ID NO: 28) | GGCAUAU (SEQ ID NO: 53) | AACTATACAACCTACTACCTCA (SEQ ID NO: 35) |
| miR130aAT | MI0000448 | GCUCUUUUCACAUUGUGCUACU (SEQ ID NO: 29) | AACGUGAC (SEQ ID NO: 54) | CAGTGCAATGTTAAAAGGGCAT (SEQ ID NO: 36) |
| miR142T | MI0000458 | CAUAAAGUAGAAAGCACUACU (SEQ ID NO: 30) | UGAAAUA (SEQ ID NO: 55) | TCCATAAAGTAGGAAACACTACA (SEQ ID NO: 37) |

In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising at least one miRNA binding site for at least one HSC-restricted miRNA, the binding site being selected from the group consisting of binding sites for miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e. In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising at least one, or at least two, or at least three, or at least four, or at least five, or at least six, or at least seven, or at least eight, or at least ten, or at least eleven, or at least twelve binding sites for at least one HSC-restricted miRNA, the binding sites being selected from the group consisting of binding sites for miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e. Where a subset of the binding sites for the foregoing miRNAs is used, any combination of those binding sites can be used in each of the various embodiments of the aspects described herein. For example, it is specifically contemplated herein that any pairwise combination of binding sites for the 13 miRNAs can be used, e.g., any combination shown in Table 3.

In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising at least one hematopoietic enhancer element and at least one miRNA binding site for at least one HSC-restricted miRNA. In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising at least one hematopoietic enhancer element, at least one binding site for at least one HSC-restricted miRNA, and a sequence encoding a GATA1 polypeptide.

TABLE 3

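Contemplated exemplary combinations of miRNA binding sites are indicated by "X".

|  | miR10aT | miR125 | miR155 | miR130aT | miR196bT | miR142T | miR99 | miR126 | miR181 | miR193 | miR223T | miR542 | Let7e |
| miR10aT |  | X | X | X | X | X | X | X | X | X | X | X | X |
| miR125 | X |  | X | X | X | X | X | X | X | X | X | X | X |
| miR155 | X | X |  | X | X | X | X | X | X | X | X | X | X |
| miR130aT | X | X | X |  | X | X | X | X | X | X | X | X | X |
| miR196bT | X | X | X | X |  | X | X | X | X | X | X | X | X |
| miR142T | X | X | X | X | X |  | X | X | X | X | X | X | X |
| miR99 | X | X | X | X | X | X |  | X | X | X | X | X | X |
| miR126 | X | X | X | X | X | X | X |  | X | X | X | X | X |
| miR181 | X | X | X | X | X | X | X | X |  | X | X | X | X |
| miR193 | X | X | X | X | X | X | X | X | X |  | X | X | X |
| miR223T | X | X | X | X | X | X | X | X | X | X |  | X | X |
| miR542 | X | X | X | X | X | X | X | X | X | X | X |  | X |
| Let7e | X | X | X | X | X | X | X | X | X | X | X | X |  |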
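Because Table 3 marks every off-diagonal cell, each ordered pair of distinct miRNAs appears once and each unordered pair appears twice. The minimal Python sketch below, which uses only the miRNA names from Table 3 (the function names are hypothetical), makes that arithmetic explicit: 13 × 12 = 156 marked cells correspond to 13 × 12 / 2 = 78 distinct pairwise combinations.

```python
from itertools import combinations

# The 13 HSC-restricted miRNAs, in the row/column order of Table 3.
HSC_MIRNAS = [
    "miR10aT", "miR125", "miR155", "miR130aT", "miR196bT", "miR142T", "miR99",
    "miR126", "miR181", "miR193", "miR223T", "miR542", "Let7e",
]


def contemplated_matrix(names):
    """Table 3 as a matrix: 'X' for each pair of distinct miRNAs, '' on the diagonal."""
    return [["" if r == c else "X" for c in names] for r in names]


def unordered_pairs(names):
    """Each two-miRNA combination exactly once, irrespective of order."""
    return list(combinations(names, 2))


if __name__ == "__main__":
    matrix = contemplated_matrix(HSC_MIRNAS)
    marked = sum(cell == "X" for row in matrix for cell in row)
    print(marked)                            # 156 marked (row, column) cells
    print(len(unordered_pairs(HSC_MIRNAS)))  # 78 distinct pairs
```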

In some embodiments of any of the aspects, the miRNA binding site is located at least about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least about 5 kb, at least about 6 kb, at least about 7 kb, at least about 8 kb, at least about 9 kb, at least about 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the miRNA binding site sequence is located at least 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, at least 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the miRNA binding site sequence is located at about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at about 5 kb, at about 6 kb, at about 7 kb, at about 8 kb, at about 9 kb, or at about 10 kb from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the miRNA binding site sequences can be in intergenic sequence or in the sequence of an intervening gene. In some embodiments of any of the aspects described herein, the miRNA binding site sequences are located within the sequence that is about 500 bp to about 10 kb from the end of the open reading frame, e.g., about 1 kb to about 9 kb, about 2 kb to about 8 kb, about 3 kb to about 7 kb, or about 4 kb to about 6 kb from the open reading frame. In some embodiments of any of the aspects described herein, the miRNA binding site sequences are located 500 bp to 10 kb from the end of the open reading frame, e.g., 1 kb to 9 kb, 2 kb to 8 kb, 3 kb to 7 kb, or 4 kb to 6 kb from the open reading frame.

In some embodiments of any of the aspects, disclosed herein are nucleic acid sequences comprising a sequence encoding a GATA1 polypeptide and a heterologous 5′ UTR. Such combinations permit lineage-specific expression of GATA1 in early erythroid progenitors.

Cap analysis of gene expression was used to define 5′ untranslated regions (UTRs) for transcripts in hematopoietic stem and progenitor cells (HSPCs) undergoing erythroid lineage commitment, the stage at which the functional defects in erythroid differentiation arise. Transcripts that were most highly translated at baseline and that had short, unstructured 5′ UTRs tended to be the ones downregulated at the translational level in the setting of ribosomal protein (RP) haploinsufficiency. The 5′ UTR, "5′ untranslated region", or 5′ leader sequence refers to the region of an mRNA upstream of the start codon that is not translated. Described herein is the discovery that, among all hematopoietic master transcription factors, only GATA1 has a short 5′ UTR, and that replacing this 5′ UTR with those of other transcription factors (including but not limited to RUNX1, LMO2, or ETV6) alters the translation of the GATA1 hematopoietic transcription factor.

In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising i) a heterologous 5′ UTR comprising a) a 5′UTR sequence of a hematopoietic transcription factor other than GATA1; b) a sequence of at least 20 nucleotides; and/or c) 1-25 upstream start codons (uAUGs); and ii) a nucleic acid sequence encoding a GATA1 polypeptide. In some embodiments of any of the aspects, a nucleic acid sequence described herein can further comprise a heterologous 5′ UTR comprising a) a 5′UTR sequence of a hematopoietic transcription factor other than GATA1; b) a sequence of at least 20 nucleotides; and/or c) 1-25 upstream start codons (uAUGs).
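The length and uAUG criteria for a heterologous 5′ UTR can be checked by scanning the leader sequence for AUG triplets. In the minimal sketch below, the thresholds mirror the ranges recited above (at least 20 nucleotides; 1-25 uAUGs), every AUG in any reading frame is counted as a uAUG (an assumption), and the demonstration leader is hypothetical. Because the criteria are recited as "and/or", the two checks are reported separately rather than combined.

```python
def uaug_positions(utr: str) -> list[int]:
    """0-based positions of every AUG/ATG triplet in a 5' UTR, in all frames."""
    seq = utr.upper().replace("U", "T")
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def utr_features(utr: str, min_length: int = 20,
                 min_uaug: int = 1, max_uaug: int = 25) -> dict:
    """Report UTR length and uAUG count, and whether each falls in the given range."""
    n_uaug = len(uaug_positions(utr))
    return {
        "length": len(utr),
        "uaug_count": n_uaug,
        "length_ok": len(utr) >= min_length,
        "uaug_ok": min_uaug <= n_uaug <= max_uaug,
    }


if __name__ == "__main__":
    demo = "GGAUGCCAUGAAGCUUCCGCAGG"  # hypothetical 23-nt leader with two uAUGs
    print(uaug_positions(demo))      # [2, 7]
    print(utr_features(demo))
```

The same functions could be applied to the RUNX1, LMO2, or ETV6 5′ UTR sequences provided herein (SEQ ID NOs: 14-16) when comparing candidate leaders.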

The length of the 5′ UTR can be modified by mutation, for example by substitution, deletion, or insertion of nucleotides within the 5′ UTR. The 5′ UTR can be further modified by mutating a naturally occurring start codon or translation initiation site such that the codon no longer functions as a start codon and translation may initiate at an alternate initiation site.

In some embodiments of any of the aspects, the 5′UTR sequence of a hematopoietic transcription factor other than GATA1 can be a 5′UTR of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1), LIM Domain Only 2 (LMO2), and ETS Variant 6 (ETV6).

As used herein, “RUNX1”, “AML1”, or “Runt-related transcription factor 1” refers to the alpha subunit of the heterodimeric core binding factor (CBF) transcription factor, which is thought to be involved in the development of normal hematopoiesis. RUNX1 is itself a transcription factor and complexes with the CBFB cofactor to form CBF. Sequences for RUNX1 are known for a number of species, e.g., human RUNX1 (NCBI Gene ID 861) mRNA sequences (e.g., NM_001001890.2) and polypeptide sequences (e.g., NP_001001890.1) are known in the art. These, together with any naturally occurring allelic, splice variants, and processed forms thereof that catalyze the same reaction are contemplated for use in the methods and compositions described herein.

In some embodiments of any of the aspects, the RUNX1 5′ UTR comprises a 5′UTR that comprises, consists of, consists essentially of or is derived from the following nucleic acid sequence: NG_011402.2:940414-1201911 Homo sapiens RUNX family transcription factor 1 (RUNX1), RefSeqGene (LRG 482) on chromosome 21, (SEQ ID NO: 14):

CACAGAACCACAAGTTGGGTAGCCTGGCAGTGTCAGAAGTCTGAACCCAG

CATAGTGGTCAGCAGGCAGGACGAATCACACTGAATGCAAACCACAGGGT

TTCGCAGCGTGGTAAAAGAAATCATTGAGTCCCCCGCCTTCAGAAGAGGG

TGCATTTTCAGGAGGAAGCG

As used herein, “LMO2”, “TTG2”, or “LIM Domain Only 2” refers to a cysteine-rich, two LIM-domain protein that is required for yolk sac erythropoiesis. Sequences for LMO2 are known for a number of species, e.g., human LMO2 (NCBI Gene ID 4005) mRNA sequences (e.g., NM_001142315.1) and polypeptide sequences (e.g., NP_001135787.1) are known in the art. These, together with any naturally occurring allelic, splice variants, and processed forms thereof that catalyze the same reaction are contemplated for use in the methods and compositions described herein.

In some embodiments of any of the aspects, the LMO2 5′ UTR comprises a 5′UTR that comprises, consists of, consists essentially of, or is derived from the following nucleic acid sequence: NC_000011.10:c33892289-33858576 Homo sapiens chromosome 11, GRCh38.p12, (SEQ ID NO: 15):

ACAAGGGCCTCTGGGTGTCCTGGCAGAGAGGGGAGATGGCACAGGCACCA

GGTGCTAGGGTGCCAGGGCCTCCCGAGAAGGAACAGGTGCAAAGCAGGCA

ATTAGCCCAGAAGGTATCCGTGGGGCAGGCAGCCTAGATCTGATGGGGGA

AGCCACCAGGATTACATCATCTGCTGTAACAACTGCTCTGAAAAGAAGAT

ATTTTTCAACCTGAACTTGCAGTAGCTAGTGGAGAGGCAGGAAAAAGGAA

ATGAAACCAGAGACAGAGGGAAGCTGAGCGAAAATAGACCTTCCCGAGAG

AGGAGGAAGCCCGGAGAGAGACGCACGGTCCCCTCCCCGCCCCTAGGCCG

CCGCCCCCTCTCTGCCCTCGGCGGCGAGCAGCGCGCCGCGACCCGGGCCG

AAGGTGCGAGGGGCTCCGGGCGGCCGGGCGGGCGCACACCATCCCCGCGG

GCGGCGCGGAGCCGGCGACAGCGCGCGAGAGGGACCGGGCGGTGGCGGCG

GCGGGACCGGG

As used herein, “ETV6”, “TEL”, or “ETS Variant 6” refers to a transcription factor with two functional domains: an N-terminal pointed (PNT) domain that is involved in protein-protein interactions with itself and other proteins, and a C-terminal DNA-binding domain. Sequences for ETV6 are known for a number of species, e.g., human ETV6 (NCBI Gene ID 2120) mRNA sequences (e.g., NM_001987.4) and polypeptide sequences (e.g., NP_001978.1) are known in the art. These, together with any naturally occurring allelic, splice variants, and processed forms thereof that catalyze the same reaction are contemplated for use in the methods and compositions described herein.

In some embodiments of any of the aspects, the ETV6 5′ UTR comprises a 5′UTR that comprises, consists of, consists essentially of, or is derived from the following nucleic acid sequence: NG_011443.1:5001-250549 Homo sapiens ETS variant 6 (ETV6), RefSeqGene (LRG 609) on chromosome 12 (SEQ ID NO: 16):

CGTCAGTTTCTGCACTGAAACTCTCAAGATCAATGAGCAAAGAGCTTTCT

CAGTTCTGCCTTTCAGTTTCTCTCTTCCAGGAAGGAAAACATTCGAGAGA

GCGAGGGAGAGCCGCGGGAGGGCGGGGGGCGGGGGCGCCGGCTGCGGGTG

GGAGGAGAGACCGGGAGGCCGGCCGGGCTGCGTCCCGGGTCCCCGCGCCG

CGCCGCGACCTGCAGACCCCGCCGCCGCGCTCGGGCCCGTCTCCCACGCC

CCCGCCGCCCCGCGCGCCCAACTCCGCCGGCCGCCCCGCCCCGCCCCGCG

CGCTCCAGACCCCCGGGGCGGCTGCCGGGAGAGATGCTGGAAGAAACTTC

TTAAATGACCGCGTCTGGCTGGCCGTGGAGCCTTTCTGGGTTGGGGAGAG

GAAAGGAAAGTGGAAAAAACCTGAGAACTTCCTGATCTCTCTCGCTGTGA

GAC

The nucleic acid sequences/elements described herein can be operably linked so that they can interact either directly or indirectly to carry out an intended function, e.g. the mediation or modulation of expression of a nucleic acid sequence. “Operably linked” refers to an arrangement of elements wherein the components so described are configured so as to perform their usual function. Thus, control elements operably linked to an open reading frame are capable of effecting the expression of the open reading frame. The control elements need not be contiguous with the open reading frame, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the open reading frame and the promoter sequence can still be considered “operably linked” to the open reading frame. The interaction of operatively linked sequences can, for example, be mediated by proteins that interact with the operatively linked sequences.

In some embodiments of any of the aspects, a promoter can be operably linked to any of the elements disclosed herein, e.g., a nucleic acid sequence comprising a heterologous 5′UTR, at least one distal hematopoietic stem cell (HSC)-restricted enhancer element, a binding site for an HSC-restricted miRNA, and/or a nucleic acid encoding a GATA1 polypeptide. In some embodiments of any of the aspects, the promoter is not a GATA1 promoter.
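The elements discussed above can be modeled as an ordered list of named parts that are concatenated 5′ to 3′ while recording where each part lands, which is one way to keep track of operably linked components and any intervening sequence between them. In this minimal sketch, the `Element` type, the element names, their order, and the placeholder sequences are all hypothetical and do not describe a specific construct disclosed herein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    sequence: str


def assemble(elements: list[Element]) -> tuple[str, list[tuple[str, int, int]]]:
    """Concatenate elements 5'->3'; return the sequence and (name, start, end) spans."""
    parts, spans, pos = [], [], 0
    for el in elements:
        spans.append((el.name, pos, pos + len(el.sequence)))  # half-open span
        parts.append(el.sequence)
        pos += len(el.sequence)
    return "".join(parts), spans


if __name__ == "__main__":
    # Placeholder sequences only; this element order is an assumption for illustration.
    cassette = [
        Element("promoter", "AAAA"),
        Element("heterologous_5utr", "CCCC"),
        Element("GATA1_cds", "ATGGGG"),
        Element("miRNA_sites", "TTTT"),
        Element("WPRE", "GGG"),
    ]
    seq, spans = assemble(cassette)
    print(seq)    # AAAACCCCATGGGGTTTTGGG
    print(spans)
```

Recording spans as a list rather than a dictionary keeps repeated elements, such as several copies of a miRNA binding site, from overwriting one another.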

In some embodiments of any of the aspects, the promoter comprises a promoter sequence of Elongation factor 1-alpha 1 (eEF1a1). As used herein, “eEF1a1”, “CCS-3”, or “LENG7” refers to the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. Sequences for eEF1a1 are known in the art for a number of species, e.g., human eEF1a1 (NCBI Gene ID 1915). In some embodiments of any of the aspects, the eEF1a1 promoter comprises a promoter that comprises, consists of, consists essentially of, or is derived from the following nucleic acid sequence: NC_000006.12:c73521032-73515750 Homo sapiens chromosome 6, GRCh38.p12 Primary Assembly (SEQ ID NO: 17):

CTTTTTCGCAACGGGTTTGCCGCCAGAACACAGGTAAGTGCCGTGTGTGGTTCCCGCGGGCCTGGCCTCT

TTACGGGTTATGGCCCTTGCGTGCCTTGAATTACTTCCACGCCCCTGGCTGCAGTACGTGATTCTTGATC

CCGAGCTTCGGGTTGGAAGTGGGTGGGAGAGTTCGAGGCCTTGCGCTTAAGGAGCCCCTTCGCCTCGTGC

TTGAGTTGAGGCCTGGCTTGGGCGCTGGGGCCGCCGCGTGCGAATCTGGTGGCACCTTCGCGCCTGTCTC

GCTGCTTTCGATAAGTCTCTAGCCATTTAAAATTTTTGATGACCTGCTGCGACGCTTTTTTTCTGGCAAG

ATAGTCTTGTAAATGCGGGCCAAGATCTGCACACTGGTATTTCGGTTTTTGGGGCCGCGGGCGGCGACGG

GGCCCGTGCGTCCCAGCGCACATGTTCGGCGAGGCGGGGCCTGCGAGCGCGGCCACCGAGAATCGGACGG

GGGTAGTCTCAAGCTGGCCGGCCTGCTCTGGTGCCTGGCCTCGCGCCGCCGTGTATCGCCCCGCCCTGGG

CGGCAAGGCTGGCCCGGTCGGCACCAGTTGCGTGAGCGGAAAGATGGCCGCTTCCCGGCCCTGCTGCAGG

GAGCTCAAAATGGAGGACGCGGCGCTCGGGAGAGCGGGCGGGTGAGTCACCCACACAAAGGAAAAGGGCC

TTTCCGTCCTCAGCCGTCGCTTCATGTGACTCCACGGAGTACCGGGCGCCGTCCAGGCACCTCGATTAGT

TCTCGAGCTTTTGGAGTACGTCGTCTTTAGGTTGGGGGGAGGGGTTTTATGCGATGGAGTTTCCCCACAC

TGAGTGGGTGGAGACTGAAGTTAGGCCAGCTTGGCACTTGATGTAATTCTCCTTGGAATTTGCCCTTTTT

GAGTTTGGATCTTGGTTCATTCTCAAGCCTCAGACAGTGGTTCAAAGTTTTTTTCTTCCATTTCAGGTGT

CGTGAAAACTACCCCTAAAAGCCAAAATGGGAAAGGAAAAGACTCATATCAACATTGTCGTCATTGGACA

CGTAGATTCGGGCAAGTCCACCACTACTGGCCATCTGATCTATAAATGCGGTGGCATCGACAAAAGAACC

ATTGAAAAATTTGAGAAGGAGGCTGCTGAGGTATGTTTAATACCAGAAAGGGAAAGATCAACTAAAATGA

GTTTTACCAGCAGAATCATTAGGTGATTTCCCCAGAACTAGTGAGTGGTTTAGATCTGAATGCTAATAGT

TAAGACCTTACTTATGAAATAATTTTGCTTTTGGTGACTTCTGTAATCGTATTGCTAGTGAGTAGATTTG

GATGTTAATAGTTAAGATCCGACTTATAAAAGTTTGATTTTTGGTTGCTTCTGTAACCCAAAGTGACTAA

AATCACTTTGGACTTGGAGTTGTAAAGTGGAAACTGCCAATTAAGGGCTGGGGACAAGGAAATTGAAGCT

GGAGTTTGTGTTTTAGTAACCAAGTAACGACTCTTAATCCTTACAGATGGGAAAGGGCTCCTTCAAGTAT

GCCTGGGTCTTGGATAAACTGAAAGCTGAGCGTGAACGTGGTATCACCATTGATATCTCCTTGTGGAAAT

TTGAGACCAGCAAGTACTATGTGACTATCATTGATGCCCCAGGACACAGAGACTTTATCAAAAACATGAT

TACAGGGACATCTCAGGTTGGTGGGATTAATAATTCTAGGTTTCTTTATCCCAAAAGGCTTGCTTTGTAC

ACTGGTTTTGTCATTTGGAGAGTTGACAGGGATATGTCTTTGCTTTCTTTAAAGGCTGACTGTGCTGTCC

TGATTGTTGCTGCTGGTGTTGGTGAATTTGAAGCTGGTATCTCCAAGAATGGGCAGACCCGAGAGCATGC

CCTTCTGGCTTACACACTGGGTGTGAAACAACTAATTGTCGGTGTTAACAAAATGGATTCCACTGAGCCA

CCCTACAGCCAGAAGAGATATGAGGAAATTGTTAAGGAAGTCAGCACTTACATTAAGAAAATTGGCTACA

ACCCCGACACAGTAGCATTTGTGCCAATTTCTGGTTGGAATGGTGACAACATGCTGGAGCCAAGTGCTAA

CGTAAGTGGCTTTCAAGACCATTGTTAAAAAGCTCTGGGAATGGCGATTTCATGCTTACACAAATTGGCA

TGCTTGTGTTTCAGATGCCTTGGTTCAAGGGATGGAAAGTCACCCGTAAGGATGGCAATGCCAGTGGAAC

CACGCTGCTTGAGGCTCTGGACTGCATCCTACCACCAACTCGTCCAACTGACAAGCCCTTGCGCCTGCCT

CTCCAGGATGTCTACAAAATTGGTGGTAAGTTGGCTGTAAACAAAGTTGAATTTGAGTTGATAGAGTACT

GTCTGCCTTCATAGGTATTTAGTATGCTGTAAATATTTTTAGGTATTGGTACTGTTCCTGTTGGCCGAGT

GGAGACTGGTGTTCTCAAACCCGGTATGGTGGTCACCTTTGCTCCAGTCAACGTTACAACGGAAGTAAAA

TCTGTCGAAATGCACCATGAAGCTTTGAGTGAAGCTCTTCCTGGGGACAATGTGGGCTTCAATGTCAAGA

ATGTGTCTGTCAAGGATGTTCGTCGTGGCAACGTTGCTGGTGACAGCAAAAATGACCCACCAATGGAAGC

AGCTGGCTTCACTGCTCAGGTAACAATTTAAAGTAACATTAACTTATTGCAGAGGCTAAAGTCATTTGAG

ACTTTGGATTTGCACTGAATGCAAATCTTTTTTCCAAGGTGATTATCCTGAACCATCCAGGCCAAATAAG

CGCCGGCTATGCCCCTGTATTGGATTGCCACACGGCTCACATTGCATGCAAGTTTGCTGAGCTGAAGGAA

AAGATTGATCGCCGTTCTGGTAAAAAGCTGGAAGATGGCCCTAAATTCTTGAAGTCTGGTGATGCTGCCA

TTGTTGATATGGTTCCTGGCAAGCCCATGTGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTAAGGA

TGACTACTTAAATGTAAAAAAGTTGTGTTAAAGATGAAAAATACAACTGAACAGTACTTTGGGTAATAAT

TAACTTTTTTTTTAATAGGTCGCTTTGCTGTTCGTGATATGAGACAGACAGTTGCGGTGGGTGTCATCAA

AGCAGTGGACAAGAAGGCTGCTGGAGCTGGCAAGGTCACCAAGTCTGCCCAGAAAGCTCAGAAGGCTAAA

TGAATATTATCCCTAATACCTGCCACCCCACTCTTAATCAGTGGTGGAAGAACGGTCTCAGAACTGTTTG

TTTCAATTGGCCATTTAAGTTTAGTAGTAAAAGACTGGTTAATGATAACAATGCATCGTAAAACCTTCAG

AAGGAAAGGAGAATGTTTTGTGGACCACTTTGGTTTTCTTTTTTGCGTGTGGCAGTTTTAAGTTATTAGT

TTTTAAAATCAGTACTTTTTAATGGAAACAACTTGACCAAAAATTTGTCACAGAATTTTGAGACCCATTA

AAAAAGTTAAATGAGAAACCTGTGTGTTCCTTTGGTCAACACCGAGACATTTAGGTGAAAGACATCTAAT

TCTGGTTTTACGAATCTGGAAACTTCTTGAAAATGTAATTCTTGAGTTAACACTTCTGGGTGGAGAATAG

GGTTGTTTTCCCCCCACATAATTGGAAGGGGAAGGAATATCATTTAAAGCTATGGGAGGGTTGCTTTGAT

TACAACACTGGAGAGAAATGCAGCATGTTGCTGATTGCCTGTCACTAAAACAGGCCAAAAACTGAGTCCT

TGTGTTGCATAGAAAGCTTCATGTTGCTAAACCAATGTTAAGTGAATCTTTGGAAACAAAATGTTTCCAA

ATTACTGGGATGTGCATGTTGAAACGTGGGTTAAAATGACTGGGCAGTGAAAGTTGACTATTTGCCATGA

CATAAGAAATAAGTGTAGTGGCTAGTGTACACCCTATGAGTGGAAGGGTCCATTTTGAAGTCAGTGGAGT

AAGCTTTATGCCAGTTTGATGGTTTCACAAGTTCTATTGAGTGCTATTCAGAATAGGAACAAGGTTCTAA

TAGAAAAAGATGGCAATTTGAAGTAGCTATAAAATTAGACTAATCTACATTGCTTTTCTCCTGCAGAGTC

TAATACCTTTTATGCTTTGATAATTAGCAGTTTGTCTACTTGGTCACTAGGAATGAAACTACATGGTAAT

AGGCTTAACAGGTGTAATAGCCCACTTACTCCTGAATCTTTAAGCATTTGTGCATTTGAAAAATGCTTTT

CGCGATCTTCCTGCTGGGATTACAGGCATGAGCCACTGTGCCTGACCTCCCATATGTAAAAGTGTCTAAA

GGTTTTTTTTTGGTTATAAAAGGAAAATTTTTGCTTAAGTTTGAAGGATAGGTAAAATTAAAGGACATGC

TTTCTGTTTGTGTGATGGTTTTTAAAAATTTTTTTTAAGATGGAGTTCTTGTTGCCCAGGCTAGAATGCA

ATGGCAAAATCTCACTGCAATCTCCTCCTCCTGGGTTCAAGCAATTCTCCTACTTCAGCCTCCCAAGTAG

CTGGGATTACAGGCATGTGCTAATTTGGTGTTTTTAATAGAGATGAGGTTTTTCCATGTTGGTCAGGCTG

GTCTCAAACTCCTGACCTTAGGTGATCGCCTCGGCCTCCTAAAGTGCTGGAATTACAGGCATGAGCCACC

ATGCCTGGCCAGGACATGTGTTCTTAAGGACATGCTAAGCAGGAGTTAAAGCAGCCCAAGAGATAAGGCC

TCTTAAAGTGACTGGCAATGTGTATTGCTCAAGATTCAAAGGTACTTGAATTGGCCATAGACAAGTCTGT

AATGAAGTGTTATCGTTTTCCCTCATCTGAGTCTGAATTAGATAAAATGCCTTCCCATCAGCCAGTGCTC

TGAGGTATCAAGTCTAAATTGAACTAGAGATTTTTGTCCTTAGTTTCTTTGCTATCTAATGTTTACACAA

GTAAATAGTCTAAGATTTGCTGGATGACAGAAAAAACAGGTAAGGCCTTTAATAGATGGCCAATAGATGC

CCTGATAATGAAAGTTGACACCTGTAAGATTTACCAGTAGAGAATTCTTGACATGCAAGGAAGCAAGATT

TAACTGAAAAATTGTTCCCACTGGAAGCAGGAATGAGTCAGTTTACTTGCATATACTGAGATTGAGATTA

ACTTCCTGTGAAACCCAGTGTCTTAGACAACTGTGGCTTGAGCACCACCTGCTGGTATTCATTACAAACT

TGCTCACTACAATAAATGAATTTTAAGCTTTAA

Complex cellular and developmental processes depend on precise spatiotemporal regulation of mRNA and protein levels and activities. Such regulation arises essentially at the transcriptional, posttranscriptional, and posttranslational levels. Posttranscriptional regulation is the control of gene expression at the RNA level, i.e., between the transcription and the translation of the gene. Posttranscriptional regulation can be controlled through both protein-RNA and RNA-RNA interactions. As used herein, posttranscriptional regulatory elements include nucleotide sequences including but not limited to the Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element. In some embodiments of any of the aspects, the nucleic acid sequences described herein can further comprise a posttranscriptional regulatory element operably linked to the sequence encoding the GATA1 polypeptide.

In some embodiments of any of the aspects, the posttranscriptional regulatory element comprises a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element. The Woodchuck Hepatitis Virus (WHV) Posttranscriptional Regulatory Element, abbreviated WPRE, is a DNA sequence that, when transcribed, creates a tertiary structure that enhances expression. WPRE is a tripartite regulatory element with gamma, alpha, and beta components.

In some embodiments of any of the aspects, the Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE) comprises, consists of, or consists essentially of the following nucleotide sequence (SEQ ID NO: 56):

GCCACGGCGGAACTCATCGCCGCCTGCCTTGCCCGCTGCTGGACAGGGGC

TCGGCTGTTGGGCACTGACAATTCCGTGGT

AATCAACCTCTGGATTACAAAATTTGTGAAAGATTGACTGGTATTCTTAA

CTATGTTGCTCCTTTTACGCTATGTGGATACGCTGCTTTAATGCCTTTGT

ATCATGCTATTGCTTCCCGTATGGCTTTCATTTTCTCCTCCTTGTATAAA

TCCTGGTTGCTGTCTCTTTATGAGGAGTTGTGGCCCGTTGTCAGGCAACG

TGGCGTGGTGTGCACTGTGTTTGCTGACGCAACCCCCACTGGTTGGGGCA

TTGCCACCACCTGTCAGCTCCTTTCCGGGACTTTCGCTTTCCCCCTCCCT

ATTGCCACGGCGGAACTCATCGCCGCCTGCCTTGCCCGCTGCTGGACAGG

GGCTCGGCTGTTGGGCACTGACAATTCCGTGGTGTTGTCGGGGAAGCTGA

CGTCCTTTCCATGGCTGCTCGCCTGTGTTGCCACCTGGATTCTGCGCGGG

ACGTCCTTCTGCTACGTCCCTTCGGCCCTCAATCCAGCGGACCTTCCTTC

CCGCGGCCTGCTGCCGGCTCTGCGGCCTCTTCCGCGTCTTCGCCTTCGCC

CTCAGACGAGTCGGATCTCCCTTTGGGCCGCCTCCCCGCCTG

Alternative and/or optimized WPREs are also known in the art, e.g., as described in Patel and Olsen RNA Virus Vectors 11:S322 (2005), which is incorporated by reference herein in its entirety.

In some embodiments of any of the aspects, a WPRE comprises a sequence with at least 80% homology to the nucleotide sequence of SEQ ID NO: 56 and/or SEQ ID NO: 63. In some embodiments of any of the aspects, a WPRE comprises a sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to SEQ ID NO: 56 and/or SEQ ID NO: 63. In some embodiments of any of the aspects, a WPRE comprises a sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to SEQ ID NO: 56 and/or SEQ ID NO: 63 and retains the wild-type activity of SEQ ID NO: 56 and/or SEQ ID NO: 63. A nucleic acid sequence described herein can comprise multiple posttranscriptional regulatory elements, e.g., the nucleic acid sequence comprises at least one, or at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 20, or at least 25, or at least 30 posttranscriptional regulatory elements.
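Percent identity, as recited above, is normally computed from an alignment produced by a tool such as BLAST or a global aligner, and the alignment method is not specified here. The following minimal sketch therefore assumes the two sequences are already aligned to equal length and counts gap positions as mismatches (an assumed convention, not one defined in this disclosure). The three-substitution variant of the first 50 nucleotides of SEQ ID NO: 56 is hypothetical and serves only to illustrate the threshold check.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    A position counts as identical only if both residues match and neither
    is a gap ('-'); the denominator is the full alignment length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


def meets_thresholds(identity: float, thresholds=(60, 80, 85, 90, 95, 98)) -> list[int]:
    """Return the identity thresholds recited above that a value satisfies."""
    return [t for t in thresholds if identity >= t]


# First 50 nt of SEQ ID NO: 56 (WPRE).
wpre_start = "GCCACGGCGGAACTCATCGCCGCCTGCCTTGCCCGCTGCTGGACAGGGGC"

# Hypothetical variant: three substitutions at arbitrary illustrative positions.
variant_bases = list(wpre_start)
for pos in (5, 20, 45):
    variant_bases[pos] = "T" if variant_bases[pos] != "T" else "A"
variant = "".join(variant_bases)

identity = percent_identity(wpre_start, variant)
print(f"{identity:.1f}% identity; meets thresholds: {meets_thresholds(identity)}")
```

Here 47 of 50 aligned positions match, so the hypothetical variant sits at 94% identity: it satisfies the 60% through 90% thresholds but not the 95% or 98% ones.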

In some embodiments of any of the aspects, the posttranscriptional regulatory element is located at least about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least about 5 kb, at least about 6 kb, at least about 7 kb, at least about 8 kb, at least about 9 kb, at least about 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the posttranscriptional regulatory element sequence is located at least 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, at least 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the posttranscriptional regulatory element sequence is located at about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at about 5 kb, at about 6 kb, at about 7 kb, at about 8 kb, at about 9 kb, or at about 10 kb from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the posttranscriptional regulatory element sequence can be in intergenic sequence or in the sequence of an intervening gene. In some embodiments of any of the aspects described herein, the posttranscriptional regulatory element sequence can be located within the sequence which is about 500 bp to about 10 kb from the end of the open reading frame, e.g., about 1 kb to about 9 kb, about 2 kb to about 8 kb, about 3 kb to about 7 kb, or about 4 kb to about 6 kb from the open reading frame. In some embodiments of any of the aspects described herein, the posttranscriptional regulatory element sequence can be located within the sequence which is 500 bp to 10 kb from the end of the open reading frame, e.g., 1 kb to 9 kb, 2 kb to 8 kb, 3 kb to 7 kb, or 4 kb to 6 kb from the open reading frame.
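The placement language above reduces to coordinate arithmetic on the construct. The sketch below is a minimal illustration only: it assumes 0-based, half-open coordinates on a single strand, measures the gap to the nearest boundary of the open reading frame, and treats "about" as exact. The function names and example coordinates are hypothetical and are not part of this disclosure; the same arithmetic applies equally to the IRES placement embodiments described herein.

```python
def distance_from_orf(orf_start: int, orf_end: int, elem_start: int, elem_end: int) -> int:
    """Gap in bp between an element and the nearest ORF boundary (0 if they overlap).

    Coordinates are assumed to be 0-based and half-open [start, end) on the
    same strand of the construct.
    """
    if elem_start >= orf_end:  # element lies 3' of the ORF
        return elem_start - orf_end
    if elem_end <= orf_start:  # element lies 5' of the ORF
        return orf_start - elem_end
    return 0  # element overlaps the ORF


def in_window(distance_bp: int, low_bp: int = 500, high_bp: int = 10_000) -> bool:
    """True if the distance falls within the 500 bp to 10 kb window."""
    return low_bp <= distance_bp <= high_bp


orf = (1_000, 2_250)   # hypothetical GATA-1 ORF coordinates
wpre = (7_250, 7_850)  # hypothetical WPRE coordinates
gap = distance_from_orf(*orf, *wpre)
print(gap, in_window(gap), gap >= 5_000)
```

With these hypothetical coordinates the element lies 5,000 bp 3′ of the ORF, so it satisfies both the "at least 5 kb" and the "500 bp to 10 kb" embodiments.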

In some embodiments of any of the aspects, a nucleic acid sequence described herein can further comprise an internal ribosome entry site. An internal ribosome entry site, abbreviated IRES, is an RNA element that allows translation initiation in a cap-independent manner, as part of the greater process of protein synthesis. In eukaryotic translation, initiation typically occurs at the 5′ end of mRNA molecules, since 5′ cap recognition is required for the assembly of the initiation complex. IRES elements are often located in the 5′ UTR but can also occur elsewhere in mRNAs.

In some embodiments of any of the aspects, the internal ribosome entry site comprises, consists of, or consists essentially of the following nucleotide sequence (SEQ ID NO: 66):

CCCCTCTCCCTCCCCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAAT

AAGGCCGGTGTGCGTTTGTCTATATGTTATTTTCCACCATATTGCCGTCT

TTTGGCAATGTGAGGGCCCGGAAACCTGGCCCTGTCTTCTTGACGAGCAT

TCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATG

TCGTGAAGGAAGCAGTTCCTCTGGAAGCTTCTTGAAGACAAACAACGTCT

GTAGCGACCCTTTGCAGGCAGCGGAACCCCCCACCTGGCGACAGGTGCCT

CTGCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAAC

CCCAGTGCCACGTTGTGAGTTGGATAGTTGTGGAAAGAGTCAAATGGCTC

TCCTCAAGCGTATTCAACAAGGGGCTGAAGGATGCCCAGAAGGTACCCCA

TTGTATGGGATCTGATCTGGGGCCTCGGTACACATGCTTTACATGTGTTT

AGTCGAGGTTAAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTT

TTCCTTTGAAAAACACGATGATAATATGGCCACAACC

In some embodiments of any of the aspects, described herein is an IRES comprising a sequence with at least 80% homology to the nucleotide sequence of SEQ ID NO: 66. In some embodiments of any of the aspects, an IRES comprises a sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to SEQ ID NO: 66. In some embodiments of any of the aspects, an IRES comprises a sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to SEQ ID NO: 66 that retains the wild-type activity of SEQ ID NO: 66.

Nucleic acid sequences described herein can comprise multiple IRESs, e.g., a nucleic acid sequence can comprise at least one, or at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 20, or at least 25, or at least 30 IRES sequences.

In some embodiments of any of the aspects, the IRES is located at least about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least about 5 kb, at least about 6 kb, at least about 7 kb, at least about 8 kb, at least about 9 kb, at least about 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the IRES sequence is located at least 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, at least 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the IRES sequence is located at about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at about 5 kb, at about 6 kb, at about 7 kb, at about 8 kb, at about 9 kb, or at about 10 kb from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the IRES sequence can be in intergenic sequence or in the sequence of an intervening gene. In some embodiments of any of the aspects described herein, the IRES sequence can be located within the sequence which is about 500 bp to about 10 kb from the end of the open reading frame, e.g., about 1 kb to about 9 kb, about 2 kb to about 8 kb, about 3 kb to about 7 kb, or about 4 kb to about 6 kb from the open reading frame. In some embodiments of any of the aspects described herein, the IRES sequence can be located within the sequence which is 500 bp to 10 kb from the end of the open reading frame, e.g., 1 kb to 9 kb, 2 kb to 8 kb, 3 kb to 7 kb, or 4 kb to 6 kb from the open reading frame.

In some embodiments of any of the aspects, a nucleic acid sequence described herein can further comprise a sequence encoding a self-cleaving 2A polypeptide. A self-cleaving peptide, or 2A peptide, is a polypeptide that can induce the cleavage of a polypeptide of which it is a part, e.g., a recombinant GATA-1 described herein. Thus, a 2A peptide can be used to split a longer polypeptide into two shorter polypeptides, so that two polypeptides are generated from a single transcript. 2A peptides are derived from the 2A region in the genome of a virus. 2A-peptide-mediated cleavage occurs during translation, through a mechanism generally described as ribosomal skipping: the peptide bond between the glycine (G) and the final proline (P) at the C-terminus of the 2A peptide is not formed, so the upstream polypeptide ends in glycine and the downstream polypeptide begins with proline. A 2A polypeptide can comprise at least 10, at least 15, at least 20, at least 25, at least 30, or at least 40 amino acids.

In some embodiments of any of the aspects, 2A peptides can be combined with the IRES elements in a single nucleic acid sequence, thereby generating three separate polypeptides encoded within a single transcript.

Exemplary 2A peptides that can be used with the methods described herein include, but are not limited to, P2A, E2A, F2A, and T2A (see also Table 4, SEQ ID NOs: 57-60). F2A is derived from foot-and-mouth disease virus 18; E2A is derived from equine rhinitis A virus; P2A is derived from porcine teschovirus-1 2A; and T2A is derived from Thosea asigna virus 2A.

TABLE 4

Names and sequences of 2A peptides that can be used in various embodiments described herein. An optional linker “GSG” (Gly-Ser-Gly), shown at the start of each sequence, can be added to the N-terminus of the 2A peptides listed.

Name    Sequence
T2A     GSG EGRGSLLTCGDVEENPGP (SEQ ID NO: 57)
P2A     GSG ATNFSLLKQAGDVEENPGP (SEQ ID NO: 58)
E2A     GSG QCTNYALLKLAGDVESNPGP (SEQ ID NO: 59)
F2A     GSG VKQTLNFDLLKLAGDVESNPGP (SEQ ID NO: 60)
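All four peptides in Table 4 end in the ...NPGP sequence at which the glycine-proline split occurs. The following sketch models the expected products of a polyprotein of the form upstream protein, 2A peptide, GFP. It assumes the commonly cited D-(V/I)-E-X-N-P-G-P consensus as the recognition pattern and treats every match as a complete skip; in practice skipping is incomplete and its efficiency varies between 2A peptides. The upstream string "MAGSTRK" is a hypothetical placeholder, not a GATA-1 sequence; the downstream string is the GFP N-terminus encoded in SEQ ID NO: 61.

```python
import re

# Assumed consensus C-terminal 2A motif D-(V/I)-E-X-N-P-G-P. Skipping leaves the
# upstream product ending in ...NPG and starts the downstream product with P.
TWO_A_MOTIF = re.compile(r"D[VI]E.NPGP")

TABLE_4 = {  # 2A peptides from Table 4, each with the optional GSG linker
    "T2A": "GSG" + "EGRGSLLTCGDVEENPGP",
    "P2A": "GSG" + "ATNFSLLKQAGDVEENPGP",
    "E2A": "GSG" + "QCTNYALLKLAGDVESNPGP",
    "F2A": "GSG" + "VKQTLNFDLLKLAGDVESNPGP",
}


def skip_products(polyprotein: str) -> list[str]:
    """Split a polyprotein into the products expected after 2A ribosomal skipping."""
    products, start = [], 0
    for match in TWO_A_MOTIF.finditer(polyprotein):
        cut = match.end() - 1  # no peptide bond forms before the final proline
        products.append(polyprotein[start:cut])
        start = cut
    products.append(polyprotein[start:])
    return products


upstream = "MAGSTRK"             # hypothetical placeholder for a GATA-1 polypeptide
downstream = "MVSKGEELFTGVVPIL"  # GFP N-terminus encoded in SEQ ID NO: 61
for name, peptide in TABLE_4.items():
    first, second = skip_products(upstream + peptide + downstream)
    print(name, len(peptide), first[-4:], second[:4])
```

Placing two 2A sequences in one open reading frame makes the same function return three products, which parallels the multi-polypeptide designs described above.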

In some embodiments of any of the aspects, the IRES and/or self-cleaving 2A polypeptide can be operably linked to a marker gene, e.g., a marker gene encoding an optically detectable protein or an enzyme. Optically detectable proteins/enzymes can comprise an optically detectable label and/or have the ability to generate a detectable signal (e.g., by catalyzing a reaction that converts a compound to a detectable product). Detectable labels can comprise, for example, a light-absorbing moiety or a fluorescent moiety. Detectable labels, marker genes, methods of detecting them, and methods of incorporating them into reagents (e.g., antibodies and nucleic acid probes) are well known in the art.

Optically detectable labels/signals can comprise those visible to the human eye or those detectable with optical equipment, e.g., by spectroscopic, photochemical, biochemical, immunochemical, electromagnetic, radiochemical, or chemical means, such as fluorescence, chemifluorescence, or chemiluminescence, or any other appropriate means. Detectable labels can include, but are not limited to, radioisotopes, bioluminescent compounds, chromophores, antibodies, chemiluminescent compounds, fluorescent compounds, metal chelates, and enzymes.

Marker genes are well known in the art and can include, but are not limited to, genes encoding naturally fluorescent proteins such as the Green Fluorescent Protein (GFP) of Aequorea victoria (Cubitt, A. B. et al. 1995. Understanding, improving, and using green fluorescent proteins. Trends Biochem. Sci. 20: 448-455; Chalfie, M., and Prasher, D. C. U.S. Pat. No. 5,491,084), a lacZ gene encoding a beta-galactosidase enzyme, and genes encoding horseradish peroxidase, alkaline phosphatase, malate dehydrogenase, staphylococcal nuclease, delta-5-steroid isomerase, yeast alcohol dehydrogenase, alpha-glycerophosphate dehydrogenase, triose phosphate isomerase, asparaginase, glucose oxidase, ribonuclease, urease, catalase, glucose-6-phosphate dehydrogenase, glucoamylase, and acetylcholinesterase.

In some embodiments of any of the aspects, the nucleic acid sequence described herein can comprise, consist of, or consist essentially of a sequence selected from SEQ ID NOs: 8, 9, 61, and 62.

SEQ ID NO: 61 (also designated as R18 EF1a IRES GFP) comprises an EF1A promoter and an IRES sequence operably linked to a nucleotide sequence encoding GFP:

GTCGACGGATCGGGAGATCTCCCGATCCCCTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCA

GTATCTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCG

ACAATTGCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGA

TTATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTAC

GGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAA

TAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCA

AGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTAC

TTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGT

TTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCA

AAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGCGCGTTTTG

CCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAA

TAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTA

GTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGGGAAACCAGAGGAGCTCTCTCGACGCAG

GACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCT

AGAAGGAGAGAGATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGCGATGGGAAAAAATTCGGTTAAGGCCA

GGGGGAAAGAAAAAATATAAATTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAATCCTGGCCTGTT

AGAAACATCAGAAGGCTGTAGACAAATACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAAGAACTTAGATCATTAT

ATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGGATAGAGATAAAAGACACCAAGGAAGCTTTAGACAAGATAGAGGAA

GAGCAAAACAAAAGTAAGACCACCGCACAGCAAGCGGCCGGCCGCTGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAAT

TGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGT

GCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGT

CAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCG

CAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGA

TCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATA

AATCTCTGGAACAGATTTGGAATCACACGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCC

TTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTG

GTTTAACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTG

CTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCC

GACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCGGCACTGCG

TGCGCCAATTCTGCAGACAAATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAA

AGAATAGTAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTA

TTACAGGGACAGCAGAGATCCAGTTTGGTTAGTACCGGGCCCGCTCTAGCGTGAGGCTCCGGTGCCCGTCAGTGGGCAGAGCG

CACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACCGGTGCCTAGAGAAGGTGGCGCGGGGTAAACT

GGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAAC

GTTCTTTTTCGCAACGGGTTTGCCGCCAGAACACAGGTAAGTGCCGTGTGTGGTTCCCGCGGGCCTGGCCTCTTTACGGGTTA

TGGCCCTTGCGTGCCTTGAATTACTTCCACCTGGCTGCAGTACGTGATTCTTGATCCCGAGCTTCGGGTTGGAAGTGGGTGGG

AGAGTTCGAGGCCTTGCGCTTAAGGAGCCCCTTCGCCTCGTGCTTGAGTTGAGGCCTGGCCTGGGCGCTGGGGCCGCCGCGTG

CGAATCTGGTGGCACCTTCGCGCCTGTCTCGCTGCTTTCGATAAGTCTCTAGCCATTTAAAATTTTTGATGACCTGCTGCGAC

GCTTTTTTTCTGGCAAGATAGTCTTGTAAATGCGGGCCAAGATCTGCACACTGGTATTTCGGTTTTTGGGGCCGCGGGCGGCG

ACGGGGCCCGTGCGTCCCAGCGCACATGTTCGGCGAGGCGGGGCCTGCGAGCGCGGCCACCGAGAATCGGACGGGGGTAGTCT

CAAGCTGGCCGGCCTGCTCTGGTGCCTGGCCTCGCGCCGCCGTGTATCGCCCCGCCCTGGGCGGCAAGGCTGGCCCGGTCGGC

ACCAGTTGCGTGAGCGGAAAGATGGCCGCTTCCCGGCCCTGCTGCAGGGAGCTCAAAATGGAGGACGCGGCGCTCGGGAGAGC

GGGCGGGTGAGTCACCCACACAAAGGAAAAGGGCCTTTCCGTCCTCAGCCGTCGCTTCATGTGACTCCACGGAGTACCGGGCG

CCGTCCAGGCACCTCGATTAGTTCTCGAGCTTTTGGAGTACGTCGTCTTTAGGTTGGGGGGAGGGGTTTTATGCGATGGAGTT

TCCCCACACTGAGTGGGTGGAGACTGAAGTTAGGCCAGCTTGGCACTTGATGTAATTCTCCTTGGAATTTGCCCTTTTTGAGT

TTGGATCTTGGTTCATTCTCAAGCCTCAGACAGTGGTTCAAAGTTTTTTTCTTCCATTTCAGGTGTCGTGAGCGGCCGCTGAG

TTAACTATTCTAGACCCGGGCTAGGATCCGCCCCTCTCCCTCCCCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAATAAG

GCCGGTGTGCGTTTGTCTATATGTTATTTTCCACCATATTGCCGTCTTTTGGCAATGTGAGGGCCCGGAAACCTGGCCCTGTC

TTCTTGACGAGCATTCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATGTCGTGAAGGAAGCAGTTCC

TCTGGAAGCTTCTTGAAGACAAACAACGTCTGTAGCGACCCTTTGCAGGCAGCGGAACCCCCCACCTGGCGACAGGTGCCTCT

GCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAACCCCAGTGCCACGTTGTGAGTTGGATAGTTGTGGAA

AGAGTCAAATGGCTCTCCTCAAGCGTATTCAACAAGGGGCTGAAGGATGCCCAGAAGGTACCCCATTGTATGGGATCTGATCT

GGGGCCTCGGTACACATGCTTTACATGTGTTTAGTCGAGGTTAAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTTT

TCCTTTGAAAAACACGATGATAATATGGCCACAACCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCT

GGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGA

CCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGC

TTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCAT

CTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGA

AGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATG

GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCA

CTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCA

AAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTG

TACAAGTAAAGCGGCCGCATCGATACCGTCGACCTCGATCGAGACCTAGAAAAACATGGAGCAATCACAAGTAGCAATACAGC

AGCTACCAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGGGTTTTCCAGTCACACCTCAGGTACCTTTAA

GACCAATGACTTACAAGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCACTCCCAA

CGAAGACAAGATATCCTTGATCTGTGGATCTACCACACACAAGGCTACTTCCCTGATTGGCAGAACTACACACCAGGGCCAGG

GATCAGATATCCACTGACCTTTGGATGGTGCTACAAGCTAGTACCAGTTGAGCAAGAGAAGGTAGAAGAAGCCAATGAAGGAG

AGAACACCCGCTTGTTACACCCTGTGAGCCTGCATGGGATGGATGACCCGGAGAGAGAAGTATTAGAGTGGAGGTTTGACAGC

CGCCTAGCATTTCATCACATGGCCCGAGAGCTGCATCCGGACTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGA

GCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGT

TGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGCATGTGAGCAAAAGGCCAG

CAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCG

ACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTC

CTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGT

AGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTT

ATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCA

GAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATC

TGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGG

TTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACG

CTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAA

AAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTAT

CTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCAT

CTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGG

GCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTC

GCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCA

GCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATC

GTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGT

AAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGG

CGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTC

TCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCAC

CAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCA

TACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAA

AATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGAC
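The reading frame of the marker in SEQ ID NO: 61 can be checked directly from the listed sequence. This minimal sketch translates an excerpt copied from the listing, spanning the 3′ end of the IRES and the first 87 nucleotides of the GFP coding sequence, using the standard genetic code. Locating the GFP start by its first four codons is an illustrative shortcut chosen for this sketch, not a general start-codon finder.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering
# (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate a DNA reading frame starting at index 0 into one-letter amino acids."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Excerpt of SEQ ID NO: 61: 3' end of the IRES followed by the start of the GFP ORF.
IRES_GFP_JUNCTION = (
    "ACACGATGATAATATGGCCACAACC"
    "ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCT"
    "GGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGC"
)
# Illustrative shortcut: find the GFP start by its first four codons.
gfp_start = IRES_GFP_JUNCTION.index("ATGGTGAGCAAG")
print(translate(IRES_GFP_JUNCTION[gfp_start:]))
```

The translated excerpt reads MVSKGEELFTGVVPILVELDGDVNGHKFS, the expected GFP N-terminus, with no premature stop codon in this stretch.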

SEQ ID NO: 8 (also designated as R21 miR126) comprises an EF1A promoter and an IRES sequence operably linked to a nucleotide sequence encoding GFP, and four miRNA binding sites for the HSC-restricted miRNA miR-126:

GTCGACGGATCGGGAGATCTCCCGATCCCCTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAA

GCCAGTATCTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGC

AAGGCTTGACCGACAATTGCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCCA

GATATACGCGTTGACATTGATTATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCAT

ATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGA

CGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTAC

GGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTA

AATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAG

TCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGAT

TTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTC

GTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGCGCGTTTT

GCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTT

AAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA

TCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGGGA

AACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACTGG

TGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCGTCAGTATTAAGCGGG

GGAGAATTAGATCGCGATGGGAAAAAATTCGGTTAAGGCCAGGGGGAAAGAAAAAATATAAATTAAAACATATAG

TATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAATCCTGGCCTGTTAGAAACATCAGAAGGCTGTAGACAAA

TACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAAGAACTTAGATCATTATATAATACAGTAGCAACCC

TCTATTGTGTGCATCAAAGGATAGAGATAAAAGACACCAAGGAAGCTTTAGACAAGATAGAGGAAGAGCAAAACA

AAAGTAAGACCACCGCACAGCAAGCGGCCGGCCGCTGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTG

GAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAG

AGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCAC

TATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAA

TTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAG

AATCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTG

CACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGAATCACACGACCTGGAT

GGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCCTTAATTGAAGAATCGCAAAACCAGCAAGA

AAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAATTGGCT

GTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTAT

AGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACAG

GCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCGGCACT

GCGTGCGCCAATTCTGCAGACAAATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACA

GTGCAGGGGAAAGAATAGTAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAA

TTCAAAATTTTCGGGTTTATTACAGGGACAGCAGAGATCCAGTTTGGTTAGTACCGGGCCCGCTCTAGCGTGAGG

CTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTG

AACCGGTGCCTAGAGAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGA

GGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAAC

ACAGGTAAGTGCCGTGTGTGGTTCCCGCGGGCCTGGCCTCTTTACGGGTTATGGCCCTTGCGTGCCTTGAATTAC

TTCCACCTGGCTGCAGTACGTGATTCTTGATCCCGAGCTTCGGGTTGGAAGTGGGTGGGAGAGTTCGAGGCCTTG

CGCTTAAGGAGCCCCTTCGCCTCGTGCTTGAGTTGAGGCCTGGCCTGGGCGCTGGGGCCGCCGCGTGCGAATCTG

GTGGCACCTTCGCGCCTGTCTCGCTGCTTTCGATAAGTCTCTAGCCATTTAAAATTTTTGATGACCTGCTGCGAC

GCTTTTTTTCTGGCAAGATAGTCTTGTAAATGCGGGCCAAGATCTGCACACTGGTATTTCGGTTTTTGGGGCCGC

GGGCGGCGACGGGGCCCGTGCGTCCCAGCGCACATGTTCGGCGAGGCGGGGCCTGCGAGCGCGGCCACCGAGAAT

CGGACGGGGGTAGTCTCAAGCTGGCCGGCCTGCTCTGGTGCCTGGCCTCGCGCCGCCGTGTATCGCCCCGCCCTG

GGCGGCAAGGCTGGCCCGGTCGGCACCAGTTGCGTGAGCGGAAAGATGGCCGCTTCCCGGCCCTGCTGCAGGGAG

CTCAAAATGGAGGACGCGGCGCTCGGGAGAGCGGGCGGGTGAGTCACCCACACAAAGGAAAAGGGCCTTTCCGTC

CTCAGCCGTCGCTTCATGTGACTCCACGGAGTACCGGGCGCCGTCCAGGCACCTCGATTAGTTCTCGAGCTTTTG

GAGTACGTCGTCTTTAGGTTGGGGGGAGGGGTTTTATGCGATGGAGTTTCCCCACACTGAGTGGGTGGAGACTGA

AGTTAGGCCAGCTTGGCACTTGATGTAATTCTCCTTGGAATTTGCCCTTTTTGAGTTTGGATCTTGGTTCATTCT

CAAGCCTCAGACAGTGGTTCAAAGTTTTTTTCTTCCATTTCAGGTGTCGTGAGCGGCCGCTGAGTTAACTATTCT

AGACCCGGGCTAGGATCCGCCCCTCTCCCTCCCCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAATAAGGCC

GGTGTGCGTTTGTCTATATGTTATTTTCCACCATATTGCCGTCTTTTGGCAATGTGAGGGCCCGGAAACCTGGCC

CTGTCTTCTTGACGAGCATTCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATGTCGTGA

AGGAAGCAGTTCCTCTGGAAGCTTCTTGAAGACAAACAACGTCTGTAGCGACCCTTTGCAGGCAGCGGAACCCCC

CACCTGGCGACAGGTGCCTCTGCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAACCCCAGT

GCCACGTTGTGAGTTGGATAGTTGTGGAAAGAGTCAAATGGCTCTCCTCAAGCGTATTCAACAAGGGGCTGAAGG

ATGCCCAGAAGGTACCCCATTGTATGGGATCTGATCTGGGGCCTCGGTACACATGCTTTACATGTGTTTAGTCGA

GGTTAAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTTTTCCTTTGAAAAACACGATGATAATATGGCC

ACAACCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA

AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATC

TGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGC

CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACC

ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC

ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGC

CACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG

GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC

AACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAG

TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGCATCGATAATCAACCT

CTGGATTACAAAATTTGTGAAAGATTGACTGGTATTCTTAACTATGTTGCTCCTTTTACGCTATGTGGATACGCT

GCTTTAATGCCTTTGTATCATGCTATTGCTTCCCGTATGGCTTTCATTTTCTCCTCCTTGTATAAATCCTGGTTG

CTGTCTCTTTATGAGGAGTTGTGGCCCGTTGTCAGGCAACGTGGCGTGGTGTGCACTGTGTTTGCTGACGCAACC

CCCACTGGTTGGGGCATTGCCACCACCTGTCAGCTCCTTTCCGGGACTTTCGCTTTCCCCCTCCCTATTGCCACG

GCGGAACTCATCGCCGCCTGCCTTGCCCGCTGCTGGACAGGGGCTCGGCTGTTGGGCACTGACAATTCCGTGGTG

TTGTCGGGGAAGCTGACGTCCTTTCCATGGCTGCTCGCCTGTGTTGCCACCTGGATTCTGCGCGGGACGTCCTTC

TGCTACGTCCCTTCGGCCCTCAATCCAGCGGACCTTCCTTCCCGCGGCCTGCTGCCGGCTCTGCGGCCTCTTCCG

CGTCTTCGCCTTCGCCCTCAGACGAGTCGGATCTCCCTTTGGGCCGCCTCCCCGCGAATTCGCATTATTACTCAC

GGTACGAGCATTATTACTCACGGTACGAGCATTATTACTCACGGTACGAGCATTATTACTCACGGTACGAGCGAT

CGCCCTCAGGTACCTTTAAGACCAATGACTTACAAGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGG

GGACTGGAAGGGCTAATTCACTCCCAACGAAGACAAGATATCCTTGATCTGTGGATCTACCACACACAAGGCTAC

TTCCCTGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTGCTACAAGCTA

GTACCAGTTGAGCAAGAGAAGGTAGAAGAAGCCAATGAAGGAGAGAACACCCGCTTGTTACACCCTGTGAGCCTG

CATGGGATGGATGACCCGGAGAGAGAAGTATTAGAGTGGAGGTTTGACAGCCGCCTAGCATTTCATCACATGGCC

CGAGAGCTGCATCCGGACTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTA

GGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGAC

TCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGCATGTGAGCAAAAGGCCAG

CAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCAC

AAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGC

TCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTG

GCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCAC

GAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGAC

TTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTG

AAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTC

GGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAG

CAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAAC

GAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAA

TGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCA

CCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGG

GAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCA

ATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAAT

TGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATC

GTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCC

CCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTA

TCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGT

GAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGAT

AATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGG

ATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTC

ACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGT

TGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATA

TTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGAC
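The description of SEQ ID NO: 8 as carrying four miR-126 binding sites can be checked against the listed sequence. This sketch assumes the mature human miR-126-3p sequence from miRBase (UCGUACCGUGAGUAAUAAUGCG), which is not recited in this disclosure, and counts exact matches in an excerpt copied from the 3′ portion of SEQ ID NO: 8, starting at its GAATTC site. It counts only exact matches and ignores imperfect or bulged target sites.

```python
# Mature hsa-miR-126-3p, 5'->3' (assumed from miRBase, not from this disclosure).
MIR126_3P = "UCGUACCGUGAGUAAUAAUGCG"


def reverse_complement_dna(rna: str) -> str:
    """DNA sequence that is perfectly complementary (antiparallel) to an RNA sequence."""
    comp = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(rna.upper()))


# Excerpt of SEQ ID NO: 8 from the GAATTC site through the repeated binding sites.
SITE_REGION = (
    "GAATTCGCATTATTACTCAC"
    "GGTACGAGCATTATTACTCACGGTACGAGCATTATTACTCACGGTACGAGCATTATTACTCACGGTACGAGCGAT"
)

full_site = reverse_complement_dna(MIR126_3P)       # 22-nt perfect target
core_site = full_site[1:]                           # pairs with miRNA nt 1-21
seed_site = reverse_complement_dna(MIR126_3P[1:8])  # 7-mer match to seed nt 2-8

print(full_site)
print(SITE_REGION.count(core_site), SITE_REGION.count(seed_site))
```

The excerpt contains four repeats, each complementary to nucleotides 1-21 of miR-126-3p and each containing the 7-mer seed match; the first repeat, together with the final C of the upstream GAATTC site, is complementary over all 22 nucleotides.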

SEQ ID NO: 9 (also designated as R49 1 peak enhancer) comprises an IRES sequence operably linked to a nucleotide sequence encoding GFP and one hematopoietic enhancer element:

GTCGACGGATCGGGAGATCTCCCGATCCCCTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTAT

CTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAA

TTGCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGATTAT

TGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTA

AATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGG

GACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTA

CGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGG

CAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGA

CTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAAT

GTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGCGCGTTTTGCCTG

TACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAA

GCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCA

GTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGGGAAACCAGAGGAGCTCTCTCGACGCAGGACT

CGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAA

GGAGAGAGATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGCGATGGGAAAAAATTCGGTTAAGGCCAGGGG

GAAAGAAAAAATATAAATTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAATCCTGGCCTGTTAGAA

ACATCAGAAGGCTGTAGACAAATACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAAGAACTTAGATCATTATATAA

TACAGTAGCAACCCTCTATTGTGTGCATCAAAGGATAGAGATAAAAGACACCAAGGAAGCTTTAGACAAGATAGAGGAAGAGC

AAAACAAAAGTAAGACCACCGCACAGCAAGCGGCCGGCCGCTGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA

GAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAG

AGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAAT

GACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAAC

AGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAA

CAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATC

TCTGGAACAGATTTGGAATCACACGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCCTTAA

TTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTT

AACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGT

ACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACA

GGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCGGCACTGCGTGCG

CCAATTCTGCAGACAAATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAA

TAGTAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTAC

AGGGACAGCAGAGATCCAGTTTGGTTAGTACCGGGCCCGCTCTAGCGTGAGGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACA

TCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACCGGTGCTAGCATGGCGGGCAAGAAGTTGAGGCCACT

GTCCCTGGGTGTTCCTACCCCCACACCCTCACCCCAAGACAGCCTGTTACTGCGGCGCCAACAGCCACGGTCGCCTACATCTG

ATAAGACTTATCTGCTGCCCCAGGGCAGGCCGGAGCTGGCGTAAGCCCCAGTGGGGCGCTAAGTGAGTGTGCCCCTGCCTCCC

GCCAGCACTGGCCTGGCCTGCAGGCTTAGCCTGGGTCATCAAGGTATCCCACAGGCTCTAGTTCAAATCCAGCAGAACCTCTC

TGAGCCTCACTCTTCTCACCTGCAAAATGGGTACAGCCACATCCCTTCTCTCCCTGCAGCCAGGAAGACGCACATACACAGGA

GTCTAGCCCACACCGGCCCCGCACAAATTAAGGGCTTTACTCTCTGAAAAGCCCAGTGAAGTCATGAAACCATATCTGCTATT

TTCATTTATCTTGGTTTCAGCCTATTTTGCTTGTCTGGACACTACAGTCCACGGGAGCCTAGGTCGAGCGAGGTCCAAGAATC

CCCAGGGTGGGCAGGGAGGGTGGAAGAGGGCCTCCAGTGCCCAAGAGGTGCCCCACAAGCATGGGACCCGCCCCCTCCCCTGG

ACTGCCCCACCCACTGGGGCACCAGCCACTCCCTGGGGAGGAGGGAGGAGGGAGAAGGGAGGGAGGGAGGGAGGGAGGAAGGG

AGCCTCAAAGGCCAAGGCCAGCCAGGACACCCCCTGGGATCACACTGAGCTTGCCACATCCCCAAGGCGGCCGAACCCTCCGC

AACCACCAGCCCAGAGATCTAGAGTTAATCCCCAGAGGCTCCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCC

CATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCA

AGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTG

CAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCG

CACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCG

AGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTAT

ATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGC

CGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCC

TGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC

GAGCTGTACAAGTAAAGCGGCCGCATCGATACCGTCGACCTCGATCGAGACCTAGAAAAACATGGAGCAATCACAAGTAGCAA

TACAGCAGCTACCAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGGGTTTTCCAGTCACACCTCAGGTAC

CTTTAAGACCAATGACTTACAAGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCAC

TCCCAACGAAGACAAGATATCCTTGATCTGTGGATCTACCACACACAAGGCTACTTCCCTGATTGGCAGAACTACACACCAGG

GCCAGGGATCAGATATCCACTGACCTTTGGATGGTGCTACAAGCTAGTACCAGTTGAGCAAGAGAAGGTAGAAGAAGCCAATG

AAGGAGAGAACACCCGCTTGTTACACCCTGTGAGCCTGCATGGGATGGATGACCCGGAGAGAGAAGTATTAGAGTGGAGGTTT

GACAGCCGCCTAGCATTTCATCACATGGCCCGAGAGCTGCATCCGGACTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGC

CTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCC

GTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGCATGTGAGCAAAA

GGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAA

AAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGC

GCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCA

CGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTG

CGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGA

TTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTT

GGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAG

CGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGT

CTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTA

AATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGC

ACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCT

TACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCC

GGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAG

TAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTT

CATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCT

CCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCC

ATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTT

GCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGA

AAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTAC

TTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAA

TACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATT

TAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGAC

SEQ ID NO: 62 (also designated as R50 3 peak enhancer) comprises an IRES sequence operably linked to a nucleotide sequence encoding GFP and three hematopoietic enhancer elements:

GTCGACGGATCGGGAGATCTCCCGATCCCCTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTAT

CTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAA

TTGCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGATTAT

TGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTA

AATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGG

GACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTA

CGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGG

CAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGA

CTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAAT

GTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGCGCGTTTTGCCTG

TACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAA

GCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCA

GTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGGGAAACCAGAGGAGCTCTCTCGACGCAGGACT

CGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAA

GGAGAGAGATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGCGATGGGAAAAAATTCGGTTAAGGCCAGGGG

GAAAGAAAAAATATAAATTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAATCCTGGCCTGTTAGAA

ACATCAGAAGGCTGTAGACAAATACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAAGAACTTAGATCATTATATAA

TACAGTAGCAACCCTCTATTGTGTGCATCAAAGGATAGAGATAAAAGACACCAAGGAAGCTTTAGACAAGATAGAGGAAGAGC

AAAACAAAAGTAAGACCACCGCACAGCAAGCGGCCGGCCGCTGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA

GAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAG

AGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAAT

GACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAAC

AGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAA

CAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATC

TCTGGAACAGATTTGGAATCACACGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCCTTAA

TTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTT

AACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGT

ACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACA

GGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCGGCACTGCGTGCG

CCAATTCTGCAGACAAATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAA

TAGTAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTAC

AGGGACAGCAGAGATCCAGTTTGGTTAGTACCGGGCCCGCTCTAGCGTGAGGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACA

TCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACCGGTACTGGCCTGGCCAACATAGTGAAACCCCATCT

CTCCTAATAATACAAAAATTAGCCAGGCATGGTGGCGGGTGCCTGTAATCCCAGCTACTCAGGAGACTGAGGCAGGATAATCA

CTTGAACCCAGCAGGTGGAGGCTGCAGTGAGCCAAGATCGTGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACTACATCT

CAAAAAAAAAAAAAAAAAAAAAAAGAAGATAGATGACCAACAAGTTTATGAAAATATGCTCAACATCAGTGGTCACAGGGAAA

TGCAAATCAAAACCATAACAAGATACCACTTCACACCCACACCCAGTAGGATGGCGCGATCGCAGAACCCCAGAAGATGCCAG

GAGGGAGTGAGCCAGTCAGGGAAGGCTTCCGAGAAGAGAGGACATTGAAGAAGAGTCTCAAACTTAGGCCTGACGGAGAAGAC

GCGCGGCCAGGACACCCCACCCCCGCCCTCGTCTCCCCCAAAGCCTGATCTGGCCCCACTGATTCCCTTATCTGCCCACTCCC

AGCTGCCTCCTTGCTGGCTGAACTGTCGCCGCAGACTTCTGAGCCTGCGCCCCCTCCACGGGGATGGGGGAGGGAATGGGGTG

AGGCCTGGCCTCACAGCCTCGGGGTTTCCAGCTCTTGCTGGAGGCAGGGCTCTGGGGCGCCCTACTCCTCACCCTTGGCTTCT

CTTCCTGAGCGCTCTGTGCTCTCCAGAGCTAGCATGGCGGGCAAGAAGTTGAGGCCACTGTCCCTGGGTGTTCCTACCCCCAC

ACCCTCACCCCAAGACAGCCTGTTACTGCGGCGCCAACAGCCACGGTCGCCTACATCTGATAAGACTTATCTGCTGCCCCAGG

GCAGGCCGGAGCTGGCGTAAGCCCCAGTGGGGCGCTAAGTGAGTGTGCCCCTGCCTCCCGCCAGCACTGGCCTGGCCTGCAGG

CTTAGCCTGGGTCATCAAGGTATCCCACAGGCTCTAGTTCAAATCCAGCAGAACCTCTCTGAGCCTCACTCTTCTCACCTGCA

AAATGGGTACAGCCACATCCCTTCTCTCCCTGCAGCCAGGAAGACGCACATACACAGGAGTCTAGCCCACACCGGCCCCGCAC

AAATTAAGGGCTTTACTCTCTGAAAAGCCCAGTGAAGTCATGAAACCATATCTGCTATTTTCATTTATCTTGGTTTCAGCCTA

TTTTGCTTGTCTGGACACTACAGTCCACGGGAGCCTAGGTCGAGCGAGGTCCAAGAATCCCCAGGGTGGGCAGGGAGGGTGGA

AGAGGGCCTCCAGTGCCCAAGAGGTGCCCCACAAGCATGGGACCCGCCCCCTCCCCTGGACTGCCCCACCCACTGGGGCACCA

GCCACTCCCTGGGGAGGAGGGAGGAGGGAGAAGGGAGGGAGGGAGGGAGGGAGGAAGGGAGCCTCAAAGGCCAAGGCCAGCCA

GGACACCCCCTGGGATCACACTGAGCTTGCCACATCCCCAAGGCGGCCGAACCCTCCGCAACCACCAGCCCAGAGATCTAGAG

TTAATCCCCAGAGGCTCCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGA

CGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCA

CCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGAC

CACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGG

CAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGG

AGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAAC

GGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCC

CATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGC

GCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGC

ATCGATACCGTCGACCTCGATCGAGACCTAGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAGCTACCAATGCTGATTG

TGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGGGTTTTCCAGTCACACCTCAGGTACCTTTAAGACCAATGACTTACAAGG

CAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCACTCCCAACGAAGACAAGATATCCTT

GATCTGTGGATCTACCACACACAAGGCTACTTCCCTGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGAC

CTTTGGATGGTGCTACAAGCTAGTACCAGTTGAGCAAGAGAAGGTAGAAGAAGCCAATGAAGGAGAGAACACCCGCTTGTTAC

ACCCTGTGAGCCTGCATGGGATGGATGACCCGGAGAGAGAAGTATTAGAGTGGAGGTTTGACAGCCGCCTAGCATTTCATCAC

ATGGCCCGAGAGCTGCATCCGGACTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGG

GAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACT

AGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGCATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGT

AAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTG

GCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGC

TTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTG

TAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCT

TGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCG

GTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCA

GTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCA

GCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACT

CACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCA

ATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATT

TCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAA

TGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGT

CCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCG

CAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGAT

CAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTG

GCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGAC

TGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATA

CCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTG

TTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGC

AAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAAT

ATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTT

CCGCGCACATTTCCCCGAAAAGTGCCACCTGAC
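SEQ ID NO: 62 is described as including a nucleotide sequence encoding GFP. One way to locate that coding region computationally is an open-reading-frame (ORF) scan. The following is a minimal sketch, not part of the original disclosure. It assumes the standard genetic code and scans only the three forward frames. The helper names `translate` and `find_orfs` are illustrative.

```python
# Standard genetic code; bases ordered T, C, A, G at codon positions 1, 2 and 3.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon; codons with non-ACGT characters become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def find_orfs(dna: str, min_aa: int = 50) -> list[tuple[int, int, str]]:
    """Return (start, end, protein) for ATG...stop ORFs on the forward strand,
    longest first. `end` is the index just past the stop codon."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            aa = CODON_TABLE.get(dna[i:i + 3], "X")
            if start is None and aa == "M":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and aa == "*":
                protein = translate(dna[start:i])  # stop codon not included
                if len(protein) >= min_aa:
                    orfs.append((start, i + 3, protein))
                start = None
    return sorted(orfs, key=lambda orf: len(orf[2]), reverse=True)


# Usage: a short excerpt of SEQ ID NO: 62 starting at the GFP start codon,
# with a stop codon (TAA) appended only so the demo ORF is closed.
excerpt = ("ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCC"
           "ATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGC"
           "TAA")
print(find_orfs(excerpt, min_aa=5)[0][2])  # MVSKGEELFTGVVPILVELDGDVNGHKFS
```

Running `find_orfs` over the full listing with a larger `min_aa` would report the complete GFP reading frame. Reverse-strand ORFs would require scanning the reverse complement as well.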

In some embodiments of any of the aspects, the nucleic acid sequence described herein is a vector or is comprised by or provided in a vector. The vector can be, e.g., a plasmid, viral vector, or an adenoviral, lentiviral or retroviral vector. As used herein, the term “retrovirus” refers to a type of RNA virus that inserts a DNA copy of its genome into the DNA of a host cell that it invades, thus changing the genome of that cell. Such viruses have single-stranded RNA genomes that are reverse-transcribed into double-stranded DNA before integration. In some embodiments of any of the aspects, the retrovirus is an alpha retrovirus. As used herein, the term “lentivirus” refers to a group (or genus) of complex retroviruses. Lentiviruses are capable of infecting both non-dividing and actively dividing cell types, whereas standard retroviruses can only infect mitotically active cell types. Illustrative lentiviruses include, but are not limited to: HIV (human immunodeficiency virus; including HIV type 1 and HIV type 2); visna-maedi virus (VMV); caprine arthritis-encephalitis virus (CAEV); equine infectious anemia virus (EIAV); feline immunodeficiency virus (FIV); bovine immunodeficiency virus (BIV); and simian immunodeficiency virus (SIV). As used herein, the term “adenovirus” refers to a nonenveloped virus with an icosahedral nucleocapsid containing a double-stranded DNA genome. As used herein, the term “viral vector” refers to a nucleic acid vector construct that includes at least one element of viral origin and has the capacity to be packaged into a viral vector particle. The viral vector can contain the nucleic acid described herein in place of non-essential viral genes. The vector and/or particle may be utilized for the purpose of transferring any nucleic acids into cells either in vitro or in vivo. Numerous forms of viral vectors are known in the art.

In some embodiments of any of the aspects, the nucleic acid sequence and/or vector described herein is comprised by, provided in, or located in, a viral particle (e.g., a lentiviral particle).

In one aspect of any of the embodiments, described herein is a composition comprising a nucleic acid sequence, vector, or particle as described herein and a pharmaceutically acceptable carrier.

In one aspect of any of the embodiments, described herein is a pharmaceutical composition comprising a nucleic acid sequence as described herein (and/or a vector or virus particle comprising such a nucleic acid sequence), and optionally a pharmaceutically acceptable carrier. In some embodiments of any of the aspects, the active ingredients of the pharmaceutical composition comprise a nucleic acid as described herein (and/or a vector or virus particle comprising such a nucleic acid sequence). In some embodiments of any of the aspects, the active ingredients of the pharmaceutical composition consist of a nucleic acid as described herein (and/or a vector or virus particle comprising such a nucleic acid sequence). Pharmaceutically acceptable carriers and diluents include saline, aqueous buffer solutions, solvents and/or dispersion media. The use of such carriers and diluents is well known in the art. Some non-limiting examples of materials which can serve as pharmaceutically acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C₂-C₁₂ alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. The terms “excipient”, “carrier”, “pharmaceutically acceptable carrier” and the like are used interchangeably herein. In some embodiments of any of the aspects, the carrier inhibits the degradation of the active agent, e.g., of a nucleic acid comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as described herein.

In some embodiments of any of the aspects, the pharmaceutical composition comprising a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as described herein (and/or a vector or virus particle comprising such a nucleic acid sequence) can be a parenteral dosage form. Because administration of parenteral dosage forms typically bypasses the patient's natural defenses against contaminants, parenteral dosage forms are preferably sterile or capable of being sterilized prior to administration to a patient. Examples of parenteral dosage forms include, but are not limited to, solutions ready for injection, dry products ready to be dissolved or suspended in a pharmaceutically acceptable vehicle for injection, suspensions ready for injection, and emulsions. In addition, controlled-release parenteral dosage forms can be prepared for administration to a patient, including, but not limited to, DUROS®-type dosage forms and dose-dumping.

Suitable vehicles that can be used to provide parenteral dosage forms of the pharmaceutical composition comprising a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as described herein (and/or a vector or virus particle comprising such a nucleic acid sequence) are well known to those skilled in the art. Examples include, without limitation: sterile water; water for injection USP; saline solution; glucose solution; aqueous vehicles such as, but not limited to, sodium chloride injection, Ringer's injection, dextrose injection, dextrose and sodium chloride injection, and lactated Ringer's injection; water-miscible vehicles such as, but not limited to, ethyl alcohol, polyethylene glycol, and propylene glycol; and non-aqueous vehicles such as, but not limited to, corn oil, cottonseed oil, peanut oil, sesame oil, ethyl oleate, isopropyl myristate, and benzyl benzoate. Compounds that alter or modify the solubility of a pharmaceutically acceptable salt of the pharmaceutical composition as disclosed herein can also be incorporated into the parenteral dosage forms of the disclosure, including conventional and controlled-release parenteral dosage forms.

Pharmaceutical compositions comprising a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as disclosed herein (and/or a vector or virus particle comprising such a nucleic acid sequence) can also be formulated to be suitable for oral administration, for example as discrete dosage forms, such as, but not limited to, tablets (including without limitation scored or coated tablets), pills, caplets, capsules, chewable tablets, powder packets, cachets, troches, wafers, aerosol sprays, or liquids, such as but not limited to, syrups, elixirs, solutions or suspensions in an aqueous liquid, a non-aqueous liquid, an oil-in-water emulsion, or a water-in-oil emulsion. Such compositions contain a predetermined amount of the pharmaceutically acceptable salt of the disclosed compounds, and may be prepared by methods of pharmacy well known to those skilled in the art. See generally, Remington: The Science and Practice of Pharmacy, 21st Ed., Lippincott, Williams, and Wilkins, Philadelphia Pa. (2005).

Conventional dosage forms generally provide rapid or immediate drug release from the formulation. Depending on the pharmacology and pharmacokinetics of the drug, use of conventional dosage forms can lead to wide fluctuations in the concentrations of the drug in a patient's blood and other tissues. These fluctuations can impact a number of parameters, such as dose frequency, onset of action, duration of efficacy, maintenance of therapeutic blood levels, toxicity, side effects, and the like. Advantageously, controlled-release formulations can be used to control a drug's onset of action, duration of action, plasma levels within the therapeutic window, and peak blood levels. In particular, controlled- or extended-release dosage forms or formulations can be used to ensure that the maximum effectiveness of a drug is achieved while minimizing potential adverse effects and safety concerns, which can arise both from under-dosing a drug (i.e., going below the minimum therapeutic level) and from exceeding the toxicity level for the drug. In some embodiments of any of the aspects, the composition comprising a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as disclosed herein (and/or a vector or virus particle comprising such a nucleic acid sequence) can be administered in a sustained release formulation.
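The fluctuation caused by conventional, immediate-release dosing can be illustrated with a textbook one-compartment pharmacokinetic model. The following is a minimal sketch that assumes repeated intravenous bolus doses, first-order elimination with rate constant `k`, and a volume of distribution `V`. These assumptions are not taken from this disclosure, and all parameter values are hypothetical. The sketch compares the steady-state peak and trough with the level reached if the same amount per dosing interval were released at a constant rate.

```python
import math


def bolus_steady_state(dose, volume, k, tau):
    """Steady-state peak and trough for repeated IV bolus doses given every
    `tau` time units (one compartment, first-order elimination)."""
    retained = math.exp(-k * tau)              # fraction left at end of interval
    peak = (dose / volume) / (1.0 - retained)  # accumulation factor 1/(1 - e^-k*tau)
    trough = peak * retained
    return peak, trough


def zero_order_steady_state(dose, volume, k, tau):
    """Steady-state level when the same dose per interval is released at a
    constant rate R = dose / tau: C_ss = R / (k * V)."""
    return (dose / tau) / (k * volume)


# Usage with hypothetical values: 6 h half-life, dosing every 12 h.
k = math.log(2) / 6.0
peak, trough = bolus_steady_state(dose=100.0, volume=10.0, k=k, tau=12.0)
css = zero_order_steady_state(dose=100.0, volume=10.0, k=k, tau=12.0)
print(peak / trough)  # peak-to-trough ratio e^(k*tau), i.e. about 4 here
```

The peak-to-trough ratio equals e^(kτ), so fluctuation grows as the dosing interval lengthens relative to the half-life. The constant-release level equals the time-averaged bolus concentration but has no swing around it.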

Controlled-release pharmaceutical products have a common goal of improving drug therapy over that achieved by their non-controlled release counterparts. Ideally, the use of an optimally designed controlled-release preparation in medical treatment is characterized by a minimum of drug substance being employed to cure or control the condition in a minimum amount of time. Advantages of controlled-release formulations include: 1) extended activity of the drug; 2) reduced dosage frequency; 3) increased patient compliance; 4) usage of less total drug; 5) reduction in local or systemic side effects; 6) minimization of drug accumulation; 7) reduction in blood level fluctuations; 8) improvement in efficacy of treatment; 9) reduction of potentiation or loss of drug activity; and 10) improvement in speed of control of diseases or conditions. Kim, Cherng-ju, Controlled Release Dosage Form Design, 2 (Technomic Publishing, Lancaster, Pa.: 2000).

Most controlled-release formulations are designed to initially release an amount of drug (active ingredient) that promptly produces the desired therapeutic effect, and gradually and continually release other amounts of drug to maintain this level of therapeutic or prophylactic effect over an extended period of time. In order to maintain this constant level of drug in the body, the drug must be released from the dosage form at a rate that will replace the amount of drug being metabolized and excreted from the body. Controlled-release of an active ingredient can be stimulated by various conditions including, but not limited to, pH, ionic strength, osmotic pressure, temperature, enzymes, water, and other physiological conditions or compounds.
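The principle described above has two parts. An initial amount produces the therapeutic level promptly, and the release rate must then replace the amount being eliminated. Both parts can be written down explicitly for a one-compartment model with first-order elimination: dC/dt = R/V − kC. The following is a minimal sketch under that assumption. The function names and numbers are hypothetical and are not drawn from this disclosure.

```python
import math


def maintenance_rate(target_conc, volume, k):
    """Constant release rate that exactly replaces first-order elimination
    at the target concentration: R = k * V * C_target."""
    return k * volume * target_conc


def loading_amount(target_conc, volume):
    """Immediately released amount that brings the compartment to C_target."""
    return target_conc * volume


def concentration(t, volume, k, rate, initial_amount=0.0):
    """C(t) for an initial immediate release plus constant-rate release,
    solving dC/dt = rate/V - k*C with C(0) = initial_amount / V."""
    c0 = initial_amount / volume
    c_ss = rate / (k * volume)
    return c_ss + (c0 - c_ss) * math.exp(-k * t)


# Usage with hypothetical values: target 5 units/L, V = 10 L, k = 0.1 per hour.
rate = maintenance_rate(5.0, 10.0, 0.1)
load = loading_amount(5.0, 10.0)
print([round(concentration(t, 10.0, 0.1, rate, load), 6) for t in (0, 1, 10, 100)])
```

With the loading amount, the concentration stays at the target from time zero. Without it (`initial_amount=0`), the same release rate only approaches the target gradually, at roughly 1 − e^(−kt) of its final value.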

A variety of known controlled- or extended-release dosage forms, formulations, and devices can be adapted for use with the salts and compositions of the disclosure. Examples include, but are not limited to, those described in U.S. Pat. Nos. 3,845,770; 3,916,899; 3,536,809; 3,598,123; 4,008,719; 5,674,533; 5,059,595; 5,591,767; 5,120,548; 5,073,543; 5,639,476; 5,354,556; 5,733,566; and 6,365,185 B1; each of which is incorporated herein by reference. These dosage forms can be used to provide slow or controlled-release of one or more active ingredients using, for example, hydroxypropylmethyl cellulose, other polymer matrices, gels, permeable membranes, osmotic systems (such as OROS® (Alza Corporation, Mountain View, Calif. USA)), or a combination thereof to provide the desired release profile in varying proportions.

In some aspects of the embodiments, described herein is a method of treating Diamond-Blackfan Anemia in a subject in need thereof, the method comprising administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition as described herein to the subject.

The compositions described herein can be administered to a subject having or diagnosed as having DBA. In some embodiments of any of the aspects, the methods described herein comprise administering an effective amount of a composition described herein, e.g., of a nucleic acid comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as described herein, to a subject in order to alleviate a symptom of DBA. As used herein, “alleviating a symptom” is ameliorating any condition or symptom associated with DBA. As compared with an equivalent untreated control, such reduction is by at least 5%, 10%, 20%, 40%, 50%, 60%, 80%, 90%, 95%, 99% or more as measured by any standard technique. A variety of means for administering the compositions described herein to subjects are known to those of skill in the art. Such methods can include, but are not limited to, oral, parenteral, intravenous, intramuscular, subcutaneous, transdermal, airway (aerosol), pulmonary, cutaneous, topical, or injection administration. Administration can be local or systemic.
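The percentage thresholds above refer to the reduction in a symptom measure relative to an equivalent untreated control. The following is a minimal sketch of that comparison. It assumes a single numeric symptom measure where lower is better. The function names and the example values are hypothetical.

```python
def percent_reduction(treated: float, control: float) -> float:
    """Percent reduction of a symptom measure in a treated subject (or group)
    relative to an equivalent untreated control."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return 100.0 * (control - treated) / control


def meets_threshold(treated: float, control: float, threshold_pct: float) -> bool:
    """True if the reduction is at least `threshold_pct` percent."""
    return percent_reduction(treated, control) >= threshold_pct


# Usage with hypothetical symptom scores: treated 6.0 vs untreated control 10.0.
print(percent_reduction(6.0, 10.0))    # 40.0
print(meets_threshold(6.0, 10.0, 20))  # True
```

For measures where improvement means an increase, such as red blood cell counts, the comparison would instead be framed as a percent increase over the control value.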

The term “effective amount” as used herein refers to the amount of the active agent needed to alleviate at least one symptom of the disease or disorder, and relates to a sufficient amount of pharmacological composition to provide the desired effect. The term “therapeutically effective amount” therefore refers to an amount of the active agent that is sufficient to provide a particular effect when administered to a typical subject. An effective amount as used herein, in various contexts, would also include an amount sufficient to delay the development of a symptom of the disease, alter the course of a symptom of the disease (for example, but not limited to, slowing the progression of a symptom of the disease), or reverse a symptom of the disease. Thus, it is not generally practicable to specify an exact “effective amount”. However, for any given case, an appropriate “effective amount” can be determined by one of ordinary skill in the art using only routine experimentation.

Effective amounts, toxicity, and therapeutic efficacy can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dosage can vary depending upon the dosage form employed and the route of administration utilized. The dose ratio between toxic and therapeutic effects is the therapeutic index and can be expressed as the ratio LD50/ED50. Compositions and methods that exhibit large therapeutic indices are preferred. A therapeutically effective dose can be estimated initially from cell culture assays. Also, a dose can be formulated in animal models to achieve a circulating plasma concentration range that includes the IC50 (i.e., the concentration of the active agent that achieves a half-maximal inhibition of symptoms) as determined in cell culture or in an appropriate animal model. Levels in plasma can be measured, for example, by high performance liquid chromatography. The effects of any particular dosage can be monitored by a suitable bioassay, e.g., assays for the levels of red blood cells and/or erythropoiesis, among others. The dosage can be determined by a physician and adjusted, as necessary, to suit observed effects of the treatment.
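The ED50, the LD50 and their ratio can be illustrated numerically. The following is a minimal sketch that estimates each 50% dose by linear interpolation on log10(dose) between the two tested doses that bracket a 50% response. It assumes responses increase with dose. Regulatory and laboratory practice usually fits a probit or logistic dose-response model instead, so this interpolation is a simplification. The function names and data are hypothetical.

```python
import math


def dose_at_half_response(doses, fractions):
    """Estimate the dose giving a 50% response (e.g., ED50 or LD50) by linear
    interpolation on log10(dose) between the doses that bracket 0.5.
    Assumes the response fraction increases with dose."""
    pairs = sorted(zip(doses, fractions))
    for (d_lo, f_lo), (d_hi, f_hi) in zip(pairs, pairs[1:]):
        if f_lo <= 0.5 <= f_hi:
            if f_hi == f_lo:
                return d_lo
            frac = (0.5 - f_lo) / (f_hi - f_lo)
            log_dose = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_dose
    raise ValueError("no pair of tested doses brackets a 50% response")


def therapeutic_index(ld50, ed50):
    """Therapeutic index as defined above: LD50 / ED50."""
    return ld50 / ed50


# Usage with hypothetical animal data (dose units arbitrary).
ed50 = dose_at_half_response([1, 10, 100], [0.1, 0.4, 0.9])       # fraction responding
ld50 = dose_at_half_response([100, 1000, 10000], [0.0, 0.3, 0.8])  # fraction dying
print(therapeutic_index(ld50, ed50))
```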

The dosage of a composition as described herein can be determined by a physician and adjusted, as necessary, to suit observed effects of the treatment. With respect to duration and frequency of treatment, it is typical for skilled clinicians to monitor subjects in order to determine when the treatment is providing therapeutic benefit, and to determine whether to increase or decrease dosage, increase or decrease administration frequency, discontinue treatment, resume treatment, or make other alterations to the treatment regimen. The dosing schedule can vary from once a week to daily depending on a number of clinical factors, such as the subject's sensitivity to the active agent. The desired dose can be administered at one time or divided into subdoses, e.g., 2-4 subdoses, and administered over a period of time, e.g., at appropriate intervals through the day or on another appropriate schedule. In some embodiments of any of the aspects, administration can be chronic, e.g., one or more doses and/or treatments daily over a period of weeks or months. Examples of dosing and/or treatment schedules are administration daily, twice daily, three times daily or four or more times daily over a period of 1 week, 2 weeks, 3 weeks, 4 weeks, 1 month, 2 months, 3 months, 4 months, 5 months, or 6 months, or more. A composition comprising a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as disclosed herein (and/or a vector or virus particle comprising such a nucleic acid sequence) can be administered over a period of time, such as over a 5 minute, 10 minute, 15 minute, 20 minute, or 25 minute period.

In some embodiments of any of the aspects, after an initial treatment regimen, the treatments can be administered on a less frequent basis. For example, after treatment biweekly for three months, treatment can be repeated once per month, for six months or a year or longer. Treatment according to the methods described herein can reduce levels of a marker or symptom of a condition by at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80% or at least 90% or more.
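The tapering example above has an initial regimen followed by less frequent repeat treatment. It can be expressed as a list of dosing days. The following is a minimal sketch. It assumes “biweekly” means every 14 days, “three months” means 90 days and “once per month” means every 30 days. These interpretations are not specified in the text. The function name and defaults are hypothetical.

```python
def dosing_days(initial_interval=14, initial_span=90,
                maintenance_interval=30, maintenance_span=180):
    """Day offsets (day 0 = first treatment) for an initial regimen followed
    by a less frequent maintenance regimen that starts when the initial one ends."""
    initial = list(range(0, initial_span, initial_interval))
    maintenance = list(range(initial_span, initial_span + maintenance_span,
                             maintenance_interval))
    return initial + maintenance


# Usage: every 14 days for ~3 months, then monthly for ~6 months.
print(dosing_days())
```

Changing `maintenance_span` to 365 would model the “or a year or longer” variant.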

The dosage ranges for the administration of a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as disclosed herein (and/or a vector or virus particle comprising such a nucleic acid sequence), according to the methods described herein, depend upon, for example, the form of the active agent, its potency, and the extent to which symptoms, markers, or indicators of a condition described herein are desired to be reduced, for example, the desired percentage reduction. Generally, the dosage will vary with the age, condition, and sex of the patient and can be determined by one of skill in the art. The dosage can also be adjusted by the individual physician in the event of any complication.

The efficacy of a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as disclosed herein (and/or a vector or virus particle comprising such a nucleic acid sequence) in, e.g. the treatment of DBA or any other condition described herein, or to induce a response as described herein can be determined by the skilled clinician. However, a treatment is considered “effective treatment,” as the term is used herein, if one or more of the signs or symptoms of a condition described herein are altered in a beneficial manner, other clinically accepted symptoms are improved, or even ameliorated, or a desired response is induced e.g., by at least 10% following treatment according to the methods described herein. Efficacy can be assessed, for example, by measuring a marker, indicator, symptom, and/or the incidence of a condition treated according to the methods described herein or any other measurable parameter appropriate. Efficacy can also be measured by a failure of an individual to worsen as assessed by hospitalization, or need for medical interventions (i.e., progression of the disease is halted). Methods of measuring these indicators are known to those of skill in the art and/or are described herein. Treatment includes any treatment of a disease in an individual or an animal (some non-limiting examples include a human or an animal) and includes: (1) inhibiting the disease, e.g., preventing a worsening of symptoms; or (2) relieving the severity of the disease, e.g., causing regression of symptoms. An effective amount for the treatment of a disease means that amount which, when administered to a subject in need thereof, is sufficient to result in effective treatment as that term is defined herein, for that disease. Efficacy of an agent can be determined by assessing physical indicators of a condition or desired response. 
It is well within the ability of one skilled in the art to monitor efficacy of administration and/or treatment by measuring any one of such parameters, or any combination of parameters. Efficacy can be assessed in animal models of a condition described herein, for example treatment of DBA.
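As an illustration of the “at least 10%” criterion above, the following Python sketch computes the percent change of a single monitored parameter from its pre-treatment baseline, signed so that a beneficial change is positive, and compares it with a threshold. The function names, the 10% default, and the example values (e.g., a hemoglobin-like parameter rising from 8.0 to 9.0, or a transfusion-like requirement falling from 4 to 3) are hypothetical and chosen only for illustration; this is not a clinical decision rule.

```python
def percent_improvement(baseline: float, post: float, higher_is_better: bool) -> float:
    """Percent change from baseline, signed so that a beneficial change is positive."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (post - baseline) * 100.0 / abs(baseline)
    # For parameters where a decrease is beneficial (e.g., transfusion need), flip the sign.
    return change if higher_is_better else -change


def is_effective(baseline: float, post: float, higher_is_better: bool,
                 threshold_pct: float = 10.0) -> bool:
    """True if the beneficial change meets the threshold (10% by default)."""
    return percent_improvement(baseline, post, higher_is_better) >= threshold_pct


# Hypothetical values for illustration only.
print(percent_improvement(8.0, 9.0, higher_is_better=True))   # 12.5
print(percent_improvement(4, 3, higher_is_better=False))      # 25.0
print(is_effective(10.0, 10.5, higher_is_better=True))        # False
```

In practice several parameters would typically be monitored, alone or in combination, as described above; the sketch handles one parameter at a time and does not assess statistical significance.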

In some embodiments of any of the aspects, the early erythroid progenitor cells comprise a DBA-associated gene mutation, including, but not limited to, those listed in Table 5. In some embodiments of any of the aspects, the erythroid progenitor cells comprise one or more DBA-associated gene mutations. DBA-associated gene mutations are well known in the art and include, but are not limited to, the mutations listed in Table 5 (see, e.g., Int J Hematol. 2010 October; 92(3):413-8).

TABLE 5

Exemplary DBA-associated gene mutations

Gene Name | Exemplary DBA-associated cDNA mutation; predicted amino acid change
GATA1 | c.220G>C; p.Val74Leu
RPL5 | c.535C>T; p.Arg179X
RPL11 | c.475_476ins11; p.Lys159ThrfsX39
RPS19 | c.49G>C; p.Ala17Pro
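Each row of Table 5 pairs a cDNA change with a predicted amino acid change, and for simple single-nucleotide substitutions the two can be cross-checked: in HGVS coding-DNA numbering, position n lies in codon (n - 1) // 3 + 1. The Python sketch below assumes standard HGVS “c.” numbering from the A of the ATG start codon and the standard genetic code; it ignores any effect on splicing and does not handle insertions such as the RPL11 entry. The names `codon_position` and `substitution_is_consistent` are hypothetical helpers, not part of any library.

```python
import re

# Standard genetic code; codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Simple single-nucleotide substitution in HGVS coding-DNA notation, e.g. "c.220G>C".
HGVS_SUB = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def codon_position(cdna_pos: int) -> tuple:
    """Map a 1-based c. position to (codon number, position 1-3 within that codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def substitution_is_consistent(hgvs_c: str, ref_aa: str, alt_aa: str, codon_no: int) -> bool:
    """Check whether some codon for ref_aa, carrying the stated base change, yields alt_aa.

    Amino acids use one-letter codes, with '*' for a stop codon (X in p. notation).
    """
    m = HGVS_SUB.match(hgvs_c)
    if m is None:
        raise ValueError(f"not a simple c. substitution: {hgvs_c}")
    pos, ref_base, alt_base = int(m.group(1)), m.group(2), m.group(3)
    codon, offset = codon_position(pos)
    if codon != codon_no:
        return False
    i = offset - 1
    for ref_codon, aa in CODON_TABLE.items():
        if aa == ref_aa and ref_codon[i] == ref_base:
            alt_codon = ref_codon[:i] + alt_base + ref_codon[i + 1:]
            if CODON_TABLE[alt_codon] == alt_aa:
                return True
    return False


print(codon_position(220))                                    # (74, 1)
print(substitution_is_consistent("c.220G>C", "V", "L", 74))   # True  (Val74Leu)
print(substitution_is_consistent("c.220G>C", "L", "V", 74))   # False (Leu74Val)
print(substitution_is_consistent("c.535C>T", "R", "*", 179))  # True  (Arg179X)
```

Because c.220 is the first base of codon 74 and no leucine codon begins with G, a G>C change at that position can convert a valine codon (GTN) to a leucine codon (CTN), but not the reverse.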

In some embodiments of any of the aspects, the level of GATA-1 can be measured, by way of non-limiting example, by Western blot; immunoprecipitation; enzyme-linked immunosorbent assay (ELISA); radioimmunological assay (RIA); sandwich assay; fluorescence in situ hybridization (FISH); immunohistological staining; radioimmunometric assay; immunofluorescence assay; mass spectrometry; and/or immunoelectrophoresis assay.

RNA and/or DNA molecules can be isolated, derived, or amplified from a biological sample, such as a blood sample. Techniques for the detection of mRNA expression are known to persons skilled in the art and can include, but are not limited to, PCR procedures, RT-PCR, quantitative RT-PCR, Northern blot analysis, differential gene expression, RNase protection assay, microarray-based analysis, next-generation sequencing, hybridization methods, etc.

In general, the PCR procedure describes a method of gene amplification that comprises (i) sequence-specific hybridization of primers to specific genes or sequences within a nucleic acid sample or library, (ii) subsequent amplification involving multiple rounds of denaturation, annealing, and elongation using a thermostable DNA polymerase, and (iii) screening the PCR products for a band of the correct size. The primers used are oligonucleotides of sufficient length and appropriate sequence to provide initiation of polymerization, i.e., each primer is specifically designed to be complementary to a strand of the genomic locus to be amplified. In an alternative embodiment, the mRNA level of gene expression products described herein can be determined by reverse-transcription PCR (RT-PCR) and by quantitative RT-PCR (QRT-PCR) or real-time PCR methods. Methods of RT-PCR and QRT-PCR are well known in the art.
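The requirement that each primer be complementary to one strand of the locus can be illustrated with a minimal Python sketch: the forward primer copies the 5′ end of the target, and the reverse primer is the reverse complement of its 3′ end. The target sequence and the 6-nt primer length are hypothetical and far shorter than practical primers, and the Wallace rule, 2(A+T) + 4(G+C) °C, is only a rough melting-temperature estimate for short oligonucleotides; real primer design also considers specificity, secondary structure, and primer-dimer formation.

```python
# Watson-Crick complements for DNA, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    Only a crude estimate, intended for short oligonucleotides.
    """
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def end_primers(target: str, length: int) -> tuple:
    """Forward primer = 5' end of the target; reverse primer = reverse complement of its 3' end."""
    if length > len(target):
        raise ValueError("primer longer than target")
    return target[:length], reverse_complement(target[-length:])


# Hypothetical target sequence, for illustration only.
fwd, rev = end_primers("ATGCGTACGTTAGCCATGCA", 6)
print(fwd, rev)                          # ATGCGT TGCATG
print(wallace_tm(fwd), wallace_tm(rev))  # 18 18
```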

In some embodiments of any of the aspects, the level of an mRNA can be measured by a quantitative sequencing technology, e.g., a quantitative next-generation sequencing technology. Methods of sequencing a nucleic acid sequence are well known in the art. Briefly, a sample obtained from a subject can be contacted with one or more primers that specifically hybridize to a single-stranded nucleic acid sequence flanking the target gene sequence, and a complementary strand is synthesized. In some next-generation technologies, an adaptor (double- or single-stranded) is ligated to nucleic acid molecules in the sample and synthesis proceeds from the adaptor or adaptor-compatible primers. In some third-generation technologies, the sequence can be determined, e.g., by determining the location and pattern of the hybridization of probes, or by measuring one or more characteristics of a single molecule as it passes through a sensor (e.g., the modulation of an electrical field as a nucleic acid molecule passes through a nanopore). Exemplary methods of sequencing include, but are not limited to, Sanger sequencing, dideoxy chain termination, high-throughput sequencing, next-generation sequencing, 454 sequencing, SOLiD sequencing, polony sequencing, Illumina sequencing, Ion Torrent sequencing, sequencing by hybridization, nanopore sequencing, Helioscope sequencing, single-molecule real-time sequencing, RNAP sequencing, and the like. Methods and protocols for performing these sequencing methods are known in the art; see, e.g., “Next Generation Genome Sequencing” Ed. Michal Janitz, Wiley-VCH; “High-Throughput Next Generation Sequencing” Eds. Kwon and Ricke, Humana Press, 2011; and Sambrook et al., Molecular Cloning: A Laboratory Manual (4th ed.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., USA (2012); which are incorporated by reference herein in their entireties.
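When an mRNA level is measured by quantitative sequencing, raw read counts are usually normalized for sequencing depth before samples are compared. The Python sketch below shows one common choice, counts per million (CPM); it is an assumption for illustration rather than a normalization prescribed here, and the gene labels and read counts are hypothetical. Other schemes (e.g., length-aware TPM) may be more appropriate depending on the assay.

```python
def counts_per_million(counts: dict) -> dict:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical read counts for one sample.
sample = {"GATA1": 250, "GENE_B": 600, "GENE_C": 150}
print(counts_per_million(sample))  # {'GATA1': 250000.0, 'GENE_B': 600000.0, 'GENE_C': 150000.0}
```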

Nucleic acid molecules, including ribonucleic acid (RNA) molecules, can be isolated from a particular biological sample using any of a number of well-known procedures, the particular isolation procedure being chosen as appropriate for the particular biological sample. For example, freeze-thaw and alkaline lysis procedures can be useful for obtaining nucleic acid molecules from solid materials; heat and alkaline lysis procedures can be useful for obtaining nucleic acid molecules from urine; and proteinase K extraction can be used to obtain nucleic acid from blood (Rolfs, A. et al. PCR: Clinical Diagnostics and Research, Springer (1994)).

In some embodiments of any of the aspects, one or more of the reagents (e.g., an antibody reagent and/or nucleic acid probe) described herein can comprise a detectable label and/or comprise the ability to generate a detectable signal (e.g., by catalyzing a reaction that converts a compound to a detectable product). Detectable labels can comprise, for example, a light-absorbing dye, a fluorescent dye, or a radioactive label. Detectable labels, methods of detecting them, and methods of incorporating them into reagents (e.g., antibodies and nucleic acid probes) are well known in the art.

In some embodiments of any of the aspects, detectable labels can include labels that can be detected by spectroscopic, photochemical, biochemical, immunochemical, electromagnetic, radiochemical, or chemical means, such as fluorescence, chemifluorescence, or chemiluminescence, or any other appropriate means. The detectable labels used in the methods described herein can be primary labels (where the label comprises a moiety that is directly detectable or that produces a directly detectable moiety) or secondary labels (where the detectable label binds to another moiety to produce a detectable signal, e.g., as is common in immunological labeling using secondary and tertiary antibodies). The detectable label can be linked by covalent or non-covalent means to the reagent. Alternatively, a detectable label can be linked such as by directly labeling a molecule that achieves binding to the reagent via a ligand-receptor binding pair arrangement or other such specific recognition molecules. Detectable labels can include, but are not limited to, radioisotopes, bioluminescent compounds, chromophores, antibodies, chemiluminescent compounds, fluorescent compounds, metal chelates, and enzymes.

In other embodiments, the detection reagent is labeled with a fluorescent compound. When the fluorescently labeled reagent is exposed to light of the proper wavelength, its presence can then be detected due to fluorescence. In some embodiments of any of the aspects, a detectable label can be a fluorescent dye molecule, or fluorophore, including, but not limited to, fluorescein, phycoerythrin, phycocyanin, o-phthalaldehyde, fluorescamine, Cy3™, Cy5™, allophycocyanin, Texas Red, peridinin chlorophyll, cyanine, tandem conjugates such as phycoerythrin-Cy5™, green fluorescent protein, rhodamine, fluorescein isothiocyanate (FITC) and Oregon Green™, rhodamine and derivatives (e.g., Texas Red and tetramethylrhodamine isothiocyanate (TRITC)), biotin, phycoerythrin, AMCA, CyDyes™, 6-carboxyfluorescein (commonly known by the abbreviations FAM and F), 6-carboxy-2′,4′,7′,4,7-hexachlorofluorescein (HEX), 6-carboxy-4′,5′-dichloro-2′,7′-dimethoxyfluorescein (JOE or J), N,N,N′,N′-tetramethyl-6-carboxyrhodamine (TAMRA or T), 6-carboxy-X-rhodamine (ROX or R), 5-carboxyrhodamine-6G (R6G5 or G5), 6-carboxyrhodamine-6G (R6G6 or G6), and rhodamine 110; cyanine dyes, e.g., Cy3, Cy5 and Cy7 dyes; coumarins, e.g., umbelliferone; benzimide dyes, e.g., Hoechst 33258; phenanthridine dyes, e.g., Texas Red; ethidium dyes; acridine dyes; carbazole dyes; phenoxazine dyes; porphyrin dyes; polymethine dyes, e.g., cyanine dyes such as Cy3, Cy5, etc.; BODIPY dyes; and quinoline dyes. In some embodiments of any of the aspects, a detectable label can be a radiolabel including, but not limited to, ³H, ¹²⁵I, ³⁵S, ¹⁴C, ³²P, and ³³P. In some embodiments of any of the aspects, a detectable label can be an enzyme including, but not limited to, horseradish peroxidase and alkaline phosphatase. An enzymatic label can produce, for example, a chemiluminescent signal, a color signal, or a fluorescent signal.
Enzymes contemplated for use to detectably label an antibody reagent include, but are not limited to, malate dehydrogenase, staphylococcal nuclease, delta-5-steroid isomerase, yeast alcohol dehydrogenase, alpha-glycerophosphate dehydrogenase, triose phosphate isomerase, horseradish peroxidase, alkaline phosphatase, asparaginase, glucose oxidase, beta-galactosidase, ribonuclease, urease, catalase, glucose-6-phosphate dehydrogenase, glucoamylase, and acetylcholinesterase. In some embodiments of any of the aspects, a detectable label is a chemiluminescent label, including, but not limited to, lucigenin, luminol, luciferin, isoluminol, aromatic acridinium ester, imidazole, acridinium salt, and oxalate ester. In some embodiments of any of the aspects, a detectable label can be a spectral colorimetric label including, but not limited to, colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, and latex) beads.

In some embodiments of any of the aspects, detection reagents can also be labeled with a detectable tag, such as c-Myc, HA, VSV-G, HSV, FLAG, V5, HIS, or biotin. Other detection systems can also be used, for example, a biotin-streptavidin system. In this system, the antibody immunoreactive with (i.e., specific for) the biomarker of interest is biotinylated. The quantity of biotinylated antibody bound to the biomarker is determined using a streptavidin-peroxidase conjugate and a chromogenic substrate. Such streptavidin-peroxidase detection kits are commercially available, e.g., from DAKO (Carpinteria, Calif.). A reagent can also be detectably labeled using fluorescence-emitting metals such as ¹⁵²Eu, or others of the lanthanide series. These metals can be attached to the reagent using metal-chelating groups such as diethylenetriaminepentaacetic acid (DTPA) or ethylenediaminetetraacetic acid (EDTA).

A level which is less than a reference level can be a level which is less by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 50%, at least about 60%, at least about 80%, at least about 90%, or more relative to the reference level. In some embodiments of any of the aspects, a level which is less than a reference level can be a level which is statistically significantly less than the reference level.

A level which is more than a reference level can be a level which is greater by at least about 10%, at least about 20%, at least about 50%, at least about 60%, at least about 80%, at least about 90%, at least about 100%, at least about 200%, at least about 300%, at least about 500% or more than the reference level. In some embodiments of any of the aspects, a level which is more than a reference level can be a level which is statistically significantly greater than the reference level.
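The percentage comparisons above can be sketched in Python as a signed percent difference from the reference level, classified against a chosen cut-off. The function names and the 10% default cut-off are hypothetical; a cut-off would be selected from the ranges given above, and this sketch does not perform the statistical significance testing contemplated in some embodiments. Note that a level 200% greater than the reference corresponds to a 3-fold level.

```python
def percent_difference(level: float, reference: float) -> float:
    """Signed percent difference of a level from a reference level."""
    if reference == 0:
        raise ValueError("reference level must be non-zero")
    return (level - reference) * 100.0 / reference


def compare_to_reference(level: float, reference: float, min_pct: float = 10.0) -> str:
    """Classify a level as 'less', 'more', or 'not different' using a percent cut-off."""
    diff = percent_difference(level, reference)
    if diff <= -min_pct:
        return "less"
    if diff >= min_pct:
        return "more"
    return "not different"


print(compare_to_reference(40.0, 100.0))   # less (60% below)
print(compare_to_reference(300.0, 100.0))  # more (200% above, i.e., 3-fold)
print(compare_to_reference(105.0, 100.0))  # not different at a 10% cut-off
```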

In some embodiments of any of the aspects, the reference can be a level of the target in a population of subjects who do not have or are not diagnosed as having, and/or do not exhibit signs or symptoms of lung infection and/or lung inflammation. In some embodiments of any of the aspects, the reference can also be a level of the target in a control sample, a pooled sample of control individuals or a numeric value or range of values based on the same. In some embodiments of any of the aspects, the reference can be the level of a target in a sample obtained from the same subject at an earlier point in time, e.g., the methods described herein can be used to determine if a subject's sensitivity or response to a given therapy is changing over time.

In some embodiments of the foregoing aspects, the expression level of a given gene can be normalized relative to the expression level of one or more reference genes or reference proteins.
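For quantitative RT-PCR data, one widely used way to normalize a target gene to one or more reference genes is the comparative Ct (2^-ΔΔCt) method, sketched below in Python. The sketch assumes roughly 100% amplification efficiency for every assay and, when several reference genes are used, averages their Ct values (equivalent to a geometric mean of their quantities); both are assumptions, and the Ct values shown are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct: float, reference_cts: list) -> float:
    """Target Ct minus the mean Ct of one or more reference genes."""
    return target_ct - mean(reference_cts)


def relative_expression(test_target: float, test_refs: list,
                        ctrl_target: float, ctrl_refs: list) -> float:
    """Fold change of the test sample over the control by the 2^-ddCt method.

    Assumes roughly 100% amplification efficiency for all assays.
    """
    ddct = delta_ct(test_target, test_refs) - delta_ct(ctrl_target, ctrl_refs)
    return 2.0 ** (-ddct)


# Hypothetical Ct values: target 2 cycles earlier in the test sample at equal reference Ct.
print(relative_expression(24.0, [18.0], 26.0, [18.0]))  # 4.0
```

A lower Ct means more starting template, so a target that crosses threshold two cycles earlier, after normalization, is about four times more abundant under the efficiency assumption.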

In some embodiments of any of the aspects, the reference level can be the level in a sample of similar cell type, sample type, sample processing, and/or obtained from a subject of similar age, sex and other demographic parameters as the sample/subject for which the level of neutrophil accumulation and/or polyP is to be determined. In some embodiments of any of the aspects, the test sample and control reference sample are of the same type, that is, obtained from the same biological source, and comprising the same composition, e.g. the same number and type of cells.

The term “sample” or “test sample” as used herein denotes a sample taken or isolated from a biological organism, e.g., a blood or plasma sample from a subject. In some embodiments of any of the aspects, the present invention encompasses several examples of a biological sample. In some embodiments of any of the aspects, the biological sample is cells, tissue, peripheral blood, or bodily fluid. Exemplary biological samples include, but are not limited to, a biopsy; a tumor sample; a biofluid sample; blood; serum; plasma; urine; sperm; mucus; a tissue biopsy; an organ biopsy; synovial fluid; bile fluid; cerebrospinal fluid; mucosal secretion; effusion; sweat; saliva; and/or a tissue sample, etc. The term also includes a mixture of the above-mentioned samples. The term “test sample” also includes untreated or pretreated (or pre-processed) biological samples. In some embodiments of any of the aspects, a test sample can comprise cells from a subject. In some embodiments of any of the aspects, the test sample can be a lung sample, lung aspirate, sputum sample, airway sample, serum sample, or the like.

A test sample can be obtained by removing a sample from a subject, or by using a previously isolated sample (e.g., one isolated at a prior time point by the same or another person).

In some embodiments of any of the aspects, the test sample can be an untreated test sample. As used herein, the phrase “untreated test sample” refers to a test sample that has not had any prior sample pre-treatment except for dilution and/or suspension in a solution. Exemplary methods for treating a test sample include, but are not limited to, centrifugation, filtration, sonication, homogenization, heating, freezing and thawing, and combinations thereof. In some embodiments of any of the aspects, the test sample can be a frozen test sample, e.g., a frozen tissue. The frozen sample can be thawed before employing methods, assays and systems described herein. After thawing, a frozen sample can be centrifuged before being subjected to methods, assays and systems described herein. In some embodiments of any of the aspects, the test sample is a clarified test sample, for example, by centrifugation and collection of a supernatant comprising the clarified test sample. In some embodiments of any of the aspects, a test sample can be a pre-processed test sample, for example, supernatant or filtrate resulting from a treatment selected from the group consisting of centrifugation, filtration, thawing, purification, and any combinations thereof. In some embodiments of any of the aspects, the test sample can be treated with a chemical and/or biological reagent. Chemical and/or biological reagents can be employed to protect and/or maintain the stability of the sample, including biomolecules (e.g., nucleic acid and protein) therein, during processing. One exemplary reagent is a protease inhibitor, which is generally used to protect or maintain the stability of protein during processing. The skilled artisan is well aware of methods and processes appropriate for pre-processing of biological samples required for determination of the level of an expression product as described herein.

For convenience, the meaning of some terms and phrases used in the specification, examples, and appended claims, are provided below. Unless stated otherwise, or implicit from context, the following terms and phrases include the meanings provided below. The definitions are provided to aid in describing particular embodiments, and are not intended to limit the claimed invention, because the scope of the invention is limited only by the claims. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. If there is an apparent discrepancy between the usage of a term in the art and its definition provided herein, the definition provided within the specification shall prevail.

For convenience, certain terms employed herein, in the specification, examples and appended claims are collected here.

The terms “decrease”, “reduced”, “reduction”, or “inhibit” are all used herein to mean a decrease by a statistically significant amount. In some embodiments of any of the aspects, “reduce,” “reduction” or “decrease” or “inhibit” typically means a decrease by at least 10% as compared to a reference level (e.g. the absence of a given treatment or agent) and can include, for example, a decrease by at least about 10%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or more. As used herein, “reduction” or “inhibition” does not encompass a complete inhibition or reduction as compared to a reference level. “Complete inhibition” is a 100% inhibition as compared to a reference level. A decrease can be preferably down to a level accepted as within the range of normal for an individual without a given disorder.

The terms “increased”, “increase”, “enhance”, or “activate” are all used herein to mean an increase by a statistically significant amount. In some embodiments of any of the aspects, the terms “increased”, “increase”, “enhance”, or “activate” can mean an increase of at least 10% as compared to a reference level, for example an increase of at least about 20%, or at least about 30%, or at least about 40%, or at least about 50%, or at least about 60%, or at least about 70%, or at least about 80%, or at least about 90%, or up to and including a 100% increase or any increase between 10-100% as compared to a reference level, or at least about a 2-fold, or at least about a 3-fold, or at least about a 4-fold, or at least about a 5-fold, or at least about a 10-fold increase, or any increase between 2-fold and 10-fold or greater as compared to a reference level. In the context of a marker or symptom, an “increase” is a statistically significant increase in such level.

As used herein, a “subject” means a human or animal. Usually the animal is a vertebrate such as a primate, rodent, domestic animal, or game animal. Primates include chimpanzees, cynomolgus monkeys, spider monkeys, and macaques, e.g., Rhesus. Rodents include mice, rats, woodchucks, ferrets, rabbits, and hamsters. Domestic and game animals include cows, horses, pigs, deer, bison, buffalo, feline species, e.g., domestic cat, canine species, e.g., dog, fox, wolf, avian species, e.g., chicken, emu, ostrich, and fish, e.g., trout, catfish, and salmon. In some embodiments of any of the aspects, the subject is a mammal, e.g., a primate, e.g., a human. The terms “individual,” “patient,” and “subject” are used interchangeably herein.

Preferably, the subject is a mammal. The mammal can be a human, non-human primate, mouse, rat, dog, cat, horse, or cow, but is not limited to these examples. Mammals other than humans can be advantageously used as subjects that represent animal models of a condition. A subject can be male or female.

A subject can be one who has been previously diagnosed with or identified as suffering from or having a condition in need of treatment or one or more complications related to such a condition, and, optionally, has already undergone treatment for the condition or the one or more complications related to the condition. Alternatively, a subject can also be one who has not been previously diagnosed as having the condition or one or more complications related to the condition. For example, a subject can be one who exhibits one or more risk factors for the condition or one or more complications related to the condition, or a subject who does not exhibit risk factors.

A “subject in need” of treatment for a particular condition can be a subject having that condition, diagnosed as having that condition, or at risk of developing that condition.

In the various embodiments described herein, it is further contemplated that variants (naturally occurring or otherwise), alleles, homologs, conservatively modified variants, and/or conservative substitution variants of any of the particular polypeptides described are encompassed. As to amino acid sequences, one of skill will recognize that individual substitutions, deletions, or additions to a nucleic acid, peptide, polypeptide, or protein sequence that alter a single amino acid or a small percentage of amino acids in the encoded sequence constitute a “conservatively modified variant” where the alteration results in the substitution of an amino acid with a chemically similar amino acid and retains the desired activity of the polypeptide. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles consistent with the disclosure.

A given amino acid can be replaced by a residue having similar physicochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substitution of one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn). Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are well known. Polypeptides comprising conservative amino acid substitutions can be tested in any one of the assays described herein to confirm that a desired activity, e.g., activity and specificity of a native or reference polypeptide, is retained.

Amino acids can be grouped according to similarities in the properties of their side chains (see A. L. Lehninger, Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Non-conservative substitutions will entail exchanging a member of one of these classes for another class. Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
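The first (Lehninger) side-chain grouping above can be encoded as a simple lookup, giving a first-pass screen for whether a substitution stays within one class. This Python sketch uses one-letter codes; the function names are hypothetical, and the check is deliberately coarse: some listed conservative substitutions (e.g., Lys into Gln) cross these classes, and some within-class swaps (e.g., Ala to Pro) can still disrupt structure, so activity would still be confirmed in an assay as described above.

```python
# Side-chain classes from the Lehninger grouping; one-letter codes.
SIDE_CHAIN_CLASSES = {
    "non-polar": set("AVLIPFWM"),
    "uncharged polar": set("GSTCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def side_chain_class(residue: str) -> str:
    """Return the side-chain class name for a one-letter amino acid code."""
    for name, members in SIDE_CHAIN_CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue: {residue}")


def is_within_class(ref: str, alt: str) -> bool:
    """True if both residues fall in the same side-chain class."""
    return side_chain_class(ref) == side_chain_class(alt)


print(is_within_class("I", "L"))  # True  (non-polar to non-polar)
print(is_within_class("D", "K"))  # False (acidic to basic)
```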

The terms “miRNA” and “microRNA” refer to non-coding RNAs of 21-25 nt derived from endogenous genes. They are processed from longer (ca. 75 nt) hairpin-like precursors termed pre-miRNAs. MicroRNAs assemble in complexes termed miRNPs and recognize their targets by antisense complementarity. If a microRNA is 100% complementary to its target, the target mRNA is cleaved and the miRNA acts like an siRNA. If the complementarity is only partial, translation of the target mRNA is blocked.

The terms “miRNA target site” or “microRNA target site” refer to a specific target binding sequence of a microRNA in an mRNA target. Complementarity between the miRNA and its target site need not be perfect.
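Because complementarity need not be perfect, one common heuristic for locating candidate miRNA target sites is to search an mRNA for perfect complementarity to the miRNA “seed” (nucleotides 2-8). The Python sketch below implements only that heuristic; it is an assumption for illustration, not the method of this disclosure, and it ignores sites with seed mismatches or compensatory 3′ pairing. The 22-nt miRNA and the short mRNA fragment are example inputs only.

```python
# RNA base-pairing complements.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """mRNA sequence (5'->3') perfectly complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]  # 0-based slice covering nucleotides 2..8
    return seed.translate(RNA_COMPLEMENT)[::-1]


def find_seed_matches(mirna: str, mrna: str) -> list:
    """0-based start positions in the mRNA of exact seed-complementary sites."""
    site = seed_site(mirna)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]


mirna = "UGAGGUAGUAGGUUGUAUAGUU"                   # example 22-nt miRNA sequence
print(seed_site(mirna))                            # CUACCUC
print(find_seed_matches(mirna, "AAACUACCUCAAA"))   # [3]
```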

As used herein, the terms “protein” and “polypeptide” are used interchangeably herein to designate a series of amino acid residues, connected to each other by peptide bonds between the alpha-amino and carboxy groups of adjacent residues. The terms “protein”, and “polypeptide” refer to a polymer of amino acids, including modified amino acids (e.g., phosphorylated, glycated, glycosylated, etc.) and amino acid analogs, regardless of its size or function. “Protein” and “polypeptide” are often used in reference to relatively large polypeptides, whereas the term “peptide” is often used in reference to small polypeptides, but usage of these terms in the art overlaps. The terms “protein” and “polypeptide” are used interchangeably herein when referring to a gene product and fragments thereof. Thus, exemplary polypeptides or proteins include gene products, naturally occurring proteins, homologs, orthologs, paralogs, fragments and other equivalents, variants, fragments, and analogs of the foregoing.

In some embodiments of any of the aspects, the polypeptide described herein (or a nucleic acid encoding such a polypeptide) can be a functional fragment of one of the amino acid sequences described herein. As used herein, a “functional fragment” is a fragment or segment of a peptide which retains at least 50% of the wild-type reference polypeptide's activity according to the assays described herein. A functional fragment can comprise conservative substitutions of the sequences disclosed herein.

In some embodiments of any of the aspects, the polypeptide described herein can be a variant of a sequence described herein. In some embodiments of any of the aspects, the variant is a conservatively modified variant. Conservative substitution variants can be obtained by mutations of native nucleotide sequences, for example. A “variant,” as referred to herein, is a polypeptide substantially homologous to a native or reference polypeptide, but which has an amino acid sequence different from that of the native or reference polypeptide because of one or a plurality of deletions, insertions or substitutions. Variant polypeptide-encoding DNA sequences encompass sequences that comprise one or more additions, deletions, or substitutions of nucleotides when compared to a native or reference DNA sequence, but that encode a variant protein or fragment thereof that retains activity. A wide variety of PCR-based site-specific mutagenesis approaches are known in the art and can be applied by the ordinarily skilled artisan.

A variant amino acid or DNA sequence can be at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or more identical to a native or reference sequence. The degree of homology (percent identity) between a native and a mutant sequence can be determined, for example, by comparing the two sequences using freely available computer programs commonly employed for this purpose on the world wide web (e.g., BLASTp or BLASTn with default settings).
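Once two sequences have been aligned (e.g., by BLAST), percent identity is essentially the fraction of alignment columns holding identical residues. The Python sketch below computes that from an already gapped alignment; it does not perform the alignment itself, and its convention (columns gapped in both sequences are skipped, a gap opposite a residue counts as a mismatch) is an assumption that may differ in detail from how a given program reports identity. The sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns with identical residues.

    Columns where both sequences have a gap are ignored; a gap opposite a residue
    counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))     # 90.0
print(round(percent_identity("MKT-LV", "MKTALV"), 1))  # 83.3
```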

Alterations of the native amino acid sequence can be accomplished by any of a number of techniques known to one of skill in the art. Mutations can be introduced, for example, at particular loci by synthesizing oligonucleotides containing a mutant sequence, flanked by restriction sites enabling ligation to fragments of the native sequence. Following ligation, the resulting reconstructed sequence encodes an analog having the desired amino acid insertion, substitution, or deletion. Alternatively, oligonucleotide-directed site-specific mutagenesis procedures can be employed to provide an altered nucleotide sequence having particular codons altered according to the substitution, deletion, or insertion required. Techniques for making such alterations are very well established and include, for example, those disclosed by Walder et al. (Gene 42:133, 1986); Bauer et al. (Gene 37:73, 1985); Craik (BioTechniques, January 1985, 12-19); Smith et al. (Genetic Engineering: Principles and Methods, Plenum Press, 1981); and U.S. Pat. Nos. 4,518,584 and 4,737,462, which are herein incorporated by reference in their entireties. Any cysteine residue not involved in maintaining the proper conformation of the polypeptide also can be substituted, generally with serine, to improve the oxidative stability of the molecule and prevent aberrant crosslinking. Conversely, cysteine bond(s) can be added to the polypeptide to improve its stability or facilitate oligomerization.

As used herein, the term “erythropoiesis” refers to the process that produces red blood cells, i.e., the development from an erythropoietic stem cell to a mature red blood cell. As used herein, the term “erythroid cells” refers to red blood cells.

As used herein, the term “nucleic acid” or “nucleic acid sequence” refers to any molecule, preferably a polymeric molecule, incorporating units of ribonucleic acid, deoxyribonucleic acid or an analog thereof. The nucleic acid can be either single-stranded or double-stranded. A single-stranded nucleic acid can be one nucleic acid strand of a denatured double-stranded DNA. Alternatively, it can be a single-stranded nucleic acid not derived from any double-stranded DNA. In one aspect of any of the embodiments, the nucleic acid can be DNA. In another aspect, the nucleic acid can be RNA. Suitable DNA can include, e.g., genomic DNA or cDNA. Suitable RNA can include, e.g., mRNA.

The term “expression” refers to the cellular processes involved in producing RNA and proteins and as appropriate, secreting proteins, including where applicable, but not limited to, for example, transcription, transcript processing, translation and protein folding, modification and processing. Expression can refer to the transcription and stable accumulation of sense (mRNA) or antisense RNA derived from a nucleic acid fragment or fragments of the invention and/or to the translation of mRNA into a polypeptide.

In some embodiments of any of the aspects, the expression of a biomarker(s), target(s), or gene/polypeptide described herein is/are tissue-specific. In some embodiments of any of the aspects, the expression of a biomarker(s), target(s), or gene/polypeptide described herein is/are global. In some embodiments of any of the aspects, the expression of a biomarker(s), target(s), or gene/polypeptide described herein is systemic.

As used herein, “expression products” include RNA transcribed from a gene, and polypeptides obtained by translation of mRNA transcribed from a gene. The term “gene” means the nucleic acid sequence which is transcribed (DNA) to RNA in vitro or in vivo when operably linked to appropriate regulatory sequences. The gene may or may not include regions preceding and following the coding region, e.g. 5′ untranslated (5′UTR) or “leader” sequences and 3′ UTR or “trailer” sequences, as well as intervening sequences (introns) between individual coding segments (exons).

As used herein, “5′UTR” or “5′ untranslated region” or “5′ leader sequence” refers to the untranslated region of an mRNA located upstream (5′) of the coding region. A 5′UTR typically begins at the transcription start site and ends just before the translation initiation site or start codon (usually AUG in an mRNA, ATG in a DNA sequence) of the coding region. The length of the 5′UTR may be modified by mutation, for example, substitution, deletion, or insertion. The 5′UTR may be further modified by mutating a naturally occurring start codon or translation initiation site such that the codon no longer functions as a start codon and translation may initiate at an alternate initiation site.
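The 5′UTR definition can be illustrated computationally. The following is a minimal Python sketch, not part of the described compositions, assuming an mRNA written 5′ to 3′ as a string of A/C/G/U and a known index for the start codon of the coding region; the function names and the toy transcript are hypothetical. It returns the 5′UTR (transcription start up to, but not including, the start codon) and the positions of any upstream AUG codons (uAUGs) within it.

```python
def five_prime_utr(mrna: str, cds_start: int) -> str:
    """Return the 5'UTR: bases from the transcription start (index 0)
    up to, but not including, the start codon at `cds_start`."""
    if mrna[cds_start:cds_start + 3] != "AUG":
        raise ValueError("cds_start does not point at an AUG start codon")
    return mrna[:cds_start]


def upstream_aug_positions(utr: str) -> list[int]:
    """Return 0-based positions of every AUG triplet in the 5'UTR,
    checking all three reading frames."""
    return [i for i in range(len(utr) - 2) if utr[i:i + 3] == "AUG"]


# Hypothetical toy transcript: an 11-nt leader followed by a short ORF.
mrna = "GAUGCCAUGCC" + "AUGGCUUAA"
utr = five_prime_utr(mrna, cds_start=11)
print(utr)                          # GAUGCCAUGCC
print(upstream_aug_positions(utr))  # [1, 6]
```

Real uAUG analyses typically also weigh the surrounding sequence context of each AUG, which this sketch ignores.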

As used herein, an “expression enhancer”, an “enhancer sequence” or an “enhancer element” refers to a nucleic acid sequence that can enhance expression of a downstream heterologous open reading frame (ORF) to which it is operably linked.

As used herein, the term “post-transcriptional regulation”, refers to the control of gene expression at the RNA level, between the transcription and the translation of the gene.

As used herein, the term “operably linked” refers to sequences that interact either directly or indirectly to carry out an intended function, e.g., the mediation or modulation of expression of a nucleic acid sequence. The interaction of operably linked sequences may, for example, be mediated by proteins that interact with the operably linked sequences. Typically, it refers to the functional relationship of a transcriptional regulatory sequence to a transcribed sequence. For example, a promoter sequence is operably linked to an open reading frame if it stimulates or modulates the transcription of the open reading frame in an appropriate host cell or other expression system. Generally, promoter transcriptional regulatory sequences that are operably linked to a transcribed sequence are physically contiguous to the transcribed sequence, i.e., they are cis-acting. However, some transcriptional regulatory sequences, such as enhancers, need not be physically contiguous or located in close proximity to the open reading frames whose transcription they enhance.

“Marker” in the context of the present invention refers to an expression product, e.g., nucleic acid or polypeptide, which is differentially present in a sample taken from subjects having a disease or condition, e.g., Diamond-Blackfan anemia (DBA), as compared to a comparable sample taken from control subjects (e.g., a healthy subject). The term “biomarker” is used interchangeably with the term “marker.”

In some embodiments of any of the aspects, the methods described herein relate to measuring, detecting, or determining the level of at least one marker. As used herein, the term “detecting” or “measuring” refers to observing a signal from, e.g. a probe, label, or target molecule to indicate the presence of an analyte in a sample. Any method known in the art for detecting a particular label moiety can be used for detection. Exemplary detection methods include, but are not limited to, spectroscopic, fluorescent, photochemical, biochemical, immunochemical, electrical, optical or chemical methods. In some embodiments of any of the aspects, measuring can be a quantitative observation.

In some embodiments of any of the aspects, a polypeptide, nucleic acid, or cell as described herein can be engineered. As used herein, “engineered” refers to the aspect of having been manipulated by the hand of man. For example, a polypeptide is considered to be “engineered” when at least one aspect of the polypeptide, e.g., its sequence, has been manipulated by the hand of man to differ from the aspect as it exists in nature. As is common practice and is understood by those in the art, progeny of an engineered cell are typically still referred to as “engineered” even though the actual manipulation was performed on a prior entity.

As used herein, the term “distal” refers to a nucleic acid sequence upstream of the gene that may contain additional regulatory elements (e.g., distal promoter elements are regulatory DNA sequences that can be many kilobases distant from the gene that they regulate). Each strand of DNA or RNA has a 5′ end and a 3′ end, so named for the carbon positions on the deoxyribose (or ribose) ring. As used herein, the terms “upstream” and “downstream” refer to relative positions in DNA and/or RNA toward the 5′ end and the 3′ end, respectively, consistent with the 5′ to 3′ direction in which RNA transcription takes place.

The term “exogenous” refers to a substance present in a cell other than its native source. The term “exogenous” when used herein can refer to a nucleic acid (e.g., a nucleic acid encoding a polypeptide) or a polypeptide that has been introduced by a process involving the hand of man into a biological system such as a cell or organism in which it is not normally found and one wishes to introduce the nucleic acid or polypeptide into such a cell or organism. Alternatively, “exogenous” can refer to a nucleic acid or a polypeptide that has been introduced by a process involving the hand of man into a biological system such as a cell or organism in which it is found in relatively low amounts and one wishes to increase the amount of the nucleic acid or polypeptide in the cell or organism, e.g., to create ectopic expression or levels. In contrast, the term “endogenous” refers to a substance that is native to the biological system or cell. As used herein, “ectopic” refers to a substance that is found in an unusual location and/or amount. An ectopic substance can be one that is normally found in a given cell, but at a much lower amount and/or at a different time. Ectopic also includes a substance, such as a polypeptide or nucleic acid, that is not naturally found or expressed in a given cell in its natural environment.

In some embodiments of any of the aspects, a nucleic acid described herein, e.g., an inhibitory nucleic acid, is comprised by a vector or is provided or administered as part of a vector. In some of the aspects described herein, a nucleic acid sequence is operably linked to a vector. The term “vector”, as used herein, refers to a nucleic acid construct designed for delivery to a host cell or for transfer between different host cells. As used herein, a vector can be viral or non-viral.

The term “vector” encompasses any genetic element that is capable of replication when associated with the proper control elements and that can transfer gene sequences to cells. A vector can include, but is not limited to, a cloning vector, an expression vector, a plasmid, phage, transposon, cosmid, chromosome, virus, virion, etc. A vector can be a plasmid or lentiviral vector.

As used herein, the term “viral vector” refers to a nucleic acid vector construct that includes at least one element of viral origin and has the capacity to be packaged into a viral vector particle. The viral vector can contain the nucleic acid encoding a polypeptide as described herein in place of non-essential viral genes. The vector and/or particle may be utilized for the purpose of transferring any nucleic acids into cells either in vitro or in vivo. Numerous forms of viral vectors are known in the art.

By “recombinant vector” is meant a vector that includes a heterologous nucleic acid sequence, or “transgene,” that is capable of expression in vivo. It should be understood that the vectors described herein can, in some embodiments of any of the aspects, be combined with other suitable compositions and therapies. In some embodiments of any of the aspects, the vector is episomal. The use of a suitable episomal vector provides a means of maintaining the nucleotide sequence of interest in the subject in high-copy-number extrachromosomal DNA, thereby eliminating potential effects of chromosomal integration. In some embodiments of any of the aspects, the vector is recombinant, e.g., it comprises sequences originating from at least two different sources. In some embodiments of any of the aspects, the vector comprises sequences originating from at least two different species. In some embodiments of any of the aspects, the vector comprises sequences originating from at least two different genes, e.g., it comprises a fusion protein or a nucleic acid encoding an expression product which is operably linked to at least one non-native (e.g., heterologous) genetic control element (e.g., a promoter, suppressor, activator, enhancer, response element, or the like).

As used herein, the term “heterologous” means a nucleic acid sequence or polypeptide that originates from a foreign species, or that is substantially modified from its original form if from the same species.

In some embodiments of any of the aspects, the vector or nucleic acid described herein is codon-optimized, e.g., the native or wild-type sequence of the nucleic acid sequence has been altered or engineered to include alternative codons such that the altered or engineered nucleic acid encodes the same polypeptide expression product as the native/wild-type sequence, but will be transcribed and/or translated at an improved efficiency in a desired expression system. In some embodiments of any of the aspects, the expression system is an organism other than the source of the native/wild-type sequence (or a cell obtained from such organism). In some embodiments of any of the aspects, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a mammal or mammalian cell, e.g., a mouse, a murine cell, or a human cell. In some embodiments of any of the aspects, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a human cell. In some embodiments of any of the aspects, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a yeast or yeast cell. In some embodiments of any of the aspects, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a cell. In some embodiments of any of the aspects, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in an E. coli cell.
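As an illustration of the codon-optimization concept, the following minimal Python sketch substitutes each codon with its most frequent synonymous codon from a usage table and confirms that the encoded polypeptide is unchanged. The usage frequencies are illustrative placeholders, not measured values for any organism, and the function names are hypothetical; practical codon-optimization tools also weigh factors such as GC content, mRNA secondary structure, and repeat avoidance, which this sketch does not model.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an uppercase DNA coding sequence codon by codon ('*' marks a stop)."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str, usage: dict[str, float]) -> str:
    """Swap each codon for the synonymous codon with the highest frequency in
    `usage`; codons whose amino acid has no entry in `usage` are left unchanged."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == amino_acid and c in usage]
        optimized.append(max(synonyms, key=usage.__getitem__) if synonyms else codon)
    return "".join(optimized)


# Illustrative placeholder frequencies only (not a measured codon-usage table).
usage = {"CTG": 0.40, "CTT": 0.13, "GCC": 0.40, "GCT": 0.26, "AAG": 0.57, "AAA": 0.43}

native = "ATGCTTGCTAAATAA"           # encodes M-L-A-K-stop
optimized = codon_optimize(native, usage)
print(optimized)                     # ATGCTGGCCAAGTAA
assert translate(optimized) == translate(native)  # same polypeptide
```

The final assertion captures the defining property stated above: the codons change, but the polypeptide expression product does not.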

As used herein, the term “expression vector” refers to a vector that directs expression of an RNA or polypeptide from sequences linked to transcriptional regulatory sequences on the vector. The sequences expressed will often, but not necessarily, be heterologous to the cell. An expression vector may comprise additional elements, for example, the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in human cells for expression and in a prokaryotic host for cloning and amplification.

The term “regulatory sequence” is intended to include promoters, enhancers and other expression control elements (e.g., polyadenylation signals) that control the transcription or translation of a gene they are operably linked to. Such regulatory sequences are described, for example, in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Examples of regulatory sequences for mammalian host cell expression include viral elements that direct high levels of protein expression in mammalian cells, such as promoters and/or enhancers derived from cytomegalovirus (CMV), Simian Virus 40 (SV40), adenovirus (e.g., the adenovirus major late promoter (AdMLP)), and polyoma. Alternatively, nonviral regulatory sequences may be used, such as the ubiquitin promoter, Elongation factor 1-alpha 1 (eEF1a1) promoter or β-globin promoter. A eukaryotic promoter is a regulatory region of DNA located upstream of a gene that binds transcription factor II D (TFIID) and allows the subsequent coordination of components of the transcription initiation complex, facilitating recruitment of RNA polymerase II and initiation of transcription. Genes with complex promoters are likely to make use of regulatory elements, such as enhancers and silencers, selectively, allowing varying levels of expression as required.

As used herein, the terms “treat,” “treatment,” “treating,” or “amelioration” refer to therapeutic treatments, wherein the object is to reverse, alleviate, ameliorate, inhibit, slow down or stop the progression or severity of a condition associated with a disease or disorder, e.g., Diamond-Blackfan anemia. The term “treating” includes reducing or alleviating at least one adverse effect or symptom of a condition, disease or disorder associated with a condition. Treatment is generally “effective” if one or more symptoms or clinical markers are reduced. Alternatively, treatment is “effective” if the progression of a disease is reduced or halted. That is, “treatment” includes not just the improvement of symptoms or markers, but also a cessation of, or at least slowing of, progress or worsening of symptoms compared to what would be expected in the absence of treatment. Beneficial or desired clinical results include, but are not limited to, alleviation of one or more symptom(s), diminishment of extent of disease, stabilized (i.e., not worsening) state of disease, delay or slowing of disease progression, amelioration or palliation of the disease state, remission (whether partial or total), and/or decreased mortality, whether detectable or undetectable. The term “treatment” of a disease also includes providing relief from the symptoms or side-effects of the disease (including palliative treatment).

As used herein, the term “pharmaceutical composition” refers to the active agent in combination with a pharmaceutically acceptable carrier, e.g., a carrier commonly used in the pharmaceutical industry. The phrase “pharmaceutically acceptable” is employed herein to refer to those compounds, materials, compositions, and/or dosage forms which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of human beings and animals without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio. In some embodiments of any of the aspects, a pharmaceutically acceptable carrier can be a carrier other than water. In some embodiments of any of the aspects, a pharmaceutically acceptable carrier can be a cream, emulsion, gel, liposome, nanoparticle, and/or ointment. In some embodiments of any of the aspects, a pharmaceutically acceptable carrier can be an artificial or engineered carrier, e.g., a carrier in which the active ingredient would not be found to occur in nature.

As used herein, the term “administering,” refers to the placement of a compound as disclosed herein into a subject by a method or route which results in at least partial delivery of the agent at a desired site. Pharmaceutical compositions comprising the compounds disclosed herein can be administered by any appropriate route which results in an effective treatment in the subject. In some embodiments of any of the aspects, administration comprises physical human activity, e.g., an injection, act of ingestion, an act of application, and/or manipulation of a delivery device or machine. Such activity can be performed, e.g., by a medical professional and/or the subject being treated.

As used herein, “contacting” refers to any suitable means for delivering, or exposing, an agent to at least one cell. Exemplary delivery methods include, but are not limited to, direct delivery to cell culture medium, perfusion, injection, or other delivery method well known to one skilled in the art. In some embodiments of any of the aspects, contacting comprises physical human activity, e.g., an injection; an act of dispensing, mixing, and/or decanting; and/or manipulation of a delivery device or machine.

The term “statistically significant” or “significantly” refers to statistical significance and generally means a difference of two standard deviations (2SD) or greater.
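The definition leaves open how the two-standard-deviation difference is computed. A minimal Python sketch, assuming a single measurement is compared against a group of control measurements using the sample standard deviation of the controls (an assumption; other reference distributions are possible), with a hypothetical function name:

```python
from statistics import mean, stdev


def differs_by_2sd(value: float, controls: list[float]) -> bool:
    """True if `value` lies at least two sample standard deviations
    away from the mean of the control measurements."""
    if len(controls) < 2:
        raise ValueError("need at least two control values to estimate a standard deviation")
    return abs(value - mean(controls)) >= 2 * stdev(controls)


controls = [10.0, 12.0, 14.0]          # mean 12.0, sample SD 2.0, so the threshold is 4.0
print(differs_by_2sd(16.0, controls))  # True  (difference 4.0)
print(differs_by_2sd(15.0, controls))  # False (difference 3.0)
```

This mirrors only the 2SD convention stated in the definition; it does not implement a formal hypothesis test.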

Other than in the operating examples, or where otherwise indicated, all numbers expressing quantities of ingredients or reaction conditions used herein should be understood as modified in all instances by the term “about.” The term “about” when used in connection with percentages can mean ±1%.
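Read as a tolerance of one percentage point (an assumption, since “±1%” could also be read as a relative tolerance), “about 80%” covers 79% to 81%. A minimal Python sketch with a hypothetical helper name:

```python
def is_about(measured_pct: float, stated_pct: float, tolerance_pct_points: float = 1.0) -> bool:
    """True if `measured_pct` is within +/- `tolerance_pct_points` of `stated_pct`
    (absolute percentage points, per the assumption stated above)."""
    return abs(measured_pct - stated_pct) <= tolerance_pct_points


print(is_about(79.0, 80.0))  # True
print(is_about(81.0, 80.0))  # True
print(is_about(78.5, 80.0))  # False
```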

As used herein, the term “comprising” means that other elements can also be present in addition to the defined elements presented. The use of “comprising” indicates inclusion rather than limitation.

The term “consisting of” refers to compositions, methods, and respective components thereof as described herein, which are exclusive of any element not recited in that description of the embodiment.

As used herein the term “consisting essentially of” refers to those elements required for a given embodiment. The term permits the presence of additional elements that do not materially affect the basic and novel or functional characteristic(s) of that embodiment of the invention.

As used herein, the term “specific binding” refers to a chemical interaction between two molecules, compounds, cells and/or particles wherein the first entity binds to the second, target entity with greater specificity and affinity than it binds to a third entity which is a non-target. In some embodiments of any of the aspects, specific binding can refer to an affinity of the first entity for the second target entity which is at least 10 times, at least 50 times, at least 100 times, at least 500 times, at least 1000 times or greater than the affinity for the third nontarget entity. A reagent specific for a given target is one that exhibits specific binding for that target under the conditions of the assay being utilized.
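Taking affinity as the reciprocal of the equilibrium dissociation constant (Kd), the fold preference described above is the ratio Kd(non-target)/Kd(target). A minimal Python sketch under that assumption, with hypothetical function names and illustrative Kd values, reports the largest recited threshold (10, 50, 100, 500, or 1000 times) that a given fold preference meets:

```python
from typing import Optional


def fold_preference(kd_target: float, kd_nontarget: float) -> float:
    """Affinity ratio target:non-target, with affinity taken as 1/Kd (same units for both)."""
    return kd_nontarget / kd_target


def highest_threshold_met(fold: float, thresholds=(10, 50, 100, 500, 1000)) -> Optional[int]:
    """Largest recited fold threshold that `fold` meets, or None if below all of them."""
    met = [t for t in thresholds if fold >= t]
    return max(met) if met else None


fold = fold_preference(kd_target=2.0, kd_nontarget=500.0)  # illustrative Kd values in nM
print(fold)                         # 250.0
print(highest_threshold_met(fold))  # 100
```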

The singular terms “a,” “an,” and “the” include plural referents unless context clearly indicates otherwise. Similarly, the word “or” is intended to include “and” unless the context clearly indicates otherwise. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of this disclosure, suitable methods and materials are described below. The abbreviation, “e.g.” is derived from the Latin exempli gratia, and is used herein to indicate a non-limiting example. Thus, the abbreviation “e.g.” is synonymous with the term “for example.”

Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.

Unless otherwise defined herein, scientific and technical terms used in connection with the present application shall have the meanings that are commonly understood by those of ordinary skill in the art to which this disclosure belongs. It should be understood that this invention is not limited to the particular methodology, protocols, and reagents, etc., described herein and as such can vary. The terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, which is defined solely by the claims. Definitions of common terms in immunology and molecular biology can be found in The Merck Manual of Diagnosis and Therapy, 20th Edition, published by Merck Sharp & Dohme Corp., 2018 (ISBN 0911910190, 978-0911910421); Robert S. Porter et al. (eds.), The Encyclopedia of Molecular Cell Biology and Molecular Medicine, published by Blackwell Science Ltd., 1999-2012 (ISBN 9783527600908); Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8); Immunology by Werner Luttmann, published by Elsevier, 2006; Janeway's Immunobiology, Kenneth Murphy, Allan Mowat, Casey Weaver (eds.), W. W. Norton & Company, 2016 (ISBN 0815345054, 978-0815345053); Lewin's Genes XI, published by Jones & Bartlett Publishers, 2014 (ISBN 1449659055); Michael Richard Green and Joseph Sambrook, Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., USA (2012) (ISBN 1936113414); Davis et al., Basic Methods in Molecular Biology, Elsevier Science Publishing, Inc., New York, USA (2012) (ISBN 044460149X); Laboratory Methods in Enzymology: DNA, Jon Lorsch (ed.), Elsevier, 2013 (ISBN 0124199542); Current Protocols in Molecular Biology (CPMB), Frederick M. Ausubel (ed.), John Wiley and Sons, 2014 (ISBN 047150338X, 9780471503385); Current Protocols in Protein Science (CPPS), John E. Coligan (ed.), John Wiley and Sons, Inc., 2005; and Current Protocols in Immunology (CPI), John E. Coligan, Ada M. Kruisbeek, David H. Margulies, Ethan M. Shevach, Warren Strober (eds.), John Wiley and Sons, Inc., 2003 (ISBN 0471142735, 9780471142737), the contents of which are all incorporated by reference herein in their entireties.

Other terms are defined herein within the description of the various aspects of the invention.

All patents and other publications, including literature references, issued patents, published patent applications, and co-pending patent applications, cited throughout this application are expressly incorporated herein by reference for the purpose of describing and disclosing, for example, the methodologies described in such publications that might be used in connection with the technology described herein. These publications are provided solely for their disclosure prior to the filing date of the present application. Nothing in this regard should be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention or for any other reason. All statements as to the date or representations as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents.

The description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. While specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize. For example, while method steps or functions are presented in a given order, alternative embodiments may perform functions in a different order, or functions may be performed substantially concurrently. The teachings of the disclosure provided herein can be applied to other procedures or methods as appropriate. The various embodiments described herein can be combined to provide further embodiments. Aspects of the disclosure can be modified, if necessary, to employ the compositions, functions and concepts of the above references and application to provide yet further embodiments of the disclosure. Moreover, due to biological functional equivalency considerations, some changes can be made in protein structure without affecting the biological or chemical action in kind or amount. These and other changes can be made to the disclosure in light of the detailed description. All such modifications are intended to be included within the scope of the appended claims.

Specific elements of any of the foregoing embodiments can be combined or substituted for elements in other embodiments. Furthermore, while advantages associated with certain embodiments of the disclosure have been described in the context of these embodiments, other embodiments may also exhibit such advantages, and not all embodiments need necessarily exhibit such advantages to fall within the scope of the disclosure.

Some embodiments of the technology described herein can be defined according to any of the following numbered paragraphs:

- 1. A nucleic acid sequence comprising
  - a. at least one heterologous regulatory sequence selected from a hematopoietic enhancer element and a miRNA binding site for an HSC-restricted miRNA; and
  - b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
- 2. The nucleic acid sequence of paragraph 1, comprising at least one hematopoietic enhancer element.
- 3. The nucleic acid sequence of paragraph 2, wherein the enhancer element comprises a sequence of at least 80% homology to a nucleotide sequence that is selected from the group consisting of: SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 38 and/or SEQ ID NO: 39.
- 4. The nucleic acid sequence of paragraph 2, wherein the enhancer element comprises an enhancer element of a gene selected from the group consisting of:
  - Kell metalloendopeptidase (KEL); 5′-aminolevulinate synthase 2 (ALAS2); and glycophorin A (GYPA).
- 5. The nucleic acid sequence of any of paragraphs 1-4, comprising at least one miRNA binding site for at least one HSC-restricted miRNA.
- 6. The nucleic acid sequence of any of paragraphs 1-5, wherein the at least one miRNA binding site for at least one HSC-restricted miRNA is selected from the group consisting of miR binding sites for miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e.
- 7. The nucleic acid sequence of any of paragraphs 1-6, comprising at least one hematopoietic enhancer element and at least one miRNA binding site for at least one HSC-restricted miRNA.
- 8. The nucleic acid sequence of any of paragraphs 1-7, further comprising:
  - a. a heterologous 5′ UTR comprising:
    - i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1;
    - ii. a sequence of at least 20 nucleotides; and/or
    - iii. 1-25 upstream AUG codons (uAUGs); and/or
  - b. a hematopoietic enhancer minigene.
- 9. A nucleic acid sequence comprising
  - a. a 5′ UTR comprising:
    - i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1;
    - ii. a sequence of at least 20 nucleotides; and/or
    - iii. 1-25 upstream AUG codons (uAUGs); and
  - b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
- 10. The nucleic acid sequence of any of paragraphs 1-9, wherein the 5′UTR comprises a 5′UTR of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1), LIM Domain Only 2 (LMO2), and ETS Variant 6 (ETV6).
- 11. The nucleic acid sequence of any of paragraphs 1-10, further comprising at least one hematopoietic enhancer element, miRNA binding site for a HSC restricted miRNA, and/or a hematopoietic enhancer minigene (G1HEM).
- 12. A nucleic acid sequence comprising
  - a. a hematopoietic enhancer minigene (G1HEM); and
  - b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
- 13. The nucleic acid sequence of paragraph 12, wherein the hematopoietic enhancer minigene (mG1HEM) comprises a sequence of at least 80% homology to a nucleotide sequence of: SEQ ID NO: 13.
- 14. The nucleic acid sequence of any of paragraphs 12-13, further comprising:
  - a. a 5′ UTR comprising:
    - i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1;
    - ii. a sequence of at least 20 nucleotides; and/or
    - iii. 1-25 upstream AUG codons (uAUGs); and/or
  - b. at least one hematopoietic enhancer element and/or at least one miRNA binding site for an HSC-restricted miRNA.
- 15. The nucleic acid sequence of paragraph 14, wherein the 5′ UTR sequence of a hematopoietic transcription factor other than GATA1 is a 5′UTR sequence of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1), LIM Domain Only 2 (LMO2), and ETS Variant 6 (ETV6).
- 16. The nucleic acid sequence of any of paragraphs 1-15, wherein the binding site for at least one HSC restricted miRNA comprises a sequence selected from SEQ ID NOs: 31-37 and 43-55.
- 17. The nucleic acid sequence of any of paragraphs 1-16, wherein the hematopoietic enhancer element comprises a sequence with at least 80% sequence identity to a sequence selected from SEQ ID NOs: 10, 11, 12, 38, and 39.
- 18. The nucleic acid sequence of any of paragraphs 1-17, wherein the 5′ UTR sequence comprises a sequence with at least 80% sequence identity to a sequence selected from SEQ ID NOs: 14, 15, and 16.
- 19. The nucleic acid sequence of any of paragraphs 1-18, wherein the sequence comprises a promoter operably linked to the elements of a. and b.
- 20. The nucleic acid sequence of paragraph 19, wherein the promoter is not a GATA1 promoter.
- 21. The nucleic acid sequence of paragraph 20, wherein the promoter comprises a promoter sequence of Elongation factor 1-alpha 1 (eEF1a1).
- 22. The nucleic acid sequence of any of paragraphs 1-21, wherein the sequence encoding a GATA-binding factor 1 (GATA1) polypeptide comprises at least 60% sequence identity to a nucleotide sequence encoding a human GATA1 polypeptide.
- 23. The nucleic acid sequence of any of paragraphs 1-22, further comprising:
  - a posttranscriptional regulatory element operably linked to the sequence encoding the GATA1 polypeptide.
- 24. The nucleic acid sequence of paragraph 23, wherein the posttranscriptional regulatory element comprises a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE).
- 25. The nucleic acid sequence of any of paragraphs 1-24, further comprising an internal ribosome entry site.
- 26. The nucleic acid sequence of paragraph 25, wherein the internal ribosome entry site is operably linked to a marker gene and wherein the marker gene encodes an optically visible protein or an enzyme.
- 27. The nucleic acid sequence of any of paragraphs 1-26, wherein the sequence comprises a sequence selected from SEQ ID NOs: 8, 9, 61, and 62.
- 28. The nucleic acid sequence of any of paragraphs 1-27, wherein the nucleic acid sequence is a vector.
- 29. The nucleic acid sequence of paragraph 28, wherein the vector is a plasmid, or an adenoviral, lentiviral or retroviral vector.
- 30. A lentiviral particle comprising the nucleic acid sequence of any of paragraphs 1-29.
- 31. A composition comprising a nucleic acid sequence or particle of any of paragraphs 1-30 and a pharmaceutically acceptable carrier.
- 32. A method of treating Diamond-Blackfan Anemia in a subject in need thereof, the method comprising administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition of any of paragraphs 1-31 to the subject.
- 33. A method of restoring early erythroid progenitor cell-specific GATA1 expression, the method comprising contacting a population of cells comprising early erythroid progenitor cells with a nucleic acid sequence, particle, or composition of any of paragraphs 1-31.
- 34. The method of paragraph 33, wherein the early erythroid progenitor cells comprise a DBA-associated gene mutation.
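Several of the paragraphs above recite a minimum percent sequence identity (e.g., at least 80% or at least 60%). A minimal Python sketch of the arithmetic, assuming the two sequences have already been aligned to the same length (for example by a global alignment tool, which this sketch does not implement); here a column where only one sequence has a gap ('-') counts as a mismatch and columns where both have gaps are ignored. This is one common convention, not necessarily the one intended by the paragraphs, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Columns where both sequences have a gap ('-') are ignored; a gap
    opposite a residue counts as a mismatch."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity(aligned_a: str, aligned_b: str, minimum_pct: float = 80.0) -> bool:
    """True if the alignment meets a recited minimum percent identity."""
    return percent_identity(aligned_a, aligned_b) >= minimum_pct


print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))  # 80.0 (8 of 10 columns match)
print(meets_identity("ACGTACGTAC", "ACGTTCGTAA"))    # True
```

Different alignment programs and gap conventions can yield different identity values for the same pair of sequences, which is why the alignment step is left as an explicit assumption here.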

Some embodiments of the technology described herein can be defined according to any of the following numbered paragraphs:

- 1. A nucleic acid sequence comprising
  - a. at least one heterologous regulatory sequence selected from a hematopoietic enhancer element and a miRNA binding site for an HSC-restricted miRNA; and
  - b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
- 2. The nucleic acid sequence of paragraph 1, comprising at least one hematopoietic enhancer element.
- 3. The nucleic acid sequence of paragraph 2, wherein the enhancer element comprises a sequence of at least 80% homology to a nucleotide sequence that is selected from the group consisting of: SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 38 and/or SEQ ID NO: 39.
- 4. The nucleic acid sequence of paragraph 2, wherein the enhancer element comprises an enhancer element of a gene selected from the group consisting of:
  - Kell metalloendopeptidase (KEL); 5′-aminolevulinate synthase 2 (ALAS2); and glycophorin A (GYPA).
- 5. The nucleic acid sequence of any of paragraphs 1-4, comprising at least one miRNA binding site for at least one HSC-restricted miRNA.
- 6. The nucleic acid sequence of any of paragraphs 1-5, wherein the at least one miRNA binding site for at least one HSC-restricted miRNA is selected from the group consisting of miR binding sites for miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e.
- 7. The nucleic acid sequence of any of paragraphs 1-6, comprising at least one hematopoietic enhancer element and at least one miRNA binding site for at least one HSC-restricted miRNA.
- 8. The nucleic acid sequence of any of paragraphs 1-7, further comprising:
  - a. a heterologous 5′ UTR comprising:
    - i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1;
    - ii. a sequence of at least 20 nucleotides; and/or
    - iii. 1-25 upstream AUG codons (uAUGs); and/or
  - b. a hematopoietic enhancer minigene.
- 9. A nucleic acid sequence comprising
  - a. a 5′ UTR comprising:
    - i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1;
    - ii. a sequence of at least 20 nucleotides; and/or
    - iii. 1-25 upstream AUG codons (uAUGs); and
  - b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
- 10. The nucleic acid sequence of any of paragraphs 1-9, wherein the 5′UTR comprises a 5′UTR of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1), LIM Domain Only 2 (LMO2), and ETS Variant 6 (ETV6).
- 11. The nucleic acid sequence of any of paragraphs 1-10, further comprising at least one hematopoietic enhancer element, miRNA binding site for an HSC-restricted miRNA, and/or a hematopoietic enhancer minigene (G1HEM).
- 12. A nucleic acid sequence comprising
  - a. a hematopoietic enhancer minigene (G1HEM); and
  - b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
- 13. The nucleic acid sequence of paragraph 12, wherein the hematopoietic enhancer minigene (mG1HEM) comprises a sequence of at least 80% homology to the nucleotide sequence of SEQ ID NO: 13.
- 14. The nucleic acid sequence of any of paragraphs 12-13, further comprising:
  - a. a 5′ UTR comprising:
    - i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1;
    - ii. a sequence of at least 20 nucleotides; and/or
    - iii. 1-25 upstream AUG codons (uAUGs); and/or
  - b. at least one hematopoietic enhancer element; and/or
  - c. at least one miRNA binding site for an HSC-restricted miRNA.
- 15. The nucleic acid sequence of paragraph 14, wherein the 5′ UTR sequence of a hematopoietic transcription factor other than GATA1 is a 5′ UTR sequence of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1), LIM Domain Only 2 (LMO2), and ETS Variant 6 (ETV6).
- 16. The nucleic acid sequence of any of paragraphs 1-15, wherein the binding site for at least one HSC restricted miRNA comprises a sequence selected from SEQ ID NOs: 31-37 and 43-55.
- 17. The nucleic acid sequence of any of paragraphs 1-16, wherein the hematopoietic enhancer element comprises a sequence with at least 80% sequence identity to a sequence selected from SEQ ID NOs: 10, 11, 12, 38, and 39.
- 18. The nucleic acid sequence of any of paragraphs 1-17, wherein the 5′ UTR sequence comprises a sequence with at least 80% sequence identity to a sequence selected from SEQ ID NOs: 14, 15, and 16.
- 19. The nucleic acid sequence of any of paragraphs 1-18, wherein the sequence comprises a promoter operably linked to the elements of a. and b.
- 20. The nucleic acid sequence of paragraph 19, wherein the promoter is not a GATA1 promoter.
- 21. The nucleic acid sequence of paragraph 20, wherein the promoter comprises a promoter sequence of Elongation factor 1-alpha 1 (eEF1a1).
- 22. The nucleic acid sequence of any of paragraphs 1-21, wherein the sequence encoding a GATA-binding factor 1 (GATA1) polypeptide comprises at least 60% sequence identity to a nucleotide sequence encoding a human GATA1 polypeptide.
- 23. The nucleic acid sequence of any of paragraphs 1-22, further comprising:
  - a posttranscriptional regulatory element operably linked to the sequence encoding the GATA1 polypeptide.
- 24. The nucleic acid sequence of paragraph 23, wherein the posttranscriptional regulatory element comprises a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE).
- 25. The nucleic acid sequence of any of paragraphs 1-24, further comprising an internal ribosome entry site.
- 26. The nucleic acid sequence of paragraph 25, wherein the internal ribosome entry site is operably linked to a marker gene and wherein the marker gene encodes an optically visible protein or an enzyme.
- 27. The nucleic acid sequence of any of paragraphs 1-26, wherein the sequence comprises a sequence selected from SEQ ID NOs: 8, 9, 61, and 62.
- 28. The nucleic acid sequence of any of paragraphs 1-27, wherein the nucleic acid sequence is a vector.
- 29. The nucleic acid sequence of paragraph 28, wherein the vector is a plasmid, or an adenoviral, lentiviral or retroviral vector.
- 30. A lentiviral particle comprising the nucleic acid sequence of any of paragraphs 1-29.
- 31. A composition comprising a nucleic acid sequence or particle of any of paragraphs 1-30 and a pharmaceutically acceptable carrier.
- 32. A method of treating Diamond-Blackfan Anemia in a subject in need thereof, the method comprising administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition of any of paragraphs 1-31 to the subject.
- 33. A method of restoring early erythroid progenitor cell-specific GATA1 expression, the method comprising contacting a population of cells comprising early erythroid progenitor cells with a nucleic acid sequence, particle, or composition of any of paragraphs 1-31.
- 34. The method of paragraph 33, wherein the early erythroid progenitor cells comprise a DBA-associated gene mutation.
- 35. A nucleic acid sequence, particle, or composition of any of paragraphs 1-31 for use in the treatment of Diamond-Blackfan Anemia in a subject in need thereof.
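Several of the paragraphs above (e.g., paragraphs 3, 13, 17, 18, and 22) recite a minimum percent homology or sequence identity to a reference SEQ ID NO. The paragraphs do not specify how identity is computed, so the following is only a minimal sketch under stated assumptions: a Needleman-Wunsch global alignment with hypothetical unit scores (match +1, mismatch -1, linear gap -2), with identity taken as identical aligned positions divided by the reference length. Other aligners, scoring schemes, or denominators can give different percentages.

```python
from typing import List, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty (assumed scores)."""
    n, m = len(a), len(b)
    # s[i][j] holds the best score for aligning a[:i] with b[:j].
    s: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + sub(i, j), s[i - 1][j] + gap, s[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned positions / reference length (assumed denominator)."""
    qa, ra = global_align(query.upper(), reference.upper())
    identical = sum(1 for x, y in zip(qa, ra) if x == y and x != "-")
    return 100.0 * identical / len(reference)


def meets_threshold(query: str, reference: str, threshold: float = 80.0) -> bool:
    """True if the query reaches the stated identity threshold (e.g., 80%)."""
    return percent_identity(query, reference) >= threshold
```

For example, a 10-nucleotide query that differs from its reference at a single position scores 90% under these assumptions and therefore clears an 80% threshold.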

The technology described herein is further illustrated by the following examples which in no way should be construed as being further limiting.

EXAMPLES
Example 1: Methods for the Treatment of DBA Using GATA1 Gene Therapy

Diamond-Blackfan anemia (DBA), also known as congenital hypoplastic anemia, was first described in 1938 and is characterized by a paucity of red blood cell progenitors and precursors in the bone marrow of patients, while all other aspects of hematopoiesis proceed in an ostensibly normal manner (1, 2). DBA is estimated to occur in approximately 1 in 100,000 to 200,000 live births (3), although this may be an underestimate given that a number of individuals have been found to have variable expressivity or may have been misdiagnosed. For many decades, the diagnosis of DBA was made primarily on clinical criteria, assisted by the biomarker erythrocyte adenosine deaminase, which is elevated in ~80% of patients with DBA (3).

After an extensive mapping effort that spanned much of the 1990s, the first gene mutated in DBA was discovered in 1999 through the identification of an individual with a translocation involving chromosome 19 (4). This gene was the ubiquitously expressed ribosomal protein (RP) gene RPS19, and, surprisingly, heterozygous loss-of-function mutations in it were identified in ~20-25% of DBA cases. This immediately raised speculation about the underlying mechanisms, in particular whether a ribosomal or non-ribosomal role of RPS19 was involved. A number of subsequent studies demonstrated that impaired ribosome biogenesis resulting from RP haploinsufficiency appeared to be a major contributor to the phenotype, suggesting a role for ribosome activity or levels (5). However, the underlying basis for the erythroid specificity of this disorder remained a mystery.

Subsequent studies in cohorts of patients with DBA, employing targeted sequencing, assessment of copy number variation using single nucleotide polymorphism microarrays or comparative genomic hybridization, or whole exome sequencing, have revealed a total of 19 distinct RP genes harboring heterozygous loss-of-function mutations that result in RP haploinsufficiency (6, 7). Collectively, these mutations explain ~60-80% of DBA cases. These 19 RP genes are distributed heterogeneously throughout the ribosome and involve both the large (60S) and small (40S) ribosomal subunits, with no clustering of mutations in a particular structural region of the ribosome (8). More recently, through whole exome sequencing of a cohort of over 450 patients with a diagnosis of DBA, the inventors identified an additional 7 mutated RP genes, bringing the total number of RP genes implicated in this disorder to 26 (nearly ⅓ of the RPs composing the ribosome), which collectively explain the underlying basis of ~80% of DBA cases (9).

Despite these advances in understanding the genetic causes of most DBA cases, two major limitations have remained. First, despite the robust finding of heterozygous RP loss-of-function mutations in the majority of DBA cases, how these mutations lead to the erythroid-specific hematopoietic defects in DBA has remained an enigma (10). Second, very limited therapies are currently available to treat patients with DBA (3, 10). Some patients respond to corticosteroids, but significant side effects often limit the long-term effectiveness of this therapy in the majority of patients. Many patients require chronic red blood cell transfusions, which can be associated with significant and difficult-to-control iron overload. Finally, some patients can be cured by allogeneic bone marrow transplantation, but this is generally limited to those with matched sibling donors, given the poor outcomes noted with unrelated donor transplantation in this condition (11). Only a limited number of candidate experimental therapeutics have been developed to date, and many have unfortunately not shown robust efficacy in later-stage pre-clinical or clinical studies (12). Therefore, there is a significant need for new and improved therapies that could be effective in the majority of patients with DBA, a condition caused by a large number of distinct mutations primarily affecting RP genes.

With these limitations in mind, the inventors reasoned that further study of DBA using human genetics coupled with mechanistic follow-up could provide further insight into this disorder and help identify improved therapeutic strategies. The inventors subsequently identified the first non-RP gene mutation in this disorder: several patients with a diagnosis of DBA had mutations that impaired production of the long protein form of the hematopoietic master transcription factor GATA1 (13). Several other patients with similar types of mutations were subsequently reported (14-16). While these findings demonstrated that GATA1 mutations could cause a phenotype resembling DBA, whether there was a molecular connection between the more commonly observed RP gene mutations and the GATA1 mutations remained unclear.

The inventors tested whether RP haploinsufficiency, the most common cause of DBA, could alter GATA1 translation. Using both RP suppression in primary human hematopoietic stem and progenitor cells (HSPCs) and DBA patient samples, the inventors demonstrated that GATA1 mRNA translation was impaired in the setting of RP haploinsufficiency, while the translation of a variety of other erythroid-important transcripts was not affected (15). Moreover, the inventors demonstrated that increasing GATA1 protein levels through lentiviral expression was sufficient to rescue the erythroid differentiation defect in mononuclear cells from DBA patients with various RP gene mutations, to the level seen in normal individuals. These results led to a model of DBA pathogenesis, illustrated in FIG. 1.

However, a number of questions have remained. (1) It was unclear exactly how the ribosome was being altered in the setting of RP haploinsufficiency. It was possible that the ribosome may be altered in composition in this case, although the finding of 28 distinct RP mutations in this condition made this seem less likely. An alternative, although not mutually exclusive, possibility was that ribosome levels were reduced in the setting of RP haploinsufficiency. (2) The range of transcripts beyond those that were specifically tested in initial studies and the features common to those transcripts remained unclear. (3) The stage of hematopoiesis at which these defects emerged was also unclear.

The inventors then employed ribosome profiling (19, 20) to determine at a genome-wide level which transcripts are affected by the reduction in ribosome levels caused by DBA-associated molecular lesions. The inventors obtained high-quality ribosome profiling data from RP-haploinsufficient HSPCs undergoing erythroid lineage commitment, the stage at which the functional defects in erythroid differentiation arise. Importantly, analysis of these data showed that a limited set of ~500 transcripts display the most significant changes in translation efficiency in the setting of RP haploinsufficiency (similar for RPS19 or RPL5 suppression). Consistent with the inventors' earlier targeted findings from polysome analysis, GATA1 mRNA was among the most downregulated transcripts in terms of translation efficiency. Interestingly, most of the other translationally downregulated transcripts encoded components of the ribosome or ribosome-associated factors, including all RPs and a variety of translation initiation and elongation factors. Upon further analysis, using cap analysis of gene expression to define the 5′ untranslated regions (UTRs) of these transcripts, the inventors showed that transcripts that were most highly translated at baseline and had short, unstructured 5′ UTRs tended to be the ones downregulated at the translational level in the setting of RP haploinsufficiency. Interestingly, among all hematopoietic master transcription factors, only GATA1 has a short 5′ UTR, and the inventors showed that replacing this 5′ UTR with those of other master regulators (such as RUNX1, LMO2, or ETV6) altered the translation of this key hematopoietic transcription factor.
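The translation efficiency comparison described above can be illustrated with a minimal sketch. Here translation efficiency (TE) is assumed to be the ratio of ribosome-footprint abundance to mRNA abundance, each normalized to counts per million (CPM), with a small pseudocount; the change in TE is the log2 TE under RP suppression minus the log2 TE in control cells. These normalization choices are assumptions for illustration; dedicated ribosome profiling pipelines typically add replicate-aware statistical testing. Gene names and counts in the test are hypothetical placeholders, not data from this study.

```python
import math
from typing import Dict


def cpm(counts: Dict[str, float]) -> Dict[str, float]:
    """Normalize raw counts to counts per million."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def log2_te(ribo: Dict[str, float], rna: Dict[str, float], pseudo: float = 0.5) -> Dict[str, float]:
    """log2(footprint CPM / mRNA CPM) per gene; the pseudocount avoids division by zero."""
    r, m = cpm(ribo), cpm(rna)
    return {g: math.log2((r[g] + pseudo) / (m[g] + pseudo)) for g in r if g in m}


def delta_te(ctrl_ribo: Dict[str, float], ctrl_rna: Dict[str, float],
             kd_ribo: Dict[str, float], kd_rna: Dict[str, float]) -> Dict[str, float]:
    """log2 TE(RP knockdown) - log2 TE(control); negative values mean translational downregulation."""
    te_ctrl = log2_te(ctrl_ribo, ctrl_rna)
    te_kd = log2_te(kd_ribo, kd_rna)
    return {g: te_kd[g] - te_ctrl[g] for g in te_ctrl if g in te_kd}


def most_downregulated(changes: Dict[str, float], n: int) -> list:
    """Gene names ranked from the most negative TE change."""
    return [g for g, _ in sorted(changes.items(), key=lambda kv: kv[1])[:n]]
```

In practice, the ranked list could then be intersected with 5′ UTR length or structure annotations to ask whether short, unstructured 5′ UTRs are enriched among translationally downregulated transcripts.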

Finally, the inventors demonstrated that this occurs in vivo in DBA patients and determined the stage of hematopoiesis at which these lesions emerge. Using both immunohistochemistry for GATA1 on bone marrow biopsy specimens and intracellular flow cytometry, the inventors showed that GATA1 levels were reduced in hematopoietic progenitors from DBA patients. Importantly, GATA1 levels were reduced even upon its earliest expression in very primitive CD34+CD38− HSPCs from DBA patient bone marrow, as compared to control samples (FIG. 3). In addition, GATA1 levels remained lower in DBA patient cells even as GATA1 levels increased in more mature CD34+CD38+ HSPCs. These results are consistent with the emerging model that hematopoietic lineage commitment occurs at the most primitive stages of stem and progenitor cells, and they demonstrate the relevance of these findings to human disease (21-23).
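The compartment-level comparison of intracellular GATA1 described above can be sketched as follows. This is a simplified illustration, not the inventors' gating strategy: each event is assumed to be a (CD34, CD38, GATA1) fluorescence tuple, CD34 and CD38 positivity is called with fixed, hypothetical thresholds rather than gates set on controls, and GATA1 is summarized as the median intensity per compartment. A patient/control ratio below 1 indicates reduced GATA1.

```python
from statistics import median
from typing import Dict, List, Tuple

Event = Tuple[float, float, float]  # (CD34, CD38, GATA1) intensities; hypothetical layout


def gata1_by_compartment(events: List[Event], cd34_cut: float, cd38_cut: float) -> Dict[str, float]:
    """Median GATA1 intensity in CD34+CD38- and CD34+CD38+ events."""
    groups: Dict[str, List[float]] = {"CD34+CD38-": [], "CD34+CD38+": []}
    for cd34, cd38, gata1 in events:
        if cd34 <= cd34_cut:
            continue  # CD34- events fall outside both HSPC compartments
        key = "CD34+CD38+" if cd38 > cd38_cut else "CD34+CD38-"
        groups[key].append(gata1)
    # Only report compartments that contain at least one event.
    return {k: median(v) for k, v in groups.items() if v}


def ratio_to_control(patient: Dict[str, float], control: Dict[str, float]) -> Dict[str, float]:
    """Patient/control median GATA1 per compartment; values below 1 mean reduced GATA1."""
    return {k: patient[k] / control[k] for k in patient if k in control}
```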

All of these mechanistic findings have important implications for understanding DBA pathogenesis. However, the challenge remained of how to develop better therapies for DBA. As discussed above, the only currently available therapies are chronic corticosteroid use, regular blood transfusions, or allogeneic hematopoietic stem cell transplantation (10). An alternative and valuable approach would be autologous hematopoietic stem cell transplantation coupled to gene therapy (24). Indeed, there have been attempts to develop lentiviral vectors that allow increased production of RPS19 (25). However, it is difficult to envision how this approach could be useful for the majority of patients, given the heterogeneous RP gene mutations present in DBA patients (28 mutations have been identified to date). Given the inventors' findings that impaired GATA1 protein production underlies all DBA cases and that increasing GATA1 protein is sufficient to rescue the erythroid differentiation defects present in these patients, GATA1 gene therapy is a valuable approach for achieving curative treatment in DBA patients. The major limitations, discussed in detail below, are that expression of GATA1 in the hematopoietic stem cell (HSC) compartment causes the stem cells to differentiate precociously, and that GATA1 expression during terminal erythropoiesis must be regulated.

While GATA1 protein levels are suppressed in HSPCs from DBA patients and increasing GATA1 expression can ameliorate the erythroid lineage commitment defect characteristic of DBA, dysregulated GATA1 expression can be detrimental: HSCs can undergo precocious differentiation with exogenous GATA1 expression, and effective terminal erythropoiesis requires regulated GATA1 levels.

Based on the inventors' mechanistic studies, GATA1 gene therapy for the treatment of DBA is a compelling and promising approach. The inventors have demonstrated that increasing GATA1 expression can rescue the erythroid differentiation defect in primary HSPCs from patients with DBA harboring a variety of molecular lesions in various RP genes. The inventors have also reproducibly obtained the same results across a variety of DBA-associated molecular lesions modeled in primary HSPCs through RNA interference-based approaches (15, 17). In these cases, increased GATA1 expression was achieved using lentiviruses in which the GATA1 cDNA, containing altered 5′ and 3′ UTR elements, was under the transcriptional control of a lentiviral LTR that drives high-level, ubiquitous expression. For therapeutic purposes, such expression must be regulated and tuned at various stages of the differentiation process, and GATA1 levels must be controlled to avoid perturbing hematopoiesis.

Prior studies have shown that exogenous unregulated expression of Gata1 in mouse HSCs can promote precocious differentiation toward the megakaryocytic and erythroid lineages, while preventing the maintenance of self-renewing HSCs capable of long-term engraftment (26, 27). Indeed, exogenous Gata1 expression can reprogram other hematopoietic lineages to take on an erythroid fate (26). However, regulated expression of a Gata1 transgene can allow long-term maintenance of HSCs (27). To extend these findings to a human context, the inventors utilized a serum-free culture system that allows for the maintenance of long-term engrafting human HSCs (capable of engrafting immunodeficient xenograft recipients) over the course of a few days in culture. In this setting, the introduction of exogenous GATA1 expression regulated by a lentiviral LTR element caused precocious differentiation of these cells, while the control cells maintained their phenotype and functional ability to give rise to long-term hematopoietic grafts. These findings extend the previously published results in mouse models (26). Collectively, these results emphasize the need to prevent GATA1 expression in early HSCs to allow for effective engraftment, as would be required for a curative lentiviral gene therapy approach. In addition, GATA1 levels must not be excessively elevated during terminal erythroid differentiation, since this can impair effective erythropoiesis (28). To address these issues, the inventors undertook a series of studies to identify key regulatory elements that permit regulated expression of GATA1 from lentiviral vectors.

To achieve regulated expression of GATA1 for effective gene therapy, the inventors have been employing two complementary and synergistic approaches to ensure that there will not be potentially detrimental ectopic expression, while also regulating levels of GATA1 during the course of erythroid differentiation. It is contemplated herein that either approach could be used alone, or that they can be combined.

The first regulatory element that is being used in the gene therapy vectors is a GATA1 hematopoietic enhancer minigene (G1HEM) that concatenates 4 distinct regulatory elements to achieve faithful expression of GATA1 during hematopoiesis (27, 29). These elements include a −3 kb hematopoietic enhancer, an upstream double GATA motif, an upstream CACCC box, and a segment of the first intron of GATA1. Indeed, the 979 nucleotides present in this minigene are sufficient to drive Gata1 cDNA expression appropriately to rescue a Gata1 knockout mouse and allow for ostensibly normal erythropoiesis.

To develop clinically usable GATA1 expression vectors incorporating the first transcriptional regulatory element discussed above, the inventors utilize safe, well-designed vectors that have already proven effective in human clinical studies. One such vector is pRRL.PPT.EFS, which has demonstrated controlled and well-regulated exogenous cDNA expression in a variety of human hematopoietic cell types and has been utilized in clinical settings (30). The G1HEM can be incorporated upstream of a GATA1 cDNA driven either by the endogenous promoter or, as an alternative and complementary approach, by a modified (shortened) ubiquitous EF1α promoter (EFS). Importantly, as discussed above, the Gata1 regulatory elements contained in the mouse G1HEM are capable of driving regulated expression of marker genes solely in the cell types where Gata1 is normally expressed, and they are sufficient to allow appropriate rescue of knockout mice using Gata1 cDNA (27, 31).
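The cassette architecture described in this section can be summarized as an ordered list of elements. The sketch below is illustrative only: element names are labels without sequences, and the 5′ to 3′ order (G1HEM, promoter, GATA1 cDNA, P2A, Venus, WPRE, optional tandem miR-126 target sites) and the pairing of two promoter configurations with mouse or human G1HEM are assumptions inferred from the prose, not the exact constructs of FIG. 6.

```python
from itertools import product
from typing import List

# Hypothetical labels; no sequences are implied.
PROMOTERS = ("endogenous GATA1 promoter", "EFS")
SPECIES = ("mouse", "human")


def cassette(promoter: str, species: str, mir126_sites: int = 0) -> List[str]:
    """One cassette layout, listed 5' to 3'."""
    layout = [f"G1HEM ({species})", promoter, "GATA1 cDNA", "P2A", "Venus", "WPRE"]
    # Optional tandem miR-126 target sites placed 3' of the WPRE.
    layout += ["miR-126T"] * mir126_sites
    return layout


def all_designs(mir126_sites: int = 0) -> List[List[str]]:
    """Every promoter x species combination (2 x 2 = 4 layouts)."""
    return [cassette(p, s, mir126_sites) for p, s in product(PROMOTERS, SPECIES)]


def is_well_ordered(layout: List[str]) -> bool:
    """G1HEM upstream of the cDNA, and any miR-126T sites downstream of the WPRE."""
    i_hem = next(i for i, e in enumerate(layout) if e.startswith("G1HEM"))
    i_cdna = layout.index("GATA1 cDNA")
    i_wpre = layout.index("WPRE")
    mir_positions = [i for i, e in enumerate(layout) if e == "miR-126T"]
    return i_hem < i_cdna < i_wpre and all(i > i_wpre for i in mir_positions)


if __name__ == "__main__":
    for design in all_designs(mir126_sites=3):
        print(" - ".join(design))
```

Keeping the layout as data makes it easy to check ordering constraints (for example, that miRNA target sites land in the transcribed 3′ region after the posttranscriptional element) before any cloning is planned.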

The inventors have produced a total of 4 different vectors (the 2 designs shown in FIG. 6, each made with both mouse and human regulatory elements). A self-cleaving 2A peptide (P2A) element followed by the Venus fluorescent marker was incorporated after the GATA1 cDNA so that cells expressing GATA1 can be readily tracked in real time. Flow cytometry assays were used to quantify the extent of Venus expression in the various hematopoietic cell types tested. The extent of the increase in GATA1 expression in cell types that normally express this transcription factor can be assessed by sorting particular populations. Finally, using this primary cell culture approach, the inventors can assess phenotypic changes that occur with GATA1 expression (32-34). This approach allows the inventors to simultaneously determine effectiveness, specificity, and effects upon hematopoietic differentiation in a streamlined system that is directly relevant to hematopoiesis in vivo. Every vector was tested in 2-3 independent primary human hematopoietic cell samples to ascertain both specificity and effectiveness of expression.
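Quantifying Venus expression by flow cytometry, as described above, amounts to counting events above a positivity gate in each population. A minimal sketch, assuming (hypothetically) that the gate is the 99.5th percentile of Venus intensity in untransduced cells of the same population and that intensities are already compensated; population names and values are placeholders, not data from these experiments.

```python
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile for q in [0, 100]."""
    s = sorted(values)
    if not s:
        raise ValueError("empty sample")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def venus_positive_fraction(transduced: Dict[str, List[float]],
                            untransduced: Dict[str, List[float]],
                            q: float = 99.5) -> Dict[str, float]:
    """Fraction of events above a gate set on the untransduced control, per population."""
    out: Dict[str, float] = {}
    for population, values in transduced.items():
        gate = percentile(untransduced[population], q)
        out[population] = sum(v > gate for v in values) / len(values)
    return out
```

Setting the gate per population, rather than once globally, accounts for differences in autofluorescence between, for example, primitive and erythroid-committed cells.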

While the transcriptional regulatory elements of the G1HEM discussed above permit regulated expression of GATA1 cDNA, studies have indicated that this regulatory element can allow leaky expression in the HSC compartment (27). Because such expression could profoundly impair long-term engraftment (26), it must be prevented. To achieve this, the inventors incorporated a second gene regulatory element: binding elements for the HSC-restricted microRNA (miR) miR126, placed after the woodchuck hepatitis virus posttranscriptional regulatory element (PRE), e.g., in the modified pRRL.PPT.EFS derivatives. Insertion of three repeated miR126 binding elements after the PRE prevents transgene expression in the HSC compartment. The inventors also modified the pRRL.PPT.EFS vector carrying the G1HEM and GATA1 cDNA to include these miR126 elements. In vitro testing is performed in primary human hematopoietic cells to ensure effective and selective expression. Transduced HSCs can then be transplanted into the NOD.Cg-KitW-41J Tyr+ Prkdcscid Il2rgtm1Wjl (NBSGW) mouse model, which has previously been used successfully and extensively to produce human hematopoietic xenograft models (36). HSC function can then be tested after 16 weeks of engraftment using phenotypic marker quantification, secondary transplantation into NBSGW recipients, and assessment of Venus expression in the phenotypic HSC compartment.
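A tandem miR-126 target cassette of the kind described above can be derived from the mature miRNA sequence. This is a sketch under assumptions: the mature miRNA is taken to be hsa-miR-126-3p as listed in miRBase (verify against the current release), each target site is assumed to be the perfect reverse complement of the miRNA, and the 4-nt spacer between repeats is a hypothetical choice. The binding-site sequences actually used in this application (SEQ ID NOs: 31-37 and 43-55) are not reproduced here.

```python
# Mature hsa-miR-126-3p (RNA, 5'->3'); assumed from miRBase, verify before use.
MIR126_3P = "UCGUACCGUGAGUAAUAAUGCG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_to_dna(seq: str) -> str:
    """Convert an RNA sequence to its DNA equivalent (U -> T)."""
    return seq.upper().replace("U", "T")


def perfect_target(mirna_rna: str) -> str:
    """DNA target site fully complementary to the miRNA (reverse complement)."""
    return rna_to_dna(mirna_rna).translate(_COMPLEMENT)[::-1]


def tandem_cassette(mirna_rna: str, copies: int = 3, spacer: str = "ACGT") -> str:
    """`copies` perfect target sites joined by a hypothetical spacer."""
    site = perfect_target(mirna_rna)
    return spacer.join([site] * copies)


if __name__ == "__main__":
    print(perfect_target(MIR126_3P))
    print(tandem_cassette(MIR126_3P))
```

Once transcribed into the 3′ region of the vector mRNA, perfectly complementary sites like these are expected to direct miRNA-guided cleavage of the transcript in cells expressing miR-126, which is the basis for suppressing expression in the HSC compartment.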

Described herein is the development of clinical-grade lentiviral vectors that permit regulated expression of GATA1 cDNA for use in gene therapy. In vitro and in vivo studies in primary human hematopoietic cells permit screening of multiple independent vectors incorporating both a critical set of transcriptional regulatory elements (the G1HEM or a derivative of it) and miR126 binding elements.

REFERENCES

1. Nathan D G, Clarke B J, Hillman D G, Alter B P, Housman D E. Erythroid precursors in congenital hypoplastic (Diamond-Blackfan) anemia. The Journal of clinical investigation. 1978; 61(2):489-98. doi: 10.1172/JCI108960. PubMed PMID: 621285; PMCID: PMC372560.

2. Iskander D, Psaila B, Gerrard G, Chaidos A, En Foong H, Harrington Y, Karnik L C, Roberts I, de la Fuente J, Karadimitris A. Elucidation of the EP defect in Diamond-Blackfan anemia by characterization and prospective isolation of human EPs. Blood. 2015; 125(16):2553-7. doi: 10.1182/blood-2014-10-608042. PubMed PMID: 25755292.

3. Vlachos A, Ball S, Dahl N, Alter B P, Sheth S, Ramenghi U, Meerpohl J, Karlsson S, Liu J M, Leblanc T, Paley C, Kang E M, Leder E J, Atsidaftos E, Shimamura A, Bessler M, Glader B, Lipton J M, Participants of Sixth Annual Daniella Maria Arturi International Consensus C. Diagnosing and treating Diamond Blackfan anaemia: results of an international clinical consensus conference. Br J Haematol. 2008; 142(6):859-76. doi: 10.1111/j.1365-2141.2008.07269.x. PubMed PMID: 18671700; PMCID: PMC2654478.

4. Draptchinskaia N, Gustavsson P, Andersson B, Pettersson M, Willig T N, Dianzani I, Ball S, Tchernia G, Klar J, Matsson H, Tentler D, Mohandas N, Carlsson B, Dahl N. The gene encoding ribosomal protein S19 is mutated in Diamond-Blackfan anaemia. Nat Genet. 1999; 21(2):169-75. doi: 10.1038/5951. PubMed PMID: 9988267.

5. Flygare J, Karlsson S. Diamond-Blackfan anemia: erythropoiesis lost in translation. Blood. 2007; 109(8):3152-4. doi: 10.1182/blood-2006-09-001222. PubMed PMID: 17164339.

6. Mirabello L, Khincha P P, Ellis S R, Giri N, Brodie S, Chandrasekharappa S C, Donovan F X, Zhou W, Hicks B D, Boland J F, Yeager M, Jones K, Zhu B, Wang M, Alter B P, Savage S A. Novel and known ribosomal causes of Diamond-Blackfan anaemia identified through comprehensive genomic characterisation. J Med Genet. 2017. doi: 10.1136/jmedgenet-2016-104346. PubMed PMID: 28280134.

7. Landowski M, O'Donohue M F, Buros C, Ghazvinian R, Montel-Lehry N, Vlachos A, Sieff C A, Newburger P E, Niewiadomska E, Matysiak M, Glader B, Atsidaftos E, Lipton J M, Beggs A H, Gleizes P E, Gazda H T. Novel deletion of RPL15 identified by array-comparative genomic hybridization in Diamond-Blackfan anemia. Hum Genet. 2013; 132(11):1265-74. doi: 10.1007/s00439-013-1326-z. PubMed PMID: 23812780; PMCID: PMC3797874.

8. Khatter H, Myasnikov A G, Natchiar S K, Klaholz B P. Structure of the human 80S ribosome. Nature. 2015; 520(7549):640-5. doi: 10.1038/nature14427. PubMed PMID: 25901680.

9. Ulirsch J C, Verboon J M, Kazerounian S, Guo M H, Yuan D, Ludwig L S, Handsaker R E, Abdulhay N J, Fiorini C, Genovese G, Lim E T, Cheng A, Cummings B B, Chao K R, Beggs A H, Genetti C A, Sieff C A, Newburger P E, Niewiadomska E, Matysiak M, Vlachos A, Lipton J M, Atsidaftos E, Glader B, Narla A, Gleizes P E, O'Donohue M F, Montel-Lehry N, Amor D J, McCarroll S A, O'Donnell-Luria A H, Gupta N, Gabriel S B, MacArthur D G, Lander E S, Lek M, Da Costa L, Nathan D G, Korostelev A A, Do R, Sankaran V G, Gazda H T. The Genetic Landscape of Diamond-Blackfan Anemia. Am J Hum Genet. 2018; 103(6):930-47. doi: 10.1016/j.ajhg.2018.10.027. PubMed PMID: 30503522.

10. Lipton J M, Ellis S R. Diamond-Blackfan anemia: diagnosis, treatment, and molecular pathogenesis. Hematology/oncology clinics of North America. 2009; 23(2):261-82. doi: 10.1016/j.hoc.2009.01.004. PubMed PMID: 19327583; PMCID: PMC2886591.

11. Roy V, Perez W S, Eapen M, Marsh J C, Pasquini M, Pasquini R, Mustafa M M, Bredeson C N, Non-Malignant Marrow Disorders Working Committee of the International Bone Marrow Transplant R. Bone marrow transplantation for diamond-blackfan anemia. Biol Blood Marrow Transplant. 2005; 11(8):600-8. doi: 10.1016/j.bbmt.2005.05.005. PubMed PMID: 16041310.

12. Narla A, Vlachos A, Nathan D G. Diamond Blackfan anemia treatment: past, present, and future. Semin Hematol. 2011; 48(2):117-23. doi: 10.1053/j.seminhematol.2011.01.004. PubMed PMID: 21435508; PMCID: PMC3073777.

13. Sankaran V G, Ghazvinian R, Do R, Thiru P, Vergilio J A, Beggs A H, Sieff C A, Orkin S H, Nathan D G, Lander E S, Gazda H T. Exome sequencing identifies GATA1 mutations resulting in Diamond-Blackfan anemia. The Journal of clinical investigation. 2012; 122(7):2439-43. doi: 10.1172/JCI63597. PubMed PMID: 22706301; PMCID: PMC3386831.

14. Parrella S, Aspesi A, Quarello P, Garelli E, Pavesi E, Carando A, Nardi M, Ellis S R, Ramenghi U, Dianzani I. Loss of GATA-1 full length as a cause of Diamond-Blackfan anemia phenotype. Pediatr Blood Cancer. 2014; 61(7):1319-21. doi: 10.1002/pbc.24944. PubMed PMID: 24453067; PMCID: PMC4684094.

15. Ludwig L S, Gazda H T, Eng J C, Eichhorn S W, Thiru P, Ghazvinian R, George T I, Gotlib J R, Beggs A H, Sieff C A, Lodish H F, Lander E S, Sankaran V G. Altered translation of GATA1 in Diamond-Blackfan anemia. Nature medicine. 2014; 20(7):748-53. doi: 10.1038/nm.3557. PubMed PMID: 24952648; PMCID: PMC4087046.

16. Klar J, Khalfallah A, Arzoo P S, Gazda H T, Dahl N. Recurrent GATA1 mutations in Diamond-Blackfan anaemia. Br J Haematol. 2014; 166(6):949-51. doi: 10.1111/bjh.12919. PubMed PMID: 24766296.

17. Khajuria R K, Munschauer M, Ulirsch J C, Fiorini C, Ludwig L S, McFarland S K, Abdulhay N J, Specht H, Keshishian H, Mani D R, Jovanovic M, Ellis S R, Fulco C P, Engreitz J M, Schutz S, Lian J, Gripp K W, Weinberg O K, Pinkus G S, Gehrke L, Regev A, Lander E S, Gazda H T, Lee W Y, Panse V G, Carr S A, Sankaran V G. Ribosome Levels Selectively Regulate Translation and Lineage Commitment in Human Hematopoiesis. Cell. 2018; 173(1):90-103 e19. doi: 10.1016/j.cell.2018.02.036. PubMed PMID: 29551269; PMCID: PMC5866246.

18. Mills E W, Green R. Ribosomopathies: There's strength in numbers. Science. 2017; 358(6363). doi: 10.1126/science.aan2755. PubMed PMID: 29097519.

19. Ingolia N T, Ghaemmaghami S, Newman J R, Weissman J S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009; 324(5924):218-23. doi: 10.1126/science.1168978. PubMed PMID: 19213877; PMCID: PMC2746483.

20. Ingolia N T. Ribosome Footprint Profiling of Translation throughout the Genome. Cell. 2016; 165(1):22-33. doi: 10.1016/j.cell.2016.02.066. PubMed PMID: 27015305; PMCID: PMC4917602.

21. Notta F, Zandi S, Takayama N, Dobson S, Gan O I, Wilson G, Kaufmann K B, McLeod J, Laurenti E, Dunant C F, McPherson J D, Stein L D, Dror Y, Dick J E. Distinct routes of lineage development reshape the human blood hierarchy across ontogeny. Science. 2016; 351(6269):aab2116. doi: 10.1126/science.aab2116. PubMed PMID: 26541609; PMCID: PMC4816201.

22. Velten L, Haas S F, Raffel S, Blaszkiewicz S, Islam S, Hennig B P, Hirche C, Lutz C, Buss E C, Nowak D, Boch T, Hofmann W K, Ho A D, Huber W, Trumpp A, Essers M A, Steinmetz L M. Human haematopoietic stem cell lineage commitment is a continuous process. Nature cell biology. 2017; 19(4):271-81. doi: 10.1038/ncb3493. PubMed PMID: 28319093; PMCID: PMC5496982.

23. Paul F, Arkin Y, Giladi A, Jaitin D A, Kenigsberg E, Keren-Shaul H, Winter D, Lara-Astiaso D, Gury M, Weiner A, David E, Cohen N, Lauridsen F K, Haas S, Schlitzer A, Mildner A, Ginhoux F, Jung S, Trumpp A, Porse B T, Tanay A, Amit I. Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors. Cell. 2015; 163(7):1663-77. doi: 10.1016/j.cell.2015.11.013. PubMed PMID: 26627738.

24. Sankaran V G, Weiss M J. Anemia: progress in molecular mechanisms and therapies. Nature medicine. 2015; 21(3):221-30. doi: 10.1038/nm.3814. PubMed PMID: 25742458; PMCID: PMC4452951.

25. Debnath S, Jaako P, Siva K, Rothe M, Chen J, Dahl M, Gaspar H B, Flygare J, Schambach A, Karlsson S. Lentiviral Vectors with Cellular Promoters Correct Anemia and Lethal Bone Marrow Failure in a Mouse Model for Diamond-Blackfan Anemia. Molecular therapy: the journal of the American Society of Gene Therapy. 2017; 25(8):1805-14. doi: 10.1016/j.ymthe.2017.04.002. PubMed PMID: 28434866; PMCID: PMC5542636.

26. Iwasaki H, Mizuno S, Wells R A, Cantor A B, Watanabe S, Akashi K. GATA-1 converts lymphoid and myelomonocytic progenitors into the megakaryocyte/erythrocyte lineages. Immunity. 2003; 19(3):451-62. PubMed PMID: 14499119.

27. Takai J, Moriguchi T, Suzuki M, Yu L, Ohneda K, Yamamoto M. The Gata1 5′ region harbors distinct cis-regulatory modules that direct gene activation in erythroid cells and gene inactivation in HSCs. Blood. 2013; 122(20):3450-60. doi: 10.1182/blood-2013-01-476911. PubMed PMID: 24021675.

28. Whyatt D, Lindeboom F, Karis A, Ferreira R, Milot E, Hendriks R, de Bruijn M, Langeveld A, Gribnau J, Grosveld F, Philipsen S. An intrinsic but cell-nonautonomous defect in GATA-1-overexpressing mouse erythroid cells. Nature. 2000; 406(6795):519-24. doi: 10.1038/35020086. PubMed PMID: 10952313.

29. Ohneda K, Shimizu R, Nishimura S, Muraosa Y, Takahashi S, Engel J D, Yamamoto M. A minigene containing four discrete cis elements recapitulates GATA-1 gene expression in vivo. Genes Cells. 2002; 7(12):1243-54. PubMed PMID: 12485164.

30. Schambach A, Bohne J, Chandra S, Will E, Margison G P, Williams D A, Baum C. Equal potency of gammaretroviral and lentiviral SIN vectors for expression of O6-methylguanine-DNA methyltransferase in hematopoietic cells. Mol Ther. 2006; 13(2):391-400. Epub 2005/10/18. doi: 10.1016/j.ymthe.2005.08.012. PubMed PMID: 16226060.

31. Shimizu R, Hasegawa A, Ottolenghi S, Ronchi A, Yamamoto M. Verification of the in vivo activity of three distinct cis-acting elements within the Gata1 gene promoter-proximal enhancer in mice. Genes Cells. 2013; 18(11):1032-41. Epub 2013/10/15. doi: 10.1111/gtc.12096. PubMed PMID: 24118212.

32. Sankaran V G, Ludwig L S, Sicinska E, Xu J, Bauer D E, Eng J C, Patterson H C, Metcalf R A, Natkunam Y, Orkin S H, Sicinski P, Lander E S, Lodish H F. Cyclin D3 coordinates the cell cycle during differentiation to regulate erythrocyte size and number. Genes Dev. 2012; 26(18):2075-87. Epub 2012/08/30. doi: 10.1101/gad.197020.112. PubMed PMID: 22929040; PMCID: PMC3444733.

33. Sankaran V G, Menne T F, Scepanovic D, Vergilio J A, Ji P, Kim J, Thiru P, Orkin S H, Lander E S, Lodish H F. MicroRNA-15a and -16-1 act via MYB to elevate fetal hemoglobin expression in human trisomy 13. Proc Natl Acad Sci USA. 2011; 108(4):1519-24. Epub 2011/01/06. doi: 10.1073/pnas.1018384108. PubMed PMID: 21205891; PMCID: PMC3029749.

34. Sankaran V G, Xu J, Byron R, Greisman H A, Fisher C, Weatherall D J, Sabath D E, Groudine M, Orkin S H, Premawardhena A, Bender M A. A functional element necessary for fetal hemoglobin silencing. N Engl J Med. 2011; 365(9):807-14. Epub 2011/09/02. doi: 10.1056/NEJMoa1103070. PubMed PMID: 21879898; PMCID: PMC3174767.

35. Gentner B, Visigalli I, Hiramatsu H, Lechman E, Ungari S, Giustacchini A, Schira G, Amendola M, Quattrini A, Martino S, Orlacchio A, Dick J E, Biffi A, Naldini L. Identification of hematopoietic stem cell-specific miRNAs enables gene therapy of globoid cell leukodystrophy. Sci Transl Med. 2010; 2(58):58ra84. doi: 10.1126/scitranslmed.3001522. PubMed PMID: 21084719.

36. Fiorini C, Abdulhay N J, McFarland S K, Munschauer M, Ulirsch J C, Chiarle R, Sankaran V G. Developmentally-faithful and effective human erythropoiesis in immunodeficient and Kit mutant mice. Am J Hematol. 2017; 92(9):E513-E9. doi: 10.1002/ajh.24805. PubMed PMID: 28568895; PMCID: PMC5546987.

37. Ito E, Konno Y, Toki T, Terui K. Molecular pathogenesis in Diamond-Blackfan anemia. Int J Hematol. 2010; 92(3):413-8.

Example 2: Vector Design for Lineage-Specific Expression of Gata1 as a Therapy for Diamond-Blackfan Anemia

In some embodiments of any of the aspects, described herein are various combinations of the following lentiviral vectors (FIG. 7):

1) Lentiviral backbone: a third-generation self-inactivating lentiviral backbone based on pHIV-GFP (Welm et al. Cell Stem Cell. 2008 Jan. 10; 2(1):90-102), driven by an EF1a promoter and containing an IRES-GFP sequence that is used for initial characterization and testing and will be removed from the final vector sequence.

2) Mouse GATA1 hematopoietic enhancer minigene (mG1HEM): a concatenation of three sequences upstream of the mouse GATA1 transcription start site and a fourth sequence from the first intron of mouse GATA1, which together have been shown to drive GATA1 expression faithfully in erythroid cells but not in hematopoietic stem cells (Takai et al. Blood. 2013 Nov. 14; 122(20):3450-3460).

3) Minimal promoter (minP): derived either from the 5′UTR of mouse GATA1 or from the firefly luciferase reporter vector pGL4.25 (GenBank accession number DQ904457.1).

4) Human GATA1 cDNA (GATA1), codon-optimized for expression in human cells, with or without a FLAG tag.

5) Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE) for enhanced stability of transgene mRNA.

6) miR126 binding site (miR126 BS): a repeated sequence bound by miR126, a microRNA expressed in hematopoietic stem cells, which decreases transgene expression in the stem cell compartment (Gentner et al. Sci Transl Med. 2010 Nov. 17; 2(58):58ra84).
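Because Example 2 describes "various combinations" of these elements, the design space can be enumerated programmatically. The sketch below is illustrative only: the element labels and the helper names `build_cassette` and `final_designs` are hypothetical, and the 5′ to 3′ order shown is an assumption rather than a reading of FIG. 7. It crosses the two minimal-promoter options with the tagged and untagged GATA1 cDNA and drops the IRES-GFP reporter, which the text reserves for initial characterization and testing.

```python
from itertools import product

# Hypothetical labels for the alternatives listed in Example 2.
MIN_PROMOTERS = ("minP_mouse_GATA1_5UTR", "minP_pGL4.25")
GATA1_VARIANTS = ("GATA1_codon_opt", "GATA1_codon_opt_FLAG")


def build_cassette(min_promoter: str, gata1: str, with_reporter: bool) -> list[str]:
    """Return cassette elements in an ASSUMED 5'->3' order (not read from FIG. 7)."""
    cassette = ["mG1HEM", min_promoter, gata1]
    if with_reporter:
        # IRES-GFP is kept only for initial characterization and testing.
        cassette += ["IRES", "GFP"]
    cassette += ["WPRE", "miR126_BS"]
    return cassette


def final_designs() -> list[list[str]]:
    """Enumerate reporter-free cassettes for every minP x GATA1 combination."""
    return [
        build_cassette(promoter, gata1, with_reporter=False)
        for promoter, gata1 in product(MIN_PROMOTERS, GATA1_VARIANTS)
    ]


if __name__ == "__main__":
    for design in final_designs():
        print(" - ".join(design))
```

Representing each design as an ordered list keeps element order explicit, so a different arrangement (for example, placing the miR126 BS before the WPRE) only requires editing `build_cassette`.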

REFERENCES

Welm et al. Cell Stem Cell. 2008 Jan. 10; 2(1):90-102.

Gentner et al. Sci Transl Med. 2010 Nov. 17; 2(58):58ra84.

Example 3: Gata1 Gene Therapy as a Therapy for Diamond-Blackfan Anemia

Pre-clinical studies by the inventors have shown that GATA-1 augmentation in erythroid cells has therapeutic effects in Diamond-Blackfan anemia (DBA). Herein, the inventors present further experiments demonstrating that a regulated increase in GATA1 expression in erythroid precursors, but not in hematopoietic stem cells, provides therapeutic effects in DBA.

A clinically relevant GATA1 gene therapy vector for DBA must achieve four crucial functions (FIG. 27). First, although the vector must be integrated into the genome of long-term, undifferentiated hematopoietic stem cells (LT-HSCs), there must be very little expression of the GATA1 transgene in the stem cell compartment, since GATA1 expression in HSCs leads to a loss of self-renewing stem cells. Second, to overcome the erythroid differentiation defect that is the hallmark of DBA, the vector must drive robust expression in early progenitors once they have become committed to erythroid differentiation. Third, to mimic the pattern of endogenous GATA1 expression and achieve normal terminal erythroid differentiation, expression from the vector should decline at late stages of erythroid development. Fourth, the developmentally regulated increase in GATA1 expression must be sufficient to overcome the erythroid maturation block caused by ribosomal protein haploinsufficiency in experimental model systems and in primary patient samples.

To design a vector that incorporates the four key features above, the inventors first analyzed accessible chromatin peaks upstream of GATA1 and identified chromatin that is open in differentiating erythroid cells but not in HSCs or other early progenitors. The inventors provide evidence that these regions of DNA contain regulatory elements responsible for erythroid-specific expression of GATA1. The inventors constructed a human GATA1 enhancer (hG1E) element (FIG. 28A) by concatenating the three regions of open chromatin upstream of GATA1. The inventors then developed a vector in which the hG1E element drives both GATA1 and GFP expression, with an internal ribosome entry site (IRES) sequence placed between the two genes. As an additional mechanism to achieve developmentally regulated transgene expression, the inventors combined the hG1E element with a miR223T binding site that has previously been used to restrict transgene expression in the HSC compartment.
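The peak-selection logic described above (keep regions open in differentiating erythroid cells but closed in HSCs and other early progenitors, then join them into one element) can be sketched as a simple filter. This is a toy illustration, not the inventors' pipeline: the `Peak` record, the accessibility values, the thresholds `open_min` and `closed_max`, and the helper names are all hypothetical assumptions, and real analyses would use a statistical differential-accessibility test rather than fixed cutoffs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    start: int                        # hypothetical 0-based start, upstream of GATA1
    end: int                          # hypothetical end (exclusive)
    erythroid_signal: float           # normalized accessibility in erythroid cells
    early_signals: tuple[float, ...]  # HSCs and other early progenitors


def erythroid_specific(peaks, open_min=5.0, closed_max=1.0):
    """Keep peaks open in erythroid cells but closed in every early compartment.

    The thresholds are illustrative assumptions, not values from the source.
    """
    return [
        p for p in peaks
        if p.erythroid_signal >= open_min and max(p.early_signals) <= closed_max
    ]


def concatenate_elements(sequence: str, peaks) -> str:
    """Join the selected regions in genomic order into one synthetic element."""
    ordered = sorted(peaks, key=lambda p: p.start)
    return "".join(sequence[p.start:p.end] for p in ordered)


if __name__ == "__main__":
    toy_sequence = "AAAACCCCGGGGTTTT"
    toy_peaks = [
        Peak(12, 16, 7.0, (0.1, 0.3)),  # erythroid-specific
        Peak(4, 8, 6.0, (0.2, 3.0)),    # also open in an early progenitor
        Peak(8, 12, 0.5, (0.2, 0.1)),   # closed in erythroid cells
        Peak(0, 4, 8.0, (0.5, 0.4)),    # erythroid-specific
    ]
    print(concatenate_elements(toy_sequence, erythroid_specific(toy_peaks)))  # AAAATTTT
```

Sorting by start coordinate preserves the native genomic order of the selected regions; whether the hG1E element actually keeps that order is not stated here and is assumed.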

To assess whether the hG1E-GATA1 or hG1E-GATA1-miR constructs can drive sufficient increases in GATA1 expression, the inventors used an in vitro model of DBA. Primary human CD34+ HSPCs were infected with an shRNA vector targeting the DBA gene RPS19, which the inventors have previously shown mimics in vitro the erythroid differentiation defects characteristic of DBA. The inventors defined the erythroid ratio as the proportion of cells that express erythroid markers when cultured under erythropoietic conditions. When co-infected with the hG1E-GATA1 or hG1E-GATA1-miR vector, CD34+ HSPCs had a restored erythroid ratio after RPS19 knockdown, at levels comparable to constitutive GATA1 overexpression with the HMD-GATA1 vector, showing rescue of the DBA phenotype (FIG. 28B). As further evidence that the hG1E-GATA1 and hG1E-GATA1-miR vectors can drive enough GATA1 expression to be physiologically relevant, the inventors used the G1E murine hematopoietic cell line, which lacks endogenous GATA1 expression. Infection of G1E cells with the hG1E-GATA1 and hG1E-GATA1-miR vectors induced terminal erythroid differentiation, as measured by Ter119 expression (FIG. 28C).
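The erythroid ratio defined above is a simple proportion. A minimal sketch, assuming each cell has already been called positive or negative for each surface marker: the function name `erythroid_ratio`, the per-cell dictionary format, and the default marker set (CD235a) are hypothetical choices, since the text does not specify which erythroid markers or gating rules were used.

```python
def erythroid_ratio(cells, markers=("CD235a",)):
    """Fraction of cells positive for every marker in `markers`.

    Each cell is a dict mapping marker name -> bool (positive/negative call).
    The default marker set is an assumption for illustration.
    """
    if not cells:
        raise ValueError("erythroid_ratio needs at least one cell")
    positive = sum(1 for cell in cells if all(cell.get(m, False) for m in markers))
    return positive / len(cells)


if __name__ == "__main__":
    toy_culture = [
        {"CD71": True, "CD235a": True},
        {"CD71": True, "CD235a": False},
        {"CD71": False, "CD235a": True},
        {"CD71": False, "CD235a": False},
    ]
    print(erythroid_ratio(toy_culture))                      # 0.5
    print(erythroid_ratio(toy_culture, ("CD71", "CD235a")))  # 0.25
```

Comparing this ratio between knockdown-only cultures and knockdown cultures co-infected with a GATA1 vector is one way to express the rescue shown in FIG. 28B.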

Having achieved functionally sufficient GATA1 expression in erythroid progenitors, the inventors sought to determine whether their novel regulatory elements can restrict GATA1 expression in the LT-HSC compartment, since GATA1 expression in these cells would impair the maintenance of stem cells in the bone marrow. The inventors infected CD34+ HSPCs with the hG1E-GATA1 or hG1E-GATA1-miR vector and cultured them in conditions that enable short-term HSC maintenance in vitro. Two days after infection, GFP expression and surface expression of LT-HSC markers were assessed by flow cytometry to quantify transgene expression in LT-HSCs. These cells were then transferred to medium that promotes erythroid development, and GFP expression was measured in differentiated erythroid precursors. The ratio of GFP expression in erythroid cells to GFP expression in HSCs (RBCGFP/HSCGFP ratio) was significantly higher in cells infected with the hG1E-GATA1 and hG1E-GATA1-miR viruses than in cells infected with the HMD-GATA1 virus, which expresses GATA1 constitutively (FIG. 28D). The increased RBCGFP/HSCGFP ratio is due to restricted expression of the experimental vectors in HSCs. These data reveal that a regulated increase in GATA1 expression in erythroid precursors is sufficient to overcome the differentiation block in two distinct in vitro DBA models, and that expression from these vectors is restricted in the LT-HSC compartment. This developmentally faithful increase in GATA1 expression shows that a gene therapy approach based on regulated GATA1 overexpression can be a viable cure for Diamond-Blackfan anemia.
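The RBCGFP/HSCGFP ratio can be illustrated with a short calculation. The sketch assumes the readout is the fraction of GFP+ events above a fixed intensity gate in each compartment; the gate value, the intensity numbers, and the helper names `gfp_positive_fraction` and `rbc_to_hsc_gfp_ratio` are hypothetical, and the source does not state whether percent-positive or mean fluorescence intensity was used.

```python
def gfp_positive_fraction(intensities, gate):
    """Fraction of events whose GFP intensity exceeds the gate (threshold is assumed)."""
    if not intensities:
        raise ValueError("no events")
    return sum(1 for x in intensities if x > gate) / len(intensities)


def rbc_to_hsc_gfp_ratio(rbc_gfp, hsc_gfp, gate=100.0):
    """RBCGFP/HSCGFP: GFP+ fraction in erythroid cells over GFP+ fraction in LT-HSCs."""
    hsc_fraction = gfp_positive_fraction(hsc_gfp, gate)
    if hsc_fraction == 0:
        return float("inf")  # no detectable transgene expression in LT-HSCs
    return gfp_positive_fraction(rbc_gfp, gate) / hsc_fraction


if __name__ == "__main__":
    # Toy values: a regulated vector is mostly silent in LT-HSCs ...
    print(rbc_to_hsc_gfp_ratio([800, 900, 50, 700], [10, 20, 500, 15]))  # 3.0
    # ... whereas a constitutive vector is expressed similarly in both.
    print(rbc_to_hsc_gfp_ratio([800, 900, 50, 700], [600, 700, 50, 800]))  # 1.0
```

A ratio near 1 indicates similar expression in both compartments, as expected for a constitutive promoter; higher values indicate that expression is preferentially restricted away from LT-HSCs.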

To further investigate expression of GATA1 from the hG1E-GATA1 vector in developing erythroid cells, the inventors used a three-phase culture system to induce human HSPCs to differentiate into fully hemoglobinized, enucleated red blood cells in vitro. During in vitro differentiation, developing erythroid progenitors and precursors first express high levels of the transferrin receptor CD71. Several days later, glycophorin A (CD235a) is highly expressed, followed by loss of CD71 expression in terminally differentiated RBCs (FIG. 5a). Following transduction with HMD-GATA1 or hG1E-GATA1, cells that are already primed for erythroid development undergo more rapid early differentiation, as measured by the percentage of cells expressing CD71, compared to negative controls (FIG. 29B). Next, the inventors compared GFP expression in the terminally differentiated CD71-CD235a+ subset with GFP expression in the more primitive CD71+CD235a+ subset (ErythrocyteGFP/progenitorGFP). GFP expression from the hG1E-GATA1 vector is significantly decreased in terminally differentiated erythrocytes, faithfully recapitulating the decline in GATA1 expression during terminal differentiation. Notably, but not unexpectedly, this decrease in GFP expression was not seen in the HMD-GATA1 samples, indicating impaired terminal differentiation with unregulated GATA1 expression (FIG. 29C).
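The ErythrocyteGFP/progenitorGFP comparison amounts to gating cells on CD71 and CD235a and comparing GFP between two gates. A minimal sketch, assuming per-cell positive/negative calls for CD71 and CD235a plus a GFP intensity, and using the median as the summary statistic (an assumption; the text does not say whether mean or median intensity was used). The record format and the names `gfp_in_gate` and `erythrocyte_to_progenitor_gfp` are hypothetical.

```python
from statistics import median


def gfp_in_gate(cells, cd71_positive, cd235a_positive):
    """Median GFP intensity of cells falling in one CD71/CD235a gate."""
    values = [
        c["gfp"] for c in cells
        if c["cd71"] == cd71_positive and c["cd235a"] == cd235a_positive
    ]
    if not values:
        raise ValueError("empty gate")
    return median(values)


def erythrocyte_to_progenitor_gfp(cells):
    """ErythrocyteGFP/progenitorGFP: CD71-CD235a+ gate over CD71+CD235a+ gate."""
    erythrocyte = gfp_in_gate(cells, cd71_positive=False, cd235a_positive=True)
    progenitor = gfp_in_gate(cells, cd71_positive=True, cd235a_positive=True)
    return erythrocyte / progenitor


if __name__ == "__main__":
    toy_cells = [
        {"cd71": True, "cd235a": False, "gfp": 900},   # early cell, outside both gates
        {"cd71": True, "cd235a": True, "gfp": 400},    # more primitive precursor
        {"cd71": True, "cd235a": True, "gfp": 600},
        {"cd71": False, "cd235a": True, "gfp": 100},   # terminally differentiated
        {"cd71": False, "cd235a": True, "gfp": 150},
    ]
    print(erythrocyte_to_progenitor_gfp(toy_cells))  # 0.25
```

A ratio well below 1 corresponds to the decline in transgene expression during terminal differentiation described for hG1E-GATA1; a ratio near 1 would match the unregulated HMD-GATA1 pattern.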

Next, the inventors sought to recapitulate RPS19 haploinsufficiency in primary HSPCs isolated from healthy adult donors by using CRISPR/Cas9-mediated disruption of RPS19. The inventors showed that efficient editing of RPS19 led to an erythroid maturation block, with significantly fewer cells expressing CD71 during early erythroid culture. The inventors then transduced RPS19-edited HSPCs with HMD-empty, HMD-GATA1, or hG1E-GATA1 virus. Among the cells that were committed to erythroid differentiation on day 4 in culture (as measured by CD71 expression), the populations infected with HMD-GATA1 or hG1E-GATA1 virus had more CD235a expression (FIG. 30A), confirming that a regulated increase in GATA1 expression can rescue the block in erythroid differentiation induced by loss of a ribosomal protein, as is seen in DBA. Finally, RPS19 editing caused a significant reduction in erythroid colonies in a methylcellulose colony-forming assay, and this reduction was partially rescued by hG1E-GATA1 (FIG. 30B). Altogether, the inventors' data reveal that the hG1E-GATA1 vector satisfies all four criteria required of a gene therapy cure for DBA (FIG. 27).
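One common way to put a number on a "partial rescue" such as the colony-forming result is the fraction of the lost signal that treatment recovers. This metric is not defined in the source and is offered only as an illustrative assumption; the function name `fractional_rescue` and the colony counts in the example are hypothetical.

```python
def fractional_rescue(unedited, edited, edited_treated):
    """Share of the editing-induced loss recovered by treatment.

    0.0 means no rescue, 1.0 means full restoration to the unedited level.
    """
    lost = unedited - edited
    if lost == 0:
        raise ValueError("editing caused no reduction to rescue")
    return (edited_treated - edited) / lost


if __name__ == "__main__":
    # Hypothetical erythroid colony counts per plate.
    print(fractional_rescue(unedited=100, edited=20, edited_treated=60))  # 0.5
```

Values between 0 and 1 correspond to partial rescue; values above 1 would indicate overshoot beyond the unedited control.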
