COMPOSITIONS AND METHODS FOR THE TREATMENT OF DBA USING GATA1 GENE THERAPY

Information

  • Patent Application
  • 20220265863
  • Publication Number
    20220265863
  • Date Filed
    June 08, 2020
  • Date Published
    August 25, 2022
Abstract
Described herein are methods and compositions related to GATA-1 gene therapy for the treatment of Diamond-Blackfan anemia.
Description
SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jun. 3, 2020, is named 701039-094470WOPT_SL.txt and is 188,598 bytes in size.


TECHNICAL FIELD

The technology described herein relates to compositions and methods of GATA-1 gene therapy for the treatment of Diamond-Blackfan anemia and uses thereof.


BACKGROUND

Diamond-Blackfan anemia (DBA) is one of a rare group of inherited bone marrow failure syndromes (IBMFSs) and is characterized by red cell failure, the presence of congenital anomalies, and cancer predisposition. DBA is usually diagnosed in children during their first year of life. Children with DBA do not make enough red blood cells, the cells that carry oxygen to all other cells in the body. In children with DBA, many of the cells that would have become red blood cells die before they develop. In addition to being an inherited bone marrow failure syndrome, DBA is also categorized as a ribosomopathy as, in more than 50% of cases, the syndrome appears to result from haploinsufficiency of either a small or large subunit-associated ribosomal protein.


DBA is characterized by a specific reduction in the production of red blood (erythroid) cells and their precursors without defects in other hematopoietic lineages. Over the past decade, the elucidation of mutations in the ribosomal protein gene RPS19, followed by the discovery of mutations in 9 other ribosomal protein genes, has led to the hypothesis that DBA is a disorder of ribosomal biogenesis. However, approximately 50% of DBA cases have as-yet-unidentified molecular mutations, despite systematic sequencing of all ribosomal protein and other candidate genes in these cases.


The GATA-1 gene is located on the X-chromosome and encodes a transcription factor that regulates the development of erythrocytes. Recently, loss-of-function mutations in GATA-1 have been found in patients with Diamond-Blackfan anemia (DBA). However, no treatment targeting GATA-1 augmentation specifically in erythroid cells is currently available. Thus, therapeutic approaches that directly target GATA-1 dysfunction in erythroid cells are necessary in order to provide effective treatment.


SUMMARY

Recent studies have shown that GATA-1 augmentation in erythroid cells may have therapeutic effects in Diamond-Blackfan anemia (DBA). However, increasing the lineage-specific expression of therapeutic proteins including GATA-1 in vivo remains challenging. Attempts to increase GATA1 expression with existing technology necessarily increase GATA1 expression in cells (e.g., HSCs) where it is overwhelmingly deleterious to the subject, negating any possible therapeutic effect.


As described herein, the inventors have identified compositions and methods to increase lineage-specific expression of GATA1 specifically in early erythroid progenitors but not in hematopoietic stem cells as a gene therapeutic approach for the treatment of Diamond-Blackfan anemia. DBA is characterized by a specific reduction in the production of red blood (erythroid) cells and their precursors without defects in other hematopoietic lineages.


In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising at least one heterologous regulatory sequence selected from a hematopoietic enhancer element and miRNA binding site for a HSC restricted miRNA; and a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.


In some embodiments of any of the aspects, the nucleic acid sequence comprises at least one hematopoietic enhancer element.


In some embodiments of any of the aspects, the enhancer element comprises a sequence of at least 80% homology to a nucleotide sequence that is selected from the group consisting of: SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 38 and/or SEQ ID NO: 39.


In some embodiments of any of the aspects, the enhancer element comprises an enhancer element of a gene selected from the group consisting of: Kell metalloendopeptidase (KEL); 5′ aminolevulinate synthase 2 (ALAS2); and glycophorin A (GYPA).


In some embodiments of any of the aspects, the nucleic acid comprises at least one miRNA binding site for at least one HSC-restricted miRNA.


In some embodiments of any of the aspects, the at least one miRNA binding site for at least one HSC-restricted miRNA is selected from the group consisting of miR binding sites for miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e.
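
One common way to build a binding site for a given miRNA is to place a sequence that is perfectly complementary to the mature miRNA in the transgene transcript, often in several tandem copies. The following is a minimal sketch of that idea, not the design used in this application: the function names, the copy number, the spacer, and the example miRNA sequence are hypothetical placeholders and are not taken from this application or from any miRNA database.

```python
# Hypothetical sketch: derive a DNA target site (sense strand of the transgene)
# that is perfectly complementary to a mature miRNA sequence.

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def target_site_dna(mirna: str) -> str:
    """Return the DNA reverse complement of a mature miRNA written 5'->3'."""
    seq = mirna.upper()
    unknown = set(seq) - set(_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def tandem_sites(mirna: str, copies: int = 4, spacer: str = "CGAT") -> str:
    """Join several copies of the target site with a short spacer.

    The default copy number and spacer are assumptions for illustration only.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return spacer.join([target_site_dna(mirna)] * copies)


if __name__ == "__main__":
    # Placeholder sequence, NOT a verified miRNA sequence.
    placeholder_mirna = "UACGUACGUACGUACGUACGUA"
    print(tandem_sites(placeholder_mirna, copies=2))
```

In practice the mature miRNA sequence would be taken from a curated database, and the number and spacing of sites would be chosen empirically.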


In some embodiments of any of the aspects, the nucleic acid comprises at least one hematopoietic enhancer element and at least one miRNA binding site for at least one HSC-restricted miRNA.


In some embodiments of any of the aspects, the nucleic acid sequence further comprises: a. a heterologous 5′ UTR comprising: i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1; ii. a sequence of at least 20 nucleotides; and/or iii. 1-25 upstream AUG codons (uAUGs); and/or b. a hematopoietic enhancer minigene.


In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising a 5′ UTR comprising: i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1; ii. a sequence of at least 20 nucleotides; and/or iii. 1-25 upstream AUG codons (uAUGs); and a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
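
The length and uAUG criteria recited for the 5′ UTR can be checked programmatically for a candidate sequence. The following is a minimal sketch, assuming the UTR is supplied as a plain DNA or RNA string; the function names are hypothetical, and because the criteria above are joined by "and/or", requiring both at once (as the helper does) is only one possible reading.

```python
# Hypothetical sketch: count upstream AUG codons (uAUGs) in a candidate 5' UTR
# and test it against the length and uAUG ranges recited in the text.


def count_uaugs(utr: str) -> int:
    """Count AUG triplets in any reading frame of the UTR (DNA or RNA input)."""
    seq = utr.upper().replace("T", "U")
    return sum(1 for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG")


def utr_meets_criteria(utr: str, min_len: int = 20,
                       min_uaug: int = 1, max_uaug: int = 25) -> bool:
    """Return True if the UTR has at least min_len nucleotides and
    between min_uaug and max_uaug uAUGs (both criteria required here)."""
    n_uaug = count_uaugs(utr)
    return len(utr) >= min_len and min_uaug <= n_uaug <= max_uaug


if __name__ == "__main__":
    candidate = "GGGAUG" + "C" * 16  # made-up 22-nt sequence with one uAUG
    print(count_uaugs(candidate), utr_meets_criteria(candidate))
```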


In some embodiments of any of the aspects, the 5′UTR comprises a 5′UTR of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1), LIM Domain Only 2 (LMO2), and ETS Variant 6 (ETV6).


In some embodiments of any of the aspects, the nucleic acid further comprises at least one hematopoietic enhancer element, miRNA binding site for a HSC restricted miRNA and/or a hematopoietic enhancer minigene (G1HEM).


In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising a hematopoietic enhancer minigene (G1HEM); and a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.


In some embodiments of any of the aspects, the hematopoietic enhancer minigene (mG1HEM) comprises a sequence of at least 80% homology to a nucleotide sequence of: SEQ ID NO: 13.


In some embodiments of any of the aspects, the nucleic acid further comprises a 5′ UTR comprising: i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1; ii. a sequence of at least 20 nucleotides; and/or iii. 1-25 upstream AUG codons (uAUGs); and/or at least one hematopoietic enhancer element; and/or at least one miRNA binding site for a HSC restricted miRNA.


In some embodiments of any of the aspects, the nucleic acid further comprises a 5′ UTR comprising a 5′UTR of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1); at least one hematopoietic enhancer element; and/or at least one miRNA binding site for a HSC restricted miRNA.


In some embodiments of any of the aspects, the nucleic acid sequence comprises a promoter operably linked to the elements of a. and b.


In some embodiments of any of the aspects, the promoter is not a GATA1 promoter.


In some embodiments of any of the aspects, the promoter comprises a promoter sequence of Elongation factor 1-alpha 1 (eEF1a1).


In some embodiments of any of the aspects, the sequence encoding a GATA-binding factor 1 (GATA1) polypeptide comprises at least 60% sequence identity to a nucleotide sequence encoding a human GATA1 polypeptide.
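
Percent sequence identity between a candidate sequence and a reference is usually computed from an alignment. The following is a minimal sketch, assuming a simple Needleman-Wunsch global alignment with made-up scoring parameters and identity defined as matching columns divided by alignment length; the function names are hypothetical, other definitions of identity exist, and dedicated alignment tools would normally be used instead.

```python
# Hypothetical sketch: percent identity from a basic global alignment.


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical alignment columns as a percentage of alignment length."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    al_a, al_b = global_align(a.upper(), b.upper())
    matches = sum(x == y for x, y in zip(al_a, al_b))
    return 100.0 * matches / len(al_a)


def meets_identity_threshold(a: str, b: str, threshold: float = 60.0) -> bool:
    """True if percent identity is at least the given threshold."""
    return percent_identity(a, b) >= threshold
```

For example, `meets_identity_threshold(candidate, reference, 60.0)` would test a candidate coding sequence against a reference under these assumptions.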


In some embodiments of any of the aspects, the nucleic acid sequence comprises: a posttranscriptional regulatory element operably linked to the sequence encoding the GATA1 polypeptide.


In some embodiments of any of the aspects, the posttranscriptional regulatory element comprises a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE).


In some embodiments of any of the aspects, the nucleic acid sequence further comprises: an internal ribosome entry site.


In some embodiments of any of the aspects, the internal ribosome entry site is operably linked to a marker gene and wherein the marker gene encodes an optically visible protein or an enzyme.


In some embodiments of any of the aspects, the sequence comprises a sequence selected from SEQ ID NOs 8, 9 and 62.


In some embodiments of any of the aspects, the nucleic acid sequence is a vector.


In some embodiments of any of the aspects, the vector is a plasmid, or an adenoviral, lentiviral or retroviral vector.


In one aspect of any of the embodiments, described herein is a lentiviral particle comprising the nucleic acid sequence.


In one aspect of any of the embodiments, described herein is a composition comprising a nucleic acid sequence or particle and a pharmaceutically acceptable carrier.


In one aspect of any of the embodiments, described herein is a method of treating Diamond-Blackfan Anemia in a subject in need thereof, the method comprising administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition to the patient.


In one aspect of any of the embodiments, described herein is a method of restoring early erythroid progenitor cell-specific GATA1 expression, the method comprising contacting a population of cells comprising early erythroid progenitor cells with a nucleic acid sequence, particle, or composition.


In some embodiments of any of the aspects, the early erythroid progenitor cells comprise a DBA-associated gene mutation.


In one aspect of any of the embodiments, described herein is a nucleic acid sequence, particle, or composition described herein for use in the treatment of Diamond-Blackfan Anemia in a subject in need thereof.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 depicts a schematic of the molecular pathways involved in Diamond-Blackfan anemia (DBA) pathogenesis.



FIG. 2A, FIG. 2B, and FIG. 2C demonstrate reduced ribosome levels with DBA-molecular lesions.



FIG. 3 demonstrates reduced GATA1 expression levels in hematopoietic stem and progenitor cells (HSPCs) from DBA patients with RP gene mutations (RPS19, RPL5, and RPL35A mutations present in patients shown here).



FIG. 4A, FIG. 4B, and FIG. 4C demonstrate the rescue of erythroid lineage commitment and differentiation, as assessed by morphology (FIG. 4B) and markers of terminal differentiation (FIG. 4C), in DBA patient HSPCs by GATA1 lentiviral transduction. The three patients shown in FIG. 4A have mutations in RPS19 (Patients 2 and 3) and RPL35A (Patient 1).



FIG. 5 depicts a schematic of the claimed vectors allowing regulated GATA1 expression. The endogenous GATA1 locus is shown above, and the pRRL.PPT.EFS vectors (including self-inactivating long-terminal repeat elements [LTR] with safety modifications and posttranscriptional regulatory elements of the woodchuck hepatitis virus) are shown below. The vectors include either the endogenous GATA1 promoter or the short EF1α (EFS) promoter. The GATA1 cDNA is codon optimized for improved expression. FIG. 5 discloses SEQ ID NOS 67-69, respectively, in order of appearance.



FIG. 6 depicts a schematic of the use of the claimed GATA1 vectors in primary human hematopoietic cells.



FIG. 7 depicts a schematic of the various combinations of vectors to achieve developmentally faithful expression of GATA1 in early erythroid progenitors but not in hematopoietic stem cells.



FIG. 8A and FIG. 8B show genomic plots of human GATA1 and diagrams of two vectors. FIG. 8A demonstrates the chromatin accessibility upstream of human GATA1. FIG. 8B depicts two vectors to achieve developmentally faithful expression of GATA1 in early erythroid progenitors but not in hematopoietic stem cells.



FIG. 9A, FIG. 9B, FIG. 9C, FIG. 9D, and FIG. 9E depict the five vectors including a control vector to achieve developmentally faithful expression of GATA1 in early erythroid progenitors but not in hematopoietic stem cells. FIG. 9A. R18 EF-1α IRES GFP Control. FIG. 9B. R21 EF-1α IRES GFP miR126. FIG. 9C. R49 EF-1α 1 peak enhancer GFP. FIG. 9D. R50 3 Peak Enhancer GFP. FIG. 9E. GATA1 vector with enhancer and miR126 binding site.



FIG. 10 shows a FACS analysis plot of cells transfected with the R18 EF-1α IRES GFP Control, showing CD71 and CD235a on day 4, day 9, and day 11 of in vitro differentiation. As cells move from quadrant 1 to 4, they are maturing down the erythroid lineage.



FIG. 11 shows a FACS analysis plot of cells transfected with the R21 EF-1α IRES GFP.



FIG. 12 shows a FACS analysis plot of cells transfected with the R21 EF-1α IRES GFP miR126.



FIG. 13 shows a FACS analysis plot of cells transfected with the R49 EF-1α 1 peak enhancer GFP.



FIG. 14 shows a FACS analysis plot of cells transfected with the R49 EF-1α 3 peak enhancer GFP.



FIG. 15 shows FACS analysis plots of cells transfected with R18 EF-1α IRES GFP Control, R21 EF-1α IRES GFP miR126, R49 EF-1α 1 peak enhancer GFP, and R50 3 Peak Enhancer GFP.



FIG. 16 demonstrates that the R50 3 Peak Enhancer GFP vector, containing the human GATA enhancer, preferentially drives transgene expression in erythroid cells but not in CD34+ cells.



FIG. 17 depicts the FACS analysis plots using HSC d4 of Ef1a-GFP, miR126, miR223T, 1peak, 3peak, 1peak-miR126, 1peak-miR223T, 3peak-miR126, and 3peak-miR223T. Experimental outline: D0: Thaw CD34+ cells into SSII+cc100+TPO, culture at 5% O2. D2: Lentiviral infection, recover overnight in SSII+cc100+TPO. HSC D3: split culture—half in HSC conditions, half in RBC differentiation conditions. HSC D4 and D7: Analysis by flow cytometry. RBC D4: Analysis by flow cytometry (to continue every 3-4 days).



FIG. 18A and FIG. 18B show bar graphs depicting GFP expression in a CD34+CD38-CD45RA-CD90+ subset at day 4 (FIG. 18A) and at day 7 (FIG. 18B).



FIG. 19 depicts FACS analysis plots using RBC D4 of Ef1a-GFP, miR126, miR223T, 1peak, 3peak, 1peak-miR126, 1peak-miR223T, 3peak-miR126, and 3peak-miR223T.



FIG. 20 shows a bar graph depicting GFP expression of RBC d4, CD71+CD235+.



FIG. 21 depicts the % of GFP in erythroid subsets: CD71-CD235-, CD71+CD235-, and CD71+CD235+.



FIG. 22 shows a bar graph depicting the % GFP fold increase RBC vs HSC. Results are shown for Ef1a-GFP, miR126, miR223T, 1peak, 3peak, 1peak-miR126, 1peak-miR223T, 3peak-miR126, and 3peak-miR223T.
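
A fold increase of this kind is a ratio of the GFP-positive percentage measured under erythroid (RBC) conditions to that measured under HSC conditions for the same construct. The following is a minimal sketch of that arithmetic; the function names and any numbers passed to them are hypothetical and are not data from this application.

```python
# Hypothetical sketch: fold increase of % GFP in RBC conditions over HSC conditions.
from typing import Dict, List, Tuple


def fold_increase(pct_gfp_rbc: float, pct_gfp_hsc: float) -> float:
    """Ratio of % GFP+ cells in RBC conditions to % GFP+ cells in HSC conditions."""
    if pct_gfp_hsc <= 0:
        raise ValueError("HSC percentage must be positive to form a ratio")
    return pct_gfp_rbc / pct_gfp_hsc


def rank_constructs(measurements: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Sort constructs by fold increase, highest first.

    `measurements` maps a construct name to (pct_gfp_rbc, pct_gfp_hsc).
    """
    folds = {name: fold_increase(rbc, hsc) for name, (rbc, hsc) in measurements.items()}
    return sorted(folds.items(), key=lambda item: item[1], reverse=True)
```

A higher ratio indicates expression that is more restricted to the erythroid condition relative to the HSC condition.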



FIG. 23 shows FACS analysis plots demonstrating that RPS19 knockdown impairs erythroid differentiation. Experimental outline: D0: thaw cells into Phase I media. D2: spinfect with shRNA lenti+/−GATA1 expression constructs. D4: begin puro selection. D6: remove puro. D7: flow analysis.



FIG. 24 shows FACS analysis plots demonstrating that RPS19 knockdown is rescued by GATA1 overexpression.



FIG. 25 shows FACS analysis plots demonstrating that RPS19 knockdown is rescued by GATA1 overexpression.



FIG. 26 shows a bar graph depicting CD235+/CD235- level of EF1a-GFP, EF1a-GATA-IRES-GFP, 1 peak-GATA-GFP, 3 peak-GATA-GFP, and HMD-GATA-GFP.



FIG. 27 shows a schematic depicting key features and a summary of experimental validation of a GATA1 gene therapy vector to cure DBA.



FIG. 28A, FIG. 28B, FIG. 28C, and FIG. 28D show that developmentally regulated expression of GATA1 rescues DBA phenotype in vitro. FIG. 28A. Accessible chromatin upstream of human GATA1 in descending order from HSPCs to reticulocytes (top) and schematic of lentiviral vector to achieve regulated GATA1 expression (bottom). FIG. 28B. shRNA knockdown of RPS19 in primary human HSPCs impairs erythroid development and is rescued by GATA1 expression. FIG. 28C. Erythroid differentiation of murine G1E cells is achieved with regulated GATA1 expression. FIG. 28D. GFP ratio in erythroid progenitors compared to HSCs shows developmentally regulated expression.



FIG. 29A, FIG. 29B, and FIG. 29C show exogenous GATA1 expression during erythroid differentiation. FIG. 29A. Differentiating erythroid precursors first express CD71, followed by CD235, and finally lose CD71 during terminal erythroid differentiation. FIG. 29B. Percentage of erythroid progenitors that express CD71 (dark grey) or both CD71 and CD235 (light grey) on day 4 is higher after infection with GATA1 virus. FIG. 29C. Ratio of GFP expression of CD71-CD235+ cells compared to CD71+CD235+ cells reveals decreased expression from hG1E during terminal erythroid differentiation, mimicking endogenous GATA1 expression.
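
The marker progression described for FIG. 29A (CD71 gained first, then CD235, then CD71 lost) defines four subsets that can be assigned from two-color flow cytometry intensities. The following is a minimal sketch of such gating; the function names and the threshold values are hypothetical, and real gates would be set from controls for each experiment.

```python
# Hypothetical sketch: assign flow cytometry events to CD71/CD235 subsets
# and report the percentage of events in each subset.
from collections import Counter
from typing import Dict, Iterable, Tuple

# Subsets listed in the order of erythroid maturation described in the text.
STAGE_ORDER = ["CD71-CD235-", "CD71+CD235-", "CD71+CD235+", "CD71-CD235+"]


def gate(cd71: float, cd235: float,
         cd71_cut: float = 1000.0, cd235_cut: float = 1000.0) -> str:
    """Label one event as positive or negative for each marker (assumed cutoffs)."""
    cd71_sign = "+" if cd71 >= cd71_cut else "-"
    cd235_sign = "+" if cd235 >= cd235_cut else "-"
    return f"CD71{cd71_sign}CD235{cd235_sign}"


def subset_percentages(events: Iterable[Tuple[float, float]],
                       cd71_cut: float = 1000.0,
                       cd235_cut: float = 1000.0) -> Dict[str, float]:
    """Percentage of events falling into each of the four subsets."""
    labels = [gate(cd71, cd235, cd71_cut, cd235_cut) for cd71, cd235 in events]
    if not labels:
        raise ValueError("no events supplied")
    counts = Counter(labels)
    return {stage: 100.0 * counts[stage] / len(labels) for stage in STAGE_ORDER}
```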



FIG. 30A and FIG. 30B. Regulated GATA1 rescues erythroid block after RPS19 editing. FIG. 30A. Proportion of CD71+ cells that also express CD235 is higher after GATA1 infection. FIG. 30B. Regulated GATA1 promotes erythroid colony formation.





DETAILED DESCRIPTION

As described herein, GATA-1 augmentation in erythroid cells can have therapeutic effects in Diamond-Blackfan anemia (DBA). However, existing methods of increasing GATA-1 expression in erythroid cells also necessarily increase expression in other cell types, e.g., in hematopoietic stem cells. These off-target effects can lead to damaging side effects and must be avoided in order to provide an actual treatment to subjects. That said, increasing the lineage-specific expression of therapeutic proteins including GATA-1 in vivo has proven challenging and has not yet been successfully done.


As described herein, the inventors have identified nucleic acid sequences comprising regulatory sequences that can restore early erythroid progenitor cell-specific GATA1 expression, thereby permitting a therapeutic approach for DBA. Briefly, the methods described herein relate to compositions and methods to increase lineage-specific expression of GATA1 in early erythroid progenitors but not in hematopoietic stem cells as a therapy for DBA. More specifically, described herein are methods of restoring early erythroid progenitor cell-specific GATA1 expression by contacting a population of early erythroid progenitor cells, including but not limited to cells that comprise a DBA-associated gene mutation, with a nucleic acid sequence, particle, or composition as described herein.


DBA is characterized by a specific reduction in the production of red blood (erythroid) cells and their precursors without defects in other hematopoietic lineages. Provided herein are methods of treating Diamond-Blackfan Anemia in a subject in need thereof, the method comprising administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition including but not limited to vectors with specific gene regulatory elements for the development of broadly applicable hematopoietic gene therapy approaches for DBA patients, as described herein.


Furthermore, provided herein are methods of restoring early erythroid progenitor cell-specific GATA1 expression, the method comprising contacting a population of cells comprising early erythroid progenitor cells with a nucleic acid sequence, particle, or composition as described herein.


Diamond-Blackfan anemia (DBA) is a congenital erythroid aplasia that usually presents in infancy. DBA causes low red blood cell counts (anemia), without substantially affecting the other blood components (the platelets and the white blood cells). About 47% of affected individuals also have a variety of congenital abnormalities, including craniofacial malformations, thumb or upper limb abnormalities, cardiac defects, urogenital malformations, and cleft palate. Low birth weight and generalized growth delay are sometimes observed. DBA patients have a modest risk of developing leukemia and other malignancies.


DBA is characterized by a specific reduction in the production of red blood (erythroid) cells and their precursors without defects in other hematopoietic lineages. In more than 50% of cases, DBA is caused by heterozygous loss-of-function mutations (haploinsufficiency) in one of 11 genes encoding ribosomal proteins, including the RPL5, RPL11, RPL35A, RPS10, RPS17, RPS19, RPS24, and RPS26 genes. These and other genes associated with Diamond-Blackfan anemia provide instructions for making ribosomal proteins. Approximately 25 percent of individuals with Diamond-Blackfan anemia have mutations in the RPS19 gene. About another 25 to 35 percent of individuals with this disorder have mutations in the RPL5, RPL11, RPL35A, RPS10, RPS17, RPS24, or RPS26 gene. Mutations in any of these genes are believed to cause problems with ribosome function. Studies indicate that a shortage of functioning ribosomes may increase the self-destruction of blood-forming cells in the bone marrow, resulting in anemia. Abnormal regulation of cell division or inappropriate triggering of apoptosis may contribute to the other health problems that affect some people with Diamond-Blackfan anemia.


Haploinsufficiency of ribosomal proteins can contribute to other cell-type specific diseases in humans, including congenital asplenia and T-cell lymphocytic leukemia. It is striking that mutations of such ubiquitously expressed ribosomal proteins result in such specific human disorders. Numerous theories have been proposed for the pathogenesis underlying these diseases. However, these models are unable to explain the exquisite cell-type specificity of DBA and the other ribosomal disorders.


In various embodiments, described herein are methods of restoring early erythroid progenitor cell-specific GATA1 expression, comprising contacting a population of cells comprising early erythroid progenitor cells with a nucleic acid sequence, particle, or composition as described herein. Furthermore, it is contemplated that the nucleic acid sequences, particles, or compositions described herein can be used to treat DBA by administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition as described herein to a patient in need of treatment for DBA.


As used herein, “GATA-1”, “GATA1”, or “GATA binding protein 1” is a protein that is encoded by the GATA1 gene. The protein encoded by this gene is a member of the GATA family of transcription factors. The protein plays an important role in erythroid development by regulating the switch of fetal hemoglobin to adult hemoglobin. The GATA1 gene is located on the X-chromosome (Xp11.23) and encodes a transcription factor that regulates the development of erythrocytes. Loss-of-function mutations in GATA-1 are linked to hematopoietic disorders, including DBA.


The GATA-1 polypeptide has three functional domains: an N-terminal transactivation domain (TD), essential for transcriptional activation activity; an N-terminal zinc finger (NF); and a C-terminal zinc finger (CF) responsible for binding to DNA. Exon 4 mutations have been identified in families with dyserythropoietic anemia, thrombocytopenia, thalassemia, and erythropoietic porphyria. Related germline mutations have also been described. The loss-of-function mutations of GATA-1 in DBA occur at the donor splice site of exon 2 in the GATA-1 gene and result in exon skipping.


Sequences for GATA1 are known in the art for a number of species, e.g., human GATA1 (the GATA1 NCBI Gene ID is 2623) mRNA sequences (e.g., NM_002049.3, XM_011543897.2, XM_011543898.2, and XM_024452363.1) and polypeptide sequences (e.g., NP_002040.1, XP_011542199.1, XP_011542200.1, XP_024308131.1). These, together with any naturally occurring allelic variants, splice variants, and processed forms thereof that catalyze the same reaction are contemplated for use in the methods and compositions described herein.


In some embodiments of any of the aspects, the GATA1 nucleic acid includes or is derived from human GATA1 having the following nucleic acid sequence CCDS14305.1 (SEQ ID NO: 1):









ATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCA





GTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCT





TCCCCTCTGGGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCG





AGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGA





GGCCTACAGACACTCCCCAGTCTTTCAGGTGTACCCATTGCTCAACTGTA





TGGAGGGGATCCCAGGGGGCTCACCATATGCCGGCTGGGCCTACGGCAAG





ACGGGGCTCTACCCTGCCTCAACTGTGTGTCCCACCCGCGAGGACTCTCC





TCCCCAGGCCGTGGAAGATCTGGATGGAAAAGGCAGCACCAGCTTCCTGG





AGACTTTGAAGACAGAGCGGCTGAGCCCAGACCTCCTGACCCTGGGACCT





GCACTGCCTTCATCACTCCCTGTCCCCAATAGTGCTTATGGGGGCCCTGA





CTTTTCCAGTACCTTCTTTTCTCCCACCGGGAGCCCCCTCAATTCAGCAG





CCTATTCCTCTCCCAAGCTTCGTGGAACTCTCCCCCTGCCTCCCTGTGAG





GCCAGGGAGTGTGTGAACTGCGGAGCAACAGCCACTCCACTGTGGCGGAG





GGACAGGACAGGCCACTACCTATGCAACGCCTGCGGCCTCTATCACAAGA





TGAATGGGCAGAACAGGCCCCTCATCCGGCCCAAGAAGCGCCTGATTGTC





AGTAAACGGGCAGGTACTCAGTGCACCAACTGCCAGACGACCACCACGAC





ACTGTGGCGGAGAAATGCCAGTGGGGATCCCGTGTGCAATGCCTGCGGCC





TCTACTACAAGCTACACCAGGTGAACCGGCCACTGACCATGCGGAAGGAT





GGTATTCAGACTCGAAACCGCAAGGCATCTGGAAAAGGGAAAAAGAAACG





GGGCTCCAGTCTGGGAGGCACAGGAGCAGCCGAAGGACCAGCTGGTGGCT





TTATGGTGGTGGCTGGGGGCAGCGGTAGCGGGAATTGTGGGGAGGTGGCT





TCAGGCCTGACACTGGGCCCCCCAGGTACTGCCCATCTCTACCAAGGCCT





GGGCCCTGTGGTGCTGTCAGGGCCTGTTAGCCACCTCATGCCTTTCCCTG





GACCCCTACTGGGCTCACCCACGGGCTCCTTCCCCACAGGCCCCATGCCC





CCCACCACCAGCACTACTGTGGTGGCTCCGCTCAGCTCATGA
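
A coding sequence such as SEQ ID NO: 1 can be checked against the corresponding polypeptide by translating it with the standard genetic code. The following is a minimal sketch of such a check; the function name is hypothetical, and the example input is the first 48 nucleotides of SEQ ID NO: 1 as printed above.

```python
# Hypothetical sketch: translate a coding sequence with the standard genetic code.
from itertools import product

_BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (standard genetic code; "*" = stop).
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon.

    Trailing bases that do not form a complete codon are ignored.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    first_48_nt = "ATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCC"
    print(translate(first_48_nt))  # prints MEFPGLGSLGTSEPLP
```

The printed residues correspond to the first 16 residues of a polypeptide beginning MEFPGLGSLGTSEPLP; translating the full sequence in the same way would allow a residue-by-residue comparison.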






In some embodiments of any of the aspects, the GATA1 mRNA sequence includes or is derived from human GATA1 having the following sequence NM_002049.3 (SEQ ID NO: 2):









GACACCCCCTGGGATCACACTGAGCTTGCCACATCCCCAAGGCGGCCGAA





CCCTCCGCAACCACCAGCCCAGGTTAATCCCCAGAGGCTCCATGGAGTTC





CCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCCAGTTTGTGGA





TCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTG





GGCCTGAGGGCTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCC





ACCGCTGCAGCTGCGGCACTGGCCTACTACAGGGACGCTGAGGCCTACAG





ACACTCCCCAGTCTTTCAGGTGTACCCATTGCTCAACTGTATGGAGGGGA





TCCCAGGGGGCTCACCATATGCCGGCTGGGCCTACGGCAAGACGGGGCTC





TACCCTGCCTCAACTGTGTGTCCCACCCGCGAGGACTCTCCTCCCCAGGC





CGTGGAAGATCTGGATGGAAAAGGCAGCACCAGCTTCCTGGAGACTTTGA





AGACAGAGCGGCTGAGCCCAGACCTCCTGACCCTGGGACCTGCACTGCCT





TCATCACTCCCTGTCCCCAATAGTGCTTATGGGGGCCCTGACTTTTCCAG





TACCTTCTTTTCTCCCACCGGGAGCCCCCTCAATTCAGCAGCCTATTCCT





CTCCCAAGCTTCGTGGAACTCTCCCCCTGCCTCCCTGTGAGGCCAGGGAG





TGTGTGAACTGCGGAGCAACAGCCACTCCACTGTGGCGGAGGGACAGGAC





AGGCCACTACCTATGCAACGCCTGCGGCCTCTATCACAAGATGAATGGGC





AGAACAGGCCCCTCATCCGGCCCAAGAAGCGCCTGATTGTCAGTAAACGG





GCAGGTACTCAGTGCACCAACTGCCAGACGACCACCACGACACTGTGGCG





GAGAAATGCCAGTGGGGATCCCGTGTGCAATGCCTGCGGCCTCTACTACA





AGCTACACCAGGTGAACCGGCCACTGACCATGCGGAAGGATGGTATTCAG





ACTCGAAACCGCAAGGCATCTGGAAAAGGGAAAAAGAAACGGGGCTCCAG





TCTGGGAGGCACAGGAGCAGCCGAAGGACCAGCTGGTGGCTTTATGGTGG





TGGCTGGGGGCAGCGGTAGCGGGAATTGTGGGGAGGTGGCTTCAGGCCTG





ACACTGGGCCCCCCAGGTACTGCCCATCTCTACCAAGGCCTGGGCCCTGT





GGTGCTGTCAGGGCCTGTTAGCCACCTCATGCCTTTCCCTGGACCCCTAC





TGGGCTCACCCACGGGCTCCTTCCCCACAGGCCCCATGCCCCCCACCACC





AGCACTACTGTGGTGGCTCCGCTCAGCTCATGAGGGCACAGAGCATGGCC





TCCAGAGGAGGGGTGGTGTCCTTCTCCTCTTGTAGCCAGAATTCTGGACA





ACCCAAGTCTCTGGGCCCCAGGCACCCCCTGGCTTGAACCTTCAAAGCTT





TTGTAAAATAAAACCACCAAAGTCCTGAAAAAAAAAAAAAAAAAAAAAAA





A






In some embodiments of any of the aspects, the GATA1 mRNA sequence includes or is derived from human GATA1 having the following sequence XM_011543898.2 (SEQ ID NO: 3):










GACACCCCCTGGGATCACACTGAGCTTGCCACATCCCCAAGGCGGCCGAACCCTCCGCAACCACCAGCCC






AGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCC





AGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGG





CTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTAC





AGGGACGCTGAGGCCTACAGACACTCCCCAGTCTTTCAGGTGTACCCATTGCTCAACTGTATGGAGGGGA





TCCCAGGGGGCTCACCATATGCCGGCTGGGCCTACGGCAAGACGGGGCTCTACCCTGCCTCAACTGTGTG





TCCCACCCGCGAGGACTCTCCTCCCCAGGCCGTGGAAGATCTGGATGGAAAAGGCAGCACCAGCTTCCTG





GAGACTTTGAAGACAGAGCGGCTGAGCCCAGACCTCCTGACCCTGGGACCTGCACTGCCTTCATCACTCC





CTGTCCCCAATAGTGCTTATGGGGGCCCTGACTTTTCCAGTACCTTCTTTTCTCCCACCGGGAGCCCCCT





CAATTCAGCAGCCTATTCCTCTCCCAAGCTTCGTGGAACTCTCCCCCTGCCTCCCTGTGAGGCCAGGGAG





TGTGTGAACTGCGGAGCAACAGCCACTCCACTGTGGCGGAGGGACAGGACAGGCCACTACCTATGCAACG





CCTGCGGCCTCTATCACAAGATGAATGGGCAGAACAGGCCCCTCATCCGGCCCAAGAAGCGCCTGATTGT





CAGTAAACGGGCAGGTACTCAGTGCACCAACTGCCAGACGACCACCACGACACTGTGGCGGAGAAATGCC





AGTGGGGATCCCGTGTGCAATGCCTGCGGCCTCTACTACAAGCTACACCAGGTGAACCGGCCACTGACCA





TGCGGAAGGATGGTATTCAGACTCGAAACCGCAAGGCATCTGGAAAAGGGAAAAAGAAACGGGGCTCCAG





TCTGGGAGGCACAGGAGCAGCCGAAGGACCAGCTGGTGGCTTTATGGTGGTGGCTGGGGGCAGCGGTAGC





GGGAATTGTGGGGAGGTGGCTTCAGGCCTGACACTGGGCCCCCCAGGTACTGCCCATCTCTACCAAGGCC





TGGGCCCTGTGGTGCTGTCAGGGCCTGTTAGCCACCTCATGCCTTTCCCTGGACCCCTACTGGGCTCACC





CACGGGCTCCTTCCCCACAGGCCCCATGCCCCCCACCACCAGCACTACTGTGGTGGCTCCGCTCAGCTCA





TGAGGGCACAGAGCATGGCCTCCAGAGGAGGGGTGGTGTCCTTCTCCTCTTGTAGCCAGAATTCTGGACA





ACCCAAGTCTCTGGGCCCCAGGCACCCCCTGGCTTGAACCTTCAAAGCTTTTGTAAAATAAAACCACCAA





AGTCCTGAAAAAAAAAAAAAAAAAAAAAAAA






In some embodiments of any of the aspects, the GATA1 mRNA sequence includes or is derived from human GATA1 having the following sequence XM_024452363.1 (SEQ ID NO: 4):










GGAAGGGAGCCTCAAAGGCCAAGGCCAGCCAGGACACCCCCTGGGATCACACTGAGCTTGCCACATCCCC






AAGGCGGCCGAACCCTCCGCAACCACCAGCCCAGTCTTTCAGGTGTACCCATTGCTCAACTGTATGGAGG





GGATCCCAGGGGGCTCACCATATGCCGGCTGGGCCTACGGCAAGACGGGGCTCTACCCTGCCTCAACTGT





GTGTCCCACCCGCGAGGACTCTCCTCCCCAGGCCGTGGAAGATCTGGATGGAAAAGGCAGCACCAGCTTC





CTGGAGACTTTGAAGACAGAGCGGCTGAGCCCAGACCTCCTGACCCTGGGACCTGCACTGCCTTCATCAC





TCCCTGTCCCCAATAGTGCTTATGGGGGCCCTGACTTTTCCAGTACCTTCTTTTCTCCCACCGGGAGCCC





CCTCAATTCAGCAGCCTATTCCTCTCCCAAGCTTCGTGGAACTCTCCCCCTGCCTCCCTGTGAGGCCAGG





GAGTGTGTGAACTGCGGAGCAACAGCCACTCCACTGTGGCGGAGGGACAGGACAGGCCACTACCTATGCA





ACGCCTGCGGCCTCTATCACAAGATGAATGGGCAGAACAGGCCCCTCATCCGGCCCAAGAAGCGCCTGAT





TGTCAGTAAACGGGCAGGTACTCAGTGCACCAACTGCCAGACGACCACCACGACACTGTGGCGGAGAAAT





GCCAGTGGGGATCCCGTGTGCAATGCCTGCGGCCTCTACTACAAGCTACACCAGGTGAACCGGCCACTGA





CCATGCGGAAGGATGGTATTCAGACTCGAAACCGCAAGGCATCTGGAAAAGGGAAAAAGAAACGGGGCTC





CAGTCTGGGAGGCACAGGAGCAGCCGAAGGACCAGCTGGTGGCTTTATGGTGGTGGCTGGGGGCAGCGGT





AGCGGGAATTGTGGGGAGGTGGCTTCAGGCCTGACACTGGGCCCCCCAGGTACTGCCCATCTCTACCAAG





GCCTGGGCCCTGTGGTGCTGTCAGGGCCTGTTAGCCACCTCATGCCTTTCCCTGGACCCCTACTGGGCTC





ACCCACGGGCTCCTTCCCCACAGGCCCCATGCCCCCCACCACCAGCACTACTGTGGTGGCTCCGCTCAGC





TCATGAGGGCACAGAGCATGGCCTCCAGAGGAGGGGTGGTGTCCTTCTCCTCTTGTAGCCAGAATTCTGG





ACAACCCAAGTCTCTGGGCCCCAGGCACCCCCTGGCTTGAACCTTCAAAGCTTTTGTAAAATAAAACCAC





CAAAGTCCTGAAA






In some embodiments of any of the aspects, the GATA1 mRNA sequence includes or is derived from human GATA1 having the following sequence XM_011543897.2 (SEQ ID NO: 5):










GACACCCCCTGGGATCACACTGAGCTTGCCACATCCCCAAGGCGGCCGAACCCTCCGCAACCACCAGCCC






AGGTTAATCCCCAGAGGCTCCATGGAGTTCCCTGGCCTGGGGTCCCTGGGGACCTCAGAGCCCCTCCCCC





AGTTTGTGGATCCTGCTCTGGTGTCCTCCACACCAGAATCAGGGGTTTTCTTCCCCTCTGGGCCTGAGGG





CTTGGATGCAGCAGCTTCCTCCACTGCCCCGAGCACAGCCACCGCTGCAGCTGCGGCACTGGCCTACTAC





AGGGACGCTGAGGCCTACAGACACTCCCCAGTCTTTCAGGTGTACCCATTGCTCAACTGTATGGAGGGGA





TCCCAGGGGGCTCACCATATGCCGGCTGGGCCTACGGCAAGACGGGGCTCTACCCTGCCTCAACTGTGTG





TCCCACCCGCGAGGACTCTCCTCCCCAGGCCGTGGAAGATCTGGATGGAAAAGGCAGCACCAGCTTCCTG





GAGACTTTGAAGACAGAGCGGCTGAGCCCAGACCTCCTGACCCTGGGACCTGCACTGCCTTCATCACTCC





CTGTCCCCAATAGTGCTTATGGGGGCCCTGACTTTTCCAGTACCTTCTTTTCTCCCACCGGGAGCCCCCT





CAATTCAGCAGCCTATTCCTCTCCCAAGCTTCGTGGAACTCTCCCCCTGCCTCCCTGTGAGGCCAGGGAG





TGTGTGAACTGCGGAGCAACAGCCACTCCACTGTGGCGGAGGGACAGGACAGGCCACTACCTATGCAACG





CCTGCGGCCTCTATCACAAGATGAATGGGCAGAACAGGCCCCTCATCCGGCCCAAGAAGCGCCTGATTGT





CAGTAAACGGGCAGGTACTCAGTGCACCAACTGCCAGACGACCACCACGACACTGTGGCGGAGAAATGCC





AGTGGGGATCCCGTGTGCAATGCCTGCGGCCTCTACTACAAGCTACACCAGGTGAACCGGCCACTGACCA





TGCGGAAGGATGGTATTCAGACTCGAAACCGCAAGGCATCTGGAAAAGGGAAAAAGAAACGGGGCTCCAG





TCTGGGAGGCACAGGAGCAGCCGAAGGACCAGCTGGTGGCTTTATGGTGGTGGCTGGGGGCAGCGGTAGC





GGGAATTGTGGGGAGGTGGCTTCAGGCCTGACACTGGGCCCCCCAGGTACTGCCCATCTCTACCAAGGCC





TGGGCCCTGTGGTGCTGTCAGGGCCTGTTAGCCACCTCATGCCTTTCCCTGGACCCCTACTGGGCTCACC





CACGGGCTCCTTCCCCACAGGCCCCATGCCCCCCACCACCAGCACTACTGTGGTGGCTCCGCTCAGCTCA





TGAGGGCACAGAGCATGGCCTCCAGAGGAGGGGTGGTGTCCTTCTCCTCTTGTAGCCAGAATTCTGGACA





ACCCAAGTCTCTGGGCCCCAGGCACCCCCTGGCTTGAACCTTCAAAGCTTTTGTAAAATAAAACCACCAA





AGTCCTGAAAAAAAAAAAAAAAAAAAAAAAA






In some embodiments of any of the aspects, the GATA1 polypeptide includes or is derived from human GATA1 having the following amino acid sequence NP_002040.1 (SEQ ID NO: 6):










MEFPGLGSLGTSEPLPQFVDPALVSSTPESGVFFPSGPEGLDAAASSTAPSTATAAAAALAYYRDAEAYR






HSPVFQVYPLLNCMEGIPGGSPYAGWAYGKTGLYPASTVCPTREDSPPQAVEDLDGKGSTSFLETLKTER





LSPDLLTLGPALPSSLPVPNSAYGGPDFSSTFFSPTGSPLNSAAYSSPKLRGTLPLPPCEARECVNCGAT





ATPLWRRDRTGHYLCNACGLYHKMNGQNRPLIRPKKRLIVSKRAGTQCTNCQTTTTTLWRRNASGDPVCN





ACGLYYKLHQVNRPLTMRKDGIQTRNRKASGKGKKKRGSSLGGTGAAEGPAGGFMVVAGGSGSGNCGEVA





SGLTLGPPGTAHLYQGLGPVVLSGPVSHLMPFPGPLLGSPTGSFPTGPMPPTTSTTVVAPLSS






In some embodiments of any of the aspects, the GATA1 polypeptide includes or is derived from human GATA1 having the following amino acid sequence XP_011542199.1 (SEQ ID NO: 7):










MEFPGLGSLGTSEPLPQFVDPALVSSTPESGVFFPSGPEGLDAAASSTAPSTATAAAAALAYYRDAEAYR






HSPVFQVYPLLNCMEGIPGGSPYAGWAYGKTGLYPASTVCPTREDSPPQAVEDLDGKGSTSFLETLKTER





LSPDLLTLGPALPSSLPVPNSAYGGPDFSSTFFSPTGSPLNSAAYSSPKLRGTLPLPPCEARECVNCGAT





ATPLWRRDRTGHYLCNACGLYHKMNGQNRPLIRPKKRLIVSKRAGTQCTNCQTTTTTLWRRNASGDPVCN





ACGLYYKLHQPPFWQVNRPLTMRKDGIQTRNRKASGKGKKKRGSSLGGTGAAEGPAGGFMVVAGGSGSGN





CGEVASGLTLGPPGTAHLYQGLGPVVLSGPVSHLMPFPGPLLGSPTGSFPTGPMPPTTSTTVVAPLSS






In some embodiments of any of the aspects, the GATA1 polypeptide includes or is derived from human GATA1 having the following amino acid sequence XP_011542200.1 (SEQ ID NO: 64):










MEGIPGGSPYAGWAYGKTGLYPASTVCPTREDSPPQAVEDLDGKGSTSFLETLKTERLSPDLLTLGPALP






SSLPVPNSAYGGPDFSSTFFSPTGSPLNSAAYSSPKLRGTLPLPPCEARECVNCGATATPLWRRDRTGHY





LCNACGLYHKMNGQNRPLIRPKKRLIVSKRAGTQCTNCQTTTTTLWRRNASGDPVCNACGLYYKLHQPPF





WQVNRPLTMRKDGIQTRNRKASGKGKKKRGSSLGGTGAAEGPAGGFMVVAGGSGSGNCGEVASGLTLGPP





GTAHLYQGLGPVVLSGPVSHLMPFPGPLLGSPTGSFPTGPMPPTTSTTVVAPL






In some embodiments of any of the aspects, the GATA1 polypeptide includes or is derived from human GATA1 having the following amino acid sequence XP_024308131.1 (SEQ ID NO: 65):










MEGIPGGSPYAGWAYGKTGLYPASTVCPTREDSPPQAVEDLDGKGSTSFLETLKTERLSPDLLTLGPALP






SSLPVPNSAYGGPDFSSTFFSPTGSPLNSAAYSSPKLRGTLPLPPCEARECVNCGATATPLWRRDRTGHY





LCNACGLYHKMNGQNRPLIRPKKRLIVSKRAGTQCTNCQTTTTTLWRRNASGDPVCNACGLYYKLHQVNR





PLTMRKDGIQTRNRKASGKGKKKRGSSLGGTGAAEGPAGGFMVVAGGSGSGNCGEVASGLTLGPPGTAHL





YQGLGPVVLSGPVSHLMPFPGPLLGSPTGSFPTGPMPPTTSTTVVAPLSS






In some embodiments of any of the aspects, the sequence encoding a GATA-binding factor 1 (GATA1) polypeptide comprises at least 60% sequence identity to a nucleotide sequence encoding a human GATA1 polypeptide. In some embodiments of any of the aspects, the sequence encoding a GATA-binding factor 1 (GATA1) polypeptide comprises a nucleotide sequence encoding a human GATA1 polypeptide.


In some embodiments of any of the aspects, a sequence encoding a GATA1 polypeptide comprises, consists of, or consists essentially of a nucleic acid sequence selected from any of SEQ ID NOs. 1-5. In some embodiments of any of the aspects, a sequence encoding a GATA1 polypeptide comprises, consists of, or consists essentially of a nucleic acid sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to one of SEQ ID NOs. 1-5. In some embodiments of any of the aspects, a sequence encoding a GATA1 polypeptide comprises, consists of, or consists essentially of a nucleic acid sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to one of SEQ ID NOs. 1-5, which encodes a polypeptide which retains the GATA1 wild-type activity, e.g., it has transcription factor activity as described herein.


In some embodiments of any of the aspects, a GATA1 polypeptide comprises, consists of, or consists essentially of an amino acid sequence selected from any of SEQ ID NOs. 6, 7, 64 and/or 65. In some embodiments of any of the aspects, a GATA1 polypeptide comprises, consists of, or consists essentially of an amino acid sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to one of SEQ ID NOs. 6, 7, 64 and/or 65. In some embodiments of any of the aspects, a GATA1 polypeptide comprises, consists of, or consists essentially of an amino acid sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to one of SEQ ID NOs. 6, 7, 64 and/or 65, which retains the GATA1 wild-type activity, e.g., it has transcription factor activity as described herein.
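The sequence identity thresholds recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned by some pairwise alignment method (none is specified here) and that identity is counted over alignment columns; other conventions for the denominator exist. The function names `percent_identity` and `thresholds_met`, and the toy sequences in the usage note, are hypothetical and are not taken from the sequence listing.

```python
# Thresholds recited in the text, in percent.
THRESHOLDS = (60, 80, 85, 90, 95, 98)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption: both inputs are already aligned (equal length, '-' marks a gap).
    Columns in which both sequences carry a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no residue in either sequence
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def thresholds_met(identity: float) -> list:
    """Return every recited threshold that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]
```

For example, `percent_identity("ACGTACGTAC", "ACGTTCGTAC")` compares two toy 10-residue sequences that differ at one position, and `thresholds_met` reports which of the recited thresholds that value satisfies.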


Hematopoietic stem cells (HSCs) are the stem cells that give rise to other blood cells through a process called hematopoiesis, by which all mature blood cells are produced. Hematopoiesis occurs in the red bone marrow, in the core of most bones. In embryonic development, the red bone marrow is derived from the layer of the embryo called the mesoderm. Hematopoiesis must balance enormous production needs with the need to precisely regulate the number of each blood cell type in the circulation. In vertebrates, the vast majority of hematopoiesis occurs in the bone marrow and is derived from a limited number of HSCs that are multipotent and capable of extensive self-renewal. HSCs are found in the bone marrow of adults, especially in the pelvis, femur, and sternum. They are also found in umbilical cord blood and, in small numbers, in peripheral blood. Mammalian hematopoiesis produces approximately 10 distinct cell types, the most abundant of which belongs to the erythroid lineage. Erythropoiesis results in the production of large numbers of red blood cells that are responsible for supplying oxygen to the developing embryonic, fetal, and adult tissues. Red blood cells also help maintain blood viscosity and provide the shear stress required for vascular development and remodeling.


As used herein, the term “Hematopoietic stem cell” or “HSC” refers to a clonogenic, self-renewing pluripotent cell capable of ultimately differentiating into all cell types of the hematopoietic system, including B cells, T cells, NK cells, lymphoid dendritic cells, myeloid dendritic cells, granulocytes, macrophages, megakaryocytes, and erythroid cells. As with other cells of the hematopoietic system, HSCs can be defined by the presence of a characteristic set of cell markers. In some embodiments of any of the aspects, an HSC can be a cell which expresses CD34, CD90, or the combination thereof. Other marker signatures used to identify HSCs include, but are not limited to: EMCN+, CD34+, CD59+, CD90+, CD117+, CD133+, CD38, lin, CD150+, CD48, and CD244.


GATA1 protein levels are suppressed in HSCs from DBA patients and increasing GATA1 expression specifically in those cells can ameliorate the erythroid lineage commitment defect characteristic of DBA. The expression of GATA1 during terminal erythropoiesis needs to be regulated.


In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising a) at least one heterologous regulatory sequence selected from i) a hematopoietic enhancer element and/or ii) a binding site for an HSC-restricted miRNA; and b) a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.


Regulatory sequences as disclosed herein include but are not limited to promoters, enhancers and other expression control elements (e.g., polyadenylation signals) that control the transcription or translation of a gene they are operably linked to. Such regulatory sequences are described, for example, in Goeddel, Gene Expression Technology, Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Examples of regulatory sequences for mammalian host cell expression include viral elements that direct high levels of protein expression in mammalian cells, such as promoters and/or enhancers derived from cytomegalovirus (CMV), Simian Virus 40 (SV40), adenovirus (e.g., the adenovirus major late promoter (AdMLP)) and polyoma. Alternatively, nonviral regulatory sequences may be used, such as the ubiquitin promoter, Elongation factor 1-alpha 1 (eEF1a1) promoter or β-globin promoter. A eukaryotic promoter is a regulatory region of DNA located upstream of a gene that binds transcription factor II D (TFIID) and allows the subsequent coordination of components of the transcription initiation complex, facilitating recruitment of RNA polymerase II and initiation of transcription.


In some embodiments of any of the aspects, disclosed herein are heterologous regulatory sequences or combinations thereof that permit carefully regulated expression of GATA1 in hematopoietic progenitors to improve erythropoiesis in DBA without unwanted effects on hematopoiesis.


As used herein, “HSC-restricted”, e.g., as used in reference to regulatory sequences, is an activity or element which preferentially occurs or exists in HSCs as compared to other cells of the hematopoietic lineage (e.g. erythrocytes or erythroid precursors). In some embodiments of any of the aspects, the activity or element occurs or exists at a level in HSCs which is at least 10×, at least 100×, or higher than in other cells of the hematopoietic lineage (e.g. erythrocytes or erythroid precursors). More specifically, an HSC-restricted miRNA is a miRNA that is expressed at higher (e.g., 10×, 100×, or higher) levels in HSCs than in other cells of the hematopoietic lineage (e.g. erythrocytes or erythroid precursors).
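The fold-enrichment criterion in the definition above can be sketched as a simple check. This is an illustration under stated assumptions: expression is summarized as one non-negative number per cell type, the HSC level is compared against the highest level among the other hematopoietic cell types, and the function names and the example values in the usage note are hypothetical. The text does not prescribe a particular measurement or comparison method.

```python
def fold_enrichment(hsc_level: float, other_levels: dict) -> float:
    """Ratio of the HSC level to the highest level in any other cell type.

    Assumption: levels are non-negative and on the same scale.
    """
    if not other_levels:
        raise ValueError("at least one comparison cell type is required")
    highest_other = max(other_levels.values())
    if highest_other == 0:
        # Undetectable elsewhere: enrichment is unbounded if HSCs express at all.
        return float("inf") if hsc_level > 0 else 0.0
    return hsc_level / highest_other


def is_hsc_restricted(hsc_level: float, other_levels: dict,
                      min_fold: float = 10.0) -> bool:
    """True if the HSC level is at least `min_fold` times every other level."""
    return fold_enrichment(hsc_level, other_levels) >= min_fold
```

With hypothetical levels of 500.0 in HSCs, 20.0 in erythroid precursors and 5.0 in erythrocytes, the enrichment is 25-fold, which satisfies the 10× criterion but not the 100× criterion.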


The term “heterologous” refers to a combination of elements which is not naturally occurring. For example, a heterologous regulatory sequence is one that is not naturally found operably connected to the coding sequence being considered. In some embodiments of any of the aspects, the heterologous regulatory sequence can be a regulatory sequence not naturally found in that species.


As used herein, “regulatory sequence” refers to a nucleic acid sequence that is capable of increasing or decreasing the expression of specific genes, nucleic acid sequences or polypeptides.


In some embodiments of any of the aspects, the heterologous regulatory sequence is a hematopoietic enhancer element. A hematopoietic enhancer element is an enhancer element which is active in hematopoietic cells, e.g., in HSCs and/or in other cells in the erythroid lineage. In some embodiments, the hematopoietic enhancer element is active in cells undergoing erythropoiesis. A hematopoietic enhancer element is not necessarily exclusively active in any of the foregoing cells. Alternatively, in some embodiments of any of the aspects, the hematopoietic enhancer element can be HSC-restricted and/or restricted to erythroid precursors/progenitors. In some embodiments, the enhancer element is located distal to the sequence encoding GATA1, e.g., it is a distal enhancer element. Suitable enhancer elements can readily be identified by one of skill in the art by consulting, e.g., expression data freely available on the world wide web for one or more cell types in the erythroid lineage and identifying genes which are expressed or highly expressed in those cells.


In some embodiments of any of the aspects, the heterologous enhancer element comprises the following nucleic acid sequence: NC_000023.11:48638900-48639300 on Homo sapiens chromosome X, GRCh38.p12 Primary Assembly (SEQ ID NO: 10):










ACTTTCATGAAATTACTGACATAATTTTGGGTCCAAAATTTCAAAATTTTAAATATTTTTATTTGGAATT






TTAAAATAATTTATATGCTCTTTTTACTGGCTAATAATGCTATTCATTATAATCTGATATTCAAACTGTC





TAAAAAAGTTAACAATCATTGATTTATTTGTTGTATATACAGTTTATTTCTATGACAGTTTTAATGTCAC





CTAATATTATTTTTAATGTTTCAATTTCTCATTTAAATACATTTTGTGTTGTTTATTTTAATCTCATTCA





ATCTGTATGTGCAAATGGCTTAGAAAAAAAGGCCATATATGACAAGCCCACAGCTAACATCATATAGTCA





ACAGTGAAAAACTAAAAGCTTCTCCTTTAAGATCAGGAACAAGGCAAGGAT






In some embodiments of any of the aspects, the heterologous enhancer element comprises the following nucleic acid sequence: NC_000023.11:48641200-48641700 on Homo sapiens chromosome X, GRCh38.p12 Primary Assembly (SEQ ID NO: 11):










TTTTATTATTTATTTATTTTTTTGAGACAGATTCTCACTCTGTCGCCTAGGCTGGAATGCAATGGCGTGA






TCCCGGCTCACTGCAACCTCTGCCTCCCAGGTTCAAGCGATTCTCCTGCCTCAGCCTCCCGAGTAGCTGG





GATTACAGGCATGCGCCACCACGCCTGGCTAATTTTTTGTATTTTTAGTAGAGACAGGGTTTCTCCATGT





TGGTCAGGCTGGTCTCGAACTACCGACCTTAGGTAATCCTCCCACCTCGGCCTCCGAAAGTGCTGGGATT





ACAGGCGTGAGCCACTGCGCCCGGCCTACATTTATTTTTAAATAAATGGATTTAAATGTTAAGACCTGAA





CCTATAAAAATGGGACACCTGCATAGGGCATTAACCATGAGTAGAGCTTGCAGGACTGGAAGTTGCTATG





GGTGAGTCAGTGTGTGAGTGGTGAGTGAATGGGAAGGCCTAGGACATTCCTGTACACTACCATGGACTTT





ATAAATTCTGT






In some embodiments of any of the aspects, the heterologous enhancer element comprises the following nucleic acid sequence: NC_000023.11:48644250-48645100 on Homo sapiens chromosome X, GRCh38.p12 Primary Assembly (SEQ ID NO: 12):










TCATAGAAACAAAACACTAGGATGGTGGTTGCCAGGGGCTGAGAGGATGGGGAAATGGGGAGTTGCTGTT






CAATGGATATTGCGCCCGGCCAGCCACACCAATTCTTACACCAAGAAGTGATGGAGCACAAGTGCTGATG





GGCCTTAACACCATCATAAACATCTTTTGTTTGTCCCGGGGAAGAAATTCCCAACTCCTTCCAAAGGTCT





GCCAAAGTCTACCAGTATCCCAAGCTGATTTCCTTATCCCCTCAGCAGATGCTGGAAAGCTGGAAGTCTC





CTTCCTTCTCACTCTCCTGCTTGACATCTGCACAGCCATTCTTCTTCCTCCCCTTGCTCCCCTTCCTCCC





CTTCTCCTTCTCCTACTTATTGAGACAGAGTCTCGCTCTGTCGCCGAGGCTGGAGTGCAGTGGTGTCATC





TCGGCTCACTGCAACCTCTGCCTCCTGGGTTCAAGCAATTCTCTTGCCTCCACCTCCTGAGTAGGTGGGA





TTACAGGTGTGTGCCACCACAGCAGGCTAATTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATATTGG





CCAGGATGGTCTCGAACTCCTGACCTCAGGTGATCTGCCTGTCTTGGCCTCCCAAAGTGCCGGGATTACA





GGCATGAGCCACCGGCGCCCGGCCCTTTTTATTATTATATATTATTTTTGAGACTGGGTCTCACTCTGTA





ATCCAGGCTGGAGGGCAGTGGCGTGATCACAGCTCACTGCAGCCCTGACCTCTTGGGCACAAGCAGTCCT





CCCGCGTCAGCCACCCAAAGTGCTGGGTCTACAGGCATGAGCTACTGTGCCCAGTCTACGATTTTTTTAA





AATTTATAATT
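The three enhancer elements above are identified by genomic intervals of the form accession:start-end. As a minimal sketch, the length implied by such an interval can be computed as follows, assuming 1-based coordinates with both endpoints included (an assumption about the coordinate convention, not a statement from the text). The pattern and function names are hypothetical.

```python
import re

# accession (e.g. NC_000023.11), then start and end positions
REGION_PATTERN = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+):(?P<start>\d+)-(?P<end>\d+)$"
)


def parse_region(region: str) -> tuple:
    """Split 'accession:start-end' into (accession, start, end)."""
    match = REGION_PATTERN.match(region.strip())
    if match is None:
        raise ValueError(f"unrecognized region string: {region!r}")
    start = int(match.group("start"))
    end = int(match.group("end"))
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    return match.group("accession"), start, end


def region_length(region: str) -> int:
    """Number of bases in the interval, counting both endpoints."""
    _, start, end = parse_region(region)
    return end - start + 1
```

Under this convention, `region_length("NC_000023.11:48638900-48639300")` gives 401, and the intervals recited for SEQ ID NO: 11 and SEQ ID NO: 12 give 501 and 851, respectively.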






In some embodiments of any of the aspects, the heterologous enhancer element comprises the following nucleic acid sequence (SEQ ID NO: 38):










ATGAAACCATATCTGCTATTTTCATTTATCTTGGTTTCAGCCTATTTTGCTTGTCTGGACACTACAGTCCACGGGAGCCTAGG






TCGAGCGAGGTCCAAGAATCCCCAGGGTGGGCAGGGAGGGTGGAAGAGGGCCTCCAGTGCCCAAGAGGTGCCCCACAAGCATG





GGACCCGCCCCCTCCCCTGGACTGCCCCACCCACTGGGGCACCAGCCACTCCCTGGGGAGGAGGGAGGAGGGAGAAGGGAGGG





AGGGAGGGAGGGAGGAAGGGAGCCTCAAAGGCCAAGGCCAGCCAGGACACCCCCTGGGATCACACTGAGCTTGCCACATCCCC





AAGGCGGCCGAACCCTCCGCAACCACCAGCCCAGAGATCTAGAGTTAATCCCCAGAGGCTCCATGGTGAGCAAGGGCGAGGAG





CTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGA





CCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCC





GAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGA





CACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACT





ACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG





GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTA





CCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCG





GGATCACTCTCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGCATCGATACCGTCGACCTCGATCGAGACCTAGAAAAACAT





GGAGCAATCACAAGTAG






In some embodiments of any of the aspects, the heterologous enhancer element comprises the following nucleic acid sequence (SEQ ID NO: 39):










ATGGCGGGCAAGAAGTTGAGGCCACTGTCCCTGGGTGTTCCTACCCCCACACCCTCACCCCAAGACAGCCTGTTACTGCGGCG






CCAACAGCCACGGTCGCCTACATCTGATAAGACTTATCTGCTGCCCCAGGGCAGGCCGGAGCTGGCGTAAGCCCCAGTGGGGC





GCTAAGTGAGTGTGCCCCTGCCTCCCGCCAGCACTGGCCTGGCCTGCAGGCTTAGCCTGGGTCATCAAGGTATCCCACAGGCT





CTAGTTCAAATCCAGCAGAACCTCTCTGAGCCTCACTCTTCTCACCTGCAAAATGGGTACAGCCACATCCCTTCTCTCCCTGC





AGCCAGGAAGACGCACATACACAGGAGTCTAGCCCACACCGGCCCCGCACAAATTAAGGGCTTTACTCTCTGAAAAGCCCAGT





GAAGTCATGAAACCATATCTGCTATTTTCATTTATCTTGGTTTCAGCCTATTTTGCTTGTCTGGACACTACAGTCCACGGGAG





CCTAGGTCGAGCGAGGTCCAAGAATCCCCAGGGTGGGCAGGGAGGGTGGAAGAGGGCCTCCAGTGCCCAAGAGGTGCCCCACA





AGCATGGGACCCGCCCCCTCCCCTGGACTGCCCCACCCACTGGGGCACCAGCCACTCCCTGGGGAGGAGGGAGGAGGGAGAAG





GGAGGGAGGGAGGGAGGGAGGAAGGGAGCCTCAAAGGCCAAGGCCAGCCAGGACACCCCCTGGGATCACACTGAGCTTGCCAC





ATCCCCAAGGCGGCCGAACCCTCCGCAACCACCAGCCCAGAGATCTAGA






In some embodiments of any of the aspects, the hematopoietic enhancer element comprises, consists of, or consists essentially of a sequence of at least 80% homology to a nucleotide sequence that is selected from the group consisting of: SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 38 and/or SEQ ID NO: 39. In some embodiments of any of the aspects, a hematopoietic enhancer element comprises, consists of, or consists essentially of a sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to one of SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 38 and/or SEQ ID NO: 39. In some embodiments of any of the aspects, the nucleic acid sequence described herein comprises at least one, or at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 20, or at least 25, or at least 30 hematopoietic enhancer elements. Where a subset of the foregoing hematopoietic enhancer elements is used, any combination of the hematopoietic enhancer elements can be used in each of various embodiments of the aspects described herein. For example, it is specifically contemplated herein that any pairwise combination of the foregoing hematopoietic enhancer elements can be used, e.g., any combination shown in Table 1.









TABLE 1

Contemplated exemplary combinations of enhancer elements are indicated by “X”

Enhancer element  SEQ ID NO: 10    SEQ ID NO: 11    SEQ ID NO: 12    SEQ ID NO: 38    SEQ ID NO: 39
SEQ ID NO: 10                      X                X                X                X
SEQ ID NO: 11     X                                 X                X                X
SEQ ID NO: 12     X                X                                 X                X
SEQ ID NO: 38     X                X                X                                 X
SEQ ID NO: 39     X                X                X                X
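The pairwise combinations marked in Table 1 can be enumerated with a short sketch. It assumes that order within a pair does not matter (the table is symmetric) and that an element is not paired with itself (the diagonal is empty). The names `ENHANCER_SEQ_IDS` and `pairwise_combinations` are hypothetical labels introduced for illustration.

```python
from itertools import combinations

# SEQ ID NOs of the five enhancer elements listed in Table 1.
ENHANCER_SEQ_IDS = (10, 11, 12, 38, 39)


def pairwise_combinations(seq_ids=ENHANCER_SEQ_IDS):
    """Return every unordered pair of distinct enhancer elements."""
    return list(combinations(seq_ids, 2))


if __name__ == "__main__":
    for first, second in pairwise_combinations():
        print(f"SEQ ID NO: {first} + SEQ ID NO: {second}")
```

Five elements yield ten unordered pairs, which corresponds to the ten cells above (or, equivalently, below) the empty diagonal of Table 1.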









In some embodiments of any of the aspects, the hematopoietic enhancer element can be an enhancer element of a gene selected from the group consisting of: Kell metallo-endopeptidase (KEL), 5-aminolevulinate synthase 2 (ALAS2), and glycophorin A (GYPA).


As used herein, “KEL”, “ECE3”, “CD238”, or “Kell metallo-endopeptidase” is a type II transmembrane glycoprotein that is the highly polymorphic Kell blood group antigen. Sequences for KEL are known for a number of species, e.g., human KEL (NCBI Gene ID 3792); the nucleic acid sequence (e.g., NG_007492.2), mRNA sequences (e.g., NM_000420.3) and polypeptide sequences (e.g., NP_000411.1) are known in the art. These, together with any naturally occurring allelic, splice variants, and processed forms thereof that catalyze the same reaction are contemplated for use in the methods and compositions described herein.


In some embodiments of any of the aspects, the KEL enhancer element includes or is derived from human KEL sequences having the following nucleic acid sequence NG_007492.2 (SEQ ID NO: 40):










NG_007492.2:5001-26303 Homo sapiens Kell metallo-endopeptidase (Kell blood group) (KEL), RefSeqGene on chromosome 7


GGGAGGAGAAGCCTGGGTGCCCCCCACTGATAAGCAGGCTCCACCCAGAGGCCAGTCCTGTGTGTCTGGG





GACAAGGCGAAAGAGCAGCAGAAGTGCCCCTTCTCCAGGATCAAGGAACTGGGGCGGGGGGTGTTTCCTG





GACCCCAGTCCTCCGAATCAGCTCCTAGAGTGGAACCAGGAAGGATTCTGGAGCCACAGAAGATAGACAG





ATGGTAAGTCCCCTTTTGGAGTCAGAGGCTTAGCGGGGAGGGGTGAGGGTGGCTGTGTGCAAAAGTCCTG





CCCCCACTGGAGGGGAGGGAATGTAAGGCTTACAGAGTAGAAAGGTGGGGAGAGAGGGAGGTAATGGGAG





AGGGATCGAGAAATGGCACATTCAGGGGACAGGTT GTTCTGAAGCCCATCTGGGAACACTGCTCCGAGA





TAAAAATATGTGTGTGGGGGCAGGGCAGGCAGCGAGGGTATCAAAATGGCCTGATAAAACTCTCTTCAAT





GCACCATTTCCTGAACCAGCTTCTCTCTCCTCCTTCTCCCTCCACTCACTTCAGGAAGGTGGGGACCAAA





GTGAGGAAGAGCCGAGGGAACGCAGCCAGGCAGGTGGAATGGGAACTCTCTGGAGCCAAGAGGTAAGTGG





CCTCCTCTCCTGGGTCTGGAATACACTGATGTTGTCACTCTCGGCTCTAAAATCCCACAAACACTCATCT





ACTAACTGTCTGCTTCATCCTCACCCAAAACAGTTGACATTCCTTGTTTTCTCATCTCCCAGGAGTTAAA





GTAGGGCTGGGTTTAGGAAGAATTGGGATAATTATTTCTGTATAAAGGGACTGTAGCACCAACAGATTCA





TTCTCTCTCCTCTTCTTCCCATCCCTGTCTCTCAACCCCCATCTTGTATCTTTCACCTCTTGGTTCCTCC





CACAGAGCACTCCAGAAGAGAGGCTGCCCGTGGAAGGGAGCAGGCCATGGGCAGTGGCCAGGCGGGTGCT





GACAGCTATCCTGATTTTGGGCCTGCTCCTTTGTTTTTCTGTGCTTTTGTTCTACAACTTCCAGAACTGT





GGCCCTCGTAAGCAAGATCCCAGACCCCCTAACCTAGTCAGCCCTCCCCCAGCCCTGGGGCCCAGGCCCA





GTCCCTGCTCCTGGGGCTTCTGCCCACCCTGACCCTTGGGGTCCCCATGGTTCTTCTTCCTCCCTGCATC





CTAACCATTTCTTTTTCATCAGCTCCCCACTTAGTTACTCACCTGATGTTCTTTGCCTAGCCCCTTGGGG





GAGCCCTTGTCTTTTTGCCTCTTCTTTCCCAGCTCTGAGCTTTTCCCCACAGGCCCCTGTGAGACATCTG





TGTGTTTGGATCTCCGGGATCATTACCTGGCCTCTGGGAACACAAGTGTGGCCCCCTGCACCGACTTCTT





CAGCTTTGCCTGTGGAAGGGCCAAAGAGACCAATAATTCTTTTCAGGAGCTTGCCACAAAGAACAAAAAC





CGACTTCGGAGAATACTGGGTGAGGAAAGCAGGGTGGAAGATGCTCTGTGCAAGTGGGTGACTCTGTGCC





TAAAATGACCATGACTGCTCCAAACCCTGTGTAGTTGTGGAACAACTGATTTGCACCATCCCAGGTGGGA





TTATACGGGTGGATGATTGGAGATGATGGGGGAGTAAAAGAGGCAGGATGGCGGGAGCTGCCTGGGTTTG





CTCATCTCTCACTGTTTCCTGTTGCCTTGCCTTGGGTACCCTTCTTCCGTTTCTCTTGGTCCCTTTCTGC





ATTTTTTTCTTTATCTAATTTCCATCTTCTTTGCTTCTCCATGTATCCATAATTACTCCATTCTCTCCAA





CTTGTCCCTTTTAGCAAGCTCCATCTTTGTTGCTTCCTCCAAATGTTCAGTTTCTATCCTATGCATGGTG





TTTTCCTCCACAAGCATCTCTTCAGCATCTCCTGCATTTCAATTCTTTTGTCCATCACTCTCATTCTCTA





ACCTCCAAAACCTCAGTCTCCCAATGACTCCTTGTCAACATTACCCTCTCCCTCTCACCATGCCGGAGCT





CCCCTCTCTCACAATGATCTCTTGCTTCTTGCTTCTCCATTGAAACCTTGAACCATGGCAAGCAAGTTGA





CCTGGAACAAGTGGGATGTTAGAGATGGATGATTGGAGATGATGGATGATGGTGGAATGAAAGGGGTAGG





ATGGTGGGGTGAGAAGTGAGAGAGGGCTTCATCACTGTGCATAAGAGAAAAAGTGGGTAAGTACAAAGGA





TATGCTGGAAGAAGAGGAGAGCTGAGTTAATTGGCAGTGGAAGTAAAGTTCCTGCAGATGGAGGCTGGAG





AGGAAAACTGCCAGGACTGAGAGGAAAACCAGAAGGATGAGCTGAAACTGAGTAGGAGGTTGGAAGTGCG





TCCCAGGAAGTTGGTGGATGGTGGTGAGGATTTGGGAATAAGAACATATAAGATAGACATGCATTTCCAG





TGCAAGGGAACCTAAAGAATGTGTTGACACTATCAATTAGAATCTGGGAAAAGTAAATGCACCCCTCTGC





CCTCTTTTTTTGATGGGGAAAGAGTGGGAGGGGGCCTCTCTTTGGGTAAATGGATACTTTCAGGGAAGGC





ACAGAGATAAAAAGAAAAAATATGCTCAGGATAAATTATATTGCCTACAATGGGATGAATAGATATCAGG





GGGACTGAGGGTGAAAAGAGTGTTAGATATTAGAGGGTGGATGATTCAGAGAGACTTGCATTTGATTATT





GTAGTGTGTTTGTTTCCTGGGATCAATGGATGAGGAGTCTGGACTAGAAGAGTCTTCCCCTGTTTCTTCT





CTTTGCTAAACCTTTCCTTATGAGTTTTCTTCTCTCCAAATCCTTAAAGTTCTCTAGTTCCCTGAATTTG





TCTAATTTCTTCAATCATTTCTTTTGTCTTTCATTTCTCTCTTTTCTCCTTTGCCCATATCCCACTTATT





GCTACCTTTCTCCTTTCTTCCCTGTCTTTTCCTTCTTGGTTTCTTCCCCACATTTCTTTTATTTTCCATA





TTGTCTTCTTCTCCTCATTCTCTTTCCCTGCTTTCATCATTTCATCAAGTTGATCCATTCCAAATTGGGC





AGTCCTCTCATCTTTCTTATTTTCCTCATCTCTATTCCTCCCCCTCCTTCCATATTCTGTGGGAGTCTTT





CTTTCCTGTAAGCTCCCTGTCTCCCACCCTCCCTCTTTGCCTCTATACCAGTTGCCACTCCTTTAATTCT





CCTGCCGACAAAAAGAGTCAAACTCTGTAAAATATTTGAAAAGATTTATTTTGAGCCAAATATGAGTGAC





CATGGCCCATGATACAGTCCTCAGGAGATCCTGAGAACATGTGCCCAAGGTGGCTGGGGCACAGCTTGGT





TTTATACATTTTAGAGAGTCATGAGACATCAATCAAATACATTTAAGAAATACATTGGTTTGGTCCAGAA





AGGTGGAACAACTCAAAGGGGTGGGGGTGGCTTCCAGGGTACAGGTGAATTTAAACATTTCCGGATTGAC





AGTTGCTTGAGTTTGTCTAAAGATCTGGGATAGATAGAAAGGGAATGTTCAGGGTAAGATAAAGATTGCG





GAGACCGAAGTTCTTTTGAAGTCTTATAGTGGCTGCCCTTAGAGACAATAGGTGACAAATGTTTCCTATT





CAGATCTTAGTTAATCAAAAGATCTAGCTATGTTAATGAGATATGTTAATAGCTAATAGAGATGCTTTAC





AGATGCAAATTTTCCTCCACAAAGAACAGCTTTGCAGGGCCATTTCAAAATGTGGCAAAGAAACATGTTT





TGGGGTAAAATATTTTTGTTTTCTTCTTTGTCTCGTAATGTTATGCCAGAATCAGGTTAGAAAGTAAATC





ATGTTACATGGGTTAAATAAAACCCATCTGATGAGAACTTATGATATAGGGCATGACTCCCCAGACCCCT





TTGATAGGAATTTGGGGCAAGATAAAAAAAATCAGAGTTTAGTCCTCACTCCCATGCTTCCTTTCTAGAG





GTCCAGAATTCCTGGCACCCAGGCTCTGGGGAGGAGAAAGCCTTCCAGTTCTACAACTCCTGCATGGATA





CACTTGCCATTGAAGCTGCAGGGACTGGTCCCCTCAGACAAGTTATTGAGGAGGTGAGAAAAGTTGGGAT





ATTAACTTTTCTGGATACATAACATATGGGACCAATGCATGCTTAGGGCTGCCATTTTTTTTTCTAGAGG





GTGGGTCTTCTTCCTAGGGCCCCCCAATTTCTAGGAGGGAGATGGAGATGGAAATGGTTATGCCCTATGA





AAGTATCAGGACCTTGGGAGAAGGCAGATAAAAAAGGATAGATGTGGCTTCCTAGAGGAATCGAAGGGCG





CAGGGCAGAGGTCAGGCAGTAGCAGCTGTGTAAGAGCCGATCCAGACAATGGGGGATGGGCTCCACGGAT





CCTTATGCTCAGCCCCCTCTCTCTCCTTTAAAGCTTGGAGGCTGGCGCATCTCTGGTAAATGGACTTCCT





TAAACTTTAACCGAACGCTGAGACTTCTGATGAGTCAGTATGGCCATTTCCCTTTCTTCAGAGCCTACCT





AGGACCTCATCCTGCCTCTCCACACACACCAGTCATCCAGGTGAGGGATGCACTGGCGAAGACACAGTTG





GACCTGGCCTGCCTCCAACTCTAGCCAATCATCCCTTAGAGGAAGGTTGCAGGTTGGGAAGAGAGGACAC





CTGTGTGATATAGGAAACAACCCTACCTTAAGGGAAAATTATTGATGTGAAAGTCAGGGACATTAGCTGG





GGGTGGGAAATGGAGCAGCAGAGCCAGTGCTGGGAAGACAGAAGTAGGCCTGGTCTTTCTTACTGTTAAT





CTGGATTAGTCTCAGAGCCCCTTAACCAGTCCTCCTATCTCTAGGATTGCCCTCATTTTATTTACTCTTT





ATTTTTACTAGAGGGAACTTTTCTAAACCAAGGGCTAACTAACTATGCTACTGTCTGTATTTAAATGCTT





GTCAGTGACCCAGTGGCTTGCCAGGTCATCAGAATCTAGTCCCTAATCTTTAGTAAAGCTTTGCAAGCAC





CTTGTGATCTGACCCCTACACACTTCTCCAGCCTTATCTCCCGTACATTCCTTCTCTCCCTTACCCCCAA





GCCATGCTGACTCACTGCTGCTTCCAGGAATATTCCTCAGTTCTTTGCCTATGCTGCTCCCTGTGCCTGC





AACCATCCCCCACACTGAACCTGGAAAACTTACATGTTTTTCAAATGTTGGCTTTATTATCTCTTCCAGG





AAGTCTTCACCGACACCCTAGTTATGAGTTAGGTGAAGCCCTGCTCTCCCTACTTTCGTTTCCTCATGCT





CTCAGCATTTATCACTCTGTGTTGAAGATTGTGAGCCTCTTTAGAACAGGACCATGCTTTATTCACCTTT





GTTTCTCAGGACCTATCACAGGGCCAGGCAGCTAGAAGTTTTGCCAGGTATTTGTAGTGAGTGAGTAACT





AAATAAAAACACTGGAGCTATCACTCTTGTGGTTAAACAATGTAATGCTATCTGCATATTTGGGCCCTAC





TGTCAAAAGAGCCACAAAATTACCAAAGGATAAGTACAAAAGAAGAATTGATTATCATTATGAGGTGTTC





TAAAATTTAGTTTTAAACAGTCTGCTCAGGAGTTTAACTGATGTGGCCTTTAGGGGCCGGTTAAGATCTG





GTTAAGGAGAGGCTCAGAGAGGAGAGAATGAGAGAAGGTGAGCTAAGCCAGCCTTGAAACATGGTTAATT





CACACAAGTGGAGGTGAAGCTATGGGGCGTTGGAAATGCTGAGCCAGGGGGAGGACCTGGAATGGTGTGA





TTCCTTCGTGGAGTCAGTGAGGAGGCTGATCTATTTAATTGAGGATTTGGGAGGCAAGGTGGGGTGCAGT





GGGAGGTAAAAGTGAGACTGAAGACATAAGGTTGAGCCTGATTATTTCTAAGAAGCCAGGCGAAGGTGAA





ACATTTGACATAATAGAAAAAAAAAAAAGAGCTACTGAGGCCATCCAACTCTTATGACAATTGTGCATAG





AGCAAGTATTTTGATGGTTGTGCGTAGAGTCAGCAGTTTTGAAGGTCAGTCTGGGGGTGTTGAGGAAACT





AAATGAGCATTTTTGAGGCCCTGAGATAGAGGTAGAAATGGAAAGGAAGAGCCAGGCACAAGGATTTAGG





CAACTTCACCCTAGTGATGATAGTTCATGCTGTTTCTAGAGGATTTGGTGACTGATTGGATATAAAGAAA





GAAAGTGGGGGATTACACAGTGATCCCATTGTTTTGATTTAGTGTGAGTGGGAGGAGGGTGATTATCATC





AGTGTGAGCCTGGATAGTCTCTTGGGTTAAAAGCAGGTAGGAAGAATGGACTACAGAAAGAGAAGTCCAA





AGACTGAGGGCAGAAGGGAGCCAGGGAAGAGAGAGTACTATTGGAGAGATGGGAGCTAGACCAGTATGGT





GGGCCACAAAGGAAAGAAAAGGAGCTTCAGGAAGGAGGGGTCAGCTCAGAGAAGAAGGAATGAGAAGACA





CCCTTGGATACCTAGAGATACTTTCCAAACAGTTATGGCAGTGGACACAGACTGCACAGAGCTTAGGAGG





AAGATAAGAAAGTGGAAACAATGGGCATAGATGCTTTTTTGTTCTTTGAACTGTGGACATACAATGTAGC





AAAAGGGTCAAGTGAAAGTTTTTTTCGAGACAGAAGGAAAAGTATATGGCTCAAGATAAGAGTGGGATAT





TGAAATTGGAGAAGAAAAGGGAAAGAGTAGAAGCAAAGATCTTCAGAATAGAAACAAGGGTTCATCAGGG





CCAGACTAAGGTGAAATATACATGGTGCTTACCTGGGGTGCTAATTTAAGAAGGTCCCCAAAACTCAGTA





TCATGATAAATAGTATTTTATTAAATATTCCTAAAAAATCAAAATCAATGCAACAATACATGATGGAACA





AAATATCAAACTTTTCTTCATTATGAATTTTTTTGAAAAAAGATTATGCTTTTTTTCCCAAAAAATGGGA





CAAAATTCTGTGTGAATCTTTTTGAAAATACTAATTTTTTTATTCAAAATGAATCAAAAATACATTGAGG





ACTTTTCTTGAACACATCATGATTCTTTTCAAAATTGACTAAAAGTATGTTTTTTTGGGGAAAAAAAGTC





CATGATAAGCAAAGTTTTGAGATTTTATTTATCATACATTTTTGGTAGTAATTTTGATTTTTTAAAATGT





TAATTATTTATCTTGATTACTGAGTTTTTTTAAAAAAGAGTTTATTTGAGCAAAGACTGATTTATGAATT





GGGCAGCATCCTGAAGCAGTAGAGGTTCAGAGAGCTCCACCCAACAATGCAGGCAGGCAGTATTTACAGA





AAGAGGAAGTGACACCCAGAAACAGCTTGATTGGTTACAGCTTAGCAATTGTCTTTAATGGGCATGGTCT





GATCACTTGACAGCCTGTGGTTGCCTGAAGATCAGCTGGTATGGCTGGCTGAGATGGAGCTACCTGTTGC





AAGAATATACTCCTAAGTTAGGTTGCAGTTTGATTACTGAGTTTTTGGTACCTCTTAGATTTTGTACCTG





GGACAGGTTCCTCACCTCACTCACCCTGGCCCTGTTCCTGAGACAAGGAATAGCTCCTTTTAAGATGCTG





ATTATCATGCTTCTGCCTTGCTGGGCACACCCACACTGGTTGTAATACTCACCATCTCTTCCCATTTTCA





CATCTGGACTCTTCTTCTCATGCCCCTCAACCCTTAATCCCTCCCTTTCTTTGTACTCTTGCTTCTCTTC





TGTCCAATCTTTGTGTCCATCTCCCAAGGCCATCTCCCATGGTATATTCCCCACCTCCCCACACCTGCCC





TCTCCATCCGCCATGCTCCCTGCTTCTCTCCAGTCTCTCTTGTGCCCAGATAGACCAGCCAGAGTTTGAT





GTTCCCCTCAAGCAAGATCAAGAACAGAAGATCTATGCCCAGGTAAGATGGCACATGGACAAAGGCCCTG





CCCTCTGAGGCCAGGAGAAAAGCAGGGACCTCTGGCACCTGTGACTGACATTTCCTTCCTCCAGATCTTT





CGGGAATACCTGACTTACCTGAATCAGCTGGGAACCTTGCTGGGAGGAGACCCAAGCAAGGTGCAAGAAC





ACTCTTCCTTGTCAATCTCCATCACTTCACGGCTGTTCCAGTTTCTGAGGCCCCTGGAGCAGCGGCGGGC





ACAGGGCAAGCTCTTCCAGATGGTCACTATCGACCAGCTCAAGGTGCCTGGAACTGGGGGCCAGAAGACT





GTGGGCATGGGGATCTTCCTCTCAAACATTACCTCCTTTCCTTCTTCCTCCTAGTGCCCTTAATACCTTT





TCATTCTGTCTCTGACTCCATCCCCTCCCCCAGTTAGCCTGTTCTCTTCTTTTTCTCACACCCAAGGGGA





AGCCCTTTCCCCTTCCTTCTCTTTTCCTTTTCCCCCTCAGCTTTGTGTCCCTCCTCTAAGGAAATGGCCC





CCGCCATCGACTGGTTGTCCTGCTTGCAAGCGACATTCACACCGATGTCCCTGAGCCCTTCTCAGTCCCT





CGTGGTCCATGACGTGGAATATTTGAAAAACATGTCACAACTGGTGGAGGAGATGCTGCTAAAGCAGAGG





TTCGCCGCAGGTGGGATTGGGGAGATCATGGAAATGGAGGAGAGCCTGAGCACCGTAGATCTTGGGGGCA





AAGGAAACCTTGGGGAAGGCAGGCTGGTAAGGGCCTCCCAGGAGGATAAGAGGAACCTGCCACCTGTGCG





GGCAGAGAAGCGTGGGGTGGGTGGCACAGAGAGGATGGAGGGATCAAGAAGGATGTGTCTTGGGAGCACG





AGTAAGGGAGGATACACACGACATGAGGAACGCAGGGTCAGCCAAGACACGGGGTTTCCTGAGAGTAGAA





CACCAGCCAGTCAAGAGCCTCTGAGCTGTAGAAGATGCTGGAAGACCCAGACACAGAAGACAGTTAAGTG





TATGTATGTCTTTTTAGCAGCTGAGGACTGTGGGCAGGAGGAGGAGGCACATGAGATGAGGAGATGAAGA





TGGTGAAGGCTGGGGATGCTTAGGGGAAGAAAGGAAGAGGAGGGGCCATTCCTCAGGTGTGGTGTGAAGA





TGCTGGAGCTCTTATGGGAAACAATGTCTAAGAGCATTTCTGCTGGTGTCAGGAAATCAAGGGGGTGTTG





GGGTTGGGGACATGAAAGAGTGGCTCTTTGTTGGGCTCTCTGCCTCCCCTGATACCTGGGTGGCTACCAC





CTGAAAGCAGTGGCTTTCTTCCAGGGGCTTGGACCTAAGGGCCTTCTTCATGGTGGCAGCAGCATCTGGA





AATCCTTTTTGAGGGAGGTAGCTGCCCATTCACATGGCAGTGAGCAGGCTTACATAAGGGTGCAATGCAG





CCCTGGCAGGAGCATTGCTGGTGGAGGAGAGAGCAGTCACAGAGACCAGCTTACTTATGCTTATGAGATA





CATCTGAGGATAACCAGAGATATCTTGACTGTGGAAGCAGAATCTGTTTCATGACATGAGTCCAGACTCC





ATCTAGCCCAGAACTTTCTTTCCCTGTGACTTTGAAGGCTGCCTCTTCATCTAGTTTCTTTTACTAAGGA





GCTAGATCCCACCCCAACCTACATCATGAAAAGCTCTTTTTGACTTGGGTGCATGTTAAAACACTTATTA





ATACAGAGGAGAAGGAGCTGCCTTCACGAGTATCAAGGTGACTTACACAAGGAGAGGCTCTTCTTGAAGC





ATCCCCAGATTCCTGGGGTATATGTGTGGGTCTCTTTTGTCTCCATAGGGACTTTCTGCAGAGCCACATG





ATCTTAGGGCTGGTGGTGACCCTTTCTCCAGCCCTGGACAGTCAATTCCAGGAGGCACGCAGAAAGCTCA





GCCAGAAACTGCGGGAACTGACAGAGCAACCACCCATGGTGAGGAGAGGAGCGGGTGTATTTGCCCAGAT





ACTCGAAAGGAGTATCTACTCTTTTGAGGGGTAAATGTCGGCATCTCTCTCTCAGGGAGGGGGCCGTGAT





GGTAGATGCCCCTCCATGTCTTGGCTTTCCATAGAAGCAGGCAAGTTGGACAGACAAAGTTTAACTTGAA





AACCAAGATGCCACGTGCCAGACCTTCAGGCACACATCTCCCAGCCTGACTACCTCTCTGGCTTCTTGCT





GGGTGTTTGAGCTCAAATATAAAACTCTGATATTATCAAAACTGCCCTTTCTTTGTCATGATGCTTACAC





TATTTGCTCAGGATAACTTGGACTTAGAGCTTACAATTTATTGGGATGACAGAGAGATATGTTACGCAGT





GGCCTTCCTTATGTCTAGTTGATTCCATGTTCAAACGTGCTTCACAAAGAGTTTATCTCTGACATCCAGT





GGGATCCACTGGGCCACATGTAGACTTTGTGGCACAGATGTGGATATATCTGAGGAGGGGCCTGGGTAGA





AAATGCACTTCACTAACCAGAGTCTACTTATTACATAAGATGCAGAGATGCTCCTTTGCTGAGAATCTTG





AAATCCCAAGTTGGATATATCCAAATGCAAGCAGAAGAGTCTAGTACATTGGATACATCCCAACCTCAGT





GAAGGCCTCAGTTTAGTCTTAAAAATCACTGGATTTTTTTTCTTAGTAATTTGTGGTCCATTTCCCTGCC





TTGGAGAAACTCTCTGCTTTGGCAACCTAAAATTGCTGTGGAATTCAGAGAAGATAAATGTATTCACAGG





GACTGGAATGTAGTTATTGCTTATCAAGAGCTAATGGTGTGCTAGACACTCTGAAATCCTTTAGATCTAA





ATCTAGATTTAGATTTAATCTTTACAATTCCATGAGGTACCATGGATGCCATTTGGTTCCTATTTTAAAG





AGGAGGAGACAGAGGCACGAAAGATAAGGAAGTTGCTCAGGTATGACAGTAAGTTAGTGGGGTGAGGATT





TGAACCCTGGCAGTCTGGCTCCAGGGTCTGTGTTGTTTACTCATTGTGCTAAAAAAGCAGTCTTCCTGAG





GAACATCACTTGGGTTGGAGAGTGGCCAAGAAGCTTCTGCCCAGCTTTTCTCTTGATTCAGATGAAGCAG





ACCAGAGCCCCAAGTTATCTTAATTGGGGTTGCTACAAAATCCTGGCAACAAACAGCTACCTATAAATGC





CAGCACCATGGCCTCATGGCACTTCTTGGAGGCTGTAAGAGTGCTAATGTTGAGGCTTAGGCTTAAAGAA





TGCAGAAGGCTTAGATGTCCTGAAGCCATTATCTTTTCCACTAGGGCACATAATTGTCCTTGGGCTTAAA





AGCTGAACTAATCTCTGCCAACAAATAGTTGTGTGACCTTGGGGACGCCACTTCACCTTTCTGGAACAAT





AGTATAAAAGATGGCACTTAATAATAATGATAATAGCTGCTATACATGGAGTAGTCACTGTCTGTCAGCA





CTTGGGACAGGTTATTCATTTAAATCTTCCAGAAACACTTGGAGGTTTTTAATCCCCATTTTGCAGAAGC





AAAAATAGGCTCAGAAAGGTCAAGAAACTTTCTCAAGACCACACAGCTCACAAGTAAGTGAACAGACTCC





AAAACAGATGTTTTGGCTCATAAAGTCATGTTTTTAACCACACACTATACAGGATTGAGAAACAAGTAGG





TGCTACAAACAAAGGTTAGAAAACTTTTTTATAAAGGGCAACATAGTAAATATCGACTTCGTGATCCATA





AATGGTTGGTGTTACAAACTACTCAACTCTGTCCCTGTAGTGCAAAAACAACTGTACACTAAGTAAATGC





TGTGTTCCCAGGGGATCCTGGTTGAGACAGCAGATATTCTTGGAGTTCCCAAGAGGGAGAGATCAGGGAG





CATTTGAAGGATCAGTGGCATCTCTGTGCAGGAGGCAGAACTGACAAAATGTCTAGAGAGAGGAAGGAGT





TTTCTGGTGAAGAAAGGGGTATCATCTCATGGGGACAGGGCAGGAGGCAGGCTGGCTAAAACTTGGTGCA





GGGTGAGGGATCCTCCTGGTGGCTCTGGTTGAGAGGAGAAGACTAGGCTTGCTGTGTCCACTGATGCCCC





TGGAGCATGCTCCAGGTGTTTGAGAATCAGCAAGGGAGCCAGGGCACCTGGATCAGAGTGACTAGGACAA





TAGTGGGGAGGGAATCAGAGCAGGAAGGAGAGAACCATACAAGGTCTGGTAGGTTGCTGAAGGACTTTTG





CTTCTCTCTGTATGAAATAAAGACATGCAGAGGGATTTATCTCATTTATGTTTTAAAAGAACATATTTTA





AGGTTAGTAATGGGATGTCCTGATGATGAGTGATGTGAGAAGGAGAATGGAATCAAAGACATCACCTAGA





GTTTGGCCTTGATATGATCAAAATGTTTGGTTTTATTCAGTGGCCATTAATTACCGACTTCTGATCATAT





TCTTTTGAATGAATTATAATTTATAGTGCCCTTATACAGAAAGATTTCTAAATCTCATTATTGGCCCATC





TTTGGATGATTAGTTTTGAATAGAGTTATAGTCAATGAAAATGGCTGTTAAGTCAGGTTTTCTTTTATGA





AACTTGGGAAGGTGGGTTTTGAGAAGTAAAAGCAGAACTTCACATTTGTGATGATTAAATGTGAATGATT





TATATTCAGCCCAACATCTCAATTTATTCAGGTCTTCCAGCTTTGGATCATTTGCAATTTTATTCAGTGT





ATCTTCGTCCAGACTACTGTTAAGATCCTGAAGGGAGAAGGGCATCGGGTCAGGTTATTGAAGACCTAGA





TATGGATTTATGCATTCATTTATGTAACAAACATTTATTGAGAACCTAGTGTACTTCAGGTACTTCTCCA





GGCACTTGGAATGCAGCAATGAACAAAAAAGACAAATAAATAATCCTGCCTTCAGCCACATATCCTGGTG





AAAGAAGAAAGACAATAAACAAACTAATAAAATAATAAAATATGTTAGGAGGTGTTATGAAGAAAAGCAA





AACAGGAAATGAGGAAAGGAAATGCTAAGTGAGTGGTAGTTAGGATTCTTAGTAGGAATGTCACTGGAGG





TCAAGTTAACTTGAAATCATTCACCATTGATGTTTACTTTTGATTCAGCCAGATGAGACTCCACTCAAAT





TGCACTATCATTCAACATCAGTTTCTCTATCTAATTCACGAGGACTCAATCTGTGTTTTTCAAGCCTGGC





TAAATCAAGATAATGCCAACAGAGTGGGGTAGTGCCTTAGAGTACTTGAAAGGTATTATTTCACCTGATC





CCCAAACCTGTGAGGAAGGTAGACTAGATATTGTTTTCATTTCGACAACTGGTGTCACTGAACCACAGGG





GTTTAAGTTAATAACTCAAACTTAGTAAGTGCTAATACTCTATTCAGTGGTAGGATGGTAGTGGTGCTTG





AGGATGTATTTCGTCTATAGATGTGTTTTGTTAGCCTGTAGAATCTTTTGCAAACTTTGAATTAATCACC





AACATTCAAAAACTAGGATATGGCATGCCAGCATTCAGGTTTCTAGTGTGTGTGTGTGTGTGTGTGTGTG





TGTGTGTCTGTGAAGCTTGGGAAACACTGGGCTACCCTTCTCCTGTGGCAACAACTGACTGTCGCTACAT





GATGCAGCTCAGGGCTGGGTGCGCTCTCTGAAGCCCCACCACAGCCTGTAGCTCTGATGTTGCACTGCTG





TTCTCTGTTATGCCTCTGCATGGCCCCTATTGGAGTTTGCGGCTTCCGGTCTTTCATATGCCTCAGTTAC





ATAAGCCTTTTAGCCAGAAGAATTTTTATCATTTTGGCATTATTTTTCTTCAGTGATCCTATCATAGCCC





TTAGTAGTTACACATTATTTTCCAAGTGTTAAAAAACTGTTTAATGATTCGTTCCACAATTTTGTTTAGA





AATTAACATTAAGGATTCCTGGTTGGCTCGTAATCCCTAAAATTTCCTTTCATCCTATAGAAGATTGGTC





AAATTTTTGCTTCCCTCCGGACTCTTAGAATCTGTCCTGATTTCTATCATTTCTCAAATACTATCTGTGG





TTCTGAGGTTGTATATGGAACTTTTTTTTTCTGGTGCCCTAAAATTAGTCCACTGAGTTTCATTATCTTG





GGTTTGAAGTATTTCTTCTATTGTTTATATTTTGGAGACTTTTTTTTCTCGAATTCTATTTCTCTCCCTC





TCTTTCTCTCTCTGACTCTCCCTTTGCAGTCAATGTGGTATACACTACCATTCCACATCTTGAGAGAGAG





CTGTAGTAGTGGTCTGAGGTGGCGATTGTATTATCCAGTAGTCAGGTCCCACGGCAAAGCATGTTGGAGA





AATGATCAGGCTCCAGCAAAGGGCATCAGGAAACAAATCAAGAATGAGAAGGGGTGAGAAGAATAGGCAG





ATCTACACTTCCAAGCTCAAGTGGTCTCCCTGCTGATGCTGGTTGCTGCTCCACATGTAGCAACTGTCTG





GTAAGAGGTATTCCTGGAGCCAAGCTTGTCCAGCAGAATGTGGCTGGCAGATTCTCAACTTGGCCTATAA





TTGCTTTCAGACCCGGACTTCTTTTTAGTTCCTGTTGTTTCAGAGCTCCAACTCATGCAGCATGAGAAGA





ATCTGAGCCTCTTCTCTTTATCAGAGACAAGGTTGGCCAGGTGCGGTGGCTCTTGCCTGCAATCCCAGCA





CTTTGGGAGGCCAAGGCAGATGGACCACTTGAGCCCAGGAGTTTGAGACCAGCCTGGCCAACATGGCAAA





ACTTCATCTCTGGTGGTAGCCACCTGTAATCCCAGCTACTTGGGAGACTGAAGCAGAAGACTCACTTGAA





CCCGGGAGGTGAAAGTTGCAGTGAGCCGAGATTGCACCACTGCACTCCAGCCTGGGTCACAGAGTGAGAC





TCTGTTACAAAATAAAAATAAAAATAAGACTCAAGGTTAGCAGACCTCAAGGTTCAATAGAACACAGATG





TGGACAGCCAGGCCTGCAGCAACCTCCAAAATGATAACCTCTTTAACTGGTGGGTTCGGGAGTTTTTTCT





TCGGTGACTACCAGACTGGCCTCTTTGGTCTGTTTCCTGTAGTGGGATGCACATAAACCCCCTCCATTCC





CAGGACCAGCCTAGCTCCTGCGGGGAGAGTATTAGTGGCAGCCTTCCTACCTTCCCCGTGGGCAGGTCTT





TGGGAAGTAAAAAAATCACAGGAATAAAGTTTTGAGGCTTCATCCTGCCTAACCCAAATTAGCATATTAG





CTGGTATTTATCAGTTCCAGCTCAGCTTTCCCTCAGGCCAGCTACCTCCTCCTGTCCCTGGGTTCCTTGA





GTGTGTGTCTCCATTTACCGTGTCATCTCTGGGTTTATGCCTTGGTCAAGTTTTTAAAGCCATGCAAGCC





CACCGCCAAGACCTTCTCAGCATCTGTCTCTTCTGTTTCTCATTCTTGAGGTCCTCAGCTGGCACTGCCC





TCTTGGATGTTTGTCCATGGCCTCCTGCCTCTGCAGTGAAAGCCCTCCACCTTCCTGTTCTATTCTCTCC





TCTCTGACTTGGCTGGAAGTCTTCCAGCTCTATGAATTTATACACTGAGTCTTGTCTTGTGTCCTCTTTT





CCTAGCAAACAATATGGCATCTAAAACCCAGTTCTACTCTGATAATTTTTTCTTTACAAGATGCTACAGT





ATGATACACCATGCCCACCTGGAGAGAGGATAAAGGTGATGGTGGTAGGACAGAATTTCCATCCGCAATC





TCCGTTTTGAGCAAAGAAGCATGGAGGATGGAAGTCATTGCTGGGACCCCGGAGTAGAGTGGTGGTGGGG





GAACAGGGGGAACATCAGACTGCCGAGGTATGAGTTTGGGTTCTCATCTTCTTCCCAGGAGGCTTTTGAA





ACCCCAGGATGATGCCTCCTAGAGGCCTTGCTGTCAAATTCAATAGGCAATAACATGAAGGATTTACTCA





GCCAGGCTCATGAGACCAGCTCTGAGGAAGCTGTGCTTTTCTTGTACTGATCGGTGATGTGCATCACCCT





AAGGGATAGTAAACAGATGAAACCCAGAAAGTCCAGTCAAAAGAGCACCCTCTGGGAATGAAGATCTAGT





GAAGACTGGGGAGACAGATGAGGAAAGAGTCCTGAACAGGAGCCACTCATTCCAGCTTTGTCTCCATAGC





CTGCCCGCCCACGATGGATGAAGTGCGTGGAGGAGACAGGCACGTTCTTCGAGCCCACGCTGGCGGCTTT





GTTTGTTCGTGAGGCCTTTGGCCCGAGCACCCGAAGTGCTGTATGTGAGAGCTCTTCCCAGCCCACATCC





CTCCACCCCTTCCTACCCAAAGCAGCCTTCCCTCTTCTATTAACTTTGACTTTCTCAGTGGTGTGTGTGA





TTGGGGAATTGGGCAGTCAGAGAAGGGCCACTGAGAGAGGGAACCCAAAGGCCTGCTCCATCCCTGGTGT





GGAAACAGTTCAGCTTCAGGCCACAAATTCTCCATGACATGCTCTCACTTGGACAAGTCACCCAACTTTC





CTGGTCTTGTGTTTCTTCAACCATCAAATGAGAAAATCGAGCCAGGCTCGGTGGCTCACACCTGTAATCC





CAGCACTTTGGGAGGCTGAGGTGGGCGGATCACCTGAGGTCAGGAGTTCAAGACCAGCCTGACCAACATG





GAGAAACCCCATCTCTACTAAAAATACAAAATTAGCTGGGCGTGGTGGTGCATGCCTGTAATCCCAGCTA





CTCGGGAGGCCGAGGCAGGCGAATCGCTTGAACCTGGGCGGCAGAGGTTGCAGTGAGCCGAGATCACGCC





ATTGTACTCTAGCCTGGGTGACAAGAGTGAAACTCCATCTCCAAAAAAAAAAAAGGAAAATTGAACACTA





TCATCTCTAAGTCTCCTCCCTGTTGTAGCTAAGATTTTTTTAACAACACATGACGTGACATCAGAACAGA





TGACATAATCTTGAAGAGGGCAAATAAATCAAATAAATCACCACTGAATACTTTCTGAGTACCTACCACA





TGCCTGGGACTCCTTCAAGAACTTTGCATGAACTACGTCATTTAGTTCCTATTATGATCCTGATTTTATA





CAAGAGGGAACTGAAGCAAAGAGAGGTTAAGTGACTTGCCCAAAGTCACACAGTTACCAAAAAGCAGAGA





CAGGGTTTGAACTCAGGCATTCTGATGCCAGAGCCCAGGCTCTCGATATTGCCTTTCATTTTCCTCCAGG





AAAGGATTTACATGAGATGGCAGGTGGCTGGGGAAGCAGTGAGTACACACTCACGTTGTGAAGGCAGGGA





GACTTGTGGGGGACTTGCTGGGAAGCTGAAGAGCTCAGGAGGATGAGGAGAGGGAGTGGACGGTTTAAAA





AAGACAGTGTGAGAACAAGAGCCCTGAGCCAGAGGAGAAAATGACAGCCCTCTCCTCCCTCTGATTTCTG





AGAGGTGTTCCTGCCCCCAGGAGTGAGGACACTGTCTTTCTCCTGTGTCAGGCTATTTCCCCATGGAAAG





GAACTATATCTCCCTGATGGCCCTCACGGATGGCCAGGCCCCACCTTCCCTTTGTGGGCTTGGCACTGCC





TTCCTTTCTCCACAGATCCTTTAGTTGCTTTAGTTGAGCTGCTCCTCTAGCAGCAGCTCCAGCCCAGGCA





GCTCCTTGGGGCCAAGCCCTTTTCCAAGGGTCAGAAGCTGTGGGCAGGGCCAGGCTGAGGCCTCTCCTGA





TCCTGTCCCCCTGTCCCTGGACCTCACTCCCACAGGCCATGAAATTATTCACTGCGATCCGGGATGCCCT





CATCACTCGCCTCAGAAACCTTCCCTGGATGAATGAGGAGACCCAGAACATGGCCCAGGACAAGGTCAGG





CCAGGCGTCCTGGCTGGTGTGGGAGCCTGTGCAGGGAATGGAGTATTGGAACAAGCGAGATGGGGATTGG





AAGCAAATGCCAAAGGCCCCCCCAGGCACATGCTAAGTAGGGAAGCCACTGGGCTGTATACTCACACTGG





CAACAATGTGAGAGGCTGGGACAGGGCAACGAGTGGGAGAAATTTCCTCTGGTAGACTCGGAGAGTATTC





CTAGCCTCTTCTGTGTCTCTCTCCAGGTTGCTCAACTGCAGGTGGAGATGGGGGCTTCAGAATGGGCCCT





GAAGCCAGAGCTGGCCCGACAAGAATACAACGATGTGGGTCCCTGTGTTTTCCAGCTCCTTTTCAGTCCT





TGACTTCTCGTCACTTCTCTGACCCTCCTAAGTCTTTGTTGGACAATCAGTTTTCCCTGGGTGACTTAGC





TCTGTCCTTACTCTGGTGCTGGCTGGGGTTGATGGGGAAATATCCACACTGTACGTCTTGCTGGCAGAAG





AACAGAATCTTTTCAGGTCCCAACGCATGTGCCAACACACATGCATGCATCCTGTGACTTGTCTGGGCGT





GTTCATCTGTGTGCTGATATGTGTAAAGCCTGGGTGTGCTGTGTAGTGATGCCATTGGGCTGCTCTCTCC





TAATCCCTGGATGCCTGCCTGTCAGGGCTTGCCTGTTTGGGGTCAAATGGTCCCATTGGTGTTTGTCAGC





GTGCATCTATAGAAGTCTCTGTGTGCCCAAGTCACCTCCTGCCTCTTCCCCAGATACAGCTTGGATCGAG





CTTCCTGCAGTCTGTCCTGAGCTGTGTCCGGTCCCTCCGAGCTAGAATTGTCCAGAGCTTCTTGCAGCCT





CACCCCCAACACAGGTATGACAGCAGGGGAGACACAGGCACTCCATCCCAGAGAGACCCATCCATGATTC





ACAGGAAAGGAAGCCAGGGCTCAGGGCAGGCAGCATGAACAGTAATGGTAGTTGGGAGGGACTGTGTAGG





TCTCAGGGTGGCAGGGCAATACGTGGTGGGGGCTGGAGTTCACATGTCCTCTTCCCACAGGTGGAAGGTG





TCCCCTTGGGACGTCAATGCTTACTATTCGGTATCTGACCATGTGGTAGTCTTTCCAGCTGGACTCCTCC





AACCCCCATTCTTCCACCCTGGCTATCCCAGGTATGGGTCACTCTGTAAGGGTAGGTAGGGAGTTTCCCA





AGAGGGGCCGACAGGTGTTATGATGGATGGGACTTACGGTTGGAGAATTGGGGTCACAAATGCTGAGAGA





TTCTGGGGGTCAAATAAGCCCTTGTCTCCCTAGAGCCGTGAACTTTGGCGCTGCTGGCAGCATCATGGCC





CACGAGCTGTTGCACATCTTCTACCAGCTCTGTGGGTAACAGGGGCCACTGGGAGGTGGGATAATAGGGA





ACCTAAGGGAAGACCACAAGGGAGGCCTGGAGGGGAAAGGGAGGTTATTTGAGGGTTTGAGGTGGGGCAG





TCCTGGGAACTTTGCCATGCTCCTGGGAGCTGATTCAGTCTGTGGTACCACCCACATCCTCACCTAGGCA





GCACCAACCCTATGTTCTCTTGCTGTATGTTCTCTTGTCCCATTTTCAACAGTACTGCCTGGGGGCTGCC





TCGCCTGTGACAACCATGCCCTCCAGGAAGCTCACCTGTGCCTGAAGCGCCATTATGCTGCCTTTCCATT





ACCTAGCAGAACCTCCTTCAATGACTCCCTCACATTCTTAGAGAATGCTGCAGACGTTGGGGGGCTAGCC





ATCGCGCTGCAGGTATGCAAGTGTCAAGGGCCACAGTTTATGTGTACTGGCAGACTAGAAAACATGTCCT





CAAGTTTTCCTTCCACCATTCCTGACACAAGTACAGTTGCATGGCTTTCTGCCCTTCGCATCCCCACTGA





ATAGACGGCAACTTGGGGATCCCCCTCCTACCCCAGAGATCCTCCATTTTAGGACATCTATAGGTCTTCT





GGGAAGTACTCTTTCTTCTGGCTCAGATCAACTAGTCAGTGCAGAACCAGTGAGCAAGGGCCATGGGTTT





TGGGTACTGTGTGGAGGGACTTTCAAATGGCCACAGGTCTAGAGCCTGATGGCCCTTCTCTACCCACCCC





TACCCAGGCATACAGCAAGAGGCTGTTACGGCACCATGGGGAGACTGTCCTGCCCAGCCTGGACCTCAGC





CCCCAGCAGATCTTCTTTCGAAGCTATGCCCAGGTAGGCAGCGGCCACCTCCCGCCACAGCTTGCTTTAT





GTCAGTTGAACGCCTTATTACTGAAGCTCATGGAAGTCCCCTCTTCAGACACTCCGTCAAATACCCCAAA





CCCTCTTCTGCAGATGTCCTCACTGTTATCTTTTCTCTTCCCTCCCTACCCCTTGGAATCACCCCTCAGA





TGACTACAGGTTCTTCTACCTAATTCAGCACCCCCACAACTCAAAAGGTAGAAAAAACTCTATTCCCAAG





TTCCTCCAGGAGAGGAGGAGACCAACTTTTTTTTCCTCTCATACCCCCAAAATACAGATGCCTTAAAAAT





GAGCCTGTGGTTGGGCACAGTGGCTCACACCTGTAATCCTGGCACTCTAGGAGGCCGAGGTGGGCGGATC





ACTTGAGATCAGGAGTTTAAGACCAGCCTGGCCAATATGGTGAAACCCCGTCTCTACTAAAAATACAAAA





CTTAGCTGGGCTTGGTGGCGGGCGCCTGTAATCCCAGCTACTTGAGAGGCTGAGGCACGAGAATCGCTTG





AACCTGGGAGGCGGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCCAGGCTGGGTGGTAGAGCAAG





ACTCAGTCTCACAAAAAAAAAAAAAAAGCCTGCGACAGGCTGACTGTGTGCCACATTCCTCTTCAGACAC





CTGACCTTAGGTGTGGCGCCCACTTGACATCACCTCCTTAAGCACCCTGTACTCCCTCAACAGACTCAGG





TGCCAGGTCTTCAACACGCTTAGATTAGACTTCACCCCAGAGCTCCTGCGCTAGACCCTGCCTCTCTGTC





ATTGATAAATGGTATCATTACACAGCCCAGGCCCTCCTCCTGGACTCCTATTGCCAGATTAAATGAACTA





TACATTTCAAATGCTCCATGTGGCCCTTGGGGCACTTGATCCCCTGGTTCCCCTCTTTGTCTGCTGTCCC





TGATCACCCCTTGTCACCGGGTCAGCTTTGTCCTGTGGACCCTCCCCCTTCAATGACCTCTCTTCCTGCT





CAGGTGATGTGTAGGAAGCCCAGCCCCCAGGACTCTCACGACACTCACAGCCCTCCACACCTCCGAGTCC





ACGGGCCCCTCAGCAGCACCCCAGCCTTTGCCAGGTATTTCCGCTGTGCACGTGGTGCTCTCTTGAACCC





CTCCAGCCGCTGCCAGCTCTGGTAACTTGGTTACCAAAGATGCCACAGCACAGAAATATCGACCAACACC





TCCCTGGTCACATCCATGGAATCAGAGCAAGATTTCCTTTCTGCTTCTGTTCCAAAAATAAAAGCTGGCA





CTTGGCTTCCGCTTGTCTCTTAA






As used herein, “ALAS2”, “ASB”, “ANH1”, or “5′-aminolevulinate synthase 2” refers to an erythroid-specific, mitochondrially located enzyme. Sequences for ALAS2 are known for a number of species, e.g., human ALAS2 (NCBI Gene ID: 212), for which the nucleic acid sequence (e.g., NG_008983.1), mRNA sequences (e.g., NM_001037967.3), and polypeptide sequences (e.g., NP_001033056.1) are known in the art. These, together with any naturally occurring allelic variants, splice variants, and processed forms thereof that catalyze the same reaction, are contemplated for use in the methods and compositions described herein.


In some embodiments of any of the aspects, the ALAS2 enhancer element includes or is derived from a human ALAS2 sequence having the following nucleic acid sequence, NG_008983.1 (SEQ ID NO: 41):










NG_008983.1: 5088-27010 Homo sapiens 5′-aminolevulinate synthase 2 (ALAS2), RefSeqGene (LRG_L163) on chromosome X


ACCTGTCATTCGTTCGTCCTCAGTGCAGGGCAACAGGTAAGAGCTGCTTTCAGCCTGGCACCCTATCTCT





GGTCTGCCAGCTGGTCTCTCAGGGCTGTACACACTGACTCTCTGGTCTGAGTAGATCTGACTTTTTCCTT





TGTTTGTTTCTTAGAATCTGTCTCTTTTTCATTTTCTTTTTATCTCCCATGTCTCTTTCTGTCTTTCCTC





ATTTTCAGCTTTTTTCTCTCTTTTTCCCTTCGTTACTTTCTTTTGTTAGTTTTCAAGATCATTCATTTCA





TTTCATCATTCTCTGACACTCTTGCTTTCTCTTATTTTTCCCTCTGAATTCTAACTATCTTTTTCTCTAA





ATTTCTTTCTCTCCCCCTTTTTGTCTCTTTCCTCGGCTTTGTATCTCTCCGTCTCTGTGTTTCTGTCTCT





CTCTTCCTCTCTATCAAGAACGATGGCTTAATATTTCTTCCTGCAATTCCCCATTCCTCTCTCCCTTTGA





CTCCCTCTACCTGCTGGGCTGACAGCAGAGCTCAGTGGGTCAGAGCCCATGGGGAGCCTAGGGGTGGGGG





AAGAGCTAGGGAGGGAAACTAAGAGGATGTGGGGGTGATGGGAATGATGAATTGGGTAAGGAGAGATTTG





GGGAATTGAGAGATGAATAATTAGCAGAAATAAGTGAAGAAAGTGGAAGAGGAATGTAGTGTCACTATAC





AGAAAGTAAACAGATTTCTATTCTCATCCTAATTCACTGTGAGACCCTAGGCAAGTCATTCACTCTCTGA





AAAAAAGGCTTGGCCTGTAATTTCCACCACCCTTTCTAGTTTTGATTTTGTGATCTTCTAAATTTTCCTG





TTTCTAAGAATTTCTGATTCTCTGATTACAGTTATCTAAAGTTCTGTATGATTCTTTCATGGTGGGAAAG





GGGTACTAGGAAGAGAAGTAAGGCCTGATGTTTCCAACTCCTGAAGAGAAATTACCACTTCCCTTCCAGA





CCTAATTGACTTTTGCAAAGCAGGCCACAAAAGGGGTGGGGGGGTGGGGGACAAGGAATGCTGCAATGAG





TGTTTTCTGGCTGTCTGCTGGGGTAGAGTTGCAGTTGGCCCTTTTCACCTCTGGGAGTACAGATTGGGTG





CTGACACAAGAGAGGATTTTAAAGTCGTAGGGAAAAACTTTCAGTAATGATCTGTTACTTGGTCTCAAAT





TTCACCATCATCTCTTTGGTTAAAAGTATTGTTTTAAGAAGATGCCTGGCAAGCATTATCACACATTAGG





TACATAAGTTATTGAATGGTAGAGTAAATGAATATTCAACAGTACCTGAAATTCCACTGTAGTTACAGAT





CTGTTCCTTTGGTAAGGCATTGGTGACAAATGGCATATGACCTGGAAAGAGGCCTATGTTAGTGCAGCAG





AGGAGATAAATGTCTAGAGTCAGGCCCTCAGTCAAGAAAAAAAGGTAGTAATATTTGAATCACAGATCCA





TAATGGTTAAGTTAGGAATCTCTGGAAACAGATTGCCTAGGTTCAAATCCTGCTTCTCCTATGTACTAGC





TTTCTGATCTAGACAGGTTACTTAATCTTTTTGGGATTCAGTTTCCCTATCATCACAGGGTTGACATGAG





AACACGGCCTGGCACAGAGGGCTCTGTAAGTGTTTGACTATCAGAACTAGGCGGAATCTATGAAATTATC





TAGTCCAATGTCAGTGGAGAAACGGAAGCCCAGAGAGGGGAATTACAGAGCCCAAGTTCACACAATAAAT





TGTAACAGGATTGGGACAAGAATCAATTCTCTAGCTTCCCAAACCCAGCCTGGTATATTCATGTGACTTC





CCTTGGCTGTACGTTCATTTTTTCTACATGGGAAATGGAGAAAATAAAAATAATAAAGTCTATCAATTAA





ATATAATATTTAACACTTTTTTACTGTTTACTCTGGGATAGGTACTCTGCTAAATGCTTTATATGGATTA





TCTTACTGAATCTTCACAACATTCCTGTGATGCAGATTGTCCTTGTTATTACCAACATTTTCCAGATATA





AGATGTACAGCAGGGAAGTGACTTTTCTAAGGTCCCAAAGCTAGTGAGTGGTGGAGCCAGGATTCAAACC





CAAGTAGTTTGGCTCTAGAGCCTATACTCTTTATACCCTAAATTGACTAAAATGCTTCCTTGATTCAATT





TTACTCACTCTAGTCTCTTGGTAGGTAATGAGATGGAATAGAAACAGAGCCCATGGTAACTAGACTACAA





GGTCATGGGTATAATGATGGCCAGGCAGAGTGAGGCAGAGCAAATTTCAGGAAAGGAGTAACAGAACAAG





AGAAATGAGAACAGGAGCTTGAAAGAACTTGAGAATTCAACAAATTCCAAGAAGTGGTCTATATTTTCCC





AGGACCCTGAGCATATCATGGCCAAAAGCCCCCTAGTAATGATGTGTGTTAATTTCTCCTGTTTTTATAT





ACAGGAGGTAGGTCTTCTCCACCATCCCAAGGCAGGACTGGACTTTGCCTCCAATATTGGGGGCTTTCCT





TCCCACTACATACCCCAATGTTGTTGGCATTATTGTTGCCAGTATTGATGTTAGGGGAGTTTACAGGAGC





CTGGAGCCTTGTCATCTGCCTTGCCTGCACTTCTGGGCCATCCATTTCTTACCACCAATAGCCAGGGCCA





GCTCTAGCCAGATGCTCAGACGTGATTCCAGGAAGGGGCTCCTCTTCTCTCCCACGCCCTGGTCTCAGCT





TGGGGAGTGGTCAGACCCCAATGGCGATAAACTCTGGCAACTTTATCTGTGGTCTGCAGGCTCAGCCCCA





AGTGCTTTAGCTTTCACAAGCAGGCAGGGGAAGGGAAACACATATCTCCAGATATGAGGTAGGCACTGGA





TCCAATTCCTTACCTACCTTGTGAAGTGGCCATAATTACCTCACGTTTGACAGCTGATGAAGGCCAAGAT





CCAGAGAGGGGAAGTGATTTGAACAAGAACATCCAACAATGAAATTGGAGAGCTGGAATTTTAATAAGAA





AAGCTAACATTTATTGAAGATTTACTATGTGCCAAAAACTATACTAAAGGCTTAACTTGGATTGTTTCAT





TTAGTCCCTCCAACAACCCTTCTGTCTTTTCCAATTTCAGGGCCCACATGCCTTGGCCCCACATACCAAC





CCAGGCTGCTGTGACAGCCCATGAGAGGGGGAGAGGTTGCTCTGGGATGGAACAAGAAAAAGAGGTTGTT





TTGTGAGGTACGGGGAGGGTGCTTGTTCTATGAGATCAGGAAGGGAGGGAGATGAAGGAGGTTGCCATAT





GAGGGCAGGGCCATGAGCTGACCTGTCCCTCAAAACATAAGGCTGAGGGTGCTAGTAGATTCTACTCAGT





AACTTTCTTCACAGTGTCAGTGCTTTAGTCTTCTCACATTCTCCCATGTCTCTCCCATTGTACTGTCCCT





TATCTTGTCTCACTTTTTGACTCTGTCTTTCCAATTTGCCCTTTTTCTTTACATCTGTCTCTCCTTCTTG





CTCTCTCTAGCTGTCTTTCTCTTGGTGTCTCTCAGCTCTCACCCCTCTTAACCCTCATCCCCCTGCTTTA





GTCACCTCTCTGTCTCTATCCTTTGATCTTGTCATTTTCTCTACTCTCTTCTCTCTGTCCCTCAGTCTCT





CTCTCATCTCCCTCAATTAGGGCCATGATTCTCTTCCCTAAACTTACTTAGCCTTTTGCAATTTCTGGCA





GCATTTTTTTATGTTTGTGTCTGACTGACTCTCTACCCCTGCTGGATCCTCTCCACTCCTGTTCTCACTT





CTATGAATCTTTGTATAATCCTCTAGACTCATTGATCCCTCCTCATGTCCCTTTCGTGCCCCTTGGTCTA





TCTGTCTCTGCCTTTATCCCTGTGTGCACTATCACCACCCCCTTTTTCTTTTTTCATTTTCTCTTTCTCT





CGACTCAATCTCTGTTTTCATCTCTACCCTGCTCCCTTTCCCTCTACCTTTGATCTCTTTTTCCCCCTCA





ATTTCTGTTCTTTTAACTCTACCACCACCACCACATCTTTGTTCTCTCTCTACTTTCCTCCTTTTATCTT





TCCTAAATTTTCTTTTCTTCTGGCTTTTCTCCTAGTCCCTTCTCCTTCCTCAATTTCAGACTCTGTTCAT





TCATCAATTTACCCCAAAATTCAACAAATATTTATTGAGTGCCTGTGTGTCATTTGCTTTCTCTTTTTCT





GATCTCTTTGCCCCCTTTCTCTTCTCTGTCTTGGCCTCTGCCTGTTTCACTAATCCATAGACTATGTCTT





TGTCCCTGTTTTCCAGCCCCACTGGGACTTGCTTTCACCTCTTCCTATATCTGTGCTTATCCAAGAGACA





GGAGCAAATTCAAAGACAGCATAATATCAGGCTGGTGGTACACATTCTGTAGGACCTAGGGCCTACCCTT





CCTTCCGGATCCCTTGATTTCCTTAAACTGATACATGTGACCTCAAGCTCCTTCTCCCCTCTGGCTGATC





CTGCTTAGGAAACACCCTGGGCCAAGCCTCAGGAGCTCTACTCAATGACATATGTTTGCATTAGCAGGCT





GAATCTTCACTTGGCTAAGACCAACATTCTTAGAAAGATTCTTGGCCTTAAGTATTGATCAAAGGGTTAG





TGGGTTGGCAGTTCTCATCCTGCCACACAAAAACACATTTCAGTGATCCTCATCATCACAGAGGTAGTCA





GTGCCAGAATGTGAGTCAGAATCCAGGCTTTCTGACCTCCAGTTAGAACTGTTTCCTTCACCCCTTTGCC





CAGTAGTCAGTTTCCTATTTCTTCCTCCCTCATGTTTTATTGGTACATGTTAACATTGGGAAAGAAGTTC





TTTCCCTGGAAGGGCAATAAGAGCATCTCGGAGGCAGCAAGTTTTGGGTGGGAAGCTGAAGACGAGGATC





AAAGGCTTGGCTTTTTGCCAGGCCCTCATGATGGAACCTCATCTCTTCCATGTCTTCTGCAGGACTTTAG





GTTCAAGATGGTGACTGCAGCCATGCTGCTACAGTGCTGCCCAGTGCTTGCCCGGGGCCCCACAAGCCTC





CTAGGCAAGGTGGTTAAGACTCACCAGTTCCTGTTTGGTATTGGACGCTGTCCCATCCTGGCTACCCAAG





GACCAAACTGTTCTCAAATCCACCTTAAGGCAACAAAGGCTGGAGGAGGTAAGAAGAGGCTGCTAGCAAA





AGGGGAGAATGTTAGGGTCCTGGGGTAAAAGTTCCAAGTTATACTGGCCATCTTTGCCTAATAATTAGGA





CGGTTCATGTGAAAAGTGTCAAGATAGCATGAACTGGCCCCAAAATATACCCAGAATCTGTCTTCTGCCA





GGTTCTCTAGAAAGAGTCTCATTCTCGGCCAGGCACAGTGGCTCACGCCTGTAATCCCAGCACTTTGGGA





GGCCGAGGCGAGTGGATCACGAGGTCAGGAGTTCAAGACCACCCTGGCCAAGATGGTGAAATCCCATATC





TACTAAAAATAAAAAAATTAGCCAGGAGTGGTGGTGGGCGCCTGTAATCCCAGCTGCTTGGGAGGCTGAG





GCAGAGAATTGCTTGAACCCAGGAGGCGGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCCAGCCT





GGGCAACAGAGCGAGAATCTGTCAAAGAAAAGAAAAGAAAAGAAAAGAAACAGTCTCACTGTCATGTCCC





TCACACACTATACTCCAGACATGCTGAAACTACTTAAAATTGCCTAAATCAACTATTCTGTCAAGAGTTT





GTGCCTTTGCTCCTGTCAGATTACCCTCTCCTAGACCCTGTACTGGAGAATCTCATACTTCTCATTTGAC





ACTAAGCTTGGCCATCATCTCCTCTGCAAAGCCTGCTTAGACCTCCAAACTGTCTAATTCCAATTCTGGC





TCATTTCCCCTCCCTCTTCTGGACTTCTGTAGCCCATGTACTTCCTCTATCCCAGCACTGTTCACAATGT





GTCTTCAGTGTATGCCATTCCCACCAGTTTAGTAGCTCCCCTAGCACAGGGACCAGACTCATCTATCTCT





GTGTCTCTACAATAGCCTGAGATAGGGCTTTAGGGGTACATTAGATCTCAGCAATTATTGTTGAGCTGAA





CTTATGACTAGAAATGCACCCCAAATTACTCTCTTACCTTTGCATAGATTCTCCATCTTGGGCGAAGGGC





CACTGTCCCTTCATGCTGTCGGAACTCCAGGATGGGAAGAGCAAGATTGTGCAGAAGGCAGCCCCAGAAG





TCCAGGAAGATGTGAAGGCTTTCAAGACAGGTTGGAGTCAAGTTCCACCTTATGCAACCTTTACTCCTAA





TGCTTGAACACACTACGTCACAGTCCTGAGCTAGGCTAATACAAAAGCAGCCAGTACACATCCCATGATG





AGAAGTCCAGTCTTTCCAGGGGAGCCATGGTAGGCAACAGTTTAGGCTGTATGCTGAAGCACACCATACC





TGACAAACACATATGTACGGGCTCCTGAAACTTTTAGTCATTATTCTAAGATGAGCCCTCTAGAATTTTG





ACTCCTCTTTTTCAGGTGGCTAAACTGATCCCAACAGGCTGGGGTCCCACATTTCAGCAAGACCACTCTA





TGAGAATATGGATTTGCATGAAAGAGAAAGAGCTGGGAGTAGGTACCTCCTTTAACCAGGGTGCAGATCC





CCAGGTCAACTTAATTAGTGCAGACCACCCAAGATAATCACCCTTGAGATATGGCCACACTGTTGACATC





TTTCATAGGCCCCTTTGGGATATCATTAAGGACAAAAACTTCAAAATTGAAATTTAATGATGTTTAGAAA





AGAAGAGTAAGGTACATTATCCTGCATCTACTTTCTAAATGCAGGACCCAGGGTGGCTGCTCCAGTTACC





TGAGCCAAGGGAAAATCCTAGTGGAGAGAAGTATGATTCACCTTATAGAAGGTTTCCTAACAATGTAATA





GTCTCCATTCGGGGGGATAAATAGAAGCTCACCTTGGAGAAGATTTCTTCTCGCTGTAGAAGCTGCCCTT





ACCTTATAAACTTGAATTTTCATGTGTTGCATTGAGCTTAAAGAGGACAACACATGCTTTCTTTTTCCCC





CATTCTCTTCACGGCCAATGAATCTCACATTCCGTCTCAGATCTGCCTAGCTCCCTGGTCTCAGTCAGCC





TAAGGAAGCCATTTTCCGGTCCCCAGGAGCAGGAGCAGATCTCTGGGAAGGTCACACACCTGATTCAGAA





CAATATGCCTGGTGAGTTTGCTGAGGTGGAAAAAAAGGGGACCGGAATAGGGAAGGCATTCTGAAAGGGC





CTCTGTCACAGTAGGGGAAACAGTACAGAAGGGCCTTGGAACCAAAGGAAATTTGAGTTTAAAATTTAAT





GCTGGCACTTGCTGGATCTAGGTGTTTTGGCAAGTAAGACACTTTCCTTCAGTGGCATTTAATACCTACC





TCAATAGGTTACCATGAGAAGAAAGTGAAATTACATTTATGGAAGTGTTTCTAATGAGGCTTCATTAAAT





ATTAGGCTTATTTCCATTATTTCTTCTCTATGCTTCCCTCAAAAACTTTCACCCTTCATACAGCACCTTT





TCCCCATTCTTATATGTGTTTATATTCCTTTCCATAATGACATTTACATTATTTTCTAATGTAAAAGGAA





TATGATTCATGGTAAAATATTTTTCAACATATACAGGAAAGTATAAGGAGGGAAATTTAAGTCATGCAGA





GTTCCACCATTAAGTTTTTGTTATATTTTCTCCCAGATATTTTTCTATGGCTACACACACACACACACAC





ACACACACACACACACCCTCTGCTCTCTTCACCACACCCATGCTTTTGTTAGAAGTGTGATCTTATTTTA





CCTGGAGTTCGTTATGCTGTTTTGTTCACTTAAAAATATGTCATGGGTATAGTATGGATTCAATATCATT





CAGTTAATCAAGCATCTATAATTTAAGTTGTTTCCAATTTTTTGTATTCTCTCAGTTTAGATTGTAGGTT





GGTTTTACATACATACAAATGTACTCAAAGAAAATGTATAGTATTACTTTTTTCAATTTTTATTTTTACC





TAATAATATCTTGCTATATATTTTACTCTGTGCCCTTTTTTCACTCAACAATATACTGTGGAAATGCTTC





CACTTTAACACATATGTATCTACCTTATTTTTCAATGCTTCAAAATATTTTGTAGTATAGATATAATAGA





GATTATTTGGCTACTCCTCTATTTGGTTGCTTCCAATTTTTTCTATTACAAACAGTGGTGCAACAAACAT





CCTTGAATGTATCTCCTTGTGTACACAGGCAAGTGTTTCTCCAGGATAAACACTCAGTGGTGGAAATTCT





TGGGATGTAAGGATGTGTACATTTTTGATATTAATACATTTTGTCAATTAGCCCTCCAACATGGCTGTAC





CAGTTATCAAGGAGGGTATCCATAGTCTCATACCCTTACCAGCCCTTGATATTATCAAACTTTAAATCTT





TATCAATTGATAGGTGAAATTTTGTTTTCCCAGTTTTATTTTTCCTGATTAAGAATCTTTTTCTACATTT





ATTGAATTGTCTGTTCATATTCTATGCCCATTTTTCTACTGAGTTGAAATTTTTCATGTTAATTTTTCAG





AGATTATATAATAAATTCTGAGTATCAATCATTTGTCTGTTAAGTATGCTGCAAATATTTCTCTAGATAT





GTCAGTATGTGCATTTAAAAAACTTTTGATATGTATTTCCAAACATCTCTGCAGCAAGGATGTTACCAGT





TTGCACCTCCAGCAGCCATATAAATTGCTGTCTGCAACATGATTTCTGTCTCACGTAAAGAGTTCTAGAG





TTTAACAAGCTCTTTGGCAAACGTTATTTCAATTTATCCTAGAAATAAAGTTACCCCATTTTGTAGTGGT





AATGGTTAAAGAAGTGGGCTCTGAGTTACTTACTTGATGAACACTTACTTGCTGCATGACCCTGGTCAAG





TTGTCTAACACTTAATGCCCCAGTTCCCTCATCTGTAAAATGGAGATACTAATAGAACTGTCCATGGAGC





ATTGTTGTGAGGAATAAATTAAATATTTATAAAGTTCCTAGGAAAGAACTTACATGTACTAGGCATTCAT





TAAATGTTAGCTATAATGATGTAATTGAATATTAGCTATCTTTATTAGTATTATTATGACTACTAATACT





ATAGCAGTAATAATACTACTATTACCATGTGCCATTTATTAGTTTGAATATATTACATGTTGTTGGTTGT





CAGATGCTCACAACTCTCCAAGGAAAGTATTATTAGCCTCATTCTACAAATAAAGAAATTTAAAGTAAGA





AAGAAGATTCATGACTTGTTCAAGGCCACACAGCTAGGAAGTGGCAAAGAGATCGCTAGAAACAAGATCT





GTTGATACTCCTTCCAGTGAGACTGAAAGCAGTGATTCTAGTAAGGAGGCTGCCACACCAACCCGGGAAG





AGAGATGAGGCCATAAGAAAGTCTAAATGAATGTGTGAATGAACTACTGAGTGAATGAGTGAATGAGTAA





GCAAAAGGATGGCTGAATGAAGTAGTAGAGAGTTAATGTGGTCCATAAGTCAATGACTGAGCAAATAAAT





GAATATGTGGAAAAAGAGTTGGAGAACTCAAAATCAGCAACATGGGTAAAATACAGACTAGCCAGGGAGA





GACTTAAAACGAATTCTTTTCATCCTCATATCTGCTCCTGCAGGAAACTATGTCTTCAGTTATGACCAGT





TTTTCAGGGACAAGATCATGGAGAAGAAACAGGATCACACCTACCGTGTGTTCAAGACTGTGAACCGCTG





GGCTGATGCATATCCCTTTGCCCAACATTTCTCTGAGGCATCTGTGGCCTCAAAGGATGTGTCCGTCTGG





TGTAGTAATGATTACCTGGGCATGAGCCGACACCCTCAGGTCTTGCAAGCCACACAGTGAGTAGTAGGCT





TTCAGCCATCAGCAGTGGCCAGAGGAGATGAAAAACCACACATGGAAAAAAAAAAAAGGCAGAGCTGGCA





GTGGAAACTTGGGTTCTATCACCACTTCTTTTGTCCAAGGTCCTCCATCATATCTATTCCTTGGATATGA





AATAAGTCAACACACCATGTTTCCCAAACTCTTCGGTGTCCAATGCTATGGAGGGGAAGGATGGGAGACC





AAGCAAGGCCCACTCTGCCTGAGTTTTTAATCTAGCTGCAGAATTAGTATTGCCAGAGATGGAGTGTGAC





TTCCTCTAGGTCTTCCAAACTACTCAAGCTCAACCTAGCTTCTCCCTCTCTCCCTGAGTACCTCCAGTCC





TAGAAGGAAGGCACATGTCTCCCTATCCTCCCCATCCTTCCCTCTACTTTGTCTCATAGGACACAGTTTA





TATAGGATCACTAACTCAACATTGACTCCCATCAAGGAAGAGAAACCTACCCAGTTCCTCGATGCCTGAC





AAGAGTTTCTTTTTCTCCTTTTCTCCTGTTTTCTCCTGGCCAGGGAGACCCTGCAGCGTCATGGTGCTGG





AGCTGGTGGCACCCGCAACATCTCAGGCACCAGTAAGTTTCATGTGGAGCTTGAGCAGGAGCTGGCTGAG





CTGCACCAGAAGGACTCAGCCCTGCTCTTCTCCTCCTGCTTTGTTGCCAATGACTCTACTCTCTTCACCT





TGGCCAAGATCCTGCCAGGTAAGCCTGAGGCCTGAGCTTTGTTCAGGGCTGGTATCCTGCAATACAGCAT





CCAGTTTCACTGGTTCCATCACTCCTTCCCTGTATTTGGAGTTCCCTCACTCCCATTGTTCTTCCTTCTT





ATCCACCTTGCATATCCTCAACACTGGATAATTATATCCCTCTGCTTTCTCTCCTTCTGCACGTAGAGAG





GACCATTACCGGGGAACATTACCCCACCTCACAGAAAGGAAACACTATAAATTCATCACCTCCCAACTCA





ACTGAGCTCTTAACACACATACATAGTTATTTTATGTCTCCACAGGAGCTTTTTCAAACTTCTTCTCCTC





TTCTAAAACCTCTGACTACCTTCTCCTCCACACTTAGCAAATAACCTCACATCTTACTTCACAATAAAAA





CAGAAGCCCCAGACAGAGAATCCTTATTTATTGCCACCAAACCTACGAACTTATCTAATTGTTTATCTAG





CCTTGCCTCATTCTTTCCTTTTACAATGGAAGGCATATCTCTCCTTCTGCCTAAAACCAATCCCTTCACT





TGTACACTGGTTCCCATATTCCCAGTCTCCTACTCTCTAGTCTGTAATGTCCTCACCTCATACGCCTTGT





TGTCCTTCCGCCAAGGCCCAATCCAGAATGAATACAACCCTCCATCTTCACTATATCAATTCCGGGCTCA





TACAGTTGCTCAGACAGGAGTCACTAAAAATTCATACTCTTAACCTCTACTGGGTTCTCCATGGTCTCTG





ACAATCCCATTTCCCTGGTCAGTTCTCGAAGTTTATGGGGCAGTTTTGCCAAACCACCATTATCCTCAGC





CTTCCCACACCCCCTCCTCCCCATCTCCCTCAGCAGACAACTTCATGTTCTACTACATTCAAAATAGAAG





ATACCAGACAGCAATGTCCTTGACTCCCAGCCACAAAGCACCTACAAACTCATAAGCATCTTCAAATGTC





CTCTCCTCACTCCTTCTCTTCTGTCATAGTGGAAGAAGTATCCTTTTTCTTGTGACTAATCCTTCCACTG





TTGCTCTGTGCCCCATTCCCCTCTACCACCTTAGGAATCTTGACCTATTGGCTCTCTCCTCCTCTCCTGT





ATCTTCAGCCTCTCCCTCTCTTTAAACATGTTTTCAAGTCTCTTGTATCTTATAAAAAAACATTGCCTCA





ACCCCTGATCACTCTCTAGCTACTGCCCTCTTTCCTCCCTATAACAGGCAAACTGCTTGAGAGAAGTCTT





CGCTCTTACTATCTACTTCCTCACCTCCTGCTGATTCTTCAGCACAGCAAAAATATTACCACCACTTCTC





AGAAACTTTTTTTGAGTCCACCCATAAGCCCCAACTAAACTCAACATCTTTAAGTTGTTTTTAGTCCATC





CCCTCCTCAACCATTAAACTTCTTTCCATCTCTACTGCCAGCATCCTAGCCTGATCCAACATCATTTTTT





AAAGAAAATTTTACCTTTGCCCTCCGATAATCTATTCTTTACAACAGTCAGAATTTTTTTTAATGCAAAA





CTATCTTTGTCACCCCACCCTCAGCCCTGGTCAAAACCCTTTAGTGGACCCCCATTCCCCCAGGACCAAA





TCCAAATTTCTTATCACAGCTTCTAAAGTTCTCAATAATCTGGCTTCTATGTATCTCTTCGGTCTCACCT





TTTTGCATCCCTCCTCTCACTATTTCATTCAGTAATACATTCATTCATATACTCATTCACTTACTTATAA





ATCTGTCATCAGTTTATTTATCCATTCATTTAATAAATGTTTACTTAGCATCTACTGTGTGCTTACTCTT





ATACTGGACACCAGAGACAGAGAGATAATAAGATGTTTTTGCTCCCATGCAACTCCCAGTCTGCTTGTCT





TTCAAGCCATTTTCTCCAGAAAGCCATAACTCATTTTCTCAGGTGGAAGTTATCCCTTAATCTTATAATA





AGGCCACAGTTCCTTGATGGCAGTGCAGTTGGTGGCAGGGGTTGGGGAGGTCCAGGAATCAACTCCCTCT





ACCAATTTCACATGCCCACCTGCCCCACCAGGATTGCCCAGTAAAAAGCCCTGCATTCTTCAAATCTTTC





TGGACCTTAGCTTTCTCACTTGTATAGTAAAGGGATGAATCCCATGATCACTAACAGCCCTGCCAGCTCT





GACATGCCATAAGCTTATGATTCCAACAGTAAAAGCCTGATAAATATCCATCCCTGTAACCACAAGCAGA





TGCTACCTGGAATGGATGGAATTTCATCTAGACTAGGAACAATCTAGCATCAGTCCGAGTCAACAAACAT





TCCCTGGGGTAATCCCTTTTTCAAGTCTTGATCTTATATATTGGGGAGAAGGAAAATAGGTCCCGTCCTC





AAAAAACTCTGAAGCTTCTTGGGAAATTAAATGTTCTTCCACCCCAAGGCAGTCAGAGGCTAGACCAGGG





TTACAAATGACTGGAGGGAAGGATGTAGGGGTCAGAATTTGGGAACAGTGAAGTCCTTCCAAGGGAGAAA





GAAGTGTCACAAAAGTTCCCAGAGAAGGAAGAAGCAGAGCAAGGTCTTCAAAGGGAAGAAAGGGTTGGCC





CTTTTCTTTGCCAGGTCAAACCTGAAGGTTGAAGTGGGAGTACTGGGACAGAAGCTTAAGGATTATACAT





CTGCTTCCTCAGGGTGCGAGATTTACTCAGACGCAGGCAACCATGCTTCCATGATCCAAGGTATCCGTAA





CAGTGGAGCAGCCAAGTTTGTCTTCAGGCACAATGACCCTGACCACCTAAAGAAACTTCTAGAGAAGTCT





AACCCTAAGATACCCAAAATTGTGGCCTTTGAGACTGTCCACTCCATGGATGGTATGTATATGAGTGAGT





GTATGTTTACTAGTGTTGGTCTCACAAAAACCATGATGATCATGATGATGATGATGACGATAACATTATA





ACAGCTAATATTTATAGTGTTTATTATGTGCCAAGCAAAATTATTAGTATTTTACATGTATTAATTCATT





TAATTTTCTGAACAATTCTATGTGATAGGTGTTATTATTATTTTGATTTTTTACATGAGGAAACTGAGAC





ATAAGAGTAATTTGTCCAAGGTCACACAGCTAGTAAATGCCAAAGAATGGAGGCAGCTATTACATTCATC





TTATAGGTAAAGAAACTAAAGTTCAGAGTTGGCATCCAATTCATCTTGAGTGGCTCAGCAAGTTGGTGCT





AAAGTGAGTATCTGCACCCTAACACATATAACTCCAATTCCTCGAGTAACACTTCTCTTGTTAGAAATGA





TATGTAAATCAATAATCCCAGTGTTTGGTTTTTATGAAGGAAATTTCAAAAACCATTGCCTAGGATTTTT





TTCAAGGTCCAGTATGAAGCATTGGGGTCAAAACAGGTTTTCAAGTCAGAGAGACCTGGGTTCAAATCCC





ACCTTTGACAGTTACTGGCTATGACCATGGGTAACTCTTTAACTGTCTAAGCCTCAATTTTCCCAAAGGT





AAAATATCTGGTTGTAAGAATTAGAGATGATAGAAACCATTCTAGTTATTATGCTTTAGTAGAATTAAAT





GATCTTCACACTCCTACCTCCTTTCTTTGCTCAATTGAAACAATGTCCAAAGCTTTCTATTGCTGGCCCT





GTTGTGTAGAAATCATGTGTTTTAGGCATCCTCTTATGGATTTATTTAAGGGAAGAGGTCCTCAACTCAT





TTCAGTTTGTCCCTTTTCCAACTGAAACAAAAGAGTCCATAGTATTCCCTGATTTAGGTATCTTAAGTGG





CATGTAATGACTATACACACAGGCTCTAAAACCAGACTATCCATGTTCAAATCCTAGCATGACCATTTAC





TAGCTTGGGCAAGCTTCTTAATTGCTCTGTGTCTCAGTTCTCAGTTGCTTATTTGAAAAATGTAAGTGAT





AATAATTAAATAGGTATGCAAATTAAATGAGTTAATATATGTAAGAAACTTACTATTATGCCCACTCCCA





CATTTCTAACACTAGCAATAAAGTAAAACTATCCTATCCCTTTTGTATATTTCTACCACTGAGACTATTC





AAATTCATTATTTCTCTAGTGGAAACTATGTTGGTACCATTCTACCTCGTTACATTTGCAAATAAATAGT





TATTTACCTATTTTTGGGGTGCAAACTCTGCCCAAACTGTTGATCCTTAGGCTGAATCTCTCCCATTGAA





ATGATGCTAGGCTGAACACAGCAGAAACAGGAAAATAGACATTGTCAGAATGAAGTAAAAACAGAAAGAC





AAAGAGTCAAGCCTTGATCCCAGGCTGGGGAACACACACACATGCGCACACACACGTACACACACACACA





CACACACACACACACACACACACACACACACACACACAGAGAGACAGAGAGAGAGAGAGAGAAGGCAGGG





ATGAGATACAGGCAATCGATCCATACACAGAGGTTTGTAATAGTTCTAAATGAAGGCGCACATCCTCCTT





CCTCTCTACAACACCCTTTTCCAACCCAAAGTAGGCATGTATGGGAAATTCCACATTGGAGATGGAGCTG





GGGAAGGGTTATGATGTCCTACCTCTATCCCTTGGCTTTGCTCAGGTGCCATCTGTCCCCTCGAGGAGTT





GTGTGATGTGTCCCACCAGTATGGGGCCCTGACCTTCGTGGATGAGGTCCATGCTGTAGGACTGTATGGG





TCCCGGGGCGCTGGGATTGGGGAGCGTGATGGAATTATGCATAAGATTGACATCATCTCTGGAACTCTTG





GTAAGTGAATGCTTTGGGCCTTCTTATATACCCTCCAGAGAGGAGGCCCTTACAAAATTCTTTTCTGCCT





CCTCCCCAAAGCTATAGGGGTTGTTTGGACAGAATTCACAGCCCCAGGCTGCTGCCATCCTGGACTCCCT





CTCTCCACTCGCATCCCACTGCAGAGTTGATGAGAAAGTCTGGTAGAGTTTTTTGAAAAGACCTTGAACT





AGGCCAAATAGTTAGATTCAACTTGAGTATGTGAAGAGCTGTGTTTCTAAACCCCTCCCCCACCCTAGCC





CCAAGCTTCATCTTAGCTCCACTCCTGACCCTATCCAGCTAAAGGTCCCCACCCAGCTCCTGCCTATCTA





GTCATTGCATATGGCAAGACTTGAAAGTCCTATCTCAAAGCAGCAGAATTATCAGCTACGACTGCCTTGT





CATGGACAGATGAGCAGAGGCCTGGGAAGACAGCCTGGAGCCCCAACTTCTGGTGCACCCCCTTGTGTTA





TCTGGCACATGATCCTGTTGCTCTGGGACTGATTATGGGATCTGTGTATATCTTATTCCTTTCTGTCTCC





AGGCAAGGCCTTTGGCTGTGTGGGCGGCTACATTGCCAGCACCCGTGACTTGGTGGACATGGTGCGCTCC





TATGCTGCAGGCTTCATCTTTACCACTTCTCTGCCCCCCATGGTGCTCTCTGGAGCTCTAGAATCTGTGC





GGCTGCTCAAGGGAGAGGAGGGCCAAGCCCTGAGGCGAGCCCACCAGCGCAATGTCAAGCACATGCGCCA





GCTACTCATGGACAGGGGCCTTCCTGTCATCCCCTGCCCCAGCCACATCATCCCCATCCGGGTGAGAGCC





CCACCATGCCCATTGCCCTCTCCACCTATTTATTCTGGGAGCCTCACGCTCCCAACAAACCTACATCTGT





TGCTGTCTTCAATTATTTGCTTTCCTGCTAACCATTCCCTTTATTGCCAGCTTTGTTTCCCTTTTTGAAA





AATTATCAGCCATTCTGGATTAACCAGTCTTTTCCTTGCATCAGCCATTACCTCATGCTTATTAGATTAT





CCTAACCCTAACAATAGCGAGTGCTCACAGCCTATAATTCAGAGTTTTTCAAACTGGATCAAGACAATTA





ATGGGTCACAAAATCAGCTTAGTGGGTTATCATTAGCATTAAAAAAAGAAAAGAAACAGAAAATGTTGGA





GTACATCACATACTAAGGGTATCATCAATTTGTGAAAAATTTGTATGCATTTTGGGTATTTGCATATACA





CATGTATGTGTATGTGTGCGTTTATGGTCACGGTGTAAAACGTACTTCTTATTGAGAAATGAGGGCAGAA





AAATAAAATCAAAAGCCATAGGATTAGCTGCTACTTTGGATCCTCAATATGAGCATTTACTGCCTTTAAA





AATGAACTGCTACTTCTTTCTTAAATAACACGTATTTGTGTGAGTCAGTAAGCCAGGGCAGGGAAAGGAC





ACTTATTTGTGACAATTTTGTGGATGAGAAATAGTCACTGCTCTTTAGACTAACCTAGTATTTCCTTTAA





ACACTCATTTTATGAATTAATTTAGTGACAGCACCCCAGAATTGGCTTGGCGGGGGTTCCAGAATTGGCT





TGGTGGGGGGTATCTTCTCACCCAGAACCATCCCAAACTAAGATATTAGCTAAGTAAAATCAGTGTGCTT





GCTCTGCAAACAGCTTCCAAACAGGGCTCCTGGTACCACCTCTGCTCCATCCTTTTCAAACCAAATTGCT





AGCTCTGAGCTCCTCCTTGATAGAAATTCTGGAGCTGCCACTAAGCCCCTAATGGAAAAAAAAAATCTAT





CCCAAAATTCAGTGATGTTCCCTCATCTAGTTCCCTCCATCTGCTTAATGGAGCTAGTGATGGTGGAGCC





AGAGTGGCAGGTACTGATTAGCCTTTCTCCTGAGTCCAGGTGGGCAATGCAGCACTCAACAGCAAGCTCT





GTGATCTCCTGCTCTCCAAGCATGGCATCTATGTGCAGGCCATCAACTACCCAACTGTCCCCCGGGGTGA





AGAGCTCCTGCGCTTGGCACCCTCCCCCCACCACAGCCCTCAGATGATGGAAGATTTTGTGGGTAAGTTC





TCAACATGGGTGCCTACAGGACCTCCCTCCCCTCAGCCCCAGGATCTGAAAGAGAAGCTGAGAGGACAGA





GACCACTGAGTTTACAAAATATTTCTGGAACATCTAATGTGTGCCAGCACCTATACTAGGGTCACAAATA





AATGAGAAGCAGCCCCTACACTTGTAGGGCTCCAGTTTGGTTGGGGATACCATAGTGAACACAAACAATG





ACACTAAGGGATGATCAAAGCTCCACAAGGCAGTGCATGATAGAGTTGTCGGAGCAGAGAGGAGGGGCCT





GACTCAGCCTGAGGGATGCAAGACCCACTTCCTAGTAGAGGTGACACCTGAGCTGAGTCTTGCAAAGTGA





GTGGTATTAAAAGAAAGAGGGCATGGAAGAAGTATTCCTACCAGAGGGAAGAGCATGAAGATAGGTGAGG





AGAATGAGAAGCAGCCAGGGATATATCAAGAACAATAAGCAGGTGGTATTGGAATGTAGGGTCATAGGAA





TGGAGTGGGGCAGGGGAGTATCAATCTATGAGTCTACAAAGACAACATGAGATAGAGACTGGATTGAGAG





GCTTGTAGAGCTGAGTAGTTTGAGATTTACCCTGAAAATGCCAGTTTAGTCAATTCACCTAATGTTTGTT





GGATTTCTGTTGGGTAGTTTTGTTTTTGTTTGTTTGTTTTTGTTTTTGTTTTTTTGAGACAGAGTCTGGC





TCTGTAGCCCAGGCTGGAGTGCAGTGGCACGATCTTGGCTCACTGCTACCTCTGCCTCCCGGGTCCTGGC





TCAAGCAATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGATTACAGGCACGTGCCACCATGCCTAGCTAA





TTTCTGTATTTTTAGTAGAGATGGGGTTTCACCATGTTGGCCAGGCTAGTCTCGAACTCCTGACCTCGTA





ATCCACCTGCCTAGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCATGCCCGGCCTGGGTAGTT





TTTAATGCAGGGCCTGACATTGAATAGGTGCTCATTCCAGGCCTGTTGGATGAAAGACATGTAGGCAGTT





GATGGTCTAGCAGAGGAGCCAGATATAGATGGTACTGGTCCAGTATGATGAGCTCCAGTATTCTGGGAGC





TAGAGGGAGTGGACACATTATGGAGAGAGAGGGTGGGAAGGATGAAATTGGAGAGGCTTTGTGAGTAAGG





AAGTTTTTATGATGCATGTTGAAGTACATGTGAATATGTTGTAAGAATATTCCAGAATAAGGGAATTCCA





CGAGCAATGACCTAGAGATAGGAAAGCAGTGGGTATGTATTGACAACATAATTCTGTTTGTCTGAAGCAT





GGGCAGTATGAGAATTCAAGGAAGACAAGCTAGGTAGGCGCCATTCATTCATTCAAAAACATTAAATAAT





GCTGGCTAACATTAAGTACTTACCATGTGCCAAGCACTGTTCTAAACACTTTACACGTATTAACTCATCT





AATCCCCACAACAACCTCAAGAGTTAGAGATCCTCTTATCATTTCCATTTTGTACATGTGGAAATTGAGG





CACAAAAATATATAGTCGCTGATCCAAGGTCACACAGCTTCTAAGTTGCAACTGGGAGGTCTGTCTCTAC





CTCCATGGTCATAACTGCTAGGTCTACCACCTCTCTGAGCTGATGACCCAGACTCCTGGGCCTTTTGTTC





AGTATTCTCTTTTGCTCTGGGCTTCAATTGTAGAGCTCTCAGTATTCTTGGTTCTCTGAATGTCCACCTA





GGCTAGGCTTTTGTAAGAATATATGAGGCATCCACGATGGCTCCACCAGTCCCTAAGTTCCATAGCCAAT





CCATCCTGAAATCCTGCAAAAGTTATCTATAATCTCTCTCAAACCTATTTGCTTTTCTCCCCTGCCACTT





CTTTAATCCATGTCAACATGATTTTTTTCCTAATTTCTCTGCTTCTCTCTTGCTCCTCTCAAATCCTTTC





TCGATGATGACCACTAGAGGGATTTTTCTAAAATTCTGACTATATTGCTCCCTTGCTTAAACCCCTTCAT





GTTTCCCTCTAGACTCTAAAGCAGTGACCTCCAAGGGGTATGCAAAATGATTACAGGGTGAAGGAACAGA





ATATGTATTAGAATTTTATGTTTTTTTATCTTAAAAATAGGAAATCAAGCATCACTGATACTGATCTTTA





ATATACAGACTGACAGTTATACATGTATATAATATATAAACAAATATAGAGATTGGAGGTACATGCTAAA





ACATTTGTACTGATAGGGATGTATAGTCCAAAATTTGGAAACATTGACATATAGGACAGAGTTGAAGCTC





TTCAGCATAGCATTCAATGCCTTCCACATGGTGATCTCTATGCCCTCACCTCCTCCCCACATGCATTTTG





TTTTTTCAGCTACACTGAAGGACTTGTCGTTCCCTCATTTTTTTCTGCTCTCTTACCTCTGGGACTTTGC





TCATGCTGCTCTCTTTTGATTGGAATGCCCTCCCTCACACTTTCCTCTGGCTTACTTTCCTTCATCTTGT





AGACTTAACTTAGGCATTCTTTCAACAAATATTTATTGAGTACCAACTGTGTACTAGATACTGTTCTAGG





CACTGGGGATGCAGTAGCAAACAAATCAGACACAAAATTCCTACCCTCTGGAGCTTACATTCTAGTGGAA





GGGGTAGTAAAAAAAATTACCAAAAATAAGCAAATTAAGTAGCACATTAGTTCTAAGTGCTATGGGAAAA





AATAAAGCAGGATAAGGAGAATGGGATAAGGGGCCAGGGGCGAGTTCAGAGAAGGGTTGTAGTATTAGAG





TGGCAAGGGTAGAAGACGCTGAGGTGAAACTTGAGCAAAAATTTGAAGGAGGTGAAGTTAGTGAGGCAGA





TATCTAAGGGAATGGCATCGCAGGCAGAGGGAACATCCTAAGGCAGGGAAGACACAGGAGTATTCCTTTT





ATATTTGAGGAACAGTAAGAAGATGGGTGTGGGTGGAATGGTATAAGCAAGTGGGAGACAGAAAAATTGA





GTACATAGAGGCAATGTGGGACCAGATTGTATAGGGTATGGTAGGCCATTAGAAGGAGTTTGGCTTTTAC





TCTGAGAGCCCTTGAAAGGATTTGAACACAGGACTGATATTTCTGACTCGGGTTTTAACAAAATTGCTCC





AACTTCTATGTAGAGAATACACTAAAAGGGAGCAAGGGTGGAAGCAGGGAGACCCAAGAGTGGGCTACAG





TAATATCCCAGGTGAGAGATGATGGTGGCTCAGACTTGATCATAATGAAGGCAATAAGAAGTGGTCAGAT





TTTGAAGGTAGAGCCAAGGGTCTTTGCTGATAGATGGGATATAGGGTAAGAGAGAAAGAGAAAAATAAAG





GATAGCTCTGAAATTTTTGGACTGAGCAACTGGAATTGCCATCCACTGAGATGGGAAAAGCTAAAAGTAG





AATAGCTTGGTGGAGGGTAGGGACATGAGTAGCTCAGTTGTACTCCTAAGTTAGAAATGCATATTAGACA





TCTAGGTGGAGATGGAGAAAAGCCATTGGATATACAAGATTGGAAACCAGTAGAGTGGCGTGAGCTGGAG





ATTAAAATTTCTGAACCATCAGCATATAGATGGTCTTTAAAGTCATGTGACTAGACAAGATCAACAAGGG





CATGAACACAGAAAAGGCCAAGAACAGAGCCCTGGAACGTACCTGGGGTACTTCCTCCAGCTAGGTCAGG





TTCCCTTCTCTGGGTTTTCACACCCCCAGGTGGACCCCCTACCCCAGGTTTCCTGGTCATAGCACCAATG





ACACAGTATAGTTACTGTCATTATCATTGTCCTCATAGGGCTTAGAGTTCCCAAGCAGACAGTCATTCTT





GGGCCACAGCACATCCTATACTTAGGGAGTGGTCCAGGCCAGGACAGTATGGCTTCAAATTGTGTCAAAG





GAGAGCTTCCAAATCTTTTATAATATATATCCCAGCATCCAGATACAAATGGTAATATTCACGGCACACA





CAGAAGCAAACAGTAGGCTACTTCTGGCCCTGAGGTATCTTGAAGGGTTGAGGGGGATCAATATCTTGGC





TCATCTGTACTGTGACAGATTTGGAAGATCTAGTCTAACCCATTTTTTCCCTCCCCTCCCCCTACCACCT





TCAGAGAAGCTGCTGCTGGCTTGGACTGCGGTGGGGCTGCCCCTCCAGGATGTGTCTGTGGCTGCCTGCA





ATTTCTGTCGCCGTCCTGTACACTTTGAGCTCATGAGTGAGTGGGAACGTTCCTACTTCGGGAACATGGG





GCCCCAGTATGTCACCACCTATGCCTGAGAAGCCAGCTGCCTAGGATTCACACCCCACCTGCGCTTCACT





TGGGTCCAGGCCTACTCCTGTCTTCTGCTTTGTTGTGTGCCTCTAGCTGAATTGAGCCTAAAAATAAAGC





ACAAACCACAGCA






As used herein, “GYPA”, “GPA”, “MN”, or “glycophorin A” refers to a sialoglycoprotein of the human erythrocyte membrane which bears the antigenic determinants for the MN and Ss blood groups. Sequences for GYPA are known for a number of species, e.g., human GYPA (NCBI Gene ID: 2993); the nucleic acid sequence (e.g., NG_007470.3), mRNA sequences (e.g., NM_001308190.1), and polypeptide sequences (e.g., NP_001295119.1) are known in the art. These, together with any naturally occurring allelic variants, splice variants, and processed forms thereof that catalyze the same reaction, are contemplated for use in the methods and compositions described herein.


In some embodiments of any of the aspects, the GYPA enhancer element includes or is derived from human GYPA sequences having the following nucleic acid sequence NG_007470.3 (SEQ ID NO: 42):










NG_007470.3: 5001-36438 Homo sapiens glycophorin A (MNS blood group) (GYPA), RefSeqGene on chromosome 4


GCAGGAAGGTGGGCCTGGAAGATAACAGCTAGCAGGCTAAGGTCAGACACTGACACTTGCAGTTGTCTTT





GGTAGTTTTTTTGCACTAACTTCAGGAACCAGCTCATGATCTCAGGATGTATGGAAAAATAATCTTTGTA





TTACTATTGTCAGGTAAGTGATTTTATTTCATCTTGGTTCTGTTATATTGGGTATGAGATCATAGAATAA





AATATGAACTACCCTATTTTAGTTCTATCTTATTTAAATCAATAAATGAGTAGTATTTCCTCTTCCAGTC





TGGTGGATGGATTTTACTGGAACTCAGCTACCAATGTGGGGGAAATGGCACAAGGGAGCCCAGTATTTAT





GGCCAAATCCAGTTTTCTAGTATGAGAAGCTTACTTCAATTCTAAGTCTAGCTAGAATTAAAATAATTTT





ATCAAATGCTATGAGAAATACCTCTCTGTGAATAAATGTATTGCTTTGTTTGAGTTATAAGGAGATTCAT





TTCCAAACTAAAGAGTTATTAACGAAGATGTTGGTAGCTATATGGCTTTTAGTTTTCAAAAGGTATAATT





TCCTATTTCTGCCAAATGGCGAGAAGCCAAAAGCATGAACACTGAAACCGTGGGGAGTTGTTCGCTTCTC





TGTGGGTCCATTACTAAAGTGTCACATAGGAAGAAAAAAAACAAAAACAACTCTTACTGGCTTAGGTATC





CTGTGAATTTTAGGAGAAATTTAAATCCATTAAAATAAAGAAATATCATAGGGTTATTATTAAATTGTAT





TAATTCAATAATTTGAATTTAACTTAGTTTAAATTTAATTATTAATTTAGTGTCTTAAATTAACATGATT





TTGGCCTCTTTCTGAGAATATTATAGTTAAACATCCTCTCAAGTGCAGTGCTTATGTGTTAGCAATACTA





GTGCCCAGCACACAGCGGGCAGGCAGTTGCTTGAAACATTCTGAGTCTATTAGACATTGCTGTATCCCAA





GTGAGAGCAAGTATCAAGGAGCTACTGAGCACTCTGTAGCACACAGGGAGGAGAGATCAGCATTTTCTAA





GATACCCTAGGGGAGGATAAAATAGTGCAATAGTTAAGAGCACAGGCATGAGGAACAGACAGAACTGGGT





TCAAATCTACTTTTACTTCTCAAGGCTGGGGAACATTAAGGCAAATTATGTGCCCACATTTTTATGTGTC





CTCGTCTTTAAAATGCAGGCAGTGTTGGTACTTACCTCATAATAATTGCATAAAGATTAAACAAAATATT





TAATGGAATACACTTACTGATGCCTGAAACAAAGTAAAATGTTAAGATTACTATGCATTTTCTGTGATTA





GAATTAACTATCATGATTAAAAAGTATTAATAATATATTATTAAAATAAGCAGTAGCTATCAATAGTTAC





AGACTAGGGAACAAACCTACGTATGTGATTGGTGATTTCTGAAAAGTCAGAGAGAAAAGAAAATTACAGA





AAGAAAACAGAAAACAAACATAGCTACTCTAATTTTTTAAGCAGAAAAGTATGAAAACATTTAGTTTGAA





GAAAAGAAAACAAATGAAAGGGATGTAGTGTAATATTTGTATATATATTCATATATTTGAAGTGCTATTA





CACAGAAAAAAAGATGTATTCTTTGTGTTGCTCCATGGGGCAAACCAAACTGGATGTAACTCAAGCAAAA





TTAGACACTGCATACTCTACTGGGGGTGTGCCCAGCATTTGGGAAAACTCTGTGTGACTTACAAGTGCCC





CAAATTTGGAAAGGGTTCCTGGCAAAGAAATGATTTTTTTTTTAAATTTCTACAACTACACAAGCAGATA





GTGTATTAAAGCCTTAAATGGCACTTGGTCACTGGGGCAAGATGACCCTGAAAGCTACAATGGTCTCCAG





TACCCAAGCTGTTATCATCTTTGTAGCTTCAGAAACCCTCCAAGGAAACTCTCTTGATGTGGCTACTTTA





TAGTATAACAGAAAGGTGTAAGATCAAGTTTTTCCCCCATACTGATTAGCTGAAGAGTAAACATGGTGAA





GTCTTTTTCTTTTTCTTTTATGTTGCTATAAAAAAAAAGATGATTGCCTTGCTTTCTCCAGGAATCTTAA





GAATAAAGCCAATATTTCTAATTCTAAACTTACCAGAGATCTCCTTCCAAATGGAGAATCCATTTTTTCT





AATATGACTTGATTCCCAGTCCCTGAATTCCTGCACTCATTTGATGATTCAGTCATTACATGTCAGATTG





TGAACCAGACACTGAGCCCACAGCAGGAAGAAAAATGGGCTCCCATGGAGGATACACGGAGGGTAGGCGC





AGTGGATGATGGGAGGGAACGCAGATAATAAATGGAACAACAACTATCTTATTAAAATAAGATAAAAACA





GTCAAAACTAATACAAAGCATATAAAACCAGGTAAGATGATAAACATGAATGCCGAAAGCTGCTTAAGAA





AAGGGTAGCAGGGAGTTATTTTCTGAGTAGATGACATTTATGCTAAATGTGGAACAAGGAGACGGAGCCA





ACCCTGAAAATTCTGGGAAAAGAGGACAGAAGGCAGAGGGAAGAGCAAGAGCAAAAATTCTGAAACAGCA





GGTAAGTTAGTGTTTTCAAGGAAAAGCTGGAGCTTTTATCTGAAAATCAGATTCTGAAGCTAAGAACCAA





TTTGAAAATACAATACAATATCACTTCGACTAGGAAATTATGGCATAAACCAGGAGTCTCCAAAAGCTTT





TTGTGTTTACTTAAAAATTCATACAAAATTTGCATTCTAGGTCATAATATACTAATTTAATTGGAGGAAA





CAAAGGCACTGGTATGATATCATCATGCCTACTTTATTCATCCGTGTATCCCCAGAATCTAGCACAGTTC





CCGATTGGTATTTATAGTAGCATATTGGTTGAATAAGCAAGGAAGGAGGTGAAGGGAGGGAGAAGGAGAG





AGAAGCAGAGAGGGAGAGGAAGGAAGAAAGAAAAGGAAAAAGGGAAGGAAAGAAGAGAGGAGGGAGAGAG





GGAGGGAGGCAAGAAGGGAGAAGAGAGAAGGGAAGGGAAGAGACAGGAGGAAGGGGAGGAGGAAAGGAAA





GAGGAAATATTTGTTTTCATCTGGTTAGACACAGTGAGTGCTCCGCATAGACAGATCATTATTACCCTGT





GCATCTGACTCATACCCCTGCAAGTACATCAGTCTGAGAAGCACATGTTAAGTGAAGAAACAAGGCATCT





CTTTTTTTTTTTTTTTTCAGGGATCCAAGAAGAGAGCCTTGCTAGCTGCTATTTAATTGGCACAGGAAAG





AGTTACAGGAACTGTATGCCAGGGAATACATGACTATAAATTCTTTAAAAGCAAAACCTGTGTCTTCGCT





TATGTGTCCCACACATTGTCAGCCACATAGTAGGCAGTCAATATCAACTACTCAAAATGACAAATGACAA





ATGACCAGAATTCTGCGGCAGACTAGTTTAGCCATGAAAAATCATTTAACACCCGTGGGCCTCAGTTTTC





TTGTGCCTATTCAATAAAGCGCCGAGTAGATGGTATCTACAAGCATTTTTCAACTGTAAACCCCAATGAA





TCCCCAAAATTCAGCCTGAGATGAGCTGGACTAGTTGCCAAACCTATAAATATCTTTAGCATGGTGTGAA





ATAGGGTTTTTAGAAAGAAACAGACACCCACTGTGAACTCCTTTGCAGAAAAGGTCTGAATAGAGGGGAA





AGTAGGGATGGTATCTCAAACTTACTTTGTAGTGATTTTAAATTAGGAAATTTAGCTTCACATTCTTGTG





ATAAATTTCTTTTCACCTTGGTTTCTAGAAGATTATTCAAAACATCTGTGAGACTATTTGAGAAGTATAC





TTTTGGGGAATTTCCCCAAGTTATCTTTATAGATTATATTTTGACATCAACTGCAAATGTAATATCTTTT





ACTCAAAAAAAACCCAATCCTACTTACATGGTGCTGACAAAATCAGGCTGGACCTACATTTTTACATCAT





AGATTTCCAGCCATTATTATCATATCCACATCTTTAGTAAGTACCTATCTGTGTAGTTTTCTGTGATAAA





TGAACTAAACTAAAACTAAAGCAAAAATGTTGAAAAAAAATTCCAGGTTTATCTCTGAGTGTTGGGATTG





CAAGGTTTTTTTTTCTCATTTTAAATACTTTCTAAATTTTCTGCAAAGAGAACCATATAATCTAATCAGG





ACAAGTTTTAATATATTTTAAAAAGTAAACCGAACAAACACAATCTCTGCTTTCTAAGAAGTCTTTAATT





TTTGTACGTTGGTCATAGACTATGACTATACAATTTATTTGTGATATGTATTAAGAATTTCTGTCTAACC





CAAATTATTATATGTAAGCACGGGAAAAATGATGTCATCTTTGTTTGTAGTGTACAAAGTTCTATAAACA





GCTATTTGATCAACTTTGGTATTTCCATCCCTAGATTTATATACAGCAGGTTAGGTTCCATACAGAGGCA





GGTTCTGAATAATAATAACCAACACTGATAATAGCACTTACTTTGTGCCGTGCACTGTTCTAAGCAATTT





ACATACACTTAATTTTTAAAATTGTAGTAAAATACACATAATATAAATTTACCATTTGAACCATTTTAAA





GTGTACAATGGGTAGCATTTAATGCAGTCAAAATGATGCACACCCATCACCATTATGTAGCTCCAGAACA





TTTTCATCACTCCAAAAGGAAACCTCTTACCCATTAGCAGCCACTTCCAATTCCTCCAGCCCCTGGAAAC





CACTAATTTGTTTTCTACATCTACAGATATACCCATTGTAGATATTTCATATAAATGGAATCATATAATA





GGTAGCCTTTTGTGTATGTCCTCTTTCACTTAAAATAATGTGTTTAAAGTTCATCCATATTGTAGCATGT





ATCAGTATTTCATTCCTTTTATAATTGTGTTGGTATATCTCATTTTGTTTATCCACCCATCATTTGATTA





AAATTTGGGTTGGCATATCACATTTTGCTTATCGATCCATCATTTGATTAAAATTTGTGTTGTTTCCACC





TTTTGGCTATTGTGAATAGTGCTGCTATAAATATTCCTGTACTAGTTTTGTTTGAACCCACTTTTAATAC





TCAAAGATGTATAGGGGTAGAATTGCTGGGTCATAGTAATTTTATGTTTAACTTACTAAGGAACTGCTCA





ACTCTTTTCCACAGGAGCTGCACCTTTTGACCTTTTCACCAGGGTGTATGAGGTGCCAATTTCTCCACAA





TCTTGCCAGAAATTGTACTTTTTCATTTTTTTAATTATAGCCATTTCAGAGGGTATGAAATGGTTTTTCA





CTGTGGTTTCTTGCATTTTCCTAATAACTAATGACGCTGAGAATCTTCTCATGTAATTGTTGGTAACTGC





ATTTTGCATATCTTTGGAGAAATGTTGGTACTAGTCCTTCACCCATTTTTCAATCTATTTTTCTTTTTGT





GTTGCTAAGTTGTAAGAGTTCTTTCTATGTTCTGGATAAAGAGTCTTATCAGATATACTATTTGCAAATC





TTTTCCTTCATTCTGTAGATTTTTGTTTTTACTTTTGATAGTGTCCTTTGATGCACAAATGTTTTTCATT





TTCAAGTCCAATTTATTTTTTTTTCTTTTGCTGCTTACGCTTTTGATATCATATCTAAAAATAATTGCCA





AATTTAAAGTCATAAAAATTTCTCCCTATGTTTTCTTCTAAGAGTTTTGTATTTCTTCTCTTATATTTAG





ATCTTTGGTTTATTATCAGTTAATTTTTCTATATGATGTATGATAAGAGTCCACCTTTATTATTTTGCAG





CTGTCCCAGCACCATTTGTTGAAGAGACTATCCTTTGCCCATTGAATGGTCTTGACACCCTTCTTGAAAG





TTAATTGGCCATGGATATATGAGTTTATTTCTGGAGTCTCAATTCTATCCTAAGAATATGTCTGTTCTTG





GGGCAAAATCACACAGTTTTTATTGCTGTTACTTGGTTATACGTTTTTAATTCATGAAGTGTGATTCACC





AAACTTTGTTCTTCAAGATTGTTTTGCCTATTTAGATCCCTAACAATTTCATAGAAATTTTAGGATTAGG





TTTTCCATTCTTGCAAAAAAATAATTATGTGCATTTTAACTTAACCTGTTCAATAACTCTATAAGGTAGA





GACTAATCCATGTATAATGATGGAACAAAAATATAGAGATTAAGTAAATTTTGCAAGGTCTCAGGTAGTT





GCTAGAGGAATTAGTTTGAGCCTAGGCAGTTCCACTGCAGAATCTGTGCACTTAGAGAATATGTCATGTT





GCCTGTACCATACCTAGTGATGTTCCAGGATTGGCTCCTTTACTCTTACAACATTGTCACTCAGTGTTCT





GCCTGTGCTTTCACCAAGCTGAAGACTTTAATGAAGGTTGACGGTCTGTCTTCCTCACGTGGTGCAGCTA





AGGAACTCTAACTGTGTGGCTGTTATGTTAGCCTTTTGCTCCTTTTTATATGGGCTATAGAAAATGTTTT





TAAATCCTGGAGGCCTCCTTTTGATGTTATCACTTATTTCCCAGTCATCACTATATTTTTAAAAGCCAAA





ATAGAAGGAAATAAATACAAAACATAAAACATGAATAGTACAGCTATTTGAGGCAACTGAGAATAGAGAT





CATGGCACTGAAATTGCATTTTGCTAGGAAAAAGACCACAAAAGTTCTCCCCTTGCTACCTTTCCTGAAC





TATTCTGCTAGATTCAGACTTCAAAAACATTGTATCAGGAAATACAGAAATGTTCTTTCAAAATGAGTGT





ATGGGAATGTGGGAATGCCTAATAAAATCTGTCCTCATTGATTCGTTAGCAAAAATCATATAAATCAATA





CCTTGTGATTGCAAGCAGATATATTTCAGATCCTTTCTGTGTTTGTTTTTTTGCTTTCTTGATCTATCAC





AATTGGAGAAAACTTAAAATTTCTCAATGGTATTGTATTTTTGCCAATTTCTTATTCTGCTTTATGTTTC





TCGTTGCTATATTATTGGGCTATAATGGTCCATAATTACTTAAGAATCACTGTGAAATATATTGCTTAAT





GACACAAGTAAATCTTTTTCATTGTTTGTAATGTCTTTGCTCTTAATTCTACTTTGCCTAAGATTAATAC





GGTTATTCCTGTTTAGTTTTATATGTATTTATTTATTTATTTTGAAGATGGAGTCTCGTTCTGTCGCCCA





GGCTGGAGTGCAGTTGCATGATCTCGGCTCACTGCAACCTCTGCCTCCCGGGTTCAAGCAATTCTCCTGC





CTCAGCCTCCCAAGTAGCTGAGAATACAGGCGCACACCATCGCGCCCAGTTAATTTTTTGTATTTTAGTA





GAGACGGGGTTTCACTGTGTTGCCCATGCTGGTCTCCAACTCCTGAGCTCAGGCAATCCACCTGCCTCGG





CCTCCCAAAGTGTTGGGATTACAGGCATGAGCCATTGCACCAGTCCTAACCTATCTCTTTTGACTCAATC





TAAAAGTTTCTGTCTTTTAATACAAAACCACAATCCATATGCATTCATTAATTCACAACTGACATTTAGT





ATCTTATTTCTGTTATCCTATTTCATATTTTATGATTCCTTGTTTCTGCTCTTTTGATATATAAATTATG





TTTTATTTGCCCTTATCCTTTCATGTGTTTCTAAAGTATATAGCCTACGTGTAATTGTCCCATTAGCTAA





CTTTATGTTTTTGAAAGCATTCTCTCTCAGAATTCCCATTTTAGTGGTGCAGCACACATAGAAAGTCTAA





GTGCTTTCTGGAGCTAGATAAGCTGGATAAAGGTGTGCATGAGCCACTGGTCAATGGCTTGTGCAGGCGG





TGAGTGCATTTCTGGTATTTCATATGCTATTGATCTGGCAGCCAGGTATTCAGATAGGGTATAACCAGGT





TCATCAGGCTCAAAACATAATCAAGTATTATTGAGACATAGTTAATGTGCACTACAACTCACAGCACACA





GGCTCACACACACACTTGTCTGAAATAAAATTCCACAAAATAATACCTTCCCTTATTCTGTGTGATGTAC





TTTGATATATTCTCTCCTGTTTTATACAACTTAATTTTTTTTAGAGAAAAGATTTTGCTCTGTGGCCTAA





GCTGGACTGCAACGGCACAGTCATAGCTTACTTCAGTCTTGAACTGCTGGATTCAAGTGATTCTCCAGCT





TCTGCCTCTCAAGTAGCTGAGACTTCAGGTGTGCTCAACCACACCTGACTAATTTTTTGGTTATTTAATT





TGTAAATATGGGGTCTTGCTATGTTGCCCAGGCTGGTCTCGAGCTCCTGGCCTCAAGCGATCCTCCTGCC





TTGGCCTCCCAAAGCACTGGGGTTACAGGCATGAGCCACCACACCTAGAATACAACTTAATTTTTTAGTG





CCAGTGACAACCCACTGGACTGATTTCATAACCCATTAGTAGAGGAATGCACCATCTTGACTGAAGGTTG





GAATTTTCTCAGGGAATCTATGTAGCACTGATGATTGGGTTTCATATCCAGAGATTCTAGTTATGCTAAT





ACAGAGGCCAAGCAAACTATAGCCTGTGAATGGCCGGCCCCCTGGTTTTGTATACCTTACAAGTTACAAA





TGATTTTTACTTTTTTAAGTGCTTAAAAAAACCAAAATAGGCCGGGTGCAGTGGTTCAAGCCTGTAATCC





CATCACTTTGGGAGGCTGAGGCAGGCGGATCACGAGGTCAGAGGATCGAGATCGTCCTGGCTAACACAGT





GAAACCCCATCTCTCCTAAAAATACAAAAAATTAGCCAGGCTTGGTGGTGGGCGCCTGTAGTCCTAGCTA





CTTGGGAGGCTGAGGCAGGAGAATGGAGTGAACCCGGGAGGCAGAGCTTGCAGTGAGCCAAGATCATGCC





ACTTCACTCTAGCCTGGGCAACAGAGCAAGCCTCTGTCTCAAAGAAAAAAAAAAAGAAAGACACAAAAAA





AATCAAAATAATAATAATAATATGTGAATATTATATGAAATTCAAATTCTACTGCCCACAAATCATTATT





GGAACATAGTCATACTCATTTATTTATGCTTTGGTTTACATATTGTCTGTAGCTGCTTTTGCACAGTGAC





AGAGTTGAATATTTGTAATAGATGGTCCACAAAGCCTAAAGTAGTTGTGGCCCACAAATCCTAAAGTAGT





TACTCTCTCTCCCTTTACATAGGAAGTTTACTAATACTTGTGCTAAGGGATCTCAACAGACAATCTGAAA





AACTTAAGTTTTAGACTAAAGATTTCCAATCTAAATTCCTGTGGAGCTTTCTGAAGCTGCCAGGTGGAGA





TGGGAACAGGTTGTGAGGCTGCAGGCCAAACACTCAGGCCAGCTTCCACCAAGCAGTTCAACTCTGTCTG





TTTCACACACTGATGAGCTTATCCTTGGAAAGTGATTAAAGTAAAATTAAATGCGAATTGAGGGAGGAAG





TGAGGGAGACTGTGGCTCTAAAACAAAACCCTAAGAAACACCAACATTTAAGATGGCAAATGATGTTATT





TCTAAAGTCGTTCAGGCTAATATCACATACTATAGCTGTTCACTTTATAGATAAAGGTGACACTACAACC





ATAGAAAATGTAAGAGTGGACCTCGAAACTCAGGAAGATGAAGTTTACATATATTAATCTATATTACCAA





CTGGAGCAGTTGTTCTCACTGCTGGCCGCACATCAGAATCCAATTCCTGGGATATCACAGATGATTCTAC





CATGCAGTCAAGGATGAGAACAAACTAGGTTCATTTCTGCAATTTTTTTATTGTTCAACCAGTGAAAAGG





AAGTACCAGTGGTGTGAGAACTTTGGGATAAAGTTTTTGTTTTCAATTAAAATTATTTTCATCCAGCCCA





ACTTCCTTAAGCCCAAATTTAATGTGTGTGAAGTTCAGCTACAGAAATACCAAACCTTAGACTAAAGCGG





ACACAGGTAAAATATGTGAAATCCTCTTTTGTTCTGAGGATTCTTTAGTAGGCAGGAGTGACCAGATAGG





AATATGCTTGGCTGGAAAAATTAAGATTCAAGTTAACAAACTGTTAATAACCAGGACCATCTGCTCTTCC





GTAATGTGGATTTGCCACTGCAGGTCACCCTACAATGCTATGTTAGAGGTACAACACTCTTACCCTCAGG





CTATAAACAAGGTGAATTATTATCTTTATATCTCTTCATTTAGCCCTGATTTGCTGAAGTGAAGGCTCGC





TTGAGAGTTGGTTGCATTATAATTTGGTGAGAATTTAATCTCTCAATGACAACTTACTTGATTCCCTCAT





TCTCTTTCTGCTACATAGATCACAGTAGACCTTGGCAGACAGTTCTGTAGTTACATAGGTCTGAATTCAA





AATCCAGGTCTGCCACTTGGTGGCTGTGTGAACTTAAGCAAGTCAGGCAATGCTTCTGATGTTTTTTTCC





TCCTCCACAAAGAATAATTAACATATAACAATAGGGTCTCAGCTAGTTGTTTTAAAAATGGTTAGAGAGA





TGTGTGGAATGAAGTAAGTGTGCAGTAAGTGTTAACTACAAATATTATTATCTTAGACATACAGATTTCC





ATGATTCATGAATGGTGAAGCATCTTAGAAGACATCCATTCCAGGCCAGGCATGGTGGTGTGCACCTATA





GTCCAAGTTGCTCAGTAGAATGAGGCAGGAGAATTGCTTGAGCCTAGGAGTTTGAGGCTAGTATGGGCAA





TATGGTGAAACCCTATCTCAAGAAAAAAGCAAAACATTTTTTAAAGTTTAAAAAGAGAGACATCTGTTCC





ACTACTCTCATCTTAGAGGCCATAAAACTGAGGCTCAGATAATTTCAGAGACTTGCACAGATCCCCCAAC





CATTTGGTGGCAAAGCCAGGAAGAGAACTCTGCTCTCCTTTCCCACTGGGACAGTGGAAGAAATTCGTCT





TGATTTCCATCTGTCCAGGCTGAAGAATGTGCACTGGCTGGAATGACAGACTGACCGACTTTTTTTCTCC





ACCTCTGCTGTCTCAGCAATGGTTTGGGACAGTGTGGATGACCAGAAGCTGGATAGTACAGAGCCAGGCT





AAAGAGTTCAGGCTTCCTGAAGGGAAGCTGCAGTCCTCCTAGGCCACAACACCTTCGAGATAGAATACAT





AAAGCACCCTTCTCTACCAAGTTAGGAAAGGAAGAAGTGTGACCAATTAGCTGTATGGGGACTGCCAAAG





CATGCCAGTCTGAAGATGAGCAGAAACTGGCTCATTCCATTTGGCACCTAGCACACTAACTGCATCCGTT





AATAGGCCATGCTTTTCTCCAGAGCCATTGGCTGAAGAGATCAAATAAAAAGTATTGAGAATAGGCTACC





CAAAACAGTAGGCTCAGATGCTATCACACAAAGCACTTTATCCTTAAGTTCAATTTTTCTAAATTGTAGT





TGGCTGCTTTGGCTTAATAAAAACTTCCAAAAAAGAAAAACGAATGGCCACAGACAGTATGGGTATCTAA





CTATATTATCACAACTTGACCAAGATTGAACTTGCCAATCCTTTGGTTCAAGAGCCAAACAAAATCGTTC





CCTTAAAATATTGCTTCATGGGAACAGTCTTCTTCAAACATCTTTTAGCACAGGCAAGATTCCCATTTAT





ACATTAATTCTGTTCAAGACAATGAGATTGGGCAGAAAAGGCATTGAGTTGGAAGTCAATGGATATGAGT





TTTTATCCCAGTTTTACCACAAATTAGCTGAGCATAACTTCCACAGATGCATTTATCAAGTAGTTTTCAT





GGTCATTGCAATGCCAAAAAACTGTAGCATTTAGAAAATTTAGTTTTCAGACTTGGAAACTATTTAAGGC





ATTTCATATGAAGGGTGTGTCCTTGTGAGAGTTTGCTTATGCAAGATAAGGCTTCTTTCAGCTGCAAGTC





AGGAGCGAACCAAAACTCAAAGCAGCAGCTGCATGAGCTGACTTTATCACATCTTGACAAGAGCTCAGCC





ACTGGAAGTTTTGGCATACAGCGAAACTGAAGCGTACTTATACAATATCACATTTTATTTTTATTGTTTC





TAATAGCATTCCAGGTTAGAAATGTCAATTATTTGGGAAAGCTGAGGGTCTGGTAGATAAAGCATGCAGC





AGAGAGCTAGGAGGCTGGCTATTTCCAGTCGTTATCCTAACATGTCTTGGGCCCCCAAGTCACCCCACCT





CCATGGTACAATGGGAACTGTGGCAGAAGTCCACGCTCTCTCCCCCAACACATGGGGATAAGAGACAAGA





GAGGTGAAATGTTCTGGAACATATCCGATGTTATACAAGTATAAGCTGTGAGATGATCCAAACGCAAATA





TTGAATATTTCATTTTCTAGAAAGTATACCAATTCATTCCACCCTTCTCAAACCTAAATTACAGAATTCA





ATTCAGGTCACACAGATTTACTTTGTACTAAGTACCATAGCAAATGCCATTTCAGTGCCTGAAAACTGAA





AAACATAAATTTAAAGTAGGAGTTTGAGGCCTCACTAATATGACAAAACATACCTTTATATTTTATTTTG





CAGTAATTTGCCACTTAATCATTAAACTCTTATCAATCTGAGAGATTTGCCAACACTTGCCTGCTAGGTG





ACCTAAGCCTCCACATCAATGCATGTTATACTCCCCTTTCTCCATATGTTAGGCCCATGCTATTTCTTTA





TCCCTCCTCCTCTGCATCTTCACCTAAAACTCTGCCCATCCTTCAGGGTTCATCCAGTGATTCATTTGCA





AGCAGGCATGGGGTAAGGTCTTCAGAGTATGTTTCTCAGAGGCCCATGCAGCTAAGAAAATGTGCAGTGT





TGGCACAAGGTCTGTCTATTCCTGGGTAGCCAGATGCTGGACACATCTTTCATAACACCACAAGGTAAAT





ATACTTCACTTGGAGAGAGAGGTGAAATTTTGCAGGTATAGACTGGATGTGTTCCTGCCAGAAGATGTGA





AGGGATTAAGAAACTGACTCTCATCTCCGTATTGCTAGAGCAAAACATAATTTCTCATAGTGGCTATAGT





ATAAGGACACTGAGGGGTAAGAGATATAATCTAAGTAATACAATAAATTAGTGTGGAAAAATCATCAAAA





TGAAGACTACATGGTTTTTACTAAAATTCTAGCTTTTAGGATGTCCAGGGAGCTCAGGAATTTAGCTGTC





CTTTTTTGTATGTACAATATGCCCCAATGCTTGCTGACTAATGTACTAAAACATTAGAGAAATCTTGCTG





ACAAGATCTCAACCAGTCAGCGAGATCCGGAAGGTGAGACTAATATTGAGGGTCAGCAGAATTAAGTCTC





AGTTCTGCTGCTTACCAGATATGCTGATCTGAGCTAGTCATTTAATTTTTATGAGACCAAATGTCTATCT





GTAAAGTCGGCAATTTGGATTAGATGTGCTGCAAGTGGTTTTCTAGCTTAAATGTACCTTCTGAATTCAA





CAGGACAATACTTAAACTGACCTTTAATCTAGGAATGACACAAGTAGATTTTTGAAAGCTACTTTAGCTA





CAGAAAGCTGAGAGCACCAAAGGCAAAGAGATAAAAATAACAGGAGAGCCTTCCCTTAATCCAGTCCCTA





AGCAGTTTTGGCAAACTAAAGTTTGTTGTTCAATGGTTACGAGTTTGCTTCAATGCTTTCTACCCAGTTT





ACTGAACTAAATAGTATATAGCTATAGTAAAAAGTCCTATTCAAAAACCAGCTTCTCACAGATATTTTGC





AGCTTTGCAGAATTGAATATGTCCACAGACGTCTATTAGCTGGTTAGGGTCTTAGGAATCTAGGAGAGCC





AAGTAGTTGTGTGAGCTGTTGTTATCAAATGTAGTTTTGAACATTCTTGGTGATTTTAAGGGATCATATT





GTGGAAATTTGGTTTCCTTACCTTGAATTTTGAATGAAGCTTTAGAATTTGAGGATGTTTCTTTGGTTTC





TCCTTCCAGGTAAGTGATTTTTTTTTTTTTCAACCAGATGCTGGTTTATTTAATTTGAAGGTATTGATGA





AATTCTTTAAATTGCCCCCATGTGATTCTACTCTGGAATAACTACGAAATTATTTAAAAGTTAATTAATA





CAAGAAAATATGAAAACTCATTTTTATGGGAGCTATTGTTCCTTCAAGATGACACTGTTTTGTAAACTAT





AGACTTCCAGTAACAAGCCTCTGTGCCTTCTTCTTACCACTAAGCATGCATGGGTATTAATTCCTACTGA





AAGACTTATGCTATCTTTTTTCCAGAAATGGAAGAAAAATGAACTATGAAAAAGGTCATTTTATAGGTCA





GCTACCACTATGAGATTGTTGAGGAAATGATATAAAAAACAATTTTTATCAAATTATCTTTAGGGCATTT





ATATGTTTATTTTCTTACTATGTTGACTTAGGTGACTATAAGAAGTTGTATCAGAGCAACTGATTCTGGT





GAATTAAAGCAAGTATTTCTAAGAACATAAGTGGCAACTTTCAGTCTCAAATCAATTTGGCCACCAATCA





GTTTTTGTAAGGGTACAAATAGGACATAACATGCTCAGATGGGACTTGGATAAAGTGTATACAATTTTAC





ATCGAGGAAATTGTGTCAATGTGTTACCTTCAATGTTAGAAATTCCCAAGTTCTGACAATAGTTCAGAGC





CTTGTTAAAAGCCAGAGTGGAGGCATGTAGATCCAGCTGGAAAGAGAGGCATTATGGTCTAACTTAGGAC





AAATTTTAAAGCCAGTGTTAGGGTCTGAGTCCAGCTTTGTAAACTTGAGTACAGTGTTTGATCTCTGGGG





TTTCAGCCTTCACTTCAGAACAAAATTTCCACCAAGTGCTCTTTTACTGTGAGGAGTAGCTGTTGAAGAA





GAAAGAAGTCTACTTATTTGCTAGAGTGTTACAATTGTTTTGATAAAGCTCAAAACTTATCTAAATAAGC





TCTCTCTCCCTAAGCATGTTTTCATTTTTATAAAAAAGTTACATATACTTTGCTTATAAATTTAAAATAC





TTTTCACCTCCTCTGACTTCATTTAAAATTAAAATAATTAAAGTGCCAATTTTAAGAGATGTTAGCTCCC





ATTATTGGTTCTTTGCCATATTCTTTTGACAACCTGCTGTAATTTTCTGCCCCCTTTAAAGCCTCAGGCT





ATAGGCCTTCTCCACCAAAGGAATATTAAGAAGTGATAAGGACCTTCTGTGAGCAGAAGTGGCTTGTTTG





CAAAGGGACTGCTTATCTTGGCCACTCTTGAACACAAGATGGGACCCTCTACTGCAAAGCTCTGGCATGT





TTTTTTTTCCCCTAAGTTATCCTCCATACTACTGACAGTGATTTTCCCTAAATAAAAAACTGCTTCAAAC





CATTCATTGTCTTTCCACTGCCTTAAAGATAAAGTCCAAATTCTAGAACATGGCCCACAGCATTTGGTGC





CTCACCACCTCTTCAGCCTCTCAGTTGCTGTTCACCCATTTCTCTATTCCTCTCCTTCTCACACCTTGTG





CTGCAGCCACATAGATAACCTGCAGTTTTTGTAACGTGCAATGATGTCTCAAATTCCAAGGCATTGCTGG





TACCACACAGCCTGCCTGGTAAAATCCTAGACTTCTTTCAAGATAAATTCAAAGACACCTCCATGAGGTC





TTTCTACCTCTCCAAGTAGAGTTGACCGCTGTCTCCTTTGTGTCCCCACTTCCACCACCATCCTAAAATA





CTTATTATACTTAGATTAATAATTGTCGCTCTTACTGCACTGGAATTACCCTGAAAGGAAAGGCCATGTA





TTATTTATCATTGTCTTCCTAGTACATAGCCCACAGCCTATACCTCCCACCCCAAAAAAAACCTTTTGTA





AATAATTGAACAAATTAAGAAACACCCAAGGCCCCCAGTAAACATCAAGGCCTAAGGAATGCATATCTGG





ATTCTAAATAATCATAAGGTTTTACAACACCATGTTAAGCACCAGGGACTTCAGAGAGCTTTTAGTCTAA





ATCTTATTAGAGAGGCCAGCGAAGACCTCCCAAAGGAAGTGGCATTGAACTGAGACTTGAAAAGCCAGTA





GTTAGGCAAAGATAGGGAGGGAAATATTTCAGACGAAGGGAGGAGATGGCACAAGATTTAGGACACGGAA





AAGGGTATGGTGCAGTCATAGAGAAAACAGATGTGCAGAATGGCTGGAGCCCCAAGAGGGAAGGGAAGGG





CGAAGCAATGAAGATGTGAGGCAAGCAGGACTGGACCATGCAGAGTCTTGCAGATGTTCACAAAGAAAAT





TGCAGCAGGTAGTCCCTAACATCGTGCTGAACAGTTAGGCAACTTGGAGGAATATGTATATTTGTACTCA





TAGTCAAAACCACTAGATGGCATTTACAGACTACGTTTTGTGTATTTTTATTTTTTACTTTTTGTTTTTT





TTTTCTTATGTTAGCAAAAGTATGCTCGCTATTGAAATGTTGAAAATATTTCATTGGTCTTAAAATGATG





CTTATTTTTCCAGATGCTTGCATTCATTCTGCATGTGCTATTTTGTCATGTGGTTTGCTTAATTTATTAA





ACAATTGTATTAATTAAATATATTAATTATAAATTGATTAATTTATAATTAATTATGTGTTATAATTAAG





TTAAATTTATTAATTACTTAAATTATTATATTCACATTCAGATGCAATCTGAAAACCCATTTGTTCTCAC





ACTGCTATAAAGAAATAACTGATACTGGGTAATTTATAAAGAAAAGAGGTTCCATTTGACCCAGCCATCC





CATTACTGGGTATATACCCAAAGGACTATAAATCATGCTGCTATAAAGACACATGGACGTGTATGTTTAT





TGCGGCACTATTCATAATATCAAAGACTTGGAACCAATCCAAATGTCCAACAATGATAGACTGGATTAAG





AAAATGTGGCAAATATACACCATGGAATACTATGCAGCCATAAAAAATGATGAGTTCATGTCCTTTGTAG





GAACAGGGATGAAATTGGAAATCATCATTCTCAGTAAACTGTCGCAAGAACAAAAAACCAAACACCGCAT





ATTCTCACTCATAGGTGGGAATTGAACAGTGAGAACACATGGACACAGGAAGGGGAACATCACACTCTGG





AGACTGTTGTGGGGTGGGGGGAGGGGGGAGGGATAGCATTAGGAGATATACCTAATGCTAAATGACGAGT





TAATGGGTGCAGCACACCAGCATGGCACATGTATACATATGTAACTAACCTGCACATTGTGCACAGGTAC





CCAAAAACTTAAAGTATAATAATAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAA





TAAAATAAAAGAGGTTTAATTGCCTCATGGTTCTGCAGGCTATACAAGAAGCATAGTGCTTCTGCTTCTG





GGGAGGCCTCAGGAAACAATCATGGCAAAAGACGAAGGGAAAGTAGGCACGTCTTACATGGTTGGAACAA





GAGCAAGAGAGAGAGTGGGGAGAGAGAGCCTTGGAGCAGGAGCAAGAGAGAGTGGGGAGGTGCCACACAC





TTTTAAACAACCAGATCTTATGAGAAATCACTATCTCCCAGACAGCATCAAGGGGGATGATGTTAAGCAA





TGAGAAACCAGCCCCATGATTCAATTACCACCCACCAGTCCCCACTTCCAACATTGGGGATTACATTTCC





CCATGAGATTTGGATGATGCCACAGATCCAAACCATACCACTCACCTAATTCTTTCTACGTAAGAATTTG





TCCAAGCATTTATAACAATTAGCATTTCATTTAACATCTTTTATGAATAAAGCACTATTCTCATGCTGAG





AAGATTCAAAATAATGGGAAATTGAAGTCCTAGGAACAAGTTTTATGTTTCAGAAGAGCCCATTTGGTAT





CCACAGGGCTAAGAAATGTGCACCCTAAATGTAAGTGGATTACACTGAACTGAAAGGTGTAAAGAAGGAG





TGGAAGATTAAAGGGAGAAGCTTGGAGAGGATGAAAGTTAGAAATGGAAGTGACGAGCACACCTGAGTGA





AGGATGAGAGCTCCAGCTGCATTTTCCAGTTGTATTCCCATGTTGCTGAGCCAAAGGCTGATCTCAAGTT





TATTGTTACATGCCCATTTAAGGCTTCTGGCCATTAACACTTTTGATTTTTTTTGGCTTGTTGTTTTACT





AGCTATTTTCACAACACTTTCATAGCTAAACCTATTTTACTCAGATTGTATGCCTTTTCAAAAATACAAT





AGAAGGTCCATATTCCATTATCTAGAAATAAGCCAAAGCTCATATCTAACATTTATTAAGAGAGATGGAT





TATTTTTGTTCATTAGTTATCTTTATAAATAATTTTTACGTACTTTAGTTGACTCATAAAGATGTTTCTT





TCTGTAATTTTAATCTTAATATTTGTTGAACTTCAAAATCCCTATCACCAGGTTATTGTTTAAAAGCATT





GGTTTTTATATTATCTTAAAAGCCATTATACCTGAGTGCTGAACAACTTAGAAACATTCAGTAATTGTTT





TGCATGCTATTTAGTGAATTCATATGGCAATCGTTTATACATACATGATGGAATCAGGTGGCAGGCCAAG





TTAAAGAGCAAGGCCAGAAAAGAACTTAAAAGAGAAGAGAAAAAATAGACAGTTTAGGAACAATAGATCA





TGTCTTCTCCATGATTTGGAGGTAAACTGATTACCTATCAGCTGATAAATAGAGGAAGGTTTTAGAAGTC





TTCAGTTGGGTAGACTAATGAGAGGTGTCAGAGAAGATGTTTTCTGTTGTTTGTGGGTTCTCCAGGAAAC





TTTGAGCATTCAGCTGAGGGGCCAAGTTGGCTGCCTCTGAGAAGAAGCCCTTCCACCTCCACTCCATTGC





ACTTGGGTGCCATTCCCCTCAGTTGAATATCTCCAAGAGATGAGCAAATGTACATCTACAGAGTTCAGGG





TACTGACTTTTATCATAATGATTTATAACTCTCAGAAGAGTGAAAAACACATGAATGCACAGAATAGGAG





ATTGAAATATAAACCACAGAACATTCATACAATGGAATACTCTGCAGTCATAAAAATCTTCTCATAGAAG





AATATTTGACAGCATAGGGATATCTGTGGCATATTAAGTAGAAAGTCAGACTTGTAAACATTATATACAT





ATTCACGTATATTTAAACACCATGATCCCATATTTAGATATAACAACTAAAAGTTCAGATGGCTATATAT





CAAAATGTGTCAAATGTTCAACCTTGCATAGGCTGACTGTAGATGAATTTTATATTATTCTTTGTGCTTT





CTTGTAGTTCCCAAATTTTCTTTACTGAATCTATATTACTTTTGCAATTTAAAGAATTTAATTTATAAAA





TTTTATAAAATAACTTATAAATTTGAAATGTATTGCATTTAAGAATAAAAAGTGTTTAATTACAAAAATA





ATTCACAATTTATTTAATGAGATTTTAAAAGGATATATGTGAGTCTACATTCTGATTTCATGTTTGCATG





CATGGTTTTTTTTTTCTTTTGAGACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGGCGTGATCT





CGGCTCACTGCAAGCTTTGCCTCCTGGGTTCACACAATGTAATAGTGTTTTATTATTGTTTCCATTTTTA





TTGAAGAAGTAAGATTGTCCCTAGCAGATGGAGACACTGAGATATGGGACAGAAGTTTTGTTCTATATAA





TTATTATGCGCTTCCACCTTTCTTAGCATAGACAGTTTCCAAAATGCAACTTCAAGTTACCCCTTTATAA





GCATAATAACAATAATACCCAACATATATGTAATGCTCTTTATGTGCCAAGTACTATACTAACACATGCA





CATTACATACACACACACCACATACACACACATATTTAAACTAATTTCGTTCTCACAATGACATTTTGAG





GCAAGTATTATTATTGTACAGATGAGAAAACCAAGGCACGCTTTATCTGTAAACCTCTGCTATGCAGAAA





TTCTGGAGGGGCTTCTGGCCCCTTAATTTTAAAATAAGGCCAATAATACAATACTTACCACATAGCAATT





CTCTAAACATTATGTAAGATATATACCAAAGCGCTTAGCTCAGGGACTGGAGGGATGTGAGGGAATTTGT





CTTTTGCAATATGCTTTATGGTCCGCTCAGTCACCTCGTTCTTAATCCCTTTCTCAACTTCTATTTTATA





CAGCAATTGTGAGCATATCAGCATCAAGTACCACTGGTGTGGCAATGCACACTTCAACCTCTTCTTCAGT





CACAAAGAGTTACATCTCATCACAGACAAATGGTTTGTTTTCATTTTTATTTTTAAATTGTGGCTCCGAA





ATCATTTTTGTGATGTAACCCATTTTAGGGGACCTGTCACTGCAGAGAAACTGACAAACACTGAGAAATG





CGAGCTAAGTAGACACAGCCTACTAAGTAGACACAATTCCTACTATGGAGGAATTCTTGCCTCTGAAATA





TCTCACAGAAATAATACTGTGAGTTAAAGAAATTAAAACAATGTGGCAAAGCACAGAAATGATGCACGTG





ACCATGAAATAGTGGGCCAGATAAAGGGGACCTAATAGTGCGGTGGTGCGGAGGGTCTGTGGGCAAACTG





AGTTCAGCTCAGACCCGGGCTCAGCTCTATGCCAGCTGCTGACCCAGGGTGAGTTGCCCTGCAGGGTTTC





TATCCCATTAATTTTAAAATGGGGCCAATAACACAGTACTTATCTCACAGCATTTCTCTAAAGGCTAAAT





AAGAAGATGTATCTAAAAGTTATTAGCTCAGAGCCTCACACATTCTCAGTGACTGATAAACAATAAGCAA





AGCTGGGTGCTGAGATAAGAGTAATCTGGTGGCAGTCTCTCTTGTTAGTTTTCAGGGGAGAAGAAGAAAT





TCTGGAGCCGCTGCTGGGAGGGATGTGGGAGAGTTTGTCTTTCATAATACGCTCTATGTCCACGCAGTCA





CCTCATTCTTGTGCCCTTTCTCAACTTCTCTTATATGCAGATACGCACAAACGGGACACATATGCAGCCA





CTCCTAGAGCTCATGAAGTTTCAGAAATTTCTGTTAGAACTGTTTACCCTCCAGAAGAGGAAACCGGTAT





GTTCTTAGTTTTAAATAGTTGCTCTGGAGTCATTGTTGTGATTGAACTCTATTTACACGAGCTGTAACTC





ATGACAGTTCTCAAGCTTTCGTGACAGAAAACCCATCTCTTTTACTCCAAAGCCCATATAGCACCCACAA





CTATTAACTGTGACCAAGAAAGAGAAGGCAAGCCCCAATTAACCTTTGTACGTAAAGCCTAAAGAATGAA





AAAATATACCTGAATCCTCAATCATCAAACAGCATAGTATATACTAAGTAATTTGTAATAATTAAACTCT





AGAAAATTGTGTGGCTTCGGTAGTAAGAGAGCTTCATGATGTAAAATGGCAAGTGGAGACAGAGACAAAA





GTAGGATGTGGACTGAGAGGGAAGGTTAGCACAGGTGGAACAGTAAGGCAACCATACTATCAATTGCTGC





TGACATAGAATCCAGAGAGACTATTGGCAAAAGCTCAAATGAGACACAGTAACAGTTTAGATTCAGACAG





TGGCTGTGGCATAAATCAGAAAATTGATAGTCGCATGATCCCTCTTTGCATGGGACTGGCATCTGTGTGG





AGTAATGGTTCCATATGCCTCCTTTCTTCTCCTTATTTTTAAATTTTTTAAAAATGCATTGCTTCTTGTG





GAAGTCAATAAGTGATTCTTCCAATACTTTCTCATTCCTTCCCCCTCAGTTATGAGACAATTTGCTTATT





TCTCATCCATGAATACTTGTTGGGTCATTAAAAGTAGATACTGAAATTACTAATGGTACGACTGACATAT





TACCTCATAAATGTTACTAGCTAGATGTTGAAAGTTGACCAACAACTCTCAAAATATGATTAAGAAAAGG





AAACCCACAGAACAGTTTGATTCCAAAATGATTTTTTTCTTTGCACATGCCTTACTTATTTGGACTTACA





TTGAAATTTTGCTTTATAGGAGAAAGGGTACAACTTGCCCATCATTTCTCTGAACCAGGTATGTTAATAT





TTGACAAAGAATAAAAGTCATTCCATTTTAAACTATCCATTGCTTGTTTCAAATGCCTAAGAAAATGTGT





CTATCTTAGAAGAGCATATGTTGTTAACTTTATTCACACAAAATTGTAAAGGCAAAGAAAATATTCTCTT





TTTAAAATTAAAATAGGCATTTCTTATTTTTAAAAACATTTTGGGGGCCAGGGGCCGTGGCTCATGCCTA





TAATCCCAGAACTTTGGGAGGCTGAGCCTGGCTAATCGCTTGAGCCCAGGAATTTGAGAACAGCCTGGGC





AATATGGCGAAATCCATCTCTACAAAAAATACAAAAATTAGCTGGCATGGGGCACGCACCTGTAGTCTCA





GCTACTTGGGAGGCTGGCTGAGGTGGGAGGATCGGATCCATTGCCTGAGTCTGGGAGTTTAAGGCTGCAG





TGAGCTATGACTGTGCCACTGTACTCTAGCCTTGGTAAGACCCTGTCTCAAAAACAAATACATAAGTAAA





TAAAAATAAATAAAAACATTTTGGAAATAGAAATACATAATTTGGTAATAGTTTTTCTCTTAAGTTAGAT





GTTTTACCTTTCTAACCAAGCCTGAGTACTTGAAAAAAGCCTCATAAGAGCTTATAAAACAAATGAACTT





CCCTCATATAAAAAGCAAGGCATTTAAAATCATCTAATTAACTGGTACTGTATTTCAAGGGTAAATCTCA





GCCTTGATTCATTTTTGGCCCAATGCAACCACTTAGGGACCATCTTGACAACCTCTGCTGAAGGGACATC





CCTTCCCCTCACTTGAGTATCACTGTGTGTGCTCATTTGCTATTCTGCATTCCAACCCTCCCTTCACACT





TGGCTGTGTCCACGGCTCACAGGGTAAAAAGCACATCATAGAACTTCATCACTATCGCATACATTCAAGC





TAAGTGGTCAAGAAGGCTGGGCAACACCAGCAAGAGGAAATGCTACTTTTACTTTTTATCAACAATAGGG





CTTTTAAATATTAATTAGGCAAATAAATGAGCCATTTTACCTTTATGTCTAGCCTTCCATTCTATTTACT





TCAACTGGAAGCACTACAAATATGCTATAAATATGGAAATATCTCTTAATTGATTTCAATTGTTTCATTC





CCAACATATAAATGACTCAACAAGCATTTTTAGTGACTACATTGGAGACTATGCATAAGAATACTATGGA





AGGAATAAAGCTTAGAACATAGATGACCTGCATTATAATTATAATTCTACTTTTAACTAGTTGTCTGACC





AAGGCTAAGTTAACCTTATTCAGCTTCTTTTCTTCATTTGTAAACTGTTTATACCAGTTTCTTTCCAAAA





TTATGATTCTATGATCTGTTCAATGCTCTTTTATACATTAAGACATTATTTTCTCTCATAACTTCCAAAC





TATGGGAGAATTTGTGGTTTTTTCCCCATATCTGAGGAGAACGTCCACTGAGTTCTTATCTACAGTTACA





CTAGTGAAGAACGCTGGGTCTGGAATCAGAAGCTTCAGGTCTTAGTTCTGTCATCAACTATTTTGCGACC





TTGGACAAAAGACTTGATCACTCACAGTCCCAGTTTCCCACAAGGTTACTGTAAAGCACACAATTTAAAA





AAAGACAAAATCTACATAATAGTATATTAATTGTGCTTTCTATTAAAAGGCAAGGTGATGGTATGCTGAT





GTTATCTGTCTTATTTTTCAGTTGCTATATGGTCATTTATTTCAGACTTTCATAATTTTGCTGCTCTCTT





TATCTCCTGTAGAGATAACACTCATTATTTTTGGGGTGATGGCTGGTGTTATTGGAACGATCCTCTTAAT





TTCTTACGGTATTCGCCGACTGATAAAGGTGAGAATTCAGTTTTTAATTTTGCTGTAAATACCAATGTGA





ACAGCTCTAAGAGGGTTTATTCCTCTGAGTTCAGTTAAACTCAAAAGAGAAACAGAACTGCATAAAATTC





CATATTTTTCAACTGGACACATAGAAGTCACTGTGTTTCTCTAGCAGAATTTTTCTTTGCATTTGCCCAA





TTAAAGGGAACCTCTAAATATAAATCTGTCCCCCATTTTCCCAATGAAAGATCTCCCTAAGTTTTTGTCT





AACTTGCTGTCACATATTTTGATGGATATTGAGGAAATATTAAGATTCTACTTATAGTATTTACCCTATT





AGTGTATAAAATATTTAAAATAATATATTTACATATGTTTAAAACTTTGAGGGAAGCCAAGGCAGGAGGA





TTGCTTGAGCTCAGGAGTTTGAGACCAGCCTGAGCAAAAAGGTGAAACCTAGTCTATACAAAAAATATGA





AAATTAGAAAGGCGTGGTGGTGCACATGTGTAGTATCAGCTACTCAGGGGGCTGAAGTGGGAGGATTGCT





TGAGCCTGGGAAATCAAGGCTGCAGTGAGCTGTGATCATGCTACTGCACTCCAGCCTGGGCAACAGAGTG





AGACCCTGTCTCAATAATTATATAAATAAATAAATAAAAATAAACAAAATAAAACTTTTGCCTTTCTTAA





TTCTCACATATTCTGAAACAGATTTTTCAAATTTCCACCCATGAATTCTTAACATCAGTGATTTTTTTTG





AATCATTAATGCTTTTTTTAATTTTTTTTTTTTTTTTTGAGACAAGAGTTTCCCTCTGTCACCCAGGCTC





GAGTGCAAAGTGGTGCAATCTCTGCTCACTGCAGCCTCTGCCTCCCTGGTTTAAGTGATTCTCGTGCTTC





AGCCTCCGCAGTAGTTGGGACTACAGGTGCGGGACACCATGCCTGACTAATTTTTGTATTTTTTTAATAG





CAGAGATGGGGTTTCGCTGTGTTGGCCAGGCTGGTTTCAAACTCCTGACCTCAAGTGATCCATCTGCCCT





TGGCCTCCAAAGTGCTGGGATTACAAGCATGAGCCACCACGCCCAGCCCACTAATGCTATTTTTACATCC





ATACAACACAGCTTATCGAAGTGCATAACTTTTGCTATCACTTTCTATTCACGATATTTAAGACATAATA





TGTGTGTGTGTATTTATGATGCTGTCACTGTCTCTGTAATCCTAGATCAGAAGTACTTAGTCACATGAGA





TTGGTACAGTTGTGTTTTCATTCATCCTCTATTCTTAATCTCTCTTTGTGATTTTTGAGACCATAACCAC





TATATAATTCTTTTAAAAAGGCTGAGAGGTGTGACAGCACTGCAATTGTGGGGCCATCAGAAGATATGAT





AGTAATATCTACATTAAGTTCCTTTGCCTCTTTTCTTTTTTAACTACTTCTAACAGTTAACTTCTACCAT





CATCCAATCCTATAATTGATTTTCAGTATTCCATGTAAATATATCTTCCTTAAATAATACTTTTTGTTAA





TCAAAGAAAAGTAACTGAAAATGCCTACTCTTGTGTGAGATATTTTGTAAGGACTTTAATATAAGATAGC





TTTTTTTGCCTGGAGTATAAAAGAGAAAAGTCATCTTCTTACATGGGCATATATGGCAAAGTGGGTTGTC





TTCTCTCTTCGTCAATGTTCTAAAACCTGAAAAAGCCAAGGAAATATTTAGTTGGCAAAGTTCAGAGAAT





TTTCTAAGTGTATATGGATGAATTTTGTCCTGGTCAACATGATGCAGAGATCACACACTTTATTTTTATT





TTTATTTTCACTTTCACTATTTATTACAGCAGGGAAATATGTAAGTATCAGTGTTTGAGGTGATATTTCT





CCTACTGAAATACCAAATACTATAGAGGAACACAAATACAAGTTTAAATCAATGCTTATACCAGTAACTA





GTAACAACAACAATAACAAAATCTCTGCAAAGGGGATTTCAACCAAAAGAAAAAAAATTTTAGAAAAAAA





TATTTTTAAGCTGAAGCATTTTACTTTTTACTGTCTTAAGACTAGAAAATTGTGTTATTAATATTTTATG





GTATTTCTTCATAGAAAAGCCCATCTGATGTAAAACCTCTCCCCTCACCTGACACAGACGTGCCTTTAAG





TTCTGTTGAAATAGAAAATCCAGGTTGGTGTTAATATTTGCAGTTCCTTTTGCCTTTTAGGAAAAAAAAA





TCAAACCAGTGAGTTACTTCTTTCTGATTTGAGGGAGGAGGGAACCAGTTATGATTCATTTCTATTCTAT





CTCATTAATTCTACTTCTTTGACTTTTTAGAAATGTCTGCAGCATAGTGAGATTCTCCTTTGGACACAAA





GTGTTTTGTTTTGTTTTGTTTTTTTAACAAAAAAAAAAAAACTCAATCAAATAGTAAAAGCAAAAGAGAA





AACCAAGTGTACTTCGTATTTCCCAAACTGCAAAGTTATGTGTATAGGAGACTCTATGGTCAGTATGGTG





TAGCATAGTGAATTAGCCCCAGATCTGAAATCAGACTTGGATTTGAATCCATGCTCCAACACCTATTAGC





TGTGTAACCCTGAGCAAGCTACTAAACCTCTTTTAATATGGGGATAATGATAGTATCAACCTCACAAAGT





TTAATGAGAATTAAATGAGCTACAACCGGTAAAGCATTTAAAACCATTTGTGGCCATCATAAGTCCTCAT





GCCTGTTAGCTGTTATCAATATAGCACTGACATCAATGCTATATCAATATAGCATGTTATCAATATAGTG





TCATTCCCAAATGACCTCCTGTGCACACTGGCAAGCCATCTGGCACATGCTTTCATCTCCACTCCCAGGT





GCTAAGCAGATACAAAACATGTGAAAGGCCATGGATATATTTTGTTTATCCAGAACAGTATTAAACCACA





TAGTGCTTTTTGAAAAGAATATTTATTGTCAACCTTTAAAAGTCGGAAATTGTTACATTTTAAAAATCAA





GTATTGCTATTCCTCTGGGGAAAAATGTAAACTCCCAAAATGCTGAGAGCCTTCATACCAGCATGAGACC





AATTCCTAAGAGCTGAGTAGTGGCTGCTACCTGTACTGTCTGTCTAAATCCCTAGCCAATTGCATTTGTT





TTATTCACCGTGGCCCCTGGTATGAACTCACTAAGAAAGCATATAGTTTCTATTAAACTTTGCCTGAAGC





ATAAACCCAAATGACATCTATTTTGGGAGATAGTTACTAAGAACAAGTCTCTGGAATGAGCTTTATTTCT





CAAGCAAAAGAGATTTCATTCTGCCTTCTACAAAATCAACTGATTTTACTCCCATAATTTTCAGAAATCA





TGACAGATCAGAGGTCCTGTATGCTTCTGGATTTCGATTTTAACCCTGGGCCAGTCTAGGTTTTCTAGAC





TTTAGAGTCACAGAACACAGAGTTTTCAAGATCCATCACAGCTACACAGGTTATATGCAGGATTTGCCAC





ATCACATTATCATGTGAATTCTTAAAGCTTAAGAGTAATTGTTACATAAGTTTATAATCCTAAGACATTC





CTGCTATGTGGAAATGAATGGCATAGATATGATTCTCAGCTAAAAGGATTAATAAAATCCAATCTGCAGA





TACTTGAAACAACGGAAGTTTTTGAGTCATATGCCAGATTCACTTCATTTACTAAGGTTATCTTGTTATT





GGACTGGCAGCTGGAACAAGTATCTGTAAAATATTCATTTTATCTGCATTCTGCCTTGTTCCACAAAAAA





GTCTTGATGTAGTTTTTCAAGTGGAGCAATTACAACCTAAAGCCTATTTTTCGAACTGAAATTTATATAC





ATTTTTAGCTACTTATTTATTCTAGAGACAAATTTATTGTTTAGAGTTTCCCCTGCCATTTTTTTCATAC





AATTTTAAGCATCTCAAATGTTTGGCACAATTTAATACGCCACAGTGCATCAAGATGTCCTTGTAGTTTA





ATTCAGTTAAGTGCAACAAACATTTGCTAAATGCATACAGTGGGGTAGGCACCACACTCACATTAGATAT





ACCAATATGAGTCTTCGTCCTTTAGAAGCTGAGAGACTAATGGAAAAAACAGAATGTCATTGCAGTGAAC





AAGTTCTACAGTAGTGGAGGCAATAGCTCCACTTGTCCCAGAGACTGAGACAGGTATCAAAGGCTTCTGA





AGATGAAATCACCTGGGATTAGCCTTAAAAGACAGATAGATATTAGCTAGGGCAGGGTAGTTTTAGCAGA





AGGGCAGCCTGAGTGAGTAAAAGCATGGAAGACAGAATATGTTTACTTAAAGAATTGTATGCATTTCCAC





ATTAGCAGGATTGCTGCTTTGGTTCTCTGTTCACATCTCAAATATGTGTAATGGCAGTGGAAAGTCAGAA





GAACCAAACTTTAGGCTCACTTTATTTCCCCACATTTGTGCAAGTGAAGTTATTAAATGTCTTAGTATGT





TAGTGAGACAAGTTATGAATTCTGACTGCACCTCACAGAAAACATAGGAAAACACATTATTAAAGATTAT





TTAAAATGCTTTATTTCTACTTTTATAGAATATGGCTCTAAATTAGTTTATAAGCCAAAGGCATAAGAGG





TTAAAATGACAGTACCATCTCAACAAGAACTAATGATGTAAAGGAGTAATTAGAGTATAAATTGTTTTAA





CCTTCTAAAAGTGCACATGATCTGTGATTGGTGAAAAATGAGAATAAGCGAATCTGAGTCAGCTGGCCAC





TGTGGCATGCATATGTGACCCACTAGCCTATTTCCCACAGGAGAATGTTTGAGATGCACAGTTCCTGTGG





TGCCCAAATAGAAGAAGGCTGGAAAAGCTCTGCTTCTGGAAGAGCAAGGGCTCCCCTCTCCCTTTCATGC





AGTTTCTAGGAGCAACATAAATTCAACCTTCCAACCAGGAAAAGTGGAGCATCGGGTTTACTGGAGAAAA





CTAGCCCAGTGCCCTTCTTTTACACCCTAGAACCAGAGAGGAACTTGGCCATAAGCTTTTGTGCAGACTT





CTCCTTGGGGGAAAAAAAAAGTCATTATTTAAAAAGACATGACAGACTTAGACACATGCCTTAAATTTTA





ACATGCATATGTGATTCAACTTATCATTTACTGGCTTCACATTATATTTTGCCTCTATACAAGTTTGGCT





GTTTGTTTCTTATCTCTGTAGAAACTAGGAGCAGAGCAATTATATTTATTCTTTACCTAAGGCTTTTAGA





ATAGATATTCTAAGAAATTCTGTATTTTTCTTTACACAAAACTTGACAATAGAGCTAATATGTAAGGAGA





GTCCTTTCGTTTCCTACTAATTACATTCAAGAACAACTCTGCAAGAATGTAGAATCCTAAAATGTATACT





GTGCATTAATTTCCTGTTGTGTTTAAACATAACTATGTCTCATATTTCGGTCTTGTATTTTTTTTACTAT





AATCCTTCTAGAGACAAGTGATCAATGAGAATCTGTTCACCAAACCAAATGTGGAAAGAACACAAAGAAG





ACATAAGACTTCAGTCAAGTGAAAAATTAACATGTGGACTGGACACTCCAATAAATTATATACCTGCCTA





AGTTGTACAATTTCAGAATGCAATTTTCATTATAATGAGTTCCAGTGACTCAATGATGGGGAAAAAAATC





TCTGCTCATTAATATTTCAAGATAAAGAACAAATGTTTCCTTGAATGCTTGCTTTTGTGTGTTAGCATAA





TTTTTAGAATTGTTTGAGAATTCTGATCCAAAACTTTAGTTGAATTCATCTACGTTTGTTTAATATTAAC





TTAACCTATTCTATTGTATTATAATGATGATTCTGTCAAATGAAAGGCTTGAAATACCTAGATGAAGTTT





AGATTTTCTTCCTATTGTAAACTTTTGAGTCTGGTTTCATTGTTTTAAATAAATTAAGGGGACACTAAAG





TCCTATCATTCATTTCCTTCATTGCTGAACAGGCAAGATATAATATTACATGAATGATTACTATATTTTG





TTCACACTAATAAAGCTTATGCTCAGAAATGCCATACACACACACAAACACACACATTTATCATTTAATG





CATAAATCAACACAAAAGGTTTTCCCATTAATATGAAATATTACATATATATAAGTGCCATATTTAAAAT





AATTTGTCTAACAGTAGAACTATGTCGGAGCACTCACTGAAGCTTGCATTCCACTGAAAGAGTTATTTGT





GTAAGTAGAGTATCCGGAGAAGGAAAAGAACTTACGACCTTTCTTTATAACAGAAACTCAACTCTAAATT





CAACAAGATGTGCAAACCGGACATGCAGGTGAATATTTTAATAGGTTACTATAAGGTTCTCAATTAAATT





CTTTAATCTGTCCAGTCCCAGTTTCTCTTATTAATAAAACTTTGGAAATTGCTTTAAACCATTTAAAGGA





AATTTCTAGATATAGAAACTAAGGACTGTGACTATACAGCTGTCACTCATTTGTAGTAAAACTTAAAAAG





CAAAAACAAAAAACAAAAAAGACCTTCCTGTGATACTTTATTTCCGAACTAATAAAAATCTATATGACTT





TTTATTATTGTGTGATAACCAAGTAAATGTTTTCTATTTTGCATATTTTCAGGCATGGTAACAGAAATTT





ACCTTTTAATAAATTAAAAAATCTAAATTTTAACCTACTTGTATGTTCGGAGAGTGTTTTTGTACTATAT





TGACTACTTAAAATAGAGAATGAGACTAAGAAGGGAACATTTCTGTTGATACATGTTTTTTAAAAGAAAT





TTTAAGAGCATTATTAGGTTAATTTTAATCCAATTAATGACCCAAATGCCAAGGTAATTTTAAATTTACA





TTTTTAATAAAAGCAACATGTTGAAACAAGAGAGGGTGAGATTAACCTTTTTGCTAAAGTAATTTACAAG





TCAAAGACAGGAAGAGATCAGAGTGAATGTGCCTTCTTAACCAGAGCTACAGAATTTAGTGAATAATTAA





AGTACAAACTGCTTTGACCTCCTTGAACTTTTCCAAGCAATTTCTCTGTACTTCTATATATGAATGTCTT





AGCCAATTTTCTGCTACTATAACAGAATACGACAGACTGGGTAATTTAAAAAGAAAAGAAATTTATTTTC





TTCCTAGTTCTGGAGGCTGGGAAGGCGAAGGGCATGGCACTGACATCTGCCTTGTAACTGATGAGAACCT





TCTTACTGCATGATAACAAAGCAGCAAGGCAAGCAAAAGCGTAAGATGAAGAGAGAGGAAATGAAGCCAA





ACACATCCTTTCATCAGAAGCCCATTCCCTCTATAAGGCGTTACTACATTTATGAGAATGGAGTCCTCAT





GACCTAATCGTGACCTTAAAGGCCCCTCCCAACACTGTTACAATGGCAATTAAATTTCAACAAAGGTTCC





AGAGGTGACATTCGAATCAGCAATGAAATTTTCATAGTTAAATTTGGTATTCGTGGGGGAAGAAATGACC





ATTTCCCTTGTATTTTTATAATTAAATCAGCAAAATATTGTAATAAAGAAATCTTTCCTGTGAAGATACC





ATGACCCC






Enhancer elements used in the nucleic acids described herein can be single instances of an enhancer element sequence, or concatenations or repeats of one or more individual unique enhancer element sequences. Concatenations and repeats can comprise 2, 3, 4, 5, or more instances of a single sequence, or a collection of 2, 3, 4, 5 or more distinguishable enhancer element sequences (e.g., different elements from one gene or different elements from different genes).


In some embodiments of any of the aspects, the hematopoietic enhancer element is located at least about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least about 5 kb, at least about 6 kb, at least about 7 kb, at least about 8 kb, at least about 9 kb, at least about 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the hematopoietic enhancer element sequence is located at least 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, at least 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the hematopoietic enhancer element sequence is located at about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at about 5 kb, at about 6 kb, at about 7 kb, at about 8 kb, at about 9 kb, or at about 10 kb from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the hematopoietic enhancer element sequence can be in intergenic sequence or in the sequence of an intervening gene. In some embodiments of any of the aspects described herein, the target sequence can be identified within the sequence which is about 500 bp to about 10 kb from the end of the open reading frame, e.g., about 1 kb to about 9 kb, about 2 kb to about 8 kb, about 3 kb to about 7 kb, or about 4 kb to about 6 kb from the open reading frame. In some embodiments of any of the aspects described herein, the hematopoietic enhancer element sequence can be located within the sequence which is 500 bp to 10 kb from the end of the open reading frame, e.g., 1 kb to 9 kb, 2 kb to 8 kb, 3 kb to 7 kb, or 4 kb to 6 kb from the open reading frame.


In some embodiments of any of the aspects, the heterologous regulatory sequence is a GATA1 hematopoietic enhancer minigene (G1HEM). The G1HEM can permit lineage-specific expression of GATA1 specifically in early erythroid progenitors but not in hematopoietic stem cells, e.g., as a gene therapeutic approach for the treatment of Diamond-Blackfan anemia. The GATA1 hematopoietic enhancer minigene (G1HEM) comprises a concatenation of 4 distinct regulatory elements to achieve lineage-specific expression of GATA1 specifically in early erythroid progenitors. G1HEM elements as disclosed herein include a −3 kb hematopoietic enhancer, an upstream double GATA motif, an upstream CACCC box, and a segment of the first intron of GATA1. Indeed, the 979 nucleotides present in this minigene are sufficient to drive Gata1 cDNA appropriately to rescue a Gata1 knockout mouse and allow for ostensibly normal erythropoiesis.


In some embodiments of any of the aspects, the GATA1 hematopoietic enhancer minigene (G1HEM) comprises the following nucleic acid sequence (SEQ ID NO: 13):









ACCGGTGGCGCGCCGATCCAAGGAAGAGAGGACATTAGCATGGGTCTCAA





ATGGAAGCCTGACAGAGAAGACGCTTCAACCCGGACACCCCACCCCCGCC





TGCAATGGGCTCCCCCAAGCCTAGCCTGGCCCCCGCTGATTCCCTTATCT





ATGCCTTCCCAGCTGCCTCCCTGCTGGCTGAACTGTGGCCACAGACTTCT





GGGCCTTGCACCCCCTCCACTGCCCCCCAGCCCCAAGACAGCCTGTTACT





GCGGCACCAACAGCCACAGTCGAGTCCATCTGATAAGACTTATCTGCTGC





CCCAGAGCAGGCCAGAGCTGGCGTAAGCCCCAGGCACGAGCCGAAGCACT





AAAGAAGTGTATGTACCCTTACCCACTAGTAGTAAAACATGAAACTTAGA





TCTTGACTAATTGCTCATATGACTTGACTGGACACTGGACTCCACAGAAG





CCAAAGGCAAAGGGGATCCAACAACCTGCAGGATAGACAGGAAGGGCGGA





GGGACTAGAGCCTAAAAGGTCCTCCACAAGGAGGCGGCACACCCCCTCCC





CTGCACTGCCCCACCCACTGGGGCACCAGCCACTCCCTGGGGAGGAAAGA





GGAGGGAGAAGGTGAGTGGGAGGGAGGGAGGGCGGGCGGGCTGGCAGGAG





GGAGAGAAGGGAGACTCAGAGGCCGAGCTCCAAGGATAAATTACTTGTTG





AATAAGGATCTAATGTGTAGAACCCATACTGACATGGTAGCAGGCACATC





AGCACAGTTTTAGGGAAATGGGAGATGGAGAAGACTCACTGGAGGCTCAC





AGGCCTGTCCTGGTACACACGGTGGAAAAATATGAGACCCTCTTTAAAAA





GGAAGTGGATGGTAAGGACCAACACCCATGTTTGTCCACTGACCTCCAGA





TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATA





GATAGACAGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGA





TTGACTGCAG






In some embodiments of any of the aspects, described herein is a GATA1 hematopoietic enhancer minigene (G1HEM) comprising, consisting of, or consisting essentially of a sequence of at least 80% homology to SEQ ID NO: 13. In some embodiments of any of the aspects, a GATA1 hematopoietic enhancer minigene (G1HEM) comprises, consists of, or consists essentially of a sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to SEQ ID NO: 13.
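As a rough illustration of the percent-identity thresholds recited above, the following Python sketch scores two sequences that have already been aligned to equal length. The function name `percent_identity`, the gap character `-`, and the choice to count gap columns as mismatches are assumptions made here for illustration; this section does not prescribe a particular alignment or scoring method, and in practice an established alignment tool would be used. The 20-nucleotide reference is the start of SEQ ID NO: 13, and the variant is a hypothetical sequence with two substitutions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where both sequences carry a gap ('-') are skipped,
    and a gap opposite a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# First 20 nucleotides of SEQ ID NO: 13 and a hypothetical variant
# that differs at the last two positions.
reference = "ACCGGTGGCGCGCCGATCCA"
variant = "ACCGGTGGCGCGCCGATCTT"

identity = percent_identity(reference, variant)
print(f"{identity:.1f}% identity; meets 80% threshold: {identity >= 80.0}")
```

A full-length comparison against SEQ ID NO: 13 would follow the same pattern once the candidate sequence has been aligned to it.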


In some embodiments of any of the aspects, the nucleic acid sequence comprises at least one, or at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 20, or at least 25, or at least 30 GATA1 hematopoietic enhancer minigenes (G1HEM).


In some embodiments of any of the aspects, the GATA1 hematopoietic enhancer minigene is located at least about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least about 5 kb, at least about 6 kb, at least about 7 kb, at least about 8 kb, at least about 9 kb, at least about 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the GATA1 hematopoietic enhancer minigene sequence is located at least 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, at least 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the GATA1 hematopoietic enhancer minigene is located at about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at about 5 kb, at about 6 kb, at about 7 kb, at about 8 kb, at about 9 kb, or at about 10 kb from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the GATA1 hematopoietic enhancer minigene sequence can be in intergenic sequence or in the sequence of an intervening gene. In some embodiments of any of the aspects described herein, the GATA1 hematopoietic enhancer minigene sequence can be located about 500 bp to about 10 kb from the end of the open reading frame, e.g., about 1 kb to about 9 kb, about 2 kb to about 8 kb, about 3 kb to about 7 kb, or about 4 kb to about 6 kb from the open reading frame. In some embodiments of any of the aspects described herein, the GATA1 hematopoietic enhancer minigene sequence is located 500 bp to 10 kb from the end of the open reading frame, e.g., 1 kb to 9 kb, 2 kb to 8 kb, 3 kb to 7 kb, or 4 kb to 6 kb from the open reading frame.


In some embodiments of any of the aspects, disclosed herein are binding sites for HSC-restricted miRNAs that permit regulated expression of GATA1 in hematopoietic progenitors to improve erythropoiesis in DBA without unwanted effects on hematopoiesis.


Non-limiting examples of HSC-restricted miRNAs include miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e. Sequences for these miRNAs are known in the art for a number of species, e.g., human miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e.


Binding sites for each of these miRNAs are similarly known in the art and include those readily available on miRBase, miRDB, and/or TargetScan. Briefly, animal miRNA binding sites will be complementary to at least the “seed region” (6-8 nt in length) of the miRNA's sequence. Seed regions for each of the miRNAs described herein are publicly available, e.g., at TargetScan, and in SEQ ID NOs: 43-55 provided herein in Table 2.
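The complementarity relationship described above can be sketched in Python by taking the reverse complement of a mature miRNA and writing it as DNA. The helper name `binding_site_for` is hypothetical. The example complements the full mature sequence rather than only the seed region, which is an assumption made here; for the miR10aT entry of Table 2 (SEQ ID NO: 18) this yields the listed binding site (SEQ ID NO: 31), but it should not be assumed that every entry in Table 2 follows the same rule.

```python
# Watson-Crick partners, written so that an RNA (or DNA) input gives a DNA output.
_COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def binding_site_for(mirna: str) -> str:
    """Return the DNA reverse complement of a miRNA sequence (5'->3')."""
    return "".join(_COMPLEMENT[base] for base in reversed(mirna.upper()))


# Mature miR10aT sequence as listed in Table 2 (SEQ ID NO: 18).
MIR10A_MATURE = "UACCCUGUAGAUCCGAAUUUGUG"

site = binding_site_for(MIR10A_MATURE)
print(site)
```

A seed-only site could be derived the same way by passing just the seed region to the helper.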


In some embodiments of any of the aspects, a binding site for a given miRNA described herein can be a sequence that comprises, consists of, or consists essentially of a sequence complementary to the seed region of that miRNA. In some embodiments of any of the aspects, a nucleic acid sequence described herein can comprise 2, 3, 4, or more repeats of a sequence complementary to the seed region of a single HSC restricted miRNA. Such a sequence can include repeats of an individual sequence and/or combinations of different sequences in series.


In some embodiments of any of the aspects, a binding site for two or more miRNAs described herein can be a sequence that comprises, consists of, or consists essentially of sequences complementary to the seed region(s) of those miRNAs. In some embodiments of any of the aspects, a binding site for two or more miRNAs described herein can be a sequence that comprises, consists of, or consists essentially of sequences having 2, 3, 4, or more repeats of sequences complementary to the seed region(s) of those miRNAs. Such a sequence can include repeats of an individual sequence and/or combinations of different sequences in series.


In some embodiments of any of the aspects, a binding site for one or more miRNAs described herein can be a sequence that comprises, consists of, or consists essentially of a sequence or sequences selected from SEQ ID NOs: 31-37. In some embodiments of any of the aspects, a binding site for one or more miRNAs described herein can be a sequence that comprises, consists of, or consists essentially of a sequence having 2, 3, 4, or more sequences selected from SEQ ID NOs: 31-37. Such a sequence can include repeats of an individual sequence and/or combinations of different sequences in series. In some embodiments of any of the aspects, a nucleic acid sequence described herein can comprise a sequence that comprises, consists of, or consists essentially of 4 repeats of a sequence selected from SEQ ID NOs: 31-37.
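The arrangement of repeated binding sites in series can be illustrated with a short Python sketch. The function name `tandem_sites` and the optional `spacer` argument are assumptions introduced here; this section does not specify whether any spacer separates the repeats, so the default joins them directly. The two site sequences are the miR10aT and miR126 binding sites of Table 2 (SEQ ID NOs: 31 and 32).

```python
from typing import Sequence


def tandem_sites(sites: Sequence[str], copies: int = 4, spacer: str = "") -> str:
    """Place `copies` repeats of each binding site in series.

    Assumption: repeats of one site are grouped together before the next
    site begins; `spacer` is a hypothetical linker (empty by default).
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    units = [site for site in sites for _ in range(copies)]
    return spacer.join(units)


MIR10A_SITE = "CACAAATTCGGATCTACAGGGTA"  # SEQ ID NO: 31
MIR126_SITE = "GCATTATTACTCACGGTACGA"    # SEQ ID NO: 32

# Four repeats of a single site.
single = tandem_sites([MIR10A_SITE])
# Four repeats each of two different sites, in series.
combined = tandem_sites([MIR10A_SITE, MIR126_SITE])

print(len(single), len(combined))
```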









TABLE 2

Non-limiting examples of HSC-restricted miRNA names, miRBase accession number, nucleotide sequence, exemplary seed regions and exemplary nucleotide sequence of the miRNA binding site.

| miRNA name | miRBase accession number | Nucleotide sequence of the mature miRNA | Exemplary seed regions | Nucleotide sequence of exemplary miRNA binding site |
| --- | --- | --- | --- | --- |
| miR10aT | MI0000266 | UACCCUGUAGAUCCGAAUUUGUG (SEQ ID NO: 18) | UGUCCCA (SEQ ID NO: 43) | CACAAATTCGGATCTACAGGGTA (SEQ ID NO: 31) |
| miR99 | MI0000101 | AACCCGUAGAUCCGAUCUUGUG (SEQ ID NO: 19) | AUGCCCA (SEQ ID NO: 44) | |
| miR125 | MI0000469 | ACAGGUGAGGUUCUUGGGAGCC (SEQ ID NO: 20) | GAGUCCC (SEQ ID NO: 45) | |
| miR126 | MI0000471 | CAUUAUUACUUUUGGUACGCG (SEQ ID NO: 21) | GCCAUGC (SEQ ID NO: 46) | GCATTATTACTCACGGTACGA (SEQ ID NO: 32) |
| miR155 | MI0000681 | CUGUUAAUGCUAAUCGUGAUAGGGGUUUUUGCCUCCAACUGACUCCUACAUAUUAGCAUUAACAG (SEQ ID NO: 22) | CGUAAU (SEQ ID NO: 47) | |
| miR181 | MI0000289 | AACAUUCAACGCUGUCGGUGAGU (SEQ ID NO: 23) | ACUUACA (SEQ ID NO: 48) | |
| miR193 | MI0000487 | AACUGGCCUACAAAGUCCCAGU (SEQ ID NO: 24) | CCGGUCA (SEQ ID NO: 49) | |
| miR196bT | MI0000238 | CAACAACAUUAAACCACCCGA (SEQ ID NO: 25) | UGAUGGA (SEQ ID NO: 50) | CCAACAACAGGAAACTACCTA (SEQ ID NO: 33) |
| miR223T | MI0000300 | UGUCAGUUUGUCAAAUACCCCA (SEQ ID NO: 26) | UUGACUG (SEQ ID NO: 51) | TGTCAGTTTGTCAAATACCCC (SEQ ID NO: 34) |
| miR542 | MI0003686 | UGUGACAGAUUGAUAACUGAAA (SEQ ID NO: 27) | AGGGGC (SEQ ID NO: 52) | |
| let7e | MI0000066 | UGAGGUAGGAGGUUGUAUAGUU (SEQ ID NO: 28) | GGCAUAU (SEQ ID NO: 53) | AACTATACAACCTACTACCTCA (SEQ ID NO: 35) |
| miR130aAT | MI0000448 | GCUCUUUUCACAUUGUGCUACU (SEQ ID NO: 29) | AACGUGAC (SEQ ID NO: 54) | CAGTGCAATGTTAAAAGGGCAT (SEQ ID NO: 36) |
| miR142T | MI0000458 | CAUAAAGUAGAAAGCACUACU (SEQ ID NO: 30) | UGAAAUA (SEQ ID NO: 55) | TCCATAAAGTAGGAAACACTACA (SEQ ID NO: 37) |









In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising at least one miRNA binding site for at least one HSC-restricted miRNA that is selected from the group consisting of miR binding sites for miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e. In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising at least one, or at least two, or at least three, or at least four, or at least five, or at least six, or at least seven, or at least eight, or at least ten, or at least eleven, or at least twelve binding sites for at least one HSC-restricted miRNA that is selected from the group consisting of miR binding sites for miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e. Where a subset of the miRNA binding sites for the foregoing miRNAs is used, any combination of the miRNA binding sites can be used in each of various embodiments of the aspects described herein. For example, it is specifically contemplated herein that any pairwise combination of binding sites for the 13 miRNAs can be used, e.g., any combination shown in Table 3.
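The pairwise combinations contemplated above can be enumerated with a short Python sketch. The list below simply transcribes the thirteen miRNA names recited in the preceding paragraph; the variable names are illustrative only, and unordered pairs are assumed (a pair of miR125 with miR10aT is treated as the same combination as miR10aT with miR125).

```python
from itertools import combinations

# The miRNA names recited in the text above (thirteen names in total).
HSC_RESTRICTED_MIRNAS = [
    "miR10aT", "miR125", "miR155", "miR130aT", "miR142T", "miR196bT",
    "miR99", "miR126", "miR181", "miR193", "miR223T", "miR542", "let7e",
]

# Every unordered pair of distinct miRNAs; a miRNA is never paired with itself.
pairs = list(combinations(HSC_RESTRICTED_MIRNAS, 2))

print(len(pairs))   # number of distinct pairwise combinations
print(pairs[0])     # first pair, in list order
```

Larger subsets can be enumerated the same way by changing the second argument of `combinations`.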


In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising at least one hematopoietic enhancer element and at least one miRNA binding site for at least one HSC-restricted miRNA. In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising at least one hematopoietic enhancer element and at least one binding site for at least one HSC-restricted miRNA and a sequence encoding a GATA1 polypeptide.









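TABLE 3

Contemplated exemplary combinations of miRNA binding sites are indicated by “X”

| | miR10aT | miR125 | miR155 | miR130aT | miR196bT | miR142T | miR99 | miR126 | miR181 | miR193 | miR223T | miR542 | Let7e |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| miR10aT | | X | X | X | X | X | X | X | X | X | X | X | X |
| miR125 | X | | X | X | X | X | X | X | X | X | X | X | X |
| miR155 | X | X | | X | X | X | X | X | X | X | X | X | X |
| miR130aT | X | X | X | | X | X | X | X | X | X | X | X | X |
| miR196bT | X | X | X | X | | X | X | X | X | X | X | X | X |
| miR142T | X | X | X | X | X | | X | X | X | X | X | X | X |
| miR99 | X | X | X | X | X | X | | X | X | X | X | X | X |
| miR126 | X | X | X | X | X | X | X | | X | X | X | X | X |
| miR181 | X | X | X | X | X | X | X | X | | X | X | X | X |
| miR193 | X | X | X | X | X | X | X | X | X | | X | X | X |
| miR223T | X | X | X | X | X | X | X | X | X | X | | X | X |
| miR542 | X | X | X | X | X | X | X | X | X | X | X | | X |
| Let7e | X | X | X | X | X | X | X | X | X | X | X | X | |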








In some embodiments of any of the aspects, the miRNA binding site is located at least about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least about 5 kb, at least about 6 kb, at least about 7 kb, at least about 8 kb, at least about 9 kb, at least about 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the miRNA binding site sequence is located at least 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, at least 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the miRNA binding site sequence is located at about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at about 5 kb, at about 6 kb, at about 7 kb, at about 8 kb, at about 9 kb, or at about 10 kb from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the miRNA binding site sequences can be in intergenic sequence or in the sequence of an intervening gene. In some embodiments of any of the aspects described herein, the target sequence is located within the sequence which is about 500 bp to about 10 kb from the end of the open reading frame, e.g., about 1 kb to about 9 kb, about 2 kb to about 8 kb, about 3 kb to about 7 kb, or about 4 kb to about 6 kb from the open reading frame. In some embodiments of any of the aspects described herein, the miRNA binding site sequences are located 500 bp to 10 kb from the end of the open reading frame, e.g., 1 kb to 9 kb, 2 kb to 8 kb, 3 kb to 7 kb, or 4 kb to 6 kb from the open reading frame.


In some embodiments of any of the aspects, disclosed herein are nucleic acid sequences comprising a sequence encoding a GATA1 polypeptide and a heterologous 5′ UTR. Such combinations permit lineage-specific expression of GATA1 specifically in early erythroid progenitors.


Cap analysis of gene expression was used to define 5′ untranslated regions (UTRs) for transcripts in HSPCs undergoing erythroid lineage commitment, a stage at which the functional defects in erythroid differentiation arise. Transcripts that were most highly translated at baseline and that had short, unstructured 5′ UTRs tended to be the ones that were downregulated at the translational level in the setting of RP haploinsufficiency. The 5′ UTR, “5′ untranslated region”, or 5′ leader sequence refers to the region of an mRNA upstream of the coding sequence that is not translated. Described herein is the discovery that among all hematopoietic master transcription factors, only GATA1 has a short 5′ UTR and that replacing this 5′ UTR with those of other transcription factors (including but not limited to RUNX1, LMO2, or ETV6) alters the translation of the GATA1 hematopoietic transcription factor.


In one aspect of any of the embodiments, described herein is a nucleic acid sequence comprising i) a heterologous 5′ UTR comprising a) a 5′UTR sequence of a hematopoietic transcription factor other than GATA1; b) a sequence of at least 20 nucleotides; and/or c) 1-25 upstream AUG codons (uAUGs); and ii) a nucleic acid sequence encoding a GATA1 polypeptide. In some embodiments of any of the aspects, a nucleic acid sequence described herein can further comprise a heterologous 5′ UTR comprising a) a 5′UTR sequence of a hematopoietic transcription factor other than GATA1; b) a sequence of at least 20 nucleotides; and/or c) 1-25 upstream AUG codons (uAUGs).
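The length and uAUG features recited above can be checked for a candidate 5′ UTR with a short Python sketch. The function names, the returned dictionary keys, and the toy sequence `TOY_UTR` are hypothetical and are not sequences of this disclosure. The sketch counts every AUG (ATG in DNA notation) in the 5′ UTR regardless of reading frame, which is an assumption, and it reports the two features separately because the text recites them as "and/or" alternatives.

```python
def count_uaugs(utr: str) -> int:
    """Count AUG triplets in a 5' UTR, in any frame (DNA or RNA input)."""
    seq = utr.upper().replace("T", "U")
    return sum(1 for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG")


def utr_features(utr: str, min_length: int = 20,
                 min_uaugs: int = 1, max_uaugs: int = 25) -> dict:
    """Report whether the UTR has at least `min_length` nucleotides and
    between `min_uaugs` and `max_uaugs` upstream AUGs (inclusive)."""
    n_uaugs = count_uaugs(utr)
    return {
        "length": len(utr),
        "uaug_count": n_uaugs,
        "length_ok": len(utr) >= min_length,
        "uaug_ok": min_uaugs <= n_uaugs <= max_uaugs,
    }


# Hypothetical 24-nt UTR used only to exercise the functions.
TOY_UTR = "GGCATGCCGATGACCGGATTCGCA"

features = utr_features(TOY_UTR)
print(features)
```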


The length of the 5′ UTR can be modified by mutation, for example, by substitution, deletion, or insertion in the 5′ UTR. The 5′ UTR can be further modified by mutating a naturally occurring start codon or translation initiation site such that the codon no longer functions as a start codon and translation may initiate at an alternate initiation site.


In some embodiments of any of the aspects, the 5′UTR sequence of a hematopoietic transcription factor other than GATA1 can be a 5′UTR of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1), LIM Domain Only 2 (LMO2), and ETS Variant 6 (ETV6).


As used herein, “RUNX1”, “AML1”, or “Runt-related transcription factor 1” refers to the alpha subunit of the heterodimeric core binding factor (CBF) transcription factor which is thought to be involved in the development of normal hematopoiesis. RUNX1 is itself a transcription factor and complexes with CBFB cofactor to form CBF. Sequences for RUNX1 are known for a number of species, e.g., human RUNX1 (the RUNX1 NCBI Gene ID is 861) mRNA sequences (e.g., NM_001001890.2) and polypeptide sequences (e.g., NP_001001890.1) are known in the art. These, together with any naturally occurring allelic, splice variants, and processed forms thereof that catalyze the same reaction are contemplated for use in the methods and compositions described herein.


In some embodiments of any of the aspects, the RUNX1 5′ UTR comprises a 5′UTR that comprises, consists of, consists essentially of, or is derived from the following nucleic acid sequence: NG_011402.2:940414-1201911 Homo sapiens RUNX family transcription factor 1 (RUNX1), RefSeqGene (LRG_482) on chromosome 21 (SEQ ID NO: 14):









CACAGAACCACAAGTTGGGTAGCCTGGCAGTGTCAGAAGTCTGAACCCAG





CATAGTGGTCAGCAGGCAGGACGAATCACACTGAATGCAAACCACAGGGT





TTCGCAGCGTGGTAAAAGAAATCATTGAGTCCCCCGCCTTCAGAAGAGGG





TGCATTTTCAGGAGGAAGCG






As used herein, “LMO2”, “TTG2”, or “LIM Domain Only 2” refers to a cysteine-rich, two LIM-domain protein that is required for yolk sac erythropoiesis. Sequences for LMO2 are known for a number of species, e.g., human LMO2 (the LMO2 NCBI Gene ID is 4005) mRNA sequences (e.g., NM_001142315.1) and polypeptide sequences (e.g., NP_001135787.1) are known in the art. These, together with any naturally occurring allelic, splice variants, and processed forms thereof that catalyze the same reaction are contemplated for use in the methods and compositions described herein.


In some embodiments of any of the aspects, the LMO2 5′ UTR comprises a 5′UTR that comprises, consists of, consists essentially of, or is derived from the following nucleic acid sequence: NC_000011.10:c33892289-33858576 Homo sapiens chromosome 11, GRCh38.p12, (SEQ ID NO: 15):









ACAAGGGCCTCTGGGTGTCCTGGCAGAGAGGGGAGATGGCACAGGCACCA





GGTGCTAGGGTGCCAGGGCCTCCCGAGAAGGAACAGGTGCAAAGCAGGCA





ATTAGCCCAGAAGGTATCCGTGGGGCAGGCAGCCTAGATCTGATGGGGGA





AGCCACCAGGATTACATCATCTGCTGTAACAACTGCTCTGAAAAGAAGAT





ATTTTTCAACCTGAACTTGCAGTAGCTAGTGGAGAGGCAGGAAAAAGGAA





ATGAAACCAGAGACAGAGGGAAGCTGAGCGAAAATAGACCTTCCCGAGAG





AGGAGGAAGCCCGGAGAGAGACGCACGGTCCCCTCCCCGCCCCTAGGCCG





CCGCCCCCTCTCTGCCCTCGGCGGCGAGCAGCGCGCCGCGACCCGGGCCG





AAGGTGCGAGGGGCTCCGGGCGGCCGGGCGGGCGCACACCATCCCCGCGG





GCGGCGCGGAGCCGGCGACAGCGCGCGAGAGGGACCGGGCGGTGGCGGCG





GCGGGACCGGG






As used herein, “ETV6”, “TEL”, or “ETS Variant 6” refers to a transcription factor with two functional domains: an N-terminal pointed (PNT) domain that is involved in protein-protein interactions with itself and other proteins, and a C-terminal DNA-binding domain. Sequences for ETV6 are known for a number of species, e.g., human ETV6 (the ETV6 NCBI Gene ID is 2120) mRNA sequences (e.g., NM_001987.4) and polypeptide sequences (e.g., NP_001978.1) are known in the art. These, together with any naturally occurring allelic, splice variants, and processed forms thereof that catalyze the same reaction are contemplated for use in the methods and compositions described herein.


In some embodiments of any of the aspects, the ETV6 5′ UTR comprises a 5′UTR that comprises, consists of, consists essentially of, or is derived from the following nucleic acid sequence: NG_011443.1:5001-250549 Homo sapiens ETS variant 6 (ETV6), RefSeqGene (LRG_609) on chromosome 12 (SEQ ID NO: 16):









CGTCAGTTTCTGCACTGAAACTCTCAAGATCAATGAGCAAAGAGCTTTCT





CAGTTCTGCCTTTCAGTTTCTCTCTTCCAGGAAGGAAAACATTCGAGAGA





GCGAGGGAGAGCCGCGGGAGGGCGGGGGGCGGGGGCGCCGGCTGCGGGTG





GGAGGAGAGACCGGGAGGCCGGCCGGGCTGCGTCCCGGGTCCCCGCGCCG





CGCCGCGACCTGCAGACCCCGCCGCCGCGCTCGGGCCCGTCTCCCACGCC





CCCGCCGCCCCGCGCGCCCAACTCCGCCGGCCGCCCCGCCCCGCCCCGCG





CGCTCCAGACCCCCGGGGCGGCTGCCGGGAGAGATGCTGGAAGAAACTTC





TTAAATGACCGCGTCTGGCTGGCCGTGGAGCCTTTCTGGGTTGGGGAGAG





GAAAGGAAAGTGGAAAAAACCTGAGAACTTCCTGATCTCTCTCGCTGTGA





GAC






The nucleic acid sequences/elements described herein can be operably linked so that they can interact either directly or indirectly to carry out an intended function, e.g. the mediation or modulation of expression of a nucleic acid sequence. “Operably linked” refers to an arrangement of elements wherein the components so described are configured so as to perform their usual function. Thus, control elements operably linked to an open reading frame are capable of effecting the expression of the open reading frame. The control elements need not be contiguous with the open reading frame, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the open reading frame and the promoter sequence can still be considered “operably linked” to the open reading frame. The interaction of operatively linked sequences can, for example, be mediated by proteins that interact with the operatively linked sequences.


In some embodiments of any of the aspects, a promoter can be operably linked to any of the elements disclosed herein, e.g., a nucleic acid sequence comprising a heterologous 5′UTR, at least one distal hematopoietic stem cell (HSC) restricted enhancer element, a binding site for an HSC-restricted miRNA, and/or a nucleic acid encoding a GATA1 polypeptide. In some embodiments of any of the aspects, the promoter is not a GATA1 promoter.


In some embodiments of any of the aspects, the promoter comprises a promoter sequence of Elongation factor 1-alpha 1 (eEF1a1). As used herein, “eEF1a1”, “CCS-3”, or “LENG7” refers to the alpha subunit of the elongation factor-1 complex, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. Sequences for eEF1a1 are known for a number of species, e.g., human eEF1a1 sequences (the eEF1a1 NCBI Gene ID is 1915) are known in the art. In some embodiments of any of the aspects, the eEF1a1 promoter comprises a promoter that comprises, consists of, consists essentially of, or is derived from the following nucleic acid sequence: NC_000006.12:c73521032-73515750 Homo sapiens chromosome 6, GRCh38.p12 Primary Assembly (SEQ ID NO: 17):










CTTTTTCGCAACGGGTTTGCCGCCAGAACACAGGTAAGTGCCGTGTGTGGTTCCCGCGGGCCTGGCCTCT






TTACGGGTTATGGCCCTTGCGTGCCTTGAATTACTTCCACGCCCCTGGCTGCAGTACGTGATTCTTGATC





CCGAGCTTCGGGTTGGAAGTGGGTGGGAGAGTTCGAGGCCTTGCGCTTAAGGAGCCCCTTCGCCTCGTGC





TTGAGTTGAGGCCTGGCTTGGGCGCTGGGGCCGCCGCGTGCGAATCTGGTGGCACCTTCGCGCCTGTCTC





GCTGCTTTCGATAAGTCTCTAGCCATTTAAAATTTTTGATGACCTGCTGCGACGCTTTTTTTCTGGCAAG





ATAGTCTTGTAAATGCGGGCCAAGATCTGCACACTGGTATTTCGGTTTTTGGGGCCGCGGGCGGCGACGG





GGCCCGTGCGTCCCAGCGCACATGTTCGGCGAGGCGGGGCCTGCGAGCGCGGCCACCGAGAATCGGACGG





GGGTAGTCTCAAGCTGGCCGGCCTGCTCTGGTGCCTGGCCTCGCGCCGCCGTGTATCGCCCCGCCCTGGG





CGGCAAGGCTGGCCCGGTCGGCACCAGTTGCGTGAGCGGAAAGATGGCCGCTTCCCGGCCCTGCTGCAGG





GAGCTCAAAATGGAGGACGCGGCGCTCGGGAGAGCGGGCGGGTGAGTCACCCACACAAAGGAAAAGGGCC





TTTCCGTCCTCAGCCGTCGCTTCATGTGACTCCACGGAGTACCGGGCGCCGTCCAGGCACCTCGATTAGT





TCTCGAGCTTTTGGAGTACGTCGTCTTTAGGTTGGGGGGAGGGGTTTTATGCGATGGAGTTTCCCCACAC





TGAGTGGGTGGAGACTGAAGTTAGGCCAGCTTGGCACTTGATGTAATTCTCCTTGGAATTTGCCCTTTTT





GAGTTTGGATCTTGGTTCATTCTCAAGCCTCAGACAGTGGTTCAAAGTTTTTTTCTTCCATTTCAGGTGT





CGTGAAAACTACCCCTAAAAGCCAAAATGGGAAAGGAAAAGACTCATATCAACATTGTCGTCATTGGACA





CGTAGATTCGGGCAAGTCCACCACTACTGGCCATCTGATCTATAAATGCGGTGGCATCGACAAAAGAACC





ATTGAAAAATTTGAGAAGGAGGCTGCTGAGGTATGTTTAATACCAGAAAGGGAAAGATCAACTAAAATGA





GTTTTACCAGCAGAATCATTAGGTGATTTCCCCAGAACTAGTGAGTGGTTTAGATCTGAATGCTAATAGT





TAAGACCTTACTTATGAAATAATTTTGCTTTTGGTGACTTCTGTAATCGTATTGCTAGTGAGTAGATTTG





GATGTTAATAGTTAAGATCCGACTTATAAAAGTTTGATTTTTGGTTGCTTCTGTAACCCAAAGTGACTAA





AATCACTTTGGACTTGGAGTTGTAAAGTGGAAACTGCCAATTAAGGGCTGGGGACAAGGAAATTGAAGCT





GGAGTTTGTGTTTTAGTAACCAAGTAACGACTCTTAATCCTTACAGATGGGAAAGGGCTCCTTCAAGTAT





GCCTGGGTCTTGGATAAACTGAAAGCTGAGCGTGAACGTGGTATCACCATTGATATCTCCTTGTGGAAAT





TTGAGACCAGCAAGTACTATGTGACTATCATTGATGCCCCAGGACACAGAGACTTTATCAAAAACATGAT





TACAGGGACATCTCAGGTTGGTGGGATTAATAATTCTAGGTTTCTTTATCCCAAAAGGCTTGCTTTGTAC





ACTGGTTTTGTCATTTGGAGAGTTGACAGGGATATGTCTTTGCTTTCTTTAAAGGCTGACTGTGCTGTCC





TGATTGTTGCTGCTGGTGTTGGTGAATTTGAAGCTGGTATCTCCAAGAATGGGCAGACCCGAGAGCATGC





CCTTCTGGCTTACACACTGGGTGTGAAACAACTAATTGTCGGTGTTAACAAAATGGATTCCACTGAGCCA





CCCTACAGCCAGAAGAGATATGAGGAAATTGTTAAGGAAGTCAGCACTTACATTAAGAAAATTGGCTACA





ACCCCGACACAGTAGCATTTGTGCCAATTTCTGGTTGGAATGGTGACAACATGCTGGAGCCAAGTGCTAA





CGTAAGTGGCTTTCAAGACCATTGTTAAAAAGCTCTGGGAATGGCGATTTCATGCTTACACAAATTGGCA





TGCTTGTGTTTCAGATGCCTTGGTTCAAGGGATGGAAAGTCACCCGTAAGGATGGCAATGCCAGTGGAAC





CACGCTGCTTGAGGCTCTGGACTGCATCCTACCACCAACTCGTCCAACTGACAAGCCCTTGCGCCTGCCT





CTCCAGGATGTCTACAAAATTGGTGGTAAGTTGGCTGTAAACAAAGTTGAATTTGAGTTGATAGAGTACT





GTCTGCCTTCATAGGTATTTAGTATGCTGTAAATATTTTTAGGTATTGGTACTGTTCCTGTTGGCCGAGT





GGAGACTGGTGTTCTCAAACCCGGTATGGTGGTCACCTTTGCTCCAGTCAACGTTACAACGGAAGTAAAA





TCTGTCGAAATGCACCATGAAGCTTTGAGTGAAGCTCTTCCTGGGGACAATGTGGGCTTCAATGTCAAGA





ATGTGTCTGTCAAGGATGTTCGTCGTGGCAACGTTGCTGGTGACAGCAAAAATGACCCACCAATGGAAGC





AGCTGGCTTCACTGCTCAGGTAACAATTTAAAGTAACATTAACTTATTGCAGAGGCTAAAGTCATTTGAG





ACTTTGGATTTGCACTGAATGCAAATCTTTTTTCCAAGGTGATTATCCTGAACCATCCAGGCCAAATAAG





CGCCGGCTATGCCCCTGTATTGGATTGCCACACGGCTCACATTGCATGCAAGTTTGCTGAGCTGAAGGAA





AAGATTGATCGCCGTTCTGGTAAAAAGCTGGAAGATGGCCCTAAATTCTTGAAGTCTGGTGATGCTGCCA





TTGTTGATATGGTTCCTGGCAAGCCCATGTGTGTTGAGAGCTTCTCAGACTATCCACCTTTGGGTAAGGA





TGACTACTTAAATGTAAAAAAGTTGTGTTAAAGATGAAAAATACAACTGAACAGTACTTTGGGTAATAAT





TAACTTTTTTTTTAATAGGTCGCTTTGCTGTTCGTGATATGAGACAGACAGTTGCGGTGGGTGTCATCAA





AGCAGTGGACAAGAAGGCTGCTGGAGCTGGCAAGGTCACCAAGTCTGCCCAGAAAGCTCAGAAGGCTAAA





TGAATATTATCCCTAATACCTGCCACCCCACTCTTAATCAGTGGTGGAAGAACGGTCTCAGAACTGTTTG





TTTCAATTGGCCATTTAAGTTTAGTAGTAAAAGACTGGTTAATGATAACAATGCATCGTAAAACCTTCAG





AAGGAAAGGAGAATGTTTTGTGGACCACTTTGGTTTTCTTTTTTGCGTGTGGCAGTTTTAAGTTATTAGT





TTTTAAAATCAGTACTTTTTAATGGAAACAACTTGACCAAAAATTTGTCACAGAATTTTGAGACCCATTA





AAAAAGTTAAATGAGAAACCTGTGTGTTCCTTTGGTCAACACCGAGACATTTAGGTGAAAGACATCTAAT





TCTGGTTTTACGAATCTGGAAACTTCTTGAAAATGTAATTCTTGAGTTAACACTTCTGGGTGGAGAATAG





GGTTGTTTTCCCCCCACATAATTGGAAGGGGAAGGAATATCATTTAAAGCTATGGGAGGGTTGCTTTGAT





TACAACACTGGAGAGAAATGCAGCATGTTGCTGATTGCCTGTCACTAAAACAGGCCAAAAACTGAGTCCT





TGTGTTGCATAGAAAGCTTCATGTTGCTAAACCAATGTTAAGTGAATCTTTGGAAACAAAATGTTTCCAA





ATTACTGGGATGTGCATGTTGAAACGTGGGTTAAAATGACTGGGCAGTGAAAGTTGACTATTTGCCATGA





CATAAGAAATAAGTGTAGTGGCTAGTGTACACCCTATGAGTGGAAGGGTCCATTTTGAAGTCAGTGGAGT





AAGCTTTATGCCAGTTTGATGGTTTCACAAGTTCTATTGAGTGCTATTCAGAATAGGAACAAGGTTCTAA





TAGAAAAAGATGGCAATTTGAAGTAGCTATAAAATTAGACTAATCTACATTGCTTTTCTCCTGCAGAGTC





TAATACCTTTTATGCTTTGATAATTAGCAGTTTGTCTACTTGGTCACTAGGAATGAAACTACATGGTAAT





AGGCTTAACAGGTGTAATAGCCCACTTACTCCTGAATCTTTAAGCATTTGTGCATTTGAAAAATGCTTTT





CGCGATCTTCCTGCTGGGATTACAGGCATGAGCCACTGTGCCTGACCTCCCATATGTAAAAGTGTCTAAA





GGTTTTTTTTTGGTTATAAAAGGAAAATTTTTGCTTAAGTTTGAAGGATAGGTAAAATTAAAGGACATGC





TTTCTGTTTGTGTGATGGTTTTTAAAAATTTTTTTTAAGATGGAGTTCTTGTTGCCCAGGCTAGAATGCA





ATGGCAAAATCTCACTGCAATCTCCTCCTCCTGGGTTCAAGCAATTCTCCTACTTCAGCCTCCCAAGTAG





CTGGGATTACAGGCATGTGCTAATTTGGTGTTTTTAATAGAGATGAGGTTTTTCCATGTTGGTCAGGCTG





GTCTCAAACTCCTGACCTTAGGTGATCGCCTCGGCCTCCTAAAGTGCTGGAATTACAGGCATGAGCCACC





ATGCCTGGCCAGGACATGTGTTCTTAAGGACATGCTAAGCAGGAGTTAAAGCAGCCCAAGAGATAAGGCC





TCTTAAAGTGACTGGCAATGTGTATTGCTCAAGATTCAAAGGTACTTGAATTGGCCATAGACAAGTCTGT





AATGAAGTGTTATCGTTTTCCCTCATCTGAGTCTGAATTAGATAAAATGCCTTCCCATCAGCCAGTGCTC





TGAGGTATCAAGTCTAAATTGAACTAGAGATTTTTGTCCTTAGTTTCTTTGCTATCTAATGTTTACACAA





GTAAATAGTCTAAGATTTGCTGGATGACAGAAAAAACAGGTAAGGCCTTTAATAGATGGCCAATAGATGC





CCTGATAATGAAAGTTGACACCTGTAAGATTTACCAGTAGAGAATTCTTGACATGCAAGGAAGCAAGATT





TAACTGAAAAATTGTTCCCACTGGAAGCAGGAATGAGTCAGTTTACTTGCATATACTGAGATTGAGATTA





ACTTCCTGTGAAACCCAGTGTCTTAGACAACTGTGGCTTGAGCACCACCTGCTGGTATTCATTACAAACT





TGCTCACTACAATAAATGAATTTTAAGCTTTAA
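The coordinate string cited for SEQ ID NO: 17 (NC_000006.12:c73521032-73515750) can be read programmatically. The following is a minimal sketch, not part of the disclosed methods; the function name `parse_ncbi_range` and the returned field names are hypothetical, and the sketch assumes the common NCBI convention in which a leading "c" before the coordinates denotes the complementary strand and both endpoints are 1-based and inclusive.

```python
import re


def parse_ncbi_range(spec):
    """Parse a string such as 'NC_000006.12:c73521032-73515750'.

    Assumption: a leading 'c' marks the complementary (minus) strand,
    and both coordinates are 1-based and inclusive.
    """
    match = re.fullmatch(r"(?P<acc>[^:]+):(?P<comp>c?)(?P<a>\d+)-(?P<b>\d+)", spec)
    if match is None:
        raise ValueError(f"unrecognized coordinate string: {spec!r}")
    a, b = int(match["a"]), int(match["b"])
    start, end = min(a, b), max(a, b)
    return {
        "accession": match["acc"],
        "strand": "-" if match["comp"] else "+",
        "start": start,
        "end": end,
        "length": end - start + 1,  # inclusive endpoints
    }


region = parse_ncbi_range("NC_000006.12:c73521032-73515750")
print(region["accession"], region["strand"], region["length"])
```

Under these assumptions the cited interval lies on the minus strand of chromosome 6 and spans 5,283 bases.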






Complex cellular and developmental processes depend on precise spatiotemporal regulation of mRNA and protein levels and activities. Such regulation arises essentially at the transcriptional, posttranscriptional, and posttranslational levels. Posttranscriptional regulation is the control of gene expression at the RNA level, i.e., between the transcription and the translation of the gene. Posttranscriptional regulation can be controlled through both protein-RNA and RNA-RNA interactions. As used herein, posttranscriptional regulatory elements include nucleotide sequences including but not limited to Woodchuck Hepatitis Virus Posttranscriptional Regulatory Elements. In some embodiments of any of the aspects, the nucleic acid sequences described herein can further comprise a posttranscriptional regulatory element operably linked to the sequence encoding the GATA1 polypeptide.


In some embodiments of any of the aspects, the posttranscriptional regulatory element comprises a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element. Woodchuck Hepatitis Virus (WHP) Posttranscriptional Regulatory Element, abbreviated WPRE, is a DNA sequence that, when transcribed, creates a tertiary structure enhancing expression. WPRE is a tripartite regulatory element with gamma, alpha, and beta components.


In some embodiments of any of the aspects, the Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE) comprises, consists of, or consists essentially of the following nucleotide sequence (SEQ ID NO: 56):









GCCACGGCGGAACTCATCGCCGCCTGCCTTGCCCGCTGCTGGACAGGGGC





TCGGCTGTTGGGCACTGACAATTCCGTGGT






In some embodiments of any of the aspects, the Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE) comprises, consists of, or consists essentially of the following nucleotide sequence (SEQ ID NO: 63):









AATCAACCTCTGGATTACAAAATTTGTGAAAGATTGACTGGTATTCTTAA





CTATGTTGCTCCTTTTACGCTATGTGGATACGCTGCTTTAATGCCTTTGT





ATCATGCTATTGCTTCCCGTATGGCTTTCATTTTCTCCTCCTTGTATAAA





TCCTGGTTGCTGTCTCTTTATGAGGAGTTGTGGCCCGTTGTCAGGCAACG





TGGCGTGGTGTGCACTGTGTTTGCTGACGCAACCCCCACTGGTTGGGGCA





TTGCCACCACCTGTCAGCTCCTTTCCGGGACTTTCGCTTTCCCCCTCCCT





ATTGCCACGGCGGAACTCATCGCCGCCTGCCTTGCCCGCTGCTGGACAGG





GGCTCGGCTGTTGGGCACTGACAATTCCGTGGTGTTGTCGGGGAAGCTGA





CGTCCTTTCCATGGCTGCTCGCCTGTGTTGCCACCTGGATTCTGCGCGGG





ACGTCCTTCTGCTACGTCCCTTCGGCCCTCAATCCAGCGGACCTTCCTTC





CCGCGGCCTGCTGCCGGCTCTGCGGCCTCTTCCGCGTCTTCGCCTTCGCC





CTCAGACGAGTCGGATCTCCCTTTGGGCCGCCTCCCCGCCTG






Alternative and/or optimized WPREs are also known in the art, e.g., as described in Patel and Olsen RNA Virus Vectors 11:S322 (2005), which is incorporated by reference herein in its entirety.


In some embodiments of any of the aspects, a WPRE comprises a sequence having at least 80% homology to the nucleotide sequence of SEQ ID NO: 56 and/or SEQ ID NO: 63. In some embodiments of any of the aspects, a WPRE comprises a sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to SEQ ID NO: 56 and/or SEQ ID NO: 63. In some embodiments of any of the aspects, a WPRE comprises a sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to SEQ ID NO: 56 and/or SEQ ID NO: 63 and which retains the wild-type activity of SEQ ID NO: 56 and/or SEQ ID NO: 63. A nucleic acid sequence described herein can comprise multiple post-transcriptional regulatory elements, e.g., the nucleic acid sequence comprises at least one, or at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 20, or at least 25, or at least 30 post-transcriptional regulatory elements.
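The percent-identity thresholds recited above can be illustrated with a short calculation. The following is a minimal sketch only: it assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and it counts identity over all alignment columns, which is one common convention among several. The function names and the 20-nucleotide variant are hypothetical; the reference fragment is the first 20 nucleotides of SEQ ID NO: 56 as listed above.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: both strings have the same length and use '-' for gaps.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


def meets_threshold(aligned_a, aligned_b, threshold=80.0):
    """Return True if the alignment reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# First 20 nucleotides of SEQ ID NO: 56 and a hypothetical variant
# carrying two substitutions (positions 11 and 20).
reference = "GCCACGGCGGAACTCATCGC"
variant = "GCCACGGCGGTACTCATCGA"

print(percent_identity(reference, variant))
print(meets_threshold(reference, variant, threshold=80.0))
```

In practice the alignment itself would come from an established alignment tool, and the choice of denominator (alignment columns versus the length of the reference sequence) changes the reported percentage, so the convention used should be stated alongside any value.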


In some embodiments of any of the aspects, the posttranscriptional regulatory element is located at least about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least about 5 kb, at least about 6 kb, at least about 7 kb, at least about 8 kb, at least about 9 kb, at least about 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the posttranscriptional regulatory element sequence is located at least 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, at least 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the posttranscriptional regulatory element sequence is located at about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at about 5 kb, at about 6 kb, at about 7 kb, at about 8 kb, at about 9 kb, or at about 10 kb from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the posttranscriptional regulatory element sequence can be in intergenic sequence or in the sequence of an intervening gene. In some embodiments of any of the aspects described herein, the posttranscriptional regulatory element sequence can be located within the sequence which is about 500 bp to about 10 kb from the end of the open reading frame, e.g., about 1 kb to about 9 kb, about 2 kb to about 8 kb, about 3 kb to about 7 kb, or about 4 kb to about 6 kb from the open reading frame. In some embodiments of any of the aspects described herein, the posttranscriptional regulatory element sequence can be located from about 500 bp to 10 kb from the end of the open reading frame, e.g., 1 kb to 9 kb, 2 kb to 8 kb, 3 kb to 7 kb, or 4 kb to 6 kb from the open reading frame.
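The distance ranges recited above can be checked against a construct map with simple interval arithmetic. The following is a hedged sketch; the coordinate convention (0-based, half-open intervals), the function names, and the example coordinates are all hypothetical and are not taken from any sequence in this disclosure.

```python
def gap_bp(orf, element):
    """Number of bases separating two non-overlapping intervals.

    Assumption: each interval is a (start, end) tuple, 0-based and half-open.
    Overlapping or abutting intervals give a gap of 0.
    """
    orf_start, orf_end = orf
    el_start, el_end = element
    return max(0, el_start - orf_end, orf_start - el_end)


def within_window(orf, element, min_bp=500, max_bp=10_000):
    """True if the element lies between min_bp and max_bp from the ORF boundary."""
    return min_bp <= gap_bp(orf, element) <= max_bp


# Hypothetical layout: an open reading frame followed 5 kb downstream
# by a posttranscriptional regulatory element.
orf = (100, 1342)
element = (6342, 6934)

print(gap_bp(orf, element))
print(within_window(orf, element))
```

The same check applies to an element placed upstream of the open reading frame, since the gap is measured from whichever boundary is nearer.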


In some embodiments of any of the aspects, a nucleic acid sequence described herein can further comprise an internal ribosome entry site. An internal ribosome entry site, abbreviated IRES, is an RNA element that allows for translation initiation in a cap-independent manner, as part of the greater process of protein synthesis. In eukaryotic translation, initiation typically occurs at the 5′ end of mRNA molecules, since 5′ cap recognition is required for the assembly of the initiation complex. The location for IRES elements is often in the 5′UTR, but can also occur elsewhere in mRNAs.


In some embodiments of any of the aspects, the internal ribosome entry site comprises, consists of, or consists essentially of the following nucleotide sequence (SEQ ID NO: 66):









CCCCTCTCCCTCCCCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAAT





AAGGCCGGTGTGCGTTTGTCTATATGTTATTTTCCACCATATTGCCGTCT





TTTGGCAATGTGAGGGCCCGGAAACCTGGCCCTGTCTTCTTGACGAGCAT





TCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATG





TCGTGAAGGAAGCAGTTCCTCTGGAAGCTTCTTGAAGACAAACAACGTCT





GTAGCGACCCTTTGCAGGCAGCGGAACCCCCCACCTGGCGACAGGTGCCT





CTGCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAAC





CCCAGTGCCACGTTGTGAGTTGGATAGTTGTGGAAAGAGTCAAATGGCTC





TCCTCAAGCGTATTCAACAAGGGGCTGAAGGATGCCCAGAAGGTACCCCA





TTGTATGGGATCTGATCTGGGGCCTCGGTACACATGCTTTACATGTGTTT





AGTCGAGGTTAAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTT





TTCCTTTGAAAAACACGATGATAATATGGCCACAACC






In some embodiments of any of the aspects, described herein is an IRES comprising a sequence having at least 80% homology to the nucleotide sequence of SEQ ID NO: 66. In some embodiments of any of the aspects, an IRES comprises a sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to SEQ ID NO: 66. In some embodiments of any of the aspects, an IRES comprises a sequence with at least 60%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98% or greater sequence identity to SEQ ID NO: 66, which retains the wild-type activity of SEQ ID NO: 66.


Nucleic acid sequences described herein can comprise multiple IRESs, e.g., a nucleic acid sequence can comprise at least one, or at least 2, or at least 3, or at least 4, or at least 5, or at least 6, or at least 7, or at least 10, or at least 11, or at least 12, or at least 13, or at least 14, or at least 15, or at least 16, or at least 17, or at least 20, or at least 25, or at least 30 IRES sequences.


In some embodiments of any of the aspects, the IRES is located at least about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least about 5 kb, at least about 6 kb, at least about 7 kb, at least about 8 kb, at least about 9 kb, at least about 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the IRES sequence is located at least 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at least 5 kb, at least 6 kb, at least 7 kb, at least 8 kb, at least 9 kb, at least 10 kb or further from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the IRES sequence is located at about 5 kb from the boundary of the GATA-1 gene's open reading frame, e.g., at about 5 kb, at about 6 kb, at about 7 kb, at about 8 kb, at about 9 kb, or at about 10 kb from the boundary of the GATA-1 gene's open reading frame. In some embodiments of any of the aspects, the IRES sequence can be in intergenic sequence or in the sequence of an intervening gene. In some embodiments of any of the aspects described herein, the IRES sequence can be located within the sequence which is about 500 bp to about 10 kb from the end of the open reading frame, e.g., about 1 kb to about 9 kb, about 2 kb to about 8 kb, about 3 kb to about 7 kb, or about 4 kb to about 6 kb from the open reading frame. In some embodiments of any of the aspects described herein, the IRES sequence can be located within the sequence which is 500 bp to 10 kb from the end of the open reading frame, e.g., 1 kb to 9 kb, 2 kb to 8 kb, 3 kb to 7 kb, or 4 kb to 6 kb from the open reading frame.


In some embodiments of any of the aspects, a nucleic acid sequence described herein can further comprise a self-cleaving 2A polypeptide. A self-cleaving peptide, or 2A peptide, is a polypeptide which can induce the cleaving of a polypeptide of which it is a part, e.g., a recombinant GATA-1 described herein. Thus, a 2A peptide can be used to cleave a longer peptide into two shorter peptides, so that two peptides can be generated from a single transcript. 2A peptides are derived from the 2A region in the genome of a virus. The 2A-peptide-mediated cleavage commences after translation. The cleavage is triggered by breaking of the peptide bond between the Proline (P) and Glycine (G) at the C-terminal of the 2A peptide. A 2A polypeptide can comprise at least 10, at least 15, at least 20, at least 25, at least 30, or at least 40 amino acids.
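The effect of a 2A peptide on a translated product can be sketched as a string operation. The following is an illustrative sketch only, not a model of the underlying biology: it takes the T2A sequence listed in Table 4 (SEQ ID NO: 57, without the optional GSG linker) and splits a polypeptide between the final Glycine and Proline of that motif. The function name and the flanking residues in the example are hypothetical.

```python
T2A = "EGRGSLLTCGDVEENPGP"  # SEQ ID NO: 57 as listed in Table 4, without the GSG linker


def split_at_2a(polyprotein, motif=T2A):
    """Split a polypeptide at each occurrence of a 2A motif.

    The break is placed between the last two residues of the motif
    (Glycine and Proline), so the upstream product keeps most of the 2A
    sequence and the downstream product starts with Proline.
    """
    idx = polyprotein.find(motif)
    if idx == -1:
        return [polyprotein]
    cut = idx + len(motif) - 1  # index of the motif's final Proline
    upstream = polyprotein[:cut]
    downstream = polyprotein[cut:]
    return [upstream] + split_at_2a(downstream, motif)


# Hypothetical flanking residues on either side of the T2A motif.
products = split_at_2a("MSTART" + T2A + "MVSKGE")
print(products)
```

With one 2A motif the sketch returns two products; a construct with two motifs would return three, which mirrors the way additional elements allow additional polypeptides to be produced from one transcript.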


In some embodiments of any of the aspects, 2A peptides can be combined with the IRES elements in a single nucleic acid sequence, thereby generating three separate polypeptides encoded within a single transcript.


Exemplary 2A peptides that can be used with the methods described herein include, but are not limited to, P2A, E2A, F2A and T2A (see also Table 4, SEQ ID NOs: 57-60). F2A is derived from foot-and-mouth disease virus 18; E2A is derived from equine rhinitis A virus; P2A is derived from porcine teschovirus-1 2A; T2A is derived from Thosea asigna virus 2A.









TABLE 4

Names and sequences of 2A peptides that can be used in various embodiments described herein. An optional linker “GSG” (Gly-Ser-Gly) (bolded) can be added on the N-terminal of the 2A peptides listed.

Name    Sequence
T2A     GSG EGRGSLLTCGDVEENPGP (SEQ ID NO: 57)
P2A     GSG ATNFSLLKQAGDVEENPGP (SEQ ID NO: 58)
E2A     GSG QCTNYALLKLAGDVESNPGP (SEQ ID NO: 59)
F2A     GSG VKQTLNFDLLKLAGDVESNPGP (SEQ ID NO: 60)










In some embodiments of any of the aspects, the IRES and/or self-cleaving 2A polypeptide can be operably linked to a marker gene, e.g., a marker gene encoding an optically detectable protein or an enzyme. Optically detectable proteins/enzymes can comprise an optically detectable label and/or comprise the ability to generate a detectable signal (e.g. by catalyzing reaction converting a compound to a detectable product). Detectable labels can comprise, for example, a light-absorbing moiety or a fluorescent moiety. Detectable labels, marker genes, methods of detecting them, and methods of incorporating them into reagents (e.g. antibodies and nucleic acid probes) are well known in the art.


Optically detectable labels/signals can comprise those visible to the human eye or those detectable with optical equipment, e.g., by spectroscopic, photochemical, biochemical, immunochemical, electromagnetic, radiochemical, or chemical means, such as fluorescence, chemifluorescence, or chemiluminescence, or any other appropriate means. Detectable labels can include, but are not limited to, radioisotopes, bioluminescent compounds, chromophores, antibodies, chemiluminescent compounds, fluorescent compounds, metal chelates, and enzymes.


Marker genes are well known in the art and can include, but are not limited to, naturally fluorescent proteins such as the Green Fluorescent Protein (GFP) of Aequorea victoria (Cubitt, A. B. et al. 1995. Understanding, improving, and using green fluorescent proteins. Trends Biochem. Sci. 20: 448-455; Chalfie, M., and Prasher, D. C. U.S. Pat. No. 5,491,084), a lacZ gene encoding a beta-galactosidase enzyme, horseradish peroxidase, alkaline phosphatase, malate dehydrogenase, staphylococcal nuclease, delta-V-steroid isomerase, yeast alcohol dehydrogenase, alpha-glycerophosphate dehydrogenase, triose phosphate isomerase, asparaginase, glucose oxidase, ribonuclease, urease, catalase, glucose-VI-phosphate dehydrogenase, glucoamylase and acetylcholinesterase.


In some embodiments of any of the aspects, the nucleic acid sequence described herein can comprise, consist of, or consist essentially of a sequence selected from SEQ ID NOs: 8, 9, 61, and 62.


SEQ ID NO: 61 (also designated as R18 EF1a IRES GFP) comprises an EF1A promoter and an IRES sequence operably linked to a nucleotide sequence encoding GFP:










GTCGACGGATCGGGAGATCTCCCGATCCCCTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCA






GTATCTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCG





ACAATTGCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGA





TTATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTAC





GGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAA





TAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCA





AGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTAC





TTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGT





TTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCA





AAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGCGCGTTTTG





CCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAA





TAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTA





GTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGGGAAACCAGAGGAGCTCTCTCGACGCAG





GACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCT





AGAAGGAGAGAGATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGCGATGGGAAAAAATTCGGTTAAGGCCA





GGGGGAAAGAAAAAATATAAATTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAATCCTGGCCTGTT





AGAAACATCAGAAGGCTGTAGACAAATACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAAGAACTTAGATCATTAT





ATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGGATAGAGATAAAAGACACCAAGGAAGCTTTAGACAAGATAGAGGAA





GAGCAAAACAAAAGTAAGACCACCGCACAGCAAGCGGCCGGCCGCTGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAAT





TGGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGT





GCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGT





CAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCG





CAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGA





TCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATA





AATCTCTGGAACAGATTTGGAATCACACGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCC





TTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTG





GTTTAACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTG





CTGTACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCC





GACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCGGCACTGCG





TGCGCCAATTCTGCAGACAAATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAA





AGAATAGTAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTA





TTACAGGGACAGCAGAGATCCAGTTTGGTTAGTACCGGGCCCGCTCTAGCGTGAGGCTCCGGTGCCCGTCAGTGGGCAGAGCG





CACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACCGGTGCCTAGAGAAGGTGGCGCGGGGTAAACT





GGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGAGGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAAC





GTTCTTTTTCGCAACGGGTTTGCCGCCAGAACACAGGTAAGTGCCGTGTGTGGTTCCCGCGGGCCTGGCCTCTTTACGGGTTA





TGGCCCTTGCGTGCCTTGAATTACTTCCACCTGGCTGCAGTACGTGATTCTTGATCCCGAGCTTCGGGTTGGAAGTGGGTGGG





AGAGTTCGAGGCCTTGCGCTTAAGGAGCCCCTTCGCCTCGTGCTTGAGTTGAGGCCTGGCCTGGGCGCTGGGGCCGCCGCGTG





CGAATCTGGTGGCACCTTCGCGCCTGTCTCGCTGCTTTCGATAAGTCTCTAGCCATTTAAAATTTTTGATGACCTGCTGCGAC





GCTTTTTTTCTGGCAAGATAGTCTTGTAAATGCGGGCCAAGATCTGCACACTGGTATTTCGGTTTTTGGGGCCGCGGGCGGCG





ACGGGGCCCGTGCGTCCCAGCGCACATGTTCGGCGAGGCGGGGCCTGCGAGCGCGGCCACCGAGAATCGGACGGGGGTAGTCT





CAAGCTGGCCGGCCTGCTCTGGTGCCTGGCCTCGCGCCGCCGTGTATCGCCCCGCCCTGGGCGGCAAGGCTGGCCCGGTCGGC





ACCAGTTGCGTGAGCGGAAAGATGGCCGCTTCCCGGCCCTGCTGCAGGGAGCTCAAAATGGAGGACGCGGCGCTCGGGAGAGC





GGGCGGGTGAGTCACCCACACAAAGGAAAAGGGCCTTTCCGTCCTCAGCCGTCGCTTCATGTGACTCCACGGAGTACCGGGCG





CCGTCCAGGCACCTCGATTAGTTCTCGAGCTTTTGGAGTACGTCGTCTTTAGGTTGGGGGGAGGGGTTTTATGCGATGGAGTT





TCCCCACACTGAGTGGGTGGAGACTGAAGTTAGGCCAGCTTGGCACTTGATGTAATTCTCCTTGGAATTTGCCCTTTTTGAGT





TTGGATCTTGGTTCATTCTCAAGCCTCAGACAGTGGTTCAAAGTTTTTTTCTTCCATTTCAGGTGTCGTGAGCGGCCGCTGAG





TTAACTATTCTAGACCCGGGCTAGGATCCGCCCCTCTCCCTCCCCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAATAAG





GCCGGTGTGCGTTTGTCTATATGTTATTTTCCACCATATTGCCGTCTTTTGGCAATGTGAGGGCCCGGAAACCTGGCCCTGTC





TTCTTGACGAGCATTCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATGTCGTGAAGGAAGCAGTTCC





TCTGGAAGCTTCTTGAAGACAAACAACGTCTGTAGCGACCCTTTGCAGGCAGCGGAACCCCCCACCTGGCGACAGGTGCCTCT





GCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAACCCCAGTGCCACGTTGTGAGTTGGATAGTTGTGGAA





AGAGTCAAATGGCTCTCCTCAAGCGTATTCAACAAGGGGCTGAAGGATGCCCAGAAGGTACCCCATTGTATGGGATCTGATCT





GGGGCCTCGGTACACATGCTTTACATGTGTTTAGTCGAGGTTAAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTTT





TCCTTTGAAAAACACGATGATAATATGGCCACAACCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCT





GGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGA





CCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGC





TTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCAT





CTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGA





AGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATG





GCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCA





CTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCA





AAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTG





TACAAGTAAAGCGGCCGCATCGATACCGTCGACCTCGATCGAGACCTAGAAAAACATGGAGCAATCACAAGTAGCAATACAGC





AGCTACCAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGGGTTTTCCAGTCACACCTCAGGTACCTTTAA





GACCAATGACTTACAAGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCACTCCCAA





CGAAGACAAGATATCCTTGATCTGTGGATCTACCACACACAAGGCTACTTCCCTGATTGGCAGAACTACACACCAGGGCCAGG





GATCAGATATCCACTGACCTTTGGATGGTGCTACAAGCTAGTACCAGTTGAGCAAGAGAAGGTAGAAGAAGCCAATGAAGGAG





AGAACACCCGCTTGTTACACCCTGTGAGCCTGCATGGGATGGATGACCCGGAGAGAGAAGTATTAGAGTGGAGGTTTGACAGC





CGCCTAGCATTTCATCACATGGCCCGAGAGCTGCATCCGGACTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGA





GCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGT





TGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGCATGTGAGCAAAAGGCCAG





CAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCG





ACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTC





CTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGT





AGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTT





ATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCA





GAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATC





TGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGG





TTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACG





CTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAA





AAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTAT





CTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCAT





CTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGG





GCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTC





GCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCA





GCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATC





GTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGT





AAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGG





CGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTC





TCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCAC





CAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCA





TACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAA





AATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGAC







SEQ ID NO: 8 (also designated as R21 miR126) comprises an EF1A promoter, and an IRES sequence operably linked to a nucleotide sequence encoding GFP and four miRNA binding sites for the HSC restricted miRNA miR126:










GTCGACGGATCGGGAGATCTCCCGATCCCCTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAA






GCCAGTATCTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGC





AAGGCTTGACCGACAATTGCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCCA





GATATACGCGTTGACATTGATTATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCAT





ATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGA





CGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTAC





GGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTA





AATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAG





TCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGAT





TTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTC





GTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGCGCGTTTT





GCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTT





AAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA





TCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGGGA





AACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACTGG





TGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCGTCAGTATTAAGCGGG





GGAGAATTAGATCGCGATGGGAAAAAATTCGGTTAAGGCCAGGGGGAAAGAAAAAATATAAATTAAAACATATAG





TATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAATCCTGGCCTGTTAGAAACATCAGAAGGCTGTAGACAAA





TACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAAGAACTTAGATCATTATATAATACAGTAGCAACCC





TCTATTGTGTGCATCAAAGGATAGAGATAAAAGACACCAAGGAAGCTTTAGACAAGATAGAGGAAGAGCAAAACA





AAAGTAAGACCACCGCACAGCAAGCGGCCGGCCGCTGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTG





GAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAG





AGTGGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCAC





TATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAA





TTTGCTGAGGGCTATTGAGGCGCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAG





AATCCTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTG





CACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGAATCACACGACCTGGAT





GGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCCTTAATTGAAGAATCGCAAAACCAGCAAGA





AAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAATTGGCT





GTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTAT





AGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACAG





GCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCGGCACT





GCGTGCGCCAATTCTGCAGACAAATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACA





GTGCAGGGGAAAGAATAGTAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAA





TTCAAAATTTTCGGGTTTATTACAGGGACAGCAGAGATCCAGTTTGGTTAGTACCGGGCCCGCTCTAGCGTGAGG





CTCCGGTGCCCGTCAGTGGGCAGAGCGCACATCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTG





AACCGGTGCCTAGAGAAGGTGGCGCGGGGTAAACTGGGAAAGTGATGTCGTGTACTGGCTCCGCCTTTTTCCCGA





GGGTGGGGGAGAACCGTATATAAGTGCAGTAGTCGCCGTGAACGTTCTTTTTCGCAACGGGTTTGCCGCCAGAAC





ACAGGTAAGTGCCGTGTGTGGTTCCCGCGGGCCTGGCCTCTTTACGGGTTATGGCCCTTGCGTGCCTTGAATTAC





TTCCACCTGGCTGCAGTACGTGATTCTTGATCCCGAGCTTCGGGTTGGAAGTGGGTGGGAGAGTTCGAGGCCTTG





CGCTTAAGGAGCCCCTTCGCCTCGTGCTTGAGTTGAGGCCTGGCCTGGGCGCTGGGGCCGCCGCGTGCGAATCTG





GTGGCACCTTCGCGCCTGTCTCGCTGCTTTCGATAAGTCTCTAGCCATTTAAAATTTTTGATGACCTGCTGCGAC





GCTTTTTTTCTGGCAAGATAGTCTTGTAAATGCGGGCCAAGATCTGCACACTGGTATTTCGGTTTTTGGGGCCGC





GGGCGGCGACGGGGCCCGTGCGTCCCAGCGCACATGTTCGGCGAGGCGGGGCCTGCGAGCGCGGCCACCGAGAAT





CGGACGGGGGTAGTCTCAAGCTGGCCGGCCTGCTCTGGTGCCTGGCCTCGCGCCGCCGTGTATCGCCCCGCCCTG





GGCGGCAAGGCTGGCCCGGTCGGCACCAGTTGCGTGAGCGGAAAGATGGCCGCTTCCCGGCCCTGCTGCAGGGAG





CTCAAAATGGAGGACGCGGCGCTCGGGAGAGCGGGCGGGTGAGTCACCCACACAAAGGAAAAGGGCCTTTCCGTC





CTCAGCCGTCGCTTCATGTGACTCCACGGAGTACCGGGCGCCGTCCAGGCACCTCGATTAGTTCTCGAGCTTTTG





GAGTACGTCGTCTTTAGGTTGGGGGGAGGGGTTTTATGCGATGGAGTTTCCCCACACTGAGTGGGTGGAGACTGA





AGTTAGGCCAGCTTGGCACTTGATGTAATTCTCCTTGGAATTTGCCCTTTTTGAGTTTGGATCTTGGTTCATTCT





CAAGCCTCAGACAGTGGTTCAAAGTTTTTTTCTTCCATTTCAGGTGTCGTGAGCGGCCGCTGAGTTAACTATTCT





AGACCCGGGCTAGGATCCGCCCCTCTCCCTCCCCCCCCCCTAACGTTACTGGCCGAAGCCGCTTGGAATAAGGCC





GGTGTGCGTTTGTCTATATGTTATTTTCCACCATATTGCCGTCTTTTGGCAATGTGAGGGCCCGGAAACCTGGCC





CTGTCTTCTTGACGAGCATTCCTAGGGGTCTTTCCCCTCTCGCCAAAGGAATGCAAGGTCTGTTGAATGTCGTGA





AGGAAGCAGTTCCTCTGGAAGCTTCTTGAAGACAAACAACGTCTGTAGCGACCCTTTGCAGGCAGCGGAACCCCC





CACCTGGCGACAGGTGCCTCTGCGGCCAAAAGCCACGTGTATAAGATACACCTGCAAAGGCGGCACAACCCCAGT





GCCACGTTGTGAGTTGGATAGTTGTGGAAAGAGTCAAATGGCTCTCCTCAAGCGTATTCAACAAGGGGCTGAAGG





ATGCCCAGAAGGTACCCCATTGTATGGGATCTGATCTGGGGCCTCGGTACACATGCTTTACATGTGTTTAGTCGA





GGTTAAAAAAACGTCTAGGCCCCCCGAACCACGGGGACGTGGTTTTCCTTTGAAAAACACGATGATAATATGGCC





ACAACCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA





AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATC





TGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGC





CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACC





ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC





ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGC





CACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAG





GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC





AACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAG





TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGCATCGATAATCAACCT





CTGGATTACAAAATTTGTGAAAGATTGACTGGTATTCTTAACTATGTTGCTCCTTTTACGCTATGTGGATACGCT





GCTTTAATGCCTTTGTATCATGCTATTGCTTCCCGTATGGCTTTCATTTTCTCCTCCTTGTATAAATCCTGGTTG





CTGTCTCTTTATGAGGAGTTGTGGCCCGTTGTCAGGCAACGTGGCGTGGTGTGCACTGTGTTTGCTGACGCAACC





CCCACTGGTTGGGGCATTGCCACCACCTGTCAGCTCCTTTCCGGGACTTTCGCTTTCCCCCTCCCTATTGCCACG





GCGGAACTCATCGCCGCCTGCCTTGCCCGCTGCTGGACAGGGGCTCGGCTGTTGGGCACTGACAATTCCGTGGTG





TTGTCGGGGAAGCTGACGTCCTTTCCATGGCTGCTCGCCTGTGTTGCCACCTGGATTCTGCGCGGGACGTCCTTC





TGCTACGTCCCTTCGGCCCTCAATCCAGCGGACCTTCCTTCCCGCGGCCTGCTGCCGGCTCTGCGGCCTCTTCCG





CGTCTTCGCCTTCGCCCTCAGACGAGTCGGATCTCCCTTTGGGCCGCCTCCCCGCGAATTCGCATTATTACTCAC





GGTACGAGCATTATTACTCACGGTACGAGCATTATTACTCACGGTACGAGCATTATTACTCACGGTACGAGCGAT





CGCCCTCAGGTACCTTTAAGACCAATGACTTACAAGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGG





GGACTGGAAGGGCTAATTCACTCCCAACGAAGACAAGATATCCTTGATCTGTGGATCTACCACACACAAGGCTAC





TTCCCTGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGACCTTTGGATGGTGCTACAAGCTA





GTACCAGTTGAGCAAGAGAAGGTAGAAGAAGCCAATGAAGGAGAGAACACCCGCTTGTTACACCCTGTGAGCCTG





CATGGGATGGATGACCCGGAGAGAGAAGTATTAGAGTGGAGGTTTGACAGCCGCCTAGCATTTCATCACATGGCC





CGAGAGCTGCATCCGGACTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTA





GGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGAC





TCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGCATGTGAGCAAAAGGCCAG





CAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCAC





AAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGC





TCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTG





GCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCAC





GAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGAC





TTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTG





AAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTC





GGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAG





CAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAAC





GAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAA





TGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCA





CCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGG





GAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCA





ATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAAT





TGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATC





GTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCC





CCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTA





TCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGT





GAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGAT





AATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGG





ATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTC





ACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGT





TGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATA





TTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGAC







SEQ ID NO: 9 (also designated as R49 1 peak enhancer) comprises an IRES sequence operably linked to a nucleotide sequence encoding GFP and one hematopoietic enhancer element:










GTCGACGGATCGGGAGATCTCCCGATCCCCTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTAT






CTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAA





TTGCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGATTAT





TGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTA





AATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGG





GACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTA





CGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGG





CAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGA





CTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAAT





GTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGCGCGTTTTGCCTG





TACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAA





GCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCA





GTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGGGAAACCAGAGGAGCTCTCTCGACGCAGGACT





CGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAA





GGAGAGAGATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGCGATGGGAAAAAATTCGGTTAAGGCCAGGGG





GAAAGAAAAAATATAAATTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAATCCTGGCCTGTTAGAA





ACATCAGAAGGCTGTAGACAAATACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAAGAACTTAGATCATTATATAA





TACAGTAGCAACCCTCTATTGTGTGCATCAAAGGATAGAGATAAAAGACACCAAGGAAGCTTTAGACAAGATAGAGGAAGAGC





AAAACAAAAGTAAGACCACCGCACAGCAAGCGGCCGGCCGCTGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA





GAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAG





AGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAAT





GACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAAC





AGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAA





CAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATC





TCTGGAACAGATTTGGAATCACACGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCCTTAA





TTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTT





AACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGT





ACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACA





GGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCGGCACTGCGTGCG





CCAATTCTGCAGACAAATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAA





TAGTAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTAC





AGGGACAGCAGAGATCCAGTTTGGTTAGTACCGGGCCCGCTCTAGCGTGAGGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACA





TCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACCGGTGCTAGCATGGCGGGCAAGAAGTTGAGGCCACT





GTCCCTGGGTGTTCCTACCCCCACACCCTCACCCCAAGACAGCCTGTTACTGCGGCGCCAACAGCCACGGTCGCCTACATCTG





ATAAGACTTATCTGCTGCCCCAGGGCAGGCCGGAGCTGGCGTAAGCCCCAGTGGGGCGCTAAGTGAGTGTGCCCCTGCCTCCC





GCCAGCACTGGCCTGGCCTGCAGGCTTAGCCTGGGTCATCAAGGTATCCCACAGGCTCTAGTTCAAATCCAGCAGAACCTCTC





TGAGCCTCACTCTTCTCACCTGCAAAATGGGTACAGCCACATCCCTTCTCTCCCTGCAGCCAGGAAGACGCACATACACAGGA





GTCTAGCCCACACCGGCCCCGCACAAATTAAGGGCTTTACTCTCTGAAAAGCCCAGTGAAGTCATGAAACCATATCTGCTATT





TTCATTTATCTTGGTTTCAGCCTATTTTGCTTGTCTGGACACTACAGTCCACGGGAGCCTAGGTCGAGCGAGGTCCAAGAATC





CCCAGGGTGGGCAGGGAGGGTGGAAGAGGGCCTCCAGTGCCCAAGAGGTGCCCCACAAGCATGGGACCCGCCCCCTCCCCTGG





ACTGCCCCACCCACTGGGGCACCAGCCACTCCCTGGGGAGGAGGGAGGAGGGAGAAGGGAGGGAGGGAGGGAGGGAGGAAGGG





AGCCTCAAAGGCCAAGGCCAGCCAGGACACCCCCTGGGATCACACTGAGCTTGCCACATCCCCAAGGCGGCCGAACCCTCCGC





AACCACCAGCCCAGAGATCTAGAGTTAATCCCCAGAGGCTCCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCC





CATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCA





AGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTG





CAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCG





CACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCG





AGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTAT





ATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGC





CGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCC





TGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGAC





GAGCTGTACAAGTAAAGCGGCCGCATCGATACCGTCGACCTCGATCGAGACCTAGAAAAACATGGAGCAATCACAAGTAGCAA





TACAGCAGCTACCAATGCTGATTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGGGTTTTCCAGTCACACCTCAGGTAC





CTTTAAGACCAATGACTTACAAGGCAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCAC





TCCCAACGAAGACAAGATATCCTTGATCTGTGGATCTACCACACACAAGGCTACTTCCCTGATTGGCAGAACTACACACCAGG





GCCAGGGATCAGATATCCACTGACCTTTGGATGGTGCTACAAGCTAGTACCAGTTGAGCAAGAGAAGGTAGAAGAAGCCAATG





AAGGAGAGAACACCCGCTTGTTACACCCTGTGAGCCTGCATGGGATGGATGACCCGGAGAGAGAAGTATTAGAGTGGAGGTTT





GACAGCCGCCTAGCATTTCATCACATGGCCCGAGAGCTGCATCCGGACTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGC





CTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCC





GTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGCATGTGAGCAAAA





GGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAA





AAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGC





GCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCA





CGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTG





CGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGA





TTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTT





GGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAG





CGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGT





CTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTA





AATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGC





ACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCT





TACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCC





GGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAG





TAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTT





CATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCT





CCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCC





ATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTT





GCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGA





AAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTAC





TTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAA





TACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATT





TAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGAC







SEQ ID NO: 62 (also designated as R50 3 peak enhancer) comprises an IRES sequence operably linked to a nucleotide sequence encoding GFP and three hematopoietic enhancer elements:










GTCGACGGATCGGGAGATCTCCCGATCCCCTATGGTGCACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTAT






CTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAA





TTGCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCGCTGCTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGATTAT





TGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTA





AATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGG





GACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTA





CGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGG





CAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGTTTGA





CTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAAT





GTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGCGCGTTTTGCCTG





TACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAA





GCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCA





GTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACTTGAAAGCGAAAGGGAAACCAGAGGAGCTCTCTCGACGCAGGACT





CGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTAGAA





GGAGAGAGATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGCGATGGGAAAAAATTCGGTTAAGGCCAGGGG





GAAAGAAAAAATATAAATTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAATCCTGGCCTGTTAGAA





ACATCAGAAGGCTGTAGACAAATACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAAGAACTTAGATCATTATATAA





TACAGTAGCAACCCTCTATTGTGTGCATCAAAGGATAGAGATAAAAGACACCAAGGAAGCTTTAGACAAGATAGAGGAAGAGC





AAAACAAAAGTAAGACCACCGCACAGCAAGCGGCCGGCCGCTGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGA





GAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAG





AGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCAAT





GACGCTGACGGTACAGGCCAGACAATTATTGTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAAC





AGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAGATACCTAAAGGATCAA





CAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATC





TCTGGAACAGATTTGGAATCACACGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCCTTAA





TTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAAATGGGCAAGTTTGTGGAATTGGTTT





AACATAACAAATTGGCTGTGGTATATAAAATTATTCATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGT





ACTTTCTATAGTGAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAGGGGACCCGACA





GGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATCCATTCGATTAGTGAACGGATCGGCACTGCGTGCG





CCAATTCTGCAGACAAATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGGGATTGGGGGGTACAGTGCAGGGGAAAGAA





TAGTAGACATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTAC





AGGGACAGCAGAGATCCAGTTTGGTTAGTACCGGGCCCGCTCTAGCGTGAGGCTCCGGTGCCCGTCAGTGGGCAGAGCGCACA





TCGCCCACAGTCCCCGAGAAGTTGGGGGGAGGGGTCGGCAATTGAACCGGTACTGGCCTGGCCAACATAGTGAAACCCCATCT





CTCCTAATAATACAAAAATTAGCCAGGCATGGTGGCGGGTGCCTGTAATCCCAGCTACTCAGGAGACTGAGGCAGGATAATCA





CTTGAACCCAGCAGGTGGAGGCTGCAGTGAGCCAAGATCGTGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACTACATCT





CAAAAAAAAAAAAAAAAAAAAAAAGAAGATAGATGACCAACAAGTTTATGAAAATATGCTCAACATCAGTGGTCACAGGGAAA





TGCAAATCAAAACCATAACAAGATACCACTTCACACCCACACCCAGTAGGATGGCGCGATCGCAGAACCCCAGAAGATGCCAG





GAGGGAGTGAGCCAGTCAGGGAAGGCTTCCGAGAAGAGAGGACATTGAAGAAGAGTCTCAAACTTAGGCCTGACGGAGAAGAC





GCGCGGCCAGGACACCCCACCCCCGCCCTCGTCTCCCCCAAAGCCTGATCTGGCCCCACTGATTCCCTTATCTGCCCACTCCC





AGCTGCCTCCTTGCTGGCTGAACTGTCGCCGCAGACTTCTGAGCCTGCGCCCCCTCCACGGGGATGGGGGAGGGAATGGGGTG





AGGCCTGGCCTCACAGCCTCGGGGTTTCCAGCTCTTGCTGGAGGCAGGGCTCTGGGGCGCCCTACTCCTCACCCTTGGCTTCT





CTTCCTGAGCGCTCTGTGCTCTCCAGAGCTAGCATGGCGGGCAAGAAGTTGAGGCCACTGTCCCTGGGTGTTCCTACCCCCAC





ACCCTCACCCCAAGACAGCCTGTTACTGCGGCGCCAACAGCCACGGTCGCCTACATCTGATAAGACTTATCTGCTGCCCCAGG





GCAGGCCGGAGCTGGCGTAAGCCCCAGTGGGGCGCTAAGTGAGTGTGCCCCTGCCTCCCGCCAGCACTGGCCTGGCCTGCAGG





CTTAGCCTGGGTCATCAAGGTATCCCACAGGCTCTAGTTCAAATCCAGCAGAACCTCTCTGAGCCTCACTCTTCTCACCTGCA





AAATGGGTACAGCCACATCCCTTCTCTCCCTGCAGCCAGGAAGACGCACATACACAGGAGTCTAGCCCACACCGGCCCCGCAC





AAATTAAGGGCTTTACTCTCTGAAAAGCCCAGTGAAGTCATGAAACCATATCTGCTATTTTCATTTATCTTGGTTTCAGCCTA





TTTTGCTTGTCTGGACACTACAGTCCACGGGAGCCTAGGTCGAGCGAGGTCCAAGAATCCCCAGGGTGGGCAGGGAGGGTGGA





AGAGGGCCTCCAGTGCCCAAGAGGTGCCCCACAAGCATGGGACCCGCCCCCTCCCCTGGACTGCCCCACCCACTGGGGCACCA





GCCACTCCCTGGGGAGGAGGGAGGAGGGAGAAGGGAGGGAGGGAGGGAGGGAGGAAGGGAGCCTCAAAGGCCAAGGCCAGCCA





GGACACCCCCTGGGATCACACTGAGCTTGCCACATCCCCAAGGCGGCCGAACCCTCCGCAACCACCAGCCCAGAGATCTAGAG





TTAATCCCCAGAGGCTCCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGA





CGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCA





CCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGAC





CACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGG





CAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGG





AGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAAC





GGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCC





CATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGC





GCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGC





ATCGATACCGTCGACCTCGATCGAGACCTAGAAAAACATGGAGCAATCACAAGTAGCAATACAGCAGCTACCAATGCTGATTG





TGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGGGTTTTCCAGTCACACCTCAGGTACCTTTAAGACCAATGACTTACAAGG





CAGCTGTAGATCTTAGCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCACTCCCAACGAAGACAAGATATCCTT





GATCTGTGGATCTACCACACACAAGGCTACTTCCCTGATTGGCAGAACTACACACCAGGGCCAGGGATCAGATATCCACTGAC





CTTTGGATGGTGCTACAAGCTAGTACCAGTTGAGCAAGAGAAGGTAGAAGAAGCCAATGAAGGAGAGAACACCCGCTTGTTAC





ACCCTGTGAGCCTGCATGGGATGGATGACCCGGAGAGAGAAGTATTAGAGTGGAGGTTTGACAGCCGCCTAGCATTTCATCAC





ATGGCCCGAGAGCTGCATCCGGACTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGG





GAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACT





AGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGCATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGT





AAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTG





GCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGC





TTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTTTCTCATAGCTCACGCTGTAGGTATCTCAGTTCGGTG





TAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCT





TGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCG





GTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGAACAGTATTTGGTATCTGCGCTCTGCTGAAGCCA





GTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCA





GCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACT





CACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCA





ATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATT





TCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAA





TGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGT





CCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCG





CAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGAT





CAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTG





GCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGAC





TGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATA





CCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTG





TTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGC





AAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAAT





ATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTT





CCGCGCACATTTCCCCGAAAAGTGCCACCTGAC






In some embodiments of any of the aspects, the nucleic acid sequence described herein is a vector or is comprised by or provided in a vector. The vector can be, e.g., a plasmid, viral vector, or an adenoviral, lentiviral or retroviral vector. As used herein, the term “retrovirus” refers to a type of RNA virus that inserts a copy of its genome into the DNA of a host cell that it invades, thus changing the genome of that cell. Such viruses are either single stranded RNA or double stranded DNA viruses. In some embodiments of any of the aspects, the retrovirus is an alpha retrovirus. As used herein, the term “lentivirus” refers to a group (or genus) of complex retroviruses. Lentiviruses are capable of infecting non-dividing and actively dividing cell types, whereas standard retroviruses can only infect mitotically active cell types. Illustrative lentiviruses include, but are not limited to: HIV (human immunodeficiency virus; including HIV type 1, and HIV type 2); visna-maedi virus (VMV); the caprine arthritis-encephalitis virus (CAEV); equine infectious anemia virus (EIAV); feline immunodeficiency virus (FIV); bovine immunodeficiency virus (BIV); and simian immunodeficiency virus (SIV). As used herein, the term “adenoviruses” refers to nonenveloped viruses with an icosahedral nucleocapsid containing a double stranded DNA genome. As used herein, the term “viral vector” refers to a nucleic acid vector construct that includes at least one element of viral origin and has the capacity to be packaged into a viral vector particle. The viral vector can contain the nucleic acid described herein in place of non-essential viral genes. The vector and/or particle may be utilized for the purpose of transferring any nucleic acids into cells either in vitro or in vivo. Numerous forms of viral vectors are known in the art.


In some embodiments of any of the aspects, the nucleic acid sequence and/or vector described herein is comprised by, provided in, or located in, a viral particle (e.g., a lentiviral particle).


In one aspect of any of the embodiments, described herein is a composition comprising a nucleic acid sequence, vector, or particle as described herein and a pharmaceutically acceptable carrier.


In one aspect of any of the embodiments, described herein is a pharmaceutical composition comprising a nucleic acid sequence as described herein (and/or a vector or virus particle comprising such a nucleic acid sequence), and optionally a pharmaceutically acceptable carrier. In some embodiments of any of the aspects, the active ingredients of the pharmaceutical composition comprise a nucleic acid as described herein (and/or a vector or virus particle comprising such a nucleic acid sequence). In some embodiments of any of the aspects, the active ingredients of the pharmaceutical composition consist of a nucleic acid as described herein (and/or a vector or virus particle comprising such a nucleic acid sequence). Pharmaceutically acceptable carriers and diluents include saline, aqueous buffer solutions, solvents and/or dispersion media. The use of such carriers and diluents is well known in the art. Some non-limiting examples of materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants can also be present in the formulation. The terms such as “excipient”, “carrier”, “pharmaceutically acceptable carrier” or the like are used interchangeably herein. In some embodiments of any of the aspects, the carrier inhibits the degradation of the active agent, e.g. of a nucleic acid comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as described herein.


In some embodiments of any of the aspects, the pharmaceutical composition comprising a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as described herein (and/or a vector or virus particle comprising such a nucleic acid sequence) can be a parenteral dosage form. Since administration of parenteral dosage forms typically bypasses the patient's natural defenses against contaminants, parenteral dosage forms are preferably sterile or capable of being sterilized prior to administration to a patient. Examples of parenteral dosage forms include, but are not limited to, solutions ready for injection, dry products ready to be dissolved or suspended in a pharmaceutically acceptable vehicle for injection, suspensions ready for injection, and emulsions. In addition, controlled-release parenteral dosage forms can be prepared for administration to a patient, including, but not limited to, DUROS®-type dosage forms and dose-dumping.


Suitable vehicles that can be used to provide parenteral dosage forms of the pharmaceutical composition comprising a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as described herein (and/or a vector or virus particle comprising such a nucleic acid sequence) are well known to those skilled in the art. Examples include, without limitation: sterile water; water for injection USP; saline solution; glucose solution; aqueous vehicles such as, but not limited to, sodium chloride injection, Ringer's injection, dextrose injection, dextrose and sodium chloride injection, and lactated Ringer's injection; water-miscible vehicles such as, but not limited to, ethyl alcohol, polyethylene glycol, and propylene glycol; and non-aqueous vehicles such as, but not limited to, corn oil, cottonseed oil, peanut oil, sesame oil, ethyl oleate, isopropyl myristate, and benzyl benzoate. Compounds that alter or modify the solubility of a pharmaceutically acceptable salt of the pharmaceutical composition as disclosed herein can also be incorporated into the parenteral dosage forms of the disclosure, including conventional and controlled-release parenteral dosage forms.


Pharmaceutical compositions comprising a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as disclosed herein (and/or a vector or virus particle comprising such a nucleic acid sequence) can also be formulated to be suitable for oral administration, for example as discrete dosage forms, such as, but not limited to, tablets (including without limitation scored or coated tablets), pills, caplets, capsules, chewable tablets, powder packets, cachets, troches, wafers, aerosol sprays, or liquids, such as but not limited to, syrups, elixirs, solutions or suspensions in an aqueous liquid, a non-aqueous liquid, an oil-in-water emulsion, or a water-in-oil emulsion. Such compositions contain a predetermined amount of the pharmaceutically acceptable salt of the disclosed compounds, and may be prepared by methods of pharmacy well known to those skilled in the art. See generally, Remington: The Science and Practice of Pharmacy, 21st Ed., Lippincott, Williams, and Wilkins, Philadelphia Pa. (2005).


Conventional dosage forms generally provide rapid or immediate drug release from the formulation. Depending on the pharmacology and pharmacokinetics of the drug, use of conventional dosage forms can lead to wide fluctuations in the concentrations of the drug in a patient's blood and other tissues. These fluctuations can impact a number of parameters, such as dose frequency, onset of action, duration of efficacy, maintenance of therapeutic blood levels, toxicity, side effects, and the like. Advantageously, controlled-release formulations can be used to control a drug's onset of action, duration of action, plasma levels within the therapeutic window, and peak blood levels. In particular, controlled- or extended-release dosage forms or formulations can be used to ensure that the maximum effectiveness of a drug is achieved while minimizing potential adverse effects and safety concerns, which can occur both from under-dosing a drug (i.e., going below the minimum therapeutic levels) as well as exceeding the toxicity level for the drug. In some embodiments of any of the aspects, the composition comprising a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as disclosed herein (and/or a vector or virus particle comprising such a nucleic acid sequence) can be administered in a sustained release formulation.


Controlled-release pharmaceutical products have a common goal of improving drug therapy over that achieved by their non-controlled release counterparts. Ideally, the use of an optimally designed controlled-release preparation in medical treatment is characterized by a minimum of drug substance being employed to cure or control the condition in a minimum amount of time. Advantages of controlled-release formulations include: 1) extended activity of the drug; 2) reduced dosage frequency; 3) increased patient compliance; 4) usage of less total drug; 5) reduction in local or systemic side effects; 6) minimization of drug accumulation; 7) reduction in blood level fluctuations; 8) improvement in efficacy of treatment; 9) reduction of potentiation or loss of drug activity; and 10) improvement in speed of control of diseases or conditions. Kim, Cherng-ju, Controlled Release Dosage Form Design, 2 (Technomic Publishing, Lancaster, Pa.: 2000).


Most controlled-release formulations are designed to initially release an amount of drug (active ingredient) that promptly produces the desired therapeutic effect, and gradually and continually release other amounts of drug to maintain this level of therapeutic or prophylactic effect over an extended period of time. In order to maintain this constant level of drug in the body, the drug must be released from the dosage form at a rate that will replace the amount of drug being metabolized and excreted from the body. Controlled-release of an active ingredient can be stimulated by various conditions including, but not limited to, pH, ionic strength, osmotic pressure, temperature, enzymes, water, and other physiological conditions or compounds.
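The rate-matching requirement described above can be illustrated with a simple worked relation. This is only a sketch under an assumed one-compartment model with first-order elimination, which the present description does not specify; the symbols A (amount of drug in the body), R_0 (zero-order release rate from the dosage form), k_e (elimination rate constant), V_d (volume of distribution) and C_ss (target steady-state concentration) are introduced here purely for illustration.

```latex
% Assumed one-compartment model with first-order elimination (illustrative only)
\[
  \frac{dA}{dt} = R_{0} - k_{e} A
\]
% A constant drug level means dA/dt = 0, so release must balance elimination:
\[
  R_{0} = k_{e} A_{ss} = k_{e} V_{d} C_{ss}
\]
```

Under these assumptions, a dosage form that releases drug more slowly than k_e V_d C_ss allows the level to fall below the target, while faster release causes the level to rise above it.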


A variety of known controlled- or extended-release dosage forms, formulations, and devices can be adapted for use with the salts and compositions of the disclosure. Examples include, but are not limited to, those described in U.S. Pat. Nos. 3,845,770; 3,916,899; 3,536,809; 3,598,123; 4,008,719; 5,674,533; 5,059,595; 5,591,767; 5,120,548; 5,073,543; 5,639,476; 5,354,556; 5,733,566; and 6,365,185 B1; each of which is incorporated herein by reference. These dosage forms can be used to provide slow or controlled-release of one or more active ingredients using, for example, hydroxypropylmethyl cellulose, other polymer matrices, gels, permeable membranes, osmotic systems (such as OROS® (Alza Corporation, Mountain View, Calif. USA)), or a combination thereof to provide the desired release profile in varying proportions.


In some aspects of the embodiments, described herein is a method of treating Diamond-Blackfan Anemia in a subject in need thereof, the method comprising administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition as described herein to the patient.


The compositions described herein can be administered to a subject having or diagnosed as having DBA. In some embodiments of any of the aspects, the methods described herein comprise administering an effective amount of a composition described herein, e.g. of a nucleic acid comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as described herein, to a subject in order to alleviate a symptom of DBA. As used herein, “alleviating a symptom” is ameliorating any condition or symptom associated with DBA. As compared with an equivalent untreated control, such reduction is by at least 5%, 10%, 20%, 40%, 50%, 60%, 80%, 90%, 95%, 99% or more as measured by any standard technique. A variety of means for administering the compositions described herein to subjects are known to those of skill in the art. Such methods can include, but are not limited to, oral, parenteral, intravenous, intramuscular, subcutaneous, transdermal, airway (aerosol), pulmonary, cutaneous, topical, or injection administration. Administration can be local or systemic.


The term “effective amount” as used herein refers to the amount of the active agent needed to alleviate at least one or more symptoms of the disease or disorder, and relates to a sufficient amount of pharmacological composition to provide the desired effect. The term “therapeutically effective amount” therefore refers to an amount of the active agent that is sufficient to provide a particular effect when administered to a typical subject. An effective amount as used herein, in various contexts, would also include an amount sufficient to delay the development of a symptom of the disease, alter the course of a symptom of the disease (for example but not limited to, slowing the progression of a symptom of the disease), or reverse a symptom of the disease. Thus, it is not generally practicable to specify an exact “effective amount”. However, for any given case, an appropriate “effective amount” can be determined by one of ordinary skill in the art using only routine experimentation.


Effective amounts, toxicity, and therapeutic efficacy can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dosage can vary depending upon the dosage form employed and the route of administration utilized. The dose ratio between toxic and therapeutic effects is the therapeutic index and can be expressed as the ratio LD50/ED50. Compositions and methods that exhibit large therapeutic indices are preferred. A therapeutically effective dose can be estimated initially from cell culture assays. Also, a dose can be formulated in animal models to achieve a circulating plasma concentration range that includes the IC50 (i.e., the concentration of the active agent, which achieves a half-maximal inhibition of symptoms) as determined in cell culture, or in an appropriate animal model. Levels in plasma can be measured, for example, by high performance liquid chromatography. The effects of any particular dosage can be monitored by a suitable bioassay, e.g., assays for the levels of red blood cells and/or erythropoiesis, among others. The dosage can be determined by a physician and adjusted, as necessary, to suit observed effects of the treatment.
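
By way of non-limiting illustration only, the therapeutic index arithmetic described above can be sketched as follows. The function name `therapeutic_index` and the numeric doses are hypothetical and are not data from this disclosure; the sketch assumes only that both doses are positive and expressed in the same units.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index, i.e. the ratio LD50/ED50.

    Both doses must be positive and expressed in the same units.
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical doses in arbitrary, identical units (not measured values).
example_index = therapeutic_index(ld50=500.0, ed50=25.0)
print(f"therapeutic index = {example_index}")
```

A larger ratio corresponds to a wider margin between the dose that is therapeutically effective and the dose that is lethal in 50% of the population.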


The dosage of a composition as described herein can be determined by a physician and adjusted, as necessary, to suit observed effects of the treatment. With respect to duration and frequency of treatment, it is typical for skilled clinicians to monitor subjects in order to determine when the treatment is providing therapeutic benefit, and to determine whether to increase or decrease dosage, increase or decrease administration frequency, discontinue treatment, resume treatment, or make other alterations to the treatment regimen. The dosing schedule can vary from once a week to daily depending on a number of clinical factors, such as the subject's sensitivity to the active agent. The desired dose or amount of activation can be administered at one time or divided into subdoses, e.g., 2-4 subdoses and administered over a period of time, e.g., at appropriate intervals through the day or other appropriate schedule. In some embodiments of any of the aspects, administration can be chronic, e.g., one or more doses and/or treatments daily over a period of weeks or months. Examples of dosing and/or treatment schedules are administration daily, twice daily, three times daily or four or more times daily over a period of 1 week, 2 weeks, 3 weeks, 4 weeks, 1 month, 2 months, 3 months, 4 months, 5 months, or 6 months, or more. A composition comprising a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as disclosed herein (and/or a vector or virus particle comprising such a nucleic acid sequence) can be administered over a period of time, such as over a 5 minute, 10 minute, 15 minute, 20 minute, or 25 minute period.


In some embodiments of any of the aspects, after an initial treatment regimen, the treatments can be administered on a less frequent basis. For example, after treatment biweekly for three months, treatment can be repeated once per month, for six months or a year or longer. Treatment according to the methods described herein can reduce levels of a marker or symptom of a condition by at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80% or at least 90% or more.


The dosage ranges for the administration of a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as disclosed herein (and/or a vector or virus particle comprising such a nucleic acid sequence), according to the methods described herein depend upon, for example, the form of the inhibitor, its potency, and the extent to which symptoms, markers, or indicators of a condition described herein are desired to be reduced, for example the percentage reduction desired. Generally, the dosage will vary with the age, condition, and sex of the patient and can be determined by one of skill in the art. The dosage can also be adjusted by the individual physician in the event of any complication.


The efficacy of a nucleic acid sequence comprising a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide as disclosed herein (and/or a vector or virus particle comprising such a nucleic acid sequence) in, e.g. the treatment of DBA or any other condition described herein, or to induce a response as described herein can be determined by the skilled clinician. However, a treatment is considered “effective treatment,” as the term is used herein, if one or more of the signs or symptoms of a condition described herein are altered in a beneficial manner, other clinically accepted symptoms are improved, or even ameliorated, or a desired response is induced e.g., by at least 10% following treatment according to the methods described herein. Efficacy can be assessed, for example, by measuring a marker, indicator, symptom, and/or the incidence of a condition treated according to the methods described herein or any other measurable parameter appropriate. Efficacy can also be measured by a failure of an individual to worsen as assessed by hospitalization, or need for medical interventions (i.e., progression of the disease is halted). Methods of measuring these indicators are known to those of skill in the art and/or are described herein. Treatment includes any treatment of a disease in an individual or an animal (some non-limiting examples include a human or an animal) and includes: (1) inhibiting the disease, e.g., preventing a worsening of symptoms; or (2) relieving the severity of the disease, e.g., causing regression of symptoms. An effective amount for the treatment of a disease means that amount which, when administered to a subject in need thereof, is sufficient to result in effective treatment as that term is defined herein, for that disease. Efficacy of an agent can be determined by assessing physical indicators of a condition or desired response. 
It is well within the ability of one skilled in the art to monitor efficacy of administration and/or treatment by measuring any one of such parameters, or any combination of parameters. Efficacy can be assessed in animal models of a condition described herein, for example treatment of DBA.


In one aspect of any of the embodiments, described herein is a method of restoring early erythroid progenitor cell-specific GATA1 expression, the method comprising contacting a population of cells comprising early erythroid progenitor cells with a nucleic acid sequence, particle, or composition as described herein.


In some embodiments of any of the aspects, the early erythroid progenitor cells comprise a DBA-associated gene mutation including but not limited to the ones listed in Table 5. In some embodiments of any of the aspects, the erythroid progenitor cells comprise one or more DBA-associated gene mutations. DBA-associated gene mutations are well-known in the art and include but are not limited to mutations listed in Table 5 (e.g., see Int J Hematol. 2010 October; 92(3):413-8).









TABLE 5
Exemplary DBA-associated gene mutations

Gene Name    Exemplary DBA-associated cDNA mutations; predicted amino acid change
GATA1        220G>C; p.Leu74Val
RPL5         c.535C>T; p.Arg179X
RPL11        c.475_476ins11; p.Lys159ThrfsX39
RPS19        c.49G>C; p.Ala17Pro









In some embodiments of any of the aspects, the level of GATA-1 can be measured, by way of non-limiting example, by Western blot; immunoprecipitation; enzyme-linked immunosorbent assay (ELISA); radioimmunological assay (RIA); sandwich assay; fluorescence in situ hybridization (FISH); immunohistological staining; radioimmunometric assay; immunofluorescence assay; mass spectroscopy and/or immunoelectrophoresis assay.


RNA and/or DNA molecules can be isolated, derived, or amplified from a biological sample, such as a blood sample. Techniques for the detection of mRNA expression are known by persons skilled in the art, and can include, but are not limited to, PCR procedures, RT-PCR, quantitative RT-PCR, Northern blot analysis, differential gene expression, RNAse protection assay, microarray based analysis, next-generation sequencing, hybridization methods, etc.


In general, the PCR procedure describes a method of gene amplification which is comprised of (i) sequence-specific hybridization of primers to specific genes or sequences within a nucleic acid sample or library, (ii) subsequent amplification involving multiple rounds of annealing, elongation, and denaturation using a thermostable DNA polymerase, and (iii) screening the PCR products for a band of the correct size. The primers used are oligonucleotides of sufficient length and appropriate sequence to provide initiation of polymerization, i.e. each primer is specifically designed to be complementary to a strand of the genomic locus to be amplified. In an alternative embodiment, mRNA level of gene expression products described herein can be determined by reverse-transcription (RT) PCR and by quantitative RT-PCR (QRT-PCR) or real-time PCR methods. Methods of RT-PCR and QRT-PCR are well known in the art.


In some embodiments of any of the aspects, the level of an mRNA can be measured by a quantitative sequencing technology, e.g. a quantitative next-generation sequencing technology. Methods of sequencing a nucleic acid sequence are well known in the art. Briefly, a sample obtained from a subject can be contacted with one or more primers which specifically hybridize to a single-strand nucleic acid sequence flanking the target gene sequence and a complementary strand is synthesized. In some next-generation technologies, an adaptor (double or single-stranded) is ligated to nucleic acid molecules in the sample and synthesis proceeds from the adaptor or adaptor compatible primers. In some third-generation technologies, the sequence can be determined, e.g. by determining the location and pattern of the hybridization of probes, or measuring one or more characteristics of a single molecule as it passes through a sensor (e.g. the modulation of an electrical field as a nucleic acid molecule passes through a nanopore). Exemplary methods of sequencing include, but are not limited to, Sanger sequencing, dideoxy chain termination, high-throughput sequencing, next generation sequencing, 454 sequencing, SOLiD sequencing, polony sequencing, Illumina sequencing, Ion Torrent sequencing, sequencing by hybridization, nanopore sequencing, HeliScope sequencing, single molecule real time sequencing, RNAP sequencing, and the like. Methods and protocols for performing these sequencing methods are known in the art, see, e.g. “Next Generation Genome Sequencing” Ed. Michal Janitz, Wiley-VCH; “High-Throughput Next Generation Sequencing” Eds. Kwon and Ricke, Humana Press, 2011; and Sambrook et al., Molecular Cloning: A Laboratory Manual (4th ed.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., USA (2012); which are incorporated by reference herein in their entireties.


Nucleic acid and ribonucleic acid (RNA) molecules can be isolated from a particular biological sample using any of a number of procedures, which are well-known in the art, the particular isolation procedure chosen being appropriate for the particular biological sample. For example, freeze-thaw and alkaline lysis procedures can be useful for obtaining nucleic acid molecules from solid materials; heat and alkaline lysis procedures can be useful for obtaining nucleic acid molecules from urine; and proteinase K extraction can be used to obtain nucleic acid from blood (Rolfs, A et al. PCR: Clinical Diagnostics and Research, Springer (1994)).


In some embodiments of any of the aspects, one or more of the reagents (e.g. an antibody reagent and/or nucleic acid probe) described herein can comprise a detectable label and/or comprise the ability to generate a detectable signal (e.g. by catalyzing a reaction converting a compound to a detectable product). Detectable labels can comprise, for example, a light-absorbing dye, a fluorescent dye, or a radioactive label. Detectable labels, methods of detecting them, and methods of incorporating them into reagents (e.g. antibodies and nucleic acid probes) are well known in the art.


In some embodiments of any of the aspects, detectable labels can include labels that can be detected by spectroscopic, photochemical, biochemical, immunochemical, electromagnetic, radiochemical, or chemical means, such as fluorescence, chemifluorescence, or chemiluminescence, or any other appropriate means. The detectable labels used in the methods described herein can be primary labels (where the label comprises a moiety that is directly detectable or that produces a directly detectable moiety) or secondary labels (where the detectable label binds to another moiety to produce a detectable signal, e.g., as is common in immunological labeling using secondary and tertiary antibodies). The detectable label can be linked by covalent or non-covalent means to the reagent. Alternatively, a detectable label can be linked such as by directly labeling a molecule that achieves binding to the reagent via a ligand-receptor binding pair arrangement or other such specific recognition molecules. Detectable labels can include, but are not limited to, radioisotopes, bioluminescent compounds, chromophores, antibodies, chemiluminescent compounds, fluorescent compounds, metal chelates, and enzymes.


In other embodiments, the detection reagent is labeled with a fluorescent compound. When the fluorescently labeled reagent is exposed to light of the proper wavelength, its presence can then be detected due to fluorescence. In some embodiments of any of the aspects, a detectable label can be a fluorescent dye molecule, or fluorophore including, but not limited to, fluorescein, phycoerythrin, phycocyanin, o-phthaldehyde, fluorescamine, Cy3™, Cy5™, allophycocyanine, Texas Red, peridinin chlorophyll, cyanine, tandem conjugates such as phycoerythrin-Cy5™, green fluorescent protein, rhodamine, fluorescein isothiocyanate (FITC) and Oregon Green™, rhodamine and derivatives (e.g., Texas red and tetramethylrhodamine isothiocyanate (TRITC)), biotin, phycoerythrin, AMCA, CyDyes™, 6-carboxyfluorescein (commonly known by the abbreviations FAM and F), 6-carboxy-2′,4′,7′,4,7-hexachlorofluorescein (HEX), 6-carboxy-4′,5′-dichloro-2′,7′-dimethoxyfluorescein (JOE or J), N,N,N′,N′-tetramethyl-6-carboxyrhodamine (TAMRA or T), 6-carboxy-X-rhodamine (ROX or R), 5-carboxyrhodamine-6G (R6G5 or G5), 6-carboxyrhodamine-6G (R6G6 or G6), and rhodamine 110; cyanine dyes, e.g. Cy3, Cy5 and Cy7 dyes; coumarins, e.g. umbelliferone; benzimide dyes, e.g. Hoechst 33258; phenanthridine dyes, e.g. Texas Red; ethidium dyes; acridine dyes; carbazole dyes; phenoxazine dyes; porphyrin dyes; polymethine dyes, e.g. cyanine dyes such as Cy3, Cy5, etc.; BODIPY dyes and quinoline dyes. In some embodiments of any of the aspects, a detectable label can be a radiolabel including, but not limited to, 3H, 125I, 35S, 14C, 32P, and 33P. In some embodiments of any of the aspects, a detectable label can be an enzyme including, but not limited to, horseradish peroxidase and alkaline phosphatase. An enzymatic label can produce, for example, a chemiluminescent signal, a color signal, or a fluorescent signal.
Enzymes contemplated for use to detectably label an antibody reagent include, but are not limited to, malate dehydrogenase, staphylococcal nuclease, delta-V-steroid isomerase, yeast alcohol dehydrogenase, alpha-glycerophosphate dehydrogenase, triose phosphate isomerase, horseradish peroxidase, alkaline phosphatase, asparaginase, glucose oxidase, beta-galactosidase, ribonuclease, urease, catalase, glucose-VI-phosphate dehydrogenase, glucoamylase and acetylcholinesterase. In some embodiments of any of the aspects, a detectable label is a chemiluminescent label, including, but not limited to lucigenin, luminol, luciferin, isoluminol, theromatic acridinium ester, imidazole, acridinium salt and oxalate ester. In some embodiments of any of the aspects, a detectable label can be a spectral colorimetric label including, but not limited to colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, and latex) beads.


In some embodiments of any of the aspects, detection reagents can also be labeled with a detectable tag, such as c-Myc, HA, VSV-G, HSV, FLAG, V5, HIS, or biotin. Other detection systems can also be used, for example, a biotin-streptavidin system. In this system, the antibodies immunoreactive with (i.e. specific for) the biomarker of interest are biotinylated. The quantity of biotinylated antibody bound to the biomarker is determined using a streptavidin-peroxidase conjugate and a chromogenic substrate. Such streptavidin peroxidase detection kits are commercially available, e.g. from DAKO; Carpinteria, Calif. A reagent can also be detectably labeled using fluorescence emitting metals such as 152Eu, or others of the lanthanide series. These metals can be attached to the reagent using such metal chelating groups as diethylenetriaminepentaacetic acid (DTPA) or ethylenediaminetetraacetic acid (EDTA).


A level which is less than a reference level can be a level which is less by at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 50%, at least about 60%, at least about 80%, at least about 90%, or more relative to the reference level. In some embodiments of any of the aspects, a level which is less than a reference level can be a level which is statistically significantly less than the reference level.


A level which is more than a reference level can be a level which is greater by at least about 10%, at least about 20%, at least about 50%, at least about 60%, at least about 80%, at least about 90%, at least about 100%, at least about 200%, at least about 300%, at least about 500% or more than the reference level. In some embodiments of any of the aspects, a level which is more than a reference level can be a level which is statistically significantly greater than the reference level.
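
By way of non-limiting illustration only, the comparison of a measured level against a reference level can be sketched as a percent difference. The function names and the numeric levels below are hypothetical and are not taken from this disclosure; the sketch assumes a non-zero reference level and does not address statistical significance.

```python
def percent_difference(level: float, reference: float) -> float:
    """Return the difference of `level` from `reference` as a percentage
    of the reference level (negative when the level is below the reference)."""
    if reference == 0:
        raise ValueError("reference level must be non-zero")
    return (level - reference) / reference * 100.0


def is_less_by_at_least(level: float, reference: float, percent: float) -> bool:
    """True if `level` is below `reference` by at least `percent` percent."""
    return percent_difference(level, reference) <= -percent


def is_greater_by_at_least(level: float, reference: float, percent: float) -> bool:
    """True if `level` exceeds `reference` by at least `percent` percent."""
    return percent_difference(level, reference) >= percent


# Hypothetical levels in arbitrary units.
print(is_less_by_at_least(50.0, 100.0, 20.0))       # 50 is 50% below 100
print(is_greater_by_at_least(300.0, 100.0, 200.0))  # 300 is 200% above 100
```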


In some embodiments of any of the aspects, the reference can be a level of the target in a population of subjects who do not have or are not diagnosed as having, and/or do not exhibit signs or symptoms of lung infection and/or lung inflammation. In some embodiments of any of the aspects, the reference can also be a level of the target in a control sample, a pooled sample of control individuals or a numeric value or range of values based on the same. In some embodiments of any of the aspects, the reference can be the level of a target in a sample obtained from the same subject at an earlier point in time, e.g., the methods described herein can be used to determine if a subject's sensitivity or response to a given therapy is changing over time.


In some embodiments of the foregoing aspects, the expression level of a given gene can be normalized relative to the expression level of one or more reference genes or reference proteins.
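
By way of non-limiting illustration only, one way such a normalization could be carried out is sketched below. The use of a geometric mean across several reference genes is an assumption made for this sketch and is not specified by this disclosure; the function name and the numeric levels are likewise hypothetical.

```python
import math
from typing import Sequence


def normalize_expression(target_level: float,
                         reference_levels: Sequence[float]) -> float:
    """Divide a target expression level by the geometric mean of one or
    more reference-gene expression levels (all in the same linear units)."""
    if not reference_levels:
        raise ValueError("at least one reference level is required")
    if any(level <= 0 for level in reference_levels):
        raise ValueError("reference levels must be positive")
    # Geometric mean computed in log space (an assumption of this sketch).
    mean_log = sum(math.log(level) for level in reference_levels) / len(reference_levels)
    return target_level / math.exp(mean_log)


# Hypothetical linear-scale expression levels in arbitrary units.
normalized = normalize_expression(8.0, [2.0, 8.0])
print(round(normalized, 6))
```

With a single reference gene the calculation reduces to a simple ratio of the target level to the reference level.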


In some embodiments of any of the aspects, the reference level can be the level in a sample of similar cell type, sample type, sample processing, and/or obtained from a subject of similar age, sex and other demographic parameters as the sample/subject for which the level of neutrophil accumulation and/or polyP is to be determined. In some embodiments of any of the aspects, the test sample and control reference sample are of the same type, that is, obtained from the same biological source, and comprising the same composition, e.g. the same number and type of cells.


The term “sample” or “test sample” as used herein denotes a sample taken or isolated from a biological organism, e.g., a blood or plasma sample from a subject. In some embodiments of any of the aspects, the present invention encompasses several examples of a biological sample. In some embodiments of any of the aspects, the biological sample is cells, or tissue, or peripheral blood, or bodily fluid. Exemplary biological samples include, but are not limited to, a biopsy, a tumor sample, biofluid sample; blood; serum; plasma; urine; sperm; mucus; tissue biopsy; organ biopsy; synovial fluid; bile fluid; cerebrospinal fluid; mucosal secretion; effusion; sweat; saliva; and/or tissue sample etc. The term also includes a mixture of the above-mentioned samples. The term “test sample” also includes untreated or pretreated (or pre-processed) biological samples. In some embodiments of any of the aspects, a test sample can comprise cells from a subject. In some embodiments of any of the aspects, the test sample can be a lung sample, lung aspirate, sputum sample, airway sample, serum sample, or the like.


The test sample can be obtained by removing a sample from a subject, but can also be accomplished by using a previously isolated sample (e.g. isolated at a prior timepoint and isolated by the same or another person).


In some embodiments of any of the aspects, the test sample can be an untreated test sample. As used herein, the phrase “untreated test sample” refers to a test sample that has not had any prior sample pre-treatment except for dilution and/or suspension in a solution. Exemplary methods for treating a test sample include, but are not limited to, centrifugation, filtration, sonication, homogenization, heating, freezing and thawing, and combinations thereof. In some embodiments of any of the aspects, the test sample can be a frozen test sample, e.g., a frozen tissue. The frozen sample can be thawed before employing methods, assays and systems described herein. After thawing, a frozen sample can be centrifuged before being subjected to methods, assays and systems described herein. In some embodiments of any of the aspects, the test sample is a clarified test sample, for example, by centrifugation and collection of a supernatant comprising the clarified test sample. In some embodiments of any of the aspects, a test sample can be a pre-processed test sample, for example, supernatant or filtrate resulting from a treatment selected from the group consisting of centrifugation, filtration, thawing, purification, and any combinations thereof. In some embodiments of any of the aspects, the test sample can be treated with a chemical and/or biological reagent. Chemical and/or biological reagents can be employed to protect and/or maintain the stability of the sample, including biomolecules (e.g., nucleic acid and protein) therein, during processing. One exemplary reagent is a protease inhibitor, which is generally used to protect or maintain the stability of protein during processing. The skilled artisan is well aware of methods and processes appropriate for pre-processing of biological samples required for determination of the level of an expression product as described herein.


For convenience, the meaning of some terms and phrases used in the specification, examples, and appended claims, are provided below. Unless stated otherwise, or implicit from context, the following terms and phrases include the meanings provided below. The definitions are provided to aid in describing particular embodiments, and are not intended to limit the claimed invention, because the scope of the invention is limited only by the claims. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. If there is an apparent discrepancy between the usage of a term in the art and its definition provided herein, the definition provided within the specification shall prevail.


For convenience, certain terms employed herein, in the specification, examples and appended claims are collected here.


The terms “decrease”, “reduced”, “reduction”, or “inhibit” are all used herein to mean a decrease by a statistically significant amount. In some embodiments of any of the aspects, “reduce,” “reduction” or “decrease” or “inhibit” typically means a decrease by at least 10% as compared to a reference level (e.g. the absence of a given treatment or agent) and can include, for example, a decrease by at least about 10%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99%, or more. As used herein, “reduction” or “inhibition” does not encompass a complete inhibition or reduction as compared to a reference level. “Complete inhibition” is a 100% inhibition as compared to a reference level. A decrease can be preferably down to a level accepted as within the range of normal for an individual without a given disorder.


The terms “increased”, “increase”, “enhance”, or “activate” are all used herein to mean an increase by a statistically significant amount. In some embodiments of any of the aspects, the terms “increased”, “increase”, “enhance”, or “activate” can mean an increase of at least 10% as compared to a reference level, for example an increase of at least about 20%, or at least about 30%, or at least about 40%, or at least about 50%, or at least about 60%, or at least about 70%, or at least about 80%, or at least about 90% or up to and including a 100% increase or any increase between 10-100% as compared to a reference level, or at least about a 2-fold, or at least about a 3-fold, or at least about a 4-fold, or at least about a 5-fold or at least about a 10-fold increase, or any increase between 2-fold and 10-fold or greater as compared to a reference level. In the context of a marker or symptom, an “increase” is a statistically significant increase in such level.


As used herein, a “subject” means a human or animal. Usually the animal is a vertebrate such as a primate, rodent, domestic animal or game animal. Primates include chimpanzees, cynomolgus monkeys, spider monkeys, and macaques, e.g., Rhesus. Rodents include mice, rats, woodchucks, ferrets, rabbits and hamsters. Domestic and game animals include cows, horses, pigs, deer, bison, buffalo, feline species, e.g., domestic cat, canine species, e.g., dog, fox, wolf, avian species, e.g., chicken, emu, ostrich, and fish, e.g., trout, catfish and salmon. In some embodiments of any of the aspects, the subject is a mammal, e.g., a primate, e.g., a human. The terms “individual,” “patient” and “subject” are used interchangeably herein.


Preferably, the subject is a mammal. The mammal can be a human, non-human primate, mouse, rat, dog, cat, horse, or cow, but is not limited to these examples. Mammals other than humans can be advantageously used as subjects that represent animal models of a condition. A subject can be male or female.


A subject can be one who has been previously diagnosed with or identified as suffering from or having a condition in need of treatment or one or more complications related to such a condition, and optionally, have already undergone treatment for the condition or the one or more complications related to the condition. Alternatively, a subject can also be one who has not been previously diagnosed as having the condition or one or more complications related to the condition. For example, a subject can be one who exhibits one or more risk factors for the condition or one or more complications related to the condition or a subject who does not exhibit risk factors.


A “subject in need” of treatment for a particular condition can be a subject having that condition, diagnosed as having that condition, or at risk of developing that condition.


In the various embodiments described herein, it is further contemplated that variants (naturally occurring or otherwise), alleles, homologs, conservatively modified variants, and/or conservative substitution variants of any of the particular polypeptides described are encompassed. As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters a single amino acid or a small percentage of amino acids in the encoded sequence is a “conservatively modified variant” where the alteration results in the substitution of an amino acid with a chemically similar amino acid and retains the desired activity of the polypeptide. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles consistent with the disclosure.


A given amino acid can be replaced by a residue having similar physicochemical characteristics, e.g., substituting one aliphatic residue for another (such as Ile, Val, Leu, or Ala for one another), or substitution of one polar residue for another (such as between Lys and Arg; Glu and Asp; or Gln and Asn). Other such conservative substitutions, e.g., substitutions of entire regions having similar hydrophobicity characteristics, are well known. Polypeptides comprising conservative amino acid substitutions can be tested in any one of the assays described herein to confirm that a desired activity, e.g., the activity and specificity of a native or reference polypeptide, is retained.


Amino acids can be grouped according to similarities in the properties of their side chains (A. L. Lehninger, Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Non-conservative substitutions will entail exchanging a member of one of these classes for another class. Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
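By way of non-limiting illustration, the first side-chain grouping recited above can be expressed as a lookup table. The following is a minimal sketch, not a method of the disclosure: the function names are hypothetical, and the check only reports whether two residues share a group. Whether the desired activity is retained would still be confirmed in an assay as described herein.

```python
# Side-chain groups as recited in the paragraph above (one-letter codes).
SIDE_CHAIN_GROUPS = {
    "non-polar": set("AVLIPFWM"),
    "uncharged polar": set("GSTCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def side_chain_group(residue: str) -> str:
    """Return the name of the group containing a one-letter residue code."""
    for name, members in SIDE_CHAIN_GROUPS.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_same_group_substitution(original: str, replacement: str) -> bool:
    """True when both residues fall within the same side-chain group."""
    return side_chain_group(original) == side_chain_group(replacement)


if __name__ == "__main__":
    # Lys -> Arg stays within the basic group; Asp -> Lys crosses groups.
    print(is_same_group_substitution("K", "R"))
    print(is_same_group_substitution("D", "K"))
```

A same-group result is only a first screen; the second grouping recited above (hydrophobic, neutral hydrophilic, and so on) could be substituted by changing the table.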


The terms “miRNA” and “microRNA” refer to 21-25 nt non-coding RNAs derived from endogenous genes. They are processed from longer (ca. 75 nt) hairpin-like precursors termed pre-miRNAs. MicroRNAs assemble in complexes termed miRNPs and recognize their targets by antisense complementarity. If a microRNA matches its target 100%, i.e., the complementarity is complete, the target mRNA is cleaved, and the miRNA acts like an siRNA. If the match is incomplete, i.e., the complementarity is partial, then the translation of the target mRNA is blocked.


The terms “miRNA target site” or “microRNA target site” refer to a specific target binding sequence of a microRNA in an mRNA target. Complementarity between the miRNA and its target site need not be perfect.
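The distinction between complete and partial complementarity can be illustrated with a minimal sketch. It assumes an ungapped, antiparallel duplex of equal-length sequences and counts Watson-Crick pairs only (G:U wobble pairs are ignored). The function names and the 22-nt example sequence are hypothetical and serve only as an illustration.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def complementarity_fraction(mirna: str, target_site: str) -> float:
    """Fraction of miRNA positions that pair with the target site.

    Both sequences are written 5' to 3' and must have equal length, so
    position i of the miRNA is compared with position -(i + 1) of the site.
    """
    mirna, target_site = mirna.upper(), target_site.upper()
    if not mirna or len(mirna) != len(target_site):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        1
        for m, t in zip(mirna, reversed(target_site))
        if RNA_COMPLEMENT.get(m) == t
    )
    return paired / len(mirna)


if __name__ == "__main__":
    example_mirna = "UAGCUUAUCAGACUGAUGUUGA"  # illustrative sequence only
    perfect_site = reverse_complement(example_mirna)
    print(complementarity_fraction(example_mirna, perfect_site))
```

A fraction of 1.0 corresponds to the complete-complementarity case described above; any lower value corresponds to partial complementarity.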


As used herein, the terms “protein” and “polypeptide” are used interchangeably to designate a series of amino acid residues, connected to each other by peptide bonds between the alpha-amino and carboxy groups of adjacent residues. The terms “protein” and “polypeptide” refer to a polymer of amino acids, including modified amino acids (e.g., phosphorylated, glycated, glycosylated, etc.) and amino acid analogs, regardless of its size or function. “Protein” and “polypeptide” are often used in reference to relatively large polypeptides, whereas the term “peptide” is often used in reference to small polypeptides, but usage of these terms in the art overlaps. The terms “protein” and “polypeptide” are used interchangeably herein when referring to a gene product and fragments thereof. Thus, exemplary polypeptides or proteins include gene products, naturally occurring proteins, homologs, orthologs, paralogs, fragments and other equivalents, variants, and analogs of the foregoing.


In some embodiments of any of the aspects, the polypeptide described herein (or a nucleic acid encoding such a polypeptide) can be a functional fragment of one of the amino acid sequences described herein. As used herein, a “functional fragment” is a fragment or segment of a peptide which retains at least 50% of the wildtype reference polypeptide's activity according to the assays described below herein. A functional fragment can comprise conservative substitutions of the sequences disclosed herein.


In some embodiments of any of the aspects, the polypeptide described herein can be a variant of a sequence described herein. In some embodiments of any of the aspects, the variant is a conservatively modified variant. Conservative substitution variants can be obtained by mutations of native nucleotide sequences, for example. A “variant,” as referred to herein, is a polypeptide substantially homologous to a native or reference polypeptide, but which has an amino acid sequence different from that of the native or reference polypeptide because of one or a plurality of deletions, insertions or substitutions. Variant polypeptide-encoding DNA sequences encompass sequences that comprise one or more additions, deletions, or substitutions of nucleotides when compared to a native or reference DNA sequence, but that encode a variant protein or fragment thereof that retains activity. A wide variety of PCR-based site-specific mutagenesis approaches are known in the art and can be applied by the ordinarily skilled artisan.


A variant amino acid or DNA sequence can be at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or more, identical to a native or reference sequence. The degree of homology (percent identity) between a native and a mutant sequence can be determined, for example, by comparing the two sequences using freely available computer programs commonly employed for this purpose on the world wide web (e.g. BLASTp or BLASTn with default settings).
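The percent identity comparison described above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length, with "-" marking gaps, and it takes the number of alignment columns as the denominator, which is one of several conventions. It does not reproduce BLASTp or BLASTn, which perform their own alignment and scoring. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences share a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    reference = "MKTAYIAKQR"  # hypothetical reference fragment
    variant = "MKTAYIAKQK"    # hypothetical variant with one substitution
    identity = percent_identity(reference, variant)
    print(f"{identity:.1f}% identical; at least 90%: {identity >= 90.0}")
```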


Alterations of the native amino acid sequence can be accomplished by any of a number of techniques known to one of skill in the art. Mutations can be introduced, for example, at particular loci by synthesizing oligonucleotides containing a mutant sequence, flanked by restriction sites enabling ligation to fragments of the native sequence. Following ligation, the resulting reconstructed sequence encodes an analog having the desired amino acid insertion, substitution, or deletion. Alternatively, oligonucleotide-directed site-specific mutagenesis procedures can be employed to provide an altered nucleotide sequence having particular codons altered according to the substitution, deletion, or insertion required. Techniques for making such alterations are very well established and include, for example, those disclosed by Walder et al. (Gene 42:133, 1986); Bauer et al. (Gene 37:73, 1985); Craik (BioTechniques, January 1985, 12-19); Smith et al. (Genetic Engineering: Principles and Methods, Plenum Press, 1981); and U.S. Pat. Nos. 4,518,584 and 4,737,462, which are herein incorporated by reference in their entireties. Any cysteine residue not involved in maintaining the proper conformation of the polypeptide also can be substituted, generally with serine, to improve the oxidative stability of the molecule and prevent aberrant crosslinking. Conversely, cysteine bond(s) can be added to the polypeptide to improve its stability or facilitate oligomerization.


As used herein, the term “erythropoiesis” refers to the process that produces red blood cells, i.e., the development from erythropoietic stem cell to mature red blood cell. As used herein, the term “erythroid cells” refers to red blood cells.


As used herein, the term “nucleic acid” or “nucleic acid sequence” refers to any molecule, preferably a polymeric molecule, incorporating units of ribonucleic acid, deoxyribonucleic acid or an analog thereof. The nucleic acid can be either single-stranded or double-stranded. A single-stranded nucleic acid can be one nucleic acid strand of a denatured double-stranded DNA. Alternatively, it can be a single-stranded nucleic acid not derived from any double-stranded DNA. In one aspect of any of the embodiments, the nucleic acid can be DNA. In another aspect, the nucleic acid can be RNA. Suitable DNA can include, e.g., genomic DNA or cDNA. Suitable RNA can include, e.g., mRNA.


The term “expression” refers to the cellular processes involved in producing RNA and proteins and as appropriate, secreting proteins, including where applicable, but not limited to, for example, transcription, transcript processing, translation and protein folding, modification and processing. Expression can refer to the transcription and stable accumulation of sense (mRNA) or antisense RNA derived from a nucleic acid fragment or fragments of the invention and/or to the translation of mRNA into a polypeptide.


In some embodiments of any of the aspects, the expression of a biomarker(s), target(s), or gene/polypeptide described herein is/are tissue-specific. In some embodiments of any of the aspects, the expression of a biomarker(s), target(s), or gene/polypeptide described herein is/are global. In some embodiments of any of the aspects, the expression of a biomarker(s), target(s), or gene/polypeptide described herein is systemic.


As used herein, “expression products” include RNA transcribed from a gene, and polypeptides obtained by translation of mRNA transcribed from a gene. The term “gene” means the nucleic acid sequence which is transcribed (DNA) to RNA in vitro or in vivo when operably linked to appropriate regulatory sequences. The gene may or may not include regions preceding and following the coding region, e.g. 5′ untranslated (5′UTR) or “leader” sequences and 3′ UTR or “trailer” sequences, as well as intervening sequences (introns) between individual coding segments (exons).


As used herein, “5′UTR” or “5′ untranslated region” or “5′ leader sequence” refers to regions of an mRNA that are not translated. A 5′UTR typically begins at the transcription start site and ends just before the translation initiation site or start codon (usually AUG in an mRNA, ATG in a DNA sequence) of the coding region. The length of the 5′UTR may be modified by mutation, for example substitution, deletion or insertion of the 5′UTR. The 5′UTR may be further modified by mutating a naturally occurring start codon or translation initiation site such that the codon no longer functions as a start codon and translation may initiate at an alternate initiation site.


As used herein, an “expression enhancer”, an “enhancer sequence” or an “enhancer element” refers to a nucleic acid sequence that can enhance expression of a downstream heterologous open reading frame (ORF) to which it is operably linked.


As used herein, the term “post-transcriptional regulation”, refers to the control of gene expression at the RNA level, between the transcription and the translation of the gene.


As used herein, the term “operably linked” refers to sequences that interact either directly or indirectly to carry out an intended function, e.g. the mediation or modulation of expression of a nucleic acid sequence. The interaction of operatively linked sequences may, for example, be mediated by proteins that interact with the operatively linked sequences. Typically, it refers to the functional relationship of a transcriptional regulatory sequence to a transcribed sequence. For example, a promoter sequence is operably linked to an open reading frame if it stimulates or modulates the transcription of the open reading frame in an appropriate host cell or other expression system. Generally, promoter transcriptional regulatory sequences that are operably linked to a transcribed sequence are physically contiguous to the transcribed sequence, i.e., they are cis-acting. However, some transcriptional regulatory sequences, such as enhancers, need not be physically contiguous or located in close proximity to the open reading frames whose transcription they enhance.


“Marker” in the context of the present invention refers to an expression product, e.g., nucleic acid or polypeptide which is differentially present in a sample taken from subjects having increased neutrophil accumulation and/or polyP, as compared to a comparable sample taken from control subjects (e.g., a healthy subject). The term “biomarker” is used interchangeably with the term “marker.”


In some embodiments of any of the aspects, the methods described herein relate to measuring, detecting, or determining the level of at least one marker. As used herein, the term “detecting” or “measuring” refers to observing a signal from, e.g. a probe, label, or target molecule to indicate the presence of an analyte in a sample. Any method known in the art for detecting a particular label moiety can be used for detection. Exemplary detection methods include, but are not limited to, spectroscopic, fluorescent, photochemical, biochemical, immunochemical, electrical, optical or chemical methods. In some embodiments of any of the aspects, measuring can be a quantitative observation.


In some embodiments of any of the aspects, a polypeptide, nucleic acid, or cell as described herein can be engineered. As used herein, “engineered” refers to the aspect of having been manipulated by the hand of man. For example, a polypeptide is considered to be “engineered” when at least one aspect of the polypeptide, e.g., its sequence, has been manipulated by the hand of man to differ from the aspect as it exists in nature. As is common practice and is understood by those in the art, progeny of an engineered cell are typically still referred to as “engineered” even though the actual manipulation was performed on a prior entity.


As used herein, the term “distal” refers to a nucleic acid sequence upstream of the gene that may contain additional regulatory elements (e.g. distal promoter elements are regulatory DNA sequences that can be many kilobases distant from the gene that they regulate). Each strand of DNA or RNA has a 5′ end and a 3′ end, so named for the carbon position on the deoxyribose (or ribose) ring. As used herein, the term “upstream” refers to a relative position in DNA and/or RNA, defined with respect to the 5′ to 3′ direction in which RNA transcription takes place; an upstream sequence lies toward the 5′ end.


The term “exogenous” refers to a substance present in a cell other than its native source. The term “exogenous” when used herein can refer to a nucleic acid (e.g. a nucleic acid encoding a polypeptide) or a polypeptide that has been introduced by a process involving the hand of man into a biological system such as a cell or organism in which it is not normally found and one wishes to introduce the nucleic acid or polypeptide into such a cell or organism. Alternatively, “exogenous” can refer to a nucleic acid or a polypeptide that has been introduced by a process involving the hand of man into a biological system such as a cell or organism in which it is found in relatively low amounts and one wishes to increase the amount of the nucleic acid or polypeptide in the cell or organism, e.g., to create ectopic expression or levels. In contrast, the term “endogenous” refers to a substance that is native to the biological system or cell. As used herein, “ectopic” refers to a substance that is found in an unusual location and/or amount. An ectopic substance can be one that is normally found in a given cell, but at a much lower amount and/or at a different time. Ectopic also includes a substance, such as a polypeptide or nucleic acid, that is not naturally found or expressed in a given cell in its natural environment.


In some embodiments of any of the aspects, a nucleic acid described herein, e.g., an inhibitory nucleic acid, is provided or administered when it is comprised by a vector. In some of the aspects described herein, a nucleic acid sequence is operably linked to a vector. The term “vector”, as used herein, refers to a nucleic acid construct designed for delivery to a host cell or for transfer between different host cells. As used herein, a vector can be viral or non-viral.


The term “vector” encompasses any genetic element that is capable of replication when associated with the proper control elements and that can transfer gene sequences to cells. A vector can include, but is not limited to, a cloning vector, an expression vector, a plasmid, phage, transposon, cosmid, chromosome, virus, virion, etc. A vector can be a plasmid or lentiviral vector.


As used herein, the term “viral vector” refers to a nucleic acid vector construct that includes at least one element of viral origin and has the capacity to be packaged into a viral vector particle. The viral vector can contain the nucleic acid encoding a polypeptide as described herein in place of non-essential viral genes. The vector and/or particle may be utilized for the purpose of transferring any nucleic acids into cells either in vitro or in vivo. Numerous forms of viral vectors are known in the art.


By “recombinant vector” is meant a vector that includes a heterologous nucleic acid sequence, or “transgene” that is capable of expression in vivo. It should be understood that the vectors described herein can, in some embodiments of any of the aspects, be combined with other suitable compositions and therapies. In some embodiments of any of the aspects, the vector is episomal. The use of a suitable episomal vector provides a means of maintaining the nucleotide of interest in the subject in high copy number extrachromosomal DNA thereby eliminating potential effects of chromosomal integration. In some embodiments of any of the aspects, the vector is recombinant, e.g., it comprises sequences originating from at least two different sources. In some embodiments of any of the aspects, the vector comprises sequences originating from at least two different species. In some embodiments of any of the aspects, the vector comprises sequences originating from at least two different genes, e.g., it comprises a fusion protein or a nucleic acid encoding an expression product which is operably linked to at least one non-native (e.g., heterologous) genetic control element (e.g., a promoter, suppressor, activator, enhancer, response element, or the like).


As used herein, the term “heterologous” means a nucleic acid sequence or polypeptide that originates from a foreign species, or that is substantially modified from its original form if from the same species.


In some embodiments of any of the aspects, the vector or nucleic acid described herein is codon-optimized, e.g., the native or wild-type sequence of the nucleic acid sequence has been altered or engineered to include alternative codons such that the altered or engineered nucleic acid encodes the same polypeptide expression product as the native/wild-type sequence, but will be transcribed and/or translated at an improved efficiency in a desired expression system. In some embodiments of any of the aspects, the expression system is an organism other than the source of the native/wild-type sequence (or a cell obtained from such organism). In some embodiments of any of the aspects, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a mammal or mammalian cell, e.g., a mouse, a murine cell, or a human cell. In some embodiments of any of the aspects, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a human cell. In some embodiments of any of the aspects, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a yeast or yeast cell. In some embodiments of any of the aspects, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a cell. In some embodiments of any of the aspects, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in an E. coli cell.
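The principle that a codon-optimized sequence encodes the same polypeptide as the native sequence can be illustrated with a minimal sketch. The standard genetic code is used for translation. The table of preferred codons is hypothetical and covers only three amino acids; it is an assumption for illustration and does not represent measured codon usage for any expression system, nor the optimization procedure of any particular embodiment.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG-ordered string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preferred codons (assumption for illustration only).
PREFERRED_CODON = {"L": "CTG", "A": "GCC", "K": "AAG"}


def split_codons(dna: str) -> list:
    """Split an in-frame DNA coding sequence into codons."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    """Translate a DNA coding sequence; '*' denotes a stop codon."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(dna))


def codon_optimize(dna: str, preferred=PREFERRED_CODON) -> str:
    """Swap each codon for a preferred synonymous codon where one is listed."""
    return "".join(
        preferred.get(CODON_TABLE[codon], codon) for codon in split_codons(dna)
    )


if __name__ == "__main__":
    native = "ATGTTAGCATAA"  # hypothetical sequence encoding Met-Leu-Ala-stop
    optimized = codon_optimize(native)
    # The encoded polypeptide must be unchanged by the codon swap.
    assert translate(native) == translate(optimized)
    print(native, "->", optimized)
```

The assertion at the end checks the property stated above: the nucleotide sequence changes while the encoded polypeptide does not.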


As used herein, the term “expression vector” refers to a vector that directs expression of an RNA or polypeptide from sequences linked to transcriptional regulatory sequences on the vector. The sequences expressed will often, but not necessarily, be heterologous to the cell. An expression vector may comprise additional elements, for example, the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in human cells for expression and in a prokaryotic host for cloning and amplification.


The term “regulatory sequence” is intended to include promoters, enhancers and other expression control elements (e.g., polyadenylation signals) that control the transcription or translation of a gene they are operably linked to. Such regulatory sequences are described, for example, in Goeddel; Gene Expression Technology. Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Examples of regulatory sequences for mammalian host cell expression include viral elements that direct high levels of protein expression in mammalian cells, such as promoters and/or enhancers derived from cytomegalovirus (CMV), Simian Virus 40 (SV40), adenovirus (e.g., the adenovirus major late promoter (AdMLP)) and polyoma. Alternatively, nonviral regulatory sequences may be used, such as the ubiquitin promoter, Elongation factor 1-alpha 1 (eEF1a1) promoter or β-globin promoter. A eukaryotic promoter is a regulatory region of DNA located upstream of a gene that binds transcription factor II D (TFIID) and allows the subsequent coordination of components of the transcription initiation complex, facilitating recruitment of RNA polymerase II and initiation of transcription. Genes with complex promoters are likely to make use of regulatory elements, such as enhancers and silencers, selectively, allowing varying levels of expression as required.


As used herein, the terms “treat,” “treatment,” “treating,” or “amelioration” refer to therapeutic treatments, wherein the object is to reverse, alleviate, ameliorate, inhibit, slow down or stop the progression or severity of a condition associated with a disease or disorder, e.g. a lung infection and/or lung inflammation. The term “treating” includes reducing or alleviating at least one adverse effect or symptom of a condition, disease or disorder associated with a condition. Treatment is generally “effective” if one or more symptoms or clinical markers are reduced. Alternatively, treatment is “effective” if the progression of a disease is reduced or halted. That is, “treatment” includes not just the improvement of symptoms or markers, but also a cessation of, or at least slowing of, progress or worsening of symptoms compared to what would be expected in the absence of treatment. Beneficial or desired clinical results include, but are not limited to, alleviation of one or more symptom(s), diminishment of extent of disease, stabilized (i.e., not worsening) state of disease, delay or slowing of disease progression, amelioration or palliation of the disease state, remission (whether partial or total), and/or decreased mortality, whether detectable or undetectable. The term “treatment” of a disease also includes providing relief from the symptoms or side-effects of the disease (including palliative treatment).


As used herein, the term “pharmaceutical composition” refers to the active agent in combination with a pharmaceutically acceptable carrier, e.g., a carrier commonly used in the pharmaceutical industry. The phrase “pharmaceutically acceptable” is employed herein to refer to those compounds, materials, compositions, and/or dosage forms which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of human beings and animals without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio. In some embodiments of any of the aspects, a pharmaceutically acceptable carrier can be a carrier other than water. In some embodiments of any of the aspects, a pharmaceutically acceptable carrier can be a cream, emulsion, gel, liposome, nanoparticle, and/or ointment. In some embodiments of any of the aspects, a pharmaceutically acceptable carrier can be an artificial or engineered carrier, e.g., a carrier in which the active ingredient would not be found to occur in nature.


As used herein, the term “administering,” refers to the placement of a compound as disclosed herein into a subject by a method or route which results in at least partial delivery of the agent at a desired site. Pharmaceutical compositions comprising the compounds disclosed herein can be administered by any appropriate route which results in an effective treatment in the subject. In some embodiments of any of the aspects, administration comprises physical human activity, e.g., an injection, act of ingestion, an act of application, and/or manipulation of a delivery device or machine. Such activity can be performed, e.g., by a medical professional and/or the subject being treated.


As used herein, “contacting” refers to any suitable means for delivering, or exposing, an agent to at least one cell. Exemplary delivery methods include, but are not limited to, direct delivery to cell culture medium, perfusion, injection, or other delivery method well known to one skilled in the art. In some embodiments of any of the aspects, contacting comprises physical human activity, e.g., an injection; an act of dispensing, mixing, and/or decanting; and/or manipulation of a delivery device or machine.


The term “statistically significant” or “significantly” refers to statistical significance and generally means a two standard deviation (2SD) or greater difference.
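The two standard deviation criterion can be illustrated with a minimal sketch. It assumes that the difference is measured between an observed value and the mean of a set of control measurements, and that the sample standard deviation of the controls is used; the definition above does not specify either choice. The function name and the numerical values are hypothetical.

```python
import statistics


def differs_by_two_sd(control_values, observed, n_sd=2.0):
    """True when `observed` lies at least `n_sd` sample SDs from the control mean."""
    if len(control_values) < 2:
        raise ValueError("at least two control values are required")
    mean = statistics.mean(control_values)
    sd = statistics.stdev(control_values)  # sample standard deviation
    return abs(observed - mean) >= n_sd * sd


if __name__ == "__main__":
    controls = [10, 12, 11, 9, 13]  # hypothetical control measurements
    print(differs_by_two_sd(controls, 15))
    print(differs_by_two_sd(controls, 12))
```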


Other than in the operating examples, or where otherwise indicated, all numbers expressing quantities of ingredients or reaction conditions used herein should be understood as modified in all instances by the term “about.” The term “about” when used in connection with percentages can mean ±1%.


As used herein, the term “comprising” means that other elements can also be present in addition to the defined elements presented. The use of “comprising” indicates inclusion rather than limitation.


The term “consisting of” refers to compositions, methods, and respective components thereof as described herein, which are exclusive of any element not recited in that description of the embodiment.


As used herein the term “consisting essentially of” refers to those elements required for a given embodiment. The term permits the presence of additional elements that do not materially affect the basic and novel or functional characteristic(s) of that embodiment of the invention.


As used herein, the term “specific binding” refers to a chemical interaction between two molecules, compounds, cells and/or particles wherein the first entity binds to the second, target entity with greater specificity and affinity than it binds to a third entity which is a non-target. In some embodiments of any of the aspects, specific binding can refer to an affinity of the first entity for the second target entity which is at least 10 times, at least 50 times, at least 100 times, at least 500 times, at least 1000 times or greater than the affinity for the third nontarget entity. A reagent specific for a given target is one that exhibits specific binding for that target under the conditions of the assay being utilized.
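The fold-difference thresholds recited above can be illustrated with a minimal sketch. It assumes that affinity is compared through equilibrium dissociation constants (Kd), where a lower Kd indicates a higher affinity, so that the fold difference is the non-target Kd divided by the target Kd. The definition above does not prescribe this metric, and the function names and Kd values are hypothetical.

```python
def affinity_fold_difference(kd_target: float, kd_nontarget: float) -> float:
    """Fold by which affinity for the target exceeds affinity for the non-target.

    Affinity is taken as 1/Kd, so the ratio reduces to kd_nontarget / kd_target.
    """
    if kd_target <= 0 or kd_nontarget <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_nontarget / kd_target


def meets_fold_threshold(kd_target: float, kd_nontarget: float,
                         min_fold: float = 10.0) -> bool:
    """True when the fold difference is at least `min_fold`."""
    return affinity_fold_difference(kd_target, kd_nontarget) >= min_fold


if __name__ == "__main__":
    # Hypothetical values: 1 nM for the target, 100 nM for the non-target.
    print(meets_fold_threshold(1e-9, 1e-7, min_fold=10))
    print(meets_fold_threshold(1e-9, 1e-7, min_fold=1000))
```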


The singular terms “a,” “an,” and “the” include plural referents unless context clearly indicates otherwise. Similarly, the word “or” is intended to include “and” unless the context clearly indicates otherwise. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of this disclosure, suitable methods and materials are described below. The abbreviation, “e.g.” is derived from the Latin exempli gratia, and is used herein to indicate a non-limiting example. Thus, the abbreviation “e.g.” is synonymous with the term “for example.”


Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.


Unless otherwise defined herein, scientific and technical terms used in connection with the present application shall have the meanings that are commonly understood by those of ordinary skill in the art to which this disclosure belongs. It should be understood that this invention is not limited to the particular methodology, protocols, and reagents, etc., described herein and as such can vary. The terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, which is defined solely by the claims. Definitions of common terms in immunology and molecular biology can be found in The Merck Manual of Diagnosis and Therapy, 20th Edition, published by Merck Sharp & Dohme Corp., 2018 (ISBN 0911910190, 978-0911910421); Robert S. Porter et al. (eds.), The Encyclopedia of Molecular Cell Biology and Molecular Medicine, published by Blackwell Science Ltd., 1999-2012 (ISBN 9783527600908); and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8); Immunology by Werner Luttmann, published by Elsevier, 2006; Janeway's Immunobiology, Kenneth Murphy, Allan Mowat, Casey Weaver (eds.), W. W. Norton & Company, 2016 (ISBN 0815345054, 978-0815345053); Lewin's Genes XI, published by Jones & Bartlett Publishers, 2014 (ISBN-1449659055); Michael Richard Green and Joseph Sambrook, Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., USA (2012) (ISBN 1936113414); Davis et al., Basic Methods in Molecular Biology, Elsevier Science Publishing, Inc., New York, USA (2012) (ISBN 044460149X); Laboratory Methods in Enzymology: DNA, Jon Lorsch (ed.) Elsevier, 2013 (ISBN 0124199542); Current Protocols in Molecular Biology (CPMB), Frederick M. Ausubel (ed.), John Wiley and Sons, 2014 (ISBN 047150338X, 9780471503385); Current Protocols in Protein Science (CPPS), John E. Coligan (ed.), John Wiley and Sons, Inc., 2005; and Current Protocols in Immunology (CPI), John E. Coligan, Ada M. Kruisbeek, David H. Margulies, Ethan M. Shevach, Warren Strober (eds.), John Wiley and Sons, Inc., 2003 (ISBN 0471142735, 9780471142737), the contents of which are all incorporated by reference herein in their entireties.


Other terms are defined herein within the description of the various aspects of the invention.


All patents and other publications, including literature references, issued patents, published patent applications, and co-pending patent applications, cited throughout this application are expressly incorporated herein by reference for the purpose of describing and disclosing, for example, the methodologies described in such publications that might be used in connection with the technology described herein. These publications are provided solely for their disclosure prior to the filing date of the present application. Nothing in this regard should be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention or for any other reason. All statements as to the date or representation as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents.


The description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. While specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize. For example, while method steps or functions are presented in a given order, alternative embodiments may perform functions in a different order, or functions may be performed substantially concurrently. The teachings of the disclosure provided herein can be applied to other procedures or methods as appropriate. The various embodiments described herein can be combined to provide further embodiments. Aspects of the disclosure can be modified, if necessary, to employ the compositions, functions and concepts of the above references and application to provide yet further embodiments of the disclosure. Moreover, due to biological functional equivalency considerations, some changes can be made in protein structure without affecting the biological or chemical action in kind or amount. These and other changes can be made to the disclosure in light of the detailed description. All such modifications are intended to be included within the scope of the appended claims.


Specific elements of any of the foregoing embodiments can be combined or substituted for elements in other embodiments. Furthermore, while advantages associated with certain embodiments of the disclosure have been described in the context of these embodiments, other embodiments may also exhibit such advantages, and not all embodiments need necessarily exhibit such advantages to fall within the scope of the disclosure.


Some embodiments of the technology described herein can be defined according to any of the following numbered paragraphs:

    • 1. A nucleic acid sequence comprising
      • a. at least one heterologous regulatory sequence selected from a hematopoietic enhancer element and a miRNA binding site for an HSC-restricted miRNA; and
      • b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
    • 2. The nucleic acid sequence of paragraph 1, comprising at least one hematopoietic enhancer element.
    • 3. The nucleic acid sequence of paragraph 2, wherein the enhancer element comprises a sequence of at least 80% homology to a nucleotide sequence that is selected from the group consisting of: SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 38 and/or SEQ ID NO: 39.
    • 4. The nucleic acid sequence of paragraph 2, wherein the enhancer element comprises an enhancer element of a gene selected from the group consisting of:
      • Kell metalloendopeptidase (KEL); 5′ aminolevulinate synthase 2 (ALAS2); and glycophorin A (GYPA).
    • 5. The nucleic acid sequence of any of paragraphs 1-4, comprising at least one miRNA binding site for at least one HSC-restricted miRNA.
    • 6. The nucleic acid sequence of any of paragraphs 1-5, wherein the at least one miRNA binding site for at least one HSC-restricted miRNA is selected from the group consisting of miR binding sites for miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e.
    • 7. The nucleic acid sequence of any of paragraphs 1-6, comprising at least one hematopoietic enhancer element and at least one miRNA binding site for at least one HSC-restricted miRNA.
    • 8. The nucleic acid sequence of any of paragraphs 1-7, further comprising:
      • a. a heterologous 5′ UTR comprising:
        • i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1;
        • ii. a sequence of at least 20 nucleotides; and/or
        • iii. 1-25 upstream AUG codons (uAUGs); and/or
      • b. a hematopoietic enhancer minigene.
    • 9. A nucleic acid sequence comprising
      • a. a 5′ UTR comprising:
        • i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1;
        • ii. a sequence of at least 20 nucleotides; and/or
        • iii. 1-25 upstream AUG codons (uAUGs); and
      • b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
    • 10. The nucleic acid sequence of any of paragraphs 1-9, wherein the 5′UTR comprises a 5′UTR of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1), LIM Domain Only 2 (LMO2), or ETS Variant 6 (ETV6).
    • 11. The nucleic acid sequence of any of paragraphs 1-10, further comprising at least one hematopoietic enhancer element, miRNA binding site for a HSC restricted miRNA, and/or a hematopoietic enhancer minigene (G1HEM).
    • 12. A nucleic acid sequence comprising
      • a. a hematopoietic enhancer minigene (G1HEM); and
      • b. a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
    • 13. The nucleic acid sequence of paragraph 12, wherein the hematopoietic enhancer minigene (mG1HEM) comprises a sequence of at least 80% homology to a nucleotide sequence of: SEQ ID NO: 13.
    • 14. The nucleic acid sequence of any of paragraphs 12-13, further comprising a 5′ UTR comprising:
      • i. a 5′UTR sequence of a hematopoietic transcription factor other than GATA1;
      • ii. a sequence of at least 20 nucleotides; and/or
      • iii. 1-25 upstream AUG codons (uAUGs); and/or
    •  at least one hematopoietic enhancer element; and/or at least one miRNA binding site for a HSC restricted miRNA.
    • 15. The nucleic acid sequence of paragraph 14, wherein the 5′ UTR sequence of a hematopoietic transcription factor other than GATA1 is a 5′UTR sequence of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1), at least one hematopoietic enhancer element; and/or at least one miRNA binding site for a HSC restricted miRNA.
    • 16. The nucleic acid sequence of any of paragraphs 1-15, wherein the binding site for at least one HSC restricted miRNA comprises a sequence selected from SEQ ID NOs: 31-37 and 43-55.
    • 17. The nucleic acid sequence of any of paragraphs 1-16, wherein the hematopoietic enhancer element comprises a sequence with at least 80% sequence identity to a sequence selected from SEQ ID NOs: 10, 11, 12, 38, and 39.
    • 18. The nucleic acid sequence of any of paragraphs 1-17, wherein the 5′ UTR sequence comprises a sequence with at least 80% sequence identity to a sequence selected from SEQ ID NOs: 14, 15, and 16.
    • 19. The nucleic acid sequence of any of paragraphs 1-18, wherein the sequence comprises a promoter operably linked to the elements of a. and b.
    • 20. The nucleic acid sequence of paragraph 19, wherein the promoter is not a GATA1 promoter.
    • 21. The nucleic acid sequence of paragraph 20, wherein the promoter comprises a promoter sequence of Elongation factor 1-alpha 1 (eEF1a1).
    • 22. The nucleic acid sequence of any of paragraphs 1-21, wherein the sequence encoding a GATA-binding factor 1 (GATA1) polypeptide comprises at least 60% sequence identity to a nucleotide sequence encoding a human GATA1 polypeptide.
    • 23. The nucleic acid sequence of any of paragraphs 1-22, further comprising:
      • a posttranscriptional regulatory element operably linked to the sequence encoding the GATA1 polypeptide.
    • 24. The nucleic acid sequence of paragraph 23, wherein the posttranscriptional regulatory element comprises a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE).
    • 25. The nucleic acid sequence of any of paragraphs 1-24, further comprising an internal ribosome entry site.
    • 26. The nucleic acid sequence of paragraph 25, wherein the internal ribosome entry site is operably linked to a marker gene and wherein the marker gene encodes an optically visible protein or an enzyme.
    • 27. The nucleic acid sequence of any of paragraphs 1-26, wherein the sequence comprises a sequence selected from SEQ ID NOs 8, 9, 61, and 62.
    • 28. The nucleic acid sequence of any of paragraphs 1-27, wherein the nucleic acid sequence is a vector.
    • 29. The nucleic acid sequence of paragraph 28, wherein the vector is a plasmid, or an adenoviral, lentiviral or retroviral vector.
    • 30. A lentiviral particle comprising the nucleic acid sequence of any of paragraphs 1-29.
    • 31. A composition comprising a nucleic acid sequence or particle of any of paragraphs 1-30 and a pharmaceutically acceptable carrier.
    • 32. A method of treating Diamond-Blackfan Anemia in a subject in need thereof, the method comprising administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition of any of paragraphs 1-31 to the subject.
    • 33. A method of restoring early erythroid progenitor cell-specific GATA1 expression, the method comprising contacting a population of cells comprising early erythroid progenitor cells with a nucleic acid sequence, particle, or composition of any of paragraphs 1-31.
    • 34. The method of paragraph 33, wherein the early erythroid progenitor cells comprise a DBA-associated gene mutation.
    • 35. A nucleic acid sequence, particle, or composition of any of paragraphs 1-31 for use in the treatment of Diamond-Blackfan Anemia in a subject in need thereof.
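
By way of non-limiting illustration, the 5′ UTR features recited in the numbered paragraphs above (a length of at least 20 nucleotides and 1-25 upstream AUGs) can be checked for a candidate sequence with a short script. The following Python sketch is an assumption-laden example only: the function names are hypothetical, uAUGs are counted in any reading frame, and the example sequence is a made-up placeholder rather than any SEQ ID NO of the Sequence Listing. Because the features are recited in the alternative ("and/or"), each criterion is reported separately.

```python
def count_upstream_augs(utr: str) -> int:
    """Count AUG (or ATG) triplets at every position of a 5' UTR, in any frame."""
    rna = utr.upper().replace("T", "U")  # accept DNA or RNA input
    return sum(1 for i in range(len(rna) - 2) if rna[i:i + 3] == "AUG")


def utr_features(utr: str, min_length: int = 20,
                 min_uaugs: int = 1, max_uaugs: int = 25) -> dict:
    """Report each 5' UTR criterion separately (thresholds are parameters)."""
    n_uaugs = count_upstream_augs(utr)
    return {
        "length": len(utr),
        "length_ok": len(utr) >= min_length,
        "n_uaugs": n_uaugs,
        "uaug_ok": min_uaugs <= n_uaugs <= max_uaugs,
    }


if __name__ == "__main__":
    # Hypothetical 22-nucleotide placeholder UTR, not a sequence from this disclosure.
    example_utr = "GGCAUGCCAUGGACCUUAGGCA"
    print(utr_features(example_utr))
```

A sequence for which either `length_ok` or `uaug_ok` is true would satisfy the corresponding alternative; whether a given uAUG is functionally relevant is not addressed by this sketch.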


The technology described herein is further illustrated by the following examples which in no way should be construed as being further limiting.


EXAMPLES
Example 1: Methods for the Treatment of DBA Using GATA1 Gene Therapy

Diamond-Blackfan anemia (DBA), also known as congenital hypoplastic anemia, is a condition that was first described in 1938 and is characterized by a paucity of red blood cell progenitors and precursors in the bone marrow of patients, while all other aspects of hematopoiesis occur in an ostensibly normal manner (1, 2). DBA is estimated to occur in approximately 1 in 100,000 to 200,000 live births (3), although this may be an underestimate given a number of individuals who have been found to have variable expressivity or who may have been misdiagnosed. For many decades, the diagnosis of DBA was made primarily based upon clinical criteria and was assisted by the use of the biomarker erythrocyte adenosine deaminase, which is elevated in ˜80% of patients with DBA (3).


After an extensive mapping effort that spanned much of the 1990s, the first gene mutated in DBA was discovered in 1999 through the identification of an individual with a translocation on chromosome 19 (4). Surprisingly, heterozygous loss-of-function mutations were identified in ˜20-25% of DBA cases in this initial mutated gene, which was a ubiquitously expressed ribosomal protein (RP) gene, RPS19. This immediately raised considerable speculation about the underlying mechanisms and whether a ribosomal or non-ribosomal role for RPS19 might be involved. A number of subsequent studies demonstrated that impaired ribosome biogenesis resulting from RP haploinsufficiency appeared to be a major contributor to this phenotype, suggesting a role for ribosome activity or levels (5). However, the underlying basis for the erythroid specificity of this disorder remained a mystery.


Subsequent studies in cohorts of patients with DBA that employed targeted sequencing, assessment of copy number variation using single nucleotide polymorphism microarrays or comparative genomic hybridization, or whole exome sequencing have revealed a total of 19 distinct RPs harboring heterozygous loss-of-function mutations that result in RP haploinsufficiency (6, 7). Collectively, these mutations explain the cause in ˜60-80% of DBA cases. These 19 RP gene mutations are heterogeneously distributed throughout the ribosome and involve both the large (60S) and small (40S) subunits of the ribosome. There is no clustering of mutations on a particular structural region of the ribosome (8). More recently, through whole exome sequencing on a cohort of over 450 patients with a diagnosis of DBA, the inventors have identified an additional 7 RP gene mutations, bringing the total number of RP genes implicated in this disorder to 26 (nearly ⅓ of the RPs composing the ribosome), which collectively explain the underlying basis of ˜80% of DBA cases (9).


Despite the advances in understanding the majority of genetic causes of DBA, two major limitations remain. First, despite the robust findings of heterozygous RP loss-of-function mutations in the majority of DBA cases, how these mutations lead to the erythroid-specific hematopoietic defects in DBA has remained an enigma (10). Second, very limited therapies are currently available to treat patients with DBA (3, 10). Some patients respond to corticosteroids, but significant side effects often limit the long-term effectiveness of this therapy in the majority of patients. Many patients require chronic red blood cell transfusions, which can be associated with significant and difficult-to-control iron overload. Finally, some patients can be cured through the use of allogeneic bone marrow transplantation, but in general this is limited to those with matched sibling donors, given the poor outcomes noted with unrelated donor transplantation in this condition (11). Only limited candidate experimental therapeutics have been developed to date, and many have unfortunately not shown robust efficacy in later-stage pre-clinical or clinical studies (12). Therefore, there is a significant need for new and improved therapies for DBA that could be effective in the majority of patients with this condition, which is due to a large number of distinct mutations primarily affecting RP genes.


With these limitations in mind, the inventors reasoned that further study of DBA through the use of human genetics coupled with mechanistic follow-up could provide further insight into this disorder and allow the identification of improved therapeutic strategies. The inventors subsequently identified the first non-RP gene mutation in this disorder. The inventors identified several patients with a diagnosis of DBA who had mutations that impaired the production of the long protein form of the hematopoietic master transcription factor GATA1 (13). Several other patients with similar types of mutations were subsequently reported as well (14-16). While these findings demonstrated that GATA1 mutations could cause a phenotype resembling DBA, whether there was a molecular connection between the more commonly observed RP gene mutations and the GATA1 mutations remained unclear.


The inventors tested whether RP haploinsufficiency—the most common cause of DBA—could alter GATA1 translation. The inventors could demonstrate using both RP suppression in primary human hematopoietic stem and progenitor cells (HSPCs) and in DBA patient samples that GATA1 mRNA translation was impaired in the setting of RP haploinsufficiency, while a variety of other erythroid-important transcripts were not affected in terms of their translation in this setting (15). Moreover, the inventors demonstrated that increasing GATA1 protein levels through lentiviral expression was sufficient to rescue the erythroid differentiation defect present in mononuclear cells from DBA patients with various RP gene mutations (to the level that is seen in normal individuals). These results produced a model, as illustrated in FIG. 1, regarding the pathogenesis of DBA.


However, a number of questions have remained. (1) It was unclear exactly how the ribosome was being altered in the setting of RP haploinsufficiency. It was possible that the ribosome may be altered in composition in this case, although the finding of 28 distinct RP mutations in this condition made this seem less likely. An alternative, although not mutually exclusive, possibility was that ribosome levels were reduced in the setting of RP haploinsufficiency. (2) The range of transcripts beyond those that were specifically tested in initial studies and the features common to those transcripts remained unclear. (3) The stage of hematopoiesis at which these defects emerged was also unclear.


The inventors then employed a ribosome profiling approach to better understand, at a genomic level, which transcripts were affected by this reduction in ribosome levels due to DBA-associated molecular lesions (19, 20). The inventors were able to obtain high-quality ribosome profiling data from RP haploinsufficient HSPCs undergoing erythroid lineage commitment, a stage at which the functional defects in erythroid differentiation arise. Importantly, through analysis of these data, the inventors could show that a limited set of ˜500 transcripts display the most significant changes in translation efficiency in the setting of RP haploinsufficiency (similar for RPS19 or RPL5 suppression). Consistent with the inventors' earlier targeted findings from polysome analysis, GATA1 mRNA was among the most downregulated transcripts in terms of translation efficiency. Interestingly, the majority of other transcripts showing translational downregulation were components of the ribosome or ribosome-associated factors, including all RPs and a variety of translation initiation and elongation factors. Upon further analysis using cap analysis of gene expression to define 5′ untranslated regions (UTRs) for these transcripts, the inventors could show that those transcripts that were most highly translated at baseline and that had short and unstructured 5′ UTRs tended to be the ones that were downregulated at the translational level in the setting of RP haploinsufficiency. Interestingly, among all hematopoietic master transcription factors, only GATA1 has a short 5′ UTR, and the inventors could show that replacing this 5′ UTR with those of other master regulators (such as RUNX1, LMO2, or ETV6) altered the translation of this key hematopoietic transcription factor.
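
By way of non-limiting illustration, the comparison of translation efficiency between control and RP haploinsufficient cells can be sketched as follows. The sketch assumes the definition commonly used in ribosome profiling, in which the translation efficiency of a transcript is the ratio of its ribosome footprint density to its mRNA abundance. That definition, the function names, and the numeric values are assumptions made for illustration and are not data from the studies described herein.

```python
import math


def translation_efficiency(footprint_density: float, mrna_abundance: float) -> float:
    """Ratio of ribosome footprint density to mRNA abundance (same units, e.g. RPKM)."""
    if mrna_abundance <= 0:
        raise ValueError("mRNA abundance must be positive")
    if footprint_density < 0:
        raise ValueError("footprint density cannot be negative")
    return footprint_density / mrna_abundance


def log2_te_change(control: tuple, knockdown: tuple) -> float:
    """log2 fold change in translation efficiency, knockdown relative to control.

    Each argument is a (footprint_density, mrna_abundance) pair.
    """
    te_control = translation_efficiency(*control)
    te_knockdown = translation_efficiency(*knockdown)
    if te_control <= 0 or te_knockdown <= 0:
        raise ValueError("both translation efficiencies must be positive")
    return math.log2(te_knockdown / te_control)


if __name__ == "__main__":
    # Hypothetical values: mRNA level unchanged, footprints reduced four-fold.
    change = log2_te_change(control=(40.0, 10.0), knockdown=(10.0, 10.0))
    print(f"log2 change in translation efficiency: {change:.2f}")
```

A negative value indicates reduced translation efficiency in the knockdown condition at a given mRNA level, which is the pattern described above for GATA1 mRNA. Statistical testing across replicates, as would be needed in practice, is outside the scope of this sketch.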


Finally, the inventors also demonstrated that this happens in vivo in DBA patients and assessed the stage of hematopoiesis at which these lesions emerge. The inventors showed, both by immunohistochemistry for GATA1 in bone marrow biopsy specimens and by intracellular flow cytometry, that GATA1 levels were reduced in hematopoietic progenitors from DBA patients. Importantly, the inventors demonstrated that GATA1 levels were reduced even upon its earliest expression in very primitive CD34+CD38− HSPCs from DBA patient bone marrow cells, as compared to control samples (FIG. 3). In addition, the inventors found that GATA1 levels continued to be lower in DBA patient cells, even as GATA1 levels increased in more mature CD34+CD38+ HSPCs. These results are consistent with the emerging model that hematopoietic lineage commitment occurs at the most primitive stages of stem and progenitor cells and demonstrate the relevance of these findings to human disease (21-23).


All of these mechanistic findings have important implications for improving the understanding of DBA pathogenesis. However, the challenge remained as to how better therapies can be developed for DBA. As discussed above, the only currently available therapies are the chronic use of corticosteroids, regular blood transfusions, or allogeneic hematopoietic stem cell transplantation (10). An alternative and valuable approach would be to use autologous hematopoietic stem cell transplantation coupled to gene therapy (24). Indeed, there have been attempts to develop lentiviral vectors to allow for increased production of RPS19 (25). It is difficult to envision how this approach can be useful for the majority of patients, given the pleiotropic RP gene mutations present in DBA patients (28 mutations have been identified to date). Given the inventors' findings that impaired GATA1 protein production underlies all DBA cases and that increasing GATA1 protein is sufficient to rescue the erythroid differentiation defects present in these patients, the development of GATA1 gene therapy is a valuable approach for achieving curative treatment in DBA patients. The major limitation, as discussed in detail below, is that expression of GATA1 in the hematopoietic stem cell (HSC) compartment will cause the stem cells to differentiate precociously and that the expression of GATA1 during terminal erythropoiesis needs to be regulated.


While GATA1 protein levels are suppressed in HSPCs from DBA patients and increasing GATA1 expression can ameliorate the erythroid lineage commitment defect characteristic of DBA, dysregulated expression of GATA1 can be problematic. HSCs can undergo precocious differentiation with exogenous GATA1 expression and effective terminal erythropoiesis requires regulation of GATA1 levels.


Based on the inventors' mechanistic studies, the development of GATA1 gene therapy for treatment of DBA is compelling and appears to be a promising approach. The inventors have been able to demonstrate that increasing GATA1 expression can rescue the erythroid differentiation defect in primary HSPCs from patients with DBA harboring a variety of molecular lesions in various RP genes. In addition, the inventors have been able to show that they can reproducibly obtain the same results across a variety of DBA-associated molecular lesions modeled in primary HSPCs through RNA interference-based approaches (15, 17). In these cases, the increased expression of GATA1 was achieved through the use of lentiviruses, where the GATA1 cDNA containing altered 5′ and 3′ UTR elements was under the transcriptional control of a lentiviral LTR that displays high-level and ubiquitous expression. For therapeutic purposes, such expression must be regulated and tuned at various stages of the differentiation process. GATA1 levels must be controlled to avoid any perturbations of hematopoiesis.


Prior studies have shown that exogenous unregulated expression of Gata1 in mouse HSCs can promote precocious differentiation toward the megakaryocytic and erythroid lineages, while preventing the maintenance of self-renewing HSCs capable of long-term engraftment (26, 27). Indeed, exogenous Gata1 expression can reprogram other hematopoietic lineages to take on an erythroid fate (26). However, regulated expression of a Gata1 transgene can allow long-term maintenance of HSCs (27). To bolster these findings in a human context, the inventors have utilized a serum-free culture system that allows for the maintenance of long-term engrafting human HSCs (capable of engrafting immunodeficient xenograft recipients) over the course of a few days in culture. In this setting, the introduction of exogenous GATA1 expression regulated by a lentiviral LTR element causes precocious differentiation of these cells, while the control cells maintained their phenotype and functional ability to give rise to long-term hematopoietic grafts. These findings extend the previously published results in mouse models (26). These results also collectively emphasize the need to prevent GATA1 expression in early HSCs to allow for effective engraftment, as would be required for a curative lentiviral gene therapy approach. In addition, GATA1 levels must not be excessively elevated during terminal erythroid differentiation, since this can impair effective erythropoiesis (28). To address these issues, the inventors undertook a series of studies to identify key regulatory elements that will permit regulated expression of GATA1 from lentiviral vectors.


To achieve regulated expression of GATA1 for effective gene therapy, the inventors have been employing two complementary and synergistic approaches to ensure that there will not be potentially detrimental ectopic expression, while also regulating levels of GATA1 during the course of erythroid differentiation. It is contemplated herein that either approach could be used alone, or that they can be combined.


The first regulatory element that is being used in the gene therapy vectors is a GATA1 hematopoietic enhancer minigene (G1HEM) that concatenates 4 distinct regulatory elements to achieve faithful expression of GATA1 during hematopoiesis (27, 29). These elements include a −3 kb hematopoietic enhancer, an upstream double GATA motif, an upstream CACCC box, and a segment of the first intron of GATA1. Indeed, the 979 nucleotides present in this minigene are sufficient to drive Gata1 cDNA expression appropriately to rescue a Gata1 knockout mouse and allow for ostensibly normal erythropoiesis.


For the development of clinically usable GATA1 expression vectors that involve the first transcriptional regulatory element discussed above, the inventors utilize safe and well-designed vectors that have already been proven effective in human clinical studies. One such vector is pRRL.PPT.EFS, which has demonstrated controlled and well-regulated exogenous cDNA expression in a variety of human hematopoietic cell types and which has been utilized in clinical settings (30). The G1HEM can be incorporated upstream of the GATA1 cDNA, which is driven either by the endogenous promoter or, as an alternative and complementary approach, by a modified (shortened) ubiquitous EF1α promoter (EFS). Importantly, as discussed above, the Gata1 regulatory elements contained in the G1HEM from mice are capable of driving regulated expression of marker genes solely in the cell types where Gata1 is normally expressed and are sufficient to allow appropriate rescue of knockout mice using Gata1 cDNA (27, 31).


The inventors have produced a total of 4 different vectors (the 2 shown in FIG. 6, with both mouse and human regulatory elements used for all cases). The inventors incorporated a self-cleaving 2A peptide (P2A) element followed by the Venus fluorescent marker after the GATA1 cDNA to be able to readily track those cells expressing GATA1 in real time. Flow cytometry assays were used to quantify the extent of Venus expression seen in the various hematopoietic cell types tested. The extent of increase in GATA1 expression in cell types that normally express this transcription factor can be assessed by performing cell sorting of particular populations. Finally, using this primary cell culture approach, the inventors can assess variation in phenotypes that occur with GATA1 expression (32-34). This powerful approach allows the inventors to simultaneously determine effectiveness, specificity, and effects upon hematopoietic differentiation in a streamlined manner that is directly relevant to the process of hematopoiesis in vivo. Every vector is tested in 2-3 independent primary human hematopoietic cell samples to ascertain both specificity and effectiveness of expression.


While the transcriptional regulatory elements discussed above that compose the G1HEM permit regulated expression of GATA1 cDNA, studies have indicated that there can be leaky expression in the HSC compartment with the use of this regulatory element (27). As this could profoundly affect the ability to obtain long-term engraftment (26), expression in the HSC compartment must be prevented. To achieve this, the inventors incorporated a second gene regulatory element: binding elements for the HSC-restricted microRNA (miR) miR126, placed after the posttranscriptional regulatory element of the woodchuck hepatitis virus (PRE), e.g., in the modified pRRL.PPT.EFS derivatives. Insertion of three repeated miR126 binding elements after the PRE prevents expression of transgenes in the HSC compartment. The inventors also modified the pRRL.PPT.EFS vector containing the G1HEM and GATA1 cDNA to include these miR126 elements. In vitro testing is performed in primary human hematopoietic cells to ensure effective and selective expression. HSCs can be transduced and then transplanted into the NOD.Cg-KitW-41J Tyr+ Prkdcscid Il2rgtm1Wjl (NBSGW) mouse model, which has previously been used successfully and extensively to produce human hematopoietic xenograft models (36). HSC function can then be tested after 16 weeks of engraftment using phenotypic marker quantification, secondary transplantation into NBSGW recipients, and by assessing Venus expression in the phenotypic HSC compartment.
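
As an illustration of how a repeated miRNA binding-site cassette of this kind can be laid out, the following Python sketch builds tandem target sites as the DNA reverse complement of a mature miRNA sequence. The miRNA sequence, the spacer, and the function names are hypothetical placeholders chosen for the example; they are not the miR126 sequence and do not correspond to any sequence in the Sequence Listing.

```python
# Sketch only: builds a tandem array of fully complementary miRNA target sites.
# PLACEHOLDER_MIRNA and the spacer are illustrative assumptions, not sequences
# taken from the application.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement_dna(sequence: str) -> str:
    """Return the DNA reverse complement of an RNA or DNA sequence."""
    dna = sequence.upper().replace("U", "T")
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def tandem_target_sites(mirna: str, copies: int = 3, spacer: str = "CGAT") -> str:
    """Concatenate `copies` target sites, separated by a short spacer."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    site = reverse_complement_dna(mirna)
    return spacer.join([site] * copies)


# Hypothetical 22-nt placeholder standing in for a mature miRNA.
PLACEHOLDER_MIRNA = "ACGUACGUACGUACGUACGUAC"

cassette = tandem_target_sites(PLACEHOLDER_MIRNA, copies=3)
print(len(cassette), cassette)
```

In such a design the cassette would be placed downstream of the PRE, so that transcripts carrying it are repressed in cells that express the corresponding miRNA.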


Described herein is the development of clinical-grade lentiviral vectors that permit the regulated expression of GATA1 cDNA for use in gene therapy. The in vitro and in vivo studies in primary human hematopoietic cells permit screening of multiple independent vectors incorporating both a critical set of transcriptional regulatory elements (the G1HEM or a derivative of it) and miR126 binding elements.


REFERENCES



  • 1. Nathan D G, Clarke B J, Hillman D G, Alter B P, Housman D E. Erythroid precursors in congenital hypoplastic (Diamond-Blackfan) anemia. The Journal of clinical investigation. 1978; 61(2):489-98. doi: 10.1172/JCI108960. PubMed PMID: 621285; PMCID: PMC372560.

  • 2. Iskander D, Psaila B, Gerrard G, Chaidos A, En Foong H, Harrington Y, Karnik L C, Roberts I, de la Fuente J, Karadimitris A. Elucidation of the EP defect in Diamond-Blackfan anemia by characterization and prospective isolation of human EPs. Blood. 2015; 125(16):2553-7. doi: 10.1182/blood-2014-10-608042. PubMed PMID: 25755292.

  • 3. Vlachos A, Ball S, Dahl N, Alter B P, Sheth S, Ramenghi U, Meerpohl J, Karlsson S, Liu J M, Leblanc T, Paley C, Kang E M, Leder E J, Atsidaftos E, Shimamura A, Bessler M, Glader B, Lipton J M, Participants of Sixth Annual Daniella Maria Arturi International Consensus C. Diagnosing and treating Diamond Blackfan anaemia: results of an international clinical consensus conference. Br J Haematol. 2008; 142(6):859-76. doi: 10.1111/j.1365-2141.2008.07269.x. PubMed PMID: 18671700; PMCID: PMC2654478.

  • 4. Draptchinskaia N, Gustavsson P, Andersson B, Pettersson M, Willig T N, Dianzani I, Ball S, Tchernia G, Klar J, Matsson H, Tentler D, Mohandas N, Carlsson B, Dahl N. The gene encoding ribosomal protein S19 is mutated in Diamond-Blackfan anaemia. Nat Genet. 1999; 21(2):169-75. doi: 10.1038/5951. PubMed PMID: 9988267.

  • 5. Flygare J, Karlsson S. Diamond-Blackfan anemia: erythropoiesis lost in translation. Blood. 2007; 109(8):3152-4. doi: 10.1182/blood-2006-09-001222. PubMed PMID: 17164339.

  • 6. Mirabello L, Khincha P P, Ellis S R, Giri N, Brodie S, Chandrasekharappa S C, Donovan F X, Zhou W, Hicks B D, Boland J F, Yeager M, Jones K, Zhu B, Wang M, Alter B P, Savage S A. Novel and known ribosomal causes of Diamond-Blackfan anaemia identified through comprehensive genomic characterisation. J Med Genet. 2017. doi: 10.1136/jmedgenet-2016-104346. PubMed PMID: 28280134.

  • 7. Landowski M, O'Donohue M F, Buros C, Ghazvinian R, Montel-Lehry N, Vlachos A, Sieff C A, Newburger P E, Niewiadomska E, Matysiak M, Glader B, Atsidaftos E, Lipton J M, Beggs A H, Gleizes P E, Gazda H T. Novel deletion of RPL15 identified by array-comparative genomic hybridization in Diamond-Blackfan anemia. Hum Genet. 2013; 132(11):1265-74. doi: 10.1007/s00439-013-1326-z. PubMed PMID: 23812780; PMCID: PMC3797874.

  • 8. Khatter H, Myasnikov A G, Natchiar S K, Klaholz B P. Structure of the human 80S ribosome. Nature. 2015; 520(7549):640-5. doi: 10.1038/nature14427. PubMed PMID: 25901680.

  • 9. Ulirsch J C, Verboon J M, Kazerounian S, Guo M H, Yuan D, Ludwig L S, Handsaker R E, Abdulhay N J, Fiorini C, Genovese G, Lim E T, Cheng A, Cummings B B, Chao K R, Beggs A H, Genetti C A, Sieff C A, Newburger P E, Niewiadomska E, Matysiak M, Vlachos A, Lipton J M, Atsidaftos E, Glader B, Narla A, Gleizes P E, O'Donohue M F, Montel-Lehry N, Amor D J, McCarroll S A, O'Donnell-Luria A H, Gupta N, Gabriel S B, MacArthur D G, Lander E S, Lek M, Da Costa L, Nathan D G, Korostelev A A, Do R, Sankaran V G, Gazda H T. The Genetic Landscape of Diamond-Blackfan Anemia. Am J Hum Genet. 2018; 103(6):930-47. doi: 10.1016/j.ajhg.2018.10.027. PubMed PMID: 30503522.

  • 10. Lipton J M, Ellis S R. Diamond-Blackfan anemia: diagnosis, treatment, and molecular pathogenesis. Hematology/oncology clinics of North America. 2009; 23(2):261-82. doi: 10.1016/j.hoc.2009.01.004. PubMed PMID: 19327583; PMCID: PMC2886591.

  • 11. Roy V, Perez W S, Eapen M, Marsh J C, Pasquini M, Pasquini R, Mustafa M M, Bredeson C N, Non-Malignant Marrow Disorders Working Committee of the International Bone Marrow Transplant R. Bone marrow transplantation for diamond-blackfan anemia. Biol Blood Marrow Transplant. 2005; 11(8):600-8. doi: 10.1016/j.bbmt.2005.05.005. PubMed PMID: 16041310.

  • 12. Narla A, Vlachos A, Nathan D G. Diamond Blackfan anemia treatment: past, present, and future. Semin Hematol. 2011; 48(2):117-23. doi: 10.1053/j.seminhematol.2011.01.004. PubMed PMID: 21435508; PMCID: PMC3073777.

  • 13. Sankaran V G, Ghazvinian R, Do R, Thiru P, Vergilio J A, Beggs A H, Sieff C A, Orkin S H, Nathan D G, Lander E S, Gazda H T. Exome sequencing identifies GATA1 mutations resulting in Diamond-Blackfan anemia. The Journal of clinical investigation. 2012; 122(7):2439-43. doi: 10.1172/JCI63597. PubMed PMID: 22706301; PMCID: PMC3386831.

  • 14. Parrella S, Aspesi A, Quarello P, Garelli E, Pavesi E, Carando A, Nardi M, Ellis S R, Ramenghi U, Dianzani I. Loss of GATA-1 full length as a cause of Diamond-Blackfan anemia phenotype. Pediatr Blood Cancer. 2014; 61(7):1319-21. doi: 10.1002/pbc.24944. PubMed PMID: 24453067; PMCID: PMC4684094.

  • 15. Ludwig L S, Gazda H T, Eng J C, Eichhorn S W, Thiru P, Ghazvinian R, George T I, Gotlib J R, Beggs A H, Sieff C A, Lodish H F, Lander E S, Sankaran V G. Altered translation of GATA1 in Diamond-Blackfan anemia. Nature medicine. 2014; 20(7):748-53. doi: 10.1038/nm.3557. PubMed PMID: 24952648; PMCID: PMC4087046.

  • 16. Klar J, Khalfallah A, Arzoo P S, Gazda H T, Dahl N. Recurrent GATA1 mutations in Diamond-Blackfan anaemia. Br J Haematol. 2014; 166(6):949-51. doi: 10.1111/bjh.12919. PubMed PMID: 24766296.

  • 17. Khajuria R K, Munschauer M, Ulirsch J C, Fiorini C, Ludwig L S, McFarland S K, Abdulhay N J, Specht H, Keshishian H, Mani D R, Jovanovic M, Ellis S R, Fulco C P, Engreitz J M, Schutz S, Lian J, Gripp K W, Weinberg O K, Pinkus G S, Gehrke L, Regev A, Lander E S, Gazda H T, Lee W Y, Panse V G, Carr S A, Sankaran V G. Ribosome Levels Selectively Regulate Translation and Lineage Commitment in Human Hematopoiesis. Cell. 2018; 173(1):90-103 e19. doi: 10.1016/j.cell.2018.02.036. PubMed PMID: 29551269; PMCID: PMC5866246.

  • 18. Mills E W, Green R. Ribosomopathies: There's strength in numbers. Science. 2017; 358(6363). doi: 10.1126/science.aan2755. PubMed PMID: 29097519.

  • 19. Ingolia N T, Ghaemmaghami S, Newman J R, Weissman J S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009; 324(5924):218-23. doi: 10.1126/science.1168978. PubMed PMID: 19213877; PMCID: PMC2746483.

  • 20. Ingolia N T. Ribosome Footprint Profiling of Translation throughout the Genome. Cell. 2016; 165(1):22-33. doi: 10.1016/j.cell.2016.02.066. PubMed PMID: 27015305; PMCID: PMC4917602.

  • 21. Notta F, Zandi S, Takayama N, Dobson S, Gan O I, Wilson G, Kaufmann K B, McLeod J, Laurenti E, Dunant C F, McPherson J D, Stein L D, Dror Y, Dick J E. Distinct routes of lineage development reshape the human blood hierarchy across ontogeny. Science. 2016; 351(6269):aab2116. doi: 10.1126/science.aab2116. PubMed PMID: 26541609; PMCID: PMC4816201.

  • 22. Velten L, Haas S F, Raffel S, Blaszkiewicz S, Islam S, Hennig B P, Hirche C, Lutz C, Buss E C, Nowak D, Boch T, Hofmann W K, Ho A D, Huber W, Trumpp A, Essers M A, Steinmetz L M. Human haematopoietic stem cell lineage commitment is a continuous process. Nature cell biology. 2017; 19(4):271-81. doi: 10.1038/ncb3493. PubMed PMID: 28319093; PMCID: PMC5496982.

  • 23. Paul F, Arkin Y, Giladi A, Jaitin D A, Kenigsberg E, Keren-Shaul H, Winter D, Lara-Astiaso D, Gury M, Weiner A, David E, Cohen N, Lauridsen F K, Haas S, Schlitzer A, Mildner A, Ginhoux F, Jung S, Trumpp A, Porse B T, Tanay A, Amit I. Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors. Cell. 2015; 163(7):1663-77. doi: 10.1016/j.cell.2015.11.013. PubMed PMID: 26627738.

  • 24. Sankaran V G, Weiss M J. Anemia: progress in molecular mechanisms and therapies. Nature medicine. 2015; 21(3):221-30. doi: 10.1038/nm.3814. PubMed PMID: 25742458; PMCID: 4452951.

  • 25. Debnath S, Jaako P, Siva K, Rothe M, Chen J, Dahl M, Gaspar H B, Flygare J, Schambach A, Karlsson S. Lentiviral Vectors with Cellular Promoters Correct Anemia and Lethal Bone Marrow Failure in a Mouse Model for Diamond-Blackfan Anemia. Molecular therapy: the journal of the American Society of Gene Therapy. 2017; 25(8):1805-14. doi: 10.1016/j.ymthe.2017.04.002. PubMed PMID: 28434866; PMCID: PMC5542636.

  • 26. Iwasaki H, Mizuno S, Wells R A, Cantor A B, Watanabe S, Akashi K. GATA-1 converts lymphoid and myelomonocytic progenitors into the megakaryocyte/erythrocyte lineages. Immunity. 2003; 19(3):451-62. PubMed PMID: 14499119.

  • 27. Takai J, Moriguchi T, Suzuki M, Yu L, Ohneda K, Yamamoto M. The Gata1 5′ region harbors distinct cis-regulatory modules that direct gene activation in erythroid cells and gene inactivation in HSCs. Blood. 2013; 122(20):3450-60. doi: 10.1182/blood-2013-01-476911. PubMed PMID: 24021675.

  • 28. Whyatt D, Lindeboom F, Karis A, Ferreira R, Milot E, Hendriks R, de Bruijn M, Langeveld A, Gribnau J, Grosveld F, Philipsen S. An intrinsic but cell-nonautonomous defect in GATA-1-overexpressing mouse erythroid cells. Nature. 2000; 406(6795):519-24. doi: 10.1038/35020086. PubMed PMID: 10952313.

  • 29. Ohneda K, Shimizu R, Nishimura S, Muraosa Y, Takahashi S, Engel J D, Yamamoto M. A minigene containing four discrete cis elements recapitulates GATA-1 gene expression in vivo. Genes Cells. 2002; 7(12):1243-54. PubMed PMID: 12485164.

  • 30. Schambach A, Bohne J, Chandra S, Will E, Margison G P, Williams D A, Baum C. Equal potency of gammaretroviral and lentiviral SIN vectors for expression of O6-methylguanine-DNA methyltransferase in hematopoietic cells. Mol Ther. 2006; 13(2):391-400. Epub 2005/10/18. doi: 10.1016/j.ymthe.2005.08.012. PubMed PMID: 16226060.

  • 31. Shimizu R, Hasegawa A, Ottolenghi S, Ronchi A, Yamamoto M. Verification of the in vivo activity of three distinct cis-acting elements within the Gata1 gene promoter-proximal enhancer in mice. Genes Cells. 2013; 18(11):1032-41. Epub 2013/10/15. doi: 10.1111/gtc.12096. PubMed PMID: 24118212.

  • 32. Sankaran V G, Ludwig L S, Sicinska E, Xu J, Bauer D E, Eng J C, Patterson H C, Metcalf R A, Natkunam Y, Orkin S H, Sicinski P, Lander E S, Lodish H F. Cyclin D3 coordinates the cell cycle during differentiation to regulate erythrocyte size and number. Genes Dev. 2012; 26(18):2075-87. Epub 2012/08/30. doi: 10.1101/gad.197020.112. PubMed PMID: 22929040; PMCID: 3444733.

  • 33. Sankaran V G, Menne T F, Scepanovic D, Vergilio J A, Ji P, Kim J, Thiru P, Orkin S H, Lander E S, Lodish H F. MicroRNA-15a and -16-1 act via MYB to elevate fetal hemoglobin expression in human trisomy 13. Proc Natl Acad Sci USA. 2011; 108(4):1519-24. Epub 2011/01/06. doi: 10.1073/pnas.1018384108. PubMed PMID: 21205891; PMCID: 3029749.

  • 34. Sankaran V G, Xu J, Byron R, Greisman H A, Fisher C, Weatherall D J, Sabath D E, Groudine M, Orkin S H, Premawardhena A, Bender M A. A functional element necessary for fetal hemoglobin silencing. N Engl J Med. 2011; 365(9):807-14. Epub 2011/09/02. doi: 10.1056/NEJMoa1103070. PubMed PMID: 21879898; PMCID: 3174767.

  • 35. Gentner B, Visigalli I, Hiramatsu H, Lechman E, Ungari S, Giustacchini A, Schira G, Amendola M, Quattrini A, Martino S, Orlacchio A, Dick J E, Biffi A, Naldini L. Identification of hematopoietic stem cell-specific miRNAs enables gene therapy of globoid cell leukodystrophy. Sci Transl Med. 2010; 2(58):58ra84. doi: 10.1126/scitranslmed.3001522. PubMed PMID: 21084719.

  • 36. Fiorini C, Abdulhay N J, McFarland S K, Munschauer M, Ulirsch J C, Chiarle R, Sankaran V G. Developmentally-faithful and effective human erythropoiesis in immunodeficient and Kit mutant mice. Am J Hematol. 2017; 92(9):E513-E9. doi: 10.1002/ajh.24805. PubMed PMID: 28568895; PMCID: PMC5546987.

  • 37. Ito E, Konno Y, Toki T, Terui K. Molecular pathogenesis in Diamond-Blackfan anemia. Int J Hematol. 2010 October; 92(3):413-8.



Example 2: Vector Design for Lineage-Specific Expression of Gata1 as a Therapy for Diamond-Blackfan Anemia

In some embodiments of any of the aspects, described herein are various combinations of the following lentiviral vectors (FIG. 7):


1) Lentiviral backbone: 3rd generation self-inactivating lentiviral backbone based on pHIV-GFP (Welm et al Cell Stem Cell. 2008 Jan. 10. 2(1):90-102), driven by an EF1a promoter and containing an IRES-GFP sequence for initial characterization and testing but which will be removed from the final vector sequence.


2) Mouse GATA1 hematopoietic enhancer minigene (mG1HEM): concatenation of 3 sequences upstream of the mouse GATA1 transcription start site and a fourth sequence from the first intron of mouse GATA1 that have been shown to faithfully allow expression of GATA1 in erythroid cells but not hematopoietic stem cells (Takai et al. Blood. 2013 Nov. 14 122(20):3450-3460).


3) Minimal promoter (minP): either from 5′UTR of mouse GATA1 or from firefly luciferase reporter vector pGL4.25, Genbank accession number DQ904457.1


4) Human GATA1 cDNA (GATA1) with codon optimization for optimal expression in human cells with or without FLAG tag


5) Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE) for enhanced stability of transgene mRNA.


6) miR126 binding site (miR126 BS): repeated sequence which is bound by miR126, a microRNA expressed in hematopoietic stem cells, and causes decreased transgene expression in the stem cell compartment (Gentner et al. Sci Transl Med. 2010 Nov. 17 2(58):58ra84).
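
The component list above can be sketched as a small data structure that enumerates candidate cassettes. This is a minimal illustration only: the 5′ to 3′ ordering shown (enhancer, minimal promoter, cDNA, WPRE, miR126 BS), the treatment of the miR126 BS as optional, the class and function names, and the idea of enumerating every combination are assumptions made for the example, not a statement of which combinations appear in FIG. 7.

```python
# Sketch only: enumerates hypothetical combinations of the listed components.
# The element order and the optional miR126 BS are assumptions for illustration.
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class VectorDesign:
    min_promoter: str   # source of the minimal promoter (minP)
    flag_tag: bool      # GATA1 cDNA with or without FLAG tag
    mir126_sites: bool  # whether the miR126 binding sites are included

    def layout(self) -> list[str]:
        """Return the cassette elements in an assumed 5' to 3' order."""
        cdna = "FLAG-GATA1 cDNA" if self.flag_tag else "GATA1 cDNA"
        parts = ["mG1HEM", f"minP ({self.min_promoter})", cdna, "WPRE"]
        if self.mir126_sites:
            parts.append("miR126 BS")
        return parts


MIN_PROMOTERS = ("mouse GATA1 5'UTR", "pGL4.25")


def enumerate_designs() -> list[VectorDesign]:
    """List every combination of minimal promoter, FLAG tag, and miR126 BS."""
    options = product(MIN_PROMOTERS, (False, True), (False, True))
    return [VectorDesign(p, f, m) for p, f, m in options]


designs = enumerate_designs()
for design in designs:
    print(" - ".join(design.layout()))
```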


REFERENCES



  • Welm et al. Cell Stem Cell. 2008 Jan. 10. 2(1):90-102.

  • Gentner et al. Sci Transl Med. 2010 Nov. 17 2(58):58ra84.



Example 3: Gata1 Gene Therapy as a Therapy for Diamond-Blackfan Anemia

Pre-clinical studies by the inventors have shown that GATA-1 augmentation in erythroid cells shows therapeutic effects in Diamond-Blackfan anemia (DBA). Herein, the inventors show the results of further experiments that demonstrate that the regulated increase in GATA1 expression in erythroid precursors, but not in hematopoietic stem cells, provides therapeutic effects in DBA.


A clinically relevant GATA1 gene therapy vector for DBA must achieve four crucial functions (FIG. 27). First, despite the requirement that a gene therapy vector gets incorporated into the genome of long-term, undifferentiated hematopoietic stem cells (LT-HSCs), there must be very little expression of the GATA1 transgene in the stem cell compartment, since GATA1 expression in HSCs leads to a loss of self-renewing stem cells. Second, to overcome the erythroid differentiation defect that is the hallmark of DBA, the gene therapy vector must drive robust expression in early progenitors once they have become committed to erythroid differentiation. Third, to mimic the pattern of endogenous GATA1 expression and achieve normal terminal erythroid differentiation, the expression from the gene therapy vector should decline at late stages of erythroid development. Fourth, developmentally regulated increased GATA1 expression must be sufficient to overcome the erythroid maturation block caused by ribosomal protein haploinsufficiency in experimental model systems and in primary patient samples.


To design a vector that incorporates the four key features above, the inventors first analyzed accessible chromatin peaks upstream of GATA1, and identified chromatin that is open in differentiating erythroid cells but not in HSCs or other early progenitors. The inventors provide evidence that these regions of DNA contain regulatory elements that are responsible for erythroid-specific expression of GATA1. The inventors constructed a human GATA1 enhancer (hG1E) element (FIG. 28A) by concatenating the 3 regions of DNA with open chromatin upstream of GATA1. The inventors developed a vector that uses the hG1E element to drive both GATA1 and GFP expression by including an internal ribosomal entry site (IRES) sequence between the two genes. As an additional mechanism to achieve developmentally regulated transgene expression, the inventors combined the hG1E element with a miR223T binding site that has been previously used to restrict transgene expression in the HSC compartment.


To assess whether hG1E-GATA1 or hG1E-GATA1-miR constructs can drive sufficient increases in GATA1 expression, the inventors used an in vitro model of DBA. Primary human CD34+ HSPCs were infected with an shRNA vector targeting the DBA gene RPS19, which the inventors have previously shown can mimic in vitro the erythroid differentiation defects that are characteristic of DBA. The inventors defined the erythroid ratio as the proportion of cells that express erythroid markers when cultured under erythropoietic conditions. When co-infected with the hG1E-GATA1 or hG1E-GATA1-miR vector, CD34+ HSPCs had a restored erythroid ratio after RPS19 knockdown, at levels comparable to constitutive GATA1 overexpression with the HMD-GATA1 vector, showing rescue of the DBA phenotype (FIG. 28B). As further evidence that hG1E-GATA1 and hG1E-GATA1-miR vectors can drive enough GATA1 expression to be physiologically relevant, the inventors used the G1E murine hematopoietic cell line that lacks endogenous GATA1 expression. Infection of G1E cells with the hG1E-GATA1 and hG1E-GATA1-miR vectors induced terminal erythroid differentiation, as measured by Ter119 expression (FIG. 28C).
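
The erythroid ratio defined above is a simple proportion, which the following Python sketch makes explicit. The cell counts, condition labels, and function names are made-up values for illustration and are not data from the experiments described; normalizing to a control condition is likewise an assumption about how such ratios might be compared.

```python
# Sketch only: the counts below are hypothetical, not experimental data.

def erythroid_ratio(marker_positive: int, total: int) -> float:
    """Proportion of cultured cells that express erythroid markers."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= marker_positive <= total:
        raise ValueError("marker_positive must lie between 0 and total")
    return marker_positive / total


def relative_to_control(sample_ratio: float, control_ratio: float) -> float:
    """Express a sample's erythroid ratio as a fraction of a control's."""
    if control_ratio <= 0:
        raise ValueError("control_ratio must be positive")
    return sample_ratio / control_ratio


# Hypothetical counts of marker-positive cells out of 1000 cells per condition.
control = erythroid_ratio(800, 1000)    # assumed control shRNA
knockdown = erythroid_ratio(400, 1000)  # assumed RPS19 shRNA alone
rescued = erythroid_ratio(720, 1000)    # assumed RPS19 shRNA plus GATA1 vector

print(relative_to_control(knockdown, control))
print(relative_to_control(rescued, control))
```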


Having achieved functionally sufficient increased GATA1 expression in erythroid progenitors, the inventors sought to determine whether the inventors' novel regulatory elements can restrict GATA1 expression in the LT-HSC compartment, since GATA1 expression in these cells would impair the maintenance of stem cells in the bone marrow. The inventors infected CD34+ HSPCs with the hG1E-GATA1 or hG1E-GATA1-miR vector and cultured them in conditions that enable short-term HSC maintenance in vitro. Two days after infection, GFP expression and surface expression of LT-HSC markers were assessed by flow cytometry to quantify transgene expression in LT-HSCs. These cells were then transferred to media that promotes erythroid development and GFP expression was measured in differentiated erythroid precursors. There was a significant increase in the ratio of GFP expression in erythroid cells to GFP in HSCs (RBCGFP/HSCGFP ratio) in the cells infected with hG1E-GATA1 and hG1E-GATA1-miR viruses compared to HMD-GATA1 virus that has constitutive expression of GATA1 (FIG. 28D). The increased RBCGFP/HSCGFP ratio is due to restricted expression of the experimental vectors in HSCs. These data reveal that regulated, increased GATA1 expression in erythroid precursors is sufficient to overcome the differentiation block in two distinct in vitro DBA models and has restricted expression in the LT-HSC compartment. This developmentally faithful increase in GATA1 expression shows that a gene therapy approach based on regulated GATA1 overexpression can be a viable cure for Diamond-Blackfan anemia.
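
To make the RBCGFP/HSCGFP comparison concrete, the sketch below computes the ratio per vector and expresses it relative to the constitutive vector. The GFP values are hypothetical, and treating "GFP expression" as a single summary intensity per compartment is an assumption; the text does not specify the exact readout.

```python
# Sketch only: GFP values are hypothetical summary intensities, not data.

def compartment_gfp_ratio(erythroid_gfp: float, hsc_gfp: float) -> float:
    """Ratio of GFP signal in erythroid cells to GFP signal in HSCs."""
    if hsc_gfp <= 0:
        raise ValueError("hsc_gfp must be positive")
    return erythroid_gfp / hsc_gfp


# vector name -> (GFP in erythroid cells, GFP in HSCs); made-up numbers
measurements = {
    "HMD-GATA1": (1000.0, 500.0),
    "hG1E-GATA1": (900.0, 90.0),
}

ratios = {
    vector: compartment_gfp_ratio(erythroid, hsc)
    for vector, (erythroid, hsc) in measurements.items()
}

# Fold change of each ratio relative to the constitutive vector.
baseline = ratios["HMD-GATA1"]
fold_over_constitutive = {vector: r / baseline for vector, r in ratios.items()}
print(fold_over_constitutive)
```

A higher ratio for a regulated vector can arise from lower GFP in HSCs, higher GFP in erythroid cells, or both, so the two underlying values are worth reporting alongside the ratio.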


To further investigate the expression of GATA1 from the hG1E-GATA1 vector in developing erythroid cells, the inventors used a three-phase culture system to induce human HSPCs to differentiate into fully hemoglobinized, enucleated red blood cells in vitro. During in vitro differentiation, developing erythroid progenitors and precursors first express high levels of the transferrin receptor CD71. Several days later, glycophorin A (CD235a) is highly expressed, followed by loss of CD71 expression in terminally differentiated RBCs (FIG. 5a). Following transduction with HMD-GATA1 or hG1E-GATA1, cells that are already primed for erythroid development undergo more rapid early differentiation measured by percentage of cells expressing CD71 compared to negative controls (FIG. 29B). Next, the inventors compared the GFP expression in the terminally differentiated CD71-CD235a+ subset with GFP expression in the more primitive CD71+CD235a+ subset (ErythrocyteGFP/progenitorGFP). There is significantly decreased GFP expression from the hG1E-GATA1 vector in terminally differentiated erythrocytes, faithfully recapitulating the pattern of decreased GATA1 expression during terminal differentiation. Notably, but not unexpectedly, this decreased GFP expression was not seen in the HMD-GATA1 samples, indicating impaired terminal differentiation with unregulated GATA1 expression (FIG. 29C).
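
The staging logic described above (CD71 first, then CD235a, then loss of CD71) and the ErythrocyteGFP/progenitorGFP comparison can be sketched as follows. The per-cell records, the stage labels, and the use of a mean GFP value per subset are assumptions for illustration; real flow cytometry gating would use thresholds set from controls.

```python
# Sketch only: the per-cell records are hypothetical, not experimental data.
from statistics import mean


def erythroid_stage(cd71_positive: bool, cd235a_positive: bool) -> str:
    """Assign a coarse stage label from CD71 and CD235a positivity."""
    if cd71_positive and cd235a_positive:
        return "CD71+CD235a+"   # more primitive erythroid subset
    if cd235a_positive:
        return "CD71-CD235a+"   # terminally differentiated subset
    if cd71_positive:
        return "CD71+CD235a-"   # early erythroid cells
    return "CD71-CD235a-"       # not yet expressing either marker


def erythrocyte_to_progenitor_gfp(cells: list[tuple[bool, bool, float]]) -> float:
    """Mean GFP of CD71-CD235a+ cells divided by mean GFP of CD71+CD235a+ cells."""
    by_stage: dict[str, list[float]] = {}
    for cd71, cd235a, gfp in cells:
        by_stage.setdefault(erythroid_stage(cd71, cd235a), []).append(gfp)
    terminal = by_stage.get("CD71-CD235a+")
    progenitor = by_stage.get("CD71+CD235a+")
    if not terminal or not progenitor:
        raise ValueError("both subsets must contain at least one cell")
    return mean(terminal) / mean(progenitor)


# (CD71 positive, CD235a positive, GFP intensity); made-up values
example_cells = [
    (True, True, 100.0),
    (True, True, 120.0),
    (False, True, 20.0),
    (False, True, 24.0),
    (True, False, 50.0),
]

print(erythrocyte_to_progenitor_gfp(example_cells))
```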


Next, the inventors sought to recapitulate RPS19 haploinsufficiency in primary HSPCs isolated from healthy adult donors by using CRISPR/Cas9-mediated gene disruption of RPS19. The inventors showed that efficient editing of RPS19 led to an erythroid maturation block with significantly fewer cells expressing CD71 during early erythroid culture. The inventors then transduced RPS19-edited HSPCs with HMD-empty, HMD-GATA1, or hG1E-GATA1 virus. Of the cells that were committed to erythroid differentiation on day 4 in culture (as measured by CD71 expression), the population infected with HMD-GATA1 or hG1E-GATA1 virus had more CD235 expression (FIG. 30A), confirming the ability of regulated increase of GATA1 expression to rescue the block in erythroid differentiation induced by loss of a ribosomal protein as is seen in DBA. Finally, there was a significant reduction in erythroid colonies detected in a methylcellulose colony forming assay after RPS19 editing that was partially rescued by hG1E-GATA1 (FIG. 30B). Altogether, the inventors' data reveal that the hG1E-GATA1 vector satisfies all four criteria that are required to be a gene therapy cure for DBA (FIG. 27).

Claims
  • 1. A nucleic acid sequence comprising a) at least one heterologous regulatory sequence selected from a hematopoietic enhancer element and miRNA binding site for a HSC restricted miRNA; and b) a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
  • 2. The nucleic acid sequence of claim 1, comprising at least one hematopoietic enhancer element.
  • 3. (canceled)
  • 4. The nucleic acid sequence of claim 2, wherein the enhancer element comprises an enhancer element of a gene selected from the group consisting of: Kell metalloendopeptidase (KEL); 5′ aminolevulinate synthase 2 (ALAS2); and glycophorin A (GYPA).
  • 5. The nucleic acid sequence of claim 1, comprising at least one miRNA binding site for at least one HSC-restricted miRNA.
  • 6. The nucleic acid sequence of claim 1, wherein the at least one miRNA binding site for at least one HSC-restricted miRNA is selected from the group consisting of miR binding sites for miR10aT, miR125, miR155, miR130aT, miR142T, miR196bT, miR99, miR126, miR181, miR193, miR223T, miR542, and let7e.
  • 7. The nucleic acid sequence of claim 1, comprising at least one hematopoietic enhancer element and at least one miRNA binding site for at least one HSC-restricted miRNA.
  • 8. The nucleic acid sequence of claim 1, further comprising: a) a heterologous 5′ UTR comprising: i) a 5′UTR sequence of a hematopoietic transcription factor other than GATA1; ii) a sequence of at least 20 nucleotide acids; and/or iii) 1-25 upstream codons uAUGs; and/or b) a hematopoietic enhancer minigene.
  • 9. A nucleic acid sequence comprising a) a 5′ UTR comprising: i) a 5′UTR sequence of a hematopoietic transcription factor other than GATA1; ii) a sequence of at least 20 nucleotide acids; and/or iii) 1-25 upstream codons uAUGs; and b) a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
  • 10. The nucleic acid sequence of claim 1, wherein the 5′UTR comprises a 5′UTR of a gene selected from the group consisting of: Runt-related transcription factor 1 (RUNX1), LIM Domain Only 2 (LMO2), or ETS Variant 6 (ETV6).
  • 11. The nucleic acid sequence of claim 1, further comprising at least one hematopoietic enhancer element, miRNA binding site for a HSC restricted miRNA, and/or a hematopoietic enhancer minigene (G1HEM).
  • 12. A nucleic acid sequence comprising a) a hematopoietic enhancer minigene (G1HEM); and b) a sequence encoding a GATA-binding factor 1 (GATA1) polypeptide.
  • 13. (canceled)
  • 14. (canceled)
  • 15. (canceled)
  • 16. The nucleic acid sequence of claim 1, wherein the binding site for at least one HSC restricted miRNA comprises a sequence selected from SEQ ID NOs: 31-37 and 43-55.
  • 17. The nucleic acid sequence of claim 1, wherein the hematopoietic enhancer element comprises a sequence with at least 80% sequence identity to a sequence selected from SEQ ID NOs: 10, 11, 12, 38, and 39.
  • 18. The nucleic acid sequence of claim 1, wherein the 5′ UTR sequence comprises a sequence with at least 80% sequence identity to a sequence selected from SEQ ID NOs: 14, 15, and 16.
  • 19. The nucleic acid sequence of claim 1, wherein the sequence comprises a promoter operably linked to the elements of a) and b).
  • 20. The nucleic acid sequence of claim 19, wherein the promoter is not a GATA1 promoter.
  • 21. The nucleic acid sequence of claim 20, wherein the promoter comprises a promoter sequence of Elongation factor 1-alpha 1 (eEF1a1).
  • 22. (canceled)
  • 23. The nucleic acid sequence of claim 1, further comprising: a posttranscriptional regulatory element operably linked to the sequence encoding the GATA1 polypeptide.
  • 24. The nucleic acid sequence of claim 23, wherein the posttranscriptional regulatory element comprises a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element (WPRE).
  • 25. The nucleic acid sequence of claim 1, further comprising an internal ribosome entry site.
  • 26. The nucleic acid sequence of claim 25, wherein the internal ribosome entry site is operably linked to a marker gene and wherein the marker gene encodes an optically visible protein or an enzyme.
  • 27. The nucleic acid sequence of claim 1, wherein the sequence comprises a sequence selected from SEQ ID NOs 8, 9, 61, and 62.
  • 28. (canceled)
  • 29. (canceled)
  • 30. (canceled)
  • 31. (canceled)
  • 32. A method of treating Diamond-Blackfan Anemia in a subject in need thereof, the method comprising administering a therapeutically effective amount of a nucleic acid sequence, particle, or composition of claim 1 to the patient.
  • 33. (canceled)
  • 34. (canceled)
  • 35. (canceled)
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 62/859,369, filed Jun. 10, 2019, the content of which is incorporated herein by reference in its entirety.

GOVERNMENT SUPPORT

This invention was made with government support under Grant Nos: R1 DK103794 and R33 HL120791 awarded by the National Institutes of Health. The government has certain rights in the invention.

PCT Information
Filing Document Filing Date Country Kind
PCT/US2020/036600 6/8/2020 WO
Provisional Applications (1)
Number Date Country
62859369 Jun 2019 US