The instant application contains a Sequence Listing that has been submitted in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Mar. 1, 2022, is named H049870729W000-SEQ-FL.TXT and is 88,611 bytes in size.
Existing approaches for the integration and expression of genes of interest in a desired human cellular context are marred by the safety concerns related to either the random nature of viral-mediated integration or unpredictable pattern of gene expression in currently employed targeted genomic integration sites. Disadvantages of these methods lead to their limited use in clinical practice, thus encouraging future research in identifying novel human genomic sites that allow for predictable and safe expression of genes of interest.
Provided herein, in some aspects, are methods and compositions for targeting novel genomic safe harbor sites in the human genome. A bioinformatic search was conducted followed by experimental validation of these genomic safe harbor sites, including at least two that demonstrated stable expression of integrated reporter and therapeutic genes without detrimental changes to cellular transcriptome. The cell-type agnostic criteria used in the bioinformatic search described herein suggest wide-scale applicability of the newly-identified sites for engineering of, for example, a diverse range of tissues for therapeutic as well as enhancement purposes, including modified T-cells for cancer therapy and engineered skin cells to ameliorate inherited diseases and aging. Additionally, the stable and robust levels of gene expression from identified sites enable their use, for example, in industry-scale biomanufacturing of desired proteins in human cells.
Some aspects of the present disclosure provide an engineered nucleic acid targeting vector comprising a sequence of interest flanked by homology arms, each homology arm comprising a sequence homologous to a sequence in a safe harbor site in the human genome in any one of the following loci: 1q31, 3p24, 7q35, and Xq21.
In some embodiments, the safe harbor site is at position 31 on the long arm of chromosome 1 (1q31). For example, the safe harbor site may be at position 31.3 on the long arm of chromosome 1 (1q31.3). In some embodiments, the safe harbor site is within coordinates 195,338,589-195,818,588[GRCh38/hg38] of 1q31.3.
In some embodiments, the safe harbor site is at position 24 on the short arm of chromosome 3 (3p24). For example, the safe harbor site may be at position 24.3 on the short arm of chromosome 3 (3p24.3). In some embodiments, the safe harbor site is within coordinates 22,720,711-22,761,389[GRCh38/hg38] of 3p24.3.
In some embodiments, the safe harbor site is at position 35 of the long arm of chromosome 7 (7q35). For example, the safe harbor site may be within coordinates 145,090,941-145,219,513[GRCh38/hg38] of 7q35. As another example, the safe harbor site may be within coordinates 145,320,384-145,525,881[GRCh38/hg38] of 7q35.
In some embodiments, the safe harbor site is at position 21 in the long arm of chromosome X (Xq21). For example, the safe harbor site may be at position 21.31 in the long arm of chromosome X (Xq21.31). In some embodiments, the safe harbor site is within coordinates 89,174,426-89,179,074[GRCh38/hg38] of Xq21.31.
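By way of non-limiting illustration, the locus notation used above (e.g., 1q31.3 or Xq21.31) can be split into its parts programmatically. The following is a minimal sketch only; the names `CytoLocation` and `parse_locus` are hypothetical and are not part of the disclosure, and the pattern assumes the simple chromosome-arm-position form used in this document.

```python
import re
from typing import NamedTuple, Optional


class CytoLocation(NamedTuple):
    """Hypothetical container for the parts of a locus such as '1q31.3'."""
    chromosome: str          # "1" to "22", "X" or "Y"
    arm: str                 # "p" (short arm) or "q" (long arm)
    position: str            # e.g. "31"
    sub_band: Optional[str]  # e.g. "3"; None when no sub-band is given


# Assumed pattern: chromosome, arm letter, position digits, optional ".digits".
_LOCUS_PATTERN = re.compile(r"^([0-9]{1,2}|X|Y)([pq])([0-9]+)(?:\.([0-9]+))?$")


def parse_locus(locus: str) -> CytoLocation:
    """Split a locus string into chromosome, arm, position and sub-band."""
    match = _LOCUS_PATTERN.match(locus)
    if match is None:
        raise ValueError(f"unrecognized locus notation: {locus!r}")
    chromosome, arm, position, sub_band = match.groups()
    return CytoLocation(chromosome, arm, position, sub_band)
```

For example, `parse_locus("3p24.3")` yields chromosome 3, the short arm (p), position 24 and sub-band 3.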
In some embodiments, the sequence of interest comprises an open reading frame.
In some embodiments, the vector comprises a promoter operably linked to the sequence of interest.
In some embodiments, the sequence of interest comprises or is within a gene of interest. In some embodiments, the gene of interest is selected from Table 2.
In some embodiments, the vector is a double-stranded DNA vector. In some embodiments, the sequence of interest is flanked by regions that enable circularization, for example, via trans-splicing or other means upon expression. See, e.g., Santer L et al. Mol Ther. 2019 Aug. 7; 27(8):1350-1363 and Meganck R M et al. Mol Ther Nucleic Acids. 2021 Jan. 16; 23:821-834, each of which is incorporated by reference herein.
In some embodiments, each homology arm has a length of about 200 to about 500 base pairs (bp), optionally 300 bp.
In some embodiments, each homology arm is a microhomology arm having a length of about 5 to 50 bp, optionally 40 bp.
In some embodiments, the vector further comprises a sequence encoding at least one guide RNA that specifically targets the sequence in the safe harbor site and/or specifically targets a sequence in or near the homology arms.
In some embodiments, the vector further comprises a sequence encoding a programmable nuclease.
Other aspects of the present disclosure provide a delivery system, for example, a viral vector (e.g., adeno-associated virus (AAV)) or a non-viral vector, such as a synthetic lipid nanoparticle or liposome, comprising the vector of any one of the preceding embodiments.
In some embodiments, the delivery system further comprises a programmable nuclease or a nucleic acid encoding the programmable nuclease.
In some embodiments, the programmable nuclease is selected from ZFNs, TALENs, DNA-guided nucleases, and RNA-guided nucleases.
In some embodiments, the programmable nuclease is an RNA-guided nuclease. In some embodiments, the RNA-guided nuclease is a CRISPR Cas nuclease and the delivery system further comprises a guide RNA (gRNA) or a nucleic acid encoding the gRNA. In some embodiments, the CRISPR Cas nuclease is a Cas9 nuclease or a Cas12 nuclease. In some embodiments, the gRNA specifically targets the sequence in the safe harbor site and/or specifically targets a sequence in or near the homology arms. In some embodiments, the delivery system includes a cationic polymer conjugated to a ribonucleoprotein (RNP) (e.g., Cas enzyme, such as Cas9, bound to a gRNA).
Yet other aspects of the present disclosure provide a method comprising delivering to a human cell the delivery system of any one of the preceding embodiments.
Still other aspects of the present disclosure provide a method comprising delivering to a human cell the engineered targeting vector of any one of the preceding embodiments.
In some embodiments, a method further comprises delivering to the human cell a programmable nuclease or a nucleic acid encoding the programmable nuclease.
In some embodiments, a method further comprises incubating the human cell to modify the safe harbor site to include the sequence of interest.
In some embodiments, the human cell is a stem cell (e.g., an induced pluripotent stem cell (iPSC)), an immune cell (e.g., T cell), or a mesenchymal cell (e.g., fibroblast). In some embodiments, the human cell is a stem cell. In some embodiments, the human cell is an iPSC. In some embodiments, the human cell is a hematopoietic stem cell. In some embodiments, the human cell is a fibroblast (e.g., primary human dermal fibroblast). In some embodiments, the human cell is an embryonic kidney cell (e.g., HEK293T cell). In some embodiments, the human cell is a Jurkat cell. In some embodiments, the human cell is an immune cell. In some embodiments, the human cell is a T cell (e.g., a primary human T cell). In some embodiments, the human cell is a B cell. In some embodiments, the human cell is an NK cell. In some embodiments, the human cell is a mesenchymal cell. In some embodiments, the human cell is a mesenchymal stem cell. In some embodiments, the human cell is a fibroblast.
Still other aspects of the present disclosure provide a method comprising delivering to a subject the delivery system of any one of the preceding embodiments.
Other aspects of the present disclosure provide a method comprising delivering to a subject the engineered targeting vector of any one of the preceding embodiments.
In some embodiments, a method further comprises delivering to the subject a programmable nuclease or a nucleic acid encoding the programmable nuclease.
In some embodiments, the programmable nuclease delivered to the subject is selected from ZFNs, TALENs, DNA-guided nucleases, and RNA-guided nucleases. In some embodiments, the programmable nuclease is an RNA-guided nuclease. In some embodiments, the RNA-guided nuclease is a CRISPR Cas nuclease and the delivery system further comprises a guide RNA or a nucleic acid encoding the gRNA. In some embodiments, the CRISPR Cas nuclease is a Cas9 nuclease or a Cas12 nuclease. In some embodiments, the gRNA specifically targets the sequence in the safe harbor site and/or specifically targets a sequence in or near the homology arms.
In some embodiments, the subject has a medical condition selected from Table 1. In some embodiments, the gene of interest is selected from Table 1. In some embodiments, the gene of interest is a variant of a gene selected from Table 1.
Some aspects of the present disclosure provide a guide RNA comprising a sequence homologous to a sequence in a safe harbor site in the human genome in any one of the following loci: 1q31, 3p24, 7q35, and Xq21.
Other aspects of the present disclosure provide a delivery system comprising the guide RNA of the preceding paragraph.
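By way of non-limiting illustration, candidate guide RNA target sequences within a safe harbor sequence can be enumerated in silico. The sketch below assumes a Cas9 nuclease that recognizes a 5'-NGG-3' protospacer adjacent motif (PAM) immediately 3' of a 20-nucleotide protospacer, as is the case for Streptococcus pyogenes Cas9; the disclosure does not limit the nuclease to this enzyme. The function name `find_protospacers` is hypothetical, and only the forward strand is scanned.

```python
from typing import List, Tuple


def find_protospacers(sequence: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for every NGG PAM on the forward strand.

    Assumption: an NGG PAM directly follows the protospacer. The reverse
    strand and any off-target scoring are deliberately left out of this sketch.
    """
    sequence = sequence.upper()
    hits = []
    # i is the index of the first PAM base (the "N" of NGG).
    for i in range(spacer_len, len(sequence) - 2):
        if sequence[i + 1:i + 3] == "GG":
            start = i - spacer_len
            hits.append((start, sequence[start:i], sequence[i:i + 3]))
    return hits
```

Each returned protospacer is a candidate only; specificity for the safe harbor site would still need to be assessed against the rest of the genome.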
Some aspects of the present disclosure provide a method comprising genetically modifying a safe harbor site in the human genome in any one of the following loci: 1q31, 3p24, 7q35, and Xq21.
Other aspects of the present disclosure provide an engineered nucleic acid targeting vector comprising a sequence of interest flanked by homology arms, wherein each homology arm comprises a sequence homologous to a safe harbor site in the human genome that is at least 50 kb from any known gene, at least 20 kb from an enhanced region, at least 150 kb from a lncRNA and a tRNA, at least 300 kb from any known oncogene, at least 300 kb from a miRNA, and at least 300 kb from a telomere and a centromere.
Yet other aspects of the present disclosure provide a method comprising identifying a safe harbor site in the human genome that is at least 50 kb from any known gene, at least 20 kb from an enhanced region, at least 150 kb from a lncRNA and a tRNA, at least 300 kb from any known oncogene, at least 300 kb from a miRNA, and at least 300 kb from a telomere and a centromere.
Still other aspects of the present disclosure provide a method comprising amplifying a sequence from a safe harbor site in the human genome that is at least 50 kb from any known gene, at least 20 kb from an enhanced region, at least 150 kb from a lncRNA and a tRNA, at least 300 kb from any known oncogene, at least 300 kb from a miRNA, and at least 300 kb from a telomere and a centromere.
Further aspects of the present disclosure provide a method comprising modifying a sequence in a safe harbor site in the human genome that is at least 50 kb from any known gene, at least 20 kb from an enhanced region, at least 150 kb from a lncRNA and a tRNA, at least 300 kb from any known oncogene, at least 300 kb from a miRNA, and at least 300 kb from a telomere and a centromere.
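By way of non-limiting illustration, the distance criteria recited above can be expressed as a simple filter over candidate sites. The following is a minimal sketch, assuming that the distance (in kb) from a candidate site to the nearest feature of each type has already been computed elsewhere; the dictionary keys and function names are hypothetical, and the thresholds are those stated in the text.

```python
from typing import Dict, List

# Minimum distances in kilobases, as recited in the text.
# The key names are illustrative labels only.
MIN_DISTANCE_KB: Dict[str, float] = {
    "gene": 50,
    "enhanced_region": 20,  # wording follows the text
    "lncRNA": 150,
    "tRNA": 150,
    "oncogene": 300,
    "miRNA": 300,
    "telomere": 300,
    "centromere": 300,
}


def failed_criteria(distances_kb: Dict[str, float]) -> List[str]:
    """List the feature types that lie closer than the required minimum."""
    return [
        feature
        for feature, minimum in MIN_DISTANCE_KB.items()
        if distances_kb[feature] < minimum
    ]


def satisfies_criteria(distances_kb: Dict[str, float]) -> bool:
    """True when the candidate site meets every distance criterion."""
    return not failed_criteria(distances_kb)
```

A candidate site located 40 kb from the nearest known gene, for instance, would be rejected on the first criterion regardless of its distance from the other feature types.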
Other aspects provide a method comprising introducing a polynucleotide (e.g., gene of interest) into a safe harbor site in a human cell ex vivo and producing a polypeptide (e.g., protein encoded by the gene of interest), wherein the safe harbor site is selected from any one of Table 1, optionally 1q31, 3p24, 7q35, or Xq21.
In some embodiments, the polynucleotide (e.g., gene of interest) encodes a therapeutic protein. In some embodiments, the therapeutic protein is an antibody, for example, selected from a human antibody, a humanized antibody, and a chimeric antibody. An antibody may be a whole antibody or a fragment. In some embodiments, the antibody is a monoclonal antibody. In some embodiments, the antibody is a NANOBODY® or a camelid antibody. Other antibodies are contemplated herein.
In some embodiments, the polynucleotide comprises a viral polynucleotide (e.g., encoding a viral protein). The viral polynucleotide may encode, for example, an adenovirus protein, an adeno-associated virus (AAV) protein, a retrovirus protein, or a Herpes virus protein. In some embodiments, the polynucleotide is a gene therapy vector (e.g., a recombinant AAV vector). For example, the polynucleotide may include one or more of a promoter, enhancer, intron, exon, stop signals, polyadenylation signals, inverted terminal repeat (ITR) sequences, replication (rep) genes, capsid (cap) coding sequences, helper genes, or other sequences used in producing a gene therapy vector, such as a recombinant AAV vector.
Development of technologies for predictable, durable and safe expression of desired genetic constructs (e.g., transgenes) in human cells will contribute significantly to the improvement of gene and cell therapies (Bestor, 2000; Ellis, 2005), as well as protein manufacturing (Lee et al., 2019). One prominent beneficiary of such technologies is the class of genetically engineered T-cell therapies, which require genomic integration of transgenes encoding novel immune receptors (Chen et al., 2020; Richardson et al., 2019); another example is gene therapy for highly proliferating tissues, such as inherited skin disorders, in which entire wild-type gene copies have to be integrated into epidermal stem cells (Droz-Georget Lathion et al., 2015; Hirsch et al., 2017). Advances in genome editing using targeted integration tools (Maeder and Gersbach, 2016) already allow precise genomic delivery and sustained expression of transgenes in certain cellular contexts, such as chimeric antigen receptors (CARs) integrated into the T cell receptor alpha chain locus in T-cells (Eyquem et al., 2017), and coagulation factors delivered to hepatocytes using recombinant adeno-associated viral (rAAV) vectors (Barzel et al., 2015). These applications, however, are limited to specific cell types and cause disruption to the endogenous genes, limiting the diversity of cellular engineering applications. Specific loci in the human genome that support stable and efficient transgene expression without detrimentally altering cellular functions are known as Genomic Safe Harbor (GSH) sites. Thus, precise integration of functional genetic constructs into GSH sites greatly enhances genome engineering safety and efficacy for clinical and biotechnology applications.
Empirical studies have identified three sites that support long-term expression of transgenes: AAVS1, CCR5 and hRosa26, all of which were established without any a priori safety assessment of the genomic loci in which they reside (Papapetrou and Schambach, 2016). The AAVS1 site, located in an intron of the PPP1R12C gene region, has been observed to be a region for rare genomic integration events of the adeno-associated virus's payload (Oceguera-Yanez et al., 2016). Despite being successfully implemented for durable transgene expression in numerous cell types (Hong et al., 2017), the AAVS1 site is located in a gene-dense region, suggesting potential disruption of expression profiles of genes located in the vicinity of this locus (Sadelain et al., 2012). Additionally, studies indicated frequent transgene silencing and a decrease in growth rate following transgene integration into AAVS1 (Ordovas et al., 2015; Shin et al., 2020), which represents a liability for clinical gene therapy. The second site lies within the CCR5 gene, which encodes a protein involved in chemotaxis and also serves as a co-receptor for HIV cellular entry in T cells (Jiao et al., 2019). Serendipitously, researchers identified that the naturally-occurring CCR5-delta-32 mutation present in people of Scandinavian origin results in an HIV-resistant phenotype (Silva and Stumpf, 2004). This finding suggested disposability of this gene and applicability of the CCR5 locus for targeted genome engineering, especially for T cell therapies (Lombardo et al., 2011; Sather et al.). However, similar to AAVS1, the CCR5 locus is located in a gene-rich region, surrounded by tumor associated genes (Sadelain et al., 2012), thus severely limiting its safe use for therapeutic purposes. Additionally, CCR5 expression has been associated with promoting functional recovery following stroke (Joy et al., 2019); thus, disrupting CCR5 may be undesirable in clinical practice.
The third site, the human Rosa26 (hRosa26) locus, was computationally predicted by searching the human genome for orthologous sequences of the mouse Rosa26 (mRosa26) locus (Irion et al., 2007). The mRosa26 locus was originally identified in mouse embryonic stem cells by using random integration by lentiviral-mediated delivery of gene trapping constructs consisting of promoterless transgenes (β-galactosidase and neomycin phosphotransferase), resulting in sustainable expression of these transgenes throughout embryonic development (Friedrich and Soriano, 1991; Zambrowicz et al., 1997). Similar to the other two currently employed GSH sites, hRosa26 is located in an intron of a coding gene, THUMPD3 (Irion et al., 2007), the function of which is still not fully characterized. This site is also surrounded by proto-oncogenes in its immediate vicinity (Sadelain et al., 2012), which may be upregulated following transgene insertion, thus potentially limiting the use of hRosa26 in clinical settings.
Attempts have been made to identify new human GSH sites that would satisfy various safety criteria, thus avoiding the disadvantages of existing sites. One approach developed by Sadelain and colleagues used lentiviral transduction of beta-globin and green fluorescent protein (GFP) genes into induced pluripotent stem cells (iPSCs), followed by the assessment of the integration sites in terms of their linear distance from various coding and regulatory elements in the genome, such as cancer genes, miRNAs and ultraconserved regions (Papapetrou et al., 2011). They discovered one lentiviral integration site that satisfied all of the proposed criteria, demonstrating sustainable expression upon erythroid differentiation of iPSCs. However, global transcriptome profile alterations of cells with transgenes integrated into this site were not assessed. A similar approach by Weiss and colleagues used lentiviral integration in Chinese hamster ovary (CHO) cells to identify sites supporting long-term protein expression for biotechnological applications (e.g., recombinant monoclonal antibody production) (Gaidukov et al., 2018). Although this study led to the evaluation of multiple sites for durable, high-level transgene expression in CHO cells, no extrapolation to human genomic sites was made. Another study aimed at identifying novel GSHs through a bioinformatic search of mCreI sites residing in loci that satisfy GSH criteria (Pellenz et al., 2019). As in previous work, several stably expressing sites were identified and proposed for synthetic biology applications in humans. However, local and global gene expression profiling following integration events in these sites has not yet been assessed.
All of the potential new GSH sites possess a shared limitation of being narrowed by lentiviral- or Cre-based integration. Additionally, safety assessments of these newly identified sites, as well as previously established AAVS1, CCR5 and Rosa26, were carried out by evaluating the differential gene expression of genes located solely in the vicinity of these integration sites, without observing global transcriptomic changes following integration. A more comprehensive bioinformatic-guided and genome-wide search of GSH sites based on established criteria, followed by experimental assessment of transgene expression durability in various cell types and safety assessment using global transcriptome profiling would, thus, lead to the identification of a more reliable and clinically useful genomic region.
In the studies described herein, bioinformatic screening was used to rationally identify multiple sites that satisfy established as well as newly introduced GSH criteria. CRISPR/Cas9 targeted genome editing was used to individually integrate a reporter gene into these sites to monitor long-term expression of the transgene in HEK293T and Jurkat cells. This experimental evaluation in cell lines was followed by testing of two promising candidate sites in primary human T-cells and human dermal fibroblasts using reporter and therapeutic transgenes, respectively. Finally, bulk and single-cell RNA-sequencing experiments were performed to analyze the transcriptomic effects of such integrations into these two newly established GSH sites.
A genome is an organism's complete set of deoxyribonucleic acid (DNA), which contains the genetic instructions needed to develop and direct the activities of every organism. The genes encoded by DNA reside in chromosomes, which are organized packages of DNA found in the nucleus of the cell. Different organisms have different numbers of chromosomes. The human genome contains 23 pairs of chromosomes within the nucleus of all cells: 22 pairs of numbered chromosomes (autosomes); and one pair of sex chromosomes, X and Y.
A gene's cytogenetic location is described in a standardized way, based on the position of a particular band on a stained chromosome, or as a range of bands, if less is known about the exact location. The combination of numbers and letters provides a gene's “address” on a chromosome. This address is made up of several parts, including:
A genomic safe harbor site (SHS or GSH site) is a genomic location where new genes or genetic elements (e.g., promoter, enhancer, etc.) can be introduced into a genome without disrupting the expression or regulation of adjacent genes. These GSH sites are important, inter alia, for effective human disease gene therapies; for investigating gene structure, function and regulation; and for cell marking and tracking. The most widely used human GSH sites were identified by serendipity (e.g., the AAVS1 adeno-associated virus insertion site on chromosome 19); by homology with useful SHS in other species (e.g., the human homolog of the murine Rosa26 locus); and most recently by recognition of the dispensability of a subset of human genes in most or all individuals (e.g., the CCR5 chemokine receptor gene, which when deleted confers resistance to HIV infection).
Provided herein are newly-identified genomic safe harbor sites that may be targeted for stable gene expression without detrimental changes to the cellular transcriptome, for example. Thus, the present disclosure provides, in some embodiments, compositions and methods for targeting any one or more of the genomic safe harbor site(s) identified in Table 1.
In some embodiments, the genomic safe harbor site is on chromosome 1. In some embodiments, the genomic safe harbor site is on the long arm of chromosome 1. In some embodiments, the genomic safe harbor site is at position 31 on the long arm of chromosome 1. For example, the genomic safe harbor site may be at position 31.3 on the long arm of chromosome 1. In some embodiments, the genomic safe harbor site is at position 31.3, coordinates 195,338,589-195,818,588[GRCh38/hg38], on the long arm of chromosome 1.
In some embodiments, the genomic safe harbor site is on chromosome 3. In some embodiments, the genomic safe harbor site is on the short arm of chromosome 3. In some embodiments, the genomic safe harbor site is at position 24 on the short arm of chromosome 3. For example, the genomic safe harbor site may be at position 24.3 on the short arm of chromosome 3. In some embodiments, the genomic safe harbor site is at position 24.3, coordinates 22,720,711-22,761,389[GRCh38/hg38], on the short arm of chromosome 3.
In some embodiments, the genomic safe harbor site is on chromosome 7. In some embodiments, the genomic safe harbor site is on the long arm of chromosome 7. In some embodiments, the genomic safe harbor site is at position 35 on the long arm of chromosome 7. For example, the genomic safe harbor site may be at position 35, coordinates 145,090,941-145,219,513[GRCh38/hg38], on the long arm of chromosome 7. In some embodiments, the genomic safe harbor site may be at position 35, coordinates 145,320,384-145,525,881[GRCh38/hg38], on the long arm of chromosome 7.
In some embodiments, the genomic safe harbor site is on chromosome X. In some embodiments, the genomic safe harbor site is on the long arm of chromosome X. In some embodiments, the genomic safe harbor site is at position 21 on the long arm of chromosome X. For example, the genomic safe harbor site may be at position 21.31 on the long arm of chromosome X. In some embodiments, the genomic safe harbor site is at position 21.31, coordinates 89,174,426-89,179,074[GRCh38/hg38], on the long arm of chromosome X.
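By way of non-limiting illustration, the GRCh38/hg38 coordinate ranges recited above can be stored as intervals and queried to determine whether a given genomic position falls within one of the safe harbor sites. This is a minimal sketch; the "chr"-prefixed chromosome names, the treatment of both endpoints as inclusive, and the names `SAFE_HARBOR_INTERVALS` and `find_safe_harbor` are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Coordinate ranges as recited in the text (GRCh38/hg38).
# Assumption: both endpoints are treated as inclusive.
SAFE_HARBOR_INTERVALS: Dict[str, Tuple[str, int, int]] = {
    "1q31.3": ("chr1", 195_338_589, 195_818_588),
    "3p24.3": ("chr3", 22_720_711, 22_761_389),
    "7q35 (first range)": ("chr7", 145_090_941, 145_219_513),
    "7q35 (second range)": ("chr7", 145_320_384, 145_525_881),
    "Xq21.31": ("chrX", 89_174_426, 89_179_074),
}


def find_safe_harbor(chromosome: str, position: int) -> Optional[str]:
    """Return the label of the range containing the position, or None."""
    for label, (chrom, start, end) in SAFE_HARBOR_INTERVALS.items():
        if chromosome == chrom and start <= position <= end:
            return label
    return None
```

A position on chromosome 7 that lies between the two recited 7q35 ranges, for example, is reported as falling outside both.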
Provided herein, in some aspects, are engineered targeting vectors. A targeting vector is a nucleic acid used to deliver foreign genetic material into a cell. A targeting vector may include DNA, RNA or a combination of DNA and RNA. It may be single-stranded or double-stranded, depending on the particular use of the vector. In some embodiments, the targeting vector is a double-stranded DNA vector.
An engineered nucleic acid is a nucleic acid (e.g., at least two nucleotides covalently linked together, and in some instances, containing phosphodiester bonds, referred to as a phosphodiester backbone) that does not occur in nature. Engineered nucleic acids include recombinant nucleic acids and synthetic nucleic acids. A recombinant nucleic acid is a molecule that is constructed by joining nucleic acids (e.g., isolated nucleic acids, synthetic nucleic acids or a combination thereof) from two different organisms (e.g., human and mouse). A synthetic nucleic acid is a molecule that is amplified or chemically, or by other means, synthesized. A synthetic nucleic acid includes those that are chemically modified, or otherwise modified, but can base pair with (bind to) naturally occurring nucleic acid molecules. Recombinant and synthetic nucleic acids also include those molecules that result from the replication of either of the foregoing.
An engineered nucleic acid may comprise DNA (e.g., genomic DNA, cDNA or a combination of genomic DNA and cDNA), RNA or a hybrid molecule, for example, where the nucleic acid contains any combination of deoxyribonucleotides and ribonucleotides (e.g., artificial or natural), and any combination of two or more bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine.
Engineered nucleic acids of the present disclosure may be produced using standard molecular biology methods (see, e.g., Green and Sambrook, Molecular Cloning, A Laboratory Manual, 2012, Cold Spring Harbor Press). In some embodiments, nucleic acids are produced using GIBSON ASSEMBLY® Cloning (see, e.g., Gibson, D. G. et al. Nature Methods, 343-345, 2009; and Gibson, D. G. et al. Nature Methods, 901-903, 2010, each of which is incorporated by reference herein). GIBSON ASSEMBLY® typically uses three enzymatic activities in a single-tube reaction: 5′ exonuclease, the 3′-extension activity of a DNA polymerase and DNA ligase activity. The 5′ exonuclease activity chews back the 5′ end sequences and exposes the complementary sequence for annealing. The polymerase activity then fills in the gaps on the annealed domains. A DNA ligase then seals the nicks and covalently links the DNA fragments together. The overlapping sequence of adjoining fragments is much longer than that used in Golden Gate Assembly, and therefore results in a higher percentage of correct assemblies. Other methods of producing engineered nucleic acids may be used in accordance with the present disclosure.
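By way of non-limiting illustration, the outcome of an overlap-based assembly can be predicted in silico by joining fragments at their shared terminal sequences. The sketch below models only the sequence-level result, not the enzymatic steps; the function names and the default minimum overlap of 15 bases are hypothetical choices made for illustration and are not taken from the disclosure.

```python
from typing import List


def join_with_overlap(left: str, right: str, min_overlap: int = 15) -> str:
    """Join two fragments where the end of `left` equals the start of `right`.

    The longest shared terminal sequence of at least `min_overlap` bases is
    used, and the overlap appears only once in the product.
    """
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("fragments share no sufficient terminal overlap")


def assemble(fragments: List[str], min_overlap: int = 15) -> str:
    """Join an ordered list of fragments into a single linear product."""
    product = fragments[0]
    for fragment in fragments[1:]:
        product = join_with_overlap(product, fragment, min_overlap)
    return product
```

Such a check can be used to confirm that adjoining fragments of a planned targeting vector share the intended overlap before the fragments are synthesized.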
The targeting vectors provided herein include a sequence of interest. A sequence of interest may be any nucleotide sequence, engineered (e.g., recombinant or synthetic), modified or unmodified (e.g., cloned from the genome of an organism without or with modification). In some embodiments, the sequence of interest comprises an open reading frame. An open reading frame is a continuous stretch of codons that begins with a start codon (e.g., ATG), ends with a stop codon (e.g., TAA, TAG, or TGA), and encodes a polypeptide, for example, a protein. An open reading frame is operably linked to a promoter if that promoter regulates transcription of the open reading frame. In some embodiments, the vector comprises a promoter operably linked to the sequence of interest. A promoter is a nucleotide sequence to which RNA polymerase binds to initiate transcription. Promoters are typically located directly upstream from (at the 5′ end of) a transcription initiation site. In some embodiments, a promoter is an endogenous promoter. An endogenous promoter is a promoter that naturally occurs in the host cell. Promoters may be constitutive or inducible (e.g., temporally or spatially). A targeting vector may also include, for example, other genetic elements, such as enhancers, termination sequences and the like to enable and/or facilitate gene expression.
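By way of non-limiting illustration, the definition of an open reading frame given above (a start codon followed by in-frame codons and ending with a stop codon) can be applied to a sequence of interest as follows. This is a minimal sketch that scans the forward strand only; the function name `find_first_orf` is hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_first_orf(sequence: str) -> Optional[str]:
    """Return the first ATG-to-stop open reading frame, or None if absent.

    The returned string includes both the start codon and the stop codon.
    """
    sequence = sequence.upper()
    for start in range(len(sequence) - 2):
        if sequence[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, in frame with the start codon.
        for pos in range(start + 3, len(sequence) - 2, 3):
            if sequence[pos:pos + 3] in STOP_CODONS:
                return sequence[start:pos + 3]
    return None
```

A start codon with no in-frame stop codon downstream is not reported, consistent with the definition above.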
A sequence of interest of a targeting vector provided herein, in some embodiments, is flanked by homology arms. Homology arms, herein, refer to regions of a targeting vector that are homologous to regions of genomic DNA located in a safe harbor site (e.g., of Table 1). One homology arm is located to the left (5′) of a sequence of interest (the left homology arm) and another homology arm is located to the right (3′) of the sequence of interest (the right homology arm). These homology arms enable homologous recombination between regions of the targeting vector and the genomic safe harbor locus, resulting in insertion of the sequence of interest into the genomic safe harbor site (e.g., via programmable nuclease-mediated (e.g., CRISPR/Cas9-mediated) homology directed repair (HDR)).
The homology arms may vary in length. For example, each homology arm (the left arm and the right homology arm) may have a length of 5 nucleotide base pairs to 1000 nucleotide base pairs, depending in part on the intended use of the targeting vector. In some embodiments, each homology arm has a length of 50 to 1000, 50 to 900, 50 to 800, 50 to 700, 50 to 600, 50 to 500, 50 to 400, 50 to 300, 50 to 200, 50 to 100, 100 to 1000, 100 to 900, 100 to 800, 100 to 700, 100 to 600, 100 to 500, 100 to 400, 100 to 300, 100 to 200, 150 to 1000, 150 to 900, 150 to 800, 150 to 700, 150 to 600, 150 to 500, 150 to 400, 150 to 300, 150 to 200, 200 to 1000, 200 to 900, 200 to 800, 200 to 700, 200 to 600, 200 to 500, 200 to 400, or 200 to 300 nucleotide base pairs. In other embodiments, for example, in the context of gene modification using the CRIS-PITCh or TAL-PITCh systems (see, e.g., Sakuma T et al. Nat Protoc. 2016 January; 11(1):118-33), each homology arm has a length of 5 to 100, 5 to 90, 5 to 80, 5 to 70, 5 to 60, 5 to 50, 5 to 40, 5 to 30, 5 to 20, 10 to 100, 10 to 90, 10 to 80, 10 to 70, 10 to 60, 10 to 50, 10 to 40, 10 to 30, 10 to 20, 15 to 100, 15 to 90, 15 to 80, 15 to 70, 15 to 60, 15 to 50, 15 to 40, 15 to 30, or 15 to 20 nucleotide base pairs. In some embodiments, each homology arm has a length of about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000 nucleotide bases. Longer homology arms are contemplated herein. In some embodiments, the length of one homology arm differs from the length of the other homology arm. For example, one homology arm may have a length of 200 nucleotide bases, and the other homology arm may have a length of 300 nucleotide bases.
Each homology arm comprises a sequence homologous to a sequence in a safe harbor site in the human genome selected from Table 1, for example. As is understood in the art, each homology arm flanking a gene of interest, for example, includes a sequence that is homologous to a target site in the genome such that the homology arms can function to facilitate insertion of that gene into the target site via a homologous recombination mechanism. Non-limiting examples of homology arm sequences are provided elsewhere herein.
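By way of non-limiting illustration, the arrangement of a left homology arm, a sequence of interest and a right homology arm can be sketched as follows. The example assumes that the genomic sequence surrounding the intended insertion point is available as a string and that both arms have the same length; the function name `build_donor` and its parameters are hypothetical, and the default arm length of 300 bases reflects one of the optional lengths mentioned in this disclosure.

```python
def build_donor(genomic_sequence: str, insertion_index: int,
                sequence_of_interest: str, arm_length: int = 300) -> str:
    """Return left arm + sequence of interest + right arm.

    The left arm is the `arm_length` bases immediately 5' of the insertion
    point; the right arm is the `arm_length` bases immediately 3' of it.
    """
    left_start = insertion_index - arm_length
    right_end = insertion_index + arm_length
    if left_start < 0 or right_end > len(genomic_sequence):
        raise ValueError("not enough flanking sequence for the requested arms")
    left_arm = genomic_sequence[left_start:insertion_index]
    right_arm = genomic_sequence[insertion_index:right_end]
    return left_arm + sequence_of_interest + right_arm
```

Arms of differing lengths, which are also contemplated herein, could be accommodated by passing separate left and right lengths.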
The left homology arm, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of any one of SEQ ID NOs: 25-44. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 25. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 26. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 27. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 28. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 29. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 30. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 31. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 32. 
In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 33. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 34. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 35. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 36. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 37. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 38. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 39. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 40. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 41.
In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 42. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 43. In some embodiments, the left homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 44.
The right homology arm, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of any one of SEQ ID NOs: 45-64. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 45. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 46. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 47. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 48. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 49. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 50. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 51. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 52. 
In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 53. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 54. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 55. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 56. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 57. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 58. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 59. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 60. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 61. 
In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 62. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 63. In some embodiments, the right homology arm comprises a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the sequence of SEQ ID NO: 64.
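As a non-authoritative illustration of how a percent-identity threshold such as those recited above might be checked, the following minimal sketch computes identity over a global (Needleman-Wunsch style) alignment. The disclosure does not specify an alignment method; the scoring scheme (match +1, mismatch -1, gap -1), the definition of identity as matching columns divided by alignment length, and the function names are all assumptions, and the toy sequences are not any of SEQ ID NOs: 25-64.

```python
def percent_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity of two sequences over one optimal global alignment."""
    a, b = a.upper(), b.upper()
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns and total columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns


def meets_identity(candidate: str, reference: str, threshold: float) -> bool:
    """True if the candidate reaches at least `threshold` percent identity."""
    return percent_identity(candidate, reference) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Other definitions of identity (for example, dividing by the length of the shorter sequence, or using a local alignment) give different values for the same pair, so the denominator should be stated whenever a threshold is applied.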
In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site on chromosome 1. In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site on the long arm of chromosome 1. In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site at position 31 on the long arm of chromosome 1. For example, homology arms may comprise sequences homologous to a genomic safe harbor site at position 31.3 on the long arm of chromosome 1. In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site at position 31.3, coordinates 195,338,589-195,818,588[GRCh38/hg38], on the long arm of chromosome 1.
In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site on chromosome 3. In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site on the short arm of chromosome 3. In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site at position 24 on the short arm of chromosome 3. For example, homology arms may comprise sequences homologous to a genomic safe harbor site at position 24.3 on the short arm of chromosome 3. In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site at position 24.3, coordinates 22,720,711-22,761,389[GRCh38/hg38], on the short arm of chromosome 3.
In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site on chromosome 7. In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site on the long arm of chromosome 7. In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site at position 35 on the long arm of chromosome 7. For example, homology arms may comprise sequences homologous to a genomic safe harbor site at position 35, coordinates 145,090,941-145,219,513[GRCh38/hg38], on the long arm of chromosome 7. In some embodiments, homology arms may comprise sequences homologous to a genomic safe harbor site at position 35, coordinates 145,320,384-145,525,881[GRCh38/hg38], on the long arm of chromosome 7.
In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site on chromosome X. In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site on the long arm of chromosome X. In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site at position 21 on the long arm of chromosome X. For example, homology arms may comprise sequences homologous to a genomic safe harbor site at position 21.31 on the long arm of chromosome X. In some embodiments, each homology arm comprises a sequence homologous to a genomic safe harbor site at position 21.31, coordinates 89,174,426-89,179,074[GRCh38/hg38], on the long arm of chromosome X.
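For illustration, the coordinate ranges recited above can be held as simple intervals and queried to check whether a candidate integration position falls inside one of them. This is a sketch under stated assumptions: the coordinates are treated as 1-based and inclusive (the disclosure does not state the convention), and the names `Interval`, `SAFE_HARBOR_INTERVALS`, and `find_interval` are hypothetical.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed inclusive
    band: str

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# GRCh38/hg38 coordinates as recited in the text above.
SAFE_HARBOR_INTERVALS = [
    Interval("chr1", 195_338_589, 195_818_588, "1q31.3"),
    Interval("chr3", 22_720_711, 22_761_389, "3p24.3"),
    Interval("chr7", 145_090_941, 145_219_513, "7q35"),
    Interval("chr7", 145_320_384, 145_525_881, "7q35"),
    Interval("chrX", 89_174_426, 89_179_074, "Xq21.31"),
]


def find_interval(chrom: str, pos: int) -> Optional[Interval]:
    """Return the recited interval containing chrom:pos, or None."""
    for iv in SAFE_HARBOR_INTERVALS:
        if iv.chrom == chrom and iv.start <= pos <= iv.end:
            return iv
    return None


for iv in SAFE_HARBOR_INTERVALS:
    print(iv.band, iv.chrom, iv.length)
```

Under the inclusive-coordinate assumption, the recited intervals span roughly 4.6 kb (Xq21.31) to 480 kb (1q31.3).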
Targeting vectors of the present disclosure, in some embodiments, further comprise a sequence encoding at least one guide RNA that specifically targets (e.g., specifically binds to) the sequence in the safe harbor site and/or specifically targets a sequence in or near the homology arms. Specific binding refers to the gRNA binding with high specificity to a particular nucleic acid, as compared with other nucleic acids for which the gRNA has a lower affinity to bind (through Watson-Crick base pairing). Non-limiting examples of guide RNA sequences are described elsewhere herein. In some embodiments, a targeting vector further comprises a sequence encoding a programmable nuclease, such as a Cas nuclease, a zinc finger nuclease, or a TAL-effector nuclease. These programmable nuclease systems are discussed below.
In some embodiments, a sequence of interest comprises a gene of interest. A gene is a distinct sequence of nucleotides, the order of which determines the order of monomers in a polynucleotide or polypeptide. A gene typically encodes a protein. A gene may be endogenous (occurring naturally in a host organism) or exogenous (transferred, naturally or through genetic engineering, to a host organism). An allele is one of two or more alternative forms of a gene that arise by mutation and are found at the same locus on a chromosome. A gene, in some embodiments, includes a promoter sequence, coding regions (e.g., exons), non-coding regions (e.g., introns), and regulatory regions (also referred to as regulatory sequences). Non-limiting examples of genes of interest are provided in Table 2 below.
Any one or more of the gene(s) of interest in Table 2, for example, may be knocked into any one or more of the genomic safe harbor sites provided herein, ex vivo or in vivo, to treat a particular disease or condition, such as those listed in Table 2. The gene of interest may be modified (e.g., mutated) or unmodified, depending on the particular therapeutic application.
The compositions and methods provided herein, in some embodiments, may be used for manufacturing/producing (e.g., on a large scale) therapeutic proteins from human cells ex vivo. Thus, in some embodiments, a gene of interest encodes a therapeutic protein (see, e.g., Dimitrov D S Methods Mol Biol. 2012; 899: 1-26, incorporated herein by reference). Non-limiting examples of therapeutic proteins include antibodies, Fc fusion proteins, anticoagulants, blood factors, bone morphogenetic proteins, engineered protein scaffolds, enzymes, growth factors, hormones, interferons, interleukins, and thrombolytics. In some embodiments, the therapeutic protein is an antibody. Therapeutic proteins may also be classified based on mechanism of activity, for example, (a) binding non-covalently to target, e.g., mAbs; (b) affecting covalent bonds, e.g., enzymes; and (c) exerting activity without specific interactions, e.g., serum albumin.
Non-limiting examples of antibodies that may be produced using the compositions (e.g., targeting vectors) and/or methods of the present disclosure include: abagovomab, abciximab, abituzumab, abrezekimab, abrilumab, actoxumab, adalimumab, adecatumumab, aducanumab, afasevikumab, afelimomab, alacizumab pegol, alemtuzumab, alirocumab, altumomab pentetate, amatuximab, amivantamab, anatumomab mafenatox, andecaliximab, anetumab ravtansine, anifrolumab, ansuvimab, anrukinzumab, apolizumab, aprutumab ixadotin, arcitumomab, ascrinvacumab, aselizumab, atezolizumab, atidortoxumab, atinumab, atoltivimab, atoltivimab/maftivimab/odesivimab, atorolimumab, avelumab, azintuxizumab vedotin, bamlanivimab, bapineuzumab, basiliximab, bavituximab, bcd-, bectumomab, begelomab, belantamab mafodotin, belimumab, bemarituzumab, benralizumab, berlimatoxumab, bermekimab, bersanlimab, bertilimumab, besilesomab, bevacizumab, bezlotoxumab, biciromab, bimagrumab, bimekizumab, birtamimab, bivatuzumab, bleselumab, blinatumomab, blontuvetmab, blosozumab, bococizumab, brazikumab, brentuximab vedotin, briakinumab, brodalumab, brolucizumab, brontictuzumab, burosumab, cabiralizumab, camidanlumab tesirine, camrelizumab, canakinumab, cantuzumab mertansine, cantuzumab ravtansine, caplacizumab, casirivimab, capromab, carlumab, carotuximab, catumaxomab, cbr-doxorubicin immunoconjugate, cedelizumab, cemiplimab, cergutuzumab amunaleukin, certolizumab pegol, cetrelimab, cetuximab, cibisatamab, cirmtuzumab, citatuzumab bogatox, cixutumumab, clazakizumab, clenoliximab, clivatuzumab tetraxetan, codrituzumab, cofetuzumab pelidotin, coltuximab ravtansine, conatumumab, concizumab, cosfroviximab, crenezumab, crizanlizumab, crotedumab, cr, cusatuzumab, dacetuzumab, daclizumab, dalotuzumab, dapirolizumab pegol, daratumumab, dectrekumab, demcizumab, denintuzumab mafodotin, denosumab, depatuxizumab mafodotin, derlotuximab biotin, detumomab, dezamizumab, dinutuximab, dinutuximab beta, diridavumab, domagrozumab, dorlimomab 
aritox, dostarlimab, drozitumab, ds-, duligotuzumab, dupilumab, durvalumab, dusigitumab, duvortuxizumab, ecromeximab, eculizumab, edobacomab, edrecolomab, efalizumab, efungumab, eldelumab, elezanumab, elgemtumab, elotuzumab, elsilimomab, emactuzumab, emapalumab, emibetuzumab, emicizumab, enapotamab vedotin, enavatuzumab, enfortumab vedotin, enlimomab pegol, enoblituzumab, enokizumab, enoticumab, ensituximab, epcoritamab, epitumomab cituxetan, epratuzumab, eptinezumab, erenumab, erlizumab, ertumaxomab, etaracizumab, etesevimab, etigilimab, etrolizumab, evinacumab, evolocumab, exbivirumab, fanolesomab, faralimomab, faricimab, farletuzumab, fasinumab, fbta, felvizumab, fezakinumab, fibatuzumab, ficlatuzumab, figitumumab, firivumab, flanvotumab, fletikumab, flotetuzumab, fontolizumab, foralumab, foravirumab, fremanezumab, fresolimumab, frovocimab, frunevetmab, fulranumab, futuximab, galcanezumab, galiximab, gancotamab, ganitumab, gantenerumab, gatipotuzumab, gavilimomab, gedivumab, gemtuzumab ozogamicin, gevokizumab, gilvetmab, gimsilumab, girentuximab, glembatumumab vedotin, golimumab, gomiliximab, gosuranemab, guselkumab, ianalumab, ibalizumab, ibi, ibritumomab tiuxetan, icrucumab, idarucizumab, ifabotuzumab, igovomab, iladatuzumab vedotin, imab, imalumab, imaprelimab, imciromab, imdevimab, imgatuzumab, inclacumab, indatuximab ravtansine, indusatumab vedotin, inebilizumab, infliximab, intetumumab, inolimomab, inotuzumab ozogamicin, ipilimumab, iomab-b, iratumumab, isatuximab, iscalimab, istiratumab, itolizumab, ixekizumab, keliximab, labetuzumab, lacnotuzumab, ladiratuzumab vedotin, lampalizumab, lanadelumab, landogrozumab, laprituximab emtansine, larcaviximab, lebrikizumab, lemalesomab, lendalizumab, lenvervimab, lenzilumab, lerdelimumab, leronlimab, lesofavumab, letolizumab, lexatumumab, libivirumab, lifastuzumab vedotin, ligelizumab, loncastuximab tesirine, losatuxizumab vedotin, lilotomab satetraxetan, lintuzumab, lirilumab, lodelcizumab, lokivetmab, lorvotuzumab 
mertansine, lucatumumab, lulizumab pegol, lumiliximab, lumretuzumab, lupartumab, lupartumab amadotin, lutikizumab, maftivimab, mapatumumab, margetuximab, marstacimab, maslimomab, mavrilimumab, matuzumab, mepolizumab, metelimumab, milatuzumab, minretumomab, mirikizumab, mirvetuximab soravtansine, mitumomab, modotuximab, mogamulizumab, monalizumab, morolimumab, mosunetuzumab, motavizumab, moxetumomab pasudotox, muromonab-cd, nacolomab tafenatox, namilumab, naptumomab estafenatox, naratuximab emtansine, narnatumab, natalizumab, navicixizumab, navivumab, naxitamab, nebacumab, necitumumab, nemolizumab, neod, nerelimomab, nesvacumab, netakimab, nimotuzumab, nirsevimab, nivolumab, nofetumomab merpentan, obiltoxaximab, obinutuzumab, ocaratuzumab, ocrelizumab, odesivimab, odulimomab, ofatumumab, olaratumab, oleclumab, olendalizumab, olokizumab, omalizumab, omburtamab, oms, onartuzumab, ontuxizumab, onvatilimab, opicinumab, oportuzumab monatox, oregovomab, orticumab, otelixizumab, otilimab, otlertuzumab, oxelumab, ozanezumab, ozoralizumab, pagibaximab, palivizumab, pamrevlumab, panitumumab, pankomab, panobacumab, parsatuzumab, pascolizumab, pasotuxizumab, pateclizumab, patritumab, pdr, pembrolizumab, pemtumomab, perakizumab, pertuzumab, pexelizumab, pidilizumab, pinatuzumab vedotin, pintumomab, placulumab, prezalumab, plozalizumab, pogalizumab, polatuzumab vedotin, ponezumab, porgaviximab, prasinezumab, prezalizumab, priliximab, pritoxaximab, pritumumab, pro, quilizumab, racotumomab, radretumab, rafivirumab, ralpancizumab, ramucirumab, ranevetmab, ranibizumab, raxibacumab, ravagalimab, ravulizumab, refanezumab, regavirumab, regn-eb, relatlimab, remtolumab, reslizumab, rilotumumab, rinucumab, risankizumab, rituximab, rivabazumab pegol, robatumumab, rmab, roledumab, romilkimab, romosozumab, rontalizumab, rosmantuzumab, rovalpituzumab tesirine, rovelizumab, rozanolixizumab, ruplizumab, sa, sacituzumab govitecan, samalizumab, samrotamab vedotin, sarilumab, satralizumab, 
satumomab pendetide, secukinumab, selicrelumab, seribantumab, setoxaximab, setrusumab, sevirumab, sibrotuzumab, sgn-cda, shp, sifalimumab, siltuximab, simtuzumab, siplizumab, sirtratumab vedotin, sirukumab, sofituzumab vedotin, solanezumab, solitomab, sonepcizumab, sontuzumab, spartalizumab, stamulumab, sulesomab, suptavumab, sutimlimab, suvizumab, suvratoxumab, tabalumab, tacatuzumab tetraxetan, tadocizumab, tafasitamab, talacotuzumab, talizumab, talquetamab, tamtuvetmab, tanezumab, taplitumomab paptox, tarextumab, tavolimab, teclistamab, tefibazumab, telimomab aritox, telisotuzumab, telisotuzumab vedotin, tenatumomab, teneliximab, teplizumab, tepoditamab, teprotumumab, tesidolumab, tetulomab, tezepelumab, tgn, tibulizumab, tildrakizumab, tigatuzumab, timigutuzumab, timolumab, tiragolumab, tiragotumab, tislelizumab, tisotumab vedotin, tocilizumab, tomuzotuximab, toralizumab, tosatoxumab, tositumomab, tovetumab, tralokinumab, trastuzumab, trastuzumab duocarmazine, trastuzumab emtansine, trbs, tregalizumab, tremelimumab, trevogrumab, tucotuzumab celmoleukin, tuvirumab, ublituximab, ulocuplumab, urelumab, urtoxazumab, ustekinumab, utomilumab, vadastuximab talirine, vanalimab, vandortuzumab vedotin, vantictumab, vanucizumab, vapaliximab, varisacumab, varlilumab, vatelizumab, vedolizumab, veltuzumab, vepalimomab, vesencumab, visilizumab, vobarilizumab, volociximab, vonlerolizumab, vopratelimab, vorsetuzumab mafodotin, votumumab, vunakizumab, xentuzumab, xmab-, zalutumumab, zanolimumab, zatuximab, zenocutuzumab, ziralimumab, zolbetuximab (claudiximab), and zolimomab aritox.
The compositions and methods provided herein, in other embodiments, may be used for manufacturing/producing (e.g., on a large scale) gene therapy vectors from human cells ex vivo. Thus, provided herein are methods comprising introducing one or more polynucleotides into a safe harbor site in a human cell ex vivo and producing a recombinant gene therapy vector or one or more components of a gene therapy vector encoded by the one or more polynucleotides. In some embodiments, the polynucleotide comprises a viral polynucleotide (e.g., encoding a viral protein). The viral polynucleotide may encode, for example, an adenovirus protein, an adeno-associated virus (AAV) protein, a retrovirus protein, or a Herpes virus protein. In some embodiments, the polynucleotide may include one or more of a promoter, enhancer, intron, exon, stop signals, polyadenylation signals, inverted terminal repeat (ITR) sequences, replication (rep) genes, capsid (cap) coding sequences, helper genes, or other sequences used in producing a gene therapy vector, such as a recombinant AAV vector.
Engineered nucleic acids (e.g., sequences of interest) may be introduced to a genomic safe harbor site using any suitable method. The present application contemplates the use of a variety of gene editing and other knock-in technologies, for example, to introduce nucleic acids into a genomic safe harbor site. Non-limiting examples include programmable nuclease-based systems, such as clustered regularly interspaced short palindromic repeat (CRISPR) systems (e.g., including Cas-based systems, prime editing (see, e.g., Anzalone A V et al. Nat Biotechnol. 2021 Dec. 9) and CRISPR-directed integrases (see, e.g., Ioannidi E I et al. bioRxiv, 2021 Nov. 1)), zinc-finger nucleases (ZFNs), and transcription activator-like effector nucleases (TALENs). See, e.g., Carroll D Genetics. 2011; 188(4): 773-782; Joung J K et al. Nat Rev Mol Cell Biol. 2013; 14(1): 49-55; and Gaj T et al. Trends Biotechnol. 2013 July; 31(7): 397-405, each of which is incorporated by reference herein.
In some embodiments, a CRISPR system is used to edit a genomic safe harbor site (see, e.g., Harms D W et al., Curr Protoc Hum Genet. 2014; 83: 15.7.1-15.7.27; and Inui M et al., Sci Rep. 2014; 4: 5396, each of which is incorporated by reference herein). For example, Cas9 mRNA or protein, one or multiple guide RNAs (gRNAs), and/or a targeting vector may be used to introduce a sequence of interest into a genomic safe harbor site.
The CRISPR/Cas system is a naturally occurring defense mechanism in prokaryotes that has been repurposed as an RNA-guided, DNA-targeting platform for gene editing. Engineered CRISPR systems contain two main components: a guide RNA (gRNA) and a CRISPR-associated endonuclease (e.g., Cas protein). The gRNA is a short synthetic RNA composed of a scaffold sequence for nuclease-binding and a user-defined nucleotide spacer (e.g., ~15-25 nucleotides, or ~20 nucleotides) that defines the genomic target (e.g., gene) to be modified. Thus, one can change the genomic target of the Cas protein by simply changing the target sequence present in the gRNA. In some embodiments, the Cas9 endonuclease is from Streptococcus pyogenes (NGG PAM) or Staphylococcus aureus (NNGRRT or NNGRR(N) PAM), although other Cas9 homologs, orthologs, and/or variants (e.g., evolved versions of Cas9) may be used, as provided herein. Additional non-limiting examples of RNA-guided nucleases that may be used as provided herein include Cpf1 (TTN PAM); SpCas9 D1135E variant (NGG (reduced NAG binding) PAM); SpCas9 VRER variant (NGCG PAM); SpCas9 EQR variant (NGAG PAM); SpCas9 VQR variant (NGAN or NGNG PAM); Neisseria meningitidis (NM) Cas9 (NNNNGATT PAM); Streptococcus thermophilus (ST) Cas9 (NNAGAAW PAM); and Treponema denticola (TD) Cas9 (NAAAAC PAM). In some embodiments, the CRISPR-associated endonuclease is selected from Cas9, Cpf1 (Cas12a), C2c1, and C2c3. In some embodiments, the Cas nuclease is Cas9.
A guide RNA comprises at least a spacer sequence that hybridizes to (binds to) a target nucleic acid sequence and a CRISPR repeat sequence that binds the endonuclease and guides the endonuclease to the target nucleic acid sequence. As is understood by the person of ordinary skill in the art, each gRNA is designed to include a spacer sequence complementary to its genomic target sequence. See, e.g., Jinek et al., Science, 2012; 337: 816-821 and Deltcheva et al., Nature, 2011; 471: 602-607, each of which is incorporated by reference herein.
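To illustrate how a spacer and its adjacent PAM relate, the following minimal sketch scans one strand of a DNA string for candidate protospacers followed by a PAM written in IUPAC degenerate notation (e.g., NGG or NNGRRT as listed above). It is a simplified sketch, not a gRNA design tool: it assumes a 20-nucleotide spacer, assumes the PAM lies immediately 3' of the protospacer (which holds for the Cas9 examples but not for Cpf1, whose PAM lies 5'), ignores the reverse strand and off-target scoring, and uses hypothetical function names and a toy sequence.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def pam_regex(pam: str) -> str:
    """Translate a degenerate PAM such as 'NNGRRT' into a regex fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_protospacers(seq: str, pam: str = "NGG", spacer_len: int = 20):
    """Return (start, protospacer, pam_match) tuples on the given strand.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_regex(pam)}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


toy = "GATTACAGATTACAGATTAC" + "AGG" + "TT"  # toy sequence, not from the Sequence Listing
for start, spacer, pam_hit in find_protospacers(toy, pam="NGG"):
    print(start, spacer, pam_hit)
```

The same function can be called with any of the 3' PAMs listed above by changing the `pam` argument, for example `pam="NNGRRT"`.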
In some embodiments, a guide RNA comprises a sequence homologous to a sequence in a safe harbor site in the human genome in any one of the loci listed in Table 1, e.g., 1q31, 3p24, 7q35, and Xq21. One skilled in the art can readily determine a gRNA sequence for specifically targeting the genomic safe harbor sites provided herein. Nonetheless, non-limiting examples of gRNA sequences are provided as SEQ ID NOs: 5-24. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to any one of the gRNA sequences of SEQ ID NOs: 5-24. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 5. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 6. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 7. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 8. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 9. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 10.
The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 11. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 12. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 13. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 14. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 15. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 16. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 17. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 18. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 19. 
The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 20. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 21. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 22. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 23. The gRNA, in some embodiments, may comprise a sequence that has at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or 100% identity to the gRNA sequence of SEQ ID NO: 24.
In some embodiments, the RNA-guided nuclease and the gRNA are complexed to form a ribonucleoprotein (RNP), prior to delivery to a cell, for example.
The concentration of programmable nuclease or nucleic acid encoding the programmable nuclease may vary. In some embodiments, the concentration is 100 ng/μl to 1000 ng/μl. For example, the concentration may be 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000 ng/μl. In some embodiments, the concentration is 100 ng/μl to 500 ng/μl, or 200 ng/μl to 500 ng/μl.
The concentration of gRNA may also vary. In some embodiments, the concentration is 200 ng/μl to 2000 ng/μl. For example, the concentration may be 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, or 2000 ng/μl. In some embodiments, the concentration is 500 ng/μl to 1000 ng/μl. In some embodiments, the concentration is 100 ng/μl to 1000 ng/μl. For example, the concentration may be 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1000 ng/μl.
In some embodiments, the ratio of concentration of RNA-guided nuclease or nucleic acid encoding the RNA-guided nuclease to the concentration of gRNA is 2:1. In other embodiments, the ratio of concentration of RNA-guided nuclease or nucleic acid encoding the RNA-guided nuclease to the concentration of gRNA is 1:1.
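As a simple worked illustration of the ratios and ranges recited above, the following sketch derives a gRNA concentration from a chosen nuclease concentration and a nuclease:gRNA ratio, and checks the result against a range. The function names and the choice of 500 ng/μl as the example input are assumptions for illustration only.

```python
# Ranges in ng/ul, as recited in the text above.
NUCLEASE_RANGE = (100, 1000)
GRNA_RANGES = [(200, 2000), (100, 1000)]


def grna_concentration(nuclease_ng_per_ul: float, ratio=(2, 1)) -> float:
    """gRNA concentration implied by a nuclease:gRNA concentration ratio."""
    nuclease_part, grna_part = ratio
    return nuclease_ng_per_ul * grna_part / nuclease_part


def within(value: float, bounds) -> bool:
    """True if value lies inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


nuclease = 500  # hypothetical example input, ng/ul
for ratio in [(2, 1), (1, 1)]:
    grna = grna_concentration(nuclease, ratio)
    print(ratio, grna, [within(grna, r) for r in GRNA_RANGES])
```

At a 2:1 ratio, a nuclease concentration of 500 ng/μl corresponds to 250 ng/μl of gRNA; at 1:1, the two concentrations are equal.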
The targeting vector, in some embodiments, is delivered to a subject and/or cell using a delivery system. A delivery system, herein, is any substance or combination of substances that can be used to bring (deliver) a targeting vector to a cell. Delivery systems are often used to effectively deliver nucleic acids to cells ex vivo and/or in vivo. Such delivery systems can protect the targeting vector from inactivation and/or degradation. Non-limiting examples of delivery systems include viral delivery systems and non-viral delivery systems.
In some embodiments, the delivery system is a viral delivery system. Viral delivery system typically includes viruses engineered to be replication deficient. Such viral delivery systems can be used to deliver a targeting vector to a cell by infecting the cell. Non-limiting examples of viral delivery systems include engineered adeno-associated viruses, adenoviruses and lentiviruses. Such viral delivery systems are well-known.
In other embodiments, the delivery system is a non-viral delivery system. Non-limiting examples of non-viral delivery systems include synthetic nanoparticles, such as lipid nanoparticles and liposomes. A lipid nanoparticle is typically spherical with an average diameter between 10 and 1000 nanometers. Lipid nanoparticles possess a solid lipid core matrix that can solubilize lipophilic molecules. The lipid core is stabilized by surfactants (emulsifiers). The surfactant used depends, in part, on the route of administration. The term lipid includes triglycerides (e.g., tristearin), diglycerides (e.g., glycerol behenate), monoglycerides (e.g., glycerol monostearate), fatty acids (e.g., stearic acid), steroids (e.g., cholesterol), and waxes (e.g., cetyl palmitate). All classes of emulsifiers (with respect to charge and molecular weight) have been used to stabilize lipid dispersions. Liposomes, by contrast, are small, spherical vesicles with a phospholipid bilayer coat surrounding an interior composed largely of aqueous substance. Such non-viral delivery systems are well-known.
Other non-viral biological agent delivery systems are also contemplated herein, including bacteria, bacteriophage, virus-like particles (VLPs), erythrocyte ghosts, and exosomes. See, e.g., Seow Y. et al. Mol Ther. 2009 May; 17(5):767-7.
The compositions provided herein may be used, in some embodiments, to deliver a targeting vector (with a modified or unmodified gene of interest, for example) to a genomic safe harbor site in a human cell, ex vivo or in vivo. Thus, provided herein are methods that comprise delivering to a human cell an engineered targeting vector or a delivery system comprising a targeting vector. The methods, in some embodiments, further comprise delivering to the human cell a programmable nuclease (e.g., RNA-guided nuclease and a (one, two, three, or more) gRNA, ZFN, and/or TALEN) or a nucleic acid encoding the programmable nuclease.
The method may also include incubating the human cell to modify the safe harbor site to include the sequence of interest. One of skill in the art can readily determine the incubation conditions to enable homologous recombination or non-homologous end joining to occur, depending on the configuration of the engineered targeting vector (e.g., homology arms v. microhomology arms) and the gene editing system of choice (e.g., RNA-guided nuclease and a (one, two, three, or more) gRNA, ZFN, and/or TALEN). In some embodiments, the human cell (e.g., containing an engineered targeting vector) is incubated for a time period of about 5 minutes to about 3 hours, e.g., 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, or 60 minutes, or 1.5, 2, 2.5, or 3 hours. In some embodiments, the human cell is incubated at a temperature of about 25° C. to about 95° C., e.g., 25° C., 37° C., 42° C. or 95° C.
Various therapies are also contemplated herein. Thus, the present disclosure provides methods of delivering to a subject an engineered targeting vector, a delivery system comprising the engineered targeting vector, or a cell modified using the engineered targeting vector. The subject may suffer from any one or more of the diseases or conditions listed in Table 2. The gene of interest will likely depend on the particular disease or condition, and guidance for selecting particular genes of interest, based on particular diseases or conditions, is provided in Table 2.
Also provided herein are methods comprising identifying a safe harbor site in the human genome that is at least 50 kb (e.g., at least 60, 70, 80, 90, or 100 kb) from any known gene, at least 20 kb (e.g., at least 30, 40, or 50 kb) from an enhancer region, at least 150 kb (e.g., at least 200, 300, 400, or 500 kb) from a long non-coding RNA (lncRNA) and a tRNA, at least 300 kb (e.g., at least 400 or 500 kb) from any known oncogene, at least 300 kb (e.g., at least 400 or 500 kb) from a miRNA, and at least 300 kb (e.g., at least 400 or 500 kb) from a telomere and a centromere.
Some aspects provide methods comprising amplifying a sequence from a safe harbor site in the human genome that is at least 50 kb (e.g., at least 60, 70, 80, 90, or 100 kb) from any known gene, at least 20 kb (e.g., at least 30, 40, or 50 kb) from an enhancer region, at least 150 kb (e.g., at least 200, 300, 400, or 500 kb) from a lncRNA and a tRNA, at least 300 kb (e.g., at least 400 or 500 kb) from any known oncogene, at least 300 kb (e.g., at least 400 or 500 kb) from a miRNA, and at least 300 kb (e.g., at least 400 or 500 kb) from a telomere and a centromere.
Other aspects provide methods comprising modifying a sequence in a safe harbor site in the human genome that is at least 50 kb (e.g., at least 60, 70, 80, 90, or 100 kb) from any known gene, at least 20 kb (e.g., at least 30, 40, or 50 kb) from an enhancer region, at least 150 kb (e.g., at least 200, 300, 400, or 500 kb) from a lncRNA and a tRNA, at least 300 kb (e.g., at least 400 or 500 kb) from any known oncogene, at least 300 kb (e.g., at least 400 or 500 kb) from a miRNA, and at least 300 kb (e.g., at least 400 or 500 kb) from a telomere and a centromere.
Multiple delivery methods are available for delivering nucleic acids into a cell in vivo or ex vivo. The method used depends, at least in part, on the delivery system chosen. For example, viral systems use the natural ability of viruses to infect cells that present cell surface receptors to the viral surface proteins. Once a virus attaches through its surface proteins to a cell surface receptor of a target cell, conformational changes occur in the viral proteins that lead either to penetration of the virus through the cell membrane (for non-enveloped viruses), or to fusion of the viral envelope with the cell membrane. Either process results in insertion of the viral genome, or viral payload, into the target cell. For non-viral systems, such as a liposome or an LNP, the payload carried by a particle, can be delivered into target cells through a variety of methods. Non-limiting examples include the fusion of the particle membrane (or coating) with the cell membrane leading to payload insertion into the cytoplasm, the endocytosis of the particle by engulfment into the cell, chemical transfection methods (e.g., calcium phosphate exposure), physical transfection methods (e.g., electroporation).
Multiple routes of administration are available for delivering targeting vectors to a human subject. Exemplary routes of administration include, without limitation, oral, intravenous, intramuscular, intrathecal, sublingual, buccal, rectal, vaginal, ocular, otic, nasal, inhalation, nebulization, cutaneous/subcutaneous (for topical or systemic effect), and transdermal. Modified cells may also be delivered through select routes, including but not limited to intravenous.
Cell therapy (e.g., allogeneic or autologous) is a therapy in which viable cells are injected, grafted or implanted into a patient in order to effectuate a medicinal effect, for example, by transplanting T-cells capable of fighting cancer cells via cell-mediated immunity in the course of immunotherapy, or grafting stem cells to regenerate diseased tissues. The present disclosure contemplates the modification of a myriad of cell types for cell therapy. Non-limiting examples include stem cells (e.g., an induced pluripotent stem cell (iPSC)), red blood cells (e.g., erythrocytes), white blood cells, platelets, nerve cells, muscle cells, cartilage cells (e.g., chondrocytes), bone cells, skin cells, endothelial cells, epithelial cells, fat cells, and sex cells. In embodiments in which red blood cells are contemplated, hematopoietic stem cells may be modified and then differentiated into red blood cells.
Examples of stem cells include, but are not limited to, human embryonic stem cells, human adult stem cells, neural stem cells, mesenchymal stem cells, and hematopoietic stem cells. The stem cells may, in some embodiments, be induced pluripotent stem cells (iPSCs).
Examples of white blood cells include, but are not limited to, neutrophils, eosinophils, basophils, mast cells, monocytes, macrophages, dendritic cells, natural killer cells, and lymphocytes (B cells and T cells).
Examples of nerve cells include, but are not limited to, neurons and neuroglial cells.
Examples of muscle cells include, but are not limited to, skeletal, cardiac, and smooth muscle cells.
Examples of bone cells include, but are not limited to, osteoblasts, osteoclasts, osteocytes, and lining cells.
Examples of skin cells include, but are not limited to, keratinocytes, melanocytes, Merkel cells, and Langerhans cells.
Examples of fat cells include, but are not limited to, white adipocytes and brown adipocytes.
Particular cell therapies, such as adoptive cell transfer therapies, are also provided herein, including, for example, chimeric antigen receptor (CAR) T cell therapy (e.g., for cancer therapy) and fibroblast cell therapy (e.g., to ameliorate inherited diseases and aging).
Additional embodiments of the present disclosure are encompassed by the following numbered paragraphs.
To identify novel sites that could serve as potential GSHs, a genome-wide bioinformatic search was first conducted based on previously established and widely accepted (Sadelain et al., 2012) as well as newly introduced criteria that would satisfy safe and stable gene expression (
Based on bioinformatic screening, close to two thousand sites were identified that satisfied all of the criteria (Table 1). Five sites that varied significantly in size (GSH1, 2, 7, 8, GSH31) were chosen and guide RNAs (gRNA) that showed the best scores in terms of on and off-target activities were designed and then characterized experimentally (
In order to experimentally assess transgene expression from the five predicted novel GSH sites, targeted integration of a gene construct encoding a red fluorescence reporter protein (mRuby) into two common human cell lines, HEK293T and Jurkat, was performed. HEK293 cells are commonly used for medium- to large-scale production of recombinant proteins (Chin et al., 2019), thus identifying GSH in HEK293 may be relevant for protein manufacturing. The Jurkat cell line was derived from T-cells of a pediatric patient with acute lymphoblastic leukemia (Abraham and Weiss, 2004) and has been used extensively for assessing the functionality of engineered immune receptors, thus discovery of GSH in this cell line supports applications in T cell therapies (Roybal et al., 2016; Vazquez-Lombardi et al., 2020). For integration of mRuby, a CRISPR/Cas9-based genome editing strategy was employed that used the Precise Integration into Target Chromosome (PITCh) method, assisted by microhomology-mediated end-joining (MMEJ) (Nakade et al., 2014; Sakuma et al., 2016; Sfeir and Symington, 2015). This approach utilizes a reporter-bearing plasmid possessing short microhomology sequences flanked by gRNA binding sites. Once inside the cells, the reporter gene together with microhomologies directed against the candidate GSH site are liberated from the plasmid by Cas9-generated double-stranded breaks (DSB) at gRNA binding sites on the PITCh donor plasmid. A different gRNA-Cas9 pair generates DSBs at the candidate GSH locus, and the freed reporter gene with flanking micro-homologies is integrated by exploiting the MMEJ repair pathway (
Using the PITCh approach, mRuby transgene was transfected into the five candidate GSH sites using the best predicted gRNA sequence for each site (see Methods). A pooled selection of mRuby-expressing HEK293T and Jurkat cells was conducted by fluorescence-activated cell sorting (FACS), followed by expansion for one week and single-cell sorting to produce monoclonal populations of mRuby-expressing cells. In order to determine sites that support long-term stable transgene expression, clones with homogenous and high mRuby expression levels were monitored by performing flow cytometry at day 30, 45, 60 and 90 after integration.
Out of four candidate GSH sites, three sites in HEK293T cells—GSH1, 2 and 7 (
In order to assess whether targeted integration into the candidate GSH sites resulted in aberration of the global transcriptome profiles, bulk RNA-sequencing and analysis was performed. Following ninety days in culture the clone showing the highest GSH2-integrated mRuby levels was compared with untreated cells from the same culture for both HEK293T and Jurkat cells (
Next, targeted integration into GSH1 and GSH2 sites in primary human cells was characterized. One of the potential applications of targeted integration into novel GSH sites is for the ex-vivo engineering of human T-cells, which are being extensively explored for adoptive cell therapies in cancer and autoimmune disease. Thus, GSH1 and GSH2 were first tested in primary human T-cells isolated from peripheral blood of a healthy donor. These sites were targeted by employing an HDR-based integration approach using a linear double-stranded DNA donor template, which contained the mRuby transgene driven by a CMV promoter and with 300 bp homology arms (
Another possible ex-vivo application of identified GSH sites includes engineering dermal fibroblasts and keratinocytes for autologous skin grafting in people with burns or inherited skin disorders. A group of genetic skin disorders named junctional epidermolysis bullosa (JEB) is associated primarily with mutations in a family of multi-subunit laminin proteins, which are involved in anchoring the epidermis layer of the skin to the dermis (Bardhan et al., 2020). Certain variants of JEB are specifically related to mutations in a beta subunit of laminin-5 protein, encoded by the LAMB3 gene (Robbins et al., 2001). Using a similar dsDNA HDR donor with 300 bp homology arms bearing phosphorothioate bonds and biotin, Cas9 HDR was used to integrate the LAMB3 gene tagged with GFP (total insert size 5409 bp) into GSH1 and GSH2 sites in primary human dermal fibroblasts isolated from neonatal skin (
Lastly, transcriptome-wide effects on a single-cell level following transgene integration into GSH1 in primary T-cells were assessed. Single-cell RNA sequencing was performed using the 10× Genomics protocol, which consists of encapsulating cells in gel beads bearing reverse transcription (RT) reaction mix with unique cell primers. Following the RT reaction, the cDNA is pooled, and the library is amplified for subsequent next-generation sequencing.
This single-cell sequencing workflow was applied to human T cells expressing mRuby in GSH1 after 25 days in culture; wild-type (non-transfected) cells were used as a control. These cells were also compared with wild-type controls from a different donor to again assess whether GSH integration resulted in more variability in gene expression relative to a biological replicate (
Next, targeted integration into GSH1 and GSH2 sites in human induced pluripotent stem cells (iPSCs) was characterized. These sites were targeted by employing an HDR-based integration approach using a linear double-stranded DNA donor template, which contained the eGFP transgene driven by an EF1α promoter and with 300 bp homology arms (
Previously established criteria (Sadelain et al., 2012) as well as newly introduced ones were used to predict genomic locations of novel GSHs. Specifically, coordinates of all known genes were extracted from GENCODE gene annotation (Release 24). A set of tier 1 and tier 2 oncogenes was obtained from Cancer Gene Census. The miRNA coordinates were obtained from MirGeneDB (Fromm et al., 2020). Enhancer regions were obtained from the EnhancerAtlas 2.0 database (Gao and Qian, 2019); coordinates were transposed into the GRCh38/hg38 genome and the union of enhancer sites was used. Genomic locations of tRNA and lncRNA sequences were extracted from GENCODE gene annotation (Release 24). The UCSC genome browser (GRCh38/hg38) was used to get coordinates of telomeres and centromeres as well as unannotated regions. BEDTools (Quinlan and Hall, 2010) was used to determine flanking regions of each element of the criteria as well as to obtain the union or difference between sets of coordinates. The source code for computational identification of novel safe harbors is available at https://github.com/elvirakinzina/GSH.
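The flank-and-subtract logic described above can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline, which used BEDTools and is available at the repository cited above; the function names, the feature dictionary layout, and the toy coordinates are all hypothetical. The distance thresholds are the minimum values recited in this disclosure. Each annotated feature on a single chromosome is padded by its minimum distance, the padded intervals are merged, and the remaining gaps are reported as candidate regions.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome

# Minimum distances in bp, taken from the criteria recited in the text.
MIN_DISTANCE_BP: Dict[str, int] = {
    "gene": 50_000,
    "enhancer": 20_000,
    "lncRNA": 150_000,
    "tRNA": 150_000,
    "oncogene": 300_000,
    "miRNA": 300_000,
    "telomere": 300_000,
    "centromere": 300_000,
}


def pad(intervals: List[Interval], flank: int, chrom_len: int) -> List[Interval]:
    """Extend each interval by `flank` bp on both sides, clipped to the chromosome."""
    return [(max(0, s - flank), min(chrom_len, e + flank)) for s, e in intervals]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union of overlapping or touching intervals, returned sorted."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def complement(excluded: List[Interval], chrom_len: int) -> List[Interval]:
    """Regions of the chromosome not covered by any excluded interval."""
    free: List[Interval] = []
    cursor = 0
    for s, e in merge(excluded):
        if s > cursor:
            free.append((cursor, s))
        cursor = max(cursor, e)
    if cursor < chrom_len:
        free.append((cursor, chrom_len))
    return free


def candidate_regions(features: Dict[str, List[Interval]], chrom_len: int) -> List[Interval]:
    """Pad every feature class by its minimum distance and return what is left."""
    excluded: List[Interval] = []
    for kind, intervals in features.items():
        excluded.extend(pad(intervals, MIN_DISTANCE_BP[kind], chrom_len))
    return complement(excluded, chrom_len)


# Toy example with made-up coordinates on a hypothetical 1 Mb chromosome.
toy = {"gene": [(400_000, 410_000)], "enhancer": [(100_000, 101_000)]}
print(candidate_regions(toy, 1_000_000))
```

A genome-wide run would repeat this per chromosome and would also need the liftover and unannotated-region handling mentioned above, which this sketch omits.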
PITCh plasmids were generated through standard cloning methods. CMV-mRuby-bGH insert was amplified from pcDNA3-mRuby2 plasmid (Addgene, Plasmid #40260) with primers containing microhomology sequences against the specific GSH and AAVS1 sites with 10 bp of overlapping ends for the pcDNA3 backbone. The pcDNA3 backbone was amplified with primers containing sequences of PITCh gRNA cut site (GCATCGTACGCGTACGTGTTTGG SEQ ID NO: 65) on both 5′ and 3′ ends of the backbone. The insert and the backbone were assembled using Gibson Assembly Master Mix (New England Biolabs, #E2611L).
Plasmids encoding CMV-mRuby-bGH flanked by GSH1/GSH2 300 bp homology arms were ordered from Twist Biosciences in pENTR vector. HDR donors were amplified from these plasmids using biotinylated primers with phosphorothioate bonds between the first 5 nucleotides on both 5′ and 3′ ends. Plasmid encoding CMV-LAMB3-T2A-GFP-bGH was generated by overlap extension PCR of LAMB3 cDNA, purchased from Genscript (NM_000228.3), and GFP-bGH sequence from Addgene (Plasmid #11154). T2A sequence was added to the 5′ primer of GFP-bGH. The resulting insert was cloned into pENTR vector from Twist Biosciences bearing GSH1 and GSH2 300 bp homology arms using Gibson Assembly Master Mix (NEB, #E2611L). HDR donors were amplified from these plasmids using biotinylated primers with phosphorothioate bonds between the first 5 nucleotides on both 5′ and 3′ ends. HDR donors were then purified from PCR mix using SPRI beads (Beckman Coulter, #B23318) at 0.4× beads to PCR mix ratio.
HEK293T cells were obtained from the American Type Culture Collection (ATCC) (#CRL-3216); the Jurkat leukemia E6-1 T cell line was obtained from ATCC (#TIB152). HEK cells were cultured in Dulbecco's Modified Eagle's Medium (DMEM) (ATCC 30-2002) supplemented with 2 mM L-glutamine (ATCC 30-2214). Jurkat cells were cultured in ATCC-modified RPMI-1640 (Thermo Fisher, #A1049101). All media were supplemented with 10% FBS, 50 U ml−1 penicillin and 50 μg ml−1 streptomycin. Detachment of HEK cells for passaging was performed using the TrypLE reagent (Thermo Fisher, #12605010). All cell lines were cultured at 37° C., 5% CO2 in a humidified atmosphere.
Prior to transfection of HEK293T and Jurkat cells, gRNA molecules were assembled by mixing 4 μl of custom Alt-R crRNA (200 μM, IDT) with 4 μL of Alt-R tracrRNA (200 μM, IDT, #1072534), incubating the mix at 95° C. for 5 min and cooling it to room temperature. 2 μL of assembled gRNA molecules were mixed with 2 μL of recombinant SpCas9 (61 μM, IDT, #1081059) and incubated for >10 min at room temperature to generate Cas9 RNP complexes.
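The molar amounts implied by these volumes can be worked out with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes that crRNA and tracrRNA anneal completely, so that the limiting component sets the amount of gRNA duplex. It relies on the identity that 1 μM equals 1 pmol per μl.

```python
def pmol(volume_ul: float, conc_um: float) -> float:
    """Amount in pmol, since 1 uM is 1 pmol per ul."""
    return volume_ul * conc_um


def duplex_conc_um(crrna_ul: float, crrna_um: float,
                   tracr_ul: float, tracr_um: float) -> float:
    """Concentration of annealed gRNA duplex, assuming complete annealing."""
    limiting_pmol = min(pmol(crrna_ul, crrna_um), pmol(tracr_ul, tracr_um))
    return limiting_pmol / (crrna_ul + tracr_ul)


# Volumes and stock concentrations as stated in the text.
grna_um = duplex_conc_um(4, 200, 4, 200)   # duplex concentration in the 8 ul mix
grna_pmol = pmol(2, grna_um)               # gRNA carried into the RNP mix
cas9_pmol = pmol(2, 61)                    # SpCas9 carried into the RNP mix
ratio = grna_pmol / cas9_pmol              # molar gRNA:Cas9 ratio

print(f"gRNA duplex: {grna_um:.0f} uM")
print(f"gRNA: {grna_pmol:.0f} pmol, Cas9: {cas9_pmol:.0f} pmol, ratio {ratio:.2f}:1")
```

Under these assumptions the gRNA is in molar excess over Cas9 in the assembled RNP mix.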
For transfection of HEK cells, the 100 μL format SF Cell line kit (Lonza, V4XC-2012) and electroporation program CM-130 were used on the 4D-Nucleofector. 1×10^6 HEK cells were transfected with 2 μg of PITCh donor, 2 μl of Cas9 RNP complex against the specific GSH and 2 μl of Cas9 RNP complex against the PITCh plasmid to liberate the MMEJ insert.
For transfection of Jurkat cells, the 100 μL format SE Cell line kit (Lonza, V4XC-1012) and electroporation program CL-120 were used on the 4D-Nucleofector. 1×10^6 Jurkat cells were transfected with 2 μg of PITCh donor, 2 μl of Cas9 RNP complex against the specific GSH and 2 μl of Cas9 RNP complex against the PITCh plasmid to liberate the MMEJ insert.
Transfected HEK and Jurkat cells were bulk sorted on day 3 and single-cell sorted on day 10 following transfection using a Sony SH800S sorter. The best-expressing clone was selected on day 30 and cultured for another 2 months. mRuby expression of the best-expressing clone was analyzed on a BD LSRFortessa Flow Cytometer on day 45, 60 and 90 following transfection.
Human peripheral blood mononuclear cells were purchased from Stemcell Technologies (#70025) and T cells were isolated using the EasySep Human T Cell Isolation kit (Stemcell Technologies, #17951). Primary human T cells were cultured for up to 25 days in ATCC-modified RPMI (Thermo Fisher, #A1049101) supplemented with 10% FBS, 10 mM non-essential amino acids, 50 μM 2-mercaptoethanol, 50 U ml−1 penicillin, 50 μg ml−1 streptomycin and freshly added 20 ng ml−1 recombinant human IL-2 (Peprotech, #200-02). T cells were cultured at 37° C., 5% CO2 in a humidified atmosphere. On day 1 of culture, transfection of primary T cells with Cas9 RNP complexes and GSH1/GSH2-mRuby HDR templates was performed using the 4D-Nucleofector and a 20 μL format P3 Primary Cell kit (Lonza, V4XP-3032). Briefly, gRNA molecules were assembled by mixing 4 μl of custom Alt-R crRNA (200 μM, IDT) with 4 μL of Alt-R tracrRNA (200 μM, IDT, #1072534), incubating the mix at 95° C. for 5 min and cooling it to room temperature. 2 μL of assembled gRNA molecules were mixed with 2 μL of recombinant SpCas9 (61 μM, IDT, #1081059) and incubated for >10 min at room temperature to generate Cas9 RNP complexes. 1×10^6 primary T cells were transfected with 1 μg of HDR template and 1 μl of GSH1/GSH2 Cas9 RNP complex using the EO-115 electroporation program. T cells were activated with Dynabeads™ Human T-Activator CD3/CD28 (Thermo Fisher, #11161D) 3-4 hours following transfection. mRuby-positive T-cells were bulk sorted on day 4 using a Sony SH800S sorter, re-activated with new beads on day 8, sorted again on day 11 and analyzed on a BD LSRFortessa Flow Cytometer on day 20.
Neonatal human dermal fibroblasts were purchased from Coriell Institute (Catalog ID GM03377). Primary fibroblasts were cultured for up to 25 days in Prime Fibroblast media (CELLNTEC, CnT-PR-F). Cells were passaged at 70% confluency using Accutase (CELLNTEC, CnT-Accutase-100). Detached cells were centrifuged for 5 min at 200×g at room temperature and seeded at 2,000 cells per cm². Fibroblasts were cultured at 37° C., 5% CO2 in a humidified atmosphere. Fibroblasts were transfected using Lipofectamine™ CRISPRMAX™ Cas9 Transfection Reagent (ThermoFisher Scientific, CMAX00001). Briefly, cells were transfected at 50% confluency with a 1:1 ratio of custom sgRNA (40 pmoles, Synthego) and SpCas9 (40 pmoles, Synthego) and 2.5 μg of GSH1/GSH2 LAMB3-T2A-GFP HDR template. GFP-positive fibroblasts were bulk sorted on day 3 and 10 using a Sony SH800S sorter and analyzed on a BD LSRFortessa Flow Cytometer on day 25.
Genotypic Analysis of GSH Integration
Genomic DNA was extracted from 1×10^6 cells using PureLink Genomic DNA extraction kit (ThermoFisher Scientific, #K1820-01). 5 μL of genomic DNA extract was then used as template for 25 μL PCR reactions using a primer pair with one primer residing outside of the homology arm of the integrated sequence and the other primer inside the integrated sequence. Obtained bands were gel extracted using Zymoclean Gel DNA Recovery Kit (Zymo Research, #D4001); 4 μl of eluted DNA was cloned into a TOPO-vector using Zero-blunt TOPO PCR Cloning Kit (ThermoFisher Scientific, #450245), incubated for 1 hour, transformed into NEB 5-alpha Competent E. coli cells (New England Biolabs, C2987H) and plated on agar plates containing kanamycin at 50 μg/ml. Resulting clones were picked and inoculated for overnight culture in 5 ml of liquid broth supplemented with kanamycin at 50 μg/ml. Liquid cultures were mini-prepped the following morning using ZR Plasmid Miniprep-Classic kit (Zymo Research, #D4015) and Sanger sequenced by Microsynth using M13-forward and M13-reverse standard primers.
Following single-cell sort, the best-expressing clone (GSH2) and wild-type (WT) of HEK293T and Jurkat cells were cultured for 80 days. Each of the four clones was split into 2 wells (1 and 2), cultured for an additional week, after which total RNA was extracted using PureLink RNA Mini Kit (ThermoFisher Scientific, #12183018A). Extracted total RNA was depleted of rRNA using RiboCop rRNA Depletion Kit (Lexogen, #144), first and second strands of cDNA were generated with SuperScript Double-Stranded cDNA Synthesis Kit (ThermoFisher Scientific, #11917010) using random hexamers, and flow cell adapters were ligated to the produced double-stranded cDNA. DNA fragments were enriched by PCR using Q5 High-Fidelity 2× Master Mix (New England Biolabs, #M0492S) and sequenced by the Illumina NextSeq 500 system in the Genomics Facility Basel. Sequencing reads were aligned to the human reference genome (GRCh38) using Subread (v1.6.2) with unique mapping (Liao et al., 2013). Expression levels were quantified at gene level using the featureCounts function in the R package Rsubread (Liao et al.). Normalization across the samples was performed using default parameters in the R package edgeR (Robinson et al., 2010). Differential expression analysis was performed using the exactTest function in the edgeR package. Gene ontology analysis was performed by supplying the differentially expressed genes (adjusted p value<0.05) to the goana function (Young et al., 2010).
Single-cell RNA sequencing was conducted on day 25 of culture for Donor 1 WT (D1 WT) and Donor 1 GSH1 (D1 GSH1) and on day 5 for Donor 2 WT (D2 WT). Single cell 10× libraries were constructed from the isolated single cells following the Chromium Single Cell 3′ GEM, Library & Gel Bead Kit v3 (10× Genomics, PN-1000075). Briefly, single cells were co-encapsulated with gel beads (10× Genomics, 2000059) in droplets using Chromium Single Cell B Chip (10× Genomics, 1000074). Final D1 WT, D1 GSH1 and D2 WT libraries were pooled and sequenced on the Illumina NovaSeq platform (26/8/0/93 cycles). Raw sequencing files were supplied to cellranger (v3.1.0) using the count argument under default parameters and the human reference genome (GRCh38-3.0.0). Filtering, normalization and transcriptome analysis were performed using a previously described pipeline in the R package Platypus (Yermanos et al.). Briefly, filtered gene expression matrices from cellranger were supplied as input into the Read10X function in the R package Seurat (Stuart et al., 2019). Cells containing more than 5% mitochondrial genes, or fewer than 150 unique genes detected, were filtered out before using the RunPCA function and subsequent normalization using the function RunHarmony from the Harmony package under default parameters (Korsunsky et al., 2019). Uniform manifold approximation projection was performed with Seurat's RunUMAP function using the first 20 dimensions and the previously computed Harmony reduction. Clustering was performed by the Seurat functions FindNeighbors and FindClusters using the Harmony reduction and first 20 principal components and the default cluster resolution of 0.5, respectively (Satija et al., 2015). Cluster-specific genes were determined by Seurat's FindMarkers function for those genes expressed in at least 25% of cells in one of the two groups.
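The cell-level quality filter described above can be sketched in a few lines of Python. This is an illustration of the two stated thresholds only, not the Platypus or Seurat implementation, which is in R; the CellQC record, the function names, and the barcodes are hypothetical. Cells exactly at a threshold are retained, following the "more than" and "fewer than" wording.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellQC:
    barcode: str      # hypothetical cell barcode
    n_genes: int      # number of unique genes detected in the cell
    pct_mito: float   # percent of counts assigned to mitochondrial genes


MAX_PCT_MITO = 5.0   # cells above this are removed
MIN_GENES = 150      # cells below this are removed


def passes_qc(cell: CellQC) -> bool:
    """True when the cell satisfies both thresholds stated in the text."""
    return cell.pct_mito <= MAX_PCT_MITO and cell.n_genes >= MIN_GENES


def filter_cells(cells: List[CellQC]) -> List[CellQC]:
    """Keep only cells passing QC, preserving input order."""
    return [c for c in cells if passes_qc(c)]


# Made-up cells for illustration.
cells = [
    CellQC("AAAC", 1200, 2.1),   # kept
    CellQC("AAAG", 140, 1.0),    # removed: too few genes
    CellQC("AAAT", 900, 7.5),    # removed: high mitochondrial fraction
    CellQC("AACA", 150, 5.0),    # kept: exactly at both thresholds
]
print([c.barcode for c in filter_cells(cells)])
```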
Differential genes between samples were calculated using the FindMarkers function from Seurat using the default Wilcoxon Rank Sum Test with Bonferroni multiple hypothesis correction. The source code for the analysis of scRNA-seq data is available at https://github.com/alexyermanos/Platypus.
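For readers unfamiliar with the Bonferroni correction named above, a minimal Python sketch follows. It is not Seurat's code: the function names are hypothetical, the number of tests is left as a caller-supplied parameter, and the 0.05 cutoff is an assumption borrowed from the bulk RNA-seq analysis rather than a value stated for the single-cell comparison. Each raw p value is multiplied by the number of tests and capped at 1.

```python
from typing import List


def bonferroni(p_values: List[float], n_tests: int) -> List[float]:
    """Bonferroni-adjusted p values: p * n_tests, capped at 1.0."""
    if n_tests < len(p_values):
        raise ValueError("n_tests must be at least the number of p values")
    return [min(1.0, p * n_tests) for p in p_values]


def significant(p_values: List[float], n_tests: int, alpha: float = 0.05) -> List[bool]:
    """Flag tests whose adjusted p value falls below alpha (assumed 0.05)."""
    return [p_adj < alpha for p_adj in bonferroni(p_values, n_tests)]


# Made-up raw p values for three genes, corrected for 4 tests in total.
raw = [0.5, 0.25, 0.0001]
print(bonferroni(raw, 4))
print(significant(raw, 4))
```

Because the correction scales with the number of tests, the choice of that number (for example, all genes in the dataset rather than only those tested) directly affects which genes are called differential.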
All references, patents and patent applications disclosed herein are incorporated by reference with respect to the subject matter for which each is cited, which in some cases may encompass the entirety of the document.
The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”
It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited.
In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03.
The terms “about” and “substantially” preceding a numerical value mean±10% of the recited numerical value.
Where a range of values is provided, each value between and including the upper and lower ends of the range is specifically contemplated and described herein.
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. provisional application No. 63/155,504, filed Mar. 2, 2021, which is incorporated by reference herein in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/018246 | 3/1/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63155504 | Mar 2021 | US |