The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on Mar. 15, 2023, is named 59725-710_201_SL.xml and is 389,788 bytes in size.
Manipulation of genomes such as genome editing or rewriting is an important technology in developing therapeutics such as gene therapy, cell immunotherapy (e.g., T cells, CAR-T cells, etc.), and regenerative medicine (e.g., induced pluripotent stem cells) to treat diseases such as cancer, autoimmunity, and neurodegeneration, or for tissue/organ replacement and rejuvenation. There are many challenges and limited approaches for manipulating genomes or site-specifically integrating large DNA constructs into human and other mammalian cells. For example, there are no straightforward approaches for integrating large (sub-megabase) constructs sequentially, to allow for megabase scale DNA integrations, e.g., placing the large DNA constructs anywhere in the genome or in any cell types. In addition, there are limited approaches for providing “scarless” or “nearly scarless” large DNA integrations, i.e., no remnants of unwanted DNA such as selectable markers. The ability to integrate large DNA constructs into the human genome, especially in human iPSCs, would facilitate the rewriting of the genome, delivery of large genetic circuits, and provide a platform for producing cellular therapies and regenerative medicines.
Cellular therapies hold great promise for combating previously intractable diseases (e.g., cancer, organ failure, neuropathy, etc.). While autologous (patient-derived) cells generally are optimal, this route remains limited for many applications, cost-prohibitive, and burdensome. Allogeneic (donor-derived) pluripotent stem cells (PSCs), which can be differentiated to any cell type, provide greater scalability and cost savings. However, the allogeneic approach is limited by Human Leukocyte Antigen (HLA) immune-matching, as without HLA immune-matching, the transplanted donor cells will be rejected by a patient's immune system. There is a need for developing a technology to provide personalized stem cells for both immunotherapy and regenerative medicine applications, leading to safer and more efficacious cell-based treatments.
In some aspects, provided herein is a method of integrating a DNA construct into a genome, the method comprising: a) contacting the genome in a sample with one or more agents, wherein the one or more agents are capable of cleaving the genome at a locus; b) integrating a first nucleic acid sequence into the genome at the locus; and c) integrating one or more second nucleic acid sequences into the first nucleic acid sequence in the genome, thereby integrating the DNA construct into the genome, wherein each of the one or more second nucleic acid sequences comprises a cargo sequence.
In some aspects, provided herein is a method of producing a synthetic genome, the method comprising: a) obtaining a cell from a sample, wherein the cell comprises a genome; b) contacting a first nucleic acid sequence at a first locus on the genome with a first set of one or more agents, wherein the first nucleic acid sequence encodes a first plurality of antigens and wherein the first locus comprises one or more binding sites for the first set of one or more agents; and c) contacting a second nucleic acid sequence at a second locus on the genome with a second set of one or more agents, wherein the second nucleic acid sequence encodes a second antigen, and wherein the second locus comprises one or more binding sites for the second set of one or more agents, thereby producing the synthetic genome.
In some aspects, provided herein is a method of producing a synthetic genome, the method comprising: a) obtaining a cell from a sample, wherein the cell comprises a genome; b) contacting a nucleic acid sequence encoding an antigen at a locus on the genome with a set of one or more agents, wherein the locus comprises one or more binding sites for the set of one or more agents; and c) integrating a synthetic sequence into the locus on the genome, wherein integrating the synthetic sequence into the locus on the genome preserves regulatory components for gene expression, thereby producing the synthetic genome.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The features of the present disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings of which:
Provided herein are methods and compositions for integrating a DNA construct into a genome, the method comprising contacting the genome in a sample with one or more agents capable of cleaving the genome at a locus, integrating a first nucleic acid sequence into the genome at the locus, and integrating one or more second nucleic acid sequences into the first nucleic acid sequence in the genome, thereby integrating the DNA construct into the genome, wherein each of the one or more second nucleic acid sequences comprises a cargo sequence. In some aspects, provided herein are methods and compositions for integrating a large (sub-megabase) DNA construct, e.g., a DNA construct with at least 10-500 kilobases. In some aspects, methods described herein comprise integrating a landing pad sequence into a genome at a locus and integrating one or more cargo sequences into the landing pad sequence in the genome. In an exemplary workflow, a schematic of megabase-scale genome writing design is shown
As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. It should also be noted that the term “or” is generally employed in its sense including “and/or” unless the content clearly dictates otherwise. The terms “and/or” and “any combination thereof” and their grammatical equivalents as used herein, can be used interchangeably. These terms can convey that any combination is specifically contemplated. Solely for illustrative purposes, the following phrases “A, B, and/or C” or “A, B, C, or any combination thereof” can mean “A individually; B individually; C individually; A and B; B and C; A and C; and A, B, and C.” The term “or” can be used conjunctively or disjunctively, unless the context specifically refers to a disjunctive use.
The term “about” or “approximately” can mean within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed.
Throughout this disclosure, numerical features are presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of any embodiments. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range to the tenth of the unit of the lower limit unless the context clearly dictates otherwise. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual values within that range, for example, 1.1, 2, 2.3, 5, and 5.9. This applies regardless of the breadth of the range. The upper and lower limits of these intervening ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention, unless the context clearly dictates otherwise.
As used in this specification and claim(s), the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited elements or method steps. It is contemplated that any embodiment discussed in this specification can be implemented with respect to any method or composition of the present disclosure, and vice versa. Furthermore, compositions of the present disclosure can be used to achieve methods of the present disclosure.
Reference in the specification to “some embodiments,” “an embodiment,” “one embodiment” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the present disclosure. To facilitate an understanding of the present disclosure, a number of terms and phrases are defined below.
Certain specific details of this description are set forth in order to provide a thorough understanding of various embodiments. However, one skilled in the art will understand that the present disclosure may be practiced without these details. In other instances, well-known structures have not been shown or described in detail to avoid unnecessarily obscuring descriptions of the embodiments. Unless the context requires otherwise, throughout the specification and claims which follow, the word “comprise” and variations thereof, such as, “comprises” and “comprising” are to be construed in an open, inclusive sense, that is, as “including, but not limited to.” Further, headings provided herein are for convenience only and do not interpret the scope or meaning of the claimed disclosure.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure, suitable methods and materials are described below.
There are limited approaches for site-specifically integrating very large DNA constructs (e.g., >10 kilobases) into human and other mammalian cells, especially primary cells like induced pluripotent stem cells (iPSCs) or embryonic stem cells (ESCs). Furthermore, there are no straightforward and portable (e.g., placed anywhere in the genome including silenced DNA and in any cell type) approaches for integrating large constructs sequentially, to allow for megabase scale DNA integrations. Finally, there are limited approaches for providing “scarless” or “nearly scarless” large DNA integrations, i.e., no remnants of unwanted DNA such as selectable markers like neomycin resistance cassettes. The ability to integrate large DNA constructs into the human genome, especially in human iPSCs, would facilitate the rewriting of the genome, delivery of large genetic circuits, and provide a platform for producing cellular therapies and regenerative medicines. Genome rewriting may provide a tool to create synthetic pathways, synthetic genomes, synthetic cells, or human patient DNA, and a tool for disease modeling.
Recombinase Mediated Cassette Exchange (RMCE) is a general approach for integrating large DNAs into the genome of mouse embryonic stem cells. However, there have been no demonstrations or approaches optimized for integrating large DNA into human stem cells such as iPSCs. RMCE uses tyrosine recombinases (including, but not limited to, Cre or Flp recombinase) and 2 incompatible recombinase recognition sites (including, but not limited to, Cre or Flp recognition sites). For example, a commonly used approach using Cre recombinase uses the incompatible recognition sites loxP and lox2272.
Provided herein is a new RMCE landing pad approach called REcombinase WRiting of Iterative DNA and Trap Excision (REWRITE). In some aspects, REWRITE has optimal features that allow for complete portability to any mammalian cells, including but not limited to human iPSCs and mouse ESCs, and to any place in the genome, including but not limited to silenced regions; that allow for sequential integration of large DNA constructs towards the megabase scale; and that include a self-excising selection cassette for rapid and seamless removal of genetic scars, including but not limited to selectable markers. In some aspects, the REWRITE landing pad can be placed at any region of the genome. In some embodiments, site-specific nucleases (SSNs) or homology-directed recombination (HR) can be used to place the REWRITE landing pad. In some embodiments, the SSN comprises a meganuclease, a zinc-finger nuclease (ZFN), a TAL effector nuclease (TALEN), or a clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system.
These four major classes of gene-editing techniques, namely meganucleases, ZFNs, TALENs, and CRISPR/Cas systems, share a common mode of action: binding a user-defined sequence of DNA and mediating a double-stranded DNA break (DSB). The DSB may then be repaired by HR, an event that introduces the homologous sequence from a donor DNA fragment, or by non-homologous end joining (NHEJ) when there is no donor DNA present.
A CRISPR-Cas system may be used with a guide target sequence for genetic screening, targeted transcriptional regulation, targeted knock-in, and targeted genome editing, including base editing, epigenetic editing, and introducing double strand breaks (DSBs) for homologous recombination-mediated insertion of a nucleotide sequence. A CRISPR-Cas system comprises an endonuclease protein whose DNA-targeting specificity and cutting activity can be programmed by a short guide RNA or a duplex crRNA/tracrRNA. A CRISPR endonuclease comprises a Cas effector nuclease, typically microbial Cas9, and a short guide RNA (gRNA) or an RNA duplex comprising an 18 to 20 nucleotide targeting sequence that directs the nuclease to a location of interest in the genome. Genome editing can refer to the targeted modification of a DNA sequence, including but not limited to, adding, removing, replacing, or modifying existing DNA sequences, and inducing chromosomal rearrangements or modifying transcription regulation elements (e.g., methylation/demethylation of a promoter sequence of a gene) to alter gene expression. As described above, a CRISPR-Cas system requires a guide system that can direct the Cas protein to the target DNA site in the genome. In some instances, the guide system comprises a CRISPR RNA (crRNA) with a 17-20 nucleotide sequence that is complementary to a target DNA site and a trans-activating crRNA (tracrRNA) scaffold recognized by the Cas protein (e.g., Cas9). The 17-20 nucleotide sequence complementary to a target DNA site is referred to as a spacer, while the 17-20 nucleotide target DNA sequence is referred to as a protospacer. While crRNAs and tracrRNAs exist as two separate RNA molecules in nature, a single guide RNA (sgRNA or gRNA) can be engineered to combine and fuse crRNA and tracrRNA elements into one single RNA molecule. Thus, in one embodiment, the gRNA comprises two or more RNAs, e.g., crRNA and tracrRNA.
In another embodiment, the gRNA comprises a sgRNA comprising a spacer sequence for genomic targeting and a scaffold sequence for Cas protein binding. In some instances, the guide system naturally comprises a sgRNA. For example, Cas12a/Cpf1 utilizes a guide system lacking tracrRNA and comprising only a crRNA containing a spacer sequence and a scaffold for Cas12a/Cpf1 binding. While the spacer sequence can be varied depending on a target site in the genome, the scaffold sequence for Cas protein binding can be identical for all gRNAs.
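The spacer-to-target relationship described above can be illustrated with a minimal Python sketch. It assumes only Watson-Crick pairing between an RNA spacer and one DNA strand; the function names and the example spacer are hypothetical, and PAM requirements, mismatch tolerance, and Cas-specific rules are deliberately not modeled.

```python
# Illustrative sketch only: locate positions on a DNA strand that are
# fully complementary to an RNA spacer of 17-20 nucleotides.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def complementary_dna(spacer_rna: str) -> str:
    """Return the DNA sequence (5'->3') that base-pairs with an RNA spacer (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(spacer_rna.upper()))


def find_complementary_sites(spacer_rna: str, dna_strand: str) -> list[int]:
    """Return 0-based start positions on dna_strand that pair with the whole spacer."""
    if not 17 <= len(spacer_rna) <= 20:
        raise ValueError("spacer length is expected to be 17-20 nucleotides")
    site = complementary_dna(spacer_rna)
    dna = dna_strand.upper()
    return [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]


if __name__ == "__main__":
    spacer = "GACUGACUGACUGACUGAC"  # hypothetical 19-nt spacer
    strand = "AAAA" + complementary_dna(spacer) + "TTTT"
    print(find_complementary_sites(spacer, strand))
```

Because the scaffold portion of a gRNA can be identical across guides, only the spacer argument would change from one target site to another in a sketch of this kind.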
CRISPR-Cas systems described herein can comprise different CRISPR enzymes. For example, the CRISPR-Cas system can comprise Cas9, Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12g, Cas12h, or Cas12i. Non-limiting examples of Cas enzymes include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5d, Cas5t, Cas5h, Cas5a, Cas6, Cas7, Cas8, Cas8a, Cas8b, Cas8c, Cas9 (also known as Csn1 or Csx12), Cas10, Cas10d, Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12f/Cas14/C2c10, Cas12g, Cas12h, Cas12i, Cas12k/C2c5, Cas13a/C2c2, Cas13b, Cas13c, Cas13d, C2c4, C2c8, C2c9, Csy1, Csy2, Csy3, Csy4, Cse1, Cse2, Cse3, Cse4, Cse5e, Csc1, Csc2, Csa5, Csn1, Csn2, Csm1, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx1S, Csx11, Csf1, Csf2, Csf3, Csf4, Csd1, Csd2, Cst1, Cst2, Csh1, Csh2, Csa1, Csa2, Csa3, Csa4, Csa5, GSU0054, Type II Cas effector proteins, Type V Cas effector proteins, Type VI Cas effector proteins, CARF, DinG, homologues thereof, or modified or engineered versions thereof such as dCas9 (endonuclease-dead Cas9) and nCas9 (Cas9 nickase that has an inactive DNA cleavage domain). In some cases, the compositions, methods, devices, and systems described herein may use the Cas9 nuclease from Streptococcus pyogenes, of which amino acid sequences and structures are well known to those skilled in the art.
In some aspects, described herein are methods for contacting a genome from a sample with one or more agents capable of cleaving the genome at a locus. In some embodiments, the contacting may occur in vitro. In some embodiments, the contacting may occur in vivo, e.g., in a cell. In some embodiments, the one or more agents comprise a polypeptide, a polynucleotide, or a combination thereof. In some embodiments, the polypeptide comprises an enzyme, e.g., a site-specific nuclease. Examples of a site-specific nuclease are described above. In some embodiments, a site-specific nuclease comprises an engineered homing endonuclease or meganuclease, a zinc-finger nuclease (ZFN), a transcription activator-like effector nuclease (TALEN), a clustered regularly interspaced short palindromic repeats (CRISPR)/Cas nuclease, or a combination thereof. In some embodiments, the polynucleotide comprises a guide RNA (gRNA). In some embodiments, the one or more agents comprise a site-specific nuclease and a gRNA (e.g., a CRISPR/Cas system).
Agents described herein can be delivered into cells in vitro or in vivo by art-known methods or as described herein. Delivery methods such as physical, chemical, and viral methods are also known in the art [63]. In some instances, physical delivery methods can include, but are not limited to, electroporation, microinjection, or use of ballistic particles. Chemical delivery methods, on the other hand, require use of complex molecules such as calcium phosphate, lipid, or protein. In some embodiments, viral delivery methods are applied for gene editing techniques using viruses such as, but not limited to, adenovirus, lentivirus, and retrovirus. In some embodiments, agents described herein can be delivered via a carrier. In some embodiments, agents described herein can be delivered by, e.g., vectors (e.g., viral or non-viral vectors), non-vector based methods (e.g., using naked DNA, DNA complexes, lipid nanoparticles, RNA such as mRNA), or a combination thereof. In some embodiments, a carrier can comprise a vector, a messenger RNA (mRNA), double stranded DNA (dsDNA), single stranded DNA (ssDNA), or a plasmid. In some embodiments, agents can be delivered directly to cells as naked DNA or RNA, for instance by means of transfection or electroporation, or can be conjugated to molecules (e.g., N-acetylgalactosamine) promoting uptake by cells.
In some embodiments, vectors can comprise one or more sequences encoding one or more agents described herein. Vectors can also comprise a sequence encoding a signal peptide (e.g., for nuclear localization, nucleolar localization, or mitochondrial localization), associated with (e.g., inserted into or fused to) a sequence coding for a protein. As one example, vectors can include a Cas9 coding sequence that includes one or more nuclear localization sequences (e.g., a nuclear localization sequence from SV40). Vectors described herein can also include any suitable number of regulatory/control elements, e.g., promoters, enhancers, introns, polyadenylation signals, Kozak consensus sequences, or internal ribosome entry sites (IRES). These elements are well known in the art. Vectors described herein may include recombinant viral vectors. Any viral vectors known in the art can be used. Examples of viral vectors include, but are not limited to, lentivirus (e.g., HIV and FIV-based vectors), Adenovirus (e.g., AD100), Retrovirus (e.g., Moloney murine leukemia virus, MML-V), herpesvirus vectors (e.g., HSV-2), and Adeno-associated viruses (AAVs), or other plasmid or viral vector types. In some embodiments, agents described herein may be delivered in one carrier (e.g., one vector). In some embodiments, agents described herein may be delivered in multiple carriers (e.g., multiple vectors).
In addition, viral particles can be used to deliver agents in nucleic acid and/or peptide form. For example, “empty” viral particles can be assembled to contain any suitable cargo. Viral vectors and viral particles can also be engineered to incorporate targeting ligands to alter target tissue specificity. Non-viral vectors can also be used to deliver agents according to the present disclosure. One example of a non-viral nucleic acid vector is a nanoparticle, which can be organic or inorganic. Nanoparticles are well known in the art. Any suitable nanoparticle design can be used to deliver agents described herein (e.g., nucleic acids encoding such agents).
In some embodiments, agents described herein can be delivered as a ribonucleoprotein (RNP) to cells. An RNP may comprise a nucleic acid binding protein, e.g., Cas9, in a complex with a gRNA targeting a genome/locus/sequence of interest. RNPs can be delivered to cells using known methods in the art, including, but not limited to electroporation, nucleofection, or cationic lipid-mediated methods, for example, as reported by Zuris, J. A. et al., 2015, Nat. Biotechnology, 33(1):73-80.
Provided herein is a method of integrating a DNA construct into a genome, the method comprising contacting the genome from a sample with one or more agents capable of cleaving the genome at a locus, integrating a first nucleic acid sequence into the genome at the locus, and integrating one or more second nucleic acid sequences into the first nucleic acid sequence in the genome, thereby integrating the DNA construct into the genome, wherein each of the one or more second nucleic acid sequences comprises a cargo sequence. In some embodiments, the contacting and/or the integrating may occur in vitro. In some embodiments, the contacting and/or the integrating may occur in vivo, e.g., in a cell. In some embodiments, the first nucleic acid sequence comprises a landing pad sequence. A landing pad sequence as described herein may comprise a gene, a regulatory element, or a combination thereof. Details of a landing pad sequence are described in Example 1. In some embodiments, a regulatory element may comprise a promoter, a terminator, an enhancer, a recombinase recognition site, or a combination thereof. Examples of a promoter can include, but are not limited to, a universal chromatin opening element (UCOE), a human EF1α promoter, a cytomegalovirus (CMV) promoter, a simian virus 40 (SV40) promoter, a CAG promoter, or a combination thereof.
In some embodiments, a promoter may be an inducible promoter, a cell specific promoter, a developmental specific promoter, or a tissue specific promoter. Examples of tissue specific promoters include, but are not limited to, a FABP promoter, an Lck promoter, a CamKII promoter, a CD19 promoter, a Keratin promoter, an Albumin promoter, an aP2 promoter, an insulin promoter, an MCK promoter, a MyHC promoter, a WAP promoter, or a Col2A promoter. In some embodiments, the exogenous promoter may be a CMV promoter, an RSV long terminal repeat (LTR) promoter, a simian virus 40 (SV40) promoter, a dihydrofolate reductase (DHFR) promoter, a beta-actin promoter, a phosphoglycerate kinase (PGK) promoter, or an elongation factor-1 alpha (EF1a) promoter. In some embodiments, a promoter may comprise a promoter system comprising one or more promoters, enhancers, or regulatory sequences. For example, a promoter may comprise a cytomegalovirus early enhancer element, a first exon and first intron of chicken beta-actin gene, or a splice acceptor of rabbit beta-globin gene (the CAG promoter system). In some embodiments, a promoter may comprise one or more translational enhancer sequences of human telomerase reverse transcriptase (hTERT), a simian virus 40 (SV40) enhancer sequence, or a CMV enhancer sequence (the SGE promoter system). Promoter systems as described in Watanabe et al. Oncol Rep. 2014 31(3): 1089-1095 are incorporated herein by reference in their entirety. In some embodiments, a promoter may be a CMV promoter. In some embodiments, a promoter can comprise an enhancer region, a TATA box, and a transcription start point. In particular embodiments, a promoter may comprise a CAG promoter system. In another embodiment, a promoter is an SGE promoter system. In some embodiments, a promoter may be constitutive or inducible. In some embodiments, a promoter is a mammalian specific promoter.
Examples of a terminator can include but are not limited to simian virus 40 (SV40) terminator, human growth hormone (hGH) terminator, bovine growth hormone (BGH) terminator, or rabbit beta-globin (rbGlob) terminator. In some embodiments, a landing pad sequence may comprise a gene encoding estrogen receptor 2 (ERT2) or a variant thereof, a gene encoding P2A or a variant thereof, a gene encoding a recombinase or a variant thereof, a blasticidin resistance gene, or a gene encoding a fluorescent protein or a variant thereof. In some embodiments, the ERT2 gene is fused with a gene encoding a recombinase. In some embodiments, the ERT2 gene is fused with a gene encoding a tyrosine recombinase. In some embodiments, the ERT2 gene is fused with a gene encoding a Cre recombinase, a Flp recombinase, a variant thereof, or a combination thereof. In some embodiments, the ERT2 gene is induced by tamoxifen. In some embodiments, a promoter may comprise a human EF1α promoter combined with a universal chromatin opening element (UCOE). In some embodiments, a human EF1α promoter combined with a universal chromatin opening element (UCOE) may serve as a highly active gene promoter.
In some embodiments, a landing pad sequence may comprise one or more restriction enzyme site sequences. Examples of restriction enzyme site sequences can include, but are not limited to, the restriction site sequences of AarI, AatII, Acc65I, AccI, AclI, AcuI, AfeI, AflII, AflIII, AgeI, AhdI, AjuI, AleI, AlfI, AloI, AlwNI, ApaI, ApaLI, ApoI, AscI, AseI, AsiSI, AvaI, AvrII, BaeGI, BaeI, BamHI, BanI, BanII, BarI, BbsI, BbvCI, BcgI, BciVI, BclI, BdaI, BfuAI, BglI, BglII, BlpI, Bme1580I, BmeT110I, BmgBI, BmrI, BmtI, BplI, BpmI, Bpu10I, BpuEI, BsaAI, BsaBI, BsaHI, BsaI, BsaWI, BsaXI, BseRI, BseYI, BsgI, BsiEI, BsiHKAI, BsiWI, BsmBI, BsmI, BsoBI, Bsp1286I, BspDI, BspEI, BspHI, BspMI, BspQI, BsrBI, BsrDI, BsrFI, BsrGI, BssHII, BssSI, BstAPI, BstBI, BstEII, BstXI, BstYI, BstZ17I, Bsu36I, BtgI, BtgZI, BtsI, ClaI, CspCI, DraI, DraIII, DrdI, EaeI, EagI, EarI, EciI, Eco53kI, Eco57MI, EcoNI, EcoO109I, EcoP15I, EcoRI, EcoRV, FalI, FseI, FspAI, FspI, HaeII, Hin4I, HincII, HindIII, HpaI, KasI, KflI, KpnI, MauBI, MfeI, MluI, MmeI, MreI, MscI, MslI, MspA1I, NaeI, NarI, NcoI, NdeI, NgoMIV, NheI, NmeAIII, NotI, NruI, NsiI, NspI, PacI, PaeR7I, PasI, PciI, PflFI, PflMI, PfoI, PluTI, PmeI, PmlI, PpiI, PpuMI, PshAI, PsiI, PspFI, PspOMI, PspXI, PstI, PvuI, PvuII, RsrII, SacI, SacII, SalI, SapI, SbfI, ScaI, SexAI, SfcI, SfiI, SfoI, SgrAI, SmaI, SmlI, SnaBI, SpeI, SphI, SrfI, SspI, StuI, StyI, TaqII, TatI, TsoI, TspMI, TstI, Tth111I, XcmI, XhoI, XmaI, XmnI, and ZraI. In some embodiments, a landing pad sequence may comprise a BsaI restriction enzyme site sequence.
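As an illustration of how a candidate landing pad sequence might be checked for a restriction enzyme site, the following minimal Python sketch scans both strands for the BsaI recognition sequence (5'-GGTCTC-3'). The function names and the example sequence are hypothetical; the sketch only locates recognition sites and does not model the offset cleavage positions of type IIS enzymes.

```python
# Illustrative sketch only: report where a recognition sequence occurs on
# either strand of a DNA sequence. Defaults to the BsaI site GGTCTC.
BSAI_SITE = "GGTCTC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(sequence: str, site: str = BSAI_SITE) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for each occurrence of the site.

    '+' means the site reads 5'->3' on the given strand; '-' means it reads
    5'->3' on the opposite strand. Palindromic sites are reported once as '+'.
    """
    seq = sequence.upper()
    forward = site.upper()
    reverse = reverse_complement(forward)
    hits = []
    for i in range(len(seq) - len(forward) + 1):
        window = seq[i:i + len(forward)]
        if window == forward:
            hits.append((i, "+"))
        elif window == reverse:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    example = "AAGGTCTCAAAAGAGACCTT"  # hypothetical sequence with one site per strand
    print(find_sites(example))
```

The same function can be reused for any other enzyme in the list above by passing that enzyme's recognition sequence as the site argument, provided the sequence contains only unambiguous bases.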
In some embodiments, a landing pad sequence may comprise a gene encoding a recombinase. In some embodiments, the recombinase may be codon-optimized. In some embodiments, the recombinase may be codon-optimized for mammals. Any codon-optimization methods well known in the art can be used. In some embodiments, the recombinase may lack CpG sites. In some embodiments, the recombinase may be codon-optimized for mammals and lack CpG sites. In some embodiments, recombinase expression may be induced. In some embodiments, an inducible gene expression system may be used. Inducible gene expression systems are well known in the art. Examples of inducible gene expression systems can include, but are not limited to, tetracycline-controlled operator system (e.g., LacR/O-based systems, lacR-VP16 chimeric system, Tet-on/Tet-off system, etc.), cumate-controlled operator system, protein-protein interaction-based chimeric system (e.g., rapamycin system, abscisic acid-regulated interaction system, light-induced protein-protein interaction system, etc.), tamoxifen-controlled recombinase system, and riboswitch-regulatable expression system (e.g., bacteria-derived RNA aptamers linked with hammerhead ribozymes (aptazymes), etc.). In some embodiments, inducible gene expression systems described herein may be combined with RNA interference or a CRISPR/Cas9 system. In some embodiments, recombinase expression may be induced by a tetracycline-controlled operator system. In some embodiments, recombinase expression may be induced by a Tet-on/Tet-off system. In some embodiments, a Tet-on/Tet-off system comprises a promoter. In some embodiments, a Tet-on/Tet-off system comprises a TET-On 3G promoter (TRE3G). In some embodiments, recombinase expression may be induced by tetracycline or a derivative thereof. In some embodiments, recombinase expression may be induced by doxycycline.
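Whether a coding sequence lacks CpG sites can be checked with a simple dinucleotide scan. The Python sketch below is an illustration under stated assumptions: the function name and both example sequences are hypothetical and are not taken from the Sequence Listing. The two examples encode the same three amino acids (Met-Ala-Thr) with different synonymous codons, showing that a CpG can sit within a codon or span a codon boundary and can be removed without changing the encoded protein.

```python
# Illustrative sketch only: list the positions of CpG dinucleotides
# (a C immediately followed by a G, read 5'->3') in a DNA sequence.
def cpg_positions(sequence: str) -> list[int]:
    """Return 0-based positions of the C in every CG dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


if __name__ == "__main__":
    with_cpg = "ATGGCGACG"      # Met-Ala-Thr using GCG and ACG codons
    without_cpg = "ATGGCCACC"   # Met-Ala-Thr using GCC and ACC codons
    print(cpg_positions(with_cpg))
    print(cpg_positions(without_cpg))
```

A sequence for which the function returns an empty list lacks CpG sites on the scanned strand; because CG is its own reverse complement, the opposite strand then lacks them as well.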
In some embodiments, a landing pad sequence may comprise a gene encoding a fluorescent protein. For example, a landing pad sequence may comprise a gene encoding a red fluorescent protein (RFP), a green fluorescent protein (GFP), a yellow fluorescent protein (YFP), a blue fluorescent protein (BFP), a cyan fluorescent protein (CFP), or any other fluorescent protein known in the art. In some embodiments, a landing pad sequence may comprise a GFP. In some embodiments, a landing pad sequence may comprise an RFP. In some embodiments, a landing pad sequence may comprise an mNeonGreen (GFP).
In some embodiments, a landing pad sequence may comprise a recombinase recognition site. Examples of a recombinase recognition site can include, but are not limited to, a Cre recognition site or a Flp recognition site. Examples of a recombinase recognition site can include, but are not limited to, loxP, loxM, lox2272, loxFAS, lox5171, frt, a variant thereof, or a combination thereof. In some embodiments, a landing pad sequence described herein may comprise a recombinase recognition site placed downstream of or 3′ to a promoter. For example, a landing pad sequence described herein may comprise a loxP site placed downstream of or 3′ to an EFα promoter. In some embodiments, an incompatible recombinase recognition site may be used. In some embodiments, an incompatible Cre or Flp recombinase recognition site may be used. In some embodiments, one, two or more incompatible recombinase recognition sites may be used. In some embodiments, an incompatible recombinase recognition site may comprise loxP or lox2272. In some embodiments, incompatible recombinase recognition sites loxP and lox2272 may be used.
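As an illustration of why loxP and lox2272 can serve as incompatible recognition sites, the following minimal Python sketch compares the 8-bp spacer regions of two lox sites. The arm and spacer sequences shown are the commonly reported ones and are an assumption here; they are not taken from the Sequence Listing of this disclosure. The names `split_lox` and `are_compatible` are hypothetical, and the rule that two sites recombine efficiently only when their spacers are identical is a simplification.

```python
# Commonly reported lox sequences (assumption: not from this disclosure).
# Each 34-bp site is a 13-bp arm, an 8-bp spacer, and a 13-bp arm.
ARM_LEFT = "ATAACTTCGTATA"
ARM_RIGHT = "TATACGAAGTTAT"
SPACERS = {"loxP": "ATGTATGC", "lox2272": "AAGTATCC"}
LOX_SITES = {name: ARM_LEFT + sp + ARM_RIGHT for name, sp in SPACERS.items()}


def split_lox(site: str) -> tuple[str, str, str]:
    """Split a 34-bp lox site into (left arm, spacer, right arm)."""
    if len(site) != 34:
        raise ValueError("a lox site is expected to be 34 bp long")
    return site[:13], site[13:21], site[21:]


def are_compatible(site_a: str, site_b: str) -> bool:
    """Simplified rule: sites are compatible only if their spacers match."""
    return split_lox(site_a)[1] == split_lox(site_b)[1]


def spacer_mismatches(site_a: str, site_b: str) -> int:
    """Count positions at which the two spacers differ."""
    sp_a, sp_b = split_lox(site_a)[1], split_lox(site_b)[1]
    return sum(1 for x, y in zip(sp_a, sp_b) if x != y)
```

Under these assumptions, `are_compatible(LOX_SITES["loxP"], LOX_SITES["lox2272"])` evaluates to `False`, which is the property that allows a pair of such sites to flank a cassette without recombining with each other.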
In some embodiments, a landing pad sequence described herein may comprise a gene encoding a recombinase. A recombinase may be a site-specific recombinase. A site-specific recombinase may include, but is not limited to, a Cre recombinase, a Hin recombinase, a Tre recombinase, or a Flippase (Flp) recombinase. A recombinase may comprise a tyrosine recombinase or a serine recombinase. In some embodiments, a recombinase may be induced. In some embodiments, expression of the recombinase may be induced. Inducible gene expression systems described herein or any inducible gene expression systems known in the art can be used. In some embodiments, the recombinase may be doxycycline inducible. Doxycycline-inducible gene expression systems are well known in the art. In some embodiments, a recombinase is constitutively expressed. In some embodiments, the recombinase may be codon-optimized. In some embodiments, the recombinase may be codon-optimized for mammals. Any codon-optimization methods well known in the art can be used. In some embodiments, the recombinase may lack CpG sites. In some embodiments, the recombinase may be codon-optimized for mammals and lack CpG sites. In some embodiments, the recombinase may comprise Cre recombinase. In some embodiments, the recombinase may comprise codon-optimized Cre recombinase. In some embodiments, the recombinase may comprise a doxycycline inducible codon-optimized Cre recombinase (iCre). In some embodiments, the iCre recombinase is expressed from a plasmid comprising a sequence of SEQ ID NO: 13.
In some embodiments, the recombinase may be self-excisable. In some embodiments, the recombinase may comprise Flp recombinase. In some embodiments, the recombinase may be fused with a tamoxifen inducible estrogen receptor (ERT2). In some embodiments, the Flp recombinase may comprise a self-excising Flp recombinase. In some embodiments, the Flp recombinase may be an inducible Flp recombinase. In some embodiments, the Flp recombinase may be fused with a tamoxifen inducible estrogen receptor (ERT2). In some embodiments, the Flp recombinase may be a self-excising inducible Flp recombinase fused with a tamoxifen inducible estrogen receptor (ERT2).
In some embodiments, the landing pad sequence may not comprise a gene encoding a recombinase. In the embodiment where the landing pad sequence does not comprise a gene encoding a recombinase, a recombinase may be transiently expressed from a separate delivery vehicle. A delivery vehicle may be any standard expression vector. In some embodiments, a recombinase may be transiently expressed from a plasmid. In some embodiments, a landing pad sequence may comprise a sequence comprising SEQ ID NO: 2, SEQ ID NO: 5, SEQ ID NO: 6, or SEQ ID NO: 7.
The present disclosure also encompasses a Deleter construct. A Deleter construct can be assembled by standard cloning methods using the same backbone as the REWRITE landing pad. A Deleter construct can comprise a recombination recognition site, a promoter, a marker gene for selection, a sequence that can induce ribosome skipping, a kinase gene, a polyA sequence, or a combination thereof. For example, a Deleter construct can comprise an FRTm (F5) sequence, a PGK promoter (e.g., a constitutive PGK promoter), a hygromycin resistance gene, a P2A sequence, a thymidine kinase gene, an SV40 polyA sequence, and another FRTm (F5) sequence. In some embodiments, a Deleter construct can comprise one or more restriction enzyme digestion sites for cloning. For example, a Deleter construct can comprise BsaI sites for golden gate cloning. In some embodiments, a Deleter construct can comprise homology arms or sequences for genomic integration. In some embodiments, a Deleter construct can comprise a sequence of SEQ ID NO: 15.
In some aspects, provided herein are methods for integrating multiple DNA constructs into a genome. Also provided herein, in some aspects, are methods for sequentially integrating one or more large DNA constructs. In some aspects, methods described herein may comprise contacting the genome from a sample with one or more agents capable of cleaving the genome at a locus, integrating a first nucleic acid sequence into the genome at the locus, and integrating the one or more second nucleic acid sequences into the first nucleic acid sequence in the genome, thereby integrating the DNA construct into the genome, wherein each of the one or more second nucleic acid sequences comprises a cargo sequence. In some embodiments, the integrating may occur in vitro. In some embodiments, the integrating may occur in vivo, e.g., in a cell. In some embodiments, the one or more second nucleic acid sequences may be integrated sequentially. In some embodiments, the one or more second nucleic acid sequences, each comprising a cargo sequence, may be integrated sequentially. In some embodiments, methods described herein can comprise integrating a first nucleic acid sequence comprising a landing pad sequence and then integrating one or more second nucleic acid sequences each comprising a large DNA “payload” delivery construct or a cargo sequence into the landing pad sequence. In some embodiments, the one or more second nucleic acid sequences may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleic acid sequences.
In some aspects, provided herein are methods for integrating a large DNA construct. In some embodiments, the integrating may occur in vitro. In some embodiments, the contacting and/or the integrating may occur in vivo, e.g., in a cell. A DNA construct can be inserted into a suitable expression vector or a vector comprising an expression cassette known in the art. A vector can include a transcriptional unit comprising an assembly of a genetic element or elements having a regulatory role in gene expression, for example, promoters or enhancers, a structural or coding sequence which is transcribed into mRNA and translated into protein, and appropriate transcription initiation and termination sequences. In some embodiments, the one or more second nucleic acid sequences are provided by one or more vectors configured to carry a kilobase-sized or megabase-sized (e.g., 10 to 500 kilobases or 10-1000 kilobases) nucleic acid sequence. In some embodiments, vectors described herein can include expression vectors such as mammalian expression vectors. Any suitable expression vector known in the art can be used for delivery. In some embodiments, vectors described herein may comprise a marker. In some embodiments, vectors described herein may comprise a marker that is devoid of a promoter. In some embodiments, vectors described herein may comprise a marker that is devoid of a start codon. In some embodiments, vectors described herein may comprise a marker that is devoid of a promoter and a start codon. In some embodiments, a marker comprises a selectable marker. Examples of a selectable marker can include, but are not limited to, an antibiotic marker, a fluorescent marker, or an auxotrophic marker. In some embodiments, a marker comprises an antibiotic marker.
Examples of an antibiotic marker can include, but are not limited to, neomycin (G418), puromycin, hygromycin, blasticidin, ampicillin, tetracycline, zeocin, kanamycin, bleomycin, chloramphenicol, spectinomycin, streptomycin, carbenicillin, erythromycin, and polymyxin B. Examples of a fluorescent marker can include, but are not limited to, a green fluorescent protein (GFP), a red fluorescent protein (RFP), a yellow fluorescent protein (YFP), a blue fluorescent protein (BFP), or a cyan fluorescent protein (CFP). In some embodiments, a fluorescent marker may comprise a GFP. In some embodiments, a fluorescent marker may comprise an mNeonGreen (GFP). In some embodiments, a vector for delivering a DNA construct or a DNA payload may not comprise a promoter. In some embodiments, a vector for delivering a DNA construct or a DNA payload may comprise an antibiotic marker that may not comprise a start codon.
In some aspects, a “promoter trap” system may be used. “Promoter trap” systems are well known in the art and can include a promoter-less and/or start codon (i.e., ATG)-less reporter gene so that reporter gene expression can occur only when the insertion is within a transcriptional unit and/or in a correct direction. In some embodiments, a “promoter trap” system may comprise a marker. In some embodiments, a “promoter trap” system may comprise an antibiotic marker, a fluorescent marker, or an auxotrophic marker. In some embodiments, a “promoter trap” system may comprise a promoter-less and/or ATG-less antibiotic marker. In some embodiments, a promoter-less and/or ATG-less puromycin resistance marker may be used. In some embodiments, a “promoter trap” system may comprise a GFP. In some embodiments, a “promoter trap” system may comprise an mNeonGreen (GFP). In some embodiments, a “promoter trap” system may comprise a promoter/ATG-less puromycin resistance marker and mNeonGreen (GFP).
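The selection logic of a promoter trap can be sketched as a small Boolean model. The following Python example is illustrative only: the `InsertionSite` class, its field names, and the `marker_expressed` function are hypothetical, and the model deliberately ignores reading frame, splicing, and read-through transcription.

```python
from dataclasses import dataclass


@dataclass
class InsertionSite:
    """Hypothetical description of where a marker cassette lands."""
    upstream_promoter: bool   # a promoter lies upstream of the insertion
    upstream_atg: bool        # a start codon is supplied by the site
    same_orientation: bool    # marker points the same way as the promoter


def marker_expressed(site: InsertionSite,
                     marker_has_promoter: bool = False,
                     marker_has_atg: bool = False) -> bool:
    """Return True if the marker is expected to be expressed (toy model)."""
    if marker_has_promoter:
        # Conventional cassette: transcription does not depend on the site,
        # so only the marker's own start codon matters in this model.
        return marker_has_atg
    # Promoter-less marker: transcription must come from the site,
    # and only when the marker is in the correct direction.
    transcribed = site.upstream_promoter and site.same_orientation
    has_start = marker_has_atg or site.upstream_atg
    return transcribed and has_start


# Intended landing pad position versus an off-target position.
landing_pad = InsertionSite(upstream_promoter=True, upstream_atg=True,
                            same_orientation=True)
random_site = InsertionSite(upstream_promoter=False, upstream_atg=False,
                            same_orientation=True)
```

In this toy model a promoter-less, ATG-less marker is expressed at `landing_pad` but not at `random_site`, which is the behavior that allows selection to enrich for correctly targeted, correctly oriented integrations.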
In some embodiments, a vector may be prepared based on a copy-number inducible Bacterial Artificial Chromosome (BAC), Yeast Artificial Chromosome (YAC), or a combination thereof. In some embodiments, a vector may comprise the “promoter trap” described herein. Any suitable vector system described herein can be used to deliver a cargo sequence. In some embodiments, one or more vectors each comprising a cargo sequence can be introduced to a cell or a genome sequentially or serially. In some embodiments, the last vector comprising a cargo sequence may comprise final “exit” markers or elements. In some embodiments, the final exit markers or elements may include a tamoxifen inducible Flp-ert2 (Flp) and/or a second frt site.
In some embodiments, a cargo sequence (also referred to as a DNA payload, a DNA construct, or “Big-DNA” herein) may comprise at least 10 kilobases. In some embodiments, a cargo sequence may comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, or at least 1000 kilobases. In some embodiments, a cargo sequence may comprise approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, or approximately 1000 kilobases.
In some embodiments, a cargo sequence may comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 90, 100, 120, 140, 150, 160, 180, 200, 220, 240, 250, 260, 280, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or at least 1000 megabases. In some embodiments, a cargo sequence may comprise approximately 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 60, 70, 80, 90, 100, 120, 140, 150, 160, 180, 200, 220, 240, 250, 260, 280, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or approximately 1000 megabases.
In some embodiments, a cargo sequence may comprise approximately 350 kilobases. In some embodiments, a cargo sequence may comprise approximately 500 kilobases. In some embodiments, one or more cargo sequences may be integrated at each step sequentially into the landing pad within an iPSC or ESC or any other cell type. In some embodiments, a cargo sequence may be cloned into a payload delivery vector.
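As a simple worked example of sequential integration, the following Python sketch estimates how many integration rounds would be needed to reach a target total size when every round delivers one cargo sequence of a fixed size. The function name `rounds_needed` and the assumption of one uniformly sized cargo per round are illustrative and are not requirements of the methods described herein.

```python
def rounds_needed(target_kb: int, payload_kb: int) -> int:
    """Number of sequential integrations needed to reach target_kb.

    Assumes (for illustration only) that each round integrates exactly one
    cargo sequence of payload_kb kilobases into the landing pad.
    """
    if target_kb <= 0 or payload_kb <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-target_kb // payload_kb)


# A 1000-kb (megabase-scale) total built from 350-kb or 500-kb cargo sequences.
rounds_350 = rounds_needed(1000, 350)
rounds_500 = rounds_needed(1000, 500)
```

Under these assumptions, `rounds_350` is 3 and `rounds_500` is 2.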
Landing pads, large DNA payloads, cargo sequences, or “Big-DNA” described herein can be integrated into any genomic locus. Genomic loci described herein can include, but are not limited to, HLA, AAVS1, CCR5, hROSA26, CYBB, CD40LG, CDK6, TLR8, TRBC1, HPRT, or HOXA. Table 1 describes exemplary genomic loci. In some embodiments, a genomic locus may be a human genomic locus. Because it may be beneficial to replace any gene in a genome with a synthetic version, for example, to provide a disease-linked allele for functional testing in cell culture or an animal model, in principle any gene in a mammalian genome represents a reasonable position to install a landing pad. In some embodiments, a payload vector comprising location-specific homology arm sequences can comprise a sequence of SEQ ID NO: 16.
In some embodiments, a sample can be derived from a biological sample, i.e., extracted from a biological sample. In some embodiments, the biological sample can be from a virus, bacterium, mycoplasma, parasite, fungus, or plant. In some embodiments, the biological sample can be from an animal. In some embodiments, the animal is a mammal. In some embodiments, the mammal comprises a human, non-human primate, rodent, caprine, bovine, ovine, equine, canine, feline, mouse, rat, rabbit, horse or goat. In some embodiments, a biological sample is obtained from a human subject. The human subject can be a patient. The human subject can be an adult, an adolescent, a pre-adolescent, a child, a toddler, an infant, or a neonate. In some embodiments, the biological sample can be a tissue sample or bodily fluid, such as a human bodily fluid. For example, the bodily fluid can be blood, sera, plasma, lavage, urine, cerebrospinal fluid (CSF), sputum, saliva, bone marrow, synovial fluid, aqueous humor, amniotic fluid, cerumen, breast milk, bronchoalveolar lavage fluid, semen, prostatic fluid, Cowper's fluid, pre-ejaculatory fluid, female ejaculate, sweat, tears, cyst fluid, pleural fluid, peritoneal fluid, pericardial fluid, lymph, chyme, chyle, bile, interstitial fluid, menses, pus, sebum, vaginal secretion, mucosal secretion, stool water, pancreatic juice, lavage fluid from sinus cavities, bronchopulmonary aspirate, blastocoel cavity fluid, or umbilical cord blood. A biological sample can comprise a cell, such as a stem cell, undifferentiated cell, differentiated cell, or a cell from a diseased subject or a subject suspected of having a condition or infection. A biological sample can be blood, a cell, a population of cells, a quantity of tissue, or fluid of a subject.
In some embodiments, a biological sample comprises nasopharyngeal fluid, oropharyngeal fluid, saliva, blood, sera, plasma, lavage, urine, ear exudate, cerebrospinal fluid (CSF), sputum, bone marrow, synovial fluid, aqueous humor, amniotic fluid, cerumen, breast milk, oropharyngeal lavage fluid, bronchoalveolar lavage fluid, semen, prostatic fluid, Cowper's fluid, pre-ejaculatory fluid, female ejaculate, sweat, tears, cyst fluid, pleural fluid, peritoneal fluid, pericardial fluid, lymph, chyme, chyle, bile, interstitial fluid, menses, pus, sebum, vaginal secretion, mucosal secretion, stool, stool water, pancreatic juice, lavage fluid from sinus cavities, bronchopulmonary aspirate, blastocoel cavity fluid, or umbilical cord blood.
A biological sample can be collected by any non-invasive means, such as, for example, by a nasopharyngeal swab, a nasal swab, an oropharyngeal swab, or a buccal swab. A biological sample can also be collected by drawing, for example, blood or any other bodily fluid from a subject, or by using fine needle aspiration or needle biopsy. A biological sample can be collected by the subject providing the sample to, for example, a doctor or lab technician. For example, the subject can provide a urine, stool, or saliva sample.
A cell described herein can be a bacterial cell, a yeast cell, a fungal cell, an insect cell, or a mammalian cell. In some embodiments, a cell may comprise a mammalian cell. Mammalian cells can be derived or isolated from a tissue of a mammal. In some embodiments, mammalian cells may comprise COS cells, BHK cells, 293 cells, 3T3 cells, NSO hybridoma cells, baby hamster kidney (BHK) cells, PER.C6™ human cells, HEK293 cells or Cricetulus griseus (CHO) cells. In some embodiments, a mammalian cell may comprise a human cell or a mouse cell. Examples of mammalian cells can also include but are not limited to cells from humans, non-human primates such as chimpanzees, and other apes and monkey species; farm animals such as cattle, horses, sheep, goats, swine; domestic animals such as rabbits, dogs, and cats; laboratory animals including rodents, such as rats, mice and guinea pigs, and the like. In some embodiments, a mammalian cell is a human cell. In some embodiments, a mammalian cell is a mouse cell. In some embodiments, a mammalian cell comprises an embryonic stem cell (ESC), a pluripotent stem cell (PSC), or an induced pluripotent stem cell (iPSC).
A cell described herein can be a primary cell or a derivative of a primary cell. Primary cells may be taken directly from a biological sample such as living tissue (e.g., biopsy material) and established for growth in vitro. In some embodiments, primary cells described herein may have undergone very few population doublings. In some embodiments, a primary cell comprises an embryonic stem cell (ESC), a pluripotent stem cell (PSC), or an induced pluripotent stem cell (iPSC). In some embodiments, a primary cell is an embryonic stem cell (ESC). In some embodiments, a primary cell is a pluripotent stem cell (PSC). In some embodiments, a primary cell is an induced pluripotent stem cell (iPSC).
Cellular therapies hold great promise for combating previously intractable diseases (e.g., cancer, organ failure, neuropathy, autoimmunity, etc.). Clinical trials are underway for immunotherapies (T-cells), Parkinson's disease (dopaminergic neurons), heart failure (cardiomyocytes), leukemia (cord blood), and organ transplantation [1-10]. For autologous cell therapies, the patient's own cells are extracted, genetically modified ex vivo, and re-infused as a cell-based therapeutic. While autologous (patient-derived) cells generally are optimal, this route remains limited for many applications as this process can be expensive, difficult to scale, and limited by the overall fitness, cell expansion capacity, and potential for engineering of the required cell type. Allogeneic (donor-derived) pluripotent stem cells (PSCs) or embryonic stem cells (ESCs) are a promising alternative as they can be differentiated to many therapeutic cell types and grown indefinitely, providing a potentially unlimited supply of therapeutic cells (e.g., retinal cells for ocular diseases; vascular progenitor cells for ischemia and vascular repair; cardiac progenitors for heart diseases; insulin-producing β-cells for diabetes; diabetic retinopathy; chronic obstructive pulmonary disease (COPD); organ failure; critical limb ischemia; platelets, red blood cells and CAR-T/Natural Killer cells for cancer, transfusion and hematological disorders). Importantly, PSCs derived from healthy donors (allogeneic) can reduce costs, enable centralized manufacturing, and open new market possibilities and opportunities.
However, allogeneic cells are limited by the need to match Human Leukocyte Antigen (HLA) Class I alleles, which lie in the most genetically polymorphic region in humans. Mismatches in HLA Class I haplotypes lead to the “self vs non-self” immune response that can result in rejection of transplanted therapeutic cells. This process is mediated by the genetically polymorphic HLA Class I proteins HLA-A, HLA-B, and HLA-C (HLA-A/B/C), which are found on the surface of most somatic cells. HLA-A/B/C proteins complex with non-polymorphic beta-2-microglobulin (B2M) and together present intracellular peptide antigens to immune cells. Polymorphisms in HLA-A/B/C alter how these internal peptides are displayed; this gives many patients a unique cellular “fingerprint.” Thus, the immune system rejects allogeneic cells with foreign HLA haplotypes. As the HLAs are the most genetically diverse region in the genome, this leads to difficulty in matching allogeneic donors to patients in need.
This unmet need can be addressed by developing a technology to program any HLA haplotype into a PSC on-demand. Pre-immune matched iPSCs can provide an “off-the-shelf” source of cells for regenerative medicine. Provided herein are methods and compositions for “immune-matching” technology that can provide isogenic PSCs that are compatible with any individual, including, but not limited to, those with rare HLA haplotypes under-served by PSC “banks.” Such engineered cells can be centrally manufactured into any differentiated cell type, including, but not limited to, neurons, muscles, etc. Before differentiation, the PSC can also be modified with additional genetic safeguards and efficacy enhancing features beyond just HLA Class I genes, thus reducing time and effort.
Provided herein are methods and compositions for generating a blank human PSC line that is devoid of HLA Class I proteins and capable of accepting a new HLA Class I haplotype. In some embodiments, methods described herein may utilize gene editing technology well known in the art. Also provided herein are methods and compositions for reconfiguring and synthesizing HLA Class I as a single linked 115-kb locus, with modular assembly features for inputting any patient haplotype on-demand. Further provided herein are methods and compositions for integrating a 115-kb donor HLA Class I locus into a blank stem cell line, while preserving the native regulatory information and expression levels of the HLA Class I genes. The technologies described herein can provide personalized stem cells for both immunotherapy and regenerative medicine applications, leading to more efficacious and safer cell-based treatments. The technologies described herein can also provide a cost-effective resource for allogeneic cell therapies. The methods described herein can be used to make “off-the-shelf” stem cells compatible with the immune system of any patient, especially underserved patients with rare immune determinants, as a source of any adult cell type suitable for numerous applications in immunotherapy and regenerative medicine.
Methods described herein genetically engineer immune matches using a single well-defined PSC, enabling centralized good manufacturing practice (GMP) “off-the-shelf” cells for differentiation to therapeutic cells. This is in contrast to three other approaches, each with major limitations, as described below (Table 1).
(1) Autologous iPSCs. Patient-derived iPSCs obviate the need for HLA matching. However, methods that generate bespoke iPSCs are inefficient and cost-prohibitive [11], largely because each new line must be validated against genetic abnormalities for clinical GMP safety and assessed for differentiation capacity [12,13]. Furthermore, any genetic modifications (e.g., genomic safeguards) would have to be introduced and verified anew each time.
(2) iPSC Banking. Initiatives to provide banks of iPSCs covering common HLA haplotypes are underway in the US, UK, and Japan, but at a cost of $250 million and screening of >160,000 people [14,15]. Further, each banked cell line will require cGMP safety testing as above. Finally, due to genetic and epigenetic differences, each line will vary in ability to differentiate to other cell types, making centralized manufacturing infeasible.
(3) “Immune-Cloaking” Technology. So-called “universal” allogeneic iPSCs can be generated by deleting the B2M gene, thus eliminating HLA Class I cell surface localization [16]. This “hides” cells from recipient T-cells, but instead induces the missing-self response, alerting natural killer cells to destroy HLA-less cells. Many companies attempt to circumvent this issue by expressing HLA-E/G [17], as these HLAs are immune-suppressing. However, this approach has a critical flaw: if these cells are infected by a virus, bacterium, or fungus, or become cancerous, they will become invisible reservoirs of disease, because the immune system fails to recognize this dangerous cell population when it no longer displays any foreign antigens or neoantigens. Indeed, the loss of HLA expression is a very common mechanism by which cancer avoids elimination by the immune system [18,19]. For cell therapies, this issue poses a serious safety concern for long-term transplantation of immune-cloaked cells. Finally, many natural killer cells do not express NKG2A (which HLA-E inhibits); this population of natural killer cells is expected to eliminate HLA-E/G expressing cells [16,20,21].
Provided herein are methods and compositions for “immune-matching” strategies that have the crucial advantage of preserving essential immune surveillance capabilities as compared to the “immune-cloaking” approach described above. Strategies described herein simultaneously capture the major advantages offered by iPSC banks and immune-cloaking technology (Table 2). Furthermore, immune-matched iPSCs described herein can serve individuals from underrepresented minorities and those with rare HLA haplotypes, a critical unmet need. In addition, the greatest advantage of this off-the-shelf technology is its potential as a standardized platform cell: a uniform, isogenic cell that might be optimized for specific tasks and therapeutics. For example, cells could include ultra-safe features like kill-switches or immune modulating receptors [22-24].
A key innovation is that by reconfiguring HLA-A/B/C as a single linked synthetic locus (“synHLA”), custom HLA haplotypes can be delivered in a single shot directly into a blank acceptor cell (lacking HLA-A/B/C). iPSCs can be differentiated into any therapeutic cell type and, in some embodiments, can be a preferred cell type. The inventors' platform technology enables the generation of 100+ kb DNA (“Big-DNA”) and its integration into cells using highly specific recombinases or homology-directed repair using site-specific nucleases (CRISPR, TALENs, ZFNs) [25].
Finally, experimental evidence suggests that allogeneic transplants with a homozygous haplotype (e.g., a haplomatch) will be well tolerated [28,29]. This would allow generating a bank of pre-matched cells. A bank of 140 homozygous haplotypes has been estimated to cover 90% of the Japanese population, for example, but would require screening 160,000 individuals [15]; similar numbers are estimated for the UK population [30]. The HLA-matching technology described herein could, in principle, provide haplomatches rapidly and be installed in any cell type that is renewable in culture. It has been computationally determined that just 10 haplotypes can cover 30% of a diverse California population, and can be a reasonable first goal. 50% of people can be covered with just 46 haplotypes. Finally, only matching 2 HLAs may be required for transplants [31-33], and it has been calculated that using this method, a bank of 29 haplotypes could cover 73% of people. HLA haplotype data was taken from the website [34].
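One simple way to reason about how population coverage grows with the number of banked homozygous haplotypes is sketched below in Python. This is not the computation used to obtain the figures cited above; it is a minimal model under two stated assumptions: (i) haplotypes pair at random in the population (Hardy-Weinberg proportions), and (ii) a recipient counts as covered when at least one of his or her two haplotypes is in the bank. The haplotype names and frequencies are hypothetical placeholders, and `bank_coverage` is a hypothetical helper.

```python
def bank_coverage(haplotype_freqs: dict[str, float], k: int) -> float:
    """Fraction of recipients covered by banking the k most common haplotypes.

    Assumptions (illustrative only): random pairing of haplotypes, and a
    recipient is covered if either of their two haplotypes is banked.
    """
    top_k = sorted(haplotype_freqs.values(), reverse=True)[:k]
    banked = sum(top_k)                 # total frequency of banked haplotypes
    # A recipient is NOT covered only if both haplotypes are unbanked.
    return 1.0 - (1.0 - banked) ** 2


# Hypothetical frequencies, not real population data.
example_freqs = {"hapA": 0.10, "hapB": 0.05, "hapC": 0.02}
coverage_two = bank_coverage(example_freqs, 2)   # banks hapA and hapB
```

With these placeholder values, banking the two most common haplotypes gives a banked frequency of 0.15 and a coverage of 1 - 0.85² = 0.2775 under the stated assumptions. Real estimates would require measured haplotype frequencies for the population of interest.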
In some embodiments, HLA Class II protein deletion may not be necessary for most cell types, because HLA Class II proteins are only expressed in professional antigen presenting cells [44,45]. In some embodiments, successful proof-of-principle for HLA-A/B/C could establish feasibility for HLA Class II, which would be useful for fully manipulating dendritic cell antigen presentation.
In some aspects, provided herein is a method of producing a synthetic genome, the method comprising: a) obtaining a cell from a sample, wherein the cell comprises a genome; b) contacting a first nucleic acid sequence at a first locus on the genome with a first set of one or more agents, wherein the first nucleic acid sequence encodes a first plurality of antigens, and wherein the first locus comprises one or more binding sites for the first set of one or more agents; and c) contacting a second nucleic acid sequence at a second locus on the genome with a second set of one or more agents, wherein the second nucleic acid sequence encodes a second antigen, and wherein the second locus comprises one or more binding sites for the second set of one or more agents, thereby producing the synthetic genome.
In some embodiments, the first plurality of antigens and the second antigen comprise a major histocompatibility complex (MHC) or a human leukocyte antigen (HLA). In some embodiments, the MHC or the HLA comprises class I MHC. In some embodiments, the HLA comprises an HLA-A gene, an HLA-B gene, an HLA-C gene, or a combination thereof. In some embodiments, the one or more binding sites comprise a target sequence for the first set or the second set of the one or more agents. In some embodiments, the first set and the second set of the one or more agents comprise a polypeptide, a polynucleotide, or a combination thereof. In some embodiments, the polypeptide comprises an enzyme. In some embodiments, the enzyme comprises a site-specific nuclease. In some embodiments, the site-specific nuclease comprises an engineered homing endonuclease or meganuclease, a zinc-finger nuclease (ZFN), a transcription activator-like effector nuclease (TALEN), a clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9 nuclease, or a combination thereof. In some embodiments, the polynucleotide comprises a guide RNA (gRNA), a single-stranded oligodeoxynucleotide (ssODN), or a combination thereof. In some embodiments, the ssODN comprises at least 50 bp. In some embodiments, the ssODN comprises at least 80 bp. In some embodiments, the ssODN may comprise approximately 50-5000 bp. In some embodiments, the ssODN may comprise at most 5000 bp. In some embodiments, the ssODN may comprise a sequence comprising SEQ ID NO: 3. In some embodiments, the ssODN may comprise a sequence comprising SEQ ID NO: 4. In some embodiments, the first set and the second set of the one or more agents are delivered via a carrier. In some embodiments, the carrier comprises a vector, a messenger RNA (mRNA), double stranded DNA (dsDNA), single stranded DNA (ssDNA), or a plasmid.
In some embodiments, the first locus is at least 1 megabase away from the second locus. In some embodiments, the first nucleic acid sequence and the second nucleic acid sequence each comprise at least 3 kilobases. In some embodiments, the first nucleic acid sequence comprises approximately 100 kilobases. In some embodiments, the second nucleic acid sequence comprises approximately 3 kilobases.
In some embodiments, the cell comprises a pluripotent stem cell (PSC) or an induced pluripotent stem cell (iPSC). In some embodiments, the sample comprises a biological sample. Examples of a biological samples is described in “Biological Samples and Cells section”.
In some embodiments, the method further comprises integrating a third nucleic acid sequence at the second locus on the genome after step c). In some embodiments, the integrating comprises homology-directed recombination (HR). In some embodiments, integrating comprises utilizing a recombinase. In some embodiments, the recombinase is constitutively expressed. In some embodiments, the recombinase is inducible, i.e., the expression of the recombinase is inducible. In some embodiments, the recombinase is a site-specific recombinase. In some embodiments, the recombinase comprises a Cre recombinase, a Flp recombinase, or a combination thereof. In some embodiments, integrating comprises utilizing an inducible or constitutive site-specific recombinase Cre, a Flp recombinase, or a combination thereof. In some embodiments, integrating comprises utilizing an inducible or constitutive site-specific recombinase Cre, an inducible or constitutive site-specific Flp recombinase, or a combination thereof. In some embodiments, the third nucleic acid sequence is linear. In some embodiments, the third nucleic acid sequence comprises a sequence homologous to the second locus on the genome. In some embodiments, the third nucleic acid sequence comprises a HLA-A gene, a HLA-B gene, a HLA-C gene, or a combination thereof. In some embodiments, the third nucleic acid sequence further comprises a regulatory element sequence. In some embodiments, the regulatory element sequence comprises a promoter sequence, an enhancer sequence, a terminator sequence, or a combination thereof. In some embodiments, the third nucleic acid sequence further comprises an intergenic sequence. In some embodiments, the intergenic sequence comprises at least about 50 kilobases. In some embodiments, the third nucleic acid sequence further comprises one or more loxP/loxM sites. In some embodiments, the third nucleic acid sequence comprises at least 20 kilobases. 
In some embodiments, the third nucleic acid sequence comprises at least 30 kilobases. In some embodiments, the third nucleic acid sequence comprises at least 100 kilobases. In some embodiments, a synHLA may comprise a sequence comprising SEQ ID NO: 1. In some embodiments, a synHLA may comprise a sequence comprising SEQ ID NO: 8. In some embodiments, a synHLA may comprise a sequence comprising SEQ ID NO: 14.
In some aspects, provided herein is a method of integrating a DNA construct into a genome, the method comprising: a) contacting the genome from a sample with one or more agents, wherein the one or more agents are capable of cleaving the genome at a locus; b) integrating a first nucleic acid sequence into the genome at the locus; and c) integrating one or more second nucleic acid sequences into the first nucleic acid sequence in the genome, thereby integrating the DNA construct into the genome, wherein each of the one or more second nucleic acid sequences comprises a cargo sequence.
In some embodiments, the first nucleic acid sequence comprises a landing pad sequence. In some embodiments, the landing pad sequence comprises one or more genes, regulatory elements, or combinations thereof. In some embodiments, the one or more regulatory elements comprise a promoter, a terminator, a recombinase recognition site, or a combination thereof. In some embodiments, the promoter comprises a universal chromatin opening element (UCOE), human EF1α promoter, phosphoglycerate kinase (PGK), or a combination thereof. In some embodiments, the terminator comprises simian virus 40 (SV40) terminator, bovine growth hormone (bGH) terminator, human growth hormone (hGH) terminator, or rabbit beta-globin (rbGlob).
In some embodiments, the one or more genes comprise a gene encoding estrogen receptor 2 (ERT2) or a variant thereof, a gene encoding P2A or a variant thereof, a Thymidine Kinase or a variant thereof, a gene encoding a recombinase or a variant thereof, a blasticidin resistance gene, a hygromycin resistance gene, or a gene encoding a fluorescent protein or a variant thereof. In some embodiments, the ERT2 gene is fused with a gene encoding a recombinase. In some embodiments, the ERT2 gene is induced by tamoxifen. In some embodiments, the recombinase comprises a tyrosine recombinase. In some embodiments, the recombinase comprises a Cre recombinase, a Flp recombinase, a variant thereof, or a combination thereof. In some embodiments, the recombinase is codon-optimized for mammals. In some embodiments, the recombinase lacks CpG sites. In some embodiments, the gene encoding the recombinase is induced by a gene expression system. In some embodiments, the gene encoding the recombinase is induced by doxycycline.
In some embodiments, the fluorescent protein is a red fluorescent protein, a green fluorescent protein, a yellow fluorescent protein, a blue fluorescent protein or a cyan fluorescent protein. In some embodiments, integrating in steps b) and c) comprises sequential integration. In some embodiments, the recombinase recognition site comprises a Cre recognition site, a Flp recognition site, or a combination thereof. In some embodiments, the recombinase recognition site comprises one or more loxP, loxM, lox2722, loxFAS, lox5171, frt, a variant thereof, or a combination thereof.
In some embodiments, the one or more second nucleic acid sequences are provided by one or more vectors configured to carry nucleic acid sequence that comprises approximately from 10 kilobases to 1,000 kilobases. In some embodiments, the one or more vectors comprise a marker, wherein the marker is devoid of a promoter and a start codon. In some embodiments, the marker comprises a selectable marker. In some embodiments, the selectable marker comprises an antibiotic marker.
In some embodiments, the genome in the sample is contained in a cell. In some embodiments, the contacting and integrating are performed in the cell. In some embodiments, the cell comprises a mammalian cell. In some embodiments, the mammalian cell comprises a human cell or a mouse cell. In some embodiments, the cell comprises a primary cell. In some embodiments, the primary cell comprises an embryonic stem cell (ESC), a pluripotent stem cell (PSC), or an induced pluripotent stem cell (iPSC).
In some embodiments, the one or more agents comprise a polypeptide, a polynucleotide, or a combination thereof. In some embodiments, the polypeptide comprises a site-specific nuclease. In some embodiments, the site-specific nuclease comprises an engineered homing endonuclease or meganuclease, a zinc-finger nuclease (ZFN), a transcription activator-like effector nuclease (TALEN), a clustered regularly interspaced short palindromic repeat (CRISPR), or a combination thereof. In some embodiments, the polynucleotide comprises a guide RNA (gRNA).
In some embodiments, the one or more agents are delivered via a carrier. In some embodiments, the carrier comprises a vector, a messenger RNA (mRNA), a double stranded DNA (dsDNA), a single stranded DNA (ssDNA), or a plasmid. In some embodiments, each of the one or more second nucleic acid sequences comprises at least 10 kilobases, 100 kilobases, or 1 megabase. In some embodiments, each of the one or more second nucleic acid sequences comprise approximately from 1 kilobase to 10 megabases.
In some embodiments, the sample comprises a biological sample. In some embodiments, the biological sample comprises a cell, tissue sample, or blood sample. In some embodiments, the biological sample is obtained from a subject. In some embodiments, the subject is a human.
In some embodiments, the integrating in steps b) and c) comprises homology-directed recombination (HR). In some embodiments, the integrating comprises utilizing an inducible or constitutive site-specific recombinase Cre, a Flp recombinase, or a combination thereof. In some embodiments, the cargo sequence comprises a sequence encoding one or more antigens. In some embodiments, the integrating in steps b) or c) comprises utilizing a self-excising recombinase. In some embodiments, the cargo sequence comprises 100 kilobases, 500 kilobases, or 1 megabase.
In some embodiments, the first plurality of antigens and the second antigens comprise a major histocompatibility complex (MHC) or a human leukocyte antigen (HLA). In some embodiments, the MHC or the HLA comprises class I MHC. In some embodiments, the HLA comprises an HLA-A gene, an HLA-B gene, an HLA-C gene, or a combination thereof. In some embodiments, the one or more binding sites comprise a target sequence for the first set or the second set of the one or more agents.
In some embodiments, the first set and the second set of the one or more agents comprise a polypeptide, a polynucleotide, or a combination thereof. In some embodiments, the polypeptide comprises a site-specific nuclease. In some embodiments, the site-specific nuclease comprises an engineered homing endonuclease or meganuclease, a zinc-finger nuclease (ZFN), a transcription activator-like effector nuclease (TALEN), a clustered regularly interspaced short palindromic repeat (CRISPR/Cas), or a combination thereof.
In some embodiments, the polynucleotide comprises a guide RNA (gRNA), a single-stranded oligodeoxynucleotide (ssODN), or a combination thereof. In some embodiments, the ssODN comprises at least 50 bp or at least 80 bp. In some embodiments, the first set and the second set of the one or more agents are delivered via a carrier. In some embodiments, the carrier comprises a vector, a messenger RNA (mRNA), double stranded DNA (dsDNA), single stranded DNA (ssDNA), or a plasmid.
In some embodiments, the first locus is at least 10 kilobases or at least 1 megabase away from the second locus. In some embodiments, the first locus and the second locus are on different chromosomes. In some embodiments, the first nucleic acid sequence and the second nucleic acid sequence comprise at least 3 kilobases. In some embodiments, the first nucleic acid sequence comprises approximately 100 kilobases. In some embodiments, the second nucleic acid sequence comprises approximately 3 kilobases.
In some embodiments, the cell comprises a pluripotent stem cell (PSC) or an induced pluripotent stem cell (iPSC). In some embodiments, the sample comprises a biological sample. In some embodiments, the biological sample comprises a cell, tissue sample, or blood sample.
In some embodiments, the method further comprises integrating a third nucleic acid sequence at the second locus on the genome after step c). In some embodiments, the integrating comprises homology-directed recombination (HR). In some embodiments, the integrating comprises utilizing an inducible or constitutive site-specific recombinase Cre, a Flp recombinase, or a combination thereof.
In some embodiments, the third nucleic acid sequence is linear. In some embodiments, the third nucleic acid sequence comprises a sequence homologous to the second locus on the genome. In some embodiments, the third nucleic acid sequence comprises a HLA-A gene, a HLA-B gene, a HLA-C gene, or a combination thereof. In some embodiments, the third nucleic acid sequence further comprises a regulatory element sequence. In some embodiments, the regulatory element sequence comprises a promoter sequence, an enhancer sequence, a terminator sequence, or a combination thereof.
In some embodiments, the third nucleic acid sequence further comprises an intergenic sequence. In some embodiments, the intergenic sequence comprises at least about 50 kilobases. In some embodiments, the third nucleic acid sequence further comprises one or more loxP/loxM sites. In some embodiments, the third nucleic acid sequence comprises at least 10 kilobases. In some embodiments, the third nucleic acid sequence comprises at least 30 kilobases. In some embodiments, the third nucleic acid sequence comprises at least 100 kilobases.
In some aspects, provided herein is a method of producing a synthetic genome, the method comprising: a) obtaining a cell from a sample, wherein the cell comprises a genome; b) contacting a nucleic acid sequence encoding an antigen at a locus on the genome with a set of one or more agents, wherein the locus comprises one or more binding sites for the set of the one or more agents; and c) integrating a synthetic sequence into the locus on the genome, wherein integrating the synthetic sequence into the locus on the genome preserves regulatory components for gene expression, thereby producing the synthetic genome.
In some embodiments, the antigen comprises a major histocompatibility complex (MHC) or a human leukocyte antigen (HLA). In some embodiments, the MHC or the HLA comprises class I MHC. In some embodiments, the HLA comprises an HLA-A gene, an HLA-B gene, an HLA-C gene, or a combination thereof. In some embodiments, the one or more binding sites comprise a target sequence for the set of the one or more agents. In some embodiments, the set of the one or more agents comprises a polypeptide, a polynucleotide, or a combination thereof. In some embodiments, the polypeptide comprises a site-specific nuclease. In some embodiments, the site-specific nuclease comprises an engineered homing endonuclease or meganuclease, a zinc-finger nuclease (ZFN), a transcription activator-like effector nuclease (TALEN), a clustered regularly interspaced short palindromic repeat (CRISPR/Cas), or a combination thereof.
In some embodiments, the polynucleotide comprises a guide RNA (gRNA), a single-stranded oligodeoxynucleotide (ssODN), or a combination thereof. In some embodiments, the ssODN comprises at least 50 bp or at least 80 bp. In some embodiments, the set of the one or more agents is delivered via a carrier. In some embodiments, the carrier comprises a vector, a messenger RNA (mRNA), double stranded DNA (dsDNA), single stranded DNA (ssDNA), or a plasmid.
In some embodiments, the nucleic acid sequence comprises at least 1 kilobase. In some embodiments, the nucleic acid sequence comprises approximately 100 kilobases. In some embodiments, the nucleic acid sequence comprises approximately 3 kilobases. In some embodiments, the cell comprises a pluripotent stem cell (PSC) or an induced pluripotent stem cell (iPSC).
In some embodiments, the sample comprises a biological sample. In some embodiments, the biological sample comprises a cell, tissue sample, or blood sample. In some embodiments, the integrating comprises homology-directed recombination (HR). In some embodiments, the integrating comprises utilizing an inducible site-specific recombinase Cre, a Flp recombinase, or a combination thereof. In some embodiments, the synthetic sequence is linear. In some embodiments, the synthetic sequence comprises a sequence homologous to the locus on the genome. In some embodiments, the synthetic sequence comprises a HLA-A gene, a HLA-B gene, a HLA-C gene, or a combination thereof.
In some embodiments, the synthetic sequence further comprises a regulatory element sequence. In some embodiments, the regulatory element sequence comprises a promoter sequence, an enhancer sequence, a terminator sequence, or a combination thereof. In some embodiments, the synthetic sequence further comprises an intergenic sequence. In some embodiments, the intergenic sequence comprises at least about 50 kilobases. In some embodiments, the synthetic sequence further comprises one or more loxP/loxM sites. In some embodiments, the synthetic sequence comprises at least 10 kilobases. In some embodiments, the synthetic sequence comprises at least 30 kilobases. In some embodiments, the synthetic sequence comprises at least 100 kilobases.
These examples are provided for illustrative purposes only and not to limit the scope of the claims provided herein.
A Recombinase Mediated Cassette Exchange (RMCE) landing pad (LP) approach was designed to write large DNA into the genome of human iPSCs, which, when repeated consecutively, enables megabase genome engineering (
The initial LP design (
Large DNA payloads (100 kb+) are integrated using a Payload Vector (PV) (
It was found that the tamoxifen inducible double ert2 Cre fusion in LP1 (
It was also found that LP1 was rapidly silenced without continued use of blasticidin antibiotic in the media, as mScarlet expression diminished within 3 days (
Based on these results, the landing pad was reconstructed to a final configuration called LPneo, which represents the most common REWRITE landing pad configuration (
To show that REcombinase WRiting of Iterative DNA and Trap Excision (REWRITE) enables Big-DNA delivery into human induced pluripotent stem cells (hiPSCs), a REWRITE landing pad was generated to remove a 100 kilobase genomic region covering the HLA-B and HLA-C genes on chromosome 6 in iPSCs. Two CRISPR gRNAs for erCas12a, also called MAD7, were designed to generate double strand breaks spanning a 100 kilobase region. The REWRITE landing pad was successfully integrated over this 100 kilobase genomic locus, as the iPSCs were both Blasticidin resistant and mScarlet+.
Using the methods described herein, REWRITE landing pad was inserted at the site of HLA-B/C deletion in human iPSCs (
A 115 kilobase synthetic version of the human HLA locus, called synHLA, was assembled into our Payload Vector (PV). The 115 kilobase PV was introduced into the REWRITE iPSCs via nucleofection, and the iCre expression module was then activated for 3 days using Doxycycline. After 3 days, successful integrants were selected using Puromycin. After 10 days, ~30 distinct colonies indicative of Big-DNA delivery were observed. Fluorescence microscopy confirmed that all clones were mNeongreen+ in addition to being Puromycin resistant. This confirms that the REWRITE landing pad approach enables the delivery of Big-DNA of at least 115 kilobases.
A well-characterized iPSC (PGP1) from the Coriell Institute was acquired [50] for this experiment; however, any human iPSC can be used. Starting with this iPSC line, a “blank” iPSC devoid of HLA-A/B/C and encoding the Recombinase Mediated Cassette Exchange (RMCE) landing pad will be constructed (
The 100 kb HLA-B/C locus will be replaced with the RMCE landing pad (5.5 kb) described earlier by CRISPR or TALEN or ZFN (hereinafter “Site-Specific Nucleases” or “SSNs”) DNA-cut stimulated homology-directed recombination (HR). The other allele will be cleanly deleted with a commercially synthesized 100 bp single-stranded oligodeoxynucleotide (ssODN) deletion fragment [46,47]. Two SSN guide RNAs will be introduced, one upstream of HLA-C and one downstream of HLA-B. The landing pad construct will be introduced as linear dsDNA designed with homology arms to the region (
The final step in preparing the blank iPSC requires deleting HLA-A (3.6 kb) (
Alternative Methods:
The second allele can also be deleted using a Deleter construct expressing Hygromycin resistance gene and Thymidine Kinase under a PGK promoter (
Big-DNA can be assembled up to 1 Mb by co-opting the naturally efficient homologous recombination (HR) machinery of budding yeast using techniques described earlier. Multiple linear DNA fragments can be stitched by using HR into a vector with yeast artificial chromosome (YAC) and bacterial artificial chromosome (BAC) replication sites, capable of carrying up to 1 Mb of DNA. Here the goal is to reconfigure HLA-A/B/C into a single locus of ~115 kb (synHLA) or a miniature version of ~36 kb (mini-synHLA).
The HLA-A gene, including regulatory information (
Both synHLA and mini-synHLA may be constructed from smaller 4-kb DNA pieces with ~180 bp of overlap (genomic PCR of the iPSC DNA and/or commercial synthesis). Sections may also be captured through Transformation Assisted Cloning using yeast from readily available Bacterial Artificial Chromosomes (BACs) containing human genetic loci. Examples of personalized HLA genic DNAs (~3.4 kb) will be commercially synthesized from an HLA haplotype database. The amplicons will be transformed into budding yeast with the linearized YAC/BAC vector, assembled by HR, and identified by selecting for the plasmid auxotrophic marker (e.g., HIS3) (
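The fragment tiling described above can be illustrated with a short calculation. The following is a minimal sketch only, assuming fixed-length fragments of 4 kb and a fixed 180 bp overlap between neighbors; the function name `tile_fragments` and its parameters are hypothetical, and actual fragment boundaries would depend on primer placement and synthesis constraints.

```python
from typing import List, Tuple


def tile_fragments(
    total_length: int,
    fragment_length: int = 4000,  # assumed fragment size (about 4 kb)
    overlap: int = 180,           # assumed overlap between neighbors (about 180 bp)
) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of overlapping
    fragments that together cover a target of total_length base pairs."""
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    if not 0 <= overlap < fragment_length:
        raise ValueError("overlap must be smaller than fragment_length")

    step = fragment_length - overlap  # new sequence contributed by each fragment
    tiles: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + fragment_length, total_length)
        tiles.append((start, end))
        if end == total_length:
            break
        start += step
    return tiles


if __name__ == "__main__":
    # Illustrative target length only: a 36 kb construct such as mini-synHLA.
    for start, end in tile_fragments(36_000):
        print(start, end, end - start)
```

Each fragment shares its terminal 180 bp with the next fragment, which is the homology that yeast HR would use to join neighbors; the final fragment is simply truncated at the end of the target.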
Using the methods described above, a mini-synHLA comprising HLA-A/B/C (
Using the methods described above, a 115 kb synHLA comprising HLA-A/B/C was assembled in yeast. A 10 kb fragment of the HLA-A locus was placed within the 78 kb intergenic region between HLA-B and HLA-C from a BAC clone, and these were assembled into the Payload Vector (PV1f) in yeast (
Personalizing class-I HLA haplotypes would provide immunological matches to various ethnic populations across the world (
It is important to note that RMCE methods insert DNA without size-limitations, with high specificity, and reduce the need for whole genome sequencing or equivalent validation of each new cell line [25], as nuclease-assisted HR would require [26,27]. Because PSCs can be expanded indefinitely, only a single correct clone needs to be isolated; therefore, high integration efficiency is not required.
Alternative methods:
Authentication of Key Resources and Reproducibility
Expertise Deleting Genomic DNAs (Example 3):
The inventors have technology to delete big genomic DNA regions to homozygosity, including the HLAs (H2K), from 5 kb to 200 kb in mouse embryonic stem cells (mESCs) at frequencies between 2-50%, without the use of any selectable marker (Table 3). These deletions were achieved using two targeted CRISPR gRNAs and a 100 bp single-stranded oligonucleotide (ssODN) to induce homology-directed repair [46,47]. In some embodiments, Inscripta's MAD7 CRISPR system can be used [48].
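The ssODN-templated deletion described above can be sketched in code. This is an illustrative sketch only: it assumes the 100 bp ssODN is built from two equal 50 nt arms that abut the two cut sites, which is a common design but is not specified here. The function name `design_deletion_ssodn` and the coordinate convention are hypothetical.

```python
def design_deletion_ssodn(
    reference: str,
    cut_upstream: int,
    cut_downstream: int,
    total_length: int = 100,  # ssODN length stated in the text
) -> str:
    """Join the sequence immediately left of the upstream cut to the sequence
    immediately right of the downstream cut, so that repair templated by the
    ssODN would delete everything between the two cuts.

    Cut positions are 0-based indices into `reference`; a cut at index i falls
    between reference[i - 1] and reference[i]. Equal-length arms are an
    assumption of this sketch.
    """
    if not 0 <= cut_upstream < cut_downstream <= len(reference):
        raise ValueError("cut positions must satisfy 0 <= upstream < downstream <= len")
    arm = total_length // 2
    if cut_upstream < arm or len(reference) - cut_downstream < arm:
        raise ValueError("not enough flanking sequence for the homology arms")

    left_arm = reference[cut_upstream - arm:cut_upstream]
    right_arm = reference[cut_downstream:cut_downstream + arm]
    return left_arm + right_arm


if __name__ == "__main__":
    # Toy reference: 60 nt left flank, 200 nt region to delete, 60 nt right flank.
    toy = "A" * 60 + "C" * 200 + "G" * 60
    ssodn = design_deletion_ssodn(toy, cut_upstream=60, cut_downstream=260)
    print(len(ssodn), ssodn)
```

In practice, strand choice, arm asymmetry, and disruption of the gRNA target sites in the repaired junction would also need to be considered; none of these is modeled here.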
Expertise Building Big-DNA (Example 4):
Big-DNA assembly technology leveraging yeast homologous recombination and developed by the inventors enables fusion of multiple linear DNAs into one large Big-DNA with success in configurations so far up to 1 Mb [39]. A high-throughput automated workflow for the design, assembly, and verification (1536 qPCR and deep sequencing) can be used [43]. Assembling Big-DNAs in the HLA size-range is a routine task (Table 4).
Demonstration of 101 kb DNA Delivery to Mammalian Cells (Example 5):
Big-DNA assembly and delivery have been reduced to practice using Recombinase Mediated Cassette Exchange (RMCE). RMCE uses recombinases (Cre and others) to integrate large DNA payloads site-specifically. Mutant heterotypic recognition sites based on the lox series (loxM, loxP, etc.) ensure unidirectional DNA insertion. A 101 kb DNA comprising the human HPRT locus has been integrated into mouse ESCs, as described previously [42], using an RMCE method called Inducible Cassette Exchange (ICE) [49]. A 170 kb DNA comprising the rat HoxA locus has also been delivered to mESCs using the same method [62]. A 144 kb human Sox2 locus has been delivered to mESCs using the “Big-IN” method [61]. The ICE method can leave behind genomic scars such as transgenes used for positive selection.
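The role of heterotypic recognition sites in RMCE can be illustrated with a toy model. The sketch below represents a genome and a payload as ordered lists of named elements and swaps the landing pad contents for the payload cargo only when the payload carries the same pair of sites in the same order. It is a conceptual illustration, not a model of recombinase biochemistry; the function name `cassette_exchange`, the element labels, and the order of loxM and loxP are hypothetical assumptions.

```python
from typing import List


def cassette_exchange(
    genome: List[str],
    payload: List[str],
    left_site: str = "loxM",   # assumed site order, for illustration only
    right_site: str = "loxP",
) -> List[str]:
    """Replace the elements between left_site and right_site in the genome
    with the elements between the same two sites in the payload.

    Because the two sites differ (heterotypic), a payload whose sites are in
    the opposite order has no compatible pairing and is rejected, which is the
    toy equivalent of unidirectional insertion.
    """
    if len(payload) < 2 or payload[0] != left_site or payload[-1] != right_site:
        raise ValueError("payload must be flanked by left_site ... right_site")
    try:
        i = genome.index(left_site)
        j = genome.index(right_site, i + 1)
    except ValueError:
        raise ValueError("genome lacks a compatible landing pad") from None

    # Keep both recognition sites; exchange only the cassette between them.
    return genome[: i + 1] + payload[1:-1] + genome[j:]


if __name__ == "__main__":
    genome = ["left_flank", "loxM", "marker_cassette", "loxP", "right_flank"]
    payload = ["loxM", "HLA-A", "HLA-B", "HLA-C", "loxP"]
    print(cassette_exchange(genome, payload))
```

After the exchange, only the two recognition sites remain from the landing pad, while the cassette between them has been replaced by the cargo.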
Scarless DNA Integration with “REWRITE” (Example 5):
The inventors have developed a robust and nearly scarless proprietary Big-DNA delivery method called REcombinase WRiting of Iterative DNA and Trap Excision (REWRITE), which leaves behind only 34 bp recombinase sites. The REWRITE landing pad also co-delivers an inducible FLP recombinase gene for inducible excision of markers. This enables precise delivery of >100 kb DNA payloads (e.g., synHLA) without DNA scars (
REWRITE landing pad plasmid and REWRITE payload plasmid containing location specific homology arms inserted at the flanking BsaI sites by Golden Gate Assembly were generated (
The examples and embodiments described herein are for illustrative purposes only and various modifications or changes suggested to persons skilled in the art are to be included within the spirit and purview of this application and scope of the appended claims.
This application claims the benefit of U.S. Provisional Application No. 63/321,561 filed on Mar. 18, 2022, and U.S. Provisional Application No. 63/322,664 filed on Mar. 23, 2022, each of which is incorporated by reference herein in its entirety.
This invention was developed with support from the Government of the United States of America under AI154417 and AI148008 awards by the National Institutes of Health. The Government of the United States of America may have certain rights in this invention.
Number | Date | Country
---|---|---
63321561 | Mar 2022 | US
63322664 | Mar 2022 | US