The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy was created on Oct. 13, 2021, is named “058636_00389_ Sequence_Listing_ST25.txt” and is 28,205 bytes in size.
A global understanding of genomic regulatory architecture is critical to interpreting the effect of variants associated with common human traits and diseases1. As the regulation of genes throughout development depends strongly on their native chromatin and genomic environments2, short artificial constructs are inherently incapable of modeling the complexity of native loci, even when integrated genomically. Analysis of natural sequence variation in regulatory DNA provides one high-throughput approach for functional assessment in an endogenous cellular and genomic context, but detailed investigation of locus architecture is limited by the low frequency of informative variants and patterns of linkage disequilibrium3,4.
Transgenic mammalian cell lines and animals generated using homologous recombination5,6 and the subsequent development of nuclease-mediated genome editing7 have enabled detailed functional analysis of the regulation of individual genes at their endogenous loci. These technologies have since facilitated screens of noncoding regulatory elements8,9 and locus-scale analyses10,11. However, editing approaches offer limited control over the final sequence, a low maximum edit size, no inherent allele specificity at diploid loci, and the risk of off-target editing by designer nucleases12.
Many limitations of genome editing do not apply to production of DNA using recombineering or yeast assembly approaches13,14. Indeed, transgenesis of large constructs such as yeast and bacterial artificial chromosomes (YACs and BACs)15 has enabled position-independent, copy-number dependent expression, reproduction of organismal phenotypes such as the developmental switch from fetal to adult hemoglobin16,17, and modeling of disease-associated variation18. Engineering of mammalian cells using recombinase-mediated cassette exchange (RMCE)19-22 or serine recombinase approaches23 has enabled efficient single-copy targeting. RMCE schemes have been adapted for targeting large DNAs in mammalian cells24,25. However, existing schemes are not readily portable to new loci or cell lines, in particular to stem cells, which may not tolerate certain selection schemes. Furthermore, the gene traps employed to select for integrants remain as transcriptionally active genomic scars, which confound dissection of regulatory sequences unless removed through a subsequent engineering step. Finally, all these approaches suffer from the difficulty of verifying both on-target and off-target events. These technical limitations on editing endogenous loci have impeded the development of synthetic regulatory genomics as an approach to understanding the regulatory architecture of mammalian genomes.
Thus, there is an ongoing and unmet need for improved approaches to locus-scale genome modification. The disclosure is pertinent to these and other needs.
The present disclosure relates generally to modifying chromosomes of eukaryotic cells, and in particular, mammalian cells. The method generally comprises iterative gene writing by sequential introduction of particular DNA segments into any genomic locus of interest. The compositions and methods include use of selection and counter selection to provide for insertion of large DNA segments, e.g., up to 5 kilobases (kb), or more. The DNA segments include a payload segment that can code for and facilitate expression of any RNA, including mRNA, and the concomitant expression of the protein encoded by the mRNA. The compositions and methods are suitable for modifying any mammalian cells. The modifications can be homozygous, heterozygous, or hemizygous. The cells modified using the described compositions and methods may be haploid, diploid, or tetraploid. The compositions and methods can thereby result in modified cells. The modified cells may comprise any type of stem cells, specific examples of which are discussed in the detailed description. The modified stem cells can be used to produce modified embryos, and modified mammals that develop from the modified embryos.
In one aspect, the disclosure provides a method for insertion of a DNA payload into a chromosomal locus in mammalian cells. The method generally comprises introducing into a selected locus a first double stranded DNA (dsDNA) template (referred to herein as a landing pad “LP”) that comprises 5′ and 3′ homology arms (HAs). The LP comprises one or more selection markers, including a positive selection marker and at least one negative selection marker. The LP also comprises a pair of recombinase recognition sites configured to excise a segment of the LP that comprises the at least one negative selection marker. The method comprises selecting cells that comprise the LP using the positive selection marker to obtain an isolated population of the mammalian cells that comprise the LP. Once cells that comprise the LP are selected, the method further comprises introducing into the selected cells a second dsDNA comprising a payload sequence and a positive selection marker used to select cells that comprise the payload. The positive selection marker is either i) within the payload sequence in the second dsDNA and inserted into the locus, or ii) present at a location on the second dsDNA that is not inserted into the locus. A recombinase that is introduced into, or already present in, the cells recognizes the recombinase recognition sites and removes at least the segment of the LP that comprises the negative selection marker in at least some of the mammalian cells, such that at least that segment of the LP is replaced by the payload by homologous recombination of the payload into the locus in at least some of the mammalian cells. The method further comprises exposing the mammalian cells to an agent that acts on the negative selection marker such that only mammalian cells that contain the LP and the negative selection marker, but not the payload, are killed.
Subsequently, the method comprises separating mammalian cells that comprise the payload but do not contain the LP to thereby obtain isolated viable mammalian cells that comprise the payload.
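The two selection stages of this method can be illustrated with a deliberately simplified Python model. It is a sketch under stated assumptions, not the disclosed protocol: the `Cell` class and the helper functions are hypothetical, each cell is reduced to three yes/no states, and recombinase exchange is treated as an all-or-nothing event. It shows why only cells in which the payload has replaced the LP survive positive selection for the donor marker followed by negative selection against the LP.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cell:
    """Hypothetical state of one cell at the targeted locus."""
    has_lp: bool               # LP present (carries positive and negative markers)
    has_payload: bool = False  # payload exchanged into the locus
    has_donor: bool = False    # donor dsDNA, and thus its positive marker, present


def select_lp_cells(cells):
    """Stage 1: positive selection for the LP-encoded marker."""
    return [c for c in cells if c.has_lp]


def deliver_payload(cell, exchanged):
    """Stage 2: deliver the donor; the recombinase either swaps the LP for the payload or not."""
    if exchanged:
        return replace(cell, has_lp=False, has_payload=True, has_donor=True)
    return replace(cell, has_donor=True)


def select_payload_cells(cells):
    """Positive selection for the donor marker, then counterselection against the LP."""
    survivors = [c for c in cells if c.has_donor]
    # The counterselection agent (e.g., ganciclovir or proaerolysin) kills LP-bearing cells.
    return [c for c in survivors if not c.has_lp]


# Usage: one untargeted cell and two LP-bearing cells; only one undergoes exchange.
population = [Cell(has_lp=False), Cell(has_lp=True), Cell(has_lp=True)]
lp_cells = select_lp_cells(population)
after_delivery = [deliver_payload(lp_cells[0], exchanged=True),
                  deliver_payload(lp_cells[1], exchanged=False)]
final = select_payload_cells(after_delivery)
print(final)  # [Cell(has_lp=False, has_payload=True, has_donor=True)]
```

The model makes the ordering explicit: the donor marker alone would also keep LP-bearing cells that took up the donor without exchange, and counterselection removes them.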
The LP may be introduced into the mammalian cells using any of a variety of techniques, which include but are not necessarily limited to using a nuclease system selected from an RNA-guided clustered regularly interspaced short palindromic repeats (CRISPR) enzyme, a transcription activator-like effector nuclease (TALEN), a zinc finger nuclease, or a MAD-series nuclease.
In non-limiting embodiments, the mammalian cells into which the LP is introduced comprise an endogenous mutated gene that encodes Phosphatidylinositol Glycan Anchor Biosynthesis Class A (PIGA) enzyme such that the function of the PIGA enzyme is reduced or eliminated relative to a non-mutated gene that encodes the PIGA enzyme. In this configuration, the LP comprises a sequence encoding a functional PIGA enzyme as a negative selection marker, and the agent that acts on the negative selection marker comprises proaerolysin. In embodiments, the LP comprises a sequence encoding a herpes simplex virus type 1 thymidine kinase (HSV1-TK). In this configuration, the agent that acts on the negative selection marker is ganciclovir.
In various embodiments, the payload is only inserted into the locus on one homologous chromosome to thereby provide a heterozygous chromosome pair in which only one chromosome in the pair comprises the payload. In an embodiment, a positive selection marker is within the payload sequence in a second dsDNA and is inserted into the locus with the payload. In embodiments, the positive selection marker is present on a location on a second dsDNA that is not inserted into the locus, and the payload is inserted into the locus without the positive selection marker.
The disclosure also includes mammalian cells that are made using the described compositions and methods. The disclosure includes modified stem cells, and embryos that comprise the modified stem cells. Non-human transgenic mammals made by the described compositions and methods are also included.
Unless defined otherwise herein, all technical and scientific terms used in this disclosure have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains.
Every numerical range given throughout this specification includes its upper and lower values, as well as every narrower numerical range that falls within it, as if such narrower numerical ranges were all expressly written herein.
The disclosure includes all polynucleotide and amino acid sequences described herein. Each RNA sequence includes its DNA equivalent, and each DNA sequence includes its RNA equivalent. Complementary and anti-parallel polynucleotide sequences are included. Every DNA and RNA sequence encoding polypeptides disclosed herein is encompassed by this disclosure. Amino acids of all protein sequences and all polynucleotide sequences encoding them are also included, including but not limited to sequences included by way of sequence alignments. Sequences that are from 80.00% to 99.99% identical to any sequence (amino acid or nucleotide) of this disclosure are included.
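Percent identity can be computed in several ways. As a minimal sketch, assuming two sequences that have already been aligned to equal length with '-' marking gaps, and assuming identity is counted over all alignment columns (no alignment method or denominator is specified here), the calculation might look like the following. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: identity is counted over all alignment columns, and a gap never matches.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def in_identity_range(pid: float, low: float = 80.00, high: float = 99.99) -> bool:
    """True if a percent identity falls within the inclusive 80.00%-99.99% range."""
    return low <= pid <= high


pid = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(pid, in_identity_range(pid))  # 90.0 True
```

Counting gaps in the denominator is only one convention; other tools divide by the length of the shorter sequence or by aligned, ungapped columns, which can shift borderline cases.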
The disclosure includes all polynucleotide and all amino acid sequences that are identified herein by way of a database entry. Such sequences are incorporated herein as they exist in the database on the effective filing date of this application or patent.
The disclosure in certain aspects provides for sequential modification of eukaryotic cells that results in substitution of an introduced landing pad (LP) with a DNA payload.
Generally, the method comprises modifying a chromosome by inserting the LP in a selected locus, and replacing all or a segment of the LP with the DNA payload. The LP can comprise negative and positive selection markers so that cells that initially comprise the LP can be selected, and cells that still comprise the LP (and therefore lack the payload) can later be eliminated. Cells that contain the LP are modified by a one-step recombinase-mediated insertion of the DNA payload, which may be large (e.g., greater than 100 kb in length). After introduction into the cells, the payload may provide a transient (e.g., in a plasmid backbone) or persistent (e.g., integrated) positive selection marker that allows selection of cells that contain the payload but do not contain the LP. The disclosure includes insertion of multiple payloads in the same locus, and insertion of the same or different payloads in different loci, including multiple copies thereof if desired. Thus, iterative cell editing is included. The disclosure includes scarless insertion of the payload, with the exception of retained recombinase recognition sequences, as further described below.
The methods of this disclosure are performed using DNA constructs and involve the participation of certain proteins. In embodiments, the protein may be produced within the cell from any suitable expression system that encodes the protein. In embodiments, any protein required to participate in the described process may be modified such that it includes a nuclear localization signal. In embodiments, a protein may be administered directly to the cells. For proteins that require an RNA component to function, such as certain Cas proteins as described below, the protein(s) and the RNA component may be administered to the cells as ribonucleoproteins (RNPs).
The disclosure in certain aspects provides for initial insertion of an LP into any desired chromosomal locus. In embodiments, the LP comprises first and second homology arms (each an “HA” and together “HAs”) which are configured to be introduced into any desired chromosomal locus using any suitable nuclease.
The sequences of the 5′ and 3′ homology arms are not particularly limited, provided the arms are long enough for homologous recombination to occur after nuclease-mediated cleavage of the selected locus. In embodiments, the 5′ and 3′ homology arms have a length of from 100 bp to 10 kb, inclusive, and including all integers and ranges of integers there between. In embodiments, the entire LP is 3.5 to inclusive, and including all integers and ranges of integers there between.
The LP includes recombinase recognition sequences that are configured so that a segment of the LP between the HAs can be recognized and excised by one or more recombinases in order to subsequently replace the LP with the payload by operation of the recombinase.
The type of recombinase and recombinase recognition signals are not particularly limited, other than a preference for maintenance of the recombinase recognition sites after a recombination event to enable iterative removal and insertion of different payloads in the same locus. Thus, the disclosure includes using any site-dependent recombinase that recognizes heterotypic recombination sites.
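How heterotypic sites support iterative exchange can be sketched as an idealized string operation. This assumes that the placeholder names `<loxM>` and `<loxP>` stand in for real recognition sequences, that each site recombines only with an identical partner, and that exchange is perfectly precise; `cassette_exchange` is a hypothetical helper, not a recombinase simulator. Because both sites survive each exchange, the product can receive the next payload.

```python
# Placeholder names; real loxM/loxP sequences are deliberately not reproduced here.
SITE_A = "<loxM>"
SITE_B = "<loxP>"


def cassette_exchange(chromosome: str, donor: str,
                      site_a: str = SITE_A, site_b: str = SITE_B) -> str:
    """Idealized RMCE: swap the chromosomal cassette between site_a and site_b
    for the donor cassette flanked by the same two sites.

    Model assumption: heterotypic sites recombine only with an identical partner
    (site_a with site_a, site_b with site_b), so the exchange is directional.
    """
    c_start = chromosome.index(site_a) + len(site_a)
    c_end = chromosome.index(site_b, c_start)
    d_start = donor.index(site_a) + len(site_a)
    d_end = donor.index(site_b, d_start)
    # Both sites are retained, so the product can act as a landing pad again.
    return chromosome[:c_start] + donor[d_start:d_end] + chromosome[c_end:]


locus = "LEFT<loxM>LP_MARKERS<loxP>RIGHT"
step1 = cassette_exchange(locus, "BACKBONE<loxM>PAYLOAD1<loxP>BACKBONE")
step2 = cassette_exchange(step1, "<loxM>PAYLOAD2<loxP>")
print(step1)  # LEFT<loxM>PAYLOAD1<loxP>RIGHT
print(step2)  # LEFT<loxM>PAYLOAD2<loxP>RIGHT
```

Note that the donor backbone (and any selection marker placed there) is not carried into the locus in this model, which mirrors the scarless configuration apart from the retained sites.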
In embodiments, the recombinase comprises Cre recombinase, which is used with lox sites, such as loxP and loxM sites; a Flp recombinase, which functions in the Flp/FRT system; a Dre recombinase, which functions in the Dre-rox system; a Vika recombinase, which functions in the Vika/vox system; or a Bxb1 recombinase, which functions with attP/attB sites. In embodiments, the recombinase can be provided to the cells in the form of a protein. In embodiments, the recombinase is encoded by an extrachromosomal element, such as a plasmid, or any other suitable vector, including but not limited to viral delivery vectors. The presence of the extrachromosomal element may be transient. In embodiments, a viral expression vector is used. Viral expression vectors may be used as naked polynucleotides, or may be packaged in viral particles, including but not limited to defective interfering particles or other replication-defective viral constructs, and virus-like particles. In embodiments, the expression vector comprises a modified viral polynucleotide, such as from an adenovirus, a herpesvirus, or a retrovirus, such as a lentiviral vector. In embodiments, a recombinant adeno-associated virus (rAAV) vector may be used. rAAV vectors are commercially available, such as from TAKARA BIO® and other commercial vendors, and may be adapted for use with the described compositions and methods, given the benefit of the present disclosure. In certain embodiments, the expression vector is a self-complementary adeno-associated virus (scAAV). Suitable scAAV vectors are commercially available, such as from CELL BIOLABS, INC.® and can be adapted for use in the presently provided embodiments when given the benefit of this disclosure. In one embodiment, the recombinase is encoded by and expressed from the LP. Expression of the recombinase may be inducible. In embodiments, expression of the recombinase may be controlled by a repressor.
In embodiments, expression of the recombinase may be from an inducible promoter that is operably linked to the sequence encoding the recombinase. The DNA sequences of a wide variety of inducible promoters for use in eukaryotic cells are known in the art, as are the agents that are capable of inducing expression from the promoters. In embodiments, engineered regulated promoters such as the Tet-responsive promoter TRE, which is regulated by tetracycline, anhydrotetracycline or doxycycline, or the lacI-regulated promoter ADHi, which is regulated by IPTG (isopropyl β-D-1-thiogalactopyranoside), may also be used. In embodiments, the activity or localization of the recombinase can be regulated. These embodiments include but are not limited to the use of tamoxifen-based relocalization of a recombinase to the nucleus or ligand-induced dimerization of the enzyme. In embodiments, the amount of recombinase may be controlled, for example, by a degron. In non-limiting embodiments, the degron is a component of a degron system, including ubiquitin-dependent and ubiquitin-independent degron systems.
Virtually any chromosomal locus can be a site for insertion of an LP. In embodiments, the LP is introduced into a selected locus using any designer nuclease. In embodiments, the nuclease is an RNA-guided CRISPR-associated (Cas) nuclease. A variety of suitable CRISPR nucleases (e.g., Cas nucleases) are known in the art, as are methods for designing and selecting appropriate guide RNA constructs so that HAs can be precisely integrated at a predetermined location using a Cas nuclease. In an embodiment, two guide RNAs may be included so that the locus is modified in two positions, one for each HA.
In embodiments, the Cas is selected from a Class 1 or Class 2 Cas enzyme. In embodiments, a Type II or a Type V CRISPR Cas is used. In specific and non-limiting embodiments, the Cas comprises a Cas9, such as Streptococcus pyogenes Cas9 (SpCas9). Derivatives of Cas9 are known in the art and may also be used to introduce the LP into a locus. Such derivatives may be, for example, smaller enzymes than Cas9, and/or have different protospacer adjacent motif (PAM) requirements. In non-limiting embodiments, the Cas enzyme may be Cas12a, also known as Cpf1, or SpCas9-HF1, or HypaCas9. The first and second HAs can include sequences that are recognized and cleaved by the same Cas-mediated cleavage system that recognizes and cleaves the chromosomes, as described and illustrated further herein. This configuration is particularly useful when, for example, the LP is provided on a plasmid, whereby excision of the plasmid-based LP liberates the HAs to aid homologous recombination into the chromosomes. Thus, this approach also linearizes the plasmid. Cas cleavage sites may be positioned at or near the ends of the HAs.
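Guide RNA selection for SpCas9 starts from locating PAMs. The following sketch, which assumes the canonical SpCas9 NGG PAM and a 20-nucleotide spacer, scans both strands of a sequence for candidate target sites; other enzymes named above, such as Cas12a, recognize different PAMs and would need a different pattern. The function names are hypothetical, and practical guide design also weighs on-target and off-target scoring, which is omitted here.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def find_spcas9_targets(seq: str, spacer_len: int = 20):
    """Return (strand, start, spacer, pam) for each spacer immediately 5' of an NGG PAM.

    `start` is the 0-based spacer position on the scanned strand; for '-' hits it is
    measured on the reverse complement, not on the input sequence.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for i in range(spacer_len, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] == "GG":
                hits.append((strand, i - spacer_len, s[i - spacer_len:i], pam))
    return hits


example = "A" * 20 + "TGG"
print(find_spcas9_targets(example))
# [('+', 0, 'AAAAAAAAAAAAAAAAAAAA', 'TGG')]
```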
The LP may also be inserted into a selected locus using non-Cas based nuclease approaches. Suitable examples include but are not necessarily limited to zinc-finger nucleases and MADzymes. Non-limiting examples of MADzymes known in the art include MAD2 and MAD7 and are included in the Cas12a category of nucleases.
In embodiments, the LP comprises a nucleotide sequence that is also cleaved by the nuclease, which for example, may induce linearization of a plasmid that contains the LP.
In embodiments, the LP comprises selectable markers. In embodiments, the LP includes a negative selection marker (also referred to as a counterselection marker). In embodiments, the negative selection marker is operatively linked to a positive selection marker. Examples of suitable selection markers are known and can be adapted for use with the described compositions and methods by those skilled in the art when given the benefit of the present disclosure. Suitable examples of positive selection markers to obtain cells that initially include the LP, and in which the LP will be replaced by the payload as further described herein, include but are not limited to puromycin N-acetyltransferase (pac), Blasticidin S deaminase (bsd), Neomycin (G418) resistance gene (neo), Hygromycin resistance gene (hygB), and Zeocin resistance gene (Sh ble).
Non-limiting examples of negative selection markers include the HSV1-TK gene, which renders cells sensitive to ganciclovir (GCV): HSV1-TK phosphorylates GCV to GCV-monophosphate, which cellular kinases further convert to the toxic metabolite GCV-triphosphate (GCV-TP). HSV1-TK can also be used as a positive selection marker in thymidine kinase-deficient cells using HAT medium.
In another embodiment, the cells into which the LP is introduced may have a mutated X-linked PIGA (phosphatidylinositol glycan class A) gene. A mutation in the PIGA gene may be made by adapting strategies described further herein, including but not limited to CRISPR-mediated mutations that are produced using a suitable guide RNA(s). The protein encoded by the PIGA gene renders cells sensitive to the bacterial protoxin proaerolysin. Thus, an LP introduced into such cells may include a functional PIGA gene, which restores sensitivity to proaerolysin and thereby allows elimination of cells that still include the LP, whereas the desired cells, in which the LP has been replaced by the payload, survive, as further described below. Thus, positive and negative selection facilitate first selecting cells that contain the LP, and then eliminating cells that still contain the LP after the payload has been recombined into the locus.
In embodiments, the payload may comprise or consist of 1 bp to 1,000 kb, inclusive, and including all numbers and ranges of numbers there between, and in certain instances may be longer than 1,000 kb.
Without intending to be constrained by any particular theory, it is considered that, other than a requirement for certain sequences to function with the recombinase as described herein, the presently provided systems are agnostic with respect to the DNA sequence of the DNA insertion template. Accordingly, in embodiments, the DNA insertion template may be devoid of any sequence that can be transcribed, and as such may be transcriptionally inert. Such sequences may be used, for example, to alter a regulatory sequence in a genome, e.g., a promoter, enhancer, miRNA binding site, or transcription factor binding site, to knock out an endogenous gene, or to provide an interval in the chromosome between two loci, and may be used for a variety of purposes, which include but are not limited to treatment of a genetic disease, enhancement of a desired phenotype, study of gene effects, chromatin modeling, enhancer analysis, DNA binding protein analysis, methylation studies, and the like.
In embodiments, the payload comprises a sequence that may be transcribed by any RNA polymerase, e.g., a eukaryotic RNA polymerase, e.g., RNA polymerase I, RNA polymerase II, or RNA polymerase III. In embodiments, the RNA that is transcribed may or may not encode a protein, or may comprise a segment that encodes a protein and a non-coding sequence that is functional, such as a functional mRNA.
In embodiments, the payload includes one or more promoters. The promoter may be constitutive or inducible. The promoter may be operably linked to a sequence that encodes any protein or peptide, or a functional RNA.
In embodiments, the payload comprises one or more splice junctions.
In embodiments, the payload comprises an intact gene, or a gene fragment. The payload may include one or more genes or gene fragments. The genes or gene fragments may contain exons and introns. In embodiments, the payload comprises a cluster of genes. In embodiments, the payload comprises any of the foregoing features, which may be operably linked to a promoter that is included within the payload, or which may become operably linked to an endogenous promoter of the cell once the DNA insertion template is integrated. In embodiments, the payload comprises at least one open reading frame. In embodiments, the payload encodes a protein.
In embodiments, the payload encodes a binding partner, such as an antibody or antigen binding fragment of an antibody. In embodiments, one or more binding partners encoded by the payload may be all or a component of a bispecific T-cell engager (BiTE), a bispecific killer cell engager (BiKE), or a chimeric antigen receptor (CAR), such as for producing chimeric antigen receptor T cells (e.g., CAR T cells). In embodiments, the payload encodes a T cell receptor, and thus may encode both the alpha and beta chains of the T cell receptor.
In embodiments, the payload comprises a sequence that is intended to disrupt or replace a gene or a segment of a gene. Thus, the disclosure includes producing both knock-in and knockout gene modifications in cells, and transgenic non-human animals that contain such cells.
In embodiments, the payload is used to modify one or more chromosomal loci that are involved in, for example, any genetic disease. The payload may differ from an endogenous gene by as little as a single nucleotide, or may include or lack a particular exon, a splice junction, etc. The payload may also be a completely new sequence, relative to the genome of the cell prior to using the described approaches to modify one or more loci. In embodiments, a detectable marker is encoded by the payload.
In embodiments, as noted above, the payload is associated with a positive selection marker, which may be located outside the payload (e.g., in the backbone of the second dsDNA) so that it is present only during selection of the cells, or may be encoded within the payload and integrated with it. The former configuration allows for scarless insertion of the payload, except for the remainder of the recombinase recognition sequences.
In general, the described approaches are used to modify eukaryotic cells. The modified locus may be in the nucleus.
In embodiments, the eukaryotic cells comprise animal cells, which may comprise mammalian or avian cells, or insect cells. In embodiments, the mammalian cells are human or non-human mammalian cells. In embodiments, the cells are from avian animals, a canine, a feline, an equine animal, a murine mammal, e.g., a mouse or rat, a ruminant, or a pseudoruminant.
In embodiments, the cells that are modified by the approaches of this disclosure are totipotent, pluripotent, multipotent, or oligopotent stem cells when the modification is made. The stem cells may exhibit the described potency naturally, or the stem cells may be induced stem cells. In embodiments, the cells are hematopoietic stem cells. In embodiments, the cells are embryonic stem cells, or adult stem cells. In embodiments, the cells are epidermal stem cells or epithelial stem cells or neural stem cells. In embodiments, the cells are cancer cells, or cancer stem cells. In embodiments, the cells are spermatogonial stem cells. In embodiments, the cells are differentiated cells when the modification is made. In embodiments, the cells are leukocytes. In embodiments, the leukocytes are of a myeloid or lymphoid lineage. In embodiments, cells modified according to the compositions and methods of this disclosure are haploid, diploid, or tetraploid.
In embodiments, the disclosure includes obtaining cells from an individual, modifying the cells ex vivo using a system as described herein, and reintroducing the cells or their progeny into the individual or an immunologically matched individual for prophylaxis and/or therapy of a condition, disease or disorder. In embodiments, the cells modified ex vivo as described herein are autologous cells. In embodiments, the cells are provided as cell lines. In embodiments, the cells are engineered to produce a protein or other compound, and the cells themselves and/or the protein or compound they produce is used for prophylactic or therapeutic applications.
In embodiments, eukaryotic cells made according to this disclosure can be used to create transgenic, non-human organisms. In embodiments, a non-human mammal may be modified to include one or more human genes. In embodiments, the disclosure comprises modifying a gene in a mammalian embryo, such as a disease-causing or disease-associated gene, and implanting the embryo into a mammalian female.
In embodiments, one or more modified cells according to this disclosure may be used to perform a gene-drive in a population of animals, including but not necessarily limited to insects.
In embodiments, the one or more cells into which a described system is introduced comprises a plant cell, including but not limited to cells from any variety of cannabis, tobacco, maize, rice, ornamental and vegetable plants.
In embodiments, the disclosure comprises providing a treatment to an individual in need thereof by introducing a therapeutically effective amount of modified eukaryotic cells as described herein to the individual, such that the payload produces a polynucleotide, peptide, protein, a drug, a prodrug, an immunological agent, an enzyme, or any other agent that may have a beneficial effect. A corrected or new gene may also be considered a therapeutic agent.
In embodiments, the modified eukaryotic cells can be provided in a pharmaceutical formulation, and such formulations are included in the disclosure. A pharmaceutical formulation can be prepared by mixing the modified eukaryotic cells with any suitable pharmaceutical additive, buffer, and the like. Examples of pharmaceutically acceptable carriers, excipients and stabilizers can be found, for example, in Remington: The Science and Practice of Pharmacy (2005) 21st Edition, Philadelphia, PA. Lippincott Williams & Wilkins, the disclosure of which is incorporated herein by reference.
The Examples below are intended to illustrate but not limit the disclosure. The Examples show, among other aspects of the disclosure, that the present disclosure provides a platform, referred to herein as Big-IN, for scalable targeted integration into mammalian genomes, and demonstrate its flexibility, efficiency, and precision at three loci in mouse and human embryonic stem cells. Big-IN first targets a landing pad to a locus of interest using CRISPR-Cas9-mediated HDR, which permits single-step payload integration through Cre-mediated RMCE (
The cell engineering approach is designed to scale rapidly across multiple loci and cell lines. While we have demonstrated Big-IN in both mouse and human ESCs, it is possible that engineering other mammalian cell lines with LPs may require optimization. Indeed, we note the success of the LP-expressed CreERT2 strategy in H1 hESCs but not in mESCs. We have shown that the selection and delivery methods described herein can be redeployed in a modular fashion to overcome challenges associated with different cell types and loci. For example, the LP can employ either HSV1-ΔTK or hmPIGA as a counterselectable marker, with the former suffering from a bystander effect, and the latter requiring prior engineering to inactivate the endogenous PIGA. Similarly, inclusion of a positive selection marker on the payload augments delivery efficiency, while its placement in the payload backbone enables scarless integration (
The described verification strategy is tailored to enable early verification of engineering outcomes. For example, the use of locally generated bait in our Capture-Seq pipeline circumvents the cost and delay of commercially synthesized bait pools. Additionally, bamintersect works with standard libraries generated from genomic DNA, unlike specialized ligation-mediated approaches, and uses standard reference coordinates rather than custom assemblies for each delivery. We demonstrate the value of our pipeline through detection of internal duplications and deletions in integrated payloads that would have been difficult to detect using PCR screening.
The efficiency of Big-IN for integration of large DNA constructs suggests that it can also support integration of complex libraries for saturation mutagenesis of shorter elements46, and eventually, analysis of libraries of large constructs in a pooled format. When combined with the rapidly evolving big DNA synthesis field14,25, the disclosure includes use of Big-IN to obtain designer-like control over mammalian genomes and facilitate a synthetic approach to genome biology.
To enable repeated, precise, and efficient delivery of large DNAs to a given locus, we employed a two-stage approach that first targets a short landing pad (LP) to replace a genomic locus of interest using CRISPR-Cas9-mediated Homology Directed Repair (HDR) (
For LP integration, we first targeted the X-linked HPRT1 locus to permit counterselection with the cytotoxic antimetabolite 6-Thioguanine (6-TG)29. H1 male human embryonic stem cells (hESCs), which harbor a single copy of HPRT1, were co-transfected with pLP-TK and pCas9 plasmids30 expressing gRNAs targeting a 42 kb region including the HPRT1 gene for replacement. Cells were sequentially treated with 6-TG and puromycin to select for HPRT1 loss and LP-TK gain, followed by clonal isolation. Correct LP-TK integration was verified by PCR genotyping using primers targeting the novel junctions between LP-TK and the genomic sequences beyond the HAs (
To facilitate comprehensive genomic verification of multistep cellular engineering with these complex constructs, we developed a modular next-generation sequencing (NGS) analysis approach that independently maps short reads both to the relevant reference genome (hg38 or mm10) and to a custom reference for each engineering construct. We further developed a custom capture sequencing (Capture-Seq) approach based on nick translation for rapid, flexible, and cost-effective generation of biotinylated bait for hybridization capture, to efficiently verify correct engineering of screened clones.
Integration relied on 1 kb HAs to correctly target the LP, but such long HAs reduce the efficiency of PCR genotyping from genomic DNA.
We also assessed the efficacy of our in vivo linearization strategy to reduce off-target integration of transiently-transfected plasmids. We designed two pLP-TK plasmids differing only in the presence of the LP-flanking gRNA sites required for in vivo linearization, targeted them to HPRT1, selected for correct integrants with puromycin and 6-TG, and subjected the pool of cells to Capture-Seq. We found that the relative coverage depth of the LP backbone was lower for the in vivo-linearized pLP-TK.
To develop an approach for allele-specific engineering of diploid loci, we employed C57BL/6J x CAST/EiJ (BL6xCAST or BL6xC) F1 hybrid mouse embryonic stem cells (mESCs)32, whose genome harbors heterozygous point variants on average every 140 bp33. We targeted the Sox2 locus, which encodes a master transcription factor essential for regulation of pluripotency and differentiation34,35. We designed gRNAs targeting the flanks of a 143 kb genomic region that includes the Sox2 coding sequence, promoter, long-distance regulatory regions, and several non-coding genes35,36. These gRNAs target BL6-specific PAMs to enable allele-specific engineering. We constructed pLP-PIGA with an LP including heterotypic loxM/loxP sites, flanked by short homology arms and gRNA target sites.
We transfected pLP-PIGA and pCas9 plasmids into BL6xCAST mESCs, selected cells with puromycin and isolated clones. Of 40 clones screened using PCR genotyping, 16 (40%) contained both novel junctions.
A successful LP-PIGA integration (clone A1) and a failed LP-PIGA clone were subjected to Capture-Seq using bait generated from a BAC covering the Sox2 region and from the pLP-PIGA and pCas9 plasmids. Inspection of coverage depth at the 143 kb Sox2 genomic locus revealed a 50% reduction for clone A1 compared with parental mESCs or the failed clone, consistent with deletion of one of the two alleles.
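The allelic reading of a coverage drop can be sketched as a simple depth ratio. The Python below is a minimal illustration rather than the pipeline used here (which quantified coverage with BEDOPS): it assumes per-base depths have already been extracted for the engineered region and for a hypothetical unaffected background region used to normalize for sequencing depth, and all function names are hypothetical.

```python
from statistics import mean


def normalized_depth(region_depths, background_depths):
    """Mean depth over the region of interest divided by mean depth over a
    background region assumed to be unaffected by the engineering."""
    return mean(region_depths) / mean(background_depths)


def relative_depth(sample_region, sample_bg, parental_region, parental_bg):
    """Sample-to-parental ratio of background-normalized depth."""
    return (normalized_depth(sample_region, sample_bg)
            / normalized_depth(parental_region, parental_bg))


def estimated_copies(ratio, parental_copies=2):
    """Round the depth ratio to an integer copy number (diploid by default)."""
    return round(ratio * parental_copies)


# Hypothetical per-base depths: the sample carries half the parental depth.
ratio = relative_depth([30, 31, 29], [60, 60, 60], [60, 62, 58], [60, 60, 60])
print(ratio, estimated_copies(ratio))
```

Normalizing each sample to its own background before comparing to the parental line keeps differences in library size from being mistaken for copy-number changes.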
Delivery of large DNA through cassette exchange is an infrequent event, requiring selection to obtain practical efficiency. The HSV1-TK gene is a widely used counterselectable marker that renders cells sensitive to GCV by converting it to the toxic metabolite GCV-triphosphate (GCV-TP), which inhibits DNA synthesis and leads to cell death38. To test the efficacy of TK/GCV counterselection in H1 hESCs, we mixed TK-negative and TK-positive (LP-TK) cells at different ratios and treated these co-cultures with GCV. More than 80% of the TK-negative cells died when mixed at a 1:1 ratio with TK-positive cells, and all died when mixed at a 1:10 ratio.
We therefore tested an alternative counterselection strategy that relies on the X-linked PIGA (phosphatidylinositol glycan class A) gene, which encodes an enzyme crucial for the biosynthesis of glycosylphosphatidylinositol (GPI) anchors41 and whose activity renders cells sensitive to proaerolysin, a bacterial prototoxin. Proaerolysin perforates the plasma membrane upon binding to GPI anchors on the cell surface, resulting in rapid cell death42. Further, PIGA activity can be quantitatively monitored by measuring levels of CD59, a broadly expressed GPI-anchored membrane protein43. Cells with PIGA deletions can be selected with proaerolysin after a short period that allows for loss of PIGA protein and subsequent loss of GPI-anchored proteins from the cell surface44.
While proaerolysin efficiently killed parental H1 hESCs, ΔPIGA cells, in which the PIGA gene was deleted using CRISPR/Cas9 (see Methods), were entirely resistant.
Recovery of rare events in which a payload replaces the LP requires that expression of hmPIGA be stably maintained following withdrawal of positive selection. However, while nearly all H1 LP-PIGA cells maintained high CD59 levels in the presence of puromycin, a substantial proportion of cells spontaneously lost CD59 expression.
To demonstrate a counterselection-based approach to isolation of successful RMCE events, we designed a minimal 2.7 kb payload (PL1) comprising a pEF1α-driven GFP-T2A-BSD ORF flanked by loxM and loxP sites.
We attempted to apply a similar strategy for delivery to LP-PIGA mESCs. However, all clones that survived blasticidin and proaerolysin selection manifested multicopy gain of payload and vector backbone without LP-PIGA loss.
This approach leaves a BSD-GFP transcriptional unit (TU) integrated with the payload, which might affect the activity of nearby genes or regulatory elements. To develop an alternative architecture and selection strategy for scarless delivery, we constructed pSox2143kb, which harbors the entire 143 kb region of the Sox2 BL6 allele that was replaced by LP-PIGA, and in which the BSD-GFP TU is relocated to the backbone outside the lox sites.
To demonstrate the flexibility of Big-IN for delivery of payloads to additional loci, LP-PIGA2 was integrated into chromosome 7 of BL6xCAST mESCs, replacing a 157 kb region of the Igf2/H19 locus.
To screen genomic data for on- and off-target integration events, we developed bamintersect. Bamintersect leverages our modular mapping approach to analyze reads mapped separately to two reference genomes and detect read pairs indicative of a junction.
Of note, several novel engineered junctions could not be confirmed using bamintersect for technical reasons. These include LP-TK integration at HPRT1, for which the 1 kb homology arms precluded mapping of reads spanning the junction between LP-TK and hg38, and PL1 deliveries to both HPRT1 and Sox2, for which the left junction is nearly identical to that of the replaced LP.
For Sox2143kb deliveries, the newly formed payload-genome junctions are nearly identical to the original sequences in parental cells (deleted in LP-PIGA mESCs), as well as to the corresponding regions in the CAST allele. We therefore categorized bamintersect read pairs that overlap BL6xCAST variants according to their genotypes, revealing that while junctions in LP-PIGA mESCs are depleted of BL6 reads, these reads are restored in Sox2143kb clone G11 mESCs.
Combined, these results support the utility of bamintersect as a sensitive, scalable and unbiased tool for detection of on- and off-target integration events.
Primers used for cloning are listed in Supplementary Table 1. pLP-TK (pLP050/pJML0050) was assembled by a combination of overlap PCR, Gibson assembly of intermediate fragments, and Golden Gate cloning. The backbone was assembled from PCR-amplified fragments: the HIS3 transcriptional unit (TU) fragment was amplified as two overlapping parts to remove an internal BbsI site from pRS413 (ATCC, 87518) using primers oJML0069+oJML0056 and oJML0057+oJML0058. The Ori-AmpR-CEN/ARS fragment was amplified as two overlapping parts to remove an internal BsaI site from pRS413 using primers oJML0053+oJML0068 and oJML0067+oJML0070. These parts were combined with a synthetic sequence containing Golden Gate-compatible cloning sites for adding homology arms, and cloned by Gibson assembly. LP-TK, consisting of loxM, pEF1α-driven PuroRΔTK-P2A-CreERT2 coding sequence, EIF1 polyadenylation signal (EIF1 pA), and a loxP site, was assembled largely by overlap PCR followed by Golden Gate assembly into the above-mentioned backbone: the PuroRΔTK-P2A-CreERT2 fragment was built by overlap PCR of PuroR using oJML0126+oJML0144 and pJML0010 as a template, ΔTK-P2A using oJML0137+oJML0129/oJML0138 with pSP0130 as a template, P2A-CreERT2 with oJML0130/oJML0139+oJML0131 and oJML0132+oJML0133/oJML0134 and pBabe-Puro Cre ERT244 as a template, and EIF1 pA with oJML0135+oJML0136 and pJTR0085 as a template. The assembled coding sequence was cloned into the backbone by BbsI-mediated Golden Gate assembly. pEF1α was amplified from pSP0044 with primers oJML0145+oJML0146 and cloned into the LP-containing vector by BsmBI-mediated Golden Gate assembly.
pLP-PIGA (pLP140/pJML0140) was cloned using BbsI-mediated Golden Gate assembly of a synthetic LP fragment, consisting of loxM/loxP-flanked pEF1α-driven mScarlet-P2A-CreERT2-P2A-PuroR-hmPIGA coding sequence, and an EIF1 pA, into a minimal ‘entry vector’. The entry vector (pJML0100) was modified from a minimal bacterial backbone (pYTK095, Addgene plasmid #65202) by inserting a BbsI Golden Gate-compatible entry sequence at the NotI site.
pLP-PIGA2 (pLP300/pRO_009) was constructed from a synthetic plasmid that included the following LP region components: loxM-pEF1α-PuroR-P2A-hmPIGA-P2A-mScarlet-EIF1 pA-loxP (where the P2A sequences are mutually recoded). A ΔTK synthetic transcriptional unit consisting of a human PGK1 promoter, an HSV1-ΔTK gene, and an SV40 polyadenylation signal was cloned into the SbfI site in the pLP backbone and a clone in which the two TUs are facing opposite ways was identified by PCR.
To facilitate targeting of LPs to specific genomic loci, homology arms (HAs) corresponding to the genomic sequence flanking the Cas9 cut sites were amplified from either mammalian genomic DNA or a BAC corresponding to the engineered region. Homology arms were cloned distal to the loxM and loxP sites in the LP using a BsaI Golden Gate assembly reaction. Primers used to amplify homology arms are listed in Supplementary Table 2.
pPL1 was assembled in yeast47 from 3 linear DNA fragments, each sharing ≥40 bp of terminal sequence homology with its adjacent fragments. These fragments included a BsaI-digested pLM1050 yeast/E. coli shuttle vector, a pEF1α-GFP cassette amplified using PCR primers oRB_061+oRB_063 from pSP0108, and a T2A-BSD-bGHpA (bGHpA, bovine growth hormone polyadenylation signal) cassette amplified using PCR primers oRB_062+oRB_064 from pSP0172.
pPL1-BBTK (pJML0206) was constructed from pPL1 in yeast. pPL1 was linearized using recombinant Cas9 (New England Biolabs M0386) and a synthetic tracrRNA/crRNA (TTGCGCACGGTTATGTGGAC) (SEQ ID NO:1) duplex (Integrated DNA Technologies, IDT). A ΔTK synthetic transcriptional unit, consisting of a human PGK1 promoter driving the expression of a recoded ΔTK gene and an SV40 polyadenylation signal, was amplified to carry overlapping homology to the Cas9-digested pPL1 backbone. Linear fragments were co-transformed into yeast, and colonies were screened by colony PCR.
pSox246kb (pLM1113) and pSox2143kb (pLM1120) were constructed in yeast in a two-step process starting with a BAC that carries the Sox2 locus (BACs and relevant genomic coordinates are listed in Supplementary Table 3). The BAC was subjected to in vitro CRISPR/Cas9 digestion using synthetic gRNAs mSox2-g1 and mSox2-g3 to release a 46 kb segment, or mSox2-g1 and mSox2-g2 to release a 143 kb segment. Specifically, synthetic crRNAs and tracrRNA (IDT) were resuspended and mixed at 1 μM each with Duplex Buffer (IDT), heated to 94° C. and slowly cooled to room temperature. Next, 1 μL of duplexed crRNAs/tracrRNA was mixed with 2 μL 10×Cas9 Buffer and 1 μL recombinant Cas9 (New England Biolabs M0386S) in a total volume of 20 μL, incubated for 10 min at room temperature, supplemented with 1 μg BAC DNA and incubated for 2 hours at 37° C., followed by inactivation with 1 μL Proteinase K (Qiagen 19131) for 10 min at room temperature. The digestion products were co-transformed with BsaI-digested assembly vector pLM1110 and terminal linker sequences (250 bp gBlocks, IDT) to enable homologous recombination-dependent assembly.
pSox246kb-MC (pLM1121) was cloned by digesting pSox246kb (pLM1113) with I-SceI and assembling it in yeast with a selectable marker cassette containing pEF1α-GFP-T2A-BSD-bGHpA, which was PCR-amplified from pPL1, a BsaI-digested pLM1081 yeast/E. coli shuttle vector, and 3 gBlock (IDT) linkers providing terminal homology between parts.
pSox246kb-MC-BBTK (pJML0207) was cloned from pSox246kb-MC (pLM1121) in the same manner as pPL1-BBTK was built from pPL1, using the same guide sequence and ΔTK fragment.
Payload constructs were recovered from yeast and transformed into CopyControl TransforMax EPI300 E. coli cells (Lucigen). A single colony was grown overnight in selective LB medium at 37° C. with shaking and then subcultured 1:100 in 150-300 mL selective LB medium supplemented with CopyControl Induction Solution (Lucigen) and grown for an additional 6-8 hours.
All gRNAs were cloned into pSpCas9(BB)-2A-Puro V2.0 (pCas9) plasmids using BbsI Golden Gate assembly as described30. gRNA sequences and genomic target coordinates are listed in Supplementary Table 4.
The lentiviral Cre reporter construct pLV-lox-dsRed-lox-GFP was cloned by amplifying a loxP-dsRed-loxP-eGFP cassette from pMSCV-loxP-dsRed-loxP-eGFP-Puro-WPRE48 (Addgene plasmid #32702) using primers oRB_036+oRB_037, digesting the product with ClaI+NotI and ligating into a ClaI+NotI-digested lentiviral vector pLH1263. The resulting lentivirus encodes a pEF1α-driven loxP-dsRed-loxP-eGFP-WPRE.
Plasmids were isolated using the ZymoPURE II Plasmid Maxiprep Kit (Zymo Research D4203) according to the manufacturer's protocol. BACs and large payloads were isolated using the NucleoBond Xtra BAC kit (Takara Bio 740436).
gRNA Design
gRNAs were designed using the GuideScan algorithm49. For allele-specific LP integration at Sox2, we produced a scored list of potential gRNAs targeting a 261 kb region surrounding Sox2 using the BL6 reference genome sequence. Next, we identified gRNAs for which the corresponding PAM is mutated in the CAST allele, resulting in a list of BL6-specific gRNAs. From this list we selected two high-scoring gRNAs, Sox2-g1 and Sox2-g2, which target a 143 kb genomic region for replacement with the LP. gRNA sequences are listed in Supplementary Table 4.
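The PAM-based allele filter described above can be sketched as follows. This is a hypothetical illustration, not GuideScan (whose on/off-target scoring is not reproduced): it assumes SpCas9 NGG PAMs, 0-based coordinates on the BL6 reference sequence, and a dictionary mapping positions to CAST alternative bases, and it calls a site BL6-specific when a CAST SNV alters either G of its PAM. The names `bl6_specific_sites` and `revcomp` are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def bl6_specific_sites(ref, snvs, spacer_len=20):
    """Return (strand, site_start, spacer) for SpCas9 sites on `ref` (BL6,
    0-based) whose PAM GG is changed by a CAST SNV.

    snvs maps 0-based positions to the alternative (CAST) base.
    """
    hits = []
    window = spacer_len + 3  # spacer plus 3 nt PAM
    for i in range(len(ref) - window + 1):
        # Plus strand, read 5'->3' on ref: [spacer][N G G]
        gg = (i + spacer_len + 1, i + spacer_len + 2)
        if ref[gg[0]:gg[1] + 1] == "GG":
            if any(snvs.get(p, "G") != "G" for p in gg):
                hits.append(("+", i, ref[i:i + spacer_len]))
        # Minus strand, as seen on ref: [C C N][spacer reverse complement]
        if ref[i:i + 2] == "CC":
            if any(snvs.get(p, "C") != "C" for p in (i, i + 1)):
                spacer = revcomp(ref[i + 3:i + 3 + spacer_len])
                hits.append(("-", i, spacer))
    return hits


# Toy example with a 4 nt "spacer" for readability.
print(bl6_specific_sites("ACGTAGGT", {5: "T"}, spacer_len=4))
```

Checking only the two G positions reflects the text's criterion (a mutated PAM); SNVs in the N position or the spacer are ignored here, although spacer mismatches could also reduce cutting of the CAST allele.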
WA01 (H1) human embryonic stem cells (hESCs) were purchased from WiCell. H1 hESCs were initially grown for 2 weeks on plates coated with Matrigel (Corning 354277) in mTeSR medium (Stem Cell Technologies 85850) and subsequently transferred to plates coated with Geltrex (Gibco A1413302) and StemFlex medium (ThermoFisher A3349401) supplemented with 1% Pen-Strep (ThermoFisher 15140122). For routine passaging, cells were dissociated into clumps with Versene (Gibco 15-040-066) and gentle trituration. Wide-orifice pipette tips were used when handling small volumes of cell suspension.
C57BL/6J x CAST/EiJ (BL6xCAST) clone 4 mESCs32 were used. mESCs were cultured on plates coated with 0.1% gelatin (EMD Millipore ES-006-B) in 80/20 medium comprising 80% 2i medium and 20% mESC medium. 2i medium contained a 1:1 mixture of Advanced DMEM/F12 (ThermoFisher 12634010) and Neurobasal-A (ThermoFisher 10888022) supplemented with 1% N2 Supplement (ThermoFisher 17502048), 2% B27 Supplement (ThermoFisher 17504044), 1% Glutamax (ThermoFisher 35050061), 1% Pen-Strep (ThermoFisher 15140122), 0.1 mM 2-Mercaptoethanol (Sigma M3148), 1250 U/ml LIF (ESGRO ESG11071), 3 μM CHIR99021 (R&D Systems 4423) and 1 μM PD0325901 (Sigma PZ0162). mESC medium contained Knockout DMEM (ThermoFisher 10829018) supplemented with 15% Fetal Bovine Serum (FBS, BenchMark 100-106), 0.1 mM 2-Mercaptoethanol, 1% Glutamax, 1% MEM Non-Essential Amino Acids (ThermoFisher 11140050), 1% Nucleosides (EMD Millipore ES-008-D), 1% Pen-Strep and 1250 U/ml LIF. HEK-293T cells were cultured in DMEM supplemented with 10% FBS, 1 mM sodium pyruvate (ThermoFisher 11360070), 1% Glutamax and 1% Pen-Strep. All cells were grown at 37° C. in a humidified atmosphere of 5% CO2 and passaged on average twice per week.
Puromycin (Sigma P9620) and Blasticidin S (ThermoFisher R21001) were applied as described below. Ganciclovir (GCV, Sigma PHR1593) was dissolved in water and NaOH at pH 12 and adjusted to pH 11 with HCl and water to a final concentration of mg/ml. GCV and Proaerolysin (Aerohead Scientific) concentrations are indicated below. 4-Hydroxytamoxifen (tamoxifen, Sigma T176) was applied at 200 nM, unless indicated otherwise. 6-TG (Sigma A4660) was applied at 30 μM.
Relevant genomic coordinates are listed in Supplementary Table 3.
H1 hESCs were transfected using the Neon Transfection System (ThermoFisher). Several hours prior to transfection, cells were treated with StemFlex medium supplemented with 1% RevitaCell Supplement (ThermoFisher A2644501). Cells were washed with PBS and dissociated into a single-cell suspension using TrypLE-Select (ThermoFisher 12563011), which was neutralized with StemFlex medium; cells were spun down at 200 rcf for 3 min, the supernatant was aspirated and cells were resuspended in PBS. 1×10⁶ cells per transfection were spun down at 200 rcf for 3 min and resuspended in Neon Buffer R at a final concentration of 2×10⁷ cells/ml. 50 μL of cell suspension were mixed with 50 μL Neon Buffer R containing 10 μg of total DNA per transfection. Nucleofection used Neon 100 μL Tips with two 20 ms pulses at 1100 V. Transfected cells were transferred into plates coated with rhLaminin-521 (Gibco A29249) prefilled with StemFlex medium supplemented with 1% RevitaCell. PIGA deletion was performed with 5 μg of each pCas9 plasmid expressing gRNAs hPIGA-g1 and hPIGA-g2, and cells were selected with 200 pM proaerolysin for 1-2 weeks post-transfection. These ΔPIGA cells were used for subsequent LP-PIGA integrations. All LP integrations at HPRT1 were performed using 5 μg of the pLP and 2.5 μg of each pCas9 plasmid expressing HPRT1-g1 and HPRT1-g2 gRNAs, and cells were selected using a combination of 1 μg/ml puromycin and 6-TG, as indicated. H1 PL1 integrations were performed using 5 μg pPL1. Cells were treated with 200 nM 4-Hydroxytamoxifen (Tam) for 3 hours on the day following transfection, selected with 5 μg/ml Blasticidin S for 8 days, and then selected for 4 days with 100 nM GCV to eliminate TK-expressing cells.
LP integrations and genomic deletions in BL6xCAST mESCs were performed using the Neon Transfection System. Cells were washed with PBS, dissociated into a single-cell suspension using TrypLE-Select (Gibco), which was neutralized with mESC medium, spun down at 200 rcf for 3 min, supernatant aspirated and cells resuspended in PBS. 1×10⁶ cells per transfection were spun down at 200 rcf for 3 min and resuspended in Neon Buffer R at a final concentration of 2×10⁷ cells/ml. Per transfection, 50 μL of cell suspension were mixed with 50 μL Neon Buffer R containing 10 μg of total DNA and nucleofected using Neon 100 μL Tips with two 20 ms pulses at 1200 V. Transfected cells were transferred into gelatin-coated plates prefilled with 80/20 medium. Piga deletion was performed with 5 μg of each pCas9 plasmid expressing gRNAs mPiga-g1 and mPiga-g2 and cells were selected with 2 nM proaerolysin approximately 1 week post-transfection. ΔPiga cells were used for subsequent LP integrations. LP-PIGA integrations at Sox2 were performed using 5 μg of the pLP and 2.5 μg of each pCas9 plasmid expressing Sox2-g1 and Sox2-g2 gRNAs, and cells were selected with 1 μg/ml puromycin. LP-PIGA2 integration at Igf2/H19 was performed using 5 μg of the pLP-PIGA2 and 2.5 μg of each pCas9 plasmid expressing Igf2/H19-g1 and Igf2/H19-g2 gRNAs, and cells were selected with 1 μg/ml puromycin followed by selection with 1 μM GCV.
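The Neon volumes and DNA amounts in the two protocols above are internally consistent, and a short calculation makes this explicit: 1×10⁶ cells at 2×10⁷ cells/ml occupy 50 μL, which together with 50 μL of buffer fills a 100 μL tip, and 5 + 2.5 + 2.5 μg of plasmid gives the stated 10 μg of total DNA. The helper names below are hypothetical.

```python
def neon_volumes(cells=1e6, density_per_ml=2e7, buffer_dna_ul=50.0):
    """Return (cell suspension volume in uL, total tip volume in uL)."""
    cell_ul = cells * 1000 / density_per_ml  # 1 mL = 1000 uL
    return cell_ul, cell_ul + buffer_dna_ul


def total_dna_ug(plasmids):
    """Sum plasmid masses (ug) for one transfection."""
    return sum(plasmids.values())


cell_ul, tip_ul = neon_volumes()
dna = total_dna_ug({"pLP": 5.0, "pCas9 g1": 2.5, "pCas9 g2": 2.5})
print(cell_ul, tip_ul, dna)
```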
Payload deliveries in BL6xCAST mESCs were performed using a Nucleofector 2b (Lonza). Cells were washed with PBS and dissociated into a single-cell suspension using TrypLE-Select, which was neutralized with mESC medium; cells were spun down at 200 rcf for 3 min, the supernatant was aspirated, and cells were resuspended in ice-cold PBS and counted. 5×10⁶ cells per transfection were spun down at 200 rcf for 3 min and resuspended in a room-temperature mixture of 82 μL Nucleofector Solution and 18 μL Nucleofector Supplement from the Mouse ES Cell Nucleofector kit (Lonza VPH-1001). Per transfection, 100 μL of cell suspension were mixed with 10 μL TE containing 2.25-5 μg of total DNA and nucleofected using program A-23. PL1 deliveries were performed with 1.5 μg pPL1-BBTK and 0.75 μg pCAG-Cre (Addgene plasmid #13775). pSox246kb-MC deliveries (failed deliveries) were performed with 35 μg pSox246kb-MC. Payload-transfected mESCs were treated with 200 nM Tam for 4 hours before and 24 hours after transfection. Cells were selected with blasticidin constitutively starting day 1 post-transfection and with 2 nM proaerolysin for 2 days starting day 14 post-transfection. pSox246kb-MC-BBTK deliveries were performed with 3 μg pSox246kb-MC-BBTK and 1 μg pCAG-Cre. Payload-transfected mESCs were treated with 200 nM Tam for 24 hours before and after transfection. mESCs were grown for 10 days with blasticidin. On days 11 and 12, 1 nM proaerolysin was added, and on days 13 and 14, 1 μM GCV was also added. pSox2143kb delivery was performed with 0.3 μg pSox2143kb and 2 μg pCAG-iCre (Addgene plasmid #89573). Payload-transfected mESCs were selected with blasticidin for 2 days starting day 1 post-transfection and with 2 nM proaerolysin for 2 days starting day 7 post-transfection. Payload deliveries to the Igf2/H19 locus of BL6xCAST mESCs were performed with 5 μg pSox246kb-MC-BBTK or pSox246kb and 2 μg pCAG-iCre.
Cells were selected with blasticidin either transiently during days 1 and 2 post-transfection (pSox246kb) or constitutively (pSox246kb-MC-BBTK), followed by 2 nM proaerolysin selection during days 7 and 8 post-transfection. pSox246kb-MC-BBTK-transfected cells were further selected with 1 μM GCV during days 9 and 10 post-transfection.
Genomic DNA was extracted either using the DNeasy Blood & Tissue kit (QIAGEN 69506), according to the manufacturer's protocol, or by a crude extraction protocol, applied when a large number of samples were processed. For crude DNA extraction, cells were grown to confluency in 96-well plates and washed with PBS. After removing the PBS, plates were frozen at −80° C. for at least 30 min and then thawed at room temperature. Cells were resuspended in 100 μL/well TE buffer (pH 8.0) supplemented with 0.3 mg/ml proteinase K (Thermo Scientific E00491). Mixtures were triturated several times to ensure lysis. Lysates were transferred to PCR plates and plates were sealed, spun down, and incubated at 37° C. for 1 hour and 99° C. for 10 min. Plates were spun down and left to cool down at room temperature. Typical concentrations obtained from 80% confluent wells of mESCs were 100-300 ng/μL, according to Nanodrop measurements.
PCR was conducted with 50-100 ng column-prepped DNA or with 1-2 μL of crude extract using either 2×GoTaq Green Mastermix (Promega PRM7123) or Phusion Hot Start Flex 2×Master mix (New England Biolabs M0536L) according to the manufacturers' protocols. Genotyping primers are listed in Supplementary Table 5. 8-10 μL of amplified PCR products were separated on a 1-2% agarose gel and visualized with ethidium bromide on a BIO-RAD Gel Doc XR+System. Image color was inverted.
Quantitative Real-Time PCR (qRT-PCR)
Total RNA was extracted using the RNeasy Mini kit (QIAGEN), and 1-2 μg were reverse-transcribed using the High Capacity Reverse Transcription Kit (Life Technologies 4368814) according to the manufacturer's protocol. Quantitative Real-Time PCR (qRT-PCR) was performed using KAPA SYBR FAST (Kapa Biosystems KK4610) on a LightCycler480 Real-Time PCR System (Roche). Expression was calculated using the ΔCt method. Relative expression was calculated by dividing the average level of each gene by that of the housekeeping gene GAPDH/Gapdh measured in the same cDNA sample. qRT-PCR primers and annealing temperatures are listed in Supplementary Table 6. When data are displayed as bar charts, error bars represent standard deviations of technical replicates.
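The ΔCt normalization above can be sketched in a few lines. This minimal example assumes roughly 100% amplification efficiency (one doubling per cycle), so a linear level is 2^(-Ct); averaging technical replicates in linear space is our reading of "dividing the average level", and the function names are hypothetical.

```python
from statistics import mean


def linear_level(cts):
    """Convert Ct values to linear levels, assuming ~100% efficiency."""
    return [2.0 ** -ct for ct in cts]


def relative_expression(gene_cts, ref_cts):
    """Average linear level of the gene divided by that of the housekeeping
    gene (e.g. GAPDH/Gapdh) measured in the same cDNA sample."""
    return mean(linear_level(gene_cts)) / mean(linear_level(ref_cts))


# Hypothetical technical triplicates: the gene crosses threshold 5 cycles
# after GAPDH, so its relative expression is 2**-5.
print(relative_expression([25.0, 25.0, 25.0], [20.0, 20.0, 20.0]))
```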
Crystal Violet (CV) staining was performed by incubating plates for 5 min with CV solution (10 mM CV, 10% EtOH in water), followed by 3-5 gentle washes with water. PrestoBlue (ThermoFisher Scientific A13262) staining was performed according to the manufacturer's protocol. For CD59/HLA analysis, cells were washed with PBS, singularized using TrypLE-Select and neutralized with DMEM supplemented with 10% FBS. 1 million cells per sample were spun down at 500 rcf for 1 minute, and the supernatant was removed. Cell pellets were resuspended in staining solution containing DMEM, 10% FBS, 10% anti-CD59-FITC (BIO-RAD MCA1054F) and 10% anti-HLA-PE (Invitrogen 12-9983-42) and incubated on ice in the dark for 30 min with occasional gentle mixing. Staining solution was topped up with 0.5 mL ice-cold PBS, samples were spun down (500 rcf, 1 minute) and supernatants were aspirated. This washing step was repeated once more, and samples were resuspended in 0.3 mL ice-cold PBS, filtered and placed on ice until analysis. Flow cytometry was performed on a BD Accuri C6 instrument and results were analyzed using the FlowJo software.
For production of lentiviral particles, 1×10⁷ HEK-293T cells were resuspended in growth media (as described above) and transfected with 20 μg lentiviral vector, 20 μg psPAX2 packaging plasmid and 10 μg pMD2.G envelope plasmid using the Calcium Phosphate method. Cells were then plated in a 10 cm dish and cultured for one day. On the second day, media was refreshed and cells were incubated at 32° C. Viral supernatants were collected on the morning and evening of the third and fourth days, passed through a 0.22 μm cellulose acetate filter and concentrated approximately 25-fold using an Amicon Ultra-15 Centrifugal Filter (Millipore UFC903024). Cells were infected with concentrated virus diluted in appropriate media in the presence of 8 μg/ml polybrene (Sigma TR1003G) for approximately 16 hours at 37° C. One or more days following infection, cells harboring CreERT2 were treated with 4-Hydroxytamoxifen (tamoxifen, Sigma T176) and were then assayed for DsRed and GFP expression by flow cytometry on a BD Accuri C6 machine. Cre activity was calculated using the FlowJo software as the percentage of GFP-positive cells among total infected (fluorescent) cells.
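The Cre activity readout reduces to a ratio of gated event counts. The sketch below assumes three hypothetical gates (GFP-only, GFP+DsRed+ double positive, and DsRed-only) and counts double positives as GFP-positive, on the assumption that recently recombined cells may still carry residual DsRed; the function name is hypothetical.

```python
def cre_activity_percent(gfp_only, double_positive, dsred_only):
    """Percent GFP-positive cells among all fluorescent (infected) cells.

    Assumption: GFP+DsRed+ events count as GFP-positive, because recently
    recombined cells may retain residual DsRed protein.
    """
    gfp_positive = gfp_only + double_positive
    infected = gfp_positive + dsred_only
    if infected == 0:
        raise ValueError("no fluorescent events")
    return 100.0 * gfp_positive / infected


# Hypothetical gate counts from one sample.
print(cre_activity_percent(300, 100, 600))
```

Non-fluorescent (uninfected) events are excluded from the denominator, matching the definition of activity relative to infected cells.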
Preparation of Illumina dsDNA Libraries
Genomic DNA was isolated from cells using the DNeasy Blood and Tissue Kit (Qiagen) according to the manufacturer's protocol. 1000 ng of DNA was sheared to approximately 500-900 bp in a 96-well microplate using the Covaris LE220 (450 W, 10% Duty Factor, 200 cycles per burst, and 90 second treatment time). Sheared DNA was purified using the DNA Clean and Concentrate-5 Kit (Zymo Research), and the concentration was measured on a Nanodrop instrument (Invitrogen). DNA fragments were end-repaired with T4 DNA polymerase, Klenow DNA polymerase, and T4 polynucleotide kinase (NEB), and A-tailed using Klenow (3′-5′ exo-, NEB). Illumina-compatible adapters were subsequently ligated to DNA ends, and DNA libraries were amplified with KAPA 2×Hi-Fi Hotstart Readymix (Roche).
Baits for sequence capture were prepared from BAC or plasmid DNA containing the sequence of interest. Biotin-16-dUTP (Roche) was incorporated into bait DNA using a Nick Translation kit (Roche). The reaction (total volume 20 μL) was set up in a 200 μL PCR tube on ice as follows: 2 μg of BAC DNA, 10 μL of 0.1 mM Biotin-dUTP/dNTP mixture (1 volume Biotin-16-dUTP, 2 volumes dTTP, 3 volumes dATP, 3 volumes dCTP and 3 volumes dGTP), 2 μL of 10×nick translation buffer and 2 μL of enzyme mixture. Nick translation was carried out at 15° C. for 16 hours or 8 hours (for BAC or plasmid DNA, respectively) in a thermal cycler. The reaction was stopped by addition of 1 μL 0.5 M EDTA and heating at 65° C. for 10 min or cooling at 4° C. overnight. Biotinylated baits were purified by ethanol precipitation, resuspended in 50 μL H2O, and the concentration was measured on a Nanodrop instrument. Baits were stored at −20° C.
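The 1:2:3:3:3 volume ratio of the labeling mixture determines how much of the thymidine position is biotinylated. The sketch below splits any total volume by those parts and, assuming all nucleotide stocks are at the same molar concentration (an assumption, not stated above), shows that one third of the T/U-position nucleotides carry biotin. The names are hypothetical.

```python
from fractions import Fraction

# Volume parts of the 0.1 mM Biotin-dUTP/dNTP mixture described above.
PARTS = {"Biotin-16-dUTP": 1, "dTTP": 2, "dATP": 3, "dCTP": 3, "dGTP": 3}


def mix_volumes(total_ul):
    """Split a total volume (uL) according to PARTS."""
    total_parts = sum(PARTS.values())
    return {name: total_ul * p / total_parts for name, p in PARTS.items()}


def biotin_fraction_of_t():
    """Fraction of T/U-position nucleotides carrying biotin, assuming
    equimolar stocks."""
    biotin = PARTS["Biotin-16-dUTP"]
    return Fraction(biotin, biotin + PARTS["dTTP"])


print(mix_volumes(12))
print(biotin_fraction_of_t())
```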
Targeted sequencing using in-solution hybridization capture (Capture-Seq) was performed as described previously50 with modifications. 1 μg biotinylated DNA bait and μg Cot-1 human or mouse DNA (Invitrogen) were combined with universal and sample-specific blocking oligos and lyophilized using a SpeedVac. Lyophilized DNA was resuspended in 12 μL TE (pH 7.5) and overlaid with mineral oil. In a thermal cycler, the DNA mixture was denatured at 96° C. for 5 min, incubated at 65° C. for an additional min, and then 12 μL of 2×hybridization buffer (1.5 M NaCl, 40 mM sodium phosphate buffer (pH 7.2), 10 mM EDTA (pH 8), 10×Denhardt's and 0.2% SDS) was added to the DNA, and the mixture was pre-hybridized for 6 hours at 65° C.
A total of 1 μg from 2-8 libraries was pooled into a single 200 μL PCR tube for a single capture reaction. Library DNA was diluted in H2O to a final volume of 12 μL and overlaid with mineral oil. Library DNA was denatured at 96° C. for 5 min, incubated at 65° C. for an additional 15 min, and then 12 μL of 2×hybridization buffer was added to the denatured DNA library. The entire volume (24 μL) of denatured library DNA was added to the tube of pre-hybridized bait DNA, and the mixture was incubated at 65° C. for 16-22 hours. For each capture reaction, 50 μL of MyOne streptavidin-coated magnetic beads (Invitrogen) were washed with 1×B&W buffer (5 mM Tris-HCl pH 7.5, 0.5 mM EDTA, 1 M NaCl) three times, and then resuspended in 150 μL 1×B&W buffer in a low-retention microcentrifuge tube. The hybridization mix (48 μL) plus 48 μL 2×B&W buffer (10 mM Tris-HCl pH 7.5, 1 mM EDTA, 2 M NaCl) were then combined with the pre-washed magnetic beads, and incubated at room temperature for 30 min with rotation. The magnetic beads were washed once at 25° C. for 15 min in 1×SSC with SDS and three times at 65° C. for 15 min in 0.1×SSC with 0.1% SDS. To denature the captured library DNA, the beads were resuspended in 100 μL 100 mM NaOH, and incubated at room temperature for 10 min. After allowing the beads to separate on a magnetic rack, the supernatant (containing enriched library DNA) was transferred to a new tube, neutralized with 100 μL 1 M Tris-HCl pH 7.5, and purified using the DNA Clean and Concentrate-5 Kit (Zymo Research). Four microliters of the captured library DNA were evaluated using qPCR to determine the optimal number of final PCR amplification cycles. Captured libraries were then amplified with KAPA Hi-Fi Hotstart Readymix (Roche).
Illumina libraries were sequenced in paired-end mode on an Illumina NextSeq 500 operated at the Institute for Systems Genetics or a NovaSeq 6000 operated by the NYU Langone Health Genome Technology Center. Reads were demultiplexed with Illumina bcl2fastq v2.20, requiring a perfect match to the index barcode sequences. All whole-genome sequencing and Capture-Seq data were processed using a uniform mapping and peak calling pipeline. Illumina sequencing adapters were trimmed with Trimmomatic v0.3951. Sequencing reads were aligned using BWA v0.7.1752 to a genome reference (GRCh38/hg38 or GRCm38/mm10) including unscaffolded contigs and alternate references, as well as independently to custom references for relevant vectors. PCR duplicates were marked using samblaster v0.1.2453. Generation of per-base coverage depth tracks and quantification was performed using BEDOPS v2.4.3554. Data were visualized using the UCSC Genome Browser.
Variant calling was performed on sequenced BL6xCAST samples to verify correct allele-specific engineering using a standard pipeline based on bcftools v1.9:
Raw pileups were filtered using:
SNVs called in each sample were intersected with expected BL6/CAST heterozygous sites based on known variants called for CAST/EiJ 55.
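Because mm10 is derived from C57BL/6J, CAST alleles appear as alternative alleles, so sites that are expected to be heterozygous ("0/1") should become CAST-only ("1/1") wherever the BL6 allele has been deleted. The sketch below tallies genotypes at expected heterozygous sites within a region. It assumes genotype calls have already been parsed from the bcftools output into a dictionary, and the function name is hypothetical.

```python
from collections import Counter


def genotype_summary(calls, expected_het_sites, start, end):
    """Count genotypes at expected BL6/CAST heterozygous sites in [start, end).

    calls: {position: genotype} against mm10 (BL6), where "0/1" is
           heterozygous, "1/1" is CAST-only and "0/0" is BL6-only.
    expected_het_sites: positions known to differ between BL6 and CAST.
    Sites without a call are reported as "missing".
    """
    counts = Counter()
    for pos in sorted(expected_het_sites):
        if start <= pos < end:
            counts[calls.get(pos, "missing")] += 1
    return counts


# Hypothetical calls: sites 200 and 300 lie inside a BL6-specific deletion.
calls = {100: "0/1", 200: "1/1", 300: "1/1", 400: "0/1"}
print(genotype_summary(calls, {100, 200, 300, 400, 500}, 150, 350))
```

A region dominated by "1/1" calls, flanked by "0/1" calls, is the expected signature of allele-specific deletion of the BL6 copy.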
Bamintersect enables efficient filtering and analysis of paired-end sequencing reads mapped independently to two different reference sequences, typically a mammalian reference genome (hg38 or mm10) and an engineered reference of interest (typically an LP or payload). To identify junctions between the two references in an unbiased fashion, bamintersect searches for read pairs in which each read is mapped to a different genome. For LP/PL genomes, the read's mate is required to be unmapped to that genome. Reads must be fully mapped with <1 mismatched bases and no clipping, insertions, or deletions, and duplicate or supplementary alignments are excluded. Bamintersect filters reads (minimum of 20 bp mapping outside) against satellite repeats as well as uninformative regions, defined as sequences of >120 bp with >85% similarity, for the following contexts: for LP integrations, genomic regions corresponding to LP components hmPIGA, human EIF1 poly(A), ERT2 (ESR1), pEF1α (EEF1A1) and the homology arms; for payload integrations, the LP/payload shared regions pEF1α and lox sites, and the deleted genomic region; for pCas9, the human U6 promoter and hmPIGA.
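The read-pair criteria above can be sketched with a simplified alignment record. This is a hypothetical illustration, not the bamintersect implementation: the `Alignment` dataclass stands in for a BAM record (the real tool reads BAM files), "fully mapped" is modeled as a CIGAR consisting of a single `M` block, mismatches stand in for the NM tag, and the default threshold follows the stated "<1 mismatched bases".

```python
import re
from dataclasses import dataclass
from typing import Optional

# A single aligned block, i.e. no soft/hard clipping and no indels.
# Simplification: "="/"X" CIGAR operations are not handled.
FULL_MATCH = re.compile(r"\d+M")


@dataclass
class Alignment:
    cigar: Optional[str]      # None means unmapped to this reference
    mismatches: int = 0       # e.g. the NM tag
    duplicate: bool = False
    supplementary: bool = False


def is_clean(aln, max_mismatches=0):
    """Fully mapped, below the mismatch limit, not duplicate/supplementary."""
    return (aln.cigar is not None
            and FULL_MATCH.fullmatch(aln.cigar) is not None
            and aln.mismatches <= max_mismatches
            and not aln.duplicate
            and not aln.supplementary)


def supports_junction(genome_read, construct_read, genome_read_on_construct):
    """One mate maps cleanly to the genome, the other cleanly to the
    construct (LP or payload), and the genome-mapped mate is unmapped to
    the construct."""
    return (is_clean(genome_read)
            and is_clean(construct_read)
            and genome_read_on_construct.cigar is None)


print(supports_junction(Alignment("150M"), Alignment("150M"), Alignment(None)))
print(supports_junction(Alignment("150M"), Alignment("100M50S"), Alignment(None)))
```

A caller would test both mate orientations of each pair; requiring the genome-mapped mate to be unmapped to the construct discards pairs that lie entirely within sequence shared by the construct and the genome.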
Informative reads with the same strand and mapping to within 500 bp of each other were clustered for reporting. Regions below 75 bp or with fewer than 1 read/10M reads sequenced were excluded.
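The clustering and reporting thresholds can be sketched as a greedy sweep over sorted reads. This is a minimal illustration under stated assumptions: reads are hypothetical (chrom, strand, start, end) tuples, "within 500 bp" is interpreted as the gap between a read's start and the current cluster's end, and the region length is taken as the cluster's end minus its start.

```python
def cluster_reads(reads, max_gap=500):
    """Greedily cluster (chrom, strand, start, end) reads that share chrom
    and strand and start within max_gap bp of the current cluster's end."""
    clusters = []
    for chrom, strand, start, end in sorted(reads):
        last = clusters[-1] if clusters else None
        if (last and last["chrom"] == chrom and last["strand"] == strand
                and start - last["end"] <= max_gap):
            last["end"] = max(last["end"], end)
            last["reads"] += 1
        else:
            clusters.append({"chrom": chrom, "strand": strand,
                             "start": start, "end": end, "reads": 1})
    return clusters


def filter_clusters(clusters, total_reads, min_len=75, min_per_10m=1.0):
    """Drop regions shorter than min_len bp or supported by fewer than
    min_per_10m reads per 10 million reads sequenced."""
    return [c for c in clusters
            if c["end"] - c["start"] >= min_len
            and c["reads"] * 10_000_000 / total_reads >= min_per_10m]


reads = [("chr3", "+", 1000, 1100), ("chr3", "+", 1400, 1500),
         ("chr3", "+", 3000, 3050), ("chr3", "-", 1000, 1100)]
print(filter_clusters(cluster_reads(reads), total_reads=10_000_000))
```

Sorting by (chrom, strand, start) places reads that can share a cluster next to each other, so a single pass suffices; the depth threshold is normalized to library size so that deeply and shallowly sequenced samples are filtered comparably.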
CCATGTAAGGTTG
AGGTGAA
GAAGCAACGCGCGCCGGT
A
GGTCTGGACCTGCACTTC
AAGTAGTTCAGG
AGGGTTTG
GCAAAGGCTCCCGTTAGG
CCATAGCAATGA
TGGGCCCA
TTATCCCAGCTCTGGG
AGG
The following reference listing is not an indication that any particular reference(s) is material to patentability.
This application claims priority to U.S. provisional application No. 63/091,508, filed Oct. 14, 2020, the entire disclosure of which is incorporated herein by reference.
This invention was made with government support under grant no. RM1-HG009491 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/054988 | 10/14/2021 | WO |
Number | Date | Country | |
---|---|---|---|
63091508 | Oct 2020 | US |