Described herein are methods for the concurrent assessment of large numbers of genome engineering proteins, including CRISPR nucleases and base editors.
The continued development of genome engineering technologies requires methods that can accurately and rapidly characterize important parameters of these enzymes. Whether through protein engineering to improve the fundamental properties of CRISPR proteins, or through bioinformatic searches to identify previously uncharacterized nucleases, the suite of poorly understood proteins continues to grow. The availability of standardized, accurate, and high-throughput characterization methods is therefore critical to understanding the properties of genome editing technologies.
The adaptation of CRISPR-Cas enzymes for genome engineering applications has had a transformational impact on biomedical research. The number of CRISPR-based technologies with different capabilities is rapidly expanding through the discovery of naturally occurring type II (Cas9) and type V (Cas12) orthologs and the engineering of enzymes with improved properties (Makarova et al., Nat. Rev. Microbiol., 18(2):67-83; Anzalone et al., Nat. Biotechnol. 38, 824-844 (2020)). One critical property of these DNA-targeting Cas enzymes is the necessity to recognize a protospacer-adjacent motif (PAM) in their target site (Jinek et al., Science 337, 816-821 (2012)). This requirement fulfills an important biological role, enabling the CRISPR immune system to differentiate self from invading DNA (Marraffini and Sontheimer, Nature 463, 568-571 (2010)). For genome editing applications, the PAM of a Cas protein dictates which genomic sites are accessible to the enzyme. A major bottleneck in the identification or engineering of CRISPR enzymes with unique PAM requirements is the need for scalable experimental methods to characterize PAM preferences in biologically relevant settings. Here, we provide a detailed experimental protocol and steps for analyzing data with HT-PAMDA, a scalable assay to investigate the PAM profiles of hundreds of Cas enzymes. Beyond understanding the targeting ranges of Cas enzymes, the HT-PAMDA workflow should be adaptable for scalable characterization of other important properties of CRISPR enzymes including their activities, specificities, guide RNA (gRNA) requirements, and others. For both naturally occurring and optimized enzymes, thorough characterization of the properties of these engineered tools is essential for understanding and benchmarking their performance for genome editing applications.
The present methods include providing a plurality of individual discrete samples comprising populations of cells, preferably mammalian cells, preferably human cells, wherein each population of cells overexpresses both (i) a single genome engineering protein or a variant thereof and (ii) a reporter protein, wherein (i) and (ii) are expressed in a known ratio, preferably 1:1, in the sample; lysing the cells to release the proteins; normalizing levels of the genome engineering proteins or variants thereof based on levels of the reporter protein; combining the genome engineering proteins or variants thereof with a guide RNA (or allowing the proteins or variants to combine with a guide RNA present in the sample) under conditions sufficient to form ribonucleoprotein complexes in each sample; contacting each sample with a plurality of analysis substrates, under conditions sufficient for the genome engineering protein or variant thereof to act on one or more of the substrates; determining levels of each of the analysis substrates in each sample at a plurality of times; and calculating the rate of depletion or enrichment of each of the analysis substrates from each sample.
In some embodiments, the genome engineering protein is a nuclease, base editor, or other protein that can alter DNA. In some embodiments, the genome engineering protein can alter the genome of a living cell or genomic DNA in vitro.
In some embodiments, (i) and (ii) are expressed in a known ratio, e.g., 1:1 ratio, from a single nucleic acid construct, preferably a construct comprising a viral 2A sequence in between sequences encoding (i) and (ii), or a direct fusion between sequences encoding (i) and (ii) by a peptide linker.
In some embodiments, the reporter proteins are fluorescent. In some embodiments, expression levels of the reporter proteins are determined by spectrophotometry, image analysis, or other methods to quantify the levels of fluorescence from the reporter protein.
In some embodiments, each different genome engineering protein or variant thereof is expressed in an identified discrete individual population of cells in a single well of a multi-well plate. In some embodiments, a normalized amount of each genome engineering protein is transferred to a second multiwell plate.
In some embodiments, the genome engineering protein is or comprises a CRISPR nuclease, is mixed with a guide RNA to form ribonucleoprotein complexes (or is allowed to form complexes with guide RNAs present in the sample), and is contacted with a population of analysis substrates, each comprising a spacer sequence and a PAM sequence, wherein the population comprises analysis substrates having a plurality of spacer sequences, or plurality of PAM sequences, or both.
In some embodiments, the genome engineering protein is or comprises a cytosine base editor, is mixed with a guide RNA to form ribonucleoprotein complexes, is contacted with a population of analysis substrates, each comprising a spacer sequence and a PAM sequence, wherein the population comprises analysis substrates having a plurality of spacer sequences, or plurality of PAM sequences, or both, and is contacted with an enzyme that converts C-to-U deamination events to double-strand breaks when they co-occur with SpCas9-HNH domain mediated DNA nicks.
In some embodiments, the genome engineering protein is or comprises an adenine base editor, is mixed with a guide RNA to form ribonucleoprotein complexes, is contacted with a population of analysis substrates, each comprising a spacer sequence and a PAM sequence, wherein the population comprises analysis substrates having a plurality of spacer sequences, or plurality of PAM sequences, or both, and is contacted with an enzyme that converts a combination of a target strand nick and a non-target strand deamination event to a double strand break, e.g., Endonuclease V.
In some embodiments, the guide RNA is expressed in the cells along with, or separately from, the Cas protein, or is added to the samples from an exogenous source (e.g., as synthetic or in vitro transcribed RNA).
In some embodiments, the analysis substrates include identifying sequences, preferably 8-10 nt barcodes.
In some embodiments, determining levels of each of the analysis substrates in each sample at a plurality of times comprises using sequencing, detectably labeled probes, arrays, or hybridization methods.
In some embodiments, the rate of depletion of each analysis substrate from the population of analysis substrates over time is determined by modeling the depletion as exponential decay and determining the rate constant of depletion for each analysis substrate. In some embodiments, the methods include identifying analysis substrates that are depleted at a faster rate as substrates for the genome engineering protein.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Methods and materials are described herein for use in the present invention; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control.
Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.
Here we describe a series of methods that enable the concurrent assessment of large numbers of genome engineering proteins, e.g., CRISPR nucleases and base editors, at a scale not previously performed, to reduce or eliminate the bottleneck of enzyme characterization in projects that seek to discover or engineer new Cas variants. The assay differentiates itself from prior methods at least because it can be executed in high-throughput format in a human cell lysate, with facile quantification and normalization of the expressed protein of interest (a step critical for accurate property assessment). These methods can be adapted to study different properties (e.g. the PAM preferences, mismatch tolerance, or general specificities) of many CRISPR proteins including nucleases, cytosine base editors (CBEs)1, or adenine base editors (ABEs)2.
The methods described herein include the use of cultured mammalian cells, preferably human cells, that have been engineered to overexpress both (i) a genome engineering protein (e.g., nuclease, base editor, or other protein that can alter DNA, e.g., can alter the genome of a living cell or genomic DNA in vitro) or a variant thereof and (ii) a reporter protein. In preferred embodiments, (i) and (ii) are expressed in a known, fixed ratio, preferably a 1:1 ratio, e.g., from a single nucleic acid construct, e.g., as a fusion protein (e.g., with an intervening linker sequence) or from a construct comprising a viral 2A sequence in between sequences encoding (i) and (ii). See, e.g., Lewis et al., J. Neuroscience Methods, 256:22-29 (2015). In some embodiments, the cells are also engineered to express a guide RNA.
In preferred embodiments, each different genome engineering protein or variant thereof is expressed in an identified discrete individual population of cells, optionally in a single well of a multi-well plate. The cells are then lysed and expression levels of the proteins determined, e.g., by spectrophotometry, image analysis, or other methods to quantify the levels of fluorescence or signal from the reporter protein. A normalized amount of each protein is then transferred to a second container, e.g., a second multiwell plate, mixed with a guide RNA or prime template to form ribonucleoprotein complexes, and contacted with a population of analysis substrates; in some embodiments, the gRNA can be co-expressed in the cells rather than added later. For example, gRNA expression plasmids can be co-transfected in molar excess of the nuclease expression plasmid such that the cell lysate will contain complexed RNPs. This step can be performed to avoid large numbers of in vitro transcription reactions to produce gRNAs. Then amounts of the analysis substrate in the sample are determined at one, two, three, or more time points and the rate of depletion of each analysis substrate from the population of analysis substrates over time is determined, e.g., via modeling the depletion as exponential decay; the rate constant of depletion for each analysis substrate (e.g., for each PAM sequence) is then used to calculate comprehensive preferences (e.g., PAM preferences) for each variant.
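The reporter-based normalization described above can be illustrated with a minimal sketch. It assumes that reporter fluorescence is read against a linear fluorescein standard curve and that each lysate is diluted to a common fluorescein-equivalent concentration. The standard-curve readings, the well reading, the 150 nM target, and the function names are hypothetical values chosen for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a standard curve."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fluorescein_equivalent(signal, slope, intercept):
    """Convert a fluorescence reading to a fluorescein-equivalent concentration (nM)."""
    return (signal - intercept) / slope


def dilution_volumes(lysate_nM, target_nM, final_uL):
    """Volumes of lysate and diluent that give target_nM in final_uL."""
    if lysate_nM < target_nM:
        raise ValueError("lysate is below the target concentration")
    lysate_uL = final_uL * target_nM / lysate_nM
    return lysate_uL, final_uL - lysate_uL


# Hypothetical standard curve: fluorescein (nM) versus plate-reader signal.
standard_nM = [0, 50, 100, 200, 400]
standard_signal = [100, 600, 1100, 2100, 4100]
slope, intercept = fit_line(standard_nM, standard_signal)

# Hypothetical lysate reading, normalized to 150 nM in a 20 uL final volume.
well_nM = fluorescein_equivalent(3100, slope, intercept)
lysate_uL, buffer_uL = dilution_volumes(well_nM, 150, 20)
print(f"lysate: {well_nM:.1f} nM; mix {lysate_uL:.2f} uL lysate + {buffer_uL:.2f} uL buffer")
```

Because (i) and (ii) are expressed in a fixed ratio, equalizing the reporter signal across wells is taken here as a proxy for equalizing the amount of genome engineering protein.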
Genome Engineering Proteins
In some embodiments, the methods include expressing a CRISPR nuclease or CRISPR-nuclease based genome editing reagent, e.g., Cas9 or a related protein, a base editor, or a prime editor, or a variant thereof. A number of such reagents, and methods for creating variants, are known in the art. In some embodiments, the protein is or comprises SaCas9, SpCas9, or another CRISPR-Cas protein, including other Cas9 orthologs (Esvelt et al., Nature Methods, 10(11):1116-21; Fonfara et al., Nucleic Acids Res., 42:2577-2590) with various levels of basal activity (e.g. SaCas9 (Ran et al., Nature, 520(7546):186-91; Kleinstiver et al., Nature, 523(7561)481-5; Kleinstiver et al., Nature Biotechnology, 33(12):1293-1298), St1Cas9 (Deveau et al., J. Bacteriol., 190:1390-1400; Horvath, et al., J. Bacteriol., 190:1401-1412; Kleinstiver et al., Nature, 523(7561)481-5), St3Cas9 (Gasiunas et al., Proc. Natl. Acad. Sci. USA, 109(39):E2579-86), NmeCas9 (Hou et al., Proc. Natl. Acad. Sci. USA, 110(39):15644-9), Nme2Cas9 (Edraki et al., Molecular Cell, 73(4):714-726.e4), CjeCas9 (Kim et al., Nature Communications, (8)14500), and other Cas9 orthologs; Cas12a orthologs (Zetsche et al., Cell, 163(3):759-71; Zetsche et al., doi.org/10.2302/kjm.2019-0009-OA), and other Cas3 (HidalgoCantabrana, PMID: 31922192, DOI: 10.1042/BST20190119), Cas12 (Koonin et al. Curr. Opp. Micro., 37:67-68); Yan et al., Science, 363(6422):88-91), Cas13 (Abudayyeh et al., Science, 353(6299):aaf5573; Shmakov et al., Molecular Cell, 60(3):385-97; Abudayyeh et al., Nature, 550(7675):280-284), Cas14 proteins (Harrington et al., Science, 362(6416):839-842), and those collectively reviewed in Makarova et al. (Nat. Rev. Microbiol., 18(2):67-83), as well as engineered variants thereof, which can be used alone or incorporated into a non-nuclease construct, e.g., a nickase (Mali et al., Nature Biotechnology, 31(9):833-8; Ran et al. 
Cell, 154(6):1380-9), FokI-dCas9 fusions (Tsai et al., Nature Biotechnology, 32(6):569-76; Guilinger et al., Nature Biotechnology, 32(6):577-582), a base editor (Komor et al. Nature, 533(7603):420-4; Gaudelli et al. Nature, 551(7681):464-471; Rees et al., Nat. Rev. Genet., 19(12):770-788), or a prime editor (Anzalone et al., Nature, 576(7785):149-157).
In some embodiments, the variant is at least 50, 60, 65, 70, 75, 80, 85, 90, 95, or 99% identical to a wild type or reference sequence, and/or comprises at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 mutations/substitutions, e.g., up to 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50% of the sequence, as compared to the wild type or reference sequence. The variants can be random mutations, or can be introduced using a rational design approach to alter one or more characteristics of the protein (e.g., on target effects, off target effects, PAM specificity, and so on). In some embodiments, the mutation is a conservative substitution, e.g., including substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine. In some embodiments, the mutation is a non-conservative substitution. One of skill in the art could identify and generate such variants.
To determine the percent identity of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). The length of a reference sequence aligned for comparison purposes is at least 80% of the length of the reference sequence, and in some embodiments is at least 90% or 100%. The nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein nucleic acid “identity” is equivalent to nucleic acid “homology”). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences. Percent identity between two polypeptides or nucleic acid sequences is determined in various ways that are within the skill in the art, for instance, using publicly available computer software such as Smith Waterman Alignment (Smith, T. F. and M. S. Waterman (1981) J Mol Biol 147:195-7); “BestFit” (Smith and Waterman, Advances in Applied Mathematics, 482-489 (1981)) as incorporated into GeneMatcher Plus™, Schwarz and Dayhoff (1979) Atlas of Protein Sequence and Structure, Dayhoff, M. O., Ed, pp 353-358; BLAST program (Basic Local Alignment Search Tool; Altschul, S. F., W. Gish, et al. (1990) J Mol Biol 215: 403-10), BLAST2, BLAST-P, BLAST-N, BLAST-X, WU-BLAST-2, ALIGN, ALIGN-2, CLUSTAL, or Megalign (DNASTAR) software.
In addition, those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the length of the sequences being compared. In general, for proteins or nucleic acids, the length of comparison can be any length, up to and including full length (e.g., 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100%). For purposes of the present compositions and methods, at least 80% of the full length of the sequence is aligned using the BLAST algorithm and the default parameters.
For purposes of the present invention, the comparison of sequences and determination of percent identity between two sequences can be accomplished using a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
Reporter Proteins
A number of reporter proteins are known in the art, and include green fluorescent protein (GFP), variant of green fluorescent protein (GFP10), enhanced GFP (eGFP), TurboGFP, GFPS65T, TagGFP2, mUKG, Emerald GFP, Superfolder GFP, GFPuv, destabilised EGFP (dEGFP), Azami Green, mWasabi, Clover, mClover3, mNeonGreen, NowGFP, Sapphire, TSapphire, mAmetrine, photoactivatable GFP (PA-GFP), Kaede, Kikume, mKikGR, tdEos, Dendra2, mEosFP2, Dronpa, blue fluorescent protein (BFP), eBFP2, azurite BFP, mTagBFP, mKalama1, mTagBFP2, shBFP, cyan fluorescent protein (CFP), eCFP, Cerulean CFP, SCFP3A, destabilised ECFP (dECFP), CyPet, mTurquoise, mTurquoise2, photoswitchable CFP2 (PS-CFP2), TagCFP, mTFP1, mMidoriishi-Cyan, aquamarine, mKeima, mBeRFP, LSS-mKate2, LSS-mKate1, LSS-mOrange, CyOFP1, Sandercyanin, red fluorescent protein (RFP), eRFP, mRaspberry, mRuby, mApple, mCardinal, mStable, mMaroon1, mGarnet2, tdTomato, mTangerine, mStrawberry, TagRFP, TagRFP657, TagRFP675, mKate2, HcRed, t-HcRed, HcRed-Tandem, mPlum, mNeptune, NirFP, Kindling, far red fluorescent protein, yellow fluorescent protein (YFP), eYFP, destabilised EYFP (dEYFP), TagYFP, Topaz, Venus, SYFP2, mCherry, PA-mCherry, Citrine, mCitrine, Ypet, LanRFP-ΔS83, mPapaya1, mCyRFP1, mHoneydew, mBanana, mOrange, Kusabira Orange, Kusabira Orange 2, mKusabira Orange, mOrange 2, mKOκ, mKO2, mGrape1, mGrape2, zsYellow, eqFP611, Sirius, shBFP-N158S/L173I, near infrared proteins, iFP1.4, iRFP713, iRFP670, iRFP682, iRFP702, iRFP720, iFP2.0, mIFP, TDsmURFP, miRFP670, Brilliant Violet (BV) 421, BV 605, BV 510, BV 711, BV786, PerCP, PerCP/Cy5.5, DsRed, DsRed2, mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP, or a Phycobiliprotein, or a biologically active variant or fragment of any one thereof.
Cells
The methods described herein include expression in cells, e.g., mammalian cells, preferably human cells, e.g., cultured cells. Exemplary cultured cell lines include 3T3; A375; A431; A549; Daudi; HEK293; HeLa; HepaRG; HepG2; Jurkat; MDA-MB-231; MDA-MB-436; MDA-MB-468; Saos-2; 1321N1; AtT-20; B16; Ba/F3; BHK; Caki; Calu; CHO; COS; CV-1; Detroit; DMS; EPH4; HEK293T; HL-60; HUVEC; K562; Kasumi; LLC-MK2; MCF; MDA-MB; MDCK; PC3 (PC-3); Phoenix; SCC; Sf21; Sf9; SNU; T47D; THP1; U937 (U-937); U2-OS; and Vero cells.
Methods for expressing proteins in cells are well known in the art. Typically, the cells are combined with an exogenous nucleic acid sequence encoding the proteins and treated in order to accomplish transfection. As used herein, the term “transfection” includes a variety of techniques for introducing an exogenous nucleic acid into a cell including calcium phosphate or calcium chloride precipitation, microinjection, DEAE-dextran-mediated transfection, lipofection, and electroporation.
For PAM specificity analysis, variants designed to have or suspected to have different PAM preferences are expressed in cells and normalized as described above. The analysis substrates comprise a library of oligonucleotides, each comprising a spacer sequence that corresponds to the spacer sequence of the guide RNA and one of a plurality of different PAM sequences. The rate of depletion of each analysis substrate from the population of analysis substrates due to the action of the nuclease over time is determined, e.g., via modeling the depletion as exponential decay; the rate constant of depletion for each analysis substrate (and thus for each PAM sequence) is then used to calculate comprehensive PAM preferences for each variant.
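As an illustration of how substrates in such a library might be tallied by PAM, the following sketch enumerates every PAM of a given length and counts reads by the bases that follow the spacer. It assumes the PAM lies immediately 3′ of the spacer on the read strand, as for SpCas9; enzymes with a 5′ PAM would need the opposite offset. The spacer sequence, the reads, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def all_pams(length):
    """Every possible PAM of the given length, e.g. 64 sequences for NNN."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def extract_pam(read, spacer, pam_length):
    """Return the bases immediately 3' of the spacer, or None if not found."""
    start = read.find(spacer)
    if start == -1:
        return None
    pam_start = start + len(spacer)
    pam = read[pam_start:pam_start + pam_length]
    if len(pam) != pam_length or any(base not in "ACGT" for base in pam):
        return None
    return pam


def count_pams(reads, spacer, pam_length):
    """Count reads per PAM, keeping zero entries for PAMs that were not observed."""
    counts = Counter({pam: 0 for pam in all_pams(pam_length)})
    for read in reads:
        pam = extract_pam(read, spacer, pam_length)
        if pam is not None:
            counts[pam] += 1
    return counts


# Hypothetical spacer and reads, for illustration only.
spacer = "GGAATCCCTTCTGCAGCACC"
reads = [
    "TT" + spacer + "TGGAAC",  # PAM TGG
    "AA" + spacer + "TGGC",    # PAM TGG
    "AA" + spacer + "CA",      # read too short to cover a 3 nt PAM
    "ACGT",                    # spacer not present
]
counts = count_pams(reads, spacer, 3)
print(counts["TGG"], len(counts))
```

Keeping explicit zero counts means that a PAM depleted to no reads at a late time point still appears in the table used for later rate calculations.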
While our initial implementation of HT-PAMDA was to profile the PAM preferences of SpCas9 variants, this approach should be extensible to other Cas enzymes and for the in vitro characterization of other properties. The enzyme-containing lysate and/or the PAM library (substrate library) can be substituted to develop new protocols to understand other parameters beyond targeting range. As examples, two alternate implementations to characterize the PAM requirements of C-to-T base editors (CBEs) and A-to-G base editors (ABEs) are highlighted in the CBE-HT-PAMDA and ABE-HT-PAMDA protocols, respectively. In these assays, the lysates containing normalized Cas nucleases are substituted for CBEs or ABEs to characterize the PAM requirements of these enzymes that nick and deaminate DNA compared to nucleases that generate double-strand breaks (Komor et al., Nature 533, 420-424 (2016); Gaudelli et al., Nature 551, 464-471 (2017)). Pending appropriate modifications (discussed below), the HT-PAMDA method is applicable to study other Cas9 orthologs and Cas proteins of different classes (such as Cas12a proteins, as we demonstrated with the lower-throughput PAMDA approach)(Kleinstiver et al., Nat. Biotechnol. 37, 276-282 (2019)). Alternatively, the protocol can also be modified to study different properties of Cas proteins. For example, the target specificities of Cas proteins can be studied using this method by substituting the randomized PAM substrate libraries for libraries encoding spacer sequences with mismatched bases. Broadly, HT-PAMDA and similar adaptations can form a suite of methods for the rapid characterization of the properties of genome editing tools.
To assess whether SpCas9 nucleases and BEs exhibit consistent PAM profiles, the HT-PAMDA assay described above was adapted to function in the absence of SpCas9-mediated DNA cleavage. Instead of double-strand DNA cleavage by SpCas9, this assay relies on SpCas9-based nicking and deamination of a cytosine by the tethered rAPOBEC1 domain. The combination of a target strand nick and a non-target strand deamination event is later converted to a double strand break using USER enzyme to remove the uracil base and cleave the non-target strand backbone, depleting CBE-targetable PAM-containing substrates from the library. Again, the rate of depletion of each analysis substrate from the population of analysis substrates due to the action of the nuclease over time is determined, e.g., via modeling the depletion as exponential decay; the rate constant of depletion for each analysis substrate (and thus for each PAM sequence) is then used to calculate comprehensive PAM preferences for each variant. See, e.g.,
Adenine base editors (ABEs) enable the generation of A-to-G mutations in human cells2. To characterize the PAM preferences of ABEs, an adenine base editor high-throughput PAM determination assay (ABE-HT-PAMDA) was developed. Rather than relying on cleavage of both DNA strands by SpCas9 to deplete sequences as in HT-PAMDA (Example 1), ABE-HT-PAMDA relies on SpCas9 nicking of the target strand and deamination of an adenine to inosine in the non-target strand by the TadA domains of the ABE2. During the in vitro ABE-HT-PAMDA protocol, the combination of a target strand nick and a non-target strand deamination event is later converted to a double strand break using Endonuclease V (NEB) to nick the non-target strand at the second phosphodiester bond 3′ of the inosine. Again, the rate of depletion of each analysis substrate from the population of analysis substrates due to the action of the nuclease over time is determined, e.g., via modeling the depletion as exponential decay; the rate constant of depletion for each analysis substrate (and thus for each PAM sequence) is then used to calculate comprehensive PAM preferences for each variant. See, e.g.,
Assays that enable the rapid profiling of the tolerance of Cas9 and Cas12a enzymes to single nucleotide substitutions in their target site were developed. The assays are technically similar to the PAMDA (Example 1) but instead of establishing PAM preferences enable thorough characterization of single mismatch tolerance. Thus, in place of using a library of substrates encoding random PAM sequences, we designed and constructed a spacer mismatch depletion assay (SPAMDA) library containing a perfectly matched substrate, those bearing all possible single substitutions across a 39 nt sequence, and 10 controls bearing multiple substitutions, insertions, or deletions (see
The high-throughput version of this assay utilizes the same SPAMDA library bearing all single mismatches across a 39 nt sequence, but instead of purified protein the HT assay utilizes human cell lysates containing expressed CRISPR proteins (as done for the HT-PAMDA assays, see Example 1). The variable expression of Cas9 or Cas12a proteins across different transfections is linked to the fluorescence of a 2A-EGFP reporter, permitting the normalization of nuclease concentrations based on a fluorescein standard curve (
In each method, the rate of depletion of each analysis substrate from the population of analysis substrates due to the action of the nuclease over time is determined, e.g., via modeling the depletion as exponential decay; the rate constant of depletion for each analysis substrate (and thus for each spacer sequence) is then used to calculate comprehensive single mismatch tolerances for each variant.
Although the above have been described with regard to CRISPR nucleases, CRISPR-nuclease based constructs, and CRISPR base editors, the methods can also be applied to high throughput analysis of sequence specificity of other classes of genome editing proteins (including other CRISPR derivatives, including nickases, prime editors, and others). For example, this strategy can be applied to other nucleic acid-binding proteins (zinc-fingers and zinc-finger nucleases (ZFs and ZFNs), transcription activator-like effectors and transcription activator-like effector nucleases (TALEs and TALENs), restriction enzymes, transposases, recombinases, integrases, etc.), using analysis substrate libraries suitable for the protein to be analyzed.
The following materials and methods were used in the Examples below.
High-throughput PAM Determination assay for nucleases
The high-throughput PAM determination assay (HT-PAMDA) was performed using linearized randomized PAM-containing plasmid substrates that were subjected to in vitro cleavage reactions with SpCas9 and variant proteins. First, SpCas9 ribonucleoproteins (RNPs) were complexed by mixing 4.375 μL of normalized whole-cell lysate (150 nM Fluorescein) with 8.75 pmol of in vitro transcribed sgRNA and incubating for 5 minutes at 37° C. Cleavage reactions were initiated by the addition of 43.75 fmol of randomized-PAM plasmid library and buffer to bring the total reaction volume to 17.5 μL with a final composition of 10 mM Hepes pH 7.5, 150 mM NaCl, and 5 mM MgCl2. Reactions were performed at 37° C. and aliquots were terminated at timepoints of 1, 8, and 32 minutes by removing 5 μL aliquots from the reaction and mixing with 5 μL of stop buffer (50 mM EDTA and 2 mg/ml Proteinase K (NEB)), incubating at room temperature for 10 minutes, and heat inactivating at 98° C. for 5 minutes. For all variants characterized, time courses were completed on both libraries harboring distinct spacer sequences for n=2; several variants were characterized with additional replicates to evaluate reproducibility of the assay, and for those variants the final data are an average of all replicates.
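The amounts stated above can be converted to final reaction concentrations with simple arithmetic. The sketch below does this for the 17.5 μL nuclease reaction; it treats the lysate concentration as fluorescein equivalents, which is an interpretation of the "150 nM Fluorescein" normalization rather than a measured protein concentration, and the helper names are hypothetical.

```python
def nanomolar(amount_fmol, volume_uL):
    """Concentration in nM; 1 fmol per microliter equals 1 nM."""
    return amount_fmol / volume_uL


def diluted_nM(stock_nM, stock_uL, final_uL):
    """Concentration after diluting stock_uL of stock into final_uL total."""
    return stock_nM * stock_uL / final_uL


REACTION_UL = 17.5

library_nM = nanomolar(43.75, REACTION_UL)         # 43.75 fmol plasmid library
sgrna_nM = nanomolar(8.75 * 1000, REACTION_UL)     # 8.75 pmol sgRNA, as fmol
lysate_nM = diluted_nM(150, 4.375, REACTION_UL)    # fluorescein equivalents

print(f"library {library_nM} nM, sgRNA {sgrna_nM} nM, lysate {lysate_nM} nM")
print(f"sgRNA:library molar ratio = {sgrna_nM / library_nM:.0f}")
```

With these figures the substrate library is at 2.5 nM, the sgRNA at 500 nM (a 200-fold molar excess over substrate), and the lysate at 37.5 nM fluorescein equivalents.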
Next, approximately 3 ng of digested PAM library for each SpCas9 variant and reaction timepoint was PCR amplified using Q5 polymerase (NEB) and barcoded using unique combinations of the i5 and i7 primers. PCR products were pooled for each time point, purified using paramagnetic beads, and prepared for sequencing using one of two library preparation methods. Pooled amplicons were prepared for sequencing using either (1) the KAPA HTP PCR-free Library Preparation Kit (KAPA BioSystems), or (2) a PCR-based method where pooled amplicons were treated with Exonuclease I, purified using paramagnetic beads, amplified using Q5 polymerase and primers with approximately 250 pg of pooled amplicons as template, and again purified using paramagnetic beads. Libraries constructed via either method were quantified using the Universal KAPA Illumina Library qPCR Quantification Kit (KAPA Biosystems) and sequenced on a NextSeq sequencer using either a 150-cycle (method 1) or 75-cycle (method 2) NextSeq 500/550 High Output v2.5 kit (Illumina). Identical cleavage reactions prepared and sequenced via either library preparation method did not exhibit substantial differences.
Sequencing reads were analyzed using a custom Python script to determine cleavage rates for all SpCas9 nucleases on each substrate with unique spacers and PAMs, similar to as previously described36. Briefly, reads were assigned to specific SpCas9 variants based on custom pooling barcodes, assigned timepoints based on the combination of i5 and i7 primer barcodes, assigned to a plasmid library based on the spacer sequence, and assigned to a 3 (NNN) or 4 (NNNN) nt PAM based on the identities of the DNA bases adjacent to the spacer sequence. Counts for all PAMs were computed for every SpCas9 variant, plasmid library, and timepoint, corrected for inter-sample differences in sequencing depth, converted to a fraction of the initial representation of that PAM in the original plasmid library (as determined by an untreated control), and then normalized to account for the increased fractional representation of uncut substrates over time due to depletion of cleaved substrates (by selecting the five PAMs with the highest average fractional representation across all time points to represent the profile of uncleavable substrates). The depletion of each PAM over time was then fit to an exponential decay model (y(t) = Ae^(−kt), where y(t) is the normalized PAM count, t is the time (seconds), k is the rate constant, and A is a constant), by nonlinear regression. Reported rates are the average across both spacer sequences and across technical replicates when performed. Nonlinear least squares curve fitting was utilized to model Cas9 nuclease and CBE activities, whereas linear least squares curve fitting was previously used for our Cas12a PAMDA assay6.
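The normalization and curve-fitting steps described above can be sketched as follows. This is not the custom script referred to in the text; it is a minimal standard-library illustration that assumes per-PAM counts are already available for each timepoint and for an untreated control. The fitting routine is one possible nonlinear least-squares approach: for each trial k the best A is computed in closed form, and k is located by a golden-section search over log(k), which assumes a single minimum in the search range. All function names, the search bounds, and the toy data are hypothetical.

```python
import math


def normalize_counts(counts_by_time, control_counts, n_reference=5):
    """Depth-correct counts, express them relative to the untreated control,
    then rescale so the least-depleted PAMs stay near 1 at every timepoint."""
    control_total = sum(control_counts.values())
    relative = {}
    for t, counts in counts_by_time.items():
        total = sum(counts.values())
        relative[t] = {
            pam: (counts[pam] / total) / (control_counts[pam] / control_total)
            for pam in control_counts
        }
    mean_over_time = {
        pam: sum(relative[t][pam] for t in relative) / len(relative)
        for pam in control_counts
    }
    reference = sorted(mean_over_time, key=mean_over_time.get, reverse=True)[:n_reference]
    normalized = {}
    for t, values in relative.items():
        scale = sum(values[pam] for pam in reference) / len(reference)
        normalized[t] = {pam: value / scale for pam, value in values.items()}
    return normalized


def fit_exponential_decay(times, values, iterations=100):
    """Least-squares fit of y(t) = A * exp(-k * t); returns (A, k)."""
    positive = [t for t in times if t > 0]
    if not positive:
        raise ValueError("need at least one time point greater than zero")

    def amplitude(k):
        # For a fixed k, the least-squares A has a closed form.
        numerator = sum(y * math.exp(-k * t) for t, y in zip(times, values))
        denominator = sum(math.exp(-2 * k * t) for t in times)
        return numerator / denominator

    def sse(k):
        a = amplitude(k)
        return sum((y - a * math.exp(-k * t)) ** 2 for t, y in zip(times, values))

    # Golden-section search over log(k); the bounds are illustrative assumptions.
    lo = math.log(1e-4 / max(positive))
    hi = math.log(50.0 / min(positive))
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iterations):
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if sse(math.exp(left)) < sse(math.exp(right)):
            hi = right
        else:
            lo = left
    k = math.exp((lo + hi) / 2)
    return amplitude(k), k


# Toy data: one cleaved PAM (TGG) and five uncleaved PAMs, with uneven read depth.
true_k = 0.002  # per second, hypothetical
times = [60, 480, 1920]  # 1, 8, and 32 minutes expressed in seconds
pams = ["TGG", "AAA", "CCC", "TTT", "ACA", "CTC"]
control = {pam: 1000.0 for pam in pams}
depth = {60: 1.0, 480: 2.0, 1920: 0.5}
counts_by_time = {
    t: {
        pam: depth[t] * 1000.0 * (math.exp(-true_k * t) if pam == "TGG" else 1.0)
        for pam in pams
    }
    for t in times
}
normalized = normalize_counts(counts_by_time, control)
fitted_a, fitted_k = fit_exponential_decay(times, [normalized[t]["TGG"] for t in times])
print(f"A = {fitted_a:.4f}, k = {fitted_k:.6f} per second")
```

In the toy data the five uncleaved PAMs serve as the reference set, so the rescaling removes both the depth differences and the apparent enrichment of uncut substrates, and the fit recovers the rate constant used to simulate the cleaved PAM.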
The cytosine base editor high-throughput PAM determination assay (CBE-HT-PAMDA) was performed using a linearized randomized PAM-containing plasmid library that was subjected to in vitro reactions with base editor variants. First, base editor proteins were complexed with sgRNAs by mixing 8.75 μL of normalized whole-cell lysate (300 nM Fluorescein) with 14 pmol of in vitro transcribed sgRNA and incubating for 5 minutes at 37° C. Cleavage reactions were initiated by the addition of 43.75 fmol of randomized-PAM plasmid library and buffer to bring the total reaction volume to 17.5 μL with a final composition of 10 mM Hepes pH 7.5, 150 mM NaCl, and 5 mM MgCl2. Reactions were performed at 37° C. and aliquots were terminated at timepoints of 4, 32, and 256 minutes by removing 5 μL aliquots from the reaction and mixing with 5 μL of stop buffer (50 mM EDTA and 2 mg/ml Proteinase K (NEB)), incubating at room temperature for 10 minutes, and heat inactivating at 98° C. for 5 minutes. Deamination and nicking events were converted to double strand breaks through the addition of 1 unit of USER enzyme (NEB) in 5 μL of 1× NEB buffer 4 to each reaction, bringing the total volume to 15 μL. After a one-hour incubation at 37° C., reactions were stopped by adding 5 μL of 4 mg/mL Proteinase K in 1 mM Tris pH 8.0, incubating at room temperature for 10 minutes, and heat inactivating at 98° C. for 5 minutes. Reactions were carried out on a single plasmid library for each base editor. Samples were subsequently processed as described above for HT-PAMDA for nucleases, with the exception that depletion rates are for a single spacer sequence for CBE-HT-PAMDA, rather than the average of two spacer sequences as in the nuclease analysis.
The high-throughput PAM determination assay for ABEs (ABE-HT-PAMDA) was performed using linearized randomized PAM-containing plasmid substrates that were subjected to in vitro reactions with base editor variants. First, base editor proteins were complexed with sgRNAs by mixing 8.75 μL of normalized whole-cell lysate (300 nM Fluorescein) with 14 pmol of in vitro transcribed sgRNA and incubating for 5 minutes at 37° C. Cleavage reactions were initiated by the addition of 43.75 fmol of randomized-PAM plasmid library and buffer to bring the total reaction volume to 17.5 μL with a final composition of 10 mM Hepes pH 7.5, 150 mM NaCl, and 5 mM MgCl2. Reactions were performed at 37° C. and terminated at timepoints of 4, 32, and 256 minutes by removing 5 μL aliquots from the reaction and mixing with 5 μL of stop buffer (50 mM EDTA and 2 mg/mL Proteinase K (NEB)), incubating at room temperature for 10 minutes, and heat inactivating at 98° C. for 5 minutes. Deamination and nicking events were converted to double strand breaks through the addition of 5 units of Endonuclease V (NEB) in 5 μL of 1× NEB buffer 4 to each reaction, bringing the total volume to 15 μL. After a one-hour incubation at 37° C., reactions were stopped by adding 5 μL of 4 mg/mL Proteinase K in 1 mM Tris pH 8.0, incubating at room temperature for 10 minutes, and heat inactivating at 98° C. for 5 minutes. Reactions were carried out on a single plasmid library for each base editor. Samples were subsequently processed as described above for HT-PAMDA for nucleases.
The SPAMDA plasmid library was prepared by pooling individually cloned substrate plasmids. Oligo pairs harboring the 39 base pair target sequence, a unique 8 base pair barcode, and restriction enzyme overhangs were annealed and ligated into the NheI and HindIII sites of BPK1520 (Addgene plasmid 65777). The final SPAMDA library was a 128-plasmid pool consisting of the “on-target” sequence (1 plasmid), all single nucleotide mismatches throughout the 39 base pair sequence (117 plasmids), and 10 negative control plasmids (6 plasmids with 6 substitutions relative to the “on-target”, 2 plasmids with multiple nucleotide insertions, and 2 plasmids with multiple nucleotide deletions). Plasmids were pooled in equimolar ratios.
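The library composition stated above can be checked with a short enumeration. The sketch below generates every single nucleotide mismatch of a 39 base pair sequence and tallies the pool size. The target sequence is a placeholder chosen for illustration and is not the sequence used in the SPAMDA library; the function name is likewise hypothetical.

```python
def single_mismatch_variants(target):
    """Return every sequence that differs from `target` at exactly one position."""
    variants = []
    for i, ref in enumerate(target):
        for alt in "ACGT":
            if alt != ref:
                variants.append(target[:i] + alt + target[i + 1:])
    return variants

# Placeholder 39 bp sequence (illustrative only, not the actual library target).
target = "ACGT" * 9 + "ACG"
variants = single_mismatch_variants(target)

n_on_target = 1
n_negative_controls = 6 + 2 + 2  # substitution, insertion, and deletion controls
library_size = n_on_target + len(variants) + n_negative_controls
```

Three alternative bases at each of 39 positions give the 117 mismatch plasmids, which together with the on-target plasmid and the 10 negative controls account for the 128-plasmid pool.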
In Vitro Transcription of sgRNAs or crRNAs for SPAMDA
SpCas9 sgRNAs were in vitro transcribed at 37° C. for 16 hours from roughly 1 μg of HindIII linearized sgRNA T7-transcription plasmid template (cloned into MSP3485) using the T7 RiboMAX Express Large Scale RNA Production Kit (Promega). The DNA template was degraded by the addition of 1 μL RQ1 DNase at 37° C. for 15 minutes. sgRNAs were purified with the MEGAclear Transcription Clean-Up Kit (ThermoFisher) and refolded by heating to 90° C. for 5 minutes and then cooling to room temperature for over 15 minutes.
Cas12a crRNAs were in vitro transcribed from roughly 1 μg of HindIII linearized crRNA transcription plasmid (cloned into MSP3491, Addgene plasmid 114067) using the T7 RiboMAX Express Large Scale RNA Production kit (Promega) at 37° C. for 16 h. The DNA template was degraded by the addition of 1 μL RQ1 DNase and digestion at 37° C. for 15 min. Transcribed crRNAs were subsequently purified with the miRNeasy Mini Kit (Qiagen) and refolded by heating to 90° C. for 5 minutes and then cooling to room temperature for over 15 minutes.
To perform the spacer mismatch depletion assay, ribonucleoproteins (RNPs) were first formed by complexing 1.8 pmol of purified SpCas9 protein with 3.6 pmol of in vitro transcribed sgRNA, or 7.2 pmol of purified AsCas12a protein with 14.4 pmol of in vitro transcribed crRNA, and incubating for 5 minutes at 37° C. Reactions were initiated through the addition of 225 fmol of PvuI-linearized SPAMDA plasmid library and buffer to a final composition of 10 mM Hepes pH 7.5, 150 mM NaCl, and 5 mM MgCl2 in 45 μL. Reactions were incubated at either 37° C. or 20° C. At each timepoint (30 seconds, 2 minutes, 8 minutes, and 32 minutes), 10 μL of reaction mix was transferred into 10 μL of reaction stop buffer (50 mM EDTA and 2 mg/mL Proteinase K (NEB)) and incubated at room temperature for 10 minutes. Terminated reactions were then purified using paramagnetic beads prepared as previously described6.
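For orientation, the amounts above can be converted to final concentrations in the 45 μL reaction. The following sketch performs that arithmetic; the helper name and the dictionary layout are illustrative, and the calculation assumes that all listed components are present in the final 45 μL volume.

```python
def nanomolar(amount_fmol, volume_ul):
    """Convert an amount in fmol and a volume in uL to nM (fmol/uL equals nM)."""
    return amount_fmol / volume_ul

REACTION_VOLUME_UL = 45.0

final_nm = {
    "SpCas9 protein": nanomolar(1800.0, REACTION_VOLUME_UL),     # 1.8 pmol
    "SpCas9 sgRNA": nanomolar(3600.0, REACTION_VOLUME_UL),       # 3.6 pmol
    "AsCas12a protein": nanomolar(7200.0, REACTION_VOLUME_UL),   # 7.2 pmol
    "AsCas12a crRNA": nanomolar(14400.0, REACTION_VOLUME_UL),    # 14.4 pmol
    "SPAMDA library": nanomolar(225.0, REACTION_VOLUME_UL),      # 225 fmol
}

# Four 10 uL aliquots are withdrawn, which must fit within the reaction volume.
volume_withdrawn_ul = 4 * 10.0
```

Under these assumptions the guide RNA is in two-fold molar excess over protein, and the RNP is in excess over the pooled substrate library.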
Next, approximately 3 ng of digested SPAMDA library for each reaction timepoint was PCR amplified using Q5 polymerase (NEB) and barcoded using unique combinations of barcoded PCR primers. PCR products were pooled for each time point, purified using paramagnetic beads, and prepared for sequencing using the KAPA HTP PCR-free Library Preparation Kit (KAPA BioSystems). Libraries were quantified using the Universal KAPA Illumina Library qPCR Quantification Kit (KAPA Biosystems) and sequenced on a MiSeq sequencer using a 300-cycle MiSeq Reagent Kit v2 (Illumina).
The high-throughput spacer mismatch depletion assay (HT-SPAMDA) was performed similarly to SPAMDA, except that unpurified protein in human cell lysate was used in place of purified SpCas9 or AsCas12a. To generate SpCas9 and AsCas12a proteins from human cell lysates, approximately 20-24 hours prior to transfection 1.5×10^5 HEK 293T cells were seeded in 24-well plates. Transfections containing 500 ng of human codon optimized nuclease expression plasmid (with a -P2A-EGFP signal) and 1.5 μL TransIT-X2 were mixed in a total volume of 50 μL of Opti-MEM, incubated at room temperature for 15 minutes, and added to the cells. The lysate was harvested after 48 hours by discarding the media and resuspending the cells in 100 μL of gentle lysis buffer (1× SIGMAFAST Protease Inhibitor Cocktail, EDTA-Free (Millipore Sigma), 20 mM Hepes pH 7.5, 100 mM KCl, 5 mM MgCl2, 5% glycerol, 1 mM DTT, and 0.1% Triton X-100). The amount of nuclease protein was approximated from the whole-cell lysate based on EGFP fluorescence. Lysates were normalized to 150 nM Fluorescein (Sigma) based on a Fluorescein standard curve. Fluorescence was measured in 384-well plates on a DTX 880 Multimode Plate Reader (Beckman Coulter) with λex=485 nm and λem=535 nm.
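The lysate normalization step can be sketched as a linear standard curve followed by a dilution calculation. The example below is a hedged illustration: the function names, the standard curve values, the lysate reading, and the 100 μL final volume are all hypothetical, and it assumes that fluorescence is linear in fluorescein concentration over the measured range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return slope, y_mean - slope * x_mean

def dilution_volumes(lysate_signal, slope, intercept, target_nm, final_volume_ul):
    """Volumes of lysate and diluent needed to reach the target fluorescein equivalent."""
    equivalent_nm = (lysate_signal - intercept) / slope
    if equivalent_nm < target_nm:
        raise ValueError("lysate is already below the normalization target")
    lysate_ul = final_volume_ul * target_nm / equivalent_nm
    return lysate_ul, final_volume_ul - lysate_ul

# Hypothetical fluorescein standards (nM) and plate reader signal (arbitrary units).
standard_nm = [0.0, 100.0, 200.0, 400.0]
standard_signal = [50.0, 1050.0, 2050.0, 4050.0]
slope, intercept = fit_line(standard_nm, standard_signal)

# Hypothetical lysate reading, normalized to the 150 nM target in 100 uL.
lysate_ul, diluent_ul = dilution_volumes(3050.0, slope, intercept, 150.0, 100.0)
```

In this made-up case the lysate reads as a 300 nM fluorescein equivalent, so equal volumes of lysate and diluent reach the 150 nM target.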
RNPs were then formed by mixing 22.5 pmol sgRNA or crRNA with 11.25 μL of normalized lysate containing either SpCas9 or AsCas12a, respectively. Reactions were initiated through the addition of 225 fmol of PvuI-linearized SPAMDA plasmid library and buffer to a final composition of 10 mM Hepes pH 7.5, 150 mM NaCl, and 5 mM MgCl2 in 45 μL. Reactions were incubated at 37° C. At each timepoint (30 seconds, 2 minutes, 8 minutes, and 32 minutes), 10 μL of reaction mix was transferred into 10 μL of reaction stop buffer (50 mM EDTA and 2 mg/mL Proteinase K (NEB)) and incubated at room temperature for 10 minutes. Terminated reactions were then purified using paramagnetic beads prepared as previously described6,21.
Next, approximately 3 ng of digested HT-SPAMDA library for each reaction timepoint was PCR amplified using Q5 polymerase (NEB) and barcoded using unique combinations of barcoded PCR primers. PCR products were pooled for each time point, purified using paramagnetic beads, and prepared for sequencing using the KAPA HTP PCR-free Library Preparation Kit (KAPA BioSystems). Libraries were quantified using the Universal KAPA Illumina Library qPCR Quantification Kit (KAPA Biosystems) and sequenced on a MiSeq sequencer using a 300-cycle MiSeq Reagent Kit v2 (Illumina).
Sequencing reads were analyzed using a custom Python script to determine cleavage rates for each nuclease on each substrate. Briefly, reads were assigned to specific nucleases based on custom pooling barcodes, assigned to timepoints based on the combination of i5 and i7 primer barcodes, and assigned to a substrate based on the 8 base pair barcode and the 39 base pair target sequence. Counts for all substrates were computed for every nuclease and timepoint, corrected for inter-sample differences in sequencing depth, converted to a fraction of the initial representation of that substrate in the original plasmid library (as determined by an untreated control), and then normalized to account for the increased fractional representation of uncut substrates over time due to depletion of cleaved substrates (by selecting the 10 negative control substrates to represent the profile of uncleavable substrates). The depletion of each substrate over time was then fit to an exponential decay model (y(t) = Ae^(−kt), where y(t) is the normalized substrate count, t is the time (seconds), k is the rate constant, and A is a constant) by linear regression.
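The normalization and fitting steps described above can be sketched as follows. This is not the custom script itself. The data layout, function names, and synthetic counts are hypothetical, and the sketch assumes that "linear regression" refers to an ordinary least squares fit of ln y(t) against t, which requires all normalized values to be greater than zero.

```python
import math

def normalize_counts(counts, control_counts, negative_controls):
    """counts: {time_s: {substrate: reads}}; control_counts: {substrate: reads}."""
    control_total = sum(control_counts.values())
    normalized = {}
    for t, sample in counts.items():
        total = sum(sample.values())
        # Depth-corrected fraction relative to the untreated control library.
        relative = {s: (sample[s] / total) / (control_counts[s] / control_total)
                    for s in sample}
        # Uncleavable substrates should stay constant; rescale by their mean.
        reference = sum(relative[s] for s in negative_controls) / len(negative_controls)
        normalized[t] = {s: v / reference for s, v in relative.items()}
    return normalized

def log_linear_rate(times, values):
    """Fit ln y = ln A - k * t by ordinary least squares; returns (A, k)."""
    logs = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    slope = (sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
             / sum((t - t_mean) ** 2 for t in times))
    return math.exp(l_mean - slope * t_mean), -slope

# Synthetic example: one cleavable substrate (k = 0.001 per second), two controls.
times = [30.0, 120.0, 480.0, 1920.0]
control = {"on_target": 1000.0, "neg_1": 1000.0, "neg_2": 1000.0}
counts = {t: {"on_target": 5000.0 * math.exp(-0.001 * t),
              "neg_1": 5000.0, "neg_2": 5000.0} for t in times}
normalized = normalize_counts(counts, control, ["neg_1", "neg_2"])
amplitude, rate = log_linear_rate(times, [normalized[t]["on_target"] for t in times])
```

Rescaling by the negative controls is what removes the apparent enrichment of uncut substrates as cleaved substrates drop out of the pool.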
The protospacer-adjacent motif (PAM) of CRISPR nucleases is a short DNA sequence that must be recognized by the enzyme to initiate target binding3. Thus, the availability of PAMs determines what sequences can be targeted by that protein. Accurate and scalable PAM characterization is therefore important for the development and assessment of genome editing technologies. Wild-type Cas9 from Streptococcus pyogenes (WT SpCas9) requires an NGG PAM4,5 (where ‘N’ is any nucleotide), limiting targeting to sites bearing this sequence.
To facilitate a large-scale rational engineering approach to develop SpCas9 variants capable of targeting new PAM sequences, we required a high-throughput PAM determination assay (HT-PAMDA) that could rapidly and comprehensively profile the PAM preferences of dozens or even hundreds of SpCas9 variants. A scalable assay to fulfill these criteria would: (1) preclude protein expression and purification as it is not feasible to purify dozens or hundreds of proteins at scale (as was previously described for modest numbers of Cas12a variants6; or others described for a small number of variants using un-normalized lysates7), (2) would optimally be performed in vitro with conditions approximating a human cell context, and (3) would not be performed in bacteria or bacterial lysates (as we had done previously for SpCas9 and SaCas9 variants8,9) due to intrinsic differences between activities in bacteria and human cells that might result from expression levels, post-translational modification, endogenous factors, etc.
To fulfill these criteria, we developed the HT-PAMDA that first relies on the expression of SpCas9 variants in human cells, a step that can be easily arrayed and thus performed in high-throughput (
Optimization and Validation of HT-PAMDA
In general, we found that the HT-PAMDA profiles for WT SpCas9 and SpCas9-VQR (a variant that we previously engineered to target sites with NGA PAMs9) were highly reproducible across two different spacer sequences (
While attempting to engineer an SpCas9 variant capable of more relaxed targeting, we utilized HT-PAMDA to sequentially determine the contributions of dozens of substitutions at six critical positions in the PAM-interacting domain of SpCas9 (D1135, S1136, G1218, E1219, R1335, and T1337) (
Next we sought to determine whether the HT-PAMDA results accurately recapitulated the PAM preferences of Cas9 enzymes in human cells. To do so, we performed a large number of gene editing experiments in human cells across target sites bearing NGNN PAMs with WT SpCas9, xCas9, SpCas9-NG, and SpG (
Base editor (BE) proteins are fusions of catalytically attenuated Cas9 variants to deaminase domains to mediate specific nucleotide changes in human cells1,2,11. The PAM requirements of BEs have generally been assumed to be consistent with the PAM requirements of CRISPR nucleases, yet it remains to be comprehensively determined whether they exhibit distinctive preferences. To assess whether SpCas9 nucleases and BEs exhibit consistent PAM profiles, we adapted the HT-PAMDA assay to function in the absence of SpCas9-mediated DNA cleavage. The PAM profiles generated by HT-PAMDA are dependent on the depletion of library members over time due to plasmid cleavage, yet base editors do not intentionally cleave DNA (rather, DNA binding events are followed by nicking and deamination).
Cytosine base editors (CBEs) enable the generation of C-to-T mutations in human cells1. To determine the PAM profiles of CBEs, we adapted HT-PAMDA to develop a cytosine base editor high-throughput PAM determination assay (CBE-HT-PAMDA;
Compared to HT PAMDA for nucleases (
Adenine base editors (ABEs) enable the generation of A-to-G mutations in human cells2. To characterize the PAM preferences of ABEs, we developed an adenine base editor high-throughput PAM determination assay (ABE-HT-PAMDA;
Compared to HT PAMDA for nucleases (
Beyond their PAM requirements, there are other important properties of CRISPR nucleases that must be understood. It has been thoroughly established that SpCas9 and other CRISPR nucleases exhibit off-target effects because the enzymes tolerate substitutions in their binding sites12-15, so it is imperative to determine their tolerance to bind to or cleave off-target sites. In previous work we engineered high-fidelity SpCas9 and AsCas12a variants that have improved genome-wide specificity profiles6,10,16. However, these and other enzymes still remain unable to discriminate against DNA targets that bear single mismatches compared to the intended on-target site. It is therefore important to have assays that enable understanding of these parameters which are critical for the safe use of enzymes, and also required for improving their specificities.
We therefore sought to develop assays that would enable the rapid profiling of the tolerance of Cas9 and Cas12a enzymes to single nucleotide substitutions in their target sites. To do so, we developed an assay that was technically similar to the PAMDA but instead of establishing PAM preferences, would enable thorough characterization of single mismatch tolerance. Thus, in place of using a library of substrates encoding random PAM sequences, we designed and constructed a spacer mismatch depletion assay (SPAMDA) library containing a perfectly matched substrate, those bearing all possible single substitutions across a 39 nt sequence, and 10 controls bearing multiple substitutions, insertions, or deletions (see
To optimize and validate the SPAMDA assay, we purified WT SpCas9, SpCas9-HF1 (bearing N497A/R661A/Q695A/Q926A substitutions)10, and eSpCas9(1.1) (bearing K848A/K1003A/R1060A substitutions)17. While both SpCas9-HF1 and eSpCas9(1.1) were previously shown to exhibit dramatically improved genome-wide specificities (against off-target sites with 2+mismatches) using GUIDE-seq12 or other methods, they were both still able to cleave off-target sites bearing single mismatches16. In our experiments against 3 different target sites encoded in the same SPAMDA library (
We then wondered whether we could use the same SPAMDA library to characterize the single nucleotide specificities of other CRISPR nucleases, including those from the Cas12a family19. We and others have previously shown that WT AsCas12a generally has high genome-wide specificity against target sites bearing 2+mismatches13,20, but can exhibit a more relaxed tolerance of substitutions in the PAM and across certain positions of the spacer6,13. In addition to WT AsCas12a, we also purified AsCas12a-HF1 (bearing an N282A substitution and previously shown to improve specificity), enAsCas12a (bearing E174R/S542R/K548R substitutions and previously shown to exhibit ˜7-fold relaxed recognition of new PAM sequences along with ˜2-3-fold improved on-target activity), and enAsCas12a-HF1 (bearing E174R/N282A/S542R/K548R substitutions, a high-fidelity version of enAsCas12a)6. SPAMDA characterization of these four AsCas12a variants across two target sites using the same SPAMDA library largely recapitulated (
Collectively, these results show that SPAMDA can rapidly recapitulate known properties of naturally occurring and engineered CRISPR-Cas9 and -Cas12a enzymes.
Having established that SPAMDA can accurately determine the single nucleotide preferences of several different CRISPR proteins, we then wondered whether we could optimize a high-throughput version of SPAMDA (HT-SPAMDA) to improve scalability (
To validate the HT-SPAMDA, we utilized WT AsCas12a protein normalized from human cell lysates for in vitro cleavage reactions of two sets of target sites encoded within the SPAMDA library (
Together, these results demonstrate that it is feasible to utilize normalized human cell lysates in the HT-SPAMDA assay to comprehensively determine the single mismatch profile of CRISPR nucleases. The HT-SPAMDA assay should be extensible to other CRISPR proteins, including different Cas9 and Cas12a orthologs, CBEs, ABEs, and others.
This example describes an exemplary detailed protocol for a high-throughput PAM determination assay (HT-PAMDA) method that enables scalable characterization of the PAM preferences of different Cas proteins. Here, we provide a step-by-step protocol for the method, discuss experimental design considerations, and highlight how the method can be used to profile naturally occurring CRISPR-Cas9 enzymes, engineered derivatives with improved properties, orthologs of different classes (e.g. Cas12a), and even different platforms (e.g. base editors). A distinguishing feature of HT-PAMDA is that the enzymes are expressed in a cell type or organism of interest (e.g. mammalian cells), permitting scalable characterization and comparison of hundreds of enzymes in a relevant setting unlike previously available assays. HT-PAMDA does not require specialized equipment or expertise and is cost-effective for multiplexed characterization of many enzymes. The protocol enables comprehensive PAM characterization of dozens or hundreds of Cas enzymes in parallel in less than two weeks.
Overview of the Workflow
HT-PAMDA consists of four major steps (
Randomized PAM library (substrate library) cloning (Steps 1-28)
The randomized PAM libraries, or substrate libraries, are the substrates to be used in the in vitro cleavage reactions. These libraries have two critical features: (i) a fixed spacer sequence, and (ii) a region of randomized nucleotides in place of the PAM (
gRNA preparation (Steps 56-65)
In HT-PAMDA, the gRNA is targeted to the spacer sequence adjacent to the randomized region of the library. There are two general approaches to preparing the gRNA: separate production of a purified gRNA (as done in the HT-PAMDA protocol) or co-transfection of the gRNA and nuclease expression plasmids into cells, combining the nuclease and gRNA production steps. The choice between these options should depend on the number of unique gRNAs to be used in the assay. If a small number of gRNAs will be used to characterize many Cas enzymes that share the same gRNA scaffold (as is the case when characterizing engineered variants of one Cas ortholog), it may be more economical to prepare the gRNA in bulk by in vitro transcription or to purchase a chemically synthesized gRNA for those that are commercially available. Alternatively, if each nuclease requires a different gRNA (for example, when characterizing multiple different Cas orthologs), it may be advantageous to co-transfect nuclease and gRNA expression plasmids into human cells when generating the lysates to avoid a large number of in vitro transcription reactions. If generating the gRNA from a lysate, the gRNA expression plasmid should be transfected in excess so that nuclease molecules are saturated with gRNA.
Production of Nuclease-Containing Lysate (Steps 66-78)
Sourcing the Cas enzyme for HT-PAMDA from unpurified, concentration-normalized human cell lysates facilitates the scalability and accuracy of the method. To generate Cas enzymes from human cell (e.g. HEK 293T) lysates, all nuclease coding sequences should be cloned into an appropriate human expression vector that also includes a transcriptionally coupled fusion to a reporter gene to enable lysate normalization (e.g. to a 2A peptide and a fluorescent protein;
In Vitro Cleavage Reactions (Steps 79-87)
Time course in vitro cleavage experiments with control samples can be performed to test the functionality of both the lysate and gRNA before proceeding to a large-scale characterization. This ensures performance of reagents and is recommended to optimize conditions for new systems. In addition to the intended lysate/gRNA/PAM library combination, control samples should include (i) un-transfected lysate, (ii) nuclease-containing lysate without gRNA, and (iii) nuclease-containing lysate with non-targeting gRNA. We recommend using SpCas9 and AsCas12a as positive control nucleases for 3′ and 5′ PAM libraries, respectively. The results of these quality control experiments may be determined by NGS by following the HT-PAMDA protocol. Alternatively, for a faster quality control readout, DNA substrates resembling the PAM library but instead harboring fixed canonical and non-canonical PAMs may be used (to establish an appropriate dynamic range of in vitro cleavage rates of various substrates for the assay). Small-scale pilot experiments allow optimization of PAM library concentration, lysate concentration, and timepoint selection, where the in vitro cleavage reactions can be visualized and quantified by agarose gel or capillary electrophoresis.
It is desirable to have a control nuclease for which the performance of the nuclease in mammalian genome editing applications is known. Assay conditions should reflect the performance of the control nuclease in relevant genome editing settings. For example, with SpCas9 as a control in vitro cleavage reaction, canonical NGG PAMs should be depleted in early timepoints, and non-canonical NAG and NGA PAMs should be depleted at later timepoints to recapitulate the well-documented relative activities in human cells5,7,17,18,25.
NGS library preparation and sequencing (Steps 29-48, 88-116)
The library preparation for HT-PAMDA is designed to maximize throughput by minimizing pipetting and leveraging multiple barcoding steps (
The required sequencing depth per sample is dependent on the PAM representation of the substrate library, the number of nucleotides required to ascertain the complete PAM, the number of timepoints, and the number of substrate libraries. These factors considered, we recommend sequencing at a depth of approximately 750,000 reads per sample to resolve up to 5 nt of PAM preference, where a sample is comprised of one nuclease across three timepoints on two randomized PAM libraries harboring distinct spacer sequences (an average of 125,000 reads per nuclease/substrate library/timepoint). Accounting for a PhiX spike-in to increase nucleotide diversity and typical mapping rates in the analysis pipeline, there are several sequencing platforms and reagent kits that enable flexible assay throughput, including MiSeq and NextSeq.
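The depth recommendation above reduces to simple arithmetic, sketched below. The per-sample read count, timepoint count, and library count come from the text; the run output, PhiX fraction, and mapping rate are hypothetical placeholders that should be replaced with values for the platform and kit in use.

```python
def reads_per_condition(reads_per_sample, n_timepoints, n_libraries):
    """Average reads for each nuclease/substrate library/timepoint combination."""
    return reads_per_sample / (n_timepoints * n_libraries)

def samples_per_run(run_reads, phix_fraction, mapping_rate, reads_per_sample):
    """Number of samples that fit in one run after PhiX and unmapped reads are removed."""
    usable_reads = run_reads * (1.0 - phix_fraction) * mapping_rate
    return int(usable_reads // reads_per_sample)

per_condition = reads_per_condition(750_000, n_timepoints=3, n_libraries=2)

# Hypothetical run: 400 million reads, 20% PhiX spike-in, 80% of reads mapped.
capacity = samples_per_run(400_000_000, phix_fraction=0.2, mapping_rate=0.8,
                           reads_per_sample=750_000)
```

The first call reproduces the stated average of 125,000 reads per nuclease, substrate library, and timepoint; the second shows how PhiX and mapping losses reduce the number of nucleases that fit in a run.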
Visualization of PAM Preference (Step 116)
Representations of PAM preference ideally provide a comprehensive description of both PAM preference and activity. As examples, wild-type (WT) SpCas9, and the SpCas9 variants SpG (harboring the mutations D1135L/S1136W/G1218K/E1219Q/R1335Q/T1337R) and SpRY (harboring the mutations A61R/L1111R/D1135L/S1136W/G1218K/E1219Q/N1317R/A1322R/R1333P/R1335Q/T1337R), recognize NGG, NGN, and NRN>NYN PAMs, respectively (
Beyond the choice of PAM visualization format, it's also essential to represent all bases of the PAM that influence PAM preference. Failing to do so can misleadingly represent a group of PAMs as targetable, when the group is actually comprised of both targetable and non-targetable sequences. Even thoroughly characterized nucleases have PAM preferences beyond their well-known canonical requirements so it is good practice to visualize more positions than are anticipated to influence activity. For example, while SpCas9 is known to have 2 nt of specificity for its canonical NGG PAM, the capacity to target sites with shifted NNGG PAMs is apparent when also visualizing the 4th nucleotide of the PAM (
Additional Design Considerations
Endpoint Versus Kinetics Measurements
Most experimental methods for characterizing PAM specificity are amenable to either endpoint or multiple timepoint measurements that enable calculation of kinetic parameters. While endpoint measurements are experimentally more straightforward and require less total sequencing depth, they can provide dramatically different characterizations of PAM preference depending on the selected timepoint. The use of multiple timepoints enables the determination of cleavage kinetics for each PAM, a more intrinsic metric of activity that is more informative compared to the use of a single endpoint measurement.
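The dependence of endpoint measurements on the chosen timepoint can be illustrated with a small calculation. The sketch below assumes simple first-order depletion for two PAMs; the rate constants and timepoints are hypothetical and chosen only to make the effect visible.

```python
import math

def fraction_remaining(k, t):
    """Fraction of a substrate left uncleaved after time t under first-order decay."""
    return math.exp(-k * t)

def endpoint_ratio(k_fast, k_slow, t):
    """Apparent fold-difference in fraction cleaved between two PAMs at one timepoint."""
    cleaved_fast = 1.0 - fraction_remaining(k_fast, t)
    cleaved_slow = 1.0 - fraction_remaining(k_slow, t)
    return cleaved_fast / cleaved_slow

# Hypothetical rate constants (per second) that differ by exactly 10-fold.
k_fast, k_slow = 0.01, 0.001
early = endpoint_ratio(k_fast, k_slow, 60.0)     # 1 minute endpoint
late = endpoint_ratio(k_fast, k_slow, 1920.0)    # 32 minute endpoint
true_ratio = k_fast / k_slow
```

The apparent difference between the two PAMs shrinks toward 1 at the later endpoint because both substrates approach complete cleavage, whereas the ratio of rate constants does not depend on when the reaction is sampled.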
Alterations for base editor formats (Step 87)
While PAM depletion assays typically require DNA double-strand breaks (DSBs) to deplete targetable PAMs from the library, these assays are also adaptable for the measurement of other DNA modifications such as those made by base editors. For example, in CBE-HT-PAMDA the CBE generates target strand nicks and non-target strand C-to-U deamination events that can be converted to DSBs via treatment with USER enzyme to excise uracil nucleotides. Similarly, in ABE-HT-PAMDA, ABEs generate target strand nicks and non-target strand A-to-I deamination events that can be converted to DSBs via treatment with Endonuclease V to cleave the inosine-containing non-target strand28. These assays require additional considerations, including library design to position target cytosines or adenines within the edit window of the target site, and alterations to in vitro reaction conditions to accommodate different reaction kinetics.
Assay Readout Formats by Sequencing
Most PAM determination assays can be read out by either NGS or Sanger sequencing. Sanger sequencing of PAM libraries provides a coarse description of PAM preference by averaging composition at each position of the PAM at a given endpoint. This can be rapid and affordable for a small number of samples; however, this approach occludes positional dependencies in the PAM and thus can provide an inaccurate characterization of PAM preference. NGS-based readouts provide a more complete characterization and enable sample multiplexing via barcoding that increase sample throughput while decreasing per-sample cost.
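A toy example shows how per-position averaging can hide positional dependencies. In the sketch below, two different sets of 2 nt PAMs (hypothetical, chosen for illustration) give identical per-position base compositions, so a bulk readout that reports only composition at each position could not tell them apart.

```python
from collections import Counter

def positional_composition(pams):
    """Per-position base frequencies, as a bulk per-position readout would report."""
    composition = []
    for i in range(len(pams[0])):
        counts = Counter(pam[i] for pam in pams)
        composition.append({base: counts[base] / len(pams) for base in "ACGT"})
    return composition

# Two hypothetical sets of surviving PAMs with different pairwise structure.
set_a = ["AG", "GA"]   # A and G never co-occur with themselves
set_b = ["AA", "GG"]   # A and G always co-occur with themselves

same_composition = positional_composition(set_a) == positional_composition(set_b)
```

Both sets are 50% A and 50% G at each position, yet they share no PAM in common, which is the kind of dependency that a per-PAM NGS readout retains.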
Biological Materials
Reagents
Solutions
Dissolve 18 g of glucose in 100 mL of water. Filter or autoclave to sterilize and store aliquots at room temperature (22° C.) or −20° C. indefinitely.
Dissolve 1 g of carbenicillin disodium salt in 10 mL of water. Mix to dissolve, aliquot, and store at −20° C. for at least one year.
To make 10× STE buffer, combine 1 mL of 1 M Tris-HCl pH 8.0, 1 mL of 5 M NaCl, 200 μL of 0.5 M EDTA pH 8.0, and nuclease-free water to 10 mL (1× STE: 10 mM Tris-HCl pH 8.0, 50 mM NaCl, and 1 mM EDTA). Filter or autoclave to sterilize and store aliquots at room temperature indefinitely.
Combine 5 mL of 1 M Tris-HCl (pH 8.0), 1 mL of 0.5 M EDTA (pH 8.0), and nuclease-free water to 500 mL. To prepare 0.1× TE, dilute 1:10 using nuclease-free water. Filter or autoclave to sterilize and store aliquots at room temperature indefinitely.
Combine 135 g of PEG-8000, 150 mL of 5 M NaCl, 7.5 mL of 1 M Tris-HCl pH 8.0, 1.5 mL of 0.5 M EDTA, 375 μL of Tween20, and sterile-filtered deionized water to a final volume of 750 mL. Add a magnetic stir bar and stir on a magnetic stir plate. The solution may be heated to approximately 50° C. to facilitate dissolving the PEG. When dissolved, the solution should be completely transparent. Sterile filter the buffer and store at room temperature indefinitely. The buffer is highly viscous and will pass slowly through the filter.
To make 10× cleavage buffer, combine 10 mL of 1 M Hepes pH 7.5, 30 mL of 5 M NaCl, 5 mL of 1 M MgCl2, and deionized water to a final volume of 100 mL (1× cleavage buffer: 10 mM Hepes pH 7.5, 150 mM NaCl, and 5 mM MgCl2). Filter or autoclave to sterilize and store aliquots at room temperature indefinitely.
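The recipe above can be checked against the stated 1× composition with a stock dilution calculation (C1·V1 = C2·V2). The sketch below uses the volumes and stock concentrations from the recipe; the helper name and dictionary layout are illustrative.

```python
def final_concentration(stock_mm, stock_volume_ml, final_volume_ml):
    """Concentration (mM) after diluting a stock into the final volume."""
    return stock_mm * stock_volume_ml / final_volume_ml

# (stock concentration in mM, volume added in mL) for 100 mL of 10x cleavage buffer.
recipe_10x = {
    "Hepes": (1000.0, 10.0),
    "NaCl": (5000.0, 30.0),
    "MgCl2": (1000.0, 5.0),
}

ten_x = {name: final_concentration(stock, volume, 100.0)
         for name, (stock, volume) in recipe_10x.items()}
one_x = {name: conc / 10.0 for name, conc in ten_x.items()}
```

The same helper can be applied to the other buffer recipes in this section by substituting their stock concentrations and volumes.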
Prior to use in in vitro cleavage reactions, a 1 mL aliquot of 10× cleavage buffer should be supplemented with 10 μL of 1 M DTT (to make 10× cleavage buffer+DTT).
To make 1× lysis buffer, combine 2 mL of 1 M Hepes pH 7.5, 10 mL of 1 M KCl, 500 μL of 1 M MgCl2, 5 mL of glycerol, one SIGMAFAST Protease Inhibitor Cocktail tablet (EDTA-Free), 100 μL of 1 M DTT, 100 μL of Triton X-100, and sterile-filtered deionized water to a final volume of 100 mL. Mix until the protease inhibitor tablet is dissolved (1× lysis buffer: 20 mM Hepes pH 7.5, 100 mM KCl, 5 mM MgCl2, 5% (v/v) glycerol, 1 mM DTT, 0.1% (v/v) Triton X-100, and protease inhibitor). The lysis buffer without DTT and the protease inhibitor can be filtered or autoclaved to sterilize, and aliquots can be stored at room temperature indefinitely. Fully reconstituted lysis buffer should be prepared fresh.
For stopping in vitro cleavage reactions, prepare a solution of 1× stop buffer by combining 0.5 μL Proteinase K (20 mg/ml), 0.5 μL 500 mM EDTA (pH 8.0), and 4 μL water for each reaction to be stopped (for final concentrations of 2 mg/mL Proteinase K and 50 mM EDTA).
Prepare a stock solution of 2.5 mM fluorescein dye. First, dissolve 1 mg of fluorescein free acid in 1 mL of 1 M NaOH. Next, dilute the 1 mg/mL dye solution to 2.5 mM in 1× cleavage buffer. Store 1 mL aliquots at −20° C. for at least one year.
Dilute 10 μL of 2 N NaOH in 90 μL of nuclease-free water.
Combine 100 μL of 1 M Tris-HCl pH 8.0, 10 μL of Tween 20, and nuclease-free water to 10 mL. Filter or autoclave to sterilize and store aliquots at room temperature indefinitely.
Combine 200 μL of 1 M Tris-HCl pH 8.0 and 800 μL of nuclease-free water. Filter or autoclave to sterilize and store aliquots at room temperature indefinitely.
In a biological safety cabinet, combine Dulbecco's Modified Eagle Medium (DMEM), Fetal Bovine Serum (FBS; final 10% v/v), and Penicillin-Streptomycin (100 U/mL). Sterile filter media with a vacuum flask. Media should be stored at 4° C. and warmed to 37° C. before use. Fresh media should be prepared every few months.
Reconstitute 28 g of super optimal broth (SOB) powder with distilled water to 1 L. Dissolve powder by swirling. Autoclave at 121° C. for 30 minutes to sterilize. Let the medium cool to room temperature; once cooled, add 20 mL of sterile-filtered 1 M glucose. Prepared SOC can be stored at room temperature indefinitely if kept sterile.
Reconstitute 25 g of lysogeny broth (LB) powder with deionized water to 1 L. Dissolve powder by swirling. Autoclave at 121° C. for 30 minutes to sterilize. Let the medium cool to room temperature before adding antibiotic. For LB with Carbenicillin: Add 1 mL of Carbenicillin at 100 mg/mL to 1 L of LB broth. LB with Carbenicillin can be stored at 4° C. for 2 weeks. For LB with Kanamycin: Add 1 mL of Kanamycin at 50 mg/mL to 1 L of LB broth. LB with Kanamycin can be stored at 4° C. for 2 weeks.
Reconstitute 40 g of LB agar powder with deionized water to 1 L. Dissolve powder by swirling. Add a magnetic stir bar. Autoclave at 121° C. for 30 minutes to sterilize. After autoclaving but while the solution is still hot, stir slowly at room temperature using the magnetic stir bar and a magnetic stir plate. Let the medium cool to approximately 50° C. while stirring before adding antibiotic. For LB with carbenicillin: Add 1 mL of Carbenicillin at 100 mg/mL to 1 L of LB agar and stir for several minutes. For LB with kanamycin: Add 1 mL of Kanamycin at 50 mg/mL to 1 L of LB agar and stir for several minutes. Before the media cools and solidifies, pour approximately 20 mL of LB agar with antibiotic into 100-mm Petri dishes. Cover Petri dishes once poured and store at room temperature until the plates have cooled to room temperature. Store LB agar plates in plastic bags at 4° C. for up to a month.
SPRI Bead Preparation
Preparation of Barcoded PCR Primer Plate for Library Preparation
For each set, prepare an arrayed 96-well plate containing 5 μM each of forward and reverse primers as follows: Add 90 μL of 0.1× TE buffer to each well of a 96-well PCR plate. In a separate 8-strip tube, aliquot 70 μL of each 100 μM P5 primer in order P5-1 through P5-8. Using a multichannel pipette, aliquot 5 μL of the primers into each column of the 96-well PCR plate such that row A contains P5-1, row B contains P5-2, etc. In a separate 12-strip tube, aliquot 50 μL of each 100 μM P7 primer in order P7-1 through P7-12. Using a multichannel pipette, aliquot 5 μL of the primers into each row of the 96-well PCR plate such that column 1 contains P7-1, column 2 contains P7-2, etc. Seal tightly with an aluminum adhesive plate seal, mix by gently vortexing, spin down, and store at −20° C.
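The resulting plate layout and volumes can be verified with a short script. The sketch below builds the well-to-primer map described above and checks the primer volumes and final concentration; the variable names are illustrative.

```python
ROWS = "ABCDEFGH"          # rows A-H receive P5-1 through P5-8
COLUMNS = range(1, 13)     # columns 1-12 receive P7-1 through P7-12

# Map each well to its (P5, P7) primer pair.
plate = {f"{row}{col}": (f"P5-{r + 1}", f"P7-{col}")
         for r, row in enumerate(ROWS) for col in COLUMNS}

# Each P5 primer is dispensed into 12 wells and each P7 primer into 8 wells.
p5_volume_needed_ul = 12 * 5.0
p7_volume_needed_ul = 8 * 5.0

# 5 uL of 100 uM primer in 90 uL TE plus 5 uL of each primer (100 uL total).
final_primer_um = 100.0 * 5.0 / (90.0 + 5.0 + 5.0)
```

Every well receives a unique P5/P7 combination, and the 70 μL and 50 μL strip-tube aliquots leave a 10 μL excess of each primer over the volume dispensed.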
PAM Library, gRNA, and Lysate Preparation
Cloning the Randomized PAM Substrate Library
The following library construction steps should be performed for each PAM library. Multiple libraries can be constructed in parallel. The steps are described specifically for the construction of a library harboring a randomized 3′ PAM encoded by the primer oBK1948 (Table 1). Until analysis of the PAM representation within the library (Step 55), the steps are otherwise identical for constructing other libraries bearing different spacers or randomized PAMs on the 5′ end of the spacer (e.g., those encoded by oligos oBK1949, oBK5962, oBK5964, or user-defined oligo designs following the same cloning strategy; Table 1). The following steps include cloning of the randomized PAM libraries; however, four ready-to-use libraries are available on Addgene (two spacer sequences each for 3′ and 5′ randomized PAM libraries). To skip cloning, proceed directly to NGS validation of the library (Step 29).
The steps to produce the SpCas9 gRNA targeted to spacer 1 by in vitro transcription are described below. This procedure should be carried out for each gRNA to be used in HT-PAMDA, and multiple gRNAs can be produced in parallel. Custom gRNAs can be cloned into pT7-gRNA entry vectors for SpCas9 and AsCas12a by digesting the vectors with the appropriate type IIS restriction enzyme and ligating in annealed complementary oligos encoding the desired spacer sequence with the appropriate restriction site overhangs (Table 1). Entry vectors for other Cas ortholog gRNAs can be prepared with standard molecular cloning techniques. Ready-to-use T7 transcription plasmids are available on Addgene for two spacer sequences each for SpCas9 gRNAs and AsCas12a crRNAs corresponding to the substrate libraries. To avoid cloning steps, gRNAs may also be produced by in vitro transcription from oligo templates composed of a T7 promoter and the gRNA. Oligo templates can be used to produce SpCas9 sgRNAs, separate SpCas9 tracrRNA and crRNAs, AsCas12a crRNAs, and other gRNA designs. When available from commercial vendors, chemically synthesized gRNAs may also be used.
TGCCGGNNNNNNNNCTNNNGCGCAGGTCA
TCACCTNNNNNNNNCTNNNGCGCAGGTCA
AATCCCTTCTGCAGCACCTGGGCGCAGGT
GATGGTCCATGTCTGTTACTCGCGCAGGT
Cell culture and transfection should be performed for each nuclease (unless cotransfecting gRNAs, in which case, transfections must be carried out for each nuclease-gRNA pair). Transfections can be executed in parallel.
Add 3 mL of pre-warmed trypsin and incubate at 37° C. for approximately 5 minutes. Add 22 mL of pre-warmed media to quench trypsin and suspend the cells by pipetting. Count the cells and seed approximately 5×10⁶ or 2.5×10⁶ cells for 2 or 3 days of growth, respectively, in a total volume of 25 mL in a 150-mm culture dish. To seed HEK 293T cells in 24-well plates for next-day transfection, seed 1.5×10⁵ cells per well in 500 μL of HEK 293T culture medium per well. HEK 293T cells can easily detach from the plate. Pipette PBS onto the side of the culture dish rather than directly onto the cells when passaging.
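The seeding targets above translate into a volume of cell suspension once the cells have been counted. The following is a minimal sketch of that arithmetic; the function name and the example cell density of 1×10⁶ cells/mL are hypothetical and stand in for the user's own count.

```python
def seeding_volumes(target_cells, counted_cells_per_ml, total_volume_ml):
    """Return (suspension_ml, fresh_medium_ml) needed to seed target_cells.

    counted_cells_per_ml is the density measured after trypsinization;
    total_volume_ml is the final culture volume in the dish or well.
    """
    suspension_ml = target_cells / counted_cells_per_ml
    if suspension_ml > total_volume_ml:
        raise ValueError("Cell suspension is too dilute for the requested seeding density")
    return suspension_ml, total_volume_ml - suspension_ml


# Hypothetical count of 1e6 cells/mL; seeding targets are taken from the text above.
dish_2_day = seeding_volumes(5e6, 1e6, 25)    # 150-mm dish, 2 days of growth
dish_3_day = seeding_volumes(2.5e6, 1e6, 25)  # 150-mm dish, 3 days of growth
well_24 = seeding_volumes(1.5e5, 1e6, 0.5)    # one well of a 24-well plate

print(dish_2_day, dish_3_day, well_24)
```

For the 24-well format it is usually more practical to prepare one master mix at the target density and dispense 500 μL per well, rather than pipetting the small suspension volume into each well.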
The transfection mix should be added to cells within 30 minutes following the mixing of TransIT-X2 with OptiMEM and DNA for optimal transfection efficiency.
This procedure should be carried out for each linearized library harboring randomized PAMs from Step 27 (henceforth referred to as “substrate libraries”). All steps should be performed with care to avoid cross-contamination.
Stagger sets of 12 reactions to save time. For example, with timepoints of 1, 8, and 32 minutes, stagger four sets of 12 reactions for a total of 48 reactions simultaneously as follows:
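The authors' staggering schedule is not reproduced in this text. Purely as an illustration, the following sketch builds a timeline for four sets with the 1, 8, and 32 minute timepoints, assuming a hypothetical 2-minute offset between set start times, and checks that no two pipetting events coincide. The offset, function names, and event labels are assumptions, not values from the protocol.

```python
def staggered_events(timepoints_min, n_sets, offset_min):
    """List (time_min, set_number, action) events, sorted by time.

    Each set starts offset_min after the previous one, and each
    timepoint is measured from that set's own start time.
    """
    events = []
    for set_index in range(n_sets):
        start = set_index * offset_min
        events.append((start, set_index + 1, "start"))
        for timepoint in timepoints_min:
            events.append((start + timepoint, set_index + 1, f"stop {timepoint} min"))
    return sorted(events)


def minimum_gap(events):
    """Smallest interval in minutes between consecutive events."""
    times = [time for time, _, _ in events]
    return min(later - earlier for earlier, later in zip(times, times[1:]))


# Assumed 2-minute offset between sets (hypothetical; adjust to your own pace).
schedule = staggered_events([1, 8, 32], n_sets=4, offset_min=2)
for time, set_number, action in schedule:
    print(f"{time:>3} min  set {set_number}  {action}")
print("minimum gap between events:", minimum_gap(schedule), "min")
```

A minimum gap of zero would indicate that two sets need handling at the same moment, in which case a different offset should be chosen.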
Library Preparation
PCR #1—Sample Barcoding
Each treated sample must receive a unique sample barcode primer pair. Any primer pair can be used for the no-template control.
If the untreated substrate library will be sequenced, a unique primer pair must be used to barcode the sample. If the full set of 96 primer pairs is used for experimental samples, a unique primer pair may be created for the untreated control by using one of the extra P5 sample barcoding primers not included in the arrayed primer plate (see Table 1).
Repeat any PCRs that exhibit low or no evidence of amplification.
Pooling of PCR samples corresponding to single timepoints
PCR #2—Timepoint Barcoding
101. To each 16 μL PCR from Step 100, add 2 μL of diluted (0.125 ng/μL) timepoint pool (from Step 98) as template and 2 μL of 5 μM unique timepoint barcoding primer pairs (as described in Reagent Setup).
Each timepoint pool must receive a unique timepoint barcode primer pair.
103. Confirm amplification by running the reactions on a capillary electrophoresis machine (as described in Step 32) or an agarose gel. All samples except the negative control should have a single band of roughly equal intensity with a size of 279 bp.
Purified timepoint pool PCRs can be stored at −20° C. for extended periods of time until proceeding to library quantification.
Library Quantification
Accurate dilution of the library is important for ensuring appropriate cluster density during sequencing.
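As a minimal sketch of the dilution arithmetic, a measured mass concentration can be converted to molarity with the commonly used approximation of 660 g/mol per base pair of double-stranded DNA, and then diluted to a target such as the 4 nM working concentration used for the final library. The function names, the example measurement, and the use of the 279 bp amplicon size as the library length are assumptions for illustration.

```python
def dsdna_nanomolar(conc_ng_per_ul, length_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA concentration from ng/uL to nM.

    Uses the approximate average mass of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * length_bp)


def dilution_volumes(stock_nm, target_nm, final_volume_ul):
    """Return (stock_ul, diluent_ul) to reach target_nm in final_volume_ul."""
    if stock_nm < target_nm:
        raise ValueError("Stock is already below the target concentration")
    stock_ul = target_nm * final_volume_ul / stock_nm
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical measurement of 2.0 ng/uL for a 279 bp library.
stock = dsdna_nanomolar(2.0, 279)
library_ul, diluent_ul = dilution_volumes(stock, target_nm=4.0, final_volume_ul=20.0)
print(f"stock: {stock:.2f} nM; mix {library_ul:.2f} uL library + {diluent_ul:.2f} uL diluent")
```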
The final 4 nM HT-PAMDA library (
Properly mix the resulting loading solution.
The HT-PAMDA library has low nucleotide diversity. Two-color sequencing systems like the NextSeq are especially sensitive to over-clustering with low nucleotide diversity libraries. For this reason, we recommend loading below Illumina's recommended library concentrations for the NextSeq system and using a high proportion of PhiX control (to improve nucleotide diversity). We recommend the following loading concentrations for the MiSeq and NextSeq:
Load the sequencer following standard protocols in the Illumina system manual and sequence the libraries with the following options: For NextSeq, put the instrument in “Manual Run Mode” (also called “Standalone Mode” prior to NextSeq Control Software 4.0). For the MiSeq, complete the run setup with the “Manual” option. Enter the number of cycles to meet the following minimum requirements, as follows:
Deep sequencing of the randomized PAM libraries following library construction but prior to in vitro cleavage reactions ensures adequate representation of all PAMs. Additionally, the composition of the substrate library serves as the zero-timepoint sample for subsequent experiments. Library composition for two of our 3′ PAM substrate libraries is provided in the GitHub repository as a reference to compare user-constructed libraries. Ideally, all PAMs will have similar representation in the untreated substrate library; for analysis of an NNNN PAM window from the library, there are 256 possible PAM sequences that will have an average representation of 0.3906% of the library (
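The expected representation follows directly from the number of possible PAMs: 4⁴ = 256 sequences, so a perfectly uniform library has 1/256 = 0.390625% of reads per PAM. The following is a minimal sketch of how observed read counts could be compared against that expectation; the function names and the shape of the input (a dictionary of PAM to read count) are assumptions and do not describe the authors' analysis code.

```python
from itertools import product


def all_pams(length=4):
    """Enumerate every possible PAM of the given length."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def pam_fractions(read_counts, length=4):
    """Fraction of library reads for each possible PAM (missing PAMs count as 0)."""
    pams = all_pams(length)
    total = sum(read_counts.get(pam, 0) for pam in pams)
    if total == 0:
        raise ValueError("No reads assigned to any PAM")
    return {pam: read_counts.get(pam, 0) / total for pam in pams}


expected_fraction = 1 / len(all_pams(4))
print(f"{len(all_pams(4))} PAMs, expected {expected_fraction * 100}% each")

# Toy example: a perfectly uniform library with 1,000 reads per PAM.
uniform_counts = {pam: 1000 for pam in all_pams(4)}
fractions = pam_fractions(uniform_counts)
fold_vs_expected = {pam: frac / expected_fraction for pam, frac in fractions.items()}
```

PAMs whose fold value falls far below 1 are under-represented in the untreated library and may yield noisier depletion estimates.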
Control samples and replicates provide quality control metrics for an HT-PAMDA experiment. Well-characterized CRISPR nucleases for mammalian genome editing applications including SpCas9 and AsCas12a for 3′ and 5′ PAMs, respectively, can ensure appropriate assay performance to infer activities in mammalian cells. Raw read counts of each PAM from a given timepoint can verify the success of an HT-PAMDA experiment; the PAM read count distribution of the no-guide control should not deviate from that of the untreated substrate library, while experimental samples should show depletion and enrichment of sequences that are consistent with the expected PAM profile (
This application claims priority under 35 USC § 119(e) to U.S. Patent Application Ser. No. 62/965,645, filed on Jan. 24, 2020. The entire contents of the foregoing are hereby incorporated by reference.
This invention was made with government support under Grant No. CA218870 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/014887 | 1/25/2021 | WO |
Number | Date | Country | |
---|---|---|---|
62965645 | Jan 2020 | US |