The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jan. 12, 2022, is named UNCG_20-0009_SL.txt and is 19,426 bytes in size.
The present disclosure relates to methods for designing gRNAs for use in applications such as antivirals.
Class II CRISPR effectors like Cas9, Cas12, and Cas13 are endonucleases that use a modular segment of their RNA cofactors, known as CRISPR RNAs (crRNAs) or guide RNAs (gRNAs), to recognize and trigger the degradation of nucleic acids with a sequence complementary to that segment. These diverse enzymes are derived from a bacterial and archaeal defensive response to invasive plasmids and viruses. Because they can be easily redirected to nucleic acids with different sequences by simply changing the sequence composition of the short portion of their gRNAs called the ‘spacer,’ they have been re-appropriated over the past several years for a number of different biotechnological applications, most notably precision gene editing. During precision gene editing, a CRISPR effector is transfected into a human cell and directed to introduce a double strand break (DSB) into the genomic DNA at a specific targeted sequence; genomic mutations are introduced at those sites as a result of mutagenic DSB repair. These technologies have experienced widespread adoption for biomedical research and possess a number of emerging therapeutic applications as well.
Another nascent, but less-developed, application of CRISPR effectors has been as novel antiviral therapeutics, diagnostics, and prophylactics, based on their ability to recognize and degrade viral genomes. The first CRISPR antiviral efforts used the type II CRISPR effector Cas9 from Streptococcus pyogenes (SpyCas9), which recognizes and introduces DSBs into double-stranded DNA (dsDNA) targets, and so efforts were focused largely on degrading dsDNA viruses and excising the Human immunodeficiency virus 1 (HIV-1) proviruses from cells with latent infection. However, it was found that rapid accumulation of mutations within the target regions inhibits CRISPR activity and can drive mutagenic escape from these treatments, and so successful application of these efforts has been limited. Later, another variety of CRISPR effectors, type V CRISPR effector Cas12a (formerly named Cpf1), was identified as a divergent class of RNA-guided dsDNA endonucleases that are also capable of precision gene editing activities. Recently, it was reported that Cas12a effectors can outperform Cas9 in HIV inhibition studies in vitro. Cas12a effectors were also found to indiscriminately degrade single-stranded DNA (ssDNA) after recognizing their dsDNA targets, and several sensitive viral detection technologies have been developed that make use of this capability. Furthermore, because the vast majority of pathogenic viruses are RNA viruses, excitement for the potential of CRISPR antivirals has more recently been spurred by the development of RNA-guided RNA endonucleases, in particular type VI CRISPR effectors known as Cas13a (formerly C2c2), Cas13b, and Cas13d, for applications in human cells.
Recent demonstrations of Cas13 reducing viral load by degrading either viral single-stranded RNA (ssRNA) genomes or viral mRNA have been performed in plant cells (e.g., turnip mosaic virus), mammalian cells (e.g., dengue virus and porcine reproductive and respiratory syndrome virus), and human cells (e.g., lymphocytic choriomeningitis virus, influenza A virus, vesicular stomatitis virus, and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)). Cas13 nucleases also exhibit nonspecific RNase activity after recognition of their targets, and this nonspecific degradation has been exploited in sensitive viral detection strategies as well. These applications have shown significant promise for the future of CRISPR antivirals; however, further maturation of these biotechnologies is required to overcome some of the remaining challenges to reach their full potential.
One major challenge in the development of CRISPR antivirals comes from the rapid mutation rate of viruses. As a result, CRISPR antivirals must be tolerant to polymorphisms that occur across viral strains, and CRISPR antiviral systems also must be designed to suppress mutational escape. Previously, these challenges have been addressed by targeting the CRISPR effector to highly conserved regions of the viral genome, and by the introduction of multiple gRNAs to target different regions of the viral genome simultaneously (multiplexing) in order to make mutational escape less likely. At the same time, CRISPR multiplexing introduces a number of additional practical challenges. Furthermore, no quantitative criteria have been described for the level of sequence conservation, beyond counting the number of inter-strain variations at different genomic locations, for identifying potential antiviral targets expected to be highly active across clinical variants.
The present disclosure is directed to methods of gRNA design and nucleic acid sequences derived therefrom. In particular, the present disclosure provides methods for designing the sequences of polyvalent guide RNAs (pgRNAs).
An example aspect of the present disclosure can include a method to improve the breadth, range, and efficiency of CRISPR antivirals and CRISPR-based virus detection by improving the design and selection of the guide RNA. The disclosure is based on the idea that CRISPR effectors are inherently “promiscuous” (able to degrade non-perfect complements, subject to a number of biophysical constraints) as a result of their origins in bacterial defense against phages, and this promiscuity can be exploited in the design of gRNAs that might more effectively be able to target a broad range of coronaviruses (or viral families more broadly) or even multiple sites within the same viral genome in order to potentially enhance anti-viral activity.
The off-target activities of CRISPR systems have been noted in gene editing technologies, where off-target activity can be a major hindrance to therapeutic applications; however, there have been few applications of this knowledge. Example embodiments herein can be applied for identifying widely conserved ‘targets’, which are sequences (partially) complementary to the gRNA but which may have mutations in some strains at parts of the target where mutations are well tolerated, as one of the primary design considerations of a gRNA, rather than locations of conserved sequence (where mutations might not at all affect CRISPR activity).
Further, one example aspect of the present disclosure includes methods to balance the promiscuity of guide RNA to reduce possible promiscuous activity with the human genome (DNA) or transcriptome (RNA). In some implementations, these considerations can also be balanced against other biophysical factors that might affect CRISPR activity, such as any predicted secondary structures of the guide RNA, polynucleotide repeats that might affect expression or structure, accessibility of the targeted sites, and activity predictions from other sources.
While CRISPR antivirals have not been validated for therapeutic application, there are a number of in vitro reports. The therapeutic potential of CRISPR antivirals is emerging and there will likely be increased interest in the wake of the COVID-19 pandemic. Such antivirals may be of particular interest in cases of emerging pathogenic viruses, like SARS-CoV-2, where no vaccine exists and limited treatments exist. CRISPR antivirals could provide a very rapid response therapeutic under these conditions.
The same CRISPR effectors (e.g., Cas13) that have been used for in vitro antivirals have also been used for the rapid detection of pathogenic viruses from human samples, so another example aspect of the present disclosure can include detection systems for targeting a virus.
In general, the present disclosure is directed to various embodiments which can include, for example, a method for determining a pgRNA sequence. For instance, an example method can include identifying two or more target sequences (each nucleic acid sequence can be RNA and/or DNA) in a viral genome for recognition by a Cas effector, and for each target sequence of the two or more target sequences, calculating a homology score comprising aligning said target sequence with each other target sequence of the two or more target sequences. After calculating the homology score, the example method can also include determining one or more target pairs based at least in part on the homology score, where each target pair includes a first target sequence and a second target sequence of the two or more target sequences having the homology score calculated as greater than or equal to 60% sequence identity (e.g., greater than or equal to 75, 80, 85, or 95% sequence identity). Additionally, the example method can include generating a pgRNA template for at least one of the one or more target pairs, where the pgRNA template has a complementary sequence to the first target sequence, a complementary sequence to the second target sequence, or a convergent sequence (e.g., a sequence that is some combination of both complementary sequences). Another aspect of the example method can include generating a relative activity score for each of one or more pgRNA templates by comparing the pgRNA template to a complementary sequence to the first target sequence and a complementary sequence to a second nucleotide sequence present in a different viral genome, a mutant viral genome, or both, wherein each pgRNA template comprises a sequence of nucleotides. The example method can optionally include determining an off-target score for each pgRNA template based at least in part on the relative activity score generated for said pgRNA template.
Finally, the example method can include determining the pgRNA sequence based at least in part on the relative activity score for each pgRNA template, the off-target score, or both.
An example aspect of identifying the two or more target sequences in the viral genome can include determining a sequence position for each of one or more protospacer motifs present in the viral genome based at least in part on the Cas effector, where each of the one or more protospacer motifs includes an adjacent sequence of nucleotides; assigning at least one sequence position as a protospacer position; and identifying the two or more target sequences as a sequence of nucleotides immediately downstream (toward the 3′ end) of the protospacer position.
For certain example methods the Cas effector can be enAsCas12a.
In some example methods the one or more protospacer motifs are from the group consisting of: TTYN, CTTV, RTTC, TATM, CTCC, TCCC, TACA, RTTS, TATA, TGTV, ANCC, CVCC, TGCC, GTCC, TTAC, or combinations thereof.
In some example methods, the one or more protospacer motifs are from the group consisting of: TTYN, CTTV, RTTC, TATM, CTCC, TCCC, TACA, or combinations thereof.
In some example methods, the different viral genome and the viral genome are included in a viral family (e.g., coronaviruses).
An example aspect of comparing the pgRNA template to the complementary sequence to the first nucleotide sequence and the complementary sequence to the second nucleotide sequence present in the different viral genome, the mutant viral genome, or both can include determining a first sequence identity for the pgRNA template to the complementary sequence to the first nucleotide sequence and a second sequence identity for the pgRNA template to the complementary sequence to the second nucleotide sequence. In certain example methods, the first sequence identity and the second sequence identity are calculated based on a BLAST alignment, and the relative activity score is based at least in part on the first sequence identity and the second sequence identity.
In some example methods calculating the off-target score is performed only for the pgRNA templates having calculated the first sequence identity as greater than about 60% and the second sequence identity as greater than about 60%. For instance, in certain example methods, calculating the off-target score is performed only for the pgRNA templates having calculated the first sequence identity as greater than about 90% and the second sequence identity as greater than about 90%.
For certain example methods, calculating the off-target score is based at least in part on comparing each of the one or more pgRNA templates to a human genome sequence or a human transcriptome sequence.
For certain example methods, determining the pgRNA sequence is based at least in part on a region of interest comprising a sequence of adjacent nucleotides present in the viral genome.
Another example embodiment of the present disclosure can include a pgRNA sequence determined according to any of the preceding example methods. For instance, a pgRNA can be determined based on identifying two or more target sequences in a coronavirus genome (e.g., SARS-CoV-2).
A further example embodiment of the present disclosure can include a method for treating a viral infection in a patient that includes delivering to a patient in need thereof a composition including an example pgRNA having a sequence determined according to example methods herein.
Aspects of certain methods for treating a viral infection can include treating a patient displaying certain symptoms (e.g., Covid-19).
In general, the present disclosure is directed to methods for the design of gRNAs for CRISPR antivirals that exploit the widely recognized tendency of different CRISPR effectors to possess varying levels of tolerance to imperfect complementarity between the gRNA spacer and the targets. While significant efforts have gone into limiting this tendency for precision gene editing applications (where activity at multiple or “off-target” sites is prevented at all costs), implementations of the present disclosure utilize a process for generating “polyvalent” gRNAs (pgRNAs) that can demonstrate activity at multiple viral genomic sites, in effect producing operational multiplexing with a single gRNA. For instance, embodiments of the present disclosure can be used to generate pgRNA sequences that can be characterized by one or more of the following properties: (i) high relative activity at multiple viral targets, (ii) high relative activity across clinical strain variants, (iii) low predicted relative activity at potential human “off-targets,” and (iv) reasonable biophysical characteristics that suggest high CRISPR activity for potential antiviral and/or viral detection applications.
Aspects of example implementations include: designing pgRNAs which exhibit >95% activity at distant viral sites along a viral genome such as the SARS-CoV-2 ssRNA genome and which can be tolerant to variations across strains, while still avoiding predicted off-target activity with components of the human transcriptome. In particular, these pgRNAs may be designed based on the pgRNA use in combination with a specific Cas effector such as Cas13 from Ruminococcus flavefaciens XPD3002 (RfxCas13d). Another example of a Cas effector can include a Cas12a variant (engineered Cas12a from Acidaminococcus sp. BV3L6, enAsCas12a) that can target multiple locations along the HIV-1 provirus—up to three viral targets using a single pgRNA designed in accordance with the present disclosure—while minimizing activity at other sites in the human genome.
One example implementation in accordance with the present disclosure can include a method for determining a pgRNA sequence, such as a pgRNA sequence for producing an antiviral. The method for determining a pgRNA sequence can include identifying two or more target sequences (e.g., a nucleic acid sequence that can be RNA or DNA) in a viral genome for recognition by a Cas effector. The method can also include calculating a homology score based on performing an alignment between each target sequence of the two or more target sequences and each other target sequence. More particularly, the homology score can include a metric such as sequence identity, sequence similarity, or another similar measure of the overlap between target sequences.
Example methods for determining a pgRNA sequence can also include determining a target pair comprising a first nucleotide sequence present in the viral genome and a second nucleotide sequence present in the different viral genome, the mutant viral genome, or both. In some embodiments, the target pair can be determined based at least in part on the homology score. For example, the homology score may determine that a sequence of nucleotides (nt) displays 95% sequence identity between the viral genome and a different viral genome. In certain implementations, depending on if the homology score meets a certain threshold (e.g., greater than 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%), the sequence of nucleotides can be used to determine the target pair. As should be understood, the different viral genome may include a viral genome from the same viral family (e.g., coronaviruses).
Another aspect of example methods for determining a pgRNA sequence can include generating a relative activity score for each of one or more pgRNA templates by comparing the pgRNA template to a complementary sequence to the first nucleotide sequence and a complementary sequence to the second nucleotide sequence. The pgRNA templates can be generated by various means including random generation, computer modeling, or both, and generally each pgRNA template includes a sequence of nucleotides.
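As a non-limiting illustration of how pgRNA templates might be enumerated by computer for one target pair, the following Python sketch builds every candidate spacer that is complementary to the shared positions of two equal-length targets and tries each of the four ribonucleotides at the positions where the targets differ. The function names, the position-wise (non-reversed) complement, and the ungapped comparison of the two targets are assumptions made for illustration only and are not taken from the disclosure.

```python
from itertools import product
from typing import List

RNA_BASES = "ACGU"


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write thymidine as uridine."""
    return seq.upper().replace("T", "U")


def complement_rna(target: str) -> str:
    """Position-wise RNA complement (no reversal); an illustrative convention."""
    return to_rna(target).translate(str.maketrans("ACGU", "UGCA"))


def candidate_templates(target_a: str, target_b: str) -> List[str]:
    """Enumerate spacer candidates for two equal-length, ungapped targets.

    Shared positions are complemented; every divergent position is filled
    with each of A, C, G, U, giving 4**n candidates for n differences.
    """
    a, b = to_rna(target_a), to_rna(target_b)
    if len(a) != len(b):
        raise ValueError("targets must have equal length")
    divergent = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    base = complement_rna(a)
    candidates = []
    for combo in product(RNA_BASES, repeat=len(divergent)):
        spacer = list(base)
        for pos, nucleotide in zip(divergent, combo):
            spacer[pos] = nucleotide
        candidates.append("".join(spacer))
    return candidates
```

Because the number of candidates grows as a power of four, a sketch of this kind is practical only for target pairs with a small number of differences.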
Example methods for determining a pgRNA sequence may further include determining whether to calculate an off-target score for each pgRNA template based at least in part on the relative activity score generated for said pgRNA template.
For example embodiments according to the present disclosure, determining the pgRNA sequence can be based at least in part on the relative activity score for each pgRNA template, the off-target score, or both.
One example aspect of identifying the two or more target sequences in the viral genome can include determining a sequence position for each of one or more protospacer motifs present in the viral genome based at least in part on the Cas effector. For instance, certain Cas effectors may display preferential recognition and/or binding to different regions of the viral genome (e.g., protospacer motifs). In particular, some implementations may use the position of protospacer motifs in the viral genome to identify possible target sequences that would display improved efficacy for antiviral treatments. For example, by assigning at least one sequence position as a protospacer position, certain embodiments may identify the two or more target sequences as at least including a sequence of nucleotides immediately downstream of the protospacer position in the viral genome.
For implementations of the present disclosure, the Cas effector can include any Cas effector that can be implemented as part of a CRISPR system to result in breakage of nucleotide oligomers such as RNA or DNA. Some non-limiting examples of Cas effectors that can be used in embodiments of the disclosure include enAsCas12a (Cas12a), RfxCas13d (Cas13d), and/or SpyCas9 (Cas9).
As previously discussed, certain Cas effectors may display preferred recognition and/or binding to certain protospacer motifs. For instance, using a Cas effector of the present disclosure, the one or more protospacer motifs can include one or more from the group: TTYN, CTTV, RTTC, TATM, CTCC, TCCC, TACA, RTTS, TATA, TGTV, ANCC, CVCC, TGCC, GTCC, TTAC, or combinations thereof. In some implementations, the one or more protospacer motifs can include a subset of this group. For example, in certain embodiments, the one or more protospacer motifs are from the group: TTYN, CTTV, RTTC, TATM, CTCC, TCCC, TACA, or combinations thereof. More particularly, some embodiments can include identifying target sequences that occur downstream of the position of one or more of these protospacer motifs in the viral genome. As used herein, protospacer motifs are provided as nucleotide sequences using the following codes: A (adenosine), C (cytosine), T (thymidine), G (guanosine), U (uridine), N (any nucleotide), R (adenosine or guanosine), S (guanosine or cytosine), M (adenosine or cytosine), V (adenosine, cytosine, or guanosine), and Y (a pyrimidine: C, T, or U).
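For illustration only, the degenerate motifs listed above can be expanded into a regular expression and used to locate candidate targets immediately downstream of each motif occurrence. The sketch below assumes the standard IUPAC meanings of the degenerate codes (R = A/G, Y = C/T, S = C/G, M = A/C, V = A/C/G, N = any), a DNA-alphabet genome string, a 4 nt motif, and a 23 nt target length; the names `pam_regex` and `find_targets` are hypothetical.

```python
import re

# Standard IUPAC degenerate codes for a DNA alphabet (an assumption of this sketch).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "M": "[AC]",
    "V": "[ACG]", "N": "[ACGT]",
}

TIER1_MOTIFS = ["TTYN", "CTTV", "RTTC", "TATM", "CTCC", "TCCC", "TACA"]


def pam_regex(motifs):
    """Compile the motifs into one pattern; the lookahead reports overlapping hits."""
    alternatives = ["".join(IUPAC[code] for code in motif) for motif in motifs]
    return re.compile("(?=(" + "|".join(alternatives) + "))")


def find_targets(genome, motifs=TIER1_MOTIFS, motif_len=4, target_len=23):
    """Return (motif_position, motif_sequence, downstream_target) tuples."""
    genome = genome.upper()
    hits = []
    for match in pam_regex(motifs).finditer(genome):
        start = match.start() + motif_len
        target = genome[start:start + target_len]
        if len(target) == target_len:  # skip motifs too close to the 3' end
            hits.append((match.start(), match.group(1), target))
    return hits
```

The sketch scans a single strand; searching the opposite strand would require repeating the scan on the reverse complement of the genome.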
One aspect of example embodiments can include methods for developing pgRNA that can target members of a viral family. For instance, in some implementations, the viral genome and the different viral genome can be included in the same viral family. Viral families are similar to animal families in that the genomes of viruses of the same family display some degree of overlap which can be determined based on aligning the genetic sequence to determine the sequence identity or similarity for regions of the genome. One non-limiting example of a viral family can include coronaviruses (coronaviridae), which includes members such as SARS-CoV-2, MERS-CoV, and SARS-CoV. Another non-limiting example of a viral family can include retroviruses (retroviridae), which includes members such as human immunodeficiency virus (HIV) and human T-lymphotropic virus (HTLV).
In certain implementations, methods for determining a pgRNA sequence can include identifying target sequences in a viral genome from a certain viral family and, calculating a homology score between a first viral genome from the certain viral family and a second, different viral genome from the same certain viral family. As an example for illustration, the first viral genome can be the genome for SARS-CoV-2 and the second viral genome can be the genome for MERS-CoV.
According to an aspect of certain embodiments, comparing the pgRNA template to a complementary sequence to the first nucleotide sequence and a complementary sequence to the second nucleotide sequence can include determining a first sequence identity for the pgRNA template to the complementary sequence to the first nucleotide sequence and a second sequence identity for the pgRNA template to the complementary sequence to the second nucleotide sequence. In general, a complementary sequence as used herein carries the ordinary meaning in biology. Base pairing rules for nucleotides indicate that each one of the 5 nucleobases (adenosine ‘A’, guanosine ‘G’, cytidine ‘C’, uridine ‘U’, thymidine ‘T’) has a complementary nucleobase based on the type of nitrogenous base. For example, the complement to A is T or U (and vice versa) and the complement to C is G (and vice versa). Thus, a complementary sequence to the example oligonucleotide AUCGCAUCU can be XAGCGXAGA, where ‘X’ is independently T or U. In determining whether the complement to A is T or U, the type of viral genetic material may be used as one basis. In certain embodiments for designing pgRNA, the complement to A may only be U.
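The base pairing rules above can be written out as a short Python sketch. The function name `complement` and its `rna` flag are illustrative assumptions; the complement is taken position by position, matching the example oligonucleotide given above.

```python
def complement(seq, rna=True):
    """Position-wise complement; A pairs with U when rna is True, else with T."""
    pairs = {"A": "U" if rna else "T", "U": "A", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in seq.upper())
```

For the example oligonucleotide AUCGCAUCU, `complement("AUCGCAUCU")` gives UAGCGUAGA, and `complement("AUCGCAUCU", rna=False)` gives TAGCGTAGA.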
For some example embodiments of the present disclosure, the first sequence identity and/or the second sequence identity can be determined according to various methods. One example method can include performing a sequence alignment such as a BLAST alignment. BLAST alignment is a tool for comparing two sequences (e.g., nucleotide sequences) to determine characteristics such as sequence identity or sequence similarity as measures of overlap between portions of the sequences. In this manner, regions of higher overlap (greater similarity) and regions of poor overlap (lower similarity) can be determined. Thus these regions of greater similarity may be used to design pgRNA that can target multiple viruses. As such, in some embodiments of the present disclosure, the relative activity score can be based at least in part on the first sequence identity and the second sequence identity.
In certain example embodiments, calculating the off-target score can be performed only for pgRNA templates having calculated the first sequence identity as greater than about 60% and the second sequence identity as greater than about 60%, such as the first sequence identity greater than 62%, 64%, 66%, 68%, 70%, 72%, 74%, 76%, 78%, 80%, 82%, 84%, 86%, 88%, 90%, 92%, 94%, 96%, or 98% and, independently, the second sequence identity greater than 62%, 64%, 66%, 68%, 70%, 72%, 74%, 76%, 78%, 80%, 82%, 84%, 86%, 88%, 90%, 92%, 94%, 96%, or 98%. For instance, in some implementations, calculating the off-target score is performed only for the pgRNA templates having calculated the first sequence identity as greater than about 90% and the second sequence identity as greater than about 90%.
An aspect of some implementations may include calculating the off-target score based at least in part on comparing each of the one or more pgRNA templates to a human genome sequence or a human transcriptome sequence. Generally, the off-target score can be used to approximate overlapping or possible reactivity between the designed pgRNA and genetic material (e.g., RNA or DNA) present in humans. In this manner, overlapping reactivity may be diminished by excluding or removing pgRNA templates meeting an off-target score threshold.
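One possible way to combine the sequence identity gate and the off-target exclusion described above is sketched below. The off-target scorer is passed in as a function because the disclosure does not fix a single scoring implementation here; the 0.90 identity gate mirrors the "about 90%" example above, while the `off_target_max` threshold and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def select_templates(
    identities: Dict[str, Tuple[float, float]],
    off_target_score: Callable[[str], float],
    identity_min: float = 0.90,    # gate taken from the "about 90%" example
    off_target_max: float = 0.10,  # hypothetical exclusion threshold
) -> List[str]:
    """Keep templates that pass the identity gate and the off-target threshold.

    `identities` maps each template to its (first, second) sequence identity,
    expressed as fractions. The off-target score is computed only for
    templates whose two identities both exceed `identity_min`.
    """
    kept = []
    for template, (first_identity, second_identity) in identities.items():
        if first_identity <= identity_min or second_identity <= identity_min:
            continue  # off-target scoring is skipped for weak candidates
        if off_target_score(template) >= off_target_max:
            continue  # excluded: predicted activity against host sequences
        kept.append(template)
    return kept
```

Gating on identity first avoids running the comparatively expensive host genome or transcriptome comparison on templates that would be discarded anyway.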
Another aspect of certain implementations can include using further selection criteria in the design of pgRNAs. For instance, determining the pgRNA sequence can be based at least in part on a region of interest which includes a sequence of adjacent nucleotides present in the viral genome. The region of interest can include a position of a gene that may be of clinical or functional significance, a position which is conserved over many viral strains and/or that demonstrates greater intolerance to mutations, or a position determined using an activity prediction such as one that can be performed using bioinformatic tools and/or methods, prior to experimental validation.
While the present application is generally directed to embodiments for treating humans, it should be understood that similar protocols may be developed for treating viral diseases in a variety of organisms. For example, viral prophylaxis and/or treatment is particularly needed in many agriculturally important plants and animals. One aspect of implementations for designing pgRNA for these organisms is modifying the step for calculating the off-target score. For the organism to be treated, the off-target score should be based on the alignment to the genome or transcriptome of the host organism to be treated (e.g., a plant genome). In this manner, implementations of the present disclosure can include pgRNA designed according to such example methods that can be delivered to a plant to treat a viral infestation. Further, genetic modification of organisms, including plants, may be used to create transgenic organisms that produce the pgRNA rather than requiring a delivery method.
One example embodiment of the present disclosure can include a pgRNA having a pgRNA sequence determined according to example embodiments of the present disclosure. Aspects of the pgRNA can include improved activity across multiple viral strains (e.g., viruses from the same viral family). For instance, the pgRNA can be included as a cofactor in a CRISPR-Cas system to produce an antiviral.
Aspects of the pgRNA can include a pgRNA sequence that is determined based on identifying two or more target sequences in a coronavirus genome (e.g., SARS-CoV-2).
Another example embodiment of the present disclosure can include a method for treating a viral infection by delivering to a patient in need thereof a composition comprising a pgRNA, the pgRNA having a pgRNA sequence determined according to example methods of the present disclosure. For instance, an example implementation of the present disclosure can include a method for treating a patient displaying symptoms of Covid-19, by delivering a composition including a pgRNA sequence determined based on identifying one or more sequences in the SARS-CoV-2 genome.
As described in the disclosure, sequence identity is related to sequence homology. Homology comparisons may be conducted by eye or, more usually, with the aid of readily available sequence comparison programs. These commercially and publicly available computer programs can be used to determine percent (%) homology between two or more sequences and may also calculate the sequence identity shared by two or more nucleic acid sequences. Sequence homologies may be generated by any of a number of computer programs known in the art, for example BLAST, which is available for offline and online searching (see e.g., https://blast.ncbi.nlm.nih.gov/Blast.cgi). As used herein, sequence identity values
A further embodiment of the present disclosure can include a diagnostic that includes one or more pgRNA sequences designed according to example implementations of the present disclosure. These diagnostics can include viral detection platforms which can provide advantages such as more sensitive identification of viral genetic material (e.g., by increasing the effective numbers of viral targets in a clinical sample), improved time-to-detection, and diagnostics that are more robust to viral mutations and variations across viral strains. When these example CRISPR diagnostic effectors recognize a viral nucleic acid sequence complementary to their gRNA, they cleave the viral nucleic acids, then begin to indiscriminately degrade any other single-stranded RNA or DNA they encounter. In a CRISPR-based viral detection platform, a “probe” nucleic acid is attached to a molecule that becomes highly fluorescent when the probes are degraded indiscriminately by the CRISPR effector. When these probes are included and this reaction is coupled with an isothermal PCR reaction to increase the amount of viral nucleic acids present in a clinical sample, it rapidly produces a bright signal without the need for a thermocycler.
The present invention will be better understood with reference to the following non-limiting examples.
The present examples provide aspects of embodiments of the present disclosure. These examples are not meant to limit embodiments solely to such examples herein, but rather to illustrate some possible implementations.
The Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) isolate Wuhan-Hu-1 complete genome (NCBI Reference Sequence: NC_045512.2) served as the primary target for pgRNA development vs. the SARS-CoV-2 ssRNA genome. Design of pgRNA targets vs. HIV-1 provirus used the Human immunodeficiency virus type 1 (HXB2), complete genome; HIV1/HTLV-III/LAV reference genome (GenBank: K03455.1).
Estimates of the relative CRISPR activity at sites not perfectly targeted by the gRNA/pgRNA spacer sequence were generated by calculating the Cutting Frequency Determination (CFD) score (35,45). To calculate the CFD score, the penalty (relative reduction in CRISPR activity) that results from each mismatched site is first drawn from a CFD matrix, the table of position-specific reductions of activity that occur as a result of mispairing between specific nucleotides in the spacer and target. The CFD matrices for each CRISPR effector were generated by the Sanjana lab (RfxCas13d) and the Doench lab (SpyCas9 and enAsCas12a, using the data from the “dropout” experiments) using massively parallel screens of gRNA libraries for CRISPR activity, and CFD scoring was implemented in MATLAB using publicly available data sets from those labs. The CFD score for a given target and gRNA spacer is the product of the CFD penalties for each mismatch (the position-specific penalties, averaged over all possible mismatched nucleotides). This approach is fast to implement and has been successfully used as a reasonable approximation for CRISPR activity at off-target sites for a number of different CRISPR effectors. The effect of different PAMs (PAM strength) on enAsCas12a activity at different sites was modeled with a multiplicative penalty using data from similar large-scale screens of PAM libraries. In the case of RfxCas13d, penalties were recovered from taking the value of the reported log2(Fold-Change in expression) to the second power, vs. a perfectly complementary targeted mRNA reporter in their massively parallel screen for gRNA activity in the presence of mismatches. A missing value (rA-rC mismatch at position 15) was interpolated from the penalties of the rA-rC mismatches at positions 14 and 16. In the event of multiple sequential mismatches (two-in-a-row, three-in-a-row, etc.), the position-specific penalties for double and triple mismatches were used to calculate the CFD scores at those sites.
If the off-target sites had <15 nt (nucleotide) identity as the intended target (<55% identity for RfxCas13d or <65% identity for enAsCas12a), the CRISPR effectors were considered effectively inactive at those sites.
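The multiplicative scoring described above can be illustrated with a minimal Python sketch. It is not the MATLAB implementation referred to in the text: the function name, the layout of the penalty table, and the default penalty are hypothetical, and the placeholder values in the usage example are not taken from any published CFD matrix. The sketch also omits the separate penalties for consecutive mismatches and for PAM strength.

```python
from typing import Dict, Tuple

# Hypothetical penalty table: (1-based position, intended nt, observed nt)
# -> relative activity in [0, 1]. Real values would come from published screens.
PenaltyTable = Dict[Tuple[int, str, str], float]


def cfd_score(intended: str, site: str, penalties: PenaltyTable,
              default_penalty: float = 0.5, min_identity: int = 15) -> float:
    """Product of per-mismatch penalties between the intended target and a site.

    Both sequences are written in the target sense and must have equal length.
    Sites with fewer than `min_identity` identical positions score 0.0, following
    the "effectively inactive" rule in the text. `default_penalty` is an assumed
    fallback for mismatches missing from the table.
    """
    if len(intended) != len(site):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(intended, site))
    if matches < min_identity:
        return 0.0
    score = 1.0
    for pos, (a, b) in enumerate(zip(intended, site), start=1):
        if a != b:
            score *= penalties.get((pos, a, b), default_penalty)
    return score


if __name__ == "__main__":
    toy = {(3, "A", "C"): 0.8, (10, "A", "G"): 0.5}  # placeholder values
    target = "A" * 27
    off_site = "AACAAAAAAGAAAAAAAAAAAAAAAAA"
    print(cfd_score(target, off_site, toy))
```

A perfectly matched site returns 1.0, so every score is expressed relative to a hypothetical fully complementary target.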
One example protocol for the design of polyvalent guide RNAs is summarized in
Step 1: Identification of Targets (‘protospacers’). For RfxCas13d, every 27 nt sequence along the Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1 complete genome was evaluated as a CRISPR target, also known as a ‘protospacer.’ For enAsCas12a, to be recognized sufficiently by the enzyme, protospacers must be located immediately downstream of a “Tier 1” protospacer adjacent motif (‘PAM’) (TTYN, CTTV, RTTC, TATM, CTCC, TCCC, and TACA) or a weaker “Tier 2” PAM (RTTS, TATA, TGTV, ANCC, CVCC, TGCC, GTCC, TTAC). Every 23 nt target sequence located immediately downstream of a Tier 1 or Tier 2 PAM site was identified on either strand of the HIV-1 proviral reference genome and evaluated as a potential target/protospacer.
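As an illustration of the PAM-based search in Step 1, the following Python sketch expands the IUPAC degenerate PAM codes listed above into regular expressions and scans both strands for PAM-adjacent 23 nt protospacers. The function names are hypothetical, and checking Tier 1 before Tier 2 (which resolves motifs such as TATA that satisfy both lists) is an assumption rather than a rule stated in the text.

```python
import re
from typing import Iterator, List, Optional, Tuple

# IUPAC degenerate codes used in the PAM lists above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "Y": "[CT]", "R": "[AG]", "V": "[ACG]", "M": "[AC]", "S": "[CG]"}

TIER1 = ["TTYN", "CTTV", "RTTC", "TATM", "CTCC", "TCCC", "TACA"]
TIER2 = ["RTTS", "TATA", "TGTV", "ANCC", "CVCC", "TGCC", "GTCC", "TTAC"]


def _compile(pams: List[str]) -> "re.Pattern[str]":
    return re.compile("|".join("".join(IUPAC[c] for c in p) for p in pams))


TIER1_RE = _compile(TIER1)
TIER2_RE = _compile(TIER2)


def pam_tier(pam: str) -> Optional[int]:
    """Return 1 or 2 for a recognized 4 nt PAM, else None (Tier 1 checked first)."""
    if TIER1_RE.fullmatch(pam):
        return 1
    if TIER2_RE.fullmatch(pam):
        return 2
    return None


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(genome: str, length: int = 23
                      ) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (strand, start, tier, protospacer) for each PAM-adjacent site.

    `start` is the 0-based index of the first protospacer nucleotide on the
    strand being scanned (reverse-complement coordinates for the "-" strand).
    """
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(len(seq) - 4 - length + 1):
            tier = pam_tier(seq[i:i + 4])
            if tier is not None:
                yield strand, i + 4, tier, seq[i + 4:i + 4 + length]
```

For Cas13d no PAM is required, so the equivalent step is simply a sliding 27 nt window along the genome.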
Step 2: Identification of Targetable Pairs with high homology. For each virus, every potential target was aligned to every other potential target, and pairs with >75% sequence identity (≥21 nt identity for Cas13d targets and ≥16 nt identity for Cas12a targets) were identified. Those overlapping the SARS-CoV-2 poly(rA)-tail were removed from the list of potential pairs. For targeting the HIV provirus, exact target matches between pairs of sequences on the two long terminal repeat (LTR) regions were not considered (for reasons discussed below) unless they also formed a “target pair” with a segment between the two regions.
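The all-against-all comparison in Step 2 can be sketched in Python as below. This is an illustrative brute-force version with hypothetical function names; it compares equal-length targets position by position with no gaps and keeps pairs meeting a minimum count of identical nucleotides. The quadratic loop is adequate for a sketch but would be slow on full genomes.

```python
from itertools import combinations
from typing import List, Tuple


def identity(a: str, b: str) -> int:
    """Number of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b))


def sliding_targets(genome: str, length: int) -> List[Tuple[int, str]]:
    """All (start, sequence) windows of the given length along a genome."""
    return [(i, genome[i:i + length]) for i in range(len(genome) - length + 1)]


def find_target_pairs(targets: List[Tuple[int, str]],
                      min_identical: int) -> List[Tuple[int, int, int]]:
    """Return (start_a, start_b, n_identical) for pairs meeting the threshold."""
    pairs = []
    for (pos_a, seq_a), (pos_b, seq_b) in combinations(targets, 2):
        n = identity(seq_a, seq_b)
        if n >= min_identical:
            pairs.append((pos_a, pos_b, n))
    return pairs
```

With the thresholds given in the text, `min_identical` would be 21 for 27 nt Cas13d targets and 16 for 23 nt Cas12a targets.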
Step 3: Adaptation of pgRNA activity at pair sequences. For a given target pair, a pgRNA spacer template was generated complementary to the targets, using the location and sequences of the matching targets. Different ‘candidate pgRNA’ spacers were generated with all four potential nucleotides (rA, rU, rC, rG) at each of the sites of sequence divergence between the target pairs, i.e., 4^n candidates for target pairs with n differences between sequences. A mismatch penalty (CFD score) between the candidates and each of the target pairs was calculated using the multiplicative approach (
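The enumeration of 4^n candidates in Step 3 can be sketched in Python as follows. Candidates are written in the target sense (the sequence to which the spacer would be perfectly complementary). The function names are hypothetical, and ranking candidates by the lower of their two scores is an assumed selection rule used only for illustration; any scoring function, such as a CFD-style score, can be passed in.

```python
from itertools import product
from typing import Callable, Iterator, Tuple


def candidate_targets(target_a: str, target_b: str,
                      alphabet: str = "ACGU") -> Iterator[str]:
    """Yield all 4^n sequences that vary only where the two targets diverge."""
    if len(target_a) != len(target_b):
        raise ValueError("targets must have the same length")
    diverge = [i for i, (x, y) in enumerate(zip(target_a, target_b)) if x != y]
    for combo in product(alphabet, repeat=len(diverge)):
        seq = list(target_a)
        for i, nt in zip(diverge, combo):
            seq[i] = nt
        yield "".join(seq)


def best_candidate(target_a: str, target_b: str,
                   score_fn: Callable[[str, str], float]
                   ) -> Tuple[str, float, float]:
    """Pick the candidate whose weaker site score is highest (assumed criterion)."""
    best = max(
        candidate_targets(target_a, target_b),
        key=lambda c: min(score_fn(c, target_a), score_fn(c, target_b)),
    )
    return best, score_fn(best, target_a), score_fn(best, target_b)
```

Because the number of candidates grows as 4^n, pairs with many divergent positions become expensive to enumerate; the identity threshold in Step 2 keeps n small.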
Step 4: Estimate relative CRISPR activity across clinical strains (SARS-CoV-2). Sequences of 942 SARS-CoV-2 clinical strain variants were downloaded from the Severe acute respiratory syndrome coronavirus 2 data hub (NCBI Virus, accessed Apr. 23, 2020) (48) as all the “complete” nucleotide sequences available at the time. The sequences were then each individually aligned to the Wuhan-1 reference strain using a Needleman-Wunsch global alignment, and for each potential target site (27 nt region) across the genome, the number and prevalence of unique variants were counted. In evaluating pgRNA candidates, if the minimum relative activity across variants (MRAV) for the candidate pgRNAs across all the sequenced SARS-CoV-2 strains was <95% at either target site, the candidates were flagged. Sequences with ambiguous sites or indels (because their effects on Cas13d and Cas12a are less well defined) were removed from the calculation. To evaluate sequence conservation and “conservation of targets” across the SARS-CoV-2 genome in general (i.e.,
Step 5: Estimate relative activity at potential human off-targets. Candidate pgRNA spacers were aligned to the human genome for Cas12a (Genome Reference Consortium Human Build 38, GRCh38 human reference genome) or human transcriptome for Cas13d (GRCh38 human RefSeq transcripts) using a local nucleotide BLAST targeted for short sequences <30 nt (blastn-short). The region surrounding each hit to the human genome or transcriptome, to a total of 27 nt (the 27 nt protospacer for Cas13d and a 4 nt PAM+23 nt protospacer for Cas12a), was evaluated for a mismatch penalty score with its respective pgRNA candidates and, for Cas12a, the presence of a Tier 1 or Tier 2 PAM. While “off-target” interactions with the human transcriptome by Cas13d are not expected to have consequences as detrimental as off-target genomic mutations by Cas12a, these unwanted interactions may titrate or dilute the activities of the Cas13d against the desired targets. For Cas13d, pgRNA spacer candidates with maximum predicted relative activity at any human transcript ≥10% were removed and, for Cas12a, those with maximum predicted relative activity at any site in the human genome ≥1% were removed.
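The screening thresholds in Steps 4 and 5 can be summarized in a short Python sketch. The function and dictionary names are hypothetical; the inputs are assumed to be predicted relative activities (0 to 1) that have already been computed for each clinical variant of the two target sites and for each human BLAST hit.

```python
from typing import Dict, Sequence

# Thresholds quoted in the text above.
OFF_TARGET_LIMIT = {"Cas13d": 0.10, "Cas12a": 0.01}
MRAV_FLAG = 0.95


def screen_candidate(site_a_variant_scores: Sequence[float],
                     site_b_variant_scores: Sequence[float],
                     off_target_scores: Sequence[float],
                     effector: str) -> Dict[str, object]:
    """Apply the MRAV flag (Step 4) and the off-target cutoff (Step 5)."""
    mrav_a = min(site_a_variant_scores)
    mrav_b = min(site_b_variant_scores)
    # No BLAST hits is treated as zero predicted off-target activity.
    max_off = max(off_target_scores, default=0.0)
    return {
        "mrav": (mrav_a, mrav_b),
        "flagged": min(mrav_a, mrav_b) < MRAV_FLAG,
        "removed": max_off >= OFF_TARGET_LIMIT[effector],
    }
```

For example, a candidate with a worst-case variant activity of 0.90 at one site would be flagged, and the same off-target score of 0.08 would pass the Cas13d cutoff but fail the stricter Cas12a cutoff.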
Step 6: Selection of pgRNA based on additional functional criteria. At this stage, the RNA candidates have been screened for high relative activity at multiple viral targets and across clinical strains, low predicted activity at human “off-target” sites, and biophysical characteristics that suggest high overall CRISPR activity. The candidates can then be further refined by considering pgRNA targets located within specific genes or regions of interest (ROIs) that may be of clinical or functional significance, conservation of the targets/viral intolerance to mutations, and on-target activity prediction, which can be performed using several bioinformatic tools and methods available, prior to experimental validation.
One example computer-implemented protocol for the design of polyvalent guide RNAs is coded and made available at: https://github.com/ejosephslab/pgrna. This example code can be executed by a computing system such as a laptop, personal computer, or other device configured to read the code.
All complete sequences of all RNA viruses with human, mammal, arthropoda, aves, and higher plant hosts found in the NCBI Reference Sequence database were subjected to a brute force direct (nucleotide-by-nucleotide, no gaps) alignment of each of their 23 nt sequence targets to each other, considering only sequence polymorphisms at the same site. We considered only the (+) strand, as even for (−) and dsRNA viruses these sequences would match the vast majority of mRNA sequences. Only targets lacking polynucleotide repeats (4 consecutive rU's, rC's, rG's, or rA's) were considered viable targets. Targets derived from different segments or cDNAs of the same viral strain were considered together. In total, arthropoda (1074 viral species), aves (111), mammal (496), higher plant/embryophyta (691), and human (89)-hosted viruses were considered. For human-hosted (+) ssRNA viruses or sequenced viral transcripts (59 in the RefSeq database), candidate pgRNA sequences for RfxCas13d were generated for each target pair found with predicted (monovalent) activity at both sites in the top quartile (25), screened for biophysical compatibility (lacking polynucleotide repeats or significant predicted secondary structure in the spacer), and aligned to the human reference transcriptome (Genome Reference Consortium Human Build 38, GRCh38) using a local nucleotide BLAST (34) search optimized for short sequences <30 nt (blastn-short). Only those with no hits (less than 15 nt homology out of 23 nt targets) to the human transcriptome and with predicted activity at both sites within the top quartile of all Cas13 activity for targets of that virus were considered viable pgRNA candidates.
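The polynucleotide-repeat filter mentioned above can be expressed as a one-line regular expression. The sketch below is illustrative, with a hypothetical function name; it treats a run of four or more identical nucleotides as a repeat and accepts either RNA (U) or DNA (T) letters.

```python
import re


def has_polynucleotide_repeat(seq: str, run: int = 4) -> bool:
    """True if `seq` contains `run` or more consecutive identical nucleotides."""
    # ([ACGUT]) captures one nucleotide; \1{run-1,} requires it to repeat.
    pattern = r"([ACGUT])\1{%d,}" % (run - 1)
    return re.search(pattern, seq.upper()) is not None


def viable_targets(targets):
    """Keep only targets lacking polynucleotide repeats."""
    return [t for t in targets if not has_polynucleotide_repeat(t)]
```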
All complete SARS-CoV-2 genomic sequences available from the NCBI Virus database were downloaded on Nov. 23, 2020 (29,123 sequences). For each of the 205 target pairs possessing biophysically feasible pgRNA candidates, we aligned (no gaps) each target sequence to each genome to determine the closest matching sequence. Alignments containing ambiguous nucleotide calls were not included. Sequence variants were grouped together, with a minimum prevalence of 0.1%, with the fraction of hits by the most prevalent group being considered the sequence conservation reported.
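The conservation calculation described above can be sketched in Python as follows. This is an illustrative reading of the text with hypothetical function names: each target is compared without gaps to every window of each genome, the closest window is kept, windows containing ambiguous calls are skipped, and the reported conservation is the fraction of genomes carrying the most prevalent variant.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

UNAMBIGUOUS = set("ACGTU")


def closest_window(target: str, genome: str) -> Optional[str]:
    """Ungapped window of the genome with the most positions matching the target."""
    best, best_n = None, -1
    length = len(target)
    for i in range(len(genome) - length + 1):
        window = genome[i:i + length]
        n = sum(x == y for x, y in zip(target, window))
        if n > best_n:
            best, best_n = window, n
    return best


def sequence_conservation(target: str, genomes: Iterable[str],
                          min_prevalence: float = 0.001
                          ) -> Tuple[float, Dict[str, float]]:
    """Return (fraction in the most prevalent variant, variants >= min_prevalence)."""
    hits = []
    for genome in genomes:
        window = closest_window(target, genome)
        if window is None or set(window) - UNAMBIGUOUS:
            continue  # skip genomes that are too short or have ambiguous calls
        hits.append(window)
    if not hits:
        return 0.0, {}
    counts = Counter(hits)
    total = len(hits)
    groups = {s: c / total for s, c in counts.items() if c / total >= min_prevalence}
    return max(counts.values()) / total, groups
```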
The DNA sequence of the plant codon optimized Cas13d-EGFP, with the Cas13d from Ruminococcus flavefaciens (RfxCas13d) flanked by two nuclear localization signals (NLS), was amplified from plasmid pXR001 (Addgene #109049) using Q5 High-Fidelity DNA polymerase (NEB). Similarly, overlap extension PCR was performed to amplify plant expression vector pB_35S/mEGFP (Addgene #135320) with ends that matched the ends of the Cas13 product so RfxCas13d expression would be under the control of the 35S Cauliflower mosaic virus promoter. The PCR products were treated with DpnI (NEB), assembled together in a HiFi DNA assembly reaction (NEB), transformed into NEB10b cells (NEB), and grown overnight on antibiotic selection to create plasmid pB_35S/RfxCas13. Successful clones were identified and confirmed by sequencing, followed by transformation into electro-competent Agrobacterium tumefaciens strain GV3101 (pMP90).
Single stranded oligonucleotides corresponding to “monovalent”, non-targeting (NT), and “polyvalent” gRNAs were purchased from Integrated DNA Technologies (Coralville, Iowa), phosphorylated, annealed, and ligated into binary vector SPDK3876 (Addgene #149275) that had been digested with restriction enzymes XbaI and XhoI (NEB) to be expressed under the pea early browning virus promoter (pEBV). Binary vectors containing the correct constructs were identified, sequenced, and finally transformed into Agrobacterium tumefaciens strain GV3101. Multiplexed expression of two crRNAs was achieved by ligating (annealed, phosphorylated) oligos for two individual crRNAs (hairpin+spacer) together with an internal 4 nt “sticky-end” and into SPDK3876 so both crRNAs would be expressed on a single transcript.
In addition to pB_35S/RfxCas13 and the SPDK3876 vectors harboring gRNA sequences (TRV RNA2), PLY192 (TRV RNA1) (Addgene #148968) and the RNA viral replicon TRBO-GFP (Addgene # 800083) were individually electroporated into A. tumefaciens strain GV3101. Single colonies were grown overnight at 28° C. in LB media (10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl; pH 7). The overnight cultures were then centrifuged, re-suspended in infiltration media (10 mM MOPS buffer pH 5.7, 10 mM MgCl2, and 200 μM acetosyringone), and incubated for 3-4 hours at 28° C. The above cultures were mixed to a final OD600 of 0.5 for CasRX-NLS-GFP-pB35, 0.1 for PLY192 (TRV RNA1), 0.1 for RNA2-crRNAs, and 0.005 for TRBO-GFP and injected into healthy leaves of five to six-week-old N. benthamiana plants grown under long-day conditions (16 h light, 8 h dark at 24° C.). A total of four leaves for each gRNA were infiltrated. Three days post-transfection, leaves were cut out and photographed under a handheld UV light in the dark, and stored at −80° C. before subsequent analysis.
Referring now to
Referring now to
Referring now to
Referring now to
Total RNA was extracted from infiltrated leaves using the RNeasy Plant Mini Kit (Qiagen) and the yield was quantified using a NanoDrop. A total of 1 μg RNA from control (NT gRNAs) and experimental samples was used for DNase I treatment (Ambion, AM2222) followed by reverse transcription using a poly-dT primer and the Superscript III First Strand cDNA Synthesis System for RT-PCR (Invitrogen). Quantitative PCR was performed on a QuantStudio 3 Real-Time PCR System from Applied Biosystems using iTaq PowerUP™ SYBR Green pre-formulated 2× master mix (Applied Biosystems). Relative expression levels based on fold changes were calculated using the ΔΔCt method. Cycle 3 GFP mRNA expression levels from the TRBO-GFP replicon were normalized against transcripts of the tobacco PP2A. Samples were analyzed in three biological replicates.
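For reference, the ΔΔCt calculation mentioned above can be written as a short Python sketch. The function name and argument names are hypothetical, and the formula assumes approximately 100% amplification efficiency for both the target (GFP) and reference (PP2A) amplicons, which is the usual assumption of the method.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of the target transcript in a sample relative to a control.

    dCt  = Ct(target) - Ct(reference), computed for sample and control
    ddCt = dCt(sample) - dCt(control)
    fold change = 2 ** (-ddCt)
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Illustrative Ct values only (not data from this disclosure).
    print(relative_expression(25.0, 20.0, 22.0, 20.0))
```

In this illustrative case the target amplifies three cycles later in the sample than in the control after normalization, corresponding to one eighth of the control expression level.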
Initial screens were performed using synthetic dsDNA (˜300 bp) containing a T7 promoter located upstream of a specific target sequence derived from either SARS-CoV-2 (
Specificity of Cas13 collateral activity was evaluated using dsDNA fragments that were not complementary to the gRNAs being tested, to confirm that activation of collateral activity was target-specific. Human universal RNA (10 tissues) (Invitrogen ThermoFisher, CA US) and total human lung RNA (Invitrogen ThermoFisher, CA US) were also used, at 1 and 3 μg per reaction, respectively.
Heat-inactivated SARS-CoV-2 RNA from respiratory specimens, deposited by the Centers for Disease Control and Prevention, was obtained through BEI Resources, NIAID, NIH: Genomic RNA from SARS-Related Coronavirus 2, Isolate USA-WA1/2020, NR-52285 (American Type Culture Collection (ATCC) VR-1986HK). In a SHERLOCK-type reaction, 1 μl of heat-denatured SARS-CoV-2 (350,000 copies total) was reverse transcribed using the High Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific) with 3.4 μl of primer (0.5 μM) in a final volume of 16 μl and PCR-amplified by the addition of 2 μl of reverse and forward target primers (2 μM) and 20 μl of 2× OneTaq Master Mix (NEB) in a final volume of 40 μl under standard thermocycler conditions (2 min at 95° C., followed by 35 cycles of 30 s at 95° C., 30 s at 49° C., and 30 s at 68° C., followed by a final extension of 5 min at 72° C.). PCR cDNA targets were then combined accordingly, and serial dilutions were made such that the final concentration of the starting SARS-CoV-2 RNA material in the SHERLOCK reaction was adjusted to either 400, 40, or 4 copies per μl for each target. SHERLOCK reactions were performed as described earlier using candidate pgRNAs and their monovalent counterparts in the presence of none (background), one, two, or four cDNA targets per reaction. SHERLOCK reactions in the absence of guide RNA were also evaluated and resulted in background signals equivalent to those produced from no-RNA-template controls.
Single guide RNA (sgRNA) was synthesized by using the EnGen sgRNA synthesis Kit (NEB, New England Biolabs, Ipswich, Mass., United States) following standard protocols. DNA oligos (IDT) were designed to contain a T7 promoter sequence upstream of the target sequences with an initiating 5′- d(G), as well as overlapping tracrRNA DNA sequence at the 3′ end of the target. The sgRNA was purified using Monarch RNA Cleanup Kit (NEB) and quantitated using standard protocols.
Duplex CRISPR gRNAs (crRNA:tracrRNA) were generated by hybridizing synthetic RNA oligos listed in Table S9 to a universal synthetic tracrRNA oligo (IDT). To hybridize oligos, equal molar concentrations of oligos were combined in IDT duplex buffer to a final concentration of 10 μM. Reactions were heated to 95° C. for 2 min and allowed to cool to room temperature prior to the reaction assembly.
Cas9 Nuclease from S. pyogenes (NEB) was diluted in 1× NEB Buffer 3.1 prior to the reaction assembly. Cas9 cleavage assays were performed using either PCR-amplified targets, whole plasmid, or hybridized DNA oligos containing desired targets using standard methods. Briefly, Cas9 was preincubated with either a sgRNA or duplex gRNA (crRNA:tracrRNA) for 5 min at equal molar concentrations in 1× NEB Buffer 3.1 (NEB) in a total volume of 10 μl. Reactions were incubated for 5-10 min at room temperature. Target DNA was then added to the reactions, NEB Buffer 3.1 was added back to a final concentration of 1×, and nuclease-free water was added, bringing the final volume to 20 μl. The final reaction contained 100 nM Cas9-CRISPR complex and 10 nM of target DNA. Similar reactions without the addition of gRNAs to Cas9 were used as a control for uncut DNA. Reactions were incubated at 37° C. for 1 hour, followed by the addition of 1 unit of Proteinase K and further incubation at 56° C. for 15 min. Reactions were stopped by the addition of one volume of purple Gel Loading dye (NEB).
Fragments were separated and analyzed using a 1.5% agarose gel in 1×TAE and 1×SYBR Green I Nucleic Acid Gel Stain (Thermo Fisher Scientific; Waltham, Mass.), and fluorescence was photographed and measured (Amersham™ Imager 600; GE Life Sciences, Piscataway, N.J., United States).
Despite significant differences in the goals and desired outcomes between CRISPR precision gene editing and CRISPR antivirals as illustrated in
In the case of precision gene editing as shown in
In contrast, for CRISPR antivirals as shown in
Referring to
We hypothesized that, if we could match target sequences within a viral genome to other targets on the same viral genome with some shared sequence homology, a single gRNA spacer sequence could be adapted to maximize CRISPR activity at both targets; this is, in effect, the opposite of what is performed during gRNA design for precision gene editing. The development of “polyvalent” gRNAs (with one spacer able to target multiple protospacers) would have multiple advantages for CRISPR antiviral applications: operative “multiplexing” with fewer components, limiting the potential for viral escape, and increasing the effective number of potential “targets” a CRISPR effector could recognize in viral detection applications. This approach could exploit the many validated tools that are currently used to predict and minimize off-target activity to instead maximize the predicted activity at both those sites. However, because of the differences in the objectives of current gRNA design tools, polyvalent gRNAs would normally be algorithmically rejected, so new approaches are necessary.
The design of polyvalent gRNAs or pgRNAs relies on exploiting known tolerances of CRISPR effectors for mismatches between gRNA and the target to maximize activity at multiple viral sites. These tolerances exhibit a strong dependence on both the type of mismatch (what nucleotides are incorrectly paired) and the position of the mismatch(es) along the target, and vary not only by type of CRISPR effector but across homologues of the effector derived from different species.
Careful and systematic studies have been performed to better predict and minimize the propensity of “off-target” effects in gene editing; for the design of pgRNAs, we can use these same studies to instead attempt to maximize activity of a single gRNA at multiple viral sites. A metric to score the relative propensities of a CRISPR effector at a site that does not perfectly match its target that is both powerful and simple to implement uses a Cutting Frequency Determination (CFD) matrix to estimate the penalty or relative decrease in CRISPR activity at off-target sites as a result of each difference in sequence between the target and that site. This approach is described in more detail in the Materials and Methods section. The CFD matrix consists of the mismatch- and position-specific penalties that have been derived from massively parallel characterizations of off-target CRISPR activity, and for each expected mispairing between the gRNA and the off-target site, these penalties are multiplied together to obtain a final score or relative expected CRISPR activity at that site. CFD scores in precision gene editing are used to reject gRNAs which may exhibit high activities at multiple sites in a targeted genome.
The design of pgRNAs can use CFD scores as an example metric for increasing predicted activity at multiple viral sites based at least in part on the following approach as shown in
For instance,
More particularly, candidate pgRNAs were also evaluated in silico for biophysical characteristics, like GC %, secondary structure free energy, and the ability of the ‘direct repeat’ segment of the gRNA to form (which is essential for CRISPR activity) as preliminary indicators for a high likelihood of strong on-target activities. We note that the CFD calculated in the way described above provides an estimate of CRISPR activity at the viral sites relative to a hypothetical target with a sequence perfectly complementary to the pgRNA spacer: this allows us later to integrate our pgRNA design algorithm into other computational tools that predict CRISPR activity at on-target/perfectly matched sequences.
We first sought to determine if we could generate novel pgRNAs for RfxCas13d that could be expected to exhibit high activity at multiple viral targets in SARS-CoV-2, the etiological agent of the human infectious respiratory illness COVID-19, while maintaining minimal activity with potential human off-targets (
For instance,
We first identified 81 pairs of target sites along the SARS-CoV-2 reference genome that had >75% (21/27) nt sequence identity (
562
253
1. Number of targets on both strands to the immediate 5′- of a Tier 1 or Tier 2 enAsCas12a PAM.
2. 177 pairs, including exact matches located within the long terminal repeat (LTR) regions of the HIV-1 provirus.
3. 125 candidates identified with <10% activity vs. human transcriptome and >95% activity targeting the reference strain sequence; 25 candidates identified with <10% activity vs. human transcriptome and >95% activity across clinical strains.
The viral target sites for CRISPR effectors are often chosen based not only on the gene product encoded but also on conservation of nucleotide sequence across clinical strains or related viral families. However, based on the differential ability of CRISPR effectors to recognize and degrade targeted sequences in spite of mismatches between the gRNA and the protospacer, we endeavoured to quantify the “conservation of targets” (rather than sequence, per se) as potential target sites where CRISPR effectors may be highly active across strains regardless of the presence of certain sequence variations. To evaluate the “target conservation” at each of these candidate pgRNA spacers, first we aligned the 942 sequenced viral genomes from clinical samples to the reference Wuhan-1 sequence and characterized their variability. Approximately 50% (50.07%) of the target sites possessed sequence identity, or perfect sequence conservation (SC), across all 942 samples over the entire 27 nt range (
Genetic targets for detection and inactivation of the SARS-CoV-2 virus have largely been focused on the highly conserved genes for nucleocapsid protein N and the gene for the RNA-dependent RNA polymerase (RdRP), which is essential for viral replication. Interestingly, the top candidate pgRNA spacers each have two target sites localized across ORF1ab, which encodes a large polyprotein later processed into smaller nonstructural proteins (nsp), several of which are important for viral replication. Two of the pairs have one target within the segment of ORF1ab that encodes the RdRP. The results presented here demonstrate that pgRNAs can be designed for RfxCas13d that simultaneously are expected to exhibit high relative activity at multiple (essential) target sites on the SARS-CoV-2 genome for which “target conservation” is high, while minimizing expected interactions with the human transcriptome.
UAACA
GU
AUCA-3′ (SEQ
CU
U
UA
A
CAGCA-3′ (SEQ
CAGCU
A
CUG
U
A-3′ (SEQ
ACCAGAGCA
U
C-3′ (SEQ
AUA
A
A
C
GU
GU
C-3′ (SEQ
1. pgRNAs have >95% predicted relative activity at both targets; <10% predicted relative activity at hits to human transcriptome; and >95% predicted relative activity across all (948) clinical strains
2. Underlined at sites where Target A and Target B/C sequences diverge
3. Labelled according to np (nucleotide position) of central nucleotide of 27 nt protospacers (according to SARS-CoV-2 Wuhan-1 strain). nsp: nonstructural protein
4. Δ: Increase in predicted CRISPR activity by using pgRNA at target A or B, compared to using gRNA for target A at target B (or vice versa)
5. Nucleotide BLAST targeted for short (<30 nt) sequences vs. GRCh38.p12 RefSeq transcripts.
To determine whether we could generate pgRNAs against a dsDNA virus, we targeted the HIV-1 provirus using a Cas12a effector (
However, there are additional challenges for targeting the HIV-1 proviral genome using Cas12a. The HIV-1 proviral genome is smaller (9719 bp) than the SARS-CoV-2 genome and, while both strands of the dsDNA could be targeted, unlike Cas13, Cas12a can only target sequences positioned immediately downstream of a protospacer adjacent motif (‘PAM’) that is recognized by the enzyme itself rather than the gRNA. Even with engineered enAsCas12a, which is able to recognize a larger number of PAMs than the native enzyme, strong PAM sequences (Tier 1 or Tier 2) able to activate robust endonucleolytic activity only appear on average once every 16 bp. Additionally, off-target DSBs on the human genome hold the potential for significant deleterious consequences, so we require pgRNAs with even less potential for accidental targeting of human off-targets than for Cas13.
In particular,
With these considerations taken into account, we identified 177 target sites next to Tier 1 or Tier 2 PAMs of enAsCas12a in the HIV-1 proviral genome that shared >75% homology across 23 bp targets (
For instance,
To further validate proposed example implementations, a Cas9 pgRNA was designed for two virally-derived targets. As shown in
The crRNA spacer sequences and target sequences for the above data are provided below:
These results demonstrate that, even subject to the additional constraints, multiple pgRNAs for enAsCas12a could be generated that are able to target multiple viral sites simultaneously while maintaining high specificity. These candidates can then be introduced into the computational predictors for on-target enAsCas12a activity and validated experimentally, where they are expected to strongly suppress reactivation of HIV-1.
AA
C
CAGU-3′ (SEQ ID
CA
GCCA
G-3′ (SEQ ID
C
UUCCCU-3′ (SEQ ID
UUCCU
A
U-3′ (SEQ ID
UAACAGA-3′ (SEQ ID
1. pgRNAs have predicted relative activity at both sites >20%; both targets have Tier 1 or Tier 2 PAM sites; and predicted relative activity at BLASTn hits to human genome <1%.
2. Underlined at sites where Target A and Target B sequences diverge
3. Labelled at the first position of the protospacer 3′- the PAM site, according to Human immunodeficiency virus type 1 (HXB2), complete genome; HIV1/HTLV-III/LAV reference genome
4. Increase in predicted CRISPR activity by using pgRNA at target A or B/C, compared to using gRNA for target A at target B (or vice versa)
5. GRCh38.p12 human genome reference sequence
An analysis of 2,372 genomes of RNA viruses in the NCBI Reference Sequence database revealed that these homologous pairs of Cas13-targetable sites (23 nt) with >70% identity (>16 out of 23 nt) are prevalent across RNA viruses of mammals, birds, arthropods, and plants: RNA viruses with genomes that are 5,000 nt in length have on average around 30 such pairs, and those with genomes that are 10,000 nt in length have on average approximately 120, obeying a power law scaling with genome length. For human-hosted RNA viruses, we could identify 19,926 of these homologous target pairs across 89 viruses.
Candidate pgRNA sequences for each pair are then generated in silico by determining what nucleotides at the positions of divergent sequence between the two targets would allow for and maximize predicted activity at both sites (
Sequences with predicted biophysical properties that might negatively impact expression or activity such as strong predicted secondary structures or the presence of mononucleotide stretches are then removed from consideration, as are any sequences with more than 65% complementarity with potential “off-targets” in the host genome or transcriptome (with at least 15 nts complementarity for 23 nt Cas13 targets), yielding a final set of pgRNA candidates with high predicted activity at multiple viral sites and effectively no predicted “off-target” activity vs. the host. To illustrate the broad potential applicability of our approach, we found we could design pgRNA candidates for RNA-targeting Cas13d from Ruminococcus flavefaciens XPD3002 (RfxCas13d) with predicted activity at both their targeted sites ranking in the top quartile of all “monovalent” gRNAs for that virus and no significant homology/predicted activity vs. the human transcriptome for 53 of the 59 (+) ssRNA viruses or expressed viral mRNA sequences in the NCBI Reference Sequence database. RfxCas13d, which has been used in CRISPR-based viral diagnostics and was recently demonstrated to disrupt influenza and SARS-CoV-2 virulence in human epithelial cells, was found to exhibit significant tolerance to mismatches relative to other CRISPR effectors and does not require specific flanking sequences next to its targets, so RfxCas13d may represent an optimal effector for antiviral applications in that regard.
To test our hypothesis that pgRNAs targeting multiple viral sites simultaneously would inhibit viral propagation in vivo during a viral infection better than their monovalent counterparts, we designed pgRNAs for RfxCas13d to target pairs of protospacers found in the tobacco mosaic virus (TMV) and infected Nicotiana benthamiana with a TMV replicon (TRBO-GFP) via Agrobacterium tumefaciens-mediated transformation into its leaves (
After three days, plants expressing one of six different monovalent gRNAs showed viral RNA levels in their leaves reduced to approximately 10% to 25% of those in plants that were not targeting TMV via Cas13 (
After target recognition and cleavage, many Cas13 variants undergo a conformational change and exhibit “collateral activity” or a non-specific RNase activity that has been used for applications in viral diagnostics such as SHERLOCK (
To assess whether pgRNAs might be suitable for in vitro viral diagnostics, we generated a series of 23 pgRNAs with high predicted activity at 15 target pairs found in SARS-CoV-2, then screened their collateral activity in the presence of their SARS-CoV-2 RNA targets and compared those results with the combined activity of their perfectly matched monovalent gRNA counterparts (30 separate gRNAs). We found that each of the pgRNAs tested exhibited collateral activity at levels similar to or higher than their combined monovalent gRNA counterparts with both targets present in the same sample, and no off-site collateral activity was detected in the presence of non-targeted RNA sequences, universal human reference RNA (10 human cell lines; ThermoFisher Scientific), or human lung total RNA (ThermoFisher Scientific) (3 μg RNA). We then assessed their limits of detection (LoD) in a SHERLOCK-type assay using Cas13 and the best-performing pgRNAs, and found that Cas13 with single pgRNAs (recognizing two sites) or two pgRNAs (recognizing four) could robustly generate detectable signals in samples initially containing 40 cp/μL heat-inactivated SARS-CoV-2 (clinically relevant LoD for SARS-CoV-2 is often considered to be 1000 cp/μL) (
Last, we sought to determine whether the design principles we use for pgRNAs could be applied to gRNAs of other types of CRISPR effectors like the Cas9 effector from Streptococcus pyogenes (SpyCas9), which recognizes and introduces double-strand breaks into dsDNA targets (
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
The CRISPR effector proteins used in biotechnological applications were originally found in bacteria and archaea as an antiviral mechanism to degrade foreign DNA and RNA, and so some tolerance to sequence variation in their targets is likely beneficial for this purpose. In gene editing applications, this tolerance is suppressed to the greatest extent possible using a number of strategies to prevent degradation and mutations at any sequence not exactly matching the gRNA spacer sequence. Rather, in a new gRNA design paradigm for antiviral applications, we show that the polyvalent targeting of viruses by single engineered gRNAs—optimized based on the CRISPR effector's natural position- and sequence-determined tolerance for mismatches for activity at the homologous target pairs that are abundant in viral genomes—can drive robust CRISPR activity at specific targeted pairs simultaneously in vitro/ex vivo, can exhibit stronger viral suppression during infection of a higher organism relative to “monovalent” targeting, and may in fact be optimal for applications of CRISPR antiviral diagnostics, prophylactics, and therapeutics.
The present application claims priority to U.S. Provisional Application Ser. No. 63/128,453, filed Dec. 21, 2020, the contents and substance of which are incorporated herein in their entirety by reference.
This invention was made with government support under contract nos. A20-0074-001 and RAMSeS 19-0113 awarded by the National Institutes of Health and National Institute of General Medical Sciences, respectively. The government has certain rights in the invention.
| Number | Date | Country | |
|---|---|---|---|
| 63128453 | Dec 2020 | US |