The present application claims the benefit of Chinese application for invention No. CN202110686751.3 filed on Jun. 21, 2021, the content of which is incorporated herein in its entirety.
The present invention pertains to the field of genetic engineering. Specifically, the present invention relates to a library construction method based on long overhang sequence ligation, a library constructed by the library construction method, for example, a CRISPR library of pair-specific multiplexed gRNA (guide RNA) combinations, and a use of the library.
With the development of systems biology, various high-throughput biotechnology methods have emerged. In the field of molecular biology, various library screening techniques can identify genotype-phenotype relationships without prior knowledge. The core steps of library screening technology include: (1) high-complexity library construction, and (2) phenotype screening. Among them, library construction mainly involves the design and synthesis of high-complexity oligonucleotide fragment (oligo) libraries, and the commonly used synthesis technique that meets the requirements is mainly chip-based oligonucleotide pool (oligo pool) synthesis. The limitations on price, quality and length of oligonucleotide pool synthesis have currently become the main bottlenecks in the field.
There are 20,000 to 30,000 genes in the human body, and their functions vary in different biological processes and environments. In order to study the function of each gene in specific biological processes and environments, a variety of high-throughput gene function screening tools have been developed. Genetic screening based on CRISPR (clustered regularly interspaced short palindromic repeats) editing technology is an important tool for such studies. In CRISPR technology, a guide RNA (gRNA) that binds a gene is designed to guide a corresponding enzyme to interfere with the expression of a target gene. At present, genetic screening based on CRISPR editing technology can perturb one gene in a single cell and different genes in different cells, thereby achieving the effect of screening all functional genes in a cell population.
However, from a biological perspective, combinatorial behaviors of genes widely exist in cells, which is a common characteristic of complex organisms. In other words, most biological states and functions of cells are not determined by molecules expressed by a single gene, but are jointly regulated by groups of molecules expressed by multiple genes. These molecules are successively expressed by genes and activated to transmit biological signals and regulate various biological states and behaviors of cells. In cells, the combined effects of various regulatory genes are complex and diverse, and there are compensatory mechanisms. The combined effect of multiple genes cannot be effectively perturbed by targeting a single gene, which hinders the application of CRISPR editing technology to the screening and research of the biological states and behaviors of cells regulated by multiple genes. The lack of high-throughput methodology hinders the combinatorial mapping from genotype to phenotype. In many cases, perturbing a single gene is insufficient to produce a phenotype of interest. For example, in cancer progression, sets of transcription factors crosstalk with each other to orchestrate the invasion-metastasis cascade. Therefore, a screening method for high-order combinatorial genetic perturbation is urgently needed to accelerate the research and discovery of complex gene coordination.
From a technical view, a library of lower complexity is favored in many genetic screens, as a large library typically requires more effort to construct and needs a large number of host cells to achieve decent coverage. In applications where cells are difficult to obtain, or are to be injected into animals, optimized small libraries have intrinsic advantages.
So far, it is still challenging to multiplex more than two pre-designed genes in CRISPR/Cas9 screens. The challenge stems from the length limitation of oligo pool synthesis, which is typically around 150 nt and can fit 2 to 3 sgRNAs at most. Although longer oligo synthesis is theoretically possible, the error rate and cost increase quickly with length, so that oligo pools synthesized at regular length are still preferred and more practical. The crRNA for Cpf1 editing has the advantage of a shorter unit length, and up to three units may fit into one oligo sequence with proper design, but the limited options for PAM sequences, especially around promoter regions with high GC content, restrict it from being a universal solution for high-order combinatorial genetic screens.
Therefore, there is an urgent need in the art for a library construction method based on long overhang sequence ligation with cloning accuracy and efficiency reaching the level of library construction, and a library constructed by the library construction method.
In the present application, the present inventors have developed, as exemplary embodiments, a high-complexity library construction method that realizes long fragment ligation at the library level, specifically a library construction method based on long overhang sequence ligation. Using this method, the inventors constructed a pair-specific multiplexed gRNA combination library for CRISPR editing and screening, and thereby achieved CRISPR library screening for combinations of 4 gRNAs at the same time.
For example, in order to solve at least one of the problems existing in the prior art and enable massively parallel characterization of multiplexed and pre-designed gRNA combinations, the present inventors developed an in-library ligation method that enables the ligation of thousands of sequences to their specific counterparts, which generated a 4gRNA-comb library with high accuracy. Compared to previous two-gRNA screening, the exemplary 4gRNA-comb library of the present invention facilitated the discovery of high-order gene coordination with higher efficiency, which cannot be achieved by existing gRNA screening techniques. Besides, the 4gRNA-comb library could significantly reduce the library complexity required for the discovery of candidates, as a 4-gRNA combination contains multiple subsets, for example, four single-gRNA subsets, six double-gRNA pairs and four three-gRNA subsets. Moreover, a candidate 4-gRNA combination from screening could be dissected for further investigation, e.g., to further analyze and probe the screened subsets, or to identify synergistic effects among genes, and so on.
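As an illustration only, the subset counts mentioned above (four single gRNAs, six pairs and four triples within one 4-gRNA combination) can be reproduced by simple enumeration. The following is a minimal sketch; the function name `grna_subsets` and the labels `gRNA1` to `gRNA4` are placeholders chosen for this example and are not part of the described method.

```python
from itertools import combinations

def grna_subsets(grnas):
    """Group all proper, non-empty subsets of a gRNA combination by subset size."""
    return {k: list(combinations(grnas, k)) for k in range(1, len(grnas))}

# Placeholder labels for one 4-gRNA combination (illustrative only).
combo = ("gRNA1", "gRNA2", "gRNA3", "gRNA4")
subsets = grna_subsets(combo)
counts = {size: len(members) for size, members in subsets.items()}
print(counts)  # {1: 4, 2: 6, 3: 4}
```

Each 4-gRNA vector therefore implicitly covers 14 lower-order subsets, which is one way to read the statement that the required library complexity may be reduced.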
One object of the present invention is to provide a CRISPR library of pair-specific multiplexed gRNA combinations, so as to perform high-throughput screening for effect of specific multi-gene combinations in regulating the biological state and behavior of cells.
Another object of the present invention is to provide a library construction method based on long overhang sequence ligation, in which the library construction method can optimize the cloning method of pair-specific multiplexed gRNA combinations, so that the cloning efficiency can reach the level for library construction.
In a first aspect, the present invention provides a CRISPR library of pair-specific multi-gene combinations, in which the CRISPR library comprises a plurality of vectors each carrying more than two kinds of gRNA sequences, for example, each vector in the CRISPR library carries 3 to 6 kinds of gRNA sequences, for example, each vector in the CRISPR library carries 4 kinds of gRNA sequences.
The more than two kinds of gRNA sequences carried on each vector in the CRISPR library are capable of performing co-editing of more than two kinds of important molecules (e.g., genes). For example, when each vector in the CRISPR library carries 4 kinds of gRNA sequences, the 4 kinds of gRNA sequences carried on each vector are capable of performing the co-editing of four kinds of important molecules (e.g., genes). The more than two kinds of important molecules may be molecules in one or more signaling pathways, or one or more gene families, for example, may be molecules in the same signaling pathway or the same gene family, or may be molecules in different signaling pathways or different gene families.
In one embodiment, the more than two kinds of gRNA sequences may be a combination of 3 kinds of gRNA sequences, a combination of 4 kinds of gRNA sequences, a combination of 5 kinds of gRNA sequences, a combination of 6 kinds of gRNA sequences, etc.; however, considering factors such as construction cost and error rate, a combination of 4 kinds of gRNA sequences is preferred. In addition, the more than two kinds of gRNA sequences are separated from one another by tRNA sequences. When more than two kinds of tRNAs are present, the sequences of the tRNAs can be the same or different.
In one embodiment, the present invention provides a CRISPR library of pair-specific multi-gene combinations, the CRISPR library comprises a plurality of vectors each carrying 4 kinds of gRNA sequences, and the 4 kinds of gRNA sequences carried on each vector are capable of targeting and co-editing any 4 kinds of important molecules.
Those skilled in the art can understand that the gRNA sequences can be selected and designed according to the gene sequences and gene number in the pathways and gene families to be screened to determine whether synergistic genes exist, and the number of recombinant vectors in the CRISPR library can be determined according to the desired coverage rate after the gRNA sequences are combined.
For example, in one embodiment, the CRISPR library of pair-specific multi-gene combinations comprises: approximately more than 6000 vectors each carrying 4 kinds of gRNA sequences, in which the 4 kinds of gRNA sequences carried on each vector are directed to co-editing of 4 kinds of important molecules.
In one embodiment, in the CRISPR library of pair-specific multi-gene combinations, each vector comprises an insert fragment as shown by gRNA1-tRNA1-gRNA2-tRNA2-gRNA3-tRNA3-gRNA4, wherein gRNA1, gRNA2, gRNA3 and gRNA4 are directed to editing of 4 different genes, respectively.
In one embodiment, the insert fragment shown by gRNA1-tRNA1-gRNA2-tRNA2-gRNA3-tRNA3-gRNA4 further comprises a U6 promoter, preferably a human U6 promoter, at the 5′-end.
In one embodiment, the sequences of tRNA1, tRNA2 and tRNA3 may all be the same, or two of them may be the same and one is different, or all three of them may be different. In addition, the tRNA1, tRNA2 and tRNA3 can be derived from different species, which can be appropriately selected according to the research purpose.
In a preferred embodiment, the sequences of tRNA1, tRNA2 and tRNA3 can be represented by SEQ ID NOs: 705, 706 and 707, respectively.
In a second aspect, the present invention provides a method for constructing a CRISPR library of pair-specific multiplexed gRNA combinations based on long overhang sequence ligation, the method comprising the following steps:
In one embodiment, those skilled in the art will appreciate that in step (1), a mixture of two or more oligonucleotide chain pools, such as a mixture of oligonucleotide chain pools 1, 2 and 3, a mixture of oligonucleotide chain pools 1, 2, 3 and 4, and the like, may be designed and synthesized, according to the number of gRNAs to be constructed in one vector of the CRISPR library and the requirements in practice, and each oligonucleotide sequence in each oligonucleotide chain pool may comprise a suitable number of gRNAs, for example, one or more kinds of gRNAs. There is no limitation to the number of the oligonucleotide chain pools in the mixture, provided that, no matter how many oligonucleotide chain pools are designed in the mixture, they will be ligated by their complementary long overhangs into linear library sequences, after PCR amplification and nicking endonuclease digestion. For example, if the mixture of two or more oligonucleotide chain pools comprises three oligonucleotide chain pools, i.e., oligonucleotide chain pools 1, 2 and 3, for 3′-end of each sequence in oligonucleotide chain pool 1, there is only one kind of 5′-end sequence completely complementary thereto in oligonucleotide chain pool 2, and for 3′-end of each sequence in oligonucleotide chain pool 2, there is only one kind of 5′-end sequence completely complementary thereto in oligonucleotide chain pool 3, wherein the complementary portion has a sequence length of 2-100 nucleotides, such as 4-50, 10-40, 15-35, 20-30, preferably 21 nucleotides (21 nt). In this way, after PCR amplification and nicking endonuclease digestion, the sequence derived from oligonucleotide chain pool 1, the sequence derived from oligonucleotide chain pool 2, and the sequence derived from oligonucleotide chain pool 3, may be ligated into a linear sequence (i.e., by their complementary long overhangs), which comprises pair-specific multiplexed gRNA combinations.
For example, for constructing a CRISPR library of pair-specific 4-gRNA combinations, a mixture of two oligonucleotide chain pools may be designed, in which each oligonucleotide sequence in either oligonucleotide chain pool comprises 2 kinds of gRNAs, or each oligonucleotide sequence in one oligonucleotide chain pool comprises 1 gRNA and each oligonucleotide sequence in the other oligonucleotide chain pool comprises 3 kinds of gRNAs; or a mixture of three oligonucleotide chain pools may be designed, in which each oligonucleotide sequence in one oligonucleotide chain pool comprises 2 kinds of gRNAs, and each oligonucleotide sequence in the remaining two oligonucleotide chain pools comprises 1 gRNA.
In another embodiment, for constructing a CRISPR library of pair-specific 5-gRNA combinations, a mixture of two oligonucleotide chain pools may be designed, in which each oligonucleotide sequence in one oligonucleotide chain pool comprises 2 kinds of gRNAs, and each oligonucleotide sequence in the other oligonucleotide chain pool comprises 3 kinds of gRNAs; or a mixture of three oligonucleotide chain pools may be designed, in which each oligonucleotide sequence in one oligonucleotide chain pool comprises 1 gRNA, and each oligonucleotide sequence in two oligonucleotide chain pools comprises 2 kinds of gRNAs, and the like.
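The pool layouts listed above (for example, 2+2, 1+3 or 2+1+1 gRNAs for a 4-gRNA library) can be enumerated systematically. The sketch below is illustrative only: the function name `pool_layouts` is hypothetical, and the cap of 3 gRNAs per oligonucleotide is an assumption based on the background remark that an oligo of about 150 nt fits 2 to 3 sgRNAs at most.

```python
def pool_layouts(n_grnas, max_per_oligo=3, min_pools=2):
    """Enumerate ways to split n_grnas across oligonucleotide chain pools.

    Each tuple gives the number of gRNAs carried per oligo in each pool,
    largest first; the order of pools is ignored.
    max_per_oligo=3 is an assumption, not a limit stated by the method.
    """
    results = []

    def extend(remaining, largest, parts):
        if remaining == 0:
            if len(parts) >= min_pools:
                results.append(tuple(parts))
            return
        # Non-increasing part sizes avoid listing the same split twice.
        for size in range(min(largest, remaining), 0, -1):
            extend(remaining - size, size, parts + [size])

    extend(n_grnas, max_per_oligo, [])
    return results

print(pool_layouts(4))  # [(3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
print(pool_layouts(5))  # [(3, 2), (3, 1, 1), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1)]
```

The enumeration also returns layouts with more pools than the examples named in the text; these are consistent with the statement that the number of pools is not limited, but each additional pool adds one more ligation junction.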
In one embodiment, the present invention provides a method for constructing a CRISPR library of pair-specific multiplexed gRNA combinations based on long overhang sequence ligation, the method comprising the following steps:
In one embodiment, for the 3′ end of each kind of sequence in the oligonucleotide chain pool 1, there is only one kind of 5′-end sequence completely complementary thereto in the oligonucleotide chain pool 2, and the complementary portion has a sequence length of 2-100 nucleotides, such as 4-50, 10-40, 15-35, 20-30, preferably 21 nucleotides (21 nt) in the present invention. The sequence of the complementary portion is capable of forming a complementary overhang after it is subjected to PCR amplification and nicking endonuclease digestion. That is, the term “long overhang” or “long overhang sequence” as used herein corresponds to the complementary portion between the oligonucleotide chain pool 1 and the oligonucleotide chain pool 2. Generally, the long overhang sequence may have a length of 2-100 nucleotides, such as 4-50, 10-40, 15-35, 20-30, preferably 21 nt.
It will be understood by those skilled in the art that the length of the complementary portion sequence (i.e., the long overhang sequence to be generated therefrom) may be longer or shorter in specific instances. The selection of the length should be considered in two aspects: (1) it is long enough to ensure that enough one-to-one correspondence combinations between the single-chain pool 1 (i.e., the oligonucleotide chain pool 1) and the single-chain pool 2 (i.e., the oligonucleotide chain pool 2) can be generated; (2) the total length of the designed sequence does not exceed an upper limit of the current general synthesis of oligonucleotide single-chain pools.
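The two considerations above can be expressed as a rough feasibility check. This is a sketch under stated assumptions: the function names are hypothetical, the 150-nt limit is taken from the background discussion of oligo pool synthesis, and the payload and flank lengths in the usage lines are invented numbers for illustration. The counting bound is only a lower limit; it does not account for the extra margin needed to keep similar overhangs from cross-annealing.

```python
def min_overhang_length(n_pairs):
    """Smallest overhang length n (in nt) such that 4**n >= n_pairs.

    Pure counting bound: it guarantees enough distinct sequences exist,
    not that they are sufficiently different from one another.
    """
    n = 1
    while 4 ** n < n_pairs:
        n += 1
    return n

def fits_synthesis_limit(payload_nt, overhang_nt, flank_nt, limit_nt=150):
    """Check that the designed oligo does not exceed the synthesis length limit."""
    return payload_nt + overhang_nt + flank_nt <= limit_nt

# About 6000 pair-specific combinations, as in the exemplary library size.
print(min_overhang_length(6000))  # 7

# Hypothetical lengths: 80 nt of gRNA payload, 21 nt junction, 40 nt of primer/enzyme sites.
print(fits_synthesis_limit(payload_nt=80, overhang_nt=21, flank_nt=40))  # True
```

A 21-nt junction is far above the 7-nt counting minimum for a library of this size, which leaves room to choose junction sequences that differ from one another at many positions.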
In one embodiment, in step (2), the reverse primer in the primer pair used for the amplification of the oligonucleotide chain pool 1 is biotinylated, and the forward primer in the primer pair used for the amplification of the oligonucleotide chain pool 2 is biotinylated, so that a double-stranded amplification product with biotin in one amplified chain is obtained, and in step (3), a biotin-carrying small fragment generated during the nicking endonuclease digestion can be removed by using a streptavidin magnetic bead, thereby more thoroughly exposing the 3′- and 5′-overhang products of the oligonucleotide chain pools.
In one embodiment, in step (3), the library chain pools 1 and 2 obtained by the PCR amplification are digested by nicking endonuclease, respectively, wherein the sequence in library chain pool 1 forms 3′-end long overhang of 21 nt, and the sequence in library chain pool 2 forms 5′-end long overhang of 21 nt; for the pair-specific gRNA combinations, the two overhang sequences are complementary, and the corresponding two DNA chains can be specifically ligated under the action of T4 ligase, thereby obtaining a DNA sequence shown by gRNA1-gRNA2-gRNA3-gRNA4.
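The pairing logic of this step can be sketched in a few lines: a pool 1 fragment is joined to a pool 2 fragment only when the 3′ overhang of the former is the reverse complement of the 5′ overhang of the latter. The example below is a simplified model, not the laboratory procedure; the fragment names, the dictionary keys and the two 21-nt overhang sequences are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def pair_specific_ligation(pool1, pool2):
    """Join each pool 1 fragment to its unique complementary pool 2 partner."""
    by_overhang = {}
    for frag in pool2:
        by_overhang.setdefault(frag["overhang_5p"], []).append(frag["name"])
    products = []
    for frag in pool1:
        partners = by_overhang.get(reverse_complement(frag["overhang_3p"]), [])
        if len(partners) == 1:  # one-to-one correspondence required
            products.append(f'{frag["name"]}-{partners[0]}')
    return products

# Invented 21-nt overhangs (5'->3'); pool 2 is deliberately listed in a different order.
pool1 = [
    {"name": "gRNA1a-gRNA2a", "overhang_3p": "ACGTTGCAAGCTTAGGCATCA"},
    {"name": "gRNA1b-gRNA2b", "overhang_3p": "TTGACCGATACGGATCCAGTA"},
]
pool2 = [
    {"name": "gRNA3b-gRNA4b", "overhang_5p": reverse_complement("TTGACCGATACGGATCCAGTA")},
    {"name": "gRNA3a-gRNA4a", "overhang_5p": reverse_complement("ACGTTGCAAGCTTAGGCATCA")},
]

products = pair_specific_ligation(pool1, pool2)
print(products)
# ['gRNA1a-gRNA2a-gRNA3a-gRNA4a', 'gRNA1b-gRNA2b-gRNA3b-gRNA4b']
```

Because matching is by overhang sequence rather than by position in the pool, the order in which fragments are mixed does not affect which products form.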
In one embodiment, the nicking endonuclease used in step (3) may be selected from, but not limited to, Nb.BsrDI or Nt.BspQI nicking endonucleases.
In one embodiment, the double-stranded library fragments obtained by annealing and ligation in step (3) may have double-stranded fragments that are poorly matched (for example, mismatched), therefore, before step (4), T7 endonuclease I (T7E1) may be used to perform a digestion reaction to remove these poorly matched double-stranded fragments.
In one embodiment, the vector used in step (4) may be a viral vector, for example, a lentiviral vector, a retroviral vector, an adenoviral vector, an adeno-associated virus vector, but not limited thereto, and those skilled in the art can select an appropriate vector according to actual requirements.
In one embodiment, the sequences of tRNA1, tRNA2 and tRNA3 used in step (5) are all the same, or two of them are the same and one is different, or all three of them are different. In addition, tRNA1, tRNA2 and tRNA3 may be derived from different species, which may be appropriately selected according to the research purpose.
In one embodiment, in step (5), tRNA1, tRNA2 and tRNA3 are sequentially introduced through Golden Gate assembly to form a vector comprising the insert fragment shown by gRNA1-tRNA1-gRNA2-tRNA2-gRNA3-tRNA3-gRNA4. In the introduction of tRNA1, tRNA2 and tRNA3, different endonucleases are used respectively to ensure that each insertion is at a pre-designed position.
In a preferred embodiment, the Type IIS endonucleases used in the three-step reaction for the introduction of tRNA1, tRNA2 and tRNA3 may be AarI, BbsI and BsaI, respectively. Those skilled in the art can design desired restriction sites and select corresponding endonucleases according to actual needs. The method of sequentially introducing tRNA1, tRNA2 and tRNA3 is also within the capability of those skilled in the art.
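The three-step introduction of the tRNAs can be pictured as filling three enzyme-specific slots one after another. The sketch below models only the order of operations; the slot placeholders such as `<AarI site>` are hypothetical labels, and real Golden Gate assembly (paired Type IIS sites, overhang design, digestion and ligation conditions) is not modeled.

```python
# Enzyme-to-tRNA assignment follows the preferred embodiment described above.
STEPS = [("AarI", "tRNA1"), ("BbsI", "tRNA2"), ("BsaI", "tRNA3")]

def introduce_trnas(layout, steps=STEPS):
    """Replace each enzyme-specific slot with its tRNA, one step at a time."""
    history = [list(layout)]
    for enzyme, trna in steps:
        slot = f"<{enzyme} site>"
        if layout.count(slot) != 1:
            raise ValueError(f"expected exactly one slot for {enzyme}")
        layout = [trna if part == slot else part for part in layout]
        history.append(layout)
    return layout, history

# Hypothetical layout of the primary library insert before tRNA introduction.
primary = ["gRNA1", "<AarI site>", "gRNA2", "<BbsI site>", "gRNA3", "<BsaI site>", "gRNA4"]
final, history = introduce_trnas(primary)
print("-".join(final))  # gRNA1-tRNA1-gRNA2-tRNA2-gRNA3-tRNA3-gRNA4
```

Using a different enzyme for each slot is what keeps every insertion at its pre-designed position: a given step can only act on the one slot that carries its recognition site.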
In one embodiment, the insert fragment shown by gRNA1-tRNA1-gRNA2-tRNA2-gRNA3-tRNA3-gRNA4 is under the control of a U6 promoter, preferably a human U6 promoter.
In a specific embodiment, the complete library vectors obtained in step (5) are about 6000 recombinant vectors, and each vector comprises different 4-gRNA combinations, which are capable of performing co-editing of 4 kinds of important molecules in a signaling pathway or a gene family.
In a specific embodiment, the vector used in step (4) is a lentiviral vector, therefore, after obtaining the complete library vector in step (5), the following step is also included: (6) a step of packaging lentivirus with the constructed library vector and detecting lentivirus titer.
In a specific embodiment, the method for constructing a CRISPR library of pair-specific multiplexed gRNA combinations based on long overhang sequence ligation of the present invention comprises the following steps:
In one embodiment, the vector used in step (4) may be a viral vector, for example, a lentiviral vector, a retroviral vector, an adenoviral vector, an adeno-associated virus vector, but not limited thereto, and those skilled in the art may select an appropriate vector according to actual needs.
In one embodiment, the vector used in step (4) is a lentiviral vector, and the method further comprises the following step after obtaining the complete library vector in step (5): (6) a step of packaging lentivirus with the constructed library vector and detecting lentivirus titer.
In an alternative embodiment, the present application provides a method for constructing a CRISPR library of pair-specific multiplexed gRNA combinations based on long overhang sequence ligation, comprising:
Those skilled in the art may appreciate that in step (1), two or more additional mixtures of two oligonucleotide chain pools may be further synthesized according to the designed sequences of the library and the requirements in practice, wherein each oligonucleotide sequence in each oligonucleotide chain pool comprises one or more kinds of gRNAs. Each oligonucleotide sequence in each oligonucleotide chain pool follows the same principle of complementarity, so that after PCR amplification and nicking endonuclease digestion, the oligonucleotide sequences derived from each oligonucleotide chain pool will be ligated into a linear sequence (i.e., by their complementary long overhangs), which comprises pair-specific multiplexed gRNA combinations.
In one embodiment, step (3) may be performed as follows:
In a third aspect, the present invention provides a library construction method based on long overhang sequence ligation, the method comprising the following steps:
Wherein, in step (1), “for the 3′ end of each kind of sequence in the oligonucleotide chain pool 1, there is only one kind of 5′-end sequence completely complementary thereto in the oligonucleotide chain pool 2”, means that the 5′ end of each sequence in the oligonucleotide chain pool 2 can be completely complementary to the 3′ end of one or more sequences in the oligonucleotide chain pool 1; for example, in one embodiment, the 5′ end of each sequence in the oligonucleotide chain pool 2 is completely complementary to the 3′ end of one sequence in the oligonucleotide chain pool 1; in another embodiment, the 5′ end of each sequence in the oligonucleotide chain pool 2 is completely complementary to the 3′ end of a plurality of sequences in the oligonucleotide chain pool 1, so that one target sequence can test a variety of other additional sequences that can be paired with it, which can reduce the cost of library construction, and many such target sequences can be tested simultaneously during a test performed in the library.
Those skilled in the art will appreciate that in step (1), a mixture of more than two oligonucleotide chain pools, such as a mixture of oligonucleotide chain pools 1, 2 and 3, a mixture of oligonucleotide chain pools 1, 2, 3 and 4, and the like, can be designed and synthesized, according to the requirements in practice. There is no limitation to the number of the oligonucleotide chain pools in the mixture. No matter how many oligonucleotide chain pools are designed in the mixture, they will be ligated by their complementary long overhangs into a linear sequence, after PCR amplification and nicking endonuclease digestion. For example, if the mixture of more than two oligonucleotide chain pools comprises three oligonucleotide chain pools, i.e., oligonucleotide chain pools 1, 2 and 3, for 3′-end of each sequence in oligonucleotide chain pool 1, there is only one kind of 5′-end sequence completely complementary thereto in oligonucleotide chain pool 2, and for 3′-end of each sequence in oligonucleotide chain pool 2, there is only one kind of 5′-end sequence completely complementary thereto in oligonucleotide chain pool 3, wherein the complementary portion has a sequence length of 15-35 nucleotides, preferably 20-30 nucleotides, more preferably 21 nucleotides (21 nt). In this way, after PCR amplification and nicking endonuclease digestion, the sequence derived from oligonucleotide chain pool 1, the sequence derived from oligonucleotide chain pool 2, and the sequence derived from oligonucleotide chain pool 3, may be ligated into a linear sequence by their complementary long overhangs.
In one embodiment, the nicking endonuclease used in step (3) may be selected from, but not limited to, Nb.BsrDI or Nt.BspQI nicking endonuclease.
In one embodiment, the vector used in step (4) can be a viral vector, for example, a lentiviral vector, a retroviral vector, an adenoviral vector, an adeno-associated virus vector, but not limited thereto, and those skilled in the art can select appropriate vectors according to actual needs.
In a specific embodiment, the vector used in step (4) is a lentiviral vector; therefore, after obtaining the primary library vector in step (4), the method further comprises the following step: (5) a step of packaging lentivirus with the constructed library vector and detecting lentivirus titer.
In one embodiment, the primary library vector obtained in step (4) can be further processed, for example, by introducing another required insert fragment to form a complete library vector.
According to the disclosure in the second aspect and the third aspect, those skilled in the art can understand that the term “in-library ligation” refers to a high-throughput and specific fragment ligation in a library, which is a nucleic acid sample (library) composed of a large number of mixed sequences, to achieve the purpose of extending sequence length and/or increasing sequence diversity. By using a nicking endonuclease (e.g., Nb.BsrDI), this method can generate, on nucleic acid fragments, long overhangs that have one-to-one correspondence and can realize pairwise complementary ligation between nucleic acid sequences. Thus, in a high-throughput library, the ligation reaction occurs only between pre-designed sequences. The key difference between this ligation reaction and the general restriction endonuclease-mediated digestion reaction is that the in-library ligation reaction can generate overhang sequences of variable length on nucleic acid sequences through pre-designed complementary sequences and nicking restriction enzymes. Since the length of the overhang sequence determines the number of fragment combinations that can achieve pairwise complementarity (the theoretical value is 4^n, wherein n is the number of nucleotides in the overhang sequence; for example, the theoretical value for an overhang sequence with a length of 10 nucleotides is 4^10), the in-library ligation reaction can theoretically realize one-to-one corresponding ligation among the internal sequences of a mixed nucleic acid sample with extremely high complexity. By contrast, common restriction enzymes (e.g., EcoRI, BamHI, etc.), firstly, are of very limited types, and secondly, generate overhang sequences that are generally 4 to 6 nucleotides long with a specific sequence, so that a library constructed by using common restriction enzymes does not provide high-throughput one-to-one corresponding ligation among specified sequences.
In the in-library ligation design, a nicking endonuclease, for example, Nb.BsrDI, is used to generate nicks on double-stranded DNA sequences. Nb.BsrDI is one example of a so-called “nicking endonuclease”, which has a different cutting pattern from commonly used restriction enzymes (e.g., EcoRI). To apply the Nb.BsrDI digestion, two recognition sites were designed to ligate two sub-pools. One recognition site, located on the top strand of the DNA, generated one nick in the top strand. The other recognition site, located on the bottom strand of the DNA, generated one nick in the bottom strand. The two recognition sites were spaced apart from each other, resulting in two nicks that were also spaced apart. Importantly, the distance between the two nicks is flexible: when designing the oligo, the distance between the two recognition sites can be adjusted to determine the distance between the two nicks. This is why long overhangs of 21 nt (or shorter, or longer, e.g., 2-100 nucleotides, such as 4-50, 10-40, 15-35, 20-30, preferably 21 nt) can be generated by using a nicking endonuclease.
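For orientation, the theoretical number of distinct overhang sequences grows as four to the power of the overhang length. The short sketch below simply evaluates that expression for a few lengths; the function name `overhang_capacity` is a placeholder, and the values are upper bounds on sequence count rather than numbers of usable, mutually non-cross-reacting overhangs.

```python
def overhang_capacity(n):
    """Theoretical number of distinct overhang sequences of length n (4 bases per position)."""
    return 4 ** n

for n in (4, 6, 10, 21):
    print(f"{n:>2} nt: {overhang_capacity(n):,}")
#  4 nt: 256
#  6 nt: 4,096
# 10 nt: 1,048,576
# 21 nt: 4,398,046,511,104
```

Note that for a conventional restriction enzyme the overhang sequence is fixed by the enzyme, so even the small counts in the first rows are not freely available to the designer, whereas a nicking-endonuclease overhang can take any pre-designed sequence.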
In a fourth aspect, the present invention provides a CRISPR library constructed by the method of the second or third aspect.
In a fifth aspect, the present invention provides a host cell transformed with the CRISPR library of the first or fourth aspect.
In one embodiment, the host cell transformed with the CRISPR library can be a prokaryotic cell, such as a bacterial cell, such as, but not limited to, an E. coli cell, etc., or a eukaryotic cell such as a fungal cell (e.g., yeast cell), or a mammalian cell, such as, but not limited to, a murine cell or a human cell, and the like.
In a preferred embodiment, the host cell transformed with the CRISPR library is a mammalian cell, such as, but not limited to, a murine cell or a human cell, and the like.
In a sixth aspect, the present invention provides a use of the CRISPR library of pair-specific multi-gene combinations. Specifically, the CRISPR library can be used for a high-throughput screening of a signaling pathway or gene family that determines a specific cell biological event, so as to obtain a plurality of kinds of interacting genes corresponding to a specific phenotype.
In a seventh aspect, the present invention provides a high-throughput method for combined screening of a plurality of kinds of interacting genes, the method is performed by using the CRISPR library of pair-specific multi-gene combinations described in the first aspect of the present invention.
In summary, the present application provides a fragment to realize high-throughput pair-specific ligation in library by using a long overhang sequence that is generated with nicking restriction enzyme, can achieve pairwise complementarity and has one-to-one correspondence. By optimizing the cloning method of pair-specific multiplexed gRNA combinations so that the cloning efficiency can reach the level of library construction, an innovative library construction scheme of multi-gene editing system is provided. The establishment of this library construction scheme makes it possible to study the role of multi-gene combined functions in cell regulation. That is, the present invention has achieved at least one of the following beneficial technical effects:
The above is an overview and therefore contains simplifications, generalizations and omissions of details where necessary. Accordingly, those skilled in the art will recognize that this overview is illustrative only and is not intended to limit the present invention in any way. Other aspects, features and advantages of the method, CRISPR library and/or other subject matter described herein will be apparent from the teachings presented herein. An overview is provided to briefly introduce some selected concepts that are further described in the detailed description below. This overview is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Furthermore, the contents of all references, patents, and published patent applications cited throughout this application are incorporated herein by reference in their entirety.
Although the present invention may be embodied in many different forms, disclosed herein are specific illustrative embodiments thereof that demonstrate the principles of the present invention. It should be emphasized that the present invention is not limited to the specific embodiments illustrated herein. Furthermore, any section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.
Unless otherwise defined herein, scientific and technical terms used in conjunction with the present invention have the meanings commonly understood by one of ordinary skill in the art. Furthermore, unless the context otherwise requires, terms in the singular forms shall include the plural forms thereof, and terms in the plural forms shall include the singular forms thereof. More specifically, as used in the description and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. In this application, the use of “or” means “and/or” unless stated otherwise. Furthermore, the use of the term “comprising” and other forms such as “including” and “containing” is not restrictive. Furthermore, the ranges provided in the description and the appended claims include all values between the endpoints and the breakpoints.
For better understanding the present invention, definitions and explanations of related terms are provided below.
The term CRISPR (clustered regularly interspaced short palindromic repeats) refers to a repetitive sequence in the genome of prokaryotes; it is an immune weapon that arose from the struggle between bacteria and viruses over the course of evolution. Briefly, during infection, viruses can integrate their genes into the bacterial genome and use the bacterial cell machinery to replicate their own genes. To remove the invading viral genes, bacteria have evolved the CRISPR-Cas9 system, with which they can silently excise the integrated viral genes from their own chromosomes; this is the bacteria's unique immune system. CRISPR technology was discovered in the early 1990s, and as the research progressed, it quickly became the most popular gene-editing tool in fields such as human biology, agriculture, and microbiology.
In general, “CRISPR system” refers collectively to transcripts and other elements involved in the expression or directing activity of CRISPR-associated (“Cas”) genes, including sequence encoding a Cas gene, tracr (transactivating CRISPR) sequence (e.g., tracrRNA or an active part of tracrRNA), tracr pairing sequence (covering “direct repeat” and partial direct repeat of tracrRNA processing in the context of an endogenous CRISPR system), guide sequence (also known as “spacer” in the context of an endogenous CRISPR system), or other sequence and transcript from a CRISPR locus. In some embodiments, one or more elements of the CRISPR system are derived from a Type I, Type II, or Type III CRISPR system. In some embodiments, one or more elements of the CRISPR system are derived from a particular organism comprising an endogenous CRISPR system, such as Streptococcus pyogenes. In general, a CRISPR system is characterized by elements that facilitate the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). In the context of CRISPR complex formation, “target sequence” refers to a sequence to which a guide sequence is designed to be complementary, in which the hybridization between the target sequence and the guide sequence promotes the formation of the CRISPR complex. Perfect complementarity is not required, provided that sufficient complementarity is present to cause hybridization and facilitate the formation of a CRISPR complex. A target sequence can comprise any polynucleotide, such as a DNA or RNA polynucleotide. In some embodiments, the target sequence is located in the nucleus or cytoplasm of the cell. In some embodiments, the target sequence may be located in an organelle such as a mitochondrion or chloroplast of a eukaryotic cell.
A sequence or template that can be used for recombination into a target locus that includes the target sequence is referred to as an “editing template” or “editing polynucleotide” or “editing sequence.” In the present invention, the exogenous template polynucleotide may be referred to as an editing template. In one aspect of the present invention, the recombination is homologous recombination.
In various aspects of the present invention, the terms “chimeric RNA”, “chimeric guide RNA”, “guide RNA”, “single guide RNA” and “synthetic guide RNA” are used interchangeably and refer to a polynucleotide sequence comprising a guide sequence, a tracr sequence, and a tracr-pairing sequence. The term “guide sequence” refers to a sequence of approximately 20 bp within the guide RNA that designates the target site, and is used interchangeably with the term “guide” or “spacer.” The term “tracr-pairing sequence” is also used interchangeably with the term “direct repeat(s)”. The 20-nucleotide sequence at the 5′ end of the gRNA is designated as the spacer sequence (i.e., spacer) and is used to identify and bind to the complementary target sequence in the genome. The spacer sequence represents the specificity of the gRNA. In a gRNA library, usually only the spacer sequence, which represents the specificity of the gRNA, differs between the sequences in the library. The spacer sequence of approximately 20 nucleotides, together with the dozens of nucleotides downstream of it, forms particular secondary structures and binds a nuclease (e.g., Cas9) to direct the Cas nuclease to the target sequence for gene editing.
The terms “polynucleotide”, “nucleotide”, “nucleotide sequence”, “nucleic acid” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. A polynucleotide can have any three-dimensional structure and can perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding region of a gene or gene fragment, loci (locus) defined by linkage analysis, exon, intron, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short hairpin RNA (shRNA), micro-RNA (miRNA), ribozyme, cDNA, recombinant polynucleotide, branched polynucleotide, plasmid, vector, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probe, and primer. A polynucleotide may contain one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modification of the nucleotide structure can be performed before or after polymer assembly. The sequence of nucleotides can be interrupted by a non-nucleotide component. A polynucleotide can be further modified after polymerization, such as by conjugation to a labeled component.
“Complementarity” refers to the ability of a nucleic acid sequence to form one or more hydrogen bonds with another nucleic acid sequence by means of classical Watson-Crick or other non-classical types of interaction. The percentage of complementarity represents the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 mean 50%, 60%, 70%, 80%, 90%, and 100% complementary). “Completely complementary” means that all contiguous residues of a nucleic acid sequence form hydrogen bonds with the same number of contiguous residues in a second nucleic acid sequence. “Substantially complementary” as used herein means being at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% complementary over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more nucleotides; alternatively, refers to the case where two nucleic acids are capable of hybridizing under stringent conditions.
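The percentage definition above can be illustrated with a minimal sketch. It assumes two equal-length strands written 5′→3′, counts only classical Watson-Crick pairs, and uses a hypothetical function name and example sequences that do not come from the present application.

```python
def percent_complementarity(seq_a: str, seq_b: str) -> float:
    """Percentage of residues in seq_a that Watson-Crick pair with seq_b.

    Both sequences are written 5'->3'; seq_b is reversed so that the two
    strands are compared in antiparallel orientation.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    a = seq_a.upper()
    b = seq_b.upper()[::-1]
    matches = sum((x, y) in pairs for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical 10-nt examples: a perfect partner and one with two mismatches.
full = percent_complementarity("AAGGCCTTAG", "CTAAGGCCTT")     # 10 of 10 pair
partial = percent_complementarity("AAGGCCTTAG", "GTAAGGCCTA")  # 8 of 10 pair
```

Here 10 paired residues out of 10 correspond to 100% complementarity and 8 out of 10 to 80%, matching the examples given in the definition.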
“Expression” as used herein refers to a process by which a polynucleotide is transcribed (e.g., into mRNA or other RNA transcripts) from a DNA template and/or a process by which the transcribed mRNA is subsequently translated into a peptide, polypeptide or protein. Transcript and encoded polypeptide may be collectively referred to as “gene products.” If the polynucleotide is derived from a genomic DNA, the expression may comprise splicing of mRNA in a eukaryotic cell.
Generally, and throughout this description, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid molecule to which it is linked. Vectors include, but are not limited to, single-stranded, double-stranded, or partially double-stranded nucleic acid molecules; nucleic acid molecules comprising one or more free ends, no free end (e.g., circular); nucleic acid molecules comprising DNA, RNA or both; and a wide variety of other polynucleotides known in the art. One type of vector is a “plasmid,” which refers to a circular double-stranded DNA loop into which an additional DNA fragment can be inserted, for example, by standard molecular cloning techniques. Another type of vector is a viral vector, in which a virus-derived DNA or RNA sequence is present in a vector for packaging a virus (e.g., retrovirus, replication-defective retrovirus, adenovirus, replication-defective adenovirus, and adeno-associated virus). Viral vectors also include a polynucleotide carried by a virus used for transfection into a host cell. Certain vectors (e.g., bacterial vectors with bacterial replication origin, and episomal mammalian vectors) are capable of autonomous replication in a host cell into which they are introduced. Other vectors (e.g., non-episomal mammalian vectors) integrate into a host cell's genome upon introduction into the host cell, and thus replicate together with the host genome. Furthermore, certain vectors are capable of directing the expression of a gene to which they are operably linked. Such vectors are referred to herein as “expression vectors”. Common expression vectors used in recombinant DNA technology are usually in the form of plasmids.
Recombinant expression vectors may comprise the nucleic acid of the present invention in a form suitable for nucleic acid expression in host cells, which means that these recombinant expression vectors contain one or more regulatory elements selected based on the host cell to be used for expression, and the regulatory elements are operably linked to the nucleic acid sequence to be expressed. Within the recombinant expression vector, “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the one or more regulatory elements in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
The term “regulatory elements” is intended to include promoter, enhancer, internal ribosome entry site (IRES), and other expression control elements (e.g., transcription termination signal such as polyadenylation signal and polyU sequence). Such regulatory sequences are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY, 185, Academic Press, San Diego, California, 1990. Regulatory elements include those sequences that direct constitutive expression of a nucleotide sequence in many types of host cells as well as those sequences that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). Tissue-specific promoters can primarily direct expression in the desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organ (e.g., liver, pancreas), or specific cell type (e.g., lymphocyte). Regulatory elements may also direct expression in a time-dependent manner (e.g., in a cell cycle-dependent or developmental stage-dependent manner), which may or may not be tissue- or cell type-specific.
Those skilled in the art will appreciate that the design of expression vector may depend on factors such as the choice of host cell to be transformed, the desired level of expression, and the like. A vector can be introduced into a host cell to produce a transcript, protein, or peptide, including fusion protein or peptide encoded by the nucleic acid as described herein (e.g., clustered regularly interspaced short palindromic repeat (CRISPR) transcript, protein, enzyme, mutant form thereof, fusion protein thereof, etc.).
Favorable vectors include lentiviruses and adeno-associated viruses, and the vectors of this type can also be selected to target specific types of cells.
The term “in-library ligation” refers to the high-throughput, specific ligation of fragments within a nucleic acid sample (library) composed of a large number of mixed sequences, so as to extend sequence length and/or improve sequence diversity, etc. Using a nicking endonuclease (e.g., Nb.BsrDI), this method can generate on a nucleic acid fragment a long overhang that has one-to-one correspondence and enables pairwise complementary ligation between nucleic acid sequences. Thus, in a high-throughput library, the ligation reaction occurs only between pre-designed sequences. The key difference between this ligation reaction and the general restriction endonuclease-mediated digestion reaction is that the in-library ligation reaction can generate on a nucleic acid sequence a long overhang of variable length through pre-designed complementary sequences and nicking restriction enzymes. The overhang sequence length determines the number of fragment combinations that can achieve pairwise complementarity (the theoretical value is 4^n, wherein n is the number of nucleotides in the overhang sequence; for example, the theoretical number of overhang sequences with a length of 10 nucleotides is 4^10). The in-library ligation reaction can therefore theoretically realize one-to-one corresponding ligation between the internal sequences of a mixed nucleic acid sample with extremely high complexity. In contrast, the common restriction enzymes (e.g., EcoRI, BamHI, etc.), firstly, are of very limited types and, secondly, generate overhang sequences that are generally 4 to 6 nucleotides long with a specific sequence, so that a library constructed by using common restriction enzymes cannot achieve high-throughput one-to-one corresponding ligation between specified sequences.
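The theoretical number of distinct overhang sequences grows as four to the power of the overhang length. The following minimal sketch evaluates that count for a few lengths; the function name is hypothetical, and the calculation ignores practical design constraints (for example, avoiding self-complementary or near-identical overhangs), which are not quantified here.

```python
def theoretical_overhang_count(n: int) -> int:
    """Number of distinct n-nucleotide sequences over the alphabet A, C, G, T."""
    if n < 0:
        raise ValueError("overhang length must be non-negative")
    return 4 ** n


# Compare short overhangs typical of common restriction enzymes with long ones.
for length in (4, 6, 10, 21):
    print(f"{length:>2} nt overhang: {theoretical_overhang_count(length):,} sequences")
```

A 4-nt overhang allows 256 sequences and a 10-nt overhang allows 1,048,576, which illustrates why long overhangs leave room for one-to-one pairing in a high-complexity pool.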
In the in-library ligation design, a nicking endonuclease, for example, Nb.BsrDI, is used to generate nicks on the double-stranded DNA sequences. Nb.BsrDI is one example of a so-called “nicking endonuclease”, which has a different cutting pattern from the commonly used restriction enzymes (e.g., EcoRI). To apply the Nb.BsrDI digestion, two recognition sites were designed for ligating the two sub-pools. One recognition site, located on the top strand of the DNA, generated a nick in the top strand. The other recognition site, located on the bottom strand, generated a nick in the bottom strand. Because the two recognition sites were set apart from each other, the two resulting nicks were also apart from each other. Importantly, the distance between the two nicks is flexible: when designing the oligo, the distance between the two recognition sites can be adjusted to determine the distance between the two nicks. This is why long overhangs of 21 nt (or a little shorter or longer) can be generated by using a nicking endonuclease.
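The relationship between nick spacing and overhang length can be sketched as follows. This is a geometric illustration only: nick positions are supplied directly as hypothetical cut points along the top strand, and the sketch does not model where Nb.BsrDI cuts relative to its recognition site. The names `Overhang` and `overhang_between_nicks` and the example sequence are assumptions.

```python
from typing import NamedTuple


class Overhang(NamedTuple):
    length: int             # number of unpaired nucleotides after strand separation
    kind: str               # "3' overhang" or "5' overhang"
    top_strand_region: str  # top-strand sequence lying between the two nicks


def overhang_between_nicks(top_strand: str, top_nick: int, bottom_nick: int) -> Overhang:
    """Describe the overhang produced by one nick on each strand.

    Both positions are cut points in top-strand coordinates (5'->3'): a nick at
    position p lies between base p-1 and base p of the indicated strand.
    """
    n = len(top_strand)
    if not (0 < top_nick < n and 0 < bottom_nick < n):
        raise ValueError("nick positions must fall inside the duplex")
    if top_nick == bottom_nick:
        raise ValueError("coincident nicks would give a blunt end, not an overhang")
    lo, hi = sorted((top_nick, bottom_nick))
    # If the top-strand nick lies downstream of the bottom-strand nick, the
    # protruding single strands end in a 3' terminus; otherwise in a 5' terminus.
    kind = "3' overhang" if top_nick > bottom_nick else "5' overhang"
    return Overhang(hi - lo, kind, top_strand[lo:hi])


# Hypothetical 60-bp duplex with nicks 21 nt apart.
example = overhang_between_nicks("ACGT" * 15, top_nick=40, bottom_nick=19)
```

Moving either nick position changes `example.length` directly, which is the design freedom described above.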
The present invention provides the following exemplary embodiments:
The present invention, as generally described herein, will be more readily understood by reference to the following examples, which are provided by way of illustration and are not intended to limit the present invention. These examples do not imply that the experiments below are all or only experiments performed.
Oligonucleotide chain pools 1 and 2 are designed for each signaling pathway of a cell to form a CRISPR library of pair-specific multiplexed gRNA combinations (see the schematic diagram in
1.1 Selecting Sequence of Original gRNA Library[25]
In this experiment, the pairing principle was as follows:
The pairing sequences were as follows:
The 3′ end of each oligonucleotide in the oligonucleotide chain pool 1 and the 5′ end of each oligonucleotide in the oligonucleotide chain pool 2 were respectively added with a sequence, and the two sequences were specifically complementary in the two pair-specific oligonucleotides so as to ensure the specific pairing of the four gRNAs.
Referring to the schematic diagram of
As described in Example 1, the oligonucleotide chain pools 1 and 2 synthesized by the biotechnology company did not reach the amount for library construction and storage, and thus PCR amplification was required.
Objective: PCR amplification of oligonucleotide chain pools 1 and 2 was performed to achieve a sufficient amount to construct a CRISPR library of pair-specific multiplexed gRNA combinations.
It can be seen from Table 1 that one primer of the primer pairs used in each of the amplification of oligonucleotide chain pools 1 and 2 was biotinylated, so that a double-stranded amplification product with biotin in one amplification chain was obtained. The biotin-bearing small fragments (i.e., small fragments without gRNA combination, see
Usually, a 50 μl PCR reaction was used as a single system, with 1 μl of 20 ng/μl oligonucleotide chain pool as the PCR template in each single system, and a total of 24 single systems were prepared.
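The scale of this amplification can be worked out with a short calculation. The sketch below assumes the template concentration is expressed in ng/μl and uses a hypothetical helper name; it only totals the volumes and template mass stated above.

```python
def pcr_scale_up(n_reactions: int = 24,
                 reaction_volume_ul: float = 50.0,
                 template_volume_ul: float = 1.0,
                 template_conc_ng_per_ul: float = 20.0) -> dict:
    """Total volume and template input for a set of identical PCR reactions."""
    template_per_reaction_ng = template_volume_ul * template_conc_ng_per_ul
    return {
        "total_volume_ul": n_reactions * reaction_volume_ul,
        "total_template_ng": n_reactions * template_per_reaction_ng,
        "template_conc_in_reaction_ng_per_ul": template_per_reaction_ng / reaction_volume_ul,
    }


totals = pcr_scale_up()
```

With the stated values, 24 reactions amount to 1,200 μl of reaction mix and 480 ng of template in total, that is, 0.4 ng of template per μl of reaction.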
The PCR reaction system was as follows:
The PCR reaction was as follows:
Results: The oligonucleotide chain pool 1 was successfully amplified into the library chain pool 1 through the above PCR system and reaction conditions.
The PCR reaction system was as follows:
The PCR reaction was as follows:
Results: The oligonucleotide chain pool 2 was successfully amplified into the library chain pool 2 through the above PCR system and reaction conditions.
Since the residues in the PCR amplification reaction system would affect the digestion efficiency of the library chain pools 1 and 2, it was necessary to concentrate and purify the PCR amplification products to remove the residues.
The steps of concentration and purification were as follows:
The success of purification was confirmed according to the detected quality of the library. The purified PCR product could be used in subsequent reactions.
Each sequence in the library was digested by nicking endonuclease to generate specific sticky ends, which were used to complete the ligation of library chain pools 1 and 2 between each other so as to form a double-stranded ligation library (see
The reaction system and conditions of Nb.BsrDI enzyme digestion of oligonucleotide chain pool 1 were as follows:
Digestion time: 4 hours.
The reaction system and conditions of Nb.BsrDI enzyme digestion of oligonucleotide chain pool 2 were as follows:
Digestion time: 4 hours.
Oligonucleotide chain pools 1 and 2 were separately digested with Nb.BsrDI enzyme to generate products with 21 nt overhangs. In this example, the oligonucleotide chain pool 1 generated 3′-overhang products, and the oligonucleotide chain pool 2 generated 5′-overhang products. For pair-specific gRNA combinations, these two overhangs were complementary to each other.
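The pairing requirement, in which each pool 1 overhang is complementary to exactly one pool 2 overhang, can be checked computationally at the design stage. The following is a minimal sketch under the assumption that overhangs are stored as 5′→3′ strings keyed by oligo identifiers; the identifiers and sequences shown are hypothetical and are not taken from the sequence listing.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def pair_overhangs(pool1: Dict[str, str], pool2: Dict[str, str]) -> Dict[str, str]:
    """Map each pool-1 oligo to the pool-2 oligo with the complementary overhang.

    Raises ValueError if the pairing is not one-to-one.
    """
    index: Dict[str, str] = {}
    for oligo_id, overhang in pool2.items():
        key = overhang.upper()
        if key in index:
            raise ValueError(f"duplicate pool-2 overhang: {oligo_id} and {index[key]}")
        index[key] = oligo_id
    pairing: Dict[str, str] = {}
    for oligo_id, overhang in pool1.items():
        partner = index.get(reverse_complement(overhang))
        if partner is None:
            raise ValueError(f"no complementary partner for {oligo_id}")
        if partner in pairing.values():
            raise ValueError(f"pool-2 oligo {partner} matches more than one pool-1 oligo")
        pairing[oligo_id] = partner
    return pairing


# Hypothetical 21-nt overhangs; pool 2 is derived here so that pairs match by construction.
pool1 = {"p1_a": "ACGTTGCAAGCTTACGGATCA", "p1_b": "TTGACCGATAGGCTAACGTCA"}
pool2 = {"p2_a": reverse_complement(pool1["p1_a"]),
         "p2_b": reverse_complement(pool1["p1_b"])}
pairs = pair_overhangs(pool1, pool2)
```

A check of this kind flags designs in which two oligonucleotides share an overhang, which would break the one-to-one correspondence during ligation.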
The products of the digestion in 3.1.1 were purified with streptavidin magnetic beads. This allowed the removal of biotin-carrying small fragments of digestion products, thereby more thoroughly exposing the 3′- and 5′-overhang products of the oligonucleotide pools.
The kits used were as follows:
The digested nucleotide library was purified according to the kit instructions for use in subsequent reactions.
A schematic diagram of the annealing of library chain pools 1 and 2 to generate a double-stranded ligation library was shown in
The reaction system was as follows:
The reaction conditions were as follows:
(2) Digestion and Removal of Poorly Matched Double-Stranded Ligation Library Fragments with T7E1 Enzyme
After 30 minutes of digestion, 4 μl of 0.5 M EDTA was added to the reaction system to stop the reaction.
Through the above-mentioned annealing and ligation procedure and the T7E1 enzyme digestion procedure on the poorly matched double-stranded ligated library fragments, the library chain pools 1 and 2 were successfully ligated to generate a double-stranded library.
Objective: To remove the residues in the reaction to improve the quality of the reaction products for subsequent reactions.
1.2× Ampure XP beads (purchased from Beckman, A63882) were used to purify the library fragments; the purified library fragments were dissolved in 20 μl of ddH2O and measured with Qubit to determine the quality of the samples.
The purified high-quality double-stranded ligation library (i.e., spacer1-spacer2-spacer3-spacer4 shown in
4.1 Inserting Double-Stranded Ligation Library into lentiGuide-Puro Vector
The double-stranded ligation library prepared in Example 3 (i.e., gRNA1-gRNA2-gRNA3-gRNA4) was cloned into a modified lentiGuide-Puro backbone with mKate2 (Addgene, 52963) by Golden Gate reaction.
The golden gate reaction (the reaction included two groups: sample and control) conditions were as follows:
The molar ratio of vector to insert was 1:3.5, and the amount of insert was 145 fmol.
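The amounts implied by a molar ratio can be computed as in the sketch below. The conversion from fmol to ng assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, and the fragment length used in the example is hypothetical; neither value is stated in the present application.

```python
def partner_fmol(known_fmol: float, known_parts: float, partner_parts: float) -> float:
    """Amount of the other component implied by a known amount and a molar ratio."""
    return known_fmol * partner_parts / known_parts


def fmol_to_ng(fmol: float, length_bp: int, g_per_mol_per_bp: float = 650.0) -> float:
    """Approximate mass of double-stranded DNA (assumed ~650 g/mol per bp)."""
    # 1 fmol x 1 g/mol = 1e-15 g = 1e-6 ng
    return fmol * length_bp * g_per_mol_per_bp * 1e-6


# Vector : insert = 1 : 3.5 with 145 fmol of insert.
vector_amount_fmol = partner_fmol(known_fmol=145.0, known_parts=3.5, partner_parts=1.0)

# Hypothetical 200-bp insert, used only to illustrate the mass conversion.
insert_mass_ng = fmol_to_ng(145.0, length_bp=200)
```

With the stated ratio, 145 fmol of insert corresponds to roughly 41 fmol of vector; the same helper applies to the other ratios used in later rounds.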
The reaction conditions were as follows:
The library fragments were purified using 0.7× Ampure XP beads (Beckman, A63882), and dissolved in 10 μl ddH2O, and the sample quality was checked by Qubit.
Through the above procedures, the present inventors successfully inserted the double-stranded ligation library prepared in Example 3 into the lentiGuide-Puro vector.
Objective: To electroporate the lentiGuide-Puro vector carrying the double-stranded ligation library into competent cells for amplification for subsequent reactions; and to perform sequencing on random clones to confirm successful insertion of the library.
Preparation before electroporation: An LB dish and an LB medium containing ampicillin were preheated at 37° C. for 30 minutes; E. coli Endura electroporation competent cells (Lucigen, 60242-2) were thawed on ice; the sample vial for electroporation and the EP tube containing 2 μl of Golden Gate reaction product were cooled on ice.
Results: The Endura bacterial solution containing the Golden Gate Assembly I library was obtained, and sequencing of randomly picked bacterial colony clones showed insertion of the library sequence.
Objective: To extract the vector, purify and remove the residues in the reaction to improve the quality of the reaction product for subsequent reactions.
Objective: To construct an NGS sequencing library, and to determine the quality including homogeneity and diversity of the library carried in the vector by sequencing.
Samples for constructing the sequencing library:
The reaction system was as follows:
The reaction conditions were as follows:
Results: The NGS sequencing library was successfully constructed.
0.7× Ampure XP beads were used to purify the sequencing library fragments; the purified library fragments were dissolved in 10 μl of ddH2O, and the sample quality was checked with Qubit. Qualified samples were subjected to Illumina MiSeq or NextSeq sequencing.
Results: Sequencing results indicated that the insert library in the first-round reaction had good diversity and homogeneity (
For the primary library prepared in Example 4 (i.e., the recombinant lentiGuide-Puro vector containing gRNA1-gRNA2-gRNA3-gRNA4), scaffold1-tRNA1, scaffold2-tRNA2 and scaffold3-tRNA3 sequences were inserted sequentially between two adjacent gRNAs; that is, a total of 3 insertions were performed.
Objective: In this round of reaction, one tRNA sequence (hereafter referred to as tRNA1) in an exogenous plasmid (e.g., a plasmid containing scaffold1-tRNA1, which had a sequence shown in SEQ ID NO: 708, and was synthesized by China General Biosystems (Anhui) Co., Ltd., http://www.generalbiol.com) was inserted into the library vector in the first-round reaction (see
5.1.1 Insertion of First tRNA Sequence
The molar ratio of the Golden Gate Assembly I library to the plasmid containing scaffold1-tRNA1 was 1:4.9, and 31 fmol of the Golden Gate Assembly I library was used.
The reaction conditions were as follows:
The first tRNA sequence (i.e., tRNA1) was successfully inserted into the vector carrying the library to form the Golden Gate Assembly II library.
0.7× Ampure beads were used to purify the library fragments, the purified library fragments were dissolved in 10 μl ddH2O, and the sample quality was checked with Qubit.
Results: A purified vector carrying one tRNA sequence (i.e., tRNA1) and the library was obtained.
5.1.3 Electroporation of the First-Round Reaction Product Obtained after Purification
Objective: To electroporate a vector carrying one tRNA sequence and a double-stranded ligation library into competent cells for amplification for subsequent reactions.
Preparation before electroporation: An LB dish containing ampicillin and a recovery medium were preheated at 37° C. for 30 minutes; Endura electroporation competent cells were thawed on ice; the sample vial for electroporation and the EP tube containing 4 μl of Golden Gate reaction product were cooled on ice.
Results: The Golden Gate Assembly II library was amplified using competent cells.
Objective: To extract the vector carrying Golden Gate Assembly II library from competent cells, and purify the vector for subsequent reactions.
Results: The purified Golden Gate Assembly II library was obtained.
Objective: In this round of reaction, one tRNA sequence (hereafter referred to as tRNA2) in an exogenous plasmid (for example, a plasmid containing scaffold2-tRNA2, which had a sequence shown in SEQ ID NO: 709, and was synthesized by China General Biosystems (Anhui) Co., Ltd., http://www.generalbiol.com) was inserted into the library vector in the second-round reaction (see
5.2.1 Insertion Reaction of tRNA Sequence
Materials: Golden Gate Assembly II library; plasmid containing scaffold2-tRNA2.
The molar ratio of the Golden Gate Assembly II library to the plasmid containing scaffold2-tRNA2 was 1:3, and 35 fmol of the Golden Gate Assembly II library was used.
The reaction conditions were as follows:
Results: The second tRNA sequence (i.e., tRNA2) was successfully inserted into the vector carrying the 4-gRNA combinations.
0.7× Ampure beads were used to purify the library fragments, the purified library fragments were dissolved in 15 μl of ddH2O, and the sample quality was checked with Qubit.
Results: The purified vector carrying two tRNA sequences (i.e., tRNA1 and tRNA2) and the library was obtained.
5.2.3 Electroporation of Second-Round Reaction Product Obtained after Purification
Objective: To electroporate the vector carrying two tRNA sequences and the library into competent cells for amplification for subsequent reactions.
Preparation before electroporation: An LB dish containing ampicillin and a recovery medium were preheated at 37° C. for 30 minutes; Endura electroporation competent cells were thawed on ice; the sample vial for electroporation and the EP tube containing 4 μl of Golden Gate reaction product were cooled on ice.
Results: The Golden Gate Assembly III library was amplified by competent cells, and the purified Golden Gate Assembly III library was obtained.
Objective: In this round of reaction, one tRNA sequence (hereafter referred to as tRNA3) in an exogenous plasmid (for example, a plasmid containing scaffold3-tRNA3, which had a sequence shown in SEQ ID NO: 710, and was synthesized by China General Biosystems (Anhui) Co., Ltd., http://www.generalbiol.com) was inserted into the library vector in the third-round reaction (see
5.3.1 Insertion Reaction of tRNA Sequence
Materials: Golden Gate Assembly III library; plasmid containing scaffold3-tRNA3.
The molar ratio of the Golden Gate Assembly III library to the plasmid containing scaffold3-tRNA3 was 1:3.5, and 30 fmol of the Golden Gate Assembly III library was used.
The reaction conditions were as follows:
Results: The third tRNA sequence (i.e., tRNA3) was successfully inserted into the vector carrying the library, and a vector carrying pair-specific multiplexed 4-gRNA combinations and 3 tRNAs was obtained.
0.7× Ampure beads were used to purify the library fragments; the purified library fragments were dissolved in 15 μl of ddH2O, and the sample quality was checked with Qubit.
Results: The purified vector carrying three tRNA sequences (i.e., tRNA1, tRNA2 and tRNA3) and the library was obtained.
5.3.3 Electroporation of Third-Round Reaction Product Obtained after Purification
Objective: To electroporate the lentiGuide-Puro vector carrying three tRNA sequences and the double-stranded ligation library into competent cells for amplification and for subsequent reactions.
Preparation before electroporation: An LB dish containing ampicillin and a recovery medium were preheated at 37° C. for 30 minutes; Endura electroporation competent cells were thawed on ice; the sample vial for electroporation and the EP tube containing 4 μl of Golden Gate reaction product were cooled on ice.
Results: The Golden Gate Assembly IV library was amplified with competent cells, and the purified Golden Gate Assembly IV library was obtained.
Materials: Golden Gate Assembly IV library and primers (Table 4)
Fwd-libseq-U6: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGGACTATCATATGCTTACCGTAAC
Fwd-libseq-gln: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGGTTAGCACTCTGGACTCTG
The reaction system was as follows:
The reaction conditions were as follows:
Results: The NGS sequencing library was successfully constructed.
0.7× Ampure XP beads were used to purify the sequencing library fragments; the purified library fragments were dissolved in 10 μl of ddH2O, and the sample quality was checked with Qubit. The qualified sample was subjected to Illumina MiSeq or NextSeq sequencing.
Results: The sequencing results indicated that the insert library in the first-round reaction had good diversity and homogeneity (
Results: According to the flow cytometry results, the titer of the concentrated lentiviruses reached a level of 10^8, so they could be used to infect human cell lines to deliver the CRISPR library of pair-specific multiplexed gRNA combinations for subsequent high-throughput screening.
A total of 21,938,825 sequence reads were obtained by PE150 NGS sequencing of the CRISPR library constructed by the method of the present invention. Low-quality sequence reads and sequencing adapter sequences were removed using cutadapt (v2.6), and the filtered reads were aligned with the designed oligonucleotide chain pools using Bowtie2 (v2.3.5.1). The alignment rate of properly paired reads was 96.74% (21,223,537 reads in total), indicating that 96.74% of the clones were correct. Among them, 9,916,616 gRNA sequences were completely correct, and the other 11,307,121 sequences contained errors generated during DNA synthesis; this correct rate met the requirements for library construction.
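The stated alignment rate can be reproduced from the read counts with a short calculation. The sketch below takes the total number of reads obtained as the denominator, which is an assumption that happens to reproduce the stated 96.74%; the variable names are hypothetical.

```python
total_reads = 21_938_825         # reads obtained by PE150 sequencing
properly_paired = 21_223_537     # properly paired reads aligned to the designed pools
completely_correct = 9_916_616   # aligned reads whose gRNA sequences were error-free

# Percentage of reads that aligned as proper pairs.
alignment_rate = 100.0 * properly_paired / total_reads

# Percentage of aligned reads that carried no synthesis errors.
perfect_rate = 100.0 * completely_correct / properly_paired

print(f"alignment rate: {alignment_rate:.2f}%")
print(f"completely correct among aligned reads: {perfect_rate:.2f}%")
```

The second figure expresses the completely correct sequences as a share of the aligned reads, which is one way to summarize the error contribution of DNA synthesis.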
The present inventors used a Jurkat T cell receptor (TCR) signaling pathway activation model to verify the effectiveness of the CRISPR library of multiplexed gRNA combinations constructed in the present invention for signaling pathway screening.
The Cas9-encoding gene was inserted into Jurkat cells using lentiCas9-Blast lentivirus (Addgene, 52962). The Jurkat-Cas9 cell line was selected using 2 μg/mL blasticidin, as determined by a blasticidin killing curve. Following the blasticidin selection, viable cells were collected, and the cells were sorted by BD FACS fusion flow cytometer (BD Bioscience). Then, Cas9-expressing Jurkat cell monoclones were established in the presence of 2 μg/mL blasticidin. The Cas9 expression of each monoclone was verified by Western blotting (Cell Signaling, Mouse anti-Cas9, 7A9-3A3).
The vector library of 4-gRNA combinations, pMD2.G (Addgene, 12259) envelope plasmid and psPAX2 packaging plasmid (Addgene, 12260) were mixed in a mass ratio of 5:2:3, and incubated with 250 μM calcium chloride. An equal volume of 2×HeBS (280 mM NaCl, 1.5 mM Na2HPO4, 50 mM HEPES, pH 7.05) was added to the above DNA-CaCl2 mixture and incubated for 15 minutes at room temperature. This mixture was added dropwise to 80% confluent HEK293T cells for transfection. Lentiviral supernatants were collected at 48 and 72 hours after transfection, filtered through 0.45 μm filters (Millipore, SLHV033RB), and concentrated by ultracentrifugation at 70,000 g for 2 hours at 4° C. A total of 20×10^6 Jurkat-Cas9 cells were infected with the concentrated viral library at MOI ≤0.3 in RPMI-1640 medium containing 8 μg/ml polybrene. Spinfection was performed by centrifuging the culture plate at 700 g for 2 hours at 32° C. The cells were checked for mKate2 expression by flow cytometry (Cytoflex, Beckman) 48 hours after the transduction, and this expression indicated successful transduction. The proportion of mKate2-positive cells was typically about 30%. During the next 6 to 10 days, the cells were grown under antibiotic selection with 2 μg/mL puromycin and 2 μg/mL blasticidin, and the cell concentration was maintained at 5×10^5 cells/mL. During antibiotic selection, the cells were monitored for mKate2 expression by flow cytometry until 95% of the cells were mKate2-positive.
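The reason for keeping the MOI low can be illustrated with a simple model. The sketch below assumes that the number of viral integrations per cell follows a Poisson distribution with mean equal to the MOI, which is a common simplification and not a statement made in the present application; the function name is hypothetical.

```python
import math
from typing import Tuple


def poisson_transduction(moi: float) -> Tuple[float, float]:
    """Return (fraction of cells infected, fraction of infected cells with one integrant).

    Assumes integrations per cell are Poisson-distributed with mean `moi`.
    """
    p_zero = math.exp(-moi)   # cells receiving no virus
    p_one = moi * p_zero      # cells receiving exactly one virus
    infected = 1.0 - p_zero
    return infected, p_one / infected


infected_fraction, single_copy_fraction = poisson_transduction(0.3)

# Transducing units needed to reach MOI 0.3 on 20 x 10^6 cells.
units_needed = 0.3 * 20e6
```

Under this model, an MOI of 0.3 infects about 26% of the cells, and roughly 86% of the infected cells carry a single integrant, so that most selected cells carry one gRNA combination.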
6×10⁶ successfully infected Jurkat-Cas9 cells were collected as a starting reference (the control in
To validate the inhibitory effect of a single candidate gRNA vector or a gRNA combination vector, validation experiments were performed following the same procedure as for large-scale library transduction and activation. The difference was that the starting number of Jurkat-Cas9 cells per viral transduction experiment was 5×10⁵. 24 hours after stimulation, the percentage of CD69+ cells was examined using flow cytometry (Cytoflex, Beckman). All flow cytometry data were analyzed with FlowJo v10.
In the experiments, the cells with highly activated and inactivated TCR signaling pathway were collected separately and their genomic DNAs were extracted, followed by NGS sequencing after amplification and insertion of gRNA sequences. According to the sequencing results, the present inventors ranked the multiplexed gRNA combinations from high to low by their enrichment in the inactivated cells, which confirmed that various gRNA combinations targeting the TCR signaling pathway were significantly enriched in the inactivated cells (log2 FC < −1 or > 1; −log10(P-value) > 1) (
(1) T Cell Activation Screening Through Library of 4-gRNA Combinations
The present inventors reasoned that multiple genes involved in the same pathway or the same gene family might exhibit more functional relevance and lead to genetic compensation when only one of them was functionally disrupted. Therefore, the present inventors hypothesized that disturbing multiple genes in the same pathway or gene family, compared with disturbing a single target, might help identify new candidates that shared coordinated behavior. To this end, the present inventors designed most of the 4-gRNA combinations either from the same pathway (3,672) or from the same gene family (945). To balance the coverage among targeted genes, the present inventors generated 1,569 random combinations by picking genes according to their occurrences across the established combinations in descending order. 50 negative controls were also included. Finally, in the designed library, each of the 1,599 candidate genes was covered by 15 combinations on average (minimum 13 combinations) (
To demonstrate the performance of the 4-gRNA combinations (4gRNA-comb) in CRISPR screening, the present inventors applied the library into a canonical T cell activation model by interrogating genes in combinations. The activation of T cell receptor (TCR) promoted signal transduction cascades that ultimately activated transcription factors such as NF-κB, NFAT and AP-1, thereby promoting the transcription of specific genes that lead to T cell proliferation and differentiation. In this system, genes involved in multiple signaling pathways cooperated and constituted a complicated network that governed the fate of T cells.
To perform multiplexed screening, Jurkat cells with stable Cas9 expression (also known as Jurkat-Cas9 cells) were transduced with the expression vector of the 4-gRNA combinations (
To discover combinations that potentially interrupted the cellular signal transduction of TCR activation, the present inventors focused on comparisons between CD69+ and CD69− post-stimulation samples. The ratio of normalized read counts of each combination between the CD69+ and CD69− samples was used to evaluate the perturbation of TCR signal transduction. The ratios for the combinations from the TCR signaling pathway, the salivary secretion pathway and the pre-designed non-targeting controls were examined first (
Next, the present inventors ranked the ratios across all combinations to identify top candidates. The present inventors calculated the ratios of normalized read counts of each combination between CD69+ and CD69− cell populations, and assigned p-values under a negative binomial model (
To further analyze the synergistic behavior of multiple genes in the same combination, the present inventors dissected the top candidate “PSMF1-PSMD11-ROCK1-HRAS” into the following subsets: six 2-gRNA combinations and four single gRNAs, and repeated the activation experiment. This was to test whether the down-regulation of Jurkat activation was due to the combined behavior of the four genes, or due to the combined behavior of a dominant subset. Among all 2-gRNA subsets, only “PSMD11-PSMF1” reduced the Jurkat activation level with statistical significance, but it was not as effective as the 4-gene combination (
Finally, the present inventors defined a set of “highly impacting” 4-gene combinations, namely those enriched more than 2-fold in the CD69− cell populations, and attempted to find other subsets that were essential to T cell activation. The present inventors defined a synergy score to quantify the contribution of each subset. Since the occurrences of 3-gene or 4-gene subsets were limited in the library of the present invention, the present inventors calculated scores for the 2-gene combination subsets. The present inventors validated the top candidate “ATP6V1D-KDELR1” and confirmed that the simultaneous knockout of these two genes reduced the activation rate of Jurkat cells (
Overall, these data demonstrated that the multiplexed CRISPR perturbation of the present invention was an effective strategy to identify functional and combinatorial gene sets that were responsible for phenotypic outcomes.
The experimental results of this example proved that the screening library of pair-specific multiplexed gRNA combinations constructed by the present invention was effective in the screening of cell signaling pathways, and could perform high-throughput screening for the effect of specific multi-gene combinations in mediating the biological state and behavior of cells, thereby studying the role of multi-gene combinational functions in cell regulation. In contrast, traditional gRNA libraries were designed only for single genes, and transducing multiple gRNAs at the same time was low in efficiency and random in combination, so that it was impossible to perform high-throughput screening for specific multi-gene combinations that mediated the biological state and behavior of cells.
The present inventors used the strategy of CRISPR library of multiplexed gRNA combinations to construct a pegRNA library for the Prime Editor system. The spacer part and the PBS+ reverse transcription template part in gRNA were designed in two oligonucleotide libraries, respectively. Using the aforementioned in-library ligation protocol, a screening library capable of testing gRNA editing efficiency was constructed (
To identify potential candidates for a combined immunotherapy, we applied a 4-gRNA multiplexed library in an in vivo screen for boosted tumor-infiltrating T cells (TILs). Following a multiplexed CRISPR library construction strategy, as further detailed below, we genetically engineered CD8+ T cells collected from OT-1 mice. To investigate synergistic or additive anti-tumor efficacies of multiple gene knockouts, we engineered the T cells with a library of four gRNAs simultaneously.
The engineered T cells were screened for activation capability in a tumor environment. The engineered T cells were injected into recipient mice inoculated with Hepa1-6 cells with stable H2Kb-OVA257-264 expression (
More specifically, the in vivo screening library was designed to target six checkpoint genes (Btla, Pdcd1, Tigit, Ctla4, Havcr2 and Adora2a) and included all fifty-six possible combinations, composed of fifteen 4-gRNA combinations, twenty 3-gRNA combinations, fifteen 2-gRNA combinations and six single-gRNA combinations (denoted as “CP group” herein). Moreover, for each combination, we used non-targeting control gRNAs to fill the unoccupied positions if the number of targeting gRNAs was less than four. For example, each of the six single-gRNA combinations contains three non-targeting control gRNAs to fill the unoccupied positions. For comparison, we also included combinations targeting two other groups of genes: one included four genes (Lat, Zap70, Cd3e, and CD247) involved in the first signaling of T cell activation (fifteen combinations, denoted as “TCR group” herein), and the other included five co-stimulatory molecules (Il2ra, Tnfrsf9, Tnfrsf4, Tnfrsf18 and CD28) involved in the secondary signaling of T cell activation (thirty combinations, denoted as “CS group” herein). T cells engineered by combinations from the TCR group and CS group should be incapable of T cell activation. Altogether, we included 101 distinct combinations targeting one to four genes of the CP group, the TCR group, and the CS group. For each distinct combination, we designed a group of six gRNA-combos in the library to eliminate biases of individual guide RNAs. Another eighty-four combinations included only non-targeting control gRNAs, which served as negative controls (denoted as “NT group” herein). The sequences of the gRNA combinations are listed in SEQ ID NOs: 14-703. The screening was conducted in three independent batches.
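The combination counts stated above can be checked by enumerating every subset of one to four genes within each group. The following is a minimal sketch; the variable and function names are illustrative and not part of the original disclosure.

```python
from itertools import combinations

# Gene groups as listed in the description.
GROUPS = {
    "CP": ["Btla", "Pdcd1", "Tigit", "Ctla4", "Havcr2", "Adora2a"],
    "TCR": ["Lat", "Zap70", "Cd3e", "CD247"],
    "CS": ["Il2ra", "Tnfrsf9", "Tnfrsf4", "Tnfrsf18", "CD28"],
}
MAX_GUIDES = 4           # positions in the gRNA tandem cassette
GUIDE_SETS_PER_COMBO = 6  # six gRNA-combos per distinct gene combination
NT_CONTROLS = 84          # combinations with only non-targeting gRNAs


def enumerate_combinations(genes, max_size=MAX_GUIDES):
    """All gene subsets of size 1..max_size within one group."""
    return [c for k in range(1, max_size + 1) for c in combinations(genes, k)]


counts = {name: len(enumerate_combinations(genes)) for name, genes in GROUPS.items()}
total = sum(counts.values())
constructs = total * GUIDE_SETS_PER_COMBO + NT_CONTROLS

print(counts)      # per-group number of distinct gene combinations
print(total)       # distinct targeting combinations
print(constructs)  # total constructs including negative controls
```

The resulting total of 690 constructs matches the number of sequences in the range SEQ ID NOs: 14-703.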
We calculated log2-transformed fold-change (log2FC) values to show the relative abundance of each gRNA combination in the tumor-infiltrating lymphocytes (TIL) relative to the engineered T cells before being injected into the recipient mice (“SR,” representing “starting reference”) (further described below). It was contemplated that the T cells enriched in the tumors gained functions relevant to anti-tumor immunity, which were reflected by the gRNA combinations with high log2FC values. As shown in
We ranked all gRNA combinations based on the corresponding T cell enrichments in the tumors from three screening batches and identified a top candidate of 3-gRNA combination that simultaneously targets Pdcd1, Adora2a and Ctla4 (denoted as “PAC” herein) (
Next, we performed validation experiments to confirm the screen results. We prepared T cells knocked out only at the Pdcd1 loci (denoted as “PNN”), at Pdcd1 and Ctla4 loci (denoted as “PCN”), as well as Pdcd1, Ctla4, and Adora2a loci (“PAC”). The knockout efficiencies of the gRNAs were confirmed (
The engineered T cells were injected intravenously into the recipient mice inoculated with Hepa1-6 cancer cells expressing H2Kb-OVA257-264. After the T cell therapy, the weight loss of the mice and the tumor size were monitored for eight weeks. We found that tumor growth in the other two groups (PNN and PCN) was also controlled, to different degrees. T cells engineered by the PAC combination showed the best anti-tumor immune responses compared to T cells engineered by PCN or PNN, as reflected by the tumor size and the survival rate of the mice (
These results indicated that the multiplexed CRISPR screen is an effective way to look for candidates for potential combinatorial immune checkpoint blockades, and for other potential combinatorial pathway blockades.
An in vivo screen library was designed and constructed via in-library ligation and vector library construction as illustrated in
As noted above, for the checkpoint blockade screening library, we included a group of six immune checkpoint genes (CP group), a group of four genes involved in the first signaling of T cell activation (TCR group) and a group of five co-stimulatory molecules involved in the secondary signaling (CS group). Within each group, all possible 4-gRNA combinations, 3-gRNA combinations, 2-gRNA combinations, and single-gRNA constructs were designed. For each construct, any unoccupied position was filled with a non-targeting control gRNA if the number of targeting gRNAs was less than four. Further, all combinations were represented by six groups of gRNAs that are distinct from each other. Additionally, 84 combinations containing only the non-targeting control gRNAs (NT group) were included and served as negative controls. This screen library comprised a total of 101 gene combinations represented by 606 gRNA groups, plus 84 negative control combinations including only non-targeting gRNAs.
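The filling of unoccupied cassette positions with non-targeting control gRNAs can be sketched as follows, using the TCR group as an example. This is an illustration only: the description does not specify which positions the non-targeting controls occupy, so appending them after the targeting gRNAs is an assumption, and `pad_cassette` and the `NTC` label are hypothetical names.

```python
from itertools import combinations

NTC = "NTC"  # placeholder label for a non-targeting control gRNA


def pad_cassette(targets, slots=4, filler=NTC):
    """Fill a cassette to a fixed number of positions with control gRNAs.

    Assumption: controls are appended after the targeting gRNAs; the actual
    position order used in the library is not given in the description.
    """
    if len(targets) > slots:
        raise ValueError("more targeting gRNAs than cassette positions")
    return tuple(targets) + (filler,) * (slots - len(targets))


tcr_genes = ["Lat", "Zap70", "Cd3e", "CD247"]
cassettes = [
    pad_cassette(combo)
    for size in range(1, 5)
    for combo in combinations(tcr_genes, size)
]

for cassette in cassettes[:3]:
    print("-".join(cassette))
```

Every resulting cassette has four positions, so single-gene and multi-gene constructs share the same vector layout and can be compared within one screen.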
For the screening part, a multiplexed CRISPR knockout vector that contained a 4-sgRNA tandem cassette (as illustrated in
For the validation part, a multiplexed CRISPR knockout vector that contained a Pdcd1-Adora2a-Ctla4 gRNA tandem cassette and a mKate2 reporter was generated (SEQ ID NO: 704). A vector that contained a Pdcd1-NTC-NTC gRNA tandem cassette and a BFP reporter was created as a control, in which one NTC gRNA replaced the Adora2a gRNA, one NTC gRNA replaced the Ctla4 gRNA, and a BFP reporter replaced the mKate2 reporter. A vector that contained a Pdcd1-Ctla4-NTC sgRNA tandem cassette and a BFP reporter was created as a further control, in which one NTC gRNA replaced the Adora2a gRNA and a BFP reporter replaced the mKate2 reporter.
Hepa1-6 cells were transduced with H-2Kb-OVA257-264-expressing lentivirus, and the H-2Kb-OVA257-264 expression in a monoclone was validated via flow cytometry. The resulting cell line was named Hepa1-6-H-2Kb-OVA257-264. The established Hepa1-6-H-2Kb-OVA257-264 cells were further transduced with a lentiviral vector (lenti-EF-1α-luciferase-T2A-BSD) for stable luciferase expression.
Primary T cells were isolated from OT-1 or Cas9+OT-1 mice, which were bred from OT-1 and Cas9 mice obtained from the Jackson Laboratory. The tumor was inoculated into NOD-Prkdcscid Il2rgnull/Shjh mice purchased from Shanghai Jihui Laboratory Animal Care. The T cell donor mice were 10-12 weeks old. The tumor recipient mice were 6-8 weeks old. All mice were housed in standard individually ventilated and pathogen-free conditions in the laboratory facility of Westlake University, under the animal protocol AP #21-016-MLJ. All mice were used in accordance with Institutional Animal Care and Use Committee (IACUC) guidelines for Westlake University.
10.2.3 T Cell Isolation and Culture
Spleens were isolated from the Cas9+OT-1 mice, followed by mashing through a 40 μm filter and RBC lysis (BD Pharm Lyse). CD8+ T cells were purified by negative selection using a CD8a+ T cell isolation kit (Miltenyi). Cells were stimulated with 100 U/ml recombinant human IL-2 (Peprotech), 1 μg/ml anti-mouse CD3ε (Ultraleaf, Clone 145-2C11, Biolegend) and 0.5 μg/ml anti-mouse CD28 (Ultraleaf, Clone 37.51, Biolegend) and cultured in RPMI-1640 with 10% FBS, 10 mM HEPES (Gibco), 100 μM non-essential amino acids (Gibco), 1 mM sodium pyruvate (Gibco), 50 μM β-mercaptoethanol (Sigma), 50 U/ml penicillin, and 50 μg/ml streptomycin (Gibco).
After ex vivo stimulation for 24 h, CD8+ T cells were transduced with lentivirus in the presence of polybrene at 8 μg/ml during spinfection at 2,000 g for 2 h at 32° C. At 48 h after transduction, T cells were collected for a transduction efficiency test via flow cytometry and for adoptive transfer.
In a validation experiment, CD8+ T cells were transduced with lentivirus twice, at 24 h and 48 h after isolation. At 24 h after the second transduction, T cells were collected for a transduction efficiency test via flow cytometry and, after sorting via FACS, for adoptive transfer. The gene editing efficiency was tested in T cells with a Pdcd1-Adora2a-Ctla4 combined disruption. At 48 h after the second transduction, mKate2+ T cells were sorted via FACS and pelleted for gDNA extraction. Then, the sgRNA target sequences of each gene were amplified by 2-step PCR for NGS sequencing. The list of oligos used in the gene editing efficiency test is included in Table 5.
OT-1 CD8+ T cells were co-cultured with either Hepa1-6 cells or Hepa1-6 cells expressing H-2Kb-OVA257-264 for 2 h and 48 h. In the 2 h test, cells were co-cultured in the presence of anti-CD107 (Biolegend). After 2 h, all cells were collected and stained with anti-CD8a (Biolegend) for degranulation analysis via flow cytometry (Cytoflex, Beckman). After 48 h, all cells were collected and stained with anti-CD8a, PI and Annexin V (Biolegend) for target cell apoptosis analysis via flow cytometry (Cytoflex, Beckman). All flow cytometry data were analyzed with FlowJo.
Hepa1-6 cells expressing H-2Kb-OVA257-264 were mixed with matrigel (1:1 volume) and injected subcutaneously into the right flank of NPSG mice at 1×10⁶ cells per recipient. At d12 after tumor cell inoculation, 1×10⁷ CD8+ T cells with screening library transduction (5% to 10% mKate2+ cells in total cells) were adoptively transferred into each recipient via i.v. injection. Meanwhile, 2-3×10⁶ CD8+ T cells with screening library transduction were frozen as a starting reference (SR). Weight loss and tumor size were measured at d0 and d7 after T cell injection. On d7 after injection, the tumor was collected and cut into small fragments. After consecutive mashing through 100 μm and 40 μm filters, RBCs in the cell suspension were lysed. Then, the tumor-infiltrating CD8+ T cells were enriched by density gradient centrifugation via Lymphprep (STEMCELL). Cells at the interface were carefully collected and washed with PBS. Then, the cells were re-suspended in PBS and stained with anti-mouse CD8a for 30 min on ice. Finally, CD8+mKate2+ TILs were sorted via FACS (BD Fusion). A total of 20,000-40,000 CD8+mKate2+ TILs could be collected per tumor. TILs from 3-4 recipient mice were mixed together and pelleted with carrier cells (Raji cells) at 1:50 (CD8+ T cells: carrier cells) for genomic DNA extraction.
10.2.7 Genomic DNA Extraction and sgRNA Library PCR Amplification
Genomic DNA extraction was performed using the TIANamp Genomic DNA kit (TIANGEN), and the DNA was finally resuspended in 50 μl nuclease-free water. To prepare the gRNA NGS library for the SR sample, all gDNA was amplified with the following thermocycling parameters: 98° C. for 30 sec, 20-22 cycles of (98° C. for 10 sec, 64° C. for 30 sec, 72° C. for 20 sec), 72° C. for 2 min. One NGS library generated amplicons covering the 1st and the 2nd gRNAs (G12 library), and another NGS library generated amplicons covering the 2nd and the 3rd gRNAs (G23 library). Primers of SEQ ID NO: 10 and SEQ ID NO: 11 were used as a pair of primers to amplify the G12 library. Primers of SEQ ID NO: 12 and SEQ ID NO: 13 were used as a pair of primers to amplify the G23 library. To prepare the gRNA NGS library for the TIL sample, two-step amplification was applied. In the 1st step, the PCR reaction (400-800 ng DNA input per reaction, 2-4 reactions per sample) was performed using Ultra II Q5 Master Mix (NEB) with the following thermocycling parameters: 98° C. for 30 sec, 28-30 cycles of (98° C. for 10 sec, 60° C. for 30 sec, 72° C. for 20 sec), 72° C. for 2 min. The primer of SEQ ID NO: 8 was used as the forward primer and the primer of SEQ ID NO: 9 was used as the reverse primer. The PCR conditions and primers of the 2nd step followed those of the SR library preparation, but with 8-10 cycles.
The list of primers used in the gene editing efficiency test is included in Table 5.
Hepa1-6 cells expressing H-2Kb-OVA257-264 with luciferase were mixed with matrigel (1:1 volume) and injected subcutaneously into the right flank of NPSG mice at 1×10⁶ cells per recipient. On d11-d12 after tumor cell inoculation, 1×10⁶ mKate2+ or BFP+ CD8+ T cells were sorted via FACS and adoptively transferred into each recipient via intravenous injection. Weight loss and tumor size were measured every 3 days after T cell injection. Meanwhile, the bioluminescent signal of the tumor was monitored weekly by in vivo imaging via PHOTON IMAGER™ OPTIMA, in which luciferin was administered intraperitoneally 5 minutes prior to signal collection.
In order to find the effective 4-gRNA combinations that enhance the capacity of CD8+ T cell-mediated tumor elimination in vivo, the normalized read counts of each combination were used to compare its representation between the TIL and SR libraries. Normalization was conducted according to the depth of the sequencing libraries. We calculated both the fold-change and the p-value for each 4-gRNA combination. The TIL and SR libraries were treated as two samples, and the G12 library and G23 library of each sample were treated as technical replicates. We used the log2 fold-change of G12 and G23 between the TIL and SR libraries to pick out combinations for validation, calculated as log2((mean of TIL three batches G12 + 1)/(mean of SR three batches G12 + 1)) and log2((mean of TIL three batches G23 + 1)/(mean of SR three batches G23 + 1)).
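The fold-change calculation described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the screen: the function names are hypothetical, the read counts are made-up toy values, and scaling to a fixed total (here counts per million) is an assumed form of depth normalization, since the description states only that normalization followed sequencing depth.

```python
import math


def normalize(counts, scale=1_000_000):
    """Scale raw read counts by library depth (assumed: counts per `scale`)."""
    depth = sum(counts.values())
    return {combo: n / depth * scale for combo, n in counts.items()}


def log2_fold_change(til_batches, sr_batches):
    """log2((mean TIL + 1) / (mean SR + 1)) for each combination.

    Each argument is a list of dicts (one per batch) mapping a combination
    name to its normalized read count in one amplicon library (G12 or G23).
    """
    result = {}
    for combo in til_batches[0]:
        til_mean = sum(batch[combo] for batch in til_batches) / len(til_batches)
        sr_mean = sum(batch[combo] for batch in sr_batches) / len(sr_batches)
        # The pseudocount of 1 avoids division by zero for dropped-out combinations.
        result[combo] = math.log2((til_mean + 1) / (sr_mean + 1))
    return result


# Toy normalized counts for three batches (illustrative values only).
til = [{"combo_A": 7, "combo_B": 0}] * 3
sr = [{"combo_A": 1, "combo_B": 3}] * 3
lfc = log2_fold_change(til, sr)
print(lfc)
```

The same function would be applied separately to the G12 and G23 libraries, which the description treats as technical replicates of each sample.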
Those skilled in the art will further realize that the present invention may be embodied in other specific forms without departing from its spirit or central characteristics. Since the foregoing description of the present invention discloses only exemplary embodiments thereof, it is to be understood that other variations are considered to be within the scope of the present invention. Therefore, the present invention is not limited to the specific embodiments described in detail herein. Rather, reference should be made to the appended claims to indicate the scope and content of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
202110686751.3 | Jun 2021 | CN | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2022/096250 | 5/31/2022 | WO |