Genetic interactions lay the foundation of virtually all biological systems. With rare exceptions, every gene interacts with one or more other genes, forming highly complex and dynamic networks. Genetic interactions may take the form of physical interaction, functional redundancy, enhancement, suppression, and/or synthetic lethality. Such interactions are the cornerstones of biological processes such as embryonic development, homeostatic regulation, immune responses, nervous system function and behavior, and evolution. Perturbation or misregulation of genetic interactions in the germ line can lead to failures in development, physiological malfunction, autoimmunity, neurological disorders, and/or many forms of genetic diseases. Disruption of the genetic networks in somatic cells can lead to malignant cellular behaviors such as uncontrolled growth, driving the development of cancer.
The study of genetic interactions evolved over a century, originating in the era of classical genetics. In essence, how two genes interact can be studied by examining the phenotypes of double mutants as compared to single mutants. This concept of epistasis has guided the conceptualization and subsequent discovery of countless important pathways, and has become the gold standard for determining downstream and upstream regulation in genetic analysis. For instance, synthetic lethality has been investigated in animal development and cancer therapeutics. Classical approaches such as genome-wide association studies (GWAS) and quantitative trait loci (QTL) mapping have been extensively employed to study complex phenotypes that involve multiple genes. While high-throughput genetic perturbation approaches have been developed to map out the landscape of genetic interactions in yeast and in worms, large-scale double knockout studies in mammalian species are scarce, due to the exponentially scaling number of possible gene combinations and the technological challenges of generating and screening double knockouts.
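The growth in the number of gene combinations can be made concrete with a short calculation. The sketch below is illustrative only: it counts unordered gene sets with the binomial coefficient, and the gene-set sizes are hypothetical round numbers rather than figures taken from this disclosure.

```python
from math import comb


def n_combinations(n_genes: int, order: int = 2) -> int:
    """Number of unordered gene sets of size `order` drawn from `n_genes` genes."""
    return comb(n_genes, order)


# Hypothetical gene-set sizes, chosen only to illustrate the scaling.
for n in (50, 1_000, 20_000):
    pairs = n_combinations(n, 2)
    triples = n_combinations(n, 3)
    print(f"{n} genes: {pairs} pairs, {triples} triples")
```

Even at the pairwise level, the count grows with the square of the number of genes, which is why pooled, array-based delivery of two crRNAs per construct is attractive for screening.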
There is thus a need in the art for compositions and methods for high-throughput multi-dimensional knockout screening. Such compositions and methods should be useful for multiplexed genome editing and screening. The present invention satisfies this need.
As described herein, the present invention relates to compositions and methods for simultaneously or sequentially mutagenizing multiple target sequences in a cell.
One aspect of the invention includes a vector comprising a first long terminal repeat (LTR) sequence, an Embryonal Fyn-Associated Substrate (EFS) sequence, a Cpf1 sequence, a Nuclear Localization Signal (NLS) sequence, an antibiotic resistance sequence, and a second LTR sequence.
Another aspect of the invention includes a vector comprising a first LTR sequence, a promoter sequence, a direct repeat sequence of Cpf1, a first restriction site, a second restriction site, an EFS sequence, an antibiotic resistance sequence, a posttranscriptional regulatory element sequence, and a second LTR sequence.
Yet another aspect of the invention includes a crRNA array comprising a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on a vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, and a 3′ sequence that is homologous to a second sequence on the vector.
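The element order of such a crRNA array can be sketched as a small data structure. This is a minimal illustration, assuming the elements are simply concatenated 5′ to 3′ in the order recited above; the class name, field names, and every sequence shown are made-up placeholders, not sequences from this disclosure.

```python
from dataclasses import dataclass

VALID_BASES = set("ACGT")


@dataclass(frozen=True)
class CrRNAArray:
    """Ordered elements of a two-crRNA array, listed 5' to 3'."""
    homology_5p: str    # homologous to a first sequence on the vector
    crrna_1: str        # first crRNA sequence
    direct_repeat: str  # Cpf1 direct repeat between the two crRNAs
    crrna_2: str        # second crRNA sequence
    terminator: str     # terminator sequence
    homology_3p: str    # homologous to a second sequence on the vector

    def elements(self) -> tuple:
        return (self.homology_5p, self.crrna_1, self.direct_repeat,
                self.crrna_2, self.terminator, self.homology_3p)

    def sequence(self) -> str:
        """Concatenate the elements into a single DNA string."""
        seq = "".join(self.elements()).upper()
        bad = set(seq) - VALID_BASES
        if bad:
            raise ValueError(f"unexpected characters: {sorted(bad)}")
        return seq


# Placeholder sequences only (hypothetical, for illustration).
example = CrRNAArray(
    homology_5p="ACCGGTTCAA",
    crrna_1="GATTACAGATTACAGATTAC",
    direct_repeat="CCGGAATTCCGG",
    crrna_2="CTAGCTAGGCATGCATGCAA",
    terminator="TTTTTT",
    homology_3p="GGTACCTTGA",
)
print(example.sequence())
```

A library in this representation would simply be a collection of such objects that share the homology, direct repeat, and terminator elements and differ in the two crRNA sequences.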
In another aspect, the invention includes a vector comprising a first LTR sequence, a promoter sequence, a first direct repeat sequence of Cpf1, a first crRNA sequence, a second direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, an EFS sequence, a posttranscriptional regulatory sequence, and a second LTR sequence.
In yet another aspect, the invention includes a crRNA library comprising a plurality of crRNA arrays cloned into a plurality of vectors, wherein the crRNA arrays individually comprise a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on a vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, and a 3′ sequence that is homologous to a second sequence on the vector.
In still another aspect, the invention includes a method for simultaneously mutagenizing multiple target sequences in a cell. The method comprises administering to the cell a crRNA library comprising a plurality of vectors comprising a plurality of crRNA arrays. Each crRNA array independently comprises a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on the vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, and a 3′ sequence that is homologous to a second sequence on the vector. The first crRNA is complementary to a first target sequence and the second crRNA is complementary to a second target sequence.
Another aspect of the invention includes a method of identifying synergistic drivers of transformation and/or tumorigenesis in vivo. The method comprises administering a cell mutagenized by a crRNA library to an animal. The crRNA library comprises a plurality of vectors comprising a plurality of crRNA arrays. Each crRNA array independently comprises a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on the vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence and a 3′ sequence that is homologous to a second sequence on the vector. The first crRNA is complementary to a first target sequence and the second crRNA is complementary to a second target sequence. A nucleotide from a tumor from the animal is sequenced. The data from the sequencing are analyzed to identify the synergistic drivers of transformation and/or tumorigenesis.
Yet another aspect of the invention includes an in vivo method for identifying and mapping genetic interactions between a plurality of genes. The method comprises administering a cell mutagenized by a crRNA library to an animal. The crRNA library comprises a plurality of vectors comprising a plurality of crRNA arrays. The crRNA array comprises a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on the vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, and a 3′ sequence that is homologous to a second sequence on the vector. The first crRNA is complementary to a first target sequence and the second crRNA is complementary to a second target sequence. A nucleotide from a tissue from the animal is sequenced. The data from the sequencing are analyzed to identify and map the genetic interactions.
Another aspect of the invention includes a kit comprising a CCAS library comprising a plurality of vectors comprising a plurality of crRNA arrays, wherein the crRNA arrays comprise a nucleotide sequence selected from the group consisting of SEQ ID NOs: 4-9,708, and instructional material for use thereof.
Still another aspect of the invention includes a kit comprising a MCAP library comprising a plurality of vectors comprising a plurality of crRNA arrays, wherein the crRNA arrays comprise a nucleotide sequence selected from the group consisting of SEQ ID NOs: 9,762-21,695, and instructional material for use thereof.
In another aspect, the invention includes a vector comprising a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray. The crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence.
In yet another aspect, the invention includes a gene editing system capable of inducible, sequential mutagenesis in a cell. The system comprises a vector and a Cre recombinase. The vector comprises a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray. The crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence.
Another aspect of the invention includes a gene editing system capable of inducible, sequential mutagenesis in a cell. The system comprises a plurality of vectors and a Cre recombinase. The vectors comprise a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray. The crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence.
Yet another aspect of the invention includes a method of inducible, sequential mutagenesis in a cell. The method comprises administering to the cell a vector comprising a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray. The crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence. The first crRNA is expressed. A Cre recombinase is administered to the cell. When the Cre recombinase is administered, the second crRNA is expressed, thus sequentially mutagenizing the cell.
Still another aspect of the invention includes a method of inducible, sequential mutagenesis in a cell. The method comprises administering to the cell a plurality of vectors. The plurality of vectors individually comprise a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray. The crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence. The first crRNA is expressed. A Cre recombinase is administered to the cell. When the Cre recombinase is administered, the second crRNA is expressed, thus sequentially mutagenizing the cell.
Another aspect of the invention includes a method of inducible, sequential mutagenesis in a cell in an animal. The method comprises administering to the animal a plurality of vectors. The plurality of vectors individually comprise a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray. The crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence. The first crRNA is expressed. The animal is administered a Cre recombinase. When the Cre recombinase is administered, the second crRNA is expressed thus sequentially mutagenizing the cell in the animal.
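One way to see why the FlipArray pairs an inverted crRNA with a run of adenines is to look at what happens to those elements when they are flipped. The sketch below assumes that "inverted" means stored as the reverse complement relative to the promoter, and that Cre-mediated recombination between the lox66 and inverted lox71 sites inverts the intervening segment. It does not model the full element order of the vector, and the spacer shown is a hypothetical placeholder.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Hypothetical 20-nt second crRNA, not a sequence from this disclosure.
second_crrna = "GCTAGCTAGGATCCATCGAT"

# Before Cre: the second crRNA is stored inverted, next to a run of adenines.
inverted_second_crrna = reverse_complement(second_crrna)
adenine_run = "A" * 6

# After inversion of the segment, each element reads as its reverse complement.
flipped_crrna = reverse_complement(inverted_second_crrna)
flipped_run = reverse_complement(adenine_run)

print(flipped_crrna == second_crrna)  # the crRNA is back in sense orientation
print(flipped_run)                    # the adenine run now reads as thymidines
```

Under these assumptions, the adenine run is silent before recombination and reads as a thymidine run afterwards, mirroring the thymidines that follow the first crRNA.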
In various embodiments of the above aspects or any other aspect of the invention delineated herein, the vector further comprises a tag sequence. In one embodiment, the tag sequence is a Flag2A sequence. In one embodiment, the first and/or second restriction site is a BsmBI restriction site. In one embodiment, the posttranscriptional regulatory element sequence comprises a Woodchuck Hepatitis Virus (WHP) Posttranscriptional Regulatory Element (WPRE) sequence. In one embodiment, the promoter sequence comprises a U6 promoter sequence. In one embodiment, the terminator sequence comprises a U6 terminator sequence.
In one embodiment, the first promoter is an EFS promoter. In one embodiment, the EFS promoter drives expression of Cpf1. In one embodiment, the second promoter is a U6 promoter. In one embodiment, the U6 promoter drives expression of the crRNA FlipArray. In one embodiment, the first promoter and the second promoter are in opposite orientations. In one embodiment, the vector further comprises an antibiotic resistance marker. In one embodiment, the antibiotic resistance marker is a puromycin resistance sequence. In one embodiment, the restriction sites are BsmBI restriction sites. In one embodiment, the Cpf1 sequence is a Lachnospiraceae bacterium Cpf1 (LbCpf1) sequence. In one embodiment, any one of the first, second, or third direct repeat sequences is from LbCpf1. In one embodiment, the first crRNA sequence comprises six consecutive thymidines. In one embodiment, the second inverted crRNA sequence comprises six consecutive adenines. In one embodiment, the first crRNA and/or the second crRNA target more than one sequence.
In one embodiment, the vector comprises the nucleic acid sequence of SEQ ID NO: 1. In one embodiment, the vector comprises the nucleic acid sequence of SEQ ID NO: 2. In one embodiment, the vector comprises SEQ ID NO: 21,697.
In one embodiment, the crRNA array comprises any one of the vectors of the present invention. In one embodiment, the crRNA library comprises any one of the vectors of the present invention.
In one embodiment, the first crRNA sequence is complementary to a gene selected from the group consisting of Pten and Nf1, and the second crRNA sequence is complementary to a gene selected from the group consisting of Pten and Nf1. In one embodiment, the first crRNA targets Nf1 and the second crRNA targets Pten. In one embodiment, the first crRNA and/or the second crRNA targets a panel of immunomodulatory factors comprising Cd274, Ido1, B2m, Fas1, Jak2, and Lgals9.
In one embodiment, the crRNA arrays comprise a nucleotide sequence selected from the group consisting of SEQ ID NOs: 4-9,708. In one embodiment, the crRNA arrays comprise a nucleotide sequence selected from the group consisting of SEQ ID NOs: 9,762-21,695. In one embodiment, the plurality of crRNA arrays comprise a nucleotide sequence selected from the group consisting of SEQ ID NOs: 4-9,708. In one embodiment, the plurality of crRNA arrays comprise a nucleotide sequence selected from the group consisting of SEQ ID NOs: 9,762-21,695. In one embodiment, the crRNA array comprises at least one additional crRNA sequence that is complementary to at least one additional target sequence. In one embodiment, the first crRNA and/or the second crRNA targets more than one sequence.
In one embodiment, the crRNA library comprises a Cpf1 crRNA array screening (CCAS) library, wherein the crRNA arrays consist of SEQ ID NOs: 4-9,708. In one embodiment, the crRNA library comprises a Massively-Parallel crRNA Array Profiling (MCAP) library comprising a plurality of crRNA arrays targeting pairwise combinations of genes significantly mutated in human metastases. In one embodiment, the MCAP library comprises crRNA arrays consisting of SEQ ID NOs: 9,762-21,695.
In one embodiment, the cell is selected from the group consisting of a T cell, a CD8+ cell, a CD4+ cell, a dendritic cell, an endothelial cell, and a stem cell. In one embodiment, the cell is a human cell. In one embodiment, the animal is a mouse. In one embodiment, the animal is a human.
In one embodiment, the mutagenesis is selected from the group consisting of nucleotide insertion, nucleotide deletion, frameshift mutation, gene activation, gene repression, and epigenetic modification.
The following detailed description of specific embodiments of the invention will be better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there are shown in the drawings exemplary embodiments. It should be understood, however, that the invention is not limited to the precise arrangements and instrumentalities of the embodiments shown in the drawings.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, specific materials and methods are described herein. In describing and claiming the present invention, the following terminology will be used.
It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.
“About” as used herein when referring to a measurable value such as an amount, a temporal duration, and the like, is meant to encompass variations of ±20% or ±10%, more preferably ±5%, even more preferably ±1%, and still more preferably ±0.1% from the specified value, as such variations are appropriate to perform the disclosed methods.
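As a worked illustration of this definition, the check below tests whether a measured value lies within a chosen fractional variation of a specified value. It is a sketch only: the function name and the default tolerance are assumptions made for the example, and the tolerance would be whichever of the recited variations is appropriate to the method at hand.

```python
def is_about(measured: float, specified: float, tolerance: float = 0.10) -> bool:
    """True if `measured` deviates from `specified` by no more than `tolerance`.

    `tolerance` is a fraction of the specified value (0.10 means 10%).
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return abs(measured - specified) <= tolerance * abs(specified)


# Hypothetical values, for illustration only.
print(is_about(105.0, 100.0, tolerance=0.10))  # within 10% of 100
print(is_about(125.0, 100.0, tolerance=0.20))  # outside 20% of 100
```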
As used herein the term “amount” refers to the abundance or quantity of a constituent in a mixture.
As used herein, the term “bp” refers to base pair.
The term “complementary” refers to the degree of anti-parallel alignment between two nucleic acid strands. Complete complementarity requires that each nucleotide be across from its opposite. No complementarity requires that each nucleotide is not across from its opposite. The degree of complementarity determines the stability of the sequences to be together or anneal/hybridize. Furthermore various DNA repair functions as well as regulatory functions are based on base pair complementarity.
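The degree of complementarity can be expressed numerically. The sketch below is a simplified illustration, assuming two strands of equal length written 5′ to 3′, Watson-Crick pairing only, and no gaps or bulges; the function name is hypothetical.

```python
# Watson-Crick pairs, with U accepted in place of T for RNA strands.
_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def complementarity(strand_a: str, strand_b: str) -> float:
    """Fraction of positions at which two anti-parallel strands base-pair.

    Both strands are given 5'->3'; strand_b is reversed so that the two
    strands are compared in anti-parallel alignment.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    a = strand_a.upper()
    b_antiparallel = strand_b.upper()[::-1]
    paired = sum((x, y) in _PAIRS for x, y in zip(a, b_antiparallel))
    return paired / len(a)


print(complementarity("ATGC", "GCAT"))  # every position pairs
print(complementarity("AAAA", "AAAA"))  # no position pairs
```

A value of 1.0 corresponds to complete complementarity and 0.0 to no complementarity, with intermediate values reflecting partial pairing.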
The term “CRISPR/Cas” or “clustered regularly interspaced short palindromic repeats” or “CRISPR” refers to DNA loci containing short repetitions of base sequences followed by short segments of spacer DNA from previous exposures to a virus or plasmid. Bacteria and archaea have evolved adaptive immune defenses termed CRISPR/CRISPR-associated (Cas) systems that use short RNA to direct degradation of foreign nucleic acids. In bacteria, the CRISPR system provides acquired immunity against invading foreign DNA via RNA-guided DNA cleavage. “crRNA” or “CRISPR targeting RNA” is the transcribed region of the unique “spacer” sequences found in CRISPRs. The crRNAs confer target specificity to the endonuclease, e.g., Cpf1.
The term “cleavage” refers to the breakage of covalent bonds, such as in the backbone of a nucleic acid molecule or the hydrolysis of peptide bonds. Cleavage can be initiated by a variety of methods, including, but not limited to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded cleavage and double-stranded cleavage are possible. Double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events. DNA cleavage can result in the production of either blunt ends or staggered ends. In certain embodiments, fusion polypeptides can be used for targeting cleaved double-stranded DNA.
A “disease” is a state of health of an animal wherein the animal cannot maintain homeostasis, and wherein if the disease is not ameliorated then the animal's health continues to deteriorate. In contrast, a “disorder” in an animal is a state of health in which the animal is able to maintain homeostasis, but in which the animal's state of health is less favorable than it would be in the absence of the disorder. Left untreated, a disorder does not necessarily cause a further decrease in the animal's state of health.
“Effective amount” and “therapeutically effective amount” are used interchangeably herein, and refer to an amount of a compound, formulation, material, or composition, as described herein, effective to achieve a particular biological result or to provide a therapeutic or prophylactic benefit. Such results may include, but are not limited to, anti-tumor activity as determined by any means suitable in the art.
“Encoding” refers to the inherent property of specific sequences of nucleotides in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of other polymers and macromolecules in biological processes having either a defined sequence of nucleotides (i.e., rRNA, tRNA and mRNA) or a defined sequence of amino acids and the biological properties resulting therefrom. Thus, a gene encodes a protein if transcription and translation of mRNA corresponding to that gene produces the protein in a cell or other biological system. Both the coding strand, the nucleotide sequence of which is identical to the mRNA sequence and is usually provided in sequence listings, and the non-coding strand, used as the template for transcription of a gene or cDNA, can be referred to as encoding the protein or other product of that gene or cDNA.
As used herein “endogenous” refers to any material from or produced inside an organism, cell, tissue or system.
The term “expression” as used herein is defined as the transcription and/or translation of a particular nucleotide sequence driven by its promoter.
“Expression vector” refers to a vector comprising a recombinant polynucleotide comprising expression control sequences operatively linked to a nucleotide sequence to be expressed. An expression vector comprises sufficient cis-acting elements for expression; other elements for expression can be supplied by the host cell or in an in vitro expression system. Expression vectors include all those known in the art, such as cosmids, plasmids (e.g., naked or contained in liposomes) and viruses (e.g., Sendai viruses, lentiviruses, retroviruses, adenoviruses, and adeno-associated viruses) that incorporate the recombinant polynucleotide.
“Homologous” as used herein, refers to the subunit sequence identity between two polymeric molecules, e.g., between two nucleic acid molecules, such as, two DNA molecules or two RNA molecules, or between two polypeptide molecules. When a subunit position in both of the two molecules is occupied by the same monomeric subunit; e.g., if a position in each of two DNA molecules is occupied by adenine, then they are homologous at that position. The homology between two sequences is a direct function of the number of matching or homologous positions; e.g., if half (e.g., five positions in a polymer ten subunits in length) of the positions in two sequences are homologous, the two sequences are 50% homologous; if 90% of the positions (e.g., 9 of 10), are matched or homologous, the two sequences are 90% homologous.
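The percentage described above can be computed position by position. The following is a minimal sketch, assuming two pre-aligned sequences of equal length with no gaps; the function name is hypothetical, and the example sequences are placeholders chosen to reproduce the 50% and 90% cases.

```python
def percent_homology(seq_a: str, seq_b: str) -> float:
    """Percentage of positions occupied by the same subunit in both sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Five of ten positions match.
print(percent_homology("AAAAAAAAAA", "AAAAATTTTT"))
# Nine of ten positions match.
print(percent_homology("AAAAAAAAAA", "AAAAAAAAAT"))
```

The same calculation applies to polypeptide sequences, with amino acid residues in place of nucleotides.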
“Identity” as used herein refers to the subunit sequence identity between two polymeric molecules, particularly between two amino acid molecules, such as between two polypeptide molecules. When two amino acid sequences have the same residues at the same positions, e.g., if a position in each of two polypeptide molecules is occupied by an arginine, then they are identical at that position. The identity or extent to which two amino acid sequences have the same residues at the same positions in an alignment is often expressed as a percentage. The identity between two amino acid sequences is a direct function of the number of matching or identical positions; e.g., if half (e.g., five positions in a polymer ten amino acids in length) of the positions in two sequences are identical, the two sequences are 50% identical; if 90% of the positions (e.g., 9 of 10) are matched or identical, the two amino acid sequences are 90% identical.
As used herein, an “instructional material” includes a publication, a recording, a diagram, or any other medium of expression which can be used to communicate the usefulness of the compositions and methods of the invention. The instructional material of the kit of the invention may, for example, be affixed to a container which contains the nucleic acid, peptide, and/or composition of the invention or be shipped together with a container which contains the nucleic acid, peptide, and/or composition. Alternatively, the instructional material may be shipped separately from the container with the intention that the instructional material and the compound be used cooperatively by the recipient.
“Isolated” means altered or removed from the natural state. For example, a nucleic acid or a peptide naturally present in a living animal is not “isolated,” but the same nucleic acid or peptide partially or completely separated from the coexisting materials of its natural state is “isolated.” An isolated nucleic acid or protein can exist in substantially purified form, or can exist in a non-native environment such as, for example, a host cell.
The term “knockdown” as used herein refers to a decrease in gene expression of one or more genes. The term “knockout” as used herein refers to the ablation of gene expression of one or more genes.
A “lentivirus” as used herein refers to a genus of the Retroviridae family. Lentiviruses are unique among the retroviruses in being able to infect non-dividing cells; they can deliver a significant amount of genetic information into the DNA of the host cell, making them one of the most efficient gene delivery vectors. HIV, SIV, and FIV are all examples of lentiviruses. Vectors derived from lentiviruses offer the means to achieve significant levels of gene transfer in vivo.
By the term “modified” as used herein, is meant a changed state or structure of a molecule or cell of the invention. Molecules may be modified in many ways, including chemically, structurally, and functionally. Cells may be modified through the introduction of nucleic acids.
By the term “modulating,” as used herein, is meant mediating a detectable increase or decrease in the level of a response in a subject compared with the level of a response in the subject in the absence of a treatment or compound, and/or compared with the level of a response in an otherwise identical but untreated subject. The term encompasses perturbing and/or affecting a native signal or response thereby mediating a beneficial therapeutic response in a subject, preferably, a human.
A “mutation” as used herein is a change in a DNA sequence resulting in an alteration from a given reference sequence (which may be, for example, an earlier collected DNA sample from the same subject). The mutation can comprise deletion and/or insertion and/or duplication and/or substitution of at least one deoxyribonucleic acid base such as a purine (adenine and/or guanine) and/or a pyrimidine (thymine and/or cytosine). Mutations may or may not produce discernible changes in the observable characteristics (phenotype) of an organism (subject).
By “nucleic acid” is meant any nucleic acid, whether composed of deoxyribonucleosides or ribonucleosides, and whether composed of phosphodiester linkages or modified linkages such as phosphotriester, phosphoramidate, siloxane, carbonate, carboxymethylester, acetamidate, carbamate, thioether, bridged phosphoramidate, bridged methylene phosphonate, phosphorothioate, methylphosphonate, phosphorodithioate, bridged phosphorothioate or sulfone linkages, and combinations of such linkages. The term nucleic acid also specifically includes nucleic acids composed of bases other than the five biologically occurring bases (adenine, guanine, thymine, cytosine and uracil).
In the context of the present invention, the following abbreviations for the commonly occurring nucleic acid bases are used. “A” refers to adenosine, “C” refers to cytosine, “G” refers to guanosine, “T” refers to thymidine, and “U” refers to uridine.
Unless otherwise specified, a “nucleotide sequence encoding an amino acid sequence” includes all nucleotide sequences that are degenerate versions of each other and that encode the same amino acid sequence. The phrase nucleotide sequence that encodes a protein or an RNA may also include introns to the extent that the nucleotide sequence encoding the protein may in some version contain an intron(s).
The term “oligonucleotide” typically refers to short polynucleotides, generally no greater than about 60 nucleotides. It will be understood that when a nucleotide sequence is represented by a DNA sequence (i.e., A, T, G, C), this also includes an RNA sequence (i.e., A, U, G, C) in which “U” replaces “T”.
“Parenteral” administration of an immunogenic composition includes, e.g., subcutaneous (s.c.), intravenous (i.v.), intramuscular (i.m.), or intrasternal injection, or infusion techniques.
The term “polynucleotide” as used herein is defined as a chain of nucleotides.
Furthermore, nucleic acids are polymers of nucleotides. Thus, nucleic acids and polynucleotides as used herein are interchangeable. One skilled in the art has the general knowledge that nucleic acids are polynucleotides, which can be hydrolyzed into the monomeric “nucleotides.” The monomeric nucleotides can be hydrolyzed into nucleosides. As used herein polynucleotides include, but are not limited to, all nucleic acid sequences which are obtained by any means available in the art, including, without limitation, recombinant means, i.e., the cloning of nucleic acid sequences from a recombinant library or a cell genome, using ordinary cloning technology and PCR™, and the like, and by synthetic means. Conventional notation is used herein to describe polynucleotide sequences: the left-hand end of a single-stranded polynucleotide sequence is the 5′-end; the left-hand direction of a double-stranded polynucleotide sequence is referred to as the 5′-direction.
As used herein, the terms “peptide,” “polypeptide,” and “protein” are used interchangeably, and refer to a compound comprised of amino acid residues covalently linked by peptide bonds. A protein or peptide must contain at least two amino acids, and no limitation is placed on the maximum number of amino acids that can comprise a protein's or peptide's sequence. Polypeptides include any peptide or protein comprising two or more amino acids joined to each other by peptide bonds. As used herein, the term refers to both short chains, which also commonly are referred to in the art as peptides, oligopeptides and oligomers, for example, and to longer chains, which generally are referred to in the art as proteins, of which there are many types. “Polypeptides” include, for example, biologically active fragments, substantially homologous polypeptides, oligopeptides, homodimers, heterodimers, variants of polypeptides, modified polypeptides, derivatives, analogs, fusion proteins, among others. The polypeptides include natural peptides, recombinant peptides, synthetic peptides, or a combination thereof.
The term “promoter” as used herein is defined as a DNA sequence recognized by the synthetic machinery of the cell, or introduced synthetic machinery, required to initiate the specific transcription of a polynucleotide sequence.
A “sample” or “biological sample” as used herein means a biological material from a subject, including but not limited to organ, tissue, exosome, blood, plasma, saliva, urine and other body fluid. A sample can be any source of material obtained from a subject.
As used herein, the terms “sequencing” or “nucleotide sequencing” refer to determining the order of nucleotides (base sequences) in a nucleic acid sample, e.g. DNA or RNA. Many techniques are available such as Sanger sequencing and high-throughput sequencing technologies (also known as next-generation sequencing technologies) such as Illumina's HiSeq and MiSeq platforms or the GS FLX platform offered by Roche Applied Science.
The term “subject” is intended to include living organisms in which an immune response can be elicited (e.g., mammals). A “subject” or “patient,” as used herein, may be a human or non-human mammal. Non-human mammals include, for example, livestock and pets, such as ovine, bovine, porcine, canine, feline and murine mammals. Preferably, the subject is human.
A “target site” or “target sequence” refers to a genomic nucleic acid sequence that defines a portion of a nucleic acid to which a binding molecule may specifically bind under conditions sufficient for binding to occur.
The term “therapeutic” as used herein means a treatment and/or prophylaxis. A therapeutic effect is obtained by suppression, remission, or eradication of a disease state.
The term “transfected” or “transformed” or “transduced” as used herein refers to a process by which exogenous nucleic acid is transferred or introduced into the host cell. A “transfected” or “transformed” or “transduced” cell is one that has been transfected, transformed or transduced with exogenous nucleic acid. The cell includes the primary subject cell and its progeny.
To “treat” a disease as the term is used herein, means to reduce the frequency or severity of at least one sign or symptom of a disease or disorder experienced by a subject.
A “vector” is a composition of matter which comprises an isolated nucleic acid and which can be used to deliver the isolated nucleic acid to the interior of a cell. Numerous vectors are known in the art including, but not limited to, linear polynucleotides, polynucleotides associated with ionic or amphiphilic compounds, plasmids, and viruses. Thus, the term “vector” includes an autonomously replicating plasmid or a virus. The term should also be construed to include non-plasmid and non-viral compounds which facilitate transfer of nucleic acid into cells, such as, for example, polylysine compounds, liposomes, and the like. Examples of viral vectors include, but are not limited to, Sendai viral vectors, adenoviral vectors, adeno-associated virus vectors, retroviral vectors, lentiviral vectors, and the like.
Ranges: throughout this disclosure, various aspects of the invention can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 2.7, 3, 4, 5, 5.3, and 6. This applies regardless of the breadth of the range.
The present invention provides, in one aspect, compositions and methods for simultaneously mutagenizing multiple target sequences in a cell. In certain aspects, the invention provides compositions and methods for sequentially mutagenizing multiple target sequences in a cell. In other aspects, the invention provides methods for identifying synergistic drivers of transformation and/or tumorigenesis and/or metastasis. In other aspects, the invention provides in vivo methods for identifying and mapping genetic interactions.
Certain aspects of the invention include lentiviral vectors for use in genome editing. In one aspect, the invention includes a vector comprising a first long terminal repeat (LTR) sequence, an Embryonal Fyn-Associated Substrate (EFS) sequence, a Cpf1 sequence, a Nuclear Localization Signal (NLS) sequence, a Flag2A sequence, an antibiotic resistance sequence, and a second LTR sequence (pLenti-EFS-Cpf1-blast vector, LentiCpf1 for short). The Cpf1 enzyme can be derived from any genus of microbes, including but not limited to Parcubacteria, Lachnospiraceae, Butyrivibrio, Peregrinibacteria, Acidaminococcus, Porphyromonas, Prevotella, Moraxella, Smithella, Leptospira, Francisella, Candidatus, and Eubacterium. In certain embodiments, Cpf1 is derived from a species from the Lachnospiraceae genus (LbCpf1). In some embodiments, the Cpf1 sequence comprises a humanized form of a Lachnospiraceae bacterium Cpf1 (LbCpf1). In one embodiment, the antibiotic resistance sequence is a blasticidin resistance sequence. In one embodiment, the vector comprises SEQ ID NO: 1 (
In another aspect, the invention includes a vector comprising a first long terminal repeat (LTR) sequence, a U6 sequence, a direct repeat sequence of Cpf1, a first restriction site, a second restriction site, an EFS sequence, an antibiotic resistance sequence, a Woodchuck Hepatitis Virus (WHP) Posttranscriptional Regulatory Element (WPRE) sequence, and a second LTR sequence (pLenti-U6-DR-crRNA-puro vector, Lenti-U6-crRNA for short). In certain embodiments, the first and/or second restriction site is a BsmBI restriction site. In one embodiment, the antibiotic resistance sequence is a puromycin resistance sequence. In one aspect, the vector comprises SEQ ID NO: 2 (
In another aspect, the invention includes a crRNA array comprising a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on a vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, and a 3′ sequence that is homologous to a second sequence on the vector. In one embodiment, the terminator sequence is a U6 terminator sequence. The vector can include any vector known in the art or described herein. In certain embodiments the vector comprises the pLenti-U6-DR-crRNA-puro vector. The crRNA sequences can be designed to target any gene of interest or nucleotide sequence of interest.
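The element order of the crRNA array described above can be illustrated with a minimal Python sketch. All sequences below are hypothetical placeholders chosen for readability and are not taken from the sequence listing; the function name `build_crrna_array` is likewise an assumption of this sketch. Accepting more than two crRNA sequences, each pair separated by a direct repeat, is a generalization made here for illustration.

```python
from typing import Sequence


def build_crrna_array(
    homology_5p: str,
    crrnas: Sequence[str],
    direct_repeat: str,
    terminator: str,
    homology_3p: str,
) -> str:
    """Concatenate array elements in 5'-to-3' order.

    Layout: 5' homology, crRNA 1, DR, crRNA 2, [DR, crRNA n ...],
    terminator, 3' homology.
    """
    if len(crrnas) < 2:
        raise ValueError("an array needs at least two crRNA sequences")
    # A direct repeat separates each consecutive pair of crRNA sequences.
    body = direct_repeat.join(crrnas)
    return homology_5p + body + terminator + homology_3p


# Hypothetical placeholder sequences, for illustration only.
HOMOLOGY_5P = "ACGTACGTAC"
HOMOLOGY_3P = "TGCATGCATG"
DIRECT_REPEAT = "GGCCGGCC"
TERMINATOR = "TTTTTT"
CRRNA_1 = "AAACCCGGGA"
CRRNA_2 = "CCCGGGAAAC"

array = build_crrna_array(
    HOMOLOGY_5P, [CRRNA_1, CRRNA_2], DIRECT_REPEAT, TERMINATOR, HOMOLOGY_3P
)
print(array)
```

In this sketch the two homologous end sequences are simply the outermost elements of the string; how they are matched to the vector during cloning is outside its scope.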
In yet another aspect, the invention includes a double knockout crRNA expression vector (pLenti-U6-DR-cr1-DR-cr2-puro). The vector comprises a first LTR sequence, a promoter sequence, a first direct repeat sequence of Cpf1, a first crRNA sequence, a second direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, an EFS sequence, a WPRE sequence, and a second LTR sequence. In one embodiment, the promoter sequence is a U6 promoter sequence. In one embodiment, the terminator sequence is a U6 terminator sequence.
The crRNA sequences can target any gene or nucleotide sequence of interest. In certain embodiments, the first crRNA sequence is complementary to a gene selected from the group consisting of Pten and Nf1, and the second crRNA sequence is complementary to a gene selected from the group consisting of Pten and Nf1. The first and second crRNAs can target the same gene/sequence or different genes/sequences. The vector can further comprise additional crRNA sequences totaling up to 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 crRNAs in one vector.
In one aspect, the invention includes a Cpf1 crRNA array screening (CCAS) library. In another aspect, the invention includes a Massively-Parallel crRNA Array Profiling (MCAP) library. In certain embodiments, the library comprises a plurality of the crRNA arrays of the invention cloned into a plurality of the vectors of the invention. In certain embodiments, the MCAP library comprises a plurality of crRNA arrays targeting pairwise combinations of genes significantly mutated in human metastases. In certain embodiments, the crRNA arrays in the library comprise at least one nucleotide sequence selected from the group consisting of SEQ ID NOs. 4-9,708. In certain embodiments, the crRNA arrays in the library consist of the nucleotide sequences of SEQ ID NOs. 4-9,708. In certain embodiments, the crRNA arrays in the library comprise at least one nucleotide sequence selected from the group consisting of SEQ ID NOs. 9,762-21,695. In certain embodiments, the crRNA arrays in the library consist of the nucleotide sequences of SEQ ID NOs. 9,762-21,695.
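As a rough illustration of what a library of pairwise combinations involves, the following Python sketch enumerates every unordered gene pair and every pairing of crRNAs between the two genes. The gene `GeneX`, the spacer sequences, and the function name `pairwise_arrays` are hypothetical, and the enumeration scheme is an assumption of this sketch rather than a description of how the CCAS or MCAP libraries were designed.

```python
from itertools import combinations, product

# Hypothetical inputs: gene names mapped to placeholder crRNA spacer sequences.
SPACERS = {
    "Pten": ["ACGTACGTAA", "ACGTACGTCC"],
    "Nf1": ["GGTTGGTTAA", "GGTTGGTTCC"],
    "GeneX": ["CATGCATGAA", "CATGCATGCC"],
}


def pairwise_arrays(spacers_by_gene):
    """Yield (gene_a, gene_b, crRNA_a, crRNA_b) for every unordered gene pair."""
    for gene_a, gene_b in combinations(sorted(spacers_by_gene), 2):
        # Pair every crRNA of the first gene with every crRNA of the second.
        for cr_a, cr_b in product(spacers_by_gene[gene_a], spacers_by_gene[gene_b]):
            yield gene_a, gene_b, cr_a, cr_b


library = list(pairwise_arrays(SPACERS))
print(len(library))
```

With n genes and k crRNAs per gene, this scheme yields C(n, 2) × k² arrays, which shows how quickly the number of combinations grows with the number of genes.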
The invention also provides, in one aspect, a kit comprising a CCAS library comprising a plurality of vectors comprising a plurality of crRNA arrays, wherein the crRNA arrays comprise a nucleotide sequence selected from the group consisting of SEQ ID NOs. 4-9,708. In another aspect, the invention includes a kit comprising a MCAP library comprising a plurality of vectors comprising a plurality of crRNA arrays, wherein the crRNA arrays comprise a nucleotide sequence selected from the group consisting of SEQ ID NOs. 9,762-21,695. Also included in the kits are instructional materials for use thereof. Instructional material can include directions for using the components of the kit as well as instructions and guidance for interpreting the results. In one aspect, the kit comprises at least one additional crRNA sequence that is complementary to at least one additional target sequence. For example, the kit is capable of multiplexing 3 or more crRNAs in each array in order to study triple knockouts and even higher-dimension (i.e., quadruple or higher) genetic interactions.
Methods
Described herein are multiplexed Cpf1 screens that provide a powerful tool for studying genetic interactions with unparalleled simplicity and specificity. The Cpf1 crRNA array screening (CCAS) and MCAP (Massively-parallel crRNA array profiling) technologies enable rapid identification of all combinations of double inhibition of two targets simultaneously. The methods described herein can be broadly applied to many cell types of interest, including but not limited to cancer cells. As shown in the present study (
The methods can also encompass additional applications in immune cells for immunotherapy screening and enhancement. Editing of primary immune cells (such as Dendritic cells (DCs)) was demonstrated herein (
Applications in primary cells for improving regenerative medicine are also encompassed by this approach. Editing of freshly isolated primary cells (such as Endothelial cells (ECs)) was demonstrated herein (
In one aspect, the invention includes a method for simultaneously mutagenizing multiple target sequences in a cell. The method comprises administering to the cell a CCAS library. The CCAS library comprises a plurality of vectors comprising a plurality of crRNA arrays. The crRNA arrays comprise a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on the vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, and a 3′ sequence that is homologous to a second sequence on the vector, and wherein the first crRNA is complementary to a first target sequence and the second crRNA is complementary to a second target sequence. In certain embodiments, the plurality of crRNA arrays in the CCAS library comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs. 4-9,708. The method can also include additional crRNA sequences complementary to additional target sequences. For example, additional crRNA sequences totaling up to 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 crRNAs can be included in the methods as described herein.
In another aspect, the invention includes a method for simultaneously mutagenizing multiple target sequences in a cell comprising administering to the cell a MCAP library. The MCAP library comprises a plurality of vectors comprising a plurality of crRNA arrays. The crRNA arrays comprise a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on the vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, and a 3′ sequence that is homologous to a second sequence on the vector, and wherein the first crRNA is complementary to a first target sequence and the second crRNA is complementary to a second target sequence. In certain embodiments, the plurality of crRNA arrays in the MCAP library comprises a nucleotide sequence selected from the group consisting of SEQ ID NOs. 9,762-21,695.
By ‘target sequence’ is meant any nucleic acid sequence or gene of interest targeted to be mutated by the methods described herein.
Any type of cell can be mutagenized by the methods described herein, including but not limited to cancer cells, immune cells, cell lines, hybridomas, primary cells, T cells, dendritic cells (DCs), endothelial cells, brain endothelial cells, macrophages, monocytes, CD8+ cells, CD4+ cells, T regulatory (Treg) cells, B cells, Natural Killer cells (NKs), and stem cells.
Another aspect of the invention includes a method of identifying synergistic drivers of transformation and/or tumorigenesis and/or metastasis in vivo. The method comprises administering to an animal cells mutagenized by a CCAS library. The CCAS library comprises a plurality of vectors comprising a plurality of crRNA arrays. Each crRNA array comprises a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on the vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, and a 3′ sequence that is homologous to a second sequence on the vector. The first crRNA is complementary to a first target sequence and the second crRNA is complementary to a second target sequence. Nucleic acid from a tumor of the animal is sequenced, and the data are analyzed to identify the synergistic drivers of transformation and/or tumorigenesis.
Still another aspect of the invention includes a method of identifying synergistic drivers of transformation and/or tumorigenesis and/or metastasis in vivo comprising administering cells mutagenized by a MCAP library to an animal. The MCAP library comprises a plurality of vectors comprising a plurality of crRNA arrays. Each crRNA array comprises a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on the vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, and a 3′ sequence that is homologous to a second sequence on the vector. In certain embodiments, the MCAP library comprises a plurality of crRNA arrays targeting pairwise combinations of genes significantly mutated in human metastases. The first crRNA is complementary to a first target sequence and the second crRNA is complementary to a second target sequence. Nucleic acid from a tumor of the animal is sequenced, and the data are analyzed to identify the synergistic drivers of transformation and/or tumorigenesis.
Yet another aspect of the invention includes an in vivo method for identifying and mapping genetic interactions. The method comprises administering cells mutagenized by a CCAS library to an animal. The CCAS library comprises a plurality of vectors comprising a plurality of crRNA arrays. Each crRNA array comprises a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on the vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a U6 terminator sequence, and a 3′ sequence that is homologous to a second sequence on the vector. The first crRNA is complementary to a first target sequence, and the second crRNA is complementary to a second target sequence. Nucleic acid from a tumor and/or tissue and/or cell of the animal is sequenced, and the data are analyzed to identify and map the genetic interactions.
Another aspect of the invention includes an in vivo method for identifying and mapping genetic interactions. The method comprises administering to an animal cells mutagenized by a MCAP library. The MCAP library comprises a plurality of vectors comprising a plurality of crRNA arrays. Each crRNA array comprises a 5′ nucleotide sequence that is homologous to a first nucleotide sequence on the vector, a first crRNA sequence, a direct repeat sequence of Cpf1, a second crRNA sequence, a terminator sequence, and a 3′ sequence that is homologous to a second sequence on the vector. The first crRNA is complementary to a first target sequence, and the second crRNA is complementary to a second target sequence. Nucleic acid (DNA or RNA) from a tumor and/or tissue and/or cell of the animal is sequenced, and the data are analyzed to identify and map the genetic interactions.
In certain embodiments of the methods, the plurality of crRNA arrays comprises SEQ ID NOs. 4-9,708. In certain embodiments of the methods, the plurality of crRNA arrays comprises SEQ ID NOs. 9,762-21,695. In certain embodiments of the methods, the crRNA arrays further comprise additional crRNA sequences that are complementary to additional target sequences. The methods of the invention are capable of multiplexing 3 or more crRNAs in each array in order to study triple knockouts and even higher-dimension genetic interactions.
Nucleotide sequencing or “sequencing”, as it is commonly known in the art, can be performed by standard methods commonly known to one of ordinary skill in the art. In certain embodiments of the invention, sequencing is performed by targeted capture sequencing.
Targeted capture sequencing can be performed as described herein, or by methods commonly performed by one of ordinary skill in the art. In certain embodiments of the invention, sequencing is performed via next-generation sequencing. Next-generation sequencing (NGS), also known as high-throughput sequencing, is used herein to describe a number of different modern sequencing technologies that allow DNA and RNA to be sequenced much more quickly and cheaply than the previously used Sanger sequencing (Metzker, 2010, Nature Reviews Genetics 11.1: 31-46). It is based on micro- and nanotechnologies that reduce the sample size and reagent costs and enable massively parallel sequencing reactions. It can be highly multiplexed, which allows simultaneous sequencing and analysis of millions of samples. NGS includes first, second, third as well as subsequent next-generation sequencing technologies. Data generated from NGS can be analyzed via a broad range of computational tools and statistical methods including but not limited to those described herein. Sequencing can also be performed at the single cell level, e.g. single cell sequencing. Sequencing can be performed on DNA as well as RNA (e.g. RNASeq). The wide variety of analyses can be appreciated and performed by those skilled in the art.
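One simple way to turn sequencing reads into per-array abundances is sketched below in Python. This is an illustrative assumption, not the analysis used in the present study: reads are assigned by exact substring match, and the array names, sequences, reads, and function names are hypothetical toy values.

```python
from collections import Counter


def count_arrays(reads, library):
    """Count reads that contain each library array sequence as an exact substring."""
    counts = Counter({name: 0 for name in library})
    for read in reads:
        for name, seq in library.items():
            if seq in read:
                counts[name] += 1
                break  # assign each read to at most one array
    return counts


def reads_per_million(counts):
    """Scale raw counts so that samples of different depth are comparable."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: c / total * 1_000_000 for name, c in counts.items()}


# Hypothetical toy data.
LIBRARY = {"array_1": "AAACCCGGGTTT", "array_2": "CCCAAATTTGGG"}
READS = [
    "GGAAACCCGGGTTTCC",
    "TTCCCAAATTTGGGAA",
    "GGAAACCCGGGTTTAA",
    "ACGTACGTACGTACGT",  # matches no array and is not counted
]

counts = count_arrays(READS, LIBRARY)
rpm = reads_per_million(counts)
print(dict(counts), rpm)
```

Exact matching is the simplest possible choice; a practical pipeline would typically tolerate sequencing errors and trim adapter sequence before matching.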
Mutagenizing a cell can include introducing mutations throughout the genome of the cell. The mutations introduced can be any combination of insertions or deletions, including but not limited to a single base insertion, a single base deletion, a frameshift, a rearrangement, and an insertion or deletion of 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, or 300 bases, or any number of bases in between. The mutation can occur in a gene or in a non-coding region.
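Whether an insertion or deletion within a coding sequence causes a frameshift depends on whether its net length change is a multiple of three. A minimal Python sketch of that rule follows; the function name `classify_indel` and its return format are assumptions made for illustration.

```python
def classify_indel(ref_len: int, alt_len: int) -> dict:
    """Classify a length change at a target site within a coding sequence."""
    net = alt_len - ref_len
    if net > 0:
        kind = "insertion"
    elif net < 0:
        kind = "deletion"
    else:
        kind = "no net length change"
    # A net change that is not a multiple of three shifts the reading frame.
    return {"kind": kind, "size": abs(net), "frameshift": abs(net) % 3 != 0}


print(classify_indel(20, 21))  # single base insertion
print(classify_indel(20, 17))  # three base deletion
```

The frameshift flag is only meaningful for mutations in coding regions; for a non-coding region the same length change would be reported without that interpretation.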
In certain embodiments of the invention, the animal is a mouse. Other animals that can be used include but are not limited to rats, rabbits, dogs, cats, horses, pigs, cows and birds. In certain embodiments, the animal is a human. The crRNA library can be administered to an animal by any means standard in the art. For example, the vectors can be injected into the animal. The injections can be intravenous, subcutaneous, intraperitoneal, or directly into a tissue or organ. In certain embodiments, the crRNA library is adoptively transferred to the animal.
Cpf1-Flip
In certain aspects, the invention includes compositions and methods for sequential mutagenesis in a cell using the Cpf1-Flip system.
In a large variety of biological and pathological processes, genetic mutations or alterations are often acquired in a sequential manner. In evolution and speciation, the genomes of organisms acquire mutations constantly and are subjected to natural selection. In genetically complex disorders such as cancer, multi-step mutagenesis is often a major obstacle for effective treatments. Cancers evolve through an ongoing process of mutation-selection balance, where initial mutations are selected for, or against, in vivo, followed by subsequent acquisition of additional mutations as the tumor grows. Since the initial set of oncogenic “driver” mutations is generally what starts and sustains tumor growth, targeted molecular therapies are often chosen to specifically attack such oncogenic dependencies. However, the selection pressures of treatment favor secondary mutations that confer drug resistance, leading to relapse. Thus, the process of cancer evolution by sequential mutagenesis stymies these therapies via continuous diversification and adaptation to the tumor microenvironment, eventually exhausting available treatment options. Even with the advent of cancer immunotherapy, where checkpoint blockade is increasingly being utilized in the clinic, the acquisition of secondary mutations that abolish T cell receptor (TCR)—antigen—major histocompatibility complex (MHC) recognition can still lead to immune escape and ultimately negate the effect of immunotherapy. Thus, the ability to perform sequential and precise mutagenesis is critical for studying biological processes with multi-stage genetic events such as development and evolution, as well as the pathogenesis of complex diseases such as cancer.
From a genetic engineering perspective, stepwise mutagenesis or perturbation is a powerful technique for precise genetic manipulation of cells and live organisms. Multiple methods have been employed to achieve this end. In the pre-recombinant DNA era, stepwise perturbation was often done by multiple rounds of random mutagenesis using chemical or physical carcinogens followed by artificial selection. The subsequent discovery and application of recombinase systems such as Cre-loxP, Flp-FRT and φC31-att enabled inducible genetic events. In these systems, the DNA recombinase (e.g. Cre) specifically recognizes its target DNA sequence motif (e.g. loxP) and catalyzes recombination between two such target sites. Depending on the configuration of the target sites, targeted recombinases can be utilized for DNA excision, translocation, and/or inversion. However, the floxed genomic loci underlying Cre-based systems must be pre-engineered on a gene-by-gene basis. This process of generating new floxed alleles for each unique application is time and labor intensive, further limiting the feasibility of multiplexed Cre recombination.
More recently, precisely targeted and customizable mutagenesis was simplified by the discovery of RNA-guided endonucleases (RGNs) Cas9 and Cpf1. RGNs can induce double strand DNA breaks, subsequently generating insertions and deletions at the target site. This process is precisely targeted based on the sequences of CRISPR RNAs (crRNAs), which complex with RGNs to enable and guide their nuclease functions. Unlike with Cre recombination, CRISPR crRNAs can be easily transferred to target cells through transfection or viral vectors, thus obviating the need to pre-engineer the host genome for each target gene. In contrast to Cas9, the most widely utilized RGN to date, Cpf1 is a single component RGN that does not depend on trans-activating RNA and can autonomously process CRISPR-RNA (crRNA) arrays. These features have made Cpf1 particularly attractive for multiplexed mutagenesis. In addition to several studies in mammalian systems, Cpf1-mediated mutagenesis and transcriptional repression have now been successfully applied in plants. Furthermore, chemical modifications on Cpf1 mRNA and crRNAs have been identified that can improve cutting efficiency. Cpf1 can also process crRNAs from mRNAs expressed by a Pol II promoter, further enabling flexible transcriptional control.
Sequential mutagenesis using Cas9 has been demonstrated in ex vivo organoid cultures. However, this approach required sequentially introducing each sgRNA in culture, one at a time, limiting its broader applicability. In particular, the sequential introduction of different sgRNAs would be impractical for library-scale screening or any in vivo experimental designs. Prior to this disclosure, conditional sequential mutagenesis using RGNs had not yet been demonstrated.
Herein, a flexible sequential mutagenesis system was created through inducible inversion of a single crRNA array (Cpf1-Flip) and its simplicity demonstrated in stepwise multiplexed gene editing in mammalian cells for modeling sequential genetic events, such as in cancer. Cpf1-Flip was further applied to model the acquisition of resistance mutations to immunotherapy in a pooled mutagenesis setting, demonstrating the feasibility of Cpf1-Flip for conducting sequential genetic studies. This system can be utilized for multi-step mutagenesis of any genes in the genome for interrogating complex genetic events with temporal control.
In certain aspects, the invention includes a crRNA FlipArray. In one embodiment, the crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence. In one embodiment, the first crRNA sequence comprises six consecutive thymidines. In one embodiment, the second inverted crRNA sequence comprises six consecutive adenines. The crRNA FlipArray can be included in any vector known to one of ordinary skill in the art.
In one embodiment, the invention includes a vector comprising a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray, wherein the crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence.
In certain embodiments, the vector comprises SEQ ID NO: 21,697. In one embodiment, the first promoter is an EFS promoter. In one embodiment, the EFS promoter drives expression of Cpf1. In one embodiment, the second promoter is a U6 promoter. In one embodiment, the U6 promoter drives expression of the crRNA FlipArray. In one embodiment, the first promoter and the second promoter are in opposite orientations. In one embodiment, the vector further comprises an antibiotic resistance marker. In one embodiment, the antibiotic resistance marker is a puromycin resistance sequence. In one embodiment, the restriction sites are BsmBI restriction sites. In one embodiment, the Cpf1 sequence is a Lachnospiraceae bacterium Cpf1 (LbCpf1) sequence. In one embodiment, any one of the first, second, or third direct repeat sequences is from LbCpf1.
In one aspect, the invention includes a gene editing system capable of inducible, sequential mutagenesis in a cell. The system comprises a vector and a Cre recombinase, wherein the vector comprises a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray, wherein the crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence.
Another aspect of the invention includes a gene editing system capable of inducible, sequential mutagenesis in a cell comprising a plurality of vectors and a Cre recombinase. The vectors comprise a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray, wherein the crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence.
In any of the gene editing systems of the present invention, the first crRNA and/or the second crRNA can target more than one sequence.
In another aspect, the invention includes a method of inducible, sequential mutagenesis in a cell. The method comprises administering to the cell a vector comprising a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray, wherein the crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence. The first crRNA is expressed, then a Cre recombinase is administered to the cell. When the Cre recombinase is administered, the second crRNA is expressed, thus sequentially mutagenizing the cell.
Another aspect of the invention includes a method of inducible, sequential mutagenesis in a cell comprising administering to the cell a plurality of vectors. The vectors individually comprise a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray, wherein the crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence. The first crRNA is expressed and a Cre recombinase is administered to the cell. When the Cre recombinase is administered, the second crRNA is expressed, thus sequentially mutagenizing the cell.
Yet another aspect of the invention includes a method of inducible, sequential mutagenesis in a cell in an animal. The method comprises administering to the animal a plurality of vectors. The vectors individually comprise a first promoter, a Cpf1 sequence, a second promoter, a first Cpf1 direct repeat sequence, a lox66 sequence, a second Cpf1 direct repeat sequence, two inverted restriction sites, an inverted lox71 sequence, and a crRNA FlipArray, wherein the crRNA FlipArray comprises a first crRNA sequence, 4-10 consecutive thymidines, a second inverted crRNA sequence, 4-10 consecutive adenines, and a third inverted direct repeat sequence. The first crRNA is expressed and a Cre recombinase is administered to the animal. When the Cre recombinase is administered, the second crRNA is expressed, thus sequentially mutagenizing the cell in the animal.
In one embodiment of the method, the cell is a human cell. In one embodiment, the animal is a mouse. In one embodiment, the animal is a human. In one embodiment, mutagenesis is selected from the group consisting of nucleotide insertion, nucleotide deletion, frameshift mutation, gene activation, gene repression, and epigenetic modification. In one embodiment, the first crRNA and/or the second crRNA target more than one sequence. In one embodiment, the first crRNA targets Nf1 and the second crRNA targets Pten. In one embodiment, the first crRNA targets Pten and the second crRNA targets Nf1. In one embodiment, the first crRNA and/or the second crRNA targets a panel of immunomodulatory factors comprising Cd274, Ido1, B2m, Fas1, Jak2, and Lgals9.
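The Cre-induced switch from the first crRNA to the second crRNA can be illustrated with a small Python simulation. Inversion is modeled as replacing the invertible segment with its reverse complement, so an element stored in inverted orientation reads forward afterwards and a run of adenines becomes a run of thymidines. Several things here are assumptions of the sketch and are not taken from SEQ ID NO: 21,697: the sequences are placeholders, the lox sites are omitted, the two terminator runs are placed next to each other so that each orientation terminates directly after its crRNA, and transcription is modeled as simply stopping at the first run of four or more thymidines.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def transcribed_region(template: str) -> str:
    """Return the sequence upstream of the first run of four or more T's."""
    match = re.search(r"T{4,}", template)
    return template if match is None else template[: match.start()]


# Hypothetical placeholder sequences, for illustration only.
DR = "AATTTCTACTAAGTGTAGAT"        # stands in for a Cpf1 direct repeat
CRRNA_1 = "GCTAGCATCGATCGGATCCA"  # first crRNA, stored forward
CRRNA_2 = "CCGTACGGATATCGCAAGTC"  # second crRNA, stored inverted

# Assumed layout of the invertible segment (element order is an assumption).
segment = (
    DR + CRRNA_1 + "TTTTTT"
    + "AAAAAA" + reverse_complement(CRRNA_2) + reverse_complement(DR)
)

before = transcribed_region(segment)                     # before Cre
after = transcribed_region(reverse_complement(segment))  # after inversion

print(before == DR + CRRNA_1)
print(after == DR + CRRNA_2)
```

In this toy model only the first crRNA lies upstream of a terminator before inversion, and only the second crRNA does after inversion, which mirrors the sequential expression described for the method.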
As described herein, the discovery and characterization of the type V CRISPR system, Cpf1 (CRISPR from Prevotella and Francisella) has enabled rapid genome editing of multiple loci in the same cell. Cpf1 is a single component RNA-guided nuclease that can mediate target cleavage with a single crRNA. Compared to Cas9, Cpf1 does not require a tracrRNA, which greatly simplifies multiplexed genome editing of two or more loci simultaneously by using a string of crRNAs targeting different genes, as described herein. Thus, Cpf1 is an ideal system for high-throughput higher dimensional screens in mammalian species, with substantial advantages in library design and readout when compared to Cas9-based approaches. Herein, a Cpf1 crRNA array library that targets a set of the most significantly mutated cancer genes was designed. An unbiased screen was performed on two different mouse models, one studying early-stage tumorigenesis and the second studying cancer metastasis, identifying many unpredicted gene pairs. Thus, Cpf1 screening is a powerful approach to systematically quantify genetic interactions and identify new synergistic combinations. Unlike with Cas9-based strategies, due to the simple expansion of crRNA arrays, this approach can be readily extended to perform triple-, quadruple- or higher dimensional screens in vivo.
The Cpf1 enzyme can be derived from any genus of microbes, including but not limited to Parcubacteria, Lachnospiraceae, Butyrivibrio, Peregrinibacteria, Acidaminococcus, Porphyromonas, Prevotella, Moraxella, Smithella, Leptospira, Francisella, Candidatus, and Eubacterium. In certain embodiments, Cpf1 is derived from a species from the Acidaminococcus genus (AsCpf1). In other embodiments, Cpf1 is derived from a species from the Lachnospiraceae genus (LbCpf1). In yet other embodiments, the Cpf1 is a humanized form of LbCpf1.
In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a crRNA sequence is designed to have some complementarity, where hybridization between a target sequence and a crRNA sequence promotes the formation of a CRISPR complex.
Full complementarity is not necessarily required, provided there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides.
In certain embodiments, one or more vectors driving expression of one or more elements of a CRISPR system are introduced into a cell, such that expression of the elements of the CRISPR system directs formation of a CRISPR complex at one or more target sites. For example, a Cpf1 enzyme and a crRNA could each be operably linked to separate regulatory elements on separate vectors. Alternatively, two or more of the elements expressed from the same or different regulatory elements may be combined in a single vector, with one or more additional vectors providing any components of the CRISPR system not included in the first vector. CRISPR system elements that are combined in a single vector may be arranged in any suitable orientation, such as one element located 5′ with respect to (“upstream” of) or 3′ with respect to (“downstream” of) a second element. The coding sequence of one element may be located on the same or opposite strand of the coding sequence of a second element, and oriented in the same or opposite direction.
In certain embodiments, the CRISPR enzyme is part of a fusion protein comprising one or more heterologous protein domains (e.g. about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to the CRISPR enzyme). A CRISPR enzyme fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains. Examples of protein domains that may be fused to a CRISPR enzyme include, without limitation, epitope tags, reporter gene sequences, and protein domains having one or more of the following activities: methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity, and nucleic acid binding activity. Additional domains that can form part of a fusion protein comprising a CRISPR enzyme are described in U.S. Patent Appl. Publ. No. US20110059502, which is incorporated herein by reference. In certain embodiments, a tagged CRISPR enzyme is used to identify the location of a target sequence.
Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids in mammalian and non-mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of a CRISPR system to cells in culture, or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g., a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell (Anderson, 1992, Science 256:808-813; and Yu, et al., 1994, Gene Therapy 1:13-26).
In one non-limiting embodiment, a vector drives the expression of the CRISPR system. The art is replete with suitable vectors that are useful in the present invention. The vectors to be used are suitable for replication and, optionally, integration in eukaryotic cells. Typical vectors contain transcription and translation terminators, initiation sequences, and promoters useful for regulation of the expression of the desired nucleic acid sequence. The vectors of the present invention may also be used for nucleic acid standard gene delivery protocols. Methods for gene delivery are known in the art (U.S. Pat. Nos. 5,399,346, 5,580,859 & 5,589,466, incorporated by reference herein in their entireties).
Further, the vector can be provided to a cell in the form of a viral vector. Viral vector technology is well known in the art and is described, for example, in Sambrook et al. (4th Edition, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, New York, 2012), and in other virology and molecular biology manuals. Viruses that are useful as vectors include, but are not limited to, retroviruses, adenoviruses, adeno-associated viruses, herpes viruses, Sindbis virus, gammaretrovirus, and lentiviruses. In general, a suitable vector contains an origin of replication functional in at least one organism, a promoter sequence, convenient restriction endonuclease sites, and one or more selectable markers (e.g., WO 01/96584; WO 01/29058; and U.S. Pat. No. 6,326,193).
Introduction of Nucleic Acids: Methods of introducing nucleic acids into a cell include physical, biological and chemical methods. Physical methods for introducing a polynucleotide, such as RNA, into a host cell include calcium phosphate precipitation, lipofection, particle bombardment, microinjection, electroporation, and the like. RNA can be introduced into target cells using commercially available methods including electroporation (Amaxa Nucleofector-II (Amaxa Biosystems, Cologne, Germany), ECM 830 (BTX) (Harvard Instruments, Boston, Mass.), the Gene Pulser II (BioRad, Denver, Colo.), or Multiporator (Eppendorf, Hamburg, Germany)). RNA can also be introduced into cells using cationic liposome mediated transfection using lipofection, using polymer encapsulation, using peptide mediated transfection, or using biolistic particle delivery systems such as “gene guns” (see, for example, Nishikawa, et al., Hum Gene Ther., 12(8):861-70 (2001)).
Biological methods for introducing a polynucleotide of interest into a host cell include use of DNA and RNA vectors. Viral vectors, and especially retroviral vectors, have become the most widely used method for inserting genes into mammalian, e.g., human cells. Other viral vectors can be derived from lentivirus, poxviruses, herpes simplex virus I, adenoviruses and adeno-associated viruses, and the like. See, for example, U.S. Pat. Nos. 5,350,674 and 5,585,362.
Chemical means for introducing a polynucleotide into a host cell include colloidal dispersion systems, such as macromolecule complexes, nanocapsules, microspheres, beads, and lipid-based systems including oil-in-water emulsions, micelles, mixed micelles, and liposomes. An exemplary colloidal system for use as a delivery vehicle in vitro and in vivo is a liposome (e.g., an artificial membrane vesicle).
Regardless of the method used to introduce exogenous nucleic acids into a host cell or otherwise expose a cell to the inhibitor of the present invention, in order to confirm the presence of the nucleic acids in the host cell, a variety of assays may be performed. Such assays include, for example, “molecular biological” assays well known to those of skill in the art, such as Southern and Northern blotting, RT-PCR and PCR; “biochemical” assays, such as detecting the presence or absence of a particular peptide, e.g., by immunological means (ELISAs and Western blots) or by assays described herein to identify agents falling within the scope of the invention.
It should be understood that the methods and compositions that would be useful in the present invention are not limited to the particular formulations set forth in the examples. The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description, and are not intended to limit the scope of what the inventors regard as their invention.
The practice of the present invention employs, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry and immunology, which are well within the purview of the skilled artisan. Such techniques are explained fully in the literature, such as “Molecular Cloning: A Laboratory Manual”, fourth edition (Sambrook et al. (2012) Molecular Cloning, Cold Spring Harbor Laboratory); “Oligonucleotide Synthesis” (Gait, M. J. (1984). Oligonucleotide synthesis. IRL press); “Culture of Animal Cells” (Freshney, R. (2010). Culture of animal cells. Cell Proliferation, 15(2.3), 1); “Methods in Enzymology”; “Weir's Handbook of Experimental Immunology” (Wiley-Blackwell; 5 edition (Jan. 15, 1996)); “Gene Transfer Vectors for Mammalian Cells” (Miller and Carlos, (1987) Cold Spring Harbor Laboratory, New York); “Short Protocols in Molecular Biology” (Ausubel et al., Current Protocols; 5 edition (Nov. 5, 2002)); “Polymerase Chain Reaction: Principles, Applications and Troubleshooting” (Babar, M., VDM Verlag Dr. Müller (Aug. 17, 2011)); “Current Protocols in Immunology” (Coligan, John Wiley & Sons, Inc. Nov. 1, 2002).
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, numerous equivalents to the specific procedures, embodiments, claims, and examples described herein. Such equivalents are considered to be within the scope of this invention and covered by the claims appended hereto. For example, it should be understood that modifications in reaction conditions, including but not limited to reaction times, reaction size/volume, and experimental reagents, such as solvents, catalysts, pressures, atmospheric conditions, e.g., nitrogen atmosphere, and reducing/oxidizing agents, with art-recognized alternatives and using no more than routine experimentation, are within the scope of the present application.
It is to be understood that wherever values and ranges are provided herein, all values and ranges encompassed by these values and ranges, are meant to be encompassed within the scope of the present invention. Moreover, all values that fall within these ranges, as well as the upper or lower limits of a range of values, are also contemplated by the present application.
The following examples further illustrate aspects of the present invention. However, they are in no way a limitation of the teachings or disclosure of the present invention as set forth herein.
The invention is now described with reference to the following Examples. These Examples are provided for the purpose of illustration only, and the invention is not limited to these Examples, but rather encompasses all variations that are evident as a result of the teachings provided herein.
The materials and methods employed in Experimental Examples 1-7 are now described.
Design, synthesis and cloning of the CCAS library: Significantly mutated genes (SMGs) were identified by analysis of pan-cancer mutation data of 17 cancer types from The Cancer Genome Atlas downloaded via Synapse (www dot synapse.org/#!Synapse:syni729383) and from the Broad Institute GDAC (gdac dot broadinstitute dot org/). The top 50 putative tumor suppressor genes (TSGs) were chosen in an unbiased manner using a multistep approach that prioritizes genes that are significantly mutated in multiple cancer types and possess mutational signatures consistent with non-oncogenes. (1) A list of all significantly mutated genes in each of the 17 cancer types was first compiled by collecting all MutSig2CV results from GDAC and using a cutoff of q<0.1. (2) To remove putative oncogenes from the significantly mutated gene sets in each cancer type, the ratio of null to silent mutations for each SMG in that cancer was calculated, and this ratio was multiplied by the square root of the number of null mutations. (3) Ratio scores for each gene were then summed across cancer types. (4) Finally, to heavily weight genes that are SMGs in multiple cancer types, the summed ratio scores were multiplied by the number of unique cancer types in which a gene was considered an SMG. The resulting gene set was defined as PANCAN17-TSG50.
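The four scoring steps above can be sketched in Python as follows. This is an illustrative sketch only: the input layout (a dictionary of per-cancer-type null and silent mutation counts for genes already passing the q<0.1 cutoff) and the function names are hypothetical, and flooring the silent count at 1 to avoid division by zero is an assumption that is not stated in the description.

```python
from math import sqrt

def tsg_scores(smg_counts):
    """Score genes from {cancer_type: {gene: (n_null, n_silent)}}.

    Only genes that already passed the q < 0.1 cutoff in a cancer type
    are expected to be listed under that cancer type (step 1).
    """
    summed = {}    # step 3: ratio scores summed across cancer types
    n_types = {}   # number of cancer types in which the gene is an SMG
    for genes in smg_counts.values():
        for gene, (n_null, n_silent) in genes.items():
            # Assumption: floor the silent count at 1 to avoid dividing by zero.
            ratio = n_null / max(n_silent, 1)
            score = ratio * sqrt(n_null)   # step 2
            summed[gene] = summed.get(gene, 0.0) + score
            n_types[gene] = n_types.get(gene, 0) + 1
    # Step 4: weight by the number of cancer types with SMG status.
    return {gene: summed[gene] * n_types[gene] for gene in summed}

def top_genes(smg_counts, n=50):
    """Return the n highest-scoring genes."""
    scores = tsg_scores(smg_counts)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

With toy counts, a gene that is an SMG in two cancer types is weighted above a gene that is an SMG in only one, which reflects the intent of step (4).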
Of the top 50 putative TSGs identified by this approach, 49 were found to have clear mouse orthologs (defined as PANCAN17-mTSG). The complete exon sequences of these 49 genes were then analyzed to extract all possible Cpf1 spacers (i.e., all 20-mers beginning with the Cpf1 PAM, 5′-TTTN-3′). Each of these 20-mers was then reverse complemented and mapped to the entire mm10 reference genome by Bowtie 1.1.2, with settings bowtie -n 2 -l 18 -p 8 -a -y --best -e 90. After filtering out all alignments that contained mismatches in the final 3 basepairs (corresponding to the Cpf1 PAM) and disregarding any mismatches in the fourth to last basepair, the number of genome-wide alignments for each crRNA was quantified using all 0, 1, and 2 mismatch (mm) alignments. A total mismatch score (MM score) was calculated for each crRNA using the following ad hoc formula: MM score = (number of 0 mm alignments)*1000 + (number of 1 mm alignments)*50 + (number of 2 mm alignments)*1. An “on-target” (OT) score was also approximated by counting the maximum number of consecutive thymidines in each crRNA, and then using the formula: OT score = 100/(max_consecutive_Thymidines)^2. All the crRNAs corresponding to each target gene were sorted by low MM score and high OT score. Finally, the top 2 crRNAs for each gene were chosen. In the event of ties, crRNAs targeting constitutive exons and/or the first exon were prioritized. Three NTC crRNAs were randomly generated.
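A minimal Python sketch of the spacer extraction and the two ad hoc scores described above is given here for illustration. The function names and the candidate record layout are hypothetical; the sketch scans only the strand it is given (the opposite strand could be handled by passing the reverse complement), and flooring the thymidine run at 1 is an assumption to keep the OT score defined for sequences without thymidines.

```python
import re

def find_cpf1_windows(seq, length=20):
    """Return (start, 20-mer) for every window beginning with a 5'-TTTN-3' PAM."""
    seq = seq.upper()
    # A lookahead is used so that overlapping windows are all reported.
    pattern = re.compile(r"(?=(TTT[ACGT][ACGT]{%d}))" % (length - 4))
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]

def mm_score(n0, n1, n2):
    """MM score from counts of 0-, 1- and 2-mismatch genome-wide alignments."""
    return n0 * 1000 + n1 * 50 + n2 * 1

def max_t_run(spacer):
    """Length of the longest run of consecutive thymidines."""
    runs = re.findall(r"T+", spacer.upper())
    return max((len(r) for r in runs), default=0)

def ot_score(spacer):
    """OT score = 100 / (max consecutive thymidines)^2."""
    # Assumption: a run length of 0 is treated as 1 to avoid division by zero.
    return 100 / max(max_t_run(spacer), 1) ** 2

def rank_candidates(candidates):
    """Sort candidate dicts (keys: spacer, n0, n1, n2) by low MM, then high OT."""
    return sorted(
        candidates,
        key=lambda c: (mm_score(c["n0"], c["n1"], c["n2"]), -ot_score(c["spacer"])),
    )
```

Under this scheme the first two entries returned by rank_candidates for a given gene would correspond to the two crRNAs selected for that gene, before any tie-breaking by exon position.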
To generate the 9,408 DKO crRNA arrays in the library, all possible permutations of the 98 gene-targeting crRNAs were computed, with the stipulation that crRNAs targeting the same gene would not be included in the same crRNA array. For SKO crRNA arrays, each gene-targeting crRNA was placed in the first position of the crRNA array and the 3 NTCs were toggled through the second position (98*3=294 crRNA arrays). Finally, 3 NTC-NTC crRNA arrays were generated from various combinations of the 3 NTC single crRNAs.
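The library composition described above can be checked with a short enumeration. The sketch below is illustrative only; the gene labels and tuple layout are placeholders, and the three NTC-NTC arrays are not enumerated because the description does not specify which NTC combinations were used.

```python
from itertools import permutations, product

def build_library(genes, guides_per_gene=2, n_ntc=3):
    """Enumerate DKO and SKO crRNA arrays as (first, second) crRNA tuples."""
    guides = [(g, i) for g in genes for i in range(1, guides_per_gene + 1)]
    ntcs = [("NTC", i) for i in range(1, n_ntc + 1)]
    # DKO: ordered pairs of gene-targeting crRNAs, never two for the same gene.
    dko = [(a, b) for a, b in permutations(guides, 2) if a[0] != b[0]]
    # SKO: gene-targeting crRNA in position 1, each NTC in position 2.
    sko = list(product(guides, ntcs))
    return dko, sko

genes = ["gene%d" % i for i in range(1, 50)]  # 49 placeholder gene names
dko, sko = build_library(genes)
n_ntc_ntc = 3                                 # taken from the description
print(len(dko), len(sko), len(dko) + len(sko) + n_ntc_ntc)
```

Each of the 98 crRNAs can be paired with 96 partners (excluding itself and the other crRNA for the same gene), giving 98*96=9,408 DKO arrays; together with the 294 SKO and 3 NTC-NTC arrays this yields 9,705 arrays in total.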
Cell lines: A non-small cell lung cancer (NSCLC) cell line (KPD cell line) was used for initial testing of crRNA array constructs. An immortalized, but non-transformed hepatocyte cell line (clone IM) was transduced with LentiCpf1 to generate Cpf1-positive cells (IM.C9-Cpf1). All cell lines were grown under standard conditions using DMEM containing 10% FBS, 1% Pen/strep in a 5% CO2 incubator.
Nextera analysis of indels generated by Cpf1: crRNA arrays (crPten.crNf1 and crNf1.crPten) were cloned into the Lenti-U6-crRNA vector, and virus was generated for transduction of the KPD cell line.
Seven days after transduction and puromycin selection, genomic DNA was harvested from the cells in culture. The surrounding genomic regions flanking the target sites of crPten and crNf1 were first amplified by PCR using the following primers (5′-3′): Pten_fwd=ACTCACCAGTGTTTAACATGCAGGC (SEQ ID NO: 9,711), Pten_rev=GGCAAGGTAGGTACGCATTTGCT (SEQ ID NO: 9,712); Nf1_fwd=AGCAGCTGTCCTGGCTGTTC (SEQ ID NO: 9,713), Nf1_rev=CGTGCACCTCCCTTGTCAGG (SEQ ID NO: 9,714). Nextera XT library preparation was then performed according to manufacturer protocol. Reads were mapped to the mm10 mouse genome using BWA (Li and Durbin. Bioinforma. Oxf Engl. 25, 1754-1760 (2009)), with the settings bwa mem -t 8 -w 200. Indel variants were first processed with Samtools (Li, H. et al. Bioinformatics 25, 2078-2079 (2009)) with the settings samtools mpileup -B -q 10 -d 10000000000000, then piped into VarScan v2.3.9 (Koboldt, et al. Genome Res. 22, 568-576 (2012)) with the settings pileup2indel --min-coverage 1 --min-reads2 1 --min-var-freq 0.00001.
Lentiviral library production: The LentiCpf1, Lenti-U6-crRNA vector and Lenti-CCAS library plasmids were used to make vector or library-containing lentiviruses. Briefly, envelope plasmid pMD2.G, packaging plasmid psPAX2, and LentiCpf1, Lenti-U6-crRNA or Lenti-CCAS-library plasmid were added at ratios of 1:1:2.5, and then polyethyleneimine (PEI) was added and mixed well by vortexing. The solution was left to stand at room temperature for 10-20 min, and the mixture was then added dropwise to 80-90% confluent HEK293FT cells and mixed well by gently agitating the plates. Six hours post-transfection, the transfection media was replaced with fresh DMEM supplemented with 10% FBS and 1% Pen/Strep. Virus-containing supernatant was collected at 48 h and 72 h post-transfection, centrifuged at 1500 g for 10 min to remove the cell debris, aliquoted and stored at −80° C. Virus was titrated by infecting IM-Cpf1 cells at a number of different concentrations, followed by the addition of 2 μg/mL puromycin at 24 h post-infection to select the transduced cells. The virus titers were determined by calculating the ratio of the number of surviving cells at 48 or 72 h post-infection to the cell count at infection.
CCAS in a mouse model of transformation and early tumorigenesis: Library transduction was performed with four infection replicates at high coverage and low MOI. Briefly, according to the viral titers, CCAS library lentiviruses were added to a total of >1×10⁸ IM.C9-Cpf1 cells at a calculated MOI of ≤0.2 and incubated for 24 h before the virus-containing media was replaced with fresh media containing 3 μg/mL puromycin to select the virus-transduced cells. Approximately 2×10⁷ cells confer ~2,000× library coverage. Vector and CCAS library-transduced cells were cultured under the pressure of 3 μg/mL puromycin for 7 days before injection or cryopreservation.
Vector and CCAS library-transduced IM.C9-Cpf1 cells were injected subcutaneously into the right and left flanks of Nu/Nu mice at 4×10⁶ cells per flank (~400× coverage per transplant). Tumors were measured every week by caliper and their sizes were estimated as spheres. Statistical significance was assessed by paired t-test.
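The coverage figures quoted above follow from simple arithmetic, which can be sketched as below. The library size of 9,705 arrays is taken from elsewhere in this description; the Poisson model of viral integration is a common approximation and is an assumption of this sketch, not a statement of how the MOI was calculated.

```python
from math import exp

LIBRARY_SIZE = 9705  # total crRNA arrays in the CCAS library

def coverage(n_cells, library_size=LIBRARY_SIZE):
    """Average number of cells per crRNA array."""
    return n_cells / library_size

def transduced_fraction(moi):
    """Fraction of cells with at least one integrant (Poisson assumption)."""
    return 1 - exp(-moi)

def single_integrant_fraction(moi):
    """Among transduced cells, the fraction carrying exactly one integrant."""
    return moi * exp(-moi) / (1 - exp(-moi))

print(round(coverage(2e7)))                       # transduced pool
print(round(coverage(4e6)))                       # cells per flank
print(round(single_integrant_fraction(0.2), 3))   # at MOI 0.2
```

Under the Poisson assumption, roughly nine in ten transduced cells carry a single integrant at an MOI of 0.2, which is why a low MOI is used when each cell is meant to carry one crRNA array.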
Mouse tumor dissection and histology: Mice were sacrificed by carbon dioxide asphyxiation followed by cervical dislocation. Tumors and other organs were manually dissected, and then fixed in 10% formalin for 24-96 hours, and transferred into 70% Ethanol for long-term storage. The tissues were embedded in paraffin, sectioned at 5 μm and stained with hematoxylin and eosin (H&E) for pathological analysis. For tumor size quantification, H&E slides were scanned using an Aperio digital slidescanner (Leica). For molecular biological analysis, tissues were flash frozen with liquid nitrogen, and ground in 5 mL Frosted polyethylene vial set (2240-PEF) in a 2010 GenoGrinder machine (SPEXSamplePrep). Homogenized tissues were used for DNA/RNA/protein extractions.
CCAS in a mouse model of metastasis: For the Cpf1 crRNA array library screen in a mouse model of metastasis, lentiviral pools were generated from the CCAS plasmid library and used to transduce ≥1×10⁸ Cpf1+ KPD cells in three independent infection replicates at a calculated MOI of ≤0.2. Cells were incubated for 24 h before the virus-containing media was replaced with fresh media containing 3 μg/mL puromycin to select the virus-transduced cells. Approximately 2×10⁷ cells confer ~2,000× library coverage. CCAS library-transduced cells were cultured under the pressure of 3 μg/mL puromycin for 7 days before injection or cryopreservation.
CCAS-treated cells were then injected at 4×10⁶ cells per mouse (~400× coverage) subcutaneously into Nu/Nu mice (n=7) and Rag1−/− mice (n=4). Metastases were allowed to form in vivo for 8 weeks after injection. Primary tumors, four lung lobes, and other stereoscope-visible metastases were then dissected and subjected to genomic DNA extraction and crRNA array sequencing.
Genomic DNA extraction: 200-800 mg of frozen ground tissue was re-suspended in 6 mL of NK Lysis Buffer (50 mM Tris, 50 mM EDTA, 1% SDS, pH 8.0) supplemented with 30 μL of 20 mg/mL Proteinase K (Qiagen) in 15 mL conical tubes, and incubated in a 55° C. bath for 2 h up to overnight. After all the tissues had been lysed, 30 μL of 10 mg/mL RNAse A (Qiagen) was added, mixed well and incubated at 37° C. for 30 min. Samples were chilled on ice and then 2 mL of pre-chilled 7.5 M ammonium acetate (Sigma) was added to precipitate proteins. The samples were inverted and vortexed for 15-30 s and then centrifuged at ≥4,000 g for 10 min. The supernatant was carefully decanted into a new 15 mL conical tube, followed by the addition of 6 mL 100% isopropanol (at a ratio of 0.7); the tube was inverted 30-50 times and centrifuged at ≥4,000 g for 10 min. Genomic DNA should be visible as a small white pellet. After discarding the supernatant, 6 mL of freshly prepared 70% ethanol was added, mixed well, and then centrifuged at ≥4,000 g for 10 min. The supernatant was discarded by pouring, and any remaining residue was removed using a pipette. After air-drying for 10-30 min, DNA was re-suspended by adding 200-500 μL of Nuclease-Free H2O. The genomic DNA concentration was measured using a Nanodrop (Thermo Scientific), and normalized to 1000 ng/μL for the following readout PCR.
Cpf1 crRNA array library readout: The crRNA array library readout was performed using a 2-step PCR approach. Briefly, in the 1st round PCR, enough genomic DNA was used as template to guarantee coverage of the library abundance and representation. For example, assuming 6.6 pg of gDNA per cell, 20-48 μg of gDNA (≥75×) was used per sample. For the 1st PCR, the crRNA array-containing region was amplified using primers specific to the double-knockout CCAS vector using Phusion Flash High Fidelity Master Mix (ThermoFisher) with thermocycling parameters: 98° C. for 1 min, 15 cycles of (98° C. for 1 s, 60° C. for 5 s, 72° C. for 15 s), and 72° C. for 1 min. Fwd: AATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCG (SEQ ID NO: 9,715); Rev: CTTTAGTTTGTATGTCTGTTGCTATTATGTCTACTATTCTTTCCC (SEQ ID NO: 9,716).
In the 2nd PCR, 1st round PCR products for each biological repeat were pooled, then 1-2 μL of well-mixed 1st PCR products was used as the template for amplification using sample-tracking barcode primers with the following thermocycling conditions: 98° C. for 1 min, 15 cycles of (98° C. for 1 s, 60° C. for 5 s, 72° C. for 15 s), and 72° C. for 1 min. The 2nd PCR products were quantified in 2% E-gel EX (Life Technologies) using E-Gel® Low Range Quantitative DNA Ladder (ThermoFisher), then equal amounts of each barcoded sample were combined. The pooled PCR products were purified using the QIAquick PCR Purification Kit and further purified from 2% E-gel EX using the QIAquick Gel Extraction Kit. The purified pooled library was quantified by a gel-based method. Diluted libraries with 5-20% PhiX were sequenced with HiSeq 2500 or HiSeq 4000 systems (Illumina) with 150 bp paired-end read length.
Cpf1 double knockout Illumina data pre-processing: Raw single-end fastq read files were filtered and demultiplexed using Cutadapt (Martin, EMBnet.journal 17, 10-12 (2011)). To remove extra sequences downstream (i.e. 3′ end) of the dual-RNA spacer sequences, including the U6 terminator, the following settings were used: cutadapt --discard-untrimmed -e 0.1 -a TTTTTTAAGCTTGGCGTGGATCCGATATCA (SEQ ID NO: 9,717). As the forward PCR primers used to readout crRNA array representation were designed to have a variety of barcodes to facilitate multiplexed sequencing, these filtered reads were then demultiplexed with the following settings: cutadapt -g file:fbc.fasta --no-trim, where fbc.fasta contained the 12 possible barcode sequences within the forward primers. Finally, to remove extraneous sequences upstream (i.e. 5′ end) of the crRNA array spacers, including the first DR, the following settings were used: cutadapt --discard-untrimmed -e 0.1 -g AAAGGACGAAACACCgTAATTTCTACTAAGTGTAGAT (SEQ ID NO: 9,718). Through this procedure, the raw fastq read files were pared down to the sequences of the first crRNA, the second DR, and finally the second crRNA (cr1-DR-cr2). The filtered fastq reads were then mapped to the CCAS reference index.
To do so, a bowtie index of the CCAS library was first generated using the bowtie-build command in Bowtie 1.1.2 (Langmead, et al. (2009), Genome Biol. 10, R25). Using these bowtie indexes, the filtered fastq read files were mapped using the following settings: bowtie -v 2 -k 1 -m 1 --best. These settings ensured only single-match reads would be retained for downstream analysis.
Analysis of CCAS library representation: Using the resultant mapping output, the number of reads that had mapped to each crRNA array within the library was quantified. The read counts in each sample were normalized by converting raw crRNA array counts to reads per million (rpm). The rpm values were then subject to log2 transformation for certain analyses. To generate correlation heatmaps, the NMF R package was used. To generate crRNA array representation barplots, a detection threshold of log2 rpm≥1 was set, and the number of unique crRNA arrays present in each sample was counted.
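The normalization and detection threshold described above can be sketched as follows. The function names are hypothetical, and the pseudocount added before the log2 transformation is an assumption (the description does not state how zero counts were handled).

```python
from math import log2

def to_rpm(counts):
    """Convert raw crRNA array read counts to reads per million (rpm)."""
    total = sum(counts.values())
    return {array: n / total * 1e6 for array, n in counts.items()}

def log2_rpm(counts, pseudocount=1.0):
    """log2-transform rpm values.

    Assumption: a pseudocount is added so that arrays with zero reads map
    to a finite value; this detail is not given in the description.
    """
    return {array: log2(v + pseudocount) for array, v in to_rpm(counts).items()}

def n_detected(counts, threshold=1.0):
    """Number of crRNA arrays with log2 rpm at or above the threshold."""
    return sum(1 for v in log2_rpm(counts).values() if v >= threshold)
```

Applied per sample, n_detected gives the number of unique crRNA arrays counted as present in that sample.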
Analysis of enriched DKO and SKO crRNA arrays: To directly compare the abundance in tumor samples vs. cells, linear regression was performed and significant outliers identified using the outlierTest function from the car R package. Significant outlier crRNA arrays in individual tumors vs. cells were defined as having a Bonferroni adjusted p<0.05, based on analysis of the studentized regression residuals.
To identify crRNA arrays significantly enriched above NTC-NTC controls, two-sided t-tests were similarly performed on the log2 rpm abundance of each crRNA array compared to the average of all NTC-NTC crRNA arrays. Significantly enriched crRNA arrays were defined as having a Benjamini-Hochberg adjusted p<0.05. Each significantly enriched crRNA array was then deconstructed into its two constituent crRNAs, and finally down to the two target genes. This 3-tiered dataset was used to determine how many genes were involved in an enriched crRNA array (either SKO or DKO). Finally, all of the significant crRNA arrays associated with each gene were compiled, and the number of DKO or SKO crRNA arrays counted.
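For illustration, the Benjamini-Hochberg adjustment applied to the per-array p-values can be written in a few lines of Python. This is a generic sketch of the standard procedure, not the authors' code; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant(pvalues, alpha=0.05):
    """Indices of tests with adjusted p below alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q < alpha]
```

A crRNA array would be called significantly enriched when its adjusted p-value falls below 0.05, as stated above.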
Position effect analysis of crRNA permutations: Marginal distribution analysis was performed by considering each of the 98 single crRNAs when found in position 1 or position 2 of the crRNA array. Specifically, the average log2 rpm abundance was calculated for each single crRNA, and these average scores were compared between position 1 and position 2. For direct permutation correlation analysis, the 9,408 DKO crRNA arrays were condensed down into 4,704 crRNA array combinations (i.e., crX.crY and crY.crX are two permutations of the same combination). The correlation between the two corresponding permutations was then calculated across all 10 tumor samples (defined as permutation correlation), and the statistical significance was assessed by t-distribution. Violin plots, empirical density plots, and scatterplots were generated using these permutation correlation coefficients.
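The permutation correlation can be sketched as below. The "crX.crY" naming follows the array names used in this description; the use of the Pearson coefficient, and of the usual t statistic with n−2 degrees of freedom for its significance, are assumptions of this sketch, since the description specifies only a correlation assessed by t-distribution.

```python
from math import sqrt

def combination_key(array_name):
    """Map 'crX.crY' and 'crY.crX' to the same order-independent key."""
    first, second = array_name.split(".")
    return tuple(sorted((first, second)))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_correlations(abundance):
    """abundance: {array_name: [log2 rpm in each tumor]} -> {combination: r}."""
    groups = {}
    for name, values in abundance.items():
        groups.setdefault(combination_key(name), []).append(values)
    # Only combinations with both permutations present are correlated.
    return {k: pearson(v[0], v[1]) for k, v in groups.items() if len(v) == 2}

def t_statistic(r, n):
    """t statistic for a correlation r over n samples (n - 2 degrees of freedom)."""
    return r * sqrt((n - 2) / (1 - r * r))
```

Grouping by combination_key is what condenses the 9,408 permutations into 4,704 combinations.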
Synergy analysis of gene pairs: The synergy coefficient (SynCo) for each DKO crRNA array was defined with the following formula:
SynCo = DKO_xy − SKO_x − SKO_y
The DKO_xy score is the log2 rpm abundance of the DKO crRNA array (i.e., crX.crY) after subtracting average NTC-NTC abundance, while the SKO_x and SKO_y scores are defined as the average log2 rpm abundance of each SKO crRNA array (3 SKO crRNA arrays associated with each individual crRNA), each after subtracting average NTC-NTC abundance. By this definition, a SynCo score >> 0 would indicate that a given DKO crRNA array is synergistic, as the DKO score would thus be greater than the sum of the individual SKO scores. The SynCo of each DKO crRNA array was calculated within each tumor sample, and it was assessed whether the SynCo score of a given crRNA array across all 10 tumors was statistically significantly different from 0 by a two-sided one-sample t-test. A significance threshold of Benjamini-Hochberg adjusted p<0.05 was set, and all significant DKO crRNA arrays with an average SynCo>0 were considered to be synergistic.
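As an illustration of the definition above, the synergy coefficient for one DKO crRNA array in one tumor sample, and the one-sample t statistic across tumors, can be computed as follows. The function names and argument layout are hypothetical; converting the t statistic into a p-value would additionally require a t distribution with n−1 degrees of freedom from a statistics library.

```python
from math import sqrt
from statistics import mean, stdev

def synco(dko, sko_x, sko_y, ntc_ntc):
    """Synergy coefficient for one DKO array in one sample.

    dko      : log2 rpm of the DKO array crX.crY
    sko_x    : log2 rpm values of the 3 SKO arrays for crX
    sko_y    : log2 rpm values of the 3 SKO arrays for crY
    ntc_ntc  : log2 rpm values of the NTC-NTC arrays
    """
    baseline = mean(ntc_ntc)
    dko_xy = dko - baseline
    sko_x_score = mean(sko_x) - baseline
    sko_y_score = mean(sko_y) - baseline
    return dko_xy - sko_x_score - sko_y_score

def one_sample_t(values, mu=0.0):
    """t statistic for testing whether the mean of values differs from mu."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / sqrt(n))

# Toy example: the DKO array is 3 log2 units above the additive expectation.
print(synco(10, [4, 5, 6], [3, 3, 3], [1, 1, 1]))
```

A positive value means the double knockout is more abundant than expected from adding the two single-knockout effects.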
Network analysis: Using the synergistic crRNA arrays identified through SynCo analysis, library-wide networks were constructed using individual genes as nodes and SynCo scores as edge weights. The pairwise connections were visualized through Cytoscape 3.4.0 (Shannon et al., Genome Res. 13, 2498-2504 (2003)). Edge width was scaled according to SynCo score. For the global network, node color was additionally scaled according to the degree of network connectivity.
Analysis of co-mutation patterns in human pan-cancer datasets: For the synergistic driver pairs identified by the CCAS screen, co-mutation analyses were performed on 21 different solid tumor types, all of which were from TCGA except for small cell lung cancer. The somatic mutation and copy number status of each cohort were obtained from cBioPortal (Cerami et al., Cancer Discov. 2, 401-404 (2012)) (only somatic mutations were available for small cell lung cancer), and all tumors were classified as mutant or non-mutant for the genes represented in the CCAS library. “Mutant” was defined as the presence of nonsynonymous mutations and/or deep deletions in a given gene. After classifying every patient in terms of mutant status, co-mutation (co-occurrence) analysis was performed by calculating the co-occurrence rate for each gene pair. The co-occurrence rate was defined as the intersection (the number of double mutant samples) divided by the union (the number of all single and double mutant samples). Statistical significance was tested by a hypergeometric test, with a significance threshold of Benjamini-Hochberg adjusted p<0.05.
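The co-occurrence rate and a hypergeometric test can be sketched as below. The function names are hypothetical, and the use of the one-sided upper tail (probability of observing at least the seen number of double mutants) is an assumption, as the description states only that a hypergeometric test was used.

```python
from math import comb

def cooccurrence_rate(mutant_a, mutant_b):
    """Intersection over union of the sample sets mutant for gene A and gene B."""
    a, b = set(mutant_a), set(mutant_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hypergeom_upper_tail(k, n_total, n_a, n_b):
    """P(X >= k) double mutants given n_a and n_b mutants among n_total samples."""
    top = min(n_a, n_b)
    numerator = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i) for i in range(k, top + 1)
    )
    return numerator / comb(n_total, n_b)

# Toy cohort: samples p2 and p3 carry mutations in both genes.
print(cooccurrence_rate({"p1", "p2", "p3"}, {"p2", "p3", "p4"}))
```

The resulting p-values for all gene pairs would then be adjusted by the Benjamini-Hochberg procedure before applying the 0.05 threshold.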
Analysis of metastasis enrichment over primary tumor and metastatic clonal spread: The crRNA array representations were compared between metastases and primary tumors. A crRNA array was called metastasis-enriched if it was a dominant clone in a lung lobe or extra-pulmonary metastasis (≥2% total reads) but not a dominant clone in the corresponding primary tumor of the same mouse. A waterfall plot was made for all crRNA arrays enriched in metastases vs. primary tumor, ranked by the number of mice in which a crRNA array was called enriched.
Monoclonal spread was defined where dominant metastases in all lobes were derived from identical crRNA arrays, and polyclonal spread was defined where dominant metastases in all lobes were derived from multiple varying crRNAs.
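The metastasis-enrichment call and the clonal spread classification can be sketched as follows. The function names and input layout (read counts per crRNA array for each sample) are hypothetical, and treating "identical crRNA arrays" as identical sets of dominant arrays across lobes is one possible reading of the definition above.

```python
def dominant_arrays(counts, threshold=0.02):
    """crRNA arrays contributing at least 2% of total reads in a sample."""
    total = sum(counts.values())
    if total == 0:
        return set()
    return {array for array, n in counts.items() if n / total >= threshold}

def metastasis_enriched(primary_counts, metastasis_counts_list):
    """Arrays dominant in any metastasis but not in the matched primary tumor."""
    dominant_primary = dominant_arrays(primary_counts)
    enriched = set()
    for met_counts in metastasis_counts_list:
        enriched |= dominant_arrays(met_counts) - dominant_primary
    return enriched

def clonal_spread(lobe_counts_list):
    """'monoclonal' if all lobes share the same dominant arrays, else 'polyclonal'."""
    dominant_sets = {frozenset(dominant_arrays(c)) for c in lobe_counts_list}
    return "monoclonal" if len(dominant_sets) == 1 else "polyclonal"
```

Counting, for each crRNA array, the number of mice in which metastasis_enriched returns it gives the ranking used for the waterfall plot.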
Blinding statement: Investigators were blinded for sequencing data analysis, but not blinded for tumor engraftment, organ dissection and histology analysis.
The results of the experiments from Examples 1-7 are now described.
To establish a lentiviral system for CRISPR/Cpf1-mediated genetic screening, a human-codon-optimized LbCpf1 expression vector (pLenti-EFS-Cpf1-blast, LentiCpf1 for short) and a crRNA expression vector (pLenti-U6-DR-crRNA-puro, Lenti-U6-crRNA for short) were generated (
To investigate whether Cpf1 multiplex gene targeting could be utilized for multidimensional genetic interaction screens, a library for Cpf1 crRNA array screening was developed (CCAS library). Considering the resolution of library complexity under in vivo cellular dynamics, a focused CCAS library was designed targeting the top 50 significantly mutated genes (SMGs) that are not oncogenes, with the vast majority being established or putative tumor suppressor genes (TSGs) identified through analysis of 17 different cancer types from The Cancer Genome Atlas (TCGA). The resultant gene set was termed PANCAN17-TSG50. (
Compiling these 98 gene-targeting crRNAs and 3 additional non-targeting control (NTC) crRNAs, a crRNA array library was designed containing 9,705 permutations of two crRNAs each (
To perform an in vivo Cpf1 screen, a mouse model of malignant transformation and early stage tumorigenesis was utilized. An immortalized murine cell line with low tumorigenicity (clone IM) was transduced with LentiCpf1 and then with the CCAS lentiviral pool. The library transduction was performed with four infection replicates at high coverage (~2,000× coverage for each replicate) and low multiplicity of infection (MOI, ≤0.2) to ensure the vast majority of cells would only carry one provirus integrant (
To unveil the genetic interactions that had driven rapid tumor growth upon Cpf1-mediated mutagenesis, crRNA array sequencing was performed on genomic DNA from CCAS tumors (n=10) and pre-injection cell pools (n=4). Whereas plasmid and cell samples were highly correlated with one another, tumor samples were more correlated with other tumors (
To further investigate the specific genetic interactions that had driven early stage tumorigenesis in CCAS-treated cells, the distribution of raw crRNA array abundance within each sample was examined. Within each tumor, specific crRNA arrays were observed that were heavily enriched by several orders of magnitude, suggesting that these mutant clones had undergone potent positive selection (
Interestingly, this finding that several DKO crRNA arrays were more heavily enriched than their SKO counterparts was corroborated across tumors. For instance, Tumor 3 was dominated by crSetd2.crAcvr2a and crRnf43.crAtrx, Tumor 5 by crCic.crZc3h13 and crCbwd1.crNsd1, and Tumor 6 by crAtm.crRunx1 and crKmt2d.crH2-Q2 (
Taken together, these data point to the dominance of a handful of individual clones within each tumor sample, and further suggest that certain double-mutant clones had out-competed the corresponding single-mutant clones.
In order to uncover the genetic interactions underlying the positive selection in vivo, the next set of experiments set out to quantitatively identify all significantly enriched crRNA arrays across all 10 tumors. The abundance of each DKO and SKO crRNA array was compared to the average of all NTC-NTC crRNA arrays. 655 crRNA arrays targeting 498 gene combinations were found to be significantly enriched compared to NTC-NTC controls (Benjamini-Hochberg adjusted p<0.05) (
Specific genetic interactions that comprise this network were then investigated. The number of significant DKO crRNA arrays associated with each gene pair were quantified (
To investigate possible positional effects for each individual crRNA in the CCAS library, the two permutations of each crRNA array combination were directly compared (
To quantitate the gross contributions of individual crRNAs to tumorigenesis, marginal distribution meta-analysis of all 98 constituent single crRNAs in the CCAS library was performed (
To quantitatively investigate the genetic interactions in this model, a metric of synergy for DKO crRNA arrays was developed. Since the relative abundance of a crRNA array is effectively an estimate of its relative selective advantage in vivo, the synergy coefficient (SynCo) for each DKO crRNA array was defined as SynCo = DKO_xy − SKO_x − SKO_y. The DKO_xy score is the log2 rpm abundance of the DKO crRNA array (i.e., crX.crY) after subtracting average NTC-NTC abundance; the SKO_x and SKO_y scores are defined as the average log2 rpm abundance of the SKO crRNA arrays for each constituent crRNA (3 SKO crRNA arrays associated with each individual crRNA), each after subtracting average NTC-NTC abundance (
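As an illustration of the SynCo definition, the following minimal sketch computes the score for one DKO crRNA array in one sample as the DKO score minus the two SKO scores, each taken relative to the average NTC-NTC abundance. The function name and the input layout (plain lists of log2 rpm values) are assumptions made for illustration and are not part of the described pipeline.

```python
from statistics import mean


def synco(dko_log2rpm, sko_x_log2rpm, sko_y_log2rpm, ntc_ntc_log2rpm):
    """Synergy coefficient for one DKO crRNA array in one sample.

    dko_log2rpm      log2 rpm of the crX.crY array
    sko_x_log2rpm    log2 rpm values of the SKO arrays carrying crX
    sko_y_log2rpm    log2 rpm values of the SKO arrays carrying crY
    ntc_ntc_log2rpm  log2 rpm values of all NTC-NTC arrays
    """
    baseline = mean(ntc_ntc_log2rpm)        # average NTC-NTC abundance
    dko_score = dko_log2rpm - baseline
    sko_x = mean(sko_x_log2rpm) - baseline  # average over the SKO arrays of crX
    sko_y = mean(sko_y_log2rpm) - baseline  # average over the SKO arrays of crY
    return dko_score - sko_x - sko_y


# Toy values, invented purely to show the arithmetic.
example = synco(12.0, [3.0, 4.0, 5.0], [2.0, 2.0, 2.0], [1.0, 1.0, 1.0])
```

With these toy values the baseline is 1, the DKO score is 11, the SKO scores are 3 and 1, and the resulting SynCo is 7.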
The SynCo of each DKO crRNA array within each tumor sample was calculated, and it was assessed whether the SynCo score of a given crRNA array across all 10 tumors was statistically significantly different from 0 by a two-sided one-sample t-test. Out of 9,408 DKO crRNA arrays, 294 were significantly synergistic (Benjamini-Hochberg adjusted p<0.05, average SynCo>0), representing 270 gene combinations. To obtain a comprehensive picture of the synergistic driver pairs, the average SynCo of each DKO crRNA array was plotted against its associated p-value, while additionally color-coding each point by average abundance and scaling the size of each point by the percentage of tumors that had a high SynCo score (SynCo>7) for that crRNA array (
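The multiple-testing step can be sketched as follows. This is a minimal, dependency-free rendering of the standard Benjamini-Hochberg adjustment, followed by the two-part call (adjusted p<0.05 and average SynCo>0). It assumes the raw p-values from the one-sample t-test have already been computed elsewhere; the function names and input layout are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_synergistic(mean_synco, pvalues, alpha=0.05):
    """Names of arrays with adjusted p < alpha and average SynCo > 0.

    mean_synco: dict array name -> average SynCo across tumors
    pvalues:    dict array name -> raw t-test p-value (computed elsewhere)
    """
    names = list(pvalues)
    adjusted = benjamini_hochberg([pvalues[n] for n in names])
    return {n for n, q in zip(names, adjusted) if q < alpha and mean_synco[n] > 0}
```

An array with a significant but negative average SynCo is deliberately excluded by the second condition, since only positive synergy is of interest here.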
To pinpoint the most robust genetic interactions from SynCo analysis, the number of synergistic dual-crRNAs associated with each gene pair was quantified. Of the 270 significant gene pairs, 24 were represented by at least 2 synergistic dual-crRNAs (
Two hundred and seventy significant pairwise genetic interactions in early tumorigenesis were identified, many of which corresponded to genomic features of human tumors. Next, each of these gene pairs was placed within the larger network of tumor suppression. A network of all synergistic driver interactions captured by CCAS screening was constructed, where each node represented a gene and each edge represented a significant synergistic interaction (
Cpf1 crRNA array library screening was performed in a mouse model of metastasis to identify co-drivers of the metastatic process in vivo. Lentiviral pools were generated from the CCAS plasmid library, and Cpf1+ KPD cells were subsequently infected to perform massively parallel gene-pair level mutagenesis. The mixed double mutant cell populations (CCAS-treated cells, 4×10^6 cells per mouse, ~400× coverage) were then injected subcutaneously into Nu/Nu mice (n=7) and Rag1−/− mice (n=4). After 8 weeks, the primary tumors, four lung lobes, and other stereoscope-visible metastases (two large extra-pulmonary metastases were found) were collected and subjected to crRNA array sequencing (
In the CCAS metastasis screen dataset, strong overall permutation correlation was observed, where 97.4% of all crRNA array combinations were significantly correlated when comparing the two permutations associated with each combination (Benjamini-Hochberg adjusted p<0.05, by t-distribution) (median permutation correlation >0.85) (
Independent evidence for selection of metastasis co-drivers was sought via investigation of independent crRNA arrays targeting the same gene pair. By calculating the number of significant DKO crRNA arrays associated with each gene pair in the CCAS library, it was discovered that the majority (729/1176=61.99%) of gene pairs were represented by at least 2 independent DKO crRNA arrays. Of note, 30 gene pairs were represented by seven independent crRNA arrays, including Apc+Cdh1, Cdh1+H2-Q2, and Epha2+Kmt2b; and 8 gene pairs were represented by all eight designed crRNA arrays, namely Arid1a+Pten, Cdh1+Nf1, Cdh1+Kdm5c, Arid1a+Rasa1, Arid1a+Cdh1, Cdh1+Kmt2b, Arid1a+Kmt2b, and Arid1a+Epha2, suggesting that these are strong co-drivers of metastasis (
The in vivo patterns of metastatic evolution of these double mutants were investigated. Examination of the clonal architecture of the crRNA arrays in the metastasis samples revealed a highly heterogeneous pattern of clonal dominance (
To quantify the metastasis-specific signature of double mutants, the number of times a crRNA array was considered metastasis-enriched (i.e., a dominant clone (≥2% of total reads) in a lung lobe or extra-pulmonary metastasis, but not a dominant clone in the corresponding primary tumor of the same mouse) was calculated. The top-ranked metastasis-specific dominant crRNA arrays were found to be crCic.crKmt2b, crCdkn2a.crApc, crRasa1.crNf2, crApc.crKmt2b, crNf2.crPik3r1, and crNf2.crRnf43, among 23 enriched crRNA arrays, with crCic.crKmt2b being metastasis-enriched 55% (6/11 mice) of the time. These data suggest strong genetic signatures of metastasis-specific co-drivers, which have notably been difficult to parse from single-gene studies. Collectively, the results presented herein demonstrate the power of in vivo Cpf1 crRNA array screens for mapping and identification of genetic interactions in an unbiased manner.
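The metastasis-enriched definition can be sketched in code as follows. This is a minimal illustration under stated assumptions: enrichment is tallied once per mouse (consistent with the "6/11 mice" figure), the same 2% read-fraction cutoff is applied to primary tumors and metastases, and the input layout (per-sample dictionaries of read counts) is hypothetical.

```python
def dominant_arrays(read_counts, min_fraction=0.02):
    """crRNA arrays holding at least min_fraction of the reads in one sample."""
    total = sum(read_counts.values())
    if total == 0:
        return set()
    return {name for name, n in read_counts.items() if n / total >= min_fraction}


def count_metastasis_enriched(mice, min_fraction=0.02):
    """Number of mice in which each crRNA array is metastasis-enriched.

    mice: list of dicts of the form
          {"primary": {array: reads}, "metastases": [{array: reads}, ...]}
    """
    tally = {}
    for mouse in mice:
        in_primary = dominant_arrays(mouse["primary"], min_fraction)
        in_mets = set()
        for met in mouse["metastases"]:
            in_mets |= dominant_arrays(met, min_fraction)
        # Dominant in at least one metastasis but not in the matched primary tumor.
        for name in in_mets - in_primary:
            tally[name] = tally.get(name, 0) + 1
    return tally
```

Dividing each tally by the number of mice gives the per-array frequency of metastasis-specific dominance.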
Due to the complex nature of biological systems, a single gene is often far from sufficient to explain the biological or pathological variation observed in health and disease. Genetic interactions are the building blocks of highly connected biological networks, and their modular nature enables biological pathways to take on a variety of forms: linear, branching divergent, convergent, feed-forward, feedback, or any combination of the above. In systems biology, numerous theories and algorithms have been developed to understand such complex networks and to predict genetic interactions. However, such predictions have often been contradicted by unexpected experimental findings, underscoring the need for systematic experimental testing of combinatorial perturbations.
High-throughput genetic screens are a powerful approach for mapping genes to their associated phenotypes. Unbiased and quantitative analysis of double knockouts enables phenotypic assessment of all possible combinations of any given gene pairs. Advances in high-throughput technologies utilizing RNA-interference-based gene knockdown or CRISPR/Cas9-based gene knockout, activation and repression, have enabled genome-scale screening in multiple species across various biological applications. While high-throughput genetic perturbation approaches have been developed to map out the landscape of genetic interactions in yeast and in worms, large-scale double knockout studies in mammalian species are scarce, due to the exponentially scaling number of possible gene combinations and the technological challenges of generating and screening double knockouts. Recently, several high-throughput double perturbations have been performed in mammalian cells using RNA interference (RNAi) or clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 technologies.
However, RNAi-based methods act on the level of mRNA silencing. Though CRISPR/Cas9-based methods can induce complete knockouts, the dependence of Cas9 on a trans-activating crRNA (tracrRNA) requires multiple sgRNA cassettes, hindering the scalability of Cas9-mediated high-dimensional screens, and making in vivo genetics more difficult.
Cpf1 was recently identified and characterized as a single-effector RNA-guided endonuclease with two orthologs from Acidaminococcus (AsCpf1) and Lachnospiraceae (LbCpf1) capable of efficient genome-editing activity in human cells. Unlike Cas9, Cpf1 requires only a single 39-42-nt crRNA without the need of an additional trans-activating crRNA, enabling one RNA polymerase III promoter to drive an array of several crRNAs targeting multiple loci simultaneously. This unique feature of the Cpf1 nuclease greatly simplifies the design, synthesis and readout of multiplexed CRISPR screens, making it a suitable system to carry out combinatorial screens.
Considering that cancer is a polygenic disease of malignant somatic cells, a Cpf1 double knockout screen was designed herein and performed in a mouse model of malignant transformation and early tumorigenesis. In this setting, successful mapping of all permutations of crRNA arrays targeting combinations of two putative non-oncogenes was demonstrated, revealing a wide array of unexpected synergistic gene pairs. The most highly connected ‘hub’ genes were epigenetic factors such as Kmt2c, Atrx, Kdm5c, Setd2, Kdm6a, and Arid1a, suggesting that the multifarious interactions of these factors, whether direct or indirect, lead to drastically accelerated tumorigenesis upon loss-of-function. Without wishing to be limited by any theory, this finding might explain why, despite being frequently mutated in human cancers, single knockouts of such factors rarely lead to tumorigenesis in vivo (though only a limited number of these genes have thus far been studied in animal models). In that sense, epigenetic modifiers might function as genetic buffers, redundant backup pathways, modifiers or amplifiers of multiple other apparently unrelated pathways. Many of the synergistic interactions identified through the screen were subsequently found to be significantly co-mutated across multiple cancer types. In a more complex biological process such as metastasis, which includes a cascade of primary tumor growth, induction of angiogenesis and lymphangiogenesis, intravasation, circulation, extravasation, colonization and immunological interactions, the screen is capable of detecting robust signatures of selection and revealing modes and patterns of clonal expansion of complex pools of double mutants in vivo. Multiplexed Cpf1 screens thus represent a powerful tool for studying genetic interactions with unparalleled simplicity and specificity.
As shown herein, multiplexed Cpf1 screens can enable the high-throughput discovery of synergistic interactions by examining patterns of crRNA array enrichment. On the flip side, crRNA array depletion screens would enable the identification of synthetically lethal gene mutations in cancer, potentially opening new avenues for therapeutic discovery (
The materials and methods employed in Experimental Example 8 are now described.
Design of the MCAP-MET library: The top 23 ranked “tumor suppressors” from the human MET500 cohort (Robinson, D. R. et al. (2017) Nature 548, 297-303) were collected, and combined with 3 top hits from a previous mouse metastasis screen (Nf2, Trim72, and Ube2g2) (Chen, S. et al. (2015) Cell 160, 1246-1260) for a final set of 26 genes. The complete exon sequences of these 26 genes were analyzed to extract all possible Cpf1 spacers (i.e., all 20 mers beginning with the Cpf1 PAM, 5′-TTTV). Each of these 20 mers was then reverse complemented and mapped to the entire mm10 reference genome by Bowtie 1.1.2, with settings -n 2 -l 18 -p 8 -a -y --best -e 90 (Langmead, et al. (2009), Genome Biol. 10, R25). After filtering out all alignments that contained mismatches in the final 3 basepairs (corresponding to the Cpf1 PAM) and disregarding any mismatches in the fourth to last basepair, the number of genome-wide alignments was quantified for each crRNA using all 0, 1, and 2 mismatch (mm) alignments. A total mismatch score (MM score) was calculated for each crRNA using the following formula: MM score=0 mm*1000+1 mm*50+2 mm*1. The maximum number of consecutive thymidines was counted in each crRNA and scored using the following formula: T score=100/(max_consecutive_Thymidines). The crRNAs corresponding to each target gene were sorted by low MM score and high T score. Finally, the top 4 crRNAs for each gene were chosen. In the event of ties, crRNAs targeting constitutive exons and/or the first exon were prioritized. 52 NTC crRNAs were randomly generated. In combination with the 104 crRNAs targeting 26 genes, a total of 5,200 DKO, 5,408 SKO, and 1,326 NTC-NTC arrays were designed for a total of 11,934 arrays (MCAP-MET library). Each gene pair is represented by 16 DKO arrays, while each single gene condition is represented by 208 SKO arrays. For SKO crRNA arrays, each gene-targeting crRNA was placed in the first position of the crRNA array and the NTC crRNAs were toggled through the second position.
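The scoring and ranking steps of the library design can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function names and input layout are hypothetical, the tie-breaking by exon is not modeled, and spacers containing no thymidine (for which the T score formula is undefined) are scored as if their longest run were 1, which is an assumption. The final function merely re-derives the stated array counts.

```python
import re
from math import comb


def mm_score(n_0mm, n_1mm, n_2mm):
    """MM score = (0-mismatch hits)*1000 + (1-mismatch hits)*50 + (2-mismatch hits)*1."""
    return n_0mm * 1000 + n_1mm * 50 + n_2mm * 1


def t_score(spacer):
    """T score = 100 / (longest run of consecutive thymidines)."""
    runs = re.findall(r"T+", spacer.upper())
    longest = max((len(r) for r in runs), default=0)
    return 100.0 / max(longest, 1)  # assumption: no-T spacers treated as run length 1


def pick_top_crrnas(candidates, n=4):
    """Sort one gene's candidates by low MM score, then high T score; keep the top n.

    candidates: list of dicts with keys "spacer", "n_0mm", "n_1mm", "n_2mm".
    """
    ranked = sorted(
        candidates,
        key=lambda c: (mm_score(c["n_0mm"], c["n_1mm"], c["n_2mm"]), -t_score(c["spacer"])),
    )
    return ranked[:n]


def mcap_met_library_size(n_genes=26, per_gene=4, n_ntc=52):
    """Array counts implied by the design: (DKO, SKO, NTC-NTC)."""
    dko = comb(n_genes, 2) * per_gene ** 2  # 325 gene pairs x 16 arrays
    sko = n_genes * per_gene * n_ntc        # 104 crRNAs x 52 NTCs
    ntc_ntc = comb(n_ntc, 2)                # unordered NTC pairs
    return dko, sko, ntc_ntc
```

For example, a spacer with a single perfect genomic match and no near matches receives an MM score of 1000, the lowest score attainable for an on-target crRNA under this formula.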
For each oligo, a degenerate 10 mer was appended following the U6 termination sequence to serve as a barcode for downstream clonality analysis. After pooled oligo synthesis (CustomArray), Gibson cloning was used to insert the MCAP-MET library into the BsmBI-linearized crRNA expression vector.
Cell lines: A non-small cell lung cancer (NSCLC) cell line (KPD cell line) was transduced with LentiCpf1 to generate Cpf1-positive cells (LCC-Cpf1). All cell lines were grown under standard conditions using DMEM containing 10% FBS, 1% Pen/strep in a 5% CO2 incubator.
Lentiviral library production: The LentiCpf1 and Lenti-MCAP-MET library plasmids were used for lentiviral production. Briefly, envelope plasmid pMD2.G, packaging plasmid psPAX2, and LentiCpf1 or Lenti-MCAP-library plasmid were added at ratios of 1:1:2.5, and then polyethyleneimine (PEI) was added and mixed well by vortexing. The solution was left at room temperature for 10-20 min, and then the mixture was added dropwise into 80-90% confluent HEK293FT cells and mixed well by gently agitating the plates. Six hours post-transfection, fresh DMEM supplemented with 10% FBS and 1% Pen/Strep was added to replace the transfection media. Virus-containing supernatant was collected at 48 h and 72 h post-transfection, and was centrifuged at 1500 g for 10 min to remove the cell debris; aliquoted and stored at −80° C. Virus was titrated by infecting LCC cells at a number of different concentrations, followed by the addition of 3 μg/mL puromycin at 24 h post-infection to select the transduced cells. The virus titers were determined by calculating the ratios of surviving cells 48 or 72 h post infection and the cell count at infection.
Nextera analysis of indels generated by Cpf1: crRNA arrays (crPten.crNf1 and crNf1.crPten) were cloned into the Lenti-U6-crRNA vector, and virus was generated for transduction of the KPD cell line. Pten spacer=TGCATACGCTATAGCTGCTT (SEQ ID NO: 9,709); Nf1 spacer=TAAGCATAATGATGATGCCA (SEQ ID NO: 9,710). Seven days after transduction and puromycin selection, genomic DNA was harvested from the cells in culture. The genomic regions flanking the target sites of crPten and crNf1 were first amplified by PCR using the following primers (5′-3′): Pten_fwd=ACTCACCAGTGTTTAACATGCAGGC (SEQ ID NO: 9,711), Pten_rev=GGCAAGGTAGGTACGCATTTGCT (SEQ ID NO: 9,712); Nf1_fwd=AGCAGCTGTCCTGGCTGTTC (SEQ ID NO: 9,713), Nf1_rev=CGTGCACCTCCCTTGTCAGG (SEQ ID NO: 9,714). Nextera XT library preparation was then performed according to manufacturer protocol. Reads were mapped to the mm10 mouse genome using BWA (Li, H. & Durbin, R. (2009) Bioinforma. Oxf Engl. 25, 1754-1760), with the settings bwa mem -t 8 -w 200. Indel variants were first processed with Samtools (Li, H. et al. (2009) Bioinformatics 25, 2078-2079) with the settings samtools mpileup -B -q 10 -d 10000000000000, then piped into VarScan v2.3.9 (Koboldt, D. C. et al. (2012) Genome Res. 22, 568-576) with the settings pileup2indel --min-coverage 1 --min-reads2 1 --min-var-freq 0.00001.
Evaluation of in vivo library diversity in the absence of mutagenesis: A library of degenerate 8 mers was synthesized and cloned into the crRNA expression vector. After lentiviral production, LCC cells were transduced with the 8 mer lentiviral library and selected by puromycin. 4×10^6 LCC-8 mer cells were subcutaneously injected into both Rag1−/− and nu/nu mice. Twelve days post-transplantation, mice were sacrificed and tumors were isolated for genomic preparation and readout.
MCAP in a mouse model of metastasis: Library transduction was performed with three infection replicates at high coverage and low MOI. Briefly, according to the viral titers, MCAP-MET lentiviruses were added to a total of 1×10^8 LCC-Cpf1 cells at a calculated MOI of ≤0.2 and incubated for 24 h before replacing the virus-containing media with fresh media containing 3 μg/mL puromycin to select the virus-transduced cells. Approximately 2.5×10^7 cells confer ~2,000× library coverage. MCAP-MET library-transduced cells were cultured under the pressure of 3 μg/mL puromycin for 7 days before injection or cryopreservation. MCAP library-transduced LCC-Cpf1 cells were injected subcutaneously into the right and left flanks of nu/nu mice at 4×10^6 cells per flank (~350× coverage per transplant).
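The coverage figures above follow from dividing the cell number by the 11,934 arrays in the library, and the rationale for a low MOI can be illustrated with a Poisson model of viral integration. The following is a minimal sketch; the Poisson model is a common simplifying assumption rather than something stated herein, and the function names are hypothetical.

```python
import math


def library_coverage(n_cells, n_arrays):
    """Average number of cells per library member."""
    return n_cells / n_arrays


def single_integrant_fraction(moi):
    """Among transduced cells, the fraction expected to carry exactly one provirus.

    Assumes integrations per cell follow a Poisson distribution with mean = MOI.
    """
    p_one = moi * math.exp(-moi)    # P(exactly one integrant)
    p_any = 1.0 - math.exp(-moi)    # P(at least one integrant)
    return p_one / p_any


selection_coverage = library_coverage(2.5e7, 11934)  # about 2,095x
transplant_coverage = library_coverage(4e6, 11934)   # about 335x
single_at_moi_02 = single_integrant_fraction(0.2)    # about 0.90
```

Under this model, roughly 90% of transduced cells carry a single provirus at an MOI of 0.2, and the fraction rises further as the MOI falls below that bound.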
Mouse tumor dissection: Mice were sacrificed by carbon dioxide asphyxiation followed by cervical dislocation. Tumors and lungs were manually dissected, then fixed in 10% formalin for 24-96 hours, and transferred into 70% Ethanol. Tissues were flash frozen with liquid nitrogen, and ground in 5 mL Frosted polyethylene vial set (2240-PEF) in a 2010 GenoGrinder machine (SPEXSamplePrep). Homogenized tissues were then used for DNA extraction.
Genomic DNA extraction: 200-800 mg of frozen ground tissue were re-suspended in 6 mL of NK Lysis Buffer (50 mM Tris, 50 mM EDTA, 1% SDS, pH 8.0) supplemented with 30 μL of 20 mg/mL Proteinase K (Qiagen) in 15 mL conical tubes, and incubated in a 55° C. bath overnight. After all the tissues were lysed, 30 μL of 10 mg/mL RNAse A (Qiagen) was added, mixed well and incubated at 37° C. for 30 min. Samples were chilled on ice and then 2 mL of pre-chilled 7.5 M ammonium acetate (Sigma) was added to precipitate proteins. The samples were inverted and vortexed for 15-30 s and then centrifuged at ≥4,000 g for 10 min. The supernatant was carefully decanted into a new 15 mL conical tube, followed by the addition of 6 mL 100% isopropanol (at a ratio of 0.7), inverted 30-50 times and centrifuged at ≥4,000 g for 10 minutes. At this time, genomic DNA became visible as a small white pellet. After discarding the supernatant, 6 mL of freshly prepared 70% ethanol was added, mixed well, and then centrifuged at ≥4,000 g for 10 min. The supernatant was discarded by pouring, and the remaining residue was removed using a pipette. After air-drying for 10-30 min, DNA was re-suspended by adding 200-500 μL of Nuclease-Free H2O. The genomic DNA concentration was measured using a Nanodrop (Thermo Scientific), and normalized to 1000 ng/μL for the following readout PCR.
MCAP library readout: MCAP library readout was performed using a 2-step PCR approach. Briefly, in the 1st round PCR, enough genomic DNA was used as template to guarantee coverage of the library abundance and representation. For example, assuming 6.6 pg of gDNA per cell, 20-48 μg of gDNA (≥75×) was used per sample. For the 1st PCR, the sgRNA-included region was amplified using primers specific to the MCAP vector using Phusion Flash High Fidelity Master Mix (ThermoFisher) with thermocycling parameters: 98° C. for 1 min, 15 cycles of (98° C. for 1 s, 60° C. for 5 s, 72° C. for 15 s), and 72° C. for 1 min. Fwd: AATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCG (SEQ ID NO: 9,715); Rev: CTTTAGTTTGTATGTCTGTTGCTATTATGTCTACTATTCTTTCCC (SEQ ID NO: 9,716). In the 2nd PCR, 1st round PCR products for each biological replicate were pooled, then 1-2 μL of well-mixed 1st PCR products were used as the template for amplification using sample-tracking barcode primers with thermocycling conditions of 98° C. for 1 min, 15 cycles of (98° C. for 1 s, 60° C. for 5 s, 72° C. for 15 s), and 72° C. for 1 min. The 2nd PCR products were quantified in 2% E-gel EX (Life Technologies) using E-Gel® Low Range Quantitative DNA Ladder (ThermoFisher), then equal amounts of each barcoded sample were combined. The pooled PCR products were purified using QIAquick PCR Purification Kit and further QIAquick Gel Extraction Kit from 2% E-gel EX. The purified pooled library was quantified in a gel-based method. Diluted libraries with 5-20% PhiX were sequenced with HiSeq 4000 systems (Illumina) with 150 bp paired-end read length.
MCAP-MET plasmid library readout and analysis: Raw paired-end fastq read files were first merged to single fastq files by PEAR (Zhang, J. et al. (2014). Bioinformatics 30, 614-620) with the settings -y 8G -j 8 -v 3. The merged fastq files were then filtered and demultiplexed using Cutadapt (Martin, M. (2011) EMBnet.journal 17, 10-12), using two different sets of adapters for extraction of crRNA array sequences or the 10 mer barcode. For the crRNA array, the following settings were used: cutadapt --discard-untrimmed -g tcttGTGGAAAGGACGAAACACCg (SEQ ID NO: 9,731), followed by cutadapt --discard-untrimmed -a TGTAGATTTTTTT (SEQ ID NO: 9,758). The trimmed sequences were then mapped to the MCAP-MET library using Bowtie (Langmead, et al. (2009), Genome Biol. 10, R25): bowtie -v 3 -k 1 -m 1. For the 10 mer barcodes, the following Cutadapt settings were used: cutadapt --discard-untrimmed -a aagcttggcgtGGATC (SEQ ID NO: 9,759), followed by cutadapt --discard-untrimmed -g TACTAAGTGTAGATTTTTTT (SEQ ID NO: 9,760). The resultant sequences were quantified to a reference of all possible 10 mer sequences. Reads that both mapped to the MCAP-MET library and contained a valid barcode were tabulated.
Processing of MCAP-MET crRNA array abundance in cells and tumors: PEAR-merged fastq files were filtered and demultiplexed using Cutadapt. To remove extra sequences downstream (i.e. 3′ end) of the crRNA array sequences, including the DR and U6 terminator, the following settings were used: cutadapt --discard-untrimmed -e 0.1 -a aagcttggcgtGGATCCGATATCa (SEQ ID NO: 9,761) -m 80. As the forward PCR primers used to readout crRNA array representation were designed to have a variety of barcodes to facilitate multiplexed sequencing, these filtered reads were then demultiplexed with the following settings: cutadapt -g file:fbc.fasta --no-trim, where fbc.fasta contained the 12 possible barcode sequences within the forward primers. Finally, to remove extraneous sequences upstream (i.e. 5′ end) of the crRNA array spacers, the following settings were used: cutadapt --discard-untrimmed -e 0.1 -g tcttGTGGAAAGGACGAAACACCg (SEQ ID NO: 9,731) -m 80. The 5′ DR were removed as follows: cutadapt --discard-untrimmed -e 0.1 -g TAATTTCTACTAAGTGTAGAT (SEQ ID NO: 21,696) -m 80. The filtered fastq reads were then mapped to the MCAP-MET reference index. To do so, a Bowtie index of the MCAP-MET library was generated using the bowtie-build command in Bowtie 1.1.2 (Langmead, et al. (2009), Genome Biol. 10, R25). Using these bowtie indexes, the filtered fastq read files were mapped using the following settings: bowtie -n 2 -k 1 -m 1 --best. These settings ensured only single-match reads would be retained for downstream analysis. For data processing on the level of barcoded-crRNAs, the same trimmed fastq files as above were utilized, but instead the barcoded-crRNA plasmid library was used as the reference index.
Analysis of MCAP crRNA array library representation: Using the resultant mapping output, the number of reads that had mapped to each crRNA array within the library was quantified. The number of reads in each sample was normalized by converting raw crRNA array counts to reads per million (rpm). The rpm values were then subjected to log2 transformation for certain analyses. To generate Spearman correlation heat maps, the NMF R package was used. Where applicable, linear regression lines and 95% confidence intervals were calculated. For comparing cells, primary tumors, and lung metastases, crRNA array abundances were averaged within sample groups and linear regression was performed using the NTC-NTC arrays as a model for neutral selection. Significant outliers were identified using the outlierTest function from the car R package. For gene/gene pair analyses, the corresponding SKO and DKO arrays were first averaged together, then aggregated by sample type. Linear regression was performed using all SKO/DKO genotypes, and outliers were identified as above.
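The normalization described above can be sketched as follows. This is a minimal illustration; the pseudocount added before the log2 transformation is an assumption (the handling of zero counts is not specified herein), and the function names are hypothetical.

```python
import math


def to_rpm(raw_counts):
    """Convert raw crRNA array read counts to reads per million (rpm)."""
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {name: n * 1e6 / total for name, n in raw_counts.items()}


def log2_transform(rpm, pseudocount=1.0):
    """log2(rpm + pseudocount); the pseudocount (assumed) avoids log2(0)."""
    return {name: math.log2(value + pseudocount) for name, value in rpm.items()}


# Toy counts, invented for illustration only.
rpm = to_rpm({"crNf2.crTrim72": 1, "NTC1.NTC2": 3})
log2_rpm = log2_transform(rpm)
```

Because rpm values sum to one million within each sample, abundances become comparable across samples sequenced to different depths.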
Clone-level analysis of MCAP-MET samples: The data were analyzed at the clone level using the barcoded-crRNA abundances. The counts in each sample were first converted to percentages of total reads. Two different frequency cutoffs were used for considering clones: ≥0.01% and ≥0.001%. Differences in the number of clones between sample types were assessed by Wilcoxon rank sum test, and visualized after log2 transformation. Empirical CDFs were calculated after combining all the clones in a given sample group; statistical differences in clone size distributions were assessed by Kolmogorov-Smirnov test. The Shannon diversity index was also calculated on each sample with the vegan R package; statistical differences were assessed by Wilcoxon rank sum test.
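The per-sample clone counting and diversity calculation can be sketched as follows. This is a minimal, dependency-free illustration rather than the vegan-based analysis itself; it uses the natural logarithm for the Shannon index, which matches the default of the vegan `diversity` function, and the function names are hypothetical.

```python
import math


def count_clones(barcode_counts, min_percent):
    """Number of barcoded-crRNA clones at or above min_percent of total reads."""
    total = sum(barcode_counts)
    if total == 0:
        return 0
    return sum(1 for n in barcode_counts if 100.0 * n / total >= min_percent)


def shannon_index(barcode_counts):
    """Shannon diversity H = -sum(p * ln p) over clones with nonzero counts."""
    total = sum(barcode_counts)
    h = 0.0
    for n in barcode_counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


# Toy sample, invented for illustration: one dominant clone and two minor clones.
sample = [99948, 50, 2]
n_clones_strict = count_clones(sample, 0.01)    # clones at >= 0.01% of reads
n_clones_lenient = count_clones(sample, 0.001)  # clones at >= 0.001% of reads
```

A sample dominated by a single clone has a Shannon index near 0, whereas an evenly distributed pool of k clones has an index of ln(k).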
Enrichment analysis of MCAP-MET genotypes: To identify crRNA arrays that were enriched in individual samples, the 1,326 NTC-NTC arrays were utilized for modeling the empirical null distribution. Enriched crRNA arrays were subsequently called at FDR<0.5%. These results were aggregated to the single gene/gene pair level, then tabulated across samples. Finally, all of the significant crRNA arrays associated with each genotype were counted.
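One way to realize an empirical null of this kind is sketched below. This is an illustration under stated assumptions, not the procedure actually used: the cutoff is taken as the abundance exceeded by fewer than 0.5% of the NTC-NTC arrays in a sample, and any crRNA array above that cutoff is called enriched. The function names and input layout are hypothetical.

```python
import math


def empirical_null_cutoff(ntc_abundances, max_fraction=0.005):
    """Abundance cutoff exceeded by fewer than max_fraction of NTC-NTC arrays."""
    ordered = sorted(ntc_abundances)
    n = len(ordered)
    if n == 0:
        raise ValueError("no NTC-NTC arrays supplied")
    # Largest number of NTC-NTC arrays allowed above the cutoff (strictly < max_fraction * n).
    allowed = max(math.ceil(max_fraction * n) - 1, 0)
    return ordered[n - allowed - 1]


def call_enriched(abundances, ntc_abundances, max_fraction=0.005):
    """Names of crRNA arrays whose abundance exceeds the NTC-derived cutoff."""
    cutoff = empirical_null_cutoff(ntc_abundances, max_fraction)
    return {name for name, value in abundances.items() if value > cutoff}
```

With 1,326 NTC-NTC arrays, at most 6 controls can lie above the cutoff, so the large number of controls in the library is what makes a threshold as strict as 0.5% resolvable.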
Identification of synergistic mutation combinations: The synergy coefficient (SynCo) for each gene pair was defined with the following formula: SynCo = DKO_NM − SKO_N − SKO_M. The DKO_NM value is the average log2 rpm abundance of all corresponding DKO crRNA arrays (i.e., crN.crM), while the SKO_N and SKO_M values are defined as the average log2 rpm abundance of all corresponding SKO crRNA arrays. By this definition, a SynCo score>0 would indicate that a given DKO gene pair is synergistic, as the DKO score would thus be greater than the sum of the individual SKO scores. The SynCo of each gene pair was calculated and it was assessed whether the DKO abundances were statistically significantly higher than both SKO abundances by Wilcoxon rank sum test.
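The gene-pair-level aggregation can be sketched as follows. This is a minimal illustration in which each crRNA array is already annotated with the two genes it targets and non-targeting positions are labeled "NTC"; that record layout, the label, and the function name are assumptions, and the Wilcoxon rank sum test is not included.

```python
from statistics import mean


def gene_pair_synco(records, gene_n, gene_m, ntc_label="NTC"):
    """Gene-level SynCo: mean DKO abundance minus the two mean SKO abundances.

    records: iterable of (gene_a, gene_b, log2_rpm) tuples, one per crRNA array.
    """
    records = list(records)
    # DKO arrays: both genes targeted, in either position.
    dko = [v for a, b, v in records if {a, b} == {gene_n, gene_m}]
    # SKO arrays: one gene-targeting crRNA paired with an NTC.
    sko_n = [v for a, b, v in records if {a, b} == {gene_n, ntc_label}]
    sko_m = [v for a, b, v in records if {a, b} == {gene_m, ntc_label}]
    return mean(dko) - mean(sko_n) - mean(sko_m)


# Toy abundances, invented purely to show the arithmetic.
toy_records = [
    ("Nf2", "Trim72", 10.0), ("Nf2", "Trim72", 12.0),
    ("Nf2", "NTC", 3.0), ("Nf2", "NTC", 5.0),
    ("Trim72", "NTC", 2.0),
]
toy_synco = gene_pair_synco(toy_records, "Nf2", "Trim72")
```

Here the mean DKO abundance is 11 and the mean SKO abundances are 4 and 2, giving a SynCo of 5.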
To generate a library-wide map of the relative selective advantages for each gene pair vs. single gene knockout, the aggregated gene-level abundances were utilized in lung metastasis samples. The abundance of each DKO was compared to its reference SKO, and the data visualized in a heat map. Each column refers to the reference SKO, while each row denotes the modulatory effects of the second KO.
Statistics: All statistical tests are two-sided.
Blinding statement: Investigators were not blinded for sequencing data analysis, tumor engraftment, or organ dissection.
The results of the experiments from Example 8 are now described.
Metastasis is the major lethal factor of solid cancers. However, the complex genetic interactions underlying the metastatic phenotype of tumor cells have remained elusive. A streamlined approach for constructing global maps of metastasis gene networks is key to understanding metastasis at the systems level. Herein was developed MCAP (Massively-parallel crRNA array profiling), an approach for high-throughput interrogation of genetic combinations in vivo. A UMI-barcoded, high-density, high-redundancy MCAP library was designed with 11,934 crRNA arrays targeting 325 pairwise combinations of genes significantly mutated in human metastases, and the metastatic potential of all combinations was functionally interrogated in parallel in mice. Enrichment, synergy and clonality analyses unveiled a quantitative landscape of genetic interactions in metastasis.
Metastasis, the major lethal factor of solid tumors, is controlled by a complex network of genetic interactions. However, a systems-level understanding of the genetic interactions driving metastatic spread is lacking. Due to various technological challenges, high-throughput in vivo interrogation of double knockouts in mammalian species has not yet been reported in the literature. Thus, a streamlined approach is essential for rapidly mapping out global, clinically relevant metastasis gene networks with high resolution.
The discovery and characterization of the type V CRISPR system Cpf1 (CRISPR from Prevotella and Francisella, also known as Cas12a) has empowered genome editing of multiple loci in individual cells. Cpf1 is a single component RNA-guided nuclease that can mediate target cleavage with a single crRNA. Unlike Cas9, Cpf1 does not require a tracrRNA, which greatly simplifies multiplexed genome editing of two or more loci simultaneously through the use of a single crRNA array targeting different genes. Thus, Cpf1 is an ideal system for investigating genetic interactions in vivo, with substantial advantages in library design and readout when compared to Cas9-based approaches. Leveraging the Cpf1 system, MCAP (Massively-parallel crRNA array profiling) was developed: an approach for in vivo high-throughput quantitative mapping of double or higher dimensional genetic perturbations. A UMI-barcoded high-density MCAP library was designed with 11,934 crRNA arrays (SEQ ID NOs: 9,762-21,695) targeting 325 gene pairs significantly mutated in human metastases, with high-redundancy crRNA array coverage for each gene and gene pair. Using this library, MCAP was demonstrated to be a powerful tool for functional interrogation of hundreds of double knockouts and their single knockout counterparts for their metastatic potential in mice.
To establish a CRISPR/Cpf1 lentiviral system for characterization of mutation combinations in cancer, a human-codon-optimized LbCpf1 expression vector (pLenti-EFS-Cpf1-blast, LentiCpf1 for short) and a crRNA expression vector (pLenti-U6-DR-crRNA-puro, Lenti-U6-crRNA for short) were generated (
In order to perform high-throughput genetic investigation of metastasis suppression in vivo, it is important to evaluate the library diversity that can be accommodated upon introduction of the cell pool. To that end, a mock library of degenerate 8 mers was constructed and cloned into the base Lenti-U6-crRNA vector (
To investigate whether Cpf1 multiplexed gene targeting could be utilized for high-throughput investigation of mutation combinations, massively-parallel Cpf1-crRNA array profiling (MCAP) was developed. Considering the resolution of library complexity under in vivo cellular dynamics, genes significantly mutated in a human metastasis cohort (MET-500) (Robinson, D. R. et al. (2017) Nature 548, 297-303), and the top hits from a single-gene metastasis screen in mice (Chen, S. et al. (2015) Cell 160, 1246-1260) were focused on (
Lentiviral pools were generated from the MCAP-MET plasmid library and Cpf1+ KPD cells were infected (
The relative abundances of these various barcoded-crRNA clones were examined (
To map the metastatic potential of all these single and double knockouts in an unbiased manner, the barcoded-crRNA counts were collapsed to the crRNA array level (Supplementary
In addition to the binary FDR-based enrichment analysis above, the relative metastatic potential of the various genotypes represented in the MCAP-MET library was quantitatively compared using the relative abundance of all crRNA arrays in each sample. Aggregating by sample type, the average abundances of each crRNA array in cell pools (n=6), primary tumors (n=10), and lung metastases (n=37) were compared (
Analyses suggested that certain gene pairs may be especially synergistic in promoting tumorigenesis and/or metastasis. To quantitatively identify such mutation combinations, the gene-level data were utilized to compare the normalized abundances of each DKO gene pair with its two constituent SKO genes across all primary tumors and lung metastases (total n=47) (
Due to the complex nature of biological systems, a single gene is rarely sufficient to explain the clinical and pathological variation observed across patients. Genetic interactions are the building blocks of highly connected biological networks, and their modular nature enables biological pathways to take on a variety of forms: linear, branching, divergent, convergent, feed-forward, feedback, or any combination of the above. These complex interactions may account for a substantial part of the variation in intricate phenotypes of complex biological or pathological processes such as cancer. Numerous theories and algorithms have been developed to understand such complex networks and to predict genetic interactions. However, predictions have often been contradicted by unexpected experimental findings, underscoring the need for experimental testing of combinatorial perturbations in a systematic manner.
High-throughput genetic studies are a powerful approach for mapping genes to their associated phenotypes. Unbiased and quantitative analysis of double knockouts enables phenotypic assessment of all possible pairwise combinations within a given set of genes. Advances in high-throughput technologies utilizing RNA-interference-based gene knockdown or CRISPR/Cas9-based gene knockout, activation and repression have enabled genome-scale studies in multiple species across various biological applications. While high-throughput genetic perturbation approaches have been developed to map out the landscape of genetic interactions in yeast and in worms, large-scale double knockout studies in mammalian species are relatively scarce, due to the exponentially scaling number of possible gene combinations and the technological challenges of generating and evaluating double knockouts. Recently, several high-throughput double perturbations have been performed in mammalian cells using RNA interference (RNAi) or clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 technologies. However, RNAi-based methods act on the level of mRNA silencing. Though CRISPR/Cas9-based methods can induce complete knockouts, the dependence of Cas9 on a trans-activating crRNA (tracrRNA) necessitates multiple sgRNA cassettes when performing combinatorial knockouts, hindering the scalability of Cas9-mediated high-dimensional studies to in vivo settings.
Cpf1 is a single-effector RNA-guided endonuclease with two orthologs, from Acidaminococcus (AsCpf1) and Lachnospiraceae (LbCpf1), capable of efficient genome-editing activity in human cells. Unlike Cas9, Cpf1 requires only a single 39-42-nt crRNA without the need for an additional trans-activating crRNA, enabling one RNA polymerase III promoter to drive an array of several crRNAs targeting multiple loci simultaneously. This unique feature of the Cpf1 nuclease greatly simplifies the design, synthesis and readout of multiplexed CRISPR studies, making it a suitable system to investigate mutation combinations.
In summary, the present study demonstrates the utility of MCAP for simultaneous, massively parallel profiling of single and double knockouts, implementing a high-density library design with 16 independent constructs per double knockout and 208 per single knockout. Even in a complex biological process such as metastasis, MCAP is capable of detecting robust signatures of selection in vivo and quantitatively profiling single and double mutants of strong, moderate and weak phenotypes. MCAP thus represents a powerful new tool for mapping genetic interactions in mammalian species in vivo with unparalleled simplicity and throughput.
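The library figures quoted above can be cross-checked with a short calculation. The following is a minimal sketch, not part of the described method. It assumes that the 325 gene pairs are all pairwise combinations of a single gene set, and that every gene contributes the stated 208 single-knockout constructs. The variable and function names are illustrative only.

```python
from math import comb

# Figures taken from the text.
TOTAL_ARRAYS = 11_934
GENE_PAIRS = 325
ARRAYS_PER_DKO = 16
ARRAYS_PER_SKO = 208


def genes_for_pairs(n_pairs: int) -> int:
    """Return n such that C(n, 2) == n_pairs (assumes all-vs-all pairing)."""
    n = 2
    while comb(n, 2) < n_pairs:
        n += 1
    if comb(n, 2) != n_pairs:
        raise ValueError("n_pairs is not a complete set of pairwise combinations")
    return n


n_genes = genes_for_pairs(GENE_PAIRS)
dko_arrays = GENE_PAIRS * ARRAYS_PER_DKO
sko_arrays = n_genes * ARRAYS_PER_SKO
# Arrays not explained by the two counts above; this passage does not say
# what they are, so no interpretation is attached to the remainder here.
unaccounted = TOTAL_ARRAYS - dko_arrays - sko_arrays

print(n_genes, dko_arrays, sko_arrays, unaccounted)
```

Under these assumptions the 325 pairs correspond to 26 genes, and the double- and single-knockout constructs together make up 10,608 of the 11,934 arrays.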
The materials and methods employed in Experimental Example 9 are now described.
FlipArray design and construction: The empty EFS-Cpf1-Puro; U6-FlipArray vector was constructed by modification of the pY109 lentiviral vector (Zetsche, B. et al. (2017) Nat. Biotechnol. 35, 31-34). After BsmbI digestion (FastDigest Esp3I, ThermoScientific) to linearize the U6 crRNA expression cassette, oligo cloning was performed to insert a lox66 sequence, a DR, two BsmbI sites, and an inverted lox71. The empty vector thus expresses LbCpf1 and puromycin resistance from an EFS promoter, while a U6 promoter drives expression of a lox66/lox71 flanked crRNA expression module containing two BsmbI sites. BsmbI digestion and oligo cloning were then used to insert FlipArrays into the empty vector. For a given pair of crRNAs, the following oligo overhangs were used for cloning: Oligo1 5′ overhang: TAGAT; Oligo1 3′ overhang: A; Oligo2 5′ overhang: GTTAT; Oligo2 3′ overhang: A.
The main body of the FlipArray was structured as follows: 5′-crRNA 1-6×T-6×A-Rev.Complement(crRNA 2)-Rev.Complement(DR)-3′. In certain embodiments, the vector comprising the FlipArray comprises SEQ ID NO: 21,697.
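The layout of the FlipArray main body can be expressed as a small string-assembly routine. This is a minimal sketch for illustration only. The function names are hypothetical, and the spacer and direct-repeat sequences used below are short placeholders, not the sequences used in the study. Cloning overhangs are not included.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def fliparray_body(crrna1: str, crrna2: str, direct_repeat: str) -> str:
    """Assemble 5'-crRNA1-6xT-6xA-RC(crRNA2)-RC(DR)-3' as described in the text."""
    return (
        crrna1.upper()
        + "T" * 6
        + "A" * 6
        + reverse_complement(crrna2)
        + reverse_complement(direct_repeat)
    )


# Placeholder sequences (hypothetical, chosen only to keep the example short).
CR1_PLACEHOLDER = "AAACCC"
CR2_PLACEHOLDER = "GGATCA"
DR_PLACEHOLDER = "GGCATC"

body = fliparray_body(CR1_PLACEHOLDER, CR2_PLACEHOLDER, DR_PLACEHOLDER)
print(body)
```

With real inputs, `crrna1` and `crrna2` would be the two spacers and `direct_repeat` the Cpf1 direct repeat of the vector.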
In this study, the following oligo sequences were used to target Nf1 and Pten:
The following crRNA spacer sequences were also used, with analogous oligo designs for cloning into the Cpf1-Flip vector:
Lenti-Cre vector design and construction: The Lenti-Cre vector was designed to express the Cre recombinase under a constitutive EFS promoter. The plasmid was generated by PCR amplification of Cre and EFS fragments followed by Gibson assembly into a previous lentiviral vector backbone (lentiGuidePuro; Sanjana, et al. (2014) Nat. Methods 11, 783-784).
Cell culture and genomic DNA extraction: KPD cells, E0771 cells, and HEK293T cells were cultured in DMEM supplemented with 10% FBS and 1% penicillin/streptomycin. Experiments were conducted with at least 2 independent cellular replicates. For genomic DNA extraction, approximately 500,000 cells were isolated. Cells were spun down at 500 rpm for 5 minutes and washed once with 1×PBS. After removing the supernatant, cell pellets were resuspended in 500 µl QuickExtract DNA Extraction Solution (Epicentre). Cells were then incubated at 65° C. for 20 minutes, followed by incubation at 85° C. for 5 minutes to deactivate the enzymes.
Detection of FlipArray inversion at the genomic DNA level by PCR: The following primers were used to amplify the U6 cassette from genomic DNA:
PCR conditions: 98° C. 2 minutes, 32 cycles of (98° C. 1 second, 62° C. 5 seconds, 72° C. 15 seconds), 72° C. 2 minutes, 4° C. hold.
Following Qiagen PCR purification, 2 ng of the first PCR were used for the second inversion-specific or non-inverted-specific PCR. The following primers were used for detection of non-inverted or inverted FlipArrays:
PCR conditions: 98° C. 2 minutes, 14 cycles of (98° C. 1 second, 62° C. 5 seconds, 72° C. 2 seconds), 72° C. 2 minutes, 4° C. hold. PCR reactions specific to non-inverted and inverted FlipArrays were performed and analyzed simultaneously for each sample. Quantification was done on a 2% E-gel using a low-range quantitative ladder (ThermoFisher), and was normalized to the first PCR product.
Detection and quantification of FlipArray inversion at the RNA transcript level: KPD cells were cultured in DMEM supplemented with 10% FBS and 1% penicillin/streptomycin. For RNA extraction, approximately 200,000 cells were isolated and spun down at 500 rpm for 5 minutes. After a PBS wash, cells were resuspended in 450 µl TRIzol. 100 µl of chloroform was then added to each tube, followed by vigorous vortexing for 15 seconds and centrifuging at 14,000 rpm for 10 minutes. The supernatant containing RNA was then purified using a Qiagen RNeasy Kit following the RNA cleanup protocol. cDNA was generated by reverse transcription with random hexamers. PCR detection of inverted crRNA FlipArray transcripts was done using the following primers:
PCR conditions: 98° C. 2 minutes, 34 cycles of (98° C. 1 second, 56° C. 5 seconds, 72° C. 5 seconds), 72° C. 2 minutes, 4° C. hold.
As a normalization control, PCR detection of Cpf1 transcripts was done using the following primers:
PCR conditions: 98° C. 2 minutes, 40 cycles of (98° C. 1 second, 56° C. 5 seconds, 72° C. 20 seconds), 72° C. 2 minutes, 4° C. hold. Quantification of inverted FlipArray RNA abundance was done on a 2% E-gel using a low-range quantitative ladder (ThermoFisher), and was normalized to Cpf1 mRNA transcript abundance.
Detection of Cpf1 mutagenesis: The genomic regions flanking the crRNA target sites were amplified from genomic DNA using the following primers:
PCR conditions: 98° C. 2 minutes, 32 cycles of (98° C. 1 second, 63° C. 5 seconds, 72° C. 20 seconds), 72° C. 2 minutes, 4° C. hold.
The genomic DNA from approximately 1000 cells was used for PCR with the NPF and DVF FlipArrays. For the TSG-Immune FlipArray library experiments, genomic DNA from approximately 6000 cells was used to account for the pooled nature of the experiment. The resultant PCR products were used for Nextera library preparation following manufacturer protocols. Reads were mapped to the mm10 or hg38 genome using BWA-MEM (Li, H. (2013) arXiv:1303.3997 [q-bio]), with settings -t 8 -w 200. After identification of indel variants using the pileup2indel function in VarScan v2.3.9, a 1% variant frequency threshold was used to identify high-confidence variants for NPF and DVF experiments. A less stringent 0.2% variant frequency threshold was used for the TSG-Immune experiments due to their pooled nature.
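The experiment-specific frequency cutoffs described above can be illustrated with a simple filter. This is a minimal sketch only. The record layout, the function name, and the example values are hypothetical and do not reproduce VarScan's output format. Whether the cutoff is inclusive is not stated in the text; an inclusive comparison is assumed here.

```python
# Variant frequency cutoffs from the text, expressed as fractions.
FREQ_THRESHOLDS = {
    "NPF": 0.01,          # 1%
    "DVF": 0.01,          # 1%
    "TSG-Immune": 0.002,  # 0.2%, less stringent because the experiment is pooled
}


def high_confidence_variants(variants, experiment):
    """Keep indel calls whose frequency meets the cutoff for the experiment.

    `variants` is an iterable of dicts with a 'freq' key holding the variant
    frequency as a fraction (assumed layout, for illustration only).
    """
    cutoff = FREQ_THRESHOLDS[experiment]
    return [v for v in variants if v["freq"] >= cutoff]  # inclusive: assumption


# Made-up example calls, not data from the study.
example_calls = [
    {"id": "indel_a", "freq": 0.150},
    {"id": "indel_b", "freq": 0.005},
    {"id": "indel_c", "freq": 0.001},
]

npf_kept = [v["id"] for v in high_confidence_variants(example_calls, "NPF")]
pooled_kept = [v["id"] for v in high_confidence_variants(example_calls, "TSG-Immune")]
print(npf_kept, pooled_kept)
```

The same call set therefore yields different high-confidence lists depending on whether the 1% or the 0.2% cutoff applies.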
Sample size determination: No specific methods were used to predetermine sample size.
Blinding statement: Investigators were blinded for sequencing data analysis with generic sample IDs, but not blinded for PCR or RT-PCR.
The results of the experiments from Example 9 are now described.
Mutations and genetic alterations are often sequentially acquired in various biological and pathological processes, such as development, evolution, and cancer. Certain phenotypes only manifest with precise temporal sequences of genetic events. While multiple approaches have been developed to model the effects of mutations in tumorigenesis, few recapitulate the stepwise nature of cancer evolution. A flexible sequential mutagenesis system, Cpf1-Flip, with inducible inversion of a single crRNA array (FlipArray), was created, and its application in stepwise mutagenesis in murine and human cells was demonstrated. As a proof-of-concept, Cpf1-Flip was further utilized in a pooled-library approach to model the acquisition of diverse resistance mutations to cancer immunotherapy. Cpf1-Flip offers a simple, versatile and controlled approach for precise mutagenesis of multiple loci in a sequential manner.
When loxP sites are arranged such that they point towards each other, Cre recombination leads to inversion of the intervening sequence. However, this process leads to the complete regeneration of the loxP sites, thereby allowing Cre to continually catalyze DNA inversion. As continuous Cre-mediated inversion would be counterproductive in many applications, mutant loxP sites have been characterized that enable unidirectional Cre inversion. When the mutant loxP sites lox66 and lox71 are recombined, they generate a wildtype loxP site and a double-mutant lox72. Cre has a substantially lower affinity for lox72, thus leading to mostly irreversible inversion of the floxed DNA segment.
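The outcome of recombining lox66 with an inverted lox71 can be followed with a purely symbolic model. This is a minimal sketch. Each site is reduced to the state of its two arms, and the assignment of the mutant arm (right arm for lox66, left arm for lox71) follows the conventional description of these sites and is an assumption with respect to this text. The class and function names are illustrative.

```python
from typing import NamedTuple, Tuple


class LoxSite(NamedTuple):
    """A lox site reduced to its two arms; each arm is 'wt' or 'mut'."""
    left_arm: str
    right_arm: str


LOXP = LoxSite("wt", "wt")
LOX66 = LoxSite("wt", "mut")   # assumed: mutant right arm
LOX71 = LoxSite("mut", "wt")   # assumed: mutant left arm
LOX72 = LoxSite("mut", "mut")

SITE_NAMES = {LOXP: "loxP", LOX66: "lox66", LOX71: "lox71", LOX72: "lox72"}


def invert_between(forward: LoxSite, inverted: LoxSite) -> Tuple[LoxSite, LoxSite]:
    """Sites left after inversion between a forward site and an inverted site.

    Strand exchange occurs in the shared spacer, so each product keeps the
    left arm of one parental site and gains the right arm of the other.
    """
    first = LoxSite(forward.left_arm, inverted.right_arm)
    second = LoxSite(inverted.left_arm, forward.right_arm)
    return first, second


products = invert_between(LOX66, LOX71)
print([SITE_NAMES[site] for site in products])
```

In this model two wildtype loxP sites regenerate two loxP sites, which is why inversion remains reversible in that case, whereas the lox66/lox71 pair yields one loxP and one lox72.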
A U6 expression cassette was designed containing two inverted BsmbI restriction sites, flanked by a lox66 sequence and an inverted lox71 sequence (
Cre-mediated recombination of the lox66 and lox71 mutant loxP sites leads to inversion of the FlipArray, generating a wildtype loxP and a double-mutant loxP, lox72. As the affinity of Cre recombinase for lox72 is substantially lower than for wildtype loxP, inversion of the FlipArray is mostly irreversible. After inversion, the two crRNAs trade places and the second crRNA becomes expressed. Thus, in the absence of Cre, Cpf1 generates indels at the target site of the first crRNA; after Cre recombination, Cpf1 is directed to the target site of the second crRNA. This approach is herein termed Cpf1-Flip. In short, the Cpf1-Flip system leverages CRISPR-Cpf1 mutagenesis and melds it with the inversion capabilities of Cre/lox66/lox71 to enable programmable two-step mutagenesis.
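The switch in crRNA expression upon inversion can be sketched at the sequence level. This is a minimal illustration, not the disclosed construct. It assumes that the floxed module consists of the direct repeat (DR) followed by the FlipArray main body, that the U6 transcript ends at the first run of six T bases, and that the lox sites themselves can be ignored. All sequences are short placeholders and all names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def floxed_module(crrna1: str, crrna2: str, direct_repeat: str) -> str:
    """DR followed by crRNA1-6xT-6xA-RC(crRNA2)-RC(DR) (assumed module layout)."""
    return (
        direct_repeat
        + crrna1
        + "T" * 6
        + "A" * 6
        + reverse_complement(crrna2)
        + reverse_complement(direct_repeat)
    )


def expressed_spacer(module: str, direct_repeat: str) -> str:
    """Spacer read from the DR up to the first TTTTTT terminator run."""
    if not module.startswith(direct_repeat):
        raise ValueError("module does not begin with the direct repeat")
    downstream = module[len(direct_repeat):]
    end = downstream.find("T" * 6)
    if end < 0:
        raise ValueError("no terminator run found")
    return downstream[:end]


# Placeholder sequences (hypothetical; they must not end in T for this toy parser).
CR1, CR2, DR = "AAACCC", "GGATCA", "GGCATC"

before = floxed_module(CR1, CR2, DR)
after = reverse_complement(before)  # Cre-mediated inversion of the floxed module

print(expressed_spacer(before, DR), expressed_spacer(after, DR))
```

Because the module is built so that its reverse complement has the same layout with the two spacers exchanged, inversion places the second crRNA directly after the DR and ahead of the terminator.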
To demonstrate sequential editing of cancer genes, Cpf1-Flip was first applied to generate Neurofibromatosis I (Nf1) and Phosphatase and tensin homolog (Pten) mutations in a mammalian lung cancer cell line (KPD). A FlipArray containing a spacer targeting Nf1 (crNf1) and an inverted spacer targeting Pten (crPten) (crNf1-crPten FlipArray, or NPF) was cloned in. The cells were infected with lentivirus containing EFS-Cpf1-Puro; U6-NPF (
To detect Cre-mediated inversion of the FlipArray, genomic DNA was isolated from the NPF-expressing lung cancer cells before infection with EFS-Cre and 10 days after infection. Primers were designed that would only generate a product if the FlipArray had successfully inverted (
The target sites of crNf1 and crPten were sequenced to determine whether the NPF construct had indeed created mutations in a controlled stepwise manner. Uninfected controls did not have any significant variants at crNf1 or crPten target sites (
To further demonstrate the utility of Cpf1-Flip in diverse biological systems, a FlipArray was designed targeting two human genes, DNA Methyltransferase 1 (DNMT1) and Vascular Endothelial Growth Factor A (VEGFA). The crRNA in the first position targets DNMT1 (crDNMT1) while the second, inverted crRNA targets VEGFA (crVEGFA) (crDNMT1-crVEGFA FlipArray, or DVF) (
Next, to determine whether the Cpf1-Flip system had enabled sequential mutagenesis at the crDNMT1 and crVEGFA target sites, deep sequencing was performed. As anticipated, uninfected controls did not have significant mutations at either site (
Cpf1-Flip was applied to model acquired resistance to immunotherapy in breast cancer cells (E0771 cell line). A small pool of FlipArrays was designed in which the first crRNA targeted Nf1 while the inverted second crRNA targeted a panel of immunomodulatory factors (Cd274, Ido1, B2m, Fas1, Jak2, and Lgals9; referred to as TSG-Immune FlipArray library). These factors are thought to influence anti-tumor immunity and have been implicated in acquired resistance to checkpoint inhibitors. After pooled lentiviral transduction of E0771 cells with the TSG-Immune FlipArray library, the cells were infected with EFS-Cre lentivirus to induce FlipArray inversion (
Targeted amplicon sequencing confirmed efficient mutagenesis of Nf1 (
The present disclosure provides Cpf1-Flip, an inducible sequential mutagenesis system using invertible crRNA FlipArrays. As a proof-of-concept, sequential mutagenesis was demonstrated in both mouse and human cells, and pooled sequential mutagenesis was additionally performed in a cancer cell line. These data revealed that the cutting efficiency at the second target locus can be low with certain crRNAs despite successful FlipArray inversion. The most likely explanation for the discordance between FlipArray inversion and subsequent mutagenesis of the second target locus is the differing efficiencies of the crRNAs themselves. This is corroborated by the variance observed across independent crRNAs in the pooled TSG-Immune library (
In certain non-limiting embodiments, by altering the composition and length of the crRNA arrays within the FlipArray, one can readily engineer more complex CRISPR perturbation programs. In other non-limiting embodiments, designs with two or more crRNAs within an invertible FlipArray at baseline can empower stepwise double knockouts (2+2, or quadruple knockouts as an end result) or higher dimensional sequential mutagenesis. In other non-limiting embodiments, the use of modified Cre systems such as CreER, photoactivatable Cre, and split-Cre can provide even greater control of FlipArray inversion. In yet other non-limiting embodiments, utilizing orthogonal recombinases and recognition sites in the crRNA array allows for even more complex multi-step gene editing programs. In yet other non-limiting embodiments, through the use of tethered Cpf1 variants, FlipArrays can also be used for sequential and reversible gene activation, repression, or epigenetic modification (
In certain non-limiting embodiments, such applications of Cpf1-Flip and its derivatives can be self-contained within a single viral vector, facilitating direct in vivo sequential genetic manipulations and functional studies.
The recitation of a listing of elements in any definition of a variable herein includes definitions of that variable as any single element or combination (or subcombination) of listed elements. The recitation of an embodiment herein includes that embodiment as any single embodiment or in combination with any other embodiments or portions thereof.
The disclosures of each and every patent, patent application, and publication cited herein are hereby incorporated herein by reference in their entirety. While this invention has been disclosed with reference to specific embodiments, it is apparent that other embodiments and variations of this invention may be devised by others skilled in the art without departing from the true spirit and scope of the invention. The appended claims are intended to be construed to include all such embodiments and equivalent variations.
The present application is entitled to priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 62/521,600, filed Jun. 19, 2017, and U.S. Provisional Patent Application No. 62/660,467, filed Apr. 20, 2018, which are both incorporated by reference in their entireties herein.
This invention was made with government support under CA21974, CA209992, CA196530, and GM007205 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US18/38242 | 6/19/2018 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62521600 | Jun 2017 | US | |
62660467 | Apr 2018 | US |