The invention relates to the field of personalized medicine, and to the ability to administer targeted therapies following the functional identification of biomarkers.
In particular, the invention relates to the field of clinically applicable methods for the characterization, and especially the functional evaluation, of genetic variants in a patient. More particularly, the invention relates to the field of the characterization and classification of variants of uncertain significance or other unreported variants in patients.
The applications include, but are not limited to, the characterization of genetic variants which are presumed to be involved in the occurrence or re-occurrence of cancers in patients.
Over the last two decades, the therapeutic options available in oncology have evolved towards therapies targeted on the basis of tumoral genetic information. These new treatments improve patient outcomes and have fewer adverse effects. However, the functional significance of many variants of targetable genes remains unknown. In the Clinvar database, 237 934 variants of uncertain significance (VUS) are registered for the total set of genes considered. Most are missense mutations and potential splice site variants. With recent technical improvements and the development of whole-exome and whole-genome sequencing, this number of VUS is likely to rise still further in the coming years, with one in three variants classified as VUS overall, and 80% located on tumor suppressor genes. As VUS affect treatment options and patient management, this trend highlights the need for a method of functional testing, which could be applied in oncology, but also other fields of medicine.
Mutations of the BRCA1 and BRCA2 genes can be used to illustrate this problem. These two tumor suppressor genes have many roles, mostly in genome protection via the homologous recombination (HR) pathway. Inheritable mutations of BRCA1/2 increase the risk of breast cancer (50-80%) and ovarian cancer (40-60%), and have also been implicated in prostatic and pancreatic cancers. With the development of PARP inhibitors, pathogenic mutations of these genes are now biomarkers of response to these treatments. Their multiple interaction domains and protein partners account for the tremendous diversity of variants found in tumors. More than 2660 BRCA1 and 4840 BRCA2 VUS are registered in Clinvar, representing 35.7% and 45.7% of the total known variants of BRCA1 and BRCA2, respectively. However, databases such as Clinvar, BRCA exchange and UMD mostly contain constitutive variants. Somatic variants, which may be detected in only one or two individuals, also exist, and many such variants remain unreported. Functional testing in vitro is currently based on transcriptional activation, HR activity and splicing. However, such tests are not necessarily compatible with clinical management, given the time taken to obtain results to guide treatment.
CRISPR-Cas9 genome editing has been proposed as a promising tool for discovering gene function in large-scale screens, as reported by Doench (Am I ready for CRISPR? A user's guide to genetic screens. Nat. Rev. Genet. 19, 67-80 (2017)).
Findlay et al. (Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513, 120-123 (2014)) report a method coupling CRISPR-Cas9 genome editing with multiplex homology-directed repair (HDR) using a library of donor templates.
US20160076093 and Findlay et al. (Accurate functional classification of thousands of BRCA1 variants with saturation genome editing. Nature 294520 (2018)) also report a method wherein all possible SNVs are simultaneously introduced and concurrently assayed by determining the ratio of the frequency of each SNV to its frequency in a whole plasmid library. However, the above-mentioned methods are meant for high-throughput analysis and their applicability to clinical practice is uncertain.
Hence, there remains a need for new tools and methods for meeting the challenges of personalized medicine, and especially of VUS classification.
Thus there remains a need for new tools and methods which are applicable efficiently, reliably, and in a timeframe that is compatible with clinical practice.
More specifically, there remains a need for tools and methods which can be carried out within three weeks.
The purpose of the invention is to meet the above-mentioned needs.
The invention relates to an in vitro method for characterizing one or more genetic variant(s) of a patient, comprising at least the steps of:
a) bringing into contact a first and a second population of haploid cells with:
b) culturing said first and second population of haploid cells in a culture medium;
c) determining the occurrence of the genetic variant(s) in the first and second population of haploid cells, thereby characterizing the genetic variant(s) of the patient.
Herein, we adapt the concept of saturation genome editing to the clinical study of genetic variants in a patient. By using next-generation sequencing (NGS) to compare the editing frequencies of a variant of interest and of a silent mutation classified as benign, we were able to evaluate the functional consequences of 33 mutations of BRCA1/2, including 23 variants of uncertain significance (VUS). We further extend the method to the evaluation of other clinically relevant variants, including seven variants of POLE, another tumor suppressor gene that serves as a biomarker for immunotherapy administration. This demonstrates the utility of the approach for the characterization of genetic variants, and of essential tumor suppressor genes in general, within a timeframe compatible with clinical application. The essentiality of the tested genes in the haploid model, including in a non-limiting manner the BRCA1, BRCA2 and POLE genes, is also an important feature: it makes it possible to evaluate function rapidly, within three weeks, which is compatible with direct clinical application.
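By way of illustration only, the comparison of editing frequencies described above may be sketched as follows. The sketch assumes that NGS read counts are available for the population edited with the variant of interest and for the population edited with the benign control mutation. The class and function names, the read counts and the 0.5 cutoff are hypothetical and are not taken from the method itself.

```python
from dataclasses import dataclass


@dataclass
class EditingCounts:
    """NGS read counts for one population of haploid cells (illustrative)."""
    edited_reads: int  # reads carrying the introduced mutation
    total_reads: int   # all reads covering the genomic region of interest

    @property
    def frequency(self) -> float:
        if self.total_reads <= 0:
            raise ValueError("total_reads must be positive")
        return self.edited_reads / self.total_reads


def relative_editing_frequency(variant: EditingCounts, control: EditingCounts) -> float:
    """Ratio of the variant editing frequency to the benign-control frequency."""
    if control.frequency == 0:
        raise ValueError("control population has no edited reads")
    return variant.frequency / control.frequency


def call_variant(ratio: float, cutoff: float = 0.5) -> str:
    # Hypothetical cutoff: for an essential gene, cells carrying a
    # loss-of-function variant are expected to be depleted relative to
    # cells carrying the benign control, giving a low ratio.
    return "functionally impaired" if ratio < cutoff else "functionally tolerated"


# Made-up read counts, for illustration only.
variant = EditingCounts(edited_reads=120, total_reads=10_000)
control = EditingCounts(edited_reads=1_500, total_reads=10_000)
ratio = relative_editing_frequency(variant, control)
print(f"relative editing frequency = {ratio:.2f} -> {call_variant(ratio)}")
```

In practice, any cutoff would have to be calibrated against variants of known classification; the value used here only serves to make the example run.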
The in vitro method presented here is thus effective for the characterization of the functional impact of genetic variants in a patient, in particular of VUS, such as BRCA1 and BRCA2 VUS. More importantly, this experimental framework can be used to obtain the necessary biological evidence of VUS function required for the prescription of targeted treatment within three weeks, which is compatible with use in clinical application.
In particular, the in vitro method is suitable for characterizing genetic variants located on, or associated with, tumor suppressor genes.
The patient carrying the genomic abnormality therefore benefits from an analysis of his or her own mutation, with potential consequences for relatives. This is particularly important for extremely rare somatic variants, which resemble orphan diseases. The proposed experimental model hence enables the extension of the method to the study of other variants and/or the characterization of all essential tumor suppressor genes, in fields including, but not limited to, oncology. At a time at which purely in silico approaches are being used to guide therapeutic decisions, a method evaluating the functional implications of VUS is much needed. The experimental model, and the methods reported herein, are further described below and in the examples.
As used herein, a «Genetic variant», for a given patient, relates to the substitution, the deletion or the insertion of at least one (one or more) nucleotides at a specific position, or genomic region of interest, in the genome of a patient. Non-limiting examples of genetic variants include frameshift, stop gained, start lost, splice acceptor, splice donor, stop lost, inframe indel, missense, splice region, synonymous and copy number variants. Non-limiting types of copy number variants include deletions and duplications. When the substitution consists of a single nucleotide, it may also be referred to herein as a single-nucleotide variant (SNV). Types of SNVs which are considered as genetic variants according to the methods of the invention may thus include SNVs in the non-coding region and SNVs in the coding region. Such genetic variants are generally defined by reference to the sequence most prevalent in a population.
The genetic variants (i.e. variants of uncertain significance or unreported variants) which are particularly considered herein are those which concern essential genes. As used herein, an «essential gene» is a gene for which loss of function results in loss of viability or fitness.
As used herein, the terms «Variant of uncertain significance» and «VUS» are used interchangeably and refer to a form of genetic variant that has been identified through genetic testing but whose significance to the function of a gene or protein or the health of an organism is not known at the time of characterization.
«Locus of interest» and «genomic region of interest» are used interchangeably herein to mean the region of the genome of the patient.
The terms «at least one» and «one or more» are used interchangeably. Accordingly, the term «at least one» may comprise «two or more», «three or more», «four or more», «five or more», and so on.
The terms “polynucleotide”, “nucleotide”, “nucleotide sequence”, “nucleic acid” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.
“Nuclease” and “endonuclease” are used interchangeably herein to mean one or more enzymes or enzyme-containing complexes (which may include protein-nucleic acid complexes such as Cas9 in complex with sgRNAs) which possess catalytic activity for polynucleotide cleavage, in particular DNA cleavage. Endonucleases which are considered include naturally occurring, non-naturally occurring, recombinant, chimeric and/or heterologous endonucleases, and analogs thereof.
Analogs of endonucleases may include endonucleases which share at least 80% of sequence identity with a given endonuclease, which includes at least 80%; 85%; 90% and 95% of identity, based on an optimum alignment.
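As a purely illustrative sketch of the sequence-identity criterion above, the percentage of identity may be computed from two sequences that have already been aligned (for example by an alignment program). The function names, the toy sequences and the choice of denominator (number of aligned columns, excluding columns where both sequences have a gap) are assumptions made for this example; alignment programs may use other conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns in which both sequences carry a gap
    are ignored; the denominator is the number of remaining columns
    (an assumed convention, one among several in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the two aligned sequences share at least `threshold` % identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy amino acid sequences, made up for illustration (two mismatches out of ten).
reference = "MKRNYILGLD"
candidate = "MKRNYVLGLE"
print(percent_identity(reference, candidate))
print(meets_identity_threshold(reference, candidate))
```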
The optimum alignment of the sequences for the comparison can be carried out by computer using known algorithms. Most preferably, the percentage sequence identity is determined using the CLUSTAL W2 software (version 2.1), the parameters being set to «default».
By “endonuclease suitable for targeting a genomic region of interest” it is meant any CRISPR/Cas endonuclease, as described above, that is able to target specifically a genomic region of interest of a given cell, and to provide catalytic activity for polynucleotide cleavage on the targeted genomic region.
Thus, said definition may include both:
(i) endonucleases having at least one targeting domain and at least one active domain for polynucleotide cleavage; and/or
(ii) endonuclease having at least one active domain for polynucleotide cleavage, wherein the targeting domain is part of a distinct polypeptide and/or a distinct polynucleotide.
In general, “CRISPR system” or «CRISPR/Cas system» refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding one or more of: a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), a single-guide nucleic acid (in particular a single-guide RNA (sgRNA)) or other associated sequences and transcripts from a CRISPR locus needed for targeting a genomic region of interest (i.e. a genetic variant of the patient which is to be characterized).
In some embodiments, one or more elements of a CRISPR system is/are derived from a type I, type II, or type III CRISPR system. In some embodiments, one or more elements of a CRISPR system is derived from a particular organism comprising an endogenous CRISPR system, such as Streptococcus pyogenes. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. Full complementarity is not necessarily required, provided there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides.
Among CRISPR-Cas systems, a type II CRISPR system from Streptococcus pyogenes involves only a single gene encoding the Cas9 protein and two RNAs—a mature CRISPR RNA (crRNA) and a partially complementary trans-acting RNA (tracrRNA)—which are necessary and sufficient for RNA-guided silencing of foreign DNAs.
Accordingly, a CRISPR-Cas system suitable for the methods of the invention may involve a Cas endonuclease and a CRISPR-Cas system guide nucleic acid, such as a CRISPR-Cas system guide RNA that hybridizes with the target sequence.
Examples of CRISPR/Cas endonucleases include class 2 CRISPR/Cas endonucleases, such as: (a) type II CRISPR/Cas proteins, e.g., a Cas9 protein and the like; (b) type IIA CRISPR/Cas proteins, e.g., a Csn2 protein and the like; (c) type IIB CRISPR/Cas proteins, e.g., a Cas4 protein and the like; (d) type IIC CRISPR/Cas proteins; (e) type V CRISPR/Cas proteins, e.g., a Cpf1 polypeptide, a C2c1 polypeptide, a C2c3 polypeptide, and the like; and (f) type VI CRISPR/Cas proteins, e.g., a C2c2 protein, a Cas13b protein, a Cas13c protein, a Cas13d protein and the like.
In particular, “Class 2 CRISPR systems” which are considered by the invention include Type II (a sub-type of “class 2”) CRISPR systems such as CRISPR/Cas9 or the more recently characterized CRISPR from Prevotella and Francisella 1 (Cpf1) in Zetsche et al. (“Cpf1 is a Single RNA-guided Endonuclease of a Class 2 CRISPR-Cas System” (2015); Cell; 163, 1-13).
By “cleavage domain” or “active domain” or “nuclease domain” of a nuclease/endonuclease it is meant the polypeptide sequence or domain within the nuclease which possesses the catalytic activity for polynucleotide cleavage, in particular DNA cleavage. A cleavage domain can be contained in a single polypeptide chain or cleavage activity can result from the association of two (or more) polypeptides. A single nuclease domain may consist of more than one isolated stretch of amino acids within a given polypeptide.
By “recombination” it is meant a process of exchange of genetic information between two polynucleotides. As used herein, “homology-directed repair (HDR)” refers to the specialized form of DNA repair that takes place, for example, during repair of double-strand breaks in cells. This process requires nucleotide sequence homology, uses a “donor” molecule to template repair of a “target” molecule (i.e., the one that experienced the double-strand break), and leads to the transfer of genetic information from the donor to the target. Homology-directed repair may result in an alteration of the sequence of the target molecule (e.g., insertion, deletion, mutation), if the donor polynucleotide differs from the target molecule and part or all of the sequence of the donor polynucleotide is incorporated into the target DNA. In some embodiments, the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target genomic region of interest. When HDR requires a «donor» nucleic acid, the genomic region of interest can be defined as the region that is complementary to the «donor» nucleic acid.
By “non-homologous end joining (NHEJ)” it is meant the repair of double-strand breaks in DNA by direct ligation of the break ends to one another without the need for a homologous template (in contrast to homology-directed repair, which requires a homologous sequence to guide repair). NHEJ often results in the insertion or deletion (indel) of one or more nucleotides near the site of the double-strand break.
“Complementarity” refers to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick or other non-traditional types. A percent complementarity indicates the percentage of residues in a nucleic acid molecule which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary). “Perfectly complementary” means that all the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. “Substantially complementary” as used herein refers to a degree of complementarity that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
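The notion of percent complementarity defined above can be illustrated by a minimal sketch. It assumes two DNA strands of equal length, both written 5′ to 3′, paired in antiparallel orientation, and it counts only Watson-Crick pairs (non-traditional pairings and RNA bases are deliberately left out). The function name and the example sequences are hypothetical.

```python
# Watson-Crick partners for DNA bases (assumption: DNA only, no 'U').
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(strand: str, other: str) -> float:
    """Percentage of residues of `strand` that pair with `other`.

    Both strands are given 5'->3'; `other` is read in reverse so that
    the two strands are compared in antiparallel orientation.
    """
    if len(strand) != len(other):
        raise ValueError("this sketch only compares strands of equal length")
    if not strand:
        raise ValueError("empty sequence")
    paired = sum(
        1
        for base, partner in zip(strand.upper(), reversed(other.upper()))
        if WATSON_CRICK.get(base) == partner
    )
    return 100.0 * paired / len(strand)


guide = "ACGTTGCAAC"             # made-up 10-nucleotide sequence
perfect_target = "GTTGCAACGT"    # its reverse complement
mismatched_target = "CTTGCAACGA"  # same target with two mismatches

print(percent_complementarity(guide, perfect_target))
print(percent_complementarity(guide, mismatched_target))
```

With these made-up sequences, the second comparison corresponds to the "8 out of 10" case mentioned in the definition.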
“Hybridization” refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. The complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute a step in a more extensive process, such as the initiation of PCR, or the cleavage of a polynucleotide by an enzyme. A sequence capable of hybridizing with a given sequence is referred to as the “complement” of the given sequence.
Hybridization requires that the two nucleic acids contain complementary sequences, although mismatches between bases are possible. The conditions appropriate for hybridization between two nucleic acids depend on the length of the nucleic acids and the degree of complementation, variables well known in the art. The greater the degree of complementation between two nucleotide sequences, the greater the value of the melting temperature (Tm) for hybrids of nucleic acids having those sequences. For hybridizations between nucleic acids with short stretches of complementarity (e.g. complementarity over 35 or less, 30 or less, 25 or less, 22 or less, 20 or less, or 18 or less nucleotides) the position of mismatches becomes important (see Sambrook et al., supra, 11.7-11.8). Typically, the length for a hybridizable nucleic acid is at least about 8 or 10 nucleotides. Illustrative minimum lengths for a hybridizable nucleic acid are: at least about 15 nucleotides; at least about 20 nucleotides; at least about 22 nucleotides; at least about 25 nucleotides; and at least about 30 nucleotides.
It is understood in the art that the sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable or hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure). A polynucleotide can comprise at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it is targeted. For example, an antisense nucleic acid in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and would therefore specifically hybridize, would represent 90 percent complementarity. In this example, the remaining noncomplementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined routinely using BLAST programs (basic local alignment search tools) and PowerBLAST programs known in the art (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656) or by using the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison Wis.), using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489).
A “vector” or “expression vector” is a replicon, such as plasmid, phage, virus, or cosmid, to which another DNA segment, i.e. an “insert”, may be attached so as to bring about the replication of the attached segment in a cell.
An “expression cassette” comprises a DNA coding sequence operably linked to a promoter. “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression.
The terms “recombinant expression vector,” or “DNA construct” are used interchangeably herein to refer to a DNA molecule comprising a vector and at least one insert. Recombinant expression vectors are usually generated for the purpose of expressing and/or propagating the insert(s), or for the construction of other recombinant nucleotide sequences. The insert(s) may or may not be operably linked to a promoter sequence and may or may not be operably linked to DNA regulatory sequences.
By «silent mutation» it is meant mutations which, when introduced into the genomic region of interest, do not alter the phenotype of the cell and/or organism in which they occur. Silent mutations can occur in non-coding regions (outside of genes or within introns), or they may occur within exons. When silent mutations occur within exons, they either do not result in a change to the amino acid sequence of a protein, or result in the insertion of an alternative amino acid with similar properties to that of the original amino acid. Yet, according to a most preferred embodiment of the invention, «silent mutations» consist of mutations which occur within an exon or open-reading frame but that do not result in a change to the amino acid sequence of the protein, or fragment thereof, corresponding to said exon or open-reading frame. Examples of silent mutations include mutations introducing restriction site(s) recognized by one or more endonucleases, but that do not alter the phenotype of the cell and/or organism.
Accordingly, «non-silent mutations» preferably consist of mutations which occur within an exon and that do result in a change to the amino acid sequence of the protein, or fragment thereof, corresponding to said exon. Said change may include deletions, substitutions and insertions of another amino acid sequence. Examples of non-silent mutations include mutations introducing a STOP codon within an open reading frame (ORF).
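The distinction between silent and non-silent mutations, in the sense of the preferred embodiment above, may be illustrated by a short sketch that compares the amino acid encoded by a reference codon and by a mutated codon. The sketch assumes the standard genetic code and an in-frame, single-codon substitution; it does not assess phenotype, splicing or insertions/deletions. The function name and the example codons are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify an in-frame codon substitution as silent or non-silent."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if ref not in CODON_TABLE or alt not in CODON_TABLE:
        raise ValueError("codons must be three DNA bases (A, C, G, T)")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        return "silent"                    # same amino acid encoded
    if alt_aa == "*":
        return "non-silent (stop gained)"  # STOP codon introduced in the ORF
    if ref_aa == "*":
        return "non-silent (stop lost)"
    return "non-silent (missense)"


print(classify_codon_change("CTG", "CTA"))  # Leu -> Leu
print(classify_codon_change("CAG", "TAG"))  # Gln -> STOP
print(classify_codon_change("GAT", "GTT"))  # Asp -> Val
```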
As used herein, a «cell» may encompass eukaryotic and non-eukaryotic cells, which includes eukaryotic cells, and prokaryotic cells selected from bacteria and archaebacteria.
As used herein, a «haploid cell» refers to a cell having a single set of chromosomes.
A «eukaryotic cell» may be selected from the group comprising or consisting of: a yeast, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell. Examples of eukaryotic cells which may be considered by the invention include PC9 (lung cancer) cells, BT474 and MCF7 (breast cancer) cells, DLD-1 and HCT116 (colon cancer) cells, and HAP1 (human near-haploid cell line derived from the male chronic myelogenous leukemia (CML) cell line KBM-7) cells.
The term “naturally-occurring” or “unmodified” as used herein as applied to a nucleic acid, a polypeptide, a cell, or an organism, refers to a nucleic acid, polypeptide, cell, or organism that is found in nature. For example, a polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature and which has not been intentionally modified by a human in the laboratory is naturally occurring.
The term “chimeric” as used herein as applied to a nucleic acid or polypeptide refers to two components that are defined by structures derived from different sources. For example, where “chimeric” is used in the context of a chimeric polypeptide (e.g., a chimeric Cas9/Csn1 protein), the chimeric polypeptide includes amino acid sequences that are derived from different polypeptides. A chimeric polypeptide may comprise either modified or naturally-occurring polypeptide sequences (e.g., a first amino acid sequence from a modified or unmodified Cas9/Csn1 protein; and a second amino acid sequence other than the Cas9/Csn1 protein). Similarly, “chimeric” in the context of a polynucleotide encoding a chimeric polypeptide includes nucleotide sequences derived from different coding regions (e.g., a first nucleotide sequence encoding a modified or unmodified Cas9/Csn1 protein; and a second nucleotide sequence encoding a polypeptide other than a Cas9/Csn1 protein).
The term “chimeric polypeptide” refers to a polypeptide which is made by the combination (i.e., “fusion”) of two otherwise separated segments of amino sequence, usually through human intervention. A polypeptide that comprises a chimeric amino acid sequence is a chimeric polypeptide. Some chimeric polypeptides can be referred to as “fusion variants.”
“Heterologous,” as used herein, means a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively. For example, in a chimeric Cas9/Csn1 protein, the RNA-binding domain of a naturally-occurring bacterial Cas9/Csn1 polypeptide (or a variant thereof) may be fused to a heterologous polypeptide sequence (i.e. a polypeptide sequence from a protein other than Cas9/Csn1 or a polypeptide sequence from another organism). The heterologous polypeptide sequence may exhibit an activity (e.g., enzymatic activity) that will also be exhibited by the chimeric Cas9/Csn1 protein (e.g., methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.). A heterologous nucleic acid sequence may be linked to a naturally-occurring nucleic acid sequence (or a variant thereof) (e.g., by genetic engineering) to generate a chimeric nucleotide sequence encoding a chimeric polypeptide. As another example, in a fusion variant Cas9 site-directed polypeptide, a variant Cas9 site-directed polypeptide may be fused to a heterologous polypeptide (i.e. a polypeptide other than Cas9), which exhibits an activity that will also be exhibited by the fusion variant Cas9 site-directed polypeptide. A heterologous nucleic acid sequence may be linked to a variant Cas9 site-directed polypeptide (e.g., by genetic engineering) to generate a nucleotide sequence encoding a fusion variant Cas9 site-directed polypeptide.
“Recombinant,” as used herein, means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction, polymerase chain reaction (PCR) and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems. DNA sequences encoding polypeptides can be assembled from cDNA fragments or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid which is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system. Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5′ or 3′ from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms (see “DNA regulatory sequences”, below). Alternatively, DNA sequences encoding RNA (e.g., DNA-targeting RNA) that is not translated may also be considered recombinant. Thus, e.g., the term “recombinant” nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention. This artificial combination is often accomplished by either chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Such is usually done to replace a codon with a codon encoding the same amino acid, a conservative amino acid, or a non-conservative amino acid. Alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions. 
When a recombinant polynucleotide encodes a polypeptide, the sequence of the encoded polypeptide can be naturally occurring (“wild type”) or can be a variant (e.g., a mutant) of the naturally occurring sequence. Thus, the term “recombinant” polypeptide does not necessarily refer to a polypeptide whose sequence does not naturally occur. Instead, a “recombinant” polypeptide is encoded by a recombinant DNA sequence, but the sequence of the polypeptide can be naturally occurring (“wild type”) or non-naturally occurring (e.g., a variant, a mutant, etc.). Thus, a “recombinant” polypeptide is the result of human intervention, but may be a naturally occurring amino acid sequence.
The terms “DNA regulatory sequences,” “control elements,” and “regulatory elements,” used interchangeably herein, refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate transcription of a non-coding sequence (e.g., DNA-targeting RNA) or a coding sequence (e.g., site-directed modifying polypeptide, or Cas9/Csn1 polypeptide) and/or regulate translation of an encoded polypeptide.
In particular, the term “regulatory element” is intended to include promoters, enhancers, internal ribosomal entry sites (IRES), and other expression control elements (e.g. transcription termination signals, such as polyadenylation signals and poly-U sequences). Such regulatory elements are described, for example, in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990). Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). A tissue-specific promoter may direct expression primarily in a desired tissue of interest, such as muscle, neuron, bone, skin, blood, specific organs (e.g. liver, pancreas), or particular cell types (e.g. lymphocytes). Regulatory elements may also direct expression in a temporal-dependent manner, such as in a cell-cycle dependent or developmental stage-dependent manner, which may or may not also be tissue or cell-type specific. In some embodiments, a vector comprises one or more pol III promoters (e.g. 1, 2, 3, 4, 5, or more pol III promoters), one or more pol II promoters (e.g. 1, 2, 3, 4, 5, or more pol II promoters), one or more pol I promoters (e.g. 1, 2, 3, 4, 5, or more pol I promoters), or combinations thereof. Examples of pol III promoters include, but are not limited to, U6 and H1 promoters. Examples of pol II promoters include, but are not limited to, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer) [see, e.g., Boshart et al, Cell, 41:521-530 (1985)], the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1α promoter.
Also encompassed by the term “regulatory element” are enhancer elements, such as WPRE; CMV enhancers; the R-U5′ segment in LTR of HTLV-I (Mol. Cell. Biol., Vol. 8(1), p. 466-472, 1988); SV40 enhancer; and the intron sequence between exons 2 and 3 of rabbit β-globin (Proc. Natl. Acad. Sci. USA., Vol. 78(3), p. 1527-31, 1981). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression desired, etc. A vector can be introduced into host cells to thereby produce transcripts, proteins, or peptides, including fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., clustered regularly interspersed short palindromic repeats (CRISPR) transcripts, proteins, enzymes, mutant forms thereof, fusion proteins thereof, etc.).
In Vitro Methods of the Invention
The invention relates to an in vitro method for characterizing one or more genetic variant(s) of a patient.
Hence, according to a first object, the invention thus relates to an in vitro method for characterizing one or more genetic variant(s) of a patient, comprising at least the steps of:
a) bringing into contact a first and a second population of haploid cells with:
b) culturing said first and second population of haploid cells in a culture medium;
c) determining the occurrence of the genetic variant(s) in the first and second population of haploid cells, thereby characterizing the genetic variant(s) of the patient.
Preferably, the mutation at the corresponding Protospacer Adjacent Motif (PAM) sequence is a silent mutation.
It will be readily understood herein that the efficiency of sequence-specific cleavage by the endonuclease for both conditions should be the same or highly similar in order to compare the genetic variant(s) of both populations of haploid cells, and therefore characterize the genetic variant(s) of the patient.
It will be readily understood herein that CRISPR-Cas systems generally require also the transfection of a guide nucleic acid, in particular of a guide RNA (gRNA) for sequence-specific cleavage. Hence, according to one preferred embodiment, the invention thus relates to an in vitro method for characterizing one or more genetic variant(s) of a patient, comprising at least the steps of:
a) bringing into contact a first and a second population of haploid cells with:
b) culturing said first and second population of haploid cells in a culture medium;
c) determining the occurrence of the genetic variant(s) in the first and second population of haploid cells, thereby characterizing the genetic variant(s) of the patient.
According to one preferred embodiment of the in vitro method, the first and second populations of haploid cells are brought into contact with the same guide nucleic acid, and more particularly the same guide RNA (gRNA) that hybridizes with the target genomic region of interest.
Advantageously, the said first and second nucleic acid, and the guide nucleic acid (when applicable) are brought simultaneously into contact with the populations of cells.
The expression “mutation at the corresponding Protospacer Adjacent Motif (PAM) sequence” may consist of a mutation on the PAM sequence itself, or around the PAM sequence itself, which may thus consist of a mutation within five (i.e. 1, 2, 3, 4 or 5) nucleotides before the PAM sequence in the corresponding sequence (i.e. the sequence targeted by the corresponding guide RNA when applicable).
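The positional criterion above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: coordinates are 0-based on the strand carrying the PAM, the PAM is 3 nucleotides long and lies 3′ of the targeted sequence (as for Cas9), and the helper name `is_pam_proximal` is hypothetical rather than taken from the specification.

```python
def is_pam_proximal(mut_pos: int, pam_start: int,
                    pam_len: int = 3, window: int = 5) -> bool:
    """Return True if a mutation lies in the PAM or within `window` nt before it.

    Assumptions (illustrative only): 0-based coordinates on the strand that
    carries the PAM, so "before the PAM" means smaller indices; the default
    PAM length of 3 nt corresponds to a Cas9-type PAM.
    """
    in_pam = pam_start <= mut_pos < pam_start + pam_len
    just_upstream = pam_start - window <= mut_pos < pam_start
    return in_pam or just_upstream


# Example with a PAM starting at index 20 of a hypothetical target region.
print(is_pam_proximal(17, pam_start=20))  # position 3 nt before the PAM
print(is_pam_proximal(10, pam_start=20))  # position 10 nt before the PAM
```

For an endonuclease whose PAM lies 5′ of the targeted sequence, the direction of the window would have to be adapted.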
According to one embodiment, the first and second nucleic acid suitable for introducing, after sequence specific cleavage, a mutation at the site of the genetic variant(s) can be selected from a group consisting of: single-stranded deoxyribonucleotide(s) (ssDNA); double-stranded deoxyribonucleotide(s) (dsDNA); single-stranded ribonucleotide(s) (ssRNA); double-stranded ribonucleotide(s) (dsRNA); single-stranded oligo-deoxyribonucleotide(s) (ssODNA); double-stranded oligo-deoxyribonucleotide(s) (dsODNA); single-stranded oligo-ribonucleotide(s) (ssORNA); double-stranded oligo-ribonucleotide(s) (dsORNA); RNA-DNA duplexes; either in a modified or non-modified form. When the said nucleic acids are in a modified form, they may optionally comprise degenerate sequences and non-standard bases.
For instance, the use of a ssODNA as a donor nucleic acid has been described in: Chen et al. (2011). High-frequency genome editing using ssDNA oligonucleotides with zinc-finger nucleases. Nat Methods. 8(9):753-5.
In a non-limitative manner, said first and second nucleic acid may be in the form of messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
According to a preferred embodiment, the first and second nucleic acid are deoxyribonucleic acids, for instance single-stranded oligo-deoxyribonucleotide(s) (ssODNA).
They may be of varying length depending on the nature and length of the genomic region of interest and also for achieving hybridization in the cell and HDR after endonuclease treatment. They may thus comprise or consist of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 100, 150 or more nucleotides.
When the first and second nucleic acids are double-stranded nucleic acids, they may either comprise blunt or sticky ends. According to one embodiment, the first and second nucleic acid comprise blunt ends.
According to one embodiment, the first and second nucleic acid are single-stranded oligo-deoxyribonucleotide(s) (ssODNA).
According to one embodiment, the first and second nucleic acid are single-stranded oligo-deoxyribonucleotide(s) (ssODNA) or blunt-ended double-stranded oligo-deoxyribonucleotide(s) (dsODNA).
According to one embodiment of the in vitro method, the first and second populations of haploid cells are brought into contact with a Class II Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR associated (Cas) (CRISPR-Cas) endonuclease.
According to one particular embodiment of the in vitro method, the endonuclease belongs to the Type II-CRISPR/Cas endonuclease system, and preferably is a Cas9 or a Cpf1 endonuclease.
In particular, the genetic variant(s) to be characterized is/are classified as Variants of Uncertain Significance (VUS) and/or genetic variant(s) which have not already been classified in databases.
In particular, the genetic variant(s) to be characterized, such as VUS, is/are single nucleotide variants (SNVs), or insertions or deletions (INDELs).
Most preferably, the genetic variant(s) to be characterized by the in vitro method is/are comprised within an essential gene, such as a tumor suppressor gene.
According to one embodiment, the genetic variant(s) to be characterized by the in vitro method is/are comprised within an essential gene, for which loss-of-function results in loss of at least one selected from viability or fitness.
According to one embodiment, the genetic variant(s) to be characterized by the in vitro method is/are comprised within an essential gene, for which loss-of-function results in loss of viability.
According to one embodiment, the genetic variant(s) to be characterized by the in vitro method is/are comprised within an essential gene, for which loss-of-function results in loss of fitness.
According to one alternative embodiment of the in vitro method, the genetic variant(s) is/are not comprised within the gene BRCA1.
According to one alternative embodiment of the in vitro method, the genetic variant(s) is/are not comprised within the gene BRCA2.
According to one alternative embodiment of the in vitro method, the genetic variant(s) is/are not comprised within the gene POLE.
According to one alternative embodiment of the in vitro method, the genetic variant(s) is/are not comprised within a gene selected from BRCA1 and BRCA2.
According to one alternative embodiment of the in vitro method, the genetic variant(s) is/are not comprised within a gene selected from BRCA1 and POLE.
According to one alternative embodiment of the in vitro method, the genetic variant(s) is/are not comprised within a gene selected from BRCA2 and POLE.
According to one alternative embodiment of the in vitro method, the genetic variant(s) is/are not comprised within a gene selected from BRCA1, BRCA2 and POLE.
According to one alternative embodiment of the in vitro method, the genetic variant(s) is/are comprised within a gene selected from BRCA1, BRCA2 and POLE.
In particular, the in vitro method according to the invention can be advantageously applied to a patient having, or which is presumed to have a disorder, for example selected from the group consisting of: a cancer, an autoimmune disease, an inflammatory disease, a neurodegenerative disease.
In particular the invention relates to an in vitro method for characterizing one or more genetic variants, wherein the genetic variant(s) to be characterized is/are comprised within, or associated to, tumor suppressor genes.
The in vitro method according to the invention may thus be advantageously applied to a patient having, or which is presumed to have, a cancer, or a patient having a family history of cancer.
In some particular embodiments, a cancer may include, without limitation, leukemias (e.g., acute leukemia, acute lymphocytic leukemia, acute myelocytic leukemia, acute myeloblastic leukemia, acute promyelocytic leukemia, acute myelomonocytic leukemia, acute monocytic leukemia, acute erythroleukemia, chronic leukemia, chronic myelocytic leukemia, chronic lymphocytic leukemia), polycythemia vera, lymphoma (e.g., Hodgkin's disease or non-Hodgkin's disease), Waldenstrom's macroglobulinemia, multiple myeloma, heavy chain disease, and solid tumors such as sarcomas and carcinomas (e.g., fibrosarcoma, myxosarcoma, liposarcoma, chondrosarcoma, osteogenic sarcoma, chordoma, angiosarcoma, endotheliosarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, synovioma, mesothelioma, Ewing's tumor, leiomyosarcoma, rhabdomyosarcoma, colon carcinoma, pancreatic cancer, breast cancer, ovarian cancer, prostate cancer, squamous cell carcinoma, basal cell carcinoma, adenocarcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinomas, cystadenocarcinoma, medullary carcinoma, bronchogenic carcinoma, renal cell carcinoma, hepatoma, bile duct carcinoma, choriocarcinoma, seminoma, embryonal carcinoma, Wilm's tumor, cervical cancer, uterine cancer, testicular cancer, lung carcinoma, small cell lung carcinoma, bladder carcinoma, epithelial carcinoma, glioma, astrocytoma, glioblastoma multiforme (GBM, also known as glioblastoma), medulloblastoma, craniopharyngioma, ependymoma, pinealoma, hemangioblastoma, acoustic neuroma, oligodendroglioma, schwannoma, neurofibrosarcoma, meningioma, melanoma, neuroblastoma, and retinoblastoma).
Most preferably, the populations of cells which are considered by the methods of the invention consist of haploid eukaryotic cells.
According to one embodiment of the in vitro method, the haploid cells include an inactivated or impaired Non-Homologous End Joining (NHEJ) pathway.
According to one embodiment of the in vitro method, the haploid cells are LIG4 KO or XRCC4 KO cells.
According to one embodiment of the in vitro method, the haploid cells are HAP1 or KBM7 cells.
Advantageously, the in vitro method of the invention does not require any limiting dilution step before the cells are cultured. Hence, according to one embodiment of the in vitro method, the first and second populations of haploid cells are not subjected to limiting dilution before being cultured, thereby avoiding clonal side effects.
According to one embodiment of the in vitro method, the first and second populations of haploid cells are cultured in a suitable culture medium for at least 48 hours, in particular for at least 72 hours, preferably for at least 96 hours, or more.
According to one embodiment, the in vitro method further comprises a step of recovering the first and second population of haploid cells from the culture medium.
According to one embodiment, the in vitro method further comprises a step of recovering genomic DNA, or any nucleic acid sequence derived from said genomic DNA, from the cultured first and second population of haploid cells.
According to one embodiment, the in vitro method further comprises a step of sequencing the genomic DNA, or any nucleic acid sequence derived from said genomic DNA, of the cultured first and second population of haploid cells.
According to one embodiment, the step of determining the occurrence of the genetic variant(s) comprises amplifying the specifically cleaved sequences in the first and second population of haploid cells.
According to one embodiment, the step of determining the occurrence of the genetic variant(s) comprises a step of comparing the level of the genetic variant(s) in the first population of haploid cells to the level of the genetic variant(s) in the second population of haploid cells.
According to one embodiment, the step of determining the occurrence of the genetic variant(s) comprises comparing the frequency of the genetic variant(s) in the first population of haploid cells to the frequency of the genetic variant(s) in the second population of haploid cells.
According to one embodiment, the step of determining the occurrence of the genetic variant(s) in the first and second population of haploid cells consists of determining the occurrence of the mutation corresponding to the genetic variant(s) and to the silent or benign, preferably silent, PAM sequence mutation introduced by the first nucleic acid and comparing it to the occurrence of the silent or benign, preferably silent, mutation at the site of the genetic variant(s) and to the silent PAM sequence mutation introduced by the second nucleic acid.
According to one embodiment, the step of determining the occurrence of the genetic variant(s) in the first and second population of haploid cells consists of determining the mutation frequency corresponding to the genetic variant(s) and to the silent or benign, preferably silent, PAM sequence mutation introduced by the first nucleic acid and the mutation frequency corresponding to the silent mutation at the site of the genetic variant(s) and to the silent or benign, preferably silent, PAM sequence mutation introduced by the second nucleic acid.
According to one embodiment, the step of determining the occurrence of the genetic variant(s) in the first and second population of haploid cells consists of determining a function score (FS) with the following formula:
Function score = ½ × [log2((fmut × fPAMmut)/(fsil × fPAMsil)) + log2((fmut × fPAMsil)/(fsil × fPAMmut))]
with:
Advantageously, the determination of the genetic variant(s) in the first and second population of haploid cells, for instance the determination of the corresponding function score for a given variant, allows the characterization of the said variant in the patient.
Advantageously, the function score allows its comparison with function scores from other variants. For instance a decreased function score for a given genetic variant is indicative of a pathogenic mutation, when compared to a reference control mutation (see
According to one embodiment, the in vitro method further comprises a step of comparing the level of the genetic variant(s) of the first and second population of haploid cells to a reference value. Hence, the in vitro method for characterizing genetic variant(s) in a patient is also suitable as an in vitro method for classifying genetic variant(s) in a patient, and/or for classifying genetic variant(s) in a population of patients.
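The comparison to a reference value can be sketched as a simple three-way classification. The sketch below is an assumption-laden illustration: the class names follow the neutral, intermediate and pathogenic categories used in this description, but the cut-off values are not specified in the text and must be supplied by the user, for instance from reference control variants. The function name `classify_variant` is hypothetical.

```python
def classify_variant(function_score: float,
                     pathogenic_below: float,
                     neutral_above: float) -> str:
    """Classify a variant from its function score using caller-supplied cut-offs.

    The cut-offs are NOT taken from the specification; they are parameters
    that would have to be calibrated on reference control variants.
    A lower function score is treated as indicative of pathogenicity.
    """
    if pathogenic_below > neutral_above:
        raise ValueError("pathogenic_below must not exceed neutral_above")
    if function_score < pathogenic_below:
        return "pathogenic"
    if function_score > neutral_above:
        return "neutral"
    return "intermediate"


# Arbitrary cut-offs chosen purely for illustration.
print(classify_variant(-3.0, pathogenic_below=-2.0, neutral_above=-1.0))
```

Leaving the cut-offs as explicit parameters avoids hard-coding thresholds that the description does not state.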
Therapeutic Applications of the In Vitro Methods
According to a second, alternative, object, the invention relates to a method for preventing or treating a patient bearing a genetic variant, wherein said genetic variant is characterized as pathogenic according to the above-mentioned in vitro method.
The said method thus comprises a step of administering a suitable medication to the patient for which the genetic variant has been characterized.
In particular, the medication may be suitable for preventing the occurrence or re-occurrence, or for reducing the likelihood of occurrence or re-occurrence of the disease for which the genetic variant has been characterized as pathogenic.
Material & Methods
HAP1 Cell Culture
Wild-type haploid HAP1 cells were purchased from Horizon Discovery and cultured in Iscove's Modified Dulbecco's Medium (IMDM) containing L-glutamine and 25 mM HEPES (Corning), supplemented with 10% fetal calf serum (Eurobio). Cells were grown at 37° C., under an atmosphere containing 5% CO2 and were passaged before confluence, to prevent reversion to the diploid state. Before use, the haploidy of HAP1 cells was confirmed by measuring DNA content through propidium iodide (PI) staining, following the method of Vindelov et al. (1983), and cytometry analysis.
Genetically Engineered HAP1 Cells
Polyclonal LIG4 knock-out cells were generated with CRISPR-Cas9 technology. A guide RNA (gRNA) was first designed to target the second exon with an AflIII restriction site three nucleotides upstream from the PAM sequence. An Alt-R CRISPR-Cas9 crRNA (IDT DNA) (SEQ ID No 1: 5′-CAATTACACAGTACGTGTCT-3′) and an Alt-R CRISPR-Cas9 tracrRNA with an ATTO550 fluorescent dye (IDT DNA) were complexed at a final concentration of 1 μM with 6 pmol of Alt-R S.p. Hifi Cas9 Nuclease V3 (IDT DNA) in the presence of Lipofectamine CRISPRMAX Cas9 Transfection Reagent (Thermo Fisher Scientific). The mixture was incubated for 20 minutes, and reverse transfection was then performed by adding RNA-Cas9 ribonucleoprotein complexes to 1.6×10⁵ cells. Four hours after transfection, cells were sorted by FACS on the basis of ATTO550 fluorescence. Only the 20% of cells with the highest level of fluorescence were retained and used to seed cultures in IMDM supplemented with 1% penicillin-streptomycin (Gibco). The cells were incubated for five days, and then subjected to limiting dilution. About 20 clones were amplified for DNA extraction with Chelex 100 Resin (Biorad). We used 10 μL of these DNA extracts for PCR amplification (SEQ ID No 2: forward primer: 5′-CTGGAGAACAGAATTGCAGA-3′; SEQ ID No 3: reverse primer: 5′-TAGCAATCATATTCACGGGC-3′) followed by digestion with the AflIII restriction enzyme (New England Biolabs) for 1 h at 37° C. The mixture was then incubated for 20 min at 80° C. for enzyme inactivation. The clones were screened by following their migration in a 2% agarose gel on electrophoresis. Clones that had undergone genomic editing and had lost the restriction site were identified by Sanger sequencing on an ABI 3130 Genetic Analyzer (Thermo Fisher Scientific) with the BigDye Terminator v1.1 Cycle sequencing Kit (Thermo Fisher Scientific). Results were visualized and analyzed with Sequencing Analysis 5.3.1 software.
The same technique was used for XRCC4 KO haploid cells using another Alt-R CRISPR-Cas9 crRNA (IDT DNA) (SEQ ID No 4: 5′-ATGGTCATTCAGCATGGACT-3′) and the following primers for amplification (SEQ ID No 5: forward primer: 5′-GAGGCCAGTACAGAAAACAT-3′; SEQ ID No 6: reverse primer: 5′-TGGAAAAGTATCCCTGAGGA-3′).
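The restriction-based screening relies on the Cas9 cleavage position falling inside the restriction site, so that edits at the cut abolish digestion. This can be checked with a short Python sketch for the LIG4 crRNA. The sketch assumes that SEQ ID No 1 is written 5′ to 3′ with the PAM immediately 3′ of its last base, that Cas9 cuts three nucleotides upstream from the PAM, and that AflIII recognizes ACRYGT (R = A or G, Y = C or T); the function name `cut_inside_site` is hypothetical.

```python
import re

# AflIII recognition sequence ACRYGT (R = A/G, Y = C/T), written as a regex.
AFLIII_SITE = re.compile(r"AC[AG][CT]GT")


def cut_inside_site(protospacer: str, cut_offset_from_pam: int = 3):
    """Locate the AflIII site in a protospacer and test whether the cut hits it.

    The protospacer is assumed to be written 5'->3' with the PAM directly 3'
    of its last base. Returns (site_span, cut_index, inside), where the cut
    falls between indices cut_index - 1 and cut_index (0-based).
    """
    cut_index = len(protospacer) - cut_offset_from_pam
    match = AFLIII_SITE.search(protospacer)
    if match is None:
        return None, cut_index, False
    start, end = match.span()
    inside = start < cut_index < end
    return (start, end), cut_index, inside


LIG4_CRRNA = "CAATTACACAGTACGTGTCT"  # SEQ ID No 1, as given in the text
span, cut, inside = cut_inside_site(LIG4_CRRNA)
print(span, cut, inside)
```

Under these assumptions the recognition site spans the last bases of the protospacer and contains the predicted cut, which is consistent with loss of the restriction site being used as a readout of editing.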
VUS Selection and gRNA Design
The first BRCA1, BRCA2 and POLE variants to be characterized were those found in somatic DNA from patients at the Institut de Cancerologie de l'Ouest (ICO). They were selected for study on the basis of their status as variants of uncertain significance or unclassified variants according to different databases (UMD database, ClinVar, BRCA exchange); well-known mutations were also analyzed as controls. All variants were characterized by next-generation sequencing (NGS).
The first stages of NGS library preparation were carried out with the Oncomine BRCA Assay Manual Kit (Thermo Fisher Scientific). Briefly, barcoded libraries were generated from 20 ng of DNA per sample. Two premixed pools of 265 primer pairs for entire BRCA1/2 coding regions and noncoding putative splice boundaries were used to generate the sequencing libraries. Clonal amplification of the libraries was carried out by emulsion PCR using an Ion Chef System (Thermo Fisher Scientific) according to the manufacturer's instructions. The prepared libraries were then sequenced on Ion Torrent S5 Sequencer using Ion 520&530 Kit Chef (Thermo Fisher Scientific). Variants of interest were visualized in Integrative Genomics Viewer (IGV), a high-performance visualization tool for interactive exploration of large integrated genomic datasets on standard desktop computers.
Human Genome Variation Society (HGVS)-approved guidelines (http://www.hgvs.org/mutnomen/) were used for BRCA1 nomenclature. The variants found by NGS were searched for in the UMD-BRCA1 database (http://www.umd.be/BRCA1/), the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/) and the BRCA Mutation Database (http://www.arup.utah.edu/database/BRCA/Home/BRCA1_landing.php).
A “read” is defined herein as a non-paired sequence having an average length of about 98 bases.
In order to increase the number of characterized VUS, we also selected 10 variants of BRCA1 (p.Ile31Asn, p.Glu149Ala, p.Val191Asp, p.Gln210=, p.Gly462Arg, p.Arg979Cys, p.Gly1201Ser, p.Thr1394Ile, p.Ala1752Pro, p.Gly1770Val) and a POLE variant (p.Arg1826Trp) for which conflicting interpretations had been reported in the databases. Four variants of POLE (p.Ala31Ser, p.Pro286Ser, p.Leu424Val, p.Phe695Ile) were also selected as controls.
Alt-R CRISPR-Cas9 crRNA (IDT DNA) were designed with the Alt-R Custom Cas9 crRNA design tool (IDT DNA) (https://eu.idtdna.com/site/order/designtool/index/CRISPR_CUSTOM).
The PAM sequence had to be adjacent to the mutation to facilitate the editing of LIG4 KO HAP1 cells. The guide RNAs (gRNA) were selected on the basis of the possibility of inserting a silent mutation into the PAM sequence or into the 3 to 5 nucleotides immediately upstream. This mutation increases editing efficiency and serves as a control in subsequent experiments. For each variant, two Ultramer DNA Oligos (https://eu.idtdna.com/site/order/designtool/index/CRISPR_CUSTOM) (IDT DNA) of about 84 nt were designed. The first contained the patient's variant and the second contained a silent mutation, one already reported to be benign where possible.
Both contained the silent control mutation mentioned above. All the gRNA and DNA oligonucleotides were designed as described in Table 1 herebelow.
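The pairing of the two donor oligonucleotides can be sketched in Python as follows. This is a minimal illustration and not the design procedure actually used: the reference sequence and all positions are made up, the function names are hypothetical, and the sketch does not verify that a substitution is truly silent, which would require the reading frame and a codon table.

```python
def apply_substitutions(reference: str, substitutions: dict[int, str]) -> str:
    """Return `reference` with single-base substitutions applied (0-based)."""
    seq = list(reference)
    for pos, base in substitutions.items():
        if not 0 <= pos < len(seq):
            raise IndexError(f"position {pos} is outside the reference")
        if base not in ("A", "C", "G", "T"):
            raise ValueError(f"unexpected base {base!r}")
        seq[pos] = base
    return "".join(seq)


def design_donor_pair(reference: str,
                      variant_sub: dict[int, str],
                      silent_sub: dict[int, str],
                      pam_sub: dict[int, str]) -> tuple[str, str]:
    """Build the two donors: (variant + PAM control, silent + PAM control).

    Both donors carry the same PAM-proximal control substitution; they differ
    only at the site of the variant under study.
    """
    variant_donor = apply_substitutions(reference, {**variant_sub, **pam_sub})
    control_donor = apply_substitutions(reference, {**silent_sub, **pam_sub})
    return variant_donor, control_donor


# Toy 20-nt reference, purely illustrative (real donors are about 84 nt).
toy_reference = "ATGGCTGAAGGTCTGACCGG"
variant_donor, control_donor = design_donor_pair(
    toy_reference, variant_sub={4: "T"}, silent_sub={5: "C"}, pam_sub={18: "C"})
print(variant_donor)
print(control_donor)
```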
Transfection of LIG4 KO HAP1 Cells
For each variant, two transfections were performed simultaneously, both with the same gRNA but with different DNA oligomers (the VUS to be classified in one transfection or the silent mutation in the other). The same protocol was used, but with 2 nmol of DNA oligonucleotides added before the Lipofectamine CRISPRMAX Cas9 Transfection Reagent. A cell suspension containing 400 000 cells/mL in IMDM supplemented with 10% FBS was then prepared and Alt-R HDR Enhancer (IDT DNA) was added to a final concentration of 2 nM. Reverse transfection was then performed. On day 1 post-transfection, the medium was replaced with fresh Iscove's Modified Dulbecco's Medium (IMDM) supplemented with 10% FBS. On day 4 to 5, depending on the degree of confluence, the cells were released by trypsin treatment and used to seed 6 cm-diameter plates. Two days after plating, a second transfection was performed with the same protocol for both types of transfection, to enrich the cell preparation in edited cells. The cells were then incubated for a further four to five days before DNA extraction.
DNA Extraction and NGS Sequencing
All gDNA were extracted from edited cells with the Maxwell 16 Blood DNA Purification Kit (Promega) and quantified using a Qubit (Thermo Fisher Scientific) and the Quantifluor dsDNA System Kit (Promega). 20 ng of the extracted DNA were then used to generate the NGS library. The libraries were prepared with the Oncomine BRCA Assay Manual Kit or Ion Ampliseq POLE (Thermo Fisher Scientific), allowing amplification of the entire BRCA1 and BRCA2 or POLE coding regions and noncoding putative splice boundaries. Samples were barcoded and the libraries were subjected to clonal amplification by emulsion PCR with an Ion Chef System (Thermo Fisher Scientific). The prepared libraries were then sequenced on an Ion Torrent S5 Sequencer with the Ion 520 and 530 Chef Kit (Thermo Fisher Scientific). Variants of interest were visualized with Integrative Genomics Viewer (IGV) (http://software.broadinstitute.org/software/igv/).
Sequencing Analysis and VUS Function Score Evaluation
Following NGS sequencing, insertions or deletions located around the expected cleavage site, in the eight nucleotides centered on the PAM sequence or the seven nucleotides centered on the VUS, were also counted. Indel frequencies were then calculated by dividing the total number of indels by the total number of reads. For the evaluation of Single-Nucleotide Variation (SNV) coverage, the ratio of the total numbers of reads for the VUS evaluated and the control SNV was calculated. Finally, function scores for all the variants studied were calculated by comparing the sequence frequencies of all the inserted variants (VUS of interest, silent control SNV and silent reference SNV) and the results contained in the available databases (UMD database, Clinvar, BRCA exchange).
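The two ratios described above can be written as a short Python sketch. The read counts in the example are invented for illustration only, and the function names are hypothetical.

```python
def indel_frequency(indel_reads: int, total_reads: int) -> float:
    """Indel frequency = reads carrying an indel in the window / total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= indel_reads <= total_reads:
        raise ValueError("indel_reads must lie between 0 and total_reads")
    return indel_reads / total_reads


def coverage_ratio(reads_vus_condition: int, reads_control_condition: int) -> float:
    """Ratio of total reads in the VUS condition to the control SNV condition."""
    if reads_control_condition <= 0:
        raise ValueError("reads_control_condition must be positive")
    return reads_vus_condition / reads_control_condition


# Invented counts, for illustration only.
print(indel_frequency(50, 1000))
print(coverage_ratio(900, 1000))
```

A coverage ratio close to 1 and comparable indel frequencies in the two conditions indicate that the two transfections can be compared.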
Function Score Determination
The following formula is used:
Function score = ½ × [log2((fmut × fPAMmut)/(fsil × fPAMsil)) + log2((fmut × fPAMsil)/(fsil × fPAMmut))]
with:
Read coverage must be similar in the control condition and the tested condition. The same applies to indel frequencies. All the variants measured for a given mutation must be located on the same read.
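The function score formula can be transcribed directly into Python. Because the symbol definitions are not reproduced in this section, the sketch assumes that fmut and fsil are the frequencies of the variant and of the silent control at the variant site, and that fPAMmut and fPAMsil are the frequencies of the PAM control mutation in the corresponding conditions; these readings and the function name `function_score` are assumptions.

```python
from math import isclose, log2


def function_score(f_mut: float, f_sil: float,
                   f_pam_mut: float, f_pam_sil: float) -> float:
    """Transcription of the function score formula as written in the text.

    All four inputs are frequencies and must be strictly positive, otherwise
    the logarithms are undefined. The meaning attached to each symbol is an
    assumption of this sketch.
    """
    for name, value in (("f_mut", f_mut), ("f_sil", f_sil),
                        ("f_pam_mut", f_pam_mut), ("f_pam_sil", f_pam_sil)):
        if value <= 0:
            raise ValueError(f"{name} must be strictly positive")
    term_same = log2((f_mut * f_pam_mut) / (f_sil * f_pam_sil))
    term_cross = log2((f_mut * f_pam_sil) / (f_sil * f_pam_mut))
    return 0.5 * (term_same + term_cross)


# Invented frequencies, for illustration only.
score = function_score(f_mut=0.02, f_sil=0.08, f_pam_mut=0.30, f_pam_sil=0.25)
print(round(score, 3))
# With the formula exactly as written, the PAM terms cancel algebraically.
print(isclose(score, log2(0.02 / 0.08)))
```

Taken literally, the expression therefore reduces to log2(fmut/fsil); the PAM frequencies remain useful as a check that editing efficiency is comparable between the two conditions.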
Statistics
All statistical analyses were performed with GraphPad Prism analysis software.
Generation of the Polyclonal LIG4 Knock-Out HAP1 Model
The gRNA targeting this gene was selected according to its proximity to the AflIII restriction site, which is located at the Cas9 double-stranded cleavage site (
A comparison of editing frequencies between BRCA1/2 variants and silent control SNV can be used for functional classification.
In HAP1 cells, BRCA1 and BRCA2 are essential genes. Genomic editing to create a pathogenic mutation of these genes thus leads to cell death, facilitating the screening of edited cells. Moreover, edited cells with insertions or deletions (hence generating a shift of the reading frame) instead of the mutation of interest also die, due to the essential nature of the gene concerned.
We checked that the absence of a mutation following NGS sequencing was due to the pathogenicity of the mutation rather than a problem linked to genomic editing, by simultaneously performing a second transfection, with the same gRNA, but the insertion of a silent mutation already classified as benign in databases where possible (
We then tested our method by using it to characterize 10 variants of BRCA1 and BRCA2 already classified as benign or pathogenic in databases.
We also evaluated the indel frequency to estimate the editing efficiency in the two conditions compared for each mutation. Indeed, this indel frequency was identical for both conditions when analyzed for the 8nt surrounding the PAM sequence (
These results confirm that the Cas9 protein cleaves the DNA 3nt upstream from the PAM sequence. Moreover, the observed linear regression made it possible to evaluate genomic editing efficiency and to compare the two conditions with the same gRNA. Following NGS sequencing, the coverage of the mutation of interest (patient or silent control) and the reference control mutation was also checked and shown to be similar in the two conditions (
Functional Characterization of BRCA1/2 Variants of Unknown Significance
Variants of BRCA1 and BRCA2 were initially selected after characterization in our laboratory, on the basis of an absence of annotation concerning their function. We then also studied other mutations classified as VUS or unreported in the databases (such as the UMD database, Clinvar and BRCA exchange) (Table 1). The 20 BRCA1 and 3 BRCA2 variants affected different domains of the proteins and were distributed along the entire length of these genes (
We classified 20 of these variants as neutral, six as pathogenic and two as intermediate. These results were consistent with the saturation genome editing study of the RING and BRCT domains of BRCA1 (p.Ile31Asn, p.Ala1752Thr, p.Ala1752Pro, p.Gly1770Val and p.Pro1812Ala) provided in Findlay (“Accurate classification of BRCA1 variants with saturation genome editing”; Nature; 2018).
Four of the six variants we classified as pathogenic concerned the BRCT domain of BRCA1; the other two were previously unreported nonsense mutations. The results were more surprising for the p.Gln210= variant of BRCA1, which was also found pathogenic. However, this silent mutation may create or strengthen a splice site according to databases. One of the two intermediate variants, c.5194-2A>G, has already been reported to affect splicing and may also be pathogenic. Its classification as functionally intermediate might reflect the existence of a large number of BRCA1 splicing variants. The second intermediate variant, p.Leu1080=, is located in the middle of exon 11 of BRCA1. This synonymous variant has been reported in databases as having a likelihood of resulting in a splicing alteration according to bioinformatic analyses. However, our intermediate function score (−1.415) is consistent with the finding of the ESE finder tool, a bioinformatic tool used to identify exonic splicing enhancers, which predicted that this variant might create an SRp40 ESE site (see
A three-week period to determine the functional impact of a variant is compatible with clinical management and is one of the main advantages of our protocol (
We have further tested our protocol with the Cpf1 endonuclease. Similar results were obtained for the variant studied (p.Tyr422X from BRCA1, a pathogenic variant), implying a stable function score.
We have also extended the protocol to another polyclonal cell line, knock-out for the XRCC4 gene, which is also implicated in the NHEJ pathway. The same function score was obtained for the mutations (p.Ile1275Val, p.Pro1812Ala, p.Tyr422X, p.Gln210=) analyzed in this line, including the p.Gln210= variant of BRCA1.
The method presented here proved effective for the characterization of the functional impact of BRCA1 and BRCA2 VUS. More importantly, it can be used to obtain the necessary biological evidence of VUS function required for the prescription of targeted treatment within three weeks, which is compatible with clinical use. The patient carrying the genomic abnormality therefore benefits from an analysis of his or her own mutation, with potential consequences for relatives. This is particularly important for extremely rare somatic variants, which resemble orphan diseases.
Extension of the Experimental Process to the Functional Evaluation of POLE Variants
We then extended our protocol to the characterization of VUS from other tumor suppressor genes that were also essential in our model. We chose to study variants of the POLE gene because of the potential interest of their functional impact for determining access to immunotherapy. We therefore selected seven POLE mutations from databases, including two classified as benign and two classified as pathogenic (
This supports the extension of the application of our method to the characterization of all essential tumor suppressor genes.
Number | Date | Country | Kind |
---|---|---|---|
20305476.2 | May 2020 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2021/062697 | 5/12/2021 | WO |