The present invention relates to a composition of primers, a kit and a method for detecting high grade squamous intraepithelial lesion (HSIL) and/or for typing a Human Papillomavirus (HPV).
Human papillomavirus (HPV) infections are associated with the development of cervical carcinoma, one of the most common cancers among women, and other cancers like anal cancer (Lin C et al. Human papillomavirus types from infection to cancer in the anus, according to sex and HIV status: a systematic review and meta-analysis. Lancet Infect Dis, 2018, 18:198-206) and head and neck cancer (Chaturvedi A K, et al. Human papillomavirus and rising oropharyngeal cancer incidence in the United States. J Clin Oncol Off J Am Soc Clin Oncol, 2011, 29:4294-301). HPV are the etiologic agents responsible for over 99% of all cervical cancers (Walboomers J M, et al. Human papillomavirus is a necessary cause of invasive cervical cancer worldwide. J Pathol, 1999, 189:12-9). HPV are small, non-enveloped DNA viruses commonly transmitted through sexual contact, which infect basal cells and replicate in the nucleus of squamous epithelial cells. HPV include more than 200 genotypes characterized by their oncogenic potential, with highly oncogenic HPV types (high-risk HPV) having a unique ability to drive cell proliferation (Schiffman M, et al. Carcinogenic human papillomavirus infection. Nat Rev Dis Primer, 2016, 2:16086).
The genomic organization of papillomaviruses is divided into functional early and late regions. The model of HPV infection, which is mainly derived from knowledge on HPV16, is that following the infection of basal cells in the cervical epithelium, the early HPV genes (E6, E7, E1, E2, E4 and E5) are expressed and the viral DNA replicates from the episomal form of the viral DNA. As the cells divide, in the upper layers of the epithelium the viral genome is replicated further, and the late genes (L1 and L2) and E4 are expressed. Viral shedding then further initiates new infections (Woodman C B J, et al. The natural history of cervical HPV infection: unresolved issues. Nat Rev Cancer, 2007, 7:11-22).
HPV infection during the development of cervical cancer is associated with a shift from productive infection (which in most of the cases will be cleared by the immune system), towards non-productive persistent and transforming infection (in a minority of cases) characterized in particular by a high level of E6 and E7 mRNAs and low expression of E2 and late genes such as L1 (Doorbar J, et al. The biology and life-cycle of human papillomaviruses. Vaccine, 2012, 30 Suppl 5:F55-70, Shulzhenko N, et al. Ménage à trois: an evolutionary interplay between human papillomavirus, a tumor, and a woman. Trends Microbiol, 2014, 22:345-53). High-risk HPV infection may result in low-grade lesions, with highly productive infection and high rate of spontaneous regression. In contrast, high-risk persistent HPV infection is responsible for high-grade lesion, the true precancerous lesion.
Cervical cancer screening allows detection and treatment of precancerous lesions before the development of cervical cancer. Screening is based on different algorithms, some allowing detection of HPV, and others identifying abnormal cells. Despite the role of high-risk HPV in cervical cancer, screening tests of cancer or precancerous lesions remain in many countries mainly based on the Papanicolaou (Pap) cytology test and do not include molecular virology tests (Schiffman M, et al. 2016). This is largely due to the low Positive Predictive Value (PPV) of current molecular tests. Indeed, because most of the current molecular diagnostic methods rely on the detection of HPV genome (DNA) and do not address the patterns of viral expression (RNA), they remain weak predictors of the evolution from low-grade squamous intraepithelial lesion (LSIL) to high-grade squamous intraepithelial lesion (HSIL) of the cervix (Tornesello M L, et al. Viral and cellular biomarkers in the diagnosis of cervical intraepithelial neoplasia and cancer. BioMed Res Int, 2013, 2013:519619). In addition, DNA identification of high-risk HPV is not fully predictive of cancer since only persistence for years of high-risk HPV is associated with an increased risk of cancer development (Schiffman M, et al. 2016). Thus, the use of HPV DNA tests, as a screening assay, is currently increasing worldwide and shows high sensitivity (Ogilvie G S, et al. Effect of Screening With Primary Cervical HPV Testing vs Cytology Testing on High-grade Cervical Intraepithelial Neoplasia at 48 Months: The HPV FOCAL Randomized Clinical Trial. JAMA, 2018, 320:43-52) but low PPV for HSIL detection (Cuzick J, et al. Comparing the performance of six human papillomavirus tests in a screening population. Br J Cancer, 2013, 108:908-13).
HPV RNA tests and in particular expression of E6 and E7 mRNAs of high-risk HPV have been proposed as better molecular markers of cancer development, but E6 and E7 are also expressed during HPV transient infection so it remains difficult to define a threshold of expression associated with the persistence and evolution to high-grade lesions and cancer. There is no consensus that HPV RNA tests have a better diagnostic accuracy compared to HPV DNA tests and cytology for the detection of cervical precancerous lesions (Virtanen E, et al. Performance of mRNA- and DNA-based high-risk human papillomavirus assays in detection of high-grade cervical lesions. Acta Obstet Gynecol Scand, 2017, 96:61-8, Cook D A, et al. Aptima HPV Assay versus Hybrid Capture® 2 HPV test for primary cervical cancer screening in the HPV FOCAL trial. J Clin Virol Off Publ Pan Am Soc Clin Virol, 2017, 87:23-9, Ge Y et al. Aptima Human Papillomavirus E6/E7 mRNA Test Results Strongly Associated With Risk for High-Grade Cervical Lesions in Follow-Up Biopsies. J Low Genit Tract Dis, 2018, 22:195-200). There is therefore a need for a novel generation of molecular diagnostic tests that can not only detect HPV infection, but also have the ability to accurately predict precancerous stages to offer a better and cost-saving medical benefit (de Thurah L, et al. Concordant testing results between various human papillomavirus assays in primary cervical cancer screening: systematic review. Clin Microbiol Infect Off Publ Eur Soc Clin Microbiol Infect Dis, 2018, 24:29-36, Hawkes D, et al. Not all HPV nucleic acid tests are equal: only those calibrated to detect high grade lesions matter for cervical screening. Clin Microbiol Infect Off Publ Eur Soc Clin Microbiol Infect Dis, 2018, 24:436-7, de Thurah L, et al. Not all HPV nucleic acid tests are equal: only those calibrated to detect high grade lesions matter for cervical screening: Response to “Concordant testing results between various human papillomavirus assays in primary cervical cancer screening: systematic review” Published 27 May, 2017. Clin Microbiol Infect Off Publ Eur Soc Clin Microbiol Infect Dis, 2018, 24:438-9).
Now, taking advantage of Next-Generation Sequencing (NGS) technologies, the inventors have developed a multiplexed amplification system targeting the virus splice junctions, coupled with NGS analysis, that makes it possible to describe, in a single reaction, the fine equilibrium among transcript species of 13 high-risk HPV (HPV16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66) plus 3 putative high-risk HPV (HPV68, 73, 82). This molecular approach makes it possible, in particular, to take a snapshot of the early vs late populations of HPV transcripts and to define a model based on a combination of reads that reflects the biology of the virus, which can then be correlated to the evolution of lesions. The ultimate goal is to replace the conventional methods for the triage of women at risk of transforming infection before colposcopy.
Based on a study conducted on 55 patients, starting from cervical smears conserved at room temperature, the inventors have shown that the method of the invention can be used as a marker of high-grade cytology, with encouraging diagnostic performance as a triage test.
A subject of the present invention is therefore a composition of primers for detecting high grade squamous intraepithelial lesion (HSIL) comprising a first set of primers, called splice junctions set of primers, which comprises:
The aim of the invention is notably to lower the number of primers used in the multiplex system. This reduction may be achieved by lowering the number of targeted splice junctions and by using redundant nucleic acid sequences.
Thus the 362 nucleic acid sequences of the primers of the splice junctions set of primers are redundant and in fact represent 165 unique nucleic acid sequences.
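As an illustration of how redundancy among primer sequences reduces the number of distinct oligonucleotides, the following minimal sketch collapses a list of primer sequences to its unique members. The function name and the toy sequences are hypothetical and are not taken from the sequence listing; the comparison is assumed to be case-insensitive and exact.

```python
def unique_sequences(primers):
    """Return the distinct primer sequences, keeping first-seen order.

    Sequences are compared case-insensitively; this is an assumption made
    for the sketch, not a rule stated in the description.
    """
    seen = {}
    for seq in primers:
        seen.setdefault(seq.upper(), None)  # dict keys preserve insertion order
    return list(seen)


# Toy, hypothetical primers: four entries, two of which are distinct.
primers = ["ACGTACGTACGT", "acgtacgtacgt", "TTGACCATGGCA", "ACGTACGTACGT"]
unique = unique_sequences(primers)
print(len(primers), "sequences,", len(unique), "unique")
```

Applied to a full primer list, the same counting would show how many distinct sequences remain once shared primers are counted a single time.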
The present invention also relates to a composition of primers for detecting HSIL comprising a first set of pairs of primers, called splice junctions set of primers, which comprises:
Definitions
High-risk HPV also called HR-HPV herein refer to the HPV of the following types: HPV16, HPV18, HPV31, HPV33, HPV35, HPV39, HPV45, HPV51, HPV52, HPV56, HPV58, HPV59 and HPV66.
Putative high risk HPV herein refer to the HPV of the following types: HPV68, HPV73 and HPV82.
HSIL refers to high grade squamous intraepithelial lesion. HSIL may be cervical, anogenital, or head and neck HSIL. Preferably, HSIL is cervical HSIL.
LSIL refers to low grade squamous intraepithelial lesion. LSIL may be cervical, anogenital, or head and neck LSIL. Preferably, LSIL is cervical LSIL.
Splice junctions set of primers refer herein to a set of primers which target high risk and optionally putative high risk HPV splice events involving a pair of splice donor (SD) and splice acceptor (SA) sites.
Unsplice junctions set of primers refer herein to a set of primers which target high risk and optionally putative high risk HPV genomic regions spanning either splice donor or splice acceptor sites in the absence of any splice event. In this context, the term “junction” refers to exon-intron interface (i.e. the position where a donor or acceptor site would be found in case of a splice event).
Genomic set of primers refer herein to a set of primers which target high risk and optionally putative high risk HPV genomic regions away from any splice donor or splice acceptor sites.
Fusion set of primers refer herein to a set of primers which target high risk and optionally putative high risk HPV fusion transcripts.
Human set of primers refer herein to a set of primers which target human sequences.
HPV RNA Seq refers herein to a multiplexed amplification system coupled with Next Generation Sequencing analysis.
The expression “the nucleic acid sequence selected from the group consisting of SEQ ID NO: x-x+1 to SEQ ID NO: x+n-x+n+1” means “the nucleic acid sequence selected from the group consisting of SEQ ID NO: x-x+1, SEQ ID NO: x+2-x+3, SEQ ID NO: x+4-x+5 . . . and SEQ ID NO: x+n-x+n+1”. For example, the nucleic acid sequence selected from the group consisting of SEQ ID NO: 1-2 to SEQ ID NO: 5-6 means the nucleic acid sequence selected from the group consisting of SEQ ID NO: 1-2 (pair of primers SEQ ID NO: 1 and SEQ ID NO: 2), SEQ ID NO: 3-4 (pair of primers SEQ ID NO: 3 and SEQ ID NO: 4) and SEQ ID NO: 5-6 (pair of primers SEQ ID NO: 5 and SEQ ID NO: 6).
The expression “the nucleic acid sequence selected from the group consisting of SEQ ID NO: x to SEQ ID NO: x+n” means “the nucleic acid sequence selected from the group consisting of SEQ ID NO: x, SEQ ID NO: x+1, SEQ ID NO: x+2 . . . and SEQ ID NO: x+n”. For example, the nucleic acid sequence selected from the group consisting of SEQ ID NO: 1 to SEQ ID NO: 6 means the nucleic acid sequence selected from the group consisting of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5 and SEQ ID NO: 6.
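The two range conventions defined above can be sketched in code as follows. The function names are hypothetical; the sketch only restates the definitions, assuming that a pair range starts on the first member of a pair and ends on the second member of the last pair.

```python
def expand_pairs(first, last):
    """Expand 'SEQ ID NO: first-(first+1) to SEQ ID NO: (last-1)-last'.

    Returns a list of (forward, reverse) SEQ ID NO pairs.
    """
    if first < 1 or last <= first or (last - first) % 2 == 0:
        raise ValueError("range must cover a whole number of pairs")
    return [(n, n + 1) for n in range(first, last, 2)]


def expand_single(first, last):
    """Expand 'SEQ ID NO: first to SEQ ID NO: last' into individual numbers."""
    if first < 1 or last < first:
        raise ValueError("invalid SEQ ID NO range")
    return list(range(first, last + 1))


print(expand_pairs(1, 6))    # pairs 1-2, 3-4 and 5-6
print(expand_single(1, 6))   # SEQ ID NO: 1 through 6
```

Under this reading, a range such as SEQ ID NO: 1441-1442 to SEQ ID NO: 1499-1500 expands to 30 pairs of primers.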
Biological samples as referred herein include, without limitation, mammalian bodily fluids, especially oral fluids or scrapings, genital scrapings, in particular cervix scrapings.
Primers and amplicons encompassed by the invention are not limited to the sequences defined in the primers and amplicons depicted below. Primers and amplicons may encompass sequences having at least 95% identity with the primers and amplicons defined below. Primers can also comprise extra bases at the 5′ end. Also, primers shall be understood as embracing shorter sequences of at least 12, 15, 20 or 25 consecutive bases of the primers featured below. In some embodiments, it shall be understood that the invention also contemplates generic probes which have the sequences of the primers depicted herein and which are directly or indirectly labeled. The probes and primers can be extended or shifted by 1 to 15 bases depending on the desired specificity of the PCR amplification step and/or on the specificity of the detection step, using standard parameters such as the nucleic acid size and GC content, stringent hybridization conditions and reaction temperatures. For example, low stringency conditions are used when it is desired to obtain broad positive results on a range of homologous targets whereas high stringency conditions are preferred to obtain positive results only if the specific target nucleic acid is present in the sample.
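By way of illustration only, the identity threshold and the minimum fragment length mentioned above can be checked as sketched below. The sketch assumes an ungapped, position-by-position comparison of two sequences of equal length, which is a simplification; in practice identity is usually computed from an alignment. The function names and the toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def is_fragment(candidate, reference, min_len=12):
    """True if candidate is a run of at least min_len consecutive bases of reference."""
    c = candidate.upper()
    return len(c) >= min_len and c in reference.upper()


reference = "ACGTACGTACGTACGTACGT"   # toy 20-mer, not from the sequence listing
variant = "ACGTACGTACGTACGTACGA"     # differs at the last position only

print(percent_identity(reference, variant))    # 19 of 20 positions match
print(is_fragment("ACGTACGTACGT", reference))  # 12 consecutive bases
```

With a 20-base primer, a single mismatch is the most that keeps the identity at or above 95% under this simplified measure.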
As used herein, the term “stringent hybridization conditions” refers to conditions under which the primer or probe will hybridize only to its exactly complementary target(s). The hybridization conditions affect the stability of hybrids, e.g., temperature, salt concentration, pH, formamide concentration and the like. These conditions are optimized to maximize specific binding and minimize non-specific binding of primer or probe to its target nucleic acid sequence. Stringent conditions are sequence dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequences at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe or primer. Typically, stringent conditions will be those in which the salt concentration is less than about 1.0 M Na+, typically about 0.01 to 1.0 M Na+ concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes or primers (e.g. 10 to 50 nucleotides) and at least about 60° C. for long probes or primers (e.g. greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization with a buffer solution of 20-30% formamide, 1 M NaCl, 1% SDS at 37° C. and a wash in 2×SSC at 40° C. Exemplary high stringency conditions include hybridization in 40-50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1×SSC at 60° C. Determination of particular hybridization conditions relating to a specified nucleic acid is routine and is well known in the art, for instance, as described in J. Sambrook and D. W. Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press; 3rd Ed., 2001; and F. M. Ausubel, Ed., Short Protocols in Molecular Biology, Current Protocols; 5th Ed., 2002.
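The rule that stringent conditions sit about 5° C. below the Tm can be illustrated with a rough estimate. The sketch below uses the Wallace rule (2° C. per A or T, 4° C. per G or C), a common rule of thumb for short oligonucleotides that is not taken from this description and that ignores ionic strength and pH; the function names and the toy sequence are hypothetical.

```python
def wallace_tm(seq):
    """Rough Tm estimate in degrees Celsius for a short oligonucleotide.

    Wallace rule: Tm = 2 * (A + T) + 4 * (G + C). This approximation is an
    assumption of the sketch, not the method defined in the description.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if not s or at + gc != len(s):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


def stringent_temperature(seq, offset=5):
    """Temperature about `offset` degrees below the estimated Tm."""
    return wallace_tm(seq) - offset


primer = "ACGTACGTACGT"  # toy 12-mer with 6 A/T and 6 G/C bases
print(wallace_tm(primer), stringent_temperature(primer))
```

For real primers, the Tm would normally be determined under the defined salt and pH conditions, for instance with a nearest-neighbor model, rather than with this approximation.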
As used herein, the term “sequencing” is used in a broad sense and refers to any technique known by the skilled person including but not limited to Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, capillary electrophoresis, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing (MPSS), sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD sequencing, MS-PET sequencing, mass spectrometry, and a combination thereof. In specific embodiments, the method and kit of the invention are adapted to run on an ABI PRISM® 377 DNA Sequencer, an ABI PRISM® 310, 3100, 3100-Avant, 3730, or 3730xl Genetic Analyzer, an ABI PRISM® 3700 DNA Analyzer, or an Applied Biosystems SOLiD™ System (all from Applied Biosystems), or a Genome Sequencer 20 System (Roche Applied Science).
For all technologies described herein, although the said primers can be used in solution, in another embodiment the said primers are linked to a solid support.
To permit its covalent coupling to the support, the primer is generally functionalized. Thus, it may be modified by a thiol, amine or carboxyl terminal group at the 5′ or 3′ position. In particular, the addition of a thiol, amine or carboxyl group makes it possible, for example, to couple the oligonucleotide to a support bearing disulphide, maleimide, amine, carboxyl, ester, epoxide, cyanogen bromide or aldehyde functions. These couplings form by establishment of disulphide, thioether, ester, amide or amine links between the primer and the support. Any other method known to a person skilled in the art may be used, such as bifunctional coupling reagents, for example.
Moreover, to improve the hybridization with the coupled oligonucleotide, it can be advantageous for the oligonucleotide to contain an “arm” and a “spacer” sequence of bases. The use of an arm makes it possible, in effect, to bind the primer at a chosen distance from the support, enabling its conditions of interaction with the DNA to be improved. The arm advantageously consists of a linear carbon chain, comprising 1 to 18 and preferably 6 or 12 (CH2) groups, and an amine which permits binding to the column. The arm is linked to a phosphate of the oligonucleotide or of a “spacer” composed of bases which do not interfere with the hybridization. Thus, the “spacer” can comprise purine bases. As an example, the “spacer” can comprise the sequence GAGG. The arm is advantageously composed of a linear carbon chain comprising 6 or 12 carbon atoms.
For implementation of the present invention, different types of support may be used. These can be functionalized chromatographic supports, in bulk or prepacked in a column, functionalized plastic surfaces or functionalized latex beads, magnetic or otherwise. Chromatographic supports are preferably used. As an example, the chromatographic supports capable of being used are agarose, acrylamide or dextran as well as their derivatives (such as Sephadex, Sepharose, Superose, etc.), polymers such as poly(styrene/divinylbenzene), or grafted or ungrafted silica, for example. The chromatography columns can operate in the diffusion or perfusion mode.
Composition of Primers for Detecting High Grade Squamous Intraepithelial Lesion
The present invention relates to a composition of primers for detecting HSIL comprising a first set of primers, called splice junctions set of primers, which comprises:
The splice junctions set of primers may further comprise:
These additional subsets of pairs of primers correspond to the putative high risk HPV: HPV68, HPV73 and HPV82.
The composition of primers for detecting HSIL may comprise the splice junctions set comprising at least 10 pairs of primers of each of the first to the thirteenth subsets of pairs of primers of the splice junctions set of primers as defined above and optionally at least 10 pairs of primers of the fourteenth and/or the fifteenth and/or the sixteenth subsets of pairs of primers of the splice junctions set of primers as defined above.
The composition of primers for detecting HSIL may comprise the splice junctions set comprising 2, 3, 4, 5, 6, 7, 8, 9 or more preferably 10 pairs of primers of each of the first to the thirteenth subsets of pairs of primers of the splice junctions set of primers as defined above and optionally 2, 3, 4, 5, 6, 7, 8, 9 or more preferably 10 pairs of primers of the fourteenth and/or the fifteenth and/or the sixteenth subsets of pairs of primers of the splice junctions set of primers as defined above.
The composition of primers for detecting HSIL may comprise the splice junctions set consisting of 2, 3, 4, 5, 6, 7, 8, 9 or more preferably 10 pairs of primers of each of the first to the thirteenth subsets of pairs of primers of the splice junctions set of primers as defined above and optionally of 2, 3, 4, 5, 6, 7, 8, 9 or more preferably 10 pairs of primers of the fourteenth and/or the fifteenth and/or the sixteenth subsets of pairs of primers of the splice junctions set of primers as defined above.
In one embodiment, the composition of primers for detecting HSIL according to the invention comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 1-2 to SEQ ID NO: 361-362 and optionally of SEQ ID NO: 363-364 to SEQ ID NO: 381-382 and/or SEQ ID NO: 383-384 to SEQ ID NO: 385-386 and/or SEQ ID NO: 387-388 to SEQ ID NO: 429-430.
In a preferred embodiment, the composition of primers for detecting HSIL according to the invention comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 1-2 to SEQ ID NO: 429-430.
In one embodiment, the composition of primers for detecting HSIL consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 1-2 to SEQ ID NO: 361-362 and optionally of SEQ ID NO: 363-364 to SEQ ID NO: 381-382 and/or SEQ ID NO: 383-384 to SEQ ID NO: 385-386 and/or SEQ ID NO: 387-388 to SEQ ID NO: 429-430.
In a preferred embodiment, the composition of primers for detecting HSIL according to the invention consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 1-2 to SEQ ID NO: 429-430.
The splice junctions set of primers of the invention may be defined by the nucleic acid sequence of the pairs of primers that compose it as defined above or by the nucleic acid sequence of the amplicons which are produced by the pairs of primers that compose it as defined below.
The pairs of primers that compose splice junctions set of primers as defined above correspond to the amplicons which are produced by the pairs of primers that compose splice junctions set of primers as defined below. The correspondence between the pairs of primers and their corresponding amplicons is given in table 2Abis.
The present invention also relates to a composition of primers for detecting HSIL comprising a first set of pairs of primers, called splice junctions set of primers, which comprises:
The splice junctions set of primers may further comprise:
The composition of primers for detecting HSIL may comprise the splice junctions set of primers comprising at least 10 pairs of primers of each of the first to the thirteenth subsets of pairs of primers of the splice junctions set of primers as defined above and optionally at least 10 pairs of primers of the fourteenth and/or the fifteenth and/or the sixteenth subsets of pairs of primers of the splice junctions set of primers as defined above.
The composition of primers for detecting HSIL may comprise the splice junctions set of primers comprising 2, 3, 4, 5, 6, 7, 8, 9 or more preferably 10 pairs of primers of each of the first to the thirteenth subsets of pairs of primers of the splice junctions set of primers as defined above and optionally 2, 3, 4, 5, 6, 7, 8, 9 or more preferably 10 pairs of primers of the fourteenth and/or the fifteenth and/or the sixteenth subsets of pairs of primers of the splice junctions set of primers as defined above.
The composition of primers for detecting HSIL may comprise the splice junctions set of primers consisting of 2, 3, 4, 5, 6, 7, 8, 9 or more preferably 10 pairs of primers of each of the first to the thirteenth subsets of pairs of primers of the splice junctions set of primers as defined above and optionally of 2, 3, 4, 5, 6, 7, 8, 9 or more preferably 10 pairs of primers of the fourteenth and/or the fifteenth and/or the sixteenth subsets of pairs of primers of the splice junctions set of primers as defined above.
In one embodiment, the composition of primers for detecting HSIL according to the invention comprises the splice junctions set of primers comprising the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1501 to SEQ ID NO: 1681 and optionally of SEQ ID NO: 1682 to SEQ ID NO: 1691 and/or SEQ ID NO: 1692 to SEQ ID NO: 1704 and/or SEQ ID NO: 1705 to SEQ ID NO: 1715.
In a preferred embodiment, the composition of primers for detecting HSIL according to the invention comprises the splice junction set of primers comprising the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1501 to SEQ ID NO: 1715.
In one embodiment, the composition of primers for detecting HSIL consists of the splice junctions set of primers consisting of the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1501 to SEQ ID NO: 1681 and optionally of SEQ ID NO: 1682 to SEQ ID NO: 1691 and/or SEQ ID NO: 1692 to SEQ ID NO: 1704 and/or SEQ ID NO: 1705 to SEQ ID NO: 1715.
In a preferred embodiment, the composition of primers for detecting HSIL according to the invention consists of the splice junction set of primers consisting of the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1501 to SEQ ID NO: 1715.
In one embodiment, the composition of primers for detecting HSIL does not comprise an additional set of primers selected from the group consisting of an unsplice junctions set of primers, a genomic set of primers and a fusion set of primers. In particular, the composition of primers for detecting HSIL may not comprise an unsplice junctions set of primers, a genomic set of primers and a fusion set of primers.
In one embodiment, the composition of primers for detecting HSIL comprises a human set of primers. The primers of the human set of primers target human sequences.
The human set of primers may be used as an internal control.
The human set of primers may comprise at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29 or at least 30, or 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 pairs of primers having the nucleic acid sequence selected from the group consisting of SEQ ID NO: 1441-1442, SEQ ID NO: 1443-1444, SEQ ID NO: 1445-1446, SEQ ID NO: 1447-1448, SEQ ID NO: 1449-1450, SEQ ID NO: 1451-1452, SEQ ID NO: 1453-1454, SEQ ID NO: 1455-1456, SEQ ID NO: 1457-1458, SEQ ID NO: 1459-1460, SEQ ID NO: 1461-1462, SEQ ID NO: 1463-1464, SEQ ID NO: 1465-1466, SEQ ID NO: 1467-1468, SEQ ID NO: 1469-1470, SEQ ID NO: 1471-1472, SEQ ID NO: 1473-1474, SEQ ID NO: 1475-1476, SEQ ID NO: 1477-1478, SEQ ID NO: 1479-1480, SEQ ID NO: 1481-1482, SEQ ID NO: 1483-1484, SEQ ID NO: 1485-1486, SEQ ID NO: 1487-1488, SEQ ID NO: 1489-1490, SEQ ID NO: 1491-1492, SEQ ID NO: 1493-1494, SEQ ID NO: 1495-1496, SEQ ID NO: 1497-1498 and SEQ ID NO: 1499-1500.
In one preferred embodiment, the human set of primers comprises SEQ ID NO: 1441-1442 to SEQ ID NO: 1499-1500.
In one more preferred embodiment, the human set of primers consists of SEQ ID NO: 1441-1442 to SEQ ID NO: 1499-1500.
The human set of primers may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29 or at least 30, or 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 pairs of primers able to produce the amplicons having the nucleic acid sequence selected from the group consisting of SEQ ID NO: 2221 to SEQ ID NO: 2250.
In one preferred embodiment, the human set of primers comprises pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 2221 to SEQ ID NO: 2250.
In one more preferred embodiment, the human set of primers consists of pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 2221 to SEQ ID NO: 2250.
In one embodiment, the composition of primers for detecting HSIL may also comprise a fusion set of primers. The primers of the fusion set of primers target high risk and optionally putative high risk HPV fusion transcripts.
The fusion set of primers may comprise:
The fusion set of primers may further comprise:
In one embodiment, the fusion set of primers comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 865-866 to SEQ ID NO: 1331-1332 and optionally of SEQ ID NO: 1333-1334 to SEQ ID NO: 1367-1368 and/or SEQ ID NO: 1369-1370 to SEQ ID NO: 1403-1404 and/or SEQ ID NO: 1405-1406 to SEQ ID NO: 1439-1440.
In a preferred embodiment, the fusion set of primers comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 865-866 to SEQ ID NO: 1439-1440.
In one embodiment, the fusion set of primers consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 865-866 to SEQ ID NO: 1331-1332 and optionally of SEQ ID NO: 1333-1334 to SEQ ID NO: 1367-1368 and/or SEQ ID NO: 1369-1370 to SEQ ID NO: 1403-1404 and/or SEQ ID NO: 1405-1406 to SEQ ID NO: 1439-1440.
In a preferred embodiment, the fusion set of primers consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 865-866 to SEQ ID NO: 1439-1440.
The fusion set of primers may comprise:
The fusion set may also comprise:
In one embodiment, the fusion set of primers comprises the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1933 to SEQ ID NO: 2166 and optionally of SEQ ID NO: 2167 to SEQ ID NO: 2184 and/or SEQ ID NO: 2185 to SEQ ID NO: 2202 and/or SEQ ID NO: 2203 to SEQ ID NO: 2220.
In a preferred embodiment, the fusion set of primers comprises the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1869 to SEQ ID NO: 2220.
In one embodiment, the fusion set of primers consists of the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1933 to SEQ ID NO: 2166 and optionally of SEQ ID NO: 2167 to SEQ ID NO: 2184 and/or SEQ ID NO: 2185 to SEQ ID NO: 2202 and/or SEQ ID NO: 2203 to SEQ ID NO: 2220.
In a preferred embodiment, the fusion set of primers consists of the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1869 to SEQ ID NO: 2220.
The present invention also relates to a kit for detecting HSIL comprising the composition of primers for detecting HSIL of the invention and optionally reagents for a cDNA amplification. Reagents available for this purpose are well-known in the art and include DNA polymerases, buffers for the enzymes, detergents and enhancing agents. The kit of the invention may also comprise reagents for reverse transcription and/or for sequencing. In some preferred embodiments of the kit of the invention, the primers and optional reagents are in lyophilised form to allow ambient storage. The components of the kit are packaged together into any of the various containers suitable for nucleic acid amplification such as plates, slides, wells, dishes, beads, particles, cups, strands, chips, strips and others. The kit optionally includes instructions for performing at least one specific embodiment of the method of the invention. In some advantageous embodiments, the kit comprises micro-well plates or microtubes, preferably in a dried format, i.e., wherein the wells of the plates or microtubes comprise a dried composition containing at least the primers, and preferably further comprising all the reagents for the reverse transcription, cDNA amplification or sequencing.
The present invention also relates to the use of the composition of primers for detecting HSIL of the invention or of the kit for detecting HSIL of the invention.
An In Vitro Method for Detecting HSIL in a Biological Sample
The present invention also relates to an in vitro method for detecting HSIL in a biological sample comprising the steps of:
Preferably, the step (d) of quantifying the expression level of each amplicon is carried out by sequencing.
The step (d) of quantifying the expression level of each amplicon may comprise the steps of:
The quantification of the expression level of each amplicon may be carried out using a partial digestion of the amplicons. In that case, the step (d) of quantifying the expression level of each amplicon may comprise the steps of:
In a preferred embodiment, the step of determining if the biological sample comprises HSIL comprises a step of determining if the biological sample comprises HSIL corresponding to one HPV type defined herein, based on the expression level of the amplicons quantified in step (d) that are specific of said HPV type. In this embodiment, the step of determining if the biological sample comprises HSIL is carried out for each HPV type. Thus, for each HPV type, the expression level of the amplicons corresponding to this HPV type is analyzed and it is determined if the biological sample comprises a HSIL corresponding to this HPV type. If it is determined that the biological sample comprises a HSIL corresponding to at least one HPV type, then the biological sample is classified as comprising HSIL.
Preferably, the step of determining if the biological sample comprises HSIL is carried out by using a logistic regression analysis wherein the variables depend on the quantified level of expression of the amplicons.
Thus, the step of determining if the biological sample comprises HSIL may comprise:
with:
wherein if one pHPVj is higher than 0.5, it is indicative of the presence of a HPVj HSIL in the biological sample.
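The per-type decision rule described above can be sketched as follows. This is a minimal illustration only, assuming that the probabilities pHPVj have already been computed by the logistic model for each HPV type; the function name `classify_sample` and the example probabilities are hypothetical and are not part of the original disclosure.

```python
from typing import Dict, List, Tuple

# Cut-off stated in the text: a pHPVj higher than 0.5 indicates an HPVj HSIL.
HSIL_THRESHOLD = 0.5


def classify_sample(p_by_type: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Return (sample comprises HSIL, HPV types whose probability exceeds the cut-off)."""
    positive_types = sorted(t for t, p in p_by_type.items() if p > HSIL_THRESHOLD)
    # The sample is classified as comprising HSIL if at least one type is positive.
    return bool(positive_types), positive_types


# Hypothetical probabilities for a sample infected with two HPV types.
is_hsil, types = classify_sample({"HPV16": 0.82, "HPV31": 0.12})
print(is_hsil, types)
```

Keeping one probability per HPV type, rather than a single pooled score, makes it possible to report which type is associated with the lesion.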
In a preferred embodiment, the amplicons corresponding to the splice junction i=1 are respectively:
The in vitro method for detecting HSIL may comprise a step of treatment of the biological sample with a solution comprising 30-60 wt % of methanol and 40-70 wt % of water such as PreservCyt.
Composition of Primers for HPV Typing
The present invention relates to a composition of primers for typing HPV selected from the group consisting of:
The present invention relates to a composition of primers for typing HPV comprising the set of primers selected from the group consisting of the splice junctions set of primers as defined above, a second set of primers, called unsplice junctions set of primers, a third set of primers, called genomic set of primers, and a fourth set of primers, called fusion set of primers.
The present invention also relates to a composition of primers for typing HPV comprising the splice junctions set of primers as defined above and an additional set of primers selected from the group consisting of a second set of primers, called unsplice junctions set of primers, a third set of primers, called genomic set of primers and a fourth set of primers, called fusion set of primers.
The method and composition of primers for typing HPV according to the invention provide results as good as the current gold standard test for HPV typing.
Moreover, the method and the composition of primers of the invention replace the current combination of cytology (Pap smear) and HPV molecular screening by a single molecular test for both the detection of high-risk or putative high-risk HPV and the triage of women at risk of transforming infection, before colposcopy. In particular, the splice junctions set of primers may be used for both detecting a HSIL lesion and typing HPV.
Preferably, the composition of primers for typing HPV of the invention comprises the splice junctions set of primers as defined above, an unsplice junctions set of primers, a genomic set of primers and a fusion set of primers.
Each of the unsplice junctions set of primers, the genomic set of primers and the fusion set of primers may comprise a subset of pairs of primers specific of each high risk HPV and optionally a subset of primers specific of each putative high risk HPV.
The composition of primers for typing HPV of the invention may also comprise an additional fifth set of primers, called human set of primers. The human set of primers is as defined above.
The primers of the unsplice junctions set of primers target high risk and optionally putative high risk HPV genomic regions spanning either splice donor or splice acceptor sites in the absence of any splice event.
The unsplice junctions set of primers may comprise:
The unsplice junctions set of primers may further comprise:
In one embodiment, the unsplice junctions set of primers comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 431-432 to SEQ ID NO: 679-680 and optionally of SEQ ID NO: 681-682 to SEQ ID NO: 697-698 and/or SEQ ID NO: 699-700 to SEQ ID NO: 717-718 and/or SEQ ID NO: 719-720 to SEQ ID NO: 735-736.
In a preferred embodiment, the unsplice junctions set of primers comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 431-432 to SEQ ID NO: 735-736.
In one embodiment, the unsplice junctions set of primers consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 431-432 to SEQ ID NO: 679-680 and optionally of SEQ ID NO: 681-682 to SEQ ID NO: 697-698 and/or SEQ ID NO: 699-700 to SEQ ID NO: 717-718 and/or SEQ ID NO: 719-720 to SEQ ID NO: 735-736.
In a preferred embodiment, the unsplice junctions set of primers consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 431-432 to SEQ ID NO: 735-736.
The unsplice junctions set of primers may comprise:
The unsplice junctions set of primers may further comprise:
In one embodiment, the unsplice junctions set of primers comprises the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1716 to SEQ ID NO: 1840 and optionally of SEQ ID NO: 1841 to SEQ ID NO: 1849 and/or SEQ ID NO: 1850 to SEQ ID NO: 1859 and/or SEQ ID NO: 1860 to SEQ ID NO: 1868.
In a preferred embodiment, the unsplice junctions set of primers comprises the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1716 to SEQ ID NO: 1868.
In one embodiment, the unsplice junctions set of primers consists of the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1716 to SEQ ID NO: 1840 and optionally of SEQ ID NO: 1841 to SEQ ID NO: 1849 and/or SEQ ID NO: 1850 to SEQ ID NO: 1859 and/or SEQ ID NO: 1860 to SEQ ID NO: 1868.
In a preferred embodiment, the unsplice junctions set of primers consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 1716 to SEQ ID NO: 1868.
The primers of the genomic set of primers target high risk and optionally putative high risk HPV genomic regions away from any splice donor or splice acceptor sites.
The genomic set of primers may comprise:
The genomic set of primers may further comprise:
In one embodiment, the genomic set of primers comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 737-738 to SEQ ID NO: 839-840 and optionally of SEQ ID NO: 841-842 to SEQ ID NO: 847-848 and/or SEQ ID NO: 849-850 to SEQ ID NO: 853-854 and SEQ ID NO: 855-856 and/or SEQ ID NO: 857-858 to SEQ ID NO: 863-864. In a preferred embodiment, the genomic set of primers comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 737-738 to SEQ ID NO: 863-864.
In one embodiment, the genomic set of primers consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 737-738 to SEQ ID NO: 839-840 and optionally of SEQ ID NO: 841-842 to SEQ ID NO: 847-848 and/or SEQ ID NO: 849-850 to SEQ ID NO: 853-854 and SEQ ID NO: 855-856 and/or SEQ ID NO: 857-858 to SEQ ID NO: 863-864. In a preferred embodiment, the genomic set of primers consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 737-738 to SEQ ID NO: 863-864.
The genomic set of primers may comprise:
The genomic set of primers may further comprise:
In one embodiment, the genomic set of primers comprises the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1869 to SEQ ID NO: 1920 and optionally of SEQ ID NO: 1921 to SEQ ID NO: 1924 and/or SEQ ID NO: 1925 to SEQ ID NO: 1928 and/or SEQ ID NO: 1929 to SEQ ID NO: 1932.
In a preferred embodiment, the genomic set of primers comprises the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1869 to SEQ ID NO: 1932.
In one embodiment, the genomic set of primers consists of the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1869 to SEQ ID NO: 1920 and optionally of SEQ ID NO: 1921 to SEQ ID NO: 1924 and/or SEQ ID NO: 1925 to SEQ ID NO: 1928 and/or SEQ ID NO: 1929 to SEQ ID NO: 1932.
In a preferred embodiment, the genomic set of primers consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 1869 to SEQ ID NO: 1932.
The primers of the fusion set of primers target high risk and optionally putative high risk HPV fusion transcripts.
The primers of fusion set of primers may comprise:
The fusion set of primers may further comprise:
In one embodiment, the fusion set of primers comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 865-866 to SEQ ID NO: 1331-1332 and optionally of SEQ ID NO: 1333-1334 to SEQ ID NO: 1367-1368 and/or SEQ ID NO: 1369-1370 to SEQ ID NO: 1403-1404 and/or SEQ ID NO: 1405-1406 to SEQ ID NO: 1439-1440.
In a preferred embodiment, the fusion set of primers comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 865-866 to SEQ ID NO: 1439-1440.
In one embodiment, the fusion set of primers consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 865-866 to SEQ ID NO: 1331-1332 and optionally of SEQ ID NO: 1333-1334 to SEQ ID NO: 1367-1368 and/or SEQ ID NO: 1369-1370 to SEQ ID NO: 1403-1404 and/or SEQ ID NO: 1405-1406 to SEQ ID NO: 1439-1440.
In a preferred embodiment, the fusion set of primers consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 865-866 to SEQ ID NO: 1439-1440.
The primers of fusion set of primers may comprise:
The fusion set may also comprise:
In one embodiment, the fusion set of primers comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 1933 to SEQ ID NO: 2166 and optionally of SEQ ID NO: 2167 to SEQ ID NO: 2184 and/or SEQ ID NO: 2185 to SEQ ID NO: 2202 and/or SEQ ID NO: 2203 to SEQ ID NO: 2220.
In a preferred embodiment, the fusion set of primers comprises the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1869 to SEQ ID NO: 2220.
In one embodiment, the fusion set of primers consists of the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1933 to SEQ ID NO: 2166 and optionally of SEQ ID NO: 2167 to SEQ ID NO: 2184 and/or SEQ ID NO: 2185 to SEQ ID NO: 2202 and/or SEQ ID NO: 2203 to SEQ ID NO: 2220.
In a preferred embodiment, the fusion set of primers consists of the pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 1869 to SEQ ID NO: 2220.
In one embodiment, the composition of primers for HPV typing comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 1-2 to SEQ ID NO: 1439-1440.
In one embodiment, the composition of primers for HPV typing consists of the pairs of primers able to produce amplicons having the nucleic acid sequence SEQ ID NO: 1-2 to SEQ ID NO: 1439-1440.
In one embodiment, the composition of primers for HPV typing comprises the pairs of primers able to produce amplicons having the nucleic acid sequence SEQ ID NO: 1501 to SEQ ID NO: 2220.
In one embodiment, the composition of primers for HPV typing consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 1501 to SEQ ID NO: 2220.
The primers of the human set of primers target human sequences.
The human set of primers may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29 or at least 30, or 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 pairs of primers having the nucleic acid sequence selected from the group consisting of SEQ ID NO: 1441-1442, SEQ ID NO: 1443-1444, SEQ ID NO: 1445-1446, SEQ ID NO: 1447-1448, SEQ ID NO: 1449-1450, SEQ ID NO: 1451-1452, SEQ ID NO: 1453-1454, SEQ ID NO: 1455-1456, SEQ ID NO: 1457-1458, SEQ ID NO: 1459-1460, SEQ ID NO: 1461-1462, SEQ ID NO: 1463-1464, SEQ ID NO: 1465-1466, SEQ ID NO: 1467-1468, SEQ ID NO: 1469-1470, SEQ ID NO: 1471-1472, SEQ ID NO: 1473-1474, SEQ ID NO: 1475-1476, SEQ ID NO: 1477-1478, SEQ ID NO: 1479-1480, SEQ ID NO: 1481-1482, SEQ ID NO: 1483-1484, SEQ ID NO: 1485-1486, SEQ ID NO: 1487-1488, SEQ ID NO: 1489-1490, SEQ ID NO: 1491-1492, SEQ ID NO: 1493-1494, SEQ ID NO: 1495-1496, SEQ ID NO: 1497-1498 and SEQ ID NO: 1499-1500.
In one preferred embodiment, the human set of primers comprises SEQ ID NO: 1441-1442 to SEQ ID NO: 1499-1500.
In one more preferred embodiment, the human set of primers consists of SEQ ID NO: 1441-1442 to SEQ ID NO: 1499-1500.
The human set of primers may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29 or at least 30, or 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 pairs of primers able to produce the amplicons having the nucleic acid sequence selected from the group consisting of SEQ ID NO: 2221 to SEQ ID NO: 2250.
In one preferred embodiment, the human set of primers comprises pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 2221 to SEQ ID NO: 2250.
In one more preferred embodiment, the human set of primers consists of pairs of primers able to produce the amplicons having the nucleic acid sequence SEQ ID NO: 2221 to SEQ ID NO: 2250.
In one embodiment, the composition of primers for HPV typing comprises the pairs of primers having the nucleic acid sequence SEQ ID NO: 1-2 to SEQ ID NO: 1499-1500.
Due to the redundancy between the pairs of primers, the nucleic acid sequences SEQ ID NO: 1-2 to SEQ ID NO: 1499-1500 represent only 525 unique pairs of primers.
In one embodiment, the composition of primers for HPV typing consists of the pairs of primers able to produce amplicons having the nucleic acid sequence SEQ ID NO: 1-2 to SEQ ID NO: 1499-1500.
In one embodiment, the composition of primers for HPV typing comprises the pairs of primers able to produce amplicons having the nucleic acid sequence SEQ ID NO: 1501 to SEQ ID NO: 2250.
In one embodiment, the composition of primers for HPV typing consists of the pairs of primers having the nucleic acid sequence SEQ ID NO: 1501 to SEQ ID NO: 2250.
The present invention also relates to a kit for HPV typing comprising the composition of primers for HPV typing of the invention and optionally reagents for cDNA amplification.
The reagents for the kit for HPV typing may be the same as those for detecting HSIL.
The present invention also relates to the use of the composition of primers for HPV typing as defined above or of the kit for HPV typing as defined above for HPV typing.
An In Vitro Method for HPV Typing in a Biological Sample
The present invention also relates to an in vitro method for HPV typing in a biological sample comprising the steps of:
The quantification of the expression level of each amplicon as well as the steps (a) to (c) may be carried out by the same methods as those disclosed for the in vitro method for detecting HSIL.
Typically, the in vitro method for HPV typing in a biological sample further comprises a step (e) of comparing, for each HPV type, the expression level of all amplicons specific of said HPV type with a reference value, wherein if the expression level of all the amplicons specific of said HPV type is higher than the reference value, it is indicative of the presence of said HPV type in the biological sample.
Indeed, the number of reads mapping to HPV-specific amplicons (i.e. the sum of categories “splice junction”, “unsplice junction” and “genomic”) was used to detect the presence of a given HPV genotype. According to the results of the inventors, the reference value is preferably between 100 and 200 reads, more preferably 150 reads.
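The read-count comparison of step (e) can be sketched as follows. This is a minimal illustration, assuming that amplicon names follow the "type_category_..." nomenclature used for the panel and that "higher than" means strictly greater; the function name `type_hpv` and the example read counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Categories whose reads are summed per HPV type, as stated in the text.
HPV_CATEGORIES = {"sp", "unsp", "gen"}
# Preferred reference value given in the text (in reads).
REFERENCE_READS = 150


def type_hpv(read_counts: Dict[str, int], threshold: int = REFERENCE_READS) -> List[str]:
    """Return the HPV types whose summed HPV-specific read count exceeds the threshold."""
    totals: Dict[str, int] = defaultdict(int)
    for amplicon, count in read_counts.items():
        fields = amplicon.split("_")
        # Skip human ("hg_...") and any other amplicon not in the summed categories.
        if len(fields) >= 2 and fields[1] in HPV_CATEGORIES:
            totals[fields[0]] += count
    return sorted(t for t, n in totals.items() if n > threshold)


# Hypothetical read counts for one sample (amplicon names are illustrative).
counts = {
    "16_sp_226_409_J40-43": 120,
    "16_gen_100_200_NoJ": 60,
    "31_unsp_1296_1297_J43-46": 40,
    "hg_TOP2A_E21E22": 5000,
}
print(type_hpv(counts))
```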
The practice of the present invention will employ, unless otherwise indicated, conventional techniques, which are within the skill of the art. Such techniques are explained fully in the literature.
For a better understanding of the invention and to show how the same may be carried into effect, there will now be described by way of example a specific mode contemplated by the Inventors with reference to the accompanying drawings in which:
The examples and figures should not be interpreted in any way as limiting the scope of the present invention.
Material and Methods:
Evaluation of Transport Medium for RNA Conservation
HPV16-positive cervical squamous cell carcinoma SiHa cells were cultivated and inoculated at a final concentration of 7×10⁴ cells/mL in four transport media: PreservCyt Solution (Hologic, USA), NovaPrep HQ+ Solution (Novaprep, France), RNA Protect Cell Reagent (Qiagen, Germany) and NucliSens Lysis Buffer (BioMerieux, France). The mixtures were aliquoted in 1 mL tubes and kept at room temperature for 2 hours (D0), 48 hours (D2), 168 hours (D7), 336 hours (D14) and 504 hours (D21). In parallel, pellets of 7×10⁴ cells without transport medium were kept frozen at −80° C. for 2 hours, 48 hours, 168 hours, 336 hours and 504 hours as a control. At D0, D2, D7, D14 and D21, room temperature aliquots were centrifuged, the medium removed, and the pellets were frozen at −80° C. for a short time (<1 h) before proceeding with RNA extraction. In the particular case of the NucliSens Lysis Buffer, since the cells were lysed, the entire 1 mL aliquot was frozen at −80° C. for a short time without prior centrifugation. For each sample, RNA was extracted using the PicoPure RNA Isolation kit (Thermo Fisher Scientific, USA), together with the corresponding (time-matched) frozen control, so that all samples underwent one freezing cycle. RT-qPCR was performed to quantify the expression of the two human genes G6PD (forward primer: TGCAGATGCTGTGTCTGG (SEQ ID NO: 2251); reverse primer: CGTACTGGCCCAGGACC (SEQ ID NO: 2252)) and GAPDH (forward primer: GAAGGTGAAGGTCGGAGTC; reverse primer: GAAGATGGTGATGGGATTTC (SEQ ID NO: 2253)) and the expression of the two viral genes HPV16 E6 (forward primer: ATGCACCAAAAGAGAACTGC (SEQ ID NO: 2254); reverse primer: TTACAGCTGGGTTTCTCTAC (SEQ ID NO: 2255)) and E7 (forward primer: GTAACCTTTTGTTGCAAGTGTGACT (SEQ ID NO: 2256); reverse primer: GATTATGGTTTCTGAGAACAGATGG (SEQ ID NO: 2257)). RNA integrity was assessed on a Bioanalyzer instrument (Agilent, USA).
HPV Selection and Splice Sites Analysis
HPV reference clones made available by the International Human Papillomavirus Reference Center (Karolinska University, Stockholm, Sweden) served as reference genomes, except for HPV68 which was retrieved from Chen et al. (Evolution and Taxonomic Classification of Alphapapillomavirus 7 Complete Genomes: HPV18, HPV39, HPV45, HPV59, HPV68 and HPV70. PLOS ONE, 2013, 8:e72565). Accession numbers used in this study were: K02718 (HPV16), X05015 (HPV18), J04353 (HPV31), M12732 (HPV33), X74477 (HPV35), M62849 (HPV39), X74479 (HPV45), M62877 (HPV51), X74481 (HPV52), X74483 (HPV56), D90400 (HPV58), X77858 (HPV59), U31794 (HPV66), KC470267 (HPV68), X94165 (HPV73) and AB027021 (HPV82). Multiple alignment of the HPV genomes was performed with ClustalW v2.1 using Geneious v10 (Kearse M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinforma Oxf Engl, 2012, 28:1647-9). Previously known splice donor (SD) and splice acceptor (SA) sites for HPV16 (Zheng Z-M, et al. Papillomavirus genome structure, expression, and post-transcriptional regulation. Front Biosci J Virtual Libr, 2006, 11:2286-302) and HPV18 (Wang X, et al. Construction of a full transcription map of human papillomavirus type 18 during productive viral infection. J Virol, 2011, 85:8080-92) were reported on the alignment, and predictions of unknown SD and SA sites were made manually for the other genotypes by sequence analogy (
A custom AmpliSeq panel was designed to be used on both PGM and Ion Proton instruments (Thermo Fisher Scientific). Five categories of target sequences were defined as follows:
HPV splice junctions (sp): a set of target sequences which are specific HPV splice events, involving a pair of splice donor (SD) and splice acceptor (SA) sites. The nomenclature includes a “sp” tag. For example, “31_sp_1296_3295_J43-46” stands for HPV31 (31), splice junction (sp), SD at position 1296 on HPV31 genome, SA at position 3295 on HPV31 genome, and junction (J) at position 43-46 on amplicon. The junction coordinates are given in a 4-bases interval, where the first 2 bases correspond to the donor part (or left part) and the last 2 bases to the acceptor part (or right part) of the sequence. Primers and amplicons corresponding to splice junctions set are given at Table 2A and 2Abis.
HPV unsplice junctions (unsp): a set of target sequences which are specific HPV genomic regions spanning either SD or SA sites, in the absence of any splice event. The nomenclature includes an “unsp” tag. For example, “31_unsp_1296_1297_J43-46” stands for HPV31 (31), unspliced (unsp), last base of the left part of the amplicon at position 1296 on HPV31 genome, first base of the right part of the amplicon at position 1297 on HPV31 genome, junction (J) at position 43-46 on amplicon. Primers and amplicons corresponding to unsplice junctions set are given at Table 2B and 2Bbis. In this context, the term ‘junction’ refers to the exon-intron interface (i.e. the position where a donor or acceptor site would be found in case of a splice event), and the associated junction coordinates are used to characterize unspliced sequences bioinformatically as described in section “Sequencing data processing”.
HPV genome away from splice junctions (gen): a set of target sequences which are specific HPV genomic regions, away from any SD or SA sites. The nomenclature includes a “gen” tag. For example, “45_gen_1664_1794_NoJ” stands for HPV45 (45), HPV genomic region (gen), amplicon coordinates from position 1664 to position 1794 on HPV45 genome. Primers and amplicons corresponding to the genomic set are given at Table 2C and 2Cbis.
HPV-human fusion sequences (fus): a set of hypothesis-driven viral-cellular fusion transcripts, based on previous descriptions (Wentzensen N, et al. Characterization of viral-cellular fusion transcripts in a large series of HPV16 and 18 positive anogenital lesions. Oncogene, 2002, 21:419-26, Tang K-W, et al. The landscape of viral expression and host gene fusion and adaptation in human cancer. Nat Commun, 2013, 4:2513, Peter M, et al. MYC activation associated with the integration of HPV DNA at the MYC locus in genital tumors. Oncogene, 2006, 25:5985-93, Lu X, et al. Multiple-integrations of HPV16 genome and altered transcription of viral oncogenes and cellular genes are associated with the development of cervical cancer. PloS One, 2014, 9:e97588, Kraus I, et al. The majority of viral-cellular fusion transcripts in cervical carcinomas cotranscribe cellular sequences of known or predicted genes. Cancer Res, 2008, 68:2514-22). For each HPV, 18 fusion sequence candidates involving SA2 or putative breakpoint 1 or 2 (put. bkpt, see
Human sequences (hg): a set of 30 human sequences used as internal controls, retrieved from publicly available AmpliSeq projects and representing housekeeping genes (ACTB, B2M, GAPDH, GUSB, RPLP0), epithelial markers (KRT10, KRT14, KRT17), and oncogenes, tumor suppressor genes, canonical cancer pathways and direct or indirect downstream effectors of HPV oncoproteins (AKT1, BCL2, BRAF, CDH1, CDKN2A, CDKN2B, ERBB2, FOS, HRAS, KRAS, MET, MKI67, MYC, NOTCH1, PCNA, PTEN, RB1, STAT1, TERT, TOP2A, TP53, WNT1). The nomenclature for these sequences includes an “hg” tag. For example, “hg_TOP2A_E21E22” stands for human topoisomerase 2A mRNA, exons 21-22. Primers and amplicons corresponding to the human set are given at Table 2E and 2Ebis.
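The amplicon nomenclature described above can be decoded programmatically. The following is a minimal sketch, assuming that names follow exactly the patterns of the examples given for the "sp", "unsp", "gen" and "hg" categories; the function name `parse_amplicon_name` and the returned field names are hypothetical. Fusion ("fus") names are not handled because their format is not specified in this passage.

```python
import re
from typing import Any, Dict, Optional

# <type>_<sp|unsp|gen>_<coord>_<coord>_<J<a>-<b> | NoJ>
_HPV_NAME = re.compile(r"^(\d+)_(sp|unsp|gen)_(\d+)_(\d+)_(?:J(\d+)-(\d+)|NoJ)$")
# hg_<GENE>_<region>
_HG_NAME = re.compile(r"^hg_([A-Za-z0-9]+)_(\w+)$")


def parse_amplicon_name(name: str) -> Optional[Dict[str, Any]]:
    """Decode an amplicon name; return None if the name matches no known pattern."""
    m = _HPV_NAME.match(name)
    if m:
        hpv, tag, first, second, j_start, j_end = m.groups()
        return {
            "origin": "HPV" + hpv,
            "category": tag,
            # sp: (SD, SA); unsp: (last left base, first right base); gen: (start, end)
            "coords": (int(first), int(second)),
            # 4-base junction interval on the amplicon, or None for "NoJ"
            "junction": (int(j_start), int(j_end)) if j_start else None,
        }
    m = _HG_NAME.match(name)
    if m:
        return {"origin": "human", "category": "hg",
                "gene": m.group(1), "region": m.group(2)}
    return None


for example in ("31_sp_1296_3295_J43-46", "45_gen_1664_1794_NoJ", "hg_TOP2A_E21E22"):
    print(parse_amplicon_name(example))
```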
In total, 750 target sequences were included into the panel (Table 1) and can be amplified with a pool of 525 unique primers (Table 2A-2E). The average amplicon size of the panel (primers included) is 141 bp (range: 81-204 bp). A detailed table including the nucleic acid sequences of the primers along with their corresponding amplicons and amplicon sequences is given in Table 2Abis-2Ebis.
Table 1 below shows the HPV RNA-Seq AmpliSeq custom panel contents. The number of target amplicons is indicated for each category (sp, unsp, gen, fus, hg) and for each viral and cellular origin. Putative high-risk HPV are indicated by a star (*).
Study Participants
Study participants were women aged from 25 to 65 years old referred for colposcopy consultation in French hospitals. The patients were referred for colposcopy in the context of a LSIL or a HSIL result at their cytology test performed in accordance with French recommendations regarding the cervical cancer screening program. Patients provided written informed consent according to French legislation.
Specimen Collection
Genital samples were collected just before performing colposcopy using a cervical sampling device, immersed and rinsed in a vial filled with 20 mL of PreservCyt Solution (Hologic, USA), and sent at room temperature to the HPV National Reference Center (CNR) at Institut Pasteur, Paris, France. From July 2014 to April 2015, 84 patients were enrolled in the study, coming from 3 different French centers: CHU Angers (n=66); CHU Kremlin-Bicêtre (n=10); CHU Tours (n=6). Samples were removed from the study for technical reasons (sample leakage, n=1), for legal reasons (n=7), or because they were used for initial technical tests (RNA conservation, RNA extraction and amplification, n=4). The remaining 72 samples (HSIL=37; LSIL=35) were processed.
Data Collection
The following bio-clinical data were collected: date and results of the cytology test, age at the time of the cytology test, date and results of all available histological results subsequent to colposcopy. As colposcopy was performed in the context of routine healthcare, biopsies were not performed when colposcopy was normal.
HPV DNA Detection Using the PapilloCheck Test Kit (HPV DNA)
Upon reception at CNR, 16 mL of cytological sample were transferred into a 50 mL Falcon tube and centrifuged at 4,500 g for 10 minutes. The supernatant was removed and the pellet washed with 1 mL of PBS. The sample was then centrifuged again at 5,000 g for 10 minutes and the supernatant removed. The pellet was frozen at −80° C. before DNA extraction. Following DNA extraction (Macherey Nagel, Germany), HPV detection was done using the PapilloCheck Test Kit (Greiner Bio-One GmbH, Germany) according to the manufacturer's instructions.
RNA Extraction and Characterization
In parallel to the HPV DNA procedure, 3×1 mL aliquots of cytological specimen were centrifuged at 14,000 rpm for 7 minutes, the supernatant was removed and the pellet was washed with 1 mL of PBS. The sample was then centrifuged again at 14,000 rpm for 7 minutes and the supernatant removed. The pellet was frozen at −80° C. before RNA extraction. RNA extractions were done using the PicoPure RNA Isolation kit (Thermo Fisher Scientific), including on-column DNAse treatment, with a final elution volume of 30 μl. Total RNA was quantified on a Nanodrop (Life Technologies) and RNA integrity was evaluated on a Bioanalyzer RNA 6000 pico chip (Agilent) using the RIN (RNA Integrity Number), a quality score ranging from 1 (strongly degraded RNA) to 10 (intact RNA). For each sample, RT-qPCR targeting mRNA from the housekeeping genes ACTB (forward primer: CATCGAGCACGGCATCGTCA (SEQ ID NO: 2258); reverse primer: TAGCACAGCCTGGATAGCAAC (SEQ ID NO: 2259); amplicon size=210 bp) and GAPDH (forward primer: GAAGGTGAAGGTCGGAGTC (SEQ ID NO: 2260); reverse primer: GAAGATGGTGATGGGATTTC (SEQ ID NO: 2261); amplicon size=226 bp) was done in a SYBR Green format with 45 cycles of amplification. RT-negative (RT-) PCRs were also run to evaluate the presence of residual DNA after RNA extraction.
Amplification and Sequencing
Starting from RNA, cDNA was generated using SuperScript III (n=17 samples) or SuperScript IV (n=55 samples) (Thermo Fisher Scientific) with random hexamers and a final RNAse H treatment. Libraries were prepared using the Ion AmpliSeq Library Kit 2.0 and AmpliSeq custom panel WG_WG00141, with 21 cycles of amplification before adapter ligation. Each sample was barcoded individually. Only positive libraries were sequenced. In total, 55 clinical samples plus 1 cellular model (SiHa) were sequenced on 4 Ion Proton runs.
Sequencing Data Processing
Reads were aligned to the reference sequences of the amplicons using STAR v2.5.3a in local alignment mode (parameter --alignEndsType EndToEnd), by only reporting uniquely mapped reads (--outFilterMultimapNmax 1) and turning off splicing alignment (--alignIntronMax 1). The expression of each amplicon was evaluated by the number of sequencing reads uniquely mapping to its sequence (read counts). For reference sequences containing a splice junction, only reads mapping at the junction site and encompassing at least 10 bases before and 10 bases after the junction were kept.
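The junction-spanning filter described above can be sketched as follows. This is an illustration under stated assumptions: alignment coordinates are taken as 1-based and inclusive on the amplicon, the alignment is ungapped, and the junction is placed between the second and third bases of the 4-base junction interval (for example between positions 44 and 45 for "J43-46"). The function name `spans_junction` is hypothetical.

```python
MIN_FLANK = 10  # minimum number of aligned bases required on each side of the junction


def spans_junction(read_start: int, read_end: int, junction_start: int,
                   min_flank: int = MIN_FLANK) -> bool:
    """Check that a read covers at least min_flank bases on both sides of the junction.

    read_start, read_end: 1-based inclusive alignment coordinates on the amplicon.
    junction_start: first coordinate of the 4-base junction interval (43 for "J43-46").
    """
    last_donor_base = junction_start + 1      # second base of the interval (left part)
    first_acceptor_base = junction_start + 2  # third base of the interval (right part)
    bases_before = last_donor_base - read_start + 1
    bases_after = read_end - first_acceptor_base + 1
    return bases_before >= min_flank and bases_after >= min_flank


# Hypothetical alignments against an amplicon named "..._J43-46".
print(spans_junction(35, 54, 43))  # exactly 10 bases on each side
print(spans_junction(40, 90, 43))  # only 5 bases before the junction
```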
HSIL Prediction Model
Selection of Amplicons
Read counts were normalized by the size of the library (each read count was divided by the ratio of the library size of the given sample to the average library size across samples) and the 215 amplicons capturing splice junctions (sp) of the 16 high-risk or putative high risk HPV were selected. These amplicons have been annotated with generic names with respect to the type of transcripts they capture, which are shared across HPV species (e.g. “SD1-SA1”, see
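The library-size normalization described in this paragraph can be sketched as follows. This is a minimal illustration, assuming that the library size of a sample is the total number of reads counted over all its amplicons (the text does not define it further); the function name `normalize_by_library_size` and the example counts are hypothetical.

```python
from typing import Dict


def normalize_by_library_size(
    counts: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Divide each read count by (sample library size / mean library size)."""
    # Assumption: library size = sum of all read counts of the sample.
    lib_sizes = {sample: sum(c.values()) for sample, c in counts.items()}
    mean_size = sum(lib_sizes.values()) / len(lib_sizes)
    normalized: Dict[str, Dict[str, float]] = {}
    for sample, sample_counts in counts.items():
        if lib_sizes[sample] == 0:
            raise ValueError(f"sample {sample!r} has an empty library")
        ratio = lib_sizes[sample] / mean_size
        normalized[sample] = {a: n / ratio for a, n in sample_counts.items()}
    return normalized


# Hypothetical raw counts: sample B was sequenced three times deeper than sample A.
raw = {
    "A": {"SD1-SA1": 100, "SD2-SA4": 100},
    "B": {"SD1-SA1": 300, "SD2-SA4": 300},
}
print(normalize_by_library_size(raw))
```

After normalization, both hypothetical samples show the same value for each amplicon, since they differ only in sequencing depth.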
Logistic Regression Model
Let Y denote the cytology grade, taking the value 1 for high grade (HSIL) and 0 for low grade (LSIL), and let x denote a set of amplicons. A logistic regression model was used to predict the probability that a given observation belongs to the “1” class versus the probability that it belongs to the “0” class. Logistic regression models the log odds of the event (here the grade of the cytology) as a function of the predictor variables (here the amplicon expression estimated by its read count). Formally, the logistic regression model assumes that the log odds is a linear function of the predictors:
$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \sum_{i=1}^{p} \beta_i x_i$$
where π indicates the probability of the event (being of high-grade), β0 is the intercept, βi are the regression coefficients, and xi the explanatory variables, in our case the log2 number of reads mapping to the amplicons.
Solving for π, this gives:
$$\pi = \frac{\exp\left(\beta_0 + \sum_{i=1}^{p} \beta_i x_i\right)}{1 + \exp\left(\beta_0 + \sum_{i=1}^{p} \beta_i x_i\right)}$$
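Numerically, π is the logistic transform of the linear predictor built from the log2 read counts. The following Python sketch shows the computation under stated assumptions: the intercept, the coefficient value and the pseudocount of 1 added before taking the logarithm (to handle amplicons with zero reads) are hypothetical and are not taken from the tables of this application.

```python
import math


def hsil_probability(read_counts, coefficients, intercept=0.0, pseudocount=1.0):
    """Probability of high grade under a logistic model on log2 read counts.

    read_counts: dict amplicon id -> (normalized) read count.
    coefficients: dict amplicon id -> regression coefficient (hypothetical).
    pseudocount: assumed offset so that log2 is defined for zero counts.
    """
    linear_predictor = intercept
    for amplicon, beta in coefficients.items():
        x = math.log2(read_counts.get(amplicon, 0) + pseudocount)
        linear_predictor += beta * x
    # Inverse of the log-odds transform.
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical single-amplicon model: log2(3 + 1) = 2, so the log odds equal 2.
example_probability = hsil_probability({"SD1-SA1": 3}, {"SD1-SA1": 1.0})
```

With no informative amplicon (all coefficients absent) the function returns 0.5, that is, log odds of zero.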
Implementation of the Logistic Regression Model
To limit overfitting, the inventors used L2-norm (ridge) regularization, which shrinks the magnitudes of the regression coefficients so that the model better fits future data. The inventors estimated the logistic model using the R (http://www.r-project.org/) package glmnet (Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw, 2010, 33:1-22). Leave-one-out (LOO) cross-validation was used to pick the regularization parameter λ: the value giving the minimum mean cross-validated misclassification error was retained. Using this λ as the regularization parameter, the model output consisted of an estimate of a coefficient value β for each variable in the logistic regression model. This model was then used to predict the grade of the multi-infected observations, by treating each HPV species separately.
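The principle of ridge-penalized logistic regression with leave-one-out selection of λ can be sketched in plain Python. This is not glmnet and does not reproduce its coordinate-descent algorithm or its feature standardization: the sketch swaps in simple gradient descent, and the learning rate, the number of iterations, the candidate λ values and the toy data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam, n_iter=2000, lr=0.1):
    """Minimize mean log-loss + (lam / 2) * ||beta||^2 by gradient descent.

    The intercept is left unpenalized. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(beta0 + sum(b * v for b, v in zip(beta, xi))) - yi
            grad0 += err / n
            for j in range(p):
                grad[j] += err * xi[j] / n
        beta0 -= lr * grad0
        beta = [b - lr * (g + lam * b) for b, g in zip(beta, grad)]
    return beta0, beta


def predict(model, x):
    beta0, beta = model
    return 1 if sigmoid(beta0 + sum(b * v for b, v in zip(beta, x))) >= 0.5 else 0


def loo_error(X, y, lam):
    """Leave-one-out misclassification rate for a given lambda."""
    errors = 0
    for i in range(len(X)):
        model = fit_ridge_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        errors += predict(model, X[i]) != y[i]
    return errors / len(X)


def select_lambda(X, y, lambdas):
    # min() keeps the first minimizer, so ties go to the earliest candidate.
    return min(lambdas, key=lambda lam: loo_error(X, y, lam))
```

On a toy dataset such as `X = [[-3.0], [-2.5], [-2.0], [2.0], [2.5], [3.0]]` with `y = [0, 0, 0, 1, 1, 1]`, a larger λ yields a smaller coefficient magnitude, which is the shrinkage effect referred to above.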
Training Set and Test Set
The model was built upon the clinical outcome LSIL or HSIL obtained from the cytological analysis, and estimated on a training set consisting of 20 mono-infected samples (5 LSIL and 15 HSIL) in order to avoid a confusion bias. It is indeed anticipated that, in the case of multi-infected samples, several HPV could contribute differently to the progression of the lesion or to a mix of several grades within the same sample, because they are engaged in different stages of their cycle. The performance of the model was then evaluated on a test set consisting of 13 multi-infected samples. In this case, the set of amplicons of each HPV species was used separately to classify the multi-infected samples, to get one prediction per HPV, as done for the mono-infected samples. For example, if a sample had expression of amplicons from both HPV16 and HPV32, two predictions were given: one using only sequencing reads mapping to HPV16, and one using only sequencing reads mapping to HPV32. In this way, it became possible to interpret the results finely from a virological point of view, as the inventors could discriminate which HPV was responsible for the lesion.
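The per-genotype treatment of multi-infected samples can be sketched as follows in Python. The keying of read counts by (genotype, junction) pairs and the `predict_grade` callable are hypothetical; the toy predictor used in the example is a placeholder threshold rule and is not the fitted logistic model. The sample-level rule (a sample is HSIL if at least one genotype gives an HSIL prediction) is the one stated in the Results of this application.

```python
from collections import defaultdict


def split_counts_by_genotype(read_counts):
    """Group read counts keyed by (genotype, junction) into one dict per genotype."""
    per_genotype = defaultdict(dict)
    for (genotype, junction), n in read_counts.items():
        per_genotype[genotype][junction] = n
    return dict(per_genotype)


def predict_per_genotype(read_counts, predict_grade):
    """Apply `predict_grade` separately to the amplicons of each genotype."""
    grouped = split_counts_by_genotype(read_counts)
    return {genotype: predict_grade(counts) for genotype, counts in grouped.items()}


def sample_is_hsil(per_genotype_predictions):
    """A sample is called HSIL if at least one genotype is predicted HSIL."""
    return any(label == "HSIL" for label in per_genotype_predictions.values())


def toy_predictor(counts):
    # Placeholder rule for illustration only, NOT the regression model.
    return "HSIL" if counts.get("SD1-SA1", 0) > 100 else "LSIL"


# Hypothetical multi-infected sample with reads from two genotypes.
multi_infected = {
    ("HPV16", "SD1-SA1"): 850,
    ("HPV16", "SD2-SA4"): 40,
    ("HPV32", "SD1-SA1"): 12,
}
predictions = predict_per_genotype(multi_infected, toy_predictor)
```

Keeping one prediction per genotype is what allows the lesion to be attributed to a specific HPV rather than to the sample as a whole.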
Results:
Evaluation of Transport Medium for RNA Conservation
The stability of total RNA from cervical cells at room temperature was evaluated in four solutions: PreservCyt (Hologic), the most widely used solution for gynecological specimen collection; NovaPrep HQ+ Solution (Novaprep), a competitor product used for cells and DNA recovery but never evaluated for RNA conservation; RNA Protect Cell Reagent (Qiagen), a popular solution for RNA stability; and NucliSens Lysis Buffer (BioMérieux), a lysis buffer that is part of the NucliSens automated nucleic acid procedure and has been described as an RNA stabilizer. The amount of spiked HPV16-positive cervical squamous cell carcinoma cells (SiHa) was calibrated to be representative of a cervical smear. After 48 h at room temperature, RT-qPCR measurement of cellular and viral transcripts showed no or little RNA loss in PreservCyt, only limited RNA degradation (<1 log) in RNA Protect and NucliSens Lysis Buffer, and a marked RNA loss in NovaPrep HQ+ Solution (>2 log). After 7 days and up to 21 days, only the PreservCyt solution provided RNA quality with a limited RNA degradation pattern as indicated by the detection of 18S and 28S rRNA. The inventors therefore decided to use the PreservCyt solution to collect the gynecological specimens of the study.
HPV RNA-Seq AmpliSeq Custom Panel
Transcriptomic maps known for HPV16 (ref. 20) and HPV18 (ref. 21) were used to predict unknown but likely splice donor and splice acceptor sites for HPV31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, 68, 73, and 82 (
Samples, RNA & Sequencing
72 gynecological samples (HSIL=37; LSIL=35) coming from 3 different French centers (Angers, Kremlin-Bicêtre and Tours) and collected in PreservCyt solution were subjected to RNA extraction using a method designed to recover total RNA from as little as a single cell (PicoPure RNA Isolation kit, Thermo Fisher Scientific, USA). In most cases, total RNA was measurable using a Nanodrop (70/72 positive, average on positive RNA eluates=18 ng/μL) and was detectable on a Bioanalyzer pico RNA chip with a pattern indicating RNA degradation (63/72 positive, average RNA Integrity Number on positive=2.2). RT-qPCR performed for all samples on ACTB mRNA (amplicon size=210 bp) and GAPDH mRNA (amplicon size=226 bp) indicated that RNA quality was compatible with amplification of 200-250 bp size fragments (ACTB mRNA average Ct=27.8; GAPDH mRNA average Ct=30.1). Samples that failed to pass this initial RT-PCR quality control were not sequenced. qPCR reactions omitting the reverse transcription step (RT-) were also run and showed in general no or few traces of residual genomic DNA (ACTB DNA average Ct=38.4; GAPDH DNA average Ct=35.6). Note that the presence of residual cellular DNA or HPV DNA in the RNA preparation is not a major concern since the AmpliSeq assay can differentiate between HPV transcripts and genomic sequences. AmpliSeq libraries were initiated from total RNA and were positive after 21 cycles of amplification for 55 samples (i.e. detectable on a Bioanalyzer HS DNA chip). Attempts to add one or two amplification cycles did not bring any significant improvement to the results (data not shown). In total, 55 patients (HSIL=27; LSIL=28), plus SiHa HPV16-positive cells as a control, were sequenced on Ion Proton. The sequencing reads were aligned to the target sequences and read counts were generated.
An average of 2.4 million usable reads per sample was reached (min=0.02M; max=8.3M), among which an average of 2.1 million reads mapped to the human sequences (hg) used as internal controls (min=0.01M; max=8.06M). The detection of highly expressed human sequences in all samples, even though inter-sample variations were observed, helped to validate the sequencing procedure, which is especially important for the interpretation of HPV-negative samples. Rare non-zero values were also observed for some of the numerous hypothesized HPV-human fusion sequences (fus), but all were false positives, identified as such because only half of the reference sequence was covered by reads.
HPV RNA-Seq Used for HPV Detection and Genotyping
The first application of HPV RNA-Seq is to detect the presence in a given sample of any of the 16 high-risk or putative high-risk HPV targeted by the panel. The number of reads mapping to HPV-specific amplicons (i.e. the sum of categories “sp”, “unsp” and “gen”) was used to detect the presence of a given HPV genotype. To help determine a threshold for detection, we took as a reference an HPV DNA test validated for clinical use (PapilloCheck, Greiner Bio-One GmbH). The best sensitivity and specificity values between the two tests were obtained for a threshold of 100-200 reads (
A more detailed view of the genotypes identified by both techniques is given in
Using a threshold value of 150 reads, HPV RNA-Seq detected two more positive patients than the HPV DNA test (n=39 vs n=37, Table 3). HPV RNA-Seq identified the presence of more than one HPV for three more patients than the HPV DNA test (n=13 vs n=10 multi-infected samples, Table 4). Globally, HPV16 was found at a slightly weaker occurrence by HPV RNA-Seq (n=18 vs n=19) in favor of other genotypes such as HPV31, 33, 45, 52, 56, 58 or 66 which were less commonly found by the HPV DNA test (HPV31 n=5 vs n=4; HPV33 n=3 vs n=1; HPV45 n=3 vs n=2; HPV52 n=5 vs n=3; HPV56 n=4 vs n=2; HPV58 n=5 vs n=4; HPV66 n=2 vs n=1,
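The detection rule used in this section (sum of the reads of categories “sp”, “unsp” and “gen” per genotype, compared with a threshold of 150 reads) can be sketched in Python. The keying of counts by (genotype, category) pairs and the example values are hypothetical, and treating a genotype as detected only when its total strictly exceeds the threshold is an assumption about the boundary case.

```python
HPV_SPECIFIC_CATEGORIES = ("sp", "unsp", "gen")


def detect_genotypes(read_counts, threshold=150):
    """Return the sorted genotypes whose HPV-specific reads exceed `threshold`.

    read_counts: dict mapping (genotype, category) -> read count.
    Only the categories "sp", "unsp" and "gen" are summed; human control
    ("hg") and HPV-human fusion ("fus") reads are ignored.
    """
    totals = {}
    for (genotype, category), n in read_counts.items():
        if category in HPV_SPECIFIC_CATEGORIES:
            totals[genotype] = totals.get(genotype, 0) + n
    # Assumption: strict inequality at the threshold.
    return sorted(g for g, total in totals.items() if total > threshold)


# Hypothetical sample: HPV16 totals 160 HPV-specific reads, HPV31 only 90.
sample_counts = {
    ("HPV16", "sp"): 100,
    ("HPV16", "unsp"): 40,
    ("HPV16", "gen"): 20,
    ("HPV16", "fus"): 500,
    ("HPV31", "gen"): 90,
    ("human", "hg"): 100000,
}
detected = detect_genotypes(sample_counts)
```

A sample for which more than one genotype is returned corresponds to a multi-infected sample in the sense used above.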
HPV RNA-Seq Used as a Marker of High-Grade Cytology
The inventors conducted an exploratory analysis on 20 of the mono-infected samples in which they showed that HPV RNA splice junctions could be used to predict high-grade cytology. They focused the analysis on amplicons capturing splice junctions (category “sp”) to be sure to detect HPV transcripts. However, the number of mono-infected samples (n=20) used as training set was small, in particular the number of LSIL samples (n=5). LOO cross-validation was used to pick the lambda giving the minimum cross-validated error using ridge regularization. Lambda=0.08 gave a mean cross-validated error of 15%. The inventors also computed a 20% prediction error using nested cross-validation. This error rate can be seen as an indicator of how the model could fit future datasets. The inventors used the corresponding parameter to fit a regularized logistic regression model, assigning a coefficient to each amplicon (Table 5) and a probability of being of high-grade to each sample (Table 6). In Table 5, the first and fourth columns give the id of the splice junction captured by the amplicon, the second column gives the coefficient assigned by the logistic regression, and the third column indicates whether the splice junction comes from a “late” or “early” transcript.
Table 6 shows the classification results of the (ridge) logistic regression. The first column gives the sample id, the second column gives the probability estimate that the sample is HSIL, the third and fourth columns give the corresponding prediction, and the fifth column contains TRUE if the prediction is consistent with the grade evaluated by cytology.
The grade of the 20 mono-infected samples was classified correctly, except for one observation (Table 6). It is interesting to note that this single misclassified sample (IonXpress_019_2613), which was classified LSIL by the cytological analysis, was later found to contain a mixture of LSIL and HSIL lesions upon histological examination performed more than one year after the sampling done for HPV RNA-Seq/cytology.
The estimated model was then used to classify the 13 multi-infected samples, with each HPV species present within one sample being classified individually for its implication in HSIL development. If at least one HPV species gave an HSIL prediction, the sample was considered to be HSIL. We calculated performances for HSIL prediction for all samples, considering as not being of high-grade both the six samples without sufficient coverage of the splice junctions and the 16 HPV-negative samples not exceeding the threshold of HPV detection. The calculated performances for HSIL prediction in comparison to cytology for the 55 patients (mono-infected, multi-infected and HPV-negative) were Se(cyto)=66.7%, Sp(cyto)=85.7%, PPV(cyto)=81.8% and NPV(cyto)=72.7% (Table 7A). The performances were also calculated for the subset of 39 samples having at least one HPV identified by HPV RNA-Seq, giving in this case Se(cyto/HR+)=94.7%, Sp(cyto/HR+)=80.0%, PPV(cyto/HR+)=81.8% and NPV(cyto/HR+)=94.1% (Table 7B). In Table 7, Sensitivity (Se), Specificity (Sp), Positive Predictive Value (PPV) and Negative Predictive Value (NPV) are given. “Not HSIL” means that either no HPV was detected in the sample by HPV RNA-Seq or that none of the HPV genotypes detected was given an HSIL prediction.
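The four performance measures follow from the counts of true and false positives and negatives. In the Python sketch below, the confusion-matrix counts are not copied from Table 7: they are inferred here as the integer counts consistent with the reported percentages and with the group sizes given in the text (27 HSIL and 28 LSIL among the 55 patients, 39 HPV-positive samples), and should be read as an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


def as_percent(metrics):
    """Round each metric to one decimal place, expressed in percent."""
    return {name: round(100 * value, 1) for name, value in metrics.items()}


# Inferred counts (assumption): all 55 patients, HSIL prediction vs cytology.
all_patients = as_percent(diagnostic_metrics(tp=18, fp=4, fn=9, tn=24))

# Inferred counts (assumption): the 39 samples with at least one HPV detected.
hpv_positive = as_percent(diagnostic_metrics(tp=18, fp=4, fn=1, tn=16))
```

Under these inferred counts, restricting the analysis to HPV-positive samples removes 8 false negatives and 8 true negatives, which raises the sensitivity and the NPV while leaving the PPV unchanged.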
Note that the ratio of HSIL to LSIL remained similar between these two populations (around 1:1), making the comparison of the PPV and the NPV possible. Finally, a summary of the results for HPV detection and genotyping (HPV RNA-Seq vs HPV DNA) and high-grade cytology prediction (HPV RNA-Seq vs cytology), including posterior histological data of cervix biopsies when available, is presented in Table 8.
HPV RNA-Seq Used as a Triage Test
The performances of HPV RNA-Seq as a triage test were evaluated using histology as the gold standard. Results from histological examination were, however, not available for all patients. The time interval separating HPV RNA-Seq/cytology tests from histological analysis, varying between 0 and 780 days, was another limitation in this study. To try to overcome these drawbacks, we compared the performances of HPV RNA-Seq vs histology to the performances of cytology vs histology, considering either all available samples, or only samples for which histology was done less than 3 months after HPV RNA-Seq/cytology, or only samples for which histology was done less than 6 months after HPV RNA-Seq/cytology. In addition, for each category, we made the distinction between the performances obtained when HPV RNA-Seq HPV-positive and HPV-negative patients were grouped together and those obtained when only HPV-positive patients were considered. Calculation of the PPV as a function of HSIL prevalence in the population was also done.
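The dependence of the PPV on prevalence follows from Bayes' rule: PPV = Se·p / (Se·p + (1 − Sp)·(1 − p)), where p is the HSIL prevalence. A minimal Python sketch is given below. Since the histology-based sensitivity and specificity are not reproduced in this passage, the example reuses the values reported versus cytology (Se=18/27, Sp=24/28, expressed as fractions inferred from the reported percentages) purely as an illustration; the prevalence grid is arbitrary.

```python
def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value for a given prevalence (Bayes' rule).

    Raises ZeroDivisionError when no positive result is possible
    (e.g. prevalence 0 combined with specificity 1).
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Illustrative inputs (assumption): performance vs cytology, not vs histology.
SE, SP = 18 / 27, 24 / 28

# PPV over an arbitrary grid of HSIL prevalences.
ppv_curve = {p: ppv_at_prevalence(SE, SP, p) for p in (0.05, 0.10, 0.25, 0.50)}

# At the study prevalence (27 HSIL out of 55) the observed PPV is recovered.
ppv_study = ppv_at_prevalence(SE, SP, 27 / 55)
```

For fixed sensitivity and specificity the PPV increases with prevalence, which is why a PPV measured in a population with roughly equal numbers of HSIL and LSIL cannot be transposed directly to a screening population.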
Throughout this application, various references describe the state of the art to which this invention pertains. The disclosures of these references are hereby incorporated by reference into the present application.
Number | Date | Country | Kind |
---|---|---|---|
19305394.9 | Mar 2019 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2020/057078 | 3/16/2020 | WO | 00 |