A Sequence Listing in XML format, entitled 5470-905WO_ST26.xml, 1,103,014 bytes in size, generated on Feb. 22, 2023 and filed herewith, is hereby incorporated by reference in its entirety for its disclosures.
This invention relates to human papilloma virus (HPV) positive cancers such as HPV+ squamous cell carcinoma of the oropharynx (OPSCC). This invention further relates to methods of determining treatment regimens, methods of stratifying prognosis from treatment of HPV positive cancers, methods of determining suitability for de-escalation of treatment of HPV positive cancers, and methods of treating HPV positive cancers.
HPV-positive (HPV+) squamous cell carcinoma of the oropharynx (OPSCC) is the most prevalent HPV-associated malignancy in the United States and is primarily caused by HPV16. HPV+ OPSCC has surpassed cervical cancer in incidence and is the most commonly diagnosed malignancy caused by HPV in the USA (Pan et al. Cancers Head Neck. 2018; 3). HPV+ OPSCC has an improved prognosis compared to non-HPV OPSCC; however, treatment can carry significant, lifelong therapeutic toxicity. While the concept of de-escalation of therapy represents an effort to limit morbidity while preserving tumor control, there are few available tools to select appropriate patients with favorable prognosis.
The present invention overcomes previous shortcomings in the art by providing methods of stratifying risk prognosis, determining treatment regimens, and determining suitability for de-escalated therapy for HPV-associated malignancies such as OPSCC.
One aspect of the present invention provides a method of determining a treatment regimen for a subject having human papillomavirus (HPV) positive (HPV+) cancer (e.g., OPSCC) or a subject at risk for or suspected to have or develop HPV+ cancer (e.g., OPSCC), comprising: a) obtaining a sample from the subject; b) detecting a level of expression of HPV virus genome and/or genome product in the sample; c) obtaining sequence information of the HPV viral genome and/or genome product in the sample; d) comparing the sequence information obtained in c) to a reference HPV viral genome; and e) determining the prognosis of the subject upon treatment to the cancer, wherein a low divergence of the sequence information of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome identifies the subject as a candidate for standard (e.g., “therapeutic”) treatment for OPSCC, and wherein a greater divergence identifies the subject as a suitable candidate for de-escalated (e.g., “sub-therapeutic”) treatment for the cancer.
Another aspect of the present invention provides a method of stratifying prognosis from treatment of human papillomavirus (HPV) positive cancer in a subject having human papillomavirus (HPV) positive (HPV+) cancer or a subject at risk for or suspected to have or develop HPV+ cancer, comprising: a) obtaining a sample from the subject; b) detecting a level of expression of HPV virus genome and/or genome product in the sample; c) obtaining sequence information of the HPV viral genome and/or genome product in the sample; d) comparing the sequence information obtained in c) to a reference HPV viral genome; and e) stratifying the prognosis of the subject upon treatment to the cancer, wherein a low divergence of the sequence information of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome categorizes the subject as having elevated risk of poor prognosis from de-escalated (e.g., “sub-therapeutic”) treatment to the cancer, and wherein a greater divergence categorizes the subject as having reduced risk of poor prognosis from de-escalated treatment to the cancer.
Another aspect of the present invention provides a method of determining suitability for de-escalation of treatment of human papillomavirus (HPV) positive cancer in a subject having human papillomavirus (HPV) positive (HPV+) cancer or a subject at risk for or suspected to have or develop HPV+ cancer, comprising: a) obtaining a sample from the subject; b) detecting a level of expression of HPV virus genome and/or genome product in the sample; c) obtaining sequence information of the HPV viral genome and/or genome product in the sample; d) comparing the sequence information obtained in c) to a reference HPV viral genome; and e) determining the prognosis of the subject upon treatment to the cancer, wherein a greater divergence identifies the subject as a suitable candidate for de-escalated (e.g., “sub-therapeutic”) treatment for the cancer.
Another aspect of the present invention provides a method of de-escalating treatment of human papillomavirus (HPV) positive cancer in a subject having human papillomavirus (HPV) positive (HPV+) cancer or a subject at risk for or suspected to have or develop HPV+ cancer and undergoing standard (e.g., “therapeutic,” “escalated”) treatment of the cancer, comprising: a) obtaining a sample from the subject; b) detecting a level of expression of HPV virus genome and/or genome product in the sample; c) obtaining sequence information of the HPV viral genome and/or genome product in the sample; d) comparing the sequence information obtained in c) to a reference HPV viral genome; e) identifying the risk of poor prognosis of the subject from de-escalated treatment to the cancer, wherein a low divergence of the sequence information of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome categorizes the subject as having elevated risk of poor prognosis from de-escalated (e.g., “sub-therapeutic”) treatment to the cancer, and wherein a greater divergence categorizes the subject as having reduced risk of poor prognosis from de-escalated treatment to the cancer; and f) treating the subject identified as having reduced risk of poor prognosis with de-escalated treatment as compared to standard treatment for the cancer.
Another aspect of the present invention provides a method of treating human papillomavirus (HPV) positive (HPV+) cancer in a subject having human papillomavirus (HPV) positive (HPV+) cancer or a subject at risk for or suspected to have or develop HPV+ cancer, comprising: a) obtaining a sample from the subject; b) detecting a level of expression of HPV virus genome and/or genome product in the sample; c) obtaining sequence information of the HPV viral genome and/or genome product in the sample; d) comparing the sequence information obtained in c) to a reference HPV viral genome; e) identifying the risk of poor prognosis of the subject from de-escalated treatment to the cancer, wherein a low divergence of the sequence information of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome categorizes the subject as having elevated risk of poor prognosis from de-escalated (e.g., “sub-therapeutic”) treatment to the cancer, and wherein a greater divergence categorizes the subject as having reduced risk of poor prognosis from de-escalated treatment to the cancer; and f) treating the subject identified as having reduced risk of poor prognosis with de-escalated treatment as compared to standard (e.g., “therapeutic”) treatment for the cancer.
In some embodiments, the sequence information of the HPV viral genome and/or genome product in the sample comprises RNA and/or DNA sequence information.
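By way of illustration only, the comparing and stratifying steps recited in the aspects above can be sketched as a simple mapping from a divergence value to a category. This is a minimal sketch, assuming that divergence has already been reduced to a single non-negative number and that a predetermined threshold is available. The names `Stratification` and `stratify`, the category labels, and the treatment of a value exactly equal to the threshold as "low" are hypothetical choices made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Stratification:
    divergence: float
    category: str                 # "low" or "greater" divergence
    deescalation_candidate: bool  # True only for greater divergence


def stratify(divergence: float, threshold: float) -> Stratification:
    """Map a divergence value onto the two categories described in the text.

    Low divergence (<= threshold, an assumed tie-breaking rule): elevated
    risk of poor prognosis from de-escalated treatment, so the subject is a
    candidate for standard treatment.
    Greater divergence (> threshold): reduced risk of poor prognosis from
    de-escalated treatment, so the subject is a candidate for de-escalation.
    """
    if divergence < 0:
        raise ValueError("divergence must be non-negative")
    if divergence <= threshold:
        return Stratification(divergence, "low", False)
    return Stratification(divergence, "greater", True)
```

In this sketch the threshold is supplied by the caller; no particular threshold value is implied.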
The present invention now will be described hereinafter with reference to the accompanying drawings and examples, in which embodiments of the invention are shown. This description is not intended to be a detailed catalog of all the different ways in which the invention may be implemented, or all the features that may be added to the instant invention. For example, features illustrated with respect to one embodiment may be incorporated into other embodiments, and features illustrated with respect to a particular embodiment may be deleted from that embodiment. Thus, the invention contemplates that in some embodiments of the invention, any feature or combination of features set forth herein can be excluded or omitted. In addition, numerous variations and additions to the various embodiments suggested herein will be apparent to those skilled in the art in light of the instant disclosure, which do not depart from the instant invention. Hence, the following descriptions are intended to illustrate some particular embodiments of the invention, and not to exhaustively specify all permutations, combinations, and variations thereof.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
All publications, patent applications, patents and other references cited herein are incorporated by reference in their entireties for the teachings relevant to the sentence and/or paragraph in which the reference is presented.
Unless the context indicates otherwise, it is specifically intended that the various features of the invention described herein can be used in any combination. Moreover, the present invention also contemplates that in some embodiments of the invention, any feature or combination of features set forth herein can be excluded or omitted. To illustrate, if the specification states that a composition comprises components A, B and C, it is specifically intended that any of A, B or C, or a combination thereof, can be omitted and disclaimed singularly or in any combination.
As used in the description of the invention and the appended claims, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Also as used herein, “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (“or”).
The term “about,” as used herein when referring to a measurable value such as an amount or concentration and the like, is meant to encompass variations of ±10%, ±5%, ±1%, ±0.5%, or even ±0.1% of the specified value as well as the specified value. For example, “about X” where X is the measurable value, is meant to include X as well as variations of ±10%, ±5%, ±1%, ±0.5%, or even ±0.1% of X. A range provided herein for a measurable value may include any other range and/or individual value therein.
As used herein, phrases such as “between X and Y” and “between about X and Y” should be interpreted to include X and Y. As used herein, phrases such as “between about X and Y” mean “between about X and about Y” and phrases such as “from about X to Y” mean “from about X to about Y.”
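As a non-limiting illustration of the two definitions above, the following minimal sketch tests whether a value falls within "about X" for a chosen tolerance and whether a value lies "between X and Y" with both endpoints included. The function names `is_about` and `is_between` and the default tolerance of 10% are assumptions made for the example.

```python
def is_about(value: float, x: float, tolerance: float = 0.10) -> bool:
    """Return True if value lies within x plus or minus tolerance * |x|.

    tolerance is a fraction (0.10 for 10%, 0.05 for 5%, and so on);
    the specified value x itself is always included.
    """
    return abs(value - x) <= tolerance * abs(x)


def is_between(value: float, x: float, y: float) -> bool:
    """Return True if value lies between x and y, endpoints included."""
    return min(x, y) <= value <= max(x, y)
```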
The term “comprise,” “comprises” and “comprising” as used herein, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the transitional phrase “consisting essentially of” means that the scope of a claim is to be interpreted to encompass the specified materials or steps recited in the claim and those that do not materially affect the basic and novel characteristic(s) of the claimed invention. Thus, the term “consisting essentially of” when used in a claim of this invention is not intended to be interpreted to be equivalent to “comprising.” With respect to the terms “comprising”, “consisting essentially of”, and “consisting of”, where one of these three terms is used herein, the presently disclosed subject matter can include the use of either of the other two terms.
The term “consists essentially of” (and grammatical variants), as applied to a polynucleotide or polypeptide sequence of this invention, means a polynucleotide or polypeptide that consists of both the recited sequence (e.g., SEQ ID NO) and a total of ten or fewer (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) additional nucleotides or amino acids on the 5′ and/or 3′ or N-terminal and/or C-terminal ends of the recited sequence or between the two ends (e.g., between domains) such that the function of the polynucleotide or polypeptide is not materially altered. The total of ten or fewer additional nucleotides or amino acids includes the total number of additional nucleotides or amino acids added together. The term “materially altered,” as applied to polynucleotides of the invention, refers to an increase or decrease in ability to express the encoded polypeptide of at least about 50% or more as compared to the expression level of a polynucleotide consisting of the recited sequence. The term “materially altered,” as applied to polypeptides of the invention, refers to an increase or decrease in biological activity of at least about 50% or more as compared to the activity of a polypeptide consisting of the recited sequence.
The term “sequence identity,” as used herein, has its standard meaning in the art. As is known in the art, a number of different programs can be used to identify whether a polynucleotide or polypeptide has sequence identity or similarity to a known sequence. Sequence identity or similarity may be determined using standard techniques known in the art, including, but not limited to, the local sequence identity algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the sequence identity alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Drive, Madison, WI), the Best Fit sequence program described by Devereux et al., Nucl. Acid Res. 12:387 (1984), preferably using the default settings, or by inspection.
An example of a useful algorithm is PILEUP. PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle, J. Mol. Evol. 35:351 (1987); the method is similar to that described by Higgins & Sharp, CABIOS 5:151 (1989).
Another example of a useful algorithm is the BLAST algorithm, described in Altschul et al., J. Mol. Biol. 215:403 (1990) and Karlin et al., Proc. Natl. Acad. Sci. USA 90:5873 (1993). A particularly useful BLAST program is the WU-BLAST-2 program which was obtained from Altschul et al., Meth. Enzymol., 266:460 (1996); blast.wustl/edu/blast/README.html. WU-BLAST-2 uses several search parameters, which are preferably set to their default values. The parameters are dynamic values and are established by the program itself depending upon the composition of the particular sequence of interest and the composition of the particular database against which the sequence of interest is being searched; however, the values may be adjusted to increase sensitivity.
An additional useful algorithm is gapped BLAST as reported by Altschul et al., Nucleic Acids Res. 25:3389 (1997).
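For illustration, the global alignment approach of Needleman & Wunsch cited above can be sketched with a small dynamic-programming routine. This is a simplified sketch, assuming a linear gap penalty and fixed match and mismatch scores; the function name `needleman_wunsch` and the default scoring values are assumptions made for the example and do not reproduce the defaults of any of the software packages named above.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

For genome-length sequences, an established alignment tool would ordinarily be used; this routine keeps the full scoring matrix in memory and is intended only to make the technique concrete.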
A percentage amino acid sequence identity value is determined by the number of matching identical residues divided by the total number of residues of the “longer” sequence in the aligned region. The “longer” sequence is the one having the most actual residues in the aligned region (gaps introduced by WU-BLAST-2 to maximize the alignment score are ignored).
In a similar manner, percent nucleic acid sequence identity is defined as the percentage of nucleotide residues in the candidate sequence that are identical with the nucleotides in the polynucleotide specifically disclosed herein.
The alignment may include the introduction of gaps in the sequences to be aligned. In addition, for sequences that contain either more or fewer nucleotides than the polynucleotides specifically disclosed herein, it is understood that in one embodiment, the percentage of sequence identity will be determined based on the number of identical nucleotides in relation to the total number of nucleotides. Thus, for example, sequence identity of sequences shorter than a sequence specifically disclosed herein, will be determined using the number of nucleotides in the shorter sequence, in one embodiment. In percent identity calculations, relative weight is not assigned to various manifestations of sequence variation, such as insertions, deletions, substitutions, etc.
In one embodiment, only identities are scored positively (+1) and all forms of sequence variation including gaps are assigned a value of “0,” which obviates the need for a weighted scale or parameters as described below for sequence similarity calculations. Percent sequence identity can be calculated, for example, by dividing the number of matching identical residues by the total number of residues of the “shorter” sequence in the aligned region and multiplying by 100. The “longer” sequence is the one having the most actual residues in the aligned region.
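As an illustration of the calculation just described, the following minimal sketch computes percent sequence identity from two sequences that have already been aligned, with gaps written as "-". Only identities are counted, gaps score zero, and the denominator is the number of actual residues of either the "shorter" or the "longer" sequence in the aligned region, selectable by the caller. The function name `percent_identity` and its parameters are assumptions made for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str,
                     denominator: str = "shorter") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Identical residues score +1; mismatches and gaps ("-") score 0.
    denominator selects whether the residue count of the "shorter" or
    the "longer" sequence (gaps excluded) is used for normalization.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    a, b = aligned_a.upper(), aligned_b.upper()

    # Count columns where both sequences carry the same residue.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")

    # Actual residues in each sequence, ignoring introduced gaps.
    residues_a = len(a) - a.count("-")
    residues_b = len(b) - b.count("-")

    if denominator == "shorter":
        total = min(residues_a, residues_b)
    elif denominator == "longer":
        total = max(residues_a, residues_b)
    else:
        raise ValueError("denominator must be 'shorter' or 'longer'")
    if total == 0:
        raise ValueError("no residues to compare")
    return 100.0 * matches / total
```

For example, aligning "ACGT-A" against "ACCTTA" gives four identical columns, five residues in the shorter sequence, and six in the longer, so the two denominators yield 80% and about 66.7%, respectively.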
As used herein, an “isolated” nucleic acid or nucleotide sequence (e.g., an “isolated DNA” or an “isolated RNA”) means a nucleic acid or nucleotide sequence separated or substantially free from at least some of the other components of the naturally occurring organism or virus, for example, the cell or viral structural components or other polypeptides or nucleic acids commonly found associated with the nucleic acid or nucleotide sequence.
As used herein, the term “nucleic acid” refers to a single or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases read from the 5′ to the 3′ end. The “nucleic acid” may also optionally contain non-naturally occurring or modified nucleotide bases. The term “nucleotide sequence” or “nucleic acid sequence” refers to both the sense and antisense strands of a nucleic acid, either as individual single strands or in the duplex. The term “ribonucleic acid” (RNA) is inclusive of RNAi (inhibitory RNA), dsRNA (double stranded RNA), siRNA (small interfering RNA), shRNA (short/small hairpin RNA), mRNA (messenger RNA), miRNA (micro-RNA), tRNA (transfer RNA, whether charged or discharged with a corresponding acylated amino acid), long non-coding RNA (lncRNA), ribosomal RNA (rRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA) and cRNA (complementary RNA), and the term “deoxyribonucleic acid” (DNA) is inclusive of cDNA and genomic DNA and DNA-RNA hybrids.
As used herein, “HPV” refers to any human papillomavirus, including HPV strains 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, and 68. In particular embodiments, the HPV strain is HPV-16. As used herein, “positive for HPV” or “HPV-positive” means the subject and/or sample has been infected with the HPV virus.
As used herein, “p16” refers to the p16 gene, as well as gene products encoded and/or derived therefrom.
As used herein, OPSCC refers to “oropharyngeal squamous cell carcinoma,” a cancer commonly referred to as “throat cancer” and/or “tonsil cancer,” derived from the squamous epithelium lining the middle part of the pharynx (the oropharynx), extending vertically from the soft palate to the superior area of the hyoid bone and including the base and posterior of the tongue, the tonsils, soft palate, and posterior and lateral pharyngeal walls. OPSCC can be categorized into HPV-positive and HPV-negative cancer, sometimes also referred to as p16 and/or HPV/p16-positive and negative cancer. p16 (also known as p16INK4a and/or CDKN2A) is a cyclin-dependent kinase inhibitor and tumor suppressor protein encoded by the CDKN2A gene, commonly used as a biomarker for epithelial neoplasia and optionally used as a proxy for HPV infection.
As used herein, a “genome product” refers to any material produced from the expression of a genome such as a viral genome, including but not limited to DNA, RNA (e.g., mRNA, miRNA, etc.), RNP, and proteins.
“Amino acid sequence” and terms such as “peptide”, “polypeptide”, and “protein” are used interchangeably herein, and are not meant to limit the amino acid sequence to the complete, native amino acid sequence (i.e., a sequence containing only those amino acids found in the protein as it occurs in nature) associated with the recited protein molecule.
As used herein, the terms “sequence information,” “nucleic acid sequencing information,” “nucleic acid sequencing data,” “nucleic acid sequence,” “genomic sequence,” “genome sequence,” “genetic sequence,” “fragment sequence,” “nucleic acid sequencing read,” “read,” and the like denote any information or data that is indicative of the order of the nucleotide bases (e.g., adenine, guanine, cytosine, and thymine/uracil) in a molecule (e.g., whole genome, whole transcriptome, exome, oligonucleotide, polynucleotide, fragment, etc.) of DNA or RNA. It should be understood that the present teachings contemplate sequence information obtained using all available varieties of techniques, platforms or technologies, including, but not limited to: capillary electrophoresis, microarrays, ligation-based systems, polymerase-based systems, hybridization-based systems, direct or indirect nucleotide identification systems, pyrosequencing, ion- or pH-based detection systems, electronic signature-based systems, etc. Further examples of sequencing technologies include those described in, e.g., U.S. PGPub US/2020/0051663 and US/2020/0002747, incorporated herein by reference.
As used herein, the term “dataset” refers to a collection of related sets of information, i.e., data, attained from experimental or computational analyses, comprising any type of data, including but not limited to nucleic acid sequences or amino acid sequences (i.e., “sequence information”). The dataset may be screened and/or otherwise searched for particular data of interest depending on variable parameters as defined by each particular dataset. In some embodiments, the dataset is a nucleic acid dataset, i.e., a dataset comprising nucleic acid sequences. The source material (e.g., healthy subject(s) and/or patient(s)) may be alternatively referred to as a database, a repository, a reference group, a cohort, a library, or any similar terminology understood in the art.
In some embodiments, a dataset of the invention may be a collection of reference viral genomes, also referred to as a reference library. In some embodiments, a reference library may comprise two or more viral genomes (e.g., two or more reference viral genomes). The viral genomes may be known viral genomes and/or newly isolated and/or sequenced viral genomes, and as such a reference library may comprise the sequence information of known viral genomes and/or of newly sequenced viral genomes including any viral genomes not yet known and/or sequenced. In some embodiments, a reference library of the present invention is a pre-existing reference library, de novo generated from newly sequenced viral genomes, or any combination thereof.
As used herein, the term “biomarker” can mean any chemical or biological entity that is produced by cells (e.g., cells of the subject), or substances that are produced by cells that might be then chemically modified by extracellular enzymes, free radicals produced by cells of the body and/or other naturally occurring processes and that is found, for example, in the saliva, urine, blood, vaginal secretion, tears, feces, sputum, hair, nails, skin, wound fluid, nasal swab, lymph, perspiration, oral mucosa, vaginal mucosa, or the anus, or in serum or plasma obtained from blood. In some embodiments, a biomarker of the present invention may be a polymorphism.
The terms “polymorphic” and “polymorphism” as used herein (e.g. genetic variation), refer to variation in the sequence of a gene in the genome or the encoded amino acid sequence thereof amongst a population, such as allelic variations and other variations that arise or are observed. Thus, a polymorphism refers to the occurrence of two or more genetically determined alternative sequences or alleles in a population. These differences can occur in coding and non-coding portions of the genome, and can be manifested or detected as differences in nucleic acid sequences and/or gene expression, including, for example, transcription, processing, translation, transport, protein processing, trafficking, DNA synthesis, expressed proteins, protein modifications, RNA expression modification, DNA and RNA methylation, regulatory factors that alter gene expression and DNA replication, other gene products or products of biochemical pathways or in post-translational modifications, and any other differences manifested amongst members of a population in genomic nucleic acid or organelle nucleic acids.
A polymorphic site or polymorphic position refers to a site in the nucleic acid sequence at which divergence occurs. Polymorphic markers include, but are not limited to, restriction fragment length polymorphisms, variable number of tandem repeats (VNTRs), hypervariable regions, minisatellites, dinucleotide repeats, trinucleotide repeats, tetranucleotide repeats and other repeating patterns, simple sequence repeats and insertional elements, such as Alu. Polymorphic forms also include different Mendelian alleles for a gene. A single nucleotide polymorphism (SNP) refers to a polymorphism that arises as the result of a single base change (i.e., a single nucleotide position), such as an insertion, deletion or change in a base. The term “genotype” refers to a description of the alleles of a gene or genes contained in an individual or a sample.
When compared to a reference genome, a particular genotype (e.g., of a genome in an individual or sample) may be referred to as having more or less divergence from the reference sequence, relative to the number of polymorphisms comprised in the genome as compared to the reference genome. Divergence as compared to a reference genome may be expressed in qualitative and/or quantitative terms, such as but not limited to, as a number of polymorphisms (e.g., SNPs, non-synonymous (i.e., coding) polymorphisms, synonymous polymorphisms, all polymorphisms, or any combination thereof), relative and/or absolute copy number of any one or more polymorphisms as compared to the reference genome, as well as a direct and/or normalized comparison of nucleic acid sequence information of any one or more genome product, genome fragment, and/or total genome as compared to the reference genome (e.g., percent sequence identity, sequence identity score, various sequence distance scores, and the like). As used herein, relative terms of divergence such as a “low” divergence or a “high” divergence may correspond with a quantifiable amount of divergence. For example, in some embodiments a low divergence may comprise a divergence of less than or equal to 100, 99.9, 99.5, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.75, 0.5, 0.25, 0.10, 0.05, or 0.01% from a reference genome, in terms of any of the above quantified sequence information variables (e.g., number of polymorphisms, copy number of polymorphisms, percent sequence identity, sequence identity score, various sequence distance scores and/or other sequence information).
In some embodiments a high divergence may comprise a divergence of greater than or equal to 100, 99.9, 99.5, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.75, 0.5, 0.25, 0.10, 0.05, or 0.01% from a reference genome, in terms of any of the above quantified sequence information variables. In some embodiments, a low divergence may comprise a divergence of less than or equal to a particular predetermined threshold for a relevant variable (e.g., number of polymorphisms, copy number of polymorphisms, percent sequence identity, sequence identity score, various sequence distance scores and/or other sequence information). In some embodiments, a “high” divergence may comprise a divergence of greater than or equal to such a predetermined threshold.
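By way of example, two of the quantitative measures named above, a count of polymorphic positions and a percent divergence, can be sketched for a sample sequence that has already been aligned to a reference sequence. This is a minimal sketch under stated assumptions: gaps are written as "-" and counted as differences, positions where either sequence carries an uncalled base "N" are skipped, and columns where both sequences carry a gap are ignored. The function name `divergence_from_reference` and these conventions are hypothetical and are not prescribed by the disclosure.

```python
def divergence_from_reference(aligned_sample: str, aligned_reference: str):
    """Return (polymorphic_positions, percent_divergence).

    polymorphic_positions lists the 1-based alignment columns at which
    the sample differs from the reference. percent_divergence is the
    number of such columns divided by the number of columns compared,
    multiplied by 100.
    """
    if len(aligned_sample) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    positions = []
    pairs = zip(aligned_sample.upper(), aligned_reference.upper())
    for column, (s, r) in enumerate(pairs, start=1):
        if s == "N" or r == "N":
            continue  # assumption: uncalled bases carry no information
        if s == "-" and r == "-":
            continue  # assumption: shared gap columns are not compared
        compared += 1
        if s != r:
            positions.append(column)

    if compared == 0:
        raise ValueError("no comparable positions")
    return positions, 100.0 * len(positions) / compared
```

Either returned quantity could then be compared against a predetermined threshold for the relevant variable, as described above.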
As used herein, the term “sample” is used in its broadest sense. In one sense, it is meant to include a specimen from a biological source. Biological samples can be obtained from animals (including humans) and encompass fluids (e.g., blood, mucus, urine, saliva), solids, tissues, cells, and gases. In some embodiments, the sample is obtained from a tumor (e.g., tumor stroma) in the subject. The sample may also comprise one or more immune cells, including T cells of the subject, including immune cells (e.g., helper T cells) from the tumor (e.g., tumor stroma) of the subject. Thus, in the methods of this invention, the sample can be any biological fluid or tissue that can be used in a method of this invention, including but not limited to, serum, plasma, blood, saliva, semen, lymph, cerebrospinal fluid, prostatic fluid, urine, sputum, oral mucosa, nasal mucosa, duodenal fluid, gastric fluid, skin, endothelium, biopsy material from a salivary gland, biopsy material of a parotid gland, biopsy material of other glands of the mouth, secretions of the salivary gland, secretions of the parotid gland, secretions of other glands of the mouth, joint fluid, body cavity fluid, tear fluid, anal secretions, vaginal secretions, perspiration, whole cells, cell extracts, tissue, biopsy material, aspirates, exudates, slide preparations, fixed cells, tissue sections, etc.
A “subject” of the invention may include any animal in need thereof. In some embodiments, a subject may be, for example, a mammal, a reptile, a bird, an amphibian, or a fish. A mammalian subject may include, but is not limited to, a laboratory animal (e.g., a rat, mouse, guinea pig, rabbit, primate, etc.), a farm or commercial animal (e.g., cattle, pig, horse, goat, donkey, sheep, etc.), or a domestic animal (e.g., cat, dog, ferret, gerbil, hamster, etc.). In some embodiments, a mammalian subject may be a primate, or a non-human primate (e.g., a chimpanzee, baboon, macaque (e.g., rhesus macaque, crab-eating macaque, stump-tailed macaque, pig-tailed macaque), monkey (e.g., squirrel monkey, owl monkey, etc.), marmoset, gorilla, etc.). In some embodiments, a mammalian subject may be a human. The terms “subject” and “patient” are in some embodiments used interchangeably herein, such as but not limited to in reference to a human subject or patient.
A “subject in need” of the methods of the invention can be any subject known or suspected to have an HPV+ cancer such as OPSCC to which the methods of the present invention disclosed herein may provide beneficial health effects, or a subject having an increased risk of developing the same.
As used herein the term “control” refers to a comparative sample and/or other reference source for a control subject.
The terms “administering” and “administration” of a treatment to a subject include any route of introducing or delivering to a subject a compound to perform its intended function. Administration can be carried out by any suitable route, including orally, intranasally, parenterally (intravenously, intramuscularly, intraperitoneally, intracisternally, intrathecally, intraventricularly, or subcutaneously), or topically. Administration includes self-administration and administration by another.
By the terms “treat,” “treating,” and “treatment of” (or grammatically equivalent terms) it is meant that the severity of the subject's condition is reduced or at least partially improved or ameliorated and/or that some alleviation, mitigation or decrease in at least one clinical symptom is achieved and/or there is a delay in the progression of the condition and/or prevention or delay of the onset of a disease or disorder.
As used herein, the terms “prevent,” “prevents,” and “prevention” (and grammatical equivalents thereof) refer to a delay in the onset of a disease or disorder or the lessening of symptoms upon onset of the disease or disorder. The terms are not meant to imply complete abolition of disease and encompass any type of prophylactic treatment that reduces the incidence of the condition or delays the onset and/or progression of the condition.
A “treatment effective” amount as used herein is an amount that is sufficient to provide some improvement or benefit to the subject. Alternatively stated, a “treatment effective” amount is an amount that will provide some alleviation, mitigation, decrease or stabilization in at least one clinical symptom in the subject. Those skilled in the art will appreciate that the therapeutic effects need not be complete or curative, as long as some benefit is provided to the subject.
A “prevention effective” amount as used herein is an amount that is sufficient to prevent and/or delay the onset of a disease, disorder and/or clinical symptoms in a subject and/or to reduce and/or delay the severity of the onset of a disease, disorder and/or clinical symptoms in a subject relative to what would occur in the absence of the methods of the invention. Those skilled in the art will appreciate that the level of prevention need not be complete, as long as some benefit is provided to the subject.
The term “enhance” or “increase” refers to an increase in the specified parameter of at least about 1.25-fold, 1.5-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 8-fold, 10-fold, twelve-fold, or even fifteen-fold, and/or at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% or more, or any value or range therein.
The term “inhibit” or “reduce” or grammatical variations thereof as used herein refers to a decrease or diminishment in the specified level or activity of at least about 15%, 25%, 35%, 40%, 50%, 60%, 75%, 80%, 90%, 95% or more. In particular embodiments, the inhibition or reduction results in little or essentially no detectible activity (at most, an insignificant amount, e.g., less than about 10% or even 5%).
In some embodiments, a subject of the present invention may be administered standard (e.g., “therapeutic”) treatment for a relevant disorder of the invention, such as but not limited to, OPSCC. Standard therapeutic treatment regimens for HPV+ cancers such as OPSCC are known in the art and can be readily implemented by the skilled artisan (e.g., an oncologist). As used herein, the term “escalate” and the like (e.g., “escalated,” “escalation,” etc.) may be used to refer to standard (e.g., “therapeutic” level) treatment for the disorder, and/or to refer to a return to standard/therapeutic level treatment following prior lack of treatment (naïve) or other (e.g., de-escalated, alternative, supratherapeutic, etc.) treatment.
In some embodiments, a subject of the present invention may be administered “de-escalated” (e.g., “sub-therapeutic”) treatment for a relevant disorder of the invention, such as but not limited to, OPSCC. As used herein, the term “de-escalate” and the like (e.g., “de-escalated,” “de-escalating,” etc.) refers to a reduction (e.g., to a decrease or diminishment in the specified dose or activity of at least about 15%, 25%, 35%, 40%, 50%, 60%, 75%, 80%, 90%, 95% or more) or elimination of a standard (e.g., “therapeutic”) treatment of the disorder (e.g., to a “sub-therapeutic” level of treatment). De-escalation may be inclusive of any one or more treatment(s) targeted to the disorder, in any combination, as well as de-escalation of an entire treatment regimen. In some embodiments, the de-escalation may be a reduction or elimination of, for example, radiation therapy, chemotherapy, immunotherapy, surgery, and/or intubation of the subject.
This invention relates to human papilloma virus (HPV) positive cancers such as HPV+ squamous cell carcinoma of the oropharynx (OPSCC). This invention further relates to methods of determining treatment regimens, methods of stratifying prognosis from treatment of HPV positive cancers, methods of determining suitability for de-escalation of treatment of HPV positive cancers, and methods of treating HPV positive cancers.
HPV subtype 16 (HPV16) is the most common high-risk HPV causing OPSCC. There are four main variant lineages (A, B, C, D) and at least 10 known sub-lineages (A1, A2, A3, A4, B1, B2, C, D1, D2, D3) of HPV16. Formerly, these sub-lineages were geographically termed European (A1-3), Asian (A4), African-1 (B), African-2 (C), and North-American/Asian-American (D1-3). Lineages are defined by 1-10% differences and sub-lineages defined by 0.5-1% differences in the L1 (capsid protein) sequence (Smith B, Chen Z, Reimers et al. PLOS ONE. 2011; 6(6): e21375; Burk et al. Virology. 2013; 445(1-2):232-243; Mirabello et al. J Natl Cancer Inst. 2016; 108(9)).
Along with OPSCC and other anogenital cancers, HPV16 is also the most common high-risk HPV associated with cervical cancer. HPV16 sub-lineage classification of cervical cancer has indicated that though lineage A1 is most prevalent, non-A HPV16 lineages (B/C/D) are associated with a higher risk of precancer and cancer (Burk 2013; Mirabello 2016). HPV16 lineage D variants are thought to be associated with the highest rate of persistent infection and progression to cervical cancer compared with other variants.
HPV16 viral oncoproteins E6 and E7 bind to p53 and pRb, respectively, and contribute to carcinogenesis, inhibition of apoptosis, and entry of the cell into the S phase. The E6 L83V polymorphism has been associated with infection persistence and progression of cervical carcinoma within the A1-3 sub-lineages. However, few studies have examined HPV16+ OPSCC.
RNA sequencing data have been used to determine HPV positivity and viral genome integration (Cancer Genome Atlas Network. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature. 2015; 517(7536):576-582). HPV integration in OPSCC has been reported to correlate with genomic methylation and tumor mutational burden (Parfenov et al. Proc Natl Acad Sci USA. 2014; 111(43):15544-15549), genomic instability (Akagi et al. Genome Res. 2014; 24(2):185-199), HPV gene expression (Walline et al. Mol Cancer Res MCR. 2016; 14(10):941-952), and tumor immune landscape and survival (Koneva et al. Mol Cancer Res MCR. 2018; 16(1):90-102).
HPV+ OPSCC has an improved prognosis compared to non-HPV OPSCC; however, treatment can carry significant, lifelong therapeutic toxicity. De-escalation of therapy represents an effort to limit morbidity while preserving tumor control (Cheraghlou et al. Cancer. 2018; 124(4):717-726; Chera et al. Cancer. 2018; 124(11):2347-2354; Chera et al. Clin Cancer Res Off J Am Assoc Cancer Res. 2019; 25(15):4682-4690; Marur et al. J Clin Oncol Off J Am Soc Clin Oncol. 2017; 35(5):490-497); however, initial results have been mixed and there are few available tools to select appropriate patients with favorable prognosis.
Thus, one aspect of the invention relates to a method of determining a treatment regimen for a subject having HPV positive (HPV+) cancer (e.g., OPSCC) or a subject at risk for or suspected to have or develop HPV+ cancer (e.g., OPSCC), comprising: a) obtaining a sample from the subject; b) detecting a level of expression of HPV virus genome and/or genome product in the sample; c) obtaining sequence information of the HPV viral genome and/or genome product in the sample; d) comparing the sequence information obtained in c) to a reference HPV viral genome; and e) determining the prognosis of the subject upon treatment to the cancer, wherein a low divergence of the sequence information of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome identifies the subject as a candidate for standard (e.g., “therapeutic”) treatment for OPSCC, and wherein a greater divergence identifies the subject as a suitable candidate for de-escalated (e.g., “sub-therapeutic”) treatment for the cancer.
Another aspect of the invention relates to a method of stratifying prognosis from treatment of HPV+ cancer in a subject having HPV+ cancer or a subject at risk for or suspected to have or develop HPV+ cancer, comprising: a) obtaining a sample from the subject; b) detecting a level of expression of HPV virus genome and/or genome product in the sample; c) obtaining sequence information of the HPV viral genome and/or genome product in the sample; d) comparing the sequence information obtained in c) to a reference HPV viral genome; and e) stratifying the prognosis of the subject upon treatment to the cancer, wherein a low divergence of the sequence information of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome categorizes the subject as having elevated risk of poor prognosis from de-escalated (e.g., “sub-therapeutic”) treatment to the cancer, and wherein a greater divergence categorizes the subject as having reduced risk of poor prognosis from de-escalated treatment to the cancer.
Another aspect of the invention relates to a method of determining suitability for de-escalation of treatment of human papillomavirus (HPV) positive cancer in a subject having human papillomavirus (HPV) positive (HPV+) cancer or a subject at risk for or suspected to have or develop HPV+ cancer, comprising: a) obtaining a sample from the subject; b) detecting a level of expression of HPV virus genome and/or genome product in the sample; c) obtaining sequence information of the HPV viral genome and/or genome product in the sample; d) comparing the sequence information obtained in c) to a reference HPV viral genome; and e) determining the prognosis of the subject upon treatment to the cancer, wherein a greater divergence identifies the subject as a suitable candidate for de-escalated (e.g., “sub-therapeutic”) treatment for the cancer.
In some embodiments, a low divergence identifies the subject as not a suitable candidate for de-escalated treatment for the cancer; e.g., identifies the subject as having elevated risk of poor prognosis from de-escalated treatment to cancer.
In some embodiments, methods of the present invention may further comprise treating the subject identified as a suitable candidate for de-escalated treatment and/or having reduced risk of poor prognosis with de-escalated treatment to the cancer.
Another aspect of the invention relates to a method of de-escalating treatment of human papillomavirus (HPV) positive cancer in a subject having human papillomavirus (HPV) positive (HPV+) cancer or a subject at risk for or suspected to have or develop HPV+ cancer and undergoing standard (e.g., “therapeutic,” “escalated”) treatment of the cancer, comprising: a) obtaining a sample from the subject; b) detecting a level of expression of HPV virus genome and/or genome product in the sample; c) obtaining sequence information of the HPV viral genome and/or genome product in the sample; d) comparing the sequence information obtained in c) to a reference HPV viral genome; e) identifying the risk of poor prognosis of the subject from de-escalated treatment to the cancer, wherein a low divergence of the sequence information of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome categorizes the subject as having elevated risk of poor prognosis from de-escalated (e.g., “sub-therapeutic”) treatment to the cancer, and wherein a greater divergence categorizes the subject as having reduced risk of poor prognosis from de-escalated treatment to the cancer; and f) treating the subject having reduced risk of poor prognosis with de-escalated treatment as compared to standard treatment for the cancer.
Another aspect of the invention relates to a method of treating human papillomavirus (HPV) positive (HPV+) cancer in a subject having human papillomavirus (HPV) positive (HPV+) cancer or a subject at risk for or suspected to have or develop HPV+ cancer, comprising: a) obtaining a sample from the subject; b) detecting a level of expression of HPV virus genome and/or genome product in the sample; c) obtaining sequence information of the HPV viral genome and/or genome product in the sample; d) comparing the sequence information obtained in c) to a reference HPV viral genome; e) identifying the risk of poor prognosis of the subject from de-escalated treatment to the cancer, wherein a low divergence of the sequence information of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome categorizes the subject as having elevated risk of poor prognosis from de-escalated (e.g., “sub-therapeutic”) treatment to the cancer, and wherein a greater divergence categorizes the subject as having reduced risk of poor prognosis from de-escalated treatment to the cancer; and f) treating the subject identified as having reduced risk of poor prognosis from de-escalated treatment with de-escalated treatment as compared to standard (e.g., “therapeutic”) treatment for the cancer.
The sequence information of the HPV viral genome and/or genome product of the invention may be obtained by any suitable method known in the art, including but not limited to standard nucleic acid sequencing methods as will be apparent to one skilled in the art upon review of the present disclosure. Many methods are known in the art for obtaining sequence information and are within the scope of the presently disclosed subject matter, including but not limited to whole genome (e.g., viral genome) sequencing, Sanger sequencing, next-generation sequencing (NGS), high-throughput sequencing, pyrosequencing (“454” sequencing), sequencing by ligation (SOLiD sequencing), nanopore sequencing, polony sequencing, massively parallel signature sequencing (MPSS), Illumina sequencing, metagenomic sequencing, polymerase chain reaction (PCR) amplification sequencing, target enrichment sequencing, RNA sequencing (RNAseq), chromatin-immunoprecipitation sequencing (ChIP-seq), and shotgun sequencing. In some embodiments, obtaining sequence information of the HPV viral genome and/or genome product in the sample may comprise obtaining RNA and/or DNA sequence information.
In some embodiments of the methods of the invention, the comparing step (e.g., of comparing the sequence information obtained to a reference HPV viral genome) may comprise identifying one or more polymorphism(s) of the detected HPV virus genome and/or genome product and comparing the identified one or more polymorphism(s) to polymorphisms of the reference HPV viral genome, wherein a divergence of 17 or fewer (e.g., 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, or fewer, or any value or range therein) polymorphisms (e.g., single-nucleotide polymorphisms (SNPs) and/or non-synonymous polymorphisms) of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome categorizes the subject as having elevated risk of poor prognosis from de-escalated treatment to the cancer, and wherein a divergence of 18 or more (e.g., 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more, or any value or range therein) polymorphisms (e.g., single-nucleotide polymorphisms (SNPs) and/or non-synonymous polymorphisms) to the reference HPV viral genome categorizes the subject as having reduced risk of poor prognosis from de-escalated treatment to the cancer.
In some embodiments, a low divergence of the detected HPV virus genome and/or genome product may be a divergence of 17 or fewer (e.g., 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, or fewer, or any value or range therein) polymorphisms of any kind as compared to the reference HPV viral genome, e.g., SNPs, non-synonymous (i.e., coding) polymorphisms, synonymous polymorphisms, or any combination thereof. In some embodiments, a low divergence of the detected HPV virus genome and/or genome product may be a divergence of 8 or fewer (e.g., 8, 7, 6, 5, 4, 3, 2, 1, or fewer, or any value or range therein) non-synonymous polymorphisms of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome.
In some embodiments of the methods of the invention, the comparing step (e.g., of comparing the sequence information obtained to a reference HPV viral genome) may comprise performing statistical and/or probabilistic analyses which may indicate a relative relationship to (“near” or “far”) the reference HPV viral genome. Upon review of the present disclosure, those skilled in the art will be familiar with numerous statistical and/or probabilistic analyses and variations thereof that can be useful for carrying out the methods of the presently disclosed subject matter. For example, in some embodiments, a nearest neighbor analysis may be useful for carrying out the methods of the invention. As used herein, the terms “nearest neighbor,” “nearest neighbor distribution,” “nearest neighbor function,” and the like refer to a mathematical function that is defined in relation to a point process, representable as randomly positioned points in time, space or both. Nearest neighbor functions are defined with respect to some point in the point process as being the probability distribution of the distance from this point to its nearest neighboring point in the same point process, and accordingly describe the probability of another point existing within some distance of a particular point.
Accordingly, in some embodiments of the methods of the invention, the comparing step (e.g., of comparing the sequence information obtained to a reference HPV viral genome) may comprise determining a nearest neighbor distribution of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome, wherein a nearest neighbor distribution of the sequence information of the detected HPV virus genome and/or genome product less than a pre-determined threshold as compared to (e.g., “grouping with”) the reference HPV viral genome categorizes the subject as having elevated risk of poor prognosis from de-escalated treatment to the cancer, and wherein a nearest neighbor distribution equal to or greater than the pre-determined threshold as compared to (e.g., “grouping outside” of) the reference HPV viral genome categorizes the subject as having reduced risk of poor prognosis from de-escalated treatment to the cancer.
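By way of illustration only, the polymorphism-count comparison described above might be sketched as follows. The function names, the use of pre-aligned equal-length nucleotide strings, and the treatment of every mismatched position as one polymorphism are assumptions made for this sketch; in practice a dedicated alignment and variant-calling workflow would be expected. The default cut-off of 17 is the value recited above for polymorphisms of any kind, and 8 may be substituted when only non-synonymous polymorphisms are counted.

```python
def count_polymorphisms(sample: str, reference: str) -> int:
    """Count mismatched positions between two pre-aligned, equal-length sequences.

    Positions where either sequence carries an ambiguous base call ("N") are skipped.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for s, r in zip(sample.upper(), reference.upper())
        if s != r and s != "N" and r != "N"
    )


def classify_by_polymorphism_count(n_polymorphisms: int,
                                   max_low_divergence: int = 17) -> str:
    """Map a polymorphism count to the risk category recited in the text.

    A count at or below the cut-off is "low divergence" (elevated risk of poor
    prognosis from de-escalated treatment); a higher count is "greater divergence".
    """
    if n_polymorphisms <= max_low_divergence:
        return "elevated risk"
    return "reduced risk"


if __name__ == "__main__":
    # Toy sequences for illustration; these are not HPV16 sequences.
    n = count_polymorphisms("ACGTACGT", "ACGAACGA")
    print(n, classify_by_polymorphism_count(n))
```

The cut-off is passed as a parameter so that the same sketch covers both the all-polymorphism embodiment and the non-synonymous-only embodiment.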
In some embodiments of the methods of the invention, the comparing step (e.g., of comparing the sequence information obtained to a reference HPV viral genome) may comprise calculating a sequence distance (e.g., identity score and/or % sequence identity) of the sequence information of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome, wherein a sequence distance equal to or less than a pre-determined threshold of the detected HPV virus genome and/or genome product as compared to (e.g., “near” to) the reference HPV viral genome categorizes the subject as having elevated risk of poor prognosis from de-escalated treatment to the cancer, and wherein a sequence distance greater than the pre-determined threshold of the detected HPV virus genome and/or genome product as compared to (e.g., “far” from) the reference HPV viral genome categorizes the subject as having reduced risk of poor prognosis from de-escalated treatment to the cancer.
In some embodiments of the methods of the invention, the comparing step (e.g., of comparing the sequence information obtained to a reference HPV viral genome) may comprise any combination of the above methods of comparing.
In some embodiments, the pre-determined threshold for the nearest neighbor distribution and/or the sequence distance is determined by a phylogenetic analysis (e.g., maximum parsimony phylogenetic analysis) of the reference HPV viral genome, wherein the reference HPV viral genome is a reference library (e.g., a de novo reference library or a pre-existing reference library) comprising a multitude of (e.g., at least two or more) reference HPV viral genomes.
In some embodiments, the phylogenetic analysis establishes at least two or more groups, wherein at least one group includes a HPV A1 reference genome (e.g., the “in-group,” the “high risk group”), and wherein at least one group does not include a HPV A1 reference genome (e.g., the “out-group,” the “low risk group”).
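As a simplified illustration of grouping a detected genome with the in-group or the out-group, the sketch below assigns a query sequence the group label of its closest reference in a small library. This is a plain nearest-reference lookup using Hamming distance on pre-aligned sequences. It is not the phylogenetic analysis or the nearest neighbor distribution described above, and the library contents, labels, and function names are hypothetical.

```python
from typing import Dict, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Number of differing positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def nearest_reference_group(query: str,
                            library: Dict[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Return (reference name, group label) of the library entry closest to the query.

    `library` maps a reference name to (aligned sequence, group label).
    Ties are resolved in favor of the entry listed first.
    """
    if not library:
        raise ValueError("reference library is empty")
    name = min(library, key=lambda n: hamming_distance(query, library[n][0]))
    return name, library[name][1]


if __name__ == "__main__":
    # Hypothetical toy library; real use would hold aligned HPV16 reference genomes.
    toy_library = {
        "A1_like": ("ACGTACGT", "in-group"),
        "divergent": ("ACGAACCA", "out-group"),
    }
    print(nearest_reference_group("ACGTACGA", toy_library))
```

A query that groups with the in-group would correspond to the high risk group, and one that groups with the out-group to the low risk group.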
In some embodiments, the low risk group may include but is not limited to any one of the following HPV genomes (e.g., HPV reference genomes) which sequences are provided in the SEQUENCES section, in any combination thereof: “UNCseq1628” SEQ ID NO:13, “UNCseq1770” SEQ ID NO:23, “UNCseq2162” SEQ ID NO:58, “UNCseq1750” SEQ ID NO: 22, “UNCseq1864” SEQ ID NO:30, “UNCseq1867” SEQ ID NO:31, “UNCseq2234” SEQ ID NO:62, “UNCseq2491” SEQ ID NO:80, “UNCseq2539” SEQ ID NO:82, “UNCseq2012” SEQ ID NO:48, “UNCseq2554” SEQ ID NO:83, “UNCseq2771” SEQ ID NO: 93, “UNCseq1796” SEQ ID NO:25, “UNCseq0733” SEQ ID NO: 1, “UNCseq2051” SEQ ID NO: 52, “UNCseq2789” SEQ ID NO: 102, “UNCseq2106” SEQ ID NO:56, “UNCseq1742” SEQ ID NO:20, “UNCseq1140” SEQ ID NO:4, “UNCseq1849” SEQ ID NO: 28, “UNCseq1980” SEQ ID NO:41, “UNCseq2005” SEQ ID NO:45, “UNCseq2105” SEQ ID NO:55, “UNCseq1360” SEQ ID NO:6, “UNCseq1525” SEQ ID NO:7, “UNCseq1891” SEQ ID NO:32, “UNCseq1662” SEQ ID NO: 14, “UNCseq1924” SEQ ID NO: 36, “UNCseq2292” SEQ ID NO:66, “UNCseq1994” SEQ ID NO:43, “UNCseq2007” SEQ ID NO:46, “UNCseq1834” SEQ ID NO:26, “UNCseq2783” SEQ ID NO:100, “UNCseq2786” SEQ ID NO:101, “UNCseq1693” SEQ ID NO:66, “UNCseq2033” SEQ ID NO: 51, “UNCseq0848” SEQ ID NO:2, “UNCseq2249” SEQ ID NO:63, “UNCseq2794” SEQ ID NO: 106, “UNCseq2393” SEQ ID NO: 72, “UNCseq2576” SEQ ID NO:84, “UNCseq2795” SEQ ID NO: 107, “UNCseq1697” SEQ ID NO: 17, “UNCseq1938” SEQ ID NO: 38, “UNCseq1991” SEQ ID NO:42, GenBank® Accession Nos. AF536179.1 (HPV16 A2), HQ644236.1 (HPV16 A3), AF534061.1 (HPV16 A4), AF536180.1 (HPV16 B1), KU053915.1 (HPV16 B2), HQ644298.1 (HPV16 B3), KU053914.1 (HPV16 B4), AF472509.1 (HPV16 C1), HQ644244.1 (HPV16 C2), KU053920.1 (HPV16 C3), KU053925.1 (HPV16 C4), HQ644257.1 (HPV16 D1), AY686579.1 (HPV16 D2), AF402678.1 (HPV16 D3), KU053931.1 (HPV16 D4), and/or any of SEQ ID NOs: 108-123, in any combination thereof.
In some embodiments, the high risk group may include any HPV genomes (e.g., HPV reference genomes) which sequences are provided in the SEQUENCES section other than those listed above, in any combination thereof, including but not limited to HPV16 A1.
In some embodiments, the high risk group is defined as inclusive of any viral genome with a divergence of 17 or fewer (e.g., 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, or fewer, or any value or range therein) polymorphisms (e.g., single-nucleotide polymorphisms (SNPs) and/or non-synonymous polymorphisms) as compared to the reference HPV viral genome.
In some embodiments, the high risk group is defined as inclusive of any viral genome with a divergence of 8 or fewer (e.g., 8, 7, 6, 5, 4, 3, 2, 1, or fewer, or any value or range therein) non-synonymous polymorphisms as compared to the reference HPV viral genome.
The sequence distance score may be calculated by any suitable method known in the art. In some embodiments, the sequence distance score is calculated according to JC69 or F81 sequence distances.
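For illustration, the JC69 (Jukes-Cantor 1969) distance between two aligned sequences can be computed from the proportion p of differing sites as d = -(3/4) ln(1 - (4/3) p). The sketch below assumes pre-aligned, equal-length nucleotide strings and ignores sites that are not unambiguous A, C, G, or T in both sequences; the function names and the threshold argument are hypothetical, since no numerical threshold is specified here.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites among sites that are A/C/G/T in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """JC69 distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def classify_by_distance(distance: float, threshold: float) -> str:
    """Distances at or below the threshold are "near" the reference (elevated risk)."""
    return "elevated risk" if distance <= threshold else "reduced risk"
```

The F81 distance generalizes JC69 by allowing unequal base frequencies and is not shown here.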
Polymorphisms relevant to the present invention include any known or as yet discovered polymorphisms between a relevant HPV genome and the reference HPV viral genome of HPV16 A1 strain GenBank® Accession No. NC_001526.4. In some embodiments, the one or more polymorphism(s) of the invention comprise all polymorphisms, single nucleotide polymorphisms (SNPs), non-synonymous (i.e., coding) polymorphisms, synonymous polymorphisms, or any combination thereof. For example, in some embodiments, the one or more polymorphism(s) comprise non-synonymous polymorphisms and/or SNPs. In some embodiments, the one or more polymorphism(s) comprises non-synonymous polymorphisms. In some embodiments, the one or more polymorphism(s) comprise any one or more of E5 I44L, E5 I65V, E2 P219S, L1 T266A, L2 L330F, E6 L90V, E1 S220T, L2 S269P, E2 T310K, E2 I210T, L1 T353P, E2 E232K, E2 A143T, L2 I420T, or E2 N203D polymorphisms, any one of the polymorphisms of
In some embodiments of the methods of the present invention, the identifying step (e.g., identifying one or more polymorphism(s) of the detected HPV virus genome and/or genome product and comparing the identified polymorphisms to polymorphisms of the reference HPV viral genome) comprises identifying the presence of one or more of a subset of specific polymorphisms in the detected HPV virus genome and/or genome product and comparing the identified polymorphisms to the presence of the subset of specific polymorphisms in the reference HPV viral genome, wherein a divergence of less than about 8 (e.g., about 5, 6, 7, 8, 9, 10 or more) polymorphisms of the detected HPV virus genome and/or genome product as compared to the reference HPV viral genome categorizes the subject as having elevated risk of poor prognosis from de-escalated treatment to the cancer, and wherein a divergence of greater than about 8 polymorphisms (e.g., about 5, 6, 7, 8, 9, 10 or more) to the reference HPV viral genome categorizes the subject as having reduced risk of poor prognosis from de-escalated treatment to the cancer. The subset of specific polymorphisms may comprise any known polymorphism of interest and/or a polymorphism not yet known or discovered. In some embodiments, the subset of specific polymorphisms may include, but is not limited to, E5 I44L, E5 I65V, E2 P219S, L1 T266A, L2 L330F, E6 L90V, E1 S220T, L2 S269P, E2 T310K, E2 I210T, L1 T353P, E2 E232K, E2 A143T, L2 I420T, or E2 N203D polymorphisms, any one of the polymorphisms of
The polymorphisms described above and elsewhere herein are denoted by [protein name, residue position relative to that protein], wherein the numbering corresponds to the relevant amino acid sequence encoded by the reference HPV viral genome of HPV16 A1 strain GenBank® Accession No. NC_001526.4. However, it would be readily understood by one of ordinary skill in the art that the equivalent amino acid positions in other HPV virus amino acid sequences or other HPV virus genome sequences can be readily identified and employed in the utilization of this invention.
Standard therapeutic treatment regimens for HPV+ cancers such as OPSCC are known in the art and can be readily implemented by the skilled artisan (e.g., an oncologist). De-escalation (“sub-therapeutic”) of standard (“therapeutic”) treatment may comprise a reduction of any standard treatment for a particular disorder which reduces an undesirable side effect of the standard therapeutic dose of said treatment. In some embodiments, de-escalated treatment comprises treating the subject with no and/or a reduced amount (e.g., 5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% reduced amount or any value or range therein) as compared to standard treatment.
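As an illustration of this notation, the sketch below parses labels such as “E6 L90V” into their parts. It assumes the conventional reading of such labels (protein name, residue in the reference, residue position, residue in the variant); the class name, function name, and regular expression are hypothetical and cover only single-residue substitutions in proteins named by one letter followed by digits.

```python
import re
from collections import Counter
from typing import NamedTuple


class AminoAcidChange(NamedTuple):
    protein: str            # e.g., "E6"
    reference_residue: str  # residue encoded by the reference genome
    position: int           # residue position within that protein
    variant_residue: str    # residue observed in the detected genome


# One capital letter plus digits for the protein, then <ref><position><alt>.
_LABEL = re.compile(r"^(?P<protein>[A-Z]\d+)\s+(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$")


def parse_polymorphism(label: str) -> AminoAcidChange:
    """Parse a label such as "E6 L90V" into its components."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized polymorphism label: {label!r}")
    return AminoAcidChange(
        protein=match.group("protein"),
        reference_residue=match.group("ref"),
        position=int(match.group("pos")),
        variant_residue=match.group("alt"),
    )


# The fifteen polymorphisms named in the text.
LISTED = [
    "E5 I44L", "E5 I65V", "E2 P219S", "L1 T266A", "L2 L330F",
    "E6 L90V", "E1 S220T", "L2 S269P", "E2 T310K", "E2 I210T",
    "L1 T353P", "E2 E232K", "E2 A143T", "L2 I420T", "E2 N203D",
]

if __name__ == "__main__":
    per_protein = Counter(parse_polymorphism(label).protein for label in LISTED)
    print(per_protein)
```

Parsed labels can then be matched against variant calls made relative to the same reference numbering.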
In some embodiments, de-escalated treatment of HPV+ OPSCC comprises treating the subject with no and/or a reduced amount (e.g., 5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% reduced amount or any value or range therein) as compared to standard treatment of one or more treatments including but not limited to radiation therapy, chemotherapy, immunotherapy, surgery, and/or intubation of the subject.
In some embodiments, de-escalated treatment to the cancer comprises reducing the effective amount of chemotherapy administered to the subject (e.g., reducing the effective amount to about 5% to about 30% of the effective amount of chemotherapy; e.g., wherein de-escalated treatment to OPSCC comprises treating the subject with a sub-therapeutic amount (e.g., about 5% to about 30% of the effective amount) of chemotherapy).
In some embodiments, de-escalated treatment to the cancer comprises reducing the effective amount of radiation therapy administered to the subject (e.g., reducing the effective amount to about 5% to about 30% of the effective amount of radiation therapy; e.g., wherein de-escalated treatment to OPSCC comprises treating the subject with a sub-therapeutic amount (e.g., about 5% to about 30% of the effective amount) of radiation therapy).
In some embodiments, de-escalated treatment to the cancer (e.g., HPV+ cancer, e.g., HPV+ OPSCC) comprises not performing surgery for removal of cancerous and/or precancerous tissue on the subject.
In some embodiments, the subject is receiving and/or has previously received standard (e.g., “therapeutic,” “escalated”) treatment for the cancer (e.g., HPV+ cancer, e.g., HPV+ OPSCC).
In some embodiments, the subject is not receiving and/or has not previously received standard (e.g., “therapeutic,” “escalated”) treatment for the cancer (i.e., wherein the subject is treatment naïve).
In some embodiments, the subject is newly diagnosed as having cancer (e.g., HPV+ cancer, e.g., HPV+ OPSCC).
The HPV virus genome and/or genome product in the sample may comprise the genome and/or any genome product of any human papillomavirus, including but not limited to HPV strains 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, and 68. In some embodiments, the HPV virus genome and/or genome product in the sample comprises the genome and/or a genome product of a strain of HPV16 (e.g., sub-lineage(s) HPV16 A1, HPV16 A2-3, HPV16 B1-4, HPV16 C1-4, and/or HPV16 D1-4).
The reference HPV viral genome may be any HPV viral genome of use as a reference for the utilization of the methods of this invention. In some embodiments, the reference HPV viral genome comprises a “high risk” HPV viral genome (e.g., an HPV viral genome encompassed in the high risk group, per phylogenetic analysis).
In some embodiments, the reference HPV viral genome comprises one or more reference HPV viral genomes (e.g., a reference library including but not limited to a de novo prepared reference library and/or a pre-existing reference library, e.g., UNCseq database).
In some embodiments, the reference HPV viral genome comprises the viral genome(s) of HPV strain GenBank® Accession No. NC_001526.4 (HPV16 A1, also numerated as K02718.1), AF536179.1 (HPV16 A2), HQ644236.1 (HPV16 A3), AF534061.1 (HPV16 A4), AF536180.1 (HPV16 B1), KU053915.1 (HPV16 B2), HQ644298.1 (HPV16 B3), KU053914.1 (HPV16 B4), AF472509.1 (HPV16 C1), HQ644244.1 (HPV16 C2), KU053920.1 (HPV16 C3), KU053925.1 (HPV16 C4), HQ644257.1 (HPV16 D1), AY686579.1 (HPV16 D2), AF402678.1 (HPV16 D3), KU053931.1 (HPV16 D4), any one of the sequences provided in the SEQUENCES section or any combination thereof.
In some embodiments, the reference HPV viral genome comprises, consists essentially of, or consists of the viral genome of HPV16 A1 strain GenBank® Accession No. NC_001526.4.
In some embodiments, the sample comprises a biopsy sample, blood sample, saliva sample, oral washing sample, and/or a direct tumor brushing/swabbing sample.
In some embodiments, the cancer is any HPV+ carcinoma, such as but not limited to HPV+ throat cancer and/or cervical cancer. In some embodiments, the cancer is HPV+ oropharyngeal squamous cell carcinoma (OPSCC).
The invention will now be described with reference to the following examples. It should be appreciated that these examples are not intended to limit the scope of the claims to the invention but are rather intended to be exemplary of certain embodiments. Any variations in the exemplified methods that occur to the skilled artisan are intended to fall within the scope of the invention.
Briefly, in order to develop prognostic biomarkers to identify low-risk patients with HPV-positive (HPV+) squamous cell carcinoma of the oropharynx (OPSCC) who are appropriate for reduced treatment intensity, targeted DNA sequencing including all HPV16 open reading frames was performed on tumors from 104 patients with HPV16+ OPSCC treated at a single center. Genotypes closely related to HPV16-A1 were associated with increased numbers of somatic copy-number variants in the human genome. Genotypes divergent from HPV16-A1 were strongly associated with favorable recurrence-free survival (RFS) as compared to HPV16-A1 (or similar genotypes); this finding was independent of tobacco smoke exposure. Total RNA sequencing was performed on a second cohort of 89 HPV16+ OPSCC cases. In this independent cohort, HPV16 genotypes divergent from HPV16-A1 were again found to be strongly prognostic of improved RFS in patients with moderate (less than 30 pack-years) or low (no more than 10 pack-years) tobacco smoke exposure. Genotypes divergent from HPV16-A1 were also associated with lower rates of viral integration in tumors from patients with low tobacco smoke exposure. Thus, sequence divergence from the HPV16-A1 reference sequence was found to be strongly correlated with improved recurrence-free survival in patients with moderate or low tobacco smoke exposure in two independent cohorts.
In detail, to examine the diversity of HPV16 oncogenic genotypes promoting OPSCC in the study cohort, polymorphisms with variant allele frequency >0.9 were considered to be clonal. Based on clonal polymorphisms alone, HPV16 coding genotypes were highly diverse, with 93 distinct protein coding HPV16 genomes amongst the 104 tumors examined. To assess selective pressure for protein sequence conservation, the ratio of coding to synonymous clonal sequence variants was considered. All polymorphisms relative to the HPV16 A1 reference were considered, as well as uncommon polymorphisms, defined as polymorphisms identified in less than 25% of tumors. Based on these metrics, E1^E4 and E2 demonstrated the least conservation amongst all viral genes (
The viral sequencing data had an average depth of coverage of the HPV16 genome of ~8,000, enabling analysis of sub-clonal genomic variants (
HPV16 sub-lineage was also defined by the nearest HPV16 sub-lineage reference sequence in sequence space as determined by the Jukes and Cantor (JC69) distance model (Jukes and Cantor. CHAPTER 24-Evolution of Protein Molecules. In: Munro H N, ed. Mammalian Protein Metabolism. Academic Press; 1969:21-132). Sixteen sub-lineages were queried based on contemporary studies (
Review of patient factors identified two patients who presented with distant metastases and were treated with palliative intent; therefore 96 patients were available for analysis of recurrence-free survival (RFS). The A1 sub-lineage was highly associated with poor recurrence-free survival (
Despite known associations of clinical stage and tobacco smoke exposure with outcome, there was no detectable association to viral sub-lineage (
The prevalence of common coding polymorphisms in HPV+ OPSCC was compared to a population-matched (same center and time period) cohort of 44 HPV16+ uterine cervical squamous cell carcinomas (UCSCC), sequenced with the same technique. The prevalence of common (clonal) coding polymorphisms was similar between HPV+ OPSCC and uterine cervical carcinoma (
In 14/104 tumors, deep copy loss (depth of coverage <1% of that of tumor matched average E7 coverage) of a portion of the HPV genome was noted (excluding the non-coding hypervariable region 3150-3351). These losses, however, represented only 1.2% of all potential genomic space (tumor HPV bases) interrogated. In only 2 of 104 cases (1.9%), genomic regions with deep loss accounted for more than 10% of the viral genome (75%, 47%). These large genomic losses are likely to be related to (clonal) genomic integration of the HPV16 viral genome. Although integration events can be present as sub-clonal genomic variation, deep loss of large segments of the HPV16 genome was quite uncommon in HPV+OPSCC (
Based on the clonal SNPs, relative to the HPV16-A1 reference, full HPV16 genotypes were reconstructed for each tumor. This approach may bias the few (n=2) cases with large areas of deep copy loss as being closer to the A1 reference sequence, as A1 reference is assumed for uncovered bases. To organize the sequence diversity, a maximum parsimony phylogenetic model was implemented. Common non-synonymous polymorphisms were highly correlated with the substructure of the phylogenetic tree. The most common protein-coding polymorphisms are displayed in
The maximum parsimony phylogeny of oncogenic oropharyngeal HPV16 viruses could be reasonably divided into two clades, with membership being highly correlated to the number of non-synonymous polymorphisms (as well as total number of SNPs) relative to the HPV16-A1 reference sequence (
Origins of Genomic Diversity in Oncogenic Oropharyngeal HPV16—Environmental Vs. Intra-Tumoral:
Considering that tumors harboring HPV16 genomes more distantly related to HPV16-A1 had distinct biological characteristics, it was considered whether the origins of the clonal (environmental) HPV genomic diversity of these groups were also distinct. SNPs and their related trinucleotide contexts were generated for identified non-synonymous polymorphisms (in the HPV16 genome). The analysis was limited to those polymorphisms identified in <25% of all tumors investigated (limiting analysis to even more uncommon polymorphisms did not change the qualitative results). Tumor groups were then stratified by viral clade as in
Sub-clonal viral polymorphisms, which likely arose during or after oncogenesis, also demonstrated a distinct pattern of trinucleotide contexts compared to clonal polymorphisms (
To determine if RNA sequencing data was sufficient to assign the HPV16 genotype of a given tumor to the viral clades investigated in
Validation RNA sequencing Cohort:
Considering the strong relationship of HPV16 viral genotype to recurrence free survival in the cohort presented above, studies were performed to validate the finding in a secondary, independent set of patients. Inclusion criteria were the same as the DNA sequencing cohort, with the exception that p16 status was not examined, because many patients with older archival tissue from OPSCC were not subject to routine p16 testing. 120 patients with archival FFPE from OPSCC were identified and processed for next generation RNA sequencing. Of these, 89 patients were found to express HPV16 genes and had sufficient clinical data for inclusion.
Prior studies have demonstrated a relationship between viral genomic integration and survival in HPV+OPSCC. To compare the prognostic values of HPV16 genotype and viral integration, integration status was assigned using a combination of human-viral split read pair identification, as well as the ratio of expression of HPV16 genes E6/E7 to E5/E2 (
In summary, two independent cohorts of HPV16+ OPSCC tumors were analyzed for a total of 187 cases, representing one of the largest reported series of HPV+ OPSCC with viral sequencing. Since HPV16 accounts for greater than 90% of OPSCC, and HPV sub-type may correlate with outcome (Nichols et al. J Otolaryngol—Head Neck Surg. 2013; 42(1):9), we limited our study to HPV16 positive tumors as confirmed by DNA or RNA sequencing. This study has uncovered a surprising degree of genomic diversity in oncogenic HPV16 viral genomes, with 93 distinct protein coding HPV16 genomes amongst the 104 tumors examined. This rich diversity has been previously under investigated in OPSCC.
HPV+ and HPV-negative OPSCC respond quite differently to treatment, with much better survival for patients with HPV+tumors (Ben-David et al. Nature. 2018; 560(7718):325-330). Excellent rates of 5-year disease control often come at the cost of relatively high rates of unfortunate treatment-related morbidities such as gastrostomy tube dependence and osteoradionecrosis of the mandible. Although there is interest in de-intensification of therapy for HPV+ OPSCC (Cheraghlou et al. Cancer. 2018; 124(4):717-726; Chera et al. Cancer. 2018; 124(11):2347-2354), inaccuracy of current prognostic markers such as smoking history, stage or radiologic characteristics in predicting long-term disease control are of concern to clinicians who fear undertreating a patient who may otherwise be cured with standard regimes (Cheraghlou et al. Cancer. 2018; 124(4):717-726). However, long-term analysis of patients treated with high-intensity chemoradiation protocols has revealed increased non-cancer mortality that may negate the survival advantage of aggressive therapy (Forastiere et al. J Clin Oncol Off J Am Soc Clin Oncol. 2013; 31(7):845-852; Tasoulas et al. Cancer Med. 2021; 10(10):3231-3239).
This study demonstrates that targeted HPV16 sequencing or total RNA sequencing from FFPE from routine clinical specimens is sufficient to acquire clinically relevant and risk-stratifying information that could inform treatment decisions. The prognostic value of HPV16 genotype classification is demonstrated herein within two independent cohorts for the subgroup of patients with no more than 10 pack-years of tobacco smoke exposure (the current eligibility threshold for reduced treatment intensity at our center and others). Although sequencing of the relatively small ~7900 base HPV genome from FFPE could be readily performed in many modern clinical laboratories, circulating cell-free HPV DNA shed from OPSCC cancer cells can also be detected, quantified and genotyped from clinical blood draw specimens. Therefore, it may also be possible to similarly acquire prognostic HPV16 genotypic data from clinical blood samples, making HPV16 genotyping an attractive biomarker for clinical application.
We have previously reported that genomic instability determined by copy number variant burden is also prognostic in HPV16+ OPSCC (Schrank et al. 2021 Cancer 127(15):2788-2800; incorporated herein by reference in its entirety). In the present study it is shown that HPV16-A1 and closely related genotypes are also associated with increased CNV burden. Genomic CNVs identified were primarily numerical chromosomal aberrations or large-scale sub-chromosomal amplifications or losses. This correlation could be explained by the fact that high-risk HPV16 infection and expression of HPV16 E6 and E7 have been demonstrated to directly result in numerical and structural chromosomal instability (Duensing and Münger. Cancer Res. 2002; 62(23):7075-7082).
Studies have established that HPV infection induces replication stress, causes DNA damage, and disrupts cellular DNA double-strand break (DSB) repair via multiple mechanisms, and that HPV-positive head and neck cancer cells are deficient in homologous recombination repair of double-strand breaks. Double-strand break repair dysfunction has been broadly linked to genomic instability in human cancers. Therefore, it is possible that functional polymorphisms in the HPV16 genome result in differential modulation of the host of human DNA damage response (DDR) factors known to be influenced by high-risk HPV infection. Low-risk HPVs (HPV6, HPV11) have been reported to have relatively diminished ability to disrupt the human DDR (Wallace N A. Trends Microbiol. 2020; 28(3):191-201).
Although somewhat counterintuitive that double-strand break mediated viral mutagenesis was correlated with the low-risk group of tumors (divergent from A1) with relatively stable human genomes (
Prior studies of uterine cervical cancer have demonstrated that HPV18 is more likely to integrate as compared to HPV16, suggesting that HPV genotype can influence the likelihood of genomic integration during carcinogenesis (McBride & Warburton. PLOS Pathog. 2017; 13(4): e1006211). In the present work, the RNA sequencing cohort, which allowed viral integration analysis, demonstrated a relationship between viral integration and HPV16 genotype in patients with <=10 pack-years of smoking. These data suggest that even more subtle genotypic variations in the HPV genome (amongst HPV16 viruses) may influence the chance of genomic integration. The data herein from two independent patient cohorts suggest that molecular risk-stratification based on HPV16 genotype may be used as a genomic tool for the identification of very low-risk patients, which may enable safe treatment de-escalation and limit long-term, treatment-related toxicity, a key goal in the field.
DNA sequencing data were collected as a part of the UNCseq tumor sequencing program. The UNCseq targeted sequencing platform involves sequencing exons of a custom list of 650 human genes (covering 3.4 M bases) and 10 pathogen genome segments in fixed or frozen cancer tissue and matched germline DNA from consenting local patients. This custom sequencing platform provided targeted coverage of all HPV16 open reading frames. Tumor sample identifiers and genomic data were derived from the clinical trial LCCC1108: Development of a Tumor Molecular Analyses Program and Its Use to Support Treatment Decisions. This IRB-approved trial opened in 2011. All studies were done with the approval of the Institutional Review Board, patient participation required written informed consent, and all studies were conducted in accordance with recognized ethical guidelines as described in U.S Common Rule. The UNCseq database was queried for all patients with HPV+ oropharyngeal cancer.
Via chart review of electronic medical records, demographic information was obtained for each study subject, including age, gender, race, and smoking history. Clinical stage at presentation according to the AJCC staging system (AJCC 8th edition) was recorded; clinical AJCC8 staging was used because many patients did not receive surgical treatment. Recurrence-free survival was defined from the date of initial diagnosis to the date at which evidence of recurrent disease was first documented after primary treatment. Cases that presented with distant metastasis (n=2) were excluded from recurrence-free survival analysis. All survival analyses were performed with the R Survival package. Cox proportional hazards models were implemented with the coxph function.
A pathologist examined H&E-stained slides from each case to confirm the diagnosis of squamous cell carcinoma. Automated DNA extraction was performed from FFPE tissue sections using the Promega Maxwell MD×16™ instruments (Promega), and the DNA was then fragmented by sonication. Subsequent quality assessments were performed by ultraviolet absorbance and quantity assessments. During DNA isolation and library preparation, DNA concentration was measured by fluorometry and DNA quality was evaluated using the Agilent 2100 Bioanalyzer high sensitivity assay. DNA libraries were pooled for deep sequencing using an Illumina HiSeq2500™ sequencer. Excluding the hypervariable region 3150-3351, median coverage of the HPV16 genome was 7904 with an IQR of 6867-7953. The hypervariable region 3150-3351 had median coverage of 3307 with an IQR of 837-7377. Data have been made publicly available through dbGaP accession number phs001713.v1.p1.
Formalin-fixed paraffin-embedded (FFPE) tissue samples were prepared for RNA isolation using the Maxwell 16 MDx Instrument (Promega AS3000) and the Maxwell 16 LEV RNA FFPE Kit (Promega AS1260) following the manufacturer's protocol (Promega 9FB167). After a pathology review of a hematoxylin and eosin (H&E) stained slide to identify tumor area, RNA was extracted from unstained slides using macrodissection. Total RNA quality was measured using a NanoDrop spectrophotometer (Thermo Scientific ND-2000C) and a TapeStation 4200 (Agilent G2991AA). Total RNA concentration was quantified using a Qubit 3.0 fluorometer (Life Technologies Q33216). Libraries were prepared with Illumina TruSeq Stranded Total RNA with Ribo-Zero protocol. Libraries were sequenced on an Illumina HiSeq2500 sequencer. Paired end read data, with read lengths of 75 were collected.
Raw reads were aligned to the human genome plus a comprehensive library of HPV virus sequences, using compiled reference sequences from the ViFi analysis pipeline. Human genomic mutations were not investigated as only a subset of samples had matched normal DNA sequencing. Tumors were considered to be HPV16 positive if they had more than 20,000 reads mapping to the HPV16 genome. The HPV16 A1 genotype, RefSeq NC_001526.4 (incorporated herein by reference in its entirety) was selected as the primary reference sequence for this study. The Varscan pipeline was used for variant/polymorphism calling, using the reference-free approach. The SnpEff pipeline was used to assign and prioritize variant effects. Consensus sequences were derived from the variant calls which represented clonal variants using in-house scripts. SNPs were only analyzed if the sequencing depth was >=100 and VAF >=0.5. Variants were considered subclonal if VAF <=0.9. The HPV tumor genomes were then aligned against each other and against 16 reference HPV16 sub-lineage sequences. Alignments were imported into R for phylogenetic analyses, where maximum parsimony phylogeny was constructed using default parameters provided in the phangorn package. To assign the nearest sub-lineages to the tumor HPV genomes, the R phangorn package was used to construct a sequence distance matrix for each tumor HPV and the 16 reference sequences. Each tumor sample was assigned to an HPV16 sub-lineage based on the “closest” reference sequence using the “JC69” distance.
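By way of non-limiting illustration, the variant classification, consensus construction, and nearest sub-lineage assignment described above can be sketched as follows. The analysis described herein used Varscan, in-house scripts, and the R phangorn package; the Python below is a simplified stand-in and not that pipeline. The function names, the toy sequences, and the reference label toy_B are hypothetical. The sketch assumes that sequences are already aligned, that positions are zero-based, and that only unambiguous A/C/G/T sites are compared. The distance is the standard Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3), where p is the fraction of mismatched sites.

```python
import math

MIN_DEPTH = 100    # SNPs analyzed only if sequencing depth >= 100
MIN_VAF = 0.5      # ... and variant allele frequency >= 0.5
CLONAL_VAF = 0.9   # VAF <= 0.9 is subclonal; VAF > 0.9 is clonal


def classify_snp(depth, vaf):
    """Return 'excluded', 'subclonal', or 'clonal' for one SNP call."""
    if depth < MIN_DEPTH or vaf < MIN_VAF:
        return "excluded"
    return "subclonal" if vaf <= CLONAL_VAF else "clonal"


def build_consensus(reference, snps):
    """Apply clonal SNPs to the reference; all other bases stay as reference.

    `snps` is a list of (zero_based_position, alt_base, depth, vaf) tuples
    (a hypothetical input layout chosen for this sketch).
    """
    seq = list(reference)
    for pos, alt, depth, vaf in snps:
        if classify_snp(depth, vaf) == "clonal":
            seq[pos] = alt
    return "".join(seq)


def jc69_distance(a, b):
    """Jukes-Cantor distance between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # the JC69 distance is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def nearest_sublineage(tumor_seq, references):
    """Return the name of the reference with the smallest JC69 distance."""
    return min(references, key=lambda name: jc69_distance(tumor_seq, references[name]))


# Toy data only; real inputs would be ~7.9 kb aligned HPV16 genomes.
reference = "ACGTACGTACGTACGTACGT"
snps = [(2, "A", 500, 0.98), (5, "T", 700, 0.95), (7, "C", 800, 0.60), (11, "G", 50, 0.99)]
tumor = build_consensus(reference, snps)
references = {"A1": reference, "toy_B": "ACATATGTACGTACGTACGA"}
nearest = nearest_sublineage(tumor, references)
```

In this toy case only the two SNPs with VAF above 0.9 and adequate depth enter the consensus, which mirrors the statement that the A1 reference base is assumed wherever no clonal variant is called.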
For each patient, targeted genomic sequencing including HLA-A, HLA-B, HLA-C was processed by the Optitype pipeline. Comparison with matched normal (blood) sequencing data, where available, demonstrated remarkable consistency of the results between tumor and matched normal blood. Therefore, tumor sequencing data was utilized for HLA typing for the purposes of this study. The netMHCpan pipeline was then applied to generate predicted binding affinities of all possible patient-matched viral peptide and MHC pairs. An empiric threshold of <325 nM was applied to identify high affinity interaction between MHC and viral peptides. The log fraction of high-affinity to all possible viral peptide MHC interactions was also investigated as a neo-antigenicity metric, as homozygous haplotypes of HLA loci in some patients made the background number of potential peptide-HLA interactions variable between patients.
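As a minimal sketch of the neo-antigenicity metric described above, the following Python counts predicted peptide-MHC pairs under the 325 nM threshold and reports the log fraction of high-affinity pairs. The function name and the toy affinity values are hypothetical, the predicted affinities are assumed to have been produced upstream (e.g., by netMHCpan), and the use of a base-10 logarithm is an assumption, since the base is not specified herein.

```python
import math

HIGH_AFFINITY_NM = 325.0  # empiric threshold: predicted affinity < 325 nM


def high_affinity_log_fraction(affinities_nm):
    """Return (n_high, log10 fraction) for one patient.

    `affinities_nm` holds the predicted binding affinity (nM) of every
    patient-matched viral peptide / HLA allele pair. Dividing by the total
    number of pairs normalizes for patients with homozygous HLA loci, who
    have fewer possible peptide-HLA combinations.
    """
    if not affinities_nm:
        raise ValueError("no peptide-MHC pairs supplied")
    n_high = sum(1 for a in affinities_nm if a < HIGH_AFFINITY_NM)
    if n_high == 0:
        return 0, float("-inf")
    # ASSUMPTION: base-10 logarithm
    return n_high, math.log10(n_high / len(affinities_nm))


# Toy affinities (nM), for illustration only.
toy_affinities = [12.0, 300.0, 325.0, 5000.0, 40000.0, 150.0, 900.0, 20000.0, 80.0, 60000.0]
n_high, log_fraction = high_affinity_log_fraction(toy_affinities)
```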
Based on prior reports, mutational contexts represented by COSMIC Version 2 Signature 13 (C>G variants) and Signature 2 (C>T variants) were considered to be potentially APOBEC related (Revathidevi et al. Cancer Lett. 2021; 496:104-116). Using the trinucleotide contexts most prominent in COSMIC Signatures 2 (T [C>T] A, T [C>T] C, T [C>T] T) and 13 (T [C>G] A, T [C>G] C, T [C>G] T), alterations were counted as potentially APOBEC related. A chi-squared test was applied to determine differences in minor proportions of APOBEC related variants. Nonnegative matrix factorization was used to estimate the contribution of different mutational processes influencing the HPV16 genome. This analysis was implemented and visualized with the R package DeconstructSigs, after generating trinucleotide context SNP matrices with in-house scripts.
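The context counting and proportion test described above can be illustrated with the following Python sketch. It is not the in-house script or the DeconstructSigs analysis. The function names and the toy counts are hypothetical; folding purine-centered calls onto the pyrimidine strand follows the usual convention for COSMIC trinucleotide contexts and is assumed here; and the chi-squared statistic is the uncorrected Pearson statistic for a 2x2 table (one degree of freedom), which may differ from the exact test variant used in the study.

```python
import math

# The six contexts named in the text: T[C>T]A/C/T and T[C>G]A/C/T.
APOBEC_CONTEXTS = {
    ("TCA", "T"), ("TCC", "T"), ("TCT", "T"),
    ("TCA", "G"), ("TCC", "G"), ("TCT", "G"),
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_apobec_like(trinucleotide, alt):
    """True if the SNP falls in one of the six listed contexts.

    `trinucleotide` is the reference context centered on the changed base.
    ASSUMPTION: calls centered on A or G are first converted to the
    complementary strand so that the mutated base is a pyrimidine.
    """
    if trinucleotide[1] in "AG":
        trinucleotide = reverse_complement(trinucleotide)
        alt = COMPLEMENT[alt]
    return (trinucleotide, alt) in APOBEC_CONTEXTS


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # survival function, 1 d.f.
    return chi2, p_value


# Toy table: rows are two tumor groups, columns are APOBEC-like vs other SNPs.
chi2, p_value = chi2_2x2(12, 8, 5, 25)
```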
Sequencing data were routed through an automated pipeline including a somatic workflow using paired tumor and normal libraries to detect somatic mutations, large and small indels, structural variants, and pathogenic organisms. Raw sequences were aligned using the BWA-mem algorithm and refined using an assembly-based realignment process to allow for accurate alignment of complex sequence variation. Only high confidence variants with phred-scaled quality scores greater than 30 were included in the analysis. Average target coverage was ~1000x.
Copy number calls were generated with the SynthEx algorithm using the tumor sequencing data and a library of 200 un-matched normal samples sequenced with the same technique. The SynthEx pipeline utilizes both on and off target reads to allow large-segment copy number variant calling across the human genome. A conservative approach was taken. Thirty replicates varying the parameter k (number of nearest neighbor) were done per tumor and the model with the fewest deviations from the expected copy number of 2 was selected. Sex chromosomes were excluded.
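The conservative replicate selection described above can be sketched as follows. This Python does not call SynthEx; it only illustrates choosing, among replicate models run with different values of k, the one with the fewest deviations from the expected copy number of 2. The function names and toy calls are hypothetical, and two details are assumptions: a deviation is counted as any segment whose integer call differs from 2, and ties are broken in favor of the smallest k.

```python
EXPECTED_COPY_NUMBER = 2


def count_deviations(segment_calls):
    """Number of segments whose call differs from the expected copy number."""
    return sum(1 for cn in segment_calls if cn != EXPECTED_COPY_NUMBER)


def select_conservative_model(replicates):
    """Pick the replicate with the fewest copy-number deviations.

    `replicates` maps each tested k (number of nearest neighbors) to the list
    of per-segment integer copy-number calls from that run, with sex
    chromosomes already excluded. ASSUMPTION: ties go to the smallest k.
    """
    if not replicates:
        raise ValueError("no replicates supplied")
    best_k = min(replicates, key=lambda k: (count_deviations(replicates[k]), k))
    return best_k, replicates[best_k]


# Toy example with three replicates instead of thirty.
toy_replicates = {
    5: [2, 3, 2, 1, 2],
    10: [2, 2, 2, 1, 2],
    15: [2, 3, 2, 2, 4],
}
best_k, best_calls = select_conservative_model(toy_replicates)
```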
The HPV16 A1 genotype, RefSeq NC_001526.4, was selected as the primary reference sequence for this study. Salmon was used to quantify RNA reads for NC_001526.4 as well as hg38 (Patro et al. Nat Methods. 2017; 14(4):417-419). Viral transcript read counts were transformed into log2 viral read counts per million total mapped reads (human and viral) prior to visualization and analysis. Tumors were considered to be HPV16 positive if there were more than 2000 reads mapping to HPV16 and the log10 ratio of HPV16/Human reads was >−4.5. Twenty-four cases were excluded based on these criteria.
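For illustration only, the two-part HPV16 positivity rule and the read-count transformation described above can be expressed as the following Python sketch. The function names and toy read counts are hypothetical, and the pseudocount added before taking the logarithm is an assumption introduced to keep zero counts defined; it is not stated herein.

```python
import math

MIN_HPV16_READS = 2000    # "more than 2000 reads mapping to HPV16"
MIN_LOG10_RATIO = -4.5    # log10(HPV16 reads / human reads) must exceed this


def is_hpv16_positive(hpv16_reads, human_reads):
    """Apply both RNA-cohort criteria; both must hold."""
    if human_reads <= 0:
        raise ValueError("human read count must be positive")
    if hpv16_reads <= MIN_HPV16_READS:
        return False
    return math.log10(hpv16_reads / human_reads) > MIN_LOG10_RATIO


def log2_cpm(viral_reads, total_mapped_reads, pseudocount=1.0):
    """log2 of viral reads per million total mapped reads (human and viral).

    ASSUMPTION: `pseudocount` is added to the per-million value so that a
    transcript with zero reads maps to 0 rather than negative infinity.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total mapped reads must be positive")
    return math.log2(viral_reads * 1e6 / total_mapped_reads + pseudocount)


# Toy examples.
positive = is_hpv16_positive(5000, 40_000_000)        # ratio about 10**-3.9
too_few = is_hpv16_positive(1500, 40_000_000)         # fails the read count
too_dilute = is_hpv16_positive(2500, 100_000_000)     # ratio about 10**-4.6
```

Requiring both criteria means that a deeply sequenced library cannot be called positive from a small absolute number of stray viral reads alone.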
The ViFi pipeline was used to identify discordant (human-viral) read pairs which cluster to potential integration sites in the human and HPV genomes. Tumors with more than 25 clustered discordant read pairs were classified as positive for (discordant) split reads. The ratio of HPV16 E6 and E7 to HPV16 E5 and E2 was calculated for each tumor based on quantification from Salmon. Above a ratio of −0.304, 88% of tumors also displayed locus-specific clusters of human-viral split read pairs. Below this threshold only 26% of tumors had human-viral split read pairs. Therefore, tumors with an E6E7/E5E2 ratio above this threshold were considered to have an integrated pattern of viral gene expression.
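The two integration indicators described above can be sketched in Python as follows. The sketch reports each indicator separately and does not combine them into a single call. The names are hypothetical. Because the quoted threshold of −0.304 is negative, the ratio is assumed here to be on a logarithmic scale, and it is further assumed to be computed as the mean log expression of E6 and E7 minus the mean log expression of E5 and E2; neither detail is specified herein.

```python
from dataclasses import dataclass

MIN_DISCORDANT_PAIRS = 25   # "more than 25 clustered discordant read pairs"
RATIO_THRESHOLD = -0.304    # E6E7/E5E2 threshold quoted in the text


@dataclass
class IntegrationEvidence:
    split_read_positive: bool
    integrated_expression_pattern: bool


def log_ratio_e6e7_to_e5e2(log_expr):
    """ASSUMPTION: mean log expression of E6/E7 minus mean of E5/E2."""
    early_oncogenes = (log_expr["E6"] + log_expr["E7"]) / 2.0
    disrupted_genes = (log_expr["E5"] + log_expr["E2"]) / 2.0
    return early_oncogenes - disrupted_genes


def assess_integration(n_discordant_pairs, log_expr):
    """Evaluate both indicators for one tumor."""
    return IntegrationEvidence(
        split_read_positive=n_discordant_pairs > MIN_DISCORDANT_PAIRS,
        integrated_expression_pattern=log_ratio_e6e7_to_e5e2(log_expr) > RATIO_THRESHOLD,
    )


# Toy tumors with hypothetical log-scale expression values.
integrated_like = assess_integration(40, {"E6": 10.0, "E7": 11.0, "E5": 8.0, "E2": 7.0})
episomal_like = assess_integration(3, {"E6": 9.0, "E7": 9.0, "E5": 10.0, "E2": 10.0})
```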
First, partial consensus sequences were constructed from viral BAM files output from the ViFi pipeline. With the goal of obtaining an approximate sequence, the Varscan pipeline was used for variant/polymorphism calling, using the reference-free approach and lax parameters, with a minimal coverage requirement of 1, a VAF minimum of 0.51, and no p-value cut-off. These “variants” were used to construct approximate consensus sequences in areas of the viral genome covered by the sequencing data. See Supplemental
The UNCseq database was queried for p16+ tumors originating from the anatomic oropharynx (tonsil or tongue base) with available tumor sequencing data as well as data on stage, treatment strategy, clinical outcome, and histopathology available. HPV16 positivity was confirmed by DNA sequencing reads which mapped to the HPV16 genome. For the DNA sequencing cohort, patients were excluded from clinical analyses if tumors were not p16+ or were from atypical oropharyngeal sub-sites (midline soft palate or lateral oropharyngeal wall) (
HPV-positive squamous cell carcinoma of the oropharynx (HPV+ OPSCC) is the most prevalent HPV-associated malignancy in the United States and is primarily caused by HPV16. Favorable treatment outcomes have led to increasing interest in treatment de-escalation to reduce treatment-related morbidity. Prognostic biomarkers are needed to identify appropriately low-risk patients for reduced treatment intensity. Large series of complete HPV16 genome sequencing from HPV+ OPSCC tumors are lacking in the literature. Therefore, this study tested the hypothesis that HPV16 genotype is prognostic of recurrence-free survival (RFS) in HPV16+ OPSCC.
Materials/Methods: Targeted sequencing of 104 patients with HPV16+ OPSCC tumors was performed, providing complete coverage of all HPV16 open reading frames. Clinical features were retrospectively extracted from the medical record. A second cohort of OPSCC patients was sequenced using total RNA sequencing, which identified 89 patients with HPV16+ OPSCC for analysis.
Results: A high degree of coding diversity in the HPV16 was identified, with 93 distinct protein-coding HPV16 genotypes amongst the 104 patients subject to HPV (DNA) sequencing. As found in uterine cervical carcinoma, E7 was the most conserved amongst HPV16 viral genes. Sub-clonal variants were more likely to be non-synonymous and were enhanced for APOBEC-related mutagenesis. The HPV16-A1 sub-lineage was the most prevalent (approximately 70%). Genotypes closely related to HPV16-A1 were associated with increased numbers of copy-number variants in the human genome. Genotypes divergent from HPV16-A1 were strongly associated with favorable RFS as compared to HPV16-A1 (or similar genotypes); this finding was independent of tobacco smoke exposure. HPV16 genotypes divergent from HPV16-A1 were subsequently validated in an independent cohort (subject to RNA sequencing), to be associated with improved RFS in patients with moderate (less than 30 pack-years) and low (no more than 10 pack-years) of tobacco smoke exposure.
Conclusion: HPV16 viral genotype is highly diverse in HPV associated OPSCC. Sequence divergence from the HPV16-A1 reference sequence is strongly associated with improved RFS in patients with moderate to no tobacco smoke exposure. This finding was confirmed in two independent cohorts. HPV16 genotype is a promising biomarker to guide therapeutic decision-making related to de-escalation therapy. Prognostic genotypic information can be obtained from clinical samples stored in FFPE applying either DNA or RNA sequencing technology.
The following samples (determined by phylogenetic analysis) are defined as low risk/de-escalation group:
“UNCseq1628” SEQ ID NO:13, “UNCseq1770” SEQ ID NO:23, “UNCseq2162” SEQ ID NO:58, “UNCseq1750” SEQ ID NO:22, “UNCseq1864” SEQ ID NO:30, “UNCseq1867” SEQ ID NO:31, “UNCseq2234” SEQ ID NO:62, “UNCseq2491” SEQ ID NO:80, “UNCseq2539” SEQ ID NO:82, “UNCseq2012” SEQ ID NO:48, “UNCseq2554” SEQ ID NO:83, “UNCseq2771” SEQ ID NO:93, “UNCseq1796” SEQ ID NO:25, “UNCseq0733” SEQ ID NO:1, “UNCseq2051” SEQ ID NO:52, “UNCseq2789” SEQ ID NO:102, “UNCseq2106” SEQ ID NO:56, “UNCseq1742” SEQ ID NO:20, “UNCseq1140” SEQ ID NO:4, “UNCseq1849” SEQ ID NO:28, “UNCseq1980” SEQ ID NO:41, “UNCseq2005” SEQ ID NO:45, “UNCseq2105” SEQ ID NO:55, “UNCseq1360” SEQ ID NO:6, “UNCseq1525” SEQ ID NO:7, “UNCseq1891” SEQ ID NO:32, “UNCseq1662” SEQ ID NO:14, “UNCseq1924” SEQ ID NO:36, “UNCseq2292” SEQ ID NO:66, “UNCseq1994” SEQ ID NO:43, “UNCseq2007” SEQ ID NO:46, “UNCseq1834” SEQ ID NO:26, “UNCseq2783” SEQ ID NO:100, “UNCseq2786” SEQ ID NO:101, “UNCseq1693” SEQ ID NO:16, “UNCseq2033” SEQ ID NO:51, “UNCseq0848” SEQ ID NO:2, “UNCseq2249” SEQ ID NO:63, “UNCseq2794” SEQ ID NO:106, “UNCseq2393” SEQ ID NO:72, “UNCseq2576” SEQ ID NO:84, “UNCseq2795” SEQ ID NO:107, “UNCseq1697” SEQ ID NO:17, “UNCseq1938” SEQ ID NO:38, “UNCseq1991” SEQ ID NO:42
Samples recited below which are not named above are defined as high risk/do not de-escalate.
UNCseq0733 (SEQ ID NO:1); UNCseq0848 (SEQ ID NO:2); UNCseq1009 (SEQ ID NO:3); UNCseq1140 (SEQ ID NO:4); UNCseq1310 (SEQ ID NO:5); UNCseq1360 (SEQ ID NO:6); UNCseq1525 (SEQ ID NO:7); UNCseq1527 (SEQ ID NO:8); UNCseq1583 (SEQ ID NO:9); UNCseq1588 (SEQ ID NO:10); UNCseq1593 (SEQ ID NO:11); UNCseq1610 (SEQ ID NO:12); UNCseq1628 (SEQ ID NO:13); UNCseq1662 (SEQ ID NO:14); UNCseq1678 (SEQ ID NO:15); UNCseq1693 (SEQ ID NO:16); UNCseq1697 (SEQ ID NO:17); UNCseq1710 (SEQ ID NO:18); UNCseq1712 (SEQ ID NO:19); UNCseq1742 (SEQ ID NO:20); UNCseq1743 (SEQ ID NO:21); UNCseq1750 (SEQ ID NO:22); UNCseq1770 (SEQ ID NO:23); UNCseq1787 (SEQ ID NO:24); UNCseq1796 (SEQ ID NO:25); UNCseq1834 (SEQ ID NO:26); UNCseq1840 (SEQ ID NO:27); UNCseq1849 (SEQ ID NO:28); UNCseq1851 (SEQ ID NO:29); UNCseq1864 (SEQ ID NO:30); UNCseq1867 (SEQ ID NO:31); UNCseq1891 (SEQ ID NO:32); UNCseq1897 (SEQ ID NO:33); UNCseq1906 (SEQ ID NO:34); UNCseq1918 (SEQ ID NO:35); UNCseq1924 (SEQ ID NO:36); UNCseq1930 (SEQ ID NO:37); UNCseq1938 (SEQ ID NO:38); UNCseq1954 (SEQ ID NO:39); UNCseq1958 (SEQ ID NO:40); UNCseq1980 (SEQ ID NO:41); UNCseq1991 (SEQ ID NO:42); UNCseq1994 (SEQ ID NO:43); UNCseq2000 (SEQ ID NO:44); UNCseq2005 (SEQ ID NO:45); UNCseq2007 (SEQ ID NO:46); UNCseq2010 (SEQ ID NO:47); UNCseq2012 (SEQ ID NO:48); UNCseq2025 (SEQ ID NO:49); UNCseq2032 (SEQ ID NO:50); UNCseq2033 (SEQ ID NO:51); UNCseq2051 (SEQ ID NO:52); UNCseq2056 (SEQ ID NO:53); UNCseq2083 (SEQ ID NO:54); UNCseq2105 (SEQ ID NO:55); UNCseq2106 (SEQ ID NO:56); UNCseq2116 (SEQ ID NO:57); UNCseq2162 (SEQ ID NO:58); UNCseq2166 (SEQ ID NO:59); UNCseq2182 (SEQ ID NO:60); UNCseq2209 (SEQ ID NO:61); UNCseq2234 (SEQ ID NO:62); UNCseq2249 (SEQ ID NO:63); UNCseq2253 (SEQ ID NO:64); UNCseq2254 (SEQ ID NO:65); UNCseq2292 (SEQ ID NO:66); UNCseq2298 (SEQ ID NO:67); UNCseq2333 (SEQ ID NO:68); UNCseq2337 (SEQ ID NO:69); UNCseq2344 (SEQ ID NO:70); UNCseq2376 (SEQ ID NO:71); UNCseq2393 (SEQ ID NO:72); UNCseq2413 (SEQ ID NO:73); UNCseq2426 (SEQ ID NO:74); UNCseq2427 (SEQ ID NO:75); UNCseq2430 (SEQ ID NO:76); UNCseq2450 (SEQ ID NO:77); UNCseq2468 (SEQ ID NO:78); UNCseq2476 (SEQ ID NO:79); UNCseq2491 (SEQ ID NO:80); UNCseq2523 (SEQ ID NO:81); UNCseq2539 (SEQ ID NO:82); UNCseq2554 (SEQ ID NO:83); UNCseq2576 (SEQ ID NO:84); UNCseq2594 (SEQ ID NO:85); UNCseq2601 (SEQ ID NO:86); UNCseq2708 (SEQ ID NO:87); UNCseq2750 (SEQ ID NO:88); UNCseq2767 (SEQ ID NO:89); UNCseq2768 (SEQ ID NO:90); UNCseq2769 (SEQ ID NO:91); UNCseq2770 (SEQ ID NO:92); UNCseq2771 (SEQ ID NO:93); UNCseq2772 (SEQ ID NO:94); UNCseq2773 (SEQ ID NO:95); UNCseq2774 (SEQ ID NO:96); UNCseq2775 (SEQ ID NO:97); UNCseq2778 (SEQ ID NO:98); UNCseq2779 (SEQ ID NO:99); UNCseq2783 (SEQ ID NO:100); UNCseq2786 (SEQ ID NO:101); UNCseq2789 (SEQ ID NO:102); UNCseq2790 (SEQ ID NO:103); UNCseq2791 (SEQ ID NO:104); UNCseq2792 (SEQ ID NO:105); UNCseq2794 (SEQ ID NO:106); UNCseq2795 (SEQ ID NO:107)
Sample HPV16 sub-lineage genotypes.
A1 (SEQ ID NO:108); A2 (SEQ ID NO:109); A3 (SEQ ID NO:123); A4 (SEQ ID NO:110); B1 (SEQ ID NO:111); B2 (SEQ ID NO:112); B3 (SEQ ID NO:113); B4 (SEQ ID NO:114); C1 (SEQ ID NO:115); C2 (SEQ ID NO:116); C3 (SEQ ID NO:117); C4 (SEQ ID NO:118); D1 (SEQ ID NO:119); D2 (SEQ ID NO:120); D3 (SEQ ID NO:121); D4 (SEQ ID NO:122)
The foregoing is illustrative of the present invention, and is not to be construed as limiting thereof. The invention is defined by the following claims, with equivalents of the claims to be included therein.
This application claims the benefit, under 35 U.S.C. § 119 (e), of U.S. Provisional Application No. 63/313,056, filed on Feb. 23, 2022, the entire contents of which are incorporated by reference herein.
This invention was made with government support under Grant Number DE029241 awarded by the National Institutes of Health. The government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2023/063005 | 2/22/2023 | WO |
Number | Date | Country | |
---|---|---|---|
63313056 | Feb 2022 | US |