The present invention relates to a cancer sub-type. Provided are methods for determining clinical prognosis and selecting whether to administer an anti-angiogenic therapeutic agent based on assessing from the expression level of biomarkers whether the cancer belongs to the sub-type.
Individualisation of therapy for cancer patients is desirable in order to ensure the most effective treatment for a particular patient. Currently, it is often difficult for healthcare professionals to identify cancer patients who will benefit from a given therapy regime. Thus, patients often needlessly undergo ineffective, toxic drug therapy. The advent of microarrays and molecular genomics has the potential to aid in the prediction of the response of an individual patient to a defined therapeutic regimen.
Angiogenesis is a key area for therapeutic intervention. This has promoted the development of a number of agents that target angiogenesis-related processes and pathways, including the market leader and first FDA-approved anti-angiogenic, bevacizumab (Avastin), produced by Genentech/Roche.
Treatment regimens that include bevacizumab have demonstrated broad clinical activity 1-10. However, no overall survival (OS) benefit has been shown after the addition of bevacizumab to cytotoxic chemotherapy in most cancers 8, 12-13. This suggests that a substantial proportion of tumours are either initially resistant or quickly develop resistance to VEGF blockade (the mechanism of action of bevacizumab). In fact, 21% of ovarian, 10% of renal and 33% of rectal cancer patients show partial regression when receiving bevacizumab monotherapy, suggesting that bevacizumab may be active in small subgroups of patients, but that such incremental benefits do not reach significance in unselected patients 15-18. As such, the availability of biomarkers of response to bevacizumab would improve assessment of treatment outcomes and thus enable the identification of patient subgroups that would receive the most clinical benefit from bevacizumab treatment.
Thus, there is a need for a test that would facilitate the stratification of patients based upon their predicted response to anti-angiogenic therapeutics, either in combination with standard of care or as a single-agent therapeutic. This would allow for the rapid identification of those patients who should receive alternative therapies.
A cancer with a given histopathological diagnosis may represent multiple diseases at a molecular level.
The present inventors have identified a molecular sub-type of high grade serous ovarian cancer (HGSOC) that has an improved prognosis and where the addition of bevacizumab to the treatment regimen significantly reduces overall survival and progression free survival. The sub-type is associated with an up-regulation in molecular signaling related to immune response and a down-regulation in molecular signaling related to angiogenesis and vasculature development, referred to herein as a “non-angiogenesis” or “immune” subtype. The inventors have found that this sub-type can be reliably identified using a range of biomarker expression signatures.
Thus, in a first aspect the invention provides a method for selecting whether to administer an anti-angiogenic therapeutic agent to a subject with cancer, comprising:
measuring the expression levels of at least 3 biomarkers in a sample from the subject,
wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B and assessing from the expression levels of the biomarkers whether the cancer belongs to a cancer sub-type
wherein the cancer sub-type is defined by the expression levels of a set of biomarkers associated with angiogenesis and a set of biomarkers associated with immune response
wherein if the cancer belongs to the sub-type an anti-angiogenic therapeutic agent is contraindicated
wherein the at least 3 biomarkers do not comprise at least two biomarkers selected from COL1A2, COL3A1, TIMP3, COL4A1, COL8A1, CDH11, TIMP2, ANGPTL2, and MMP14 and at least one biomarker selected from CIITA, XAF1 and CD74.
The cancer sub-type may be defined by the probesets listed in Tables A and B and by the expression levels of the corresponding genes in Tables A and B, which may be measured using the probesets. Negative values are indicative of decreased (mean) expression levels and positive values of increased (mean) expression levels.
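The composition rules of the first aspect can be sketched as a simple panel check. This is an illustrative reading only: the function name and the demo table contents are hypothetical, Tables A and B are passed in by the caller rather than reproduced, and the proviso is assumed to exclude a panel that contains two or more genes from the first named group together with one or more from the second.

```python
# Genes named in the proviso of the first aspect (copied from the text above).
PROVISO_GROUP_1 = frozenset({
    "COL1A2", "COL3A1", "TIMP3", "COL4A1", "COL8A1",
    "CDH11", "TIMP2", "ANGPTL2", "MMP14",
})
PROVISO_GROUP_2 = frozenset({"CIITA", "XAF1", "CD74"})


def panel_is_eligible(panel, table_a, table_b):
    """Return True if `panel` meets the composition rules of the first aspect.

    `table_a` and `table_b` are sets of gene symbols supplied by the caller;
    the real table contents are not reproduced in this sketch.
    """
    panel = set(panel)
    if len(panel) < 3:
        return False
    if len(panel & set(table_a)) < 2 or len(panel & set(table_b)) < 1:
        return False
    # Assumed reading of the proviso: excluded when the panel holds two or
    # more group-1 genes AND one or more group-2 genes.
    excluded = (len(panel & PROVISO_GROUP_1) >= 2
                and len(panel & PROVISO_GROUP_2) >= 1)
    return not excluded


# Hypothetical table contents, for illustration only.
demo_table_a = {"COL1A2", "COL3A1", "GENE_A1", "GENE_A2"}
demo_table_b = {"CD74", "GENE_B1"}

print(panel_is_eligible({"GENE_A1", "GENE_A2", "GENE_B1"}, demo_table_a, demo_table_b))  # True
print(panel_is_eligible({"COL1A2", "COL3A1", "CD74"}, demo_table_a, demo_table_b))       # False
```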
In a further aspect the invention provides a method for selecting whether to administer an anti-angiogenic therapeutic agent to a subject with cancer, comprising:
measuring the expression levels of at least 3 biomarkers in a sample from the subject,
wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B and assessing from the expression levels of the biomarkers whether the cancer belongs to a cancer sub-type
(optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B
wherein if the cancer belongs to the sub-type an anti-angiogenic therapeutic agent is contraindicated
wherein the at least 3 biomarkers do not comprise at least two biomarkers selected from COL1A2, COL3A1, TIMP3, COL4A1, COL8A1, CDH11, TIMP2, ANGPTL2, and MMP14 and at least one biomarker selected from CIITA, XAF1 and CD74.
According to a further aspect of the invention there is provided a method for selecting whether to administer an anti-angiogenic therapeutic agent to a subject with cancer, comprising:
measuring the expression levels of at least 3 biomarkers in a sample from the subject,
wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B and assessing from the expression levels of the biomarkers whether the cancer belongs to a cancer sub-type, wherein if the cancer belongs to the subtype the expression levels of the at least two biomarkers from Table A and the at least one biomarker from Table B are increased or decreased as defined for each biomarker in Table A and Table B (optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B
wherein if the cancer belongs to the sub-type an anti-angiogenic therapeutic agent is contraindicated.
In yet a further aspect, the present invention relates to a method for predicting the responsiveness of a subject with cancer to an anti-angiogenic therapeutic agent comprising:
allocating the cancer to a cancer sub-type by measuring the expression levels of at least 3 biomarkers in a sample from the subject, wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B [IMMUNE LIST] and assessing from the expression levels of the biomarkers whether the cancer belongs to the sub-type (optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B
classifying the subject as responsive or non-responsive to the anti-angiogenic therapeutic agent on the basis of allocation to the subtype, wherein if the cancer belongs to the sub-type it is predicted to be non-responsive to the anti-angiogenic therapeutic agent
wherein the at least 3 biomarkers do not comprise at least two biomarkers selected from COL1A2, COL3A1, TIMP3, COL4A1, COL8A1, CDH11, TIMP2, ANGPTL2, and MMP14 and at least one biomarker selected from CIITA, XAF1 and CD74.
The invention also relates to a method for predicting the responsiveness of a subject with cancer to an anti-angiogenic therapeutic agent comprising:
allocating the cancer to a cancer sub-type by measuring the expression level of at least 3 biomarkers in a sample from the subject, wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B and assessing from the expression levels of the biomarkers whether the cancer belongs to the sub-type wherein if the cancer belongs to the subtype the expression levels of the at least two biomarkers from Table A and the at least one biomarker from Table B are increased or decreased as defined for each biomarker in Table A and Table B
(optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B
classifying the subject as responsive or non-responsive to the anti-angiogenic therapeutic agent on the basis of allocation to the subtype, wherein if the cancer belongs to the sub-type it is predicted to be non-responsive to the anti-angiogenic therapeutic agent.
In a further aspect, the present invention relates to a method of determining clinical prognosis of a subject with cancer comprising:
measuring the expression level of at least 3 biomarkers in a sample from the subject,
wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B and assessing from the expression levels of the biomarkers whether the cancer belongs to a cancer sub-type
(optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B
classifying the subject as having a good prognosis if the cancer belongs to the sub-type wherein the at least 3 biomarkers do not comprise at least two biomarkers selected from COL1A2, COL3A1, TIMP3, COL4A1, COL8A1, CDH11, TIMP2, ANGPTL2, and MMP14 and at least one biomarker selected from CIITA, XAF1 and CD74.
The invention also relates to a method of determining clinical prognosis of a subject with cancer comprising:
measuring the expression level of at least 3 biomarkers in a sample from the subject,
wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B and assessing from the expression levels of the biomarkers whether the cancer belongs to a cancer sub-type wherein if the cancer belongs to the subtype the expression levels of the at least two biomarkers from Table A and the at least one biomarker from Table B are increased or decreased as defined for each biomarker in Table A and Table B
(optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B
classifying the subject as having a good prognosis if the cancer belongs to the sub-type.
In yet a further aspect, the present invention relates to a method for selecting whether to administer an anti-angiogenic therapeutic agent to a subject with cancer, comprising:
measuring the expression levels of at least 2 biomarkers in a sample from the subject and assessing from the expression levels of the biomarkers whether the cancer belongs to a cancer sub-type
(optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B
wherein if the cancer belongs to the sub-type an anti-angiogenic therapeutic agent is contraindicated
wherein the at least 2 biomarkers do not consist of from 1 to 63 of the biomarkers shown in Table C.
The genes from Table C are shown ranked in Table D and probesets that can be used to detect these genes are shown in Table E.
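The Table C proviso can be illustrated in the same spirit. The sketch below assumes that "consist of" means every biomarker in the panel is a Table C gene, so a panel is excluded only when it is drawn entirely from Table C. The function name and the demo gene symbols are hypothetical, and Table C itself is supplied by the caller.

```python
def panel_passes_table_c_proviso(panel, table_c):
    """Return True if a panel of at least 2 biomarkers is not made up
    solely of Table C biomarkers (assumed reading of "consist of")."""
    panel = set(panel)
    if len(panel) < 2:
        return False
    # A panel "consists of" Table C biomarkers when it is a subset of Table C.
    return not panel <= set(table_c)


# Hypothetical stand-in for the 63 Table C biomarkers.
demo_table_c = {"GENE_C1", "GENE_C2", "GENE_C3"}

print(panel_passes_table_c_proviso({"GENE_C1", "GENE_C2"}, demo_table_c))  # False
print(panel_passes_table_c_proviso({"GENE_C1", "GENE_X"}, demo_table_c))   # True
```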
According to a further aspect of the invention there is provided a method for predicting the responsiveness of a subject with cancer to an anti-angiogenic therapeutic agent comprising: allocating the cancer to a cancer sub-type by measuring the expression levels of at least 2 biomarkers in a sample from the subject and assessing from the expression levels of the biomarkers whether the cancer belongs to the sub-type
(optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B
classifying the subject as responsive or non-responsive to the anti-angiogenic therapeutic agent on the basis of allocation to the subtype, wherein if the cancer belongs to the sub-type it is predicted to be non-responsive to the anti-angiogenic therapeutic agent
wherein the at least 2 biomarkers do not consist of from 1 to 63 of the biomarkers shown in Table C.
In yet a further aspect, the present invention relates to a method of determining clinical prognosis of a subject with cancer comprising:
measuring the expression levels of at least 2 biomarkers in a sample from the subject and assessing from the expression levels of the biomarkers whether the cancer belongs to a cancer sub-type
(optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B
classifying the subject as having a good prognosis if the cancer belongs to the sub-type wherein the at least 2 biomarkers do not consist of from 1 to 63 of the biomarkers shown in Table C.
According to all relevant aspects of the invention the subject (whose clinical prognosis is determined) is receiving, has received and/or will receive a standard chemotherapeutic treatment for the subject's cancer type and/or has not received, is not receiving and/or will not receive an anti-angiogenic therapeutic agent. In certain embodiments the standard chemotherapeutic treatment comprises, consists essentially of or consists of a platinum-based chemotherapeutic agent, a mitotic inhibitor, or a combination thereof. In specific embodiments the standard chemotherapeutic treatment comprises, consists essentially of or consists of carboplatin (or cisplatin) and/or paclitaxel.
Good prognosis may indicate increased progression free survival and/or overall survival rates and/or decreased likelihood of recurrence or metastasis compared to subjects with cancers that do not belong to the sub-type. Metastasis, or metastatic disease, is the spread of a cancer from one organ or part to another non-adjacent organ or part. The new occurrences of disease thus generated are referred to as metastases.
A therapeutic agent is “contraindicated” or “detrimental” to a patient if the cancer's rate of growth is accelerated as a result of contact with the therapeutic agent, compared to its growth in the absence of contact with the therapeutic agent and/or if the therapeutic agent is toxic to a patient. Growth of a cancer can be measured in a variety of ways, for instance by measuring the size of a tumour or the expression of tumour markers appropriate for that tumour type. A therapeutic agent can also be considered “contraindicated” or “detrimental” if the patient's overall prognosis (progression free survival and/or overall survival) is reduced by the administration of the therapeutic agent.
A cancer is “responsive” to a therapeutic agent if its rate of growth is inhibited as a result of contact with the therapeutic agent, compared to its growth in the absence of contact with the therapeutic agent. Growth of a cancer can be measured in a variety of ways, for instance by measuring the size of a tumour or the expression of tumour markers appropriate for that tumour type. A cancer can also be considered responsive to a therapeutic agent if the patient's overall prognosis (progression free survival and/or overall survival) is improved by the administration of the therapeutic agent.
A cancer is “non-responsive” to a therapeutic agent if its rate of growth is not inhibited, or inhibited to a very low degree or to a non-statistically significant degree, as a result of contact with the therapeutic agent when compared to its growth in the absence of contact with the therapeutic agent. As stated above, growth of a cancer can be measured in a variety of ways, for instance by measuring the size of a tumour or the expression of tumour markers appropriate for that tumour type. A cancer can also be considered non-responsive to a therapeutic agent if the patient's overall prognosis (progression free survival and/or overall survival) is not improved by the administration of the therapeutic agent. Still further, non-responsiveness can be assessed using additional criteria beyond the size of a tumour such as, but not limited to, patient quality of life and degree of metastases.
In a further aspect, the present invention relates to a method of treating cancer comprising administering a chemotherapeutic agent to a subject wherein the subject is selected for treatment on the basis of a method as described herein and wherein an anti-angiogenic therapeutic agent is not administered (if the cancer is determined to belong to the subtype).
The invention also relates to a method of treating cancer comprising administering a chemotherapeutic agent and not administering an anti-angiogenic therapeutic agent to a subject, wherein the subject has a cancer that has been determined to belong to a cancer sub-type, (optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B, by either:
(i) measuring the expression levels of at least 3 biomarkers in a sample from the subject,
wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B and assessing from the expression levels of the biomarkers whether the cancer belongs to the cancer sub-type
wherein the at least 3 biomarkers do not comprise at least two biomarkers selected from COL1A2, COL3A1, TIMP3, COL4A1, COL8A1, CDH11, TIMP2, ANGPTL2, and MMP14 and at least one biomarker selected from CIITA, XAF1 and CD74; or
(ii) measuring the expression levels of at least 2 biomarkers in a sample from the subject and assessing from the expression levels of the biomarkers whether the cancer belongs to the cancer sub-type
wherein the at least 2 biomarkers do not consist of from 1 to 63 of the biomarkers shown in Table C.
According to a further aspect of the invention there is provided a chemotherapeutic agent for use in treating cancer in a subject wherein the subject is selected for treatment on the basis of a method as described herein and wherein the subject is not treated with an anti-angiogenic therapeutic agent (if the cancer is determined to belong to the subtype).
In yet a further aspect, the present invention relates to a chemotherapeutic agent for use in treating cancer in a subject wherein the subject has a cancer that has been determined to belong to a cancer sub-type, (optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B, by either:
(i) measuring the expression levels of at least 3 biomarkers in a sample from the subject, wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B and assessing from the expression levels of the biomarkers whether the cancer belongs to the cancer sub-type
wherein the at least 3 biomarkers do not comprise at least two biomarkers selected from COL1A2, COL3A1, TIMP3, COL4A1, COL8A1, CDH11, TIMP2, ANGPTL2, and MMP14 and at least one biomarker selected from CIITA, XAF1 and CD74; or
(ii) measuring the expression levels of at least 2 biomarkers in a sample from the subject and assessing from the expression levels of the biomarkers whether the cancer belongs to the cancer sub-type
wherein the at least 2 biomarkers do not consist of from 1 to 63 of the biomarkers shown in Table C
and wherein the subject is not treated with an anti-angiogenic therapeutic agent.
The invention also relates to a method of treating cancer comprising administering a chemotherapeutic agent to a subject wherein the subject has a cancer that belongs to a cancer sub-type defined by the expression levels of the genes in Tables A and B and wherein an anti-angiogenic therapeutic agent is not administered.
In a further aspect, the present invention relates to a chemotherapeutic agent for use in treating cancer in a subject wherein the subject has a cancer that belongs to a cancer sub-type defined by the expression levels of the genes in Tables A and B and wherein the subject is not treated with an anti-angiogenic therapeutic agent.
According to all aspects of the invention the chemotherapeutic agent may comprise a platinum-based chemotherapeutic agent, an alkylating agent, an anti-metabolite (such as 5FU), an anti-tumour antibiotic, a topoisomerase inhibitor, a mitotic inhibitor, or a combination thereof. In certain embodiments the chemotherapeutic agent comprises a platinum-based chemotherapeutic agent, a mitotic inhibitor, or a combination thereof. In specific embodiments the chemotherapeutic agent comprises carboplatin and/or paclitaxel. The chemotherapeutic agent may reflect the standard of care treatment for the cancer. The standard of care treatment may differ for different types of cancer: for example, carboplatin in ovarian cancer, 5FU in colorectal cancer, and platinum in head and neck cancer.
According to all aspects of the invention assessing whether the cancer belongs to the sub-type may comprise the use of classification trees.
According to all aspects of the invention assessing whether the cancer belongs to the sub-type may comprise:
determining a sample expression score for the biomarkers;
comparing the sample expression score to a threshold score; and
determining whether the sample expression score is above or equal to, or below, the threshold expression score,
wherein if the sample expression score is above or equal to the threshold expression score the cancer belongs to the sub-type.
The sample expression score and threshold score may also be determined such that if the sample expression score is below or equal to the threshold expression score the cancer belongs to the sub-type.
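The comparison step, including the alternative orientation just described, might be sketched as follows. The function and parameter names are illustrative assumptions, not terms used in the specification.

```python
def belongs_to_subtype(sample_score, threshold_score, subtype_is_high=True):
    """Compare a sample expression score with a threshold score.

    subtype_is_high=True  -> score at or above the threshold means sub-type.
    subtype_is_high=False -> score at or below the threshold means sub-type
                             (the alternative orientation described above).
    """
    if subtype_is_high:
        return sample_score >= threshold_score
    return sample_score <= threshold_score


print(belongs_to_subtype(0.62, 0.50))                         # True
print(belongs_to_subtype(0.62, 0.50, subtype_is_high=False))  # False
```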
“Expression levels” of biomarkers may be numerical values or directions of expression.
In certain embodiments the expression score is calculated using a weight value and/or a bias value for each biomarker. In specific embodiments the at least two biomarkers from Table A are weighted as 1/N where N is the number of biomarkers used from Table A and the at least one biomarker from Table B is weighted as 1/M where M is the number of biomarkers used from Table B.
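A minimal sketch of the 1/N and 1/M weighting is given below. How the two weighted sums are combined is not stated above, so the signs used here are an assumption (Table A contribution subtracted, Table B contribution added), as are the function name and the placeholder gene symbols.

```python
def expression_score(expr, genes_a, genes_b, sign_a=-1.0, sign_b=1.0):
    """Combine expression values using weights of 1/N (Table A) and 1/M (Table B).

    expr    : dict mapping gene symbol -> (normalised) expression value
    genes_a : Table A biomarkers used (N of them, at least two)
    genes_b : Table B biomarkers used (M of them, at least one)
    sign_a, sign_b : ASSUMED directions of contribution; not from the text.
    """
    n, m = len(genes_a), len(genes_b)
    if n < 2 or m < 1:
        raise ValueError("need at least two Table A and one Table B biomarker")
    part_a = sum(expr[g] for g in genes_a) / n  # each Table A gene weighted 1/N
    part_b = sum(expr[g] for g in genes_b) / m  # each Table B gene weighted 1/M
    return sign_a * part_a + sign_b * part_b


# Placeholder gene symbols and values, for illustration only.
demo_expr = {"GENE_A1": -1.0, "GENE_A2": -3.0, "GENE_B1": 2.0}
print(expression_score(demo_expr, ["GENE_A1", "GENE_A2"], ["GENE_B1"]))  # 4.0
```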
As used herein, the term “weight” refers to the absolute magnitude of an item in a mathematical calculation. The weight of each biomarker in a gene expression classifier may be determined on a data set of patient samples using learning methods known in the art. As used herein the term “bias” or “offset” refers to a constant term derived using the mean or median expression of the signature genes in a training set and is used to mean- or median-center each gene analyzed in the test dataset.
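The use of a bias term for centering can be sketched as follows, here with the median variant. The function names and the toy training values are hypothetical; only the idea of deriving a per-gene constant from a training set and subtracting it from test data comes from the text.

```python
from statistics import median


def fit_offsets(training_samples, genes):
    """Derive one bias (offset) per gene as its median in the training set."""
    return {g: median(sample[g] for sample in training_samples) for g in genes}


def center_sample(sample, offsets):
    """Median-center a test sample using offsets learned on the training set."""
    return {g: sample[g] - offsets[g] for g in offsets}


# Toy training data with placeholder gene symbols.
training = [
    {"GENE_1": 1.0, "GENE_2": 10.0},
    {"GENE_1": 3.0, "GENE_2": 20.0},
    {"GENE_1": 5.0, "GENE_2": 60.0},
]
offsets = fit_offsets(training, ["GENE_1", "GENE_2"])
print(center_sample({"GENE_1": 4.0, "GENE_2": 15.0}, offsets))
```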
By expression score is meant a compound decision score that summarizes the expression levels of the biomarkers. This may be compared to a threshold score that is mathematically derived from a training set of patient data. The threshold score is established with the purpose of maximizing the ability to separate cancers into those that belong to the sub-type and those that do not. The patient training set data is preferably derived from cancer tissue samples having been characterized by sub-type, prognosis, likelihood of recurrence, long term survival, clinical outcome, treatment response, diagnosis, cancer classification, or personalized genomics profile. Expression profiles, and corresponding decision scores from patient samples may be correlated with the characteristics of patient samples in the training set that are on the same side of the mathematically derived score decision threshold. In certain example embodiments, the threshold of the (optionally linear) classifier scalar output is optimized to maximize the sum of sensitivity and specificity under cross-validation as observed within the training dataset.
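One way to choose a threshold that maximizes the sum of sensitivity and specificity is sketched below. It is a simplification: it scans candidate thresholds over a single labelled set of scores and omits the cross-validation mentioned above. The function name and the toy data are assumptions.

```python
def best_threshold(scores, labels):
    """Pick the threshold maximizing sensitivity + specificity.

    scores : decision scores from training samples
    labels : True where the sample is known to belong to the sub-type
    A sample is called positive when its score is >= the threshold.
    Returns (threshold, sensitivity + specificity).
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both sub-type and non-sub-type samples")
    best = None
    for t in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        true_neg = sum(1 for s, y in zip(scores, labels) if not y and s < t)
        total = true_pos / positives + true_neg / negatives
        if best is None or total > best[1]:
            best = (t, total)
    return best


# Toy scores and labels, for illustration only.
print(best_threshold([0.1, 0.2, 0.7, 0.9], [False, False, True, True]))  # (0.7, 2.0)
```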
The overall expression data for a given sample may be normalized using methods known to those skilled in the art in order to correct for differing amounts of starting material, varying efficiencies of the extraction and amplification reactions, etc.
In one embodiment, the biomarker expression levels in a sample are evaluated by a linear classifier. As used herein, a linear classifier refers to a weighted sum of the individual biomarker intensities into a compound decision score (“decision function”). The decision score is then compared to a pre-defined cut-off score threshold, corresponding to a certain set-point in terms of sensitivity and specificity which indicates if a sample is equal to or above the score threshold (decision function positive) or below (decision function negative).
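Written out, a linear classifier of this kind could take the following form. The symbols are introduced here for illustration and are not defined in the text: x_i is the measured intensity of biomarker i, w_i its weight, b_i its bias (offset), n the number of biomarkers in the classifier, and T the cut-off score threshold.

```latex
f(x) = \sum_{i=1}^{n} w_i \,(x_i - b_i),
\qquad
\text{call}(x) =
\begin{cases}
\text{decision function positive}, & f(x) \ge T,\\
\text{decision function negative}, & f(x) < T.
\end{cases}
```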
Using a linear classifier on the normalized data to make a call (e.g. cancer belongs to the sub-type or not) effectively means to split the data space, i.e. all possible combinations of expression values for all genes in the classifier, into two disjoint segments by means of a separating hyperplane. This split is empirically derived on a large set of training examples. Without loss of generality, one can assume a certain fixed set of values for all but one biomarker, which would automatically define a threshold value for this remaining biomarker where the decision would change from, for example, belonging to the sub-type or not. The precise value of this threshold depends on the actual measured expression profile of all other biomarkers within the classifier, but the general indication of certain biomarkers remains fixed. Therefore, in the context of the overall gene expression classifier, relative expression can indicate if either up- or down-regulation of a certain biomarker is indicative of belonging to the sub-type or not. In certain example embodiments, a sample expression score above the threshold expression score indicates the cancer belongs to the subtype. In certain other example embodiments, a sample expression score above a threshold score indicates the subject has a good clinical prognosis compared to a subject with a sample expression score below the threshold score. In certain other example embodiments, a sample expression score above the threshold score indicates the subject has an increased relative risk of experiencing a detrimental effect, or having a poor prognosis, if an anti-angiogenic therapeutic agent is administered.
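The observation that fixing all but one biomarker defines a threshold for the remaining one can be worked through numerically. The sketch below assumes a plain weighted sum as the decision score; the function name, weights and values are hypothetical.

```python
def single_marker_cutoff(weights, fixed_values, free_gene, threshold):
    """Expression value of `free_gene` at which the decision score equals
    `threshold`, with every other biomarker held at `fixed_values`.

    Solves  sum_i w_i * x_i = threshold  for the one free x.
    """
    w_free = weights[free_gene]
    if w_free == 0:
        raise ValueError("a zero-weight biomarker cannot change the decision")
    rest = sum(weights[g] * fixed_values[g] for g in weights if g != free_gene)
    return (threshold - rest) / w_free


# Hypothetical weights; a negative weight means higher expression lowers the score.
demo_weights = {"GENE_1": 0.5, "GENE_2": -0.25, "GENE_3": 1.0}

# Cutoff for GENE_3 with the other two fixed.
print(single_marker_cutoff(demo_weights, {"GENE_1": 2.0, "GENE_2": 4.0}, "GENE_3", 1.0))  # 1.0
# Cutoff for GENE_2 with the other two fixed.
print(single_marker_cutoff(demo_weights, {"GENE_1": 2.0, "GENE_3": 1.0}, "GENE_2", 1.0))  # 4.0
```

The cutoff moves when the fixed values change, but the sign of each weight, and so the direction in which each biomarker pushes the call, stays the same.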
In certain embodiments the biomarkers used to assess whether the cancer belongs to the cancer sub-type do not comprise or consist of any one or more of the 63 biomarkers shown in Table C.
According to all aspects of the invention the cancer sub-type may be defined by increased and/or decreased expression levels of the genes listed in Tables A and B as shown in Tables A and B.
When a biomarker indicates or is a sign of an abnormal process, disease or other condition in an individual, that biomarker may be described as being either over-expressed or under-expressed or having an increased or decreased expression level as compared to an expression level or value of the biomarker that indicates or is a sign of a normal process, an absence of a disease or other condition in an individual. “Up-regulation”, “up-regulated”, “over-expression”, “over-expressed”, “increased expression” and any variations thereof are used interchangeably to refer to a value or level of a biomarker in a biological sample that is (statistically significantly) greater than a value or level (or range of values or levels) of the biomarker that is typically detected in similar biological samples from healthy or normal individuals. The terms may also refer to a value or level of a biomarker in a biological sample that is (statistically significantly) greater than a value or level (or range of values or levels) of the biomarker that may be detected at a different stage of a particular disease. The terms may also be used to refer to a value or level of biomarker in a biological sample that is (statistically significantly) greater than the average value or level of the biomarker that may be detected for samples of the same disease as a whole. For example, the level of biomarker may be (statistically significantly) greater than the average level for ovarian cancer samples, preferably serous ovarian cancer samples, more preferably high-grade serous ovarian cancer samples.
“Down-regulation”, “down-regulated”, “under-expression”, “under-expressed”, “decreased expression” and any variations thereof are used interchangeably to refer to a value or level of a biomarker in a biological sample that is (statistically significantly) less than a value or level (or range of values or levels) of the biomarker that is typically detected in similar biological samples from healthy or normal individuals. The terms may also refer to a value or level of a biomarker in a biological sample that is (statistically significantly) less than a value or level (or range of values or levels) of the biomarker that may be detected at a different stage of a particular disease. The terms may also be used to refer to a value or level of biomarker in a biological sample that is (statistically significantly) less than the average value or level of the biomarker that may be detected for samples of the same disease as a whole. For example, the level of biomarker may be (statistically significantly) less than the average level for ovarian cancer samples, preferably serous ovarian cancer samples, more preferably high-grade serous ovarian cancer samples.
Further, a biomarker that is either over-expressed or under-expressed can also be referred to as being “differentially expressed” or as having a “differential level” or “differential value” as compared to a “normal” expression level or value of the biomarker that indicates or is a sign of a normal process or an absence of a disease, disease subtype, or other condition in an individual. Thus, “differential expression” of a biomarker can also be referred to as a variation from a “normal” expression level of the biomarker.
The terms “differential biomarker expression” and “differential expression” are used interchangeably to refer to a biomarker whose expression is activated to a higher or lower level in a subject suffering from a specific disease, relative to its expression in a normal subject, or relative to its expression in a patient that responds differently to a particular therapy or has a different prognosis. The terms also include biomarkers whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed biomarker may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a variety of changes including mRNA levels, miRNA levels, antisense transcript levels, or protein surface expression, secretion or other partitioning of a polypeptide. Differential biomarker expression may include a comparison of expression between two or more genes or their gene products; or a comparison of the ratios of the expression between two or more genes or their gene products; or even a comparison of two differently processed products of the same gene, which differ between normal subjects and subjects suffering from a disease; or between various stages of the same disease. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a biomarker among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages.
In certain embodiments the subject is receiving, has received and/or will receive (optionally together with the anti-angiogenic therapeutic agent) treatment with a chemotherapeutic agent.
According to all aspects of the invention the method may further comprise obtaining a test sample from the subject. The methods may be in vitro methods performed on an isolated sample.
According to all aspects of the invention samples may be of any suitable form including any material, biological fluid, tissue, or cell obtained or otherwise derived from an individual. In specific embodiments the sample comprises, consists essentially of or consists of a formalin-fixed paraffin-embedded biopsy sample. In further embodiments the sample comprises, consists essentially of or consists of a fresh/frozen (FF) sample. The sample may comprise, consist essentially of or consist of tumour (cancer) tissue, optionally ovarian tumour (cancer) tissue. The sample may comprise, consist essentially of or consist of tumour (cancer) cells, optionally ovarian tumour (cancer) cells. The sample may be obtained by any suitable technique. Examples include a biopsy procedure, optionally a fine needle aspirate biopsy procedure. Body fluid samples may also be utilised. Suitable sample types include blood (including whole blood, leukocytes, peripheral blood mononuclear cells, buffy coat, plasma, and serum), sputum, tears, mucus, nasal washes, nasal aspirate, breath, urine, semen, saliva, meningeal fluid, amniotic fluid, glandular fluid, lymph fluid, nipple aspirate, bronchial aspirate, synovial fluid, joint aspirate, ascites, cells, a cellular extract, and cerebrospinal fluid. This also includes experimentally separated fractions of all of the preceding. For example, a blood sample can be fractionated into serum or into fractions containing particular types of blood cells, such as red blood cells or white blood cells (leukocytes). If desired, a sample can be a combination of samples from an individual, such as a combination of a tissue and fluid sample. The term “sample” also includes materials containing homogenized solid material, such as from a stool sample, a tissue sample, or a tissue biopsy, for example. The term “sample” also includes materials derived from a tissue culture or a cell culture, including tissue resection and biopsy samples. 
Example methods for obtaining a sample include phlebotomy and swabbing (e.g., a buccal swab). Samples can also be collected, e.g., by micro dissection (e.g., laser capture micro dissection (LCM) or laser micro dissection (LMD)), bladder wash, smear (e.g., a PAP smear), or ductal lavage. A “sample” obtained or derived from an individual includes any such sample that has been processed in any suitable manner after being obtained from the individual. The methods of the invention as defined herein may begin with an obtained sample and thus do not necessarily (although they may) incorporate the step of obtaining the sample from the patient. As used herein, the term “patient” includes human and non-human animals. The preferred patient for treatment is a human. “Patient,” “individual” and “subject” are used interchangeably herein.
According to all aspects of the invention the cancer may be ovarian cancer, peritoneal cancer or fallopian tube cancer. In certain embodiments the ovarian cancer is high grade serous ovarian cancer. The cancer may also be leukemia, brain cancer, glioblastoma, prostate cancer, liver cancer, stomach cancer, colorectal cancer, colon cancer, thyroid cancer, neuroendocrine cancer, gastrointestinal stromal tumors (GIST), gastric cancer, lymphoma, throat cancer, breast cancer, skin cancer, melanoma, multiple myeloma, lung cancer, sarcoma, cervical cancer, testicular cancer, bladder cancer, endocrine cancer, endometrial cancer, esophageal cancer, glioma, neuroblastoma, osteosarcoma, pancreatic cancer, pituitary cancer, renal cancer, and the like. As used herein, colorectal cancer encompasses cancers that may involve cancer in tissues of both the rectum and other portions of the colon as well as cancers that may be individually classified as either colon cancer or rectal cancer.
In all aspects of the invention the anti-angiogenic therapeutic agent may be a VEGF-pathway-targeted therapeutic agent, an angiopoietin-TIE2 pathway inhibitor, an endogenous angiogenic inhibitor, or an immunomodulatory agent. In certain embodiments the VEGF pathway-targeted therapeutic agent is selected from Bevacizumab (Avastin), Aflibercept (VEGF Trap), IMC-1121B (Ramucirumab), Imatinib (Gleevec), Sorafenib (Nexavar), Gefitinib (Iressa), Sunitinib (Sutent), Erlotinib, Tivozanib, Cediranib (Recentin), Pazopanib (Votrient), BIBF 1120 (Vargatef), Dovitinib, Semaxanib (Sugen), Axitinib (AG013736), Vandetanib (Zactima), Nilotinib (Tasigna), Dasatinib (Sprycel), Vatalanib, Motesanib, ABT-869, TKI-258 or a combination thereof. The angiopoietin-TIE2 pathway inhibitor may be selected from AMG-386, PF-4856884 CVX-060, CEP-11981, CE-245677, MEDI-3617, CVX-241, Trastuzumab (Herceptin) or a combination thereof. In certain embodiments the endogenous angiogenic inhibitor is selected from Thrombospondin, Endostatin, Tumstatin, Canstatin, Arresten, Angiostatin, Vasostatin, Interferon alpha or a combination thereof. In further embodiments the immunomodulatory agent is selected from thalidomide and lenalidomide. In specific embodiments the VEGF pathway-targeted therapeutic agent is bevacizumab.
Accordingly, in a further aspect, the present invention relates to a method for selecting whether to administer Bevacizumab to a subject, comprising:
in a test sample obtained from a subject suffering from ovarian cancer, which subject is being, has been and/or will be treated using a platinum-based chemotherapeutic agent and/or a mitotic inhibitor;
measuring expression levels of at least 2 biomarkers;
determining a sample expression score for the 2 or more biomarkers;
comparing the sample expression score to a threshold score;
wherein if the sample expression score is above or equal to the threshold expression score the cancer belongs to a cancer sub-type defined by the expression levels of the genes in Tables A and B; and
selecting a treatment based on whether the cancer belongs to the sub-type, wherein if the cancer belongs to the sub-type Bevacizumab is contraindicated.
In certain embodiments if Bevacizumab is contraindicated the patient is and/or continues to be treated with a platinum-based chemotherapeutic agent and/or a mitotic inhibitor. In further embodiments if the cancer does not belong to the sub-type the patient is and/or continues to be treated with a platinum-based chemotherapeutic agent and/or a mitotic inhibitor together with Bevacizumab.
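The scoring and comparison steps above can be sketched in code. The following is a minimal illustration only: it assumes the sample expression score is a weighted sum of measured expression levels, which is one common way to build such a score and is not stated to be the method used here. The weights, bias, expression values and threshold are hypothetical placeholders; only the comparison rule (score above or equal to the threshold places the cancer in the sub-type, in which case Bevacizumab is contraindicated) is taken from the text.

```python
from typing import Mapping


def sample_expression_score(
    expression: Mapping[str, float],
    weights: Mapping[str, float],
    bias: float = 0.0,
) -> float:
    """Combine biomarker expression levels into one score.

    Assumption: a weighted sum with an optional bias term. The source
    does not specify how the score is calculated.
    """
    if len(weights) < 2:
        raise ValueError("at least 2 biomarkers are required")
    missing = [gene for gene in weights if gene not in expression]
    if missing:
        raise ValueError(f"no expression level measured for: {missing}")
    return bias + sum(weights[gene] * expression[gene] for gene in weights)


def bevacizumab_contraindicated(score: float, threshold: float) -> bool:
    """A score above or equal to the threshold places the cancer in the
    sub-type, for which Bevacizumab is contraindicated."""
    return score >= threshold


# Hypothetical weights and measurements, for illustration only.
weights = {"GABRE": 0.5, "HLA-DPA1": -0.25}
expression = {"GABRE": 2.0, "HLA-DPA1": 4.0}

score = sample_expression_score(expression, weights)
if bevacizumab_contraindicated(score, threshold=0.0):
    print("Sub-type detected: continue chemotherapy without Bevacizumab")
else:
    print("Sub-type not detected: chemotherapy together with Bevacizumab")
```

In practice the weights and the threshold would be fixed in advance from a training cohort; the values shown here carry no clinical meaning.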
According to all aspects of the invention the method may comprise measuring the expression level of at least 3 biomarkers in a sample from the subject, wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B.
The method may comprise measuring the expression levels of at least 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120 or each of the biomarkers from Table F. In certain embodiments the method may comprise measuring the expression levels of 4-20, preferably 4-15, more preferably 4-11 of the biomarkers from Table F. The inventors have shown that measuring the expression levels of at least 4 of the markers in Table F enables the subtype to be reliably detected.
The biomarkers from Table F are ranked in Table G from most important to least important based upon hazard ratio reduction when the genes are included versus when they are excluded. The genes/biomarkers may be selected for inclusion in a panel of biomarkers/a signature based on their ranking. Table H illustrates probesets that can be used to detect expression of the biomarkers.
Accordingly, the method may comprise measuring the expression levels of at least one of GABRE, HLA-DPA1, CHI3L1, KCND2, GBP3, UPK2, SYTL4, LRRN1, USP53 and POU2F3. In specific embodiments the method comprises measuring the expression levels of each of GABRE, HLA-DPA1, CHI3L1, KCND2, GBP3, UPK2, SYTL4, LRRN1, USP53 and POU2F3. In further embodiments the method comprises measuring the expression levels of each of the biomarkers from Table F.
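Selecting a panel from a ranked list of biomarkers can be illustrated with a short sketch. It assumes each biomarker has an associated hazard ratio reduction value (the quantity by which the text says the biomarkers are ranked) and simply keeps the highest-ranked entries. The gene names and values below are hypothetical placeholders and are not taken from Table F or Table G.

```python
from typing import List, Mapping


def select_panel(hr_reduction: Mapping[str, float], size: int) -> List[str]:
    """Return the `size` biomarkers with the largest hazard ratio reduction.

    Ties are broken alphabetically so that the result is deterministic.
    """
    if not 1 <= size <= len(hr_reduction):
        raise ValueError("panel size must be between 1 and the number of biomarkers")
    ranked = sorted(hr_reduction, key=lambda gene: (-hr_reduction[gene], gene))
    return ranked[:size]


# Hypothetical values, for illustration only.
example_ranking = {
    "GENE_A": 0.02,
    "GENE_B": 0.10,
    "GENE_C": 0.05,
    "GENE_D": 0.10,
}
print(select_panel(example_ranking, size=3))
```

The same helper could be applied to any of the ranked tables, with the panel size chosen within the ranges given in the text.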
The method may comprise measuring the expression levels of at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230 or each of the biomarkers from Table I. In certain embodiments the method may comprise measuring the expression levels of 10-25 biomarkers from Table I. The inventors have shown that measuring the expression levels of at least 10 of the markers in Table I enables the subtype to be reliably detected.
The biomarkers from Table I are ranked in Table J from most important to least important based upon hazard ratio reduction when the genes are included versus when they are excluded. The genes/biomarkers may be selected for inclusion in a panel of biomarkers/a signature based on their ranking. Table K illustrates probesets that can be used to detect expression of the biomarkers.
Accordingly, the method may comprise measuring the expression levels of at least one of MT1L, MT1G, LRP4, RASL11B, IFI27, PKIA, ALOX5AP, UBD, MEX3B, and TMEM98. In specific embodiments the method comprises measuring the expression levels of each of MT1L, MT1G, LRP4, RASL11B, IFI27, PKIA, ALOX5AP, UBD, MEX3B, and TMEM98. In further embodiments the method comprises measuring the expression levels of each of the biomarkers listed in Table I.
The method may comprise measuring the expression levels of at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 185 or each of the biomarkers from Table L. In certain embodiments the method may comprise measuring the expression levels of 15-26 biomarkers from Table L. The inventors have shown that measuring the expression levels of at least 15 of the biomarkers in Table L enables the subtype to be reliably detected.
The biomarkers from Table L are ranked in Table M from most important to least important based upon hazard ratio reduction when the genes are included versus when they are excluded. The genes/biomarkers may be selected for inclusion in a panel of biomarkers/a signature based on their ranking. Table N illustrates probesets that can be used to detect expression of the biomarkers.
The method may comprise measuring the expression levels of at least one of MT1L, GABRE, KCND2, UPK2, HLA-DPA1, SYTL4, SCEL, MZT1, EFNB3, and DLL1. In specific embodiments the method comprises measuring the expression levels of each of MT1L, GABRE, KCND2, UPK2, HLA-DPA1, SYTL4, SCEL, MZT1, EFNB3, and DLL1. In further embodiments the method comprises measuring the expression levels of each of the biomarkers listed in Table L.
Methods for determining the expression levels of the biomarkers are described in greater detail herein. Typically, the methods may involve contacting a sample obtained from a subject with a detection agent, such as primers/probes/antibodies (as discussed in detail herein) specific for the biomarker and detecting expression products.
According to all aspects of the invention the expression level of the gene or genes may be measured by any suitable method. Genes may also be referred to, interchangeably, as biomarkers. In certain embodiments the expression level is determined at the level of protein, RNA or epigenetic modification. The epigenetic modification may be DNA methylation.
The expression level may be determined by immunohistochemistry. By immunohistochemistry is meant the detection of proteins in cells of a tissue sample by using a binding reagent such as an antibody or aptamer that binds specifically to the proteins.
Accordingly, in a further aspect, the present invention relates to an antibody or aptamer that binds specifically to a protein product of at least one of the biomarkers listed herein.
The antibody may be of monoclonal or polyclonal origin. Fragments and derivative antibodies may also be utilised, to include without limitation Fab fragments, ScFv, single domain antibodies, nanoantibodies, heavy chain antibodies, aptamers etc. which retain peptide-specific binding function and these are included in the definition of “antibody”. Such antibodies are useful in the methods of the invention. They may be used to measure the level of a particular protein, or in some instances one or more specific isoforms of a protein. The skilled person is well able to identify epitopes that permit specific isoforms to be discriminated from one another.
Methods for generating specific antibodies are known to those skilled in the art. Antibodies may be of human or non-human origin (e.g. rodent, such as rat or mouse) and be humanized etc. according to known techniques (Jones et al., Nature (1986) May 29-Jun. 4; 321(6069):522-5; Roguska et al., Protein Engineering, 1996, 9(10):895-904; and Studnicka et al., Humanizing Mouse Antibody Frameworks While Preserving 3-D Structure. Protein Engineering, 1994, Vol. 7, pg 805).
In certain embodiments the expression level is determined using an antibody or aptamer conjugated to a label. By label is meant a component that permits detection, directly or indirectly. For example, the label may be an enzyme, optionally a peroxidase, or a fluorophore.
Where the antibody is conjugated to an enzyme a chemical composition may be used such that the enzyme catalyses a chemical reaction to produce a detectable product. The products of reactions catalyzed by appropriate enzymes can be, without limitation, fluorescent, luminescent, or radioactive or they may absorb visible or ultraviolet light. Examples of detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers. In certain embodiments a secondary antibody is used and the expression level is then determined using an unlabeled primary antibody that binds to the target protein and a secondary antibody conjugated to a label, wherein the secondary antibody binds to the primary antibody.
Additional techniques for determining expression level at the level of protein include, for example, Western blot, immunoprecipitation, immunocytochemistry, mass spectrometry, ELISA and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition). To improve specificity and sensitivity of an assay method based on immunoreactivity, monoclonal antibodies are often used because of their specific epitope recognition. Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies.
Measuring mRNA in a biological sample may be used as a surrogate for detection of the level of the corresponding protein in the biological sample. Thus, the expression level of any of the genes described herein can also be detected by detecting the appropriate RNA.
Accordingly, in specific embodiments the expression level is determined by microarray, northern blotting, or nucleic acid amplification. Nucleic acid amplification includes PCR and all variants thereof such as real-time and end point methods and qPCR. Typically, PCR consists of a series of 20-40 repeated temperature changes (cycles) with each cycle generally including 2-3 discrete temperature steps for denaturation, annealing and elongation. The cycling is often preceded by a single temperature step (called hold) at a high temperature (>90° C.), and followed by one hold at the end for final product extension or brief storage. The temperatures used and the length of time they are applied in each cycle vary based on a variety of parameters, including the enzyme used for DNA synthesis, the concentration of dNTPs in the reaction, and the melting temperature (Tm) of the primers. For DNA polymerases that require heat activation the first step is heating the reaction to a temperature of 94-98° C. for 1-9 minutes. Then the reaction is heated to 94-98° C. for 20-30 seconds, which produces single-stranded DNA molecules. Next the reaction temperature is lowered to 50-65° C. for 20-40 seconds, allowing annealing of the primers to the single-stranded DNA template. Typically the annealing temperature is about 3-5° C. below the Tm of the primers used. The temperature of the elongation step depends on the DNA polymerase used, e.g. Taq polymerase has its optimum activity temperature at 75-80° C. At this step the DNA polymerase synthesizes a new DNA strand complementary to the DNA template strand by adding dNTPs that are complementary to the template. The extension time depends both on the DNA polymerase used and on the length of the DNA fragment to be amplified; a thousand bases per minute is usual. A final elongation may be performed at a temperature of 70-74° C. for 5-15 minutes after the last PCR cycle to ensure that any remaining single-stranded DNA is fully extended.
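Two of the rules of thumb given above, the annealing temperature of about 3-5° C. below the primer Tm and the extension rate of about a thousand bases per minute, reduce to simple arithmetic. The sketch below is illustrative only; the function names and the example primer Tm and amplicon length are hypothetical, and actual conditions depend on the polymerase and primers used.

```python
from typing import Tuple


def annealing_temperature_range(primer_tm_c: float) -> Tuple[float, float]:
    """Annealing window in degrees C, taken as 3-5 degrees below the primer Tm."""
    return (primer_tm_c - 5.0, primer_tm_c - 3.0)


def extension_seconds(amplicon_length_bases: int,
                      bases_per_minute: float = 1000.0) -> float:
    """Extension time per cycle, assuming about a thousand bases per minute."""
    if amplicon_length_bases <= 0:
        raise ValueError("amplicon length must be positive")
    return 60.0 * amplicon_length_bases / bases_per_minute


# Hypothetical primer Tm and amplicon length, for illustration only.
low, high = annealing_temperature_range(60.0)
print(f"Anneal at {low:.0f}-{high:.0f} C")
print(f"Extend for {extension_seconds(500):.0f} s per cycle")
```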
A final hold at 4-15° C. for an indefinite time may be employed for short-term storage of the reaction. Other nucleic acid amplification techniques are well known in the art, and include methods such as NASBA, 3SR and Transcription Mediated Amplification (TMA). Other suitable amplification methods include the ligase chain reaction (LCR), selective amplification of target polynucleotide sequences (U.S. Pat. No. 6,410,276), consensus sequence primed polymerase chain reaction (U.S. Pat. No. 4,437,975), arbitrarily primed polymerase chain reaction (WO 90/06995), invader technology, strand displacement technology, and nick displacement amplification (WO 2004/067726). This list is not intended to be exhaustive; any nucleic acid amplification technique may be used provided the appropriate nucleic acid product is specifically amplified. Design of suitable primers and/or probes is within the capability of one skilled in the art. Various primer design tools are freely available to assist in this process, such as the NCBI Primer-BLAST tool. Primers and/or probes may be at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 (or more) nucleotides in length. mRNA expression levels may be measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed by qPCR). RT-PCR is used to create a cDNA from the mRNA. The cDNA may be used in a qPCR assay to produce fluorescence as the DNA amplification process progresses. By comparison to a standard curve, qPCR can produce an absolute measurement such as number of copies of mRNA per cell. Northern blots, microarrays, Invader assays, and RT-PCR combined with capillary electrophoresis have all been used to measure expression levels of mRNA in a sample. See Gene Expression Profiling: Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004.
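Absolute quantification against a standard curve, as mentioned above, can be sketched as follows. The example assumes the usual linear relationship between the quantification cycle (Ct) and the base-10 logarithm of the starting copy number, fits that line to a dilution series by ordinary least squares, and inverts it for an unknown sample. The dilution series and Ct values are hypothetical, and the result is copies per reaction; converting to copies per cell would require the number of cells contributing to the reaction, which is not modelled here.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(copies: Sequence[float],
                       ct_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    if len(copies) != len(ct_values) or len(copies) < 2:
        raise ValueError("need at least two matched standards")
    x = [math.log10(c) for c in copies]
    mean_x = sum(x) / len(x)
    mean_y = sum(ct_values) / len(ct_values)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the starting copy number."""
    return 10.0 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series, for illustration only.
standards = [1e3, 1e4, 1e5, 1e6]
standard_cts = [30.0, 26.7, 23.4, 20.1]

slope, intercept = fit_standard_curve(standards, standard_cts)
estimate = copies_from_ct(25.05, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, copies={estimate:.0f}")
```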
RNA expression may be determined by hybridization of RNA to a set of probes. The probes may be arranged in an array. Microarray platforms include those manufactured by companies such as Affymetrix, Illumina and Agilent. Examples of microarray platforms manufactured by Affymetrix include the U133 Plus2 array, the Almac proprietary Xcel™ array and the Almac proprietary Cancer DSAs®, including the Ovarian Cancer DSA®. In specific embodiments a sample of target nucleic acids is first prepared from the initial nucleic acid sample being assayed, where preparation may include labeling of the target nucleic acids with a label, e.g., a member of a signal producing system. Following target nucleic acid sample preparation, the sample is contacted with the array under hybridization conditions, whereby complexes are formed between target nucleic acids that are complementary to probe sequences attached to the array surface. The presence of hybridized complexes is then detected, either qualitatively or quantitatively. Specific hybridization technology which may be practiced to generate the expression profiles employed in the subject methods includes the technology described in U.S. Pat. Nos. 5,143,854; 5,288,644; 5,324,633; 5,432,049; 5,470,710; 5,492,806; 5,503,980; 5,510,270; 5,525,464; 5,547,839; 5,580,732; 5,661,028; 5,800,992; the disclosures of which are herein incorporated by reference; as well as WO 95/21265; WO 96/31622; WO 97/10365; WO 97/27317; EP 373 203; and EP 785 280. In these methods, an array of “probe” nucleic acids that includes a probe for each of the biomarkers whose expression is being assayed is contacted with target nucleic acids as described above. Contact is carried out under hybridization conditions, e.g., stringent hybridization conditions as described above, and unbound nucleic acid is then removed. 
The resultant pattern of hybridized nucleic acids provides information regarding expression for each of the biomarkers that have been probed, where the expression information is in terms of whether or not the gene is expressed and, typically, at what level, where the expression data, i.e., expression profile, may be both qualitative and quantitative.
The methods described herein may further comprise extracting total nucleic acid or RNA from the sample. Suitable methods are known in the art and include use of commercially available kits such as RNeasy and GeneJET RNA purification kit.
The invention also relates to a system or device for performing a method as described herein.
In a further aspect, the present invention relates to a system or test kit for performing a method as described herein, comprising:
By testing device is meant a combination of components that allows the expression level of a gene to be determined. The components may include any of those described above with respect to the methods for determining expression level at the level of protein, RNA or epigenetic modification. For example the components may be antibodies, primers, detection agents and so on. Components may also include one or more of the following: microscopes, microscope slides, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers.
In certain embodiments the system or test kit further comprises a display for the output from the processor.
The invention also relates to a computer application or storage medium comprising a computer application as defined above.
In certain example embodiments, provided is a computer-implemented method, system, and a computer program product for selection of whether to administer an anti-angiogenic therapeutic agent to a subject having a cancer and/or prediction of the responsiveness of a subject with cancer to an anti-angiogenic therapeutic agent and/or determining the clinical prognosis of a subject with cancer, in accordance with the methods described herein. For example, the computer program product may comprise a non-transitory computer-readable storage device having computer-readable program instructions embodied thereon that, when executed by a computer, cause the computer to select whether to administer an anti-angiogenic therapeutic agent to a subject having a cancer and/or predict the responsiveness of a subject with cancer to an anti-angiogenic therapeutic agent and/or determine the clinical prognosis of a subject with cancer as described herein. For example, the computer executable instructions may cause the computer to:
(i) access and/or calculate the determined expression levels of the at least 3 biomarkers in a sample from the subject, wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B or the at least two biomarkers in a sample on one or more testing devices;
(ii) calculate whether there is an increased or decreased level of the at least 3 biomarkers in a sample from the subject, wherein at least two of the biomarkers are from Table A and at least one of the biomarkers is from Table B or the at least two biomarkers in the sample; and,
(iii) provide an output regarding the selection of whether to administer an anti-angiogenic therapeutic agent to a subject having a cancer and/or a prediction of the responsiveness of a subject with cancer to an anti-angiogenic therapeutic agent and/or the clinical prognosis of a subject with cancer.
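Steps (i) to (iii) can be sketched as a small program. This is a hedged illustration, not a definitive implementation: the contents of Table A and Table B are represented by hypothetical placeholder names, the reference levels are invented, and "increased" or "decreased" is assumed to mean higher or lower than a per-biomarker reference level. How the directions are translated into a treatment selection is left out, since that depends on the signature used.

```python
from typing import Dict, Mapping

# Hypothetical stand-ins for the biomarkers of Table A and Table B.
TABLE_A = {"GENE_A1", "GENE_A2", "GENE_A3"}
TABLE_B = {"GENE_B1", "GENE_B2"}


def check_panel(measured: Mapping[str, float]) -> None:
    """Step (i): require at least two Table A and one Table B biomarkers."""
    genes = set(measured)
    if len(genes & TABLE_A) < 2 or len(genes & TABLE_B) < 1:
        raise ValueError("need >= 2 biomarkers from Table A and >= 1 from Table B")


def expression_direction(measured: Mapping[str, float],
                         reference: Mapping[str, float]) -> Dict[str, str]:
    """Step (ii): label each biomarker relative to its reference level."""
    check_panel(measured)
    directions = {}
    for gene, level in measured.items():
        ref = reference[gene]
        if level > ref:
            directions[gene] = "increased"
        elif level < ref:
            directions[gene] = "decreased"
        else:
            directions[gene] = "unchanged"
    return directions


def report(directions: Mapping[str, str]) -> str:
    """Step (iii): provide an output, here a plain-text summary."""
    return "\n".join(f"{gene}: {directions[gene]}" for gene in sorted(directions))


# Hypothetical measurements and reference levels, for illustration only.
measured = {"GENE_A1": 8.0, "GENE_A2": 3.0, "GENE_B1": 5.0}
reference = {"GENE_A1": 6.0, "GENE_A2": 4.0, "GENE_B1": 5.0}
print(report(expression_direction(measured, reference)))
```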
In certain example embodiments, the computer-implemented method, system, and computer program product may be embodied in a computer application, for example, that operates and executes on a computing machine and a module. When executed, the application may select whether to administer an anti-angiogenic therapeutic agent to a subject having a cancer and/or predict the responsiveness of a subject with cancer to an anti-angiogenic therapeutic agent and/or determine the clinical prognosis of a subject with cancer, in accordance with the example embodiments described herein.
As used herein, the computing machine may correspond to any computers, servers, embedded systems, or computing systems. The module may comprise one or more hardware or software elements configured to facilitate the computing machine in performing the various methods and processing functions presented herein. The computing machine may include various internal or attached components such as a processor, system bus, system memory, storage media, input/output interface, and a network interface for communicating with a network, for example.
The computing machine may be implemented as a conventional computer system, an embedded controller, a laptop, a server, a customized machine, any other hardware platform, such as a laboratory computer or device, for example, or any combination thereof. The computing machine may be a distributed system configured to function using multiple computing machines interconnected via a data network or bus system, for example.
The processor may be configured to execute code or instructions to perform the operations and functionality described herein, manage request flow and address mappings, and to perform calculations and generate commands. The processor may be configured to monitor and control the operation of the components in the computing machine. The processor may be a general purpose processor, a processor core, a multiprocessor, a reconfigurable processor, a microcontroller, a digital signal processor (“DSP”), an application specific integrated circuit (“ASIC”), a graphics processing unit (“GPU”), a field programmable gate array (“FPGA”), a programmable logic device (“PLD”), a controller, a state machine, gated logic, discrete hardware components, any other processing unit, or any combination or multiplicity thereof. The processor may be a single processing unit, multiple processing units, a single processing core, multiple processing cores, special purpose processing cores, co-processors, or any combination thereof. According to certain example embodiments, the processor, along with other components of the computing machine, may be a virtualized computing machine executing within one or more other computing machines.
The system memory may include non-volatile memories such as read-only memory (“ROM”), programmable read-only memory (“PROM”), erasable programmable read-only memory (“EPROM”), flash memory, or any other device capable of storing program instructions or data with or without applied power. The system memory may also include volatile memories such as random access memory (“RAM”), static random access memory (“SRAM”), dynamic random access memory (“DRAM”), and synchronous dynamic random access memory (“SDRAM”). Other types of RAM also may be used to implement the system memory. The system memory may be implemented using a single memory module or multiple memory modules. While the system memory may be part of the computing machine, one skilled in the art will recognize that the system memory may be separate from the computing machine without departing from the scope of the subject technology. It should also be appreciated that the system memory may include, or operate in conjunction with, a non-volatile storage device such as the storage media.
The storage media may include a hard disk, a floppy disk, a compact disc read only memory (“CD-ROM”), a digital versatile disc (“DVD”), a Blu-ray disc, a magnetic tape, a flash memory, other non-volatile memory device, a solid state drive (“SSD”), any magnetic storage device, any optical storage device, any electrical storage device, any semiconductor storage device, any physical-based storage device, any other data storage device, or any combination or multiplicity thereof. The storage media may store one or more operating systems, application programs and program modules such as module, data, or any other information. The storage media may be part of, or connected to, the computing machine. The storage media may also be part of one or more other computing machines that are in communication with the computing machine, such as servers, database servers, cloud storage, network attached storage, and so forth.
The module may comprise one or more hardware or software elements configured to facilitate the computing machine with performing the various methods and processing functions presented herein. The module may include one or more sequences of instructions stored as software or firmware in association with the system memory, the storage media, or both. The storage media may therefore represent examples of machine or computer readable media on which instructions or code may be stored for execution by the processor. Machine or computer readable media may generally refer to any medium or media used to provide instructions to the processor. Such machine or computer readable media associated with the module may comprise a computer software product. It should be appreciated that a computer software product comprising the module may also be associated with one or more processes or methods for delivering the module to the computing machine via a network, any signal-bearing medium, or any other communication or delivery technology. The module may also comprise hardware circuits or information for configuring hardware circuits such as microcode or configuration information for an FPGA or other PLD.
The input/output (“I/O”) interface may be configured to couple to one or more external devices, to receive data from the one or more external devices, and to send data to the one or more external devices. Such external devices along with the various internal devices may also be known as peripheral devices. The I/O interface may include both electrical and physical connections for operably coupling the various peripheral devices to the computing machine or the processor. The I/O interface may be configured to communicate data, addresses, and control signals between the peripheral devices, the computing machine, or the processor. The I/O interface may be configured to implement any standard interface, such as small computer system interface (“SCSI”), serial-attached SCSI (“SAS”), fiber channel, peripheral component interconnect (“PCI”), PCI express (PCIe), serial bus, parallel bus, advanced technology attached (“ATA”), serial ATA (“SATA”), universal serial bus (“USB”), Thunderbolt, FireWire, various video buses, and the like. The I/O interface may be configured to implement only one interface or bus technology.
Alternatively, the I/O interface may be configured to implement multiple interfaces or bus technologies. The I/O interface may be configured as part of, all of, or to operate in conjunction with, the system bus. The I/O interface may include one or more buffers for buffering transmissions between one or more external devices, internal devices, the computing machine, or the processor.
The I/O interface may couple the computing machine to various input devices including mice, touch-screens, scanners, electronic digitizers, sensors, receivers, touchpads, trackballs, cameras, microphones, keyboards, any other pointing devices, or any combinations thereof. The I/O interface may couple the computing machine to various output devices including video displays, speakers, printers, projectors, tactile feedback devices, automation control, robotic components, actuators, motors, fans, solenoids, valves, pumps, transmitters, signal emitters, lights, and so forth.
The computing machine may operate in a networked environment using logical connections through the network interface to one or more other systems or computing machines across the network. The network may include wide area networks (WAN), local area networks (LAN), intranets, the Internet, wireless access networks, wired networks, mobile networks, telephone networks, optical networks, or combinations thereof. The network may be packet switched, circuit switched, of any topology, and may use any communication protocol.
Communication links within the network may involve various digital or analog communication media such as fiber optic cables, free-space optics, waveguides, electrical conductors, wireless links, antennas, radio-frequency communications, and so forth. The processor may be connected to the other elements of the computing machine or the various peripherals discussed herein through the system bus. It should be appreciated that the system bus may be within the processor, outside the processor, or both. According to some embodiments, any of the processor, the other elements of the computing machine, or the various peripherals discussed herein may be integrated into a single device such as a system on chip (“SOC”), system on package (“SOP”), or ASIC device.
Embodiments may comprise a computer program that embodies the functions described and illustrated herein, wherein the computer program is implemented in a computer system that comprises instructions stored in a machine-readable medium and a processor that executes the instructions. However, it should be apparent that there could be many different ways of implementing embodiments in computer programming, and the embodiments should not be construed as limited to any one set of computer program instructions. Further, a skilled programmer would be able to write such a computer program to implement one or more of the disclosed embodiments described herein. Therefore, disclosure of a particular set of program code instructions is not considered necessary for an adequate understanding of how to make and use embodiments. Further, those skilled in the art will appreciate that one or more aspects of embodiments described herein may be performed by hardware, software, or a combination thereof, as may be embodied in one or more computing systems. Moreover, any reference to an act being performed by a computer should not be construed as being performed by a single computer as more than one computer may perform the act.
The example embodiments described herein can be used with computer hardware and software that perform the methods and processing functions described previously. The systems, methods, and procedures described herein can be embodied in a programmable computer, computer-executable software, or digital circuitry. The software can be stored on computer-readable media. For example, computer-readable media can include a floppy disk, RAM, ROM, hard disk, removable media, flash memory, memory stick, optical media, magneto-optical media, CD-ROM, etc. Digital circuitry can include integrated circuits, gate arrays, building block logic, field programmable gate arrays (FPGA), etc.
Reagents, tools, and/or instructions for performing the methods described herein can be provided in a kit. Such a kit can include reagents for collecting a tissue sample from a patient, such as by biopsy, and reagents for processing the tissue. The kit can also include one or more reagents for performing an expression level analysis, such as reagents for performing nucleic acid amplification, including RT-PCR and qPCR, NGS, northern blot, proteomic analysis, or immunohistochemistry to determine expression levels of biomarkers in a sample of a patient. For example, primers for performing RT-PCR, probes for performing northern blot analyses, and/or antibodies or aptamers, as discussed herein, for performing proteomic analysis such as Western blot, immunohistochemistry and ELISA analyses can be included in such kits. Appropriate buffers for the assays can also be included. Detection reagents required for any of these assays can also be included. The kits may be array or PCR based kits, for example, and may include additional reagents, such as a polymerase and/or dNTPs. The kits featured herein can also include an instruction sheet describing how to perform the assays for measuring expression levels.
The kit may include one or more primer pairs complementary to at least one of the biomarkers described herein.
Informational material included in the kits can be descriptive, instructional, marketing or other material that relates to the methods described herein and/or the use of the reagents for the methods described herein. For example, the informational material of the kit can contain contact information, e.g., a physical address, email address, website, or telephone number, where a user of the kit can obtain substantive information about performing a gene expression analysis and interpreting the results.
The inventors have found that a range of signatures can point to the sub-type and can be identified using the teaching herein.
Accordingly, the invention also relates to a method of deriving a panel of at least 2 biomarkers, wherein the expression level(s) of the at least 2 biomarkers in a sample from a subject with a cancer allows the cancer to be allocated to a sub-type
(optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B
said method comprising the steps of:
sorting samples from a sample set of known pathology and/or clinical outcome on the basis of allocation to the sub-type
obtaining the expression profiles of the samples
analysing the expression profiles from the sample set using a mathematical model
identifying from the results of the mathematical model one or more biomarkers expressed in the sample set that are most predictive of the cancer sub-type.
In certain embodiments the mathematical model is a parametric, non-parametric or semi-parametric model. In specific embodiments the mathematical model is Partial Least Squares (PLS), Shrinkage Discriminant Analysis (SDA), or Diagonal SDA (DSDA). Identifying from the results of the mathematical model one or more biomarkers expressed in the sample set that are most predictive of the cancer sub-type may comprise identifying one or more biomarkers for which area under the receiver operating characteristic curve (AUC) and/or Concordance Index (C-Index) are significant.
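By way of illustration only, a Concordance Index can be computed as sketched below. The sketch assumes Harrell's definition of the C-Index (the text above does not say which definition is used), and the function and argument names are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-Index (assumed definition, for illustration only).

    times       -- follow-up time for each subject
    events      -- 1 if the event (e.g. progression) was observed, 0 if censored
    risk_scores -- model output; a higher score is taken to mean higher risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before the follow-up time of subject j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0      # earlier event had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5      # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of subjects by risk.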
In certain embodiments the panel is derived by obtaining the expression profiles of samples from a sample set of known pathology and/or clinical outcome. The samples may originate from the same sample tissue type or different tissue types. As used herein an “expression profile” comprises a set of values representing the expression level for each biomarker analyzed from a given sample.
The expression profiles from the sample set are then analyzed using a mathematical model. Different mathematical models may be applied and include, but are not limited to, models from the fields of pattern recognition (Duda et al. Pattern Classification, 2nd ed., John Wiley, New York 2001), machine learning (Schölkopf et al. Learning with Kernels, MIT Press, Cambridge 2002, Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford 1995), statistics (Hastie et al. The Elements of Statistical Learning, Springer, New York 2001), bioinformatics (Dudoit et al., 2002, J. Am. Statist. Assoc. 97:77-87, Tibshirani et al., 2002, Proc. Natl. Acad. Sci. USA 99:6567-6572) or chemometrics (Vandeginste, et al., Handbook of Chemometrics and Qualimetrics, Part B, Elsevier, Amsterdam 1998). The mathematical model identifies one or more biomarkers expressed in the sample set that are most predictive of the cancer sub-type. These one or more biomarkers define a panel or an expression signature. In certain example embodiments, the mathematical model defines a variable, such as a weight, for each identified biomarker. In certain example embodiments, the mathematical model defines a decision function. The decision function may further define a threshold score which separates the sample set into two classes such as, but not limited to, samples where the cancer belongs to the cancer sub-type and samples where the cancer does not belong to the sub-type. In one example embodiment, the decision function and panel or expression signature are defined using a linear classifier.
The overall expression data for a given sample may be normalized using methods known to those skilled in the art in order to correct for differing amounts of starting material and varying efficiencies of the extraction and amplification reactions.
In certain example embodiments, biomarkers useful for distinguishing between cancer subtypes can be determined by identifying biomarkers exhibiting the highest degree of variability between samples in the patient data set as determined using the expression detection methods and patient sample sets discussed above. Standard statistical methods known in the art for identifying highly variable data points in expression data may be used to identify the highly variable biomarkers. For example, a combined background and variance filter may be applied to the patient data set. The background filter is based on the selection of probe sets with expression $E$ and expression variance $\mathrm{var}_E$ above the thresholds defined by the background standard deviation $\sigma_{Bg}$ (from the Expression Console software) and the quantile of the standard normal distribution $z_\alpha$ at a specified significance $\alpha$. Probe sets were kept if:
$$E > \log_2(z_\alpha \sigma_{Bg}); \quad \log_2(\mathrm{var}_E) > 2\left[\log_2(\sigma_{Bg}) - E - \log_2(\log(2))\right]$$
where $\alpha$ defines a significance threshold. In certain example embodiments, the significance threshold is $6.3 \times 10^{-5}$. In another example embodiment, the significance threshold may be between $1.0 \times 10^{-7}$ and $1.0 \times 10^{-3}$.
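A minimal sketch of such a combined background and variance filter follows. It assumes that E is on the log2 scale, that the quantile is the upper-tail quantile of the standard normal distribution at the chosen significance, and that the inner logarithm is the natural logarithm; the function and argument names are hypothetical.

```python
import math
from statistics import NormalDist


def passes_background_variance_filter(mean_expr, var_expr, sigma_bg, alpha=6.3e-5):
    """Return True if a probe set passes both filter conditions.

    mean_expr -- expression E of the probe set (assumed log2 scale)
    var_expr  -- expression variance varE (must be > 0)
    sigma_bg  -- background standard deviation
    alpha     -- significance threshold
    """
    # Upper-tail standard normal quantile at significance alpha (assumption).
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    # Condition 1: E > log2(z_alpha * sigma_bg)
    intensity_ok = mean_expr > math.log2(z_alpha * sigma_bg)
    # Condition 2: log2(varE) > 2 * [log2(sigma_bg) - E - log2(ln 2)]
    variance_ok = math.log2(var_expr) > 2.0 * (
        math.log2(sigma_bg) - mean_expr - math.log2(math.log(2))
    )
    return intensity_ok and variance_ok
```

A probe set is retained only when both conditions hold, so a bright but invariant probe set and a variable but dim probe set are both discarded.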
In certain example embodiments, the highly variable biomarkers may be further analyzed to group samples in the patient data set into subtypes or clusters based on similar gene expression profiles. For example, biomarkers may be clustered based on how highly correlated the up-regulation or down-regulation of their expression is to one another. Different clustering analysis techniques may be applied to gene expression data and include, but are not limited to, hierarchical clustering, inclusive of agglomerative and divisive methods (Eisen et al., 1998, PNAS 95:14863-14868), k-means family clustering, inclusive of hard and fuzzy methods (Tavazoie et al., 1999, Nat Genet, 22:281-285; Gasch and Eisen, 2002, Genome Biology 3: RESEARCH0059), self-organizing maps (SOM) (Tamayo et al., 1999, PNAS 96:2907-2912), methods based on graph theory (Sharan and Shamir, 2000, Proc Int Conf Intell Syst Mol Biol., 8:307-16), biclustering methods (Tanay et al., 2002, Bioinformatics 18: Suppl 1:S136-44), and ensemble methods (Dudoit et al. 2003, Bioinformatics, 19:1090-9). In one example embodiment, hierarchical agglomerative clustering is used to identify the cancer subtypes.
During clustering, determination of the similarity of features (sample, gene) requires the specification of a similarity matrix. Methods used to calculate the similarity include, but are not limited to, Euclidean distance, maximum distance, Manhattan distance, Minkowski distance, Canberra distance, binary distance, Kendall's tau, Pearson correlation and Spearman correlation.
During hierarchical clustering, inter-cluster distances are defined by linkage functions. Several linkage functions can be used to calculate inter-cluster distances and include, but are not limited to, single linkage (Sneath, 1957, Journal of General Microbiology, 17:201-226), complete linkage (McQuitty, 1960, Educational and Psychological Measurement, 20:55-67; Sokal and Sneath, 1963, Principles of Numerical Taxonomy, San Francisco: Freeman), UPGMA/group average (Sokal and Michener, 1958, University of Kansas Scientific Bulletin, 38:1409-1438), UPGMC/unweighted centroid (Lance and Williams, 1965, Computer Journal, 8:246-249), WPGMC/weighted centroid (Gower, 1967, Biometrics, 30:623-637) and Ward's method of minimum variance (Ward, 1963, Journal of the American Statistical Association, 58:236-244).
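As a non-limiting illustration, agglomerative clustering with Euclidean distance and Ward's minimum variance linkage can be sketched as follows. This is a simplified pure-Python sketch, not the software used in the Examples; the function names are hypothetical and the quadratic search is suitable only for small data sets.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_clustering(points, n_clusters):
    """Merge clusters until n_clusters remain, using Ward's criterion.

    At each step the pair of clusters whose merger gives the smallest
    increase in within-cluster sum of squares is merged:
        increase = (na * nb / (na + nb)) * ||centroid_a - centroid_b||^2
    Returns a list of clusters, each a sorted list of point indices.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a][0]), len(clusters[b][0])
                cost = na * nb / (na + nb) * squared_distance(
                    clusters[a][1], clusters[b][1]
                )
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a][0]), len(clusters[b][0])
        members = clusters[a][0] + clusters[b][0]
        # The centroid of the merged cluster is the size-weighted mean.
        centroid = [
            (na * x + nb * y) / (na + nb)
            for x, y in zip(clusters[a][1], clusters[b][1])
        ]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append((members, centroid))
    return [sorted(members) for members, _ in clusters]
```

Other linkage functions differ only in how the merge cost between two clusters is computed.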
To determine the biological relevance of each subtype, the biomarkers within each cluster may be further mapped to their corresponding genes and annotated by cross-reference to one or more databases referencing metabolic and signaling pathways, human gene functions and disease association, and/or ontological categories (e.g. biological processes, cellular components, molecular functions). In another example embodiment, biomarkers in clusters that are up regulated and enriched for immune response general functional terms are grouped into a putative non-angiogenesis sample group and used for expression signature generation. In another example embodiment, biomarkers in clusters that are down regulated and enriched for angiogenesis and vasculature development and are up regulated and enriched for immune response general functional terms are grouped into a putative non-angiogenesis sample group and used for expression signature generation. Further details for conducting functional analysis of biomarker clusters are provided in the Examples section below.
The following methods may be used to derive panels or expression signatures for distinguishing between cancers that belong to the sub-type or not or between subjects that are responsive or non-responsive to anti-angiogenic therapeutics, or as prognostic indicators of certain cancer types, including expression signatures derived from the biomarkers disclosed above. In certain other example embodiments, the panel or expression signature is derived using a decision tree (Hastie et al. The Elements of Statistical Learning, Springer, New York 2001), a random forest (Breiman, 2001 Random Forests, Machine Learning 45:5), a neural network (Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford 1995), discriminant analysis (Duda et al. Pattern Classification, 2nd ed., John Wiley, New York 2001), including, but not limited to, linear, diagonal linear, quadratic and logistic discriminant analysis, a Prediction Analysis for Microarrays (PAM, (Tibshirani et al., 2002, Proc. Natl. Acad. Sci. USA 99:6567-6572)) or a Soft Independent Modeling of Class Analogy analysis (SIMCA, (Wold, 1976, Pattern Recogn. 8:127-139)). Classification trees (Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and regression trees. Monterey, Calif.: Wadsworth & Brooks/Cole Advanced Books & Software. ISBN 978-0-412-04841-8) provide a means of predicting outcomes based on logic and rules. A classification tree is built through a process called binary recursive partitioning, which is an iterative procedure of splitting the data into partitions/branches. The goal is to build a tree that distinguishes among pre-defined classes. Each node in the tree corresponds to a variable. To choose the best split at a node, each variable is considered in turn, where every possible split is tried and considered, and the best split is the one which produces the largest decrease in diversity of the classification label within each partition. 
This is repeated for all variables, and the winner is chosen as the best splitter for that node. The process is continued at the next node and in this manner, a full tree is generated. One of the advantages of classification trees over other supervised learning approaches such as discriminant analysis, is that the variables that are used to build the tree can be either categorical, or numeric, or a mix of both. In this way it is possible to generate a classification tree for predicting outcomes based on say the directionality of gene expression. Random forest algorithms (Breiman, Leo (2001). “Random Forests”. Machine Learning 45 (1): 5-32. doi:10.1023/A:1010933404324) provide a further extension to classification trees, whereby a collection of classification trees are randomly generated to form a “forest” and an average of the predicted outcomes from each tree is used to make inference with respect to the outcome.
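The choice of the best splitter at a single node can be sketched as follows. The sketch assumes Gini impurity as the measure of diversity (the text above does not name a specific measure) and numeric variables only; the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Try every variable and every candidate threshold at one node.

    rows   -- list of samples, each a list of numeric variable values
    labels -- class label of each sample
    Returns (decrease in impurity, variable index, threshold) for the
    split with the largest decrease, or None if no split is possible.
    """
    parent_impurity = gini(labels)
    n = len(rows)
    best = None
    for var in range(len(rows[0])):
        values = sorted(set(row[var] for row in rows))
        # Candidate thresholds are midpoints between adjacent observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for row, y in zip(rows, labels) if row[var] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[var] > threshold]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            decrease = parent_impurity - weighted
            if best is None or decrease > best[0]:
                best = (decrease, var, threshold)
    return best
```

Binary recursive partitioning applies the same search again to each of the two resulting partitions until a stopping rule is met.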
Biomarker expression values may be defined in combination with corresponding scalar weights on the real scale with varying magnitude, which are further combined through linear or non-linear, algebraic, trigonometric or correlative means into a single scalar value via algebraic, statistical learning, Bayesian, regression, or similar algorithms. Together with a mathematically derived decision function on the scalar value, these provide a predictive model by which expression profiles from samples may be resolved into discrete classes of responder or non-responder, resistant or non-resistant, to a specified drug, drug class, molecular subtype, or treatment regimen. Such predictive models, including biomarker membership, are developed by learning weights and the decision threshold, optimized for sensitivity, specificity, negative and positive predictive values, hazard ratio or any combination thereof, under cross-validation, bootstrapping or similar sampling techniques, from a set of representative expression profiles from historical patient samples with known drug response and/or resistance.
In one embodiment, the biomarkers are used to form a weighted sum of their signals, where individual weights can be positive or negative. The resulting sum (“expression score”) is compared with a pre-determined reference point or value. The comparison with the reference point or value may be used to diagnose, or predict a clinical condition or outcome.
In certain example embodiments, the panel or expression signature is defined by a decision function. A decision function is a set of weighted expression values derived using a linear classifier. All linear classifiers define the decision function using the following equation:
$$f(x) = w' \cdot x + b = \sum_i w_i \cdot x_i + b \qquad (1)$$
All measurement values, such as the microarray gene expression intensities xi, for a certain sample are collected in a vector x. Each intensity is then multiplied with a corresponding weight wi to obtain the value of the decision function f(x) after adding an offset term b. In deriving the decision function, the linear classifier will further define a threshold value that splits the gene expression data space into two disjoint sections. Example linear classifiers include but are not limited to partial least squares (PLS), (Nguyen et al., Bioinformatics 18 (2002) 39-50), support vector machines (SVM) (Schölkopf et al., Learning with Kernels, MIT Press, Cambridge 2002), and shrinkage discriminant analysis (SDA) (Ahdesmäki et al., Annals of applied statistics 4, 503-519 (2010)). In one example embodiment, the linear classifier is a PLS linear classifier.
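For illustration, the weighted sum of equation (1) and the comparison with a threshold can be sketched as below. The weights, offset and threshold would in practice come from training; the values used in the usage note are hypothetical, as are the function names and class labels.

```python
def decision_score(expression, weights, offset):
    """Evaluate f(x) = sum_i(w_i * x_i) + b for one sample."""
    if len(expression) != len(weights):
        raise ValueError("expression and weights must have the same length")
    return sum(w * x for w, x in zip(weights, expression)) + offset


def classify(expression, weights, offset, threshold=0.0):
    """Assign a sample to one of two classes by comparing f(x) with a threshold."""
    score = decision_score(expression, weights, offset)
    return "sub-type" if score > threshold else "not sub-type"
```

With hypothetical weights `[0.5, -1.0, 2.0]` and offset `-1.0`, the sample `[2.0, 1.0, 1.0]` gives a score of 1.0 and is allocated to the sub-type when the threshold is 0.0. A positive weight means higher expression of that biomarker raises the score, and a negative weight means it lowers the score.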
The decision function is empirically derived on a large set of training samples, for example from patients showing a good or poor clinical prognosis. The threshold separates a patient group based on different characteristics such as, but not limited to, clinical prognosis before or after a given therapeutic treatment. The interpretation of this quantity, i.e. the cut-off threshold, is derived in the development phase (“training”) from a set of patients with known outcome. The corresponding weights and the responsiveness/resistance cut-off threshold for the decision score are fixed a priori from training data by methods known to those skilled in the art. In one example embodiment, Partial Least Squares Discriminant Analysis (PLS-DA) is used for determining the weights. (L. Ståhle, S. Wold, J. Chemom. 1 (1987) 185-196; D. V. Nguyen, D. M. Rocke, Bioinformatics 18 (2002) 39-50).
Effectively, this means that the data space, i.e. the set of all possible combinations of biomarker expression values, is split into two mutually exclusive groups corresponding to different clinical classifications or predictions, for example, one corresponding to good clinical prognosis and the other to poor clinical prognosis. In the context of the overall classifier, relative over-expression of a certain biomarker can either increase the decision score (positive weight) or reduce it (negative weight) and thus contribute to an overall decision of, for example, a good clinical prognosis.
In certain example embodiments of the invention, the data is transformed non-linearly before applying a weighted sum as described above. This non-linear transformation might include increasing the dimensionality of the data. The non-linear transformation and weighted summation might also be performed implicitly, for example, through the use of a kernel function. (Schölkopf et al. Learning with Kernels, MIT Press, Cambridge 2002).
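The equivalence between an explicit non-linear transformation and an implicit kernel evaluation can be illustrated with a small worked example. The degree-2 polynomial kernel and the two-dimensional input used here are chosen purely for illustration and are not taken from the embodiments above.

```python
import math


def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def explicit_map(x):
    """Map a 2-D vector to 3-D: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def polynomial_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2.

    This equals dot(explicit_map(x), explicit_map(y)) without ever
    constructing the higher-dimensional vectors.
    """
    return dot(x, y) ** 2
```

For `x = [1.0, 2.0]` and `y = [3.0, 0.5]`, both `polynomial_kernel(x, y)` and `dot(explicit_map(x), explicit_map(y))` evaluate to 16 (up to floating-point rounding), which is why a weighted sum in the transformed space can be computed through the kernel alone.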
In certain example embodiments, the patient training set data is derived by isolating RNA from a corresponding cancer tissue sample set and determining expression values by hybridizing the cDNA amplified from the isolated RNA to a microarray. In certain example embodiments, the microarray used in deriving the panel or expression signature is a transcriptome array. As used herein a “transcriptome array” refers to a microarray containing probe sets that are designed to hybridize to sequences that have been verified as expressed in the diseased tissue of interest. Given alternative splicing and variable poly-A tail processing between tissues and biological contexts, it is possible that probes designed against the same gene sequence derived from another tissue source or biological context will not effectively bind to transcripts expressed in the diseased tissue of interest, leading to a loss of potentially relevant biological information. Accordingly, it is beneficial to verify what sequences are expressed in the disease tissue of interest before deriving a microarray probe set. Verification of expressed sequences in a particular disease context may be done, for example, by isolating and sequencing total RNA from a diseased tissue sample set and cross-referencing the isolated sequences with known nucleic acid sequence databases to verify that the probe set on the transcriptome array is designed against the sequences actually expressed in the diseased tissue of interest. Methods for making transcriptome arrays are described in United States Patent Application Publication No. 2006/0134663, which is incorporated herein by reference. In certain example embodiments, the probe set of the transcriptome array is designed to bind within 300 nucleotides of the 3′ end of a transcript. Methods for designing transcriptome arrays with probe sets that bind within 300 nucleotides of the 3′ end of target transcripts are disclosed in United States Patent Application Publication No. 2009/0082218, which is incorporated by reference herein. In certain example embodiments, the microarray used in deriving the gene expression profiles of the present invention is the Almac Ovarian Cancer DSA™ microarray (Almac Group, Craigavon, United Kingdom).
An optimal (linear) classifier can be selected by evaluating a (linear) classifier's performance using such diagnostics as “area under the curve” (AUC). AUC refers to the area under the curve of a receiver operating characteristic (ROC) curve, both of which are well known in the art. AUC measures are useful for comparing the accuracy of a classifier across the complete data range. (Linear) classifiers with a higher AUC have a greater capacity to classify unknowns correctly between two groups of interest (e.g., ovarian cancer samples and normal or control samples). ROC curves are useful for plotting the performance of a particular feature (e.g., any of the biomarkers described herein and/or any item of additional biomedical information) in distinguishing between two populations (e.g., individuals responding and not responding to a therapeutic agent). Typically, the feature data across the entire population (e.g., the cases and controls) are sorted in ascending order based on the value of a single feature. Then, for each value for that feature, the true positive and false positive rates for the data are calculated. The true positive rate is determined by counting the number of cases above the value for that feature and then dividing by the total number of positive cases. The false positive rate is determined by counting the number of controls above the value for that feature and then dividing by the total number of controls. Although this definition refers to scenarios in which a feature is elevated in cases compared to controls, this definition also applies to scenarios in which a feature is lower in cases compared to the controls (in such a scenario, samples below the value for that feature would be counted). ROC curves can be generated for a single feature as well as for other single outputs; for example, a combination of two or more features can be mathematically combined (e.g., added, subtracted, multiplied, etc.) to provide a single sum value, and this single sum value can be plotted in a ROC curve. Additionally, any combination of multiple features, in which the combination derives a single output value, can be plotted in a ROC curve. These combinations of features may comprise a test. The ROC curve is the plot of the true positive rate (sensitivity) of a test against the false positive rate (1-specificity) of the test.
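The construction of a ROC curve and its AUC can be sketched as follows. The sketch assumes the feature is elevated in cases, counts samples at or above each threshold (a small departure from "above" so that every observed value yields a point), and integrates with the trapezoidal rule; the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points as (false positive rate, true positive rate).

    scores -- feature value or combined score for each sample
    labels -- 1 for a case, 0 for a control
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    points = [(0.0, 0.0)]
    # Sweep the threshold from the highest observed value to the lowest.
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((false_pos / n_controls, true_pos / n_cases))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

For scores `[0.9, 0.4, 0.6, 0.2]` with labels `[1, 1, 0, 0]`, three of the four case-control pairs are ordered correctly and the AUC is 0.75.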
In certain embodiments deriving a panel of at least 2 biomarkers, wherein the expression level(s) of the at least 2 biomarkers in a sample from a subject with a cancer allows the cancer to be allocated to a sub-type (optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B comprises
obtaining the expression profiles of a training set of samples known to belong to the sub-type or not using microarray probes
mapping probes to genes and measuring gene expression using the log2 transformation of the median probeset expression for each gene
within nested CV, performing quantile normalization following a pre-filtering to remove 75% of genes with low variance, low intensity, and high correlation to cDNA yield
ranking genes/features based on correlation adjusted t-scores2 and discarding 10% of the least important genes until 5 genes remain
identifying a panel of at least 2 biomarkers for which AUC and C-Index (Concordance Index) for the Progression free survival (PFS) endpoint under cross-validation are significant.
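The gene-level summarisation and quantile normalization steps listed above can be sketched as follows. This is a simplified illustration, not the pipeline used in the Examples: the pre-filtering and nested cross-validation are omitted, ties are broken by position rather than averaged, and the function names and data layout are hypothetical.

```python
import math
from statistics import median


def gene_level_expression(probeset_values, probeset_to_gene):
    """log2 of the median probe set expression for each gene.

    probeset_values  -- dict: probe set id -> expression value (linear scale, > 0)
    probeset_to_gene -- dict: probe set id -> gene symbol
    """
    by_gene = {}
    for probeset, value in probeset_values.items():
        by_gene.setdefault(probeset_to_gene[probeset], []).append(value)
    return {gene: math.log2(median(values)) for gene, values in by_gene.items()}


def quantile_normalize(samples):
    """Give every sample the same distribution of values.

    samples -- list of equal-length lists, one list of gene values per sample.
    Each value is replaced by the mean, across samples, of the values
    holding the same rank.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]
    normalized = []
    for sample in samples:
        order = sorted(range(n_genes), key=lambda i: sample[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized
```

When normalization is performed within cross-validation, the rank means would be estimated on the training folds only, so that held-out samples do not influence the reference distribution.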
In further embodiments deriving a panel of at least 2 biomarkers, wherein the expression level(s) of the at least 2 biomarkers in a sample from a subject with a cancer allows the cancer to be allocated to a sub-type (optionally) wherein the cancer sub-type is defined by the expression levels of the genes in Tables A and B comprises
obtaining the expression profiles of a training set of samples known to belong to the sub-type or not using microarray probes
mapping probes to genes and measuring gene expression using the log2 transformation of the median probeset expression for each gene
within nested CV, performing quantile normalization following a pre-filtering to remove 75% of genes with low variance, low intensity, and high correlation to cDNA yield
using Recursive Feature Elimination (RFE) for feature reduction to discard 10% of the least important genes (based upon their discriminatory ability) until 5 genes remain
identifying a panel of at least 2 biomarkers for which AUC and C-Index (Concordance Index) for the Progression free survival (PFS) endpoint under cross-validation are significant.
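The elimination schedule described above (discard 10% of the least important genes until 5 remain) can be sketched as follows. The `importance` callable is a hypothetical stand-in for whatever ranking is used, for example weights from a refitted classifier; it is not part of the disclosed method.

```python
def recursive_feature_elimination(features, importance, drop_fraction=0.10, min_features=5):
    """Iteratively discard the least important features.

    features      -- list of feature (gene) identifiers
    importance    -- callable taking the current feature list and returning a
                     dict: feature -> importance score (recomputed every round)
    drop_fraction -- fraction of the remaining features discarded per round
    min_features  -- stop when this many features remain
    Returns (final feature list, list of feature lists after each round).
    """
    remaining = list(features)
    history = [list(remaining)]
    while len(remaining) > min_features:
        scores = importance(remaining)
        ranked = sorted(remaining, key=lambda f: scores[f], reverse=True)
        # Always drop at least one feature so the loop terminates.
        n_drop = max(1, int(len(ranked) * drop_fraction))
        n_keep = max(min_features, len(ranked) - n_drop)
        remaining = ranked[:n_keep]
        history.append(list(remaining))
    return remaining, history
```

Each entry in the returned history is a candidate panel, so performance measures such as AUC and C-Index can be evaluated at every panel size under cross-validation.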
The signatures/panels described herein may result from the application of the methods for deriving panels of biomarkers described herein.
According to all aspects of the invention the method may comprise allocating the cancer to the sub-type based on the expression level of a panel of one or more, optionally two or more, biomarkers derived using the method outlined above in a sample from the subject.
The example systems, methods, and acts described in the embodiments presented previously are illustrative, and, in alternative embodiments, certain acts can be performed in a different order, in parallel with one another, omitted entirely, and/or combined between different example embodiments, and/or certain additional acts can be performed, without departing from the scope and spirit of various embodiments. Accordingly, such alternative embodiments are included in the examples described herein.
Although specific embodiments have been described above in detail, the description is merely for purposes of illustration. It should be appreciated, therefore, that many aspects described above are not intended as required or essential elements unless explicitly stated otherwise.
Modifications of, and equivalent components or acts corresponding to, the disclosed aspects of the example embodiments, in addition to those described above, can be made by a person of ordinary skill in the art, having the benefit of the present disclosure, without departing from the spirit and scope of embodiments defined in the following claims, the scope of which is to be accorded the broadest interpretation so as to encompass such modifications and equivalent structures.
The present invention will be further understood by reference to the following experimental examples.
A cohort of 287 macrodissected epithelial serous ovarian tumor FFPE tissue samples was sourced from NHS Lothian and the University of Edinburgh.
Gene Expression Profiling from FFPE
Total RNA was extracted from macrodissected FFPE tissue using the High Pure RNA Paraffin Kit (Roche Diagnostics GmbH, Mannheim, Germany). RNA was converted into complementary deoxyribonucleic acid (cDNA), which was subsequently amplified and converted into single-stranded form using the SPIA® technology of the WT-Ovation™ FFPE RNA Amplification System V2 (NuGEN Technologies Inc., San Carlos, Calif., USA). The amplified single-stranded cDNA was then fragmented and biotin labeled using the FL-Ovation™ cDNA Biotin Module V2 (NuGEN Technologies Inc.). The fragmented and labeled cDNA was then hybridized to the Almac Ovarian Cancer DSA™. Almac's Ovarian Cancer DSA research tool has been optimised for analysis of FFPE tissue samples, enabling the use of valuable archived tissue banks. The Almac Ovarian Cancer DSA™ research tool is an innovative microarray platform that represents the transcriptome in both normal and cancerous ovarian tissues. Consequently, the Ovarian Cancer DSA™ provides a comprehensive representation of the transcriptome within the ovarian disease and tissue setting, not available using generic microarray platforms. Arrays were scanned using the Affymetrix Genechip® Scanner 7G (Affymetrix Inc., Santa Clara, Calif.).
Data Preparation
Quality Control (QC) of profiled samples was carried out using the MAS5 pre-processing algorithm. Different technical aspects were addressed: average noise and background homogeneity, percentage of present calls (array quality), signal quality, RNA quality and hybridization quality. Distributions and Median Absolute Deviation of corresponding parameters were analyzed and used to identify possible outliers.
Almac's Ovarian Cancer DSA™ contains probes that primarily target the area within 300 nucleotides from the 3′ end. Therefore standard Affymetrix RNA quality measures were adapted: for housekeeping genes, intensities of 3′ end probe sets, together with ratios of 3′ end probe set intensity to the average background intensity, were used in addition to the usual 3′/5′ ratios. Hybridization controls were checked to ensure that their intensities and present calls conformed to the requirements specified by Affymetrix.
Hierarchical Clustering and Functional Analysis
Sample pre-processing was carried out using Robust Multi-Array analysis (RMA) [Irizarry R A, Bolstad B M, Collin F, Cope L M, Hobbs B, Speed T P. Summaries of Affymetrix GeneChip probe level data. Nucleic acids research 2003; 31:e15]. The data matrix was sorted by decreasing variance, decreasing intensity and increasing correlation to cDNA yield. Following filtering of probe sets correlated with cDNA yield, incremental subsets of the data matrix were tested for cluster stability: the GAP statistic [Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J Roy Stat Soc B 2001; 63:411-23] was applied to calculate the number of sample and probe set clusters while the stability of cluster composition was assessed using partition comparison methods. The final most variable probe set list was determined based on the smallest and most stable data matrix for the selected number of sample clusters.
Following standardization of the data matrix to the median probe set expression values, agglomerative hierarchical clustering was performed using Euclidean distance and Ward's linkage method [Ward J H. Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association 1963; 58:236-244]. The optimal number of sample and probe set clusters was determined using the GAP statistic [Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J Roy Stat Soc B 2001; 63:411-23]. The significance of the distribution of clinical parameter factor levels across sample clusters was assessed using ANOVA (continuous factor) or chi-squared analysis (discrete factor) and corrected for false discovery rate (product of p-value and number of tests performed). A corrected p-value threshold of 0.05 was used as the criterion for significance. Ovarian Cancer DSA® probe sets were remapped to genes using an annotation pipeline based on Ensembl v60 [http://oct2012.archive.ensembl.org/]. Functional enrichment analysis was conducted to identify and rank biological entities which were found to be associated with the clustered gene sets using the Gene Ontology biological processes classification [Ashburner M, Ball C A, Blake J A, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics 2000; 25:25-9]. Entities were ranked according to a statistically derived enrichment score [Cho R J, Huang M X, Campbell M J, et al. Transcriptional regulation and function during the human cell cycle. Nature genetics 2001; 27:48-54] and adjusted for multiple testing [Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. J Roy Stat Soc B Met 1995; 57:289-300]. A corrected p-value of 0.05 was used as the significance threshold. 
The identified enriched processes were summarised into an overall group function for each probe set/gene cluster.
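By way of illustration only, the median standardization and agglomerative clustering with Euclidean distance and Ward's linkage may be sketched as follows. This is a naive implementation for a small toy matrix, not the software used in the analysis. The number of clusters `k` is passed in directly, whereas in the analysis it was chosen using the GAP statistic. All function names and data are hypothetical.

```python
from statistics import median

def median_center(matrix):
    """Subtract each probe set's (column's) median across samples (rows)."""
    meds = [median(col) for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, meds)] for row in matrix]

def ward_clusters(rows, k):
    """Merge clusters until k remain, always choosing the pair whose merge
    gives the smallest increase in within-cluster sum of squares (Ward)."""
    clusters = [([i], list(r)) for i, r in enumerate(rows)]  # (members, centroid)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        n = len(ma) + len(mb)
        centroid = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return sorted(sorted(members) for members, _ in clusters)

# Toy matrix: five samples (rows) by two probe sets (columns).
samples = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.9], [9.0, 0.5]]
groups = ward_clusters(median_center(samples), k=3)
```

The same routine could be applied to the transposed matrix to cluster probe sets rather than samples.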
Defining the Core Genes
The core angiogenic and immune genes were defined by evaluating functional enrichment of the 136 immune and 350 angiogenic probe sets that constitute the immune and angiogenic clusters from the unsupervised analysis of the 265 HGS samples. The analysis was performed using Almac's Functional Enrichment Tool (FET) v1.1.0. The functions were ordered by p-value and the 100 most significant biological functions were examined. Of these 100 functions, those directly related to immune processes (immune response, inflammatory response, interferon, antigen processing) or angiogenic processes (angiogenesis, vasculature development, system development) were retained, and the genes involved in each process were remapped to the ovarian array, resulting in the 238 core functional genes (77 immune, 161 angiogenesis).
Results
265 HGS tumors passed microarray QC and subsequently underwent unsupervised hierarchical clustering based on the 1400 most variable probe sets (corresponding to 1040 genes). Three sample clusters and four gene clusters were identified (
Multivariable survival analysis according to subgroup revealed that the patients in the Immune cluster had significantly prolonged OS compared to patients in both the Angioimmune (HR=0.58 [0.41-0.82], padj=0.001) and Angio clusters (HR=0.55 [0.37-0.80], padj=0.001). Kaplan-Meier curves are shown in
Since patients in the Immune cluster had a significantly better outcome than those in the other clusters we proceeded to develop an assay to prospectively identify these patients in the clinic. In addition, given the low expression of angiogenic genes in the immune cluster, we hypothesized that this assay may identify a population that would not benefit from therapies targeting angiogenesis, although it would require additional datasets to test this theory. For the purpose of signature generation the Angio and Angioimmune clusters were grouped together and labeled as the “pro-angiogenic” group.
The core set of genes defining the “Immune” subtype comprises 161 angiogenesis related probesets and 77 immune related probesets. The general pattern of expression that defines the subtype is up-regulation of immune probesets and down-regulation of angiogenesis probesets.
Scoring Method for Predicting the Immune Subtype
A scoring method was derived to enable classification of patients into either the Immune or the Pro-Angiogenic subtype. The scoring method is based on the following, using the 265 high grade serous (HGS) samples that were used to discover the subtype:
Minimum Number of Genes Required
The ratio of Angiogenesis:Immune probesets is approximately 2:1 (161:77); therefore, in evaluating the minimum number of probesets required to classify samples into the Immune or Pro-angiogenic subtype, it is assumed that a 2:1 ratio should be maintained.
The minimum number of features considered was 3 (2 angio and 1 immune), increasing by three at each iteration up to 228 (maintaining the 2:1 ratio). At each feature length 1000 random samplings of the probesets were performed, and the 265 HGS samples were scored by the signature as described above.
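By way of illustration only, the schedule of feature lengths and the ratio-preserving random sampling may be sketched as follows. The probeset identifiers are placeholders, and only 5 samplings per feature length are drawn here to keep the example small, whereas 1000 were used in the analysis. The function names are hypothetical.

```python
import random

def feature_lengths(max_total=228, step=3):
    """Feature lengths 3, 6, ..., 228."""
    return list(range(step, max_total + 1, step))

def sample_signature(angio_ids, immune_ids, total, rng):
    """Draw `total` probesets without replacement: 2 angio for every 1 immune."""
    n_immune = total // 3
    n_angio = total - n_immune
    return rng.sample(angio_ids, n_angio) + rng.sample(immune_ids, n_immune)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
angio_ids = [f"angio_{i}" for i in range(161)]    # placeholder IDs
immune_ids = [f"immune_{i}" for i in range(77)]   # placeholder IDs

lengths = feature_lengths()
draws = {n: [sample_signature(angio_ids, immune_ids, n, rng) for _ in range(5)]
         for n in lengths}
```

Each entry of `draws` could then be passed to the scoring method and its performance recorded per feature length.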
The performance of the signatures was measured by the following:
Results
Scoring Method for Predicting the Immune Subtype
The scoring method applied to all samples using all core probesets resulted in an AUC performance against the subtype endpoint of 0.89 [0.85−0.93].
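By way of illustration only, an AUC of this kind may be computed as the proportion of (Immune, Pro-angiogenic) sample pairs in which the Immune sample receives the higher score, counting ties as one half. The sketch assumes that a higher score indicates the Immune subtype and that labels are coded 1 for Immune and 0 for Pro-angiogenic. Both are assumptions made for the example, and the scores shown are invented. The method used to obtain the reported interval is not reproduced here.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented scores for six samples; 1 = Immune, 0 = Pro-angiogenic.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
value = auc(scores, labels)
```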
Minimum Number of Genes Required
Methods:
Signature Development
A balanced sample set of 193 Ovarian HGS samples was used to develop the signature using the PLS (Partial Least Squares) method [19] during 10 repeats of 5-fold cross validation (CV). The following steps were used within signature development:
The following datasets have been evaluated within CV to determine the performance of the 63 gene signature:
Core Gene Analysis
The purpose of evaluating the core gene set of the signature is to determine a ranking for the genes based upon their impact on performance when removed from the signature.
This analysis involved 1,000,000 random samplings of 10 signature genes from the original 63 signature gene set. At each iteration, 10 randomly selected signature genes were removed and the performance of the remaining 53 genes was evaluated using the PFS endpoint to determine the impact on HR performance when these 10 genes were removed in the following 3 datasets:
Within each of these 3 datasets, the signature genes were weighted based upon the change in HR performance (Delta HR) based upon their inclusion or exclusion. Genes ranked ‘1’ have the most negative impact on performance when removed and those ranked ‘63’ have the least impact on performance when removed.
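By way of illustration only, the random removal of 10 genes and the ranking of genes by mean change in performance may be sketched as follows. The `performance` argument is a hypothetical stand-in for the HR evaluation against the PFS endpoint. In the toy example it is a simple sum of invented weights in which a higher value is better, so the most negative mean change corresponds to rank 1. Only 2000 iterations are run here, whereas 1,000,000 were used in the analysis.

```python
import random

def rank_core_genes(genes, performance, n_remove=10, n_iter=2000, seed=0):
    """Rank genes by the mean change in performance observed when they are
    among the removed genes. The first gene returned is the one whose
    removal has the most negative impact."""
    rng = random.Random(seed)
    baseline = performance(genes)
    total = {g: 0.0 for g in genes}
    count = {g: 0 for g in genes}
    for _ in range(n_iter):
        removed = set(rng.sample(genes, n_remove))
        kept = [g for g in genes if g not in removed]
        delta = performance(kept) - baseline
        for g in removed:
            total[g] += delta
            count[g] += 1
    mean_delta = {g: total[g] / count[g] for g in genes if count[g]}
    return sorted(mean_delta, key=mean_delta.get)

# Toy stand-in: 63 genes with invented weights; g0 and g1 matter most.
genes = [f"g{i}" for i in range(63)]
weight = {g: 1.0 for g in genes}
weight["g0"], weight["g1"] = 10.0, 5.0

def toy_performance(subset):
    return sum(weight[g] for g in subset)

ranking = rank_core_genes(genes, toy_performance)
```

Because each change is shared among the 10 removed genes, many iterations are needed before the contribution of an individual gene separates from the noise introduced by the other nine.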
Minimum Gene Analysis
The purpose of evaluating the minimum number of genes is to determine if significant performance can be achieved within smaller subsets of the original signature.
This analysis involved 10,000 random samplings of the 63 signature genes starting at 1 gene/feature, up to a maximum of 25 genes/features. For each randomly selected feature length, the signature was redeveloped using the PLS machine learning method under CV and model parameters derived. At each feature length, all randomly selected signatures were applied to calculate signature scores for the following 3 datasets:
Continuous signature scores were evaluated with PFS (Progression Free Survival) to determine the HR (Hazard Ratio) effect. The HR for all random signatures at each feature length was summarized and figures generated to visualize the performance over CV.
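By way of illustration only, the summary of performance across random signatures at each feature length may be sketched as follows. The `evaluate` argument is a hypothetical stand-in for redeveloping the signature under CV and computing the HR against PFS. The toy function used here is invented and simply returns a lower value as more of an arbitrary set of "informative" genes is included. Only 200 samplings per feature length are drawn, whereas 10,000 were used in the analysis.

```python
import random
from statistics import median

def minimum_gene_curve(genes, evaluate, max_len=25, n_draws=200, seed=0):
    """For each feature length 1..max_len, evaluate random gene subsets and
    return (5th percentile, median, 95th percentile) of the results."""
    rng = random.Random(seed)
    summary = {}
    for n in range(1, max_len + 1):
        values = sorted(evaluate(rng.sample(genes, n)) for _ in range(n_draws))
        low = values[int(0.05 * (n_draws - 1))]
        high = values[int(0.95 * (n_draws - 1))]
        summary[n] = (low, median(values), high)
    return summary

# Toy stand-in for the HR: invented, decreases as informative genes are added.
genes = [f"g{i}" for i in range(63)]
informative = set(genes[:30])

def toy_hr(subset):
    return 1.0 / (1.0 + 0.05 * sum(g in informative for g in subset))

curve = minimum_gene_curve(genes, toy_hr)
```

Plotting the three values in `curve` against feature length gives the kind of figure described above.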
Results
Signature Development
This section presents the results of signature development within CV.
Core Gene Analysis
The results for the core gene analysis of the 63 gene signature in 3 datasets are provided in this section.
Minimum Gene Analysis
The results for the minimum gene analysis of the 63 gene signature in 3 datasets are provided in this section.
Methods:
Signature Development
A balanced sample set of 193 Ovarian HGS samples was used to develop the signature using the PLS (Partial Least Squares) method [19] during 10 repeats of 5-fold cross validation (CV). The following steps were used within signature development:
The following datasets have been evaluated within CV to determine the performance of the 121 gene signature:
Core Gene Analysis
The purpose of evaluating the core gene set of the signature is to determine a ranking for the genes based upon their impact on performance when removed from the signature.
This analysis involved 1,000,000 random samplings of 10 signature genes from the original 121 signature gene set. At each iteration, 10 randomly selected signature genes were removed and the performance of the remaining 111 genes was evaluated using the PFS endpoint to determine the impact on HR performance when these 10 genes were removed in the following 3 datasets:
Within each of these 3 datasets, the signature genes were weighted based upon the change in HR performance (Delta HR) based upon their inclusion or exclusion. Genes ranked ‘1’ have the most negative impact on performance when removed and those ranked ‘121’ have the least impact on performance when removed.
Minimum Gene Analysis
The purpose of evaluating the minimum number of genes is to determine if significant performance can be achieved within smaller subsets of the original signature.
This analysis involved 10,000 random samplings of the 121 signature genes starting at 1 gene/feature, up to a maximum of 25 genes/features. For each randomly selected feature length, the signature was redeveloped using the PLS machine learning method under CV and model parameters derived. At each feature length, all randomly selected signatures were applied to calculate signature scores for the following 3 datasets:
Continuous signature scores were evaluated with PFS (Progression Free Survival) to determine the HR (Hazard Ratio) effect. The HR for all random signatures at each feature length was summarized and figures generated to visualize the performance over CV.
Results
Signature Development
This section presents the results of signature development within CV.
Core Gene Analysis
The results for the core gene analysis of the 121 gene signature in 3 datasets are provided in this section.
Minimum Gene Analysis
The results for the minimum gene analysis of the 121 gene signature in 3 datasets are provided in this section.
Methods:
Signature Development
A balanced sample set of 193 Ovarian HGS samples was used to develop the signature using the PLS (Partial Least Squares) method [19] during 10 repeats of 5-fold cross validation (CV). The following steps were used within signature development:
The following datasets have been evaluated within CV to determine the performance of the 232 gene signature:
Core Gene Analysis
The purpose of evaluating the core gene set of the signature is to determine a ranking for the genes based upon their impact on performance when removed from the signature.
This analysis involved 1,000,000 random samplings of 10 signature genes from the original 232 signature gene set. At each iteration, 10 randomly selected signature genes were removed and the performance of the remaining 222 genes was evaluated using the PFS endpoint to determine the impact on HR performance when these 10 genes were removed in the following 3 datasets:
Within each of these 3 datasets, the signature genes were weighted based upon the change in HR performance (Delta HR) based upon their inclusion or exclusion. Genes ranked ‘1’ have the most negative impact on performance when removed and those ranked ‘232’ have the least impact on performance when removed.
Minimum Gene Analysis
The purpose of evaluating the minimum number of genes is to determine if significant performance can be achieved within smaller subsets of the original signature.
This analysis involved 10,000 random samplings of the 232 signature genes starting at 1 gene/feature, up to a maximum of 25 genes/features. For each randomly selected feature length, the signature was redeveloped using the PLS machine learning method under CV and model parameters derived. At each feature length, all randomly selected signatures were applied to calculate signature scores for the following 3 datasets:
Continuous signature scores were evaluated with PFS (Progression Free Survival) to determine the HR (Hazard Ratio) effect. The HR for all random signatures at each feature length was summarized and figures generated to visualize the performance over CV.
Results
Signature Development
This section presents the results of signature development within CV.
Core Gene Analysis
The results for the core gene analysis of the 232 gene signature in 3 datasets are provided in this section.
Minimum Gene Analysis
The results for the minimum gene analysis of the 232 gene signature in 3 datasets are provided in this section.
Methods:
Signature Development
A balanced sample set of 193 Ovarian HGS samples was used to develop the signature using the SDA (Shrinkage Discriminant Analysis) method [Ahdesmaki, M. and Strimmer, K. (2010) Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics 4, 503-519] during 10 repeats of 5-fold cross validation (CV). The following steps were used within signature development:
The following datasets have been evaluated within CV to determine the performance of the 188 gene signature:
Core Gene Analysis
The purpose of evaluating the core gene set of the signature is to determine a ranking for the genes based upon their impact on performance when removed from the signature.
This analysis involved 1,000,000 random samplings of 10 signature genes from the original 188 signature gene set. At each iteration, 10 randomly selected signature genes were removed and the performance of the remaining 178 genes was evaluated using the PFS endpoint to determine the impact on HR performance when these 10 genes were removed in the following 3 datasets:
Within each of these 3 datasets, the signature genes were weighted based upon the change in HR performance (Delta HR) based upon their inclusion or exclusion. Genes ranked ‘1’ have the most negative impact on performance when removed and those ranked ‘188’ have the least impact on performance when removed.
Minimum Gene Analysis
The purpose of evaluating the minimum number of genes is to determine if significant performance can be achieved within smaller subsets of the original signature.
This analysis involved 10,000 random samplings of the 188 signature genes starting at 1 gene/feature, up to a maximum of 25 (or 35 in the case of the Tothill dataset) genes/features. For each randomly selected feature length, the signature was redeveloped using the SDA machine learning method under CV and model parameters derived. At each feature length, all randomly selected signatures were applied to calculate signature scores for the following 3 datasets:
Continuous signature scores were evaluated with PFS (Progression Free Survival) to determine the HR (Hazard Ratio) effect. The HR for all random signatures at each feature length was summarized and figures generated to visualize the performance over CV.
Results
Signature Development
This section presents the results of signature development within CV.
Core Gene Analysis
The results for the core gene analysis of the 188 gene signature in 3 datasets are provided in this section.
Minimum Gene Analysis
The results for the minimum gene analysis of the 188 gene signature in 3 datasets are provided in this section.
Number | Date | Country | Kind |
---|---|---|---|
1409476.7 | May 2014 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2015/051557 | 5/28/2015 | WO | 00 |