The present invention relates to cancer and in particular to prostate cancer and ER positive breast cancer. Provided are methods for characterising and prognosing cancer and in particular prostate cancer and ER positive breast cancer. The methods utilize various biomarkers, specifically in the form of one or more gene signatures. Primers, probes, antibodies, kits, devices and systems useful in the methods are also described.
Prostate cancer is the most common malignancy in men with a lifetime incidence of 15.3% (Howlader 2012). Based upon data from 1999-2006 approximately 80% of prostate cancer patients present with early disease clinically confined to the prostate (Altekruse et al 2010) of which around 65% are cured by surgical resection or radiotherapy (Kattan et al 1999, Pound et al 1999). 35% will develop PSA recurrence of which approximately 35% will develop local or metastatic recurrence, which is non-curable. At present it is unclear which patients with early prostate cancer are likely to develop recurrence and may benefit from more intensive therapies. Current prognostic factors such as tumour grade as measured by Gleason score have prognostic value but a significant number of those considered lower grade (7 or less) still recur and a proportion of higher-grade tumours do not. Additionally there is significant heterogeneity in the prognosis of Gleason 7 tumours (Makarov et al 2002, Rasiah et al 2003). Furthermore it has become evident that the grading of Gleason score has changed leading to changes in the distribution of Gleason scores over time (Albertsen et al 2005, Smith et al 2002).
It is now clear that most solid tumours originating from the same anatomical site represent a number of distinct entities at a molecular level (Perou et al 2000). DNA microarray platforms allow the analysis of tens of thousands of transcripts simultaneously from archived paraffin embedded tissues and are ideally suited for the identification of molecular subgroups. This kind of approach has identified primary cancers with metastatic potential in solid tumours such as breast (van 't Veer et al 2002) and colon cancer (Bertucci et al 2004).
The present invention is based upon the identification and verification of cancer biomarkers, particularly prognostic biomarkers that identify potentially metastatic cancers (such as prostate and ER positive breast cancers).
The present inventors have identified a group of primary prostate cancers that are similar to metastatic disease at a molecular level. Primary tumour samples which clustered with metastatic samples define a group with poor (bad) prognosis. These tumours may be defined by down regulation of genes associated with cell adhesion, cell differentiation and cell development. These tumours may be defined by up regulation of androgen related processes and epithelial to mesenchymal transition (EMT). In contrast, benign and primary like benign tumours cluster to define a group with improved (good) prognosis. A series of biomarker/gene signatures that can be used to prospectively identify tumours within either subgroup (i.e. with metastatic or non-metastatic biology) have been generated and validated which have prognostic power. The signatures can thus be used to prospectively assess a tumour's progression, for example to determine whether a tumour is at increased likelihood of recurrence and/or metastatic development. The signatures also display excellent performance in heterogeneity studies as discussed further herein. In particular, a 70 gene signature is described herein. The gene signatures are also shown to be effective in other cancer types including ER positive breast cancer, thus suggesting that the underlying molecular biology may have applicability in defining potentially metastatic primary tumours.
Thus, in a first aspect the invention provides a method for characterising and/or prognosing cancer, such as prostate cancer or ER positive breast cancer, in a subject comprising: determining the expression level of at least one gene from Table 1 in a sample from the subject wherein the determined expression level is used to provide a characterisation of and/or a prognosis for the cancer.
According to a further aspect of the invention there is provided a method for diagnosing (or identifying or characterizing) a cancer, such as prostate cancer or ER positive breast cancer, with an increased metastatic potential in a subject comprising:
determining the expression level of at least one gene from Table 1 in a sample from the subject wherein the determined expression level is used to identify whether a subject has a cancer, such as prostate cancer or ER positive breast cancer, with increased metastatic potential.
The invention also relates to a method for characterising and/or prognosing a cancer, such as prostate cancer or ER positive breast cancer in a subject comprising:
determining the expression level of at least one gene from Table 1 in a sample from the subject in order to identify the presence or absence of cells characteristic of an increased likelihood of recurrence and/or metastasis wherein the determined presence or absence of the cells is used to provide a characterisation of and/or a prognosis for the cancer, such as prostate cancer or ER positive breast cancer.
In a further aspect, the present invention relates to a method for characterising and/or prognosing a cancer, such as prostate cancer or ER positive breast cancer in a subject comprising:
a) obtaining a sample from the subject/in a sample obtained from the subject
b) applying a nucleic acid probe that specifically hybridizes with the nucleotide sequence of at least one gene or full sequence or target sequence selected from Table 1 to the sample from the subject
c) applying a detection agent that detects the nucleic acid probe-gene complex
d) using the detection agent to determine the level of the at least one gene or full sequence or target sequence
e) wherein the determined level of the at least one gene (or full sequence or target sequence) is used to provide a characterisation of and/or a prognosis for the cancer, such as prostate cancer or ER positive breast cancer. Suitable probes and probesets are listed in Table 1 and further details are provided in Table 1A.
In a further aspect, the present invention relates to a method for characterising and/or prognosing a cancer, such as prostate cancer or ER positive breast cancer in a subject comprising:
a) obtaining a sample from the subject/in a sample obtained from the subject
b) applying a set of nucleic acid primers that specifically hybridize with the nucleotide sequence of at least one gene or full sequence or target sequence selected from Table 1 to the sample from the subject
c) specifically amplifying the nucleotide sequence using the set of nucleic acid primers
d) detecting the amplification products using a specific detection agent to determine the level of the at least one gene or full sequence or target sequence
e) wherein the determined level of the at least one gene (or full sequence or target sequence) is used to provide a characterisation of and/or a prognosis for the cancer, such as prostate cancer or ER positive breast cancer. Suitable primers and primer pairs are listed in Table 1B.
The detection agent may comprise a label, such as a fluorescence label or fluorophore/quencher system attached to the nucleic acid probe and/or primer (as appropriate). Suitable systems and methodologies are known in the art and described herein.
The characterization, prognosis or diagnosis of the cancer, such as prostate cancer or ER positive breast cancer can also be used to guide treatment.
Accordingly, in a further aspect, the present invention relates to a method for selecting a treatment for a cancer, such as prostate cancer or ER positive breast cancer in a subject comprising:
(a) determining the expression level of at least one gene selected from Table 1 in a sample from the subject wherein the determined expression level is used to provide a characterisation of and/or a prognosis for the cancer, such as prostate cancer or ER positive breast cancer and
(b) selecting a treatment appropriate to the characterisation of and/or prognosis for the cancer, such as prostate cancer or ER positive breast cancer.
In yet a further aspect, the present invention relates to a method for selecting a treatment for a cancer, such as prostate cancer or ER positive breast cancer in a subject comprising:
(a) determining the expression level of at least one gene selected from Table 1 in a sample from the subject wherein the determined expression level is used to provide a characterisation of and/or a prognosis for the cancer, such as prostate cancer or ER positive breast cancer
(b) selecting a treatment appropriate to the characterisation of and/or prognosis for the cancer, such as prostate cancer or ER positive breast cancer and
(c) treating the subject with the selected treatment.
The invention also relates to a method of treating cancer, such as prostate cancer or ER positive breast cancer comprising administering a chemotherapeutic agent or radiotherapy, optionally extended radiotherapy, preferably extended-field radiotherapy, to a subject or carrying out surgery on a subject wherein the subject is selected for treatment on the basis of a method as described herein.
In a further aspect, the present invention relates to a chemotherapeutic agent for use in treating a cancer, such as prostate cancer or ER positive breast cancer in a subject, wherein the subject is selected for treatment on the basis of a method as described herein.
In yet a further aspect, the present invention relates to a method of treating a cancer, such as prostate cancer or ER positive breast cancer comprising administering a chemotherapeutic agent or radiotherapy, optionally extended radiotherapy, preferably extended-field radiotherapy to a subject or carrying out surgery on a subject wherein the subject has an increased expression level of at least one gene with a positive weight selected from Table 1 and/or wherein the subject has a decreased expression level of at least one gene with a negative weight selected from Table 1.
The invention also relates to a chemotherapeutic agent for use in treating a cancer, such as prostate cancer or ER positive breast cancer in a subject, wherein the subject has an increased expression level of at least one gene with a positive weight selected from Table 1 and/or wherein the subject has a decreased expression level of at least one gene with a negative weight selected from Table 1.
In certain embodiments according to all relevant aspects of the invention the chemotherapeutic agent comprises, consists essentially of or consists of
a) an anti-hormone treatment, preferably bicalutamide and/or abiraterone
b) a cytotoxic agent
c) a biologic, preferably an antibody and/or a vaccine, more preferably Sipuleucel-T and/or
d) a targeted therapeutic agent
Suitable therapies and therapeutic agents are discussed in further detail herein. The treatment may comprise or be adjuvant therapy in some embodiments.
According to all aspects of the invention the cancer may be a prostate cancer or ER positive breast cancer. Typically, the cancer is a primary tumor. In some embodiments, the prostate cancer may be a primary prostate cancer.
It is shown herein that the gene signatures may have particularly advantageous utility when combined with determination of other prognostic factors. Thus, all aspects of the invention may include other prognostic factors in the characterization, diagnosis or prognosis of the cancer. This may comprise generation of a combined risk score. This is particularly applicable in the context of prostate cancer. Other prognostic factors include prostate specific antigen (PSA) levels and/or Gleason score. MRI scan results may also be taken into account. Thus, according to all aspects of the invention, characterization, prognosis or diagnosis may take into account other prognostic factors such as PSA levels and/or Gleason score. PSA is a well-known serum biomarker and may be used according to the invention, in particular when measured pre-operatively. For example, a PSA value of 4-10 ng/ml may be considered “low risk”. A PSA value of 10-20 ng/ml may be considered reflective of “medium risk”. A PSA value of 20 ng/ml or more may be considered reflective of “high risk”. High risk would correspond to poor prognosis and/or be indicative of aggressive disease. Levels of PSA may contribute towards a final characterization of the cancer in combination with the measured expression levels. Medium risk PSA levels when combined with a positive or high signature score may indicate poor prognosis.
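By way of illustration only, the example PSA bands given above can be expressed as a simple lookup. The following is a minimal sketch and not a definitive implementation: the function name is hypothetical, the handling of the boundary values (exactly 10 and exactly 20 ng/ml) is an assumption, and because no band is stated above for values below 4 ng/ml the sketch returns "unspecified" for them.

```python
def psa_risk_category(psa_ng_per_ml: float) -> str:
    """Map a pre-operative PSA value (ng/ml) to the example risk bands.

    Bands follow the example values in the text: 4-10 low, 10-20 medium,
    20 or more high.
    """
    if psa_ng_per_ml < 0:
        raise ValueError("PSA level cannot be negative")
    if psa_ng_per_ml >= 20:
        return "high"
    if psa_ng_per_ml >= 10:
        # Assumption: a value of exactly 10 ng/ml is placed in the medium band.
        return "medium"
    if psa_ng_per_ml >= 4:
        return "low"
    # Assumption: the text gives no band below 4 ng/ml.
    return "unspecified"
```

A category obtained in this way is only one input; as stated above, it may be combined with the measured expression levels and other prognostic factors to arrive at the final characterization.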
The Gleason system is used to grade prostate tumours with a score from 2 to 10, where a Gleason score of 10 indicates the most abnormalities. Cancers with a higher Gleason score are more aggressive and have a worse prognosis. The system is based on how the prostate cancer tissue appears under a microscope and indicates how likely it is that a tumour will spread. A low Gleason score means the cancer tissue is similar to normal prostate tissue and the tumour is less likely to spread; a high Gleason score means the cancer tissue is very different from normal and the tumour is more likely to spread. Gleason scores are calculated by adding the score of the most common grade (primary grade pattern) and the second most common grade (secondary grade pattern) of the cancer cells. Where more than two grades are observed the primary grade is added to the worst observable grade to arrive at the Gleason score. Grades are assigned using the 2005 (amended in 2009) International Society of Urological Pathology (ISUP) Consensus Conference on Gleason Grading of Prostatic Carcinoma. Thus, in some embodiments, a Gleason score of 7 or more contributes to a characterization of poor prognosis. In such embodiments, a Gleason score of less than 7 may contribute to a characterization of good prognosis. In some embodiments, a Gleason score of 7 is classified as an intermediate position between good and poor prognosis. Thus, a Gleason score of 8 or more is classified as poor prognosis. A Gleason score of less than 7 may contribute to a characterization of good prognosis. In some embodiments, a Gleason score of 7 thus contributes less to a characterization of poor prognosis than does a Gleason score of 8 or more, but more than a Gleason score of 6 or less. A Gleason score of 7 when combined with a positive or high signature score may indicate poor prognosis.
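The calculation of the Gleason score described above may be sketched as follows. This is an illustrative example only. The function names are hypothetical; it is an assumption that a tumour showing a single grade pattern has that grade counted twice, and that the "worst observable grade" is taken from among the grades other than the primary grade. The banding function reflects only the embodiment in which a score of 7 is treated as intermediate.

```python
from typing import Sequence


def gleason_score(grades_most_common_first: Sequence[int]) -> int:
    """Sum the primary grade pattern and the secondary (or worst) pattern.

    grades_most_common_first: observed grade patterns (1-5), ordered from
    most common to least common.
    """
    grades = list(grades_most_common_first)
    if not grades:
        raise ValueError("at least one grade pattern is required")
    if any(g < 1 or g > 5 for g in grades):
        raise ValueError("grade patterns must lie between 1 and 5")
    primary = grades[0]
    others = grades[1:]
    if not others:
        # Assumption: a single observed pattern is counted twice.
        return 2 * primary
    if len(others) == 1:
        # Two patterns: primary plus secondary.
        return primary + others[0]
    # More than two patterns: primary plus the worst (highest) other grade.
    return primary + max(others)


def gleason_prognosis_band(score: int) -> str:
    """Band a Gleason score, treating 7 as an intermediate position."""
    if score < 2 or score > 10:
        raise ValueError("Gleason score must lie between 2 and 10")
    if score >= 8:
        return "poor"
    if score == 7:
        return "intermediate"
    return "good"
```

For example, grade patterns of 3 (most common), 4 and 5 would give a score of 3 + 5 = 8 under these assumptions.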
Where both Gleason score and PSA levels contribute to the characterization of the cancer, they may be weighted relative to one another. Typically, Gleason score is given greater significance than PSA levels. Thus, for example a Gleason score indicative of poor prognosis in combination with PSA levels associated with low risk, or good prognosis, may still result in a conclusion of poor prognosis (depending upon the measured expression levels of the gene or genes from Table 1). Similar considerations may apply to MRI results, which may be given greater weight than PSA levels in making the final characterization of the cancer.
The genes which may be included in suitable gene signatures and their identifying information are described and defined in further detail in Table 1 below. The genes may also be referred to, interchangeably, as biomarkers. Full sequences, against which suitable expression level determination assays may be designed, are also indicated in the table. Similarly, target sequences, against which suitable expression level determination assays may be designed, are also indicated in the table. Probe sequences interrogating the target sequences are also provided. Each sequence type is useful in the performance of the invention and forms a separate aspect thereof.
Further details of the probesets can be found in Table 1A, including orientation information:
Table 1 lists the sequence identifiers for the full sequences against which gene expression assays may be targeted, more specific target sequences and probes/probesets which hybridize to those target sequences. Suitable primers and/or probes may be designed using known methods to determine gene expression based on the deposited gene sequences, the full sequences and target sequences specified herein. Furthermore, specific nucleic acid amplification assays (e.g. PCR, such as qPCR) have also been designed that permit reliable determination of gene expression levels for the genes in table 1. These assays are summarized in Table 1B. The assay target sequence and primers and primer pairs form separate aspects of the invention. For two of the targets, MIR578 and MIR4530, due to the short length of the target sequences, the approach taken by the inventors was not applicable to generate an amplification assay. For those targets, commercial assays are available and the sequences of the primers are provided below. For MIR578, the Life Technologies 4426961 Origene HP300490 assay may be employed. The forward and reverse primers are as follows:
For MIR4530, the Life Technologies 4427012 Origene HP301022 assay may be employed. The forward and reverse primers are as follows:
These specific primers, while useful in performing the methods of the invention, are thus not specifically claimed per se as forming part of the invention.
It should be noted that the complement of each sequence described herein may be employed as appropriate (e.g. for designing hybridizing probes and/or primers, including primer pairs).
In certain embodiments the expression level of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69 or 70 of the genes in Table 1 is determined. Some analysis reported herein indicates that applying a signature comprising the measured expression levels of 7 or 12 genes can provide acceptable performance. Thus, in some embodiments, the minimum number of genes in the gene signature is 7 or 12. They can be any 7 or 12 genes from the 70 genes.
For the avoidance of doubt, additional genes (outside of the 70 genes) can be included in the signatures as would be readily appreciated by one skilled in the art. As is shown in
In some embodiments, a signature score is derived from the measured expression levels of the 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69 or 70 genes in table 1. Generation of such signature scores is described herein. The signature score may rely upon the weightings attributed to each gene as listed in Table 1, for the 70 gene signature. The weightings would, of course, need to be recalculated where a signature of different composition was utilized, for example including fewer than the total 70 gene signature. Similar considerations apply to the bias and constant offset values, as discussed below.
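Purely as an illustration of how a signature score might be derived from per-gene weightings, bias values and a constant offset, a minimal sketch is given below. The linear form used (the sum over genes of weight multiplied by expression minus bias, plus a constant offset k) is an assumption made for the purposes of the example, and the gene names and numerical values are hypothetical placeholders that are not taken from Table 1.

```python
from typing import Mapping


def signature_score(
    expression: Mapping[str, float],
    weights: Mapping[str, float],
    biases: Mapping[str, float],
    k: float = 0.0,
) -> float:
    """Assumed linear score: sum of w_i * (x_i - b_i) over genes, plus k."""
    missing = [g for g in weights if g not in expression or g not in biases]
    if missing:
        raise KeyError(f"missing expression or bias for genes: {missing}")
    return sum(weights[g] * (expression[g] - biases[g]) for g in weights) + k


# Hypothetical two-gene signature; names and values are placeholders only.
example_weights = {"GENE_A": 0.5, "GENE_B": -0.25}
example_biases = {"GENE_A": 6.0, "GENE_B": 8.0}
example_expression = {"GENE_A": 8.0, "GENE_B": 6.0}
example_score = signature_score(
    example_expression, example_weights, example_biases, k=0.1
)
```

In this sketch the weights, biases and k belong to one particular signature; consistent with the passage above, all three would need to be recalculated if the composition of the signature were changed.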
Gene signatures may be formulated in rank order in some embodiments, for example a 10 gene signature could be formed from the first 10 ranked genes listed in Table 1. However, the rankings are based on performance in the context of the 70 gene signature. Accordingly, formulation of sub-signatures of the 70 gene signature is not restricted to the same hierarchy and may be formulated using any combination of the 70 genes to form the suitably sized signature.
Core gene analysis was performed to determine a ranking for the genes based upon their impact on performance when removed from the signature. This analysis involved 10,000 random samplings of 10 signature genes from the original 70 signature gene set. For each iteration, 10 randomly selected signature genes were removed and the performance of the remaining 60 genes was evaluated using the endpoint to determine the impact on HR (Hazard Ratio) performance when these 10 genes were removed.
When this was performed using the FASTMAN Biopsy Validation Cohort of 248 samples, evaluation utilised the biochemical recurrence (BCR) endpoint.
The signature genes were weighted based upon the change in HR performance (Delta HR) based upon their inclusion or exclusion. The gene ranked ‘1’ has the most negative impact on performance when removed and the gene ranked ‘70’ has the least impact on performance when removed. The results are shown in Table 35 below.
Thus, in some embodiments, gene signatures are formulated in rank order. For example a 10 gene signature could comprise the first 10 ranked genes listed in Table 35. Accordingly, in some embodiments, the expression level of at least 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 of the 10 highest ranked genes in Table 35 is determined.
When this was performed using the Internal Resection Validation Cohort of 322 samples, evaluation utilised the metastatic recurrence (MET) endpoint.
The signature genes were weighted based upon the change in HR performance (Delta HR) based upon their inclusion or exclusion. The gene ranked ‘1’ has the most negative impact on performance when removed and the gene ranked ‘70’ has the least impact on performance when removed. The results are shown in Table 36 below.
Thus, in some embodiments, gene signatures are formulated in rank order. For example a 10 gene signature could comprise the first 10 ranked genes listed in Table 36. Accordingly, in some embodiments, the expression level of at least 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 of the 10 highest ranked genes in Table 36 is determined.
The results for combined rankings are shown in Table 38. In some embodiments, gene signatures are formulated in rank order. For example a 10 gene signature could comprise the first 10 ranked genes listed in Table 38. Accordingly, in some embodiments, the expression level of at least 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 of the 10 highest ranked genes in Table 38 is determined.
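The core gene analysis described above may be sketched in outline as follows. This is a hedged illustration rather than the procedure actually used. The function name and the `evaluate_hr` callback are hypothetical; it is assumed that a higher HR reflects better performance, and that the Delta HR attributed to a gene is the mean change in HR over those iterations in which the gene was among the removed genes.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def rank_core_genes(
    genes: Sequence[str],
    evaluate_hr: Callable[[List[str]], float],
    n_iterations: int = 10_000,
    n_removed: int = 10,
    seed: int = 0,
) -> List[str]:
    """Rank genes by mean Delta HR observed when they are removed.

    evaluate_hr is a hypothetical callback returning the hazard ratio of a
    signature restricted to the supplied genes for the chosen endpoint.
    """
    rng = random.Random(seed)
    gene_list = list(genes)
    baseline_hr = evaluate_hr(gene_list)
    delta_sum: Dict[str, float] = defaultdict(float)
    times_removed: Dict[str, int] = defaultdict(int)

    for _ in range(n_iterations):
        removed = rng.sample(gene_list, n_removed)
        removed_set = set(removed)
        remaining = [g for g in gene_list if g not in removed_set]
        delta_hr = evaluate_hr(remaining) - baseline_hr
        # Assumption: the whole change is credited to each removed gene.
        for g in removed:
            delta_sum[g] += delta_hr
            times_removed[g] += 1

    mean_delta = {
        g: delta_sum[g] / times_removed[g] for g in gene_list if times_removed[g]
    }
    # Rank 1 = most negative impact on performance when removed.
    return sorted(mean_delta, key=lambda g: mean_delta[g])
```

A rank-ordered sub-signature, for example a 10 gene signature, could then be taken as the first 10 entries of the returned list. In practice `evaluate_hr` would refit or re-score the reduced signature on the relevant cohort and endpoint, which the sketch deliberately leaves abstract.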
Additional gene signatures representing selections from the genes of Table 1 are described herein and are applicable to all aspects of the invention. These signatures may also provide the basis for larger signatures. The additional signatures are set forth in Tables 2 to 24, together with suitable weight and bias scores that may be adopted when calculating the final signature score (as further described herein). The k value for each signature can be set once the threshold for defining a positive signature score has been determined, as would be readily appreciated by the skilled person. Similarly, the rankings for each gene in the signature can readily be determined by reviewing the weightings attributed to each gene (where a larger weight indicates a higher ranking in the signature—see Table 1 for the rank order in respect of the 70 gene signature).
Thus, in some embodiments, the methods of the invention involve determining expression levels of at least MT1A and PCP4 (two gene signature shown in Table 2). As shown in
In some embodiments, applicable to all aspects of the invention, the expression level of PDK4 alone is not measured. PDK4 expression is thus typically measured in combination with at least one further gene up to all 69 further genes from Table 1. In some embodiments, PDK4 expression is determined using an assay targeting a sequence within the full sequences of SEQ ID NO: 52, 53, 63, 108, 109, 152, 153, 157, 158, 184, 194 and/or 216 respectively. In some embodiments, PDK4 expression is determined using an assay targeting a sequence within the target sequences of SEQ ID NO: 284, 285, 295, 340, 341, 384, 385, 389, 390, 416, 426 and/or 448 respectively. In some embodiments PDK4 expression is determined using one or more probes selected from SEQ ID Nos: 1011-1021, 1022-1032, 1132-1142, 1627-1637, 1638-1648, 2122-2132, 2133-2143, 2177-2187, 2188-2198, 2474-2484, 2584-2594 and 2834-2844 or probe sets of SEQ ID Nos: 1011-1021, 1022-1032, 1132-1142, 1627-1637, 1638-1648, 2122-2132, 2133-2143, 2177-2187, 2188-2198, 2474-2484, 2584-2594 and/or 2834-2844. In some embodiments, PDK4 expression is determined using an amplification (PCR, or qPCR) assay employing primers of SEQ ID NO: 3053 and/or 3121 respectively.
In some embodiments, applicable to all aspects of the invention, the expression level of KIF11, PTTG1 or TK1 alone is not measured. In some embodiments, the expression levels of KIF11, PTTG1 and TK1 may be measured together as a 3 gene signature. In some embodiments, the expression levels of KIF11, PTTG1 and/or TK1 may be measured in combination with at least one further gene from Table 1, including forming the 70 gene signature. In some embodiments, KIF11 expression is determined using an assay targeting a sequence within the full sequences of SEQ ID NO: 180 and/or 181 respectively. In some embodiments, KIF11 expression is determined using an assay targeting a sequence within the target sequences of SEQ ID NO: 412 and/or 413 respectively. In some embodiments KIF11 expression is determined using one or more probes selected from SEQ ID Nos: 2430-2440 and 2441-2451 or probe sets of SEQ ID Nos: 2430-2440 and/or 2441-2451. In some embodiments, KIF11 expression is determined using an amplification (PCR, or qPCR) assay employing primers of SEQ ID NO: 3062 and/or 3130 respectively.
In some embodiments, PTTG1 expression is determined using an assay targeting a sequence within the full sequences of SEQ ID NO: 62 and/or 201 respectively. In some embodiments, PTTG1 expression is determined using an assay targeting a sequence within the target sequences of SEQ ID NO: 294 and/or 433 respectively. In some embodiments PTTG1 expression is determined using one or more probes selected from SEQ ID Nos: 1121-1131 and 2661-2671 or probe sets of SEQ ID Nos: 1121-1131 and/or 2661-2671. In some embodiments, PTTG1 expression is determined using an amplification (PCR, or qPCR) assay employing primers of SEQ ID NO: 3037 and/or 3105 respectively.
In some embodiments, TK1 expression is determined using an assay targeting a sequence within the full sequence of SEQ ID NO: 197. In some embodiments, TK1 expression is determined using an assay targeting a sequence within the target sequence of SEQ ID NO: 429. In some embodiments TK1 expression is determined using one or more probes selected from SEQ ID Nos: 2617-2627 or probe sets of SEQ ID Nos: 2617-2627. In some embodiments, TK1 expression is determined using an amplification (PCR, or qPCR) assay employing primers of SEQ ID NO: 3060 and/or 3128 respectively.
In some embodiments, applicable to all aspects of the invention, the expression level of ANO7 or MYBPC1 alone is not measured. In some embodiments, the expression levels of ANO7 and MYBPC1 may be measured together as a 2 gene signature. In some embodiments, the expression levels of ANO7 and/or MYBPC1 may be measured in combination with at least one further gene from Table 1, including forming the 70 gene signature.
In some embodiments, ANO7 expression is determined using an assay targeting a sequence within the full sequences of SEQ ID NO: 37, 38, 125, 205 and/or 206 respectively. In some embodiments, ANO7 expression is determined using an assay targeting a sequence within the target sequences of SEQ ID NO: 269, 270, 357, 437 and/or 438 respectively. In some embodiments ANO7 expression is determined using one or more probes selected from SEQ ID Nos: 849-859, 860-870, 1825-1835, 2715-2724 and 2725-2735 or probe sets of SEQ ID Nos: 849-859, 860-870, 1825-1835, 2715-2724 and/or 2725-2735. In some embodiments, ANO7 expression is determined using an amplification (PCR, or qPCR) assay employing primers of SEQ ID NO: 3022 and/or 3090 respectively.
In some embodiments, MYBPC1 expression is determined using an assay targeting a sequence within the full sequences of SEQ ID NO: 39, 40, 74, 75, 101, 102, 103 and/or 144 respectively. In some embodiments, MYBPC1 expression is determined using an assay targeting a sequence within the target sequences of SEQ ID NO: 271, 272, 306, 307, 333, 334, 335 and/or 376 respectively. In some embodiments MYBPC1 expression is determined using one or more probes selected from SEQ ID Nos: 871-881, 882-892, 1253-1263, 1264-1274, 1550-1560, 1561-1571, 1572-1582 and 2034-2044 or probe sets of SEQ ID Nos: 871-881, 882-892, 1253-1263, 1264-1274, 1550-1560, 1561-1571, 1572-1582 and/or 2034-2044.
In some embodiments, MYBPC1 expression is determined using an amplification (PCR, or qPCR) assay employing primers of SEQ ID NO: 3025 and/or 3093 respectively.
By “characterization” is meant classification and/or evaluation of the cancer, such as prostate cancer or ER positive breast cancer. Thus, the methods of the invention allow cancers with high metastatic potential to be identified for example. The methods rely upon determining whether the cancer is a metastatic biology cancer or a non-metastatic biology cancer. The methods permit cancers to be identified that are likely to recur. Prognosis refers to predicting the likely outcome of the cancer, such as prostate cancer or ER positive breast cancer for the subject. A bad or poor prognosis as determined herein, indicates an increased likelihood of metastases and/or a higher likelihood of recurrence. By diagnosis is meant identifying the presence of a cancer, of a particular type such as prostate cancer or ER positive breast cancer with an increased metastatic potential. Thus, it will be readily apparent that there is some overlap between the terms “characterization”, “prognosis” and “diagnosis” as adopted herein. The use of relative terms indicates the position vis a vis cancers which do not display the relevant gene expression characteristics and thus have lower metastatic potential, are less likely to recur and/or have a good prognosis. The gene signatures described herein may be useful to stratify (prostate) cancer patients who have been diagnosed, in particular at an early stage, and identify those at increased risk of developing more aggressive high risk disease. This more aggressive disease may develop within 3-5 years of treatment. The initial treatment may be radiotherapy and/or surgery (prostatectomy) for example. Upon identification of the aggressive disease, the methods may require treatments as described herein to be utilized. In the absence of cancer with high metastatic potential, the subject may be placed under active surveillance and not further treated, at least initially.
Further monitoring, by any suitable means (including use of PSA monitoring or by performing the methods of the invention) can be used to determine whether further intervention is required.
In some embodiments the characterisation of and/or prognosis for the cancer, such as prostate cancer or ER positive breast cancer may comprise, consist essentially of or consist of predicting an increased likelihood of recurrence. Cancers with the metastatic biology are shown herein to be more likely to recur. The characterisation of and/or prognosis for the cancer, such as prostate cancer or ER positive breast cancer may comprise, consist essentially of or consist of predicting a reduced time to recurrence. Recurrence may be considered coterminous with relapse, as would be understood by the skilled person.
Recurrence may be clinical recurrence, metastatic recurrence or biochemical recurrence. In the context of prostate cancer biochemical recurrence means a rise in the level of PSA in a subject after treatment for prostate cancer. Biochemical recurrence may indicate that the prostate cancer has not been treated effectively or has recurred. Recurrence may be following surgery, for example radical prostatectomy and/or following radiotherapy.
In some embodiments, the characterisation of and/or prognosis for the cancer, such as prostate cancer or ER positive breast cancer may comprise, consist essentially of or consist of predicting an increased likelihood of metastasis. Metastasis, or metastatic disease, is the spread of a cancer from one organ or part to another non-adjacent organ or part. The new occurrences of disease thus generated are referred to as metastases. In certain embodiments, the methods of the invention are used to facilitate metastases staging of cancer, in particular prostate cancer. Thus, determined expression levels (e.g. determination of a gene signature positive sample) can be used to stage a subject as M1. M1 means that metastases are present (i.e. the cancer has spread to other parts of the body). For gene signature negative samples, that subject may be staged as M0. M0 means that the cancer has not yet spread to other parts of the body. Such methods may be used in conjunction with other measures used to identify metastases e.g. imaging/scanning techniques. Thus, the invention provides a method for metastases staging of a cancer comprising determining the expression level of at least one gene selected from Table 1 in a sample from the subject wherein the determined expression level is used to identify whether a subject has an M1 or M0 cancer. Thus, in some embodiments, the methods may comprise:
(i) determining the expression level of at least one gene selected from Table 1 in a sample from the subject; and
(ii) assessing from the expression level of the at least one gene whether the sample from the subject is positive or negative for a gene signature comprising the at least one gene. Suitable gene signatures and derivations of signature scores are discussed in further detail herein.
In some embodiments, characterisation of and/or prognosis for the cancer, such as prostate cancer or ER positive breast cancer may also comprise, consist essentially of or consist of determining whether the cancer has a poor prognosis. A poor prognosis may be a reduced likelihood of cause-specific, i.e. cancer-specific, or long term survival. Cause- or Cancer-specific survival is a net survival measure representing cancer survival in the absence of other causes of death. Cancer survival may be for 6, 7, 8, 9, 10, 11, 12 months or 1, 2, 3, 4, 5 etc. years. Long-term survival may be survival for 1 year, 5 years, 10 years or 20 years following diagnosis. A cancer, such as prostate cancer or ER positive breast cancer with a poor prognosis may be aggressive, fast growing, and/or show resistance to treatment.
In certain embodiments an increased expression level of at least one gene selected from Table 1 with a positive weight indicates an increased likelihood of recurrence and/or metastasis and/or a poor prognosis.
In further embodiments a decreased expression level of at least one gene selected from Table 1 with a negative weight indicates an increased likelihood of recurrence and/or metastasis and/or a poor prognosis.
Expression levels are weighted accordingly, to account for their contribution to gene signature score as discussed herein. A threshold of expression may be set relative to a median level against which “signature positive” and “signature negative” expression values can be set. Examples of such median threshold expression levels and corresponding signature positive and negative values are set forth in table 25 immediately below. As can be seen, the median values are set individually for each dataset as would be understood by one skilled in the art:
In certain embodiments the methods described herein may comprise determining the expression level of at least one of the genes with a negative weight listed in Table 1 together with at least one gene with a positive weight listed in Table 1. Thus, the methods may rely upon a combination of an up-regulated marker and a down-regulated marker. The combined up and down regulated marker expression levels, as appropriately weighted, may then contribute to, or make up, the final signature score.
In certain embodiments the methods described herein comprise comparing the expression level of one or more genes to a reference value or to the expression level in one or more control samples or to the expression level in one or more control cells in the same sample. The control cells may be normal (i.e. cells characterised by an independent method as non-cancerous) cells. The one or more control samples may consist of non-cancerous cells or may include a mixture of cancer cells (prostate, ER positive breast or otherwise) and non-cancerous cells. The expression level may be compared to the expression level of the same gene in one or more control samples or control cells.
The reference value may be a threshold level of expression of at least one gene set by determining the level or levels in a range of samples from subjects with and without the relevant cancer. The cancer, such as prostate cancer or ER positive breast cancer may be cancer with and/or without an increased likelihood of recurrence and/or metastasis and/or a poor prognosis. Suitable methods for setting a threshold are well known to those skilled in the art. The threshold may be mathematically derived from a training set of patient data. The score threshold thus separates the test samples according to presence or absence of the particular condition. The interpretation of this quantity, i.e. the cut-off threshold may be derived in a development or training phase from a set of patients with known outcome. The threshold may therefore be fixed prior to performance of the claimed methods from training data by methods known to those skilled in the art and as detailed herein in relation to generation of the various gene signatures.
The reference value may also be a threshold level of expression of at least one gene set by determining the level of expression of the at least one gene in a sample from a subject at a first time point. The determined levels of expression at later time points for the same subject are then compared to the threshold level. Thus, the methods of the invention may be used in order to monitor progress of disease in a subject, namely to provide an ongoing characterization and/or prognosis of disease in the subject. For example, the methods may be used to identify (or “diagnose”) a cancer, such as prostate cancer or ER positive breast cancer that has developed into a more aggressive or potentially metastatic form. This may be used to guide treatment decisions as discussed in further detail herein. In some embodiments, such monitoring methods determine whether treatment should be administered or not. If the cancer is identified within the metastatic biology group the cancer should be treated. If the cancer is identified as “non-metastatic” further monitoring can be performed to ensure that the cancer remains stable (i.e. does not evolve into the metastatic form). In such circumstances, no further treatment may be applied.
For genes whose expression level does not differ between normal cells and cells from a cancer, such as prostate cancer or ER positive breast cancer that does not have an increased likelihood of recurrence and/or metastasis and/or a poor prognosis the expression level of the same gene in normal cells in the same sample can be used as a control.
A difference in expression level may be a statistically significant difference. By statistically significant is meant unlikely to have occurred by chance alone. Statistical significance may be assessed by any suitable method.
The methods described herein may further comprise determining the expression level of a reference gene. A reference gene may be required if the target gene expression level differs between normal cells and cells from a cancer, such as prostate cancer or ER positive breast cancer that does not have an increased likelihood of recurrence and/or metastasis and/or a poor prognosis.
In certain embodiments the expression level of at least one gene selected from Table 1 is compared to the expression level of a reference gene.
The reference gene may be any gene with minimal expression variance across all cancer, such as prostate cancer or ER positive breast cancer samples. Thus, the reference gene may be any gene whose expression level does not vary with likelihood of recurrence and/or metastasis and/or a poor prognosis. The skilled person is well able to identify a suitable reference gene based upon these criteria. The expression level of the reference gene may be determined in the same sample as the expression level of at least one gene selected from Table 1.
The expression level of the reference gene may be determined in a different sample. The different sample may be a control sample as described above. The expression level of the reference gene may be determined in normal cells and/or cancer, such as prostate cancer or ER positive breast cancer, cells in a sample.
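By way of illustration only, the selection criterion for a reference gene (minimal expression variance across samples) can be sketched as follows. The gene names, the use of log-scale expression values and the subtraction step are assumptions made for this example and are not taken from the Tables.

```python
from statistics import pvariance


def pick_reference_gene(expression):
    """Return the gene whose expression varies least across samples.

    expression: dict mapping a gene name to a list of (assumed log-scale)
    expression values, one per sample.
    """
    return min(expression, key=lambda gene: pvariance(expression[gene]))


def normalise_to_reference(target_levels, reference_levels):
    """Express target levels relative to the reference gene, sample by sample.

    Subtraction is used because the values are assumed to be on a log scale.
    """
    return [t - r for t, r in zip(target_levels, reference_levels)]


# Hypothetical candidate genes measured in three samples.
candidates = {
    "GENE_A": [5.0, 7.0, 9.0],
    "GENE_B": [6.0, 6.1, 5.9],
    "GENE_C": [2.0, 4.0, 3.0],
}
reference = pick_reference_gene(candidates)
relative = normalise_to_reference([8.0, 9.0], [6.0, 6.5])
```

Here GENE_B is selected because its values barely change between samples.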
The expression level of the at least one gene in the sample from the subject may be analysed using a statistical model. In specific embodiments where the expression level of at least 2 genes, up to all 70 genes from Table 1, is measured the genes may be weighted. As used herein, the term “weight” refers to the relative importance of an item in a statistical calculation. The weight of each gene may be determined on a data set of patient samples using analytical methods known in the art. An overall score, termed a “signature score”, may be calculated and used to provide a characterisation of and/or prognosis for the cancer, such as prostate cancer or ER positive breast cancer. Typically, the score represents the sum of the weighted gene expression levels. Suitable weights for calculating the 70 gene signature score are set forth in Table 1 and may be employed according to the methods of the invention. Similarly, suitable weights for exemplary smaller signatures are set forth in Tables 2 to 24.
Thus, according to all aspects of the invention, the methods may comprise:
(i) determining the expression level of at least one gene selected from Table 1 in a sample from the subject; and
(ii) assessing from the expression level of the at least one gene whether the sample from the subject is positive or negative for a gene signature comprising the at least one gene.
As discussed herein, if the sample is positive for the gene signature this identifies the cancer as of the high metastatic potential type. This may indicate a (relatively) poor prognosis, or any other pertinent associated characterisation, prognosis or diagnosis as described herein. By corollary, a sample negative for the gene signature identifies the cancer as not of the high metastatic potential type. This may indicate a (relatively) good prognosis, or any other pertinent associated characterisation, prognosis or diagnosis as described herein.
Thus, at its simplest, an increased level of expression of one or more genes defines a sample as positive for the gene signature. For certain genes, a decreased level of expression of one or more genes defines a sample as positive for the gene signature. However, where the expression level of a plurality of genes is measured, the combination of expression levels is typically aggregated in order to determine whether the sample is positive for the gene signature. Thus, some genes may display increased expression and some genes may display decreased expression. This can be achieved in various ways, as discussed in detail herein.
In specific embodiments, the signature score may be calculated according to the following equation:
Similarly, each gene in the signature may be attributed a bias score. Example bias scores for the 70 gene signature are specified in Table 1 and may be adopted according to the performance of the methods of the invention. Of course, where different signatures are utilised, representing a subset of the 70 gene signature, the bias values would be recalculated. Examples are provided in Tables 2 to 24.
As indicated, k is a constant offset. Where the bias and weight values of Table 1 are adopted for the 70 gene signature, the constant offset may have a value of 0.4365. Again, where different signatures are utilised, representing a subset of the 70 gene signature, the value of k would be recalculated. The value of k varies dependent upon where the threshold for “signature positive” is set. This threshold may be set dependent upon which considerations are most important, e.g. to maximize sensitivity and/or specificity as against a particular outcome or characterisation. Suitable thresholds may be determined as described above.
In some embodiments, a score above the threshold may indicate a poor prognosis (or other pertinent characterisation, prognosis or diagnosis as described herein). In those embodiments, a score equal to or below threshold may indicate a good prognosis. In other embodiments, a score above or equal to the threshold may indicate a poor prognosis (or other pertinent characterisation, prognosis or diagnosis as described herein). In those embodiments, a score below threshold may indicate a good prognosis. The skilled person would also appreciate that a simple mathematical transformation could be used to invert the score and “above” and “below” should be construed accordingly unless indicated otherwise.
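By way of illustration only, a signature score built from per-gene weights, per-gene bias values and a constant offset k could be computed as sketched below. The exact form of the equation is not reproduced in this passage, so the form used here (a weighted sum of bias-corrected expression values plus k) is an assumption. The gene names and all numeric values are hypothetical and are not those of Table 1.

```python
def signature_score(expression, weights, biases, k):
    """Assumed form: sum over genes of weight * (expression - bias), plus k."""
    return sum(weights[g] * (expression[g] - biases[g]) for g in weights) + k


def call_signature(score, threshold, inclusive=False):
    """Compare a score with a threshold.

    inclusive=False: positive only when the score is above the threshold.
    inclusive=True:  positive when the score is above or equal to it.
    """
    positive = score >= threshold if inclusive else score > threshold
    return "signature positive" if positive else "signature negative"


# Hypothetical two-gene signature: one positive and one negative weight.
weights = {"GENE_UP": 0.5, "GENE_DOWN": -0.25}
biases = {"GENE_UP": 4.0, "GENE_DOWN": 6.0}
sample = {"GENE_UP": 6.0, "GENE_DOWN": 4.0}

score = signature_score(sample, weights, biases, k=0.1)
call = call_signature(score, threshold=0.0)
```

In this example both genes push the score upwards: the positively weighted gene is above its bias value and the negatively weighted gene is below its bias value. The `inclusive` flag reflects the two alternative embodiments for a score exactly equal to the threshold.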
By “signature score” is meant a compound decision score that summarizes the expression levels of the genes. This may be compared to a threshold score that is mathematically derived from a training set of patient data. The threshold score is established with the purpose of maximizing the ability to separate cancers into those that are positive for the biomarker signature and those that are negative. The patient training set data is preferably derived from cancer tissue samples having been characterized by sub-type, prognosis, likelihood of recurrence, long term survival, clinical outcome, treatment response, diagnosis, cancer classification, or personalized genomics profile. Expression profiles, and corresponding decision scores from patient samples may be correlated with the characteristics of patient samples in the training set that are on the same side of the mathematically derived score decision threshold. In certain example embodiments, the threshold of the (linear) classifier scalar output is optimized to maximize the sum of sensitivity and specificity under cross-validation as observed within the training dataset.
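By way of illustration only, selecting a threshold that maximises the sum of sensitivity and specificity on a training set can be sketched as follows. The sketch scans candidate thresholds on a single training set; the cross-validation mentioned above is omitted for brevity, and the scores and labels are hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, sensitivity + specificity) maximising the sum.

    labels: True for samples known to belong to the positive class.
    A sample is called positive when its score is >= the threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        true_neg = sum(1 for s, y in zip(scores, labels) if not y and s < t)
        total = true_pos / n_pos + true_neg / n_neg
        if best is None or total > best[1]:
            best = (t, total)
    return best


# Hypothetical training scores with known outcome.
train_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
train_labels = [False, False, True, True, True, False]
threshold, sens_plus_spec = best_threshold(train_scores, train_labels)
```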
The overall expression data for a given sample may be normalized using methods known to those skilled in the art in order to correct for differing amounts of starting material, varying efficiencies of the extraction and amplification reactions, etc.
In one embodiment, the biomarker expression levels in a sample are evaluated by a (linear) classifier. As used herein, a (linear) classifier refers to a weighted sum of the individual biomarker intensities into a compound decision score (“decision function”). The decision score is then compared to a pre-defined cut-off score threshold, corresponding to a certain set-point in terms of sensitivity and specificity which indicates if a sample is equal to or above the score threshold (decision function positive) or below (decision function negative).
Using a (linear) classifier on the normalized data to make a call (e.g. positive or negative for a biomarker signature) effectively means to split the data space, i.e. all possible combinations of expression values for all genes in the classifier, into two disjoint segments by means of a separating hyperplane. This split is empirically derived on a (large) set of training examples. Without loss of generality, one can assume a certain fixed set of values for all but one biomarker, which would automatically define a threshold value for this remaining biomarker where the decision would change from, for example, positive to negative for the biomarker signature. The precise value of this threshold depends on the actual measured expression profile of all other genes within the classifier, but the general indication of certain genes remains fixed. Therefore, in the context of the overall gene expression classifier, relative expression can indicate if either up- or down-regulation of a certain biomarker is indicative of being positive for the signature or not. In certain example embodiments, a sample expression score above the threshold expression score indicates the sample is positive for the biomarker signature. In certain other example embodiments, a sample expression score above a threshold score indicates the subject has a poor clinical prognosis compared to a subject with a sample expression score below the threshold score.
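By way of illustration only, the single-biomarker threshold described above can be written out explicitly. Assume a linear decision score with weights w_i, an offset b and a cut-off t, where a sample is called positive when the score is at or above t (the symbols t and j, and the "at or above" convention, are introduced here for illustration). Fixing every expression value except x_j gives:

```latex
\[
\sum_{i} w_i x_i + b \;\ge\; t
\quad\Longleftrightarrow\quad
w_j x_j \;\ge\; t - b - \sum_{i \ne j} w_i x_i
\]
\[
x_j^{*} \;=\; \frac{t - b - \sum_{i \ne j} w_i x_i}{w_j},
\qquad w_j \ne 0
\]
```

For w_j > 0 the sample is called positive when x_j is at or above x_j*, and for w_j < 0 the inequality reverses, so the sample is called positive when x_j is at or below x_j*. The value of x_j* moves with the expression of the other genes, whereas the direction of the effect depends only on the sign of w_j.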
In certain other example embodiments, the expression signature is derived using a decision tree (Hastie et al. The Elements of Statistical Learning, Springer, New York 2001), a random forest (Breiman, 2001 Random Forests, Machine Learning 45:5), a neural network (Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford 1995), discriminant analysis (Duda et al. Pattern Classification, 2nd ed., John Wiley, New York 2001), including, but not limited to linear, diagonal linear, quadratic and logistic discriminant analysis, a Prediction Analysis for Microarrays (PAM, (Tibshirani et al., 2002, Proc. Natl. Acad. Sci. USA 99:6567-6572)) or a Soft Independent Modeling of Class Analogy analysis (SIMCA, (Wold, 1976, Pattern Recogn. 8:127-139)).
Classification trees (Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and regression trees. Monterey, Calif.: Wadsworth & Brooks/Cole Advanced Books & Software. ISBN 978-0-412-04841-8) provide a means of predicting outcomes based on logic and rules. A classification tree is built through a process called binary recursive partitioning, which is an iterative procedure of splitting the data into partitions/branches. The goal is to build a tree that distinguishes among pre-defined classes. Each node in the tree corresponds to a variable. To choose the best split at a node, each variable is considered in turn, where every possible split is tried and considered, and the best split is the one which produces the largest decrease in diversity of the classification label within each partition. This is repeated for all variables, and the winner is chosen as the best splitter for that node. The process is continued at the next node and in this manner, a full tree is generated.
One of the advantages of classification trees over other supervised learning approaches such as discriminant analysis, is that the variables that are used to build the tree can be either categorical, or numeric, or a mix of both. In this way it is possible to generate a classification tree for predicting outcomes based on say the directionality of gene expression.
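By way of illustration only, the choice of the best split at a single node can be sketched as follows. Gini impurity is used here as the measure of diversity, which is an assumption since the measure is not specified above; the data and class labels are hypothetical and only numeric variables are handled.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure partition)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Try every variable and every cut point; keep the largest impurity decrease.

    rows: list of numeric feature lists, one per sample.
    Returns (variable index, cut value, decrease in impurity).
    """
    n = len(rows)
    parent = gini(labels)
    best = (None, None, 0.0)
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        # Candidate cuts lie midway between consecutive observed values.
        for low, high in zip(values, values[1:]):
            cut = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= cut]
            right = [y for row, y in zip(rows, labels) if row[j] > cut]
            child = len(left) / n * gini(left) + len(right) / n * gini(right)
            if parent - child > best[2]:
                best = (j, cut, parent - child)
    return best


# Hypothetical expression values for two genes in four samples.
rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
labels = ["good", "good", "poor", "poor"]
split = best_split(rows, labels)
```

Applying the same procedure recursively to the two resulting partitions would grow the full tree.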
Random forest algorithms (Breiman, Leo (2001). “Random Forests”. Machine Learning 45 (1): 5-32. doi:10.1023/A:1010933404324) provide a further extension to classification trees, whereby a collection of classification trees are randomly generated to form a “forest” and an average of the predicted outcomes from each tree is used to make inference with respect to the outcome.
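By way of illustration only, the idea of averaging the predictions of many randomly generated trees can be sketched as follows. To keep the example short, each "tree" is a single-split stump fitted to a bootstrap sample on one randomly chosen variable; this simplification, the function names and the data are assumptions made for the example and do not reproduce the cited algorithm in full.

```python
import random


def fit_stump(rows, labels, feature):
    """One-split tree on a single variable: keep the cut with fewest errors."""
    best = None
    for cut in sorted(set(row[feature] for row in rows)):
        for high_is_positive in (True, False):
            errors = sum(
                ((row[feature] >= cut) == high_is_positive) != y
                for row, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, cut, high_is_positive)
    _, cut, high_is_positive = best
    return lambda x: (x[feature] >= cut) == high_is_positive


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Fit each stump on a bootstrap sample and a randomly chosen variable."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        trees.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return trees


def forest_predict(trees, x):
    """Average of the individual tree predictions (fraction voting positive)."""
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical training data: True marks the positive class.
rows = [[8.0, 9.0], [9.0, 8.0], [7.0, 9.0], [8.0, 7.0],
        [1.0, 2.0], [2.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
labels = [True, True, True, True, False, False, False, False]
forest = fit_forest(rows, labels)
```

A new sample would typically be called positive when the averaged prediction exceeds 0.5.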
Biomarker expression values may be defined in combination with corresponding scalar weights on the real scale with varying magnitude, which are further combined through linear or non-linear, algebraic, trigonometric or correlative means into a single scalar value via an algebraic, statistical learning, Bayesian, regression, or similar algorithms which together with a mathematically derived decision function on the scalar value provide a predictive model by which expression profiles from samples may be resolved into discrete classes of responder or non-responder, resistant or non-resistant, to a specified drug, drug class, molecular subtype, or treatment regimen. Such predictive models, including biomarker membership, are developed by learning weights and the decision threshold, optimized for sensitivity, specificity, negative and positive predictive values, hazard ratio or any combination thereof, under cross-validation, bootstrapping or similar sampling techniques, from a set of representative expression profiles from historical patient samples with known drug response and/or resistance.
In one embodiment, the genes are used to form a weighted sum of their signals, where individual weights can be positive or negative. The resulting sum (“expression score”) is compared with a pre-determined reference point or value. The comparison with the reference point or value may be used to diagnose, or predict a clinical condition or outcome.
As described above, one of ordinary skill in the art will appreciate that the genes included in the classifier provided in the various Tables will carry unequal weights in a classifier. Therefore, while as few as one biomarker may be used to diagnose or predict a clinical prognosis or response to a therapeutic agent, the specificity and sensitivity of the diagnosis, or the prediction accuracy, may increase using more genes.
In certain example embodiments, the expression signature is defined by a decision function. A decision function is a set of weighted expression values derived using a (linear) classifier.
All linear classifiers define the decision function using the following equation:
$$f(x) = w^{\top} x + b = \sum_{i} w_i x_i + b \qquad (1)$$
All measurement values, such as the microarray gene expression intensities $x_i$, for a certain sample are collected in a vector $x$. Each intensity is then multiplied with a corresponding weight $w_i$ to obtain the value of the decision function $f(x)$ after adding an offset term $b$. In deriving the decision function, the linear classifier will further define a threshold value that splits the gene expression data space into two disjoint sections. Example (linear) classifiers include but are not limited to partial least squares (PLS), (Nguyen et al., Bioinformatics 18 (2002) 39-50), support vector machines (SVM) (Schölkopf et al., Learning with Kernels, MIT Press, Cambridge 2002), and shrinkage discriminant analysis (SDA) (Ahdesmaki et al., Annals of applied statistics 4, 503-519 (2010)). In one example embodiment, the (linear) classifier is a PLS linear classifier.
The decision function is empirically derived on a large set of training samples, for example from patients showing a good or poor clinical prognosis. The threshold separates a patient group based on different characteristics such as, but not limited to, clinical prognosis before or after a given therapeutic treatment. The interpretation of this quantity, i.e. the cut-off threshold, is derived in the development phase (“training”) from a set of patients with known outcome. The corresponding weights and the responsiveness/resistance cut-off threshold for the decision score are fixed a priori from training data by methods known to those skilled in the art. In one example embodiment, Partial Least Squares Discriminant Analysis (PLS-DA) is used for determining the weights. (L. Ståhle, S. Wold, J. Chemom. 1 (1987) 185-196; D. V. Nguyen, D. M. Rocke, Bioinformatics 18 (2002) 39-50).
Effectively, this means that the data space, i.e. the set of all possible combinations of biomarker expression values, is split into two mutually exclusive groups corresponding to different clinical classifications or predictions, for example, one corresponding to good clinical prognosis and the other to poor clinical prognosis. In the context of the overall classifier, relative over-expression of a certain biomarker can either increase the decision score (positive weight) or reduce it (negative weight) and thus contribute to an overall decision of, for example, a good clinical prognosis.
In certain example embodiments of the invention, the data is transformed non-linearly before applying a weighted sum as described above. This non-linear transformation might include increasing the dimensionality of the data. The non-linear transformation and weighted summation might also be performed implicitly, for example, through the use of a kernel function. (Schölkopf et al. Learning with Kernels, MIT Press, Cambridge 2002).
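By way of illustration only, the equivalence between an explicit non-linear transformation that increases dimensionality and an implicit kernel function can be sketched as follows. The degree-2 polynomial kernel and the two-gene input are assumptions chosen for the example; no particular kernel is specified above.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))


def explicit_map(x):
    """Explicit degree-2 feature map: two expression values become three features."""
    a, b = x
    return [a * a, math.sqrt(2) * a * b, b * b]


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed without building the feature map."""
    return dot(x, z) ** 2


def kernel_decision(training_points, coefficients, offset, x):
    """Weighted sum in the transformed space, evaluated implicitly via the kernel."""
    return sum(c * poly_kernel(p, x) for c, p in zip(coefficients, training_points)) + offset


# Hypothetical two-gene expression vectors.
x = [1.0, 2.0]
z = [3.0, 0.5]
implicit_value = poly_kernel(x, z)
explicit_value = dot(explicit_map(x), explicit_map(z))
```

Both routes give the same value, which is why the weighted summation can be carried out in the higher-dimensional space without ever constructing it.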
In certain example embodiments, the patient training set data is derived by isolating RNA from a corresponding cancer tissue sample set and determining expression values by hybridizing the (cDNA amplified from) isolated RNA to a microarray. In certain example embodiments, the microarray used in deriving the expression signature is a transcriptome array. As used herein a “transcriptome array” refers to a microarray containing probe sets that are designed to hybridize to sequences that have been verified as expressed in the diseased tissue of interest. Given alternative splicing and variable poly-A tail processing between tissues and biological contexts, it is possible that probes designed against the same gene sequence derived from another tissue source or biological context will not effectively bind to transcripts expressed in the diseased tissue of interest, leading to a loss of potentially relevant biological information. Accordingly, it is beneficial to verify what sequences are expressed in the disease tissue of interest before deriving a microarray probe set. Verification of expressed sequences in a particular disease context may be done, for example, by isolating and sequencing total RNA from a diseased tissue sample set and cross-referencing the isolated sequences with known nucleic acid sequence databases to verify that the probe set on the transcriptome array is designed against the sequences actually expressed in the diseased tissue of interest. Methods for making transcriptome arrays are described in United States Patent Application Publication No. 2006/0134663, which is incorporated herein by reference. In certain example embodiments, the probe set of the transcriptome array is designed to bind within 300 nucleotides of the 3′ end of a transcript. Methods for designing transcriptome arrays with probe sets that bind within 300 nucleotides of the 3′ end of target transcripts are disclosed in United States Patent Application Publication No. 2009/0082218, which is incorporated by reference herein. In certain example embodiments, the microarray used in deriving the gene expression profiles of the present invention is the Almac Prostate Cancer DSA™ microarray (Almac Group, Craigavon, United Kingdom).
An optimal (linear) classifier can be selected by evaluating a (linear) classifier's performance using such diagnostics as “area under the curve” (AUC). AUC refers to the area under the curve of a receiver operating characteristic (ROC) curve, both of which are well known in the art. AUC measures are useful for comparing the accuracy of a classifier across the complete data range. (Linear) classifiers with a higher AUC have a greater capacity to classify unknowns correctly between two groups of interest (e.g., ovarian cancer samples and normal or control samples). ROC curves are useful for plotting the performance of a particular feature (e.g., any of the genes described herein and/or any item of additional biomedical information) in distinguishing between two populations (e.g., individuals responding and not responding to a therapeutic agent). Typically, the feature data across the entire population (e.g., the cases and controls) are sorted in ascending order based on the value of a single feature. Then, for each value for that feature, the true positive and false positive rates for the data are calculated. The true positive rate is determined by counting the number of cases above the value for that feature and then dividing by the total number of positive cases. The false positive rate is determined by counting the number of controls above the value for that feature and then dividing by the total number of controls. Although this definition refers to scenarios in which a feature is elevated in cases compared to controls, this definition also applies to scenarios in which a feature is lower in cases compared to the controls (in such a scenario, samples below the value for that feature would be counted). ROC curves can be generated for a single feature as well as for other single outputs, for example, a combination of two or more features can be mathematically combined (e.g., added, subtracted, multiplied, etc.) to provide a single sum value, and this single sum value can be plotted in a ROC curve. Additionally, any combination of multiple features, in which the combination derives a single output value, can be plotted in a ROC curve. These combinations of features may comprise a test. The ROC curve is the plot of the true positive rate (sensitivity) of a test against the false positive rate (1-specificity) of the test.
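By way of illustration only, the construction of a ROC curve and its AUC for a single feature can be sketched as follows. Samples at or above each candidate value are counted as positive calls (the treatment of values exactly equal to the candidate is an assumption), the area is obtained by the trapezoidal rule, and the feature values are hypothetical.

```python
def roc_points(values, is_case):
    """Return ROC points as (false positive rate, true positive rate) pairs.

    values: feature value (or combined score) per sample.
    is_case: True for cases, False for controls.
    """
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    points = [(0.0, 0.0)]
    # Lowering the cut value step by step moves the curve from (0, 0) to (1, 1).
    for cut in sorted(set(values), reverse=True):
        true_pos = sum(1 for v, c in zip(values, is_case) if c and v >= cut)
        false_pos = sum(1 for v, c in zip(values, is_case) if not c and v >= cut)
        points.append((false_pos / n_controls, true_pos / n_cases))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical feature values for three cases and three controls.
values = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
is_case = [True, True, False, True, False, False]
curve = roc_points(values, is_case)
area = auc(curve)
```

In this example eight of the nine case-control pairs are ranked correctly, which corresponds to an AUC of 8/9.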
Alternatively, an optimal classifier can be selected by evaluating performance against time-to-event endpoints using methods such as Cox proportional hazards (PH) and measures of performance across all possible thresholds assessed via the concordance-index (C-index) (Harrell, Jr. 2010). The C-Index is analogous to the “area under the curve” (AUC) metric (used for dichotomised endpoints), and it is used to measure performance with respect to association with survival data. Note that the extension of AUC to time-to-event endpoints is the C-index, with threshold selection optimised to maximise the hazard ratio (HR) under cross-validation. In this instance, the partial Cox regression algorithm (Li and Gui, 2004) was chosen for the biomarker discovery analyses. It is analogous to principal components analysis in that the first few latent components explain most of the information in the data. Implementation is as described in Ahdesmaki et al 2013.
C-index values can be generated for a single feature as well as for other single outputs, for example, a combination of two or more features can be mathematically combined (e.g., added, subtracted, multiplied, etc.) to provide a single sum value, and this single sum value can be evaluated for statistical significance. Additionally, any combination of multiple features, in which the combination derives a single output value, can be evaluated as a C-index for assessing utility for time-to-event class separation. These combinations of features may comprise a test. The C-index (Harrell, Jr. 2010, see Equation 4) of the continuous cross-validation test set risk score predictions was evaluated as the main performance measure.
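By way of illustration only, a concordance index for time-to-event data can be sketched as follows. This is a simplified pairwise form in which a pair is usable only when the subject with the shorter follow-up time had an observed event, and pairs with tied times are ignored; it is an assumption made for the example and is not a reproduction of the cited equation. The follow-up times, event flags and risk scores are hypothetical.

```python
def c_index(times, events, risks):
    """Fraction of usable pairs in which the higher risk score has the earlier event.

    times: follow-up time per subject.
    events: True if the event (e.g. recurrence) was observed, False if censored.
    risks: predicted risk score per subject (higher means earlier event expected).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Subject i must have an observed event before subject j's time.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / usable


# Hypothetical follow-up data for four subjects.
times = [2, 4, 6, 8]
events = [True, True, False, True]
risks = [0.9, 0.4, 0.5, 0.1]
concordance = c_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, mirroring the interpretation of AUC for dichotomised endpoints.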
Methods for determining the expression levels of the at least one gene from Table 1 (biomarkers) are described in greater detail herein. Typically, the methods may involve contacting a sample obtained from a subject with a detection agent, such as primers and/or probes, or an antibody or functionally equivalent binding reagent, (as discussed in detail herein) specific for the gene and detecting expression products. The detection agent may be labelled as discussed herein. A comparison may be made against expression levels determined in a control sample to provide a characterization and/or a prognosis for the cancer, such as prostate cancer or ER positive breast cancer.
According to all aspects of the invention the expression level of the gene or genes may be measured by any suitable method. In certain embodiments the expression level is determined at the level of protein, RNA or epigenetic modification. The epigenetic modification may be DNA methylation.
The expression level of any of the genes described herein may be detected by detecting the appropriate RNA. The assays may investigate specific regions of the genes, as described herein. For example, the assays may investigate the regions flanked by specific primer binding sites and/or regions of the gene to which the probe sets described herein hybridize. The assays may investigate promoter, terminator, exonic and/or intronic regions of the genes as appropriate. The assays may investigate one or more of the full sequences or target sequences, or regions thereof, as specified in Table 1 for the respective genes.
In certain embodiments, according to all aspects of the invention, expression of the at least one gene may be determined using one or more probes or primers (primer pairs) designed to hybridize with one or more of the target sequences or full sequences listed in Table 1. The probes and probesets identified in table 1 (and detailed further in Table 1A) may be employed according to all aspects of the invention. The primers and primer pairs listed in Table 1B and identified as SEQ ID NOs 3151-3154 may be employed according to all aspects of the invention.
Accordingly, in specific embodiments the expression level is determined by microarray, northern blotting, RNA-seq (RNA sequencing), in situ RNA detection or nucleic acid amplification. Nucleic acid amplification includes PCR and all variants thereof such as real-time and end point methods and quantitative PCR (qPCR). Other nucleic acid amplification techniques are well known in the art, and include methods such as NASBA, 3SR and Transcription Mediated Amplification (TMA). Other suitable amplification methods include the ligase chain reaction (LCR), selective amplification of target polynucleotide sequences (U.S. Pat. No. 6,410,276), consensus sequence primed polymerase chain reaction (U.S. Pat. No. 4,437,975), arbitrarily primed polymerase chain reaction (WO 90/06995), Invader technology, strand displacement technology, and nick displacement amplification (WO 2004/067726). This list is not intended to be exhaustive; any nucleic acid amplification technique may be used provided the appropriate nucleic acid product is specifically amplified. Design of suitable primers and/or probes is within the capability of one skilled in the art. Various primer design tools are freely available to assist in this process such as the NCBI Primer-BLAST tool. Primers and/or probes may be at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 (or more) nucleotides in length. mRNA expression levels may be measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed by qPCR). RT-PCR is used to create a cDNA from the mRNA. The cDNA may be used in a qPCR assay to produce fluorescence as the DNA amplification process progresses. By comparison to a standard curve, qPCR can produce an absolute measurement such as number of copies of mRNA per cell. Northern blots, microarrays, Invader assays, and RT-PCR combined with capillary electrophoresis have all been used to measure expression levels of mRNA in a sample. See Gene Expression Profiling: Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004. Many detection technologies are well known and commercially available, such as TAQMAN®, MOLECULAR BEACONS®, AMPLIFLUOR® and SCORPION®, DzyNA®, Plexor™ etc.
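By way of illustration only, the comparison to a standard curve mentioned above may be sketched as follows. The sketch assumes a dilution series of standards with known copy numbers and measured threshold cycle (Ct) values; the function names and the example figures are hypothetical and do not form part of the described assays.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the fitted curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)

def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0

# Hypothetical standards: ten-fold dilutions and their measured Ct values.
standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_ct = [31.36, 28.04, 24.72, 21.40, 18.08]
slope, intercept = fit_standard_curve(standard_copies, standard_ct)
unknown_copies = copies_from_ct(26.0, slope, intercept)
```

Dividing the estimated copy number by the number of cells contributing to the reaction would give a per-cell figure, provided the cell input is known.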
Suitable amplification assays (PCR or qPCR) have been designed by the inventors and are described in further detail in Table 1B. The forward and reverse primers listed therein for each gene may be utilized according to all aspects of the invention. Similarly, the primers of SEQ ID NOs 3151-3154 may be used to amplify MIR578 and MIR4530 respectively.
RNA-seq uses next-generation sequencing to measure changes in gene expression. RNA may be converted into cDNA or directly sequenced. Next generation sequencing techniques include pyrosequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, Illumina dye sequencing, single-molecule real-time sequencing or DNA nanoball sequencing. RNA-seq allows quantitation of gene expression levels.
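As a hedged illustration of how RNA-seq read counts may be turned into comparable expression levels, the following sketch computes transcripts per million (TPM), a commonly used normalisation that is not specifically named in this description. The gene identifiers, counts and transcript lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and transcript lengths.

    counts and lengths_bp are dicts keyed by gene identifier.
    """
    # Reads per kilobase corrects for longer transcripts yielding more reads.
    rates = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    # Scale so that the values across all genes sum to one million.
    return {g: r / total * 1e6 for g, r in rates.items()}

# Hypothetical input for two genes.
example = tpm({"GENE_A": 100, "GENE_B": 300}, {"GENE_A": 1000, "GENE_B": 3000})
```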
In situ RNA detection involves detecting RNA without extraction from tissues and cells. In situ RNA detection includes in situ hybridization (ISH) which uses a labeled (e.g. radio labelled, antigen labelled or fluorescence labelled) probe (complementary DNA or RNA strand) to localize a specific RNA sequence in a portion or section of tissue, or in the entire tissue (whole mount ISH), or in cells. The probe labeled with either radio-, fluorescent- or antigen-labeled bases (e.g., digoxigenin) may be localized and quantified in the tissue using either autoradiography, fluorescence microscopy or immunohistochemistry, respectively. ISH can also use two or more probes to simultaneously detect two or more transcripts. A branched DNA assay can also be used for RNA in situ hybridization assays with single molecule sensitivity. This approach includes ViewRNA assays. Samples (cells, tissues) are fixed, then treated to allow RNA target accessibility (RNA un-masking). Target-specific probes hybridize to each target RNA. Subsequent signal amplification is predicated on specific hybridization of adjacent probes (individual oligonucleotides that bind side by side on RNA targets). A typical target-specific probe will contain 40 oligonucleotides. Signal amplification is achieved via a series of sequential hybridization steps. A pre-amplifier molecule hybridizes to each oligo pair on the target-specific RNA, then multiple amplifier molecules hybridize to each pre-amplifier. Next, multiple label probe oligonucleotides (conjugated to an enzyme such as alkaline phosphatase or directly to fluorophores) hybridize to each amplifier molecule. Separate but compatible signal amplification systems enable multiplex assays. The signal can be visualized by measuring fluorescence or light emitted depending upon the detection system employed. Detection may involve using a high content imaging system, or a fluorescence or brightfield microscope in some embodiments.
Thus, in a further aspect the present invention relates to use of the kit for characterising and/or prognosing cancer, such as prostate cancer or ER positive breast cancer. The kit for (in situ) characterising and/or prognosing prostate cancer in a subject may comprise one or more oligonucleotide probes specific for an RNA product of at least one gene selected from Table 1. Suitable probes and probesets for each gene are listed in Table 1 and may be incorporated in the kits of the invention. The probes and probesets also constitute separate aspects of the invention. By “probeset” is meant the collection of probes designed to target (by hybridization) a single gene. The groupings are apparent from Table 1 (and Table 1A).
The kit may further comprise one or more of the following components:
The components of the kit may be suitable for conducting a ViewRNA assay (https://www.panomics.com/products/rna-in-situ-analysis/view-rna-overview).
The components of the kit may be nucleic acid based molecules, optionally DNA (or RNA). The blocking probe is a molecule that acts to reduce background signal by binding to sites on the target not bound by the target specific probes (probes specific for the RNA product of the at least one gene of the invention). The PreAmplifier is a molecule capable of binding to a (a pair of) target specific probe(s) when target bound. The Amplifier is a molecule capable of binding to the PreAmplifier. Alternatively, the Amplifier may be capable of binding directly to a (a pair of) target specific probe(s) when target bound. The Amplifier has binding sites for multiple label molecules (which may be label probes).
RNA expression may be determined by hybridization of RNA to a set of probes. The probes may be arranged in an array. Microarray platforms include those manufactured by companies such as Affymetrix, Illumina and Agilent. Examples of microarray platforms manufactured by Affymetrix include the U133 Plus2 array, the Almac proprietary Xcel™ array and the Almac proprietary Cancer DSAs®, including the Prostate Cancer DSA®.
In specific embodiments, according to all aspects of the invention, expression of the at least one gene may be determined using one or more probes selected from those listed in Table 1.
In certain embodiments, according to all aspects of the invention, expression of the at least one gene may be determined using one or more probes or primers designed to hybridize with the target sequences or full sequences listed in Table 1.
These probes may also be incorporated into the kits of the invention. The probe sequences may also be used in order to design primers for detection of expression, for example by RT-PCR. Such primers may also be included in the kits of the invention. Suitable primers are listed in Table 1B and SEQ ID NOs 3151-3154.
The corresponding target sequences are listed in Table 1 below for the relevant probesets. The invention may involve use of different probes that target any one or more of these target sequences.
Similarly, the full gene sequences are listed in Table 1 for the relevant probesets. The invention may involve use of different probes that target any one or more of these full gene sequences as target sequences.
Increased rates of DNA methylation at or near promoters have been shown to correlate with reduced gene expression levels. DNA methylation is the main epigenetic modification in humans. It is a chemical modification of DNA performed by enzymes called methyltransferases, in which a methyl group (m) is added to specific cytosine (C) residues in DNA. In mammals, methylation occurs predominantly at cytosine residues adjacent to a guanosine residue, i.e. at the CpG dinucleotide (the sequence CG).
Accordingly, in yet a further aspect, the present invention relates to a method for characterising and/or prognosing cancer, such as prostate cancer or ER positive breast cancer in a subject comprising:
determining the methylation status of at least one gene selected from Table 1 in a sample from the subject wherein the determined methylation status is used to provide a characterisation of and/or a prognosis for the cancer, such as prostate cancer or ER positive breast cancer.
Methylation typically results in a down regulation of gene expression. Thus, methylation (which may be hypermethylation) of the genes with a negative weighting in Table 1 may be determined according to some embodiments in order to indicate a poor prognosis (or related outcome as described herein). Additionally or alternatively, a lack of methylation (which may be hypomethylation) of the genes with a positive weighting in Table 1 may be determined according to some embodiments in order to indicate a poor prognosis (or related outcome as described herein).
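The relationship between the sign of the weighting and the methylation status described above may be expressed, purely as an illustrative sketch, as a simple rule. The function name and the example weights are hypothetical; actual weightings are those given in Table 1.

```python
def supports_poor_prognosis(weight, methylated):
    """Apply the rule described above to one gene.

    weight:     signature weighting of the gene (sign is what matters here)
    methylated: True if the gene is found to be (hyper)methylated
    """
    if weight < 0 and methylated:
        # Methylation lowers expression of a negatively weighted gene.
        return True
    if weight > 0 and not methylated:
        # Lack of methylation permits expression of a positively weighted gene.
        return True
    return False

# Hypothetical example values, not taken from Table 1.
flags = [supports_poor_prognosis(-0.4, True), supports_poor_prognosis(0.7, True)]
```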
Determination of the methylation status may be achieved through any suitable means. Suitable examples include bisulphite genomic sequencing and/or methylation specific PCR. Various techniques for assessing methylation status are known in the art and can be used in conjunction with the present invention: sequencing (including NGS), methylation-specific PCR (MS-PCR), melting curve methylation-specific PCR (McMS-PCR), MLPA with or without bisulphite treatment, quantitative analysis of methylated alleles (QAMA) (Zeschnigk et al, 2004), MSRE-PCR (Melnikov et al, 2005), MethyLight (Eads et al., 2000), ConLight-MSP (Rand et al., 2002), bisulphite conversion-specific methylation-specific PCR (BS-MSP) (Sasaki et al., 2003), COBRA (which relies upon use of restriction enzymes to reveal methylation dependent sequence differences in PCR products of sodium bisulphite-treated DNA), methylation-sensitive single-nucleotide primer extension (MS-SNuPE), methylation-sensitive single-strand conformation analysis (MS-SSCA), melting curve combined bisulphite restriction analysis (McCOBRA) (Akey et al., 2002), PyroMethA, HeavyMethyl (Cottrell et al. 2004), MALDI-TOF, MassARRAY, enzymatic regional methylation assay (ERMA), QBSUPT, MethylQuant, quantitative PCR sequencing and oligonucleotide-based microarray systems, pyrosequencing and Meth-DOP-PCR. Reviews of some useful techniques for DNA methylation analysis are provided in Nucleic Acids Research, 1998, Vol. 26, No. 10, 2255-2264; Nature Reviews, 2003, Vol. 3, 253-266; and Oral Oncology, 2006, Vol. 42, 5-13.
Techniques for assessing methylation status are based on distinct approaches. Some include use of endonucleases. Such endonucleases may either preferentially cleave methylated recognition sites relative to non-methylated recognition sites or preferentially cleave non-methylated relative to methylated recognition sites. Some examples of the former are Acc III, Ban I, BstN I, Msp I, and Xma I. Examples of the latter are Acc II, Ava I, BssH II, BstU I, Hpa II, and Not I. Differences in cleavage pattern are indicative of the presence or absence of a methylated CpG dinucleotide. Cleavage patterns can be detected directly, or after a further reaction which creates products which are easily distinguishable. Means which detect altered size and/or charge can be used to detect modified products, including but not limited to electrophoresis, chromatography, and mass spectrometry.
Alternatively, the identification of methylated CpG dinucleotides may utilize the ability of the methyl binding domain (MBD) of the MeCP2 protein to selectively bind to methylated DNA sequences (Cross et al, 1994; Shiraishi et al, 1999). The MBD may also be obtained from MBP, MBP2, MBP4, poly-MBD (Jorgensen et al., 2006) or from reagents such as antibodies binding to methylated nucleic acid. The MBD may be immobilized to a solid matrix and used for preparative column chromatography to isolate highly methylated DNA sequences. Variant forms such as expressed His-tagged methyl-CpG binding domain may be used to selectively bind to methylated DNA sequences. Optionally, restriction endonuclease digested genomic DNA is contacted with expressed His-tagged methyl-CpG binding domain. Other methods are well known in the art and include amongst others methylated-CpG island recovery assay (MIRA). Another method, MB-PCR, uses a recombinant, bivalent methyl-CpG-binding polypeptide immobilized on the walls of a PCR vessel to capture methylated DNA, followed by detection of the bound methylated DNA by PCR.
Further approaches for detecting methylated CpG dinucleotide motifs use chemical reagents that selectively modify either the methylated or non-methylated form of CpG dinucleotide motifs. Suitable chemical reagents include hydrazine and bisulphite ions. The methods of the invention may use bisulphite ions, in certain embodiments. The bisulphite conversion relies on treatment of DNA samples with sodium bisulphite which converts unmethylated cytosine to uracil, while methylated cytosines are maintained (Furuichi et al., 1970). This conversion finally results in a change in the sequence of the original DNA. It is general knowledge that the resulting uracil has the base pairing behaviour of thymidine which differs from cytosine base pairing behaviour. This makes the discrimination between methylated and non-methylated cytosines possible. Useful conventional techniques of molecular biology and nucleic acid chemistry for assessing sequence differences are well known in the art and explained in the literature. See, for example, Sambrook, J., et al., Molecular cloning: A laboratory Manual, (2001) 3rd edition, Cold Spring Harbor, N.Y.; Gait, M. J. (ed.), Oligonucleotide Synthesis, A Practical Approach, IRL Press (1984); Hames B. D., and Higgins, S. J. (eds.), Nucleic Acid Hybridization, A Practical Approach, IRL Press (1985); and the series, Methods in Enzymology, Academic Press, Inc.
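The sequence change produced by bisulphite conversion may be illustrated in silico. The following is a minimal sketch, assuming complete conversion and a single DNA strand; the function names and the example sequence are hypothetical.

```python
def bisulphite_convert(seq, methylated_positions=frozenset()):
    """Convert unmethylated cytosines to uracil; methylated cytosines are kept.

    methylated_positions holds the 0-based indices of methylated cytosines.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("U")
        else:
            out.append(base)
    return "".join(out)

def after_amplification(converted):
    """Uracil pairs like thymine, so amplified copies carry T in its place."""
    return converted.replace("U", "T")

# Hypothetical 8-mer whose first CpG cytosine (index 1) is methylated.
methylated_copy = after_amplification(bisulphite_convert("ACGTCCGA", {1}))
unmethylated_copy = after_amplification(bisulphite_convert("ACGTCCGA"))
```

The two outputs differ only at the position of the methylated cytosine, which is the difference that downstream sequencing, restriction or primer-based methods detect.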
Some techniques use primers for assessing the methylation status at CpG dinucleotides. Two approaches to primer design are possible. Firstly, primers may be designed that themselves do not cover any potential sites of DNA methylation. Sequence variations at sites of differential methylation are located between the two primers and visualisation of the sequence variation requires further assay steps. Such primers are used in bisulphite genomic sequencing, COBRA, MS-SNuPE and several other techniques. Secondly, primers may be designed that hybridize specifically with either the methylated or unmethylated version of the initial treated sequence. After hybridization, an amplification reaction can be performed and amplification products assayed using any detection system known in the art. The presence of an amplification product indicates that a sample hybridized to the primer. The specificity of the primer indicates whether the DNA had been modified or not, which in turn indicates whether the DNA had been methylated or not. If there is a sufficient region of complementarity, e.g., 12, 15, 18, or 20 nucleotides, to the target, then the primer may also contain additional nucleotide residues that do not interfere with hybridization but may be useful for other manipulations. Examples of such other residues may be sites for restriction endonuclease cleavage, for ligand binding or for factor binding or linkers or repeats. The oligonucleotide primers may or may not be such that they are specific for modified methylated residues.
A further way to distinguish between modified and unmodified nucleic acid is to use oligonucleotide probes. Such probes may hybridize directly to modified nucleic acid or to further products of modified nucleic acid, such as products obtained by amplification. Probe-based assays exploit the oligonucleotide hybridisation to specific sequences and subsequent detection of the hybrid. There may also be further purification steps before the amplification product is detected e.g. a precipitation step. Oligonucleotide probes may be labeled using any detection system known in the art. These include but are not limited to fluorescent moieties, radioisotope labeled moieties, bioluminescent moieties, luminescent moieties, chemiluminescent moieties, enzymes, substrates, receptors, or ligands.
In the MSP approach, DNA may be amplified using primer pairs designed to distinguish methylated from unmethylated DNA by taking advantage of sequence differences as a result of sodium-bisulphite treatment (WO 97/46705). For example, bisulphite ions modify non-methylated cytosine bases, changing them to uracil bases. Uracil bases hybridize to adenine bases under hybridization conditions. Thus an oligonucleotide primer which comprises adenine bases in place of guanine bases would hybridize to the bisulphite-modified DNA, whereas an oligonucleotide primer containing the guanine bases would hybridize to the non-modified (methylated) cytosine residues in the DNA. Amplification using a DNA polymerase and a second primer yields amplification products which can be readily observed, which in turn indicates whether the DNA had been methylated or not. Whereas PCR is a preferred amplification method, variants on this basic technique such as nested PCR and multiplex PCR are also included within the scope of the invention.
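The primer logic described for MSP may be sketched as follows. This is an illustrative simplification that assumes complete bisulphite conversion and a binding site in which all CpG cytosines are either methylated or unmethylated; the function names and the example site are hypothetical and are not the primers of Table 1B.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def convert_site(site, cpg_methylated):
    """Bisulphite-converted binding site, written with T where U would occur.

    Only cytosines in a CpG context are protected, and only when
    cpg_methylated is True. A cytosine at the very end of the site is
    treated as non-CpG because the following base is not known.
    """
    site = site.upper()
    out = []
    for i, base in enumerate(site):
        in_cpg = base == "C" and i + 1 < len(site) and site[i + 1] == "G"
        if base == "C" and not (in_cpg and cpg_methylated):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)

def annealing_primer(converted_site):
    """Primer that hybridizes to the converted strand (its reverse complement)."""
    return "".join(COMPLEMENT[b] for b in reversed(converted_site))

def msp_primers(site):
    """Return the methylation-specific (M) and unmethylated-specific (U) primers."""
    return {
        "M": annealing_primer(convert_site(site, True)),
        "U": annealing_primer(convert_site(site, False)),
    }

primers = msp_primers("ACGTCCGA")  # hypothetical binding site
```

For this example site the M primer retains guanine opposite each protected CpG cytosine, whereas the U primer carries adenine at those positions, in line with the description above.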
As mentioned earlier, one embodiment for assessing the methylation status of the relevant gene requires amplification to yield amplification products. The presence of amplification products may be assessed directly using methods well known in the art, and the ensuing discussion also applies to all other amplification embodiments as described herein. They may simply be visualized on a suitable gel, such as an agarose or polyacrylamide gel. Detection may involve the binding of specific dyes, such as ethidium bromide, which intercalate into double-stranded DNA and visualisation of the DNA bands under a UV illuminator for example. Another means for detecting amplification products comprises hybridization with oligonucleotide probes. Alternatively, fluorescence or energy transfer can be measured to determine the presence of the methylated DNA.
A specific example of the MSP technique is designated real-time quantitative MSP (QMSP), and permits reliable quantification of methylated DNA in real time or at end point. Real-time methods are generally based on the continuous optical monitoring of an amplification procedure and utilise fluorescently labelled reagents whose incorporation in a product can be quantified and whose quantification is indicative of copy number of that sequence in the template. One such reagent is a fluorescent dye, called SYBR Green I, that preferentially binds double-stranded DNA and whose fluorescence is greatly enhanced by binding to double-stranded DNA. Alternatively, labelled primers and/or labelled probes can be used for quantification. They represent a specific application of the well-known and commercially available real-time amplification techniques such as TAQMAN®, MOLECULAR BEACONS®, AMPLIFLUOR® and SCORPION®, DzyNA®, Plexor™ etc. In the real-time PCR systems, it is possible to monitor the PCR reaction during the exponential phase where the first significant increase in the amount of PCR product correlates to the initial amount of target template.
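The point at which the first significant increase in product is observed is commonly summarised as a threshold cycle (Ct). The following minimal sketch assumes baseline-corrected fluorescence readings, one per cycle, and a user-chosen threshold; it uses simple linear interpolation between cycles, which is an assumption for illustration rather than the method of any particular instrument.

```python
def threshold_cycle(fluorescence, threshold):
    """Fractional cycle at which fluorescence first reaches the threshold.

    fluorescence[0] is the reading after cycle 1. Returns None if the
    threshold is never reached.
    """
    if not fluorescence:
        return None
    if fluorescence[0] >= threshold:
        return 1.0
    for i in range(1, len(fluorescence)):
        prev, cur = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= cur:
            # prev was read after cycle i, cur after cycle i + 1.
            return i + (threshold - prev) / (cur - prev)
    return None

# Hypothetical readings for a seven-cycle excerpt of an amplification curve.
ct = threshold_cycle([1, 1, 1, 2, 4, 8, 16], threshold=6)
```

A lower Ct corresponds to a larger initial amount of target template, since fewer cycles are needed to reach the threshold.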
Real-Time PCR detects the accumulation of amplicon during the reaction. Real-time methods do not need to be utilised, however. Many applications do not require quantification and Real-Time PCR is used only as a tool to obtain convenient results presentation and storage, and at the same time to avoid post-PCR handling. Thus, analyses can be performed only to confirm whether the target DNA is present in the sample or not. Such end-point verification is carried out after the amplification reaction has finished.
The expression level of one or more genes from Table 1 may be determined by immunohistochemistry. By immunohistochemistry is meant the detection of proteins in cells of a tissue sample by using a binding reagent such as an antibody or aptamer that binds specifically to the proteins. Thus, the expression level as determined by immunohistochemistry is a protein level. The sample may be a tissue sample and may comprise cancer (tumour) cells, normal tissue cells and, optionally, infiltrating immune cells. In embodiments applicable to prostate cancer, the sample may be a prostate tissue sample and may comprise prostate cancer (tumour) cells, prostatic intraepithelial neoplasia (PIN) cells, normal prostate epithelium, stroma and, optionally, infiltrating immune cells. In some embodiments the expression level of the at least one gene in the cancer (tumour) cells in a sample is compared to the expression level of the same gene (and/or a reference gene) in the normal cells in the same sample. In some embodiments the expression level of the at least one gene in the cancer (tumour) cells in a sample is compared to the expression level of the same gene (and/or a reference gene) in the normal cells in a control sample. The normal cells may comprise, consist essentially of or consist of normal (non-cancer) epithelial cells. In certain embodiments the normal cells do not comprise PIN cells and/or stroma cells. In certain embodiments the prostate cancer (tumour) cells do not comprise PIN cells and/or stroma cells. In further embodiments the expression level of the at least one gene in the prostate cancer (tumour) cells in a sample is (additionally) compared to the expression level of a reference gene in the same cells or in the prostate cancer cells in a control sample.
In yet further embodiments the expression level of the at least one gene in the cancer (tumour) cells in a sample is scored using a method based on intensity, proportion and/or localisation of expression in the cancer (tumour) cells (without comparison to normal cells). The scoring method may be derived in a development or training phase from a set of patients with known outcome.
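One widely used scoring scheme that combines staining intensity and proportion is the H-score, which is offered here only as an illustrative example and is not stated to be the scoring method of the invention. The sketch below assumes that the percentages of tumour cells with weak, moderate and strong staining have already been assessed; the cut-off value is hypothetical and would in practice be derived in a development or training phase as described above.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """H-score = 1 x %weak + 2 x %moderate + 3 x %strong (range 0 to 300)."""
    parts = (pct_weak, pct_moderate, pct_strong)
    if min(parts) < 0 or sum(parts) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong

def classify_expression(score, cutoff):
    """Dichotomise the score against a cut-off learned from training data."""
    return "high" if score >= cutoff else "low"

# Hypothetical assessment: 20% weak, 30% moderate, 10% strong staining.
score = h_score(20, 30, 10)
label = classify_expression(score, cutoff=100)  # cutoff is an assumed value
```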
Accordingly, in a further aspect, the present invention relates to an antibody or aptamer that binds specifically to a protein product of at least one gene selected from Table 1. The epitope to which the antibody or aptamer binds may be derived from the amino acid sequences corresponding to the full sequences or target sequences identified in Table 1.
The antibody may be of monoclonal or polyclonal origin. Fragments and derivative antibodies may also be utilised, to include without limitation Fab fragments, ScFv, single domain antibodies, nanoantibodies, heavy chain antibodies, aptamers etc. which retain peptide-specific binding function and these are included in the definition of “antibody”. Such antibodies are useful in the methods of the invention. They may be used to measure the level of a particular protein, or in some instances one or more specific isoforms of a protein. The skilled person is well able to identify epitopes that permit specific isoforms to be discriminated from one another.
Methods for generating specific antibodies are known to those skilled in the art. Antibodies may be of human or non-human origin (e.g. rodent, such as rat or mouse) and be humanized etc. according to known techniques (Jones et al., Nature (1986) May 29-Jun. 4; 321(6069):522-5; Roguska et al., Protein Engineering, 1996, 9(10):895-904; and Studnicka et al., Humanizing Mouse Antibody Frameworks While Preserving 3-D Structure. Protein Engineering, 1994, Vol. 7, pg 805).
In certain embodiments the expression level is determined using an antibody or aptamer conjugated to a label. By label is meant a component that permits detection, directly or indirectly. For example, the label may be an enzyme, optionally a peroxidase, or a fluorophore.
A label is an example of, and may form part of, a detection agent. By detection agent is meant an agent that may be used to assist in the detection of the complex between binding reagent (which may be an antibody, primer or probe for example) and target. The binding reagent may form part of the overall detection agent. Where the antibody is conjugated to an enzyme the detection agent may comprise a chemical composition such that the enzyme catalyses a chemical reaction to produce a detectable product. The products of reactions catalyzed by appropriate enzymes can be, without limitation, fluorescent, luminescent, or radioactive or they may absorb visible or ultraviolet light. Examples of detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers. In certain embodiments the detection agent may comprise a secondary antibody. The expression level is then determined using an unlabeled primary antibody that binds to the target protein and a secondary antibody conjugated to a label, wherein the secondary antibody binds to the primary antibody.
The invention also relates to use of an antibody or aptamer as described above for characterising and/or prognosing a cancer, such as prostate cancer or ER positive breast cancer in a subject.
Additional techniques for determining expression level at the level of protein include, for example, Western blot, immunoprecipitation, immunocytochemistry, mass spectrometry, ELISA and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition). To improve specificity and sensitivity of an assay method based on immunoreactivity, monoclonal antibodies are often used because of their specific epitope recognition. Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies.
According to all aspects of the invention samples may be of any suitable form. The sample is typically intended to contain nucleic acids (DNA and/or RNA), or protein in some embodiments, from the primary tumour (even if no longer contained within the tumour cells e.g. shed into the circulation). The sample may comprise, consist essentially of or consist of cells, such as prostate or breast cells and often a suitable tissue sample (such as a prostate or breast tissue sample). The sample may comprise or be a primary tumour sample. The cells or tissue may comprise cancer cells, such as prostate cancer cells or ER positive breast cancer cells. In specific embodiments the sample comprises, consists essentially of or consists of a biopsy sample, which may be fixed, such as a formalin-fixed paraffin-embedded biopsy sample. The tissue sample may be obtained by any suitable technique. Examples include a biopsy procedure, optionally a fine needle aspirate biopsy procedure. Body fluid samples may also be utilised. Samples may comprise resection material (e.g. where radical prostatectomy has been performed). Suitable sample types include blood, to encompass whole blood, serum and plasma samples, urine and semen.
The methods described herein may further comprise extracting nucleic acids, DNA and/or RNA from the sample. Suitable methods are known in the art and include use of commercially available kits such as the RNeasy and GeneJET RNA purification kits.
In certain embodiments the methods may further comprise obtaining the sample from the subject. Typically the methods are in vitro methods performed on an isolated sample.
The methods of the invention may prove useful for determining which patients should undergo a more aggressive therapeutic regime, by identifying high risk cancers (i.e. those within the high metastatic potential group and thus having a poor prognosis).
The methods of the invention may comprise selecting a treatment for cancer, such as prostate cancer or ER positive breast cancer in a subject and optionally performing the treatment. In certain embodiments if the characterisation of and/or prognosis for the cancer, such as prostate cancer or ER positive breast cancer is an increased likelihood of recurrence and/or metastasis and/or a poor prognosis the treatment selected may be one or more of:
a) an anti-hormone treatment
b) a cytotoxic agent
c) a biologic
d) radiotherapy
e) targeted therapy
f) surgery
By anti-hormone treatment (or hormone therapy) is meant a form of treatment which reduces the level and/or activity of selected hormones, in particular testosterone. The hormones may promote tumour growth and/or metastasis. The anti-hormone treatment may comprise a luteinizing hormone blocker, such as goserelin (also called Zoladex), buserelin, leuprorelin (also called Prostap), histrelin (also called Vantas) and triptorelin (also called Decapeptyl). The anti-hormone treatment may comprise a gonadotrophin releasing hormone (GnRH) blocker such as degarelix (also called Firmagon) or an anti-androgen such as flutamide (also called Drogenil) and bicalutamide (also called Casodex). In specific embodiments the anti-hormone treatment may be bicalutamide and/or abiraterone.
The cytotoxic agent may be administered as an adjuvant therapy. The cytotoxic agent may be a platinum based agent and/or a taxane. In specific embodiments the platinum based agent is selected from cisplatin, carboplatin and oxaliplatin. The taxane may be paclitaxel, cabazitaxel or docetaxel. The cytotoxic agent may also be a vinca alkaloid, such as vinorelbine or vinblastine. The cytotoxic agent may be a topoisomerase inhibitor such as etoposide or an anthracycline (antibiotic) such as doxorubicin. The cytotoxic agent may be an alkylating agent such as estramustine. Adjuvant taxane and/or topoisomerase inhibitor therapy may be particularly suitable for treatment of ER positive breast cancer.
By biologic is meant a medicinal product that is created by a biological process. A biologic may be, for example, a vaccine, blood or blood component, cells, gene therapy, tissue, or a recombinant therapeutic protein. Optionally the biologic is an antibody and/or a vaccine. The biologic may be Sipuleucel-T. The biologic may be a cancer immunotherapy.
In certain embodiments the radiotherapy is extended radiotherapy, preferably extended-field radiotherapy. In specific embodiments, the radiotherapy comprises or is (pelvic) lymph node irradiation. Adjuvant radiation may be employed.
Surgery may comprise radical prostatectomy. By radical prostatectomy is meant removal of the entire prostate gland, the seminal vesicles and the vas deferens. In further embodiments surgery comprises tumour resection i.e. removal of all or part of the tumour. Surgery may comprise or be extended nodal dissection.
By targeted therapy is meant treatment using targeted therapeutic agents which are directed towards a specific drug target for the treatment of a cancer, such as prostate cancer or ER positive breast cancer. In specific embodiments this may mean inhibitors directed towards targets such as PARP, AKT, MET, VEGFR etc. PARP inhibitors are a group of pharmacological inhibitors of the enzyme poly ADP ribose polymerase (PARP). Several forms of cancer are more dependent on PARP than regular cells, making PARP an attractive target for cancer therapy. Examples (in clinical trials) include iniparib, olaparib, rucaparib, veliparib, CEP 9722, MK 4827, BMN-673 and 3-aminobenzamide. AKT, also known as Protein Kinase B (PKB), is a serine/threonine-specific protein kinase that plays a key role in multiple cellular processes such as glucose metabolism, apoptosis, cell proliferation, transcription and cell migration. AKT is associated with tumour cell survival, proliferation, and invasiveness. Examples of AKT inhibitors include VQD-002, Perifosine, Miltefosine and AZD5363. MET is a proto-oncogene that encodes hepatocyte growth factor receptor (HGFR). The hepatocyte growth factor receptor protein possesses tyrosine-kinase activity. Examples of kinase inhibitors for inhibition of MET include K252a, SU11274, PHA-665752, ARQ197, Foretinib, SGX523 and MP470. MET activity can also be blocked by inhibiting the interaction with HGF. Many suitable antagonists including truncated HGF, anti-HGF antibodies and uncleavable HGF are known. VEGF receptors are receptors for vascular endothelial growth factor (VEGF). Various inhibitors are known such as lenvatinib, motesanib, pazopanib and regorafenib.
If the method identifies the cancer as not within the high metastatic potential group, then different decisions may be taken. If the cancer has already been treated e.g. by radiotherapy or surgery, the decision may be taken not to treat the cancer further. The decision may be taken to continue to monitor the cancer, by any suitable means (e.g. by PSA levels or using the methods of the invention), and not perform any further treatment if the cancer remains in the same state.
The methods of the present invention can guide therapy selection as well as selecting patient groups for enrichment strategies during clinical trial evaluation of novel therapeutics. For example, when evaluating a putative anti-cancer agent or treatment regime, the methods disclosed herein may be used to select individuals for clinical trials that have cancer, such as prostate cancer or ER positive breast cancer, characterized as having an increased likelihood of recurrence and/or metastasis and/or a poor prognosis.
The invention also relates to a system or device or test kit for performing a method as described herein.
In a further aspect, the present invention relates to a system, device or test kit for characterising and/or prognosing cancer, such as prostate cancer or ER positive breast cancer in a subject, comprising:
By testing device is meant a combination of components that allows the expression level of a gene to be determined. The components may include any of those described above with respect to the methods for determining expression level at the level of protein, RNA or epigenetic modification. For example the components may be antibodies, primers, detection agents and so on. Components may also include one or more of the following: microscopes, microscope slides, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers. The discussion of the methods of the invention thus applies mutatis mutandis to these aspects of the invention.
In certain embodiments the system, device or test kit further comprises a(n electronic) display for the output from the processor.
The invention also relates to a computer application or storage medium comprising a computer application as defined above.
In certain example embodiments, provided is a computer-implemented method, system, and a computer program product for characterising and/or prognosing cancer, such as prostate cancer or ER positive breast cancer in a subject, in accordance with the methods described herein. For example, the computer program product may comprise a non-transitory computer-readable storage device having computer-readable program instructions embodied thereon that, when executed by a computer, cause the computer to characterise and/or prognose cancer, such as prostate cancer or ER positive breast cancer in a subject as described herein. For example, the computer executable instructions may cause the computer to:
(i) access and/or calculate the determined expression levels of the at least one gene selected from Table 1 in a sample on one or more testing devices;
(ii) calculate whether there is an increased or decreased level of the at least one gene selected from Table 1 in the sample; and,
(iii) provide an output regarding the characterization of and/or prognosis for the cancer, such as prostate cancer or ER positive breast cancer.
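By way of illustration only, steps (i) to (iii) above might be sketched as follows. This is a minimal sketch and not a definitive implementation: the function names, the gene identifiers GENE_A to GENE_C, the use of a per-gene reference level and the tolerance parameter are all assumptions introduced for the example and are not taken from Table 1.

```python
from typing import Dict


def call_expression_changes(sample: Dict[str, float],
                            reference: Dict[str, float],
                            tolerance: float = 0.0) -> Dict[str, str]:
    """Label each gene as increased, decreased or unchanged versus a reference.

    `sample` holds the determined expression levels (step i); the comparison
    against `reference` is the calculation of step (ii). Both the reference
    levels and the tolerance are hypothetical inputs.
    """
    calls = {}
    for gene, level in sample.items():
        if gene not in reference:
            continue  # no reference level available, so no call is made
        delta = level - reference[gene]
        if delta > tolerance:
            calls[gene] = "increased"
        elif delta < -tolerance:
            calls[gene] = "decreased"
        else:
            calls[gene] = "unchanged"
    return calls


def format_output(calls: Dict[str, str]) -> str:
    """Step (iii): produce a simple text output, one gene per line."""
    return "\n".join(f"{gene}: {call}" for gene, call in sorted(calls.items()))


if __name__ == "__main__":
    # Hypothetical expression values, for illustration only.
    sample = {"GENE_A": 8.2, "GENE_B": 5.1, "GENE_C": 6.0}
    reference = {"GENE_A": 7.0, "GENE_B": 6.3, "GENE_C": 6.0}
    print(format_output(call_expression_changes(sample, reference)))
```

In practice the output of step (iii) would be a characterisation or prognosis derived from such calls; the text report here merely stands in for that output.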
In certain example embodiments, the computer-implemented method, system, and computer program product may be embodied in a computer application, for example, that operates and executes on a computing machine and a module. When executed, the application may characterise and/or prognose cancer, such as prostate cancer or ER positive breast cancer in a subject, in accordance with the example embodiments described herein.
As used herein, the computing machine may correspond to any computers, servers, embedded systems, or computing systems. The module may comprise one or more hardware or software elements configured to facilitate the computing machine in performing the various methods and processing functions presented herein. The computing machine may include various internal or attached components such as a processor, system bus, system memory, storage media, input/output interface, and a network interface for communicating with a network, for example. The computing machine may be implemented as a conventional computer system, an embedded controller, a laptop, a server, a customized machine, any other hardware platform, such as a laboratory computer or device, for example, or any combination thereof. The computing machine may be a distributed system configured to function using multiple computing machines interconnected via a data network or bus system, for example.
The processor may be configured to execute code or instructions to perform the operations and functionality described herein, manage request flow and address mappings, and to perform calculations and generate commands. The processor may be configured to monitor and control the operation of the components in the computing machine. The processor may be a general purpose processor, a processor core, a multiprocessor, a reconfigurable processor, a microcontroller, a digital signal processor (“DSP”), an application specific integrated circuit (“ASIC”), a graphics processing unit (“GPU”), a field programmable gate array (“FPGA”), a programmable logic device (“PLD”), a controller, a state machine, gated logic, discrete hardware components, any other processing unit, or any combination or multiplicity thereof. The processor may be a single processing unit, multiple processing units, a single processing core, multiple processing cores, special purpose processing cores, co-processors, or any combination thereof. According to certain example embodiments, the processor, along with other components of the computing machine, may be a virtualized computing machine executing within one or more other computing machines.
The system memory may include non-volatile memories such as read-only memory (“ROM”), programmable read-only memory (“PROM”), erasable programmable read-only memory (“EPROM”), flash memory, or any other device capable of storing program instructions or data with or without applied power. The system memory may also include volatile memories such as random access memory (“RAM”), static random access memory (“SRAM”), dynamic random access memory (“DRAM”), and synchronous dynamic random access memory (“SDRAM”). Other types of RAM also may be used to implement the system memory. The system memory may be implemented using a single memory module or multiple memory modules. While the system memory may be part of the computing machine, one skilled in the art will recognize that the system memory may be separate from the computing machine without departing from the scope of the subject technology. It should also be appreciated that the system memory may include, or operate in conjunction with, a non-volatile storage device such as the storage media.
The storage media may include a hard disk, a floppy disk, a compact disc read only memory (“CD-ROM”), a digital versatile disc (“DVD”), a Blu-ray disc, a magnetic tape, a flash memory, other non-volatile memory device, a solid state drive (“SSD”), any magnetic storage device, any optical storage device, any electrical storage device, any semiconductor storage device, any physical-based storage device, any other data storage device, or any combination or multiplicity thereof. The storage media may store one or more operating systems, application programs and program modules such as module, data, or any other information. The storage media may be part of, or connected to, the computing machine. The storage media may also be part of one or more other computing machines that are in communication with the computing machine, such as servers, database servers, cloud storage, network attached storage, and so forth.
The module may comprise one or more hardware or software elements configured to facilitate the computing machine with performing the various methods and processing functions presented herein. The module may include one or more sequences of instructions stored as software or firmware in association with the system memory, the storage media, or both. The storage media may therefore represent examples of machine or computer readable media on which instructions or code may be stored for execution by the processor. Machine or computer readable media may generally refer to any medium or media used to provide instructions to the processor. Such machine or computer readable media associated with the module may comprise a computer software product. It should be appreciated that a computer software product comprising the module may also be associated with one or more processes or methods for delivering the module to the computing machine via a network, any signal-bearing medium, or any other communication or delivery technology. The module may also comprise hardware circuits or information for configuring hardware circuits such as microcode or configuration information for an FPGA or other PLD.
The input/output (“I/O”) interface may be configured to couple to one or more external devices, to receive data from the one or more external devices, and to send data to the one or more external devices. Such external devices along with the various internal devices may also be known as peripheral devices. The I/O interface may include both electrical and physical connections for operably coupling the various peripheral devices to the computing machine or the processor. The I/O interface may be configured to communicate data, addresses, and control signals between the peripheral devices, the computing machine, or the processor. The I/O interface may be configured to implement any standard interface, such as small computer system interface (“SCSI”), serial-attached SCSI (“SAS”), fiber channel, peripheral component interconnect (“PCI”), PCI express (PCIe), serial bus, parallel bus, advanced technology attached (“ATA”), serial ATA (“SATA”), universal serial bus (“USB”), Thunderbolt, FireWire, various video buses, and the like. The I/O interface may be configured to implement only one interface or bus technology.
Alternatively, the I/O interface may be configured to implement multiple interfaces or bus technologies. The I/O interface may be configured as part of, all of, or to operate in conjunction with, the system bus. The I/O interface may include one or more buffers for buffering transmissions between one or more external devices, internal devices, the computing machine, or the processor.
The I/O interface may couple the computing machine to various input devices including mice, touch-screens, scanners, electronic digitizers, sensors, receivers, touchpads, trackballs, cameras, microphones, keyboards, any other pointing devices, or any combinations thereof. The I/O interface may couple the computing machine to various output devices including video displays, speakers, printers, projectors, tactile feedback devices, automation control, robotic components, actuators, motors, fans, solenoids, valves, pumps, transmitters, signal emitters, lights, and so forth.
The computing machine may operate in a networked environment using logical connections through the network interface to one or more other systems or computing machines across the network. The network may include wide area networks (WAN), local area networks (LAN), intranets, the Internet, wireless access networks, wired networks, mobile networks, telephone networks, optical networks, or combinations thereof. The network may be packet switched, circuit switched, of any topology, and may use any communication protocol. Communication links within the network may involve various digital or analog communication media such as fiber optic cables, free-space optics, waveguides, electrical conductors, wireless links, antennas, radio-frequency communications, and so forth. The processor may be connected to the other elements of the computing machine or the various peripherals discussed herein through the system bus. It should be appreciated that the system bus may be within the processor, outside the processor, or both. According to some embodiments, any of the processor, the other elements of the computing machine, or the various peripherals discussed herein may be integrated into a single device such as a system on chip (“SOC”), system on package (“SOP”), or ASIC device.
Embodiments may comprise a computer program that embodies the functions described and illustrated herein, wherein the computer program is implemented in a computer system that comprises instructions stored in a machine-readable medium and a processor that executes the instructions. However, it should be apparent that there could be many different ways of implementing embodiments in computer programming, and the embodiments should not be construed as limited to any one set of computer program instructions. Further, a skilled programmer would be able to write such a computer program to implement one or more of the disclosed embodiments described herein. Therefore, disclosure of a particular set of program code instructions is not considered necessary for an adequate understanding of how to make and use embodiments. Further, those skilled in the art will appreciate that one or more aspects of embodiments described herein may be performed by hardware, software, or a combination thereof, as may be embodied in one or more computing systems. Moreover, any reference to an act being performed by a computer should not be construed as being performed by a single computer as more than one computer may perform the act.
The example embodiments described herein can be used with computer hardware and software that perform the methods and processing functions described previously. The systems, methods, and procedures described herein can be embodied in a programmable computer, computer-executable software, or digital circuitry. The software can be stored on computer-readable media. For example, computer-readable media can include a floppy disk, RAM, ROM, hard disk, removable media, flash memory, memory stick, optical media, magneto-optical media, CD-ROM, etc. Digital circuitry can include integrated circuits, gate arrays, building block logic, field programmable gate arrays (FPGA), etc.
Reagents, tools, and/or instructions for performing the methods described herein can be provided in a kit. Such a kit can include reagents for collecting a tissue sample from a patient, such as by biopsy, and reagents for processing the tissue. Thus, the kit may include suitable fixatives, such as formalin and embedding reagents, such as paraffin. The kit can also include one or more reagents for performing an expression level analysis, such as reagents for performing nucleic acid amplification, including RT-PCR and qPCR, NGS (RNA-seq), northern blot, proteomic analysis, or immunohistochemistry to determine expression levels of biomarkers in a sample of a patient. For example, primers for performing RT-PCR, probes for performing northern blot analyses or bDNA assays, and/or antibodies or aptamers, as discussed herein, for performing proteomic analysis such as Western blot, immunohistochemistry and ELISA analyses can be included in such kits. Appropriate buffers for the assays can also be included. Detection reagents required for any of these assays can also be included. The kits may be array or PCR based kits for example and may include additional reagents, such as a polymerase and/or dNTPs for example. The kits featured herein can also include an instruction sheet describing how to perform the assays for measuring expression levels.
There is provided a kit for characterising and/or prognosing cancer in a subject comprising one or more primers and/or primer pairs for amplifying and/or which specifically hybridize with at least one gene, full sequence or target sequence selected from Table 1. There is also provided a kit for characterising and/or prognosing cancer in a subject comprising one or more probes that specifically hybridize with at least one gene, full sequence or target sequence selected from Table 1.
The kit may include one or more primer pairs and/or probes complementary to at least one gene selected from Table 1. In certain embodiments, according to all aspects of the invention, the kits may include one or more probes or primers (primer pairs) designed to hybridize with the target sequences or full sequences listed in Table 1 and thus permit expression levels to be determined. The probes and probesets identified in Table 1 and Table 1A may be employed according to all aspects of the invention. The primers and primer pairs identified in Table 1B may also be employed according to all aspects of the invention.
The kits may include primers/primer pairs/probes/probesets to form any of the gene signatures specified herein (see for example the gene signatures of Tables 1 to 24).
The kits may also include one or more primer pairs complementary to a reference gene.
Such a kit can also include primer pairs complementary to at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69 or 70 of the genes listed in Table 1.
Thus, in a further aspect the present invention relates to a kit for (in situ) characterising and/or prognosing prostate cancer in a subject comprising one or more oligonucleotide probes specific for an RNA product of at least one gene selected from Table 1. Suitable probes and probesets for each gene are listed in Table 1 and may be incorporated in the kits of the invention. The probes and probesets also constitute separate aspects of the invention. By “probeset” is meant the collection of probes designed to target (by hybridization) a single gene. The groupings are apparent from Table 1 (and Table 1A).
The kit may further comprise one or more of the following components:
The components of the kit may be suitable for conducting a viewRNA assay (https://www.panomics.com/products/rna-in-situ-analysis/view-rna-overview).
The components of the kit may be nucleic acid based molecules, optionally DNA (or RNA). The blocking probe is a molecule that acts to reduce background signal by binding to sites on the target not bound by the target specific probes (probes specific for the RNA product of the at least one gene of the invention). The PreAmplifier is a molecule capable of binding to a (a pair of) target specific probe(s) when target bound. The Amplifier is a molecule capable of binding to the PreAmplifier. Alternatively, the Amplifier may be capable of binding directly to a (a pair of) target specific probe(s) when target bound. The Amplifier has binding sites for multiple label molecules (which may be label probes).
Kits for characterising and/or prognosing cancer, such as prostate cancer or ER positive breast cancer in a subject may permit the methylation status of at least one gene selected from Table 1 to be determined. The determined methylation status, which may be hypermethylation or hypomethylation as appropriate, is used to provide a characterisation of and/or a prognosis for the cancer, such as prostate cancer or ER positive breast cancer. Such kits may include primers and/or probes for determining the methylation status of the gene or genes directly. They may thus comprise methylation specific primers and/or probes that discriminate between methylated and unmethylated forms of DNA by hybridization. Such primers and/or probes may include derivatives of the primers and probes described herein, which are adapted to reflect selective modification of the cytosine residues in the target sequence depending upon whether they are methylated or not. Thus, sets of “methylated-specific” and “unmethylated-specific” primers (to include primer pairs) and probes may be designed in order to probe particular cytosine-containing target sequences. Such kits will typically also contain a reagent that selectively modifies either the methylated or non-methylated form of CpG dinucleotide motifs. Suitable chemical reagents comprise hydrazine and bisulphite ions. An example is sodium bisulphite. The kits may, however, contain other reagents as discussed hereinabove to determine methylation status such as restriction endonucleases. Methylation specific PCR primers may be derived from the primer pairs of Table 1B and of SEQ ID NOs 3151-3154, to take account of bisulphite conversion of CpG dinucleotide pairs if present in the unmethylated form (unmethylated-specific) or lack of conversion if the CpG dinucleotide is methylated (methylated-specific).
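The effect of bisulphite treatment on a target sequence, and hence the difference between the templates that “methylated-specific” and “unmethylated-specific” primers must match, may be illustrated by the following minimal sketch. The function name and the example sequence are hypothetical. The sketch assumes, for simplicity, that every cytosine in a CpG dinucleotide is either fully methylated or fully unmethylated, that cytosines outside a CpG context are unmethylated, and that converted cytosines are read as thymine after amplification.

```python
def bisulphite_convert(seq: str, methylated: bool) -> str:
    """Return the sequence as read after bisulphite treatment and PCR.

    Unmethylated cytosines are converted and read as T. If `methylated` is
    True, cytosines followed by G (CpG context) are assumed to be methylated
    and therefore remain C; all other cytosines are converted.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("C" if (methylated and in_cpg) else "T")
        else:
            converted.append(base)
    return "".join(converted)


if __name__ == "__main__":
    target = "ACGTCCGA"  # hypothetical target sequence, not from Table 1B
    print("methylated template:  ", bisulphite_convert(target, methylated=True))
    print("unmethylated template:", bisulphite_convert(target, methylated=False))
```

A methylated-specific primer would be designed against the first template and an unmethylated-specific primer against the second, so that the two differ at each CpG position.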
The invention also relates to a kit for characterising and/or prognosing cancer, such as prostate cancer or ER positive breast cancer in a subject comprising one or more antibodies or aptamers as described above and which are useful in the methods of the invention.
Informational material included in the kits can be descriptive, instructional, marketing or other material that relates to the methods described herein and/or the use of the reagents for the methods described herein. For example, the informational material of the kit can contain contact information, e.g., a physical address, email address, website, or telephone number, where a user of the kit can obtain substantive information about performing a gene expression analysis and interpreting the results.
The kit may further comprise a computer application or storage medium as described above.
The example systems, methods, and acts described in the embodiments presented previously are illustrative, and, in alternative embodiments, certain acts can be performed in a different order, in parallel with one another, omitted entirely, and/or combined between different example embodiments, and/or certain additional acts can be performed, without departing from the scope and spirit of various embodiments. Accordingly, such alternative embodiments are included in the scope of the invention as described herein.
Although specific embodiments have been described above in detail, the description is merely for purposes of illustration. It should be appreciated, therefore, that many aspects described above are not intended as required or essential elements unless explicitly stated otherwise.
Modifications of, and equivalent components or acts corresponding to, the disclosed aspects of the example embodiments, in addition to those described above, can be made by a person of ordinary skill in the art, having the benefit of the present disclosure, without departing from the spirit and scope of embodiments defined in the following claims, the scope of which is to be accorded the broadest interpretation so as to encompass such modifications and equivalent structures.
The present invention will be further understood by reference to the following experimental examples.
Tumor Material
70 primary prostate cancers with no known concomitant metastases, 20 primary prostate cancers with known lymph node metastases, 11 lymph nodes containing metastatic prostate cancer, 25 normal prostate samples.
Gene Expression Profiling from FFPE
Total RNA was extracted from macrodissected FFPE tissue using the High Pure RNA Paraffin Kit (Roche Diagnostics GmbH, Mannheim, Germany). RNA was converted into complementary deoxyribonucleic acid (cDNA), which was subsequently amplified and converted into single-stranded form using the SPIA® technology of the WT-Ovation™ FFPE RNA Amplification System V2 (NuGEN Technologies Inc., San Carlos, Calif., USA). The amplified single-stranded cDNA was then fragmented and biotin labeled using the FL-Ovation™ cDNA Biotin Module V2 (NuGEN Technologies Inc.). The fragmented and labeled cDNA was then hybridized to the Almac Prostate Cancer DSA™. Almac's Prostate Cancer DSA™ research tool has been optimised for analysis of FFPE tissue samples, enabling the use of valuable archived tissue banks. The Almac Prostate Cancer DSA™ research tool is an innovative microarray platform that represents the transcriptome in both normal and cancerous prostate tissues. Consequently, the Prostate Cancer DSA™ provides a comprehensive representation of the transcriptome within prostate disease and tissue setting, not available using generic microarray platforms. Arrays were scanned using the Affymetrix Genechip® Scanner 7G (Affymetrix Inc., Santa Clara, Calif.).
Data Preparation
Quality Control (QC) of profiled samples was carried out using the MAS5 pre-processing algorithm. Various technical aspects were assessed including: average noise and background homogeneity, percentage of present call (array quality), signal quality, RNA quality and hybridization quality. Distributions and Median Absolute Deviation of corresponding parameters were analyzed and used to identify possible outliers.
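The use of the Median Absolute Deviation (MAD) to flag possible outliers for a single QC parameter may be sketched as follows. This is an illustrative sketch only: the function name, the example values and the cut-off of three MADs are assumptions, as the specific rule applied is not stated above.

```python
from statistics import median
from typing import List, Sequence


def mad_outliers(values: Sequence[float], n_mads: float = 3.0) -> List[int]:
    """Return indices of values lying more than `n_mads` MADs from the median.

    The cut-off of three MADs is a hypothetical default chosen for the example.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the data, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - med) / mad > n_mads]


if __name__ == "__main__":
    # Hypothetical per-array QC metric, e.g. an average background value.
    background = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0]
    print(mad_outliers(background))  # index of the array flagged as an outlier
```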
Almac's Prostate Cancer DSA™ contains probes that primarily target the area within 300 nucleotides from the 3′ end. Therefore standard Affymetrix RNA quality measures were adapted: for housekeeping genes, intensities of 3′ end probe sets with ratios of 3′ end probe set intensity to the average background intensity were used in addition to the usual 3′/5′ ratios. Hybridization controls were checked to ensure that their intensities and present calls conform to the requirements specified by Affymetrix.
Hierarchical Clustering and Functional Analysis
Sample pre-processing was carried out using Robust Multi-Array analysis (RMA) [1]. The data matrix was initially summarised to Entrez gene ID level using Ensembl annotation version 75, specifically utilising the probe set that was least associated to present call for each Entrez gene. Probe sets that 1) did not map to an Entrez gene ID or 2) mapped to multiple Entrez gene IDs were removed. The resulting gene level data matrix was sorted by decreasing variance and intensity and incremental subsets of the data matrix were tested for cluster stability: the GAP statistic [2] was applied to calculate the number of sample and gene clusters while the stability of cluster composition was assessed using partition comparison methods. The final most variable gene list was determined based on the smallest and most stable data matrix for the selected number of sample clusters.
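The removal of ambiguously mapped probe sets and the ranking by decreasing variance may be illustrated by the following simplified sketch. The function name, identifiers and values are hypothetical. The sketch ranks by variance only and does not reproduce the selection of a single probe set per gene, the intensity criterion or the cluster stability testing described above.

```python
from statistics import variance
from typing import Dict, List, Set


def filter_and_rank(expr: Dict[str, List[float]],
                    probe_to_genes: Dict[str, Set[str]],
                    top_n: int) -> List[str]:
    """Keep probe sets mapping to exactly one Entrez gene ID, rank by variance.

    `expr` maps a probe set ID to its expression values across samples;
    `probe_to_genes` maps a probe set ID to the set of Entrez gene IDs it hits.
    """
    kept = {
        probe: values
        for probe, values in expr.items()
        if len(probe_to_genes.get(probe, set())) == 1
    }
    ranked = sorted(kept, key=lambda probe: variance(kept[probe]), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical probe set IDs, Entrez IDs and expression values.
    expr = {
        "ps1": [1.0, 5.0, 9.0],
        "ps2": [4.0, 4.0, 5.0],
        "ps3": [0.0, 10.0, 20.0],  # maps to two genes, so it is removed
        "ps4": [2.0, 2.5, 3.0],    # maps to no gene, so it is removed
    }
    mapping = {"ps1": {"101"}, "ps2": {"102"}, "ps3": {"103", "104"}, "ps4": set()}
    print(filter_and_rank(expr, mapping, top_n=2))
```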
Following standardization of the data matrix to the median gene expression values, agglomerative hierarchical clustering was performed using Euclidean distance and Ward's linkage method [3]. The optimal number of sample and gene clusters was determined using the GAP statistic [2] which compares the change in within-cluster dispersion with that expected under a reference null distribution. The significance of the distribution of clinical parameter factor levels across sample clusters was assessed using ANOVA (continuous factor) or chi-squared analysis (discrete factor) and corrected for false discovery rate (product of p-value and number of tests performed). A corrected p-value threshold of 0.05 was used as criterion for significance.
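The correction described above, in which each p-value is multiplied by the number of tests performed and compared with a threshold of 0.05, may be sketched as follows. The function name and example p-values are hypothetical. Capping the corrected value at 1 and treating a corrected p-value strictly below the threshold as significant are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def correct_p_values(p_values: Sequence[float],
                     alpha: float = 0.05) -> List[Tuple[float, bool]]:
    """Multiply each p-value by the number of tests and flag significance.

    Returns (corrected p-value, significant?) pairs in the input order.
    Corrected values are capped at 1.0 (an assumption of this sketch).
    """
    n_tests = len(p_values)
    corrected = [min(1.0, p * n_tests) for p in p_values]
    return [(c, c < alpha) for c in corrected]


if __name__ == "__main__":
    # Hypothetical p-values for three clinical parameters.
    for value, significant in correct_p_values([0.001, 0.02, 0.5]):
        print(f"{value:.3f} significant={significant}")
```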
Functional enrichment analysis was conducted to identify and rank biological entities which were found to be associated with the clustered gene sets using the Gene Ontology biological processes classification [4]. Entities were ranked according to a statistically derived enrichment score [5] and adjusted for multiple testing [6]. A corrected p-value of 0.05 was used as significance threshold. The identified enriched processes were summarised into an overall group function for each gene cluster.
From the hierarchical clustering analysis, primary tumour samples clustering with metastatic samples will be labelled as ‘bad’ whereas primary tumour samples clustering with normal samples will be labelled as ‘good’.
Signature Generation
Following the identification of class labels a gene signature was derived to enable prospective identification of the bad prognosis group within the primary tumour samples. The following steps summarise the procedure for developing the gene signature:
Model selection included the following steps:
The signature length that yielded a high AUC in the training set; a high C-index in the Taylor set; and a low SD in the heterogeneity samples was selected.
Multivariate Analysis
Of interest is the time until biochemical recurrence in prostate cancer patients in the Taylor dataset. Multivariable Cox survival modelling was used to test for and describe interactions with the biomarker, understand prognostic factors and model the relative effect of prognostic factors. Based on clinical judgement pre-operative PSA (4 ng/ml), pathology stage (“T2 A/B/C”, “T3 A/B/C”, “T4”), Gleason (<7, 7, 8-9) and the dichotomised signature score were used as independent predictor variables. A log 2 transformation of pre-operative PSA was applied. Multiple imputation was used to ensure all available events were used in the analysis. The sample size is 168 patients with 46 biochemical recurrence events and the median time until biochemical recurrence approximately 15 years. A formal test of the proportional hazard assumption, assessment of the functional form of the log transformation of Pre PSA and the model fit using a graphical plot of the Nelson-Aalen cumulative hazard function all provided no cause for concern. Twelve influential data points defined by a change to the regression coefficient equal to or greater than 2 standard errors on removal from the analysis were identified. These were not removed or investigated further.
Following model selection two independent prostate cancer data sets were further evaluated with the final model:
Performance of each of these data sets was evaluated using AUC, to establish if the signature could discriminate patients with recurrences from those with no recurrences, under the hypothesis that higher scores are more representative of patients with metastatic-like disease (bad prognosis) therefore more likely to have a recurrence outcome.
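The AUC used for this evaluation can be understood as the probability that a randomly chosen patient with a recurrence has a higher signature score than a randomly chosen patient without one. The following minimal sketch computes it by pairwise comparison; the function name and example data are hypothetical, and counting tied scores as one half is a conventional choice assumed here rather than a detail taken from the analysis.

```python
from typing import Sequence


def auc(scores: Sequence[float], recurred: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Each (recurrence, no-recurrence) pair counts 1 if the recurrence case
    scores higher, 0.5 if the scores tie, and 0 otherwise.
    """
    positives = [s for s, r in zip(scores, recurred) if r]
    negatives = [s for s, r in zip(scores, recurred) if not r]
    if not positives or not negatives:
        raise ValueError("both outcome groups must be non-empty")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical signature scores and recurrence outcomes.
    print(auc([0.9, 0.8, 0.3, 0.2], [True, False, True, False]))
```

An AUC of 0.5 corresponds to no discrimination and an AUC of 1.0 to perfect separation of the two outcome groups.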
Evaluation of the Final Model in Breast Cancer Data Sets
It was of further interest to evaluate the final signature in other hormone related data sets with respect to predicting prognosis in untreated patients. Three ER positive breast cancer data sets were evaluated:
3. Data set retrieved from Gene Expression Omnibus database, accession number GSE2990
For each data set a median signature score cut-off was applied to predict patients as either signature positive (metastatic-like) if they scored above the median value, or signature negative (non-metastatic-like) otherwise. A Kaplan Meier curve was used to observe the survival differences between the two subgroups of patients. Cox proportional hazard regression analysis of the signature calls against each endpoint was used to calculate a univariate hazard ratio for the signature as a measure of performance against the respective clinical endpoint.
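The median cut-off described above may be sketched as follows. The function name, patient identifiers and scores are hypothetical; the survival analysis itself is not reproduced.

```python
from statistics import median
from typing import Dict


def median_split(scores: Dict[str, float]) -> Dict[str, str]:
    """Call each patient signature positive if scoring above the median."""
    cut_off = median(scores.values())
    return {
        patient: "signature positive" if score > cut_off else "signature negative"
        for patient, score in scores.items()
    }


if __name__ == "__main__":
    # Hypothetical signature scores for four patients.
    calls = median_split({"p1": 0.1, "p2": 0.4, "p3": 0.7, "p4": 0.9})
    for patient, call in calls.items():
        print(patient, call)
```

The two resulting groups would then be compared using a Kaplan Meier curve and a univariate Cox model, for which a dedicated survival analysis package would normally be used.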
Results
126 samples passed microarray QC and subsequently underwent unsupervised hierarchical clustering based on the 1000 most variable genes. Four sample clusters and four gene clusters were identified (
The results from signature development at all considered signature lengths are provided in
The signature content and weightings of the final 70 gene model are listed in Table 1. The 70 gene scores calculated in the Taylor data were dichotomised at a threshold of 0.4241 where patients with a signature score >0.4241 were classified as “bad prognosis” and patients with a signature score ≤0.4241 were classified as “good prognosis”. The signature classifications into good and poor prognosis were used to generate a Kaplan Meier curve to show the differences in survival probabilities for the two predicted groups.
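The dichotomisation at the threshold of 0.4241 may be illustrated by the following minimal sketch. Computing the signature score as a weighted sum of expression values plus an optional bias term is an assumption made for the example, and the gene names and weights shown are hypothetical placeholders rather than the entries of Table 1. Any score not above the threshold is classed as good prognosis.

```python
from typing import Dict

THRESHOLD = 0.4241  # cut-off reported above for the Taylor data


def signature_score(expression: Dict[str, float],
                    weights: Dict[str, float],
                    bias: float = 0.0) -> float:
    """Weighted sum of expression values over the signature genes.

    The weighted-sum form and the bias term are assumptions of this sketch.
    """
    return bias + sum(w * expression[gene] for gene, w in weights.items())


def classify(score: float, threshold: float = THRESHOLD) -> str:
    """Return the prognosis class for a signature score."""
    return "bad prognosis" if score > threshold else "good prognosis"


if __name__ == "__main__":
    # Hypothetical two-gene example; the real model uses 70 genes.
    weights = {"GENE_A": 0.5, "GENE_B": -0.25}
    patient = {"GENE_A": 2.0, "GENE_B": 1.0}
    score = signature_score(patient, weights)
    print(score, classify(score))
```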
Evaluation of the Final Model in Breast Cancer Data Sets
The results of evaluating the 70 gene signature in three breast cancer data sets are described below:
Purpose:
The purpose of this analysis is to evaluate the performance of the 70 gene signature when a random probeset per gene is selected. This is to provide evidence of the importance of certain probesets associated to the signature genes.
Data:
Table 26 outlines the number of probesets available per signature gene. The table shows that the number of probesets that can be selected per gene varies from 1 to a maximum of 21 probesets per gene.
Analysis:
The following analysis steps were performed:
For completeness, it is noted that the random selection of probeset per signature gene will only be applicable for signature genes with >1 probeset i.e. 30 of the signature genes have only 1 probeset per gene, so for these genes, the same probeset is being selected each time.
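The random selection of one probeset per signature gene may be sketched as follows. The function name, gene names and probeset identifiers are hypothetical, and a seeded generator is used only so that the example is repeatable. Genes with a single probeset necessarily yield the same selection on every draw.

```python
import random
from typing import Dict, List


def pick_random_probesets(gene_to_probesets: Dict[str, List[str]],
                          rng: random.Random) -> Dict[str, str]:
    """Select one probeset at random for each signature gene."""
    return {
        gene: rng.choice(probesets)
        for gene, probesets in gene_to_probesets.items()
    }


if __name__ == "__main__":
    # Hypothetical mapping; GENE_B has a single probeset and never varies.
    mapping = {
        "GENE_A": ["psA1", "psA2", "psA3"],
        "GENE_B": ["psB1"],
    }
    rng = random.Random(0)  # seeded for repeatability of the example
    for draw in range(3):
        print(draw, pick_random_probesets(mapping, rng))
```

Repeating the draw many times and rescoring the signature on each selection would give a distribution of performance against which the originally selected probesets can be compared.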
As outlined in the earlier examples, using the transcriptional profile and hierarchical clustering of the Discovery cohort of prostate cancer samples, we have identified a distinct molecular subgroup of primary prostate cancers that clustered with metastatic disease and prostate cancers known to have concomitant metastases. This subgroup of primary tumour samples that clustered with metastatic samples represented a poor prognostic population, whilst the benign-like primary tumours defined a good prognostic subgroup. Functional analysis of the subgroup identified biological processes known to be involved in metastasis such as Epithelial Mesenchymal Transition (EMT) and cell migration. This cluster was hence defined as the ‘Metastatic-Like’ subgroup and for the purposes of this specification will be referred to throughout as ‘Met-like’.
We developed a 70-gene signature to prospectively identify the ‘Met-like’ subgroup of patients. This 70-gene assay can be used to prospectively assess disease progression from a primary tumour, to determine the likelihood of disease recurrence and/or metastatic progression. We have previously shown that the 70-gene signature also displays good performance in heterogeneity studies, maintaining subgroup detection and signature score stability.
We have also demonstrated the prognostic significance of this molecular subgroup using the 70-gene signature in three independent in silico datasets with different clinical endpoints. In the Glinsky dataset (79 prostate cancer cases), the signature showed good discrimination of the biochemical recurrence endpoint, with a statistically significant AUC=0.69 [0.57-0.79], p=0.0032 (Glinsky et al 2004). In the Erho dataset (545 prostate cancer cases), the signature showed modest but statistically significant discrimination of the metastatic recurrence endpoint (AUC=0.612 [0.569-0.653], p<0.0001) (Erho et al 2013). Finally, in the Taylor dataset, the signature was significantly associated with patients' time to metastatic recurrence (HR=6.32 [1.98-20.20], p<0.0001) and time to biochemical recurrence (HR=3.76 [1.70-8.34], p<0.0001) (Taylor et al 2010). Importantly, the metastatic biology subgroup has also been shown to predict poor outcome, as identified by disease recurrence following surgical removal of the prostate, independent of known prognostic factors such as Gleason score.
The identification of prostate cancer patients at high risk of recurrence following curative surgery or radiation is a key clinical requirement to identify those men that should receive adjuvant chemotherapy or radiation treatment whilst avoiding unnecessary interventions and side-effects in those who do not require further treatment. Based on this, the ability and performance of our 70-gene assay in identifying this high-risk population of patients required comprehensive clinical validation in independent cohorts of clinical prostate samples, either resections following curative surgery or biopsy specimens following curative radiotherapy.
Objectives
To further assess the performance of the prostate prognostic 70-gene assay in primary prostate resections.
To clinically validate the prostate prognostic 70-gene assay in an independent cohort of primary localised prostate cancer resections with the ability to identify a subgroup of prostate cancer patients at increased risk of developing biochemical recurrence and/or metastatic disease progression following surgery with curative intent.
To assess the performance of the prostate prognostic 70-gene assay in prostate biopsies in comparison to resection specimens.
To clinically validate the prostate prognostic 70-gene assay in an independent cohort of primary prostate biopsies with the ability to identify a subgroup of prostate cancer patients at increased risk of developing biochemical recurrence and/or metastatic disease progression following radiation treatment.
Materials & Methods
Processing and clinical validation of the 70 gene prognostic assay were performed in a blinded and randomised manner to avoid technical or biological confounding in the expression data, which could compromise data quality, integrity and the validation objectives.
Prostate Cancer Tumour Material
This study performed gene expression analysis of two separate cohorts of prostate cancer specimens. The first validation cohort was collected internally by Almac Diagnostics and included 349 prostate resection FFPE tissue samples obtained from four clinical sites: University College Dublin (62 samples), Wales Cancer Bank (100 samples), University of Surrey (41 samples) and University Hospital of Oslo (146 samples). This cohort consisted of samples across three key clinical groups: non-recurrence patients (189 samples), biochemical recurrence (also referred to as PSA recurrence) patients (112 samples) and metastatic progression patients (48 samples). Samples in the resection dataset were collected based on the following inclusion criteria:
Demographic, clinical and pathological variables utilised for the data analysis of the prostate resection cohort are summarised in Table 27.
The second validation cohort was collected in collaboration with QUB as part of the FASTMAN Research Group and included 312 prostate biopsy FFPE tissue samples. This cohort included 60 patient failures, which incorporated 58 biochemical recurrence, 24 metastatic progression and 18 Castrate Resistant Prostate Cancer (CRPC) cases. Samples in the biopsy dataset were collected based on the following inclusion criteria:
Demographic, clinical and pathological variables utilised for the data analysis of the prostate biopsy cohort are summarised in Table 28.
Ethical approval for the sample acquisition and dataset analysis as validation of the prostate prognostic assay was obtained from the East of England Research Ethics Committee (Ref: 14/EE/1066).
Gene Expression Profiling of Prostate Cancer Samples
Prior to sample profiling, clinical samples were randomised into RNA extraction batches and re-randomised into cDNA amplification processing batches using a list of pre-defined factors, i.e. clinical T-stage, PSA, Gleason, age and response. Clinical site was also included as a factor for the first validation cohort. A further randomisation of reagents, equipment and operators was performed prior to sample processing.
All samples were centrally pathology reviewed (Prof E. Kay RCSI) and marked-up for macrodissection based on the tumour area with the most dominant Gleason grade. For resection samples 2×10 μm sections were processed whereas for biopsy samples 4×5 μm sections were used for profiling. Total RNA was extracted from macrodissected FFPE tissue using the Roche High Pure RNA Paraffin Kit (Roche Diagnostics GmbH, Mannheim, Germany). RNA was converted into complementary deoxyribonucleic acid (cDNA), which was subsequently amplified and converted into single-stranded form using the SPIA® technology of the WT-Ovation™ FFPE RNA Amplification System V3 (NuGEN Technologies Inc., San Carlos, Calif., USA). The amplified single-stranded cDNA was then fragmented and biotin labelled using the FL-Ovation™ cDNA Biotin Module V3 (NuGEN Technologies Inc.). The fragmented and labelled cDNA was then hybridised to the Almac Prostate Cancer DSA™. Almac's Prostate Cancer DSA™ research tool has been optimised for analysis of FFPE tissue samples, enabling the use of valuable archived tissue banks. The Almac Prostate Cancer DSA™ research tool is an innovative microarray platform that represents the transcriptome in both normal and cancerous prostate tissues. Consequently, the Prostate Cancer DSA™ provides a comprehensive representation of the transcriptome within prostate disease and tissue setting, not available using generic microarray platforms. Arrays were scanned using the Affymetrix Genechip® Scanner 7G (Affymetrix Inc., Santa Clara, Calif.).
Process Controls
Stratagene Universal Human Reference (UHR) samples and ES-2 cell line material were used as process controls within each processing batch as a standard measure during profiling of clinical cohorts. The UHR control is designed to be used as a universal reference RNA for microarray profiling experiments. These controls have been generated from pooling equal quantities of DNase treated cell line RNA to make a control RNA pool. The ES-2 cell line is a human clear cell carcinoma cell line representing ovarian cancer, established from an ovarian surgical tumour. The ES-2 cell line is characterised by a fibroblast morphology and cultures as an adherent cell line. Cells are maintained in McCoy's 5a Medium Modified with 10% Foetal Calf Serum (FCS), with a doubling time of approximately 24 hours. Due to their adherent properties and their fast doubling time these cells are ideal for bulking up as standard cell line controls. Approximately 1×10⁶ ES-2 cells were pelleted and fixed overnight prior to processing as a Formalin Fixed Paraffin Embedded (FFPE) tissue block. One 10 μm section of the prepared ES-2 cell line FFPE block was utilised for RNA extraction prior to downstream profiling as a Prostate Metastatic assay specific processing control.
Data Preparation and QC
A continual QC assessment of samples was performed during sample processing. Only samples meeting the minimum RNA and cDNA concentrations were taken forward for microarray profiling, i.e. a minimum of 12.5 ng/μl for RNA and a minimum of 140 ng/μl for cDNA.
Microarray data quality was assessed continuously throughout the profiling of these cohorts on a batch by batch basis, and also cumulatively after the completion of profiling, to exclude poor quality samples prior to analysis. Samples were pre-processed using the Robust Multi-array Average (RMA) methodology (Irizarry et al. 2003). The QC assessment comprised a combination of the following quality metrics:
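The concentration gate described above can be illustrated with a minimal sketch. The two thresholds are taken from the text; the sample identifiers and values are hypothetical, and treating the minimum as inclusive (>=) is an assumption.

```python
# Minimum concentrations stated in the text, in ng/ul.
MIN_RNA_NG_PER_UL = 12.5
MIN_CDNA_NG_PER_UL = 140.0


def passes_concentration_qc(rna_ng_per_ul, cdna_ng_per_ul):
    """Return True if both concentrations meet the stated minimums.

    ASSUMPTION: a value exactly at the minimum is treated as a pass.
    """
    return (rna_ng_per_ul >= MIN_RNA_NG_PER_UL
            and cdna_ng_per_ul >= MIN_CDNA_NG_PER_UL)


# Hypothetical samples: (sample id, RNA ng/ul, cDNA ng/ul).
samples = [("S1", 20.0, 150.0), ("S2", 10.0, 200.0), ("S3", 12.5, 140.0)]

# Keep only the samples that would go forward to microarray profiling.
passed = [sid for sid, rna, cdna in samples
          if passes_concentration_qc(rna, cdna)]
```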
Pre-defined limits of acceptance for Prostate assay specific cell line ES-2 were monitored using statistical process control (SPC) charts.
Generation of Signature Scores
Samples were pre-processed on a per sample basis using the refRMA (Irizarry et al. 2003) pre-processing model generated during the development of the 70 gene assay. Ensembl release 75 was used to annotate the probe sets to the corresponding Entrez Gene ID. Probe set expression was summarised to Entrez Gene ID level using the median value (excluding anti-sense probe sets). Assay scores were calculated using the following formula from the partial least squares model:
Where wi is the weight of each Entrez gene, xi is the gene expression, bi is the Entrez gene-specific bias and k=0.4365 (Table 29). Assay calls were assigned to all samples based upon a predefined cut-off: samples with a continuous signature result >cut-off were labelled ‘assay positive’, otherwise ‘assay negative’.
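The scoring steps can be sketched as follows. The score is assumed to take the weighted, bias-centred linear form sum_i wi × (xi − bi) + k, which is one form consistent with the definitions of wi, xi, bi and k given above; the formula itself is not reproduced in this text, so this is an illustration and not a statement of the exact model. All probe set names, gene IDs, weights, biases and the cut-off used in the example are hypothetical; only k=0.4365 is taken from the text.

```python
from statistics import median


def summarise_to_gene_level(probeset_expr, probeset_to_gene, antisense):
    """Median expression per Entrez Gene ID, skipping anti-sense probe sets."""
    per_gene = {}
    for probeset, value in probeset_expr.items():
        if probeset in antisense or probeset not in probeset_to_gene:
            continue
        per_gene.setdefault(probeset_to_gene[probeset], []).append(value)
    return {gene: median(values) for gene, values in per_gene.items()}


def signature_score(gene_expr, weights, biases, k):
    """ASSUMED form of the score: sum_i w_i * (x_i - b_i) + k."""
    return sum(w * (gene_expr[g] - biases[g]) for g, w in weights.items()) + k


def assay_call(score, cutoff):
    """Label a sample from its continuous score and a predefined cut-off."""
    return "assay positive" if score > cutoff else "assay negative"


# Hypothetical toy data (two genes, one anti-sense probe set).
probeset_expr = {"ps1": 6.0, "ps2": 8.0, "ps3_as": 20.0, "ps4": 5.0}
probeset_to_gene = {"ps1": "1001", "ps2": "1001", "ps3_as": "1001", "ps4": "2002"}
antisense = {"ps3_as"}
weights = {"1001": 0.5, "2002": -0.25}   # hypothetical w_i
biases = {"1001": 6.0, "2002": 4.0}      # hypothetical b_i
K = 0.4365                               # constant k from the text
HYPOTHETICAL_CUTOFF = 0.5                # the real cut-off is not given here

gene_expr = summarise_to_gene_level(probeset_expr, probeset_to_gene, antisense)
score = signature_score(gene_expr, weights, biases, K)
call = assay_call(score, HYPOTHETICAL_CUTOFF)
```

Keeping the cut-off as a parameter reflects the text, which states only that it was predefined.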
Univariate and Multivariate Analysis
Time to event (survival) analysis using time to biochemical recurrence (BCR) and time to metastatic disease was performed to evaluate the prognostic effects of the 70 gene prognostic assay. The survival distributions of patient groups defined by assay status (positive or negative) are visualized using Kaplan-Meier (KM) survival curves.
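For readers unfamiliar with the Kaplan-Meier estimate used for these curves, the following is a minimal pure-Python sketch of the standard product-limit estimator. It is not the MedCalc implementation used in the study, and the follow-up times and event flags in the example are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time per patient (e.g. months)
    events : 1 if the event (e.g. BCR) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Hypothetical assay-positive group: 4 patients, one censored at 12 months.
curve = kaplan_meier([6, 12, 12, 30], [1, 1, 0, 1])
```

In practice one curve would be computed per assay status group (positive and negative) and the two plotted together.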
The Cox proportional hazards regression model was used to assess the association between 70 gene assay status and survival (BCR and metastatic disease). The hazard ratio (HR) was used to quantify the effect (association) of assay status on the survival endpoints. In addition to the univariate (unadjusted) analysis, a multivariable (adjusted) Cox model was used to assess the effect of assay status (positive or negative) on BCR and metastatic disease, adjusting for PSA at diagnosis, patient age and Gleason score. All estimated effects are reported with 95% confidence intervals from an analysis in which the assay and these standard prognostic factors were included, regardless of their significance. To support interpretation of the estimated parameters and their level of significance, the goodness of fit of the fitted Cox model was investigated, including checking that the proportional hazards assumption was fulfilled (Grambsch & Therneau, 1994).
Multivariable (adjusted) Cox model was also used to assess the effect of the assay status (positive or negative) on BCR and Metastatic disease, adjusting for CAPRA score (Cooperberg et al. 2006). CAPRA scores for each sample were determined using PSA, Biopsy Gleason score, clinical T-stage, percentage of positive biopsy cores and age.
All tests of statistical significance were 2-sided at 5% level of significance. Statistical analysis was performed using MedCalc version 13.
Results
The 70-Gene Signature Predicts Time to Biochemical Recurrence of the ‘Met-Like’ Subgroup in the Resection Validation Cohort
Utilising 5-10 year clinical follow up data, univariate survival analysis was performed on the 322 samples which passed microarray data QC to assess the performance of the 70-gene signature at predicting time to biochemical recurrence in the resection dataset following surgery. The Kaplan-Meier survival curve shows a significant association of the 70-gene signature with earlier time to recurrence (months) in the ‘Met-like’ subgroup (blue) in comparison to the Non Met-like samples (green). This suggests that the patients within the ‘Met-like’ subgroup have an increased risk of developing biochemical disease recurrence following radical prostatectomy surgery with curative intent (HR=1.74 [1.18-2.56]; p=0.0009).
The 70-Gene Signature Predicts Time to Metastatic Disease Progression of the ‘Met-Like’ Subgroup in the Resection Validation Cohort
Next, using the 5-10 year clinical follow up data, univariate survival analysis was also performed on the 322 samples which passed microarray data QC to assess the performance of the 70-gene signature at predicting time to metastatic progression, at either local or distant sites, in the resection dataset following surgery. Similarly to biochemical recurrence, the Kaplan-Meier survival curve shows a significant association of the 70-gene signature with metastatic progression in the ‘Met-like’ subgroup (blue) in comparison to the Non Met-like samples (green). This suggests that the patients within the ‘Met-like’ subgroup have an increased risk of developing metastatic disease progression following radical prostatectomy surgery with curative intent (HR=3.60 [1.81-7.13]; p<0.0001).
The 70-Gene Signature Predicts Time to Biochemical Recurrence of the ‘Met-Like’ Subgroup in the Biopsy Validation Cohort
Univariate survival analysis was performed using the collated 5-10 year follow up clinical data on the 248 QC pass samples to assess the performance of the 70-gene signature at predicting time to biochemical recurrence in the biopsy dataset following radiotherapy with curative intent. The Kaplan-Meier survival curve shows a significant association of the 70-gene signature with earlier time to recurrence (months) in the ‘Met-like’ subgroup (blue) in comparison to the Non Met-like samples (green). As with the resection dataset, this suggests that the patients within the ‘Met-like’ subgroup have an increased risk of developing biochemical disease recurrence following radical radiotherapy with curative intent (HR=2.18 [1.14-4.17]; p=0.0042).
The 70-Gene Signature Predicts Time to Metastatic Disease Progression of the ‘Met-Like’ Subgroup in the Biopsy Validation Cohort
Following this, univariate survival analysis was also performed on the 248 QC pass samples to determine the performance of the 70-gene signature at predicting time to metastatic progression, at either local or distant sites, in the biopsy dataset following radiotherapy. As with biochemical recurrence, the Kaplan-Meier survival curve shows a significant association of the 70-gene signature with metastatic progression in the ‘Met-like’ subgroup (blue) in comparison to the Non Met-like samples (green). This suggests that the patients within the ‘Met-like’ subgroup have an increased risk of developing metastatic disease progression following radical radiotherapy treatment with curative intent (HR=3.50 [1.28-9.56]; p=0.0017).
Collectively, the data for both the resection and biopsy cohorts support the 70-gene signature as a prognostic assay in the field of prostate cancer which could be implemented as a patient stratifier to identify prostate cancer patients from early detection that may be at increased risk of developing more aggressive high-risk disease within 3-5 years of initial treatment.
Performance of the 70-Gene Signature as a Prognostic Tool for Biochemical and Metastatic Recurrence in Comparison to the CAPRA Scoring System
The CAPRA and CAPRA-S scoring systems for prostate cancer are multivariate prognostic tools which have been developed to predict risk of disease recurrence using pre-operative biopsy material (CAPRA) and post-operative resected material (CAPRA-S). The scoring systems provide outcome predictions across a range of risk levels and are calculated on a points system taking into account PSA levels, patient age, Gleason grade and clinical T-stage, whereby the higher the cumulative points the greater the risk of disease recurrence (Cooperberg et al 2005). CAPRA-S, used to assess risk post-surgery, also includes scoring for additional clinical factors including seminal vesicle invasion (SVI), extracapsular extension (ECE), lymph node invasion (LNI) and surgical margins. The only additional factor utilised in the CAPRA scoring system for biopsy material is the percentage of positive cores (above or below 34%). Firstly, we investigated the prognostic performance of the novel 70-gene signature in comparison to the CAPRA-S scoring system. In multivariate analysis, only the CAPRA-S score was significantly associated with biochemical recurrence (HR=1.36 [1.28-1.45], p<0.0001); however, both the metastatic assay and the CAPRA-S score were significantly associated with the development of metastatic disease (HR=2.53 [1.40-4.60], p=0.0024 and HR=1.43 [1.28-1.61], p<0.0001, respectively) (Table 32a and 32b). These data indicate that the metastatic signature provided additional information to the CAPRA-S scoring system.
Finally, we also interrogated the prognostic performance of our 70-gene signature in comparison to the CAPRA scoring system. Only the 70-gene signature was significantly associated with prognostic outcome, identifying the high-risk ‘Met-like’ subgroup at increased chance of developing biochemical recurrence in the biopsy dataset (HR=2.05 [1.18-3.59]; p=0.0119), whilst the CAPRA score showed no significance independent of the prognostic assay (Table 33a). Similarly, in the biopsy validation cohort, only the 70-gene signature was significantly associated with prognostic outcome, identifying the high-risk ‘Met-like’ subgroup at increased chance of developing metastatic disease progression (HR=3.39 [1.44-7.97]; p=0.0054) (Table 33b). In sum, the 70-gene signature showed better performance than the CAPRA scoring system in biopsy material, providing further evidence for the use of the 70-gene signature as a prognostic assay within the field of prostate cancer.
Approximately 35% of primary localised prostate cancers progress to a more aggressive and recurrent disease state despite radical treatment such as surgery or external beam radiotherapy, whilst a large number of primary cancers will not progress to clinically significant disease. With this in mind, a major clinical question within the field is how to easily distinguish these subgroups of patients to allow patient stratification, which could ultimately determine which patients may require further and more intense treatment regimens and which patients could avoid toxic, less well tolerated therapies if unnecessary. It is thought that a potential approach to stratification is the development of compound prognostic factors, which are based either on a combination of single prognosticators and their associations or, alternatively, on gene expression profiles from DNA-microarray profiling (Buhmeida et al 2006).
Utilising this approach, Almac Diagnostics have developed and validated a 70-gene signature as a potential prognostic assay which could promote the identification of a high-risk prostate cancer population at increased risk of developing more aggressive disease, either biochemical or metastatic recurrence. The data within this specification strongly support the performance of the prostate prognostic assay in both resection and biopsy material. In two independent clinical validation cohorts of primary prostate resections and biopsies, the 70-gene signature can accurately identify a subgroup of patients with a ‘Met-like’ biology and a greater risk of biochemical disease relapse or metastatic disease within 3-5 years of follow up. The subgroup of patients with a ‘Met-like’ biology are considered the population who should receive additional treatment post-surgery, such as adjuvant hormone therapy, radiotherapy or treatment with taxanes. Conversely, the patients identified within the Non Met-like subgroup should be spared from further treatment and monitored throughout standard clinical follow-up. It is evident this prognostic assay has two clear clinical utilities:
Predicting a subset of a defined prostate cancer cohort from resection material who may progress with high-risk disease (either biochemical recurrence or metastatic progression) following radical prostatectomy surgery with curative intent.
Predicting a subset of a defined prostate cancer cohort from biopsy material who may progress with high-risk disease (either biochemical recurrence or metastatic progression) following radical radiotherapy with curative intent.
Table Legends
Table 28—Summary of demographic, clinical and pathological variables considered for analysis of the internal resection cohort. Table outlines total number of patients, the median and range of age at surgery (years), time to recurrence (months), pre-operative PSA levels (ng/ml) and the number (%) of patients from each of the four clinical sites, within each recurrence subgroup, associated with each of the representative Gleason grades, within each pathological T-stage subgroup, with lymph node invasion (LNI), seminal vesicle invasion (SVI), extracapsular extension (ECE) and patients with negative, diffuse or focal surgical margins.
Table 29—Summary of demographic, clinical and pathological variables considered for analysis of the FASTMAN biopsy cohort. Table outlines total number of patients, the median and range of age at diagnosis (years), time to recurrence (months), PSA levels at diagnosis (ng/ml) and the number (%) of patients, within each recurrence subgroup, associated with each of the representative Gleason grades and within each pathological T-stage subgroup.
Table 30—Genes, weightings and bias of the 70-gene signature.
Table 31—A) Multivariate analysis of the 70-gene signature in the internal resection cohort for biochemical recurrence, demonstrating assay performance independent of other prognostic clinical factors including age at surgery, PSA levels and combined Gleason score. P-values, hazard ratios (HR) and 95% confidence intervals (CI) of the HR are outlined within the table. P-values highlighted in red indicate statistical significance. B) Multivariate analysis of the 70-gene signature in the internal resection cohort for metastatic disease progression, demonstrating assay performance independent of other prognostic clinical factors including age at surgery, PSA levels and combined Gleason score. P-values, hazard ratios (HR) and 95% confidence intervals (CI) of the HR are outlined within the table. P-values highlighted in red indicate statistical significance.
Table 32—A) Multivariate analysis of the 70-gene signature in the FASTMAN biopsy cohort for biochemical recurrence, demonstrating assay performance independent of other prognostic clinical factors including age at diagnosis, PSA levels and combined Gleason score. P-values, hazard ratios (HR) and 95% confidence intervals (CI) of the HR are outlined within the table. P-values highlighted in red indicate statistical significance. B) Multivariate analysis of the 70-gene signature in the FASTMAN biopsy cohort for metastatic disease progression, demonstrating assay performance independent of other prognostic clinical factors including age at diagnosis, PSA levels and combined Gleason score. P-values, hazard ratios (HR) and 95% confidence intervals (CI) of the HR are outlined within the table. P-values highlighted in red indicate statistical significance.
Table 33—A) Covariate analysis of the 70-gene signature in comparison to the CAPRA-S scoring system within the internal resection cohort for biochemical recurrence, demonstrating assay performance against alternative prognostic scoring assays. P-values, hazard ratios (HR) and 95% confidence intervals (CI) of the HR are outlined for each comparison within the table. P-values highlighted in red indicate statistical significance. B) Covariate analysis of the 70-gene signature in comparison to the CAPRA-S scoring system within the internal resection cohort for metastatic disease progression, demonstrating assay performance against alternative prognostic scoring assays. P-values, hazard ratios (HR) and 95% confidence intervals (CI) of the HR are outlined for each comparison within the table. P-values highlighted in red indicate statistical significance.
Table 34—A) Covariate analysis of the 70-gene signature in comparison to the CAPRA scoring system within the FASTMAN biopsy cohort for biochemical recurrence, demonstrating assay performance against alternative prognostic scoring assays. P-values, hazard ratios (HR) and 95% confidence intervals (CI) of the HR are outlined for each comparison within the table. P-values highlighted in red indicate statistical significance. B) Covariate analysis of the 70-gene signature in comparison to the CAPRA scoring system within the FASTMAN biopsy cohort for metastatic disease progression, demonstrating assay performance against alternative prognostic scoring assays. P-values, hazard ratios (HR) and 95% confidence intervals (CI) of the HR are outlined for each comparison within the table. P-values highlighted in red indicate statistical significance.
Samples:
Methods:
Core Gene Analysis
The purpose of evaluating the core gene set of the signature is to determine a ranking for the Entrez genes based upon their impact on performance when removed from the signature.
This analysis involved 10,000 random samplings of 10 signature Entrez genes from the original 70 signature Entrez gene set. At each iteration, 10 randomly selected signature Entrez genes were removed and the performance of the remaining 60 genes was evaluated against the endpoint to determine the impact on HR (Hazard Ratio) performance when these 10 Entrez genes were removed, in the following 2 datasets:
FASTMAN Biopsy Validation was evaluated using the biochemical recurrence (BCR) endpoint and Internal Resection Validation was evaluated using the metastatic recurrence (MET) endpoint. Within each of the 2 datasets, the signature Entrez genes were weighted according to the change in HR performance (Delta HR) resulting from their inclusion or exclusion. Entrez genes ranked ‘1’ have the most negative impact on performance when removed and those ranked ‘70’ have the least impact on performance when removed.
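The random-removal ranking can be sketched as below. This is an illustration under stated assumptions, not the procedure as actually implemented: the `evaluate` callable is a hypothetical stand-in for computing HR performance of a reduced signature in a dataset, and averaging the change in performance over the iterations in which a gene was removed is an assumed way of aggregating Delta HR. The toy gene names and contributions are invented.

```python
import random


def rank_core_genes(genes, evaluate, n_remove=10, n_iter=10000, seed=0):
    """Rank genes by the mean performance drop observed when they are removed.

    evaluate : hypothetical callable mapping a list of kept genes to a
               performance value (e.g. HR of the reduced signature).
    Rank 1 is the gene whose removal hurts performance most.
    """
    genes = list(genes)
    rng = random.Random(seed)
    baseline = evaluate(genes)
    delta_sum = {g: 0.0 for g in genes}
    times_removed = {g: 0 for g in genes}
    for _ in range(n_iter):
        removed = set(rng.sample(genes, n_remove))
        kept = [g for g in genes if g not in removed]
        delta = baseline - evaluate(kept)  # positive means performance fell
        for g in removed:
            delta_sum[g] += delta
            times_removed[g] += 1
    # ASSUMPTION: aggregate by mean delta over iterations where g was removed.
    mean_delta = {g: delta_sum[g] / times_removed[g]
                  for g in genes if times_removed[g] > 0}
    ordered = sorted(mean_delta, key=mean_delta.get, reverse=True)
    return {g: rank for rank, g in enumerate(ordered, start=1)}


# Toy example: six hypothetical genes with additive contributions.
contributions = {"G1": 60, "G2": 50, "G3": 40, "G4": 30, "G5": 20, "G6": 10}


def toy_evaluate(kept):
    return sum(contributions[g] for g in kept)


ranks = rank_core_genes(contributions, toy_evaluate,
                        n_remove=2, n_iter=5000, seed=1)
```

Because several genes are removed together, each gene's estimate is confounded by its randomly chosen partners; a large number of iterations averages this out.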
Minimum Gene Analysis
The purpose of evaluating the minimum number of Entrez genes is to determine if significant performance can be achieved within smaller subsets of the original signature.
This analysis involved 10,000 random samplings of the 70 signature Entrez genes starting at 1 Entrez gene/feature, up to a maximum of 30 Entrez genes/features. For each randomly selected feature length, the signature was redeveloped using the PLS machine learning method under CV and model parameters derived. At each feature length, all randomly selected signatures were applied to calculate signature scores for the following 2 datasets:
Continuous signature scores were evaluated with outcome to determine the HR effect; FASTMAN Biopsy Validation was evaluated with BCR and Internal Resection Validation was evaluated with MET. The HR for all random signatures at each feature length was summarized and figures generated to visualize the performance over CV.
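The subset-size sweep can be sketched in the same spirit. Here `fit_and_evaluate` is a hypothetical callable standing in for redeveloping the PLS model on the selected genes and computing the HR of the resulting scores; drawing `n_iter` random subsets at every feature length and summarising by the median are both assumptions. The toy genes and their contributions are invented.

```python
import random
from statistics import median


def minimum_gene_analysis(genes, fit_and_evaluate,
                          max_len=30, n_iter=10000, seed=0):
    """Median performance of random gene subsets at each feature length.

    fit_and_evaluate : hypothetical callable mapping a list of selected
                       genes to a performance value (e.g. HR).
    """
    genes = list(genes)
    rng = random.Random(seed)
    summary = {}
    for length in range(1, max_len + 1):
        results = [fit_and_evaluate(rng.sample(genes, length))
                   for _ in range(n_iter)]
        summary[length] = median(results)
    return summary


# Toy example: ten hypothetical genes, gene Gi contributes i.
toy_contrib = {f"G{i}": i for i in range(1, 11)}


def toy_fit_and_evaluate(subset):
    return sum(toy_contrib[g] for g in subset)


summary = minimum_gene_analysis(toy_contrib, toy_fit_and_evaluate,
                                max_len=10, n_iter=200, seed=1)
```

Plotting `summary` against feature length would give the kind of performance-versus-size view described above.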
Results
Core Gene Analysis
The results for the core gene analysis of the 70 gene signature in the 2 datasets are provided in this section.
The ranks assigned to the signature Entrez genes based on the combined core set analysis are summarized in Table 38 below:
Minimum Gene Analysis
The results for the minimum gene analysis of the 70 gene signature in the 2 datasets are provided in this section.
The present invention is not to be limited in scope by the specific embodiments described herein. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description and accompanying figures. Such modifications are intended to fall within the scope of the appended claims. Moreover, all embodiments described herein are considered to be broadly applicable and combinable with any and all other consistent embodiments, as appropriate.
Various publications are cited herein, the disclosures of which are incorporated by reference in their entireties.
Number | Date | Country | Kind |
---|---|---|---|
1510684.2 | Jun 2015 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2016/051825 | 6/17/2016 | WO | 00 |