The present invention relates to the technical field of hepatocellular carcinoma (HCC) management, and more precisely to the prognosis of HCC aggressiveness and associated therapeutic decisions. The invention provides a new method for the prognosis of HCC aggressiveness, based on the in vitro determination and analysis of an expression profile comprising genes TAF9, RAMP3, HN1, KRT19, and RAN. The invention also provides kits for the prognosis of HCC aggressiveness, and methods of treatment of HCC in a subject based on a preliminary prognosis of said subject's HCC aggressiveness.
Hepatocellular tumors are composed of a heterogeneous group of tumors, including malignant (hepatocellular carcinoma or HCC) and benign (hepatocellular adenoma or HCA, focal nodular hyperplasia or FNH, and regenerative macronodule) tumors.
HCC constitutes a major health problem in Asia and Africa, mainly explained by the high rate of chronic hepatitis B infection, but its incidence is also rising constantly in Western countries, where more than 90% of HCC develop on cirrhosis. In Western countries, the main causes of the underlying liver disease are chronic hepatitis B and C and alcohol consumption. Non-alcoholic steatohepatitis, as a consequence of metabolic syndrome, is also an increasing cause of chronic liver disease and HCC. More rarely (around 10% of cases), HCC develops on a non-cirrhotic liver.
Surgical resection represents an important curative treatment of HCC but is impaired by a high rate of recurrence (50% to 70% at 5 years) and tumor-related death (30% to 50% at 5 years) (Ishizawa T Gastroenterology 2008).
There is thus a need for simple tools to predict HCC patients' overall survival and early tumor recurrence.
Indeed, depending on the aggressiveness of the HCC of the patient, said patient's clinical management should be different:
In this setting, a simple prognosis tool based on molecular profiling of a subject's liver sample would be very helpful.
Some genes such as EPCAM (Yamashita T, et al. 2008; Lee J S, et al. 2006) and KRT19 (Lee J S, et al. 2006; Durnez A, et al, 2006) have been associated with HCC prognosis.
Early recurrence, defined by tumor recurrence within the 2 years following surgery, is mainly related to tumor biology (Imamura H J hepatol 2003). The inventors have previously described a molecular classification of HCC into 6 subgroups (G1-G6) and have shown that HCC of the G3 subgroup have a poor prognosis (Boyault S Hepatology 2007; Villanueva A Gastroenterology 2011; WO2007/063118A1). Other molecular signatures of HCC recurrence and related death have been published, but few of them have been externally validated (Villanueva A, clinical cancer res 2010). One of the validated molecular prognostic classifications is the G3 signature, which has previously been validated in paraffin-embedded tissues (Boyault S Hepatology 2007, Villanueva A, gastroenterology 2011). In addition, several signatures for prognosis of survival without relapse (a good prognosis being associated with no relapse during the first 4 post-operative years; a bad prognosis being associated with relapse during the first 2 post-operative years) have also been described in WO2007/063118A1.
In contrast, late recurrence, defined by tumor recurrence 3 years or more after surgery, is mainly related to the features of the surrounding non-tumor tissue (“carcinogenic field effect”). A molecular signature of 196 genes derived from non-tumor liver samples is associated with late recurrence and overall survival, and can be considered as a surrogate marker of the severity and of the carcinogenic potential of the underlying cirrhosis (Hoshida Y, NEJM, 2008). In addition, several signatures for prognosis of global survival (with or without relapse) at 5 years have also been described in WO2007/063118A1.
While the above prior art tools are useful for prognosis of HCC aggressiveness, there is still a need for a validated and more powerful tumor molecular signature, in order to predict overall survival and early recurrence of resected HCC.
In particular, in view of the distinct therapeutic managements selected depending on the prognosis, it is crucial that the method of prognosis used for taking this type of therapeutic decision be highly sensitive and specific, and show high positive predictive value (PPV), negative predictive value (NPV) and accuracy (as measured by the area under the ROC curve or AUC).
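The performance measures named above can be illustrated with a minimal sketch. It assumes that "positive" means a bad prognosis call, and all counts and scores below are hypothetical; the AUC is computed as the fraction of (positive, negative) pairs that the score ranks correctly, with ties counted as one half.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "ppv": tp / (tp + fp),          # actual positives among positive calls
        "npv": tn / (tn + fn),          # actual negatives among negative calls
    }


def auc(scores_pos, scores_neg) -> float:
    """Area under the ROC curve via pairwise ranking (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and scores, for illustration only.
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

PPV and NPV depend on the proportion of bad prognosis cases in the cohort, so values obtained on one cohort may not transfer to another.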
In addition, it would be very useful for clinicians if a unique molecular signature was able to predict both overall survival and early recurrence. In this respect, we note that the prognosis tools described in the prior art are always different for the prognosis of overall survival and of early recurrence. Notably, the best predictors of global survival (i.e. overall survival) and of survival without relapse (which also predicts early recurrence) disclosed in WO2007/063118A1 are different, which is not practical for clinicians.
In addition, many studies trying to identify molecular signatures of HCC prognosis are based on cohorts of patients with specific etiologies (such as HBV- or HCV-related HCC, see Nault J C, semliver dis 2011, Woo H G gastroenterology 2011, Hsu H C Am J pathol 2000), and the general applicability of molecular signatures identified on such cohorts may be questioned and in any case needs further validation in patients with other HCC etiologies.
There is thus still a need for a simple and highly reliable prognosis tool, which would permit the prediction of both overall survival and early recurrence and would show high sensitivity, specificity, PPV, NPV and accuracy.
Based on a new strategy of analysis of microarray data obtained from various HCC samples, the inventors have constructed a simple and reliable molecular prognosis tool that fulfills the above criteria:
The present invention thus relates to a method of in vitro prognosis of global survival and/or survival without relapse in a subject suffering from HCC from a liver sample of said subject, comprising:
By “subject”, it is meant any human subject, regardless of sex or age. The subject is affected with HCC, and has preferably been subjected to a surgical liver tumor resection.
According to the invention, a “prognosis” of HCC evolution means a prediction of the future evolution of a particular HCC tumor relative to the patient suffering from this particular HCC tumor. The method according to the invention allows simultaneously for both a global survival prognosis and a survival without relapse prognosis.
By “global survival prognosis” is meant prognosis of survival, with or without relapse. As stated before, the main current treatment against HCC is tumor surgical resection. As a result, a “bad global survival prognosis” is defined as the occurrence of death within the 3 years after liver resection, whereas a “good global survival prognosis” is defined as the lack of death during the 5 post-operative years.
By “survival without relapse prognosis” is meant prognosis of survival in the absence of any relapse or recurrence. A “bad survival without relapse prognosis” is defined as the presence of tumor-relapse within the two years after liver resection, whereas a “good survival without relapse prognosis” is defined as the lack of relapse during the 4 post-operative years. By “relapse” or “recurrence”, it is meant the growing back of HCC in the same subject, after initial treatment, generally by tumor surgical resection.
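As a minimal sketch, the two pairs of definitions above can be turned into labels for follow-up data. The function names, the use of months as the time unit, and the inclusive boundaries are assumptions made for illustration; cases that match neither definition (for example a death between 3 and 5 years, or follow-up that ends too early) are left unlabeled.

```python
from typing import Optional


def prognosis_label(event: bool, months: float,
                    bad_within: float, good_beyond: float) -> Optional[str]:
    """Label a case from time to event (or to last follow-up), in months.

    event       -- True if the event (death or relapse) was observed
    months      -- months from resection to the event or to last follow-up
    bad_within  -- the event within this window means "bad"
    good_beyond -- no event before this time means "good"
    """
    if event and months <= bad_within:
        return "bad"
    if months >= good_beyond:
        # Either no event was observed, or it occurred after the window.
        return "good"
    return None  # neither definition applies


def global_survival_label(died: bool, months: float) -> Optional[str]:
    # Bad: death within 3 years; good: no death during 5 years.
    return prognosis_label(died, months, bad_within=36, good_beyond=60)


def relapse_free_label(relapsed: bool, months: float) -> Optional[str]:
    # Bad: relapse within 2 years; good: no relapse during 4 years.
    return prognosis_label(relapsed, months, bad_within=24, good_beyond=48)
```

Only cases that receive a "good" or "bad" label would serve as reference samples under this reading.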
In the above methods according to the invention, reference samples are used in order to calibrate an algorithm, which may then be used to prognose global survival and/or survival without relapse. In advantageous embodiments of the methods of the invention, reference samples used for calibrating the algorithm(s) used for prognosing global survival and survival without relapse are the following:
In the methods according to the invention, liver samples are analyzed. By “liver sample”, it is meant any sample obtained by taking part of the liver of a subject. By “HCC liver sample”, it is meant a liver sample from a subject affected with HCC. Such liver samples may notably be a liver biopsy or a partial or whole liver tumor surgical resection. Reference samples used for calibrating the algorithm are also liver samples, preferably of the same type as those analyzed.
The above methods according to the invention are based on the in vitro determination of a particular expression profile comprising or consisting of 5 specific genes. Information concerning those 5 genes is provided in Table 1 below:
In the above method according to the invention, prognosis of global survival and/or survival without relapse is made based on an expression profile comprising or consisting of 5 specific genes, and optionally one or more internal control genes, or Equivalent Expression Profiles thereof. By “expression profile”, it is meant the expression levels of the group of genes included in the expression profile. By “comprising”, it is intended to mean that the expression profile may further comprise other genes. In contrast, by “consisting of”, it is intended to mean that no further gene is present in the expression profile analyzed. By “Equivalent Expression Profile thereof” or “EEP”, it is intended to mean the original expression profile (to which said EEP is equivalent), wherein the addition, deletion or substitution of some of the genes (preferably at most 1 or 2 genes) does not change significantly the reliability of the diagnosis.
In a preferred embodiment, Equivalent Expression Profiles include expression profiles in which one of the genes of a selected gene combination is replaced by an equivalent gene. In the present description, a first gene (“gene A”) can be considered as equivalent to a second gene (“gene B”) when replacing “gene A” in the expression profile by “gene B” does not significantly impact the performance of the test. This is typically the case when “gene A” is correlated to “gene B”, meaning that the expression level of “gene A” is statistically correlated to the expression level of “gene B”, as determined by a measure such as Pearson's correlation coefficient. The correlation may be positive (meaning that when “gene A” is upregulated in a patient, then “gene B” is also upregulated in that same patient) or negative (meaning that when “gene A” is upregulated in a patient, then “gene B” is downregulated in that same patient). A maximum of 10 genes among the 103 genes analyzed by the inventors using quantitative PCR, which are the best correlated to each of the 5 genes necessary for prognosis, and which have an average Pearson's correlation coefficient ≥0.3 or ≤−0.3, are mentioned in Table 1 above.
By “determining an expression profile”, it is meant the measure of the expression level of a group of selected genes. The expression level of each gene may be determined in vitro either at the proteic or at the nucleic level, using any technology known in the art. For instance, at the proteic level, the in vitro measure of the expression level of a particular protein may be performed by any dosage method known by a person skilled in the art, including but not limited to ELISA or mass spectrometry analysis. These technologies are easily adapted to any liver sample. Indeed, proteins of the liver sample may be extracted using various technologies well known to those skilled in the art for ELISA or mass spectrometry in solution measure. Alternatively, the expression level of a protein in a liver sample may be analyzed using mass spectrometry directly on the tissue slice.
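The correlation criterion can be sketched as a simple screen. This is only an illustration: the function names and the sample values are hypothetical, and passing the screen does not by itself establish equivalence, which also requires that the substitution leave the performance of the test essentially unchanged.

```python
import math


def pearson_r(x, y) -> float:
    """Pearson's correlation coefficient between two expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def is_candidate_equivalent(x, y, cutoff: float = 0.3) -> bool:
    """True if the correlation is >= cutoff or <= -cutoff."""
    return abs(pearson_r(x, y)) >= cutoff


# Hypothetical expression values of three genes across four patients.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [8.0, 6.0, 4.0, 2.0]    # perfectly negatively correlated with gene_a
gene_c = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with gene_a
```

Taking the absolute value treats positive and negative correlations alike, in line with the two cases described above.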
In a preferred embodiment of a method according to the invention, the expression profile is determined in vitro at the nucleic level. At the nucleic level, the in vitro measure of the expression level of a gene may be carried out either directly on messenger RNA (mRNA), or on retrotranscribed complementary DNA (cDNA). Any method to measure the expression level may be used, including but not limited to microarray analysis, quantitative PCR, and Southern analysis.
In a preferred embodiment of a method according to the invention, the expression profile is determined in vitro using a nucleic acid microarray, in particular an oligonucleotide microarray. In another preferred embodiment of a method according to the invention, the expression profile is determined in vitro using quantitative PCR. In any case, the expression level of any gene is preferably normalized. There are many methods for normalizing obtained expression data, depending on the technology used for measuring expression. Such methods are well known to those skilled in the art. In some embodiments, normalization may be performed in comparison to the expression level of an internal control gene, generally a housekeeping gene, including but not limited to ribosomal RNA (such as for instance 18S ribosomal RNA) or genes such as HPRT1 (hypoxanthine phosphoribosyltransferase 1), UBC (ubiquitin C), YWHAZ (tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide), B2M (beta-2-microglobulin), GAPDH (glyceraldehyde-3-phosphate dehydrogenase), FPGS (folylpolyglutamate synthase), DECR1 (2,4-dienoyl CoA reductase 1, mitochondrial), PPIB (peptidylprolyl isomerase B (cyclophilin B)), ACTB (actin β), PSMB2 (proteasome (prosome, macropain) subunit, beta type, 2), GPS1 (G protein pathway suppressor 1), CANX (calnexin), NACA (nascent polypeptide-associated complex alpha subunit), TAX1BP1 (Tax1 (human T-cell leukemia virus type I) binding protein 1), and PSMD2 (proteasome (prosome, macropain) 26S subunit, non-ATPase, 2).
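For quantitative PCR, normalization against an internal control gene is commonly expressed as a ΔCt value, and a further comparison to a calibrator sample as a ΔΔCt value. The sketch below follows that common convention; the function names, the choice of averaging several control genes, and the Ct values are assumptions for illustration and are not taken from the description.

```python
from statistics import mean


def delta_ct(ct_target: float, ct_controls: list) -> float:
    """ΔCt: Ct of the target gene minus the mean Ct of the control gene(s)."""
    return ct_target - mean(ct_controls)


def delta_delta_ct(dct_sample: float, dct_calibrator: float) -> float:
    """ΔΔCt: ΔCt of the test sample minus ΔCt of a calibrator sample."""
    return dct_sample - dct_calibrator


# Hypothetical cycle thresholds: one target gene, two control genes.
dct = delta_ct(25.0, [18.0, 20.0])   # 25.0 - 19.0
ddct = delta_delta_ct(dct, 4.5)      # calibrator ΔCt of 4.5 is made up
```

Because a higher Ct means lower expression, the sign of these values is sometimes inverted (−ΔCt, −ΔΔCt) so that larger numbers indicate higher expression.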
In the context of the present invention, “expression values” (also referred to as “expression levels”) of genes used for the prognosis include both:
These technologies are also easily adapted to any liver sample. Indeed, several well-known technologies are available to those skilled in the art for extracting mRNA from a tissue sample and retrotranscribing mRNA into cDNA.
Many algorithms may be used for prognosing global survival and/or survival without relapse based on the expression profile determined in vitro. In particular, the algorithm may be selected from PLS (Partial Least Square) regression, Support Vector Machines (SVM), linear regression or derivatives thereof (such as the generalized linear model abbreviated as GLM, including logistic regression), Linear Discriminant Analysis (LDA, including Diagonal Linear Discriminant Analysis (DLDA)), Diagonal quadratic discriminant analysis (DQDA), Random Forests, k-NN (Nearest Neighbour) or PAM (Predictive Analysis of Microarrays) algorithms. Cox models may also be used. Centroid models using various types of distances may also be used.
A group of reference samples, which is generally referred to as training data, is used to select an optimal statistical algorithm that best separates good from bad prognosis (like a decision rule). The best separation is usually the one that misclassifies as few samples as possible and that has the best chance to perform comparably well on a different dataset.
For a binary outcome such as good/bad prognosis, linear regression or a generalized linear model (abbreviated as GLM), including logistic regression, may be used.
Linear regression is based on the determination of a linear regression function, whose general formula may be represented as:
$$f(x_1, \ldots, x_N) = \beta_0 + \beta_1 x_1 + \ldots + \beta_N x_N.$$
Other representations of linear regression functions may be used (see below). Logistic regression is based on the determination of a logistic regression function:
$$f(z) = \frac{1}{1 + e^{-z}}$$
in which z is usually defined as
$$z = \beta_0 + \beta_1 x_1 + \ldots + \beta_N x_N.$$
In the above linear or logistic regression functions, x1 to xN are the expression values (or derivatives thereof such as ΔCt, −ΔCt, ΔΔCt, or −ΔΔCt for quantitative PCR or logged values for microarray) of the N genes in the signature, β0 is the intercept, and β1 to βN are the regression coefficients.
The values of the intercept and of the regression coefficients are determined based on a group of reference samples (“training data”). The value of the linear or logistic regression function then defines the probability that a test expression profile has a good or bad prognosis (when defining the linear or logistic regression function based on training data, the user decides whether the probability is a probability of good or of bad prognosis). A test expression profile is then classified as having a good or bad prognosis depending on whether the probability that it has a good or bad prognosis is below or above a particular threshold value, which is also determined based on training data. Sometimes, two threshold values are used, defining an undetermined area. Types of generalized linear models other than logistic regression may also be used.
Alternative methods such as nearest neighbour (abbreviated as k-NN) are also commonly used to classify a new sample, based on whether the sample is closer to the group of good prognosis or to the group of bad prognosis. The notion of “closer” is based on a choice of distance (metric, such as but not limited to Euclidean distance) in the N-dimensional space defined by a signature consisting of N genes useful for prognosis (thus excluding potential housekeeping genes used for normalization purposes). The distances between a test expression profile and all reference good or bad prognosis expression profiles are calculated and the sample is classified by analysis of the k closest reference samples (k being a positive integer of at least 1 and most commonly 3 or 5), a rule of classification being pre-established depending on the number of good or bad prognosis reference expression profiles among the k closest reference expression profiles. For instance, when k is 1, a test expression profile is classified as good prognosis if the closest reference expression profile is a good prognosis expression profile, and as bad prognosis if the closest reference expression profile is a bad prognosis expression profile. When k is 2, a test expression profile is classified as good prognosis if the two closest reference expression profiles are good prognosis expression profiles, as bad prognosis if the two closest reference expression profiles are bad prognosis expression profiles, and as undetermined if the two closest reference expression profiles include a good prognosis and a bad prognosis reference expression profile. When k is 3, a test expression profile is classified as good prognosis if at least two of the three closest reference expression profiles are good prognosis expression profiles, and as bad prognosis if at least two of the three closest reference expression profiles are bad prognosis expression profiles.
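The classification step can be sketched as follows, assuming the standard logistic function 1/(1+e^(−z)) and assuming that the modeled probability is that of a bad prognosis. The intercept, coefficients, and thresholds are hypothetical placeholders; in practice they would be estimated from reference samples.

```python
import math


def linear_predictor(x, beta0: float, betas) -> float:
    """z = beta0 + beta1*x1 + ... + betaN*xN."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


def logistic(z: float) -> float:
    """Standard logistic function, mapping z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(x, beta0: float, betas, low: float, high: float):
    """Classify with two thresholds; between them the call is undetermined."""
    p_bad = logistic(linear_predictor(x, beta0, betas))
    if p_bad >= high:
        return "bad", p_bad
    if p_bad <= low:
        return "good", p_bad
    return "undetermined", p_bad


# Hypothetical two-gene model, for illustration only.
BETA0, BETAS = 0.0, [1.0, -1.0]
label_a, p_a = classify([2.0, 0.0], BETA0, BETAS, low=0.4, high=0.6)
label_b, p_b = classify([0.0, 2.0], BETA0, BETAS, low=0.4, high=0.6)
label_c, p_c = classify([0.0, 0.0], BETA0, BETAS, low=0.4, high=0.6)
```

Setting `low` equal to `high` reduces the rule to the single-threshold case.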
More generally, when k is p, a test expression profile is classified as good prognosis if more than half of the p closest reference expression profiles are good prognosis expression profiles, and as bad prognosis if more than half of the p closest reference expression profiles are bad prognosis expression profiles. If the numbers of good prognosis and bad prognosis reference expression profiles are equal, then the test expression profile is classified as undetermined.
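The majority rule described above, including the undetermined case, can be sketched with Euclidean distance as follows. The function name and the reference profiles are hypothetical, and any other metric could be substituted for `math.dist`.

```python
import math
from collections import Counter


def knn_prognosis(test_profile, references, k: int = 3) -> str:
    """Classify a test profile from its k closest reference profiles.

    references -- list of (profile, label) pairs, label in {"good", "bad"}
    """
    # Sort reference profiles by Euclidean distance to the test profile.
    nearest = sorted(references,
                     key=lambda ref: math.dist(test_profile, ref[0]))[:k]
    votes = Counter(label for _, label in nearest)
    if votes["good"] > votes["bad"]:
        return "good"
    if votes["bad"] > votes["good"]:
        return "bad"
    return "undetermined"  # equal numbers of good and bad neighbours


# Hypothetical two-gene reference profiles.
refs = [([0.0, 0.0], "good"), ([0.0, 1.0], "good"),
        ([5.0, 5.0], "bad"), ([5.0, 6.0], "bad")]
tie_refs = [([1.0, 2.0], "good"), ([3.0, 2.0], "bad"), ([10.0, 10.0], "bad")]
```

With an odd k and two classes, the undetermined outcome cannot occur, which is one reason values such as 3 or 5 are common.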
Other methodologies from the field of statistics, mathematics or engineering exist, for example but not limited to decision trees, Support Vector Machines (SVM), Neural Networks and Linear Discriminant Analyses (LDA). Cox models may also be used. Centroid models using various types of distances may also be used. These approaches are well known to people skilled in the art.
In summary, an algorithm (which may be selected from linear regression or derivatives thereof such as generalized linear models (GLM, including logistic regression), nearest neighbour (k-NN), decision trees, support vector machines (SVM), neural networks, linear discriminant analyses (LDA), Random Forests, or Predictive Analysis of Microarrays (PAM)) is calibrated based on a group of reference samples (preferably including several good prognosis reference expression profiles and several bad prognosis reference expression profiles) and then applied to the test sample. In simple terms, a patient will be classified as good prognosis (or bad prognosis) based on how all the genes in the signature compare to all the genes from a reference profile that was developed from a group of good prognosis (or bad prognosis) reference samples (training data).
The notion of whether individual genes of the expression profile are increased or decreased in a good prognosis versus a bad prognosis sample is of scientific interest. For each individual gene, the gene expression levels in the good prognosis group can be compared to the bad prognosis group by the use of Student's t-test or equivalent methods. However, such binary comparisons are generally not used for prognosis when a signature comprises several distinct genes.
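A per-gene comparison of this kind can be sketched with the pooled-variance (Student) t statistic. The function name and expression values are hypothetical, and the sketch stops at the statistic: turning it into a p-value requires the t distribution with n_a + n_b − 2 degrees of freedom, which is not implemented here.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b) -> float:
    """Two-sample Student t statistic assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance from the two unbiased sample variances.
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(
        pooled * (1 / n_a + 1 / n_b))


# Hypothetical expression values of one gene in each prognosis group.
good_group = [1.0, 2.0, 3.0]
bad_group = [4.0, 5.0, 6.0]
t_stat = student_t(good_group, bad_group)
```

A negative statistic here indicates lower mean expression in the first group than in the second.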
In an advantageous embodiment, the algorithm used for prognosing global survival and/or survival without relapse is linear regression, using the following formula:
wherein:
In a particularly preferred embodiment, the expression profile is determined using quantitative PCR, expression values are ΔΔCt values, N is 5, threshold value T is zero, and mi and 1≤i≤5, have the values displayed in the following Table 2:
The method of prognosis according to the invention as described herein may further comprise
Indeed, the inclusion of further variables independently associated with prognosis may further improve the reliability of the prognosis. Said other variables may notably be selected from G1-G6 classification (as disclosed in WO2007/063118A1, see below), BCLC (Barcelona Clinic Liver Cancer, Llovet, 1999, sem liv dis), CLIP (Cancer of the Liver Italian Program, CLIP investigators Hepatology, 1998), JIS (Japan Integrated Staging, Kudo m, J Gasterol 2003), TNM (Tumour-Node-Metastasis, AJCC cancer staging Handbook, 7th ed Springer) clinical staging, Milan (Mazzaferro v, New England J Medicine 1996) and metroticket calculator (Mazzaferro v, lancet Oncol 2009) criteria, presence of cirrhosis (Hoshida y, NEJM, 2008), preoperative AFP (alpha-fetoprotein) plasma levels (Chevret S J hepatol 1999), Edmondson grade (Edmondson Cancer, 1954), and microvascular invasion of the liver sample (Mazzaferro v, lancet Oncol 2009).
The G1-G6 classification is described below.
BCLC, CLIP, JIS, and TNM clinical stagings, Milan and metroticket calculator criteria, and Edmondson grade are well known to and easily determined by those skilled in the art of HCC diagnosis, prognosis and management for any liver sample based on common general knowledge, as described in publications mentioned above.
When other variables are determined, their values are combined with the expression profile in order to perform a global prognosis based on all variables (expression profile and further variables), using any appropriate algorithm.
In a preferred embodiment, when other variables are determined, said other variables are BCLC clinical staging and microvascular invasion of the liver sample.
In a preferred embodiment, a composite score is determined, based on the values of the other variables (in particular BCLC clinical staging and microvascular invasion) and the expression profile score, calculated as described herein.
An example of a composite score that may be used for prognosis is displayed in
The present invention also relates to a kit comprising reagents for the determination of an expression profile comprising at most 65 distinct genes, wherein said expression profile comprises or consists of the following 5 genes: TAF9, RAMP3, HN1, KRT19, and RAN, and optionally one or more internal control genes, or an Equivalent Expression Profile thereof.
In a preferred embodiment, the kit according to the invention may be dedicated to the determination of one of the above-mentioned expression profiles, and then comprises reagents for the determination of an expression profile comprising at most 10 distinct genes, knowing that the expression profile with the highest number of genes of interest comprises 5 genes, and optionally one or more internal control genes. In another preferred embodiment, the kit according to the invention may further comprise reagents for the determination of other expression profiles of interest, which may be associated with HCC diagnosis and/or HCC classification into subgroups. In this case, the kit comprises reagents for the determination of an expression profile comprising at most 65 distinct genes, in order to be able to determine in vitro the expression levels of the additional expression profiles of interest. In particular, a classification of HCC samples into 6 subgroups G1 to G6 defined by the clinical and genetic main features displayed in following Table 3 has been described in WO2007/063118A1, the content of which relating to such classification is herein incorporated by reference:
This classification is based on the in vitro determination of an expression profile, which advantageously comprises or consists of the following 16 genes: RAB1A, REG3A, NRAS, RAMP3, MERTK, PIR, EPHA1, LAMA3, G0S2, HN1, PAK2, AFP, CYP2C9, CDH2, HAMP, and SAE1, and the method may notably comprise:
Preferably, the expression profile is determined using quantitative PCR, wherein the distance of a sample_i to each subgroup_k is calculated using the following formula:
wherein for each gene_t and subgroup_k, the μ(subgroup_k, gene_t) and σ(gene_t) values are those displayed in following Table 4.
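The formula itself is not reproduced in this text, so the following is only a sketch under a stated assumption: that the distance has the form used in diagonal linear discriminant analysis, namely a sum over genes of squared deviations from a per-subgroup, per-gene centre value, each divided by a per-gene variance. Whether the tabulated per-gene scale value is a variance or a standard deviation cannot be recovered here; the sketch takes variances as input. All names and numbers are hypothetical.

```python
def subgroup_distance(sample, centres, variances) -> float:
    """Assumed DLDA-style distance: sum of (x_t - centre_t)^2 / variance_t."""
    return sum((x - c) ** 2 / v
               for x, c, v in zip(sample, centres, variances))


def nearest_subgroup(sample, subgroup_centres: dict, variances) -> str:
    """Assign the sample to the subgroup with the smallest distance."""
    return min(subgroup_centres,
               key=lambda name: subgroup_distance(
                   sample, subgroup_centres[name], variances))


# Hypothetical two-gene centres and per-gene variances (not the tabulated ones).
example_centres = {"G1": [0.0, 0.0], "G2": [2.0, 2.0]}
example_variances = [1.0, 4.0]
example_sample = [1.5, 1.0]
assigned = nearest_subgroup(example_sample, example_centres, example_variances)
```

Dividing by a per-gene variance keeps highly variable genes from dominating the distance.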
Reagents for the determination of an expression profile comprising N genes may include any reagents permitting specific quantification of the expression levels of the genes included in said expression profile. For instance, when the expression profile is determined at the proteic level, then such reagents may include antibodies specific for each of the genes included in the expression profile. Preferably, the expression is determined at the nucleic level. In this case, reagents in the kit of the invention may notably include primer pairs (forward and reverse primers) and/or probes specific for each of the genes included in the expression profile (useful notably for quantitative PCR determination of the expression profile) or a nucleic acid microarray, in particular an oligonucleotide microarray. In the latter case, the nucleic acid microarray is a dedicated nucleic acid microarray, comprising probes for the detection of a maximum number of genes, as defined in the previous paragraph.
As indicated in the background art section, the prognosis method according to the invention is important for clinicians because it will permit them, based on a unique and simple test, to assess the aggressiveness of the HCC tumor, and thus to adapt the treatment to the prognosis.
The invention thus also relates to a cytotoxic chemotherapeutic agent or a targeted therapeutic agent, for use in the treatment of HCC in a subject that has been given a bad prognosis using the prognosis method of the invention. The invention also relates to the use of a therapeutic cytotoxic chemotherapeutic agent or a targeted therapeutic agent for the preparation of a medicament intended for the treatment of HCC in a subject that has been given a bad global survival and/or survival without relapse prognosis by the prognosis method according to the invention. If the HCC of said subject has been further classified into subgroup G1 as defined above, then an IGFR1 inhibitor or an Akt/mTor inhibitor is preferred as adjuvant therapy. Alternatively, if the HCC of said subject has been further classified into subgroup G2 as defined above, then an Akt/mTor inhibitor is preferred as adjuvant therapy. Alternatively, if the HCC of said subject has been further classified into subgroup G3 as defined above, then a proteasome inhibitor is preferred as adjuvant therapy. Alternatively, if the HCC of said subject has been further classified into subgroup G5 or G6 as defined above, then a WNT inhibitor is preferred as adjuvant therapy. However, current WNT inhibitors have toxicity problems, and there is still a need for more efficient and safer WNT inhibitors. By “cytotoxic chemotherapeutic agent” it is meant any suitable chemical agent useful for killing cancer cells. Cytotoxic chemotherapeutic agents currently used as adjuvant treatment of HCC and preferred in the present invention are doxorubicin, gemcitabine, oxaliplatin, and combinations thereof. Doxorubicin or the association of gemcitabine and oxaliplatin are particularly preferred. By “targeted therapy”, it is intended to mean any suitable agent that selectively inhibits enzymes of a signaling pathway involved in HCC malignant transformation.
Currently, Sorafenib, a small-molecule inhibitor of several tyrosine protein kinases (VEGFR and PDGFR) and Raf kinases (more avidly C-Raf than B-Raf), is approved for the adjuvant treatment of HCC and is preferred in the present invention. Sorafenib is a bi-aryl urea of formula:
The invention also relates to a method for treating a HCC in a subject in need thereof, comprising:
The method of treatment of the invention may further comprise:
The present invention also relates to systems (and computer readable medium for causing computer systems) to perform a method of prognosis according to the invention.
In an embodiment, the invention relates to a system 1 for prognosis of global survival or survival without relapse in a subject from a liver sample of said subject, comprising:
In another embodiment, the invention relates to a computer readable medium 7 having computer readable instructions recorded thereon to define software modules for implementing on a computer steps of a prognosis method according to the invention relating to interpretation of expression profiles data. Preferably, said software modules comprising:
Embodiments of the invention relating to systems and computer-readable media have been described through functional modules, which are defined by computer executable instructions recorded on computer readable media and which cause a computer to perform method steps when executed. The modules have been segregated by function for the sake of clarity. However, it should be understood that the modules need not correspond to discreet blocks of code and the described functions can be carried out by the execution of various code portions stored on various media and executed at various times. Furthermore, it should be appreciated that the modules may perform other functions, thus the modules are not limited to having any particular functions or set of functions.
The computer readable medium can be any available tangible media that can be accessed by a computer. Computer readable medium includes volatile and nonvolatile, removable and non-removable tangible media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer readable medium includes, but is not limited to, RAM (random access memory), ROM (read only memory), EPROM (eraseable programmable read only memory), EEPROM (electrically eraseable programmable read only memory), flash memory or other memory technology, CD-ROM (compact disc read only memory), DVDs (digital versatile disks) or other optical storage media, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage media, other types of volatile and non-volatile memory, and any other tangible medium which can be used to store the desired information and which can accessed by a computer including and any suitable combination of the foregoing.
Computer-readable data embodied on one or more computer-readable media, may define instructions, for example, as part of one or more programs, that, as a result of being executed by a computer, instruct the computer to perform one or more of the functions described herein (e.g., in relation to system 1, or computer readable medium 7), and/or various embodiments, variations and combinations thereof. Such instructions may be written in any of a plurality of programming languages, for example, Java, J#, Visual Basic, C, C#, C++, Fortran, Pascal, Eiffel, Basic, COBOL assembly language, and the like, or any of a variety of combinations thereof. The computer-readable media on which such instructions are embodied may reside on one or more of the components of either system 1, or computer readable medium 6 described herein, may be distributed across one or more of such components, and may be in transition there between.
The computer-readable media may be transportable such that the instructions stored thereon can be loaded onto any computer resource to implement the aspects of the present invention discussed herein. In addition, it should be appreciated that the instructions stored on the computer readable media, or the computer-readable medium, described above, are not limited to instructions embodied as part of an application program running on a host computer. Rather, the instructions may be embodied as any type of computer code (e.g., software or microcode) that can be employed to program a computer to implement aspects of the present invention. The computer executable instructions may be written in a suitable computer language or combination of several languages. Basic computational biology methods are known to those of ordinary skill in the art and are described in, for example, Setubal and Meidanis, Introduction to Computational Biology Methods (PWS Publishing Company, Boston, 1997, ref 38); Salzberg, Searles, Kasif, (Ed.), Computational Methods in Molecular Biology, (Elsevier, Amsterdam, 1998, ref 39); Rashidi and Buehler, Bioinformatics Basics: Application in Biological Science and Medicine (CRC Press, London, 2000, ref 40) and Ouellette and Baxevanis, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins (Wiley & Sons, Inc., 2nd ed., 2001).
The functional modules of certain embodiments of the invention include a determination module 2, a storage device 3, a comparison module 4 and a display module 5. The functional modules can be executed on one, or multiple, computers, or by using one, or multiple, computer networks. The determination module 2 has computer executable instructions to provide expression level information in computer readable form.
As used herein, “expression level information” refers to information about expression level of any nucleotide (RNA or DNA) and/or amino acid sequences, either full-length or partial. In a preferred embodiment, it refers to the level of expression of mRNA or cDNA, measured by various technologies. The information may be qualitative (presence or absence of a transcript) or quantitative. Preferably it is quantitative.
Methods for determining expression level information, i.e. determination modules 2, include systems for protein and DNA/RNA analysis, and in particular those described above for determination of expression profiles at the nucleic or protein level.
The expression level information determined in the determination module can be read by the storage device 3. As used herein the “storage device” 3 is intended to include any suitable computing or processing apparatus or other device configured or adapted for storing data or information. Examples of electronic apparatus suitable for use with the present invention include stand-alone computing apparatus, data telecommunications networks, including local area networks (LAN), wide area networks (WAN), Internet, Intranet, and Extranet, and local and distributed computer processing systems. Storage devices 3 also include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage media, magnetic tape, optical storage media such as CD-ROM, DVD, electronic storage media such as RAM, ROM, EPROM, EEPROM and the like, general hard disks and hybrids of these categories such as magnetic/optical storage media. The storage device 3 is adapted or configured for having recorded thereon expression level information. Such information may be provided in digital form that can be transmitted and read electronically, e.g., via the Internet, on diskette, via USB (universal serial bus) or via any other suitable mode of communication including wireless communication between devices.
As used herein, “stored” refers to a process for encoding information on the storage device 3. Those skilled in the art can readily adopt any of the presently known methods for recording information on known media to generate manufactures comprising the expression level information.
A variety of software programs and formats can be used to store the expression level information on the storage device. Any number of data processor structuring formats (e.g., text file, spreadsheets or database) can be employed to obtain or create a medium having recorded thereon the expression level information.
By providing expression level information in computer-readable form, one can use the expression level information in readable form in the comparison module 4 to compare a specific expression profile with the reference data within the storage device 3. The comparison may notably be done using the various algorithms described above. The comparison made in computer-readable form provides a computer readable comparison result which can be processed by a variety of means. Content based on the comparison result can be retrieved from the comparison module 4 and displayed by the display module 5 to indicate a good or bad prognosis.
Preferably, reference data are expression level profiles that are indicative of all types of liver samples that may be found by a classification method according to the invention. The “comparison module” 4 can use a variety of available software programs and formats for the comparison operative to compare expression level information determined in the determination module 2 to reference data, either directly, or indirectly using any software providing statistical algorithms such as those already described above.
The comparison module 4, or any other module of the invention, may include an operating system (e.g., Windows, Linux, Mac OS or UNIX) on which runs a relational database management system, a World Wide Web application, and a World Wide Web server. The World Wide Web application includes the executable code necessary for generation of database language statements (e.g., Structured Query Language (SQL) statements). Generally, the executables will include embedded SQL statements. In addition, the World Wide Web application may include a configuration file which contains pointers and addresses to the various software entities that comprise the server as well as the various external and internal databases which must be accessed to service user requests. The configuration file also directs requests for server resources to the appropriate hardware, as may be necessary should the server be distributed over two or more separate computers. In one embodiment, the World Wide Web server supports a TCP/IP protocol. Local networks such as this are sometimes referred to as “Intranets.” An advantage of such Intranets is that they allow easy communication with public domain databases residing on the World Wide Web (e.g., the GenBank or Swiss-Prot World Wide Web site). Thus, in a particular preferred embodiment of the present invention, users can directly access data (via Hypertext links for example) residing on Internet databases using an HTML interface provided by Web browsers and Web servers.
The comparison module 4 provides a computer readable comparison result that can be processed in computer readable form by predefined criteria, or criteria defined by a user, to provide a content 6 based in part on the comparison result that may be stored and output as requested by a user using a display module 5. The display module 5 enables display of a content 6 based in part on the comparison result for the user, wherein the content is a signal indicative of a good or bad prognosis. Such a signal can be, for example, a display of content indicative of a good or bad prognosis on a computer monitor, a printed page or printed report of content indicating a good or bad prognosis from a printer, or a light or sound indicative of a good or bad prognosis.
The content 6 based on the comparison result varies depending on the algorithm used for comparison.
For instance, when linear regression or derivatives thereof is used, the content 6 may include a score or probability of having a good or bad prognosis, or both a probability of having a good or bad prognosis and one or more threshold values, or merely a signal indicative of a good or bad prognosis. When nearest neighbor (k-NN) is used, the content 6 may include the number or proportion of good and bad prognosis expression profiles among the k closest profiles, or merely a signal indicative of a good or bad prognosis. Moreover, the content 6 may simply be a continuous or categorical score reported in a numerical, text or graphical way (for example using a color code such as red, orange or green).
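As a purely illustrative sketch of the k-NN case described above, the content 6 could be assembled as follows. The function name `knn_content`, the Euclidean distance, the majority-vote signal and the tie-breaking toward "good" are all assumptions made for this example; the text does not prescribe them.

```python
import math
from collections import Counter


def knn_content(profile, reference, k=3):
    """Build an illustrative 'content' from the k closest reference profiles.

    profile   : list of expression values for the sample to classify
    reference : list of (expression_values, label) pairs, label in {"good", "bad"}
    k         : number of neighbours (hypothetical default)
    """
    # Rank reference profiles by Euclidean distance to the sample (assumed metric).
    ranked = sorted(reference, key=lambda item: math.dist(profile, item[0]))
    nearest = ranked[:k]
    counts = Counter(label for _, label in nearest)
    n_good = counts.get("good", 0)
    n_bad = counts.get("bad", 0)
    # Majority vote; ties fall to "good" here, which is an arbitrary choice.
    signal = "bad" if n_bad > n_good else "good"
    return {
        "k": len(nearest),
        "n_good": n_good,
        "n_bad": n_bad,
        "proportion_bad": n_bad / len(nearest),
        "signal": signal,
    }
```

The returned dictionary carries both the counts among the k closest profiles and the reduced good/bad signal, so a display module could show either form.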
The display module 5 can be any suitable device configured to receive from a computer and display computer readable information to a user. Non-limiting examples include, for example, general-purpose computers such as those based on Intel PENTIUM-type processor, Motorola PowerPC, Sun UltraSPARC, Hewlett-Packard PA-RISC processors, any of a variety of processors available from Advanced Micro Devices (AMD) of Sunnyvale, Calif., or from ARM Holdings, or any other type of processor, visual display devices such as flat panel displays, cathode ray tubes and the like, as well as computer printers of various types or integrated devices such as laptops or tablets, in particular iPads.
In one embodiment, a World Wide Web browser is used for providing a user interface for display of the content 6 based on the comparison result. It should be understood that other modules of the invention can be adapted to have a web browser interface. Through the Web browser, a user may construct requests for retrieving data from the comparison module. Thus, the user will typically point and click to user interface elements such as buttons, pull down menus, scroll bars and the like conventionally employed in graphical user interfaces. The requests so formulated with the user's Web browser are transmitted to a Web application which formats them to produce a query that can be employed to extract the pertinent information.
In one embodiment, the display module 5 displays the comparison result and whether the comparison result is indicative of a good or bad prognosis.
In one embodiment, the content 6 based on the comparison result that is displayed is a signal (e.g. positive or negative signal) indicative of a good or bad prognosis, thus only a positive or negative indication may be displayed.
The present invention therefore provides for systems 1 (and computer readable media 7 for causing computer systems) to perform methods of prognosing global survival and/or survival without relapse in HCC subjects, based on expression profile information from a liver sample of said HCC subject.
System 1, and computer readable medium 7, are merely illustrative embodiments of the invention for performing methods of prognosing global survival and/or survival without relapse in HCC subjects based on expression profiles, and are not intended to limit the scope of the invention. Variations of system 1, and computer readable medium 7, are possible and are intended to fall within the scope of the invention.
The modules of the system 1, or those used in the computer readable medium, may assume numerous configurations. For example, functions may be provided on a single machine or distributed over multiple machines.
Liver samples were systematically frozen following liver resection for tumor in two French University hospitals, in Bordeaux (from 1998 to 2007) and Créteil (from 2003 to 2007). A total of 550 samples were included in this work. The study was approved by the local IRB committee (CCPRB Paris Saint Louis, 1997 and 2004) and all patients gave their informed consent according to French law. The following were excluded: (1) tumors with necrosis>80%, (2) tumors with RNA of poor quality or of insufficient amount, (3) HCC with non-curative resection: R1 or R2 resection or extrahepatic metastasis at the time of surgery, (4) HCC treated by liver transplantation.
Some HCC patients (n=10) died during the month following surgery owing to surgical complications and/or decompensated cirrhosis, and were excluded from the prognostic analysis (see specific flowchart for prognosis in
Accordingly, the following samples were included: 324 HCC, of which 314 qualified for the prognosis analysis; 40 non-hepatocellular tumors; 156 benign hepatocellular tumors, including focal nodular hyperplasia (FNH, n=25), hepatocellular adenoma (HCA, n=111) and regenerative macronodule (with dysplasia, n=15, or without, n=5); and 30 non-tumor samples.
Clinical, histological and molecular data of HCC included in prognosis analysis (n=314) are summarized in Tables 5 and 6 below:
#expressed in months (median, 25th and 75th percentile) and analyzed using Mann Whitney test.
Tumor and non-tumor liver samples were frozen immediately after surgery and stored at −80° C. Tissue samples from the frozen counterpart were also fixed in 10% formaldehyde, paraffin-embedded and stained with Hematoxylin and Eosin and Masson's trichrome. The diagnosis of HCA, HCC, FNH, macroregenerative nodule and all non-hepatocellular tumors was based on established histological criteria (International working party Hepatology 1995, international consensus group Hepatology 2009). All tumors were assessed independently by 2 expert pathologists (JC and PBS) without knowledge of the patient's outcome or initial diagnosis. In case of disagreement regarding the subtype diagnosis of hepatocellular tumors or regarding the pathological features of HCC included in the prognosis analysis, sections were re-examined and a consensus was reached and used for the study. In cases of multiple tumors, the largest nodule available was analysed in the prognostic study.
103 genes were selected for the quantitative RT-PCR analysis. Using Affymetrix HG-U133A GeneChip™ microarray hybridizations performed on the same platform, the mRNA expression of 82 liver samples, including 57 HCC (E-TABM-36), 5 HNF1A-inactivated adenomas (GSE7473), 7 inflammatory adenomas (GSE11819), 4 focal nodular hyperplasias (GSE9536) and 9 non-tumor liver samples including cirrhosis and normal livers (E-TABM-36 and GSE7473), was analyzed. For classification purposes, genes differentially expressed in specific subgroups of tumors were selected according to 3 criteria for inclusion:
A total of 60 genes were selected for further analysis by quantitative PCR.
The inventors also wished to provide a new tool for simple and reliable prognosis of HCC, so that further genes found or already described as associated with HCC prognosis were also included for further quantitative PCR analysis:
A total of 43 genes were selected for their association with HCC prognosis.
RNA extraction and quantitative RT-PCR were performed as previously described. Expression of the 103 selected genes was analysed in duplicate in all 550 samples using TaqMan Microfluidic card TLDA (Applied Biosystems) gene expression assays. Gene expression was normalized to 18S ribosomal RNA, and the level of expression in the tumor sample was compared with the mean level of expression of the corresponding gene in normal liver tissues, expressed as an n-fold ratio. The relative amount of RNA was calculated with the 2^−ΔΔCT method.
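The 2^−ΔΔCT calculation described above can be sketched as follows. This is a minimal illustration, not the inventors' code: the function names are hypothetical, duplicates are assumed to be averaged before normalization, and the standard assumption of the method (roughly 100% amplification efficiency, i.e. a doubling per cycle) is taken for granted.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """ΔCT: mean target-gene CT minus mean reference-gene (18S) CT.

    Each argument is a list of replicate CT values (e.g. duplicates).
    """
    return mean(ct_target) - mean(ct_reference)


def fold_change(sample, normals):
    """n-fold expression of a sample relative to normal liver (2^-ΔΔCT).

    sample  : (target_cts, reference_cts) for the tumor sample
    normals : list of (target_cts, reference_cts), one pair per normal liver
    """
    # Calibrator: mean ΔCT of the gene across normal liver tissues.
    calibrator = mean(delta_ct(target, ref) for target, ref in normals)
    dd_ct = delta_ct(*sample) - calibrator
    return 2.0 ** (-dd_ct)
```

For example, a tumor sample with ΔCT = 14 against a normal-liver calibrator of ΔCT = 16 gives ΔΔCT = −2, i.e. a 4-fold higher expression than in normal liver.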
DNA was extracted and quality was assessed. All HCA samples were sequenced for CTNNB1 (exons 2 to 4), HNF1A (exons 1 to 10), IL6ST (exons 6 and 10), GNAS (exon 8) and STAT3 (exons 2, 5 and 20). All HCC samples were sequenced for CTNNB1 (exons 2 to 4) and TP53 (exons 2 to 11). All mutations were confirmed by sequencing a second independent amplification product on both strands; screening for mutations in the matched non-tumor sample was performed in order to detect any germline mutations.
The study design followed the general recommendations of the REMARK report for marker prognosis studies (McShane L M, et al. 2005) and of the EASL/EORTC guidelines (EASL J, et al. 2012). After surgery, patients were followed and screened for HCC recurrence by serum AFP assay and CT scan (or liver MRI). The primary end point of the study was disease-specific overall survival, analysed through tumor-related death; patients who died of another cause were censored. Death was considered tumor related when it occurred in patients with HCC involving more than 50% of the liver, HCC with extensive tumor portal thrombosis, or extrahepatic metastasis. To limit the background noise due to the occurrence of a second independent HCC, survival was censored at 5 years after the initial resection surgery. The last recorded follow-up visit was in February 2011. Survival was also assessed in patients who relapsed (“survival post-recurrence”), defined as the interval between tumor recurrence and death.
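The end-point definition above (tumor-related death as the event, deaths from other causes censored, follow-up truncated at 5 years) can be illustrated with a short sketch. The function name, the status labels and the choice to keep an event occurring exactly at 60 months are assumptions made for this example only.

```python
def disease_specific_endpoint(followup_months, status, horizon_months=60.0):
    """Return (time, event) for disease-specific survival analysis.

    followup_months : time from resection to death or last visit, in months
    status          : "alive", "tumor_death" or "other_death" (hypothetical labels)
    horizon_months  : administrative censoring time (5 years = 60 months)
    """
    # Only tumor-related deaths count as events; other deaths are censored.
    event = 1 if status == "tumor_death" else 0
    # Anything observed after the horizon is censored at the horizon.
    if followup_months > horizon_months:
        return horizon_months, 0
    return followup_months, event
```

A patient who died of HCC at 72 months is therefore recorded as censored at 60 months, while a tumor-related death at 30 months is kept as an event.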
The 314 HCC were divided into a training set S1 (189 patients treated in Bordeaux) and a validation set S2 (125 patients treated in Créteil). Based on S1, univariate Cox models were calculated for each of the 103 measured genes (survival R package, coxph function, breslow method) and genes with a logrank test p-value less than 0.05 were selected, yielding 31 genes. These 31 genes were used in a stepwise procedure, with the logrank test p-value as selection criterion, to build multivariate Cox models on S1. A modified forward stepwise procedure was used: at run k>2 (i.e. building a model with k variables, based on a previously obtained model with (k−1) variables), a variable is added, then a variable is removed, and a variable is added again. The variable to be added or removed is selected among those optimizing the criterion. When several variables optimize the criterion, the first encountered is selected. Ten models were built, ranging from 1 to 10 genes. The smallest model optimizing the criterion, i.e. the one with the fewest variables, was then selected. To validate this model (k=5 genes), it was used to predict samples from the validation set S2.
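The control flow of the modified forward stepwise procedure can be sketched as below. This is one possible reading of the description, not the inventors' implementation: the selection criterion (in the study, a logrank test p-value from a Cox model fitted in R) is abstracted here as a hypothetical callable `criterion(variables)` that returns a number to be minimized, and all function names are illustrative.

```python
def best_addition(current, candidates, criterion):
    """Add the candidate that minimizes the criterion (first encountered wins ties)."""
    best_value, best_var = None, None
    for var in candidates:
        if var in current:
            continue
        value = criterion(current + [var])
        if best_value is None or value < best_value:  # strict '<' keeps the first optimum
            best_value, best_var = value, var
    return current + [best_var]


def best_removal(current, criterion):
    """Remove the variable whose removal minimizes the criterion."""
    best_value, best_model = None, None
    for var in current:
        reduced = [v for v in current if v != var]
        value = criterion(reduced)
        if best_value is None or value < best_value:
            best_value, best_model = value, reduced
    return best_model


def modified_forward_stepwise(candidates, criterion, max_size=10):
    """Build one model per size 1..max_size; for k > 2: add, remove, add again."""
    models, current = {}, []
    for k in range(1, min(max_size, len(candidates)) + 1):
        current = best_addition(current, candidates, criterion)      # size k
        if k > 2:
            current = best_removal(current, criterion)               # size k - 1
            current = best_addition(current, candidates, criterion)  # size k again
        models[k] = list(current)
    return models


def smallest_optimal_model(models, criterion):
    """Among the models reaching the best criterion value, return the smallest."""
    best = min(criterion(m) for m in models.values())
    size = min(k for k, m in models.items() if criterion(m) == best)
    return models[size]
```

The remove-then-add step lets a variable chosen early be swapped out once a better combination becomes available, which a plain forward procedure cannot do.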
Given a sample to be classified in one of two prognostic classes 0 and 1 (respectively corresponding to favorable and pejorative outcomes), N variables and related measures X=(x1, …, xN) for this sample, the sample will be attributed to class 0 or 1 based on the following rule:
Parameters (mi,wi) are given in Table 2 above.
In the composite prognostic score the value of A(X) is used as an input, in addition to the BCLC class and the microvascular invasion.
The log rank test and the Kaplan-Meier method were used to assess survival. Continuous and categorical variables were compared using the Mann-Whitney test and the Chi-square or Fisher exact test, respectively. Univariate and multivariate analyses were performed using the Cox model. Statistical analysis was performed using the R statistical software and the rms package.
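For readers unfamiliar with the Kaplan-Meier method mentioned above, the product-limit estimate can be sketched in a few lines. The study itself used R; this Python function, its name and its input layout are illustrative assumptions only.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. tumor-related death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        n_at_risk -= leaving
    return curve
```

Censored patients contribute to the risk set up to their last follow-up time and then leave it without causing a drop in the curve.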
The area under the curve, used to test the accuracy of the signature in terms of specific survival prediction, was computed according to Uno, H., et al. 2007 (Evaluating prediction rules for t-year survivors with censored regression models, Journal of the American Statistical Association 102, 527-537), using the survAUC R package. The nomogram was built using the rms package.
To create and validate a robust molecular genes-score to predict overall survival and early tumor recurrence of resected HCC, the expression of a set of 103 genes was analyzed in the 314 HCC qualified for prognosis (see flowchart in
The dichotomized 5-genes score was significantly associated with overall survival in the training (log rank P<0.0001,
Moreover, the 5 genes score was also associated with early tumor recurrence in both the training (log rank P<0.0001, see
Then, the inventors asked whether the molecular prognostic classification of the primary tumor could predict the clinical course of the corresponding relapse. Accordingly, in the subgroup of patients who relapsed, the score (performed on the primary tumor) accurately predicted the risk of death after relapse (log rank P<0.0001, see
Among the 314 HCC patients treated by complete resection, 129 were classified in the poor prognosis group with the 5-genes score. This group of patients with molecular poor prognosis was significantly related to almost all the well-known clinical (HBV infection, tumor size, preoperative AFP, BCLC stage), pathological (macro- and micro-vascular invasion, tumor differentiation) and molecular features (G3 classification, TP53 mutations) previously associated with HCC prognosis (see Table 7 below). In contrast, the molecular prognostic 5-genes score is not associated with age, other etiologies, tumor number, METAVIR score or CTNNB1 mutations.
#expressed in months (median, 25th and 75th percentile) and analyzed using Mann Whitney test.
The inventors also aimed to test the independent value of the new molecular 5-genes score to predict prognosis. Multivariate analysis showed that the 5-genes score is associated with overall survival independently of clinical and pathological features, including the BCLC staging, in the training, validation and overall cohorts (see Table 8 below).
5.93 10−13
Interestingly, in tested patients, TP53 and CTNNB1 mutations were not related to prognosis. Moreover, while related to the G3 classification, the 5-genes score was more contributive to predict prognosis in each cohort of patients (see Table 9 below).
In addition, the performance of the 5-genes score was also compared to that of several prognosis scores disclosed in WO2007/063118A1. The 5-genes score was also found to be more contributive to predict prognosis in each cohort of patients (see Table 10 below).
As the French patients reflected the diversity of HCC in terms of stages, etiologies and underlying liver diseases, the performance of the 5-genes score in each condition was analyzed (see
All these results underline the robustness and the strong independent ability of the 5-genes score to predict the prognosis of patients with HCC treated by resection.
Finally, the most relevant clinical, pathological and molecular variables were assembled in the overall series of HCC patients to develop a composite prognostic predictor. Integration of the BCLC classification with microvascular invasion and the 5-genes score was performed to obtain a composite score. The nomogram in
Molecular prediction of HCC recurrence and related death is an expanding field. More than 18 different molecular signatures have been published to date, but few of them have been externally validated (Villanueva A, et al. 2010). One of these validated molecular prognostic classifications is the G3 signature, which has previously been validated in paraffin-embedded tissues (Boyault S, et al. 2007, Villanueva A, et al. 2011).
The 5 genes included in the prognostic signature were TAF9, RAMP3, HN1, KRT19 and RAN. They reflect different signaling pathways deregulated in poor-prognosis tumors. The stem cell/progenitor feature related to KRT19 expression has already been described in poor-prognosis HCC (Lee J S, et al., Nat Med 2006). Similarly, TAF9, RAMP3, and HN1 had already been associated with HCC prognosis in WO2007/063118A1. In contrast, RAN is a new player in HCC prognosis. These deregulations, identified within the tumors, are related to the aggressiveness of the cancer, which is linked to early relapse after surgery and to survival after relapse.
In the present work, the newly identified 5-genes score was more contributive than the G3 signature to predict the prognosis of patients with HCC treated by resection. Notably, the 5-gene signature identified most of the tumors classified in G3-subgroup (86%) as having bad prognosis, but it also identified the poor-prognosis patients with tumor classified in non-G3 molecular subgroups.
Similarly, the single newly identified 5-genes score was also found more contributive than the various signatures disclosed in WO2007/063118A1 for prognosis of global survival or survival without relapse.
The Western cohort of patients used in the present study offered the advantage of covering various etiologies (alcohol, hepatitis C and B, metabolic disease) and various stages of the disease (from early to invasive), with HCC treated similarly in two French academic hospitals. In contrast to other studies focusing mainly on HBV-related HCC (Nault J C, et al. 2011, Woo H G, et al. 2011, Hsu H C, et al. 2000), no significant association between TP53 or CTNNB1 mutations and prognosis was found. The 5-genes score is significantly associated with prognosis independently of tumor stage, etiology or presence of cirrhosis.
In conclusion, the 5-genes score identified by the inventors will simplify and refine the prognosis and the therapeutic decision of HCC patients.
The 5 genes prognosis predictor described in Example 1 is based on protocols that are designed for RT quantitative PCR ΔΔCt measurements.
10 additional versions of the same 5 genes prognosis predictor (based on an expression profile consisting of genes TAF9, RAMP3, HN1, KRT19, and RAN), dedicated to microarray data, have also been developed in order to validate the 5 genes signature.
These 10 “microarray” versions were obtained based on two distinct training sets, one based on quantitative RT-PCR data and the other on microarray data, and using 5 distinct algorithms.
More precisely, the 10 “microarray” versions were obtained as follows:
The above results indicate that predictors based on the same genes but calibrated differently (based on another training set and/or another technology for measuring expression level and/or another algorithm) lead to comparable results.
They also show that the technology used for measuring expression level in a validation group does not need to be the same as that used for the training group.
Number | Date | Country | Kind |
---|---|---|---|
12306146.7 | Sep 2012 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2013/069753 | 9/23/2013 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61704360 | Sep 2012 | US |