The field of the invention is characterizing breast cancer using omics analysis, particularly as it relates to subtyping of breast cancer, especially TNBC (triple negative breast cancer).
The background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
All publications herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
Treatment of patients with TNBC (breast cancer typically lacking expression of estrogen receptors, progesterone receptors and HER2 (human epidermal growth factor receptor 2)) is often challenging due to underlying genetic heterogeneity and the absence of well-defined molecular targets. TNBCs constitute 10%-20% of all breast cancers, and more frequently affect younger patients. TNBC tumors are typically larger in size, tend to have a higher grade and lymph node involvement, and are often more aggressive. Despite having higher rates of clinical response to presurgical (neoadjuvant) chemotherapy, TNBC patients have a higher rate of distant recurrence and a poorer prognosis than women with other breast cancer subtypes. Indeed, less than 30% of women with metastatic TNBC survive 5 years, and almost all patients die of breast cancer even with adjuvant chemotherapy.
More recently, efforts have been undertaken to refine TNBC into several molecularly distinct subgroups based on retrospective analysis of observed treatment responses to chemotherapy (see e.g., PLOS ONE|DOI:10.1371/journal.pone.0157368 Jun. 16, 2016). Similarly, subtypes for TNBC were defined based on five potential clinically actionable groupings of TNBC: 1) basal-like TNBC with DNA-repair deficiency or growth factor pathways; 2) mesenchymal-like TNBC with epithelial-to-mesenchymal transition and cancer stem cell features; 3) immune-associated TNBC; 4) luminal/apocrine TNBC with androgen-receptor overexpression; and 5) HER2-enriched TNBC (see e.g., Oncotarget, Vol. 6, No. 15; pp 12890-12908). In yet another study (see e.g., J Breast Cancer 2016 September; 19(3): 223-230), subtypes of TNBC were identified as basal-like, mesenchymal, luminal androgen receptor, and immune-enriched. In still further known studies, expression subtyping was performed and identified three sub-clusters among tested patient samples (see e.g., Breast Cancer Research (2015) 17:43). Likewise, an online classification tool was published to classify TNBC by gene expression (URL: cbc.mc.vanderbilt.edu/tnbc; Cancer Informatics 2012:11 147-156) that separated TNBC data into six distinct subtypes.
While such known methods provide at least some insight into different subgroups of TNBC, several of these subtypes are bound to specific parameters such as specific drug response, biomarkers, etc. and as such have an inherent bias. On the other hand, other methods require analysis of a substantially complete omics data set to identify a subtype. Consequently, analysis is often time consuming and expensive.
Despite remarkable advances in molecular insight into breast cancer genetics of TNBC, prediction of survival time or treatment success remains elusive. Therefore, there is still a need for improved systems and methods to better characterize TNBC subtypes that may help identify appropriate treatment methods and/or predict patient survival. Ideally, such improved systems and methods will not require a full omics data set but can be performed using a limited number of omics data.
The inventive subject matter is directed to various systems and methods of omics analysis and especially expression analysis of a limited set of genes from a breast cancer sample that are suitable to identify TNBC and a particular molecular subtype within TNBC. Advantageously, such analysis is not tied to a particular outcome (e.g., treatment sensitivity or survival) and will require expression data for fewer than 100, and more typically fewer than 80, selected genes.
Thus, in one aspect of the inventive subject matter, the inventors contemplate a method of processing omics data of a cancer sample that includes a step of obtaining transcriptomic data of a cancer tissue. Most preferably, the transcriptomics data is associated with protein expression level of a plurality of proteins in the cancer tissue, and the plurality of proteins is associated with a phenotype of the cancer tissue. Then, the transcriptomics data is stratified into a subgroup of data and the subgroup of data is clustered. In yet another step, the clustered subgroup of data is subjected to a recursive feature elimination to thereby obtain reduced transcriptomic data.
For example, contemplated cancer samples include a breast cancer sample in which the plurality of proteins includes an estrogen receptor, a progesterone receptor, and HER2. In such example, the derived phenotype of the cancer tissue will be TNBC. However, other contemplated proteins include DNA repair proteins, cell cycle proteins, and/or proteins encoded by a cancer driver gene. Most typically, the transcriptomic data are RNAseq data, and/or the step of stratifying uses a cutoff value that is optimized for a ratio between true positive and false negative.
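The stratification of a breast cancer cohort into a TNBC subgroup could be sketched as follows. This is only a minimal illustration: the gene symbols ESR1, PGR, and ERBB2 (the genes encoding the estrogen receptor, the progesterone receptor, and HER2) are used as dictionary keys, while the cutoff values, sample names, and TPM values are hypothetical placeholders and are not taken from this disclosure.

```python
# Hypothetical TPM cutoffs for illustration only; real cutoffs would be
# derived from training data with known receptor status.
CUTOFFS_TPM = {"ESR1": 10.0, "PGR": 5.0, "ERBB2": 150.0}


def receptor_status(tpm):
    """Return {gene: bool}; True means expression at or above the cutoff."""
    return {gene: tpm[gene] >= cutoff for gene, cutoff in CUTOFFS_TPM.items()}


def is_tnbc(tpm):
    """Call a sample TNBC when all three receptors fall below their cutoffs."""
    return not any(receptor_status(tpm).values())


# Made-up expression values (TPM) for two samples.
samples = {
    "sample_A": {"ESR1": 1.2, "PGR": 0.4, "ERBB2": 35.0},
    "sample_B": {"ESR1": 240.0, "PGR": 80.0, "ERBB2": 60.0},
}

# The stratified subgroup contains only samples called triple negative.
tnbc_subgroup = [name for name, tpm in samples.items() if is_tnbc(tpm)]
```

Only the samples in `tnbc_subgroup` would then be passed on to clustering and feature elimination.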
While not limiting to the inventive subject matter, the step of clustering may use between 3 and 10 clusters, and the recursive feature elimination is repeated at least once. Consequently, the reduced transcriptomic data are less than 30%, or less than 10%, or less than 1% of the transcriptomic data of a cancer tissue.
Where desired, contemplated methods may include a step of associating the reduced transcriptomic data with a drug response, overall survival, disease free survival, and/or progression free survival. In such embodiments, the method may further include a step of determining a treatment regimen based on at least one of the drug response, the overall survival, the disease free survival, and the progression free survival. Additionally, the method may also further include a step of treating a patient having the cancer tissue with a cancer treatment in the treatment regimen in a dose and a schedule sufficient to treat the cancer tissue. Moreover, the reduced transcriptomic data may also be used as an input for a pathway analysis.
In another aspect of the inventive subject matter, the inventors contemplate a system for processing omics data of a cancer tissue that includes an omics database storing transcriptomic data of the cancer tissue and a machine learning system informationally coupled to the omics database. The machine learning system is programmed to obtain the transcriptomic data of the cancer tissue, wherein the transcriptomics data is associated with protein expression level of a plurality of proteins in the cancer tissue, and wherein the plurality of proteins is associated with a phenotype of the cancer tissue, stratify the transcriptomics data into a subgroup of data, cluster the subgroup of data, and subject the clustered subgroup of data to recursive feature elimination to obtain reduced transcriptomic data.
For example, contemplated cancer samples include a breast cancer sample in which the plurality of proteins includes an estrogen receptor, a progesterone receptor, and HER2. In such example, the derived phenotype of the cancer tissue will be TNBC. However, other contemplated proteins include DNA repair proteins, cell cycle proteins, and/or proteins encoded by a cancer driver gene. Most typically, the transcriptomic data are RNAseq data, and/or the step of stratifying uses a cutoff value that is optimized for a ratio between true positive and false negative.
While not limiting to the inventive subject matter, the subgroup is clustered using between 3 and 10 clusters, and the recursive feature elimination is repeated at least once. Consequently, the reduced transcriptomic data are less than 30%, or less than 10%, or less than 1% of the transcriptomic data of a cancer tissue.
Where desired, the machine learning system may be further programmed to associate the reduced transcriptomic data with a drug response, overall survival, disease free survival, and/or progression free survival. In such embodiments, the machine learning system may be further programmed to determine a treatment regimen based on at least one of the drug response, the overall survival, the disease free survival, and the progression free survival. Moreover, the reduced transcriptomic data may also be used as an input for a pathway analysis.
In still another aspect of the inventive subject matter, the inventors contemplate a non-transient computer readable medium that is informationally coupled to an omics database that stores transcriptomic data of a cancer tissue. The non-transient computer readable medium contains program instructions for causing a computer system comprising a machine learning system to perform a method of obtaining the transcriptomic data of the cancer tissue, wherein the transcriptomics data is associated with protein expression level of a plurality of proteins in the cancer tissue, and wherein the plurality of proteins is associated with a phenotype of the cancer tissue, stratifying the transcriptomics data into a subgroup of data, clustering the subgroup of data, and subjecting the clustered subgroup of data to recursive feature elimination to obtain reduced transcriptomic data.
For example, contemplated cancer samples include a breast cancer sample in which the plurality of proteins includes an estrogen receptor, a progesterone receptor, and HER2. In such example, the derived phenotype of the cancer tissue will be TNBC. However, other contemplated proteins include DNA repair proteins, cell cycle proteins, and/or proteins encoded by a cancer driver gene. Most typically, the transcriptomic data are RNAseq data, and/or the step of stratifying uses a cutoff value that is optimized for a ratio between true positive and false negative.
While not limiting to the inventive subject matter, the step of clustering may use between 3 and 10 clusters, and the recursive feature elimination is repeated at least once. Consequently, the reduced transcriptomic data are less than 30%, or less than 10%, or less than 1% of the transcriptomic data of a cancer tissue.
Where desired, contemplated methods may include a step of associating the reduced transcriptomic data with a drug response, overall survival, disease free survival, and/or progression free survival. In such embodiments, the method may further include a step of determining a treatment regimen based on at least one of the drug response, the overall survival, the disease free survival, and the progression free survival. Moreover, the reduced transcriptomic data may also be used as an input for a pathway analysis.
Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawings.
The inventors have now discovered that breast cancer can be accurately typed as triple negative breast cancer (TNBC) using expression data for selected receptor genes at appropriate threshold (i.e., cutoff) values and even subtyped into four distinct classes using expression data for a relatively small number of selected genes. Viewed from a different perspective, the inventors discovered that accurately diagnosing and/or characterizing the subtypes of breast cancers, especially TNBC, can be performed with omics data of substantially reduced type and size when such reduced omics data is selected by clustering the data and eliminating less relevant data (e.g., via ranking the data based on the model and attributes, etc.). Thus, in one especially preferred aspect of the inventive subject matter, the inventors contemplate a method of processing omics data of a cancer tissue to obtain a reduced omics data set for subtyping the cancer tissue. In this method, transcriptomic data of the cancer tissue can be obtained and stratified into a subgroup of data, which is then clustered. Then, such clustered subgroup of data can be subjected to recursive feature elimination to obtain reduced transcriptomic data.
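The recursive feature elimination step can be illustrated with a minimal sketch. It assumes that the cluster assignments serve as class labels, that a simple logistic regression (fitted by gradient descent) is the ranking model, and that exactly one gene is dropped per round. None of these choices is prescribed by the text above, only two clusters are used for brevity, and all data values are made up.

```python
import math


def fit_logistic(X, y, features, steps=200, lr=0.1):
    """Fit logistic regression on the listed columns; return {column: weight}."""
    w = {j: 0.0 for j in features}
    b = 0.0
    n = len(X)
    for _ in range(steps):
        grad_w = {j: 0.0 for j in features}
        grad_b = 0.0
        for row, target in zip(X, y):
            z = b + sum(w[j] * row[j] for j in features)
            err = 1.0 / (1.0 + math.exp(-z)) - target
            for j in features:
                grad_w[j] += err * row[j]
            grad_b += err
        for j in features:
            w[j] -= lr * grad_w[j] / n
        b -= lr * grad_b / n
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Refit the model and drop the lowest-|weight| column until n_keep remain."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        w = fit_logistic(X, y, features)
        weakest = min(features, key=lambda j: abs(w[j]))
        features.remove(weakest)
    return features


# Rows are samples, columns are genes (hypothetical, already centered values).
# Columns 0 and 2 track the cluster label, column 1 is uninformative,
# and column 3 is only weakly informative.
X = [
    [-1.0, 1.0, -2.0, -0.1], [-1.0, -1.0, -2.0, -0.1],
    [-1.0, 1.0, -1.0, -0.1], [-1.0, -1.0, -1.0, -0.1],
    [1.0, 1.0, 1.0, 0.1], [1.0, -1.0, 1.0, 0.1],
    [1.0, 1.0, 2.0, 0.1], [1.0, -1.0, 2.0, 0.1],
]
cluster_label = [0, 0, 0, 0, 1, 1, 1, 1]

kept = recursive_feature_elimination(X, cluster_label, n_keep=2)
```

Because the ranking relies on raw model weights, expression values would normally be standardized per gene before such a procedure is applied.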
As used herein, the term “tumor” or “cancer” refers to, and is interchangeably used with one or more cancer cells, cancer tissues, malignant tumor cells, or malignant tumor tissue, that can be placed or found in one or more anatomical locations in a human body. It should be noted that the term “patient” as used herein includes both individuals that are diagnosed with a condition (e.g., cancer) as well as individuals undergoing examination and/or testing for the purpose of detecting or identifying a condition. Thus, a patient having a tumor refers to both individuals that are diagnosed with a cancer as well as individuals that are suspected to have a cancer. As used herein, the term “provide” or “providing” refers to and includes any acts of manufacturing, generating, placing, enabling to use, transferring, or making ready to use. As used herein, the term “bind” refers to, and can be interchangeably used with a term “recognize” and/or “detect”, an interaction between two molecules with a high affinity with a KD of equal or less than 10⁻⁶ M, or equal or less than 10⁻⁷ M.
As used herein, the term “locus” (or in plural, “loci”) refers to a portion of or a location in a gene, a transcript of a gene, or a nucleic acid molecule derived from a gene or a transcript of a gene.
It should be noted that any language directed to a computer should be read to include any suitable combination of computing devices, including servers, interfaces, systems, databases, agents, peers, engines, modules, controllers, or other types of computing devices operating individually or collectively. One should appreciate the computing devices comprise a processor configured to execute software instructions stored on a tangible, non-transitory computer readable storage medium (e.g., hard drive, solid state drive, RAM, flash, ROM, etc.). The software instructions preferably configure the computing device to provide the roles, responsibilities, or other functionality as discussed below with respect to the disclosed apparatus. In especially preferred embodiments, the various servers, systems, databases, or interfaces exchange data using standardized protocols or algorithms, possibly based on HTTP, HTTPS, AES, public-private key exchanges, web service APIs, known financial transaction protocols, or other electronic information exchanging methods. Data exchanges preferably are conducted over a packet-switched network, the Internet, LAN, WAN, VPN, or other type of packet switched network.
As used herein, and unless the context dictates otherwise, the term “coupled to” is intended to include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements). Therefore, the terms “coupled to” and “coupled with” are used synonymously.
Obtaining Omics Data: Any suitable methods and/or procedures to obtain omics data are contemplated. For example, the omics data can be obtained by obtaining tissues from an individual and processing the tissue to obtain DNA, RNA, protein, or any other biological substances from the tissue to further analyze relevant information. In another example, the omics data can be obtained directly from a database that stores omics information of an individual.
Where the omics data is obtained from the tissue of an individual, any suitable methods of obtaining a tumor sample (tumor cells or tumor tissue) or healthy tissue from the patient are contemplated. Most typically, a tumor sample or healthy tissue sample can be obtained from the patient via a biopsy (including liquid biopsy, tissue excision during a surgery, or an independent biopsy procedure, etc.), and the sample can be fresh or processed (e.g., frozen, etc.) until further processing to obtain omics data from the tissue. For example, tissues or cells may be fresh or frozen. In another example, the tissues or cells may be in a form of cell/tissue extracts. In some embodiments, the tissues or cells may be obtained from a single or multiple different tissues or anatomical regions. For example, a metastatic breast cancer tissue can be obtained from the patient's breast as well as other organs (e.g., liver, brain, lymph node, blood, lung, etc.) for metastasized breast cancer tissues. In another example, a healthy tissue or matched normal tissue (e.g., patient's non-cancerous breast tissue) of the patient can be obtained from any part of the body or organs, preferably from liver, blood, or any other tissues near the tumor (in a close anatomical distance, etc.).
In some embodiments, tumor samples can be obtained from the patient at multiple time points in order to determine any changes in the tumor samples over a relevant time period. For example, tumor samples (or suspected tumor samples) may be obtained before and after the samples are determined or diagnosed as cancerous. In another example, tumor samples (or suspected tumor samples) may be obtained before, during, and/or after (e.g., upon completion, etc.) a single anti-tumor treatment or a series of anti-tumor treatments (e.g., radiotherapy, chemotherapy, immunotherapy, etc.). In still another example, the tumor samples (or suspected tumor samples) may be obtained during the progress of the tumor upon identifying new metastasized tissues or cells.
From the obtained tumor samples (cells or tissue) or healthy samples (cells or tissue), DNA (e.g., genomic DNA, extrachromosomal DNA, etc.), RNA (e.g., mRNA, miRNA, siRNA, shRNA, etc.), and/or proteins (e.g., membrane protein, cytosolic protein, nuclear protein, etc.) can be isolated and further analyzed to obtain omics data. Alternatively and/or additionally, a step of obtaining omics data may include receiving omics data from a database that stores omics information of one or more patients and/or healthy individuals. For example, omics data of the patient's tumor may be obtained from isolated DNA, RNA, and/or proteins from the patient's tumor tissue, and the obtained omics data may be stored in a database (e.g., cloud database, a server, etc.) with other omics data sets of other patients having the same type of tumor or different types of tumor. Omics data obtained from the healthy individual or the matched normal tissue (or healthy tissue) of the patient can be also stored in the database such that the relevant data set can be retrieved from the database upon analysis. Likewise, where protein data are obtained, these data may also include protein activity, especially where the protein has enzymatic activity (e.g., polymerase, kinase, hydrolase, lyase, ligase, oxidoreductase, etc.). As used herein, omics data includes but is not limited to information related to genomics, proteomics, and transcriptomics, as well as specific gene expression or transcript analysis, and other characteristics and biological functions of a cell.
In an especially preferred embodiment, the omics data that is used to characterize the tumor, especially breast cancer, in this inventive subject matter is transcriptomics data. The transcriptomics data includes sequence information and expression level (including expression profiling, copy number, or splice variant analysis) of RNA(s) (preferably cellular mRNAs) that is obtained from the patient, from the cancer tissue (diseased tissue) and/or matched healthy tissue of the patient or a healthy individual. There are numerous methods of transcriptomic analysis known in the art, and all of the known methods are deemed suitable for use herein (e.g., RNAseq, RNA hybridization arrays, qPCR, etc.). The suitable transcriptomics data may typically include absolute or relative strength of transcription, for example, expressed as transcription levels of genes in the first location relative to transcription levels of genes in normal tissue of first patient. Alternatively, or additionally, transcriptomics data may also be expressed as relative abundance (e.g., transcripts per million (TPM)). Consequently, preferred materials include mRNA and primary transcripts (hnRNA), and RNA sequence information may be obtained from reverse transcribed polyA+-RNA, which is in turn obtained from a tumor sample and a matched normal (healthy) sample of the same patient. Likewise, it should be noted that while polyA+-RNA is typically preferred as a representation of the transcriptome, other forms of RNA (hn-RNA, non-polyadenylated RNA, siRNA, miRNA, etc.) are also deemed suitable for use herein. Preferred methods include quantitative RNA (hnRNA or mRNA) analysis and/or quantitative proteomics analysis, especially including RNAseq. In other aspects, RNA quantification and sequencing is performed using RNA-seq, qPCR and/or rtPCR based methods, although various alternative methods (e.g., solid phase hybridization-based methods) are also deemed suitable. 
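Relative abundance expressed as TPM can be computed from raw read counts and transcript lengths as sketched below. This follows the commonly used TPM definition (length-normalize each gene, then scale so that all values sum to one million). The gene names, counts, and lengths are hypothetical and serve only to make the example runnable.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts:     {gene: number of reads assigned to the gene}
    lengths_bp: {gene: transcript length in bases}
    """
    # Step 1: reads per kilobase of transcript.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Step 2: scale so that the values of one sample sum to one million.
    scale = sum(rpk.values()) / 1_000_000.0
    if scale == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: value / scale for gene, value in rpk.items()}


# Hypothetical counts and transcript lengths.
counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
lengths_bp = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 2000}

tpm = counts_to_tpm(counts, lengths_bp)
```

In this example GENE_A and GENE_B receive the same TPM despite different read counts, because GENE_B is three times longer.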
Viewed from another perspective, transcriptomic analysis may be suitable (alone or in combination with genomic analysis) to identify and quantify genes having a cancer- and patient-specific mutation.
Preferably, the transcriptomics data set includes allele-specific sequence information and copy number information. In such embodiment, the transcriptomics data set includes all read information of at least a portion of a gene, preferably at least 10×, at least 20×, or at least 30×. Allele-specific copy numbers, more specifically, majority and minority copy numbers are calculated using a dynamic windowing approach that expands and contracts the window's genomic width according to the coverage in the germline data, as described in detail in U.S. Pat. No. 9,824,181, which is incorporated by reference herein. As used herein, the majority allele is the allele that has majority copy numbers (>50% of total copy numbers (read support) or most copy numbers) and the minority allele is the allele that has minority copy numbers (<50% of total copy numbers (read support) or least copy numbers).
It should be appreciated that one or more desired nucleic acids or genes may be selected for a particular disease (e.g., cancer, etc.), disease stage, specific mutation, or even on the basis of personal mutational profiles or presence of expressed neoepitopes. Alternatively, where discovery or scanning for new mutations or changes in expression of a particular gene is desired, RNAseq is preferred so as to cover at least part of a patient transcriptome. Moreover, it should be appreciated that analysis can be performed statically or over a time course with repeated sampling to obtain a dynamic picture without the need for biopsy of the tumor or a metastasis. Thus, in some embodiments, the desired nucleic acids or genes may include genes encoding at least one of a DNA repair protein, a cell cycle protein, a neoepitope, an immune-response related protein, or a protein encoded by a cancer driver gene, or any genes that are known to be specifically mutated, or whose expression is up- or down-regulated, in the tumor cells or during tumorigenesis. In addition, the desired nucleic acids or genes may include genes encoding proteins that are associated with a phenotype of the cancer tissue. Thus, those genes may include any genes mutated or differentially expressed in different types of tumor, or related or attributed to the shape or behavior of the tumor (e.g., prone to be metastasized, solid tumor, cell shape, morphology of tumor tissue, etc.). For example, where the tumor is a breast cancer, the desired genes may be genes encoding an estrogen receptor, a progesterone receptor, and/or HER2.
Consequently, the transcriptomics data may be associated with one or more protein expression level(s) of one or more protein(s) in the cancer tissue. Viewed from a different perspective, the transcriptomics data may be used to infer one or more protein expression level(s) of one or more protein(s) in the cancer tissue. For example, RNAseq data on PD-L1 in a tumor tissue may show a 10× increase in TPM compared to the normal tissue, and such data can be associated with increased PD-L1 protein expression in the tumor tissue. Alternatively, it can at least be inferred that PD-L1 protein expression in the tumor tissue is increased when the RNAseq data on PD-L1 show such a 10× increase in TPM compared to the normal tissue.
The inventors contemplate that types and/or scope of omics data that may be analyzed to classify the tumor or cancer may vary depending on the type of cancer or tumor of interest. For example,
The inventors found that transcriptomics data of some genes, and/or protein expression levels inferred from the transcriptomics data of some genes, are more reliable for inferring the status of, or classifying, a specific type of tumor. Viewed from a different perspective, the inventors found that transcriptomics data of some genes, and/or protein expression levels inferred from such data, reflect the status of, or classify, a specific type of tumor in a more consistent and/or accurate manner. Thus, in an especially preferred embodiment, the inventors further contemplate that transcriptomics data of various genes can be stratified to identify the types of genes and their expression levels that can be more reliably used for characterizing the cancer tissue. While any suitable methods to stratify the transcriptomics data are contemplated, one preferred method uses a cutoff value that is optimized for a ratio between true positive and false negative values. Typically, the true positive and false negative values are determined based on the immunohistochemical data (IHC data) of the cancer tissues, i.e., on the known receptor status of the tumor tissue samples. In some embodiments, the transcriptomics data is stratified in a Youden plot in which the ratio of true positive to false positive was maximized. The so obtained cutoff values were cross validated in a 10-fold cross validation study using the same data and RNAseq data from an unrelated breast cancer cohort (e.g., TCGA, METABRIC, PRAEGNANT, etc.).
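A cutoff selection of this kind can be sketched as follows. The sketch assumes that the optimization criterion is the Youden index J = TPR − FPR (true positive rate minus false positive rate), which is one common reading of a Youden-type analysis. The text above describes the criterion only in general terms, so this choice is an assumption, and the TPM values and IHC labels are invented for illustration.

```python
def youden_cutoff(tpm_values, ihc_positive):
    """Return (cutoff, J) maximizing Youden's J = TPR - FPR.

    A sample is called receptor-positive when its TPM is >= cutoff.
    ihc_positive holds the reference (IHC) receptor status per sample.
    """
    n_pos = sum(ihc_positive)
    n_neg = len(ihc_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both IHC-positive and IHC-negative samples")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(tpm_values)):
        pairs = list(zip(tpm_values, ihc_positive))
        tp = sum(1 for v, pos in pairs if pos and v >= cutoff)
        fp = sum(1 for v, pos in pairs if not pos and v >= cutoff)
        j = tp / n_pos - fp / n_neg
        if j > best_j:  # keep the lowest cutoff among ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical estrogen receptor TPM values with IHC reference status.
esr1_tpm = [0.5, 1.1, 2.0, 3.5, 8.0, 40.0, 95.0, 210.0]
ihc_er_positive = [False, False, True, False, True, True, True, True]

cutoff, j = youden_cutoff(esr1_tpm, ihc_er_positive)
```

With these made-up values the selected cutoff is 8.0 TPM (J = 0.8): it misses the one IHC-positive sample at 2.0 TPM but admits no IHC-negative sample.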
For example, TNBC status may be ascertained using RNAseq data (typically expressed as TPM (transcripts per million)) for the estrogen receptor, the progesterone receptor, and HER2. More particularly
The so obtained cutoff values were cross validated in a 10-fold cross validation study using the same data and RNAseq data from an unrelated breast cancer cohort (PRAEGNANT). The inventors further found that the 10-fold cross-validation accuracy for all receptors (ER: 93.96% ± 1.28, PR: 84.18% ± 2.04, HER2: 84.56% ± 3.08) and the accuracy in PRAEGNANT (ER: 83.33%, PR: 72.92%, HER2: 86.15%) are high across both cohorts.
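The within-cohort part of such a cross validation can be sketched as follows. The sketch assumes that, in each fold, a cutoff is refitted on the training samples by maximizing the Youden index over midpoints between neighboring TPM values, and that accuracy is then measured on the held-out samples. The helper names, the random seed, and the data are hypothetical, and validation on an independent cohort is not shown.

```python
import random
import statistics


def fit_cutoff(values, labels):
    """Choose the midpoint cutoff that maximizes Youden's J on training data."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("training data must contain both classes")
    ordered = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])] or ordered
    best, best_j = candidates[0], -1.0
    for c in candidates:
        tp = sum(1 for v, pos in zip(values, labels) if pos and v >= c)
        fp = sum(1 for v, pos in zip(values, labels) if not pos and v >= c)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best, best_j = c, j
    return best


def cross_validate(values, labels, k=10, seed=0):
    """Return (mean accuracy, standard deviation) over k folds."""
    if len(values) < k:
        raise ValueError("need at least k samples")
    idx = list(range(len(values)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        cutoff = fit_cutoff([values[i] for i in train],
                            [labels[i] for i in train])
        correct = sum((values[i] >= cutoff) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Hypothetical receptor TPM values with IHC reference status (20 samples).
tpm = [0.3, 0.8, 1.5, 2.2, 3.0, 4.1, 5.5, 7.0, 9.0, 30.0,
       6.0, 12.0, 25.0, 40.0, 60.0, 85.0, 120.0, 160.0, 200.0, 300.0]
ihc = [False] * 10 + [True] * 10

mean_accuracy, sd_accuracy = cross_validate(tpm, ihc, k=10)
```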
The inventors further found that the data shown in
Consequently, the inventors further contemplate that a relatively large number of cancer tissue samples and the transcriptomics data (preferably filtered with threshold values by true positive and/or false negative values) are used to build and train an intrinsic subtype predictor for subtyping the cancer. Preferably the intrinsic subtype predictor can be built and trained using any machine learning system and/or algorithms. For example, suitable machine learning processes may read all relevant or selected omics data across all time points and biopsy locations and perform training and validation splitting, data and metadata transformations, and then write those data to various formats required by disparate machine learning software packages. Suitable machine learning processes include glmnet lasso, glmnet ridge regression, glmnet elastic nets, NMFpredictor, WEKA SMO, WEKA j48 trees, WEKA hyperpipes, WEKA random forests, WEKA naive Bayes, WEKA JRip rules, etc. Exemplary machine learning processes are disclosed in WO 2014/059036 or WO 2014/193982, which are incorporated by reference herein. Moreover, mutational data may be employed to further refine the gene set or to associate mutations with one or more expression levels.
The inventors further found that the machine learning process to classify and/or characterize the cancer tissue using transcriptomics data can be more efficiently and/or effectively performed when the transcriptomics data are clustered into a plurality of clusters (e.g., based on the level of up- or down-regulation, based on the absolute expression level, based on the associated changes with other genes, based on the associated changes with specific types of cancer tissue, etc.). Thus, the number of clusters of transcriptomics data may vary, and the number of genes in each cluster may vary as well. For example, the number of clusters may be at least 3 clusters, at least 5 clusters, at least 10 clusters, at least 15 clusters, or at least 20 clusters, and the number of genes in each cluster may range between 10-10,000 genes, between 10-1000 genes, between 10-100 genes, etc.
Consequently, the inventors contemplate that an optimal number of clusters can be selected to increase the efficiency of the machine learning for characterizing and/or classifying the cancer tissues. Preferably, the optimal or appropriate number of clusters can be selected using a knee point analysis identifying a point with the largest acceleration with decreased inconsistency. For example, the inventors further subjected all identified TNBC samples to an analysis to identify subtypes independent of any classifier. The inventors first defined a set of clusters that was considered gold-standard but included too many genes to be suitable for diagnostic use. More specifically, the initially selected genes were highly differentially expressed (i.e., most variable genes) within the TNBC group. This group of genes included approximately 10,000 genes. To identify an appropriate number of clusters, a knee point analysis was performed on a restricted set of data (here 115 patient data using the 10,000 most variant genes). As can be taken from
While there can be 10,000 most variable genes related to the breast cancer classification, such a number of genes is often too large for further analysis, especially for visualizing the clusters. Thus, in
While a cluster size of 4 was so determined the best clustering in the example depicted in
It is contemplated that the reduced gene sets clustered in an optimal number of clusters (e.g., k=4) can substantially increase the efficiency and speed of the transcriptomics analysis to classify and/or characterize the cancer tissue as the amount of data to be processed can be at least 10 times, at least 50 times, at least 100 times smaller than the whole transcriptomics analysis. Further, such reduced gene sets in each cluster may reduce the false positive data and/or false negative data due to the high variance of the transcriptomics data among tissues such that the accuracy of the analysis can be substantially increased. Preferably, subtyping is unsupervised and based on recursive feature elimination of a large set of genes with highest variability in gene expression.
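The knee point selection of the cluster number mentioned above can be sketched as follows. The sketch assumes that an inconsistency score has already been computed for each candidate number of clusters k (evenly spaced), and that the "acceleration" is the discrete second difference of that curve. Both the function name and the score values are hypothetical.

```python
def knee_point(ks, scores):
    """Return the k at which the second difference of the score curve peaks.

    ks:     candidate cluster numbers, evenly spaced and ascending
    scores: inconsistency score for each k (lower is better)
    """
    if len(ks) != len(scores) or len(ks) < 3:
        raise ValueError("need at least three (k, score) pairs of equal length")
    best_k, best_acceleration = None, float("-inf")
    # The second difference is undefined at the two end points.
    for i in range(1, len(ks) - 1):
        acceleration = scores[i - 1] - 2 * scores[i] + scores[i + 1]
        if acceleration > best_acceleration:
            best_k, best_acceleration = ks[i], acceleration
    return best_k


# Made-up inconsistency curve: steep drop up to k=4, nearly flat afterwards.
ks = [2, 3, 4, 5, 6, 7, 8]
inconsistency = [100.0, 70.0, 30.0, 26.0, 23.0, 21.0, 20.0]

optimal_k = knee_point(ks, inconsistency)
```

For this illustrative curve the largest second difference occurs at k=4, where the inconsistency stops dropping steeply.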
In addition, the results of such clustering of cancer tissues can be used as an input into pathway analysis algorithms to identify affected and/or targetable pathways and/or intrinsic properties of the tumor tissue or cells. In some embodiments, the transcriptomics data of selected genes (in each cluster or one of the clusters) can be integrated into a pathway model (e.g., as a pathway element or a regulatory parameter to control or affect the pathway element, etc.) to generate a modified pathway of cancer tissue to determine any differential pathway characteristic of the cancer tissue. While any suitable methods of analyzing pathway characteristics of cells are contemplated, a preferred method uses PARADIGM (Pathway Recognition Algorithm using Data Integration on Genomic Models), which is a genomic analysis tool described in WO2011/139345 and WO2013/062505 and uses a probabilistic graphical model to integrate multiple genomic data types on curated pathway databases.
Further, it is also contemplated that classification and/or characterization of the cancer tissue may be advantageously associated (preferably via machine learning) with a desired treatment or predictive parameter, and/or improved by use of supervised learning. For example, a specific subtype as presented herein may be associated with treatment response to nab-paclitaxel, optionally followed by epirubicin plus cyclophosphamide. Likewise, a specific subtype as presented herein may be associated with the overall survival rate or a disease-free or progression-free survival time. As will be readily appreciated, the results of such clustering can be used to stratify breast cancer patient data, and/or used in supervised machine learning with various classifiers, particularly for prediction of drug response (e.g., nab-paclitaxel, optionally with epirubicin/cyclophosphamide), overall survival, disease-free survival, or progression-free survival.
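As a simple illustration of stratifying patient data by subtype, the following sketch computes a per-subtype treatment response rate. It is only a descriptive association step, not a trained classifier, and it makes no claim about any actual subtype. The function name `response_rate_by_subtype`, the subtype labels 0 to 3, and the response values are hypothetical placeholders rather than results from any study.

```python
from collections import defaultdict

def response_rate_by_subtype(subtypes, responded):
    """Return {subtype: fraction of responders} for paired per-patient lists."""
    if len(subtypes) != len(responded):
        raise ValueError("subtypes and responded must have the same length")
    counts = defaultdict(lambda: [0, 0])  # subtype -> [responders, total]
    for subtype, response in zip(subtypes, responded):
        counts[subtype][0] += int(bool(response))
        counts[subtype][1] += 1
    return {s: resp / total for s, (resp, total) in counts.items()}

# Hypothetical cluster assignments and binary responses (1 = responded).
subtypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3]
responded = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0]

rates = response_rate_by_subtype(subtypes, responded)
best = max(rates, key=rates.get)  # subtype with the highest observed rate
print(rates, best)
```

A table of this kind could serve as a first check before training a supervised model, since it shows whether the unsupervised subtypes separate responders from non-responders at all.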
In some embodiments, such association with drug sensitivity, predicted treatment response, overall survival rate or a disease-free or progression-free survival time can be further used to generate and/or determine a treatment regimen. For example, where the predicted treatment response using nab-paclitaxel is highly positive, the treatment regimen for the patient can include nab-paclitaxel. In addition, the effect of nab-paclitaxel treatment on the tumor tissue can be simulated in a pathway analysis to determine any potential changes in the pathway activity of one or more selected genes in the cluster. In such a scenario, a treatment targeting the one or more selected genes that are (potentially) changed by nab-paclitaxel treatment can be further selected as a treatment regimen followed by nab-paclitaxel treatment. As used here, a treatment targeting a gene refers to a treatment targeting (e.g., binding, inhibiting the activity, enhancing the activity, etc.) a protein encoded by the gene, and/or a treatment inhibiting or enhancing the expression of the one or more genes at a transcriptional level, at a translational level, and/or at a post-translational modification level (e.g., phosphorylation, glycosylation, protein-protein binding, etc.). Such a determined or generated treatment (regimen) can be further administered to the patient having the tumor in a dose and a schedule effective or sufficient to treat the tumor (e.g., to reduce the tumor size, to increase the immune response against the tumor, to increase the survival rate, etc.).
As used herein, the term “administering” refers to both direct and indirect administration of the treatment regimens, drugs, and therapies contemplated herein, where direct administration is typically performed by a health care professional (e.g., physician, nurse, etc.), while indirect administration typically includes a step of providing or making the compounds and compositions available to the health care professional for direct administration.
As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise. Unless the context dictates the contrary, all ranges set forth herein should be interpreted as being inclusive of their endpoints, and open-ended ranges should be interpreted to include commercially practical values. Similarly, all lists of values should be considered as inclusive of intermediate values unless the context indicates the contrary.
Moreover, all methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g. “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.
Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.
It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the scope of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C . . . and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.
This application claims priority to our copending U.S. Provisional Patent Application with the Ser. No. 62/594,223, which was filed Dec. 4, 2017, which is incorporated by reference in its entirety herein.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2018/063676 | 12/3/2018 | WO | 00 |

Number | Date | Country |
---|---|---|
62594223 | Dec 2017 | US |