The present invention relates to a method of identification of clinically and genetically distinct sub-groups of patients subject to a medical condition, particularly (but not exclusively) breast, lung, and colon cancer patients using a composition of respective gene expression values for certain gene pairs. It further relates to using respective gene expression values for these genes to predict patient risk groups (in context of patient survival or/and disease progression) and to using the predicted groups for identification of the specific and robust prognostic biomarkers with mechanistic interpretations of biological changes (associated with the gene signature) appropriate for an implementation of therapeutic targeting.
Breast cancer ranks second among commonly diagnosed cancers in the world and is the most frequent cause of cancer death in women in both developing and developed countries, although it is only the fifth greatest cause of cancer mortality overall [1]. During the last decade, substantial progress has been achieved in reducing the mortality of breast cancer (especially in developed countries) [1], in contrast to its increasing incidence worldwide. The reasons for the reduction of breast cancer mortality include application of early mammographic screening [2] as well as adjuvant chemotherapy and hormone therapy [3]. Nevertheless, the benefit of adjuvant therapy and the clinical outcome vary substantially among breast cancer patients [4]. For example, therapy modalities are often dramatically different depending on the tumor grade status (poorly differentiated tumors vs. highly differentiated tumors); targeted biologic therapy with trastuzumab or lapatinib is highly efficient in HER2/neu-positive breast tumors [5]. With the currently used post-surgery treatment approaches, about 60% of all patients with early-stage breast cancer still receive adjuvant chemotherapy, of whom only a small proportion (2-15%) derive therapeutic benefit [3]. All patients treated (and often over-treated) by systemic therapy remain at risk of long-term toxic side effects, which can include cognitive impairment, cardiac tissue damage, infertility, disease of the central nervous system, secondary malignancies and personality changes.
According to a recent report which included 29 US cost-of-illness studies for breast cancer, the estimate of lifetime per-patient costs of breast cancer ranges from $US 20,000 to $US 100,000 [6]. Costs of different surgeries are relatively similar (breast-conserving surgery vs. mastectomy) but, all else being equal, significant costs ($US 23,000-31,000) were observed for patients who received adjuvant chemotherapy compared with those who did not [6]. According to another source [7], the cost of breast cancer treatment for pre-invasive stages is approximately $US 10,000-$US 15,000, whereas later stage breast cancers (with higher grade, higher invasiveness and metastatic potential) can reach a total cost of between $US 60,000 and $US 145,000. Therefore, improvement of the prognosis/prediction and further stratification of hormone therapy and chemotherapy schemes (which includes identification of patients with highly invasive/recurrent/metastatic tumors) could substantially improve the quality of life of individual patients and decrease per-patient treatment costs.
The relatively low efficiency of currently used chemotherapy schemes can be explained by the high level of heterogeneity of breast tumors, on the one hand, and by the real challenges of identifying that heterogeneity in routine clinical practice, on the other. Nevertheless, very active ongoing research in the field, including the present report, provides new opportunities and technological innovations to tackle those challenges.
Previous and very recent works have reported a large number of parameters which are able to capture breast cancer heterogeneity: clinico-pathological parameters, simple molecular biomarkers and complex clinical and multi-gene molecular classifiers (“gene signatures”). The first and second types of parameters include, for example, histological grade, estrogen receptor status, progesterone receptor status, lymph node status, Ki67 status, mitotic index and tumor size. The histological Nottingham Grading System discriminates 3 distinct grades: grade 1 (G1), grade 2 (G2) and grade 3 (G3) [8]. The NPI score is a typical example of a complex clinical biomarker; it is based on three simple clinical parameters (tumor size, lymph node status and histological grade) and can identify three prognostic groups with 10-year survival rates of 83%, 52% and 13% [9]. However, the Nottingham grading system has substantial limitations due to high genetic heterogeneity within each of the subtypes. The incompletely characterized genetic heterogeneity of G3, G2 and, most probably, G1 breast tumors could be one of the reasons for inconsistency in histologic grading between institutions and, as a consequence, the reason why some health institutions do not include histologic grading in their staging criteria [10, 11].
Intrinsic molecular classification independently sorted all types of breast tumors into 5 distinct molecular subtypes which differ in prognosis and therapeutic treatment: basal-like, luminal A, luminal B, ERBB2-enriched and normal-like [12, 13]. Alternatively, in multiple recent studies, application of novel complex multigene classifiers led to the discovery that some of the already classical intrinsic subtypes are heterogeneous in terms of survival [14, 15]. However, each of the classifiers was typically efficient only within one specific subtype and had limited tumor stratifying/prognostic power in the other subtypes.
Gene pairs as distinct prognostic biomarkers can have higher prognostic impact than individual genes in various cancers [16, 17]. The expression level ratio (expression index) of two genes, HOXB13 and IL17BR, has been shown to be efficient in prediction of recurrence risk in ER-positive, lymph node negative breast cancer patients after hormone therapy (tamoxifen) [17]. Nevertheless, a single-gene-pair ratio cannot cover all possible and obviously non-linear relationships between the genes and their associations with diseases, medical conditions and population variation. The mechanistic interpretation of the biological changes associated with single gene ratio tests is not clear. Thus, such signatures have practical limitations in the context of sensitivity and specificity. The robustness of such single gene-pair classifiers for prognosis has been the subject of intense debate in the literature [18].
Below we determine several practical challenges in the process of making therapeutic decisions for cancer patients, and specifically breast cancer patients, which include:
i) making therapeutic decisions within poorly differentiated (G3) tumors, especially within basal-like G3 breast tumors, remains a problem for clinical oncologists;
ii) basal-like breast cancers representing 15-20% of invasive breast cancers are poorly differentiated high grade (typically, G2 or G3) tumors which frequently do not express hormone ER-, PgR- and ERBB2-receptors and are considered to have the worst prognosis [19]. This subtype is genetically more homogenous than the triple-negative group (i.e., ER“−”, PgR“−”, HER2“−”) [20], and therefore, problematic for clinical prognosis and optimal treatment.
iii) luminal A breast cancers which express hormone receptors, have an overall good prognosis and can be treated by hormone therapy, nevertheless even within this group it is necessary to identify tumors that will relapse and metastasize and might be treated with chemotherapy;
iv) grade 1 (G1) and grade 1-like breast tumors (G1, G1-like) are considered to be the low-risk prognosis group which can routinely be determined by histological analysis. However, within this group there is a substantial chance of relapse and metastasis cases which might be treated with chemotherapy;
v) Relatively “good” prognosis group of breast tumors predominantly includes ER-positive (ER“+”) and lymph node negative (LN“−”) patients. However, within that group, a subset of patients still develops tumor recurrence after curative surgery and adjuvant tamoxifen systemic therapy [21].
The biological functions and molecular processes of a significant number of genes in the computationally derived molecular signatures have not been well characterized in many of the cancer sub-groups of interest (e.g. in G1 breast cancer), making the determination of personalized diagnostic or prognostic genes unattainable. Additionally, the functional interconnection of the genes in a signature (often derived computationally from limited genome-wide studies) in a given cancer subtype is poorly understood. At present, identification of molecular targets for therapeutic intervention is only cursorily considered in the computational strategies of prognostic gene signature discovery methods.
Novel integrative computational, genome-wide and biological mechanism-driven strategies for cancers promise to discover prognostic signatures that will provide oncologists with unbiased computational predictions and mechanistic interpretations of the pathobiological processes associated with the identified gene signatures, enabling decision making about tumor subtype classification, disease recurrence risk stratification and the most appropriate therapeutic strategy for a patient. In particular, G2 breast cancer patients have been re-classified into G1-like and G3-like subtypes according to the 5-gene tumor aggressiveness gene (TAG) signature [22], in which the genes are functionally associated with each other in the genome of breast cancer cells and play a critical role within the cell cycle, mitosis and kinetochore machineries. Only such an approach could permit an appropriate interpretation of the results and maximize the usefulness of the signature.
Sense-antisense gene pairs (SAGPs) are naturally occurring gene architectures in which paired genes are located on different strands of a chromosome, transcribed in opposite directions and share a common locus (overlapping region) [23] and, therefore, are functionally connected. Recent data indicate that the expression of member genes in SAGPs can be coordinated through specific molecular mechanisms which may not be applicable to gene pairs without sense-antisense overlaps [24, 25, 26, 27, 28]. It has been shown that antisense transcription and alternative splicing are tightly coordinated processes [25, 27, 29, 30, 31]. Recently Morrissy et al. [27] reported the role of SA overlapping regions in slowing down the Pol II complex and, as a consequence, increasing the alternative splicing rate at the same regions. Systematic changes/deregulation of co-expression profiles in such gene pairs have been shown to be directly or indirectly associated with pathogenesis of various cancers including breast, colon, lung, gastric and endometrial cancers as well as B-cell lymphomas and acute lymphoblastic leukemia [16, 23, 32, 33, 34]. Deregulation of the co-expression profile in such gene pairs could be a driver of cancer progression and a source for discovery of novel and distinct molecular subtypes of breast cancer and other cancers. Specific and systematic changes of gene expression in cancer-relevant SAGPs could be systematically exploited to detect and to monitor significant differences in tumor aggressiveness, to identify novel mechanistically relevant and robust biomarkers for those differences, and to make a prognosis/prediction of the clinical outcome of cancer patients.
Thus, cancer-relevant SAGPs could be utilized to predict patient risk groups and subgroups (in the context of survival time and/or disease progression) using respective gene expression values for these genes. The predicted groups could be further used for identification of specific and robust prognostic biomarkers with mechanistic interpretations of biological changes (e.g., associated with the SAGP signature) appropriate for therapeutic targeting.
Therefore, there is a continuing need in the art for systematic identification of cancer-relevant SAGPs coupled with their direct application in clinical practice.
In general terms, the present invention proposes a computerized method of identifying candidate biomolecules relevant to a medical condition, the candidate biomolecules being putative clinical biomarkers for prognosis of, or putative therapeutic targets for treating, the medical condition. The method comprises identifying a set of SAGPs which optimally stratifies low-risk and high-risk patient sub-populations, identifying genes amongst the SAGPs which are differentially expressed between the sub-populations, and identifying biologically significant genes amongst the differentially expressed genes found in the patient sub-populations. The SAGPs may be those listed in Tables 1A and 1B, for example, which are cis-anti-sense interconnected gene pairs.
The invention also provides methods and kits for prognosis of survival and/or treatment response, for example using the identified differentially significant genes belonging to specific biological mechanisms. Embodiments of the invention provide a computational method for identification of SAGPs which are relevant to variation in a medical condition and disease outcome, particularly breast cancer. Embodiments also provide an implementation of this method providing identification of statistically and biologically specific patient stratification and prognostic disease models via cancer-relevant small gene signatures (prognostic predictors). Such a strategy allows a mechanistic interpretation of pathobiological changes in the tumors and their subtypes associated with the deduced prognostic molecular signatures for patient stratification and prognosis, and identification of appropriate prognostic biomarkers for optimal therapeutic intervention.
In one aspect, the present invention provides a computerized method of identifying candidate biomolecules relevant to a medical condition, the candidate biomolecules being putative clinical biomarkers for prognosis of, or putative therapeutic targets for treating, the medical condition, the method comprising:
In another aspect, the present invention provides a computerized method of clinical outcome prognosis in a subject having a medical condition, the method comprising:
In a further aspect, the present invention provides a kit for predicting clinical outcome in a subject having a medical condition, the kit comprising: a plurality of polynucleotide sequences, ones of the plurality of polynucleotide sequences being capable of specifically hybridizing to and/or detecting a gene of a plurality of genes and/or an expression product of the gene to obtain respective gene expression values, wherein the plurality of genes comprises one or more of the sense-antisense gene pairs (SAGPs) listed in Table 1A, and written instructions for comparing, and/or a tangible computer-readable medium having stored thereon machine-readable instructions for causing a computer processor to compare, the respective gene expression values to optimal gene expression cut-off values, wherein the plurality of genes comprises no more than 100 genes; and wherein the optimal gene expression cut-off values are determined for each SAGP by:
In a yet further aspect, the invention provides a computerized method of composite survival prediction combining the output values from a plurality of SPMs associated with prognosis of a potentially fatal medical condition in each subject k of a set of K subjects suffering from the medical condition, each SPM being a model of the statistical significance of the expression level values of a corresponding set of one or more genes or gene pairs, the method employing test data which for each gene i of the pair of genes indicates a corresponding gene expression value yi,k of subject k;
In a still further aspect of the present invention, there is provided a method of prognosis of survival or treatment response in a subject suffering from breast cancer, comprising:
obtaining a test sample from the subject;
measuring a gene expression level in the test sample for one or more of the prognostic genes obtained according to the first or second aspects of the invention and listed in Table 11; and
In a still further aspect, the present invention provides a kit for prognosis of survival or treatment response in a subject having breast cancer, the kit comprising: at least one nucleic acid probe capable of specifically hybridizing to and/or detecting a gene of a plurality of genes and/or an expression product of the gene, wherein the plurality of genes comprises one or more of the genes listed in Table 11, and wherein the plurality of genes comprises no more than 200 genes.
In yet another aspect of the present invention, there is provided a system for identifying candidate biomolecules relevant to a medical condition, the candidate biomolecules being putative clinical biomarkers for prognosis of, or putative therapeutic targets for treating, the medical condition, the system comprising at least one processor and a tangible computer-readable storage medium having stored thereon machine-readable instructions which, when executed, cause the at least one processor to:
The method may include genome-wide screening and selection of a relatively large number of SAGPs (at least 50) to identify SAGPs which are significantly correlated with the medical condition and with survival/disease outcome data, and then use them to construct a statistics-based prognostic algorithm/method which can generate a most predictive statistical partition model (SPM) based on the estimated cut-offs of gene expression values of the SAGPs. The SAGPs for which the best SPM is found are then used for construction of the composite prognosis model (CPM) and stratification of the patients according to the estimated risk outcome.
Next, the method may use the patient classification provided by SAGP CPM for further identification of the specific and reliable differentially expressed genes (DEG) signature in context of discovery of mechanistically related biomarkers (e.g., spliceosome prognostic gene signature) including the genes which could be the most appropriate for therapeutic targeting.
In one embodiment, a method referred to herein as 2-Dimensional Rotated Data-Driven grouping (“2D RDDg”) is provided. In 2D RDDg, expression level values for two genes of a gene pair, expressed as points in a two-dimensional space spanned by the expression level values of a plurality of subjects, are compared to perpendicular cut-off lines which are iteratively rotated in the two dimensional space at a succession of incrementally different angles, performing stratification of the subjects into two subgroups (e.g. low- and high-risk) during each iteration, without losing their orthogonality property, to improve the quality of a statistical partition/dichotomization model in relation to a medical condition or a genetic or phenotypic variation.
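By way of non-limiting illustration only, the rotation-and-stratification loop described above may be sketched as follows. The sketch is not the claimed implementation: the function names, the "above both cut-off lines" partition design, the angular range and the placeholder quality measure (a difference in mean survival time) are all assumptions introduced for illustration, and a survival-based statistic could be substituted for the quality measure.

```python
import math

def rotate(x, y, theta):
    """Coordinates of (x, y) in axes rotated counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return x * c + y * s, -x * s + y * c

def stratify(points, cut_u, cut_v, theta):
    """Assumed design: label 1 when a subject lies above both rotated cut-off lines."""
    labels = []
    for x, y in points:
        u, v = rotate(x, y, theta)
        labels.append(1 if (u > cut_u and v > cut_v) else 0)
    return labels

def rddg(points, score, n_angles=18, max_angle=math.pi / 2):
    """Scan angles and cut-offs; return (best_score, theta, cut_u, cut_v, labels)."""
    best = (-math.inf, None, None, None, None)
    for k in range(n_angles):
        theta = k * max_angle / n_angles
        rotated = [rotate(x, y, theta) for x, y in points]
        # Candidate cut-offs are the observed coordinates along each rotated axis.
        for cut_u in sorted({u for u, _ in rotated}):
            for cut_v in sorted({v for _, v in rotated}):
                labels = stratify(points, cut_u, cut_v, theta)
                if 0 < sum(labels) < len(labels):  # both subgroups non-empty
                    value = score(labels)
                    if value > best[0]:
                        best = (value, theta, cut_u, cut_v, labels)
    return best

# Toy data (made up): expression of two paired genes and survival time per subject.
points = [(1, 1), (1, 2), (2, 1), (5, 5), (6, 5), (5, 6)]
times = [10, 11, 12, 2, 3, 1]

def mean_gap(labels):
    """Placeholder quality measure: mean survival of group 0 minus that of group 1."""
    hi = [t for l, t in zip(labels, times) if l == 1]
    lo = [t for l, t in zip(labels, times) if l == 0]
    return sum(lo) / len(lo) - sum(hi) / len(hi)

best_score, theta, cut_u, cut_v, labels = rddg(points, mean_gap)
print(labels, round(math.degrees(theta), 1))
```

Because the two cut-off lines are expressed as thresholds on the two rotated coordinates, they remain perpendicular at every angle, which corresponds to the orthogonality property mentioned above.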
In other embodiments, there is provided a computer-implemented method for identification of prognostic SAGPs, comprising: receiving expression data indicative of expression levels of a plurality of genes of a plurality of sense-antisense gene pairs (SAGPs) for a plurality of subjects; identifying, from the expression data, SAGPs for which expression levels of genes in respective pairs are significantly correlated with each other and with a survival or treatment outcome for a medical condition; and identifying a set of prognostically significant SAGPs from among the identified SAGPs using 2D DDg or 2D RDDg. Each of the prognostically significant SAGPs assigns (stratifies) each subject to a low- or high-disease development risk subgroup, refined by the 2D DDg or 2D RDDg method. The method may further comprise applying a weighted voting procedure to p-values of the prognostically significant SAGPs to the stratified subjects to obtain a weighted voting grouping for each subject.
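A weighted voting step of the kind mentioned above might, purely as an illustrative assumption, weight each SAGP's vote by the negative base-10 logarithm of its p-value and assign a subject to the high-risk subgroup when the weighted fraction of high-risk votes exceeds a threshold. The weighting formula, the threshold of 0.5 and the function name are hypothetical and are not taken from the present disclosure.

```python
import math

def weighted_vote(groupings, p_values, threshold=0.5):
    """Combine per-SAGP groupings (1 = high-risk, 0 = low-risk) into one label per subject.

    groupings[j][k] is the subgroup assigned to subject k by SAGP j;
    p_values[j] is the p-value of the partition produced by SAGP j.
    Assumed weighting: w_j = -log10(p_j), so more significant pairs count more.
    """
    weights = [-math.log10(p) for p in p_values]
    total = sum(weights)
    n_subjects = len(groupings[0])
    combined = []
    for k in range(n_subjects):
        share = sum(w * g[k] for w, g in zip(weights, groupings)) / total
        combined.append(1 if share > threshold else 0)
    return combined

# Toy example (made-up values): three gene pairs voting on three subjects.
votes = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
pvals = [1e-4, 1e-2, 1e-2]
print(weighted_vote(votes, pvals))
```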
Embodiments of the invention make it possible to extract SAGPs relevant to a medical condition such as cancer, or breast cancer, as well as their combinations which are highly prognostically significant within the diverse subgroups/subtypes of the medical condition.
A computational algorithm (2D RDDg) for patient grouping may be specifically adapted for the usage of those SAGPs and substantially improves the accuracy of stratification and prognosis of patients' outcome. Embodiments of the invention make it possible to substantially improve the accuracy of classification of any pathological samples using survival analysis.
Embodiments of the present invention also propose a sense-antisense gene classifier (SAGC), a complex biomarker consisting of a specific subset of gene pairs, to substantially improve the accuracy of classification of breast cancer tumors into low risk (LR) and high risk (HR) subgroups. This classifier either outperforms, or has comparable accuracy of stratification and clinical outcome prognosis to, currently known complex multi-gene biomarkers/classifiers and clinical tests/assays.
Specifically, embodiments of the present invention propose a new molecular classifier: a sense-antisense gene classifier (SAGC) which is composed of 12 distinct classification units—sense-antisense gene pairs (SAGPs) or 24 individual genes, correspondingly.
These gene pairs are shown in Table 1B below.
The molecular classifier can be used for stratification and prognosis/prediction of novel LR and HR subgroups within total unselected groups as well as within various characterized subgroups/subtypes of breast cancer. The classifier is demonstrated below to be of use for nine different subgroups/subtypes of breast tumors and for tumors of two other epithelial cancers: ER“+”, LN“−” breast tumors treated with tamoxifen; ER“+”, LN“−”, PgR“+” breast tumors with size not exceeding 2 cm before curative surgery that did not receive systemic treatment; grade 3 (G3) breast tumors; G3 and G3-like breast tumors; G1 and G1-like breast tumors; G1 breast tumors; ER“−” breast tumors; basal-like grade 3 breast tumors; luminal A breast tumors; colon cancer stage II tumors; and non-small cell lung cancer tumors. The proposed SAGC classifier substantially outperforms many of the currently known classifiers in accuracy. At the same time, the same set of gene pairs (and a multigene assay) can be used for various molecularly distinct subpopulations of breast tumors, which is not possible for any of the currently known classifiers. Therefore, the SAGC classifier is, to our knowledge, the first multitask complex multi-gene classifier of breast cancer ever proposed based on gene expression studies. We further expect that the classifier could be highly efficient in other subpopulations of breast tumors.
Typically, the classifier contains a core sense-antisense gene pair for a specific subpopulation of breast cancer under prognosis: for example, the SAGP (RNF139/TATDN1) for ER“+”, LN“−” breast cancer patients shows similar accuracy in prognosis of clinical outcome to the currently commercially available two-gene classifier HOXB13/IL17BR. In order to improve the accuracy of our classifier in each of the specific breast tumor subpopulations, additional gene pairs could be introduced in the classifier (maximum number of additional gene pairs: 11).
In the era of stratified and personalized medicine a cancer patient with a tumor categorized into a subpopulation or subtype of tumors distinct in terms of molecular etiology and/or patient survival would receive a distinct stratified/individual treatment scheme. This can optimize the ratio: treatment efficiency/life quality for each individual patient. In that context the routine and accurate identification of novel molecular subgroups within the known clinical/genetic subgroups and subtypes would be very helpful to achieve that important goal.
Embodiments of the invention will now be described, by way of non-limiting example only, with reference to the accompanying figures, in which:
As used herein, gene expression level value is a measure of expression activity of a gene by detection of mRNA and/or protein molecules in a given tissue sample.
As used herein, a combination refers to any association between or among two or more components. The combination can be two or more separate components, such as two compositions or two collections, can be a mixture thereof, such as a single mixture of the two or more items, or any variation thereof. The items of a combination are generally functionally associated or related.
As used herein, the term “comprising” is to be interpreted as specifying the presence of the stated features, integers, steps or components as referred to, but does not preclude the presence or addition of one or more features, integers, steps or components, or groups thereof. However, in context with the present disclosure, the term “comprising” also includes “consisting of”. The variations of the word “comprising”, such as “comprise” and “comprises”, have correspondingly varied meanings.
The term “gene pair” refers to a combination of two selected nucleic acid sequences. The two selected nucleic acid sequences can be two separate components, such as two compositions. For example, the two selected nucleic acid sequences may be immobilized at two discrete positions on a solid substrate. Correspondingly, a combination of gene pairs refers to at least two such gene pairs (i.e. at least four selected nucleic acid sequences). With a combination of two or more gene pairs, each selected nucleic acid sequence may be immobilized at discrete positions forming an array on a solid substrate.
The term “risk”, or “relative risk” refers to a measure of separability between two (or more) Kaplan-Meier survival curves related to the potentially fatal medical condition or disease.
The term “statistical partition model (SPM)” defines cut-off values of gene expression level values (low or high) and typically also other necessary parameters (e.g., partition design, rotation angle (see Methods)) for a gene or a gene pair in a given group of tumor samples (obtained from distinct patients) and stratifies them into subgroups with, respectively, a relatively high risk and a relatively low risk of a potentially fatal medical condition.
The term “medical condition associated feature” refers to any gene product, e.g. mRNA (gene expression values detectable by micro-array, PCR-based assays, or other mRNA quantification techniques such as massively parallel sequencing) or protein (detected by immuno-staining, mass-spectrometry, etc.), or any other quantitative feature (e.g. clinical classification score) useful for discrimination between different states or degrees of a medical condition, and may include combinations of such features (e.g. a ratio of the RNA expression levels, produced by a given gene set, expressed in the same tissue or tissues of a given patient).
The term “prognostic method”, as used herein, refers to a stratification of patients with a medical condition (e.g. cancer) into two (or more) survival significant sub-groups via any “process of optimization”, including (but not limited to) (i) a rank-ordering of the patients with a given medical condition according to a medical condition associated feature value (e.g., gene expression value) of a training data set and (ii) an identification of cut-off value(s), splitting this feature value into two (or more) grades which, via a survival prediction model (e.g., Data Driven grouping (DDg)), assign the patients with such medical condition to one of the statistically distinct disease development risk sub-groups.
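As a hedged illustration of steps (i) and (ii), a one-dimensional cut-off search could be sketched as below. The choice of a two-group log-rank chi-square statistic as the optimization criterion, the minimum subgroup size and the function names are assumptions made for this sketch; the disclosure does not state that DDg is implemented in this way.

```python
import math

def logrank_chi2(times, events, labels):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)  # subjects at risk at time t
        n1 = sum(1 for ti, l in zip(times, labels) if ti >= t and l == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, l in zip(times, events, labels)
                 if ti == t and e and l == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of d1 given the margins
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0

def ddg(expr, times, events, min_group=2):
    """Scan observed expression values as cut-offs; return (best_cutoff, best_chi2)."""
    best_cut, best_stat = None, -1.0
    for cut in sorted(set(expr)):
        labels = [1 if x > cut else 0 for x in expr]
        n_high = sum(labels)
        if min(n_high, len(labels) - n_high) < min_group:
            continue  # skip partitions with a very small subgroup
        stat = logrank_chi2(times, events, labels)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat

# Toy data (made up): expression value, follow-up time and event flag per patient.
expr = [1, 2, 3, 4, 5, 6, 7, 8]
times = [20, 18, 19, 17, 5, 4, 6, 3]
events = [1] * 8

cutoff, chi2 = ddg(expr, times, events)
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 d.o.f.
print(cutoff, round(chi2, 3), p_value)
```

Because the cut-off is selected to maximize the statistic, the resulting p-value is optimistically biased, which is one reason such a cut-off would normally be validated on independent data.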
The method of “composite survival prediction” (CSP) refers to the group of prognostic methods which integrates the information for individual features (e.g., genes or gene pairs expression signals) into a significantly improved integrated partition of the patients. CSP includes, but is not limited to, Weighted Voting Grouping (WVG), Hierarchical Clustering Analysis (HCA) and Principal Component Analysis (PCA).
The term “disease prognosis model” (DPM) refers to a mathematical model of optimization procedure of the patient stratification into low-risk and high-risk subgroups implemented through the use of any of SPMs and any of methods of CSP. For a given patient, DPM with the most appropriate SPMs and CSP (optimized using training dataset(s)) is used for prognosis/prediction of patient “relative risk” and/or clinical outcome.
As used herein, “differentially expressed” means that a gene is expressed differently, for example in mRNA level, in two or more given samples or groups of samples. The gene may be determined to be differentially expressed by any method known in the art, for example by applying a fold-change threshold for the relative expression level or relative mean expression level in the two samples, or by a parametric or non-parametric statistical testing procedure such as a t-test (including a moderated t-test such as that disclosed in [35]), or for digital gene expression measurement platforms such as mRNA-Seq, Fisher's exact test or likelihood ratio statistics based on a generalized linear model (see, for example, Bullard, J. H. et al, [36] and references cited therein).
The term “original/total group of BC patients” refers to the entire cohort of patients from a given clinical center or hospital without any preselecting by clinical and pathological parameters or conventional clinical biomarker (e.g., ER-status, Histological grade, Ki67 etc.).
The term “Functional gene annotation/Gene Ontology” refers to the bioinformatics project providing ontology of defined terms representing genes and their product properties and covering three gene ontology classes: cellular component, molecular function and biological process.
Functional Gene Annotation/Gene Ontology Enrichment Analysis (FGA/GO EA) refers to a procedure for estimating whether certain Functional Gene Annotation/Gene Ontology categories or terms are present in a gene list in higher numbers than would be expected by chance, using a statistical test as known in the art (e.g., Fisher's exact test, or a hypergeometric test, with p-values adjusted using a multiple-testing correction method such as the Holm-Bonferroni method, or a method of controlling the false discovery rate, such as the Benjamini-Hochberg procedure).
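One possible realization of such an enrichment analysis, given as a minimal sketch only, uses a hypergeometric upper-tail test followed by Benjamini-Hochberg adjustment. The function names, the input layout (a mapping from term to annotated genes) and the toy gene identifiers are assumptions introduced for illustration.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted

def enrichment(gene_list, background, term_to_genes):
    """Return {term: (p_value, adjusted_p_value)} for every annotated term."""
    genes, universe = set(gene_list) & set(background), set(background)
    terms = sorted(term_to_genes)
    pvals = []
    for term in terms:
        annotated = set(term_to_genes[term]) & universe
        hits = len(annotated & genes)
        pvals.append(hypergeom_sf(hits, len(universe), len(annotated), len(genes)))
    return dict(zip(terms, zip(pvals, benjamini_hochberg(pvals))))

# Toy input (made-up identifiers): 20 background genes, 4 genes of interest.
background = [f"g{i}" for i in range(20)]
terms = {"term_A": ["g0", "g1", "g2"], "term_B": ["g10", "g11", "g12", "g13"]}
print(enrichment(["g0", "g1", "g2", "g15"], background, terms))
```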
The term “polynucleotide sequence” refers to a sequence of nucleotides in a biopolymer composed of 13 or more nucleotide monomers covalently bonded in a chain.
As used herein, the term “oligonucleotide” refers to a short single-stranded nucleic acid biopolymer (typically from 2 to 100 bases) composed of nucleotides and used for artificial gene synthesis, DNA sequencing, as molecular hybridization probes at discrete positions on a solid substrate, and for polymerase chain reaction (PCR).
The term “oligonucleotide sequence” refers to a sequence of nucleotides in an oligonucleotide.
Accordingly, an array refers to a plurality of biological molecules (e.g., oligonucleotides, polypeptides, antibodies, etc.) immobilized at discrete positions on a solid substrate. Typically, the position of each molecule in the array is known, so as to allow for identification of a target molecule in a sample following analysis.
As used herein, the term “microarray” refers to a substrate comprising a plurality of biological macromolecules (e.g., proteins, polypeptides, nucleic acids, antibodies, etc.) affixed to its surface. In some embodiments, the location of each of the macromolecules in the microarray is known, so as to allow for identification of the samples following analysis.
The term “DNA microarray” refers to a solid support platform (nylon membrane, glass or plastic) on which single stranded DNA is printed or otherwise affixed (for example, as part of a masked or maskless photolithographic fabrication process) in localized features (e.g. nucleic acid probes or probesets for detecting gene expression) that are arranged in a regular grid-like pattern.
The term “reverse transcription polymerase chain reaction” refers to the method used to quantitatively detect gene expression through creation of complementary DNA from transcribed RNA.
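The enrichment test named in the FGA/GO EA definition above can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test followed by the Benjamini-Hochberg adjustment; the function names `hypergeom_sf` and `benjamini_hochberg` and the argument layout are illustrative choices, not part of the described method.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted
```

In use, one p-value would be computed per annotation category with `hypergeom_sf` and the resulting list passed to `benjamini_hochberg`.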
Herein, we deal with only one essential subclass of SAGPs, in which each gene partner can encode a protein (coding-coding SAGPs, ccSAGPs). The genes of ccSAGPs are numerous in the genome, relatively highly expressed in cancer cells and better annotated than other classes of SAGPs (non-coding-coding or non-coding-non-coding SAGPs). Besides, in ccSAGPs the expression patterns of both gene partners could be mutually regulated, affecting the levels of their protein products, with a presumably stronger combined impact on the cell's fate.
A first step (step 1 in
Step 1.1. All ccSAGPs from publicly available annotation databases (e.g., USAGP database [29]) are identified by (manually and/or automatically) searching the databases;
Step 1.2. Gene pairs identified in step 1.1 are screened to select BCR-SAGPs. This step may be done using the criterion of significant Kendall tau correlation (p<0.05), which assumes that if the expression levels of the genes in a sense-antisense gene pair are significantly correlated across patients, they could be co-regulated by common biological/molecular mechanism(s). This step is done in at least three independent cohorts to ensure the robustness of the selected gene set. Selection of ccSAGPs with significant correlations is done within already characterized subgroups and subtypes of breast tumors (e.g., grade 3 tumors of basal-like subtype, or grade 3 tumors of non-basal-like subtypes) in order to minimize the effect of false-positive correlations and the fraction of less relevant gene pairs. Correlation analysis is performed for each cohort and each subgroup, to produce a respective set of ccSAGPs with significant correlations between the gene partners of each ccSAGP, and the ccSAGPs common to the sets found across the cohorts are then selected.
In one example, we selected the robust set of 73 BCR-SAGPs (Table 1A) within the groups of patients with Grade 3 tumors of basal-like subtype and within the combined groups of patients with Grade 3 tumors of “non-basal-like” subtypes (ERBB2-enriched+Luminal A+Luminal B+Normal-like subtypes) from 3 independent breast cancer cohorts (Uppsala, Stockholm and Harvard 1).
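The correlation screen of step 1.2 can be sketched as follows. This is a minimal illustration, assuming tie-free expression values, Kendall's tau-a and a normal approximation for the p-value; the data layout (a dictionary per cohort keyed by gene pair) and the names `kendall_tau` and `robust_pairs` are hypothetical.

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau-a and a two-sided p-value (normal approximation, no ties)."""
    n = len(x)
    s = 0
    for a, b in combinations(range(n), 2):
        prod = (x[a] - x[b]) * (y[a] - y[b])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    tau = s / (n * (n - 1) / 2)
    # Without ties, Var(S) = n(n-1)(2n+5)/18 under independence.
    z = s / math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    return tau, math.erfc(abs(z) / math.sqrt(2))

def robust_pairs(cohorts, alpha=0.05):
    """Gene pairs whose partners are significantly correlated in every cohort.

    cohorts: list of dicts mapping (gene_1, gene_2) -> (x, y), where x and y
    are per-patient expression lists (an assumed layout).
    """
    kept = None
    for cohort in cohorts:
        significant = {pair for pair, (x, y) in cohort.items()
                       if kendall_tau(x, y)[1] < alpha}
        kept = significant if kept is None else kept & significant
    return kept or set()
```

The set intersection mirrors the requirement that a pair be significant in each cohort and subgroup analysed, not merely in a pooled data set.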
Steps 2-6. Screening and validation of gene pairs to select synergistic survival significant ccSAGPs (referred to herein as 3S-SAGPs). This may be done using the criteria of survival significance (Wald p<0.05).
Step 2 is to perform survival analysis of the ccSAGPs obtained in step 1. The survival analysis procedure we developed for this proposal pre-selects synergistic survival significant ccSAGPs and uses a combination of the 1-D DDg and 2-D DDg procedures. The 2-D DDg method is used to pre-select survival significant ccSAGPs and, within the pre-selected ccSAGPs, the 1-D DDg method is used to select 3S-SAGPs.
The 2-D DDg method is itself an extension of an algorithm known as the one-dimensional (1-D) DDg method [37]. The 1-D DDg method associates clinical data with single gene expression data, available for a set of K patients suffering from a medical condition, via survival analysis with the Cox proportional hazards model. We denote the clinical and gene expression data for each patient k=1, . . . , K as (tk, ek, yi,k), where tk indicates the survival time, ek is a binary indicator of the status of patient k at time tk (e.g. ek=1 if relapse occurs and 0 otherwise) and yi,k is the expression value of gene i, i=1, . . . , N. The 1-D DDg method finds for each gene i an optimal cut-off value ci that partitions the K subjects into those with expression values (or log transformed expression values) above and below the threshold. The 1-D DDg method tries out a number of trial values for ci and, for each trial value, it finds the subset of the K subjects such that yi,k is above the trial value of ci; the covariate xki indicates whether subject k belongs to that subset. The survival times/events are fitted to a Cox proportional hazards regression model,
$$\log h_k^i(t_k \mid x_k^i, \beta_i) = \alpha_i(t_k) + \beta_i x_k^i \qquad (1)$$
using a regression parameter βi corresponding to the gene i, and then the regression parameter βi is used to obtain a Wald p-value (significance value) indicative of the prognostic significance of the gene, using
where $\chi_v^2$ denotes the chi-square distribution with v degrees of freedom. The algorithm then finds the trial value of ci for which the statistical significance is maximized (i.e. the Wald p-value is minimized). This gives the cut-off value ci for which gene i has maximal prognostic significance. The algorithm can then estimate which genes are associated with the medical condition: the ones for which the maximum prognostic significance is highest.
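The cut-off scan of the 1-D DDg method can be sketched as follows. This is a minimal sketch, not the implementation of Reference [37]: it assumes a single binary covariate that equals 1 when the expression is at or above the trial cut-off, Breslow handling of tied event times, a Newton-Raphson fit, and a Wald test on one degree of freedom. The names `cox_wald` and `ddg_1d` and the simple quantile rule are illustrative.

```python
import math

def cox_wald(times, events, x, max_iter=50):
    """Fit a one-covariate Cox model by Newton-Raphson; return (beta, Wald p).

    A flat or monotone partial likelihood is reported as p = 1.
    """
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for t_k, e_k, x_k in zip(times, events, x):
            if not e_k:
                continue  # censored subjects enter only through risk sets
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_k]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            mean = sum(w_j * x_j for w_j, x_j in zip(w, risk)) / s0
            second = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk)) / s0
            score += x_k - mean
            info += second - mean * mean
        if info <= 1e-12:
            return beta, 1.0
        step = max(-5.0, min(5.0, score / info))  # clamp to avoid overflow
        beta += step
        if abs(step) < 1e-8:
            break
    z = beta * math.sqrt(info)  # beta / se, with se = 1 / sqrt(info)
    return beta, math.erfc(abs(z) / math.sqrt(2))  # P(chi-square_1 >= z^2)

def ddg_1d(times, events, expr, q_lo=0.10, q_hi=0.90):
    """Scan trial cut-offs between two quantiles; return (best_cutoff, best_p)."""
    ranked = sorted(expr)
    n = len(ranked)
    lo, hi = ranked[int(q_lo * (n - 1))], ranked[int(q_hi * (n - 1))]
    best = (None, 1.0)
    for c in sorted({v for v in expr if lo < v < hi}):
        x = [1 if v >= c else 0 for v in expr]
        p = cox_wald(times, events, x)[1]
        if p < best[1]:
            best = (c, p)
    return best
```

Because the cut-off is chosen to minimize the p-value, the reported value is optimistic on the training data, which is one reason the selection is repeated across independent cohorts.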
The 2-D DDg method [37] extends this idea to gene pairs, assuming that in some situations the expression values of individual genes organized in 2-dimensional space as gene pairs may provide a better statistical partition model of survival prognosis than the expression levels of individual genes organized in 1-dimensional space. A pair of genes is labeled i,j. The method uses a number of “designs” (models) illustrated in
A: $y_{i,k} < c_i$ and $y_{j,k} < c_j$
B: $y_{i,k} \geq c_i$ and $y_{j,k} < c_j$
C: $y_{i,k} < c_i$ and $y_{j,k} \geq c_j$
D: $y_{i,k} \geq c_i$ and $y_{j,k} \geq c_j$ (2)
Each of the seven models is then defined as a respective selection from among the four regions:
Design 1 indicates whether the subject's expression levels are within regions A or D, rather than B or C.
Design 2 indicates whether the subject's expression levels are within regions A, B or C, rather than D.
Design 3 indicates whether the subject's expression levels are within regions A, C or D, rather than B.
Design 4 indicates whether the subject's expression levels are within regions B, C or D, rather than A.
Design 5 indicates whether the subject's expression levels are within regions A, B or D, rather than C.
Design 6 indicates whether the subject's expression levels are within regions A or C, rather than B or D.
Design 7 indicates whether the subject's expression levels are within regions A or B, rather than C or D.
Note that design 6 is equivalent to asking only whether the expression level of gene 1 in the subject is below or above c1 (i.e. it assumes that the expression value of gene 2 is not important). Design 7 is equivalent to asking only whether the expression level of gene 2 in the subject is below or above c2 (i.e. it assumes that the expression value of gene 1 is not important). Thus, designs 1-5 are referred to as “synergetic”, and designs 6 and 7 as “independent”.
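The four regions and seven designs listed above can be written as a small lookup, shown here as an illustrative sketch. The names `DESIGNS`, `region` and `design_indicator` are hypothetical; the region boundaries follow Eqn. (2), and the indicator is taken to be 1 when the subject falls in the first-named set of regions of the design.

```python
# Regions whose members receive indicator value 1 under each design.
DESIGNS = {
    1: {"A", "D"},
    2: {"A", "B", "C"},
    3: {"A", "C", "D"},
    4: {"B", "C", "D"},
    5: {"A", "B", "D"},
    6: {"A", "C"},
    7: {"A", "B"},
}

def region(y_i, y_j, c_i, c_j):
    """Region A-D of a subject, as defined in Eqn. (2)."""
    if y_i < c_i:
        return "A" if y_j < c_j else "C"
    return "B" if y_j < c_j else "D"

def design_indicator(y_i, y_j, c_i, c_j, design):
    """1 if the subject's expression pair meets the conditions of the design."""
    return int(region(y_i, y_j, c_i, c_j) in DESIGNS[design])
```

With this encoding, design 6 reduces to the test y_i < c_i and design 7 to the test y_j < c_j, which is why those two designs are the “independent” ones.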
The 2-D DDg algorithm considers all pairs of genes (i, j) in turn. For each pair, it considers each of the seven designs. For each design, it obtains a unique grouping of the patients. For example, for design 1, the following grouping is obtained: patients with expressions (yi,k, yj,k) falling in A and D belong to Group 1; patients with expressions (yi,k′, yj,k′) falling in B and C belong to Group 2. Thus in Group 1 are the subjects with yi,k<ci and yj,k<cj, or with yi,k≧ci and yj,k≧cj. Let us define a parameter xi,j,km, where xi,j,km=1 if and only if, for genes i and j, and design m (m=1, . . . , 7), the expression levels yi,k and yj,k meet the conditions of design m. The algorithm then fits the survival values to the Cox proportional hazards model:
$$\log h_{i,j}^{k}(t_k \mid x_{i,j,k}^{m}, \beta_{i,j}^{m}) = \alpha_{i,j}(t_k) + \beta_{i,j}^{m} \cdot x_{i,j,k}^{m} \qquad (3)$$
and finds the design with the smallest Wald p-value for βi,jm (i.e. the highest statistical significance). The algorithm then seeks the pairs of genes for which this p-value is the smallest. Thus the algorithm finds both a significant pair of genes and a design indicating which form of relationship between the genes' expression levels is statistically significant for the medical condition.
Note that
Step 3 is performed in order to select the highly robust synergistic survival significant ccSAGPs and utilizes another survival analysis procedure which is an extension of the 2-D DDg method [37], adapted to any correlated gene pairs (including ccSAGPs and other subclasses of sense-antisense transcripts and gene pairs). The extension is termed “2-D Rotated Data-Driven grouping” (2-D RDDg).
The rotated 2-D Data-Driven grouping (2-D RDDg) is a generalization of the 2-D DDg algorithm that considers patients' grouping using different angles for separating the data. In other words, the original X, Y axes are iteratively rotated by angle α, without losing their orthogonality property, and in each rotation the patients are grouped as before. The best grouping is the one that minimizes the Wald P value of the β coefficient of the Cox proportional model.
Note that instead of rotating (transforming) the data by using trigonometric functions:
where X′, Y′ and X, Y denote the new and the old coordinates, respectively, the algorithm is preferably implemented by rotating the axes themselves. In fact, these two possibilities are equivalent mathematically, but it is conceptually easier for a viewer to see different grouping patterns when the axes are rotated.
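The equivalence between rotating the data and rotating the axes can be illustrated with a short sketch. The sign convention used here (axes rotated counter-clockwise by the angle alpha) is an assumption, as are the names `to_rotated_axes` and `quadrant`; the region labels follow Eqn. (2), applied along the rotated axes.

```python
import math

def to_rotated_axes(x, y, alpha):
    """Coordinates (x', y') of point (x, y) in axes rotated counter-clockwise by alpha."""
    c, s = math.cos(alpha), math.sin(alpha)
    return x * c + y * s, -x * s + y * c

def quadrant(x, y, c_x, c_y, alpha=0.0):
    """Region A-D of a point, with the cut-offs applied along the rotated axes."""
    xr, yr = to_rotated_axes(x, y, alpha)
    if xr < c_x:
        return "A" if yr < c_y else "C"
    return "B" if yr < c_y else "D"
```

Because the transformation is a rigid rotation, distances between patients are preserved and only the orientation of the two separating lines changes, so a point may move from one region to another as alpha varies.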
The steps of an implementation of the 2-D RDDg algorithm are as follows. Assume that, for each of a number of subjects k=1, . . . , K, expression level data exists for each of n gene pairs, where n is at least 10, or much higher.
1. A pair of genes is generated, and considered as a probeset pair denoted by i,j where i takes values in the range 1, . . . , N−1, and j takes values in the range i+1, . . . , N. For each probeset of the pair, form the candidate cut-off vectors $\vec{w}_i = y_i^*$ and $\vec{w}_j = y_j^*$ of dimension 1×Q each, where Q is an integer. The values of $\vec{w}_i$ are expression levels for gene i falling into $(q_{10}^i, q_{90}^i)$, i.e. the range of values between the 10th and 90th quantiles of the distribution of the log-transformed intensities. Similar logic holds for $\vec{w}_j$. We generate all $Q^2$ trial cut-off pair values within the predefined quantiles. Thus, each pair of elements of $(\vec{w}_i, \vec{w}_j)$ is a trial cut-off pair value for gene pair i, j.
For 1-D DDg, the value of Q depends on the sample size. In the Stockholm cohort we have 159 samples (patients) and within the (q10i, q90i) interval there are approximately Q=120 patients. In the Uppsala cohort, Q is approximately 220.
For 2-D DDg, we need all possible pairs, so the number of trial cut-off pairs is 120*120 in the Stockholm cohort (all 120 values of gene i for all 120 values of gene j) and, similarly, 220*220 in the Uppsala cohort. So, there is no standard Q value. It is determined from the data. The only standard setting for this algorithm is that we always take the 10th and 90th quantiles of the distribution of the expression levels.
Optionally, a “filtration step” is performed in which the algorithm finds which of the Q trial cut-off values in $\vec{w}_i$ produces the global minimum P value in a 1-D DDg algorithm (i.e. each trial cut-off value is used to partition the patients, and the result is fitted to Eqn. (1)), and a number (e.g. 10) of other trial cut-off values having the next lowest P values. Then, the Q-dimensional vector of cut-offs for gene i is replaced by a vector having only these cut-off values. The filtration can do the same for $\vec{w}_j$. Subsequently, only the “filtered” cut-off pairs are considered in the 2-D version of the algorithm.
2. Denote each element of $\vec{w}_i$ as $w_{z_i}^{i}$. Similarly for $\vec{w}_j$. For zi=1 and zj=1 (the first elements of $\vec{w}_i$ and $\vec{w}_j$), and for design 1 in
$$\log h_{i,j}^{k}(t_k \mid x_{i,j,k}^{m}, \beta_{i,j}^{m}) = \alpha_{i,j}(t_k) + \beta_{i,j}^{m} \cdot x_{i,j,k}^{m} \qquad (4)$$
which is the same as Eqn. (3) above. This is iterated for each of the other six designs of
3. Iterate for all combinations of $\vec{w}_i$ and $\vec{w}_j$ cut-offs, to find the design and the cut-off values giving the highest statistical significance (i.e. the lowest p-value).
4. For each of a number of values s=1, . . . , S, define a corresponding angle αs. These angles are spaced apart by a regular amount such as π/32. For each value of s, rotate each of the X, Y axes by angle αs. This is illustrated in
5. Iterate the above steps for all i and j combinations of the N genes (i=1, . . . , N−1, j=i+1, . . . , N). Optionally, this may be performed only for sense-antisense gene pairs. Pairs of genes for which the result of step 4 is most significant are identified.
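Steps 1 to 4 above, for a single gene pair, can be sketched as a search over rotation angles, cut-off pairs and designs. This is an illustration under stated assumptions: the data (rather than the axes) are rotated, the cut-offs are drawn from the rotated coordinates, the angles are multiples of π/32 starting at zero, and the significance is supplied by a caller-provided function `p_value` (for example a Cox model Wald test) so that the sketch stays self-contained. All names are hypothetical.

```python
import math
from itertools import product

DESIGNS = {1: {"A", "D"}, 2: {"A", "B", "C"}, 3: {"A", "C", "D"},
           4: {"B", "C", "D"}, 5: {"A", "B", "D"}, 6: {"A", "C"}, 7: {"A", "B"}}

def candidate_cutoffs(values, q_lo=0.10, q_hi=0.90):
    """Observed values strictly between the 10th and 90th quantiles."""
    ranked = sorted(values)
    n = len(ranked)
    lo, hi = ranked[int(q_lo * (n - 1))], ranked[int(q_hi * (n - 1))]
    return sorted({v for v in values if lo < v < hi})

def region(u, v, c_u, c_v):
    """Region A-D of a point relative to a cut-off pair."""
    if u < c_u:
        return "A" if v < c_v else "C"
    return "B" if v < c_v else "D"

def rddg_pair(y_i, y_j, p_value, angles=None):
    """Return (p, angle, design, c_u, c_v) with the lowest p for one gene pair.

    p_value: callable mapping a 0/1 group indicator list to a p-value.
    """
    if angles is None:
        angles = [s * math.pi / 32 for s in range(32)]
    best = None
    for alpha in angles:
        cos_a, sin_a = math.cos(alpha), math.sin(alpha)
        u = [a * cos_a + b * sin_a for a, b in zip(y_i, y_j)]
        v = [-a * sin_a + b * cos_a for a, b in zip(y_i, y_j)]
        for c_u, c_v in product(candidate_cutoffs(u), candidate_cutoffs(v)):
            regions = [region(a, b, c_u, c_v) for a, b in zip(u, v)]
            for design, members in DESIGNS.items():
                x = [int(r in members) for r in regions]
                p = p_value(x)
                if best is None or p < best[0]:
                    best = (p, alpha, design, c_u, c_v)
    return best
```

Calling `rddg_pair` with `angles=[0.0]` reduces the search to the unrotated 2-D DDg case, which makes it straightforward to compare the two procedures on the same gene pair. The optional filtration step would shorten the two cut-off lists before the inner loop.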
This 2-D RDDg method has a higher accuracy in grouping of patients using ccSAGPs than the 2-D DDg method because it accounts for the significant positive correlations typical of the gene members of BCR-SAGPs. Also, it makes it possible to select better partitions of breast cancer patients into low-risk and high-risk subgroups. This is illustrated by
Step 3 is performed for multiple cohorts of subjects (in our experiment, for two cohorts: the Uppsala and the Stockholm cohorts), to obtain respective sets of pairs of genes which are robustly survival significant using the 2-D RDDg method. Step 3 is composed of steps 3.1 and 3.2. In step 3.1, the designs, rotation angles and cut-offs are chosen (to have the lowest Wald p-values for each pair) which are optimal for all cohorts analysed and, therefore, can be more robust. We also call this step the training step.
Step 3.2 includes application of the 1-D DDg algorithm to each of the gene members of BCR-SAGPs within the total groups of breast cancer patients in order to estimate the Wald p-value for each of the individual genes composing the ccSAGPs. Finally, those gene pairs are chosen which show a lower synergistic 2-D RDDg Wald p-value as compared with the 1-D DDg p-values for the individual genes in all analysed cohorts (in our experiment, two cohorts). Therefore, typically, the number of survival significant ccSAGPs is expected to be smaller after step 3.2 than the total number of survival significant pairs extracted by applying 2-D RDDg at step 3.1.
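The selection rule of step 3.2 can be sketched as a simple filter. The nested-dictionary layout and the name `select_synergistic` are assumptions made for illustration; the rule itself (pair p-value lower than both single-gene p-values in every cohort) follows the text above.

```python
def select_synergistic(pair_p, gene_p):
    """Keep gene pairs whose 2-D p-value beats both 1-D p-values in all cohorts.

    pair_p: {cohort: {(gene_1, gene_2): 2-D RDDg Wald p-value}}
    gene_p: {cohort: {gene: 1-D DDg Wald p-value}}
    """
    cohorts = list(pair_p)
    if not cohorts:
        return []
    # Only pairs evaluated in every cohort can pass the test.
    common = set.intersection(*(set(pair_p[c]) for c in cohorts))
    kept = []
    for pair in sorted(common):
        g1, g2 = pair
        if all(pair_p[c][pair] < min(gene_p[c][g1], gene_p[c][g2])
               for c in cohorts):
            kept.append(pair)
    return kept
```

A pair that is synergistic in one cohort only is discarded, which is why the number of retained pairs is expected to fall after this step.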
Step 4 applies the Statistically Weighted Voting Grouping (WVG) procedure to integrate the survival information for the individual gene pairs into a dramatically improved patient partition. Because the finally selected set of 3S-SAGPs showed a highly significant integrated patient partition at step 4, we named this set of gene pairs the putative sense-antisense gene classifier (SAGC). The gene pairs composing it are shown in Table 1B. Table 2 shows the p-values for the individual genes and gene pairs listed in Table 1B, to demonstrate that the test of step 3.2 was passed (refer to the first three columns under each of the headings “Stockholm cohort” and “Uppsala cohort”). The integrated WVG Wald p-values (Table 2), which are much lower than any of the 2-D RDDg p-values, indicate that step 4 was passed as well.
Table 1B gives the host genes, Affymetrix probe sets and representative RNA transcripts for the SAGC. The best RNA ID corresponding to each Affymetrix probeset has been chosen. Priority for selection was as follows: a) best ID by chromosome coordinates; b) by type of ID: first, well characterized RefSeq NM IDs, then RefSeq mRNA IDs and, finally, EST IDs. Three entries are indexed by superscripts in Table 1B: 1, a paired transcript located on the same strand as the NPC1 gene but within the territory of the C18orf8 gene; 2, a putative 14 kD protein containing SHMT homology, clone pUS1215 from breast cancer cell line ZR-75-1; 3, a fetal brain EST from cDNA clone FCBBF3000065.
Importantly, to our knowledge, none of the gene pairs composing SAGC have been suggested to be involved in breast cancer, though as individual genes, twelve out of twenty four genes composing SAGC have been reported as associated with various cancers (Table 8). That fact highlights the novelty of our approach.
Selection of synergistic SAGPs assumes that classification of breast tumors using such gene pairs is more efficient than classification using individual genes composing ccSAGP, therefore, such gene pairs can be considered as distinct classification modules in further analyses. Thus, referring to
Steps 4 and 6 of
The WVG step allows integration of the grouping information for 12 gene pairs into a dramatically improved integrated grouping. In table 2, the numbers in the columns LR subgroup and HR subgroup are the number of individuals in these cohorts in each of the groups. The numbers were produced by RDDg, without use of the WVG step.
Step 5 of
We now turn to
Step 7 is training and testing of the SAGC classifier for each new subpopulation or subtype of breast tumor, and comprises sub-steps 7.1 and 7.2. Sub-step 7.1 is selection of the best design, the best rotation angle and gene expression cut-offs for each of the 12 pairs of genes using the 2-D RDDg algorithm with consequent WVG procedure. The procedure is the same as in steps 3 and 4 (
For example, within G3, G3-like breast tumors, application of the full SAGC leads to a substantially better patients partition into high-risk and low-risk subgroups (
The rest of
Step 8. A method for stratification and prediction of clinical outcome of ER“+”, LN“−” breast cancer patients who received adjuvant systemic tamoxifen treatment after curative surgery using the two-gene (SAGP) classifier RNF139/TATDN1. The results are shown in
Step 9. A method for stratification and prediction of clinical outcome of ER“+”, LN“−” breast cancer patients who received adjuvant systemic tamoxifen treatment after curative surgery using the SAGC classifier (12 gene pairs, 24 genes). The results are shown in
Step 10. A method for stratification and prognosis of clinical outcome of breast cancer patients with grade 3 tumors using VPRBP/RBM15B SAGP (Table 5) as well as the full SAGC classifier (12 gene pairs, 24 genes). The results are shown in
Step 11. A method for stratification and prognosis of clinical outcome of breast cancer patients with grade 3 and grade 3-like tumors using SAGPs C18orf8/NPC1 and EME1/LRRC59 (Table 5) as well as the full SAGC classifier (12 gene pairs, 24 genes). The results are shown in
Step 12. A method for stratification and prognosis of clinical outcome of breast cancer patients with grade 1 and grade 1-like tumors using SHMT1/SMCR8 SAGP (Table 5) as well as the full SAGC classifier (12 gene pairs, 24 genes). The results are shown in
Step 13. A method for stratification and prognosis of clinical outcome of breast cancer patients with grade 1 breast tumors using the full SAGC classifier (12 gene pairs, 24 genes). The results are shown in
Step 14. A method for stratification and prognosis of clinical outcome of ER“−”, breast cancer patients from total unselected groups using the CTNS/TAX1BP3 SAGP (Table 5) as well as the full SAGC classifier (12 gene pairs, 24 genes). The results are shown in
Step 15. A method for stratification and prognosis of clinical outcome of breast cancer patients with basal-like grade 3 (G3) breast tumors using the SAGPs CTNS/TAX1BP3 and RNF139/TATDN1 (Table 5) as well as the full SAGC classifier (12 gene pairs, 24 genes). It includes estimation of the optimal cut-offs for expression values for each of the twenty four genes, the optimal designs and rotation angles using the 2-D RDDg algorithm for all the genes in one training cohort composed of at least 50 breast cancer patients with consequent testing in at least one cohort composed of at least 50 patients. The optimal classification parameters for all 12 ccSAGPs are presented in Table 7, G. Reference [42] addressed the same problem using a 14-gene signature (14 genes), and Reference [15] addressed it using a 28-kinase metagene classifier (28 genes).
Step 16. A method for stratification and prognosis of clinical outcome of breast cancer patients with Luminal A breast tumors using the BIVM/KDELC1 SAGP (Table 5) as well as the full SAGC classifier (12 gene pairs, 24 genes). It includes estimation of the optimal cut-offs for expression values for each of the twenty four genes, the optimal designs and rotation angles using the 2-D RDDg algorithm in all 12 SAGPs in one training cohort composed of at least 50 breast cancer patients with consequent testing in at least one cohort composed of at least 50 patients. The optimal classification parameters for all 12 ccSAGPs are presented in Table 7, H. Reference [14] addressed the same problem using a sixteen kinase gene expression classifier.
Step 17. A method for stratification and prognosis of clinical outcome of ER“+”, LN“−”, PgR“+” breast cancer patients with breast tumors <=2 cm at the time of curative surgery who usually do not receive any systemic treatment, using the SAGC classifier (12 gene pairs, 24 genes). It includes estimation of the optimal cut-offs for expression values for each of the twenty four genes, the optimal designs and rotation angles using the 2-D RDDg algorithm in all 12 SAGPs in one training cohort composed of at least 50 breast cancer patients. The optimal classification parameters for all 12 ccSAGPs are presented in Table 7, I. We are not aware of a similar method.
Step 18. A method for stratification and prognosis of clinical outcome of colon cancer patients with stage II tumors using the SAGC classifier (12 gene pairs, 24 genes). Results are shown in
Step 19. A method for stratification and prognosis of clinical outcome of non-small cell lung cancer patients from the total unselected group using the SAGC classifier (12 gene pairs, 24 genes). It includes estimation of the optimal cut-offs for expression values for each of the twenty four genes, the optimal designs and rotation angles using the 2-D RDDg algorithm in all 12 SAGPs in one training cohort composed of at least 50 non-small cell lung cancer patients. The optimal classification parameters for all 12 ccSAGPs are presented in Table 7, K. Reference [44] addressed the same problem with a non-small cell lung cancer 17-gene signature.
Step 20. A method for stratification and prognosis of clinical outcome of breast cancer patients from original/total unselected group using the SAGC classifier (12 gene pairs, 24 genes). It includes estimation of the optimal cut-offs for expression values for each of the twenty four genes, the optimal designs and rotation angles using the 2-D RDDg algorithm in all 12 SAGPs in one training cohort composed of at least 50 breast cancer patients. The optimal classification parameters for all 12 ccSAGPs are presented in Table 7, L.
Step 21. A method for identification of SAGC classification-associated biomarkers of breast tumor heterogeneity which are specific and reliable in a context of patient survival, as well as mechanistically related biomarkers mostly appropriate for therapeutic targeting. The method includes the following steps:
The method has been successfully used to identify breast cancer patients with distinct prognosis of breast cancer recurrence (as shown below). We apply our method to two original total (unselected) breast cancer patient cohorts (Uppsala and Stockholm cohorts (training) as well as to Marseille, Harvard 2, Singapore and OriGene cohorts (testing)). The optimal parameters of SAGC for original cohorts are presented in Table 7L.
The method can also be applied to a patient subpopulation with a given tumor subtype shown to be heterogeneous upon application of SAGC and described in steps 9-19 above. Because the tumors in subpopulations/subtypes are biologically more homogeneous than the tumors in original unselected cohorts, at least three independent patient groups of at least 100 patients each are recommended for the identification of robust DEGs and associated mechanistically-related and therapeutic biomarkers. We are not aware of a similar method.
Step 22. A method for identification of specific HR subgroups (with a relative upregulation of “proteasome- and spliceosome-enriched” genes associated with poor prognosis of breast tumors) of breast cancer patients from original/total unselected groups using SAGC and method described on Step 20. Results of application of this method are shown in Table 10 and
That specific subgroup is characterized by: i) significantly higher rate of distant metastases/distant recurrence; ii) resistance to chemotherapy and hormonotherapy (
Step 23. A method for identification of specific HR subgroups (with “proteasome-” and “spliceosome-enriched” breast tumors) of breast cancer patients from original/total unselected groups of breast tumors using genes of proteasome and/or spliceosome complex B in breast tumors. The method includes computational procedures on steps 3-6 in
Step 24. A method for identification of novel drug targets using SAGC and their implication. In the current proposal, we identified certain genes of the proteasome and spliceosome as novel prospective therapeutic target(s) in primary breast tumors which were classified as the “proteasome-” and “spliceosome-enriched” HR subtype and were revealed using SAGC. We propose that existing or novel drugs which could be used for the treatment of breast cancer patients belonging to the “proteasome-” and “spliceosome-enriched” subgroup can be identified based on our prognostic method and our SAGC. The “proteasome-” and “spliceosome-enriched” subtype of breast tumors could be sensitive to: i) anti-spliceosome drugs belonging to the GEX1 group [48]; ii) synthetic compounds spliceostatin A, meayamycin, meayamycin B and their derivatives which target U2 snRNP and block spliceosome complex A formation [49]; iii) groups of compounds called sudemycins and their derivatives; iv) groups of compounds called pladienolides and their derivatives, such as E7107; v) the compound isoginkgetin and its analogs targeting the precatalytic stage of spliceosome assembly and inhibiting the A to B spliceosome complex transition [50]; vi) anti-proteasome drugs targeting a) the 20S proteolytic proteasome subunit (such as Bortezomib) or b) the 19S proteolytic proteasome subunit (such as b-AP15).
We are aware of two similar developments. Firstly, a study in which it has been shown that anti-LSM1 (anti-oncogene) antisense gene therapy can be effective in vitro (pancreatic cell line) and in vivo (SCID-Bg mice) for pancreatic cancer treatment [51, 52]. Specifically, a single intratumoral injection of an adenoviral vector expressing a 900-base pair antisense RNA to CaSm (LSM1) directly into subcutaneous AsPC-1 tumors reduced in vivo tumor growth by 40% and extended median survival time from 35 to 60 days [51]. Secondly, a study in which treatment of human breast cancer MCF-7 cells with the synthetic compounds FR901464 and meayamycin, which specifically target the spliceosome (namely, the SF3b complex), inhibited their proliferation [53]. These results provide independent support for our spliceosome signature, deduced via the prognostic method presented in this specification (see Steps 20-24).
Step 25. A method for detecting multidrug-resistant tumors (i.e., resistant to chemo- and hormonotherapy) in primary breast tumors using the genes of precatalytic stage of spliceosome assembly (complex B). Increased level of gene expression for those 14 genes in breast cancer patients indicates the phenotype of resistance to standard chemo- or hormonotherapy. In Reference [54] the authors have addressed the same problem, and showed that the over-expression of the U2-related splicing component RBM17 (SPF45) could be the causative factor and indicator of multidrug-resistant phenotype in HeLa cells. These results support our identification of the 14-gene spliceosome signature and its importance as a mechanistically-driven complex prognostic biomarker.
1) The proposed two-gene classifier RNF139/TATDN1 achieved similar or higher accuracy in prediction of clinical outcome and stratification of ER“+”, LN“−” breast cancer patients who received systemic tamoxifen treatment, compared with the two-gene expression ratio (HOXB13:IL17BR) [38, 55]. The SAGC classifier outperformed the HOXB13:IL17BR classifier in the testing experiment (lower log-rank p-value, larger difference for 5-year and 10-year DFS between LR and HR subgroups). See
2) The SAGC classifier (12 gene pairs, 24 genes) achieved substantially higher accuracy in prediction of clinical outcome and stratification of ER“+”, LN“−” breast cancer patients who received systemic tamoxifen treatment than the Oncotype DX Assay (21 genes) [39]. The SAGC classifier outperformed the Oncotype DX Assay: lower likelihood ratio p-values and larger differences for 5-year- and 10-year DFS between LR and HR subgroups both in the training and testing experiments. See
3) The SAGC classifier (12 gene pairs, 24 genes) achieved substantially higher accuracy in prognosis of clinical outcome and stratification of breast cancer patients with grade 3 tumors. The SAGC classifier outperformed the molecular cytogenetic classifier: dramatically lower log-rank p-value and larger differences for 5-year and 10-year DFS between LR and HR subgroups in training experiments. See
4) The SAGC classifier (12 gene pairs, 24 genes) makes possible a prognosis of clinical outcome and stratification of breast cancer patients with grade 3 and grade 3-like tumors. This is shown in
5) The SAGC classifier (12 gene pairs, 24 genes) makes possible the accurate prognosis of clinical outcome and stratification of breast cancer patients with grade 1 and grade 1-like tumors. This is demonstrated by
6) The SAGC classifier (12 gene pairs, 24 genes) makes possible the accurate prognosis of clinical outcome and stratification of breast cancer patients with grade 1 tumors. This is demonstrated by
7) The SAGC classifier (12 gene pairs, 24 genes) makes possible prognosis of clinical outcome and stratification of ER“−” breast cancer patients with similar or higher accuracy than the prototype, the seven-gene classifier from Reference [41]. The SAGC classifier outperformed the corresponding prototype in the training and testing experiments (lower log-rank p-values, larger differences for 5-year and 10-year RFS/DFS between LR and HR subgroups). This is demonstrated in
8) The SAGC classifier (24 genes) provides higher accuracy in prognosis of clinical outcome and stratification of breast cancer patients with basal-like grade 3 (G3) breast tumors as compared with two prototypes: the 14-gene signature (14 genes) from Reference [42] and the 28-kinase immune metagene (28 genes) from Reference [15]. The SAGC classifier outperformed prototype 1 in the testing experiment (lower log-rank p-value). It outperformed prototype 2 (lower log-rank p-values in the training experiment, larger differences for 5-year RFS/DFS between LR and HR subgroups). See
9) The proposed SAGC classifier (24 genes) provided substantially higher accuracy in prognosis of clinical outcome and stratification of breast cancer patients with Luminal A breast tumors as compared with the prototype, the sixteen kinase gene expression classifier from Reference [14]. The SAGC classifier outperformed the corresponding prototype in the training and testing experiments (lower log-rank p-values, larger differences for 5-year and 10-year RFS/DFS between LR and HR subgroups). See
10) The SAGC classifier (12 gene pairs, 24 genes) made it possible to predict the clinical outcome and stratify breast cancer patients with generally favorable prognosis: ER“+”, LN“−”, PgR“+” patients with tumors <=2 cm who usually do not receive systemic chemo- or tamoxifen therapy. See
11) The proposed SAGC classifier (24 genes) permitted substantially higher accuracy in prognosis of the clinical outcome and stratification of colon cancer patients with stage II tumors as compared with the prototype, the colon cancer stem cell gene signature from Reference [43]. The SAGC classifier outperformed the corresponding prototype in the training experiment (lower log-rank p-values, larger differences for 5-year RFS between LR and HR subgroups). See
12) The proposed SAGC classifier (24 genes) provided substantially higher accuracy in prognosis of clinical outcome of non-small cell lung cancer patients from the total unselected group as compared with the prototype, the non-small cell lung cancer 17-gene signature from Reference [44]. The SAGC classifier outperformed the corresponding prototype in the training experiment (lower log-rank p-values, larger differences for 5-year and 10-year OS between LR and HR subgroups). See
13) The SAGC classifier (12 gene pairs, 24 genes) made possible the identification of novel biomarkers of breast tumor heterogeneity as well as novel drug targets.
14) The SAGC classifier (12 gene pairs, 24 genes) made possible the identification of breast tumors (breast cancer patients) of the “proteasome-” and “spliceosome-enriched” BC subtype, characterized by: i) a high rate of distant recurrence/distant metastases; ii) resistance to chemo- and hormonotherapy; iii) overrepresentation of deregulated (overexpressed) proteasome and spliceosome genes (see
15) The experimental results obtained from the embodiment suggested the possibility of using the proteasome and spliceosome genes to identify tumors of the “proteasome-” and “spliceosome-enriched” BC subtype by applying gene pairs composed of any of the genes in Table 10 to the procedures in steps 3-6 in
16) The experimental results obtained from the embodiment suggested the possibility of using the spliceosome genes as robust biomarkers for detecting breast tumors with multidrug resistance (i.e., resistance to chemo- and hormonotherapy) corresponding to the HR subgroups selected by SAGC in primary breast tumors. As shown in
17) The experimental results obtained using the embodiment suggested the possibility of using the proteasome and spliceosome genes as potential drug targets for treatment of breast cancer patients with the “proteasome-” and “spliceosome-enriched” subtype of breast tumors (see method 20 above). In similar development 1 (see method 20 above), another gene of the U4/U6 snRNP (LSM1) was proposed as an antisense RNA therapy target for treatment of pancreatic, but not breast, cancer. At least eight genes of the precatalytic stage of the spliceosome showed more robust overexpression than LSM1 in “spliceosome-enriched” breast tumors. In similar development 2 (see method 33 above), the study was performed using MCF-7 breast cancer cell lines, whereas the current proposal studied primary breast tumors. Our focus was on breast tumors belonging specifically to the “proteasome-” and “spliceosome-enriched” subtype. Similar development 2 focused on targeting the SF3B complex using the drugs FR901464 and meayamycin, which target spliceosome complex A; in our proposal we also suggest targeting the precatalytic stage of the spliceosome (complex B) with the drug isoginkgetin or its analogs.
Total RNA was obtained for 58 breast cancer patients from OriGene Technologies (Rockville, Md.). An Agilent 2100 Bioanalyzer was used to check the quality of the selected total RNA. All RNA samples used for the microarray studies had a RIN value above 8, indicating good RNA quality. The GeneChip 3′ in vitro transcription (IVT) protocol, which includes reverse transcription to synthesize first-strand cDNA, second-strand cDNA synthesis, biotin-modified mRNA labeling, mRNA purification and fragmentation, was carried out according to the Affymetrix manufacturer's protocol. A total of 500 ng of RNA was used for these procedures. Positive control RNA provided by the manufacturer was included as a quality control check. Hybridization and subsequent washing and staining of the arrays were carried out as outlined in the GeneChip® Expression Technical Manual. 62 Affymetrix GeneChip® Human Genome U133 Plus 2.0 oligonucleotide chips were used for gene expression analysis. Hybridization was carried out for 16 h; washing and staining were performed on an Affymetrix Fluidics Station 450 workstation. Probe arrays were scanned using an Affymetrix GeneChip Scanner 3000. The arrays cover 47,000 transcript variants, including over 38,500 genes of known function, based on the following databases: GenBank, dbEST, RefSeq, UniGene (Build 159, Jan. 25, 2003), the Washington University EST trace repository, and the NCBI human genome assembly (Build 31).
Biological validation of SAGC was performed in the total unselected groups in the testing groups (
For technical validation of SAGC, the selected ccSAGPs identified using microarray data were validated using strand-specific QRT-PCR. We designed a strand-specific QRT-PCR protocol for nine out of twelve SAGPs (eighteen genes, Table 11) in order to exclude unwanted noisy gene expression signal from the opposite DNA strand within the regions of sense-antisense overlap. Classification of forty-two unrelated breast tumors purchased from OriGene (OriGene Technologies, Rockville, Md.) was performed in parallel using the U133Plus microarray expression data (
cDNA synthesis was carried out for 42 total RNAs (250 ng) of breast cancer patient samples purchased from OriGene Technologies (Rockville, Md.) using a gene-specific pool of reverse primers specific for the regions of sense/anti-sense transcripts in separate reactions. Oligoprimers were selected on the basis of their location within the specific regions spanned by the corresponding Affymetrix probesets. A pre-amplification step for the sense/anti-sense cDNAs of the 42 patient samples was conducted (LifeTechnologies, Taqman PreAmp Master Mix kit) using a gene-specific pool of sense/anti-sense forward and reverse primers, with actin beta (ACTB) and TATA box binding protein (TBP) included as endogenous controls. Taqman probes were designed for all sense and anti-sense genes and also for the endogenous controls. A 96.96 Dynamic Array IFC was prepared according to the manufacturer's instructions (Fluidigm, San Francisco, Calif.) and as described in Reference [56]. Quantitative PCR was performed using a gene assay (1st BASE, Singapore), according to the protocol for the Biomark System (Fluidigm, San Francisco, Calif.). Reaction conditions were as follows: 50° C. for 2 min, 70° C. for 30 min, 25° C. for 10 min and 50° C. for 2 min and 95° C. for 10 min, followed by 40 cycles of 95° C. for 15 sec and 60° C. for 60 sec. Data processing and extraction of Ct values were done using detector threshold settings, which allow thresholds to be set individually for each gene, and linear baseline correction was performed using Biomark Real-time PCR Analysis software (v.3.0.4) (Fluidigm, San Francisco, Calif.). Relative quantification of the genes was done using the ΔΔCt method [57]. A list of forward and reverse primers for both sense/anti-sense genes, along with the respective fluorescent Taqman probes labeled with FAM and a TAMRA quencher, is shown in Table 9.
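The ΔΔCt relative quantification mentioned above can be illustrated with a minimal Python sketch. It assumes the standard 2^(-ΔΔCt) formulation, the mean Ct of the two endogenous controls (ACTB and TBP) as the normalizer, and 100% amplification efficiency. The function name, the calibrator sample and all Ct values are hypothetical and are not taken from the study.

```python
from statistics import mean


def ddct_fold_change(ct_target, ct_refs, ct_target_cal, ct_refs_cal):
    """Return the 2^(-ddCt) fold change of a target gene relative to a calibrator.

    ct_target / ct_refs: Ct of the target gene and of the endogenous controls
    in the sample of interest; the *_cal arguments are the same quantities in
    the calibrator sample (an assumption of this sketch).
    """
    d_ct_sample = ct_target - mean(ct_refs)        # normalize to the controls
    d_ct_cal = ct_target_cal - mean(ct_refs_cal)   # same for the calibrator
    dd_ct = d_ct_sample - d_ct_cal
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the controls are listed as [ACTB, TBP].
fold = ddct_fold_change(24.0, [18.0, 22.0], 26.0, [18.5, 21.5])
print(fold)
```

A fold change above 1 indicates higher relative expression in the sample than in the calibrator under these assumptions.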
The Applicability of the SAGC for Identification of Novel Biomarkers of Breast Tumors Heterogeneity, Biomarkers of Resistance to Standard Chemo- and Hormonotherapy as Well as for Discovery of Novel Potential Drug Targets for Specific Breast Tumor Subtypes.
In order to test whether SAGC can identify candidates for novel robust biomarkers of specific breast tumor subpopulations, we applied SAGC to 7 independent total unselected cohorts comprising 1161 breast cancer patients in total. In the first step, optimal parameters for the 2-D RDDg procedure (design, rotation angle and gene expression cutoff) were chosen and fixed in the training procedure (Uppsala and Stockholm cohorts) and applied to 5 other independent testing cohorts (Marseille, Harvard, OriGene, Singapore and Metadata cohorts,
The second step included identification of genes differentially expressed between the low-risk and high-risk subgroups using EDGE software [58] in the Uppsala, Stockholm and Metadata cohorts (training cohorts for differential expression). The robust list of 1377 genes that passed the selection criterion (FDR-corrected t-test Q-value<0.01) simultaneously in all three cohorts was selected for further FGA/GO enrichment analysis with DAVID software. Among the 978 genes upregulated in HR subgroups, we found within the category KEGG_PATHWAY such FGA terms as “DNA replication” (p=2.1e-10), “cell cycle” (p=3.3e-14) and “mismatch repair” (p=1.2e-4) (Tables 6 and 11). Similarly, within the category SP_PIR_KEYWORDS we observed strong enrichment for cell division, mitosis, DNA replication and the ubiquitin conjugation pathway. Importantly, among all 978 upregulated genes the FGA term “Proteasome” (KEGG_PATHWAY, p=5.5e-17) showed the strongest enrichment. Within the same category, we also observed strong enrichment for the term “Spliceosome” (p=8.5e-05). Moreover, among the upregulated genes several other categories revealed terms associated with the proteasome, splicing and the spliceosome: “proteasome complex” (GOTERM_CC_FAT, p=9.8e-18), “mRNA splicing” (SP_PIR_KEYWORDS, p=1.3e-07), “RNA splicing” (GOTERM_BP_FAT, p=6.8e-08) and others (Table 6).
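The requirement that a gene pass the Q-value criterion simultaneously in all cohorts can be sketched as follows. This is an illustration only: it substitutes a plain Benjamini-Hochberg adjustment for the Q-values computed by EDGE, and the gene names, cohort p-values and function names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (a stand-in for EDGE Q-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def robust_genes(cohort_pvalues, q_cutoff=0.01):
    """Return the genes whose adjusted p-value is below q_cutoff in every cohort."""
    selected = None
    for pvals_by_gene in cohort_pvalues.values():
        genes = list(pvals_by_gene)
        qvals = bh_adjust([pvals_by_gene[g] for g in genes])
        passing = {g for g, q in zip(genes, qvals) if q < q_cutoff}
        selected = passing if selected is None else selected & passing
    return selected or set()


# Hypothetical per-cohort t-test p-values for four placeholder genes.
cohorts = {
    "Uppsala":   {"GENE_A": 0.0001, "GENE_B": 0.001, "GENE_C": 0.5, "GENE_D": 0.9},
    "Stockholm": {"GENE_A": 0.0002, "GENE_B": 0.2,   "GENE_C": 0.3, "GENE_D": 0.0005},
    "Metadata":  {"GENE_A": 0.001,  "GENE_B": 0.002, "GENE_C": 0.8, "GENE_D": 0.003},
}
robust = robust_genes(cohorts)
print(sorted(robust))
```

Intersecting the per-cohort lists, rather than pooling the cohorts before testing, keeps only genes that replicate independently in each cohort.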
In order to get an idea how the SAGC-associated genes (i.e., differentially expressed genes between HR and LR subgroups derived by SAGC) are related to currently known breast cancer-associated genes, we compared the SAGC-associated gene set with: 1) the published gene set of Genetic Grade Signature (201 unique Gene Symbols) [22]; 2) the reliable set of 289 genes significantly associated with breast cancer from MalaCard database (http://www.malacards.org/card/breast_cancer). In the first comparison, striking enrichment (8.2 times, p=3.0E-82,
The Uppsala, Stockholm and Metadata cohorts showed significant enrichment of FGA/GO terms for proteasome and spliceosome genes between HR and LR subgroups (Tables 6, 10 and 11). We suggested that the HR subgroups selected by SAGC share specific molecular characteristics, and we proposed that they belong to the same novel subtype of breast tumors enriched in overexpressed proteasome and spliceosome genes. More detailed analysis revealed that the identified spliceosome genes mostly belong to the same specific stage of the spliceosome cycle: the precatalytic spliceosome, or complex B. Of note, this stage of the splicing cycle is marked by formation of an snRNP complex composed of the U1 and U2 snRNPs, the Prp19 complex and the U4/U5/U6 tri-snRNP, and is followed by the catalytic spliceosome, or active complex C, in which the chemical steps of splicing occur. Complex C lacks the U4/U6 snRNPs [59]. The complex B stage is also distinct from the complex A stage, in which only the U1 and U2 snRNPs, but not Prp19 and the U4/U5/U6 tri-snRNP, are involved [59].
Analysis of the 27 proteasome genes (proteasome gene signature) identified under the DAVID term “hsa03050: Proteasome” revealed that they evenly represent both the 20S core particle and the 19S regulatory particle of the proteasome (Tables 6, 10 and 11). The association of the SAGC-based classification with proteasome (20S and 19S subunits) and spliceosome (precatalytic splicing) genes is interesting in the context of drug targets for BC. The first anti-proteasome drug targeting the 20S proteolytic proteasome subunit, Bortezomib, was developed [60] and approved by the US FDA for treatment of multiple myeloma. However, due to drug resistance, its efficacy in BC was insignificant when used as a single agent. Recently, a novel drug targeting the 19S proteasome subunit, b-AP15, was identified and tested against several cancers in mice [61]. In contrast to Bortezomib, b-AP15 induced apoptosis regardless of mutations or deletions in TP53 or amplification of BCL2 [61]. These data suggest that the development of multigene classifiers to specifically identify and predict “proteasome-” and “spliceosome-enriched” patient subgroups could improve personalized treatment schemes in BC. In turn, these therapies could combine standard adjuvant therapy with known or novel anti-proteasome and anti-spliceosome drugs [60, 61, 62].
We suggested that those 25 spliceosomal and 27 proteasomal genes (Table 10) could be used for development of novel biomarkers/drug targets specific for the “proteasome-” and “spliceosome-enriched” subtype identified by SAGC. Notably, a similar scheme could be applied within other specific subpopulations of breast tumors and, correspondingly, novel biomarkers of high-risk subgroups could be identified by SAGC.
As more detailed drug treatment information was available for the Stockholm, Harvard, OriGene and Singapore cohorts, we checked whether SAGC could be useful for the assessment of drug resistance in standard treatment schemes after curative surgery. In the four cohorts, the total percentage of patients who underwent systemic treatment (chemotherapy or hormonotherapy or both) did not differ between the LR and HR subgroups (
A more intriguing potential drug for such breast cancer patients would be the naturally occurring biflavonoid isoginkgetin, which has been shown to be a general inhibitor of splicing in vitro and in vivo [50]. In in vitro reactions, isoginkgetin caused the arrest of spliceosome assembly and sequestered pre-mRNA in complex A. Importantly, isoginkgetin is also known as an inhibitor of tumor invasion through regulation of the PI3K/Akt/NF-kappa B signaling pathway in the MDA-MB-231 breast cancer cell line [74]. Because our study showed robust upregulation of several genes specific for the subsequent complex B in the “spliceosome-enriched” subtype, isoginkgetin could be an even more specific drug for such breast cancer patients than pladienolides, spliceostatin A and sudemycins [48].
Alternatively, the 27 proteasome genes and 25 spliceosome genes robustly overexpressed in SAGC HR subgroups could be used directly to develop one or more specific assays for prognosis of breast cancer outcome. Correct identification of that specific subgroup of patients (by SAGC, by using the proteasome and/or spliceosome genes as biomarkers, or by both in combination) would facilitate development of novel systemic treatment schemes and modalities for them. Such schemes would combine conventional drugs targeting the cell cycle and DNA replication, hormonotherapy, and agents targeting specific components of the spliceosome.
Another important property of most anti-spliceosome drugs is their highly selective cytotoxicity toward tumors as opposed to normal tissues [46, 47]. One could suggest that transient, short-term treatment of tumors with drugs specifically targeting the spliceosome may not lead to substantial side effects, yet could potentially lead to a significant increase in the tumor's sensitivity during subsequent standard chemotherapy. On the other hand, the efficacy/drug resistance effects of the novel combined treatment schemes could be tested by the SAGC (
The published datasets as well as our own original breast cancer dataset used in this document are summarized in Table 4.
For the microarray and survival analyses we used two independent microarray datasets from Sweden: the Uppsala cohort, representing breast cancer patients resected in Uppsala County, and the Stockholm cohort, derived from breast cancer patients operated on at the Karolinska Hospital [22, 75]. One dataset from France included 250 breast cancer patients at the Institute Paoli-Calmettes and Hopital Nord (Marseille) [76]. The Harvard cohort 1 included 38 primary breast tumors classified as basal-like and non-basal-like subtypes, obtained as anonymous samples from the Harvard SPORE blood and tissue repository [77]. The Harvard cohort 2 (115 samples) was another collection of primary breast tumors from the NCI-Harvard Breast SPORE blood and tissue repository [78]. The Singapore samples were derived from patients operated on at the National University Hospital (Singapore) from Feb. 1, 2000, through Jan. 31, 2002 [22]. The colon cancer microarray dataset was collected at the Academic Medical Center in Amsterdam (Netherlands) [43], and the non-small cell lung cancer dataset at the Erasmus University Medical Center in Rotterdam (Netherlands) [44].
To obtain an additional large testing group for verifying the SAGC as well as for large-scale DEG analysis, we combined the microarray expression datasets from 5 independent BC cohorts into the Metadata cohort: the Oxford and Guy's Hospital cohorts (GEO accessions: GSE6532, GSE9195), the Harvard cohort (GEO accession: GSE19615), the Marseille cohort (GEO accession: GSE21653) and the BII-OriGene cohort (GEO accession: GSE61304). To obtain the testing group for verification of the SAGC in G3 breast tumors and other tumor subpopulations, we joined the microarray expression datasets of the Uppsala and Stockholm cohorts into a single dataset, with subsequent batch effect correction using dChip [79]. Further, we checked the quality of the joined dataset using the R package arrayQualityMetrics [80].
The methods according to the described embodiments may be implemented on a standard computer system such as an Intel IA-32 based computer, as shown in
As shown in
The system 200 also includes a display adapter 214, which is connected to a display device such as an LCD panel display 222, and a number of standard software modules, including an operating system 224 such as Linux or Microsoft Windows. The system 200 may include structured query language (SQL) support 230 such as MySQL, available from http://www.mysql.com, which allows data to be stored in and retrieved from an SQL database 232. The database 232 may store the gene expression data from the plurality of subjects, for example, and may also store the output of the processes described above (classification parameters, identification of gene pairs, and so on). In one embodiment, the modules implementing the above processes are realized as scripts 202 received as input by the R statistical programming environment 234, which has associated with it a plurality of add-on modules including dChip and arrayQualityMetrics of Bioconductor 236. The scripts 202 contain instructions for performing, within the R environment 234, a series of computational operations corresponding to some or all of the steps 1 to 25 of
Certain embodiments may relate to a kit for predicting clinical outcome in a subject having a medical condition. The kit may comprise a plurality of polynucleotide sequences or other probes capable of specifically binding to a target sequence in a sample (for example, a tissue sample, or a body fluid sample such as blood, urine, saliva, etc.) to allow a concentration or copy number of the target sequence in the sample to be quantified. As is well-known in the art, such probes may comprise a detectable label such as a fluorescent, phosphorescent or radioactive moiety which emits detectable electromagnetic or other radiation. For example, the probes may be fluorescent reporter probes used in a quantitative PCR process. In another example, the probes may be unlabelled oligonucleotide or cDNA probes bound to a solid support, to which labelled target sequences (each bound to a fluorescent dye, for example) can specifically hybridize in order to quantify the concentration or copy number of the target sequences.
The kit may comprise a plurality of polynucleotide sequences being capable of specifically hybridizing to and/or detecting a gene of a plurality of genes and/or an expression product of the gene to obtain respective gene expression values. In particular, the plurality of genes may comprise genes of one or more of the sense-antisense gene pairs (SAGPs) listed in Table 1A. Preferably the kit comprises polynucleotide sequences corresponding to no more than 100 genes.
The kit may also comprise written instructions for comparing the respective gene expression values to optimal gene expression cut-off values for respective ones of the plurality of genes in order to make the prediction of clinical outcome. For example, the written instructions may contain the cut-off values and an indication of the clinical relevance of expression of respective genes being above or below respective cut-off values.
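The comparison of measured expression values with per-gene cut-off values described above can be sketched in Python as follows. The gene symbols are taken from the RNF139/TATDN1 pair named in this document; the cut-off and expression values, the function name and the handling of ties are hypothetical choices made for illustration only.

```python
def compare_to_cutoffs(expression, cutoffs):
    """Label each gene 'above' or 'below' its cut-off (ties count as 'below')."""
    return {
        gene: ("above" if expression[gene] > cutoff else "below")
        for gene, cutoff in cutoffs.items()
    }


cutoffs = {"RNF139": 7.5, "TATDN1": 6.0}   # hypothetical log2 cut-off values
sample = {"RNF139": 8.1, "TATDN1": 5.4}    # hypothetical log2 expression values
calls = compare_to_cutoffs(sample, cutoffs)
print(calls)
```

The clinical meaning attached to each above/below call would come from the kit instructions, not from this comparison step.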
In some embodiments the kit may comprise, alternatively to or in addition to the written instructions, a tangible computer-readable medium having stored thereon machine-readable instructions for causing a computer processor to compare the respective gene expression values to optimal gene expression cut-off values for respective ones of the plurality of genes in order to make the prediction of clinical outcome. In some embodiments the optimal gene expression cut-off values are determined for each SAGP by:
Preferred embodiments of the invention exhibit the following advantageous features:
1. A fully automatic method of identifying human breast cancer-associated ccSAGPs whose expression pattern models and model cut-off values form a high-confidence combined survival prognostic signature (CSPS) that stratifies patients into favorable and unfavorable subgroups within conventional clinical and/or molecular classification systems of breast tumors (
2. A fully automatic method of identifying human breast cancer-associated ccSAGPs whose expression pattern models and model cut-off values form a high-confidence CSPS that stratifies patients into favorable and unfavorable subgroups within conventional clinical and/or molecular classifications of colon and lung tumors. The same is applicable to any other oncologic disease or other disease for which information about patient survival or another time-course treatment response is available.
3. A fully automatic method of breast cancer patient risk stratification based on statistical voting of negatively and positively correlated and physically interconnected ccSAGPs forming the cancer patient's CSPS, which stratifies patients into favorable and unfavorable clinical subgroups and which is also applicable to the stratification of breast cancer, lung cancer and colon cancer types or subtypes. The same is applicable to any other oncologic disease or other disease for which information about patient survival or another time-course treatment response is available.
4. More generally, a fully automatic method of cancer patient risk stratification based on statistical voting of correlated, co-regulated or physically interconnected gene pairs (and/or other linked feature pairs characterizing the neoplastic process) forming the cancer patient's CSPS, which stratifies/discriminates patients having a given tumor type (and/or subtype) into favorable and unfavorable clinical subgroups. The same is applicable to any oncologic disease or other disease for which information about patient survival or another time-course treatment response is available.
5. A method of implementing the sense-antisense gene classifier (SAGC) as a complex biomarker composed of a specific subset of gene pairs, which can substantially improve the accuracy of re-classification of breast tumors into relatively low-risk (favorable) and relatively high-risk (unfavorable) subgroups within a patient group defined by a conventional clinical and/or molecular classification system of breast tumors (
6. A fully automatic method of patient's survival prediction adapted to any correlated gene pairs (including ccSAGPs and all other subclasses of sense-antisense transcripts and gene pairs) and termed the 2-D rotation data-driven grouping (2-D RDDg). The method is applicable not only to ccSAGPs, but also to any significantly correlated gene pairs/transcripts including other known classes of sense-antisense gene pairs and sense-antisense transcripts pairs.
7. A computerized method of integrating survival information for individual gene pairs into a dramatically improved patient partition, based on a statistically weighted voting grouping procedure. The method is applicable not only to individual gene pairs but also to any individual genes or to other patient characteristics for which survival information is available.
8. A computerized method for applying any gene pairs, including sense-antisense gene pairs, to prognosis/prediction and stratification in cancer patients with available survival information. The method includes estimation of the optimal cut-offs for the expression values of each of the two genes, the optimal design and the rotation angle using the 2-D RDDg procedure in one training cohort composed of at least 50 breast cancer patients, with subsequent testing using the 2-D RDDg procedure in at least one cohort composed of at least 50 patients. The method is applicable not only to breast cancer patients, but also to any cancer patients with available survival information.
9. A computerized method for applying the sense-antisense gene classifier, which includes at least two steps (training and testing procedures) using the 2-D RDDg procedure coupled with the WVG procedure and is based on the methods in features 5 and 4 (
10. A computerized method for stratification and prediction of clinical outcome of ER“+”, LN“−” breast cancer patients who received adjuvant systemic tamoxifen treatment after curative surgery using the RNF139/TATDN1 SAGP. The method includes estimation of the optimal parameters for the 2-D RDDg procedure (training procedure) for the individual gene pair and its testing using the 2-D RDDg procedure as in feature 8.
11. A computerized method for stratification and prediction of clinical outcome of ER“+”, LN“−” breast cancer patients who received adjuvant systemic tamoxifen treatment after curative surgery using SAGC. The method includes estimation of the optimal parameters for 2-D RDDg procedure (training procedure) and the testing procedure for all ccSAGPs comprising SAGC as described in feature 8. SAGC is implemented as in feature 9.
12. A computerized method for stratification and prognosis of clinical outcome of breast cancer patients with grade 3 tumors using the VPRBP/RBM15B SAGP as well as the full SAGC. The method includes estimation of the optimal parameters for 2-D RDDg procedure (training procedure) and the testing procedure for all ccSAGPs comprising SAGC as described in feature 8. SAGC is implemented as in feature 9.
13. A computerized method for stratification and prognosis of clinical outcome of breast cancer patients with grade 3 and grade 3-like tumors using the SAGPs C18orf8/NPC1 and EME1/LRRC59 as well as the full SAGC. The method includes estimation of the optimal parameters for 2-D RDDg procedure (training procedure) and the testing procedure for all ccSAGPs comprising SAGC as described in feature 8. SAGC is implemented as in feature 9.
14. A computerized method for stratification and prognosis of clinical outcome of breast cancer patients with grade 1 and grade 1-like tumors using the SHMT1/SMCR8 SAGP as well as the full SAGC. The method includes estimation of the optimal parameters for 2-D RDDg procedure (training procedure) and the testing procedure for all ccSAGPs comprising SAGC as described in feature 8. SAGC is implemented as in feature 9.
15. A computerized method for stratification and prognosis of clinical outcome of breast cancer patients with grade 1 breast tumors using the full SAGC. The method includes estimation of the optimal parameters for 2-D RDDg procedure (training procedure) and the testing procedure for all ccSAGPs comprising SAGC as described in feature 8. SAGC is implemented as in feature 9.
16. A computerized method for stratification and prognosis of clinical outcome of ER“−” breast cancer patients from total unselected groups using the CTNS/TAX1BP3 SAGP as well as the full SAGC. The method includes estimation of the optimal parameters for 2-D RDDg procedure (training procedure) and the testing procedure for all ccSAGPs comprising SAGC as described in feature 8. SAGC is implemented as in feature 9.
17. A computerized method for stratification and prognosis of clinical outcome of breast cancer patients with basal-like grade 3 (G3) breast tumors using the SAGPs CTNS/TAX1BP3 and RNF139/TATDN1 as well as the full SAGC. The method includes estimation of the optimal parameters for 2-D RDDg procedure (training procedure) and the testing procedure for all ccSAGPs comprising SAGC as described in feature 8. SAGC is implemented as in feature 9.
18. A computerized method for stratification and prognosis of clinical outcome of breast cancer patients with Luminal A breast tumors using the BIVM/KDELC1 SAGP as well as the full SAGC. The method includes estimation of the optimal parameters for 2-D RDDg procedure (training procedure) and the testing procedure for all ccSAGPs comprising SAGC as described in feature 8. SAGC is implemented as in feature 9.
19. A computerized method for stratification and prognosis of clinical outcome of ER“+”, LN“−”, PgR“+” breast cancer patients with breast tumors <=2 cm at the time of curative surgery, who usually do not receive any systemic treatment, using the SAGC. The method includes estimation of the optimal parameters for 2-D RDDg procedure (training procedure) and the testing procedure for all ccSAGPs comprising SAGC as described in feature 8. SAGC is implemented as in feature 9.
20. A computerized method for stratification and prognosis of clinical outcome of colon cancer patients with stage II tumors using the SAGC. The method includes estimation of the optimal parameters for 2-D RDDg procedure (training procedure) and the testing procedure for all ccSAGPs comprising SAGC as described in feature 8. SAGC is implemented as in feature 9.
21. A computerized method for stratification and prognosis of clinical outcome of non-small cell lung cancer patients from the total unselected group using the SAGC. The method includes estimation of the optimal parameters for 2-D RDDg procedure (training procedure) and the testing procedure for all ccSAGPs comprising SAGC as described in feature 8. SAGC is implemented as in feature 9.
22. A computerized method for identification of novel biomarkers of breast tumors heterogeneity as well as novel potential candidates for drug targets using SAGC. i) stratification of breast cancer patients into low-risk and high-risk subgroups using the workflow described in steps 3-6 of
23. A computerized method for the identification of a subgroup of BC patients at high risk of disease recurrence, whose primary tumors are characterized by over-expression of “proteasome-enriched” and “spliceosome-enriched” genes (Table 10), including the genes differentially expressed between low-risk and high-risk groups defined by SAGC in several original patient cohorts. Such specific patient subgroups are characterized by: i) a significantly higher rate of distant metastases/distant recurrence events; ii) more frequent resistance to primary chemotherapy and hormone therapy (
24. A computerized method for the stratification of BC patients and the identification of a high-risk subgroup of patients with the “spliceosome-enriched” subtype in total unselected groups of patients using the 27-gene prognostic signature of the proteasome machinery (or proteasome-based predictor) and the 25-gene prognostic signature of the spliceosome machinery (or spliceosome-based predictor) (Table 10).
25. An assay/kit for detecting multidrug-resistant breast tumors (i.e., tumors resistant to chemotherapy and hormonotherapy) and monitoring their treatment using the proteasome-based predictor and spliceosome-based predictor (Table 10).
26. A method for identification of novel drug targets using strategy of discovery of SAGC classifier and the signature of spliceosome complex B.
27. A method for identification of novel cancer biomarker or drug targets using genes of SAGC or the products derived from the genes of that molecular signature and used as the biomarkers or drug targets.
28. A method for identification of novel cancer biomarkers or drug targets using genes of the proteasome and spliceosome, or the products derived from those genes, as the biomarkers or drug targets.
29. An assay/kit using any combination of the genes of SAGC and their products as biomarkers of breast, lung, colon and other cancers.
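Features 6 to 9 above refer to grouping patients by a rotated two-gene cut-off (2-D RDDg) and to combining gene pairs by weighted voting (WVG). The following Python sketch shows one possible reading of that idea and is not the procedure defined in this specification. The rotation convention, the encoding of the "design" as a set of high-risk quadrants, the vote weights, the 0.5 threshold and all numeric values are assumptions made for illustration.

```python
import math


def rotate(x, y, angle_deg):
    """Rotate the point (x, y) counter-clockwise by angle_deg about the origin."""
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))


def pair_call(x, y, angle_deg, cut_x, cut_y, high_risk_quadrants):
    """Return 1 (high risk) or 0 (low risk) for one gene pair.

    The hypothetical 'design' is the set of quadrants, encoded as
    (rotated_x > cut_x, rotated_y > cut_y), that are treated as high risk.
    """
    rx, ry = rotate(x, y, angle_deg)
    quadrant = (rx > cut_x, ry > cut_y)
    return 1 if quadrant in high_risk_quadrants else 0


def weighted_vote(calls, weights, threshold=0.5):
    """Combine per-pair calls into 'HR' or 'LR' by a weighted vote."""
    score = sum(c * w for c, w in zip(calls, weights)) / sum(weights)
    return "HR" if score > threshold else "LR"


# Hypothetical expression values and parameters for three gene pairs.
calls = [
    pair_call(8.0, 7.0, 0.0, 7.5, 6.0, {(True, True)}),
    pair_call(5.0, 9.0, 90.0, 0.0, 6.0, {(True, False)}),
    pair_call(6.0, 4.0, 0.0, 5.0, 5.0, {(True, False)}),
]
group = weighted_vote(calls, [2.0, 1.0, 1.0])
print(calls, group)
```

In a training step, the angle, cut-offs and design would be chosen to optimize a survival statistic such as the log-rank p-value, and the weights could reflect each pair's statistical significance; both choices are left open in this sketch.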
9 | 17 | SSB | NM_003142 | autoantigen La | 308 | 18 | METTL5 | NM_014168 | methyltransferase like 5
18 | 35 | CTNS | NM_004937 | cystinosis, nephropathic isoform 2 | 211 | 36 | TAX1BP3 | NM_014604 | Tax1 (human T-cell leukemia virus type I)
25 | 49 | AIMP2 | NM_006303 | aminoacyl tRNA synthetase complex-interacting multifunctional protein 2 | 1588 | 50 | EIF2AK1 | NM_014413 | eukaryotic translation initiation factor 2-alpha
29 | 57 | RNF139 | NM_007218 | ring finger protein 139 | 125 | 58 | TATDN1 | NM_032026 | TatD DNase domain containing 1
32 | 63 | RBM15B | NM_013286 | RNA binding motif protein 15B | 2039 | 64 | VPRBP | NM_014703 | HIV-1 Vpr binding protein
33 | 65 | C18orf8 | NM_013326 | colon cancer-associated protein Mic1 | 282 | 66 | NPC1 | NM_000271 | Niemann-Pick disease, type C1 precursor
46 | 91 | MRPS18C | NM_016067 | mitochondrial ribosomal protein S18C | 834 | 92 | FAM175A | NM_139076 | coiled-coil domain containing 98
47 | 93 | KDELC1 | NM_024089 | KDEL (Lys-Asp-Glu-Leu) containing 1 | 6 | 94 | BIVM | NM_017693 | basic, immunoglobulin-like variable motif
57 | 113 | C13orf34 | NM_024808 | aurora borealis | 789 | 114 | DIS3 | NM_014953 | DIS3 mitotic control isoform b
62 | 123 | POLR2C | NM_032940 | DNA directed RNA polymerase II polypeptide C | 52 | 124 | DOK4 | NM_018110 | docking protein 4
64 | 127 | SMCR8 | NM_144775 | Smith-Magenis syndrome chromosome region, | 184 | 128 | SHMT1 | NM_148918 | serine hydroxymethyltransferase 1 (soluble)
67 | 133 | EME1 | NM_152463 | essential meiotic endonuclease 1 homolog 1 | 227 | 134 | LRRC59 | NM_018509 | leucine rich repeat containing 59
LR: 92% DFS;
ccSAGP
HR: 63% DFS
RNF139/
Difference:
TATDN1
29%
4.2
Wald
LR: 89% DFS
p = 0.003;
HR: 54% DFS
Log-rank
Difference: 35%
p = 0.001
HOXB13/IL17BR
2.03
Log-rank
LR: 94% DFS;
p = 0.015
HR: 64% DFS
Difference: 30%
27.9 (DFS,
Wald p = 7.3E−06;
LR: 95%
LR: 84%
12 ccSAGPs
multivariate
Likelihood ratio
HR: 27%
HR: < or =
(SAGC)
with
p = 2.3E−06
Difference: 68%
19%
tumor
Difference: <
size
or = 65%
and
age)
4.8
Wald
LR: 88% DFS;
LR: 68%
(DFS,
p = 0.0009;
HR: 41% DFS
DFS;
multivariate
Log-rank
Difference:
HR: < or = 41%
with
p = 0.0003
47%
Difference: >
tumor
or = 27%
size
and
age*)
3.21
Likelihood
LR: 98% DRFS;
LR: 94%
21 gene
(DRFS,
ratio p = 0.001
HR: 78% DRFS
DRFS;
signature
multivariate
Difference:
HR: 69% DRFS
(Oncotype DX)
with
20%
Difference:
tumor
25%
size
and
age)
6.4
Wald p = 3.3E−8
LR: 83% DFS;
LR: 66%
12 ccSAGPs
Log-rank
HR: 26% DFS
RFS;
(SAGC)
p = 3.2E−10
Difference:
HR: 18% RFS
57%
Difference:
48%
2.7
Wald p = 0.008
LR: 63% DFS;
LR: 56%
Log-rank
HR: 33% DFS
DFS;
p = 0.006
Difference:
HR: 17% DFS
30%
Difference:
Molecular
cytogenetic
classifier
ccSAGP
RNF139/TATDN1
Ma et al.,
HOXB13/IL17BR
Goetz et
al., [55]
12 ccSAGPs (SAGC)
21 gene signature
(Oncotype DX)
12 ccSAGPs (SAGC)
Molecular
cytogenetic classifier
(SAGC)
(SAGC)
(SAGC)
12 ccSAGPs
(SAGC)
Seven-gene
immune
response
module
12 ccSAGPs
(SAGC)
14-gene
signature
14.5
Wald
LR: 80% DFS;
p = 1.4E−8
HR: 0% DFS
ccSAGPs
Log-rank
Difference:
(SAGC)
p = 7.0E−13
80%
3.0
Wald
LR: 60% DFS;
LR: 55%
p = 0.0002
HR: 21% DFS
DFS;
Log-rank
Difference:
HR: 5% DFS
p = 9.7E−05
39%
Difference:
50%
16.1
(5.64-48.56)
Wald
LR: 95% DFS;
NA
p = 3.2E−7
HR: 30% DFS
ccSAGPs
Log-rank
Difference:
(SAGC)
p = 7.1E−12
65%
3.3
(1.73-6.16)
Wald
LR: 90% DFS;
LR: 81%
p = 0.00025
HR: 71% DFS
DFS;
Log-rank
Difference:
HR: 56% DFS
p = 0.0001
19%
Difference:
25%
17.4
(5.67-53.62)
Wald
LR: 99% DFS;
LR: 94%
p = 6.1E−7
HR: 64% DFS
DFS;
ccSAGPs
Log-rank
Difference:
HR: 28% DFS
(SAGC)
p = 1.6E−11
35%
Difference:
66%
0.12
(0.06-0.22)
Wald
LR: 73% DFS;
LR: 67%
p = 1.2E−10
HR: 5% DFS
DFS;
Log-rank
Difference:
HR: 5% DFS
12 ccSAGPs
p = 3.0E−14
68%
Difference:
(SAGC)
62%
LR: 79% DFS;
LR: 72%
HR: 34% DFS
DFS;
Difference:
HR: < or = 26%
45%
DFS
Difference:
> or = 56%
Seven-gene
immune
response
module
0.15
(0.07-0.36)
Log-rank
Good-
Good-
p = 1.0E−06
up: 99%
up: 94%
DFS;
DFS;
poor: 63% DFS
poor: 50% DFS
Difference:
Difference:
36%
44%
15.6 or
Wald
LR: 81% DFS;
LR: 81%
12 ccSAGPs
0.06
p = 1.0E−07
HR: 9% DFS
DFS;
(SAGC)
Log-rank
Difference:
HR: 0% DFS
p = 5.4E−12
72%
Difference:
0%
5.5 or
(1.89-15.92)
Wald p =
LR: 81% DFS;
LR: 75%
0.18
or
0.0018
HR: 34% DFS
DFS;
(0.06-0.53)
Log-rank
Difference:
HR: 23% DFS
p = 4.8E−04
47%
Difference:
52%
14-gene
signature
28-kinase immune
metagene
12 ccSAGPs (SAGC)
Sixteen kinase gene
expression classifier
12 ccSAGPs (SAGC)
12 ccSAGPs (SAGC)
12 ccSAGPs (SAGC)
28-kinase
immune
metagene
Wald
LR: 87% DFS;
LR: 78% DFS;
12 ccSAGPs
p = 1.5E−08
HR: 22% DFS
HR: 0% DFS
(SAGC)
Log-rank
Difference:
Difference: 78%
p = 2.4E−12
65%
LR: 89% DFS;
LR: 73% DFS;
HR: 52% DFS
HR: 41% DFS
Difference:
Difference: 22%
37%
7.77
Sixteen kinase
gene
expression
classifier
Log-rank
LR: 78% RFS;
p = 1.7E−05
HR: 44% RFS
Difference: 34%
16.3
(6.20-42.9)
Wald
LR: 88% DFS;
LR: 79% DFS;
12 ccSAGPs
p = 1.6E−08
HR: 10% DFS
HR: 0% DFS
(SAGC)
Log-rank
Difference:
Difference: 79%
p = 7.5E−14
78%
13.6
Wald
LR: 84% RFS;
LR: 84% RFS;
12 ccSAGPs
p = 9.4E−08
HR: < or = 11%
HR: < or = 11% RFS
Log-rank
RFS
Difference: > or =
p = 7.9E−12
Difference: > or =
73% RFS
73% RFS
6.4
Wald
LR: 63% OS;
LR: 40% OS;
12 ccSAGPs
HR: 5% OS
HR: 0% OS
Difference: 58%
Difference: 40%
OS
OS

KEGG_PATHWAY term | Gene count | P-value | |
---|---|---|---|
hsa03050: Proteasome | 27 | 5.53E−17 | 8.57 |
hsa04110: Cell cycle | 39 | 3.31E−14 | 4.65 |
hsa03030: DNA replication | 19 | 2.06E−10 | 7.87 |
hsa03040: Spliceosome | 25 | 8.47E−05 | 3.08 |
hsa03430: Mismatch repair | 11 | 1.22E−04 | 7.13 |
hsa00240: Pyrimidine metabolism | 20 | 1.77E−03 | 3.14 |
hsa00970: Aminoacyl-tRNA biosynthesis | 12 | 7.71E−03 | 4.36 |
hsa00230: Purine metabolism | 25 | 9.43E−03 | 2.44 |
hsa04114: Oocyte meiosis | 20 | 1.49E−02 | 2.71 |

Tables 7. The optimal classification parameters for SAGC (partition design, rotation angle, and gene expression cut-offs) for the 2-D RDDg procedure (note 1). Twelve selected pairs of Affymetrix probesets were used for the subsequent Weighted Voting Grouping (WVG) in each group. Comments: 1: for a description of the method, see the Materials and Methods section; 2: optimal expression cut-off for the corresponding Affymetrix probeset; 3: rotation angle coefficient in the 2-D RDDg procedure; 4: one of 7 possible two-group designs (see the Materials and Methods section); 5: gene expression data were not log2-transformed, and gene pairs in which expression values were <=50 were excluded from the subsequent WVG procedure; 6: expression data for each probeset are displayed as the log2 of the deviations from the geometric mean calculated for that probeset.
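Comments 5 and 6 above describe two data-handling steps, and the following is a minimal illustrative sketch of one possible reading of them, not the procedure actually used for the tables. It assumes that "deviation" means the ratio of each expression value to the geometric mean of the same probeset across samples, and that a gene pair is dropped when either member has an expression value of 50 or less. The function names and the `floor` parameter are hypothetical.

```python
import math


def log2_deviation_from_geometric_mean(values):
    """Return log2(x / g) for each x, where g is the geometric mean of `values`.

    Assumption: `values` holds the untransformed expression values of one
    probeset across samples, and "deviation" is read as the ratio x / g.
    """
    if not values:
        raise ValueError("at least one expression value is required")
    if any(v <= 0 for v in values):
        raise ValueError("expression values must be positive")
    # log2 of the geometric mean equals the arithmetic mean of the log2 values.
    mean_log2 = sum(math.log2(v) for v in values) / len(values)
    return [math.log2(v) - mean_log2 for v in values]


def pair_is_usable(expr_a, expr_b, floor=50.0):
    """Assumed reading of comment 5: keep a pair only if both values exceed `floor`."""
    return expr_a > floor and expr_b > floor


if __name__ == "__main__":
    # Two samples of one hypothetical probeset; their geometric mean is 200.
    print(log2_deviation_from_geometric_mean([100.0, 400.0]))
    print(pair_is_usable(120.0, 50.0))
```

Working in log2 space avoids forming the product of many expression values, which could overflow for large cohorts.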

No. | Gene symbol | Affymetrix probeset | Description | GenBank accession | Cytoband | | | | KEGG pathway |
---|---|---|---|---|---|---|---|---|---|
1 | SHFM1 | 202276_at | split hand/foot malformation (ectrodactyly) type 1 | BC032782 | 7q21.3 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03050:Proteasome |
2 | PSMA7 | 201114_x_at | proteasome subunit, alpha type, 7 | BC004427 | 20q13.33 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03050:Proteasome |
3 | PSMB5 | 208799_at | proteasome subunit, beta type, 5 | BC057840 | 14q11.2 | 4.7E−06 | 4.0E−04 | 7.54E−13 | hsa03050:Proteasome |
4 | PSMB4 | 202243_s_at | proteasome subunit, beta type, 4 | na | 1q21 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03050:Proteasome |
5 | PSMB7 | 200786_at | proteasome subunit, beta type, 7 | BC000509 | 9q34.11-q34.12 | 4.7E−06 | 1.2E−04 | 6.54E−14 | hsa03050:Proteasome |
6 | PSMB6 | 208827_at | proteasome subunit, beta type, 6 | BC000835 | 17p13 | 4.7E−06 | 1.6E−05 | 1.16E−15 | hsa03050:Proteasome |
7 | PSMB1 | 200876_s_at | proteasome subunit, beta type, 1 | BC000508 | 6q27 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03050:Proteasome |
8 | PSMB2 | 200039_s_at | proteasome subunit, beta type, 2 | BC101836 | 1p34.2 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03050:Proteasome |
9 | PSMD1 | 201198_s_at | proteasome 26S subunit, non-ATPase, 1 | BC094720 | 2q37.1 | 4.7E−06 | 5.3E−04 | 1.31E−12 | hsa03050:Proteasome |
10 | PSMD2 | 200830_at | proteasome 26S subunit, non-ATPase, 2 | BC007897 | 3q27.1 | 4.7E−06 | 1.9E−04 | 1.70E−13 | hsa03050:Proteasome |
11 | PSMD4 | 200882_s_at | proteasome 26S subunit, non-ATPase, 4 | BC002365 | 1q21.3 | 4.7E−06 | 1.6E−05 | 1.16E−15 | hsa03050:Proteasome |
12 | PSMD7 | 201705_at | proteasome 26S subunit, non-ATPase, 7 | BC012606 | 16q22.3 | 4.7E−06 | 6.3E−04 | 1.87E−12 | hsa03050:Proteasome |
13 | PSMA2 | 201317_s_at | proteasome subunit, alpha type, 2 | BC047697 | 7p13 | 4.7E−06 | 6.4E−04 | 1.90E−12 | hsa03050:Proteasome |
14 | PSMA1 | 210759_s_at | proteasome subunit, alpha type, 1 | BC015356 | 11p15.1 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03050:Proteasome |
15 | PSMD14 | 212296_at | proteasome 26S subunit, non-ATPase, 14 | BC066336 | 2q24.2 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03050:Proteasome |
16 | PSMA6 | 208805_at | proteasome subunit, alpha type, 6 | BC002979 | 14q13 | 4.7E−06 | 2.8E−03 | 3.64E−11 | hsa03050:Proteasome |
17 | PSMD12 | 202352_s_at | proteasome 26S subunit, non-ATPase, 12 | BC019062 | 17q24.2 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03050:Proteasome |
18 | PSMA5 | 201274_at | proteasome subunit, alpha type, 5 | BC103751 | 1p13 | 4.7E−06 | 4.2E−03 | 8.18E−11 | hsa03050:Proteasome |
19 | PSMD11 | 208777_s_at | proteasome 26S subunit, non-ATPase, 11 | BC000437 | 17q11.2 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03050:Proteasome |
20 | PSMC3 | 201267_s_at | proteasome 26S subunit, ATPase, 3 | BC008713 | 11p11.2 | 4.7E−06 | 5.6E−04 | 1.46E−12 | hsa03050:Proteasome |
21 | PSMA4 | 203396_at | proteasome subunit, alpha type, 4 | BC005361 | 15q25.1 | 4.7E−06 | 6.3E−04 | 1.87E−12 | hsa03050:Proteasome |
22 | PSMC2 | 201068_s_at | proteasome 26S subunit, ATPase, 2 | BC002589 | 7q22.1-q22.3 | 4.7E−06 | 4.8E−05 | 1.06E−14 | hsa03050:Proteasome |
23 | PSMC1 | 204219_s_at | proteasome 26S subunit, ATPase, 1 | BC000512 | 14q32.11 | 4.7E−06 | 2.2E−03 | 2.27E−11 | hsa03050:Proteasome |
24 | PSMA3 | 201532_at | proteasome subunit, alpha type, 3 | BC005265 | 14q23 | 4.7E−06 | 2.4E−03 | 2.64E−11 | hsa03050:Proteasome |
25 | POMP | 217769_s_at | proteasome maturation protein | BC003390 | 13q12.3 | 4.7E−06 | 4.0E−03 | 7.57E−11 | hsa03050:Proteasome |
26 | PSME3 | 209853_s_at | proteasome activator subunit 3 | BC008020 | 17q21 | 1.1E−05 | 2.7E−04 | 7.91E−13 | hsa03050:Proteasome |
27 | PSME4 | 212219_at | proteasome activator subunit 4 | BC112169 | 2p16.2 | 4.7E−06 | 1.6E−04 | 1.22E−13 | hsa03050:Proteasome |

E2F transcription factor 2
tyrosine 3-monooxygenase activation protein
DBF4 homolog
TTK protein kinase
protein kinase, membrane associated tyrosine/threonine 1
CHK1 checkpoint homolog ( )
anaphase promoting complex subunit 11
pituitary tumor-transforming 1
ring-box 1, E3 ubiquitin protein ligase
cyclin E2
cyclin E1
cell division cycle 45 homolog ( )
minichromosome maintenance complex component 7
RAD21 homolog
budding uninhibited by benzimidazoles 1 homolog
cyclin A2
transcription factor Dp-1
cell division cycle 7 homolog ( )
cell division cycle 6 homolog ( )
cyclin-dependent kinase 1
S-phase kinase-associated protein 2 (p45)

No. | Gene symbol | Affymetrix probeset | Description | GenBank accession | Cytoband | | | | KEGG pathway |
---|---|---|---|---|---|---|---|---|---|
78 | NCBP1 | 209520_s_at | nuclear cap binding protein subunit 1, 80 kDa | BC001450 | 9q34.1 | 6.0E−04 | 2.8E−03 | 4.61E−09 | hsa03040:Spliceosome |
79 | NHP2L1 | 201077_s_at | NHP2 non-histone chromosome protein 2-like 1 | BC005358 | 22q13 | 4.7E−06 | 6.3E−04 | 1.82E−12 | hsa03040:Spliceosome |
80 | PPIL1 | 222500_at | peptidylprolyl isomerase (cyclophilin)-like 1 | BC003048 | 6p21.1 | 1.1E−05 | 1.1E−03 | 1.34E−11 | hsa03040:Spliceosome |
81 | LSM7 | 204559_s_at | LSM7 homolog, U6 small nuclear RNA associated | BC018621 | 19p13.3 | 4.7E−06 | 1.1E−03 | 5.76E−12 | hsa03040:Spliceosome |
82 | SNRPD1 | 202690_s_at | small nuclear ribonucleoprotein D1 polypeptide 16 kDa | BC001721 | 18q11.2 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03040:Spliceosome |
83 | SNRPD2 | 200826_at | small nuclear ribonucleoprotein D2 polypeptide 16.5 kDa | BC000486 | 19q13.2 | 4.7E−06 | 5.7E−04 | 1.51E−12 | hsa03040:Spliceosome |
84 | SF3B5 | 221263_s_at | splicing factor 3b, subunit 5, 10 kDa | BC000198 | 6q24.2 | 4.7E−06 | 2.7E−04 | 3.49E−13 | hsa03040:Spliceosome |
85 | SF3B3 | 200687_s_at | splicing factor 3b, subunit 3, 130 kDa | BC003146 | 16q22.1 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03040:Spliceosome |
86 | HNRNPA3 | 211930_at | heterogeneous nuclear ribonucleoprotein A3 | BC064494 | 2q31.2 | 4.7E−06 | 7.3E−03 | 2.46E−10 | hsa03040:Spliceosome |
87 | HNRNPK | 200775_s_at | heterogeneous nuclear ribonucleoprotein K | BC000355 | 9q21.32-q21.33 | 1.0E−02 | 1.6E−03 | 2.40E−08 | hsa03040:Spliceosome |
88 | RBM8A | 222443_s_at | RNA binding motif protein 8A | BC017088 | 1q12 | 3.5E−04 | 8.7E−06 | 2.59E−14 | hsa03040:Spliceosome |
89 | USP39 | 217829_s_at | ubiquitin specific peptidase 39 | BC067273 | 2p11.2 | 1.4E−05 | 1.4E−04 | 2.73E−13 | hsa03040:Spliceosome |
90 | LSM4 | 202737_s_at | LSM4 homolog, U6 small nuclear RNA associated | BC000387 | 19p13.11 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03040:Spliceosome |
91 | LSM3 | 202209_at | LSM3 homolog, U6 small nuclear RNA associated | BC007055 | 3p25.1 | 8.1E−04 | 3.7E−03 | 1.13E−08 | hsa03040:Spliceosome |
92 | SNRPA1 | 215722_s_at | small nuclear ribonucleoprotein polypeptide A′ | BC022816 | 15q26.3 | 2.8E−05 | 8.7E−06 | 2.08E−15 | hsa03040:Spliceosome |
93 | EFTUD2 | 222398_s_at | elongation factor Tu GTP binding domain containing 2 | BC002360 | 17q21.31 | 4.7E−06 | 5.4E−03 | 1.36E−10 | hsa03040:Spliceosome |
94 | PRPF18 | 221547_at | PRP18 pre-mRNA processing factor 18 homolog | BC000794 | 10p13 | 2.2E−03 | 6.7E−05 | 9.91E−12 | hsa03040:Spliceosome |
95 | EIF4A3 | 201303_at | eukaryotic translation initiation factor 4A3 | BC004386 | 17q25.3 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03040:Spliceosome |
96 | SNRPB | 213175_s_at | small nuclear ribonucleoprotein polypeptides B and B1 | BC080516 | 20p13 | 4.7E−06 | 1.5E−04 | 9.78E−14 | hsa03040:Spliceosome |
97 | SNRPA | 201770_at | small nuclear ribonucleoprotein polypeptide A | BC000405 | 19q13.1 | 1.9E−04 | 4.5E−03 | 3.91E−09 | hsa03040:Spliceosome |
98 | SNRPC | 201342_at | small nuclear ribonucleoprotein polypeptide C | BC121082 | 6p21.31 | 4.7E−06 | 8.7E−06 | 3.50E−16 | hsa03040:Spliceosome |
99 | SNRNP27 | 212438_at | small nuclear ribonucleoprotein 27 kDa (U4/U6.U5) | BC017890 | 2p13.3 | 1.1E−05 | 4.6E−03 | 2.39E−10 | hsa03040:Spliceosome |
100 | PUF60 | 209899_s_at | poly-U binding splicing factor 60 kDa | BC008875 | 8q24.3 | 1.1E−05 | 8.7E−06 | 8.34E−16 | hsa03040:Spliceosome |
101 | SNRPG | 205644_s_at | small nuclear ribonucleoprotein polypeptide G | BC000070 | 2p13.3 | 4.7E−06 | 8.7E−06 | 3.50E−26 | hsa03040:Spliceosome |
102 | RBM17 | 224781_s_at | RNA binding motif protein 17 | BC007871 | 10p15.1 | 1.1E−05 | 8.7E−06 | 8.34E−16 | hsa03040:Spliceosome |

Number | Date | Country | Kind |
---|---|---|---|
201307917-3 | Oct 2013 | SG | national |
The present application is related to U.S. patent application Ser. No. 13/255,898.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/SG2014/000492 | 10/20/2014 | WO | 00 |