The present invention relates to identification of pairs of genes for which the respective gene expression values in a subject are statistically significant in relation to a medical condition, for example cancer, or more particularly breast cancer. The gene expression values may for example be indicative of the susceptibility of the subject to the medical condition, or the prognosis of a subject who exhibits the medical condition. The invention further relates to methods and arrays employing the identified gene pairs, and in particular three specific gene pairs obtained by the method, in obtaining information about specific subjects.
Global gene expression profiles of subjects are often used to obtain information about those subjects, such as their susceptibility to certain medical conditions, or, in the case of subjects exhibiting medical conditions, their prognosis. For example, having determined that a particular gene is important, the level at which that gene is expressed in a subject can be used to classify the individual into one of a plurality of classes, each class being associated with a different susceptibility or prognosis. An important task is the identification of “significant gene signature(s)”, that is, gene set(s) such that the corresponding gene expression values can be used to classify subjects in a useful way.
First we briefly describe the background theory of survival analysis. We denote by T the patient's survival time. T is a continuous non-negative random variable which can take values t, t ∈ [0,∞). T has density function f(t) and cumulative distribution function

$$F(t) = \Pr(T \le t) = \int_0^t f(u)\,du.$$

We are primarily interested in estimating two quantities, the survival function and the hazard function:

$$S(t) = \Pr(T > t) = 1 - F(t), \qquad h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}.$$

The survival function expresses the probability that a patient is alive at time t. It is often presented in the form S(t)=exp(−H(t)), where

$$H(t) = \int_0^t h(u)\,du$$

denotes the cumulative hazard. The hazard function assesses the instantaneous risk of death at time t, conditional on survival up to that time.
Notice that the hazard function is expressed in terms of the survival function. Consequently, survival distributions and hazard functions can be generated for any distribution defined for t ∈ [0,∞). By considering a random variable W, distributed in (−∞,∞), we can generate a family of survival distributions by introducing location (α) and scale (σ) changes of the form log T=α+σW.
Alternatively, we can express the relationship of the survival distribution to covariates by means of a parametric model. The parametric model employs a “regressor” variable x. Take for example a model based on the exponential distribution and write: log(h(t))=α+βx, or equivalently, h(t)=exp(α+βx).
This is a linear model for the log-hazard, or, equivalently, a multiplicative model for the hazard. The constant α represents the log-baseline hazard (the log-hazard when the regressor x=0) and the slope parameter β gives the change in the log-hazard per unit change in x. This is a simple example of how survival models can be obtained from simple distributional assumptions. In the next paragraphs we will see more specific examples.
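As a minimal sketch of the exponential model above, the following Python snippet evaluates the constant hazard h=exp(α+βx) and the implied survival function S(t)=exp(−h·t) for a binary regressor. The parameter values are illustrative assumptions, not estimates from any dataset.

```python
import math

def hazard(alpha, beta, x):
    """Constant hazard of the exponential model: h = exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

def survival(t, alpha, beta, x):
    """S(t) = exp(-H(t)), where H(t) = h * t for a constant hazard h."""
    return math.exp(-hazard(alpha, beta, x) * t)

# Assumed, illustrative parameters: log-baseline hazard and slope.
alpha, beta = math.log(0.05), 0.7

# The hazard ratio between x = 1 and x = 0 equals exp(beta), whatever alpha is.
hazard_ratio = hazard(alpha, beta, 1) / hazard(alpha, beta, 0)
s_baseline = survival(5.0, alpha, beta, 0)  # survival at t = 5 when x = 0
s_exposed = survival(5.0, alpha, beta, 1)   # survival at t = 5 when x = 1
```

With β>0 the group with x=1 has the higher hazard and therefore the lower survival probability at every time point.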
One of the most popular survival models is the Cox proportional hazards model (Cox, 1972):
log h(t)=α(t)+βx (1)
where, as before, t is the survival time, h(t) represents the hazard function, α(t) is the log-baseline hazard, β is the slope parameter of the model and x is the regressor. The popularity of this model lies in the fact that it leaves the baseline hazard function α(t) (which we may alternatively designate as log h0(t)) unspecified (no distribution assumed). The model can be estimated iteratively by the method of partial likelihood of Cox (1972). The Cox proportional hazards model is semi-parametric because, while the baseline hazard can take any form, the covariates enter the model linearly.
Suppose that for each of a plurality of K subjects (labelled by k=1, . . . , K), we observe at a corresponding time tk whether a certain nominal (i.e. yes/no) clinical event has occurred (e.g. whether there has been metastasis). This knowledge is denoted ek. For example ek may be 0 if the event has not occurred by time tk (e.g. no tumour metastasis at time tk) and 1 if the event has occurred (e.g. tumour metastasis at time tk). Cox (1972) showed that the β coefficient can be estimated efficiently by the Cox partial likelihood function:

$$L(\beta) = \prod_{k=1}^{K} \left[ \frac{\exp(\beta x_k)}{\sum_{j \in R(t_k)} \exp(\beta x_j)} \right]^{e_k} \qquad (2)$$

where xk is the regressor value of subject k and R(tk)={j: tj≥tk} is the risk set at time tk. Typically, e is a binary variable taking value 0=non-occurrence of the event until time t or 1=occurrence of the event at time t. Later we will discuss the particular clinical event considered in this work, without limiting the model to this specific case. The likelihood (2) is maximized (equivalently, its negative logarithm is minimized) by the Newton-Raphson optimization method, which finds successively better approximations to the zeroes (or roots) of a real-valued function, here the derivative of the log-likelihood, with a very simple elimination algorithm to invert and solve the simultaneous equations.
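The Newton-Raphson fit of the partial likelihood (2) can be sketched for a single covariate as follows. This is an illustrative pure-Python sketch, not the fitting routine used in this work: it assumes the usual risk set of subjects still under observation at tk, handles ties in the simplest way, and runs on made-up toy data.

```python
import math

def score_and_information(beta, times, events, x):
    """First derivative and negative second derivative of the log partial likelihood."""
    score, info = 0.0, 0.0
    for k, (t_k, e_k) in enumerate(zip(times, events)):
        if not e_k:
            continue  # censored subjects enter only through the risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_k]  # risk set R(t_k)
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(w_j * x[j] for w_j, j in zip(w, risk))
        s2 = sum(w_j * x[j] ** 2 for w_j, j in zip(w, risk))
        score += x[k] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def fit_cox_single_covariate(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson maximisation of the Cox partial likelihood for one covariate."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        step = score / info  # Newton update: beta <- beta + U(beta) / I(beta)
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy data, made up purely for illustration.
times = [2, 3, 5, 7, 8, 11, 13, 14]
events = [1, 1, 0, 1, 1, 0, 1, 1]
group = [1, 1, 0, 1, 0, 1, 0, 0]
beta_hat = fit_cox_single_covariate(times, events, group)
```

At convergence the score (first derivative) is numerically zero, and the information returned by `score_and_information` at `beta_hat` is the quantity needed for a Wald test.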
Assume a microarray experiment with i=1, 2, . . . , N genes, whose intensities are measured for k=1, 2, . . . , K breast cancer patients. The log-transformed intensities of gene i and patient k are denoted as yi,k. Log-transformation serves for data “Gaussianization” and variance stabilization purposes, although other approaches, such as the log-linear hybrid transformation of Holder et al. (2001), the generalized logarithm transform of Durbin et al. (2002) and the data-driven Haar-Fisz transform have also been considered in the literature.
Associated with each patient k are a disease free survival time tk (in this work DFS time), a nominal clinical event ek taking values 0 in the absence of an event until DFS time tk or 1 in the presence of the event at DFS time tk (DFS event) and a discrete gradual characteristic (histologic grade). Note that in this particular work the events correspond to the presence or absence of tumor metastasis for each of the K patients. Other types of events and/or survival times can also be analyzed by the model discussed below. Additional information, which is not utilized in this work, includes patients' age (continuous variable ranging from 28 to 93 years old), tumor size (in millimeters), breast cancer subtype (Basal, ERBB2, Luminal A, Luminal B, No subtype, normal-like), patients' ER status (ER+ and ER−) and distant metastasis (a binary variable indicating the presence or absence of distant metastasis).
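Assuming, without loss of generality, that the K clinical outcomes are negatively correlated with the vector of expression signal intensity yi of gene i, patient k can be assigned to the high-risk or the low-risk group according to:

$$x_k^i = \begin{cases} 1 \ (\text{high-risk}), & \text{if } y_{i,k} > c^i \\ 0 \ (\text{low-risk}), & \text{if } y_{i,k} \le c^i \end{cases} \qquad (3)$$

where ci denotes the predefined cutoff of the ith gene's intensity level. In the case of positive correlation between the K clinical outcomes and yi, patient k is simply assigned to one of the two groups according to:

$$x_k^i = \begin{cases} 1 \ (\text{high-risk}), & \text{if } y_{i,k} \le c^i \\ 0 \ (\text{low-risk}), & \text{if } y_{i,k} > c^i \end{cases}$$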
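The grouping rule can be sketched in Python as below. The sketch assumes that, in the negatively correlated case, a patient is high-risk (coded 1) when the gene's intensity exceeds the cutoff, with the coding reversed in the positively correlated case, and that the cutoff defaults to the mean intensity. The function name and the toy intensities are hypothetical.

```python
from statistics import mean

def assign_risk_groups(y_i, cutoff=None, negatively_correlated=True):
    """Return x_k (1 = high-risk, 0 = low-risk) for every patient k of one gene.

    y_i    : log-transformed intensities of gene i, one value per patient
    cutoff : c_i; defaults to the mean intensity of the gene (an assumed choice)
    """
    c = mean(y_i) if cutoff is None else cutoff
    if negatively_correlated:
        return [1 if y > c else 0 for y in y_i]
    return [1 if y <= c else 0 for y in y_i]

# Toy intensities for four patients (made up); the mean cutoff is 7.7.
y_gene = [7.1, 8.4, 6.0, 9.3]
groups_neg = assign_risk_groups(y_gene)
groups_pos = assign_risk_groups(y_gene, negatively_correlated=False)
```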
After specifying xki, the DFS times and events are subsequently fitted to the patients' groups by the Cox proportional hazard regression model:
$$\log h_k^i(t_k \mid x_k^i, \beta_i) = \alpha_i(t_k) + \beta_i x_k^i \qquad (4)$$
where, as before, $h_k^i$ is the hazard function and $\alpha_i(t_k)=\log h_0^i(t_k)$ represents the unspecified log-baseline hazard function for gene i; βi is the regression parameter to be estimated from the model; and tk is the patient's survival time. To assess the ability of each gene to discriminate the patients into two distinct genetic classes, the Wald statistic (W) of the βi coefficient of model (4) is obtained after maximizing the univariate Cox partial likelihood function for each gene i:

$$L(\beta_i) = \prod_{k=1}^{K} \left[ \frac{\exp(\beta_i x_k^i)}{\sum_{j \in R(t_k)} \exp(\beta_i x_j^i)} \right]^{e_k}$$

where R(tk)={j: tj≥tk} is the risk set at time tk and ek is the clinical event at time tk. The actual fitting of model (4) is conducted by the survival package in R (http://cran.r-project.org/web/packages/survival/index.html). The genes with the largest βi Wald statistics (Wi's) or the lowest βi Wald P values are assumed to have better group discrimination ability and are thus called highly survival significant genes. These genes are selected for further confirmatory analysis or for inclusion in a prospective gene signature set. Note that given the estimate $\hat\beta_i$ of βi, one derives the Wald statistic, W, as:

$$W = \frac{\hat\beta_i^{\,2}}{\mathrm{Var}(\hat\beta_i)}$$

where

$$\mathrm{Var}(\hat\beta_i) = I(\hat\beta_i)^{-1}$$

and I denotes the Fisher information matrix of the βi parameter. Estimating the Wald P value simply requires evaluation of the probability:

$$p_i = \Pr\left(\chi^2_v \ge W\right) \qquad (5)$$

where $\chi^2_v$ denotes the chi-square distribution with v degrees of freedom.
Typically, v is the number of parameters of the Cox proportional hazards model and in our case v=1. Expression (5) can be obtained from standard statistical tables of the chi-square distribution.
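For v=1 the Wald P value needs no tables, since Pr(χ²₁ ≥ W) = erfc(√(W/2)). The following minimal sketch assumes a scalar coefficient estimate and its scalar Fisher information; the function name and the numerical inputs are illustrative.

```python
import math

def wald_test(beta_hat, information):
    """Wald statistic and P value for a single Cox coefficient (v = 1).

    beta_hat    : estimated coefficient
    information : Fisher information evaluated at beta_hat (a scalar here)
    """
    variance = 1.0 / information       # Var(beta_hat) = I(beta_hat)^(-1)
    w = beta_hat ** 2 / variance
    # A chi-square variable with 1 degree of freedom is a squared standard normal,
    # so Pr(chi2_1 >= w) = Pr(|Z| >= sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

# Illustrative inputs: a coefficient of 1.96 with unit information.
w_stat, p_val = wald_test(1.96, 1.0)
```

With these inputs the statistic is 1.96² and the P value is close to the conventional 0.05 threshold.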
From Eqn. (3) notice that the selection of prognostically significant genes relies on the predefined cut-off value ci that separates the low-risk from the high-risk patients. The simplest cut-off basis is the mean of the individual gene expression values within samples, although other choices (e.g. median, trimmed mean, etc.) could also be applied.
In general terms, a first aspect of the present invention proposes that, instead of testing the significance of the expression of genes individually, many pairs of genes are generated, and for each pair of genes clinical data is used to fit a statistical model to obtain the statistical significance of the ratio of the corresponding expression values. The clinical data characterizes, for each of the patients, the levels of expression of the genes and the time until a clinical endpoint of interest (“survival time”).
Thus, in contrast to the known techniques described above, reliance is not exclusively on linear (or sometimes non-linear) associations between the expression of individual genes and the clinical endpoint of interest. The present invention is motivated by the belief that it is biologically plausible that some genes may by themselves have no strong or obvious statistical correlation with survival, but when put together in a ratio with another gene, particularly one that it interacts with on a biological basis, could result in a “synergistic” correlation with outcome.
One possible expression of this aspect of the invention is a computerized method for identifying one or more pairs of genes, selected from a set of N genes, which are statistically associated with prognosis of a potentially fatal medical condition,
the method employing test data which, for each subject k of a set of K* subjects suffering from the medical condition, indicates (i) a survival time of subject k, and (ii) for each gene i, a corresponding gene expression value yi,k of subject k;
the method comprising:
(i) forming a plurality of pairs of the identified genes (i, j with i≠j), and for each pair of genes:
The survival time may be an actual survival time (i.e. a time taken to die) or a time spent in a certain state associated with the medical condition, e.g. a time until metastasis of a cancer occurs.
Note that the K* subjects may be a subset of a larger dataset of K (K>K*) subjects. For example, the data for K* subjects can be used as training data, and the rest used for validation.
Alternatively, a plurality of subsets of the K subjects can be defined, and the method defined above is carried out independently for each of the subsets. Each of these subsets of the K subjects is a respective “cohort” of the subjects; if the cohorts do not overlap, they are independent training datasets. Note that each time the method is performed for a certain cohort, K* denotes the number of subjects in that cohort, which may be different from the number of subjects in other of the cohorts. After this, there is a step of discovering which pairs of genes were found to be significant for all the cohorts.
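The final step, finding the gene pairs that were significant in every cohort, amounts to a set intersection. A minimal sketch follows, in which the cohort names, gene labels and data layout are hypothetical:

```python
def pairs_significant_in_all(cohort_results):
    """Return the gene pairs found significant in every cohort.

    cohort_results : dict mapping a cohort name to an iterable of (gene_i, gene_j)
                     pairs judged significant in that cohort
    """
    # Sort each pair so that (i, j) and (j, i) are treated as the same pair.
    normalised = [{tuple(sorted(pair)) for pair in pairs}
                  for pairs in cohort_results.values()]
    return set.intersection(*normalised) if normalised else set()

# Hypothetical results for three cohorts.
results = {
    "cohort_A": [("G1", "G2"), ("G3", "G4"), ("G5", "G6")],
    "cohort_B": [("G2", "G1"), ("G3", "G4")],
    "cohort_C": [("G1", "G2"), ("G7", "G8"), ("G4", "G3")],
}
common_pairs = pairs_significant_in_all(results)
```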
Optionally, there may be one or more further steps of reducing the number of candidate gene pairs, to find the most statistically significant.
Once one or more pairs of significant genes are identified, they can be used to obtain useful information in relation to a certain subject (typically not one of the cohort(s) of subjects) using a statistical model which takes as an input the ratio(s) of the expression values of the corresponding identified pair(s) of genes. The information may for example be susceptibility to the medical condition, or the prognosis (e.g. relating to recurrence or death) of a subject suffering from the condition.
A second aspect of the invention relates to three specific gene pairs which were identified using a method employing gene ratios in relation to breast cancer. These gene pairs are here referred to as:
The gene expression values of one or more of these three pairs of genes are obtained for a subject, the ratio(s) of the expression values of these pair(s) of genes are found, and then the results are entered into a statistical model which takes as an input the ratio(s) of the expression values of the pair(s) of genes, to obtain information about the subject in relation to breast cancer. The information may for example be susceptibility to the medical condition, or prognosis of a subject suffering from the condition.
Based on the same idea, an array may be provided for obtaining the gene expressions of at least one (and preferably all) of the three pairs of genes (i)-(iii). Since these three pairs of genes are known to have statistical significance in relation to breast cancer, the array need not additionally be designed to obtain the expression values for a great many other genes, and therefore the array can be manufactured at lower cost than conventional arrays which measure the expression values of hundreds or thousands of other genes. In fact, preferably the present arrays may measure the expression values of no more than 100 genes, or more preferably no more than 20 genes, or even just the three pairs of genes (i) to (iii).
The method, as herein described, may be performed with a computer system and/or apparatus. The invention also proposes computer program products (e.g. stored on a tangible recording medium) which are software operable by a computer to cause the computer to perform the computational steps of the method.
Having now generally described the invention, the same will be more readily understood through reference to the following exemplary embodiments which are provided by way of illustration, and are not intended to be limiting of the present invention.
“Array” or “microarray,” as used herein, comprises a surface with an array, preferably an ordered array, of putative binding (e.g., by hybridization) sites for a sample which often has undetermined characteristics. An array can provide a medium for matching known and unknown nucleic acid molecules based on base-pairing rules and automating the process of identifying the unknowns. An array experiment can make use of common assay systems such as microplates or standard blotting membranes, and can be worked manually, or make use of robotics to deposit the sample. The array can be a macro-array (containing nucleic acid spots of about 250 microns or larger) or a micro-array (typically containing nucleic acid spots of less than about 250 microns).
The term “cut-off” represented by c in the context of the classification, refers to a value for which if the expression level of a particular nucleic acid molecule (or gene) in a subject is above the cut-off, the subject is classified into a first classification group; and if the expression level of the nucleic acid molecule is below or equal to the cut-off, the subject is classified into a second classification group.
The term “gene” refers to a nucleic acid molecule that encodes, and/or expresses in a detectable manner, a discrete product, whether RNA or protein. It is appreciated that more than one nucleic acid molecule may be capable of encoding a discrete product. The term includes alleles and polymorphisms of a gene that encodes the same product or an analog thereof.
Standard molecular biology techniques known in the art and not specifically described were generally followed as described in Sambrook and Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York (2001).
The steps of the method are shown in
Label the subjects of one cohort k=1, 2, . . . , K*. For each subject k, there is an expression value yi,k of each of the genes i, and the logarithm of this expression value is given by log yi,k. For each gene and for each of the identified probes, we stratify the subjects (step 2) in the following way. For each gene i and each pair of genes (i, j with i≠j) we define a cut-off parameter ci or ci,j respectively, and define:

$$x_k^i = \begin{cases} 1, & \text{if } \log y_{i,k} > c^i \\ 0, & \text{if } \log y_{i,k} \le c^i \end{cases} \qquad x_k^{i,j} = \begin{cases} 1, & \text{if } \log y_{i,k} - \log y_{j,k} > c^{i,j} \\ 0, & \text{if } \log y_{i,k} - \log y_{j,k} \le c^{i,j} \end{cases}$$

Due to the mathematical identity log A−log B=log(A/B) for any positive A and B, the cut-off ci,j stratifies the subjects according to the ratio of the expression levels yi,k and yj,k.
The values of ci and ci,j may be selected according to a mean or a median over the cohort.
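The pair-wise stratification can be sketched as follows. The sketch assumes that a subject is coded 1 when the difference of log expression values exceeds the cut-off and 0 otherwise, and it takes the cohort median of the log-ratio as the default cut-off. Function and variable names are hypothetical and the expression values are made up.

```python
import math
from statistics import median

def stratify_by_ratio(y_i, y_j, cutoff=None):
    """Split subjects by the log-ratio of two genes' expression values.

    y_i, y_j : raw (positive) expression values of genes i and j, one per subject
    cutoff   : c_{i,j}; defaults to the cohort median of the log-ratio
    Returns (group codes, cut-off actually used).
    """
    # log y_i - log y_j equals log(y_i / y_j), so this thresholds the expression ratio.
    log_ratio = [math.log(a) - math.log(b) for a, b in zip(y_i, y_j)]
    c = median(log_ratio) if cutoff is None else cutoff
    return [1 if d > c else 0 for d in log_ratio], c

# Toy expression values for four subjects.
gene_i = [200.0, 50.0, 400.0, 100.0]
gene_j = [100.0, 100.0, 100.0, 100.0]
codes, used_cutoff = stratify_by_ratio(gene_i, gene_j)
```

The resulting 0/1 codes play the role of the regressor in the Cox model fitted for the gene pair.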
The significance of each individual gene is then obtained by fitting the survival data to Eqn. (4) above (step 3). The significance of each pair of genes i,j may be determined by fitting the survival data to:
$$\log h_k^{i,j}(t_k \mid x_k^{i,j}, \beta_{i,j}) = \alpha_i(t_k) + \beta_{i,j}\, x_k^{i,j} \qquad (6)$$
We then assess the significance of genes and pairs of genes from the values of βi and βi,j respectively, for example by obtaining p values from a likelihood ratio test or log rank test. The result reflects the prognostic value of a given gene or gene ratio. The smaller the p-value, the more strongly the gene or gene ratio is associated (in a linear manner) with patient survival. Thus significant pairs of genes, or individual genes, are identified (step 4).
The method of
To test the theory, we targeted a specific clinical entity: early stage (lymph node negative, small tumor diameter) breast cancer. The rationale is that multiple independent cohorts, where the tumor material has been assessed by expression microarrays (namely the Affymetrix U133 platform), and where long term clinical follow-up information is available (i.e. distant metastasis-free survival data), exist and can be utilized for the validation of the methodology. We seek to determine if the gene expression ratios can significantly enhance patient prognosis in this disease type over conventional and widely publicized methods involving single gene measurements.
Once significant gene ratios (and optionally also genes) have been identified, they can be used to obtain information relating to specific subjects (e.g. prognosis for those subjects) using measured expression values of those genes in those subjects. This can be done by forming a model (e.g. a Cox proportional hazards model) in which those gene ratios are weighted with values obtained based on a training dataset, and extracting the information from the model, e.g. in the form of a diagram indicating survival probability for the subject against time.
Another aspect of this example relates to the biological inferences that can be drawn through further analysis of the “best” prognostic expression ratios. In some instances we expect that the prognostic significance of the gene ratios lies in the ratio's capacity to provide better functional information regarding pathway activity, or other conditional relationships that drive aggressive tumor growth, than individual gene expression levels. Thus, the methodology may lead to the discovery of clinically relevant biological/pathological interactions that could ultimately be therapeutically targetable. Examples of such interactions could be transcription factors/targets, phosphatases/kinases, agonists/antagonists of the same pathway, oncogenes/tumor suppressor genes, etc.
We now present experimental results obtained using 5 cohorts of subjects, referred to here as the Uppsala, Erasmus, Oxford, Stockholm and Transbig cohorts, in a method which is a second embodiment of the invention and illustrated in
Details of the Erasmus cohort are given in Wang et al. Details of the Oxford cohort are given in Loi et al. Details of the Transbig cohort are given in Desmedt et al.
This experiment has focused on the clinical entity: LN−, untreated breast cancer. Representing this entity is a large collection of patient data we have assembled that comprises 5 independent patient cohorts totaling 708 cases for which we have corresponding gene expression profiles and clinical annotation. The primary clinical outcome studied was distant metastasis-free survival (i.e., the time interval from initial diagnosis of the primary tumor to distant recurrence, or last follow-up whereby the patient was diagnosed as relapse-free) within a 10-year window.
Each cohort was selected based on compliance with the following 2 clinical criteria: 1) median patient follow-up of at least 5 years, and 2) all patients were treated by surgery with/without local radiation, and without systemic adjuvant (or neo-adjuvant) therapy. Our rationale for selecting node-negative, untreated patients is two-fold. First, that these patients have received no systemic adjuvant therapy ensures that the prognostic associations we discover are not confounded by systemic treatment effects. Second, this subgroup represents early stage breast cancer—for which the decision to treat or not to treat (or how aggressively to treat) is controversial and thus would benefit from better prognostic markers. Clinical characteristics of the cohorts and corresponding reference information are shown in Table 1.
All patient/tumor clinical data were obtained from original published reports or by personal communication with the study leaders. All raw microarray data were downloaded from the NCBI's Gene Expression Omnibus (GEO) via the GEO accession numbers presented in Table 1. All five microarray studies were conducted on the Affymetrix U133A or U133 Plus 2.0 array platforms which contain 22,268 overlapping gene probe sets. Each of the 708 gene array profiles was evaluated for quality according to our previously published methods (Ploner, 2005). All expression profiles were MAS5.0 processed, scaled to a median target intensity of 500 and log2 transformed. (Post-scaling intensity values<10 were adjusted to 10 in order to prevent negative values after log transformation.) For discovery and validation purposes, the cohorts were divided into training (Uppsala, Stockholm, OXFU, and EMC) and test (TBIG) cohorts.
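The scaling, flooring and log2 steps can be sketched as below. The sketch assumes that scaling means multiplying each array by a single factor that brings its median intensity to 500; the MAS5.0 summarisation itself is not reproduced, and the function name and input values are hypothetical.

```python
import math
from statistics import median

def scale_floor_log2(intensities, target_median=500.0, floor=10.0):
    """Scale one array to a target median, floor small values, then log2-transform."""
    factor = target_median / median(intensities)
    scaled = [v * factor for v in intensities]
    # Scaled values below the floor are raised to the floor before taking logs,
    # so every transformed value is at least log2(floor).
    return [math.log2(max(v, floor)) for v in scaled]

# Toy probe-set intensities for one array (median 250, so the factor is 2).
raw = [1.0, 250.0, 1000.0]
log_values = scale_floor_log2(raw)
```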
Prior to computing the gene ratios, the Affymetrix data were filtered to exclude control probe sets, probe sets that failed to map to human gene sequences with high specificity (Vlad), and probe sets with mean signal intensities of 50 units or less in 2 or more training cohorts. After exclusions, 18,190 probe sets remained for ratio analysis. All possible pair-wise combinations of these probe sets equalled 18,190²/2, or 165,438,050 unique gene-pair combinations (step 11 of
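The size of the search space can be checked directly. The short sketch below reproduces the N²/2 figure quoted above and, for comparison, the number of unordered pairs with i≠j, N(N−1)/2. The two differ by N/2, which is negligible at this scale.

```python
from math import comb

n_probe_sets = 18190

# The N^2 / 2 figure quoted in the text.
quoted_pairs = n_probe_sets ** 2 // 2

# Number of unordered pairs (i, j) with i != j, i.e. N(N-1)/2.
unordered_pairs = comb(n_probe_sets, 2)

difference = quoted_pairs - unordered_pairs  # equals N / 2
```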
We next analyzed all unique gene-pair combinations, independently, in each of the 4 training cohorts, for statistical correlations with DMFS by Cox Proportional-Hazards Regression (step 12 of
Using a data extraction script, the results files output from the Hilbert Cluster were filtered to retain the gene ratios having the following properties: 1) p-value>10-fold more significant than either of the 2 contributing genes (in at least 1 of 4 cohorts), 2) p-value<0.01 in all 4 training cohorts, and 3) consistent directionality (i.e., the ratio must be either positively or negatively correlated with DMFS in all 4 cohorts). In total, we identified 5284 ratios that passed these criteria (step 13 of
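The three retention criteria can be expressed as a small predicate. This is only a sketch of the logic, not the extraction script itself. The data layout is hypothetical, and criterion 1 is read here as requiring the ratio's p-value to be less than one tenth of the smaller of the two single-gene p-values in at least one cohort.

```python
def passes_filter(ratio_p, gene_p, direction, alpha=0.01, fold=10.0):
    """Apply the three retention criteria to one gene ratio.

    ratio_p   : dict cohort -> Cox p-value of the ratio
    gene_p    : dict cohort -> (p-value of gene i, p-value of gene j)
    direction : dict cohort -> sign of the ratio's coefficient (+1 or -1)
    """
    # 1) more than `fold` times more significant than either gene in some cohort
    beats_genes = any(ratio_p[c] * fold < min(gene_p[c]) for c in ratio_p)
    # 2) p-value below alpha in every cohort
    significant_everywhere = all(p < alpha for p in ratio_p.values())
    # 3) same direction of association in every cohort
    consistent = len(set(direction.values())) == 1
    return beats_genes and significant_everywhere and consistent

# Hypothetical values for one ratio across four training cohorts.
ratio_p = {"A": 1e-4, "B": 5e-3, "C": 2e-3, "D": 8e-3}
gene_p = {"A": (0.02, 0.005), "B": (0.01, 0.02), "C": (0.03, 0.004), "D": (0.5, 0.02)}
direction = {"A": 1, "B": 1, "C": 1, "D": 1}
keep = passes_filter(ratio_p, gene_p, direction)
```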
To develop prognostic models from the selected gene ratios described above, we first combined all 510 tumor (ratio) profiles from the 4 initial training cohorts into a single “unified” training cohort (step 14 of
However, we have observed that this variation is largely systematic, affecting all gene distributions within a cohort in a similar way. This property is illustrated in the left and middle panels of
As our goal was to identify a small number of gene pairs that contribute to prognosis in a synergistic fashion (step 15 of
Typically, in analogous classification procedures, the ideal prognostic model is extrapolated from a feature selection/internal cross-validation strategy to minimize data overfitting and to determine the optimal number of features for inclusion in the model. In our case, however, we used a clinically guided approach to select a minimal number of ratios for inclusion and to simultaneously define thresholds for “good”, “intermediate” and “poor” outcome categories.
Specifically, we asked at what number of ratios in the model can we identify one or more SOW classes that correspond to patients with a distant recurrence rate of <10% at 10 years. The answer, we observed, was 3 or more ratios. With only 3 ratios in the model, the SOW of greatest negative magnitude (−0.86) contained 149 cases which together had a 10-year recurrence rate of only 8.4%—which we subsequently defined as the good outcome group. We then classified the remainder of cases into intermediate and poor outcome groups using SOW thresholds that provided roughly equivalent survival curve separation between the three groups. This can be seen in the Kaplan-Meier plot of
To test the prognostic performance of the 3-ratio model in the TBIG cohort, we extracted the expression intensities of the 3 gene pairs, computed their corresponding ratios, and applied the Adaboost thresholds and weights as defined in the model. The resulting good, intermediate and poor outcome groups showed significantly different survival rates (p<0.0001; likelihood ratio test) with the good outcome group having a 10-year recurrence rate of 7.8%—comparable to the training set (
With an interest in the reproducibility of each ratio's contribution to the performance of the model, we examined the prognostic potential of each ratio in the TBIG cohort (Table 2). By univariate analysis, we found that all three ratios were significantly correlated with distant recurrence. Moreover, in a multivariate model, each ratio remained significant, suggesting a unique contribution from each ratio to the prognostic power of the 3-ratio predictor.
5. Comparison of the 3-Ratio Predictor with Conventional Variables.
A primary reason for selecting the TBIG cohort for model testing was its detailed annotation for conventional prognostic markers (including the test results for the 70-gene MammaPrint assay) with which we could compare the prognostic value of our predictor via multivariate analysis. In this cohort, the 3-ratio predictor, tumor size, grade, and ER status were all significantly correlated with DMFS (Table 3). However, when considered in the multivariate model, only the 3-ratio predictor and tumor size remained significant at the 0.05 level, reflecting their unique contributions to prognosis. Furthermore, when considered in the presence of the MammaPrint score, the 3-ratio predictor remained significant, indicating an additive prognostic contribution distinct from that of MammaPrint.
Note that in this embodiment, there is no pre-selection of genes based on survival/prognostic association. Rather, all gene pairs (i.e. ratios) are considered by Cox regression. The embodiment thus screens all possible gene combinations (limited only by the number of probes on the microarray), looking for those ratios with the greatest robustness (i.e. reproducible survival associations across 4 independent training datasets, in our example). We then combined all these ratios into one set, combined all the tumour samples from the 4 training datasets into one dataset, and used Adaboost to find the few ratios with the greatest complementarity in predicting outcome (i.e. that work well together in a prognostic model).
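One way such a model could be applied to a new tumour sample is a weighted vote over thresholded ratios, as sketched below. This is an assumption-laden illustration rather than the trained model: the thresholds, weights, score cut points and the direction of each vote are all hypothetical placeholders.

```python
def weighted_vote(log_ratios, thresholds, weights):
    """Signed sum of weights: +w if a ratio exceeds its threshold, -w otherwise."""
    return sum(w if r > t else -w
               for r, t, w in zip(log_ratios, thresholds, weights))

def outcome_group(score, good_below=-0.5, poor_above=0.5):
    """Map a score to an outcome category using two (hypothetical) cut points."""
    if score <= good_below:
        return "good"
    if score >= poor_above:
        return "poor"
    return "intermediate"

# Hypothetical thresholds and weights for three ratios.
thresholds = [0.0, 0.0, 0.0]
weights = [0.4, 0.3, 0.2]

group_low = outcome_group(weighted_vote([-1.0, -1.0, -1.0], thresholds, weights))
group_mid = outcome_group(weighted_vote([1.0, -1.0, -1.0], thresholds, weights))
group_high = outcome_group(weighted_vote([1.0, 1.0, 1.0], thresholds, weights))
```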
Genbank sequence: NM_006472 (SEQ ID NO: 1)
Affymetrix ID: 201010_s_at
Name: Thioredoxin interacting protein; TXNIP (symbol)
Aliases: VDUP1; Thioredoxin-binding protein 2; Vitamin D3-upregulated protein 1
Genbank sequence: AI826060 (SEQ ID NO: 2)
Affymetrix ID: 202069_s_at
Name: Isocitrate dehydrogenase 3 (NAD+) alpha; IDH3A (symbol)
Aliases: Isocitrate dehydrogenase 3, alpha subunit
Genbank sequence: NM_002497 (SEQ ID NO: 3)
Name: NIMA (never in mitosis gene a)-related kinase 2; NEK2 (symbol)
Aliases: NIMA-related kinase 2
Genbank sequence: T15766 (SEQ ID NO: 4)
Name: Calcium/calmodulin-dependent protein kinase (CaM kinase) II beta; CAMK2B (symbol)
Aliases: 2.7.1.123; CAM2; CAMKB; MGC29528; CaM kinase II beta subunit; CaM-kinase II beta chain; proline rich calmodulin-dependent protein kinase
Genbank sequence: NM_005008 (SEQ ID NO: 5)
Name: NHP2 non-histone chromosome protein 2-like 1 (S. cerevisiae); NHP2L1 (symbol)
Aliases: NHP2-like protein 1; Non-histone chromosome protein 2, S. cerevisiae, homolog-like 1; U2/U6-15.5K protein
Genbank sequence: NM_022341 (SEQ ID NO: 6)
Affymetrix ID: 219575_s_at
Name: Peptide deformylase (mitochondrial); COG8 (symbol)
Aliases: Transcribed locus, strongly similar to NP_115758.3 component of oligomeric golgi complex 8 [Homo sapiens].
For a final assessment of ratio performance, we analyzed another well known breast cancer microarray cohort from the Netherlands Cancer Institute (NKI). The NKI cohort comprises 295 consecutive breast cancer patients with detailed clinical annotation, and the patient samples were profiled on a comprehensive Agilent microarray. Notably, microarray analysis of this cohort enabled the discovery and validation of the 70-gene MammaPrint signature. While the Affymetrix and Agilent microarrays share the ability to detect a large common set of overlapping genes, the probe design algorithms are different, resulting in oligonucleotides of different sequence and length otherwise designed to detect the same genes. Upon accessing this microarray dataset, we found that each of the 6 genes of our 3-ratio predictor was represented by at least 1 probe on the Agilent array. In the case of redundant probes, we averaged the expression values to obtain a single value for each gene. Next, we computed the gene ratios, then divided the cohort into 2 subgroups: LN−, untreated (n=141) and LN+ (n=144, >80% were treated with chemotherapy, hormone therapy, or both). Due to technical differences between the array platforms that influence ratio distributions, the 3 ratios were “retrained” by the Adaboost algorithm to identify appropriate thresholds and weights for each ratio. In effect, this optimizes the survival separation achievable by the 3 ratios (and thus could be susceptible to overfitting). While this does not constitute an independent validation of the predictor's performance, it does allow us to evaluate the relative prognostic potential of the 3-ratio predictor in the context of a different gene expression platform (Agilent) and LN+, systemically-treated patients. First, by univariate analysis, we found that each ratio was significantly associated with DMFS by Cox regression in the NKI cohort (p-values: 0.02, 0.006, 0.002, for ratios #1, #2 and #3, respectively).
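The probe-averaging and ratio steps can be sketched as follows, assuming the probe values are already on a logarithmic scale so that a ratio becomes a difference of averaged values. The function name and numbers are hypothetical.

```python
from statistics import mean

def gene_log_ratio(probes_a, probes_b):
    """Log-ratio of two genes, averaging redundant probes for each gene first.

    probes_a, probes_b : log-scale expression values of all probes mapping to
                         gene A and gene B in one sample (at least one each)
    """
    return mean(probes_a) - mean(probes_b)

# Gene A has two redundant probes in this sample, gene B a single probe.
example_ratio = gene_log_ratio([1.0, 3.0], [0.5])
```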
Next, by Kaplan-Meier analysis, we observed a very significant separation between the good, intermediate and poor outcome groups (
Using the thresholds and weights determined in the LN− cohort, we then directly tested the 3-ratio predictor on the LN+ NKI cohort. As can be seen in
The disclosure of the following documents is hereby incorporated by reference:
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/SG2010/000079 | 3/10/2010 | WO | 00 | 9/8/2011 |
Number | Date | Country | |
---|---|---|---|
61158948 | Mar 2009 | US |