The invention of this application relates to the analysis of genes of unknown function, to a gene function database used for the analysis, and to a method of constructing the database. More particularly, it relates to a novel method for analyzing the function of function-unknown genes, which are useful as genetic material for pharmacogenomics and for the production of various useful proteins by genetic engineering, and to a gene function database used for the analysis as well as a method for constructing the database.
As a result of the Human Genome Project, all human gene sequences will soon be elucidated, and it is predicted that the causative genes for all genetic diseases will be identified in the near future. Identification of disease-causing genes is expected to contribute greatly to accurate and simple diagnosis and to effective prevention and therapy.
However, although many causative genes have already been identified, most of them have not yet been applied to the development of therapeutic drugs or the like, because the functions of the causative genes (that is, of their expression products) have not been elucidated. For example, even when the involvement of a causative gene in a pathology is demonstrated using knockout mice or similar models, the action mechanism of the gene product in that process remains unclear. It is therefore not possible to search for a compound (a lead compound) acting on the gene product (a target protein) and to develop a drug from such a compound.
Until now, the analysis of gene function has depended largely on the judgment of individual researchers, and considerable labor and expense have been required to analyze the function of a single gene. It is predicted that identifying, among huge numbers of gene products, those related to the expression product of a disease-causing gene will become increasingly difficult, and there has therefore been a strong demand for a new method of analyzing gene function.
The invention of this application has been achieved in view of the circumstances described above, and its objects are to provide a novel method for simply and accurately analyzing the function of function-unknown genes, a gene function database used in that method, and a method of constructing the database.
As a means of achieving the above objects, this application provides a method for constructing a gene function database, which comprises steps (a) to (d) described below.
In the method for constructing the database, it is a preferred embodiment that the number of known genes (g1 . . . gn) is 50 or more and that the number of drugs (D1 . . . Dn) is 40 or more.
This application further provides a gene function database, which is constructed by the above constructing method.
This application furthermore provides a method of analyzing the functions of a function-unknown gene (gx) on the basis of DSPA (Drug Sensitivity Pattern Analysis) using the gene function database set forth above, which comprises steps (i) to (iii) described below.
This application still further provides a method for constructing the database according to claim 1, wherein the data of an unknown gene (gx) whose function has been determined by the above method of analyzing functions are added to the database as data of a function-known gene, and also provides a gene function database constructed by that method.
The gene function database of this application is constructed by the following steps (a) to (d).
Step (a):
The viabilities of transformed eukaryotic cells overexpressing each of a plurality of function-known genes (g1, g2, g3, . . . gn), and of their parental cell lines, are measured against a plurality of drugs (D1, D2, D3, . . . Dn) at various concentrations.
The function-known genes are genes whose expression products have known functions; not less than 50, preferably not less than 100, such genes are used. The full-length cDNA of each gene is inserted into an expression vector for eukaryotic cells, and the recombinant vector is transfected into eukaryotic cells. Known vectors such as pRc-CMV, pcDNA3 and pMSG may be used as appropriate. Examples of the eukaryotic cells include, without limitation, cell lines such as the mouse fibroblast line NIH3T3 and Ha-ras-NIH3T3. Transfectants may be selected with a selection drug appropriate to the type of drug-resistance gene carried by the vector. For each of the genes (g1, g2, g3, . . . gn), the transfected cell lines are checked for gene expression by Western or Northern blotting, and the line with the highest expression is selected.
The viabilities of these gene-transfected cells and of their parental cell line are then measured against a plurality of drugs at various concentrations. The drugs are physiologically active substances (such as cytokines) or other drugs known to affect the viability of the parental cell line, excluding the selection drug to which the drug-resistance gene of the vector used for gene transfer confers resistance. Forty or more drugs are used.
Cell viability can be measured by various known methods, of which the MTT method is preferred. In the MTT method, the MTT dye (the tetrazolium salt 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide) is converted by the mitochondrial succinate dehydrogenase of growing cells into a colored formazan, and this coloration is measured (J. Immunol. Methods 65: 55-63, 1983; J. Immunol. Methods 116: 151-158, 1989). Because it is quick and precise, the method is widely used for measuring cell growth.
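The text does not state how viability is derived from the formazan readings. The following is a minimal sketch that assumes the common convention of expressing the blank-corrected absorbance of drug-treated wells as a percentage of untreated control wells. The function name and the readings are hypothetical.

```python
def percent_viability(a_treated: float, a_control: float, a_blank: float) -> float:
    """Viability (%) of drug-treated cells relative to untreated controls.

    Assumed convention (not stated in the source):
    a_treated: mean formazan absorbance of wells exposed to the drug
    a_control: mean absorbance of untreated (vehicle-only) wells
    a_blank:   mean absorbance of cell-free medium wells (background)
    """
    signal = a_control - a_blank
    if signal <= 0:
        raise ValueError("control absorbance must exceed the blank")
    return 100.0 * (a_treated - a_blank) / signal


# Hypothetical absorbance readings for one drug concentration.
print(round(percent_viability(a_treated=0.65, a_control=1.05, a_blank=0.05), 1))  # 60.0
```

Repeating this calculation at each drug concentration gives the concentration-viability curve from which the IC40 value is read.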
Step (b):
For each drug, the ratio of the IC40 value of the transformed cells (the drug concentration that inhibits their viability by 40%) to the IC40 value of the parental cell line is calculated. For example, the IC40 values, and hence their ratio, can be obtained from the concentration-dependent viability curves, as shown in
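The source does not say how the IC40 value is read off the curve. The following is a minimal sketch that assumes linear interpolation, on a log10 concentration axis, between the two tested concentrations that bracket 60% viability (that is, 40% inhibition). This is a common choice, not necessarily the one used here. The helper names `ic40` and `ic40_ratio` and the data are hypothetical.

```python
import math


def ic40(concentrations: list[float], viabilities: list[float]) -> float:
    """Concentration giving 40% inhibition, i.e. viability of 60% of control.

    Assumes positive, ascending concentrations and viabilities in % of the
    untreated control that cross 60% within the tested range. Interpolates
    linearly between the bracketing points on a log10 concentration scale.
    """
    target = 60.0
    points = list(zip(concentrations, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= target >= v_hi and v_lo != v_hi:
            frac = (v_lo - target) / (v_lo - v_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("viability does not cross 60% in the tested range")


def ic40_ratio(conc: list[float], viab_transformant: list[float],
               viab_parent: list[float]) -> float:
    """IC40 of the gene-transformed line divided by IC40 of the parental line."""
    return ic40(conc, viab_transformant) / ic40(conc, viab_parent)


# Hypothetical dose-response data (concentration in µM, viability in %).
conc = [0.1, 1.0, 10.0, 100.0]
parent = [95.0, 80.0, 40.0, 10.0]
transformant = [98.0, 90.0, 70.0, 50.0]
print(round(ic40_ratio(conc, transformant, parent), 2))  # 10.0
```

A ratio above 1 means the transformant is more resistant to the drug than the parental line, and a ratio below 1 means it is more sensitive.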
Step (c):
For all of the known genes (g1, g2, g3, . . . gn), the logarithms of the ratios calculated in step (b) are calculated. The logarithmic values are entered, for example, into a table such as Table 2 prepared in the Examples below.
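Taking logarithms places increases and decreases in sensitivity symmetrically around zero. For example, a 10-fold and a 0.1-fold change become +1 and −1. The following is a minimal sketch of step (c). It assumes a gene-to-drug mapping of IC40 ratios and base-10 logarithms, since the source does not fix the base. The names and values are hypothetical.

```python
import math


def log_ratio_table(ratios: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Convert IC40 ratios (transformant / parent) into log10 values.

    ratios maps gene name -> {drug name -> IC40 ratio}. The log base is an
    assumption; any fixed base scales every value by the same positive
    constant and so leaves the later correlation coefficients unchanged.
    """
    return {
        gene: {drug: math.log10(r) for drug, r in per_drug.items()}
        for gene, per_drug in ratios.items()
    }


# Hypothetical ratios for two genes and three drugs.
table = log_ratio_table({
    "gene_A": {"drug_1": 10.0, "drug_2": 1.0, "drug_3": 0.1},
    "gene_B": {"drug_1": 2.0, "drug_2": 1.0, "drug_3": 0.5},
})
print(table["gene_A"])
```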
Step (d):
Correlation coefficients (r) between the known genes (g1, g2, g3, . . . gn) are calculated from the logarithmic values of step (c). The correlation coefficients can be presented as in Tables 3 to 14 prepared in the Examples below, and their significance can be tested by t-test.
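The following is a minimal sketch of step (d). It assumes Pearson's correlation coefficient, computed for every pair of genes over the drugs measured for both. The source says only "correlation coefficient", and the handling of missing drugs is an assumption. The function names and data are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_database(
    log_table: dict[str, dict[str, float]]
) -> dict[tuple[str, str], float]:
    """r for every pair of genes, over the drugs tested for both genes.

    log_table maps gene -> {drug -> log IC40 ratio} (the step (c) table).
    """
    result = {}
    for g1, g2 in combinations(sorted(log_table), 2):
        drugs = sorted(set(log_table[g1]) & set(log_table[g2]))
        x = [log_table[g1][d] for d in drugs]
        y = [log_table[g2][d] for d in drugs]
        result[(g1, g2)] = pearson_r(x, y)
    return result


# Hypothetical log IC40 ratios for three genes over four drugs.
log_table = {
    "gene_A": {"drug_1": 1.0, "drug_2": 0.5, "drug_3": 0.0, "drug_4": -1.0},
    "gene_B": {"drug_1": 0.8, "drug_2": 0.6, "drug_3": -0.1, "drug_4": -0.7},
    "gene_C": {"drug_1": -0.9, "drug_2": -0.4, "drug_3": 0.2, "drug_4": 1.1},
}
for pair, r in correlation_database(log_table).items():
    print(pair, round(r, 2))
```

In a real database, the number of drugs per pair is the "n" used in the t-test of significance.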
By the method described above, a gene function database is prepared that contains information on how the function-known genes are functionally related to one another. The database is stored in a computer and used in the method of analyzing gene functions described below.
Now the method of analyzing gene functions provided by this application will be illustrated.
In the DSPA-based method of analyzing gene functions of this invention, a function-unknown gene (gx) is analyzed using the above gene function database. The method comprises the following steps (i) to (iii).
Step (i):
The IC40 value for each drug is determined from the viabilities of transformed eukaryotic cells overexpressing the unknown gene (gx) at various concentrations of a plurality of drugs (D1, D2, D3, . . . Dn).
The type of vector into which the cDNA of the unknown gene (gx) is inserted, the eukaryotic cells, the drugs and the measurement of viability are the same as in step (a) of the construction of the gene function database.
Step (ii):
Correlation coefficients between the unknown gene (gx) and each of the known genes (g1, g2, g3, . . . gn) are calculated from the IC40 values of step (i), using the same procedure (steps (b) to (d)) as was used to calculate the correlation coefficients among the known genes of the gene function database.
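The following is a minimal sketch of step (ii). It assumes that the log IC40 ratios of gx and of the known genes are held as drug-keyed mappings and that Pearson's r is used over the drugs measured for both genes. Both points are assumptions. The names and data are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlate_unknown(
    gx_logs: dict[str, float], known: dict[str, dict[str, float]]
) -> dict[str, float]:
    """r between the unknown gene gx and every function-known gene.

    gx_logs: drug -> log IC40 ratio for cells overexpressing gx.
    known:   gene -> {drug -> log IC40 ratio}, the database's step (c) table.
    """
    result = {}
    for gene, logs in known.items():
        drugs = sorted(set(gx_logs) & set(logs))
        result[gene] = pearson_r([gx_logs[d] for d in drugs], [logs[d] for d in drugs])
    return result


# Hypothetical profiles over three drugs.
known = {
    "gene_A": {"drug_1": 1.0, "drug_2": 0.0, "drug_3": -1.0},
    "gene_B": {"drug_1": -0.5, "drug_2": 0.0, "drug_3": 0.5},
}
gx = {"drug_1": 0.8, "drug_2": 0.1, "drug_3": -0.6}
for gene, r in correlate_unknown(gx, known).items():
    print(gene, round(r, 2))
```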
Step (iii):
The function of any known gene showing a significant correlation coefficient with the unknown gene (gx) is determined to be related to the function of the unknown gene (gx).
In the t-test of the correlation coefficient r, the test statistic is $t = \sqrt{\dfrac{r^{2}(n-2)}{1-r^{2}}}$, where n is the number of data points (here, the number of drugs). Therefore, when data for 40 or more drugs are available, r is significant when it is not less than 0.4 or not more than −0.4. More preferably, using r ≥ 0.5 or r ≤ −0.5 as the criterion makes it possible to specify the relationships among the genes clearly. Among the genes shown in Tables 3 to 14, for example, Ha-ras shows a correlation of not less than 0.5 with Ki-ras (r = 0.71), N-ras (0.56) and erbB2 (0.53), and a correlation of not more than −0.5 with Cip-1 (−0.55), RhoA (−0.55) and C/EBPb (−0.50), indicating that Ha-ras is functionally related to these genes. Likewise, when such a high correlation exists between an unknown gene and a known gene, the function of the known gene can be judged to be related to the function of the unknown gene.
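The following sketch evaluates the t formula above and applies the |r| ≥ 0.5 criterion. The correlation values are the Ha-ras figures quoted above, plus one hypothetical weak correlate added for contrast. The critical t of about 2.02 (two-sided 5% level, 38 degrees of freedom) is taken from a standard t table. The helper names are hypothetical.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t = sqrt(r^2 (n - 2) / (1 - r^2)) for a correlation r over n data points."""
    return math.sqrt(r * r * (n - 2) / (1 - r * r))


def related_genes(correlations: dict[str, float], threshold: float = 0.5) -> dict[str, float]:
    """Genes whose |r| is at least the threshold, strongest first."""
    hits = {g: r for g, r in correlations.items() if abs(r) >= threshold}
    return dict(sorted(hits.items(), key=lambda item: -abs(item[1])))


# With 40 drugs, r = 0.4 gives t of about 2.69, which exceeds the two-sided
# 5% critical value for 38 degrees of freedom (about 2.02).
print(round(t_statistic(0.4, 40), 2))  # 2.69

# Ha-ras correlations quoted in the text, plus one hypothetical weak correlate.
ha_ras = {"Ki-ras": 0.71, "N-ras": 0.56, "erbB2": 0.53, "Cip-1": -0.55,
          "RhoA": -0.55, "C/EBPb": -0.50, "hypothetical_gene": 0.12}
print(list(related_genes(ha_ras)))
```

Ranking by |r| keeps positive and negative correlates together, because the text treats both signs as evidence of a functional relationship.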
The above steps (ii) and (iii) can be processed quickly by computer. Furthermore, by adding the data of each newly analyzed function-unknown gene to the database as appropriate, the database can be developed into one of higher accuracy.
With the method of analyzing gene functions described above, the function of an unknown gene can be determined quickly, and the action mechanism of its gene product can be estimated with high accuracy. The rationale is as follows. In most cases, overexpression of a gene affects molecules related to its product. Those molecules in turn affect surrounding molecules and, as a result, a pathway such as a signal transduction pathway is activated or inactivated. Such a cell shows altered sensitivity to drugs that act on molecules in that pathway. It likewise shows altered sensitivity to physiologically active substances, such as cytokines, that act on the same pathway. It has also been confirmed that genes whose functional relationship is already known (such as p53 and p21) produce similar drug-sensitivity patterns. There are presumed to be several dozen main signal transduction pathways, or about 100 if the associated minor routes are included. Accordingly, a database can be constructed in which the effect (or absence of effect) on drug sensitivity of overexpressing function-known genes (preferably 100 or more) is converted into patterns by means of their mutual correlation coefficients. Comparing the effect of overexpressing a function-unknown gene with this database makes it possible to identify the pathway involving the product of that gene. Further investigation of the pathway then allows the direct action mechanism of the gene product to be elucidated with high accuracy.
The invention of this application will now be illustrated in more detail and specifically by way of the following Examples, although the invention of this application is not limited to the following Examples.
The function-known genes used were those shown in the left column of Table 2. The drugs shown in Table 1 were dissolved in DMSO at 100 times the maximum test concentration and then used. Cultured NIH3T3 or ras-NIH3T3 cells were used.
The full-length cDNA of each function-known gene was inserted into an expression vector (pRc-CMV, etc.) and transfected into NIH3T3 cells by a standard method. Transformed cells were selected on the basis of G-418 resistance. Expression of the transfected gene in each line was examined by Western blotting, and a highly expressing line was selected.
Cell viabilities were measured according to the following procedures.
As an index of the sensitivity of the highly expressing cells and of the parental cell line to each drug, the respective IC40 values were determined and their ratios were calculated.
Using the logarithmic values listed in Table 2, correlation coefficients were calculated between the rows (genes) and tested for significance by t-test.
+lactacystin, +CA-074, PD98059, +ONO3403, +Y27632
From the results in Tables 3 to 14, 71 gene pairs showing positive correlations (r > 0.5) and 17 gene pairs showing negative correlations (r < −0.5) were identified. Some of these gene combinations and their functional relationships are shown in Table 15.
By the above method, a gene function database was constructed in which the correlations among genes of known function were made clear. By referring to this database, the functions of genes of unknown function can now be elucidated easily.
As described in detail above, the invention of this application provides a novel method of analyzing the functions of function-unknown genes useful as genetic material for pharmacogenomics and for the production of various useful proteins by genetic engineering. It also provides a gene function database used for the analysis as well as a method for constructing the database.
Number | Date | Country | Kind |
---|---|---|---|
2000-378047 | Dec 2000 | JP | national |
2001-375565 | Dec 2001 | JP | national |
This application is a divisional of application Ser. No. 10/450,118, which is a U.S. National Stage Application of International Application No. PCT/JP01/10838, filed Dec. 11, 2001.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 10450118 | Jul 2003 | US |
Child | 12149434 | | US |