The present invention relates to the field of predicting toxic substances in the environment, and in particular to a QSAR (quantitative structure-activity relationship) toxicity prediction method for evaluating the health effects of nano-crystalline metal oxides.
Nanotechnology has brought people great benefits. Nanomaterials are widely used in daily products and have broad application prospects in biomedical fields such as drug carriers, cancer therapy, gene therapy, antibacterial materials, medical diagnosis and biosensors. Nano-crystalline metal oxides are an important class of industrially produced nanoparticles and have high application value in food, materials, environmental protection, chemistry and biomedicine. However, there is growing concern about the properties specific to nanomaterials, such as small-size effects, surface and interface effects, and quantum size effects. These properties may trigger special biological effects, threaten human health, and have negative impacts on the environment and society. It is therefore necessary to characterize nano-crystalline metal oxides and provide a basis for their safe application. Understanding nano-toxicity plays a key role in this and facilitates the necessary safety evaluation of nano-products.
Research on the biotoxicity and health effects of metal oxides at the nanoscale has become a research hotspot in recent decades. Compared with the toxic effects of nano-sized metal elements, the toxicity mechanisms of metal oxides may be more complicated. At the same time, nano-crystalline oxides of different metals may share similar active sites and toxicity mechanisms. Dose-response relationships and predictive models are therefore of great theoretical and practical significance. QSAR technology was originally intended to predict the toxicity of untested compounds and to apply that knowledge to risk assessment. For a series of substances with the same mode of action, a statistical relationship is established between the structural parameters of the compounds and their biological activity or toxicity, and this relationship is then used to predict the activity or toxicity of unknown compounds. QSAR research at the nanoscale has been very active in recent decades. Winkler et al. analyzed the current state of QSAR research on nano-toxicity effects and anticipated its potential applications, concluding that the method can optimize resources in toxicological investigations and reduce the ethical and monetary costs of toxicity testing. Wolterbeek and Walker summarized the physicochemical properties of 20 cations and their potential toxic effects on different species, and identified and interpreted their toxic modes of action. An appropriate compound classification and cross-reference method was developed to perform preliminary hazard and risk assessment of nanomaterials. Meng considered that a QSAR method capable of identifying the correct toxicological pathway and damage mechanism may play a crucial role in the safety assessment of nanomaterials. Pathakoti et al. determined the toxicities of 17 nano-crystalline metal oxides to Escherichia coli and, based on these toxicities, established a two-parameter QSAR model to predict light-free (F = 33.83, R² = 0.87) and light-induced (F = 20.51, R² = 0.804) toxic effects. Epa et al. established quantitative prediction models for nanoparticle uptake by, and nanoparticle-induced apoptosis in, multiple cell types, including pancreatic cancer cells (PaCa2) and human umbilical vein endothelial cells, and proposed separate modeling strategies for different materials and for different surface modifications of the same material. Toropova et al. proposed optimal descriptors independent of spatial structure and established a toxicity prediction model for Escherichia coli. Although Leszczynski preliminarily established a toxicity prediction model for 13 nano-crystalline metal oxides, the predictive performance and application field of that model need further research and demonstration.
In general, the above methods only make preliminary model predictions of the toxicity of nano-crystalline metal oxides. Systematic research and reliable prediction methods are still lacking for both the qualitative recognition of toxic modes of action and the quantitative prediction of toxic effects of nano-crystalline metal oxides.
In view of the above defects, the inventors arrived at the present invention after long-term research and practice.
An object of the present invention is to provide a QSAR toxicity prediction method for evaluating the health effects of nano-crystalline metal oxides, so as to overcome the above technical defects.
To achieve the above object, the present invention provides a toxicity prediction method based on the quantitative structure-activity relationship of nano-crystalline metal oxides. The toxicity endpoint of an untested nano-crystalline metal oxide is predicted from the quantitative relationship between the structural characteristics and the cytotoxic effects of nano-crystalline metal oxides.
The toxicity prediction method specifically comprises the following steps:
step a, acquiring, screening, calculating and summarizing modeling toxicity data;
step b, establishing a structural descriptor dataset of nano-crystalline metal oxides, and performing linear correlation analysis and principal component analysis by taking a structural parameter corresponding to each metal oxide as an independent variable, thereby obtaining an optimal structural descriptor combination,
wherein the established structural descriptor dataset of the nano-crystalline metal oxides comprises: the softness index of the metal ion σp, the softness index per unit charge σp/Z, the atomic number AN, the ionic radius r, the ionic potential IP of the ion in oxidation state ON, the ionic potential IP(N+1) of the ion in oxidation state ON+1, the difference ΔIP between IP(N+1) and IP, the atomic radius R, the atomic weight AW, the Pauling electronegativity Xm, the covalence index Xm²r, the atomic ionization potential AN/ΔIP, the first hydrolysis constant |log KOH|, the electrochemical potential ΔE0, the atomic size AR/AW, the measured electronegativity x, the polarizability z/rx, the ionic valency Z, the polarizing force parameters Z/r, Z/r² and Z²/r, the polarizing force-like parameters Z/AR and Z/AR², the formation enthalpy of gaseous cations ΔHme+, and the energy gap GAP and standard heat of formation HoF of the oxide cluster.
The step b specifically comprises the processes as follows:
step b1, taking the toxicity endpoint as the dependent variable and the structural parameter corresponding to each metal oxide as the independent variable, performing linear correlation analysis, and calculating the correlation coefficient r according to formula (1):
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}\,\sum_{i=1}^{n}(y_i - \bar{y})^{2}}} \qquad (1)$$
in the formula, $\bar{x}$ and $\bar{y}$ respectively represent the average values of the structural parameters and the toxicity values, $x_i$ and $y_i$ respectively represent the structural parameter and the toxicity value corresponding to the i-th metal, and n is the number of metals;
a structural parameter with a correlation coefficient r > 0.8 is regarded as significantly correlated;
in step b2, the optimal structural descriptor combination is obtained by principal component analysis of the significantly correlated parameters, according to the following formula:
$$F_i = a_{1i} Z_{X_1} + a_{2i} Z_{X_2} + \cdots + a_{pi} Z_{X_p} \qquad (2)$$
where $a_{1i}, a_{2i}, \ldots, a_{pi}$ $(i = 1, \ldots, m)$ are the eigenvectors corresponding to the eigenvalues of the covariance matrix Σ of X, and $Z_{X_1}, Z_{X_2}, \ldots, Z_{X_p}$ are the standardized values of the original variables;
$$A = (a_{ij})_{p \times m} = (a_1, a_2, \ldots, a_m) \qquad (3)$$
$$R\,a_i = \lambda_i a_i \qquad (4)$$
R is the correlation coefficient matrix; $\lambda_i$ and $a_i$ are the corresponding eigenvalue and unit eigenvector; and $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$.
step c, establishing the toxicity prediction model and checking its robustness: establishing a multiple regression equation, estimating its parameters, and testing it with the p-value corresponding to the F statistic;
specifically, step c1, establishing the multiple regression equation and estimating the parameters;
the two optimal structural parameters determined in step b are taken as the independent variables X, and the cytotoxicity value of the metal oxide is taken as the dependent variable Y; a QICAR equation Y = XB + E is established for each model organism by multiple linear regression analysis, as shown in formula (5):
wherein n is a number of observed values;
the parameters of the equation are estimated by the least-squares method, where X′ is the transpose of X:
$$\hat{B} = (X'X)^{-1}X'Y \qquad (6)$$
step c2, performing the goodness-of-fit test and the significance test (F test) of the regression equation;
the goodness-of-fit test indexes of the model are the square of the correlation coefficient (R²) and the correlation coefficient (R);
the F-test indexes are the F value and the associated probability p (Significance F) calculated by multi-factor analysis of variance (multi-ANOVA), and the test uses the p-value corresponding to the F statistic;
step c3, judgment standards: depending on how the toxicity data were acquired, R² ≥ 0.81 for in vitro tests and R² ≥ 0.64 for in vivo tests; at significance level α, the regression equation is significant when p < α.
The calculations in step c3 use the following formulas:
in the formulas, R² represents the square of the correlation coefficient;
step d, performing internal validation on a QSAR model;
the step d comprises a specific process as follows:
step d1, taking one sample from the given modeling samples as the prediction set, building the model with the remaining samples as the training set, and calculating the prediction error for that sample;
step d2, recording the sum of squared prediction errors of each equation until every sample has been predicted exactly once;
step d3, calculating the cross-validation correlation coefficient Q²cv and the cross-validation root-mean-square error RMSECV, where the acceptance criteria are Q²cv > 0.6 and R² − Q²cv ≤ 0.3;
calculation formulas adopted in the step d3 are as follows:
in the formulas, $y_i^{obs}$ represents the measured toxicity value of the i-th compound, and $y_i^{pred,cv}$ represents the cross-validated predicted toxicity value of the i-th compound;
step e, calculating the application field (applicability domain) of the model: using the validated model, drawing a Williams diagram with the leverage value h as the horizontal coordinate and the standardized residual of each data point as the vertical coordinate;
in step e, the leverage value $h_i$ is calculated as follows:
$$h_i = x_i^{T}\left(X^{T}X\right)^{-1}x_i \qquad (12)$$
in the formula, $x_i$ represents the column vector composed of the structural parameters of the i-th metal, here for a two-parameter model; $X^{T}$ represents the transpose of the matrix X, and $(X^{T}X)^{-1}$ represents the inverse of the matrix $X^{T}X$;
the critical value h* is calculated as follows:
$$h^{*} = \frac{3(p+1)}{n} \qquad (13)$$
in the formula, p represents the number of variables in the model, with p = 2 in the two-parameter model; and n represents the number of compounds in the model training set, which is the number of metal oxides in the training set of the QSAR equation tested in steps a-d;
the region h < h* of the Williams diagram is the application field of the model; and
step f, rapidly screening and predicting the toxicity of unknown nano-crystalline metal oxides.
The specific process is as follows: obtaining a nano QSAR prediction equation by the method of the above steps a-e, collecting and compiling the values of all structural descriptors of the nano-crystalline metal oxides to be predicted, and substituting these values into the equation to calculate the toxicity endpoint to be predicted.
In the QSAR toxicity prediction method for evaluating the health effects of nano-crystalline metal oxides provided in the present invention, the toxicity prediction model is established on the basis of the modes of action and toxicity mechanisms of nano-crystalline metal oxides, and unknown toxicity values are predicted by QSAR modeling. The method is rapid and simple, and allows the toxicity endpoints of multiple compounds lacking toxicity data to be predicted from a small amount of test data.
The above and additional technical features and advantages of the present invention are described in detail below in combination with drawings.
The principle of the present invention is to predict the toxicity endpoint of an untested oxide from the quantitative relationship between the structural characteristics and the cytotoxic effects of nano-crystalline metal oxides. The method establishes a toxicity prediction model for nano metal oxides by combining the physicochemical structural parameters and the toxicity mechanisms of nano-crystalline metal oxides, and applies this model to predict the toxicity endpoints of untested nano-crystalline metal oxides.
step a, acquiring, screening, calculating and summarizing modeling toxicity data;
step a1, a data acquisition process;
step a2, a data screening process; conditions for the data screening are as follows:
1) the cytotoxicity data of all nano-crystalline metal oxides shall come from the same test source, the same research group and the same test conditions;
2) the toxicity endpoint data types include mortality, growth rate and reproduction rate, expressed as EC50 or LC50;
3) toxicity tests must be carried out according to standard operating procedures under environmental conditions within a defined range; and
4) the biological test exposure time is 48-96 hours, and the particle size of the nano-crystalline metal oxides is between 30 nm and 100 nm;
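The screening conditions above can be expressed as a simple record filter. The following is a minimal sketch only, assuming a hypothetical record layout (`ToxRecord` and its field names are illustrative and are not part of the method). It checks condition 1 only as "one source" and does not compare test conditions. Condition 3 (standard operating procedures) cannot be verified from the numbers alone, so it is left to manual review.

```python
from dataclasses import dataclass


@dataclass
class ToxRecord:
    # Hypothetical record layout; field names are illustrative only
    oxide: str          # molecular formula, e.g. "ZnO"
    source: str         # test source / research group identifier
    endpoint: str       # endpoint type, e.g. "EC50" or "LC50"
    exposure_h: float   # biological test exposure time, hours
    size_nm: float      # particle size, nm


def passes_record_checks(rec: ToxRecord) -> bool:
    """Conditions 2) and 4): EC50/LC50 endpoint, 48-96 h exposure, 30-100 nm particles."""
    return (
        rec.endpoint in {"EC50", "LC50"}
        and 48 <= rec.exposure_h <= 96
        and 30 <= rec.size_nm <= 100
    )


def screen(records: list[ToxRecord]) -> list[ToxRecord]:
    """Keep records passing the per-record checks, then require a single source (condition 1)."""
    kept = [r for r in records if passes_record_checks(r)]
    sources = {r.source for r in kept}
    if len(sources) > 1:
        raise ValueError(f"data come from more than one source: {sorted(sources)}")
    return kept
```

Raising an error on mixed sources, rather than silently dropping records, leaves the choice of which source to keep to the person compiling the dataset.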
step a3, a data calculation process, where the calculation method in the embodiments of the present invention is as follows:
the concentration of the nano-crystalline metal oxide in aqueous solution serves as the measurement index of the data; for example, a mass concentration is divided by the molecular weight to convert it into a molar concentration, that is, mol/L;
step a4, a data summarizing process:
a finally obtained dataset includes molecular formulas of the nano-crystalline metal oxides, types of tested cells, toxicity effect types, endpoint indexes, test conditions, exposure time and data sources.
A detailed toxicity data acquisition process is as follows:
Acute toxicity data for modeling are preferentially collected from the ECOTOX Database (http://cfpub.epa.gov/ecotox/) of the United States Environmental Protection Agency. If the toxicity data are insufficient, valid data from the past 10 years retrieved from the SCI (Science Citation Index) through ISI Web of Knowledge are used as a supplement. The names of the nano-crystalline metal oxides, the names of the test species, acute toxicity and other keywords are entered into the database and the literature retrieval engine, and a toxicity dataset meeting the conditions is exported. Qualified toxicity data are then screened against the conditions of step a2. The free ion concentration of the metal serves as the measurement index of the data. If the original data use the mass of an ionic compound as the toxicity endpoint index, the mass is divided by the molecular weight and uniformly converted into a micromolar concentration, that is, μmol/L. During data compilation, the molecular formulas of the nano-crystalline metal oxides, the types of tested cells, the toxicity effect types, the endpoint indexes, the test conditions, the exposure time, the data sources and other information are recorded and sorted in an Excel spreadsheet to serve as the modeling basis.
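The unit conversion described above divides a mass concentration by the molecular weight and expresses the result in μmol/L. A minimal sketch follows. The molar masses are computed from standard atomic weights and rounded to two decimals. The dictionary and function names are illustrative assumptions.

```python
# Molar masses in g/mol from standard atomic weights (Zn 65.38, Cu 63.546,
# Ti 47.867, O 15.999), rounded to two decimals; extend for other oxides.
MOLAR_MASS_G_PER_MOL = {"ZnO": 81.38, "CuO": 79.55, "TiO2": 79.87}


def mg_per_l_to_umol_per_l(conc_mg_per_l: float, oxide: str) -> float:
    """Convert a mass concentration (mg/L) into a molar concentration (μmol/L)."""
    # (mg/L) / (g/mol) = mmol/L; multiply by 1000 to obtain μmol/L
    return conc_mg_per_l / MOLAR_MASS_G_PER_MOL[oxide] * 1000.0


# Example: 8.138 mg/L of ZnO corresponds to 100 μmol/L
print(round(mg_per_l_to_umol_per_l(8.138, "ZnO"), 1))  # prints 100.0
```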
Data screening, calculating and summarizing are performed by taking a cell viability toxicity endpoint of Escherichia coli as an example. Results are shown in Table 1.
A data source in Table 1 is:
Escherichia coli
step b, establishing a structural descriptor dataset of the nano-crystalline metal oxides:
establishing a structural descriptor set of the metal ions: by combining a semi-empirical quantum chemistry method with literature statistics, 26 physicochemical structural parameters of nanoscale metal oxides of 30-100 nm are calculated. These cover physicochemical parameters of the metal ions, physicochemical parameters of the metal nanoparticles, scale parameters and thermodynamic parameters. The structural descriptor set comprises: the softness index of the metal ion σp, the softness index per unit charge σp/Z, the atomic number AN, the ionic radius r, the ionic potential IP of the ion in oxidation state ON, the ionic potential IP(N+1) of the ion in oxidation state ON+1, the difference ΔIP between IP(N+1) and IP, the atomic radius R, the atomic weight AW, the Pauling electronegativity Xm, the covalence index Xm²r, the atomic ionization potential AN/ΔIP, the first hydrolysis constant |log KOH|, the electrochemical potential ΔE0, the atomic size AR/AW, the measured electronegativity x, the polarizability z/rx, the ionic valency Z, the polarizing force parameters Z/r, Z/r² and Z²/r, the polarizing force-like parameters Z/AR and Z/AR², the formation enthalpy of gaseous cations ΔHme+, and the energy gap GAP and standard heat of formation HoF of the oxide cluster. ΔHme+, GAP and HoF are calculated with the PM6 semi-empirical method in the MOPAC quantum chemistry software;
step b1, taking the toxicity endpoint as the dependent variable and the structural parameter corresponding to each nano-crystalline metal oxide as the independent variable, performing linear correlation analysis, and calculating the Pearson's correlation coefficient r according to formula (1):
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}\,\sum_{i=1}^{n}(y_i - \bar{y})^{2}}} \qquad (1)$$
in the formula, $x_i$ and $y_i$ respectively represent the structural parameter and the measured toxicity value corresponding to the i-th metal, $\bar{x}$ and $\bar{y}$ are their respective average values, and n is the number of metals.
The Pearson's correlation coefficient r of each structural parameter, calculated by the method of step b1, is shown in Table 2.
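Step b1 can be sketched with NumPy as follows. The sketch assumes that each descriptor is stored as a list of values aligned with the toxicity values. One further assumption: the text states the threshold as r > 0.8, but this sketch compares |r| so that strongly negative correlations are also kept. Change `abs(r) > threshold` to `r > threshold` to follow the text literally. The function names are hypothetical.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between one descriptor and the toxicity values (formula (1))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


def screen_descriptors(descriptors: dict, tox, threshold: float = 0.8) -> dict:
    """Return {name: r} for descriptors whose |r| exceeds the threshold (|r| is an assumption)."""
    r_values = {name: pearson_r(vals, tox) for name, vals in descriptors.items()}
    return {name: r for name, r in r_values.items() if abs(r) > threshold}
```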
in step b2, the optimal structural descriptor combination is obtained by principal component analysis of the significantly correlated parameters, according to the following formula:
$$F_i = a_{1i} Z_{X_1} + a_{2i} Z_{X_2} + \cdots + a_{pi} Z_{X_p} \qquad (2)$$
where $a_{1i}, a_{2i}, \ldots, a_{pi}$ $(i = 1, \ldots, m)$ are the eigenvectors corresponding to the eigenvalues of the covariance matrix Σ of X, and $Z_{X_1}, Z_{X_2}, \ldots, Z_{X_p}$ are the standardized values of the original variables;
$$A = (a_{ij})_{p \times m} = (a_1, a_2, \ldots, a_m) \qquad (3)$$
$$R\,a_i = \lambda_i a_i \qquad (4)$$
R is the correlation coefficient matrix; $\lambda_i$ and $a_i$ are the corresponding eigenvalue and unit eigenvector; and $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$.
Principal component analysis removes redundant, closely correlated variables from the original set of variables and constructs as few new variables as possible. The new variables are pairwise uncorrelated and retain as much of the original information as possible.
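The following is a minimal PCA sketch for step b2. It assumes that the screened descriptors are the columns of a NumPy array. The code standardizes each variable, diagonalizes the correlation matrix R as in formula (4), sorts the eigenvalues in descending order, and forms the component scores of formula (2). This description does not spell out how the final two-descriptor combination is chosen from the components (for example, by loadings on the leading components), so that step is left out of the sketch.

```python
import numpy as np


def pca_standardized(X):
    """PCA of standardized descriptors via eigen-decomposition of the correlation matrix.

    X: (n_samples, p) array of screened descriptors.
    Returns eigenvalues (descending), eigenvectors as columns a_i, and scores F.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # Z_X1 ... Z_Xp
    R = np.corrcoef(X, rowvar=False)                  # correlation coefficient matrix R
    lam, A = np.linalg.eigh(R)                        # R a_i = λ_i a_i, ascending order
    order = np.argsort(lam)[::-1]                     # reorder so λ1 ≥ λ2 ≥ ... ≥ λp
    lam, A = lam[order], A[:, order]
    F = Z @ A                                         # F_i = Σ_k a_ki Z_Xk, formula (2)
    return lam, A, F
```

The scores produced this way are pairwise uncorrelated, and their variances equal the eigenvalues. This matches the property of principal components stated above.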
The Pearson's correlation coefficients r of the structural parameters, calculated by the method of step b2, are shown in Table 3.
step c, establishing a toxicity prediction model and checking robustness;
step c1, establishing a multiple regression equation and estimating its parameters;
the two optimal structural parameters determined in step b above are taken as the independent variables X, and the cytotoxicity value of the metal oxide is taken as the dependent variable Y; a QICAR equation Y = XB + E is established for each model organism by multiple linear regression analysis, as shown in the following formula (5):
where n is the number of observed values; B is the vector of unknown parameters, to be estimated by the least-squares method; and E is the random error term, which reflects the influence on y of random factors other than the linear relationship between y and x1 and x2. Compared with simple linear regression, equation (5) uses multiple linear regression to relate two different structural parameters to the toxicity value, so the relationship between the predicted quantity and its influencing factors is expressed more completely and accurately;
the parameters of the equation are estimated by the least-squares method, where X′ is the transpose of X:
$$\hat{B} = (X'X)^{-1}X'Y \qquad (6)$$
least-squares regression estimates the parameters of a regression model by minimizing the sum of squared fitting errors; it is a standard multivariate modeling tool and is particularly suitable for predictive analysis.
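The least-squares estimate of formula (6) can be sketched with NumPy as shown below. The sketch assumes a two-descriptor model with an intercept term, as in the example equation given in step d. `np.linalg.solve` applied to the normal equations yields the same B as multiplying by (X′X)⁻¹, but it avoids forming the inverse explicitly. The function name is illustrative.

```python
import numpy as np


def fit_mlr(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + e, i.e. Y = XB + E."""
    X = np.column_stack([np.ones(len(y)), x1, x2])  # design matrix with intercept column
    Y = np.asarray(y, dtype=float)
    # Normal equations (X'X) B = X'Y, solved without an explicit matrix inverse
    B = np.linalg.solve(X.T @ X, X.T @ Y)
    return B, X
```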
step c2, performing the goodness-of-fit test and the significance test (F test) of the regression equation;
the goodness-of-fit test indexes of the model include the square of the correlation coefficient (R²) and the correlation coefficient (R);
step c3, judgment standards: depending on how the toxicity data were acquired, R² ≥ 0.81 for in vitro tests and R² ≥ 0.64 for in vivo tests; at significance level α, the regression equation is significant when p < α;
in the formulas, $y_i$ represents the measured toxicity value of the i-th metal, and $\hat{y}_i$ represents the predicted toxicity value of the i-th metal;
the correlation coefficient and standard deviation in equations (7) and (8) measure the goodness of fit of the regression equation, and equation (9) is the standard test of whether the linear relationship between the dependent variable and the multiple independent variables is significant;
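Formulas (7) to (9) are not reproduced in this text. The sketch below therefore assumes the standard textbook forms:

- R² = 1 − SSres/SStot
- the residual standard deviation s = √(SSres/(n − p − 1))
- F = (R²/p)/((1 − R²)/(n − p − 1)), with p descriptors

The F identity holds for a least-squares fit that includes an intercept. The thresholds 0.81 and 0.64 are those of step c3. The p-value of the F test comes from the F distribution with (p, n − p − 1) degrees of freedom (for example via `scipy.stats.f.sf`). It is omitted here to keep the sketch NumPy-only.

```python
import numpy as np


def regression_stats(y, y_hat, p: int):
    """Return (R², residual standard deviation s, F) for an OLS fit with p descriptors plus an intercept."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    s = np.sqrt(ss_res / (n - p - 1))
    f_stat = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    return float(r2), float(s), float(f_stat)


def meets_fit_standard(r2: float, in_vitro: bool = True) -> bool:
    """Step c3 judgment standard: R² ≥ 0.81 for in vitro data, R² ≥ 0.64 for in vivo data."""
    return r2 >= (0.81 if in_vitro else 0.64)
```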
step d, performing internal validation on a QSAR model;
The QSAR model of each species may be validated by the leave-one-out method. The core idea of the method is to take one data point out of the training set, and then establish a multiple regression model from the remaining toxicity data and the optimal structural descriptors used in step c. The established model is then checked by comparing the predicted value of the held-out data point with its experimental value. To reduce the variability of the cross-validation results, the sample dataset is divided differently several times to obtain different complementary subsets, and multiple rounds of cross-validation are performed. In this step, the average of the multiple validations is taken as the validation result.
The advantages of this internal validation method are as follows. The model is trained on almost all of the samples, so it is closest to a model trained on the full sample set and the assessment result is relatively reliable. Moreover, no random factors are involved, so the whole process is repeatable.
Specific steps are as follows:
step d1, taking one sample from the given modeling samples as the prediction set, building the model with the remaining samples as the training set, and calculating the prediction error for that sample;
step d2, recording the sum of squared prediction errors of each equation until every sample has been predicted exactly once;
step d3, calculating the cross-validation correlation coefficient Q²cv and the cross-validation root-mean-square error RMSECV with the following formulas, where the acceptance criteria are Q²cv > 0.6 and R² − Q²cv ≤ 0.3;
in the formulas, $y_i^{obs}$ represents the measured toxicity value of the i-th compound, and $y_i^{pred,cv}$ represents the cross-validated predicted toxicity value of the i-th compound;
equations (10) and (11) are the indicator parameters of leave-one-out internal validation. They help guard against over-fitting of the model to the training-set data, and they show whether any specific metal in the training set unduly affects the robustness of the model.
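Steps d1 to d3 can be sketched as follows. Formulas (10) and (11) are not reproduced in this text, so the sketch assumes the usual definitions: Q²cv = 1 − PRESS/Σ(yᵢ − ȳ)² and RMSECV = √(PRESS/n). Here PRESS is the sum of squared leave-one-out prediction errors from step d2. The model is assumed to be an MLR with an intercept, and the function names are illustrative.

```python
import numpy as np


def loo_validation(X, y):
    """Leave-one-out validation of an MLR model with intercept (steps d1-d3).

    X: (n, p) descriptor matrix without an intercept column; y: (n,) toxicity values.
    Returns (Q2cv, RMSECV, cross-validated predictions).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    y_pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                              # d1: hold out sample i
        X_train = np.column_stack([np.ones(keep.sum()), X[keep]])
        B, *_ = np.linalg.lstsq(X_train, y[keep], rcond=None)
        y_pred[i] = B[0] + X[i] @ B[1:]                       # predict the held-out sample
    press = np.sum((y - y_pred) ** 2)                         # d2: sum of squared prediction errors
    q2_cv = 1.0 - press / np.sum((y - y.mean()) ** 2)         # d3
    rmsecv = np.sqrt(press / n)
    return float(q2_cv), float(rmsecv), y_pred


def passes_internal_validation(r2: float, q2_cv: float) -> bool:
    """Acceptance criteria of step d3: Q2cv > 0.6 and R² − Q2cv ≤ 0.3."""
    return q2_cv > 0.6 and (r2 - q2_cv) <= 0.3
```

With the values reported for the example equation in step d (R² = 0.8793, Q²cv = 0.7422), `passes_internal_validation(0.8793, 0.7422)` returns True.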
The model is internally validated by the method of step d. Taking the prediction equation Pred.MLR = (4.412 ± 0.165) + (−0.001 ± 2.57×10⁻⁴)ΔHme+ + (−0.121 ± 0.068)Z/r as an example, the model is subjected to leave-one-out internal validation, and the related fitting parameters are shown in Table 4. Formulas (10) and (11) of step d3 give Q²cv = 0.7422, RMSECV = 0.2695 and R² − Q²cv = 0.8793 − 0.7422 = 0.1371. Since the robustness criteria Q²cv > 0.6 and R² − Q²cv ≤ 0.3 are met, the model passes the internal validation.
step e, calculating an application field of the model;
the application field of the validated model is calculated by the leverage method and visualized with a Williams diagram. This helps ensure that the model is applied only where its predictions are reliable.
the leverage value $h_i$ is calculated as follows:
$$h_i = x_i^{T}\left(X^{T}X\right)^{-1}x_i \qquad (12)$$
in the formula, $x_i$ represents the column vector composed of the structural parameters of the i-th metal, here for a two-parameter model; $X^{T}$ represents the transpose of the matrix X, and $(X^{T}X)^{-1}$ represents the inverse of the matrix $X^{T}X$;
the critical value h* is calculated as follows:
$$h^{*} = \frac{3(p+1)}{n} \qquad (13)$$
in the formula, p represents the number of variables in the model, with p = 2 in the two-parameter model; and n represents the number of compounds in the model training set, which is the number of metal oxides in the training set of the QSAR equation tested in steps a-d;
a Williams diagram is drawn with the leverage value h as the horizontal coordinate and the standardized residual of each data point as the vertical coordinate. The region h < h* in the diagram is the application field of the model.
The structural parameters and toxicity endpoints of the nano-crystalline metal oxides in the training set are shown in Table 5. The critical value is h* = 3 × (2 + 1)/16 = 0.5625.
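The leverage values of formula (12) and the critical value of formula (13) can be sketched as below. The sketch assumes that the design matrix X includes an intercept column, which is consistent with h* = 3(p + 1)/n for p = 2. The same function can score a query compound against the training set, so that a compound with h > h* is flagged as lying outside the application field. The function names are illustrative.

```python
import numpy as np


def _with_intercept(X):
    """Prepend a column of ones (assumed intercept term) to a descriptor matrix."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X])


def leverages(X_train, X_query=None):
    """h_i = x_i^T (X^T X)^(-1) x_i (formula (12)) for training rows or for query rows."""
    Xtr = _with_intercept(X_train)
    xtx_inv = np.linalg.inv(Xtr.T @ Xtr)
    Xq = Xtr if X_query is None else _with_intercept(X_query)
    # Row-wise quadratic form: sum over j, k of Xq[i, j] * xtx_inv[j, k] * Xq[i, k]
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)


def critical_leverage(p: int, n: int) -> float:
    """h* = 3(p + 1)/n (formula (13))."""
    return 3.0 * (p + 1) / n
```

With 16 training compounds and two descriptors, `critical_leverage(2, 16)` returns 0.5625, which matches the value above. The training-set leverages sum to p + 1, the number of columns of the design matrix.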
The Williams diagram is drawn with the leverage values calculated from the two optimal structural parameters of each metal as the horizontal coordinate and the predicted residual as the vertical coordinate, as shown in
Step f, obtaining the nano QSAR prediction equation by the method of the above steps a-e, collecting and compiling the values of all structural descriptors of the nano-crystalline metal oxides to be predicted, and substituting these values into the equation to calculate the toxicity endpoint to be predicted.
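Step f then reduces to evaluating the fitted equation. The sketch below applies the central coefficients of the example equation Pred.MLR from step d. The descriptor values in the usage line are hypothetical placeholders, not data for any real oxide. Before a prediction is trusted, the query compound should be checked against the application field from step e.

```python
def predict_example(dhme_plus: float, z_over_r: float) -> float:
    """Evaluate Pred.MLR = 4.412 - 0.001*ΔHme+ - 0.121*(Z/r) with the central coefficients."""
    intercept, b_dhme, b_zr = 4.412, -0.001, -0.121
    return intercept + b_dhme * dhme_plus + b_zr * z_over_r


# Hypothetical descriptor values, for illustration only
print(round(predict_example(1000.0, 3.0), 3))  # prints 3.049
```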
Number | Date | Country | Kind |
---|---|---|---|
201510333022.4 | Jun 2015 | CN | national |
This application is a continuation of International Patent Application No. PCT/CN2015/088336 with a filing date of Aug. 28, 2015, designating the United States, now pending, and further claims priority to Chinese Patent Application No. 201510333022.4 with a filing date of Jun. 16, 2015. The content of the aforementioned applications, including any intervening amendments thereto, are incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/CN2015/088336 | Aug 2015 | US |
Child | 15839850 | US |