The present invention relates to an evaluation model of a biomaterial, and in particular to a method for predicting and evaluating the function of a biomaterial.
At present, medical materials are evaluated mainly in terms of their physicochemical and biological properties, both in China and abroad. The evaluation of biological properties focuses on biological toxicity and safety, and there is no unified system for functional evaluation. For example, the function of biomaterials in regulating stem cell fate has not been included in the national standards for evaluating the effectiveness and safety of medical biomaterials. As a result, material evaluation data in this field are generated separately in individual biomaterial research laboratories, and the resulting sample databases are heterogeneous because there are no uniform standards for characterization methods and techniques. In addition, most current functional evaluation experiments are limited to a single index. Because the identity of a cell is reflected in the expression of its specific genes, cell classes are currently often identified by the expression of a single specific gene. For example, genes highly expressed in osteoblasts, such as BMP2, Runx2 and COL1, are detected at the gene level via quantitative real-time polymerase chain reaction (qPCR), or osteocalcin (OCN) and bone alkaline phosphatase (ALP) are detected at the protein level via western blot (WB).
However, traditional single-index evaluation methods have major limitations, mainly in the following aspects: (1) Detecting a single gene via qPCR is not sufficient to determine the identity of a cell accurately, because the same gene may be highly expressed in a variety of cell classes; moreover, even if only a portion of the cells highly express the gene, qPCR may still detect high expression overall. (2) To improve accuracy, multiple genes often need to be detected via qPCR, which is labor-intensive. (3) Evaluations of different materials are difficult to compare: evaluations based on different indices cannot be compared directly, and even when the indices are the same, comparison remains difficult because there is no standard quantification. (4) These methods cannot give a complete picture of the state of cell differentiation: they reveal neither the proportion of differentiated cells nor whether the cells have differentiated toward osteocytes.
In summary, the contribution of the expression of a single biomarker molecule to evaluating the direction of cell differentiation cannot be quantified. Without a quantifiable evaluation of the complete picture of cell differentiation, the functional design and optimization of novel biomaterials lack theoretical and data support, the physicochemical parameters of material systems are difficult to optimize by high-throughput screening, and the biological properties of novel biomaterials remain unpredictable.
The present invention provides a highly accurate and predictive method for predicting and evaluating the function of a biomaterial, in order to solve the technical problems of existing evaluation methods, namely high labor intensity, long experiment periods and large sample heterogeneity.
To this end, the present invention provides a method for predicting and evaluating the function of a biomaterial, comprising the following steps: (1) culturing human bone marrow mesenchymal stem cells in the environment of a material to be tested; (2) collecting the human bone marrow mesenchymal stem cells cultured in step (1), extracting total RNA from the cells, and subjecting the total RNA to purification, library construction and transcriptome sequencing to obtain transcriptome data of samples to be tested; and (3) subjecting the transcriptome data of the samples to be tested obtained in step (2) to batch effect adjustment and feature extraction, then inputting the resulting data to a function prediction and evaluation model of the present invention, and calculating, for each sample to be tested, the confidence coefficient that it belongs to each of the different cell classes.
Preferably, a method for constructing the function prediction and evaluation model in step (3) comprises the following steps: (a) dividing transcriptome data collected from the public database GEO into training sets and testing sets, and subjecting the training sets and the testing sets to batch effect adjustment, respectively; (b) extracting the gene expression profiles of four cell classes based on data of the training sets, and subjecting the transcriptome data to feature extraction; (c) training and optimizing machine learning models based on the data of the training sets, and obtaining an MeD-P intelligent prediction model after integration; and (d) inputting data of the testing sets to the MeD-P intelligent prediction model to obtain predicted cell classes of samples in the testing sets, comparing the predicted cell classes with true cell classes of the samples, and statistically calculating the indices of accuracy, precision, recall and F1-score of the model.
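The indices named in step (d) can be computed from the predicted and true cell classes of the testing samples. The following Python sketch is illustrative only; it assumes the four class labels shown and macro-averaging over those four classes, neither of which is specified above. A class with no true and no predicted samples contributes 0 to the macro average.

```python
CELL_CLASSES = ("osteoblast", "chondrocyte", "adipocyte", "MSC")  # assumed label names


def evaluate(true_labels, predicted_labels, classes=CELL_CLASSES):
    """Return overall accuracy, per-class precision/recall/F1, and their macro averages."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(true_labels, predicted_labels))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # class c samples that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    # Macro average: unweighted mean over the classes (an assumption of this sketch)
    macro = {
        m: sum(v[m] for v in per_class.values()) / len(classes)
        for m in ("precision", "recall", "f1")
    }
    return accuracy, per_class, macro


acc, per_class, macro = evaluate(
    ["osteoblast", "osteoblast", "adipocyte", "MSC"],
    ["osteoblast", "adipocyte", "adipocyte", "MSC"],
)
print(acc)  # 0.75
```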
Preferably, in step (a), the batch effect adjustment is integrated optimization based on the ComBatseq algorithm and the DaMiRseq algorithm; the classes and batches of samples in the training sets are known; and the classes of the samples in the testing sets are unknown, the batch effect adjustment of the testing sets is based on parameters generated by the batch effect adjustment of the training sets, and each testing set is adjusted independently.
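The principle that the testing sets are adjusted with parameters learned on the training sets, one testing set at a time, can be illustrated with a much simpler stand-in technique. The following Python sketch uses frozen quantile normalization, not ComBatseq or DaMiRseq: the "parameter" learned from the training data is a reference distribution of log-expression values, which is then applied unchanged to every testing sample. It assumes the inputs are already log-transformed (for example log2(count + 1)) and that testing columns are the same genes, in the same order, as the training columns.

```python
import numpy as np


def fit_reference_quantiles(train_log_expr):
    """Learn a reference distribution from the training sets.

    train_log_expr: array of shape (n_samples, n_genes), log-scale expression.
    Returns the mean of each sample's sorted expression values, rank by rank.
    """
    sorted_rows = np.sort(np.asarray(train_log_expr, dtype=float), axis=1)
    return sorted_rows.mean(axis=0)


def apply_reference_quantiles(test_log_expr, reference):
    """Map every sample of one testing set onto the frozen training reference.

    Nothing is re-estimated from the testing data, so adjusting one testing set
    never depends on another testing set.
    """
    x = np.asarray(test_log_expr, dtype=float)
    if x.ndim != 2 or x.shape[1] != reference.shape[0]:
        raise ValueError("gene dimension does not match the training reference")
    # 0-based rank of each gene within its own sample
    ranks = np.argsort(np.argsort(x, axis=1), axis=1)
    return reference[ranks]


reference = fit_reference_quantiles(np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]))
print(apply_reference_quantiles(np.array([[10.0, 0.0, 5.0]]), reference))  # [[4. 2. 3.]]
```

Unlike ComBatseq, this sketch does not model count data or batch covariates; it only shows how training-derived parameters are reused on each testing set independently.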
Preferably, in step (b), the feature extraction is integrated extraction based on the DaMiRseq algorithm and the DESeq2 algorithm; after subjecting the training sets to batch effect adjustment, specifically expressed genes of the four cell classes are extracted according to the classes of the samples; and expression matrices of the feature genes are extracted from the data of the training sets and the testing sets after batch effect adjustment, respectively.
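The idea of extracting genes specifically expressed in each cell class, and then keeping only those feature genes in both the training and testing matrices, can be sketched as follows. This is a simplified stand-in, not DESeq2 or DaMiRseq: it ranks genes by the one-versus-rest difference in mean log-expression and applies no statistical test. The function name and the thresholds `min_log2fc` and `top_k` are hypothetical choices for illustration.

```python
import numpy as np


def select_class_specific_genes(log_expr, labels, gene_names, min_log2fc=1.0, top_k=50):
    """Pick genes up-regulated in each class relative to all other classes.

    log_expr: (n_samples, n_genes) log2-scale, batch-adjusted training data.
    Returns (column indices, gene names) of the union of per-class selections,
    in original gene order.
    """
    x = np.asarray(log_expr, dtype=float)
    labels = np.asarray(labels)
    if len(np.unique(labels)) < 2:
        raise ValueError("at least two cell classes are required")
    selected = set()
    for cls in np.unique(labels):
        in_cls = labels == cls
        # One-vs-rest difference of mean log2 expression (a log2 fold change)
        diff = x[in_cls].mean(axis=0) - x[~in_cls].mean(axis=0)
        order = np.argsort(diff)[::-1]  # largest difference first
        keep = [int(i) for i in order[:top_k] if diff[i] >= min_log2fc]
        selected.update(keep)
    idx = sorted(selected)
    return idx, [gene_names[i] for i in idx]
```

Once the indices are chosen on the training sets, the same columns would be taken from each batch-adjusted matrix, for example `train_features = train[:, idx]` and `test_features = test[:, idx]`, so that the model sees identical feature genes for training and testing.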
Preferably, in step (c), the model is first trained and optimized on the training sets, and the evaluation indices of the model are then calculated on the testing sets; the constructed MeD-P model integrates nine machine learning algorithms: SVM-R, SVM-L, RF, GNB, LDA, LR, MLP, RidgeCV and kNN.
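One way to integrate the nine algorithms is sketched below with scikit-learn. The rule used to integrate them is not stated above, so this sketch assumes hard majority voting. It also assumes that SVM-R and SVM-L are support vector classifiers with RBF and linear kernels, that RidgeCV corresponds to `RidgeClassifierCV` (scikit-learn's `RidgeCV` is a regressor), and that features are standardized first; all hyperparameters shown are illustrative, not those of MeD-P.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifierCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_ensemble(random_state=0):
    """Nine base classifiers combined by hard majority vote (assumed integration rule)."""
    base_models = [
        ("svm_r", SVC(kernel="rbf")),
        ("svm_l", SVC(kernel="linear")),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=random_state)),
        ("gnb", GaussianNB()),
        ("lda", LinearDiscriminantAnalysis()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000,
                              random_state=random_state)),
        ("ridge_cv", RidgeClassifierCV()),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
    ]
    # Standardize the feature-gene expression values before every base model
    return make_pipeline(StandardScaler(),
                         VotingClassifier(estimators=base_models, voting="hard"))
```

In use, the model would be fitted on the feature-gene matrix of the training sets (`model.fit(train_features, train_labels)`) and then applied to each testing set (`model.predict(test_features)`). Hard voting is chosen here because `RidgeClassifierCV` does not provide class probabilities.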
The present invention has the following beneficial effects:
The present invention designs and constructs a transcriptome-based method for quantitatively predicting and evaluating the function of a biomaterial, in which the gene expression profile of the transcriptome of the cells to be tested is compared with pre-constructed gene expression profiles of the different cell classes differentiated from stem cells, so as to obtain a complete picture of the state of cell differentiation induced by a biomaterial.
Specifically, by integrating nine machine learning algorithms (SVM-R, SVM-L, RF, GNB, LDA, LR, MLP, RidgeCV and kNN), the present invention trains an intelligent prediction model that can distinguish samples of four cell classes: osteoblasts, chondrocytes, adipocytes and undifferentiated mesenchymal stem cells. Compared with traditional biomarker-dependent evaluation methods, the accuracy in identifying each of the four cell classes is significantly improved. Moreover, the present invention uses, as test samples, RNAseq data from public databases of human bone marrow mesenchymal stem cells before and after chemical induction and culture with biomaterials, and inputs these data to the prediction model constructed from a database of the gene expression profiles of reference samples. The results show that the cell classes predicted by the intelligent model are consistent with the phenotypes of the test samples.
The present invention will be further described with reference to the following examples.
The present invention provides a method for predicting and evaluating the function of a biomaterial, comprising the following steps: (1) culturing human bone marrow mesenchymal stem cells in the environment of a material to be tested; (2) collecting the human bone marrow mesenchymal stem cells cultured in step (1), extracting total RNA from the cells, and subjecting the total RNA to purification, library construction and transcriptome sequencing; and (3) subjecting the transcriptome data of the samples to be tested (i.e., the data of the samples obtained in step (2)) to batch effect adjustment and feature extraction, then inputting the resulting data to a function prediction and evaluation model of the present invention (an MeD-P intelligent prediction model constructed by integrating nine machine learning algorithms: SVM-R, SVM-L, RF, GNB, LDA, LR, MLP, RidgeCV and kNN), and calculating, for each sample to be tested, the confidence coefficients that it belongs to each of four cell classes: osteoblasts, chondrocytes, adipocytes and undifferentiated mesenchymal stem cells.
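How the confidence coefficients are derived from the nine algorithms is not specified here. Purely as an assumption for illustration, the Python sketch below treats the confidence coefficient of a class as the fraction of the nine base models that predict that class for a given sample; the function name and label strings are hypothetical.

```python
from collections import Counter

CELL_CLASSES = ("osteoblast", "chondrocyte", "adipocyte", "MSC")  # assumed label names


def confidence_coefficients(model_predictions, classes=CELL_CLASSES):
    """Fraction of base models voting for each cell class, for one sample.

    model_predictions: one predicted label per base model (e.g. nine labels).
    """
    if not model_predictions:
        raise ValueError("need at least one model prediction")
    unknown = set(model_predictions) - set(classes)
    if unknown:
        raise ValueError(f"unexpected class labels: {sorted(unknown)}")
    counts = Counter(model_predictions)
    n = len(model_predictions)
    return {c: counts[c] / n for c in classes}


votes = ["osteoblast"] * 6 + ["MSC"] * 2 + ["chondrocyte"]
print(confidence_coefficients(votes))
```

Under this assumption the four coefficients of a sample always sum to 1, and the class with the largest coefficient is the majority-vote prediction.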
As shown in
I. Batch effect adjustment: integrated optimization based on the ComBatseq and DaMiRseq algorithms.
The classes and batches of samples in the training sets were known, and the function parameters selected for their batch effect adjustment were shown in
II. Feature extraction: integrated extraction based on the DaMiRseq and DESeq2 algorithms.
After subjecting the training sets to batch effect adjustment, specifically expressed genes of the four cell classes were extracted according to the classes of the samples, and the function parameters selected were shown in
III. Function prediction and evaluation model: MeD-P intelligent prediction model constructed by integrating nine machine learning algorithms of SVM-R, SVM-L, RF, GNB, LDA, LR, MLP, RidgeCV and kNN. The model was first trained and optimized on the training sets, and then the evaluation indices of the model were calculated on the testing sets.
As shown in
As shown in
As shown in
This example shows that the model trained on the selected reference feature genes outperforms those trained on the traditional marker genes and the all-gene set.
When training the machine learning models, we hypothesized that lineage-specific genes selected from big data would serve as better training features than the known traditional lineage-specific marker genes. To verify this hypothesis, in this example a kNN model based on the traditional marker genes (Table 1) was trained on the same training samples with the same machine learning method, and predictions were performed on the same testing datasets with the same procedures. To eliminate the influence of the number of genes, we also tested models trained on randomly selected gene sets containing the same number of genes as the selected reference genes, as well as a model trained on all genes that passed quality control. Comparisons of these models on the testing datasets showed that, compared with the models trained on the traditional marker genes, the randomly selected gene sets and the all-gene set, the model trained on the reference feature genes selected from big data had higher per-class accuracy, per-class specificity, per-class precision, per-class recall and per-class F1-score (
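The two ingredients of this comparison that go beyond the indices of step (d), namely per-class accuracy and specificity, and the size-matched random gene sets, can be sketched in Python as follows. The one-versus-rest definitions used here (per-class accuracy as (TP + TN) / N, specificity as TN / (TN + FP)) are assumptions, since the text does not define them, and the helper names are hypothetical.

```python
import random


def per_class_accuracy_specificity(true_labels, predicted_labels, classes):
    """One-vs-rest accuracy and specificity for each class (assumed definitions)."""
    pairs = list(zip(true_labels, predicted_labels))
    n = len(pairs)
    results = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        results[c] = {
            "accuracy": (tp + tn) / n,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        }
    return results


def random_matched_gene_sets(qc_genes, n_genes, n_sets, seed=0):
    """Draw gene sets of the same size as the reference set from QC-passed genes."""
    rng = random.Random(seed)  # fixed seed so the control sets are reproducible
    pool = list(qc_genes)
    return [rng.sample(pool, n_genes) for _ in range(n_sets)]
```

Each random gene set would be used to train a model in the same way as the reference feature genes, so that any difference in the indices reflects which genes were chosen rather than how many.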
Furthermore, since Day 7 is a common timepoint for evaluating biomaterial-induced MSC differentiation, the traditional marker genes were also compared with the reference feature genes used by MeD-P on Day 7 samples from the public testing datasets and from the tri-lineage differentiation experiments performed in our own laboratory. The results showed that on the Day 7 sample testing sets, the reference feature genes used by MeD-P achieved higher overall accuracy than the traditional marker genes, especially for the undifferentiated samples (
This example shows that MeD-P provides robust, accurate and rapid predictions of the lineage fate of hMSCs induced by various biomaterials.
The different biomaterials used for induction were prepared as follows. For the preparation of the 3D-printed β-TCP scaffolds, see L. Chen, C. Deng, J. Li, Q. Yao, J. Chang, L. Wang, C. Wu, Biomaterials 2019, 196, 138. For the preparation of the electrospun PLLA nanofibrous membranes, see W. Liu, Y. Wei, X. Zhang, M. Xu, X. Yang, X. Deng, ACS Nano 2013, 7, 6928. For the preparation of the BTO NPs/P(VDF-TrFE) nanocomposite membranes, see X. Zhang, C. Zhang, Y. Lin, P. Hu, Y. Shen, K. Wang, S. Meng, Y. Chai, X. Dai, X. Liu, Y. Liu, X. Mo, C. Cao, S. Li, X. Deng, L. Chen, ACS Nano 2016, 10, 7279. For the preparation of the phenylalanine hydrogel matrices, see Y. Wei, S. Jiang, M. Si, X. Zhang, J. Liu, Z. Wang, C. Cao, J. Huang, H. Huang, L. Chen, S. Wang, C. Feng, X. Deng, L. Jiang, Adv. Mater. 2019, 31, 1900582. The contents of the above references are incorporated herein by reference in their entirety. The sandblasted and acid-etched (SLA) Ti-6Al-4V alloy substrates were prepared as follows. White corundum with a particle size of 250-300 μm was used to sandblast the pure titanium substrates uniformly at a distance of 3-5 cm under standard atmospheric pressure. The samples were then ultrasonically cleaned with acetone, anhydrous ethanol and deionized water for 10 min, and dried. The sandblasted titanium alloy substrates were placed in an acid etching solution consisting of equal volumes of 18% HCl and 48% H2SO4 at 60 °C for 30 min. The substrates were then ultrasonically cleaned with deionized water 3 times for 15 min each, and placed in a vacuum oven at 55 °C for 12 h. Ti-6Al-4V titanium alloy substrates with polished surfaces were prepared as a control.
For evaluation, human bone marrow mesenchymal stem cells (hBM-MSCs) were cultured on the representative biomaterials, and harvested after 7 days for RNA sequencing. The processed RNAseq data were loaded into MeD-P, and a report on the tri-lineage differentiation probabilities was generated (
Then, RNA-seq data were collected from hBM-MSCs after 7 days of culture on each of the 3D-printed β-TCP scaffolds, the electrospun PLLA nanofibrous membranes, the SLA-treated Ti-6Al-4V alloy substrates, the BTO NPs/P(VDF-TrFE) nanocomposite membranes and the chiral phenylalanine hydrogel matrices, and processed with MeD-P. hBM-MSCs were also cultured in normal complete medium for 7 days, and these samples were collected as a blank control. The MeD-P evaluation reports showed that the 3D-printed β-TCP scaffolds yielded the highest probability of inducing osteogenesis of hBM-MSCs (
MeD-P predicted that for the electrospun PLLA nanofibrous membranes, the probability of adipogenic differentiation of hBM-MSCs cultured in the AL group (the aligned PLLA nanofibrous membranes) was the highest, and the probability of osteogenic differentiation of hBM-MSCs cultured in the RD group (the randomly-oriented PLLA nanofibrous membranes) was the highest (
The MeD-P reports showed that the polarized BTO NPs/P(VDF-TrFE) nanocomposite membranes had significant osteogenic induction potential (
The foregoing are merely specific examples of the present invention and do not limit the scope of its implementation. Therefore, replacements of equivalent components of the present invention, and equivalent changes and modifications made according to the scope of patent protection of the present invention, shall all fall within the scope covered by the claims of the present invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202110884816.5 | Aug 2021 | CN | national |
This application is a Continuation of International Patent Application No. PCT/CN2021/119233, filed Sep. 18, 2021, which claims priority to Application No. CN202110884816.5, entitled “METHOD FOR PREDICTING AND EVALUATING FUNCTION OF BIOMATERIAL”, filed on Aug. 3, 2021, the contents of which are incorporated herein by reference in their entireties.
| Relationship | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2021/119233 | Sep 2021 | WO |
| Child | 18429680 | | US |