This application claims priority to Japanese Patent Application No. 2023-100052 filed on Jun. 19, 2023. The disclosure of Japanese Patent Application No. 2023-100052 is hereby incorporated by reference in its entirety.
The present disclosure relates to a method for predicting a physical property of a polymer and a polymer.
A technique to predict the physical properties of a new compound has been proposed. The technique includes constructing a regression model for predicting the physical properties of a compound by machine learning, and predicting the physical properties of a new compound based on the regression model. The technique also includes repeating a cycle of synthesizing a new compound, measuring the physical properties of the new compound to validate the predicted values, and adding the new compound and the actually measured physical properties to the data to improve the accuracy of the regression model. Further, the new compounds and their physical properties are utilized to discover useful compounds. Such a technique is widely used in the pharmaceutical field.
At present, however, the technique is not sufficiently used in the field of polymeric compounds (polymers), particularly in the prediction of specific physical properties (for example, moisture content, interstitial water content, unfreezable water content, adhesion to specific cells including cancer cells and stem cells, and biocompatibility) of polymers.
A substrate coated with a special polymer has been proposed as a device for capturing specific cells (e.g., blood cells, cancer cells present in blood or biological fluid) from blood or biological fluid (see, JP 2005-523981 T).
Unfortunately, the substrate may capture blood cells together with specific cells including cancer cells. There has been a demand for the development of polymers capable of selectively capturing specific cells including cancer cells while reducing the capture of normal cells such as blood cells.
Polymers with better biocompatibility are also in demand.
The present disclosure aims to solve the above problem and provide a highly accurate method for predicting a physical property of a polymer and a polymer having excellent physical properties such as an ability to selectively capture specific cells and biocompatibility.
The present disclosure relates to a method for predicting a physical property of a polymer, the method including
The method for predicting a physical property of a polymer of the present disclosure includes the Steps 1 to 7, and thus the present disclosure can provide a highly accurate method for predicting a physical property of a polymer. The method is expected to enable rapid discovery of polymers (candidate polymers) with many excellent physical properties.
Moreover, the present disclosure is expected to provide a polymer having excellent physical properties such as an ability to selectively capture specific cells including cancer cells and biocompatibility.
The method for predicting a physical property of a polymer of the present disclosure includes Step 1 of obtaining a structure of a monomer from a structure of a polymer, Step 2 of converting the obtained structure of the monomer or a converted structure of the monomer into a format recognizable by a computer, Step 3 of computing a descriptor from the format recognizable by a computer, Step 4 of obtaining a physical property of the polymer and performing a regression calculation using the physical property of the polymer as an objective variable and the descriptor as an explanatory variable to construct a regression model, Step 5 of computing a format of a new monomer from the format recognizable by a computer, Step 6 of computing a descriptor from the format of the new monomer or a converted format of the new monomer, and Step 7 of applying the descriptor of the new monomer to the regression model to predict a physical property of a polymer produced by polymerizing the new monomer.
The method for predicting a physical property of a polymer includes Step 1 of obtaining a structure of a monomer from a structure of a polymer. For example, in the Step 1, the structure of a monomer having a carbon-carbon double bond which constitutes a polymer is obtained from the structure of the polymer. Specifically, when the structure of the polymer is polyacrylic acid, acrylic acid, which is a structure of a monomer constituting the polyacrylic acid, is obtained.
The polymer in the Step 1 is not limited and may be an appropriate known polymer.
Examples of the polymer include homopolymers of a single monomer and copolymers of two or more monomers. The polymer may be used alone or in combinations of two or more.
The polymer can be produced by known methods. For example, it may be synthesized by polymerizing a monomer constituting the polymer using a solution of the monomer by a known method. Non-limiting examples of the solvent of the monomer solution include those described later. Toluene or methanol is preferred among those.
From the standpoint of an ability to selectively capture specific cells including cancer cells and biocompatibility, the polymer is desirably a polymer with hydrophilicity (hydrophilic polymer).
Examples of the hydrophilic polymer include homopolymers or copolymers of one or two or more hydrophilic monomers, and copolymers of one or two or more hydrophilic monomers and one or two or more different monomers. Each of these may be used alone or in combinations of two or more.
Non-limiting examples of the hydrophilic monomer include various monomers containing hydrophilic groups. Examples of the hydrophilic groups include known hydrophilic groups such as an amide group, a sulfuric acid group, a sulfonic acid group, a carboxylic acid group, a hydroxy group, an amino group, and an oxyethylene group.
Specific examples of the hydrophilic monomer include (meth)acrylic acids, (meth)acrylic acid esters, alkoxyalkyl (meth)acrylates such as methoxyethyl (meth)acrylates, hydroxyalkyl (meth)acrylates such as hydroxyethyl (meth)acrylates, (meth)acrylamides, and (meth)acrylamide derivatives containing cyclic groups such as (meth)acryloylmorpholines. From the standpoint of an ability to selectively capture specific cells including cancer cells and biocompatibility, (meth)acrylic acids, (meth)acrylic acid esters, and alkoxyalkyl (meth)acrylates are preferred, alkoxyalkyl (meth)acrylates are more preferred, and 2-methoxyethylacrylate is particularly preferred among these. Each of these may be used alone or in combinations of two or more.
The different monomer may be appropriately selected from those which do not inhibit the effect of the hydrophilic polymer. Specific examples of the different monomer include aromatic monomers such as styrene, vinyl acetate, and N-isopropylacrylamide which can impart temperature responsiveness. Each of these may be used alone or in combinations of two or more.
Specific examples of the homopolymers and copolymers include: homopolymers formed from a single hydrophilic monomer, such as polyacrylic acids, polyacrylic acid esters, polymethacrylic acids, polymethacrylic acid esters, polyacryloylmorpholine, polymethacryloylmorpholine, polyacrylamide, polymethacrylamide, polyalkoxyalkyl acrylate, and polyalkoxyalkyl methacrylate; copolymers formed from two or more hydrophilic monomers listed above; and copolymers formed from one or more hydrophilic monomers listed above and one or more different monomers listed above. Each of these hydrophilic polymers may be used alone or in combinations of two or more.
Specific examples of homopolymers and copolymers of the hydrophilic monomer include polyacrylic acids, polyacrylic acid esters, polymethacrylic acid, polymethacrylic acid esters, polyacryloyl morpholine, polymethacryloyl morpholine, polyacrylamide, polymethacrylamide, polyalkoxyalkyl acrylate, and polyalkoxyalkyl methacrylate.
From the standpoint of an ability to selectively capture specific cells including cancer cells and biocompatibility, the hydrophilic polymer is preferably a polymer represented by the following formula (I):
wherein R51 represents a hydrogen atom or a methyl group; R52 represents an alkyl group; p represents 1 to 8; m represents 1 to 5; and n represents the number of repetitions. The polymer may be used alone or in combinations of two or more.
From the standpoint of an ability to selectively capture specific cells including cancer cells and biocompatibility, the polymer represented by the formula (I) is preferably a polymer represented by the following formula (I-1):
wherein R51 represents a hydrogen atom or a methyl group; R52 represents an alkyl group; m represents 1 to 5; and n represents the number of repetitions. The polymer may be used alone or in combinations of two or more.
The number of carbon atoms of the alkyl group for R52 is preferably 1 to 10, more preferably 1 to 5. R52 is particularly preferably a methyl group or an ethyl group. The symbol p is preferably 1 to 5, more preferably 1 to 3. The symbol m is preferably 1 to 3. The symbol n (the number of repeating units) is preferably 15 to 1500, more preferably 40 to 1200.
From the standpoint of an ability to selectively capture specific cells including cancer cells and biocompatibility, the hydrophilic polymer is preferably a copolymer of a compound represented by the following formula (II) with a different monomer. The copolymer may be used alone or in combinations of two or more.
In the formula, R51, R52, p, and m are as defined above.
From the standpoint of an ability to selectively capture specific cells including cancer cells and biocompatibility, the compound represented by the formula (II) is preferably a compound represented by the following formula (II-1):
wherein R51, R52, and m are as defined above. The compound may be used alone or in combinations of two or more. From the standpoint of an ability to selectively capture specific cells including cancer cells and biocompatibility, a hydrophilic polymer represented by the formula (I) is preferred, a hydrophilic polymer represented by the formula (I-1) is more preferred, and polyalkoxyalkyl acrylate or polyalkoxyalkyl methacrylate is still more preferred among the above-described hydrophilic polymers.
To better achieve the advantageous effect, the number average molecular weight (Mn) of the polymer is preferably 8000 to 150000, more preferably 10000 to 60000, still more preferably 10000 to 39000.
Herein, the number average molecular weight (Mn) can be determined with a gel permeation chromatograph (GPC) (GPC-8000 series available from Tosoh Corporation, detector: differential refractometer, column: TSKgel SuperMultipore HZ-M available from Tosoh Corporation). The resulting value is calibrated with polystyrene standards.
In the Step 1 in the method for predicting a physical property of a polymer, when the structure of the polymer is poly(2-methoxyethyl acrylate), for example, 2-methoxyethyl acrylate (a structure of a monomer) is obtained.
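By way of a non-limiting illustration, the Step 1 may be pictured as a lookup from a polymer to its constituent monomer. The following Python sketch covers only the two examples named in the text. The table `POLYMER_TO_MONOMER` and the function `monomer_of` are hypothetical names introduced for illustration, and the monomer structures are written by hand as SMILES strings, one of the computer-recognizable formats named for the Step 2.

```python
# Hypothetical lookup table (illustration only):
# polymer name -> (monomer name, hand-written monomer SMILES).
POLYMER_TO_MONOMER = {
    "polyacrylic acid": ("acrylic acid", "C=CC(=O)O"),
    "poly(2-methoxyethyl acrylate)": ("2-methoxyethyl acrylate", "C=CC(=O)OCCOC"),
}


def monomer_of(polymer_name):
    """Return (monomer name, monomer SMILES) for a registered homopolymer."""
    key = polymer_name.strip().lower()
    if key not in POLYMER_TO_MONOMER:
        raise KeyError(f"no monomer registered for {polymer_name!r}")
    return POLYMER_TO_MONOMER[key]
```

A copolymer would map to two or more monomers. The sketch assumes homopolymers only.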
After the Step 1 in the method for predicting a physical property of a polymer, Step 2 is performed in which the obtained structure of the monomer or a converted structure of the monomer is converted into a format recognizable by a computer. Specifically, in the Step 2, a compound with the structure of the monomer obtained in the Step 1 (the original structure of the monomer obtained) or a compound converted from the structure of the monomer is converted into a format recognizable by a computer.
The structure of the monomer obtained in the Step 1 may be converted by any technique that can convert the structures of monomers, such as the following techniques (1) and (2): a technique (1) of cleaving a carbon-carbon double bond in the structure of the monomer obtained in the Step 1 (the original structure of the monomer including a carbon-carbon double bond) and introducing a hydrogen atom or a saturated hydrocarbon group such as a methyl group to each terminal of the cleaved monomer to form a compound (converted into a monomer including no carbon-carbon double bond); and a technique (2) of cleaving a carbon-carbon double bond in the structure of the monomer obtained in the Step 1 (the original structure of the monomer) to form a multimer containing only carbon-carbon single bonds and introducing a hydrogen atom or a saturated hydrocarbon group such as a methyl group to each terminal of the multimer to form a compound (converted into a multimer including no carbon-carbon double bond, such as a dimer including no carbon-carbon double bond or a trimer including no carbon-carbon double bond). A desirable technique in the case of conversion to the multimer includes cleaving a carbon-carbon double bond in the structure of the monomer obtained in the Step 1 (the original structure of the monomer) to form a multimer containing only carbon-carbon single bonds and introducing a methyl group to each terminal of the multimer to form a compound.
In the Step 2, a compound with the structure of the monomer (the original structure of the monomer) obtained in the Step 1 or a multimer including no carbon-carbon double bond converted from the structure of the monomer is converted into a format recognizable by a computer. Examples of the computer include usual personal computers, servers, workstations, compute nodes, and microcomputers. The hardware of the computer may be a known one.
Examples of the format recognizable by a computer include known formats such as a simplified molecular input line entry system (SMILES) format and a MOL file format.
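By way of a non-limiting illustration, the techniques (1) and (2) described above may be sketched as string operations on a SMILES string. The following Python sketch is a simplification under stated assumptions: the function names are hypothetical, the monomer SMILES must be written with the terminal vinyl group first (`C=C...`), the substituted vinyl carbon must be acyclic, and the multimer is assumed to be linked head to tail. The output is not presented as a reproduction of any specific compound of the disclosure.

```python
def split_substituents(tail):
    """Split the text after 'C=C' into leading '(...)' branches and the rest."""
    branches = []
    while tail.startswith("("):
        depth = 0
        for i, ch in enumerate(tail):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError("unbalanced parentheses")
        branches.append(tail[: i + 1])
        tail = tail[i + 1:]
    return branches, tail


def saturated_oligomer(monomer_smiles, n_units=1, cap="C"):
    """Open the C=C bond, chain n_units repeat units, and cap both ends.

    cap="C" introduces a methyl group to each terminal; cap="" leaves
    implicit hydrogen atoms at each terminal.
    """
    if not monomer_smiles.startswith("C=C"):
        raise ValueError("sketch only handles SMILES written as 'C=C...'")
    branches, chain = split_substituents(monomer_smiles[3:])
    # Every substituent of the second vinyl carbon becomes a branch so that
    # the main chain can continue from that carbon.
    unit = "CC" + "".join(branches) + (f"({chain})" if chain else "")
    return cap + unit * n_units + cap
```

For example, `saturated_oligomer("C=CC(=O)OCCOC", 2)` yields a methyl-capped dimer containing only carbon-carbon single bonds. A cheminformatics toolkit would be the more robust route for arbitrary input.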
In the Step 2 in the method for predicting a physical property of a polymer, when the structure of the polymer is, for example, poly(2-methoxyethyl acrylate), the 2-methoxyethyl acrylate (the original structure of the monomer including a carbon-carbon double bond) obtained in the Step 1 is used as it is or one of the following compounds (A) to (C) converted from the 2-methoxyethyl acrylate is used.
In the Step 2, the 2-methoxyethyl acrylate or one of the compounds (A) to (C) is converted into one of the following SMILES formats (SMILES notations) which are formats recognizable by a computer.
The method for predicting a physical property of a polymer includes the Step 2 and separately Step 3 of computing a descriptor from the format recognizable by a computer. Specifically, the structure of the monomer including a carbon-carbon double bond (the original structure of the monomer including a carbon-carbon double bond) constituting a polymer with known physical properties, structure, etc. is converted into a format recognizable by a computer. Alternatively, a compound converted from the structure of the monomer (a multimer including no carbon-carbon double bond, such as the monomer including no carbon-carbon double bond, the dimer including no carbon-carbon double bond, or the trimer including no carbon-carbon double bond) is converted into a format recognizable by a computer. Then, a descriptor is computed from the format.
Non-limiting examples of the physical property of the polymer include glass transition temperature (Tg), melting point, refractive index, contact angle, surface free energy, density, water absorption, interstitial water content, unfreezable water content, selective adhesion to specific cells including cancer cells, and biocompatibility. Of these, the method can suitably predict interstitial water content, selective adhesion to specific cells, and biocompatibility.
In the Step 3, the original structure of the monomer constituting the polymer or the compound converted from the structure of the monomer is converted into a format recognizable by a computer. Examples of the format include the SMILES format and the MOL file format described above. A descriptor (numerical data) to numerically represent the chemical structure is computed from the format. For example, in Python, 208 descriptors or 400 descriptors are computed by RDKit from SMILES. The descriptors can also be computed using mordred, PaDEL, alvaDesc, etc. Fingerprints may be another method to numerically represent the chemical structure.
The Step 3 in the method for predicting a physical property of a polymer includes preparing a SMILES format of the structure of the monomer (the original structure of the monomer including a carbon-carbon double bond) constituting one of 11 polymers with known interstitial water contents or a SMILES format of a compound converted from the structure of the monomer (the monomer including no carbon-carbon double bond, the dimer including no carbon-carbon double bond, or the trimer including no carbon-carbon double bond). From each SMILES format, it is possible to compute 208 descriptors of the numerical data reflecting the chemical structure.
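By way of a non-limiting illustration, the idea of turning a SMILES string into numerical data may be sketched as follows. This Python sketch is a deliberately small stand-in, not the descriptor set described above: the function `toy_descriptors` and its seven counts are hypothetical, and it assumes SMILES without bracket atoms. In practice, a descriptor library such as those named in the text would be used.

```python
import re

# Organic-subset atom symbols; two-letter halogens are matched first.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def toy_descriptors(smiles):
    """Return a few simple counts that numerically reflect the structure."""
    atoms = ATOM_PATTERN.findall(smiles)
    # Aromatic atoms are lower case in SMILES; fold them to element symbols.
    elements = [a.capitalize() for a in atoms]
    return {
        "heavy_atoms": len(atoms),
        "carbon": elements.count("C"),
        "oxygen": elements.count("O"),
        "nitrogen": elements.count("N"),
        "double_bonds": smiles.count("="),
        "branches": smiles.count("("),
        # Each ring closure uses a pair of matching digits.
        "ring_closures": sum(ch.isdigit() for ch in smiles) // 2,
    }
```

Each monomer then becomes one row of numbers, which is the form required of the explanatory variables in a regression calculation.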
The Step 3 may be performed separately from the Steps 1 and 2. The Steps 1 and 2 and the Step 3 may be performed in any order. For example, the Step 3 may be performed after the Steps 1 and 2, simultaneously with the Steps 1 and 2, or before the Steps 1 and 2.
In the method for predicting a physical property of a polymer, Step 4 of obtaining a physical property of the polymer and performing a regression calculation using the physical property of the polymer as an objective variable and the descriptor as an explanatory variable to construct a regression model is performed after the Steps 1, 2, and 3. Specifically, in the Step 4, a regression formula (regression model) is constructed by regression calculation using the physical property of the polymer as an objective variable and the descriptor (for example, 208 descriptors described above) as an explanatory variable.
Known algorithms used in machine learning and the like may be used for the model construction. The regression calculation to construct the regression formula (regression model) may be a known method such as multiple regression calculation, support vector regression calculation, decision tree calculation, random forest regression calculation, XGBoost regression calculation, LightGBM regression calculation, CatBoost regression calculation, Gaussian process regression calculation, neural networks, or deep learning. One regression calculation may be performed alone or a combination of two or more regression calculations may be performed.
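By way of a non-limiting illustration, the simplest case of the regression calculation, an ordinary least-squares fit with a single explanatory variable, may be sketched in Python as follows. The function names `fit_line` and `predict` are hypothetical, and any numbers passed to them in an example are synthetic values chosen to check the arithmetic, not measured physical properties. The tree-based and neural methods listed above would normally be supplied by dedicated libraries.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    xs: one descriptor value per polymer (explanatory variable).
    ys: one physical property value per polymer (objective variable).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new descriptor value."""
    slope, intercept = model
    return slope * x + intercept
```

The returned pair plays the role of the regression formula, and `predict` corresponds to applying the descriptor of a new monomer to that formula.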
In the Step 4 in the method for predicting a physical property of a polymer, four models to predict the interstitial water content can be constructed from four structures including the structure of the monomer (the original structure of the monomer including a carbon-carbon double bond) constituting the polymer and the structures of compounds converted from the structure of the monomer (the monomer including no carbon-carbon double bond, the dimer including no carbon-carbon double bond, and the trimer including no carbon-carbon double bond) using their interstitial water contents as objective variables and 208 descriptors as explanatory variables by ensemble learning including random forest, XGBoost regression calculation, LightGBM regression calculation, and CatBoost regression calculation.
In the case of boosting regression calculation, neural networks, or deep learning, hyperparameters are adjusted to reduce overfitting and to allow the entire dataset to be used for training. Examples of the technique for adjusting hyperparameters include grid search, random search, and Bayesian optimization. The term “hyperparameter” refers to a parameter that is determined by humans before training. Hyperparameters are necessary to control the behaviors of algorithms. Typical examples include a maximum depth of decision trees, a maximum number of leaves in decision trees, a minimum number of samples per leaf, a proportion of samples randomly extracted from each decision tree, a proportion of columns randomly extracted from each decision tree, an L1 regularization term for leaf weights in decision trees, an L2 regularization term for leaf weights in decision trees, a minimum loss reduction by adding leaves to decision trees, a method to optimize a loss function, the number of boosting rounds, the number of epochs, a learning rate, a threshold, a minibatch size, the number of layers, and the number of neurons per layer.
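By way of a non-limiting illustration, the grid technique for adjusting hyperparameters may be sketched in Python as an exhaustive loop over candidate values. The function `grid_search` is a hypothetical name, and `toy_score` is a made-up error surface standing in for a cross-validated error of a real model. Neither is taken from the disclosure.

```python
from itertools import product


def grid_search(param_grid, score):
    """Try every combination in param_grid and keep the lowest score.

    param_grid: dict mapping a hyperparameter name to candidate values.
    score: callable taking a dict of hyperparameters and returning an
           error to be minimized.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params):
    """Made-up error surface with its minimum at depth 3, rate 0.1."""
    return (params["max_depth"] - 3) ** 2 + (params["learning_rate"] - 0.1) ** 2
```

Calling `grid_search({"max_depth": [2, 3, 4], "learning_rate": [0.01, 0.1, 0.3]}, toy_score)` evaluates all nine combinations. Random search samples the same grid instead of enumerating it, and Bayesian optimization chooses the next combination based on earlier scores.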
In the method for predicting a physical property of a polymer, Step 5 of computing the format of a new monomer from the format recognizable by a computer is performed after the Steps 1, 2, 3, and 4. Specifically, in the Step 5, the format of a new monomer is computed (generated) from the format, such as a SMILES format or a MOL file format, of the structure of the monomer (the monomer including a carbon-carbon double bond) constituting the polymer and the like.
The format recognizable by a computer of a new monomer can be computed (generated) by, for example, fragmenting the chemical structure of the compound in the format using the breaking of retrosynthetically interesting chemical substructures (BRICS) algorithm, the retrosynthetic combinatorial analysis procedure (RECAP) algorithm, the design of genuine structures (DOGS) algorithm, or the like and then mechanically combining the fragments. Examples of the format recognizable by a computer include the SMILES format and the MOL file format described above.
When using RECAP, for example, an organic molecule is broken into fragments at sites where bonds therein are easily cleaved. Examples of the bonds to consider include amide bonds, ester bonds, amine bonds, N—C bonds in urea, ether bonds, C═C bonds, ammonia, N—S bonds in sulfenamide, aromatic ring-aromatic ring bonds, N (in an aromatic ring)-C (sp3) bonds, and N (in a lactam ring)-C (sp3) bonds. When using BRICS, an organic molecule is broken into fragments in the same manner as described for RECAP while considering 16 types of bonds.
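By way of a non-limiting illustration, the mechanical combination of fragments may be sketched in Python as follows. This is a toy stand-in and not the BRICS or RECAP algorithm: it assumes fragments with a single untyped attachment point written as `*`, with head fragments ending in `*` and tail fragments starting with `*`, whereas the actual algorithms apply typed bond rules. The function `recombine` and the example fragments are hypothetical.

```python
from itertools import product


def recombine(heads, tails, known=()):
    """Join every 'head*' fragment to every '*tail' fragment.

    Structures listed in `known` (for example, the monomers that were
    fragmented in the first place) are left out, so only new candidates
    are returned. Duplicates are returned once, in order of generation.
    """
    known = set(known)
    generated = []
    for head, tail in product(heads, tails):
        if not head.endswith("*") or not tail.startswith("*"):
            raise ValueError("expected 'head*' and '*tail' fragments")
        smiles = head[:-1] + tail[1:]  # drop both attachment markers
        if smiles not in known and smiles not in generated:
            generated.append(smiles)
    return generated
```

For example, combining the heads `C=CC(=O)O*` and `C=C(C)C(=O)O*` with the tails `*CCOC` and `*CCO` while excluding the known monomer `C=CC(=O)OCCOC` leaves three new candidate SMILES strings.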
In the computation (generation) of the format recognizable by a computer of a new monomer, compounds including no carbon-carbon double bond may be generated. In order to predict the physical property of the polymer, compounds including no carbon-carbon double bond are excluded from the generated monomers.
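By way of a non-limiting illustration, the exclusion of generated compounds that cannot serve as monomers may be sketched in Python as follows. The sketch counts carbon-carbon double bonds with a text pattern and keeps only SMILES strings containing exactly one. The names are hypothetical, and the pattern is a heuristic that assumes aliphatic carbons written without bracket atoms. A substructure search in a cheminformatics toolkit would be the more reliable approach.

```python
import re

# An aliphatic 'C', optional ring-closure digits, an optional branch
# opening, then '=' followed by another aliphatic 'C'. The lookahead lets
# cumulated double bonds such as C=C=C be counted separately.
CC_DOUBLE = re.compile(r"C\d*\(?=(?=C)")


def count_cc_double_bonds(smiles):
    """Count carbon-carbon double bonds found by the text heuristic."""
    return len(CC_DOUBLE.findall(smiles))


def keep_monomers(smiles_list):
    """Keep compounds with exactly one carbon-carbon double bond."""
    return [s for s in smiles_list if count_cc_double_bonds(s) == 1]
```

Carbonyl groups such as `C(=O)` are not counted, so an acrylate with one vinyl group passes the filter while its saturated analogue and a diacrylate are excluded.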
In the Step 5 in the method for predicting a physical property of a polymer, for example, 74 compounds are generated using BRICS or the like from the SMILES formats of the structures of 11 monomers (the original structures of the monomers including a carbon-carbon double bond) respectively constituting 11 types of polymers each with a known interstitial water content. From the compounds, 33 compounds that can be monomers are selected (41 others are excluded). Thus, the SMILES formats of the 33 new monomers are computed (generated).
The method for predicting a physical property of a polymer includes Step 6 of computing a descriptor from the format (format recognizable by a computer) of the new monomer computed (generated) in the Step 5 or a converted format of the new monomer. Specifically, in the Step 6, first, the format (format recognizable by a computer) of the new monomer computed (generated) in the Step 5 is converted into a format of a compound with the structure of the monomer (the original structure of the monomer) or a format of a compound converted from the structure of the monomer by the same method as in the Step 2, and then a descriptor (numerical data) is computed from the format by the same method as in the Step 3.
In the Step 6 in the method for predicting a physical property of a polymer, for example, SMILES formats of the 33 new monomers computed (generated) as above are generated from the respective structures of the 33 new monomers (the original structures of the monomers including a carbon-carbon double bond) or the structures of compounds converted from the structures of the monomers (the monomer including no carbon-carbon double bond, the dimer including no carbon-carbon double bond, the trimer including no carbon-carbon double bond), and then descriptors (numerical data) are computed from the SMILES formats.
The method for predicting a physical property of a polymer includes Step 7 of applying the descriptor of the new monomer to the regression model to predict a physical property of a polymer produced by polymerizing the new monomer. Specifically, in the Step 7, the descriptor (numerical data) of the new monomer computed in the Step 6 is applied to the regression model constructed in the Step 4 to predict the physical property of a polymer produced by polymerizing the new monomer.
In the Step 7, for example, the descriptors (numerical data) of the structures of the 33 new monomers (the original structures of the monomers including a carbon-carbon double bond) or the compounds converted from the monomers (the monomers including no carbon-carbon double bond, the dimers including no carbon-carbon double bond, the trimers including no carbon-carbon double bond) computed (generated) in the Step 6 are applied to the regression model constructed in the Step 4 to predict the interstitial water contents of polymers respectively produced by polymerizing the new monomers.
Specifically,
Further, when compounds are generated using the BRICS algorithm from the structures of the monomers (the original structures of the monomers including a carbon-carbon double bond) derived from the 11 types of polymers while setting the maximum number of compounds to be generated to 2000, more than 1600 compounds (SMILES notations) are generated. From the generated compounds, compounds including no carbon-carbon double bond and compounds including two or more carbon-carbon double bonds are excluded to select 625 compounds including only one carbon-carbon double bond. The descriptors of the 625 compounds are computed. Then, the interstitial water contents of the 625 compounds are predicted from the prediction models of the compounds including only one carbon-carbon double bond.
The interstitial water contents from 0.01 to 0.15 among the predicted interstitial water contents are shown in Table 1-1, Table 1-2 (continued from Table 1-1), and Table 1-3 (continued from Table 1-1).
The interstitial water contents from 0.22 to 0.5 are shown in Table 2-1, Table 2-2 (continued from Table 2-1), Table 2-3 (continued from Table 2-1), Table 2-4 (continued from Table 2-1), Table 2-5 (continued from Table 2-1), Table 2-6 (continued from Table 2-1), and Table 2-7 (continued from Table 2-1).
indicates data missing or illegible when filed
The machine learning method enabled generation of the regression formulas (regression models) to predict the physical property of the polymers.
Moreover, in the case where the interstitial water content was from 0.01 to 0.15 (Tables 1-1 to 1-3), the resulting polymers were found to adhere well to specific cells including cancer cells. Therefore, a polymer that has an interstitial water content of 0.01 to 0.15 as determined by the method for predicting a physical property of a polymer is excellent in selective adhesion to specific cells, particularly in selective adhesion to cancer cells. For example, a polymer produced by polymerizing a monomer expressed by any of the SMILES formats in Tables 1-1 to 1-3 had an interstitial water content of 0.01 to 0.15 and adhered well to cancer cells.
Further, in the case where the interstitial water content was from 0.22 to 0.5 (Tables 2-1 to 2-7), the resulting polymers were found to have good biocompatibility. Therefore, a polymer that has an interstitial water content of 0.22 to 0.5 as determined by the method for predicting a physical property of a polymer has excellent biocompatibility. For example, a polymer produced by polymerizing a monomer expressed by any of the SMILES formats in Tables 2-1 to 2-7 had good biocompatibility.
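By way of a non-limiting illustration, the sorting of predicted interstitial water contents into the two ranges described above may be sketched in Python as follows. The function name and the labels are hypothetical, and treating both end points of each range as included is an assumption made for this sketch.

```python
# Ranges of predicted interstitial water content stated in the text.
ADHESION_RANGE = (0.01, 0.15)
BIOCOMPATIBILITY_RANGE = (0.22, 0.5)


def classify(predicted_water_content):
    """Label a predicted interstitial water content by the range it falls in."""
    low, high = ADHESION_RANGE
    if low <= predicted_water_content <= high:
        return "selective cell adhesion candidate"
    low, high = BIOCOMPATIBILITY_RANGE
    if low <= predicted_water_content <= high:
        return "biocompatibility candidate"
    return "outside both ranges"
```

A predicted value between the two ranges, such as 0.18, is assigned to neither group.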
The above results were obtained using monomers including a double bond and random forest regression calculation. In contrast, Table 3 shows interstitial water contents predicted using monomers including no double bond. Three prediction models were constructed from the monomers including no double bond: a model by XGBoost regression calculation and a model by CatBoost regression calculation, in each of which the hyperparameters were manually adjusted to reduce the root mean square error (RMSE), and a model by XGBoost regression calculation in which the hyperparameters were adjusted by Bayesian optimization using the Optuna framework. The monomers in the SMILES formats in Table 3 were converted into monomers including no double bond (by cleaving a double bond and introducing a methyl group to each terminal), descriptors of the converted monomers were computed, and the descriptors were applied to the three prediction models to predict interstitial water contents. The predicted interstitial water contents were within a certain range and found to be somewhat plausible.
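By way of a non-limiting illustration, the root mean square error (RMSE) referred to above may be computed in Python as follows. The function name `rmse` is chosen for this sketch, and any values passed to it in an example are synthetic rather than measured.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

A smaller RMSE indicates that the predictions of a model lie closer to the measured values, which is the quantity reduced when the hyperparameters are adjusted.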
Exemplary embodiments of the present disclosure include the following.
Embodiment 1. A method for predicting a physical property of a polymer, the method including:
Embodiment 2. The method for predicting a physical property of a polymer according to Embodiment 1,
Embodiment 3. The method for predicting a physical property of a polymer according to Embodiment 1 or 2,
Embodiment 4. The method for predicting a physical property of a polymer according to any combination with any one of Embodiments 1 to 3,
Embodiment 5. The method for predicting a physical property of a polymer according to any combination with any one of Embodiments 1 to 4,
Embodiment 6. The method for predicting a physical property of a polymer according to any combination with any one of Embodiments 1 to 5,
Embodiment 7. The method for predicting a physical property of a polymer according to any combination with any one of Embodiments 1 to 6,
Embodiment 8. A polymer having an interstitial water content of 0.01 to 0.15 as determined by the method for predicting a physical property of a polymer according to any combination with any one of Embodiments 1 to 7.
Embodiment 9. The polymer according to Embodiment 8, which is produced by polymerizing a monomer having a SMILES format in any of Tables 1-1 to 1-3.
Embodiment 10. The polymer according to Embodiment 8 or 9, which is capable of selectively adhering to specific cells.
Embodiment 11. The polymer according to Embodiment 10,
Embodiment 12. A polymer having an interstitial water content of 0.22 to 0.5 as determined by the method for predicting a physical property of a polymer according to any combination with any one of Embodiments 1 to 7.
Embodiment 13. The polymer according to Embodiment 12, which is produced by polymerizing a monomer having a SMILES format in any of Tables 2-1 to 2-7.
Embodiment 14. The polymer according to Embodiment 12 or 13, which has biocompatibility.
Number | Date | Country | Kind |
---|---|---|---|
2023-100052 | Jun 2023 | JP | national |