This application is based on and claims priority from Japanese Patent Application No. 2021-031234, filed on Feb. 26, 2021, the disclosure of which is incorporated by reference herein.
The present disclosure relates to a prediction device, a trained model generation device, a prediction method, a trained model generation method, a recording medium recorded with a prediction program, and a recording medium recorded with a trained model generation program.
A molecular dynamics simulation is disclosed in Japanese Patent Application Laid-Open (JP-A) No. 2017-037378. In this simulation, plural structures are clustered in a multidimensional space whose coordinate axes are all of the index dimensions included in a dimension set, and a structure having outlier values that is not included in any cluster is taken as the initial structure for structural analysis of a biopolymer (see claim 4).
A protein three dimensional structure prediction program disclosed in International Publication (WO) No. 2003/054743 predicts the three dimensional structure of a protein. A computer executing this protein three dimensional structure prediction program reads in an amino acid sequence of a protein and predicts secondary structure information. Next, the computer computes the number of amino acids that form a turn based on the secondary structure information, acquires, from the computed number of amino acids and the secondary structure information, turn structure information of a turn having a high probability of being present, performs prediction-reproduction of the turn, and predicts the three dimensional structure of the protein.
Moreover, Japanese National-Phase Publication No. 2020-523010 discloses a method for generating, for each patient, a set of likelihoods for a set of neoantigens for the patient by inputting a peptide sequence of each neoantigen in the set into a machine-learned presentation model (see claim 1).
Similarly, Japanese National-Phase Publication No. 2020-519246 discloses a method for generating a set of presentation likelihoods for a set of neoantigens by employing a processor of a computer to input numerical vectors of peptides into a deep learning presentation model (see claim 1).
Peptide drugs have recently become a focus of attention as a class of middle-molecule drugs. However, many aspects of the pharmacokinetics of peptides remain unclear. In particular, peptides have low membrane permeability, which is a measure of permeation through a cell membrane. There is accordingly a demand to predict accurately whether a peptide obtained for administration as a drug has a certain degree of membrane permeability.
The technologies disclosed in JP-A No. 2017-037378, WO No. 2003/054743, and Japanese National-Phase Publication Nos. 2020-523010 and 2020-519246, as listed above, are technology for executing a molecular dynamics simulation of a biopolymer, technology for predicting a three dimensional structure of a protein using a computer, and technology for predicting a peptide that is effective as a neoantigen. None of them is technology for predicting membrane permeability of a peptide. The technologies in the citations above are accordingly unable to predict peptide membrane permeability.
In consideration of the above circumstances, an object of the present disclosure is to predict membrane permeability of a peptide.
A prediction device, a prediction method, and a recording medium recorded with a prediction program of a first aspect of the present disclosure are configured to extract a predictive feature vector expressing a feature from a peptide that is a target for membrane permeability prediction, to adjust such that a length of the extracted predictive feature vector is a prescribed length, and to generate a predicted value of membrane permeability for the prediction target peptide by inputting the length-adjusted predictive feature vector into a trained model pre-trained to output a predicted value of peptide membrane permeability from a feature vector expressing a feature of a peptide.
A trained model generation device, a trained model generation method, and a recording medium recorded with a trained model generation program according to a second aspect of the present disclosure are configured to extract a training feature vector expressing a feature from each of plural training peptides, to adjust the length of each of the extracted training feature vectors so as to be a prescribed length, and to generate a trained model, for outputting a predicted value of peptide membrane permeability from a feature vector expressing a feature of a peptide, by executing a machine learning algorithm based on training data that is the length-adjusted training feature vectors paired with correct values of membrane permeability for the training peptides.
A prediction device, a prediction method, and a recording medium recorded with a prediction program of a third aspect of the present disclosure are configured to extract, from a cyclic peptide that is a target for membrane permeability prediction, respective predictive feature vectors expressing a feature for instances in which each of plural residues contained in the cyclic peptide is at a start point of a cyclic sequence, and to generate a predicted value of membrane permeability of the prediction target cyclic peptide by inputting the plural extracted predictive feature vectors into a trained model pre-trained for outputting a predicted value of peptide membrane permeability from a feature vector expressing a feature of a cyclic peptide.
A trained model generation device, a trained model generation method, and a recording medium recorded with a trained model generation program of a fourth aspect of the present disclosure are configured to extract from each of plural training cyclic peptides a training feature vector expressing a feature for instances in which each of plural residues contained in the training cyclic peptide is at a start point of a cyclic sequence, and to generate a trained model for outputting a predicted value of cyclic peptide membrane permeability from a feature vector expressing a feature of a cyclic peptide by executing a machine learning algorithm based on training data that is the plural extracted training feature vectors for the plural training cyclic peptides paired with a correct value of membrane permeability for the respective training cyclic peptide.
A prediction device, a prediction method, and a recording medium recorded with a prediction program of a fifth aspect of the present disclosure are a prediction device, a prediction method, and a recording medium recorded with a prediction program for computing a predicted value of membrane permeability of a peptide permeating through a membrane region representing a cell membrane, a first solvent region representing a solvent adjacent to one side of the membrane region, and a second solvent region representing a solvent adjacent to another side of the membrane region. Based on a result of simulation of a peptide permeating through the first solvent region, the membrane region, and the second solvent region, a free energy G(z) of the peptide is computed for each reaction coordinate z expressing a position of the peptide in a region including the first solvent region, the membrane region, and the second solvent region and expressing a position of the peptide in a direction of an axis perpendicular to a membrane surface of the membrane region. At each of the reaction coordinates z, a difference ΔG(z) is computed between a minimum value Gmin out of the free energies G(z) of the peptide computed for the reaction coordinates z and the free energy G(z) of the peptide at the reaction coordinate z. A local diffusion coefficient D(z) is computed at each of the reaction coordinates z, and a value R(z) expressing a local resistance of the peptide at the reaction coordinate z is computed based on the difference ΔG(z) computed for each of the reaction coordinates z and based on the local diffusion coefficient D(z) computed for each of the reaction coordinates z. A predicted value of membrane permeability is computed for the peptide based on the values R(z) expressing the local resistances computed for the respective reaction coordinates z.
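The following Python sketch illustrates one way the quantities of the fifth aspect could be chained together. It assumes the inhomogeneous solubility-diffusion form R(z) = exp(ΔG(z)/(k_B T)) / D(z) and a predicted permeability P = 1 / ∫R(z) dz evaluated with the trapezoidal rule; the text above does not state these formulas, so treat them as an assumption. The function name `permeability_isd`, the kcal/mol energy unit, and the 300 K default temperature are likewise hypothetical choices.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption


def permeability_isd(z, G, D, temperature=300.0):
    """Sketch: membrane permeability from G(z) and D(z).

    z : reaction coordinates, sorted ascending (e.g. nm)
    G : free energy G(z) of the peptide at each z (kcal/mol)
    D : local diffusion coefficient D(z) at each z (e.g. nm^2/s)
    Returns P = 1 / (integral of R(z) dz), e.g. in nm/s.
    """
    if not (len(z) == len(G) == len(D)) or len(z) < 2:
        raise ValueError("z, G and D must be equal-length sequences (>= 2 points)")
    beta = 1.0 / (K_B * temperature)
    g_min = min(G)                                  # Gmin over all reaction coordinates
    delta_g = [g - g_min for g in G]                # ΔG(z) = G(z) - Gmin, never negative
    R = [math.exp(beta * dg) / d for dg, d in zip(delta_g, D)]  # local resistance R(z)
    # Trapezoidal rule for the total resistance along the whole permeation path.
    total = sum(0.5 * (R[i] + R[i + 1]) * (z[i + 1] - z[i]) for i in range(len(z) - 1))
    return 1.0 / total
```

With a flat free-energy profile, ΔG(z) is zero everywhere and R(z) reduces to 1/D(z); any variation in G(z) raises R(z) somewhere and therefore lowers the predicted value.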
A prediction device, a prediction method, and a recording medium recorded with a prediction program according to a sixth aspect of the present disclosure are a prediction device, a prediction method, and a recording medium recorded with a prediction program for simulating dynamics of a peptide permeating through a membrane region representing a cell membrane, a first solvent region representing a solvent adjacent to one side of the membrane region, and a second solvent region representing a solvent adjacent to another side of the membrane region. An initial conformation of the peptide is set according to a relative substance permittivity in the first solvent region for simulation of the peptide permeating a segment spanning from the first solvent region to a vicinity of a lipid molecule join positioned further toward a membrane center side than a boundary between the first solvent region and the membrane region. An initial conformation of the peptide is set according to a relative substance permittivity in the membrane region for simulation of the peptide permeating a segment spanning from the vicinity of the join to past a region of a membrane central zone representing a central area of the membrane region. Dynamics of the peptide are simulated according to the set initial conformation of the peptide and a series of initial conformations are set at respective regions using an umbrella sampling method based on a result of simulation obtained. The dynamics of the peptide are simulated according to an umbrella sampling method based on the series of initial conformations set for each of the regions, and membrane permeability of the peptide is predicted based on a result of simulation based on an umbrella sampling.
A prediction device, a prediction method, and a recording medium recorded with a prediction program according to a seventh aspect of the present disclosure are a prediction device, a prediction method, and a recording medium recorded with a prediction program for simulating dynamics of a peptide permeating through a membrane region representing a cell membrane, a first solvent region representing a solvent adjacent to one side of the membrane region, and a second solvent region representing a solvent adjacent to another side of the membrane region. When simulating permeation of the peptide using an umbrella sampling method, a spacing between restraint positions of the peptide is set so as to be finer the closer a region is to a membrane central zone representing a central area of the membrane region. Dynamics of the peptide are simulated using an umbrella sampling method according to the spacing set between the restraint positions, and membrane permeability of the peptide is predicted based on a result of the simulation.
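As a minimal sketch of the restraint spacing in the seventh aspect, the function below generates umbrella-sampling restraint centres along one half of the reaction coordinate, from a point in the solvent down to the membrane centre (taken here as z = 0), using a coarse spacing outside an inner zone and a fine spacing inside it. The two-tier scheme, the function name `restraint_positions`, and all numeric values are hypothetical; the text only requires that the spacing become finer closer to the membrane central zone.

```python
def restraint_positions(z_far, inner, coarse, fine):
    """Hypothetical two-tier spacing of umbrella-sampling restraint centres.

    z_far  : starting position in the solvent (distance from membrane centre)
    inner  : half-width of the zone treated as the membrane central zone
    coarse : spacing used outside the inner zone
    fine   : smaller spacing used inside the inner zone
    Assumes (z_far - inner) and inner are multiples of coarse and fine.
    Returns positions ordered from z_far down to 0.0 (membrane centre).
    """
    if not 0 < fine < coarse:
        raise ValueError("expected 0 < fine < coarse")
    n_outer = round((z_far - inner) / coarse)   # windows between z_far and inner
    n_inner = round(inner / fine)               # windows between inner and 0
    outer = [z_far - i * coarse for i in range(n_outer)]
    central = [inner - j * fine for j in range(n_inner + 1)]
    # Round to suppress floating-point noise from the multiplications.
    return [round(p, 6) for p in outer + central]
```

For example, `restraint_positions(2.0, 1.0, 0.2, 0.1)` yields five windows spaced 0.2 apart from 2.0 down to 1.2, followed by eleven windows spaced 0.1 apart from 1.0 down to 0.0. Under a symmetric-membrane assumption, the other half of the path would mirror these positions.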
A prediction device, a prediction method, and a recording medium recorded with a prediction program according to an eighth aspect of the present disclosure are configured to generate a first membrane permeability value expressing peptide membrane permeability by simulating dynamics of the peptide permeating through a membrane region representing a cell membrane, a first solvent region representing a solvent adjacent to one side of the membrane region, and a second solvent region representing a solvent adjacent to another side of the membrane region, to generate a second membrane permeability value expressing membrane permeability of the peptide by extracting a predictive feature vector expressing a feature from the peptide and inputting the predictive feature vector into a trained model previously subjected to machine learning, and to compute a predicted value of membrane permeability of the peptide by consolidating the generated first membrane permeability value with the generated second membrane permeability value.
A trained model generation device, a trained model generation method, and a recording medium recorded with a trained model generation program according to a ninth aspect of the present disclosure are configured to generate a predicted value of membrane permeability expressing membrane permeability of a peptide by simulating dynamics of the peptide permeating through a membrane region representing a cell membrane, a first solvent region representing a solvent adjacent to one side of the membrane region, and a second solvent region representing a solvent adjacent to another side of the membrane region, to generate simulation-derived training data expressed by the obtained predicted value of peptide membrane permeability paired with a feature vector generated from a 3D descriptor obtained from a tertiary structure of the peptide at each location in the membrane region, the first solvent region, or the second solvent region, and to generate a trained model, for outputting a predicted value of the membrane permeability from the feature vector, by executing a machine learning algorithm based on training data including the generated simulation-derived training data.
A trained model generation device, a trained model generation method, and a recording medium recorded with a trained model generation program according to a tenth aspect of the present disclosure are configured to extract a first training feature vector expressing a feature from each of plural training cyclic peptides, to generate plural second training feature vectors from each of the extracted first training feature vectors by cyclically shifting elements of the first training feature vector, to generate training data expressed by the first training feature vector and the plural second training feature vectors paired with a correct value of membrane permeability of the respective training cyclic peptide, and to generate a trained model, for outputting a predicted value of membrane permeability of a cyclic peptide from a feature vector expressing a feature of a cyclic peptide, by executing a machine learning algorithm based on the plural generated training data.
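The augmentation step of the tenth aspect can be sketched as follows, assuming each first training feature vector is a plain list whose elements follow the ring order of residues. Every cyclic shift, including the unshifted vector, is paired with the same correct value, because all shifts describe the same cyclic peptide. The function names are hypothetical.

```python
def cyclic_shifts(vector):
    """Return every cyclic shift of a feature vector; shift 0 is the original."""
    vector = list(vector)
    return [vector[k:] + vector[:k] for k in range(len(vector))]


def augment_training_data(pairs):
    """pairs: iterable of (first_training_feature_vector, correct_value).

    Returns training data in which the first training feature vector and each
    of its second (shifted) training feature vectors carry the same label.
    """
    training_data = []
    for vector, correct_value in pairs:
        for shifted in cyclic_shifts(vector):
            training_data.append((shifted, correct_value))
    return training_data
```

If each residue contributes several elements, shifting by whole residues (a step equal to the per-residue width) would be the natural variant; that refinement is not shown here.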
A trained model generation device, a trained model generation method, and a recording medium recorded with a trained model generation program according to an eleventh aspect of the present disclosure are configured to generate a trained convolutional neural network model, for outputting a predicted value of membrane permeability of a cyclic peptide from a feature vector expressing a feature of a cyclic peptide, by executing a machine learning algorithm that uses a convolutional neural network model including a both-end-adjacency layer in which elements at both ends of the training feature vectors are placed adjacent to one another, based on training data expressed by a training feature vector expressing a feature extracted from each of plural training cyclic peptides paired with correct values of membrane permeability for the plural training cyclic peptides.
A prediction device, a prediction method, and a recording medium recorded with a prediction program according to a twelfth aspect of the present disclosure are configured to extract a predictive feature vector expressing a feature from a cyclic peptide that is a target for membrane permeability prediction, and to generate a predicted value of membrane permeability of the prediction target peptide by inputting the extracted predictive feature vector into a trained convolutional neural network model that outputs a predicted value of membrane permeability of a peptide from the feature vector and that includes a both-end-adjacency layer in which elements at both ends of a feature vector expressing a feature of a cyclic peptide are placed adjacent to one another.
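One common way to realise a layer in which the two ends of a feature vector are adjacent is circular (wrap-around) padding before a one-dimensional convolution; whether the disclosed both-end-adjacency layer works exactly this way is an assumption here. The pure-Python sketch below, with the hypothetical name `circular_conv1d`, wraps the input so that the first element's left neighbour is the last element, and vice versa. Deep learning frameworks offer comparable behaviour, for example PyTorch's `torch.nn.Conv1d` with `padding_mode='circular'` and a non-zero `padding`.

```python
def circular_conv1d(x, kernel):
    """Cross-correlate x with an odd-length kernel using wrap-around padding.

    The output has the same length as x, and the two ends of x are treated
    as neighbours, matching the ring topology of a cyclic peptide.
    """
    if len(kernel) % 2 != 1:
        raise ValueError("kernel length must be odd")
    x = list(x)
    half = len(kernel) // 2
    if half > len(x):
        raise ValueError("kernel is too long for this input")
    # Both-end adjacency: prepend the tail of x and append its head.
    padded = (x[-half:] + x + x[:half]) if half else x
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(x))
    ]
```

Because the padding wraps around, rotating the input simply rotates the output, so the layer does not privilege any residue as the start of the sequence.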
The present disclosure obtains the advantageous effect of being able to predict membrane permeability of a peptide.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Detailed explanation follows regarding exemplary embodiments of the present invention, with reference to the drawings.
The prediction device 10 of the present exemplary embodiment predicts membrane permeability of cyclic peptides.
Training peptide information expressing cyclic peptides used for training and correct values for membrane permeability of these training cyclic peptides are stored associated with each other in the data storage section 12. Note that the peptide information is information including at least one of a chemical formula of the peptide, SMILES notation of the peptide, a primary structure of the peptide, a secondary structure of the peptide, a tertiary structure of the peptide, or a quaternary structure of the peptide.
The correct values for membrane permeability of the training cyclic peptides are, for example, data obtained by performing known experiments on the training cyclic peptides.
The training extraction section 14 extracts training feature vectors expressing features of cyclic peptides from the plural training peptide information stored in the data storage section 12. Note that the feature vectors are extracted from the peptide information using a known method.
For example, for a feature vector configuration in which the residue 1 illustrated in
Thus, even for the same cyclic peptide, the feature vectors differ depending on which residue is at the start point of the cyclic sequence. Membrane permeability of cyclic peptides cannot be appropriately predicted in such cases.
To address this, in the present exemplary embodiment, the respective feature vectors are extracted for instances in which each of the plural residues contained in the cyclic peptide is at the start point of the cyclic sequence, and the membrane permeability is predicted based on these plural feature vectors.
Specifically, from peptide information for each of plural items of training cyclic peptides, the training extraction section 14 extracts feature vectors expressing features for instances in which each of the plural residues contained in the training cyclic peptide is at the start point of the cyclic sequence.
For example, the training extraction section 14 extracts a feature vector 1 for an instance in which the residue 1 illustrated in
The training extraction section 14 sets each single extracted feature vector as a single training feature vector. Thus, a set of feature vectors extracted from a single training cyclic peptide corresponds to a training feature vector set.
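The extraction for every start point can be sketched as below. The sketch assumes, hypothetically, that per-residue descriptors are already available as lists in ring order; the feature vector for a given start residue is then the concatenation of the descriptors read around the ring from that residue. Descriptor contents and the function name are illustrative only.

```python
def start_point_feature_vectors(residue_features):
    """residue_features: list of per-residue descriptor lists, in ring order.

    Returns one flattened feature vector per choice of start residue, i.e.
    the set of feature vectors extracted from a single cyclic peptide.
    """
    n = len(residue_features)
    vectors = []
    for start in range(n):
        # Read the ring starting from residue `start`, wrapping to the beginning.
        ordered = residue_features[start:] + residue_features[:start]
        vectors.append([value for residue in ordered for value in residue])
    return vectors
```

A ring of n residues yields n feature vectors, which together form the training feature vector set for that peptide.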
For each of the plural training cyclic peptides, the training extraction section 14 associates the set of training feature vectors with a correct value for membrane permeability of the training peptide, and stores these in the training data storage section 16.
Plural training data are stored in the training data storage section 16. A single item of training data is training feature vectors paired with a correct value for membrane permeability of the training peptide.
The training section 18 generates a trained model, for outputting a predicted value for peptide membrane permeability from feature vectors, by executing a known supervised machine learning algorithm based on the plural training data stored in the training data storage section 16. The training section 18 then stores the trained model in the trained model storage section 20. Note that the trained model itself is a known model, and may for example be a neural network model, a support vector machine, a logistic regression model, or the like. Note that neural network models include deep neural network models obtained by deep learning.
Note that, as described below, plural feature vectors are also extracted from the cyclic peptide that is the target for membrane permeability prediction by employing different start points for the cyclic sequence. Inputting each of these plural feature vectors into the trained model yields a predicted value of membrane permeability for each of the plural feature vectors.
The trained model generated by the training section 18 is stored in the trained model storage section 20. Note that the trained model is data in which a structure and trained parameters of a model are associated with each other.
The extraction section 22 extracts feature vectors expressing features from the cyclic peptide that is the target for membrane permeability prediction. Specifically, from the peptide information regarding the cyclic peptide that is the target for membrane permeability prediction, the extraction section 22 extracts respective feature vectors (hereafter referred to as predictive feature vectors) expressing features for instances in which each of the plural residues contained in the cyclic peptide is at the start point of the cyclic sequence.
The generation section 24 generates a predicted value of membrane permeability for the prediction target cyclic peptide by inputting the plural predictive feature vectors obtained by the extraction section 22 into the trained model stored in the trained model storage section 20.
Specifically, the generation section 24 generates respective predicted values for membrane permeability of the prediction target peptide by inputting each of plural predictive feature vectors obtained by the extraction section 22 into the trained model. Note that a single predicted value corresponds to a single predictive feature vector. The generation section 24 then generates a representative value for the plural predicted values and sets the representative value as the membrane permeability of the peptide that is the prediction target. For example, the generation section 24 may generate an average value of the plural predicted values as the representative value. Alternatively, the generation section 24 may generate a maximum value or a minimum value of the plural predicted values as the representative value.
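The aggregation performed by the generation section 24 can be sketched as follows. `model` stands for any callable that maps one predictive feature vector to a predicted value; it is a hypothetical stand-in for the trained model, and the `representative` options mirror the average, maximum, and minimum choices named above.

```python
from statistics import mean


def predict_with_representative(model, predictive_vectors, representative="mean"):
    """Return (per-vector predicted values, representative value)."""
    predictions = [model(v) for v in predictive_vectors]  # one value per vector
    reducers = {"mean": mean, "max": max, "min": min}
    if representative not in reducers:
        raise ValueError(f"unknown representative: {representative}")
    return predictions, reducers[representative](predictions)
```

Returning both the individual predictions and the representative value matches the option of displaying either one.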
Note that either the plural predicted values or the representative value for membrane permeability generated by the generation section 24 may be displayed on a display section (not illustrated in the drawings).
In this manner, the prediction device 10 of the first exemplary embodiment extracts respective feature vectors for instances in which each of the plural residues contained in a cyclic peptide is at the start point of the cyclic sequence, and predicts membrane permeability based on these plural feature vectors. This obtains plural feature vectors in consideration of rotational symmetry of the cyclic peptide, and enables membrane permeability of the cyclic peptide to be predicted in an appropriate manner based on these feature vectors.
The prediction device 10 may for example be implemented by a computer 50 such as that illustrated in
The storage section 53 may be implemented by a hard disk drive (HDD), a solid state drive (SSD), flash memory, or the like. The storage section 53 serves as a storage medium and stores a program for causing the computer 50 to function. The CPU 51 reads the program from the storage section 53, expands the program in the memory 52, and sequentially executes processes in the program.
Next, explanation follows regarding operation of the prediction device 10 of the first exemplary embodiment.
On receiving an instruction signal indicating an instruction to perform trained model generation processing, the prediction device 10 executes a trained model generation processing routine as illustrated in
At step S100, the training extraction section 14 extracts, from each of the peptide information for plural training cyclic peptides, the training feature vectors expressing features for the instances in which each of the plural residues contained in a training cyclic peptide is at the start point of the cyclic sequence.
At step S102, the training extraction section 14 associates the set of training feature vectors extracted at step S100 with a correct value for membrane permeability of the training cyclic peptide to generate training data, and temporarily stores this training data in the training data storage section 16.
At step S104, the training section 18 generates a trained model, for outputting a predicted value for peptide membrane permeability from feature vectors, by executing a known supervised machine learning algorithm based on the plural training data stored in the training data storage section 16.
At step S106, the training section 18 stores the trained model generated at step S104 in the trained model storage section 20.
When the trained model has been stored in the trained model storage section 20 and the peptide information for the target for membrane permeability prediction has been input to the prediction device 10, the prediction device 10 executes the prediction processing routine illustrated in
At step S200, the extraction section 22 receives the peptide information for the target for membrane permeability prediction.
At step S202, from the peptide information received at step S200, the extraction section 22 extracts respective predictive feature vectors expressing features for instances in which each of the plural residues contained in the cyclic peptide is at the start point of the cyclic sequence.
At step S204, the generation section 24 generates plural predicted values of membrane permeability for the prediction target peptide by inputting each of the plural predictive feature vectors extracted at step S202 into the trained model stored in the trained model storage section 20.
At step S206, the generation section 24 generates a representative value from the plural predicted values generated at step S204.
At step S208, the generation section 24 outputs the representative value of the predicted values of membrane permeability generated at step S206 as a result.
As described in detail above, the prediction device of the first exemplary embodiment extracts from each of plural training cyclic peptides a set of training feature vectors expressing features for instances in which each of plural residues contained in the training cyclic peptide is at the start point of the cyclic sequence. Then, for each of the plural training cyclic peptides, the prediction device executes a machine learning algorithm based on the training data of the extracted plural training feature vectors paired with correct values for the membrane permeability of the training cyclic peptides, so as to generate a trained model for outputting a predicted value of membrane permeability for a cyclic peptide from feature vectors expressing cyclic peptide features. This enables a trained model to be obtained for predicting the membrane permeability of cyclic peptides. Note that the trained model is trained based on training feature vectors for instances in which each of the plural residues is at the start point of the cyclic sequence, and so the model is suited to predicting the membrane permeability of cyclic peptides.
Moreover, the prediction device of the first exemplary embodiment extracts from a cyclic peptide that is the target for membrane permeability prediction respective feature vectors expressing features for instances in which each of plural residues contained in the cyclic peptide is at the start point of the cyclic sequence. The prediction device then generates a predicted value of membrane permeability for the prediction target cyclic peptide by inputting the plural feature vectors into the trained model. This enables the membrane permeability of the cyclic peptide to be predicted. Specifically, as described previously, the trained model is trained based on the training feature vectors for instances in which each of the plural residues is at the start point of the cyclic sequence, and so the model is suited to predicting the membrane permeability of cyclic peptides. This enables predicted values for membrane permeability to be generated in consideration of the cyclic peptide structure.
Next, explanation follows regarding a second exemplary embodiment. A prediction device of the second exemplary embodiment differs from the first exemplary embodiment in that lengths of the plural feature vectors are aligned. Note that although an example of a case applied to cyclic peptides as the target has been described in the first exemplary embodiment, there is no limitation to cyclic peptides in the second exemplary embodiment, and linear peptides may be the target. Moreover, similar portions in the configuration of the prediction device according to the second exemplary embodiment to those of the prediction device of the first exemplary embodiment are allocated the same reference numerals, and explanation thereof is omitted.
The training adjustment section 15 performs adjustment such that the respective lengths of the training feature vectors of the plural training peptides extracted by the training extraction section 14 become a prescribed length.
Each peptide includes plural residues, and the number of feature vector elements corresponds to the number of residues. The length of the feature vectors therefore differs between peptides that have different numbers of residues. However, feature vectors input into a trained model such as a neural network model preferably have a uniform length. For example, in cases in which a feature vector has ten elements, the input layer of the neural network model, an example of a trained model, must correspondingly have ten nodes.
Thus, in cases in which the lengths of the feature vectors extracted from each of the plural peptides differ, unless some appropriate measure is taken, a trained model employing a machine learning algorithm such as a neural network model cannot be built, or the peptide membrane permeability cannot be predicted using such a trained model.
To address this, in the prediction device of the second exemplary embodiment, the lengths of the feature vectors extracted from the peptides are aligned, thereby enabling training to be performed using a machine learning algorithm that employs these feature vectors. Furthermore, the peptide membrane permeability can be predicted using a trained model obtained by training.
Specifically, for example, the training adjustment section 15 identifies the training feature vector with the maximum length from out of the plural training feature vectors, and performs adjustment such that the lengths of the plural other training feature vectors become this maximum length. Alternatively, for example, the training adjustment section 15 may perform adjustment such that the respective lengths of the plural training feature vectors become a prescribed length. Note that the prescribed length in such cases may be preset by a user.
For example, the training adjustment section 15 may align the lengths of the training feature vectors by converting them using a known padding method. A padding method is a method in which a vacant location of a target is filled with a substitute value or the like. Thus, for example, in the case of a training feature vector [0.13, 0.45, 0.82] with a length of three, if the prescribed length is five then the training adjustment section 15 may use a padding method so as to generate a training feature vector with a length of five such as [0.00, 0.13, 0.45, 0.82, 0.00]. Note that when adjusting the lengths of the training feature vectors, the training adjustment section 15 may add an element containing information about the length prior to adjustment, such as the number of residues prior to length adjustment.
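A minimal padding sketch, assuming zero fill split evenly between both ends as in the five-element example above; when the split is uneven, the extra element goes to the right end, which is an assumption. The optional trailing element records the pre-adjustment length, as the text permits. The function name is hypothetical.

```python
def pad_to_length(vector, target, fill=0.0, append_original_length=False):
    """Pad vector on both ends with fill so that its length becomes target."""
    vector = list(vector)
    if len(vector) > target:
        raise ValueError("vector is longer than the prescribed length")
    missing = target - len(vector)
    left = missing // 2
    right = missing - left
    padded = [fill] * left + vector + [fill] * right
    if append_original_length:
        padded.append(float(len(vector)))  # e.g. number of residues before padding
    return padded
```

For the maximum-length variant, the target can be computed first as `max(len(v) for v in vectors)` and every vector padded to it.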
Alternatively, for example, the training adjustment section 15 may align the lengths of the training feature vectors by conversion using a linear interpolation method. Specifically, the training adjustment section 15 may compute a feature value x′, this being an element of a training feature vector, using the following Equation (1).
where x_i is the feature value at residue position i of a peptide with residue length k (1 ≤ i ≤ k), and
x′_j is the jth feature value of the sequence of length m after interpolation (1 ≤ j ≤ m).
The training adjustment section 15 converts a training feature vector of length k, obtained from a peptide with residue length k, into a training feature vector of length m according to Equation (1). Note that x_i is the feature value at the ith element of the training feature vector x prior to conversion, and x′_j is the feature value at the jth element of the training feature vector x′ after conversion. The lengths of the plural training feature vectors are aligned in this manner.
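Because Equation (1) itself is not reproduced in this text, the following is only a sketch of one common form of linear interpolation, assumed here for illustration: output element j is placed at the fractional residue position t = 1 + (j - 1)(k - 1)/(m - 1) on the 1..k grid and is interpolated between the two neighbouring residues. The function name interpolate_feature_vector is hypothetical.

```python
def interpolate_feature_vector(x, m):
    """Resample a length-k feature vector x to length m by linear interpolation.

    Assumed mapping: x'_j sits at position t = 1 + (j - 1) * (k - 1) / (m - 1)
    on the 1-based residue grid and mixes the two neighbouring residues.
    """
    k = len(x)
    if k == 0 or m < 2:
        raise ValueError("need a non-empty vector and m >= 2")
    if k == 1:
        return [float(x[0])] * m  # a single residue gives a constant vector
    out = []
    for j in range(1, m + 1):
        t = 1 + (j - 1) * (k - 1) / (m - 1)  # fractional 1-based position
        i = min(int(t), k - 1)               # left neighbour, clamped so i + 1 <= k
        w = t - i                            # weight of the right neighbour
        out.append((1 - w) * x[i - 1] + w * x[i])
    return out
```

The same function both stretches (m > k) and shrinks (m < k) a vector, so every peptide can be mapped onto one prescribed length m regardless of its residue count.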
The training adjustment section 15 then associates the training feature vectors having aligned lengths with the correct values for membrane permeability of the corresponding training peptides, and stores these in the training data storage section 16.
There are plural training data stored in the training data storage section 16.
The training section 18 generates a trained model, for outputting a predicted value for peptide membrane permeability from feature vectors, by executing a known supervised machine learning algorithm based on the plural training data stored in the training data storage section 16. The training section 18 then stores the trained model in the trained model storage section 20.
The trained model generated by the training section 18 is stored in the trained model storage section 20.
The extraction section 22 extracts predictive feature vectors expressing features of the cyclic peptide that is the target for membrane permeability prediction.
The adjustment section 23 performs adjustment such that the lengths of the predictive feature vectors extracted by the extraction section 22 are the same prescribed length as those in the training data. Specifically, the adjustment section 23 adjusts the lengths of the predictive feature vectors using a similar method to the training adjustment section 15 as described above.
The generation section 24 generates a predicted value for membrane permeability of the prediction target peptide by inputting the predictive feature vectors with their lengths adjusted by the adjustment section 23 into the trained model stored in the trained model storage section 20.
Note that the predicted values for membrane permeability generated by the generation section 24 are displayed on a display section (not illustrated in the drawings).
Next, explanation follows regarding operation of the prediction device 210 of the second exemplary embodiment.
On receiving an instruction signal indicating an instruction to perform trained model generation processing, the prediction device 210 executes the trained model generation processing routine illustrated in
At step S300, the training extraction section 14 extracts training feature vectors expressing features of the training peptide from the plural training peptide information stored in the data storage section 12.
At step S302, the training adjustment section 15 performs adjustment such that the respective lengths of the training feature vectors for the plural training peptides extracted at step S300 become a prescribed length.
At step S304, the training adjustment section 15 associates the training feature vectors having lengths aligned at step S302 with respective correct values for membrane permeability of the training peptides and generates training data to be temporarily stored in the training data storage section 16.
At step S306, the training section 18 generates a trained model, for outputting a predicted value for peptide membrane permeability from feature vectors expressing peptide features, by executing a known supervised machine learning algorithm based on the plural training data stored in the training data storage section 16.
At step S308, the training section 18 stores the trained model generated at step S306 in the trained model storage section 20.
When the trained model has been stored in the trained model storage section 20, and the peptide information for the target for membrane permeability prediction has been input to the prediction device 210, the prediction device 210 executes the prediction processing routine illustrated in
At step S400, the extraction section 22 receives the peptide information for the target for membrane permeability prediction.
At step S402, the extraction section 22 extracts predictive feature vectors from the peptide information received at step S400.
At step S404, the adjustment section 23 performs adjustment such that the lengths of the predictive feature vectors extracted at step S402 become the prescribed length.
At step S406, the generation section 24 generates a predicted value of membrane permeability for the prediction target peptide by inputting the predictive feature vectors having lengths adjusted at step S404 into the trained model stored in the trained model storage section 20.
At step S408, the generation section 24 outputs the predicted value for membrane permeability generated at step S406 as a result.
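Steps S300 to S308 and S400 to S408 can be summarised in the following sketch. It assumes a hypothetical one-value-per-residue descriptor table with illustrative numbers (not real data), right-side zero padding, and a k-nearest-neighbour regressor as a stand-in for the unspecified "known supervised machine learning algorithm"; none of these choices comes from the embodiment. What the sketch is meant to show is that prediction reuses the same prescribed length that was applied during training.

```python
from math import dist

# Hypothetical per-residue descriptor values (illustrative only, not real data).
RESIDUE_FEATURE = {"A": 0.31, "L": 1.70, "G": 0.00, "P": 0.72, "F": 1.79}


def extract_features(sequence):
    """S300 / S402: one feature value per residue (hypothetical descriptor)."""
    return [RESIDUE_FEATURE[residue] for residue in sequence]


def adjust_length(vec, length):
    """S302 / S404: right-pad with zeros up to the prescribed length."""
    if len(vec) > length:
        raise ValueError("feature vector is longer than the prescribed length")
    return vec + [0.0] * (length - len(vec))


def train(training_peptides, length, k=3):
    """S300-S308: pair length-adjusted vectors with correct values and 'fit'.

    The stand-in k-nearest-neighbour regressor simply stores the training
    data; the returned dict plays the role of the stored trained model.
    """
    training_data = [
        (adjust_length(extract_features(seq), length), y)
        for seq, y in training_peptides
    ]
    return {"length": length, "k": k, "data": training_data}


def predict(model, sequence):
    """S400-S408: extract, adjust to the same prescribed length, then predict."""
    x = adjust_length(extract_features(sequence), model["length"])
    nearest = sorted(model["data"], key=lambda pair: dist(pair[0], x))[: model["k"]]
    return sum(y for _, y in nearest) / len(nearest)


if __name__ == "__main__":
    # Illustrative correct values only (e.g. a log-scale permeability).
    model = train([("ALG", -6.0), ("LLF", -5.0), ("GGP", -7.0), ("AFLP", -5.5)], length=5)
    print(predict(model, "ALF"))
```

Any regressor that accepts fixed-length vectors could replace the k-nearest-neighbour stand-in without changing the length-adjustment steps.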
As described in detail above, the prediction device of the second exemplary embodiment performs adjustment such that the respective lengths of the training feature vectors for the plural training peptides become the prescribed length. The prediction device then generates a trained model, for outputting a predicted value for peptide membrane permeability from feature vectors extracted from peptides, by executing a machine learning algorithm based on the training data in which the length-adjusted training feature vectors are paired with the respective correct values for membrane permeability of the training peptides. Thus, a trained model for predicting peptide membrane permeability can be obtained even when the peptides are configured from different numbers of residues.
Moreover, the prediction device of the second exemplary embodiment generates a predicted value for membrane permeability of a prediction target peptide by adjusting the length of the feature vectors extracted from the peptide that is the target for membrane permeability prediction so as to become the prescribed length, and inputting the length-adjusted feature vectors into the trained model. This enables the peptide membrane permeability to be predicted even when peptides are configured from different numbers of residues.
Next, explanation follows regarding a third exemplary embodiment. A prediction device of the third exemplary embodiment differs from the first and second exemplary embodiments in respect that the training data is augmented by data augmentation that focuses on the structural properties of a cyclic peptide, and a trained model is generated based on this augmented training data. Note that similar portions in the configuration of the prediction device according to the third exemplary embodiment to those of the prediction devices of the first and second exemplary embodiments are allocated the same reference numerals, and explanation thereof is omitted.
When augmenting the training feature vectors, the prediction device of the third exemplary embodiment performs a length adjustment similar to that in the second exemplary embodiment, and then cyclically shifts the elements of the training feature vectors so as to generate plural training feature vectors. This enables the training data to be augmented while taking into account the structural characteristics of the cyclic peptides.
The training extraction section 14 of the third exemplary embodiment extracts a set of first training feature vectors expressing features of a training cyclic peptide from out of each of the plural peptide information regarding training cyclic peptides.
Specifically, first, the training data generation section 315 aligns the lengths of the plural first training feature vectors to a prescribed length, similarly to in the second exemplary embodiment. Next, for each of the first training feature vectors included in the first training feature vector set extracted by the training extraction section 14, the training data generation section 315 cyclically shifts elements of the first training feature vectors to generate a set of second training feature vectors.
Next, as illustrated in
The training data generation section 315 generates training data expressed by the first training feature vector set and the second training feature vector set paired with the respective correct values for membrane permeability of the training cyclic peptides. The training data generation section 315 then stores the plural generated items of training data in the training data storage section 16.
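The cyclic-shift augmentation can be sketched as below. This is a minimal illustration assuming the first training feature vectors have already been aligned to the prescribed length, and assuming every non-trivial shift is used; the text does not state how many shifts are generated, and the function names are hypothetical.

```python
def cyclic_shifts(vec):
    """All non-trivial cyclic shifts of a length-aligned feature vector."""
    vec = list(vec)
    return [vec[s:] + vec[:s] for s in range(1, len(vec))]


def augment(first_set, labels):
    """Pair each first vector and its shifted (second) vectors with the same label.

    A shift only changes which residue is written first; the ring of residues,
    and hence the correct membrane permeability value, is unchanged.
    """
    training_data = []
    for vec, y in zip(first_set, labels):
        training_data.append((list(vec), y))
        training_data.extend((shifted, y) for shifted in cyclic_shifts(vec))
    return training_data
```

A vector of prescribed length n therefore yields n training items (the original plus n - 1 shifts), all sharing the original correct value.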
The training section 18 generates a trained model, for outputting a predicted value for membrane permeability of a cyclic peptide from feature vectors expressing cyclic peptide features, by executing a machine learning algorithm based on the plural training data stored in the training data storage section 16.
Note that other configuration and operation of the prediction device 310 of the third exemplary embodiment are similar to those of the first exemplary embodiment or second exemplary embodiment, and so explanation thereof is omitted.
As described above, the prediction device of the third exemplary embodiment extracts the first training feature vectors expressing features from the plural training cyclic peptides. For each of the first training feature vectors, the prediction device adjusts the length of the first training feature vector to a prescribed length, then cyclically shifts the elements of the first training feature vector so as to generate a set of second training feature vectors. The prediction device generates training data expressed by the first training feature vector set and the second training feature vector set paired with the correct values for membrane permeability of the respective training cyclic peptides. The prediction device then generates a trained model, for outputting a predicted value for membrane permeability of a cyclic peptide from feature vectors expressing features of a cyclic peptide, by executing a machine learning algorithm based on the plural generated items of training data. This enables the training data to be augmented while taking into account the structural characteristics of the cyclic peptides. Moreover, the trained model can be obtained based on a large amount of training data generated in consideration of the configuration of the cyclic peptides.
Next, explanation follows regarding a fourth exemplary embodiment. A prediction device of the fourth exemplary embodiment differs from the first to third exemplary embodiments in respect that a predicted value for membrane permeability of a cyclic peptide is generated using a convolutional neural network model including a layer in which elements at both ends of a feature vector are placed adjacent to one another so as to correspond to the structural properties of cyclic peptides. Note that similar portions in the configuration of the prediction device according to the fourth exemplary embodiment to any of those of the prediction devices of the first to third exemplary embodiments are allocated the same reference numerals, and explanation thereof is omitted.
Feature vectors extracted from a cyclic peptide need to express the residues configuring the cyclic peptide as a ring. However, a vector whose elements are simply arrayed in one dimension necessarily has a start end and a terminal end, and thus may not appropriately express the continuity of the ring of residues in a cyclic peptide.
Thus, the prediction device of the fourth exemplary embodiment generates predicted values for membrane permeability of cyclic peptides using a convolutional neural network model including a both-end-adjacency layer in which elements at both ends of a feature vector are placed adjacent to one another. The configuration of the residues of the cyclic peptides is thereby expressed in the convolutional neural network model.
In contrast thereto, the convolutional neural network model of the fourth exemplary embodiment includes a layer that considers the structural features of the cyclic peptide.
Based on plural training data, the training section 18 of the fourth exemplary embodiment generates a trained convolutional neural network model, for outputting a predicted value for membrane permeability of a cyclic peptide from feature vectors, by training a convolutional neural network model including a both-end-adjacency layer in which elements at both ends of a training feature vector are placed adjacent to one another. The training section 18 then stores the trained convolutional neural network model in the trained model storage section 20.
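One way to realise a both-end-adjacency layer is circular (wrap-around) padding ahead of a 1-D convolution, sketched below in plain Python. This is an assumed realisation for illustration, not necessarily the layer used in the embodiment; the function names are hypothetical, and in PyTorch a comparable effect is available through torch.nn.Conv1d with padding_mode="circular".

```python
def circular_pad(vec, width):
    """Both-end-adjacency: wrap the last `width` elements before the first and vice versa."""
    vec = list(vec)
    if not 0 <= width <= len(vec):
        raise ValueError("padding width must be between 0 and the vector length")
    if width == 0:
        return vec
    return vec[-width:] + vec + vec[:width]


def circular_conv1d(vec, kernel):
    """Same-length 1-D convolution (cross-correlation, as in CNN layers) over a ring.

    Because the first and last residues are treated as neighbours, shifting the
    input cyclically shifts the output in the same way.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("an odd kernel size is assumed")
    half = len(kernel) // 2
    padded = circular_pad(vec, half)
    return [
        sum(w * padded[i + t] for t, w in enumerate(kernel))
        for i in range(len(vec))
    ]
```

With zero padding the first and last residues would each see an artificial boundary; with the wrap-around above, every residue sees real neighbours, matching the ring structure of a cyclic peptide.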
The generation section 24 of the fourth exemplary embodiment generates predicted values for membrane permeability of prediction target peptides by inputting feature vectors extracted from a cyclic peptide that is the target for membrane permeability prediction into the trained convolutional neural network model stored in the trained model storage section 20.
Note that other configuration and operation of the prediction device 410 of the fourth exemplary embodiment are similar to those of a prediction device of the first to third exemplary embodiments, and so explanation thereof is omitted.
As described above, based on plural training data, the prediction device of the fourth exemplary embodiment generates a trained convolutional neural network model for outputting a predicted value for membrane permeability of a cyclic peptide from feature vectors, by training a convolutional neural network model including a both-end-adjacency layer in which elements at both ends of training feature vectors are placed adjacent to one another. This enables a trained convolutional neural network model to be obtained that considers the structural characteristics of cyclic peptides.
Moreover, the prediction device generates predicted values for membrane permeability of prediction target peptides by inputting feature vectors extracted from a cyclic peptide that is the target for membrane permeability prediction into the trained convolutional neural network model including the both-end-adjacency layer in which elements at both ends of a feature vector are placed adjacent to one another. This enables predicted values for membrane permeability to be obtained that consider the structural characteristics of cyclic peptides.
Next, explanation follows regarding a fifth exemplary embodiment. A prediction device of the fifth exemplary embodiment differs from the first to fourth exemplary embodiments in respect that predicted values for peptide membrane permeability are generated by molecular dynamics simulation. Note that similar portions in the configuration of the prediction device according to the fifth exemplary embodiment to those of any of the prediction devices of the first to fourth exemplary embodiments are allocated the same reference numerals, and explanation thereof is omitted.
Various data for simulating pharmacokinetics of a peptide by molecular dynamics simulation is stored in the simulation data storage section 31. The simulation section 33, described later, simulates pharmacokinetics of the peptide based on the various data stored in the simulation data storage section 31. Note that data obtained from simulation is also stored in the simulation data storage section 31. The energy computation section 34, described later, computes free energy of the peptide based on the data obtained from simulation. The diffusion coefficient computation section 35, described later, computes a diffusion coefficient based on the data obtained from simulation. The prediction section 36 computes predicted values for membrane permeability based on the free energy of the peptide and the diffusion coefficient.
A tail Ta1, a head He1, and a lipid molecule join Jo1 therebetween are present inside the membrane region C on the side adjacent to the first solvent region W1. The join Jo1 is positioned further toward the membrane center side than the boundary between the first solvent region W1 and the membrane region C.
A tail Ta2, a head He2, and a join Jo2 therebetween are present inside the membrane region C on the side adjacent to the second solvent region W2. A membrane central zone Z0 is present in a central area of the membrane region C.
Note that the membrane region C is a region spanning from the head He1 in contact with the first solvent region W1 across to the head He2 in contact with the second solvent region W2.
As can be seen from
Thus, the following explanation describes a case in which a molecular dynamics simulation is executed for the segment spanning from the first solvent region W1 to sufficiently past the membrane central zone Z0, and in which the simulation result thereof is utilized to obtain a simulation result for the segment spanning from the membrane central zone Z0 to the second solvent region W2.
The setting section 32 sets, as setting information when simulating permeation of the peptide P in the segment spanning from the first solvent region W1 to as far as the vicinity of the join Jo1, an initial conformation of the peptide P as an initial conformation corresponding to the relative substance permittivity in the first solvent region W1.
The setting section 32 also sets, as setting information when simulating permeation of the peptide P in a segment spanning from the vicinity of the join Jo1 to sufficiently past the membrane central zone Z0, an initial conformation of the peptide P as an initial conformation corresponding to the relative substance permittivity in the membrane region C. The initial conformation of the peptide is thereby set to correspond to the surrounding environment in which the peptide is present, enabling simulation of pharmacokinetics of the peptide.
The simulation section 33 executes a molecular dynamics simulation of pharmacokinetics of the peptide. For example, the simulation section 33 executes a molecular dynamics simulation of pharmacokinetics of the peptide using known simulation software such as AMBER (Internet URL=https://ambermd.org/ accessed Feb. 8, 2021), or GROMACS (Internet URL=http://www.gromacs.org/ accessed Feb. 8, 2021).
First, the simulation section 33 simulates permeation of the peptide P in the segment spanning from the first solvent region W1 to as far as the vicinity of the join Jo1, and also simulates permeation of the peptide P in the segment spanning from the vicinity of the join Jo1 to a region sufficiently past the membrane central zone Z0, these corresponding to the initial conformation of the peptide as set by the setting section 32, and stores these simulation results in the simulation data storage section 31.
The setting section 32 acquires initial conformations for a simulation by replica exchange umbrella sampling (hereafter simply referred to as REUS simulation), which is executed after the simulation results have been stored in the simulation data storage section 31. These initial conformations are a series of initial conformations of the peptide in the respective regions at the periphery of the membrane, used when simulating peptide dynamics by REUS simulation. The setting section 32 thereby sets the series of initial conformations for the REUS simulation based on the peptide dynamics simulation results.
Note that in the fifth exemplary embodiment the pharmacokinetics of the peptide are simulated by a known REUS simulation, as described later. When this is performed, there is a need to preset restraint positions of peptide replicas in the respective regions of the first solvent region W1 and of the membrane region C.
The setting section 32 sets the restraint positions of the peptide replicas such that the spacing between them becomes finer (narrower) the closer a region is to the membrane central zone Z0 at the central area of the membrane region C. This enables a smooth exchange of replicas to be performed in the central portion of the membrane, where the change in free energy is predicted to be large. As a result, this enables efficient sampling of the conformation and the like of the peptide inside the membrane.
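The text does not give a rule for how the spacing shrinks toward the membrane central zone Z0, so the following is only a hypothetical scheme for illustration: the spacing decreases linearly from d_max at the starting position to d_min at Z0. The function name, the linear rule, and the example values (in Å) are assumptions.

```python
def restraint_positions(z_start, z_center, d_min, d_max):
    """Place REUS restraint positions from z_start toward z_center (hypothetical rule).

    The spacing shrinks linearly from d_max (at z_start) to d_min (at
    z_center), so windows are denser near the membrane central zone.
    z_center is always included; the final gap may be shorter than d_min.
    """
    if not 0 < d_min <= d_max:
        raise ValueError("need 0 < d_min <= d_max")
    span = abs(z_start - z_center)
    if span == 0:
        raise ValueError("z_start and z_center must differ")
    direction = 1.0 if z_center > z_start else -1.0
    positions, travelled = [z_start], 0.0
    while True:
        frac = travelled / span                 # 0 at z_start, < 1 before z_center
        step = d_max + (d_min - d_max) * frac   # linearly shrinking spacing
        if travelled + step >= span:
            break
        travelled += step
        positions.append(z_start + direction * travelled)
    positions.append(z_center)
    return positions


if __name__ == "__main__":
    # Illustrative values only: from 40 Å in the solvent to the centre at 0 Å.
    print(restraint_positions(40.0, 0.0, 1.0, 4.0))
```

Compared with a uniform spacing of d_min everywhere, a scheme of this kind needs fewer replicas, which is the computation-cost argument made below.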
In a REUS simulation, restraint positions of replicas are set slightly offset from one another, and the structure etc. of the simulation target is exchanged between the respective restraint positions. In the fifth exemplary embodiment also, the replica restraint positions are set slightly offset from one another, and conformation of the peptide is exchanged between the respective restraint positions. A peptide with a new conformation is therefore anticipated to be found at each of the restraint positions.
In the vicinity of the membrane central zone Z0 at the central area of the membrane region C, the free energy of the peptide changes steeply as the reaction coordinate z changes. Namely, the membrane central zone Z0 is considered to be a so-called difficult region for the peptide when permeating the cell membrane. To address this, in the fifth exemplary embodiment, the replica restraint positions of the peptide are set at a narrower spacing the closer they are to the membrane central zone Z0 at the central area of the membrane region C, such that more information relating to the conformation of the peptide in the vicinity of the membrane central zone Z0 is reflected in the simulation.
On current computers, simulating the cell membrane permeability of a peptide incurs an extremely high computation cost, and the greater the number of peptide replicas, the higher the computation cost of the simulation. Thus, rather than simply setting the replica restraint positions at a uniform spacing, setting them at a narrower spacing the closer they are to the membrane central zone Z0, this being the region with the greatest impact on membrane permeability prediction results, as in the fifth exemplary embodiment, enables simulation results to be obtained with good accuracy while suppressing computation cost.
The simulation section 33 then executes a REUS simulation based on the replica restraint positions set by the setting section 32 and on a series of initial conformations of the peptide in the respective regions in the vicinity of the cell membrane, and stores the result thereof in the simulation data storage section 31.
The energy computation section 34 computes a free energy G(z) of the peptide at the respective reaction coordinates z according to a known computation formula based on the REUS simulation result stored in the simulation data storage section 31. Specifically, the energy computation section 34 computes the free energy G(z) of the peptide based on the information regarding the relative position coordinates of the membrane and the peptide in the REUS simulation result stored in the simulation data storage section 31.
Next, for each of the reaction coordinates z at which the REUS simulation was executed, the energy computation section 34 computes a difference ΔG(z) between a minimum value Gmin out of the free energies G(z) of the peptide computed for the respective reaction coordinates z and the free energy G(z) of the peptide at the reaction coordinate z.
Next, the simulation section 33 executes a simulation using an umbrella sampling method (hereafter simply referred to as US simulation), and stores the simulation result in the simulation data storage section 31. Specifically, the simulation section 33 obtains the US simulation result by performing umbrella sampling using a final structure (such as a series of initial conformations) in the results of REUS simulation stored in the simulation data storage section 31. Note that exchange of replicas is not implemented when this umbrella sampling is performed. The US simulation result includes, for each of the reaction coordinates z, a value var(z) expressing variance in peptide centroid position, and a value Czz(t) expressing autocorrelation of centroid position at respective timings t.
Note that the simulation section 33 takes an inverse of the simulation result for the segment spanning from the first solvent region W1 as far as the membrane central zone Z0 in the simulation data storage section 31 and stores this as a simulation result for the segment spanning from the membrane central zone Z0 as far as the second solvent region W2.
The diffusion coefficient computation section 35 computes a local diffusion coefficient D(z) at each of the reaction coordinates z based on the US simulation result stored in the simulation data storage section 31. Specifically, the diffusion coefficient computation section 35 computes at each of the reaction coordinates z the diffusion coefficient D(z) according to Equation (2) below based on the value var(z) expressing variance in peptide centroid position, and the value Czz(t) expressing autocorrelation in centroid position at respective timings t.
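Equation (2) is not reproduced in this text. As a hedged sketch, the following assumes the widely used position-autocorrelation estimator for a restrained coordinate (Hummer's method), D(z) = var(z)² / ∫₀^∞ Czz(t) dt, which is built from exactly the two quantities named above; whether Equation (2) has precisely this form is an assumption. The integral is evaluated by the trapezoidal rule and truncated at the first non-positive autocorrelation value; the function name is hypothetical.

```python
def local_diffusion_coefficient(z_traj, dt, max_lag=None):
    """Estimate D(z) for one umbrella window from its centroid trajectory.

    Assumed estimator: D = var(z)**2 / integral_0^inf C_zz(t) dt, where C_zz(t)
    is the autocorrelation of the centroid fluctuation at lag t.
    """
    n = len(z_traj)
    if n < 3:
        raise ValueError("trajectory too short")
    mean = sum(z_traj) / n
    dz = [z - mean for z in z_traj]
    var = sum(d * d for d in dz) / n       # var(z); equals C_zz(0)
    if max_lag is None:
        max_lag = n // 2
    acf = [var]
    for lag in range(1, max_lag + 1):
        c = sum(dz[i] * dz[i + lag] for i in range(n - lag)) / (n - lag)
        if c <= 0.0:                       # truncate once noise dominates
            break
        acf.append(c)
    # Trapezoidal integral of C_zz(t) from t = 0.
    integral = sum((acf[i] + acf[i + 1]) * dt / 2 for i in range(len(acf) - 1))
    if integral <= 0.0:
        raise ValueError("autocorrelation decays within one sample; reduce dt")
    return var * var / integral
```

If z is in nm and dt in ps, the result is in nm²/ps; one value is computed per umbrella window, giving D(z) on the grid of reaction coordinates.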
The prediction section 36 computes, at each of the reaction coordinates z, a value R(z) expressing the local resistance of the peptide at the reaction coordinate z according to Equation (3) below, based on the difference ΔG(z) and the local diffusion coefficient D(z) at the reaction coordinate z. Note that β in the following equation is a preset coefficient.
The prediction section 36 then computes a predicted value Peff for membrane permeability of the peptide according to Equation (4) below and based on the value R(z) expressing the local resistance computed at the respective reaction coordinates z.
Note that za, zb in the above Equation are coordinates expressing ends of the reaction coordinates z in the simulation. Note that in cases in which a conventional method is adopted (see for example Siewert-Jan Marrink and Herman J. C. Berendsen, “Simulation of water transport through a lipid membrane”, J. Phys. Chem. 1994, 98, 15, 4155-4168), za is set as the first solvent region W1, and zb is set as the second solvent region W2.
The prediction section 36 of the present exemplary embodiment sets za to the reaction coordinate z corresponding to the minimum value Gmin out of the free energies G(z). The prediction section 36 then sets zb to the membrane central zone Z0 and computes a membrane permeability Pflip by evaluating the right side of Equation (4). Moreover, the prediction section 36 sets zb to the first solvent region W1 and computes a membrane permeability Pout by evaluating the right side of Equation (4).
The prediction section 36 then takes the lower of the membrane permeability Pflip and the membrane permeability Pout as the predicted value Peff for membrane permeability.
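Equations (3) and (4) are not reproduced in this text. As a hedged sketch, the following assumes the standard inhomogeneous solubility-diffusion forms used in the Marrink and Berendsen approach cited above, R(z) = exp(βΔG(z)) / D(z) and 1/P = ∫ R(z) dz from za to zb (in that model β = 1/(k_B T)), combined with the Pflip, Pout, and minimum rule just described. The function names are hypothetical, and the grid z is assumed to be ascending and to contain the points chosen for Z0 and W1.

```python
from math import exp


def trapezoid(ys, xs):
    """Trapezoidal integral of ys over an ascending grid xs."""
    return sum((ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i]) / 2 for i in range(len(xs) - 1))


def predict_peff(z, G, D, beta, z_center, z_solvent):
    """Return (Peff, Pflip, Pout) on an ascending reaction-coordinate grid z.

    Assumed forms: dG(z) = G(z) - Gmin, R(z) = exp(beta * dG(z)) / D(z),
    and 1 / P = integral of R(z) dz between za (position of Gmin) and zb.
    """
    g_min = min(G)
    z_a = z[G.index(g_min)]                # za: position of the free-energy minimum
    delta_g = [g - g_min for g in G]       # dG(z) >= 0 everywhere
    resistance = [exp(beta * dg) / d for dg, d in zip(delta_g, D)]

    def permeability(z_b):
        lo, hi = sorted((z_a, z_b))
        idx = [i for i, zi in enumerate(z) if lo <= zi <= hi]
        integral = trapezoid([resistance[i] for i in idx], [z[i] for i in idx])
        if integral <= 0.0:
            raise ValueError("za and zb coincide; integration range is empty")
        return 1.0 / integral

    p_flip = permeability(z_center)        # zb = membrane central zone Z0
    p_out = permeability(z_solvent)        # zb = first solvent region W1
    return min(p_flip, p_out), p_flip, p_out
```

Because ΔG(z) is measured from Gmin rather than from a reference value outside the membrane, a deep free-energy well inside the membrane raises the resistance toward both Z0 and W1, which the minimum of Pflip and Pout then captures.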
Note that the above simulations may, for example, be executed based on the following Reference Cited Document.
Reference Cited Document 1: Yuji Sugita, Akio Kitao, and Yuko Okamoto, “Multidimensional replica-exchange method for free-energy calculations”, J. Chem. Phys. 2000, 113, 15, 6042-6051.
Next, explanation follows regarding operation of the prediction device 510 of the fifth exemplary embodiment.
On receiving an instruction signal indicating an instruction to start simulation, the prediction device 510 of the fifth exemplary embodiment executes the simulation processing routine illustrated in
At step S500, the setting section 32 sets an initial conformation of the peptide P when simulating permeation of the peptide P in a segment spanning from the first solvent region W1 to the vicinity of the join Jo1 as an initial conformation corresponding to relative substance permittivity in the first solvent region W1. The setting section 32 also sets an initial conformation of the peptide when simulating permeation of the peptide P in a segment spanning from the vicinity of the join Jo1 to sufficiently past the membrane central zone Z0 as an initial conformation corresponding to relative substance permittivity in the membrane region C.
At step S502, the simulation section 33 executes a simulation of the peptide P permeating the segment spanning from the first solvent region W1 to the vicinity of the join Jo1, and executes a simulation of the peptide P permeating the segment spanning from the vicinity of the join Jo1 to sufficiently past the membrane central zone Z0 according to the peptide initial conformations set at step S500. The simulation section 33 then stores the simulation results in the simulation data storage section 31. Note that these simulation results include the series of initial conformations of the peptide in the respective regions.
At step S504, the setting section 32 sets the series of initial conformations obtained at step S502 as the initial conformations to be employed in the REUS simulation, described later.
At step S506, the setting section 32 sets the spacing between restraint positions of the peptide replicas for executing the REUS simulation such that the spacing between the restraint positions of the peptide replicas becomes finer the closer a region is to the membrane central zone Z0.
At step S508, the simulation section 33 executes simulation of pharmacokinetics of the peptide by REUS simulation based on the series of initial conformations set at step S504 and on the replica restraint positions set at step S506. The simulation section 33 then stores the REUS simulation results in the simulation data storage section 31.
At step S510, the energy computation section 34 computes the free energy G(z) of the peptide at the respective reaction coordinates z according to a known computation formula and based on the REUS simulation result stored in the simulation data storage section 31.
At step S512, for each of the reaction coordinates z, the energy computation section 34 computes the difference ΔG(z) between the minimum value Gmin out of the free energies G(z) of the peptide computed at the reaction coordinates z, and the free energy G(z) of the peptide at the reaction coordinate z, based on the results of REUS simulation stored in the simulation data storage section 31.
At step S514, the simulation section 33 executes a US simulation based on a series of end structures, these being the results of the simulation performed at step S508.
Next, at step S516, the diffusion coefficient computation section 35 computes the local diffusion coefficient D(z) based on the value var(z) expressing variance of centroid position of the peptide when executing the US simulation for the respective reaction coordinates z in the results of the simulation executed at step S514, and based on the value Czz(t) expressing autocorrelation in centroid position at respective timings t.
At step S518, the prediction section 36 computes, for each of the reaction coordinates z in the results of the simulation executed at step S514, the value R(z) expressing local resistance of the peptide at the reaction coordinate z based on the difference ΔG(z) and the local diffusion coefficient D(z) at the reaction coordinate z.
At step S520, the prediction section 36 computes a predicted value for membrane permeability of the peptide based on the value R(z) expressing the local resistance computed at the respective reaction coordinates z.
At step S522, the prediction section 36 outputs as a result the predicted value for peptide membrane permeability computed at step S520.
As described above, the prediction device of the fifth exemplary embodiment computes a predicted value for membrane permeability of a peptide when permeating a membrane region representing a cell membrane, a first solvent region representing a solvent adjacent to one side of the membrane region, and a second solvent region representing a solvent adjacent to the other side of the membrane region. The prediction device computes the free energy G(z) of the peptide at respective reaction coordinates z expressing positions of the peptide in regions including the first solvent region, the membrane region, and the second solvent region, and also expressing the position of the peptide in a direction of an axis perpendicular to the membrane surface of the membrane region. The prediction device computes at each of the respective reaction coordinates z the difference ΔG(z) between the minimum value Gmin out of the free energies G(z) of the peptide computed at the reaction coordinates z and the free energy G(z) of the peptide at the reaction coordinate z. The prediction device also computes, for each of the respective reaction coordinates z, the value R(z) expressing local resistance of the peptide at the reaction coordinate z based on the difference ΔG(z) and based on the local diffusion coefficient D(z) at the reaction coordinate z. The prediction device then computes a predicted value for membrane permeability of the peptide based on the value R(z) expressing the local resistance computed at the respective reaction coordinates z. This enables the membrane permeability of the peptide to be predicted at good accuracy. In conventional methods, ΔG(z) at the respective reaction coordinates z has been computed based on an energy reference value outside the cell membrane. 
In contrast thereto, the prediction device of the fifth exemplary embodiment uses the minimum value Gmin out of the free energy of the peptide inside the cell membrane to compute ΔG(z) at the respective reaction coordinates z, thereby enabling the dynamics of the peptide inside the cell membrane to be simulated at good accuracy, and the membrane permeability of the peptide to be predicted at good accuracy.
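The passage does not give the exact expressions for R(z) or for the predicted value. The sketch below assumes the commonly used inhomogeneous solubility-diffusion form: local resistance R(z) = exp(ΔG(z)/kT)/D(z), with ΔG(z) = G(z) - Gmin, and permeability equal to the reciprocal of the integral of R(z) over z. It takes sampled profiles of G(z) and D(z) and returns a predicted value. The function name, the units, and the trapezoidal integration are illustrative assumptions.

```python
import math


def predict_permeability(z, g, d, kT=0.593):
    """Illustrative permeability estimate from G(z) and D(z) profiles.

    Assumed conventions (not taken from the disclosure):
      z  : reaction coordinates in ascending order (nm)
      g  : free energy G(z) of the peptide at each z (kcal/mol)
      d  : local diffusion coefficient D(z) at each z (nm^2/ns)
      kT : thermal energy (kcal/mol); 0.593 corresponds to roughly 298 K
    """
    if not (len(z) == len(g) == len(d)) or len(z) < 2:
        raise ValueError("z, g and d must have equal length >= 2")
    g_min = min(g)  # reference: minimum of G(z), not a bulk-solvent value
    # Local resistance R(z) = exp(dG(z)/kT) / D(z), with dG(z) = G(z) - Gmin >= 0
    r = [math.exp((gi - g_min) / kT) / di for gi, di in zip(g, d)]
    # Total resistance: trapezoidal integral of R(z) over the sampled z range
    total = sum(0.5 * (r[i] + r[i + 1]) * (z[i + 1] - z[i]) for i in range(len(z) - 1))
    return 1.0 / total  # permeability in nm/ns under the assumed units
```

ΔG(z) is measured from Gmin, so adding the same constant to every G(z) leaves the result unchanged. Only the shape of the profile matters, including the depth of any minimum inside the membrane.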
Specifically, as is evident from
Moreover, the prediction device of the fifth exemplary embodiment sets an initial conformation of the peptide corresponding to relative substance permittivity in the first solvent region when simulating permeation of the peptide in a segment spanning from the first solvent region to the vicinity of a lipid molecule join positioned further toward the membrane center side than a boundary between the first solvent region and the membrane region. The prediction device also sets an initial conformation of the peptide corresponding to relative substance permittivity in the membrane region when simulating permeation of the peptide in a segment spanning from the vicinity of the join to sufficiently past the membrane central zone expressing the central area of the membrane region. The prediction device then predicts membrane permeability of the peptide by simulating dynamics of the peptide corresponding to the initial conformations of the peptide that were set. This enables the initial conformations of the peptide to be set as initial conformations corresponding to relative substance permittivity in the membrane region. As a result, this enables the dynamics of the peptide inside the cell membrane to be simulated at good accuracy, enabling the membrane permeability of the peptide to be predicted at good accuracy.
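The sketch below shows one way initial conformations might be assigned to each umbrella window. It assumes the membrane center lies at z = 0 and the first solvent region lies on the positive-z side. It also assumes two pre-prepared conformations are available: one suited to the high relative permittivity of the solvent and one suited to the low relative permittivity of the membrane interior. The names `z_join`, `solvent_conf`, and `membrane_conf` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Window:
    z: float           # restraint position along the membrane normal (nm)
    initial_conf: str  # label of the initial conformation for this window


def assign_initial_conformations(z_positions, z_join, solvent_conf, membrane_conf):
    """Assign an initial conformation to each restraint window (hypothetical scheme).

    Assumes the membrane center is at z = 0 and the first solvent region is at
    large positive z. Windows from the solvent down to the lipid join (z >= z_join)
    start from the solvent-type conformation. Windows from the join through and
    past the center (z < z_join) start from the membrane-type conformation.
    """
    return [
        Window(z=z, initial_conf=solvent_conf if z >= z_join else membrane_conf)
        for z in z_positions
    ]
```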
Moreover, when simulating peptide permeation using a replica exchange umbrella sampling method, the prediction device of the fifth exemplary embodiment sets the restraint positions of the peptide so that the spacing between them is finer the closer a region is to the membrane central zone expressing the central area of the membrane region. The prediction device then predicts the membrane permeability of the peptide by simulating dynamics of the peptide using the replica exchange umbrella sampling method according to the spacing between restraint positions that were set. The membrane central zone is the region with the greatest impact on the prediction results for membrane permeability of the peptide. Setting the restraint positions at finer spacing closer to this zone enables simulation results to be obtained with good accuracy while suppressing computation cost, thereby enabling the membrane permeability of the peptide to be predicted at good accuracy.
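The restraint-spacing rule can be sketched as follows. The sketch assumes the membrane center is at z = 0 and uses a simple two-tier rule: fine spacing inside a hypothetical central half-width `z_fine`, and coarse spacing elsewhere. The disclosure does not specify the actual spacings or how they vary, so the values in the test are placeholders.

```python
def restraint_positions(z_start, z_end, fine, coarse, z_fine):
    """Umbrella restraint positions from z_start (solvent side) down to z_end.

    Two-tier rule (an assumption for illustration): the step to the next window
    is `fine` while |z| <= z_fine (membrane central zone) and `coarse` otherwise.
    """
    if fine <= 0 or coarse <= 0:
        raise ValueError("spacings must be positive")
    positions = []
    z = z_start
    while z >= z_end - 1e-9:  # small tolerance for floating-point drift
        positions.append(round(z, 6))
        z -= fine if abs(z) <= z_fine else coarse
    return positions
```

A production setup would more likely grade the spacing smoothly. The two-tier rule simply makes explicit the idea of concentrating windows near the center.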
Next, explanation follows regarding a sixth exemplary embodiment. A prediction device of the sixth exemplary embodiment differs from the first to fifth exemplary embodiments in respect that a predicted value for peptide membrane permeability is computed by consolidating a predicted value for peptide membrane permeability obtained by molecular dynamics simulation with a predicted value for membrane permeability obtained by a trained model built by machine learning. Note that similar portions in the configuration of the prediction device according to the sixth exemplary embodiment to those of any of the prediction devices of the first to fifth exemplary embodiments are allocated the same reference numerals, and explanation thereof is omitted.
The simulation section 40 generates a first membrane permeability value expressing membrane permeability of a peptide by simulating dynamics of the peptide permeating through a membrane region representing a cell membrane, a first solvent region representing a solvent adjacent to one side of the membrane region, and a second solvent region representing a solvent adjacent to the other side of the membrane region. For example, the simulation section 40 generates the first membrane permeability value expressing membrane permeability of the peptide by a similar method to that of the prediction device of the fifth exemplary embodiment.
A trained model for outputting a predicted value for membrane permeability of a peptide from feature vectors is stored in the trained model storage section 42. For example, a trained model generated using any one of the prediction devices of the first to fourth exemplary embodiments is stored in the trained model storage section 42.
The trained model prediction section 44 extracts predictive feature vectors expressing features of the peptide that is the target for membrane permeability prediction. It then generates a second membrane permeability value expressing membrane permeability of the peptide by inputting these predictive feature vectors into the trained model stored in the trained model storage section 42.
The computation section 46 computes a predicted value for membrane permeability of the peptide by consolidating the first membrane permeability value generated by the simulation section 40 with the second membrane permeability value generated by the trained model prediction section 44. For example, the computation section 46 may compute a predicted value for membrane permeability of the peptide by averaging the first membrane permeability value and the second membrane permeability value. Alternatively, the computation section 46 may take either the larger or the smaller of the first membrane permeability value and the second membrane permeability value as the predicted value for membrane permeability of the peptide.
The computation section 46 outputs this predicted value for membrane permeability of the peptide as a result.
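The consolidation options described above can be sketched as a small function. The sketch assumes both values are on the same scale, for example both expressed as log10 permeability. The function name and the `mode` labels are hypothetical.

```python
def consolidate(first_value, second_value, mode="mean"):
    """Combine the simulation-derived and model-derived permeability values.

    first_value  : value from molecular dynamics simulation (simulation section)
    second_value : value from the trained model (trained model prediction section)
    mode         : "mean", "max" or "min"
    """
    if mode == "mean":
        return (first_value + second_value) / 2.0
    if mode == "max":
        return max(first_value, second_value)
    if mode == "min":
        return min(first_value, second_value)
    raise ValueError(f"unknown mode: {mode!r}")
```

When the inputs are log-scale values, averaging them corresponds to taking the geometric mean of the raw permeabilities. This is often more sensible when values span several orders of magnitude.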
As described above, the prediction device of the sixth exemplary embodiment generates the first membrane permeability value expressing peptide membrane permeability by simulating dynamics of the peptide. The prediction device also extracts predictive feature vectors expressing features from the peptide, and generates the second membrane permeability value expressing peptide membrane permeability by inputting the predictive feature vectors into a pre-built trained model. The prediction device then computes a predicted value for membrane permeability of the peptide by consolidating the generated first membrane permeability value with the generated second membrane permeability value. This enables a predicted value to be obtained that reflects both a predicted value obtained by molecular dynamics simulation and a predicted value obtained using a trained model.
Next, explanation follows regarding a seventh exemplary embodiment. A prediction device of the seventh exemplary embodiment differs from the first to sixth exemplary embodiments in respect that a trained model that employs a machine learning algorithm is built based on data of simulation results obtained by a molecular dynamics simulation. Note that similar portions in the configuration of the prediction device according to the seventh exemplary embodiment to any of those in the prediction devices of the first to sixth exemplary embodiments are allocated the same reference numerals, and explanation thereof is omitted.
The simulation section 40 simulates dynamics of a peptide permeating a membrane region C, a first solvent region W1, and a second solvent region W2, similarly to the sixth exemplary embodiment. The simulation section 40 then stores the simulation result obtained by this simulation in the simulation result storage section 741. Note that the simulation result includes a predicted value for peptide membrane permeability obtained by the simulation section 40, physical quantities in respective region locations, tertiary structure of the peptide in the respective region locations, and so on.
The simulation result obtained by the simulation section 40 is stored in the simulation result storage section 741.
The training data generation section 715 generates simulation-derived training data expressed by the predicted value for peptide membrane permeability stored in the simulation result storage section 741 paired with a feature vector generated from a 3D descriptor obtained from the tertiary structure of the peptide in the respective region locations. Note that the respective region locations correspond to positions corresponding to several representative reaction coordinates z in the first solvent region W1, the membrane region C, and the second solvent region W2.
Specifically, the training data generation section 715 obtains 3D descriptors from the tertiary structure of the peptide at each of the region locations included in the simulation result, and extracts one or more feature vectors from these 3D descriptors. The training data generation section 715 generates, as the simulation-derived training data, the extracted feature vector set paired with the predicted values for peptide membrane permeability included in the simulation results, and stores this in the training data storage section 716.
Plural items of simulation-derived training data are stored in the training data storage section 716.
The training section 718 generates a trained model for outputting a predicted value for membrane permeability from feature vectors expressing features of a peptide and also expressing tertiary structure of the peptide. It does so by executing a machine learning algorithm based on training data including the simulation-derived training data stored in the training data storage section 716. The training section 718 then stores the trained model in the trained model storage section 720. Note that the training data storage section 716 may also contain training data other than the simulation-derived training data.
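The sketch below gives a rough illustration of how simulation-derived training data might be assembled and used. It concatenates per-location 3D-descriptor vectors into one feature vector, pairs that vector with a target value, and fits a k-nearest-neighbour regressor. The descriptor layout, the function names, and the choice of k-nearest-neighbour regression are assumptions; the disclosure does not name a specific machine learning algorithm here. The target may equally be a physical quantity rather than a permeability value.

```python
import math


def make_training_pair(descriptors_by_location, target):
    """Pair one feature vector with a target value.

    descriptors_by_location : list of 3D-descriptor vectors, one per representative
                              reaction coordinate (hypothetical layout)
    target                  : predicted permeability (or physical quantity) from the simulation
    """
    features = [float(x) for descriptor in descriptors_by_location for x in descriptor]
    return features, float(target)


def train_knn_regressor(pairs, k=3):
    """Return a predictor that averages the targets of the k nearest training vectors."""
    data = list(pairs)
    if not data:
        raise ValueError("no training data")

    def predict(features):
        # Euclidean distance between feature vectors of equal length
        nearest = sorted(data, key=lambda p: math.dist(p[0], features))[:k]
        return sum(target for _, target in nearest) / len(nearest)

    return predict
```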
Moreover, instead of predicted values for peptide membrane permeability, the training data may be configured by a physical quantity computed from the peptide and its surrounding environment at respective locations when executing a simulation, combined with feature vectors extracted from the 3D descriptors. In such cases, a trained model for predicting a physical quantity of a peptide from feature vectors is generated as a result.
As described above, the prediction device of the seventh exemplary embodiment generates a predicted value for membrane permeability expressing membrane permeability of a peptide by simulating the dynamics of the peptide. The prediction device also generates simulation-derived training data expressed by a predicted value for peptide membrane permeability or a physical quantity computed from the peptide and its surrounding environment at respective locations, paired with a feature vector generated from a 3D descriptor obtained from a tertiary structure of the peptide at the respective locations. The prediction device then generates a trained model by executing a machine learning algorithm based on training data including the simulation-derived training data. This enables a trained model to be obtained for outputting a predicted value for membrane permeability from feature vectors based on data obtained by molecular dynamics simulation.
Note that the present disclosure is not limited to the exemplary embodiments described above, and various modifications and applications may be implemented within a range not departing from the spirit of the present disclosure.
For example, the first exemplary embodiment describes a case in which a feature vector is extracted for each instance in which one of the plural residues contained in a cyclic peptide is at the start point of the cyclic sequence. These plural feature vectors are input into a trained model, and a representative value is obtained for the predicted values of membrane permeability output from the trained model. However, there is no limitation thereto. For example, a single feature vector may be generated from the feature vectors for the instances in which each of the plural residues contained in the cyclic peptide is at the start point of the cyclic sequence, and this single feature vector may be input into a trained model so as to obtain a predicted value of membrane permeability. In such a case, for example, the single feature vector may be generated by taking a weighted average of the plural feature vectors. Moreover, for example, specific feature vectors may be selected from among the plural feature vectors, and a single feature vector generated by taking a weighted average of the selected feature vectors. Moreover, when generating the trained model as well, a single training feature vector may be generated from the training feature vectors for the instances in which each of the plural residues contained in the cyclic peptide is at the start point of the cyclic sequence, and this training feature vector may then be employed to generate the trained model.
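The single-vector alternative can be sketched in three steps. First, generate every cyclic rotation of the residue sequence, taking each residue in turn as the start point. Next, featurize each rotation. Finally, take a weighted average of the resulting vectors. The `featurize` callable and the weights are hypothetical. Setting some weights to zero corresponds to averaging only a selected subset of the feature vectors.

```python
def cyclic_rotations(residues):
    """All rotations of a cyclic sequence, each residue taken once as the start point."""
    return [residues[i:] + residues[:i] for i in range(len(residues))]


def pooled_feature_vector(residues, featurize, weights=None):
    """Weighted average of the feature vectors of all cyclic rotations.

    featurize : callable mapping one linear (rotated) sequence to a list of floats
    weights   : one non-negative weight per rotation; uniform if omitted
    """
    vectors = [featurize(seq) for seq in cyclic_rotations(residues)]
    if weights is None:
        weights = [1.0] * len(vectors)
    if len(weights) != len(vectors) or sum(weights) <= 0:
        raise ValueError("need one weight per rotation with a positive sum")
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) / total for j in range(dim)]
```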
Moreover, the fifth exemplary embodiment describes a case in which the simulation section 33 executes REUS simulation based on the replica restraint positions set by the setting section 32 and on the series of initial conformations of the peptide at the respective regions in the vicinity of the cell membrane. However, there is no limitation thereto. For example, instead of REUS simulation, a US simulation or a metadynamics simulation (Alessandro Laio and Michele Parrinello, “Escaping free-energy minima”, Proc. Natl. Acad. Sci., 2002, 99, 12562-12566.) may be executed.
Note that although an example has been described of a case in the fifth exemplary embodiment described above in which the result of simulation for the segment from the first solvent region W1 to the vicinity of the membrane central zone Z0 is inverted to obtain the result of simulation from the membrane central zone Z0 to the second solvent region W2, there is no limitation thereto. The results of simulation from the membrane central zone Z0 to the second solvent region W2 may be obtained by executing an actual simulation from the membrane central zone Z0 to the second solvent region W2.
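The inversion step amounts to mirroring the computed half profile about the membrane center. The minimal sketch below assumes the center is at z = 0, the simulated half lies at z >= 0, and the bilayer is symmetric so that G(-z) = G(z). The function name is hypothetical.

```python
def mirror_profile(z_half, g_half):
    """Build a full profile from the half computed between W1 and the center.

    Assumes the membrane center is at z = 0, z_half >= 0, and G(-z) = G(z).
    Returns (z, g) lists sorted by ascending z.
    """
    pairs = sorted(zip(z_half, g_half))                 # ascending z, center first
    mirrored = [(-z, g) for z, g in pairs if z > 0]     # skip z = 0 to avoid a duplicate
    full = sorted(mirrored + pairs)
    return [z for z, _ in full], [g for _, g in full]
```

Running the actual simulation over the second half, as the alternative above allows, avoids relying on this symmetry assumption. The assumption would not hold for an asymmetric bilayer.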
Moreover, although in the above exemplary embodiments examples have been described of cases in which the trained model is generated based on training data, there is no limitation thereto. For example, the trained model of the present exemplary embodiment may be generated as a distillation model based on other trained models.
Moreover, although embodiments have been described above in which a program according to the present disclosure is pre-stored (installed) in a storage section (not illustrated in the drawings), the program according to the present disclosure may be provided in a format recorded on a recording medium such as a CD-ROM, a DVD-ROM, a micro SD card, or the like.
Note that although in the above exemplary embodiments a CPU reads in software (a program) and executes processing thereof, various processors other than a CPU may be employed for execution. Processors in such cases include programmable logic devices (PLD) that allow circuit configuration to be modified post-manufacture, such as a field-programmable gate array (FPGA), and dedicated electric circuits, these being processors including a circuit configuration custom-designed to execute specific processing, such as an application specific integrated circuit (ASIC). The processing may be executed by any one of these various types of processor, or may be executed by a combination of two or more of the same type or different types of processor (such as plural FPGAs, or a combination of a CPU and an FPGA). The hardware structure of these various types of processors is more specifically an electric circuit combining circuit elements such as semiconductor elements.
Moreover, the processing of the respective exemplary embodiments may be executed by a program on a computer, a server, or the like that includes a general-purpose computation processing device, a storage device, and the like. Such a program may be stored in a storage device, recorded on a recording medium such as a magnetic disc, an optical disc, or semiconductor memory, or provided over a network. Furthermore, the configuration elements need not be implemented using a single computer or server, and may be distributed across and implemented by plural computers that are connected together over a network.
The disclosures of Japanese Patent Application No. 2021-031234, filed on Feb. 26, 2021, are incorporated herein by reference in their entirety. All publications, patent applications, and technical standards mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent application, or technical standard was specifically and individually indicated to be incorporated by reference.
Number | Date | Country | Kind |
---|---|---|---|
2021-031234 | Feb 2021 | JP | national |