The present application claims a priority of the Chinese patent application No. 202110367301.8 filed in China on Apr. 6, 2021, the disclosure of which is incorporated herein by reference in its entirety.
The present application relates to the field of a big data technology and a deep learning technology in a computer technology, in particular to a method and a device for determining correlation between a drug and a target, and an electronic device.
For the research and development of a new drug, it is an important phase to predict binding affinity (also referred to as correlation) between the new drug and a target. In this phase, the affinity between a plurality of candidate new drugs and the target is measured and ranked, so as to find a new drug of real worth.
Currently, during the prediction, a Gaussian screening test is commonly adopted.
An object of the present application is to provide a method and a device for determining correlation between a drug and a target, and an electronic device.
In one aspect, the present application provides in some embodiments a method for determining correlation between a drug and a target, including: establishing a spatial molecular graph of a candidate drug and the target, the spatial molecular graph including an atomic node set and an edge set, the atomic node set including atoms in the candidate drug and atoms in the target, the edge set including at least one atom connection edge; inputting a first atom feature of the atomic node set and the spatial molecular graph into a first Graph Attention Network (GAT) for prediction, so as to obtain a second atom feature of the atomic node set; and determining a parameter value of the correlation between the candidate drug and the target in accordance with the second atom feature of the atomic node set.
According to the method for determining the correlation between the drug and the target in the embodiments of the present application, the spatial molecular graph of the candidate drug and the target is established. Next, the first atom feature of the atomic node set and the spatial molecular graph are inputted into the first GAT for prediction, i.e., the prediction is performed using the first GAT to obtain the second atom feature of the atomic node set. Then, the parameter value of the correlation between the candidate drug and the target is determined in accordance with the second atom feature of the atomic node set. As a result, the prediction is performed without a Gaussian screening test, so it is able to reduce a computational burden, and determine the correlation between the drug and the target efficiently.
In another aspect, the present application provides in some embodiments a device for determining correlation between a drug and a target, including: an establishment module configured to establish a spatial molecular graph of a candidate drug and the target, the spatial molecular graph including an atomic node set and an edge set, the atomic node set including atoms in the candidate drug and atoms in the target, the edge set including at least one atom connection edge; a prediction module configured to input a first atom feature of the atomic node set and the spatial molecular graph into a first GAT for prediction, so as to obtain a second atom feature of the atomic node set; and a first determination module configured to determine a parameter value of the correlation between the candidate drug and the target in accordance with the second atom feature of the atomic node set.
In yet another aspect, the present application provides in some embodiments an electronic device, including at least one processor, and a memory in communication connection with the at least one processor and storing therein an instruction executed by the at least one processor. The instruction is executed by the at least one processor, so as to implement the method for determining the correlation between the drug and the target in the embodiments of the present application.
In still yet another aspect, the present application provides in some embodiments a non-transient computer-readable storage medium storing therein a computer instruction. The computer instruction is executed by a computer so as to implement the above-mentioned method for determining the correlation between the drug and the target in the embodiments of the present application.
In still yet another aspect, the present application provides in some embodiments a computer program product including a computer program. The computer program is executed by a processor so as to implement the above-mentioned method for determining the correlation between the drug and the target in the embodiments of the present application.
The following drawings are provided to facilitate the understanding of the present application, but shall not be construed as limiting the present application. In these drawings,
In the following description, numerous details of the embodiments of the present application, which should be deemed merely as exemplary, are set forth with reference to accompanying drawings to provide understanding of the embodiments of the present application. Therefore, those skilled in the art will appreciate that modifications or replacements may be made in the described embodiments without departing from the scope and spirit of the present application. Further, for clarity and conciseness, descriptions of known functions and structures are omitted.
As shown in
Step S101: establishing a spatial molecular graph of a candidate drug and the target.
The spatial molecular graph includes an atomic node set and an edge set, the atomic node set includes atoms in the candidate drug and atoms in the target, and the edge set includes at least one atom connection edge.
The candidate drug is a compound consisting of a plurality of atoms. The target of the drug is a site where the drug binds to a biomacromolecule in the body, and it may also be understood as a protein. As an important part of a drug discovery process, the prediction of interaction between the drug and the target is represented by prediction of affinity between the drug and the target, so the correlation may simply be understood as affinity.
In the embodiments of the present application, the spatial molecular graph of the candidate drug (compound) and the target (protein) is established at first. For example, the spatial molecular graph is represented by G=(V, E), where V represents the atomic node set, V=VM∪VP={a1, a2, . . . , aN}, VM represents an atom set of the candidate drug, VP represents an atom set of the protein, ai represents an ith atomic node and 1≤i≤N, and E represents the edge set including at least one atom connection edge, i.e., an edge connecting at least one pair of atomic nodes. Any pair of atomic nodes includes two atomic nodes. It should be appreciated that an atom connection edge exists between two atoms only when the two atoms meet a certain condition; otherwise, no atom connection edge exists between them.
Step S102: inputting a first atom feature of the atomic node set and the spatial molecular graph into a first GAT for prediction, to obtain a second atom feature of the atomic node set.
The atomic node set includes a plurality of atomic nodes, so the first atom feature of the atomic node set includes a first atom feature of each atomic node in the plurality of atomic nodes. At first, the first atom feature of the atomic node set is obtained, and the first atom feature includes, but is not limited to, an atom type, the quantity of neighboring nodes, and the distribution of chemical bonds. The quantity of neighboring nodes for a certain atomic node represents the quantity of nodes having chemical bonds with the atomic node. The distribution of the chemical bonds for a certain atomic node represents the distribution of the chemical bonds for the atomic node in a corresponding candidate drug or target. In the embodiments of the present application, the first atom feature of the atomic node set and the spatial molecular graph are inputted into the first GAT for prediction, and then the first GAT outputs the second atom feature of the atomic node set. The second atom feature includes a second atom feature of each atomic node in the atomic node set.
It should be appreciated that, in a Graph Convolutional Network (GCN), a local graph structure and a node feature are combined to obtain a good effect in a node classification task. However, the mode in which neighboring node features are combined in the GCN depends on the graph structure, which limits the generalization ability of the GCN to other graph structures. In the GAT, weighted summation is performed on the neighboring node features using an attention mechanism, and a weight of each neighboring node feature depends on the node feature and is independent of the graph structure. In other words, in the GAT, a fixed, standardized operation in the GCN is replaced with the attention mechanism, so the generalization ability is relatively strong. In the embodiments of the present application, the second atom feature different from the first atom feature and capable of representing an atom feature is obtained through the GAT in accordance with the first atom feature and the spatial molecular graph, so as to improve the atom representation accuracy.
Step S103: determining a parameter value of the correlation between the candidate drug and the target in accordance with the second atom feature of the atomic node set.
The parameter value of the correlation between the candidate drug and the target is determined in accordance with the second atom feature of the atomic node set, so as to predict the affinity between the candidate drug and the target. The larger the parameter value, the stronger the affinity; the smaller the parameter value, the weaker the affinity.
According to the method for determining the correlation between the drug and the target in the embodiments of the present application, the spatial molecular graph of the candidate drug and the target is established. Next, the first atom feature of the atomic node set and the spatial molecular graph are inputted into the first GAT for prediction, i.e., the prediction is performed using the first GAT to obtain the second atom feature of the atomic node set. Then, the parameter value of the correlation between the candidate drug and the target is determined in accordance with the second atom feature of the atomic node set. As a result, the prediction is performed without a Gaussian screening test, so it is able to reduce a computational burden, and determine the correlation between the drug and the target efficiently.
In a possible embodiment of the present application, the second atom feature of the atomic node set is inputted into a fully connected layer, and the parameter value of the correlation between the candidate drug and the target is outputted by the fully connected layer.
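As a non-limiting illustration, the fully connected prediction may be sketched as follows. The sketch assumes that the second atom features are first summed into one graph-level vector (a readout also used later in this description) and that a ReLU activation is applied between layers; the function name predict_affinity, the layer sizes and the weight values are hypothetical and are not taken from the present application.

```python
def predict_affinity(atom_features, layers):
    """Sum per-atom features into one vector, then apply dense layers.

    atom_features: list of equal-length feature lists, one per atomic node.
    layers: list of (weight_matrix, bias_vector) pairs (hypothetical values).
    """
    # Readout: element-wise sum over all atomic nodes.
    x = [sum(column) for column in zip(*atom_features)]
    for index, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, bias)]
        # Assumed ReLU between layers; the last layer stays linear.
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x[0]  # scalar parameter value of the correlation


features = [[1.0, 2.0], [3.0, -4.0]]
dense_layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[0.5, 0.5]], [1.0]),
]
affinity = predict_affinity(features, dense_layers)
```

In this toy example the summed vector is [4.0, -2.0], and the two layers map it to a single scalar.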
In a possible embodiment of the present application, the establishing the spatial molecular graph of the candidate drug and the target includes establishing the spatial molecular graph in accordance with a distance between atomic nodes in the atomic node set. A distance between two atomic nodes for any edge in the edge set is smaller than or equal to a predetermined distance threshold.
A coordinate position of each atomic node in the atomic node set is obtained in advance in a three-dimensional space using a conventional method, which will not be particularly defined herein. A distance between any two atoms in the atomic node set in the three-dimensional space is calculated in advance to obtain a distance matrix D. The distance matrix D includes the distance between any two atomic nodes in the atomic node set, e.g., Dij represents a distance between an ith atomic node and a jth atomic node. Subsequently, an edge connecting the atomic nodes is determined in accordance with the predetermined distance threshold θd (e.g., 5 Å), and the edge set E is expressed as E={eij=(ai, aj)|ai, aj∈V, Dij≤θd}, where ai represents an ith atomic node in the atomic node set, aj represents a jth atomic node in the atomic node set, eij represents an edge connecting the ith atomic node and the jth atomic node, and 1≤j≤N. The distance between any two atomic nodes is smaller than or equal to the predetermined distance threshold, so an edge connecting the two atomic nodes may be established. It should be appreciated that, eij represents an edge connecting the ith atomic node and the jth atomic node with the ith atomic node as an end point, i.e., the edge is a directed edge from the jth atomic node to the ith atomic node.
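As a non-limiting illustration, the construction of the distance matrix D and the edge set E may be sketched as follows. The function name build_spatial_graph and the example coordinates are hypothetical, and the exclusion of self-loops (i = j) is an assumption of the sketch rather than a requirement stated above.

```python
import math


def build_spatial_graph(coords, threshold=5.0):
    """Return the distance matrix D and the edge set E for 3-D coordinates.

    coords: list of (x, y, z) tuples, one per atomic node.
    threshold: predetermined distance threshold (e.g., 5 angstroms).
    """
    n = len(coords)
    dist = [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]
    # An edge (i, j) exists only when D[i][j] <= threshold.
    # Self-loops are skipped here (assumption of this sketch).
    edges = [(i, j) for i in range(n) for j in range(n)
             if i != j and dist[i][j] <= threshold]
    return dist, edges


atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
D, E = build_spatial_graph(atoms)
```

Here only the first two atoms lie within the threshold of each other, so the third atom has no atom connection edge.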
In an original molecule, a link between atoms is merely determined by a chemical bond, which is insufficient to model a relationship among the atoms in the molecule. In addition, there is no original chemical bond between the drug and the target. In order to obtain more complete correlation between the atoms, in the embodiments of the present application, the spatial molecular graph of the drug and the target is established in accordance with a spatial distance, and in the spatial molecular graph, the distance between the two atomic nodes for any edge in the edge set is smaller than or equal to the predetermined distance threshold. In this way, it is able to represent the correlation between the atoms in the drug and the atoms in the target in a better manner through the spatial molecular graph, thereby to improve the accuracy of the spatial molecular graph.
In a possible embodiment of the present application, prior to inputting the first atom feature of the atomic node set and the spatial molecular graph into the first GAT for prediction so as to obtain the second atom feature of the atomic node set, the method further includes: encoding the distance between the atomic nodes in the atomic node set to obtain a first distance vector between the atomic nodes in the atomic node set; and converting the first distance vector between the atomic nodes in the atomic node set into a target distance vector between the atomic nodes in the atomic node set.
The inputting the first atom feature of the atomic node set and the spatial molecular graph into the first GAT for prediction so as to obtain the second atom feature of the atomic node set includes: inputting the first atom feature of the atomic node set, the spatial molecular graph and the target distance vector between the atomic nodes in the atomic node set into the first GAT for prediction so as to obtain the second atom feature of the atomic node set.
The distance between the atomic nodes in the atomic node set may include a distance between any two atomic nodes in the atomic node set. In the embodiments of the present application, during the prediction of the correlation, the distance between the atomic nodes in the atomic node set is also taken into consideration. However, this distance is a scalar distance, i.e., a specific value, and it needs to be encoded to obtain a corresponding first distance vector. Different scalar distances correspond to different first distance vectors. The first distance vector may be understood as a sparse vector, and the first distance vector between the atomic nodes in the atomic node set may be converted into a dense vector, so as to obtain the target distance vector between the atomic nodes in the atomic node set, i.e., the obtained target distance vector is a dense vector. Then, the first atom feature of the atomic node set, the spatial molecular graph and the target distance vector between the atomic nodes in the atomic node set are inputted into the first GAT for prediction, so as to obtain the second atom feature of the atomic node set. The parameter value of the correlation is determined in accordance with the second atom feature, so as to improve the accuracy of the parameter value of the correlation.
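As a non-limiting illustration, the encoding and the conversion may be sketched as follows. The sketch assumes that the scalar distance is mapped to an integer by equal-width binning and that the sparse vector is converted into a dense vector by multiplication with an embedding matrix; the function names, the bin count and the embedding values are hypothetical.

```python
def one_hot_distance(distance, threshold=5.0, num_bins=5):
    """Encode a scalar distance as a sparse (one-hot) first distance vector."""
    width = threshold / num_bins
    # Map the distance to an integer bin index (assumed equal-width bins).
    index = min(int(distance / width), num_bins - 1)
    vector = [0] * num_bins
    vector[index] = 1
    return vector


def to_dense(one_hot, embedding):
    """Convert the sparse vector into a dense target distance vector."""
    dim = len(embedding[0])
    return [sum(one_hot[b] * embedding[b][k] for b in range(len(one_hot)))
            for k in range(dim)]


# Hypothetical embedding matrix: one dense row per distance bin.
embedding = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [1.5, -0.5], [2.0, -1.0]]
first_vector = one_hot_distance(3.2)
target_vector = to_dense(first_vector, embedding)
```

Multiplying a one-hot vector by the embedding matrix simply selects the row of the matching bin, so in practice the embedding values would be learned together with the other parameters.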
As an instance, the distance between the atomic nodes in the atomic node set is encoded through one-hot encoding, so as to obtain the first distance vector between the atomic nodes in the atomic node set. In the one-hot encoding, a categorical variable is represented as a binary vector. At first, a categorical value (i.e., the distance in the embodiments of the present application) is mapped to an integral value, and each integral value is represented as a binary vector in which every element is zero except the element at the index of the integer, which is marked as 1. In the three-dimensional space, a position of each atomic node is defined through position coordinates (x, y, z), and the coordinates depend on a definition of a coordinate system (e.g., directions of axes x, y and z, and an origin of the coordinate). Hence, the distance, which reflects the relative position relationship between the atomic nodes, is encoded. As shown in
In a possible embodiment of the present application, the inputting the first atom feature of the atomic node set, the spatial molecular graph, and the target distance vector between the atomic nodes in the atomic node set into the first GAT for prediction so as to obtain the second atom feature of the atomic node set includes: inputting the target distance vector between the atomic nodes in the atomic node set, the spatial molecular graph and the first atom feature of the atom node set into the first GAT for prediction, so as to obtain a target feature representation of each edge in the edge set; and predicting the first atom feature of the atomic node set, the target distance vector between the atomic nodes in the atomic node set and the target feature representation of the edge in the edge set in accordance with the first GAT, to obtain the second atom feature of the atomic node set.
During the determination of the second atom feature of the atomic nodes in the atomic node set, firstly edge nodes are aggregated to obtain the target feature representation of each edge in the edge set, and the edge node here refers to the edge in the edge set. A spatial distance depends on a pair of atomic nodes, and it is difficult for an existing neural network to effectively learn long-distance dependency during the aggregation. Hence, in the embodiments of the present application, distance information is aggregated into the edge node, and spatial structure information is captured through the propagation and aggregation of the edge nodes. One atom connection edge relates to one pair of atomic nodes, and after obtaining the target feature representation of the edge in the edge set, the first atom feature of the atomic nodes is updated through the aggregation of the atomic nodes in accordance with the target feature representation of the edge in the edge set, so as to obtain the second atom feature.
In other words, in the embodiments of the present application, the target feature representation of the edge is determined at first, and during the determination of the target feature representation of the edge, the target distance vector between the atomic nodes in the atomic node set, the spatial molecular graph and the first atom feature of the atomic node set have been taken into consideration. Next, the second atom feature of the atomic node set is determined in accordance with the target feature representation of the edge in the edge set, i.e., during the determination of the second atom feature, not only the target feature representation of the edge but also the first atom feature of the atomic node set and the target distance vector between the atomic nodes in the atomic node set have been taken into consideration. In this regard, when determining the parameter value of the correlation in accordance with the second atom feature, it is able to improve the accuracy of determining the parameter value of the correlation.
In a possible embodiment of the present application, the inputting the target distance vector between the atomic nodes in the atomic node set, the spatial molecular graph and the first atom feature of the atomic node set into the first GAT for prediction so as to obtain the target feature representation of the edge in the edge set includes: determining a neighboring edge set for an edge between an ith atomic node and a jth atomic node in the edge set, where i and j are integers, 1≤i≤N, 1≤j≤M, N represents the total quantity of the atomic nodes in the atomic node set, and M represents the quantity of atomic nodes each having an edge with the ith atomic node; determining an initial feature representation of the edge in the neighboring edge set in accordance with a target distance vector between atomic nodes for the edge in the neighboring edge set, a first atom feature of the atomic nodes for the edge in the neighboring edge set, as well as a first activation function, a first transfer matrix and an offset vector in the first GAT; determining a first standardized weight in accordance with the initial feature representation of the edge in the neighboring edge set, as well as a first weight matrix, a second activation function and a first attention weight in the first GAT; and determining a target feature representation of the edge between the ith atomic node and the jth atomic node in accordance with the initial feature representation of the edge in the neighboring edge set, the first standardized weight and the first weight matrix in the first GAT.
In the embodiments of the present application, the neighboring edge set for the edge between the ith atomic node and the jth atomic node may be understood as a neighboring edge set for the edge between the ith atomic node and the jth atomic node with the ith atomic node as an end point, i.e., any edge in the neighboring edge set points to the ith atomic node. For example, the spatial molecular graph G includes an edge eki=(ak, ai) and an edge eij=(ai, aj). The edge eki is an edge between a kth atomic node and the ith atomic node with the ith atomic node as an end point, i.e., the edge eki is an edge from the kth atomic node to the ith atomic node. The edge eki is adjacent to the edge eij, so the edge eki is a neighboring edge of the edge eij. In this way, it is able to determine all neighboring edges for the edge between the ith atomic node and the jth atomic node, thereby to obtain the neighboring edge set for the edge between the ith atomic node and the jth atomic node. The neighboring edge set for the edge between the ith atomic node and the jth atomic node includes all neighboring edges adjacent to the edge between the ith atomic node and the jth atomic node.
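As a non-limiting illustration, the neighboring edge set may be collected as follows. The sketch assumes that each edge is stored as a tuple (k, i) directed from the kth atomic node to the ith atomic node; the function name neighboring_edges and the example edges are hypothetical.

```python
def neighboring_edges(edges, i):
    """Collect every edge (k, i) that points to the ith atomic node.

    edges: list of (source, destination) index pairs.
    i: index of the atomic node at which the edge e_ij starts.
    """
    return [(k, t) for (k, t) in edges if t == i]


edge_set = [(0, 1), (2, 1), (1, 3), (1, 0)]
# Neighboring edges of the edge e_13 = (a1, a3): all edges ending at node 1.
neighbors = neighboring_edges(edge_set, 1)
```

For the edge from node 1 to node 3, the neighboring edges are the two edges arriving at node 1.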
After determining the neighboring edge set for the edge between the ith atomic node and the jth atomic node, the initial feature representation of the edge in the neighboring edge set may be determined in accordance with the target distance vector between the atomic nodes for the edge in the neighboring edge set, the first atom feature of the atomic nodes for the edge in the neighboring edge set, the first activation function in the first GAT, the first transfer matrix in the first GAT, and the offset vector in the first GAT. It should be appreciated that, an initial feature representation of a target edge may be determined in accordance with a target distance vector between atomic nodes for a target edge in the neighboring edge set, a first atomic feature of two atomic nodes for the target edge, as well as the first activation function, the first transfer matrix and the offset vector in the first GAT. The target edge is any edge in the neighboring edge set. In other words, for each atom connection edge in the neighboring edge set, the initial feature representation of the target edge is determined in the above-mentioned way, so as to determine the initial feature representation of the edge in the neighboring edge set.
As an instance, for the target edge, the first atom feature of the two atomic nodes for the target edge is spliced with the target distance vector between the two atomic nodes for the target edge, so as to obtain a first splicing result. Next, the first transfer matrix is multiplied by the first splicing result to obtain a first target result. Next, the first target result is added to the offset vector to obtain a second target result. Then, the second target result is taken as an input of the first activation function, and the initial feature representation of the target edge is outputted through the first activation function.
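As a non-limiting illustration, the splicing, multiplication, addition and activation described above may be sketched as follows. The use of ReLU as the first activation function, the function names and all numeric values are assumptions of the sketch.

```python
def relu(vector):
    """Assumed first activation function (element-wise ReLU)."""
    return [max(0.0, v) for v in vector]


def initial_edge_feature(h_source, h_dest, p, transfer, offset, act=relu):
    """Initial feature representation of one target edge."""
    # First splicing result: both atom features spliced with the
    # target distance vector of the edge.
    spliced = h_source + h_dest + p
    # First target result: first transfer matrix times the splicing result.
    first = [sum(w * x for w, x in zip(row, spliced)) for row in transfer]
    # Second target result: add the offset vector.
    second = [x + b for x, b in zip(first, offset)]
    # Output of the first activation function.
    return act(second)


W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, -1.0, 0.0, 1.0]]   # hypothetical first transfer matrix
b = [0.5, 0.0]                # hypothetical offset vector
edge_feature = initial_edge_feature([1.0], [2.0], [0.0, 1.0], W, b)
```

The spliced vector here is [1.0, 2.0, 0.0, 1.0], so the pre-activation values are [1.5, -1.0] and the ReLU zeroes the negative entry.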
As an instance, the initial feature representation
As an instance, ak,i,j is determined through
where ak,i,j is a first standardized weight related to the edge eki and the edge eij and represents an importance level of the edge eki relative to the edge eij during the determination of a target feature, σ2 represents a second activation function, ae represents the first attention weight, We represents the first weight matrix, eu represents an initial feature representation of the edge eu, eki represents the initial feature representation of the edge eki in the neighboring edge set,
As an instance, a target feature representation
It should be appreciated that,
Through the above process, the target feature representation of the edge between the ith atomic node and the jth atomic node in the edge set may be determined. Since 1≤i≤N and 1≤j≤M, the target feature representation of each edge in the edge set may be determined through a similar process merely by updating the values of i and j. When the values of i and j are updated, the neighboring edge set for the edge between the ith atomic node and the jth atomic node, the target distance vector between the ith atomic node and the jth atomic node, the first atom feature of the ith atomic node and the first atom feature of the jth atomic node are updated accordingly. In this way, it is able to obtain the target feature representation of the edge in the edge set.
In the embodiments of the present application, during the determination of the target feature representation, through the combination of the distance information, it is able to learn the distance dependency in the spatial molecular graph, and determine the second atom feature of the atomic node in accordance with the target feature representation of the edge, and then determine the parameter value of the correlation between the candidate drug and the target in accordance with the second atom feature. In this way, it is able to improve the accuracy of the parameter value of the correlation between the candidate drug and the target.
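As a non-limiting illustration, the aggregation of the neighboring edges may be sketched as follows. The exact scoring expression is not reproduced in this text, so the sketch assumes a scoring form commonly used in graph attention networks: each neighboring edge feature and the feature of the edge itself are multiplied by the first weight matrix, spliced, scored with the first attention weight, passed through a LeakyReLU as the second activation function, and standardized with a softmax. The function names and values are hypothetical.

```python
import math


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    """Assumed second activation function."""
    return x if x > 0 else slope * x


def aggregate_edges(neighbor_feats, center_feat, weight, attention):
    """Return the first standardized weights and the target edge feature."""
    center = matvec(weight, center_feat)
    projected = [matvec(weight, e) for e in neighbor_feats]
    # Assumed score: attention vector dotted with [W e_ki (+) W e_ij].
    scores = [leaky_relu(sum(a * x for a, x in zip(attention, p + center)))
              for p in projected]
    alphas = softmax(scores)
    # Weighted sum of the projected neighboring edge features.
    target = [sum(al * p[d] for al, p in zip(alphas, projected))
              for d in range(len(center))]
    return alphas, target


W_e = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical first weight matrix
a_e = [1.0, 1.0, 1.0, 1.0]       # hypothetical first attention weight
alphas, target = aggregate_edges([[1.0, 2.0], [1.0, 2.0]], [0.0, 0.0],
                                 W_e, a_e)
```

Two identical neighboring edges receive equal standardized weights, so the aggregated result equals their shared projected feature.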
In a possible embodiment of the present application, the predicting the first atom feature of the atomic node set, the target distance vector between the atomic nodes in the atomic node set and the target feature representation of the edge in the edge set in accordance with the first GAT to obtain the second atom feature of the atomic node set includes: determining a target neighboring edge set for the ith atomic node, an end point of any edge in the target neighboring edge set being the ith atomic node; and determining the second atom feature of the ith atomic node in accordance with a target feature representation of the edge in the target neighboring edge set, the first atom feature of the ith atomic node, a target distance vector between atomic nodes for the edge in the target neighboring edge set as well as a second attention weight, a second transfer matrix and a second weight matrix in the first GAT.
Any edge in the target neighboring edge set points toward the ith atomic node, and the second atom feature of the ith atomic node may be determined through the above process. Since 1≤i≤N, the second atom feature of each atomic node in the atomic node set may be determined through a similar process merely by updating the value of i. When the value of i is updated, the target neighboring edge set for the ith atomic node, the target feature representation of the edge in the target neighboring edge set, the first atom feature of the ith atomic node and the target distance vector between the atomic nodes for the edge in the target neighboring edge set are updated accordingly. In this way, it is able to obtain the second atom feature of each atomic node in the atomic node set, i.e., the second atom feature of the atomic node set.
In the embodiments of the present application, during the determination of the second atom feature, through the combination of the distance information, it is able to learn the distance dependency in the spatial molecular graph, and take the target feature representation of the edge into consideration, and then determine the parameter value of the correlation between the candidate drug and the target in accordance with the second atom feature. In this way, it is able to improve the accuracy of the parameter value of the correlation between the candidate drug and the target.
As an instance, during the determination of the second atom feature of the ith atomic node, at first the target feature representation of the edge in the target neighboring edge set may be converted to obtain a first conversion feature of the edge in the target neighboring edge set, e.g., hk,i,e=Wh
Next, an importance level of each edge node is calculated with respect to different spatial distance relationships. An attention weight of the edge eki relative to ai is calculated through ωki=σ3 (anT[hi,a⊕hk,i,j⊕Wspki]), where an represents a second attention weight, Ws represents the second transfer matrix, and σ3 represents a third activation function. Then, ωki may be standardized, e.g., through a softmax function, so as to obtain a second standardized weight through

$$\beta_{ki}=\frac{\exp(\omega_{ki})}{\sum_{e_{ui}\in \mathrm{Neon}(a_i)}\exp(\omega_{ui})},$$

where βki represents the second standardized weight after standardizing ωki, and Neon(ai) represents the target neighboring edge set for the ith atomic node.
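As a non-limiting illustration, the calculation of the attention weight ωki, its standardization and a subsequent node update may be sketched as follows. The splicing inside the score follows the expression given above; the final update, taken here as the sum of the converted edge features weighted by βki, and the choice of tanh as the third activation function are assumptions of the sketch, as are the function names and values.

```python
import math


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_node(h_i, edge_feats, dist_vecs, attention, transfer,
                act=math.tanh):
    """Return the second standardized weights and an updated atom feature.

    h_i: converted first atom feature of the ith atomic node.
    edge_feats: converted features of the edges pointing to node i.
    dist_vecs: target distance vectors of those edges.
    """
    scores = []
    for h_e, p in zip(edge_feats, dist_vecs):
        spliced = h_i + h_e + matvec(transfer, p)
        scores.append(act(sum(a * x for a, x in zip(attention, spliced))))
    betas = softmax(scores)
    # Assumed update: weighted sum of the converted edge features.
    updated = [sum(b * h_e[d] for b, h_e in zip(betas, edge_feats))
               for d in range(len(edge_feats[0]))]
    return betas, updated


a_n = [0.0, 0.0, 1.0]   # hypothetical second attention weight
W_s = [[1.0]]           # hypothetical second transfer matrix
betas, h_new = update_node([1.0], [[2.0], [4.0]], [[1.0], [1.0]], a_n, W_s)
```

Both edges have the same distance vector and the attention vector only looks at the distance term, so the two edges are weighted equally.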
Finally, the atomic nodes are aggregated and updated in accordance with the second standardized weight βki, and the second atom feature
In this way, the second atom feature of each atomic node in the atomic node set may be obtained. A sum of the second atom features of all the atomic nodes is obtained as a representation of the molecular graph
and inputted into a fully connected layer consisting of a plurality of fully-connected layers cascaded to each other. The prediction of the affinity is performed through the fully-connected layer, so as to obtain the parameter value of the correlation, e.g.,
In a possible embodiment of the present application, the first GAT may be a hierarchical GAT, i.e., it includes L layers of GATs, where L is an integer greater than 1. In two adjacent layers of GATs, an input of the latter includes an output of the former. An input of a first layer of GAT in the L layers of GATs includes the target distance vector between the atomic nodes in the atomic node set, the spatial molecular graph and the first atom feature of the atomic node set. An output of an lth layer of GAT includes an lth-layer atom feature of the atomic node set, where 1≤l≤L. An output of a last GAT, i.e., an Lth layer of GAT, includes an Lth-layer atom feature of the atomic node set, i.e., the second atom feature of the atomic node set. The lth-layer atom feature is obtained by predicting an (l−1)th-layer atom feature of the atomic node set, the target distance vector between the atomic nodes in the atomic node set and an lth-layer target feature representation of the edge in the edge set according to the lth layer of GAT in the first GAT, and the lth-layer target feature representation of the edge in the edge set is obtained through inputting the target distance vector between the atomic nodes in the atomic node set, the spatial molecular graph and the (l−1)th-layer atom feature of the atomic node set into the lth layer of GAT for prediction.
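As a non-limiting illustration, the data flow between the L layers may be sketched as follows. ToyLayer is a hypothetical stand-in that uses scalar features and plain averaging instead of attention; only the order of the steps (edge representation first, then atom feature, with each layer consuming the output of the previous one) reflects the description above.

```python
class ToyLayer:
    """Hypothetical stand-in for one layer of GAT (no attention)."""

    def edge_step(self, h, edges, dist):
        # lth-layer edge representation from the (l-1)th-layer atom
        # features, the graph and the distance information.
        return {(k, i): h[k] + dist[(k, i)] for (k, i) in edges}

    def node_step(self, h, edge_repr, dist):
        # lth-layer atom feature from the incoming edge representations.
        # This toy step ignores dist; a real layer would use it.
        new_h = []
        for i, h_i in enumerate(h):
            incoming = [v for (k, t), v in edge_repr.items() if t == i]
            new_h.append(sum(incoming) / len(incoming) if incoming else h_i)
        return new_h


def run_layers(layers, first_features, edges, dist):
    """Feed the output of each layer into the next one."""
    h = first_features
    for layer in layers:
        edge_repr = layer.edge_step(h, edges, dist)
        h = layer.node_step(h, edge_repr, dist)
    return h  # Lth-layer atom features, i.e., the second atom features


edges = [(0, 1), (1, 0)]
dist = {(0, 1): 0.0, (1, 0): 0.0}
one_layer = run_layers([ToyLayer()], [1.0, 3.0], edges, dist)
two_layers = run_layers([ToyLayer(), ToyLayer()], [1.0, 3.0], edges, dist)
```

With two mutually connected atoms and zero distance terms, each toy layer simply swaps the two features, which makes the layer-by-layer propagation easy to follow.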
As an instance, an lth-layer initial feature representation
As an instance, ak,i,jl may be determined through
where ak,i,jl is a standardized weight of the lth layer of GAT in the first standardized weight related to the edge eki and the edge eij and it represents an importance level of the edge eki relative to the edge eij in the lth layer of GAT during the aggregation, σ2 represents the second activation function, ae,l represents a first attention weight of the lth layer of GAT, Wel represents a first weight matrix of the lth layer of GAT,
As an instance, a target feature representation of the edge eij between the ith atomic node and the jth atomic node in the lth layer of GAT, i.e., an lth-layer atom feature
The target neighboring edge set $\mathrm{Neon}(a_i)$ for the ith atomic node may be expressed as $\mathrm{Neon}(a_i)=\{e_{ki}\mid e_{ki}=(a_k,a_i)\in E\}$.
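The set definition above can be illustrated with a minimal sketch. It assumes that edges are stored as directed index pairs (k, i); the function name `target_neighboring_edges` and the sample edge list are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def target_neighboring_edges(edges):
    """Group directed edges (k, i) by their end point i.

    The list stored under key i plays the role of the target neighboring
    edge set for the ith atomic node (hypothetical helper, not from the
    original text).
    """
    incoming = defaultdict(list)
    for k, i in edges:
        incoming[i].append((k, i))
    return dict(incoming)


# Hypothetical toy edge set E over three atomic nodes.
edges = [(0, 1), (2, 1), (1, 0), (1, 2)]
neon = target_neighboring_edges(edges)
print(neon[1])  # edges whose end point is atomic node 1
```

Grouping the edges once by end point keeps the later per-node aggregation a simple lookup.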
Prior to the node aggregation, the representations of the atomic nodes and the edge nodes are uniformly transferred to a same vector space, i.e., hk,i,el=Whl
Next, an importance level of the edge node is calculated with respect to different spatial distance relationships. An attention weight of the edge $e_{ki}$ relative to $a_i$ in the lth layer of GAT may be calculated through $\omega_{ki}^{l}=\sigma_3\left(a_{n,l}^{T}\left[h_{i,a}^{l}\oplus h_{k,i,j}^{l}\oplus W_s^{l}p_{ki}\right]\right)$, where $a_{n,l}$ represents a second attention weight of the lth layer of GAT, $W_s^{l}$ represents a second transfer matrix of the lth layer of GAT, and $\sigma_3$ represents a third activation function. Then, $\omega_{ki}^{l}$ is standardized through a softmax function, i.e.,
$$\beta_{ki}^{l}=\frac{\exp\left(\omega_{ki}^{l}\right)}{\sum_{e_{mi}\in \mathrm{Neon}(a_i)}\exp\left(\omega_{mi}^{l}\right)},$$
where $\beta_{ki}^{l}$ represents a second standardized weight in the lth layer of GAT obtained after the standardization of $\omega_{ki}^{l}$, and $\mathrm{Neon}(a_i)$ represents the target neighboring edge set for the ith atomic node.
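The attention step described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the third activation function is taken to be a LeakyReLU, which the text does not specify, and all function names, parameter names and dimensions are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    # Assumed stand-in for the third activation function.
    return np.where(x > 0, x, slope * x)


def edge_to_node_attention(h_i, h_edges, p_edges, a_n, W_s):
    """Softmax-standardized weights over the incoming edges of one atomic node.

    h_i:     (d,)          transferred representation of the ith atomic node
    h_edges: (K, d)        transferred representations of its K incoming edges
    p_edges: (K, d_p)      target distance vectors of those edges
    a_n:     (2 * d + d_s,) second attention weight
    W_s:     (d_s, d_p)    second transfer matrix
    """
    K = h_edges.shape[0]
    proj = p_edges @ W_s.T                       # (K, d_s)
    concat = np.concatenate(
        [np.tile(h_i, (K, 1)), h_edges, proj], axis=1
    )                                            # (K, 2 * d + d_s)
    omega = leaky_relu(concat @ a_n)             # unnormalized weights, (K,)
    omega = omega - omega.max()                  # numerical stability only
    exp_omega = np.exp(omega)
    return exp_omega / exp_omega.sum()           # standardized weights, (K,)


rng = np.random.default_rng(0)
d, d_p, d_s, K = 4, 3, 2, 5
beta = edge_to_node_attention(
    rng.normal(size=d),
    rng.normal(size=(K, d)),
    rng.normal(size=(K, d_p)),
    rng.normal(size=2 * d + d_s),
    rng.normal(size=(d_s, d_p)),
)
```

Subtracting the maximum before exponentiation does not change the softmax result and only guards against overflow.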
Finally, the atomic nodes are aggregated and updated in accordance with the attention weight βkil, which is similar to extending the GAT to a multi-head GAT, and the resultant representations are averaged through
where
At a final prediction stage, a sum of the second atom features of all the atomic nodes is obtained as a representation of the molecular graph
and the affinity is predicted subsequently through a plurality of fully-connected layers, i.e.,
It should be appreciated that, when training the GAT, a mean square error between a prediction result ŷ of a training sample and an actually observed result y is taken as a training loss function, i.e.,
$$\mathcal{L}=\frac{1}{|\mathcal{D}|}\sum_{i\in\mathcal{D}}\left(\hat{y}_i-y_i\right)^{2},$$
where $\mathcal{D}$ represents the training samples, and $|\mathcal{D}|$ represents the quantity of training samples.
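The final prediction stage and the training loss described above can be sketched as follows. This is a minimal sketch, assuming a ReLU between the fully-connected layers (the text does not name the activation) and a scalar output; the names `predict_affinity` and `mse_loss` and the toy weights are hypothetical.

```python
import numpy as np


def predict_affinity(atom_features, layers):
    """Sum the atom features into a graph representation, then apply
    cascaded fully-connected layers given as (W, b) pairs."""
    x = atom_features.sum(axis=0)            # sum readout over all atomic nodes
    for idx, (W, b) in enumerate(layers):
        x = W @ x + b
        if idx < len(layers) - 1:
            x = np.maximum(x, 0.0)           # assumed ReLU on hidden layers
    return float(x[0])


def mse_loss(y_pred, y_true):
    """Mean square error between predicted and observed affinities."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))


# Hypothetical toy values: two atoms with two-dimensional features.
atom_features = np.array([[1.0, 2.0], [3.0, 4.0]])
layers = [
    (np.eye(2), np.zeros(2)),                # hidden layer
    (np.array([[1.0, 1.0]]), np.zeros(1)),   # output layer
]
y_hat = predict_affinity(atom_features, layers)
loss = mse_loss([y_hat, 2.0], [10.0, 4.0])
```

A sum readout keeps the graph representation independent of the order in which the atomic nodes are listed.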
In the embodiments of the present application, as shown in
In the embodiments of the present application, it is able to effectively learn distance information about each molecule in the three-dimensional space, thereby rapidly and accurately predicting the affinity of the combination of the drug and the target in conjunction with topological structure information about the molecular graph. To be specific, as compared with a traditional method and a physics-based method, it is able to reduce a computational cost and a time cost. As compared with a machine learning method, it is unnecessary to extract features in accordance with domain expert knowledge, and it is able to improve the prediction accuracy of the model. In addition, as compared with a common deep learning model, it is able to accurately model the spatial association between the molecules, and learn the spatial distance information that cannot be learned by the traditional method, thereby further improving the performance of the model.
As shown in
In a possible embodiment of the present application, the establishing the spatial molecular graph of the candidate drug and the target includes establishing the spatial molecular graph in accordance with a distance between atomic nodes in the atomic node set. A distance between two atomic nodes for any edge in the edge set is smaller than or equal to a predetermined distance threshold.
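The distance-threshold rule above can be sketched as follows. This is a minimal sketch, assuming that the atoms of the candidate drug and the target are given together as three-dimensional coordinates; the function name `build_spatial_graph`, the coordinates and the threshold value are hypothetical.

```python
import numpy as np


def build_spatial_graph(coords, cutoff):
    """Return directed edges (k, i) for every pair of distinct atoms whose
    distance is smaller than or equal to the cutoff, plus the distance matrix."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                 # (N, N) pairwise distances
    mask = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)  # no self-loops
    src, dst = np.nonzero(mask)
    edges = list(zip(src.tolist(), dst.tolist()))
    return edges, dist


# Hypothetical toy coordinates: the third atom is too far away to be connected.
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
edges, dist = build_spatial_graph(coords, cutoff=2.0)
```

The dense pairwise matrix is convenient for small complexes; for large targets a neighbor search structure would likely be preferable.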
In a possible embodiment of the present application, the device further includes: an encoding module configured to encode the distance between the atomic nodes in the atomic node set to obtain a first distance vector between the atomic nodes in the atomic node set; and a first conversion module configured to convert the first distance vector between the atomic nodes in the atomic node set into a target distance vector between the atomic nodes in the atomic node set. The inputting the first atom feature of the atomic node set and the spatial molecular graph into the first GAT for prediction so as to obtain the second atom feature of the atomic node set includes inputting the first atom feature of the atomic node set, the spatial molecular graph and the target distance vector between the atomic nodes in the atomic node set into the first GAT for prediction so as to obtain the second atom feature of the atomic node set.
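The two steps performed by the encoding module and the first conversion module can be sketched as follows. This passage does not specify the encoding, so the sketch assumes a one-hot binning of the distance followed by a linear conversion; the names `encode_distance`, `to_target_vector` and `W_p`, as well as the bin count, are hypothetical.

```python
import numpy as np


def encode_distance(dist, cutoff, num_bins):
    """Assumed encoding: one-hot vector marking the bin that contains dist."""
    bin_idx = min(int(dist * num_bins / cutoff), num_bins - 1)
    one_hot = np.zeros(num_bins)
    one_hot[bin_idx] = 1.0
    return one_hot


def to_target_vector(first_vec, W_p):
    """Assumed conversion: linear map from the first distance vector
    to the target distance vector."""
    return W_p @ first_vec


# Hypothetical setting: 5 bins over a cutoff of 5.0, 3-dimensional target vector.
first_vec = encode_distance(1.2, cutoff=5.0, num_bins=5)
W_p = np.arange(15, dtype=float).reshape(3, 5)
target_vec = to_target_vector(first_vec, W_p)
```

In a trained model the conversion matrix would be a learned parameter rather than a fixed array.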
In a possible embodiment of the present application, the prediction module includes: a second determination module configured to input the target distance vector between the atomic nodes in the atomic node set, the spatial molecular graph and the first atom feature of the atomic node set into the first GAT for prediction, so as to obtain a target feature representation of the edge in the edge set; and a third determination module configured to predict the first atom feature of the atomic node set, the target distance vector between the atomic nodes in the atomic node set and the target feature representation of the edge in the edge set according to the first GAT, to obtain the second atom feature of the atomic node set.
In a possible embodiment of the present application, the second determination module includes: a neighboring edge determination module configured to determine a neighboring edge set for an edge between an ith atomic node and a jth atomic node in the edge set, where i and j are integers, 1≤i≤N, 1≤j≤M, N represents the total quantity of the atomic nodes in the atomic node set, and M represents the quantity of atomic nodes each having an edge with the ith atomic node; a first determination sub-module configured to determine an initial feature representation of the edge in the neighboring edge set in accordance with a target distance vector between atomic nodes for the edge in the neighboring edge set, a first atom feature of the atomic nodes for the edge in the neighboring edge set, as well as a first activation function, a first transfer matrix and an offset vector in the first GAT; a second determination sub-module configured to determine a first standardized weight in accordance with the initial feature representation of the edge in the neighboring edge set, as well as a first weight matrix, a second activation function and a first attention weight in the first GAT; and a third determination sub-module configured to determine a target feature representation of the edge between the ith atomic node and the jth atomic node in accordance with the initial feature representation of the edge in the neighboring edge set, the first standardized weight and the first weight matrix in the first GAT.
In a possible embodiment of the present application, the third determination module includes: a fourth determination sub-module configured to determine a target neighboring edge set for the ith atomic node, an end point of any edge in the target neighboring edge set being the ith atomic node; and a fifth determination sub-module configured to determine the second atom feature of the ith atomic node in accordance with a target feature representation of the edge in the target neighboring edge set, the first atom feature of the ith atomic node, a target distance vector between atomic nodes for the edge in the target neighboring edge set, as well as a second attention weight, a second transfer matrix and a second weight matrix in the first GAT.
The device for determining the correlation between the drug and the target is used to implement the above-mentioned method with same technical features and technical effects, which will thus not be further particularly defined herein.
The present application further provides an electronic device, a computer-readable storage medium, and a computer program product.
In the embodiments of the present application, the non-transient computer-readable storage medium is configured to store therein computer instructions, and the computer instructions are executed by a computer to implement the above-mentioned method.
In the embodiments of the present application, the computer program product includes a computer program, and the computer program is executed by a computer to implement the above-mentioned method.
As shown in
Multiple components in the electronic device 500 are connected to the I/O interface 505. The multiple components include: an input unit 506, e.g., a keyboard, a mouse and the like; an output unit 507, e.g., a variety of displays, loudspeakers, and the like; a storage unit 508, e.g., a magnetic disk, an optical disk and the like; and a communication unit 509, e.g., a network card, a modem, a wireless transceiver, and the like. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices through a computer network and/or other telecommunication networks, such as the Internet.
The computing unit 501 may be any of various general purpose and/or special purpose processing components having a processing and computing capability. Some examples of the computing unit 501 include, but are not limited to: a central processing unit (CPU), a graphics processing unit (GPU), various special purpose artificial intelligence (AI) computing chips, various computing units running a machine learning model algorithm, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 carries out the aforementioned methods and processes, e.g., the method for determining the correlation between the drug and the target. For example, in some embodiments of the present application, the method may be implemented as a computer software program tangibly embodied in a machine readable medium such as the storage unit 508. In some embodiments of the present application, all or a part of the computer program may be loaded and/or installed on the electronic device 500 through the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the foregoing method may be implemented. Optionally, in some other embodiments of the present application, the computing unit 501 may be configured in any other suitable manner (e.g., by means of firmware) to implement the above-mentioned method.
Various implementations of the aforementioned systems and techniques may be implemented in a digital electronic circuit system, an integrated circuit system, a field-programmable gate array (FPGA), an application specific integrated circuit (ASIC), an application specific standard product (ASSP), a system on a chip (SOC), a complex programmable logic device (CPLD), computer hardware, firmware, software, and/or a combination thereof. The various implementations may include an implementation in form of one or more computer programs. The one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a special purpose or general purpose programmable processor, may receive data and instructions from a storage system, at least one input device and at least one output device, and may transmit data and instructions to the storage system, the at least one input device and the at least one output device.
Program codes for implementing the methods of the present application may be written in one programming language or any combination of multiple programming languages. These program codes may be provided to a processor or controller of a general purpose computer, a special purpose computer, or other programmable data processing device, such that the functions/operations specified in the flow diagram and/or block diagram are implemented when the program codes are executed by the processor or controller. The program codes may be run entirely on a machine, run partially on the machine, run as a standalone software package partially on the machine and partially on a remote machine, or run entirely on the remote machine or server.
In the context of the present application, the machine readable medium may be a tangible medium, and may include or store a program used by an instruction execution system, device or apparatus, or a program used in conjunction with the instruction execution system, device or apparatus. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. The machine readable medium includes, but is not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device or apparatus, or any suitable combination thereof. A more specific example of the machine readable storage medium includes: an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read only memory (ROM), an erasable programmable read only memory (EPROM or flash memory), an optical fiber, a portable compact disc read only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.
To facilitate user interaction, the system and technique described herein may be implemented on a computer. The computer is provided with a display device (for example, a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to a user, a keyboard and a pointing device (for example, a mouse or a track ball). The user may provide an input to the computer through the keyboard and the pointing device. Other kinds of devices may be provided for user interaction, for example, a feedback provided to the user may be any manner of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received by any means (including sound input, voice input, or tactile input).
The system and technique described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the system and technique), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet and a blockchain network.
The computer system can include a client and a server. The client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system, so as to overcome such defects as large management difficulty and insufficient service extensibility in a conventional physical host and a Virtual Private Server (VPS). The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be appreciated that, all forms of processes shown above may be used, and steps thereof may be reordered, added or deleted. For example, as long as expected results of the technical solutions of the present application can be achieved, steps set forth in the present application may be performed in parallel, performed sequentially, or performed in a different order, and there is no limitation in this regard.
The above embodiments are for illustrative purposes only, but the present application is not limited thereto. It should be appreciated that the foregoing specific implementations do not constitute a limitation on the protection scope of the present application. A person skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions can be made according to design requirements and other factors. Any modification, equivalent replacement and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Number | Date | Country | Kind |
---|---|---|---|
202110367301.8 | Apr 2021 | CN | national |