This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2018-201591, filed on Oct. 26, 2018, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein relate to a method and device for searching a compound.
A protein is a chain polymer where amino acids are linked one-dimensionally without branching. A protein forms a certain conformation (three-dimensional shape) by folding a chain polymer thereof. A conformation of a protein is determined by a sequence of amino acids.
A conformation of a protein is deeply related to functions of a protein. A molecule-recognition function of a protein is expressed by specifically bounding a certain region within a conformation thereof to a certain molecule. Therefore, it is important to determine a conformation of a protein to understand functions of the protein.
For example, a conformation of a protein can be determined by X-ray crystallography, or nuclear magnetic resonance spectroscopy (NMR). However, it takes a long time to determine a conformation of one protein by X-ray crystallography or NMR. According to X-ray crystallography, moreover, a single crystal of one kind of protein is created first. When the single crystal cannot be created, X-ray crystallography cannot be performed on a conformation of the protein. Moreover, NMR can determine a conformation of a protein in an aqueous solution without crystalizing the protein, but a large quantity of information related to a conformation of the protein cannot be obtained when the protein is a large protein.
Meanwhile, a sequence of amino acids of a protein can be relatively easily determined from genetic information or the protein itself, even when a conformation of the protein is unknown.
Accordingly, there have been attempts to predict a conformation of a protein from a sequence of amino acids. For example, there is a method for determining folding of a protein according to the diamond encoding method. The method is a method for embedding positions of chain amino acids in a diamond lattice, and can express a three-dimensional structure (conformation). Energy of the conformation determined by the above-described method can be calculated, for example, using the Ising model. To solve the Ising model, for example, an annealing machine is used. One example of the background technology is disclosed in R. Babbush et.al., Construction of Energy Functions for Lattice Heteropolymer Models: A Case Study in Constraint Satisfaction Programming and Adiabatic Quantum Optimization, arXiv:quant-ph/1211.3422v2 (https://arxiv.org/abs/1211.3422).
According to one aspect of the present disclosure, a device for searching a compound includes: a defining unit configured to define a lattice space that is a collection of lattices where a plurality of compound groups are sequentially arranged; a limiting unit configured to, in the case where any of the compound groups is arranged in any of the lattices of the lattice space, followed by arranging a next compound group in the lattice space, generate a limited lattice space that is a space created by eliminating, from the lattice space, undesirable regions for the next compound group to be arranged; an assigning unit configured to assign a bit to each of lattice points, to which the compound groups can be arranged, in the limited lattice space; an arithmetic unit configured to perform a ground state search on an Ising model obtained through conversion based on restriction conditions related to each of the lattice points according to simulated annealing, to thereby calculate minimum energy of the Ising model; a judging unit configured to judge whether any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space or not; and a controlling unit configured to cause the limiting unit to execute expansion of the limited lattice space, cause the assigning unit to execute assignment of the bits to the lattice points included in the limited lattice space after the expansion, and cause the arithmetic unit to execute calculation of the minimum energy of the Ising model, in the case where the judging unit judges that any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space, wherein the device is a device for searching the compound, in which a plurality of the compound groups are linked with one another.
According to another aspect of the present disclosure, a method for searching a compound includes: defining a lattice space that is a collection of lattices where a plurality of compound groups are sequentially arranged; in the case where any of the compound groups is arranged in any of the lattices of the lattice space, followed by arranging a next compound group in the lattice space, generating a limited lattice space that is a space created by eliminating, from the lattice space, undesirable regions for the next compound group to be arranged; assigning a bit to each of lattice points, to which the compound groups can be arranged, in the limited lattice space; performing a ground state search on an Ising model obtained through conversion based on restriction conditions related to each of the lattice points according to simulated annealing, to thereby calculate minimum energy of an Ising model; judging whether any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space or not; and executing expansion of the limited lattice space, assigning the bits to the lattice points included in the limited lattice space after the expansion, assignment of the bits to the lattice points included in the limited lattice space after the expansion, and calculation of the minimum energy of the Ising model, in the case where it is judged that any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space, wherein the method is a method for allowing a computer to search the compound in which a plurality of the compound groups are linked with one another.
According to another aspect of the present disclosure, a program for searching a compound for causing a computer to execute a method for searching a compound in which a plurality of compound groups are linked with one another. The method includes: defining a lattice space that is a collection of lattices where a plurality of compound groups are sequentially arranged; in the case where any of the compound groups is arranged in any of the lattices of the lattice space, followed by arranging a next compound group in the lattice space, generating a limited lattice space that is a space created by eliminating, from the lattice space, undesirable regions for the next compound group to be arranged; assigning a bit to each of lattice points, to which the compound groups can be arranged, in the limited lattice space; performing a ground state search on an Ising model obtained through conversion based on restriction conditions related to each of the lattice points according to simulated annealing, to thereby calculate minimum energy of an Ising model; judging whether any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space or not; and executing expansion of the limited lattice space, assigning the bits to the lattice points included in the limited lattice space after the expansion, assignment of the bits to the lattice points included in the limited lattice space after the expansion, and calculation of the minimum energy of the Ising model, in the case where it is judged that any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
The disclosed device for searching a compound is a compound search device for searching a compound in which a plurality of compound groups are linked with one another.
The device for searching a compound includes at least a defining unit, a limiting unit, an assigning unit, an arithmetic unit, a judging unit, and a controlling unit.
The defining unit is configured to define a lattice space that is a collection of lattices where a plurality of compound groups are sequentially arranged.
The limiting unit is configured to, in the case where any of the compound groups is arranged in any of the lattices of the lattice space, followed by arranging a next compound group in the lattice space, generate a limited lattice space that is a space created by eliminating, from the lattice space, undesirable regions for the next compound group to be arranged.
The assigning unit is configured to assign a bit to each of lattice points, to which the compound groups can be arranged, in the limited lattice space.
The arithmetic unit is configured to perform a ground state search on an Ising model obtained through conversion based on restriction conditions related to each of the lattice points according to simulated annealing, to thereby calculate minimum energy of the Ising model.
The judging unit is configured to judge whether any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space or not.
The controlling unit is configured to cause the limiting unit to execute expansion of the limited lattice space, causing the assigning unit to execute assignment of the bits to the lattice points included in the limited lattice space after the expansion, and cause the arithmetic unit to execute calculation of the minimum energy of the Ising model, in the case where the judging unit judges that any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space.
The disclosed method for searching a compound is a method for searching a compound in which a plurality of compound groups are linked with one another.
The method for searching a compound allows a computer to perform a method including: defining a lattice space that is a collection of lattices where a plurality of compound groups are sequentially arranged; in the case where any of the compound groups is arranged in any of the lattices of the lattice space, followed by arranging a next compound group in the lattice space, generating a limited lattice space that is a space created by eliminating, from the lattice space, undesirable regions for the next compound group to be arranged; assigning a bit to each of lattice points, to which the compound groups can be arranged, in the limited lattice space; and performing a ground state search on an Ising model obtained through conversion based on restriction conditions related to each of the lattice points according to simulated annealing, to thereby calculate minimum energy of the Ising model.
In the method for searching method, moreover, the computer judges whether any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space or not, and executes expansion of the limited lattice space, assigning the bits to the lattice points included in the limited lattice space after the expansion, assignment of the bits to the lattice points included in the limited lattice space after the expansion, and calculation of the minimum energy of the Ising model, in the case where it is judged that any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space.
The disclosed program for searching a compound is a program for causing a computer to execute a method for searching a compound in which a plurality of compound groups are linked with one another.
The method includes: defining a lattice space that is a collection of lattices where a plurality of compound groups are sequentially arranged; in the case where any of the compound groups is arranged in any of the lattices of the lattice space, followed by arranging a next compound group in the lattice space, generating a limited lattice space that is a space created by eliminating, from the lattice space, undesirable regions for the next compound group to be arranged; assigning a bit to each of lattice points, to which the compound groups can be arranged, in the limited lattice space; and performing a ground state search on an Ising model obtained through conversion based on restriction conditions related to each of the lattice points according to simulated annealing, to thereby calculate minimum energy of the Ising model.
In the program for searching method, moreover, the computer judges whether any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space or not, and executes expansion of the limited lattice space, assigning the bits to the lattice points included in the limited lattice space after the expansion, assignment of the bits to the lattice points included in the limited lattice space after the expansion, and calculation of the minimum energy of the Ising model, in the case where it is judged that any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space.
Since there is a restriction in hardware of an anealing machine for solving an Ising model, there are restrictions in the number of arithmetic bits or quantum bits the annealing machine can handle.
Meanwhile, the number of bits used for solving a problem of folding of a protein increases exponentially relative to a scale of the protein (the number of amino acid residues) as demonstrated in the graph of
As described above, a scale of a problem to be solved is limited by the restriction in the number of bits handled by hardware, and therefore search targets of amino acids cannot be expanded.
The present disclosure has an object to provide a device, method, and program for searching a compound, which can appropriately suppress the number of arithmetic bits or quantum bits used for searching a predetermined compound, and can search a compound having a large molecular weight.
According to one aspect of the present disclosure, provided is a device for searching a compound, which can appropriately suppress the number of arithmetic bits or quantum bits used for searching a predetermined compound, and can search a compound having a large molecular weight.
According to another aspect of the present disclosure, provided is a method for searching a compound, which can appropriately suppress the number of arithmetic bits or quantum bits used for searching a predetermined compound, and can search a compound having a large molecular weight.
According to another aspect of the present disclosure, provided is a program for searching a compound, which can appropriately suppress the number of arithmetic bits or quantum bits used for searching a predetermined compound, and can search a compound having a large molecular weight.
Before describing the details of the disclosed technology, a method for determining folding of a protein that is a compound according to the diamond encoding method will be described.
A search of a stable conformation of a protein is typically performed in the following manner.
First, coarse graining of a protein is performed (
Next, a structure search is performed using the created coarse-grained model (
Next, the coarse-grained model is returned back to the whole atoms (
The diamond encoding method is a method where a linear amino acid is embedded in a position on a diamond lattice, and can represents a three-dimensional structure. For the sake of simplicity, a two-dimensional structure is described as an example.
Used as an example is a linear pentapeptide having a structure illustrated in
First, an amino acid residue of No. 1 is arranged at a center of a diamond lattice as illustrated in
Next, in
Next, in
Next, in
In the manner as described above, a three-dimensional structure can be expressed by linking the positions where amino acid residues can be arranged.
As amino acid residues are bounded into a straight chain, a radius (n) of a diamond lattice space is set according to the number (n) of amino acid residues to be bounded.
However, amino acid residues are typically rarely arranged into a straight chain in a protein due to interaction between amino acid residues.
Therefore, a conformation of a protein can be determined without matching a radius r of a diamond lattice space with the number (n) of amino acid residues as illustrated in
According to the disclosed technology, therefore, in the case where any of the compound groups is arranged in any of the lattices of the lattice space, followed by arranging a next compound group in the lattice space, a limited lattice space that is a space created by eliminating, from the lattice space, undesirable regions for the next compound group to be arranged, is generated, and a bit is assigned to each of lattice points, to which the compound groups can be arranged in the limited lattice space. This technology may be referred to as “Referential Example” hereinafter. As a result, the number of arithmetic bits or quantum bits used for a search of a predetermined compound is suppressed, and a compound having a large molecular weight can be searched.
In this case, however, the arrangement of the compound group is limited by the outermost edge of the limited lattice space if the limited lattice space is too small. As a result, an appropriate conformation may not be obtained. Specifically, in the case where any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space, the lattice space may be excessively limited.
For example, an example where 5 amino acid residues are aligned in a diamond lattice space having a radius of 3 as illustrated in
In the disclosed technology, therefore, the limited lattice space is expanded further when any of the compound groups assigned to the lattice groups is arranged on the outermost edge of the limited lattice space. As a result, the number of arithmetic bits or quantum bits used for search of a predetermined compound can be appropriately suppressed, and a compound having a large molecular weight can be searched.
In the present specification, the term “outermost edge” means an outer shell of a diamond lattice space, and includes both the outermost surface and the outermost side.
For example, the compound groups are amino acid residues.
In the case where the compound groups are amino acid residues, examples of the compound include a protein.
Amino acid that is a base of an amino acid residue may be natural amino acid or synthetic amino acid. Examples of the natural amino acid include alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenyl alanine, proline, serine, threonine, tryptophan, tyrosine, valine, β-alanine, and β-phenylalanine. Examples of the synthetic amino acid include para-benzoylphenylalanine.
The number of amino acid residues in the protein is not particularly limited and may be appropriately selected depending on the intended purpose. For example, the number thereof may be from about 10 to about 30, or about several hundreds.
For example, the number thereof may be from about 10 to about 30 as long as the protein is a protein for middle molecule drug discovery.
One example of the disclosed technology will be described using examples of a device, flowcharts, etc., hereinafter.
A structural example of a device for searching a compound is illustrated in
The device for searching a compound 10A illustrated in
A flowchart for describing a method for searching a stable conformation of a protein using the device for searching a compound 10A of
First, the number (n) of amino acid residues (compound groups) constituting the input protein (an alignment of the amino acid residues) is counted by the compound group number counting unit 11 (S101).
Next, a lattice space that is a collection of lattices to which a plurality of the amino acid residues are sequentially arranged is defined by the defining unit 12 based on the number (n) of the amino acid residues (S102).
One example of the definition of the lattice space will be described. The lattice space is three dimensional, but a two dimensional lattice space is described as an example for simplicity.
First, a collection of lattices within a radius r in a diamond lattice space is determined as a shell, and each lattice point is determined as Sr. Each lattice point Sr is represented as in
In the case where a limited lattice space is not generated unlike the disclosed technology, for example, collections V1 to V5 of lattice points to which amino acid residues of Nos. 1 to 5 are moved is represented as in
In
In
In
In
Note that, when S1, S2, and S3 are represented in three dimension, S1, S2, and S3 are represented as in
In the case where a limited lattice space is not generated, a space Vi used for i-numbered amino acid residues in a protein having amino acid residues in the number of n is represented by the following formula.
In the formula above, i={1, 2, 3, . . . n}.
In case of an odd numbered (i=odd number) amino acid residue, J={1, 3, . . . i}. In case of an even numbered (i=even number) amino acid residue, J={2, 4, . . . i}.
In the disclosed technology, meanwhile, in the case where any of the compound groups is arranged in any of the lattices of the lattice space, followed by arranging a next compound group in the lattice space, a limited lattice space that is a space created by eliminating, from the lattice space, undesirable regions for the next compound group to be arranged, is generated by the limiting unit 13. For example, a space limiting parameter L (L<n) representing a size of a diamond lattice space is set (S103), and a collection of lattice points to which i-numbered amino acid residue is move under the limit of the space limiting parameter L is determined as Vi (S104).
Vi that is the space for the i-numbered amino acid residue is represented by the following formula.
In the formula above, i={1, 2, 3, . . . n}.
When the space limiting parameter L is an even number, and i<L:
When the space limiting parameter L is an even number and i>L:
When the space limiting parameter L is an odd number and i<L:
When the space limiting parameter L is an odd number and i>L:
As described above, a space to which an amino acid residue is arranged is determined.
Next, the assigning unit 14 is configured to assign a bit to each of the lattice points to which a plurality of compound groups are arranged in the limited lattice space. Specifically, special information is assigned to each of bits X1 to Xn (S105). As illustrated in
Next, Hone, Hconn, Holap, and Hpair are set and an Ising model obtained through conversion based on restriction conditions related to each lattice point is created (S106).
Setting of Hone, Hconn, Holap, and Hpair is performed in each of a Hone generating unit 15A, a Hconn generating unit 15B, a Holap, generating unit 15C, and a Hpair generating unit 15D of the H generating unit 15.
In the diamond encoding method, the entire energy can be expressed as follows.
E(x)=H=Hone+Hconn+Holap+Hpair
In the formula above, Hone is a restriction that there is only one from each of first to n-numbered amino acids.
Hconn is a restriction that the first to n-numbered amino acids are all linked with one another.
Holap is a restriction that the first to n-numbered amino acids are not overlapped with one another.
Hpair is a restriction representing an interaction between amino acids.
One example of each restriction is as follows.
Note that, in
X2 to X5 are positions to which an amino acid residue of No. 2 can be arranged.
X6 to X13 are positions to which an amino acid residue of No. 3 can be arranged.
X14 to X29 are positions to which an amino acid residue of No. 4 can be arranged.
One example of Hone is presented below.
In the function above, Xa and Xb may be 1 or 0. Specifically, Hone is a function that energy increases when any two or more of X2, X3, X4, and X5 are 1, because only one of X2, X3, X4, and X5 is 1 in
Note that, in the function above, Aone is a weighting coefficient.
One example of Hconn is presented below.
In the function above, Xd and Xu may be 1 or 0. Specifically, Hconn is a s formula that energy decreases as long as X13, X6, or X7 is 1 when X2 is 1 in
Note that, in the function above, λconn is a weighting coefficient. For example, the relationship of λone>λconn is satisfied.
One example of Holap is presented below.
In the function above, Xa and Xb are 1 or 0. Specifically, Holap is a term generating a penalty when X14 is 1 with X2 being 1 in
Note that, in the function above, λolap is a weighting coefficient.
One example of Hpair is presented below.
In the function above, Xa and Xb may be 1 or 0. Specifically, Hpair is a function that energy decreases due to interaction Pω(x1)ω(x15) between the amino acid residue of Xi and the amino acid residue of X15 when X15 is 1 with X1 being 1 in
Next, H is calculated by synthesizing Hone, Hconn, Holap, and Hpair by the synthesizing unit 15E.
Next, a weighting coefficient (λone, λconn, and λolap) of each functions above is extracted by the weight extracting unit 16.
Next, a weight file corresponding to the extracted weight coefficient is created by the weight file creating unit 17. For example, the weight file is a matrix. In case of 2X1X2+4X2X3, for example, the weight file is a matrix file as illustrated in
The following energy formula of the Ising model can be expressed by using the created weight file.
In the function above, the states Xi and Xj may be 0 or 1, where 0 means absence and 1 means presence. Wij that is a first term of the right side is a weighting coefficient.
The first term of the right side is the integration of the product of the state of two neuron circuits and the weighting value for all selectable combinations of two neuron circuits from the whole neuron circuits without any omission or overlap.
Moreover, the second term of the right side is the integration of the product of the bias value and state of each of the whole neuron circuits. bi is a bias value of the i-numbered neuron circuit.
Next, the arithmetic unit 18 (annealing machine) executes a ground state search of the Ising model converted based on the restriction conditions related to each of the lattice points according to simulated annealing to thereby calculate the minimum energy of the Ising model (S107).
The arithmetic unit 18 (annealing machine) may be any of a quantum annealing machine, a semiconductor annealing machine using a semiconductor technology, or simulated annealing executed by software using a central processing unit (CPU) or a graphics processing unit (GPU), if the computer for use is a computer employing an annealing system for performing a ground state search of an energy function represented by the Ising model.
One example of simulated annealing and the arithmetic unit 18 (annealing machine) will be described below.
Simulated annealing (SA) is a kind of Monte Carlo methods, and a method for stochastically determining using a random numerical value. In the description below, a problem for minimizing a value of an evaluation function to be optimized is taken as an example, and the value of the evaluation function is called energy. In case of maximization, a plus or minus sign of the evaluation function may be changed.
Starting with an initial state where one discrete value is assigned to each variable, a state that is close to the initial state (e.g., a state where only one variable is changed) is selected from the current state (combinations of values of the variables), and then state transition thereof is studied. An energy change for the state transition is calculated, and whether the state transition is adapted to change the state or the original state is retained without adapting the state transition is determined stochastically depending on the calculated value. When the adaption probability of a case where energy reduces is selected to be larger than the adaption probability of a case where energy increases, the state change occurs in the tendency that the energy reduces on average, and it is expected that the state transits to an appropriate state over time. Then, ultimately, it is possible to obtain an approximation solution that gives energy close to an optimum solution or an optimum value. If the case where energy reduces is adopted deterministically and the case where energy increases is not adapted, the energy change is in the state of weakly decreasing with respect to time, but the change will stop once the local solution is reached. Since there are a large number of local solutions in the discrete optimization problem as described above, it is most likely that the state is trapped by a local solution that is not very close to an optimum value. Accordingly, it is important to stochastically determine whether to adapt.
It is proved in the simulated annealing that the state reaches the optimum solution with the limit of infinite time (the number of iteration) when the adaptation (tolerance) probability of the state transition is determined as follows.
(1) With respect to an energy change (energy decrease) value (−ΔE) along with the state transition, acceptance probability p of the state transition is determined by any of the following functions f( ).
In the formula above, T is a parameter called a temperature value, which is changed as follows.
(2) The temperature value T is logarithmically decreased relative to the number of iteration t as represented by the following formula.
In the formula above, T0 is an initial temperature value, and is desired to be sufficiently large depending on a problem.
In the case where acceptance probability represented by the formula of (1) is used, once the state reaches a steady state after sufficient iterations, occupancy probability of each state follows the Boltzmann distribution for a thermal equilibrium state in thermodynamics.
As the temperature is gradually lowered from a high temperature, occupancy probability of a low energy state increases. Therefore, a low energy state is supposed to be obtained when the temperature is sufficiently reduced. The state as described above is very similar to a state change occurred when a material is developed. Therefore, the method described above is called simulated annealing. The stochastic occurrence of the state transition of energy increase is equivalent to thermal excitation in physics.
An optimizing device (arithmetic unit 18) for performing simulated annealing is illustrated in
An optimizing device 100 includes a state retaining unit 111 configured to retain a current state S (a plurality of state variable values). Moreover, the optimizing device 100 includes an energy calculating unit 112 configured to calculate an energy change value {−ΔEi} of each state transition when a state transition occurs from the current state S due to a change of any of the state variable values.
Moreover, the optimizing device 100 includes a temperature controlling unit 113 configured to control a temperature value T and a transition controlling unit 114 configured to control a state transition.
The transition controlling unit 114 is configured to stochastically determine whether any of the state transitions is adapted or not according to the correlation between the energy change value {−ΔEi} and the thermal excitation energy based on the temperature value T, the energy change value {−ΔEi}, and the random numerical value.
The transition controlling unit 114 is further subdivided. The transition controlling unit 114 includes a candidate generating unit 114a configured to generate candidates of state transition, and a judging unit 114b configured to stochastically judge on each candidate whether the state transition is allowed or not based on the energy change value {ΔEi} and temperature value T thereof. The transition controlling unit 114 further includes a transition determining unit 114c configured to determine the candidate to be adapted among the allowed candidates, and a random number generating unit 114d configured to generate probability variables.
An operation of one iteration is as follows. First, the candidate generating unit 114a generates one or more candidates (candidate number {Ni}) of state transition from the current state S retained in the state retaining unit 111 to the next state. The energy calculating unit 112 calculates an energy change value {−ΔEi} for each of state transitions listed as candidates using the current state S and the candidates of state transition. The judging unit 114b accepts the state transition with the acceptance probability of the formula of (1) above according to the energy change value {−ΔEi} of each state transition using the temperature value T generated by the temperature controlling unit 113 and the probability variable (random numerical value) generated by the random number generating unit 114d. Then, the judging unit 114b outputs acceptance or rejection {fi} of each state transition. In the case where there are a plurality of the accepted state transitions, the transition determining unit 114c randomly selects one of the accepted state transitions using the random numerical value. The transition determining unit 114c outputs the transition number N of the selected state transition and acceptance or rejection of the transition f. In the case where there is the accepted state transition, the value of the state variable stored in the state retaining unit 111 is updated according to the adapted state transition.
The iteration described above is started from an initial state and repeated with decreasing the temperature value by the temperature controlling unit 113. When the finishing judgement conditions, such as reaching the certain number of iterations, or the energy being dropped below a certain value, are satisfied, the operation is completed. The answer output by the optimizing device 110 is the state at the time of the finish.
A transition controlling unit 114 includes a random number generator 114b1, a selector 114b2, a noise table 114b3, a multiplier 114b4, and a comparator 114b5.
The selector 114b2 is configured to select the value corresponding to the transition number N that is a random numerical value generated by the random number generator 114b1 among the energy change values {ΔEi} calculated for candidates of each state transition, and then output the value.
Functions of the noise table 114b3 will be described later. As the noise table 114b3, for example, a memory, such as a random access memory (RAM), and a flash memory, can be used.
The multiplier 114b4 outputs a product (corresponding to the above-described thermal excitation energy) obtained by multiplying the value output by the noise table 114b3 with the temperature value T.
The comparator 114b5 outputs, as transition acceptance or rejection f, a comparison result obtained by comparing the product result output by the multiplier 114b4 and the energy change value −ΔE selected by the selector 114b2.
The transition controlling unit 114 illustrated in
The circuit that outputs 1 with the acceptance probability p and 0 with (1−p) has two inputs A and B, can be realized by inputting the acceptance probability p to the input A of the comparator and a uniform random number having the value in the interval [0, 1] to the input B of the comparator where the comparator outputs 1 when A>B and outputs 0 when A<B. Accordingly, the above-described function can be realized by inputting the value of the acceptance probability p calculated from the energy change value and the temperature value T using the formula of (1) to the input A of the comparator.
Specifically, the above-described function can be realized with the circuit that outputs 1 when f(ΔE/T) is larger than u, where f is the function represented by the formula of (1) and u is a uniform random number having the value of the interval [0, 1].
The circuit may be as it is, but the same function can be also realized by performing the following deformation. The magnitude relationship of two numbers does not change when the same monotone increasing function is given the two numbers. Therefore, output does not change even when the same monotone increasing function is gives two inputs of the comparator. It can be understood that a circuit outputting 1 when −ΔE/T is larger than f−1(u) is acceptable when an inverse function f−1 off is used as the monotone increasing function. Since the temperature value T is a positive value, moreover, a circuit outputting 1 when −ΔE is larger than Tf1(u) is acceptable. The noise table 114b3 in
The transition controlling unit 114 also includes a latch configured to retain judgement results etc., a state machine configured to generate timing thereof, etc., but the above-mentioned units are omitted in
Next, as a confirmation whether the limited lattice space is set to have a sufficient space, the judging unit 19 judges whether any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space (S108).
Typically, the judgement is performed on a conformation of a protein having the minimum energy calculated.
During the judging, the judgement is performed on whether any of the compound groups excluding the compound group arranged first and the compound group arranged last is arranged on the outermost edge of the limited lattice space among the compound groups assigned to the lattice points. This is because the compound group arranged first is not typically arranged on the outermost edge. Moreover, this is because the arrangement of the compound group is not limited by the outermost edge as there is no more compound group to be arranged next even when the compound group arranged last is arranged on the outermost edge.
When the judging unit 19 judges that any of the compound groups assigned to the lattice points is not arranged on the outermost edge of the limited lattice space, it is judged that a sufficient space of the limited lattice space is set, and a calculation result related to conformation of a protein having the calculated minimum energy is output (S109).
The calculation result is output from the outputting unit 21. The result may be output as a protein conformation diagram or may be output as coordinate information of each amino acid residue constituting a protein.
When the judging unit 19 judges that any of the compound groups assigned to the lattice points is arranged on the outermost edge of the limited lattice space, meanwhile, it is judged that a sufficient space of the limited lattice space is not set and hence the limited lattice space is expanded. Therefore, expansion information K is added to the space limiting parameter L (S110). Then, assignment of bits to the lattice points included in the limited lattice space after the expansion and calculation of the minimum energy of the Ising model are performed (S104 to S107).
During this step, the controlling unit 20 causes the limiting unit 13 to execute expansion of the limited lattice space.
Moreover, the controlling unit 20 causes the assigning unit 14 to execute assignment of a bit to each lattice point included in the limited lattice space after the expansion.
Moreover, the controlling unit 20 causes the arithmetic unit 18 to execute calculation of the minimum energy of the Ising model.
When the limited lattice space is expanded, the limited lattice space is not typically expanded to the original lattice space.
For example, the embodiment of the expansion may be an embodiment where the limited lattice space is uniformly expanded by the predetermined number of lattice points towards the outside of the outermost edge of the limited lattice space, and may be an embodiment where only a lattice space surrounding the compound group arranged on the outermost edge of the limited lattice space is expanded.
Moreover, the controlling unit 20 preferably does not change the already assigned bits to the lattice points of the compound group judged as being arranged on the outermost edge of the limited lattice space and the compound group arranged the earlier than the compound group judged as being arranged on the outermost edge. In the case where there are a plurality of compound groups judged as being arranged on the outermost edge of the limited lattice space, the “compound group judged as being arranged on the outermost edge of the limited lattice space” is preferably the compound group arranged the earliest among the compound groups judged as being arranged on the outermost edge of the limited lattice space.
Moreover, more preferred is that a difference (n−M) between the order (n) of the arrangement of a compound group arranged at last, and the order (M) of arrangement of the compound group judged as being arranged on the outermost edge of the limited lattice space be considered in expansion information, and the controlling unit 20 be configured to cause the limiting unit 13 to execute expansion of the limited lattice space based on the expansion information in a manner that the limited lattice space is expanded smaller when the difference (n−M) is small than when the difference (n−M) is large.
One of the above-described embodiments may be performed or the above-described embodiments may be performed in combination. Examples thereof will be described in Examples 1 to 6 below.
An example of expansion of a limited lattice space will be described with reference to drawings.
In
Therefore, it is judged by the judgment performed in Step S108 that any of compound groups assigned to lattice points is arranged on the outermost edge of the limited lattice space and expansion of the diamond lattice space is performed.
The expansion is an embodiment where the limited lattice space is uniformly expanded by the predetermined number from the lattice number towards the outside of the outermost edge.
For example, the expansion above can be performed according to the expansion information K.
When compound groups are sequentially arranged, arrangement of the compound groups is not restricted by the outermost edge of the limited lattice space until the compound group is arranged on the outermost edge of the limited lattice space.
Therefore, preferred in view of reduction in a calculation time is that, as well as expanding the limited lattice space with maintaining the arrangement of the compound groups until a compound group is arranged on the outermost edge of the limited lattice space, assignment of bits to the lattice points included in the limited lattice space after the expansion and calculation of the minimum energy of the Ising model be executed.
In the case where assignment of bits to the lattice points included in the limited lattice space after the expansion and calculation of the minimum energy of the Ising model are executed as well as expanding the limited lattice space, it is preferred that bits already assigned to the lattice points of the compound group judged as being arranged on the outermost edge of the limited lattice space and the compound group arranged the earlier than the compound group judged as being arranged on the outermost edge be not changed. In the case where a plurality of compound groups are judged as being arranged on the outermost edge of the limited lattice space, in Example 2, the “compound group judged as being arranged on the outermost edge of the limited lattice space” is preferably the compound group arranged the earliest among the compound groups judged as being arranged on the outermost edge.
The examples above will be described with a flowchart and drawings.
The flowchart of
In
In
Therefore, it is judged by the judgment performed in Step S108 that any of compound groups assigned to lattice points is arranged on the outermost edge of the limited lattice space and expansion of the diamond lattice space is performed.
As illustrated in
For example, the constraint term Hfix is represented by the following formula.
In the function above, XAi is an address to be fixed (not changed), and i is the sequence number of the amino acid residues.
In the function above, λfix is a weighting coefficient.
E(x) to which the constraint term Hfix is added can be expressed as follows.
E(x)=H=Hone+Hconn+Holap+Hpair+Hfix
The constraint term Hfix is prioritized than other constraint terms by making λfix of the constraint term Hfix sufficiently larger than the weighting coefficient than other constraint terms, and therefore XAi is fixed to 1.
An embodiment of the expansion is preferably an embodiment where only the lattice space surrounding the compound group arranged on the outermost edge of the limited lattice space is expanded. There is a less possibility that an amino acid residue is arranged even when the lattice space other than the space surrounding the compound group arranged on the outermost edge of the limited lattice space is expanded. Therefore, the expanding range can be maintained to an appropriate range by using an embodiment where only the lattice space surrounding the compound group arranged on the outermost edge of the limited lattice space.
To expand only the lattice space surrounding the compound group arranged on the outermost edge of the limited lattice space may be referred to as “giving the expansion directionality” hereinafter.
An embodiment of Example 3 is an embodiment where directionality is given to the expansion in the embodiment of Example 2.
An embodiment of Example 3 will be described with reference to drawings.
In
In
Therefore, it is judged by the judgment performed in Step S108 that any of compound groups assigned to lattice points is arranged on the outermost edge of the limited lattice space and expansion of the diamond lattice space is performed.
As illustrated in
In the case where the sequence number of the compound group arranged on the outermost edge of the limited lattice space is large, the number of compound groups (remaining compound groups) having the larger sequence number than the compound group arranged on the outermost edge is small. In the case where the sequence number of the compound group arranged on the outermost edge of the limited lattice space is small, on the other hand, on the other hand, the number of compound groups (remaining compound groups) having the larger sequence number than the compound group arranged on the outermost edge is large. In the case where the number of the remaining compound groups is large, appropriate expansion may not be performed unless a range of the expansion is made large. In the case where the number of the remaining compound groups is small, on the other hand, most of the expanded range may be pointless.
Therefore, a range of expansion of the limited lattice space is preferably made large when the sequence number of the compound group arranged on the outermost edge of the limited lattice space. A range of expansion of the limited lattice space is preferably made small when the sequence number of the compound group arranged on the outermost edge of the limited lattice space is large.
Specifically, it is preferred that the limited lattice space is preferably expanded in a manner that the limited lattice space be expanded smaller when the difference (n−M) is small than when the difference (n−M) is large, considering in the expansion information K the difference (n−M) between the order (n) of the arrangement of the compound group arranged last and the order (M) of the arrangement of the compound group judged as being arranged on the outermost edge of the limited lattice space.
The above-described embodiment may be referred to as an “amino acid residue number dependency” of the expansion range hereinafter.
The embodiment of Example 4 is an embodiment where the amino acid residue number dependency is given to the expansion range in the embodiment of Example 1.
An embodiment of Example 4 will be described with reference to drawings.
In
In
Therefore, it is judged by the judgment performed in Step S108 that any of compound groups assigned to lattice points is arranged on the outermost edge of the limited lattice space and expansion of the diamond lattice space is performed.
When the expansion is performed, a range K of the expansion (the number of layers of the lattice points) is determined based on the following function.
K=roundup((n−M)/2)
In the equation above, “round up” is a function for rounding up after the decimal point.
In case of n=7 and M=4, for example, (7−4)/2=1.5 and K=2. As illustrated in
Example 5 is an embodiment where Example 4 and Example 2 are combined.
The embodiment of Example 5 will be described with reference to drawings.
In
In
Therefore, it is judged by the judgment performed in Step S108 that any of compound groups assigned to lattice points is arranged on the outermost edge of the limited lattice space and expansion of the diamond lattice space is performed.
When the expansion is performed, a range K of the expansion (the number of layers of the lattice points) is determined based on the following function.
K=roundup((n−M)/2)
In the equation above, “round up” is a function for rounding up after the decimal point.
In case of n=7 and M =4, for example, (7−4)/2=1.5 and K=2. As illustrated in
Example 6 is an embodiment where Example 4 and Example 3 are combined.
The embodiment of Example 6 will be described with reference to drawings.
In
In
Therefore, it is judged by the judgment performed in Step S108 that any of compound groups assigned to lattice points is arranged on the outermost edge of the limited lattice space and expansion of the diamond lattice space is performed.
When the expansion is performed, a range K of the expansion (the number of layers of lattice points) is determined based on the following function with giving the directionality to the range of the expansion.
K=roundup((n−M)/2)+1
In the equation above, “round up” is a function for rounding up after the decimal point.
In case of n=7 and M=4, for example, (7−4)/2=1.5 and K=3. As illustrated in
Examples 1 to 6 are described above, and can be summarized in
Note that, the phrase “Is it automatically set?” in
Moreover, the conventional method in
Moreover, Referential Example in
Note that, the device for searching a compound 10A of
Next, a modified example of the steps S101 to S107 of the flowchart of
In the flowchart of
Therefore, the descriptions are given with focusing on the limiting unit 13 and the step S203.
By setting the maximum number M (straight chain number limiting parameter M) of amino acid residues aligned in a straight chain (S203), the limiting unit 13 generates a limited lattice space obtained by removing, from the lattice space, undesirable regions for the next compound group to be arranged, in the case where any of a plurality of compound groups is arranged in any lattice of the lattice space, followed by arranging the next compound group in the lattice space.
As described earlier, amino acid residues are typically rarely aligned in a straight chain due to interaction between the amino acid residues.
Therefore, the number of arithmetic bits or quantum bits can be suppressed by setting the maximum number M (straight chain number limiting parameter M) of amino acid residues aligned in a straight chain, and eliminating regions where the amino acid residues are not be disposed under the restrictions above to thereby generate a limited lattice space. Naturally, M is smaller than the number (n) of amino acid residues (M<n).
For example, when the straight chain number limiting parameter M is set to 5, as illustrated in
When the straight chain number limiting parameter M is set, the limited lattice space increases as the number of the amino acid residues increases as illustrated in
A space limiting parameter L (L<n) may be used in combination for generation of a limited lattice space. In this case, it is preferable that L<K be satisfied.
In the flowchart of
Therefore, descriptions are given with focusing on the limiting unit 13 and the step S304.
In the case where any of a plurality of compound groups is arranged in any lattice of a lattice space followed by arranging a next compound group in the lattice space, the limiting unit 13 creates a limited lattice space, which is obtained by eliminating undesirable regions for the next compound group to be arranged from the lattice space, by setting the maximum number M (straight chain limiting parameter M) of amino acid residues aligned in a straight chain (S303), and moreover defining the maximum S(i) of the site to which i-numbered amino acid residue is moved (S304).
When the straight chain number limiting parameter M is used, a space radius r of each amino acid residue is for example as presented in Table 1 with M=5 (K=8), n=11, and L=K.
The above-described example is visualized as in
Therefore, the straight chain limiting parameter M is added and a space parameter s(x) using the straight chain number limiting parameter s(x). As a result, the space can be limited as follows, and the number of bits can be suppressed without lowering accuracy.
When the space limiting parameter L is an even number, and i<L:
When the space limiting parameter L is an even number, and i>L:
When the space limiting parameter L is an odd number, and i<L:
When the space limiting parameter L is an odd number, and i>L:
Regarding the technology of referential examples, an embodiment where a straight chain number limiting parameter is not set is determined as Referential Example 1, a modified example for describing using
It was confirmed in all of Examples that the number of bits used could be significantly reduced compared to Comparative Example 1 where no restriction was given, and a compound having relatively a large scale of a problem (e.g., a protein) could be used as a target of a search.
According to the referential examples above, however, the arrangement of the compound groups is limited by the outermost edge of the limited lattice space if the limited lattice space is two narrow. As a result, an appropriate conformation may not be obtained.
Therefore, the number of bits can be appropriately suppressed, for example, by combining any of the referential examples with Examples 1 to 6, compared with the conventional technology.
When a lattice space is not limited, for example, the number of bits used for amino acid residues in the number of n is represented by the following formula.
When the limit of the lattice space (L: space limiting parameter) is set to L=n−1 and the radium of the space to be expanded (K: expansion information) is set to K=1, in the case where the limited lattice space is expanded with directionality, for example, the directional increase is 3 bits when the amino acid residue arranged on the outermost edge is positioned on the outermost plane of the limited lattice space, is 4 bits when the amino acid residue arranged on the outermost edge is positioned on the outermost side of the limited lattice space, and is 5 bits when the amino acid residue arranged on the outermost edge is positioned on the outermost apex of the limited lattice space. Therefore, the number of bits used for expanding the limited lattice space with directionality can be represented by the following formula when the limit of the lattice space (L: space limiting parameter) is set to L=n−1 and the radius of the space to be expanded (K: expansion information) is set to K=1.
An example where 11 amino acid residues are arranged as in
When the lattice space for the 11 amino acid residues is not limited (in the case of n=11), the number of bits used is 2,921 bits.
In case of the arrangement as in
In case of the arrangement as in
Note that, such a reduction effect becomes the larger as the number of the amino acid residues is the larger.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the sprit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2018-201591 | Oct 2018 | JP | national |