The present invention relates, in general, to DNA computing and, more particularly, to a DNA computing method of providing solutions of theorem proving with a resolution refutation.
DNA computing, also known as molecular computing, is a new computational paradigm that goes beyond conventional, sequential, irreversible silicon computing. It harnesses biological molecules to solve computational problems, with the promise of massive parallelism: given enough DNA, solutions to huge problems can be sought by a parallel search. Research in this area began in 1994 with an experiment by Leonard Adleman, who used the tools of molecular biology to solve a hard computational problem.
Current silicon computing uses binary bits to write information in code where binary numbers 1 and 0 correspond to electric states “ON” and “OFF”. The basic calculation of information processing in silicon computing consists of logical product (AND calculation: e.g., 1 and 1=1; 1 and 0=0; 0 and 0=0) and logical sum (OR calculation: e.g., 1 or 1=1; 1 or 0=1; 0 or 0=0).
By contrast, DNA computing represents any information with synthetic DNA molecules consisting of four basic bases: adenine (A), guanine (G), thymine (T) and cytosine (C). Instead of using electrical impulses to represent bits of information, DNA computers use the chemical properties of these molecules by examining the patterns of combination or growth of the molecules or strings. For instance, to represent the number “2”, silicon-based computers use electric states “ON” and “OFF” corresponding to the 8-bit combination 00000010. In contrast, DNA-based computers employ the bases in the 8-nucleotide combination AAAACCTG.
According to the father of DNA computing, Adleman, computing by representing any information in DNA base sequences enjoys the following advantages thanks to the massive parallelism of the biochemical reactions on DNA molecules:
Huge information can be represented in logic which can be processed. As is well known in the computation and artificial intelligence fields, logic is used as a method for expressing the most fundamental knowledge and inferring new knowledge from the preexisting knowledge known to be true.
In order to enable DNA-based computers to process a huge amount of real-life information and infer new knowledge with logic, it is important to develop logic expressions and processing methods. Logic expressions consist of true and false symbols for expressing truth or falsehood, Boolean variables which can take only T (true) or F (false) as their values, and connectives such as logical sum (∨), logical product (∧), negation (¬), implication (→) and equality (↔). For the processing of logic, such concepts or methods as propositional logic, propositional calculus, predicate logic, predicate calculus, inference, resolution and refutation are mainly used.
In the automated reasoning field, which is one of the artificial intelligence fields, research has been made on silicon computer-based logic processing methods for representing real-life knowledge and eliciting new knowledge from the known knowledge through inference. Theorem proving, one of the problems which has been under study in the field, is a method for logical reasoning which determines whether or not newly obtained knowledge is consistent with preexisting knowledge, or whether or not new knowledge can be obtained from preexisting knowledge through a logical procedure. The automated reasoning field sets the goal of letting computers automatically perform such processes so as to endow the computers with automated information processing capability. An expert system is an example in which such technology is applied to real life.
There are two methods of eliciting a solution to theorem proving: a deductive method in which, starting from preexisting knowledge (hereinafter referred to as “theory”), new knowledge (hereinafter referred to as “theorem”) is obtained through rounds of deduction; and a refutation method in which the negation of a theorem is added to a theory set and the theory set is then examined for inconsistency. The deductive method is known to offer no assured procedure for deriving a theorem from a theory. In contrast, the refutation method makes it possible to give a solution to theorem proving only by repetitively applying reasoning rules to theory and theorem sets. That is, computers into which the available reasoning rules are inputted can automatically solve theorem proving problems. Therefore, extensive research has been made on the refutation method in the automated reasoning field.
Because resolution is not only a reasoning method of always eliciting logically true results, but gives simple and mechanical rules suitable for computers, it has been used to find solutions to theorem proving in computers. However, it is necessary to express the theory and theorem in a certain form, that is, a conjunctive normal form.
Prior to defining the conjunctive normal form, explanations are needed for some terms. The term “literal” as used herein means the most fundamental unit of a logical formula, and may be either a positive or a negative literal. Positive literals consist of Boolean variables themselves while negative literals are expressed as negations of Boolean variables. The term “clause” is defined as a formula in which positive or negative literals are connected to one another only through a logical sum. The conjunctive normal form means a formula in which clauses are connected to one another only through a logical product, and is expressed as a set of clauses on a computer. According to research results, it is known that all logical formulas can be expressed in a conjunctive normal form by making use of the distribution law [P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R)] and the De Morgan law [¬(P ∨ Q) ≡ ¬P ∧ ¬Q].
The following describes how resolution is applied to logical formulas expressed in a conjunctive normal form, assuming that a positive literal belongs to one of two clauses and its negation to the other clause. After removal of the two contradictory literals from the two clauses, the remaining literals are connected via a logical sum to draw a new clause. It has been demonstrated that the clause thus obtained is a logically correct derivation from the two clauses. For instance, the two clauses P ∨ Q and ¬P ∨ R are resolved into Q ∨ R. The newly drawn clause Q ∨ R is called a resolvent.
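The resolution rule just described can be illustrated in software. The following is a minimal sketch, not part of the claimed method: clauses are modeled as sets of literal strings, and the `~` prefix for negation and the function names `negate` and `resolve` are assumptions introduced only for this illustration.

```python
from typing import FrozenSet, Optional

# A clause is a set of literals; "P" is positive, "~P" is its negation
# (the "~" notation is an assumption of this sketch).
Clause = FrozenSet[str]


def negate(literal: str) -> str:
    """Return the complementary literal."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolve(c1: Clause, c2: Clause, literal: str) -> Optional[Clause]:
    """Resolve c1 and c2 on `literal`.

    The literal must occur in c1 and its negation in c2; both are removed
    and the remaining literals are joined by a logical sum (set union).
    Returns None when the rule does not apply.
    """
    if literal not in c1 or negate(literal) not in c2:
        return None
    return (c1 - {literal}) | (c2 - {negate(literal)})


# P v Q and ~P v R resolve into Q v R.
resolvent = resolve(frozenset({"P", "Q"}), frozenset({"~P", "R"}), "P")
print(sorted(resolvent))  # ['Q', 'R']
```

Resolving the unit clauses P and ¬P with the same function returns the empty set, which corresponds to the empty clause.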
Where a theorem proving problem is solved by a resolution refutation, the presence of inconsistency produces a conjunctive normal form of P ∧ ¬P, which is expressed as two clauses, P and ¬P, on computers. If resolution is applied to these two clauses, the resulting resolvent is a clause containing no literal, called an “empty clause”. In the resolution refutation, the empty clause is construed as a signal of inconsistency. A proof is an orderly arrangement of the clauses and resolvents which are used until an empty clause is produced.
A simple example of the resolution refutation is seen below. Given is a theory set of “if it rains, then the ground is wet.” and “it rained.” Let us prove that the proposition “the ground was wet” is a conclusion which can be drawn from this theory set. If the proposition “it rains” is expressed as a Boolean variable R and the proposition “the ground is wet” as a Boolean variable E, the proposition “if it rains, then the ground is wet” is expressed as R → E. This is known to be equivalent to ¬R ∨ E. Thus, the theory set is {¬R ∨ E, R}. The negation of the logical formula to be proved, expressed as ¬E, is added to the theory set. Resolution applied to the clauses ¬R ∨ E and R draws E. Resolving the clauses E and ¬E results in an empty clause. Therefore, it is proven that the original logical formula is a conclusion which can be logically drawn from the theory set.
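The rain example can be checked with a conventional, sequential program. The following sketch is an illustration only, under the assumption that clauses are sets of literal strings with `~` marking negation. The saturation loop in the hypothetical function `refutes` is one simple way to search for an empty clause on a silicon computer and is not the DNA procedure of the invention.

```python
from itertools import combinations


def negate(literal):
    """Return the complementary literal ("~" marks negation in this sketch)."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 with c2 on one literal."""
    return {
        (c1 - {lit}) | (c2 - {negate(lit)})
        for lit in c1
        if negate(lit) in c2
    }


def refutes(theory, goal):
    """Add the negated goal to the theory and saturate under resolution.

    Returns True if an empty clause is drawn, False if no new resolvent
    can be produced (the goal does not follow from the theory).
    """
    clauses = set(theory) | {frozenset({negate(goal)})}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolvents(c1, c2):
                if not r:  # empty clause: inconsistency found
                    return True
                new.add(r)
        if new <= clauses:  # nothing new: saturation without refutation
            return False
        clauses |= new


# Theory: "if it rains, the ground is wet" (~R v E) and "it rained" (R).
theory = [frozenset({"~R", "E"}), frozenset({"R"})]
print(refutes(theory, "E"))  # True
```

Dropping the clause R from the theory makes the search saturate without an empty clause, so the function returns False.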
As seen above, the resolution refutation is a sequential repetition of a procedure in which resolution is applied to two clauses to draw a new resolvent. In a theorem proving process, the complexity of proof grows exponentially with the number of given clauses. Because they are limited to sequential calculation, silicon-based computers can test logically correct proofs only one by one. Hence, a strategy is needed for selecting the clauses to which resolution is applied so carefully that as few proofs as possible have to be tested. To this end, a breadth-first strategy and a depth-first strategy were suggested. In addition, as approaches to reduce the selection range of the clauses to which resolution is applied, linear resolution, semantic resolution and the like were developed. Despite all these strategies, however, no exceptional advance in performance has been achieved on silicon-based computers.
A solution to this problem is to parallelize the theorem proving process of drawing a resolvent from two clauses. The massive parallelism of DNA molecular reactions for implementing parallel theorem provers makes it possible to produce resolvents from many clauses simultaneously. Accordingly, the massive parallelism of DNA molecular reactions not only exceptionally reduces the time period which it takes to perform a theorem proving process, but enables various theorem proving processes to be implemented at the same time. In contrast to silicon-based computers where the computing time period increases exponentially with the number of clauses, DNA-based computers can give solutions to theorem proving in an exceptionally short period of time. Moreover, if theorem proving problems are solved by experiments, DNA molecules can be used without digitalizing the biological data which are enriched due to great advances in the human genome project.
Among the problems which are difficult to solve effectively with conventional computers is the satisfiability problem. This problem requires calculations on the conjunctive normal form. It is known that all of the hard-to-solve problems can be converted into satisfiability problems. That is, finding solutions to satisfiability problems amounts to finding solutions to the problems which are reversibly converted into the satisfiability problems. Intensive research has been done on these problems by use of the above-mentioned DNA computing, leading to the following results in which clauses in conjunctive normal form are represented with DNA and processed.
In the case that only one of the two literals constituting a clause is a negative literal, the clause corresponds to a simple “IF-THEN” rule and can be represented by a double strand of DNA (P. Wasiewicz, T. Janczak, J. J. Mulawka, and A. Plucienniczak, The inference based on molecular computing, Cybernetics and Systems, 31:283-315, 2000). However, this method has a disadvantage in that expression is impossible when the number of literals in a clause exceeds two.
Likewise, after the condition of “IF” and the conclusion of “THEN” in each “IF-THEN” rule are given different sequences, single strands of DNA in which complementary sequences are connected to the sequences given to the condition and the conclusions are used to express a set of “IF-THEN” rules (K. Sakamoto, D. Kiga, K. Komiya, H. Gouzu, S. Yokoyama, S. Ikeda, H. Sugiyama, M. Hagiya, State transition by molecules, In Proceedings of 4th DIMACS Workshop on DNA Based Computers, 1998). In this expression method, a set of “IF-THEN” rules is expressed by a sequence in which the sequences given to the rules are connected in a line. However, where the “IF” condition of one rule corresponds to the “THEN” conclusion of another rule, many complementary sequences exist in a single strand of DNA so that they are highly likely to undesirably bind to each other. Additionally, this method cannot express a clause containing three or more literals.
In another method, sequences that different restriction enzymes recognize are inserted between the literals contained in a clause to express the clause (S. Kobayashi, T. Yokomori, G. Sampei, and K. Mizobuchi, DNA implementation of simple Horn clause computation, In Proceedings of the IEEE International Conference on Evolutionary Computation, 1997). However, there are a limited number of restriction enzymes in the natural world, so that the method cannot express a clause which contains more literals than there are available restriction enzymes. Furthermore, the reaction conditions for restriction enzymes differ so much that the experiment becomes highly complex.
Methods for expressing highly complex logical formulas with DNA and processing them are under study. In particular, many attempts have been made to use DNA to express literals and clauses of logical formulas, but these have been limited to the expression of logical formulas or “IF-THEN” rules, research on chain reactions, or solutions to the Hamiltonian path problem or satisfiability problems.
With the prior problems in mind, the present invention has an object of providing a DNA computing method using a resolution refutation, in which theorem proving can be solved in a parallel manner and experimentally implemented. In order to accomplish the above object, the present invention provides a DNA computing method of providing solutions of theorem proving, comprising the steps of: representing logical clauses for theorem proving with DNA molecules; synthesizing the DNA molecules; chemically reacting the synthetic DNA molecules; and deciding solutions to the theorem proving based on the result of the chemical reaction.
In accordance with an embodiment, the representing step comprises encoding a positive literal of the clauses with a base sequence, the negation of the positive literal with the complementary base sequence, and a clause with a single strand of DNA or a branched form of DNA.
The above and other objects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
The application of the preferred embodiments of a DNA computing method according to the present invention is best understood with reference to the accompanying drawings, wherein like reference numerals are used for like and corresponding parts, respectively.
In conventional computation, solutions to theorem proving are drawn by pairing a literal of one clause with its negation in another clause of a conjunctive normal form to produce a resolvent, whereas in the present invention the same step is carried out by hybridizing a base sequence with its complementary sequence. Additionally, the final result of the resolution refutation is determined by the production of an empty clause in conventional methods, but by the production of a perfect double strand of DNA, verified by PCR, in the present invention. In giving solutions to theorem proving according to the DNA computing of the present invention, each DNA molecule is used to express a clause and resolution results can be obtained through the parallel and spontaneous reactions of the molecules used.
The expression of conjunctive normal forms according to the present invention is based on the finding that, when complementary sequences are given to positive/negative literals in the conjunctive normal forms, the procedure of removing the positive/negative literals from the two clauses to draw a resolvent can be represented with the annealing reaction between DNA molecules corresponding to clauses. The expression is characterized by (1) giving a certain DNA sequence to a positive literal, (2) expressing the negative literal with the DNA sequence complementary to the DNA sequence corresponding to the positive literal, and (3) representing one clause with at least one strand of DNA. DNA strands corresponding to clauses may take linear forms, hairpin structures or branched forms.
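The three encoding rules above can be sketched in a few lines. This is a sketch under stated assumptions rather than the sequences actually used: the two 15-mer sequences in `POSITIVE` are made up for illustration, the `~` prefix for negation is a notational choice, and "complementary" is read as the antiparallel Watson-Crick reverse complement.

```python
# Watson-Crick base pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence that hybridizes with `seq` in antiparallel orientation."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical 15-mer sequences for two Boolean variables (illustration only).
POSITIVE = {
    "P": "ACGTTGCAAGCTAGC",
    "Q": "TTGACCGATACGGAT",
}


def encode_literal(literal: str) -> str:
    """Rule (1): positive literal -> its assigned sequence.
    Rule (2): negative literal ("~X") -> the complementary sequence."""
    if literal.startswith("~"):
        return reverse_complement(POSITIVE[literal[1:]])
    return POSITIVE[literal]


def encode_clause(literals) -> str:
    """Rule (3), linear form: concatenate literal segments into one strand."""
    return "".join(encode_literal(lit) for lit in literals)


strand = encode_clause(["P", "~Q"])
print(len(strand))  # 30
```

Because the negative literal is the reverse complement of the positive one, taking the reverse complement of an encoded `~P` returns the sequence assigned to `P`, which mirrors the annealing of a literal with its negation.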
In conventional computing methods of providing solutions of theorem proving with the resolution refutation, all clauses are stored in memory devices and combined in every possible way to produce resolvents, which are then reduced in number while searching for the drawing of empty clauses. In contrast to conventional methods, the DNA computing method of the present invention provides very efficient solutions to theorem proving.
In more detail, solutions to theorem proving according to the present invention are provided by synthesizing DNA molecules corresponding to clauses, mixing the synthetic DNA molecules in a test tube and performing biochemical experiments for confirming the production of empty clauses, that is, hybridization, ligation, PCR and electrophoresis. Furthermore, the DNA computing method according to the present invention is characterized by the ability to process in a parallel manner the clauses of a logical formula represented in conjunctive normal form.
With reference to
In step (1), the representation of clauses with DNA involves two parts: how to represent the Boolean variables or literals of clauses, and how to then represent the clauses using them.
According to the present invention, a positive literal is represented with a predetermined base sequence of a suitable length while its negation corresponds to the complementary sequence. As for the DNA sequences for representing literals, their lengths and base arrangements, though not specifically limited, are preferably selected so as to hybridize with their complementary sequences at suitable temperatures. Usually, the DNA sequences are approximately 10-30 mer in length. In consideration of hybridization conditions, the GC content, base arrangement, and length of the sequence are determined. For convenience of explanation, a base sequence which is suitable for hybridization is called a predetermined base sequence in this specification. Those skilled in the art can determine whether or not a base sequence is proper for hybridization under specific conditions.
The following four conditions are considered when determining base sequences: (1) Base sequences corresponding to different positive/negative literals must be as dissimilar as possible. (2) Base sequences corresponding to different positive/negative literals should have as low a probability of hybridizing with each other as possible. (3) Base sequences corresponding to different positive/negative literals should have melting points and GC contents that are as uniform as possible within the range required by the hybridization conditions. (4) In each of the base sequences corresponding to different positive/negative literals, as few identical bases as possible should be arranged in tandem. Designing sequences that satisfy the above conditions is very important for solving theorem proving problems. DNA molecules may form hybrids in non-Watson-Crick complementary manners or may not be amplified in desired intramolecular regions for various reasons such as base sequence homology, intermolecular bonding, and secondary structure. Algorithms or programs for designing optimal DNA sequences can be applied for securing industrially useful genes in addition to solutions to theorem proving.
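The four design conditions can be approximated by simple numerical checks. The sketch below is only an illustration: the metrics (Hamming distance as a proxy for dissimilarity and for cross-hybridization, the Wallace rule 2(A+T) + 4(G+C) as a rough melting-temperature estimate for short oligomers) and the function names are assumptions of this example, not the criteria actually used by the sequence generator.

```python
from itertools import combinations, groupby

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base."""
    return max(len(list(group)) for _, group in groupby(seq))


def report(seqs):
    """Summarize a set of equal-length sequences against conditions (1)-(4)."""
    pairs = list(combinations(seqs, 2))
    gcs = [gc_content(s) for s in seqs]
    tms = [wallace_tm(s) for s in seqs]
    return {
        "min_distance": min(hamming(a, b) for a, b in pairs),          # (1)
        "min_cross_distance": min(
            hamming(a, reverse_complement(b)) for a, b in pairs),      # (2)
        "gc_spread": max(gcs) - min(gcs),                              # (3)
        "tm_spread": max(tms) - min(tms),                              # (3)
        "longest_run": max(longest_run(s) for s in seqs),              # (4)
    }


print(report(["ACGTTGCAAGCTAGC", "TTGACCGATACGGAT"]))
```

Larger distances and smaller spreads and run lengths indicate a sequence set that better fits the stated conditions.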
Base sequences used in the present invention can be designed by a base sequence generator such as the multiobjective genetic algorithm-based base sequence generator NACST/Seq (Shin, S.-Y., Kim, D.-M., Lee, I.-H., and Zhang, B.-T., Evolutionary sequence generation for reliable DNA computing, Proceedings of 2002 Congress on Evolutionary Computation, IEEE Press, 2002).
In the present invention, a clause is represented by a single DNA molecule formed from DNA segments corresponding to literals. The representation of clauses may be done largely in two manners. One is to concatenate DNA segments corresponding to literals into a single strand of DNA corresponding to the clause. For example, as shown in
The other representation method is to use hairpin structures of DNA or branched DNA. In the case of branched DNA, a stretch of base sequences representing a literal is positioned at the end of each branch. The branched DNA molecule is formed by combining three or more strands of DNA in such a way that one strand partially binds to other two strands, which is illustrated by the upper-left box in
The representation of clauses with branched DNA makes the result of the hybridization step independent of the arrangement of literals in the clauses. In addition, the use of branched molecules in the representation of clauses facilitates the selection of PCR primers. That is, when an empty clause is drawn, it will start with a goal sequence and end with its negation since clauses are either branched molecules or hairpin molecules except for the goal. Therefore, at the PCR step, the sequence for the literal of the goal and its complementary sequence are used as primers. On the other hand, when representing clauses with linear single strands of DNA, the design of PCR primers must depend only on predicting which DNA hybrid is formed.
In a conjunctive normal form, the procedure of deriving a resolvent from a clause with a positive literal and a clause with the negation of the positive literal, that is, resolution between the two clauses, is represented by the hybridization of the two DNA sequences corresponding respectively to the two clauses. When an empty clause is drawn, a perfect double strand of DNA is obtained. In this case, the length of the PCR product corresponds to the sum of the lengths of the DNA sequences for all of the literals. The term “a perfect double strand of DNA” as used herein means a DNA molecule in which a base sequence in one strand binds to its complementary sequence in the other strand after hybridization and ligation, since a base sequence for a positive literal is contained together with a base sequence for a negative literal in a reaction container.
In accordance with the present invention, the production of an empty clause results in a double strand of DNA consisting of two DNA strands upon linear implementation and in a double strand of DNA consisting of one DNA strand upon hairpin or branched implementation (see
A better understanding of the method of encoding clauses and the DNA computing method of providing solutions of theorem proving with the resolution refutation may be obtained in light of the following examples which are set forth to illustrate, but are not to be construed to limit the present invention.
An example of the theorem proving problem to be solved is given as shown in
1. Representation of Clauses on Linear DNA
In this example, each literal was represented with 15 mer ssDNA which was designed by NACST/Seq in consideration of the aforementioned four conditions. Concrete procedures of designing base sequences by use of NACST/Seq are as follows.
After determining the length and number of base sequences to be produced, a predetermined number of individuals are produced at random. Each individual denotes a set of a predetermined number of base sequences. Next, the suitability of each individual with respect to each of the four conditions is calculated. This calculated suitability has great influence on the selection of the individuals which will be used in the next generation. In this regard, individuals with higher suitability are selected with a higher probability. With a predetermined probability, two individuals taken from the selected individuals are subjected to a crossover operation to produce two new individuals, which are copied to the next generation. The other individuals are transferred, as they are, into the next generation. Then, a mutation operation is performed on randomly selected individuals at arbitrary positions with a predetermined probability. Individuals produced through this procedure form the individual set of the next generation. The individual sets are repetitively subjected to selection, crossover, and mutation for a predetermined number of rounds. In the course of such processes, sets of base sequences corresponding to individuals with low suitability are excluded during the selection process and finally disappear, and individuals with high suitability survive. Users can select base sequence sets from the survivors. Sequences for clauses are given in
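The generate, evaluate, select, crossover and mutate cycle described above can be sketched as a small genetic algorithm. This is a generic single-objective illustration and not NACST/Seq itself: the `fitness` function (minimum pairwise Hamming distance minus the longest base run), tournament selection, the elitism step and all parameter values are assumptions made for this example.

```python
import random
from itertools import combinations, groupby

BASES = "ACGT"


def fitness(individual):
    """Higher is better: reward dissimilar sequences, penalize long base runs.
    (An assumed stand-in for the suitability to the four design conditions.)"""
    min_dist = min(sum(x != y for x, y in zip(a, b))
                   for a, b in combinations(individual, 2))
    worst_run = max(len(list(g)) for s in individual for _, g in groupby(s))
    return min_dist - worst_run


def random_individual(rng, n_seqs, length):
    """An individual is a set (list) of n_seqs random base sequences."""
    return ["".join(rng.choice(BASES) for _ in range(length))
            for _ in range(n_seqs)]


def crossover(rng, a, b):
    """Swap the tails of two individuals at a random cut between sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(rng, individual, rate):
    """Replace each base by a random base with probability `rate`."""
    return ["".join(rng.choice(BASES) if rng.random() < rate else base
                    for base in seq) for seq in individual]


def evolve(n_seqs=4, length=15, pop_size=20, generations=30,
           p_cross=0.8, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng, n_seqs, length)
                  for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=fitness)
        # Selection: fitter individuals win two-way tournaments more often.
        selected = [max(rng.sample(population, 2), key=fitness)
                    for _ in range(pop_size)]
        nxt = []
        for a, b in zip(selected[::2], selected[1::2]):
            if rng.random() < p_cross:
                a, b = crossover(rng, a, b)
            nxt += [a, b]  # uncrossed pairs pass through unchanged
        population = [mutate(rng, ind, p_mut) for ind in nxt]
        population[0] = best  # elitism (an addition of this sketch)
    return max(population, key=fitness)


for seq in evolve():
    print(seq)
```

The elitism step keeps the best individual of each generation, so the best suitability never decreases from one generation to the next. A multiobjective generator would instead track each condition separately rather than folding them into one score.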
2. Representation of Clauses on Hairpin DNA
In this example, each literal was encoded with a characteristic DNA sequence of 5 mer. In case of branches, each branch has a double strand which is 5 bp long with the sticky end of a single strand. A hairpin DNA is structured to have a double strand of 5 bp and a loop head of 6-mer. Base sequences were designed to satisfy the problem of
3. DNA Computing of Providing Solutions of Theorem Proving
In the present invention, in order to identify the final solutions of linear DNA and hairpin DNA, the procedure from hybrid formation to gel electrophoresis is performed under the same conditions.
1) Hybridization
100 pmol of each of the oligomers of
2) Verification of Formation of Empty Clause
When repetitive application of resolution to a conjunctive normal formula results in the formation of an empty clause, the DNA molecule obtained by the hybridization step has a perfect double strand structure that can be amplified.
Whether the DNA molecule is of a perfect double strand structure or not can be confirmed through the following processes.
2-1 Ligation
In order to ligate nicks within single strands of the hybrid obtained, it was mixed with a T4 DNA ligase in a reaction buffer (50 mM Tris-HCl pH 7.8, 10 mM MgCl2, 5 mM DTT, 1 mM ATP, and 2.5 μg/ml BSA) for a total volume of 10 μl. This was incubated at 16° C. for at least sixteen hours to give a stable ligation product.
2-2 Polymerase Chain Reaction
In the case of connecting literals in a line, the amplification of the final solution was performed in a PCR machine to produce a perfect double strand of DNA corresponding to an empty clause as depicted in the scheme of
2-3 Polyacrylamide Gel Electrophoresis (PAGE)
The PCR products thus obtained were run on 15% polyacrylamide gel across which an electric field of 60 V was applied. After being dyed with EtBr, the PCR product of interest was found to be 75 bp in size on a UV illuminator as measured with reference to a DNA marker run together.
When branched or hairpin structure DNA is used to represent clauses, as seen in
3) DNA Computing Results
With reference to
In another embodiment of the present invention, solutions of theorem proving can be automatically implemented on one chip. All of the processing steps including mixing, encoding, hybridizing, ligating, PCR, and electrophoresis steps can be feasibly performed on one chip, thereby finding solutions to theorem proving problems with higher rapidity, convenience and ease.
In a further embodiment, the DNA computing of the present invention can be applied to DNA microarray technology for disease diagnosis, in which the expression level of a gene related to a disease of interest is probed. In this regard, the relationship between the expression of the disease-related gene and the onset of the disease can be described in a logical formula such as an “IF-THEN” rule. For example, suppose that a person is diagnosed as suffering from a disease D if genes A and B are expressed at higher levels, and a gene C at a lower level, in that person than in normal persons. This can be expressed as the logical formula {(A ∧ B ∧ ¬C) → D}, which can then be converted into the conjunctive normal form {¬A ∨ ¬B ∨ C ∨ D}. The literals A, B and C are Boolean variables representing the propositions that genes A, B and C, respectively, are expressed at higher levels in the patient than in normal persons, while the literal D is a Boolean variable representing the proposition that “the patient suffers from disease D”. Deciding whether the patient suffers from the disease D when his gene expression level is measured to be higher for genes A and B and lower for gene C than in normal persons is equivalent to logically deriving D from the theory set {¬A ∨ ¬B ∨ C ∨ D, A, B, ¬C}. When this theorem proving problem is solved with the resolution refutation, an empty clause is drawn. Therefore, if a chip on which all of the processing steps including mixing, encoding, hybridizing, ligating, PCR and electrophoresis can be implemented is loaded with the rules for diagnosing a disease of interest and supplied with the proper theories from the gene expression information of the patient, the resolution refutation reveals whether the patient suffers from the disease. In addition, if the hybridizing step is performed in a separate compartment for each disease on the chip, different diseases or various patients can be diagnosed on only one chip.
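The diagnosis example can be traced step by step in software. The sketch below is an illustration under assumptions: literals are strings with `~` marking negation, and the hypothetical function `linear_refutation` uses a simple greedy chain that starts from the negated goal, much as the DNA product is expected to start with the goal sequence. The greedy chain does not backtrack, so it is not a complete prover.

```python
def negate(literal):
    """Return the complementary literal ("~" marks negation in this sketch)."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def linear_refutation(theory, goal):
    """Chain resolutions starting from the negated goal.

    Returns the list of clauses visited, ending with the empty clause,
    or None if the chain gets stuck before reaching it.
    """
    current = frozenset({negate(goal)})
    trace, seen = [current], {current}
    while current:
        for clause in theory:
            hits = [lit for lit in current if negate(lit) in clause]
            if hits:
                lit = hits[0]
                nxt = (current - {lit}) | (clause - {negate(lit)})
                if nxt not in seen:
                    break  # accept this resolvent as the next step
        else:
            return None  # no theory clause yields a new resolvent
        seen.add(nxt)
        current = nxt
        trace.append(current)
    return trace


# Rule (A and B and not C) -> D in conjunctive normal form, plus the
# patient's measurements: A high, B high, C low.
theory = [
    frozenset({"~A", "~B", "C", "D"}),
    frozenset({"A"}),
    frozenset({"B"}),
    frozenset({"~C"}),
]
trace = linear_refutation(theory, "D")
for step in trace:
    print(sorted(step))
```

The trace ends with an empty list, the empty clause, so D follows from the theory set. If the measurement ¬C is withheld, the chain stops at the clause C and the function returns None.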
This is the difference from conventional diagnosis chips which can diagnose only one disease for only one patient.
As described above, all logical formulas can be represented in conjunctive normal forms which can be very effectively processed in a parallel manner by the DNA computing of the present invention. Therefore, based on massive parallelism, the DNA computing of the present invention can be applied to the finding of solutions to complex theorem proving problems by use of a resolution refutation. For example, the procedure of solving theorem proving problems using the resolution refutation according to the DNA computing of the present invention can be applied to disease diagnosis and decision support systems. Thanks to massive parallelism, the DNA computing of the present invention allows theorem proving problems to be solved with a simple experimental procedure, and thus is easy to implement on chips. Further, the DNA computing of the present invention is useful for the construction of point-of-care diagnosis systems which are advanced versions of intelligence chips. Diagnosis results vary depending on DNA information, health state and life style. Information of each individual is stored in intelligent chips for diagnosis and investigated through hybridization with that of patients to develop diagnosing reagents which are the most suitable for each individual.
Well-known problems such as Horn clause computation have been solved by computer algorithms. These solutions have been demonstrated by computation, but not by experiment. The realization of a computer simulation as experimental results of biochemical reactions, together with its proof procedure in vitro, has not been reported before. Such research enables theorem proving problems to be solved in a parallel manner, bringing about efficient calculation. Considering the basic concept that DNA computing is accomplished through molecular biological experiments, the present invention is expected to make a great contribution to the industry.
Number | Date | Country | Kind |
---|---|---|---|
2002-0039320 | Jul 2002 | KR | national |
2002-0047795 | Aug 2002 | KR | national |