The present invention relates to a method for design support of nucleic acids that function in vivo.
Nucleic acids are capable of forming their own conformations. RNA in particular, with its high capability to do so, exists in nature as a nucleic acid that combines information and function, and it functions in vivo. Since RNA was given another look as a functional molecule, research on the structure and function of RNA has been actively conducted. Unlike proteins, which are composed of many kinds of amino acids, nucleic acids are functional molecules with simple structures made of only four kinds of bases. Hence, nucleic acids are attracting attention both from the standpoint of basic research, including prediction of the secondary structures they form and elucidation of the mechanism of enzyme activity based on their sequence information, and from the standpoint of applied research, such as inhibition of gene expression with the use of nucleic acids, particularly RNA.
Ribozymes are well known as a means of inhibiting gene expression with RNA. The typical ribozymes currently in use include hammerhead ribozymes and hairpin ribozymes. More recently, the antisense method and the RNAi method have rapidly become widespread; in these methods, the RNAs function in vivo by making use of various intracellular mechanisms to suppress gene expression even though they themselves have no enzymatic activity for inhibiting gene expression. In addition, inhibition of gene expression not only by RNA but also by DNA is being explored, beginning with DNAzymes, which are nucleic acids that function in vivo.
In accordance with the progress in biology, functional analysis of genes has come to require speed and accuracy more than ever. Analysis with the use of ribozymes, the antisense method, and siRNA allows gene expression to be inhibited specifically and with great ease compared with conventional gene knockout technology, and it is a technology that currently attracts attention.
[Non-patent document 1] GENE SILENCING IN MAMMALS BY SMALL INTERFERING RNAS, Michael T. McManus and Phillip A. Sharp, Nat Rev Genet. 2002 October;3(10):737-47.
[Non-patent document 2] Ribozymes: A Modern Tool in Medicine, Asad U. Khan and Sunil K. Lal, J. Biomed. Sci., 2003 10:457-467
[Non-patent document 3] Thermodynamic criteria for high hit rate antisense oligonucleotide design., Matveeva O V, Freier S M., et al., Nucleic Acids Res. 2003 Sep. 1;31(17):4989-94.
Although the inhibition of gene expression with a nucleic acid that functions in vivo, such as a ribozyme, is simple as an experimental technique, it is necessary to select, from the target nucleic acid sequence, a target sequence on which the nucleic acid acts most efficiently. For example, a sequence on which ribozymes act efficiently in cells contains an NUX sequence at the cleavage site, and is believed to reside in the 5′ UTR of the target sequence as well as in a region that does not assume a stem structure in its predicted secondary structure. The stem structure, one of the secondary structures of nucleic acids, is considered to be a significant factor that inhibits binding of a nucleic acid that functions in vivo to its target. It is also important that the selected target site has no homology to mRNAs other than the target. Since substantial EST databases have become available for each species in recent years, it has become rather easy to search for a region having no homology to any sequence other than the target site.
Although many methods for designing nucleic acids that function in vivo are proposed at present based on the above information, a definitive method of designing the sequence has not yet been proposed. Therefore, the design of a nucleic acid that functions in vivo essentially requires designing of a plurality of sequences as well as repeating the design while confirming the effects of the designed sequences.
The object of the present invention is to provide a method that supports a user's work of determining a target sequence, so that the user can determine the target sequence visually, intuitively, and efficiently when designing a nucleic acid that functions in vivo, such as a ribozyme.
To solve the above problems, the present invention supports the design of a nucleic acid that functions in vivo (hereinafter, referred to as functional nucleic acid) by steps of: focusing attention on the secondary structure of the target nucleic acid that gives an important effect on the activity of the functional nucleic acid; displaying various annotation information that the target sequence has on the secondary structure and restricting information of the functional nucleic acid to be designed for the target sequence; and then selecting visually and intuitively a sequence predicted to be effective from the nucleic acid sequence shown as the secondary structure in a way that the user designates with a pointing device.
Currently, there are a number of software applications that give rise to a design sequence by inputting conditions such as those mentioned above. However, information provided to users is only the nucleic acid sequence after completing the design in many cases. Therefore, it is thought necessary that a predicted secondary structure of the target and its associated information, both of which are useful for promoting the design by repetition, as well as the designed functional nucleic acid are displayed visually and intuitively at the same time.
In consideration of repeated design by a user, the present invention allows a functional nucleic acid to be designed by steps of: making the user clearly aware of what kind of structure the target nucleic acid assumes in vitro or in vivo; displaying various information possessed by the target sequence on the secondary structure; designating a candidate sequence with a pointing device; predicting the effect when the candidate sequence is selected as a target sequence; and carrying out the design by trial and error. Moreover, the present invention supports judgment based on the user's experience by also displaying, on the secondary structure, the output of existing software for predicting nucleic acid sequences.
In other words, the method for design support of functional nucleic acid acting on a target nucleic acid according to the present invention includes steps of: displaying the secondary structure data of the target nucleic acid; discernibly displaying annotation information that represents characteristics of partial sequences constituting the target nucleic acid on the secondary structure data of the target nucleic acid by correlating with the corresponding partial sequences; and accepting the designation of a partial sequence on the displayed secondary structure data of the target nucleic acid as the target sequence on which the functional nucleic acid acts.
When the designated partial sequence does not meet the conditions for the target sequence of the functional nucleic acid, it is preferable to notify the user of this, for example by highlighting. Furthermore, it is preferred that a partial sequence in the target nucleic acid that meets the conditions preset for the target sequence of the functional nucleic acid is discernibly displayed on the displayed secondary structure data of the target nucleic acid. The annotation information includes regions highly homologous to other ESTs, UTR regions, SNPs, etc. These pieces of annotation information may be varied depending on the kind of the functional nucleic acid. The functional nucleic acids aimed at in the present invention include, for example, ribozymes, siRNA, antisense RNA, and DNAzymes.
The melting temperature and the GC content of the designated partial sequence may be displayed. The designation of the partial sequence on the secondary structure data of the target nucleic acid may be carried out by defining the length of functional nucleic acid in advance and indicating a starting point for the partial sequence. An arbitrary sequence that the user designated may be highlighted on the secondary structure data of the target nucleic acid.
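The display of the melting temperature and GC content for a designated partial sequence can be sketched as follows. This is a minimal illustration only: the source does not state which Tm formula is used, so the Wallace rule (2 °C per A/T/U base, 4 °C per G/C base) is assumed here, and the function names `gc_content`, `wallace_tm`, and `describe_selection` are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> float:
    """Rough melting temperature in degrees C by the Wallace rule.

    Assumption: the source does not name a Tm formula; this rule is only a
    coarse estimate intended for short oligonucleotides.
    """
    seq = seq.upper()
    at = sum(base in "ATU" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2.0 * at + 4.0 * gc


def describe_selection(target: str, start: int, length: int) -> dict:
    """Values to show for a region given by a 0-based start and a preset length."""
    region = target[start:start + length]
    return {"sequence": region, "gc": gc_content(region), "tm": wallace_tm(region)}


info = describe_selection("AUGGCCAUUGCA", start=2, length=6)
print(info["sequence"], round(info["gc"], 2), info["tm"])
```

Because the region is defined by a start point and a preset length, the same function can be called again each time the pointer moves, which is one way the values could be refreshed dynamically.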
The method of the present invention can be realized by a computer program.
According to the present invention, attention is focused not only on sequence information that represents a primary structure but also on the secondary structure of the sequence, so that it can be distinguished whether the target region for the nucleic acid to be designed is a region in a large open loop or an end region in a long stem structure. Furthermore, when the design region is actually determined, enabling the user to designate a sequence on the two-dimensional structure rather than the one-dimensional structure makes it possible to design with an image of a state closer to reality.
By employing such display method and sequence designation method, it is possible for the user to design more suitable sequences without being restrained by only linear images of nucleic acid sequences as before.
Embodiments of the present invention will hereinafter be explained with reference to the accompanying drawings.
It is possible to confirm visually in a comprehensive manner whether or not the candidate sequence in the target that has been predicted with the sequence prediction software is really acceptable by displaying it with information about peripheral regions on this screen. In this manner, the choice of whether to use the designed sequence or to make a modification to it, for example, to make the recognition sequence a little longer can easily be made.
In designing a functional nucleic acid that targets, for example, a motif region common to a certain group of genes, those parts of sequences with the common motif are cataloged as sequences and displayed on the screen. In this way, it is possible to visually design a region that seems likely to be particularly effective among the common motif sequences.
Furthermore, even when the design results in a palindromic sequence or a peculiar sequence such that stems of ribozymes bind to each other, easy mistakes made by users are prevented by displaying the designed region in a warning color. In addition, the design by users may be supported by calculating Tm and GC content of the region being selected and displaying them dynamically.
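One way such a warning could be derived is sketched below. The sketch treats a region as palindromic when it equals its own reverse complement, and as otherwise suspicious when it contains a short stretch whose reverse complement also occurs in the region. The stretch length `k`, the colour names, and all function names are assumptions made for illustration; the source does not specify the test or the colours.

```python
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def _as_rna(seq: str) -> str:
    """Upper-case the sequence and read T as U."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an RNA (or DNA) sequence, returned as RNA.

    Raises KeyError for characters other than A, C, G, U, T.
    """
    return "".join(_COMPLEMENT[base] for base in reversed(_as_rna(seq)))


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return _as_rna(seq) == reverse_complement(seq)


def has_self_complementary_stretch(seq: str, k: int = 4) -> bool:
    """True if some k-mer and its reverse complement both occur in seq."""
    rna = _as_rna(seq)
    kmers = {rna[i:i + k] for i in range(len(rna) - k + 1)}
    return any(reverse_complement(kmer) in kmers for kmer in kmers)


def warning_color(seq: str, k: int = 4):
    """Hypothetical colour code for the designed region, or None if no warning."""
    if is_palindromic(seq):
        return "red"
    if has_self_complementary_stretch(seq, k):
        return "orange"
    return None


print(warning_color("GAAUUC"))       # palindromic
print(warning_color("GGGGAAACCCC"))  # GGGG can pair with CCCC
print(warning_color("AAAAAAAA"))     # no warning
```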
In
First, a user inputs information about a sequence desired to be designed into a design sequence option 403. In the embodiment shown in
Next, the user selects with a mouse an arbitrary region on the secondary structure data of the sequence displayed in the display region for secondary structure 404, and carries out a design. By directly pointing at a region on the structure information with the mouse, the user becomes clearly aware of the secondary structure of the target sequence. Moreover, the user can instantly grasp which part of the target sequence the designed functional nucleic acid targets. In
According to the present invention, it is possible to design a region such as a boundary region between a stem region and a loop region of a nucleic acid while having an image as shown in
A program for design support of functional nucleic acid 120 of the present invention is stored in the program memory 102. The program for design support of functional nucleic acid 120 is composed of programs such as a computing unit for secondary structure of nucleic acid 121 to compute a secondary structure of a target nucleic acid, a computing unit for display position of secondary structure 122 to appropriately arrange the computed secondary structure on the screen, a display unit for secondary structure data of nucleic acid 123 to display the computed secondary structure in a practically understandable way to users, and an inquiring unit for design site of nucleic acid 124 that inquires the users about a design site of the nucleic acid. These programs may be provided by being stored in a recording medium such as a floppy (trademark) disk, CD-ROM, DVD-ROM, or MO. In addition, these programs may also be provided through the network.
The external storage unit 103 stores sequence information 131 of a target nucleic acid or of nucleic acids that are not targeted but are likely to interact with the designed sequence, annotation information 132 associated with the sequence that represents additional information to display on the secondary structure, and restricting information on the nucleic acid to be designed 133. The data memory 104 stores secondary structure of sequence 141 in which computed secondary structures of nucleic acids are stored and information on candidate sequence 142 in which layout information of the secondary structures on the screen is stored. The input device 105 is composed of a pointing device 151, a keyboard 152, and the like. Although the program memory 102 and the data memory 104 are separately depicted in
When the processing is started, the sequence of a gene whose expression users want to control is first loaded from the sequence information 131 at a step 201. In many cases, the target sequence is one that is present in vivo as an mRNA in various living organisms.
A specific structure of the sequence information 131 is shown in
Next, at a step 202, the secondary structure of sequence 141 is computed for the loaded sequence data by the central processing unit 101 using the computing program for secondary structure of nucleic acid 121. A number of investigations have been conducted so far into the computation of secondary structures of nucleic acids, and any program such as M. Zuker's mfold, the Vienna RNA Package, or the like may be used for the computation of the secondary structure. The computation of secondary structures of nucleic acids needs temperature, salt concentration, and the like as parameters. At this step, users are asked about these parameters on the display 106 as required, and the parameters are set using the pointing device 151 and the keyboard 152. At this time, various parameters may also be asked by the central processing unit. At a subsequent step 203, the display position of the target nucleic acid sequence on the screen is computed.
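As an illustration of how a computed secondary structure might be held for later display, the sketch below assumes the prediction is available as a dot-bracket string, the notation in which the Vienna RNA Package reports structures. It does not call any folding program. The function names `pair_table` and `unpaired_fraction` are hypothetical, and the actual layout of the secondary structure of sequence 141 is not reproduced here.

```python
def pair_table(structure: str) -> list:
    """Convert a dot-bracket string into a list of pairing partners.

    pairs[i] is the 0-based index of the base paired with base i,
    or None if base i is unpaired (shown as '.' in the notation).
    """
    stack = []
    pairs = [None] * len(structure)
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def unpaired_fraction(pairs: list, start: int, length: int) -> float:
    """Fraction of bases in the window that are not part of a stem."""
    window = pairs[start:start + length]
    if not window:
        return 0.0
    return sum(partner is None for partner in window) / len(window)


pairs = pair_table("((..))")
print(pairs)
print(unpaired_fraction(pairs, start=1, length=4))
```

A table of this kind lets a candidate region be classified as lying mostly in a loop or mostly in a stem, which is the distinction the display is meant to make visible.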
A specific structure of the secondary structure of sequence 141 is shown in
At a next step 204, the annotation information 132 is loaded. Hereby, the annotation information consists of data representing part of nucleic acid sequence and data of its associated information. Conceivable information includes information about UTRs known for the target nucleic acid, SNPs, sequences highly homologous to other various nucleic acid sequences in vivo including mRNA, and sequences seen in certain specific kinds of gene sequences in common, and the like. In addition, sequences of functional nucleic acids predicted beforehand with the use of other software to predict a functional nucleic acid sequence may be used for the annotation information.
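A record pairing a part of the sequence with its associated information could look like the sketch below. The field names (`start`, `end`, `kind`, `note`) and the function `labels_per_base` are hypothetical and are not taken from the structure of the annotation information 132; they only illustrate how annotations might be mapped onto individual bases before being drawn on the secondary structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int      # 0-based index of the first annotated base
    end: int        # 0-based index one past the last annotated base
    kind: str       # e.g. "5'UTR", "SNP", "homologous", "predicted"
    note: str = ""  # free-text associated information


def labels_per_base(length: int, annotations: list) -> list:
    """For each base of the target, list the kinds of annotation covering it."""
    labels = [[] for _ in range(length)]
    for ann in annotations:
        # Clip the annotated range to the sequence so bad input cannot overflow.
        for i in range(max(ann.start, 0), min(ann.end, length)):
            labels[i].append(ann.kind)
    return labels


annotations = [
    Annotation(0, 3, "5'UTR"),
    Annotation(2, 3, "SNP", note="hypothetical variant"),
]
print(labels_per_base(5, annotations))
```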
The specific structure of the annotation information 132 is shown in
At a subsequent step 205, information related to the target nucleic acid sequence is displayed. At the step 205, all information specified so far is displayed on the secondary structure of the target nucleic acid that has been computed at the step 203.
At a next step 206, information on the functional nucleic acid to be designed is specified. The restricting information on the nucleic acid to be designed 133 includes general information such as the length of the nucleic acid to be designed, the target Tm value, and the GC content. In the case of designing a ribozyme, the presence of NUX at the cleavage site is specified. In the case of siRNA, it is specified that the sequence begins with 5′-AA. When users conduct the design repeatedly, the data at this step change most frequently. Therefore, an interface for directly inputting values may be provided in addition to input from the external storage unit 103.
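A check of a designated region against such restricting information could be sketched as follows. The `Restriction` fields and the `violations` function are hypothetical. NUX is read here in its conventional sense (any base, then U, then any base other than G), which is an assumption because the source does not spell the motif out. A Tm bound is omitted for brevity and would be handled in the same way as the GC bound.

```python
import re
from dataclasses import dataclass

# Assumed reading of NUX: N = any base, U, X = A, C, or U (not G).
NUX = re.compile(r"[ACGU]U[ACU]")


@dataclass
class Restriction:
    kind: str            # "ribozyme", "siRNA", or another kind with no motif rule
    length: int          # required length of the designated region
    gc_min: float = 0.0  # lowest acceptable GC fraction
    gc_max: float = 1.0  # highest acceptable GC fraction


def violations(region: str, rule: Restriction) -> list:
    """Return the restrictions the region fails; an empty list means it passes."""
    region = region.upper().replace("T", "U")
    problems = []
    if len(region) != rule.length:
        problems.append("length")
    gc = sum(base in "GC" for base in region) / len(region) if region else 0.0
    if not rule.gc_min <= gc <= rule.gc_max:
        problems.append("gc")
    if rule.kind == "ribozyme" and not NUX.search(region):
        problems.append("no NUX site")
    if rule.kind == "siRNA" and not region.startswith("AA"):
        problems.append("no 5'-AA start")
    return problems


print(violations("AAGCUGACCUGAAGUUCAUCU", Restriction("siRNA", 21, 0.3, 0.6)))
print(violations("GGGGCCCC", Restriction("ribozyme", 8)))
```

A non-empty result is the kind of condition under which the designated region could be highlighted to the user.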
A specific structure of the restricting information on nucleic acid to be designed 133 is shown in
At a step 207, the user is asked to designate a design site of the nucleic acid. An example of designation at the time of designating a nucleic acid sequence is shown in
When functional nucleic acids are designed, the procedure of the step 207 may also be carried out by the following method: Icons 1101 that indicate setting of multiple kinds of functional nucleic acids are prepared as shown in
When a region is designated, the sequence of the designated region is highlighted on the screen to notify the user if it differs from the values designated for Tm, length of sequence, and the like at the step 206. In the example shown in
At the time of designating a region, when designing a sequence with a definite length such as siRNA, the designation may be constrained so that no more bases than the indicated sequence length can be selected.
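The length limit on a designation can be sketched as a simple clamp on the selected range. The function name `clamp_selection` and its parameters are hypothetical; positions are 0-based and the end position is exclusive.

```python
def clamp_selection(target_length: int, start: int, drag_end: int, max_length: int):
    """Limit a dragged selection to at most max_length bases within the target.

    start is where the user began the selection and drag_end is where the
    pointer currently is. Returns (start, end) with end exclusive.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= start < target_length:
        raise ValueError("start lies outside the target sequence")
    end = min(drag_end, start + max_length, target_length)
    return start, max(end, start)


print(clamp_selection(100, 10, 50, 21))   # drag longer than allowed is cut back
print(clamp_selection(100, 90, 100, 21))  # clipped by the end of the target
print(clamp_selection(100, 10, 15, 21))   # shorter drags are left unchanged
```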
The user can finally determine the design sequence after much trial and error on the basis of the annotation information about the predicted site and the sequence. When multiple sequences are designed for the same target, the sequence design can be repeated by going back to the step 206. When the target sequence is changed, the design can be repeated by going back to the step 201. The sequences designed in this way are stored in the form of the structure shown in
Number | Date | Country | Kind |
---|---|---|---|
2003-328264 | Sep 2003 | JP | national |