The present invention is related to an identification method.
In recent years, the base sequences constituting the DNA (deoxyribonucleic acid) and the RNA (ribonucleic acid) of living organisms have been analyzed in order to predict the impact of new types of viruses and to develop vaccines accordingly. Moreover, research is being carried out on detecting mutations (point mutations) associated with diseases such as cancer, detecting genetic abnormalities such as genetic mutations, and diagnosing the risk of developing diseases.
The DNA and the RNA have four types of bases represented by symbols “A”, “G”, “C”, and “T” or “U”. Moreover, a set of three consecutive bases (a codon) specifies one of 20 types of amino acids. Each amino acid is represented by a symbol from “A” to “Y”.
As illustrated in
In the related technology, in the case of analyzing a new type of virus, FASTA or BLAST is used. In FASTA or BLAST, the base sequences are translated into the symbols of amino acids; a homology search is performed with the amino acids serving as the units for comparison; and similarities with the viruses discovered in the past are determined.
Moreover, in the related technology, in the case of analyzing mutation such as cancer, mutation in the form of “base insertion”, “base deletion”, or “base substitution” is determined; the frameshift of the sequences attributed to mutation is determined; and the underlying genetic mutation developed from the mutation point onward is further detected.
According to an aspect of the embodiments, an identification method includes: obtaining reference codon sequence data and analysis-target codon sequence data; comparing codons included in the obtained reference codon sequence data and codons included in the obtained analysis-target codon sequence data, at each sequence position of codon; identifying that, based on result of the comparing, includes identifying, from among codons included in the analysis-target codon sequence data, codon positioned at each of a plurality of sequence positions subsequent to sequence position at which codons are nonidentical; and identifying that includes referring to a memory unit configured to store type of mutation, which has occurred at a particular codon included in particular codon sequence data, in a corresponding manner to codon positioned at each of a plurality of sequence positions subsequent to the particular codon, on account of occurrence of the mutation in the particular codon, and identifying type of mutation associated to codon positioned at each of the plurality of identified sequence positions, by a processor.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
However, in the related technology explained above, a long period of time is required for determining the frameshift of the mutation and detecting the underlying genetic mutation developed from the mutation point onward. Moreover, in order to speed up the search (collation), the base sequences need to be partitioned.
In the related technology, in the case of determining the frameshift of the mutation, such as cancer, or detecting the underlying genetic mutation developed from the mutation point onward, local alignment determination is performed in the units of bases in order to enhance the accuracy. However, that results in a decline in the speed. On the other hand, in a genome search, as compared to a text search, the size of the pointer-type inverted index becomes enormous. Hence, an index-based search cannot be performed, thereby resulting in a low speed. In order to hold down the decline in the speed, the base data is partitioned, and automaton collation is performed in parallel operations. However, it results in losses attributed to partitioning, such as complications in management and decline in operability.
In one aspect, it is an object of the embodiments to provide an identification method, an identification program, and an information processing device that make it possible to reduce the time required for determining the frameshift of the mutation and detecting the underlying genetic mutation developed from the mutation point onward. Moreover, according to an aspect, it is an object of the embodiments to provide an identification method, an identification program, and an information processing device that make it possible to speed up the search and the analysis without having to partition the base sequences.
Exemplary embodiments of an identification method, an identification program, and an information processing device according to the present invention are described below in detail with reference to the accompanying drawings. However, the present invention is not limited by the embodiments described below.
The following explanation is given about
The following explanation is given about
Then, based on an insertion transition table 140f and based on the mutation n codon and the mutation n+1 codon that are positioned subsequent to the mutation codon, the information processing device identifies the mutant n codon that is the subsequent codon of the mutant codon. Herein, n is an integer equal to or greater than one. Herein, the codon subsequent to the mutant codon is referred to as “mutant n codon (base insertion)”. The insertion transition table 140f is a table in which two codons subsequent to the mutation codon and a single codon subsequent to the pre-base-insertion mutant codon are held in a corresponding manner. When the mutant n codon in the insertion transition table 140f is identical to the codon subsequent to the mutation position in the reference codon sequence data, the point mutation that has occurred in the analysis-target codon sequence data is “base insertion”.
In the example illustrated in
Meanwhile, if the mutant n codon in the insertion transition table 140f is not identical to the subsequent codon of the mutation position in the reference codon sequence data, the point mutation that has occurred in the analysis-target codon sequence data is “base deletion” or “base substitution”.
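The relationship held in the insertion transition table can be illustrated with a minimal sketch. It assumes that a single inserted base shifts every later base by one position, so that the pre-base-insertion mutant n codon consists of the last two bases of the mutation n codon followed by the first base of the mutation n+1 codon. The function names and the base sequences are hypothetical and are chosen only so that the codons “CAA”, “GUG”, and “AAG” quoted in this description appear.

```python
def split_codons(bases: str) -> list:
    """Split a base string into codons; an incomplete trailing codon is dropped."""
    usable = len(bases) - len(bases) % 3
    return [bases[i:i + 3] for i in range(0, usable, 3)]


def pre_insertion_codon(mutation_n: str, mutation_n1: str) -> str:
    """Assumed rule: undo a one-base shift caused by an earlier base insertion."""
    return mutation_n[1:] + mutation_n1[0]


# Hypothetical reference bases and a copy with "U" inserted after the first base.
reference_bases = "GCCAAGUGCAUG"
target_bases = reference_bases[:1] + "U" + reference_bases[1:]

ref_codons = split_codons(reference_bases)  # GCC AAG UGC AUG
tgt_codons = split_codons(target_bases)     # GUC CAA GUG CAU

# The codon looked up from the two codons after the mutation codon is
# identical to the reference codon after the mutation position.
restored = pre_insertion_codon(tgt_codons[1], tgt_codons[2])
print(restored, restored == ref_codons[1])
```

Under this assumption, a match between the looked-up codon and the reference codon is what indicates “base insertion”.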
The following explanation is given about
The following explanation is given about
Then, based on a deletion transition table 140g and based on the two codons that are positioned subsequent to the mutation codon, the information processing device identifies the second subsequent codon of the pre-base-deletion mutant codon. The second subsequent codon is referred to as “mutant n+1 codon (base deletion)”. The deletion transition table 140g is a table in which the mutation codon, the subsequent two codons, and the second subsequent codon of the pre-base-deletion mutant codon are held in a corresponding manner. When the mutant n+1 codon in the deletion transition table 140g is identical to the second subsequent codon of the mutation position in the reference codon sequence data, the point mutation that has occurred in the analysis-target codon sequence data is “base deletion”.
In the example illustrated in
So far, for convenience, the explanation was given about an example of determining deletion regarding the mutant 2 codon “UGC”. However, regarding the mutant 1 codon “AAG” too, the deletion transition table 140g can be used and the mutant 1 codon “AAG” can be referred to using the mutation (0) codon “UCA” and the mutation 1 codon “AGU”, and deletion can be determined (herein, n is an integer equal to or greater than zero).
Meanwhile, if the mutant n+1 codon in the deletion transition table 140g is not identical to the second subsequent codon of the mutation position in the reference codon sequence data, then the point mutation that has occurred in the analysis-target codon sequence data is “base insertion” or “base substitution”.
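The relationship held in the deletion transition table can be sketched in the same way. The sketch assumes that a single deleted base shifts every later base forward by one position, so that the pre-base-deletion mutant n+1 codon consists of the last base of the mutation n codon followed by the first two bases of the mutation n+1 codon. The function names and the base sequences are hypothetical; they are chosen so that the codons “AGU”, “GCU”, and “UGC” quoted in this description appear.

```python
def split_codons(bases: str) -> list:
    """Split a base string into codons; an incomplete trailing codon is dropped."""
    usable = len(bases) - len(bases) % 3
    return [bases[i:i + 3] for i in range(0, usable, 3)]


def pre_deletion_codon(mutation_n: str, mutation_n1: str) -> str:
    """Assumed rule: undo a one-base shift caused by an earlier base deletion."""
    return mutation_n[-1] + mutation_n1[:2]


# Hypothetical reference bases and a copy with the third base deleted.
reference_bases = "UCCAAGUGCUGG"
target_bases = reference_bases[:2] + reference_bases[3:]

ref_codons = split_codons(reference_bases)  # UCC AAG UGC UGG
tgt_codons = split_codons(target_bases)     # UCA AGU GCU

# The codon looked up from the two codons after the mutation codon is
# identical to the second codon after the mutation position in the reference.
restored = pre_deletion_codon(tgt_codons[1], tgt_codons[2])
print(restored, restored == ref_codons[2])
```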
On the other hand, if a plurality of codons subsequent to the mutation codon in the analysis-target codon sequence data is identical to a plurality of mutant codons in the reference codon sequence data, then the point mutation that has occurred in the analysis-target codon sequence data is “base substitution”.
As explained above, the information processing device according to the first embodiment compares the reference codon sequence data and the analysis-target codon sequence data in the units of codons, and identifies nonidentical codons. Then, based on the two subsequent codons of the nonidentical codon, the information processing device obtains the subsequent codon of the mutant codon from the insertion transition table 140f; obtains the second subsequent codon of the mutant codon from the deletion transition table 140g; compares the obtained codons with the corresponding codons subsequent to the mutation position in the reference codon sequence data; and identifies the type of point mutation. Thus, as a result of performing comparison in the units of encoded codons in a consistent manner, the type of mutation can be determined while identifying the nonidentical codons. That makes it possible to reduce the time required for determining the type of mutation.
Given below is the explanation of a configuration of the information processing device according to the first embodiment.
The communication unit 110 is a processing unit that performs data communication with external devices (not illustrated) via a network. The communication unit 110 is an example of a communication device. For example, the information processing device 100 can receive information such as reference codon sequence data 140a and analysis-target codon sequence data 140b from an external device via a network.
The input unit 120 is an input device for enabling input of a variety of information to the information processing device 100. Examples of the input unit 120 include a keyboard, a mouse, and a touch-sensitive panel.
The display unit 130 is a display device that displays a variety of information output from the control unit 150. Examples of the display unit 130 include an organic EL (electro-luminescence) display, a liquid crystal display, and a touch-sensitive panel.
The memory unit 140 is used to store the reference codon sequence data 140a, the analysis-target codon sequence data 140b, a code conversion table 140c, first-type sequence data 140d, and second-type sequence data 140e. Moreover, the memory unit 140 is used to store the insertion transition table 140f, the deletion transition table 140g, and a detection result table 140h. Examples of the memory unit 140 include a semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), or a flash memory; and a memory device such as an HDD (Hard Disk Drive).
The reference codon sequence data 140a represents the information about normal base sequences indicated in the units of codons.
The analysis-target codon sequence data 140b represents the information about the target base sequence for analysis indicated in the units of codons.
The code conversion table 140c is a table in which codons and codes are held in a corresponding manner.
The first-type sequence data 140d represents the sequence data obtained as a result of encoding the reference codon sequence data 140a based on the code conversion table 140c.
The second-type sequence data 140e represents sequence data obtained as a result of encoding the analysis-target codon sequence data 140b based on the code conversion table 140c.
The insertion transition table 140f is a table in which mutation n codons and mutation n+1 codons, which are positioned subsequent to mutation codons, are held in a corresponding manner with pre-base-insertion mutant n codons.
In the transition table 50U, all mutation n codons, the mutation n+1 codons (the codons starting with U), and the pre-base-insertion mutant n codons are held in a corresponding manner. The relationship among the codons is defined by the encoded codons.
In the transition table 50C, all mutation n codons, the mutation n+1 codons (the codons starting with C), and the pre-base-insertion mutant n codons are held in a corresponding manner. The relationship among the codons is defined by the encoded codons.
In the transition table 50A, all mutation n codons, the mutation n+1 codons (the codons starting with A), and the pre-base-insertion mutant n codons are held in a corresponding manner. The relationship among the codons is defined by the encoded codons.
In the transition table 50G, all mutation n codons, the mutation n+1 codons (the codons starting with G), and the pre-base-insertion mutant n codons are held in a corresponding manner. The relationship among the codons is defined by the encoded codons.
In the deletion transition table 140g, the mutation n codons, all mutation n+1 codons, and the pre-base-deletion mutant n+1 codons are held in a corresponding manner.
In the transition table 55U, the mutation n codons (the codons ending with U), all mutation n+1 codons, and the pre-base-deletion mutant n+1 codons are held in a corresponding manner. The relationship among the codons is defined by the encoded codons.
In the transition table 55C, the mutation n codons (the codons ending with C), all mutation n+1 codons, and the pre-base-deletion mutant n+1 codons are held in a corresponding manner. The relationship among the codons is defined by the encoded codons.
In the transition table 55A, the mutation n codons (the codons ending with A), all mutation n+1 codons, and the pre-base-deletion mutant n+1 codons are held in a corresponding manner. The relationship among the codons is defined by the encoded codons.
In the transition table 55G, the mutation n codons (the codons ending with G), all mutation n+1 codons, and the pre-base-deletion mutant n+1 codons are held in a corresponding manner. The relationship among the codons is defined by the encoded codons.
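As a minimal sketch, both transition tables can be modeled as dictionaries keyed by pairs of one-byte codes. Two assumptions are made here. First, the code layout (40h plus a base-4 index with the base order U, C, A, G) is hypothetical; it is chosen only because it reproduces the codes quoted in this description for “AAG”, “CAA”, “GUG”, “UGC”, “AGU”, and “GCU”. Second, the shift rules used to fill the tables are the assumed one-base frameshift relationships, not a reproduction of the actual tables. The grouping into four sub-tables follows the description: by the first base of the mutation n+1 codon for insertion, and by the last base of the mutation n codon for deletion.

```python
from itertools import product

BASES = "UCAG"  # assumed base order
CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def code(codon: str) -> int:
    """Hypothetical one-byte code for a codon (not the actual table 140c)."""
    b1, b2, b3 = (BASES.index(b) for b in codon)
    return 0x40 + 16 * b1 + 4 * b2 + b3


# Sub-tables corresponding to 50U/50C/50A/50G and 55U/55C/55A/55G.
insertion_tables = {base: {} for base in BASES}
deletion_tables = {base: {} for base in BASES}

for mutation_n, mutation_n1 in product(CODONS, repeat=2):
    key = (code(mutation_n), code(mutation_n1))
    # Insertion: grouped by the first base of the mutation n+1 codon.
    insertion_tables[mutation_n1[0]][key] = code(mutation_n[1:] + mutation_n1[0])
    # Deletion: grouped by the last base of the mutation n codon.
    deletion_tables[mutation_n[2]][key] = code(mutation_n[2] + mutation_n1[:2])

# (CAA, GUG) -> AAG for insertion; (AGU, GCU) -> UGC for deletion.
print(hex(insertion_tables["G"][(0x5A, 0x73)]))
print(hex(deletion_tables["U"][(0x6C, 0x74)]))
```

Each sub-table in this sketch holds 1,024 entries (16 codons fixed by the grouping base times 64 codons on the other side), so a lookup is a single dictionary access on two bytes.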
Returning to the explanation with reference to
The control unit 150 includes a receiving unit 150a, an encoding unit 150b, a comparing unit 150c, and an identifying unit 150d. The control unit 150 is implemented using a CPU (Central Processing Unit) or an MPU (Micro Processing Unit). Alternatively, the control unit 150 can also be implemented using a hardwired logic such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
The receiving unit 150a is a processing unit that receives the reference codon sequence data 140a and the analysis-target codon sequence data 140b from the input unit 120 or an external device. Then, the receiving unit 150a registers the reference codon sequence data 140a and the analysis-target codon sequence data 140b in the memory unit 140.
Moreover, when the insertion transition table 140f and the deletion transition table 140g are received from the input unit 120 or an external device, the receiving unit 150a registers the insertion transition table 140f and the deletion transition table 140g in the memory unit 140.
The encoding unit 150b is a processing unit that encodes the reference codon sequence data 140a and the analysis-target codon sequence data 140b based on the code conversion table 140c. The encoding unit 150b compares the reference codon sequence data 140a and the code conversion table 140c and encodes each codon, so as to generate the first-type sequence data 140d. Similarly, the encoding unit 150b compares the analysis-target codon sequence data 140b and the code conversion table 140c and encodes each codon, so as to generate the second-type sequence data 140e. Then, the encoding unit 150b stores the first-type sequence data 140d and the second-type sequence data 140e in the memory unit 140.
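The encoding step can be sketched as follows. The code layout is an assumption: 40h plus a base-4 index with the base order U, C, A, G, which happens to reproduce the codes quoted in this description for “AAG”, “CAA”, “GUG”, “UGC”, “AGU”, and “GCU”. The actual code conversion table 140c may differ, and the function names are hypothetical.

```python
BASES = "UCAG"  # assumed base order


def encode_codon(codon: str) -> int:
    """Return a hypothetical one-byte code for a three-base codon."""
    if len(codon) != 3:
        raise ValueError("a codon has exactly three bases")
    b1, b2, b3 = (BASES.index(base) for base in codon)
    return 0x40 + 16 * b1 + 4 * b2 + b3


def encode_sequence(codons) -> bytes:
    """Encode a codon sequence into one byte per codon."""
    return bytes(encode_codon(codon) for codon in codons)


# Hypothetical reference and analysis-target codon sequences.
first_type = encode_sequence(["AAG", "UGC"])
second_type = encode_sequence(["CAA", "GUG"])
print(first_type.hex(), second_type.hex())
```

Because every codon occupies exactly one byte, the encoded sequences can be compared position by position without any realignment.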
As illustrated in
The comparing unit 150c is a processing unit that compares the first-type sequence data 140d and the second-type sequence data 140e, and identifies mutation positions at which the encoded codons are not identical. As explained above, each codon is assigned a 1-byte code. Hence, from the first-type sequence data 140d and the second-type sequence data 140e, the comparing unit 150c reads the codes one byte at a time from the beginning, and performs comparison.
If a mutation position having nonidentical codes is identified, the comparing unit 150c outputs the comparison result to the identifying unit 150d. The comparison result includes the information about the mutation position, a first-type mutant codon, a second-type mutation codon, the mutation n codon, and the mutation n+1 codon. The first-type mutant codon represents the encoded codon at the mutation position as included in the first-type sequence data 140d. The second-type mutation codon represents the encoded codon at the mutation position as included in the second-type sequence data 140e. The mutation n codon represents the codon (encoded codon) subsequent to the second-type mutation codon. The mutation n+1 codon represents the codon (encoded codon) positioned after the subsequent codon of the second-type mutation codon.
Meanwhile, when the first-type sequence data 140d is identical to the second-type sequence data 140e, the comparing unit 150c outputs the information indicating identicalness as the comparison result to the identifying unit 150d.
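The byte-by-byte comparison can be sketched as below. The function name, the dictionary keys, and the codes used for the first codon of each sequence are hypothetical; the remaining codes are those quoted in this description. As a simplification, the sketch reports only the first mutation position and ignores any difference in length between the two sequences.

```python
from typing import Optional


def compare_encoded(first_type: bytes, second_type: bytes) -> Optional[dict]:
    """Return a comparison result for the first nonidentical code, or None."""
    for position, (ref_code, tgt_code) in enumerate(zip(first_type, second_type)):
        if ref_code != tgt_code:
            # The two encoded codons that follow the mutation codon, if present.
            following = second_type[position + 1:position + 3]
            return {
                "mutation_position": position,
                "first_type_mutant_codon": ref_code,
                "second_type_mutation_codon": tgt_code,
                "mutation_n_codon": following[0] if len(following) > 0 else None,
                "mutation_n1_codon": following[1] if len(following) > 1 else None,
            }
    return None  # corresponds to the information indicating identicalness


# 0x75 and 0x71 are placeholder codes for the codon at the mutation position.
first_type = bytes([0x75, 0x6B, 0x4D])
second_type = bytes([0x71, 0x5A, 0x73])
result = compare_encoded(first_type, second_type)
print(result)
```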
The identifying unit 150d is a processing unit that, based on the comparison result obtained by the comparing unit 150c and based on the insertion transition table 140f and the deletion transition table 140g, identifies the type of point mutation that has occurred at the mutation position.
If the pre-base-insertion mutant n codon, which is identified by the comparison of the mutation n codon and the mutation n+1 codon with the insertion transition table 140f, is identical to the subsequent codon of the first-type mutant codon, then the identifying unit 150d sets “base insertion” as the type of point mutation that has occurred at the mutation position.
For example, assume that the following information is included in the comparison result: the first-type mutant n codon “AAG (6Bh)”, the second-type mutation n codon “CAA (5Ah)”, and the mutation n+1 codon “GUG (73h)”. As explained with reference to
On the other hand, when the pre-base-insertion mutant n codon, which is identified by the comparison of the mutation n codon and the mutation n+1 codon with the insertion transition table 140f, is not identical to the subsequent codon of the first-type mutant codon, the identifying unit 150d excludes “base insertion” from the candidate types of point mutation at the mutation position.
When the pre-base-deletion mutant n+1 codon, which is identified by the comparison of the mutation n codon and the mutation n+1 codon with the deletion transition table 140g, is identical to the codon positioned after the subsequent codon of the first-type mutant codon, the identifying unit 150d sets “base deletion” as the type of point mutation that has occurred at the mutation position.
For example, assume that the following information is included in the comparison result: the first-type mutant n+1 codon “UGC (4Dh)”, the second-type mutation n codon “AGU (6Ch)”, and the mutation n+1 codon “GCU (74h)”. As explained with reference to
On the other hand, when the pre-base-deletion mutant n+1 codon, which is identified by the comparison of the mutation n codon and the mutation n+1 codon with the deletion transition table 140g, is not identical to the codon positioned after the subsequent codon of the first-type mutant codon, the identifying unit 150d excludes “base deletion” from the candidate types of point mutation at the mutation position.
Meanwhile, as a result of performing identification using the insertion transition table 140f and performing identification using the deletion transition table 140g, if both “base insertion” and “base deletion” are excluded from the candidate types of point mutation at the mutation position, then the identifying unit 150d sets “base substitution” as the type of point mutation that has occurred at the mutation position.
The identifying unit 150d registers, in the detection result table 140h, the information associating the mutation positions and the types of point mutation. Meanwhile, if information indicating identicalness is included in the comparison result, then the identifying unit 150d registers, in the detection result table 140h, the information indicating the absence of abnormalities. The information processing device 100 either can notify the external devices about the information of the detection result table 140h via a network, or can output the information of the detection result table 140h to the display unit 130 for display purposes.
Given below is the explanation of an exemplary sequence of operations performed in the information processing device 100 according to the first embodiment.
The encoding unit 150b of the information processing device 100 encodes the reference codon sequence data 140a and the analysis-target codon sequence data 140b, and generates the first-type sequence data 140d and the second-type sequence data 140e, respectively (Step S102).
The comparing unit 150c of the information processing device 100 compares the first-type sequence data 140d and the second-type sequence data 140e in the units of codons (single bytes), and identifies mutation positions at which the codons are not identical (Step S103). Then, based on each mutation position, the comparing unit 150c identifies the first-type mutant codon, the mutant n codon, and the mutant n+1 codon in the first-type sequence data 140d; and identifies the second-type mutation codon, the mutation n codon, and the mutation n+1 codon in the second-type sequence data 140e (Step S104).
The identifying unit 150d of the information processing device 100 determines whether or not, in the insertion transition table 140f, the pre-base-insertion mutant n codon, which is identified from the mutation n codon and the mutation n+1 codon, is identical to the subsequent codon of the first-type mutant codon (Step S105). If the two codons are identical (Yes at Step S105), then the identifying unit 150d identifies “base insertion” as the type of point mutation (Step S106). On the other hand, if the two codons are not identical (No at Step S105), then the system control proceeds to Step S107.
The following explanation is given about Step S107. The identifying unit 150d determines whether or not, in the deletion transition table 140g, the pre-base-deletion mutant n+1 codon, which is identified from the mutation n codon and the mutation n+1 codon, is identical to the codon positioned after the subsequent codon of the first-type mutant codon (Step S107). If the two codons are identical (Yes at Step S107), then the identifying unit 150d identifies “base deletion” as the type of point mutation (Step S108).
On the other hand, if the two codons are not identical (No at Step S107), then the identifying unit 150d identifies “base substitution” as the type of point mutation (Step S109).
Then, the identifying unit 150d registers the information about the identified type of point mutation in the detection result table 140h (Step S110). The information processing device 100 outputs the detection result table 140h to the display unit 130 (Step S111).
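The decision sequence of Steps S103 to S109 can be sketched as follows. For readability the sketch works on codon strings rather than one-byte codes, and it replaces the table lookups with the assumed one-base frameshift rules. All names and the example sequences are hypothetical; only the first mutation position is handled.

```python
from typing import List, Optional, Tuple


def pre_insertion_codon(mutation_n: str, mutation_n1: str) -> str:
    """Stand-in for a lookup in the insertion transition table (assumed rule)."""
    return mutation_n[1:] + mutation_n1[0]


def pre_deletion_codon(mutation_n: str, mutation_n1: str) -> str:
    """Stand-in for a lookup in the deletion transition table (assumed rule)."""
    return mutation_n[-1] + mutation_n1[:2]


def identify_point_mutation(reference: List[str],
                            target: List[str]) -> Optional[Tuple[int, str]]:
    # Step S103: find the first position at which the codons are nonidentical.
    position = next(
        (i for i, (r, t) in enumerate(zip(reference, target)) if r != t), None)
    if position is None:
        return None  # no mutation position found
    if position + 2 >= len(target) or position + 2 >= len(reference):
        raise ValueError("two codons are needed after the mutation position")
    # Step S104: the two codons that follow the mutation codon.
    mutation_n, mutation_n1 = target[position + 1], target[position + 2]
    # Steps S105 and S106: check for base insertion first.
    if pre_insertion_codon(mutation_n, mutation_n1) == reference[position + 1]:
        return position, "base insertion"
    # Steps S107 and S108: then check for base deletion.
    if pre_deletion_codon(mutation_n, mutation_n1) == reference[position + 2]:
        return position, "base deletion"
    # Step S109: otherwise the mutation is treated as base substitution.
    return position, "base substitution"


print(identify_point_mutation(["GCC", "AAG", "UGC", "AUG"],
                              ["GUC", "CAA", "GUG", "CAU"]))
print(identify_point_mutation(["UCC", "AAG", "UGC", "UGG"],
                              ["UCA", "AGU", "GCU"]))
print(identify_point_mutation(["UCC", "AAG", "UGC"],
                              ["UCA", "AAG", "UGC"]))
```

The order of the checks mirrors the flow described above: insertion is tested before deletion, and substitution is the remaining case.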
Given below is the explanation of the effects achieved in the information processing device 100 according to the first embodiment. The information processing device 100 compares the first-type sequence data 140d and the second-type sequence data 140e in the units of one-byte codons, and identifies nonidentical codons (nonidentical encoded codons). Then, the information processing device 100 compares the transition destination codon, for which the nonidentical codons serve as the mutation position, with the insertion transition table 140f and the deletion transition table 140g, and identifies the type of point mutation included in the analysis-target codon sequence data. Thus, as a result of performing comparison in the units of encoded codons in a consistent manner, the type of mutation can be determined while identifying the nonidentical codons. That makes it possible to reduce the time required for determining the type of mutation.
The information processing device shifts the mutation position P40 to the sequence position of the subsequent codon. That position is referred to as a sequence position P41. Regarding the sequence position P41, the information processing device compares the mutation n codon “GUG (73h)” and the mutation n+1 codon “CAU (48h)” with the insertion transition table 140f; and identifies the pre-base-insertion mutant n codon “UGC (4Dh)”. Then, the information processing device performs correction by substituting the codon “GUG (73h)”, which is the subsequent codon of the mutation codon, with the codon “UGC (4Dh)”, which is the subsequent codon of the pre-base-insertion mutant codon.
As explained above, while shifting the sequence position, the information processing device repeatedly performs the operation of substituting the mutation n codon with the pre-base-insertion mutant n codon, and generates third-type sequence data 240e.
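The repeated substitution for the “base insertion” case can be sketched as follows. The sketch works on codon strings and uses the assumed one-base shift rule in place of the insertion transition table. Two simplifications are assumptions of the sketch: the codon at the mutation position is left unchanged, and the last codon, which has no successor, is dropped. The names and the example sequences are hypothetical; the target contains one further changed base in addition to the inserted base.

```python
from typing import List


def correct_insertion(target: List[str], position: int) -> List[str]:
    """Replace each codon after the mutation position with its pre-insertion codon."""
    corrected = list(target[:position + 1])
    for i in range(position + 1, len(target) - 1):
        # Mutation n codon and mutation n+1 codon -> pre-base-insertion mutant n codon.
        corrected.append(target[i][1:] + target[i + 1][0])
    return corrected


def nonidentical_positions(reference: List[str], corrected: List[str],
                           position: int) -> List[int]:
    """Positions after the mutation position at which the codons still differ."""
    end = min(len(reference), len(corrected))
    return [i for i in range(position + 1, end) if reference[i] != corrected[i]]


reference = ["GCC", "AAG", "UGC", "AUG", "GGA"]
target = ["GUC", "CAA", "GUG", "CAU", "AGG"]  # insertion at position 0

third_type = correct_insertion(target, 0)
print(third_type)
print(nonidentical_positions(reference, third_type, 0))
```

In this hypothetical example the corrected sequence differs from the reference at position 3 only, which is the kind of remaining difference that is treated as the underlying genetic mutation.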
Then, the information processing device compares the encoded codons in the third-type sequence data 240e with the encoded codons in the first-type sequence data 140d, and identifies the nonidentical codons. The information processing device identifies the nonidentical codons as the underlying genetic mutation. In the example illustrated in
Explained below with reference to
Although not illustrated in
As explained above, while shifting the sequence position, the information processing device repeatedly performs the operation of substituting the mutation n+1 codon with the pre-base-deletion mutant n+1 codon, and generates the third-type sequence data 240e.
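The corresponding operation for the “base deletion” case can be sketched in the same manner. The sketch uses the assumed one-base shift rule in place of the deletion transition table and leaves the codon at the mutation position unchanged, which is a simplification of the sketch. The names and the example sequences are hypothetical; the target contains one further changed base in addition to the deleted base.

```python
from typing import List


def correct_deletion(target: List[str], position: int) -> List[str]:
    """Replace each codon after the mutation position with its pre-deletion codon."""
    corrected = list(target[:position + 1])
    for i in range(position, len(target) - 1):
        # Mutation n codon and mutation n+1 codon -> pre-base-deletion mutant n+1 codon.
        corrected.append(target[i][-1] + target[i + 1][:2])
    return corrected


def nonidentical_positions(reference: List[str], corrected: List[str],
                           position: int) -> List[int]:
    """Positions after the mutation position at which the codons still differ."""
    end = min(len(reference), len(corrected))
    return [i for i in range(position + 1, end) if reference[i] != corrected[i]]


reference = ["UCC", "AAG", "UGC", "UGG", "GAU"]
target = ["UCA", "AGU", "GCU", "GAG"]  # deletion at position 0

third_type = correct_deletion(target, 0)
print(third_type)
print(nonidentical_positions(reference, third_type, 0))
```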
Then, the information processing device compares the encoded codons in the third-type sequence data 240e and the encoded codons in the first-type sequence data 140d, and identifies the nonidentical codons. The information processing device identifies the nonidentical codons as the underlying genetic mutation. In the example illustrated in
Explained below with reference to
The information processing device compares the encoded codons in the third-type sequence data 240e with the encoded codons in the first-type sequence data 140d, and identifies the nonidentical codons. The information processing device identifies the nonidentical codons as the underlying genetic mutation. In the example illustrated in
As explained above, after identifying the type of point mutation, the information processing device according to the second embodiment generates the third-type sequence data 240e by correcting the second-type sequence data 140e and identifies the nonidentical codons between the first-type sequence data 140d and the third-type sequence data 240e. As a result, the underlying genetic mutation can be detected.
Given below is the explanation of a configuration of the information processing device according to the second embodiment.
The memory unit 240 is used to store the reference codon sequence data 140a, the analysis-target codon sequence data 140b, the code conversion table 140c, the first-type sequence data 140d, and the second-type sequence data 140e. Moreover, the memory unit 240 is used to store the insertion transition table 140f, the deletion transition table 140g, the third-type sequence data 240e, and a detection result table 240h. Examples of the memory unit 240 include a semiconductor memory such as a RAM, a ROM, or a flash memory; and a memory device such as an HDD.
Regarding the reference codon sequence data 140a, the analysis-target codon sequence data 140b, the code conversion table 140c, the first-type sequence data 140d, and the second-type sequence data 140e stored in the memory unit 240, the explanation is identical to the explanation given in the first embodiment. Moreover, regarding the insertion transition table 140f and the deletion transition table 140g stored in the memory unit 240, the explanation is identical to the explanation given in the first embodiment.
The third-type sequence data 240e represents sequence data in which, from among the encoded codons in the second-type sequence data 140e, the codons corresponding to point mutation are corrected to normal codons.
The detection result table 240h is a table for holding the information about point mutation and genetic mutation detected from the analysis-target codon sequence data 140b.
The control unit 250 includes the receiving unit 150a, the encoding unit 150b, the comparing unit 150c, and an identifying unit 250d. The control unit 250 is implemented using a CPU or an MPU. Alternatively, the control unit 250 can be implemented using a hardwired logic such as an ASIC or an FPGA.
The receiving unit 150a is a processing unit that receives the reference codon sequence data 140a and the analysis-target codon sequence data 140b from the input unit 120 or an external device. Then, the receiving unit 150a registers the reference codon sequence data 140a and the analysis-target codon sequence data 140b in the memory unit 240. Besides that, the operations of the receiving unit 150a are identical to the explanation according to the first embodiment.
The encoding unit 150b is a processing unit that encodes the reference codon sequence data 140a and the analysis-target codon sequence data 140b based on the code conversion table 140c. Besides that, the operations of the encoding unit 150b are identical to the explanation according to the first embodiment.
The comparing unit 150c is a processing unit that compares the first-type sequence data 140d and the second-type sequence data 140e, and identifies mutation positions at which the encoded codons are not identical. Then, the comparing unit 150c outputs the comparison result to the identifying unit 250d. Besides that, the operations of the comparing unit 150c are identical to the explanation according to the first embodiment.
The identifying unit 250d identifies the type of point mutation, which has occurred at a mutation position, based on the comparison result of the comparing unit 150c, the insertion transition table 140f, and the deletion transition table 140g. Once the type of point mutation is identified, the identifying unit 250d generates the third-type sequence data 240e by correcting the second-type sequence data 140e. Then, the identifying unit 250d compares the first-type sequence data 140d and the third-type sequence data 240e, and detects genetic mutation. The identifying unit 250d registers the information about the mutation position, the type of point mutation, and the genetic mutation in the detection result table 240h.
Regarding the identifying unit 250d, the operations for identifying the type of point mutation are identical to the operations performed by the identifying unit 150d according to the first embodiment. In the following explanation, the operations performed by the identifying unit 250d are separately explained for the cases in which point mutation of the “base insertion” type is detected, point mutation of the “base deletion” type is detected, and point mutation of the “base substitution” type is detected.
Given below is the explanation of the operations performed by the identifying unit 250d when point mutation of the “base insertion” type is detected. As explained with reference to
Subsequently, the identifying unit 250d shifts the mutation position P40 to the subsequent sequence position. That position is referred to as the sequence position P41. Regarding the sequence position P41, the identifying unit 250d compares the mutation n codon “GUG (73h)” and the mutation n+1 codon “CAU (48h)” with the insertion transition table 140f; and identifies the pre-base-insertion mutant n codon “UGC (4Dh)”. Then, the identifying unit 250d performs correction by substituting the codon “GUG (73h)”, which is the codon positioned after the subsequent codon of the mutation codon, with the codon “UGC (4Dh)”, which is the pre-base-insertion mutant n codon.
As explained above, while shifting the sequence position, the identifying unit 250d repeatedly performs the operation of substituting the mutation n codon with the pre-base-insertion mutant n codon, and generates the third-type sequence data 240e.
Then, the identifying unit 250d compares the encoded codons in the third-type sequence data 240e with the encoded codons in the first-type sequence data 140d, and identifies the nonidentical codons. The identifying unit 250d identifies the nonidentical codons as the underlying genetic mutation. In the example illustrated in
Then, in the detection result table 240h, the identifying unit 250d registers the information indicating “base insertion” as the type of point mutation and indicating the mutation position, as well as registers the information about the codons identified as the genetic mutation and their sequence positions.
Given below is the explanation about the operations performed by the identifying unit 250d when point mutation of the “base deletion” type is detected. With reference to
Although not illustrated in
As explained above, while shifting the sequence position, the identifying unit 250d repeatedly performs the operation of substituting the mutation n+1 codon with the pre-base-deletion mutant n+1 codon, and generates the third-type sequence data 240e.
The identifying unit 250d compares the encoded codons in the third-type sequence data 240e and the encoded codons in the first-type sequence data 140d, and identifies the nonidentical codons. The identifying unit 250d identifies the nonidentical codons as the underlying genetic mutation. In the example illustrated in
Then, in the detection result table 240h, the identifying unit 250d registers the information indicating “base deletion” as the type of point mutation and indicating the mutation position, as well as registers the information about the codons identified as the genetic mutation and their sequence positions.
Given below is the explanation about the operations performed by the identifying unit 250d when point mutation of the “base substitution” type is detected. With reference to
The identifying unit 250d compares the encoded codons in the third-type sequence data 240e with the encoded codons in the first-type sequence data 140d, and identifies the nonidentical codons. The identifying unit 250d identifies the nonidentical codons as the underlying genetic mutation. In the example illustrated in
Then, in the detection result table 240h, the identifying unit 250d registers the information indicating “base substitution” as the type of point mutation and indicating the mutation position, as well as registers the information about the codons identified as the genetic mutation and their sequence positions.
Given below is the explanation of an exemplary sequence of operations performed in the information processing device 200 according to the second embodiment.
The encoding unit 150b of the information processing device 200 encodes the reference codon sequence data 140a and the analysis-target codon sequence data 140b, and generates the first-type sequence data 140d and the second-type sequence data 140e, respectively (Step S202).
The comparing unit 150c of the information processing device 200 compares the first-type sequence data 140d and the second-type sequence data 140e in the units of codons (single bytes), and identifies mutation positions at which the codons are not identical (Step S203). Then, the identifying unit 250d of the information processing device 200 identifies the type of point mutation (Step S204). The sequence of operations performed for identifying the type of point mutation is the same as the sequence of operations performed from Step S105 to Step S109 illustrated in
Based on the type of point mutation, the identifying unit 250d generates the third-type sequence data 240e by correcting the second-type sequence data 140e (Step S205). Then, the identifying unit 250d compares the first-type sequence data 140d and the third-type sequence data 240e, and identifies genetic mutation (Step S206).
Subsequently, the identifying unit 250d registers the information indicating the identified type of mutation and the identified genetic mutation in the detection result table 240h (Step S207). The information processing device 200 outputs the detection result table 240h to the display unit 130 (Step S208).
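The comparison in the units of codons at Step S203 and Step S206 can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the function name `find_mismatch_positions` is hypothetical, each encoded codon is assumed to occupy a single byte, and the two short sequences are made up from the codes “AUG (63h)”, “UUU (40h)”, “GUC (71h)”, and “GUG (73h)” that appear in the explanation.

```python
def find_mismatch_positions(first_type: bytes, second_type: bytes) -> list[int]:
    """Return the sequence positions at which the encoded codons differ.

    Assumption: one byte per encoded codon, and both sequences are aligned
    from the same first codon.
    """
    return [
        position
        for position, (ref_codon, target_codon) in enumerate(zip(first_type, second_type))
        if ref_codon != target_codon
    ]


# Hypothetical sample data built from codes quoted in the explanation.
reference = bytes([0x63, 0x40, 0x71])  # AUG, UUU, GUC
target = bytes([0x63, 0x40, 0x73])     # AUG, UUU, GUG

print(find_mismatch_positions(reference, target))  # [2]
```

The same function could be applied to the first-type sequence data and the corrected third-type sequence data, in which case the returned positions would correspond to the codons treated as genetic mutation.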
Given below is the explanation about the effects achieved in the information processing device 200 according to the second embodiment. After identifying the type of point mutation included in the second-type sequence data 140e, the information processing device 200 generates the third-type sequence data 240e by correcting the second-type sequence data 140e; and identifies nonidentical codons between the first-type sequence data 140d and the third-type sequence data 240e. Because the comparison continues to be performed in the units of encoded codons even after the type of point mutation has been determined, the underlying genetic mutation can be detected in a consistent manner.
For the purpose of illustration, the explanation is given about the case in which the information processing device 200 according to the second embodiment generates the third-type sequence data 240e, and compares it with the first-type sequence data 140d. However, that is not the only possible case. Alternatively, instead of generating the third-type sequence data 240e, the information processing device 200 can convert the second-type sequence data 140e into the units of bytes, and compare the conversion result with the first-type sequence data 140d in the units of bytes.
Given below is the explanation of the other operations performed in the information processing device 200 according to the second embodiment. When the input of a search query is an amino-acid sequence, the information processing device 200 performs codon-amino acid conversion based on the first-type sequence data 140d that is obtained by encoding the reference codon sequence data 140a written using base symbols; and generates fourth-type sequence data (not illustrated in the drawings). Then, the information processing device 200 compares, in the units of amino acids, the fourth-type sequence data, which is obtained as a result of codon-amino acid conversion, with the amino-acid sequence specified in the search query; and identifies mutation positions.
Then, the information processing device 200 compares the fourth-type sequence data 240j and the second-type sequence data 140e, and identifies mutation positions at which the amino acids are not identical. In the example illustrated in
Given below is the explanation of an exemplary sequence of operations performed in the information processing device 200 according to the second embodiment when the input of a search query is an amino-acid sequence.
The receiving unit 150a receives the amino-acid sequence data to be analyzed (Step S212). Then, the encoding unit 150b encodes the amino-acid sequence data to be analyzed, and generates the second-type sequence data 140e (Step S213). At Step S213, the encoding unit 150b converts the amino-acid sequence data, which is to be analyzed, into the second-type sequence data 140e based on the code conversion table 140c. Although the specific explanation is not given, it is assumed that the code conversion table 140c is used to hold the amino acids and the encoded amino acids in a corresponding manner.
Then, based on the codon-amino acid conversion table 240i, the comparing unit 150c of the information processing device 200 generates the fourth-type sequence data 240j from the first-type sequence data 140d (Step S214). Subsequently, the comparing unit 150c compares the fourth-type sequence data 240j and the second-type sequence data 140e in the units of amino acids, and identifies mutation positions (Step S215).
The information processing device 200 registers the information about the mutation positions, which are identified by the comparing unit 150c, in the detection result table 240h (Step S216). Then, the information processing device 200 outputs the detection result table 240h to the display unit 130 (Step S217).
In this way, when the input of a search query is an amino-acid sequence, the information processing device 200 performs codon-amino acid conversion based on the first-type sequence data 140d, which is obtained by encoding the reference codon sequence data 140a written using base symbols, and compares the conversion result with the search query. Thus, even when the input of a search query is an amino-acid sequence, it becomes possible to identify the amino acids in which mutation has occurred.
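The codon-amino acid conversion and the comparison in the units of amino acids can be sketched as follows. The sketch rests on several assumptions: the table `CODON_TO_AMINO` is only a hypothetical excerpt standing in for the codon-amino acid conversion table 240i, the amino acids are written with their one-letter symbols instead of encoded values, and the function names are made up for illustration. The codon-to-amino-acid pairs themselves follow the standard genetic code.

```python
# Hypothetical excerpt standing in for the codon-amino acid conversion table.
# Keys are the encoded codons quoted in the explanation; values are the
# one-letter amino acid symbols of the standard genetic code.
CODON_TO_AMINO = {
    0x63: "M",  # AUG -> methionine
    0x40: "F",  # UUU -> phenylalanine
    0x71: "V",  # GUC -> valine
    0x73: "V",  # GUG -> valine
    0x4D: "C",  # UGC -> cysteine
}


def to_amino_acids(first_type: bytes) -> str:
    """Convert encoded codons into an amino-acid sequence (fourth-type data)."""
    return "".join(CODON_TO_AMINO[codon] for codon in first_type)


def amino_mismatch_positions(fourth_type: str, query: str) -> list[int]:
    """Return the positions at which the amino acids are not identical."""
    return [i for i, (a, b) in enumerate(zip(fourth_type, query)) if a != b]


first_type = bytes([0x63, 0x40, 0x71])    # AUG, UUU, GUC
fourth_type = to_amino_acids(first_type)  # "MFV"
print(amino_mismatch_positions(fourth_type, "MFC"))  # [2]
```

Since “GUC” and “GUG” both map to valine in this excerpt, a substitution between those two codons would not appear as a mismatch in a comparison performed in the units of amino acids.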
The following explanation is given regarding
The horizontal axis of the inverted index 340a corresponds to the offsets. The vertical axis of the inverted index 340a corresponds to the types of the encoded codons. The inverted index 340a is illustrated using bitmaps of “0” and “1”; and, in the initial state, all bitmaps are set to “0”.
Herein, the offset implies the offset from the first codon included in the sequence data. In the third embodiment, the first codon is assumed to have the offset of “0”. For example, regarding the first-type sequence data 140d, if the codon “AUG (63h)” is the seventh codon from the beginning, then it has the offset of “6”.
The information processing device scans the first-type sequence data 140d from the beginning; identifies the relationship between the types of the encoded codons and the offsets; and sets “1” at corresponding positions in the inverted index 340a. For example, since the codon “AUG (63h)” is present at the offset “6”, the information processing device sets “1” at the intersecting position of the column of the offset “6” and the row of the codon type “AUG (63h)”. The information processing device performs such operations in a repeated manner and generates the inverted index 340a.
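The generation of the inverted index can be sketched as follows. This is only an illustration under stated assumptions: each row of the inverted index is held as a Python integer in which bit i corresponds to the offset i, the function name `build_inverted_index` is hypothetical, and the sample sequence is made up so that “AUG (63h)” falls at the offset “6”.

```python
def build_inverted_index(first_type: bytes) -> dict[int, int]:
    """Build one bitmap per encoded codon type.

    Assumption: a bitmap is an integer whose bit i is 1 when the codon
    occurs at offset i of the first-type sequence data.
    """
    index: dict[int, int] = {}
    for offset, codon in enumerate(first_type):
        index[codon] = index.get(codon, 0) | (1 << offset)
    return index


# Hypothetical sample: six filler codons, then AUG, UUU, GUC.
first_type = bytes([0x4D] * 6 + [0x63, 0x40, 0x71])
index = build_inverted_index(first_type)

print(bin(index[0x63]))  # 0b1000000, i.e. "1" at the offset 6
```

Codon types that never occur in the sequence simply have no entry, which corresponds to a row that remains all “0”.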
The following explanation is given regarding
The information processing device obtains, from the inverted index 340a, a bitmap b10 of the codon “AUG (63h)”, a bitmap b11 of the codon “UUU (40h)”, a bitmap b12 of the codon “GUC (71h)”, and so on in a sequential manner. The bitmap b10 is the bitmap corresponding to the row of the codon type “AUG (63h)” in the inverted index 340a. The bitmap b11 is the bitmap corresponding to the row of the codon type “UUU (40h)” in the inverted index 340a. The bitmap b12 is the bitmap corresponding to the row of the codon type “GUC (71h)” in the inverted index 340a.
The information processing device focuses on the positions of “1” in the bitmaps b10 to b12 and, as long as the position of “1” shifts to the left side by one offset in sequence, determines that the codons are identical in the first-type sequence data 140d and the second-type sequence data 140e. When the position of “1” stops shifting to the left side by one offset in sequence, the information processing device determines that the codons are not identical in the first-type sequence data 140d and the second-type sequence data 140e. In the example illustrated in
As explained above, the information processing device according to the third embodiment generates the inverted index 340a based on the first-type sequence data 140d. The information processing device obtains, from the inverted index 340a, the bitmaps corresponding to the codon types in a sequential manner from the first codon included in the second-type sequence data 140e; and identifies nonidentical codons based on the positions of the flag “1” in a plurality of obtained bitmaps. As a result, it becomes possible to perform a high-speed search for the codons having point mutation.
Given below is the explanation of a configuration of the information processing device according to the third embodiment.
The memory unit 340 is used to store the reference codon sequence data 140a, the analysis-target codon sequence data 140b, the code conversion table 140c, the first-type sequence data 140d, the inverted index 340a, and the second-type sequence data 140e. Moreover, the memory unit 340 is used to store the insertion transition table 140f, the deletion transition table 140g, the third-type sequence data 240e, and the detection result table 240h. Examples of the memory unit 340 include a semiconductor memory such as a RAM, a ROM, or a flash memory; and a memory device such as an HDD. Meanwhile, although not illustrated in
Regarding the reference codon sequence data 140a, the analysis-target codon sequence data 140b, the code conversion table 140c, the first-type sequence data 140d, and the second-type sequence data 140e stored in the memory unit 340; the explanation is identical to the explanation given in the first embodiment. Moreover, regarding the insertion transition table 140f and the deletion transition table 140g stored in the memory unit 340, the explanation is identical to the explanation given in the first embodiment. Furthermore, regarding the third-type sequence data 240e and the detection result table 240h stored in the memory unit 340, the explanation is identical to the explanation given in the second embodiment.
The inverted index 340a represents information indicating the relationship between the types of the encoded codons, which are included in the first-type sequence data 140d, and the sequence positions (offsets) using bitmaps. As explained with reference to
The control unit 350 includes the receiving unit 150a, the encoding unit 150b, a generating unit 350a, an obtaining unit 350b, and an identifying unit 350c. The control unit 350 is implemented using a CPU or an MPU. Alternatively, the control unit 350 can be implemented using a hardwired logic such as an ASIC or an FPGA.
The receiving unit 150a is a processing unit that receives the reference codon sequence data 140a and the analysis-target codon sequence data 140b from the input unit 120 or an external device. Then, the receiving unit 150a registers the reference codon sequence data 140a and the analysis-target codon sequence data 140b in the memory unit 340. Besides that, the operations of the receiving unit 150a are identical to the explanation according to the first embodiment.
The encoding unit 150b is a processing unit that encodes the reference codon sequence data 140a and the analysis-target codon sequence data 140b based on the code conversion table 140c. Besides that, the operations of the encoding unit 150b are identical to the explanation according to the first embodiment.
The generating unit 350a is a processing unit that generates the inverted index 340a based on the first-type sequence data 140d. The generating unit 350a scans the first-type sequence data 140d from the beginning; identifies the relationship between the types of the encoded codons and the offsets (sequence positions); and sets “1” at the corresponding locations in the inverted index 340a. For example, since the codon “AUG (63h)” is present at the offset “6”, the generating unit 350a sets “1” at the intersecting position of the column of the offset “6” and the row of the codon type “AUG (63h)”. The generating unit 350a performs such operations in a repeated manner and generates the inverted index 340a.
Upon generating the inverted index 340a, in order to reduce the information volume, the generating unit 350a can perform hashing of the inverted index 340a.
In the example illustrated in
The bitmap b1 represents a bitmap obtained by extracting a particular row of an inverted index (for example, the inverted index 340a illustrated in
The generating unit 350a associates, to the positions in the hashed bitmap, the values obtained as the remainders when the positions of the bits of the bitmap b1 are divided by a single base. When “1” is set at the position of a bit in the bitmap b1, the generating unit 350a sets “1” at the corresponding position in the hashed bitmap.
Given below is the explanation of an example of the operations performed to generate the hashed bitmap h11 having the base “29” from the bitmap b1. Firstly, the generating unit 350a copies the information about the positions “0 to 28” of the bitmap b1 in the hashed bitmap h11. Subsequently, if the bit position “35” in the bitmap b1 is divided by the base “29”, the remainder is equal to “6”. Hence, the position “35” in the bitmap b1 is associated to the position “6” in the hashed bitmap h11. Since “1” is set at the position “35” in the bitmap b1, the generating unit 350a sets “1” at the position “6” in the hashed bitmap h11.
If the bit position “42” in the bitmap b1 is divided by the base “29”, the remainder is equal to “13”. Hence, the position “42” in the bitmap b1 is associated to the position “13” in the hashed bitmap h11. Since “1” is set at the position “42” in the bitmap b1, the generating unit 350a sets “1” at the position “13” in the hashed bitmap h11.
Regarding the positions from the position “29” onward in the bitmap b1, the generating unit 350a repeatedly performs the operations explained above and generates the hashed bitmap h11.
Given below is the explanation of an example of the operations performed to generate the hashed bitmap h12 having the base “31” from the bitmap b1. Firstly, the generating unit 350a copies the information about the positions “0 to 30” of the bitmap b1 in the hashed bitmap h12. Subsequently, if the bit position “35” in the bitmap b1 is divided by the base “31”, the remainder is equal to “4”. Hence, the position “35” in the bitmap b1 is associated to the position “4” in the hashed bitmap h12. Since “1” is set at the position “35” in the bitmap b1, the generating unit 350a sets “1” at the position “4” in the hashed bitmap h12.
If the bit position “42” in the bitmap b1 is divided by the base “31”, the remainder is equal to “11”. Hence, the position “42” in the bitmap b1 is associated to the position “11” in the hashed bitmap h12. Since “1” is set at the position “42” in the bitmap b1, the generating unit 350a sets “1” at the position “11” in the hashed bitmap h12.
Regarding the positions from the position “31” onward in the bitmap b1, the generating unit 350a repeatedly performs the operations explained above and generates the hashed bitmap h12.
Regarding each row in the inverted index 340a, the generating unit 350a performs compression according to the loop back technique explained above, and obtains a hashed inverted index. Meanwhile, the hashed bitmaps corresponding to the bases “29” and “31” are attached with the information about the corresponding row (the types of the encoded codons) of the respective source bitmaps.
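The folding of a bitmap by the remainders of the bases “29” and “31” can be sketched as follows. The sketch assumes that a bitmap is held as an integer in which bit i corresponds to the position i, and that the bitmap b1 has “1” only at the positions “35” and “42”; the actual bitmap b1 may contain other bits. The function name `hash_bitmap` is hypothetical.

```python
def hash_bitmap(bitmap: int, base: int) -> int:
    """Fold a bitmap so that position p is mapped to position p % base."""
    hashed = 0
    position = 0
    while bitmap >> position:            # stop once no higher bit is set
        if (bitmap >> position) & 1:
            hashed |= 1 << (position % base)
        position += 1
    return hashed


# Assumed content of the bitmap b1: "1" at the positions 35 and 42 only.
b1 = (1 << 35) | (1 << 42)

h11 = hash_bitmap(b1, 29)  # "1" at the positions 6 and 13
h12 = hash_bitmap(b1, 31)  # "1" at the positions 4 and 11

print([p for p in range(29) if (h11 >> p) & 1])  # [6, 13]
print([p for p in range(31) if (h12 >> p) & 1])  # [4, 11]
```

For positions smaller than the base, the remainder equals the position itself, which matches the copying of the positions “0 to 28” and “0 to 30” described above.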
The obtaining unit 350b is a processing unit that sequentially obtains, from the inverted index 340a, the bitmaps corresponding to the encoded codons included in the second-type sequence data 140e. Then, the obtaining unit 350b outputs the information about the obtained bitmaps to the identifying unit 350c. Herein, it is assumed that the bitmap information output to the identifying unit 350c is sorted in the order in which it was read.
The obtaining unit 350b reads the encoded codons in sequence from the start codon in the second-type sequence data 140e and obtains, from the inverted index 340a, the bitmap corresponding to the type of the read codon. For example, it is assumed that “AUG (63h)” represents the start codon and that the second-type sequence data 140e is as illustrated in
Meanwhile, when the inverted index 340a is hashed, the obtaining unit 350b performs the following operations and restores the hashed inverted index 340a.
The obtaining unit 350b generates an intermediate bitmap h11′ from the hashed bitmap h11 corresponding to the base “29”. The obtaining unit 350b copies the values of the positions “0” to “28” in the hashed bitmap h11 to the positions “0” to “28” in the intermediate bitmap h11′.
Regarding the values from the position “29” onward in the intermediate bitmap h11′, the obtaining unit 350b repeatedly performs, after every position “29”, the operation of copying the values of the positions “0” to “28” in the hashed bitmap h11. In the example illustrated in
The obtaining unit 350b generates an intermediate bitmap h12′ from the hashed bitmap h12 corresponding to the base “31”. The obtaining unit 350b copies the values of the positions “0” to “30” in the hashed bitmap h12 to the positions “0” to “30” in the intermediate bitmap h12′.
Regarding the values from the position “31” onward in the intermediate bitmap h12′, the obtaining unit 350b repeatedly performs, after every position “31”, the operation of copying the values of the positions “0” to “30” in the hashed bitmap h12. In the example illustrated in
After generating the intermediate bitmaps h11′ and h12′, the obtaining unit 350b performs the AND operation of the intermediate bitmaps h11′ and h12′ so as to restore the pre-hashing bitmap b1. Regarding the other hashed bitmaps too, the obtaining unit 350b can perform identical operations and restore the bitmaps corresponding to the codons (i.e., restore the inverted index 340a).
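The restoration by repeated copying and the AND operation can be sketched as follows. As before, bitmaps are assumed to be integers in which bit i corresponds to the position i. The hashed bitmaps are assumed to have “1” at the positions “6” and “13” (base “29”) and at the positions “4” and “11” (base “31”), and the length of 43 positions is an assumption chosen to cover the position “42”. The function name `expand` is hypothetical.

```python
def expand(hashed: int, base: int, length: int) -> int:
    """Build an intermediate bitmap by repeating the hashed bitmap every `base` positions."""
    intermediate = 0
    for position in range(length):
        if (hashed >> (position % base)) & 1:
            intermediate |= 1 << position
    return intermediate


# Assumed hashed bitmaps for the bases 29 and 31.
h11 = (1 << 6) | (1 << 13)
h12 = (1 << 4) | (1 << 11)
LENGTH = 43  # assumed length of the original bitmap

h11_intermediate = expand(h11, 29, LENGTH)  # "1" at 6, 13, 35, 42
h12_intermediate = expand(h12, 31, LENGTH)  # "1" at 4, 11, 35, 42

restored = h11_intermediate & h12_intermediate
print([p for p in range(LENGTH) if (restored >> p) & 1])  # [35, 42]
```

The AND operation keeps only the positions that are consistent with both hashed bitmaps. In this small example that yields exactly the positions “35” and “42”; for much longer bitmaps with several bits set, a position could satisfy both remainders by coincidence, a case that this sketch does not handle.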
Returning to the explanation with reference to
Given below is the explanation of the operations performed by the identifying unit 350c for identifying the mutation position at which the first-type sequence data 140d and the second-type sequence data 140e become nonidentical.
The identifying unit 350c performs left-side shifting of the bitmap b10 and generates a bitmap b10-1 (Step S10). Then, the identifying unit 350c performs the AND operation of the bitmap b10-1 and the bitmap b11, and calculates a bitmap b11-1 (Step S11). In the bitmap b11-1, the bit “1” is set at the offset “7”. Thus, it implies that the first-type sequence data 140d and the second-type sequence data 140e are identical from the offset “0” to the offset “7”.
Moreover, the identifying unit 350c performs left-side shifting of the bitmap b11-1 and calculates a bitmap b11-2 (Step S12). Then, the identifying unit 350c performs the AND operation of the bitmap b11-2 and the bitmap b12, and calculates a bitmap b12-1 (Step S13). In the bitmap b11-2, the bit “1” is set at the offset “8”. However, in the bitmap b12-1, the offset “8” has the bit “0” set therein. Hence, the identifying unit 350c determines that the first-type sequence data 140d and the second-type sequence data 140e are not identical at the offset (sequence position) “8”.
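Steps S10 to S13 can be traced with a few integer operations, as sketched below. The bitmap contents are assumptions: b10 and b11 are taken to have “1” only at the offsets “6” and “7”, respectively, and b12 is taken to have “1” at the offset “3”, which merely stands for any offset other than “8”. Bit i of an integer corresponds to the offset i, so that left-side shifting corresponds to the operator `<<`.

```python
# Assumed bitmaps obtained from the inverted index.
b10 = 1 << 6  # AUG (63h) at the offset 6
b11 = 1 << 7  # UUU (40h) at the offset 7
b12 = 1 << 3  # GUC (71h) at some offset other than 8 (assumption)

b10_1 = b10 << 1       # Step S10: left-side shifting of b10
b11_1 = b10_1 & b11    # Step S11: "1" remains at the offset 7
b11_2 = b11_1 << 1     # Step S12: "1" moves to the offset 8
b12_1 = b11_2 & b12    # Step S13: the offset 8 becomes "0"

print((b11_1 >> 7) & 1)  # 1: identical up to the offset 7
print((b12_1 >> 8) & 1)  # 0: not identical at the offset 8
```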
Given below is the explanation of the operations performed by the identifying unit 350c for identifying the type of point mutation. Based on a nonidentical mutation position (offset) and based on the insertion transition table 140f and the deletion transition table 140g, the identifying unit 350c identifies the type of point mutation that has occurred at the mutation position. Once the type of point mutation is identified, the identifying unit 350c generates the third-type sequence data 240e by correcting the second-type sequence data 140e.
Herein, the operations performed by the identifying unit 350c for identifying the type of point mutation are identical to the operations performed by the identifying unit 150d according to the first embodiment. Moreover, the operations performed by the identifying unit 350c for generating the third-type sequence data 240e by correcting the second-type sequence data 140e based on the type of point mutation are identical to the operations performed by the identifying unit 250d according to the second embodiment.
Given below is the explanation of the operations performed by the identifying unit 350c for identifying genetic mutation. The identifying unit 350c sequentially obtains, from the inverted index 340a, the bitmaps corresponding to the types of the encoded codons included in the third-type sequence data 240e. In the case of reading a bitmap, in an identical manner to the obtaining unit 350b, the identifying unit 350c reads the encoded codons in sequence from the start codon, and obtains the bitmaps corresponding to the types of the read codons from the inverted index 340a.
Once the bitmaps are obtained, in an identical manner to the explanation given with reference to
The identifying unit 350c performs the operations explained above and registers, in the detection result table 240h, the information about the type of point mutation and the mutation position (offset), as well as registers the information about the codon identified as genetic mutation and its sequence position (offset).
Given below is the explanation of an exemplary sequence of operations performed in the information processing device 300 according to the third embodiment.
The encoding unit 150b of the information processing device 300 encodes the reference codon sequence data 140a and generates the first-type sequence data 140d, and the inverted index 340a is generated at the same time (Step S302).
The encoding unit 150b of the information processing device 300 encodes the analysis-target codon sequence data 140b and generates the second-type sequence data 140e (Step S303). The obtaining unit 350b of the information processing device 300 compares the encoded codons in the second-type sequence data 140e and the inverted index 340a, and sequentially obtains the bitmaps corresponding to the codons (Step S304).
The identifying unit 350c of the information processing device 300 performs shifting of the bitmaps and performs the AND operations, and identifies the mutation position (offset) having non-identicalness (Step S305). Moreover, the identifying unit 350c identifies the type of point mutation (Step S306).
Then, the identifying unit 350c generates the third-type sequence data 240e by correcting the second-type sequence data 140e based on the type of point mutation (Step S307). The identifying unit 350c compares the encoded codons in the third-type sequence data 240e and the inverted index 340a, and sequentially obtains the bitmaps corresponding to the codons (Step S308).
Subsequently, the identifying unit 350c performs shifting of the bitmaps and performs the AND operations, and identifies the mutation position (offset) having non-identicalness and identifies genetic mutation (Step S309). Then, the identifying unit 350c registers the information about the identified type of point mutation and the identified genetic mutation in the detection result table 240h (Step S310). Subsequently, the information processing device 300 outputs the detection result table 240h to the display unit 130 for display purposes (Step S311).
Given below is the explanation of an exemplary sequence of operations performed by the identifying unit 350c for identifying, based on bitmaps, the offset corresponding to point mutation.
The identifying unit 350c performs left-side shifting of the first bitmap (Step S403). Then, the identifying unit 350c increments the offset n by one (Step S404). Subsequently, the obtaining unit 350b obtains, from the inverted index 340a, a second bitmap corresponding to the codon at the offset n included in the second-type sequence data (Step S405).
Then, the identifying unit 350c performs the AND operation of the first bitmap and the second bitmap, and generates a third bitmap (Step S406). Moreover, the identifying unit 350c determines whether or not the bit of the offset n in the third bitmap is set to “1” (Step S407).
If the bit of the offset n in the third bitmap is not set to “1” (No at Step S408), then the identifying unit 350c determines that point mutation has occurred at the offset n included in the second-type sequence data (Step S409).
On the other hand, if the bit of the offset n in the third bitmap is set to “1” (Yes at Step S408), then the identifying unit 350c updates the first bitmap with a bitmap obtained by performing left-side shifting of the third bitmap (Step S410). Then, the system control returns to Step S404.
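The loop from Step S403 to Step S410 can be sketched as follows. This is a hedged illustration and not the implementation of the embodiment. It assumes that the two sequences are aligned from the offset “0”, that the initial steps (not reproduced above) set the offset n to the starting offset and obtain the first bitmap for the codon at that offset, and that a codon type absent from the inverted index is treated as an all-“0” bitmap. The names `build_inverted_index` and `find_point_mutation_offset`, the initial check at the starting offset, and the return value `None` at the end of the data are additions made for this sketch.

```python
from typing import Optional


def build_inverted_index(first_type: bytes) -> dict[int, int]:
    """One integer bitmap per encoded codon; bit i set means the codon is at offset i."""
    index: dict[int, int] = {}
    for offset, codon in enumerate(first_type):
        index[codon] = index.get(codon, 0) | (1 << offset)
    return index


def find_point_mutation_offset(index: dict[int, int], second_type: bytes,
                               start: int = 0) -> Optional[int]:
    """Return the first offset at which the codons are not identical, or None."""
    n = start
    first = index.get(second_type[n], 0)        # assumed initial steps
    if not (first >> n) & 1:                    # mismatch already at the start (assumption)
        return n
    first <<= 1                                 # Step S403: left-side shifting
    while n + 1 < len(second_type):
        n += 1                                  # Step S404
        second = index.get(second_type[n], 0)   # Step S405
        third = first & second                  # Step S406
        if not (third >> n) & 1:                # Steps S407 and S408
            return n                            # Step S409
        first = third << 1                      # Step S410, then back to Step S404
    return None                                 # end of data reached without a mismatch


reference = bytes([0x63, 0x40, 0x73, 0x4D])  # AUG, UUU, GUG, UGC
target = bytes([0x63, 0x40, 0x71, 0x4D])     # AUG, UUU, GUC, UGC
index = build_inverted_index(reference)

print(find_point_mutation_offset(index, target))  # 2
```

Each iteration uses one shift and one AND operation on whole bitmaps, so the comparison does not need to scan the first-type sequence data again once the inverted index has been generated.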
Given below is the explanation about the effects achieved in the information processing device 300 according to the third embodiment. The information processing device 300 according to the third embodiment sequentially obtains, from the inverted index 340a, the bitmaps corresponding to the types of codons starting from the start codon included in the second-type sequence data 140e, and identifies nonidentical codons based on the shifting of a plurality of obtained bitmaps and the AND operation thereof. As a result, it becomes possible to perform a high-speed search for the codons having point mutation or genetic mutation.
Meanwhile, for the purpose of illustration, the explanation is given about the case in which the information processing device 300 according to the third embodiment generates the third-type sequence data 240e, and compares it with the first-type sequence data 140d. However, that is not the only possible case. Alternatively, instead of generating the third-type sequence data 240e, the information processing device 300 can convert the second-type sequence data 140e into the units of bytes, and compare the conversion result with the first-type sequence data 140d in the units of bytes.
Given below is the explanation of the other operations performed in the information processing device 300 according to the third embodiment. When the input of a search query is an amino-acid sequence, the information processing device 300 encodes the reference codon sequence data 140a written using base symbols; and generates an inverted index in a corresponding manner to the codons. Moreover, the information processing device 300 converts the codon sequence into an amino-acid sequence; generates an inverted index associated to the amino acids; and identifies the mutation position using that inverted index.
The information processing device 300 performs the operation of identifying the mutation position using the inverted index 340b corresponding to the amino-acid sequence. For example, the information processing device 300 obtains, from the inverted index 340b, the bitmaps corresponding to the types of amino acids starting from the first amino acid included in the amino-acid sequence data; and, based on the positions of the flags of a plurality of obtained bitmaps, identifies the sequence positions, from among the amino acids included in the amino-acid sequence data, that are not identical with respect to the fourth-type sequence data 240j.
Given below is the explanation of an exemplary sequence of operations performed in the information processing device 300 according to the third embodiment when the input of a search query is an amino-acid sequence.
As illustrated in
The receiving unit 150a receives the amino-acid sequence data to be analyzed (Step S413). Then, the encoding unit 150b encodes the amino-acid sequence data to be analyzed, and generates the second-type sequence data 140e (Step S414).
Then, based on the codon-amino acid conversion table 240i, the generating unit 350a generates the fourth-type sequence data 240j from the first-type sequence data 140d, and at the same time generates the inverted index 340b corresponding to the amino acids (Step S415).
The identifying unit 350c of the information processing device 300 performs shifting of the bitmaps and performs the AND operations, and identifies the nonidentical mutation positions (offsets) (Step S416). Then, the identifying unit 350c registers the information about the identified mutation in the detection result table 240h (Step S417). The information processing device 300 outputs the detection result table 240h to the display unit 130 for display purposes (Step S418).
As explained above, when the input of a search query is an amino-acid sequence, the information processing device 300 generates the inverted index 340b corresponding to the amino acids, and compares the inverted index 340b with the second-type sequence data 140e. Thus, even when the input of a search query is an amino-acid sequence, the amino acids in which mutation has occurred can be identified using the inverted index.
Given below is the explanation of an exemplary hardware configuration of a computer that implements the functions identical to the functions of the information processing device 100 according to the first embodiment and the information processing device 200 according to the second embodiment.
As illustrated in
The hard disk device 407 includes a receiving program 407a, an encoding program 407b, a comparison program 407c, and an identification program 407d. The CPU 401 reads the receiving program 407a, the encoding program 407b, the comparison program 407c, and the identification program 407d and loads them in the RAM 406.
The receiving program 407a functions as a receiving process 406a. The encoding program 407b functions as an encoding process 406b. The comparison program 407c functions as a comparison process 406c. The identification program 407d functions as an identification process 406d.
The operations of the receiving process 406a correspond to the operations of the receiving unit 150a. The operations of the encoding process 406b correspond to the operations of the encoding unit 150b. The operations of the comparison process 406c correspond to the operations of the comparing unit 150c. The operations of the identification process 406d correspond to the operations of the identifying units 150d and 250d.
The programs 407a to 407d need not always be stored in the hard disk device 407 from the beginning. Alternatively, for example, the programs 407a to 407d can be stored in a “portable physical medium” such as a flexible disk (FD), a CD-ROM, a DVD, a magneto-optical disk, or an IC card that is insertable in the computer 400. Then, the computer 400 can read and execute the programs 407a to 407d.
Given below is the explanation of an exemplary hardware configuration of a computer that implements the functions identical to the functions of the information processing device 300 according to the third embodiment.
As illustrated in
The hard disk device 507 includes a receiving program 507a, an encoding program 507b, a generation program 507c, an obtaining program 507d, and an identification program 507e. The CPU 501 reads the receiving program 507a, the encoding program 507b, the generation program 507c, the obtaining program 507d, and the identification program 507e and loads them in the RAM 506.
The receiving program 507a functions as a receiving process 506a. The encoding program 507b functions as an encoding process 506b. The generation program 507c functions as a generation process 506c. The obtaining program 507d functions as an obtaining process 506d. The identification program 507e functions as an identification process 506e.
The operations of the receiving process 506a correspond to the operations of the receiving unit 150a. The operations of the encoding process 506b correspond to the operations of the encoding unit 150b. The operations of the generation process 506c correspond to the operations of the generating unit 350a. The operations of the obtaining process 506d correspond to the operations of the obtaining unit 350b. The operations of the identification process 506e correspond to the operations of the identifying unit 350c.
The programs 507a to 507e need not always be stored in the hard disk device 507 from the beginning. Alternatively, for example, the programs 507a to 507e can be stored in a “portable physical medium” such as a flexible disk (FD), a CD-ROM, a DVD, a magneto-optical disk, or an IC card that is insertable in the computer 500. Then, the computer 500 can read and execute the programs 507a to 507e.
It becomes possible to reduce the time required to determine the type of frameshift caused by the mutation and to detect the genetic mutation.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventors to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2018/033329, filed on Sep. 7, 2018, and designating the U.S., the entire contents of which are incorporated herein by reference.
Relationship | Number | Date | Country |
---|---|---|---|
Parent | PCT/JP2018/033329 | Sep 2018 | US |
Child | 17182397 | | US |