The present invention relates to a gene mutation assessment device, an assessment method, a program, and a storage medium.
Since gene mutations affect various traits, it is important to extract gene mutations and analyze what kind of trait is associated with each gene mutation. While examples of the trait generally include diseases and the responsiveness to a drug, in recent years, attention has been paid not only to these traits but also to traits associated with an environment including a lifestyle.
For the identification of the association between the gene mutation and the trait, an exhaustive analysis of gene mutation using a next-generation sequencer, a microarray, or the like is usually utilized (Patent Literature 1). However, since a large number of gene mutations are found as candidates by the analysis, it is necessary to determine which gene mutation is associated with which trait and to select a gene mutation which is relatively highly associated with the trait.
Patent Literature 1: JP 2018-191716 A
While a large number of gene mutations are found as candidates as described above, the association between gene mutations in a gene mutation group is not clear. For this reason, current analyses can only infer the association between each mutation at a single position and the trait. However, when the association with a trait is analyzed focusing on a mutation at only one locus, the association may go undetected (a false negative) because of the detection error of the mutation, the measurement error of the trait, and the like, even though the mutation actually affects the trait. As a result, such a mutation may be missed as a gene mutation candidate having an association with the trait.
It is therefore an object of the present invention to provide a new gene mutation assessment system which makes it possible, even when a gene mutation has been considered to be apparently not associated with a trait from mutation information at a single position, to pick the gene mutation as a gene mutation candidate showing an association with the trait, for example.
In order to achieve the aforementioned object, the present invention provides a gene mutation assessment device, including: a communication unit; an assessment target mutation information acquisition unit; a score assignment unit; a score determination unit; a region mutation information acquisition unit; a score re-assignment unit; and an assessment score determination unit, wherein the communication unit can communicate with a database in which information on a gene mutation for a trait is stored, the assessment target mutation information acquisition unit acquires mutation information of a common gene mutation in a sample group showing a common trait as mutation information of an assessment target mutation, the mutation information includes position information of a mutation and base information of a mutation, the score assignment unit assigns a first score showing an association with a trait in the database information to the assessment target mutation based on the database information, the score determination unit compares the first score of the assessment target mutation with an association threshold and determines the assessment target mutation as a re-scoring target when the first score is less than the association threshold, the region mutation information acquisition unit acquires, as region mutation information, a gene mutation in an associated region with respect to a re-scoring target assessment target mutation based on the database information, the score re-assignment unit assigns a second score weighted to the first score to the re-scoring target assessment target mutation based on the region mutation information, and the assessment score determination unit determines the second score as an assessment score of the re-scoring target assessment target mutation.
The present invention also provides a gene mutation assessment method, including: an assessment target mutation information acquiring step; a score assigning step; a score determining step; a region mutation information acquiring step; a score re-assigning step; and an assessment score determining step, wherein the method is capable of communicating with a database in which information on a gene mutation for a trait is stored, the assessment target mutation information acquiring step acquires mutation information of a common gene mutation in a sample group showing a common trait as mutation information of an assessment target mutation, the mutation information includes position information of a mutation and base information of a mutation, the score assigning step assigns a first score showing an association with a trait in the database information to the assessment target mutation based on the database information, the score determining step compares the first score of the assessment target mutation with an association threshold and determines the assessment target mutation as a re-scoring target when the first score is less than the association threshold, the region mutation information acquiring step acquires, as region mutation information, a gene mutation in an associated region with respect to the re-scoring target assessment target mutation based on the database information, the score re-assigning step assigns a second score weighted to the first score to the re-scoring target assessment target mutation based on the region mutation information, and the assessment score determining step determines the second score as an assessment score of the re-scoring target assessment target mutation.
The present invention also provides a program for causing a computer to execute the gene mutation assessment method according to the present invention.
The present invention also provides a computer readable storage medium with the program according to the present invention.
According to the present invention, for example, even when it cannot be apparently determined that a gene mutation at a single position is associated with a trait, by referring to information on an associated region of the gene mutation, it is possible to pick a gene mutation having a possibility of showing an association with the trait. Therefore, the association between the gene mutation and the trait can be assessed more efficiently.
Example embodiments of the present invention will be described. Note here that the present invention is not limited to the following example embodiments. In the drawings, identical parts are denoted by identical reference numerals. Each example embodiment can be described with reference to the descriptions of other example embodiments, unless otherwise specified, and the configurations of the example embodiments may be combined, unless otherwise specified.
The assessment device 10 may be a single assessment device including the respective units or may be an assessment device to which the respective units are connectable via a communication network, for example.
The assessment device 10 includes a communication unit 19 and is capable of communicating with a database 30 (301, 302, 303, 304). For example, as shown in
The type and the number of the databases 30 communicating with the assessment device 10 are not limited, for example. The database 30 may be a database in which information on gene mutations for traits is stored. As the database 30, for example, a public database may be used, and examples thereof include PolyPhen, ExAC, Clinvar, Japanese genomic data (iJGVD), SIFT, and CADD. Further, in the present invention, the database is not limited to a database existing at the time of filing of the present application, for example, and a new database after filing can be used.
In the information of the database 30, the type of the trait is not particularly limited, and examples thereof include various traits such as diseases, responsiveness to drugs, traits associated with lifestyle, traits of physical characteristics, and traits such as exercise abilities or academic abilities. As the disease, for example, the classification of the International Disease Classification Table can be used. When the trait is a disease, for example, the gene mutation for the trait is a gene mutation that is significantly different between the group of patients with the disease and the group of normal individuals. When the trait is a specific disease, for example, the gene mutation for the trait is a gene mutation that is significantly different between the group of patients with the specific disease and the group of individuals who do not have the specific disease (e.g., the group of normal individuals for the specific disease or the group of healthy individuals).
The assessment target mutation information acquisition unit 11 acquires mutation information of a common gene mutation in a sample group showing a common trait as mutation information of an assessment target mutation. The method for acquiring the mutation information is not particularly limited. The assessment target mutation information acquisition unit 11 may acquire the mutation information by input of a user using the input device to be described below, or may acquire the mutation information by reception from a database or the like via the communication network, for example.
The mutation information includes position information of the mutation and base information of the mutation. The position information is, for example, information on the position of the assessment target mutation in the gene, and the base information is, for example, information on the type of the base at the position in the gene. The format of the mutation information is not particularly limited, and may be, for example, a file format such as text data or a VCF file.
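As an illustration of mutation information carrying position information and base information, the following is a minimal sketch that reads the fixed leading columns of VCF-style text lines. The names `Mutation`, `parse_vcf_line`, and `read_mutations` are hypothetical and do not appear in the specification. The sketch assumes tab-separated lines with a single alternate allele and ignores all columns after ALT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    """Hypothetical container for mutation information."""
    chrom: str  # position information: chromosome or contig name
    pos: int    # position information: 1-based coordinate
    ref: str    # base information: reference base(s)
    alt: str    # base information: observed (mutated) base(s)


def parse_vcf_line(line: str) -> Mutation:
    """Parse one VCF data line using the columns CHROM, POS, ID, REF, ALT."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    return Mutation(chrom, int(pos), ref, alt)


def read_mutations(lines):
    """Skip blank lines and '#' header lines, then parse the rest."""
    return [parse_vcf_line(l) for l in lines
            if l.strip() and not l.startswith("#")]
```

Plain text data with the same four fields could be read in the same way by changing only the line parser.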
The sample group is a sample group showing a common trait. The type of the trait is not limited in any way as described above, and any trait can be set. Examples of the type of the trait include various traits such as diseases, responsiveness to drugs, traits associated with lifestyle, traits of physical characteristics, and traits such as exercise abilities or academic abilities. When the common trait of the sample group is a disease, the assessment target mutation is, for example, a gene mutation that is significantly different between the group of patients with the disease and the group of normal individuals. The common gene mutation may be acquired from, for example, information such as databases, papers, and the like, or may be extracted and acquired from mutation information on the sample group X+ showing the trait X and the mutation information on the sample group X− not showing the trait X. The type of the sample group is not particularly limited, and examples of the sample group include sample groups classified by various factors such as presence or absence of disease, severity of disease, cohort, race, sex, age, and the like.
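The extraction of common gene mutations from the sample group X+ and the sample group X− can be sketched as follows. This is only an illustration under stated assumptions: each sample is represented as a set of mutation identifiers, and a simple difference in carrier frequency (the hypothetical parameter `min_diff`) stands in for the significance test, which the specification does not define. The function names are also hypothetical.

```python
def carrier_frequency(samples, mutation):
    """Fraction of samples (sets of mutation IDs) that carry the mutation."""
    return sum(mutation in s for s in samples) / len(samples)


def common_mutations(group_pos, group_neg, min_diff=0.5):
    """Return mutations much more frequent in X+ than in X-.

    ASSUMPTION: a frequency difference of at least `min_diff` replaces a
    proper statistical test. Both groups must be non-empty.
    """
    candidates = set().union(*group_pos)
    return sorted(
        m for m in candidates
        if carrier_frequency(group_pos, m) - carrier_frequency(group_neg, m) >= min_diff
    )
```

In practice, a statistical test on the carrier counts would likely replace the fixed frequency difference.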
The number of common gene mutations in the sample group is not particularly limited, and may be, for example, one or two or more. For example, the assessment target mutation information acquisition unit 11 may acquire mutation information on a plurality of common gene mutations in the sample group.
The score assignment unit 12 assigns a first score showing the association with the trait in the database information to the assessment target mutation based on the database information. The score showing the association with the trait is preferably, for example, a relative value by which the magnitude of the association can be compared. As to the relative value, in the case where a score of 0 (zero) is set when no association is shown, and a score of 1 is set when the highest association is shown, a score closer to 0 can be given as the association is smaller, and a score closer to 1 can be given as the association is larger.
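One possible way to obtain such a relative value is min-max scaling of the raw values taken from a database. The sketch below is an assumption, not the method of the specification: the function name `to_relative_scores` and the flag `higher_is_stronger` are hypothetical, and scaling by the observed minimum and maximum treats the weakest observed value as "no association".

```python
def to_relative_scores(raw, higher_is_stronger=True):
    """Map raw database values onto 0 (lowest association) .. 1 (highest).

    raw: dict of mutation ID -> raw value from one database.
    higher_is_stronger: set to False for databases in which a smaller raw
    value indicates a stronger association.
    """
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # All values are equal, so no relative ordering can be derived.
        return {k: 0.0 for k in raw}
    scaled = {k: (v - lo) / (hi - lo) for k, v in raw.items()}
    if not higher_is_stronger:
        scaled = {k: 1.0 - v for k, v in scaled.items()}
    return scaled
```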
When the assessment device 10 can communicate with a plurality of databases by the communication unit 19, for example, the score assignment unit 12 may calculate the score of the assessment target mutation for each of the plurality of databases based on the database information, integrate the scores of the respective databases, and set the integrated score as the first score of the assessment target mutation. The method for calculating the integrated score is not particularly limited, and the integrated score can be calculated by a weighted linear sum using the scores of the respective databases, for example. The databases generally use different scales of values. For this reason, for example, by performing scoring based on the relative values and then integrating the scores as described above, it is possible to avoid the influence of the differences in scale among the respective databases.
The score for each database may be weighted based on the accuracy of the database, for example. The accuracy of the database can be set as appropriate, for example.
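The integration of per-database scores with accuracy-based weights can be sketched as a weighted linear sum. In this sketch, the function name `integrate_scores` and the accuracy values are hypothetical. Dividing by the total weight is an added assumption so that the integrated score stays in the same 0 to 1 range as the per-database relative scores.

```python
def integrate_scores(scores_by_db, accuracy_by_db):
    """Weighted linear sum of relative scores for one mutation.

    scores_by_db:   dict of database name -> relative score (0..1).
    accuracy_by_db: dict of database name -> non-negative weight
                    (ASSUMED values chosen by the user).
    """
    total = sum(accuracy_by_db[db] for db in scores_by_db)
    if total == 0:
        raise ValueError("at least one database needs a positive weight")
    weighted = sum(accuracy_by_db[db] * s for db, s in scores_by_db.items())
    return weighted / total
```

For example, with scores of 0.9 in one database and 0.5 in another, and weights of 3 and 1, the integrated score leans toward the more heavily weighted database.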
The score determination unit 13 compares the first score of the assessment target mutation with the association threshold and determines the assessment target mutation as a re-scoring target when the first score is less than the association threshold. The threshold is not specifically limited and can be set as appropriate. For example, the score determination unit 13 may compare the first score of the assessment target mutation with the association threshold, and the assessment target mutation may be determined as a mutation associated with the trait in the database information if the first score satisfies the association threshold.
The region mutation information acquisition unit 14 acquires, as region mutation information, a gene mutation in an associated region with respect to the re-scoring target assessment target mutation based on the database information. The associated region is not particularly limited and can be set as appropriate. The information of the associated region with respect to the assessment target mutation may be stored in advance in the storage unit 17, for example.
The length of the associated region is not particularly limited and can be set as appropriate. As a specific example, the associated region spans ±10,000 bases, ±100,000 bases, or the like. The associated region may be, for example, a contiguous sequence including the position of the assessment target mutation. The associated region may be, for example, a position of linkage with respect to the position of the assessment target mutation, a combination of positions of a plurality of linkages, or a region including the position of the linkage. The associated region may be, for example, a coding region, a structural domain, or the like associated with a gene having the assessment target mutation.
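For the case where the associated region is a contiguous sequence around the assessment target mutation, the selection of region mutations can be sketched as a simple window filter. The function name `region_mutations` and the parameter `flank` are hypothetical. The sketch assumes that all positions lie on the same chromosome and that the window is symmetric around the target position.

```python
def region_mutations(target_pos, known_positions, flank=10_000):
    """Return known mutation positions within +/- `flank` bases of the target.

    The target position itself is excluded, since it is the mutation being
    re-scored. Coordinates are assumed to be 1-based.
    """
    start = max(1, target_pos - flank)
    end = target_pos + flank
    return [p for p in known_positions
            if start <= p <= end and p != target_pos]
```

A linkage-based associated region would instead use a lookup table of linked positions for each mutation in place of the window bounds.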
The score re-assignment unit 15 assigns a second score weighted to the first score to the re-scoring target assessment target mutation based on the region mutation information.
For example, the assessment score determination unit 16 determines the first score as the assessment score of the assessment target mutation when the first score of the assessment target mutation satisfies the threshold, and determines the second score as the assessment score of the re-scoring target assessment target mutation when the first score of the assessment target mutation does not satisfy the threshold.
In the assessment device 10, for example, the score determination unit 13 may also serve as an associated gene mutation determination unit. The associated gene mutation determination unit may compare the assessment score with the association threshold, and determine that the assessment target mutation whose assessment score satisfies the association threshold is a mutation associated with the trait in the database information.
When the assessment device 10 includes the storage unit 17, the storage unit 17 may store, for example, information from the database 30, information used for processing in each unit of the assessment device 10, and information obtained by processing in each unit of the assessment device 10. In the assessment device 10, the storage unit 17 may be the database 30.
When the assessment device 10 includes the output unit 18, the output unit 18 may output information obtained by processing in each unit of the assessment device 10, for example. The output destination by the output unit 18 may be a display when the assessment device 10 includes a display or the output destination by the output unit 18 may be external equipment to be described below, for example. In the latter case, the assessment device 10 and the external equipment are connectable via a communication network, for example.
(2) Hardware Configuration
The CPU 101 is responsible for the entire control of the assessment device 10. In the assessment device 10, the CPU 101 executes the program of the present invention or other programs, and reads and writes various kinds of information, for example. Specifically, for example, the CPU 101 of the assessment device 10 functions as the assessment target mutation information acquisition unit 11, the score assignment unit 12, the score determination unit 13, the region mutation information acquisition unit 14, the score re-assignment unit 15, and the assessment score determination unit 16.
The bus 103 connects the respective functional units of the CPU 101, the memory 102, and the like, for example. The bus 103 can also be connected to external equipment, for example. The external equipment may be, for example, the database 30, a display terminal, or the like. The assessment device 10 can be connected to the communication network 20 by the communication device 110 connected to the bus 103, and can also be connected to the external equipment via the communication network 20. The communication device 110 is, for example, the communication unit 19.
The memory 102 includes, for example, a main memory, which is also referred to as a main storage device. When the CPU 101 performs processing, the memory 102 reads various operation programs 108 such as the program of the present invention stored in the auxiliary storage device to be described below, and the CPU 101 receives data from the memory 102 and executes the program 108. The main memory is, for example, a RAM (random access memory). The memory 102 further includes, for example, a ROM (read-only memory).
The storage device 107 is also referred to as a so-called auxiliary storage in comparison with the main memory (main storage device), for example. The storage device 107 includes a storage medium and a drive for reading from and writing to the storage medium, for example. The storage medium is not particularly limited, and may be, for example, a built-in type or an external type, and examples thereof include HDs (hard disks), FDs (Floppy® disks), CD-ROMs, CD-Rs, CD-RWs, MOs, DVDs, flash memories, and memory cards. The drive is not particularly limited. The storage device 107 may be, for example, a hard disk drive (HDD) in which a storage medium and a drive are integrated. For example, as described above, the operation program 108 is stored in the storage device 107. Further, the storage device 107 may be the storage unit of the assessment device 10 and may store information input to the assessment device 10, information generated by the assessment device 10, or the like, for example.
The assessment device 10 further includes an input device 104, a display 105, and the like, for example. Examples of the input device 104 include a touch panel, a keyboard, and a mouse. Examples of the display 105 include an LED display and a liquid crystal display, and the display 105 serves as the output unit 18, for example.
(3) Gene Mutation Assessment Method
The assessment method of the present example embodiment can be performed using the assessment device 10 shown in
The assessment method of the present example embodiment will be described with reference to
First, as an assessment target mutation information acquiring step, mutation information on a common gene mutation in a sample group showing a common trait is acquired as mutation information of the assessment target mutation (S100). This step can be performed by the assessment target mutation information acquisition unit 11 of the assessment device 10, for example.
The number (n) of common gene mutations in the sample group is not particularly limited, and may be one or two or more. In the present example embodiment, as a specific example, the following four types of gene mutations (mutations M1, M2, M3, and M4) are exemplified as common gene mutations in the sample group.
Next, as the score assigning step, the first score showing the association with the trait in the database information is assigned to the assessment target mutation based on the database information (S101). This step can be performed, for example, by the score assignment unit 12 of the assessment device 10.
In a specific example, Database 1 (DB1) in which gene mutation information for a trait A is stored is referred to, for example. The DB1 is considered to also contain information on the association between the trait A and each of the mutations M1 to M4. Then, when the first score showing the association between each of the mutations M1 to M4 and the trait A is assigned based on the information of the DB1, for example, the first scores, 0.9, 0.1, 0.3, and 0.1, can be assigned to the mutations M1 to M4, respectively, as shown in the Table 1. From these first scores, it can be seen that the association with the trait A is highest for the mutation M1, followed by the mutation M3, and lowest for the mutations M2 and M4, which have the same score.
Then, as the score determining step, the first score of the assessment target mutation is compared with the association threshold, and it is determined whether or not the first score satisfies the threshold (S102). When the first score is less than the association threshold (NO), the assessment target mutation is determined as a re-scoring target (S103). These steps can be performed by, for example, the score determination unit 13 of the assessment device 10.
The threshold can be set as appropriate as described above. In the case where the score is set to be larger as the association is higher and smaller as the association is lower, for example, when the first score is less than (or equal to or less than) the threshold, the assessment target mutation can be determined as a re-scoring target. On the other hand, in the case where the score is set to be smaller as the association is higher and larger as the association is lower, for example, when the first score exceeds the threshold (or is equal to or larger than the threshold), the assessment target mutation can be determined as a re-scoring target.
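The two score directions and the optional inclusive comparison described above can be sketched in a single predicate. The function name `is_rescoring_target` and its parameters are hypothetical.

```python
def is_rescoring_target(first_score, threshold,
                        higher_is_stronger=True, inclusive=False):
    """Decide whether a mutation should be re-scored (S102 -> S103).

    higher_is_stronger=True : larger score means higher association, so
                              scores below the threshold are re-scored.
    higher_is_stronger=False: smaller score means higher association, so
                              scores above the threshold are re-scored.
    inclusive=True also treats a score equal to the threshold as a target.
    """
    if higher_is_stronger:
        return first_score <= threshold if inclusive else first_score < threshold
    return first_score >= threshold if inclusive else first_score > threshold
```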
In the usual method, as to the assessment target mutation, when the first score showing the association with the trait is less than the threshold, which is a criterion, the assessment target mutation is excluded as being unassociated with the trait. However, some of such assessment target mutations may actually be associated with the trait. In contrast, the present invention makes it possible to pick the assessment target mutation having the possibility of being actually associated with the trait by assigning a further score to the assessment target mutation having the first score of less than the threshold, as described below.
In the specific example, for example, when the threshold is 0.5, the first scores of the mutations M2, M3, and M4 are less than the threshold as shown in the Table 1, and therefore, these assessment target mutations are determined as re-scoring targets.
Next, as the region mutation information acquiring step, a gene mutation in an associated region with respect to the re-scoring target assessment target mutation is acquired as region mutation information based on the database information (S104). This step can be performed, for example, by the region mutation information acquisition unit 14 of the assessment device 10. Then, as the score re-assigning step, a second score weighted to the first score is assigned to the re-scoring target assessment target mutation based on the region mutation information (S105). This step can be performed, for example, by the score re-assignment unit 15 of the assessment device 10.
These steps are based on the findings obtained by the inventors of the present invention. Hence, the findings obtained by the inventors of the present invention will be described with reference to the simulation graphs of
Next,
As shown in
The associated region can be set as appropriate. The setting condition of the associated region may be stored in advance in the storage unit 17, for example. In this case, in the case where the associated region is a contiguous sequence including the assessment target mutation as described above, for example, the position of the assessment target mutation in the contiguous sequence, the length of the contiguous sequence, and the like can be set as the setting condition. When the associated region is a position of a linkage with respect to the position of the assessment target mutation as described above, for example, the position of the linkage with respect to the position for each mutation can be set as the setting condition. The region mutation information in the associated region can be obtained from the database information.
In a specific example, associated regions are set for the re-scoring target mutations M2, M3, and M4, respectively, and the gene mutation in each associated region is acquired as region mutation information. The gene mutation in the associated region may be, for example, a gene mutation for the trait A or a gene mutation for other traits. That is, for example, the relative values of the gene mutations of the sample group with respect to the trait A (breast cancer) may be plotted with white circles in
Then, the assessment score determining step determines the second score as an assessment score of the re-scoring target assessment target mutation (S106). This step can be performed by the assessment score determination unit 16 of the assessment device 10, for example.
When it is determined in the step (S102) that the first score satisfies the association threshold (Yes), the first score is determined as an assessment score of the assessment target mutation (S107). This step can be performed by the assessment score determination unit 16 of the assessment device 10, for example.
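The flow from the step (S102) to the step (S107) can be sketched as follows, using the first scores 0.9, 0.1, 0.3, and 0.1 and the threshold 0.5 from the specific example. The weighting rule in `second_score` is purely hypothetical: the text states only that the second score is weighted to the first score based on the region mutation information, so the rule here (multiply by a factor that grows with the number of strongly associated region mutations) is an assumption made for illustration. The region scores are likewise illustrative values, not data from the specification.

```python
def second_score(first, region_scores, threshold, boost=0.5):
    """HYPOTHETICAL weighting (S105): raise the first score according to how
    many mutations in the associated region meet the threshold."""
    strong = sum(1 for s in region_scores if s >= threshold)
    return min(1.0, first * (1.0 + boost * strong))


def assess(first_scores, region_scores_by_mutation, threshold=0.5):
    """Return the assessment score of every assessment target mutation."""
    result = {}
    for name, first in first_scores.items():
        if first >= threshold:
            # S102 "Yes" -> S107: keep the first score.
            result[name] = first
        else:
            # S102 "No" -> S103 to S106: re-score from region information.
            region = region_scores_by_mutation.get(name, [])
            result[name] = second_score(first, region, threshold)
    return result


# First scores from the specific example; region scores are made up.
first_scores = {"M1": 0.9, "M2": 0.1, "M3": 0.3, "M4": 0.1}
region = {"M2": [0.8, 0.7, 0.9], "M3": [0.9, 0.6], "M4": []}
scores = assess(first_scores, region)
associated = [m for m, s in scores.items() if s >= 0.5]
```

With these illustrative inputs, the mutation M3 is picked as a candidate after re-scoring although its first score was below the threshold, while the mutations M2 and M4 remain below it.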
While the relative values of mutations that could not be detected in the sequence of the sample group with respect to the trait were plotted (black circles) to generate the density curve (W) in
(Variation 1)
When the assessment device 10 is capable of communicating with a plurality of databases by the communication unit 19 as shown in
The integrated score is not particularly limited, and can be calculated by a weighted linear sum using the scores of the respective databases, for example. As for the weighted linear sum, statistical means such as, for example, a generalized linear model, a neural network, or the like can be utilized. The score assignment unit 12 may weight the score for each database based on the accuracy of the database.
As a specific example, as shown in Table 2 below, there are four types of gene mutations (mutations M1, M2, M3, and M4) as common gene mutations in the sample group, and four types of databases (DB1, DB2, DB3, DB4) are used.
For each of the assessment target mutations (M1, M2, M3, and M4), the score can be calculated based on each database information, and the integrated score can be obtained by the following model equation using the scores of the four types of databases. For the calculation of the integrated score, for example, machine learning such as unsupervised learning or supervised learning can be utilized. The unsupervised learning may be, for example, principal component analysis, and the supervised learning may be, for example, a support vector machine, a Naive Bayes classifier, or the like.
i: i-th gene mutation
j: j-th database
n: Number of databases
β0: Constant term representing intercept
Si,j: Score for gene mutation i in database j
βi,j: Weight of score for gene mutation i in database j
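The model equation itself does not survive in this text. From the symbol definitions above and the description of the integrated score as a weighted linear sum, an equation consistent with them would take the following form. The left-hand symbol, written here as $\hat{S}_i$ for the integrated score of gene mutation i, is an assumed notation and is not taken from the specification.

```latex
\hat{S}_i = \beta_0 + \sum_{j=1}^{n} \beta_{i,j}\, S_{i,j}
```

With the four databases of the specific example, n = 4 and the sum runs over DB1 to DB4.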
The assessment device of the present example embodiment can further output the assessment score, for example. The output of the assessment score may include, for example, visualization data based on the assessment score.
As shown in
As can be seen from the graph of
In the present example embodiment, for the profile of the assessment target mutation and the disease, for example, hierarchical clustering, k-means method, and the like can also be used.
The format of the visualization data is not particularly limited, and may be the format of a numerical matrix as described above, or may be a bar graph, a plot graph, or the like.
The program of the present example embodiment is a program capable of causing a computer to execute the assessment method of the present invention. The program of the present example embodiment may be recorded on, for example, a computer readable storage medium. The storage medium is not particularly limited, and may be, for example, a storage medium as described above, or the like.
While the present invention has been described above with reference to illustrative example embodiments, the present invention is by no means limited thereto. Various changes and variations that may become apparent to those skilled in the art may be made in the configuration and specifics of the present invention without departing from the scope of the present invention.
This application claims priority from Japanese Patent Application No. 2018-051268 filed on Mar. 19, 2018. The entire subject matter of the Japanese Patent Application is incorporated herein by reference.
(Supplementary Notes)
Some or all of the above example embodiments and examples may be described as in the following Supplementary Notes, but are not limited thereto.
A gene mutation assessment device, including:
a communication unit;
an assessment target mutation information acquisition unit;
a score assignment unit;
a score determination unit;
a region mutation information acquisition unit;
a score re-assignment unit; and
an assessment score determination unit, wherein
the communication unit can communicate with a database in which information on a gene mutation for a trait is stored,
the assessment target mutation information acquisition unit acquires mutation information of a common gene mutation in a sample group showing a common trait as mutation information of an assessment target mutation,
the mutation information includes position information of a mutation and base information of a mutation,
the score assignment unit assigns a first score showing an association with a trait in the database information to the assessment target mutation based on the database information,
the score determination unit compares the first score of the assessment target mutation with an association threshold and determines the assessment target mutation as a re-scoring target when the first score is less than the association threshold,
the region mutation information acquisition unit acquires, as region mutation information, a gene mutation in an associated region with respect to a re-scoring target assessment target mutation based on the database information,
the score re-assignment unit assigns a second score weighted to the first score to the re-scoring target assessment target mutation based on the region mutation information, and
the assessment score determination unit determines the second score as an assessment score of the re-scoring target assessment target mutation.
The assessment device according to Supplementary Note 1, wherein
the assessment score determination unit determines the first score as an assessment score of the assessment target mutation when the first score of the assessment target mutation satisfies the threshold, and determines the second score as the assessment score of the re-scoring target assessment target mutation when the first score of the assessment target mutation does not satisfy the threshold.
The assessment device according to Supplementary Note 1 or 2, wherein
in the assessment target mutation information acquisition unit, the common trait of the sample group is a disease, and the assessment target mutation is a gene mutation that is significantly different between a group of patients with the disease and a group of normal individuals.
The assessment device according to any one of Supplementary Notes 1 to 3, wherein
the assessment target mutation information acquisition unit acquires mutation information on a plurality of common gene mutations in the sample group.
The assessment device according to any one of Supplementary Notes 1 to 4, wherein
the trait in the database information is a disease, and the gene mutation for the trait is a gene mutation that is significantly different between a group of patients with the disease and a group of normal individuals.
The assessment device according to any one of Supplementary Notes 1 to 5, wherein
the trait in the database information is a specific disease, and the gene mutation for the trait is a gene mutation that is significantly different between a group of patients with the specific disease and a group of normal individuals.
The assessment device according to any one of Supplementary Notes 1 to 6, wherein
in the region mutation information acquisition unit, the associated region is a contiguous sequence including a position of the assessment target mutation.
The assessment device according to any one of Supplementary Notes 1 to 6, wherein
in the region mutation information acquisition unit, the associated region includes a position of a linkage with respect to a position of the assessment target mutation.
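The two forms of associated region in Supplementary Notes 7 and 8 can be illustrated together. In this sketch a mutation position is a `(chromosome, coordinate)` tuple; the window size `half_width` and the lookup table `linked` (positions in linkage with a given position) are hypothetical inputs, since the notes do not say how the region length or the linkage is obtained.

```python
from typing import Dict, List, Optional, Set, Tuple

Position = Tuple[str, int]  # (chromosome, coordinate)


def in_contiguous_region(target: Position, candidate: Position,
                         half_width: int) -> bool:
    """True if candidate lies in a contiguous window centred on the target."""
    same_chromosome = candidate[0] == target[0]
    return same_chromosome and abs(candidate[1] - target[1]) <= half_width


def region_mutations(target: Position,
                     database_positions: List[Position],
                     half_width: int = 1000,
                     linked: Optional[Dict[Position, Set[Position]]] = None
                     ) -> List[Position]:
    """Collect database mutations in the associated region of the target."""
    partners = (linked or {}).get(target, set())
    return [
        candidate for candidate in database_positions
        if candidate != target
        and (in_contiguous_region(target, candidate, half_width)
             or candidate in partners)
    ]


if __name__ == "__main__":
    target = ("chr1", 5000)
    database = [("chr1", 5200), ("chr1", 9000), ("chr2", 5100), ("chr7", 42)]
    linkage = {target: {("chr7", 42)}}  # hypothetical linkage table
    print(region_mutations(target, database, linked=linkage))
```

The contiguous window captures nearby positions on the same chromosome, while the linkage table allows distant positions to be treated as part of the associated region.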
The assessment device according to any one of Supplementary Notes 1 to 8, wherein
the communication unit can communicate with a plurality of databases, and the score assignment unit calculates a score of the assessment target mutation for each of the plurality of databases based on the database information, integrates the scores of the respective databases, and sets the integrated score as the first score of the assessment target mutation.
The assessment device according to Supplementary Note 9, wherein
the score assignment unit calculates the integrated score by a weighted linear sum using the scores of the respective databases.
The assessment device according to Supplementary Note 9 or 10, wherein
the score assignment unit weights the score for each database based on an accuracy of the database.
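The integration described in Supplementary Notes 9 to 11 can be sketched as a weighted linear sum. The function name `integrate_scores`, the database names, and the choice to normalize the accuracy values so that the weights sum to 1 are assumptions made for illustration; the notes state only that the weights are based on the accuracy of each database.

```python
from typing import Dict


def integrate_scores(scores: Dict[str, float],
                     accuracies: Dict[str, float]) -> float:
    """Combine per-database scores into one first score.

    scores:     score of the assessment target mutation in each database
    accuracies: accuracy value of each database (larger means more reliable)
    """
    total_accuracy = sum(accuracies[name] for name in scores)
    if total_accuracy <= 0:
        raise ValueError("accuracy values must sum to a positive number")

    # Weighted linear sum: each database contributes in proportion to its
    # accuracy. ASSUMPTION: weights are accuracies normalized to sum to 1.
    return sum((accuracies[name] / total_accuracy) * scores[name]
               for name in scores)


if __name__ == "__main__":
    per_database = {"db_a": 0.8, "db_b": 0.2}   # hypothetical scores
    accuracy = {"db_a": 3.0, "db_b": 1.0}       # hypothetical accuracies
    print(integrate_scores(per_database, accuracy))
```

Normalizing keeps the integrated score on the same scale as the per-database scores, so the same association threshold can be applied regardless of how many databases are consulted.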
The assessment device according to any one of Supplementary Notes 1 to 11, wherein
the score assignment unit assigns a relatively large score when the association with the trait is relatively high, and assigns a relatively small score when the association with the trait is relatively low.
The assessment device according to any one of Supplementary Notes 1 to 12, wherein
the score determination unit compares the assessment score with the association threshold, and determines an assessment target mutation whose assessment score satisfies the association threshold as a mutation associated with the trait in the database information.
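The final determination in Supplementary Note 13 can be sketched as a filter over the assessment scores. The function name `associated_mutations` and the mutation labels are hypothetical, and "satisfies the association threshold" is again read as "greater than or equal to", which is an assumption.

```python
from typing import Dict, List


def associated_mutations(assessment_scores: Dict[str, float],
                         association_threshold: float) -> List[str]:
    """Return the assessment target mutations judged associated with the trait.

    assessment_scores maps each assessment target mutation to its assessment
    score (the first score, or the second score for re-scoring targets).
    """
    return [mutation
            for mutation, score in assessment_scores.items()
            if score >= association_threshold]


if __name__ == "__main__":
    scores = {"mutation_1": 0.9, "mutation_2": 0.5, "mutation_3": 0.1}
    print(associated_mutations(scores, association_threshold=0.5))
```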
The assessment device according to any one of Supplementary Notes 1 to 13, further including:
a storage unit, wherein
the storage unit stores the assessment score in association with each assessment target mutation.
The assessment device according to any one of Supplementary Notes 1 to 14, further including:
an output unit, wherein
the output unit outputs an assessment score showing an association with the trait in association with each assessment target mutation.
The assessment device according to any one of Supplementary Notes 1 to 15, further including:
a storage unit, wherein
the storage unit stores the assessment score of the assessment target mutation in association with each trait in the database information.
The assessment device according to any one of Supplementary Notes 1 to 16, further including:
an output unit, wherein
the output unit outputs the assessment score of the assessment target mutation in association with each trait in the database information.
The assessment device according to Supplementary Note 15 or 17, wherein
the output unit outputs the assessment score as visualization data.
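One possible form of the visualization data in Supplementary Notes 16 to 18 is a mutation-by-trait matrix that a heat map could be drawn from. This is only a sketch: the notes do not specify the format of the visualization data, and the function name `to_matrix` and the labels used are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def to_matrix(scores: Dict[Tuple[str, str], float]
              ) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Arrange stored assessment scores as rows (mutations) by columns (traits).

    scores maps (mutation, trait) to the assessment score stored for that pair.
    Pairs with no stored score are left as None.
    """
    mutations = sorted({mutation for mutation, _ in scores})
    traits = sorted({trait for _, trait in scores})
    matrix = [[scores.get((mutation, trait)) for trait in traits]
              for mutation in mutations]
    return mutations, traits, matrix


if __name__ == "__main__":
    stored = {("mutation_1", "trait_a"): 0.7,
              ("mutation_2", "trait_b"): 0.4}
    rows, columns, values = to_matrix(stored)
    print(rows, columns, values)
```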
A gene mutation assessment method, including:
an assessment target mutation information acquiring step;
a score assigning step;
a score determining step;
a region mutation information acquiring step;
a score re-assigning step; and
an assessment score determining step, wherein
the method is capable of communicating with a database in which information on a gene mutation for a trait is stored,
the assessment target mutation information acquiring step acquires mutation information of a common gene mutation in a sample group showing a common trait as mutation information of an assessment target mutation,
the mutation information includes position information of a mutation and base information of a mutation,
the score assigning step assigns a first score showing an association with a trait in the database information to the assessment target mutation based on the database information,
the score determining step compares the first score of the assessment target mutation with an association threshold and determines the assessment target mutation as a re-scoring target when the first score is less than the association threshold,
the region mutation information acquiring step acquires, as region mutation information, a gene mutation in an associated region with respect to the re-scoring target assessment target mutation based on the database information,
the score re-assigning step assigns, to the re-scoring target assessment target mutation, a second score obtained by weighting the first score based on the region mutation information, and
the assessment score determining step determines the second score as an assessment score of the re-scoring target assessment target mutation.
The assessment method according to Supplementary Note 19, wherein
the assessment score determining step determines the first score as an assessment score of the assessment target mutation when the first score of the assessment target mutation satisfies the association threshold, and determines the second score as the assessment score of the re-scoring target assessment target mutation when the first score of the assessment target mutation does not satisfy the association threshold.
The assessment method according to Supplementary Note 19 or 20, wherein
in the assessment target mutation information acquiring step, the common trait of the sample group is a disease, and the assessment target mutation is a gene mutation that is significantly different between a group of patients with the disease and a group of normal individuals.
The assessment method according to any one of Supplementary Notes 19 to 21, wherein
the assessment target mutation information acquiring step acquires mutation information on a plurality of common gene mutations in the sample group.
The assessment method according to any one of Supplementary Notes 19 to 22, wherein
the trait in the database information is a disease, and the gene mutation for the trait is a gene mutation that is significantly different between a group of patients with the disease and a group of normal individuals.
The assessment method according to any one of Supplementary Notes 19 to 23, wherein
the trait in the database information is a specific disease, and the gene mutation for the trait is a gene mutation that is significantly different between a group of patients with the specific disease and a group of normal individuals.
The assessment method according to any one of Supplementary Notes 19 to 24, wherein
in the region mutation information acquiring step, the associated region is a contiguous sequence including a position of the assessment target mutation.
The assessment method according to any one of Supplementary Notes 19 to 25, wherein
in the region mutation information acquiring step, the associated region includes a position of a linkage with respect to a position of the assessment target mutation.
The assessment method according to any one of Supplementary Notes 19 to 26, wherein
the method is capable of communicating with a plurality of databases, and
the score assigning step calculates a score of the assessment target mutation for each of the plurality of databases based on the database information, integrates the scores of the respective databases, and sets the integrated score as the first score of the assessment target mutation.
The assessment method according to Supplementary Note 27, wherein
the score assigning step calculates the integrated score by a weighted linear sum using the scores of the respective databases.
The assessment method according to Supplementary Note 27 or 28, wherein
the score assigning step weights the score for each database based on an accuracy of the database.
The assessment method according to any one of Supplementary Notes 19 to 29, wherein
the score assigning step assigns a relatively large score when the association with the trait is relatively high, and assigns a relatively small score when the association with the trait is relatively low.
The assessment method according to any one of Supplementary Notes 19 to 30, wherein
the score determining step compares the assessment score with the association threshold, and determines an assessment target mutation whose assessment score satisfies the association threshold as a mutation associated with the trait in the database information.
The assessment method according to any one of Supplementary Notes 19 to 31, further including:
a storing step, wherein
the storing step stores the assessment score in association with each assessment target mutation.
The assessment method according to any one of Supplementary Notes 19 to 32, further including:
an outputting step, wherein
the outputting step outputs an assessment score showing an association with the trait in association with each assessment target mutation.
The assessment method according to any one of Supplementary Notes 19 to 33, further including:
a storing step, wherein
the storing step stores the assessment score of the assessment target mutation in association with each trait in the database information.
The assessment method according to any one of Supplementary Notes 19 to 34, further including:
an outputting step, wherein
the outputting step outputs the assessment score of the assessment target mutation in association with each trait in the database information.
The assessment method according to Supplementary Note 33 or 35, wherein
the outputting step outputs the assessment score as visualization data.
A program for causing a computer to execute the assessment method according to any one of Supplementary Notes 19 to 36.
A computer-readable storage medium storing the program according to Supplementary Note 37.
According to the present invention, for example, even when a gene mutation at a single position cannot clearly be determined to be associated with a trait, referring to information on an associated region of the gene mutation makes it possible to pick the gene mutation as a candidate that may show an association with the trait. Therefore, the association between the gene mutation and the trait can be assessed more efficiently.
Number | Date | Country | Kind |
---|---|---|---|
2018-051268 | Mar 2018 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2018/036376 | 9/28/2018 | WO | 00 |