This application claims priority from prior Japanese Patent Application No. 2017-216502, filed on Nov. 9, 2017, entitled “Sequence Analysis Method, Sequence Analysis Apparatus, Reference Sequence Generation Method, Reference Sequence Generation Apparatus, Program, and Storage Medium”, the entire content of which is incorporated herein by reference.
The present invention relates to a sequence analysis method, a sequence analysis apparatus, a sequence analysis program, and a storage medium.
Gene sequence analysis has long been used as an important tool in basic research, clinical research, and medical care. In recent years, next-generation sequencers (NGS) have appeared, making it possible to obtain large amounts of gene sequence information comprehensively and at high speed. Accordingly, gene sequence analysis is now used in a broader range of fields.
One example of a technology that utilizes gene sequence analysis is target sequencing. Target sequencing is a technique of determining base sequences only for a target region in the entire genome sequence. It enables analysis of only the gene sequences of a target region that includes hereditary-disorder-related genes, cancer-related genes, and the like, and thus yields highly useful analysis results at low sequencing cost.
For example, gene panels, with which a plurality of mutations occurring in genes related to a specific disease can be analyzed in detail and at high-throughput by use of a next-generation sequencer, are recognized as a useful tool for diagnosing the disease.
Japanese Translation of PCT International Application Publication No. 2015-536661 discloses a method for quickly and efficiently mapping read sequences obtained through target sequencing. In that method, read sequences are mapped not to a reference sequence of the entire genome but to a reference sequence of the target region for which sequence reading is performed, which improves calculation efficiency. In addition, to prevent a read sequence that is merely similar to the sequence of a target region from being erroneously mapped to the target region, a reference sequence of an alternate region of the reference genome that is similar to the target region is also used in aligning the read sequences. The degree of agreement between the target region and each read sequence and the degree of agreement between the alternate region and that read sequence are determined, and when the read sequence is more similar to the target region than to the alternate region, the read sequence is mapped to the target region.
The scope of the present invention is defined solely by the appended claims, and is not affected to any degree by the statements within this summary.
There are cases where polymorphism, mutation, methylation, and the like occur in the sequences of genes that are to be analyzed (hereinafter, also referred to as analysis targets). For example, in a case where a mutation such as deletion or insertion has occurred, if a reference sequence of a target region that does not include any mutation is used, alignment accuracy could be reduced.
In order to solve the above problem, a sequence analysis method according to one aspect of the present invention is a method for analyzing nucleic acid sequence, the method including: obtaining a plurality of read sequences read from the nucleic acid sequence; and determining the nucleic acid sequence by aligning the plurality of read sequences with reference to a single reference sequence, wherein the reference sequence comprises at least a first rearrangement sequence and a second rearrangement sequence that is different from the first rearrangement sequence.
According to this aspect of the present invention, a plurality of read sequences read from the nucleic acid sequence are aligned with reference to a single reference sequence that comprises at least a first rearrangement sequence and a second rearrangement sequence different from the first rearrangement sequence. Accordingly, even when polymorphism, mutation, methylation, and the like have occurred in the sequence of the analysis target gene, the read sequences can be mapped more accurately.
Even when the number of rearrangement sequences changes, the read sequences are still aligned with reference to a single reference sequence that comprises the plurality of rearrangement sequences, so the number of reference sequences used in alignment does not change. Therefore, information regarding a new mutation can easily be reflected in the alignment of the read sequences.
“Read sequence” means a polynucleotide sequence obtained by sequencing. “Rearrangement sequence” means a partial sequence or a complete sequence of a wild-type exon or the like included in a genome sequence, which includes at least one of a known polymorphism, a known mutation, and known methylation that has occurred in that wild-type exon or the like.
“Reference sequence” is a sequence to which each read sequence is mapped in order to determine which region of the gene, which mutation of the gene, and the like the read sequence corresponds to. For each gene to be analyzed, the following can be used as reference sequences: (1) a wild-type reference sequence, which is a partial sequence or a complete sequence of a wild-type exon, and (2) a single reference sequence obtained by connecting, into one piece, rearrangement sequences each of which includes a known polymorphism or mutation relative to the wild-type exon sequence. When bisulfite sequencing is performed, a sequence in which unmethylated cytosine has been converted to uracil through bisulfite treatment can be used as the wild-type sequence, and a sequence in which cytosine remains unconverted can be used as a rearrangement sequence. “Mapping” means a process of aligning each read sequence to the region of the reference sequence in use whose base sequence has the highest matching rate with the read sequence.
“Single reference sequence” is a sequence generated, for each gene to be analyzed, by connecting two or more rearrangement sequences for that gene into one piece. When each read sequence is mapped, the single reference sequence is used as the only reference sequence that includes rearrangement sequences.
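To make the notion of a single reference sequence concrete, the following is a minimal Python sketch, under stated assumptions, of connecting rearrangement sequences into one piece while recording where each one starts and ends. The names `build_single_reference` and `Segment`, and the optional `spacer` placed between sequences, are hypothetical; the text does not say whether any separator is used at the junctions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Location of one rearrangement sequence inside the single reference."""
    rearrangement_id: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def build_single_reference(rearrangements, spacer=""):
    """Connect rearrangement sequences into one piece.

    rearrangements: list of (identifier, sequence) pairs, in connection order.
    spacer: optional filler placed between consecutive sequences (hypothetical).
    Returns (single_reference, list_of_segments).
    """
    parts, segments, pos = [], [], 0
    for i, (rid, seq) in enumerate(rearrangements):
        if i > 0 and spacer:
            parts.append(spacer)
            pos += len(spacer)
        parts.append(seq)
        segments.append(Segment(rid, pos, pos + len(seq)))
        pos += len(seq)
    return "".join(parts), segments


# Two hypothetical rearrangement sequences of the same short region.
reference, segments = build_single_reference(
    [("mut-1", "ACGTTA"), ("mut-2", "ACGGTA")], spacer="NNNN"
)
print(reference)    # ACGTTANNNNACGGTA
print(segments[1])  # Segment(rearrangement_id='mut-2', start=10, end=16)
```

Recording the segment boundaries is what would let a mapped position be traced back to the rearrangement sequence, and hence the known mutation, it came from. Using a run of `N` as the spacer is one plausible way to discourage reads from matching across a junction, but it is a design choice of this sketch only.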
Bisulfite sequencing is one technique for analyzing DNA methylation. Cytosine, one of the four bases forming DNA, may be methylated to become methylated cytosine; this is called DNA methylation. Bisulfite sequencing is a sequencing method used to detect methylated cytosine. In bisulfite sequencing, DNA contained in a sample is treated with bisulfite, whereby unmethylated cytosine in the DNA undergoes base substitution and is replaced with uracil. Methylated cytosine, in contrast, does not undergo base substitution and is not replaced with uracil even after bisulfite treatment. Sequence analysis is performed after the bisulfite treatment to determine which cytosines have not been replaced with uracil. Accordingly, the cytosines methylated in the sample DNA can be determined.
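The bisulfite conversion described above can be sketched in silico as follows. This is an illustrative simplification, not the procedure of the embodiment: it treats a single strand, assumes the read has already been aligned base-for-base to the reference, and writes converted cytosine as T because uracil is read as thymine once the treated DNA is amplified and sequenced. The function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Simulate bisulfite treatment of one strand as it appears after sequencing.

    Unmethylated C becomes U, written here as T (U is read as T after amplification).
    C at a 0-based position in methylated_positions is protected and stays C.
    """
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


def methylated_cytosines(reference, converted_read):
    """0-based positions where the reference has C and the treated read still shows C.

    Assumes converted_read is already aligned base-for-base to reference.
    """
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), converted_read.upper()))
        if ref_base == "C" and read_base == "C"
    ]


treated = bisulfite_convert("ACGCTC", methylated_positions={1})
print(treated)                                  # ACGTTT
print(methylated_cytosines("ACGCTC", treated))  # [1]
```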
“Mutation” means at least one type of gene mutation, such as polymorphism, substitution, or InDel. “InDel (Insertion and/or Deletion)” means a mutation that includes an insertion, a deletion, or both. “Polymorphism” of a gene includes SNV (single nucleotide variant, single nucleotide polymorphism), VNTR (variable number of tandem repeats, repeat sequence polymorphism), STRP (short tandem repeat polymorphism, microsatellite polymorphism), and the like.
The first rearrangement sequence may comprise at least one of polymorphism, mutation, and methylation, and the second rearrangement sequence may comprise at least one of polymorphism, mutation, and methylation.
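As a rough illustration of the mutation categories defined above, the sketch below classifies a variant from its reference and alternate alleles. It is a hypothetical helper that assumes alleles are given without a shared padding base (formats such as VCF pad indels with the preceding base, so their alleles would need trimming first); repeat polymorphisms such as VNTR and STRP are not distinguished here.

```python
def classify_variant(ref_allele, alt_allele):
    """Rough category of a variant, using the terms defined in the text.

    Alleles are assumed to be unpadded ('' means no bases), a hypothetical
    convention chosen for this sketch.
    """
    ref, alt = ref_allele.upper(), alt_allele.upper()
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"           # single-base substitution
    if not ref:
        return "insertion"     # bases added, none removed
    if not alt:
        return "deletion"      # bases removed, none added
    if len(ref) == len(alt):
        return "substitution"  # multi-base substitution, length unchanged
    return "InDel"             # bases both removed and inserted


for ref, alt in [("A", "G"), ("", "TT"), ("CA", ""), ("AC", "GT"), ("ACG", "T")]:
    print(repr(ref), "->", repr(alt), classify_variant(ref, alt))
```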
The polymorphism may be any one of repeat sequence polymorphism, microsatellite, and single nucleotide polymorphism, and the mutation may be any one of substitution, deletion, and insertion.
The determining may comprise comparing the read sequence with the reference sequence and mapping the read sequence to the region of the reference sequence that has the highest matching rate with the read sequence.
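The mapping criterion described above, placing a read at the region with the highest matching rate, can be illustrated with an exhaustive, ungapped comparison. This is a deliberately naive sketch, not the alignment algorithm of the embodiment: production aligners use indexes and gapped alignment, and the matching rate here is simply the fraction of identical bases. The function names are hypothetical.

```python
def matching_rate(read, window):
    """Fraction of positions at which read and window carry the same base."""
    return sum(a == b for a, b in zip(read, window)) / len(read)


def map_read(read, reference):
    """Return (start, rate) for the reference window best matching read.

    Every ungapped window of len(read) is scored; the first window with the
    highest matching rate wins.
    """
    if not read or len(read) > len(reference):
        raise ValueError("read must be non-empty and no longer than the reference")
    best_start, best_rate = 0, -1.0
    for start in range(len(reference) - len(read) + 1):
        rate = matching_rate(read, reference[start:start + len(read)])
        if rate > best_rate:
            best_start, best_rate = start, rate
    return best_start, best_rate


print(map_read("GGTA", "ACGTTANNNNACGGTA"))  # (12, 1.0)
```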
The sequence analysis method may further comprise generating the reference sequence that comprises the first rearrangement sequence and the second rearrangement sequence.
In the determining of the nucleic acid sequence, the sequence analysis method may use the reference sequence comprising the first rearrangement sequence and the second rearrangement sequence, which are generated on the basis of known mutation information obtained from a mutation information database (3, 3a).
“Mutation information” may include known mutation information which is publicly known mutation information and mutation information that has not been made public. “Publicly known mutation information” is not limited to information regarding mutation but may also include information regarding polymorphism and methylation. Similar to publicly known mutation information, known mutation information may include information regarding mutation, polymorphism, and methylation.
The known mutation information and information indicating when the known mutation information was obtained may be associated with each other in the mutation information database (3, 3a).
The sequence analysis method may further comprise adding a third rearrangement sequence so as to be included in the reference sequence.
The sequence analysis method may further comprise adding a third rearrangement sequence so as to be connected with at least one of the first rearrangement sequence and the second rearrangement sequence.
In the generating of the reference sequence, when known mutation information different from the known mutation information used in generating the first rearrangement sequence and the second rearrangement sequence has been newly stored in the mutation information database (3, 3a), the generating of the reference sequence may comprise generating, on the basis of a third rearrangement sequence generated from the newly stored known mutation information, the reference sequence including the first rearrangement sequence, the second rearrangement sequence, and the third rearrangement sequence.
In the generating of the reference sequence, when known mutation information different from the known mutation information used in generating the first rearrangement sequence and the second rearrangement sequence has been newly stored in the mutation information database (3, 3a), the generating of the reference sequence may comprise generating the reference sequence by connecting a third rearrangement sequence, generated from the newly stored known mutation information, to the first rearrangement sequence or to the second rearrangement sequence.
The sequence analysis method may further comprise providing each piece of known mutation information stored in the mutation information database (3, 3a) with individual identification information, and generating the first rearrangement sequence, the second rearrangement sequence, and the third rearrangement sequence on the basis of pieces of known mutation information provided with different pieces of identification information.
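Adding a third rearrangement sequence to an existing single reference sequence, as described above, can be sketched as appending it together with a new segment entry, while rejecting a piece of mutation information whose identifier is already present. The function name, the `(identifier, start, end)` segment layout, and the `N` spacer are assumptions made for illustration only.

```python
def append_rearrangement(reference, segments, rearrangement_id, sequence, spacer="NNNN"):
    """Connect one more rearrangement sequence to an existing single reference.

    segments: list of (identifier, start, end) tuples, 0-based, end exclusive.
    The spacer is a hypothetical separator; the text does not specify one.
    Returns a new (reference, segments) pair and leaves the inputs unchanged.
    """
    if any(rid == rearrangement_id for rid, _, _ in segments):
        raise ValueError(f"rearrangement {rearrangement_id!r} is already included")
    prefix = reference + spacer if reference else ""
    start = len(prefix)
    new_segments = segments + [(rearrangement_id, start, start + len(sequence))]
    return prefix + sequence, new_segments


ref = "ACGTTANNNNACGGTA"
segs = [("mut-1", 0, 6), ("mut-2", 10, 16)]
ref, segs = append_rearrangement(ref, segs, "mut-3", "ACGCTA")
print(ref)       # ACGTTANNNNACGGTANNNNACGCTA
print(segs[-1])  # ('mut-3', 20, 26)
```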
The first rearrangement sequence may be a partial sequence or a complete sequence of an exon or intron that has at least one of polymorphism, mutation, or methylation, and the second rearrangement sequence may be a partial sequence or a complete sequence of an exon or intron that has at least one of polymorphism, mutation, or methylation.
The obtaining may comprise obtaining the plurality of read sequences by reading the nucleic acid sequence collected with a bait.
The sequence analysis method may further comprise reading the plurality of read sequences by using oligo DNA immobilized on a surface of a member. Examples of a member to be used for reading the nucleic acid sequence include flow cells and the like shown in
The determining may comprise comparing the plurality of read sequences with a wild-type reference sequence and the single reference sequence.
The reference sequence may comprise the first rearrangement sequence for one gene to be analyzed and the second rearrangement sequence for another gene to be analyzed.
The sequence analysis method may further comprise reading the plurality of read sequences by use of a next-generation sequencer.
The sequence analysis method may further comprise obtaining a plurality of single reference sequences, one for each gene to be analyzed, wherein the determining comprises aligning the plurality of read sequences with reference to each of the plurality of single reference sequences.
In order to solve the above problem, a sequence analysis apparatus according to another aspect of the present invention is a sequence analysis apparatus (1) comprising: a read sequence information obtaining unit (111) configured to obtain a plurality of read sequences read from the nucleic acid sequence; and a sequence determination unit (113) configured to determine the nucleic acid sequence by aligning the plurality of read sequences with reference to a single reference sequence, wherein the reference sequence comprises at least a first rearrangement sequence and a second rearrangement sequence that is different from the first rearrangement sequence.
According to this aspect of the present invention, a plurality of read sequences read from the nucleic acid sequence are aligned with reference to a single reference sequence that comprises a plurality of rearrangement sequences. Accordingly, even when polymorphism, mutation, methylation, and the like have occurred in the sequence of the analysis target gene, the read sequences can be mapped more accurately. Since information regarding a new mutation can be incorporated into the single reference sequence, the read sequences can still be aligned with reference to the single reference sequence even when the number of rearrangement sequences has changed. This has the further effect that the routine of the analysis program need not be modified.
The sequence determination unit (113) may compare the read sequence with the reference sequence and determine the region of the reference sequence that has the highest matching rate with the read sequence.
The sequence analysis apparatus may further include a reference sequence generation unit (115) configured to generate the reference sequence that includes the first rearrangement sequence and the second rearrangement sequence.
The sequence analysis apparatus may further comprise a reference sequence management unit (112) configured to obtain, from a mutation information database (3, 3a), known mutation information to be used in generating the first rearrangement sequence and the second rearrangement sequence.
The reference sequence generation unit (115) may generate the reference sequence that includes the first rearrangement sequence, the second rearrangement sequence, and a third rearrangement sequence generated on the basis of known mutation information that is different from known mutation information used in generation of the first rearrangement sequence and the second rearrangement sequence.
The reference sequence generation unit (115) may generate the reference sequence by connecting a third rearrangement sequence to the first rearrangement sequence or to the second rearrangement sequence, the third rearrangement sequence being generated on the basis of known mutation information that is different from known mutation information used in generation of the first rearrangement sequence and the second rearrangement sequence.
Each piece of known mutation information stored in the mutation information database (3, 3a) may be provided with individual identification information, and the reference sequence management unit (112) may generate the first rearrangement sequence, the second rearrangement sequence, and the third rearrangement sequence on the basis of pieces of known mutation information provided with different pieces of identification information.
The sequence determination unit (113) may compare each of the plurality of read sequences with a wild-type reference sequence and the reference sequence.
The sequence analysis apparatus may further include an output unit (14) configured to output information regarding which of the reference sequence or the wild-type reference sequence matches the nucleic acid sequence determined by the sequence determination unit (113).
In order to solve the above problem, a reference sequence generation method according to another aspect of the present invention comprises: obtaining a first rearrangement sequence and a second rearrangement sequence; and generating a reference sequence in which the first rearrangement sequence and the second rearrangement sequence are connected into one piece.
According to this aspect of the present invention, the single reference sequence is generated by connecting the first rearrangement sequence and the second rearrangement sequence into one piece. When the nucleic acid sequence of the read sequences is determined by alignment using the single reference sequence generated in this manner, an effect similar to that of the sequence analysis method described above is obtained.
Each of the first rearrangement sequence and the second rearrangement sequence may be a sequence that includes at least one of polymorphism, mutation, and methylation.
The polymorphism may be any one of repeat sequence polymorphism, microsatellite, and single nucleotide polymorphism, and the mutation may be any one of substitution, deletion, and insertion.
Each of the first rearrangement sequence and the second rearrangement sequence may be generated on the basis of information obtained from a mutation information database (3, 3a).
Known mutation information and information indicating a date and a time at which the known mutation information was stored in the mutation information database (3, 3a) may be associated with each other in the mutation information database (3, 3a).
In the generating of the reference sequence, when known mutation information that is different from known mutation information used in generation of the first rearrangement sequence and the second rearrangement sequence has been newly stored in the mutation information database (3, 3a), the reference sequence is generated on the basis of a third rearrangement sequence generated on the basis of the newly stored known mutation information, the reference sequence including the first rearrangement sequence, the second rearrangement sequence, and the third rearrangement sequence.
In the generating of the reference sequence, when known mutation information that is different from known mutation information used in generation of the first rearrangement sequence and the second rearrangement sequence has been newly stored in the mutation information database (3, 3a), the reference sequence may be generated on the basis of a third rearrangement sequence generated on the basis of the newly stored known mutation information, the reference sequence being obtained by connecting the third rearrangement sequence to the first rearrangement sequence or to the second rearrangement sequence.
Each of the first rearrangement sequence and the second rearrangement sequence may include polymorphism, mutation, or methylation, and each of the first rearrangement sequence and the second rearrangement sequence may be a partial sequence or a complete sequence of an exon that has the polymorphism, the mutation, or the methylation.
In order to solve the above problem, a reference sequence generation apparatus according to another aspect of the present invention is a reference sequence generation apparatus configured to generate a reference sequence to be used for determining a nucleic acid sequence of a read sequence read by a sequencer (2), the reference sequence generation apparatus including: a reference sequence management unit configured to obtain a first rearrangement sequence and a second rearrangement sequence; and a reference sequence generation unit configured to generate a reference sequence in which the first rearrangement sequence and the second rearrangement sequence are connected in one piece.
According to this aspect of the present invention, even when the number of rearrangement sequences changes, the number of reference sequences used for aligning the read sequences does not change. Therefore, information regarding a new mutation can easily be reflected in the alignment of the read sequences.
The sequence analysis apparatus (1) according to each aspect of the present invention may be implemented by a computer. A program that causes a computer to implement the functions of the sequence analysis apparatus (1), and a computer-readable storage medium storing the program, are also included in the scope of the present invention.
According to the present invention, even when polymorphism, mutation, methylation, and the like have occurred in the sequence of an analysis target gene, read sequences can be mapped more accurately and efficiently.
In the present embodiment, a sample that contains DNA is fragmented so as to have a length at which a sequencer reads a sequence, the base sequence of each DNA fragment is read by the sequencer, and read sequences having been read are mapped on a single reference sequence obtained by connecting, into one piece, a plurality of rearrangement sequences including mutations, whereby alignment is performed.
When a single reference sequence obtained by connecting a plurality of rearrangement sequences into one piece is not used, mapping a read sequence by comparing it with two or more rearrangement sequences commonly proceeds as follows. First, the read sequence to be mapped is compared with a wild-type reference sequence; then rearrangement sequence 1 is read out and compared with the read sequence. Next, rearrangement sequence 2 is read out and compared with the read sequence. In this method, the process of reading out rearrangement sequences one by one and comparing the read sequence with each of them must be repeated until the read sequence has been compared with all of the rearrangement sequences.
However, interest in mutations occurring in genes has been growing in recent years, and information regarding mutations is expected to continue to be added and accumulated worldwide as research and development progress. Therefore, the number of rearrangement sequences including known mutations that are used in aligning read sequences is not fixed; it gradually increases, or sometimes decreases.
In the general method described above, in which rearrangement sequences are read out one by one and compared with the read sequence, whenever information regarding known mutations is uploaded or deleted and the number of rearrangement sequences including known mutations therefore changes, the program routine must be modified to add or delete the rearrangement sequences to be read out.
Meanwhile, in the present embodiment, after a read sequence is compared with a wild-type reference sequence, the read sequence is compared with a single reference sequence obtained by connecting a plurality of rearrangement sequences into one piece, and then, the position on the reference sequence at which the matching rate with the read sequence satisfies a predetermined level is identified.
Thus, the present embodiment has an advantage that, since a reference sequence obtained by connecting a plurality of rearrangement sequences into one piece is used, even when the number of rearrangement sequences has changed, information regarding a new mutation can be reflected in the single reference sequence, and thus, the program routine need not be modified.
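The advantage described above can be illustrated with a sketch in which the read is compared first with the wild-type reference sequence and then with the single reference sequence, and a hit is accepted only if its matching rate reaches a predetermined level. The threshold of 0.9, the tie-break in favor of the wild type, the ungapped matching, and all function names are assumptions of this sketch. What it shows is that adding a rearrangement sequence changes only the data passed in (`single_ref` and `segments`), not the routine itself.

```python
import bisect


def best_hit(read, reference, threshold):
    """Ungapped best match (start, rate) whose rate reaches threshold, else None."""
    n = len(read)
    if n == 0:
        return None
    best = None
    for start in range(len(reference) - n + 1):
        rate = sum(a == b for a, b in zip(read, reference[start:start + n])) / n
        if rate >= threshold and (best is None or rate > best[1]):
            best = (start, rate)
    return best


def locate(start, length, segments):
    """Identifier of the segment fully containing [start, start + length), else None.

    segments: (identifier, start, end) tuples sorted by start.
    """
    starts = [s for _, s, _ in segments]
    idx = bisect.bisect_right(starts, start) - 1
    if idx >= 0:
        rid, s, e = segments[idx]
        if s <= start and start + length <= e:
            return rid
    return None


def assign(read, wild_type, single_ref, segments, threshold=0.9):
    """Compare read with the wild-type reference, then with the single reference.

    Returns (label, rate), where label is 'wild-type' or a rearrangement
    identifier, or None if no hit reaches threshold. Ties favor the wild type.
    """
    wt = best_hit(read, wild_type, threshold)
    alt = best_hit(read, single_ref, threshold)
    if alt is not None and (wt is None or alt[1] > wt[1]):
        rid = locate(alt[0], len(read), segments)
        if rid is not None:
            return rid, alt[1]
    if wt is not None:
        return "wild-type", wt[1]
    return None


wild_type = "ACGATA"
single_ref = "ACGTTANNNNACGGTA"
segments = [("mut-1", 0, 6), ("mut-2", 10, 16)]
print(assign("CGGTA", wild_type, single_ref, segments))  # ('mut-2', 1.0)
print(assign("CGATA", wild_type, single_ref, segments))  # ('wild-type', 1.0)

# A third rearrangement sequence is added: only the data change, not assign().
single_ref += "NNNN" + "ACGCTA"
segments.append(("mut-3", 20, 26))
print(assign("CGCTA", wild_type, single_ref, segments))  # ('mut-3', 1.0)
```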
Embodiments of the present disclosure are described in detail.
In the following, an example case in which a sequence analysis apparatus 1 according to an embodiment of the present disclosure is installed in a test institution 110 is described.
(Test Institution 110)
The test institution 110 tests and analyzes samples provided by one or a plurality of medical institutions 210 and provides the analysis results to the medical institutions 210. In the test institution 110, as shown in
In the test institution 110 shown in
(Mutation Information Database 3)
A mutation information database 3 shown in
(Sequencer 2)
The sequencer 2 is an analysis apparatus used to read the base sequences of genes contained in a sample. Preferably, the sequencer 2 is a next-generation sequencer that can read the base sequences of a large number of DNA fragments simultaneously and in parallel. The next-generation sequencer is one of the base sequence analysis apparatuses developed in recent years. It achieves a significantly improved analysis capability by processing, in parallel in a flow cell, a large number of single DNA molecules or clonally amplified DNA templates.
Examples of a sequencing technology applicable to the sequencer 2 include sequencing technologies that can obtain a large number of read sequences per run, such as ionic semiconductor sequencing, pyrosequencing, sequencing-by-synthesis using a reversible dye terminator, sequencing-by-ligation, and sequencing by use of probe ligation of oligonucleotide.
A sequencing primer to be used in sequencing is not limited in particular, and is set as appropriate on the basis of a sequence that is suitable for amplifying a target region. Also with respect to reagents to be used in sequencing, suitable reagents may be selected in accordance with the sequencing technology and the sequencer 2 to be used.
(Configuration of Sequence Analysis Apparatus 1)
The sequence analysis apparatus 1 is an apparatus that obtains a plurality of read sequences read from nucleic acid sequence, and that aligns each read sequence with reference to a single reference sequence that includes at least a first rearrangement sequence and a second rearrangement sequence, thereby determining the nucleic acid sequence.
The sequence analysis apparatus 1 shown in
(Step S51: Process of Generating Single Reference Sequence)
The process of step S51 is performed by the reference sequence management unit 112 and the reference sequence generation unit 115 of the controller 11.
The reference sequence management unit 112 transmits a mutation information request to the mutation information database 3 and downloads publicly known mutation information from the mutation information database 3. The reference sequence management unit 112 may be configured to download only the publicly known mutation information that has been uploaded to the mutation information database 3 on or after the day on which it last downloaded publicly known mutation information. In this embodiment, for example, when the reference sequence management unit 112 also downloaded publicly known mutation information from the mutation information database 3 before “year zz, month z, day z”, it does not download again the publicly known mutation information that was downloaded at that preceding time. In
The reference sequence management unit 112 may be configured to download publicly known mutation information on all the analysis target genes of the sequence analysis apparatus 1 from the mutation information database 3 periodically (for example, once a month, once a week, or once every two days). Alternatively, publicly known mutation information may be downloaded in accordance with an instruction from a user of the sequence analysis apparatus 1, for one or a plurality of analysis target genes of a gene panel associated with a gene panel name, or for genes corresponding to gene names or the like, inputted by the user through the input unit 15. In this case, the reference sequence management unit 112 refers to the gene panel information database 121 and determines the genes for which publicly known mutation information should be downloaded. In the embodiment in which publicly known mutation information is downloaded in accordance with an instruction from a user, the reference sequence management unit 112 may present to the user the date on which publicly known mutation information was last downloaded. This allows the user to know in advance whether the publicly known mutation information to be downloaded is new and appropriate.
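The incremental download described above can be sketched as a filter over the records held by the mutation information database. The record fields `mutation_id` and `uploaded` are hypothetical; the actual database schema is not given in this section. Records uploaded on or after the previous download date are selected, and records whose mutation IDs were already obtained at that time are skipped.

```python
from datetime import date


def new_mutation_records(records, last_download, known_ids=frozenset()):
    """Select mutation records to download in this round.

    records: iterable of dicts with hypothetical keys 'mutation_id' and
        'uploaded' (a datetime.date).
    last_download: date of the immediately preceding download, or None if
        nothing has been downloaded before (then every record is selected).
    known_ids: mutation IDs already obtained at the preceding download.
    """
    selected = []
    for record in records:
        if record["mutation_id"] in known_ids:
            continue  # already downloaded at the preceding time
        if last_download is None or record["uploaded"] >= last_download:
            selected.append(record)
    return selected


records = [
    {"mutation_id": "M1", "uploaded": date(2017, 7, 1)},
    {"mutation_id": "M2", "uploaded": date(2017, 8, 1)},
    {"mutation_id": "M3", "uploaded": date(2017, 8, 1)},
]
fresh = new_mutation_records(records, last_download=date(2017, 8, 1), known_ids={"M2"})
print([r["mutation_id"] for r in fresh])  # ['M3']
```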
Here, data which is stored in the gene panel information database 121 and which is referred to by the reference sequence management unit 112 when information regarding a gene panel has been inputted through the input unit 15 is described with reference to
In the gene panel information database 121, as shown in data 121A in
When a gene panel name has been inputted by the user through the input unit 15, the reference sequence management unit 112 may refer to the gene panel information database 121 and extract the gene name, the gene panel ID, and related gene IDs which are associated with the inputted gene panel name.
When gene names have been inputted by the user through the input unit 15, the reference sequence management unit 112 may refer to the gene panel information database 121 and extract the gene IDs associated with the inputted gene names, and the gene panel ID of a gene panel associated with these gene IDs.
In the gene panel information database 121, as shown in data 121C in
When a disease name has been inputted by the user through the input unit 15, the reference sequence management unit 112 may refer to the gene panel information database 121 and extract related gene IDs and a gene panel ID, on the basis of the gene panel name or the gene names associated with the inputted disease name.
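The three lookups described above, by gene panel name, by gene names, and by disease name, can be sketched with plain dictionaries standing in for the tables of the gene panel information database 121. The table layout, the keys, and the example names and IDs below are hypothetical placeholders.

```python
def genes_for_panel(panel_name, panels):
    """Gene IDs (by gene name) and panel ID for a gene panel name."""
    entry = panels[panel_name]
    return dict(entry["genes"]), {entry["panel_id"]}


def genes_for_names(gene_names, panels):
    """Gene IDs for the given gene names, plus IDs of every panel containing them."""
    wanted = set(gene_names)
    gene_ids, panel_ids = {}, set()
    for entry in panels.values():
        for name, gene_id in entry["genes"].items():
            if name in wanted:
                gene_ids[name] = gene_id
                panel_ids.add(entry["panel_id"])
    return gene_ids, panel_ids


def genes_for_disease(disease_name, diseases, panels):
    """Resolve a disease name via its associated panel name or gene names."""
    link = diseases[disease_name]
    if "panel_name" in link:
        return genes_for_panel(link["panel_name"], panels)
    return genes_for_names(link["gene_names"], panels)


# Hypothetical contents; the real table layouts are not reproduced here.
panels = {
    "panel-A": {"panel_id": "P001", "genes": {"GENE_A": "G001", "GENE_B": "G002"}},
    "panel-B": {"panel_id": "P002", "genes": {"GENE_A": "G001", "GENE_C": "G003"}},
}
diseases = {
    "disease-X": {"panel_name": "panel-B"},
    "disease-Y": {"gene_names": ["GENE_B"]},
}
print(genes_for_disease("disease-X", diseases, panels))
# ({'GENE_A': 'G001', 'GENE_C': 'G003'}, {'P002'})
```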
On the basis of the downloaded publicly known mutation information, the reference sequence management unit 112 generates a rearrangement sequence and adds/saves the generated rearrangement sequence in the reference sequence database 122. For example, by use of a partial sequence or a complete sequence of a wild type, and the chromosome number, the position, and mutation sequence a of a mutation indicated by publicly known mutation information, the reference sequence management unit 112 generates a rearrangement sequence that includes the mutation sequence a. Accordingly, the rearrangement sequence becomes a sequence in which a known polymorphism, mutation, methylation, or the like that has occurred in a partial sequence or a complete sequence of an exon or the like of a wild type has been reproduced.
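Generating a rearrangement sequence from a wild-type sequence and one piece of known mutation information (chromosome number, position, and mutation sequence) can be sketched as a simple splice. The 1-based coordinate convention, the tuple layouts, and the function name are assumptions; the sketch also checks that the reference allele actually occurs at the stated position before applying the change.

```python
def make_rearrangement(region, mutation):
    """Apply one known mutation to a wild-type region.

    region: (chromosome, start, wild_type_sequence), start being the 1-based
        genomic position of the first base of the sequence (assumed convention).
    mutation: (chromosome, position, ref_allele, alt_allele); position is the
        1-based coordinate of the first reference base, '' marks an empty allele.
    """
    chrom, region_start, wild_type = region
    m_chrom, position, ref_allele, alt_allele = mutation
    if m_chrom != chrom:
        raise ValueError("mutation is on a different chromosome")
    offset = position - region_start
    if offset < 0 or offset + len(ref_allele) > len(wild_type):
        raise ValueError("mutation lies outside the wild-type region")
    if wild_type[offset:offset + len(ref_allele)] != ref_allele:
        raise ValueError("reference allele does not match the wild-type sequence")
    # Splice: bases before the mutation + mutation sequence + bases after it.
    return wild_type[:offset] + alt_allele + wild_type[offset + len(ref_allele):]


region = ("7", 100, "ACGATAGGCA")  # hypothetical region
print(make_rearrangement(region, ("7", 103, "A", "T")))  # ACGTTAGGCA (substitution)
print(make_rearrangement(region, ("7", 105, "AG", "")))  # ACGATGCA (deletion)
print(make_rearrangement(region, ("7", 102, "", "TT")))  # ACTTGATAGGCA (insertion)
```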
Here, a data structure of the reference sequence database 122 is described with reference to
With reference back to
As shown in
The reference sequence generated by the reference sequence generation unit 115 is provided with a reference sequence ID such as “egfr-20170801” by the reference sequence management unit 112, and is saved in the reference sequence database 122.
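The reference sequence ID format can be sketched as follows, assuming from the single example “egfr-20170801” that the ID is the lowercased gene name followed by a YYYYMMDD date. Both functions are hypothetical; because the date part is fixed-width, the most recent reference for a gene can be picked by ordinary string comparison.

```python
from datetime import date


def reference_sequence_id(gene_name, generated_on):
    """Build an ID such as 'egfr-20170801': lowercased gene name + YYYYMMDD."""
    return f"{gene_name.lower()}-{generated_on:%Y%m%d}"


def latest_reference_id(reference_ids, gene_name):
    """Most recent reference ID for gene_name, or None if there is none.

    The gene part (text before the last '-') is compared exactly, so a gene
    name that itself contains '-' is not confused with another gene's prefix.
    """
    gene = gene_name.lower()
    dated = [rid for rid in reference_ids if rid.rsplit("-", 1)[0] == gene]
    return max(dated, default=None)  # fixed-width dates sort correctly as strings


print(reference_sequence_id("EGFR", date(2017, 8, 1)))  # egfr-20170801
print(latest_reference_id(["egfr-20170801", "egfr-20180115", "gene_b-20180201"], "EGFR"))
# egfr-20180115
```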
Data 122B shown in
The reference sequence stored in the reference sequence database 122 is referred to by the sequence determination unit 113 when the sequence determination unit 113 performs alignment of read sequences of nucleic acid fragments.
<Flow of Process of Generating and Updating Single Reference Sequence>
One example of the flow of a process of generating and updating a single reference sequence is described with reference to the flow chart shown in
First, in step S1 shown in
When the specified gene is a gene that is analyzed for the first time by use of the sequence analysis apparatus 1 (YES in step S2), the reference sequence management unit 112 downloads, from the mutation information database 3, all pieces of publicly known mutation information of the gene, the mutation ID provided to each piece of publicly known mutation information, the date on which information regarding each mutation was uploaded, and the like (step S3). However, the present disclosure is not limited thereto. The user may instead create a specific file on the basis of the information regarding each mutation downloaded from the external mutation information database 3, and upload the created file to a mutation information database 3a included in the test institution 110. It should be noted that the “sequence information of known mutations” to be downloaded need not be all the pieces of publicly known mutation information uploaded in the mutation information database 3. For example, the “sequence information of known mutations” may be limited to publicly known mutation information regarding those polymorphisms, mutations, and methylations occurring in the specified gene that are known to be related to diseases.
Specifically, through the communication unit 16, the reference sequence management unit 112 transmits the gene ID of the specified gene and a mutation information request to the mutation information database 3, and downloads desired publicly known mutation information designated by this request, from the mutation information database 3. A mutation information request may be periodically transmitted at a predetermined interval (for example, every day, once a week, or once a month), or may be transmitted every time the user uses the sequence analysis apparatus 1. Alternatively, the sequence analysis apparatus 1 may obtain a notification to the effect that information regarding a new mutation has been uploaded to the mutation information database 3. In this case, every time the notification is obtained, a mutation information request may be transmitted from the sequence analysis apparatus 1.
Next, the reference sequence management unit 112 generates a rearrangement sequence that corresponds to each of the publicly known mutation information downloaded in step S3, and saves the rearrangement sequences in the reference sequence database 122 (step S4).
The reference sequence generation unit 115 reads out generated rearrangement sequences from the reference sequence database 122, and connects the rearrangement sequences into one piece according to a predetermined connection method, to generate a reference sequence (step S5).
The reference sequence generated by the reference sequence generation unit 115 is provided with a reference sequence ID by the reference sequence management unit 112, and is saved in the reference sequence database 122 (step S6).
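Steps S5 and S6 can be illustrated with a short sketch. The text leaves the “predetermined connection method” unspecified. Purely for illustration, this sketch assumes that rearrangement sequences are joined with a run of N bases and that the span of each one is recorded, so that an alignment position can be traced back to its source sequence. The function names and the `spacer_len` parameter are hypothetical. The ID format mirrors the “egfr-20170801” example.

```python
from datetime import date

def connect_sequences(named_seqs: list[tuple[str, str]], spacer_len: int = 50):
    """Join sequences into one reference string (assumed N-spacer method).

    named_seqs is a list of (sequence_id, sequence) pairs. Returns
    (reference, index), where index maps each sequence_id to its 0-based,
    end-exclusive (start, end) span in reference. The N spacer is meant to
    discourage reads from aligning across the junction of two sequences.
    """
    spacer = "N" * spacer_len
    parts: list[str] = []
    index: dict[str, tuple[int, int]] = {}
    cursor = 0
    for i, (seq_id, seq) in enumerate(named_seqs):
        if i > 0:
            parts.append(spacer)
            cursor += spacer_len
        index[seq_id] = (cursor, cursor + len(seq))
        parts.append(seq)
        cursor += len(seq)
    return "".join(parts), index

def reference_sequence_id(gene: str, generated_on: date) -> str:
    """Build an ID in the style of "egfr-20170801" (gene name + generation date)."""
    return f"{gene.lower()}-{generated_on:%Y%m%d}"
```

Recording the spans keeps the connection step reversible, which is useful when a read aligned to the connected sequence has to be reported against the rearrangement sequence it came from.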
Meanwhile, when the specified gene is not a gene that is analyzed for the first time by use of the sequence analysis apparatus 1 (NO in step S2), the reference sequence management unit 112 determines, with respect to the gene, the presence or absence of information regarding any mutation that was uploaded to the mutation information database 3 after the date on which publicly known mutation information was downloaded at the immediately preceding time (step S7).
When there is information regarding a mutation that was uploaded after the date on which publicly known mutation information was downloaded at the immediately preceding time (YES in step S7), the reference sequence management unit 112 downloads the new publicly known mutation information, generates a rearrangement sequence by use of the publicly known mutation information and saves the generated rearrangement sequence (step S8).
The reference sequence generation unit 115 obtains the reference sequence stored in the reference sequence database 122 and the newly generated rearrangement sequence, and connects these into one piece according to a predetermined connection method, to generate a new reference sequence (step S9). That is, the reference sequence generation unit 115 reads out the reference sequence stored in the reference sequence database 122, and connects the rearrangement sequence newly generated by the reference sequence management unit 112 (for example, a rearrangement sequence including the mutation “C797S” shown in
The reference sequence generated by the reference sequence generation unit 115 is stored in the reference sequence database 122 (step S10). As in data 122C shown in
Meanwhile, when there is no information regarding any mutation that was uploaded after the date on which publicly known mutation information was downloaded at the immediately preceding time (NO in step S7), the reference sequence management unit 112 does not download publicly known mutation information. In addition, the reference sequence generation unit 115 does not update the reference sequence. Even when the reference sequence generation unit 115 does not update the reference sequence, the reference sequence management unit 112 preferably updates the reference sequence ID of the reference sequence stored in the reference sequence database 122. Accordingly, it is possible to notify the user that the reference sequence has been generated with the newest publicly known mutation information reflected. For example, irrespective of whether or not a new rearrangement sequence was connected to a rearrangement sequence included in the reference sequence having the reference sequence ID “egfr-20170801” on Sep. 1, 2017, a new reference sequence ID in which the portion of “20170801” of the reference sequence ID is updated with “20170901” may be provided.
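The update check in step S7 and the ID refresh described in the preceding paragraph can be sketched as follows. The `uploaded` mapping of mutation IDs to upload dates is a hypothetical stand-in for the information downloaded from the mutation information database 3. The ID parser assumes the “gene-YYYYMMDD” form of the example above.

```python
from datetime import date

def new_mutations_since(uploaded: dict[str, date], last_download: date) -> list[str]:
    """Step S7 sketch: IDs of mutations uploaded after the previous download date.

    `uploaded` is a hypothetical mapping of mutation ID -> upload date.
    An empty result corresponds to "NO in step S7".
    """
    return sorted(mid for mid, uploaded_on in uploaded.items()
                  if uploaded_on > last_download)

def refresh_reference_id(old_id: str, today: date) -> str:
    """Replace the trailing YYYYMMDD of an ID such as "egfr-20170801".

    Called whether or not new rearrangement sequences were connected, so the
    ID reflects the date on which the mutation information was last checked.
    """
    prefix, sep, _old_date = old_id.rpartition("-")
    if not sep:
        raise ValueError("expected an ID of the form '<gene>-YYYYMMDD'")
    return f"{prefix}-{today:%Y%m%d}"
```

For example, `refresh_reference_id("egfr-20170801", date(2017, 9, 1))` yields "egfr-20170901", which matches the Sep. 1, 2017 example.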
Here, an example case where, for each gene, the reference sequence generation unit 115 generates one reference sequence by connecting rearrangement sequences has been described. However, the present disclosure is not limited thereto. For example, for each gene, a reference sequence may be created by connecting a wild-type reference sequence and rearrangement sequences into one piece, or a reference sequence may be created by connecting, into one piece, all the rearrangement sequences of the analysis target genes of the gene panel inputted by the user in step S1.
Alternatively, a reference sequence may be created by connecting, into one piece, all of the wild-type reference sequences and rearrangement sequences for the analysis target genes of the gene panel inputted by the user in step S1. For example, when a gene panel name “A panel” has been inputted by the user, the reference sequence generation unit 115 refers to the gene panel information database 121 and determines the gene IDs or gene names associated with the inputted gene panel name. One or more gene names or gene IDs may be determined. The reference sequence generation unit 115 may read out the wild-type sequences and rearrangement sequences associated with the determined gene names from the reference sequence database 122, and connect these into one piece. The reference sequence thus generated serves as the single reference sequence, including the rearrangement sequences, that is used in alignment of read sequence information in the analysis using the gene panel. This reference sequence may be provided with a reference sequence ID (for example, “A Panel 20170901”) that includes information indicating the gene panel (for example, the gene panel name) in addition to the date on which the reference sequence was generated.
Next, in the test institution 110, pretreatment for allowing the sequencer 2 to analyze base sequences of a sample DNA is performed (step S52 in
(Step S52: Pretreatment)
First, as shown in
Next, as shown in
The adapter sequences are used in order to perform sequencing in the steps below. In one embodiment, the adapter sequences can be those to be hybridized to oligo DNA immobilized on a flow cell in a Bridge PCR method.
In one embodiment, as shown in the upper part of
In another embodiment, as shown in the lower part of
The index sequence is a sequence for distinguishing the data of each sample. The index sequence is unique to each sample, each gene panel, and each company providing the gene panel. For example, a base sequence used as the index sequence has a given length and a sequence pattern such as, but not limited to, 10 to 14 consecutive adenines, or 5 to 7 consecutive adenines followed by 5 to 7 consecutive guanines. For a DNA fragment to which the index sequence has been added, the sequence pattern and length of the index sequence can be used to identify which sample the sequence data came from, which gene panel was used, which company provided the gene panel, and the like.
For example, the index sequence in an analysis using a gene panel A may have a sequence pattern of 14 consecutive adenines, and the index sequence in an analysis using a gene panel B may have a sequence pattern of 7 consecutive adenines followed by 7 consecutive guanines. Alternatively, the index sequence in an analysis using the gene panel A may have a sequence of 14 consecutive adenines (i.e., the length of the index sequence is 14), and the index sequence in an analysis using a gene panel C may have a sequence of 10 consecutive adenines (i.e., the length of the index sequence is 10).
Techniques known in this field can be used to add the index sequence and the adapter sequences to the DNA fragment. For example, the DNA fragment may be blunt-ended and ligated to the index sequence, and then further ligated to the adapter sequences.
Next, as shown in
In the panel test using the sequencer 2 in the present embodiment, a large number of genes (for example, 100 or more) are analyzed. The reagent to be used in the panel test includes a set of RNA baits that respectively correspond to the large number of genes. When a different panel is used, the number and the types of analysis target genes are different, and thus, the set of RNA baits included in the reagent to be used in the panel test is also different.
As shown in
Further, as shown in the left section to the center section of
Next, in the test institution 110, sequencing for reading base sequences of the sample DNA is performed (step S53 in
The type of the sequencer 2 that can be used in the present embodiment is not particularly limited, and any sequencer that can analyze a plurality of analysis targets in one run can be suitably used. Examples of such a sequencer include: MiSeq (registered trademark), HiSeq (registered trademark), and NextSeq (registered trademark) of Illumina, Inc. (San Diego, CA); Ion Proton (registered trademark) and Ion PGM (registered trademark) of Thermo Fisher (Waltham, MA); and GS FLX+ (registered trademark) and GS Junior (registered trademark) of Roche (Basel, Switzerland). In the following, one example is described in which a sequencer of Illumina, Inc., or an apparatus that employs a method similar to that of the sequencer of Illumina, Inc., is used. By combining a Bridge PCR method with a sequencing-by-synthesis technique, the sequencer of Illumina, Inc. can perform sequencing after the analysis target DNA has been amplified to a very large number of copies on a flow cell.
(Step S53: Sequencing)
First, as shown in the right section of
That is, each DNA fragment as an analysis target (Template DNA in
Then, as shown in
First, to the oligo DNA immobilized on the flow cell (for example, the single-stranded DNA shown in the left section in the upper part of
After the sequencing primer is added, the DNA polymerase extends it by one base, incorporating a fluorescently labeled dNTP whose 3′ end is blocked. Because the 3′ end of the incorporated dNTP is blocked, the polymerase reaction stops once one base has been added. Then, the DNA polymerase is removed (the right section in the middle part of
According to the technique described above, the analyzable chain length reaches 150 bases × 2, and analysis can be performed in units much smaller than those of a picotiter plate. Owing to this high density, 40 to 200 Gb of sequence information can be obtained in one analysis.
The gene panel used when read sequences are read by the sequencer 2 denotes an analysis kit for analyzing a plurality of analysis targets in one run as described above, and, in one embodiment, can be an analysis kit for analyzing a plurality of gene sequences regarding a specific disease.
As used herein, the term “kit” is intended to mean a package that includes containers each containing a specific material. Examples of the containers include bottles, plates, tubes, and dishes. Preferably, the kit includes an instruction insert for using each material. As used herein in the context of a kit, “include” is intended to mean a state where the thing that is included is contained in any of the individual containers forming the kit. The kit can be a package in which a plurality of different compositions are packed. Here, the forms of the compositions can be the forms as described above, and in the case of a solution form, the solution may be contained in a container. The kit may include a substance A and a substance B that are mixed in one container or that are in separate containers. The “instruction insert” indicates information regarding each component in the kit, such as information regarding the procedure in a case where the kit is applied to a therapy and/or a diagnosis. The “instruction insert” may be written or printed on paper or any other medium, or may be in the form of an electronic medium such as a magnetic tape, a computer readable disk or tape, or a CD-ROM. The kit can include a container which contains a diluent, a solvent, a washing liquid, or another reagent. Further, the kit may also include an apparatus that is necessary for the kit to be applied to a therapy and/or a diagnosis.
In one embodiment, the gene panel may be provided with one or more of the reagents described above, such as: the reagent that fragments nucleic acid; the ligation reagent; the washing liquid; the PCR reagents (for example, dNTPs and DNA polymerase); and the magnetic beads. The gene panel may be provided with one or more of: oligonucleotides for adding the adapter sequences to the fragmented DNA; oligonucleotides for adding the index sequence to the fragmented DNA; the RNA bait library; the sequencing primer to be used in sequencing; and the like. Further, the gene panel may include: a flow cell in which predetermined oligo DNA is immobilized on at least a part of the surface thereof; a reagent for immobilizing the oligo DNA to at least a part of the surface of the flow cell; and the like.
In particular, the index sequence provided to each gene panel can be a sequence that is unique to the gene panel and that identifies the gene panel. The RNA bait library provided to each gene panel can be a library that is unique to the gene panel and that includes RNA baits that correspond to test genes of the gene panel.
Next, in the test institution 110, an analysis process of the read sequences that have been read is performed (step S54 in
(Step S54: Analysis of Read Sequence)
The analysis process of read sequences is performed by the read sequence information obtaining unit 111, the sequence determination unit 113, and the mutation identification unit 114 of the controller 11. One example of the flow of the analysis is described with reference to the flow chart of
First, the read sequence information obtaining unit 111 reads read sequence information provided from the sequencer 2 (step S21 in
The read sequence information is data indicating the base sequences read by the sequencer 2. The sequencer 2 performs sequencing on a large number of nucleic acid fragments, reads their sequence information, and provides that sequence information to the sequence analysis apparatus 1 as read sequence information. The read sequence information may consist of the sequences of the analysis target genes read in the panel test.
In one embodiment, the read sequence information may include the sequence that has been read and a quality score for each base in the sequence. Two sets of read sequence information are inputted to the sequence analysis apparatus 1: one obtained by sequencing, with the sequencer 2, an FFPE sample collected from a lesion site of a subject, and one obtained by sequencing a blood sample of the subject with the sequencer 2. A “subject” herein denotes a human subject or a non-human subject such as a mammal, an invertebrate, a vertebrate, a fungus, a yeast, a bacterium, a virus, or a plant. The “FFPE sample” denotes a formalin-fixed paraffin-embedded sample. The quality score Q of each base is expressed by the following formula.
$$Q = -10 \log_{10} E$$
In this formula, E represents an estimated value of the probability of incorrect base assignment. A greater Q value means a lower error probability. As the Q value decreases, the part of the read that cannot be used increases. In addition, false-positive mutation assignments also increase, which could lower the accuracy of the result. “False-positive” means that a read sequence is determined as having a mutation even though it does not have any true mutation that is a target of the determination. “Positive” means that the read sequence has a true mutation that is a target of the determination. “Negative” means that the read sequence does not have any mutation that is a target of the determination.
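The relation between the quality score and the error probability can be checked numerically. The sketch below is illustrative only. The Phred+33 character encoding used by `decode_phred33` and the Q20 threshold in `usable_fraction` are common conventions assumed here, not values taken from the text.

```python
import math

def phred_quality(error_prob: float) -> float:
    """Q = -10 * log10(E), where E is the estimated base-call error probability."""
    if not 0.0 < error_prob <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_prob)

def error_probability(q: float) -> float:
    """Inverse of the formula: E = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)

def decode_phred33(quality_string: str) -> list[int]:
    """Decode FASTQ-style quality characters, assuming the Phred+33 offset."""
    return [ord(c) - 33 for c in quality_string]

def usable_fraction(qualities: list[int], min_q: int = 20) -> float:
    """Fraction of bases whose quality reaches min_q (assumed Q20 cutoff)."""
    if not qualities:
        return 0.0
    return sum(q >= min_q for q in qualities) / len(qualities)
```

An error probability of 0.001 corresponds to Q = 30 (up to floating-point rounding), and Q = 20 corresponds to E = 0.01. Lower Q values shrink the usable fraction of a read, which matches the trade-off described above.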
Next, on the basis of the read sequence information read by the read sequence information obtaining unit 111, the sequence determination unit 113 performs alignment of the read sequence of each nucleic acid fragment included in the read sequence information (step S22 in
The sequence determination unit 113 aligns both the read sequence information obtained by sequencing, with the sequencer 2, an FFPE sample collected from a lesion site of the subject, and the read sequence information obtained by sequencing a blood sample of the subject with the sequencer 2.
One example of a format of a file for outputting a result of alignment performed by the sequence determination unit 113 is described. The format of the alignment result is not limited in particular as long as the format can specify the read sequence, the reference sequence, and the mapping position. The format may include reference sequence information, read sequence name, position information, map quality, and sequence.
“Reference sequence information” indicates the reference sequence name, the reference sequence ID, the sequence length of the reference sequence, and the like. “Read sequence name” is the name, the read sequence ID, and the like of each read sequence that was aligned. “Position information” indicates the position on the reference sequence at which the leftmost base (the base at the 5′ end) of the read sequence was mapped. “Map quality” is information regarding the mapping quality of the read sequence. “Sequence” indicates the base sequence of each read sequence.
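An alignment result carrying the fields listed above could be represented as follows. The record layout, the `@REF` header line, and the tab-separated serialization are assumptions made for illustration. They resemble the SAM format but are not it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReferenceInfo:
    """Reference sequence name, reference sequence ID, and sequence length."""
    name: str
    reference_id: str      # e.g. "egfr-20170801"
    length: int

@dataclass(frozen=True)
class AlignmentRecord:
    """One aligned read: name, mapping position, map quality, and bases."""
    read_name: str
    reference_id: str
    position: int          # 1-based position of the read's leftmost (5') base
    map_quality: int
    sequence: str

def write_alignment(ref: ReferenceInfo, records: list[AlignmentRecord]) -> str:
    """Serialize a header line plus one tab-separated line per aligned read."""
    lines = [f"@REF\t{ref.name}\t{ref.reference_id}\t{ref.length}"]
    for rec in records:
        if rec.reference_id != ref.reference_id:
            raise ValueError(f"{rec.read_name} is aligned to a different reference")
        if not 1 <= rec.position <= ref.length:
            raise ValueError(f"{rec.read_name} maps outside the reference")
        lines.append("\t".join([rec.read_name, rec.reference_id,
                                str(rec.position), str(rec.map_quality),
                                rec.sequence]))
    return "\n".join(lines)
```
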
In step S11 shown in
In one embodiment, the sequence determination unit 113 calculates a score that indicates the matching rate between the read sequence and the reference sequence. The score indicating the matching rate can be a percentage identity between the two sequences, for example. As shown in
When calculating the score indicating the matching rate between a read sequence and a reference sequence, the sequence determination unit 113 may perform the calculation such that a read sequence that includes a predetermined mutation (for example, an InDel) with respect to the reference sequence receives a lower score than the normal calculation would give.
In one embodiment, with respect to the read sequence that includes at least one of insertion and deletion with respect to the reference sequence, the sequence determination unit 113 may correct the score by multiplying the score calculated in the normal calculation as described above by a weighting factor according to the number of bases that correspond to the InDel, for example. The weighting factor W may be calculated as W={1−(1/100)×(the number of bases that correspond to InDel)}, for example.
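The matching-rate score and the InDel weighting factor W can be combined as in the following sketch. It assumes that the read and the reference have already been aligned into two equal-length rows, with `-` marking gaps. Counting every gap column as one InDel base and clamping W at zero are assumptions made here. The function names are hypothetical.

```python
def percent_identity(aligned_read: str, aligned_ref: str) -> float:
    """Percentage of alignment columns in which both rows carry the same base."""
    if len(aligned_read) != len(aligned_ref) or not aligned_read:
        raise ValueError("aligned rows must be non-empty and equal in length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_read, aligned_ref))
    return 100.0 * matches / len(aligned_read)

def indel_bases(aligned_read: str, aligned_ref: str) -> int:
    """Count columns that are an insertion or a deletion (a gap in either row)."""
    return sum(a == "-" or b == "-" for a, b in zip(aligned_read, aligned_ref))

def weighted_score(aligned_read: str, aligned_ref: str) -> float:
    """Normal score multiplied by W = 1 - (1/100) * (number of InDel bases)."""
    w = 1.0 - indel_bases(aligned_read, aligned_ref) / 100.0
    return percent_identity(aligned_read, aligned_ref) * max(w, 0.0)
```

In this example, the read "ACGTAC" aligned against "AC--AC" has 4 of 6 matching columns and 2 InDel bases. Its score is therefore reduced by the factor 0.98.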
At the positions on the reference sequence shown in
By calculating the score of the matching rate while changing the mapping position of the read sequence with respect to each reference sequence, the sequence determination unit 113 specifies the position on the reference sequence at which the matching rate with the read sequence satisfies a predetermined level. At this time, an algorithm known in the field, such as FASTA or BLAST, may be used.
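The position search described above can be illustrated with a brute-force, ungapped scan. This is a sketch only. The 90% default threshold is an assumed stand-in for the “predetermined level”, and practical tools such as FASTA or BLAST use heuristics rather than testing every offset.

```python
def candidate_positions(read: str, reference: str,
                        min_identity: float = 90.0) -> list[tuple[int, float]]:
    """Slide an ungapped read along the reference and keep qualifying positions.

    Returns (position, identity) pairs, where position is the 1-based
    reference coordinate of the read's leftmost base. Results are sorted
    best first, with ties broken by position.
    """
    if not read:
        raise ValueError("read must not be empty")
    n = len(read)
    hits = []
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        identity = 100.0 * sum(a == b for a, b in zip(read, window)) / n
        if identity >= min_identity:
            hits.append((start + 1, identity))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits
```

For example, `candidate_positions("ACGT", "TTACGTTT")` returns `[(3, 100.0)]`. The scan is quadratic in the worst case, which is why heuristic or index-based aligners are preferred in practice.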
With reference back to
When not all of the read sequences included in the read sequence information obtained by the read sequence information obtaining unit 111 have been aligned (NO in step S15), the sequence determination unit 113 returns to step S21. When all of the read sequences included in the read sequence information have been aligned (YES in step S15), the sequence determination unit 113 returns to the process in the flow chart shown in
Next, the mutation identification unit 114 of the controller 11 compares two alignment sequences (step S23 in
In one embodiment, the mutation identification unit 114 generates a result file on the basis of the extracted mutation.
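The comparison in step S23 and the result file can be sketched as a set difference between variant calls made on the lesion-derived and blood-derived alignments. The pileup input (a `Counter` of read bases per 1-based reference position), the depth and allele-fraction thresholds, and the restriction to single-base substitutions are assumptions for illustration. The header row of the result file is likewise hypothetical.

```python
from collections import Counter

def call_variants(pileup: dict[int, Counter], reference: str, chrom: str,
                  min_depth: int = 10, min_fraction: float = 0.05) -> set:
    """Return {(chrom, pos, ref, alt)} for substitutions supported by the reads.

    pileup maps a 1-based reference position to a Counter of read bases
    aligned there. The thresholds are illustrative assumptions.
    """
    calls = set()
    for pos, counts in pileup.items():
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        ref_base = reference[pos - 1]
        for base, n in counts.items():
            if base != ref_base and n / depth >= min_fraction:
                calls.add((chrom, pos, ref_base, base))
    return calls

def lesion_specific(lesion_pileup: dict[int, Counter],
                    blood_pileup: dict[int, Counter],
                    reference: str, chrom: str) -> list:
    """Mutations found in the lesion-derived alignment but not the blood-derived one."""
    lesion = call_variants(lesion_pileup, reference, chrom)
    blood = call_variants(blood_pileup, reference, chrom)
    # Order rows by chromosome, position, then alternate base.
    return sorted(lesion - blood, key=lambda v: (v[0], v[1], v[3]))

def result_file(variants: list) -> str:
    """Tab-separated result file with one row per extracted mutation."""
    lines = ["CHROM\tPOS\tREF\tALT"]
    lines += [f"{c}\t{p}\t{r}\t{a}" for c, p, r, a in variants]
    return "\n".join(lines)
```

Subtracting the blood-derived calls removes variants that are present in both samples. This leaves the variants seen only in the lesion-derived alignment.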
In
With reference back to
The mutation information included in the mutation database 123 may include the mutation ID, mutation position information (for example, “CHROM” and “POS”), “REF”, “ALT”, and “Annotation”. The mutation ID is an identifier for identifying a mutation. In the mutation position information, “CHROM” indicates the chromosome number and “POS” indicates the position on that chromosome. “REF” indicates the base in the wild type, and “ALT” indicates the base after the mutation. “Annotation” indicates information regarding the mutation. “Annotation” may be information that indicates an amino acid mutation such as “EGFR C2573G”, “EGFR L858R”, or the like. For example, “EGFR C2573G” indicates a mutation in which cysteine at the 2573rd residue of protein “EGFR” is substituted by glycine.
As described in the example above, “Annotation” of mutation information may be information for converting a mutation according to base information into a mutation according to amino acid information. In this case, on the basis of information of “Annotation” that has been referred to, the mutation identification unit 114 can convert a mutation according to base information into a mutation according to amino acid information.
Using the information that specifies each mutation included in the result file (for example, mutation position information and base information that corresponds to the mutation) as a key, the mutation identification unit 114 searches the mutation database 123. For example, the mutation identification unit 114 may search the mutation database 123 using any one of the pieces of information “CHROM”, “POS”, “REF”, and “ALT” as a key. When a mutation extracted by comparing the alignment sequence derived from the blood specimen and the alignment sequence derived from the lesion site has been registered in the mutation database 123, the mutation identification unit 114 identifies the mutation as a mutation existing in the sample, and provides annotation (for example, “EGFR L858R”, “BRAF V600E”, etc.) to the mutation included in the result file.
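The database search described above can be sketched as a dictionary keyed by (CHROM, POS, REF, ALT). The `MutationEntry` record and the helper names are hypothetical. This sketch uses the full four-field key rather than any single field. It also drops unregistered mutations, which is one possible design choice, since the text does not say how they are handled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MutationEntry:
    """Hypothetical row of the mutation database 123."""
    mutation_id: str
    chrom: str
    pos: int
    ref: str
    alt: str
    annotation: str   # e.g. "EGFR L858R"

def build_index(entries: list[MutationEntry]) -> dict:
    """Key each registered mutation by (CHROM, POS, REF, ALT)."""
    return {(e.chrom, e.pos, e.ref, e.alt): e for e in entries}

def annotate(result_rows: list[tuple], index: dict) -> list[tuple]:
    """Keep result-file rows registered in the database and append their annotation.

    result_rows are (chrom, pos, ref, alt) tuples. Unregistered mutations are
    dropped in this sketch; a pipeline could instead flag them for review.
    """
    annotated = []
    for row in result_rows:
        entry = index.get(tuple(row))
        if entry is not None:
            annotated.append((*row, entry.annotation))
    return annotated
```
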
With reference back to
Alternatively, the output unit 14 may be a display that displays information regarding processes performed by the units of the controller 11. For example, read sequence information read by the read sequence information obtaining unit 111 may be displayed, and the adapter sequences and the index sequence included in the 5′ end portion and the 3′ end portion in each of read sequence information (see
As described above, a test is performed in the test institution 110, and an analysis report created on the basis of an analysis result is sent to the medical institution 210 having sent an analysis request.
The present disclosure is not limited to the embodiments described above, and various modifications can be made without departing from the scope of the claims. Embodiments obtained by combining as appropriate technological means disclosed in different embodiments are also included in the technological scope of the present disclosure.
For example, as shown in
In the test institution 110 shown in
The mutation information database 3a shown in
A mutation information database 3b may be provided in the test institution 110 as shown in
The reference sequence management unit 112 shown in
When the sequence analysis apparatus 1 shown in
In the above description, an example case has been described in which the reference sequence management unit 112 generates a rearrangement sequence on the basis of known mutation information. However, the present disclosure is not limited thereto. For example, a person who belongs to the analysis institution 120 or the like may obtain known mutation information from the mutation information database 3a and generate a rearrangement sequence. The generated rearrangement sequence may be stored in the mutation information database 3a. In this case, the reference sequence management unit 112 may obtain the rearrangement sequence from the mutation information database 3a. That is, an apparatus (for example, the mutation information database 3a) that is different from the sequence analysis apparatus 1 may provide rearrangement sequences to the sequence analysis apparatus 1.
In the above description, an example case has been described in which a rearrangement sequence that corresponds to each of publicly known mutation information downloaded from the mutation information database 3 is generated by the reference sequence management unit 112 and saved in the reference sequence database 122. However, the present disclosure is not limited thereto. For example, a rearrangement sequence, a rearrangement sequence ID, a mutation ID of a mutation included in the rearrangement sequence, and the like that correspond to each of known mutation information may be stored in the mutation information database 3a, in association with one another. In this case, the reference sequence management unit 112 obtains, through the communication unit 16, a rearrangement sequence, a rearrangement sequence ID, a mutation ID of a mutation included in the rearrangement sequence, and the like that correspond to each of known mutation information, from the mutation information database 3a, and stores those in the reference sequence database 122.
The number of each of the medical institution 210, the test institution 110, and the analysis institution 120 is not limited to one. That is, the medical institution 210 may request analyses from a plurality of test institutions 110, and the test institution 110 may receive analysis requests from a plurality of medical institutions 210. The test institution 110 may request analyses from a plurality of analysis institutions 120, and the analysis institution 120 may receive analysis requests from a plurality of test institutions 110. That is, a plurality of medical institutions 210, a plurality of test institutions 110, and a plurality of analysis institutions 120 may be included. The sequence analysis apparatus 1 can be applied to an institution that has the functions of both the medical institution 210 and the test institution 110, such as research institutes and university hospitals that have both a clinical facility and a test facility, and to an institution in which the test institution 110, the analysis institution 120, and the medical institution 210 are integrated.
For example, an apparatus that is separate from the sequence analysis apparatus 1 may have the functions of the reference sequence management unit 112 and the reference sequence generation unit 115, and the apparatus may function as a reference sequence generation apparatus that outputs rearrangement sequences and reference sequences to the sequence analysis apparatus 1. This reference sequence generation apparatus may be an external server that includes a management server 3 connected to the sequence analysis apparatus 1 via the network 4. In this case, reference sequences may be provided to the sequence analysis apparatus 1 from the external server having the function of the reference sequence generation apparatus. For example, the sequence analysis apparatus 1 may be configured as a system that includes: a first apparatus that includes the read sequence information obtaining unit 111, the sequence determination unit 113, and the mutation identification unit 114; a second apparatus that includes the reference sequence management unit 112 and the reference sequence generation unit 115; and a third apparatus (or database) that has a function similar to that of the storage unit 12.
Number | Date | Country | Kind |
---|---|---|---|
2017-216502 | Nov 2017 | JP | national |

Number | Name | Date | Kind |
---|---|---|---|
20120208706 | Downing et al. | Aug 2012 | A1 |
20140066317 | Talasaz | Mar 2014 | A1 |
20140149049 | Chen et al. | May 2014 | A1 |
20140249764 | Kumar et al. | Sep 2014 | A1 |
20140336941 | Park | Nov 2014 | A1 |
20150056613 | Kural | Feb 2015 | A1 |
20150057946 | Kural | Feb 2015 | A1 |
20150199472 | Kural | Jul 2015 | A1 |
20150211047 | Borns | Jul 2015 | A1 |
20150299812 | Talasaz | Oct 2015 | A1 |
20150347678 | Kural | Dec 2015 | A1 |
20150368708 | Talasaz | Dec 2015 | A1 |
20160017405 | Borns | Jan 2016 | A1 |
20160040229 | Talasaz et al. | Feb 2016 | A1 |
20160046986 | Eltoukhy et al. | Feb 2016 | A1 |
20160092630 | Chen et al. | Mar 2016 | A1 |
20160251704 | Talasaz et al. | Sep 2016 | A1 |
20160306921 | Kural | Oct 2016 | A1 |
20160333417 | Talasaz | Nov 2016 | A1 |
20160340722 | Platt | Nov 2016 | A1 |
20170218459 | Talasaz et al. | Aug 2017 | A1 |
20170218460 | Talasaz | Aug 2017 | A1 |
20180023125 | Talasaz et al. | Jan 2018 | A1 |
20180171415 | Talasaz et al. | Jun 2018 | A1 |
20180223374 | Talasaz et al. | Aug 2018 | A1 |
20180230530 | Eltoukhy et al. | Aug 2018 | A1 |
20180327862 | Talasaz et al. | Nov 2018 | A1 |
20180336314 | Kural | Nov 2018 | A1 |
20180357367 | Kural | Dec 2018 | A1 |
20190078164 | Talasaz | Mar 2019 | A1 |
20190177802 | Talasaz | Jun 2019 | A1 |
20190177803 | Talasaz | Jun 2019 | A1 |
20190185940 | Talasaz | Jun 2019 | A1 |
20190185941 | Talasaz | Jun 2019 | A1 |
20190272891 | Kural | Sep 2019 | A1 |
20190316185 | Talasaz et al. | Oct 2019 | A1 |
20200032323 | Talasaz et al. | Jan 2020 | A1 |
20200087735 | Talasaz | Mar 2020 | A1 |
20200087736 | Talasaz | Mar 2020 | A1 |
20200115739 | Talasaz et al. | Apr 2020 | A1 |
20200115746 | Talasaz et al. | Apr 2020 | A1 |
20200123602 | Eltoukhy et al. | Apr 2020 | A1 |
20200131568 | Talasaz et al. | Apr 2020 | A1 |
20200168295 | Kural | May 2020 | A1 |
20200224254 | Talasaz et al. | Jul 2020 | A1 |
20200248270 | Talasaz | Aug 2020 | A1 |
20200263239 | Talasaz et al. | Aug 2020 | A1 |
20200291487 | Talasaz | Sep 2020 | A1 |
20200299756 | Talasaz et al. | Sep 2020 | A1 |
20200299785 | Talasaz | Sep 2020 | A1 |
20200325529 | Talasaz et al. | Oct 2020 | A1 |
20200362405 | Talasaz et al. | Nov 2020 | A1 |
20210032707 | Talasaz | Feb 2021 | A1 |
20210040545 | Talasaz et al. | Feb 2021 | A1 |
20210087616 | Talasaz et al. | Mar 2021 | A1 |
20210102243 | Talasaz et al. | Apr 2021 | A1 |
20210130912 | Talasaz | May 2021 | A1 |
20210139998 | Talasaz | May 2021 | A1 |
20210164037 | Talasaz et al. | Jun 2021 | A1 |
20210340632 | Talasaz | Nov 2021 | A1 |
20210355549 | Talasaz | Nov 2021 | A1 |
20210371912 | Talasaz et al. | Dec 2021 | A1 |
20210395814 | Talasaz et al. | Dec 2021 | A1 |
20220042104 | Talasaz | Feb 2022 | A1 |
20220049299 | Talasaz et al. | Feb 2022 | A1 |
20220049300 | Talasaz et al. | Feb 2022 | A1 |
20220119880 | Talasaz et al. | Apr 2022 | A1 |
20220145385 | Talasaz et al. | May 2022 | A1 |
20220205051 | Talasaz | Jun 2022 | A1 |
20220325340 | Talasaz et al. | Oct 2022 | A1 |
20220380842 | Talasaz et al. | Dec 2022 | A1 |
20220389489 | Talasaz et al. | Dec 2022 | A1 |
Number | Date | Country |
---|---|---|
102766688 | Nov 2012 | CN |
103797486 | May 2014 | CN |
104781421 | Jul 2015 | CN |
105637098 | Jun 2016 | CN |
105793689 | Jul 2016 | CN |
105793859 | Jul 2016 | CN |
3097206 | Nov 2016 | EP |
2014-507133 | Mar 2014 | JP |
2015-180193 | Oct 2015 | JP |
2015-536661 | Dec 2015 | JP |
2016-536698 | Nov 2016 | JP |
2017-500004 | Jan 2017 | JP |
2017-33046 | Feb 2017 | JP |
WO 2014041380 | Mar 2014 | WO |
WO 2015048753 | Apr 2015 | WO |
WO 2015112619 | Jul 2015 | WO |
WO 2017053683 | Mar 2017 | WO |
Entry |
---|
DbSNP: a database of single nucleotide polymorphism, Elizabeth M Smigielski et al. Nucleic Acids Research, vol. 28, pp. 352-355. (Year: 2000). |
The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. S Bamford et al. British Journal of Cancer (2004) 91, pp. 355-358. (Year: 2004). |
Principles of analytical validation of next-generation sequencing based mutational analysis for hematologic neoplasms in a CLIA-certified laboratory. Kanagal-Shamanna R, Singh RR, Routbort MJ, Patel KP, Medeiros LJ, Luthra R. Expert Review of Molecular Diagnostics. Volume 16, pp. 461-472. (Year: 2016). |
Bioinformatics for Clinical Next Generation Sequencing. Gavin R Oliver, Steven N Hart, Eric W Klee. Clinical Chemistry, vol. 61, Issue 1, Jan. 1, 2015, pp. 124-135. (Year: 2015). |
Guidelines for Validation of Next-Generation Sequencing-Based Oncology Panels. Lawrence J Jennings, Maria E Arcila, Christopher Corless, Suzanne Kamel-Reid, Ira M Lubin, John Pfeifer, Robyn L Temple-Smolkin, Karl V Voelkerding, Marina N Nikiforova. J Mol Diagn. May 2017; 19(3): pp. 341-365. (Year: 2017). |
Uwe Baier, Timo Beller, Enno Ohlebusch: Graphical pan-genome analysis with compressed suffix trees and the Burrows-Wheeler transform, Bioinformatics, vol. 32, Issue 4, Feb. 15, 2016, pp. 497-504 (Year: 2016). |
Forbes: “COSMIC: exploring the world's knowledge of somatic mutations in human cancer”, Nucleic Acids Research, 2015, vol. 43, Database issue, pp. D805-D811. (Year: 2015). |
Reporting Letter received from Japanese associate dated Jun. 26, 2019 enclosing Extended European Search Report received in European Application No. 18205386.8 dated Apr. 11, 2019, pp. 1-2. |
Extended European Search Report dated Apr. 11, 2019 received in European Application No. 18205386.8, pp. 3-5. |
Office Action in Japanese Application No. 2017-216502, dated Jul. 13, 2021, 8 pages (including English translation), pp. 1-8. |
Communication/Office Action received in European Application No. 18205386.8 dated Feb. 10, 2021, pp. 1-8. |
Chinese Office Action with English Translation, dated Feb. 15, 2023, pp. 1-19, Issued in Chinese patent application No. 201811329017.6, China National Intellectual Property Administration, Beijing, China. |
“Genome Sequencing and Analysis of E. coli H001 Strain and E. coli V0001 Strain, Hongsheng Zhang (Bioinformatics), Directed by Prof. Jun Yu, Beijing Institute of Genomics Chinese Academy of Sciences, Apr. 2011”. |
Notice of Allowance in China Application No. 201811329017.6, including English translation, dated Jul. 7, 2023, 12 pages. |
Number | Date | Country |
---|---|---|
20190156914 A1 | May 2019 | US |