This application claims priority from prior Japanese Patent Application No. 2017-208652, filed on Oct. 27, 2017, entitled “Quality Evaluation Method, Quality Evaluation Apparatus, Program, Storage Medium, and Quality Control Sample”, the entire content of which is incorporated herein by reference.
The present invention relates to a quality evaluation method, a quality evaluation apparatus, a program, a storage medium, and a quality control sample, which are used for a genetic test.
Recent developments in genetic test technology have raised expectations for individualized medical care, in which gene sequences of a subject are analyzed and a therapeutic method or medication is appropriately selected according to the characteristics of the subject. For example, a panel test is known as a method for analyzing gene sequences, in which a high-throughput next-generation sequencer is used to analyze abnormality in a specific gene associated with a specific disease, or abnormality in an exon region that is translated into protein.
In Lih et al., Analytical Validation of the Next-Generation Sequencing Assay for a Nationwide Signal-Finding Clinical Trial, The Journal of Molecular Diagnostics, Vol. 19, No. 2, March 2017 (hereinafter, referred to as Non-Patent Literature 1), a quality control method for a genetic test using a next-generation sequencer is described.
However, quality control in the genetic test field is still in the trial stage, and no quality control method for a genetic test using a next-generation sequencer has been established as a standard. For example, when the technique disclosed in Non-Patent Literature 1 is used as a quality control method for a panel test in which a plurality of genes are to be analyzed, the accuracy of quality evaluation may be low.
The scope of the present invention is defined solely by the appended claims, and is not affected to any degree by the statements within this summary.
In order to solve the aforementioned problem, a quality evaluation method according to one mode of the present invention is directed to a quality evaluation method performed in a genetic test for testing a gene in a sample collected from a subject, for a plurality of types of gene mutations that include a first type gene mutation and a second type gene mutation different from the first type gene mutation, and the quality evaluation method includes preparing a quality control sample that includes a first reference gene having the first type gene mutation, and a second reference gene having the second type gene mutation; obtaining sequence information of the genes included in the quality control sample; and outputting an index for evaluation of a quality of the genetic test, based on the sequence information having been obtained.
A “subject” represents a human subject or a non-human subject such as a mammal, an invertebrate, a vertebrate, a fungus, a yeast, a bacterium, a virus, or a plant. The embodiment herein relates to a human subject, but the concept of the present disclosure can be applied to a genome derived from any organism other than human, such as any animal or any plant, and is useful in fields such as medical care, veterinary medicine, and zoological science.
A “sample” can be also referred to as a specimen, and is used so as to be synonymous with a preparation in this field. A “sample” is intended to mean any preparation obtained from a biological material (for example, individual, body fluid, cell strain, cultured tissue, or tissue section) as a supply source.
A “quality control sample” is intended to mean a preparation that is prepared for performing, for example, pretreatment for analyzing a sequence of a gene, and a process for reading sequence information by the sequencer.
“Mutation” includes substitution, deletion, or insertion of a nucleotide of a gene, gene fusion, or copy number polymorphism. “Substitution” represents a phenomenon in which at least one base in a gene sequence is changed to a different base. “Substitution” includes point mutation and single nucleotide polymorphism. “Deletion” and “insertion” are also referred to as “InDel (Insertion and/or Deletion)”. InDel represents a phenomenon in which insertion and/or deletion occurs in at least one base in a gene sequence. “Gene fusion” represents a phenomenon in which the 5′ side sequence of a gene binds to the 3′ side sequence of another gene due to chromosomal translocation or the like. “Copy number polymorphism” indicates that the number of copies on a genome per cell differs among individuals. Specific examples of the copy number polymorphism include Variable Number of Tandem Repeats (VNTR), Short Tandem Repeat Polymorphism (STRP, microsatellite polymorphism), and gene amplification.
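As an illustration of the small-scale mutation types defined above, the following minimal sketch classifies a variant as substitution, insertion, or deletion by comparing a reference allele with an observed allele. The function name and the allele representation (a pair of strings sharing an anchoring base, similar to common variant-call notation) are assumptions made for this example only. Gene fusion and copy number polymorphism are outside its scope, and a complex change that alters both content and length is classified by length alone.

```python
def classify_small_variant(ref: str, alt: str) -> str:
    """Classify a small variant from its reference and observed alleles.

    Hypothetical helper: alleles are plain strings of bases, and an
    insertion or deletion keeps the shared anchoring base, e.g.
    ref="A", alt="AT" for an inserted T.
    """
    if not ref or not alt:
        raise ValueError("ref and alt must be non-empty")
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "none"            # no change relative to the reference
    if len(ref) == len(alt):
        return "substitution"    # one or more bases changed, same length
    if len(alt) > len(ref):
        return "insertion"       # observed allele is longer
    return "deletion"            # observed allele is shorter


examples = {
    ("A", "G"): classify_small_variant("A", "G"),
    ("A", "AT"): classify_small_variant("A", "AT"),
    ("ATG", "A"): classify_small_variant("ATG", "A"),
}
```

The length comparison is a deliberate simplification; a production variant caller would normalize and left-align alleles before classifying them.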
A quality evaluation apparatus (1) according to one mode of the present invention is directed to a quality evaluation apparatus (1) for evaluating a quality of a genetic test for testing a gene in a sample collected from a subject, for at least a first type gene mutation and a second type gene mutation different from the first type gene mutation, and the quality evaluation apparatus (1) includes a data adjustment unit (113) configured to analyze sequence information of a gene in a quality control sample that includes a first reference gene having the first type gene mutation and a second reference gene having the second type gene mutation; and a quality management unit (117) configured to generate an index for evaluation of a quality of the genetic test, based on the sequence information.
A program according to one mode of the present invention is directed to a quality evaluation program for a genetic test for testing a gene in a sample collected from a subject, for at least a first type gene mutation and a second type gene mutation different from the first type gene mutation, and the program causes a computer to perform obtaining sequence information of a gene in a quality control sample that includes a first reference gene having the first type gene mutation, and a second reference gene having the second type gene mutation; and generating an index for evaluation of a quality of the genetic test, based on the sequence information having been obtained.
A storage medium according to one mode of the present invention is directed to a computer-readable storage medium having stored therein the program according to one mode of the present invention.
A quality control sample according to one mode of the present invention is directed to a quality control sample for use in a genetic test for testing a gene in a sample collected from a subject, for at least a first type gene mutation and a second type gene mutation different from the first type gene mutation, and the quality control sample includes a first reference gene having the first type gene mutation, and a second reference gene having the second type gene mutation.
One embodiment of the present disclosure will be described below in detail.
Firstly, a gene analyzing system 100 according to one embodiment of the present disclosure will be schematically described with reference to
The gene analyzing system 100 shown in
The test institution 120 tests and/or analyzes the sample provided by the medical institution 210, generates a report based on the analysis result, and provides the medical institution 210 with the report. In the test institution 120, a sequencer 2, the quality evaluation apparatus 1, and the like are installed. However, the test institution 120 is not limited thereto.
The analyzing system management institution 130 entirely manages the analyses performed by each test institution 120 that uses the gene analyzing system 100.
The medical institution 210 is an institution in which doctors, nurses, pharmacists, and the like perform medical practice for patients such as diagnosis, treatment, and dispensation. Examples of the medical institution 210 include hospitals, clinics, and pharmacies.
Subsequently, a flow of a process in an exemplary application of the gene analyzing system 100 shown in
Firstly, the test institution 120 that would like to use the gene analyzing system 100 introduces the quality evaluation apparatus 1. The test institution 120 applies for the use of the gene analyzing system 100 to the analyzing system management institution 130 (step S101).
The test institution 120 and the analyzing system management institution 130 can make a desired contract therebetween among a plurality of contract types, in advance, for the use of the gene analyzing system 100. For example, service contents provided to the test institution 120 by the analyzing system management institution 130, a method for determining a system usage fee for which the analyzing system management institution 130 bills the test institution 120, or a method for payment of the system usage fee may be selected from among a plurality of different contract types. The management server 3 of the analyzing system management institution 130 specifies the contents of the contract made with the test institution 120 according to application from the test institution 120 (step S102).
Next, the management server 3 managed by the analyzing system management institution 130 assigns a test institution ID to the quality evaluation apparatus 1 of the test institution 120 with which the analyzing system management institution 130 has made the contract, and starts providing various services (step S103).
The quality evaluation apparatus 1 receives various services from the management server 3. Examples of various services include providing of programs and information for controlling; an analysis result of gene sequences which can be outputted from the quality evaluation apparatus 1; a report based on the analysis result; and the like. Thus, the quality evaluation apparatus 1 can output, for example, the analysis result and the report which correspond to inputted gene-panel-associated information.
In many cases, a gene panel includes a set of reagents such as a primer and a probe. The analysis using the gene panel is not limited thereto, and the gene panel may be used for analysis of polymorphism such as single nucleotide polymorphism (SNP) and copy number polymorphism (also referred to as copy number variation, CNV). The gene panel may also be used for outputting information associated with the amount of mutation across all of the genes to be analyzed (also referred to as tumor mutation burden), or for calculating methylation frequency.
In the medical institution 210, a doctor or the like collects, as appropriate, a sample such as blood and tissue of a lesion site of a subject. When a request for analyzing the collected sample is made to the test institution 120, for example, the request for analysis is transmitted via a communication terminal 5 disposed in the medical institution 210 (step S105). When a request for analyzing a sample is made to the test institution 120, the medical institution 210 transmits the request for analysis to the test institution 120 and provides the test institution 120 with a sample ID assigned to each sample. The sample ID assigned to each sample is used for associating, for example, information on a subject from which each sample is collected, and the sample with each other.
Hereinafter, an exemplary case where the medical institution 210 requests the test institution 120 to perform a panel test analysis, will be described. The panel tests include not only clinical laboratory tests but also tests for research.
When a request for a gene panel test is made by the medical institution 210, a desired gene panel may be designated. Therefore, in step S105 in
The quality evaluation apparatus 1 receives a request for analysis from the medical institution 210 (S106). The quality evaluation apparatus 1 receives a sample from the medical institution 210 that has transmitted the request for analysis.
The number of gene panels usable in the analysis for which the medical institution 210 makes a request to the test institution 120, is plural, and a gene group to be analyzed is determined for each gene panel. In the test institution 120, a plurality of gene panels can be selectively used according to the purpose of the analysis. That is, for a first sample provided by the medical institution 210, a first gene panel can be used in order to analyze a first gene group to be analyzed, and, for a second sample, a second gene panel can be used in order to analyze a second gene group to be analyzed.
In the present embodiment, for example, a first type gene mutation is “substitution” and a second type gene mutation is “deletion”. In this case, in the genetic test to be performed for quality control, at least the presence or absence of substitution and deletion, and their types, are tested. In the present embodiment, a first reference gene includes a specific substitution mutation with respect to a wild-type gene sequence, and a second reference gene includes a specific deletion mutation with respect to a wild-type gene sequence.
The quality evaluation apparatus 1 receives, from a user, an input of gene-panel-associated information on a gene panel to be used for analyzing a sample (step S107).
In the test institution 120, pretreatment is performed for the received sample by using the gene panel, and sequencing is performed by using the sequencer 2 (step S108).
In the test institution 120, for a predetermined quality control sample corresponding to a gene panel, pretreatment is performed by using the gene panel, and sequencing is performed by using the sequencer 2 (step S108), separately from a general sample sequencing, thereby controlling accuracy.
When the quality control sample is subjected to genetic test such as pretreatment, sequencing, and sequence analysis, the result of the genetic test is used as a quality evaluation index of the panel test.
One or a plurality of quality control samples may be associated with each gene panel. For example, the quality control sample(s) corresponding to each gene panel may be prepared in advance. The quality control sample(s) may be measured alone, or may be measured together with a sample provided from the medical institution 210.
In the description herein, the quality control sample is a sample for quality control used in the genetic test in which the first type gene mutation and the second type gene mutation different from the first type gene mutation are tested. The “quality control sample” is a preparation that includes a first reference gene including the first type gene mutation and a second reference gene including the second type gene mutation.
The pretreatment may include processing from fragmenting genes such as DNA included in a sample to collecting the fragmented genes. The sequencing includes processing of reading sequences of one or a plurality of DNA fragments, to be analyzed, which are collected in the pretreatment. The sequence information read in the sequencing by the sequencer 2 is outputted as read sequence information to the quality evaluation apparatus 1.
The pretreatment may include processing from fragmenting genes such as DNA included in a sample and a quality control sample to collecting the fragmented genes.
The read sequence represents polynucleotide sequence obtained by sequencing, and represents a sequence outputted by the sequencer 2.
The sequencer 2 may output, to the quality evaluation apparatus 1, the read sequence information that includes a quality score which is a quality evaluation index for the process step of reading the gene sequences. The sequencer 2 may output, to the quality evaluation apparatus 1, a cluster concentration that is a quality evaluation index for a process step of amplifying DNA fragments to be analyzed. The “quality score” and the “cluster concentration” will be described below.
The number of the gene panels which can be used in analysis for which the medical institution 210 makes a request to the test institution 120 is plural, and a gene group to be analyzed is determined for each gene panel. The test institution 120 can selectively use the plurality of gene panels according to the purpose of the analysis. That is, for the first sample provided by the medical institution 210, the first gene panel is used for analyzing the first gene group to be analyzed, and, for the second sample, the second gene panel can be used for analyzing a second gene group to be analyzed.
The quality evaluation apparatus 1 obtains the read sequence information from the sequencer 2 and analyzes gene sequences (step S109).
The quality control sample is also processed in the same process step as performed in the panel test for the sample from the medical institution 210, and sequence information of genes in the quality control sample is analyzed. A quality evaluation index for evaluating the quality of the panel test is generated based on the result of analyzing the quality control sample.
Next, the quality evaluation apparatus 1 evaluates the quality of the panel test based on the generated quality evaluation index (step S110). Specifically, the quality evaluation apparatus 1 can evaluate the quality of each panel test, based on a result of comparison between the evaluation criterion which is set for each quality evaluation index, and the generated quality evaluation index.
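The comparison between each generated quality evaluation index and its evaluation criterion can be sketched as follows. This is a minimal illustration, not the apparatus's actual logic: the index names (`mutation_detection_rate`, `mean_read_depth`), the threshold values, and the rule that a missing index counts as a failure are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Criterion:
    """Acceptable range for one quality evaluation index (bounds inclusive)."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def is_met(self, value: float) -> bool:
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


def evaluate_panel_test(indices: Dict[str, float],
                        criteria: Dict[str, Criterion]) -> Dict[str, bool]:
    """Compare each generated index with the criterion set for it.

    Assumption: an index that has a criterion but was not generated
    is treated as not meeting the criterion.
    """
    return {
        name: (name in indices and criterion.is_met(indices[name]))
        for name, criterion in criteria.items()
    }


# Hypothetical index names and thresholds, for illustration only.
criteria = {
    "mutation_detection_rate": Criterion(minimum=0.95),
    "mean_read_depth": Criterion(minimum=500.0),
}
indices = {"mutation_detection_rate": 1.0, "mean_read_depth": 420.0}

result = evaluate_panel_test(indices, criteria)
panel_test_passed = all(result.values())
```

Keeping the per-index results separate from the overall verdict lets a report show which criterion was not met rather than only a pass or fail flag.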
The quality evaluation apparatus 1 generates a report based on the result of the analysis in step S109, and an index generated based on the result of analyzing the quality control sample (step S111), and transmits the generated report to the communication terminal 5 (step S112). For example, the report may include data of an alignment result of the read sequence information; data of a result of the analysis itself by the quality evaluation apparatus 1, such as data associated with an identified mutation or the like; and information associated with the quality of the panel test.
The generated report may be printed in the test institution 120. For example, the test institution 120 may transmit the generated report as a paper medium to the medical institution 210.
The quality evaluation apparatus 1 of the test institution 120 that uses the gene analyzing system 100 notifies the management server 3 of gene-panel-associated information used in the analysis, information associated with the analyzed genes, analysis record, the quality evaluation index generated for the genetic test having been performed, and the like (step S114).
The management server 3 obtains a test institution ID, a gene panel ID, a gene ID, an analysis record, and the like via, for example, a network 4 from the quality evaluation apparatus 1 of each test institution 120 that uses the gene analyzing system 100. The management server 3 stores the obtained test institution ID, gene panel ID, gene ID, analysis record, and the like so as to associate them with each other (step S115).
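One way to picture how the management server 3 might store these items in association with each other is the following minimal sketch. The class names, field names, and in-memory storage are assumptions made for illustration; the source does not specify a data model or a storage technology.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AnalysisRecord:
    """One notification from a quality evaluation apparatus (hypothetical shape)."""
    test_institution_id: str
    gene_panel_id: str
    gene_ids: Tuple[str, ...]
    analysis_record: str


class ManagementStore:
    """In-memory stand-in for the associated storage on the management server."""

    def __init__(self) -> None:
        self._by_institution: Dict[str, List[AnalysisRecord]] = defaultdict(list)

    def store(self, record: AnalysisRecord) -> None:
        # Keep every item of the notification together, keyed by institution.
        self._by_institution[record.test_institution_id].append(record)

    def records_for(self, test_institution_id: str) -> List[AnalysisRecord]:
        return list(self._by_institution.get(test_institution_id, []))

    def panel_usage(self, test_institution_id: str) -> Dict[str, int]:
        """Count how many analyses each gene panel was used for."""
        counts: Dict[str, int] = defaultdict(int)
        for record in self.records_for(test_institution_id):
            counts[record.gene_panel_id] += 1
        return dict(counts)


# Hypothetical IDs, for illustration only.
store = ManagementStore()
store.store(AnalysisRecord("INST-01", "GP-001", ("GENE_A", "GENE_B"), "run 1"))
store.store(AnalysisRecord("INST-01", "GP-001", ("GENE_A",), "run 2"))
store.store(AnalysisRecord("INST-02", "GP-002", ("GENE_C",), "run 1"))
usage = store.panel_usage("INST-01")
```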
The test institution ID is information for specifying a user who performs gene sequence analysis, and may be a user ID that is identification information assigned to each user that uses the quality evaluation apparatus 1.
The gene panel ID is identification information provided for specifying a gene panel used for analyzing a target gene. The gene panel ID assigned to the gene panel is associated with a gene panel name, a name of a company that provides the gene panel, and the like.
The gene ID is identification information provided to each gene for specifying a gene to be analyzed.
The gene analyzing system 100 analyzes gene sequence information, and includes at least the quality evaluation apparatus 1 and the management server 3. The quality evaluation apparatus 1 is connected to the management server 3 via the network 4 such as an intranet and the Internet.
The sequencer 2 is a base sequence analyzing device used for reading a base sequence of a gene included in a sample.
The sequencer 2 according to the present embodiment is preferably a next-generation sequencer that performs sequencing using a next-generation sequencing technique, or a third-generation sequencer. The next-generation sequencer is one of a series of base sequence analyzing devices that have been developed in recent years, and has a significantly improved analytical capability achieved by performing, in a flow cell, parallel processing of a large number of single DNA molecules or clonally amplified DNA templates.
Sequencing technology usable in the present embodiment may be sequencing technology in which a plurality of reads are obtained by reading the same region multiple times (deep sequencing).
Examples of the sequencing technology usable in the present embodiment include technologies that can obtain multiple reads per run and are based on a sequencing principle other than the Sanger method, such as ion semiconductor sequencing, pyrosequencing, sequencing-by-synthesis using a reversible dye terminator, sequencing-by-ligation, and sequencing by probe ligation of oligonucleotides.
The sequencing primer used in the sequencing is not particularly limited, and is set as appropriate based on the sequence suitable for amplifying a target region. A reagent used in the sequencing may be also suitably selected according to the sequencing technology and the sequencer 2 to be used. A procedure from the pretreatment to the sequencing will be described below by using a specific example.
In the storage unit 12, a program for sequence analysis, a program for generating a single reference sequence, and the like are stored. The output unit 13 includes a display, a printer, a speaker, and the like. An input unit 17 includes a keyboard, a mouse, a touch sensor, and the like. A device, which functions as both the input unit and the output unit, such as a touch panel having a touch sensor and a display integrated with each other may be used. A communication unit 14 is an interface that allows the controller 11 to communicate with an external device.
The quality evaluation apparatus 1 includes the controller 11 that comprehensively controls the components of the quality evaluation apparatus 1; the storage unit 12 that stores various data used by an analysis execution unit 110; the output unit 13, the communication unit 14, and the input unit 17. The controller 11 includes the analysis execution unit 110 and a management unit 116. The analysis execution unit 110 includes a sequence data reading unit 111, an information selection unit 112, a data adjustment unit 113, a mutation identifying unit 114, and a report generation unit 115. In the storage unit 12, a gene-panel-associated information database 121, a reference sequence database 122, a mutation database 123, and an analysis record log 151 are stored.
The quality evaluation apparatus 1 generates, even when a gene panel to be used is changed for each analysis, a report that includes a result of analysis corresponding to the gene panel having been used. A user using the gene analyzing system 100 is allowed to analyze the result of a panel test by a common analysis program to generate a report regardless of a type of the gene panel. Therefore, when the panel test is performed, a bothersome operation such as selecting an analysis program to be used for each gene panel, and performing specific setting for the analysis program for each gene panel to be used is omitted, thereby improving usability for a user.
When a user of the quality evaluation apparatus 1 inputs gene-panel-associated information from the input unit 17, the information selection unit 112 refers to the gene-panel-associated information database 121, and controls an algorithm of an analysis program so as to execute analysis of a gene to be analyzed by the analysis program according to the inputted gene-panel-associated information.
In the description herein, the gene-panel-associated information may be any information for specifying a gene panel used for measurement by the sequencer 2, and represents, for example, a gene panel name, a name of a gene to be analyzed in the gene panel, and a gene panel ID.
The information selection unit 112 changes an analysis algorithm so as to perform analysis corresponding to a gene to be analyzed in the gene panel indicated by the gene-panel-associated information, based on the gene-panel-associated information which is inputted by the input unit 17.
The information selection unit 112 outputs an instruction based on the gene-panel-associated information to at least one of the data adjustment unit 113, the mutation identifying unit 114, and the report generation unit 115. By using this configuration, the quality evaluation apparatus 1 can output a result of the analysis of the read sequence information, based on the inputted gene-panel-associated information.
That is, the information selection unit 112 is a functional block for performing control so as to obtain gene-panel-associated information on gene panels that include a plurality of genes to be analyzed, and causing the output unit 13 to output a result of analysis of the read sequence information, based on the obtained gene-panel-associated information.
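The lookup performed by the information selection unit 112 can be sketched as resolving the inputted gene-panel-associated information (here a panel name or a panel ID) to the set of genes to be analyzed. The database contents, the identifiers, and the function name below are hypothetical and serve only to illustrate the idea.

```python
from typing import FrozenSet, NamedTuple


class GenePanel(NamedTuple):
    panel_id: str
    name: str
    genes: FrozenSet[str]


# Hypothetical contents of a gene-panel-associated information database.
PANEL_DATABASE = (
    GenePanel("GP-001", "Panel A", frozenset({"GENE_A", "GENE_B"})),
    GenePanel("GP-002", "Panel B", frozenset({"GENE_B", "GENE_C", "GENE_D"})),
)


def select_panel(panel_info: str) -> GenePanel:
    """Resolve user input (panel ID or panel name) to a database entry."""
    for panel in PANEL_DATABASE:
        if panel_info in (panel.panel_id, panel.name):
            return panel
    raise KeyError(f"unknown gene panel: {panel_info!r}")


# The same analysis code can then be driven by whichever panel was input.
genes_for_first_sample = select_panel("Panel A").genes
genes_for_second_sample = select_panel("GP-002").genes
```

Because the gene set is returned as data, downstream steps can be parameterized by it instead of requiring a separate analysis program per panel.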
When genes included in various samples are analyzed by a user that performs the panel test, various gene panels are used according to gene groups, to be analyzed, for each sample.
That is, the quality evaluation apparatus 1 can obtain first read sequence information read by using the first gene panel for analyzing, from the first sample, the first gene group to be analyzed, and second read sequence information read by using the second gene panel for analyzing, from the second sample, the second gene group to be analyzed.
Even when various combinations of genes to be analyzed are analyzed by using various gene panels, the quality evaluation apparatus 1 can appropriately output results of analyses obtained by analyzing the read sequence information since the quality evaluation apparatus 1 includes the information selection unit 112.
That is, a user merely selects gene-panel-associated information without setting an analysis program used for analyzing the read sequence information and performing analysis for each gene to be analyzed, whereby a result of analysis of each piece of the read sequence information can be appropriately outputted.
For example, when the information selection unit 112 outputs, to the data adjustment unit 113, an instruction based on the gene-panel-associated information, the data adjustment unit 113 performs, for example, an alignment process based on the gene-panel-associated information.
According to the gene-panel-associated information, the information selection unit 112 makes an instruction for limiting the reference sequence (reference sequence in which wild-type genome sequence and mutation sequence are incorporated) used by the data adjustment unit 113 in mapping of the read sequence information, only to the reference sequence associated with a gene corresponding to the gene-panel-associated information.
In this case, since the gene-panel-associated information has already been reflected on the result of the process by the data adjustment unit 113, the information selection unit 112 need not output an instruction based on the gene-panel-associated information, to the mutation identifying unit 114 that performs a process subsequent to the process performed by the data adjustment unit 113.
For example, when the information selection unit 112 outputs, to the mutation identifying unit 114, an instruction based on the gene-panel-associated information, the mutation identifying unit 114 performs a process in which the gene-panel-associated information is reflected.
For example, according to the gene-panel-associated information, the information selection unit 112 makes an instruction for limiting a region of the mutation database 123 to which the mutation identifying unit 114 refers, only to mutation associated with a gene corresponding to the gene-panel-associated information. Thus, gene-panel-associated information is reflected on the result of the process by the mutation identifying unit 114.
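The two limiting instructions described above, one restricting the reference sequences used for mapping and one restricting the region of the mutation database that is referred to, can both be sketched as filters keyed on the genes of the selected panel. The data shapes (a dictionary of reference sequences keyed by gene, and mutation entries as dictionaries with a `gene` key), the toy sequences, and the function names are assumptions for this example.

```python
from typing import Dict, Iterable, List, Set


def limit_reference_sequences(reference_sequences: Dict[str, str],
                              panel_genes: Set[str]) -> Dict[str, str]:
    """Keep only reference sequences associated with genes in the panel."""
    return {gene: seq for gene, seq in reference_sequences.items()
            if gene in panel_genes}


def limit_mutation_entries(mutation_entries: Iterable[dict],
                           panel_genes: Set[str]) -> List[dict]:
    """Keep only mutation database entries associated with genes in the panel."""
    return [entry for entry in mutation_entries if entry["gene"] in panel_genes]


# Toy data, for illustration only.
reference_sequences = {"GENE_A": "ACGTAC", "GENE_B": "TTGACA", "GENE_C": "GGCATT"}
mutation_entries = [
    {"gene": "GENE_A", "mutation": "SNV"},
    {"gene": "GENE_C", "mutation": "Deletion"},
]
panel_genes = {"GENE_A", "GENE_B"}

limited_refs = limit_reference_sequences(reference_sequences, panel_genes)
limited_mutations = limit_mutation_entries(mutation_entries, panel_genes)
```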
A flow of a process of analyzing gene sequences of a sample and a quality control sample will be described with reference to
Firstly, in step S31 in
Next, in step S32, sequences of genes included in the sample and the quality control sample having been subjected to the pretreatment are read by the sequencer 2.
Step S32 is, specifically, a step of reading sequences of one or a plurality of DNA fragments, to be analyzed, which have been collected after the pretreatment. The read sequence information includes the gene sequence which is read in this step. One or a plurality of DNA fragments, to be analyzed, which have been collected after the pretreatment may be also referred to as “library”.
Subsequently, when the quality control sample is measured, the quality evaluation apparatus 1 analyzes the read gene sequence and specifies presence or absence of mutation in the sequence, a position of the mutation, a type of the mutation, and the like in step S33. By the read gene sequence being analyzed, the detected mutation is identified.
Subsequently, in step S34, the quality evaluation apparatus 1 generates a quality evaluation index for evaluating the quality of the panel test. The quality evaluation apparatus 1 may evaluate the quality of the panel test having been performed, based on the generated quality evaluation index.
Finally, the quality evaluation apparatus 1 generates a report that includes a result of the analysis such as information associated with the mutation identified in step S33, and information, representing the quality of the panel test, such as the quality evaluation index generated in step S34. The generated report is provided to the medical institution 210.
Next, a procedure of the pretreatment in step S31 shown in
When DNA is extracted from each of the sample and the quality control sample to perform sequence analysis, DNA is firstly extracted from the sample that includes genes to be analyzed, and the quality control sample corresponding to the gene panel to be used (step 300 in
In this case, the DNA derived from the sample and the DNA derived from the quality control sample are each subjected to the process of step S301 and the subsequent steps.
The DNA extracted from the quality control sample is subjected to the same process as for the DNA extracted from the sample, whereby a quality evaluation index useful for evaluating the quality of the sequence analysis in the panel test can be generated.
The usage of the quality control sample is not limited thereto. For example, as shown in
Alternatively, as shown in
By comparison between a result of analysis of DNA derived from the quality control sample that includes mutation and a result of analysis of DNA derived from the quality control sample that does not include mutation, a quality evaluation index useful for evaluating the quality of the sequence analysis in the panel test can be generated.
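The source does not define a specific index at this point, so the following is only one possible sketch of such a comparison. It assumes that the mutations contained in the mutated quality control sample are known in advance, and it computes (1) the fraction of those expected mutations that were detected in the mutated sample and (2) the number of those same mutations that were nevertheless reported for the mutation-free sample. The tuple representation of a mutation and all names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: (gene name, mutation label).
Mutation = Tuple[str, str]


def quality_index(expected: Set[Mutation],
                  calls_mutated_qc: Set[Mutation],
                  calls_wild_type_qc: Set[Mutation]) -> Dict[str, float]:
    """Compare calls from a mutated and a mutation-free quality control sample."""
    if not expected:
        raise ValueError("expected mutations must not be empty")
    detected = expected & calls_mutated_qc
    unexpected_in_wild_type = expected & calls_wild_type_qc
    return {
        # Share of known reference mutations that the test actually found.
        "detection_rate": len(detected) / len(expected),
        # Known reference mutations reported where none should be present.
        "expected_mutations_seen_in_wild_type": float(len(unexpected_in_wild_type)),
    }


# Toy data, for illustration only.
expected = {("GENE_A", "SNV"), ("GENE_B", "Insertion")}
calls_mutated_qc = {("GENE_A", "SNV")}
calls_wild_type_qc: Set[Mutation] = set()

index = quality_index(expected, calls_mutated_qc, calls_wild_type_qc)
```

A detection rate below 1.0 would suggest a missed reference mutation, while a nonzero count in the mutation-free sample would suggest false detections or cross-contamination.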
Furthermore, as shown in
In the process of step S301 and the subsequent steps, DNA derived from the sample and DNA derived from the quality control sample may be mixed to perform the process of step S301 and the subsequent steps without individually processing the DNA derived from the sample and the DNA derived from the quality control sample. Thus, in all the process of step S301 and the subsequent steps, the conditions for both of the samples are the same, whereby the quality evaluation index can be more accurately generated. A part of the lanes in the flow cell used for the sequencer 2 need not be used only for DNA fragments prepared from the quality control sample. Thus, the limited number of lanes can be effectively used for DNA fragments derived from a sample that includes genes to be analyzed.
In this case, (1) a reagent for appropriately fragmenting a reference gene included in a quality control sample and a gene to be analyzed in the panel test to prepare a library, and (2) a reagent that contains RNA baits for appropriately capturing the DNA fragments, respectively, after the reference gene included in the quality control sample and the gene to be analyzed in the panel test are fragmented, are preferably utilized.
According to one embodiment, the quality control sample is a composition containing a plurality of reference genes. The quality control sample can be prepared by mixing a plurality of reference genes. A reagent in which the reference genes are mixed and stored in a single container can be provided as the quality control sample to a user. Alternatively, a plurality of reference genes stored in separate containers may be provided in the form of a kit as the quality control sample to a user. The quality control sample may be in the form of a solution or may be in a solid state (powder). When the quality control sample is provided in the form of a solution, an aqueous solvent known to a person skilled in the art, such as water or TE buffer, can be used as the solvent.
The quality control sample will be described with reference to
A quality control sample A1 corresponding to a gene panel A includes at least two of a reference gene including SNV, a reference gene including Insertion, a reference gene including Deletion, a reference gene including CNV, and a reference gene including Fusion. For example, the quality control sample A1 includes, as the reference genes, a partial sequence of a gene A including “SNV” with respect to a wild-type gene, and a partial sequence of a gene B including “Insertion” with respect to a wild-type gene.
The first reference gene and the second reference gene included in the quality control sample may be different DNA molecules, or may be ligated to each other. When the first reference gene and the second reference gene are ligated to each other, the sequence of the first reference gene and the sequence of the second reference gene may be directly ligated, or a spacer sequence may intervene between the sequence of the first reference gene and the sequence of the second reference gene.
The spacer sequence is preferably a sequence which is less likely to be included in a specimen used for the genetic test. For example, the spacer sequence may be a sequence consisting only of a plurality (for example, 100) of consecutive adenine bases.
The reference gene may be a gene included in a gene panel to be analyzed, or a gene which is not included in the gene panel to be analyzed. The reference gene may be a gene of the biological species for which the genetic test is to be performed, or a gene of a different biological species. For example, when the genetic test is performed for a human, the reference gene may be a gene of a non-human animal, a plant, or a bacterium.
A method for synthesizing the reference gene is not particularly limited. For example, the reference gene can be synthesized by a known DNA synthesizer. Alternatively, the reference gene may be obtained by amplifying, by PCR, a template gene derived from an organism and purifying the amplification product. The reference gene may also be obtained by performing PCR amplification using, as a template, a reference gene synthesized by a DNA synthesizer, and then performing purification.
The length of the reference gene is not particularly limited. For example, the length of the reference gene may be 50 nucleotides or more. When amplification by PCR is performed, a length of 2000 nucleotides or less is advantageous because amplification can be performed with ease. When the reference gene is synthesized by a DNA synthesizer, a reference gene of up to several kbp can be synthesized.
The concentration of the reference genes in the quality control sample is not particularly limited. For example, the concentration of the reference genes can be approximately the same as a DNA concentration in the specimen.
The reference gene in the quality control sample may be single-stranded or double-stranded. The reference gene may be linear or circular.
Hereinafter, one example of preparation of the quality control sample will be specifically described.
A reference gene having a sequence represented by sequence number 1 is synthesized by a known DNA synthesizer. The synthesized DNA is amplified by PCR by using a commercially available reagent that contains DNA polymerase, dNTPs, and buffer.
The amplification product is subjected to agarose gel electrophoresis, and the band near 500 bp is cut out. The DNA is purified from the excised gel by a conventional method. After the purification, the DNA is quantified and diluted with a TE buffer to a desired concentration, whereby the reference gene having the sequence represented by sequence number 1 is obtained.
A reference gene having a sequence represented by sequence number 3 is synthesized by a known DNA synthesizer. The synthesized DNA is amplified by PCR by using a commercially available reagent that contains DNA polymerase, dNTPs, and buffer.
Similarly to (1) described above, a reference gene having the sequence represented by sequence number 3 is obtained.
Reference DNA molecules having the sequence represented by sequence number 1 and reference DNA molecules having the sequence represented by sequence number 3 are mixed at a desired concentration, to prepare a quality control sample. The quality control sample is mixed with a specimen to prepare a sample for sequence analysis.
The quality of gene panel test is evaluated by using the prepared sample for sequence analysis by a next-generation sequencer (for example, NextSeq500 manufactured by Illumina, Inc.). In the gene panel, a plurality of genes that include PIK3CA genes and EML4-ALK fusion genes are target genes. The genomic DNA derived from the specimen in the sample for sequence analysis, and the reference gene are subjected to the pretreatment (fragmentation, DNA concentration, PCR amplification using tag primer, and the like) and the sequence analysis, to obtain sequence information of the target genes. In the sequence analysis, an index for quality control is obtained, and the quality of a result of analysis of the target gene is evaluated based on the index of sequence analysis of the reference DNA molecules. A user is allowed to determine reliability of the result of analysis of the gene to be analyzed, based on the result of the quality evaluation.
In the example described above, in (3), the quality control sample and the specimen are mixed. However, each of the quality control sample and the specimen may be separately subjected to the sequence analysis without mixing them.
When the panel test using the same gene panel is repeated, the same quality control sample may be repeatedly used. As indicated by data 121D in
When a plurality of quality control samples having different combinations of reference genes are selectively used for each panel test, on a weekly basis, or on a monthly basis, the quality evaluation index for evaluating the quality of the process of detecting mutation in the panel test can be generated by detecting mutations of the increased number of kinds of reference genes. Therefore, the comprehensiveness of the quality control in the panel test is improved.
For example,
Next, as shown in
Next, as shown in
The adapter sequence is a sequence used for executing the sequencing in the following process steps. According to one embodiment, in the bridge PCR method, the adapter sequence can be a sequence which hybridizes with oligo DNA immobilized on a flow cell.
In one mode, as shown in the upper part in
The adapter sequences can be added to the DNA fragment by using a known method in this technical field. For example, the DNA fragment may be blunted and ligated with an index sequence, and, thereafter, may be further ligated with the adapter sequences.
Next, as shown in
The biotinylated RNA bait library is formed from a biotinylated RNA (hereinafter, referred to as RNA bait) which is hybridized with a gene to be analyzed. The RNA bait may have any length. For example, long oligo RNA bait having about 120 bp may be used in order to enhance specificity.
In the panel test using the sequencer 2 according to the present embodiment, multiple genes (for example, greater than or equal to 100 genes) are to be analyzed.
A reagent used in the panel test includes a set of RNA baits corresponding to the multiple genes, respectively. When the panel is different, the number and the kinds of genes to be tested are different, and therefore the set of RNA baits contained in the reagent used in the panel test is also different. When a gene different from a gene to be analyzed is used as a reference gene, a bait that binds to the reference gene needs to be prepared.
As shown in
Thus, as indicated in the mid-part in
Thus, the DNA fragments which are hybridized with the RNA baits, that is, the DNA fragments to be analyzed can be selected and concentrated. The sequencer 2 reads nucleic acid sequences of the DNA fragments selected by using a plurality of RNA baits, thereby obtaining a plurality of read sequences.
Next, the procedure of step S32 in
As shown in
Firstly, as indicated in the right part in
Subsequently, as shown in
That is, two different kinds of adapter sequences (for example, adapter 1 sequence and adapter 2 sequence in
The adapter 2 sequence on the 3′ end side is immobilized on the flow cell in advance, and the adapter 2 sequence on the 3′ end side of the DNA fragment binds to the adapter 2 sequence on the 3′ end side on the flow cell to form a bridged state, thereby forming a bridge (“3” in
When DNA elongation by DNA polymerase is caused in this state (“4” in
Forming of the bridge, DNA elongation, and denaturation as described above are repeatedly performed in order, respectively, whereby multiple single-stranded DNA fragments are locally amplified and immobilized to form a cluster (“6” to “10” in
As shown in
Firstly, to the single-stranded DNA (the upper left part in
The sequence primer may be designed, for example, to be hybridized with a part of the adapter sequence. In other words, the sequence primer may be designed so as to amplify the DNA fragment derived from the sample DNA, and, when an index sequence is added, the sequence primer may be further designed so as to amplify the index sequence.
After the sequence primer is added, one base elongation is caused, by the DNA polymerase, for dNTP which is labeled with fluorescence and has 3′ end blocked. Since the dNTP having 3′ end side blocked is used, polymerase reaction stops when one base elongation has been caused. The DNA polymerase is removed (the right center part in
To determine the four kinds of bases, a photograph is taken with a fluorescence microscope for each of the fluorescent colors corresponding to A, C, G, and T while the wavelength filter is changed. After all the photographs have been obtained, the bases are determined from the photograph data. The fluorescent substance and the protecting group that blocks the 3′ end side are then removed, and the subsequent polymerase reaction is caused. This flow is set as one cycle, and the second cycle, the third cycle, and so on are repeatedly performed, whereby sequencing over the entire length can be performed.
In the above-described manner, the length of the chain which can be analyzed reaches 150 bases×2, and analysis can be performed in units which are much smaller than those of a picotiter plate. Because of this high density, a huge amount of sequence information, corresponding to 40 to 200 Gb, can be obtained in one analysis.
(c. Gene Panel)
The gene panel used for reading the read sequence by the sequencer 2 represents an analysis kit for analyzing a plurality of targets to be analyzed in one run as described above. According to one embodiment, the gene panel can be an analysis kit for analyzing a plurality of gene sequences associated with a specific disease.
In the description herein, the term “kit” represents a package that includes a container (for example, a bottle, plate, tube, or dish) that contains a specific material therein. The kit preferably includes an instruction for use of each material. In the description herein, with respect to the kit, “include (is included)” represents a state of being included in any of the individual containers that form the kit. The kit can be a package in which a plurality of different compositions are packaged into one, and the mode of the compositions can be as described above. In the case of solution form, the solution may be contained in the container.
In the kit, one container may contain a material A and a material B in a mixed manner, or the material A and the material B may be contained in separate containers, respectively. The “instruction” indicates the procedure of applying the components in the kit to treatment and/or diagnosis. The “instruction” may be written or printed on paper or another medium. Alternatively, the “instruction” may be stored in an electronic medium such as a magnetic tape, a computer-readable disc or tape, or a CD-ROM. The kit may also include a container in which a diluent, a solvent, a washing liquid, or another reagent is stored. The kit may also include equipment necessary for applying the kit to treatment and/or diagnosis.
In one embodiment, the gene panel may include one or more of the quality control sample, reagents such as a reagent for fragmenting nucleic acid, a reagent for ligation, washing liquid, and PCR reagent (dNTP, DNA polymerase or the like), and magnetic beads, as described above. The gene panel may also include one or more of oligonucleotide for adding an adapter sequence to fragmented DNA, oligonucleotide for adding an index sequence to fragmented DNA, the RNA bait library, and the like.
In particular, the index sequence included in each gene panel can be a sequence, specific to the gene panel, for identifying the gene panel. The RNA bait library included in each gene panel may be a library, specific to the gene panel, which includes an RNA bait corresponding to each test gene of the gene panel.
Subsequently, the sequence data reading unit 111, the data adjustment unit 113, and the mutation identifying unit 114 of the analysis execution unit 110 will be described based on the flow of the process shown in
Firstly, in step S11 shown in
The read sequence information is data representing a base sequence read by the sequencer 2. The sequencer 2 performs sequencing of multiple nucleic acid fragments obtained by using a specific gene panel, and reads the sequence information therein, and provides the quality evaluation apparatus 1 therewith as the read sequence information.
In one mode, the read sequence information may include the quality score of each base in the sequence as well as the sequence having been read. Both the read sequence information obtained by subjecting, to the sequencer 2, the FFPE sample collected from a lesion site of a subject and the read sequence information obtained by subjecting, to the sequencer 2, a blood sample of the subject are inputted to the quality evaluation apparatus 1.
Q = −10 log₁₀ E
In this equation, E represents an estimated value of the probability of incorrect base assignment. The greater the value of Q is, the lower the probability of error is. The smaller the value of Q is, the larger the portion of the read that cannot be used is.
In addition, when the value of Q is small, false-positive mutation assignments increase, and the accuracy of the result may be lowered. The “false-positive” means that the read sequence is determined as having mutation although the read sequence does not have the true mutation to be determined.
“Positive” means that the read sequence has true mutation to be determined, and “negative” means that the read sequence does not have mutation to be determined. For example, if the quality score is 20, the probability of error is 1/100. Therefore, this means that the accuracy (also referred to as “basecall accuracy”) for each base in the gene sequence having been read is 99%.
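The relation between Q and E described above can be checked numerically. The following is a minimal sketch in Python; the function names are illustrative and do not appear in the embodiment.

```python
import math


def error_probability(q: float) -> float:
    """Estimated probability E of an incorrect base assignment for quality score Q."""
    return 10 ** (-q / 10)


def quality_score(e: float) -> float:
    """Quality score Q = -10 * log10(E) for an error probability E in (0, 1]."""
    if not 0 < e <= 1:
        raise ValueError("E must be in the range (0, 1]")
    return -10 * math.log10(e)


def basecall_accuracy(q: float) -> float:
    """Per-base accuracy, that is, 1 - E."""
    return 1 - error_probability(q)


# A quality score of 20 corresponds to an error probability of about 1/100
# and a basecall accuracy of about 99%.
print(error_probability(20))
print(basecall_accuracy(20))
```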
Subsequently, in step S12 in
The data adjustment unit 113 performs alignment for both the read sequence information obtained by subjecting, to the sequencer 2, the FFPE sample collected from a lesion site of a subject, and the read sequence information obtained by subjecting, to the sequencer 2, a blood sample of the subject.
The reference sequence information represents, for example, the reference sequence name (reference sequence ID) in the reference sequence database 122, and the sequence length of the reference sequence. The read sequence name is information that represents the name (read sequence ID) of each read sequence for which the alignment has been performed. The position information represents the position (Leftmost mapping position) on the reference sequence at which the leftmost base of the read sequence has been mapped. The mapping quality is information that represents the quality of mapping corresponding to the read sequence. The sequence is information that represents the base sequence (for example, . . . GTAAGGCACGTCATA) corresponding to each read sequence.
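For illustration, the items of the alignment result listed above can be held in one record per read sequence. The following Python sketch is one possible representation and is not the format actually used by the data adjustment unit 113; the class name, field names, and example values are hypothetical, and the 1-based position is an assumption that the description does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentRecord:
    reference_name: str    # reference sequence ID in the reference sequence database
    reference_length: int  # sequence length of the reference sequence
    read_name: str         # read sequence ID
    position: int          # leftmost mapping position (assumed here to be 1-based)
    mapping_quality: int   # quality of the mapping of this read sequence
    sequence: str          # base sequence of the read sequence

    def end_position(self) -> int:
        """Rightmost mapped position, assuming an alignment without insertion or deletion."""
        return self.position + len(self.sequence) - 1


# Hypothetical example values, for illustration only.
record = AlignmentRecord(
    reference_name="ref_0001",
    reference_length=2000,
    read_name="read_000001",
    position=101,
    mapping_quality=60,
    sequence="ACGTACGTAC",
)
print(record.end_position())
```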
Metadata representing the gene-panel-associated information is added to each reference sequence in the reference sequence database 122. For example, the gene-panel-associated information which is to be added to each reference sequence can directly or indirectly indicate the gene, to be analyzed, corresponding to each reference sequence.
In one embodiment, the information selection unit 112 may perform control such that, when the data adjustment unit 113 obtains a reference sequence from the reference sequence database 122, the data adjustment unit 113 refers to the inputted gene-panel-associated information and the metadata of each reference sequence, and selects a reference sequence corresponding to the gene-panel-associated information.
For example, in one mode, the information selection unit 112 may control the data adjustment unit 113 so as to select a reference sequence corresponding to a gene, to be analyzed, which is specified by the inputted gene-panel-associated information. Thus, the data adjustment unit 113 performs mapping only on the reference sequences associated with the gene panel having been used, thereby improving efficiency of the analysis.
In another embodiment, the information selection unit 112 need not perform the above-described control. In this case, the information selection unit 112 merely controls the mutation identifying unit 114 or the report generation unit 115 as described below.
In step S401 shown in
In one mode, the data adjustment unit 113 calculates a score representing the degree of matching between the read sequence and the reference sequence. The score that represents the degree of matching may be, for example, a percentage (percentage identity) of the matching between the two sequences. For example, the data adjustment unit 113 specifies positions at which bases of the read sequence and bases of the reference sequence are the same, and obtains the number of the positions, and divides the number of the positions at which the bases are the same, by the number of bases (the number of bases in the comparison window) of the read sequence compared with the reference sequence, to calculate the percentage.
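The percentage calculation described above can be sketched as follows in Python. The function name is illustrative, and the sketch assumes that the read sequence and the compared portion of the reference sequence (the comparison window) have the same length.

```python
def percent_identity(read: str, reference_window: str) -> float:
    """Percentage of positions at which the read and the reference window have the same base."""
    if not read:
        raise ValueError("the read sequence must not be empty")
    if len(read) != len(reference_window):
        raise ValueError("the read and the comparison window must have the same length")
    # Count the positions at which the bases are the same.
    matches = sum(1 for a, b in zip(read, reference_window) if a == b)
    # Divide by the number of bases in the comparison window.
    return 100.0 * matches / len(read)


# Seven of eight positions match, so the percentage identity is 87.5.
print(percent_identity("ACGTACGT", "ACGTACGA"))
```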
In a case where the score representing the degree of matching between the read sequence and the reference sequence is calculated, the data adjustment unit 113 may calculate the score such that, when the read sequence includes a predetermined mutation (for example, an insertion or deletion (InDel)) with respect to the reference sequence, the score is less than that calculated in the normal calculation.
In one mode, for a read sequence that includes at least one of insertion and deletion with respect to the reference sequence, the data adjustment unit 113 may correct the score by, for example, multiplying the score calculated in the above-described normal calculation by a weighting factor according to the number of bases corresponding to the insertion or deletion. The weighting factor W may be calculated as, for example, W = 1 − (1/100) × (the number of bases corresponding to the insertion or deletion).
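As a worked example of this correction, the following Python sketch applies the example weighting factor to a score. The function names are illustrative. The description does not state how a weighting factor of zero or less (100 or more inserted or deleted bases) is to be handled, so the sketch applies the formula as written.

```python
def indel_weight(indel_bases: int) -> float:
    """Weighting factor W = 1 - (1/100) * (number of inserted or deleted bases)."""
    if indel_bases < 0:
        raise ValueError("the number of bases must not be negative")
    return 1 - indel_bases / 100


def corrected_score(score: float, indel_bases: int) -> float:
    """Score of the normal calculation multiplied by the weighting factor."""
    return score * indel_weight(indel_bases)


# A score of 90 for a read sequence with a 5-base deletion is reduced to about 85.5.
print(corrected_score(90.0, 5))
```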
The data adjustment unit 113 calculates the score representing the degree of matching while changing the mapping position of the read sequence with respect to each reference sequence, thereby specifying a position on the reference sequence at which the degree of matching with the read sequence satisfies a predetermined criterion. At this time, an algorithm known in this technical field, such as dynamic programming, the FASTA method, and the BLAST method, may be used.
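The search for a mapping position can be illustrated by a simple exhaustive scan. The following Python sketch is not dynamic programming, the FASTA method, or the BLAST method; it slides the read sequence along the reference sequence without gaps and reports the first position with the highest percentage identity, provided that it satisfies the criterion. The function name, the 0-based position, and the default criterion of 90% are assumptions made for illustration.

```python
from typing import Optional, Tuple


def best_mapping_position(
    read: str, reference: str, min_identity: float = 90.0
) -> Optional[Tuple[int, float]]:
    """Return (0-based position, percentage identity) of the best ungapped match, or None."""
    if not read:
        raise ValueError("the read sequence must not be empty")
    best_pos, best_score = -1, -1.0
    # Change the mapping position one base at a time and score each position.
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        matches = sum(1 for a, b in zip(read, window) if a == b)
        score = 100.0 * matches / len(read)
        if score > best_score:
            best_pos, best_score = pos, score
    # Report the position only when the predetermined criterion is satisfied.
    if best_score >= min_identity:
        return best_pos, best_score
    return None


print(best_mapping_position("ACGTACG", "TTTACGTACGGG"))
```

An exhaustive scan of this kind is slow for long reference sequences, which is one reason the known algorithms mentioned above are used in practice.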
Returning to
When alignment of all the read sequences included in the read sequence information obtained by the sequence data reading unit 111 has not been performed (NO in step S405), the data adjustment unit 113 returns the process to step S401. When alignment of all the read sequences included in the read sequence information has been performed (YES in step S405), the process step of step S12 is ended.
Subsequently, returning to
In step S14 shown in
In one mode, the mutation identifying unit 114 generates a result file based on the extracted mutation.
As shown in
In
Returning to
Among the mutation position information, “CHROM” represents the chromosome number, and “POS” represents a position on the chromosome. “REF” represents a base in the wild-type, and “ALT” represents a base that is present after the mutation. “Annotation” represents information associated with the mutation. “Annotation” may be, for example, information representing mutation of amino acid such as “EGFR C2573G” or “EGFR L858R”. For example, “EGFR C2573G” represents mutation in which cysteine at the 2573rd residue of protein “EGFR” is substituted by glycine.
As in the above-described example, “Annotation” of the mutation information may be information for converting mutation based on the base information to mutation based on the amino acid information. In this case, the mutation identifying unit 114 can convert the mutation based on the base information to the mutation based on the amino acid information, according to the information of “Annotation” which has been referred to.
The mutation identifying unit 114 searches the mutation database 123 by using, as a key, information (for example, base information corresponding to mutation position information and mutation) for specifying the mutation included in the result file. For example, the mutation identifying unit 114 may search the mutation database 123 by using, as a key, information of any of “CHROM”, “POS”, “REF”, and “ALT”. When the mutation extracted by comparison between the alignment sequence derived from the blood specimen and the alignment sequence derived from the lesion site is registered in the mutation database 123, the mutation identifying unit 114 identifies the mutation as a mutation in the sample, and adds annotation (for example, “EGFR L858R”, “BRAF V600E”, or the like) to the mutation included in the result file.
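The search of the mutation database 123 can be sketched as a dictionary lookup. In the following Python sketch, the key is assumed to be the combination of “CHROM”, “POS”, “REF”, and “ALT”, although the description allows any of these items to be used. The database contents are hypothetical; in particular, the positions are placeholders and are not actual genomic coordinates.

```python
from typing import Dict, List, Tuple

MutationKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)

# Hypothetical stand-in for the mutation database 123.
# The positions are placeholders, not real genomic coordinates.
MUTATION_DB: Dict[MutationKey, str] = {
    ("chr7", 1000, "T", "G"): "EGFR L858R",
    ("chr7", 2000, "A", "T"): "BRAF V600E",
}


def annotate(records: List[dict], db: Dict[MutationKey, str]) -> List[dict]:
    """Keep the extracted mutations that are registered in the database and add the annotation."""
    annotated = []
    for rec in records:
        key = (rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"])
        annotation = db.get(key)
        if annotation is not None:
            annotated.append({**rec, "Annotation": annotation})
    return annotated


extracted = [
    {"CHROM": "chr7", "POS": 1000, "REF": "T", "ALT": "G"},  # registered
    {"CHROM": "chr1", "POS": 500, "REF": "C", "ALT": "A"},   # not registered
]
identified = annotate(extracted, MUTATION_DB)
print(identified)
```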
The report generation unit 115 generates a report based on the information outputted by the mutation identifying unit 114 and the gene-panel-associated information provided by the information selection unit 112 (corresponding to step S111 in
The report generation unit 115 selects information to be included in the report, based on the gene-panel-associated information provided by the information selection unit 112, and eliminates, from the report, the information which has not been selected. Alternatively, the information selection unit 112 may control the report generation unit 115 so as to select gene-associated information corresponding to the gene-panel-associated information inputted via the input unit 17, as information to be included in the report, and eliminate, from the report, the information which has not been selected.
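The selection of information for the report can be pictured as a filter on the identified mutations. The following Python sketch assumes, for illustration only, that the gene-panel-associated information is a list of gene names and that each identified mutation carries the name of its gene; the function name and the data layout are hypothetical.

```python
from typing import Iterable, List


def select_for_report(mutations: List[dict], panel_genes: Iterable[str]) -> List[dict]:
    """Keep only the mutations in genes that belong to the gene panel having been used."""
    allowed = set(panel_genes)
    return [m for m in mutations if m["gene"] in allowed]


# Hypothetical input: two identified mutations and a panel that covers only EGFR.
mutations = [
    {"gene": "EGFR", "annotation": "EGFR L858R"},
    {"gene": "BRAF", "annotation": "BRAF V600E"},
]
report_items = select_for_report(mutations, ["EGFR"])
print(report_items)
```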
The report generated by the report generation unit 115 may be transmitted as data to the communication terminal 5 installed in the medical institution 210, through the output unit 13, as a result of analysis of the read sequence information (corresponding to step S112 in
Examples of the quality evaluation index obtained by measuring the quality control sample are as follows.
The above-described quality evaluation index will be described with reference to
Index (i-1): Quality Score
The quality score is an index representing accuracy for each base in the gene sequence read by the sequencer 2.
For example, when the read sequence information is outputted as FASTQ file from the sequencer 2, the quality score is also included in the read sequence information (see
Index (i-2): Cluster Concentration
The sequencer 2 locally amplifies and immobilizes multiple single-stranded DNA fragments on the flow cell to form a cluster (see 9 in
For example, in a case where the cluster density is excessively high, and the clusters are excessively close to each other or overlap each other, the contrast of the taken image of the flow cell, that is, the S/N ratio, is lowered, and focusing by the fluorescence microscope becomes difficult. Therefore, fluorescence cannot be accurately detected. As a result, the sequence cannot be accurately read.
The index indicates how many bases in the target region have been read, among bases (also including bases other than those in the target region) read by the sequencer 2, and can be calculated as a ratio between the total number of bases in the target region and the total number of bases having been read.
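This ratio can be computed from the mapped intervals of the read sequences. The following Python sketch assumes a single reference sequence, half-open intervals (start, end), and target regions that do not overlap one another; the function name and the data layout are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open interval: start inclusive, end exclusive


def on_target_ratio(reads: List[Interval], targets: List[Interval]) -> float:
    """Ratio of the bases read in the target region to all the bases having been read."""
    total_bases = sum(end - start for start, end in reads)
    if total_bases == 0:
        return 0.0
    on_target = 0
    for r_start, r_end in reads:
        for t_start, t_end in targets:
            # Length of the overlap between this read and this target region.
            on_target += max(0, min(r_end, t_end) - max(r_start, t_start))
    return on_target / total_bases


# Three reads of 10 bases each; 15 of the 30 bases fall in the target region.
print(on_target_ratio([(0, 10), (5, 15), (100, 110)], [(5, 20)]))
```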
Index (iii): Quality Evaluation Index Representing the Depth of Read Sequence Information
The index is an index based on the total number of pieces of the read sequence information obtained by reading the bases included in a gene to be analyzed, and can be calculated as a ratio between the total number of bases, among the bases having been read, having depths which are greater than or equal to a predetermined value, and the total number of bases having been read.
The depth represents the total number of pieces of the read sequence information having been read for one base.
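The depth and the index based on it can be sketched as follows in Python. The sketch counts, for each position, how many read sequences cover it, and then takes the fraction of the positions having been read whose depth is greater than or equal to a predetermined value. Treating each covered position as one base having been read is an interpretation made for illustration, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def depth_per_position(reads: Iterable[Tuple[int, int]]) -> Counter:
    """Depth at each position, given reads as half-open (start, end) intervals."""
    depth: Counter = Counter()
    for start, end in reads:
        for pos in range(start, end):
            depth[pos] += 1
    return depth


def fraction_at_min_depth(depth: Mapping[int, int], min_depth: int) -> float:
    """Fraction of the positions having been read whose depth is at least min_depth."""
    if not depth:
        return 0.0
    deep = sum(1 for d in depth.values() if d >= min_depth)
    return deep / len(depth)


# Two overlapping reads: positions 2 and 3 have depth 2, the other four have depth 1.
depth = depth_per_position([(0, 4), (2, 6)])
print(fraction_at_min_depth(depth, 2))
```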
The index represents the uniformity of the depth. When the number of pieces of the read sequence information having been read in a certain portion of the region having been read is extremely great, the uniformity of the depth is low. When the number of pieces of the read sequence information is relatively uniform over the region having been read, the uniformity of the depth is high. The manner of expressing the uniformity of the depth is not particularly limited. For example, the uniformity can be quantified by using the interquartile range (IQR) of the depth. The greater the IQR is, the lower the uniformity is. The smaller the IQR is, the higher the uniformity is.
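As one way to obtain the IQR, the following Python sketch uses the standard library function statistics.quantiles. The choice of the "inclusive" quantile method is an assumption, since the description does not specify how the quartiles are computed, and the function name is illustrative.

```python
import statistics
from typing import Sequence


def depth_iqr(depths: Sequence[float]) -> float:
    """Interquartile range (third quartile minus first quartile) of per-base depths."""
    if len(depths) < 2:
        raise ValueError("at least two depth values are required")
    q1, _median, q3 = statistics.quantiles(depths, n=4, method="inclusive")
    return q3 - q1


uniform = [10, 10, 10, 10]     # the same depth everywhere: high uniformity
uneven = [5, 5, 5, 200, 400]   # a few positions with extremely great depth: low uniformity
print(depth_iqr(uniform), depth_iqr(uneven))
```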
Index (v): Quality Evaluation Index Indicating Whether or not all the Mutations in Each Reference Gene Included in the Quality Control Sample have been Detected
The index is an index indicating that the mutation in each reference gene included in the quality control sample has been detected and accurately identified. The mutation (see the cell for “Variant”) in each reference gene included in a quality control sample A shown in
Below these items, the gene panel name “A panel” is indicated as the gene-panel-associated information. The quality evaluation index “QC index” obtained from the process using the quality control sample, the result of analysis thereof, and the like are outputted in the report.
In the report, in the cells for “detected gene mutation and associated medication”, information associated with the mutation identified by the mutation identifying unit 114 and the list associated with the medication are included.
When the quality evaluation index is less than a predetermined criterion, the detected gene mutation may be marked with “*”. In addition thereto or instead thereof, a comment indicating that reliability is low can be added.
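The marking rule can be sketched as follows in Python. The function name, the numeric index, and the criterion value are hypothetical; the sketch only shows the comparison with the criterion and the addition of “*”.

```python
from typing import List


def mark_low_reliability(mutations: List[str], qc_index: float, criterion: float) -> List[str]:
    """Append '*' to each detected gene mutation when the QC index is below the criterion."""
    low_reliability = qc_index < criterion
    return [f"{m}*" if low_reliability else m for m in mutations]


# Hypothetical values: a QC index of 0.80 against a criterion of 0.90.
print(mark_low_reliability(["EGFR L858R"], qc_index=0.80, criterion=0.90))
```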
The present disclosure is not limited to the above-described embodiments. Numerous modifications can be made without departing from the scope of the appended claims. An embodiment in which techniques disclosed in different embodiments are combined with each other as appropriate may be also included in the technical scope of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
2017-208652 | Oct 2017 | JP | national |