Early detection and monitoring of diseases may be useful in a number of diagnostic methods. Mutations may be detected in association with establishing a higher risk of a disease for a patient. Disorders can be a result of changes in epigenetic markers or rare genetic alterations. Such disorders may be characterized using DNA and RNA sequence information. In some cases, the disease may be identified and characterized by biological markers, such as nucleotide insertions and deletions, nucleotide substitutions, amino acid insertions, amino acid deletions, amino acid substitutions, gene fusions, copy-number variations, translocations, or gene expression signatures.
In the past, patients with a particular disease may have been identified and enrolled into clinical trials from an investigator's clinic or practice, or through advertising or referrals. Such clinical trials may be paper-based, unavoidably burdensome, and slow to monitor, process, and store. In addition, with pharmaceutical companies producing more novel drug compounds, it is important for pharmaceutical companies to test and market new drugs in a minimum amount of time. Embodiments of the invention provide methods for analyzing a biological sample of a subject, identifying a disease in a subject, and using a computer-implemented method to extract clinical history and data from a biological sample for clinical trial enrollment and drug development.
In certain aspects, the disclosure provides a method for qualifying a subject for a subset of therapies comprising clinical trials or standard of care treatments for one or more types of cancers, comprising: (a) subjecting at least one biological sample from the subject to at least one assay to generate biologic data from the subject; (b) processing the biologic data from the subject against a filtered set of therapies to generate the subset of therapies for which the subject qualifies, wherein the subset of therapies comprises the clinical trials or standard of care treatments for the one or more types of cancers, which filtered set of therapies is generated by computer assessing eligibility of a database of therapies against one or more criteria; and (c) presenting the subset of therapies on a user interface on an electronic device of a user. In certain embodiments, the method for qualifying a subject further comprises transmitting medical history data of the subject to one or more therapy coordinators of the subset of therapies.
In certain embodiments, the method for qualifying a subject further comprises receiving a selection from the subject as to a given clinical trial from the subset of therapies. In certain embodiments, the method for qualifying a subject further comprises receiving a request for enrollment of the subject in a therapy selected from the subset of therapies through the user interface. In certain embodiments, the method for qualifying a subject further comprises computer assessing the eligibility of the database of therapies against the one or more criteria to generate the filtered set of therapies. In certain embodiments, computer assessing the eligibility comprises (i) identifying at least one portion of the database of therapies; and (ii) curating at least one portion of the database of therapies using one or more clinical labels or molecular labels to generate the filtered set of therapies. In certain embodiments, the user interface comprises one or more graphical elements with one or more network links to the subset of therapies and contact information for the subset of therapies for which the subject qualifies. In certain embodiments, the subset of therapies comprises clinical trials or standard of care treatments for one or more types of cancers. In certain embodiments, the biologic data is generated from at least one biological sample of the subject by an automated assaying system, which automated assaying system uses automated processing for at least one member selected from the group consisting of cell extraction, nucleic acid extraction, enrichment, sequencing, and immunohistochemistry, during processing of at least one biological sample. In certain embodiments, step (b) comprises validating the filtered set of therapies by a human therapy curator. 
In certain embodiments, step (b) further comprises using medical history data of the subject to generate the subset of therapies for which the subject qualifies, wherein the medical history data is separate from the biologic data. In certain embodiments, the medical history data is identifiable according to medical text segments from the medical history data of the subject. In certain embodiments, the method for qualifying a subject further comprises using at least one machine learning algorithm to detect and label the medical text segments. In certain embodiments, step (b) comprises validating the subset of therapies for which the subject qualifies by a human therapy curator. In certain embodiments, at least one biological sample comprises a tumor tissue sample or a blood sample. In certain embodiments, the method for qualifying a subject further comprises, prior to step (a), (i) receiving a first nucleic acid sample from a tumor sample of the subject; and (ii) receiving a second nucleic acid sample from a normal sample of the subject. In certain embodiments, the method for qualifying a subject further comprises enriching the first nucleic acid sample for a plurality of nucleic acid sequences to provide an enriched nucleic acid sample using a probe set comprising probes that have an on-target rate as a group of at least about 80%, as determined by (i) measuring, for the probe set in at least one predetermined region, (1) probe coverage of each probe in the probe set and (2) off-target probe coverage for each probe in the probe set, and (ii) determining the on-target rate of the probe set based on a ratio of the off-target coverage to the probe coverage. In certain embodiments, the method for qualifying a subject further comprises assaying the enriched nucleic acid sample and the second nucleic acid sample to identify one or more genomic aberrations in a biological sample to generate the biologic data for the subject. 
In certain embodiments, the method for qualifying a subject further comprises labeling one or more genomic aberrations in the biological sample.
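By way of non-limiting illustration, the curation of a database of therapies into a filtered set, followed by processing of the biologic data against that filtered set, might be sketched as follows. The record layout, the label vocabulary, the function names `filter_database` and `qualify`, and the example therapies are all hypothetical and are not taken from the disclosure; the matching rule (at least one shared molecular label) is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Therapy:
    # Hypothetical record layout; the disclosure does not prescribe a schema.
    name: str
    kind: str                    # "clinical_trial" or "standard_of_care"
    clinical_labels: frozenset   # e.g. cancer type, recruitment status
    molecular_labels: frozenset  # e.g. genes whose alterations are targeted


def filter_database(database, criteria):
    """Keep therapies whose clinical labels satisfy every criterion."""
    required = frozenset(criteria)
    return [t for t in database if required <= t.clinical_labels]


def qualify(filtered_set, subject_molecular_labels):
    """Keep therapies sharing at least one molecular label with the subject."""
    subject = frozenset(subject_molecular_labels)
    return [t for t in filtered_set if t.molecular_labels & subject]


# Illustrative data only; the therapy names are placeholders.
database = [
    Therapy("Trial A", "clinical_trial",
            frozenset({"lung", "recruiting"}), frozenset({"EGFR"})),
    Therapy("Trial B", "clinical_trial",
            frozenset({"lung", "closed"}), frozenset({"EGFR"})),
    Therapy("Regimen C", "standard_of_care",
            frozenset({"lung", "recruiting"}), frozenset({"ALK"})),
]
filtered_set = filter_database(database, {"lung", "recruiting"})
subset = qualify(filtered_set, {"EGFR", "TP53"})
print([t.name for t in subset])  # ['Trial A']
```

In this sketch the filtered set is computed once from the database and can be reused across subjects, while only the final matching step depends on a given subject's biologic data.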
In certain aspects, the disclosure provides a method for qualifying a subject for a subset of therapies, comprising: (a) receiving medical history data and biologic data for the subject wherein the biologic data is generated from one or more biological samples of the subject; (b) computer analyzing the medical history data and the biologic data to yield a genomic-based medical history analysis for the subject; (c) using the genomic-based medical history analysis for the subject to query one or more databases of therapies for the subject, to generate the subset of therapies for which the subject qualifies; and (d) providing the subset of therapies on a user interface on an electronic device of a user.
In certain embodiments, the biologic data is generated from one or more biological samples of the subject by an automated assaying system, which automated assaying system uses automated processing for at least one member selected from the group consisting of cell extraction, nucleic acid extraction, enrichment, sequencing, and immunohistochemistry. In certain embodiments, the method for qualifying a subject further comprises computer assessing eligibility of the one or more databases of therapies against one or more criteria to generate a filtered set of therapies. In certain embodiments, the one or more databases are computer assessed using medical history data. In certain embodiments, the genomic-based medical history analysis for the subject comprises labels from the medical history data and labels from the biologic data, and wherein (c) comprises computer processing the labels against therapies from one or more databases to yield the subset of therapies for which the subject qualifies. In certain embodiments, the method for qualifying a subject further comprises receiving a selection from the subject as to a given therapy from the subset of therapies. In certain embodiments, the method for qualifying a subject further comprises receiving a request for enrollment of the subject in a therapy selected from the provided subset of therapies through the user interface. In certain embodiments, the user interface comprises one or more graphical elements with one or more network links to the subset of therapies and contact information for the subset of therapies for which the subject qualifies. In certain embodiments, the subset of therapies comprises clinical trials or standard of care treatments for one or more types of cancers. In certain embodiments, step (c) comprises validating the subset of therapies for which the subject qualifies by a human therapy curator.
In certain embodiments, prior to the step (a) the method comprises (i) receiving a first nucleic acid sample from a tumor sample of the subject; and (ii) receiving a second nucleic acid sample from a normal sample of the subject. In certain embodiments, the method for qualifying a subject further comprises enriching the first nucleic acid sample for a plurality of nucleic acid sequences to provide an enriched nucleic acid sample using a probe set comprising probes that have an on-target rate as a group of at least about 80%, as determined by (i) measuring, for the probe set in at least one predetermined region, (1) probe coverage of each probe in the probe set and (2) off-target probe coverage for each probe in the probe set, and (ii) determining the on-target rate of the probe set based on a ratio of the off-target coverage to the probe coverage. In certain embodiments, the method for qualifying a subject further comprises assaying the enriched nucleic acid sample and the second nucleic acid sample to identify one or more genomic aberrations in a biological sample to generate biologic data for the subject. In certain embodiments, prior to step (b), the medical history data is processed and transformed to provide processed medical history data. In certain embodiments, processing is selected from the group consisting of cleaning, organizing, and labeling. In certain embodiments, the subset of therapies comprises clinical trials or standard of care treatments for one or more types of cancer.
In certain embodiments, the method for qualifying a subject further comprises presenting the subset of therapies to a clinician to select for a recommended therapy. In certain embodiments, the method for qualifying a subject further comprises receiving a selection from the subset of therapies from the clinician. In certain embodiments, the biologic data include nucleic acid mutations or differentially expressed proteins. In certain embodiments, the nucleic acid mutations are selected from genes and variants of Table 1. In certain embodiments, (c) comprises querying one or more databases for one or more targeted therapies according to a predetermined gene or genomic region. In certain embodiments, the subset of therapies in (c) excludes therapies that target genomic aberrations absent in the biologic data. In certain embodiments, (c) comprises removing therapies that target genomic aberrations absent in the biologic data. In certain embodiments, the subset of therapies in (c) is filtered according to clinical phases of the therapy. In certain embodiments, the medical history data is identifiable according to medical text segments from the medical history data of the subject. In certain embodiments, the method for qualifying a subject further comprises using at least one machine learning algorithm to detect and label the medical text segments. In certain embodiments, (c) comprises determining ineligible therapies according to a categorical score and rejecting the ineligible therapies from remaining therapies to generate the subset of therapies. In certain embodiments, the categorical score is selected from the group consisting of yes, maybe, and no. In certain embodiments, the subset of therapies is compared and reviewed. In certain embodiments, the subset of therapies is passed to a user to manually verify eligibility using links to information from the medical history data and the biologic data for the subject.
In certain embodiments, the method for qualifying a subject further comprises filtering the subset of therapies based on filtering preferences of the user. In certain embodiments, filtering further comprises an evaluation by a healthcare professional and a selection for a recommended therapy. In certain embodiments, the subset of therapies is generated from one or more databases of therapies without use of the biologic data of the subject. In certain embodiments, step (a) comprises receiving phenotype information for the subject. In certain embodiments, the method for qualifying a subject further comprises (e) monitoring the subject enrolled in the subset of therapies by assaying one or more biological samples from the subject, wherein assaying is directed to 100 or more genes or variants thereof selected from Table 1. In certain embodiments, the querying of step (c) has a predicted likelihood of matching to a clinical trial of at least about 90%. In certain embodiments, the one or more biological samples are assayed for a presence or absence of biological markers at a concordance correlation coefficient of greater than or equal to about 90% when the one or more biological samples is re-assayed for the presence or absence of the biological markers, which biological markers include a plurality of different types of biological markers. In certain embodiments, the assaying covers at least 2,500 genes, gene fusions, point mutations, indels, copy-number variations, promoters, or enhancers. In certain embodiments, the subject is diagnosed with a solid tumor or cancer. In certain embodiments, the biologic data generates an initial list of therapies and the medical history data filters the initial list of therapies to generate the subset of therapies.
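By way of non-limiting illustration, the categorical scoring of therapies as yes, maybe, or no, and the rejection of ineligible therapies, might be sketched as follows. The scoring rule used here is an assumption and is not prescribed by the disclosure: a therapy scores "no" when any criterion contradicts the medical history data, "maybe" when a criterion cannot be checked because the corresponding field is missing, and "yes" otherwise. The field names and therapy names are hypothetical.

```python
from enum import Enum


class Score(Enum):
    YES = "yes"
    MAYBE = "maybe"
    NO = "no"


def score_therapy(criteria, history):
    """Score one therapy's criteria against a subject's medical history.

    criteria: dict mapping a history field to its required value.
    history:  dict mapping a history field to the subject's value;
              a missing field is treated as unknown (an assumption).
    """
    unknown = False
    for field_name, required in criteria.items():
        if field_name not in history:
            unknown = True
        elif history[field_name] != required:
            return Score.NO
    return Score.MAYBE if unknown else Score.YES


def reject_ineligible(therapies, history):
    """Drop therapies scored NO; keep YES and MAYBE for later review."""
    scored = {name: score_therapy(c, history) for name, c in therapies.items()}
    return {name: s for name, s in scored.items() if s is not Score.NO}


# Illustrative data only.
history = {"prior_chemotherapy": True, "performance_status": 1}
therapies = {
    "Trial A": {"prior_chemotherapy": True},
    "Trial B": {"prior_chemotherapy": False},
    "Trial C": {"brain_metastases": False},
}
remaining = reject_ineligible(therapies, history)
```

Retaining the "maybe" category, rather than collapsing it into "no", leaves those therapies available for the manual verification step described above.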
In certain aspects, the disclosure provides a method for qualifying a subject for a subset of therapies, comprising: (a) receiving (i) a first nucleic acid sample from the subject, which first nucleic acid sample has or is suspected of having tumor-derived cells or biological markers, and (ii) a second nucleic acid sample from a normal sample of the subject; (b) enriching the first nucleic acid sample for a plurality of nucleic acid sequences to provide an enriched nucleic acid sample using a probe set comprising probes that have an on-target rate as a group of at least about 80%, as determined by (i) measuring, for the probe set in at least one predetermined region, (1) probe coverage of each probe in the probe set and (2) off-target probe coverage for each probe in the probe set, and (ii) determining the on-target rate of the probe set based on a ratio of the off-target coverage to the probe coverage; (c) assaying the enriched nucleic acid sample and the second nucleic acid sample to identify one or more genomic alterations in the first nucleic acid sample relative to the second nucleic acid sample to generate a set of genomic data for the subject; (d) querying one or more databases of therapies for one or more therapies corresponding to a medical history of the subject and the genomic data, to generate the subset of therapies for which the subject qualifies; and (e) providing the subset of therapies on a user interface on an electronic device of a user.
In certain embodiments, the method for qualifying a subject further comprises receiving a selection from the subject as to a given therapy from the subset of therapies. In certain embodiments, the method for qualifying a subject further comprises receiving a request for enrollment of the subject in a therapy selected from the subset of therapies through the user interface. In certain embodiments, the method for qualifying a subject further comprises computer assessing eligibility of the one or more databases of therapies against one or more criteria to generate a filtered set of therapies. In certain embodiments, the user interface comprises one or more graphical elements with one or more network links to the subset of therapies and contact information for the subset of therapies for which the subject qualifies.
In certain embodiments, the subset of therapies comprises clinical trials or standard of care treatments for one or more types of cancers. In certain embodiments, step (d) comprises validating the subset of therapies for which the subject qualifies by a human therapy curator. In certain embodiments, the method for qualifying a subject further comprises receiving medical history data for the subject. In certain embodiments, the method for qualifying a subject further comprises identifying a therapeutic target based on the medical history and the genomic data and enrolling the subject in a therapy based on the identified therapeutic target. In certain embodiments, the method for qualifying a subject further comprises monitoring the subject, the monitoring comprising assaying one or more nucleic acid samples to generate genomic data, wherein the assaying is directed to 100 or more genes or variants thereof selected from Table 1. In certain embodiments, the assaying covers at least 2,500 genes, gene fusions, point mutations, indels, copy-number variations, promoters, or enhancers. In certain embodiments, the first nucleic acid sample comprises cell-free DNA. In certain embodiments, 100 or more genes are assayed in the cell-free DNA. In certain embodiments, the first nucleic acid sample and the second nucleic acid sample are assayed for one or more genomic alterations at a concordance correlation coefficient of greater than or equal to about 90% when the first nucleic acid sample and the second nucleic acid sample are re-assayed for presence or absence of the genomic alterations, which genomic alterations include a plurality of different types of genomic alterations.
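By way of non-limiting illustration, one possible reading of the on-target rate determination is sketched below. The disclosure states only that the on-target rate is based on a ratio of the off-target coverage to the probe coverage; the specific formula used here, one minus the summed off-target coverage divided by the summed probe coverage, is an assumption, as are the function names and the per-probe coverage values.

```python
def on_target_rate(probe_coverage, off_target_coverage):
    """Group on-target rate for a probe set (assumed formula).

    probe_coverage:      total coverage measured for each probe.
    off_target_coverage: portion of that coverage falling outside
                         the predetermined region, per probe.
    """
    if len(probe_coverage) != len(off_target_coverage):
        raise ValueError("one off-target value is required per probe")
    total = sum(probe_coverage)
    if total <= 0:
        raise ValueError("total probe coverage must be positive")
    return 1.0 - sum(off_target_coverage) / total


def meets_threshold(rate, threshold=0.80):
    """Check the rate against a group threshold such as about 80%."""
    return rate >= threshold


# Illustrative values only: three probes, 10% of coverage off target.
rate = on_target_rate([100, 200, 100], [10, 20, 10])
print(meets_threshold(rate))
```

Because the rate is computed for the probes as a group, an individual probe with poor specificity can be offset by others, which matches the "as a group" language of the method.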
In certain aspects, the disclosure provides a method for analyzing a biological sample of a subject, comprising assaying the biological sample for a presence or absence of biological markers at a concordance correlation coefficient of greater than or equal to about 90% and an accuracy of at least about 90% as compared to a control when the biological sample is re-assayed for the presence or absence of the biological markers, which biological markers include a plurality of different types of biological markers, wherein the assaying comprises a plurality of different assays, including sequencing, wherein greater than 90% of operations of the assaying are automatically performed.
In certain embodiments, the biological sample is homogenous. In certain embodiments, the biological sample comprises a tumor tissue or a whole blood sample from the subject. In certain embodiments, the biological sample comprises nucleic acid molecules. In certain embodiments, the biological sample comprises cell-free deoxyribonucleic acid (cfDNA) molecules, cellular deoxyribose nucleic acid (cDNA) molecules, ribonucleic acid (RNA) molecules, and protein, and wherein the cfDNA molecules, the cDNA molecules, and the RNA molecules are assayed for the presence or absence of the biological markers. In certain embodiments, the biological sample comprises normal biomolecules and abnormal biomolecules. In certain embodiments, the normal biomolecules are isolated from a buffy coat of the biological sample. In certain embodiments, the abnormal biomolecules are isolated from plasma or a tumor tissue of the biological sample. In certain embodiments, the biological sample is a single cell. In certain embodiments, the biological sample is indexed. In certain embodiments, the method for analyzing a biological sample of a subject further comprises re-assaying the biological sample at a later point in time and identifying a change in one or more biological markers. In certain embodiments, the assaying comprises processing the biological sample or sequencing the biological sample without any involvement from a user during sample preparation. In certain embodiments, the assaying comprises immunohistochemistry profiling and genomic profiling of the biological sample. In certain embodiments, 2,500 or more of the biological markers are assayed. In certain embodiments, the assaying is at a concordance correlation coefficient of greater than or equal to about 90% and an accuracy of at least about 90% based on assaying the biological sample multiple times.
In certain embodiments, the assaying is at a concordance correlation coefficient of greater than or equal to about 90% and an accuracy of at least about 90% based on assaying the biological sample in at least two different geographic locations.
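By way of non-limiting illustration, agreement between an assay and a re-assay might be quantified as sketched below. The disclosure does not specify how the concordance correlation coefficient is computed; this sketch assumes Lin's concordance correlation coefficient with population (1/n) variances, applied to paired numeric measurements such as 0/1 presence calls, and treats "about 90%" as a coefficient of 0.90. The function name and the example calls are hypothetical.

```python
def concordance_correlation(first_run, second_run):
    """Lin's concordance correlation coefficient for paired measurements.

    ccc = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    """
    n = len(first_run)
    if n != len(second_run) or n < 2:
        raise ValueError("need two equal-length runs with at least 2 values")
    mean_x = sum(first_run) / n
    mean_y = sum(second_run) / n
    var_x = sum((x - mean_x) ** 2 for x in first_run) / n
    var_y = sum((y - mean_y) ** 2 for y in second_run) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(first_run, second_run)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("both runs are constant and identical")
    return 2 * cov / denom


# Illustrative presence (1) / absence (0) calls for eight markers.
assay = [1, 0, 1, 1, 0, 0, 1, 0]
re_assay = [1, 0, 1, 1, 0, 0, 1, 0]
ccc = concordance_correlation(assay, re_assay)
print(ccc >= 0.90)
```

Unlike a plain correlation coefficient, this statistic also penalizes a systematic shift between the two runs through the squared difference of means in the denominator.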
In certain aspects, the disclosure provides a method for identifying a genomic aberration in one or more biological samples of a subject, comprising: (a) obtaining the one or more biological samples of the subject, which one or more biological samples comprise a nucleic acid sample that has or is suspected of having one or more genomic aberration(s) that appears at a frequency of less than about 5% in the nucleic acid sample; (b) enriching the nucleic acid sample for a plurality of nucleic acid sequences to provide an enriched nucleic acid sample using a probe set comprising probes that have an on-target rate as a group of at least about 80%, as determined by (i) measuring, for the probe set in at least one predetermined region, (1) probe coverage of each probe in the probe set and (2) off-target probe coverage for each probe in the probe set, and (ii) determining the on-target rate of the probe set based on a ratio of the off-target coverage to the probe coverage; (c) sequencing the enriched nucleic acid sample to generate sequencing reads; and (d) processing the sequencing reads to identify the genomic aberration(s) in the one or more biological samples of the subject that appears at a frequency of less than about 5% in the nucleic acid sample.
In certain embodiments, one or more biological samples comprise blood sample(s) or tissue sample(s). In certain embodiments, the processing covers at least 2,500 genes, gene fusions, point mutations, indels, copy-number variations, promoters, or enhancers. In certain embodiments, the nucleic acid sample comprises cell-free DNA. In certain embodiments, one or more biological samples are indexed. In certain embodiments, the method for identifying a genomic aberration further comprises re-processing the biological sample at a later point in time and identifying a change in one or more biological markers. In certain embodiments, the processing comprises immunohistochemistry profiling and genomic profiling of the biological sample. In certain embodiments, 2,500 or more biological markers are assayed.
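By way of non-limiting illustration, the identification of genomic aberrations appearing at a frequency of less than about 5% might be sketched as follows. The sketch assumes that sequencing reads have already been aligned and summarized into per-site counts of variant-supporting and total reads; the `SiteCounts` layout, the minimum supporting-read requirement, and the example counts and positions are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    # Hypothetical per-site summary of aligned sequencing reads.
    gene: str
    position: int
    alt_reads: int    # reads supporting the aberration
    total_reads: int  # all reads covering the site


def allele_fraction(site):
    """Fraction of covering reads that support the aberration."""
    if site.total_reads <= 0:
        raise ValueError("site has no coverage")
    return site.alt_reads / site.total_reads


def low_frequency_aberrations(sites, max_fraction=0.05, min_alt_reads=5):
    """Report sites with enough support whose frequency is below the cutoff.

    min_alt_reads is an assumed guard against sequencing noise.
    """
    return [s for s in sites
            if s.alt_reads >= min_alt_reads
            and allele_fraction(s) < max_fraction]


# Illustrative counts only; positions are placeholders.
sites = [
    SiteCounts("EGFR", 100, 20, 1000),   # 2%: reported
    SiteCounts("TP53", 200, 300, 1000),  # 30%: above the cutoff
    SiteCounts("KRAS", 300, 2, 1000),    # too few supporting reads
]
calls = low_frequency_aberrations(sites)
```

The depth required to support such calls is one reason the enrichment step matters: a higher on-target rate concentrates reads on the regions being interrogated.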
In certain aspects, the disclosure provides a system for providing a subject displaying cancer with a therapy, comprising: one or more computer memories comprising (i) biologic data of the subject, which biologic data is generated from one or more biological samples of the subject, or (ii) medical history data of the subject; and one or more computer processors operatively coupled to one or more databases of therapies, wherein the one or more computer processors are individually or collectively programmed to: (i) receive medical history data and biologic data for the subject, which biologic data is generated from one or more biological samples of the subject by automated handling from insertion into an automated system using at least one of the following steps of cell extraction, nucleic acid extraction, enrichment, sequencing, and immunohistochemistry, during processing of the one or more biological samples; (ii) analyze the medical history data and the biologic data to yield a genomic-based medical history analysis for the subject; (iii) use the genomic-based medical history analysis for the subject to query one or more databases of therapies for the subject, to generate a subset of therapies for which the subject qualifies; and (iv) electronically output the subset of therapies on a user interface for display to a user.
In certain embodiments, the one or more computer processors receive the biologic data or the medical history data over a network. In certain embodiments, the system for providing a subject displaying cancer with a therapy further comprises a sequencer that subjects the one or more biological samples to sequencing to generate the biologic data.
In certain aspects, the disclosure provides a non-transitory computer-readable medium comprising machine-executable code that, upon execution by one or more computer processors, implements a method for providing a subject displaying cancer with a therapy, comprising: (a) receiving medical history data and biologic data for the subject, which biologic data is generated from one or more biological samples of the subject by automated handling from insertion into an automated system using at least one of the following steps of cell extraction, nucleic acid extraction, enrichment, sequencing, and immunohistochemistry, during processing of the one or more biological samples; (b) analyzing the medical history data and the biologic data to yield a genomic-based medical history analysis for the subject; (c) using the genomic-based medical history analysis for the subject to query one or more databases of therapies for the subject, to generate a subset of therapies for which the subject qualifies; and (d) electronically outputting the subset of therapies on a user interface for display to a user.
In certain aspects, the disclosure provides a method for qualifying a subject for a subset of therapies, comprising: (a) subjecting at least one biological sample from the subject to at least one assay to generate biologic data from the subject; (b) processing the biologic data from the subject against a filtered set of therapies to generate the subset of therapies for which the subject qualifies, which filtered set of therapies is generated by computer assessing eligibility of a database of therapies against one or more criteria; (c) presenting the subset of therapies on a user interface on an electronic device of a user; and (d) transmitting medical history data of the subject to one or more therapy coordinators of the subset of therapies. In certain embodiments, the biologic data is generated from at least one biological sample of the subject by an automated assaying system, which automated assaying system uses automated processing for at least one member selected from the group consisting of cell extraction, nucleic acid extraction, enrichment, sequencing, and immunohistochemistry, during processing of the at least one biological sample.
In certain aspects, the disclosure provides a computer-implemented method for providing a subject displaying cancer with a therapy, comprising: (a) receiving biologic data for the subject, which biologic data is generated from one or more biological samples of the subject; (b) using the biologic data to generate a first list of therapies according to a molecular profile of the subject, which molecular profile is indicative of one or more genomic aberrations in one or more biological samples; (c) generating a second list of therapies from the first list of therapies using medical history data of the subject; and (d) electronically outputting the second list of therapies. In certain embodiments, prior to (c), medical history data is received for the subject. In certain embodiments, prior to (c), the medical history data is processed and transformed to provide processed medical history data. In certain embodiments, the processing is selected from the group consisting of cleaning, organizing, and labeling. In certain embodiments, the processed medical history data is presented to the subject. In certain embodiments, the list of therapies comprises clinical trials and/or standard of care.
In certain embodiments, the computer-implemented method for providing a subject displaying cancer with a therapy further comprises presenting the second list of therapies on a user interface for display to the subject. In certain embodiments, the computer-implemented method for providing a subject displaying cancer with a therapy further comprises presenting the second list of therapies to a clinician to select for a recommended therapy. In certain embodiments, the computer-implemented method for providing a subject displaying cancer with a therapy further comprises receiving a request for enrollment of the subject in a given therapy selected from the second list of therapies.
In certain embodiments, the biologic data is generated from one or more biological samples of the subject without any pipetting by a user during preparation of one or more biological samples. In certain embodiments, the biologic data comprises data generated from one or more biological samples selected from the group consisting of protein, peptides, cell-free nucleic acids, ribonucleic acids, deoxyribose nucleic acids, and any combination thereof. In certain embodiments, one or more genomic aberrations include nucleic acid mutations and/or differentially expressed proteins. In certain embodiments, nucleic acid mutations are selected from the group consisting of nucleotide insertion(s), nucleotide deletion(s), nucleotide substitution(s), amino acid insertion(s), amino acid deletion(s), amino acid substitution(s), gene fusion(s), and copy-number variation(s). In certain embodiments, the nucleic acid mutations are selected from genes and variants of Table 1.
In certain embodiments, (b) of the computer-implemented method for providing a subject displaying cancer with a therapy comprises querying one or more databases for one or more targeted clinical trials and therapies according to a predetermined gene or genomic region. In certain embodiments, the first list of therapies in (b) excludes therapies that target genomic aberrations absent in one or more biological samples. In certain embodiments, (b) comprises removing therapies that target genomic aberrations absent in one or more biological samples. In certain embodiments, the first list of therapies in (b) is filtered according to clinical phases of the therapy.
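By way of non-limiting illustration, generating the first list of therapies by querying according to a predetermined gene or genomic region, excluding therapies whose targets are absent from the sample, and filtering by clinical phase might be sketched as follows. The record layouts, the inclusive coordinate convention, the default allowed phases, and all names and coordinates in the example are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Aberration:
    gene: str
    chrom: str
    position: int


@dataclass(frozen=True)
class TargetedTherapy:
    name: str
    phase: int
    gene: Optional[str] = None                     # predetermined gene
    region: Optional[Tuple[str, int, int]] = None  # (chrom, start, end), inclusive


def targets(therapy, aberration):
    """True if the aberration falls in the therapy's gene or region."""
    if therapy.gene is not None and therapy.gene == aberration.gene:
        return True
    if therapy.region is not None:
        chrom, start, end = therapy.region
        return chrom == aberration.chrom and start <= aberration.position <= end
    return False


def first_list(therapies, aberrations, allowed_phases=frozenset({2, 3})):
    """Keep therapies in an allowed phase that target a detected aberration."""
    return [t for t in therapies
            if t.phase in allowed_phases
            and any(targets(t, a) for a in aberrations)]


# Illustrative data only; coordinates are placeholders.
detected = [Aberration("EGFR", "chr7", 5500)]
catalog = [
    TargetedTherapy("Trial A", phase=2, gene="EGFR"),
    TargetedTherapy("Trial B", phase=3, region=("chr7", 5000, 6000)),
    TargetedTherapy("Trial C", phase=2, gene="BRAF"),   # target absent
    TargetedTherapy("Trial D", phase=1, gene="EGFR"),   # phase filtered out
]
matched = first_list(catalog, detected)
```

Therapies whose targets are not found among the detected aberrations never enter the first list, so the later medical-history step only has to consider molecularly plausible candidates.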
In certain embodiments, the medical history data is identifiable according to relevant medical text segments. In certain embodiments, machine learning algorithms are further used to detect and label relevant medical text segments.
In certain embodiments, (c) of the computer-implemented method for providing a subject displaying cancer with a therapy comprises determining ineligible therapies according to a categorical score and rejecting ineligible therapies from remaining therapies to generate a filtered list of remaining therapies. In certain embodiments, the categorical score is selected from the group consisting of yes, maybe, and no. In certain embodiments, the filtered list of remaining therapies is compared and reviewed. The review may generate a second list of therapies. The second list of therapies may be passed to a user to manually verify eligibility using links to information from the medical history data and the biologic data for the subject. In certain embodiments, the user is a healthcare professional. In certain embodiments, the user is a primary care provider of the subject.
In certain embodiments, the computer-implemented method for providing a subject displaying cancer with a therapy further comprises filtering the second list of therapies based on filtering preferences of a user. The user may be the subject. In certain embodiments, the filtering preferences are selected from the group consisting of availability at a specific institution, availability at a set of institutions, type of treatment, phase of clinical trial, method of drug delivery, location and distance of a given therapy from a specified location, duration of treatment, and subject relocation therapy duration. In certain embodiments, the filtering further comprises an evaluation by a healthcare professional and a selection for a recommended therapy. In certain embodiments, the second list of therapies is generated from the first list of therapies without use of the molecular profile of the subject. In certain embodiments, the computer-implemented method for providing a subject displaying cancer with a therapy further comprises, prior to (a), subjecting one or more biological samples of the subject to sequencing to generate the biologic data.
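By way of non-limiting illustration, filtering the second list of therapies by user preferences such as type of treatment, method of drug delivery, and distance from a specified location might be sketched as follows. The record layout, the preference parameters, and the example sites are hypothetical; the use of the haversine great-circle formula with a mean Earth radius of 6371 km to measure distance is an assumption and is not prescribed by the disclosure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TherapySite:
    # Hypothetical record layout for a therapy offered at one location.
    name: str
    treatment_type: str
    delivery: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def apply_preferences(therapies, home, max_km=None,
                      treatment_types=None, delivery_methods=None):
    """Keep therapies satisfying every preference that was supplied.

    home is a (lat, lon) pair; a preference left as None is not applied.
    """
    kept = []
    for t in therapies:
        if treatment_types is not None and t.treatment_type not in treatment_types:
            continue
        if delivery_methods is not None and t.delivery not in delivery_methods:
            continue
        if max_km is not None and haversine_km(home[0], home[1], t.lat, t.lon) > max_km:
            continue
        kept.append(t)
    return kept


# Illustrative data only.
second_list = [
    TherapySite("Trial A", "targeted", "oral", 40.0, -75.0),
    TherapySite("Trial B", "targeted", "intravenous", 40.0, -75.0),
    TherapySite("Trial C", "targeted", "oral", 34.0, -118.0),
]
nearby_oral = apply_preferences(second_list, home=(40.0, -75.0),
                                max_km=100, delivery_methods={"oral"})
```

Treating each preference as optional allows the same function to serve a subject who cares only about distance and a clinician who wishes to restrict by treatment type alone.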
In certain aspects, the disclosure provides a method for identifying a genomic aberration in one or more biological samples of a subject, comprising: (a) obtaining one or more biological samples of the subject, which one or more biological samples comprise a nucleic acid sample that has or is suspected of having one or more genomic aberration(s) that appears at a frequency of less than about 5% in the nucleic acid sample; (b) enriching the nucleic acid sample for a plurality of nucleic acid sequences to provide an enriched nucleic acid sample using a probe set comprising probes that have an on-target rate as a group of at least about 95%, as determined by (i) comparing the probe set to at least one predetermined region to measure (1) probe coverage of each probe in the probe set and (2) off-target probe coverage for each probe in the probe set, and (ii) determining the on-target rate of the probe set based on a ratio of the off-target coverage to the probe coverage; (c) sequencing the enriched nucleic acid sample to generate sequencing reads; and (d) processing the sequencing reads to identify one or more genomic aberration(s) in one or more biological samples of the subject that appears at a frequency of less than about 5% in the nucleic acid sample. In certain embodiments, one or more biological samples comprise a blood sample and/or a tissue sample. In certain embodiments, the tumor tissue sample is formalin-fixed, paraffin-embedded (FFPE) tissue. In certain embodiments, one or more biological samples are selected from the group consisting of protein, peptides, cell-free nucleic acids, ribonucleic acids, deoxyribonucleic acids, and any combination thereof. In certain embodiments, one or more genomic aberrations include nucleic acid mutations.
In certain embodiments, one or more genomic aberrations are selected from the group consisting of a nucleotide insertion, nucleotide deletion, nucleotide substitution, amino acid insertion, amino acid deletion, amino acid substitution, gene fusion, copy-number variation, gene expression signature, and any combination thereof.
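By way of non-limiting illustration, identifying an aberration that appears at a frequency of less than about 5% may be sketched by computing a variant allele fraction from read counts. The function names, the variant labels, and the minimum read-support threshold are assumptions made for the example and are not specified in the disclosure.

```python
def variant_allele_fraction(alt_reads, total_reads):
    """Fraction of reads at a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads

def low_frequency_calls(pileup, max_fraction=0.05, min_alt_reads=5):
    """Return variants whose allele fraction is below max_fraction.

    pileup maps a variant label to (alt_reads, total_reads).
    min_alt_reads is a hypothetical guard against calling sequencing noise.
    """
    calls = {}
    for variant, (alt, total) in pileup.items():
        vaf = variant_allele_fraction(alt, total)
        if alt >= min_alt_reads and vaf < max_fraction:
            calls[variant] = vaf
    return calls

# Hypothetical read counts, for illustration only.
pileup = {
    "var_A": (12, 1000),   # low-frequency variant with adequate support
    "var_B": (300, 1000),  # frequency well above the 5% ceiling
    "var_C": (2, 1000),    # too few supporting reads
}
calls = low_frequency_calls(pileup)
print(sorted(calls))
```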
In certain embodiments, the method for identifying a genomic aberration in one or more biological samples of a subject, further comprises using the probe set to generate a classifier for identifying the genomic aberration, which classifier is at least in part generated by: sequencing one or more predetermined regions of a genome from a tumor tissue sample of the subject to provide sequencing reads; in the sequencing reads, identifying sequences for the probe set that covers the one or more predetermined regions of a genome; comparing the probe set to one or more predetermined regions to measure (i) probe coverage of each probe in the probe set and (ii) off-target probe coverage for each probe in the probe set; determining an on-target rate of the probe set based on a ratio of the off-target coverage to the probe coverage; selecting a portion of the probe set that covers one or more predetermined regions of a genome and a portion of the probe set with an on-target rate of at least 95% in aggregate, thereby determining a custom probe set; and providing one or more features to permit classification of the probe set for one or more probes.
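By way of non-limiting illustration, the on-target rate determination and the selection of a custom probe set may be sketched as follows. The disclosure states only that the on-target rate is based on a ratio of off-target coverage to probe coverage; the sketch assumes the rate is one minus that ratio, aggregated over the probe set, and uses a simple greedy removal of the worst probe as a stand-in for the selection step. The additional requirement that the selected probes still cover the predetermined regions is omitted here, and all names and counts are hypothetical.

```python
def on_target_rate(probes):
    """Aggregate on-target rate for (probe_coverage, off_target_coverage) pairs.

    Assumption: rate = 1 - (total off-target coverage / total probe coverage).
    """
    probes = list(probes)
    total = sum(cov for cov, _ in probes)
    off = sum(off_cov for _, off_cov in probes)
    if total <= 0:
        raise ValueError("total probe coverage must be positive")
    return 1.0 - off / total

def select_custom_probe_set(probes, min_rate=0.95):
    """Drop the probe with the highest off-target fraction until the
    aggregate on-target rate reaches min_rate.

    probes maps a probe name to (probe_coverage, off_target_coverage);
    every probe is assumed to have positive coverage.
    """
    kept = dict(probes)
    while kept and on_target_rate(kept.values()) < min_rate:
        worst = max(kept, key=lambda name: kept[name][1] / kept[name][0])
        del kept[worst]
    return kept

# Hypothetical coverage counts, for illustration only.
probe_set = {"p1": (1000, 10), "p2": (1000, 20), "p3": (1000, 400)}
custom = select_custom_probe_set(probe_set)
print(sorted(custom), on_target_rate(custom.values()))
```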
In certain embodiments, the classifier is used to identify a new set of probes, at least in part by: generating one or more features from the new set of probes; inputting one or more features from the new set of probes into the classifier; and using the classifier to predict a classification outcome for the new set of probes. In certain embodiments, one or more features are selected from the group consisting of sequence, sequence length, alignment location, probe coverage, off-target probe coverage, on-target rate, genomic aberrations, genes, and variants of the genes. In certain embodiments, one or more features are selected from Table 1. In certain embodiments, the classification outcome is selected from a first outcome and a second outcome, wherein the first outcome directs a user to order the new set of probes and the second outcome does not direct the user to order the new set of probes.
In certain embodiments, the one or more predetermined region(s) comprise one or more components selected from the group consisting of one or more segments of a gene, one or more segments of a plurality of genes, coding sequences, non-coding sequences, at least 2600 genes, gene fusions, point mutations, indels, copy-number variations, promoters, and enhancers. In certain embodiments, the sequencing is selected from the group consisting of exome sequencing, transcriptome sequencing, genome sequencing, and cell-free DNA sequencing. In certain embodiments, the genome sequencing is targeted sequencing. In certain embodiments, the genome sequencing is untargeted sequencing.
In certain aspects, the disclosure provides a system for providing a subject displaying cancer with a therapy, comprising: one or more computer memories comprising (i) biologic data of the subject, which biologic data is generated from one or more biological samples of the subject, or (ii) medical history data of the subject; and one or more computer processors operatively coupled to the database, wherein one or more computer processors are individually or collectively programmed to: (i) receive biologic data of the subject from the database; (ii) use the biologic data to generate a first list of therapies according to a molecular profile of the subject, which molecular profile is indicative of one or more genomic aberrations in one or more biological samples; (iii) generate a second list of therapies from the first list of therapies using medical history data of the subject; and (iv) electronically output the second list of therapies.
In certain embodiments, the one or more computer memories comprise the biologic data of the subject and the medical history data of the subject. In certain embodiments, one or more computer processors receive the biologic data or the medical history data over a network. In certain embodiments, the system for providing a subject displaying cancer with a therapy further comprises a sequencer that subjects one or more biological samples to sequencing to generate the biologic data.
In certain aspects, the disclosure provides a non-transitory computer-readable medium comprising machine-executable code that, upon execution by one or more computer processors, implements a method for providing a subject displaying cancer with a therapy, comprising: (a) receiving biologic data for the subject, which biologic data is generated from one or more biological samples of the subject; (b) using the biologic data to generate a first list of therapies according to a molecular profile of the subject, which molecular profile is indicative of one or more genomic aberrations in one or more biological samples; (c) generating a second list of therapies from the first list of therapies using medical history data of the subject; and (d) electronically outputting the second list of therapies.
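By way of non-limiting illustration, steps (b) and (c) may be sketched as a two-stage filter: a first list is generated by matching therapies to the molecular profile, and a second list is generated from the first using medical history data. The `Therapy` fields, the eligibility rules (a minimum age and excluded prior treatments), and all values are hypothetical placeholders and are not drawn from the disclosure.

```python
from dataclasses import dataclass

# Hypothetical therapy record; the field names are assumptions.
@dataclass(frozen=True)
class Therapy:
    name: str
    target_aberrations: frozenset
    min_age: int = 0
    excluded_prior_treatments: frozenset = frozenset()

def first_list(therapies, molecular_profile):
    """Step (b): keep therapies that target an aberration in the profile."""
    return [t for t in therapies if t.target_aberrations & molecular_profile]

def second_list(candidates, age, prior_treatments):
    """Step (c): narrow the first list using medical history data only."""
    return [
        t for t in candidates
        if age >= t.min_age
        and not (t.excluded_prior_treatments & prior_treatments)
    ]

therapies = [
    Therapy("Therapy X", frozenset({"aberration_1"})),
    Therapy("Therapy Y", frozenset({"aberration_2"}), min_age=18,
            excluded_prior_treatments=frozenset({"drug_q"})),
    Therapy("Therapy Z", frozenset({"aberration_3"})),
]
profile = {"aberration_1", "aberration_2"}
first = first_list(therapies, profile)
second = second_list(first, age=54, prior_treatments={"drug_q"})
print([t.name for t in first], [t.name for t in second])
```

Note that `second_list` does not take the molecular profile as an input, which corresponds to the embodiment in which the second list is generated without use of the molecular profile of the subject.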
In certain aspects, the disclosure provides a computer-implemented method for qualifying a subject for a clinical trial, comprising: (a) receiving medical history data and biologic data for the subject, which biologic data is generated from one or more biological samples of the subject without any pipetting by a user during preparation of the one or more biological samples; (b) querying one or more databases for one or more clinical trials corresponding to the medical history data and the biologic data for the subject to generate a set of clinical trials for which the subject qualifies, which set of clinical trials comprises at least one clinical trial; (c) providing the set of clinical trials on a user interface for display to a user; and (d) receiving a request for enrollment of the subject in a clinical trial selected from the provided set of clinical trials through the user interface.
In certain embodiments, (a) comprises receiving phenotype information for the subject. In certain embodiments, the phenotype information comprises one or more of age, weight, height, sex, race, body mass index (BMI), previous treatments and response, Eastern Cooperative Oncology Group (ECOG) score, and diagnosis. In certain embodiments, the computer-implemented method for qualifying a subject further comprises automatically generating the biologic data from the one or more biological samples of the subject without any involvement of the user. In certain embodiments, the computer-implemented method for qualifying a subject further comprises prioritizing the one or more clinical trials within the generated set of clinical trials. In certain embodiments, prioritizing is based on one or more factors selected from the group consisting of geographic location of the clinical trial, regulatory approval status, annotated medical history data for the subject, and any combination thereof. In certain embodiments, the computer-implemented method for qualifying a subject further comprises enrolling the subject in the clinical trial. In certain embodiments, the computer-implemented method for qualifying a subject further comprises (e) monitoring the subject enrolled in the clinical trial by assaying the one or more biological samples from the subject, wherein assaying is directed to 100 or more genes or variants thereof selected from Table 1. In certain embodiments, the computer-implemented method for qualifying a subject further comprises predicting a likelihood of success for the subject. In certain embodiments, the one or more clinical trials are annotated. In certain embodiments, the querying of (b) has a predicted likelihood of matching to a clinical trial of at least about 90%. In certain embodiments, the request is received over a network. In certain embodiments, the one or more biological samples comprise a blood sample.
In certain embodiments, one or more biological samples comprise a tumor tissue sample and a normal tissue sample. In certain embodiments, the tumor tissue sample is a formalin-fixed paraffin embedded (FFPE) tissue sample. In certain embodiments, the receiving of (a) comprises receiving (i) a first biological sample from the tumor tissue sample of the subject, and (ii) a second biological sample from the normal tissue sample of the subject, and assaying the first biological sample and the second biological sample to identify the one or more biological markers in the tumor tissue sample relative to the normal tissue sample to generate a set of biologic data for the subject. In certain embodiments, one or more biological samples are assayed for a presence or absence of biological markers at a concordance correlation coefficient of greater than or equal to about 90% when the biological sample is re-assayed for the presence or absence of the biological markers, which biological markers include a plurality of different types of biological markers. In certain embodiments, the plurality of different types of biological markers are selected from the group consisting of one or more nucleotide insertions, nucleotide deletions, nucleotide substitutions, amino acid insertions, amino acid deletions, amino acid substitutions, gene fusions, copy-number variations, and any combination thereof. In certain embodiments, assaying is directed to two or more genes or variants thereof selected from Table 1. In certain embodiments, assaying is directed to 100 or more genes or variants thereof selected from Table 1. In certain embodiments, the assaying covers at least 2,500 genes, gene fusions, point mutations, indels, copy-number variations, promoters, and/or enhancers. 
In certain embodiments, the biologic data comprises one or more genomic alterations selected from the group consisting of one or more nucleotide insertions, nucleotide deletions, nucleotide substitutions, amino acid insertions, amino acid deletions, amino acid substitutions, gene fusions, copy-number variations, and any combination thereof. In certain embodiments, the biologic data comprises data from one or more biological sample components selected from the group consisting of: protein, peptides, cell-free nucleic acids, ribonucleic acids, deoxyribonucleic acids, and any combination thereof.
In certain embodiments, the subject is diagnosed with a solid tumor or cancer. In certain embodiments, the medical history data is automatically annotated. In certain embodiments, the medical history data is annotated in standardized terminology. In certain embodiments, the standardized terminology is the Unified Medical Language System (UMLS). In certain embodiments, the user interface is a web-based user interface or mobile user interface. In certain embodiments, the biologic data is automatically generated from one or more biological samples of the subject without any involvement of the user during the preparation.
In certain aspects, the disclosure provides a method for qualifying a subject for a clinical trial, comprising: (a) receiving (i) a first nucleic acid sample from a tumor tissue sample of the subject, and (ii) a second nucleic acid sample from a normal tissue sample of the subject; (b) assaying the first nucleic acid sample and the second nucleic acid sample to identify one or more genomic alterations in the tumor tissue sample relative to the normal tissue sample to generate a set of genomic data for the subject, wherein the assaying is performed without any pipetting by a user during preparation of the first nucleic acid sample and the second nucleic acid sample prior to identifying the one or more genomic alterations; (c) querying one or more databases for one or more clinical trials corresponding to a medical history of the subject and the genomic data to generate a set of clinical trials for which the subject qualifies; and (d) providing the set of clinical trials on a user interface for display to a user.
In certain embodiments, the method for qualifying a subject further comprises receiving medical history data for the subject. In certain embodiments, the method for qualifying a subject further comprises (e) receiving a request for enrollment of the subject in a clinical trial selected from the provided set of clinical trials through the user interface. In certain embodiments, the method for qualifying a subject further comprises identifying a therapeutic target based on the medical history and the genomic data and enrolling the subject in a clinical trial based on the identified target. In certain embodiments, the method for qualifying a subject further comprises monitoring the subject, the monitoring comprising assaying one or more nucleic acid samples to generate genomic data, wherein the assaying is directed to 100 or more genes or variants thereof selected from Table 1. In certain embodiments, the normal tissue sample comprises blood. In certain embodiments, the tumor tissue sample is formalin-fixed, paraffin-embedded (FFPE) tissue.
In certain embodiments, assaying is directed to two or more genes or variants thereof selected from Table 1. In certain embodiments, assaying is directed to 100 or more genes or variants thereof selected from Table 1. In certain embodiments, assaying covers at least 2,500 genes, gene fusions, point mutations, indels, copy-number variations, promoters, and/or enhancers. In certain embodiments, the first nucleic acid sample comprises cell-free DNA. In certain embodiments, 100 or more genes are assayed in the cell-free DNA. In certain embodiments, assaying comprises sequencing the first nucleic acid sample and the second nucleic acid sample. In certain embodiments, sequencing is performed without any involvement from the user. In certain embodiments, assaying further comprises receiving a request from the user to sequence the biological sample. In certain embodiments, the sequencing is selected from the group consisting of exome sequencing, transcriptome sequencing, genome sequencing, and cell-free DNA sequencing. In certain embodiments, the first nucleic acid sample and second nucleic acid sample are assayed for one or more genomic alterations at a concordance correlation coefficient of greater than or equal to about 90% when the first nucleic acid sample and second nucleic acid sample are re-assayed for the presence or absence of the genomic alterations, which genomic alterations include a plurality of different types of genomic alterations. In certain embodiments, the types of genomic alteration are selected from the group consisting of: nucleotide insertions, nucleotide deletions, nucleotide substitutions, gene fusions, and copy-number variations. In certain embodiments, the method for qualifying a subject further comprises receiving a request from the user to sequence the first nucleic acid sample and the second nucleic acid sample. 
In certain embodiments, assaying comprises subjecting the first nucleic acid sample and the second nucleic acid sample to sequencing to detect at least 5 genes or variants thereof selected from Table 1. In certain embodiments, the assaying comprises subjecting the first nucleic acid sample and the second nucleic acid sample to sequencing to detect at least 10 genes or variants thereof selected from Table 1. In certain embodiments, assaying comprises subjecting the first nucleic acid sample and the second nucleic acid sample to sequencing to detect at least 15 genes or variants thereof selected from Table 1. In certain embodiments, the assaying comprises subjecting the first nucleic acid sample and the second nucleic acid sample to sequencing to detect at least 20 genes or variants thereof selected from Table 1. In certain embodiments, the assaying comprises subjecting the first nucleic acid sample and the second nucleic acid sample to sequencing to detect at least 30 genes or variants thereof selected from Table 1. In certain embodiments, the assaying comprises subjecting the first nucleic acid sample and the second nucleic acid sample to sequencing to detect at least 40 genes or variants thereof selected from Table 1. In certain embodiments, the first nucleic acid sample and second nucleic acid sample are obtained from the tumor tissue sample and the normal tissue sample without any pipetting by the user. In certain embodiments, the first nucleic acid sample and second nucleic acid sample are obtained from the tumor tissue sample and the normal tissue sample automatically without any involvement from the user.
In certain aspects, the disclosure provides a method for analyzing a biological sample of a subject, comprising assaying the biological sample for a presence or absence of biological markers at a concordance correlation coefficient of greater than or equal to about 90% and an accuracy of at least about 90% as compared to a control, when the biological sample is re-assayed for the presence or absence of the biological markers, which biological markers include a plurality of different types of biological markers, wherein the assaying comprises a plurality of different assays, including sequencing.
In certain embodiments, the biological sample is a tumor tissue sample. In certain embodiments, the biological sample is homogeneous. In certain embodiments, the biological sample is a blood sample comprising plasma and a buffy coat. In certain embodiments, the biological sample comprises tumor tissue and whole blood from the subject. In certain embodiments, the biological sample comprises nucleic acid molecules. In certain embodiments, the biological sample comprises cell-free deoxyribonucleic acid (cfDNA) molecules, cellular deoxyribonucleic acid (cDNA) molecules, ribonucleic acid (RNA) molecules, and protein, and wherein the cfDNA molecules, the cDNA molecules, and the RNA molecules are assayed for the presence or absence of the biological markers. In certain embodiments, the biological sample comprises normal biomolecules and abnormal biomolecules. In certain embodiments, the normal biomolecules are isolated from a buffy coat of the biological sample. In certain embodiments, the abnormal biomolecules are isolated from plasma or a tumor tissue of the biological sample. In certain embodiments, assaying the biological sample comprises comparing the normal biomolecules to the abnormal biomolecules.
In certain embodiments, the biological sample is a single cell. In certain embodiments, the biological sample is indexed. In certain embodiments, the method for analyzing a biological sample of a subject further comprises re-assaying the biological sample at a later point in time and identifying a change in one or more biological markers. In certain embodiments, assaying comprises processing the biological sample or sequencing the biological sample without any involvement from a user during sample preparation. In certain embodiments, sequencing is selected from the group consisting of exome sequencing, transcriptome sequencing, genome sequencing, and cell-free DNA sequencing. In certain embodiments, assaying begins after a user inputs the biological sample. In certain embodiments, assaying comprises immunohistochemistry profiling and genomic profiling of the biological sample. In certain embodiments, the method for analyzing a biological sample of a subject further comprises receiving a request from the user to process the biological sample or sequence the biological sample. In certain embodiments, the plurality of different types of biological markers are selected from the group consisting of one or more nucleotide insertions, nucleotide deletions, nucleotide substitutions, amino acid insertions, amino acid deletions, amino acid substitutions, gene fusions, copy-number variations, and any combination thereof. In certain embodiments, 2,500 or more biological markers are assayed. In certain embodiments, assaying comprises assaying 100 or more biological markers in cell-free DNA of the biological sample. In certain embodiments, the plurality of different types of biological markers comprises antigens and genetic alterations.
In certain embodiments, the method for analyzing a biological sample of a subject further comprises selecting a clinical trial based on the presence or absence of biological markers. In certain embodiments, the control is a healthy control. In certain embodiments, the control is from the subject. In certain embodiments, the assaying includes performing an assay that is not sequencing. In certain embodiments, the assaying is at a concordance correlation coefficient of greater than or equal to about 90% and an accuracy of at least about 90% based on assaying the biological sample multiple times. In certain embodiments, the assaying is at a concordance correlation coefficient of greater than or equal to about 90% and an accuracy of at least about 90% based on assaying the biological sample in at least two different geographic locations. In certain embodiments, the concordance correlation coefficient is greater than or equal to about 95%. In certain embodiments, the concordance correlation coefficient is greater than or equal to about 99%. In certain embodiments, the assaying comprises retrieving the biological sample and processing the biological sample, which processing is in the absence of pipetting.
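By way of non-limiting illustration, agreement between an assay and a re-assay of the same biological sample may be sketched as follows. The disclosure does not give a formula for the concordance correlation coefficient; the sketch assumes Lin's concordance correlation coefficient, computed with population (1/n) moments, and applies it to presence/absence calls encoded as 1 and 0, which is itself an assumption. Accuracy is taken to be the fraction of markers whose call matches the control. The call vectors are hypothetical.

```python
from statistics import fmean

def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two paired runs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("runs must be paired and contain at least two calls")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mx - my) ** 2
    if denom == 0:
        raise ValueError("coefficient is undefined when both runs are constant and equal")
    return 2 * cov_xy / denom

def accuracy(calls, control):
    """Fraction of marker calls that match the control."""
    if len(calls) != len(control) or not calls:
        raise ValueError("calls and control must be paired and non-empty")
    return sum(c == k for c, k in zip(calls, control)) / len(calls)

# Hypothetical presence (1) / absence (0) calls for ten markers.
run_1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
run_2 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # one discordant call on re-assay
ccc = concordance_correlation(run_1, run_2)
acc = accuracy(run_2, run_1)
print(ccc, acc)
```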
In certain aspects, the disclosure provides a method for identifying one or more somatic mutations in a subject, comprising: (a) obtaining a tumor biological sample and normal biological sample from the subject; (b) assaying the tumor biological sample and the normal biological sample to (i) obtain sequence information for a first nucleic acid sample and a second nucleic acid sample obtained from the tumor biological sample and the normal biological sample, respectively, without any pipetting by a user during preparation of the first nucleic acid sample and the second nucleic acid sample prior to sequencing, and (ii) identify one or more other biological markers of a type different than the first nucleic acid sample and the second nucleic acid sample; (c) comparing the sequence information obtained for the first nucleic acid sample and the second nucleic acid sample to identify one or more genomic alterations in the tumor biological sample relative to the normal biological sample; and (d) using the (i) one or more other biological markers identified in (b) and (ii) the one or more genomic alterations identified in (c) to identify the one or more somatic mutations in the subject at an accuracy of at least about 90% as compared to a control.
In certain embodiments, the first nucleic acid sample and the second nucleic acid sample are automatically obtained from the tumor biological sample and the normal biological sample, respectively. In certain embodiments, the first nucleic acid sample and the second nucleic acid sample are automatically obtained from the tumor biological sample and the normal biological sample, respectively, without any involvement of the user during the preparation. In certain embodiments, the method for identifying one or more somatic mutations further comprises prior to (b), automatically obtaining (i) the first nucleic acid sample from the tumor biological sample of the subject and (ii) the second nucleic acid sample from the normal biological sample of the subject, without any involvement from the user. In certain embodiments, the tumor biological sample and the normal biological sample are obtained from a sample of blood comprising plasma and buffy coat from the subject. In certain embodiments, the first nucleic acid sample is obtained from cell-free DNA in the plasma. In certain embodiments, the tumor biological sample is a formalin-fixed paraffin embedded (FFPE) tissue sample. In certain embodiments, the normal biological sample is a buffy coat sample. In certain embodiments, the sequencing is selected from the group consisting of exome sequencing, transcriptome sequencing, genome sequencing, and cell-free DNA sequencing. In certain embodiments, the cell-free DNA sequencing comprises mismatch targeted sequencing (Mita-Seq) or tethered elimination of termini (Tet-Seq). In certain embodiments, the method for identifying one or more somatic mutations further comprises receiving a request from the user to sequence the first nucleic acid sample and the second nucleic acid sample. In certain embodiments, the sequencing covers at least 2,500 genes, gene fusions, point mutations, indels, copy-number variations, promoters, and/or enhancers. 
In certain embodiments, the sequencing is directed to two or more genes or variants thereof selected from Table 1. In certain embodiments, the sequencing is directed to 100 or more genes or variants thereof selected from Table 1. In certain embodiments, the one or more genomic alterations are selected from the group consisting of one or more nucleotide insertions, nucleotide deletions, nucleotide substitutions, amino acid insertions, amino acid deletions, amino acid substitutions, gene fusions, copy-number variations, and any combination thereof.
In certain embodiments, the subject is diagnosed with a solid tumor or cancer. In certain embodiments, the method for identifying one or more somatic mutations further comprises indexing the first nucleic acid sample and the second nucleic acid sample. In certain embodiments, the first nucleic acid sample and the second nucleic acid sample are assayed for one or more genomic alterations at a concordance correlation coefficient of greater than or equal to about 90% when the first nucleic acid sample and the second nucleic acid sample are re-assayed for the presence or absence of the genomic alterations, which genomic alterations include a plurality of different types of genomic alterations. In certain embodiments, the types of genomic alterations are selected from the group consisting of: nucleotide insertions, nucleotide deletions, nucleotide substitutions, gene fusions, and copy-number variations. In certain embodiments, the one or more genomic alterations are identified at an accuracy of at least about 90%.
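By way of non-limiting illustration, the comparison of step (c) may be sketched as a set difference: variants observed in the tumor biological sample but not in the matched normal biological sample are retained as candidate somatic alterations. The sketch covers only the tumor-versus-normal comparison and does not model the use of other biological markers in step (d). Representing a variant as a (chromosome, position, reference, alternate) tuple, and the values shown, are assumptions made for the example.

```python
def somatic_candidates(tumor_variants, normal_variants):
    """Variants present in the tumor sample and absent from the normal sample.

    Each variant is a hypothetical (chromosome, position, ref, alt) tuple.
    Variants shared with the normal sample are treated as germline.
    """
    return sorted(set(tumor_variants) - set(normal_variants))

# Hypothetical variant calls, for illustration only.
tumor = [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "T")]
normal = [("chr2", 200, "G", "C")]  # shared with tumor, so treated as germline
candidates = somatic_candidates(tumor, normal)
print(candidates)
```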
Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
Another aspect of the present disclosure provides a computer system comprising one or more computer processors and a non-transitory computer readable medium coupled thereto. The non-transitory computer readable medium comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “figure” and “FIG.” herein), of which:
While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
The term “genetic variant,” as used herein, generally refers to an alteration, variant or polymorphism in a nucleic acid sample or genome of a subject. Such alteration, variant or polymorphism can be with respect to a reference genome, which may be a reference genome of the subject or other individual. Single nucleotide polymorphisms (SNPs) are one form of polymorphism. In some examples, one or more polymorphisms comprise one or more single nucleotide variations (SNVs), insertions, deletions, repeats, small insertions, small deletions, small repeats, structural variant junctions, variable length tandem repeats, and/or flanking sequences. Copy number variants (CNVs) and other rearrangements are also forms of genetic variation. A genomic alteration may be or include a base change, insertion, deletion, repeat, copy number variation, or structural rearrangement.
The term “polynucleotide,” as used herein, generally refers to a molecule comprising one or more nucleic acid subunits. A polynucleotide can include one or more subunits selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), or variants thereof. A nucleotide can include A, C, G, T or U, or variants thereof. A nucleotide can include any subunit that can be incorporated into a growing nucleic acid strand. Such subunit can be an A, C, G, T, or U, or any other subunit that is specific to one or more complementary A, C, G, T or U, or complementary to a purine (i.e., A or G, or variant thereof) or a pyrimidine (i.e., C, T or U, or variant thereof). A subunit can enable individual nucleic acid bases or groups of bases (e.g., AA, TA, AT, GC, CG, CT, TC, GT, TG, AC, CA, or uracil-counterparts thereof) to be resolved. In some examples, a polynucleotide is deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), or derivatives thereof. A polynucleotide can be single-stranded or double-stranded.
The term “subject,” as used herein, generally refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant. More specifically, the subject can be a vertebrate, a mammal, a mouse, a primate, a simian or a human. Animals include, but are not limited to, farm animals, sport animals, and pets. A subject can be a healthy individual, an individual that has or is suspected of having a disease or a pre-disposition to the disease, or an individual that is in need of therapy or suspected of needing therapy. A subject can be a patient.
The term “sample,” as used herein, generally refers to any biological sample isolated from a subject. For example, a sample can comprise, without limitation, bodily fluid, whole blood, platelets, serum, plasma, stool, red blood cells, white blood cells or leucocytes, endothelial cells, tissue biopsies, synovial fluid, lymphatic fluid, ascites fluid, interstitial or extracellular fluid, the fluid in spaces between cells, including gingival crevicular fluid, bone marrow, cerebrospinal fluid, pleural fluid, saliva, mucus, sputum, semen, sweat, urine, or any other bodily fluids. A bodily fluid can include saliva, blood, or serum. For example, a polynucleotide can be cell-free DNA and/or cell-free RNA (e.g., transcripts) isolated from a bodily fluid, e.g., blood or serum. A sample can also be a tumor sample, which can be obtained from a subject by various approaches, including, but not limited to, venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage, scraping, surgical incision, or intervention or other approaches.
The term “genome” generally refers to an entirety of an organism's hereditary information. A genome can be encoded either in DNA or in RNA. A genome can comprise coding regions that code for proteins as well as non-coding regions. A genome can include the sequence of all chromosomes together in an organism. For example, the human genome has a total of 46 chromosomes. The sequence of all of these together constitutes a human genome.
As used herein, the term “sequencing” is used in a broad sense and may refer to any technique that allows the order of at least some consecutive nucleotides in at least part of a nucleic acid to be identified, including without limitation at least part of an extension product or a vector insert.
The terms “adaptor(s)”, “adapter(s)” and “tag(s)” are used synonymously throughout this specification. An adaptor or tag can be coupled to a polynucleotide sequence to be “tagged” by any approach including ligation, hybridization, or other approaches. Adaptors may be unidirectional or bidirectional. Adaptors may be blunt-ended or have overhang ends.
The term “sequencing adaptor,” as used herein, generally refers to a molecule (e.g., polynucleotide) that is adapted to permit a sequencing instrument to sequence a target polynucleotide, such as by interacting with the target polynucleotide to enable sequencing. The sequencing adaptor permits the target polynucleotide to be sequenced by the sequencing instrument. In an example, the sequencing adaptor comprises a nucleotide sequence that hybridizes or binds to a capture polynucleotide attached to a solid support of a sequencing system, such as a flow cell. In another example, the sequencing adaptor comprises a nucleotide sequence that hybridizes or binds to a polynucleotide to generate a hairpin loop, which permits the target polynucleotide to be sequenced by a sequencing system. The sequencing adaptor can include a sequencer motif, which can be a nucleotide sequence that is complementary to a flow cell sequence of other molecule (e.g., polynucleotide) and usable by the sequencing system to sequence the target polynucleotide. The sequencer motif can also include a primer sequence for use in sequencing, such as sequencing by synthesis. The sequencer motif can include the sequence(s) needed to couple a library adaptor to a sequencing system and sequence the target polynucleotide.
As used herein the terms “at least”, “at most” or “about”, when preceding a series, refers to each member of the series, unless otherwise identified.
The term “about” and its grammatical equivalents in relation to a reference numerical value can include a range of values up to plus or minus 10% from that value. For example, the amount “about 10” can include amounts from 9 to 11. In other embodiments, the term “about” in relation to a reference numerical value can include a range of values plus or minus 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% from that value.
The term “at least” and its grammatical equivalents in relation to a reference numerical value can include the reference numerical value and greater than that value. For example, the amount “at least 10” can include the value 10 and any numerical value above 10, such as 11, 100, and 1,000.
The term “at most” and its grammatical equivalents in relation to a reference numerical value can include the reference numerical value and less than that value. For example, the amount “at most 10” can include the value 10 and any numerical value under 10, such as 9, 8, 5, 1, 0.5, and 0.1.
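The numeric conventions defined above for “about,” “at least,” and “at most” can be illustrated with a minimal sketch. The function names and the `tolerance` parameter are hypothetical and chosen only for illustration; the default of 10% reflects the first definition of “about” given above, and other embodiments may use a tolerance from 1% to 9%.

```python
def about_range(value, tolerance=0.10):
    """Return the (low, high) interval covered by "about value".

    tolerance is a fraction of the reference value (0.10 means plus or minus 10%).
    """
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


def is_about(x, value, tolerance=0.10):
    """True if x falls within the "about value" interval."""
    low, high = about_range(value, tolerance)
    return low <= x <= high


def is_at_least(x, value):
    """ "At least value" includes the value itself and anything greater."""
    return x >= value


def is_at_most(x, value):
    """ "At most value" includes the value itself and anything smaller."""
    return x <= value
```

Under these assumptions, `about_range(10)` spans 9 to 11, matching the example given for “about 10.”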
The term “label,” as used herein, generally refers to one or more strings of characters. A label may be a text string, a numerical string, an alphanumerical string, or a string of characters. A label may identify a relevant portion of certain biological data, medical history data, or clinical trial data.
The present disclosure provides methods for analyzing a biological sample of a subject and for clinical diagnosis and testing, such as screening (for example, for breast cancer, as is common in women over 50); scans, such as magnetic resonance imaging (MRI) scans or computerized tomography (CT) scans; or body fluid testing (for instance, blood tests).
A subject with a genetic susceptibility may be diagnosed with a specific condition. Such conditions can include cancer, a solid tumor, obesity, autoimmune diseases, heart disease, AIDS (the onset of which is known to occur at different times in otherwise similar individuals), blood pressure control, asthma, diabetes and other chronic diseases. Autoimmune diseases may include hay fever and arthritis. Depression may include conditions such as Major Depression, Dysthymic Disorder, Unspecified Depression, Adjustment Disorder (with Depression) and Bipolar Depression.
The subject may also be diagnosed with cancer, such as acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), adrenocortical carcinoma, Kaposi Sarcoma, anal cancer, basal cell carcinoma, bile duct cancer, bladder cancer, bone cancer, osteosarcoma, malignant fibrous histiocytoma, brain stem glioma, brain cancer, bowel cancer, cancers of the blood, craniopharyngioma, ependymoblastoma, ependymoma, medulloblastoma, medulloepithelioma, pineal parenchymal tumor, breast cancer, bronchial tumor, Burkitt lymphoma, Non-Hodgkin lymphoma, carcinoid tumor, cervical cancer, chordoma, chronic lymphocytic leukemia (CLL), chronic myelogenous leukemia (CML), colon cancer, colorectal cancer, cutaneous T-cell lymphoma, ductal carcinoma in situ, endometrial cancer, esophageal cancer, Ewing Sarcoma, eye cancer, intraocular melanoma, retinoblastoma, fibrous histiocytoma, gallbladder cancer, gastric cancer, glioma, hairy cell leukemia, head and neck cancer, heart cancer, hepatocellular (liver) cancer, Hodgkin lymphoma, hypopharyngeal cancer, kidney cancer, laryngeal cancer, lip cancer, oral cavity cancer, lung cancer, non-small cell carcinoma, small cell carcinoma, melanoma, mouth cancer, myelodysplastic syndromes, multiple myeloma, nasal cavity cancer, paranasal sinus cancer, neuroblastoma, nasopharyngeal cancer, oral cancer, oropharyngeal cancer, ovarian cancer, pancreatic cancer, papillomatosis, paraganglioma, parathyroid cancer, penile cancer, pharyngeal cancer, pituitary tumor, plasma cell neoplasm, prostate cancer, rectal cancer, renal cell cancer, rhabdomyosarcoma, salivary gland cancer, Sezary syndrome, skin cancer, nonmelanoma, small intestine cancer, soft tissue sarcoma, squamous cell carcinoma, testicular cancer, throat cancer, thymoma, thyroid cancer, urethral cancer, uterine cancer, uterine sarcoma, vaginal cancer, vulvar cancer, Waldenstrom macroglobulinemia, Wilms Tumor and/or other tumors.
The workflow 100 is capable of generating clinical trial matches and/or standard of care treatment options. Under operation 105, a subject's medical records may be acquired and processed to extract relevant clinical information.
In an aspect, the present disclosure provides a method for analyzing a biological sample of a subject, comprising assaying a biological sample for a presence or absence of biological markers at a concordance correlation coefficient of greater than or equal to about 90% and an accuracy of at least about 90% as compared to a control. The concordance correlation coefficient may be greater than or equal to about 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, or 99%. The accuracy may be at least about 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, or 99%. The biological sample may be re-assayed for the presence or absence of the biological markers. The biological sample may be homogenous. The biological markers may include a plurality of different types of biological markers. At least about 500 biological markers, 1000 biological markers, 1500 biological markers, 2000 biological markers, 2500 biological markers, 3000 biological markers, 3500 biological markers, or 4000 biological markers can be assayed.
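As a minimal sketch of how the two thresholds above might be checked, the following example computes a concordance correlation coefficient between an assay and a re-assay and an accuracy against a control. It assumes that the coefficient is Lin's concordance correlation coefficient computed on paired quantitative measurements and that accuracy is the fraction of presence/absence calls that match the control; the disclosure does not specify either formula, and all function names and cutoffs here are hypothetical.

```python
from statistics import mean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    Assumption: this is the intended coefficient; the source text does not say.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mx, my = mean(x), mean(y)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mx - my) ** 2
    if denom == 0:
        raise ValueError("coefficient is undefined when all values are identical")
    return 2 * cov / denom


def accuracy(calls, control_calls):
    """Fraction of presence/absence calls that agree with the control."""
    if len(calls) != len(control_calls) or not calls:
        raise ValueError("calls and control_calls must be paired and non-empty")
    matches = sum(c == k for c, k in zip(calls, control_calls))
    return matches / len(calls)


def meets_thresholds(ccc, acc, min_ccc=0.90, min_acc=0.90):
    """Check both values against illustrative 90% cutoffs."""
    return ccc >= min_ccc and acc >= min_acc
```

For instance, `accuracy([True, False, True, True], [True, False, False, True])` gives 0.75, which would fall below a 90% cutoff.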
Biological samples may include fluid and/or tissue from a subject. The biological sample may be a tumor biological sample or a normal biological sample. A control may be obtained from the subject. The control may be a healthy control or normal biological sample. The biological sample to be tested may be whole blood, or saliva. The biological sample can comprise plasma, a buffy coat, or saliva. A buffy coat may comprise lymphocytes, thrombocytes, and leukocytes. A tumor sample may include a tumor tissue biopsy and/or circulating tumor DNA in a cell-free DNA sample. The normal sample can include buffy coat cells, whole blood, or normal epithelial cells. Buffy coat cells may be white blood cells. The normal sample can include nucleic acid molecules derived from the white blood cells or epithelial cells in the saliva. Normal DNA may be extracted from the white blood cells or epithelial cells in the saliva. A sample can comprise nucleic acids from different sources. For example, a sample can comprise germline DNA or somatic DNA. A sample can comprise nucleic acids carrying mutations. For example, a sample can comprise DNA carrying germline mutations and/or somatic mutations. A sample can also comprise DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations). Tumor and normal cells may be compared. The tumor sample may be compared to the various normal samples. A sample can comprise RNA (e.g., mRNA), which may be sequenced (e.g., via reverse transcription of RNA and subsequent sequencing of cDNA).
A biological fluid can include any untreated or treated fluid associated with living organisms. Examples can include, but are not limited to, blood, including whole blood, warm or cold blood, and stored or fresh blood; treated blood, such as blood diluted with at least one physiological solution, including but not limited to saline, nutrient and/or anticoagulant solutions; blood components, such as platelet concentrate (PC), platelet-rich plasma (PRP), platelet-poor plasma (PPP), platelet-free plasma, plasma, fresh frozen plasma (FFP), components obtained from plasma, packed red cells (PRC), transition zone material or buffy coat (BC); analogous blood products derived from blood or a blood component or derived from bone marrow; red cells separated from plasma and resuspended in physiological fluid or a cryoprotective fluid; and platelets separated from plasma and resuspended in physiological fluid or a cryoprotective fluid. Other non-limiting examples of biological samples include skin, heart, lung, kidney, bone marrow, breast, pancreas, liver, muscle, smooth muscle, bladder, gall bladder, colon, intestine, brain, prostate, esophagus, thyroid, serum, saliva, urine, gastric and digestive fluid, tears, stool, semen, vaginal fluid, interstitial fluids derived from tumorous tissue, ocular fluids, sweat, mucus, earwax, oil, glandular secretions, spinal fluid, hair, fingernails, skin cells, plasma, nasal swab or nasopharyngeal wash, cerebrospinal fluid, tissue, throat swab, biopsy, placental fluid, amniotic fluid, cord blood, lymphatic fluids, cavity fluids, sputum, pus, microbiota, meconium, breast milk, and/or other excretions or body tissues. Results from blood samples may be obtained after at least about 1 minute, 5 minutes, 10 minutes, 20 minutes, 30 minutes, 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 12 hours, 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, or longer.
A sample can also be a tumor sample, which can be obtained from a subject by various approaches, including, but not limited to, venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage, scraping, surgical incision, or intervention or other approaches. The tumor sample may be a tumor tissue sample.
A sample can comprise various amounts of nucleic acid that contain genome equivalents. For example, a sample of about 30 ng DNA can contain about 10,000 (10^4) haploid human genome equivalents and, in the case of cfDNA, about 200 billion (2×10^11) individual polynucleotide molecules. Similarly, a sample of about 100 ng of DNA can contain about 30,000 haploid human genome equivalents and, in the case of cell-free DNA (cfDNA), about 600 billion individual molecules.
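The genome-equivalent figures above can be approximated with simple arithmetic. The sketch below is illustrative only and rests on three assumptions that are not stated in the disclosure: a haploid human genome mass of roughly 3.3 pg, a haploid genome length of roughly 3.1 billion base pairs, and a typical cfDNA fragment length of roughly 166 base pairs. The constant and function names are hypothetical.

```python
# Assumed constants (not from the source text).
HAPLOID_GENOME_PG = 3.3      # approximate mass of one haploid human genome, in picograms
HAPLOID_GENOME_BP = 3.1e9    # approximate haploid human genome length, in base pairs
CFDNA_FRAGMENT_BP = 166      # approximate typical cfDNA fragment length, in base pairs


def genome_equivalents(dna_ng):
    """Estimate haploid genome equivalents in a given mass of DNA (nanograms)."""
    return dna_ng * 1000.0 / HAPLOID_GENOME_PG  # 1 ng = 1000 pg


def cfdna_molecules(dna_ng):
    """Estimate the number of individual cfDNA fragments in a given mass."""
    fragments_per_genome = HAPLOID_GENOME_BP / CFDNA_FRAGMENT_BP
    return genome_equivalents(dna_ng) * fragments_per_genome


if __name__ == "__main__":
    for mass_ng in (30, 100):
        print(mass_ng, "ng:",
              round(genome_equivalents(mass_ng)), "genome equivalents,",
              f"{cfdna_molecules(mass_ng):.1e}", "cfDNA molecules")
```

With these assumed constants, 30 ng works out to roughly 9,000 genome equivalents and on the order of 10^11 cfDNA molecules, the same order of magnitude as the rounded figures quoted above.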
The biological sample may be a tissue sample. A tissue may be a group of connected specialized cells that perform a special function. The tissue may also be an extracellular matrix material. The tissue analyzed can be a portion of a tissue to be transplanted or surgically grafted, such as an organ (e.g., heart, kidney, liver, lung, etc.), skin, bone, nervous tissue, tendons, blood vessels, fat, cornea, blood, or a blood component.
Examples of tissue may be selected from a group consisting of placental tissue, mammary gland tissue, gastrointestinal tissue, liver tissue, kidney tissue, musculoskeletal tissue, genitourinary tissue, bone marrow tissue, prostate tissue, skin tissue, nasal passage tissue, neural tissue, eye tissue, and central nervous system tissue. The tissue may originate from a human and/or other mammal. The tissue can comprise the connecting material and the liquid material found in association with the cells and/or tissues. A tissue can also include biopsied tissue and media containing cells or biological material. The biological sample may be a tumor tissue sample.
Tissue from a subject may be preserved for research that involves maintaining molecular and morphological integrity. The preservation methods of tissue for later downstream use can include freezing media-embedded tissue, flash freezing tissue, and preparing formalin-fixed paraffin-embedded (FFPE) tissue. The preservation method may also include blood sample collection, transport, and storage in a direct draw whole blood collection tube. The collection tube may be a Cell-Free DNA BCT®. The Cell-Free DNA BCT can stabilize cell-free plasma DNA and can preserve cellular genomic DNA found in nucleated blood cells and circulating epithelial cells in whole blood. Blood may be preserved in blood collection tubes.
The tumor biological sample may be a formalin-fixed paraffin-embedded (FFPE) tissue sample. Paraformaldehyde may be used for tissue fixation. The tissue can be sliced or used as a whole. Prior to sectioning, the tissue can be embedded in cryomedia or paraffin wax. A microtome or a cryostat may be used to section the tissue. The sections may be mounted onto slides, dehydrated with alcohol washes and cleared with a clearing agent. The clearing agent may be xylene or CitriSolv. For FFPE tissues, antigen retrieval may occur by thermal pre-treatment or protease pre-treatment of the sections.
Cells and other biocomponents in a biological sample may be analyzed using antibodies (e.g., immunohistochemistry, western blot, enzyme linked immunosorbent assay (ELISA), mass spectrometry, antibody staining, radioimmunoassay, fluoroimmunoassay, chemiluminescence immunoassay, and liposome immunoassay). Primary cells may be isolated from small fragments of tissue and purified from the blood. The primary cells may include lymphocytes (white blood cells), fibroblasts (skin biopsy cells), or epithelial cells. The biological sample may be a single cell. Before antibody staining, endogenous biotin or enzymes can be quenched. Biological samples may be incubated with a buffer that blocks reactive sites to which primary or secondary antibodies can bind. This step may help reduce non-specific binding between the antibodies and non-specific proteins, which would otherwise result in background staining. Blocking buffers may be selected from the group consisting of non-fat dry milk, normal serum, gelatin, and bovine serum albumin. Background staining may be reduced by methods selected from the group consisting of dilution of the primary or secondary antibodies, use of a different detection system or a different primary antibody, and changing the time or temperature of the incubation. Tissue known to express the antigen and tissue not known to express the antigen may be used as controls.
The biological sample obtainable from specimens or fluids can include detached tumor cells or free nucleic acids that are released from dead or damaged tumor cells. Nucleic acids may include deoxyribonucleic acid (DNA), cell-free deoxyribonucleic acid (cfDNA) molecules, complementary deoxyribonucleic acid (cDNA) molecules, ribonucleic acid (RNA) molecules, genomic DNA molecules, mitochondrial DNA molecules, single or double stranded DNA molecules, and protein-associated nucleic acids. Any nucleic acid specimen in purified or non-purified form obtained from such specimen cell can be utilized as the starting nucleic acid or acids. The cfDNA molecules, cDNA molecules, and RNA molecules may be assayed for presence or absence of biological markers.
Biological data may be obtained from the biological samples. Biologic data may comprise data from one or more biological sample components selected from the group consisting of: protein, peptides, cell-free nucleic acids, ribonucleic acids, deoxyribonucleic acids, and any combination thereof.
The biomolecules may be normal and abnormal. The normal biomolecules may be isolated from the buffy coat of the biological sample. The abnormal biomolecules may be isolated from the plasma or a tumor tissue of the biological sample. A sample can comprise nucleic acids from different sources. For example, a sample can comprise germline DNA or somatic DNA. A sample can comprise nucleic acids carrying mutations. For example, a sample can comprise DNA carrying germline mutations and/or somatic mutations. A sample can also comprise DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations).
Components of a biological sample may be analyzed with respect to various biomarkers. Biomarkers can be indicators of or a proxy for various biological phenomena. The presence or absence of a biological marker, or a quantity or quality thereof, can be indicative of a biological process or phenomenon. A biomarker (biological marker) may be a characteristic that is objectively measured and determined as an indicator of normal biological processes, pathogenic processes, pharmacologic responses to a therapeutic intervention, or environmental exposure. Biomarkers may be categorized into DNA biomarkers, DNA tumor biomarkers, and general biomarkers. Biomarkers can be selected from the group consisting of cancer biomarker, clinical endpoint, companion endpoint, copy number variant (CNV) biomarker, diagnostic biomarker, disease biomarker, DNA biomarker, efficacy biomarker, epigenetic biomarker, monitoring biomarker, prognostic biomarker, predictive biomarker, safety biomarker, screening biomarker, staging biomarker, stratification biomarker, surrogate biomarker, target biomarker, and toxicity biomarker. Diagnostic biomarkers may be used to diagnose a disease or decide on the severity of a disease. DNA biomarkers can comprise interleukin 28B (IL28B) or solute carrier organic anion transporter family member 1B1 (SLCO1B1). DNA tumor biomarkers may comprise BluePrint®, epidermal growth factor receptor (EGFR), Kirsten rat sarcoma viral oncogene homologue (K-Ras), MammaPrint®, and Oncotype DX®. General biomarkers may be measured by a point of care test, such as the RheumaChec or CCPoint assay.
The biological sample may comprise normal biomolecules and abnormal biomolecules extracted from a subject. DNA may be extracted from buccal swabs, hair samples, urine samples, blood samples, and tissue samples. During a biopsy, a sample of cells or tissue may be removed from the subject's body for analysis in a laboratory. Biopsy may be selected from the group consisting of advanced breast biopsy instrumentation, brush biopsy, computed tomography, cone biopsy, core biopsy, Crosby capsule, curettings, ductal lavage, endoscopic biopsy, endoscopic retrograde cholangiopancreatography, evacuation, excision biopsy, fine needle aspiration, fluoroscopy, frozen section, imprint, incision biopsy, liquid based cytology, loop electrosurgical excision procedure, magnetic resonance imaging, mammography, needle biopsy, positron emission tomography with fluorodeoxy-glucose, punch biopsy, sentinel node biopsy, shave biopsy, smears, stereotactic biopsy, transurethral resection, trephine (bone marrow) biopsy, ultrasound, vacuum-assisted biopsies, and wire localization biopsy.
A subject may undergo blood sample withdrawal. After centrifugation, white blood cells may be isolated from the blood sample. Next, the white blood cells may be divided into diseased cells and control cells.
A subject may collect their own biological samples. The biological sample may be collected at home and transported to the medical center or facility. The biological sample may also be collected at a medical center, for example, at a doctor's office, clinic, laboratory patient service center, or hospital. Methods of collection may comprise male patient ejaculation, subjects coughing up sputum, subjects collecting stool during toileting, urination, saliva swab, combination of saliva and oral mucosal transudate collected from the mouth, and sweat collected by a sweat stimulation procedure.
Assaying may begin after a user inputs the biological sample. Assaying can comprise nucleic acid extraction from the biological sample. Nucleic acids may be extracted from a biological sample using various techniques. During nucleic acid extraction, cells may be disrupted to expose the nucleic acid by grinding or sonicating. Detergent and surfactants may be added during cell lysis to remove the membrane lipids. Protease may be used to remove proteins. Also, RNase may be added to remove RNA. Nucleic acids can also be purified by organic extraction with phenol, phenol/chloroform/isoamyl alcohol, or similar formulations, including TRIzol and TriReagent. Other non-limiting examples of extraction techniques include: (1) organic extraction followed by ethanol precipitation, e.g., using a phenol/chloroform organic reagent (Ausubel et al., 1993), with or without the use of an automated nucleic acid extractor, e.g., the Model 341 DNA Extractor available from Applied Biosystems (Foster City, Calif.); (2) stationary phase adsorption methods (U.S. Pat. No. 5,234,809; Walsh et al., 1991, which is entirely incorporated herein by reference); and (3) salt-induced nucleic acid precipitation methods (Miller et al., 1988), such precipitation methods being typically referred to as “salting-out” methods. Another example of nucleic acid isolation and/or purification includes the use of magnetic particles (e.g., beads) to which nucleic acids can specifically or non-specifically bind, followed by isolation of the particles using a magnet, and washing and eluting the nucleic acids from the particles. See, e.g., U.S. Pat. No. 5,705,628, which is entirely incorporated herein by reference. The above isolation methods may be preceded by an enzyme digestion step to help eliminate unwanted protein from the sample, e.g., digestion with proteinase K, or other like proteases. See, e.g., U.S. Pat. No. 7,001,724, which is entirely incorporated herein by reference.
RNase inhibitors may be added to the lysis buffer. For certain cell or sample types, it may be desirable to add a protein denaturation/digestion step to the protocol. Purification methods may be directed to isolate DNA, RNA (including but not limited to mRNA, rRNA, tRNA), or both. When both DNA and RNA are isolated together during or subsequent to an extraction procedure, further steps may be employed to purify one or both separately from the other. Sub-fractions of extracted nucleic acids can also be generated, for example, purification by size, sequence, or other physical or chemical characteristic. In addition to an initial nucleic acid isolation step, purification of nucleic acids can be performed after subsequent manipulation, such as to remove excess or unwanted reagents, reactants, or products.
In another aspect, the present disclosure provides a method for identifying one or more somatic mutations in a biological sample from a subject. A tumor biological sample and normal biological sample may be obtained from the subject. The tumor biological sample and the normal biological sample may be assayed to (i) obtain sequence information for a first nucleic acid sample and a second nucleic acid sample automatically obtained from the tumor biological sample and the normal biological sample, respectively, without any involvement from a user, and (ii) identify one or more other biological markers of a type different than the first nucleic acid sample and the second nucleic acid sample. The sequence information obtained for the first nucleic acid sample and the second nucleic acid sample may be compared to identify one or more genomic alterations in the tumor biological sample relative to the normal biological sample. One or more other biological markers previously identified and one or more genomic alterations previously identified may be used to identify one or more somatic mutations in the subject at an accuracy of at least about 90% as compared to a control.
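The tumor-versus-normal comparison described above can be sketched as a set difference: alterations seen in the tumor sample but not in the matched normal sample are candidate somatic mutations, while alterations present in both are more likely germline. This is a simplified illustration, not the disclosed method; the `Variant` record, the function name, and the example coordinates are all hypothetical, and a real pipeline would also weigh read depth, allele fraction, and the other biological markers mentioned above.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal record for one genomic alteration."""
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumor_variants, normal_variants):
    """Return alterations found in the tumor sample but absent from the normal sample."""
    normal = set(normal_variants)
    return sorted(v for v in set(tumor_variants) if v not in normal)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    tumor = [Variant("chr1", 100, "A", "T"), Variant("chr2", 200, "G", "C")]
    normal = [Variant("chr1", 100, "A", "T")]
    print(somatic_candidates(tumor, normal))
```

Keeping the normal sample from the same subject as the reference for subtraction is what lets germline variants cancel out in this sketch.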
A first nucleic acid sample from a tumor biological sample of the subject and a second nucleic acid sample from a normal biological sample of the subject may be obtained. Obtaining a biological sample can comprise receiving (i) a biological sample from the tumor tissue sample of the subject, and (ii) a biological sample from the normal tissue sample of the subject. The first biological sample and the second biological sample may be assayed to identify one or more biological markers in the tumor tissue sample relative to the normal tissue sample to generate a set of biologic data for the subject. The first nucleic acid sample and the second nucleic acid sample may be indexed. The first nucleic acid sample may be obtained from cell-free DNA in the plasma.
Assaying biological samples may comprise comparing the normal biomolecules to the abnormal biomolecules. After a user inputs a biological sample, the assaying may begin. The assaying can comprise processing the biological sample and/or sequencing the biological sample without any involvement from the user. The profiles of at least one or more markers of a disease or condition may be compared. This comparison can be quantitative or qualitative. Quantitative measurements can be taken using any of the assays described herein. For example, the assaying may comprise sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, exome sequencing, transcriptome sequencing, cell-free DNA sequencing, sequencing by hybridization, pyrosequencing, capillary electrophoresis, gel electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD sequencing, MS-PET sequencing, mass spectrometry, matrix assisted laser desorption/ionization-time of flight (MALDI-TOF) mass spectrometry, electrospray ionization (ESI) mass spectrometry, surface-enhanced laser desorption/ionization-time of flight (SELDI-TOF) mass spectrometry, quadrupole-time of flight (Q-TOF) mass spectrometry, atmospheric pressure photoionization mass spectrometry (APPI-MS), Fourier transform mass spectrometry (FTMS), matrix-assisted laser desorption/ionization-Fourier transform-ion cyclotron resonance (MALDI-FT-ICR) mass spectrometry, secondary ion mass spectrometry (SIMS), polymerase chain reaction (PCR) analysis, quantitative PCR, real-time PCR, fluorescence assay, colorimetric assay, chemiluminescent assay, or a combination thereof.
The sequencing may be whole genome sequencing, low pass whole genome sequencing, or targeted sequencing. The sequencing may be whole transcriptome sequencing on RNA, such as tumor RNA.
Sequencing may also comprise detecting the sequencing product using an instrument, for example but not limited to an ABI PRISM 377 DNA Sequencer, an ABI PRISM 310, 3100, 3100-Avant, 3730, or 3730xl Genetic Analyzer, an ABI PRISM 3700 DNA Analyzer, or an Applied Biosystems SOLiD™ System (all from Applied Biosystems), a Genome Sequencer 20 System (Roche Applied Science), or a mass spectrometer.
Sequencing can cover 2,500 genes, gene fusions, point mutations, indels, copy-number variations, promoters, and/or enhancers. Sequencing may be directed to at least 1 gene, 2 genes, 3 genes, 4 genes, 5 genes, 10 genes, 20 genes, 25 genes, 50 genes, 100 genes, 200 genes, 300 genes, 400 genes, or 500 genes, variants, or promoters thereof, selected from Table 1. Multiple subjects may be sequenced simultaneously. Sequencing may have a depth of coverage of at least about 0.5×, 1×, 2×, 3×, 4×, 5×, 6×, 7×, 8×, 9×, 10×, 20×, 30×, 40×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 2000×, 3000×, 4000×, 5000×, 6000×, 7000×, 8000×, 9000×, or 10,000×. Sequencing can comprise whole exome sequencing, whole genome sequencing, or a combination thereof.
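The depth-of-coverage values listed above can be related to read count, read length, and target size by the standard average-coverage relation, depth = (number of reads × read length) / target size. The sketch below applies that relation; it is an idealized estimate that ignores duplicate reads, off-target reads, and uneven coverage, and the function and parameter names are hypothetical.

```python
import math


def mean_depth(num_reads, read_length_bp, target_size_bp):
    """Average depth of coverage: total sequenced bases divided by target size."""
    return num_reads * read_length_bp / target_size_bp


def reads_needed(target_depth, read_length_bp, target_size_bp):
    """Minimum number of reads to reach a desired average depth (idealized)."""
    return math.ceil(target_depth * target_size_bp / read_length_bp)


if __name__ == "__main__":
    # Illustrative numbers only: 1 million 150 bp reads over a 3 Mb target panel.
    print(mean_depth(1_000_000, 150, 3_000_000))
    print(reads_needed(500, 150, 3_000_000))
```

The same relation shows why targeted panels can reach depths of hundreds or thousands of × with far fewer reads than whole genome sequencing needs for tens of ×.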
In a biological sample comprising one or more nucleic acids, various genes may be assayed. One or several, e.g., a panel, of genes may be assayed. For example, at least about 50 genes, 100 genes, 150 genes, 200 genes, 250 genes, 300 genes, or 500 genes may be assayed in the cell free DNA. The tumor biological sample may be a blood sample or a formalin-fixed paraffin-embedded (FFPE) tissue sample. The tissue sample may be frozen or fresh. The first nucleic acid sample and the second nucleic acid sample may be assayed for one or more genomic alterations and biomarkers at a concordance correlation coefficient of at least about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% when the first nucleic acid sample and the second nucleic acid sample are re-assayed for the presence or absence of the genomic alterations or biomarkers. The assayed genomic alterations and biomarkers may contain a plurality of genomic alterations and biomarkers. The genomic alterations may include a plurality of different types of genomic alterations. The genomic alterations may include: nucleotide insertions, nucleotide deletions, nucleotide substitutions, gene fusions, copy-number variations, point mutations, gene amplifications, gene deletions, non-recurring mutations, and mRNA based alterations. At least 1 genomic alteration, 2 genomic alterations, 3 genomic alterations, 4 genomic alterations, 5 genomic alterations, 10 genomic alterations, 15 genomic alterations, 20 genomic alterations, 25 genomic alterations, 50 genomic alterations, or 100 genomic alterations may be identified at an accuracy of at least about 90%. For example, the accuracy may be at least about 70%, 75%, 80%, 85%, 90%, 95%, or 99%.
Quantitative comparisons can include statistical analyses such as t-test, ANOVA, Kruskal-Wallis, Wilcoxon, Mann-Whitney, and odds ratio. Quantitative differences can include differences in the levels of markers between profiles or differences in the numbers of markers present between profiles, and combinations thereof. Examples of levels of the markers can be, without limitation, gene expression levels, nucleic acid levels, protein levels, lipid levels, and the like. Qualitative differences can include, but are not limited to, activation and inactivation, protein degradation, nucleic acid degradation, and covalent modifications.
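By way of non-limiting illustration, one of the quantitative comparisons named above, the odds ratio, can be computed from a 2x2 table of marker presence against group membership. The function name `odds_ratio` and the counts below are hypothetical and serve only as an example.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

                 marker present   marker absent
    profile 1          a                b
    profile 2          c                d
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts: the marker is present in 30 of 100 profiles in group 1
# and in 10 of 100 profiles in group 2.
ratio = odds_ratio(a=30, b=70, c=10, d=90)
print(f"odds ratio: {ratio:.2f}")
```

An odds ratio above 1 would suggest that the marker is more common in the first group; a confidence interval or significance test would normally accompany the point estimate.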
The profile may be a nucleic acid profile, a protein profile, a lipid profile, a carbohydrate profile, a metabolite profile, immunohistochemistry profile, or a combination thereof. The profile can be qualitatively or quantitatively determined.
A nucleic acid profile can be, without limitation, a genotypic profile, a single nucleotide polymorphism profile, a gene mutation profile, a gene copy number profile, a DNA methylation profile, a DNA acetylation profile, a chromosome dosage profile, a gene expression profile, or a combination thereof.
The nucleic acid profile can be determined by various methods for determining or detecting genotypes, single nucleotide polymorphisms, gene mutations, gene copy numbers, DNA methylation states, DNA acetylation states, chromosome dosages. Biological markers may comprise antigens or genomic alterations. Biological markers may include one or more nucleotide insertions, nucleotide deletions, nucleotide substitutions, amino acid insertions, amino acid deletions, amino acid substitutions, gene fusions, copy-number variations, and any combination thereof.
Several methods or techniques can be used to analyze various biomolecules. Exemplary methods may include, but are not limited to, polymerase chain reaction (PCR) analysis, sequencing analysis, electrophoretic analysis, restriction fragment length polymorphism (RFLP) analysis, Northern blot analysis, quantitative PCR, reverse-transcriptase-PCR analysis (RT-PCR), allele-specific oligonucleotide hybridization analysis, comparative genomic hybridization, heteroduplex mobility assay (HMA), single strand conformational polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), RNase mismatch analysis, mass spectrometry, tandem mass spectrometry, matrix assisted laser desorption/ionization-time of flight (MALDI-TOF) mass spectrometry, electrospray ionization (ESI) mass spectrometry, surface-enhanced laser desorption/ionization-time of flight (SELDI-TOF) mass spectrometry, quadrupole-time of flight (Q-TOF) mass spectrometry, atmospheric pressure photoionization mass spectrometry (APPI-MS), Fourier transform mass spectrometry (FTMS), matrix-assisted laser desorption/ionization-Fourier transform-ion cyclotron resonance (MALDI-FT-ICR) mass spectrometry, secondary ion mass spectrometry (SIMS), surface plasmon resonance, Southern blot analysis, in situ hybridization, fluorescence in situ hybridization (FISH), chromogenic in situ hybridization (CISH), immunohistochemistry (IHC), microarray, karyotyping, multiplex ligation-dependent probe amplification (MLPA), Quantitative Multiplex PCR of Short Fluorescent Fragments (QMPSF), microscopy, methylation specific PCR (MSP) assay, HpaII tiny fragment Enrichment by Ligation-mediated PCR (HELP) assay, radioactive acetate labeling assays, colorimetric DNA acetylation assay, chromatin immunoprecipitation combined with microarray (ChIP-on-chip) assay, restriction landmark genomic scanning, Methylated DNA immunoprecipitation (MeDIP), molecular break light assay for DNA adenine methyltransferase activity, chromatographic separation, methylation-sensitive restriction enzyme analysis, bisulfite-driven conversion of non-methylated cytosine to uracil, methyl-binding PCR analysis, or a combination thereof. These methods for analysis may be wholly or partially automated and have varying degrees of user involvement.
The biological sample may be re-assayed at a later point in time and a change may be identified in one or more biological markers. The biological sample may be re-assayed after at least about 30 minutes, 1 hour, 2 hours, 3 hours, 4 hours, 5 hours, 6 hours, 12 hours, 1 day, 2 days, 3 days, 5 days, 1 week, 2 weeks, 1 month, 6 months, 12 months, 1.5 years, 2 years, 5 years, 10 years, 20 years, 30 years, or 50 years. Assaying may comprise assaying at least about 50 biological markers, 100 biological markers, 150 biological markers, 200 biological markers, 250 biological markers, 300 biological markers, or 350 biological markers in cell-free DNA or the biological sample.
Various components can be isolated from a biological sample. A biological sample may comprise one or more cells and/or biomolecules, e.g., nucleic acids, proteins, hormones, and the like. Cell populations of the biological samples can be transformed into nucleic acids appropriate for molecular analysis. Target cells may be enriched from a heterogeneous cell population. The isolation process may be selected from laser-capture microdissection, gross dissection, or flow cytometry, among other techniques. These processes may be accompanied by genetic manipulation to molecularly mark target cell types. In addition, specific subsets of RNA and DNA may be extracted through direct, indirect, or modification protocols. A sequence library can be generated comprising DNA fragments labeled with a platform-specific adaptor. The platform-specific adaptor may be a sequence tag for sample indexing or molecular tagging.
Direct targeting DNA methods for sequence-specific enrichment may comprise molecular inversion probes, pulldown probes, bait sets, standard PCR, multiplex PCR, hybrid capture, endonuclease digestion, DNase I hypersensitivity, and selective circularization. Such probes may have sequences selected to target genes or sequences of interest, such as genes or variants thereof listed in Table 1. For example, such probes may have sequence complementarity with the genes or variants thereof listed in Table 1. RNA enrichment methods may be directed towards a specific subpopulation such as small RNAs or messenger ribonucleic acids (mRNAs). The RNA enrichment methods may be selected from ‘not-so-random’ amplification, poly(A)-mediated reverse transcription, BrdU incorporation, or oligo(dT) hybridization. Strand preservation RNA enrichment methods may also include strand-specific degradation after cDNA synthesis, orientation-specific adaptor ligation, reverse transcription-PCR of a specific biological target, or digestion with RNases for capturing secondary RNA structures. Enrichment can also be achieved through negative selection of nucleic acids by eliminating undesired material. This sort of enrichment includes ‘footprinting’ techniques or ‘subtractive’ hybrid capture. During the former, the target sample is shielded from nuclease activity by bound protein or by single- and double-stranded arrangements. During the latter, nucleic acids that bind ‘bait’ probes are eliminated.
DNA target enrichment may include in-solution capture. During in-solution capture, a custom pool of probes may be designed, synthesized, and hybridized in solution to a fragmented genomic DNA sample. The probes may be oligonucleotides and may be labeled with beads. The genomic DNA sample may be viral DNA present in the tumor sample. After the probes hybridize to the genomic regions of interest, the beads may be pulled down and washed. The beads can then be removed, and the captured genomic fragments may be sequenced, permitting selective DNA sequencing of the genomic sequences of interest. From the sequence reads, it can be determined which reads are off target and which probes are associated with the off-target reads. In the next cycle of in-solution capture, the probes that correspond to the off-target reads may be pulled down. A map of the off-target reads may be compared against the probe coverage. Then, the ratio of probes corresponding to off-target reads to probes corresponding to on-target reads may be determined. The on-target rate for any set of probes may thereby be estimated.
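By way of non-limiting illustration, the per-probe bookkeeping described above can be sketched as follows. The sketch assumes each sequencing read has already been attributed to a probe and labeled as on target or off target; the function names, the probe identifiers, and the 0.7 threshold are hypothetical.

```python
from collections import defaultdict


def on_target_rates(read_assignments):
    """Return {probe_id: fraction of that probe's reads that are on target}.

    read_assignments: iterable of (probe_id, on_target) pairs, where
    on_target is True when the read falls in a region of interest.
    """
    counts = defaultdict(lambda: [0, 0])  # probe_id -> [on_target_reads, total_reads]
    for probe_id, on_target in read_assignments:
        counts[probe_id][1] += 1
        if on_target:
            counts[probe_id][0] += 1
    return {probe: on / total for probe, (on, total) in counts.items()}


def flag_probes(rates, min_rate):
    """List probes whose on-target rate falls below min_rate."""
    return sorted(probe for probe, rate in rates.items() if rate < min_rate)


# Hypothetical read-to-probe assignments from one capture cycle.
reads = [
    ("probe_A", True), ("probe_A", True), ("probe_A", False), ("probe_A", True),
    ("probe_B", False), ("probe_B", False), ("probe_B", True), ("probe_B", False),
]
rates = on_target_rates(reads)
poor_probes = flag_probes(rates, min_rate=0.7)
print(rates, poor_probes)
```

Probes flagged in this way could be candidates for redesign or removal before the next capture cycle.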
The probes may pull down at least about 1000 genes, 1500 genes, 2000 genes, 2500 genes, or 3000 genes. Once the desired or predetermined genes or genomic regions are selected, the probes may be synthesized. The probes may be at least about 50 nucleotides, 100 nucleotides, 150 nucleotides, 200 nucleotides, or 300 nucleotides in length. The probes may be separated into at least about 20 pools, 30 pools, 40 pools, 50 pools, 60 pools, 70 pools, 80 pools, 90 pools, or 100 pools. The probes may be separated based on biological function. The probes may be selected by their performance during sequencing. The assay may be conducted on a single probe level to identify which probes are selected. The probes may cover one or more coding regions, one or more non-coding regions, or both.
Nucleic acids can also be purified indirectly depending on their location relative to other molecular entities. The molecular entities may be other nucleic acids or proteins. The first step can be to form the desired cross-link types, such as DNA-DNA, DNA-protein, RNA-protein, or protein-protein. Cross-linkers may be selected from the group consisting of formaldehyde, ultraviolet (UV) light, dimethyl suberimidate (DMS), dimethyl adipimidate (DMA), glutaraldehyde, bis(sulfosuccinimidyl) suberate (BS3), spermine or spermidine, and 1-ethyl-3-[3-dimethylaminopropyl]carbodiimide hydrochloride (EDAC). Immunoprecipitation can aid in extracting nucleic acids depending on their proximity to proteins of interest or histone modifications. Lastly, ligation may be another viable option for isolating co-localized nucleic acids to study chromosome interactions in the cell.
Modification protocols for nucleic acid extraction can direct transformation of the sequence to encode the specific modification. The protocols may include bisulfite treatment for detection of cytosine methylation, and T4 bacteriophage β-glucosyltransferase and Huisgen cycloaddition for detection of 5-hydroxymethylcytosine. Post-transcriptional modifications of RNA may be detectable by determining the characteristic error signatures that they generate during sequencing. Lastly, specific polymerase error signatures secondary to cross-linking events may be used to determine the target RNA nucleotide in RNA-protein interactions.
Prior to sequencing, the nucleic acids can be converted to a population of DNA fragments tagged with platform-specific adaptors. This tagging process may also occur after the nucleic acid targeting processes described above. “Fragment libraries” may first be created by random fragmentation. The fragmentation can be mechanical, chemical, or enzymatic. After fragmentation, universal adaptor sequences can be ligated and the fragments can undergo PCR amplification. For example, a hyperactive derivative of the Tn5 transposase can catalyze in vitro integration of the universal adaptor sequences into the target DNA at a high density. This is then usually followed by amplification. As another example, PCR-free library preparation can minimize sequence bias. For example, some sequencing technologies can proceed without an amplification step.
The biological sample may be indexed. The biological sample may be tagged. A variety of methods can allow for many experiments to be efficiently multiplexed on a single sequencing lane. For example, a synthetic index or barcode may be attached to every molecule in a sequencing library. Concurrent sequencing of the index can be used to assign reads in silico to the libraries from which they derived. Alternatively, the sample may be tagged with a unique molecular index (UMI), which can be used for de-duplication at a very high coverage. Further, a sequence may be appended that allows mutations to be identified at deeper coverage, for example, detection of ultralow-frequency mutations by duplex sequencing. Synthetic tags can serve other functions. For example, individual molecules can be assigned during assembly. Accurate quantification, robust error-correction, and increased effective read length may be achieved by grouping reads that derive from the same nucleic acid molecule. Synthetic variants can be tagged during synthetic saturation mutagenesis and function as the readout. It may also be possible to assign tags to specific cells and determine genetic variability at single-cell resolution. The index may be or include a whole exome classifier.
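By way of non-limiting illustration, in silico assignment of reads by sample index followed by UMI-based de-duplication can be sketched as follows. The sketch assumes exact matching of the index and of the UMI and keeps the first read seen for each (UMI, alignment position) pair; the function name, the tuple layout, and the example sequences are hypothetical.

```python
from collections import defaultdict


def demultiplex_and_dedup(reads, index_to_sample):
    """Assign reads to samples by index, then drop UMI duplicates.

    reads: iterable of (sample_index, umi, position, sequence) tuples.
    index_to_sample: mapping from index sequence to sample name.
    Returns {sample_name: [sequence, ...]} with one read kept per
    (umi, position) pair within each sample.
    """
    seen = defaultdict(set)
    kept = defaultdict(list)
    for sample_index, umi, position, sequence in reads:
        sample = index_to_sample.get(sample_index)
        if sample is None:
            continue  # unrecognized index: read is left unassigned
        key = (umi, position)
        if key in seen[sample]:
            continue  # same molecule already represented
        seen[sample].add(key)
        kept[sample].append(sequence)
    return dict(kept)


# Hypothetical pooled reads from two indexed libraries.
index_to_sample = {"ACGT": "patient_1", "TGCA": "patient_2"}
reads = [
    ("ACGT", "AAA", 100, "TTGCA"),
    ("ACGT", "AAA", 100, "TTGCA"),  # PCR duplicate of the read above
    ("ACGT", "CCC", 100, "TTGCA"),  # same position, different molecule
    ("TGCA", "AAA", 100, "GGATC"),
    ("NNNN", "AAA", 5, "CCCCC"),    # unknown index
]
result = demultiplex_and_dedup(reads, index_to_sample)
print(result)
```

A production pipeline would typically tolerate sequencing errors in the index and UMI and would build a consensus from each duplicate family instead of keeping a single read.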
The biological sample may comprise cell-free deoxyribonucleic acid (cfDNA) molecules, cellular deoxyribose nucleic acid (cDNA) molecules, ribonucleic acid (RNA) molecules, and protein, and wherein the cfDNA molecules, the cDNA molecules, and the RNA molecules are assayed for the presence or absence of the biological markers. The biological sample may comprise cfDNA. Dying tumor cells can release small pieces of their nucleic acids into a subject's bloodstream. These small pieces of nucleic acids are cell-free circulating tumor DNA (ctDNA).
Circulating tumor DNA can also be used non-invasively to monitor tumor progression and determine if a subject's tumor may react to targeted drug treatments. For example, the subject's ctDNA can be screened for mutations both before therapy and after therapy and drug treatment. During the therapy, developing somatic mutations can prevent the drug from working. For example, the subjects can observe an initial tumor response to the drug. This response can signal that the drug was initially effective in killing tumor cells. However, the development of new mutations may prevent the drug from continuing to work. Obtaining this critical information can assist doctors and oncologists in identifying that the subject's tumors are no longer responsive and different treatment is necessary. Circulating tumor DNA testing can be applicable to every stage of cancer subject care and clinical studies. Since ctDNA can be detected in most types of cancer at both early and advanced stages, it may be used as an effective screening method for most patients. A measurement of the levels of ctDNA in blood may also efficiently indicate a subject's stage of cancer and survival chances.
Various methods may be used to sequence cfDNA in addition to those discussed above. Techniques for sequencing cfDNA may include exome sequencing, transcriptome sequencing, genome sequencing, and cell-free DNA sequencing. Cell-free DNA sequencing may include mismatch targeted sequencing (Mita-Seq) and tethered elimination of termini (Tet-Seq).
In addition to sequencing, other reactions and/or operations may occur within the systems and methods disclosed herein, including but not limited to: nucleic acid quantification, sequencing optimization, detecting gene expression, quantifying gene expression, genomic profiling, cancer profiling, or analysis of expressed markers. The assay may include immunohistochemistry profiling and genomic profiling of the biological sample. During immunohistochemistry, antigens may be identified during examination of the tumor and normal tissue cells of the biological sample. Immunohistochemistry can also provide results on the distribution and localization of biomarkers and differentially expressed proteins in different locations of the biological sample tissue. The differentially expressed proteins may be over- or under-expressed proteins.
Genomic profiling may be the process, performed after sequencing, of determining and measuring the activity of thousands of genes simultaneously. The profiling may be used to distinguish cells that are actively dividing. Genomic profiling can also be used to measure how well cells respond to a particular treatment. One may determine patterns in the tumor DNA by comparing the tumor DNA against a set of known DNA. The group of genes whose combined expression pattern is uniquely characteristic of a given condition establishes the gene signature of the particular condition. The gene signature can then be used to select a group of subjects at a specific state of a disease and to match them accurately with treatments.
In another aspect, the present disclosure provides a method for identifying a genomic aberration in one or more biological samples of a subject. Biological samples of the subject may be obtained and can comprise a nucleic acid sample that has or is suspected of having one or more genomic aberration(s) that appears at a frequency of less than about 1%, less than about 2%, less than about 3%, less than about 4%, less than about 5%, less than about 6%, less than about 7%, less than about 8%, less than about 9%, less than about 10%, less than about 15%, or less than about 20% in the nucleic acid sample. The nucleic acid sample may be enriched for a plurality of nucleic acid sequences to provide an enriched nucleic acid sample using a probe set comprising probes that have an on-target rate as a group of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, and at least about 95%. The on-target rate as a group may be determined by (i) comparing the probe set to at least one predetermined region to measure (1) probe coverage of each probe in the probe set and (2) off-target probe coverage for each probe in the probe set, and (ii) determining the on-target rate of the probe set based on a ratio of the off-target coverage to the probe coverage. Alternatively, the off-target rate as a group may be determined by (i) comparing the probe set to at least one predetermined region to measure (1) probe coverage of each probe in the probe set and (2) on-target probe coverage for each probe in the probe set, and (ii) determining the off-target rate of the probe set based on a ratio of the on-target coverage to the probe coverage. The off-target probe coverage may measure the portion of probes that do not cover the predetermined region(s) of interest. The on-target probe coverage may measure the portion of probes that do cover the predetermined region(s) of interest. 
The probe coverage of each probe in the probe set may be the total mapped coverage of probes to the predetermined region(s) of interest. The enriched nucleic acid sample may then be sequenced to generate sequencing reads. The sequencing reads may be processed to identify one or more genomic aberration(s) in one or more biological samples of the subject that appears at a frequency of less than about 1%, less than about 2%, less than about 3%, less than about 4%, less than about 5%, less than about 6%, less than about 7%, less than about 8%, less than about 9%, less than about 10%, less than about 15%, or less than about 20% in the nucleic acid sample. One or more biological samples may comprise blood sample(s) and/or a tissue sample(s). The tumor tissue sample may be a FFPE tissue. One or more biological samples may be selected from the group consisting of protein, peptides, cell-free nucleic acids, ribonucleic acids, deoxyribose nucleic acids, and any combination thereof. One or more genomic aberrations can include nucleic acid mutations. One or more genomic aberrations may be selected from the group consisting of an insertion, nucleotide deletion, nucleotide substitution, amino acid insertion, amino acid deletion, amino acid substitution, gene fusion, copy-number variation, gene expression signatures, and any combination thereof.
The probe set can be further used to generate a classifier. First, one or more predetermined regions of a genome may be sequenced from a tumor tissue sample of the subject to provide sequencing reads. From the sequencing reads, sequences for the probe set may be identified that cover one or more predetermined regions of a genome. Then, the probe set may be compared to one or more predetermined regions to measure (i) probe coverage of each probe in the probe set and (ii) off-target probe coverage for each probe in the probe set. An on-target rate of the probe set may be determined based on a ratio of the off-target coverage to the probe coverage. A portion of the probe set may be selected that covers one or more predetermined regions of a genome and a portion of the probe set with an on-target rate as a group of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, and at least about 95%, thereby determining a custom probe set. One or more features may be provided to permit classification of the probe set for one or more probes. Alternatively, the off-target rate as a group may be determined by (i) comparing the probe set to at least one predetermined region to measure (1) probe coverage of each probe in the probe set and (2) on-target probe coverage for each probe in the probe set, and (ii) determining the off-target rate of the probe set based on a ratio of the on-target coverage to the probe coverage.
One or more predetermined region(s) can comprise components selected from the group consisting of one or more segments of a gene, one or more segments of a plurality of genes, coding sequences, non-coding sequences, at least 2600 genes, gene fusions, point mutations, indels, copy-number variations, promoters, and/or enhancers. Such components may comprise at least about 500 genes, at least about 1000 genes, at least about 1200 genes, at least about 1400 genes, at least about 1600 genes, at least about 1800 genes, at least about 2000 genes, at least about 2200 genes, at least about 2600 genes, at least about 2800 genes, at least about 3000 genes, or at least about 3500 genes. One or more features can be selected from the group consisting of sequence, sequence length, alignment location, probe coverage, off-target probe coverage, on target rate, genomic aberrations, and genes or variants selected from Table 1. The predetermined regions may be coding or non-coding sequences. Non-coding sequences may comprise pseudogenes, genes for encoding RNA, introns and untranslated regions of mRNA, regulatory DNA sequences, repetitive DNA sequences, and transposons. Sequencing can be selected from the group consisting of exome sequencing, transcriptome sequencing, genome sequencing, and cell-free DNA sequencing.
The classifier may also provide a method for classifying a new set of probes. First, a classifier and a new probe set may be provided. Then, one or more features may be generated from the new set of probes. One or more features may be inputted from the new set of probes into the classifier. The classifier may be used to predict a classification outcome for the new set of probes. The features may be selected from the group consisting of sequence, sequence length, alignment location, probe coverage, off-target probe coverage, on target rate, genomic aberrations, and genes or variants selected from Table 1. The classification outcome can be selected from a choice of 0 or a choice of 1. The choice of 0 may indicate a selection to not order the new set of probes and the choice of 1 may indicate a selection to order the new set of probes. The classifier may be a machine learning algorithm. The classifier may be a supervised learning algorithm. The classifier may be a machine learning algorithm that is capable of getting trained by feature selection. Machine learning methods can be selected from the group consisting of decision tree learning, association rule learning, artificial neural networks, deep learning, inductive logic programming, support vector machines, clustering, Bayesian networks, reinforcement learning, representation learning, similarity and metric learning, sparse dictionary learning, genetic algorithms, rule-based machine learning, learning classifier systems, supervised learning, and unsupervised learning. In supervised machine learning, the pursuit for algorithms can reason from outwardly supplied instances to produce general hypotheses to determine predictions about future behavior. Supervised machine learning can build a succinct model of the distribution of class labels in terms of predictor features.
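By way of non-limiting illustration, a supervised classifier that maps probe-set features to a classification outcome of 0 (do not order) or 1 (order) can be sketched with logistic regression trained by batch gradient descent. The two features (on-target rate and normalized probe coverage), the training values, the learning rate, and the epoch count are all hypothetical; any of the machine learning methods listed above could be substituted.

```python
import math


def train_logistic(X, y, lr=1.0, epochs=20000):
    """Fit weights and bias of a logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """Return 1 (order the probe set) or 0 (do not order)."""
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0


# Hypothetical training data: [on_target_rate, normalized_probe_coverage].
X_train = [
    [0.95, 0.90], [0.90, 0.80], [0.85, 0.95], [0.92, 0.85],  # ordered
    [0.40, 0.50], [0.30, 0.60], [0.50, 0.30], [0.45, 0.40],  # not ordered
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

model = train_logistic(X_train, y_train)
print(predict(model, [0.95, 0.90]), predict(model, [0.40, 0.50]))
```

The toy data are linearly separable, so the fitted model reproduces the training labels; real probe-set features would be noisier and would call for a held-out evaluation.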
When generating a classifier, the classifier may be evaluated based on prediction accuracy. The accuracy may be determined by splitting a training set and using a portion for estimating performance, by cross-validation, or by leave-one-out validation. Examples of classification algorithms may include linear classifiers, support vector machines, quadratic classifiers, kernel estimation, boosting, decision trees, neural networks, FMM neural networks, and learning vector quantization. Linear classifiers can include Fisher's linear discriminant, logistic regression, multinomial logistic regression, probit regression, support vector machines, Naive Bayes classifier, and perceptron.
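By way of non-limiting illustration, leave-one-out validation can be sketched as follows. A nearest-centroid classifier is used purely as a simple stand-in for whichever classification algorithm is chosen; the function names and the feature values are hypothetical.

```python
def fit_centroids(X, y):
    """Compute the mean feature vector of each class label."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
        counts[yi] = counts.get(yi, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(center, x))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def leave_one_out_accuracy(X, y):
    """Hold out each example once, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict_centroid(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical probe-set features: [on_target_rate, normalized_probe_coverage].
X = [
    [0.95, 0.90], [0.90, 0.80], [0.85, 0.95], [0.92, 0.85],
    [0.40, 0.50], [0.30, 0.60], [0.50, 0.30], [0.45, 0.40],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]
accuracy = leave_one_out_accuracy(X, y)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Because every example is used once for testing and otherwise for training, leave-one-out validation makes full use of small training sets, at the cost of refitting the classifier once per example.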
The present disclosure provides a system that may provide for analysis of one or more biological sample(s), which may be automated and/or not require involvement from a user. The automated system may preclude the need for any pipetting by a user, such as pipetting to transfer a sample from one station to another. For example, a user may input a biological sample into a machine for analysis of biocomponents (e.g., proteins and/or nucleic acids). Such an analyzer may analyze protein and/or nucleic acid biocomponents. The system, described in detail below, may provide a non-limiting example of an automated bioanalyzer that may not require any involvement from a user. The system may also comprise manual involvement from a user, such as manual pipetting.
The system may permit a user to prepare a biological sample for assaying and assay the biological sample without any pipetting by the user, or even without any involvement from the user. In some examples, the system permits the user to provide a biological sample (e.g., blood sample or tissue sample) to the system, at which point the system prepares the biological sample for sequencing and performs sequencing on the biological sample to generate sequencing data.
Systems of the present disclosure may permit a biological sample to be processed (e.g., sample preparation and sequencing) in a reproducible manner. For example, two systems as provided herein, in different geographic locations, may process the same biological sample or two subsets from the same biological sample and provide results that vary by at most about 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.1%, or 0.01%. Such variance may be determined, for example, by comparing sequence reads or consensus sequences.
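By way of non-limiting illustration, one way such variance between two systems might be quantified is as the percentage of positions at which two aligned consensus sequences differ. The function name and the example sequences are hypothetical, and the sketch assumes the two consensus sequences are already aligned and of equal length.

```python
def percent_discordance(consensus_a: str, consensus_b: str) -> float:
    """Percentage of aligned positions at which two consensus sequences differ."""
    if len(consensus_a) != len(consensus_b):
        raise ValueError("consensus sequences must be aligned to the same length")
    if not consensus_a:
        raise ValueError("consensus sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(consensus_a, consensus_b))
    return 100.0 * mismatches / len(consensus_a)


# Hypothetical consensus sequences produced by two systems from the same sample.
site_1 = "ACGTACGTAC"
site_2 = "ACGTACGTAT"
variance = percent_discordance(site_1, site_2)
print(f"results differ at {variance:.1f}% of positions")
```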
The system may comprise two robotic movers with at least about 20, 25, 30, 35, or 40 peripheral instruments. For example, the instruments may be selected from the group consisting of Spinnaker Robot with 1270 mm Extended Height Upgrade (Robotic Plate mover with gripper fingers and integrated camera), custom tables (Supports instruments and robotics), keyboard shelf and monitor stand (Support Keyboard and Monitor), Custom Guarding (Floor Standing Guarding), HEPA Ceiling with Positive Pressure (HEPA filtered air for pre PCR system with positive air pressure), HEPA Ceiling with Negative Pressure (Ceiling enclosure for Negative air pressure for Post Amplification system), Slide out Instrument Mezzanine (Pull out Mezzanine for instruments), Instrument Mezzanine (Fixed Instrument Mezzanine), Spinnaker Mix and Match Carousel (Plate Storage Carousel), Momentum Multimover (Scheduling Software with multi mover license), Momentum Concurrent License, Slide out Docking Tables (Custom Docking Tables for Hamilton Star), 10KVM UPS (Battery Backup), One Way Air Lock (Custom air lock between systems), AATI Fragment Analyzer (Performs QC on DNA fragments), ALPS 3000 (Plate Sealer (2 on system 2 offline)), Inheco Standard Plate Shaker (Automated Plate Shaker), Inheco DWP Plate Shaker (Automated Plate Shaker), Inheco Controller (Controls Plate Shakers), Inheco ODTC 96 (96 Well PCR Block), Hamilton Elite Decapper, Biotek MultifloFX (Dispenses Plates), Brooks Automation Xpeel (Plate Peeler), Thermo Kingfisher (DNA Extraction and Prep), Hamilton STAR (Liquid Handler), Bionex BeeSure (Acoustic Volume Check), Roche LC480 (QPCR), Bionex HiG4 (Plate Centrifuge), PCR Plate, Assay Plate for DNA Quantification, 96 Well Tube Racks, and 96 well tip boxes. The Hamilton STAR can be an automated liquid handler. The pre-Amplification STAR may be configured with 8 Pipetting channels, 2 Autolys channels (cell lysis and DNA extraction), EasyBlood Camera channel, and an Autoload barcode reader. 
The post-Amplification STAR can be configured with 8 Pipetting channels and an Autoload barcode reader. The EasyBlood component may be used in preparation and splitting of blood samples into their basic components including serum, plasma, white blood cells, and red blood cells. The camera may be used in determining the volume of separated plasma and cells.
Assaying may begin after a user inputs the biological sample. A request from the user may be received to process the biological sample or sequence the biological sample. The process may be automated.
During the quality check fragment analysis 606, the distribution size for biological sample's DNA fragments may be analyzed. The distribution size (or size distribution) may be at least about 100 base pairs (bp), 200 bp, 300 bp, 400 bp, 500 bp, 600 bp, 700 bp, 800 bp, 900 bp, 1000 bp, 1500 bp, or 2000 bp. Such size distribution may be an average or mean size distribution. The distribution size for FFPE tumor fragments may be at least about 50 bp, 100 bp, 150 bp, 200 bp, or 250 bp. The distribution size for cell free fragments may be at least about 50 bp, 100 bp, 150 bp, 200 bp, 250 bp. The distribution size for buffy coat fragments may be at least about 10 kb, 15 kb, 20 kb, 25 kb, 30 kb, 35 kb, or 40 kb. The isolated DNA may then be quantified 607 and the DNA concentration may be adjusted for storage 608. The FFPE tumor DNA quantified may be at least about 1 nanogram/microliter (ng/μL), 5 ng/μL, 10 ng/μL, 15 ng/μL, 20 ng/μL, 25 ng/μL, 30 ng/μL, 35 ng/μL, 40 ng/μL, 45 ng/μL, or 50 ng/μL. The cell free DNA quantified may be at least about 10 picograms/microliter (pg/μL), 20 pg/μL, 30 pg/μL, 40 pg/μL, 50 pg/μL, 60 pg/μL, 70 pg/μL, 80 pg/μL, 90 pg/μL, 100 pg/μL, 200 pg/μL, 300 pg/μL, 400 pg/μL, 500 pg/μL, 600 pg/μL, 700 pg/μL, 800 pg/μL, 900 pg/μL, 1000 pg/μL, or 1.5 ng/μL. The buffy coat DNA quantified may be at least about 1 ng/μL, 2 ng/μL, 3 ng/μL, 4 ng/μL, 5 ng/μL, 6 ng/μL, 7 ng/μL, 8 ng/μL, 9 ng/μL, 10 ng/μL, 15 ng/μL, 20 ng/μL, 25 ng/μL, 50 ng/μL, 100 ng/μL, 150 ng/μL, 200 ng/μL, or 300 ng/μL. During the DNA library preparations for downstream processes, the DNA fragments can be modified 609. The fragments can then undergo a quality control fragment analysis 610 by determining the distribution sizes for the modified DNA fragments and quantifying 611 the modified DNA. The distribution size (or size distribution) for FFPE tumor fragments may be at least about 50 bp, 100 bp, 150 bp, 200 bp, 250 bp, or 300 bp. 
The distribution size for buffy coat fragments may be at least about 50 bp, 100 bp, 150 bp, 200 bp, 300 bp, 400 bp, 500 bp, 600 bp, 700 bp, 800 bp, 900 bp, or 1000 bp. The FFPE tumor fragment quantified may be at least about 500 ng/μL, 600 ng/μL, 700 ng/μL, 800 ng/μL, 900 ng/μL, 1000 ng/μL, 1500 ng/μL, or 2000 ng/μL. The buffy coat fragment quantified may be at least about 500 ng/μL, 600 ng/μL, 700 ng/μL, 800 ng/μL, 900 ng/μL, 1000 ng/μL, 1500 ng/μL, or 2000 ng/μL. The cell free fragment quantified may be at least about 5 ng/μL, 10 ng/μL, 15 ng/μL, 20 ng/μL, 25 ng/μL, 30 ng/μL, 35 ng/μL, 40 ng/μL, 45 ng/μL, or 50 ng/μL. Of the DNA library, during target capture 612, DNA can be selected based on its match with at most about 1000 genes, 1500 genes, 2000 genes, 2500 genes, or 3000 genes in table 1. After target capture, the distribution of the size for the DNA fragments and the amount of DNA isolated may be measured 613, 614. Then, the DNA can be adjusted 615 to the correct concentration and each patient library can be tagged 615 with a specific barcode for downstream analysis. The correct concentration may be at most about 100 ng/μL, 150 ng/μL, 200 ng/μL, 250 ng/μL, 300 ng/μL, 350 ng/μL, 400 ng/μL, 450 ng/μL, 500 ng/μL, 550 ng/μL, or 600 ng/μL.
The system can accommodate at most about 100, 50, 45, 40, 35, 30, 20, 10, or fewer subject (e.g., patient) samples. Alternatively, the system can accommodate at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 or more subject samples. Oligonucleotides, such as DNA or RNA (e.g., transcripts), can be selected for targets of interest, such as by enriching, and prepared for loading onto a nucleic acid sequencer (e.g., sequencer by Illumina, Pacific Biosciences of California, Ion Torrent or Oxford Nanopore). Each sample can be indexed, and the indexed samples can be loaded together onto the sequencer without mixing the results.
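The indexing step above can be illustrated with a short sketch of how pooled reads might be separated again by sample index after sequencing. The data layout (pairs of index and sequence), the index strings, and the sample names are assumptions made for the example.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group (index, sequence) reads by sample.

    Reads whose index is not in the lookup table are collected under
    'undetermined' rather than being assigned to any sample.
    """
    by_sample = defaultdict(list)
    for index, sequence in reads:
        sample = index_to_sample.get(index, "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


# Hypothetical index tags and pooled reads.
index_to_sample = {"ACGT": "patient_A", "TGCA": "patient_B"}
reads = [
    ("ACGT", "GGCTA"),
    ("TGCA", "TTAGC"),
    ("ACGT", "CCGAT"),
    ("NNNN", "AAAAA"),
]
groups = demultiplex(reads, index_to_sample)
print(groups)
```

Keeping unrecognized indexes in a separate group is one simple way to avoid mixing results between samples.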
Polynucleotides may be tagged with a multitude of polynucleotide molecules from an adaptor library to generate a pool of tagged polynucleotides. The pool of tagged polynucleotides may be amplified among a variety of sequencing adaptors. The sequencing adaptors may comprise primers with sequences that are specifically complementary to sequences in the plurality of polynucleotide molecules. Each of the sequencing adaptors may further contain an index tag, which can be a recognizable sample motif.
Tags can be any type of molecule chemically attached to aid in detection or labeling. Tags attached to a polynucleotide may comprise nucleic acids, chemical compounds, fluorescent probes, or radioactive probes. Tags may also be oligonucleotides (e.g., DNA or RNA). Tags can comprise known sequences, unknown sequences, or both. A tag can comprise random sequences, pre-determined sequences, or both. A tag can be double-stranded or single-stranded. A double-stranded tag can be a duplex tag. A double-stranded tag can comprise two complementary strands. Alternatively, a double-stranded tag can comprise a hybridized portion and a non-hybridized portion. The double-stranded tag can be Y-shaped, e.g., the hybridized portion is at one end of the tag and the non-hybridized portion is at the opposite end of the tag. One such example is the “Y adapters” used in Illumina sequencing. Other examples include hairpin shaped adapters or bubble shaped adapters. Bubble shaped adapters have non-complementary sequences flanked on both sides by complementary sequences.
Samples may be processed to include barcodes (e.g., sample barcode, molecular barcode) and functional sequences that may be used, for example, to permit use of a given sample of a nucleic acid sequence. In an example, such functional sequences may include flow cell sequences that permit a nucleic acid sample to be coupled to a flow cell of a nucleic acid sequencer (e.g., Illumina P5/P7 adaptors).
A variety of methods can be used for tagging. For example, a polynucleotide can be tagged with an adaptor by hybridization. The adaptor may have a nucleotide sequence that is complementary to at least a portion of a sequence of the polynucleotide. The polynucleotide may also be tagged with an adaptor by ligation.
One or more enzymes may also be used for tagging. The enzyme can be a ligase such as a DNA ligase or a thermostable ligase. For example, the DNA ligase can be selected from a group consisting of E. coli DNA ligase, T4 DNA ligase, and/or mammalian ligase. The mammalian ligase can be DNA ligase I, DNA ligase III, or DNA ligase IV. Tags can be ligated to a blunt-end of a polynucleotide by blunt-end ligation. Tags can also be ligated to a sticky end of a polynucleotide by sticky-end ligation. Efficiency of ligation can be increased by optimizing various conditions. Efficiency of ligation can be increased by optimizing the reaction time of ligation. For example, the reaction time of ligation can be less than about 12 hours, such as less than about 1, less than 2, less than 3, less than 4, less than 5, less than 6, less than 7, less than 8, less than 9, less than 10, less than 11, less than 12, less than 13, less than 14, less than 15, less than 16, less than 17, less than 18, less than 19, or less than 20 hours.
Adjusting the ligase concentration of the reaction may increase the efficiency of ligation. For example, the ligase concentration can be at least about 10 unit/microliter, at least 50 unit/microliter, at least 100 unit/microliter, at least 150 unit/microliter, at least 200 unit/microliter, at least 250 unit/microliter, at least 300 unit/microliter, at least 400 unit/microliter, at least 500 unit/microliter, or at least 600 unit/microliter. Efficiency can also be optimized by adding or varying the concentration of an enzyme suitable for ligation, enzyme cofactors or other additives, and/or optimizing a temperature of a solution having the enzyme. Efficiency can also be optimized by varying the addition order of various components of the reaction. The end of the tag sequence can comprise a dinucleotide to increase ligation efficiency. When the tag comprises a non-complementary portion (e.g., Y-shaped adaptor), the sequence on the complementary portion of the tag adaptor can comprise one or more selected sequences that promote ligation efficiency. Preferably such sequences are located at the terminal end of the tag. Such sequences can comprise 1 terminal base, 2 terminal bases, 3 terminal bases, 4 terminal bases, 5 terminal bases, 6 terminal bases, 7 terminal bases, 8 terminal bases, 9 terminal bases, 10 terminal bases, 11 terminal bases, or 12 terminal bases. A reaction solution with high viscosity (e.g., a low Reynolds number) can also be used to increase ligation efficiency. For example, the solution can have a Reynolds number less than 3000, less than 2000, less than 1000, less than 900, less than 800, less than 700, less than 600, less than 500, less than 400, less than 300, less than 200, less than 100, less than 50, less than 25, or less than 10. Further, a roughly uniform distribution of fragment sizes can be used to increase ligation efficiency. The roughly uniform distribution of fragment sizes can have a tight standard deviation.
For example, fragment sizes can vary by less than 20%, less than 15%, less than 10%, less than 5%, or less than 1%. Tagging can also comprise primer extension, for example, by polymerase chain reaction (PCR). Tagging can also comprise any of ligation-based PCR, multiplex PCR, single strand ligation, or single strand circularization.
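One possible way to express the fragment size variation mentioned above as a percentage is the coefficient of variation (standard deviation divided by mean). The sketch below takes that interpretation as an assumption; the disclosure does not specify which statistic is used, and the function names and fragment lengths are illustrative.

```python
from statistics import mean, pstdev


def size_variation_percent(fragment_lengths_bp):
    """Coefficient of variation of fragment lengths, as a percentage."""
    return 100.0 * pstdev(fragment_lengths_bp) / mean(fragment_lengths_bp)


def is_tight(fragment_lengths_bp, max_percent=20.0):
    """True if the size variation is below the chosen percentage threshold."""
    return size_variation_percent(fragment_lengths_bp) < max_percent


# Hypothetical fragment lengths in base pairs.
lengths = [150, 160, 170, 160, 160]
print(round(size_variation_percent(lengths), 2))
print(is_tight(lengths, max_percent=5.0))
```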
The tags may also comprise molecular barcodes. Molecular barcodes can be used to differentiate polynucleotides in a sample and may be different from one another. For example, molecular barcodes can have a difference between them that can be characterized by a predetermined edit distance or a Hamming distance. In some instances, the molecular barcodes herein have a minimum edit distance of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. To further improve efficiency of conversion (e.g., tagging) of untagged molecules to tagged molecules, one preferably utilizes short tags. For example, a library adapter tag can be up to about 75, 70, 65, 60, 55, 50, 45, 40, or 35 nucleotide bases in length. A collection of such short library barcodes can include a number of different molecular barcodes, such as at least 2, 4, 6, 8, 10, 12, 14, 16, 18 or 20 different barcodes with a minimum edit distance of 1, 2, 3 or more.
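To illustrate the distance criterion above, the following sketch computes the minimum pairwise Hamming distance of a barcode set. It covers only equal-length barcodes and substitutions; a general edit distance would also count insertions and deletions. The barcode sequences are made up for the example.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming_distance(a, b) for a, b in combinations(barcodes, 2))


# Hypothetical 4-base barcodes.
barcodes = ["AACC", "AAGG", "TTCC", "TTGG"]
print(min_pairwise_hamming(barcodes))
```

A set with a minimum distance of 2 allows a single substitution error in a barcode to be detected, because one error cannot turn one valid barcode into another.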
As a result, a collection of molecules may comprise one or more tags. In some instances, some molecules in a collection can include an identifying tag (“identifier”) such as a molecular barcode that is not shared by any other molecule in the collection. For example, in some instances of a collection of molecules, at least 50%, at least 51%, at least 52%, at least 53%, at least 54%, at least 55%, at least 56%, at least 57%, at least 58%, at least 59%, at least 60%, at least 61%, at least 62%, at least 63%, at least 64%, at least 65%, at least 66%, at least 67%, at least 68%, at least 69%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% of the molecules in the collection can include an identifier or molecular barcode that is not shared by any other molecule in the collection. A collection of molecules may be considered “uniquely tagged” if each of at least 95% of the molecules in the collection carries an identifier that is not shared by any other molecule in the collection (“unique tag” or “unique identifier”). A collection of molecules is considered to be “non-uniquely tagged” if each of at least 1%, at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, or at least or about 50% of the molecules in the collection bears an identifying tag or molecular barcode that is shared by at least one other molecule in the collection (“non-unique tag” or “non-unique identifier”). Accordingly, in a non-uniquely tagged population no more than 1% of the molecules are uniquely tagged. 
For example, in a non-uniquely tagged population, no more than 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50% of the molecules can be uniquely tagged. Examples of tags and adaptors, which may be used with methods and systems of the present disclosure, are provided in U.S. Patent Publication Nos. 2016/0040229 and 2016/0046986, each of which is entirely incorporated herein by reference.
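The 95% criterion for a “uniquely tagged” collection can be sketched as follows. Each molecule is represented only by its tag, and the function names, the tag strings, and the default threshold argument are assumptions made for illustration.

```python
from collections import Counter


def unique_fraction(tags):
    """Fraction of molecules whose tag is not shared by any other molecule."""
    counts = Counter(tags)
    unique = sum(1 for tag in tags if counts[tag] == 1)
    return unique / len(tags)


def is_uniquely_tagged(tags, threshold=0.95):
    """True if at least `threshold` of the molecules carry an unshared tag."""
    return unique_fraction(tags) >= threshold


# 98 molecules with distinct tags plus 2 molecules sharing one tag.
mostly_unique = ["T%d" % i for i in range(98)] + ["DUP", "DUP"]
# 5 molecules where only one tag is unshared.
mostly_shared = ["A", "A", "B", "B", "C"]

print(unique_fraction(mostly_unique), is_uniquely_tagged(mostly_unique))
print(unique_fraction(mostly_shared), is_uniquely_tagged(mostly_shared))
```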
The number of different tags selected can depend on the estimated number of molecules in a sample. In some tagging methods, the number of different tags can be at least the same as the estimated number of molecules in the sample. In other tagging methods, the number of different tags can be at least two, three, four, five, six, seven, eight, nine, ten, one hundred or one thousand times as many as the estimated number of molecules in the sample. In unique tagging, at least two times (or more) as many different tags can be used as the estimated number of molecules in the sample.
The molecules in the sample may be non-uniquely tagged. In such instances, fewer tags or molecular barcodes are used than the number of molecules in the sample to be tagged. For example, no more than 100, 50, 40, 30, 20 or 10 unique tags or molecular barcodes are used to tag a complex sample such as a cell free DNA sample with many more different fragments.
The polynucleotide can be fragmented prior to tagging either naturally or using other approaches, such as, for example, shearing. The polynucleotides can be fragmented by certain methods selected from the group consisting of mechanical shearing, passing the sample through a syringe, sonication, heat treatment (e.g., for 30 minutes at 90° C.), and/or nuclease treatment (e.g., using DNase, RNase, endonuclease, exonuclease, and/or restriction enzyme).
The polynucleotide fragments before tagging can comprise sequences of any length. For example, the length can be selected from the group consisting of at least 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000 or more nucleotides in length. The polynucleotide fragments can be about the average length of cell-free DNA. For example, the polynucleotide fragments can comprise about 160 bases in length. The polynucleotide fragment can also be fragmented from a larger fragment into smaller fragments about 160 bases in length.
Tagged polynucleotides may include cancer-associated sequences. The cancer-associated sequences can comprise single nucleotide variation (SNV), copy number variation (CNV), insertions, deletions, and/or rearrangements.
Nucleic acid barcodes with identifiable sequences comprising molecular barcodes may be used for tagging. For example, a plurality of DNA barcodes can comprise various numbers of sequences of nucleotides. A plurality of DNA barcodes having 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more identifiable sequences of nucleotides can be used. When attached to only one end of a polynucleotide, the plurality of DNA barcodes can produce 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more different identifiers. Alternatively, when attached to both ends of a polynucleotide, the plurality of DNA barcodes can produce 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400 or more different identifiers (which is the square of the number produced when the DNA barcode is attached to only one end of a polynucleotide). In one example, a plurality of DNA barcodes having 6, 7, 8, 9 or 10 identifiable sequences of nucleotides can be used. When attached to both ends of a polynucleotide, they produce 36, 49, 64, 81 or 100 possible different identifiers, respectively. Samples tagged in such a way can be those with a range of about 10 ng to any of about 100 ng, about 1 μg, or about 10 μg of fragmented polynucleotides, e.g., genomic DNA, e.g., cfDNA.
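The squared relationship above follows from pairing any barcode on one end with any barcode on the other end. A minimal sketch, assuming the two ends are distinguishable so that ordered pairs are counted, and using placeholder barcode names:

```python
from itertools import product


def paired_identifiers(barcodes):
    """All (first-end, second-end) barcode pairs; n barcodes give n * n pairs."""
    return list(product(barcodes, repeat=2))


# Eight placeholder barcodes, matching the 8-barcode case in the text.
barcodes = ["BC%d" % i for i in range(1, 9)]
identifiers = paired_identifiers(barcodes)
print(len(identifiers))

# Counts for sets of 6 to 10 barcodes.
counts = [len(paired_identifiers(range(n))) for n in range(6, 11)]
print(counts)
```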
There are many ways a polynucleotide may be uniquely identified. For example, a polynucleotide can be uniquely identified by a unique DNA barcode. In that case, any two polynucleotides in a sample are attached to two different DNA barcodes. Alternatively, a polynucleotide can be uniquely identified by the combination of a DNA barcode and one or more endogenous sequences of the polynucleotide. For example, any two polynucleotides in a sample can be attached to the same DNA barcode, but the two polynucleotides can still be identified by different endogenous sequences. The endogenous sequence can be on an end of a polynucleotide. For example, the endogenous sequence can be adjacent (e.g., base in between) to the attached DNA barcode. In some instances the endogenous sequence can be at least 2, 4, 6, 8, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 bases in length. The endogenous sequence may be a terminal sequence of the fragment/polynucleotides to be analyzed. The endogenous sequence may be the length of the sequence. For example, a plurality of DNA barcodes comprising 8 different DNA barcodes can be attached to both ends of each polynucleotide in a sample. Each polynucleotide in the sample can be identified by the combination of the DNA barcodes and about 10 base pair endogenous sequence on an end of the polynucleotide. Without being bound by theory, the endogenous sequence of a polynucleotide can also be the entire polynucleotide sequence.
A barcode can comprise either a contiguous or a non-contiguous sequence. A barcode that comprises at least 1, 2, 3, 4, 5 or more nucleotides may be a contiguous sequence or non-contiguous sequence. For example, if a barcode comprises the sequence TTGC, a barcode is contiguous if the barcode is TTGC. On the other hand, a barcode is non-contiguous if the barcode is TTXGC, where X is a nucleic acid base.
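The combination of barcodes and endogenous sequence can be sketched as a composite key. In this illustration the key uses both end barcodes plus the first 10 bases of the fragment; the function name, the choice of the leading end, and the sequences are assumptions for the example only.

```python
def molecule_key(first_barcode, second_barcode, fragment_sequence, endogenous_length=10):
    """Identifier built from both end barcodes and a terminal endogenous sequence."""
    return (first_barcode, second_barcode, fragment_sequence[:endogenous_length])


# Two different fragments that happen to carry the same pair of barcodes.
key_a = molecule_key("BC3", "BC5", "ACGTACGTACGGTTA")
key_b = molecule_key("BC3", "BC5", "TTGCATGCAAGGTTA")

# A second read of the first fragment yields the same key.
key_a_again = molecule_key("BC3", "BC5", "ACGTACGTACGGTTA")

print(key_a)
print(key_a != key_b, key_a == key_a_again)
```

Because the endogenous sequence differs, the two fragments remain distinguishable even though the barcodes alone are not unique.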
An identifier or molecular barcode can have an n-mer sequence which may be 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 or more nucleotides in length. A tag herein can comprise any range of nucleotides in length. For example, the sequence can be between 2 to 100, 10 to 90, 20 to 80, 30 to 70, 40 to 60, or about 50 nucleotides in length.
The tag can comprise, downstream of the identifier or molecular barcode, a double-stranded fixed reference sequence. The tag may also comprise a double-stranded fixed reference sequence upstream or downstream of the identifier or molecular barcode. Each strand of a double-stranded fixed reference sequence can be, for example, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides in length.
The following instruments may be used to perform the functions described below: Hamilton STAR, Thermo KingFisher, Bionex HiG4 centrifuge, Inheco ODTC thermocycler, Inheco incubator shaker, Biotek MultifloFX, Thermo Fisher Spinnaker robotic arm, Thermo Fisher ALPS3000 plate sealer, Brooks XPeel, Roche LightCycler 480 for qPCR based nucleic acid quantitation, AATI Fragment Analyzer Infinity for nucleic acid size and quantity determination, and Hamilton LabElite Capper/Decapper. The automated sample analysis platform may perform multiple functions for biological sample analysis. These functions may include the main sample prep for the system (the Main method) and may be divided into two methods. The first method may include the Pre-Amplification Sample Processing, which is associated with sequencing preparations. Pre-Amplification Sample Processing may comprise the tasks of DNA extraction from buffy coat or whole blood, cell-free DNA extraction from plasma, DNA and RNA extraction from FFPE tissue samples, DNA and RNA quantitation, QC, Normalization, DNA Fragmentation, End Repair, adapter Ligation and Bead Cleanup, PCR amplification and sample combination. Methods may vary in accordance with user preference(s). The system may have at least about 1 iteration, 2 iterations, 3 iterations, 4 iterations, or 5 iterations in a work day. One work day may be at least about 6 hours, 7 hours, 8 hours, 9 hours, or 10 hours. During each work day, at least about 1 PCR plate, 2 PCR plates, 3 PCR plates, 4 PCR plates, or 5 PCR plates may be transferred to the Post-Amplification System. During the Pre-Amplification sample processing, the lysis method may be run on the liquid handler (Hamilton STAR) with a deep well plate. The tip box can be sent to the waste. The plate may be sealed and incubated for at least about 15 minutes, 30 minutes, 1 hour, 2 hours, or 3 hours with shaking.
Then the plate may undergo centrifugation for at least about 30 seconds, 1 minute, 1.5 minutes, 2 minutes, 3 minutes or 5 minutes. The plate may be peeled. The beads can be added on the liquid handler and loaded onto the DNA and extraction prep shelves (Kingfisher). The beads may be magnetic beads. The extraction protocol may be run and may comprise an additional wash and extraction of plates on the Kingfisher. The extracted DNA may have magnetic beads. The QC plates may be read on the fragment analyzer. Sound waves may be utilized to determine the volume of fragments. If the samples are good, the result may include pure DNA or RNA from various samples. Quantification may be determined by capillary-based separation of DNA by size. Real-time or quantitative PCR (qPCR) may be used to measure the amount. The quantitative PCR may be performed with a KAPA kit. The qPCR may be used to select for the DNA that will be sequenced. If the samples are bad, the extraction protocol can be re-run. The destination tube rack may be decapped and placed on the Star deck. The data from the fragment analyzer and LightCycler 480 may be used to make the normalization plate on the Star. The sample may be aliquoted to the tube rack, re-capped, and sent to the output rack. During shearing, enzyme may be dispensed to the normalized plate. During shearing, flow cell adaptors may be attached to DNA. For cell free DNA, identifiers may be attached. The identifier may be a patient identifier or a unique identifier. The normalized plate may be sealed and incubated with shaking for at least about 10 minutes, 15 minutes, 20 minutes, 25 minutes, or 30 minutes. The plate can be spun and the seal peeled. The end repair method can be run on the Star. The plate may be read on the fragment analyzer for QC. The normalized plate may be sealed and incubated with shaking for at least about 1 minute, 5 minutes, 10 minutes, 20 minutes, 30 minutes, 1 hour, 2 hours, 3 hours, 4 hours, or 5 hours.
The normalized plate may undergo centrifugation and then be peeled. During adaptor ligation, the method may be run on the Star and beads can be added. The plate may be moved to the Kingfisher and can undergo an additional wash, cleanup, and elution step. The magbead cleanup process can be run on the Kingfisher. The remaining plates may be removed from the Kingfisher to the waste or carousel, and the PCR plate may be sealed.
The completion time may be at least about 3 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 9 hours, or 10 hours for at least about 1 plate, 2 plates, 3 plates, 4 plates, 5 plates, 6 plates, or about 7 plates. The timing may be influenced by incubations that are at least about 30 min, 1 hr, 2 hrs, 3 hrs, 4 hrs, 5 hrs, or 10 hrs.
The second method may be the Post-Amplification Plate preparation. The second method may include PCR, cleanup, QC, target capture, normalization and pooling. These methods may change depending on the customer. During the Post-Amplification Plate preparation, the Pre-Amplification PCR plate may be placed on the Inheco and the protocol may be run. The PCR plate may be centrifuged and peeled, moved to the Star and transferred to the new Kingfisher plate. The reagents may be dispensed on the Biotek MultifloFX dispenser and transferred to the Kingfisher. The wash plates may be loaded, the Kingfisher routine can be run, and the plates can be transferred to the Star. The QC plate and PCR plate can be made. The beads can be added with the Star, the Kingfisher routine can be run, the plates can be transferred to the Star, and 8 PCR plates can be generated. The PCR protocol can then be run, and the Ampure cleanup protocol may be repeated on the Star and Kingfisher. The QC plate can be made and run on the fragment analyzer, and the output and pool samples can be normalized on the Star. The system may also comprise a robotic camera that checks every plate and scans the barcode to ensure the right sample is handled.
The system providing for analysis of one or more biological sample(s) may be connected to a cloud computing system to form a “lab in a box with a cloud”. The cloud computing system may comprise a cloud storage system and one or more supercomputers. In cloud computing, a network of remote servers may be hosted on the internet to store, manage, and process data from the system providing for analysis of one or more biological sample(s), rather than a local server or a personal computer. In cloud storage, data and the mathematical models from the system providing for analysis of one or more biological sample(s) may be stored on remote servers accessed from the internet or “cloud”. The cloud storage may be maintained, operated and managed by a cloud storage service provider on storage servers that are built on virtualization methods. The output data and methods, disclosed herein, from the system providing for analysis of one or more biological sample(s) can be transferred directly to the cloud computing system. The cloud computing system can comprise the system providing for analysis of one or more biological sample(s). The cloud computing system can store methods and data as metadata along every step of the analysis of one or more biological sample(s). A user may have access to the “lab in a box with a cloud”.
The biological markers may include a plurality of different types of biological markers. In some cases, at least about 1 biological marker, 10 biological markers, 50 biological markers, 100 biological markers, 500 biological markers, 1000 biological markers, 1500 biological markers, 2000 biological markers, 2500 biological markers, 3000 biological markers, 3500 biological markers, or 4000 biological markers can be assayed. Through curated clinical trials and drugs, an annotated set of biological markers may be generated.
Cell-free DNA may be assayed for one or more biomarkers in the following genes including: ABL1, AKT1, AKT2, AKT3, ALK, APC, AR, ARAF, ARID1A, ASXL1, ATM, ATR, AURKA, AURKB, AURKC, BAP1, BCL2, BRAF, BRCA1, BRCA2, BRD2, BRD3, BRD4, CCND1, CCND2, CCND3, CCNE1, CDH1, CDK12, CDK4, CDK6, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CEBPA, CREBBP, CRKL, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, EPHA3, EPHA5, ERBB2, ERBB3, ERBB4, ERCC2, ERG, ERRFI1, ESR1, ETV1, ETV4, ETV5, ETV6, EWSR1, EZH2, FBXW7, FGFR1, FGFR2, FGFR3, FLCN, FLT3, GATA3, GNA11, GNAQ, GNAS, GSTM1, HNF1A, HRAS, IDH1, IDH2, IGF1R, JAK2, JAK3, KDR, KEAP1, KIT, KMT2A, KRAS, MAP2K1, MAP2K2, MAP2K4, MAPK1, MAPK3, MCL1, MDM2, MDM4, MED12, MEN1, MET, MITF, MKI67, MLH1, MPL, MSH2, MSH6, MTOR, MYC, MYD88, NF1, NF2, NFE2L2, NFKBIA, NKX2-1, NOTCH1, NOTCH2, NPM1, NRAS, NTRK1, NTRK3, NUTM1, PDGFRA, PDGFRB, PGR, PIK3CA, PIK3CB, PIK3R1, PTCH1, PTEN, PTPN11, RAB35, RAF1, RARA, RB1, RET, RHEB, RHOA, RIT1, RNF43, ROS1, RSPO2, RUNX1, SMAD2, SMAD4, SMARCA4, SMARCB1, SMO, SRC, STK11, SYK, TERT, TET2, TMPRSS2, TP53, TSC1, TSC2, VHL, WT1, XPO1, ZNRF3, BTK, CD274, FOXL2, MYCN, PDCD1LG2, and VEGFA.
Biomarkers may comprise at least one present in one or more of the following exons: 61E3.4, AAK1, AARS, AARS2, AATK, ABCB1, ABCC9, ABI1, ABL1, ABL2, AC099552.4, ACKR3, ACP1, ACSL3, ACSL6, ACSM2B, ACTA2, ACTB, ACTC1, ACTG1, ACTL6B, ACTR2, ACVR1, ACVR1B, ACVR1C, ACVR2A, ACVR2B, ACVRL1, ADAM10, ADAM29, ADAMTS10, ADAMTS16, ADAMTS2, ADAMTS20, ADCK1, ADCK2, ADCK3, ADCK4, ADCK5, ADCY1, ADORA2A, ADRB1, ADRB2, ADRBK1, ADRBK2, AES, AFAP1, AFF1, AFF3, AFF4, AGBL4, AGXT2, AHCTF1, AHCYL2, AHDC1, AHNAK, AHNAK2, AJUBA, AK9, AKAP1, AKAP13, AKAP9, AKR1B10, AKT1, AKT2, AKT3, AL603965.1, ALDH2, ALDH3A2, ALDH7A1, ALG10B, ALK, ALKBH2, ALKBH3, ALOX12B, ALOX5, ALPK1, ALPK2, ALPK3, AMER1, AMHR2, AMPH, ANAPC1, ANKK1, ANKRD11, ANKRD12, ANKRD20A4, ANKRD30A, ANKRD36, ANKRD53, ANKRD6, ANXA6, ANXA8L2, AP003733.1, AP2A1, APAF1, APC, APC2, APEX1, APEX2, API5, APLF, APOB, APOBEC3G, APTX, AQP12A, AQP7, AR, ARAF, AREG, ARFRP1, ARG1, ARG2, ARHGAP26, ARHGAP32, ARHGAP35, ARHGAP36, ARHGEF12, ARHGEF18, ARHGEF35, ARHGEF6, ARID1A, ARID1B, ARID2, ARID3A, ARID3B, ARID4A, ARID4B, ARID5A, ARID5B, ARNT, ASB5, ASCL4, ASH2L, ASPM, ASPSCR1, ASTN2, ASXL1, ASXL2, ASXL3, ATF1, ATF7IP, ATG13, ATG5, ATIC, ATM, ATP1A1, ATP2B3, ATR, ATRIP, ATRX, ATXN1, AURKA, AURKB, AURKC, AXIN1, AXIN2, AXL, B2M, B3GNTL1, B4GALT3, BAGE2, BAIAP2L1, BAP1, BARD1, BAZ1B, BAZ2A, BBC3, BCAP31, BCKDK, BCL10, BCL11A, BCL11B, BCL2, BCL2A1, BCL2L1, BCL2L11, BCL2L12, BCL2L2, BCL3, BCL6, BCL7A, BCL9, BCL9L, BCLAF1, BCOR, BCORL1, BCR, BIRC2, BIRC3, BLK, BLM, BMP2K, BMPR1A, BMPR1B, BMPR2, BMX, BPNT1, BRAF, BRCA1, BRCA2, BRD2, BRD3, BRD4, BRDT, BRINP3, BRIP1, BRSK1, BRSK2, BRWD3, BTG1, BTG2, BTK, BUB1, BUB1B, C11ORF30, C15ORF65, C16ORF59, C19ORF40, C1ORF159, C1ORF86, C1QTNF5, C20ORF26, C2CD3, C2ORF44, C3ORF70, C4ORF27, C7, C7ORF50, C7ORF55, C8A, C8ORF37, C8ORF44, CABLES2, CACNA1C, CACNA1D, CACNA1S, CAD, CALCR, CALM1, CALN1, CALR, CAMK1D, CAMK1G, CAMK2A, CAMK2B, CAMK2D, CAMK2G, CAMK4, CAMKK1, CAMKK2, CAMKV, CAMTA1, CANT1, CARD11, CARM1, CARS, CASC5, CASK,
CASP8, CAST, CBFA2T3, CBFB, CBL, CBLB, CBLC, CBLN4, CBWD1, CCAR1, CCDC107, CCDC144A, CCDC160, CCDC178, CCDC6, CCDC74A, CCNB1IP1, CCND1, CCND2, CCND3, CCNE1, CCNH, CD163L1, CD274, CD276, CD40, CD5L, CD74, CD79A, CD79B, CD82, CDC14A, CDC14B, CDC20, CDC25A, CDC25B, CDC25C, CDC27, CDC42, CDC42BPA, CDC42BPB, CDC42BPG, CDC42EP1, CDC7, CDC73, CDH1, CDH10, CDH11, CDH18, CDH2, CDH20, CDH4, CDH5, CDH6, CDH9, CDK1, CDK10, CDK11A, CDK12, CDK13, CDK14, CDK15, CDK16, CDK17, CDK18, CDK19, CDK2, CDK20, CDK3, CDK4, CDK5, CDK5RAP2, CDK6, CDK7, CDK8, CDK9, CDKL1, CDKL2, CDKL3, CDKL4, CDKL5, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CDKN2C, CDKN3, CDX2, CEBPA, CEP170, CEP89, CETN2, CFH, CFHR4, CFLAR, CHAF1A, CHCHD7, CHD2, CHD3, CHD4, CHD5, CHD7, CHD8, CHDC2, CHEK1, CHEK2, CHIC2, CHMP3, CHN1, CHUK, CIC, CIITA, CIT, CKMT1A, CKS1B, CLCN6, CLDN18, CLIP1, CLK1, CLK2, CLK3, CLK4, CLP1, CLSTN2, CLTC, CLTCL1, CLVS2, CMKLR1, CNBD1, CNBP, CNOT1, CNOT3, CNPY3, CNTN1, CNTNAP5, CNTRL, COBLL1, COL11A1, COL18A1, COL1A1, COL1A2, COL2A1, COL3A1, COMT, COX6C, CPS1, CPXCR1, CR1, CRB1, CREB1, CREB3L1, CREB3L2, CREBBP, CRIPAK, CRKL, CRLF2, CRTC1, CRTC3, CSDE1, CSF1, CSF1R, CSF3R, CSK, CSNK1A1, CSNK1A1L, CSNK1D, CSNK1E, CSNK1G1, CSNK1G2, CSNK1G3, CSNK2A1, CSNK2A2, CTAGE6, CTCF, CTDNEP1, CTDSP1, CTDSP2, CTDSPL, CTDSPL2, CTLA4, CTNNA1, CTNNA2, CTNNB1, CTNND1, CTTN, CUL1, CUL3, CUX1, CXCR4, CYC1, CYLD, CYP11B1, CYP2A6, CYP2B6, CYP2C19, CYP2C8, CYP2C9, CYP2D6, CYP3A4, CYP3A5, CYP4F2, DAB2IP, DACH1, DACH2, DAPK1, DAPK2, DAPK3, DAXX, DCAF12L2, DCC, DCLK1, DCLK2, DCLK3, DCLRE1A, DCLRE1B, DCLRE1C, DCP1B, DCTN1, DCUN1D1, DDB1, DDB2, DDIT3, DDR1, DDR2, DDX10, DDX3X, DDX5, DDX6, DEFB114, DEFB118, DEFB119, DEK, DERL1, DHX16, DHX9, DIAPH1, DICER1, DIDO1, DIO2, DIS3, DIS3L2, DISP1, DKK2, DKK4, DLG2, DLX4, DMC1, DMD, DMPK, DNAH12, DNAJA2, DNAJC6, DNER, DNM2, DNM3, DNMT1, DNMT3A, DNMT3B, DOCK2, DOCK4, DOK6, DOLPP1, DOT1L, DPH3, DPPA4, DPYD, DRD2, DRD5, DSC2, DSG2, DSP, DST, DSTYK, DUPD1, DUSP1, DUSP10, DUSP11, DUSP12, DUSP13,
DUSP14, DUSP15, DUSP16, DUSP18, DUSP19, DUSP2, DUSP21, DUSP22, DUSP23, DUSP26, DUSP27, DUSP28, DUSP3, DUSP4, DUSP5, DUSP6, DUSP7, DUSP8, DUSP9, DUT, DYNCH1, DYRK1A, DYRK1B, DYRK2, DYRK3, DYRK4, E2F3, EBF1, EBPL, ECT2L, EDNRB, EED, EEF1A1, EEF2K, EGFL7, EGFR, EGR3, EIF1AX, EIF2AK1, EIF2AK2, EIF2AK3, EIF2AK4, EIF2S1, EIF3E, EIF4A2, ELAVL3, ELF3, ELF4, ELF5, ELK4, ELL, ELN, ELTD1, EME1, EME2, EMG1, EML4, ENDOV, EP300, EPAS1, EPB41L3, EPCAM, EPDR1, EPHA1, EPHA10, EPHA2, EPHA3, EPHA4, EPHA5, EPHA6, EPHA7, EPHA8, EPHB1, EPHB2, EPHB3, EPHB4, EPHB6, EPM2A, EPOR, EPPK1, EPS15, ERBB2, ERBB2IP, ERBB3, ERBB4, ERC1, ERCC1, ERCC2, ERCC3, ERCC4, ERCC5, ERCC6, ERCC6L, ERCC8, ERG, ERN1, ERN2, ERRFI1, ESPL1, ESR1, ESR2, ESRRG, ETNK1, ETS1, ETV1, ETV4, ETV5, ETV6, EWSR1, EXO1, EXOSC10, EXT1, EXT2, EYA1, EYA2, EYA3, EYA4, EZH1, EZH2, EZR, F2, F5, FADD, FAM101A, FAM129B, FAM129C, FAM131B, FAM155A, FAM157B, FAM174B, FAM175A, FAM194B, FAM21A, FAM46C, FAM46D, FAM58A, FAM71B, FAM83H, FAM86B1, FAM86B2, FAM9A, FAN1, FANCA, FANCB, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCI, FANCL, FANCM, FANK1, FAS, FASTK, FAT1, FBN1, FBN2, FBXO11, FBXO43, FBXW7, FCGR1A, FCGR2B, FCGR3B, FCHO2, FCRL4, FEN1, FER, FES, FEV, FGF10, FGF14, FGF19, FGF23, FGF3, FGF4, FGF6, FGF7, FGFR1, FGFR1OP, FGFR2, FGFR3, FGFR4, FGR, FH, FHIT, FIP1L1, FIS1, FKBP9, FLCN, FLI1, FLNA, FLT1, FLT3, FLT4, FN1, FNBP1, FOLR1, FOSL2, FOXA1, FOXA2, FOXL2, FOXO1, FOXO3, FOXO4, FOXP1, FOXP4, FOXQ1, FRG1, FRG2B, FRK, FRS2, FSCN3, FSIP1, FSTL3, FTH1, FUBP1, FUS, FUT9, FYN, G3BP1, G6PD, GAB2, GAB3, GABRA6, GABRB2, GABRB3, GABRP, GAK, GALNT13, GAS6, GAS7, GATA1, GATA2, GATA3, GATA4, GATA6, GATS, GCK, GCSAML, GDI1, GEN1, GID4, GIGYF2, GIPC3, GLA, GLI1, GLI2, GLIPR1L2, GML, GMPS, GNA11, GNA13, GNAI1, GNAQ, GNAS, GNL3L, GNPTAB, GOLGA2, GOLGA5, GOLGA6L6, GOPC, GOT2, GP6, GPC3, GPC6, GPHN, GPR124, GPR89A, GPRASP1, GPS2, GPSM1, GREM1, GRIN2A, GRIN3A, GRK4, GRK5, GRK6, GRK7, GRM3, GRXCR1, GSG2, GSK3A, GSK3B, GSTM1, GSTP1, GSTT1, GTF2H1, GTF2H2, GTF2H3, 
GTF2H4, GTF2H5, GTF2I, GTF3C5, GUCY1A2, GUCY2C, GUCY2D, GUCY2F, H1F0, H1FNT, H1FOO, H1FX, H2AFB1, H2AFB2, H2AFB3, H2AFJ, H2AFV, H2AFX, H2AFY, H2AFY2, H2AFZ, H2BFM, H2BFWT, H3F3A, H3F3B, H3F3C, HCK, HCN1, HDAC1, HDAC10, HDAC11, HDAC2, HDAC3, HDAC4, HDAC5, HDAC6, HDAC7, HDAC8, HDAC9, HDDC2, HDHD1, HDHD2, HDHD3, HECW1, HELQ, HERC1, HERC2, HERPUD1, HEY1, HGF, HHLA2, HIF1A, HIP1, HIPK1, HIPK3, HIPK4, HIST1H1A, HIST1H1B, HIST1H1C, HIST1H1D, HIST1H1E, HIST1H1T, HIST1H2AA, HIST1H2AB, HIST1H2AC, HIST1H2AD, HIST1H2AE, HIST1H2AG, HIST1H2AH, HIST1H2AI, HIST1H2AJ, HIST1H2AK, HIST1H2AL, HIST1H2AM, HIST1H2BA, HIST1H2BB, HIST1H2BC, HIST1H2BD, HIST1H2BE, HIST1H2BF, HIST1H2BG, HIST1H2BH, HIST1H2BI, HIST1H2BK, HIST1H2BL, HIST1H2BM, HIST1H2BO, HIST1H3A, HIST1H3B, HIST1H3C, HIST1H3D, HIST1H3F, HIST1H3G, HIST1H3H, HIST1H3I, HIST1H3J, HIST1H4A, HIST1H4B, HIST1H4C, HIST1H4D, HIST1H4E, HIST1H4F, HIST1H4G, HIST1H4I, HIST1H4J, HIST1H4K, HIST1H4L, HIST2H2AA3, HIST2H2AA4, HIST2H2AB, HIST2H2AC, HIST2H2BE, HIST2H3A, HIST2H3C, HIST2H3D, HIST2H4A, HIST3H2A, HIST3H2BB, HIST3H3, HKR1, HLA-A, HLA-B, HLF, HLTF, HMGA1, HMGA2, HMGXB4, HNF1A, HNRNPA2B1, HNRNPM, HOOK3, HOXA11, HOXA13, HOXA3, HOXA9, HOXB13, HOXC11, HOXC13, HOXD11, HOXD13, HPCAL4, HRAS, HS6ST1, HSD3B1, HSP90AA1, HSP90AA2P, HSP90AB1, HSPA2, HSPA5, HSPA8, HSPB8, HUNK, HUS1, HUWE1, IAPP, IARS2, ICK, ICOSLG, ID3, IDH1, IDH2, IDO1, IFNGR1, IFNL3, IFT172, IGF1, IGF1R, IGF2, IGF2BP3, IGF2R, IGFBP7, IK, IKBKAP, IKBKB, IKBKE, IKBKG, IKZF1, IKZF2, IKZF3, IL10, IL18RAP, IL1RAPL1, IL2, IL21R, IL2RG, IL3, IL32, IL36A, IL6ST, IL7R, ILF2, ILK, ILKAP, IMPA1, IMPA2, IMPAD1, ING1, INHBA, INPP1, INPP4A, INPP4B, INPP5A, INPP5B, INPP5D, INPP5E, INPP5F, INPP5J, INPP5K, INPPL1, INSR, INSRR, INTS1, INTS4, IRAK1, IRAK2, IRAK3, IRAK4, IRF2, IRF4, IRS1, IRS2, ISOC2, ITGA6, ITK, ITPA, ITPR1, ITPR3, JAK1, JAK2, JAK3, JARID2, JAZF1, JMJD1C, JUN, KALRN, KANK3, KAT6A, KAT6B, KCNE1, KCNH2, KCNJ11, KCNJ5, KCNQ1, KCNT2, KDM5A, KDM5B, KDM5C, KDM6A, KDM6B, KDR, KDSR, KEAP1, KEL,
KIAA1109, KIAA1549, KIAA1598, KIDINS220, KIF20B, KIF3A, KIF5B, KIFC3, KIT, KLF4, KLF5, KLF6, KLHL4, KLHL6, KLK2, KLRG1, KMT2A, KMT2B, KMT2C, KMT2D, KNSTRN, KRAS, KRT1, KRTAP1-1, KRTAP15-1, KRTAP19-6, KRTAP5-5, KSR1, KSR2, KTN1, LARS, LASP1, LATS1, LATS2, LCE1B, LCK, LCP1, LDLR, LEF1, LENG9, LEPR, LEPROTL1, LGI4, LHFP, LHPP, LHX9, LIFR, LIG1, LIG3, LIG4, LILRB5, LIMK1, LIMK2, LIN28A, LIN28B, LIN7A, LMNA, LMO1, LMO2, LMOD2, LMTK2, LMTK3, LPP, LPPR1, LPPR2, LPPR3, LPPR4, LPPR5, LRFN5, LRIG3, LRP1B, LRP6, LRRC4C, LRRC55, LRRIQ1, LRRIQ3, LRRK1, LRRK2, LRRTM4, LSM14A, LTBP1, LTBR, LTK, LTV1, LUC7L2, LUM, LUZP2, LYL1, LYN, LZTR1, MACF1, MAD2L2, MADCAM1, MAF, MAFB, MAGEA3, MAGEB18, MAGEB2, MAGEC1, MAGI2, MAK, MALT1, MAML2, MAP1A, MAP1B, MAP2K1, MAP2K2, MAP2K3, MAP2K4, MAP2K5, MAP2K6, MAP2K7, MAP3K1, MAP3K10, MAP3K11, MAP3K12, MAP3K13, MAP3K14, MAP3K2, MAP3K3, MAP3K4, MAP3K5, MAP3K6, MAP3K7, MAP3K8, MAP3K9, MAP4, MAP4K1, MAP4K3, MAP4K4, MAP4K5, MAPK1, MAPK10, MAPK11, MAPK12, MAPK13, MAPK14, MAPK15, MAPK3, MAPK4, MAPK6, MAPK7, MAPK8, MAPK8IP1, MAPK9, MAPKAPK2, MAPKAPK3, MAPKAPK5, 2-Mar, MARCKSL1, MARK1, MARK2, MARK3, MARK4, MAST1, MAST2, MAST3, MAST4, MASTL, MAT2A, MATK, MAX, MBD4, MCL1, MCM7, MCTP1, MDC1, MDM2, MDM4, MDN1, MECOM, MED12, MED13, MED16, MED17, MED20, MEF2A, MEF2B, MEF2C, MEGF6, MELK, MEN1, MERTK, MET, METRNL, METTL14, MGA, MGMT, MGRN1, MICAL1, MINPP1, MITF, MKI67, MKL1, MKNK1, MKNK2, MKRN1, MLF1, MLH1, MLH3, MLKL, MLLT1, MLLT10, MLLT11, MLLT3, MLLT4, MLLT6, MME, MMP2, MMP24, MMP9, MMS19, MN1, MNAT1, MNX1, MOK, MOS, MPG, MPL, MPLKIP, MPND, MPP7, MPRIP, MRAS, MRE11A, MROH2B, MRPS31, MRPS9, MSH2, MSH3, MSH4, MSH5, MSH6, MSI2, MSMB, MSN, MST1, MST1R, MST4, MTCP1, MTF2, MTHFR, MTM1, MTMR1, MTMR10, MTMR11, MTMR12, MTMR2, MTMR3, MTMR4, MTMR6, MTMR7, MTMR8, MTMR9, MTOR, MTRNR2L1, MTRNR2L8, MTUS2, MUC1, MUC2, MUC4, MUC6, MUC7, MUM1L1, MUS81, MUSK, MUTYH, MYB, MYBL1, MYBPC3, MYC, MYCBP2, MYCN, MYD88, MYH11, MYH7, MYH9, MYL10, MYL2, MYL3, MYLK, MYLK2, MYLK3, MYLK4, MYNN, 
MYO1D, MYO3A, MYO3B, MYO5A, MYOD1, MYOZ3, MYT1, NAA15, NAB2, NABP2, NACA, NACC2, NALCN, NAP1L2, NAT2, NAV1, NAV3, NBEA, NBN, NBPF10, NCF1, NCKIPSD, NCOA1, NCOA2, NCOA3, NCOA4, NCOA7, NCOR1, NCOR2, NDRG1, NEB, NEDD4L, NEFH, NEIL 1, NEIL2, NEIL3, NEK1, NEK10, NEK11, NEK2, NEK3, NEK4, NEK5, NEK6, NEK7, NEK8, NEK9, NELFA, NELFB, NF1, NF2, NFATC2, NFE2L2, NFE2L3, NFIB, NFKB1, NFKB2, NFKBIA, NFKBIB, NFKBIE, NFKBIZ, NHEJ1, NIM1, NIN, NIPBL, NKX2-1, NKX3-1, NLK, NLRP2, NLRP3, NLRP5, NLRP6, NM, NMS, NMT2, NOD1, NOMO1, NONO, NOTCH1, NOTCH2, NOTCH2NL, NOTCH3, NOTCH4, NPAS3, NPEPL1, NPEPPS, NPM1, NPR1, NPR2, NQO1, NR, NR1H2, NR4A2, NR4A3, NRAS, NRBP1, NRBP2, NRG1, NRG3, NRK, NSD1, NT5C2, NTHL1, NTM, NTNG1, NTRK1, NTRK2, NTRK3, NUAK1, NUAK2, NUDT1, NUDT10, NUDT11, NUDT14, NUDT3, NUDT4, NUMA1, NUMBL, NUP214, NUP93, NUP98, NUTM1, NUTM2A, NUTM2B, NXPE1, OBSCN, OCRL, OGG1, OLIG2, OMD, OR2L2, OR2W3, OR5L1, OR9G1, OSBPL6, OSR1, OTOL1, OTUB1, OTUD4, OXA1L, OXNAD1, OXR1, P2RY11, P2RY8, P4HB, PABPC1, PABPC3, PABPC4, PABPC5, PACS1, PADI2, PADI4, PAFAH1B2, PAK1, PAK2, PAK3, PAK4, PAK6, PAK7, PALB2, PAN3, PAPD5, PARK2, PARM1, PARP1, PARP2, PARP3, PASK, PATZ 1, PAX3, PAX5, PAX7, PAX8, PBK, PBRM1, PBX1, PCBP1, PCDH11X, PCK1, PCM1, PCMTD1, PCNA, PCSK7, PCSK9, PDCD1, PDCD1LG2, PDE1A, PDE4DIP, PDGFB, PDGFRA, PDGFRB, PDIK1L, PDK1, PDK2, PDK3, PDK4, PDP2, PDPK1, PDS5A, PDS5B, PDXP, PDYN, PEAK1, PEG3, PER1, PES1, PFN2, PGM5, PGP, PGR, PHF 1, PHF 19, PHF6, PHKG1, PHKG2, PHLDA1, PHLDA3, PHLPP2, PHOX2B, PICALM, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PIK3R1, PIK3R2, PIK3R3, PIK3R4, PIM1, PIM2, PIM3, PINK1, PIP5K1A, PJA1, PKD1, PKD2, PKDCC, PKHD1, PKN1, PKN2, PKN3, PKP2, PLAG1, PLAGL1, PLCG1, PLCG2, PLCH2, PLCL1, PLEC, PLEKHS1, PLK1, PLK2, PLK3, PLK4, PMAIP1, PML, PMS1, PMS2, PNCK, PNKP, PNLIPRP3, PNRC1, POLB, POLD1, POLE, POLG, POLH, POLI, POLK, POLL, POLM, POLN, POLQ, POLR2D, POM121L12, POMK, POT1, POTEC, POTEF, POTEG, POU2AF1, POU3F2, POU5F 1, PPA1, PPA2, PPAP2A, PPAP2B, PPAP2C, 
PPAPDC1A, PPAPDC1B, PPAPDC2, PPAPDC3, PPARG, PPEF 1, PPEF2, PPFIA4, PPFIBP1, PPIF, PPM1A, PPM1B, PPM1D, PPM1E, PPM1F, PPM1G, PPM1H, PPM1J, PPM1K, PPM1L, PPM1M, PPM1N, PPP1CA, PPP1CB, PPP1CC, PPP2CA, PPP2CB, PPP2R1A, PPP3CA, PPP3CB, PPP3CC, PPP4C, PPP5C, PPP6C, PPTC7, PRB 1, PRB2, PRB4, PRCC, PRDM1, PRDM16, PRDM2, PRELID2, PREX2, PRF 1, PRG4, PRKAA1, PRKAA2, PRKACA, PRKACB, PRKACG, PRKAG2, PRKAR1A, PRKAR1B, PRKCA, PRKCB, PRKCD, PRKCE, PRKCG, PRKCH, PRKCI, PRKCQ, PRKCZ, PRKD3, PRKDC, PRKG1, PRKG2, PRKX, PRPF19, PRPF4, PRPF8, PRRC2A, PRRX1, PRSS1, PRSS3, PRSS8, PRX, PSEN1, PSG5, PSG6, PSG8, PSIP1, PSKH1, PSKH2, PSMD11, PSME3, PSPH, PTCH1, PTCH2, PTEN, PTH, PTK2, PTK2B, PTK6, PTK7, PTP4A1, PTP4A2, PTP4A3, PTPDC1, PTPLA, PTPMT1, PTPN1, PTPN11, PTPN12, PTPN13, PTPN14, PTPN18, PTPN2, PTPN20A, PTPN21, PTPN22, PTPN23, PTPN3, PTPN4, PTPN5, PTPN6, PTPN7, PTPN9, PTPRA, PTPRB, PTPRC, PTPRD, PTPRE, PTPRF, PTPRG, PTPRH, PTPRJ, PTPRK, PTPRM, PTPRN, PTPRN2, PTPRO, PTPRQ, PTPRR, PTPRS, PTPRT, PTPRU, PTPRZ1, PWP1, PWWP2A, PXK, PXN, PYDC2, QKI, RAB11FIP5, RAB35, RABEP1, RAC1, RAC2, RAD1, RAD17, RAD18, RAD21, RAD23A, RAD23B, RAD50, RAD51, RAD51B, RAD51C, RAD51D, RAD52, RAD54B, RAD54L, RAD9A, RAF1, RAG1, RAI14, RALGAPA1, RALGDS, RANBP17, RANBP2, RANBP3, RANGAP1, RAP1GDS1, RARA, RASA1, R131, RBBP8, RBFOX2, RBM10, RBM11, RBM15, RBMX, RCN1, RDM1, RECQL, RECQL4, RECQL5, REG1A, REG1B, REG3A, REG3G, REL, RELA, RELB, RERE, RERG, RET, REV1, REV3L, RFWD2, RGPD8, RGS18, RHEB, RHOA, RHOB, RHOH, RHOT1, RICTOR, RIF1, RIMS2, RIOK1, RIOK2, RIOK3, RIPK1, RIPK2, RIPK3, RIPK4, RIT1, RMI2, RNASEL, RNF10, RNF111, RNF144A, RNF168, RNF185, RNF213, RNF34, RNF4, RNF43, RNF8, RNGTT, ROBO3, ROCK1, ROCK2, ROR1, ROR2, ROS1, RP11-160N1.10, RP11-181C3.1, RP11-683L23.1, RP11-758M4.1, RPA1, RPA2, RPA3, RPA4, RPGR, RPL10, RPL10L, RPL13A, RPL22, RPL5, RPN1, RPP38, RPS27, RPS6KA1, RPS6KA2, RPS6KA3, RPS6KA4, RPS6KA5, RPS6KA6, RPS6KB1, RPS6KB2, RPS6KC1, RPS6KL1, RPTOR, RQCD1, RRAD, RRAS, RRAS2, RRM1, RRM2B, RSPO2, RSPO3, 
RSRC1, RUNDC3B, RUNX1, RUNX1T1, RUNX2, RXRA, RYBP, RYK, RYR1, RYR2, SACM1L, SAMHD1, SATB2, SAV1, SBDS, SBF1, SBF2, SBK1, SBK2, SBK3, SCN5A, SCYL1, SCYL2, SCYL3, SDC4, SDHA, SDHAF2, SDHB, SDHC, SDHD, SEC23B, SEC31A, SECISBP2, SEMA3C, SEMA3E, SEMG1, SEPT5, SEPT6, SEPT9, SERPINB3, SERPINB4, SET, SETBP1, SETD2, SETDB1, SETDB2, SETMAR, SETX, SF3B1, SFPQ, SFRP1, SGK1, SGK2, SGK223, SGK3, SGK494, SGPP1, SGPP2, SH2B3, SH2D1A, SH3GL1, SH3PXD2A, SHFM1, SHH, SHOC2, SHPRH, SHQ1, SI, SIK1, SIK2, SIK3, SIN3A, SIRT1, SIRT2, SIRT3, SIRT4, SIRT5, SIRT6, SIRT7, SKI, SKP2, SLC12A2, SLC13A1, SLC17A8, SLC1A2, SLC22A13, SLC25A10, SLC25A4, SLC25A5, SLC26A3, SLC34A2, SLC38A4, SLC3A2, SLC45A3, SLC5A7, SLC9B1, SLCO1B1, SLIT2, SLITRK6, SLK, SLX1A, SLX1B, SLX4, SMAD2, SMAD3, SMAD4, SMARCA2, SMARCA4, SMARCAD1, SMARCB1, SMARCD1, SMARCE1, SMC1A, SMC3, SMC4, SMCHD1, SMG1, SMG7, SMO, SMUG1, SMYD4, SNAP91, SNCAIP, SND1, SNRK, SNTG2, SNX29, SNX31, SOCS1, SOS1, SOS2, SOX10, SOX17, SOX2, SOX9, SP2, SPAG16, SPANXN1, SPANXN2, SPATA6, SPECC1, SPEG, SPEN, SPHKAP, SPNS1, SPO11, SPOCK3, SPOP, SPRED1, SPRR2G, SPRTN, SPRY1, SPRY2, SPRY4, SPTA1, SPTAN1, SPTBN1, SQSTM1, SRC, SRCAP, SRCIN1, SRGAP3, SRM, SRPK1, SRPK2, SRPK3, SRRM2, SRSF2, SRSF3, SS18, SS18L1, SSH1, SSH2, SSH3, SSX1, SSX2, SSX2IP, SSX4, STAG1, STAG2, STAG3, STARD6, STAT3, STAT4, STAT5B, STATE, STEAP4, STIL, STIP1, STK10, STK11, STK16, STK17A, STK17B, STK19, STK24, STK25, STK3, STK31, STK32A, STK32B, STK32C, STK33, STK35, STK36, STK38L, STK39, STK40, STRADA, STRADB, STRN, STYK1, STYX, STYXL1, SUFU, SULT1A1, SULT1B1, SUPT4H1, SUPT5H, SUZ12, SV2C, SVIL, SWI5, SYK, SYNE1, SYNJ1, SYNJ2, SYT4, TAB 1, TACC1, TADA1, TADA2B, TAF1, TAF15, TAF1A, TAF1L, TAL1, TANC2, TAOK1, TAOK2, TAOK3, TAS2R10, TAS2R13, TAS2R14, TAS2R43, TAS2R60, TBC1D2B, TBC1D31, TBCK, TBK1, TBL1XR1, TBP, TBX15, TBX22, TBX3, TCEA1, TCF12, TCF3, TCF4, TCF7, TCF7L2, TCL1A, TDG, TDP1, TDP2, TEC, TECRL, TEK, TENC1, TENM3, TERT, TESK1, TESK2, TET1, TET2, TEX13A, TEX14, TFDP1, TFE3, TFEB, TFG, 
TFPT, TFRC, TGFBR1, TGFBR2, TGIF1, TGIF2LX, TGOLN2, THADA, THEM5, THEMIS, THRAP3, TICAM1, TIE1, TIMM50, TJP2, TLK1, TLK2, TLR4, TLX1, TLX3, TMCO5A, TMED4, TMEM101, TMEM127, TMEM43, TMPRSS2, TMTC1, TNC, TNFAIP3, TNFRSF10C, TNFRSF11A, TNFRSF13B, TNFRSF14, TNFRSF17, TNIK, TNK1, TNK2, TNKS, TNKS1BP1, TNKS2, TNNI3, TNNI3K, TNNT2, TNPO1, TNS1, TNS3, TOB2, TOM1, TOP1, TOP2A, TOP3A, TOPBP1, TP53, TP53BP1, TP53RK, TP53TG3D, TP63, TPM1, TPM3, TPM4, TPMT, TPR, TPSAB1, TPSB2, TPST1, TPTE, TPTE2, TRADD, TRAF2, TRAF3, TRAF7, TRAT1, TRDN, TREX1, TREX2, TRIM24, TRIM27, TRIM28, TRIM33, TRIM58, TRIM7, TRIML2, TRIO, TRIP11, TRMT10C, TRPM1, TRPM3, TRPM4, TRPM6, TRPM7, TRPV4, TRRAP, TSC1, TSC2, TSHR, TSHZ2, TSHZ3, TSPAN19, TSSK1B, TSSK2, TSSK3, TSSK4, TSSK6, TTBK1, TTBK2, TTK, TTL, TTN, TUBA1A, TUSC3, TWF1, TWF2, TXK, TXNIP, TYK2, TYMS, TYRO3, U2AF1, UBALD1, UBE2A, UBE2B, UBE2N, UBE2NL, UBE2V2, UBE2Z, UBE4A, UBLCP1, UBR5, UBXN11, UGT1A1, UGT1A7, UGT2A3, UGT2B28, UHMK1, UHRF1BP1L, ULK1, ULK2, ULK3, ULK4, UNG, UQCRFS1, USP2, USP28, USP29, USP6, USP7, USP9X, UTP14A, UTY, UVSSA, VAT1L, VCPIP1, VCX2, VEGFA, VEGFC, VEZF1, VEZT, VHL, VKORC1, VRK1, VRK2, VRK3, VTCN1, VTI1A, WAPAL, WAS, WBSCR17, WDR49, WDR52, WDR74, WEE1, WEE2, WHSC1, WHSC1L1, WIF1, WISP3, WNK1, WNK2, WNK3, WNK4, WNT2, WRN, WT1, WWTR1, XAB2,XBP1, XIAP, XPA, XPC, XPO1, XPOT, XRCC1, XRCC2, XRCC3, XRCC4, XRCC5, XRCC6, YAP1, YARS, YES1, YME1L1, YPEL5, YWHAE, ZAP70, ZBBX, ZBTB16, ZBTB2, ZBTB7B, ZCCHC3, ZCCHC8, ZDHHC14, ZDHHC16, ZEB2, ZFHX3, ZFP36L1, ZFP36L2, ZFP41, ZIC4, ZMAT4, ZMYM2, ZMYM3, ZMYM4, ZMYND8, ZNF100, ZNF132, ZNF208, ZNF217, ZNF268, ZNF28, ZNF300, ZNF324, ZNF331, ZNF384, ZNF429, ZNF444, ZNF451, ZNF488, ZNF492, ZNF493, ZNF521, ZNF567, ZNF598, ZNF668, ZNF676, ZNF703, ZNF705G, ZNF708, ZNF716, ZNF717, ZNF727, ZNF750, ZNF799, ZNF80, ZNF804A, ZNF804B, ZNF812, ZNF814, ZNF844, ZNF91, ZNF98, ZNF99, ZNRF3, ZPBP, ZRSR2, ZSWIM2, MYCL, MYCL, MLK4, MLK4, ZAK, FRG1B, FRG1B, TRBV5-4.
The biomarkers may be selected from one or more intron sources including: ALK, BRAF, BRD3, BRD4, EGFR, ERG, ETV1, ETV4, ETV5, EWSR1, FGFR1, FGFR2, FGFR3, MET, NOTCH1, NRG1, NTRK1, NTRK2, NTRK3, NUTM1, PDGFRA, PDGFRB, PRKCA, PRKCB, RAF1, RET, ROS1, TMPRSS2.
The biomarkers may be selected from one or more promoters including: AC099552.4, ADAMTS10, AGBL4, ANKRD30BL, ANKRD53, AP003733.1, AP2A1, ARHGEF18, ARHGEF35, BCL2, BCL2L11, C16orf59, C4orf27, CABLES2, CACNA1C, CBWD1, CCDC107, CDC20, CDH18, CHMP3, COL11A1, CYLD, CYP4F2, DIO2, DLG2, DNAJA2, EZH2, FAM129C, FAM21A, FCGR3B, GALNT13, GOLGA2, GPR89A, GTF2I, GTF3C5, HCN1, HERC2, HKR1, IGFBP7, INSR, ISOC2, ITPR1, KALRN, KLRG1, LENG9, LEPROTL1, LTV1, LUC7L2, MAGEA3, MASTL, MED16, MEF2C, MGRN1, MPND, MRPS9, MTRNR2L1, MTRNR2L8, MYNN, MYOZ3, NALCN, NCOA7, NEK11, NFKBIE, NPAS3, NPEPPS, NXPE1, OR2L2, OR2W3, OR9G1, OXNAD1, PACS1, PADI4, PAPD5, PFN2, PLEKHS1, POLR2D, POU5F1B, PPAPDC1A, PRSS1, RAI14, RGPD8, RNF185, RNF34, RPL13A, RPS27, SECISBP2, SLC12A2, SMG1, SMUG1, SNTG2, SP2, STAG3, STAG3L5P-PVRIG2P-PILRB, TBC1D2B, TBC1D31, TCF3, TCL1A, TERT, TNK2, TPM3, TPSAB1, TPSB2, TPTE, TRBV5-4, TRMT10C, TRPM4, TRPV4, VCPIP1, WDR74, ZDHHC16, ZNF324, ZNF488, ZNF708, ZNF716, ZNF717, ZNF727, ZNF799.
The biomarkers may be selected from the microsatellite instability (MSI) source including ADGRG6, ALG10B, BAT25, BAT26, BCL11B, BCL2, BCL6, BCL7A, C1orf159, CALM1, CTNNA2, D17S250, D2S123, D5S346, DHX16, DLX4, DRD5, EEF1A1, FGF7, FLI1, FSCN3, GNAS, GP6, HPCAL4, INPP4B, LRRC4C, MAP2K2, MAT2A, METRNL, NR21, NR22, NR27, PES1, PLCL1, PRELID2, RCN1, TBC1D31, TENM3, TOB2, TP53TG3D, XBP1, ZFP41, ZNF208.
The biomarkers may be selected from viral genomes that are known to be involved in cancer, including Human Papillomavirus (HPV), Herpes Simplex Virus (HSV), Epstein-Barr Virus (EBV), Hepatitis B Virus (HBV), Hepatitis C Virus (HCV), Human T-lymphotropic Virus 1 (HTLV-1), and Human Herpesvirus-8 (HHV8). A genetic variant or alteration may be a single nucleotide variant, an indel, a transversion, a translocation, an inversion, a deletion, a chromosomal structure alteration, a gene fusion, a chromosome fusion, a gene truncation, a gene amplification, a gene duplication, or a chromosomal lesion.
In another aspect, the present disclosure provides a computer-implemented method for providing a subject displaying cancer with a therapy. Biologic data may be received for a subject. The biologic data may be generated from one or more biological samples of the subject. The biologic data can be used to generate a first list of therapies according to a molecular profile of the subject. The molecular profile may be indicative of one or more genomic aberrations in one or more biological samples. A second list of therapies may be generated from the first list of therapies using medical history data of the subject. Each list of therapies may comprise clinical trial(s) and/or standard of care. The second list of therapies may be presented to a subject on a user interface. The second list of therapies can be presented to a clinician to select a recommended therapy. The subject may also receive a request for enrollment in a given therapy from the second list of therapies.
During acquisition of biologic data, the biologic data may be generated from one or more biological samples of the subject. The biologic data may be generated from one or more biological samples of the subject without any pipetting by a user during preparation of one or more biological samples. Alternatively, the biologic data may be generated from one or more biological samples of the subject with pipetting by a user during preparation of one or more biological samples. The biologic data may comprise data generated from one or more biological samples selected from the group consisting of protein, peptides, cell-free nucleic acids, ribonucleic acids, deoxyribonucleic acids, and any combination thereof. The biologic data may comprise a molecular profile that is indicative of one or more genomic aberrations in one or more biological samples. One or more genomic aberrations can include nucleic acid mutations and/or differentially expressed proteins. Nucleic acid mutations may be selected from the group consisting of nucleotide insertion(s), nucleotide deletion(s), nucleotide substitution(s), amino acid insertion(s), amino acid deletion(s), amino acid substitution(s), gene fusion(s), copy-number variation(s), and genes or variants selected from Table 1.
A panel of molecular assays may be used for DNA, RNA, and protein analysis. The tumor tissue DNA assay may be a highly sensitive, next generation sequencing (NGS)-based somatic mutation detection assay across at least about 100, at least about 500, at least about 1000, at least about 1500, at least about 2000, at least about 2500, at least about 3000, or at least about 4000 genes, or at least about 20, at least about 30, at least about 40, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 150, at least about 200, at least about 250, or at least about 300 introns. The tumor tissue DNA assay may meet the analytical standards for Medicare coverage. The circulating tumor DNA (ctDNA) assay may be a non-invasive liquid biopsy of circulating tumor DNA. Additionally, NGS-based mutation detection may be obtained for at least about 100, at least about 200, at least about 300, at least about 400, at least about 500, at least about 600, at least about 700, at least about 800, at least about 900, at least about 1000, at least about 1500, or at least about 2000 genes. The tumor RNA-sequencing assay may be NGS-based, whole transcriptome sequencing. The tumor IHC assay may be immunohistochemical testing of key oncology proteins and immune-oncology markers.
The biologic data can be used to generate a first list of therapies according to a molecular profile of the subject. Alternatively, the subject's medical history data and biologic data may be used concurrently to generate the first list of therapies. Generating a first list of therapies may comprise querying one or more databases for one or more targeted therapies according to a predetermined gene or genomic region. Matches with therapies according to molecular requirements may be grouped based on matching specificity to the subject's molecular profile. For example, therapies that match for a specific point mutation can be grouped in a separate category from therapies that match for any mutation of a gene. Therapy databases can comprise public repositories or trials obtained from specific affiliations. Public repositories can include a database selected from the group consisting of ClinicalTrials.gov, National Institutes of Health, ResearchMatch, and national registries, such as the breast cancer family registry and the colon cancer family registry. Trials obtained from a specific affiliation can comprise knowledge of trials that are not accessible in a public repository and can be obtained from an affiliated institution.
The first list of therapies may exclude therapies that target genomic aberrations absent in one or more biological samples. Generating a first list of therapies can also comprise removing therapies that target genomic aberrations absent in one or more biological samples. Generating a first list of therapies (e.g., clinical trials) can also comprise sorting the therapies into two categories. The two categories may include therapies that target the subject's mutation and therapies that do not specify a molecular target. Matches of the therapies according to molecular requirements may be determined based on matching specificity to the subject. For example, therapies that match for a specific point mutation can be differentiated from therapies that match for any mutation of a gene. The therapies may be matched to a subject according to labels identifying the profile of the subject. The labels may be questions targeted to understanding the subject's molecular and medical history and status. Labels can be generated according to a topic selected from the subject's genomic and biomarker profile, diagnosis status, prior therapies conducted on the subject, outcomes of prior therapies conducted on the subject, and other comorbidities.
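The grouping by matching specificity described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory representation in which each therapy optionally names a target gene and a specific variant; the class, field, and trial names below are illustrative only and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Therapy:
    name: str
    gene: Optional[str] = None     # None: the therapy specifies no molecular target
    variant: Optional[str] = None  # None: any mutation of `gene` matches


def first_list(therapies, profile):
    """Group therapies by how specifically they match the subject's profile.

    `profile` maps a gene symbol to the set of variants observed in the
    subject's samples. Therapies that target an aberration absent from the
    profile are removed.
    """
    groups = {"variant_match": [], "gene_match": [], "no_target": []}
    for t in therapies:
        if t.gene is None:
            groups["no_target"].append(t.name)
        elif t.gene in profile:
            if t.variant is None:
                groups["gene_match"].append(t.name)
            elif t.variant in profile[t.gene]:
                groups["variant_match"].append(t.name)
            # gene is mutated but not with the required variant: removed
        # gene is not mutated in the subject: removed
    return groups


# Illustrative subject profile and therapy records (hypothetical data).
example_profile = {"BRAF": {"V600E"}}
example_therapies = [
    Therapy("trial_A", "BRAF", "V600E"),  # matches a specific point mutation
    Therapy("trial_B", "BRAF"),           # matches any BRAF mutation
    Therapy("trial_C", "EGFR", "L858R"),  # targets an aberration the subject lacks
    Therapy("trial_D"),                   # no molecular target specified
]
example_groups = first_list(example_therapies, example_profile)
```

In this sketch the point-mutation matches and the gene-level matches are kept in separate groups, and therapies without a molecular target form their own category, mirroring the two-category sort described above.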
The first list of therapies may additionally be filtered according to phases of the therapy. For example, phases of a therapy may be phases of a clinical trial. Clinical trials can comprise five phases: phase 0, phase 1, phase 2, phase 3, and phase 4. Phase 0 may comprise human microdosing studies. Data from phase 0 can accelerate the development of promising drugs or imaging agents by determining early on whether a drug or agent behaves in human subjects as was expected from pre-clinical studies. Phase 1 may comprise the first-in-man studies and can be the first stage to test the drug in human subjects. In phase 1, the maximum dosage of a drug that can be administered to a subject before adverse effects become dangerous or intolerable can be determined. This group of clinical trials may be operated by contract research organizations (CROs). During phase 2, the drug can be tested for biological activity or effect. A group of at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, or at least about 400 subjects can be enrolled during the phase 2 studies. During phase 3, the effectiveness of the new drug may be determined and the value of the new intervention can be assessed. A group of at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 500, at least about 1000, at least about 2000, or at least about 3000 subjects can be enrolled during the phase 3 studies. Phase 4 trials may comprise safety surveillance and ongoing technical support of a drug after it has been approved for sale.
A second list of therapies may be generated from a first list of therapies using medical history data of the subject. Alternatively, the subject's medical history data and biologic data may be used concurrently to generate the first list of therapies. The second list of therapies may be the same as the first list of therapies. Medical history data for a subject may be received and processed according to
The medical history data may also be labeled according to relevant medical text segments. The medical history data may be processed into the label name, the label category, and the label value. The label name indicates a question identifying one or more relevant portions of the medical history data. The label category may be a grouping and/or classification of one or more label names. The label value may be an answer to the label name. The label value may be selected from the group consisting of yes, maybe, and no. The label value may correspond to the group consisting of yes, maybe, and no. A medical text segment may be a word or phrase in a medical record that can be used to confirm an eligibility requirement for a clinical trial. There can be an abundance of text in medical records, but only a small subset of it is relevant to determining the eligibility of a subject for a trial. The medical text segment may comprise a proprietary set of topics. Labeling can comprise extracting from the first list of therapies a second list of therapies. The labels can comprise questions targeted to understanding the subject's profile, prior therapy history, and outcomes from prior therapies. Labeling can be accomplished manually or automatically. Manual labeling can involve a lengthy review of patient records and trial criteria descriptions. A machine learning model can detect and label the relevant medical text segments. Different weights may be assigned to different subject parameters depending on the particular medical condition being treated and on the particular patient being treated. Machine learning prediction can be used to generate vectors to calculate similarity and to generate a set of scores for matching between the subject's clinical trial eligibility and the medical records.
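As a rough illustration of the label structure and the vector-based similarity described above, the following sketch stores each label as a name, category, and value, and compares a subject with a therapy by cosine similarity. The numeric encoding of yes, maybe, and no, the choice of cosine similarity, and the example questions are assumptions made for illustration; the disclosure does not fix a particular encoding or metric.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    name: str      # a question about the subject, e.g. a curated eligibility topic
    category: str  # grouping of related label names
    value: str     # "yes", "maybe", or "no"


# Assumed numeric encoding of label values (not specified in the source).
ENCODING = {"yes": 1.0, "maybe": 0.0, "no": -1.0}


def to_vector(labels, label_names):
    """Encode labels as a vector ordered by `label_names`; missing labels count as 'maybe'."""
    by_name = {label.name: label.value for label in labels}
    return [ENCODING[by_name.get(name, "maybe")] for name in label_names]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical label names and values.
NAMES = [
    "Does the patient have brain metastases?",
    "Has the patient received prior chemotherapy?",
]
subject = [Label(NAMES[0], "diagnosis status", "no"),
           Label(NAMES[1], "prior therapies", "yes")]
trial_1 = [Label(NAMES[0], "diagnosis status", "no"),
           Label(NAMES[1], "prior therapies", "yes")]
trial_2 = [Label(NAMES[0], "diagnosis status", "yes"),
           Label(NAMES[1], "prior therapies", "yes")]

score_1 = cosine(to_vector(subject, NAMES), to_vector(trial_1, NAMES))
score_2 = cosine(to_vector(subject, NAMES), to_vector(trial_2, NAMES))
```

Here the subject agrees with every requirement of the first hypothetical trial and disagrees on one requirement of the second, so the first score is higher. Per-label weights, as mentioned above, could be introduced by scaling vector components before the comparison.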
The subject's clinical trial eligibility that is pre-filtered by the subject's molecular profile may be combined with the subject's medical records and input into a natural language processor (NLP). State-of-the-art NLP and information extraction (IE) techniques may be customized and implemented to build the automated eligibility screening (ES) architecture. Eligibility criteria can include a demographics filter such as a filter for age, race, geographic data, physical data, financial data, and gender. A trial enrollment window may also be used to expedite a pre-filtering process. For example, if a subject did not have clinical data within a start date and closing date of an enrollment window of at least, the subject may be removed from participating in a specific clinical trial. Text and medical terms processing can utilize advanced NLP methods to extract medically relevant information from the patient medical history records. During NLP extraction, an algorithm may be generated to first extract medical information using acronyms and keywords from an extraction system. The extraction system may be a custom designed extraction system. The extraction system may be the Apache clinical Text Analysis and Knowledge Extraction System (cTAKES). Extraction systems, such as cTAKES, can assign medical terms to the identified text strings from controlled terminologies such as Concept Unique Identifiers (CUI) from the Unified Medical Language System (UMLS), standardized nomenclature for clinical drugs (RxNorm), and Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) codes. This process can also be utilized for identifying medical terms and texts from the diagnosis strings. Additionally, codes from the international classification of diseases, such as ICD-9 codes, can be mapped to SNOMED-CT terms using the UMLS ICD-9 to SNOMED-CT dictionary. A negation detector can also be utilized to determine negations. The negation detector may be based on the NegEx algorithm.
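To make the role of the negation detector concrete, the following is a deliberately simplified, NegEx-inspired sketch. It is not the NegEx algorithm itself and does not use the cTAKES API; the trigger phrases are a small illustrative subset and the five-token scope is an assumed parameter.

```python
import re

# Illustrative subset of negation triggers (assumption, not a full NegEx lexicon).
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")
WINDOW = 5  # assumed number of tokens after a trigger that fall in its scope


def _tokens(text):
    """Lowercase alphanumeric tokens of `text`."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence, term):
    """Return True if `term` occurs within WINDOW tokens after a negation trigger."""
    tokens = _tokens(sentence)
    term_tokens = _tokens(term)
    if not term_tokens:
        return False
    n = len(term_tokens)
    for trigger in NEGATION_TRIGGERS:
        trig = trigger.split()
        k = len(trig)
        for i in range(len(tokens) - k + 1):
            if tokens[i:i + k] != trig:
                continue
            scope = tokens[i + k:i + k + WINDOW]
            for j in range(len(scope) - n + 1):
                if scope[j:j + n] == term_tokens:
                    return True
    return False
```

For example, `is_negated("MRI negative for brain metastases", "brain metastases")` returns `True`, so the extracted term would be recorded in its negated form, while `is_negated("History of brain metastases", "brain metastases")` returns `False`.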
Identified medical terms and texts can be stored as a bucket of words in a subject vector. Such an inclusion/exclusion technique can be derived from medical terms and text processing to pull term-level patterns. All terms pulled from the exclusion criteria can be transformed into the negated format. The medical terms and texts extracted from a subject's Electronic Health Record (EHR) can be stored in a vector that is a representation of the subject's profile. A Bayesian network may be used to infer the marginal probability of label values given other labels' values observed in a subject's medical records as well as from aggregated population data. Bayesian networks may be used to infer medical history that is not explicitly found in the subject's medical records. Bayesian networks may be used to infer labels or label values that are not found in the medical text, using relationships between labels that are found in the text and/or informed by population-level data. Alternatively, statistical learning algorithms may be used to infer aspects of the medical history not available in the text based on population data.
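The inference of an unrecorded label value can be sketched with the smallest possible Bayesian network: one hidden label and one observed label. All probabilities and both thresholds below are hypothetical placeholders standing in for population-level estimates; a practical system would use a larger network and learned parameters.

```python
def posterior_yes(prior_yes, p_obs_given_yes, p_obs_given_no):
    """P(hidden label = yes | observed label) by Bayes' rule for a two-node network.

    prior_yes        population-level P(hidden = yes)
    p_obs_given_yes  P(observation | hidden = yes)
    p_obs_given_no   P(observation | hidden = no)
    """
    numerator = p_obs_given_yes * prior_yes
    evidence = numerator + p_obs_given_no * (1.0 - prior_yes)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under the model")
    return numerator / evidence


def to_label_value(p_yes, upper=0.8, lower=0.2):
    """Map a probability to yes / maybe / no using assumed thresholds."""
    if p_yes >= upper:
        return "yes"
    if p_yes <= lower:
        return "no"
    return "maybe"


# Hypothetical numbers: 20% population prior, and an observation that is
# nine times more likely when the hidden label is "yes".
p = posterior_yes(prior_yes=0.2, p_obs_given_yes=0.9, p_obs_given_no=0.1)
inferred_value = to_label_value(p)
```

With these placeholder numbers the posterior is 0.18 / 0.26, roughly 0.69, which falls between the assumed thresholds and is therefore reported as "maybe" rather than asserted as a definite answer.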
Generation of the first or second list of therapies can also comprise determining ineligible therapies according to a categorical score and rejecting ineligible therapies from remaining therapies to generate a filtered list of remaining therapies. The categorical score can be selected from the group consisting of yes, maybe, and no. The categorical score may correspond to the group consisting of yes, maybe, and no. Boolean logic may be used to calculate whether any given label's value as assessed for a subject by the system is a mismatch with the expected label values in the criteria crucial to therapy enrollment. If a subject's value for a given label is mismatched with the expected value for that label, as expressed in the criteria for a therapy, then the subject may be ineligible for the therapy. The therapies may be grouped using a similarity score between the subject and all the therapies based on the labels. One similarity metric used can be finding an empirical significance threshold and determining positive therapies by a specific criterion and then assessing overlap among positive therapies in a standard manner. Conversely, a dissimilarity measure can be a numerical measure of the degree to which two objects are different. The therapies that fall below a minimum similarity score for criteria crucial to therapy enrollment can be ineligible. The list of remaining therapies may then be compared and reviewed. The review may generate a first list or second list of therapies.
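The Boolean mismatch check described above might look like the following sketch. Treating "maybe" (or a missing label) as unresolved rather than as a mismatch is an assumption made here so that uncertain cases are left for later review; the label questions and trial names are hypothetical.

```python
def is_mismatch(subject_value, required_value):
    """True only when both values are definite ("yes"/"no") and they differ."""
    if "maybe" in (subject_value, required_value):
        return False  # assumption: unresolved values do not reject a therapy
    return subject_value != required_value


def screen(subject_labels, therapies):
    """Return the names of therapies with no definite mismatch on any criterion.

    `subject_labels` maps label name -> value; `therapies` maps therapy name ->
    {label name: required value}. Labels missing for the subject count as "maybe".
    """
    remaining = []
    for name, criteria in therapies.items():
        rejected = any(
            is_mismatch(subject_labels.get(label, "maybe"), required)
            for label, required in criteria.items()
        )
        if not rejected:
            remaining.append(name)
    return remaining


# Hypothetical subject labels and therapy criteria.
example_subject = {
    "Does the patient have brain metastases?": "no",
    "Has the patient received prior chemotherapy?": "yes",
}
example_criteria = {
    "trial_1": {"Does the patient have brain metastases?": "no"},
    "trial_2": {"Does the patient have brain metastases?": "yes"},
    "trial_3": {"Has the patient received prior chemotherapy?": "yes",
                "Is the patient pregnant?": "no"},
}
remaining_therapies = screen(example_subject, example_criteria)
```

In this example the second trial is rejected because of a definite mismatch, while the third remains because the unanswered label is treated as "maybe" and deferred to the manual review step.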
The first list or second list of therapies may be passed to a user to manually verify eligibility using links to information from the medical history data and the biologic data for the subject. The user may be a healthcare professional or a primary care provider of the subject. The list may further be filtered according to therapy filtering preferences, which can be selected from the group consisting of availability at a specific institution, availability at a set of institutions, type of treatment, phase of clinical trial, method of drug delivery, location and distance of a given therapy from a specified location, duration of treatment, and patient relocation therapy duration. The types of treatment may be selected from the group consisting of immunotherapy, targeted therapy, chemotherapy, radiation therapy, hormone therapy, stem cell transplant, precision medicine, and surgery. Methods of drug delivery can comprise non-invasive peroral, topical, transmucosal, and inhalation routes. Transmucosal routes can comprise nasal, buccal/sublingual, vaginal, ocular, and rectal routes. Filtering can further comprise an evaluation by a healthcare professional and a selection of a recommended therapy. A group of at most 10, 15, 20, 25, 30, 35, 40, 45, or 50 therapies may be presented to a clinician to select a recommended therapy. The therapies may then be passed for final authorization to a medically qualified staff member, who may review the therapies based on the proprietary labels and, using expert knowledge, rule out groups of labels that are less successful for the subject. The subject may access a link to the matched therapies on their profile webpage on the user interface. The subject may receive an email with a link to the matched therapies. The matched therapies may be displayed on a user interface. The user interface may display the status of the acquisition of medical history data and biologic data.
The user interface may display matched therapies organized according to categories such as chemotherapies, targeted therapies, immunotherapies, and radiotherapies.
A subject may then receive a request for enrollment in a therapy through a user interface. A selection from the subject may be received as to one or more therapies. A request for enrollment may be received from the subject in a therapy selected from the therapies through the user interface. Any therapy can be added to a subject profile for a subject. A caregiver may view all profiled therapies of the subject. If desired, a new clinical trial can be profiled. The name of a new clinical trial can be entered into the subject's therapy system. As part of the subject's profile, the subject may select for a crowd funding option to aid in the cost of his or her cancer therapy. The crowd funding option may connect the subject to links such as YouCaring.com, FundRazr, GoFundMe, GiveForward and Indiegogo.
In another aspect, the present disclosure provides a computer-implemented method for qualifying a subject for a clinical trial
In another aspect, the present disclosure provides a method for qualifying a subject for a subset of therapies. The medical history data and biologic data may be received for the subject. The biologic data may be generated from one or more biological samples of the subject. The medical history data and the biologic data may be analyzed to yield a genomic-based medical history analysis for the subject. The genomic-based medical history analysis may be used to query one or more databases of therapies for the subject and to generate the subset of therapies for which the subject qualifies. Then, the subset of therapies can be presented on a user interface on an electronic device of a user.
During therapy curation 1002, an abundance of therapy criteria may be condensed using a set of labels as identifiers of relevant portions of the therapy data. For example, trial 1 may require the subject to have no lesions in the brain, trial 2 may require the subject to be free of central nervous system involvement, and trial 3 may require the subject to have no leptomeningeal disease. The label for these three requirements may be identified as “Does the patient have brain metastases?” and the required answer would be “No” if the subject is to qualify for the three therapies. The required answer may be obtained by reviewing the subject's biologic data and medical history data.
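The condensation of differently worded criteria into one label can be sketched as a lookup from criterion text to a label name and required answer. The lookup table below is a hypothetical, hand-written stand-in for the curation step; in practice the mapping could be produced manually or by a trained model.

```python
BRAIN_METS = "Does the patient have brain metastases?"

# Hypothetical curation table: criterion text -> (label name, required answer).
CURATION_TABLE = {
    "no lesions in the brain": (BRAIN_METS, "no"),
    "free of central nervous system involvement": (BRAIN_METS, "no"),
    "no leptomeningeal disease": (BRAIN_METS, "no"),
}


def curate(trial_criteria):
    """Map each trial's free-text criteria to {label name: required answer}.

    Criteria with no entry in the curation table are skipped in this sketch.
    """
    curated = {}
    for trial, criteria in trial_criteria.items():
        labels = {}
        for text in criteria:
            key = text.strip().lower()
            if key in CURATION_TABLE:
                label_name, required = CURATION_TABLE[key]
                labels[label_name] = required
        curated[trial] = labels
    return curated


curated_trials = curate({
    "trial 1": ["No lesions in the brain"],
    "trial 2": ["Free of central nervous system involvement"],
    "trial 3": ["No leptomeningeal disease"],
})
```

All three trials end up with the same single label and the same required answer, so one answer derived from the subject's records can be checked against all of them at once.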
In the treatment matching 1200 of
A software based laboratory and management system may be utilized. The system may be a laboratory information management system (LIMS). The LIMS may comprise features that support a modern laboratory's operations.
The biologic data from the one or more biological samples of the subject may be automatically generated without any involvement of the user. The biological data may be used for cloud based clinical trial matching, clinical trial enrollment, treatment matching, records acquisition, and drug development. One or more clinical trials within the generated set of clinical trials may be prioritized. The prioritizing may be based on one or more factors selected from the group consisting of: geographic location of the clinical trial, regulatory approval status, annotated medical history data for the subject, or a combination thereof.
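The prioritization step can be pictured as a sort over the generated set of trials. The disclosure lists the factors but not how they are combined, so the ordering used here (approved status first, then distance) and the record fields `approved` and `distance_km` are assumptions chosen purely for illustration.

```python
def prioritize(trials):
    """Order trials with approved status first, then by increasing distance.

    Each trial is a dict with hypothetical keys:
      "id"           trial identifier
      "approved"     True if the trial has the relevant regulatory approval
      "distance_km"  distance from the subject's specified location
    """
    return sorted(trials, key=lambda t: (not t["approved"], t["distance_km"]))


# Hypothetical trial records.
example_trials = [
    {"id": "T1", "approved": False, "distance_km": 120.0},
    {"id": "T2", "approved": False, "distance_km": 15.0},
    {"id": "T3", "approved": True, "distance_km": 300.0},
]
prioritized_ids = [t["id"] for t in prioritize(example_trials)]
```

A different weighting, for instance one that also scores the subject's annotated medical history, would only require changing the sort key.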
In another aspect, the subject may qualify for one or more therapies. The method may include receiving a first nucleic acid sample from a tumor tissue sample of the subject and a second nucleic acid sample from a normal tissue sample of the subject. The first nucleic acid sample and second nucleic acid sample may be obtained from the tumor tissue sample and the normal tissue sample automatically without any involvement from a user. Next, the first nucleic acid sample and second nucleic acid sample may be assayed to identify one or more genomic alterations in the tumor tissue sample relative to the normal tissue sample to generate a set of genomic data for the subject. The databases may be queried for one or more therapies (e.g. clinical trials) corresponding to a medical history of the subject and the genomic data to generate a set of therapies. The therapy may comprise at least one therapy that has a predicted likelihood of success that is at least about 90%. A set of therapies and standard treatment options, such as treatment options based on National Comprehensive Cancer Network (NCCN) guidelines, may be presented on a user interface for display to a user.
In preparation for a therapy, subjects may be recruited. Several factors may be considered in qualifying a subject for a therapy or enrolling a subject in a therapy. Factors considered may include geographical feasibility or location, population research, optimal recruiting site selection, site assessment, recruitment materials, media support, media management, site training materials, study website, patient referral follow-up, translations, community outreach, physician outreach, site support, and monitoring and reporting for assessment of patient recruiting activities. For subjects participating in global clinical studies, patient retention services may be a factor. The subject retention services can include visit reminders, patient support items, and care giver support.
During enrollment of a subject into therapies, the database may be queried for one or more therapies corresponding to a medical history of the subject and genomic data to generate a set of therapies. Eligibility criteria can be another decisive factor for clinical trial enrollment. Eligibility criteria may comprise age, gender, medical history, and current health status. For example, subjects may need to have a particular type and stage of cancer to participate in a particular trial. The subject may comprise one or more of an individual, a group of individuals, medical professional providers including clinicians, physicians, dentists, nurse practitioners, radiologists, anesthesiologists, psychologists, pharmacists, psychiatrists, dental hygienists, nurses, chiropractors, physical therapists, occupational therapists, speech pathologists, nutritionists, orthodontists, laboratory personnel, medical coders, diagnostic center personnel, emergency/ambulatory medical personnel, a hospital, a health care providing organization, an HMO, an insurance provider, a government agency, a financial institution, or a business entity (e.g., insurance company, employer, pharmaceutical company, academic institution, non-governmental organization, Medicare/Medicaid, or community health care provider).
The subject enrolled in the therapy may be monitored by assaying one or more biological samples from the subject. The assaying may be directed to at least about 50 genes, 100 genes, 200 genes, 300 genes, 400 genes, 500 genes, 1000 genes, 1500 genes, 2000 genes, or 2500 genes selected from Table 1. The likelihood of success for the subject may be predicted. One or more therapies may be annotated. Querying of one or more databases has a predicted likelihood of matching to a therapy of at least about 70%, 75%, 80%, 85%, 90%, or 95%.
Medical history may be retrieved for the subject. The medical history data may be automatically annotated in standardized terminology. The standardized terminology may be Unified Medical Language System. The medical history data may be inputted into the records acquisition and processing system and a resultant annotated medical history may be attained. The medical history may be stored as editable files or non-editable files. Editable files may comprise one or more of medical history, nutrition, habits, exercise regimen, medication, race, height, weight, demographics, event log, allergies, testing results, diagnostics, electronic living will, DNA profile, DNA samples or markers, blood pressure ranges, blood sugar levels, mental health information, cancer treatment history, response to treatment, surgical interventions, history of present illness, review of organ systems, family and childhood diseases, regular and acute medications, sexual history, obstetric/gynecological history, health care encounters including diagnoses and/or procedures, or personal information such as contact information, address, work and occupation information, health savings account information, bank account information, and authorized associate account information. Non-editable files can include but are not limited to a DNA profile, medication history, lab reports/results, digital images, binary attachment files, research data or a combination thereof. The file may be an immunohistochemistry report. The report may be a supplemental research report. The supplemental research report may comprise publications found based on genetic data. The medical history may also involve assessment of the cardiovascular system, respiratory system, gastrointestinal system, genitourinary system, nervous system, cranial nerves symptoms, endocrine system, musculoskeletal system, and the skin.
The medical history may be a personal health record. A personal health record can comprise content files. Examples of content files comprise past patient medical history, including treatment, illnesses, family history, past and current medications, and other content information, such as medical history. Other examples include X-rays, CT scans, MRI scans, blood screens/test results, medical treatment information, medical conditions (e.g., current, past, pre-existing), allergies to medications, current medications or any other results, laboratory results/reports, digital images, binary attachments (e.g., PDF files), research data, DNA profile or genome information, tests, screens, and scans. The medical history content can be regularly updated. A request for enrollment may be received over a network comprising one or more of an internet connection, a web browser, a portable communication device, a computer, a television, a telephone, ATM, network appliance or router. The user interface may be a web-based user interface.
Certain therapies may be prioritized within a generated set of clinical trials. Factors that affect the priority choice may include geographic location, regulatory approval status, and annotated medical history data.
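One way such prioritization might look in practice is sketched below. It is illustrative only: the `Trial` fields, the choice to rank approval status ahead of distance, and the use of great-circle distance as the measure of geographic location are all assumptions, and annotated medical history data is left out for brevity.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Trial:
    name: str
    lat: float
    lon: float
    approved: bool  # hypothetical stand-in for regulatory approval status


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula (Earth radius 6371 km)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def prioritize(trials, subject_lat, subject_lon):
    """Sort trials with approved status first, then nearest to the subject."""
    return sorted(
        trials,
        key=lambda t: (not t.approved,
                       distance_km(subject_lat, subject_lon, t.lat, t.lon)),
    )


if __name__ == "__main__":
    trials = [
        Trial("A", 0.0, 1.0, approved=False),
        Trial("B", 0.0, 5.0, approved=True),
        Trial("C", 0.0, 2.0, approved=True),
    ]
    print([t.name for t in prioritize(trials, 0.0, 0.0)])
```

A tuple sort key keeps the ordering rule explicit, so further factors can be added as additional tuple elements.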
The medical history of a subject may be requested by the subject. The medical history may be disparate. The documents can be inputted into the platform records acquisition and processing system and organized. The data may be used in determining outcomes of therapies. The data may also be used to examine the effects of tested drugs on subjects (e.g., patients) by studying the various outcomes of effects among different populations. During the examination, the therapy may be known. The therapy may also be unknown and the sample analysis platform (e.g., automated platform) may be used to generate a therapy for the subject. The data may be used in identifying the population of people that responded positively to the therapy and the common characteristics of the population. From the data, sequence and mutation targets may be identified and matched with a drug that affects the targets. As a result, a searchable database of drugs may be assembled. Patients may be directly connected with treatments. Existing treatments that the data identify as a match may lead to unanticipated effects. The unanticipated effects may be useful in the process of drug discovery.
During drug matching, a specific mutation may be identified in a sample and matched with a corresponding drug. The system may recommend a drug that can be useful in other similar pathways. The drug may be a drug approved by a government unit (e.g., Food and Drug Administration, FDA). The drug recommendation may be based on prior clinical history.
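The matching step can be pictured as a lookup from an identified mutation to candidate drugs. The sketch below is hypothetical throughout: the table contents, gene and drug names are placeholders rather than real recommendations, and the `approved_only` flag is an assumed way of restricting results to drugs approved by a government unit.

```python
# Hypothetical knowledge base: (gene, protein change) -> candidate drugs.
# Gene and drug names are placeholders, not real associations.
DRUG_TABLE = {
    ("GENE_A", "p.X100Y"): [
        {"drug": "drug_1", "approved": True},
        {"drug": "drug_2", "approved": False},
    ],
}


def match_drugs(mutations, table=DRUG_TABLE, approved_only=False):
    """Return {mutation: [drug names]} for mutations found in the table."""
    matches = {}
    for mutation in mutations:
        candidates = table.get(mutation, [])
        names = [c["drug"] for c in candidates
                 if c["approved"] or not approved_only]
        if names:
            matches[mutation] = names
    return matches


if __name__ == "__main__":
    found = [("GENE_A", "p.X100Y"), ("GENE_B", "p.Z5W")]
    print(match_drugs(found))
    print(match_drugs(found, approved_only=True))
```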
The medical history may be obtained from a doctor or patient database. The doctor database may comprise practice areas of the doctor or hospital, the number of patients in their practice, or the location of their practice. The patient database may comprise information regarding all the patients associated with a particular medical practice and can include their specific height, weight, age, gender, medical history, current health status or any particular genetic markers.
Furthermore, the database may include key words associated with the subject's medical history including dictations prepared by the medical professional; lab, radiology and pathological reports; blood work panels and other appropriate information. The database component can also include medical fees associated with relatively standard procedures that are performed by the medical professional such as blood tests, office visits, taking of vital signs, supervising and preparing a specific type of medical history, or performing a medical physical. The medical history may be described in standardized terminology. The standard terminology may be Unified Medical Language System. The user interface may be a web-based user interface or a mobile user interface.
In another aspect, the present disclosure provides a method for qualifying a subject for enrollment in a therapy. A first nucleic acid sample from a tumor tissue sample of the subject and a second nucleic acid sample from a normal tissue sample of the subject may be received. The first nucleic acid sample and second nucleic acid sample can be obtained from the tumor tissue sample and the normal tissue sample automatically without any involvement from a user. Next, the first nucleic acid sample and the second nucleic acid sample may be assayed to identify one or more genomic alterations in the tumor tissue sample relative to the normal tissue sample to generate a set of genomic data for the subject. One or more databases for one or more therapies corresponding to a medical history of the subject may be queried. Curated databases of therapies and standards of care may be generated. The genomic data may be queried to generate a set of therapies for which the subject qualifies. A set of therapies on a user interface for display to a user may be provided. The method can also comprise receiving medical history data from the subject and a request for enrollment of the subject in a therapy selected from the provided set of therapies through the user interface. A therapeutic target based on the medical history and the genomic data may be identified. The subject may be enrolled into a therapy based on the identified target. The subject may be monitored. The monitoring can comprise assaying one or more nucleic acid samples to generate genomic data. The assaying may be directed to at least about 50 genes, 100 genes, 200 genes, 300 genes, 400 genes, 500 genes, 1000 genes, 1500 genes, 2000 genes, 2500 genes, or 2800 genes selected from Table 1. Assaying may comprise sequencing the first nucleic acid sample and the second nucleic acid sample without any involvement from a user. Assaying may further comprise receiving a request from the user to sequence the biological sample.
The request can be received from the user to sequence the first nucleic acid sample and the second nucleic acid sample.
The present disclosure provides computer control systems that are programmed to implement methods of the disclosure.
The computer system 1301 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 1305, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 1301 also includes memory or memory location 1310 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1315 (e.g., hard disk), communication interface 1320 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1325, such as cache, other memory, data storage and/or electronic display adapters. The memory 1310, storage unit 1315, interface 1320 and peripheral devices 1325 are in communication with the CPU 1305 through a communication bus (solid lines), such as a motherboard. The storage unit 1315 can be a data storage unit (or data repository) for storing data. The computer system 1301 can be operatively coupled to a computer network (“network”) 1330 with the aid of the communication interface 1320. The network 1330 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 1330 in some cases is a telecommunication and/or data network. The network 1330 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 1330, in some cases with the aid of the computer system 1301, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1301 to behave as a client or a server.
The CPU 1305 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 1310. The instructions can be directed to the CPU 1305, which can subsequently program or otherwise configure the CPU 1305 to implement methods of the present disclosure. Examples of operations performed by the CPU 1305 can include fetch, decode, execute, and writeback.
The CPU 1305 can be part of a circuit, such as an integrated circuit. One or more other components of the system 1301 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).
The storage unit 1315 can store files, such as drivers, libraries and saved programs. The storage unit 1315 can store user data, e.g., user preferences and user programs. The computer system 1301 in some cases can include one or more additional data storage units that are external to the computer system 1301, such as located on a remote server that is in communication with the computer system 1301 through an intranet or the Internet.
The computer system 1301 can communicate with one or more remote computer systems through the network 1330. For instance, the computer system 1301 can communicate with a remote computer system of a user (e.g., an operator). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PC's (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 1301 via the network 1330.
Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1301, such as, for example, on the memory 1310 or electronic storage unit 1315. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 1305. In some cases, the code can be retrieved from the storage unit 1315 and stored on the memory 1310 for ready access by the processor 1305. In some situations, the electronic storage unit 1315 can be precluded, and machine-executable instructions are stored on memory 1310.
The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
Aspects of the systems and methods provided herein, such as the computer system 1301, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
The computer system 1301 can include or be in communication with an electronic display 1335 that comprises a user interface (UI) 1340. The UI can allow a user to set various conditions for the methods described herein, for example, PCR or sequencing conditions. Examples of UI's include, without limitation, a graphical user interface (GUI) and web-based user interface.
Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 1305. The algorithm can, for example, process the reads to generate a consensus sequence.
The examples below are illustrative and non-limiting.
The Pre-Amplification Sample Processing is associated with sequencing preparations. The system operates on 5 iterations during a 10 hour work day. During each work day, 5 PCR plates are transferred to Post-Amplification System. During the Pre-Amplification sample processing, the lysis method is run on the liquid handler (Hamilton Star) with a deep well plate. A tip box is sent to waste. The plate is sealed and incubated for 30 minutes with shaking. Then the plate undergoes centrifugation for 2 minutes. The plate can then be peeled. The beads are added onto the liquid handler and loaded onto the DNA and extraction prep shelves (Kingfisher). The extraction protocol is run and comprises an additional wash and extraction of plates onto the Kingfisher. The QC plates on the fragment analyzer are read. If the samples are not suitable for further processing, the extraction protocol can be re-run. The destination tube rack may be placed on the docking table (Star). The data from the fragment analyzer is used to make the normalization plate on the Star. The sample may be aliquoted to the tube rack, re-capped, and sent to the output rack. During shearing, enzyme is dispensed to the normalized plate. The normalized plate is sealed and incubated with shaking for 1 hour. The plate is spun and the seal peeled. The QC end repair method is run on the Star. The plate on the fragment analyzer is read for QC. The normalized plate may be sealed and incubated with shaking for 1 hour. The normalized plate undergoes centrifugation and is then peeled. During adaptor ligation, the method is run on the Star and beads are added. The plate is moved to the Kingfisher and undergoes an additional wash and cleanup and eluent step. The magbead cleanup process is run on the Kingfisher. The remaining plates are removed to the waste or carousel from Kingfisher and the PCR plate is sealed.
The completion time is 4 hours for at least about 5 plates.
During the Post Amplification Plate preparation, the Pre Amplification PCR plate is placed on the Inheco and the protocol is run. The PCR plate is centrifuged and peeled, moved to the Star and transferred to the new Kingfisher plate. The reagents are dispensed on the Certus dispenser and transferred to the Kingfisher. The wash plates are loaded, the Kingfisher routine is run, and the plates are transferred to the Star. The QC plate and PCR plate are made. The beads are then added with the Star, the Kingfisher routine is run, the plates are transferred to the Star, and 8 PCR plates are generated. The PCR protocol is then run, and the Ampure cleanup protocol is repeated on the Star and Kingfisher. The QC plate is made and run on the fragment analyzer, and the output and pool samples on the Star are normalized.
The automated platform is used to isolate biomolecules from the biological sample and deliver them for sequencing. The blood sample in a tube or one or more slices from an FFPE tumor biopsy is inserted into the system. During an initial quality control check, the amount of blood in the input tube is validated. The DNA from the blood sample or tumor biopsy is extracted from the white blood cells and the cell free DNA in the plasma.
During the quality check fragment analysis for the biological sample's DNA, the distribution size is 150 bp for the FFPE tumor fragment, 160 bp for the cell free fragment, and 20 kb for the buffy coat fragment. The isolated DNA has a concentration of 50 ng/uL for the buffy coat and 10 ng/uL for the FFPE tumor, and 100 pg/uL for the cell free DNA. The DNA concentration is then adjusted for storage.
During the DNA library preparations for downstream processes, the DNA fragments are modified. The fragments undergo a quality control fragment analysis in which the size distributions of the modified DNA fragments are determined (200 bp for buffy coat fragments and 150 bp for FFPE fragments) and the fragments are quantified. The fragment concentrations are 50 ng/uL for FFPE and buffy coat and 20 ng/uL for cell free DNA.
During target capture, DNA is selected based on its match with Table 1. After target capture, the size distribution of the DNA fragments and the amount of DNA isolated are measured. Then, the DNA is adjusted to the correct concentration of 30 ng/uL and each patient library is tagged with a specific barcode for downstream analysis.
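If the adjustment to 30 ng/uL is made by dilution (an assumption; the text does not say how the concentration is adjusted), the diluent volume follows from conservation of mass, C1·V1 = C2·(V1 + Vd), which gives Vd = V1·(C1/C2 − 1). The function name and parameters below are hypothetical.

```python
def diluent_volume_ul(stock_ng_per_ul, sample_volume_ul, target_ng_per_ul=30.0):
    """Volume of diluent (uL) to add so the sample reaches the target.

    Derived from C1 * V1 = C2 * (V1 + Vd). Raises ValueError when the
    stock is already below the target, because dilution cannot raise
    a concentration.
    """
    if stock_ng_per_ul <= 0 or sample_volume_ul <= 0 or target_ng_per_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock concentration is below the target")
    return sample_volume_ul * (stock_ng_per_ul / target_ng_per_ul - 1.0)


if __name__ == "__main__":
    # 10 uL of a 60 ng/uL library needs 10 uL of diluent to reach 30 ng/uL.
    print(diluent_volume_ul(60.0, 10.0))
```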
The bioinformatics pipeline uses raw sequencing data produced by NextSeq to identify multiple nucleotide variants, insertions or deletions of nucleotides, and copy number variants in a subject's biological sample.
The sequencing run accessioning bridge 1403 watches for new laboratory experiment metadata accessioned by the Clarity LIMS system and stores the metadata in the pipeline database. The metadata allows the BCL2Fastq_runner to identify which sequencing libraries are associated with which sequencing runs and Illumina index adapters. The base call (BCL) to Storage Bridge 1404 (bcl2fastq) watches the sequencing run output directory and, when it identifies that a new sequencing run has finished, can upload the BCL data into S3 and then insert the metadata about the sequencing run into the pipeline database. The BCL to Storage Bridge 1404 receives the NextSeq Output BCL files 1409. The BCL to FASTQ Bridge 1406 is responsible for running the bcl_to_fastq_runner conversion tool with the appropriate arguments, uploading the newly generated FASTQ files into the pipeline database, and inserting metadata into the pipeline database. The BCL to FASTQ runner 1405 converts the raw output of a sequencing run into FASTQ files in which reads are grouped by the sequencing library from which they originated. The case accessioning bridge links one library derived from a normal genomic sample to one derived from a tumor sample.
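A directory-watching bridge of the kind described above can be sketched as a polling pass. This is not the disclosed implementation: the completion marker file name, the function names, and the `upload` and `record_metadata` callbacks (standing in for the S3 upload and the pipeline database insert) are assumptions for illustration.

```python
from pathlib import Path


def find_finished_runs(output_root, already_uploaded,
                       marker_name="RTAComplete.txt"):
    """Return run directories that hold the completion marker and have not
    been uploaded yet. The marker file name is an assumption of this sketch.
    """
    finished = []
    for run_dir in sorted(Path(output_root).iterdir()):
        if not run_dir.is_dir() or run_dir.name in already_uploaded:
            continue
        if (run_dir / marker_name).exists():
            finished.append(run_dir)
    return finished


def poll_once(output_root, already_uploaded, upload, record_metadata):
    """One polling pass: upload each newly finished run, then record it."""
    for run_dir in find_finished_runs(output_root, already_uploaded):
        upload(run_dir)            # e.g. copy BCL data to object storage
        record_metadata(run_dir)   # e.g. insert a row in the pipeline database
        already_uploaded.add(run_dir.name)
```

Passing the upload and database steps in as callables keeps the sketch free of any particular storage or database client, and remembering uploaded run names makes repeated polling passes idempotent.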
The tumor normal variant bridge 1407 identifies cases for which the tumor/normal variant calling pipeline has not yet been run, and initiates a tumor normal pipeline runner 1408 instance for each of these cases. After the runs have finished (or failed), the tumor normal variant bridge updates the appropriate status fields in the pipeline database, syncs the called variant data into S3, and updates the database with the called variant files' locations. The tumor normal pipeline runner is responsible for identifying somatic variants 1412, such as multiple nucleotide variants and insertions or deletions of nucleotides, and for identifying genes with significant copy number changes.
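The bookkeeping performed by such a bridge can be illustrated with an in-memory SQLite table. The schema, the status values, and the `run_pipeline` callback (which here returns the location of the called variant file) are hypothetical; the actual pipeline database is not described at this level of detail.

```python
import sqlite3


def pending_cases(conn):
    """Cases for which the variant-calling pipeline has not yet been run."""
    rows = conn.execute(
        "SELECT case_id FROM cases WHERE status = 'pending' ORDER BY case_id"
    )
    return [row[0] for row in rows]


def run_pending(conn, run_pipeline):
    """Run the pipeline for each pending case and record the outcome."""
    for case_id in pending_cases(conn):
        try:
            variants_uri = run_pipeline(case_id)
            conn.execute(
                "UPDATE cases SET status = 'finished', variants_uri = ? "
                "WHERE case_id = ?",
                (variants_uri, case_id),
            )
        except Exception:
            # A failed run is recorded so that it is not silently retried.
            conn.execute(
                "UPDATE cases SET status = 'failed' WHERE case_id = ?",
                (case_id,),
            )
    conn.commit()


def make_demo_db():
    """Build a small in-memory database with two pending cases."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cases ("
        "case_id TEXT PRIMARY KEY, status TEXT, variants_uri TEXT)"
    )
    conn.executemany(
        "INSERT INTO cases VALUES (?, 'pending', NULL)",
        [("case-1",), ("case-2",)],
    )
    return conn
```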
The DNA and cfDNA assays identify the presence and absence of molecular alterations (somatic mutations, copy number alterations, and fusion genes) involving the protein coding regions of the tumor DNA. This clinical report includes the approved drugs and drug candidates (i.e. drugs being studied in clinical trials), if any, that are associated with a potential clinical benefit or a potential lack of clinical benefit given the cancer-associated molecular alterations identified by the assays. The absence of a molecular alteration does not necessarily indicate that any drug or drug candidate will not provide any clinical benefit. Molecular alterations identified by the assay that are not associated with a potential clinical benefit or potential lack of clinical benefit are not listed in the report. The assay is performed using DNA derived from plasma and DNA derived from normal tissue. While germline DNA sequencing data is used for the identification of somatic mutations, germline events are not provided in the report. The somatic mutation, copy number alteration, and fusion detection portion of the assay is performed using the IDT xGen Lockdown system. Certain sample or variant characteristics may result in reduced sensitivity. These include but are not limited to low tumor cellularity, tumor heterogeneity, low mutant allele frequency, poor sample quality, and decreased fusion gene expression.
In an example, a subject with cancer submits his biological sample for DNA and cfDNA assaying for assessment of his molecular profile. In the DNA assay, the isolated genomic DNA derived from FFPE tumor tissue (QIAgen AllPrep DNA/RNA FFPE Kit) and matched normal tissue obtained from peripheral blood leukocytes (KingFisher Pure DNA Blood Kit) underwent sequencing library preparation using the KAPA HyperPrep Library Preparation kit. Prepared libraries were then target enriched using a customized version of the IDT xGen Lockdown system. Following enrichment, libraries for each sample were sequenced using the Illumina NextSeq 500 platform in order to generate at least 60 million, 75 bp paired-end reads with a mean target coverage of 450× for the tumor and 10 million reads with a mean target coverage of 70× for the normal samples. The tumor exome was sequenced to an average on-target depth of 450× and the matched normal tissue exome was sequenced to an average on-target depth of 70×.
Mutations, copy number variants, and fusions were screened for variants with strong clinical significance, variants with potential clinical significance, and variants with unknown significance. Variants with strong clinical significance were not identified in the subject. However, variants with potential clinical significance were identified including the AKT1 c.49G>A (p.E17K) mutation, ESR1 c.1609T>A (p.Y537N) mutation, ESR1 c.1273T>A (p.Y425N) mutation, ESR1 c.1609T>A (p.Y537N) mutation, and ESR1 c.826T>A (p.Y276N) mutation. Additionally, a copy number loss was detected for the subject's PGR gene. Lastly, variants of unknown significance were identified including RERE c.472G>C (p.A158P), ASPM c.9621A>T (p.G3207G), ASPM c.4866A>T (p.G1622G), ASPM c.2616A>T (p.G872G), NAV1 c.3525G>A (p.R1175R), NAV1 c.3393G>A (p.R1131R), NAV1 c.3525G>A (p.R1175R), NAV1 c.3501G>A (p.R1167R), NAV1 c.3354G>A (p.R1118R), NAV1 c.2352G>A (p.R784R), NAV1 c.2172G>A (p.R724R), NAV1 c.471G>A (p.R157R), RANBP2 c.5910A>C (p.G1970G), NEB c.19633_19634insGGAAATATA (p.Y6545delinsWKYTKEQN), NEB c.14530_14531insGGAAATATA (p.Y4844delinsWKYTKEQN), NEB c.3823_3824insGGAAATATACT (p.Y1275delinsWKYTKEQN), PTPRN c.966G>T (p.E322D), PTPRN c.696G>T (p.E232D), TNPO1 c.2621A>C (p.D874A), TNPO1 c.2471A>C (p.D874A), TNPO1 c.2597A>C (p.D866A), TNPO1 c.506A>C (p.D169A), ITPR3 c.5577G>A (p.Q1859Q), REV3L c.9359C>G (p.A3120G), REV3L c.9125C>G (p.A3042G), SYNE1 c.6787G>T (p.E2263*), SYNE1 c.6808G>T (p.E2270*), SYNE1 c.6898G>T (p.E2300*), DMD c.10262C>T (p.A3421V), DMD c.1058C>T (p.A353V), DMD c.2882C>T (p.A961V), DMD c.10250C>T (p.A3417V), DMD c.632C>T (p.A211V), HDAC6 c.1417G>A (p.E473K), and HDAC6 c.1375G>A (p.E459K). Copy number variants of unknown significance with gains in the copy number were identified.
In the cfDNA assay, the isolated cell-free DNA derived from plasma was obtained from the peripheral blood (MagMAX Cell-Free DNA Isolation Kit) and matched normal tissue was obtained from peripheral blood leukocytes (KingFisher Pure DNA Blood Kit). Next, both samples underwent sequencing library preparation using the Rubicon Genomics ThruPLEX Tag-seq Kit for cell-free DNA and the KAPA HyperPrep Library Preparation kit for normal DNA. Prepared libraries were target enriched using a customized version of the IDT xGen Lockdown system. Following enrichment, libraries for each sample were sequenced using the Illumina NextSeq 500 platform in order to generate at least a mean target coverage of 800× for the cell-free DNA library and 70× for the normal samples. The cell-free exome was sequenced to an average on-target depth of 800× and the matched normal tissue exome was sequenced to an average on-target depth of 70×.
Mutations and fusions were screened for variants with strong clinical significance, variants with potential clinical significance, and variants with unknown significance. Variants with strong clinical significance were not identified in the subject. However, the AKT1 c.49G>A (p.E17K) variant was identified as having potential clinical significance and the APC c.3856G>T (p.E1286*) variant was identified as having unknown significance.
In another example, a subject with cancer submitted a biological sample, which underwent a molecular assessment using the immunohistochemistry assay. The assay reported a positive or negative score, an intensity score, a percentage of positivity, and a pass or no pass for the control. Upon obtaining a biological sample from the subject, the tissue was first fixed in 10% neutral buffered formalin for a minimum of 6 hours and a maximum of 72 hours. When detecting Estrogen Receptor (ER) or Progesterone Receptor (PR), the ER (clone SP1) and PR (clone 1E2) antibodies were diluted at a 1:1 ratio using Leica Bond Diluent. Next, slides were incubated for 30 minutes following antigen retrieval with a citrate-based buffer on the Leica Bond III. External controls with known intensity levels (1+, 2+, and 3+) and with positive and negative punches were evaluated along with the test tissue. The control slides run alongside the subject's sample showed the appropriate staining. ER and PR analysis was performed on the subject by immunohistochemistry utilizing the laboratory developed test (LDT). Interpretation of the ER and PR immunohistochemical staining characteristics was guided by published results in the medical literature, information provided by the reagent manufacturer, and by internal review of staining performance. During interpretation of ER and PR, a positive result is reported when greater than 1% of the tumor cells show any nuclear staining. Conversely, a negative result is reported when less than 1% of the tumor cells show any nuclear staining.
When detecting the Human Epidermal Growth Factor Receptor 2 (HER2), the HER2 antibody (clone 4B5) was used as provided. Slides were incubated for 30 minutes following antigen retrieval with a citrate-based buffer on the Leica Bond III. External kit slides provided by the manufacturer (cell lines with 0, 1+, 2+, and 3+ expression) were evaluated along with the test tissue. The control slides run alongside the subject's sample showed appropriate staining. HER2 analysis was performed on the subject by immunohistochemistry utilizing an LDT. Interpretation of HER2 immunohistochemical staining characteristics was guided by published results in the medical literature, information provided by the reagent manufacturer, and by internal review of staining performance. During interpretation of HER2, positive 3+ indicates complete and circumferential membrane staining in greater than 10% of the tumor cells. Equivocal 2+ indicates circumferential membrane staining that is non-uniform and/or weak or moderate in greater than 10% of the tumor cells, or complete and circumferential membrane staining in 10% or less of the tumor cells. Negative 1+ indicates incomplete membrane staining that is faint and barely perceptible in greater than 10% of the tumor cells. Negative 0 indicates that there is no observable staining, or membrane staining that is incomplete and faint or barely perceptible in 10% or less of the tumor cells. A HER2 2+ staining result that is interpreted as equivocal may not show gene amplification. The subject's results indicated a positive result with a 3+ intensity score at 80% positivity for PR, a negative result with a 0 intensity score for HER2, and a positive result with a 3+ intensity score at 80% positivity for ER. All three passed the control test.
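The ER, PR, and HER2 interpretation rules described above may be sketched as small decision functions. This is an illustrative sketch, not the laboratory's scoring procedure: the `MembranePattern` encoding and the function names are hypothetical, the lower HER2 cut-off is assumed to mean 10% or less of the tumor cells, and combinations that the text does not address are returned as "unspecified".

```python
from enum import Enum


class MembranePattern(Enum):
    """Hypothetical encoding of the HER2 membrane staining descriptions."""
    NONE = "no observable staining"
    INCOMPLETE_FAINT = "incomplete, faint or barely perceptible"
    CIRCUMFERENTIAL_WEAK = "circumferential, non-uniform and/or weak or moderate"
    COMPLETE_CIRCUMFERENTIAL = "complete and circumferential"


def _check_percent(percent: float) -> None:
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")


def interpret_er_pr(percent_nuclear: float) -> str:
    """ER/PR call from the percentage of tumor cells with nuclear staining."""
    _check_percent(percent_nuclear)
    if percent_nuclear > 1:
        return "positive"
    if percent_nuclear < 1:
        return "negative"
    return "unspecified"  # exactly 1% is not covered by the stated rule


def interpret_her2(pattern: MembranePattern, percent_cells: float) -> str:
    """HER2 call from staining pattern and percentage of tumor cells."""
    _check_percent(percent_cells)
    above_cutoff = percent_cells > 10  # assumption: lower band is 10% or less
    if pattern is MembranePattern.COMPLETE_CIRCUMFERENTIAL:
        return "positive 3+" if above_cutoff else "equivocal 2+"
    if pattern is MembranePattern.CIRCUMFERENTIAL_WEAK:
        return "equivocal 2+" if above_cutoff else "unspecified"
    if pattern is MembranePattern.INCOMPLETE_FAINT:
        return "negative 1+" if above_cutoff else "negative 0"
    return "negative 0"


if __name__ == "__main__":
    print(interpret_er_pr(80))                       # subject's ER and PR: 80%
    print(interpret_her2(MembranePattern.NONE, 0))   # subject's HER2: score 0
```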
When detecting Programmed Death-Ligand 1 (PD-L1), the PD-L1 antibodies (clones SP142, SP263, 22C3, and 28-8) were used as provided. Slides were incubated for 30 minutes following antigen retrieval with an EDTA-based buffer on the Leica Bond III. Control slides (cell lines with 0, 1+, 2+, and 3+ expression) were evaluated along with the test tissue. A batch negative reagent control was also used to test for non-specific binding. The control slides run alongside the subject's sample showed appropriate staining. At least 100 tumor cells were identified for PD-L1 evaluation. PD-L1 analysis was performed on the subject by immunohistochemistry. Interpretation of PD-L1 immunohistochemical staining characteristics was guided by published results in the medical literature, information provided by the reagent manufacturer, and by internal review of staining performance. The subject's PD-L1 immunohistochemistry results indicated a tumor proportion score of 8800 and immune cell score of 1800 for the 22C3 (Dako) and 28-8 (Dako) clones, a tumor proportion score of 0 and immune cell score of 0 for the SP263 (Ventana) clone, and a tumor proportion score of 800 and immune cell score of 1100 for the SP142 (Ventana) clone. All the clones passed the control test.
In another example, the medical record of a subject was requested and then submitted for retrieval. Once obtained, the records were checked for quality by examining legibility, completeness, and accuracy. Next, the records were entered into the processing system to obtain an annotated medical record. During processing, the records were cleaned, organized, and labeled according to relevant medical text segments. The topics identified as relevant during processing of the subject's documented medical records are listed in the following description and are used for clinical trial matching. The medical terms and texts extracted from the subject's EHR were stored in a vector that is a representation of the subject's profile.
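One way to picture a profile vector of this kind is a binary vector over a fixed vocabulary of labels. The sketch below is an assumption made for illustration: the `Label` record with name, category, and value fields, the `profile_vector` function, and the vocabulary are hypothetical, and the text does not state how the vector is actually constructed.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Label:
    """One labeled item extracted from a record (hypothetical structure)."""
    name: str
    category: str
    value: str


def profile_vector(
    labels: Sequence[Label], vocabulary: Sequence[Tuple[str, str]]
) -> List[int]:
    """Binary vector: 1 where a (name, value) pair from the vocabulary is present."""
    present = {(label.name, label.value) for label in labels}
    return [1 if key in present else 0 for key in vocabulary]


if __name__ == "__main__":
    # Illustrative labels only; they are not the contents of the subject's record.
    labels = [
        Label("ER", "immunohistochemistry", "positive"),
        Label("HER2", "immunohistochemistry", "negative"),
    ]
    vocabulary = [
        ("ER", "positive"),
        ("ER", "negative"),
        ("HER2", "positive"),
        ("HER2", "negative"),
    ]
    print(profile_vector(labels, vocabulary))
```

A fixed vocabulary keeps every subject's vector the same length, so profiles can be compared position by position.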
The subject's biologic data and medical history record as processed are reported below in Table 2. The biologic data and medical history record were processed into a label name, a label category, and a label value.
In another example, the database of clinical trials is filtered according to phases of the clinical trial and according to eligibility by computer assessment based on a list of criteria. During eligibility assessment, one portion of the database of clinical trials is curated using one or more clinical labels and molecular labels to generate the filtered set of trials.
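The filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Trial` record, its fields, and `filter_trials` are hypothetical, and the rule that a trial must carry every required clinical and molecular label is one possible reading of the eligibility assessment, not the method's stated criterion.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Trial:
    """Hypothetical curated trial entry."""
    trial_id: str
    phase: int
    clinical_labels: FrozenSet[str]
    molecular_labels: FrozenSet[str]


def filter_trials(
    trials: Iterable[Trial],
    allowed_phases: Set[int],
    required_clinical: Set[str],
    required_molecular: Set[str],
) -> List[Trial]:
    """Keep trials in an allowed phase that carry all required labels (assumed rule)."""
    kept = []
    for trial in trials:
        if trial.phase not in allowed_phases:
            continue
        if not required_clinical <= trial.clinical_labels:
            continue
        if not required_molecular <= trial.molecular_labels:
            continue
        kept.append(trial)
    return kept


if __name__ == "__main__":
    database = [
        Trial("T1", 2, frozenset({"breast cancer"}), frozenset({"AKT1"})),
        Trial("T2", 1, frozenset({"breast cancer"}), frozenset({"AKT1"})),
        Trial("T3", 2, frozenset({"lung cancer"}), frozenset({"EGFR"})),
    ]
    filtered = filter_trials(database, {2, 3}, {"breast cancer"}, {"AKT1"})
    print([t.trial_id for t in filtered])
```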
Next, the subject's medical history data and biologic data as reported in Examples 8 and 9 are collected. The medical history data and biologic data are computer analyzed to yield a genomic-based medical history analysis for the subject. The genomic-based medical history analysis is used to query the filtered list of eligible clinical trials to generate the subset of clinical trials for which the subject qualifies. First, ineligible therapies are determined according to a categorical score and rejected from the filtered list of therapies. The categorical score for each therapy is selected from the group consisting of yes, maybe, and no. The therapies are then grouped using a similarity score between the subject and the therapies based on the labels. One similarity metric involves finding an empirical significance threshold, determining positive clinical trials by a specific criterion, and then assessing overlap among the positive clinical trials in a standard manner. The clinical trials that fall below a minimum similarity score for criteria crucial to trial enrollment are ineligible. Upon generation of the final list of therapies, the list is presented on a user interface on an electronic device of the subject. The subject makes a selection from the given therapies and submits a request for enrollment. The list of therapies is also sent to a medically qualified staff member for final authorization, and the clinical trials are added to the subject's profile.
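The matching steps above (rejecting therapies scored "no", scoring similarity on labels, and applying a minimum similarity score) may be sketched as follows. The sketch swaps in the Jaccard index as the similarity score purely for illustration, since the text does not name a specific formula; `jaccard`, `match_trials`, the label strings, and the threshold value are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two label sets (an assumed stand-in similarity score)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def match_trials(
    subject_labels: Set[str],
    trial_labels: Dict[str, Set[str]],
    categorical_score: Dict[str, str],
    min_similarity: float,
) -> List[Tuple[str, float]]:
    """Return (trial_id, similarity) pairs, best match first."""
    matches = []
    for trial_id, labels in trial_labels.items():
        # Step 1: reject therapies whose categorical score is "no".
        if categorical_score.get(trial_id, "no") == "no":
            continue
        # Step 2: score similarity between the subject and the trial labels.
        score = jaccard(subject_labels, labels)
        # Step 3: drop trials below the minimum similarity score.
        if score >= min_similarity:
            matches.append((trial_id, score))
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    subject = {"breast cancer", "ER positive", "AKT1 E17K"}
    trials = {
        "T1": {"breast cancer", "AKT1 E17K"},
        "T2": {"lung cancer"},
        "T3": {"breast cancer", "ER positive", "AKT1 E17K"},
    }
    scores = {"T1": "maybe", "T2": "yes", "T3": "no"}
    print(match_trials(subject, trials, scores, min_similarity=0.5))
```

In this toy input, T3 is rejected by its categorical score despite sharing every label with the subject, and T2 is dropped for falling below the minimum similarity, which mirrors the two-stage rejection described in the text.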
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
The present application is a continuation of International Application No. PCT/US17/52956, filed Sep. 22, 2017, which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/399,221, filed Sep. 23, 2016 and U.S. Provisional Patent Application Ser. No. 62/480,307, filed Mar. 31, 2017, each of which is entirely incorporated herein by reference.
Number | Date | Country
---|---|---
62480307 | Mar 2017 | US
62399221 | Sep 2016 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/US17/52956 | Sep 2017 | US
Child | 15727491 | | US