This specification describes technologies relating to visualizing sequencing information.
Metagenomics, the genomic analysis of a population of microorganisms, makes possible the profiling of microbial communities in the environment and the human body at unprecedented depth and breadth. Its rapidly expanding use is revolutionizing our understanding of microbial diversity in natural and man-made environments and is linking microbial community profiles with health and disease. To date, most studies have relied on PCR amplification of microbial marker genes (e.g., bacterial 16S rRNA), for which large, curated databases have been established. More recently, higher throughput and lower cost sequencing technologies have enabled a shift towards enrichment-independent or broad pathogen enrichment-based next-generation sequencing (NGS) to profile microbial and host markers and their influence on health and infectious and other diseases (jointly referred to as “NGS ID”). These approaches reduce bias, improve detection of less abundant taxa, and enable discovery of novel pathogens and expression of genes of interest.
While conventional pathogen-specific nucleic acid amplification tests are highly sensitive and specific, they require a priori knowledge of likely pathogens, as with diagnostic panels that are limited to diagnosis of the most common pathogens. In contrast, NGS ID allows for unbiased detection and molecular typing of a theoretically unlimited number of common and unusual pathogens, as well as antimicrobial resistance (AMR) markers. Wide availability of next-generation sequencing instruments, lower reagent costs, and streamlined sample preparation protocols are enabling an increasing number of investigators to perform high-throughput DNA and RNA sequencing for metagenomics studies. However, analysis of sequencing data is still formidably difficult and time consuming, requiring bioinformatics skills, computational resources, and microbiological expertise that are not available to many laboratories and/or practitioners, especially diagnostic ones.
Technical solutions (e.g., computing systems, methods, and non-transitory computer readable storage mediums) for addressing the above identified problems with discovering patterns in data sets are provided in the present disclosure.
As discussed above, next-generation sequencing techniques generate a large amount of sequencing data that can be prohibitively complex for a practitioner in a clinical or laboratory setting to review efficiently in order to provide informed decisions for further action (e.g., a treatment regimen for a patient based on one or more pathogens identified in a patient sample). Thus, there is a need in the art for systems and methods that allow for the presentation and visualization of sequencing data to detect and analyze microorganisms and their hosts in biological and/or non-biological samples, without requiring the practitioner to possess advanced genomics, bioinformatics, and statistical skills, as well as microbiological expertise across a wide array of clades and classes.
The present disclosure provides a comprehensive approach to identification and analysis of organisms (e.g., microorganisms, pathogens and/or AMR markers) and their hosts in a biological and/or non-biological sample, such as a sample obtained from a patient. For example, sequencing data obtained from the patient sample is entered into an analysis pipeline comprising mapping (e.g., alignment) to one or more reference sequences corresponding to a set of microorganisms (e.g., complete and/or incomplete genomes for the set of microorganisms), thus generating preliminary results including the number and identity of microorganisms in the sample, quality control data, and/or sequencing metadata (e.g., number of reads, coverage, and/or alignment identity). Systems and methods for visualizing and reviewing the results obtained from the analysis pipeline allow users in clinical or laboratory settings to quickly and efficiently analyze the biological and/or non-biological sample, allowing the transmission of relevant results for further action (e.g., for diagnosis, monitoring, treatment, or regulatory purposes). For example, the transmission of relevant results and/or any recommended actions can be provided in a report following approval of the preliminary results by a medical practitioner.
The systems and methods disclosed herein provide a user or practitioner with access to information that is used for downstream decision-making (e.g., for the issuance of a report), while allowing flexibility for a streamlined or detailed analysis approach. For example, the interactive visualization and review tools provided herein are optionally automated, thus avoiding the need for the practitioner to have extensive bioinformatics and/or microbiological expertise to generate actionable results based on sequencing data. Alternatively, in some instances, the interactive visualization and review tools provided herein are customizable, thus allowing additional interaction for troubleshooting, pipeline development, or directing analysis towards specific organisms of interest (e.g., by application of filters). Generally, a minimum of user interaction is employed for final approval of the relevant results, whether using the streamlined or the detailed analysis approach.
The following presents a summary of the invention in order to provide a basic understanding of some of the aspects of the invention. This summary is not an extensive overview of the invention. It is not intended to identify key/critical elements of the invention or to delineate the scope of the invention. Its sole purpose is to present some of the concepts of the invention in a simplified form as a prelude to the more detailed description that is presented later.
One aspect of the present disclosure provides a method for facilitating review of nucleic acid sequencing data prepared for identifying the presence of a subset of microorganisms and/or antimicrobial resistance markers in a biological or non-biological sample (e.g., from a subject), at a computer system having a display, one or more processors, and memory storing one or more programs for execution by the one or more processors.
The method includes receiving a request to display an analysis of a result set obtained from a sequencing reaction of nucleic acids from the biological and/or non-biological sample. The result set includes a plurality of sequencing statistics from the sequencing reaction, a plurality of nucleotide sequences mapped against a plurality of reference sequences corresponding to a set of microorganisms, where the set of microorganisms comprises at least 3, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 microorganisms, and for each respective microorganism in the set of microorganisms, a corresponding plurality of mapping statistics for the mapping of respective nucleotide sequences to the reference sequence for the respective microorganism or hosts.
Responsive to the request, a first customizable diagnostic template is applied to the result set, where the customizable diagnostic template specifies a subset of the plurality of sequencing statistics, a subset of the set of microorganisms, and a subset of the plurality of mapping statistics.
The method further includes displaying, on the display, a customizable user interface comprising a review status for the nucleic acid sequencing data, a first affordance for updating the review status for the nucleic acid sequencing data, a summary of the subset of the plurality of sequencing statistics, for each respective microorganism in the subset of the set of microorganisms satisfying a minimum mapping threshold in the result set, a corresponding summary of the subset of the plurality of mapping statistics for the respective nucleotide sequences in the plurality of nucleotide sequences mapped to the reference sequence for the respective microorganism, and a second affordance for applying a second customizable diagnostic template to the result set.
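By way of illustration only, and not limitation, the application of a customizable diagnostic template to a result set can be sketched as follows. All names in this sketch (ResultSet, DiagnosticTemplate, apply_template, min_mapped_reads, and the statistic keys) are hypothetical and are not taken from the present disclosure, and the minimum mapping threshold is assumed here to be a simple count of mapped reads.

```python
from dataclasses import dataclass


@dataclass
class ResultSet:
    # Run-level sequencing statistics, e.g. {"read_count": 120000}
    sequencing_stats: dict
    # Per-microorganism mapping statistics, keyed by organism name
    mapping_stats: dict


@dataclass
class DiagnosticTemplate:
    sequencing_stat_names: list
    organisms: list
    mapping_stat_names: list
    # Assumed threshold: minimum mapped reads for an organism to be displayed
    min_mapped_reads: int = 1


def apply_template(result, template):
    """Return only the statistics and organisms that the template selects."""
    seq_summary = {
        name: result.sequencing_stats[name]
        for name in template.sequencing_stat_names
        if name in result.sequencing_stats
    }
    organism_summaries = {}
    for organism in template.organisms:
        stats = result.mapping_stats.get(organism)
        if stats is None:
            continue  # organism was not mapped in this result set
        if stats.get("mapped_reads", 0) < template.min_mapped_reads:
            continue  # below the minimum mapping threshold
        organism_summaries[organism] = {
            name: stats[name]
            for name in template.mapping_stat_names
            if name in stats
        }
    return {"sequencing": seq_summary, "organisms": organism_summaries}


# Made-up example values, for illustration only.
result = ResultSet(
    sequencing_stats={"read_count": 120000, "mean_quality": 34.1},
    mapping_stats={
        "Escherichia coli": {"mapped_reads": 5400, "coverage": 0.82, "identity": 0.99},
        "Klebsiella pneumoniae": {"mapped_reads": 12, "coverage": 0.01, "identity": 0.91},
        "Staphylococcus aureus": {"mapped_reads": 900, "coverage": 0.40, "identity": 0.98},
    },
)
template = DiagnosticTemplate(
    sequencing_stat_names=["read_count"],
    organisms=["Escherichia coli", "Klebsiella pneumoniae"],
    mapping_stat_names=["mapped_reads", "coverage"],
    min_mapped_reads=100,
)
view = apply_template(result, template)
print(view)
```

In this sketch, a second template would simply be a second DiagnosticTemplate instance passed to the same function, which leaves the underlying result set unchanged.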
Various embodiments of systems, methods and devices within the scope of the appended claims each have several aspects, no single one of which is solely responsible for the desirable attributes described herein. Without limiting the scope of the appended claims, some prominent features are described herein. After considering this discussion, and particularly after reading the section entitled “Detailed Description” one will understand how the features of various embodiments are used.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference in their entireties to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
The implementations disclosed herein are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. Like reference numerals refer to corresponding parts throughout the several views of the drawings.
Infectious disease testing can be achieved using metagenomics, the detection and genomic analysis of a population of microorganisms (e.g., pathogens) and their hosts in a biological and/or non-biological sample. In combination with next-generation sequencing techniques (e.g., NGS ID), metagenomics facilitates such detection even without a priori knowledge of pathogens likely to be present in a sample. For example, in some instances, detection of microorganisms in biological and/or non-biological samples utilizes enrichment-based approaches comprising targeted enrichment panels, which provide increased depth and precision, reduce the occurrence of host or contaminant genetic material in the data set, and can be optimized for sequencing of specific regions. In some instances, detection of microorganisms utilizes enrichment-independent approaches, which provide increased breadth and resolution and can be used to identify both known and unknown microorganisms, including rare microorganisms. Generally, the detection of microorganisms using NGS ID can be used for numerous downstream actions including results reporting, patient diagnosis, treatment, and monitoring, analysis pipeline validation, and/or regulatory purposes.
In some instances, analysis of metagenomics data obtained by next-generation sequencing (e.g., whole-genome sequencing) involves a level of training (e.g., in bioinformatics, genomics, statistics, and microbiology) that many clinical and laboratory practitioners lack. In particular, for applications where the desired output is an actionable result, such as an identity of a pathogenic microorganism for a patient diagnosis or a presence of an AMR marker (e.g., an AMR gene) to determine whether a specific treatment is preferable over another, it can be impractical as well as inefficient for the practitioner to exhaustively analyze the entirety of the sequencing and/or mapping (e.g., alignment) data generated using NGS ID. In some embodiments, the ability to efficiently and accurately identify AMR markers improves treatment of microbial infections by indicating whether a particular microorganism is likely to respond to a course of therapy. See, for example, Greninger (2018). “The challenge of diagnostic metagenomics,” Expert Rev Mol Diagn 18:7, 605-615. doi:10.1080/14737159.2018.1487292.
Conversely, NGS ID approaches frequently suffer from a lack of understanding of true clinical utility, such as in instances where data-driven analyses are relied upon too heavily, without consideration of case-specific factors. For example, an accurate interpretation of sequencing and mapping data can be impacted by particularities specific to a patient, which may not be accounted for in an analysis pipeline. In some instances, additional benefit is obtained from further validation by a physician or medical practitioner in a clinical setting, and/or a laboratory inspector in a commercial or diagnostic setting. In some cases, additional oversight is used to account for contaminants common in wet-lab practices (e.g., clinical chemistry and/or PCR diagnostics), anomalies occurring in sequencing and/or mapping analysis (e.g., index hopping), and interference from host or nonpathogen nucleic acids, which can obfuscate the detection of pathogenic microorganisms of interest. This is especially important when distinguishing between two or more microbial populations in coinfections or detecting the presence of small populations of microorganisms, where even low levels of contaminating material can cause interference (e.g., due to the relative size of the microbial genomes compared to a host genome or a dominant population). A priori knowledge is useful, in some embodiments, for setting specific thresholds for the detection of microorganisms involved in certain pathogenic infections, where the limit of sensitivity of the sequencing reaction can differ based on the expected microbial populations in the sample. See, for example, Greninger (2018), “The challenge of diagnostic metagenomics,” Expert Rev Mol Diagn 18:7, 605-615, doi:10.1080/14737159.2018.1487292.
For example, in some instances, an understanding of the clinical relevance of a microorganism or AMR marker detected in a biological and/or non-biological sample is a key factor in determining whether it is actionable and thus whether it should be reported. While an automated approach can use machine learning approaches (e.g., string matching, regular expressions, natural language processing, etc.) to annotate and filter preliminary results based on published knowledge, in many situations, the analysis of microorganism detection benefits from a case-specific consideration. In such instances, conventional approaches that operate entirely without a priori knowledge may result in inaccurate interpretations of clinical data, compared to those that provide a mechanism for incorporating the same into the reporting of relevant results and the application of such results to downstream actions.
There is a need in the art for an approach to detection of microorganisms and AMR markers that will overcome the above limitations. In particular, the present disclosure provides systems and methods for analysis with or without review of the presence of microorganisms and/or antimicrobial resistance (AMR) markers in a biological and/or non-biological sample. The provided systems and methods utilize an automated approach that reduces the level of expertise and experience required to make accurate and reliable assessments based on the generated results, thus increasing accessibility and reducing the cost and labor required to train practitioners in the various skills and tools necessary for metagenomics sequencing analysis using NGS ID. For instance, as described below in Example 1, the streamlined example system and method (e.g., the ReviewPortal) provides a user interface that allows for a variety of display windows, dashboards, overlays, indexes, and other organizational features for the analysis of the result set, as well as multiple affordances for selection and customization of data and navigation between different display windows. Furthermore, the provided systems and methods improve workflow by streamlining the analysis and reporting process, thus reducing the amount of time and number of computational operations required to analyze each result set and increasing output (e.g., more samples can be processed, sequenced, analyzed and reported in a shorter time). Such reduction in computational time and complexity improves system operation and functionality, which can further reduce running time, save on power requirements, and improve user accessibility by allowing the analysis to be displayed with the relevant data at hand in fewer clicks compared to conventional systems and methods.
Additionally, the provided systems and methods allow for customization and/or validation of the sequencing data, mapping data, and analysis results, thus accounting for noisy data and ambiguous or inconclusive results. Such user interaction improves upon the prior art by facilitating the application of clinical oversight to the automated results based on, for example, a priori knowledge. Other benefits include increased consistency, where the streamlined reporting and analysis system can be uniformly performed based on predetermined parameters (e.g., one or more parameters saved as a filter or profile). By providing for at least a minimum amount of user interaction (e.g., final approval and/or safeguards requiring additional validation) and the ability to customize the analysis of the results set based on a priori knowledge or case-specific parameters (e.g., tailoring the presentation of information, filtering, and/or selection of sequencing or alignment metrics), the accuracy of the reported results can be improved.
Improved applicability of metagenomics sequencing analysis allows the practitioner to take advantage of additional benefits imparted by NGS ID. For example, the use of enrichment-independent metagenomics sequencing approaches increases the likelihood of detecting microorganisms that fail to be detected by other methods, such as conventional methods that rely on diagnostic panels limited to known and/or common pathogens. This ability to detect common and rare pathogens improves diagnostic applications, where the cause of a disease is unknown and diagnostic panels are unable to provide information as to the etiology of the disease or provide guidelines as to appropriate treatment. See, for example, Greninger (2018). “The challenge of diagnostic metagenomics,” Expert Rev Mol Diagn 18:7, 605-615, doi:10.1080/14737159.2018.1487292.
Additionally, the use of NGS ID reduces the likelihood of sample loss or degradation and increases the sensitivity of detection by, for example, eliminating the need for in vitro microbial culture. For instance, sample loss or degradation can occur through user error (e.g., by improper storage or handling of samples during sample collection, preparation or culture). Furthermore, a vast majority of microorganisms have not been adapted to in vitro culture, while other uncommon and/or novel microorganisms cannot be readily cultured. It is estimated that less than 1% of microorganisms present in the environment can be cultured in vitro. See, Streit and Schmitz (2004). “Metagenomics—the key to the uncultured microbes,” Curr Op Microb 7, 492-498. doi:10.1016/j.mib.2004.08.002. Loss of detectable microorganisms can also occur in hospital settings prior to sample collection, such as in instances where patients undergo a treatment (e.g., an antibiotic therapy) immediately after admission and initial diagnosis. In such cases, patient samples collected after antibiotic exposure may not be suitable for laboratory culture, and the subsequent detection of microorganisms may not be representative of the actual in vivo composition of pathogens. See, Harris et al., (2017), “Influence of Antibiotics on the Detection of Bacteria by Culture-Based and Culture-Independent Diagnostic Tests in Patients Hospitalized With Community-Acquired Pneumonia,” Open Forum Infect Dis 4(1), doi:10.1093/ofid/ofx014.
As sequencing costs drop, NGS ID operations can also be automated with significant price reductions. Large-scale sequencing technologies, such as next generation sequencing, have afforded the opportunity to achieve sequencing at costs that are less than one U.S. dollar per million bases, and, in fact, costs of less than ten U.S. cents per million bases have been realized. See, Nimwegen et al., (2016). “Is the $1000 Genome as Near as We Think? A Cost Analysis of Next-Generation Sequencing,” Clin Chem 62(11): 1458-1464, doi:10.1373/clinchem.2016.258632. The presently disclosed systems and methods therefore provide additional benefits by overcoming the limitations of using culture-based microbial diagnostic methods by allowing the use of an NGS ID approach instead of, or in addition to, an in vitro culture approach.
Moreover, the presently disclosed systems and methods provide a powerful tool that can be used to identify and detect microorganisms or antimicrobial resistance markers in a sample including large amounts of sequencing data, such as those obtained using NGS. Such systems and methods improve upon conventional systems and methods by facilitating analyses that are otherwise too complex to be performed in the human mind. For example, as described below, in some embodiments, the method includes receiving a request to display an analysis of a result set obtained from a sequencing reaction of nucleic acids from a sample, where the result set includes, at least, a plurality of nucleotide sequences, obtained from a sequencing reaction, mapped against a plurality of reference sequences corresponding to a set of microorganisms (e.g., at least 3 microorganisms). For an example analysis where the plurality of nucleotide sequences includes at least 1×10^4 nucleotide sequences and the mapping of the plurality of nucleotide sequences to the plurality of reference sequences collectively maps to at least 0.5 megabases (e.g., 500,000 base pairs), the number of calculations required to align each nucleotide sequence in the at least 1×10^4 nucleotide sequences to each candidate position along the length of the collective 0.5 megabases, and to correctly assign any resulting mappings to the respective corresponding microorganism in the set of microorganisms, is so large that it cannot be performed mentally.
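The scale of the example analysis above can be made concrete with a short calculation. The following is a rough sketch only: it assumes a naive exhaustive comparison of every nucleotide sequence against every candidate start position, treats the collective reference as a single 500,000-base stretch, and assumes a read length of 100 bases, a value that is not stated in the present disclosure. Practical aligners use indexing to avoid most of this work.

```python
# Figures taken from the example analysis: ten thousand sequences, 0.5 megabases.
num_reads = 10**4
reference_length = 500_000

# Assumption for illustration only: every read is 100 bases long.
read_length = 100

# Number of start positions at which a read fits entirely within the reference.
candidate_positions = reference_length - read_length + 1

# Naive exhaustive search: every read is tried at every candidate position.
placements = num_reads * candidate_positions

# Each placement compares up to read_length bases.
base_comparisons = placements * read_length

print(f"candidate positions per read: {candidate_positions:,}")
print(f"read placements to evaluate:  {placements:,}")
print(f"base-level comparisons:       {base_comparisons:,}")
```

Under these assumptions the naive search evaluates roughly five billion read placements, before any assignment of the resulting mappings to individual microorganisms.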
The following describes an example embodiment of a review and visualization tool for generating, viewing, modifying, validating, and/or reporting the results of a sequencing and mapping (e.g., alignment) analysis using nucleic acids in a biological or non-biological sample obtained (e.g., from a subject such as a patient). Briefly, a sample is collected, prepared, sequenced (e.g., by next-generation sequencing), and analyzed. In some embodiments, the analysis comprises preprocessing and/or pre-sorting of the sequencing data. Pre-sorting can include sorting each nucleotide sequence obtained from the sequencing of the sample into one or more bins, where each bin corresponds to a different microorganism, depending on the likelihood that the nucleotide sequence originated from the respective microorganism. Each nucleotide sequence is then mapped (e.g., using a k-mer alignment and/or a full alignment) to one or more reference sequences (e.g., complete and/or incomplete genomes) corresponding to different microorganisms. In some embodiments, the analysis is performed using an analysis pipeline.
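The pre-sorting step described above can be sketched, purely as an illustration, with a simple shared k-mer count. The function names, the toy reference sequences, and the min_fraction cutoff are hypothetical. The present disclosure does not specify how the likelihood of origin is computed, so the fraction of shared k-mers is used here only as a stand-in.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Yield every overlapping k-mer of the sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of organisms whose reference contains it."""
    index = defaultdict(set)
    for organism, reference in references.items():
        for kmer in kmers(reference, k):
            index[kmer].add(organism)
    return index


def bin_reads(reads, index, k, min_fraction=0.5):
    """Place each read in the bin of every organism sharing enough k-mers."""
    bins = defaultdict(list)
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read is shorter than k
        hits = defaultdict(int)
        for kmer in read_kmers:
            for organism in index.get(kmer, ()):
                hits[organism] += 1
        for organism, count in hits.items():
            # Assumed stand-in for "likelihood": fraction of shared k-mers.
            if count / len(read_kmers) >= min_fraction:
                bins[organism].append(read)
    return dict(bins)


# Toy sequences, made up for illustration; real references are genomes.
references = {
    "organism_A": "ACGTACGTTAGCCGATAGGCT",
    "organism_B": "TTGACCAGTTTGGCAATCCGA",
}
reads = ["CGTTAGCCGA", "CAGTTTGGCA", "GGGGGGGGGG"]

index = build_index(references, k=4)
bins = bin_reads(reads, index, k=4)
print(bins)
```

A read can land in more than one bin when it shares enough k-mers with several references, which is consistent with sorting into "one or more bins"; each binned read would then proceed to the mapping step against the corresponding reference sequences.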
The sequencing and mapping (e.g., alignment) data can then be accessed from the review and visualization tool, which can be a cloud-based interface such as an online portal. In some embodiments, one or more pending samples are displayed on the review and visualization tool (e.g., positive controls, negative controls, blank controls and/or analysis samples). In some embodiments, one or more batches, each including one or more pending samples, are displayed for individual review and visualization. Additional views are possible, including selection of different runs, each including one or more batches.
In some embodiments, selection of a sample generates an overview of the results set generated by, e.g., the analysis pipeline, indicating the number of microorganisms and/or antimicrobial resistance (AMR) markers, if any, detected in the sample. Detected microorganisms can be identified by scientific name, designated as pathogenic or nonpathogenic, annotated with various search terms, and/or categorized into various classes (e.g., bacteria, fungi, parasites, or viruses). Selection of each sample can also include presentation (e.g., in text or graphical form) of metadata, including sequencing statistics (e.g., nucleotide sequence count, base composition, sequencing library size, etc.), mapping statistics for each microorganism to which mapping was detected (e.g., coverage, sequence alignment score, consensus sequence, etc.), and/or run metrics (e.g., sample type, run accession number, review status, etc.). In some embodiments, additional information for one or more features are accessible through external links, including sequences for reference sequences (e.g., BLAST, NCBI) and/or databases for detected or otherwise selected microorganisms (e.g., Ensembl, EuPathDB, The Human Microbiome Project, Pathogen Portal, etc.). Generally, detection of microorganisms is performed using an automated process, using predefined thresholds for a plurality of parameters. However, these thresholds can be adjusted by a user or practitioner, as discussed below.
Selection of each sample can also include a display of quality control data, such as sequencing and mapping quality control data. For example, presentation of quality control data allows a user to assess whether a sequencing and/or mapping has been performed successfully before determining whether the output of the analysis is accurate and meaningful. Confirmation that control and analysis samples have passed quality control checks provides assurance that any subsequent analytical results and/or interpretations are reliable at least based on the performance of the sequencing and mapping.
Notably, the review and visualization tools disclosed herein include a plurality of different metrics that provide a user (e.g., a laboratory or medical practitioner) with a comprehensive suite of results in an accessible, streamlined format (e.g., sequencing validation, sequencing statistics, mapping validation, mapping statistics, microorganism detection, microbe-specific annotations, pathogen information, antimicrobial resistance (AMR) gene expression, and therapeutic treatments, among others). As discussed above, such features allow the analysis and interpretation of NGS ID data by users without advanced skills in each and every one of the various aspects of the analysis. In some embodiments, the provided review and visualization tools present a summary of the information relevant to analyzing the presence of microorganisms in a respective sample such that it can be efficiently examined, understood, and/or reviewed by a practitioner. Further customization is also possible for situations that necessitate fine-tuning.
In particular, in addition to an automated process for analysis of the presence of microorganisms, in some embodiments, any one of the parameters and/or detection thresholds can be adjusted based on user preference and/or a priori knowledge. In some such embodiments, the review and visualization tool can be modified to include an affordance for accepting one or more approvals (e.g., by a laboratory or medical technician, supervisor and/or director) prior to submission of the analysis of the results set for downstream processing. Each approval stage for a respective sample can be indicated by a review status. Furthermore, selection and/or approval at any stage of the approval process (e.g., first, second, third, and/or final approval) can be tagged with a user identity, an access timestamp, and/or a record of each change made in the respective sample. In some embodiments, final approval of a sample (e.g., a control and/or an analysis sample) removes the sample from the list of one or more pending samples.
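The staged approval described above can be sketched, by way of example only, as a small state machine with an audit log. The stage names, the SampleReview class, and the log fields are hypothetical; an actual embodiment may use more or fewer approval stages and may record additional detail for each change.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumed ordering of review statuses, for illustration only.
STAGES = ["pending", "first_approval", "second_approval", "final_approval"]


@dataclass
class SampleReview:
    sample_id: str
    status: str = "pending"
    audit_log: list = field(default_factory=list)

    def approve(self, user):
        """Advance to the next stage and record who approved and when."""
        position = STAGES.index(self.status)
        if position == len(STAGES) - 1:
            raise ValueError("sample already has final approval")
        self.status = STAGES[position + 1]
        self.audit_log.append({
            "user": user,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "new_status": self.status,
        })


def pending_samples(reviews):
    """List samples that have not yet received final approval."""
    return [r.sample_id for r in reviews if r.status != "final_approval"]


# Hypothetical users and sample identifiers.
sample_1 = SampleReview("sample-001")
sample_2 = SampleReview("sample-002")
for user in ["technician", "supervisor", "director"]:
    sample_1.approve(user)
sample_2.approve("technician")

print(pending_samples([sample_1, sample_2]))
```

Here the sample that received all three approvals no longer appears in the pending list, while each approval remains traceable to a user identity and a timestamp.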
In some embodiments, any one of the results in the results set can be separately approved or rejected, including the presence or absence of a detected microorganism, a passing score for a quality control metric, and/or a passing score for a sequencing or mapping statistic compared to a filtering threshold. Additional elements that can be customized include specific parameters or metrics to be presented on the display for each sample, batch, or run.
In some instances, further customization is also possible through an administrator access account, by controlling and managing filters, profiles, user accounts, groups, and/or permissions for specific users (e.g., granting review and/or approval access). For example, in some implementations, a production workflow can be established by restricting access to analysis samples until one or more control samples are finally approved. In some embodiments, specific filters or profiles can be established for specific scenarios, such as in instances where it is desirable to develop, optimize and validate a user-modified, custom set of parameters and detection thresholds that is subsequently applied, consistently, to all future samples in the workflow.
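As a non-limiting sketch of the production workflow mentioned above, access to analysis samples can be withheld until every control sample in the same batch is finally approved. The dictionary keys, the status strings, and the function name below are assumptions made for illustration.

```python
def accessible_analysis_samples(batch):
    """Return analysis sample ids, but only once all controls are finally approved.

    Each entry in batch is a dict with the assumed keys 'id',
    'kind' ('control' or 'analysis'), and 'status'.
    """
    controls = [s for s in batch if s["kind"] == "control"]
    controls_approved = all(s["status"] == "final_approval" for s in controls)
    if not controls_approved:
        return []  # analysis samples stay locked
    return [s["id"] for s in batch if s["kind"] == "analysis"]


# Hypothetical batch with one control still awaiting final approval.
batch = [
    {"id": "positive-control", "kind": "control", "status": "final_approval"},
    {"id": "negative-control", "kind": "control", "status": "first_approval"},
    {"id": "patient-001", "kind": "analysis", "status": "pending"},
]
locked = accessible_analysis_samples(batch)

# After the remaining control is finally approved, the analysis sample unlocks.
batch[1]["status"] = "final_approval"
unlocked = accessible_analysis_samples(batch)
print(locked, unlocked)
```

Note that a batch with no control samples is treated as unlocked in this sketch; an embodiment that requires at least one approved control would add an explicit check for that case.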
The systems and methods disclosed herein further include using the review and visualization tool to generate a report (e.g., a diagnostic report). In some embodiments, the report is generated as a printable document (e.g., a PDF). In some embodiments, the report is generated as an email that can be sent to, for example, a patient, a medical practitioner, and/or a clinical institution. As with the customization of the display, additional elements that can be customized include the specific parameters, metrics, and/or results to be included in the report (e.g., sequencing validation, sequencing statistics, mapping validation, mapping statistics, list of detected microorganisms, microbe-specific annotations, pathogen status, presence or absence of antimicrobial resistance (AMR) genes, antimicrobial resistance (AMR) gene annotations, and/or therapeutic treatments based on any of the above results or any combinations thereof).
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
As used herein, the term “subject” refers to any living or non-living organism including, but not limited to, a human (e.g., a male human, female human, fetus, pregnant female, child, or the like), a non-human mammal, or a non-human animal. Any human or non-human animal can serve as a subject, including but not limited to mammal, reptile, avian, amphibian, fish, ungulate, ruminant, bovine (e.g., cattle), equine (e.g., horse), caprine and ovine (e.g., sheep, goat), swine (e.g., pig), camelid (e.g., camel, llama, alpaca), monkey, ape (e.g., gorilla, chimpanzee), ursid (e.g., bear), poultry, dog, cat, mouse, rat, dolphin, whale and shark. In some embodiments, a subject is a male or female of any age (e.g., a man, a woman, or a child).
As used herein, the term “microorganism,” or “microbe,” refers to a microscopic organism. In some embodiments, the term “microorganism” will be understood to include bacteria, fungi, protozoa (e.g., protozoan parasites), viruses (e.g., DNA viruses and/or RNA viruses), algae, archaea, phages, and/or helminths (e.g., multicellular eukaryotic parasites). In some embodiments, a microorganism is a single-celled organism and/or a colony of single-celled organisms. In some embodiments, a microorganism is eukaryotic or prokaryotic. In some embodiments, a microorganism is a pathogen (e.g., disease-causing), such as a human, animal, or plant-infective pathogen.
Examples of bacteria include, but are not limited to, disease-causing agents such as Acinetobacter baumannii, Actinobacillus sp., Actinomycetes, Actinomyces sp. (such as Actinomyces israelii and Actinomyces naeslundii), Aeromonas sp. (such as Aeromonas hydrophila, Aeromonas veronii biovar sobria (Aeromonas sobria), and Aeromonas caviae), Anaplasma phagocytophilum, Anaplasma marginale, Alcaligenes xylosoxidans, Actinobacillus actinomycetemcomitans, Bacillus sp. (such as Bacillus anthracis, Bacillus cereus, Bacillus subtilis, Bacillus thuringiensis, and Bacillus stearothermophilus), Bacteroides sp. (such as Bacteroides fragilis), Bartonella sp. (such as Bartonella bacilliformis and Bartonella henselae), Bifidobacterium sp., Bordetella sp. (such as Bordetella pertussis, Bordetella parapertussis, and Bordetella bronchiseptica), Borrelia sp. (such as Borrelia recurrentis and Borrelia burgdorferi), Brucella sp. (such as Brucella abortus, Brucella canis, Brucella melitensis, and Brucella suis), Burkholderia sp. (such as Burkholderia pseudomallei and Burkholderia cepacia), Campylobacter sp. (such as Campylobacter jejuni, Campylobacter coli, Campylobacter lari and Campylobacter fetus), Capnocytophaga sp., Cardiobacterium hominis, Chlamydia trachomatis, Chlamydophila pneumoniae, Chlamydophila psittaci, Citrobacter sp., Coxiella burnetii, Corynebacterium sp. (such as Corynebacterium diphtheriae, Corynebacterium jeikeium and Corynebacterium), Clostridium sp. (such as Clostridium perfringens, Clostridium difficile, Clostridium botulinum and Clostridium tetani), Eikenella corrodens, Enterobacter sp. (such as Enterobacter aerogenes, Enterobacter agglomerans, Enterobacter cloacae and Escherichia coli, including opportunistic Escherichia coli, such as enterotoxigenic E. coli, enteroinvasive E. coli, enteropathogenic E. coli, enterohemorrhagic E. coli, enteroaggregative E. coli and uropathogenic E. coli), Enterococcus sp. 
(such as Enterococcus faecalis and Enterococcus faecium), Ehrlichia sp. (such as Ehrlichia chaffeensis and Ehrlichia canis), Epidermophyton floccosum, Erysipelothrix rhusiopathiae, Eubacterium sp., Francisella tularensis, Fusobacterium nucleatum, Gardnerella vaginalis, Gemella morbillorum, Haemophilus sp. (such as Haemophilus influenzae, Haemophilus ducreyi, Haemophilus aegyptius, Haemophilus parainfluenzae, Haemophilus haemolyticus and Haemophilus parahaemolyticus), Helicobacter sp. (such as Helicobacter pylori, Helicobacter cinaedi and Helicobacter fennelliae), Kingella kingae, Klebsiella sp. (such as Klebsiella pneumoniae, Klebsiella granulomatis and Klebsiella oxytoca), Lactobacillus sp., Listeria monocytogenes, Leptospira interrogans, Legionella pneumophila, Peptostreptococcus sp., Mannheimia haemolytica, Microsporum canis, Moraxella catarrhalis, Morganella sp., Mobiluncus sp., Micrococcus sp., Mycobacterium sp. (such as Mycobacterium leprae, Mycobacterium tuberculosis, Mycobacterium paratuberculosis, Mycobacterium intracellulare, Mycobacterium avium, Mycobacterium bovis, and Mycobacterium marinum), Mycoplasma sp. (such as Mycoplasma pneumoniae, Mycoplasma hominis, and Mycoplasma genitalium), Nocardia sp. (such as Nocardia asteroides, Nocardia cyriacigeorgica and Nocardia brasiliensis), Neisseria sp. (such as Neisseria gonorrhoeae and Neisseria meningitidis), Pasteurella multocida, Pityrosporum orbiculare (Malassezia furfur), Plesiomonas shigelloides, Prevotella sp., Porphyromonas sp., Prevotella melaninogenica, Proteus sp. (such as Proteus vulgaris and Proteus mirabilis), Providencia sp. (such as Providencia alcalifaciens, Providencia rettgeri and Providencia stuartii), Pseudomonas aeruginosa, Propionibacterium acnes, Rhodococcus equi, Rickettsia sp.
(such as Rickettsia rickettsii, Rickettsia akari, Rickettsia prowazekii, Orientia tsutsugamushi (formerly: Rickettsia tsutsugamushi) and Rickettsia typhi), Rhodococcus sp., Serratia marcescens, Stenotrophomonas maltophilia, Salmonella sp. (such as Salmonella enterica, Salmonella typhi, Salmonella paratyphi, Salmonella enteritidis, Salmonella choleraesuis and Salmonella typhimurium), Serratia sp. (such as Serratia marcescens and Serratia liquefaciens), Shigella sp. (such as Shigella dysenteriae, Shigella flexneri, Shigella boydii and Shigella sonnei), Staphylococcus sp. (such as Staphylococcus aureus, Staphylococcus epidermidis, Staphylococcus haemolyticus, and Staphylococcus saprophyticus), Streptococcus sp. (such as Streptococcus pneumoniae (for example chloramphenicol-resistant serotype 4 Streptococcus pneumoniae, spectinomycin-resistant serotype 6B Streptococcus pneumoniae, streptomycin-resistant serotype 9V Streptococcus pneumoniae, erythromycin-resistant serotype 14 Streptococcus pneumoniae, optochin-resistant serotype 14 Streptococcus pneumoniae, rifampicin-resistant serotype 18C Streptococcus pneumoniae, tetracycline-resistant serotype 19F Streptococcus pneumoniae, penicillin-resistant serotype 19F Streptococcus pneumoniae, and trimethoprim-resistant serotype 23F Streptococcus pneumoniae), Streptococcus agalactiae, Streptococcus mutans, Streptococcus pyogenes, Group A streptococci, Streptococcus pyogenes, Group B1 streptococci, Streptococcus agalactiae, Group C streptococci, Streptococcus anginosus, Streptococcus equisimilis, Group D
streptococci, Streptococcus bovis, Group F streptococci, Streptococcus anginosus, and Group G streptococci), Spirillum minus, Streptobacillus moniliformis, Treponema sp. (such as Treponema carateum, Treponema pertenue, Treponema pallidum and Treponema endemicum), Trichophyton rubrum, T. mentagrophytes, Tropheryma whipplei, Ureaplasma urealyticum, Veillonella sp., Vibrio sp. (such as Vibrio cholerae, Vibrio parahaemolyticus, Vibrio vulnificus, Vibrio alginolyticus, Vibrio mimicus, Vibrio hollisae, Vibrio fluvialis, Vibrio metschnikovii, Vibrio damsela and Vibrio furnissii), Yersinia sp. (such as Yersinia enterocolitica, Yersinia pestis, and Yersinia pseudotuberculosis) and Xanthomonas maltophilia.
Examples of fungi include, but are not limited to, Aspergillus sp., Candida auris, Candida albicans, Candida dubliniensis, Candida famata, Candida glabrata, Candida guilliermondii, Candida kefyr, Candida lusitaniae, Candida krusei, Candida parapsilosis, Candida tropicalis, Cryptococcus gattii, Cryptococcus neoformans, Fusarium sp., Malassezia furfur, Rhodotorula sp., Trichosporon sp., Histoplasma capsulatum, Coccidioides immitis, and Pneumocystis carinii, as well as the causative agents of aspergillosis, blastomycosis, candidiasis, coccidioidomycosis, fungal eye infections, fungal nail infections, histoplasmosis, mucormycosis, mycetoma, Pneumocystis pneumonia, ringworm, sporotrichosis, cryptococcosis, and talaromycosis.
Examples of protozoan parasites include, but are not limited to, Plasmodium falciparum, P. vivax, P. ovale, P. malariae, P. berghei, Leishmania donovani, L. infantum, L. chagasi, L. mexicana, L. amazonensis, L. venezuelensis, L. tropica, L. major, L. minor, L. aethiopica, L. (Viannia) braziliensis, L. (V.) guyanensis, L. (V.) panamensis, L. (V.) peruviana, Trypanosoma brucei rhodesiense, T. brucei gambiense, T. cruzi, Giardia intestinalis, G. lamblia, Toxoplasma gondii, Entamoeba histolytica, Trichomonas vaginalis, Pneumocystis carinii, and Cryptosporidium parvum.
Examples of helminths include, but are not limited to, Filarioidea sp., Wuchereria sp. (such as Wuchereria bancrofti), Brugia sp. (such as Brugia malayi and Brugia timori), Loa sp. (such as Loa loa), Mansonella sp. (such as Mansonella streptocerca, Mansonella perstans, and Mansonella ozzardi), Onchocerca sp. (such as Onchocerca volvulus), Enterobius vermicularis, Ascaris sp. (such as Ascaris lumbricoides), Dracunculus sp. (such as Dracunculus medinensis), Ancylostoma sp. (such as Ancylostoma duodenale, Ancylostoma braziliense, Ancylostoma tubaeforme, and Ancylostoma caninum), Necator sp. (such as Necator americanus), Trichuris sp. (such as Trichuris trichiura, Trichuris vulpis, Trichuris campanula, Trichuris suis, and Trichuris muris), Strongyloides sp. (such as Strongyloides stercoralis, Strongyloides canis, Strongyloides fuelleborni, Strongyloides cebus, and Strongyloides kellyi), Nematodirus sp., Moniezia sp., Oesophagostomum sp. (such as Oesophagostomum bifurcum, Oesophagostomum aculeatum, Oesophagostomum brumpti, Oesophagostomum stephanostomum, and Oesophagostomum stephanostomum var. thomasi), Cooperia sp. (such as Cooperia ostertagi and Cooperia oncophora), Haemonchus sp., Ostertagia sp. (such as Ostertagia ostertagi), Trichostrongylus sp. (such as Trichostrongylus axei), Dirofilaria sp. (such as Dirofilaria immitis, Dirofilaria tenuis and Dirofilaria repens), and Schistosoma sp. (such as Schistosoma incognitum, Schistosoma ovuncatum, Schistosoma sinensium, Schistosoma indicum, Schistosoma nasale, Schistosoma spindale, Schistosoma japonicum, Schistosoma malayensis, Schistosoma mekongi, Schistosoma haematobium, Schistosoma bovis, Schistosoma curassoni, Schistosoma guineensis, Schistosoma intercalatum, Schistosoma leiperi, Schistosoma margrebowiei, Schistosoma mattheei, Schistosoma mansoni, Schistosoma edwardiense, Schistosoma hippopotami, and Schistosoma rodhaini).
Examples of viruses include, but are not limited to, disease-causing agents such as Adeno-associated virus, Aichi virus, Australian bat lyssavirus, BK polyomavirus, Banna virus, Barmah Forest virus, Bunyamwera virus, Bunyavirus La Crosse, Bunyavirus snowshoe hare, Cercopithecine herpesvirus, Chandipura virus, Chikungunya virus, Coronavirus, Cosavirus A, Cowpox virus, Coxsackievirus, Crimean-Congo hemorrhagic fever virus, Dengue virus, Dhori virus, Dugbe virus, Duvenhage virus, Eastern equine encephalitis virus, Ebolavirus, Echovirus, Encephalomyocarditis virus, Epstein-Barr virus, European bat lyssavirus, GB virus C/Hepatitis G virus, Hantaan virus, Hendra virus, Hepatitis A virus, Hepatitis B virus, Hepatitis C virus, Hepatitis E virus, Hepatitis delta virus, Horsepox virus, Human adenovirus, Human astrovirus, Human coronavirus, Human cytomegalovirus, Human enterovirus 68, 70, Human herpesvirus 1, Human herpesvirus 2, Human herpesvirus 6, Human herpesvirus 7, Human herpesvirus 8, Human immunodeficiency virus, Human papillomavirus 1, Human papillomavirus 2, Human papillomavirus 16, 18, Human parainfluenza, Human parvovirus B19, Human respiratory syncytial virus, Human rhinovirus, Human SARS coronavirus, Human spumaretrovirus, Human T-lymphotropic virus, Human torovirus, Influenza A virus, Influenza B virus, Influenza C virus, Isfahan virus, JC polyomavirus, Japanese encephalitis virus, Junin arenavirus, KI Polyomavirus, Kunjin virus, Lagos bat virus, Lake Victoria Marburgvirus, Langat virus, Lassa virus, Lordsdale virus, Louping ill virus, Lymphocytic choriomeningitis virus, Machupo virus, Mayaro virus, MERS coronavirus, Measles virus, Mengo encephalomyocarditis virus, Merkel cell polyomavirus, Mokola virus, Molluscum contagiosum virus, Monkeypox virus, Mumps virus, Murray Valley encephalitis virus, New York virus, Nipah virus, Norwalk virus, Norovirus, O'nyong-nyong virus, Orf virus, Oropouche virus, Pichinde virus, Poliovirus,
Punta Toro phlebovirus, Puumala virus, Rabies virus, Rift Valley fever virus, Rosavirus A, Ross River virus, Rotavirus A, Rotavirus B, Rotavirus C, Rubella virus, Sagiyama virus, Salivirus A, Sandfly fever Sicilian virus, Sapporo virus, Semliki Forest virus, Seoul virus, Severe acute respiratory syndrome coronavirus 2, Simian foamy virus, Simian virus 5, Sindbis virus, Southampton virus, St. Louis encephalitis virus, Tick-borne Powassan virus, Torque teno virus, Toscana virus, Uukuniemi virus, Vaccinia virus, Varicella-zoster virus, Variola virus, Venezuelan equine encephalitis virus, Vesicular stomatitis virus, Western equine encephalitis virus, WU polyomavirus, West Nile virus, Yaba monkey tumor virus, Yaba-like disease virus, Yellow fever virus, and Zika virus.
In some embodiments, the term “microorganism” will be understood to include any one or more bacteria, fungi, protozoa, viruses, algae, archaea, phages, and/or helminths selected from a database (e.g., a microbial genome database, a transcriptomic database, a proteomic database, a metabolomics database, a taxonomic database, and/or a clinical database). In some embodiments, the database comprises one or more entries corresponding to and/or identifying a microorganism (e.g., an annotation, for a respective microorganism, to a genome, transcriptome, nucleic acid sequence, protein sequence, metabolite, taxonomic record and/or clinical record). In some embodiments, a microorganism is selected from a database that is locally maintained, proprietary, and/or open-access. In some embodiments, a microorganism is selected from a national and/or international database. Examples of such databases include, but are not limited to, NCBI, BLAST, EMBL-EBI, GenBank, Ensembl, EuPathDB, The Human Microbiome Project, Pathogen Portal, RDP, SILVA, GREENGENES, EBI Metagenomics, EcoCyc, PATRIC, TBDB, PlasmoDB, the Microbial Genome Database (MBGD), and/or the Microbial Rosetta Stone Database. For example, MBGD comprises all complete genome sequences of bacteria, archaea, and unicellular eukaryotes, including fungi and protozoa, available at the NCBI genomes site. The Microbial Rosetta Stone is a database that provides information on disease-causing organisms (e.g., bacteria, fungi, protozoa, DNA viruses, RNA viruses, plants, and animals) and the toxins produced therefrom. See Zhulin, 2015, “Databases for Microbiologists,” J Bacteriol 197:2458-2467, doi: 10.1128/JB.00330-15; Uchiyama et al., 2019, “MBGD update 2018: microbial genome database based on hierarchical orthology relations covering closely related and distantly related comparisons,” Nuc Acids Res. 47(D1): D382-D389, doi: 10.1093/nar/gky1054; and Ecker et al., 2005, “The Microbial Rosetta Stone Database: A compilation of global and emerging infectious microorganisms and bioterrorist threat agents,” BMC Microbiology 5, 19, doi: 10.1186/1471-2180-5-19; each of which is hereby incorporated by reference herein in its entirety.
As used herein, the term “antimicrobial resistance marker” or “AMR marker” refers to a measurable and/or detectable marker indicating that a respective microorganism has antimicrobial resistance. As used herein, the term “antimicrobial resistance” refers to a property of or exhibited by a respective microorganism, such that the respective microorganism is resistant to one or more antimicrobial interventions (e.g., where an effect of an antimicrobial intervention is attenuated, obstructed, or negated). As used herein, the term “antimicrobial susceptibility” refers to a property of or exhibited by a respective microorganism, such that the respective microorganism is susceptible to one or more antimicrobial interventions (e.g., where an effect of an antimicrobial intervention serves to kill, diminish, slow or prevent growth in one or a population of microorganisms).
In some embodiments, antimicrobial resistance is conferred by a genetic sequence (e.g., an antimicrobial resistance gene). In some embodiments, the antimicrobial resistance marker is a genetic marker (e.g., a nucleic acid sequence for the antimicrobial resistance gene indicating that the gene comprises a mutation that confers resistance). In some embodiments, the antimicrobial resistance marker is a restriction fragment length polymorphism (RFLP), a random amplified polymorphic DNA (RAPD), an amplified fragment length polymorphism (AFLP), a variable number tandem repeat (VNTR), an oligonucleotide polymorphism (OP), a single nucleotide polymorphism (SNP), an allele specific associated primer (ASAP), an inverse sequence-tagged repeat (ISTR), an inter-retrotransposon amplified polymorphism (IRAP), and/or a simple sequence repeat (SSR or microsatellite). In some embodiments, an antimicrobial resistance marker is detected based on a mapping (e.g., an alignment) of one or more nucleotide sequences to a reference sequence (e.g., a reference genome). In some embodiments, an antimicrobial resistance marker is an amino acid sequence and/or an amino acid residue. In some embodiments, an antimicrobial resistance marker is a biochemical marker.
In some embodiments, an antimicrobial resistance marker indicates that a respective microorganism is resistant to one or more interventions for a corresponding type of microorganism (e.g., antibacterial resistance, antiprotozoal resistance, antifungal resistance, antihelminthic resistance, and/or antiviral resistance). For example, in some embodiments, an antimicrobial intervention is a drug that targets a specific gene in a respective microorganism, and a mutation in the gene renders the microorganism resistant to the drug. In some such embodiments, an antimicrobial resistance marker can be a genetic marker for the target gene that indicates a resistance to the antimicrobial drug.
As used herein, the term “antimicrobial resistance status” refers to an indication of a presence or absence of an antimicrobial resistance marker. For example, the term antimicrobial resistance status or AMR status will be understood to include an indication that a respective biological and/or non-biological sample and/or a microorganism detected in a sample has either antimicrobial resistance or antimicrobial susceptibility. In some embodiments, an antimicrobial resistance status includes an indication that an antimicrobial resistance marker is present (e.g., has been detected) in the respective sample and/or microorganism. In some embodiments, an antimicrobial resistance status includes an indication of any one or more features for the respective antimicrobial resistance marker (e.g., gene identifier, gene name, intervention (drug) information, intervention (drug) classes, associated organisms, gene families, and/or resistance mechanisms).
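As a non-limiting illustration, an antimicrobial resistance status of the kind described above can be represented as a presence/absence indication per marker, together with descriptive features for each marker. The following Python sketch is provided under stated assumptions: the `AmrMarker` record, its field names, the `amr_status` function, and the marker identifiers are all hypothetical and are not drawn from any particular database or embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmrMarker:
    """Hypothetical record holding descriptive features for one AMR marker."""
    gene_id: str
    gene_name: str
    drug_classes: tuple[str, ...] = ()
    resistance_mechanism: str = ""


def amr_status(detected_ids: set[str], catalog: list[AmrMarker]) -> dict[str, bool]:
    """Return, for each catalog marker, True if present in the sample, else False."""
    return {marker.gene_id: marker.gene_id in detected_ids for marker in catalog}


# Placeholder catalog entries (illustrative names only, not real genes).
catalog = [
    AmrMarker("marker_A", "exampleA", ("beta-lactam",), "enzymatic inactivation"),
    AmrMarker("marker_B", "exampleB", ("tetracycline",), "efflux"),
]

# Suppose only marker_A was detected in the sample.
status = amr_status({"marker_A"}, catalog)
```

In this sketch, a marker that is absent from the detected set is reported as `False`, which corresponds to the "absence" branch of the definition rather than to any affirmative finding of susceptibility.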
In some embodiments, an antimicrobial resistance marker is associated with one or more microorganisms in a plurality of microorganisms (e.g., where the respective microorganism has been reported or annotated as expressing the respective antimicrobial resistance marker). In some embodiments, a first antimicrobial resistance marker is associated with a first respective microorganism in a plurality of microorganisms, and a second antimicrobial resistance marker is associated with a second respective microorganism, other than the first microorganism, in the plurality of microorganisms.
Examples of antimicrobial resistance markers (e.g., genes and/or amino acid residues) include, but are not limited to, the antimicrobial resistance markers listed below in Table 1.
See, for example, Capela et al., 2019, “An Overview of Drug Resistance in Protozoal Diseases,” Int J Mol Sci. 20(22): 5748; doi: 10.3390/ijms20225748; Beech et al., 2011, “Anthelmintic resistance: markers for resistance, or susceptibility?” Parasitology 138(2): 160-174; doi: 10.1017/S0031182010001198; and Toledo-Rueda et al., 2018, “Antiviral resistance markers in influenza virus sequences in Mexico, 2000-2017,” Infect Drug Resist 11: 1751-1756; doi: 10.2147/IDR.S153154; each of which is hereby incorporated herein by reference in its entirety.
In some embodiments, the term “antimicrobial resistance marker” will be understood to include any one or more genes, amino acid sequences, amino acid residues, genetic markers, and/or biochemical markers selected from a database. In some embodiments, an antimicrobial resistance marker is selected from a database that is one or more of locally maintained, proprietary, and/or open-access. In some embodiments, an antimicrobial resistance marker is selected from a national and/or international database. Examples of such databases include, but are not limited to, the National Database of Antibiotic Resistant Organisms (NDARO), the Comprehensive Antibiotic Resistance Database (CARD), ResFinder, PointFinder, ARG-ANNOT, ARGs-OSP, PlasmoDB, the Mycology Antifungal Resistance Database (MARDy), DBDiaSNP, the HIV Drug Resistance Database, the Virus Pathogen Resource (ViPR), and/or any of the databases used for selecting one or more microorganisms, as disclosed above. See, for example, McArthur et al., 2013, “The Comprehensive Antibiotic Resistance Database,” Antimicrob Ag Chemother 57(7): 3348-3357; doi: 10.1128/AAC.00419-13; Zankari et al., 2017, “PointFinder: a novel web tool for WGS-based detection of antimicrobial resistance associated with chromosomal point mutations in bacterial pathogens,” J Antimicrob Chemother 72(10): 2764-2768; doi: 10.1093/jac/dkx217; Gupta et al., 2013, “ARG-ANNOT, a New Bioinformatic Tool To Discover Antibiotic Resistance Genes in Bacterial Genomes,” Antimicrob Ag Chemother 58(1): 212-220; doi: 10.1128/AAC.01310-13; Zhang et al., “ARGs-OSP: online searching platform for antibiotic resistance genes distribution in metagenomic database and bacterial whole genome database,” bioRxiv 337675; doi: 10.1101/337675; Nash et al., 2018, “MARDy: Mycology Antifungal Resistance Database,” Bioinformatics 34(18): 3233-3234; doi: 10.1093/bioinformatics/bty321; and Mehla and Ramana, 2015, “DBDiaSNP: An Open-Source Knowledgebase of Genetic Polymorphisms and Resistance Genes Related to Diarrheal Pathogens,” OMICS 19(6): 354-360; doi: 10.1089/omi.2015.0030; each of which is hereby incorporated herein by reference in its entirety.
As used herein, the terms “sample,” “biological sample,” “patient sample,” or “analysis sample” refer to any sample taken from a biological or non-biological subject and/or source, which can reflect a biological or non-biological state associated with the subject and/or source. Examples of biological samples include, but are not limited to, blood, whole blood, plasma, serum, urine, cerebrospinal fluid, feces, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of the subject. In some embodiments, the biological sample consists of blood, whole blood, plasma, serum, urine, cerebrospinal fluid, feces, saliva, sweat, tears, pleural fluid, pericardial fluid, or peritoneal fluid of the subject. A biological sample can include any tissue or material derived from a living or dead subject. A sample can be a liquid sample or a solid sample (e.g., a cell or tissue sample). A biological sample can be a cell-free sample. A biological sample can comprise a nucleic acid (e.g., DNA or RNA) or a fragment thereof. The term “nucleic acid” can refer to deoxyribonucleic acid (DNA), ribonucleic acid (RNA) or any hybrid or fragment thereof. The nucleic acid in the sample can be a cell-free nucleic acid. A biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, fluid from a hydrocele (e.g., of the testis), vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, discharge fluid from the nipple, aspiration fluid from different parts of the body (e.g., thyroid, breast), etc. A biological sample can be a stool sample. A biological sample can be treated to physically disrupt tissue or cell structure (e.g., centrifugation and/or cell lysis), thus releasing intracellular components into a solution which can further contain enzymes, buffers, salts, detergents, and the like which can be used to prepare the sample for analysis.
A biological sample can be obtained from a subject invasively (e.g., surgical means) or non-invasively (e.g., a blood draw, a swab, or collection of a discharged sample). Examples of non-biological samples include, but are not limited to, agricultural samples, environmental samples, laboratory samples, water samples (e.g., from an external, internal, natural, and/or man-made water source), air samples, terrestrial samples, and/or extraterrestrial samples. Non-biological samples can be solid, liquid, and/or gaseous. For example, a non-biological sample can include a frozen sample. Non-biological samples can include by-products (e.g., of industrial, chemical, agricultural, laboratory, and/or food processes). Any other non-biological samples are contemplated, as will be apparent to one skilled in the art.
As used herein, the terms “nucleic acid” and “nucleic acid molecule” are used interchangeably. The terms refer to nucleic acids of any composition form, such as ribonucleic acid (RNA), deoxyribonucleic acid (DNA, e.g., complementary DNA (cDNA), genomic DNA (gDNA) and the like), and/or DNA or RNA analogs (e.g., containing base analogs, sugar analogs and/or a non-native backbone and the like). In some embodiments, nucleic acids are in single- or double-stranded form. Unless otherwise limited, a nucleic acid can comprise known analogs of natural nucleotides, some of which can function in a similar manner as naturally occurring nucleotides. A nucleic acid can be in any form useful for conducting processes herein (e.g., linear, circular, supercoiled, single-stranded, double-stranded and the like). A nucleic acid, in some embodiments, can be from a single chromosome or fragment thereof (e.g., a nucleic acid sample may be from one chromosome of a sample obtained from a diploid organism). In certain embodiments nucleic acids comprise nucleosomes, fragments or parts of nucleosomes or nucleosome-like structures. Nucleic acids sometimes comprise protein (e.g., histones, DNA binding proteins, and the like). Nucleic acids analyzed by processes described herein sometimes are substantially isolated and are not substantially associated with protein or other molecules. Nucleic acids also include derivatives, variants and analogs of DNA synthesized, replicated or amplified from single-stranded (“sense” or “antisense,” “plus” strand or “minus” strand, “forward” reading frame or “reverse” reading frame) and double-stranded polynucleotides. Deoxyribonucleotides include deoxyadenosine, deoxycytidine, deoxyguanosine and deoxythymidine. A nucleic acid may be prepared using a nucleic acid obtained from a subject as a template.
As used herein, the terms “sequencing,” “sequencing reaction,” and the like refer to any biochemical processes that may be used to determine the order of monomers (e.g., nucleotides or amino acids) in biological macromolecules such as nucleic acids or proteins. For example, sequencing data can include all or a portion of the nucleotide bases in a nucleic acid molecule such as an mRNA transcript, a DNA fragment and/or a genomic locus.
As used herein, the term “NGS ID” refers to the use of enrichment-independent and/or enrichment-based sequencing (e.g., next-generation sequencing (NGS)), to detect, measure, and/or profile one or more nucleic acid molecules obtained from one or more microorganisms and/or hosts. In some embodiments, the nucleic acids correspond to markers (e.g., AMR markers). In some embodiments, NGS ID further includes determining the role of microbial and host markers on health, infectious diseases, and/or other diseases.
As used herein, the term “nucleotide sequences,” “sequence reads,” “sequencing reads,” or “reads” refers to nucleotide base sequences produced by any nucleic acid sequencing process described herein or known in the art. Nucleotide sequences can be generated from one end of nucleic acid fragments (e.g., “single-end reads”) or from both ends of nucleic acid fragments (e.g., paired-end reads, double-end reads). The length of the nucleotide sequence is often associated with the particular sequencing technology. High-throughput methods, for example, provide nucleotide sequences that can vary in size from tens to hundreds of base pairs (bp). In some embodiments, the nucleotide sequences are of a mean, median or average length of about 15 bp to about 900 bp (e.g., about 20 bp, about 25 bp, about 30 bp, about 35 bp, about 40 bp, about 45 bp, about 50 bp, about 55 bp, about 60 bp, about 65 bp, about 70 bp, about 75 bp, about 80 bp, about 85 bp, about 90 bp, about 95 bp, about 100 bp, about 110 bp, about 120 bp, about 130 bp, about 140 bp, about 150 bp, about 200 bp, about 250 bp, about 300 bp, about 350 bp, about 400 bp, about 450 bp, or about 500 bp). In some embodiments, the nucleotide sequences are of a mean, median or average length of about 1000 bp, 2000 bp, 5000 bp, 10,000 bp, or 50,000 bp or more. Nanopore® sequencing, for example, can provide nucleotide sequences that can vary in size from tens to hundreds to thousands of base pairs. Illumina® parallel sequencing, for example, can provide nucleotide sequences that do not vary as much, where, for example, most of the nucleotide sequences can be smaller than 200 bp. A nucleotide sequence can refer to sequence information corresponding to a nucleic acid molecule (e.g., a string of nucleotides).
For example, a nucleotide sequence can correspond to a string of nucleotides (e.g., about 20 to about 150) from part of a nucleic acid fragment, can correspond to a string of nucleotides at one or both ends of a nucleic acid fragment, or can correspond to nucleotides of the entire nucleic acid fragment. A nucleotide sequence can be obtained in a variety of ways, e.g., using sequencing techniques or using probes, e.g., in hybridization arrays or capture probes, or amplification techniques, such as the polymerase chain reaction (PCR) or linear amplification using a single primer or isothermal amplification.
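By way of illustration only, the mean and median lengths referred to above can be computed directly from a collection of reads. The following Python sketch assumes that each read is available as a plain string of nucleotide characters; the function name `read_length_summary` and the example reads are hypothetical.

```python
from statistics import mean, median


def read_length_summary(reads: list[str]) -> dict[str, float]:
    """Return the mean and median read length, in bases, for a list of reads."""
    lengths = [len(read) for read in reads]
    return {"mean": mean(lengths), "median": median(lengths)}


# Three toy reads of 20, 40, and 120 bases.
reads = ["ACGT" * 5, "ACGT" * 10, "ACGT" * 30]
summary = read_length_summary(reads)
```

For these toy reads the mean (60 bases) exceeds the median (40 bases), which illustrates why a single long read can shift the mean of a length distribution while leaving the median unchanged.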
As used herein, the term “nucleotide sequence count,” “sequence read count,” or “read count” refers to the total number of nucleic acid reads generated, which may or may not be equivalent to the number of nucleic acid molecules generated, during a nucleic acid sequencing reaction. In some embodiments, a nucleotide sequence count refers to a count of nucleotide sequences in the plurality of nucleotide sequences that map (e.g., align) to a corresponding reference sequence (e.g., complete and/or incomplete genome) for a respective microorganism. In some embodiments, a nucleotide sequence count refers to a count of unique nucleotide sequences in the plurality of nucleotide sequences that map to a corresponding reference sequence (e.g., complete and/or incomplete genome) for a respective microorganism. In some embodiments, a nucleotide sequence count refers to a count of nucleotide sequences in the plurality of nucleotide sequences that satisfy a criterion, such as a pre-processing criterion, a mapping statistic threshold (e.g., an alignment identity threshold), and/or a sequencing statistic threshold.
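The alternative definitions of a nucleotide sequence count given above (all mapped reads, unique mapped reads, or reads satisfying a threshold) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Alignment` record, its fields, and the `read_counts` function are hypothetical, and alignment identity is assumed to be supplied as a fraction between 0 and 1 by an upstream mapping step that is not shown.

```python
from collections import Counter
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one read mapped to one reference sequence."""
    read_sequence: str
    reference_id: str
    identity: float  # fraction of aligned bases matching the reference


def read_counts(alignments, min_identity=0.0, unique=False):
    """Count mapped reads per reference.

    Reads below min_identity are skipped. If unique is True, identical read
    sequences mapping to the same reference are counted only once.
    """
    seen = set()
    counts = Counter()
    for aln in alignments:
        if aln.identity < min_identity:
            continue
        key = (aln.reference_id, aln.read_sequence)
        if unique and key in seen:
            continue
        seen.add(key)
        counts[aln.reference_id] += 1
    return counts


alignments = [
    Alignment("ACGT", "org1", 0.99),
    Alignment("ACGT", "org1", 0.99),  # duplicate of the first read
    Alignment("TTGA", "org1", 0.80),  # low-identity alignment
    Alignment("GGCC", "org2", 0.95),
]

all_counts = read_counts(alignments)
filtered_counts = read_counts(alignments, min_identity=0.9)
unique_counts = read_counts(alignments, min_identity=0.9, unique=True)
```

The same input thus yields three different counts for `org1` (3, 2, and 1), depending on which definition of the count is applied.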
As used herein, the term “depth,” “read depth,” or “sequencing depth” refers to a total number of unique nucleic acid fragments encompassing a particular locus or region of the reference sequence (e.g., complete and/or incomplete genome) of a subject that are sequenced in a particular sequencing reaction. Sequencing depth can be expressed as “Y×”, e.g., 50×, 100×, etc., where “Y” refers to the number of unique nucleic acid fragments encompassing a particular locus that are sequenced in a sequencing reaction. In such a case, Y is an integer, because it represents the actual sequencing depth for a particular locus. Sequencing depth can also be applied to multiple loci, or a whole genome or reference sequence, in which case Y can refer to the mean or average number of times a locus or a haploid genome, or a whole genome or reference sequence, respectively, is sequenced. Alternatively, depth, read-depth, or sequencing depth can refer to a measure of central tendency (e.g., a mean or mode) of the number of unique nucleic acid fragments that encompass one of a plurality of loci or regions of the genome or reference sequence of a subject that are sequenced in a particular sequencing reaction. For example, in some embodiments, sequencing depth refers to the average depth of every locus across an arm of a chromosome, a targeted sequencing panel, an exome, or an entire genome or reference sequence. In such case, Y may be expressed as a fraction or a decimal, because it refers to an average depth across a plurality of loci. When a mean depth is recited, the actual depth for any particular locus may be different than the overall recited depth. Metrics can be determined that provide a range of sequencing depths in which a defined percentage of the total number of loci fall, for instance, a range of sequencing depths within which 90%, 95%, or 99% of the loci fall. As understood by the skilled artisan, different sequencing technologies provide different sequencing depths.
For instance, low-pass whole genome sequencing can refer to technologies that provide a sequencing depth of less than 5×, less than 4×, less than 3×, or less than 2×, e.g., from about 0.5× to about 3×.
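The distinction drawn above between an integer depth at a single locus and a fractional mean depth across many loci can be illustrated with a short Python sketch. The sketch assumes, for illustration only, that each unique fragment has already been mapped and is represented as a half-open interval `(start, end)` of 0-based reference positions with `0 <= start < end <= reference_length`; the function name `per_locus_depth` is hypothetical.

```python
def per_locus_depth(intervals, reference_length):
    """Return the number of fragments covering each reference position.

    intervals: iterable of (start, end) half-open, 0-based positions.
    Uses a difference array so the cost is linear in fragments plus positions.
    """
    diff = [0] * (reference_length + 1)
    for start, end in intervals:
        diff[start] += 1  # fragment begins covering at start
        diff[end] -= 1    # fragment stops covering at end
    depth = []
    running = 0
    for position in range(reference_length):
        running += diff[position]
        depth.append(running)
    return depth


# Three overlapping fragments on a toy 10-base reference.
fragments = [(0, 6), (2, 8), (4, 10)]
depth = per_locus_depth(fragments, reference_length=10)
mean_depth = sum(depth) / len(depth)
```

Here the depth at each individual position is an integer (between 1× and 3×), whereas the mean depth over the whole toy reference is 1.8×, a decimal value.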
As used herein, the term “coverage” refers to the proportion of a reference sequence (e.g., a complete and/or incomplete reference genome) that is covered by mapped (e.g., aligned) nucleotide sequences. In some embodiments, coverage is a percent coverage of the mapping of a plurality of nucleotide sequences against the respective reference sequence. For instance, in some embodiments, if after mapping of a plurality of nucleotide sequences to a reference sequence, 90% of the reference sequence is covered by mapped (e.g., aligned) reads, then the coverage is 90%.
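The percent coverage described above can be computed by merging the mapped intervals and dividing the covered length by the reference length. The following Python sketch is illustrative only and assumes that mapped reads are given as half-open `(start, end)` intervals of 0-based reference positions; the function name `coverage_fraction` is hypothetical.

```python
def coverage_fraction(intervals, reference_length):
    """Return the fraction of reference positions covered by at least one read."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged interval, if any, and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current one.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / reference_length


# Reads covering positions 0-69 and 80-99 of a toy 100-base reference.
mapped = [(0, 40), (30, 70), (80, 100)]
coverage = coverage_fraction(mapped, reference_length=100)
```

In this toy case 90 of the 100 reference positions are covered, so the coverage is 90%, regardless of how many reads overlap at any given position.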
As used herein, the terms “genome” or “reference genome” refer to any particular known, sequenced or characterized genome, whether partial or complete, of any organism or virus that may be used to reference identified sequences from a subject. Example reference genomes used for human subjects as well as many other organisms are provided in the online genome browser hosted by the National Center for Biotechnology Information (“NCBI”) or the University of California, Santa Cruz (UCSC). A “genome” refers to the complete genetic information of an organism or virus, expressed in nucleic acid sequences. As used herein, a reference sequence or reference genome often is an assembled or partially assembled genomic sequence from an individual or multiple individuals. In some embodiments, a reference genome is an assembled or partially assembled genomic sequence from one or more human individuals. In some embodiments, a reference genome is an assembled or partially assembled genomic sequence from one or more microorganisms of the same species. The reference genome can be viewed as a representative example of a species' set of genes. In some embodiments, a reference genome comprises sequences assigned to chromosomes. Exemplary human reference genomes include but are not limited to NCBI build 34 (UCSC equivalent: hg16), NCBI build 35 (UCSC equivalent: hg17), NCBI build 36.1 (UCSC equivalent: hg18), GRCh37 (UCSC equivalent: hg19), and GRCh38 (UCSC equivalent: hg38).
In some embodiments, a genome is a complete genome. In some embodiments, a genome is an incomplete genome. For example, in some embodiments, an incomplete genome is at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% of the complete genome.
In some embodiments, a complete or incomplete genome is less than 1 megabase pairs (Mb), less than 0.5 Mb, less than 0.4 Mb, less than 0.3 Mb, less than 0.2 Mb, or less than 0.1 Mb. In some embodiments, a complete or incomplete genome is at least 1 Mb, at least 2 Mb, at least 3 Mb, at least 4 Mb, at least 5 Mb, at least 6 Mb, at least 7 Mb, at least 8 Mb, at least 9 Mb, at least 10 Mb, at least 15 Mb, at least 20 Mb, at least 25 Mb, at least 30 Mb, at least 35 Mb, at least 40 Mb, at least 45 Mb, at least 50 Mb, at least 100 Mb, at least 200 Mb, at least 500 Mb, at least 1,000 Mb, at least 2,000 Mb, at least 3,000 Mb, at least 4,000 Mb, at least 5,000 Mb, at least 10 gigabase pairs (Gb), at least 20 Gb, or at least 50 Gb.
In some embodiments, a complete or incomplete genome spans a region of a reference genome comprising at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 100, at least 200, at least 500, at least 1,000, at least 2,000, at least 3,000, at least 4,000, at least 5,000, at least 10,000, or at least 50,000 genes. In some embodiments, a complete or incomplete genome spans a region of a reference genome comprising between 1 and 10, between 10 and 50, between 50 and 100, between 100 and 500, between 500 and 1000, between 1000 and 2000, between 2000 and 5000, between 5000 and 10,000, between 10,000 and 50,000, or more than 50,000 genes.
In some embodiments, a complete or incomplete genome spans a region of a reference genome comprising at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 100, at least 200, or at least 500 antimicrobial resistance markers. In some embodiments, a complete or incomplete genome spans a region of a reference genome comprising between 1 and 10, between 10 and 50, between 50 and 100, or more than 100 antimicrobial resistance markers.
In some embodiments, a complete or incomplete genome is obtained from one or more nucleotide sequence databases and/or microorganism databases, including but not limited to NCBI, BLAST, EMBL-EBI, GenBank, Ensembl, EuPathDB, The Human Microbiome Project, Pathogen Portal, RDP, SILVA, GREENGENES, EBI Metagenomics, EcoCyc, PATRIC, TBDB, PlasmoDB, the Microbial Genome Database (MBGD), and/or the Microbial Rosetta Stone Database. See, for example, Zhulin, 2015, “Databases for Microbiologists,” J Bacteriol 197:2458-2467, doi: 10.1128/JB.00330-15; Uchiyama et al., 2019, “MBGD update 2018: microbial genome database based on hierarchical orthology relations covering closely related and distantly related comparisons,” Nuc Acids Res., 47 (D1), D382-D389, doi: 10.1093/nar/gky1054; and Ecker et al., 2005, “The Microbial Rosetta Stone Database: A compilation of global and emerging infectious microorganisms and bioterrorist threat agents,” BMC Microbiology 5, 19, doi: 10.1186/1471-2180-5-19, each of which is hereby incorporated by reference herein in its entirety.
As used herein, the term “reference sequence” refers to a sequence of nucleotide bases. In some embodiments, a reference sequence is a reference genome. In some embodiments, a reference sequence is a complete or incomplete genome. In some embodiments, a reference sequence is less than 1 megabase pairs (Mb), less than 0.5 Mb, less than 0.4 Mb, less than 0.3 Mb, less than 0.2 Mb, or less than 0.1 Mb in length. In some embodiments, a reference sequence is at least 1 Mb, at least 2 Mb, at least 3 Mb, at least 4 Mb, at least 5 Mb, at least 6 Mb, at least 7 Mb, at least 8 Mb, at least 9 Mb, at least 10 Mb, at least 15 Mb, at least 20 Mb, at least 25 Mb, at least 30 Mb, at least 35 Mb, at least 40 Mb, at least 45 Mb, at least 50 Mb, at least 100 Mb, at least 200 Mb, at least 500 Mb, at least 1,000 Mb, at least 2,000 Mb, at least 3,000 Mb, at least 4,000 Mb, at least 5,000 Mb, at least 10 gigabase pairs (Gb), at least 20 Gb, or at least 50 Gb in length.
In some embodiments, a reference sequence spans a region of a reference genome comprising at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 100, at least 200, at least 500, at least 1,000, at least 2,000, at least 3,000, at least 4,000, at least 5,000, at least 10,000, or at least 50,000 genes. In some embodiments, a reference sequence spans a region of a reference genome comprising between 1 and 10, between 10 and 50, between 50 and 100, between 100 and 500, between 500 and 1000, between 1000 and 2000, between 2000 and 5000, between 5000 and 10,000, between 10,000 and 50,000, or more than 50,000 genes.
In some embodiments, a reference sequence comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 100, at least 200, or at least 500 antimicrobial resistance markers. In some embodiments, a reference sequence comprises between 1 and 10, between 10 and 50, between 50 and 100, or more than 100 antimicrobial resistance markers.
The implementations described herein provide various technical solutions for analysis of the presence of microorganisms in a result set obtained from a sequencing reaction of nucleic acids from a biological or non-biological sample. Examples of such result sets include result sets arising from sample processing, sequencing, taxonomic classification and/or information presentation pipelines as disclosed in U.S. Patent Application No. 62/696,783, entitled “Methods and Systems for Processing Samples,” filed Jul. 11, 2018, PCT Application No. PCT/US2019/060915, entitled “Directional Targeted Sequencing,” filed Nov. 12, 2019, U.S. patent application Ser. No. 15/724,476, entitled “Methods and Systems for Multiple Taxonomic Classification,” filed Oct. 4, 2017, and U.S. Patent Application No. 62/723,384, entitled “Methods and Systems for Providing Sample Information,” filed Aug. 27, 2018, each of which is hereby incorporated by reference. Details of implementations are now described in conjunction with the Figures.
As used herein, the term “k-mer” refers to a subsequence of a given length k within a longer sequence, where k is a positive integer of 2 or greater. In some embodiments, k is between three and one hundred. In some embodiments, k is between four and fifty. In some embodiments, k is between five and forty. In one example, the sequence “AGCTCT” is divided into the 3-nucleotide subsequences “AGC,” “GCT,” “CTC,” and “TCT.” In this example, each of these subsequences is a k-mer, where k=3. K-mers may be overlapping or non-overlapping. In some embodiments, k-mers overlap each other by one residue. K-mers and their use in sequence alignment and mapping are further described in Stokes and Glick, 2006, “MICA: desktop software for comprehensive searching of DNA databases,” BMC Bioinformatics 7:427; Kalafus, 2004, “Pash: Efficient Genome-Scale Sequence Anchoring by Positional Hashing,” Genome Research 14:672-678; and Mann and Noble, “Efficient identification of DNA hybridization partners in a sequence database,” Bioinformatics 14(22), e350-e358, each of which is hereby incorporated by reference.
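As a non-limiting illustration, the division of a sequence into k-mers can be sketched as follows. The function name and the `step` parameter are hypothetical and introduced only for this example; a step of 1 yields the overlapping k-mers of the “AGCTCT” example, and a step equal to k yields non-overlapping k-mers.

```python
def kmers(sequence, k, step=1):
    """Return the k-mers of `sequence`, taken every `step` positions.

    k must be an integer of 2 or greater, consistent with the definition above.
    `step` is a hypothetical parameter for this sketch: step=1 gives maximally
    overlapping k-mers, step=k gives non-overlapping k-mers.
    """
    if k < 2:
        raise ValueError("k must be 2 or greater")
    if step < 1:
        raise ValueError("step must be 1 or greater")
    # range() is empty when the sequence is shorter than k, so the result is [].
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, step)]


print(kmers("AGCTCT", 3))          # ['AGC', 'GCT', 'CTC', 'TCT']
print(kmers("AGCTCT", 3, step=3))  # ['AGC', 'TCT']
```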
In some embodiments, the plurality of parameters 140 in the first customizable diagnostic template includes a minimum mapping threshold for the mapping of the plurality of nucleotide sequences to the reference sequence (e.g., genome), for each respective microorganism in the set of microorganisms. In some embodiments, the review module and/or the summarization module is customizable via a customizable user interface. In some such embodiments, the customizable user interface comprises a customizable microorganism detection quantification construct, a customizable detection threshold filter, and/or a customizable quality control filter, among others.
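For illustration only, a per-microorganism minimum mapping threshold of this kind might be applied as sketched below. The sketch assumes that the threshold is expressed as a percent coverage of the respective reference sequence; the function name, the dictionary layout, and the `default_threshold` parameter are hypothetical and do not describe the actual template format.

```python
def apply_min_mapping_threshold(coverage_by_organism,
                                min_threshold_by_organism,
                                default_threshold=0.0):
    """Keep only microorganisms whose mapping meets their own minimum threshold.

    coverage_by_organism: dict of organism name -> percent coverage (assumed metric).
    min_threshold_by_organism: dict of organism name -> minimum percent coverage.
    default_threshold: threshold used for organisms without a specific entry
                       (a hypothetical convenience for this sketch).
    """
    return {
        organism: coverage
        for organism, coverage in coverage_by_organism.items()
        if coverage >= min_threshold_by_organism.get(organism, default_threshold)
    }


# Hypothetical values, not taken from any result set in the disclosure.
coverage = {"organism_a": 92.5, "organism_b": 12.0, "organism_c": 40.0}
thresholds = {"organism_a": 80.0, "organism_b": 25.0}
print(apply_min_mapping_threshold(coverage, thresholds, default_threshold=30.0))
# {'organism_a': 92.5, 'organism_c': 40.0}
```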
In some implementations, one or more of the above identified elements are stored in one or more of the previously mentioned memory devices and correspond to a set of instructions for performing a function described above. The above identified modules, data, or programs (e.g., sets of instructions) need not be implemented as separate software programs, procedures, data sets, or modules, and thus various subsets of these modules and data may be combined or otherwise re-arranged in various implementations. In some implementations, the non-persistent memory 111 optionally stores a subset of the modules and data structures identified above. Furthermore, in some embodiments, the memory stores additional modules and data structures not described above. In some embodiments, one or more of the above identified elements is stored in a computer system, other than that of visualization system 100, that is addressable by visualization system 100 so that visualization system 100 may retrieve all or a portion of such data when needed.
Although
While a system in accordance with the present disclosure has been disclosed with reference to
Referring to Block 200, the present disclosure provides a method for facilitating review of nucleic acid sequencing data 130 prepared for identifying the presence of a subset of microorganisms and/or antimicrobial resistance markers in a biological or non-biological sample (e.g., from a subject), at a computer system having a display, one or more processors, and memory storing one or more programs for execution by the one or more processors.
In an example embodiment, the present disclosure provides a review and visualization tool (e.g., comprising a display) for generating, viewing, modifying, validating, and/or reporting the results of a sequencing and mapping analysis using nucleic acids in a biological or non-biological sample obtained (e.g., from a subject such as a patient).
Subjects and Samples.
In some embodiments, a biological or non-biological sample (e.g., sample 304) is collected, prepared, sequenced (e.g., by next-generation sequencing), and mapped (e.g., aligned) to one or more reference sequences (e.g., complete and/or incomplete genomes) prior to the analysis of the presence of microorganisms. In some embodiments, sample processing is performed using any of the methods as disclosed in U.S. Patent Application No. 62/696,783, entitled “Methods and Systems for Processing Samples,” filed Jul. 11, 2018, which is hereby incorporated by reference herein in its entirety. In some embodiments, sample processing is performed using the method described in Example 2 and
In some embodiments, the biological or non-biological sample is obtained from a subject (e.g., a biological subject). For example, in some embodiments, the subject is a human (e.g., a patient). In some embodiments, the biological or non-biological sample is obtained from any tissue, organ or fluid from the subject (e.g., urine sample 304-1). In some embodiments, a plurality of biological or non-biological samples is obtained from the subject (e.g., a plurality of replicates and/or a plurality of samples including a healthy sample and a diseased sample). In some embodiments, the biological or non-biological sample is obtained from a human with a disease condition. In some embodiments, the disease condition is influenza, common cold, measles, rubella, chickenpox, norovirus, polio, infectious mononucleosis (mono), herpes simplex virus (HSV), human papillomavirus (HPV), human immunodeficiency virus (HIV), viral hepatitis (e.g., hepatitis A, B, C, D, and/or E), viral meningitis, West Nile Virus, rabies, Ebola, strep throat, bacterial urinary tract infections (UTIs) (e.g., coliform bacteria), bacterial food poisoning (e.g., E. coli, Salmonella, and/or Shigella), bacterial cellulitis (e.g., Staphylococcus aureus (MRSA)), bacterial vaginosis, gonorrhea, chlamydia, syphilis, Clostridium difficile (C. diff), tuberculosis, whooping cough, pneumococcal pneumonia, bacterial meningitis, Lyme disease, cholera, botulism, tetanus, anthrax, vaginal yeast infection, ringworm, athlete's foot, thrush, aspergillosis, histoplasmosis, Cryptococcus infection, fungal meningitis, malaria, toxoplasmosis, trichomoniasis, giardiasis, tapeworm infection, roundworm infection, pubic and head lice, scabies, leishmaniasis, and/or river blindness.
In some embodiments, the biological or non-biological sample is obtained from a human with a viral respiratory disease. In some embodiments, the biological or non-biological sample is obtained from a human with a coronavirus infection. In some embodiments, the biological or non-biological sample is obtained from a human with a SARS-CoV-2 infection.
In some embodiments, the biological or non-biological sample is an analysis (e.g., test) sample or a control sample (e.g., a positive control, negative control, and/or blank control).
In some embodiments, the biological or non-biological sample comprises nucleic acids (e.g., RNA or DNA). In some embodiments, the nucleic acids included in the biological or non-biological sample comprise any of the embodiments described herein. See, for example, Definitions: Nucleic acids, and Definitions: Samples.
Sequencing and Mapping.
As described above, in some implementations, the sequencing generates a plurality of nucleotide sequences that can be mapped against a plurality of reference sequences. In some embodiments, the sequencing is performed on a sample or portion thereof that has undergone a nucleic acid amplification process. Alternatively, in some embodiments, the sequencing is performed on a sample or portion thereof that has not undergone a nucleic acid amplification process. In some embodiments, nucleic acid molecules within a sample or portion thereof are fragmented prior to undergoing sequencing. Alternatively, in some embodiments, nucleic acid molecules are not fragmented prior to undergoing sequencing. Multiple different schemes may be applied to identify nucleic acid sequences within a sample.
Different types of nucleic acid molecules may undergo the same or different processing and sequencing. For example, in some embodiments, DNA molecules undergo a first sequencing process and RNA molecules undergo a second sequencing process, where the first and second sequencing processes include at least one process difference. In an example, genomic DNA such as accessible chromatin is processed according to a first sequencing method (e.g., using an assay for transposase-accessible chromatin using sequencing (ATAC-seq) method) while RNA molecules are processed according to a second sequencing method (e.g., a sequencing method that targets RNA molecules that include a polyA sequence, such as messenger RNA (mRNA) molecules). In some embodiments, different sequencing procedures are performed on the same or different samples. For example, in some embodiments, a first sequencing method to analyze a first type of nucleic acid molecule and a second sequencing method to analyze a second type of nucleic acid molecule, where the first and second sequencing methods are different and the first and second types of nucleic acid molecules are different, are performed on a same sample (e.g., at the same or different times). Alternatively or in addition, in some embodiments, a first sequencing method to analyze a first type of nucleic acid molecule is performed using a first sample and a second sequencing method to analyze a second type of nucleic acid molecule may be performed using a second sample, where the first and second sequencing methods are different, the first and second types of nucleic acid molecules are different, and the first and second samples are different. In some embodiments, the first and second samples are aliquots of a same sample.
In some embodiments, the sequencing is quantitative or approximately quantitative. Alternatively, in some embodiments, nucleic acid sequencing is qualitative and does not provide significant insight into the relative amounts of different nucleic acid molecules included within a sample.
Various sequencing schemes can be employed. For example, in some embodiments, the sequencing is sequencing by synthesis, sequencing by hybridization, sequencing by ligation, nanopore sequencing, sequencing using nucleic acid nanoballs, pyrosequencing, single molecule sequencing (e.g., single molecule real time sequencing), single cell/entity sequencing, massively parallel signature sequencing, polony sequencing, combinatorial probe anchor synthesis, SOLiD sequencing, chain termination (e.g., Sanger sequencing), ion semiconductor sequencing, tunneling currents sequencing, heliscope single molecule sequencing, sequencing with mass spectrometry, transmission electron microscopy sequencing, RNA polymerase-based sequencing, or any other method, or a combination thereof. In some embodiments, the sequencing is a sequencing technology like Heliscope (Helicos), SMRT technology (Pacific Biosciences) or nanopore sequencing (Oxford Nanopore) that allows direct sequencing of single molecules without prior clonal amplification. In some embodiments, the sequencing is performed with or without target enrichment. In some embodiments, the sequencing is Helicos True Single Molecule Sequencing (tSMS) (e.g., as described in Harris T. D. et al., Science 320:106-109 [2008]). In some embodiments, the sequencing is 454 sequencing (Roche) (e.g., as described in Margulies, M. et al., Nature 437:376-380 (2005)). In some embodiments, the sequencing is SOLiD™ technology (Applied Biosystems). In some embodiments, the sequencing is single molecule, real-time (SMRT™) sequencing technology of Pacific Biosciences.
In some embodiments, the systems and methods described herein are used with any sequencing platform, including, but not limited to, Illumina NGS platforms, Ion Torrent (Thermo) platforms, and GeneReader (Qiagen) platforms.
In some embodiments, the sequencing is performed as described in PCT Application No. PCT/US2019/060915, entitled “Directional Targeted Sequencing,” filed Nov. 12, 2019, which is hereby incorporated by reference herein in its entirety.
In some embodiments, the sequencing reaction is a whole genome sequencing reaction (e.g., shotgun workflow). In some instances, the sequencing is digital polymerase chain reaction (PCR) sequencing. In some embodiments, the sequencing reaction is a whole transcriptome sequencing reaction (e.g., RNASeq). In some embodiments, the sequencing reaction is a panel enriched sequencing reaction. In some embodiments, the panel is pathogen-specific and/or disease condition-specific. For example, in some embodiments, the panel is a respiratory virus oligo panel (RVOP).
In some embodiments, the plurality of nucleotide sequences (e.g., in nucleotide sequence data store 130) includes a first subset of nucleotide sequences that map (e.g., align) to a first reference sequence (e.g., a first genome) and a second subset of nucleotide sequences that map (e.g., align) to a second reference sequence (e.g., a second genome) (e.g., where the first genome is a reference genome of a host organism and the second genome is a reference genome of a microorganism). In some embodiments, the plurality of nucleotide sequences includes a plurality of subsets of nucleotide sequences, each respective subset of nucleotide sequences mapping to a corresponding reference sequence in a plurality of reference sequences (e.g., in reference sequence data store 132). In some such embodiments, the plurality of subsets of nucleotide sequences includes at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 10,000, or at least 50,000 subsets of nucleotide sequences that map to a corresponding reference sequence.
In some embodiments, the plurality of nucleotide sequences is at least 1000, at least 5000, at least 1×10⁴, at least 1×10⁵, at least 5×10⁵, at least 1×10⁶, at least 5×10⁶, at least 1×10⁷, at least 5×10⁷, at least 1×10⁸, or at least 2×10⁸ nucleotide sequences. In some embodiments, the plurality of nucleotide sequences is no more than 5×10⁸, no more than 1×10⁸, no more than 1×10⁷, no more than 1×10⁶, no more than 1×10⁵, no more than 1×10⁴, or no more than 5000 nucleotide sequences. In some embodiments, the plurality of nucleotide sequences is from 1000 to 1×10⁴, from 1×10⁴ to 8×10⁴, from 5×10⁴ to 5×10⁵, from 1×10⁵ to 1×10⁶, from 1×10⁶ to 5×10⁶, from 2×10⁶ to 1×10⁷, from 8×10⁶ to 5×10⁷, or from 1×10⁷ to 2×10⁸ nucleotide sequences. In some embodiments, the plurality of nucleotide sequences falls within another range starting no lower than 1000 nucleotide sequences and ending no higher than 5×10⁸ nucleotide sequences.
In some embodiments, the mapping of the plurality of nucleotide sequences against the plurality of reference sequences corresponding to a set of microorganisms (e.g., genomes), where the set of microorganisms comprises at least 3, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 microorganisms, collectively maps against at least 0.5, at least 1, at least 2, at least 3, at least 4, at least 5 or at least 6 megabases of the respective reference sequences (e.g., genomes). In some embodiments, the mapping of the plurality of nucleotide sequences against the plurality of reference sequences corresponding to the set of microorganisms collectively maps against at least 0.5, at least 0.8, at least 1, at least 1.5, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 200, at least 500, or at least 1000 megabases of the respective reference sequences. In some embodiments, the mapping of the plurality of nucleotide sequences against the plurality of reference sequences corresponding to the set of microorganisms collectively maps against no more than 2000, no more than 1000, no more than 500, no more than 100, no more than 80, no more than 60, no more than 40, no more than 20, no more than 10, no more than 5, no more than 3, no more than 2, or no more than 1 megabases of the respective reference sequences. 
In some embodiments, the mapping of the plurality of nucleotide sequences against the plurality of reference sequences corresponding to the set of microorganisms collectively maps against from 0.5 to 10, from 1 to 6, from 2 to 5, from 4 to 15, from 8 to 20, from 12 to 30, from 10 to 60, from 20 to 100, from 75 to 500, from 100 to 1000, from 300 to 800, or from 500 to 2000 megabases of the respective reference sequences. In some embodiments, the mapping of the plurality of nucleotide sequences against the plurality of reference sequences corresponding to the set of microorganisms collectively maps against another range of megabases of the respective reference sequences starting no lower than 0.5 megabases and ending no higher than 2000 megabases.
In some embodiments, the result set further includes a plurality of nucleotide sequences mapped (e.g., aligned) to a human reference genome. Accordingly, in some embodiments, the mapping of the plurality of nucleotide sequences against a plurality of reference sequences, where the plurality of reference sequences includes a set of reference sequences corresponding to a set of microorganisms (e.g., at least 3, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 microorganisms) and a human reference genome, collectively maps against at least 1, at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 500, at least 1000, or at least 3000 megabases. In some embodiments, the mapping of the plurality of nucleotide sequences against a plurality of reference sequences, where the plurality of reference sequences includes a set of reference sequences corresponding to a set of microorganisms and a human reference genome, collectively maps against no more than 5000, no more than 3000, no more than 1000, no more than 500, no more than 100, no more than 50, no more than 10, or no more than 5 megabases. In some embodiments, the mapping of the plurality of nucleotide sequences against a plurality of reference sequences, where the plurality of reference sequences includes a set of reference sequences corresponding to a set of microorganisms and a human reference genome, collectively maps against from 1 to 10, from 2 to 20, from 15 to 60, from 40 to 200, from 150 to 800, or from 500 to 5000 megabases. 
In some embodiments, the mapping of the plurality of nucleotide sequences against a plurality of reference sequences, where the plurality of reference sequences includes a set of reference sequences corresponding to a set of microorganisms and a human reference genome, collectively maps against another range of megabases starting no lower than 1 megabase and ending no higher than 5000 megabases.
In some embodiments, the analysis comprises preprocessing and/or pre-sorting of the sequencing data. In some embodiments, pre-sorting includes sorting each nucleotide sequence obtained from the sequencing of the biological or non-biological sample into one or more bins, where each bin corresponds to a different microorganism, depending on the likelihood that the nucleotide sequence originated from the respective microorganism. Each nucleotide sequence is then mapped (e.g., using a k-mer alignment and/or a full alignment) to one or more reference sequences (e.g., genomes) corresponding to different microorganisms. In some embodiments, the analysis is performed using an analysis pipeline. Methods of mapping nucleotide sequences obtained from sequencing nucleic acids are provided in, for example, Flygare et al., 2016, “Taxonomer: an interactive metagenomics analysis portal for universal pathogen detection and host mRNA expression profiling,” Genome Biology 17:111; U.S. patent application Ser. No. 15/724,476, entitled “Methods and Systems for Multiple Taxonomic Classification,” filed Oct. 4, 2017; and U.S. Patent Application No. 62/723,384, entitled “Methods and Systems for Providing Sample Information,” filed Aug. 27, 2018, each of which is hereby incorporated by reference in its entirety. Other methods of mapping nucleotide sequences to a reference sequence are possible, as will be apparent to one skilled in the art. See, for example, Roumpeka et al., 2017, “A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data,” Front. Genet. 8:23, doi: 10.3389/fgene.2017.00023, which is hereby incorporated herein by reference in its entirety.
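As a simplified, non-limiting sketch of such pre-sorting, each nucleotide sequence can be placed into the bin of every microorganism whose reference sequence shares a sufficient fraction of the read's k-mers. The shared k-mer fraction is used here only as a crude stand-in for the likelihood of origin; it is not the method of the incorporated references, and the function names, the value of k, the `min_fraction` cutoff, and the toy sequences are all assumptions made for this example.

```python
def kmer_set(sequence, k):
    """Return the set of overlapping k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presort_reads(reads, references, k=4, min_fraction=0.5):
    """Sort each read into one or more bins, one bin per microorganism.

    reads: list of nucleotide sequences (strings).
    references: dict of microorganism name -> reference sequence.
    min_fraction: hypothetical cutoff on the fraction of a read's k-mers that
                  must also occur in the reference for the read to join its bin.
    """
    reference_kmers = {name: kmer_set(seq, k) for name, seq in references.items()}
    bins = {name: [] for name in references}
    for read in reads:
        read_kmers = kmer_set(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to compare
        for name, ref_kmers in reference_kmers.items():
            shared = len(read_kmers & ref_kmers) / len(read_kmers)
            if shared >= min_fraction:
                bins[name].append(read)
    return bins


# Toy sequences for illustration only.
references = {"org_a": "ACGTACGTGGCC", "org_b": "TTTTGGGGCCCCAAAA"}
reads = ["ACGTACG", "GGGGCCCC", "GGCC"]
print(presort_reads(reads, references))
# {'org_a': ['ACGTACG', 'GGCC'], 'org_b': ['GGGGCCCC', 'GGCC']}
```

A read such as “GGCC” that is equally consistent with both references lands in both bins, which mirrors the statement above that a nucleotide sequence may be sorted into one or more bins before being mapped.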
Review Portal.
In some embodiments, the nucleic acid sequencing data (e.g., nucleotide sequence data store 130) prepared for identifying the presence of a subset of microorganisms and/or antimicrobial resistance markers in a biological or non-biological sample (e.g., sample 304) comprises output or results data from the sequencing and/or mapping (e.g., result set 122), which can be performed, as described above, using any sequencing and/or mapping method as will be apparent to one skilled in the art.
In some embodiments, some or all of the nucleic acid sequencing data is accessed via a system (e.g., in accordance with the example system 100 embodiments described above) for review and/or visualization. In some embodiments, the review and/or visualization is performed on the display of a computer. In some embodiments, the review and/or visualization is performed using a cloud-based interface such as an online portal.
In some embodiments, some or all of the nucleic acid sequencing data is transmitted from a first system for performing sequencing and/or mapping analysis, to a second system (e.g., in accordance with the example system embodiments described above) for performing review and/or visualization. In some embodiments, some or all of the nucleic acid sequencing data is transmitted from a first system for performing sequencing and/or mapping analysis, to a cloud-based interface, such as an online portal for performing the review and/or visualization. In some embodiments, the sequencing and/or mapping analysis is performed using an analysis pipeline.
In some embodiments, the method comprises generating an alert when no nucleic acid sequencing data is available to perform the method (e.g., receiving an email notification when data upload fails).
In some embodiments, the review and/or visualization is performed on the same system as the sequencing and/or mapping analysis, where the sequencing, mapping, review, and/or visualization of some or all of the nucleic acid sequencing data is performed within an analysis workflow. In some embodiments, the sequencing, mapping, review, and/or visualization is performed at a cloud-based interface such as an online portal comprising an analysis pipeline. In some embodiments, the sequencing, mapping, review, and/or visualization is performed using a software program (e.g., Explify). See Example 1 (Examples, below). See, for example, IDbyDNA, 2019, “Explify Software v1.5.0 User Manual,” Document No. TH-2019-200-006, pp. 1-44, which is hereby incorporated by reference herein in its entirety.
Dashboard.
In some embodiments, the method further facilitates review of nucleic acid sequencing data prepared for identifying the presence of a subset of microorganisms and/or antimicrobial resistance markers in a biological or non-biological sample 304 (e.g., from a subject), where the biological or non-biological sample is selected from a plurality of biological or non-biological samples (e.g., from the same subject or from a plurality of subjects). In some embodiments, the method facilitates review of nucleic acid sequencing data in a plurality of biological or non-biological samples, where each respective biological or non-biological sample corresponds to a respective subject in a plurality of subjects. In some embodiments, the method facilitates review of nucleic acid sequencing data in a plurality of biological or non-biological samples, where the plurality of biological or non-biological samples includes a biological or non-biological sample obtained from a subject and one or more control samples.
In some embodiments, the plurality of biological or non-biological samples (e.g., samples 304) are displayed on a display (e.g., of a system for review and visualization). In some embodiments, the display is provided in a system for review and visualization (e.g., system 100), and the one or more biological or non-biological samples are displayed on a dashboard (e.g., results dashboard 302). In some embodiments, the plurality of biological or non-biological samples are displayed as a sample queue (e.g., sample queue 306).
In some embodiments, the one or more biological or non-biological samples comprises a list of pending samples (e.g., a sample queue comprising one or more samples awaiting or undergoing review).
In some embodiments, the one or more biological or non-biological samples comprises one or more batches 310 (e.g., batch 310-1), where each batch includes one or more samples 304. For example, in some embodiments, each sample in a respective plurality of samples in a batch is sequenced using the same method as every other sample in the respective plurality of samples in the batch (e.g., from the same sequencing run). In some embodiments, each sample in a respective plurality of samples in a batch is processed using the same method as every other sample in the respective plurality of samples in the batch (e.g., collected and/or prepared for sequencing at the same time and/or via matched processes). In some such embodiments, the one or more biological or non-biological samples comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 batches, where each batch includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 samples.
In some embodiments, the plurality of biological or non-biological samples comprises one or more runs 314 (e.g., run 314-1), where each respective run includes one or more batches 310, and each respective batch includes one or more samples 304. For example, in some embodiments, the plurality of samples in a respective run consists of a plurality of samples sequenced during the same sequencing run. In some such embodiments, the one or more runs comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 runs, where each run includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 batches, and each respective batch in each run includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 samples.
In some embodiments, the one or more samples, one or more batches, and/or one or more runs are displayed on a system for review and visualization (e.g., a user interactive system 100 for review and visualization). In some embodiments, an index of the one or more samples, one or more batches, and/or one or more runs is displayed on a user interactive dashboard (e.g., results dashboard 302) on the system for review and visualization.
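By way of non-limiting illustration, the nesting of samples within batches and batches within runs described above can be sketched as a simple data structure. The class names, field names, and the `build_index` helper below are hypothetical and are not taken from any disclosed implementation; the identifiers merely reuse the reference numerals for readability.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sample:
    sample_id: str
    # Hypothetical labels: "analysis", "positive", "negative", or "blank".
    kind: str = "analysis"


@dataclass
class Batch:
    batch_id: str
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Run:
    run_id: str
    batches: List[Batch] = field(default_factory=list)


def build_index(runs: List[Run]) -> List[Tuple[str, str, str]]:
    """Flatten runs into (run, batch, sample) rows for a dashboard index."""
    return [
        (run.run_id, batch.batch_id, sample.sample_id)
        for run in runs
        for batch in run.batches
        for sample in batch.samples
    ]


# One run holding one batch of two samples, one of them a blank control.
run = Run("314-1", [Batch("310-1", [Sample("304-1"), Sample("304-2", kind="blank")])])
index = build_index([run])
```

A flat index of this kind is one possible way to populate a sample queue while preserving the run and batch to which each sample belongs.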
Control Samples.
In some embodiments, the one or more control samples are obtained from the same or a different subject as the biological or non-biological sample used for identifying the presence of a subset of microorganisms and/or antimicrobial resistance markers. In some embodiments, the one or more control samples are obtained externally (e.g., using laboratory standards). In some embodiments, a control sample is a positive control sample (e.g., 304-cp), a negative control sample (e.g., 304-np), or a blank control sample (e.g., 304-blk).
For example, in some embodiments, a biological or non-biological sample is a positive control sample, where the positive control sample comprises a known, non-zero amount of nucleic acids corresponding to one or more microorganisms in the subset of microorganisms.
In some embodiments, the positive control sample is obtained from a subject with a known population of a microorganism (e.g., a pathogenic infection). In some such embodiments, the positive control sample is obtained from a subject diagnosed with an infectious disease. In some such embodiments, the positive control sample is obtained from diseased tissue in a subject diagnosed with an infectious disease.
In some embodiments, the presence of a microorganism in the positive control sample is validated by a laboratory validation technique, such as targeted enrichment sequencing, PCR, in vitro culture, immunoassays (e.g., ELISA, Western blot, chemiluminescence, etc.), serological assays and/or antimicrobial susceptibility assays.
In some embodiments, the positive control sample comprises whole or lysed microorganisms from an in vitro culture. In some embodiments, the positive control sample comprises nucleic acids isolated from one or more microorganisms in the subset of microorganisms. In some embodiments, the positive control sample comprises nucleic acids synthesized based on one or more reference sequences (e.g., complete and/or incomplete genomes) corresponding to a respective one or more microorganisms in the subset of microorganisms.
For instance,
In some embodiments, a biological or non-biological sample is a negative control sample, where the negative control sample does not include nucleic acids corresponding to a microorganism in the subset of microorganisms. In some embodiments, the negative control sample is obtained from a healthy subject. In some embodiments, the negative control sample is obtained from a healthy tissue in a subject diagnosed with an infectious disease. In some embodiments, the absence of one or more microorganisms in the subset of microorganisms in the negative control sample is validated by a laboratory validation technique, such as targeted enrichment sequencing, PCR, in vitro culture, immunoassays (e.g., ELISA, Western blot, chemiluminescence, etc.), serological assays and/or antimicrobial susceptibility assays.
For instance,
In some embodiments, a biological or non-biological sample is a blank control sample, where the blank control sample does not include nucleic acids corresponding to a microorganism in the subset of microorganisms. In some embodiments, the blank control sample does not comprise biological material. In some embodiments, the blank control sample comprises one or more reagents used for processing the positive control sample and/or the negative control sample (e.g., reagents for sample collection, sample storage, pre-processing, nucleic acid isolation, and/or sequencing). In some embodiments, the blank control sample is water.
For instance,
In some embodiments, a first control sample and a second control sample are matched samples. For example, in some embodiments, a positive control sample and a negative control sample are obtained from a diseased tissue and a healthy tissue from the same subject, respectively. In some embodiments, a positive control sample and a negative control sample are obtained from a subject diagnosed with an infectious disease and a healthy subject from the same cohort, respectively (e.g., in a clinical study).
In some embodiments, a first control sample and a second control sample are process matched. For example, in some embodiments, a positive control sample and a negative control sample are prepared using the same process, including the reagents, equipment, processing times, and/or operator or technician used to perform the method, as well as matching workflows for sequencing, mapping, and/or preprocessing. Similarly, in some embodiments, a positive and/or negative control sample is process matched with a blank control sample, such as where the blank control sample comprises the reagents used to process the positive and/or negative control sample, and is subjected to a workflow that matches the processing workflow for the positive and/or negative control sample.
Analysis Samples.
In some embodiments, a biological or non-biological sample is an analysis sample (e.g., a test sample where the presence of microorganisms is unknown and/or under investigation). For example, in some embodiments, a biological or non-biological sample is a clinical sample, a diagnostic sample, an environmental sample, a consumer quality sample, a food sample, a biological product sample, a microbial testing sample, a tumor sample, a forensic sample and/or a laboratory or hospital sample. In some embodiments, a biological or non-biological sample is obtained from a human or an animal. In some embodiments, a biological or non-biological sample is a sample from a patient undergoing a treatment.
Receiving Requests for Analysis.
Referring again to
For instance, in some embodiments, the result set is an output of an analysis pipeline. In some embodiments, the result set is data generated from an analysis of sequencing data. In some embodiments, the result set is data generated from an analysis of a mapping of nucleotide sequences to a reference sequence (e.g., of nucleic acid sequencing data to a reference genome). In some embodiments, the result set is obtained from an analysis software (e.g., BaseSpace, BasePair, Strand-NGS, CLC Genomics Workbench, etc.).
In some embodiments, the receiving the request includes receiving log-in credentials for a user; displaying, on the display, an index of biological or non-biological samples for the user (e.g., a results dashboard 302 and/or a sample queue 306); and detecting selection of a respective biological or non-biological sample 304 from the index. In some embodiments, the log-in credentials are for an organization (e.g., a hospital, diagnostic testing company, research institution, etc.). In some embodiments, the log-in credentials are for an individual (e.g., a patient, a medical practitioner, a primary physician, a medical director (2502-3), a reviewer (2502-4), a research technician, a research supervisor, etc.).
In some embodiments, the receiving the request further includes receiving log-in credentials for an administrator account user (e.g., 2502-1); displaying, on the display, an index (e.g., 2102) of affordances for administrator action (e.g., an administrator dashboard 2108) and an index of biological or non-biological samples for the user (e.g., 2106); and detecting selection of an affordance for administrator action and/or a respective biological or non-biological sample from the index.
In some embodiments, the receiving the request further includes receiving log-in credentials for a demo account user (e.g., 2502-2); displaying, on the display, an index of affordances for demo (e.g., for testing and/or trialing); and detecting selection of an affordance for testing and/or trial purposes.
In some embodiments, the receiving the request includes receiving log-in credentials for a plurality of users (e.g., 2402-1, 2402-2, 2402-3, etc.). In some embodiments, a plurality of requests can be received simultaneously from a plurality of users. In some embodiments, only one user at a time can submit a request by entering log-in credentials. In some embodiments, log-in credentials include a username and/or a password. In some embodiments, log-in credentials include an email address.
In some embodiments, the detecting selection of a respective biological or non-biological sample comprises detecting a selection of the respective sample from an index of samples to be displayed (e.g., selection of a sample from a list of samples 304 in a pending queue 306 displayed on a user interactive results dashboard 302). In some embodiments, the receiving a request to display an analysis of a respective sample (e.g., a sample 304) comprises detecting selection of an affordance for performing a review of the analysis (e.g., review affordance 332).
In some such embodiments, the receiving the request includes displaying, on the display, an index of sets (e.g., batches 310 and/or runs 314) of samples for the user; and detecting selection of a respective set (e.g., batch 310 and/or run 314) of samples from the index. For instance, in some embodiments, the method comprises receiving a selection of a respective batch in a plurality of batches displayed on an index of batches and/or runs.
In some embodiments, the display includes, for each sample in the index of samples for the user, a sample summary comprising an indication of a run quality control metric, an indication of a sample quality control metric, and an indication of a subset of the set of microorganisms (e.g., selected by an analysis of the result set). For example,
In some embodiments, selection of a sample (e.g., 304-1) generates an overview of the result set (e.g., customizable user interface 401-1) generated by, e.g., an analysis pipeline, indicating the number of microorganisms, if any, detected in the sample.
Microorganisms.
In some embodiments, a microorganism is a single-celled organism and/or a colony of single-celled organisms. In some embodiments, a microorganism is eukaryotic or prokaryotic. In some embodiments, a microorganism is a pathogen (e.g., disease-causing), such as a human, animal, or plant-infective pathogen. In some embodiments, a microorganism in the set of microorganisms is any one of the microorganisms described herein (see Definitions: “Microorganisms,” above). In some embodiments, a microorganism in the set of microorganisms is any one of the microorganisms selected from a database, including but not limited to NCBI, BLAST, EMBL-EBI, GenBank, Ensembl, EuPathDB, The Human Microbiome Project, Pathogen Portal, RDP, SILVA, GREENGENES, EBI Metagenomics, EcoCyc, PATRIC, TBDB, PlasmoDB, the Microbial Genome Database (MBGD), and/or the Microbial Rosetta Stone Database.
In some embodiments, the set of microorganisms comprises at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, or at least 500 microorganisms. In some embodiments, the set of microorganisms comprises at least 1000, at least 2000, at least 5000, at least 7500, at least 10,000, at least 20,000, at least 30,000, or at least 50,000 microorganisms. In some embodiments, the set of microorganisms comprises no more than 80,000, no more than 50,000, no more than 10,000, no more than 1000, no more than 500, no more than 100, no more than 50, or no more than 20 microorganisms. In some embodiments, the set of microorganisms comprises from 3 to 10, from 8 to 30, from 20 to 80, from 75 to 200, from 100 to 1000, from 800 to 3000, from 2500 to 7500, or from 5000 to 20,000 microorganisms. In some embodiments, the set of microorganisms falls within another range starting no lower than 3 microorganisms and ending no higher than 80,000 microorganisms.
In some embodiments, the set of microorganisms comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 microorganisms selected from the lists provided above and/or selected from any one or more of the databases provided above. In some embodiments, the set of microorganisms comprises at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 2000, at least 3000, at least 4000, at least 5000, at least 10,000, or at least 50,000 microorganisms selected from the lists provided above and/or selected from any one or more of the databases provided above. In some embodiments, the set of microorganisms comprises between 1 and 50, between 50 and 100, between 100 and 200, between 200 and 500, between 500 and 1000, between 1000 and 2000, between 2000 and 3000, between 3000 and 5000, between 5000 and 10,000, between 10,000 and 50,000, or more than 50,000 microorganisms selected from the lists provided above and/or selected from any one or more of the databases provided above.
In some embodiments, a microorganism in the set of microorganisms is a bacterium, fungus, protozoan (e.g., protozoan parasite), virus (e.g., DNA virus and/or RNA virus), and/or helminth. In some embodiments, the set of microorganisms comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 members of a respective type (e.g., taxonomic classification, genus, species, and/or strain, including bacteria, fungi, protozoa, viruses, and/or helminths) of microorganism selected from the lists provided above and/or selected from any one or more of the databases provided above. In some embodiments, the set of microorganisms comprises at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 2000, at least 3000, at least 4000, at least 5000, at least 10,000, or at least 50,000 members of a respective type (e.g., taxonomic classification, genus, species, and/or strain, including bacteria, fungi, protozoa, viruses, and/or helminths) of microorganism selected from the lists provided above and/or selected from any one or more of the databases provided above. In some embodiments, the set of microorganisms comprises between 1 and 50, between 50 and 100, between 100 and 200, between 200 and 500, between 500 and 1000, between 1000 and 2000, between 2000 and 3000, between 3000 and 5000, between 5000 and 10,000, between 10,000 and 50,000, or more than 50,000 members of a respective type (e.g., taxonomic classification, genus, species, and/or strain, including bacteria, fungi, protozoa, viruses, and/or helminths) of microorganism selected from the lists provided above and/or selected from any one or more of the databases provided above.
In some embodiments, the set of microorganisms comprises one or more microorganisms selected from at least 1, at least 2, at least 3, or at least 4 of the group consisting of bacteria, fungi, parasites, and viruses.
In some embodiments, the method comprises identifying the presence of a subset of microorganisms comprising at least 1 microorganism from the set of microorganisms. In some embodiments, the method comprises identifying the presence of a subset of microorganisms comprising between 1 and 10, between 10 and 20, between 20 and 30, between 30 and 40, between 40 and 50, between 50 and 100, or more than 100 microorganisms from the set of microorganisms. In some embodiments, the method comprises identifying the presence of a subset of microorganisms comprising at least 1 microorganism selected from the lists provided above and/or selected from any one or more of the databases provided above. In some embodiments, the method comprises identifying the presence of a subset of microorganisms comprising between 1 and 10, between 10 and 20, between 20 and 30, between 30 and 40, between 40 and 50, between 50 and 100, or more than 100 microorganisms selected from the lists provided above and/or selected from any one or more of the databases provided above.
In some embodiments, a microorganism in the set of microorganisms is selected from the group consisting of a bacterium, a fungus, a virus, and a parasite (e.g., protozoan parasite). In some embodiments, a microorganism in the set of microorganisms is a pathogen. In some embodiments, the microorganism is a coronavirus. In some embodiments, the microorganism is severe acute respiratory syndrome coronavirus (e.g., SARS-CoV-2). In some embodiments, the microorganism is an influenza virus. In some embodiments, the microorganism is an influenza A virus.
In some embodiments, the method comprises displaying, on the display, an identifier for each microorganism in the set of microorganisms. In some embodiments, the identifier comprises a scientific name, a pathogenic status (e.g., pathogenic or nonpathogenic), an annotation (e.g., a medical relevance annotation, an associated disease, an associated antimicrobial resistance gene, an associated treatment, a number of publications used as evidence, a keyword, and/or a search term), and/or a class (e.g., bacterium, fungus, parasite, or virus).
In some embodiments, the set of microorganisms represents at least 3 reference sequences, at least 5 reference sequences, at least 10 reference sequences, at least 50 reference sequences, at least 100 reference sequences, at least 1000 reference sequences, at least 1×10⁴ reference sequences, at least 5×10⁴ reference sequences, at least 1×10⁵ reference sequences, at least 1×10⁶ reference sequences, at least 2×10⁶ reference sequences, at least 5×10⁶ reference sequences, or at least 1×10⁷ reference sequences.
Accordingly, in some embodiments, the plurality of reference sequences corresponding to the set of microorganisms (e.g., at least 3, at least 5, at least 10, or at least 100 microorganisms) collectively comprises at least 0.5, at least 0.8, at least 1, at least 1.5, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 100, at least 200, at least 500, or at least 1000 megabases. In some embodiments, the plurality of reference sequences corresponding to the set of microorganisms (e.g., at least 3, at least 5, at least 10, or at least 100 microorganisms) collectively comprises no more than 2000, no more than 1000, no more than 500, no more than 100, no more than 80, no more than 60, no more than 40, no more than 20, no more than 10, no more than 5, no more than 3, no more than 2, or no more than 1 megabases. In some embodiments, the plurality of reference sequences corresponding to the set of microorganisms (e.g., at least 3, at least 5, at least 10, or at least 100 microorganisms) collectively comprises from 0.5 to 10, from 1 to 6, from 2 to 5, from 4 to 15, from 8 to 20, from 12 to 30, from 10 to 60, from 20 to 100, from 75 to 500, from 100 to 1000, from 300 to 800, or from 500 to 2000 megabases. In some embodiments, the plurality of reference sequences corresponding to the set of microorganisms (e.g., at least 3, at least 5, at least 10, or at least 100 microorganisms) collectively comprises another range of megabases of the respective reference sequences starting no lower than 0.5 megabases and ending no higher than 2000 megabases.
In some embodiments, the method further includes displaying a plurality of nucleotide sequences mapped against the reference sequences of an organism other than a microorganism. For example, in some embodiments, the method further includes displaying a plurality of nucleotide sequences mapped against a human reference genome.
In some embodiments, the mapping is performed against one microorganism reference sequence. In some embodiments, the mapping is performed against at least 3, at least 5, at least 10, at least 20, at least 50, at least 100, at least 1000, at least 10,000, or at least 50,000 microorganism reference sequences. In some embodiments, the mapping is performed against any number of reference sequences corresponding to the set of microorganisms (e.g., at least 3, at least 5, at least 10, or at least 100 microorganisms).
In some embodiments, the reference sequences of the set of microorganisms are obtained from a nucleotide sequence database. A nucleotide sequence database can be, for example, a global genome database or a microorganism-specific genome database. For example, in some embodiments, reference sequences of the set of microorganisms are obtained from NCBI, BLAST, EMBL-EBI, GenBank, Ensembl, EuPathDB, The Human Microbiome Project, Pathogen Portal, RDP, SILVA, GREENGENES, EBI Metagenomics, EcoCyc, PATRIC, TBDB, PlasmoDB, the Microbial Genome Database (MBGD), and/or the Microbial Rosetta Stone Database. See, for example, Zhulin, 2015, “Databases for Microbiologists,” J Bacteriol 197:2458-2467, doi: 10.1128/JB.00330-15; Uchiyama et al., 2019, “MBGD update 2018: microbial genome database based on hierarchical orthology relations covering closely related and distantly related comparisons,” Nuc Acids Res., 47 (D1), D382-D389, doi: 10.1093/nar/gky1054; and Ecker et al., 2005, “The Microbial Rosetta Stone Database: A compilation of global and emerging infectious microorganisms and bioterrorist threat agents,” BMC Microbiology 5, 19, doi: 10.1186/1471-2180-5-19; each of which is hereby incorporated by reference herein in its entirety.
As illustrated in
In some embodiments, as described elsewhere herein, the method comprises specifying a subset of the set of microorganisms (e.g., a subset of the set of at least 3, at least 5, or at least 10 microorganisms). In some embodiments, a respective subset of the set of microorganisms is any integer value less than or equal to the number of microorganisms in the set of microorganisms. For instance, where the set of microorganisms comprises at least 10 microorganisms, a respective subset of the set of microorganisms can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 microorganisms, up to the number of microorganisms in the set of microorganisms. In another example, where the set of microorganisms comprises at least 3 microorganisms, a respective subset of the set of microorganisms can be 1, 2, 3, or more than 3 microorganisms, up to the number of microorganisms in the set of microorganisms.
In some embodiments, a subset of the set of microorganisms comprises one or more microorganisms that are grouped together based on a microorganism type (e.g., taxonomic classification, genus, species, and/or strain, including bacteria, fungi, protozoa, viruses, and/or helminths) and/or an associated disease condition. In some embodiments, a subset of the set of microorganisms comprises one or more microorganisms that are grouped together based on another parameter or filtering criterion (e.g., an evidence score, AMR gene, study type, etc.). In some embodiments, a subset of the set of microorganisms comprises one or more microorganisms that are selected and/or specified by a first customizable diagnostic template to be applied to the result set, as described below (see, e.g., the sections entitled “Features of the analysis,” “Parameters for feature selection,” “Customizable analysis of presence of microorganisms,” and “Administrator control: Test profiles,” below).
In some embodiments, an antimicrobial resistance marker is a gene. In some embodiments, an antimicrobial resistance marker is a nucleic acid sequence obtained from a reference genome. In some embodiments, an antimicrobial resistance marker is any of the embodiments described herein (see Definitions: “Antimicrobial resistance markers,” above). In some embodiments, an antimicrobial resistance marker is selected from Table 1 and/or selected from one or more databases, including but not limited to the National Database of Antibiotic Resistant Organisms (NDARO), the Comprehensive Antibiotic Resistance Database (CARD), ResFinder, PointFinder, ARG-ANNOT, ARGs-OSP, PlasmoDB, the Mycology Antifungal Resistance Database (MARDy), DBDiaSNP, the HIV Drug Resistance Database, the Virus Pathogen Resource (ViPR), and/or any of the databases used for selecting one or more microorganisms, as disclosed above.
In some embodiments, the method comprises identifying at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 antimicrobial resistance markers in a biological or non-biological sample of a subject.
In some embodiments, the method comprises identifying at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 antimicrobial resistance markers listed in Table 1 and/or selected from a database as disclosed herein, in a biological or non-biological sample of a subject.
In some embodiments, the method comprises displaying, on the display, an indication of any one or more features for the respective antimicrobial resistance marker (e.g., gene identifier, gene name, intervention (drug) information, intervention (drug) classes, associated organisms, gene families, and/or resistance mechanisms).
Sequencing Statistics.
As illustrated in
In some embodiments, a sequencing statistic 128 is a count of unique nucleotide sequences in the plurality of nucleotide sequences that map to the reference sequences of the set of microorganisms.
In some embodiments, a sequencing statistic 128 is a count of nucleotide sequences in the plurality of nucleotide sequences that satisfy a pre-processing criterion (e.g., post-adaptor, post-quality, and/or IC norm).
In some embodiments, a sequencing statistic 128 is a quality control metric (e.g., library quality score, % Q30, and/or library Q score). For disclosure on Q scores see, for example, Illumina, 2011, “Quality Scores for Next Generation Sequencing,” Publication No. 770-2011-030, available online at illumina.com/documents/products/technotes/technote_Q-Scores.pdf; and Lopopolo and Lonie, 2017, “Sequencing Quality Control,” Oxford Genomics Centre, available online at well.ox.ac.uk/ogc/sequencing-quality-monitoring-run. The term % Q30 refers to the percentage of bases that have a Q score of at least 30 (Q score >= 30).
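By way of non-limiting illustration, a % Q30 value of the kind defined above can be computed from per-base Phred quality scores as sketched below. The function names and the representation of reads as lists of integer scores are assumptions made for this example only. The conversion from a Q score to an error probability uses the standard Phred relationship Q = -10·log10(P).

```python
def phred_to_error_probability(q):
    """Standard Phred relationship: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def percent_q30(reads, threshold=30):
    """Percentage of all bases whose quality score is at least `threshold`.

    `reads` is assumed to be an iterable of per-read lists of integer
    Phred scores (a hypothetical input format for this sketch).
    """
    scores = [q for read in reads for q in read]
    if not scores:
        return 0.0
    passing = sum(1 for q in scores if q >= threshold)
    return 100.0 * passing / len(scores)


# Two toy reads of four bases each; six of the eight bases meet Q >= 30.
reads = [[35, 32, 28, 40], [30, 12, 38, 36]]
pct = percent_q30(reads)
```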
In some embodiments, a sequencing statistic 128 is a measure of length for one or more nucleotide sequences in the plurality of nucleotide sequences (e.g., a read length and/or a measure of central tendency of a read length (mean, median, and/or mode)).
In some embodiments, a sequencing statistic 128 is an entropy for one or more nucleotide sequences in the plurality of nucleotide sequences. For disclosure on the entropy of a nucleic acid sequence see, for example, Schmitt and Herzel, 1997, “Estimating the Entropy of DNA Sequences,” J. Theor. Biol. 188, pp. 369-377, which is hereby incorporated by reference.
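By way of non-limiting illustration, one simple entropy measure for a nucleotide sequence is the Shannon entropy of its observed k-mer frequencies, sketched below. This is a plain frequency-based estimate chosen for brevity; it is an assumption of this example and does not reproduce the finite-sample estimators discussed in the cited reference. The function name and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq, k=1):
    """Shannon entropy, in bits, of the k-mer frequency distribution of `seq`."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    # Sum p * log2(1 / p) over observed k-mers; each term is non-negative.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


low = shannon_entropy("AAAAAAAA")   # homopolymer: a single symbol, no uncertainty
high = shannon_entropy("ACGTACGT")  # four equally frequent bases
```

Under this measure a low-complexity read such as a homopolymer scores zero, while a read with evenly distributed bases approaches the two-bit maximum for a four-letter alphabet.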
In some embodiments, a sequencing statistic 128 is a base composition for one or more nucleotide sequences in the plurality of nucleotide sequences (e.g., percent A, T, C or G content).
In some embodiments, a sequencing statistic 128 is a size of a sequencing library, a quantity of a sequencing library (e.g., a library concentration), and/or an adaptor sequence (e.g., a sequence of a sample index).
In some embodiments, the plurality of sequencing statistics 128 includes, for each sequencing statistic in the plurality of sequencing statistics, a comparison of the respective result set obtained from the respective sequencing reaction to one or more stored result sets. In some embodiments, the comparison is a distribution. For example, in some embodiments, the plurality of sequencing statistics includes a distribution plot comprising values for a sequencing statistic across a plurality of analyses and/or a plurality of samples obtained from a run history. In some such embodiments, the distribution plot illustrates the position of the current run in the distribution, thus indicating a relative quality of the current run compared to previous runs. In some embodiments, the distribution plot comprises a distribution of nucleic acid reads (e.g., RNA and/or DNA) across a plurality of samples comprising one or more control samples, where the distribution illustrates the position of the current run in the distribution, thus indicating a relative quality of the current run compared to control samples.
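By way of non-limiting illustration, the position of a current run within a distribution of values from prior runs can be summarized as a percentile rank, as sketched below. The function name, the choice of percentile rank as the summary, and the example values are assumptions made for this illustration and are not drawn from any run history.

```python
def percentile_rank(history, current):
    """Percentage of historical values that are less than or equal to `current`."""
    if not history:
        raise ValueError("history must contain at least one value")
    at_or_below = sum(1 for value in history if value <= current)
    return 100.0 * at_or_below / len(history)


# Hypothetical % Q30 values from five earlier runs, and a current run at 92.0.
history_q30 = [88.0, 90.5, 91.2, 93.0, 94.1]
rank = percentile_rank(history_q30, 92.0)
```

A marker placed at this rank on a distribution plot is one way to indicate the relative quality of the current run compared to previous runs or to control samples.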
Other non-limiting examples of sequencing statistics include, for each nucleotide base, a count of the respective nucleotide base for each respective nucleotide sequence in the plurality of nucleotide sequences (e.g., a base composition). In some embodiments, the count of a respective nucleotide base in each respective nucleotide sequence in the plurality of nucleotide sequences is performed using RNA. In some embodiments, the count of a respective nucleotide base in each respective nucleotide sequence in the plurality of nucleotide sequences is performed using DNA.
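By way of non-limiting illustration, a per-sequence base composition can be computed by counting each nucleotide base, as sketched below. The function name is hypothetical. Treating uracil (U) as thymine (T) so that RNA and DNA sequences share one output format, and ignoring ambiguous bases such as N, are assumptions of this example.

```python
from collections import Counter


def base_composition(seq):
    """Percent A, C, G, and T content of a single nucleotide sequence."""
    # Assumption: RNA input is folded onto the DNA alphabet (U -> T).
    counts = Counter(seq.upper().replace("U", "T"))
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


dna = base_composition("AACG")
rna = base_composition("aacu")
```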
Mapping Statistics.
As illustrated in
In some embodiments, a mapping statistic is an annotation frequency (e.g., a medical relevance annotation, an associated disease, an associated antimicrobial resistance gene, an associated treatment, a number of publications used as evidence, a keyword, and/or a search term). For example, in some embodiments, an annotation indicating “evidence” (e.g., 404) is a number of times the microorganism is reported in a database, including publications, scientific or medical journal articles, abstracts, and/or presentations. In some embodiments, an annotation indicating “evidence” is a frequency that a microorganism reported in a database co-occurs with a disease condition of interest that is also reported in the respective database. In some embodiments, evidence annotations are used to filter putative candidates for diagnosis and therapeutic action, such as by using a filter in the second customizable diagnostic template (e.g., a test profile).
In some embodiments, a mapping statistic is a nucleic acid type 406 (e.g., RNA and/or DNA).
In some embodiments, a mapping statistic is a coverage 408. In some embodiments, coverage refers to a percent coverage of the mapping of the plurality of nucleotide sequences against the reference sequence of the microorganism. In some embodiments, coverage is presented as a graphical representation (e.g., a plot). In some such embodiments, the coverage plot is plotted as a function of depth vector and reference strength.
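As a non-limiting illustration, percent coverage can be computed as the fraction of reference positions covered by at least one mapped nucleotide sequence. The following Python sketch assumes, purely for illustration, that each mapped sequence is summarized as a 0-based, half-open `(start, end)` interval on the reference sequence; the function name `percent_coverage` is hypothetical.

```python
def percent_coverage(intervals, reference_length):
    """Percentage of reference positions covered by at least one interval."""
    if reference_length <= 0:
        raise ValueError("reference length must be positive")
    covered = 0
    current_end = 0  # rightmost reference position counted so far
    for start, end in sorted(intervals):
        # Clip to the reference and skip positions that were already counted.
        start = max(start, current_end, 0)
        end = min(end, reference_length)
        if end > start:
            covered += end - start
            current_end = end
    return 100.0 * covered / reference_length


# Hypothetical alignments on a 200-base reference; the first two overlap.
alignments = [(0, 50), (40, 100), (150, 200)]
coverage = percent_coverage(alignments, reference_length=200)
```

Sorting the intervals first allows overlapping alignments to be merged in a single pass, so that no reference position is counted more than once.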
In some embodiments, a mapping statistic is an average nucleotide identity 410 (e.g., ANI), a quantity of the nucleic acids from the biological or non-biological sample 416 (e.g., a quantity in genome equivalents (GE) per milliliter), a length of a genome of the respective microorganism 418 (e.g., in RNA or DNA), and/or a sequence alignment score (e.g., a bit score 430 and/or a percent sequence identity (PID) 432).
In some embodiments, the plurality of mapping statistics includes a count of nucleotide sequences that map to the reference sequence of the respective microorganism 414 (e.g., RNA and/or DNA).
In some embodiments, the plurality of mapping statistics includes a ratio of (i) a count of nucleotide sequences that map to the reference sequence of the respective microorganism and (ii) a total count of nucleotide sequences in the plurality of nucleotide sequences. For example, in some such embodiments, a mapping statistic is a measure of quantitative detection based on the relative amount of microorganism-originating nucleic acids. In some embodiments, a mapping statistic measures the proportional compositions of nucleic acids in the sample (e.g., the relative abundance of human and non-human nucleotide sequences).
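As a non-limiting illustration, the ratio described above can be computed by dividing the count of nucleotide sequences that map to each reference sequence by the total count of nucleotide sequences. In the following Python sketch, the function name `relative_abundance` and the read counts are hypothetical.

```python
def relative_abundance(mapped_counts, total_reads):
    """Ratio of reads mapped to each reference over all reads in the sample."""
    if total_reads <= 0:
        raise ValueError("total read count must be positive")
    return {name: count / total_reads for name, count in mapped_counts.items()}


# Hypothetical read counts for one sample with 10,000 reads in total.
mapped_counts = {"Escherichia coli": 250, "Homo sapiens": 9000}
abundance = relative_abundance(mapped_counts, total_reads=10_000)
```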
In some embodiments, the plurality of mapping statistics includes a depth 412 of the mapping of respective nucleotide sequences to the reference sequence of the respective microorganism. In some embodiments, the depth of the mapping of the subset of the plurality of nucleotide sequences that maps to the reference sequence of the respective microorganism is a measure of central tendency of the depth of the mapping at a plurality of regions across the reference sequence. For example, in some such embodiments, the plurality of regions includes each base position in the reference sequence of the respective microorganism. In some embodiments, a region spans at least 1 base, at least 2 bases, at least 3 bases, at least 4 bases, at least 5 bases, at least 6 bases, at least 7 bases, at least 8 bases, at least 9 bases, at least 10 bases, at least 20 bases, at least 50 bases, at least 100 bases, at least 1000 bases, at least 10,000 bases, or at least 100,000 bases. In some embodiments, the measure of central tendency is a mean, median, or mode.
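As a non-limiting illustration, the following Python sketch computes a per-base depth from mapped intervals and then summarizes the depth with a measure of central tendency across regions of a chosen size. The function names and the example intervals are hypothetical, and the sketch assumes 0-based, half-open `(start, end)` intervals and a region depth equal to the mean per-base depth within the region.

```python
from statistics import mean, median, mode


def per_base_depth(intervals, reference_length):
    """Number of mapped intervals overlapping each reference position."""
    depth = [0] * reference_length
    for start, end in intervals:
        for pos in range(max(start, 0), min(end, reference_length)):
            depth[pos] += 1
    return depth


def summarize_depth(depth, region_size=1, statistic=mean):
    """Apply a measure of central tendency to the depth of each region."""
    regions = [
        mean(depth[i:i + region_size])
        for i in range(0, len(depth), region_size)
    ]
    return statistic(regions)


# Two hypothetical overlapping alignments on an 8-base reference.
depth = per_base_depth([(0, 4), (2, 6)], reference_length=8)
mean_depth = summarize_depth(depth)                      # per-base mean
median_depth = summarize_depth(depth, statistic=median)  # per-base median
modal_depth = summarize_depth(depth, statistic=mode)     # per-base mode
windowed = summarize_depth(depth, region_size=4)         # mean over 4-base regions
```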
In some embodiments, a mapping statistic is obtained for RNA and/or DNA. For example, as illustrated in
In some embodiments, the plurality of mapping statistics includes an antimicrobial resistance status 422 (e.g., 422-1) detected by determining, for the respective microorganism, a locus annotated for antimicrobial resistance, and when the mapping of the respective nucleotide sequences in the plurality of nucleotide sequences to the reference sequence for the respective microorganism at the respective locus indicates the presence of an antimicrobial resistance marker (e.g., an AMR gene), including the antimicrobial resistance marker in the subset of the plurality of mapping statistics. In some embodiments, the inclusion of the antimicrobial resistance marker (e.g., the AMR gene) in the subset of the plurality of mapping statistics is further dependent on the detection of a microorganism 402 (e.g., 402-1), in the biological or non-biological sample, that is associated with the antimicrobial resistance marker. For example, in some embodiments, an antimicrobial resistance marker will not be detected and included in the plurality of mapping statistics where a microorganism that is associated with and/or that has been reported to express the respective antimicrobial resistance marker is not also detected.
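As a non-limiting illustration of the dependency described above, the following Python sketch includes an antimicrobial resistance marker only when at least one microorganism associated with that marker is also detected. The function name `reportable_amr_markers` and the small lookup table `marker_hosts` are hypothetical and are provided for illustration only.

```python
def reportable_amr_markers(detected_markers, detected_organisms, marker_hosts):
    """Keep markers for which an associated microorganism was also detected."""
    detected = set(detected_organisms)
    return [
        marker
        for marker in detected_markers
        if detected & set(marker_hosts.get(marker, ()))
    ]


# Hypothetical association table: marker -> microorganisms reported to carry it.
marker_hosts = {
    "mecA": {"Staphylococcus aureus"},
    "blaKPC": {"Klebsiella pneumoniae"},
}
markers = reportable_amr_markers(
    detected_markers=["mecA", "blaKPC"],
    detected_organisms=["Staphylococcus aureus"],
    marker_hosts=marker_hosts,
)
```

In this example, the second marker is omitted because no associated microorganism was detected in the sample.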
As illustrated in
Other Metrics.
Referring to
For example, in some embodiments, a respective result set 122 for a biological or non-biological sample 304 in an index of biological or non-biological samples for the user (e.g., the results dashboard 302 and/or sample queue 306) further comprises a review status 316, an accession code, a sample name, a sample type (e.g., sample descriptor, a tissue of origin, a type of biopsy sample, etc.), a sample description (e.g., descriptors for sample handling and/or processing), a test profile, a sample summary 318 (e.g., an overview of the run and/or mapping statistics), a run identifier 314, a batch identifier 310, a run directory (e.g., an identifier for a location of a digital result set in a local or cloud-based computing infrastructure), a run completion time, an analysis platform version, a review platform version, a pipeline version, an analysis version, an analysis completion time, an identity of a user, and/or an identity of a reviewing user (e.g., a medical director and/or a final reviewer). In some embodiments, one or more additional metrics are displayed on a visualization system such as results dashboard 302, where selection of the one or more additional metrics for display is performed using an affordance 326. For example, in
Other non-limiting examples of run, batch, and/or sample metrics include a review status, a run accession number, a positive control identifier, a negative control identifier, a total number of samples in an index of biological or non-biological samples, a number of samples in a batch, a number of batches in a run, sequencing protocol metrics (e.g., RNAseq, whole transcriptome, panel enriched, and/or shotgun workflows), mapping protocol metrics, positivity rates (e.g., positive hits in patients compared to controls), a reference genome identifier (accession number), a uniqueness (e.g., specificity of an alignment of a nucleotide sequence to a region of a genome), and/or an annotation status (e.g., based on a database, published data, etc.). In some embodiments, additional metrics and/or metadata for a sample 304 are displayed upon receiving a request to display an analysis (e.g., customizable user interface 401-1) of a result set for the respective sample. For example, as illustrated in
Quality Control Data.
As illustrated in
Non-limiting examples of sequencing and/or mapping quality control metrics (e.g., 1804, 1904, and/or 2004) include an error rate (e.g., a PhiX error rate), a Q score, a fluorescence intensity (e.g., intensity A and/or intensity C), a measure of reagent fluorescence, a cluster density, a Q score passing metric that includes a count (e.g., a percentage) of bases that pass a Q30 threshold value, a filter passing metric that includes a count (e.g., a percentage) of clusters that pass a quality control filter, one or more adapter dimer metrics, internal controls (e.g., for DNA and/or RNA), a count (e.g., a percentage) of sequencing tiles that passed some or all of the quality control checks, and/or a presence or absence of IC failure. In some embodiments, quality control data is displayed for a positive control sample, a negative control sample, a blank control sample, and/or an analysis sample.
Selection and/or visualization of quality control data, in some embodiments, also includes displaying, on the display, the cutoff thresholds for one or more quality control metrics (e.g., criterion or criteria). For example, in some such embodiments, a score meeting and/or exceeding the cutoff threshold for a quality control metric is required to pass a respective quality control check.
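As a non-limiting illustration, a quality control check of this kind can be expressed as a comparison of each metric against its cutoff threshold. The following Python sketch assumes, for illustration only, that every metric is one for which a higher score is better; the function name `evaluate_qc`, the metric names, and the cutoff values are hypothetical.

```python
def evaluate_qc(metrics, cutoffs):
    """Return pass/fail per metric; a missing metric is treated as a failure."""
    results = {}
    for name, cutoff in cutoffs.items():
        value = metrics.get(name)
        # A score meeting or exceeding the cutoff passes the check.
        results[name] = value is not None and value >= cutoff
    return results


# Hypothetical metrics and cutoffs for one sample.
metrics = {"pct_bases_q30": 92.5, "pct_clusters_passing_filter": 78.0}
cutoffs = {"pct_bases_q30": 85.0, "pct_clusters_passing_filter": 80.0}
qc_status = evaluate_qc(metrics, cutoffs)
sample_passes = all(qc_status.values())
```

A metric for which a lower score is better, such as an error rate, would require the comparison to be reversed.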
In some embodiments, quality control data is displayed as a text-based representation, a graphical representation, and/or a table. For example, referring to
In some embodiments, quality control data is plotted as a bar chart. For example,
Features of the Analysis.
Referring again to
In some embodiments, the request to display the analysis of the result set is afforded by a selection (e.g., a user selection) of a run (e.g., 314) in an index of runs, a batch (e.g., 310) in an index of batches, and/or a sample (e.g., 304) in an index of samples. For example, in some embodiments, the run, batch, and/or sample is selected from an index of runs, batches, and/or samples displayed on a user-interactive results dashboard (e.g., 302). In some embodiments, the method comprises applying, responsive to the request, a first customizable diagnostic template 138-1 to each respective result set 122 corresponding to each respective sample in a batch. In some embodiments, the method comprises applying, responsive to the request, a first customizable diagnostic template 138-1 to each respective result set 122 corresponding to each respective sample in a run group. For example, in some embodiments, the method further comprises a customizable diagnostic template 138-1 that can be applied during batch processing.
In some embodiments, the specifying the subset of the plurality of sequencing statistics, the subset of the set of microorganisms, and the subset of the plurality of mapping statistics is based on a plurality of parameters that are used as selection criteria applied to the plurality of sequencing statistics, the set of microorganisms, and the plurality of mapping statistics. In some embodiments, the plurality of parameters is predefined (see, Parameters for feature selection, below). In some embodiments, the plurality of parameters is user-specified (see, Customizable analysis of presence of microorganisms, below). Parameters for selection criteria are further illustrated, for example, in
In some embodiments, the applying the first customizable diagnostic template to the result set generates a plurality of features including but not limited to the subset of the plurality of sequencing statistics, the subset of the set of microorganisms, and the subset of the plurality of mapping statistics specified by the first customizable diagnostic template. For example, as will be described in further detail below, in some embodiments the plurality of features includes additional features relating to the viewing, review, visualization, modification, validation, and/or reporting of the analysis of presence of microorganisms.
As used herein, the term “features” refers to any of the information and/or data included in or relating to viewing, review, visualization, modification, validation, and/or reporting of the analysis of presence of microorganisms in the result set. In some embodiments, the plurality of features includes the information and/or data presented in the result set after application of the first customizable diagnostic template. In some such embodiments, the plurality of features includes the subset of the plurality of sequencing statistics, the subset of the set of microorganisms, and/or the subset of the plurality of mapping statistics.
For example, in some embodiments, the subset of the plurality of sequencing statistics includes any one or more sequencing statistics as described herein (see, Sequencing statistics, above), and/or any combination thereof. Similarly, in some embodiments, the subset of the set of microorganisms can include any one or more microorganisms as described herein (see, Microorganisms, above), and/or any combination thereof, and the subset of the plurality of mapping statistics includes any one or more mapping statistics as described herein (see, Mapping statistics, above), and/or any combination thereof. In addition, in some embodiments, the first customizable diagnostic template further specifies a subset of the plurality of additional metrics (e.g., run, batch, and/or sample-level metrics and/or metadata), which can include any one or more of the plurality of additional metrics as described herein (see, Other metrics, above).
In some embodiments, the plurality of features comprises metadata for the result set prior to or after the application of the first customizable diagnostic template, including run metrics, QC metrics, sample metadata, user interaction metadata (e.g., time-stamps, user logs, user history), review status, alert status, pathogen status, annotations, etc.
In some embodiments, features also refer to any predefined or customizable parameters for the analysis of the result set, including predefined or customizable parameters for the customization of the first customizable diagnostic template, predefined or customizable parameters for the customization of the second customizable diagnostic template, predefined or customizable parameters for the presentation of information (e.g., sample information, result set analysis data, detected or putatively detected microorganisms, sequencing statistics, mapping statistics, run metrics, QC metrics, and/or result set metadata), and/or predefined or customizable parameters for performing actions related to the presentation or analysis of the result set, including selecting, viewing, reviewing, visualizing, modifying, validating, and/or reporting any of the abovementioned features, and/or any affordances for performing the same (e.g., via user interaction).
Visual Indicators and Graphical Representations.
In some embodiments, features also refer to any visual indicators displayed, on the display, for the presentation of any of the abovementioned features, including sample information, result set analysis data, detected or putatively detected microorganisms, sequencing statistics, mapping statistics, run metrics, QC metrics, and/or result set metadata.
In some embodiments, visual indicators include affordances for performing actions, including selection, viewing, review, visualization, modification, validation, and/or reporting of any of the abovementioned features. For example, in some embodiments, an affordance is a text-based or graphical hyperlink that opens a new display. In some embodiments, an affordance is a text-based or graphical operator that performs an action (e.g., an analysis of a result set, an application of a filter to a result set, an approval of a review, a generation of a report, a transmission of a generated report, etc.). In some embodiments, an affordance is an adjustable interactive feature, such as a slider bar and/or a scroll bar (e.g., for adjusting a detection threshold). In some embodiments, an affordance is a clickable interactive feature, such as a button or a hyperlink. In some embodiments, an affordance is a toggle button, a checkbox, a radio button, and/or a dropdown list.
In some embodiments, visual indicators include graphical representations of any of the abovementioned features.
In some embodiments, visual indicators include text-based representations of any of the abovementioned features.
In some embodiments, a visual indicator is an alphanumeric character, a string of alphanumeric characters, a shape, an image, a color, and/or a pattern.
In some embodiments, visual indicators include a plurality of other metrics and/or metadata, including sequencing statistics, mapping statistics, and/or quality control data that are displayed, on the display, as a text-based or graphical representation, responsive to a detection of a selection of a biological or non-biological sample.
In some embodiments, a graphical representation includes a heatmap, a bar graph, a density plot, a dot plot, a line graph, an area graph, a scatter plot, a box and whisker plot, a violin plot, a histogram, a pie chart, and/or any form of graphical representation as will be apparent to one skilled in the art.
Viewing Features.
Referring to
In some embodiments, the customizable user interface comprises any visual indicators and/or text-based or graphical representations as described above to convey information for one or more features of the analysis.
In some embodiments, in addition to displaying features (i) through (v) above, the customizable user interface further includes a corresponding summary of the subset of the plurality of additional metrics (e.g., run, batch, and/or sample-level metrics and/or metadata) specified by the first customizable diagnostic template.
In some embodiments, the review status 440 for the nucleic acid sequencing data indicates the current review status and the next following review status. For example, in
In some embodiments, as illustrated in
In some embodiments, the customizable user interface further comprises a count of microorganisms detected in the biological or non-biological sample 304. In some embodiments, the customizable user interface further comprises an identity of each microorganism 402 detected in the biological or non-biological sample 304. In some embodiments, the customizable user interface further comprises an identity of an AMR gene 422 detected in the biological or non-biological sample 304.
For example,
In some embodiments, the subset of the set of microorganisms satisfying the minimum mapping threshold is a threshold number of microorganisms with the highest values for a percent sequence alignment, based on an alignment of respective nucleotide sequences to the reference sequence of the respective microorganism. For example, in some embodiments, the subset of the set of microorganisms (e.g., the subset of the set of at least 3, at least 5, or at least 10 microorganisms) is the top N microorganisms with the highest percent sequence alignment. In some such embodiments, N is a positive integer. In some embodiments, N is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more than 30.
In some embodiments, the subset of the set of microorganisms satisfying the minimum mapping threshold is a threshold number of microorganisms with the highest values for a sequencing coverage, based on the mapping of respective nucleotide sequences to the reference sequence of the respective microorganism. For example, in some embodiments, the subset of the set of microorganisms is the top N microorganisms with the highest sequencing coverage. In some such embodiments, N is a positive integer. In some embodiments, N is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more than 30.
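As a non-limiting illustration, selection of the top N microorganisms by a mapping statistic can be performed by sorting on that statistic and retaining the first N entries. In the following Python sketch, the function name `top_n_microorganisms`, the field names, and the example values are hypothetical.

```python
def top_n_microorganisms(mapping_stats, metric, n):
    """Names of the N microorganisms with the highest value of `metric`."""
    if n < 1:
        raise ValueError("N must be a positive integer")
    ranked = sorted(mapping_stats, key=lambda row: row[metric], reverse=True)
    return [row["name"] for row in ranked[:n]]


# Hypothetical mapping statistics for three candidate microorganisms.
mapping_stats = [
    {"name": "organism A", "percent_identity": 99.1, "coverage": 62.0},
    {"name": "organism B", "percent_identity": 97.4, "coverage": 88.5},
    {"name": "organism C", "percent_identity": 91.0, "coverage": 12.3},
]
by_identity = top_n_microorganisms(mapping_stats, "percent_identity", n=2)
by_coverage = top_n_microorganisms(mapping_stats, "coverage", n=2)
```

The same function serves both of the rankings described above, since only the name of the ranked statistic changes.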
In some embodiments, the minimum mapping threshold is determined based on a minimum confidence score obtained using at least a coverage, a uniqueness metric, and an annotation metric for each respective nucleotide sequence in the plurality of nucleotide sequences that maps to the reference sequence for the respective microorganism.
In some embodiments, the minimum mapping threshold is user-customizable. In some embodiments, the minimum mapping threshold is predefined.
In some embodiments, a user interaction is used to view and/or display one or more features in the customizable user interface 401. In some embodiments, a user interaction includes clicking on a feature (e.g., an organism name) to view expanded feature information. In some embodiments, a user interaction includes hovering a pointer (e.g., a mouse) over a feature to view expanded feature information. For example,
In some embodiments, the viewing expanded feature information generates a new display (e.g., a new window, a new tab, or an overlay display such as a popup window). In some embodiments, the new display has an affordance for canceling the new display of the expanded feature information (e.g., a close-out or exit button, a back button, etc.). In some embodiments, the new display of the expanded feature information is canceled by user interaction (e.g., clicking a mouse) on a portion of the display that does not contain the expanded feature information (e.g., for an overlay display or popup window, the display of the expanded feature information can be canceled by clicking anywhere on the screen outside of the popup window).
In some embodiments, the viewing expanded feature information is displayed as a transitory display where visibility is dependent on instant or present user interaction. For example, in some embodiments, the expanded feature information is presented as an overlay only when a user directs a pointer (e.g., a mouse) to a specific location on the display. When the user moves the pointer to a different location on the display, the overlay is removed. For example, as shown in
In some embodiments, upon receiving a user interaction, the display displays a change in a visual indicator. For example, where a visual indicator is an alphanumeric character, a string of alphanumeric characters, a shape, an image, a color, and/or a pattern, a change in a visual indicator can include a change in the alphanumeric character, the string of alphanumeric characters, the shape, the image, the color, and/or the pattern. In some such embodiments, the change in the visual indicator includes a change in the intensity, size, thickness, and/or formatting of any of the above visual indicators. In some embodiments, the change in a visual indicator upon receiving a user interaction includes displaying a visual indicator where a visual indicator was not previously displayed.
Referring to
In some embodiments, as illustrated in
In some embodiments, upon user selection of the fourth affordance, the method further comprises displaying, on the display, a graphical representation of a mapping statistic in the subset of the plurality of mapping statistics. In some embodiments, the expanding the summary of the subset of the plurality of mapping statistics displays the subset of the plurality of mapping statistics. In some embodiments, the display is provided in a new display window (e.g., a popup window). For example, selection of an example fourth affordance 434 (“Show”) illustrated in
In some embodiments, user selection of the fourth affordance for expanding upon a summary of a plurality of values for the subset of the plurality of mapping statistics comprises selecting a respective microorganism in the subset of the set of microorganisms that satisfies a minimum mapping threshold in the result set.
In some embodiments, each microorganism in the subset of the set of microorganisms that satisfies a minimum mapping threshold in the result set that is displayed in the customizable user interface can be selected by a user, thereby expanding upon the summary of the subset of the plurality of mapping statistics for the respective microorganism.
See, for example,
In some embodiments, expanding the summary of the subset of the plurality of sequencing statistics (e.g., via user selection of the third affordance) and/or expanding the summary of the subset of the plurality of mapping statistics (e.g., via user selection of the fourth affordance) further comprises displaying a comment appended to a sequencing statistic and/or a mapping statistic. For instance, in some embodiments, as illustrated in
In some embodiments, the expanding the summary of the subset of the plurality of sequencing statistics (e.g., via user selection of the third affordance) and/or the expanding the summary of the subset of the plurality of mapping statistics (e.g., via user selection of the fourth affordance) further comprises displaying expanded feature information for an antimicrobial resistance marker 422 (e.g., an AMR gene). For instance, selection of an example affordance 436 (“Show”) illustrated in
In some embodiments, the customizable user interface further comprises a summary of a subset of a plurality of sequencing quality control metrics, an affordance for expanding upon the corresponding summary of the subset of the plurality of sequencing quality control metrics, a summary of a subset of a plurality of mapping quality control metrics, an affordance for expanding upon the corresponding summary of the subset of the plurality of mapping quality control metrics, a summary of a subset of a plurality of run quality control metrics, and/or an affordance for expanding upon the corresponding summary of the subset of the plurality of run quality control metrics.
In some embodiments, the customizable user interface further comprises a summary of a subset of a plurality of sample-level quality control metrics, an affordance for expanding upon the corresponding summary of the subset of the plurality of sample-level quality control metrics, a summary of a subset of a plurality of batch-level quality control metrics, an affordance for expanding upon the corresponding summary of the subset of the plurality of batch-level quality control metrics, a summary of a subset of a plurality of run-level quality control metrics, and/or an affordance for expanding upon the corresponding summary of the subset of the plurality of run-level quality control metrics. See, for example,
Parameters for Feature Selection.
In some embodiments, the first customizable diagnostic template comprises a plurality of parameters 140 for specifying and subsequently displaying, on the customizable user interface, (i) the subset of the plurality of sequencing statistics, (ii) the subset of the set of microorganisms, and (iii) the subset of the plurality of mapping statistics. In some embodiments, selection and display of the subset of the set of microorganisms represents at least a preliminary determination of a presence of the subset of microorganisms in the biological or non-biological sample. Therefore, to ensure accurate determination of the presence of microorganisms, in accordance with some embodiments of the present disclosure, the selection of parameters for applying the first customizable diagnostic template to the result set can be optimized as well. In some embodiments, one or more parameters is selected to specify a minimum mapping threshold in the result set for the respective nucleotide sequences in the plurality of nucleotide sequences that map to the corresponding reference sequence of one or more respective microorganisms in the set of microorganisms. Minimum mapping thresholds are further disclosed herein (see, for example, the section entitled “Viewing features,” above).
Non-limiting examples of parameters 140 used, in some embodiments, for applying the first customizable diagnostic template to the result set include any of the sequencing statistics, mapping statistics, additional metrics, quality control metrics, and/or additional features as disclosed herein and/or as illustrated in
In some embodiments, the values of the parameters of the first customizable diagnostic template are predefined (e.g., automated). In some embodiments, the values of the parameters of the first customizable diagnostic template are user-specified (e.g., customizable). Customization of parameters (e.g., for feature selection and determination of presence of microorganisms) is described in detail in a following section (see, Customizable analysis of presence of microorganisms).
In some embodiments, a value of a parameter is a percentage value (e.g., a numeric value between 0 and 100). For example, a cutoff threshold for a parameter (e.g., a coverage, an average nucleotide identity, an RNA sensitivity, an RNA specificity, a DNA sensitivity, a DNA specificity, etc.) is between 0 and 10%, between 10 and 20%, between 20 and 30%, between 30 and 40%, between 40 and 50%, between 50 and 60%, between 60 and 70%, between 70 and 80%, between 80 and 90% or between 90 and 100%. In some embodiments, the cutoff threshold for a parameter is at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%.
In some embodiments, a value of a parameter is a binary status (e.g., a presence or an absence of a status and/or classification). For example, in some embodiments, a status is pathogen/not pathogen, medically relevant/not medically relevant, validated/not validated (e.g., a review status), and/or quality control check pass/fail.
In some embodiments, a parameter is selected from one or more finite classifications and/or annotations (e.g., microorganism classification (B, F, V, and/or P), etc.).
In some embodiments, a parameter is a keyword or alphanumeric string (e.g., a disease annotation, an organism name, and/or a phylogenetic lineage). In some embodiments, the parameter is a predefined keyword or alphanumeric string that is selected from a finite list of options (e.g., using a dropdown list and/or a checkbox).
In some embodiments, the parameter is a keyword or alphanumeric string that is specified using a free-text search (e.g., via a manual entry box).
In some embodiments, a value of a parameter is a minimum amount of evidence, where evidence is defined as a publication (e.g., in a medical journal, academic journal, and/or conference abstract), annotation (e.g., in a genome database), and/or co-occurrence of a microorganism with a feature of interest, such as a disease condition. In some such embodiments, the value of the parameter is between 0 and 100,000, between 0 and 50,000, between 0 and 20,000, or between 0 and 10,000.
In some embodiments, a parameter is an annotation (e.g., of a microorganism with a disease condition and/or other clinical or diagnostic feature of interest). For example, in some embodiments, a respective microorganism is annotated with an annotation if a co-occurrence of the microorganism and the feature of interest is observed at least a minimum number of times in, e.g., clinical or academic literature, pathogen databases, and/or other resources such as a digital library or a nucleic acid database.
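As a non-limiting illustration, annotation based on a minimum number of observed co-occurrences can be sketched as a count over (microorganism, feature of interest) pairs. In the following Python sketch, the function name `annotate_by_cooccurrence` and the example records are hypothetical, and each record is assumed to represent one observation of a co-occurrence in a literature source or database.

```python
from collections import Counter


def annotate_by_cooccurrence(records, min_count):
    """Map each microorganism to features co-observed at least `min_count` times."""
    pair_counts = Counter(records)
    annotations = {}
    for (organism, feature), count in pair_counts.items():
        if count >= min_count:
            annotations.setdefault(organism, set()).add(feature)
    return annotations


# Hypothetical co-occurrence observations, one tuple per observation.
records = [
    ("organism A", "pneumonia"),
    ("organism A", "pneumonia"),
    ("organism A", "sepsis"),
    ("organism B", "pneumonia"),
]
annotations = annotate_by_cooccurrence(records, min_count=2)
```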
Additionally, in some embodiments, the value of a parameter is any value for a respective feature that is known in the art to be standard or substantially standard for reliable sample processing and analysis, such as passing a quality control check. In some embodiments, the value of a parameter is empirically determined (e.g., based on laboratory experimentation). In some embodiments, the value of a parameter is optimized for detection of a specific microorganism, disease condition, and/or antimicrobial resistance marker of interest.
Other non-limiting parameters for feature selection include depth, read count, and/or reference length. For example, in some embodiments, the cutoff threshold for depth is at least 1, at least 2, at least 5, at least 10, at least 20, at least 100, at least 200, at least 500, or at least 1,000.
In some embodiments, the cutoff threshold for RNA read count is between 0 and 10 million, between 0 and 5 million, between 0 and 1 million, between 0 and 750,000, between 0 and 500,000, between 0 and 200,000, between 0 and 100,000, between 0 and 50,000, or between 0 and 20,000.
In some embodiments, the cutoff threshold for DNA read count is between 0 and 1 billion, between 0 and 100 million, between 0 and 10 million, between 0 and 5 million, between 0 and 1 million, between 0 and 750,000, between 0 and 500,000, or between 0 and 300,000.
In some embodiments, the cutoff threshold for RNA reference length is between 0 and 1 million, between 0 and 500,000, between 0 and 100,000, between 0 and 50,000, between 0 and 10,000, between 0 and 7,000, between 0 and 5,000, between 0 and 2,000, or between 0 and 1,500.
In some embodiments, the cutoff threshold for DNA reference length is between 0 and 1 billion, between 0 and 100 million, between 0 and 10 million, or between 0 and 5 million.
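As a non-limiting illustration, a set of cutoff thresholds of the kinds described above can be grouped into a single parameter object and applied to the mapping statistics of each candidate microorganism. The following Python sketch is a minimal example under stated assumptions: the class `TemplateParameters`, its field names, the default values, and the example result set are all hypothetical and do not represent the parameters of any particular diagnostic template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateParameters:
    """Hypothetical cutoff thresholds; defaults accept every candidate."""
    min_coverage_pct: float = 0.0
    min_ani_pct: float = 0.0
    min_depth: float = 0.0
    min_read_count: int = 0


def passes(stats, params):
    """True when every mapping statistic meets or exceeds its cutoff."""
    return (
        stats["coverage_pct"] >= params.min_coverage_pct
        and stats["ani_pct"] >= params.min_ani_pct
        and stats["depth"] >= params.min_depth
        and stats["read_count"] >= params.min_read_count
    )


def apply_template(result_set, params):
    """Names of the microorganisms that satisfy all cutoff thresholds."""
    return [row["name"] for row in result_set if passes(row, params)]


params = TemplateParameters(
    min_coverage_pct=70.0, min_ani_pct=95.0, min_depth=10.0, min_read_count=100
)
result_set = [
    {"name": "organism A", "coverage_pct": 85.0, "ani_pct": 98.2,
     "depth": 35.0, "read_count": 1200},
    {"name": "organism B", "coverage_pct": 40.0, "ani_pct": 99.0,
     "depth": 50.0, "read_count": 5000},
    {"name": "organism C", "coverage_pct": 90.0, "ani_pct": 96.0,
     "depth": 4.0, "read_count": 300},
]
selected = apply_template(result_set, params)
```

An immutable parameter object is used in this sketch so that a predefined set of values cannot be altered inadvertently; a user-specified set of values would be represented by constructing a new object.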
Additional parameters and example ranges for the same are illustrated in
External Links to Databases.
In some embodiments, upon user selection of the fourth affordance, the method further comprises displaying, on the display, an affordance (e.g., 512 and/or 628; see
In some such embodiments, additional information for one or more features is accessible through external links, including sequences of reference sequences (e.g., BLAST, NCBI) and/or databases for detected or otherwise selected microorganisms (e.g., EMBL-EBI, GenBank, Ensembl, EuPathDB, The Human Microbiome Project, Pathogen Portal, RDP, SILVA, GREENGENES, EBI Metagenomics, EcoCyc, PATRIC, TBDB, PlasmoDB, the Microbial Genome Database (MBGD), and/or the Microbial Rosetta Stone Database). See, for example, Zhulin, 2015, “Databases for Microbiologists,” J Bacteriol 197:2458-2467, doi: 10.1128/JB.00330-15; Uchiyama et al., 2019, “MBGD update 2018: microbial genome database based on hierarchical orthology relations covering closely related and distantly related comparisons,” Nuc Acids Res., 47 (D1), D382-D389, doi: 10.1093/nar/gky1054; and Ecker et al., 2005, “The Microbial Rosetta Stone Database: A compilation of global and emerging infectious microorganisms and bioterrorist threat agents,” BMC Microbiology 5, 19, doi: 10.1186/1471-2180-5-19, each of which is hereby incorporated by reference herein in its entirety, for additional databases that can be used for analyzing microorganisms.
In some embodiments, a selection of the affordance for accessing a reference sequence database transmits a nucleotide sequence (e.g., the sequence of a reference genome) corresponding to the respective microorganism to the reference sequence database. In some embodiments, the selection of the affordance for accessing a reference sequence database populates an affordance for a manual entry of a text string (e.g., a search box) with the nucleotide sequence (e.g., the sequence of a reference genome) corresponding to the respective microorganism.
Modes of Use.
The review and visualization tools disclosed herein include a plurality of different metrics that provide a user (e.g., a laboratory or medical practitioner) with a comprehensive suite of results in an accessible, streamlined format (e.g., sequencing validation, sequencing statistics, mapping validation, mapping statistics, microorganism detection, microbe-specific annotations, pathogen information, antimicrobial resistance gene expression, and therapeutic treatments, among others). As discussed above, such features allow the analysis and interpretation of nucleic acid sequencing data by users without advanced skills in each and every one of the various aspects of the analysis. In some embodiments, the provided review and visualization tools present a summary of the information relevant to analyzing the presence of microorganisms in a respective biological or non-biological sample such that it can be efficiently examined, understood, and/or reviewed by a practitioner. Further customization is also possible for situations that necessitate fine-tuning.
Automated Analysis of Presence of Microorganisms.
In some embodiments, detection of microorganisms is performed using an automated process, using predefined (e.g., default) thresholds for a plurality of parameters (see, Parameters for feature selection, above). However, these thresholds can be adjusted by a user or practitioner, as discussed in the following sections.
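As a non-limiting illustration, the interplay between predefined thresholds and user adjustments can be sketched as follows. The parameter names (`rna_reads`, `rna_percent_coverage`), the default values, and the `detect` function are hypothetical and chosen only for this example; they are not values prescribed by the present disclosure.

```python
from typing import Dict, List, Optional

# Hypothetical default cutoffs; names and values are illustrative assumptions.
DEFAULT_CUTOFFS: Dict[str, float] = {
    "rna_reads": 10,
    "rna_percent_coverage": 5.0,
}


def detect(organisms: List[dict],
           overrides: Optional[Dict[str, float]] = None) -> List[str]:
    """Return names of organisms whose statistics meet every cutoff.

    Defaults are used unless the caller overrides individual cutoffs.
    """
    cutoffs = {**DEFAULT_CUTOFFS, **(overrides or {})}
    return [
        org["name"]
        for org in organisms
        if all(org.get(key, 0) >= minimum for key, minimum in cutoffs.items())
    ]


organisms = [
    {"name": "Organism A", "rna_reads": 250, "rna_percent_coverage": 42.0},
    {"name": "Organism B", "rna_reads": 12, "rna_percent_coverage": 6.5},
    {"name": "Organism C", "rna_reads": 3, "rna_percent_coverage": 1.0},
]

automated = detect(organisms)                        # default thresholds only
customized = detect(organisms, {"rna_reads": 100})   # user raises one cutoff
```

In this sketch, the automated call uses only the defaults, while the customized call shows a single cutoff being raised without disturbing the remaining defaults.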
Customizable Analysis of Presence of Microorganisms.
In particular, in addition to an automated process for analysis of the presence of microorganisms, in some embodiments, any one of the parameters and/or detection thresholds can be adjusted based on user preference and/or a priori knowledge. In some embodiments, customizing occurs through a user interaction with one or more affordances.
For example, in some embodiments, an affordance is a text-based or graphical hyperlink that generates a new display (e.g., affordance 434). In some embodiments, an affordance is a text-based or graphical operator that performs an action (e.g., an analysis of a result set, an application of a filter to a result set, an approval of a review, a generation of a report, and/or a transmission of a generated report). In some embodiments, an affordance is an adjustable interactive feature, such as a slider bar and/or a scroll bar (e.g., for adjusting a detection threshold). In some embodiments, an affordance is a clickable interactive feature, such as a button or a hyperlink. In some embodiments, an affordance is a toggle button, a checkbox, a radio button, and/or a dropdown list. In some embodiments, an affordance is a manual entry box (e.g., that accepts a user-inputted alphanumeric character and/or an alphanumeric text string).
In some embodiments, the customizable user interface also comprises an affordance for storing the parameters (e.g., as a profile). Storing parameters as test profiles is further described, for instance, in the section entitled “Administrator control,” below, with reference to
In some embodiments, one or more mapping statistics in the subset of the plurality of mapping statistics can be modified. In some such embodiments, the method further comprises displaying, on the display, an affordance for amending the subset of the plurality of mapping statistics. For example, in some embodiments, upon user selection of the fourth affordance (e.g., for expanding the subset of the plurality of mapping statistics), the method further comprises displaying, on the display, an affordance for amending the subset of the plurality of mapping statistics. Modifications to the subset of the plurality of mapping statistics, for a respective microorganism, to be displayed on the display can be performed by adjusting one or more parameters, such as test profile parameters 2304 illustrated in
In some embodiments, one or more sequencing statistics in the subset of the plurality of sequencing statistics can be modified. In some embodiments, the method further comprises displaying, on the display, an affordance for amending the subset of the plurality of sequencing statistics. Modifications to the subset of the plurality of sequencing statistics to be displayed on the display can be performed by adjusting one or more parameters, such as adjustable cutoffs for sequencing statistics 128-1, . . . , 128-M-3 illustrated in
In some embodiments, one or more microorganisms in the subset of the set of microorganisms can be modified. In some embodiments, the method further comprises displaying, on the display, an affordance for amending the subset of the set of microorganisms. Modifications to the subset of the set of microorganisms to be displayed on the display can be performed by adjusting one or more parameters, such as selection of relevant subclasses 2306 and/or evidence categories 2308 illustrated in
In some embodiments, the method further comprises displaying, on the display, an affordance for amending any additional features as described above.
In some embodiments, the method further comprises displaying, on the display, an affordance for amending any of the visual indications as described above (e.g., on the system for review and visualization, the dashboard, and/or the customizable user interface). Amendments to features of the result set analysis, including mapping statistics, sequencing statistics, and/or subsets of microorganisms, are further described herein, e.g., in the sections entitled “Filters” and “Administrator control,” below.
Review Status and Approvals.
As described above, in some embodiments, the customizable user interface 401 comprises an affordance for updating the review status for the nucleic acid sequencing data. In some embodiments, the method further comprises obtaining an approval for the customizable user interface. In some embodiments, the method further comprises obtaining a plurality of approvals for the customizable user interface. For example, in some such embodiments, the analysis of the result set includes accepting one or more approvals (e.g., by a laboratory or medical technician, supervisor and/or director) prior to final approval of the analysis of the results set.
In some embodiments, the customizable user interface comprises an affordance for submitting a review (e.g., for a sample) (e.g., affordance 604). In some embodiments, the customizable user interface comprises an affordance for canceling a review (e.g., for the sample) (e.g., affordance 608). In some embodiments, the customizable user interface comprises an affordance for resetting a review (e.g., to a default state) (e.g., affordance 606). In some embodiments, the affordance for updating the review status is an affordance for initiating a review of the analysis (e.g., review affordance 332 in results dashboard 302).
In some embodiments, each approval stage for a respective sample is indicated by a review status (e.g., review status 440 in customizable user interface 401 and/or status 316 in results dashboard 302). Furthermore, in some embodiments, selection and/or approval at any stage of the approval process (e.g., first, second, third, and/or final approval) can be tagged with a user identity, an access time-stamp, and/or a record of each change made in the respective sample.
In some embodiments, submission of the review updates the review status from a first review status to a second review status. For example, in some embodiments, submission of a review for a sample with a “first review” status updates the review status to “second review”. Similarly, in some embodiments, additional submissions of reviews sequentially change the review status from “second review” to “medical director review,” “final review,” and “approved.” For example, in
In some embodiments, final approval of a sample (e.g., a control and/or an analysis sample) removes the sample 304 from the index of biological or non-biological samples 306 (e.g., the list of one or more pending samples). In some such embodiments, when the review status is finally approved, the sample is displayed in a second index of biological or non-biological samples (e.g., a “results history” page) and is no longer visible in the first index of biological or non-biological samples (e.g., the “pending samples” dashboard). In some embodiments, the customizable user interface further comprises a result history comprising at least the first customizable diagnostic template applied to the result set, wherein the review status of the result set is approved.
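The sequential review statuses and the tagging of each approval stage described above can be sketched, under stated assumptions, as a simple state progression. The status names follow the example given above; the `SampleReview` class, its fields, and the `partition` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

# Status order follows the example in the text; everything else is assumed.
REVIEW_STATUSES = [
    "first review",
    "second review",
    "medical director review",
    "final review",
    "approved",
]


@dataclass
class SampleReview:
    sample_id: str
    status: str = REVIEW_STATUSES[0]
    # Each entry: (user identity, UTC time stamp, old status, new status).
    audit_log: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def submit(self, user: str) -> str:
        """Advance to the next review status and record who did it and when."""
        position = REVIEW_STATUSES.index(self.status)
        if position == len(REVIEW_STATUSES) - 1:
            raise ValueError("sample is already approved")
        new_status = REVIEW_STATUSES[position + 1]
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((user, stamp, self.status, new_status))
        self.status = new_status
        return new_status


def partition(samples: List[SampleReview]):
    """Split samples into a pending index and a results history."""
    pending = [s for s in samples if s.status != "approved"]
    history = [s for s in samples if s.status == "approved"]
    return pending, history


approved_sample = SampleReview("S-001")
for reviewer in ["tech_1", "supervisor_1", "director_1", "director_1"]:
    approved_sample.submit(reviewer)
open_sample = SampleReview("S-002")
pending, history = partition([approved_sample, open_sample])
```

Here a finally approved sample appears only in the history list, mirroring its removal from the index of pending samples.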
As illustrated in
In some embodiments, any one of the results in the results set can be separately approved or rejected, including the presence or absence of a detected microorganism (e.g., “validated” and/or “passed”), a passing score for a quality control metric (e.g., “passed”), and/or a passing score for a sequencing or mapping statistic compared to a filtering threshold (e.g., “passed”).
For example,
In some embodiments, upon user selection of the third affordance (e.g., for expanding the summary of the subset of the plurality of sequencing statistics), the method further comprises displaying, on the display, an affordance for validating (e.g., approving or rejecting) the subset of the plurality of sequencing statistics. In some embodiments, upon user selection of the fourth affordance (e.g., for expanding the summary of the subset of the plurality of mapping statistics), the method further comprises displaying, on the display, an affordance for validating (e.g., approving or rejecting) the subset of the plurality of mapping statistics. In some embodiments, a mapping statistic (e.g., a display comprising expanded microorganism information) in the subset of the plurality of mapping statistics can be individually validated. In some embodiments, one or more samples, results, or metrics can be flagged for further review. For example, an affordance 514 for validating and/or displaying a validation status of a subset of the plurality of sequencing statistics 128 and/or a subset of the plurality of mapping statistics 126 for a respective microorganism 402 is illustrated in
Comment Function.
In some embodiments, upon user selection of the fourth affordance, the method further comprises displaying, on the display, an affordance for appending a user-inputted text string (e.g., a comment and/or note) to the subset of the plurality of mapping statistics.
For example, as illustrated in
In some embodiments, the user-inputted text string is feedback or an internal note. In some embodiments, the affordance for appending a user-inputted text string is accessible to a reviewer, e.g., a first, second, third, or final reviewer. In some embodiments, a user-inputted text string can be edited. In some embodiments, a user-inputted text string is visible to other users, e.g., a comment provided by a first reviewer is visible to a final reviewer.
Alert Status.
In some embodiments, the customizable user interface includes an alert status indicator (e.g., N: no call; A: alert; C: critical). In some embodiments, the alert status indicator is applied to a microorganism in the subset of the set of microorganisms to flag the respective microorganism for review.
Quick Access Tools.
In some embodiments, the customizable user interface comprises an affordance for viewing and/or selecting a biological or non-biological sample 304 for analysis of presence of microorganisms. In some such embodiments, the affordance for viewing and/or selecting biological or non-biological samples is accessible from a first customizable user interface of a first biological or non-biological sample. In some embodiments, a selection of a second biological or non-biological sample, using the affordance for viewing and/or selecting biological or non-biological samples, applies the first customizable diagnostic template 138-1 to the selected second biological or non-biological sample and displays a corresponding second customizable user interface for the second biological or non-biological sample. For example,
Customization of User Interface.
Additional elements that can be customized include specific parameters or metrics to be presented on the display for each sample, batch, and/or run. In some embodiments, an affordance is provided for modifying the display.
For example, in some embodiments, e.g., as illustrated in
In accordance with some embodiments of the present disclosure,
In some embodiments, an affordance is provided for modifying the subset of the set of microorganisms that is displayed on the customizable user interface. For example, in some embodiments, a user interaction with the affordance causes display, on the customizable user interface, of a microorganism in the set of microorganisms. In some embodiments, a user interaction with the affordance causes display of all of the microorganisms in the set of microorganisms. For example,
In some embodiments, a user interaction with the affordance (e.g., “show all” affordance 442) displays all of the microorganisms included in the result set (e.g., all microorganisms to which the plurality of nucleotide sequences were mapped). For example,
Search Function.
In some embodiments, an affordance is provided for selecting, from the one or more biological or non-biological samples in the index of biological or non-biological samples, a biological or non-biological sample based on an input (e.g., a value) for a respective feature in one or more features of the biological or non-biological sample.
For example, in some embodiments, the user interface includes, for each feature in the one or more features, an affordance for applying a filter to the index of biological or non-biological samples, based on an input for the respective feature.
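By way of a hedged example, a per-feature filter over the index of samples might be implemented along the following lines. The feature names (`sample_id`, `sample_type`, `status`) and the case-insensitive substring matching are assumptions made for this sketch, not requirements of the disclosure.

```python
from typing import Dict, List


def filter_index(index: List[Dict[str, str]],
                 criteria: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep samples whose value for every requested feature contains the input.

    Matching is a case-insensitive substring test (an illustrative choice).
    """
    def matches(sample: Dict[str, str]) -> bool:
        return all(
            text.lower() in str(sample.get(feature, "")).lower()
            for feature, text in criteria.items()
        )

    return [sample for sample in index if matches(sample)]


index = [
    {"sample_id": "S-001", "sample_type": "plasma", "status": "first review"},
    {"sample_id": "S-002", "sample_type": "swab", "status": "second review"},
    {"sample_id": "S-003", "sample_type": "plasma", "status": "second review"},
]

hits = filter_index(index, {"sample_type": "plasma", "status": "second"})
```

Each key in `criteria` corresponds to one per-feature affordance, so entering values in several affordances narrows the index cumulatively.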
In an example,
Addition of Microorganisms.
In some embodiments, an affordance is provided for adding a microorganism to the set of microorganisms. In some embodiments, an affordance is provided for adding a microorganism to the subset of the set of microorganisms.
In some embodiments, the affordance 1402 includes an affordance 1408 for assigning a detection status to the microorganism (e.g., detected and/or inconclusive). In some embodiments, the affordance 1402 includes an affordance for assigning a category 1410 to the microorganism (e.g., potential pathogen and/or additional microorganism). In some embodiments, the affordance 1402 includes an affordance 1406 for assigning a validation status to the microorganism (e.g., validated and/or not validated). In some embodiments, the affordance 1402 includes an affordance 1414 for assigning an alert to the microorganism (e.g., no alert, alert, and/or critical). In some embodiments, the affordance 1402 includes an affordance 1424 for assigning an abundance status to the microorganism (e.g., computed, omitted, and/or manual). In some embodiments, the affordance 1402 includes an affordance for assigning an abundance value to the microorganism (e.g., a percentage). In some embodiments, the affordance 1402 includes an affordance 1412 for assigning a class type to the microorganism. In some embodiments, the affordance 1402 includes an affordance 1416 for assigning a number of RNA reads to the microorganism. In some embodiments, the affordance 1402 includes an affordance 1420 for assigning an RNA reference length to the microorganism. In some embodiments, the affordance 1402 includes an affordance 1418 for assigning a number of DNA reads to the microorganism. In some embodiments, the affordance 1402 includes an affordance 1422 for assigning a DNA reference length to the microorganism. In some embodiments, the affordance 1402 includes an affordance 1426 for assigning a report comment to the microorganism. In some embodiments, the affordance 1402 includes an affordance 1428 for assigning an internal note to the microorganism.
In some embodiments, a feature and/or a value for the respective feature is added to the microorganism by a user selection of an entry in a list of entries (e.g., from a dropdown list and/or a checkbox list). In some embodiments, the feature and/or a value for the respective feature is added to the microorganism by manual entry of a text-string. In some embodiments, the affordance 1402 (e.g., “Add Organism Form”) for adding a microorganism to the subset of the set of microorganisms further includes an affordance 1430 for finalizing and submitting the added organism to the subset of the set of microorganisms (e.g., “Add Organism”).
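One possible data model for the entries collected when a microorganism is added is sketched below. The permitted values mirror the examples given above; the `AddedOrganism` class, its field names, and the rule that a manual abundance status requires an abundance value are assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Permitted values follow the examples in the text.
DETECTION_STATUSES = {"detected", "inconclusive"}
ALERTS = {"no alert", "alert", "critical"}
ABUNDANCE_STATUSES = {"computed", "omitted", "manual"}


@dataclass
class AddedOrganism:
    name: str
    detection_status: str = "detected"
    category: str = "additional microorganism"
    validated: bool = False
    alert: str = "no alert"
    abundance_status: str = "omitted"
    abundance_percent: Optional[float] = None
    rna_reads: int = 0
    dna_reads: int = 0
    report_comment: str = ""
    internal_note: str = ""

    def __post_init__(self) -> None:
        # Reject values that a dropdown list would not have offered.
        if self.detection_status not in DETECTION_STATUSES:
            raise ValueError(f"unknown detection status: {self.detection_status}")
        if self.alert not in ALERTS:
            raise ValueError(f"unknown alert: {self.alert}")
        if self.abundance_status not in ABUNDANCE_STATUSES:
            raise ValueError(f"unknown abundance status: {self.abundance_status}")
        # Assumed rule: a manual abundance needs an explicit value.
        if self.abundance_status == "manual" and self.abundance_percent is None:
            raise ValueError("manual abundance requires a value")


def add_organism(subset: List[AddedOrganism],
                 organism: AddedOrganism) -> List[AddedOrganism]:
    """Finalize the entry by appending it to the displayed subset."""
    subset.append(organism)
    return subset


subset: List[AddedOrganism] = []
add_organism(subset, AddedOrganism(name="Organism X", alert="critical",
                                   abundance_status="manual",
                                   abundance_percent=12.5))
```

Validating at construction time corresponds to restricting the user to list entries, while free-text fields such as the report comment remain unconstrained.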
Edit Results Summary.
In some embodiments, the customizable user interface includes a result summary including a status of the analysis of the result set based on the plurality of mapping statistics for each respective microorganism in the set of microorganisms, where the status is selected from the group consisting of: invalid (e.g., no organisms detected and/or failed total IC norm reads), inconclusive, microorganisms detected, microorganisms detected including potential pathogens, and no microorganisms detected. In some embodiments, the customizable user interface includes a status of an analytical sensitivity based on the mapping of the plurality of nucleotide sequences against the reference sequences of the set of microorganisms. In some embodiments, the analytical sensitivity status is adequate or reduced.
Pathogen Status.
In some embodiments, an affordance is provided for indicating a pathogen status (e.g., pathogen or not pathogen) for a microorganism in the subset of the set of microorganisms. For example,
Export Results.
In some embodiments, the customizable user interface 401 further includes an affordance for exporting a summary of the analysis of the results set 122. In some embodiments, the customizable user interface 401 further includes an affordance for previewing an exported summary of the analysis of the results set 122. In some embodiments, the customizable user interface 401 further includes an affordance for generating a report of the analysis of the results set 122. In some embodiments, the customizable user interface 401 further includes an affordance for previewing a report of the analysis of the results set 122. In some embodiments, the exported results include results for a respective biological or non-biological sample 304. In some embodiments, the exported results include results for a respective organism (e.g., microorganism 402).
An exported summary or a report can be customized by selecting the features to be included. In some embodiments, the customizable user interface includes an affordance for selecting, for the previewing of the exported summary, the subset of the plurality of sequencing statistics and the subset of the plurality of mapping statistics from the results set. In some embodiments, the customizable user interface further includes an affordance for selecting, for the report, the subset of the plurality of sequencing statistics and the subset of the plurality of mapping statistics from the results set to be included in the report.
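A minimal sketch of feature selection for an exported summary is given below, assuming a comma-separated output format. The `export_summary` function, the feature keys, and the sample values are hypothetical; the disclosure does not prescribe a particular export format.

```python
import csv
import io
from typing import Dict, List


def export_summary(result: Dict[str, object], selected: List[str]) -> str:
    """Write only the selected features as CSV: one header row, one value row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=selected, lineterminator="\n")
    writer.writeheader()
    # Features missing from the result are exported as empty cells.
    writer.writerow({key: result.get(key, "") for key in selected})
    return buffer.getvalue()


result = {
    "sample_id": "S-001",
    "sample_type": "plasma",
    "rna_total_raw_reads": 1200000,
    "dna_total_raw_reads": 950000,
}

preview = export_summary(result, ["sample_id", "rna_total_raw_reads"])
```

The same function can back both a preview and a final export, since the selection list alone determines which features appear.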
For example,
In some embodiments, as illustrated in
In some embodiments, the plurality of features that can be selected or deselected for inclusion in the exported results and/or the report (e.g., using feature selection displays 1614 and/or 1616) include any one or more of the sequencing statistics, mapping statistics, additional metrics, quality control metrics, set of microorganisms, and/or other metadata associated with a respective sample, batch, or run as disclosed herein. In some embodiments, the plurality of features that can be selected or deselected for inclusion in the exported results and/or the report (e.g., using feature selection displays 1614 and/or 1616) include: a platform, environment, project, software version (e.g., Explify version), review portal version, analysis pipeline version, analysis version, run ID, run directory, run start time, run completion time, batch ID, results directory, total run yield, percent bases that pass a Q30 threshold, cluster density, percent clusters passing a filter, PhiX error rate, percent of sequencing tiles that pass a selection criterion, intensity A, intensity C, chemistry, instrument ID, accession number, sample ID, sample name, sample type, results ready time, MD review start time, MD review completion time, report transmission time, positive control ID, positive control lot, negative control ID, negative control lot, RNA IC ID, RNA IC lot, RNA MS2 norm reads, RNA MS2 raw reads, RNA Qbeta norm reads, RNA Qbeta raw reads, DNA IC ID, DNA IC lot, DNA T7 norm reads, DNA T7 raw reads, DNA PR772 norm reads, DNA PR772 raw reads, RNA library type, RNA library name, RNA seq sample, RNA total raw reads, RNA post-adaptor reads, RNA post-quality reads, RNA unique reads, RNA percent unique reads, RNA entropy, RNA G content, RNA library Q score, RNA library size, RNA library concentration, RNA sample index, DNA library type, DNA library name, DNA seq sample, DNA total raw reads, DNA post-adaptor reads, DNA post-quality reads, DNA unique reads, DNA percent unique reads, DNA entropy, DNA G content, DNA library Q score, DNA library size, DNA library concentration, DNA sample index, and/or detected organisms.
In some embodiments, the plurality of features that can be selected or deselected for inclusion in the exported results and/or the report (e.g., using feature selection displays 1614 and/or 1616) further include: an organism name, class type, subclasses, reporting ID, review information, positive control organism name, potential pathogen information, medically relevant information, validation information, passed cutoff information, nucleic acid information, antibiotic information, associated organisms, host detection status, RNA percent coverage, RNA sensitivity cutoff, RNA specificity cutoff, RNA bit score, RNA bit score cutoff, RNA average nucleotide identity, RNA median depth, RNA reads, RNA quantity, RNA reference length, RNA overall covered bases, RNA total bases, DNA percent coverage, DNA sensitivity cutoff, DNA specificity cutoff, DNA bit score, DNA bit score cutoff, DNA average nucleotide identity, DNA median depth, DNA reads, DNA quantity, DNA reference length, DNA overall covered bases, and/or DNA total bases.
In some embodiments, any of the features disclosed in the foregoing paragraphs can be modified or customized via user interaction.
In some embodiments, other features are customizable and/or user interactive, as will be apparent to one skilled in the art. In some embodiments, the customization and/or user interaction is performed using any of the user inputs and/or affordances disclosed herein, and/or any substitutions, modifications, additions, deletions, and/or combinations thereof. In some embodiments, the method includes, upon selection of an affordance for a reporting action (e.g., report generation 1604 and/or exporting results 1608), generating a report. Report generation is further described herein, e.g., in the section entitled “Report generation,” below, with reference to
Filters.
As described above, in some embodiments, the second customizable diagnostic template includes a plurality of filters for filtering the result set for the biological or non-biological sample, based on one or more features. For example, the second customizable template can be applied to the result set to further limit the result set to display information related to specific microorganisms, specific pathogens, specific disease conditions, and/or any other feature of interest. In some embodiments, the second customizable template can be applied to the result set to further limit the result set to display information that passes one or more cutoff thresholds. As illustrated in
In some embodiments, upon user selection of the second affordance, the method further comprises applying the second customizable diagnostic template to the result set by applying a filter to the subset of the plurality of sequencing statistics, the subset of the set of microorganisms, and the subset of the plurality of mapping statistics.
In some embodiments, the second customizable diagnostic template includes a disease condition filter, a microorganism in the set of microorganisms is annotated with the disease condition based on a threshold number of co-occurrences (e.g., evidence 1206) of the microorganism and the disease condition in a database (e.g., a disease annotation in a database), and the applying the filter selectively retains one or more microorganisms annotated with the disease condition.
In some embodiments, the threshold number of co-occurrences of the microorganism is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 50, at least 100, at least 500, at least 1000, at least 2000, or at least 5000.
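The disease condition filter described above can be illustrated, under stated assumptions, as a lookup of co-occurrence counts followed by a threshold comparison. The `disease_filter` function, the organism names, and the counts shown are hypothetical placeholders and do not reflect the contents of any actual database.

```python
from typing import Dict, List, Tuple


def disease_filter(organisms: List[str],
                   co_occurrences: Dict[Tuple[str, str], int],
                   condition: str,
                   threshold: int = 1) -> List[str]:
    """Retain organisms annotated with the condition.

    An organism counts as annotated when the number of co-occurrences of
    (organism, condition) in the lookup meets the threshold.
    """
    return [
        organism
        for organism in organisms
        if co_occurrences.get((organism, condition), 0) >= threshold
    ]


# Placeholder counts standing in for database evidence.
co_occurrences = {
    ("Organism A", "respiratory disease"): 120,
    ("Organism B", "respiratory disease"): 3,
    ("Organism C", "urinary tract disease"): 40,
}
detected = ["Organism A", "Organism B", "Organism C"]

lenient = disease_filter(detected, co_occurrences, "respiratory disease", 1)
strict = disease_filter(detected, co_occurrences, "respiratory disease", 50)
```

Raising the threshold from 1 to 50 in this sketch drops the weakly supported annotation while retaining the well supported one.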
In some embodiments, the disease condition is an infectious disease. In some embodiments, the disease condition is a medically relevant condition (e.g., “Medically Relevant” affordance 1208). In some embodiments, the disease condition is a disease caused by a pathogen. In some embodiments, the disease condition is a disease caused by a microorganism. In some embodiments, the disease condition is a brain infection, urinary tract disease, respiratory disease, CNS, and/or cancer.
In some embodiments, the disease condition is influenza, common cold, measles, rubella, chickenpox, norovirus, polio, infectious mononucleosis (mono), herpes simplex virus (HSV), human papillomavirus (HPV), human immunodeficiency virus (HIV), viral hepatitis (e.g., hepatitis A, B, C, D, and/or E), viral meningitis, West Nile Virus, rabies, Ebola, strep throat, bacterial urinary tract infections (UTIs) (e.g., coliform bacteria), bacterial food poisoning (e.g., E. coli, Salmonella, and/or Shigella), bacterial cellulitis (e.g., Staphylococcus aureus (MRSA)), bacterial vaginosis, gonorrhea, chlamydia, syphilis, Clostridium difficile (C. diff), tuberculosis, whooping cough, pneumococcal pneumonia, bacterial meningitis, Lyme disease, cholera, botulism, tetanus, anthrax, vaginal yeast infection, ringworm, athlete's foot, thrush, aspergillosis, histoplasmosis, Cryptococcus infection, fungal meningitis, malaria, toxoplasmosis, trichomoniasis, giardiasis, tapeworm infection, roundworm infection, pubic and head lice, scabies, leishmaniasis, and/or river blindness.
In some embodiments, the disease condition is a viral respiratory disease. In some embodiments, the disease condition is a coronavirus infection. In some embodiments, the disease condition is a SARS-CoV-2 infection.
In some embodiments, the second customizable diagnostic template includes a target microorganism filter, and the applying the filter selectively retains one or more microorganisms that share at least a threshold sequence identity to the target microorganism. For example, in some embodiments, the threshold is customized to selectively retain, from the result set, a plurality of pathogens including a first pathogen and a second pathogen that is genetically similar to the first pathogen (e.g., based on a sequence identity, a class, a parentage, and/or a phylogenetic lineage). In some embodiments, the threshold sequence identity is between 0 and 10%, between 10 and 20%, between 20 and 30%, between 30 and 40%, between 40 and 50%, between 50 and 60%, between 60 and 70%, between 70 and 80%, between 80 and 90%, or between 90 and 100%. In some embodiments, the threshold sequence identity is at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%. In some embodiments, the applying the filter comprises manually entering a numeric value for a threshold sequence identity. In some embodiments, the applying the filter comprises manually entering a microorganism class to be selectively retained. In some embodiments, the applying the filter comprises manually entering a microorganism name to be selectively retained (e.g., organism name search 1210). In some embodiments, the applying the filter comprises manually entering a microorganism parent name to be selectively retained. In some embodiments, the applying the filter comprises manually entering a phylogenetic lineage to be selectively retained (e.g., phylogenetic lineage search 1212).
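As a simplified illustration of a target microorganism filter, the sketch below retains organisms whose sequence shares at least a threshold percent identity with a target sequence. It assumes, purely for brevity, that the sequences are already aligned, gap-free, and of equal length; a practical system would more likely obtain identity values from an alignment tool. The function names and the toy sequences are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def target_filter(target: str,
                  candidates: Dict[str, str],
                  threshold: float) -> List[str]:
    """Retain candidates sharing at least `threshold` percent identity."""
    return [
        name
        for name, sequence in candidates.items()
        if percent_identity(target, sequence) >= threshold
    ]


target_sequence = "ACGTACGTAC"
candidates = {
    "Organism A": "ACGTACGTAA",  # one mismatch out of ten positions
    "Organism B": "TTGTACGAAC",  # three mismatches out of ten positions
}

retained = target_filter(target_sequence, candidates, threshold=80.0)
```

Lowering the threshold to 70 in this toy example would also retain the second organism, which is how a user-entered numeric threshold widens or narrows the set of genetically similar organisms.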
In some embodiments, the second customizable diagnostic template includes an antimicrobial resistance filter, the applying the filter selectively retains one or more microorganisms, and the mapping of the respective nucleotide sequences in the plurality of nucleotide sequences to the reference sequence for the respective microorganism indicates the presence of an antimicrobial resistance marker (e.g., where an AMR gene is based on an annotation and/or a platform-curated genome library).
In some embodiments, the second customizable diagnostic template includes a mapping statistics filter (e.g., RNA filters 1214 and/or DNA filters 1216), and the applying the filter selectively retains one or more microorganisms having at least a threshold value for a mapping statistic in the plurality of mapping statistics (e.g., coverage, depth, sample type, tissue of origin, nucleic acid type, number of reads, reference length, ANI, bit score, and/or PID).
In some embodiments, the second customizable diagnostic template includes an annotation filter, where the result set is filtered by manually entering a text string (e.g., a search term) to be selectively retained.
In some embodiments, the second customizable diagnostic template includes a run metrics filter, where the result set is filtered based on one or more run metrics.
In some embodiments, the second customizable diagnostic template includes a mapping statistics filter, where the result set is filtered based on one or more mapping statistics in the plurality of mapping statistics.
In some embodiments, the second customizable diagnostic template includes a sequencing statistics filter, where the result set is filtered based on one or more sequencing statistics in the plurality of sequencing statistics.
In some embodiments, the second customizable diagnostic template includes an additional metrics filter, where the result set is filtered based on one or more additional metrics in the plurality of additional metrics.
In some embodiments, the second customizable diagnostic template includes a quality control metrics filter, where the result set is filtered based on one or more quality control metrics in the plurality of quality control metrics.
In some embodiments, the filter is based on any of the features disclosed herein, that are displayed on a display, a dashboard (e.g., results dashboard 302), a sample viewer (e.g., customizable user interface 401), an organism viewer (e.g., expanded microorganism display 502), a sequencing statistics viewer (e.g., expanded sequencing statistics display), a mapping statistics viewer (e.g., expanded mapping statistics display), a quality control metrics viewer (e.g., expanded quality control display), and/or an AMR gene viewer (e.g., expanded AMR gene display 602).
In some embodiments, the filter is based on one or more features, including: a platform, environment, project, software version (e.g., Explify version), review portal version, analysis pipeline version, analysis version, run ID, run directory, run start time, run completion time, batch ID, results directory, total run yield, percent bases that pass a Q30 threshold, cluster density, percent clusters passing a filter, PhiX error rate, percent of sequencing tiles that pass a selection criterion, intensity A, intensity C, chemistry, instrument ID, accession number, sample ID, sample name, sample type, results ready time, MD review start time, MD review completion time, report transmission time, positive control ID, positive control lot, negative control ID, negative control lot, RNA IC ID, RNA IC lot, RNA MS2 norm reads, RNA MS2 raw reads, RNA Qbeta norm reads, RNA Qbeta raw reads, DNA IC ID, DNA IC lot, DNA T7 norm reads, DNA T7 raw reads, DNA PR772 norm reads, DNA PR772 raw reads, RNA library type, RNA library name, RNA seq sample, RNA total raw reads, RNA post-adaptor reads, RNA post-quality reads, RNA unique reads, RNA percent unique reads, RNA entropy, RNA G content, RNA library Q score, RNA library size, RNA library concentration, RNA sample index, DNA library type, DNA library name, DNA seq sample, DNA total raw reads, DNA post-adaptor reads, DNA post-quality reads, DNA unique reads, DNA percent unique reads, DNA entropy, DNA G content, DNA library Q score, DNA library size, DNA library concentration, DNA sample index, and/or detected organism.
In some embodiments, the filter is based on one or more features, including: an organism name, class type, subclasses, reporting ID, review information, positive control organism name, potential pathogen information, medically relevant information, validation information, passed cutoff information, nucleic acid information, antibiotic information, associated organisms, host detection status, RNA percent coverage, RNA sensitivity cutoff, RNA specificity cutoff, RNA bit score, RNA bit score cutoff, RNA average nucleotide identity, RNA median depth, RNA reads, RNA quantity, RNA reference length, RNA overall covered bases, RNA total bases, DNA percent coverage, DNA sensitivity cutoff, DNA specificity cutoff, DNA bit score, DNA bit score cutoff, DNA average nucleotide identity, DNA median depth, DNA reads, DNA quantity, DNA reference length, DNA overall covered bases, and/or DNA total bases.
In some embodiments, a parameter (e.g., a parameter in the plurality of filtering parameters 1204) for filtering the plurality of sequencing statistics, the set of microorganisms, and the plurality of mapping statistics is selected using an affordance (e.g., a user-interactive affordance). In some embodiments, the affordance is a slider bar, a scroll bar, a dropdown list, a checkbox, a manual entry box (e.g., number, percentage, and/or an alphanumeric text string), a radio button, and/or a toggle button.
In some embodiments, the second customizable diagnostic template includes one or more stored parameters (e.g., filtering parameters 1204) specifying the filter, the subset of the set of microorganisms, and the subset of the plurality of mapping statistics.
In some embodiments, the one or more parameters (e.g., filtering parameters 1204) are stored as a template (e.g., a profile), such as a customizable diagnostic template. In some embodiments, a template is applied to a plurality of result sets (e.g., for a corresponding plurality of samples). For example, a template can be applied to one or more control samples and one or more analysis samples in a batch, thus creating consistency in the analysis between the control samples and the analysis samples. Similarly, a template can be applied to a plurality of analysis samples obtained from a single patient, or from a plurality of patients enrolled in a clinical study.
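The idea of storing filtering parameters once and applying them uniformly to several result sets can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in this disclosure: the class name `DiagnosticTemplate`, the threshold keys, the organism names, and the numeric values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DiagnosticTemplate:
    """Hypothetical stored template: a name plus a minimum value per feature."""
    name: str
    min_thresholds: dict = field(default_factory=dict)

    def passes(self, organism: dict) -> bool:
        # An organism passes only if every thresholded feature meets its minimum.
        return all(organism.get(k, 0) >= v for k, v in self.min_thresholds.items())

    def apply(self, result_set: list) -> list:
        # Names of the organisms in one result set that satisfy the template.
        return [o["name"] for o in result_set if self.passes(o)]


# Illustrative thresholds only; real cutoffs would be chosen by the laboratory.
template = DiagnosticTemplate(
    name="example-profile",
    min_thresholds={"rna_percent_coverage": 10.0, "rna_reads": 50},
)

# One batch holding a control and an analysis sample (made-up values).
batch = {
    "positive_control": [
        {"name": "Control organism", "rna_percent_coverage": 95.0, "rna_reads": 4000},
    ],
    "sample_1": [
        {"name": "Organism A", "rna_percent_coverage": 42.0, "rna_reads": 310},
        {"name": "Organism B", "rna_percent_coverage": 3.0, "rna_reads": 12},
    ],
}

# The same template is applied to every result set in the batch.
filtered = {sample_id: template.apply(rs) for sample_id, rs in batch.items()}
```

Because the thresholds live in one object, the control and the analysis sample are filtered by identical criteria, which is the consistency property the paragraph above describes.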
In some embodiments, the customizable user interface comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 20, at least 50, at least 100, at least 200, or more than 200 customizable diagnostic templates. In some embodiments, a respective customizable diagnostic template is stored as a test profile (e.g., as is further described in the section entitled “Administrator control.” below, with reference to
In some embodiments, a plurality of analyses is performed for a respective biological or non-biological sample, where for each different analysis in the plurality of analyses, a corresponding different template in a plurality of templates is applied to the biological or non-biological sample (e.g., multiple profiles can be applied to a single result set).
Administrator Control.
In some instances, further customization is also possible through an administrator access account (e.g., administrator account 2502-1), by controlling and managing filters, profiles (e.g., test profiles 2116), user accounts (e.g., users 2118), groups (e.g., groups 2120), and/or permissions for specific users (e.g., granting review and/or approval access). For example, in some implementations, a production workflow can be established by restricting access to analysis samples until one or more control samples are finally approved. In some embodiments, specific filters or profiles can be established for specific scenarios, such as in instances where it is desirable to develop, optimize and validate a user-modified, custom set of parameters and detection thresholds that is subsequently applied, consistently, to all future samples in the workflow.
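As a rough sketch of the production workflow mentioned above, the following function withholds analysis samples from review until every control sample in the batch is approved. The record fields (`id`, `is_control`, `status`) and the status strings are assumptions made for illustration; the disclosure does not specify a data model.

```python
def reviewable_samples(batch: list) -> list:
    """Return IDs of samples currently open for review (hypothetical rule).

    Until every control is approved, only unapproved controls are open.
    Afterwards, unapproved analysis samples are open.
    """
    controls = [s for s in batch if s["is_control"]]
    if not all(s["status"] == "approved" for s in controls):
        return [s["id"] for s in controls if s["status"] != "approved"]
    return [
        s["id"] for s in batch
        if not s["is_control"] and s["status"] != "approved"
    ]


batch = [
    {"id": "pos-ctrl", "is_control": True, "status": "approved"},
    {"id": "neg-ctrl", "is_control": True, "status": "pending"},
    {"id": "sample-1", "is_control": False, "status": "pending"},
]

before = reviewable_samples(batch)   # only the unapproved control is open
batch[1]["status"] = "approved"
after = reviewable_samples(batch)    # analysis samples are now open
```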
Dashboard. In some embodiments, the method further comprises displaying, on the display, a user interface 2102 for an administrator access account 2502-1. For example, in some embodiments, the receiving a request to display an analysis of a result set 122 obtained from a sequencing reaction of nucleic acids from the biological or non-biological sample 304 comprises receiving log-in credentials for an administrator account 2502-1 and displaying a user interface 2102 for the administrator account. In some embodiments, the receiving a request to display an analysis of a result set 122 obtained from a sequencing reaction of nucleic acids from the biological or non-biological sample 304 comprises receiving log-in credentials for an administrator account 2502-1, displaying an index of biological or non-biological samples associated with the administrator account, and detecting selection of an affordance (e.g., admin tab affordance 2104) for displaying a user interface 2102 for the administrator account. In some embodiments, the user interface for the administrator account comprises a dashboard 2108, including a plurality of affordances for accessing sample reports (e.g., affordance 2114), test profiles (e.g., affordance 2116), users (e.g., affordance 2118), groups (e.g., affordance 2120), emails (e.g., affordance 2122), and/or settings (e.g., affordance 2124).
Sample reports. In some embodiments, the method further comprises, upon detecting a selection of the affordance for accessing sample reports 2114, displaying a user interface for sample reports 2202 comprising an index of sample reports 2204. In some embodiments, the user interface for sample reports 2202 comprises a plurality of features 2206 for searching, filtering, and/or sorting the index of sample reports. In some embodiments, the user interface for sample reports comprises an affordance for customizing the user interface 2208 (e.g., by selecting the plurality of features to be displayed on the user interface). In some embodiments, the user interface for sample reports 2202 comprises, for each sample report in the index of sample reports 2204, a summary of the sample report. In some embodiments, the user interface for sample reports 2202 comprises, for each sample report in the index of sample reports, an affordance 2210 for downloading the report, sending the report, opening the report, and/or expanding upon the summary of the sample report.
Test profiles. As illustrated in
In some embodiments, the user interface for test profiles 2302 comprises, for each test profile 2312 in the index of test profiles 2310, an affordance 2316 for expanding upon the summary of the test profile. For example,
In some embodiments, returning to
Users. As illustrated in
Groups and permissions. As illustrated in
Emails. In some embodiments, as illustrated in
Settings. In some embodiments, as illustrated in
In some embodiments, the displaying a user interface for the administrator account includes displaying an affordance for managing financial transactions (e.g., billing routes).
In some embodiments, the method further comprises, upon receiving a request to display an analysis of a result set obtained from a sequencing reaction of nucleic acids from the biological or non-biological sample, displaying, in the administrator account, any and/or all of the features described herein for reviewing, visualizing, and/or analyzing a result set for identifying the presence of a subset of microorganisms and/or antimicrobial resistance markers in a biological or non-biological sample.
Report Generation.
The systems and methods disclosed herein further include using the review and visualization tool to generate a report (e.g., a diagnostic report).
For example, in some embodiments, the displaying, on the display, a customizable user interface (e.g., customizable user interface 401-2 in
Referring to
In some embodiments, the report further comprises patient demographic information, a patient identifier, a pathogen identifier, and/or a non-pathogen identifier. In some embodiments, clinically or diagnostically relevant information is displayed on a first page of the report, and clinically or diagnostically irrelevant information is displayed on a second page of the report that is subsequent to the first page (e.g., in some embodiments, detected microorganisms that are classified as pathogens are displayed on an earlier page in the report than detected microorganisms that are not classified as pathogens). In some embodiments, the report includes a description of sample type (e.g., DNA and/or RNA).
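The page ordering described above, with pathogens placed ahead of non-pathogens, could be sketched as a simple partition. The field name `is_pathogen` and the two-page layout are assumptions for illustration only.

```python
def paginate_findings(organisms: list) -> list:
    """Split detected organisms into report pages (hypothetical layout).

    Page 1 lists organisms classified as pathogens; page 2 lists the rest.
    Order within each page follows the input order.
    """
    first_page = [o["name"] for o in organisms if o["is_pathogen"]]
    second_page = [o["name"] for o in organisms if not o["is_pathogen"]]
    return [first_page, second_page]


pages = paginate_findings([
    {"name": "Organism A", "is_pathogen": False},
    {"name": "Organism B", "is_pathogen": True},
    {"name": "Organism C", "is_pathogen": False},
])
```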
In some embodiments, the report further comprises a graphical representation of a mapping statistic in the subset of the plurality of mapping statistics. In some embodiments, the report further comprises a graphical representation of a sequencing statistic in the subset of the plurality of sequencing statistics. In some embodiments, the graphical representation is in the form of a heat map, a bar graph, and/or a table.
In some embodiments, the report further comprises a first therapeutic regimen based on the identity of a respective microorganism that satisfies a minimum mapping threshold in the result set (e.g., an identity of a detected microorganism).
For example, in some embodiments, a microorganism is reported if the microorganism is detected based on satisfaction of any parameter and/or filter described above, and/or any combination thereof as will be apparent to one skilled in the art. In some embodiments, a microorganism is reported if the microorganism is detected based on satisfaction of one or more parameters and/or filters included in the first customizable diagnostic template and/or the second customizable diagnostic template.
In some embodiments, the first therapeutic regimen is based on the classification of a respective microorganism as a pathogenic microorganism. In some such embodiments, the report further comprises a description of the pathogen. In some embodiments, the report further comprises an annotation of the pathogen based on clinical and/or health data. In some embodiments, the report further comprises a description of the first therapeutic regimen based on the pathogen. In some embodiments, the report further comprises an annotation of the first therapeutic regimen based on clinical and/or health data.
In some embodiments, the summary of the subset of the plurality of mapping statistics comprises an antimicrobial resistance status for a respective microorganism that satisfies a minimum mapping threshold in the result set, and the report further comprises a second therapeutic regimen based on the identity of the respective microorganism and the antimicrobial resistance status for the respective microorganism.
In some embodiments, the antimicrobial resistance status is based on the detection of an antimicrobial resistance gene in a detected microorganism. In some embodiments, the report further comprises a description of the antimicrobial resistance gene. In some embodiments, the report further comprises an annotation of the antimicrobial resistance gene based on clinical and/or health data.
In some embodiments, the report further comprises a patient response status. For example, in some embodiments, the report is generated to monitor a patient response to a treatment. In some embodiments, the report is generated to measure the efficacy of a treatment.
In some embodiments, the identity of the respective microorganism that is included in the report comprises an identity of two or more microorganisms in the set of microorganisms (e.g., the set of at least 3, at least 5, or at least 10 microorganisms) that share at least a threshold sequence identity in the respective reference sequences. For example, in some such embodiments, two or more microorganisms that are closely related (e.g., by sequence identity, class, parentage and/or phylogenetic lineage) will be included as detected in the report where the actual identity of the microorganism in the sample is ambiguous. In some embodiments, a parameter for determining when two or more microorganisms are reported in the case of ambiguous results is customized by user interaction (e.g., a cutoff threshold for reporting).
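One way to picture the reporting of ambiguous results is shown below. The text does not specify how ambiguity is resolved, so this sketch assumes that the best-scoring organism is always reported and that any other detected organism is reported alongside it when their reference sequences share at least a user-set identity threshold. The function name, the score values, and the pairwise identity table are hypothetical.

```python
def organisms_to_report(detected: list, identity: dict, threshold: float = 0.97) -> list:
    """Return the best hit plus closely related organisms (assumed rule).

    detected:  list of (name, score) pairs for one sample.
    identity:  maps frozenset({name_a, name_b}) to the fraction of sequence
               identity between the two reference sequences.
    threshold: user-adjustable cutoff for co-reporting related organisms.
    """
    best = max(detected, key=lambda d: d[1])[0]
    reported = [best]
    for name, _score in detected:
        if name == best:
            continue
        # Unknown pairs default to 0.0 identity and are never co-reported.
        if identity.get(frozenset((best, name)), 0.0) >= threshold:
            reported.append(name)
    return reported


detected = [("Species X", 980.0), ("Species Y", 975.0), ("Species Z", 400.0)]
identity = {
    frozenset(("Species X", "Species Y")): 0.99,
    frozenset(("Species X", "Species Z")): 0.80,
}

ambiguous_call = organisms_to_report(detected, identity)                 # default cutoff
strict_call = organisms_to_report(detected, identity, threshold=1.0)     # stricter cutoff
```

Raising the threshold narrows the report to the single best hit, which mirrors the user-customizable cutoff mentioned in the paragraph above.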
In some embodiments, the generating of a report comprises transmitting the report to a cloud computing infrastructure (e.g., an email).
In some embodiments, the report is generated as an email that can be sent to, for example, a patient, a medical practitioner (e.g., a primary physician), a hospital and/or a diagnostic laboratory.
In some embodiments, the method comprises generating an alert (e.g., an email) when the generation of the report is complete.
In some embodiments, the report is stored for retrieval. In some embodiments, the report is transmitted to a cloud computing infrastructure (e.g., a server) for storage.
In some embodiments, the method comprises generating an alert (e.g., an email) when transmission to the cloud computing infrastructure is complete.
In some embodiments, the report is exported in a printable format. In some embodiments, the report is generated as a printable document (e.g., a PDF).
Customization of Report.
As with the customization of the display, additional elements that can be customized include the specific parameters, metrics, and/or results to be included in the report (e.g., sequencing validation, sequencing statistics, mapping validation, mapping statistics, list of detected microorganisms, microbe-specific annotations, pathogen status, presence or absence of antimicrobial resistance genes, antimicrobial resistance gene annotations, and/or therapeutic treatments based on any of the above results or any combinations thereof).
Additional embodiments, substitutions, modifications, additions, deletions, and/or combinations of any of the systems and methods provided herein are possible, as will be apparent to one skilled in the art. See, for example, IDbyDNA, 2019, “Explify Software v1.5.0 User Manual,” Document No. TH-2019-200-006, pp. 1-44, which is hereby incorporated by reference herein in its entirety.
In some embodiments, the systems and methods described herein are useful for a variety of applications including, but not limited to, metagenomics, cancer diagnostics, human variation (pharmacogenomics and ancestry), and agricultural and food analysis. In some embodiments, the systems and methods described herein are useful for bacterial and fungal classification, viral classification, parasite classification, human mRNA transcript profiling, identification of infection and contamination, and/or detection of microorganisms for, e.g., education, consumers, food safety and authenticity, hospital safety and contamination monitoring, biological product quality and safety monitoring, animal disease diagnostics and treatment, microbial strain profiling, tumor profiling, forensic profiling, and/or genetic testing.
In some embodiments, information about a sample, such as information regarding entities associated with the sample, is presented using a software program or platform. The software platform can include one or more components, such as a component for providing information about a sample, a component for analyzing sequencing information (e.g., performing a k-mer based analysis), a component for analyzing and classifying processed sequencing reads, and a component for supporting laboratory sample preparation. The Explify Software Platform (e.g., Software v1.5.0) is an example of a software platform that includes three such components: the Explify ReviewPortal, which is a web browser-accessible dashboard application; the Explify Analysis Pipeline, which processes raw NGS data for analysis by the Explify Classification Algorithm; and the Explify SeqPortal web-based application (also called Workflow Manager), which supports sample information entry and laboratory sample preparation.
The ReviewPortal component of the Explify Software Platform is a web application for laboratory users. The Explify Analysis Pipeline analyzes the results of a sequencing run to report the detection of pathogens. Review Portal users review these detection calls and verify their validity. The decisions made by users of the Review Portal are used to generate reports. The Review Portal enforces a workflow to ensure the integrity of detection decisions. Each sequencing run contains up to eight samples: a positive external control, a negative external control, and up to six test samples. Both controls are reviewed before the test samples, in case the controls indicate a problem that would lead to incorrect results. Every sample is reviewed by at least two laboratory reviewers and a senior reviewer. A senior reviewer has access to additional metrics that will aid in making detection decisions. When a test sample has undergone all necessary stages of review, it is ready for Final Review. A Final Reviewer reviews the detection decisions made on a sample and submits the final report. Based on sequencing quality metrics and the results of the external controls, the Result Review SOP may require that sequencing be repeated on a sample or run. A reviewer may mark a sample or run for repeat, which will disable review of the sample or run. Once repeated sequencing results are processed by the Analysis Pipeline, the review will be re-enabled with updated results. The updated results on test samples are displayed alongside the original results.
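The staged review described above can be modeled as a small state machine. The sketch below is illustrative only: the stage names, their exact order (two laboratory reviews, then senior review, then final review), and the choice to restart review from the first stage after repeat sequencing are assumptions, since the text states only which reviews are required and that review is disabled while a repeat is pending.

```python
# Hypothetical stage names; the source names only "MD Review" and "Final Review".
STAGES = ["Lab Review 1", "Lab Review 2", "Senior Review", "Final Review", "Reported"]


class SampleReview:
    """Tracks one sample through an assumed linear sequence of review stages."""

    def __init__(self, sample_id: str):
        self.sample_id = sample_id
        self.stage_index = 0
        self.marked_for_repeat = False

    @property
    def stage(self) -> str:
        return STAGES[self.stage_index]

    def submit(self) -> str:
        # Submitting the current review advances the sample to the next stage.
        if self.marked_for_repeat:
            raise RuntimeError("review is disabled while repeat sequencing is pending")
        if self.stage == "Reported":
            raise RuntimeError("the final report has already been submitted")
        self.stage_index += 1
        return self.stage

    def mark_for_repeat(self) -> None:
        # Marking for repeat disables further review of this sample.
        self.marked_for_repeat = True

    def load_repeat_results(self) -> None:
        # Assumption: review is re-enabled and restarts from the first stage.
        self.marked_for_repeat = False
        self.stage_index = 0


review = SampleReview("sample-1")
review.submit()            # Lab Review 1 -> Lab Review 2
review.submit()            # Lab Review 2 -> Senior Review
stage_after_senior = review.submit()   # Senior Review -> Final Review
```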
See, for example, IDbyDNA, 2019, “Explify Software v1.5.0 User Manual,” Document No. TH-2019-200-006, pp. 1-44, which is hereby incorporated by reference herein in its entirety.
Example illustrations of a system and method for facilitating review of nucleic acid sequencing data, in accordance with an embodiment of the present disclosure (e.g., the ReviewPortal component of the Explify Software Platform), are described below, with reference to
Each sample 304 in the list of samples comprises a plurality of features, including a review status 312 (e.g., MD Review, Final Review, etc.) and a summary 318, where each summary includes an indication of a sequencing statistic (e.g., a run quality control metric 320 and/or a sample quality control metric 322), and an indication of a mapping statistic (e.g., a type of microorganism 324 and/or an AMR gene detected in the sample). A search function can be performed using manual entry boxes 330 (e.g., 330-1, 330-2, 330-3, etc.), which can be used to filter the plurality of samples by searching for a value or a text-string in any desired feature of the sample, such as a sample accession number, sample type, run identifier, batch identifier, and/or date range. Additional features for each sample can be displayed (and/or made searchable) using an affordance 326. For example, as illustrated in
Returning to
Metadata for the sample is displayed as a header 438 in the user interface 401-1, and a result summary 452 indicates a status of the analysis of the result set (“inconclusive”) and a status of an analytical sensitivity (“adequate”). A review status 440 for the nucleic acid sequencing data indicates a current review status 440-1 (“MD”) and a next following review status 440-2 (“Final”). Submission of the current review updates the review status 440 from the current review status to the next following review status and can be performed using a review action 450. For instance, as illustrated in
Returning to
Selection of a “Show” affordance 434 or clicking on the entry for the microorganism generates a new display window 502 overlaid on the customizable user interface 401-1, which provides an expanded summary for the microorganism 402-1. For instance, as illustrated in
Referring to
Returning to
The display window 602 further includes a “Copy and Blast” affordance 628 for accessing a reference sequence database (e.g., BLAST, NCBI, etc.) and performing a nucleic acid sequence comparison using a nucleic acid sequence for the AMR gene 422-1.
As shown in
As illustrated in
Toggling between one or more samples (e.g., 304-1, 304-2, 304-cp, 304-cn, and/or 304-blk), batches, and/or runs can be performed, as illustrated in
Referring again to
In some implementations, the customizable user interface 401 includes various affordances for accessing and/or visualizing the features of a sample 304, a microorganism 402, and/or an AMR gene 422. As illustrated in
Organisms can be added to the analysis during the review phase (e.g., upon display of the analysis of the result set). For instance, referring to
Referring again to
In some implementations, the customizable user interface 401 includes one or more affordances for performing reporting actions.
Returning again to
Another feature of the present example (e.g., the ReviewPortal) includes an administrator access feature.
Upon detecting a selection of “Sample Reports” affordance 2114, a user interface for sample reports 2202 comprising an index of sample reports 2204 is displayed, as illustrated in
As illustrated in
As illustrated in
Returning to
Various elements described in the present example are disclosed in greater detail in the above sections. Accordingly, the example system and method described with reference to
All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication or patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.
Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the implementation(s). In general, structures and functionality presented as separate components in the example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the implementation(s).
It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first subject could be termed a second subject, and, similarly, a second subject could be termed a first subject, without departing from the scope of the present disclosure. The first subject and the second subject are both subjects, but they are not the same subject.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in response to detecting,” depending on the context. Similarly, the phrase “if it is determined” or “if [a stated condition or event] is detected” may be construed to mean “upon determining” or “in response to determining” or “upon detecting (the stated condition or event)” or “in response to detecting (the stated condition or event),” depending on the context.
The foregoing description included example systems, methods, techniques, instruction sequences, and computing machine program products that embody illustrative implementations. For purposes of explanation, numerous specific details were set forth in order to provide an understanding of various implementations of the inventive subject matter. It will be evident, however, to those skilled in the art that implementations of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures and techniques have not been shown in detail.
The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the implementations to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain the principles and their practical applications, to thereby enable others skilled in the art to best utilize the implementations and various implementations with various modifications as are suited to the particular use contemplated.
This patent application claims priority to U.S. Provisional Patent Application No. 63/152,765 entitled “Systems and Methods for Analysis of Presence of Microorganisms,” filed Feb. 23, 2021, which is hereby incorporated by reference in its entirety.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2022/017523 | 2/23/2022 | WO | |

| Number | Date | Country |
|---|---|---|
| 63152765 | Feb 2021 | US |