This disclosure generally relates to systems and methods for microorganism identification based on metagenomic data.
This background description is provided to generally present the context of the disclosure. Contents of this background section are neither expressly nor impliedly admitted as prior art against the present disclosure.
Current clinical diagnosis of infectious diseases relies on the identification of causative microorganisms in the laboratory. Accurate and rapid identification of microorganisms is essential for antimicrobial therapeutic optimization and rationalization. Gold standard laboratory identification of microorganisms often requires the culture of microorganisms. Culture-based methods are growth-dependent and have inherent biases for non-fastidious, rapid-growing species. Due to the growth-dependent nature of culture-based methods, the time taken to determine the presence of a microorganism in a sample may vary from days to weeks, depending on the growth rate of the microorganism. Microbial pathogens that are non-viable or non-culturable in the media used are missed by culture-based methods.
Molecular diagnostics, such as targeted polymerase chain reaction (PCR) assays, are increasingly used in clinical settings to shorten the time taken to clinically actionable results. However, simple targeted PCR assays require a priori knowledge of the potential pathogens, and all non-targeted pathogens and intended pathogens with mutation(s) in the targeted (primed) sites are missed. Furthermore, it is extremely challenging to design PCR primers with both high specificity and sensitivity to a particular pathogen strain.
It is desired to address or ameliorate one or more disadvantages or limitations associated with the prior art, or to at least provide a useful alternative.
Some embodiments relate to a clinical decision support system comprising one or more processing units configured to:
In some embodiments, the LRS data is obtained from culture-free clinical samples.
In some embodiments, each record of the LRS data comprises data of at least 1,000 base pairs.
In some embodiments, the determination of the LRS data is performed in parallel with taxonomic classification.
In some embodiments, the determination of the LRS data, taxonomic classification and coverage analysis steps are performed in parallel.
In some embodiments, the coverage analysis is performed using a statistical distribution to estimate a breadth of coverage of the subset of reference genomes by the LRS data; optionally wherein the statistical distribution is a Poisson distribution or a negative binomial distribution.
In some embodiments, the at least one processing unit is further configured to align records in the LRS data with records in an antimicrobial resistance genome database to determine presence of antimicrobial resistant species in the sample.
In some embodiments, performing taxonomic classification comprises determining a K-mer profile of each record in the LRS data.
In some embodiments, the K value is in the range of 3 to 31 nucleotides.
In some embodiments, assigning one or more taxonomic identifiers to each record in the LRS data is based on the K-mer profile of the respective records.
In some embodiments, the taxonomic identifiers represent an operational taxonomic unit (OTU) referring to one or a combination of one or more of: domain, kingdom, phylum, class, order, family, genus, species, strain, or individual genome.
In some embodiments, the subset of genomes is selected based on the identified OTU.
In some embodiments, the aligning the LRS data to the subset of reference genomes is based on a total number of matched nucleotides and a read coverage score.
In some embodiments, coverage analysis comprises determining a percentage of breadth of coverage of each genome in the subset of reference genomes by the LRS data.
Some embodiments relate to a computer-implemented method for microorganism identification, the method comprising:
Some embodiments relate to a method for detecting infection by one or more microorganisms in a subject, the method comprising:
Exemplary embodiments of the present invention are illustrated by way of example in the accompanying drawings in which like reference numbers indicate the same or similar elements and in which:
The disclosure relates to systems for identifying pathogens in samples obtained from humans or other animals. The embodiments identify pathogens using genetic and metagenomic sequence-based technology that is accurate, fast and unbiased. The embodiments provide culture-free identification of unknown pathogens to improve the speed and accuracy of detection of pathogens in samples and shorten the time to generate information to drive efficacious therapy. Some embodiments relate to clinical decision support systems (300 of
The embodiments streamline laboratory processing protocols and advanced computational algorithms for metagenomic pathogen detection and identification in clinical samples. The embodiments can be applied directly on culture-free clinical samples such as sputum, bronchoalveolar lavage (BAL), swabs and blood culture samples to detect and identify microbial species present in the samples.
The real-time analysis algorithm of the embodiments is initiated once sequencing begins. DNA sequences are processed by the algorithm in real-time, and the platform reports results to the users once a microbial species is detected with high confidence. The sequence-based technology of the embodiments is developed for direct pathogen detection and identification from clinical samples. The technology of the embodiments may be integrated with laboratory protocols and the computational algorithms of the embodiments that process DNA sequence data in real-time.
This disclosure contemplates any suitable number of systems 300. This disclosure contemplates computer system 300 taking any suitable physical form. As an example and not by way of limitation, computer system 300 may be an embedded computer system, a system-on-chip (SOC), a single-board computer system (SBC) (such as, for example, a computer-on-module (COM) or system-on-module (SOM)), a desktop computer system, a laptop or notebook computer system, an interactive kiosk, a mainframe, a mesh of computer systems, a mobile telephone, a personal digital assistant (PDA), a server, a tablet computer system, or a combination of two or more of these. Where appropriate, computer system 300 may include one or more computer systems 300; be unitary or distributed; span multiple locations; span multiple machines; span multiple data centers; or reside in a cloud, which may include one or more cloud components in one or more networks. Where appropriate, one or more computer systems 300 may perform without substantial spatial or temporal limitation one or more steps of one or more methods described or illustrated herein. One or more computer systems 300 may perform at different times or at different locations one or more steps of one or more methods described or illustrated herein, where appropriate.
Routine clinical samples, such as bronchoalveolar lavage (BAL), screening swabs and blood culture samples, are collected from patients as per routine clinical practice. The samples may be collected aseptically in sterile containers and transported to the onsite hospital diagnostic laboratory for processing within 1 hour of collection. In some embodiments, total nucleic acid extraction and library preparation may be performed as per nanopore long-read sequencing protocols (e.g. SQK-LSK109/LSK110). Embodiments may incorporate alternative sequencing technologies suited for the purpose of pathogen identification. When MinION flow cells are used, a maximum of 24 or 96 samples (depending on the choice of barcoding kits) can be sequenced in the same run. The analysis technology of the embodiments is applicable to any long-read DNA sequencing platform that can produce “reads” of at least 1,000 bases in length. Some embodiments may incorporate the nanopore sequencing platform to obtain the long-read DNA sequencing data. The sequencing data may be referred to as long-read nucleic acid sequence data (LRS data). The LRS data comprises a plurality of records, wherein each record relates to a specific sequencing read obtained from the sample.
Once the sequencing begins, each read as it is received by the system 300 is classified to a species by using the rapid taxonomic classifier 322 based on the curated genome database 350 (step 210 of
The sequencing of long-read DNA fragments is performed over several intervals. For each interval, the embodiments obtain DNA sequence data that may be stored in a FASTQ format file, which contains multiple “reads”, i.e., DNA fragments of different lengths (1,000-100,000 nucleotides). A K-mer profile is extracted for each read. The K-mers of a read are all of its subsequences of length K, where K ranges from 3 to 31 nucleotides.
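The K-mer extraction described above can be illustrated with a minimal Python sketch. The function name `kmer_profile` and the representation of a profile as a count of each K-mer are assumptions made for illustration; the disclosure does not specify how the profile is stored.

```python
from collections import Counter


def kmer_profile(read: str, k: int = 5) -> Counter:
    """Count every length-k subsequence (K-mer) of a single read.

    Hypothetical helper: the profile is modelled as K-mer -> occurrence count.
    """
    if not 3 <= k <= 31:
        raise ValueError("k is expected to be between 3 and 31 nucleotides")
    read = read.upper()
    # A read of length n contains n - k + 1 overlapping K-mers.
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


# Toy read, far shorter than a real long read, used only to show the idea.
profile = kmer_profile("ACGTACGTAC", k=4)
```

For this 10-nucleotide toy read and K = 4, the profile holds seven K-mers in total, with `ACGT`, `CGTA` and `GTAC` each counted twice and `TACG` once.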
The taxonomic classifier assigns one or more taxonomic identifiers to each read based on the K-mer profile and the reference genome database 350 accessible to the taxonomic classifier. The taxonomic identifier represents an operational taxonomic unit (OTU). The OTU may refer to domain, kingdom, phylum, class, order, family, genus, species, strain, or individual genome. The reference genome database may comprise DNA sequences of microbial genomes that are intended to be detected in the sample. The breadth of the identification capability of the system can be advantageously extended by expanding the reference genome database to cover a larger number of species. As more LRS data is received from the sequencing platform, the system 300 incrementally updates the count for each OTU (for example, a species-level OTU count). When the abundance level of a subset of OTUs reaches or exceeds a predefined threshold, such OTUs are earmarked as a subset of reference genomes to focus the subsequent analysis by the convergence analyzer. The system 300 monitors the OTU count and starts coverage analysis for the subset of reference genomes when the corresponding species count/OTU count passes a threshold. The threshold may be defined based on the total number of sequenced nucleotides and the genome size of each species.
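The incremental counting and thresholding described above might be sketched as follows. The class `OtuMonitor`, its method names and the specific rule (flag an OTU once the nucleotides assigned to it reach a chosen fraction of its genome size) are assumptions for illustration only; the disclosure states only that the threshold may depend on the total number of sequenced nucleotides and the genome size.

```python
from collections import defaultdict


class OtuMonitor:
    """Hypothetical tracker of per-OTU read and nucleotide counts."""

    def __init__(self, genome_sizes, min_fraction=1.0):
        self.genome_sizes = genome_sizes      # OTU -> genome size in bases
        self.min_fraction = min_fraction      # assumed threshold parameter
        self.read_counts = defaultdict(int)
        self.base_counts = defaultdict(int)
        self.flagged = set()                  # OTUs earmarked for coverage analysis

    def add_read(self, otu, read_length):
        """Update counts for one classified read.

        Returns True only when this read causes the OTU to be newly flagged.
        """
        self.read_counts[otu] += 1
        self.base_counts[otu] += read_length
        size = self.genome_sizes.get(otu)
        if size is None or otu in self.flagged:
            return False
        if self.base_counts[otu] >= self.min_fraction * size:
            self.flagged.add(otu)
            return True
        return False


# Toy genome size and read lengths chosen only to keep the example small.
monitor = OtuMonitor({"species_A": 10_000}, min_fraction=0.5)
events = [monitor.add_read("species_A", 3_000) for _ in range(3)]
```

In this toy run the second read pushes the assigned nucleotides to 6,000, which passes the assumed 5,000-base threshold, so only the second call returns True.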
Coverage analysis (step 230 of
The embodiments may select a subset of reference genomes for each species based on the OTU count at strain or individual genome level. The embodiments align the classified reads to the associated reference genome and may identify genomic regions that are covered by the reads. A read may be said to align with a genome if an identity score (total number of matched nucleotides/alignment length×100) is at least 80% or 85% or 90% and/or a read-coverage score (alignment length/read length×100) is at least 80% or 85% or 90%. Given the alignment records, the embodiments calculate the percentage of the breadth of coverage for each species. The embodiments may report the presence of a species when its coverage percentage reaches a threshold in the range of 40-90% of the expected coverage of that species. The expected coverage percentage may follow a Poisson distribution, which takes into account the total number of sequenced nucleotides and the genome size of each species. In parallel with the pathogen detection and identification module, an additional antimicrobial resistance (AMR) module may align each read to AMR gene records in the reference genome database. The AMR gene records contain DNA sequences of genes that have previously been reported as indicators of antimicrobial resistance. An LRS read may be said to align with an AMR gene if an identity score is at least 80%, 85% or 90% and a gene-coverage score (alignment length/gene length×100) is at least 80%, 85% or 90%. The system reports a list of AMR genes detected within the input sample.
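The comparison of observed and expected breadth of coverage can be illustrated with a short Python sketch. It assumes the standard Poisson model of random sequencing, in which a mean depth c (sequenced nucleotides divided by genome size) gives an expected covered fraction of 1 - exp(-c). The function names, the interval representation of alignments and the use of that particular formula are assumptions; the disclosure states only that the expected coverage may follow a Poisson distribution.

```python
import math


def expected_breadth(sequenced_bases, genome_size):
    """Expected covered fraction under a Poisson model with mean depth c."""
    depth = sequenced_bases / genome_size
    return 1.0 - math.exp(-depth)


def observed_breadth(intervals, genome_size):
    """Fraction of the genome covered by the union of half-open [start, end) alignments."""
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)   # skip the part already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_size


def should_report(observed, expected, min_ratio=0.4):
    """Report a species when observed breadth reaches min_ratio of the expected breadth."""
    return observed >= min_ratio * expected


# Toy genome of 1,000 bases with three aligned reads totalling 300 bases.
alignments = [(0, 100), (50, 150), (300, 400)]
obs = observed_breadth(alignments, 1_000)   # overlapping reads are merged
exp = expected_breadth(300, 1_000)
report_lenient = should_report(obs, exp, min_ratio=0.4)
report_strict = should_report(obs, exp, min_ratio=0.99)
```

Here the merged alignments cover 250 of 1,000 bases, while the model expects about 26% breadth at a depth of 0.3, so the species is reported at a ratio of 0.4 but not at an artificially strict ratio of 0.99.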
To compare the detection and identification performance of the embodiments, nanopore long-read sequencing data of 41 direct clinical samples was obtained: 38 sputum samples, 2 endotracheal tube aspirate (ETA) samples, and 1 bronchoalveolar lavage (BAL) sample. The percentage of the human genome in the samples ranged from 0.14% to 83.71%. The comparison data reported microbial species detected by culture-based and qPCR-based methods, as well as microbial species identified by the metagenomic pipeline of the prior study discussed below.
The results of the metagenomic pipeline proposed in Charalampous, T., Kay, G. L., Richardson, H., Aydin, A., Baldan, R., Jeanes, C., . . . & O'Grady, J. (2019) Nanopore metagenomics enables rapid clinical diagnosis of bacterial lower respiratory infection, Nature Biotechnology, 37(7), 783-792 were compared with results obtained using the embodiments, using culture and qPCR results as ground truth as illustrated in Table 1 below. Across 41 samples, the presence of 44 pathogens was confirmed by either culture or qPCR methods, and 8 pathogens were confirmed to be negative. Charalampous et al. could identify all 44 pathogens (100% sensitivity), while the embodiments according to the disclosure identified 39 pathogens (89% sensitivity). However, Charalampous et al. identified 6 species that were confirmed to be negative by qPCR (25% specificity), while embodiments according to the disclosure reported no such erroneously identified species (100% specificity). Additionally, 9 pathogens were not identified by culture methods, but were identified by metagenomic methods and confirmed by qPCR. These results demonstrate the advantage of embodiments according to the disclosure, where pathogens undetected by the prior art methods are readily detected by metagenomic sequencing.
E. coli
Escherichia coli
E. coli
K. pneumoniae
Klebsiella
K.
pneumoniae
pneumoniae
Streptococcus
parasanguinis
P. aeruginosa
Pseudomonas
P. aeruginosa
aeruginosa
S. parasanguinis
Streptococcus
oralis
Streptococcus
parasanguinis
S. marcescens
Serratia
S. marcescens
marcescens
Pantoea
stewartii
Citrobacter
freundii
Streptococcus
parasanguinis
Salmonella
enterica
Serratia
plymuthica
K. oxytoca
K. pneumoniae
Klebsiella
K. oxytoca
oxytoca
C. freundii
Citrobacter
freundii
Klebsiella sp.
K.
pneumoniae,
Klebsiella
pneumoniae
Klebsiella
michiganensis
S. aureus
Staphylococcus
S. aureus
aureus
H. influenzae
Streptococcus
P. aeruginosa
P. aeruginosa
parasanguinis
S. parasanguinis
Pseudomonas
S. sanguinis
aeruginosa
V. parvula
H. influenzae
Veillonella
parvula
Rothia
mucilaginosa
Streptococcus
sanguinis
Haemophilus
influenzae
Haemophilus
parainfluenzae
Neisseria sicca
M. catarrhalis
S. pneumoniae
Moraxella
M. catarrhalis
catarrhalis
S. gordonii
Streptococcus
S. parasanguinis
parasanguinis
S. salivarius
Streptococcus
V. parvula
S. pneumoniae,
mitis
R. mucilaginosa
Veillonella
parvula
Streptococcus
salivarius
Rothia
mucilaginosa
Streptococcus
pseudopneumoniae
Streptococcus
oralis
Streptococcus
pneumoniae
Neisseria sicca
Streptococcus
gordonii
E. coli
P aeruginosa*
Escherichia coli
E. coli
Lactobacillus
L. paracasei
paracasei
L. casei
Lactobacillus
casei
Candida
albicans
H. influenzae
Haemophilus
H. influenzae
S. pneumoniae
influenzae
S. pseudopneumoniae
Streptococcus
pseudopneumoniae
pneumoniae
Streptococcus
S. parasanguinis
H. parainfluenzae
S. pseudopneumoniae
Streptococcus
pneumoniae
Streptococcus
parasanguinis
Streptococcus
mitis
S. pneumoniae
Streptococcus
N. subflava
parasanguinis
Streptococcus
H. parainfluenzae
S. australis
S. pneumoniae
Streptococcus
S. parasanguinis
mitis
S. mitis
Rothia
B. longum
mucilaginosa
Bifidobacterium
longum
Streptococcus
pseudopneumoniae
Streptococcus
pneumoniae
Streptococcus
Streptococcus
Haemophilus
parainfluenzae
M.
H.
Haemophilus
M. catarrhalis
catarrhalis
influenzae
parainfluenzae
H. parainfluenzae
Moraxella
S. gordonii
catarrhalis
S. parasanguinis
H. influenzae,
Streptococcus
gordonii
Neisseria sicca
Haemophilus
influenzae
Streptococcus
parasanguinis
S. marcescens
Serratia
S.
marcescens
marcescens
S. aureus
Staphylococcus
S. aureus
M. catarrhalis
aureus
M. catarrhalis
Moraxella
G.
catarrhalis
morbillorum
Rothia
R.
mucilaginosa
mucilaginosa
Streptococcus
S.
constellatus
constellatus
Fusobacterium
S.
nucleatum
parasanguinis
Fusobacterium
S. anginosus
periodonticum
Streptococcus
parasanguinis
Streptococcus
anginosus
Streptococcus
intermedius
S. aureus
S.
Staphylococcus
S. aureus
pneumoniae
aureus
S. oralis
Streptococcus
C. koseri
mitis
R.
Streptococcus
mucilaginosa
S.
oralis
H.
pneumoniae,
Streptococcus
parainfluenzae
parasanguinis
N. subflava
Citrobacter
S. parasanguinis
koseri
Rothia
mucilaginosa
Streptococcus
salivarius
Haemophilus
parainfluenzae
Streptococcus
pseudopneumoniae
Streptococcus
pneumoniae
Prevotella
melaninogenica
S.
Staphylococcus
S. aureus
aureus
aureus
C.
Streptococcus
kroppenstedtii
oralis
S. oralis
Lactobacillus
L. rhamnosus
rhamnosus
Streptococcus
S. equinus
oralis
S. salivarius
Streptococcus
salivarius
Prevotella
S. oralis
melaninogenica
Streptococcus
P.
parasanguinis
melaninogenica
Streptococcus
S.
parasanguinis
Streptococcus
Streptococcus
Streptococcus
H.
Haemophilus
H. influenzae
influenzae
influenzae
Streptococcus
R.
salivarius
mucilaginosa
Streptococcus
S. salivarius
parasanguinis
S. equinus
Streptococcus
S.
mitis
parasanguinis
Streptococcus
S. mitis
sp. FDAARGOS
—
192
S. oralis
Streptococcus
S. gordonii
oralis
S. sanguinis
Streptococcus
sanguinis
Rothia
mucilaginosa
Veillonella
parvula
Streptococcus
Streptococcus
gordonii
Streptococcus
constellatus
H.
Haemophilus
H. influenzae
influenzae
influenzae
S. mitis
Rothia
S.
mucilaginosa
parasanguinis
Streptococcus
R.
mitis
mucilaginosa
Streptococcus
parasanguinis
H.
S.
Haemophilus
H.
influenzae
pneumoniae
influenzae
influenzae
Streptococcus
S. salivarius
mitis
S.
Haemophilus
parasanguinis
S.
parainfluenzae
S. oralis
pneumoniae,
Streptococcus
S. mitis
parasanguinis
H.
Streptococcus
parainfluenzae
Streptococcus
salivarius
Streptococcus
oralis
Streptococcus
pneumoniae
Streptococcus
S. sanguinis
parasanguinis
P.
Prevotella
melaninogenica
melaninogenica
S. cristatus
Streptococcus
R.
mitis
mucilaginosa
Streptococcus
S. mitis
sanguinis
S.
Rothia
parasanguinis
mucilaginosa
H.
Veillonella
parainfluenzae
parvula
Haemophilus
parainfluenzae
Streptococcus
salivarius
Streptococcus
Streptococcus
cristatus
H.
Haemophilus
H. influenzae
influenzae
influenzae
S.
Streptococcus
pseudopneumoniae
parasanguinis
S.
Streptococcus
parasanguinis
salivarius
V. parvula
Streptococcus
S. oralis
oralis
S. mitis
Veillonella
parvula
Streptococcus
pseudopneumoniae
Streptococcus
mitis
Streptococcus
H.
Haemophilus
H. influenzae
influenzae
influenzae
S. oralis
Streptococcus
S. mitis
parasanguinis
H.
Haemophilus
parainfluenzae
parainfluenzae
V. parvula
Veillonella
S.
parvula
parasanguinis
Streptococcus
S. sanguinis
oralis
S. salivarius
Streptococcus
salivarius
Streptococcus
H.
Haemophilus
H. influenzae
influenzae
influenzae
M.
Streptococcus
M. catarrhalis
catarrhalis
oralis
S.
Streptococcus
thermophilus
mutans
S. mutans
Moraxella
V. parvula
catarrhalis
S. oralis
Veillonella
S.
parvula
parasanguinis
Streptococcus
R.
parasanguinis
dentocariosa
Streptococcus
S. salivarius
salivarius
S. gordonii
Streptococcus
gordonii
Rothia
dentocariosa
Lactobacillus
salivarius
H.
Haemophilus
H. influenzae
S.
influenzae
influenzae
S. aureus
pyogenes
S.
Streptococcus
S. pyogenes
aureus
pyogenes
S.
S.
Staphylococcus
parasanguinis
pyogenes
aureus
S. salivarius
Streptococcus
salivarius
Streptococcus
parasanguinis
S. pneumoniae
Streptococcus
R. mucilaginosa
mitis
S. salivarius
Streptococcus
S. australis
salivarius
V. parvula
S. pneumoniae,
Streptococcus
S. sanguinis
oralis
S. parasanguinis
Veillonella
parvula
Streptococcus
Streptococcus
sanguinis
Rothia
mucilaginosa
Streptococcus
Streptococcus
parasanguinis
Streptococcus
pneumoniae
Streptococcus
pseudopneumoniae
Prevotella
melaninogenica
Streptococcus
Streptococcus
Haemophilus
parainfluenzae
P.
Pseudomonas
P. aeruginosa
S. aureus
aeruginosa
aeruginosa
S. aureus
S. aureus
Staphylococcus
aureus
P.
Pseudomonas
P. aeruginosa
aeruginosa
aeruginosa
H. influenzae
Haemophilus
H. influenzae
influenzae
S. anginosus
Streptococcus
S. parasanguinis
anginosus
Streptococcus
parasanguinis
Prevotella
intermedia
Tannerella
forsythia
Bifidobacterium
longum
Rothia
mucilaginosa
Streptococcus
gordonii
Veillonella
parvula
Streptococcus
oralis
Campylobacter
concisus
E. coli
Escherichia coli
E. coli
Streptococcus
S.
salivarius
salivarius
Clavispora
S.
lusitaniae
parasanguinis
Streptococcus
S. equinus
parasanguinis
Streptococcus
Rothia
mucilaginosa
Saccharomyces
cerevisiae
Candida
E. faecalis
albicans
Enterococcus
faecalis
E. coli
Escherichia coli
E. coli
H.
Haemophilus
H. influenzae
influenzae
influenzae
V. parvula
Veillonella
S. salivarius
parvula
S. mitis
Streptococcus
salivarius
Streptococcus
mitis
P.
Pseudomonas
P. aeruginosa
aeruginosa
aeruginosa
S.
Staphylococcus
S. aureus
aureus
aureus
P.
Pseudomonas
aeruginosa
aeruginosa
aeruginosa
H.
Moraxella
H. influenzae
M.
influenzae
catarrhalis
M. catarrhalis
catarrhalis
M.
Haemophilus
S.
catarrhalis
influenzae
parasanguinis
Streptococcus
S. anginosus
parasanguinis
S.
Staphylococcus
S.
aureus
aureus
aureus
Streptococcus
constellatus
Streptococcus
anginosus
Streptococcus
intermedius
H.
E. coli
Staphylococcus
S. aureus
influenzae
aureus
S.
Haemophilus
aureus
influenzae
Neisseria sicca
E. coli,
Escherichia coli
H. influenzae
In summary, the technology according to the embodiments enables detection of species missed by routine clinical cultures that were confirmed by qPCR. Some embodiments also enable the detection of additional microbial species without the need for specific PCR primers. Some embodiments also advantageously improve specificity (100%) and overall accuracy (90%) compared to Charalampous et al. in experiments undertaken to compare the performance of the embodiments as described above.
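The reported sensitivity, specificity and accuracy figures can be reproduced from the counts stated in the performance comparison (44 confirmed-positive and 8 confirmed-negative pathogens). The Python sketch below is only a worked check of that arithmetic; the function and variable names are illustrative, and the accuracy value for the prior pipeline is derived here from the stated counts rather than quoted from the disclosure.

```python
def sensitivity(tp, fn):
    """True positives as a fraction of all confirmed-positive cases."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negatives as a fraction of all confirmed-negative cases."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Correct calls as a fraction of all cases."""
    return (tp + tn) / (tp + tn + fp + fn)


# Embodiments: 39 of 44 positives found, none of the 8 negatives reported.
emb = dict(tp=39, fn=5, tn=8, fp=0)
# Prior pipeline: all 44 positives found, 6 of the 8 negatives reported.
prior = dict(tp=44, fn=0, tn=2, fp=6)

emb_metrics = (
    round(100 * sensitivity(emb["tp"], emb["fn"])),
    round(100 * specificity(emb["tn"], emb["fp"])),
    round(100 * accuracy(**emb)),
)
prior_metrics = (
    round(100 * sensitivity(prior["tp"], prior["fn"])),
    round(100 * specificity(prior["tn"], prior["fp"])),
    round(100 * accuracy(**prior)),
)
```

With these counts the embodiments give (89, 100, 90) percent and the prior pipeline gives (100, 25, 88) percent for sensitivity, specificity and accuracy respectively.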
Integration with Clinical Practice
The technology of the embodiments utilizes nanopore long-read sequencing platforms which enable real-time analysis of DNA sequences as they become available. Once the sequencing starts, DNA reads are processed in real-time and an electronic or digital report documenting the findings is continuously updated as the sequencing progresses. The report may be presented on the display 360 of the system 300. When a new species (or new AMR genes) is detected and confirmed by coverage analysis, the report is updated accordingly. According to the results in Table 1, pathogens can be identified within 1-2 hours after sequencing initiation. Adding sample transport and processing (typically less than 2 hours), DNA extraction (typically less than 2 hours) and library preparation (about 4 hours) durations, the total turnaround time for identification of species in a sample could be less than 1 day. An electronic report documenting detected microbial species and AMR genes is generated within an actionable timeframe to guide clinical decision-making.
The technology of the embodiments can be deployed in a clinical laboratory. The streamlined laboratory protocols and algorithms can be used for detecting microbial species directly in clinical samples. The embodiments can be used in parallel with, or as a replacement for, some of the conventional pathogen detection methods. The embodiments can also be used for challenging clinical cases where all routine pathogen detection tests are unyielding but clinical suspicion for infection remains. A list of exemplary hardware and software specifications used by some embodiments is provided in Table 2.
In some embodiments, an infection in a subject may be detected based on the identity of one or more microorganisms present in the sample. The clinical decision support system may aid selection of a therapeutic agent to administer to the subject when infection by the one or more microorganisms is detected in the subject.
The embodiments provide algorithms for real-time analysis of long-read sequencing data generated from long-read DNA sequencing platforms. By utilizing the unique properties of long-read data, the algorithms identify microbial species within metagenomic samples and reduce the false-positive rate, improving overall accuracy over the existing metagenomic pipelines. The embodiments provide the ability to detect and identify pathogens and other microbial species in clinical samples directly, without the need for cultures or specific PCR. The embodiments require a smaller number of reads which can be obtained within 1-2 hours, shortening the time to detection which supports clinical decision-making in a timely manner. The technology setup for the embodiments is advantageously portable and can be deployed to any location with reliable electricity supplies.
In general, methods for identifying pathogens in metagenomic samples are designed for NGS technology (e.g. the Illumina sequencing platform). Applying such existing methods to sequencing data from any platform can lead to lower accuracy, as observed in the performance comparison described above. The technology of the embodiments is designed to utilize long-read information and to support real-time analysis on long-read sequencing platforms.
The embodiments provide a flexible and scalable technology. Some embodiments allow anywhere from a single sample to a batch of 96 samples to be analyzed per run. The embodiments allow for both random access and batched testing, based on demands in the laboratory. The embodiments can be adapted for detecting microbes in other sample types such as fecal or skin samples, as well as microbes in food and environmental samples. Long-read nucleic acid fragment sequence (LRS) data includes sequencing data of at least 1,000 base pairs of a DNA or an RNA molecule. The long-read nucleic acid fragment sequence data may be obtained using nanopore sequencing or PacBio sequencing or any other long-read sequencing technique.
Predefined abundance level comprises a level of abundance considered statistically significant from the perspective of identification of a microorganism in a sample. The predefined abundance level may include a level wherein the total number of bases sequenced in a sample is equal to or greater than the genome size of a given species.
Reference genomes include genomes corresponding to a variety of species that may be potentially present in a sample. Reference genomes may be stored in a genome database populated by routine clinical analysis of samples by total nucleic acid extraction and long-read ligation. A subset of reference genomes is selected based on the taxonomic identifiers assigned to LRS data obtained from a sample. The selection of a subset of reference genomes advantageously avoids the need for alignment of a large volume of LRS data with a large number of reference genomes, making the methods of the embodiments computationally feasible. The subset of reference genomes may also be referred to as representative genomes or candidate reference genomes.
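Aligning the LRS data with the candidate genomes includes matching nucleotides of the LRS data with the genome. Alignment can be measured or quantified by an identity score that may be defined as
$$\text{identity score} = \frac{\text{total number of matched nucleotides}}{\text{alignment length}} \times 100$$
or a read-coverage score defined as
$$\text{read-coverage score} = \frac{\text{alignment length}}{\text{read length}} \times 100$$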
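As a minimal illustration of these two scores, the Python sketch below uses the definitions given in the detailed description: identity score = matched nucleotides / alignment length × 100, and read-coverage score = alignment length / read length × 100. The function names, the default 80% thresholds and the `require_both` switch (reflecting the "and/or" wording) are assumptions for illustration. The same `coverage_score` helper could be called with a gene length to obtain the gene-coverage score used for AMR genes.

```python
def identity_score(matched_nucleotides, alignment_length):
    """Percentage of aligned positions that match."""
    return matched_nucleotides / alignment_length * 100


def coverage_score(alignment_length, reference_length):
    """Percentage of a read (or gene) spanned by the alignment."""
    return alignment_length / reference_length * 100


def read_aligns(matched_nucleotides, alignment_length, read_length,
                min_identity=80.0, min_coverage=80.0, require_both=True):
    """Decide whether a read is treated as aligned to a genome.

    require_both=True applies both thresholds; False accepts either one.
    """
    identity_ok = identity_score(matched_nucleotides, alignment_length) >= min_identity
    coverage_ok = coverage_score(alignment_length, read_length) >= min_coverage
    return (identity_ok and coverage_ok) if require_both else (identity_ok or coverage_ok)


# Toy alignment: 4,500 matches over a 5,000-base alignment of a 5,500-base read.
identity = identity_score(4_500, 5_000)
read_cov = coverage_score(5_000, 5_500)
accepted = read_aligns(4_500, 5_000, 5_500)
rejected = read_aligns(4_500, 5_000, 5_500, min_identity=95.0)
either = read_aligns(4_500, 5_000, 5_500, min_identity=95.0, require_both=False)
```

In this toy case the identity score is 90% and the read-coverage score is about 90.9%, so the read passes the 80% thresholds, fails when 95% identity is required together with coverage, and passes again if either criterion alone is accepted.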
Coverage analysis comprises the calculation of the percentage of the breadth of coverage for each candidate genome based on the alignment results. The outcome of the coverage analysis may be represented in the form of a coverage distribution graph as illustrated in
Some embodiments relate to a method for treating infection by one or more microorganisms in a subject, the method comprising: determining long-read nucleic acid fragment sequence data (LRS data) from a sample obtained from the subject; performing taxonomic classification of the LRS data to assign one or more taxonomic identifiers to each record in the LRS data; determining abundance levels of a plurality of reference genomes based on the taxonomic identifiers of the LRS data; aligning the LRS data with the candidate genomes in response to one or more candidate genomes of the plurality of reference genomes reaching or exceeding a predefined abundance level; performing coverage analysis based on the alignment of the LRS data with the candidate genomes; determining an identity of one or more microorganisms present in the sample based on the coverage analysis so as to detect infection by the one or more microorganisms in the subject; and administering a therapeutic agent to the subject when infection by the one or more microorganisms is detected in the subject.
It will be appreciated that many further modifications and permutations of various aspects of the described embodiments are possible. Accordingly, the described aspects are intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims.
Throughout this specification and the claims which follow, unless the context requires otherwise, the word “comprise”, and variations such as “comprises” and “comprising”, will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.
The reference in this specification to any prior publication (or information derived from it), or to any matter which is known, is not, and should not be taken as an acknowledgment or admission or any form of suggestion that that prior publication (or information derived from it) or known matter forms part of the common general knowledge in the field of endeavour to which this specification relates.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10202202957T | Mar 2022 | SG | national |
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/SG2023/050148 | 3/9/2023 | WO | |